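arXivDaily: a daily academic digest synchronized with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.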

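2606.07600 2026-06-09 cs.LG cs.AI new submission

Reachability and asymptotics of Gaussian Transformer dynamics

Albert Alcalde, Zhengping Ji, Enrique Zuazua

Affiliations: Friedrich–Alexander University Erlangen–Nürnberg; Research Council of Norway

AI summary: Models data propagation through Transformers as a nonlinear control system on the space of probability measures, proves that Gaussian distributions remain Gaussian under self-attention and affine feed-forward layers, reduces the dynamics to a finite-dimensional bilinear control system, and reveals a connection with Riccati equations.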

Abstract

We formulate data propagation through the Transformer, the machine learning architecture powering large language models, as a nonlinear control system on the space of probability measures. For the mean-field Transformer model with self-attention and affine feed-forward layers, we prove that Gaussian distributions remain exactly Gaussian along the induced flow. This invariance reduces the infinite-dimensional measure dynamics to a finite-dimensional bilinear control system governing the evolution of the mean and covariance, reformulates the expressive capacity of Transformers as a reachability problem for prescribed Gaussian moments, and reveals a novel connection with Riccati-type equations from classical filtering and control. For time-varying controls, we prove exact finite-time reachability of any target Gaussian distribution whose covariance matrix has the same rank as the initial one, this rank constraint being an intrinsic invariant of the dynamics. For time-invariant parameters, we derive explicit spectral conditions leading either to asymptotic stability toward positive-definite equilibria or to finite-time blow-up of the covariance. Numerical experiments complement the theory by showing that practical Transformers with Gaussian inputs remain close to moment-matched Gaussian distributions through early and intermediate layers, while Transformers with prescribed attention matrices reproduce the predicted covariance regimes: bounded evolution in stabilizing configurations and blow-up in destabilizing ones.


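2606.07601 2026-06-09 cs.LG cs.AI new submission

LFNO: Bridging Laplace and Fourier via Transient-Steady Decomposition

Jeongun Ha, Sanga Yoon, Donghun Lee

AI summary: Proposes the Laplace-Fourier Neural Operator (LFNO), whose dual-branch architecture explicitly decomposes system dynamics into transient and steady-state components. On nine benchmarks it outperforms existing operators on the ODE systems, surpasses LNO and is competitive with FNO on the PDE systems, and improves stability and interpretability.

Comments: 21 pages, 11 figures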

Abstract

We introduce the Laplace-Fourier Neural Operator (LFNO), a unified framework for modeling dynamical systems across transient and steady-state regimes by integrating the spectral advantages of Laplace and Fourier Neural Operators. LFNO employs a dual-branch architecture that explicitly decomposes system dynamics into transient and steady-state components. We evaluate LFNO on nine benchmarks, including three ODE systems (Duffing, Lorenz, and Pendulum) and six PDE systems (Euler-Bernoulli beam, Heat, Reaction-diffusion, Brusselator, Burgers, and Navier-Stokes). LFNO significantly outperforms existing operators on ODE systems, where transient dynamics dominate, and consistently surpasses LNO while achieving performance competitive with FNO on PDE benchmarks. Furthermore, LFNO offers improved stability and physical interpretability through its component-wise decomposition. These results demonstrate that LFNO provides a robust and unified approach for learning complex dynamical systems across multiple temporal scales.


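2606.07604 2026-06-09 cs.LG cs.AI new submission

Contribution Weights: A Geometrical Analysis of Self-Attention Transformers

Harry Jake Cunningham, Nicola Muca Cirone

Affiliations: University of Cambridge

AI summary: Proposes contribution weights, a projection-based metric that combines attention weight, value magnitude, and directional alignment, identifies critical tokens more accurately, and reveals that attention sinks actively suppress information.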

Abstract

Analyzing attention weights has become a standard approach for interpreting the information flow of Large Language Models (LLMs). However, this approach has significant limitations as it neglects the geometric properties of the value vectors being aggregated. To address this gap, we introduce \emph{Contribution Weights}, a projection-based metric that quantifies a token's influence by accounting for its attention weight, value magnitude, and directional alignment with the layer output. We demonstrate that contribution weights provide a more faithful measure of token importance, consistently outperforming attention-based metrics in identifying semantically critical tokens across different decoder-only models, tasks, and datasets. Further, our metric enables novel mechanistic analysis of \emph{attention sinks}. While previous work characterized sinks as passive repositories for excess attention, we reveal they serve an active functional role, suppressing information through a convex relationship between sink rate and output norm, stabilizing representations by opposing the semantic drift of low-confidence tokens.
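As a rough illustration of the projection-based idea in this abstract, the sketch below scores each token by projecting its attention-weighted value vector onto the head's output direction. The paper's exact definition may differ: the function name `contribution_weights` and the normalization by the squared output norm (chosen so the scores sum to one) are assumptions made here.

```python
import numpy as np

def contribution_weights(attn, values):
    """Hypothetical projection-based token scores for one query position.

    attn:   (n,) attention weights over n tokens (non-negative, sums to 1)
    values: (n, d) value vectors
    """
    out = attn @ values                  # head output: sum_i a_i * v_i
    out_norm_sq = float(out @ out)       # assumed non-zero in this sketch
    # a_i * <v_i, out> = a_i * |v_i| * |out| * cos(angle between v_i and out)
    proj = attn * (values @ out)
    return proj / out_norm_sq            # assumed normalization: scores sum to 1

attn = np.array([0.7, 0.2, 0.1])
values = np.array([[0.01, 0.0],          # heavily attended token with a tiny value vector
                   [1.0, 0.0],
                   [0.0, 2.0]])
scores = contribution_weights(attn, values)
```

In this toy input the first token receives the most attention but the smallest score, because its value vector is nearly zero. That is the kind of disagreement between attention weight and contribution that the abstract points to.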


2606.07695 2026-06-09 cs.LG cs.AI new submission

DSFNet: Learning Dual-Domain Spectral Operators for Multi-Modality Spatio-Temporal Forecasting in Urban Transportation Systems

Yongchao Li, Yang Li, Zhuoxuan Li, Jun Chen, Chu Zhang, Jinde Cao, Leszek Rutkowski

Affiliations: Southeast University; Jiangsu Province Collaborative Innovation Center of Modern Urban Traffic Technologies; City University of Hong Kong; School of Mathematics, Southeast University; Systems Research Institute of the Polish Academy of Sciences; Luoyang Normal University; Purple Mountain Laboratories; AGH University of Krakow

AI summary: Proposes DSFNet, a dual-domain spectral filtering network that factorizes space-modality interactions into feature-domain and spatial-domain spectral operators, explicitly models cross-variable coupling and heterogeneous spatial dependencies, and uses an external gating mechanism to adaptively regulate temporal dynamics. On five real-world traffic datasets it reduces MAE by 3.21%-10.16% relative to the second-best baselines.

Abstract

Multi-Modality Spatio-Temporal Forecasting (MoSTF) extends traditional spatio-temporal forecasting by incorporating diverse traffic modalities. Despite significant recent strides in spatio-temporal modeling, existing approaches often fail to explicitly model the coupling relationships between different modality variables. Accurate MoSTF is challenging, as it requires modeling (1) temporal dynamic heterogeneity under exogenous influences and (2) heterogeneous spatial dependencies alongside complex cross-variable couplings. To address these challenges, we propose the Dual-Domain Spectral Filtering Network (DSFNet). Our framework employs dual-domain spectral filtering to capture heterogeneous spatial patterns and explicitly model the relationships between variables. Unlike graph-based message passing or dense attention over node-modality pairs, DSFNet factorizes space-modality interactions into feature-domain and spatial-domain spectral operators, enabling scalable modeling of nonlocal dependencies and cross-modality couplings. Furthermore, we introduce an external gating mechanism to adaptively regulate temporal dynamics under external influences. We validate our method through extensive experiments on five representative real-world traffic datasets. Compared with the second-best baselines, DSFNet reduces MAE by 3.21%-10.16% across these datasets. The results demonstrate that DSFNet significantly outperforms existing state-of-the-art baselines in accuracy while exhibiting efficiency and robustness.


2606.07710 2026-06-09 cs.LG cs.AI new submission

WhiFlash: Accelerating Speculative Decoding with Token-Level Cross-Paradigm Routing

Young D. Kwon, Miles Williams, Rui Li, Alexandros Kouris, Stylianos I. Venieris

Affiliations: Samsung AI Center, Cambridge, UK

AI summary: Proposes WhiFlash, the first cross-paradigm speculative decoding method to unify autoregressive and diffusion-based parallel drafting. Fine-grained routing and cache optimizations yield category-specific throughput gains of up to 69.6% over EAGLE-3.

Comments: Under review

Abstract

The autoregressive nature of large language models (LLMs) remains a significant bottleneck for inference, particularly in complex agentic workloads. While speculative decoding (SD) accelerates inference, current approaches rely on static drafting paradigms, utilising either autoregressive drafting models for reasoning or diffusion-based parallel drafting models for structured outputs. We empirically find that drafting accuracy fluctuates dramatically within a single sequence, leaving significant performance unrealised by static paradigms and coarse-grained routing. To address this volatility, we introduce WhiFlash, the first cross-paradigm SD method that unifies autoregressive and diffusion-based parallel drafting under a single token-level controller. WhiFlash adopts a fine-grained routing mechanism that employs either a lightweight entropy-based or a learned neural policy, both parametrised to provide a tunable balance between expected token gain and latency. To make high-frequency switching computationally viable, we introduce novel cache-management optimisations, Lazy Catch-up and KV-only Prefill, reducing switching overhead to below 7% of per-round latency. By capitalising on the complementary strengths of fundamentally distinct drafting architectures, WhiFlash achieves significantly higher acceptance lengths, yielding category-specific throughput gains of up to 69.6% over the state-of-the-art autoregressive EAGLE-3 and 37.3% over the diffusion-based DFlash.


2606.07856 2026-06-09 cs.LG new submission

Teacher-Free Self-Training Amplifies but Does Not Compound: A Pass@$K$ Crossover on a Free-Verifier Domain

Igor Lima Strozzi

Affiliations: Federal University of Rio de Janeiro

AI summary: On a free-verifier domain, using teacher-free self-training (STaR) and critic-guided selection, finds that self-training amplifies model capability but does not compound it, as confirmed by a pass@$K$ crossover diagnostic.

Abstract

When a language model trains on its own verified outputs, does it acquire capability beyond its base, or merely get better at expressing capability the base already had? We make the question decidable with a teacher-free "constellation" -- a generator, a learned critic, and a free exact verifier -- on a FlashFill-style "trapdoor" DSL, where verified (problem, solution) pairs are cheap to synthesize, hard to invert, and free to check exactly. Everything runs on one 4-bit Qwen3-4B on a single 24 GB GPU, with no model in the loop larger than the base. We report three findings. (i) Critic-guided selection beats verifier-filtered best-of-$k$ by $+9.1$ pp ($6/6$ seeds), with the entire gain localized to tasks where candidates disagree on held-out inputs. (ii) Per-round STaR self-training raises the ceiling but never accelerates -- the gain tracks remaining headroom and decelerates across $K=4$ independent training trajectories. (iii) The domain has no clean zero-capability frontier, so the usual "$0\% \to$ climb $=$ emergence" test is invalid here. A measured pass@$K$ crossover settles the diagnosis: the trained model wins at the operating budget (pass@$8$) but the base overtakes it at a large budget (pass@$64$) on every trajectory, so self-training concentrates probability mass rather than expanding reach. This is amplification, not compounding. ($K=4$ is indicative, not yet a robust across-trajectory CI.)
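The crossover described in this abstract can be made concrete with the commonly used unbiased pass@$k$ estimator, $1 - \binom{n-c}{k} / \binom{n}{k}$, for $n$ samples of which $c$ are correct. Whether the paper uses this exact estimator is an assumption, and the per-task counts below are invented for illustration only; they are not results from the paper.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimate from n samples with c correct (requires k <= n)."""
    return 1.0 - comb(n - c, k) / comb(n, k)

def mean_pass_at_k(correct_counts, n, k):
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)

n = 64
# Hypothetical counts of correct samples per task (4 tasks, 64 samples each).
base_counts = [2, 2, 2, 2]        # broad but thin coverage
trained_counts = [32, 32, 32, 0]  # mass concentrated on three tasks, one task lost

base_8 = mean_pass_at_k(base_counts, n, 8)
trained_8 = mean_pass_at_k(trained_counts, n, 8)
base_64 = mean_pass_at_k(base_counts, n, 64)
trained_64 = mean_pass_at_k(trained_counts, n, 64)
```

With these invented counts the "trained" profile wins at $k=8$ while the "base" profile wins at $k=64$. This is the pattern the abstract reads as concentrating probability mass rather than expanding reach.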


2606.07881 2026-06-09 cs.LG new submission

Breaking the Bubble: Asynchronous Pipeline Parallel Training with Bounded Weight Inconsistency

Itay Elam, Eliron Rahimi, Avi Mendelson, Chaim Baskin

Affiliations: Technion - Israel Institute of Technology; Ben-Gurion University of the Negev

AI summary: Proposes PACI, which controls version drift through local gradient accumulation to achieve bubble-free asynchronous pipeline parallelism. In GPT-style language-model pretraining it matches the stability and perplexity of synchronous 1F1B-flush, with fully utilized throughput and up to 1.69× better training time-to-accuracy.

Abstract

Pipeline parallelism is essential for training large neural networks, but existing schedules trade off throughput, memory, and optimization consistency. Synchronous pipelines preserve forward/backward weight consistency but suffer from bubbles; asynchronous pipelines remove bubbles but introduce weight-version mismatch, typically requiring weight stashing, prediction, or correction mechanisms. We introduce PACI (Pipeline Asynchronous training with Controlled Inconsistency), a bubble-free asynchronous pipeline method that bounds forward/backward version drift without weight stashing, prediction, additional parameter copies, or global synchronization. The key idea is to use local gradient accumulation as a version-control mechanism: by slowing parameter-version evolution relative to pipeline delay, PACI limits the number of optimizer updates crossed by any micro-batch while preserving steady-state utilization. In GPT-style language-model pretraining, PACI matches the stability and final perplexity of synchronous 1F1B-flush, retains the same peak memory footprint, achieves fully utilized pipeline throughput, and improves training time-to-accuracy by up to $1.69\times$ over the fastest flush baseline. These results show that forward/backward inconsistency need not be eliminated: when explicitly bounded, it can be safely traded for substantial efficiency gains.
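The version-control idea in this abstract (slower parameter updates mean fewer updates are crossed during a fixed pipeline delay) can be checked with a toy count. This is not PACI's actual schedule: the model below assumes one optimizer update after every `accum_steps` micro-batch steps and a fixed forward-to-backward delay, and all names are hypothetical.

```python
import math

def updates_crossed(forward_step, delay, accum_steps):
    """Count optimizer updates between a micro-batch's forward and backward pass.

    Toy assumption: updates happen at steps accum_steps, 2 * accum_steps, ...
    and the backward pass runs `delay` steps after the forward pass.
    """
    backward_step = forward_step + delay
    return backward_step // accum_steps - forward_step // accum_steps

def drift_bound(delay, accum_steps):
    """Upper bound on updates crossed in this toy model."""
    return math.ceil(delay / accum_steps)

delay = 7
for accum_steps in (1, 4, 8):
    worst = max(updates_crossed(s, delay, accum_steps) for s in range(100))
    print(accum_steps, worst, drift_bound(delay, accum_steps))
```

In this simplified setting, raising the accumulation length shrinks the worst-case number of weight versions a micro-batch can span, while the pipeline itself never has to stall.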


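2606.07908 2026-06-09 cs.LG new submission

Layer-wise Derivative Controlled Networks Achieve Competitive Accuracy and Gradient Stability Across Data Regimes

Rowan Martnishn

Affiliations: Rowan Martnishn

AI summary: ChainzRule-based derivative-controlled networks apply a per-layer Jacobian penalty and achieve strong low-data performance on tabular and NLP tasks; the gradient tail ratio serves as a diagnostic of generalization.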

Abstract

Derivative-controlled networks based on ChainzRule (CR) combine cubic polynomial layers with a lightweight forward-mode per-layer Jacobian penalty (DREG). In this second paper of a multi-part series, we evaluate the generalization properties of CR across data regimes. We ablate the shape of the DREG coefficient schedule, demonstrating that the optimal annealing range depends on representation noise. On the Pima Diabetes dataset, CR achieves strong low-data performance and maintains a consistent accuracy advantage over baselines from 5\% to 100\% training data, supported by exceptionally stable gradient tail ratios ($\sim$1.01--1.02 vs. 1.07--1.09 for ReLU networks). Extensions to SST-5 show competitive or superior results in both frozen-embedding and BERT fine-tuned regimes, including outperforming prior BERT baselines despite substantially less training data. These results are statistically significant: CR achieves superior accuracy over the strongest published baselines we could identify on both datasets ($p < 0.05$). These results establish that layer-wise derivative control induces a structural inductive bias toward low-frequency, stable representations that generalizes robustly across tabular and NLP domains, data volumes, and representation qualities. The gradient tail ratio serves as a reliable, label-free diagnostic of generalization capability.


2606.08105 2026-06-09 cs.LG new submission

A Unifying View of Attention Sinks: Two Algorithms, Two Solutions

Lukas Fesser, Mozes Jacobs, Thomas Fel, Andy Keller, Sham Kakade

Affiliations: Kempner Institute; Harvard University

AI summary: Shows that attention sinks can correspond to two distinct mechanisms, adaptive nop and broadcast, derives diagnostics from them, and demonstrates that interventions such as gating and registers each target a different mechanism and work better in combination.

Abstract

When attention concentrates on a single token, a sink, what is the model actually computing? Attention sinks are ubiquitous in softmax transformers, yet this shared visual signature can hide fundamentally different algorithms. We show that visually similar sink patterns can reflect two distinct mechanisms: (i) adaptive nop, where a head suppresses its update by routing to a null token, and (ii) broadcast, where a sink aggregates and redistributes global information. In that case, sinks serve an analogous role: a safe destination when there is nothing useful to compute. Proposed interventions like gating or registers work because they implicitly target one or the other, revealing a duality between method and assumed mechanism: gating implicitly assumes nop; registers implicitly assume broadcast. Each mechanism leaves distinct traces (nop sinks exhibit negligible value norms; broadcast sinks induce low-rank outputs) which we formalize on synthetic tasks and use to derive practical diagnostics. Applied to pretrained vision transformers, these diagnostics reveal that both mechanisms exist at scale: sinks transition from CLS in early layers to patches in deeper layers, and concentrate in specialized heads. Strikingly, register tokens, designed for broadcast, are repurposed to also serve nop, confirming that neither intervention alone suffices. Combining gating with registers yields complementary gains in stability and performance. Overall, we find that the same attention pattern can reflect two very different computations and effective intervention requires first asking what the model is actually computing.
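The two traces named in this abstract (negligible value norms for nop sinks, low-rank outputs for broadcast sinks) suggest simple per-head measurements. The sketch below is one possible reading, not the paper's diagnostics: the function names, the use of the median norm as a reference, and the use of stable rank as a proxy for "low-rank" are all assumptions.

```python
import numpy as np

def sink_value_norm_ratio(values, sink_idx):
    """Norm of the sink's value vector relative to the median norm of the others."""
    norms = np.linalg.norm(values, axis=1)
    others = np.delete(norms, sink_idx)
    return float(norms[sink_idx] / np.median(others))

def stable_rank(outputs):
    """Squared Frobenius norm over squared spectral norm; equals 1 for rank-1 matrices."""
    s = np.linalg.svd(outputs, compute_uv=False)
    return float((s ** 2).sum() / (s[0] ** 2))

rng = np.random.default_rng(0)
n, d = 6, 4
values = rng.normal(size=(n, d))
attn = np.full((n, n), 0.02)
attn[:, 0] = 0.9   # every query puts most of its mass on token 0 (the sink)

# Broadcast-like case: all queries share the same attention row here,
# so every output row is identical and the head output has rank 1.
broadcast_rank = stable_rank(attn @ values)

# Nop-like case: the sink's value vector is tiny, so attending to it adds almost nothing.
nop_values = values.copy()
nop_values[0] *= 1e-3
nop_ratio = sink_value_norm_ratio(nop_values, sink_idx=0)
```

A small `nop_ratio` points toward the nop reading and a stable rank near one points toward broadcast. Any threshold separating the two would need to be calibrated on real heads.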


2606.08191 2026-06-09 cs.LG cs.AI q-bio.QM new submission

Frequency-Domain Latent Attention Gating for Cross-Domain Token Aggregation

Kewei Li, Rongying Zhang, Xueli Wang, Xiwen Gong, Zhongjian Wang, Lan Huang, Ruochi Zhang, Fengfeng Zhou

Affiliations: College of Computer Science and Technology, Jilin University; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University; Institute for Quantitative and Computational Biology, University of California; Greenwich High School; BCPM Data Limited

AI summary: Proposes the FLaG module, which aggregates tokens via a real FFT, summarization of spectral components with learnable latent queries, channel-wise gating, and time-domain reconstruction, with the clearest gains on AMP prediction and CIFAR-100 and competitive results on text classification.

Abstract

Token aggregation is a common bottleneck in models that map token representations to sample-level predictions, yet most pooling methods operate only in the original token domain. We propose FLaG, a plug-in aggregation module that transforms token representations with the real FFT, summarizes spectral components with learnable latent queries, applies a channel-wise gate, and reconstructs enhanced time-domain tokens for final pooling. We evaluate FLaG on antimicrobial peptide (AMP) activity prediction with ESM2, image classification with ResNet18 on CIFAR-10 and CIFAR-100, and text classification with RoBERTa on IMDB and GLUE. FLaG achieves its clearest gains on the ESM2-8M antimicrobial peptide tasks and on CIFAR-100, while remaining competitive with strong text baselines on IMDB and GLUE. Then we probe its behavior on the AMP setting with band knockouts, gate summaries, residue perturbations, latent-query readouts, and structure-proxy stratification. We find that low-frequency bands contribute the most overall, and the remaining higher-band pattern is more sample-specific. The gate acts as a broadly shared spectral reweighting stage and the cross-attention patterns are sample-specific with mild query-wise differentiation, and higher-helix peptides exhibit stronger average spectral sensitivity in both bacteria. The supplementary materials, source code and data are released at https://www.healthinformaticslab.org/supp/ and https://github.com/Kewei2023/AMPCliff/tree/FLaG.


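2606.08262 2026-06-09 cs.LG new submission

sGPO: Trading Inference FLOPs for Training Efficiency in RLVR

Shivchander Sudalairaj, Kai Xu, Akash Srivastava, Giorgio Giannone

Affiliations: Red Hat; IBM

AI summary: Proposes sGPO, which uses a small amount of inference compute to estimate query difficulty in advance and adaptively allocates the training budget, reducing training compute by a factor of three while matching or exceeding baseline performance.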

Abstract

Standard Reinforcement Learning with Verifiable Rewards (RLVR) training allocates a fixed rollout budget to every query, without regard for what each query's difficulty means for the current policy. This leads to two symmetric failure modes: easy queries produce near-zero advantage because the policy already solves them, while unsolvable queries produce no signal because the policy never solves them. Both regimes waste training FLOPs without contributing to a learning gradient. We introduce sorted Group Policy Optimization (sGPO), a compute-efficient strategy that trades a small budget of inference FLOPs for a large reduction in wasted training FLOPs. The key insight is that cheap inference compute can serve as a single offline proxy for query difficulty. By generating a small batch of parallel samples per query under the initial policy, we obtain a model-aware empirical success rate. This motivates setting the training rollout group size to the inverse of this success rate, a practical rule that maximizes sample efficiency by extracting the most advantage per generated rollout. This single profiling pass simultaneously drives data filtering (removing trivial queries and sub-sampling unsolvable ones), adaptive group size allocation, and curriculum construction (scheduling queries from easy to hard). sGPO matches or exceeds baseline performance while reducing total training compute by a factor of three, with the upfront inference profiling cost included.
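The profiling pass in this abstract drives three decisions at once: filtering, group sizing, and ordering. A minimal sketch of that bookkeeping follows, under stated simplifications. The name `plan_rollouts`, the `max_group` cap, and the `trivial_at` threshold are hypothetical, and unsolved queries are skipped entirely here, whereas the abstract says they are sub-sampled.

```python
import math

def plan_rollouts(success_rates, max_group=64, trivial_at=1.0):
    """Map profiled success rates to training group sizes, ordered easy to hard.

    success_rates: dict of query id -> empirical success rate in [0, 1],
                   measured once under the initial policy.
    """
    plan = []
    for qid, p in success_rates.items():
        if p >= trivial_at:   # already solved every time: filtered out
            continue
        if p == 0.0:          # never solved during profiling: skipped in this sketch
            continue
        group = min(max_group, math.ceil(1.0 / p))   # group size ~ 1 / success rate
        plan.append((qid, group, p))
    plan.sort(key=lambda item: -item[2])             # curriculum: highest success rate first
    return [(qid, group) for qid, group, _ in plan]

rates = {"q1": 1.0, "q2": 0.5, "q3": 0.125, "q4": 0.0, "q5": 0.25}
plan = plan_rollouts(rates)
# plan == [("q2", 2), ("q5", 4), ("q3", 8)]
```

Harder queries get larger groups, so each group is expected to contain roughly one success, which is what gives the group a non-zero advantage signal.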


2606.08934 2026-06-09 cs.LG stat.AP stat.CO stat.ME stat.ML new submission

Backward Coherence and Hidden-State Stability in Recurrent Neural Networks: A Quasi-Reverse-Martingale Theory

Yuan-chin Ivan Chang

Affiliations: Institute of Statistical Science, Academia Sinica

AI summary: Introduces backward coherence, uses quasi-reverse-martingale theory to prove almost-sure convergence of the hidden-state sequence, and designs a regularization method that reaches stability earlier and lowers error on several tasks.

Abstract

Recurrent neural networks maintain a hidden state $h_t$, but its probabilistic meaning is often unclear. We study hidden-state stability through \emph{backward coherence}: the extent to which $h_t$ can be reconstructed from $h_{t+1}$ by a learned backward projector $g_\phi$. Under contraction and summable backward drift, the hidden-state sequence forms a quasi-reverse-martingale. This yields almost-sure convergence, rates under mixing, an interpretable limiting representation, finite pathwise stopping times, and a theoretical framework for time-uniform confidence sequences. Simulations support the theory. Backward-coherence regularisation reduces the empirical quasi-martingale total $\hat Q$ by $43$--$58\%$, reaches stability $28$--$44\%$ earlier than an unregularised RNN, and gives tracking-error recovery consistent with geometric bounds. Additional tests confirm echo-state forgetting rates bounded by $\rho$ and verify the increment-sum tube $R_t$ with $100\%$ simultaneous coverage, although $R_t$ is conservative; in practice, the defect-tail proxy $\hat Q_t$ is the more useful monitor. The backward-coherence loss is also equivalent to minimising a Kullback--Leibler divergence in a Gaussian backward model, linking the method to variational inference. Extensions cover $\phi$-mixing inputs, change-point tracking, and finite-sample concentration. Three real-data studies further validate the approach. On PhysioNet 2012 ICU data, the Reverse Martingale RNN (RMRNN) matches RNN mortality-prediction AUC while reaching stable representations 13 hours earlier. On FRED-MD, it reduces one-month-ahead forecast error by about fourfold under concept drift. On UCI Human Activity Recognition, it maintains lower post-transition tracking error with geometric decay. The guarantees apply under the stated assumptions; universality is not claimed.


2606.08985 2026-06-09 cs.LG new submission

Beyond Neural Collapse: Task-Intrinsic Geometry Governs Neural Representations in Modular Arithmetic

Hu Tan, Kuo Gai, Shihua Zhang

Affiliations: Academy of Mathematics and Systems Science, Chinese Academy of Sciences; School of Mathematical Sciences, University of Chinese Academy of Sciences; Shanghai Institute for Mathematics and Interdisciplinary Sciences (SIMIS); Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences

AI summary: Finds that network representations in modular addition form a two-dimensional cyclic geometry rather than the simplex equiangular tight frame of neural collapse, and explains this through layerwise non-uniform training, phase-alignment dynamics after subspace locking, and a complexity-advantage analysis.

Abstract

While neural collapse (NC) predicts that a $K$-class-balanced classifier should organize terminal representations as a $(K-1)$-dimensional simplex equiangular tight frame (ETF), modular addition consistently enters a different regime: networks compress to a two-dimensional cyclic geometry in which both classifier weights and token embeddings lie on circles. We refine the explanation of this phenomenon in three directions. First, we formalize a layerwise non-uniform training mechanism: downstream classifier weights are driven by dense cross-entropy gradients into a rank-2 equiangular configuration before upstream embeddings fully reorganize, and once this classifier plane forms, backpropagated feature gradients constrain embedding motion to the same plane while weight decay suppresses orthogonal components. Second, after this subspace locking, the induced in-plane dynamics admit an entropy-regularized transport interpretation on $S^1$; combined with modular-addition labels, this reduces embedding formation to phase alignment, whose minimizers are single-frequency characters of $\mathbb{Z}/P\mathbb{Z}$ and hence equal-angle points on a circle. Third, we quantify why this solution prevails over NC: a simplex ETF gains only an $O(1)$ advantage in cross-entropy, whereas the cyclic rank-2 solution enjoys a $\Theta(K)$ advantage under Schatten or weight-decay surrogates, yielding a critical threshold $\lambda_{\mathrm{crit}} = \Theta(1/K)$. Our results explain both why classifier weights move first and why embeddings subsequently align with them, showing that grokking on modular arithmetic is governed not by maximal separation alone but by a task-structured trade-off between separation, symmetry, and complexity.
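To see why equal-angle points on a circle suffice for modular addition, the hand-built sketch below places each residue of $\mathbb{Z}/P\mathbb{Z}$ at angle $2\pi k/P$ and classifies a sum by adding phases. It illustrates the rank-2 cyclic geometry only; it is not the trained network or the paper's derivation, and the names `circle` and `predict_sum` are invented here.

```python
import numpy as np

P = 7                                   # modulus (small, for illustration)
angles = 2 * np.pi * np.arange(P) / P
# One point per residue class, equally spaced on the unit circle: shape (P, 2).
circle = np.stack([np.cos(angles), np.sin(angles)], axis=1)

def predict_sum(a, b):
    """Predict (a + b) mod P by adding the phases of a and b."""
    phase = angles[a] + angles[b]
    feature = np.array([np.cos(phase), np.sin(phase)])
    logits = circle @ feature           # cosine of the angle to each class point
    return int(np.argmax(logits))

correct = all(predict_sum(a, b) == (a + b) % P
              for a in range(P) for b in range(P))
```

Two dimensions are enough here because the logit for class $k$ is $\cos(\theta_a + \theta_b - \theta_k)$, which is maximal exactly when $k \equiv a + b \pmod{P}$.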


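2606.09059 2026-06-09 cs.LG cs.AI cs.CV new submission

Stage-1 Controls the Entropy Regime, Not the Outcome

Jianxiong Shen

Affiliations: University of California, Berkeley

AI summary: Uses small-data experiments to study the role of Stage-1 (SFT or OPD) in two-stage post-training, finding that it mainly affects the policy entropy regime while having limited effect on final performance.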

Abstract

Two-stage post-training -- a Stage-1 warm-start (supervised fine-tuning, SFT, or on-policy distillation, OPD) followed by Stage-2 reinforcement learning (RL) -- is increasingly used for vision-language models (VLMs). We ask what Stage-1 actually controls in a small-data study using Qwen2.5-VL-7B with a same-modality 72B VLM teacher for OPD. First, the three warm-starts reach a narrow $53$--$54\%$ band on Geometry3K internal validation, consistent with the narrow range reported by recent specialized methods; this setup provides little evidence that Stage-1 changes the in-domain endpoint. Second, a matched-recipe, early-stopped SFT improves out-of-domain MathVista by $+2.1$ points, reversing the $-9.5$-point drop of an over-trained variant. The clearest difference is the entropy regime: OPD enters RL with substantially higher policy entropy than either SFT initialization, and the separation remains visible through the available trajectories. At the in-domain initialization, OPD also has higher answer diversity and pass@16 ($+2.0$ to $+5.2$ points over SFT), although problem-level bootstrap intervals show that the smaller contrast is uncertain. The advantage is absent after RL (endpoint pass@16 values within $1.1$ points) and on MathVista (six models within $1.2$ points). Our contribution is therefore a bounded empirical characterization: Stage-1 is strongly associated with the entropy regime in this setup, but the downstream payoff is small, localized, and not evidence that OPD is a better RL warm-start.
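The abstract does not state how policy entropy is measured. As a hedged illustration, the sketch below uses the common per-token Shannon entropy averaged over decoding steps; the function names and the plain-list input format are assumptions made for this example.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_policy_entropy(step_distributions):
    """Average per-token entropy over a list of next-token distributions."""
    return sum(token_entropy(p) for p in step_distributions) / len(step_distributions)
```

Under this definition, a policy that spreads probability over several answers at a step scores higher than one that commits to a single token, which is the sense in which a warm-start could enter RL in a higher-entropy regime.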

2606.09077 2026-06-09 cs.LG 新提交

Neural Legendre-Fenchel transform with Hessian Preconditioning

神经 Legendre-Fenchel 变换与 Hessian 预处理

Basile Plus-Gourdon, Frank Nielsen

发表机构 * École Normale Supérieure Paris-Saclay（巴黎-萨克雷高等师范学校）； Sony Computer Science Laboratories Inc.（索尼计算机科学实验室公司）

AI总结提出基于 Hessian 预处理的神经 Legendre-Fenchel 变换方法，通过仿射变形改善病态函数的共轭计算，提高收敛速度和数值精度。

Comments 11 pages, 4 figures

AI中文摘要

Legendre-Fenchel (LF) 变换是凸分析和机器学习中的基本工具，将下半连续函数映射到其凸共轭。在实践中，当给定函数的凸共轭没有闭式公式时，必须使用各种技术进行近似。最近一种通用的数值方法是深度 Legendre 变换方法，它依赖于神经网络，尽管在处理病态函数时仍然具有挑战性。本文基于 LF 变换作为射影对偶的重新表述。该框架的一个显著特性是仿射不变性。我们利用这种仿射不变性引入了一种基于 Hessian 的预处理策略。具体来说，我们在一个极小点附近应用仿射变形，使得函数的二阶泰勒近似与标准抛物面重合，其共轭映射是恒等映射。一个在恒等映射附近初始化的残差网络可以学习这个简化后的映射，而原始共轭映射通过逆变形恢复。所提出的预处理仅带来适度的计算开销，包括初始化时的一次特征分解和每次查询时的两次矩阵-向量乘法。在包括高维基准测试在内的多种凸函数上的实验表明，共轭的收敛速度和数值精度得到了提高，特别是在病态问题上效果显著。最后，我们讨论了所提出方法的适用范围，并指出了其若干局限性。

英文摘要

The Legendre-Fenchel (LF) transform is a fundamental tool in convex analysis and machine learning that maps lower semi-continuous functions to their convex conjugates. In practice, when no closed-form formula is available for the convex conjugate of a given function, one must approximate it using various techniques. One recent, versatile numerical method is the deep Legendre transform, which relies on neural networks but remains challenging for ill-conditioned functions. This work builds on the reformulation of the LF transform as a projective polarity. A notable property of this framework is its affine invariance. We leverage this affine invariance to introduce a Hessian-based preconditioning strategy. Specifically, we apply an affine deformation around a minimizer so that the second-order Taylor approximation of the function coincides with the canonical paraboloid, whose conjugation map is the identity. A residual network initialized near the identity can then learn this simplified mapping, while the original conjugation map is recovered through the inverse deformation. The proposed preconditioning incurs only a modest computational overhead, consisting of a single eigendecomposition during initialization and two matrix-vector multiplications per query. Experiments on a diverse set of convex functions, including high-dimensional benchmarks, demonstrate improved convergence rates and enhanced numerical accuracy of the conjugation, with particularly significant gains for ill-conditioned problems. Finally, we discuss the scope of applicability of our proposed method and highlight several of its limitations.
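The preconditioning idea can be checked with a short derivation. This is a sketch under assumed notation, not the paper's own formulation: $x_0$ denotes a minimizer of $f$, $H = \nabla^2 f(x_0)$ is taken to be positive definite, and $B = H^{-1/2}$ is its symmetric inverse square root.

```latex
% Assumed notation: x_0 minimizes f, H = \nabla^2 f(x_0) \succ 0, B = H^{-1/2} (symmetric).
\begin{align*}
g(z) &= f(x_0 + B z) - f(x_0)
  && \text{(deformed function, } g(z) \approx \tfrac12 \|z\|^2 \text{ near } 0\text{)} \\
g^*(w) &= \sup_z \big( \langle w, z \rangle - g(z) \big)
  = \sup_x \big( \langle B^{-1} w, x - x_0 \rangle - f(x) \big) + f(x_0) \\
  &= f^*(B^{-1} w) - \langle B^{-1} w, x_0 \rangle + f(x_0) \\
f^*(y) &= g^*(B y) + \langle y, x_0 \rangle - f(x_0)
  && \text{(substituting } w = B y\text{)}
\end{align*}
```

When $f$ is exactly quadratic, $g^*(w) = \tfrac12 \|w\|^2$ and the last line gives $f^*(y) = \tfrac12 y^\top H^{-1} y + \langle y, x_0 \rangle - f(x_0)$, so all of the conditioning is absorbed by $B$ and nothing is left for a network to learn.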

2606.09078 2026-06-09 cs.LG 新提交

The Hidden Bias of Process Reward Models: PRISM for Rewarding the Right Reasoning

过程奖励模型的隐藏偏见：PRISM用于奖励正确推理

Aakriti Agrawal, Souradip Chakraborty, Armin Saghafian, Nihal Sharma, Rizal Fathony, Nam H Nguyen, C. Bayan Bruss, Amrit Singh Bedi, Furong Huang

发表机构 * University of Maryland（马里兰大学）； Amazon（亚马逊）； University of Central Florida（中佛罗里达大学）

AI总结针对过程奖励模型因训练数据不平衡导致的虚假高评分偏见，提出PRISM框架，通过对比步骤级比较和前瞻策略生成的难负样本，结合难度感知课程学习优化，显著降低假阳性率并提升推理准确性。

AI中文摘要

过程奖励模型（PRM）通过提供步骤级反馈改善了推理的信用分配。然而，我们发现PRM中存在由步骤级训练数据严重不平衡引起的隐藏偏见。标准交叉熵训练放大了这种偏见，导致PRM过度奖励看似合理但错误的步骤，并产生高假阳性率。我们表明这些假阳性具有不对称的下游效应：假阴性主要减缓探索，而假阳性则主动将Best-of-N选择、引导解码和策略优化引导向有缺陷的推理。这表明PRM训练应从逐点标签拟合转向可靠的相对比较。为解决此问题，我们提出PRISM（改进步骤建模的精确排序），一种策略感知的PRM训练框架，从对比步骤级比较和由时间前瞻策略生成的难负样本中学习，无需新的人工标签。我们进一步使用难度感知课程来优化对比步骤间隔。在PRMBench和ProcessBench上，PRISM显著减少了假阳性（PRMBench上降低22%），并在强判别性PRM上提高了宏F1。当应用于策略优化和搜索任务（包括引导解码和Best-of-N选择）时，它持续提高了准确率（引导解码最高22%，Best-of-N最高33%）和鲁棒性。更广泛地说，可信的过程监督不仅仅是分配高奖励，而是为了正确的理由奖励正确的推理。

英文摘要

Process Reward Models (PRMs) improve credit assignment for reasoning by providing step-level feedback. However, we identify a hidden bias in PRMs caused by severe imbalance in step-level training data. Standard cross-entropy training amplifies this bias, causing PRMs to overcredit plausible but incorrect steps and produce high false-positive rates. We show that these false positives have an asymmetric downstream effect: false negatives mainly slow exploration, whereas false positives actively steer Best-of-N selection, guided decoding, and policy optimization toward flawed reasoning. This suggests that PRM training should shift from pointwise label fitting to reliable relative comparisons. To address this, we propose PRISM (Precision Ranking for Improved Step Modeling), a policy-aware PRM training framework that learns from contrastive step-level comparisons and hard negatives generated by a temporal lookahead strategy, requiring no new human labels. We further use a difficulty-aware curriculum to optimize the contrastive step margin. Across PRMBench and ProcessBench, PRISM substantially reduces false positives (22% on PRMBench) and improves macro F1 over strong discriminative PRMs. When applied to policy optimization and search tasks, including guided decoding and Best-of-N selection, it consistently improves accuracy (up to 22% for guided decoding and 33% for Best-of-N) and robustness. More broadly, trustworthy process supervision is not just about assigning high rewards, but about rewarding the right reasoning for the right reasons.

2606.09091 2026-06-09 cs.LG cs.CV 新提交

Stabilizing On-Policy Distillation for MLLM Reasoning with Global Normalization

稳定基于策略的蒸馏用于多模态大语言模型推理的全局归一化

Dongze Hao, Zhiwei Jin, Chen Chen, Haonan Lu

发表机构 * OPPO AI Center（OPPO AI中心）

AI总结针对策略蒸馏中异常状态导致梯度不稳定的问题，提出全局归一化蒸馏策略优化（GNDPO），通过将KL分数转化为批次级相对优势来稳定优化，提升多模态推理任务的训练鲁棒性和性能。

AI中文摘要

基于策略的蒸馏（OPD）最近成为一种重要的后训练范式。通过使用更强的教师模型为采样轨迹提供密集、细粒度的监督，OPD相比依赖稀疏二元或基于结果的环境反馈的可验证奖励强化学习（RLVR）具有明显优势。然而，朴素的token级蒸馏可能因异常状态中的幅度不匹配而遭受梯度不稳定性。为了解决这个问题，我们提出了全局归一化蒸馏策略优化（GNDPO），这是一种实用方法，通过将原始KL分数转化为批次级相对优势来稳定优化。这种归一化有效缓解了梯度爆炸，同时保留了token级指导的优势。实验结果表明，GNDPO在多模态推理任务中显著提高了训练鲁棒性和下游性能。代码已发布在 https://github.com/OPPO-Mente-Lab/GNDPO。

英文摘要

On-policy distillation (OPD) has recently emerged as an important post-training paradigm. By using a stronger teacher model to provide dense, fine-grained supervision for sampled trajectories, OPD offers a clear advantage over reinforcement learning with verifiable rewards (RLVR), which typically depends on sparse binary or outcome-based environmental feedback. However, naive token-level distillation can suffer from gradient instability, due to magnitude misalignment in outlier states. To address this issue, we propose Globally Normalized Distillation Policy Optimization (GNDPO), a practical method that stabilizes optimization by transforming raw KL scores into batch-level relative advantages. This normalization effectively mitigates gradient explosions while retaining the benefits of token-level guidance. Experimental results show that GNDPO substantially improves training robustness and downstream performance across multimodal reasoning tasks. The code is released at https://github.com/OPPO-Mente-Lab/GNDPO.
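The abstract does not give the normalization formula. As one plausible reading of "batch-level relative advantages", the sketch below z-scores raw per-token scores across the whole batch; the function name, the `eps` constant, and the z-score form itself are assumptions, and the released GNDPO code may differ.

```python
import statistics


def batch_relative_advantages(kl_scores, eps=1e-8):
    """Z-score raw per-token scores across all sequences in a batch (assumed form)."""
    flat = [s for seq in kl_scores for s in seq]
    mean = statistics.fmean(flat)
    std = statistics.pstdev(flat)
    # Subtracting the batch mean and dividing by the batch spread bounds the
    # influence of a single outlier token on the update magnitude.
    return [[(s - mean) / (std + eps) for s in seq] for seq in kl_scores]
```

With this form, one outlier score of 50.0 among three small scores maps to a normalized value below 2, so its contribution to the gradient scale stays comparable to that of ordinary tokens.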

2606.09112 2026-06-09 cs.LG cs.AI 新提交

Hybridizing Equilibrium Propagation with Ising Machines for Efficient Energy-Based Learning

将平衡传播与伊辛机混合以实现高效的基于能量的学习

Chen-Rui Fan, Bo Lu, Xing-Yu Wu, Tie-Jun Wang, Chuan Wang

发表机构 * School of Artificial Intelligence, Beijing Normal University（北京师范大学人工智能学院）； Laboratory for Advanced Computing and Intelligence Engineering, Information Engineering University（信息工程大学先进计算与智能工程实验室）； School of Physical Science and Technology, Beijing University of Posts and Telecommunications（北京邮电大学物理科学与技术学院）

AI总结提出一种受伊辛动力学启发的平衡传播框架，通过扩展相空间动力学替代耗散Hopfield松弛，加速收敛、提高噪声鲁棒性，并在MNIST等数据集上实现与反向传播相当的性能。

AI中文摘要

人工智能的快速发展推动了深度神经网络的重大进步。然而，传统的基于GPU的训练仍然高度耗能，这促使人们探索物理动力学和兼容的基于能量的学习方案，例如平衡传播（EP）。然而，基于EP的训练常常由于相空间收缩而陷入局部最小值。本文介绍了一种受伊辛动力学启发的平衡传播框架，其中耗散的Hopfield松弛被具有共轭变量的扩展相空间动力学所取代。由此产生的训练范式保留了EP的局部两阶段学习规则，同时改变了神经状态达到平衡的物理路径。我们表明，这种动力学降低了有效能量壁垒，加速了收敛，提高了噪声鲁棒性，并在MNIST、FashionMNIST和CIFAR-10上训练了深度卷积Hopfield网络，性能与反向传播相当。

英文摘要

The rapid evolution of artificial intelligence has led to substantial advances in deep neural networks. Nonetheless, conventional GPU-based training remains highly energy-demanding, motivating the exploration of physical dynamics and compatible energy-based learning schemes, such as equilibrium propagation (EP). EP-based training, however, frequently suffers from convergence to local minima due to phase-space contraction. Here we introduce an Ising-dynamics-inspired equilibrium-propagation framework in which dissipative Hopfield relaxation is replaced by an extended phase-space dynamics with conjugate variables. The resulting training paradigm keeps the local two-phase learning rule of EP while changing the physical route by which neural states reach equilibrium. We show that this dynamics lowers effective energy barriers, accelerates convergence, improves noise robustness, and trains deep convolutional Hopfield networks on MNIST, FashionMNIST, and CIFAR-10 with performance comparable to backpropagation.

2606.09117 2026-06-09 cs.LG cs.AI 新提交

Optimizing Energy-based Neural Network Training with Coherent Ising Machine

利用相干伊辛机优化基于能量的神经网络训练

Chen-Rui Fan, Bo Lu, Zhi-Hong Zhang, Run-Qing Zhang, Jing-Wei Wen, Chuan Wang

发表机构 * School of Artificial Intelligence, Beijing Normal University（北京师范大学人工智能学院）； Laboratory for Advanced Computing and Intelligence Engineering, Information Engineering University（信息工程大学先进计算与智能工程实验室）； China Mobile (Suzhou) Software Technology Company Limited（中移（苏州）软件技术有限公司）； School of Science, Beijing University of Posts and Telecommunications（北京邮电大学理学院）

AI总结本文利用相干伊辛机结合平衡传播训练基于能量的神经网络，并通过Adam优化器加速收敛，展示了在深层架构和卷积操作上的可扩展性，为下一代AI硬件提供了物理框架。

AI中文摘要

尽管伊辛机作为伊辛模型的高级物理求解器，在组合优化和神经网络训练中具有应用潜力，但其在大规模神经网络中的可扩展性仍受限于硬件连接限制和次优的训练方法。在这项工作中，我们利用相干伊辛机（CIM）通过平衡传播训练基于能量的神经网络，实现了与现有软件实现相当的性能。我们进一步通过集成Adam优化器来求解Hopfield能量网络的基态，从而显著提高了收敛速度和求解精度。此外，我们展示了该方法在更深层网络架构和卷积操作上的可扩展性。我们的结果突显了CIM动力学作为训练复杂神经网络的可扩展平台的潜力，为通过模拟电路、光电子或集成光子学实现节能实现提供了途径。这项工作为下一代AI硬件开发建立了一个新颖的物理框架。

英文摘要

While Ising machines serve as advanced physical solvers for the Ising model, enabling applications in combinatorial optimization and neural network training, their scalability for large-scale neural networks remains constrained by hardware connectivity limitations and suboptimal training methodologies. In this work, we leverage a Coherent Ising Machine (CIM) to train an energy-based neural network using Equilibrium Propagation, achieving performance comparable to existing software-based implementations. We further enhance the algorithm by integrating the Adam optimizer to solve for the ground state of a Hopfield energy network, significantly improving convergence speed and solution accuracy. Additionally, we demonstrate the scalability of our approach across deeper network architectures and convolutional operations. Our results highlight the potential of CIM dynamics as a scalable platform for training complex neural networks, offering a pathway toward energy-efficient implementations via analog circuits, optoelectronics, or integrated photonics. This work establishes a novel physical framework for next-generation AI hardware development.

2606.09278 2026-06-09 cs.LG cs.AI 新提交

Muon 比 Adam 学习更鲁棒和可迁移的特征

Tianyu Ruan, Fengzhuo Zhang, Shuche Wang, Shihua Zhang

发表机构 * Yale University（耶鲁大学）； National University of Singapore（新加坡国立大学）； University of Chinese Academy of Sciences（中国科学院大学）； Academy of Mathematics and Systems Science, CAS（中国科学院数学与系统科学研究院）

AI总结本文通过鲁棒性和可迁移性视角，证明 Muon 优化器相比 Adam 和 SGD 能学习到更鲁棒、更可迁移的特征，并通过理论分析支持了经验发现。

AI中文摘要

Muon 最近已成为预训练大型语言模型（LLMs）和视觉分类器的最先进优化器。尽管其在效率上优于 Adam 和 SGD，但 Muon 在特征学习方面的优势仍不清楚。本文通过鲁棒性和可迁移性的视角研究了 Muon 的特征学习优势。首先，通过在损坏图像和文本上评估预训练模型，我们表明 Muon 学习到的特征在不同架构（包括 Transformer 和卷积神经网络（CNN））中始终比 Adam 和 SGD 学习到的特征更鲁棒。使用训练好的逐层探针，我们进一步表明这种鲁棒性优势体现在各层更大的 logit 间隔上。其次，通过在下游任务上训练线性分类器或从预训练参数微调完整模型，我们证明 Muon 学习到的特征比 Adam 和 SGD 学习到的特征更有效地迁移。这种可迁移性优势还通过有效秩衡量的各层隐藏状态的多样性得到进一步支持。最后，在一个具有多组件特征的代表性分类问题中，我们证明 Muon 比 Adam 和 SGD 获得更大的间隔和更高的有效秩，为我们的经验发现提供了理论支持。

英文摘要

Muon has recently emerged as a state-of-the-art optimizer for pretraining Large Language Models (LLMs) and vision classifiers. Despite its efficiency advantage over Adam and SGD, the feature-learning advantage of Muon remains unclear. This paper investigates Muon's feature-learning advantage through the lens of robustness and transferability. First, by evaluating pretrained models on corrupted images and texts, we show that features learned by Muon are consistently more robust than those learned by Adam and SGD across different architectures, including transformers and Convolutional Neural Networks (CNNs). Using trained layer-wise probes, we further show that this robustness advantage is reflected in larger logit margins across layers. Second, by training linear classifiers or fine-tuning full models from pretrained parameters on downstream tasks, we demonstrate that Muon-learned features transfer more effectively than those learned by Adam and SGD. This transferability advantage is further supported by the diversity of hidden states across layers, as measured by effective rank. Finally, in a representative classification problem with multi-component features, we prove that Muon attains larger margins and higher effective rank than Adam and SGD, providing theoretical support for our empirical findings.
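The abstract uses effective rank as a diversity measure without defining it. A common entropy-based definition, assumed here for illustration, is the exponential of the Shannon entropy of the normalized singular values; the function name and the list-of-singular-values input are choices made for this sketch.

```python
import math


def effective_rank(singular_values):
    """exp(Shannon entropy) of the normalized singular-value distribution."""
    total = sum(singular_values)
    probs = [s / total for s in singular_values if s > 0.0]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)
```

Equal singular values give an effective rank equal to their count, while a spectrum dominated by one direction gives a value close to 1, so higher values indicate hidden states spread over more directions.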

2606.09756 2026-06-09 cs.LG cond-mat.dis-nn 新提交

Perturbative Contrastive Physical Learning

扰动对比物理学习

Kyungeun Kim, Amanuel Anteneh, Israel Klich, Olivier Pfister, J. M. Schwarz

发表机构 * Department of Mathematics, University of British Columbia, Vancouver, BC Canada（不列颠哥伦比亚大学数学系）； Department of Physics, University of Virginia, 382 McCormick Rd, Charlottesville, VA 22903, USA（弗吉尼亚大学物理系）； Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany（马克斯·普朗克复杂系统物理研究所）； Charles L. Brown Department of Electrical and Computer Engineering, University of Virginia, 351 McCormick Road, Charlottesville, VA 22903, USA（弗吉尼亚大学电气与计算机工程系）； Department of Physics, Syracuse University, Syracuse, NY 13244, USA（雪城大学物理系）

AI总结提出扰动对比物理学习（PCPL）框架，通过对比物理系统在不同条件下的响应实现学习，无需外部处理器或反向传播，在弹簧网络和光子电路中验证了分类与模拟乘法任务。

Comments 21 pages, 10 figures

AI中文摘要

对扰动的响应是理解物理系统的关键。通过比较系统在略微不同条件下的反应来对比这些响应的能力，提供了一种学习机制。在这里，我们引入了扰动对比物理学习（PCPL），这是一个通用框架，其中学习源于对输入、边界条件、参数或解释器函数进行受控变化所产生的物理状态之间的可测量对比。PCPL统一并扩展了先前的方法：平衡传播源于基于能量的系统中自由平衡和微扰平衡之间的对比，而频率传播对应于从正弦驱动、频率解调响应中提取的对比。我们表明，对比驱动的更新可以反映局部敏感性或全局逆问题结构，但不需要集中梯度计算。相反，有效的学习几何结构从系统自身的物理响应中隐式出现，使得学习行为能够在没有外部处理器或显式反向传播的情况下产生。我们在两个平台上演示了PCPL：（i）使用测量的位移和力更新键刚度的弹簧网络，以及（ii）通过x正交测量和雅可比矩阵的有限差分估计训练的连续变量光子电路。两个平台都成功学习了分类任务。我们进一步展示了连续变量光子电路可以被训练来实现模拟乘法，这标志着向更自主的物理学习系统迈出了一步。

英文摘要

Responses to perturbations are key to understanding physical systems. The ability to contrast such responses by comparing how a system reacts under slightly different conditions provides a mechanism for learning. Here, we introduce Perturbative Contrastive Physical Learning (PCPL), a general framework in which learning emerges from measurable contrasts between physical states produced by controlled changes to inputs, boundary conditions, parameters, or interpreter functions. PCPL unifies and extends prior approaches: Equilibrium Propagation is rooted in contrasts between free and nudged equilibria in energy-based systems, while Frequency Propagation corresponds to contrasts extracted from sinusoidally driven, frequency-demodulated responses. We show that contrast-driven updates can reflect either local sensitivities or global inverse-problem structure, yet do not require centralized gradient computation. Instead, effective learning geometry emerges implicitly from the system's own physical response, allowing learning behavior to arise without an external processor or explicit backpropagation. We demonstrate PCPL in two platforms: (i) spring networks that update bond stiffness using measured displacements and forces, and (ii) continuous-variable photonic circuits trained via x quadrature measurements and finite-difference estimates of the Jacobian. Both platforms successfully learn classification tasks. We further show that a continuous-variable photonic circuit can be trained to implement analog multiplication, illustrating a step toward more autonomous physical learning systems.
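The photonic platform is described as trained through finite-difference estimates of the Jacobian. The sketch below shows a generic central-difference estimator; whether the authors use central or forward differences is not stated, and `f` here is a hypothetical stand-in for the measured response of the system to a parameter vector.

```python
def finite_difference_jacobian(f, x, h=1e-5):
    """Central-difference estimate of J[i][j] = d f_i / d x_j at the point x."""
    columns = []
    for j in range(len(x)):
        plus = list(x)
        minus = list(x)
        plus[j] += h    # perturb one parameter upward
        minus[j] -= h   # and downward by the same amount
        f_plus, f_minus = f(plus), f(minus)
        columns.append([(a - b) / (2 * h) for a, b in zip(f_plus, f_minus)])
    # columns[j][i] holds d f_i / d x_j; transpose to row-major J[i][j].
    return [list(row) for row in zip(*columns)]
```

Each parameter costs two evaluations of the system, so the estimate needs only measured responses to small perturbations and no backpropagation through the hardware.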

2606.09806 2026-06-09 cs.LG cs.AI 新提交

Topological Neural Operators

拓扑神经算子

Lennart Bastian, Samuel Leventhal, Mustafa Hajij, Tolga Birdal

发表机构 * Imperial College London（伦敦帝国学院）； University of San Francisco（旧金山大学）

AI总结提出拓扑神经算子(TNOs)，利用离散外微积分在细胞复形上实现跨维度耦合，并通过分层结构提升长程信息传播，在PDE基准上优于现有算子。

AI中文摘要

我们引入了拓扑神经算子（TNOs），这是一个在细胞复形上进行算子学习的原理性框架，将神经算子（NOs）从点和/或边上的函数提升到拓扑域。TNOs将数据表示为定义在不同维度细胞上的特征，并通过离散外微积分建模它们的相互作用，通过梯度、旋度和散度型算子实现显式的跨维度耦合。关键设计原则是将信息流向（由固定拓扑算子控制）与信息变换（学习得到）解耦，从而产生尊重物理量几何支撑并暴露守恒和相容性结构的模型。我们进一步提出了分层TNOs（HTNOs），它结合了学习到的粗粒度复形以传播长程和拓扑依赖的信息。我们的框架将现有NOs作为特例，提供了跨离散化的算子学习统一视角。在一系列PDE基准测试中，包括不规则几何流动问题，TNOs和HTNOs提高了精度；控制研究进一步隔离了原生高阶和拓扑结构带来的优势。项目页面：https://circle-group.github.io/research/TNO

英文摘要

We introduce Topological Neural Operators (TNOs), a principled framework for operator learning on cell complexes that lifts neural operators (NOs) from functions on points and/or edges to topological domains. TNOs represent data as features defined on cells of varying dimension and model their interactions through Discrete Exterior Calculus, enabling explicit cross-dimensional coupling via gradient-, curl-, and divergence-type operators. The key design principle is to decouple where information flows, as governed by fixed topological operators, from how it is transformed (which is learned), yielding models that respect the geometric support of physical quantities and expose conservation and compatibility structure. We further propose Hierarchical TNOs (HTNOs), which incorporate learned coarse complexes to propagate long-range and topology-dependent information. Our framework subsumes existing NOs as a special case, providing a unified perspective on operator learning across discretizations. Across a range of PDE benchmarks, including irregular-geometry flow problems, TNOs and HTNOs improve accuracy; controlled studies further isolate the benefits of native higher-rank and topological structure. Project page: https://circle-group.github.io/research/TNO
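The fixed topological operators of Discrete Exterior Calculus can be illustrated on the smallest possible cell complex. The sketch below is not the TNO implementation: it builds the signed incidence matrices of a single oriented triangle (the names `D0`, `D1`, and the edge ordering are assumptions) and applies them as gradient-type and curl-type operators.

```python
# One oriented triangle: vertices 0, 1, 2; edges e0 = (0 -> 1), e1 = (1 -> 2), e2 = (0 -> 2).
# D0 maps vertex values to edge values (gradient-type operator).
D0 = [
    [-1, 1, 0],   # e0: value at vertex 1 minus value at vertex 0
    [0, -1, 1],   # e1: value at vertex 2 minus value at vertex 1
    [-1, 0, 1],   # e2: value at vertex 2 minus value at vertex 0
]
# D1 maps edge values to face values (curl-type operator); the face boundary is e0 + e1 - e2.
D1 = [[1, 1, -1]]


def apply(matrix, vector):
    """Matrix-vector product on plain Python lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def discrete_gradient(vertex_values):
    return apply(D0, vertex_values)


def discrete_curl(edge_values):
    return apply(D1, edge_values)
```

Composing the two operators always yields zero, the discrete counterpart of curl(grad) = 0. This kind of compatibility structure comes from the topology alone, independent of any learned weights.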

2606.07560 2026-06-09 cs.CL cs.LG 交叉投稿

Function-Vector Heads Are Two Populations: Writers and Cancellers in In-Context Learning

函数向量头是两个群体：上下文学习中的写入者和取消者

Han-yu Wang

发表机构 * The University of Hong Kong（香港大学）

AI总结发现函数向量头并非同质群体，而是分为写入者和取消者两个子群体，分别推高和压低规则正确logit，且仅基于幅度的排名无法区分二者。

AI中文摘要

函数向量头（Todd et al., 2024）通常通过其对上下文规则任务的因果贡献幅度来识别，隐含假设顶级集合是同一功能类。这一假设不成立。我们用保留符号的标准（改进的DLA + 置换FDR）替代仅幅度排名，并通过路径修补验证每个候选。然后，FV头群体分裂为两个对立的子群体：写入者推高规则正确logit；取消者压低它。一个四条件规范判定在三个模型家族和六个Pythia规模的13/15个单元中成立，符号置换检验在5/6个主要单元中拒绝同质性。仅幅度排名无法看到这种结构：Todd的前20个在层次任务中捕获了64%的取消者但仅4%的写入者，在模块任务中捕获了59%的写入者但仅8%的取消者。我们在所有27个（取消者，单元，头）对上排除了六种人为解释：归纳重叠、汇点、通用重要性、秩1复制抑制、V级联和最近邻非FV控制。零消融取消者在6/6个主要单元中产生+0.13到+0.29 nats的logit增益，方向一致地带来+2到+7个百分点的准确率提升。

英文摘要

Function-vector (FV) heads (Todd et al., 2024) are typically identified by the magnitude of their causal contribution to in-context rule tasks, under the implicit assumption that the top set is a homogeneous functional class. This assumption fails. We replace magnitude-only ranking with a sign-preserving criterion (refined DLA + permutation FDR) and validate each candidate by path patching. The FV head population then splits into two opposing sub-populations: writers push the rule-correct logit up; cancellers push it down. A four-condition canonical verdict holds in $13/15$ cells across three model families and six Pythia scales, and a sign-shuffle rejects homogeneity in $5/6$ main cells. The structure is invisible to magnitude-only ranking: Todd's top-$20$ captures $64\%$ of cancellers but only $4\%$ of writers on the hierarchical task, and $59\%$ of writers but only $8\%$ of cancellers on the modular task. We rule out six artefact accounts on all $27$ canceller (cell, head) pairs: induction overlap, sinks, generic importance, rank-$1$ copy-suppression, V-cascade, and rank-nearest non-FV controls. Zero-ablating cancellers yields $+0.13$ to $+0.29$ nats of logit gain in $6/6$ main cells with a directionally consistent $+2$ to $+7$ pp accuracy effect.

2606.07647 2026-06-09 cs.CV cs.CL cs.LG 交叉投稿

Steer Where It Matters: Token-Level Visual-Sensitivity Steering for LVLMs Hallucination Mitigation

关键位置引导：基于令牌级视觉敏感度引导的LVLMs幻觉缓解

Ruipeng Zhang, Zhihao Li, C. L. Philip Chen, Tong Zhang

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结提出令牌级视觉敏感度引导（TLVS）方法，通过提取令牌级引导向量并自适应调整引导强度，仅在关键解码步骤抑制幻觉，在多个基准上优于现有方法。

AI中文摘要

大型视觉语言模型（LVLMs）取得了快速进展并部署在各种应用中，但幻觉仍然是一个主要挑战。激活引导因其训练开销小和推理时可控制而具有吸引力。然而，我们发现，在自回归解码过程中，视觉条件对令牌预测的影响是稀疏且局部的，许多现有方法对整个序列的图像与非图像差异进行平均，稀释了这些关键信号，导致引导方向信噪比低。此外，许多现有方法应用固定的引导强度，错误分配干预预算，过度扰动非关键令牌，并可能导致不稳定。为了解决这些限制，我们提出了令牌级视觉敏感度引导（TLVS）用于幻觉缓解。我们的方法首先提取令牌级引导向量并进行细化，然后仅在关键位置应用细粒度的、视觉敏感度自适应的引导。这种轻量级、即插即用的机制只需要最少的校准训练，可以应用于各种视觉语言模型。它在每个解码步骤调节引导强度，选择性地抑制易产生幻觉的片段，同时保留基于证据的内容。我们在多个基准上评估TLVS，包括POPE、AMBER、CHAIR（COCO）、MMHal和HallusionBench，证明其相对于先前引导方法的一致改进。

英文摘要

Large vision language models (LVLMs) have made rapid advancements and are deployed across various applications, yet hallucinations remain a major challenge. Activation steering is appealing due to its minimal training overhead and controllability at inference time. However, we found that during autoregressive decoding, visual conditioning affects token prediction sparsely and locally across decoding steps, and many existing methods that average image-versus-no-image differences over the entire sequence dilute these critical signals, yielding low signal-to-noise ratio steering directions. Additionally, many existing methods apply a fixed steering strength, which misallocates the intervention budget, over-perturbs non-critical tokens, and can cause instability. To address these limitations, we propose Token-Level Visual-Sensitivity Steering (TLVS) for hallucination mitigation. Our approach first extracts token-level steering vectors and refines them, and then applies fine-grained, visual-sensitivity-adaptive steering only where it matters. This lightweight, plug-and-play mechanism requires only minimal training for calibration and can be applied across diverse vision-language models. It modulates the steering strength at each decoding step, selectively suppressing hallucination-prone spans while preserving evidence-grounded content. We evaluate TLVS on several benchmarks, including POPE, AMBER, CHAIR (COCO), MMHal, and HallusionBench, demonstrating consistent improvements over previous steering methods.

2606.07657 2026-06-09 cs.NE cs.LG 交叉投稿

QDS-SNN: Energy-efficient Quantum Deeply-Supervised Spiking Neural Network Algorithm for Traffic Sign Recognition

QDS-SNN：用于交通标志识别的节能量子深度监督脉冲神经网络算法

Zhiguo Qu, Keqi Li, Le Sun, Wenjie Liu, Yimin Yu, Saif Al-Kuwari, Ahmed Farouk

发表机构 * School of Computer Science, School of Software, Nanjing University of Information Science and Technology（南京信息工程大学计算机学院、软件学院）

AI总结提出量子深度监督脉冲神经网络（QDS-SNN），结合量子神经网络与时空自适应LIF神经元，在GTSRB数据集上以6个时间步达到99.72%准确率，能耗降低55.77%。

Comments 13 pages, 10 Figures, 8 Tables

AI中文摘要

交通标志识别对于智能交通和自动驾驶至关重要，因为它可以提高驾驶效率并确保道路安全。然而，传统的识别方法基于大规模数据集和密集计算，限制了其实时应用性。脉冲神经网络（SNN）由于其时空处理能力，提供了一种受生物启发的节能替代方案，但在训练过程中存在信息丢失和梯度消失的问题。为了克服这些限制，本研究提出了一种量子深度监督脉冲神经网络（QDS-SNN），它集成了量子神经网络（QNN）以实现高效、低功耗的深度监督。利用量子叠加和纠缠，QNN能够实现表达性表示和并行计算，从而在不影响能效的情况下提升性能。所提出的QDS-SNN包含一个时空自适应LIF（TSA-LIF）神经元和一个量子辅助分类器模块（QACM），以缓解梯度问题并提高训练效果。本研究在PennyLane量子模拟平台上进行实验，结果表明，QDS-SNN在仅6个时间步内，在GTSRB数据集上达到了99.72%的准确率——比MS-ResNet基线高出1.32%，同时能耗降低了55.77%。在TSRD数据集中，它达到了97.90%的准确率，同时能耗降至基线的52.68%。这些结果表明，QDS-SNN为智能交通系统中的交通标志识别提供了一种高性能、节能的解决方案。

英文摘要

Traffic sign recognition is crucial for intelligent transportation and autonomous driving, as it can improve driving efficiency and ensure road safety. However, traditional recognition methods rely on large datasets and intensive computation, which limits their real-time applicability. Spiking Neural Networks (SNNs) offer a biologically inspired, energy-efficient alternative due to their spatiotemporal processing capabilities, but suffer from information loss and vanishing gradients during training. To overcome these limitations, this study proposes a Quantum Deeply-Supervised Spiking Neural Network (QDS-SNN) that integrates Quantum Neural Networks (QNNs) for efficient, low-power deep supervision. Using quantum superposition and entanglement, QNNs enable expressive representations and parallel computation, thereby enhancing performance without compromising energy efficiency. The proposed QDS-SNN incorporates a temporally and spatially adaptive LIF (TSA-LIF) neuron and a quantum-assisted classifier module (QACM) to mitigate gradient issues and improve training effectiveness. Experiments on the PennyLane quantum simulation platform show that QDS-SNN achieves 99.72% accuracy on the GTSRB dataset in only 6 time steps -- outperforming the MS-ResNet baseline by 1.32% while reducing energy consumption by 55.77%. On the TSRD dataset, it achieves 97.90% accuracy while reducing energy use to 52.68% of the baseline. These results demonstrate that QDS-SNN offers a high-performance, energy-efficient solution for traffic sign recognition in intelligent transportation systems.

2606.07675 2026-06-09 eess.IV cs.CV cs.LG 交叉投稿

The Need for Neural ISP in the Small-Pixel Era: How Shrinking Pixels Push Optics to the Limit and Neural Restoration Pushes Back

小像素时代对神经ISP的需求：像素缩小将光学推向极限，神经恢复则逆势而上

Jingxi Li, Neerja Aggarwal, Laurent Gudemann, Shivansh Rao, Vishal Vinod, Tom E. Bishop, Ziv Attar

发表机构 * Glass Imaging Inc（玻璃成像公司）

AI总结针对智能手机小像素长焦模块中光学像差限制分辨率的问题，提出基于学习的神经ISP恢复图像，在0.35微米像素下实现2.5-3倍分辨率提升，表明神经ISP可替代复杂光学设计。

详情

AI中文摘要

智能手机长焦摄像头正接近“长焦物理墙”：随着像素间距缩小至亚0.5微米，光学系统仍受几何像差限制，导致分辨率收益递减。传统图像信号处理器（ISP）无法消除这些像差，因为它们通过局部、分阶段处理运行，没有明确的点扩散函数（PSF）模型。我们展示了基于学习的神经ISP用于图像恢复，通过训练底层退化，逆转了分阶段流水线无法处理的问题，将小像素设计转化为净优势。我们通过一个代表性长焦模块的受控模拟进行研究，评估了五种配置（0.35--0.75微米像素间距）。光圈按比例缩放以保持每像素信噪比和衍射光斑尺寸固定，从而隔离几何像差和空间采样。传统ISP随像素减小仅适度改进，而神经ISP显著扩展：在0.35微米时，其MTF50（垂直）达到745 cycles/mm，比传统ISP分辨率提升2.5-3倍，LPIPS从0.244显著改善至0.151，而传统结果保持相对平坦。在低信噪比扩展中（0.35微米下每帧15 dB突发），多帧神经ISP恢复的性能接近亮光单帧基线，而多帧传统ISP没有显示出有意义的改进——表明小像素下的传统流水线受限于未校正的PSF模糊而非噪声。这些结果指向一种设计理念：神经ISP通过校正残余光学像差而非要求日益复杂的光学系统，实现高分辨率长焦模块。

英文摘要

Smartphone telephoto cameras are approaching a "telephoto physics wall": as pixel pitches shrink toward sub-0.5 micron, the optics remain limited by geometric aberrations, leading to diminishing returns on resolution. Traditional Image Signal Processors (ISPs) cannot eliminate these aberrations, because they operate through local, stage-wise processing with no explicit model of the underlying point spread function (PSF). We demonstrate how a learning-based Neural ISP for image restoration, trained on the underlying degradations, inverts what stage-wise pipelines cannot, turning small-pixel designs into a net advantage. We investigate this through a controlled simulation of a representative telephoto module, evaluating five configurations (0.35--0.75 micron pixel pitch). The aperture is scaled proportionally to keep per-pixel SNR and diffraction spot size fixed, thereby isolating geometric aberration and spatial sampling. While the traditional ISP improves only modestly with smaller pixels, the Neural ISP scales substantially: at 0.35 micron, it reaches 745 cycles/mm MTF50 (vertical), a 2.5--3x resolution improvement over the traditional ISP, and LPIPS improves significantly from 0.244 to 0.151 while traditional results stay comparatively flat. In a low-SNR extension (15 dB per-frame bursts at 0.35 micron), a multi-frame Neural ISP recovers performance close to the bright-light single-frame baseline, whereas a multi-frame traditional ISP shows no meaningful improvement -- indicating that traditional pipelines at small pixels are bottlenecked by uncorrected PSF blur rather than by noise. These results point to a design philosophy in which Neural ISPs enable high-resolution telephoto modules by correcting residual optical aberrations rather than requiring increasingly complex optics.
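To put the reported 745 cycles/mm in context, the sketch below computes the sampling (Nyquist) limit implied by a pixel pitch. It assumes a plain monochrome sampling grid and ignores the color filter array; the function names are illustrative and not taken from the paper.

```python
def nyquist_cycles_per_mm(pixel_pitch_um: float) -> float:
    """Sampling-limited spatial frequency: one cycle needs two pixels."""
    pitch_mm = pixel_pitch_um / 1000.0
    return 1.0 / (2.0 * pitch_mm)


def fraction_of_nyquist(mtf50_cycles_per_mm: float, pixel_pitch_um: float) -> float:
    """How far a measured MTF50 frequency sits below the sampling limit."""
    return mtf50_cycles_per_mm / nyquist_cycles_per_mm(pixel_pitch_um)
```

Under these assumptions a 0.35 micron pitch has a Nyquist limit of about 1429 cycles/mm, so 745 cycles/mm is roughly 52% of it. A 0.75 micron pitch has a limit of about 667 cycles/mm, which means the reported MTF50 could not be sampled at all at the largest pitch studied.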

2606.07720 2026-06-09 cs.AI cs.CL cs.LG 交叉投稿

Why Limit the Residual Stream to Layers and Not Tokens? Persistent Memory for Continuous Latent Reasoning

为什么将残差流限制在层而不是令牌？用于连续潜在推理的持久记忆

Mujtaba Farhan, Maheep Chaudhary

发表机构 * University of Cambridge（剑桥大学）

AI总结针对CoCoNuT在潜在空间推理中因中间隐藏状态被覆盖导致概念瓶颈的问题，提出AGCLR模型，通过门控概念流持久记忆机制，在GSM8K、HotpotQA和ProsQA上取得一致提升。

详情

AI中文摘要

大型语言模型（LLMs）在数学和多跳规划任务上展现了卓越的推理能力。CoCoNuT（连续思维链）范式通过使模型能够在潜在空间中进行推理，同时探索多个推理路径，而不是早期就承诺单一链条，从而扩展了这一能力。然而，我们识别出一个我们称之为“概念瓶颈”的限制。在每次推理过程中，中间隐藏状态被覆盖，导致模型随着推理深度增加而丢失早期步骤中计算出的关键事实。我们在经验上观察到了这一点。在HotpotQA上，原始CoCoNuT（10.4% EM）未能超过CoT基线（11.0% EM），并且在GSM8K上随着课程深度增加性能下降。为了解决这个问题，我们提出了AGCLR（自适应门控连续潜在推理），它通过一个门控概念流增强了CoCoNuT。一个跨所有推理过程保持的持久残差记忆，由三个学习到的门控制：一个将中间事实提交到记忆的写入门，一个检索相关先前状态的读取门，以及一个修剪不相关上下文的遗忘门。在使用GPT-2作为基础模型在GSM8K、HotpotQA和ProsQA上进行评估时，AGCLR在所有类型的数据集上实现了一致的改进。随着课程深度的增加，性能差距进一步扩大，直接解决了概念瓶颈。代码可在https://anonymous.4open.science/r/JJJJ/README.md获取。

英文摘要

Large language models (LLMs) have demonstrated remarkable reasoning abilities on mathematical and multi-hop planning tasks. The CoCoNuT (Chain of Continuous Thought) paradigm (Hao et al., 2024) extends this by enabling models to reason in latent space, exploring multiple reasoning paths simultaneously rather than committing to a single chain early on. However, we identify a limitation we term the concept bottleneck: at each reasoning pass, intermediate hidden states are overwritten, causing the model to lose critical facts computed in earlier steps as reasoning depth increases. We observe this empirically: on HotpotQA, vanilla CoCoNuT (10.4% EM) fails to improve over the CoT baseline (11.0% EM), and performance degrades with curriculum depth on GSM8K. To address this, we propose AGCLR (Adaptive Gated Continuous Latent Reasoning), which augments CoCoNuT with a Gated Concept Stream, a persistent residual memory maintained across all reasoning passes and controlled by three learned gates: a write gate that commits intermediate facts to memory, a read gate that retrieves relevant prior states, and a forget gate that prunes irrelevant context. Evaluated on GSM8K, HotpotQA, and ProsQA using GPT-2 as our base model, AGCLR achieves consistent improvements across all dataset types, with the performance gap compounding as curriculum depth increases, directly resolving the concept bottleneck. Code is available at https://anonymous.4open.science/r/JJJJ/README.md

2606.08132 2026-06-09 cs.CV cs.LG 交叉投稿

Phase Marginalization for Patch-Grid Instability in Vision Transformers

视觉Transformer中补丁网格不稳定性的相位边缘化

Oğuzhan Ercan

发表机构 * Scientific and Technological Research Council of Türkiye（土耳其科学技术研究委员会）

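AI summary: Proposes Phase Marginalization, which evaluates structured patch-grid phases, inverse-aligns the dense outputs, and aggregates them in the original image coordinate system. The method reduces the prediction instability that patch-grid phase causes in Vision Transformers and improves segmentation, depth, and matching performance without any training.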

Comments 13 pages, 1 figure, 9 tables

AI中文摘要

视觉Transformer在固定的补丁网格上操作，这可能导致密集预测中相位依赖的不稳定性：改变补丁划分会改变像素可用的令牌证据，尤其是在边界附近。我们将补丁网格相位形式化为一个干扰变量，并提出相位边缘化，一种事后边缘化方法，该方法评估结构化的补丁网格相位，逆对齐密集输出，并在原始图像坐标系中聚合它们。中心变体，K=4的均匀相位边缘化，无需训练，并在测量的分割、深度和局部匹配设置上优于规范的K=1基线。在受控的Cityscapes实验中，均匀相位边缘化相比基于通用移位的四次前向测试时增强（TTA）提供了适度的计算匹配优势（在最强测试的通用行上平均交并比提高0.31）。一项扩展研究进一步表明，K=4是一个实用的成本-精度权衡：K=8基本不变，K=16在更高延迟下增加很少精度。这些结果将补丁网格相位定位为可测量的干扰变量，并将相位边缘化定位为密集ViT预测的简单诊断和事后边缘化基线。

英文摘要

Vision Transformers operate on fixed patch grids, which can introduce phase-dependent instability for dense prediction: changing the patch partition can change the token evidence available to a pixel, especially near boundaries. We formalize patch-grid phase as a nuisance variable and propose Phase Marginalization, a post-hoc marginalization method that evaluates structured patch-grid phases, inverse-aligns dense outputs, and aggregates them in the original image coordinate system. The central variant, Uniform Phase Marginalization with K = 4, is training-free and improves over the canonical K = 1 baseline across measured segmentation, depth, and local matching settings. In a controlled Cityscapes experiment, Uniform Phase Marginalization provides a modest compute-matched advantage over generic shift-based four-forward test-time augmentation (TTA) (+0.31 mean Intersection-over-Union over the strongest tested generic row). A scaling study further shows that K = 4 is a practical cost-accuracy trade-off: K = 8 is essentially unchanged and K = 16 adds little accuracy at much higher latency. These results position patch-grid phase as a measurable nuisance variable and Phase Marginalization as a simple diagnostic and post-hoc marginalization baseline for dense ViT prediction.
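
The shift, predict, inverse-align, and aggregate loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the circular shift, the toy patch-mean predictor, and the four half-patch phases in the usage note are all assumptions chosen to keep the example self-contained.

```python
def roll2d(img, dy, dx):
    """Circularly shift a 2D list by (dy, dx). Circular padding is an assumption."""
    h, w = len(img), len(img[0])
    return [[img[(y - dy) % h][(x - dx) % w] for x in range(w)] for y in range(h)]


def patch_mean_model(img, patch=4):
    """Toy dense predictor (hypothetical): each pixel gets the mean of its patch."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for py in range(0, h, patch):
        for px in range(0, w, patch):
            ys = range(py, min(py + patch, h))
            xs = range(px, min(px + patch, w))
            mean = sum(img[y][x] for y in ys for x in xs) / (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


def phase_marginalize(img, model, phases):
    """Average dense predictions over patch-grid phases in original coordinates."""
    h, w = len(img), len(img[0])
    acc = [[0.0] * w for _ in range(h)]
    for dy, dx in phases:
        pred = model(roll2d(img, dy, dx))   # evaluate one grid phase
        aligned = roll2d(pred, -dy, -dx)    # inverse-align to the original frame
        for y in range(h):
            for x in range(w):
                acc[y][x] += aligned[y][x] / len(phases)
    return acc
```

With a patch size of 4, a call such as `phase_marginalize(img, patch_mean_model, [(0, 0), (0, 2), (2, 0), (2, 2)])` uses K = 4 phases; passing only `[(0, 0)]` reproduces the canonical K = 1 baseline.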

2606.08203 2026-06-09 math.NA cs.LG cs.NA stat.ML 交叉投稿

Stable and Scalable Probabilistic Numerical Solvers for Stiff and High-Dimensional ODEs

适用于刚性和高维ODE的稳定且可扩展的概率数值求解器

Nathanael Bosch

发表机构 * EPFL（瑞士联邦理工学院）

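AI summary: For stiff and high-dimensional ODEs, proposes two complementary strategies, a matrix-free update step that enables linear scaling and iterative re-linearization that improves stability, yielding probabilistic solvers that are both stable and scalable.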

AI中文摘要

基于滤波的常微分方程概率数值求解器已被确立为一种灵活高效的仿真框架，具有内置的数值不确定性量化。然而，刚性和高维问题仍然是一个挑战，因为当前方法要么稳定但计算复杂度为ODE维度的三次方，要么线性扩展但牺牲稳定性。在本文中，我们弥合了这一差距，开发了既稳定又可扩展的概率ODE求解器。我们提出了两种互补策略。首先，我们开发了一种无矩阵更新步骤，利用雅可比向量积、迭代线性求解器和随机协方差估计来实现线性扩展，同时保持稳定性。其次，我们提出迭代重线性化以在不牺牲可扩展性的情况下进一步提高稳定性，将概率ODE求解器转变为完全隐式方法。我们在各种刚性和高维问题上评估了所提出的方法，并展示了相对于现有概率求解器在稳定性和可扩展性上的改进。

英文摘要

Filtering-based probabilistic numerical solvers for ordinary differential equations (ODEs) have been established as a flexible and efficient simulation framework with built-in numerical uncertainty quantification. However, problems that are both stiff and high-dimensional remain a challenge, as current methods are either stable and have cubic cost in the ODE dimension, or scale linearly at the expense of stability. In this paper, we close this gap and develop probabilistic ODE solvers that are both stable and scalable. We propose two complementary strategies. First, we develop a matrix-free update step that uses Jacobian-vector products, iterative linear solvers, and stochastic covariance estimation to enable linear scaling, all while retaining stability. Second, we propose iterative re-linearization to further improve stability without sacrificing scalability, turning probabilistic ODE solvers into fully implicit methods. We evaluate the proposed approaches on a range of stiff and high-dimensional problems and demonstrate improved stability and scalability over established probabilistic solvers.

2606.08327 2026-06-09 cs.CL cs.AI cs.LG 交叉投稿

Chiaroscuro Attention: Spending Compute in the Dark

明暗对比注意力：在黑暗中投入计算

Prateek Kumar Sikdar

发表机构 * Accenture（埃森哲）

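AI summary: Proposes CHIAR-Former, a hybrid Transformer with spectral-entropy routing that complements full attention with DCT spectral mixing. On WikiText-103 it reaches a validation PPL of 36.54 with 62.5% fewer attention FLOPs, a 45% improvement over the full-attention baseline.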

Comments 8 pages, 6 figures, 3 tables

AI中文摘要

标准Transformer在每一层和每个标记上统一应用自注意力，无论输入是否需要动态的跨标记交互。我们提出CHIAR-Former（明暗对比注意力），一种4层混合Transformer，它基于每个标记的谱熵（一种理论上合理的复杂度信号）将每个标记路由到三个算子之一：DCT谱混合、RBF核混合或全自注意力。通过在WikiText-103上的系统消融，我们发现路由崩溃：路由器持续拒绝RBF而偏向DCT和注意力，表明谱混合和动态注意力是互补且充分的。一个专门设计的仅DCT+注意力变体在WikiText-103上达到验证集PPL 36.54——相比全注意力基线（PPL 66.62）提升45%，同时减少62.5%的注意力FLOPs。我们将评估扩展到WikiText-2、IMDB情感分类和合成ListOps操作，建立了一个清晰的操作区间：CHIAR-Former在大型自然文本上表现出色，其中标记多样性支持谱专门化，而全注意力在小数据集和合成模式匹配任务上仍保持优势。这些发现——无论是成功还是失败——共同定义了谱路由何时以及为何值得使用。

英文摘要

Standard transformers apply self-attention uniformly at every layer and token, regardless of whether the input requires dynamic cross-token interaction. We propose CHIAR-Former (Chiaroscuro Attention), a 4-layer hybrid transformer that routes each token to one of three operators - DCT spectral mixing, RBF kernel mixing, or full self-attention - based on per-token spectral entropy, a theoretically justified complexity signal. Through systematic ablation on WikiText-103, we discover routing collapse: the router consistently rejects RBF in favour of DCT and attention, revealing that spectral mixing and dynamic attention are complementary and sufficient. A purpose-designed DCT+Attention-only variant achieves Val PPL 36.54 on WikiText-103 - a 45% improvement over a full-attention baseline (PPL 66.62) at 62.5% fewer attention FLOPs. We extend evaluation to WikiText-2, IMDB sentiment classification, and synthetic ListOps operations, establishing a clear operating regime: CHIAR-Former excels on large-scale naturalistic text where token diversity supports spectral specialisation, while full attention retains an edge on small datasets and synthetic pattern-matching tasks. These findings - both the wins and the losses - together define when and why spectral routing earns its keep.
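
To make the routing signal concrete, the sketch below computes one plausible version of per-token spectral entropy: the Shannon entropy of a token vector's normalized DCT power spectrum. The abstract does not give the exact definition, so the unnormalized DCT-II, the entropy in nats, and the `low`/`high` thresholds in `route` are all assumptions for illustration.

```python
import math


def dct2(x):
    """Unnormalized DCT-II of a real sequence."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi / n * (t + 0.5) * k) for t in range(n))
            for k in range(n)]


def spectral_entropy(x):
    """Shannon entropy (nats) of the normalized DCT power spectrum."""
    power = [c * c for c in dct2(x)]
    total = sum(power)
    if total == 0.0:
        return 0.0
    probs = [p / total for p in power]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def route(x, low=0.5, high=1.5):
    """Hypothetical router: thresholds and operator order are illustrative only."""
    h = spectral_entropy(x)
    if h < low:
        return "dct"        # energy concentrated in few frequencies
    if h > high:
        return "attention"  # energy spread across the spectrum
    return "rbf"
```

A constant vector puts all its energy in the DC coefficient and is routed to `"dct"`, while a single spike spreads energy over all frequencies and is routed to `"attention"`.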

2606.08347 2026-06-09 cs.CL cs.LG 交叉投稿

Q-Delta：超越键值关联状态演化

Sumin Park, Seojin Kim, Noseong Park

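AI summary: Proposes Q-Delta, a query-aware delta rule that integrates mixed key-query prediction errors into state evolution, enabling jointly corrective dynamics and outperforming strong baselines on language modeling and long-context retrieval tasks.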

Comments Accepted at ICML 2026

AI中文摘要

线性注意力将序列建模重新表述为循环状态演化，实现高效的线性时间推理。在键值关联范式下，现有方法将查询的作用限制在读出操作，使其与状态演化解耦。我们表明，查询条件状态读出在累积记忆上诱导出结构化的值预测，补充了基于键的检索。基于这一洞察，我们提出Q-Delta，一种查询感知的delta规则，将混合键-查询预测误差融入状态演化，在保持delta规则效率的同时实现联合校正动态。我们为所得动态建立了稳定性保证，并推导出硬件高效的块状并行公式，以及自定义Triton实现。实验结果表明，在语言建模和长上下文检索任务上，优化稳定、吞吐量具有竞争力，且一致优于强基线。

英文摘要

Linear attention reformulates sequence modeling as recurrent state evolution, enabling efficient linear-time inference. Under the key-value associative paradigm, existing approaches restrict the role of the query to the readout operation, decoupling it from state evolution. We show that query-conditioned state readout induces a structured value prediction over accumulated memory that complements key-based retrieval. Based on this insight, we propose Q-Delta, a query-aware delta rule that integrates mixed key-query prediction errors into state evolution, enabling jointly corrective dynamics while preserving delta-rule efficiency. We establish stability guarantees for the resulting dynamics and derive a hardware-efficient chunkwise-parallel formulation with a custom Triton implementation. Empirical results demonstrate stable optimization, competitive throughput, and consistent improvements over strong baselines on language modeling and long-context retrieval tasks.
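
For background, the sketch below shows the standard key-value delta rule that Q-Delta builds on, in which the query appears only in the readout. It is not Q-Delta itself: the mixed key-query error term is not specified in the abstract and is not reproduced here. The function names and the scalar step size `beta` are illustrative assumptions.

```python
def matvec(S, x):
    """Multiply matrix S (list of rows) by vector x."""
    return [sum(S[i][j] * x[j] for j in range(len(x))) for i in range(len(S))]


def delta_rule_step(S, k, v, beta):
    """S <- S + beta * (v - S k) k^T : correct the value currently stored under key k."""
    pred = matvec(S, k)
    err = [v[i] - pred[i] for i in range(len(v))]
    return [[S[i][j] + beta * err[i] * k[j] for j in range(len(k))]
            for i in range(len(v))]


def run(keys, values, queries, beta=1.0):
    """Recurrent state evolution; the query is used only to read out o_t = S_t q_t."""
    S = [[0.0] * len(keys[0]) for _ in range(len(values[0]))]
    outputs = []
    for k, v, q in zip(keys, values, queries):
        S = delta_rule_step(S, k, v, beta)
        outputs.append(matvec(S, q))
    return S, outputs
```

With unit-norm keys and `beta=1.0`, writing a new value under an existing key overwrites the old association exactly, which is the corrective behaviour that distinguishes the delta rule from purely additive linear attention.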

2606.08814 2026-06-09 cs.AI cs.LG 交叉投稿

STAR: Rethinking MoE Routing as Structure-Aware Subspace Learning

STAR: 将MoE路由重新思考为结构感知的子空间学习

Sumin Park, Noseong Park

发表机构 * Korea Advanced Institute of Science and Technology (KAIST)（韩国科学技术院）

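AI summary: Proposes STAR, which makes the router aware of input structure by learning a principal subspace with the Generalized Hebbian Algorithm, enabling stable expert specialization and improving routing quality and downstream performance on synthetic data and on language and vision tasks.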

Comments Accepted at ICML 2026

AI中文摘要

混合专家（MoE）通过选择性地将输入路由到专门的专家子集来高效扩展模型容量。然而，输入-专家专业化（MoE的核心动机）关键取决于路由器是否真正感知输入结构。实践中，MoE路由通常实现为浅层线性投影，对输入表示的感知有限，常导致路由不稳定。我们提出STAR（结构感知路由），将MoE路由重新思考为子空间学习问题，通过广义Hebbian算法（GHA）跟踪主导输入结构的演化主子空间来增强标准可学习路由。通过将路由决策直接与输入结构对齐，STAR实现了稳定的专家专业化。我们在受控合成设置和大规模语言与视觉任务上评估STAR，它持续提高了路由质量和下游性能，超过了强MoE基线。此外，可选的测试时子空间更新进一步增强了输入分布偏移下的路由鲁棒性和泛化能力。

英文摘要

Mixture-of-Experts (MoE) scales model capacity efficiently by selectively routing inputs to a specialized subset of experts. However, input-expert specialization, the core motivation of MoE, critically depends on whether the router is actually aware of input structure. In practice, MoE routing is typically implemented as a shallow linear projection with limited awareness of the input representation, which often leads to unstable routing. We propose STAR (Structure-Aware Routing), which rethinks MoE routing as a subspace learning problem by augmenting standard learnable routing with an evolving principal subspace that tracks dominant input structure via the Generalized Hebbian Algorithm (GHA). By aligning routing decisions directly with input structure, STAR enables stable expert specialization. We evaluate STAR on a controlled synthetic setup and on large-scale language and vision tasks, where it consistently improves routing quality and downstream performance over strong MoE baselines. Moreover, optional test-time subspace updates further enhance routing robustness and generalization under input distribution shifts.
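
The Generalized Hebbian Algorithm mentioned above is Sanger's rule for tracking the leading principal directions of a data stream online. The sketch below shows only that update on synthetic data. How STAR combines the tracked subspace with the learnable router is not described in the abstract and is left out; the initial weights, learning rate, and data scales are assumptions.

```python
import random


def gha_step(W, x, lr):
    """One Sanger-rule update of the k x d matrix W for a single input x."""
    k, d = len(W), len(x)
    y = [sum(W[i][j] * x[j] for j in range(d)) for i in range(k)]
    new_W = []
    for i in range(k):
        # Reconstruction from components 0..i (the lower-triangular term).
        recon = [sum(y[m] * W[m][j] for m in range(i + 1)) for j in range(d)]
        new_W.append([W[i][j] + lr * y[i] * (x[j] - recon[j]) for j in range(d)])
    return new_W


def track_subspace(samples, k, lr=0.005):
    """Run GHA over a stream of samples from a fixed, arbitrary initial W."""
    d = len(samples[0])
    W = [[0.1 * ((i + j) % 3 + 1) for j in range(d)] for i in range(k)]
    for x in samples:
        W = gha_step(W, x, lr)
    return W


# Synthetic stream whose variance is dominated by the first coordinate.
rng = random.Random(0)
scales = (3.0, 1.0, 0.3)
samples = [[s * rng.gauss(0.0, 1.0) for s in scales] for _ in range(4000)]
W = track_subspace(samples, k=2)
```

After training, the first row of `W` points mostly along the first coordinate axis, the direction of largest variance in this synthetic stream.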

2606.08815 2026-06-09 cs.AI cs.CL cs.LG 交叉投稿

Momentum for Reasoning: Dense Intrinsic Signals in Policy Optimization

推理的动量：策略优化中的密集内在信号

Hao Chen, Zhanming Shen, Liyao Li, Yanyu Chen, Xuhang Zhu, Xiaomeng Hu, Qi Zhang, Ru Peng, Xiaoyu Shen, Haobo Wang, Junbo Zhao

发表机构 * Zhejiang University（浙江大学）； The Chinese University of Hong Kong（香港中文大学）； Eastern Institute of Technology（东方理工学院）

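AI summary: Targets two failure modes that binary rewards induce in GRPO for long-chain reasoning, zero-advantage collapse and hallucinated certainty, and proposes ISPO, which densifies the reward with intrinsic signals. ISPO consistently outperforms baselines across three base models and five mathematical reasoning benchmarks.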

Comments 14 pages, 6 figures, 8 tables

AI中文摘要

基于可验证奖励的强化学习已成为激发大型语言模型长链推理的强大范式。然而，现有基于组相对策略优化（GRPO）的方法依赖于二元结果奖励，这引发了两种结构性失败模式：零优势崩溃，即组内所有轨迹共享相同结果导致梯度消失；以及幻觉确定性，即模型在训练后期对错误轨迹变得过度自信。我们通过使用完全从策略自身条件概率计算的内在信号来密集化奖励，解决了这两种模式，并提出了ISPO（内在信号策略优化），它结合了衡量思考轨迹对最终答案信息量的序列级信号，以及令牌级方向性奖励，其幻觉确定性铰链惩罚关键决策令牌上的错误自信预测。在三个基模型和五个数学推理基准上，ISPO持续优于竞争基线，在零优势崩溃最频繁的最难基准上取得最大提升，训练动态诊断证实两种失败模式均被减少。

英文摘要

Reinforcement learning with verifiable rewards (RLVR) has emerged as a powerful paradigm for eliciting long-chain reasoning in large language models. However, existing methods based on Group Relative Policy Optimization (GRPO) rely on a binary outcome reward, which induces two structural failure modes: Zero-Advantage Collapse, in which all rollouts in a group share the same outcome and the gradient vanishes, and Hallucinated Certainty, in which the model becomes increasingly confident on incorrect rollouts late in training. We address both modes by densifying the reward with intrinsic signals computed entirely from the policy's own conditional probabilities, and propose ISPO (Intrinsic Signal Policy Optimization), which combines a sequence-level signal measuring how informative the thinking trajectory is for the final answer with a token-level directional reward whose hallucinated-certainty hinge penalizes confidently wrong predictions at critical decision tokens. Across three base models and five mathematical reasoning benchmarks, ISPO consistently outperforms competitive baselines, with the largest gains on the hardest benchmarks where zero-advantage collapse is most frequent, and training-dynamics diagnostics confirm that both failure modes are reduced.
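
Zero-Advantage Collapse follows directly from how group-relative advantages are computed. The sketch below standardizes rewards within one group of rollouts; the population standard deviation and the `eps` constant are assumptions, and ISPO's intrinsic signals are not reproduced.

```python
import math


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of rollouts (GRPO-style)."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


mixed = group_relative_advantages([1.0, 0.0, 0.0, 0.0])      # informative group
collapsed = group_relative_advantages([1.0, 1.0, 1.0, 1.0])  # identical outcomes
```

When every rollout in a group receives the same binary reward, every advantage is exactly zero, so the group contributes no policy gradient. A denser reward signal breaks such ties.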

2606.08871 2026-06-09 math.NA cs.LG cs.NA 交叉投稿

SG-OPD: 通过符号一致性门控和分阶段教师采样的符号门控在线蒸馏

Haoran Xu, Hongyu Wang, Yifei Gao, Jiaze Li, Xiaofeng Zhang, Xiaosong Yuan

发表机构 * Zhejiang University（浙江大学）； Hunan University（湖南大学）； Tianjin University（天津大学）； Shanghai Jiao Tong University（上海交通大学）； Jilin University（吉林大学）

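AI summary: Observes that on-policy distillation relies on two assumptions that often fail, trajectory-level alignment and uniform reliability of the teacher's preferences, and proposes SG-OPD, which adds a sign-consistency gate and phased teacher sampling. On competition-level mathematical reasoning benchmarks it gains 1.98 and 7.50 on average at the per-sample and per-question levels, respectively.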

AI中文摘要

在线蒸馏（OPD）在自身轨迹上训练学生模型，并利用更强教师的密集逐token监督，通常优于离线蒸馏和标准强化学习。然而，我们发现其有效性隐含地依赖于两个在实践中经常失效的假设：学生与教师之间的轨迹级对齐，以及教师偏好的均匀token级可靠性。因此，我们提出符号门控在线蒸馏（SG-OPD），该方法使用二元验证器作为教师信任信号，在两个互补粒度上发挥作用：分阶段教师采样在冷启动时混合验证器认可的教师轨迹，而符号一致性门控在教师与验证器校正方向一致的token上外推蒸馏更新，在分歧时内插。在竞赛级数学推理基准上的实验表明，SG-OPD持续优于标准OPD，在每样本和每问题水平上平均提升分别为1.98和7.50。

英文摘要

On-policy distillation (OPD) trains a student on its own trajectories with dense per-token supervision from a stronger teacher, and often outperforms off-policy distillation and standard reinforcement learning. However, we find that its effectiveness implicitly relies on two assumptions that frequently break in practice: trajectory-level alignment between the student and the teacher, and uniform token-level reliability of the teacher's preferences. We therefore propose Sign-Gated On-Policy Distillation (SG-OPD), which uses a binary verifier as a trust signal for the teacher at two complementary granularities: phased teacher sampling mixes in verifier-endorsed teacher rollouts at cold-start, and a sign-consistency gate extrapolates the distillation update on tokens where the teacher agrees with the verifier-correct direction and interpolates it where it disagrees. Experiments on competition-level mathematical reasoning benchmarks show that SG-OPD consistently outperforms standard OPD, with average gains of 1.98 and 7.50 at the per-sample and per-question levels, respectively.

2606.09396 2026-06-09 cs.CL cs.LG 交叉投稿

PriFT: Prior-Support Guided Supervised Fine-Tuning

PriFT: 先验支持引导的监督微调

Ke Wang, Shuangqi Li, Mathieu Salzmann, Pascal Frossard

发表机构 * EPFL（瑞士联邦理工学院洛桑分校）

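AI summary: Proposes PriFT, which computes token weights from a frozen pretrained model to avoid the self-reinforcing dynamics caused by weighting with the online model. It achieves the best results among SFT baselines on mathematical reasoning, code generation, and medical question answering, and gives a better initialization for subsequent RL.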

Comments The first two authors contributed equally to this work

AI中文摘要

监督微调（SFT）是下游任务适配的高效方法，通常作为强化学习（RL）的初始化阶段，但其泛化能力可能弱于RL。一个关键限制是其离策略目标：SFT逐token拟合固定演示，包括与模型预训练分布对齐不良的目标，这可能导致过拟合。最近一系列工作通过给与当前模型预测分布更对齐的token分配更大的训练权重来解决此问题，直觉是拟合这些token对模型的预训练知识和表示的扭曲较小。然而，从当前微调模型计算token权重会将token权重与优化轨迹纠缠在一起，随着分布迅速偏离预训练模型，引发自我强化动态。为了解决这个问题，我们提出PriFT（先验支持引导的微调），该方法从冻结的预训练参考模型导出token权重，以获得不受微调影响的稳定重加权信号。该信号估计先验支持：每个目标token受预训练分布支持的程度。在多种现有token重加权规则中，将重加权信号从在线模型替换为预训练模型一致地提升了性能。我们引入了两种实例化：PriFT-prob使用预训练token概率，而PriFT-mass根据预训练分布下的累积概率质量选择token。在数学推理、代码生成和医疗问答上的大量实验表明，PriFT在SFT基线中取得了最先进的结果，并为后续RL训练提供了更好的初始化。

英文摘要

Supervised fine-tuning (SFT) is an efficient approach for downstream task adaptation and often serves as the initialization stage for reinforcement learning (RL), but it can show weaker generalization than RL. A key limitation is its off-policy objective: SFT fits fixed demonstrations token by token, including targets poorly aligned with the model's pretrained distribution, which can lead to overfitting. A recent line of work addresses this issue by assigning larger training weights to tokens better aligned with the current model's predictive distribution, with the intuition that fitting these tokens is less distortive to the model's pretrained knowledge and representations. However, computing the token weights from the model that is currently being fine-tuned entangles token weights with the optimization trajectory, inducing self-reinforcing dynamics as the distribution rapidly departs from the pretrained model. To address this, we propose PriFT (Prior-support guided Fine-Tuning), which derives token weights from a frozen pretrained reference to obtain a stable reweighting signal unaffected by fine-tuning. This signal estimates prior support: the extent to which each target token is supported by the pretrained distribution. Across multiple existing token-reweighting rules, switching the source of the reweighting signal from the online model to the pretrained model consistently improves performance. We introduce two instantiations: PriFT-prob uses the pretrained token probability, while PriFT-mass selects tokens by cumulative probability mass under the pretrained distribution. Extensive experiments on mathematical reasoning, code generation, and medical question answering show that PriFT achieves state-of-the-art results among SFT baselines and provides a better initialization for subsequent RL training.
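
A minimal sketch of the reweighting idea, under stated assumptions: the weights come from a frozen reference distribution and are treated as constants in a weighted negative log-likelihood. The function names are hypothetical. The `in_nucleus` rule is only one possible reading of "selects tokens by cumulative probability mass" (membership in a top-p set of the reference distribution) and may differ from the paper's definition.

```python
import math


def prior_support_weights(ref_probs, targets):
    """PriFT-prob style weight: reference probability of each target token."""
    return [row[t] for row, t in zip(ref_probs, targets)]


def in_nucleus(ref_row, target, top_p):
    """Assumed mass rule: is the target inside the reference top-p set?"""
    order = sorted(range(len(ref_row)), key=lambda i: -ref_row[i])
    mass = 0.0
    for i in order:
        if i == target:
            return True
        mass += ref_row[i]
        if mass >= top_p:
            return False
    return False


def weighted_nll(model_probs, targets, weights):
    """Weighted token-level negative log-likelihood, normalized by total weight."""
    total = sum(weights)
    loss = sum(w * -math.log(row[t])
               for row, t, w in zip(model_probs, targets, weights))
    return loss / total
```

Because the weights depend only on the frozen reference, they stay fixed as the fine-tuned model drifts, which is the stability property the abstract argues for.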

2606.09734 2026-06-09 quant-ph cs.LG 交叉投稿

Adaptive directional gradients for parameterised quantum circuits

参数化量子电路的自适应方向梯度

Brian Coyle, Snehal Raj, Virag Umathe, El Amine Cherrat, Elham Kashefi

发表机构 * School of Informatics, University of Edinburgh（爱丁堡大学信息学院）； Fujitsu Research of Europe Ltd.（富士通欧洲有限公司）； LIP6, CNRS, Sorbonne Université（LIP6研究所，法国国家科学研究中心，索邦大学）； QC Ware ； Quantum Signals（量子信号）

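AI summary: Proposes a gradient estimation framework for parameterised quantum circuits based on forward-mode automatic differentiation, which averages random directional derivatives into an unbiased gradient estimate, and derives the adaptive optimiser QUIVER. On problems with up to 1770 parameters it is orders of magnitude more efficient than the parameter-shift rule.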

Comments 37 pages, 13 figures

AI中文摘要

在量子硬件上训练参数化量子电路（PQC）的瓶颈在于梯度估计的测量成本，在参数平移规则下，该成本与可训练参数数量呈线性关系，并主导了大规模训练的总预算。本文提出了一种基于前向自动微分模式的PQC前向梯度估计器框架，通过平均自由可调数量的随机方向导数得到梯度的无偏估计，并恢复SPSA、随机坐标下降和参数平移规则作为极限情况，无需辅助量子比特或受控门开销。我们证明随机量子前向梯度下降在标准假设下收敛，并给出了显式的二阶矩展开，该展开在SPSA的单方向极端和参数平移的全梯度极端之间插值。在该框架内，我们推导出QUIVER（量子迭代自适应估计器规则），这是一种参数化电路的自适应优化器，其更新规则遵循闭式最小测量成本分配。数值结果表明，在ECG5000和MNIST数据集上，前向梯度训练具有多达60个量子比特和1770个参数的汉明权重保持正交量子神经网络，比参数平移规则效率高数个数量级。我们还证明，我们提出的QUIVER优化器在使用量子近似优化算法和变分量子特征求解器的优化问题上，可以优于iCANS和gCANS等节省测量的优化器。

英文摘要

Training parameterised quantum circuits (PQCs) on quantum hardware is bottlenecked by the measurement cost of gradient estimation, which under the parameter-shift rule scales linearly in the number of trainable parameters and dominates the total shot budget of training at scale. In this work, we propose a framework of forward gradient estimators for PQCs, based on the forward mode of automatic differentiation, that yields an unbiased estimator of the gradient by averaging a freely tunable number of random directional derivatives and recovers SPSA, random coordinate descent, and the parameter-shift rule as limiting cases, with no ancilla qubits or controlled-gate overhead. We prove that stochastic quantum forward gradient descent converges under standard assumptions, with an explicit second-moment expansion that interpolates between the single-direction extreme of SPSA and the full-gradient extreme of parameter-shift. Within this framework we derive QUIVER (Quantum Iterative V-adaptive Estimator Rule), an adaptive optimiser for parameterised circuits whose update rule follows from a closed-form minimum measurement-cost allocation. We show numerically that forward gradients train Hamming-weight-preserving orthogonal quantum neural networks with up to 60 qubits and 1770 parameters on the ECG5000 and MNIST datasets orders of magnitude more efficiently than the parameter-shift rule. We also demonstrate that our proposed QUIVER optimiser can outperform iCANS and gCANS measurement-frugal optimisers on optimisation problems using the quantum approximate optimisation algorithm and quantum simulation with the variational quantum eigensolver.
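
The estimator described above averages terms of the form (directional derivative along v) times v. A classical sketch follows, assuming a smooth cost function and a central finite difference in place of the circuit measurements that would be needed on hardware. The helper names are hypothetical. Any direction distribution with E[v vᵀ] = I gives an unbiased estimate, and using all d scaled coordinate directions once recovers the full gradient.

```python
import math
import random


def directional_derivative(f, theta, v, h=1e-4):
    """Central finite difference of f along v (stand-in for a circuit estimate)."""
    plus = [t + h * vi for t, vi in zip(theta, v)]
    minus = [t - h * vi for t, vi in zip(theta, v)]
    return (f(plus) - f(minus)) / (2.0 * h)


def forward_gradient(f, theta, directions):
    """Average of (D_v f) * v over the supplied directions."""
    g = [0.0] * len(theta)
    for v in directions:
        dv = directional_derivative(f, theta, v)
        for i in range(len(theta)):
            g[i] += dv * v[i] / len(directions)
    return g


def gaussian_directions(d, k, seed=0):
    """k random directions with identity covariance."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]


def scaled_coordinate_directions(d):
    """sqrt(d) * e_i for every i: averaging over these gives the exact gradient."""
    s = math.sqrt(d)
    return [[s if i == j else 0.0 for j in range(d)] for i in range(d)]
```

The number of directions is the tunable knob: one direction per step is the cheapest and noisiest setting, while the full coordinate set costs as much as a complete gradient evaluation.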

2606.09803 2026-06-09 cs.CV cs.GR cs.LG 交叉投稿

Echo-Memory: A Controlled Study of Memory in Action World Models

Echo-Memory：动作世界模型中记忆的受控研究

Wayne King, Zeyue Xue, Yuxuan Bian, Jie Huang, Haoran Li, Yaowei Li, Yaofeng Su, Yuming Li, Haoyu Wang, Shiyi Zhang, Songchun Zhang, Yuwei Niu, Sihan Xu, Junhao Zhuang, Haoyang Huang, Nan Duan

发表机构 * Joy Future Academy

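AI summary: Presents Echo-Memory, a controlled study of memory mechanisms in action-conditioned world models, and finds that raw context capacity and block-wise state-space recurrence matter most for open-domain return.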

Comments 9 figures and 28 pages. Code at https://github.com/Echo-Team-Joy-Future-Academy-JD/Echo-Memory

AI中文摘要

我们提出Echo-Memory，对动作条件世界模型中的记忆机制进行受控研究。这些模型从第一帧、文本提示和相机动作序列生成多段视频，但其核心失败往往在于记忆而非局部图像合成：当相机离开并返回时，场景或显著物体可能悄然改变。现有记忆设计难以比较，因为增益与骨干网络、训练、检索和评估差异纠缠在一起。Echo-Memory固定了动作到视频的接口，仅改变生成器存储和读取历史的方式。在共享的视频扩散骨干网络、优化器、相机动作表示、采样器和评估流程下，我们比较了原始上下文、基于压缩的记忆、具有不同读取路径的空间摘要以及状态空间递归。这种匹配矩阵分离了四个通常混淆的轴：容量、压缩、读取和递归。我们还通过三分支协议评估记忆：重放质量、域内循环重访和开放域返回探测。这些分支经常给出不一致的结论，表明重放保真度不足以作为记住一个世界的代理指标。由此得出三个发现。第一，原始上下文是一个强大的容量基线，它对开放域返回的改善远大于对重放指标的改善。第二，紧凑性不能免费替代容量：激进的空间记忆和混合压缩记忆会丢失返回所需的显著证据。第三，块状状态空间递归是我们矩阵中最强的开放域返回机制，表明隐式记忆的结构与是否使用记忆同样重要。这些结果为在孤立的重放指标之外研究动作世界模型中的记忆提供了一个紧凑的协议。

英文摘要

We present Echo-Memory, a controlled study of memory mechanisms in action-conditioned world models. These models generate multi-segment videos from a first frame, text prompt, and camera-action sequence, but their central failure is often memory rather than local image synthesis: after the camera leaves and returns, the scene or salient object may silently change. Existing memory designs are hard to compare because gains are entangled with backbone, training, retrieval, and evaluation differences. Echo-Memory fixes the action-to-video interface and varies only how history is stored and read by the generator. Under a shared video diffusion backbone, optimizer, camera-action representation, sampler, and evaluation pipeline, we compare raw context, compression-based memory, spatial summaries with different read-out paths, and state-space recurrence. This matched matrix separates four otherwise conflated axes: capacity, compression, read-out, and recurrence. We also evaluate memory through a three-branch protocol: replay quality, in-domain loop revisit, and open-domain return probes. The branches routinely disagree, showing that replay fidelity is not a sufficient proxy for remembering a world. Three findings follow. Raw context is a strong capacity baseline and improves open-domain return far more than it improves replay metrics. Compactness is not a free substitute for capacity: aggressive spatial and hybrid-compression memories lose the salient evidence needed for return. Finally, block-wise state-space recurrence is the strongest open-domain return mechanism in our matrix, showing that the structure of implicit memory matters as much as the decision to use it. These results provide a compact protocol for studying memory in action world models beyond isolated replay metrics.

2402.13425 2026-06-09 cs.LG cs.AI stat.ML 版本更新

Investigating the Histogram Loss in Regression

探究回归中的直方图损失

Ehsan Imani, Kai Luedemann, Sam Scholnick-Hughes, Esraa Elelimy, Martha White

发表机构 * Alberta Machine Intelligence Institute (Amii) and Reinforcement Learning and Artificial Intelligence Laboratory（阿尔伯塔机器智能研究所（Amii）和强化学习与人工智能实验室）； Department of Computing Science, University of Alberta（计算科学系，阿尔伯塔大学）； University of Tübingen（图宾根大学）； Zuse School ELIZA（祖斯学校ELIZA）

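AI summary: Uses theoretical and empirical analyses to examine why the Histogram Loss improves regression performance, finds that the benefit comes from improved optimization rather than from modeling extra information, and shows that the loss is viable in common deep learning applications.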

Comments 52 pages

Journal ref: JMLR,2026

AI中文摘要

在回归任务中，即使预测只需要均值，训练神经网络来建模整个分布也变得越来越常见。这种额外的建模通常会带来性能提升，但其背后的原因尚不完全清楚。本文研究了一种最近的回归方法——直方图损失，该方法通过最小化目标分布与灵活直方图预测之间的交叉熵来学习目标变量的条件分布。我们设计了理论和实证分析，以确定这种性能提升出现的原因和时机，以及损失的不同组成部分如何贡献于这种提升。我们的结果表明，在这种设置下学习分布的好处来自于优化方面的改进，而非建模额外信息。然后，我们展示了直方图损失在常见深度学习应用中的可行性，无需昂贵的超参数调优。

英文摘要

It is becoming increasingly common in regression to train neural networks that model the entire distribution even if only the mean is required for prediction. This additional modeling often comes with a performance gain, but the reasons behind the improvement are not fully known. This paper investigates a recent approach to regression, the Histogram Loss, which involves learning the conditional distribution of the target variable by minimizing the cross-entropy between a target distribution and a flexible histogram prediction. We design theoretical and empirical analyses to determine why and when this performance gain appears, and how different components of the loss contribute to it. Our results suggest that the benefits of learning distributions in this setup come from improvements in optimization rather than from modeling extra information. We then demonstrate the viability of the Histogram Loss in common deep learning applications without a need for costly hyperparameter tuning.
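
The loss described above can be sketched for a single scalar target. This version assumes a Gaussian target distribution centered at the label and truncated to the histogram range, with the prediction given by a softmax over bin logits. The bin edges, `sigma`, and the function names are illustrative choices, not values from the paper.

```python
import math


def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def target_histogram(y, edges, sigma):
    """Mass a Gaussian at y, truncated to [edges[0], edges[-1]], puts in each bin."""
    cdf = [normal_cdf(e, y, sigma) for e in edges]
    total = cdf[-1] - cdf[0]
    return [(cdf[i + 1] - cdf[i]) / total for i in range(len(edges) - 1)]


def log_softmax(logits):
    m = max(logits)
    lse = m + math.log(sum(math.exp(l - m) for l in logits))
    return [l - lse for l in logits]


def histogram_loss(logits, y, edges, sigma):
    """Cross-entropy between the target histogram and the predicted histogram."""
    target = target_histogram(y, edges, sigma)
    return -sum(t * lp for t, lp in zip(target, log_softmax(logits)))


def predicted_mean(logits, edges):
    """Point prediction: expected bin center under the predicted histogram."""
    centers = [(edges[i] + edges[i + 1]) / 2.0 for i in range(len(edges) - 1)]
    return sum(math.exp(lp) * c for lp, c in zip(log_softmax(logits), centers))
```

The network is trained on `histogram_loss`, while `predicted_mean` recovers the scalar regression output, so the distribution is learned even though only its mean is used at prediction time.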

2505.20137 2026-06-09 cs.LG cs.AI 版本更新

ePC: Fast and Deep Predictive Coding in Digital Simulation

ePC：数字仿真中的快速深度预测编码

Cédric Goemaere, Gaspard Oliviers, Rafal Bogacz, Thomas Demeester

发表机构 * IDLab, Ghent University -- imec, Belgium（ID实验室，根特大学——imec，比利时）； Brain Network Dynamics Unit, University of Oxford, UK（脑网络动力学单位，牛津大学，英国）

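AI summary: Proposes error-based predictive coding (ePC), a reparameterization that removes the exponential signal decay of standard state-based predictive coding (sPC) in digital simulation, runs orders of magnitude faster than sPC, and matches backpropagation's performance on deeper models.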

Comments Accepted at ICML 2026 - Main Track. All code available at https://github.com/cgoemaere/error_based_PC

AI中文摘要

预测编码（PC）为神经网络训练提供了一种受大脑启发的反向传播替代方案，被描述为最小化其内部能量的物理系统。然而，在实践中，PC主要是在数字仿真中实现的，需要大量的计算，同时难以扩展到更深的架构。本文重新构建了PC以克服这种硬件-算法不匹配。首先，我们揭示了规范的状态基PC（sPC）在数字仿真中本质上是深度低效的，不可避免地导致指数级信号衰减，从而阻碍整个最小化过程。然后，为了克服这一根本限制，我们引入了误差基PC（ePC），这是一种新的PC重新参数化，不会遭受信号衰减。虽然不再具有生物合理性，但ePC数值计算精确的PC权重梯度，运行速度比sPC快几个数量级。跨多个架构和数据集的实验表明，即使在sPC难以处理的更深模型中，ePC也能匹配反向传播的性能。除了实际改进，我们的工作还提供了对PC动力学的理论洞察，并为在数字硬件及更广泛领域将基于PC的学习扩展到更深架构奠定了基础。

英文摘要

Predictive Coding (PC) offers a brain-inspired alternative to backpropagation for neural network training, described as a physical system minimizing its internal energy. However, in practice, PC is predominantly digitally simulated, requiring excessive amounts of compute while struggling to scale to deeper architectures. This paper reformulates PC to overcome this hardware-algorithm mismatch. First, we uncover how the canonical state-based formulation of PC (sPC) is, by design, deeply inefficient in digital simulation, inevitably resulting in exponential signal decay that stalls the entire minimization process. Then, to overcome this fundamental limitation, we introduce error-based PC (ePC), a novel reparameterization of PC which does not suffer from signal decay. Though no longer biologically plausible, ePC numerically computes exact PC weight gradients and runs orders of magnitude faster than sPC. Experiments across multiple architectures and datasets demonstrate that ePC matches backpropagation's performance even for deeper models where sPC struggles. Besides practical improvements, our work provides theoretical insight into PC dynamics and establishes a foundation for scaling PC-based learning to deeper architectures on digital hardware and beyond.

2509.10534 2026-06-09 cs.LG cs.AI cs.CL 版本更新

Decoupling the "What" and "Where" With Polar Coordinate Positional Embeddings

解耦“什么”和“哪里”：极坐标位置嵌入

Anand Gopalakrishnan, Robert Csordás, Jürgen Schmidhuber, Michael C. Mozer

发表机构 * DeepMind, London, UK（深度Mind，伦敦，英国）

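AI summary: Proposes Polar Coordinate Position Embeddings (PoPE) to decouple content and position in Transformer attention. PoPE outperforms RoPE on diagnostic tasks, sequence modeling, and language modeling, and shows zero-shot length extrapolation.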

Comments ICML 2026 camera-ready version

AI中文摘要

Transformer架构中的注意力机制根据内容（“什么”）和序列中的位置（“哪里”）将键匹配到查询。我们提出一项分析，表明在流行的RoPE旋转位置嵌入中，“什么”和“哪里”是纠缠的。这种纠缠会损害性能，特别是当决策需要在这两个因素上独立匹配时。我们提出对RoPE的改进，称为极坐标位置嵌入（PoPE），它消除了“什么-哪里”的混淆。PoPE在仅通过位置或内容进行索引的诊断任务上表现远优于基线。在音乐、基因组和自然语言领域的自回归序列建模中，使用PoPE作为位置编码方案的Transformer在评估损失（困惑度）和下游任务性能上优于使用RoPE的基线。在语言建模中，这些优势在模型规模从124M到774M参数时持续存在。关键的是，与RoPE甚至专为外推设计的方法YaRN（需要额外微调和频率插值）相比，PoPE展现出强大的零样本长度外推能力。

英文摘要

The attention mechanism in a Transformer architecture matches keys to queries based on both content -- the what -- and position in a sequence -- the where. We present an analysis indicating that what and where are entangled in the popular RoPE rotary position embedding. This entanglement can impair performance, particularly when decisions require independent matches on these two factors. We propose an improvement to RoPE, which we call Polar Coordinate Position Embeddings or PoPE, that eliminates the what-where confound. PoPE is far superior on a diagnostic task requiring indexing solely by position or by content. On autoregressive sequence modeling in music, genomic, and natural language domains, Transformers using PoPE as the positional encoding scheme outperform baselines using RoPE with respect to evaluation loss (perplexity) and downstream task performance. On language modeling, these gains persist across model scale, from 124M to 774M parameters. Crucially, PoPE shows strong zero-shot length extrapolation capabilities compared not only to RoPE but even to YaRN, a method designed for extrapolation that requires additional fine-tuning and frequency interpolation.

2509.12760 2026-06-09 cs.LG cs.CL 版本更新

Similarity-Distance-Magnitude Activations

相似度-距离-幅度激活函数

Allen Schmaltz

发表机构 * Reexpress AI

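AI summary: Proposes SDM activation functions, which add similarity and distance awareness to softmax to improve robustness and interpretability, with dense matching providing instance-based interpretability. The SDM estimator controls classification accuracy through a data-driven partitioning of the CDF and outperforms existing calibration methods.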

Comments Accepted to Findings of the Association for Computational Linguistics: ACL 2026. 21 pages, 8 tables, 1 algorithm. arXiv admin note: substantial text overlap with arXiv:2502.20167

2509.15494 2026-06-09 cs.LG physics.data-an 版本更新

Multi-resolution Enhancement for Full Spectrum Neural Representations

全频谱神经表示的多分辨率增强

Yuan Ni, Zhantao Chen, Shizhou Xu, Cheng Peng, Rajan Plumley, Chun Hong Yoon, Jana B. Thayer, Joshua J. Turner

发表机构 * Linac Coherent Light Source, SLAC National Accelerator Laboratory（直线相干光源，SLAC国家加速器实验室）； Stanford Institute for Materials and Energy Sciences, Stanford University（斯坦福大学材料与能源科学研究所）； Walker Department of Mechanical Engineering, The University of Texas at Austin（德克萨斯大学奥斯汀分校机械工程系）； Department of Mathematics, University of California Davis（加州大学戴维斯分校数学系）； Department of Physics, Carnegie Mellon University（卡内基梅隆大学物理系）

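AI summary: Proposes WIEN-INR, a hierarchical framework that distributes modeling across resolution scales and uses an enhancement network to improve how well small networks represent multi-scale structure and high-frequency detail, giving compact, high-fidelity representations.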

AI中文摘要

科学数据采集持续超越存储和分析能力，使得基于体素的表示越来越难以处理。隐式神经表示（INRs）通过基于坐标的神经网络编码信号，作为数据的替代品，其计算和存储需求随网络复杂度而非数据维度扩展，提供了有前景的解决方案。然而，较小的INRs难以忠实表示构成科学测量大部分的多尺度结构、高频信息和精细纹理。我们提出WIEN-INR，一个理论指导的分层INR框架，跨分辨率尺度分配建模，并通过新颖的增强网络恢复细微细节，从而提高表示能力。这种多尺度架构允许较小的网络保留全频谱信息，同时保持训练效率并降低存储成本。在跨尺度和复杂性的不同原始实验测量上评估，WIEN-INR代表了神经表示在科学工作流中更广泛采用的实用步骤，提供了紧凑、鲁棒和高保真的表示。

英文摘要

Scientific data acquisition continues to outpace storage and analysis capabilities, making voxel-based representations increasingly intractable. Implicit neural representations (INRs) offer a promising solution by encoding signals through coordinate-based neural networks, serving as surrogates of data, with computational and storage requirements scaling with network complexity rather than data dimensionality. However, smaller INRs struggle to faithfully represent the multi-scale structures, high-frequency information, and fine textures that constitute a large proportion of scientific measurements. We propose WIEN-INR, a theoretically-guided hierarchical INR framework that distributes modeling across resolution scales and enables improved representation capacity through a novel enhancement network to recover subtle details. This multi-scale architecture allows smaller networks to retain the full spectrum of information while preserving training efficiency and lowering storage cost. Evaluated on distinct raw experimental measurements across scales and complexities, WIEN-INR represents a practical step toward broader adoption of neural representations in scientific workflows, delivering compact, robust, and high-fidelity representations.

2510.22450 2026-06-09 cs.LG cs.AI 版本更新

Vision Hopfield Memory Networks

Jianfeng Wang, Amine M'Charrak, Luk Koska, Xiangtao Wang, Daniel Petriceanu, Ruizhi Wang, Michael Bumbar, Luca Pinchetti, Thomas Lukasiewicz

发表机构 * Department of Computer Science, University of Oxford（牛津大学计算机科学系）； Faculty of Informatics, Vienna University of Technology（维也纳理工大学信息学院）

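AI summary: Proposes the Vision Hopfield Memory Network (V-HMN), a brain-inspired model that combines hierarchical memory mechanisms with iterative refinement updates to capture local and global dynamics in a unified framework, improving interpretability and data efficiency.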

AI中文摘要

近年来，视觉和多模态基础模型，如Transformer家族和状态空间模型（如Mamba）在图像、文本等领域取得了显著进展。尽管这些架构在经验上取得了成功，但它们与人脑的计算原理仍有很大差距，通常需要大量的训练数据且可解释性有限。在本文中，我们提出了视觉Hopfield记忆网络（V-HMN），一种受大脑启发的基础模型，整合了分层记忆机制和迭代细化更新。具体而言，V-HMN包含局部Hopfield模块，提供图像块级别的关联记忆动态，全局Hopfield模块作为情境调节的事件记忆，以及受预测编码启发的细化规则用于迭代误差校正。通过将这些基于记忆的模块分层组织，V-HMN在一个统一的框架中捕捉了局部和全局动态。记忆检索揭示了输入与存储模式之间的关系，使决策更具可解释性，而存储模式的重用提高了数据效率。这种受大脑启发的设计因此在可解释性和数据效率方面超越了现有的自注意或状态空间方法。我们在公开的计算机视觉基准上进行了广泛的实验，V-HMN在与广泛采用的基础架构竞争的同时，提供了更好的可解释性、更高的数据效率和更强的生物合理性。这些发现突显了V-HMN作为下一代视觉基础模型的潜力，同时为文本和音频等领域的多模态基础模型提供了通用的蓝图，从而将受大脑启发的计算与大规模机器学习联系起来。

英文摘要

Recent vision backbones, such as Transformer families and state-space models like Mamba, have achieved remarkable progress on image recognition. Despite their empirical success, these architectures remain far from the computational principles of the human brain, often demanding enormous amounts of training data while offering limited interpretability. We propose the Vision Hopfield Memory Network (V-HMN), a brain-inspired vision backbone that integrates hierarchical memory mechanisms across layers with iterative refinement updates. Specifically, V-HMN incorporates local Hopfield modules that provide associative memory dynamics at the image patch level, global Hopfield modules that function as episodic memory for contextual modulation, and a predictive-coding-inspired refinement rule for iterative error correction. By organizing these memory-based modules hierarchically, V-HMN captures both local and global dynamics in a unified framework. Memory retrieval exposes the relationship between inputs and stored patterns, which gives a prototype-based form of interpretability, while the reuse of stored patterns improves data efficiency; both are advantages over existing self-attention- or state-space-based approaches. We conducted extensive experiments on public image classification benchmarks. V-HMN achieves strong performance on small- and medium-scale benchmarks and remains competitive with widely adopted backbone architectures on ImageNet despite minimal architectural tuning. These findings highlight the potential of V-HMN as a memory-centric alternative to standard vision backbones, thereby bridging brain-inspired computation with modern machine learning.

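2604.09967 2026-06-09 cs.LG cs.AI updated version

Muon²: Boosting Muon via Adaptive Second-Moment Preconditioning

Muon²：通过自适应二阶矩预条件提升Muon

Ziyue Liu, Ruijie Zhang, Zhengyang Wang, Yequan Zhao, Yupeng Su, Zi Yang, Zheng Zhang

Affiliations: University of California at Santa Barbara; University at Albany, SUNY

AI summary: Muon² improves the efficiency and quality of Muon by applying Adam-style adaptive second-moment preconditioning before orthogonalization, which speeds up convergence of the polar approximation and improves practical orthogonalization quality; experiments show it outperforms Muon in pre-training runs of up to 13B parameters.

Comments Preprint, subject to update

AI Chinese abstract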

Muon已展现为一种有前途的优化器，用于大规模基础模型预训练，通过迭代正交化利用神经网络更新的矩阵结构。然而，Muon的正交化质量依赖于执行的牛顿-施卢茨（NS）迭代次数，这带来了效率挑战，因为其计算和通信成本非平凡。我们提出Muon²，作为Muon的扩展，通过在正交化前应用Adam风格的自适应二阶矩预条件来提高质量和效率。我们的关键见解是，Muon的核心挑战在于极化近似中的病态动量矩阵，其谱通过Muon²显著改善，从而更快收敛到实用的正交化。我们进一步通过方向对齐特性化了实际正交化质量，在此情况下，Muon²在每个极化步骤中均显著优于Muon。在GPT、LLaMA和专家混合预训练实验中，Muon²（及其内存高效变种Muon²-F）在参数规模达13B时，始终优于Muon及其变种，同时将NS迭代次数减少40%，并在达到相同损失时节省了多达四分之一的训练时间。

English abstract

Muon has emerged as a promising optimizer for large-scale foundation model pre-training by exploiting the matrix structure of neural network updates through iterative orthogonalization. However, the orthogonalization quality of Muon hinges on the number of Newton–Schulz (NS) iterations performed, which poses efficiency challenges due to its non-trivial computation and communication cost. We propose Muon², an extension of Muon, to improve both quality and efficiency by applying Adam-style adaptive second-moment preconditioning before orthogonalization. Our key insight is that the core challenge of polar approximation in Muon lies in the ill-conditioned momentum matrix, of which the spectrum is substantially improved by Muon², leading to faster convergence toward a practically sufficient orthogonalization. We further characterize the practical orthogonalization quality via directional alignment, under which Muon² demonstrates dramatic improvement over Muon at each polar step. Across GPT, LLaMA, and Mixture-of-Experts pre-training experiments up to 13B parameters, Muon² (and its memory-efficient variant Muon²-F that preserves most of its benefits) consistently outperforms Muon and its variants while reducing NS iterations by 40%, and saves up to 1/4 training time over Muon when achieving the same loss.
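The link between conditioning and the number of Newton–Schulz steps can be illustrated with a minimal pure-Python sketch. It uses the classical cubic iteration X ← 1.5·X − 0.5·X·Xᵀ·X, which is a stand-in and not Muon's tuned polynomial or the Muon² preconditioner; the function names and both test matrices are illustrative assumptions. After Frobenius normalization every singular value lies in (0, 1], and each step pushes it toward 1, so a nearly singular matrix needs more steps to reach the same orthogonality error.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def orthogonality_error(x):
    """Largest absolute entry of X X^T - I."""
    g = matmul(x, transpose(x))
    n = len(g)
    return max(abs(g[i][j] - (1.0 if i == j else 0.0)) for i in range(n) for j in range(n))

def newton_schulz(m, steps):
    """Classical cubic Newton-Schulz iteration toward the orthogonal polar factor of m."""
    fro = math.sqrt(sum(v * v for row in m for v in row))
    x = [[v / fro for v in row] for row in m]  # scale so all singular values are <= 1
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * a - 0.5 * b for a, b in zip(rx, rc)] for rx, rc in zip(x, xxtx)]
    return x

def steps_to_tolerance(m, tol=1e-6, max_steps=200):
    """Smallest number of steps whose result is orthogonal up to tol (None if not reached)."""
    for k in range(max_steps + 1):
        if orthogonality_error(newton_schulz(m, k)) < tol:
            return k
    return None

# Hypothetical 2x2 examples: the second matrix is nearly singular.
well_conditioned = [[3.0, 1.0], [1.0, 2.0]]
ill_conditioned = [[3.0, 1.0], [1.0, 0.334]]

print(steps_to_tolerance(well_conditioned), steps_to_tolerance(ill_conditioned))
```

In this toy setting the ill-conditioned matrix needs noticeably more iterations, which is the kind of cost that improving the spectrum before orthogonalization is meant to reduce.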

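2605.06384 2026-06-09 cs.LG cs.AI cs.FL updated version

MinMax Recurrent Neural Cascades

MinMax 循环神经网络级联

Alessandro Ronca

Affiliations: IRIS-AI

AI summary: MinMax RNCs are built on a recurrence over the MinMax algebra and combine strong formal expressivity, efficient evaluation, stable dynamics, and non-vanishing state gradients; they solve the considered synthetic tasks perfectly, generalise to long sequences, and outperform the recurrent baselines considered.

Comments Code: https://github.com/minmaxrnc/model

AI Chinese abstract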

我们引入MinMax循环神经网络级联（MinMax RNCs），一种基于MinMax代数新形式递归的循环神经网络。我们展示了MinMax RNCs具有一些难以同时获得的关键性质：强大的形式表达性、高效的评估、稳定的动态和非消失的状态梯度。首先，其形式表达性对应正则语言，可能是有限记忆系统的最大表达性。其次，除了递归形式的评估外，它们还允许并行扫描评估，具有对数深度和线性工作量。第三，其状态和激活在所有序列长度下均被统一限制。第四，其损失梯度几乎处处存在且在所有序列长度下均被统一限制。第五，它们不表现出消失的状态梯度：状态相对于过去状态的梯度可以独立于状态之间的时距保持范数一。经验上，我们发现这些理论性质转化为强大的实际性能。MinMax RNCs完美解决了考虑的合成任务，能够泛化到长序列，并在实验中超越了考虑的循环基线。我们还训练了一个1.12亿参数的MinMax RNC进行下一个token预测，获得与其规模相竞争的性能，提供了初始证据表明MinMax递归可以扩展到现实世界的序列建模任务。

English abstract

We introduce MinMax Recurrent Neural Cascades (MinMax RNCs), a class of recurrent neural networks built from a novel form of recurrence over the MinMax algebra. We show that MinMax RNCs enjoy key properties that are difficult to obtain simultaneously: strong formal expressivity, efficient evaluation, stable dynamics, and non-vanishing state gradients. First, their formal expressivity corresponds to the regular languages, arguably the maximal expressivity for finite-memory systems. Second, in addition to evaluation in recurrent form, they also admit parallel-scan evaluation with logarithmic depth and linear work in the input length. Third, their states and activations are uniformly bounded for all sequence lengths. Fourth, their loss gradients exist almost everywhere and are uniformly bounded for all sequence lengths. Fifth, they do not exhibit vanishing state gradients: the gradient of a state with respect to a past state can retain norm one independently of the temporal distance between the states. Empirically, we find that these theoretical properties translate into strong practical performance. MinMax RNCs solve the considered synthetic tasks perfectly, generalise to long sequences, and outperform the recurrent baselines considered in our experiments. We also train a 112M-parameter MinMax RNC for next-token prediction, obtaining competitive performance for its size and providing initial evidence that MinMax recurrence can scale to real-world sequence-modelling tasks.
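Why a min-max recurrence admits parallel-scan evaluation can be sketched in a few lines of Python. The sketch assumes a hypothetical scalar recurrence h_t = max(min(a_t, h_{t-1}), b_t); the paper's actual MinMax recurrence may differ. Two such steps compose into one step of the same form, because min distributes over max, and since composition is associative the prefix compositions can be computed with a scan.

```python
def step(h, a, b):
    """One assumed min-max recurrence step applied to state h."""
    return max(min(a, h), b)

def combine(first, second):
    """Compose two steps (apply `first`, then `second`) into one equivalent step.

    Uses the lattice identity min(a, max(x, b)) = max(min(a, x), min(a, b)).
    """
    a1, b1 = first
    a2, b2 = second
    return (min(a1, a2), max(min(a2, b1), b2))

def sequential_states(h0, params):
    """Reference evaluation in recurrent form."""
    states, h = [], h0
    for a, b in params:
        h = step(h, a, b)
        states.append(h)
    return states

def scan_states(h0, params):
    """Hillis-Steele inclusive scan: about log2(n) rounds, each round parallelisable."""
    prefix = list(params)
    offset = 1
    while offset < len(prefix):
        prefix = [
            combine(prefix[i - offset], prefix[i]) if i >= offset else prefix[i]
            for i in range(len(prefix))
        ]
        offset *= 2
    # Each prefix entry is a single step equivalent to all steps up to that position.
    return [step(h0, a, b) for a, b in prefix]

params = [(5, 1), (2, 0), (7, 3)]
print(sequential_states(4, params))  # [4, 2, 3]
print(scan_states(4, params))        # [4, 2, 3]
```

The Hillis-Steele variant is used here only because it is short; it has logarithmic depth but performs O(n log n) work, whereas a work-efficient scan would be needed for the linear work mentioned in the abstract.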

2605.11855 2026-06-09 cs.LG cs.AI cs.AR updated version

Improving the Performance and Learning Stability of Parallelizable RNNs Designed for Ultra-Low Power Applications

提升为超低功耗应用设计的可并行递归神经网络的性能和学习稳定性

Julien Brandoit, Arthur Fyon, Damien Ernst, Guillaume Drion

Affiliations: University of Cambridge

AI summary: The paper proposes CMRU and αCMRU, whose cumulative update formulation restores gradient flow while preserving persistent memory, improving convergence stability and reducing initialization sensitivity; they perform well across diverse benchmarks, especially on tasks requiring discrete long-range retention.

Comments Accepted as a spotlight at ICML 2026. This work has been the subject of patent applications under numbers EP26175243.0 and EP26175248.9

AI Chinese abstract

序列学习主要由Transformer和可并行递归神经网络（如状态空间模型）主导，但学习长期依赖仍具挑战性，且最先进的设计以更高的功耗换取性能。Bistable Memory Recurrent Unit（BMRU）被引入以实现超低功耗RNN的软硬件协同设计：具有滞后特性的量化状态提供持久记忆，并可直接映射到模拟基本单元。然而，BMRU在复杂序列任务上的性能落后于可并行RNN。本文指出状态更新期间出现的梯度阻塞是关键限制，并提出累积更新公式，在保持持久记忆的同时恢复梯度流，从而形成跨时间的跳跃连接。由此得到累积记忆递归单元（CMRU）及其松弛变体αCMRU。实验表明，累积公式显著提高了收敛稳定性并降低了初始化敏感性。在小模型规模下，CMRU和αCMRU在多种基准上与线性递归单元（LRU）和最小门控递归单元（minGRU）持平或更优，尤其在需要离散长距离保留的任务中优势明显，同时CMRU保留了量化状态、持久记忆和抗噪声动态，这些对于模拟实现至关重要。

English abstract

Sequence learning is dominated by Transformers and parallelizable recurrent neural networks (RNNs) such as state-space models, yet learning long-term dependencies remains challenging, and state-of-the-art designs trade power consumption for performance. The Bistable Memory Recurrent Unit (BMRU) was introduced to enable hardware-software co-design of ultra-low power RNNs: quantized states with hysteresis provide persistent memory while mapping directly to analog primitives. However, BMRU performance lags behind parallelizable RNNs on complex sequential tasks. In this paper, we identify gradient blocking during state updates as a key limitation and propose a cumulative update formulation that restores gradient flow while preserving persistent memory, creating skip-connections through time. This leads to the Cumulative Memory Recurrent Unit (CMRU) and its relaxed variant, the αCMRU. Experiments show that the cumulative formulation dramatically improves convergence stability and reduces initialization sensitivity. The CMRU and αCMRU match or outperform Linear Recurrent Units (LRUs) and minimal Gated Recurrent Units (minGRUs) across diverse benchmarks at small model sizes, with particular advantages on tasks requiring discrete long-range retention, while the CMRU retains quantized states, persistent memory, and noise-resilient dynamics essential for analog implementation.

2605.15690 2026-06-09 cs.LG updated version

FRWKV+: Periodic-Aware Adaptive Gating for Frequency-Space Linear Time Series Forecasting

FRWKV+: 基于周期感知的自适应门控用于频率域线性时间序列预测

Qingyuan Yang, Dongyue Chen, Da Teng, Junhua Xiao, Jiaji Pan, Shizhuo Deng

Affiliations: College of Information Science and Engineering, Northeastern University; Foshan Graduate School of Innovation, Northeastern University; National Frontiers Science Center for Industrial Intelligence and Systems Optimization

AI summary: The paper proposes FRWKV-Plus, which adds a cross-branch spectral gate and a trust-gated residual correction to improve frequency-space time series forecasting while staying lightweight; experiments on seven standard benchmarks show consistently competitive performance.

AI Chinese abstract

准确且高效的长期多变量时间序列预测需要捕捉重复出现的时序结构，同时在众多变量和预测范围上保持较低的推理成本。频率域模型能紧凑地表示长程和周期性变化，但通常将频谱的实部和虚部作为弱耦合流处理，并将周期性线索当作普通输入特征，即使这些线索并不可靠。本文提出FRWKV-Plus，一种基于高效FRWKV骨干网络的轻量级周期感知频率域预测模型。FRWKV-Plus引入了跨分支频谱门，利用兄弟分支的摘要对每个频谱分支重新加权，并引入信任门控残差修正，在学习得到的、依赖数据的信任分数下，将紧凑的周期内上下文转换为对这些门的有界、符号灵活的调整。按其构造，该修正在初始化时保持恒等且严格有界，因此周期性证据可以细化基础交互，但不会主导或反转它。在七个标准基准上，FRWKV-Plus与强大的线性、频率域、递归式和基于Transformer的预测器相比始终具有竞争力，同时保持了骨干网络的轻量级特性。受控的三种子消融实验表明，每个组件都有贡献，收益在强周期性数据上较小，在更难的Exchange和ILI数据集上更为显著，且周期内上下文是影响最大的单一组件。实现已公开于https://github.com/yangqingyuan-byte/FRWKV-plus。

English abstract

Accurate and efficient long-term multivariate time series forecasting requires capturing recurring temporal structure while keeping inference cheap across many variables and horizons. Frequency-space models represent long-range and periodic variation compactly, but they typically process the real and imaginary spectral components as weakly coupled streams and treat periodic cues as ordinary input features, even when such cues are unreliable. This paper proposes FRWKV-Plus, a lightweight periodic-aware frequency-space forecasting model built on the efficient FRWKV backbone. FRWKV-Plus introduces a cross-branch spectral gate that reweights each spectral branch using a summary of its sibling branch, and a trust-gated residual correction that converts compact within-period context into a bounded, sign-flexible adjustment of these gates under a learned, data-dependent trust score. By construction, the correction is identity-preserving at initialization and strictly bounded, so periodic evidence can refine but never dominate or invert the base interaction. On seven standard benchmarks, FRWKV-Plus is consistently competitive with strong linear, frequency-domain, recurrent-style, and Transformer-based forecasters while preserving the lightweight profile of the backbone. Controlled three-seed ablations show that each component contributes, that the benefit is modest on strongly periodic data and pronounced on the harder Exchange and ILI datasets, and that the within-period context is the most influential single component. The implementation is publicly available at https://github.com/yangqingyuan-byte/FRWKV-plus.
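The two properties claimed for the correction, identity at initialization and strict boundedness, can be illustrated with a small Python sketch. The multiplicative tanh form, the function name, and the parameters `beta` and `trust` are assumptions chosen for illustration; the abstract does not give the actual FRWKV-Plus formula.

```python
import math

def corrected_gate(base_gate, context_score, trust, beta=0.5):
    """Apply a bounded multiplicative correction to a base gate value.

    base_gate: gate produced by the base interaction
    context_score: learned scalar summary of within-period context (0.0 at initialization)
    trust: data-dependent trust score in [0, 1]
    beta: maximum relative correction, must satisfy 0 <= beta < 1
    """
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must be in [0, 1)")
    if not 0.0 <= trust <= 1.0:
        raise ValueError("trust must be in [0, 1]")
    # The factor lies in [1 - beta, 1 + beta], so it is always positive.
    factor = 1.0 + beta * trust * math.tanh(context_score)
    return base_gate * factor

print(corrected_gate(0.8, 0.0, 1.0))  # 0.8: a zero context score leaves the gate unchanged
```

Because the factor stays positive and within a fixed band around 1, the context can raise or lower the gate (sign-flexible) but cannot flip its sign or scale it without limit, and a trust score of 0 switches the correction off entirely.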

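2606.04752 2026-06-09 cs.LG cs.AI updated version

An Empirical Audit of Input Encoders for Multi-Channel Signal Transformers

多通道信号Transformer输入编码器的实证审计

Ossi Lehtinen

Affiliations: Anthropic

AI summary: An empirical audit of eight input encoders on synthetic benchmarks and the real-world ETTh1 dataset finds that the standard linear projection (nn.Linear(C, d_model)) performs on par with more complex alternatives in most cases; only the shared-scalar and channel-independent baselines fall significantly behind.

Comments 21 pages, 1 figure, 8 tables. Code: https://github.com/OssiLehtinen/channel-encoder-audit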

面向中轴线的符号距离函数学习

Samuel Weidemaier, Christoph Norden-Smoch, Martin Rumpf

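Affiliations: Institute for Numerical Simulation, University of Bonn

AI summary: The paper proposes a new variational method for computing highly accurate global signed distance functions; a higher-order variational formulation accounts for the jump set of the gradient to improve accuracy.

AI Chinese abstract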

我们提出了一种新的变分方法，用于计算给定点云的高精度全局符号距离函数（SDF）。为此，通过高阶变分公式显式考虑SDF梯度的跳跃集，即表面的中轴线，该公式强制在远离此不连续集的方向上沿梯度方向线性增长。Eikonal方程和SDF的零水平集被作为约束条件。为了使该变分问题具有计算可行性，采用了一种相场近似方法，属于Ambrosio-Tortorelli类型。相关的相场函数隐式地描述了中轴线。该方法用于由无向点云表示的表面，使用神经网络近似SDF和相场函数。实验表明，该方法在近场和全局范围内均具有较高的准确性。定量和定性比较表明，所提出的方法具有优势。

English abstract

We propose a novel variational method to compute a highly accurate global signed distance function (SDF) to a given point cloud. To this end, the jump set of the gradient of the SDF, which coincides with the medial axis of the surface, is explicitly taken into account through a higher-order variational formulation that enforces linear growth along the gradient direction away from this discontinuity set. The eikonal equation and the zero-level set of the SDF are enforced as constraints. To make this variational problem computationally tractable, a phase field approximation of Ambrosio-Tortorelli type is employed. The associated phase field function implicitly describes the medial axis. The method is implemented for surfaces represented by unoriented point clouds using neural network approximations of both the SDF and the phase field. Experiments demonstrate the method's accuracy both in the near field and globally. Quantitative and qualitative comparisons with other approaches show the advantages of the proposed method.

2605.04913 2026-06-09 cs.CL cs.LG updated version

Rethinking Local Learning: A Cheaper and Faster Recipe for LLM Post-Training

重新思考局部学习：一种更便宜更快的LLM后训练配方

Hengyu Shi, Tianyang Han, Peizhe Wang, Zhiling Wang, Xu Yang, Junhao Su

Affiliations: Independent Researcher; D4 Lab; Southeast University

AI summary: The paper proposes LoPT, a local-learning post-training strategy that places a gradient boundary at the transformer midpoint, lowering memory cost, improving training efficiency, and retaining pretrained capabilities.

Comments 35 pages

AI Chinese abstract

LLM后训练通常通过完整深度传播任务梯度。尽管这种端到端结构简单通用，但将其任务适应与完整深度激活存储、长距离反向依赖和直接任务梯度访问预训练表示耦合在一起。我们主张这种完整深度反向耦合可能不必要的昂贵和侵入性，尤其是在后训练监督远比预训练狭窄时。为此，我们提出LoPT：局部学习后训练，一种简单的后训练策略，使梯度达到成为显式设计选择。LoPT在transformer中点放置单一梯度边界：后半部分块从任务目标学习，而前半部分块通过轻量级特征重建目标进行更新，以保留有用的表示并保持接口兼容性。LoPT缩短了任务引起的反向路径，同时限制了狭窄任务梯度对早期层表示的直接干扰。大量实验表明，LoPT在较低的内存成本、较高的训练效率和更好的保留预训练能力方面实现了竞争性性能。我们的代码可在：https://github.com/HumyuShi/LoPT获取。

English abstract

LLM post-training typically propagates task gradients through the full depth of the model. Although this end-to-end structure is simple and general, it couples task adaptation to full-depth activation storage, long-range backward dependencies and direct task-gradient access to pretrained representations. We argue that this full-depth backward coupling can be unnecessarily expensive and intrusive, particularly when post-training supervision is much narrower than pre-training. To this end, we propose LoPT: Local-Learning Post-Training, a simple post-training strategy that makes gradient reach an explicit design choice. LoPT places a single gradient boundary at the transformer midpoint: the second-half block learns from the task objective, while the first-half block is updated by a lightweight feature-reconstruction objective to preserve useful representations and maintain interface compatibility. LoPT shortens the task-induced backward path while limiting direct interference from narrow task gradients on early-layer representations. Extensive experiments demonstrate that LoPT achieves competitive performance with lower memory cost, higher training efficiency and better retention of pretrained capabilities. Our code is available at: https://github.com/HumyuShi/LoPT

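2605.27410 2026-06-09 quant-ph cs.LG cs.NE updated version

Zero-shot Quantum Neural Architecture Search

零样本量子神经架构搜索

Tung Dao, Son N. Tran, Huynh Thi Thanh Binh

Affiliations: Hanoi University of Science and Technology; Deakin University

AI summary: To address the high computational cost of designing circuit architectures for variational quantum algorithms, the paper builds on the convergence of the Gram matrix of the Quantum Neural Tangent Kernel to propose a zero-shot surrogate model and the MCTS-based framework MZeQAS, which searches for high-performing architectures efficiently without full training.

AI Chinese abstract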

变分量子算法是利用近期量子硬件的主要方法，通过参数化量子电路和经典优化来获得优势。尽管前景广阔，但VQA的实际部署受到设计平衡表达性、可训练性和硬件约束的量子电路架构的挑战。现有的基于进化的量子神经架构搜索方法解决了这些挑战，但由于候选电路的重复训练而导致高计算成本。在这项工作中，我们确定了量子神经正切核的Gram矩阵收敛的设置。基于这一观察，我们设计了一个零样本代理模型来估计候选性能而无需完整训练，显著加速了架构搜索过程。利用该代理，我们提出了MZeQAS，一种基于蒙特卡洛树搜索的零样本量子神经架构搜索框架，用于VQA。通过将基于代理的性能估计与MCTS探索相结合，MZeQAS高效地发现了高性能架构。实验结果表明，MZeQAS在搜索效率和解决方案质量方面均优于现有方法，为在噪声中等规模量子设备上推进VQA部署提供了一个可扩展且有效的框架。

English abstract

Variational Quantum Algorithms (VQAs) are a leading approach to exploiting near-term quantum hardware, leveraging parameterized quantum circuits and classical optimization to achieve advantage. Despite their promise, the practical deployment of VQAs is challenged by the difficulty of designing quantum circuit architectures that balance expressivity, trainability, and hardware constraints. Existing evolutionary-based quantum neural architecture search methods address these challenges but suffer from high computational costs due to repeated training of candidate circuits. In this work, we identify a setting in which the Gram matrix of the Quantum Neural Tangent Kernel converges. Building on this observation, we design a zero-shot surrogate model to estimate candidate performance without full training, significantly accelerating the architecture search process. Using this surrogate, we propose MZeQAS, a Monte Carlo Tree Search (MCTS)-based Zero-Shot Quantum Neural Architecture Search framework for VQAs. By integrating proxy-based performance estimation with MCTS exploration, MZeQAS efficiently discovers high-performing architectures. Experimental results demonstrate that MZeQAS outperforms existing approaches in terms of both search efficiency and solution quality, providing a scalable and effective framework for advancing VQA deployment on noisy intermediate-scale quantum devices.

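2606.00229 2026-06-09 cs.RO cs.AI cs.LG updated version

Continuous Reasoning for Vision-Language-Action

视觉-语言-动作的连续推理

Yueh-Hua Wu, Tatsuya Matsushima, Kei Ota

Affiliations: Airoa

AI summary: To address the granularity mismatch between language and continuous control in vision-language-action policies, the paper proposes a shareable, verifiable form of continuous reasoning that uses a Gaussian latent interface and a self-verification objective to raise robot task success rates.

Comments Project page: https://continuous-reasoning.airoa.io

AI Chinese abstract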

自然语言是语言模型和视觉-语言模型强大的推理媒介，但与连续控制的粒度不匹配。文本和显式子目标在任务级粒度上操作，而视觉-语言-动作（VLA）策略必须在更细的时间尺度上选择动作；因此，单个推理步骤可能跨越多个动作块，同时与当前所需动作保持弱耦合。这为VLA提出了一个不同的问题：什么应该扮演语言的角色？我们认为，有用的VLA推理媒介必须能够在模型实例之间共享，通过下游动作改进进行验证，并与时间扩展的控制结构对齐。基于这一观点，我们提出了视觉-语言-动作的连续推理。我们的模型首先以结构化连续思想集的形式预测连续推理，然后将其重用为块结构动作生成的共享上下文。仅凭更好的动作预测并不能证明推理的有效性：如果相同的内部媒介不能在模型实例之间共享，并且不能通过改进的下游控制独立验证，那么添加的潜变量可能只是模型私有的捷径，有助于在已见行为上表现，而不支持泛化的控制。因此，我们将连续推理实例化为一个共享的高斯潜变量接口，并使用自验证目标进行训练，其中指数移动平均教师必须在预测目标动作时成功消费学生的推理。实验上，连续推理提高了LIBERO-PRO的鲁棒性，并在真实机器人上表现强劲，在TX-G2（一种AgiBot G2兼容变体）上平均子任务成功率比π0.5提高了40.4%，在HSR上提高了26.3%。这表明VLA中的推理更多是关于一个可共享、可验证的内部动作语言，而不是额外的标记。

English abstract

Natural language is a powerful reasoning medium for language and vision-language models, but it is mismatched to the granularity of continuous control. Text and explicit subgoals operate at task-level granularity, whereas vision-language-action (VLA) policies must choose actions at a much finer temporal scale; a single reasoning step can therefore span many action chunks while remaining only weakly coupled to the action needed now. This suggests a different question for VLA: what should play the role of language? We argue that a useful VLA reasoning medium must be shareable across model instances, verifiable through downstream action improvement, and aligned with temporally extended control structure. Based on this view, we propose Continuous Reasoning for Vision-Language-Action. Our model first predicts continuous reasoning in the form of a structured set of continuous thoughts, then reuses them as shared context for chunk-structured action generation. Better action prediction alone does not certify good reasoning: if the same internal medium cannot be shared across model instances and independently verified through improved downstream control, the added latent may simply become a model-private shortcut that helps on seen behaviors without supporting generalizable control. We therefore instantiate continuous reasoning as a shared Gaussian latent interface and train it with a self-verification objective in which an exponential-moving-average teacher must successfully consume the student's reasoning when predicting target actions. Empirically, Continuous Reasoning improves LIBERO-PRO robustness and performs strongly on real robots, raising mean subtask success over π0.5 by 40.4% on TX-G2, an AgiBot G2-compatible variant, and 26.3% on HSR. This suggests that reasoning in VLA is less about extra tokens than about a shareable, verifiable internal language for action.

2606.06915 2026-06-09 cs.CL cs.AI cs.LG updated version

ThinkBooster: A Unified Framework for Seamless Test-Time Scaling of LLM Reasoning

ThinkBooster: 一种用于LLM推理无缝测试时扩展的统一框架

Vladislav Smirnov, Chieu Nguyen, Sergey Senichev, Minh Ngoc Ta, Ekaterina Fadeeva, Artem Vazhentsev, Daria Galimzianova, Nikolai Rozanov, Viktor Mazanov, Jingwei Ni, Tianyi Wu, Igor Kiselev, Mrinmaya Sachan, Iryna Gurevych, Preslav Nakov, Timothy Baldwin, Artem Shelmanov

Affiliations: MBZUAI; ETH Zürich; Imperial College London; NUS; Accenture; Innopolis University; Independent Researcher

AI summary: The paper presents ThinkBooster, a framework comprising a modular library, a benchmark that jointly evaluates performance and compute efficiency, and a deployable proxy service for test-time compute scaling of LLM reasoning; experiments on math and coding tasks characterize the performance-compute trade-offs.

AI Chinese abstract

测试时计算（TTC）扩展已成为一种强大的范式，通过在推理期间分配额外计算（例如，通过多样本生成和基于验证器的重新排序）来改进大型语言模型（LLM）推理。现有的TTC扩展策略和推理评分器仍然碎片化，在不一致的协议下进行评估，并且很少通过质量-成本权衡的视角进行分析。我们引入了ThinkBooster，一个用于LLM推理无缝测试时计算扩展的统一框架，它包括（i）一个模块化的Python库，实现了最先进的TTC扩展策略和评分器家族，（ii）一个联合评估性能和计算效率的基准，以及（iii）一个可部署的、兼容OpenAI的代理服务，使得将自适应推理无缝集成到实际应用中成为可能。我们还提供了一个演示可视化调试器，用于检查推理轨迹、中间选择决策和替代推理路径。在数学和编码任务上的实证结果揭示了TTC扩展策略和评分方法的性能-计算权衡，并表明ThinkBooster在实际任务中提供了实际收益。代码以MIT许可证在线提供。

通过自监督原则评估扩散模型的表示空间

Xiao Li, Yixuan Jia, Zekai Zhang, Xiang Li, Lianghe Shi, Jinxin Zhou, Zhihui Zhu, Liyue Shen, Qing Qu

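Affiliations: University of Science and Technology of China

AI summary: Inspired by self-supervised learning, the paper proposes ICR, a Fisher-based metric that decomposes features into invariant and residual components to jointly evaluate the representation and generation capabilities of diffusion models; invariance is strongest at intermediate noise levels, which also give the best classification performance, and ICR is a sensitive detector of memorization during training.

Comments First two authors contributed equally. Accepted at ICML 2026

AI Chinese abstract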

扩散模型已展现出卓越的生成能力，并成为强大的自监督表示学习器，但这两种能力之间的联系仍较少被探索。受自监督学习（SSL）启发，我们引入了一个框架，用于联合评估扩散模型的表示和生成能力。具体地，我们将特征分解为不变成分和残差成分，并推导出不变污染比（ICR），这是一种基于Fisher的度量，用于量化残差变化在特征空间中对不变信号的污染程度。我们利用该框架分析扩散模型的判别和生成行为。在表示方面，我们发现不变性在中间噪声水平达到峰值，同时该水平也产生最佳的下游分类性能。在生成方面，我们研究了在数据有限情况下训练如何从真正的泛化过渡到记忆化，并表明ICR可作为早期学习的敏感训练时指标：沿Fisher方向增加的残差能量标志着记忆化的开始，该指标仅从训练特征即可检测，无需外部评估器或保留测试集。总体而言，我们的结果表明，扩散模型可以通过其学习表示的几何结构从自监督视角进行监控。

English abstract

Diffusion models have demonstrated remarkable generative capabilities and have also emerged as powerful self-supervised representation learners, yet the connection between these two abilities remains less explored. Drawing inspiration from self-supervised learning (SSL), we introduce a framework for jointly evaluating the representation and generation capabilities of diffusion models. Specifically, we decompose features into invariant and residual components and derive the Invariant Contamination Ratio (ICR), a Fisher-based metric that quantifies how residual variation contaminates invariant signal in feature space. We use this framework to analyze both discriminative and generative behavior of diffusion models. On the representation side, we find that invariance peaks at intermediate noise levels, which also yield the best downstream classification performance. On the generative side, we study how training transitions from genuine generalization to memorization in data-limited regimes, and show that ICR serves as a sensitive training-time indicator of early learning: increasing residual energy along Fisher directions marks the onset of memorization, detectable from training features alone without external evaluators or held-out test sets. Overall, our results show that diffusion models can be monitored from a self-supervised perspective through the geometry of their learned representations.

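2606.09725 2026-06-09 cs.LG new submission

Disentanglement with Holographic Reduced Representations

基于全息约简表示的解缠

Jhonny J. Velasquez Olivera, Christo K. Thomas, Walid Saad

Affiliations: Virginia Tech; Worcester Polytechnic Institute

AI summary: The paper proposes an unsupervised disentanglement algorithm based on holographic reduced representations (HRR); the HRR unbinding operation provides an inductive bias for separating factors of variation in data, and an information-theoretic analysis proves that it induces approximately independent symbol-value pairs.

AI Chinese abstract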

解缠，即使用神经网络分离数据中的因子变化，仍然是机器学习中长期存在的挑战。先前的工作通过变分自编码器和生成对抗网络，结合变分推理和信息论约束来解决这个问题。与依赖连续表示的方法不同，我们提出一种将解缠表示视为符号结构的设计，其动机是构成分布样本的概念之间的组合关系。然而，在保持可微性的同时用神经网络学习离散符号结构是困难的，通常需要复杂的架构。为此，我们引入一种无监督学习算法，使用全息约简表示（HRR）进行神经解缠。我们表明，HRR解绑操作为分离因子提供了归纳偏置，并在潜在遍历和解缠度量方面取得了与基线相当的结果。我们通过HRR解绑通道的信息论分析补充了这些实证发现。我们证明解绑诱导了近似独立的符号-值对，并推导出每个槽的容量界限，量化了可以可靠编码的不同符号概念的数量，从而定量解释了朝向解缠的归纳偏置。得到的表示不同于标准的基于自编码器的模型，其潜在单元是求和在一起的向量，而不是低维潜在向量的标量维度。我们表明，这种HRR表示比其他解缠表示对噪声更鲁棒，并在一定信噪比范围内保持重建质量。

English abstract

Disentanglement, the separation of factors of variation in data using neural networks, remains a long-standing challenge in machine learning. Prior work has addressed this problem with variational autoencoders and generative adversarial networks that incorporate ideas from variational inference and information-theoretic constraints. In contrast to methods that rely on continuous representations, we propose a design that treats disentangled representations as symbolic structures, motivated by the compositional relationships among the concepts that make up samples from a distribution. However, learning discrete symbolic structures with neural networks while maintaining differentiability is difficult and often requires complex architectures. To address this, we introduce an unsupervised learning algorithm that uses holographic reduced representations (HRR) for neural disentanglement. We show that the HRR unbinding operation provides an inductive bias for separating factors and yields competitive results against baselines, as measured by latent traversals and disentanglement metrics. We complement these empirical findings with an information-theoretic analysis of the HRR unbinding channel. We prove that unbinding induces approximately independent symbol-value pairs and derive a per-slot capacity bound that quantifies how many distinct symbolic concepts can be reliably encoded, giving a quantitative account of the inductive bias toward disentanglement. The resulting representations differ from standard autoencoder-based models, in that their latent units are vectors that are summed together, rather than scalar dimensions of a low-dimensional latent vector. We show that this HRR representation is more robust to noise than other disentangled representations and maintains reconstruction quality across a range of SNRs.
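The binding and unbinding operations that the abstract relies on can be sketched in pure Python. This is the standard HRR construction (circular convolution to bind, circular correlation to unbind) applied to hypothetical "color" and "shape" factors; it is not the paper's learning algorithm, and the vector size, seed, and variable names are illustrative assumptions. It shows the representational point made above: the latent units are whole vectors summed into one trace, and unbinding with a key recovers a noisy copy of the value bound to that key.

```python
import math
import random

def random_vector(n, rng):
    """Draw an HRR vector with i.i.d. N(0, 1/n) entries."""
    return [rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(n)]

def bind(key, value):
    """Circular convolution: binds a key (symbol) to a value."""
    n = len(key)
    return [sum(key[i] * value[(j - i) % n] for i in range(n)) for j in range(n)]

def unbind(key, trace):
    """Circular correlation: approximate inverse of bind for the given key."""
    n = len(key)
    return [sum(key[i] * trace[(i + j) % n] for i in range(n)) for j in range(n)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

rng = random.Random(0)
n = 256
color_key, shape_key = random_vector(n, rng), random_vector(n, rng)
red, square = random_vector(n, rng), random_vector(n, rng)

# One latent vector holds both factors as a sum of bound key-value pairs.
trace = [c + s for c, s in zip(bind(color_key, red), bind(shape_key, square))]

# Unbinding with the color key yields a noisy estimate of the color value.
decoded_color = unbind(color_key, trace)
print(cosine(decoded_color, red), cosine(decoded_color, square))
```

The decoded vector should be far more similar to `red` than to `square`; the residual noise grows with the number of pairs stored in one trace, which is the kind of limit a per-slot capacity bound quantifies.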

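2606.07522 2026-06-09 cs.CL cs.LG cs.SI cross-list

Community-Specific Slang and Entity Detection via Semantic Shift in Fine-Tuned Language Models

通过微调语言模型中的语义偏移检测社区特定俚语和实体

Julia Kruk, Sanchita Porwal, Amitrajit Bhattacharjee, Mansi Phute

Affiliations: Georgia Institute of Technology

AI summary: The paper proposes an unsupervised method that automatically identifies slang, unique entities, and folklore in online community text by measuring the magnitude of each word's semantic shift between the model before and after fine-tuning.

Comments 6 pages, 6 figures, 2 tables

AI Chinese abstract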

我们提出一种无监督方法，通过隔离词汇中具有最大语义偏移幅度的词，来解析来自在线社区的俚语、独特实体和民俗用语。语义偏移定义为在社区特定文本语料上微调预训练大语言模型（LLM）后，词编码表示的演化。该值与基础模型和微调模型对词的编码表示之间的余弦相似度成反比。我们在从3个Reddit子版块（r/Technology、r/Gaming、r/WorldofWarcraft）收集的文本语料上微调DistilRoBERTa模型，对词汇上的余弦相似度分布进行建模，并表明通过提取底部10百分位的数据，可以成功解析对社区具有独特意义的词。相反，我们表明顶部10百分位的数据由具有相对普遍语义的词组成。

English abstract

We propose an unsupervised method of resolving slang, unique entities, and folklore from online communities by isolating words in the lexicon that have the highest magnitude of semantic shift. Semantic shift is defined as the evolution of a word's encoded representation as a result of fine-tuning a pretrained Large Language Model (LLM) on a community-specific text corpus. This value is inversely proportional to the cosine similarity between the base model's encoded representation of a word, and a fine-tuned model's encoded representation. We fine-tune the DistilRoBERTa model on text corpora collected from 3 Reddit subreddits (r/Technology, r/Gaming, r/WorldofWarcraft), model a distribution of cosine similarity over the lexicon, and show that one can successfully resolve words that have unique significance to the community by pulling data in the bottom 10-percentile. In contrast, we show that data in the top 10-percentile consist of words that carry relatively universal semantics.
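The selection step described above, ranking words by base-versus-fine-tuned cosine similarity and keeping the lowest 10%, can be sketched as follows. The sketch assumes word embeddings have already been extracted from both models into plain dictionaries; the function names and the toy two-dimensional vectors are hypothetical and stand in for real DistilRoBERTa representations.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def most_shifted_words(base_vectors, tuned_vectors, fraction=0.10):
    """Return the words whose base/fine-tuned cosine similarity is lowest.

    base_vectors, tuned_vectors: dicts mapping word -> embedding (list of floats).
    fraction: share of the shared vocabulary to keep (0.10 = bottom 10%).
    """
    shared = sorted(set(base_vectors) & set(tuned_vectors))
    ranked = sorted(shared, key=lambda w: cosine_similarity(base_vectors[w], tuned_vectors[w]))
    keep = max(1, math.ceil(fraction * len(ranked)))
    return ranked[:keep]

# Toy vocabulary: nine words keep their representation, "raid" moves after fine-tuning.
common = ["the", "and", "game", "play", "team", "win", "new", "time", "good"]
base = {w: [1.0, 0.0] for w in common}
tuned = {w: [1.0, 0.0] for w in common}
base["raid"] = [1.0, 0.0]
tuned["raid"] = [0.0, 1.0]

print(most_shifted_words(base, tuned))  # ['raid']
```

Low similarity corresponds to a large semantic shift, so the bottom of the ranking is where community-specific terms are expected to appear, while the top of the ranking holds words whose meaning barely moved.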

2606.07725 2026-06-09 physics.geo-ph cs.LG cross-list

GNSS-FM: A Self-Supervised Foundation Model for Daily GNSS Displacement Time Series

GNSS-FM：用于日常GNSS位移时间序列的自监督基础模型

Nick Teutschmann, Laura Crocetti, Fanny Lehmann, Leonardo Trentini, Benedikt Soja

Affiliations: Institute of Geodesy and Photogrammetry, ETH Zurich; ETH AI Center

AI summary: The paper presents GNSS-FM, a self-supervised foundation model with a dual-stream input and masked latent prediction pre-training, which outperforms strong baselines on displacement forecasting and seismic step localization.

AI Chinese abstract

来自全球导航卫星系统（GNSS）的位移时间序列对于广泛的应用至关重要，包括监测构造地壳变形和研究地震周期的不同阶段。机器学习方法已被证明在GNSS应用中具有前景；然而，大多数方法仍然是完全监督的。这造成了瓶颈，因为标记数据稀缺，尽管大量未标记的GNSS数据可免费获取。我们提出了GNSS-FM，一个用于日常GNSS时间序列的自监督基础模型。该模型使用结合位移和速度类增量的双流输入，并通过掩码潜在预测目标进行预训练，该目标采用从wav2vec 2.0改编的向量量化目标，并针对大地测量数据进行了若干修改。在来自全球超过17,000个GNSS站的数据上预训练后，对学习到的码本的分析表明，这些表示捕获了GNSS位移数据中的主要信号类型，包括地震偏移、构造漂移和季节性模式。该基础模型随后在两个下游任务上进行微调，即90天位移预测和地震阶跃定位，在这两个任务中，它都优于强大的任务特定基线。这些结果表明，自监督预训练是GNSS时间序列分析的一种有前景的方法。

English abstract

Displacement time series from Global Navigation Satellite Systems (GNSS) are essential for a wide range of applications, including monitoring tectonic crustal deformations and investigating the different stages of the earthquake cycle. Machine learning methods have proven promising for GNSS applications; however, most remain fully supervised. This creates a bottleneck as labeled data are scarce, even though large amounts of unlabeled GNSS data are freely available. We present GNSS-FM, a self-supervised foundation model for daily GNSS time series. The model uses a dual-stream input combining displacement and velocity-like increments, and is pretrained using a masked latent prediction objective with vector-quantized targets adapted from wav2vec 2.0, with several modifications for geodetic data. Pretrained on data from over 17,000 globally distributed GNSS stations, an analysis of the learned codebook suggests that the representations capture the main signal types in GNSS displacement data, including seismic offsets, tectonic drift, and seasonal patterns. The foundation model is later fine-tuned on two downstream tasks, namely 90-day displacement forecasting and seismic step localization, where it outperforms strong task-specific baselines in both cases. These results show that self-supervised pretraining is a promising approach for GNSS time series analysis.

2606.08236 2026-06-09 cs.CL cs.LG cross-list

Shared Semantics, Divergent Mechanisms: Unsupervised Feature Discovery by Aligning Semantics and Mechanisms

共享语义，不同机制：通过对齐语义与机制的无监督特征发现

Hyunjin Cho, Youngji Roh, Jaehyung Kim

Affiliations: University of California, Berkeley

AI summary: The paper proposes an unsupervised method that clusters model continuations by semantic embeddings and attribution signatures to uncover hidden mechanistic modes, complementing circuit analysis.

Comments 40 pages

Journal ref: ICML 2026 Spotlight

AI Chinese abstract

随着大型语言模型越来越多地部署在高风险场景中，人们越来越需要工具来审计不仅模型输出，还包括产生这些输出的内部计算。电路分析是机械可解释性中的核心方法，但通常是目标条件化的，解释单个提示与选定补全的配对。这种目标条件化设置可能掩盖模型续写分布中的异质性。我们引入了分布级无监督特征发现，该方法使用语义内容和序列级机械归因对采样续写进行聚类，而无需手动指定目标输出。我们的方法用语义嵌入和前缀到续写的归因签名表示每个续写，然后优化一个率失真目标，该目标在语义一致性、机械一致性和聚类粒度之间进行权衡。在聚类和引导分析中，发现的聚类暴露了单视图基线遗漏的续写模式，并提供了干预证据，表明聚类签名对应于可操作的机械因素。总的来说，我们的方法通过提供对模型续写分布背后机制的可扩展审计，补充了电路分析和行为评估。

English abstract

As large language models are increasingly deployed in high-stakes settings, there is a growing need for tools that audit not only model outputs but also the internal computations that produce them. Circuit analysis is a central approach in mechanistic interpretability, but it is typically target-conditioned, explaining a single prompt paired with a chosen completion. This target-conditioned setup can obscure heterogeneity across a model's continuation distribution. We introduce distribution-level unsupervised feature discovery, which clusters sampled continuations using both semantic content and sequence-level mechanistic attributions, without manually specifying target outputs. Our method represents each continuation with a semantic embedding and a prefix-to-continuation attribution signature, then optimizes a rate-distortion objective that trades off semantic coherence, mechanistic consistency, and cluster granularity. Across clustering and steering analyses, the discovered clusters expose continuation modes that single-view baselines miss and provide interventional evidence that cluster signatures correspond to actionable mechanistic factors. Overall, our approach complements circuit analysis and behavioral evaluation by providing a scalable audit of the mechanisms underlying a model's continuation distribution.

2606.08496 2026-06-09 cs.CL cs.LG cross-list

SAEExplainer: Interpreting SAE Features with Activation-Guided Preference Optimization

SAEExplainer: 基于激活引导偏好优化的SAE特征解释

Jingyi He, Haiyan Zhao, Ruxue Shi, Yanguang Liu, Xin Wang, Fei Sun, Mengnan Du

Affiliations: Shanghai Jiao Tong University; NJIT; Jilin University; Institute of Computing Technology, CAS; The Chinese University of Hong Kong, Shenzhen

AI summary: The paper proposes SAEExplainer, a framework that uses activation scores as a reward signal and iteratively self-corrects foundational explanations over two rounds of optimization, reducing explanation hallucinations and reinforcing causal triggering patterns.

AI Chinese abstract

尽管稀疏自编码器（SAE）通过将密集表示分解为稀疏特征缓解了大语言模型（LLM）的不透明性，但解释这些特征仍然是一个核心挑战。然而，当前的解释方法通常运行在开环范式下，未能利用机械反馈进行进一步优化。在本文中，我们提出SAEExplainer，一个利用激活分数作为客观奖励信号来训练模型进行自我纠正和迭代自举的训练框架。通过两轮优化过程迭代验证和纠正基础解释，SAEExplainer实现了其解释能力的持续提升。该机制显著减少了解释幻觉并强化了因果触发模式。大量实验表明，我们的方法在大多数指标上优于已有基线，特别是在因果触发和判别性激活方面。

English abstract

Although Sparse Autoencoders (SAEs) have mitigated the opacity of large language models (LLMs) by decomposing dense representations into sparse features, explaining these features remains a central challenge. Current explanation methods, however, typically operate within an open-loop paradigm, failing to leverage mechanistic feedback for further refinement. In this paper, we propose SAEExplainer, a training framework that uses activation scores as an objective reward signal to train the model for self-correction and iterative bootstrapping. By iteratively verifying and correcting foundational explanations through a two-round optimization process, SAEExplainer achieves continuous improvement in its explanatory capabilities. This mechanism significantly reduces explanation hallucinations and reinforces causal triggering patterns. Extensive experiments demonstrate that our approach improves upon established baselines across most metrics, especially in causal triggering and discriminative activation.

2606.08678 2026-06-09 cs.SD cs.LG 交叉投稿

Speaker-Invariant Representation Learning for Spoofing Detection via Gradient Reversal and A Variational Information Bottleneck

基于梯度反转和变分信息瓶颈的说话人不变表示学习用于欺骗检测

Anh-Tuan Dao, Driss Matrouf, Mickael Rouvier, Nicholas Evans

发表机构 * Avignon Université（阿维尼翁大学）； EURECOM

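AI summary: To address the poor generalization caused by speaker bias in spoofing detection, proposes a teacher-student framework that uses a gradient reversal layer and a variational information bottleneck to disentangle identity information, achieving a 25.7% relative EER reduction across nine datasets.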

AI中文摘要

先进的生成语音技术可能破坏语音生物识别的可靠性。虽然欺骗检测系统在域内条件下评估时表现出色，但对域外设置的泛化能力通常较差。在本文中，我们表明此类问题可能由说话人偏差引起，即模型学习个体声音特征而非操作或生成的标记。我们提出了一种用于说话人不变欺骗检测的教师-学生框架，该框架无需说话人标签即可解耦身份。我们利用预训练的说话人识别教师通过梯度反转层指导学生模型。为了控制抑制与语音身份相关线索和保留与欺骗检测相关线索之间的平衡，我们集成了变分信息瓶颈。在九个数据集上的评估表明，与MHFA基线相比，我们的模型实现了EER相对降低25.7%。

英文摘要

Sophisticated generative speech technology can undermine the reliability of voice biometrics. While spoofing detection systems excel when assessed under in-domain conditions, generalisation to out-of-domain settings is often poor. In this paper, we show that such issues could be caused by speaker bias, where models learn individual voice traits rather than markers of manipulation or generation. We propose a teacher-student framework for speaker-invariant spoofing detection that disentangles identity without requiring speaker labels. We leverage a pre-trained speaker recognition teacher to guide a student model via a gradient reversal layer. To control the balance between suppressing cues related to voice identity and preserving those related to spoofing detection, we integrate a Variational Information Bottleneck. Evaluations across nine datasets show that our model achieves a 25.7% relative reduction in EER compared to the MHFA baseline.
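
To make the gradient reversal idea concrete, here is a minimal pure-Python sketch of what such a layer does: it is the identity in the forward pass and negates (and scales) the gradient in the backward pass. This is a generic illustration, not the authors' implementation; the class name `GradientReversal`, the parameter `lam`, and the example values are all assumptions made for this sketch.

```python
class GradientReversal:
    """Identity in the forward pass; flips and scales gradients in the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam  # reversal strength (hypothetical parameter name)

    def forward(self, x):
        # Features pass through unchanged.
        return list(x)

    def backward(self, grad_output):
        # The gradient coming from the speaker branch is negated before it
        # reaches the encoder, pushing the encoder away from speaker cues.
        return [-self.lam * g for g in grad_output]


grl = GradientReversal(lam=0.5)
features = grl.forward([0.2, -1.0, 3.0])       # illustrative feature values
encoder_grad = grl.backward([1.0, -2.0, 4.0])  # illustrative upstream gradient
```

In an autograd framework the same behaviour would normally be written as a custom differentiable function rather than an explicit `backward` call.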

2606.09181 2026-06-09 cs.CV cs.LG 交叉投稿

Counterfactual Reasoning for Fine-Grained Evidence Disentanglement in VideoQA

用于视频问答中细粒度证据分离的反事实推理

Zhou Du, Hamid Krim, Xiao Wu, Zhaoquan Yuan, Liangwei Li, Keisuke Fujii

发表机构 * School of Optoelectronic Science and Engineering, University of Electronic Science and Technology of China（电子科技大学光电科学与工程学院）

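AI summary: Proposes CREDiT, a counterfactual reasoning framework that uses a structural causal model to decompose cross-modal representations in VideoQA into causal and non-causal components and performs feature-level causal interventions under independence constraints, improving answer accuracy and reasoning reliability.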

Comments 10 pages, 6 figures

AI中文摘要

近期视频多模态模型的进展显著提升了视频问答性能。然而，这些系统往往依赖于虚假的统计相关性而非与答案相关的因果证据，导致推理不忠实且脆弱，尤其在复杂真实场景中。现有方法要么依赖跨模态相关性、昂贵的精心策划的训练资源，要么依赖不充分的因果假设和约束，且通常操作在时间区间级别。因此，它们未能明确地将因果视觉线索与混杂因素分离，且提供的细粒度证据定位有限。为解决此问题，我们提出了一种用于细粒度证据分离的反事实推理框架（CREDiT）。CREDiT使用结构因果模型形式化视频问答过程，并在独立性和最小性约束下学习明确分解为因果和非因果成分的跨模态表示。为促进忠实的分离，我们引入特征级因果干预，构建近似因果效应同时抑制非因果相关性的反事实输入。在NExT-GQA、SportsQA和SPORTU-video上的大量实验表明，CREDiT在通用和复杂体育场景中均能持续提升答案准确性和推理可靠性，从而构建更可信的视频问答系统。

英文摘要

Recent advances in video multimodal models have significantly improved VideoQA performance. However, these systems often rely on spurious statistical correlations rather than answer-relevant causal evidence, resulting in unfaithful and brittle reasoning, especially in complex real-world scenarios. Existing methods either rely on cross-modality correlations, costly curated training resources, or insufficient causal assumptions and constraints, and typically operate at the time-interval level. As a result, they fail to explicitly disentangle causal visual cues from confounders and provide limited fine-grained evidence localization. To address this issue, we propose a Counterfactual Reasoning framework for fine-grained Evidence Disentanglement (CREDiT). CREDiT formulates the VideoQA process using a structural causal model and learns cross-modality representations that are explicitly decomposed into causal and non-causal components under independence and minimality constraints. To facilitate faithful disentanglement, we introduce feature-level causal interventions and construct counterfactual inputs that approximate causal effects while suppressing non-causal correlations. Extensive experiments on NExT-GQA, SportsQA, and SPORTU-video demonstrate that CREDiT consistently improves answer accuracy and reasoning reliability across both generic and complex sports scenarios, leading to more trustworthy VideoQA systems.

2606.09331 2026-06-09 cs.MM cs.AI cs.LG 交叉投稿

Mobility-Embedded POIs: Learning Place Identity and Usage from Human Mobility

Maria Despoina Siampou, Shushman Choudhury, Shang-Ling Hsu, Neha Arora, Cyrus Shahabi

发表机构 * University of California, Berkeley（加州大学伯克利分校）； Stanford University（斯坦福大学）

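AI summary: Proposes the ME-POIs framework, which combines large-scale human mobility data with language-model embeddings via contrastive learning to learn place function, and outperforms text-only and mobility-only baselines on five map enrichment tasks.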

AI中文摘要

近期地理空间基础模型的进展强调了学习真实世界位置（特别是人类活动集中的兴趣点POI）通用表示的重要性。然而，现有方法主要关注从静态文本元数据中提取的场所身份，或学习与轨迹上下文相关的表示，这些表示捕捉的是移动规律而非场所的实际使用方式（即POI的功能）。我们认为POI功能是通用POI表示中缺失但关键的信号。我们提出了移动性嵌入的POI（ME-POIs），这是一个框架，通过大规模人类移动数据增强从语言模型派生的POI嵌入，以学习基于真实世界使用的、以POI为中心且上下文无关的表示。ME-POIs将个体访问编码为时间上下文化的嵌入，并通过对比学习将其与可学习的POI表示对齐，以捕捉跨用户和时间的使用模式。为解决长尾稀疏性问题，我们提出了一种新机制，从附近频繁访问的POI跨多个空间尺度传播时间访问模式。我们在五个新提出的地图丰富任务上评估ME-POIs，测试其捕捉POI身份和功能的能力。在所有任务中，用ME-POIs增强文本嵌入始终优于纯文本和纯移动性基线。值得注意的是，仅使用移动数据训练的ME-POIs在某些任务上能超越纯文本模型，凸显了POI功能是准确且可泛化的POI表示的关键组成部分。

英文摘要

Recent progress in geospatial foundation models highlights the importance of learning general-purpose representations for real-world locations, particularly points-of-interest (POIs) where human activity concentrates. Existing approaches, however, focus primarily on place identity derived from static textual metadata, or learn representations tied to trajectory context, which capture movement regularities rather than how places are actually used (i.e., the POI's function). We argue that POI function is a missing but essential signal for general POI representations. We introduce Mobility-Embedded POIs (ME-POIs), a framework that augments POI embeddings derived from language models with large-scale human mobility data to learn POI-centric, context-independent representations grounded in real-world usage. ME-POIs encodes individual visits as temporally contextualized embeddings and aligns them with learnable POI representations via contrastive learning to capture usage patterns across users and time. To address long-tail sparsity, we propose a novel mechanism that propagates temporal visit patterns from nearby, frequently visited POIs across multiple spatial scales. We evaluate ME-POIs on five newly proposed map enrichment tasks, testing its ability to capture both the identity and function of POIs. Across all tasks, augmenting text-based embeddings with ME-POIs consistently outperforms both text-only and mobility-only baselines. Notably, ME-POIs trained on mobility data alone can surpass text-only models on certain tasks, highlighting that POI function is a critical component of accurate and generalizable POI representations.

2604.27810 2026-06-09 cs.LG 版本更新

Hyper-Dimensional Fingerprints as Molecular Representations

超维指纹作为分子表示

Jonas Teufel, Luca Torresi, André Eberhard, Pascal Friederich

发表机构 * Karlsruhe Institute of Technology (KIT), Institute of Nanotechnology (INT)（卡尔斯鲁厄理工学院（KIT），纳米技术研究所）； Karlsruhe Institute of Technology (KIT), Institute of Anthropomatics and Robotics (IAR)（卡尔斯鲁厄理工学院（KIT），人机学与机器人研究所）

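AI summary: Proposes hyperdimensional fingerprints (HDF), which generate deterministic molecular representations through algebraic operations on high-dimensional vectors without training, perform well across a range of property prediction tasks, and preserve molecular similarity consistently at low dimensionalities.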

Comments Code: https://doi.org/10.5281/zenodo.19373621

AI中文摘要

计算分子表示是虚拟筛选、性质预测和材料发现的基础。传统指纹效率高但因基于哈希的压缩丢失结构信息，特别是在低维情况下。通过图神经网络学习的表示恢复了这种表达性，但需要任务特定的训练和大量计算资源。本文引入超维指纹（HDF），用高维向量的代数运算替代消息传递神经网络的学习转换，生成无需训练的确定性分子表示。在多样化的属性预测基准上，HDF在大多数任务中优于传统指纹，且在不同数据集和模型间表现出更高的一致性。关键的是，HDF嵌入保持分子相似性：在32维时，HDF空间的距离与图编辑距离的皮尔逊相关系数达到0.9，而摩根指纹在同等尺寸下仅为0.55。这种结构保真度在低维情况下持续，允许简单的最近邻回归在64个组件中保持预测性。进一步在贝叶斯分子优化中展示了实际影响，HDF基于的替代模型在摩根指纹表现与随机搜索相当的领域中显著提高了样本效率。HDF因此提供了一种通用的、无需训练的替代方案，表明传统固定长度指纹中接受的信息损失是哈希编码方案的限制，而非指纹范式本身。

英文摘要

Computational molecular representations underpin virtual screening, property prediction, and materials discovery. Conventional fingerprints are efficient and deterministic but lose structural information through hash-based compression, particularly at low dimensionalities. Learned representations from graph neural networks recover this expressiveness but require task-specific training and substantial computational resources. Here we introduce hyperdimensional fingerprints (HDF), which replace the learned transformations of message-passing neural networks with algebraic operations on high-dimensional vectors, producing deterministic molecular representations without any training. Across diverse property prediction benchmarks, HDF outperforms conventional fingerprints in the majority of tasks while exhibiting greater consistency across datasets and models. Crucially, HDF embeddings preserve molecular similarity faithfully: at 32 dimensions, distances in HDF space achieve a 0.9 Pearson correlation with graph edit distance, compared to 0.55 for Morgan fingerprints at equivalent size. This structural fidelity persists at low dimensions where hash-based methods degrade, allowing simple nearest-neighbor regression to remain predictive with as few as 64 components. We further demonstrate the practical impact in Bayesian molecular optimization, where HDF-based surrogate models achieve substantially improved sample efficiency in regimes where Morgan fingerprints perform comparably to random search. HDF thus provides a general-purpose, training-free alternative to conventional molecular fingerprints, suggesting that the information loss long accepted as inherent to fixed-length fingerprints is a limitation of the hash-based encoding scheme rather than the fingerprint paradigm itself.
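
The abstract does not spell out the algebraic operations, so the following is only a rough sketch of how a training-free, deterministic fingerprint could be built from standard hyperdimensional-computing primitives (binding as elementwise product, bundling as elementwise sum) in a single message-passing-like round. Everything here is an assumption for illustration: the names `hypervector`, `bind`, `bundle`, `encode`, the bipolar encoding, and the dimensionality `DIM` are not taken from the paper.

```python
import random

DIM = 1024  # illustrative dimensionality, not a value from the paper


def hypervector(symbol, dim=DIM):
    """Deterministic random bipolar vector for an atom symbol (seeded by the symbol)."""
    rng = random.Random(symbol)
    return [rng.choice((-1, 1)) for _ in range(dim)]


def bind(u, v):
    """Binding: elementwise product."""
    return [a * b for a, b in zip(u, v)]


def bundle(vectors):
    """Bundling: elementwise sum."""
    return [sum(col) for col in zip(*vectors)]


def encode(atoms, bonds, dim=DIM):
    """One round: bind each atom with the bundle of its neighbours, then bundle all atoms."""
    base = [hypervector(a, dim) for a in atoms]
    neigh = [[] for _ in atoms]
    for i, j in bonds:
        neigh[i].append(base[j])
        neigh[j].append(base[i])
    mixed = [bind(base[i], bundle(neigh[i])) if neigh[i] else base[i]
             for i in range(len(atoms))]
    return bundle(mixed)


# Heavy-atom skeleton of ethanol (C-C-O), hydrogens omitted for brevity.
ethanol = encode(["C", "C", "O"], [(0, 1), (1, 2)])
```

Because every step is seeded and algebraic, the same molecule always maps to the same vector, which is the property the abstract emphasises; the actual HDF construction may differ substantially.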

2605.06582 2026-06-09 cs.LG cs.CL cs.SD 版本更新

PairAlign: A Framework for Sequence Tokenization via Self-Alignment with Applications to Audio Tokenization

PairAlign：一种通过自对齐的序列标记化框架及其在音频标记化中的应用

Adhiraj Banerjee, Vipul Arora

发表机构 * Department of Electrical Engineering, Indian Institute of Technology, Kanpur（电子工程系，印度理工学院，坎浦尔）

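AI summary: PairAlign achieves compact audio tokenization through sequence-level self-alignment, using conditional sequence generation to improve token consistency, length control, and edit similarity.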

Comments 57 pages main content, 109 total pages, 9 Figures, pre-print, Under Review

AI中文摘要

许多感官数据的操作——比较、记忆、检索和推理——自然地在离散符号结构上表达。在语言中，这种接口由标记提供；在音频中，必须学习。现有音频标记器依赖于量化、聚类或编解码器重建，将标记局部分配，因此序列一致性、紧凑性、长度控制、终止和编辑相似性很少被直接优化。我们引入PairAlign，一种通过序列级自对齐实现紧凑音频标记化的框架。PairAlign将标记化视为条件序列生成：编码器将语音映射为连续条件，自回归解码器从BOS开始生成标记，学习标记身份、顺序、长度和EOS位置。给定两个保持内容的视图，每个视图的序列在另一个视图的表示下被训练为可能，而无关示例提供竞争序列。这为可扩展的编辑距离保留代理，同时抑制许多对一的坍缩。PairAlign从VQ式标记化开始，并通过EMA教师目标、交叉配对教师强制、前缀损坏、似然对比和长度控制进行优化。在3秒语音上，PairAlign学习紧凑、非退化的序列，具有广泛的词汇使用和强跨视图一致性。在检索测试中，它保留编辑距离搜索，同时将存档标记数量减少55%。连续扫频探针显示其局部重叠低于密集几何标记器，但具有更强的长度控制和在100毫秒移位下的受约束编辑轨迹。PairAlign是一种序列符号预测学习者：像JEPA式目标一样，它从另一个视图预测一个抽象目标作为学习的可变长度符号序列，而不是连续潜在变量。

英文摘要

Many operations on sensory data -- comparison, memory, retrieval, and reasoning -- are naturally expressed over discrete symbolic structures. In language this interface is given by tokens; in audio, it must be learned. Existing audio tokenizers rely on quantization, clustering, or codec reconstruction, assigning tokens locally, so sequence consistency, compactness, length control, termination, and edit similarity are rarely optimized directly. We introduce PairAlign, a framework for compact audio tokenization through sequence-level self-alignment. PairAlign treats tokenization as conditional sequence generation: an encoder maps speech to a continuous condition, and an autoregressive decoder generates tokens from BOS, learning token identity, order, length, and EOS placement. Given two content-preserving views, each view's sequence is trained to be likely under the other's representation, while unrelated examples provide competing sequences. This gives a scalable surrogate for edit-distance preservation while discouraging many-to-one collapse. PairAlign starts from VQ-style tokenization and refines it with EMA-teacher targets, cross-paired teacher forcing, prefix corruption, likelihood contrast, and length control. On 3-second speech, PairAlign learns compact, non-degenerate sequences with broad vocabulary usage and strong cross-view consistency. On retrieval tests, it preserves edit-distance search while reducing archive token count by 55%. A continuous-sweep probe shows lower local overlap than a dense geometric tokenizer, but stronger length control and bounded edit trajectories under 100 ms shifts. PairAlign is a sequence-symbolic predictive learner: like JEPA-style objectives, it predicts an abstract target from another view as a learned variable-length symbolic sequence, not a continuous latent.
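
As a point of reference for the edit-distance retrieval mentioned above, here is a minimal sketch of Levenshtein distance over token sequences. It is the generic textbook dynamic program, not PairAlign's code; the function name `edit_distance` and the example token IDs are illustrative assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance between two token sequences (lists, tuples, or strings)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# Two hypothetical token-ID sequences for two views of the same utterance.
view_a = [17, 4, 4, 92, 8]
view_b = [17, 4, 92, 8]
distance = edit_distance(view_a, view_b)
```

Shorter token sequences make this quadratic-time comparison cheaper, which is one reason compactness matters for edit-distance search.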

2605.16823 2026-06-09 cs.LG 版本更新

VQ-Atom: Semantic Discretization of Local Atomic Environments for Molecular Representation Learning

原子作为语言：VQ-Atom：用于分子表示学习的语义离散化

Takayuki Kimura

发表机构 * Atoms as Language, LLC（Atoms as Language公司）

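AI summary: Proposes VQ-Atom, a semantic discretization framework for molecular representation learning that converts continuous atom-level graph representations into discrete tokens corresponding to local chemical environments, improving molecular representation learning.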

AI中文摘要

分子表示学习已成为AI驱动药物发现中的核心方法，但现有分子分词如SMILES仍主要是语法性的，无法自然对齐具有化学意义的子结构。在本文中，我们介绍了VQ-Atom，一种语义离散化框架，将连续的原子级图表示转换为对应局部化学环境的离散标记。利用图神经网络嵌入和向量量化，原子被分配到代表化学有意义的原子上下文的代码本条目中。这些离散标记定义了一种适合基于Transformer的预训练的分子语言。我们评估了VQ-Atom在蛋白质-配体相互作用预测中的表现，采用蛋白质冷分割设置且不依赖3D结构信息。实验结果表明，与传统分词方法相比，VQ-Atom在预测性能上始终有所提升，表明语义基础的离散化可以显著增强分子表示学习。我们的发现表明，分词设计本身在使化学领域有效语言建模中起着关键作用。

英文摘要

Large language models succeed by combining large-scale pretraining with meaningful discrete tokens. In molecular machine learning, SMILES is widely used as a token representation, but it is primarily a linearization format for molecular graphs rather than a semantic decomposition of chemistry. We propose VQ-Atom, a semantic tokenization framework that assigns discrete atom-level tokens based on local chemical environments via vector quantization. Unlike SMILES tokens, VQ-Atom tokens encode graph-local chemical context and are aligned with molecular structure. On protein-cold drug--target interaction prediction using the KIBA dataset, VQ-Atom substantially improves global ranking performance, achieving AUROC of 0.79 while substantially outperforming both SMILES-based and continuous molecular representations under an identical downstream architecture. Furthermore, VQ-Atom enables approximately 3 times faster downstream training than continuous atom-level representations by replacing per-atom continuous features with reusable discrete tokens. These results suggest that molecular tokenization is not merely a preprocessing step, but a central design choice. In particular, well-structured tokens can encode substantial chemical semantics, reducing the burden on downstream learning. VQ-Atom can be interpreted as defining a molecular language, where tokens correspond to chemically meaningful atomic environments, suggesting that token design may constitute an additional axis of machine learning research alongside architecture, objectives, and optimization.
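
The core vector-quantization step, assigning each continuous atom embedding to its nearest codebook entry, can be sketched as follows. This is a generic nearest-neighbour assignment under squared Euclidean distance, not the VQ-Atom implementation; the function name `quantize`, the 2-D embeddings, and the codebook values are made up for illustration.

```python
def quantize(embeddings, codebook):
    """Map each embedding to the index of its nearest codebook vector (squared L2)."""
    def sq_dist(u, v):
        return sum((ui - vi) ** 2 for ui, vi in zip(u, v))

    return [min(range(len(codebook)), key=lambda k: sq_dist(e, codebook[k]))
            for e in embeddings]


# Hypothetical 2-D atom embeddings and a 3-entry codebook (illustrative values only).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
atom_embeddings = [[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7]]
tokens = quantize(atom_embeddings, codebook)
```

The resulting integer indices play the role of discrete tokens that a Transformer can consume in place of per-atom continuous features.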

2605.24942 2026-06-09 cs.LG cs.AI 版本更新

Riemannian-Manifold Steering: Geometry-Aware Generative Autoencoders for Label-Free Steering

黎曼流形操控：用于无标签操控的几何感知生成自编码器

Narmeen Oozeer, Shivam Raval, Philip Quirke, Manikandan Ravikiran, Jeff Phillips, Shriyash Upadhyay, Amirali Abdullah

发表机构 * Martian ； Harvard University（哈佛大学）； Thoughtworks ； University of Utah（犹他大学）

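AI summary: Recasts language-model steering as Riemannian geodesic computation on activation space, using an encoder trained on output-space Hellinger distances to achieve manifold steering without labels or topology priors.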

AI中文摘要

语言模型的操控——干预其内部激活以改变下游行为——最近已从线性插值扩展到非线性方法，如角度操控和核化操控，这些方法定义了干预变换，而无需在激活空间中的路径上学习显式几何。新引入的几何感知流形方法确实学习了这样的几何，但需要带标签的类中心以及预设的循环或顺序结构。这些假设限制了流形操控的应用范围，因为现有构造需要带标签的中心和兼容的边界条件。我们将流形操控更广泛地重新定义为激活空间上的黎曼测地线计算，将线性操控和带标签样条操控恢复为特定度量选择下的测地线。该框架内一个有原则的度量是输出空间Hellinger距离拉回到激活空间；我们通过一个在小型概念-令牌模式上基于输出距离训练的学习编码器来近似该度量——无需每个提示的标签、无需拓扑先验、也无需每个任务的曲线拟合。实验上，该方法在标准四任务语言模型算术基准的所有任务中可靠地将模型驱动到目标类别，同时在较小输出空间上遵循比基线更行为自然的轨迹。因此，我们为流形操控提供了一个统一的黎曼框架，以及一个基于模式监督、无标签的实例化，该实例化无需带标签的中心或预设边界条件即可运行。

英文摘要

Steering a language model - intervening on its internal activations to change downstream behaviour - has recently expanded beyond linear interpolation to nonlinear methods such as angular and kernelized steering, which define intervention transformations without learning an explicit geometry over paths in activation space. Freshly introduced geometry-aware manifold methods do learn such a geometry, but require labelled class centroids together with prescribed cyclic or sequential structure. These assumptions restrict where manifold steering can be applied, since existing constructions require labelled centroids and compatible boundary conditions. We recast manifold steering more broadly as Riemannian geodesic computation on activation space, recovering linear and labelled-spline steering as geodesics under particular choices of metric. A principled metric within this framework is the output-space Hellinger distance pulled back to activations; we approximate this with a learned encoder trained on output distances over a small concept-token schema - no per-prompt labels, no topology prior, and no per-task curve fitting. Empirically, the method reliably drives the model onto the target class across all tasks in a standard four-task language-model arithmetic benchmark, while following more behaviourally natural trajectories than baselines on smaller output spaces. We thereby provide a unified Riemannian framework for manifold steering together with a schema-supervised, label-free instantiation that operates without labelled centroids or prescribed boundary conditions.
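
For readers unfamiliar with the output-space distance named above, here is a small sketch of the Hellinger distance between two discrete probability distributions, using the standard definition H(p, q) = sqrt(0.5 * sum((sqrt(p_i) - sqrt(q_i))^2)). The function name `hellinger` and the example next-token distributions are assumptions for illustration; the pullback to activation space and the learned encoder are not shown.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions on the same support."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(total / 2.0)


# Hypothetical next-token distributions over a 3-token concept vocabulary.
before = [0.7, 0.2, 0.1]
after = [0.1, 0.2, 0.7]
shift = hellinger(before, after)
```

The distance is symmetric and bounded between 0 (identical distributions) and 1 (disjoint supports), which makes it a convenient target for an encoder trained to match output distances.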

2606.01304 2026-06-09 cs.LG 版本更新

When Hard Negatives Hurt: Bridging the Generative-Discriminative Gap in Hard Negative Synthesis for Retrieval

当硬负例有害时：弥合检索中硬负例生成的生成-判别鸿沟

Zhicheng Zhang, Jiwei Tang, Kuicai Dong, Xiaopeng Li, Jieming Zhu, Jingyu Li, Qianhui Zhu, Fengyuan Lu, Wang Jiaheng, Gang Wang, Hai-Tao Zheng, Zhaocheng Du

发表机构 * Shenzhen International Graduate School, Tsinghua University（清华大学深圳国际研究生院）； Huawei Technologies Co., Ltd.（华为技术有限公司）； City University of Hong Kong（香港城市大学）； School of Cyber Science and Technology, Sun Yat-sen University（中山大学网络空间安全学院）； School of Intelligence Science and Technology, Nanjing University（南京大学智能科学与技术学院）； The Hong Kong University of Science and Technology（香港科技大学）； Huawei Noah’s Ark Lab（华为诺亚方舟实验室）

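AI summary: To address the generative-discriminative gap in hard negative synthesis for retrieval, proposes CausalNeg, which improves retrieval performance through CoT-guided counterfactual perturbation and query-view entropy maximization.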

Comments Accepted at KDD 2026

DOI: 10.1145/3770855.3818118

AI中文摘要

硬负例挖掘已成为训练检索器的主流策略，但它面临内在局限性：负例受限于语料库可用性，由检索器分数而非诊断价值选择，并且随着检索器改进，假阳性污染日益严重。基于LLM的合成提供了一种原则性替代方案，其中负例不受约束、具有针对性且无假阳性风险。但我们表明，将生成的负例天真地融入对比学习通常会降低检索性能。我们识别并形式化根本原因为生成-判别鸿沟：LLM生成优化流畅、合理的文本，而对比学习要求在决策边界处进行战略性的相关性违反。我们的分析揭示了两种复合失败模式：判别无关生成，即LLM缺乏对查询信息需求的显式模型，默认生成通用或主题漂移的文本，不提供对比信号；以及源依赖捷径，即分布性伪影使模型能够根据来源而非相关性区分负例，导致梯度漂移，积极破坏优化。为弥合这一鸿沟，我们提出CausalNeg，包含两个主要模块：(1) CoT引导的反事实扰动用于数据构建：将文档满足查询的原因分解为显式信息需求，然后精确违反个别需求以构建具有可控、可解释硬度的负例。(2) 训练期间的查询视角熵最大化：将生成的负例分散到相似度谱中，最小化源身份与相似度分数之间的互信息，以抑制捷径利用。我们在https://github.com/mzhangzhicheng/CausalNeg公开代码。

英文摘要

Hard negative mining has become the dominant strategy for training retrievers, yet it faces intrinsic limitations: negatives are bounded by corpus availability, selected by retriever score rather than diagnostic value, and increasingly contaminated by false positives as the retriever improves. LLM-based synthesis offers a principled alternative, in which negatives are unconstrained, targeted, and free from false-positive risk. However, we show that naively incorporating generated negatives into contrastive learning often degrades retrieval performance. We identify and formalize the root cause as a generative-discriminative gap: LLM generation optimizes for fluent, plausible text, while contrastive learning demands strategic violations of relevance at the decision boundary. Our analysis reveals two compounding failure modes: discriminative-agnostic generation, where the LLM lacks an explicit model of query information needs and defaults to generic or topic-drifted text that provides no contrastive signal; and source-dependent shortcuts, where distributional artifacts enable the model to distinguish negatives by origin rather than relevance, causing gradient drift that actively corrupts optimization. To close this gap, we propose CausalNeg, which consists of two main modules: (1) CoT-guided counterfactual perturbation for data construction: decomposes why a document satisfies a query into explicit information requirements, then surgically violates individual requirements to construct negatives with controlled, interpretable hardness. (2) Query-view entropy maximization during training: disperses generated negatives across the similarity spectrum, minimizing the mutual information between source identity and similarity scores to suppress shortcut exploitation. We make our code publicly available at https://github.com/mzhangzhicheng/CausalNeg.
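
To illustrate why the hardness of a negative matters in contrastive training, the sketch below computes a standard InfoNCE-style loss for one query from a positive similarity score and a list of negative scores. This is the generic contrastive objective, not CausalNeg's training loss, and it omits the entropy-maximization term entirely; the function name `info_nce`, the `temperature` default, and the scores are illustrative assumptions.

```python
import math


def info_nce(pos_score, neg_scores, temperature=0.05):
    """Contrastive loss for one query: -log softmax of the positive among all candidates."""
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]


# Hypothetical cosine similarities between a query and candidate documents.
easy_loss = info_nce(0.9, [0.1, 0.0])    # negatives far from the query
hard_loss = info_nce(0.9, [0.85, 0.8])   # negatives close to the positive
```

Negatives scored close to the positive dominate the loss, so a generated negative that is only superficially similar, or separable by its origin, shapes the gradient very differently from one that violates a specific information need.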

2606.01546 2026-06-09 cs.LG 版本更新

Flexible Online Representation Learning Based on Similarity Matching

基于相似性匹配的灵活在线表示学习

Shagesh Sridharan, Yanis Bahroun, Anirvan M. Sengupta

发表机构 * University of California, Berkeley（加州大学伯克利分校）

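AI summary: Proposes a similarity-matching-based, online, biologically plausible learning algorithm that learns sparse shift-invariant representations suitable for clustering, manifold tiling, or sparse coding.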

Comments 6 pages, 3 figures. Originally accepted to IJCNN 2023 but not presented owing to visa issues

详情

AI中文摘要

稀疏高维表示有助于在无监督数据探索中发现非平凡结构。这种表示可以处理与社区检测问题相关的图中的密集连接。然而，稀疏高维表示还能做更多事情，包括流形平铺和特征学习。传统算法在计算上难以处理的完全正定矩阵空间中进行优化，或者将问题松弛到双非负矩阵空间，这些矩阵的规模随样本大小增长，使得它们对大数据集不实用。其中一些方法还施加了行和约束，例如双随机性。在流形平铺的背景下，行和约束具有平移不变性的额外优势。对输出相似性矩阵的行和约束需要非平凡的在线学习规则。针对这些需求，我们提出了一种通用的在线生物合理学习算法，能够学习稀疏移位不变表示，根据数据结构，可用于聚类、流形平铺或稀疏编码。

英文摘要

Sparse high-dimensional representations are conducive to uncovering nontrivial structures in unsupervised exploration of data. Such a representation can deal with the dense connectivity in graphs relevant to community detection problems. However, sparse high-dimensional representations are capable of doing more, including manifold tiling and feature learning. Conventional algorithms optimize in the space of computationally intractable completely positive matrices or relax the problem to the space of doubly nonnegative matrices that scale with sample size in a way rendering them impractical for large data sets. Some of these methods also impose a row sum constraint, such as double stochasticity. Row sum constraints have the added advantage of being shift-invariant, in the context of manifold tiling. Constraints on the row sum of output similarity matrices require nontrivial online learning rules. Addressing these needs, we propose a versatile online biologically plausible learning algorithm capable of learning sparse shift-invariant representations, useful for clustering, manifold tiling, or sparse coding, depending on the data structure.

2407.01718 2026-06-09 stat.ML cs.LG math.ST stat.TH 版本更新

Entropic Optimal Transport Eigenmaps for Nonlinear Alignment and Joint Embedding of High-Dimensional Datasets

熵最优传输特征映射用于高维数据集的非线性对齐与联合嵌入

Boris Landa, Yuval Kluger, Rong Ma

发表机构 * Department of Electrical and Computer Engineering, Yale University（耶鲁大学电气与计算机工程系）； Department of Biostatistics, Harvard University（哈佛大学生物统计学系）； Program in Applied Mathematics, Yale University（耶鲁大学应用数学项目）； Interdepartmental Program in Computational Biology and Bioinformatics, Yale University（耶鲁大学计算生物学与生物信息学跨学科项目）； Department of Pathology, Yale University School of Medicine（耶鲁大学医学院病理学系）

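AI summary: Proposes entropic optimal transport (EOT) eigenmaps, which align and jointly embed two datasets using the singular vectors of the EOT plan matrix, with theoretical guarantees; proves convergence under a generative model and demonstrates advantages on simulated and real biological data.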

AI中文摘要

将高维数据嵌入低维空间是数据分析中不可或缺的组成部分。在许多应用中，需要对齐和联合嵌入来自不同研究或实验条件的多个数据集。这些数据集可能共享感兴趣的底层结构，但表现出个体扭曲，导致使用传统技术时嵌入不对齐。在这项工作中，我们提出了熵最优传输（EOT）特征映射，一种具有理论保证的对齐和联合嵌入一对数据集的原则性方法。我们的方法利用两个数据集之间EOT计划矩阵的前导奇异向量来提取它们共享的底层结构，并在公共嵌入空间中对齐它们。我们将我们的方法解释为经典拉普拉斯特征映射和扩散映射嵌入的数据间变体，表明它具有许多有利的类似性质。我们分析了一个生成模型，其中两个观测到的高维数据集共享支持在公共低维流形上的潜在变量，而每个数据集受到平移、几何扭曲、正交干扰结构和噪声的影响。在大样本、高维情况下，我们证明EOT计划围绕一个由扭曲的几何均值确定的有效流形上的总体核集中，对平移、正交干扰结构和噪声具有不变性。随后，我们将我们的嵌入与编码共享流形密度和几何的总体水平算子的特征函数联系起来。最后，我们通过模拟和真实生物数据的分析展示了我们的方法在数据集成和嵌入方面的性能，证明了其在挑战性场景下相对于替代方法的优势。

英文摘要

Embedding high-dimensional data into a low-dimensional space is an indispensable component of data analysis. In numerous applications, it is necessary to align and jointly embed multiple datasets from different studies or experimental conditions. Such datasets may share underlying structures of interest but exhibit individual distortions, resulting in misaligned embeddings using traditional techniques. In this work, we propose Entropic Optimal Transport (EOT) eigenmaps, a principled approach for aligning and jointly embedding a pair of datasets with theoretical guarantees. Our approach leverages the leading singular vectors of the EOT plan matrix between two datasets to extract their shared underlying structure and align them in a common embedding space. We interpret our approach as an inter-data variant of the classical Laplacian eigenmaps and diffusion maps embeddings, showing that it enjoys many favorable analogous properties. We analyze a generative model in which two observed high-dimensional datasets share latent variables supported on a common low-dimensional manifold, while each dataset is subject to translation, geometric distortion, orthogonal nuisance structure, and noise. In a large-sample, high-dimensional regime, we prove that the EOT plan concentrates around a population kernel on an effective manifold determined by the geometric mean of the distortions, with invariance to translations, orthogonal nuisance structure, and noise. Subsequently, we relate our embedding to eigenfunctions of population-level operators encoding the density and geometry of the shared manifold. Finally, we showcase the performance of our approach for data integration and embedding through simulations and analyses of real-world biological data, demonstrating its advantages over alternative methods in challenging scenarios.
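
The EOT plan matrix at the heart of the method can be computed with Sinkhorn iterations. The sketch below is a plain-Python version for uniform marginals, intended only to show the object whose leading singular vectors the abstract refers to; it does not compute those singular vectors or the embedding. The function name `sinkhorn_plan`, the regularization value `eps`, the iteration count, and the toy cost matrix are assumptions.

```python
import math


def sinkhorn_plan(cost, eps=0.5, iters=500):
    """Entropic OT plan between uniform marginals for an n x m cost matrix."""
    n, m = len(cost), len(cost[0])
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    a, b = [1.0 / n] * n, [1.0 / m] * m
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the target marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Toy squared-distance-like costs between 3 points in one dataset and 2 in another.
cost = [[0.0, 2.0], [1.0, 1.0], [2.0, 0.0]]
plan = sinkhorn_plan(cost)
```

In practice the leading singular vectors of such a plan would be obtained with a numerical linear algebra routine; larger `eps` gives a smoother, more diffuse plan.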

2507.00260 2026-06-09 stat.ML cs.LG math.ST stat.ME stat.TH 版本更新

Disentangled Feature Importance

解耦特征重要性

Jin-Hong Du, Kathryn Roeder, Larry Wasserman

发表机构 * Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, China（香港大学统计与精算科学系）； Musketeers Foundation Institute of Data Science, The University of Hong Kong, Hong Kong SAR, China（香港大学数据科学穆斯克特基金会研究所）； Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA（卡内基梅隆大学统计与数据科学系）； Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA（卡内基梅隆大学计算生物学系）； Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA（卡内基梅隆大学机器学习系）

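AI summary: Proposes Disentangled Feature Importance (DFI) for attributing predictive signal across correlated measurement channels; it computes feature importance through an independent latent representation and an entropic optimal transport geometry, yielding stable and interpretable attributions.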

Comments 29 main and 44 supplementary pages

AI中文摘要

当预测变量统计依赖时，特征重要性的适当定义取决于操作目标。条件增量措施适合于特征选择、获取和压缩，其中共享的预测信息被视为冗余。然而，对于事后解释，目标通常是将预测信号归因于相关测量通道。我们引入了解耦特征重要性（DFI），这是一种针对此设置的群体层面归因框架。DFI在指定的熵最优传输几何下将协变量映射到独立的潜在表示，计算潜在重要性，并通过巴里中心敏感度将重要性归因于原始协变量。我们证明了广泛的条件增量FI函数在平方误差损失下瞄准条件增量预测价值，因此回答了与依赖下的共享预测信号归因不同的问题。在固定传输成本、参考定律和正则化水平下，DFI定义了一个well-specified的估计量族。潜在分数具有功能ANOVA解释，并在高斯线性情况下，归因DFI恢复了相关回归器的经典R²分解。我们推导了在干扰率和光滑性条件下基于影响函数的推断，并在模拟和HIV-1中和抗性分析中展示了DFI在共享预测信号归因方面产生稳定、可解释、具有不确定性的归因。

英文摘要

When predictors are statistically dependent, the appropriate definition of feature importance depends on the operational goal. Conditional-incremental measures are well-suited for feature selection, acquisition, and compression, where shared predictive information is treated as redundancy. For post-hoc interpretation, however, the goal is often to attribute predictive signals across correlated measurement channels. We introduce Disentangled Feature Importance (DFI), a population-level attribution framework for this setting. DFI maps covariates to an independent latent representation under a specified entropic optimal transport geometry, computes latent importance, and attributes it back to the original covariates through barycentric sensitivities. We show that broad conditional-incremental FI functionals target conditional incremental predictive value under squared-error loss, and therefore answer a different question from attribution of shared predictive signal under dependence. Under fixed transport cost, reference law, and regularization level, DFI defines a well-specified family of estimands. Latent scores admit a functional ANOVA interpretation, and in the Gaussian linear case, the attributed DFI recovers the classical $R^2$ decomposition for correlated regressors. We derive influence-function-based inference under nuisance-rate and smoothness conditions, and show in simulations and an HIV-1 neutralization-resistance analysis that DFI yields stable, interpretable, uncertainty-quantified attributions of shared predictive signal.

2511.11041 2026-06-09 cs.CL cs.AI cs.LG 版本更新

Fault Detection in Self-Healing Smart Grids via Spectral Graph Neural Network-Based Reinforcement Learning

Lihui Liu, Mucun Sun, Caisheng Wang

发表机构 * Wayne State University（韦恩州立大学）； University of Texas at Dallas（德克萨斯大学达拉斯分校）

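AI summary: Proposes a spectral graph reinforcement learning framework that uses a spectral graph neural network to learn the optimal restoration policy, achieving near-optimal real-time outage management in distribution networks, with generalization validated on three IEEE test systems.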

AI中文摘要

自愈智能电网能够在故障期间快速调整其网络配置，以最小化电力中断。在故障期间，可以采取多种措施，例如通过开关操作进行网络重构和紧急甩负荷。然而，传统的用于故障缓解的机器学习方法由于响应速度慢和计算成本高，不适用于智能电网。为了解决这些挑战，最近的研究探索了使用强化学习自动执行网络重构。在这些方法中，控制策略通常使用图神经网络（GNN）建模。然而，传统的GNN在空间域中运行，可能无法捕捉频域中的重要关系。频域信息对于建模电力网络中的全局结构模式和系统范围交互特别有用。在本文中，我们提出了一种用于配电网故障管理的频谱图强化学习框架，以增强系统韧性。我们的模型使用频谱图神经网络学习最优电力恢复策略。我们在三个修改后的IEEE测试系统上评估了所提出的方法：13节点、34节点和123节点网络。实验结果表明，我们的方法在实时性上达到了接近最优的性能，并且在广泛的故障场景中具有良好的泛化能力。

英文摘要

Self-healing smart grids can quickly adjust their network configuration during outages to minimize power disruptions. During an outage, several actions can be taken, such as network reconfiguration through switching operations and emergency load shedding. However, traditional machine learning methods for outage mitigation are not well suited for smart grids due to their slow response time and high computational cost. To address these challenges, recent studies have explored reinforcement learning to automatically perform network reconfiguration. In these approaches, the control policy is typically modeled using a graph neural network (GNN). However, conventional GNNs operate in the spatial domain and may fail to capture important relationships in the frequency domain. Frequency-domain information is particularly useful for modeling global structural patterns and system-wide interactions in power networks. In this paper, we propose a spectral graph reinforcement learning framework for outage management in distribution networks to enhance system resilience. Our model learns the optimal power restoration policy using a spectral graph neural network. We evaluate the proposed method on three modified IEEE test systems: the 13-bus, 34-bus, and 123-bus networks. Experimental results show that our approach achieves near-optimal performance in real time and generalizes well across a wide range of outage scenarios.

2606.07592 2026-06-09 cs.LG 新提交

UNIQ: Conformal Calibration for Adaptive Conservatism in Offline Reinforcement Learning

UNIQ: 离线强化学习中的自适应保守性共形校准

Aditya Upadhyay

发表机构 * IIIT Delhi（印度德里国际信息技术学院）

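AI summary: Proposes UNIQ, which calibrates uncertainty with conformal prediction to apply a state-adaptive conservatism penalty, improving performance on D4RL benchmarks at near-IQL memory cost.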

Comments 19 pages, 2 figures, ICML 2026 Workshop on Decision-Making from Offline Datasets to Online Adaptation: Black-Box Optimization to Reinforcement Learning

AI中文摘要

离线强化学习需要谨慎的保守性来缓解分布偏移，然而大多数现有方法在所有状态上统一施加固定惩罚，而不考虑局部数据覆盖。我们提出UNIQ（不确定性信息分位数），一种通过共形校准不确定性估计引入状态自适应保守性的离线RL方法。基于隐式Q学习（IQL）主干，UNIQ训练一个多期望值集成，使用分裂共形预测计算无分布不确定性估计，并将所得信号映射到状态依赖的期望值，从而在覆盖良好的区域放松保守性，在数据边界附近的不确定区域加强保守性。在D4RL MuJoCo基准上，UNIQ持续优于IQL，在Walker2d和重放密集型任务上提升最大。同时，UNIQ以接近IQL的内存成本（约250 MB峰值VRAM）运行，相比EDAC提供约10倍的减少。我们不追求整体最先进性能，而是将UNIQ定位为一种实用机制贡献，改进了离线强化学习中的性能-效率权衡。

英文摘要

Offline reinforcement learning requires careful conservatism to mitigate distribution shift, yet most existing methods apply a fixed penalty uniformly across all states regardless of local data coverage. We present UNIQ (Uncertainty-Informed Quantile), an offline RL method that introduces state-adaptive conservatism through conformally calibrated uncertainty estimation. Built on the Implicit Q-Learning (IQL) backbone, UNIQ trains a multi-expectile value ensemble, computes distribution-free uncertainty estimates using split conformal prediction, and maps the resulting signal to a state-dependent expectile that relaxes conservatism in well-covered regions while strengthening it in uncertain regions near the data frontier. On D4RL MuJoCo benchmarks, UNIQ consistently improves over IQL, with the largest gains observed on Walker2d and replay-heavy tasks. At the same time, UNIQ operates at near-IQL memory cost (approximately 250 MB peak VRAM), providing roughly a 10x reduction compared to EDAC. Rather than pursuing overall state-of-the-art performance, we position UNIQ as a practical mechanism contribution that improves the performance-efficiency trade-off in offline reinforcement learning.
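
The split conformal step referred to above can be sketched in a few lines: given nonconformity scores on a held-out calibration set, take the ceil((n + 1)(1 - alpha))-th smallest score as the threshold. This is the standard split conformal quantile, not UNIQ's full pipeline; how the scores are derived from the expectile ensemble and how the threshold is mapped to a state-dependent expectile are not shown. The function name `conformal_quantile` and the example scores are assumptions.

```python
import math


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this coverage level.
        return float("inf")
    return sorted(scores)[k - 1]


# Hypothetical calibration scores, e.g. absolute value-prediction errors.
calibration_scores = [0.3, 0.1, 0.2]
threshold = conformal_quantile(calibration_scores, alpha=0.5)
```

Under exchangeability, a new score falls at or below this threshold with probability at least 1 - alpha, without any distributional assumption, which is what makes the resulting uncertainty signal "distribution-free".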

2606.07602 2026-06-09 cs.LG cs.AI 新提交

Sample-Efficient Post-Training for LEGO Spatial-Physics Reasoning

面向LEGO空间物理推理的样本高效后训练

Yuhuan Yuan, Zhouliang Yu, Minghao Liu, Weiyang Liu, Ge Lin Kan

发表机构 * HKUST(GZ)（香港科技大学（广州））； CUHK（香港中文大学）； ZODA

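AI summary: To address LLM-generated LEGO assemblies that are physically valid but geometrically and semantically misaligned, proposes a model-based data selection approach and PVPO, a sample-efficient reinforcement learning method with voxel-space geometric rewards, improving structural and semantic alignment and physical validity.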

Comments Technical Report V1, 15 pages, 6 figures, 3 tables

AI中文摘要

基于LLM的LEGO组装生成需要同时具备语义基础和物理可行性。我们发现一种数据引发的失败模式PhysHack，其中组装满足物理有效性约束，但产生的结构在几何上错位、语义上不一致或校准不良。为应对这一挑战，我们提出一种基于模型的数据选择方法，仅使用一小部分训练数据，同时改进基于物理的LEGO组装生成。基于所选轨迹，我们引入PVPO，一种样本高效的强化学习方法，将物理可行性与体素空间几何奖励相结合。我们的结果表明，仅物理有效性不足以作为可靠物理推理的代理：模型可以学习生成有效结构而不保持语义或几何保真度。跨模型主干和测试时缩放设置的实验表明，PVPO改善了结构和语义对齐、物理有效性、结构稳定性和校准，同时减少了对大量事后拒绝采样的依赖。特别是，校准结果表明，PVPO通过使测试时选择更能预测语义和结构质量来缓解PhysHack。

英文摘要

LLM-based LEGO assembly generation requires both semantic grounding and physical feasibility. We identify a data-induced failure mode, PhysHack, in which the assemblies satisfy physical-validity constraints while producing structures that are geometrically misaligned, semantically inconsistent, or poorly calibrated. To address this challenge, we propose a model-based data selection approach that uses only a small fraction of the training data while improving physically grounded LEGO assembly generation. Building on the selected trajectories, we introduce PVPO, a sample-efficient reinforcement learning method that couples physical feasibility with voxel-space geometric rewards. Our results show that physical validity alone is an insufficient proxy for reliable physical reasoning: models can learn to generate valid structures without preserving semantic or geometric fidelity. Experiments across model backbones and test-time scaling settings demonstrate that PVPO improves structural and semantic alignment, physical validity, structural stability, and calibration, while reducing reliance on extensive post-hoc rejection sampling. In particular, results on calibration show that PVPO mitigates PhysHack by making test-time selection more predictive of semantic and structural quality.

2606.07610 2026-06-09 cs.LG cs.AI cs.CL 新提交

LEAF: Growing Trees Without Branching for Speech-Aware Large Language Model Post-Training

LEAF: 无需分支的树生长方法用于语音感知大语言模型后训练

Argyrios Gerogiannis, Yekaterina Yegorova, Mark Hasegawa-Johnson, Venugopal V. Veeravalli

发表机构 * University of Illinois, Urbana-Champaign（伊利诺伊大学厄巴纳-香槟分校）

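AI summary: To address the coarse credit assignment of GRPO in post-training speech-aware large language models, proposes LEAF, which combines retrospective tree-based learning, high-surprisal boundary selection, and span-level advantage assignment, and outperforms GRPO on speech question answering and speech translation.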

Comments 15 pages, 3 figures, 11 tables

AI中文摘要

最先进的GRPO风格方法在语音感知大语言模型后训练中存在粗粒度信用分配问题，将相同的终端奖励优势广播给响应中的每个token。这忽略了rollout批次中的有用结构，其中语音条件下的补全通常共享前缀，然后在重要决策处出现分歧。我们提出低秩探索自适应分叉（LEAF），一种基于回溯树的强化学习方法，无需在线分支或额外解码即可恢复这种结构。LEAF采样完整响应，选择高信息量边界，按共享前缀分组响应，并使用后代奖励分配跨度级优势。我们从理论上证明了LEAF的跨度级信用分配和边界选择设计。实验上，在相同的rollout和低秩适应预算下，LEAF在语音问答和语音翻译基准上优于GRPO。值得注意的是，较小的LEAF训练模型优于当前最先进的完全参数基线。

英文摘要

State-of-the-art GRPO-style methods for speech-aware large language model post-training suffer from coarse credit assignment, broadcasting the same terminal-reward advantage to every token in a response. This ignores useful structure within rollout batches, where speech-conditioned completions often share prefixes before diverging at important decisions. We propose Low-rank Exploration with Adaptive Forking (LEAF), a retrospective tree-based RL method that recovers this structure without online branching or additional decoding. LEAF samples complete responses, selects high-surprisal boundaries, groups responses by shared prefixes, and assigns span-level advantages using descendant rewards. We theoretically justify LEAF's span-level credit assignment and boundary-selection design. Empirically, LEAF improves over GRPO across speech question answering and speech translation benchmarks under the same rollout and low-rank adaptation budget. Notably, smaller LEAF-trained models outperform current state-of-the-art, full-parameter baselines.

2606.07705 2026-06-09 cs.LG cs.AI 新提交

SAW: Stage-Aware Dynamic Weighting for Multi-Objective Reinforcement Learning in Large Language Models

SAW: 面向大语言模型多目标强化学习的阶段感知动态加权

Yuchen He, Baolong Bi, Shenghua Liu, Huaming Liao, Yuyao Ge, Bolin Wan, Siqian Tong, Juan Chen, Jiafeng Guo, Xueqi Cheng

发表机构 * Institute of Computing Technology, Chinese Academy of Sciences（中国科学院计算技术研究所）； University of Electronic Science and Technology of China（电子科技大学）

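AI summary: To address asynchronous reward learning across objectives in multi-objective reinforcement learning, proposes SAW, a lightweight dynamic weighting mechanism that uses the coefficient of variation to adjust each objective's contribution in real time, improving training efficiency and final performance under both GRPO and GDPO.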

Comments 17 pages, 7 figures, 5 tables

AI中文摘要

尽管多目标强化学习（MORL）对于将大语言模型与复杂的人类偏好对齐至关重要，但当前普遍采用的静态加权求和忽略了一个更基本的现象：不同目标之间的奖励学习明显异步。学习良好的维度会迅速产生同质、低方差的信号，其残留噪声会污染聚合奖励（在GRPO中）或占据优势预算的固定份额（在GDPO中），从而干扰学习不足维度携带的稀缺但高价值的信号。为了解决这种异步性，我们提出了阶段感知动态加权（SAW），一种轻量级、算法无关的动态加权机制。SAW利用变异系数（CV）作为实时信息量的尺度不变代理，根据批次内各维度的相对信息量重新加权其奖励或优势贡献。与需要多次前向和反向传播的基于梯度的方法不同，SAW仅依赖于批次级统计信息，引入的计算开销几乎可以忽略不计。在工具调用和文本摘要任务上的实验表明，SAW在GRPO和GDPO框架下均能一致地提高训练效率和最终性能，证实了其作为多奖励LLM对齐的通用插件。我们的代码可在 https://github.com/Zhaolutuan/SAW 获取。

英文摘要

Although multi-objective reinforcement learning (MORL) is central to aligning large language models with complex human preferences, the prevailing practice of static weighted summation overlooks a more fundamental phenomenon: reward learning is markedly asynchronous across objectives. Well-learned dimensions quickly produce homogeneous, low-variance signals whose residual noise contaminates the aggregated reward (in GRPO) or occupies a fixed share of the advantage budget (in GDPO), interfering with the scarce yet high-value signals carried by under-learned dimensions. To address this asynchrony, we propose Stage-Aware Dynamic Weighting (SAW), a lightweight, algorithm-agnostic dynamic weighting mechanism. SAW utilizes the coefficient of variation (CV) as a scale-invariant proxy for real-time informativeness, reweighting each dimension's reward or advantage contribution by its relative informativeness within the batch. Unlike gradient-based methods that require multiple forward and backward passes, SAW relies solely on batch-level statistics, introducing nearly negligible computational overhead. Experiments on tool-calling and text summarization tasks demonstrate that SAW consistently improves both training efficiency and final performance under both GRPO and GDPO frameworks, confirming it as a general-purpose plug-in for multi-reward LLM alignment. Our code is available at https://github.com/Zhaolutuan/SAW
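
The batch-level reweighting described above can be sketched in a few lines. This is an illustration under stated assumptions, not the released SAW code: the function names, the `eps` stabilizer, the normalization of weights to sum to one, and the uniform fallback when every dimension is constant are all choices made here. The only element taken from the abstract is the use of the coefficient of variation (standard deviation divided by mean) as the per-dimension informativeness signal.

```python
import statistics


def cv_weights(rewards_by_dim: dict, eps: float = 1e-8) -> dict:
    """Weight each reward dimension by its coefficient of variation in the batch."""
    cvs = {}
    for name, values in rewards_by_dim.items():
        mean = statistics.fmean(values)
        std = statistics.pstdev(values)
        cvs[name] = std / (abs(mean) + eps)
    total = sum(cvs.values())
    if total <= eps:
        # Every dimension is (near) constant: fall back to uniform weights.
        return {name: 1.0 / len(cvs) for name in cvs}
    return {name: cv / total for name, cv in cvs.items()}


def aggregate_rewards(rewards_by_dim: dict, weights: dict) -> list:
    """Weighted sum of the reward dimensions for each sample in the batch."""
    batch_size = len(next(iter(rewards_by_dim.values())))
    return [
        sum(weights[name] * values[i] for name, values in rewards_by_dim.items())
        for i in range(batch_size)
    ]


batch = {
    "format": [1.0, 1.0, 1.0, 1.0],    # already learned: no variance left
    "accuracy": [0.0, 1.0, 0.0, 1.0],  # still informative
}
weights = cv_weights(batch)
combined = aggregate_rewards(batch, weights)
```

Here the saturated `format` dimension receives zero weight, so the aggregated reward is driven by the dimension that still varies across the batch.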

2606.07910 2026-06-09 cs.LG 新提交

CAAL: Contextual Bandits based Online Hand-Craft Active Learning Strategy Selection

CAAL: 基于上下文赌博机的在线手工主动学习策略选择

Shao-An Yin, Jiacong Li, Tianpei Xie, Cecile Levasseur, Wojciech Kowalinski, Nicola Elia

发表机构 * University of Minnesota, Twin Cities（明尼苏达大学双城分校）； Amazon（亚马逊）

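AI summary: Proposes the CAAL framework, which uses contextual information and reward prediction to dynamically select active learning strategies, and outperforms existing baselines on public datasets.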

Comments 8 pages, 5 figures, Accepted to the NYRL 2025 Workshop

2606.07950 2026-06-09 cs.LG 新提交

The Easy, the Hard, and the Learnable: Confidence and Difficulty-Adaptive Policy Optimization for LLM Reasoning

简单、困难与可学习：面向LLM推理的置信度与难度自适应策略优化

Zhanke Zhou, Xiangyu Lu, Chentao Cao, Brando Miranda, Tongliang Liu, Bo Han, Sanmi Koyejo

发表机构 * TMLR Group, Department of Computer Science, Hong Kong Baptist University（香港浸会大学计算机科学系 TMLR 组）； Stanford University（斯坦福大学）； Sydney AI Centre, The University of Sydney（悉尼大学人工智能中心）

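AI summary: To address the inefficient use of compute caused by uniform sampling in GRPO training, proposes CoDaPO, which reweights and resamples questions adaptively by confidence and difficulty, increasing the discovery of learnable questions under a fixed budget and outperforming existing RL methods on 12 benchmarks.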

Comments Published in ICML 2026

AI中文摘要

具有可验证奖励的强化学习可以显著提升LLM的推理能力，然而标准的GRPO风格训练通常通过均匀采样和加权同等对待简单、困难和可学习的问题，导致计算分配效率低下。我们通过跟踪token对数概率、组归一化优势以及由此产生的token级更新权重来研究GRPO。这揭示了随着训练进行出现的三种重复动态：(1) 置信度膨胀，(2) 优势收缩，以及(3) 层次收敛。这些发现表明，每次更新的效用很大程度上取决于问题难度和模型当前的能力。受此启发，我们提出了置信度与难度自适应策略优化（CoDaPO），该方法根据展开置信度和经验难度为每个问题分配一个有界值。CoDaPO随后使用该值对策略更新进行重新加权，并在小批量内重新采样高价值的可学习问题，从而在固定计算预算下增加可学习带内的发现。在十二个基准测试中，CoDaPO在准确性上持续优于现有的RL方法。我们的代码公开在 https://github.com/tmlr-group/CoDaPO。

英文摘要

RL with verifiable rewards can substantially improve LLM reasoning, yet standard GRPO-style training often treats easy, hard, and learnable questions alike through uniform sampling and weighting, leading to inefficient compute allocation. We study GRPO by tracking token log-probabilities, group-normalized advantages, and the induced token-level update weights. This reveals three recurring dynamics as training proceeds: (1) confidence inflation, (2) advantage contraction, and (3) hierarchical convergence. These findings suggest that the utility of each update depends strongly on both question difficulty and the model's current competence. Motivated by this, we propose Confidence and Difficulty-adaptive Policy Optimization (CoDaPO), which assigns each question a bounded value from rollout confidence and empirical difficulty. CoDaPO then uses this value to reweight policy updates and resample high-value learnable questions within mini-batches, thereby increasing discovery within the learnable band under a fixed compute budget. Across twelve benchmarks, CoDaPO consistently improves accuracy over existing RL methods. Our code is publicly available at https://github.com/tmlr-group/CoDaPO.

2606.08068 2026-06-09 cs.LG 新提交

DICE: Entropy-Regularized Equilibrium Selection for Stable Multi-Agent LLM Coordination

DICE: 用于稳定多智能体LLM协调的熵正则化均衡选择

Yi Xie, Zhanke Zhou, Chentao Cao, Bo Liu, Bo Han

发表机构 * University of Arizona（亚利桑那大学）； Hong Kong Baptist University（香港浸会大学）

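AI summary: Proposes the DICE framework, which addresses instability in multi-agent LLM coordination through entropy-regularized equilibrium selection (HQRE), with linearly convergent updates and bounded Bayesian regret; evaluated on 11 benchmarks, with average gains on reasoning and planning tasks of 4.3 percentage points for DICE-PC and 8.5 for DICE-FT.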

AI中文摘要

多智能体大语言模型（LLM）系统通常无法可靠地超越配备最佳N采样的单个强模型。我们认为这种不稳定性的一个核心来源是病态的均衡选择：当前系统指定了智能体共享哪些信息，但没有指定应选择哪种协调约定。我们将此类系统的一类广泛形式化为折扣不完全信息马尔可夫博弈，并表明两种常见病理——竞争约定之间的振荡和跨约定漂移——均可导致不稳定的学习和线性贝叶斯遗憾。为了获得一个良定义的目标，我们引入了异质量化响应均衡（HQRE），这是一种具有智能体和状态依赖温度的熵正则化均衡概念。在单调性条件下，HQRE是唯一的，允许线性收敛的镜像更新，并产生有界的贝叶斯遗憾；相同的条件产生可 rollout 测量的稳定性诊断。我们在两种算法中实例化这一目标：DICE-PC，通过提示控制动作协调冻结模型，以及DICE-FT，执行参数高效的镜像微调。在四个领域的十一个基准测试中，DICE在准确性-成本权衡上优于强类内基线；在推理和规划任务上，DICE-PC平均提高4.3个百分点，DICE-FT提高8.5个百分点。

英文摘要

Multi-agent large language model (LLM) systems often fail to reliably outperform a single strong model equipped with best-of-N sampling. We argue that a core source of this instability is ill-posed equilibrium selection: current systems specify what information agents share, but not which coordination convention should be selected. We formalize a broad class of such systems as discounted incomplete-information Markov games and show that two common pathologies, oscillation between competing conventions and drift across them, can both induce unstable learning and linear Bayesian regret. To obtain a well-posed target, we introduce the Heterogeneous Quantal Response Equilibrium (HQRE), an entropy-regularized equilibrium concept with agent- and state-dependent temperatures. Under a monotonicity condition, HQRE is unique, admits linearly convergent mirror updates, and yields bounded Bayesian regret; the same condition yields rollout-measurable stability diagnostics. We instantiate this objective in two algorithms: DICE-PC, which coordinates frozen models through prompt-control actions, and DICE-FT, which performs parameter-efficient mirror fine-tuning. Across eleven benchmarks in four domains, DICE improves accuracy-cost trade-offs over strong within-class baselines; on reasoning and planning tasks, DICE-PC improves by 4.3 percentage points on average and DICE-FT by 8.5 points.
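
One simple instance of an entropy-regularized equilibrium with agent-specific temperatures is the logit quantal response equilibrium of a two-player matrix game, sketched below. The assumptions are substantial and worth stating: temperatures here depend only on the agent (the abstract's HQRE also allows state dependence), a damped fixed-point iteration stands in for the paper's mirror updates, and the payoff matrix and all function names are made up for illustration.

```python
import math


def softmax(values: list, temperature: float) -> list:
    """Logit response: probabilities proportional to exp(value / temperature)."""
    top = max(values)
    exps = [math.exp((v - top) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def expected_payoffs(payoff: list, opponent: list) -> list:
    """Expected payoff of each own action against the opponent's mixed strategy."""
    return [sum(row[j] * opponent[j] for j in range(len(opponent))) for row in payoff]


def logit_qre(payoff_row: list, payoff_col: list, temp_row: float, temp_col: float,
              steps: int = 500, damping: float = 0.5):
    """Damped fixed-point iteration toward a logit quantal response equilibrium."""
    n, m = len(payoff_row), len(payoff_row[0])
    p = [1.0 / n] * n
    q = [1.0 / m] * m
    # Index the column player's payoffs by their own action first.
    col_view = [[payoff_col[i][j] for i in range(n)] for j in range(m)]
    for _ in range(steps):
        target_p = softmax(expected_payoffs(payoff_row, q), temp_row)
        target_q = softmax(expected_payoffs(col_view, p), temp_col)
        p = [(1 - damping) * a + damping * b for a, b in zip(p, target_p)]
        q = [(1 - damping) * a + damping * b for a, b in zip(q, target_q)]
    return p, q


# Common-payoff coordination game with two conventions; the first pays more.
coordination = [[2.0, 0.0], [0.0, 1.0]]
row_strategy, col_strategy = logit_qre(coordination, coordination,
                                       temp_row=0.5, temp_col=1.0)
```

In this toy game both agents settle on the higher-paying convention, and the lower-temperature agent commits to it more sharply. The result depends on the starting point when several quantal response equilibria exist, which is the kind of ambiguity a uniqueness condition is meant to rule out.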

2606.08088 2026-06-09 cs.LG cs.CL 新提交

ConSteer-RL: Steering Reasoning Capabilities in Large Language Models via Confidence-Aware Reinforcement Learning

ConSteer-RL：通过置信度感知强化学习引导大型语言模型的推理能力

Qing Miao, Yiming Zhao, Jing Yang, Chenxi Liu, Yuehai Chen, Yuewen Liu, Shaoyi Du, Badong Chen

发表机构 * Xi'an Jiaotong University（西安交通大学）； University of Science and Technology of China（中国科学技术大学）

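AI summary: Proposes the ConSteer-RL framework, which incorporates token-level confidence signals derived from model log-probabilities into GRPO and uses a confidence-aware reward shaping mechanism to penalize overconfident errors and reinforce correct, confident reasoning, yielding average gains of 2.3% to 4.0% across several model scales.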

2606.08360 2026-06-09 cs.LG cs.AI 新提交

Generative Frontier Planning for Adaptive Peer-Referral Recruitment under Covariate-Dependent Arrivals

协变量依赖到达下的自适应同伴推荐招募的生成前沿规划

Lingkai Kong, Hezi Jiang, Andrew Ma, Keyu Wang, Akseli Kangaslahti, Milind Tambe

发表机构 * Harvard University（哈佛大学）

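AI summary: For peer-referral recruitment with covariate-dependent arrivals, proposes Generative Frontier Planning (GFP), which plans efficiently through a deterministic backup and marginal greedy allocation and outperforms baselines in simulation.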

AI中文摘要

同伴推荐招募系统（如受访者驱动抽样）对于研究和干预受传染病影响的隐藏人群至关重要。为了加速招募，公共卫生机构必须在多轮中自适应地分配有限的推荐资源，当前决策影响未来招募者的数量和协变量。先前的工作通过假设推荐来自同质总体的独立同分布抽样使问题可解，但忽略了驱动真实同伴推荐的同质性和共享背景。我们考虑一个更现实的模型，其中推荐容量和新推荐个体的协变量都依赖于推荐者，并通过删失计数模型和条件生成模型从数据中学习。由此产生的规划问题具有挑战性，因为每个候选分配都会导致未来招募者的不同分布。我们提出生成前沿规划（GFP），一种基于模型的规划器，用潜在协变量覆盖值替代的确定性备份替代每步蒙特卡洛采样。该替代的设计使得下一个前沿的期望值仅通过离线摊销的有限维摘要依赖于后代生成模型，并且使得每轮目标具有单调递减收益。这两个性质共同使规划易于处理：确定性备份消除了蒙特卡洛采样，递减收益结构使得边际贪心分配能够为每轮问题实现(1-1/e)近似。在根据真实受访者驱动抽样数据集校准的模拟环境中，GFP在四个折扣因子下均优于随机、强化学习和独立同分布动态规划基线。

英文摘要

Peer-referral recruitment systems such as respondent-driven sampling are critical for studying and intervening on hidden populations affected by infectious diseases. To accelerate recruitment, public health agencies must adaptively allocate limited referral resources across multiple rounds, where current decisions shape both the number and the covariates of future recruits. Prior work makes this problem tractable by assuming that referrals are drawn i.i.d. from a homogeneous population, an assumption that ignores the homophily and shared context that drive real peer recruitment. We instead consider a more realistic model in which both referral capacity and the covariates of newly referred individuals are conditioned on the referrer, learned from data with a censored count model and a conditional generative model. The resulting planning problem is challenging because each candidate allocation induces a different distribution over future recruits. We propose Generative Frontier Planning (GFP), a model-based planner that replaces per-step Monte-Carlo sampling with a deterministic backup over a latent covariate-coverage value surrogate. The surrogate is designed so that the expected value of the next frontier depends on the offspring generative model only through finite-dimensional summaries that are amortized offline, and so that the resulting per-round objective is monotone with diminishing returns. Together, these two properties make planning tractable: the deterministic backup eliminates Monte-Carlo sampling, and the diminishing-returns structure lets a marginal greedy allocation achieve a $(1-1/e)$-approximation for the per-round problem. On a simulation environment calibrated to a real respondent-driven sampling dataset, GFP outperforms random, reinforcement-learning, and i.i.d. dynamic-programming baselines across four discount factors.
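
The marginal greedy allocation mentioned above can be sketched generically: hand out one referral unit at a time to whichever referrer currently offers the largest marginal gain. The gain model below (`make_coverage_gain`, with a per-referrer weight and a fixed hit probability) is a hypothetical stand-in for the paper's value surrogate, and the sketch assumes the objective is separable across referrers so that a lazy heap is valid.

```python
import heapq


def make_coverage_gain(weights: list, hit_prob: float):
    """Hypothetical diminishing-returns model: referrer i with k units is worth
    weights[i] * (1 - (1 - hit_prob) ** k); return the gain of unit k + 1."""
    def marginal_gain(i: int, k: int) -> float:
        return weights[i] * hit_prob * (1.0 - hit_prob) ** k
    return marginal_gain


def greedy_allocate(marginal_gain, num_referrers: int, budget: int) -> list:
    """Give each unit to the referrer with the largest current marginal gain."""
    allocation = [0] * num_referrers
    heap = [(-marginal_gain(i, 0), i) for i in range(num_referrers)]
    heapq.heapify(heap)
    for _ in range(budget):
        if not heap:
            break
        neg_gain, i = heapq.heappop(heap)
        if -neg_gain <= 0:
            break  # no remaining unit adds value
        allocation[i] += 1
        heapq.heappush(heap, (-marginal_gain(i, allocation[i]), i))
    return allocation


gain = make_coverage_gain(weights=[3.0, 1.0], hit_prob=0.5)
allocation = greedy_allocate(gain, num_referrers=2, budget=3)
```

With these toy numbers the first referrer's marginal gains are 1.5, 0.75, and 0.375 and the second's are 0.5 and 0.25, so three units are split two to one. The $(1-1/e)$ factor quoted in the abstract is the classical guarantee for greedy maximization of monotone objectives with diminishing returns under a budget constraint.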

2606.08410 2026-06-09 cs.LG cs.AI 新提交

Provably Efficient Personalized Multi-Objective Bandits with Proactive Conversational Queries

具有主动对话查询的可证明高效个性化多目标老虎机

Linfeng Cao, Ming Shi, Ness B. Shroff

发表机构 * The Ohio State University（俄亥俄州立大学）； University at Buffalo（布法罗大学）

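AI summary: Proposes MO-PQUCB, which obtains user preference signals from proactive queries and combines a Plackett-Luce model with regularized UCB to decouple preference learning from reward exploration in multi-objective bandits, achieving improved regret bounds.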

Comments UAI 2026

AI中文摘要

多目标老虎机中的个性化决策需要学习用户在不同竞争目标之间的特定权衡。由于臂的效用既取决于未知奖励又取决于未知偏好，现有方法仅从效用反馈中推断偏好，将偏好学习与奖励探索纠缠在一起。然而，在实践中，用户通常通过主动对话查询（例如，“便宜且干净的酒店”）揭示他们的优先级，但这种结构化信号未被利用。我们形式化了一个基于主动查询的框架，其中用户查询提供结构化的偏好信号。通过Plackett-Luce子集选择模型对这些信号进行建模，我们证明了由于基本的平移不变性障碍，仅查询学习是不够的。为了解决这个问题，我们引入了MO-PQUCB，一种混合算法，通过平移不变正则化和双探索UCB将基于查询的偏好锚定与老虎机反馈相结合。我们证明了主动查询加速了偏好估计，并相比先前偏好感知的MO-MAB方法实现了改进的遗憾缩放。在查询被破坏的情况下，我们进一步刻画了统计极限，并设计了一个鲁棒估计器，在破坏稀疏时实现接近最优的性能。实验验证了理论和实际收益。

英文摘要

Personalized decision-making in multi-objective bandits requires learning user-specific trade-offs among competing objectives. Since arm utility depends on both unknown rewards and unknown preferences, existing methods infer preferences only from utility feedback, entangling preference learning with reward exploration. In practice, however, users often reveal their priorities through proactive conversational queries (e.g., "cheap and clean hotel"), yet this structured signal is not leveraged. We formalize a proactive query-based framework in which user queries provide structured preference signals. Modeling these signals via a Plackett-Luce subset choice model, we show that query-only learning is insufficient due to a fundamental shift-invariance barrier. To resolve this, we introduce MO-PQUCB, a hybrid algorithm that integrates query-based preference anchoring with bandit feedback through shift-invariant regularization and dual-exploration UCB. We prove that proactive queries accelerate preference estimation and yield improved regret scaling over prior preference-aware MO-MAB methods. Under corrupted queries, we further characterize statistical limits and design a robust estimator achieving near-optimal performance when the corruption is sparse. Experiments validate both theoretical and practical gains.

2606.08533 2026-06-09 cs.LG cs.RO 新提交

Autonomous Aerial Manipulation via Contextual Contrastive Meta Reinforcement Learning

通过上下文对比元强化学习的自主空中操控

Lixuan Jin, Bingxuan Lan, Xinyi Bao, Xiangyuan Xie, Chunjie Zhang, Zheng Chen, Tianshuo Liu, Ruijie Tian, Jinyu Ru, Gang Wang, Lei Yuan, Yang Yu

发表机构 * National Key Laboratory of Novel Software Technology, Nanjing University（南京大学计算机软件新技术国家重点实验室）； School of Artificial Intelligence, Nanjing University（南京大学人工智能学院）； Faculty of Robot Science and Engineering, Northeastern University（东北大学机器人科学与工程学院）； National Key Lab of Autonomous Intelligent Unmanned Systems, Beijing Institute of Technology（北京理工大学自主智能无人系统国家重点实验室）

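AI summary: Proposes Aco2, which uses contextual contrastive meta reinforcement learning so that a quadrotor can autonomously pick up, transport, and deliver different payloads without human intervention, and transfers directly to the real world.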

AI中文摘要

无人机越来越多地部署在物流、服务机器人等实际应用中，对自主载荷获取和投递的需求日益增长。现有方法通常假设预附载荷或依赖专用夹爪，使得通用的端到端空中投递问题仍未解决，因为不同载荷会导致高度变化的飞行动力学，需要单一策略在线适应，无需手动校准或显式系统辨识。为此，我们研究了通过上下文对比元强化学习的自主空中操控（Aco2），这是一个完全自主的空中投递设置，其中配备轻型钩子的四旋翼无人机连续拾取、运输和投递各种带手柄的物体，在随机位置之间进行，全程无需人工干预。首先，我们设计了一个上下文观测编码器，从最近的交互历史中推断出紧凑的潜在上下文，使策略能够在线适应载荷相关的动力学。为了进一步提高上下文质量，我们引入了一个对比目标，该目标围绕任务相关变化结构化上下文嵌入，从而改善跨不同载荷的泛化能力，无需显式系统辨识。完全在模拟中训练，并采用广泛的域随机化，Aco2可以直接部署在物理四旋翼上，无需真实世界微调。

英文摘要

Unmanned aerial vehicles (UAVs) are increasingly being deployed in logistics, service robotics, and other real-world applications, creating a growing demand for autonomous payload acquisition and delivery. Existing approaches typically assume pre-attached payloads or rely on specialized grippers, leaving versatile end-to-end aerial delivery largely unresolved: different payloads induce highly variable flight dynamics, so a single policy must adapt online without manual calibration or explicit system identification. To this end, we study Autonomous Aerial Manipulation via Contextual Contrastive Meta Reinforcement Learning (Aco2), a fully autonomous aerial delivery setting in which a quadrotor equipped with a lightweight hook continuously picks up, transports, and delivers diverse handle-equipped objects between randomized locations, all without human intervention. First, we design a contextual observation encoder that infers a compact latent context from recent interaction history, enabling the policy to adapt online to payload-dependent dynamics. To further improve the quality of this context, we introduce a contrastive objective that structures the context embedding around task-relevant variations, improving generalization across diverse payloads without requiring explicit system identification. Trained entirely in simulation with extensive domain randomization, Aco2 can be directly deployed on a physical quadrotor without real-world fine-tuning.

2606.08602 2026-06-09 cs.LG cs.AI 新提交

Claw-R1：面向智能体强化学习的步骤级数据中间件系统

Daoyu Wang, Mingyue Cheng, Qingchuan Li, Shuo Yu, Jie Ouyang, Qi Liu

发表机构 * State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China（中国科学技术大学认知智能国家重点实验室）

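AI summary: Presents Claw-R1, a system whose Gateway Server and Data Pool components turn agent interaction steps into structured data assets, supporting live inspection, quality-based curation, and training-batch configuration to manage the data lifecycle in agentic reinforcement learning.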

AI中文摘要

智能体强化学习已成为将大语言模型从静态聊天机器人转变为交互式智能体的重要后训练范式，催生了如OpenClaw等代表性应用。现有工作主要关注策略优化算法和训练框架，但对从数据产生到训练消费的智能体-环境交互完整数据生命周期关注不足。为弥补这一差距，我们提出Claw-R1，一个面向智能体强化学习的交互式步骤级数据中间件系统。Claw-R1通过两个核心组件——网关服务器和数据池——连接异构智能体运行时与强化学习训练后端。网关服务器通过统一的LLM API入口捕获多轮交互步骤，而数据池将其组织为由提示ID、响应ID、奖励和其他元数据组成的步骤级记录。在我们的演示中，用户可以交互式检查实时轨迹，查看每一步的状态、动作和奖励，根据质量和就绪程度筛选数据，并为不同的下游强化学习算法配置训练就绪批次。总体而言，Claw-R1将智能体交互轨迹视为受管理的数据资产，而非临时运行时日志。通过此演示，我们希望鼓励社区认识到数据管理在智能体强化学习中的重要性。我们的代码可在https://github.com/AgentR1/Claw-R1获取，演示视频可在https://youtu.be/Pw47dAOw6B0找到。

英文摘要

Agentic reinforcement learning (RL) has become an important post-training paradigm for turning LLMs from static chatbots into interactive agents, giving rise to representative applications such as OpenClaw. Existing work mainly focuses on policy optimization algorithms and training frameworks, but pays less attention to the full data lifecycle of agent-environment interactions, from data production to training consumption. To bridge this gap, we present Claw-R1, an interactive step-level data middleware system for agentic RL. Claw-R1 connects heterogeneous agent runtimes with RL training backends through two core components: a Gateway Server and a Data Pool. The Gateway Server captures multi-turn interaction steps through a unified LLM API entry point, while the Data Pool organizes them into step-level records consisting of prompt IDs, response IDs, rewards, and other metadata. In our demo, users can interactively inspect live trajectories, examine the state, action, and reward of each step, curate data by quality and readiness, and configure training-ready batches for different downstream RL algorithms. Overall, Claw-R1 treats agent interaction traces as managed data assets rather than temporary runtime logs. Through this demonstration, we hope to encourage the community to recognize the importance of data management in agentic RL. Our code is available at https://github.com/AgentR1/Claw-R1 and the demonstration video is at https://youtu.be/Pw47dAOw6B0.
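
A step-level record of the kind the abstract describes might look like the sketch below. This is not the Claw-R1 schema: the class name, field names, the reading of prompt and response IDs as token ID lists, and the readiness rule (a step is ready once a reward is attached) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class StepRecord:
    """One agent-environment interaction step (hypothetical schema)."""
    trajectory_id: str
    step_index: int
    prompt_ids: List[int]
    response_ids: List[int]
    reward: Optional[float] = None  # None until a reward has been attached
    metadata: Dict[str, Any] = field(default_factory=dict)

    @property
    def ready(self) -> bool:
        """A step is treated as training-ready once it carries a reward."""
        return self.reward is not None


def training_batch(pool: List[StepRecord], min_reward: float,
                   batch_size: int) -> List[StepRecord]:
    """Select ready steps whose reward meets a quality threshold."""
    selected = [r for r in pool if r.ready and r.reward >= min_reward]
    return selected[:batch_size]


pool = [
    StepRecord("traj-a", 0, [1, 2, 3], [4, 5]),              # reward pending
    StepRecord("traj-a", 1, [1, 2, 3, 4, 5], [6], reward=0.2),
    StepRecord("traj-a", 2, [1, 2, 3, 4, 5, 6], [7, 8], reward=0.9),
]
batch = training_batch(pool, min_reward=0.5, batch_size=8)
```

Keeping readiness and quality as explicit, queryable properties of each record is what lets a data pool be filtered for a trainer instead of being replayed as a raw log.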

2606.09191 2026-06-09 cs.LG stat.ML 新提交

Asymptotic Optimality of Thompson Sampling for Risk-Averse Bandits with Sub-Gaussian Rewards

风险厌恶型多臂赌博机中汤普森采样的渐近最优性（次高斯奖励）

Joel Q. L. Chang

发表机构 * University of California, Berkeley（加州大学伯克利分校）

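AI summary: Proves that an anchor-free nonparametric Thompson Sampling algorithm attains the instance-dependent asymptotically optimal regret for risk-averse multi-armed bandits, for any continuous risk functional, under a continuity condition alone, which is weaker than the conditions required by prior parametric results.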

Comments 10 pages, 4 figures

AI中文摘要

我们证明 $\rho\text{-}\mathrm{NPTS}_{\mathrm{SG}}$，一种用于风险厌恶型多臂赌博机的无锚非参数汤普森采样算法，其遗憾值在 $\log n$ 的主阶上匹配实例依赖下界，从而确立了它在具有有界密度和次高斯尾部（包括高斯臂）的分布类上对任意连续风险泛函 $\rho$（CVaR、均值-方差、夏普比率、扭曲风险度量等）的渐近最优性。该结果及其有界支撑版本仅要求 $\rho$ 的连续性：严格弱于先前参数汤普森采样结果的支配条件，也严格弱于UCB类算法的Lipschitz条件，从而在无参数奖励假设下首次为夏普比率等非Lipschitz泛函提供了实例最优保证。有界支撑情形作为具有相同证明结构的垫脚石首先被发展。关键技术贡献是一个离散化引理（有界支撑）和一个截断离散化引理（次高斯尾部），每个引理通过Dirichlet聚合性质将增长字母表的Dirichlet后验投影到固定网格上，保持所有多项式前因子在固定次数且独立于样本量，打破了先前证明中阻碍的超指数障碍。

英文摘要

We prove that $\rho\text{-}\mathrm{NPTS}_{\mathrm{SG}}$, an anchor-free nonparametric Thompson Sampling algorithm for risk-averse bandits, achieves regret matching the instance-dependent lower bound to leading order in $\log n$, establishing it as asymptotically optimal for any continuous risk functional $\rho$ (CVaR, mean-variance, Sharpe ratio, distortion risk measures, and more) on the class of distributions with bounded density and sub-Gaussian tails, including Gaussian arms. Both this result and its bounded-support counterpart require only continuity of $\rho$: strictly weaker than the dominance condition of prior parametric Thompson Sampling results, and strictly weaker than the Lipschitz condition of UCB-type algorithms, yielding the first instance-optimal guarantees for non-Lipschitz functionals such as the Sharpe ratio without parametric reward assumptions. The bounded-support case is developed first as a stepping stone sharing the same proof structure. The key technical contributions are a discretisation lemma (bounded support) and a truncated discretisation lemma (sub-Gaussian tails), each projecting the growing-alphabet Dirichlet posterior onto a fixed grid via the Dirichlet aggregation property, holding all polynomial prefactors at fixed degree independent of sample size and breaking the super-exponential barrier that blocked prior proofs.

2606.09348 2026-06-09 cs.LG cs.CL 新提交

PBSD: Privileged Bayesian Self-Distillation for Long-Horizon Credit Assignment

PBSD: 特权贝叶斯自蒸馏用于长程信用分配

Yang Tian, Rui Wang, Xumeng Wen, Junjie Li, Shizhao Sun, Lei Song, Jiang Bian, Bo Zhao

发表机构 * School of AI, Shanghai Jiao Tong University（上海交通大学人工智能学院）； XYZ AI Lab（XYZ AI实验室）

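AI summary: Proposes PBSD, which uses Bayes-calibrated self-distillation to convert sparse final rewards into fine-grained step-level credit signals for long-horizon agentic tasks; experiments show gains in both in-domain and out-of-domain settings and improved generalization.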

AI中文摘要

长程智能体任务对基于结果的强化学习提出了根本性的信用分配挑战：轨迹级奖励验证最终正确性，但很少指导哪些中间推理步骤或工具交互对结果有贡献。在多轮搜索智能体中，这一困难尤为突出，因为成功轨迹可能包含误导性动作，而失败轨迹可能包含有价值的证据收集步骤。我们提出PBSD（特权贝叶斯自蒸馏），一种在稀疏最终奖励下进行细粒度信用分配的贝叶斯校准自蒸馏方法。PBSD通过验证答案的后验与先验概率比来衡量轨迹质量，并应用贝叶斯规则将这个难以估计的答案侧比率转化为标准学生模型与特权答案条件教师模型之间的易处理似然比。对该贝叶斯证据分数的自回归分解产生轮级信号，识别每个中间轮次是支持还是破坏已验证结果。因此，PBSD提供了一种原则性且优雅的重新加权方案，将稀疏结果监督转化为贝叶斯校准的轮级信用信号，同时完全兼容标准策略优化。实验表明，PBSD在领域内和领域外设置中均持续提升性能，并有效将知识从短上下文训练迁移到长上下文推理，表明其细粒度信用分配机制促进了更有效的策略学习并带来更好的泛化。

英文摘要

Long-horizon agentic tasks pose a fundamental credit assignment challenge for outcome-based reinforcement learning: trajectory-level rewards verify final correctness but provide limited guidance on which intermediate reasoning steps or tool interactions contribute to the outcome. The difficulty is especially pronounced in multi-turn search agents, where successful trajectories may contain misleading actions and failed trajectories may contain valuable evidence-gathering steps. We propose PBSD (Privileged Bayesian Self-Distillation), a Bayes-calibrated self-distillation method for fine-grained credit assignment under sparse final rewards. PBSD measures trajectory quality through the posterior-to-prior probability ratio of the verified answer and applies Bayes' rule to convert this hard-to-estimate answer-side ratio into a tractable likelihood ratio between a standard student model and a privileged answer-conditioned teacher model. Autoregressive decomposition of this Bayesian evidence score yields turn-level signals that identify whether each intermediate turn supports or undermines the verified outcome. Consequently, PBSD provides a principled and elegant reweighting scheme that transforms sparse outcome supervision into Bayes-calibrated turn-level credit signals, while remaining fully compatible with standard policy optimization. Experiments demonstrate that PBSD consistently enhances performance across both in-domain and out-of-domain settings, and effectively transfers knowledge from short-context training to long-context inference, suggesting that its fine-grained credit assignment mechanism facilitates more effective policy learning and yields improved generalization.
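
The Bayes-rule conversion and its turn-level decomposition can be written out explicitly. The notation is assumed here and is not taken from the paper: $x$ is the task prompt, $a^\star$ the verified answer, $\tau = (\tau_1, \dots, \tau_T)$ the agent's turns, $\pi_\theta(\cdot \mid x)$ the student, and $\pi_\theta(\cdot \mid x, a^\star)$ the answer-conditioned teacher. The sketch also assumes that environment observations contribute equally to both likelihoods, so only the agent's own turns appear in the sum.

```latex
% Bayes' rule turns the answer-side ratio into a trajectory-side ratio.
\frac{p(a^\star \mid x, \tau)}{p(a^\star \mid x)}
  = \frac{p(\tau \mid x, a^\star)}{p(\tau \mid x)}

% Autoregressive factorization gives one log-ratio term per turn.
\log \frac{p(\tau \mid x, a^\star)}{p(\tau \mid x)}
  = \sum_{t=1}^{T} \Big[
      \log \pi_\theta(\tau_t \mid x, a^\star, \tau_{<t})
      - \log \pi_\theta(\tau_t \mid x, \tau_{<t})
    \Big]
```

Under this reading, a positive term marks a turn that the answer-conditioned teacher finds more likely than the student does, that is, a turn that supports the verified outcome, while a negative term marks one that undermines it.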

2606.09380 2026-06-09 cs.LG cs.AI cs.CL 新提交

Reasoning Arena: Trace Tournaments When Verifiable Rewards Fall Short

推理竞技场：当可验证奖励不足时的轨迹锦标赛

Han Zhou, Adam X. Yang, Laurence Aitchison, Anna Korhonen, Albert Q. Jiang

发表机构 * University of Cambridge（剑桥大学）； Mistral AI

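AI summary: Proposes the Reasoning Arena framework, which uses trace tournaments to turn non-diverse reward groups that carry no gradient signal into relative reward signals and fits a Bradley-Terry model for efficient integration with reinforcement learning, improving by 7.6% on average on mathematics and coding benchmarks and accelerating training by 27% to 41%.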

Comments 9 pages, 6 figures, 2 tables (17 pages including references and appendices)

AI中文摘要

基于可验证奖励的强化学习（RLVR）已成为通过结果监督提升大语言模型推理能力的主流范式。然而，可验证奖励在组级别常常变得无信息：当给定提示的所有采样轨迹获得相同奖励时，组相对优势估计无法提供梯度信号，尽管这些轨迹在推理质量上可能差异显著。我们提出推理竞技场，一种自适应训练框架，将此类非多样奖励组路由至裁判系统而非丢弃。除了检查最终答案，推理竞技场构建轨迹锦标赛，其中推理轨迹进行两两比较以暴露组内更细粒度的偏好，将推理质量转化为丰富的相对奖励信号。为使奖励估计高效，而非穷举比较每一对，每个新轨迹与一个动态更新的先前生成轨迹小池作为锚点进行评估，以高效建立相对排名。然后我们在不完整比较图上拟合Bradley-Terry模型，实现无需二次成对比较的可扩展强化学习集成。实验结果表明，推理竞技场在竞赛数学和编码基准上平均比RLVR基线高出7.6%。通过将原本浪费的零优势样本转化为有用的梯度更新，我们的方法加速训练27%至41%，节省近50%的生成计算量，并显著提升整体推理性能。

英文摘要

Reinforcement learning with verifiable rewards (RLVR) has become a leading paradigm for improving the reasoning ability of large language models through outcome-based supervision. However, verifiable rewards frequently become uninformative at the group level: when all sampled traces of a given prompt receive identical rewards, group-relative advantage estimation provides no gradient signal, even though the traces may differ substantially in reasoning quality. We propose Reasoning Arena, an adaptive training framework that routes such non-diverse reward groups to a judge system instead of discarding them. Beyond examining the final answer, Reasoning Arena constructs trace tournaments, where reasoning traces are compared head-to-head to expose finer-grained preferences within the group, converting reasoning quality into rich relative reward signals. To make reward estimation efficient, rather than exhaustively comparing every pair, each new trace is evaluated against a small, dynamically updated pool of previously generated traces as anchors to efficiently establish a relative ranking. We then fit a Bradley-Terry model on the incomplete comparison graph, enabling scalable RL integration without quadratic pairwise comparisons. Empirical results demonstrate that Reasoning Arena consistently outperforms the RLVR baseline by 7.6% on average in competition mathematics and coding benchmarks. By converting otherwise wasted zero-advantage samples into useful gradient updates, our method accelerates training by 27% to 41%, saving nearly 50% of generation compute, and substantially improves overall reasoning performance.
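
Fitting a Bradley-Terry model on an incomplete comparison graph can be sketched with the standard minorization-maximization (MM) update, which only needs the pairs that were actually compared. This is a generic illustration, not the paper's estimator: the function name and iteration count are arbitrary, and the sketch assumes the comparison graph is connected and every trace wins at least once (otherwise the maximum-likelihood strengths are not finite and some regularization would be needed).

```python
from collections import defaultdict


def fit_bradley_terry(num_items: int, comparisons: list,
                      iterations: int = 2000) -> list:
    """MM iterations for Bradley-Terry strengths, normalized to sum to 1.

    comparisons is a list of (winner, loser) index pairs; pairs that were
    never compared simply do not appear.
    """
    wins = [0.0] * num_items
    pair_counts = defaultdict(int)
    for winner, loser in comparisons:
        wins[winner] += 1.0
        pair_counts[(min(winner, loser), max(winner, loser))] += 1

    strengths = [1.0 / num_items] * num_items
    for _ in range(iterations):
        denom = [0.0] * num_items
        for (i, j), count in pair_counts.items():
            shared = count / (strengths[i] + strengths[j])
            denom[i] += shared
            denom[j] += shared
        updated = [
            wins[k] / denom[k] if denom[k] > 0 else strengths[k]
            for k in range(num_items)
        ]
        total = sum(updated)
        strengths = [s / total for s in updated]
    return strengths


# Four traces compared only along a chain (0-1, 1-2, 2-3): three of the six
# possible pairs are judged, each with a 2:1 record for the lower index.
chain = []
for better, worse in [(0, 1), (1, 2), (2, 3)]:
    chain += [(better, worse), (better, worse), (worse, better)]
scores = fit_bradley_terry(4, chain)
```

The fitted strengths give a full ranking of all four traces even though traces 0 and 3 were never compared directly, which is the property that lets a small anchor pool replace exhaustive pairwise judging.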

2606.09668 2026-06-09 cs.LG 新提交

Algorithm for Contextual Queueing Bandits with Rate-Optimal Queue Length Regret

具有速率最优队列长度遗憾的上下文队列赌博机算法

Seoungbin Bae, Dabeen Lee

发表机构 * KAIST（韩国科学技术院）； Seoul National University（首尔大学）

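AI summary: For contextual queueing bandits, proposes the three-phase algorithm CQB-η-2, which performs random exploration only before a cutoff round, improving the queue length regret from Õ(T^{-1/4}) to Õ(T^{-1/2}) and proving that this rate is minimax optimal.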

AI中文摘要

上下文队列赌博机为在未知上下文相关服务速率下学习调度异构作业提供了框架。在随机上下文下，现有算法实现了 $\widetilde{\mathcal{O}}(T^{-1/4})$ 的队列长度遗憾，定义为学习者在时间 $T$ 的队列长度与最优队列长度之差的期望。本文将该速率改进至 $\widetilde{\mathcal{O}}(T^{-1/2})$。关键观察是随机探索仅需在精心选择的截止轮之前进行，而非整个时间范围。我们提出 CQB-$\eta$-2，一个三阶段算法：(i) 纯随机探索以构建初始估计器，(ii) $\eta$-随机探索结合 UCB 规则以在保持负漂移的同时继续学习，(iii) 探索截止后的纯 UCB。我们的证明在截止轮处分解队列长度遗憾。截止前，负漂移抑制了由次优选择引起的队列长度差异。截止后，前两个阶段提供了足够的随机探索样本，确保 UCB 决策导致的离开率差距较小。结合这两个界得到 $\widetilde{\mathcal{O}}(T^{-1/2})$ 阶的队列长度遗憾。我们进一步证明了 $\Omega(T^{-1/2})$ 阶的最小最大下界。证明构造了两个统计上不可区分的困难实例直到最终服务决策，并使用队列特定的耦合论证将由此产生的检验误差转化为队列长度遗憾。综上，我们的上下界刻画了在时间 $T$ 上的最小最大依赖关系（忽略对数因子）。

英文摘要

Contextual queueing bandits provide a framework for learning to schedule heterogeneous jobs under unknown context-dependent service rates. Under stochastic contexts, existing algorithms achieve $\widetilde{\mathcal{O}}(T^{-1/4})$ queue length regret, defined as the expected difference between the learner's and oracle's queue lengths at horizon $T$. In this paper, we improve this rate to $\widetilde{\mathcal{O}}(T^{-1/2})$. The key observation is that random exploration is needed only up to a carefully chosen cutoff round, rather than throughout the entire horizon. We propose CQB-$\eta$-2, a three-phase algorithm: (i) pure random exploration to construct an initial estimator, (ii) $\eta$-random exploration combined with a UCB rule to continue learning while maintaining negative drift, and (iii) pure UCB after the exploration cutoff. Our proof decomposes the queue length regret at the cutoff round. Before the cutoff, negative drift suppresses queue length differences caused by suboptimal choices. After the cutoff, the first two phases provide sufficient random exploration samples, ensuring that UCB decisions incur small departure-rate gaps. Combining these two bounds yields queue length regret of order $\widetilde{\mathcal{O}}(T^{-1/2})$. We further prove a minimax lower bound of order $\Omega(T^{-1/2})$. The proof constructs two hard instances that are statistically indistinguishable up to the final service decision, and uses a queue-specific coupling argument to convert the resulting testing error into queue length regret. Together, our upper and lower bounds characterize the minimax dependence on the horizon $T$ up to logarithmic factors.
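
The three-phase schedule can be sketched as a small decision rule that says, for each round, whether to serve at random or by the UCB rule. Only the phase structure comes from the abstract; the function name, the parameter names `warmup_rounds` and `cutoff_round`, and the example values are hypothetical, and the UCB rule itself is left out.

```python
import random


def exploration_mode(round_index: int, warmup_rounds: int, cutoff_round: int,
                     eta: float, rng: random.Random) -> str:
    """Return "random" or "ucb" for the given round.

    Phase (i):   rounds before warmup_rounds explore uniformly at random.
    Phase (ii):  until cutoff_round, explore at random with probability eta
                 and follow the UCB rule otherwise.
    Phase (iii): from cutoff_round on, always follow the UCB rule.
    """
    if round_index < warmup_rounds:
        return "random"
    if round_index < cutoff_round and rng.random() < eta:
        return "random"
    return "ucb"


rng = random.Random(0)
schedule = [
    exploration_mode(t, warmup_rounds=5, cutoff_round=50, eta=0.2, rng=rng)
    for t in range(100)
]
```

Every round from `cutoff_round` onward is a UCB round, reflecting the abstract's point that random exploration is needed only up to the cutoff and not across the whole horizon.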

2606.09802 2026-06-09 cs.LG cs.AI stat.ML 新提交

Bandits for Efficient Experimentation: Adapting to Control Group, Preferences, and Context Drifts

高效实验的Bandits：适应控制组、偏好和上下文漂移

Udvas Das, Waris Radji, Debabrota Basu, Odalric-Ambrym Maillard

发表机构 * Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9189 – CRIStAL（里尔大学、Inria（法国国家信息与自动化研究所）、法国国家科学研究中心、里尔中央理工学院、UMR 9189 – CRIStAL）

AI总结针对用户偏好和上下文分布随时间漂移的线性上下文随机多臂赌博机问题，提出Dri-MED算法，通过异方差回归处理非平稳噪声，实现实例相关的遗憾界和约束违规界。

详情

AI中文摘要

我们考虑线性上下文随机多臂赌博机的一个变体，其中学习器必须向一组用户提供推荐，每个用户有其个性化的偏好向量，并且上下文分布随时间漂移。在实践者友好的假设下，我们将此设置简化为具有平稳均值但异方差和非平稳噪声的线性赌博机。我们进一步研究了学习器必须确保每个决策的平均奖励超过基线策略$\boldsymbol{\pi}_0$在每个决策步骤的均值的情况。我们引入了Dri-MED，一种受MED策略线性版本启发并仔细调整以处理非平稳异方差噪声的算法。我们表明，实例相关的遗憾界为$\tilde{\mathcal O}\left(\frac{\kappa}{\tilde{\Delta}}d^2\log(T)\right)$，其中$\tilde{\Delta}$是受策略$\pi_0$约束的次优性间隙，方差感知乘性项$\kappa$通过异方差回归仔细处理。我们进一步表明Dri-MED享有$\tilde{\mathcal{O}}(d)$的期望约束违规。我们的数值结果表明，Dri-MED显著优于忽略漂移和偏好结构的保守基线。

英文摘要

We consider a variant of linear contextual stochastic multi-armed bandits in which the learner must provide recommendations to a group of users, each with a personalized preference vector, while the context distributions drift over time. Under practitioner-friendly assumptions, we reduce this setting to a linear bandit with stationary mean but heteroskedastic and non-stationary noise. We further study the case where the learner must ensure that the mean reward of each decision exceeds that of a baseline strategy $\boldsymbol{\pi}_0$ at each decision step. We introduce Dri-MED, an algorithm inspired by the linear version of the MED strategy and carefully adapted to handle the non-stationary heteroskedastic noise. We show that the instance-dependent regret scales as $\tilde{\mathcal O}\left(\frac{\kappa}{\tilde{\Delta}}d^2\log(T)\right)$, where $\tilde{\Delta}$ is the constraint-aware sub-optimality gap subject to policy $\pi_0$, with a variance-aware multiplicative term $\kappa$ that we carefully handle using heteroskedastic regression. We further show that Dri-MED enjoys $\tilde{\mathcal{O}}(d)$ expected constraint violations. Our numerical results suggest that Dri-MED significantly outperforms conservative baselines that ignore the drift and preference structure.


2606.09821 2026-06-09 cs.LG 新提交

Rethinking the Divergence Regularization in LLM RL

重新思考LLM强化学习中的散度正则化

Jiarui Yao, Xiangxin Zhou, Penghui Qi, Wee Sun Lee, Liefeng Bo, Tianyu Pang

发表机构 * Tencent Hunyuan（腾讯混元）； UIUC（伊利诺伊大学厄巴纳-香槟分校）； NUS（新加坡国立大学）

AI总结针对PPO等方法的硬裁剪或硬掩码在长尾词汇中分布偏移代理不佳的问题，提出DRPO，用平滑的优势加权二次正则化替代硬掩码，保持信任区域几何的同时提供连续梯度权重，提升训练稳定性和效率。

详情

AI中文摘要

强化学习已成为后训练大型语言模型的关键组成部分。在实践中，由于训练-推理不匹配和策略陈旧，LLM RL通常是离策略的，因此信任区域控制对于稳定优化至关重要。PPO和GRPO等主流方法通过比率裁剪机制近似这种控制，但在长尾词汇中，重要性比率可能成为分布偏移的糟糕代理。最近的工作如DPPO通过用基于散度的掩码替换基于比率的裁剪来解决这种不匹配，从而产生由采样令牌的绝对概率偏移定义的信任区域。然而，DPPO仍然依赖于硬掩码：一旦令牌以有害方向越过信任区域边界，其梯度就会被丢弃而不是纠正。为了解决这个问题，我们提出了散度正则化策略优化（DRPO），它用策略偏移上的平滑优势加权二次正则化器替换硬掩码。DRPO保留了与DPPO相同的信任区域几何，同时引入了有界、连续的梯度权重，这些权重衰减发散更新并在边界之外提供纠正信号。跨模型规模、架构和精度设置的实验表明，DRPO提高了LLM RL训练的稳定性和效率。

英文摘要

Reinforcement learning (RL) has become a key component of post-training large language models (LLMs). In practice, LLM RL is often off-policy because of training-inference mismatch and policy staleness, making trust-region control essential for stable optimization. Mainstream methods such as PPO and GRPO approximate this control with a ratio-clipping mechanism, but the importance ratio can be a poor proxy for distributional shift in long-tailed vocabularies. Recent work such as DPPO addresses this mismatch by replacing ratio-based clipping with a divergence-based mask, yielding a trust region defined by the sampled token's absolute probability shift. However, DPPO still relies on a hard mask: once a token crosses the trust-region boundary in a harmful direction, its gradient is discarded rather than corrected. To address this, we propose Divergence Regularized Policy Optimization (DRPO), which replaces the hard mask with a smooth advantage-weighted quadratic regularizer on policy shift. DRPO preserves the same trust-region geometry as DPPO while inducing bounded, continuous gradient weights that attenuate diverging updates and provide corrective signals beyond the boundary. Experiments across model scales, architectures, and precision settings show that DRPO improves the stability and efficiency of LLM RL training.
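
The remark that the importance ratio can be a poor proxy for distributional shift in long-tailed vocabularies can be illustrated with a toy calculation. The sketch below is neither DPPO nor DRPO; the token probabilities, the clip width `eps=0.2`, and both helper names are assumptions made only for illustration.

```python
def ratio_and_shift(p_old: float, p_new: float) -> tuple[float, float]:
    """Return (importance ratio, absolute probability shift) for one sampled token."""
    return p_new / p_old, abs(p_new - p_old)


def ratio_clip_active(ratio: float, eps: float = 0.2) -> bool:
    """True when the ratio leaves a PPO-style clip range [1 - eps, 1 + eps]."""
    return ratio < 1.0 - eps or ratio > 1.0 + eps


# Hypothetical token probabilities, chosen only for illustration.
rare_ratio, rare_shift = ratio_and_shift(p_old=1e-4, p_new=3e-4)        # long-tail token
common_ratio, common_shift = ratio_and_shift(p_old=0.50, p_new=0.59)    # frequent token

# The rare token triggers ratio clipping although its probability barely moved,
# while the frequent token moves far more probability mass without being clipped.
print(ratio_clip_active(rare_ratio), rare_shift)
print(ratio_clip_active(common_ratio), common_shift)
```

A divergence-based criterion of the kind the abstract describes would look at the absolute shift instead, which reverses the ordering of these two tokens.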


2606.09825 2026-06-09 cs.LG cs.AI cs.SY eess.SY math.OC 新提交

An Agency-Transferring Model-Free Policy Enhancement Technique

一种代理权转移的无模型策略增强技术

Anton Bolychev, Georgiy Malaniya, Sinan Ibrahim, Pavel Osinenko

发表机构 * Center for Engineering Systems and Sciences（工程系统与科学中心）； Central University（中央大学）； Sirius University of Science and Technology（天狼星科技大学）

AI总结提出一种将次优基线策略嵌入强化学习训练的方法，通过逐步从基线策略向可学习策略转移代理权，提升训练效率并最终获得超越基线的独立策略。

详情

AI中文摘要

从头开始训练强化学习（RL）策略成本高昂：需要仔细设计奖励和环境、大量调参以及大量计算。然而，许多控制问题已经有一个功能正常但次优的基线策略可用。本文提出一种方法，将这样的基线策略嵌入RL训练过程，同时提高相对于从头开始方法的训练效率，并产生一个优于基线的学习策略。在每个步骤中，该方法在基线策略和可训练的学习策略之间进行仲裁，最初强烈依赖基线策略，然后逐步将代理权转移给学习策略。训练结束时，学习策略是一个无需基线策略支持的独立神经网络。本文形式化了基线策略“功能正常”的含义：在该策略下，智能体以高概率到达目标集并停留在那里。所提出的仲裁机制旨在训练过程中利用这一特性，从训练开始就产生高目标到达率。理论分析在给定假设下提供了这种行为的形式化解释，并将其扩展到最终无基线场景，其中推导了独立学习策略目标到达概率的显式下界。在连续控制基准上的实验结果表明，所提出的方法实现了与竞争方法相当或更高的回报，同时在训练过程中（包括最终阶段，学习策略无需任何基线支持）保持了最高的目标到达率。

英文摘要

Training reinforcement learning (RL) policies from scratch is costly: it requires careful reward and environment design, extensive tuning, and substantial computation. Yet many control problems already have a functional but suboptimal policy available as a baseline. This paper proposes a method for embedding such a baseline into the RL training process, simultaneously improving training efficiency relative to from-scratch methods and producing a learning policy that outperforms the baseline. At each step, the method arbitrates between the baseline policy and a trainable learning policy, initially relying strongly on the baseline policy and then progressively transferring agency to the learning policy. By the end of training, the learning policy is a standalone neural network that operates without baseline policy support. The paper formalizes what it means for the baseline policy to be functional: under this policy, the agent reaches a goal set and remains there with high probability. The proposed arbitration mechanism is designed to exploit this property during training, yielding high goal-reaching rates right from the beginning of training. A theoretical analysis provides a formal interpretation of this behavior under stated assumptions and extends it to the final baseline-free regime, where explicit lower bounds are derived for the goal-reaching probability of the standalone learning policy. Empirical results on continuous-control benchmarks show that the proposed method achieves returns that match or exceed those of competitive approaches, while maintaining the highest goal-reaching rates throughout training among the compared methods -- including in the final stage, where the learning policy operates without any baseline support.
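
The arbitration idea described above (rely on the baseline first, then hand control to the learning policy) can be sketched as follows. The abstract does not specify the arbitration rule, so the linear schedule and the names `baseline_probability` and `arbitrate` are hypothetical stand-ins, not the paper's mechanism.

```python
import random


def baseline_probability(step: int, total_steps: int) -> float:
    """Hypothetical linear schedule: 1.0 at step 0, 0.0 at the final step."""
    if total_steps <= 1:
        return 0.0
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    return 1.0 - frac


def arbitrate(state, step, total_steps, baseline_policy, learning_policy, rng):
    """Choose which policy acts at this step and return its action."""
    if rng.random() < baseline_probability(step, total_steps):
        return baseline_policy(state)
    return learning_policy(state)


rng = random.Random(0)
first = arbitrate(None, 0, 100, lambda s: "baseline", lambda s: "learner", rng)
last = arbitrate(None, 99, 100, lambda s: "baseline", lambda s: "learner", rng)
print(first, last)
```

With this schedule the final steps are driven entirely by the learning policy, which mirrors the baseline-free end state the abstract describes.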


2606.07845 2026-06-09 cs.MA cs.LG 交叉投稿

GRPO Does Not Close the Multi-Agent Coordination Gap

GRPO 并未缩小多智能体协调差距

Najmul Hasan, Prashanth BusiReddyGari

发表机构 * Department of Mathematics and Computer Science, University of North Carolina at Pembroke（北卡罗来纳大学彭布罗克分校数学与计算机科学系）

AI总结通过哲学家就餐问题测试大语言模型的多智能体协调能力，发现GRPO训练无法显著提升性能，瓶颈在于训练方法而非计算量。

Comments 15 pages, 15 figures

详情

AI中文摘要

我们使用哲学家就餐问题作为干净的测试平台，衡量当前大型语言模型作为共享公共资源的多个智能体进行协调的能力。在涵盖七个模型和三种哲学家数量的630个回合中，四个前沿闭源系统的平均奖励达到0.45至0.87，Mistral-Small 24B达到0.83至0.99，而Qwen3-14B仅为0.13至0.35。然后我们询问，基于任务自身展开的群体相对策略优化（GRPO）能否缩小差距，结果发现不能：对五个哲学家场景的每回合奖励进行Welch t检验，p=0.66，Hedges' g=-0.11，在十个或十五个哲学家场景下也没有统计显著变化。两个进一步的观察限定了这一结果。8B和14B运行中的训练奖励在第九步达到峰值后下降，因此默认在第15步保存的检查点严格劣于之前的几个检查点。我们使用的四项奖励在零动作时存在退化最大值，DeepSeek-R1-Distill-Qwen-7B和Mistral-Small 24B在五个哲学家场景下都处于该状态，零餐时的平均奖励分别为1.0和0.83。对于开放权重的14B模型，多智能体协调的瓶颈不是训练计算量，而是训练方法：不会坍缩到无动作最大值的奖励塑造、不依赖最后一步的检查点纪律，以及跨问题规模的课程学习。

英文摘要

We measure how well current large language models coordinate as multiple agents sharing a common resource, using the dining philosophers problem as a clean test bed. Across 630 episodes spanning seven models and three philosopher counts, four frontier closed-source systems reach mean reward 0.45 to 0.87 and Mistral-Small 24B reaches 0.83 to 0.99, while Qwen3-14B reaches 0.13 to 0.35. We then ask whether group relative policy optimization (GRPO) on rollouts from the task itself can close the gap and find that it cannot: a Welch's t-test on per-episode reward at five philosophers gives p = 0.66 and a Hedges' g of -0.11, with no statistically significant change at ten or fifteen philosophers either. Two further observations qualify the result. The training reward of both 8B and 14B runs peaked at step nine and then declined, so the default saved checkpoint at step 15 is strictly worse than several earlier ones. The four-term reward we use admits a degenerate maximum at zero actions, which DeepSeek-R1-Distill-Qwen-7B and Mistral-Small 24B at five philosophers both inhabit, with mean reward 1.0 and 0.83 respectively at zero meals. The bottleneck for an open-weight 14B model on multi-agent coordination is not training compute but training methodology: reward shaping that does not collapse to a no-action maximum, checkpoint discipline that does not depend on the final step, and curriculum across problem scales.
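
For readers unfamiliar with the two statistics quoted above, here is a minimal sketch of Welch's t statistic and Hedges' g using only the standard library. The reward lists are made-up placeholders, not data from the paper, and the p-value is omitted because it needs a t-distribution CDF.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb  # squared standard errors
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


def hedges_g(a, b):
    """Cohen's d with pooled SD, times the usual small-sample correction."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    d = (mean(a) - mean(b)) / math.sqrt(pooled_var)
    correction = 1.0 - 3.0 / (4.0 * (na + nb) - 9.0)
    return d * correction


# Placeholder per-episode rewards for two hypothetical checkpoints.
after_training = [1.0, 2.0, 3.0, 4.0, 5.0]
before_training = [2.0, 3.0, 4.0, 5.0, 6.0]
t_stat, dof = welch_t(after_training, before_training)
g = hedges_g(after_training, before_training)
print(t_stat, dof, g)
```

A small absolute g together with a large p-value, as reported in the abstract, indicates that the two reward distributions are practically indistinguishable.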


2606.08032 2026-06-09 stat.ML cs.LG 交叉投稿

Variational Proximal Policy Optimization

变分近端策略优化

Ousmane Amadou Dia

发表机构 * University of California, Berkeley（加州大学伯克利分校）

AI总结提出变分近端策略优化（VP₂O），利用粒子变分推理和专家混合架构，通过几何近端控制机制解决强化学习中的策略模式崩溃和分布漂移问题，在复杂推理任务上取得显著提升。

2606.08249 2026-06-09 cs.RO cs.LG 交叉投稿

面向自主水下机器人的端到端运动规划与执行：基于强化学习的方法

Elisei Shafer, Oren Gal

发表机构 * University of Haifa（海法大学）

AI总结提出分层强化学习架构，将原始传感器数据直接映射为推进器指令，实现AUV端到端运动规划与执行，在HoloOcean仿真中轨迹长度接近RRT*基线（误差4%-6%），并具备鲁棒性。

详情

AI中文摘要

自主水下机器人（AUV）传统上依赖复杂、高度工程化的流水线进行感知、路径规划和运动控制。本文探索了一种端到端深度强化学习（DRL）方法的可行性，该方法将原始传感器数据直接映射为推进器指令，减少了人工工程。我们提出了一种分层强化学习（HRL）架构，将问题分解为两个马尔可夫决策过程。高层（HL）策略以2Hz运行，处理原始$84 \times 84$像素单目相机帧、堆叠的$100 \times 100$像素前视成像声纳以及本体感受数据，生成空间子目标。同时，低层（LL）策略以10Hz运行，将这些子目标转换为推进器指令。HL策略使用基于先前演示的强化学习（RLPD）在修改后的样本高效机器人强化学习（SERL）框架中训练，而LL策略则采用软演员-评论家（SAC）结合后见经验回放（HER）。在高保真HoloOcean模拟器中评估，我们的方法展示了成功的避障能力，轨迹长度与$\text{RRT}^*$规划基线非常接近（误差在4%到6%之间）。此外，学习到的策略对模拟传感器噪声和能见度降低表现出强鲁棒性。尽管系统能有效导航熟悉的几何环境，但实验揭示了在遇到具有新颖障碍形状的未访问区域时存在泛化限制。最终，这项工作展示了使用最小计算硬件进行样本高效、端到端DRL在水下导航中的潜力。

英文摘要

Autonomous Underwater Vehicles (AUVs) traditionally rely on complex, heavily engineered pipelines for perception, path planning, and motion control. This paper explores the feasibility of an end-to-end Deep Reinforcement Learning (DRL) approach that maps raw sensor data directly to thruster commands, reducing manual engineering. We propose a hierarchical reinforcement learning (HRL) architecture splitting the problem into two Markov Decision Processes. A High-Level (HL) policy operating at 2Hz processes raw $84 \times 84$ pixel monocular camera frames, stacked $100 \times 100$ pixel forward-looking imaging sonar, and proprioceptive data to generate spatial subgoals. Simultaneously, a Low-Level (LL) policy operating at 10Hz converts these subgoals into thruster commands. The HL policy is trained using Reinforcement Learning from Prior Demonstrations (RLPD) within a modified Sample-Efficient Robotic Reinforcement Learning (SERL) framework, while the LL policy utilizes Soft Actor-Critic (SAC) combined with Hindsight Experience Replay (HER). Evaluated in the high-fidelity HoloOcean simulator, our method demonstrates successful obstacle avoidance, achieving trajectory lengths closely approximating (within 4% to 6% of) an $\text{RRT}^*$ planning baseline. Furthermore, the learned policy exhibits strong robustness to simulated sensor noise and decreased visibility. While the system navigates familiar geometries effectively, experiments reveal generalization limitations when encountering unvisited areas with novel obstacle shapes. Ultimately, this work demonstrates the promise of sample-efficient, end-to-end DRL for underwater navigation using minimal computational hardware.
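
The two-rate structure described above (a 2 Hz high-level policy feeding a 10 Hz low-level policy) can be sketched as a simple loop. Only the two rates come from the abstract; `observe`, `high_level`, `low_level`, and `apply_thrust` are hypothetical placeholder callables, and the real system's scheduling may differ.

```python
HL_RATE_HZ = 2    # high-level policy rate stated in the abstract
LL_RATE_HZ = 10   # low-level policy rate stated in the abstract
LL_STEPS_PER_HL = LL_RATE_HZ // HL_RATE_HZ  # 5 low-level steps per subgoal


def run_episode(observe, high_level, low_level, apply_thrust, n_ll_steps):
    """Run a two-rate control loop and return how often the subgoal was refreshed."""
    subgoal = None
    subgoal_updates = 0
    for t in range(n_ll_steps):
        obs = observe()
        if t % LL_STEPS_PER_HL == 0:  # every 0.5 s of simulated time
            subgoal = high_level(obs)
            subgoal_updates += 1
        apply_thrust(low_level(obs, subgoal))
    return subgoal_updates


# Two simulated seconds with dummy policies.
updates = run_episode(lambda: 0, lambda o: 1, lambda o, g: 0, lambda c: None, 20)
print(updates)
```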


2606.09002 2026-06-09 stat.ML cs.LG math.ST stat.TH 交叉投稿

Multi-Armed Bandits with Arriving Arms: Sequential Screening, Dynamic Regret, and Sublinear Guarantees

带有到达臂的多臂老虎机：顺序筛选、动态遗憾与次线性保证

Deqi Zheng, Xiaoyang Xu, Yuhong Yang

发表机构 * Qiuzhen College, Tsinghua University（清华大学求真书院）； Yau Mathematical Sciences Center, Tsinghua University（清华大学丘成桐数学科学中心）

AI总结针对可用臂随时间扩展的随机多臂老虎机问题，提出基于消除的UCB-AA算法，通过初步筛选新臂并考虑到达信息差异和漂移基准，实现动态遗憾的次线性界。

Comments 24 pages, 4 figures

详情

AI中文摘要

我们研究了一个随机多臂老虎机问题，其中可用臂的集合随时间扩展。这一设置出现在顺序实验中：当研究进行期间有新的动作或治疗方案可用时，以事后单一最佳臂为基准的遗憾不再恰当。我们转而评估相对于当前可用最佳臂的性能，从而为到达臂环境引入了一个动态遗憾准则。为了解决到达信息差异（AID）和漂移基准（DB）带来的挑战，我们提出了用于到达臂的UCB（UCB-AA），这是一个基于消除的过程，并在新到达的臂与现有臂全面竞争之前增加一个辅助的初步筛选步骤。我们证明UCB-AA获得的遗憾界明确依赖于到达过程，在间隙演化的正则条件下实现了次线性动态遗憾，并可在线扩展到时间范围未知的情形。仿真结果表明，UCB-AA减少了浪费的拉取次数，保持了较小的活动臂集，同时保持了有竞争力的遗憾性能。

英文摘要

We study a stochastic multi-armed bandit problem in which the set of available arms expands over time. This setting arises in sequential experimentation when new actions or treatments become available during an ongoing study, making regret against a single best arm in hindsight inappropriate. We instead evaluate performance relative to the best arm currently available, leading to a dynamic-regret criterion for arriving-arm environments. To address the resulting challenges of arrival information discrepancy (AID) and a drifting benchmark (DB), we propose UCB for Arriving Arms (UCB-AA), an elimination-based procedure with an auxiliary preliminary screening step applied to newly arrived arms before they fully compete with incumbent arms. We show that UCB-AA attains regret bounds that depend explicitly on the arrival process, achieves sublinear dynamic regret under regularity conditions on gap evolution, and admits an online extension for unknown horizons. Simulation results show that UCB-AA reduces wasted pulls and maintains a smaller active arm set while preserving competitive regret performance.
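
To make the dynamic-regret criterion concrete, the sketch below runs plain UCB1 (not UCB-AA, which adds elimination and a screening step) on Bernoulli arms with arrival times and measures regret against the best arm available in each round. The arm means, arrival times, and function name are assumptions chosen for illustration.

```python
import math
import random


def ucb_with_arrivals(means, arrival_times, horizon, seed=0):
    """Plain UCB1 over arrived arms; returns (dynamic regret, pull counts).

    means[i] is the Bernoulli mean of arm i and arrival_times[i] is the first
    round in which it can be pulled. At least one arm must arrive at round 0.
    """
    rng = random.Random(seed)
    pulls = [0] * len(means)
    totals = [0.0] * len(means)
    regret = 0.0
    for t in range(horizon):
        available = [i for i, a in enumerate(arrival_times) if a <= t]
        untried = [i for i in available if pulls[i] == 0]
        if untried:
            arm = untried[0]  # pull each newly available arm once
        else:
            arm = max(
                available,
                key=lambda i: totals[i] / pulls[i]
                + math.sqrt(2.0 * math.log(t + 1) / pulls[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
        # The benchmark drifts: compare with the best arm available right now.
        regret += max(means[i] for i in available) - means[arm]
    return regret, pulls


regret, pulls = ucb_with_arrivals([0.3, 0.8], [0, 50], horizon=200)
print(regret, pulls)
```

Before round 50 only the first arm exists, so the learner incurs no dynamic regret there even though a better arm arrives later; a hindsight benchmark would penalize those rounds.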


2308.07822 2026-06-09 cs.LG cs.SY eess.SY 版本更新

Deep reinforcement learning for process design: Review and perspective

深度强化学习在过程设计中的应用：综述与展望

Qinghe Gao, Artur M. Schweidtmann

发表机构 * Delft University of Technology（代尔夫特理工大学）

AI总结本文综述深度强化学习在化工过程设计中的应用，从信息表示、智能体架构、环境与奖励三要素分析现状，并讨论挑战与未来方向。

2502.01226 2026-06-09 cs.LG stat.ML 版本更新

Adaptive Prior Selection in Gaussian Process Bandits with Thompson Sampling

基于汤普森采样的高斯过程赌博机中的自适应先验选择

Jack Sandberg, Morteza Haghir Chehreghani

发表机构 * Department of Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg（查尔姆斯理工大学与哥德堡大学计算机科学与工程系）

AI总结本文提出两种算法，在高斯过程赌博机中同时进行先验选择和遗憾最小化，理论分析证明了HP-GP-TS的次线性遗憾界，并通过实验验证其有效性。

Comments 30 pages, 12 figures

2506.10341 2026-06-09 cs.LG cs.CL 版本更新

Formalizing Learning from Language Feedback with Provable Guarantees

从语言反馈中学习的形式化与可证明保证

Wanqiao Xu, Allen Nie, Ruijie Zheng, Aditya Modi, Adith Swaminathan, Ching-An Cheng

发表机构 * University of California, Berkeley（加州大学伯克利分校）； University of Washington（华盛顿大学）； University of Toronto（多伦多大学）

AI总结本文形式化语言反馈学习问题，提出迁移eluder维度刻画学习难度，并开发无遗憾算法HELiX，证明其性能保证，展示丰富语言反馈可指数级加速学习。

Comments ICML 2026

详情

AI中文摘要

通过观察和语言反馈进行交互式学习是一个日益受到关注的领域，其驱动力来自大型语言模型（LLM）智能体的出现。尽管有令人印象深刻的实证演示，但迄今为止，这些决策问题的原则性框架仍然缺乏。我们形式化了语言反馈学习（LLF）问题，提出了足以在潜在奖励下实现学习的假设，并引入了$\textit{迁移eluder维度}$（transfer eluder dimension）作为衡量LLF难度的指标。我们形式化了语言反馈中的信息控制学习复杂性的直觉，并展示了从丰富语言反馈中学习可以比从奖励中学习指数级更快的案例。我们开发了一种名为$\texttt{HELiX}$的无遗憾算法，通过顺序交互可证明地解决LLF问题，其性能保证随迁移eluder维度缩放。在多个实证领域，我们展示了即使重复提示LLM不可靠时，$\texttt{HELiX}$也能表现良好。我们的贡献标志着朝着使用通用语言反馈设计原则性交互学习算法迈出了重要一步。

英文摘要

Interactively learning from observation and language feedback is an increasingly studied area driven by the emergence of large language model (LLM) agents. Despite impressive empirical demonstrations, so far a principled framing of these decision problems remains lacking. We formalize the Learning from Language Feedback (LLF) problem, assert sufficient assumptions to enable learning despite latent rewards, and introduce $\textit{transfer eluder dimension}$ as a measure to characterize the hardness of LLF. We formalize the intuition that information in the language feedback governs the learning complexity, and demonstrate cases where learning from rich language feedback can be exponentially faster than learning from reward. We develop a no-regret algorithm, called $\texttt{HELiX}$, that provably solves LLF problems through sequential interactions, with performance guarantees that scale with the transfer eluder dimension. Across several empirical domains, we show that $\texttt{HELiX}$ performs well even when repeatedly prompting LLMs does not work reliably. Our contributions mark an important step towards designing principled interactive learning algorithms using generic language feedback.


2508.06336 2026-06-09 cs.LG cs.AI cs.HC cs.MA 版本更新

Unsupervised Partner Design Enables Robust Ad-hoc Teamwork

无监督伙伴设计实现鲁棒的临时团队协作

Constantin Ruhdorfer, Matteo Bortoletto, Victor Oei, Anna Penzkofer, Andreas Bulling

发表机构 * University of Southampton（南安普顿大学）

AI总结提出无监督伙伴设计(UPD)方法，通过动态生成并基于可学习性准则自适应选择训练伙伴，无需预训练伙伴群体或手动调参，在多个任务中达到强性能，并在人机交互研究中获得更高评价。

Comments 27 pages

2508.06659 2026-06-09 cs.LG cs.AI 版本更新

In-Context Reinforcement Learning via Communicative World Models

通过通信世界模型进行上下文强化学习

Fernando Martinez-Lopez, Tao Li, Yingdong Lu, Juntao Chen

发表机构 * Department of Computer and Information Sciences, Fordham University（福特汉姆大学计算机与信息科学系）； Department of Systems Engineering, City University of Hong Kong（香港城市大学系统工程系）； IBM Research（IBM研究院）

AI总结提出CORAL框架，通过将潜在表示学习与控制分离，利用信息代理预训练世界模型并生成通信消息，使控制代理实现零样本适应和样本效率提升。

详情

AI中文摘要

强化学习（RL）代理通常难以在不更新参数的情况下泛化到新任务和上下文，主要是因为它们学到的表示和策略过度拟合于训练环境的特定性。为了提升代理的上下文RL（ICRL）能力，本文将ICRL形式化为一个双代理涌现通信问题，并引入了CORAL（用于自适应RL的通信表示）框架，该框架通过功能性地分离潜在表示学习与控制来学习可迁移的通信上下文。在CORAL中，信息代理（IA）在多样化的任务分布上作为世界模型进行预训练。其目标不是直接最大化回报，而是进行世界建模并将其理解提炼为简洁的消息。涌现通信协议由一种新颖的因果影响损失塑造，该损失衡量消息对下一动作的影响。在部署期间，预训练的IA作为固定上下文提供者服务于新的控制代理（CA），后者通过解释提供的通信上下文来学习解决任务。我们的实验表明，这种方法使CA能够实现样本效率的显著提升，并在多样化的在线和离线环境中借助预训练的IA成功进行零样本适应，验证了学习可迁移通信表示的有效性。

面向组合动作强化学习的潜在球形流策略

Lingkai Kong, Anagha Satish, Hezi Jiang, Akseli Kangaslahti, Andrew Ma, Wenbo Chen, Mingxiao Song, Lily Xu, Milind Tambe

发表机构 * University of California, Berkeley（加州大学伯克利分校）

AI总结提出LSFlow方法，通过球形流匹配在紧凑连续潜在空间中学习随机策略，并利用组合优化求解器保证动作可行性，引入平滑贝尔曼算子解决不连续值函数问题，在多个组合RL任务上平均超越基线20.6%。

Comments ICML'26 Spotlight

详情

AI中文摘要

具有组合动作空间的强化学习（RL）仍然具有挑战性，因为可行动作集呈指数级增长且受复杂可行性约束，使得直接策略参数化不切实际。现有方法将任务特定的价值函数嵌入到约束优化程序中，或学习确定性的结构化策略，牺牲了通用性和策略表达能力。我们提出了一种求解器诱导的潜在球形流策略，将现代生成策略的表达能力引入组合RL，同时通过设计保证可行性。我们的方法LSFlow通过球形流匹配在紧凑连续潜在空间中学习随机策略，并将可行性委托给组合优化求解器，该求解器将每个潜在样本映射到有效的结构化动作。为了提高效率，我们直接在潜在空间中训练价值网络，避免在策略优化期间重复调用求解器。为了解决由求解器动作选择引起的分段常数和不连续价值景观，我们引入了一个平滑的贝尔曼算子，该算子产生稳定、定义明确的学习目标。实验表明，我们的方法在一系列具有挑战性的组合RL任务中平均优于最先进的基线20.6%。

英文摘要

Reinforcement learning (RL) with combinatorial action spaces remains challenging because feasible action sets are exponentially large and governed by complex feasibility constraints, making direct policy parameterization impractical. Existing approaches embed task-specific value functions into constrained optimization programs or learn deterministic structured policies, sacrificing generality and policy expressiveness. We propose a solver-induced latent spherical flow policy that brings the expressiveness of modern generative policies to combinatorial RL while guaranteeing feasibility by design. Our method, LSFlow, learns a stochastic policy in a compact continuous latent space via spherical flow matching, and delegates feasibility to a combinatorial optimization solver that maps each latent sample to a valid structured action. To improve efficiency, we train the value network directly in the latent space, avoiding repeated solver calls during policy optimization. To address the piecewise-constant and discontinuous value landscape induced by solver-based action selection, we introduce a smoothed Bellman operator that yields stable, well-defined learning targets. Empirically, our approach outperforms state-of-the-art baselines by an average of 20.6% across a range of challenging combinatorial RL tasks.
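
The division of labor described above, where a continuous latent sample is turned into a feasible structured action by a solver, can be sketched with a toy problem. Here the "solver" is a top-k selection, which exactly maximizes a linear objective under a cardinality constraint; this constraint, the uniform sampler, and both function names are assumptions for illustration and stand in for the learned flow policy and the paper's actual solvers.

```python
import math
import random


def sample_unit_sphere(dim, rng):
    """Draw a direction uniformly on the unit sphere by normalizing Gaussians."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def top_k_solver(latent, k):
    """Maximize <latent, a> over binary vectors a with exactly k ones."""
    chosen = sorted(range(len(latent)), key=lambda i: latent[i], reverse=True)[:k]
    action = [0] * len(latent)
    for i in chosen:
        action[i] = 1
    return action


rng = random.Random(0)
z = sample_unit_sphere(6, rng)      # latent sample (uniform here, learned in the paper)
action = top_k_solver(z, k=2)       # always feasible: exactly two items selected
print(z, action)

# Nearby latents map to the same action, so any value defined on actions is
# piecewise constant in the latent space.
print(top_k_solver([0.1, 0.9, 0.5], 2), top_k_solver([0.2, 0.8, 0.6], 2))
```

The second print shows the piecewise-constant behavior that motivates the smoothed Bellman operator mentioned in the abstract.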


2602.02572 2026-06-09 cs.LG cs.AI 版本更新

Reward Shaping for (Inference-Time) Alignment: A Stackelberg Game Perspective

奖励塑形用于（推理时）对齐：一个Stackelberg博弈视角

Haichuan Wang, Tao Lin, Lingkai Kong, Ce Li, Hezi Jiang, Milind Tambe

发表机构 * University of Southern California（南加州大学）

AI总结针对KL正则化导致LLM继承基策略偏见的问题，提出将奖励模型优化形式化为Stackelberg博弈，并通过简单奖励塑形方案近似最优奖励模型，在推理时对齐中持续提升平均奖励并取得超过66%的胜平率。

Comments Accepted to ICML 2026. Camera-ready version

详情

AI中文摘要

现有的对齐方法直接使用从用户偏好数据中学习到的奖励模型来优化LLM策略，并相对于基策略进行KL正则化。这种做法对于最大化用户效用是次优的，因为KL正则化可能导致LLM继承基策略中与用户偏好冲突的偏见。虽然放大偏好输出的奖励可以减轻这种偏见，但也增加了奖励黑客的风险。这种权衡激励了在KL正则化下最优设计奖励模型的问题。我们将这个奖励模型优化问题形式化为一个Stackelberg博弈，并表明一个简单的奖励塑形方案可以有效近似最优奖励模型。我们在推理时对齐设置中经验性地评估了我们的方法，并证明它可以无缝集成到现有的对齐方法中，且开销最小。我们的方法持续提高了平均奖励，并在与所有基线的对比中取得了超过66%的胜平率（各评估设置的平均值）。

英文摘要

Existing alignment methods directly use the reward model learned from user preference data to optimize an LLM policy, subject to KL regularization with respect to the base policy. This practice is suboptimal for maximizing user's utility because the KL regularization may cause the LLM to inherit the bias in the base policy that conflicts with user preferences. While amplifying rewards for preferred outputs can mitigate this bias, it also increases the risk of reward hacking. This tradeoff motivates the problem of optimally designing reward models under KL regularization. We formalize this reward model optimization problem as a Stackelberg game, and show that a simple reward shaping scheme can effectively approximate the optimal reward model. We empirically evaluate our method in inference-time alignment settings and demonstrate that it integrates seamlessly into existing alignment methods with minimal overhead. Our method consistently improves average reward and achieves win-tie rates exceeding 66% against all baselines, averaged across evaluation settings.
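
The bias-inheritance effect described above follows from the standard closed form of KL-regularized reward maximization over a finite response set, where the optimal policy is proportional to the base probability times `exp(reward / beta)`. The sketch below evaluates that closed form on made-up numbers; it does not implement the paper's reward shaping scheme, and the scaling factor 3.0 is an arbitrary illustration.

```python
import math


def kl_regularized_policy(base_probs, rewards, beta):
    """Maximizer of E_pi[r] - beta * KL(pi || base) over a finite response set."""
    weights = [p * math.exp(r / beta) for p, r in zip(base_probs, rewards)]
    z = sum(weights)
    return [w / z for w in weights]


# Hypothetical two-response example: the base policy favors response 0,
# while the reward model prefers response 1.
base = [0.9, 0.1]
reward = [0.0, 1.0]
policy = kl_regularized_policy(base, reward, beta=1.0)
amplified = kl_regularized_policy(base, [3.0 * r for r in reward], beta=1.0)
print(policy)      # still favors response 0: the base-policy bias is inherited
print(amplified)   # amplified reward overrides the bias
```

Amplifying the reward fixes the preference here, but in practice the same amplification also magnifies reward-model errors, which is the reward hacking risk the abstract mentions.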


2602.12107 2026-06-09 cs.LG cs.AI stat.ML 版本更新

On the Complexity of Offline Reinforcement Learning with $Q^\star$-Approximation and Partial Coverage

离线强化学习在 $Q^\star$ 近似与部分覆盖下的复杂性

Haolin Liu, Braham Snyder, Chen-Yu Wei

发表机构 * University of Virginia（弗吉尼亚大学）

AI总结本文通过信息论下界证明 $Q^\star$ 可实现性与贝尔曼完备性在部分覆盖下不足以实现样本高效的离线强化学习，并提出一个通用决策-估计框架来统一和改进现有结果。

详情

AI中文摘要

我们研究了在 $Q^\star$ 近似和部分覆盖下的离线强化学习，这一设定激发了诸如保守 $Q$ 学习（CQL；Kumar et al., 2020）等实用算法，但理论上受到的关注有限。我们的工作受以下开放问题的启发：“在部分覆盖下，$Q^\star$ 可实现性和贝尔曼完备性是否足以实现样本高效的离线强化学习？”我们通过信息论下界给出了否定答案。为了识别在部分覆盖下实现样本高效离线强化学习的额外结构，我们引入了一个通用决策-估计框架，该框架受在线强化学习的无模型决策-估计系数（DEC；Foster et al., 2023b; Liu et al., 2025b）启发。我们的框架将离线强化学习的复杂性分解为决策复杂性和值估计误差，从而允许对这两个子问题进行模块化研究。我们的结果不仅统一了现有结果（Chen and Jiang, 2022; Uehara et al., 2023），而且进一步改进并推广了它们。在决策复杂性方面，我们的改进包括：在部分覆盖下软 $Q$ 学习的首个 $\epsilon^{-2}$ 样本复杂度界，改进了 Uehara 等人（2023）的 $\epsilon^{-4}$ 界；在 Chen 和 Jiang（2022）的值间隙设定中消除了对额外在线交互的需求；以及超越上述两种情况的新可学习设定。在值估计方面，我们提供了在部分覆盖下贝尔曼完备性作用的新刻画，以及一般低贝尔曼秩 MDP（Jiang et al., 2017; Du et al., 2021; Jin et al., 2021）离线可学习性的首个刻画。后者是一个经典的在线强化学习设定，除特殊情况外，在离线强化学习中尚未被探索。作为附带贡献，我们的技术给出了函数近似设定下 CQL 的首个分析。

英文摘要

We study offline reinforcement learning under $Q^\star$-approximation and partial coverage, a setting that motivates practical algorithms such as Conservative $Q$-Learning (CQL; Kumar et al., 2020) but has received limited theoretical attention. Our work is inspired by the following open question: "Are $Q^\star$-realizability and Bellman completeness sufficient for sample-efficient offline RL under partial coverage?" We answer in the negative via an information-theoretic lower bound. To identify additional structure that enables sample-efficient offline RL under partial coverage, we introduce a general decision-estimation framework, inspired by model-free decision-estimation coefficients (DEC) for online RL (Foster et al., 2023b; Liu et al., 2025b). Our framework decomposes offline RL complexity into decision complexity and value estimation error. This allows modular study of both sub-problems. Our result not only unifies existing results (Chen and Jiang, 2022; Uehara et al., 2023), but further improves and generalizes them. On the decision complexity side, our improvement includes: the first $\epsilon^{-2}$ sample complexity bound for soft $Q$-learning under partial coverage that improves Uehara et al.'s (2023) $\epsilon^{-4}$ bound, the removal of the need for additional online interaction in the value-gap setting of Chen and Jiang (2022), and new learnable settings beyond the above two cases. On the value estimation side, we provide a new characterization of the role of Bellman completeness under partial coverage, and the first characterization of offline learnability for general low-Bellman-rank MDPs (Jiang et al., 2017; Du et al., 2021; Jin et al., 2021). The latter is a canonical online RL setting that has remained unexplored in offline RL except for special cases. As a side contribution, our techniques give the first analysis of CQL in the function approximation setting.

URL PDF HTML ☆

赞 0 踩 0

2603.25184 2026-06-09 cs.LG cs.AI 版本更新

Train at Moving Edge: Online-Verified Prompt Selection for Efficient RL Training of Large Reasoning Model

在移动边缘训练：一种在线验证的提示选择方法用于大型推理模型的高效强化学习训练

Jiahao Wu, Ning Lu, Shengcai Liu, Kun Wang, Yanting Yang, Bailong Lin, Chen Jason Zhang, Li Qing, Ke Tang

发表机构 * Southern University of Science and Technology（南方科技大学）； The Hong Kong Polytechnic University（香港理工大学）； The Hong Kong University of Science and Technology（香港科技大学）； Nanyang Technological University（南洋理工大学）； Rutgers University（罗格斯大学）； The Hong Kong University of Science and Technology (Guangzhou)（香港科技大学（广州））

AI总结本文提出HIVE方法，通过历史奖励轨迹和实时提示熵实现高效RL训练，提升提示选择效率而不牺牲性能。

详情

AI中文摘要

强化学习（RL）已成为在推理任务中对大型语言模型（LLMs）进行后训练的关键技术。尽管扩大 rollout 可以稳定训练并提高性能，但计算开销是一个关键问题。在 GRPO 等算法中，每个提示多个 rollout 会带来极高的成本，因为大量提示提供的梯度微不足道，因此效用较低。为了解决这个问题，我们研究如何在 rollout 阶段之前选择高效用的提示。我们的实验分析揭示了样本效用是非均匀且动态变化的：最强的学习信号集中在「学习边缘」，即中等难度和高不确定性的交界处，该边缘随着训练进行而移动。受此启发，我们提出了 HIVE（基于历史和在线验证的提示选择），一种用于数据高效 RL 的双阶段框架。HIVE 利用历史奖励轨迹进行粗略选择，并利用提示熵作为实时代理来修剪效用过时的实例。通过在多个数学推理基准和模型上评估 HIVE，我们证明 HIVE 在不牺牲性能的情况下显著提高了 rollout 的效率。

英文摘要

Reinforcement learning (RL) has become essential for post-training large language models (LLMs) in reasoning tasks. While scaling rollouts can stabilize training and enhance performance, the computational overhead is a critical issue. In algorithms like GRPO, multiple rollouts per prompt incur prohibitive costs, as a large portion of prompts provide negligible gradients and are thus of low utility. To address this problem, we investigate how to select high-utility prompts before the rollout phase. Our experimental analysis reveals that sample utility is non-uniform and evolving: the strongest learning signals concentrate at the "learning edge", the intersection of intermediate difficulty and high uncertainty, which shifts as training proceeds. Motivated by this, we propose HIVE (History-Informed and online-VErified prompt selection), a dual-stage framework for data-efficient RL. HIVE utilizes historical reward trajectories for coarse selection and employs prompt entropy as a real-time proxy to prune instances with stale utility. By evaluating HIVE across multiple math reasoning benchmarks and models, we show that HIVE yields significant rollout efficiency without compromising performance.
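
Why many prompts contribute negligible gradients under GRPO can be seen from group-relative advantages: when every rollout for a prompt gets the same reward, all advantages are zero. The sketch below shows this with a GRPO-style standardization; `pass_rate_utility` is a hypothetical proxy introduced here for illustration and is not HIVE's selection rule.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one prompt's rollout group (GRPO-style)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def pass_rate_utility(pass_rate):
    """Hypothetical utility proxy: Bernoulli reward variance, largest at 0.5."""
    return pass_rate * (1.0 - pass_rate)


solved = group_relative_advantages([1, 1, 1, 1])   # prompt already mastered
mixed = group_relative_advantages([1, 0, 1, 0])    # prompt of intermediate difficulty
print(solved)  # all zeros: these rollouts carry no learning signal
print(mixed)
print(pass_rate_utility(0.5), pass_rate_utility(0.9), pass_rate_utility(1.0))
```

Prompts near a 50% pass rate have the most reward variance, which is consistent with the abstract's observation that learning signal concentrates at intermediate difficulty.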


2605.03357 2026-06-09 cs.LG math.OC 版本更新

Population-Aware Imitation Learning in Mean-field Games with Common Noise

平均场博弈中考虑共同噪声的群体感知模仿学习

Grégoire Lambrecht, Mathieu Laurière

发表机构 * Institut National des Sciences et Techniques de l'Information et des Systèmes (INSTI)（信息与系统科学与技术国家研究院）

AI总结针对含共同噪声的平均场博弈，提出群体感知模仿学习框架，通过行为克隆和对抗散度两种代理，建立有限样本误差界，并利用广义虚拟博弈和深度学习计算专家策略，实验证明群体感知策略对应对随机性的重要性。

详情

AI中文摘要

平均场博弈（MFGs）为建模大量交互智能体的集体行为提供了强大框架。本文研究了含共同噪声的MFG中的模仿学习（IL）问题，其中群体分布随机演化。这种随机性迫使智能体采用群体感知策略以应对总体冲击。我们制定了两个不同的学习目标：恢复纳什均衡和最大化相对于专家群体的性能。我们研究了两种模仿代理：行为克隆（BC）和对抗（ADV）散度。然后，我们建立了有限样本误差界，表明最小化这些代理能有效控制策略的可利用性及其相对于专家的性能差距。此外，我们提出了一个使用广义虚拟博弈和深度学习的数值框架来计算专家群体感知策略。通过在三个环境上的实验，我们证明了标准的群体无感知策略无法捕捉均衡动态。我们的结果强调，学习群体感知策略对于避免被共同噪声固有的随机性误导至关重要。

英文摘要

Mean Field Games (MFGs) provide a powerful framework for modeling the collective behavior of large populations of interacting agents. In this paper, we address the problem of Imitation Learning (IL) in MFGs subject to common noise, where the population distribution evolves stochastically. This stochasticity compels agents to adopt population-aware policies to respond to aggregate shocks. We formulate two distinct learning objectives: recovering a Nash equilibrium and maximizing performance against an expert population. We investigate two imitation proxies: Behavioral Cloning (BC) and Adversarial (ADV) divergence. We then establish finite-sample error bounds showing that minimizing these proxies effectively controls both the policy's exploitability and its performance gap relative to the expert. Furthermore, we propose a numerical framework using generalized Fictitious Play and Deep Learning to compute expert population-aware policies. Through experiments on three environments we demonstrate that standard population-unaware policies fail to capture the equilibrium dynamics. Our results highlight that learning population-aware policies is crucial to avoid being misled by the randomness inherent in common noise.


2605.26078 2026-06-09 cs.LG 版本更新

Global Convergence of Wasserstein Policy Gradient for Entropy-Regularized Reinforcement Learning

Wasserstein策略梯度在熵正则化强化学习中的全局收敛性

Zhaoyu Zhu, Rui Gao, Shuang Li

发表机构 * Shanghai Jiao Tong University（上海交通大学）； The University of Texas at Austin（德克萨斯大学奥斯汀分校）； The Chinese University of Hong Kong, Shenzhen（香港中文大学（深圳））

AI总结本文通过利用熵正则化强化学习的Bellman结构，证明了Wasserstein策略梯度（WPG）方法的全局收敛性，并建立了分布Polyak-Łojasiewicz条件。

详情

AI中文摘要

Wasserstein策略梯度（WPG）是一种利用动作分布的最优传输几何的强化学习（RL）策略优化方法。对于熵正则化RL目标，WPG通过将每个状态条件策略沿软Q函数的动作梯度以及Langevin型扩散进行传输来演化。尽管它在连续控制问题中具有吸引力，但其全局收敛性质仍不清楚。标准的Langevin分析并不直接适用，因为RL目标通过Bellman递归而非静态凸泛函依赖于策略，且Langevin漂移由软Q函数决定，其正则性必须在策略迭代过程中加以控制。在本文中，我们通过利用熵正则化RL的Bellman结构，发展了WPG的全局收敛理论。我们表明，通常由凸性扮演的角色可以被基于Bellman的论证所取代：软Bellman残差相对于Gibbs策略具有状态级KL表示；Bellman压缩将此残差与全局最优性差距联系起来；而Bellman预解恒等式将价值改进与相对Fisher信息联系起来。结合演化Gibbs族的均匀对数Sobolev不等式（LSI），这些要素产生了分布Polyak-Łojasiewicz条件。我们进一步建立了控制离散化误差所需的正则性和一致界，从而获得直到离散化偏差的几何收缩。概念上，我们的分析表明，尽管熵正则化RL在通常的平坦意义上不是凸的，但Bellman递归诱导了一种有利的Polyak-Łojasiewicz型（PL）几何，支持WPG的全局收敛。

英文摘要

Wasserstein policy gradient (WPG) is a policy optimization method for reinforcement learning (RL) that exploits the optimal-transport geometry of action distributions. For the entropy-regularized RL objective, WPG evolves each state-conditional policy by transporting it along the action gradient of the soft Q-function together with a Langevin-type diffusion. Despite its appeal for continuous-control problems, its global convergence properties remain poorly understood. Standard Langevin analyses do not directly apply, because the RL objective depends on the policy through the Bellman recursion rather than through a static convex functional, and the Langevin drift is determined by the soft Q-function, whose regularity must be controlled along the policy iterates. In this paper, we develop a global convergence theory for WPG by exploiting the Bellman structure of entropy-regularized RL. We show that the role usually played by convexity can be replaced by a Bellman-based argument: the soft Bellman residual admits a statewise KL representation with respect to a Gibbs policy; Bellman contraction relates this residual to the global optimality gap; and a Bellman resolvent identity connects value improvement to relative Fisher information. Combined with a uniform log-Sobolev inequality (LSI) for the evolving Gibbs family, these ingredients yield a distributional Polyak-Łojasiewicz condition. We further establish the regularity and uniform bounds needed to control the discretization error, thereby obtaining geometric contraction up to a discretization bias. Conceptually, our analysis shows that although entropy-regularized RL is not convex in the usual flat sense, the Bellman recursion induces a favorable Polyak-Łojasiewicz-type (PL) geometry that supports global convergence of WPG.
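
For intuition about the Langevin-type dynamics mentioned above, the following is a standard continuous-time sketch rather than the paper's exact scheme. The temperature $\tau$, the Brownian motion $W_t$, and the assumption that the soft Q-function is held fixed are notation introduced here for illustration.

```latex
% Assumed notation: \tau > 0 is the entropy temperature, W_t a Brownian motion,
% and Q(s, \cdot) is frozen while the action particle a_t evolves.
\[
  \mathrm{d}a_t = \nabla_a Q(s, a_t)\,\mathrm{d}t + \sqrt{2\tau}\,\mathrm{d}W_t ,
  \qquad
  \pi_{\mathrm{Gibbs}}(a \mid s) \propto \exp\!\big(Q(s, a)/\tau\big).
\]
```

Under suitable regularity conditions, the Gibbs density on the right is the stationary distribution of the diffusion on the left when $Q$ is frozen. The difficulty the abstract highlights is that in RL the soft Q-function itself changes with the policy, so this static picture does not apply directly.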

2605.31014 2026-06-09 cs.LG 版本更新

SDM-Q: Cost-Aware Staged Decision-Making for Multi-Omics Classification with Deep Q-Learning

SDM-Q: 基于深度Q学习的成本感知分阶段决策用于多组学分类

Nan Mu, Yangfan Xiao, Ling Wang, Xiaoning Li, Yue Kang, Chen Zhao

发表机构 * College of Computer Science, Sichuan Normal University（四川师范大学计算机学院）； Department of Mathematics, College of Science and Mathematics, Kennesaw State University（肯尼索州立大学科学与数学学院数学系）； Department of Computer Science, College of Computing and Software Engineering, Kennesaw State University（肯尼索州立大学计算与软件工程学院计算机科学系）

AI总结提出SDM-Q强化学习框架，将多组学诊断建模为有限步序贯决策问题，通过动作价值函数平衡分类正确性与模态获取成本，在四个公共数据集上有效减少冗余模态获取并保持竞争性分类性能。

AI中文摘要

CUA-Gym：为计算机使用智能体扩展可验证的训练环境和任务

Bowen Wang, Dunjie Lu, Junli Wang, Tianyi Bai, Shixuan Liu, Zhipeng Zhang, Haiquan Wang, Hao Hu, Tianbao Xie, Shuai Bai, Dayiheng Liu, Que Shen, Junyang Lin, Tao Yu

发表机构 * The University of Hong Kong（香港大学）； Qwen Team, Alibaba Inc.（阿里巴巴集团Qwen团队）； University of California, San Diego（加州大学圣地亚哥分校）； Tsinghua University（清华大学）

AI总结提出CUA-Gym可扩展流水线，通过协同生成任务指令、环境状态和奖励函数，构建大规模可验证强化学习训练数据，并合成CUA-Gym-Hub模拟网络应用环境，训练出的智能体在OSWorld-Verified和WebArena上取得领先性能。

AI中文摘要

具有可验证奖励的强化学习（RLVR）在数学、工具使用和软件工程等领域取得了突破，但其在计算机使用智能体（CUA）上的应用受到缺乏具有确定性奖励的可扩展训练数据的瓶颈。为CUA构建此类数据需要一致的任务指令、可执行的环境和可验证的奖励。然而，手工策划的基准测试实现了高奖励保真度，但覆盖的应用很少；基于LLM作为评判者的数据集广泛扩展，但缺乏可靠的验证。我们提出了CUA-Gym，一个可扩展的流水线，协同生成任务指令、环境状态和奖励函数。具体来说，一个生成器智能体构建初始和黄金环境状态，一个独立的判别器智能体根据任务规范编写奖励函数。一个编排器智能体通过执行中的迭代轮次驱动两者。生成的元组通过一个结合LLM多数投票和智能体回滚的最终过滤器，确保超出每任务对抗循环的质量。为了解决训练环境稀缺的问题，我们进一步合成了CUA-Gym-Hub，一套基于真实软件使用分布的高保真模拟网络应用程序套件，将CUA RLVR数据的规模扩大了一个数量级。使用此流水线，我们构建了CUA-Gym数据集，包含32,112个基于110个环境的已验证RLVR训练元组。在CUA-Gym上使用GSPO训练的CUA-Gym-A3B和CUA-Gym-A17B在OSWorld-Verified上分别达到62.1%和72.6%，在可比规模上优于先前的开源CUA，并且在数据量和环境多样性上性能平滑扩展。相同的检查点还在保留的WebArena基准测试上有所改进，表明训练环境之外的迁移。我们将开源完整的合成流水线、数据集、CUA-Gym-Hub环境和模型。

英文摘要

Reinforcement learning with verifiable rewards (RLVR) has driven breakthroughs in domains such as math, tool-use, and software engineering, yet its extension to computer-use agents (CUAs) has been bottlenecked by the scarcity of scalable training data with deterministic rewards. Constructing such data for CUAs requires consistent task instructions, executable environments, and verifiable rewards. However, hand-curated benchmarks achieve high reward fidelity but cover few applications, while LLM-as-judge-based datasets scale broadly but lack reliable verification. We present CUA-Gym, a scalable pipeline that co-generates task instructions, environment states, and reward functions. Concretely, a Generator agent constructs the initial and golden environment states, and a separate Discriminator agent writes the reward function from the task specification. An orchestrator agent drives the two through iterative rounds upon execution. Generated tuples then pass a final filter combining LLM majority voting and agent rollouts, ensuring quality beyond the per-task adversarial loop. To address the scarcity of training environments, we further synthesize CUA-Gym-Hub, a broad suite of high-fidelity mock web applications grounded in real-world software-use distributions, expanding the scale of CUA RLVR data by an order of magnitude. Using this pipeline, we construct CUA-Gym, a dataset of 32,112 verified RLVR training tuples grounded in 110 environments. Trained with GSPO on CUA-Gym, our CUA-Gym-A3B and CUA-Gym-A17B achieve 62.1% and 72.6% on OSWorld-Verified, outperforming prior open-source CUAs at comparable scales, with performance scaling smoothly in both data volume and environment diversity. The same checkpoints also improve on the held-out WebArena benchmark, indicating transfer beyond the training environments. We will open-source the full synthesis pipeline, dataset, CUA-Gym-Hub environments, and models.

2605.26452 2026-06-09 cs.RO cs.LG cs.SY eess.SY 版本更新

Robust Koopman Control Barrier Filters for Safe Actor-Critic Reinforcement Learning

鲁棒Koopman控制屏障滤波器用于安全演员-评论家强化学习

Dhruv S. Kushwaha, Zoleikha A. Biron

发表机构 * University of California, Berkeley（加州大学伯克利分校）

AI总结提出鲁棒Koopman-CBF SAC框架，通过数据驱动学习Koopman预测器、构建提升空间中的仿射CBF约束并利用二次规划安全层实施，同时通过投影残差裕度处理近似误差，实现零约束违反或减少违规。

Comments 17 pages, 7 figures

AI中文摘要

机器人系统的安全强化学习需要策略在训练和部署期间满足状态和输入约束的同时提高任务性能。控制屏障函数通过最小侵入性安全滤波器提供强制执行前向不变性的原则性机制，但其在无模型强化学习中的应用受限于对精确动力学和手工设计屏障证书的需求。我们提出鲁棒Koopman-CBF SAC，一种安全滤波的演员-评论家框架，从数据中学习有限维Koopman预测器，在提升空间中构建仿射CBF约束，并通过二次规划安全层强制执行。为考虑有限维Koopman近似误差，使用从留出轨迹数据估计的投影残差裕度收紧CBF条件。评论家在实际执行的安全动作上训练，而演员则被正则化以靠近Koopman-CBF可行集，减少训练中对滤波器的依赖。在安全控制基准测试中，该方法在CartPole稳定和跟踪上实现零约束违反，同时匹配或超过无约束SAC的回报。在高维Safety Gymnasium运动任务中，该方法在某些设置下减少了违规，但也暴露了一阶速度屏障和线性EDMD模型的重要局限性，推动了高阶和多步Koopman-CBF扩展。这些结果表明，鲁棒Koopman-CBF滤波器是连接无模型强化学习与可认证安全性的一座有前景的桥梁，同时阐明了此类滤波器保持有效的结构条件。所有代码可在GitHub仓库 https://github.com/DhruvKushwaha/Koopman-CBF-Soft-Actor-Critic 获取。

英文摘要

Safe reinforcement learning (RL) for robotic systems requires policies that improve task performance while satisfying state and input constraints during both training and deployment. Control barrier functions (CBFs) provide a principled mechanism for enforcing forward invariance through minimally invasive safety filters, but their use in model-free RL is limited by the need for accurate dynamics and hand-designed barrier certificates. We propose Robust Koopman-CBF SAC, a safety-filtered actor-critic framework that learns a finite-dimensional Koopman predictor from data, constructs affine CBF constraints in the lifted space, and enforces them through a quadratic-program safety layer. To account for finite-dimensional Koopman approximation error, the CBF condition is tightened using a projected residual margin estimated from held-out rollout data. The critic is trained on the executed safe action, while the actor is regularized toward the Koopman-CBF feasible set, reducing dependence on the filter over training. Across safe-control benchmarks, the method achieves zero constraint violations on CartPole stabilization and tracking while matching or exceeding unconstrained SAC returns. On high-dimensional Safety Gymnasium locomotion tasks, the method reduces violations in some settings but also exposes important limitations of first-order velocity barriers and linear EDMD models, motivating high-order and multi-step Koopman-CBF extensions. These results suggest that robust Koopman-CBF filters are a promising bridge between model-free RL and certifiable safety, while clarifying the structural conditions under which such filters remain effective.
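
A quadratic-program safety layer with a single affine constraint has a closed-form solution, which makes the filtering step easy to sketch. The code below is an illustration under assumptions that the abstract does not confirm: a discrete-time lifted predictor z' = A z + B u, an affine barrier h(z) = c.z + d, the discrete CBF condition h(z') >= (1 - gamma) * h(z) + margin, and the hypothetical function name `koopman_cbf_filter`. The `margin` argument stands in for the residual-based tightening.

```python
import numpy as np

def koopman_cbf_filter(u_nom, z, A, B, c, d, gamma=0.1, margin=0.0):
    """Project u_nom onto {u : a.u >= b}, the affine CBF constraint in lifted space.

    Solves min ||u - u_nom||^2 subject to
    c.(A z + B u) + d >= (1 - gamma) * (c.z + d) + margin   (assumed form).
    """
    a = B.T @ c                                   # constraint normal in input space
    b = (1.0 - gamma) * (c @ z + d) - c @ (A @ z) - d + margin
    slack = b - a @ u_nom                         # > 0 means u_nom violates the constraint
    if slack <= 0.0:
        return u_nom.copy()                       # nominal action is already safe
    norm_sq = a @ a
    if norm_sq < 1e-12:
        raise ValueError("constraint does not depend on u; the QP is infeasible")
    return u_nom + (slack / norm_sq) * a          # minimal-norm correction
```

For a toy double integrator with state z = [x, v], barrier h = 1 - v, and nominal input 5.0 at v = 0.9, the filter returns the largest input that keeps the barrier condition satisfied; a positive `margin` shrinks that input further.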

2606.01619 2026-06-09 cs.AI cs.LG stat.ML 版本更新

ReSkill: Reconciling Skill Creation with Policy Optimization in Agentic RL

ReSkill：在智能体强化学习中协调技能创建与策略优化

Zelin He, Haotian Lin, Boran Han, Wei Zhu, Haoyang Fang, Bernie Wang, Xuan Zhu, Runze Li, Matthew Reimherr

发表机构 * University of California, Berkeley（加州大学伯克利分校）； Stanford University（斯坦福大学）

AI总结提出ReSkill框架，通过GRPO的组结构嵌入断言驱动技能创建、组内轨迹采样和自适应汤普森采样，实现技能与策略的协同进化，在多个领域超越现有方法。

AI中文摘要

智能体强化学习使LLM智能体能够从环境奖励中持续改进，但由此产生的策略并未系统地积累可跨任务泛化的可重用策略。模块化技能可以提供此类可重用策略，然而现有的技能增强强化学习方法将技能创建与策略优化分离，存在采用与进化策略冲突的技能的风险。受Anthropic的Skill Creator启发，我们引入ReSkill，一种强化学习在环的技能创建框架，协调技能进化与策略学习。ReSkill利用GRPO的组结构自然嵌入三种机制，仅需少量额外开销：（1）断言驱动的技能创建器，从过去经验中诊断失败并提出基于条件的触发式技能修订；（2）组内轨迹采样，实现技能版本的可控比较，捕获哪个版本最能支持策略的持续学习；（3）自适应折扣的汤普森采样，在策略进化过程中平衡技能版本选择的探索与利用。在多个领域，ReSkill始终优于现有的基于记忆和技能的强化学习方法，在未见任务上提升最大。对技能生命周期的分析显示，随着策略改进，技能被自动创建、测试、精炼和修剪，展示了协调的技能-策略协同进化。

英文摘要

Agentic reinforcement learning (RL) enables LLM agents to improve continuously from environment rewards, yet the resulting policies do not systematically accumulate reusable strategies that generalize across tasks. Modular skills can provide such reusable strategies, yet existing skill-augmented RL methods decouple skill creation from policy optimization, risking adopting skills that conflict with the evolving policy. Inspired by Anthropic's Skill Creator, we introduce ReSkill, an RL-in-the-loop skill creation framework that reconciles skill evolution with policy learning. ReSkill exploits the group-wise structure of GRPO to naturally embed three mechanisms with only marginal additional overhead: (1) an assertion-driven skill creator that diagnoses failures from past experience and proposes conditional, trigger-based skill revisions; (2) within-group rollout sampling that enables controlled comparison of skill versions, capturing which version best supports the policy's ongoing learning; and (3) Thompson Sampling with adaptive discounting to balance exploration and exploitation in skill version selection as the policy evolves. Across several domains, ReSkill consistently outperforms existing memory and skill-based RL methods, with the largest gains on unseen tasks. Analysis of the skill lifecycle shows skills being automatically created, tested, refined, and pruned as the policy improves, demonstrating reconciled skill-policy co-evolution.
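
The third mechanism, Thompson Sampling with discounting over skill versions, can be sketched with a Beta-Bernoulli bandit. This is a simplified illustration, not ReSkill's implementation: the abstract does not specify the adaptive discounting rule, so the sketch uses a fixed `discount` that decays all pseudo-counts toward the Beta(1, 1) prior each round, and the class name, `simulate` helper, and success rates are hypothetical.

```python
import numpy as np

class DiscountedThompsonSampler:
    """Beta-Bernoulli Thompson Sampling whose evidence fades over time."""

    def __init__(self, num_versions, discount=0.95, seed=0):
        self.alpha = np.ones(num_versions)   # prior successes + 1
        self.beta = np.ones(num_versions)    # prior failures + 1
        self.discount = discount             # fixed here; adaptive in the paper
        self.rng = np.random.default_rng(seed)

    def select(self):
        """Sample a success rate per skill version and pick the best sample."""
        samples = self.rng.beta(self.alpha, self.beta)
        return int(np.argmax(samples))

    def update(self, version, success):
        """Decay all evidence toward the prior, then add the new outcome."""
        self.alpha = 1.0 + self.discount * (self.alpha - 1.0)
        self.beta = 1.0 + self.discount * (self.beta - 1.0)
        self.alpha[version] += float(success)
        self.beta[version] += 1.0 - float(success)

def simulate(success_rates, rounds=500, seed=0):
    """Count how often each skill version is selected in a toy environment."""
    sampler = DiscountedThompsonSampler(len(success_rates), seed=seed)
    env_rng = np.random.default_rng(seed + 1)
    picks = np.zeros(len(success_rates), dtype=int)
    for _ in range(rounds):
        version = sampler.select()
        success = env_rng.random() < success_rates[version]
        sampler.update(version, success)
        picks[version] += 1
    return picks
```

Discounting keeps rarely used versions close to the prior, so they continue to be re-tested as the policy changes, which is the exploration behavior the abstract motivates.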

2606.04421 2026-06-09 cs.AI cs.LG 版本更新

Trivium: Temporal Regret as a First-Class Objective for Causal-Memory Controllers

Trivium: 时间遗憾作为因果记忆控制器的一等目标

Edward Y. Chang

发表机构 * Stanford University（斯坦福大学）

AI总结本文提出将长期时间遗憾作为一等目标，与结果遗憾和认知遗憾共同构成因果记忆控制器的可证伪失败分析框架，证明即使结果遗憾降至零，时间校准偏差仍可线性持续，而借助持久因果日志和预算探测，探测复杂度为对数级。

Comments 62 pages, 12 tables, 12 figures

AI中文摘要

许多当前的智能体系统和LLM管道通过优化结果奖励来纠正错误。这仅解决了失败的“什么”：当结果偏离预测时，不匹配的“为什么”和“何时”没有被系统地记录、审查或纠正，因此相同的错误可能反复出现。我们认为这是一个结构性问题，而不仅仅是模型容量问题。我们提出将长期时间遗憾作为一等目标，与结果遗憾和工作因果模型上的认知遗憾并列。时间遗憾捕捉失败持续的时间：在纠正之前，一个校准错误的因果模型被容忍了多久。认知遗憾捕捉失败持续的原因：工作因果模型中的残余不确定性或错误。这三种遗憾共同给出了一个可证伪的解释，说明长期运行的智能体会在什么上失败、为何失败以及何时失败。将智能体建模为E个片段的流，我们在显式因果探测、持久性和可检测性假设下证明了三个条件结果。首先，在观测等价混淆下，仅基于结果的学习无法在没有干预通道的情况下区分因果结构和虚假结构，因此时间校准偏差可以在结果遗憾被降至零后仍线性持续。其次，使用持久因果日志和预算探测，总探测复杂度是片段范围的对数，导致O(log E)的时间遗憾。第三，在K个可检测变化点下，速率扩展为O(K log E)。我们实例化了Trivium并预注册了五个可证伪预测。在CausalBench-Seq上，Trivium遵循预测的对数包络线，而仅基于结果的基线线性增长。一个试点性的真实LLM流在一次完整的E=500运行和三次E=100前沿模型试点中提供了初步的外部有效性证据。这里的自学习意味着修正外部因果模型，而不是重新训练LLM权重。

英文摘要

Many current agentic systems and LLM pipelines correct mistakes by optimizing outcome reward. This addresses only the what of failure: when an outcome diverges from prediction, the why and when of the mismatch are not systematically logged, reviewed, or corrected, so the same error can recur episode after episode. We argue that this is a structural problem, not merely a model-capacity one. We propose long-horizon temporal regret as a first-class objective alongside outcome regret and epistemic regret over the working causal model. Temporal regret captures when failure persists: how long a miscalibrated causal model is tolerated before correction. Epistemic regret captures why failure persists: residual uncertainty or error in the working causal model. Together, the three regrets give a falsifiable account of what, why, and when a long-lived agent can fail. Modeling the agent as a stream of E episodes, we prove three conditional results under explicit causal-probing, persistence, and detectability assumptions. First, under observationally equivalent confounding, outcome-only learning cannot distinguish causal from spurious structure without an intervention channel, so temporal miscalibration can persist linearly even after outcome regret is driven to zero. Second, with a persistent causal log and budgeted probes, total probe complexity is logarithmic in the episode horizon, inducing O(log E) temporal regret. Third, under K detectable change-points, the rate extends to O(K log E). We instantiate Trivium and pre-register five falsifiable predictions. On CausalBench-Seq, Trivium follows the predicted logarithmic envelope while outcome-only baselines grow linearly. A pilot real-LLM stream provides preliminary external-validity evidence across one full E = 500 run and three E = 100 frontier-model pilots. Self-learning here means revising an external causal model, not retraining LLM weights.

2205.01970 2026-06-09 cs.LG stat.ML 版本更新

Non-Stationary Bandit Learning via Predictive Sampling

基于预测采样的非平稳老虎机学习

Yueyang Liu, Xu Kuang, Benjamin Van Roy

发表机构 * Jones Graduate School of Business, Rice University（莱斯大学琼斯商学院）； Stanford Graduate School of Business（斯坦福商学院）； Department of Management Science and Engineering, Department of Electrical Engineering, Stanford University（斯坦福大学管理科学与工程系、电气工程系）

AI总结本文提出预测采样算法，通过区分信息快速失效的行动来改进非平稳环境下的老虎机学习，理论证明其性能并验证其在复杂环境中的有效性。

2606.07569 2026-06-09 cs.LG 新提交

TriHead-GAN: A Generative Adversarial Network with Triple-Head Discriminator for Carbon Emission Time Series Generation

TriHead-GAN: 一种具有三头判别器的生成对抗网络用于碳排放时间序列生成

Zesen Wang, Lijuan Lan, Yonggang Li, Chunhua Yang

发表机构 * SanMuGuo

AI总结针对城市级高频碳排放数据稀缺问题，提出TriHead-GAN，通过三头判别器联合监督分布真实性、跨变量依赖和逐步时间平滑性，在多个数据集上优于主流基线并提升下游预测精度。

AI中文摘要

准确的碳排放监测对于气候政策和新兴监管机制（如欧盟碳边境调节机制）至关重要，然而城市级高频监测数据仍然极为稀缺，严重限制了数据饥渴的深度学习模型。时间序列生成是一种自然的补救措施，但现有的基于GAN和扩散的生成器通常对碳排放数据的领域结构提供的显式监督有限：它们可能匹配边际分布统计量，但未能充分保留CO$_2$与共排放污染物和气象因素之间的跨变量相关性，并且倾向于破坏大气测量的一阶差分统计量，产生平均平滑但缺乏底层信号真实的逐步变异性的序列。我们提出TriHead-GAN，一种基于Transformer的对抗框架，其三头判别器联合监督联合分布的三个互补方面：通过Wasserstein评判器监督分布真实性，通过目标变量的无泄漏回归监督跨变量依赖性，以及通过相邻差分预测监督逐步时间平滑性。生成器结合了全局自注意力与局部时间卷积、每步噪声注入以及匹配一阶差分统计量的抗平滑损失。在自收集的长沙碳数据集、两个公共碳数据集（中国、美国）以及ETTh1基准上的实验表明，TriHead-GAN在绝大多数设置下优于主流基线，并且生成的合成窗口在低资源碳监测场景中提高了下游预测准确性。

英文摘要

Accurate carbon emission monitoring is critical for climate policy and emerging regulatory mechanisms such as the EU Carbon Border Adjustment Mechanism, yet city-level high-frequency monitoring data remain extremely scarce, severely limiting data-hungry deep learning models. Time series generation is a natural remedy, but existing GAN and diffusion-based generators often provide limited explicit supervision for the domain structure of carbon emission data: they may match marginal distributional statistics while insufficiently preserving cross-variable correlations between CO$_2$ and co-emitted pollutants and meteorological factors, and tend to collapse the first-difference statistics of atmospheric measurements, producing sequences that are smooth on average but lack the realistic step-wise variability of the underlying signals. We propose TriHead-GAN, a Transformer-based adversarial framework whose triple-head discriminator jointly supervises three complementary aspects of the joint distribution: distributional authenticity via a Wasserstein critic, cross-variable dependency via leakage-free regression of the target variable, and step-wise temporal smoothness via adjacent-difference prediction. The generator combines global self-attention with local temporal convolution, per-step noise injection, and an anti-smoothing loss that matches first-difference statistics. Experiments on the self-collected Changsha Carbon dataset, two public carbon datasets (China, US), and the ETTh1 benchmark show that TriHead-GAN achieves favorable performance over mainstream baselines on the vast majority of settings, and that the resulting synthetic windows improve downstream forecasting accuracy in low-resource carbon monitoring scenarios.
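
The idea of an anti-smoothing loss that matches first-difference statistics can be sketched in a few lines. The abstract does not say which statistics are matched or how they are weighted, so this sketch assumes the per-variable mean and standard deviation of x[t+1] - x[t] with a squared-error penalty; the function names are hypothetical.

```python
import numpy as np

def first_difference_stats(windows):
    """Per-variable mean and std of step-to-step changes.

    windows has shape (batch, time, variables).
    """
    diffs = np.diff(windows, axis=1)
    flat = diffs.reshape(-1, windows.shape[-1])
    return flat.mean(axis=0), flat.std(axis=0)

def anti_smoothing_loss(real, fake):
    """Squared mismatch between first-difference statistics (assumed form)."""
    mean_real, std_real = first_difference_stats(real)
    mean_fake, std_fake = first_difference_stats(fake)
    mean_term = np.mean((mean_real - mean_fake) ** 2)
    std_term = np.mean((std_real - std_fake) ** 2)
    return float(mean_term + std_term)
```

A generator that outputs overly smooth windows has a first-difference standard deviation below that of the real data, so the `std_term` penalizes exactly the collapse the abstract describes.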

2606.07599 2026-06-09 cs.LG cs.AI cs.CV 新提交

DiffoR: A Unified Continuous Generative Framework for Universal Ordinal Regression

DiffoR：一种统一的连续生成框架用于通用序数回归

Hongxu Ma, Lin Wang, Chenghou Jin, Han Zhou, Jie Zhang, Xiaoyu Yang, Chunjie Chen, Jihong Guan, Shuigeng Zhou

发表机构 * Fudan University（复旦大学）； Kuaishou Technology（快手科技）； Shanghai University of Finance and Economics（上海财经大学）； Tongji University（同济大学）

AI总结提出DiffOR框架，将序数回归建模为连续生成任务，利用扩散模型通过迭代去噪恢复连续序数值，并设计双解耦策略（多尺度增量聚合与动态去噪感知）保留序数拓扑，在12个基准上超越现有方法。

Comments Accepted at KDD 2026

DOI: 10.1145/3770855.3818149

AI中文摘要

序数回归（OR）旨在预测具有内在顺序的目标值，支撑着从推荐系统到计算机视觉等多个领域的关键应用。尽管从朴素回归发展到基于离散化的分类和生成，现有范式仍然受到量化伪影和缺乏全局序数拓扑感知的根本限制。这些方法通常强制执行刚性边界划分，无法捕捉序数数据固有的非平稳语义转换。在本文中，我们提出了一种新范式，将OR形式化为连续生成序数回归任务。在该新范式下，我们引入了DiffOR，一个统一的框架，利用扩散模型通过迭代去噪恢复连续序数值，从而能够动态学习软语义转换。为了显式保留序数拓扑，我们设计了一种双解耦策略：在空间上，多尺度增量聚合将目标分解为层次化的连续增量；在时间上，动态去噪感知将去噪步骤与特征频率同步，确保稳健的从粗到细的细化。理论上，我们证明了所提方法可以显著增强表示能力和机制可解释性。在四个领域的12个基准上的大量实验验证了DiffOR相对于最先进方法的一致优越性，建立了一个新标准，展示了作为通用序数回归通用解决方案的强大潜力。

英文摘要

Ordinal Regression (OR) aims to predict target values with inherent order, underpinning critical applications across diverse domains, from recommender systems to computer vision. Though having evolved from naive regression to discretization-based classification and generation, existing paradigms remain fundamentally constrained by quantization artifacts and the lack of global ordinal topological perception. These methods typically enforce rigid boundary delineations, failing to capture the non-stationary semantic transitions inherent to ordinal data. In this paper, we propose a novel paradigm where OR is formulated as a Continuous Generative Ordinal Regression task. Under the novel paradigm, we introduce DiffOR, a unified framework that leverages diffusion models to recover continuous ordinal values via iterative denoising, thereby enabling the dynamic learning of soft semantic transitions. To explicitly preserve ordinal topology, we devise a Dual-Decoupling Strategy: Spatially, Multi-scale Increment Aggregation decomposes targets into hierarchical continuous increments; Temporally, Dynamic Denoising Perception synchronizes denoising steps with feature frequencies, ensuring robust coarse-to-fine refinement. Theoretically, we show that the proposed method can significantly enhance both representation capability and mechanistic interpretability. Extensive experiments on 12 benchmarks across four domains validate DiffOR's consistent superiority over state-of-the-art methods, establishing a new standard that demonstrates strong potential as a general-purpose solution for universal ordinal regression.

2606.07760 2026-06-09 cs.LG 新提交

scCBGM: Interpretable Single-Cell Counterfactual Editing

scCBGM：可解释的单细胞反事实编辑

Alma Andersson, Aya Abdelsalam Ismail, Edward De Brouwer, Doron Haviv, Tommaso Biancalani, Kyunghyun Cho, Gabriele Scalia, Aïcha BenTaieb, Hector Corrada Bravo

发表机构 * University of Copenhagen（哥本哈根大学）； University of Cambridge（剑桥大学）； University of Amsterdam（阿姆斯特丹大学）； University of California, Berkeley（加州大学伯克利分校）； University of Tokyo（东京大学）； University of Washington（华盛顿大学）； University of Oxford（牛津大学）

AI总结提出scCBGM框架，通过概念瓶颈架构和解耦惩罚实现单细胞反事实编辑，在组合泛化和反事实预测上表现优异。

Comments Accepted to ICML 2026; code at https://github.com/almaan/scCBGM

详情

AI中文摘要

理解细胞表型及其对扰动的响应对于疾病生物学和治疗设计至关重要。单细胞RNA测序能够在细胞分辨率下进行表征，但条件的组合空间使得穷举实验映射不可行。我们引入了单细胞概念瓶颈生成模型（scCBGM），这是一个用于对单个细胞进行可解释且精确的反事实编辑的框架。scCBGM通过解码器跳跃连接和促进无维度约束解耦的交叉协方差惩罚，将概念瓶颈架构适应于单细胞数据。我们将该框架扩展到流匹配模型，从而在编码-解码和生成两种模式下实现概念引导的编辑。为了进行严格评估，我们开发了一个具有真实反事实的合成基准。在多个真实数据集上，scCBGM在组合泛化和反事实预测方面表现出优越性能，并通过合成数据上的细胞级验证和真实数据集上的群体级基准得到了支持。

英文摘要

Understanding cellular phenotypes and how they respond to perturbations is critical for disease biology and therapeutic design. Single-cell RNA sequencing enables characterization at cellular resolution, yet the combinatorial space of conditions makes exhaustive experimental mapping infeasible. We introduce single-cell Concept Bottleneck Generative Models (scCBGM), a framework for interpretable and precise counterfactual editing of individual cells. scCBGM adapts concept bottleneck architectures for single-cell data through decoder skip connections and a cross-covariance penalty that promotes disentanglement without dimensional constraints. We extend the framework to flow matching models, enabling concept-guided editing in both encoding-decoding and generation regimes. To enable rigorous evaluation, we develop a synthetic benchmark with ground-truth counterfactuals. Across multiple real datasets, scCBGM demonstrates superior performance in combinatorial generalization and counterfactual prediction, supported by cell-level validation on synthetic data and population-level benchmarks on real datasets.
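
A cross-covariance penalty of the kind mentioned above can be written as the squared Frobenius norm of the covariance between two latent blocks. This is a generic sketch, not scCBGM's code: the split into a `concepts` block and a `residual` block, the normalization by batch size, and the function name are assumptions.

```python
import numpy as np

def cross_covariance_penalty(concepts, residual):
    """Squared Frobenius norm of the batch cross-covariance matrix.

    concepts: (batch, k) concept activations.
    residual: (batch, m) unsupervised latent; k and m may differ.
    """
    centered_c = concepts - concepts.mean(axis=0, keepdims=True)
    centered_r = residual - residual.mean(axis=0, keepdims=True)
    cross_cov = centered_c.T @ centered_r / concepts.shape[0]   # shape (k, m)
    return float(np.sum(cross_cov ** 2))
```

The penalty is zero when every residual dimension is linearly uncorrelated with every concept, and it places no requirement on the two blocks having equal width.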

2606.07835 2026-06-09 cs.LG 新提交

Mitigating the Contractivity Trap in Diffusion ODEs via Stein Stabilization

通过Stein稳定化缓解扩散ODE中的收缩陷阱

Shigui Li, Delu Zeng

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结针对扩散模型确定性概率流ODE大步长推理中的收缩陷阱问题，提出SteinDiff框架，通过Stein导出的几何感知残差校正机制正则化求解器更新，无需参考样本即可提升生成质量。

Comments 32 pages, 12 figures. Accepted to ICML 2026

AI中文摘要

在扩散模型通过其确定性概率流常微分方程（PF-ODE）轨迹进行大步长推理时，存在一个基本张力，我们称之为收缩陷阱：高效推理倾向于大步长，而激进的步长和高表达能力的去噪器可能破坏基于收缩的误差抑制稳定性保证。为了解决这个问题，我们提出了SteinDiff，一种逐步推理时稳定化框架，它采用Stein导出的校正，无需参考样本。具体来说，SteinDiff引入了一种几何感知残差校正机制，在不重新训练的情况下正则化大步长求解器更新。为此，我们推导了用于逐步求解器调整的闭式Stein校正系数，实现了对局部数据几何的无参考自适应。我们进一步建立了在分布偏移下的分数控制扰动界，并提供了对EDM风格参数化的补充Stein视角。大量实验表明，SteinDiff在大步长推理设置中减轻了严重伪影并提高了生成质量。

英文摘要

A fundamental tension exists in the large-step inference of diffusion models via their deterministic probability flow ordinary differential equation (PF-ODE) trajectories, which we identify as the contractivity trap: efficient inference favors large step sizes, while aggressive steps and highly expressive denoisers can undermine contraction-based stability certificates for error suppression. To address this, we propose SteinDiff, a step-wise inference-time stabilization framework that employs Stein-derived corrections without requiring reference samples. Specifically, SteinDiff introduces a geometry-aware residual correction mechanism that regularizes large-step solver updates without retraining. To this end, we derive a closed-form Stein correction coefficient for step-wise solver adjustment, enabling reference-free adaptation to local data geometry. We further establish a score-controlled perturbation bound under distributional shifts and provide a complementary Stein perspective on EDM-style parameterizations. Extensive experiments demonstrate that SteinDiff mitigates severe artifacts and improves generative quality across large-step inference settings.

2606.08221 2026-06-09 cs.LG 新提交

De novo molecular generation with optical property preconditioning at the token level

基于Token级光学性质预条件的从头分子生成

Haozhe Huang, Manuel Gonzalez Lastre, Hyun Suk Park, Jorge A. Campos-Gonzalez-Angulo, Xinjian Liu, Alán Aspuru-Guzik

发表机构 * University of Toronto（多伦多大学）； Vector Institute for Artificial Intelligence（向量人工智能研究所）； Universidad Autónoma de Madrid（马德里自治大学）； Canadian Institute for Advanced Research (CIFAR)（加拿大高等研究院）； NVIDIA（英伟达）

AI总结针对OLED分子光学性质可控生成中数据稀缺和条件控制可靠性有限的问题，提出基于GPT2的Token条件自回归语言模型，通过离散属性Token和多任务优化实现垂直吸收能和振子强度的定向生成，并在TDDFT级别评估分布保真度和可控性。

AI中文摘要

由于高质量数据的稀缺以及生成模型中跨化学基序的条件控制可靠性有限，设计具有目标光学性质的OLED分子仍然具有挑战性。在此，我们在现实低数据场景下对用于OLED分子生成的Token条件自回归语言模型进行了基准测试。一个GPT2模型在大规模化学语料库上进行预训练，增加了离散性质Token，并通过多任务优化进行微调。条件目标为垂直吸收能和振子强度，并将HOMO-LUMO能隙作为辅助电子描述符。生成的分子在TDDFT水平上进行评估，以评估分布保真度和可控性。生成的库再现了训练分布的主要光学性质支持，同时向更低分子量和更少重原子偏移。Token级控制在不同条件区间内一致定向，但并非完全正交，并表现出局部校准不规则性。化学型解析分析进一步表明，可控性强烈依赖于局部电子环境：适度共轭的芳香碳基序与改进的联合目标满足度相关，而吸电子基序，特别是芳基腈，表现出系统性红移和可控性降低。这些结果为条件OLED分子生成建立了定量基准，并表明模型可靠性必须在化学上有意义的子空间中评估，而非仅从聚合性质分布中评估。

英文摘要

Designing OLED molecules with targeted optical properties remains challenging due to the scarcity of high-quality data and the limited reliability of conditional control in generative models across chemical motifs. Here, we benchmark a token-conditioned autoregressive language model for OLED molecular generation in a realistic low-data regime. A GPT2 model is pretrained on large chemical corpora, augmented with discrete property tokens, and fine-tuned using multi-task optimisation. Conditioning targets vertical absorption energy and oscillator strength, with the HOMO-LUMO gap included as an auxiliary electronic descriptor. Generated molecules are evaluated at the TDDFT level to assess distributional fidelity and controllability. The generated library reproduces the dominant optical-property support of the training distribution while shifting towards lower molecular weight and fewer heavy atoms. Token-level control is consistently directional across conditioning bins, but is not fully orthogonal and exhibits local calibration irregularities. A chemotype-resolved analysis further shows that controllability depends strongly on local electronic environments: moderately conjugated aromatic-carbon motifs are associated with improved joint target satisfaction, whereas electron-withdrawing motifs, particularly aryl nitriles, show systematic red-shifting and reduced controllability. These results establish a quantitative benchmark for conditional OLED molecular generation and show that model reliability must be assessed in chemically meaningful subspaces rather than from aggregate property distributions alone.

URL PDF HTML ☆

赞 0 踩 0

2606.08309 2026-06-09 cs.LG cs.CV 新提交

Where the Score Lives: A Wavelet View of Diffusion

分数函数所在之处：扩散的小波视角

Emma Finn, Binxu Wang, T. Anderson Keller, Demba E. Ba

发表机构 * The Kempner Institute for the Study of Natural and Artificial Intelligence（肯普纳自然与人工智能研究所）； Harvard University（哈佛大学）

AI总结提出基于二维正交小波基的分数函数参数化，通过数据分布矩分析揭示不同架构的归纳偏差，解释扩散模型中分数网络与数据分布的相互作用。

Comments 20 pages, 12 figures, AISTATS 2026

Journal ref: Proceedings of the 29th International Conference on Artificial Intelligence and Statistics (AISTATS) 2026, Tangier, Morocco. PMLR: Volume 300

AI中文摘要

基于分数的生成模型在过去十年中在生成多样化视觉上合理的图像方面取得了显著成功。在扩散建模中，包括CNN、U-Net和Transformer在内的多种架构被用作分数近似网络；然而，迄今为止，关于这些架构选择如何影响生成行为的了解相对较少。在这项工作中，为了提供对此领域的见解，我们提出了一种使用二维正交小波基展开的分数函数的解析可解参数化。特别地，我们根据数据分布的矩推导出可解释的最优分数函数。我们利用这种参数化提供了一种与架构无关的、基于矩的分析，揭示了数据分布的哪些属性对去噪最为重要。我们的分数机器足够灵活，可以部分模仿多种架构（包括U-Net和CNN）的相关归纳偏差，朝着理解不同分数架构为何表现出不同生成行为迈出了一步。由于我们的分数函数可以根据数据矩解析求解，我们可以开始理解数据分布如何与分数网络相互作用，从而产生我们在扩散模型中观察到的行为。

英文摘要

Score-based generative models have had remarkable success over the last decade in generating a diverse set of visually plausible images. A variety of architectures including CNNs, U-Nets, and Transformers have been used as the score-approximation network in such diffusion modeling; however, to date, relatively little is known about how these architectural choices impact generative behavior. In this work, to provide insight into this area, we propose an analytically solvable parameterization of the score function using an expansion in a 2D orthogonal wavelet basis. In particular, we derive interpretable optimal score functions in terms of the moments of the data distribution. We use this parameterization to provide an architecture-agnostic, moment-based analysis that reveals which attributes of the data distribution tend to matter most for denoising. Our score machine is flexible enough to partially mimic the relevant inductive biases of multiple architectures, including U-Nets and CNNs, taking a step towards understanding why different score architectures can exhibit distinct generative behavior. Since our score is solvable in terms of the moments of the data, we can begin to understand how the data distribution interacts with the score network to produce the behavior we observe in diffusion models.

2606.08375 2026-06-09 cs.LG 新提交

Few-step Cofolding with All-Atom Flow Maps

少步全原子流图共折叠

Gianluca Scarpellini, Ron Shprints, Peter Holderrieth, Juno Nam, Pranav Murugan, Rafael Gómez-Bombarelli, Tommi Jaakkola, Maruan Al-Shedivat, Nicholas Matthew Boffi, Avishek Joey Bose

发表机构 * Genesis Molecular AI； Massachusetts Institute of Technology（麻省理工学院）； Carnegie Mellon University（卡内基梅隆大学）； Imperial College London（伦敦帝国理工学院）； Mila

AI总结提出DeCAF框架，将全原子共折叠扩散模型蒸馏为流图，仅需几步推理即可生成高质量样本，并通过奖励引导搜索提升采样质量。

AI中文摘要

3D生物分子复合物的全原子生成建模已成为预测蛋白质和蛋白质-配体系统结构的主流范式。然而，在原子级保真度下生成结构通常需要昂贵的迭代扩散展开，这使得传统部署和推理时搜索技术的计算成本都很高。在本文中，我们引入了去噪器共折叠全原子流图（DeCAF）框架，用于将最先进的全原子共折叠模型蒸馏为全原子流图，这些流图仅需几步推理即可产生高质量样本。我们基于去噪器的流图公式构建DeCAF，该公式具有端点损失，自然支持SE(3)刚性对齐，我们证明这对于训练准确模型至关重要。我们进一步推导了一个简单的变量变换，使DeCAF能够在EDM风格架构的σ空间噪声调度中运行，从而能够从预训练的共折叠扩散模型直接蒸馏。借助DeCAF的流图前瞻，我们引入了一个专门构建的推理时框架，通过奖励引导搜索改进采样。实验上，在具有挑战性的Runs N' Poses数据集上，DeCAF-Boltz在严格的NFE预算下，在蛋白质-配体姿势的准确性（RMSD）和物理有效性分数上均统计上优于Boltz-1x，同时在PoseBusters上的所有推理计算预算下显示出更优的帕累托前沿。将最先进的Pearl共折叠模型蒸馏后，DeCAF-Pearl优于基于扩散的共折叠模型，并在成功率上与其教师模型匹配，同时使用的NFE减少了5倍。我们在https://github.com/genesistherapeutics/decaf发布代码。

英文摘要

All-atom generative modeling of 3D biomolecular complexes has emerged as the dominant paradigm for predicting the structure of proteins and protein-ligand systems. Generating structures at the atomic level of fidelity, however, typically requires expensive iterative diffusion rollouts, making both conventional deployment and inference-time search techniques computationally costly. In this paper, we introduce the Denoiser Cofolding All-Atom Flowmap (DeCAF) framework for distilling state-of-the-art all-atom cofolding models into all-atom flow maps that produce high-quality samples in only a few inference steps. We build DeCAF on a denoiser-based formulation of flow maps with endpoint losses that naturally support SE(3) rigid alignment, which we show is critical for training accurate models. We further derive a simple change of variables that lets DeCAF operate in the σ-space noise schedule of EDM-style architectures, enabling direct distillation from pretrained cofolding diffusion models. Equipped with DeCAF's flowmap lookahead, we introduce a purpose-built inference-time framework that improves sampling through reward-guided search. Empirically, DeCAF-Boltz statistically improves over Boltz-1x in both accuracy (RMSD) and physical validity scores of protein-ligand poses at strict NFE budgets on the challenging Runs N' Poses, while also showing a more optimal Pareto frontier across all inference compute budgets on PoseBusters. Distilling the state-of-the-art Pearl cofolding model, DeCAF-Pearl outperforms diffusion-based cofolding models and matches its teacher on success rate while using 5x fewer NFEs. We release our code at https://github.com/genesistherapeutics/decaf.
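
An endpoint loss that "supports SE(3) rigid alignment" can be illustrated with the standard Kabsch algorithm, which finds the rotation and translation that best superimpose one point cloud on another before the error is measured. Whether DeCAF uses Kabsch specifically is not stated in the abstract; the sketch below is a generic version with hypothetical function names.

```python
import numpy as np

def kabsch_align(mobile, target):
    """Rigidly superimpose mobile (N, 3) onto target (N, 3) and return the result."""
    mobile_center = mobile.mean(axis=0)
    target_center = target.mean(axis=0)
    p = mobile - mobile_center
    q = target - target_center
    covariance = p.T @ q                              # 3x3 cross-covariance
    u, _, vt = np.linalg.svd(covariance)
    # Flip the last axis if needed so the result is a proper rotation (det = +1).
    handedness = np.sign(np.linalg.det(vt.T @ u.T))
    correction = np.diag([1.0, 1.0, handedness])
    rotation = vt.T @ correction @ u.T
    return p @ rotation.T + target_center

def aligned_mse(predicted, target):
    """Mean squared atom displacement after optimal rigid alignment."""
    aligned = kabsch_align(predicted, target)
    return float(np.mean(np.sum((aligned - target) ** 2, axis=1)))
```

Because the loss is computed after alignment, a prediction that differs from the reference only by a global rotation and translation incurs zero penalty.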

2606.08554 2026-06-09 cs.LG 新提交

A Theoretical Analysis of Memory and Overfitting Phenomena in Stochastic Interpolation Models

随机插值模型中的记忆与过拟合现象的理论分析

Yunchen Li, Shaohui Lin, Zhou Yu

AI总结本文通过闭式解分析随机插值模型中的记忆化现象，揭示连续时间下确定性及随机生成过程均恢复训练样本，离散化与估计误差导致样本偏离，并给出过拟合与欠拟合的理论定义。

AI中文摘要

本文对随机插值模型中的记忆化现象进行了理论解释。通过利用最优速度场和相关评分函数的闭式表达式，我们证明，在连续时间预言机设置下，确定性和随机生成过程都能恢复训练样本。在欧拉离散化下，生成的样本仍围绕训练样本中心，偏差由步长控制。我们进一步分析了存在估计误差时的生成过程，并表明累积的估计误差控制了端点与训练集的偏差。这些结果表明，生成的样本可以表示为训练样本加上三个受控项的扰动：离散化引起的界、估计误差引起的界和随机高斯噪声。基于这一表征，我们提供了生成模型中过拟合和欠拟合的理论定义。合成模拟支持了我们的理论发现。

英文摘要

This paper provides a theoretical account of memorization in stochastic interpolation models. By leveraging closed-form expressions for the optimal velocity field and the associated score function, we show that, in the continuous-time oracle setting, both deterministic and stochastic generation processes recover training samples. Under Euler discretization, generated samples remain centered around training samples, with deviations controlled by the step size. We further analyze generation in the presence of estimation errors and show that accumulated estimation errors control the endpoint deviation from the training set. These results imply that the generated sample admits a representation as a training sample perturbed by three controlled terms: a discretization-induced bound, an estimation-error-induced bound, and stochastic Gaussian noise. Based on this characterization, we provide theoretical definitions of overfitting and underfitting in generative models. Synthetic simulations support our theoretical findings.
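
The memorization effect can be reproduced in a toy setting where the optimal velocity field has a closed form. The sketch assumes the linear interpolant x_t = (1 - t) z + t x_1 with z standard Gaussian and x_1 drawn uniformly from a finite training set; the paper may treat more general interpolants, and the function names are hypothetical. Under this assumption the oracle velocity is (m(x, t) - x) / (1 - t), where m is the posterior mean of x_1, a softmax-weighted average of the training points.

```python
import numpy as np

def oracle_velocity(x, t, train):
    """Closed-form velocity E[x_1 - z | x_t = x] for an empirical data distribution."""
    sq_dist = np.sum((x - t * train) ** 2, axis=1)
    logits = -sq_dist / (2.0 * (1.0 - t) ** 2)
    weights = np.exp(logits - logits.max())       # numerically stable softmax
    weights /= weights.sum()
    posterior_mean = weights @ train              # m(x, t)
    return (posterior_mean - x) / (1.0 - t)

def euler_generate(train, num_steps=100, seed=0):
    """Integrate the oracle ODE from noise with explicit Euler steps on [0, 1)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(train.shape[1])
    h = 1.0 / num_steps
    for k in range(num_steps):
        x = x + h * oracle_velocity(x, k * h, train)
    return x

def distance_to_training_set(x, train):
    """Euclidean distance from x to its nearest training sample."""
    return float(np.min(np.linalg.norm(train - x, axis=1)))
```

With well-separated training points, the generated sample ends up at essentially zero distance from one of them, which is the oracle-setting memorization the abstract describes; replacing `oracle_velocity` with a noisy estimate would add the estimation-error term.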

2606.08802 2026-06-09 cs.LG 新提交

潜空间贝叶斯优化的上下文学习

Tuan A. Vu, Harri Lähdesmäki, Julien Martinelli

发表机构 * Aalto University（阿尔托大学）

AI总结针对潜空间贝叶斯优化中上下文学习模型与优化任务不匹配的问题，提出在分子VAE潜空间上定义合成优化任务进行持续预训练，并引入正则化器保持原始先验，显著提升分子优化性能。

AI中文摘要

贝叶斯优化（BO）是样本高效设计的核心工具，潜空间贝叶斯优化（LSBO）将其扩展到分子和蛋白质等结构化对象。与此同时，TabPFN和TabICL等表格基础模型现已实现最先进的回归性能，并越来越多地被用作BO代理模型。由于其贝叶斯行为是由大规模合成预训练集合诱导的，因此该预训练分布的组成至关重要。LSBO造成了一种独特的不匹配：从潜代码到目标值的映射与当前上下文模型训练所用的回归任务明显不同。我们通过在分子VAE的潜空间上定义合成优化任务来补充表格基础模型代理的预训练阶段，从而解决这种不匹配。持续预训练目标包含一个正则化器，将模型锚定到原始检查点，保留其广泛的回归先验，同时避免对适应任务的过度专业化。在保留的分子优化基准测试中，所得模型实现了强劲性能，支持了针对上下文化代理的LSBO特定适应的重要性。

英文摘要

Bayesian optimization (BO) is a central tool for sample-efficient design, and latent-space Bayesian optimization (LSBO) extends it to structured objects such as molecules and proteins. In parallel, tabular foundation models such as TabPFN and TabICL now achieve state-of-the-art regression performance and are increasingly used as BO surrogates. Because their Bayesian behavior is induced by large synthetic pretraining collections, the composition of this pretraining distribution is crucial. LSBO creates a distinctive mismatch: the induced map from latent code to objective value differs markedly from the regression tasks used to train current in-context models. We address this mismatch by complementing the pretraining stage of tabular foundation model surrogates with synthetic optimization tasks defined on the latent space of a molecular VAE. The continued-pretraining objective features a regularizer that anchors the model to the original checkpoint, preserving its broad regression prior while avoiding overspecialization to the adaptation tasks. On held-out molecular optimization benchmarks, the resulting model achieves strong performance, supporting the relevance of LSBO-specific adaptation for in-context surrogates.


2606.09705 2026-06-09 cs.LG cond-mat.stat-mech 新提交

When Do Local Score Models Extrapolate Across Size? A Diagnostic Theory and Benchmark

局部分数模型何时能跨尺寸外推？诊断理论与基准

Wenjie Xi

发表机构 * The University of Hong Kong（香港大学）； Department of Physics and HK Institute of Quantum Science & Technology（物理系与香港量子科学与技术研究所）

AI总结提出诊断理论，证明局部模型能否稳定外推取决于高斯平滑分数的准局部性，并引入有限深度局部流（FDLF）基准进行验证。


AI中文摘要

科学生成建模通常需要尺寸迁移，即在小系统上训练的模型在大系统上评估。虽然平移不变架构允许这种评估，但我们表明架构局部性本身并不能保证稳定的尺寸外推。相反，稳定外推由高斯平滑分数的准局部性决定。通过Tweedie公式，远距离扰动可以通过后验协方差影响局部分数分量，这意味着局部模型只有在感受野覆盖平滑分数的响应范围时才能成功。我们形式化了这一机制，证明了反向扩散下局部边缘分布的尺寸一致比较定理。我们还引入了有限深度局部流（FDLF），这是一个具有精确分数、密度和可控响应范围的白盒诊断基准。实验上，我们验证了空间混合、平滑分数准局部性和模型感受野之间的相互作用。在空间混合下，平滑分数相对于感受野保持准局部性，从而实现稳定外推。相反，当空间混合减弱时，分数的局部性迅速退化，导致尺寸迁移失败。

英文摘要

Scientific generative modeling often requires size transfer, where models trained on small systems are evaluated on larger ones. While translation-invariant architectures enable this evaluation, we show that architectural locality alone does not guarantee stable size extrapolation. Instead, stable extrapolation is governed by the quasi-locality of the Gaussian-smoothed score. Through Tweedie's formula, far-away perturbations can influence local score components via posterior covariance, meaning a local model succeeds only if its receptive field covers the smoothed score's response range. We formalize this mechanism, proving a size-uniform comparison theorem for local marginals under reverse diffusion. We also introduce Finite-Depth Local Flow (FDLF), a white-box diagnostic benchmark with exact scores, densities, and controllable response ranges. Empirically, we validate the interplay between spatial mixing, smoothed-score quasi-locality, and model receptive fields. Under spatial mixing, the smoothed score remains quasi-local relative to the receptive field, enabling stable extrapolation. Conversely, when spatial mixing weakens, the score's locality rapidly degrades, causing size transfer to fail.
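
For reference, the standard identities behind the Tweedie argument can be written out as follows. This is a sketch under the assumed observation model $y = x + \sigma\varepsilon$; the symbols $y$, $\sigma$ and $p_\sigma$ are notation chosen here, not taken from the paper.

```latex
\begin{align}
y &= x + \sigma\varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, I), \\
\nabla_y \log p_\sigma(y) &= \frac{\mathbb{E}[x \mid y] - y}{\sigma^2}, \\
\nabla_y^2 \log p_\sigma(y) &= \frac{\operatorname{Cov}[x \mid y]}{\sigma^4} - \frac{I}{\sigma^2}.
\end{align}
```

The $(i, j)$ entry of the last line shows how a perturbation at a distant site $j$ reaches score component $i$ through the posterior covariance $\operatorname{Cov}[x_i, x_j \mid y]$. A local model can only capture this if its receptive field extends over the range where that covariance is non-negligible.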


2606.07640 2026-06-09 cs.CV cs.AI cs.LG 交叉投稿

No Free Lunch for Synthetic Images under Data Scarcity Conditions

数据稀缺条件下合成图像没有免费午餐

Borja Arroyo Galende, Alejandro Almodóvar, Patricia A. Apellániz, Juan Parras, Silvia Uribe, Santiago Zazo

发表机构 * Universidad Politécnica de Madrid（马德里理工大学）； Universidad de Alcalá（阿尔卡拉大学）

AI总结研究数据稀缺和隐私敏感条件下合成数据的保真度、隐私和效用权衡，提出联合评估框架，比较VAE、GAN和DDPM在三个图像数据集上的表现，发现GAN和DDPM在差分隐私下更鲁棒。

2606.08694 2026-06-09 cond-mat.soft cond-mat.stat-mech cs.LG 交叉投稿

Discovering and decoding latent mean-field structure with variational autoencoders

通过变分自编码器发现和解读隐平均场结构

Marco Biroli, Max Welling, Vincenzo Vitelli

发表机构 * Department of Physics and the James Franck Institute, University of Chicago（芝加哥大学物理系及詹姆斯·弗兰克研究所）； CuspAI ； AMLab, University of Amsterdam（阿姆斯特丹大学AMLab）； Leinweber Institute for Theoretical Physics（莱因韦伯理论物理研究所）

AI总结提出一种量化变分自编码器（VAE）重建多体系统联合概率分布能力的准则，证明成功VAE的条件独立解码器等价于有限尺寸平均场分解，从而可从解码器读出平均场理论的微观参数，并在标量、向量和张量序参量模型及视网膜记录数据中验证。

Comments 10 pages, 5 figures


AI中文摘要

生成模型越来越多地用于捕捉多体系统中的相关性，但它们学习到的表示在很大程度上仍难以进行物理解释。在这里，我们建立了一个直观的准则，用于量化变分自编码器（VAE）忠实重建多体系统联合概率分布的能力。简而言之，通过将潜在通道的速率与数据的二分互信息进行比较，可以得到VAE容量的一个界限。利用这个界限，我们证明任何成功的VAE的条件独立解码器在结构上等同于有限尺寸平均场分解。因此，成功的重建是潜在平均场理论的直接证据，并且该理论的微观参数可以从训练好的解码器中读出。我们在具有标量（Curie-Weiss）、向量（Hopfield）和张量（Maier-Saupe）序参量的可解模型层次上验证了这些结论，仅从平衡样本中恢复了完整的Hopfield模式矩阵。我们发现，当应用于蝾螈视网膜记录时，一个双潜在VAE仅用两个有效的集体变量就再现了群体统计，使我们能够恢复神经群体的“存储模式”，并写出一个正确建模实验数据的广义Hopfield模型。

英文摘要

Generative models are increasingly used to capture correlations in many-body systems, but the representations they learn remain largely opaque to physical interpretation. Here, we establish an intuitive criterion that quantifies the capacity of a variational autoencoder (VAE) to faithfully reconstruct the joint probability distribution of a many-body system. In a nutshell, a bound on the VAE capacity is obtained by comparing the rate of the latent channel to the bipartite mutual information of the data. Using this bound, we show that the conditionally independent decoder of any successful VAE is structurally identical to a finite-size mean-field factorization. Hence, a successful reconstruction is direct evidence for a latent mean-field theory, and the microscopic parameters of that theory can be read off the trained decoder. We validate these conclusions on a hierarchy of solvable models with scalar (Curie-Weiss), vector (Hopfield) and tensor (Maier-Saupe) order parameters, recovering the full Hopfield pattern matrix from equilibrium samples alone. We find that, when applied to Salamander retinal recordings, a two-latent VAE reproduces the population statistics with only two effective collective variables, allowing us to recover the 'stored patterns' of the neural population and write a generalized Hopfield model which correctly models the experimental data.


2606.08847 2026-06-09 cs.CV cs.AI cs.LG 交叉投稿

BLM-SGAN: Bidirectional Language Modeling for Semantic-Spatial Text-to-Image Generation

BLM-SGAN: 用于语义-空间文本到图像生成的双向语言建模

Ahmed Abdelmoneim Mazrou, Haidy Maher El-Amir, Ali Hamdi

发表机构 * Faculty of Computer Science, MSA University, Egypt（MSA大学计算机科学学院，埃及）

AI总结提出BLM-SGAN模型，利用BERT的双向注意力机制捕获长程依赖，解决GAN在文本到图像生成中的梯度消失和序列处理限制，在鸟类图像生成上达到SOTA。

Comments Published in ICACIn 2024. Appears in Advances on Intelligent Computing and Data Science II, Lecture Notes on Data Engineering and Communications Technologies, vol. 254, Springer, 2025


DOI: 10.1007/978-3-031-91351-8_5
Journal ref: Advances on Intelligent Computing and Data Science II (ICACIn 2024), Lecture Notes on Data Engineering and Communications Technologies, vol. 254, Springer, Cham, 2025

AI中文摘要

尽管从文本描述生成图像取得了成功，但在自然语言处理（NLP）和计算机视觉（CV）等领域仍面临难以克服的挑战。文本到图像（T2I）模型的最新进展，特别是那些利用生成对抗网络（GAN）的模型，显著提高了跨领域合成逼真图像的能力。然而，现有的基于GAN的T2I模型仍然面临关键挑战，例如难以捕获长程依赖、梯度消失以及序列处理的局限性。为了解决这些问题，我们引入了BLM-SGAN，一种新颖的模型，它结合了用于语义-空间文本到图像生成的双向语言建模。BLM-SGAN利用BERT的注意力机制来捕获丰富的上下文信息并有效管理扩展序列。我们的模型展示了最先进的性能，Inception Score（IS）为5.45 +/- 0.08，超过了多个竞争模型，如SSA-GAN、DF-GAN、SD-GAN和AttnGAN。BLM-SGAN能够从详细的文本描述中有效生成高度逼真的鸟类图像。实现代码可在以下网址获取：https://github.com/haidy-maher/BLM-SGAN-Text-to-Image-Generation。

英文摘要

Despite the success of image generation from text descriptions, it still faces challenges that are difficult to overcome in domains such as natural language processing (NLP) and computer vision (CV). Recent advancements in text-to-image (T2I) models, particularly those utilizing generative adversarial networks (GANs), have significantly improved the synthesis of realistic images across various domains. However, existing GAN-based T2I models still encounter key challenges, such as difficulty in capturing long-range dependencies, vanishing gradients, and the limitations of sequential processing. To address these issues, we introduce BLM-SGAN, a novel model that incorporates Bidirectional Language Modeling for Semantic-Spatial Text-to-Image Generation. BLM-SGAN leverages BERT's attention mechanisms to capture rich contextual information and efficiently manage extended sequences. Our model demonstrates state-of-the-art performance, with an Inception Score (IS) of 5.45 +/- 0.08, surpassing several competitive models such as SSA-GAN, DF-GAN, SD-GAN, and AttnGAN. BLM-SGAN effectively generates highly realistic images of birds from detailed text descriptions. The implementation code is available at: https://github.com/haidy-maher/BLM-SGAN-Text-to-Image-Generation.
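
The Inception Score quoted above is commonly defined as $\exp\big(\mathbb{E}_x\,\mathrm{KL}(p(y \mid x)\,\|\,p(y))\big)$. A minimal sketch of that formula follows. It assumes the class probabilities have already been computed by some classifier, and the function name `inception_score` and the `eps` smoothing term are illustrative choices. The reported 5.45 +/- 0.08 would typically come from averaging over several data splits, which this single-batch sketch does not do.

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """exp of the mean KL(p(y|x) || p(y)) over a batch of predicted class probabilities.

    probs: array of shape (n_images, n_classes), each row summing to 1.
    """
    marginal = probs.mean(axis=0, keepdims=True)           # p(y), estimated from the batch
    kl_per_image = np.sum(
        probs * (np.log(probs + eps) - np.log(marginal + eps)), axis=1
    )
    return float(np.exp(kl_per_image.mean()))

# Uninformative predictions give the minimum score of 1.
uniform_score = inception_score(np.full((8, 4), 0.25))

# Confident predictions spread evenly over 4 classes approach the maximum, 4.
one_hot_score = inception_score(np.eye(4)[np.arange(8) % 4])
```

The score rewards confident per-image predictions (low conditional entropy) together with diverse predictions across the batch (high marginal entropy), so it is bounded between 1 and the number of classes.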


2606.09056 2026-06-09 cs.CV cs.LG 交叉投稿

MilliVid: Hierarchical Latents for Long-Range Consistency in Video Generation

MilliVid: 用于视频生成中长程一致性的分层潜变量

Ishaan Preetam Chandratreya, David Charatan, Basile Van Hoorick, Sergey Zakharov, Vitor Guizilini, Phillip Isola, Vincent Sitzmann

发表机构 * Massachusetts Institute of Technology（麻省理工学院）； Toyota Research Institute（丰田研究所）

AI总结提出一种多尺度token空间的粗到细展开方法，通过预训练层次化自编码器压缩帧为多层token，并训练视频扩散模型生成这些token，在保持几何和物体持久性长程一致性的同时降低计算开销。

Comments Ishaan Preetam Chandratreya and David Charatan contributed equally. Project page: https://davidcharatan.com/millivid/


AI中文摘要

视频生成模型已变得日益强大，但长程一致性仍然难以实现，因为即使只有几十帧也需要不切实际的长Transformer序列长度。我们表明，通过在多尺度token空间内使用粗到细展开生成视频，可以缓解这一问题。我们的方法很简单：首先，预训练一个自编码器，将每一帧压缩成一个token层次结构，层级范围从典型的潜变量分辨率到每帧仅几个token。最粗糙的层级捕获最重要的信息，如场景布局和语义，而更细的层级添加高频外观和纹理。然后，我们训练一个视频扩散模型，使用粗到细展开生成这些token。通过仔细控制在每个展开步骤中生成帧并用作上下文的细节级别，我们能够保持几何和物体持久性的长程一致性，同时在感知上不太相关的细节的长程一致性上花费更少的计算。我们使用一个自定义的长Minecraft视频数据集验证了这种方法，与现有基线相比，它产生了明显更一致的展开结果。

英文摘要

Video generative models have become increasingly powerful, but long-range consistency remains challenging to achieve because even a few dozen frames require impractically long transformer sequence lengths. We show that this issue can be mitigated by generating video using coarse-to-fine rollout within a multi-scale token space. Our approach is simple: first, we pre-train an autoencoder that compresses each frame into a hierarchy of tokens, with levels ranging from the typical latent resolution to only a handful of tokens per frame. The coarsest levels capture the most consequential information, such as scene layout and semantics, while finer levels add high-frequency appearance and texture. Then, we train a video diffusion model to generate these tokens using coarse-to-fine rollout. By carefully controlling the level of detail at which frames are generated and used as context during each rollout step, we are able to preserve long-range consistency in geometry and object permanence while spending less compute on the long-range consistency of less perceptually relevant details. We validate this approach using a custom dataset of long Minecraft videos, where it produces substantially more consistent rollouts compared to existing baselines.


2411.08314 2026-06-09 cs.LG 版本更新

Modeling Stochastic Conditional Dynamics from Sparse Observations via Kernel-Stabilized Flow Matching

通过核稳定流匹配从稀疏观测中建模随机条件动力学

Adam P. Generale, Andreas E. Robertson, Surya R. Kalidindi

发表机构 * Georgia Institute of Technology（佐治亚理工学院）； Sandia National Laboratories（桑地亚国家实验室）

AI总结提出条件变量流匹配（CVFM）框架，通过联合采样状态和条件变量流，利用条件不匹配核和Wasserstein距离重加权目标，从稀疏非配对数据中学习条件分布的时间演化，在材料结构建模中表现更优。

Comments Accepted to Transactions on Machine Learning Research (2026); OpenReview: https://openreview.net/forum?id=3A6oAS2TWo


Journal ref: Transactions on Machine Learning Research, 2026

AI中文摘要

学习随时间变换条件概率密度是概率建模和自然科学中的一个基本挑战。在生物和物理领域中，预测随机非线性动力系统的演化至关重要。虽然基于流的模型可以预测概率分布的时间演化，但现有方法通常假设离散条件且样本在时间上配对，限制了其在仅有稀疏非配对连续条件数据时的科学适用性。我们提出条件变量流匹配（CVFM），这是一个学习流的框架，通过跨条件密度的连续空间摊销来变换条件分布。CVFM通过联合采样状态和条件变量流，利用条件不匹配核和条件Wasserstein距离重新加权条件最优传输目标，解决了先前方法的高方差不稳定性。总的来说，这些进展允许从跨时间的稀疏非配对状态-条件测量中学习动力学。我们在条件映射基准和制造过程中材料内部结构时间演化的案例研究上评估了CVFM，观察到与现有条件变体相比，性能和收敛特性有所改善。代码可在 https://github.com/agenerale/conditional-variable-flow-matching 获取。

英文摘要

Learning to transform conditional probability densities over time is a fundamental challenge spanning probabilistic modeling and the natural sciences. This task is paramount when forecasting the evolution of stochastic nonlinear dynamical systems in biological and physical domains. While flow-based models can predict the temporal evolution of probability distributions, existing approaches often assume discrete conditioning with samples that are paired across time, limiting their scientific applicability where frequently only sparse data with unpaired continuous conditioning is available. We propose Conditional Variable Flow Matching (CVFM), a framework for learning flows transforming conditional distributions with amortization across the continuous space of conditional densities. CVFM addresses the high-variance instability of prior methods by jointly sampling flows over state and conditioning variables, utilizing a conditioning mismatch kernel alongside a conditional Wasserstein distance to reweight the conditional optimal transport objective. Collectively, these advances allow for learning dynamics from sparse unpaired measurements of state-condition across time. We evaluate CVFM on conditional mapping benchmarks and a case study modeling the temporal evolution of materials internal structure during manufacturing processes, observing improved performance and convergence characteristics over existing conditional variants. Code is available at https://github.com/agenerale/conditional-variable-flow-matching.


2502.06819 2026-06-09 cs.LG cs.GR 版本更新

AccioScene: Compositional 3D Scene Generation via Graph Diffusion and Interaction-driven Critics

AccioScene: 基于图扩散与交互驱动评判的组合式3D场景生成

Yao Wei, Matteo Toso, Pietro Morerio, Changjae Oh, Michael Ying Yang, Alessio Del Bue

发表机构 * Queen Mary University of London, UK（伦敦大学玛丽女王学院）； Italian Institute of Technology (IIT), Italy（意大利理工学院）； University of Bath, UK（巴斯大学）

AI总结提出多阶段流水线，通过图扩散生成上下文一致的场景图并预测物体布局，结合轻量级人-物交互先验和空间约束，生成支持人类交互且物理合理的3D室内场景。


AI中文摘要

本文提出一个从文本提示生成3D室内场景的框架。现有方法通常将场景合成视为基于单一输入模态（如文本描述、房间形状或场景图）的物体布局预测问题，这种设计可能导致物体碰撞和功能合理性受限，降低了其实用性。为解决这些局限，我们引入一个多阶段流水线，更好地反映实际的场景创建流程。给定描述部分场景内容的文本提示，我们的方法首先使用图扩散生成上下文连贯的场景图，然后预测合理的物体布局。此外，我们融入轻量级人-物交互先验以鼓励以人为中心和功能性的布局，并加入显式空间约束以减少相互穿透。我们的方法生成连贯的3D场景，其布局可行且更好地支持人类交互。在3D-FRONT数据集上的实验表明，与现有方法相比，我们的方法达到了有竞争力或最先进的性能，同时提高了生成场景的物理合理性。

英文摘要

This paper presents a framework for generating 3D indoor scenes from text prompts. Existing methods often formulate scene synthesis as an object layout prediction problem conditioned on a single input modality, such as a text description, room shape, or scene graph. This design can lead to object collisions and limited functional plausibility, reducing its practical applicability. To address these limitations, we introduce a multi-stage pipeline that better reflects practical scene creation scenarios. Given a text prompt describing partial scene content, our method first uses graph diffusion to produce a contextually coherent scene graph and then predicts a realistic object layout. In addition, we incorporate lightweight human-object interaction priors to encourage human-centric and functional arrangements, with explicit spatial constraints to reduce interpenetration. Our approach generates coherent 3D scenes with viable layouts that better support human interaction. Experiments on the 3D-FRONT dataset demonstrate that our method achieves competitive or state-of-the-art performance compared with existing approaches, while improving the physical plausibility of generated scenes.


2502.19049 2026-06-09 cs.LG 版本更新

In-Context Learning of Stochastic Differential Equations with Foundation Inference Models

基于基础推理模型的随机微分方程上下文学习

Patrick Seifner, Kostadin Cvejoski, David Berghaus, Cesar Ojeda, Ramses J. Sanchez

发表机构 * Lamarr Institute（拉马尔研究所）； University of Bonn（波恩大学）； Fraunhofer IAIS（弗劳恩霍夫智能分析与信息系统研究所）； University of Potsdam（波茨坦大学）

AI总结提出FIM-SDE，一种预训练识别模型，通过上下文学习从噪声时间序列中零样本估计低维SDE的漂移和扩散函数，并支持快速微调，在合成和真实数据上表现鲁棒。

Comments Accepted at NeurIPS 2025. The previous version appeared under the title "Foundation Inference Models for Stochastic Differential Equations: A Transformer-based Approach for Zero-shot Function Estimation."


Journal ref: 39th Conference on Neural Information Processing Systems (NeurIPS 2025)

AI中文摘要

随机微分方程（SDE）描述了由漂移函数控制的确定性流动与由扩散函数决定的随机波动叠加的动态系统。从数据中准确估计（或发现）这些函数是机器学习中的一个核心问题，在自然科学和社会科学中有着广泛的应用。然而，当前的解决方案要么严重依赖于对动力学的先验知识，要么涉及复杂的训练过程。我们引入了FIM-SDE（用于SDE的基础推理模型），这是一种预训练的识别模型，能够从含噪声的时间序列数据中对低维SDE的漂移和扩散函数进行准确的上下文（或零样本）估计，并允许快速微调到目标数据集。利用摊销推理和神经算子的概念，我们以监督方式（预）训练FIM-SDE，将大量含噪声的离散观测SDE路径映射到漂移和扩散函数空间。我们证明，FIM-SDE在广泛的合成和真实世界过程中实现了鲁棒的上下文函数估计——从经典的SDE系统（例如双阱动力学或弱扰动洛伦兹吸引子）到股票价格记录以及油价和风速波动——同时匹配在目标数据集上训练的符号、高斯过程和神经SDE基线的性能。当微调到目标过程时，我们显示FIM-SDE始终优于所有这些基线。

英文摘要

Stochastic differential equations (SDEs) describe dynamical systems where deterministic flows, governed by a drift function, are superimposed with random fluctuations, dictated by a diffusion function. The accurate estimation (or discovery) of these functions from data is a central problem in machine learning, with wide application across the natural and social sciences. Yet current solutions either rely heavily on prior knowledge of the dynamics or involve intricate training procedures. We introduce FIM-SDE (Foundation Inference Model for SDEs), a pretrained recognition model that delivers accurate in-context (or zero-shot) estimation of the drift and diffusion functions of low-dimensional SDEs, from noisy time series data, and allows rapid finetuning to target datasets. Leveraging concepts from amortized inference and neural operators, we (pre)train FIM-SDE in a supervised fashion to map a large set of noisy, discretely observed SDE paths onto the space of drift and diffusion functions. We demonstrate that FIM-SDE achieves robust in-context function estimation across a wide range of synthetic and real-world processes -- from canonical SDE systems (e.g., double-well dynamics or weakly perturbed Lorenz attractors) to stock price recordings and oil-price and wind-speed fluctuations -- while matching the performance of symbolic, Gaussian process and Neural SDE baselines trained on the target datasets. When finetuned to the target processes, we show that FIM-SDE consistently outperforms all these baselines.
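
To make the drift/diffusion decomposition concrete, here is a minimal Euler-Maruyama simulation of a one-dimensional double-well SDE, $dX = (X - X^3)\,dt + \sigma\,dW$. The specific drift, the constant diffusion value 0.5, the step size and all function names are assumptions chosen for illustration; they are not the systems or code used in the paper.

```python
import numpy as np

def euler_maruyama(drift, diffusion, x0, dt, n_steps, rng):
    """Simulate dX = drift(X) dt + diffusion(X) dW on a uniform time grid."""
    path = np.empty(n_steps + 1)
    path[0] = x0
    for k in range(n_steps):
        dw = rng.normal(scale=np.sqrt(dt))     # Brownian increment over one step
        path[k + 1] = path[k] + drift(path[k]) * dt + diffusion(path[k]) * dw
    return path

def double_well_drift(x):
    # Negative gradient of the potential x^4 / 4 - x^2 / 2, with wells at x = -1 and x = +1.
    return x - x ** 3

def constant_diffusion(x):
    # State-independent noise level (hypothetical value).
    return 0.5

rng = np.random.default_rng(0)
path = euler_maruyama(double_well_drift, constant_diffusion,
                      x0=0.0, dt=0.01, n_steps=1000, rng=rng)
```

A path like this, observed at discrete times and possibly with added measurement noise, is the kind of input from which a model would have to recover `double_well_drift` and `constant_diffusion`.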


2505.14752 2026-06-09 cs.LG 版本更新

LLMSynthor: Macro-Aligned Micro-Records Synthesis with Large Language Models

LLMSynthor: 使用大语言模型进行宏观对齐的微观记录合成

Yihong Tang, Menglin Kong, Junlin He, Tong Nie, Wei Ma, Lijun Sun

发表机构 * McGill University（麦吉尔大学）； The Hong Kong Polytechnic University（香港理工大学）

AI总结提出LLMSynthor方法，利用大语言模型作为非参数copula，通过迭代生成与目标宏观统计一致的微观记录，解决大规模细粒度数据收集困难的问题。


AI中文摘要

宏观对齐的微观记录对于社会科学和城市研究中的可信模拟至关重要。例如，流行病模型只有在个体层面的流动和接触反映真实行为，且聚合数据匹配真实世界统计数据（如病例数或旅行流量）时才可靠。然而，大规模收集此类细粒度数据不切实际，研究人员只能获得宏观数据。LLMSynthor通过将预训练的大语言模型转化为宏观感知模拟器来解决这一问题，生成与目标宏观统计一致的逼真微观记录。它迭代构建合成数据集：在每一步，LLM生成一批记录以最小化合成聚合与目标聚合之间的差异。将LLM视为非参数copula，使模型能够捕捉变量间真实的联合依赖关系。为提高效率，LLM提议采样引导LLM提出有针对性的记录批次，指定变量范围和数量，以有效纠正差异，同时保持基于模型先验的真实性。跨领域（移动、电子商务、人口）的评估表明，LLMSynthor实现了强真实性、统计保真度和实用性，使其广泛适用于经济学、社会科学和城市研究。

英文摘要

Macro-aligned micro-records are crucial for credible simulations in social science and urban studies. For example, epidemic models are only reliable when individual-level mobility and contacts mirror real behavior, while aggregates match real-world statistics like case counts or travel flows. However, collecting such fine-grained data at scale is impractical, leaving researchers with only macro-level data. LLMSynthor addresses this by turning a pretrained LLM into a macro-aware simulator that generates realistic micro-records consistent with target macro-statistics. It iteratively builds synthetic datasets: in each step, the LLM generates batches of records to minimize discrepancies between synthetic and target aggregates. Treating the LLM as a nonparametric copula allows the model to capture realistic joint dependencies among variables. To improve efficiency, LLM Proposal Sampling guides the LLM to propose targeted record batches, specifying variable ranges and counts, to efficiently correct discrepancies while preserving realism grounded in the model's priors. Evaluations across domains (mobility, e-commerce, population) show that LLMSynthor achieves strong realism, statistical fidelity, and practical utility, making it broadly applicable to economics, social science, and urban studies.


2507.19700 2026-06-09 cs.LG 版本更新

Disjoint Generation of Synthetic Data

合成数据的分离生成

Anton Danholt Lautrup, Muhammad Rajabinasab, Tobias Hyrup, Arthur Zimek, Peter Schneider-Kamp

发表机构 * Department of Mathematics and Computer Science（数学与计算机科学系）； University of Southern Denmark（南丹麦大学）

AI总结提出通过分离生成模型生成表格合成数据的新框架，将数据集分区后独立生成再合并，在无公共变量时实现连接，提升隐私性、计算可行性和混合模型合成能力。


Journal ref: Transactions on Machine Learning Research (June 2026). https://openreview.net/forum?id=LSzXkAWBKI

AI中文摘要

我们提出了一种通过分离生成模型生成表格合成数据集的新框架。在该范式中，数据集被划分为多个不相交的子集，分别提供给生成模型的独立实例。然后，通过一种在缺乏公共变量/标识符的情况下工作的连接操作，将结果事后组合。通过几个案例研究和表格数据示例，我们展示了该框架的成功，并帮助阐明了一些可能的设计选择。分离生成所实现的优势包括：i) 观察到隐私的经验度量有所提高。ii) 增加了某些模型类型的计算可行性。iii) 能够使用不同生成模型的混合来生成合成数据。具体而言，混合模型合成弥合了隐私和效用性能之间的差距，在下游任务的准确性和曲线下面积方面提供了极具竞争力的性能，同时显著降低了经验重识别风险。

英文摘要

We propose a new framework for generating tabular synthetic datasets via disjoint generative models. In this paradigm, a dataset is partitioned into disjoint subsets that are supplied to separate instances of generative models. The results are then combined post hoc by a joining operation that works in the absence of common variables/identifiers. The success of the framework is demonstrated through several case studies and examples on tabular data that help illuminate some of the design choices that one may make. The advantages achieved by the disjoint generation include: i) An observed increase in the empirical measurement of privacy. ii) Increased computational feasibility of certain model types. iii) Ability to generate synthetic data using a mixture of different generative models. Specifically, mixed-model synthesis bridges the gap between privacy and utility performance, providing highly competitive performance on Accuracy and Area Under the Curve for downstream tasks while significantly lowering the empirical re-identification risk.


2508.19857 2026-06-09 cs.LG quant-ph 版本更新

Quantum latent distributions in deep generative models

深度生成模型中的量子潜在分布

Omar Bacarreza, Thorin Farnsworth, Alexander Makarovskiy, Hugo Wallner, Tessa Hicks, Santiago Sempere-Llagostera, John Price, Robert J. A. Francis-Jones, William R. Clements

发表机构 * ORCA Computing（ORCA计算公司）

AI总结研究量子处理器产生的潜在分布何时及为何能提升生成模型性能，理论上证明其可生成经典分布无法高效产生的数据分布，并在合成和分子数据集上验证了量子干涉统计带来的性能优势。

Comments Accepted at ICML 2026


AI中文摘要

许多成功的生成模型家族利用低维潜在分布映射到数据分布。尽管通常使用简单的潜在分布，但分布的选择对模型性能有强烈影响。最近的实验表明，量子处理器产生的概率分布（通常高度相关且经典上难以处理）可以在某些数据集上带来性能提升。然而，量子处理器产生的潜在分布何时以及为何能提升性能，以及这些改进是否与这些分布的量子性质相关，是我们在本工作中研究的开放问题。我们在理论上证明，在某些条件下，这些“量子潜在分布”使生成模型能够产生经典潜在分布无法高效产生的数据分布。我们提供了关于潜在机制的解释，这些机制可以解释在真实数据集上的性能优势。基于此，我们在合成量子数据集和QM9分子数据集上进行了广泛的基准测试，使用了模拟和真实的光子量子处理器。我们发现，与经典基线相比，量子干涉产生的统计特性带来了更好的生成性能，表明量子处理器可以在扩展深度生成模型的能力方面发挥作用。

英文摘要

Many successful families of generative models leverage a low-dimensional latent distribution that is mapped to a data distribution. Though simple latent distributions are often used, the choice of distribution has a strong impact on model performance. Recent experiments have suggested that the probability distributions produced by quantum processors, which are typically highly correlated and classically intractable, can lead to improved performance on some datasets. However, when and why latent distributions produced by quantum processors can improve performance, and whether these improvements are connected to quantum properties of these distributions, are open questions that we investigate in this work. We show in theory that, under certain conditions, these "quantum latent distributions" enable generative models to produce data distributions that classical latent distributions cannot efficiently produce. We provide intuition as to the underlying mechanisms that could explain a performance advantage on real datasets. Based on this, we perform extensive benchmarking on a synthetic quantum dataset and the QM9 molecular dataset, using both simulated and real photonic quantum processors. We find that the statistics arising from quantum interference lead to improved generative performance compared to classical baselines, suggesting that quantum processors can play a role in expanding the capabilities of deep generative models.


2509.24762 2026-06-09 cs.LG 版本更新

In-Context Learning of Temporal Point Processes with Foundation Inference Models

基于基础推理模型的时间点过程上下文学习

David Berghaus, Patrick Seifner, Kostadin Cvejoski, César Ojeda, Ramsés J. Sánchez

发表机构 * Lamarr Institute（拉马尔研究所）； Fraunhofer IAIS（弗劳恩霍夫智能分析与信息系统研究所）； University of Bonn（波恩大学）； JetBrains Research（JetBrains研究）； University of Potsdam（波茨坦大学）

AI总结提出一种基于摊销推理和上下文学习的点过程基础推理模型FIM-PP，通过大规模合成数据预训练，无需额外训练即可估计真实MTPP，或快速微调至目标系统。

Comments This paper is published as a conference paper at ICLR 2026


Journal ref: The Fourteenth International Conference on Learning Representations (ICLR 2026)

AI中文摘要

利用带标记的时间点过程（MTPP）对多种事件类型的事件序列进行建模，为揭示支配性动态规则和预测未来事件提供了一种原则性方法。当前MTPP推理的神经网络方法依赖于为每个目标系统训练单独的专用模型。我们采用一种截然不同的方法：利用摊销推理和上下文学习，预训练一个深度神经网络，以从由事件序列集合定义的上下文中推断事件历史的条件强度函数。预训练是在从广泛霍克斯过程分布中采样的大规模合成MTPP数据集上进行的。预训练后，我们的点过程基础推理模型（FIM-PP）可以在无需任何额外训练的情况下从真实世界数据中估计MTPP，或者快速微调至目标系统。实验表明，这种摊销方法在常见基准数据集上的下一事件预测任务中与专用模型的性能相匹配。

英文摘要

Modeling event sequences of multiple event types with marked temporal point processes (MTPPs) provides a principled way to uncover governing dynamical rules and predict future events. Current neural network approaches to MTPP inference rely on training separate, specialized models for each target system. We pursue a radically different approach: drawing on amortized inference and in-context learning, we pretrain a deep neural network to infer, in-context, the conditional intensity functions of event histories from a context defined by sets of event sequences. Pretraining is performed on a large synthetic dataset of MTPPs sampled from a broad distribution of Hawkes processes. Once pretrained, our Foundation Inference Model for Point Processes (FIM-PP) can estimate MTPPs from real-world data without any additional training, or be rapidly finetuned to target systems. Experiments show that this amortized approach matches the performance of specialized models on next-event prediction across common benchmark datasets.
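
As a concrete instance of a conditional intensity function, the sketch below evaluates a univariate Hawkes process with an exponential kernel, $\lambda(t) = \mu + \sum_{t_i < t} \alpha\, e^{-\beta (t - t_i)}$. The abstract only says the pretraining data come from a broad distribution of Hawkes processes, so the exponential kernel, the single event type and the parameter values are simplifying assumptions made here.

```python
import numpy as np

def hawkes_intensity(t, event_times, mu, alpha, beta):
    """Conditional intensity mu + sum over past events of alpha * exp(-beta * (t - t_i))."""
    past = np.asarray(event_times, dtype=float)
    past = past[past < t]                      # only events strictly before t contribute
    return mu + alpha * np.exp(-beta * (t - past)).sum()

events = [1.0, 2.0]                            # hypothetical event history

# Before any event, the intensity equals the baseline rate mu.
baseline_only = hawkes_intensity(0.5, events, mu=0.5, alpha=0.8, beta=1.0)

# After both events, each one adds an exponentially decaying excitation.
after_two_events = hawkes_intensity(2.5, events, mu=0.5, alpha=0.8, beta=1.0)
```

In a marked process there would be one such intensity per event type, with cross-excitation terms between types. An in-context model of the kind described would output these functions directly from a set of observed sequences rather than from fitted parameters.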


2511.05355 2026-06-09 cs.LG cs.RO cs.SY eess.SY 版本更新

简单自条件适应用于掩码扩散模型

Michael Cardei, Huu Binh Ta, Ferdinando Fioretto

发表机构 * University of Virginia（弗吉尼亚大学）

AI总结本文提出一种简单有效的后训练适应方法，通过自条件预测提升掩码扩散模型的生成能力，减少生成困惑度并提升图像合成和分子生成质量。


AI中文摘要

掩码扩散模型（MDMs）通过迭代去噪在吸收掩码过程中生成离散序列。在标准掩码扩散中，如果一个token在反向更新后仍被掩码，模型会丢弃该位置的干净状态预测。因此，仍被掩码的位置必须反复从掩码token本身推断。这种设计限制了跨步骤的细化。为解决这一限制，本文提出了一种简单但有效的后训练适应方法，使每个去噪步骤都基于模型自身之前的干净状态预测。所提出的方法称为自条件掩码扩散模型（SCMDM），需要最小的架构更改，不引入递归的潜在状态路径，不依赖辅助参考模型，并在采样过程中不增加额外的去噪器评估。这与部分自条件方法形成重要区别，后者需要昂贵的从头模型训练。特别是，本文表明，在后训练阶段，部分自条件，包括用于从头训练自条件模型的常用50% dropout策略，是次优的。相反，一旦模型自生成的干净状态估计变得有信息，专业化于细化优于混合条件和无条件目标。SCMDM在多个领域进行了评估，显示出对普通MDM基线的一致改进，实现了在OWT训练模型上的生成困惑度几乎减少50%（从42.89到23.72），同时在离散图像合成质量、小分子生成和基因组分布建模的保真度方面也取得了显著改进。

英文摘要

Masked diffusion models (MDMs) generate discrete sequences by iterative denoising under an absorbing masking process. In standard masked diffusion, if a token remains masked after a reverse update, the model discards its clean-state prediction for that position. Thus, still-masked positions must be repeatedly inferred from the mask token alone. This design choice limits cross-step refinement. To address this limitation, this paper proposes a simple, yet effective, post-training adaptation for MDMs that conditions each denoising step on the model's own previous clean-state predictions. The resulting method, called Self-Conditioned Masked Diffusion Models (SCMDM), requires minimal architectural change, does not introduce a recurrent latent-state pathway, does not rely on an auxiliary reference model, and adds no extra denoiser evaluations during sampling. This is an important departure from partial self-conditioning approaches, which require expensive model training from scratch. In particular, the paper shows that partial self-conditioning, including the commonly used 50% dropout strategy for training self-conditioned models from scratch, is suboptimal in the post-training regime. Instead, once the model's self-generated clean-state estimates become informative, the specialization to refinement is preferable to mixing conditional and unconditional objectives. SCMDM is evaluated across multiple domains, demonstrating consistent improvement over vanilla MDM baselines, achieving nearly a 50% reduction in generative perplexity on OWT-trained models (42.89 to 23.72), alongside strong improvements in discretized image synthesis quality, small-molecule generation, and enhanced fidelity in genomic distribution modeling.


2605.29920 2026-06-09 cs.LG 版本更新

Midpoint Generative Models

中点生成模型

Daniil Shlenskii, Nikita Gushchin, Lev Novitskiy, Dmitry V. Dylov, Alexander Korotin

发表机构 * AXXX, Russia（俄罗斯AXXX）； Applied AI Institute, Russia（俄罗斯应用人工智能研究所）； Kandinsky Lab, Russia（俄罗斯康定斯基实验室）

AI总结提出中点生成模型（MGM），利用流匹配的对称性定义中点散度，并通过变分目标训练单步生成模型，在性能上与现有方法竞争。


AI中文摘要

我们引入了中点生成模型（MGM），这是一个用于训练单步生成模型的原则性框架。MGM基于线性插值流匹配的一个简单对称性：当两个端点分布重合时，相应的漂移场在中点时间$t=1/2$处消失。我们证明该场的范数定义了分布之间的有效差异，称为中点散度。我们通过引入随机翻转插值将该散度扩展到中点之外，并通过用对称随机插值替代确定性线性流匹配插值进一步推广，得到广义中点散度。最后，我们推导了广义散度的变分形式，从而得到一个可处理的目标用于训练单步生成器。由此产生的MGM算法为生成建模提供了一种有效且理论上有依据的方法，在单步生成建模方法中取得了有竞争力的性能。

英文摘要

We introduce Midpoint Generative Models (MGM), a principled framework for training one-step generative models. MGM is based on a simple symmetry of Flow Matching with linear interpolation: when the two endpoint distributions coincide, the corresponding drift field vanishes at the midpoint time, $t=1/2$. We show that the norm of this field defines a valid discrepancy between distributions, which we call the Midpoint Divergence. We extend this discrepancy beyond the midpoint by introducing randomly flipped interpolations and further generalize it by replacing deterministic linear Flow Matching interpolations with symmetric stochastic interpolants, yielding a generalized Midpoint Divergence. Finally, we derive a variational formulation of our generalized divergence, yielding a tractable objective for training a one-step generator. The resulting MGM algorithm offers an effective and theoretically grounded approach to generative modeling, achieving competitive performance against existing one-step generative modeling methods.
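
The midpoint symmetry can be sketched in two lines. The derivation below assumes the endpoints $x_0$ and $x_1$ are drawn independently from the same distribution (any exchangeable coupling would do); the notation $v_t$ for the drift is chosen here and may differ from the paper's.

```latex
\begin{align}
x_t &= (1-t)\,x_0 + t\,x_1, \qquad v_t(x) = \mathbb{E}\left[x_1 - x_0 \mid x_t = x\right], \\
v_{1/2}(x) &= \mathbb{E}\left[x_1 - x_0 \,\middle|\, \tfrac{1}{2}(x_0 + x_1) = x\right]
            = \mathbb{E}\left[x_0 - x_1 \,\middle|\, \tfrac{1}{2}(x_0 + x_1) = x\right]
            = -v_{1/2}(x)
            \;\Longrightarrow\; v_{1/2}(x) = 0.
\end{align}
```

The middle equality swaps the roles of $x_0$ and $x_1$, which leaves both their joint law and the conditioning event unchanged. When the two endpoint distributions differ, this swap is no longer a symmetry, which is why the norm of $v_{1/2}$ can serve as a discrepancy between them.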


2605.31498 2026-06-09 cs.LG q-bio.BM 版本更新

Scalable Inference-Time Annealing with Surrogate Likelihood Estimators

可扩展的推理时退火与代理似然估计器

Daniel Peñaherrera, Rishal Aggarwal, David Ryan Koes

发表机构 * CMU-Pitt PhD Program in Computational Biology（卡内基梅隆大学-匹兹堡大学计算生物学联合博士项目）； Dept. of Computational & Systems Biology, University of Pittsburgh, Pittsburgh, PA 15260, USA（匹兹堡大学计算与系统生物学系，美国宾夕法尼亚州匹兹堡，邮编15260）

AI总结提出可扩展推理时退火（SITA）方法，通过基于能量的模型实现快速代理似然，避免昂贵的散度计算，在丙氨酸二肽和三肽上取得最先进性能。

Comments 26 pages, 5 figures, submitted to JMLR 2026


AI中文摘要

计算化学和生物物理学中长期存在的挑战是高效采样分子的玻尔兹曼分布。生成式建模的进展被提出以解决传统采样技术的局限性，通过消除模拟的计算成本。一个有前景的方向是沿着温度阶梯迭代微调扩散模型，其中训练数据通过推理时退火期间的重要性采样生成。不幸的是，这些方法需要在分数场上计算散度来估计重要性权重，使得它们对于较大系统难以处理。在这里，我们提出可扩展的推理时退火（SITA），它重新训练基于流的模型以在逐渐降低的温度下生成样本，使用基于能量的模型来促进快速代理似然。我们在丙氨酸二肽和丙氨酸三肽上展示了最先进的性能，同时避免了昂贵的散度项。我们的代码可在 https://github.com/countrsignal/sita.git 获取。

英文摘要

A long-standing challenge in computational chemistry and biophysics is efficiently sampling the Boltzmann distribution of molecules. Advances in generative modeling have been proposed to address the limitations of conventional sampling techniques by eliminating the computational cost of simulation. A promising direction is iteratively finetuning diffusion models along a temperature ladder whereby training data is generated via importance sampling during inference-time annealing. Unfortunately, these methods require computing a divergence over the score field to estimate importance weights, rendering them intractable for larger systems. Here we present scalable inference-time annealing (SITA), which retrains flow-based models to generate samples at progressively lower temperatures using an energy-based model to facilitate fast surrogate likelihoods. We demonstrate state-of-the-art performance on both Alanine Dipeptide and Alanine Tripeptide while avoiding costly divergence terms. Our code is available at https://github.com/countrsignal/sita.git


2606.04804 2026-06-09 cs.LG 版本更新

The Right Measure for Physics-Constrained Generation: A Co-Area Correction for Posterior-Consistent PDE Inverse Problems

物理约束生成的正确度量：后验一致PDE逆问题的共面积修正

Jian Xu, Yanning Wu, Delu Zeng, John Paisley, Qibin Zhao

发表机构 * University of Cambridge（剑桥大学）； University of Toronto（多伦多大学）

AI总结针对扩散模型和流匹配在硬约束PDE逆问题中采样后验分布错误的问题，提出共面积修正因子和CoCoS采样器，实现正确的后验采样。


AI中文摘要

生成模型（扩散和流匹配）越来越多地用于求解偏微分方程（PDE）逆问题，将控制物理作为硬约束（通过投影或引导）强制执行，并将所得样本报告为具有校准不确定性的贝叶斯后验。我们表明，这种广泛采用的配方采样了错误的分布。在硬PDE约束上条件化生成先验是在测度零流形上的条件化，这一操作本质上是模糊的（Borel-Kolmogorov悖论），而其物理上正确的解，即小残差噪声极限，携带一个共面积（Fixman）雅可比因子$[\det(JJ^{\top})]^{-1/2}$，而基于投影和引导的方法默默地忽略了它。我们精确地指出了偏差，表明它随约束敏感性的异质性增长，并在受控问题上通过与独立同分布的真实仲裁者对比验证了这一点。被忽略的因子并非二阶细节：移除它会使后验误差膨胀到采样噪声底限的20倍；最小位移投影（如PCFM）的偏差为底限的9倍；而简单的标量重加权无法修复。我们引入了CoCoS，一种度量感知的约束采样器，针对正确的共面积后验，并表明它在采样噪声内与黄金标准后验匹配。我们的结果意味着“满足物理”并不等同于“采样后验”，并为不确定性感知的科学推理提供了原则性的修正。

英文摘要

Generative models -- diffusion and flow matching -- are increasingly used to solve partial differential equation (PDE) inverse problems, enforcing the governing physics as a hard constraint (via projection or guidance) and reporting the resulting samples as a Bayesian posterior with calibrated uncertainty. We show that this widely adopted recipe samples the wrong distribution. Conditioning a generative prior on a hard PDE constraint is conditioning on a measure-zero manifold -- an operation that is intrinsically ambiguous (the Borel--Kolmogorov paradox) and whose physically correct resolution, the small-residual-noise limit, carries a co-area (Fixman) Jacobian factor $[\det(JJ^{\top})]^{-1/2}$ that projection- and guidance-based methods silently omit. We make the bias precise, show that it grows with the heterogeneity of the constraint sensitivity, and validate it on controlled problems against an i.i.d. ground-truth arbiter. The omitted factor is not a second-order detail: removing it inflates the posterior error to $20\times$ the sampling-noise floor; minimal-displacement projection (as in PCFM) is biased at $9\times$ the floor; and a naive scalar reweighting does not fix it. We introduce CoCoS, a measure-aware constrained sampler that targets the correct co-area posterior, and show that it matches the gold-standard posterior to within sampling noise. Our results imply that "satisfying the physics" is not the same as "sampling the posterior," and give a principled correction for uncertainty-aware scientific inference.
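
To make the co-area factor concrete, the sketch below evaluates $[\det(JJ^{\top})]^{-1/2}$ for a toy scalar constraint. The ellipse constraint, the function names, and the test points are illustrative assumptions, not taken from the paper, and this is not the CoCoS sampler.

```python
import numpy as np

def coarea_factor(J):
    """Return [det(J J^T)]^(-1/2) for a constraint Jacobian J of shape (m, n)."""
    J = np.atleast_2d(np.asarray(J, dtype=float))
    gram = J @ J.T
    return float(np.linalg.det(gram) ** -0.5)

def ellipse_jacobian(x, a=2.0, b=1.0):
    """Jacobian of the assumed toy constraint c(x) = x0^2/a^2 + x1^2/b^2 - 1."""
    return np.array([[2.0 * x[0] / a**2, 2.0 * x[1] / b**2]])

# Two points on the same constraint set receive different factors,
# because the constraint is more sensitive at one than at the other.
w_major = coarea_factor(ellipse_jacobian(np.array([2.0, 0.0])))  # J = [1, 0]
w_minor = coarea_factor(ellipse_jacobian(np.array([0.0, 1.0])))  # J = [0, 2]
```

For a constraint with constant sensitivity (a circle, for instance) the factor is the same everywhere, which fits the abstract's statement that the bias grows with the heterogeneity of the constraint sensitivity.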

2412.13858 2026-06-09 cs.AI cs.LG 版本更新

IDEQ -- Improving Diffusion Models for the Traveling Salesman Problem (TSP) by Leveraging the Structure of the Solution Space

IDEQ -- 利用解空间结构改进旅行商问题的扩散模型

Mickael Basson, Philippe Preux

发表机构 * Université de Lille（里尔大学）； CNRS（国家科学研究中心）； Inria（法国国家信息与自动化技术研究院）； UMR 9198-CRIStAL（UMR 9198-CRIStAL研究中心）

AI总结提出IDEQ方法，通过利用TSP解空间的约束结构和基于2-opt轨道的均匀分布训练目标，改进扩散模型求解TSP，在合成实例和TSPlib上达到新SOTA，接近LKH3性能。

条件归一化流用于前向和后向联合状态与参数估计

Luke S. Lagunowich, Guoxiang Grayson Tong, Daniele E. Schiavazzi

发表机构 * Department of Computer Science and Engineering University of Notre Dame（计算机科学与工程系诺特达姆大学）； Department of Pediatrics Stanford University（儿科系斯坦福大学）； Department of Applied and Computational Mathematics and Statistics University of Notre Dame（应用与计算数学与统计系诺特达姆大学）

AI总结针对非线性非高斯系统，提出基于条件归一化流的状态滤波方法，结合MLP、Transformer或Mamba-SSM生成条件嵌入，并引入最优传输动力学损失缓解过参数化，在自动驾驶和COVID-19联合估计中验证有效性。

AI中文摘要

传统的状态估计滤波算法——如经典卡尔曼滤波、无迹卡尔曼滤波和粒子滤波——在应用于不确定性遵循任意非高斯且可能多峰分布的非线性系统时，性能会下降。本研究回顾了基于条件归一化流进行非线性滤波的状态估计最新方法，其中条件嵌入由标准MLP架构、Transformer或选择性状态空间模型（如Mamba-SSM）生成。此外，我们测试了最优传输启发的动力学损失项在缓解由大量变换组成的流中过参数化问题的有效性。我们研究了这些方法在自动驾驶和患者群体动力学相关应用中的性能，特别关注它们如何处理时间反转和链式预测。最后，我们评估了各种条件策略在真实世界COVID-19联合SIR系统预测和参数估计应用中的性能。

英文摘要

Traditional filtering algorithms for state estimation -- such as classical Kalman filtering, unscented Kalman filtering, and particle filters -- show performance degradation when applied to nonlinear systems whose uncertainty follows arbitrary non-Gaussian, and potentially multi-modal distributions. This study reviews recent approaches to state estimation via nonlinear filtering based on conditional normalizing flows, where the conditional embedding is generated by standard MLP architectures, transformers or selective state-space models (like Mamba-SSM). In addition, we test the effectiveness of an optimal-transport-inspired kinetic loss term in mitigating overparameterization in flows consisting of a large collection of transformations. We investigate the performance of these approaches on applications relevant to autonomous driving and patient population dynamics, paying special attention to how they handle time inversion and chained predictions. Finally, we assess the performance of various conditioning strategies for an application to real-world COVID-19 joint SIR system forecasting and parameter estimation.

2601.23231 2026-06-09 eess.IV cs.LG 版本更新

Solving Inverse Problems with Flow-based Models via Model Predictive Control

基于模型预测控制的流模型逆问题求解

George Webber, Alexander Denker, Riccardo Barbano, Andrew J Reader

发表机构 * University of Cambridge（剑桥大学）

AI总结提出MPC-Flow框架，将流模型逆问题求解转化为序列控制子问题，实现无需训练的推理时引导，理论联系最优控制，在图像修复任务中表现优异。

Comments Accepted for publication at ICML 2026

AI中文摘要

基于流的生成模型为逆问题提供了强大的无条件先验，但引导其动态进行条件生成仍然具有挑战性。最近的工作将流模型中的无训练条件生成视为最优控制问题；然而，求解由此产生的轨迹优化在计算和内存上都很密集，需要对流动力学进行微分或伴随求解。我们提出了MPC-Flow，一个模型预测控制框架，将基于流的生成模型的逆问题求解公式化为一系列控制子问题，从而在推理时实现实用的基于最优控制的引导。我们提供了将MPC-Flow与底层最优控制目标联系起来的理论分析，并展示了不同的算法选择如何产生一系列引导算法，包括避免通过生成模型轨迹进行反向传播的机制。我们在基准图像恢复任务上评估了MPC-Flow，涵盖线性和非线性设置，如修复、去模糊和超分辨率，并通过在消费级硬件上对FLUX.2（32B）进行量化设置下的无训练引导，展示了强大的性能和可扩展性到大规模最先进架构。

英文摘要

Flow-based generative models provide strong unconditional priors for inverse problems, but guiding their dynamics for conditional generation remains challenging. Recent work casts training-free conditional generation in flow models as an optimal control problem; however, solving the resulting trajectory optimisation is computationally and memory intensive, requiring differentiation through the flow dynamics or adjoint solves. We propose MPC-Flow, a model predictive control framework that formulates inverse problem solving with flow-based generative models as a sequence of control sub-problems, enabling practical optimal control-based guidance at inference time. We provide theoretical analysis linking MPC-Flow to the underlying optimal control objective and show how different algorithmic choices yield a spectrum of guidance algorithms, including regimes that avoid backpropagation through the generative model trajectory. We evaluate MPC-Flow on benchmark image restoration tasks, spanning linear and non-linear settings such as in-painting, deblurring, and super-resolution, and demonstrate strong performance and scalability to massive state-of-the-art architectures via training-free guidance of FLUX.2 (32B) in a quantised setting on consumer hardware.

2601.23286 2026-06-09 cs.CV cs.AI cs.LG 版本更新

VideoGPA: Distilling Geometry Priors for 3D-Consistent Video Generation

VideoGPA: 通过几何先验知识蒸馏实现3D一致的视频生成

Hongyang Du, Junjie Ye, Xiaoyan Cong, Runhao Li, Jingcheng Ni, Aman Agarwal, Zeqi Zhou, Zekun Li, Randall Balestriero, Yue Wang

发表机构 * University of California, Berkeley（加州大学伯克利分校）

AI总结 VideoGPA通过几何先验知识蒸馏提升视频生成的3D一致性，利用数据高效的自监督框架引导视频扩散模型，显著增强时间稳定性、几何合理性与运动一致性。

Comments 8 pages, 5 figures, ICML 2026

2602.07345 2026-06-09 cs.CV cs.LG 版本更新

Optimizing Few-Step Generation with Adaptive Matching Distillation

自适应匹配蒸馏优化少步生成

Lichen Bai, Zikai Zhou, Shitong Shao, Wenliang Zhong, Shuo Yang, Shuo Chen, Bojun Chen, Zeke Xie

发表机构 * xLeaF Lab, The Hong Kong University of Science（xLeaF实验室，香港科学与技术大学）； Harbin Institute of Technology, Shenzhen（哈尔滨工业大学，深圳）； School of Intelligence Science（智能科学学院）

AI总结提出自适应匹配蒸馏（AMD），通过奖励代理检测并逃离禁止区域，结合结构信号分解和排斥景观锐化，提升少步生成模型的样本保真度和训练鲁棒性。

Comments 25 pages, 15 figures, 11 tables

AI中文摘要

分布匹配蒸馏（DMD）是一种强大的加速范式，但其稳定性常在禁止区域（真实教师提供不可靠指导而虚假教师施加不足排斥力的区域）中受到损害。在这项工作中，我们提出了一个统一的优化框架，将先前的方法重新解释为避免这些受损区域的隐式策略。基于这一见解，我们引入了自适应匹配蒸馏（AMD），一种利用奖励代理显式检测和逃离禁止区域的自我纠正机制。AMD通过结构信号分解动态优先考虑纠正梯度，并引入排斥景观锐化以强制执行陡峭的能量屏障，防止失败模式崩溃。在图像和视频生成任务（如SDXL、Wan2.1）以及严格基准测试（如VBench、GenEval）上的大量实验表明，AMD显著提高了样本保真度和训练鲁棒性。例如，AMD将SDXL上的HPSv2分数从30.64提升至31.25，优于最先进的基线。这些发现验证了在禁止区域内显式纠正优化轨迹对于推动少步生成模型性能上限至关重要。

英文摘要

Distribution Matching Distillation (DMD) is a powerful acceleration paradigm, yet its stability is often compromised in Forbidden Zones, regions where the real teacher provides unreliable guidance while the fake teacher exerts insufficient repulsive force. In this work, we propose a unified optimization framework that reinterprets prior art as implicit strategies to avoid these corrupted regions. Based on this insight, we introduce Adaptive Matching Distillation (AMD), a self-correcting mechanism that utilizes reward proxies to explicitly detect and escape Forbidden Zones. AMD dynamically prioritizes corrective gradients via structural signal decomposition and introduces Repulsive Landscape Sharpening to enforce steep energy barriers against failure mode collapse. Extensive experiments across image and video generation tasks (e.g., SDXL, Wan2.1) and rigorous benchmarks (e.g., VBench, GenEval) demonstrate that AMD significantly enhances sample fidelity and training robustness. For instance, AMD improves the HPSv2 score on SDXL from 30.64 to 31.25, outperforming state-of-the-art baselines. These findings validate that explicitly rectifying optimization trajectories within Forbidden Zones is essential for pushing the performance ceiling of few-step generative models.

2602.18364 2026-06-09 cs.IT cs.LG math.IT quant-ph stat.ML 版本更新

Quantum Maximum Likelihood Prediction via Hilbert Space Embeddings

通过希尔伯特空间嵌入的量子最大似然预测

Sreejith Sreekumar, Nir Weinberger

发表机构 * L2S, CNRS, CentraleSupélec, University of Paris-Saclay, France（L2S、CNRS、CentraleSupélec、巴黎-萨克雷大学、法国）

AI总结研究量子最大似然预测任务，通过将经验概率分布嵌入量子态并最小化量子相对熵，提出统一框架，给出非渐近性能保证。

Comments 31+3 pages, 1 figure

2603.10823 2026-06-09 stat.ML cs.LG 版本更新

ReTabSyn: Realistic Tabular Data Synthesis via Reinforcement Learning

ReTabSyn：通过强化学习实现真实表格数据合成

Xiaofeng Lin, Seungbae Kim, Zhuoya Li, Zachary DeSoto, Charles Fleming, Guang Cheng

发表机构 * University of California, Berkeley（加州大学伯克利分校）

AI总结 ReTabSyn通过强化学习优先学习条件分布，提升小数据下表格数据合成效率，优于现有基线方法。

AI中文摘要

深度生成模型可通过生成合成训练数据缓解数据稀缺和隐私问题，但在低数据、不平衡的表格设置中难以完全学习复杂的数据分布。我们认为追求完整的联合分布可能过于苛刻；为了提高数据效率，模型应优先学习条件分布$P(y\mid \bm{X})$，这由最近的理论分析所支持。因此，我们通过ReTabSyn，一个提供合成器训练过程中特征相关性保留直接反馈的Reinforced Tabular Synthesis流程，克服了这一限制。这一目标鼓励生成器在数据有限时优先考虑最有用的预测信号，从而增强下游模型的实用性。我们通过这种做法对基于语言模型的生成器进行经验微调，并在具有小样本量、类别不平衡和分布偏移的基准测试中，ReTabSyn始终优于最先进的基线方法。此外，我们的方法可以轻松扩展到控制合成表格数据的各种方面，例如应用专家指定的生成观测约束。

英文摘要

Deep generative models can help with data scarcity and privacy by producing synthetic training data, but they struggle in low-data, imbalanced tabular settings to fully learn the complex data distribution. We argue that striving for the full joint distribution could be overkill; for greater data efficiency, models should prioritize learning the conditional distribution $P(y\mid \bm{X})$, as suggested by recent theoretical analysis. Therefore, we overcome this limitation with ReTabSyn, a Reinforced Tabular Synthesis pipeline that provides direct feedback on feature correlation preservation during synthesizer training. This objective encourages the generator to prioritize the most useful predictive signals when training data is limited, thereby strengthening downstream model utility. We empirically fine-tune a language model-based generator using this approach, and across benchmarks with small sample sizes, class imbalance, and distribution shift, ReTabSyn consistently outperforms state-of-the-art baselines. Moreover, our approach can be readily extended to control various aspects of synthetic tabular data, such as applying expert-specified constraints on generated observations.

2605.02439 2026-06-09 cs.CV cs.LG 版本更新

Anomaly-Preference Image Generation

异常偏好图像生成

Fuyun Wang, Yuanzhi Wang, Xu Guo, Sujia Huang, Tong Zhang, Dan Wang, Hui Yan, Xin Liu, Zhen Cui

发表机构 * Nanjing University of Science（南京理工大学）； Beijing Normal University, Beijing, China（北京师范大学）； China Academy of Space Technology, Beijing, China（中国航天科技集团）

AI总结本文提出了一种新的异常生成方法，通过隐式偏好对齐机制和时间感知能力分配模块，提升生成图像的真实性和多样性，实验表明其在真实性和多样性上均优于现有方法。

Comments Accepted by ICML 2026

AI中文摘要

从有限数据中合成逼真且多样的异常样本对于鲁棒模型泛化至关重要。然而，现有方法难以平衡保真度和多样性，通常受分布不匹配和过拟合的阻碍。为缓解这一问题，我们引入了异常偏好优化，一种将异常生成重新表述为偏好学习问题的新范式。我们的方法核心是隐式偏好对齐机制，利用真实异常作为正例参考，直接从去噪轨迹偏差中推导优化信号，而无需昂贵的人工标注。此外，我们提出了一个时间感知能力分配模块，动态地沿扩散时间线分配模型能力，在高噪声阶段优先考虑结构多样性，在低噪声阶段增强细粒度保真度。在推理过程中，分层采样策略调节保真度与对齐的权衡，实现对生成过程的精确控制。大量实验表明，该方法显著优于现有基线，实现了真实性和多样性方面的最先进性能。

英文摘要

Synthesizing realistic and diverse anomalous samples from limited data is vital for robust model generalization. However, existing methods struggle to reconcile fidelity and diversity, often hampered by distribution misalignment and overfitting, respectively. To mitigate this, we introduce Anomaly Preference Optimization, a novel paradigm that reformulates anomaly generation as a preference learning problem. Central to our approach is an implicit preference alignment mechanism that leverages real anomalies as positive references, deriving optimization signals directly from denoising trajectory deviations without requiring costly human annotation. Furthermore, we propose a Time-Aware Capacity Allocation module that dynamically distributes model capacity along the diffusion timeline, prioritizing structural diversity during high-noise phases while enhancing fine-grained fidelity in low-noise stages. During inference, a hierarchical sampling strategy modulates the coherence-alignment trade-off, enabling precise control over generation. Extensive experiments demonstrate that the proposed method significantly outperforms existing baselines, achieving state-of-the-art performance in both realism and diversity.

2605.14285 2026-06-09 eess.IV cs.LG 版本更新

ForcingDAS: Unified and Robust Data Assimilation via Diffusion Forcing

通过扩散强迫实现统一且稳健的数据同化：ForcingDAS

Yixuan Jia, Siyi Chen, Yida Pan, Xiao Li, Lianghe Shi, Chanyong Jung, Haijie Yuan, Ismail Alkhouri, Yue Cynthia Wu, Saiprasad Ravishankar, Jeffrey A Fessler, Qing Qu

发表机构 * University of Michigan（密歇根大学）； University of California, Berkeley（加州大学伯克利分校）； Stanford University（斯坦福大学）； Massachusetts Institute of Technology（麻省理工学院）； University of Texas at Austin（德克萨斯大学奥斯汀分校）

AI总结本文提出ForcingDAS，一种基于扩散强迫的统一数据同化框架，能够捕捉长时序依赖并减少误差积累，同时在推理时无需重新训练即可实现滤波到平滑的全谱应用。

AI中文摘要

众所周知，ReLU网络定义连续分段线性函数，其线性区域是输入空间中的多面体。这些区域构成一个完全划分输入空间的复形。这些区域组合的方式对网络行为至关重要，因为非线性仅发生在这些区域连接的边界处。然而，除了区域总数的界限外，关于这些复形的几何性质所知甚少，且精确计算复形对大多数网络而言是棘手的。在这项工作中，我们证明了关于这些复形的新的理论结果，这些结果对所有全连接ReLU网络都成立，特别是关于它们的连通图，其中节点对应区域，边存在于由面连接的每对区域之间。我们发现，无论网络的宽度和深度如何，该图的平均度上界是输入维度的两倍，并且该图的直径有一个不依赖于输入维度的上界，尽管区域数量随输入维度指数增长。我们通过在合成和真实数据上训练的网络进行的实验证实了我们的发现，这些实验为ReLU网络的几何提供了额外的见解。重现我们结果的代码可在https://github.com/bl-ake/ICLR-2026找到。

英文摘要

It is well established that ReLU networks define continuous piecewise-linear functions, and that their linear regions are polyhedra in the input space. These regions form a complex that fully partitions the input space. The way these regions fit together is fundamental to the behavior of the network, as nonlinearities occur only at the boundaries where these regions connect. However, relatively little is known about the geometry of these complexes beyond bounds on the total number of regions, and calculating the complex exactly is intractable for most networks. In this work, we prove new theoretical results about these complexes that hold for all fully-connected ReLU networks, specifically about their connectivity graphs in which nodes correspond to regions and edges exist between each pair of regions connected by a face. We find that the average degree of this graph is upper bounded by twice the input dimension regardless of the width and depth of the network, and that the diameter of this graph has an upper bound that does not depend on input dimension, despite the number of regions increasing exponentially with input dimension. We corroborate our findings through experiments with networks trained on both synthetic and real-world data, which provide additional insight into the geometry of ReLU networks. Code to reproduce our results can be found at https://github.com/bl-ake/ICLR-2026.

2606.07890 2026-06-09 cs.LG stat.ML 新提交

深度高斯过程到底有多深？组合高斯过程的尖锐阈值与非高斯极限

Mark Kozdoba, Shie Mannor

发表机构 * Technion, IIT（以色列理工学院）； NVIDIA（英伟达）

AI总结本文研究了深度高斯过程先验在深度增长时的极限行为，识别出RBF核带宽的尖锐阈值，低于该阈值时先验收敛到非退化非高斯分布，具有非零坐标依赖。

AI中文摘要

组合先验描述了深度贝叶斯模型中分层函数的通用属性，其中随机权重的深度神经网络是一个典型例子。在宽网络极限下，先验是一个具有深度相关核的高斯过程，其随深度增长的行为已通过该核得到广泛研究。这里，我们研究另一种情况，其中每一层本身是一个向量值高斯过程，我们的目标类似地理解先验随深度增长的极限行为。先前的高斯过程工作已确定，对于RBF核和一定范围的带宽$r$，先验在极限下退化，收敛到常数函数集——这作为概率模型是无用的。在本文中，我们建立了几个新结果。首先，我们识别出一个尖锐的带宽阈值$r_c(d) = Θ(\sqrt{d})$，高于该阈值极限是退化的，加强了先前的界限。其次，更重要的是，我们证明对于低于阈值$r_c(d)$的$r$，先验收敛到极限分布$π_{\bar{Z}}$。我们还证明这些分布是非退化且非高斯的，坐标之间具有非消失的依赖性。与先前已知的退化机制相反，深度高斯过程先验因此可以允许非平凡极限。实验上，我们在维度$d$的范围内验证了该阈值，并展示了极限分布$π_{\bar{Z}}$的复杂多模态行为——该机制随$d$增长而变得狭窄，且在不了解阈值的情况下难以识别。

英文摘要

Compositional priors describe the generic properties of layered functions in deep Bayesian models, where deep neural networks with random weights are a canonical example. In the wide-network limit, the prior is a Gaussian process with a depth-dependent kernel, and its behaviour as depth grows has been extensively studied through this kernel. Here, we study another case, where each layer itself is a vector-valued Gaussian process, and our aim is similarly to understand the limiting behaviour of the prior as depth grows. Previous GP work has established that for the RBF kernel and a certain range of bandwidths $r$, the prior degenerates in the limit, converging to the set of constant functions -- which is not useful as a probabilistic model. In this paper we establish several new results. First, we identify a sharp bandwidth threshold $r_c(d) = \Theta(\sqrt{d})$ above which the limit is degenerate, strengthening the earlier bounds. Second, and more importantly, we show that for $r$ below the threshold $r_c(d)$ the prior converges to a limit distribution $\pi_{\bar{Z}}$. We also prove that these distributions are non-degenerate and non-Gaussian, with non-vanishing dependence between coordinates. In contrast to the previously known degenerate regime, deep Gaussian process priors can therefore admit non-trivial limits. Empirically, we verify the threshold across a range of dimensions $d$, and demonstrate a complex multimodal behaviour of the limit distributions $\pi_{\bar{Z}}$ -- a regime that becomes increasingly narrow with $d$ and would be hard to identify without knowing the threshold.

2606.08291 2026-06-09 cs.LG 新提交

On solving symmetric multi-type orthogonal non-negative matrix tri-factorization problem

求解对称多类型正交非负矩阵三因子分解问题

Rok Hribar, Gregor Papa, Janez Povh, Andrej Kastrin

发表机构 * Laboratory for Engineering Design, Faculty of Mechanical Engineering, University of Ljubljana（卢布尔雅纳大学机械工程学院工程设计实验室）； Rudolfovo – Science and Technology Centre Novo mesto（诺沃莫斯特鲁德沃尔福科学与技术中心）； Institute of Biostatistics and Medical Informatics, Faculty of Medicine, University of Ljubljana（卢布尔雅纳大学生物统计与医学信息学研究所）

AI总结研究对称多类型正交非负矩阵三因子分解问题，提出基于KKT条件的定点法和基于ADAM的三阶段算法，在合成数据和引文网络上验证了分解质量与聚类、链接预测等任务中的竞争力。

Comments 27 pages, 9 tables, 3 figures

AI中文摘要

我们研究了对称多类型正交非负矩阵三因子分解问题，其中多个对称非负矩阵被同时近似为形式为$GS_{i}G^{\top}$的因子，共享一个非负且正交的因子$G$。该模型由聚类和网络分析驱动，其中非负性提高了可解释性，正交性为潜在因子提供了自然的分配型结构。由于所得优化问题高度非凸，我们开发了两种启发式算法来计算高质量的局部解。第一种是基于Karush-Kuhn-Tucker条件在添加正交约束惩罚项后导出的不动点方法。第二种是三阶段基于ADAM的方法，结合了保持非负性的优化、正交化以及可行集上的受限ADAM精化。我们在合成数据（包括含噪声实例）和引文网络基准上评估了这两种方法。合成实验表明，两种算法都能恢复接近最优的分解，并在噪声下保持稳定。在真实网络上，学习到的嵌入在链接预测、节点聚类和节点分类任务中与标准基线（如SVD、node2vec和经典链接预测启发式方法）相比具有竞争力或更优。

英文摘要

We study the symmetric multi-type orthogonal non-negative matrix tri-factorization problem, where several symmetric non-negative matrices are simultaneously approximated by factors of the form $GS_{i}G^{\top}$, with a shared non-negative and orthogonal factor $G$. This model is motivated by clustering and network analysis, where non-negativity improves interpretability and orthogonality gives a natural assignment-type structure to the latent factor. Since the resulting optimization problem is highly non-convex, we develop two heuristic algorithms for computing high-quality local solutions. The first one is a fixed point method derived from the Karush-Kuhn-Tucker conditions after adding a penalty term for the orthogonality constraint. The second one is a three-stage ADAM-based method that combines non-negativity-preserving optimization, orthogonalization, and restricted ADAM refinement on the feasible set. We evaluate both methods on synthetic data, including noisy instances, and on citation network benchmarks. The synthetic experiments show that both algorithms recover factorizations close to the optimum and remain stable under noise. On real networks, the learned embeddings are competitive with or better than standard baselines such as SVD, node2vec, and classical link prediction heuristics in link prediction, node clustering, and node classification tasks.
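
The factorization model in the abstract can be written down in a few lines. The sketch below evaluates a penalized objective for matrices approximated as $GS_{i}G^{\top}$ with a shared non-negative factor $G$. The squared Frobenius norm, the specific penalty form, the `penalty` weight, and all names are assumptions for illustration; the abstract only states that a penalty term for the orthogonality constraint is added, and neither of the paper's two algorithms is reproduced here.

```python
import numpy as np

def trifactor_objective(mats, G, S_list, penalty=1.0):
    """Sum_i ||R_i - G S_i G^T||_F^2 plus penalty * ||G^T G - I||_F^2."""
    k = G.shape[1]
    fit = sum(np.linalg.norm(R - G @ S @ G.T, "fro") ** 2
              for R, S in zip(mats, S_list))
    ortho = np.linalg.norm(G.T @ G - np.eye(k), "fro") ** 2
    return fit + penalty * ortho

# A non-negative G with orthonormal columns: a scaled cluster-indicator matrix.
G = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]) / np.sqrt(2.0)
S_list = [np.array([[2.0, 0.5], [0.5, 1.0]]), np.array([[1.0, 0.0], [0.0, 3.0]])]
mats = [G @ S @ G.T for S in S_list]   # exact, noise-free instances
value = trifactor_objective(mats, G, S_list)
```

The example `G` shows why non-negativity plus orthogonality yields an assignment-type structure: each row has a single non-zero entry, so each object belongs to exactly one latent cluster.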

2606.08308 2026-06-09 cs.LG 新提交

神经表征的线性可分性几何度量

Yi Wei, Xuan Qi, Furao Shen

发表机构 * State Key Laboratory of Novel Software Technology, School of Intelligence Science and Technology, Nanjing University（南京大学智能科学与技术学院软件新技术国家重点实验室）； AI for Good (AIGO), Istituto Italiano di Tecnologia（意大利技术研究院AI for Good (AIGO)）； DITEN, University of Genoa（热那亚大学DITEN）； State Key Laboratory of Novel Software Technology, School of Artificial Intelligence, Nanjing University（南京大学人工智能学院软件新技术国家重点实验室）

AI总结提出方向线性可分性度量（LSM），通过搜索包含目标类所有样本的仿射半空间并测量最小竞争样本入侵量，为神经表征的类间几何提供不对称、类级、目标归一化的诊断工具。

AI中文摘要

现代神经分类器通常依赖线性读出，但仅预测指标无法刻画此类读出所操作的表征的类间几何。我们引入方向线性可分性度量（LSM），一种用于单侧仿射可分性的有限样本诊断工具。对于目标类A和竞争集B，LSM搜索包含A中所有样本的仿射半空间，并测量必须留在目标侧的最小竞争样本入侵量，按|A|归一化。所得量是不对称的、类级的、目标归一化的，适用于从神经网络提取的有限表征。我们建立了其支撑超平面刻画，将其与最优仿射分类精度关联，并证明了在全秩线性嵌入下的不变性。这些结果将线性重参数化引起的变化与信息丢失或非线性几何变换引起的变化区分开来。我们还给出了一种基于惩罚的仿射搜索，用于在高维特征中估计类级LSM，报告的值根据原始离散保持和违反准则计算。最后，我们将坐标门控非线性作为有限样本几何算子进行分析，并经验性地使用LSM诊断常见深度学习组件和架构中的类级入侵。

英文摘要

Modern neural classifiers commonly rely on linear readouts, yet predictive metrics alone do not characterize the class-wise geometry of the representations on which such readouts operate. We introduce the directional linear separability measure (LSM), a finite-sample diagnostic for one-sided affine separability. For a target class A and a competing set B, LSM searches over affine halfspaces that contain all samples in A and measures the smallest competing-sample intrusion that must remain on the target side, normalized by |A|. The resulting quantity is asymmetric, class-wise, target-normalized, and applicable to finite representations extracted from neural networks. We establish its supporting-hyperplane characterization, relate it to optimal affine classification accuracy, and prove invariance under full-rank linear embeddings. These results separate changes caused by linear reparameterization from those caused by information loss or nonlinear geometric transformations. We also give a penalty-based affine search for estimating class-wise LSM in high-dimensional features, with reported values computed from the original discrete preservation and violation criterion. Finally, we analyze coordinatewise gated nonlinearities as finite-sample geometric operators and empirically use LSM to diagnose class-wise intrusion across common deep-learning components and architectures.

2606.08768 2026-06-09 cs.LG 新提交

Understanding the Parameter Space Geometry of Transformers Encoding Boolean Functions

理解编码布尔函数的Transformer参数空间几何

Blanka Köver, Alexandra Butoi, Anej Svete, Michael Hahn, Ryan Cotterell

发表机构 * Machine Learning, ICML（机器学习，ICML）

AI总结针对Transformer无法学习某些简单布尔函数（如奇偶函数）的问题，通过分析参数空间几何，证明敏感函数在参数空间中占据极小区域，随机初始化几乎必然错过，从而解释了可表达但不可学习的现象。

Comments ICML 2026

AI中文摘要

Transformer始终无法学习某些简单的函数，而这些函数在特定参数设置下是可证明表达的。这种可学习性与可表达性之间的差距对于敏感函数尤为突出——例如奇偶函数，其输出在输入单个比特翻转时很可能改变。虽然先前的研究已经确定Transformer偏向于平均敏感度低的函数，但这种偏向背后的精确机制仍不清楚。为了阐明这一现象，我们研究了Transformer参数空间的几何结构。我们证明，敏感函数——即使可表示——占据了一个极小区域，随机初始化极有可能错过。具体而言，我们将关注点从平均敏感度转移到完整的敏感度分布——所有输入上敏感度值的分布——并证明随机初始化的Transformer几乎必然计算具有低敏感度字符串的函数。因此，任何缺乏此类字符串的函数都是可证明不可学习的。

英文摘要

Transformers consistently fail to learn certain simple functions that are provably expressible with specific parameter settings. This gap between learnability and expressivity is particularly prominent for sensitive functions -- functions whose output is likely to change if a single bit of the input is flipped -- for example, PARITY. While prior work has established that transformers exhibit a bias toward functions with low average sensitivity, the precise mechanism underlying this bias remains poorly understood. To shed light on this phenomenon, we study the geometry of transformers' parameter space. We show that sensitive functions -- even when representable -- occupy a vanishingly small region that random initialization is very likely to miss. Specifically, we shift the focus from average sensitivity to the full sensitivity profile -- the distribution of sensitivity values across all inputs -- and prove that randomly initialized transformers almost surely compute functions which have low-sensitivity strings. Consequently, any function that lacks such strings is provably unlearnable.
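
The notion of a sensitivity profile can be checked by brute force on small inputs. The sketch below uses the standard definition of sensitivity at an input (the number of single-bit flips that change the output); the function names and the choice of AND as a contrasting example are assumptions for illustration, not the paper's code.

```python
from itertools import product

def sensitivity(f, x):
    """Number of single-bit flips of x that change f(x)."""
    count = 0
    for i in range(len(x)):
        flipped = x[:i] + (1 - x[i],) + x[i + 1:]
        if f(flipped) != f(x):
            count += 1
    return count

def sensitivity_profile(f, n):
    """Sorted list of sensitivities over all 2^n inputs."""
    return sorted(sensitivity(f, x) for x in product((0, 1), repeat=n))

def parity(x):
    return sum(x) % 2

def logical_and(x):
    return int(all(x))

n = 4
parity_profile = sensitivity_profile(parity, n)   # every input has sensitivity n
and_profile = sensitivity_profile(logical_and, n)
```

PARITY has sensitivity `n` at every input, so it has no low-sensitivity strings at all, whereas AND has sensitivity 0 on most inputs. This is the distinction the abstract draws when it moves from average sensitivity to the full profile.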

2606.08797 2026-06-09 cs.LG cs.AI 新提交

Scaling Decision-Focused Learning to Large Problems with Lagrangian Decomposition

通过拉格朗日分解将决策聚焦学习扩展到大规模问题

Stéphane Eilles-Chan Way, Hugo Percot, Quentin Cappart, Tias Guns, Louis-Martin Rousseau

发表机构 * Polytechnique Montréal（蒙特利尔综合理工学院）； Ecole Polytechnique（巴黎综合理工学院）； UCLouvain（鲁汶大学）； Mila - Québec AI Institute（魁北克人工智能研究所）； KU Leuven（荷语鲁汶大学）

AI总结提出结合拉格朗日分解的决策聚焦学习框架，通过新代理目标和两种损失函数，在保持可并行化的同时，有效处理大规模约束优化问题，实验表明在变量数多八倍的实例上优于传统方法。

AI中文摘要

决策聚焦学习在解决预测-优化问题中显示出巨大潜力，尤其是在模型欠规范的情况下。然而，其实际部署常因高计算成本和有限的可扩展性而受阻，因为需要在每次迭代中对每个训练实例求解一个约束优化问题。为解决这些挑战，我们提出了一种新颖的框架，将拉格朗日分解融入决策聚焦学习范式。具体而言，我们引入了一个新的代理目标以及两个用于评估和训练底层预测模型的损失函数。我们进一步提出了两种变体，它们在计算效率和解决方案质量之间提供了不同的权衡。我们的框架可以无缝集成到标准的决策聚焦学习方法中，包括Smart Predict-then-Optimize (SPO+)和隐式最大似然估计 (IMLE)。通过在两个标准基准测试（多维背包问题和二次投资组合优化）上的实验，我们证明了我们的方法在保持可并行化的同时实现了有竞争力的性能。特别是，在大规模实例上，它始终优于传统的决策聚焦学习方法，这些实例的变量数比相关工作通常考虑的要多出八倍。实现代码可在 https://github.com/corail-research/DFL-LD 获取。

英文摘要

Decision-focused learning has shown great promise for addressing predict-then-optimize problems, particularly in the presence of under-specified models. However, its practical deployment is often hindered by high computational costs and limited scalability, as it requires solving a constrained optimization problem for each training instance at every iteration. To address these challenges, we propose a novel framework that incorporates Lagrangian decomposition into the decision-focused learning paradigm. Specifically, we introduce a new surrogate objective along with two loss functions for evaluating and training the underlying prediction model. We further propose two variants of our approach, which offer different trade-offs between computational efficiency and solution quality. Our framework can be seamlessly integrated with standard decision-focused learning methods, including Smart Predict-then-Optimize (SPO+) and Implicit Maximum Likelihood Estimation (IMLE). Through experiments on two standard benchmarks, the multi-dimensional knapsack problem and quadratic portfolio optimization, we demonstrate that our approach achieves competitive performance while remaining amenable to parallelization. In particular, it consistently outperforms traditional decision-focused learning methods on large-scale instances, involving up to eight times more variables than those typically considered in related work. The implementation is available at https://github.com/corail-research/DFL-LD.

2606.08993 2026-06-09 cs.LG cs.SY eess.SY math.OC 新提交

LEAF: A Learning-Enabled ADMM Framework for Accelerated Convex Optimization

LEAF: 一种用于加速凸优化的学习增强ADMM框架

Binh Nguyen, Trinh Tran, Truong X. Nghiem

发表机构 * University of Central Florida（中佛罗里达大学）

AI总结提出LEAF框架，通过输入凸神经网络学习Moreau包络来加速凸优化，降低模型复杂度并保持收敛性，实验显示比最先进求解器快一个数量级。

AI中文摘要

我们提出LEAF，一种用于加速凸优化的学习增强ADMM框架。关键思想是使用输入凸神经网络（ICNN）逼近目标函数的Moreau包络，从而得到一个保持凸性和光滑性的学习模型。这导致了所提出的Moreau包络学习ADMM（MEL-ADMM）及其分裂变体sMEL-ADMM。与直接学习高维算子的现有方法不同，LEAF学习标量值的Moreau包络，显著降低了模型复杂度并提高了数据效率。该框架适用于包括光滑和非光滑目标在内的广泛凸问题。通过ICNN架构显式嵌入凸性，所提出的方法在保持优化问题关键结构性质的同时保持了高逼近精度。MEL-ADMM和sMEL-ADMM都在学习模型下具有收敛性和可行性的理论保证。严格分析表明，所提出的方法实现了与经典ADMM相当的收敛速度，同时降低了每次迭代的计算成本。数值实验表明，与最先进的求解器相比，速度提升可达一个数量级，同时保持较低的最优性差距。

英文摘要

We propose LEAF, a learning-enabled ADMM framework for accelerated convex optimization. The key idea is to approximate the Moreau envelope of the objective function using an Input Convex Neural Network (ICNN), resulting in a learned model that preserves convexity and smoothness. This leads to the proposed Moreau Envelope Learning ADMM (MEL-ADMM) and its splitting variant sMEL-ADMM. Unlike existing approaches that learn high-dimensional operators directly, LEAF learns a scalar-valued Moreau envelope, significantly reducing model complexity and improving data efficiency. The framework accommodates a broad class of convex problems with smooth and non-smooth objectives. By embedding convexity explicitly through the ICNN architecture, the proposed approach maintains high approximation accuracy while preserving key structural properties of the optimization problem. Both MEL-ADMM and sMEL-ADMM are developed with theoretical guarantees of convergence and feasibility under the learned model. Rigorous analysis shows that the proposed methods achieve convergence rates comparable to classical ADMM while reducing per-iteration computational cost. Numerical experiments demonstrate up to an order-of-magnitude speedup over state-of-the-art solvers while maintaining low optimality gaps.
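
For readers unfamiliar with the object being learned, the sketch below computes a Moreau envelope in closed form for the non-smooth function $f(y) = |y|$ and checks it against the Huber function. It does not involve an ICNN or ADMM and is not the LEAF implementation; the function names and the smoothing parameter `lam` are illustrative assumptions.

```python
import numpy as np

def prox_abs(x, lam):
    """Proximal operator of f(y) = |y|: soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def moreau_envelope_abs(x, lam):
    """Moreau envelope of |.|: min_y |y| + (x - y)^2 / (2 * lam)."""
    y = prox_abs(x, lam)
    return np.abs(y) + (x - y) ** 2 / (2.0 * lam)

def huber(x, lam):
    """Closed form that the envelope of |.| is known to equal."""
    return np.where(np.abs(x) <= lam, x**2 / (2.0 * lam), np.abs(x) - lam / 2.0)

xs = np.linspace(-3.0, 3.0, 13)
env = moreau_envelope_abs(xs, lam=1.0)
```

The envelope is scalar-valued, convex, and smooth even though $|y|$ has a kink at zero, which is the property the abstract relies on when it proposes learning the envelope instead of a high-dimensional operator.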

2606.09154 2026-06-09 cs.LG 新提交

Improved Convergence Analysis of Topology Dependence in Decentralized SGD

去中心化SGD中拓扑依赖性的改进收敛分析

Yuki Takezawa, Anastasia Koloskova, Sebastian U. Stich

发表机构 * University of Washington（华盛顿大学）

AI总结提出更紧的收敛分析，揭示混合矩阵所有特征值影响收敛速率，并通过实验验证比仅用谱间隙的分析更准确。

Comments ICML 2026

AI中文摘要

去中心化SGD是去中心化学习中的基本算法，尽管底层网络拓扑对其收敛行为的影响尚未完全理解。现有的收敛分析表明，在同质和异质情况下，具有小谱间隙的拓扑会显著恶化去中心化SGD的收敛速率。然而，许多先前的论文报告说，在异质情况下拓扑的选择确实对实验有显著影响，但在同质情况下对训练行为影响很小。在本文中，我们提出了去中心化SGD的更紧的收敛分析，比先前的分析更精确地理解拓扑如何影响收敛速率。具体来说，与仅使用谱间隙作为拓扑属性的现有收敛分析不同，我们的新分析表明混合矩阵的所有特征值都影响收敛速率。通过实验，我们仔细评估了去中心化SGD的收敛行为，并证明了我们的新收敛分析可以更准确地描述拓扑对收敛速率的影响。

英文摘要

Decentralized SGD is a fundamental algorithm in decentralized learning, although the influence of the underlying network topology on its convergence behavior is not yet fully understood. Existing convergence analyses have shown that topologies with a small spectral gap significantly deteriorate the convergence rate of Decentralized SGD in both homogeneous and heterogeneous cases. However, many prior papers have reported that the choice of topology has a significant impact in experiments in the heterogeneous case but little impact on training behavior in the homogeneous case. In this paper, we present a tighter convergence analysis of Decentralized SGD, offering a more precise understanding of how topologies affect the convergence rate than the prior analysis. Specifically, unlike existing convergence analyses that used only the spectral gap as a property of the topology, our novel analysis shows that all eigenvalues of the mixing matrix affect the convergence rate. Through experiments, we carefully evaluate the convergence behavior of Decentralized SGD and demonstrate that our novel convergence analysis can more accurately describe the effect of topology on the convergence rate.
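
The two topology summaries contrasted in the abstract, the spectral gap and the full eigenvalue spectrum of the mixing matrix, can be computed directly for a small example. The sketch below builds a ring topology with uniform weights; the weighting scheme, the function names, and the gap convention (one minus the second-largest eigenvalue magnitude, one of several conventions in the literature) are assumptions for illustration and not the paper's analysis.

```python
import numpy as np

def ring_mixing_matrix(n):
    """Symmetric doubly stochastic matrix for a ring of n >= 3 nodes."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 1.0 / 3.0
        W[i, (i - 1) % n] = 1.0 / 3.0
        W[i, (i + 1) % n] = 1.0 / 3.0
    return W

def spectrum_and_gap(W):
    """Eigenvalues sorted by decreasing magnitude, and 1 - |second largest|."""
    eigvals = np.linalg.eigvalsh(W)               # valid because W is symmetric
    eigvals = eigvals[np.argsort(-np.abs(eigvals))]
    return eigvals, 1.0 - abs(eigvals[1])

W = ring_mixing_matrix(8)
eigvals, gap = spectrum_and_gap(W)
```

The gap keeps only one number from `eigvals`; two topologies can share that number while differing in the rest of the spectrum, which is the information the abstract says the tighter analysis takes into account.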

2606.09731 2026-06-09 cs.LG 新提交

Tight Sample Complexity of Transformers

Transformer的紧样本复杂度

Chenxiao Yang, Nathan Srebro, Zhiyuan Li

发表机构 * Toyota Technological Institute at Chicago（丰田技术研究所芝加哥分校）

AI总结本文刻画了深度L、总参数W的Transformer的VC维，并建立了思维链学习的样本复杂度上下界，揭示了参数与序列长度对学习所需样本量的影响。

Comments in COLT 2026

2606.07588 2026-06-09 cs.NE cs.LG math.OC quant-ph 交叉投稿

Information-Geometric Optimization on Spheres

球面上的信息几何优化

Vladimir Jaćimović

发表机构 * Faculty of Natural Sciences and Mathematics University of Montenegro（自然科学与数学学院蒙特内格罗大学）

AI总结针对球面上的黑箱优化问题，基于庞加莱球和伯格曼球的超几何信息几何，设计了两种信息几何优化流，并展示了广义Kuramoto振子集合如何计算自然搜索梯度并实现IGO算法。

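2606.07782 2026-06-09 math.OC cs.LG math.MG Cross-list

Non-Archimedean Polydisc Spaces and Applications to Optimisation

Paul Lezeau, Yiannis Fam, Anthea Monod, Yue Ren

Affiliations: London School of Geometry and Number Theory; Department of Mathematics, Imperial College London; Department of Mathematics, Durham University

AI summary: Inspired by Berkovich geometry, introduces non-Archimedean polydisc spaces that retain the rigid hierarchical structure of the underlying field while gaining good geometric properties, shows that metric trees embed into them, proposes a function class given by linear combinations of absolute values of polynomials, and develops an optimisation theory with algorithms and an open-source implementation.

Comments: 54 pages, 23 figures. Comments welcome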

Abstract

We propose a new framework for optimisation over non-Archimedean spaces inspired by Berkovich geometry. Specifically, we introduce polydisc spaces, which consist of products of closed balls over a non-Archimedean field. These spaces retain the rigid hierarchical structure of the non-Archimedean field whilst acquiring many desirable geometric features absent from it. We show that metric trees embed naturally into these spaces, demonstrating their capacity to represent hierarchical data. We study their metric geometry, establishing properties such as geodesic uniqueness, confirming their compatibility with classical optimisation techniques. We further propose a class of real-valued functions given by linear combinations of absolute values of polynomials. These functions admit a piecewise polynomial description along geodesics and satisfy a universal approximation property. We formulate a theory of optimisation on polydisc spaces: we prove existence of minimisers and explore algorithms for finding them. We provide an accompanying open-source Julia library implementing the core objects and optimisation procedures introduced.

2606.07841 2026-06-09 stat.CO cs.LG stat.ML Cross-list

Large-scale empirical tuning and comparison of default optimizers for variational inference

Trevor Campbell, Jonathan H. Huggins, Kyurae Kim, Charles C. Margossian

Affiliations: Department of Statistics, UBC; Department of Mathematics & Statistics, Boston University; Faculty of Computing & Statistics, Boston University; Department of Computer and Information Science, UPenn

AI summary: Evaluates adaptive optimizers for variational inference in a large-scale experiment (56 optimizers, 1092 problems, over 550,000 runs); no single method dominates, but running a selection of 5 algorithms reliably gets close to the best observed performance.

Abstract

Black-box variational inference (BBVI) is a methodology for posterior approximation that relies on stochastic optimization. In practice, the stochastic optimizers underpinning BBVI generally require extensive problem-specific tuning, which undermines its promise as a truly "black box" inference algorithm. However, over the past decade, many new adaptive stochastic optimization algorithms have been developed that reduce or remove entirely the need for tuning. In this work, we investigate this new collection of adaptive methods in the context of BBVI, with the goal of establishing the current state of the art in tuning-free optimization-based inference. In particular, we present a large-scale empirical evaluation of 56 stochastic gradient-based optimization algorithms applied to 1092 Bayesian inference optimization problems, involving over 550,000 individual optimization runs and 15 core-years of compute. The optimization algorithms we evaluate are chosen to represent a wide spectrum of recent approaches and the benchmark problems are chosen to span a range of difficulty, with posterior target dimension 1-10^4, condition number 1-10^8, and a range of variational families. Our results show that no single method dominates, but running a selection of 5 algorithms suffices to reliably get close to the best-possible observed performance. We thus provide a strong baseline for applications where expert tuning is not possible and for comparison when developing new stochastic optimization algorithms.

2606.07914 2026-06-09 stat.ML cs.LG Cross-list

Identifiability and Estimation for Unlabeled Finite Mixtures under Marginal Independence

Takafumi Kanamori, Yushi Hirose, Shohei Yamamoto

Affiliations: Department of Mathematical and Computing Science, Institute of Science Tokyo; RIKEN Center for Advanced Intelligence Project

AI summary: Studies recovery of the latent components and estimation of the mixing matrix in unlabeled finite mixtures under a marginal-independence assumption, and proposes the PM-MMD estimator together with a proof of its convergence.

Abstract

We study component recovery and mixing-matrix estimation from unlabeled finite mixtures whose observable distributions share the same latent components but have unknown mixing weights. The main identifying signal is marginal independence: each component is assumed to be independent on at least one coordinate pair, but no labels, clean component samples, or mixing weights are observed. We first prove a structural result for product components: under linear independence of the univariate marginals, any independent affine combination of the components must coincide with a single component. We then extend this principle to observable mixtures and show that, under full-rank and no-cancellation conditions, marginally independent affine combinations recover the corresponding latent components. When every component is independent on some coordinate pair, all components are identifiable, and the mixing matrix is recoverable under the stated completion conditions. Finally, we propose a Product-Marginal Maximum Mean Discrepancy (PM-MMD) estimator over affine combinations of the observable mixtures and prove uniform convergence and stability under approximate marginal independence. This framework also separates the empirical roles of the assumptions: irreducibility is, in general, not directly testable from the unlabeled mixtures alone, whereas marginal independence yields a candidate-level diagnostic through held-out PM-MMD. Controlled and flow-cytometry experiments show when marginal independence provides a useful recovery signal. In the reported multi-component comparisons, condition-aware representative selection stabilizes PM-MMD and improves recovery relative to clustering, factorization, and pairwise mixture-proportion baselines using the same unlabeled mixtures.

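2606.07926 2026-06-09 stat.ML cs.LG Cross-list

Barycentric Projections of Optimal Transport Plans on Riemannian Manifolds

Kisung You

Affiliations: Baruch College

AI summary: Proposes a framework for barycentric projections of transport couplings on Riemannian manifolds: the conditional Fréchet mean gives the best deterministic map, a conditional-variance Monge defect is defined, and experiments support the distinct roles of the intrinsic and tangential projections.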

Abstract

Optimal transport couplings are probabilistic objects, while many learning pipelines require deterministic maps. In Euclidean space, barycentric projection converts a coupling into a map by taking conditional expectations, but on a Riemannian manifold, curvature and cut loci make this operation nontrivial. We develop a framework for barycentric projections of transport couplings on Riemannian manifolds. The intrinsic projection maps each source point to the conditional Fréchet mean of its destination law and is shown to be the best deterministic representative under squared geodesic loss. The corresponding minimum value is an integrated conditional Fréchet variance, which vanishes exactly for map-induced couplings and therefore defines a conditional-variance Monge defect. We also study a tangential log-exp projection, prove its Euclidean exactness, its compatibility with Brenier-McCann maps in the Monge case, and its interpretation as the first unit Riemannian gradient update for the intrinsic objective. For discrete couplings, both constructions decompose row-wise into weighted Fréchet mean and log-exp problems. Experiments on spherical data, synthetic SPD data, and real EEG covariance matrices support the proposed division of roles: the intrinsic projection is the variational representative, while the tangential projection is a useful local displacement surrogate.
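
The Euclidean special case mentioned in the abstract can be sketched in a few lines: for a discrete coupling, the barycentric projection of each source point is the row-wise weighted mean of the target points, and the integrated conditional variance vanishes when the coupling is induced by a map. This is only a sketch of the flat case; the function names are hypothetical, and on a manifold the weighted mean would have to be replaced by a weighted Fréchet mean, which is not attempted here.

```python
import numpy as np


def barycentric_projection(P, Y):
    """Row-wise conditional mean of the targets Y under the coupling P.

    P : (n, m) nonnegative coupling matrix (row i = mass sent from source i)
    Y : (m, d) target points in Euclidean space
    Returns an (n, d) array whose row i is sum_j P[i, j] * Y[j] / sum_j P[i, j].
    """
    P = np.asarray(P, dtype=float)
    Y = np.asarray(Y, dtype=float)
    row_mass = P.sum(axis=1, keepdims=True)
    if np.any(row_mass <= 0):
        raise ValueError("every source point must carry positive mass")
    return (P @ Y) / row_mass


def conditional_variance_defect(P, Y):
    """Integrated conditional variance: sum_ij P[i, j] * ||Y[j] - T[i]||^2."""
    P = np.asarray(P, dtype=float)
    Y = np.asarray(Y, dtype=float)
    T = barycentric_projection(P, Y)
    sq_dist = ((Y[None, :, :] - T[:, None, :]) ** 2).sum(axis=-1)  # shape (n, m)
    return float((P * sq_dist).sum())


Y = np.array([[0.0, 0.0], [2.0, 2.0]])
P_split = np.array([[0.5, 0.0], [0.25, 0.25]])   # second source splits its mass
P_map = np.array([[0.0, 0.5], [0.5, 0.0]])       # coupling induced by a map

print(barycentric_projection(P_split, Y))
print(conditional_variance_defect(P_split, Y), conditional_variance_defect(P_map, Y))
```

The map-induced coupling has zero defect, while the coupling that splits mass does not, which mirrors the role of the Monge defect described above.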

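2606.07931 2026-06-09 math.PR cond-mat.stat-mech cs.IT cs.LG math.IT math.ST stat.TH Cross-list

Pointwise Complexity for Gaussian Fields: Upper Envelopes, Algorithmic Lower Bounds, and Separation

Yunbei Xu

Affiliations: National University of Singapore

AI summary: Proves a variance-aware pointwise majorizing-measure theorem that gives high-probability upper envelopes for Gaussian processes, and uses a Bayesian algorithmic lower bound and a weighted-basis example to exhibit a separation between pointwise complexity and global minimax risk.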

Abstract

We prove a variance-aware pointwise majorizing-measure theorem for centered Gaussian processes. Classical generic chaining characterizes the scalar quantity $\mathbb E\sup_{x\in T}X_x$; the theorem here gives a simultaneous high-probability envelope for the entire field. For an ambient prior $\mu$, the envelope at $x$ is governed by a pointwise Fernique-Talagrand functional \[\Phi_\mu(x):=\int_0^{4\sigma(x)}\sqrt{\log\frac{1}{\mu(B_d(x,\varepsilon))}}\,d\varepsilon,\] together with the corresponding Gaussian tail term. The theorem provides a reusable field-level refinement of classical generic chaining and a Gaussian-process counterpart of pointwise empirical-process bounds for deep neural networks. We also record a Bayesian algorithmic lower envelope from the interactive Fano/data-processing principle. For a known prior $\pi$, an observation channel, and a concrete estimator $\widehat t(Y)$, the lower bound is expressed through the exact ghost small-ball mass $\mathbb E_{Y\sim Q}\pi(B_d(\widehat t(Y),\Delta))$, rather than a worst-case covering number. In Gaussian location experiments, comparison decoders convert Bayes location error into lower bounds on decision-aligned Gaussian ranges. We then construct an elementary weighted-basis example separating the usual Fano relaxation for a fixed prior, the Bayesian algorithmic lower envelope, the pointwise Gaussian envelope on the selected subatlas, and the full-class minimax risk/global Gaussian scale. Together, these results show that algorithmic lower bounds provide local-geometric certificates of pointwise complexity for fixed estimators in overparameterized ambient classes, precisely in regimes where classical minimax theory becomes either too coarse or oracle-dependent.

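2606.08188 2026-06-09 math.OC cs.LG Cross-list

Latent Structural Categorical Matrix Completion with Application to Quasispecies Analysis

Qian Zhang, Meixia Lin

Affiliations: Engineering Systems and Design, Singapore University of Technology and Design; Institute of Statistics and Big Data, Renmin University of China

AI summary: Proposes LCMC, a double-loop optimization framework that performs latent factorization of categorical matrices via a binary tensor representation: the outer loop adaptively estimates the latent dimension and the inner loop reconstructs the matrix by tensor factorization. It outperforms existing methods on viral quasispecies reconstruction.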

Abstract

Matrix completion has been extensively studied for real-valued data, but existing methods are often limited in handling categorical variables. We propose LCMC, a double-loop optimization framework for categorical matrix completion via latent factorization based on a binary tensor representation. In this setting, each categorical entry is encoded as a one-hot vector along a third tensor mode, thereby preserving its discrete, non-ordinal nature. The outer loop adaptively estimates the latent dimension by iteratively updating it with feedback from the inner loop, while the inner loop reconstructs the categorical matrix through tensor factorization, supported by a corresponding theoretical analysis. To further improve scalability and robustness, we introduce enhancements including a split-merge-refine strategy and an adaptive data reduction technique. Experiments on synthetic and real-world datasets in viral quasispecies reconstruction demonstrate that LCMC achieves superior accuracy and efficiency compared to existing methods.
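
The binary tensor representation described in the abstract can be sketched as follows: each observed categorical entry becomes a one-hot vector along a third mode, and unobserved entries are left as all-zero fibres. The `missing=-1` convention, the observation mask, and the function names are assumptions made for this illustration; the paper's factorization and double-loop procedure are not reproduced here.

```python
import numpy as np


def one_hot_tensor(M, num_categories, missing=-1):
    """Encode an (n, m) categorical matrix as an (n, m, K) binary tensor.

    Categories are assumed to be the integers 0..K-1, and `missing` marks
    unobserved entries (an illustrative convention, not the paper's).
    Returns the tensor and a boolean mask of observed entries.
    """
    M = np.asarray(M)
    n, m = M.shape
    T = np.zeros((n, m, num_categories), dtype=np.uint8)
    observed = M != missing
    rows, cols = np.nonzero(observed)
    T[rows, cols, M[rows, cols]] = 1   # one-hot along the third mode
    return T, observed


def decode(T):
    """Map each fibre along the third mode back to its category index.

    All-zero fibres (unobserved entries) decode to 0, so the result is only
    meaningful where the observation mask is True.
    """
    return T.argmax(axis=2)


M = np.array([[0, 2], [1, -1]])        # -1 marks a missing entry
T, observed = one_hot_tensor(M, num_categories=3)
print(T[0, 1], T[1, 1], observed)
```

Because the categories sit on separate tensor slices rather than on a numeric scale, no ordering between them is implied, which is the non-ordinal property the abstract emphasizes.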

2606.08196 2026-06-09 stat.ML cs.AI cs.LG stat.ME Cross-list

Beyond Additivity: Causal Discovery in Location-Scale Noise Models with Hidden Variables

Mariyam Khan, Shohei Shimizu, Thong Pham

Affiliations: RIKEN AIP; University of Bergen; The University of Osaka; Shiga University

AI summary: For causal discovery with hidden variables when the data-generating process follows a location-scale noise model (LSNM), proves that acyclic directed mixed graphs (ADMGs) satisfying the bow-free condition are identifiable, and proposes a two-stage algorithm, LSNM-UV, that outperforms additive-noise baselines on heteroscedastic data.

Comments: 33 pages, 4 figures

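2606.08438 2026-06-09 stat.ML cs.LG Cross-list

Improving Bayesian Optimization via Training-Aware Conditional Diffusion Models

Yilin Zheng, Haowei Wang, Szu Hui Ng, Enlu Zhou

Affiliations: National University of Singapore; Georgia Institute of Technology

AI summary: Uses conditional diffusion models to efficiently approximate the distribution of the optimum, develops BO-inherent training strategies and a diffusion-based mode-seeking acquisition strategy, establishes a sub-optimality guarantee, and outperforms standard baselines in experiments.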

Abstract

Bayesian optimization (BO) is a widely used approach for black-box optimization that uses a Gaussian process (GP) as a surrogate and guides sequential evaluations via an acquisition function, with the ultimate goal of locating the global optimum $\mathbf{x}^{\star}$. To align with this goal, information-based acquisition functions such as Predictive Entropy Search (PES) model $\mathbf{x}^{\star}$ as a random variable and reduce the entropy of its distribution, but approximating this distribution via traditional GP posterior sampling is computationally expensive. To address this limitation, we leverage Conditional Diffusion Models (CDMs) to efficiently approximate the distribution of $\mathbf{x}^{\star}$ and develop BO-inherent training strategies for CDMs. Motivated by the structural properties of the CDM-learned distribution, we further develop an acquisition strategy termed Diffusion-based Mode Seeking (DMS) to guide the sequential evaluation. We establish a sub-optimality guarantee for the CDM-learned distribution and demonstrate through extensive experiments that DMS outperforms standard BO baselines.

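2606.08638 2026-06-09 math.OC cs.LG Cross-list

Parameter Tuning with Generalization Guarantees for GPU-Accelerated Linear Programming

Siddharth Prasad, Dravyansh Sharma

AI summary: For hyperparameter tuning of the GPU-accelerated linear programming solver PDLP, uses the theory of data-driven algorithm design to give sample-complexity guarantees for learning hyperparameters such as the step size and primal weight, with experiments demonstrating the need for tuning.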

Abstract

Recent research has developed practical, parallelizable first-order methods for large-scale linear programming, but performance is highly dependent on hyperparameter selection. We derive generalization guarantees for hyperparameter tuning within (cu)PDLP, a state-of-the-art first-order LP solver designed for modern hardware. First, we pin down the behavior of PDHG, the primal-dual hybrid gradient algorithm that underlies PDLP, as a function of its step size and primal weight, leading to linear sample complexity guarantees for learning those parameters. We then conduct a structural analysis of PDLP, which augments PDHG with several specialized techniques like preconditioning, adaptive step sizes, averaging, adaptive restarts, and smoothed primal weight updates. Our analysis captures the behavior of the solution trajectory as a function of the hyperparameters and leverages recent advances in data-driven algorithm design to obtain polynomial sample complexity guarantees for learning those hyperparameters. Finally, we conduct proof-of-concept experiments that demonstrate the need for data-driven PDLP parameter tuning. Our results showcase the versatility of the data-driven algorithm design toolkit for principled hyperparameter tuning within solver-grade implementations of complex modern optimization algorithms.
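
To make the two hyperparameters named in the abstract concrete, here is a minimal sketch of plain PDHG for a standard-form linear program, min c^T x subject to Ax = b, x >= 0. It assumes a common parameterization in which a step size `eta` and a primal weight `omega` set the primal and dual steps as `tau = eta / omega` and `sigma = eta * omega`; the function name, the fixed step size, and the toy problem are illustrative assumptions, and none of PDLP's enhancements (preconditioning, adaptive steps, averaging, restarts) is included.

```python
import numpy as np


def pdhg_lp(c, A, b, primal_weight=1.0, num_iters=5000):
    """Plain PDHG for  min c^T x  s.t.  A x = b,  x >= 0.

    eta is fixed so that tau * sigma * ||A||_2^2 = 0.81 < 1; the primal
    weight only rebalances the primal step against the dual step.
    """
    c = np.asarray(c, dtype=float)
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    eta = 0.9 / np.linalg.norm(A, 2)      # spectral norm of A
    tau = eta / primal_weight             # primal step size
    sigma = eta * primal_weight           # dual step size
    x = np.zeros(A.shape[1])
    y = np.zeros(A.shape[0])
    for _ in range(num_iters):
        # Projected gradient step on the primal variable.
        x_new = np.maximum(0.0, x - tau * (c - A.T @ y))
        # Dual ascent step evaluated at the extrapolated point 2*x_new - x.
        y = y + sigma * (b - A @ (2.0 * x_new - x))
        x = x_new
    return x, y


# Toy problem: min x1 + 2*x2  s.t.  x1 + x2 = 1,  x >= 0.
x, y = pdhg_lp(c=[1.0, 2.0], A=[[1.0, 1.0]], b=[1.0])
print(x, y)
```

On this toy problem the optimum puts all mass on the cheaper variable; changing `primal_weight` leaves the solution unchanged but alters how quickly the primal and dual iterates settle, which is the sensitivity the tuning question is about.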

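2606.08727 2026-06-09 math.NA cs.LG cs.NA Cross-list

Compositional Approximation Can Strictly Outperform Superpositional Approximation

Dennis Elbrächter, Philipp Petersen

Affiliations: University of Freiburg

AI summary: Constructs explicit examples showing that there are function classes for which superpositional approximation rates are strictly lower than compositional approximation rates, with an arbitrarily large gap.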

Abstract

Many classically studied function classes are known to be approximated optimally by superpositional methods, i.e. with approximants constructed as the linear combination of elements in some dictionary. Here optimality means that the uniform approximation error viewed as a function of the number of parameters used has polynomial decay of the highest order achievable by any parametrized method whose parameters can be encoded as a bit string of length proportional, up to logarithmic factors, to the number of parameters. While compositional methods like neural networks are structurally different, their approximation rates can be made comparable by imposing constraints that ensure such a proportional bit string encoding. In this work we study function classes exhibiting structural properties that limit superpositional approximation rates to be strictly lower than compositional approximation rates. In particular, we construct explicit examples for which there is an arbitrarily large gap.

2606.08783 2026-06-09 math.OC cs.LG cs.NA math.NA Cross-list

OptMuon: Closed-Loop Orthogonalized Momentum Methods for Stochastic Optimization with Zero-Noise Optimality

Ganzhao Yuan

Affiliations: Faculty of Computer Science and Artificial Intelligence; Shenzhen University of Advanced Technology (SUAT)

AI summary: Proposes OptMuon, which combines Muon-style polar-factor directions with a trajectory-dependent AdaGrad-Norm-type coefficient schedule to adaptively scale orthogonalized momentum updates; it attains a nearly optimal first-order rate in the noise-free case without manual hyperparameter tuning.

Abstract

Orthogonalized momentum updates, as used in Muon-style optimizers, have recently shown strong empirical stability in large-scale deep learning. However, existing orthogonalized methods are typically paired with constant or open-loop magnitude rules, and therefore do not explicitly calibrate their update magnitudes from the observed optimization trajectory. Motivated by the closed-loop perspective behind Lipschitz-free and noise-adaptive methods, we propose OptMuon, a family of adaptive momentum orthogonalization methods for stochastic nonconvex optimization. OptMuon combines Muon-style polar-factor directions with a trajectory-dependent AdaGrad-Norm-type coefficient schedule, so that the update magnitude is determined by the observed gradient and momentum history rather than by a prescribed Lipschitz-dependent rule. The schedule does not use the smoothness constant, the variance level, or the bounded-gradient constant in parameter selection, and its running-maximum correction prevents isolated gradient spikes from causing excessive coefficient collapse. Under lower-boundedness, unbiased stochastic gradients with bounded variance, smoothness, and an almost-sure bounded stochastic-gradient condition, we prove two complementary guarantees. OptMuon-A achieves the noise-adaptive rate $\tilde{\mathcal O}(T^{-1/2}+\sigma^{1/2}T^{-1/4})$ under average smoothness, while OptMuon-I achieves $\tilde{\mathcal O}(T^{-1/2}+\sigma^{1/3}T^{-1/3})$ under individual smoothness. In the zero-noise regime, both bounds automatically reduce to a nearly optimal deterministic first-order rate $\tilde{\mathcal O}(T^{-1/2})$ without manual hyperparameter retuning. These results show that closed-loop scalar adaptation can be combined with Muon-style momentum orthogonalization while retaining noise adaptivity and zero-noise optimality up to logarithmic factors.
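
The two ingredients the abstract combines can be sketched generically: a polar-factor direction taken from the momentum matrix, and a scalar step that shrinks with the accumulated squared gradient norms, as in the textbook AdaGrad-Norm rule. This is not the OptMuon schedule itself, whose exact coefficients and running-maximum correction are not given in the abstract. The function names, the defaults `beta`, `eta`, `b0`, and the use of an exact SVD instead of an iterative orthogonalization are all assumptions made for illustration.

```python
import numpy as np


def polar_factor(M):
    """Orthogonal polar factor U V^T of M, computed here with an exact SVD."""
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ Vt


def orthogonalized_adagrad_norm_step(W, grad, momentum, accum,
                                     beta=0.9, eta=1.0, b0=1.0):
    """One update: orthogonalized momentum direction, AdaGrad-Norm magnitude.

    accum holds the running sum of squared Frobenius norms of the gradients,
    so the scalar step eta / sqrt(b0^2 + accum) depends on the trajectory.
    """
    momentum = beta * momentum + (1.0 - beta) * grad
    accum = accum + float(np.sum(grad ** 2))
    step = eta / np.sqrt(b0 ** 2 + accum)
    W = W - step * polar_factor(momentum)
    return W, momentum, accum


W = np.zeros((2, 2))
grad = np.array([[3.0, 0.0], [0.0, 4.0]])
W, momentum, accum = orthogonalized_adagrad_norm_step(
    W, grad, momentum=np.zeros((2, 2)), accum=0.0)
print(W, accum)
```

The direction always has unit singular values, so the update size is controlled entirely by the scalar step, which is what makes a closed-loop magnitude rule meaningful in this setting.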

2606.08941 2026-06-09 stat.ML cs.LG Cross-list

Estimate Collapsibility of Causal Effects in Completed Partial DAGs via Strong d-Convex Hulls

Yuxin Deng, Yi Sun, Zhiming Li, Huaxiong Liu

Affiliations: College of Mathematics and System Science, Xinjiang University; Institute of Statistics and Data Science, Xinjiang University of Finance and Economics

AI summary: Proposes a collapsibility approach that keeps causal-effect estimation consistent in completed partially directed acyclic graphs, characterizes minimal collapsible sets via strong d-convex hulls, and designs an efficient algorithm that is combined with the IDA framework.

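2606.09820 2026-06-09 math.FA cs.LG math.PR q-fin.MF stat.ML Cross-list

Weighted universal approximation of differentiable maps on infinite-dimensional manifolds

Philipp Schmocker, Josef Teichmann

Affiliations: Department of Mathematics, ETH Zurich, Switzerland

AI summary: Using a weighted Nachbin theorem, extends universal approximation theorems for functional input neural networks to differentiable maps, including approximation of derivatives, with applications to approximating non-anticipative functionals and functionals on path space.

Comments: 77 pages, 3 figures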

2302.09832 2026-06-09 cs.LG math.OC Updated version

TAMUNA: Doubly Accelerated Distributed Optimization under Partial Participation

Laurent Condat, Ivan Agarský, Grigory Malinovsky, Peter Richtárik

Affiliations: Computer Science Program, CEMSE Division, King Abdullah University of Science and Technology (KAUST); SDAIA-KAUST Center of Excellence in Data Science and Artificial Intelligence (SDAIA-KAUST AI); Brno University of Technology; Kempelen Institute of Intelligent Technologies (KInIT)

AI summary: Proposes TAMUNA, the first algorithm to combine local training, compression, and partial participation, achieving doubly accelerated convergence while supporting arbitrary levels of client participation.

Abstract

In distributed optimization and federated learning, slow and costly communication between parallel devices and the central server constitutes the primary bottleneck. To alleviate this burden, two strategies have emerged: 1) local training (LT), which reduces communication frequency by performing multiple local computations between rounds, and 2) compression (CC), which consists of transmitting lower-dimensional, compact representations. Recent theoretical advances have successfully combined LT and CC to achieve doubly-accelerated communication rates, with respect to both condition number and model dimension. However, these methods have a major drawback: they require full client participation and break down when idle clients miss communication triggers. We introduce TAMUNA, the first algorithm to successfully intertwine LT, CC, and partial participation. By decoupling primal model updates from dual control variates, TAMUNA overcomes the architectural deadlock of prior methods. In the strongly convex setting, TAMUNA converges linearly to the exact solution, establishing a new state of the art by exhibiting doubly-accelerated convergence, while supporting arbitrary levels of client participation.

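2401.01599 2026-06-09 cs.LG math.ST stat.TH Updated version

Generalization Error Curves for Analytic Spectral Algorithms under Power-law Decay

Yicheng Li, Weiye Gan, Zuoqiang Shi, Qian Lin

Affiliations: Tsinghua University

AI summary: Under mild assumptions, fully characterizes the generalization error curves of analytic spectral algorithms such as kernel gradient descent in kernel regression, revealing the inconsistency of kernel interpolation and the saturation effect of high-qualification algorithms, and uses neural tangent kernel theory to deepen the understanding of the generalization behaviour of wide neural networks.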

2506.01052 2026-06-09 cs.LG math.OC stat.ML Updated version

A Robust $\widetilde{\mathcal{O}}(1/\sqrt{T})$ Rate for Unprojected TD Learning with Linear Function Approximation

Wei-Cheng Lee, Francesco Orabona

Affiliations: University of California, Berkeley

AI summary: For temporal-difference learning with linear function approximation, proves a convergence rate of $\widetilde{\mathcal{O}}(\|\theta^*\|^2_2/\sqrt{T})$ in expectation without projection, requiring only a minor polylogarithmic correction to the learning rate and no additional regularity condition.

Abstract

We investigate the finite-time convergence properties of Temporal Difference (TD) learning with linear function approximation, a cornerstone of reinforcement learning. We are interested in the so-called "robust" setting, where the convergence guarantee does not depend on the potential function's minimal curvature. While prior work has established convergence guarantees in this setting, these results typically rely on the artificial assumption that each iterate is projected onto a bounded set. Removing such a condition was left as an open problem by Bhandari et al. (COLT'18), hypothesizing the need for additional "regularity conditions". In this paper, we show that the simple unprojected TD(0) converges with a rate of $\widetilde{\mathcal{O}}\left(\frac{\|\theta^*\|^2_2}{\sqrt{T}}\right)$ in expectation, even in the presence of Markovian noise. We do not require an additional regularity condition, but only a minor polylog correction to the learning rate. Our analysis reveals a novel self-bounding property of the TD updates and exploits it to guarantee bounded iterates.

2506.11336 2026-06-09 cs.LG math.OC Updated version

The Sample Complexity of Parameter-Free Stochastic Convex Optimization

Jared Lawrence, Ari Kalinsky, Hannah Bradfield, Yair Carmon, Oliver Hinder

Affiliations: Department of Industrial Engineering, University of Pittsburgh; Department of Computer Science, Tel Aviv University

AI summary: Studies the sample complexity of stochastic convex optimization when problem parameters (such as the distance to optimality and the Lipschitz constant) are unknown, and proposes a reliable model selection method and a regularization-based method that achieve optimal sample complexity and avoid overfitting.

Comments: Accepted for publication in JMLR

Abstract

We study the sample complexity of stochastic convex optimization when problem parameters such as the distance to optimality and the Lipschitz constant are unknown. We pursue two strategies. First, we develop a reliable model selection method that avoids overfitting to the validation set. This method allows us to generically tune the learning rate of stochastic optimization methods to match the optimal known-parameter sample complexity up to log log factors. Second, we develop a regularization-based method that is specialized to the case that only the distance to optimality is unknown. More specifically, it uses norm-regularized empirical risk minimization to estimate the distance to optimality to within a constant factor, allowing known-parameter stochastic optimization methods to achieve optimal sample complexity. This method provides perfect adaptability to unknown distance to optimality, demonstrating a separation between the sample and computational complexity of parameter-free stochastic convex optimization. Combining these two methods allows us to simultaneously adapt to multiple problem structures. Experiments performing few-shot learning on CIFAR-10 by fine-tuning CLIP models and prompt engineering Gemini to count shapes indicate that our reliable model selection method can help mitigate overfitting to small validation sets.

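2507.01598 2026-06-09 cs.LG Updated version

Convergence Bound and Critical Batch Size of Muon Optimizer

Naoki Sato, Hiroki Naganuma, Hideaki Iiduka

Affiliations: Meiji University; Université de Montréal; Mila

AI summary: Analyzes the convergence of the Muon optimizer in four practical settings, proves that weight decay keeps the parameter and gradient norms bounded, and derives a lower bound on the critical batch size that reveals how the hyperparameters $\beta$ and $\lambda$ affect its scaling.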

Abstract

Muon, a recently proposed optimizer that leverages the inherent matrix structure of neural network parameters, has demonstrated strong empirical performance, indicating its potential as a successor to standard optimizers such as AdamW. This paper presents theoretical analysis to support its practical success. We provide convergence proofs for Muon across four practical settings, systematically examining its behavior with and without the inclusion of Nesterov momentum and weight decay. We then demonstrate that the addition of weight decay ensures almost-sure boundedness of the parameter and gradient norms -- without relying on the commonly imposed bounded-gradient assumption -- and clarify the interplay between the weight decay coefficient and the learning rate. Finally, we derive a lower bound on the critical batch size for Muon -- the batch size that minimizes the stochastic first-order oracle (SFO) complexity of training. Because the resulting formula involves problem-dependent quantities that are not directly observable (gradient variance, target precision, effective rank), it does not predict the critical batch size in absolute terms; rather, it reveals how the hyperparameters $\beta$ (momentum) and $\lambda$ (weight decay) govern the qualitative scaling of this value. Our experiments validate these hyperparameter-dependent predictions across workloads including image classification and language modeling.


2511.02003 2026-06-09 cs.LG cond-mat.dis-nn hep-ph 版本更新

Bulk-boundary decomposition of neural networks

神经网络的体-边界分解

Donghee Lee, Hye-Sung Lee, Jaeok Yi

发表机构 * Department of Physics, Korea Advanced Institute of Science and Technology（韩国科学技术院物理系）

AI总结提出体-边界分解框架，将神经网络训练动力学分解为数据无关的体项和数据相关的边界项，揭示深层网络的局部齐次结构并推导能量连续性方程。

Comments 13 pages, 3 figures

2512.01930 2026-06-09 cs.LG cs.AI 版本更新

SVRG and Beyond via Posterior Correction

基于后验校正的SVRG及其扩展

Nico Daheim, Thomas Möllenhoff, Ming Liang Ang, Mohammad Emtiyaz Khan

发表机构 * University of California, Berkeley（加州大学伯克利分校）

AI总结本文揭示SVRG与后验校正方法的深层联系，证明SVRG是各向同性高斯后验校正的特例，并通过灵活指数族后验自动导出牛顿型和Adam型新变体。

Comments ICML 2026 (oral)
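For readers unfamiliar with the baseline, the following is a minimal sketch of standard SVRG on a least-squares problem. It shows only the classical variance-reduced update, not the posterior-correction derivation or the Newton-like and Adam-like variants mentioned in the summary. The problem setup, function name, and step-size choice are assumptions made for the example.

```python
import numpy as np

def svrg_least_squares(A, b, lr, num_epochs, inner_steps, seed=0):
    """Minimize (1 / 2n) * ||A w - b||^2 with standard SVRG."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w_snapshot = np.zeros(d)
    for _ in range(num_epochs):
        # Full gradient at the snapshot, computed once per epoch.
        full_grad = A.T @ (A @ w_snapshot - b) / n
        w = w_snapshot.copy()
        for _ in range(inner_steps):
            i = rng.integers(n)
            a_i = A[i]
            grad_w = a_i * (a_i @ w - b[i])                  # sample gradient at w
            grad_snapshot = a_i * (a_i @ w_snapshot - b[i])  # same sample at snapshot
            # Variance-reduced direction: unbiased, and its variance shrinks
            # as both w and the snapshot approach the minimizer.
            w -= lr * (grad_w - grad_snapshot + full_grad)
        w_snapshot = w
    return w_snapshot
```

A conservative step size such as `0.1 / max_i ||a_i||^2` keeps the inner loop stable in this toy setting; other choices may be faster.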

2512.10656 2026-06-09 cs.LG 版本更新

Token Sample Complexity of Attention

注意力的标记样本复杂度

Léa Bohbot, Cyril Letrouit, Gabriel Peyré, François-Xavier Vialard

发表机构 * CNRS, ENS Paris, France（法国国家科学研究中心、巴黎高等师范学院）

AI总结研究注意力在极端序列长度下的收敛行为，提出标记样本复杂度概念，分析注意力映射的均匀收敛和变换分布矩的收敛速率，实验验证预测结果。

详情

局部偏好贝叶斯优化

Johanna Menn, Miriam Kober, Paul Brunzema, David Stenger, Sebastian Trimpe

发表机构 * Institute for Data Science in Mechanical Engineering, RWTH Aachen University（机械工程数据科学研究所，亚琛工业大学）； Department of Clinical Research, University of Bern（伯尔尼大学临床研究系）； Center for Reproducible Science and Research Synthesis, University of Zurich（苏黎世大学可重复科学与研究综合中心）； aiXopt GmbH（aiXopt公司）

AI总结针对偏好贝叶斯优化在高维问题中效率低的问题，提出利用信任域和导数信息的局部偏好贝叶斯优化方法，显著降低累积遗憾。

详情

AI中文摘要

贝叶斯优化（BO）是一种流行且有效的调优昂贵、有噪声实验的方法，但需要制定明确的目标函数。偏好贝叶斯优化（PBO）通过从成对的人类反馈中学习来消除这一要求，然而现有方法由于其全局搜索策略，难以有效优化中低维以外的问题。我们通过开发一系列局部PBO方法来解决这一限制，这些方法将高维BO的关键思想迁移到偏好设置中。具体而言，我们引入了局部PBO方法，将信任域和导数信息局部搜索适应于成对偏好反馈，其中后者利用了拉普拉斯近似高斯过程后验的一阶和二阶导数。我们在GP样本路径、标准优化基准函数和策略搜索任务上的基准测试表明，局部PBO方法在具有陡峭最优值的高维和复杂景观中特别有效。与基于全局偏好的基线相比，它们可以显著减少累积遗憾，使其对于现实世界中基于偏好的优化任务（如策略搜索）特别有用。

英文摘要

Bayesian optimization (BO) is a popular and effective approach for tuning expensive, noisy experiments, but requires the formulation of an explicit objective function. Preferential BO (PBO) removes this requirement by learning from pairwise human feedback, yet existing methods struggle to efficiently optimize beyond low- and medium-dimensional problems due to their global search approaches. We address this limitation by developing a family of local PBO methods that transfer key ideas from high-dimensional BO to the preferential setting. In particular, we introduce local PBO methods which adapt trust-region and derivative-informed local search to pairwise preference feedback, where the latter exploits first- and second-order derivatives of the Laplace-approximated GP posterior. Our benchmark on GP sample paths, standard optimization benchmark functions, and policy-search tasks shows that local PBO methods are especially effective in high-dimensional and complex landscapes with steep optima. Compared with global preference-based baselines, they can substantially reduce cumulative regret, making them particularly useful for real-world preference-based optimization tasks such as policy search.


2412.16457 2026-06-09 stat.ML cs.DS cs.LG math.PR math.ST stat.TH 版本更新

Robust Random Graph Matching in Dense Graphs via an Approximate Message Passing Type Algorithm

稠密图中的鲁棒随机图匹配：基于近似消息传递类型算法

Zhangsong Li

发表机构 * Peking University（北京大学）

AI总结针对带潜在顶点对应的相关高斯Wigner矩阵对，提出一种近似消息传递迭代算法，在对抗性扰动下实现多项式时间匹配恢复，扰动规模可达n^{1-o(1)}。

Comments 46 pages; accepted by IEEE Trans. Inf. Theory

详情

AI中文摘要

本文关注一对具有潜在顶点对应的相关高斯Wigner矩阵的匹配恢复问题。我们特别关注该问题的鲁棒版本，其中观测为扰动输入$(A+E,B+F)$，$(A,B)$是一对相关高斯Wigner矩阵，$E,F$是分别支撑在$A,B$的未知$\epsilon n \times \epsilon n$主子矩阵上的对抗性选择矩阵。我们提出一种近似消息传递（AMP）类型迭代算法，只要$(A,B)$之间的相关性$\rho$为非零常数且$\epsilon = o\big( \tfrac{1}{(\log n)^{20}} \big)$，该算法就能在多项式时间内成功。与标准AMP的关键区别在于，迭代中引入了时间依赖的矩阵乘法步骤，该步骤同时扩大特征维度并在迭代过程中抵消相关性。我们结果的主要方法输入来自\cite{DL22+, DL23+}中提出的迭代随机图匹配算法和\cite{IS24+}中提出的谱预处理过程。据我们所知，我们的算法是首个在任意$n^{1-o(1)}$大小的对抗性扰动下具有鲁棒性的高效随机图匹配类型算法。

英文摘要

In this paper, we focus on the matching recovery problem between a pair of correlated Gaussian Wigner matrices with a latent vertex correspondence. We are particularly interested in a robust version of this problem such that our observation is a perturbed input $(A+E,B+F)$ where $(A,B)$ is a pair of correlated Gaussian Wigner matrices and $E,F$ are adversarially chosen matrices supported on an unknown $\epsilon n \times \epsilon n$ principal minor of $A,B$, respectively. We propose an approximate message passing (AMP) type iterative algorithm that succeeds in polynomial time as long as the correlation $\rho$ between $(A,B)$ is a non-vanishing constant and $\epsilon = o\big( \tfrac{1}{(\log n)^{20}} \big)$. A key distinction from standard AMP is the introduction of a time-dependent matrix multiplication step within the iteration, which simultaneously enlarges the feature dimension and cancels the correlation during the iteration. The main methodological inputs for our result are the iterative random graph matching algorithm proposed in \cite{DL22+, DL23+} and the spectral preprocessing procedure proposed in \cite{IS24+}. To the best of our knowledge, our algorithm is the first efficient random graph matching type algorithm that is robust under any adversarial perturbations of $n^{1-o(1)}$ size.


2502.15131 2026-06-09 math.ST cs.LG stat.ME stat.ML stat.TH 版本更新

Optimal and Provable Calibration in High-Dimensional Binary Classification: Angular Calibration and Platt Scaling

高维二分类中的最优且可证明的校准：角度校准与Platt缩放

Yufan Li, Pragya Sur

发表机构 * Harvard University（哈佛大学）

AI总结针对高维高斯特征下的线性二分类器，提出基于估计权重与真实权重夹角的角度校准方法，证明其可校准且唯一Bregman最优，并揭示Platt缩放在高维下收敛于该最优解。

详情

AI中文摘要

我们研究校准形如 $\sigma(\hat{w}^\top x)$ 的线性二分类器的基本问题，其中特征向量 $x$ 服从高斯分布，$\sigma$ 是链接函数，$\hat{w}$ 是真实线性权重 $w_\star$ 的估计量。通过与非信息性的 $\textit{机会分类器}$ 插值，我们构建了一个良好校准的预测器，其插值权重取决于估计量 $\hat{w}$ 与真实线性权重 $w_\star$ 之间的夹角 $\angle(\hat{w}, w_\star)$。我们证明，在样本量和特征量均以可比速率发散的高维机制下，这种角度校准方法可证明是良好校准的。夹角 $\angle(\hat{w}, w_\star)$ 可以一致地估计。此外，所得预测器是唯一 $\textit{Bregman最优}$ 的，即在合适的校准预测器类中最小化与真实标签分布的Bregman散度。我们的工作是首个在高维下同时满足校准和最优性可证明的校准策略。此外，我们识别了经典Platt缩放预测器收敛到我们的Bregman最优校准解的条件。因此，Platt缩放在高维下也继承了这些理想性质。

英文摘要

We study the fundamental problem of calibrating a linear binary classifier of the form $σ(\hat{w}^\top x)$, where the feature vector $x$ is Gaussian, $σ$ is a link function, and $\hat{w}$ is an estimator of the true linear weight $w_\star$. By interpolating with a noninformative $\textit{chance classifier}$, we construct a well-calibrated predictor whose interpolation weight depends on the angle $\angle(\hat{w}, w_\star)$ between the estimator $\hat{w}$ and the true linear weight $w_\star$. We establish that this angular calibration approach is provably well-calibrated in a high-dimensional regime where the number of samples and features both diverge, at a comparable rate. The angle $\angle(\hat{w}, w_\star)$ can be consistently estimated. Furthermore, the resulting predictor is uniquely $\textit{Bregman-optimal}$, minimizing the Bregman divergence to the true label distribution within a suitable class of calibrated predictors. Our work is the first to provide a calibration strategy that satisfies both calibration and optimality properties provably in high dimensions. Additionally, we identify conditions under which a classical Platt-scaling predictor converges to our Bregman-optimal calibrated solution. Thus, Platt-scaling also inherits these desirable properties provably in high dimensions.
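As background for the Platt-scaling statement, classical Platt scaling fits a two-parameter logistic map from a classifier's raw scores to probabilities. The sketch below is a plain maximum-likelihood version fitted with Newton's method. It omits the smoothed targets of Platt's original recipe and is not the angular calibration method of the paper; the function names are assumptions made for illustration.

```python
import numpy as np

def fit_platt_scaling(scores, labels, num_iters=25):
    """Fit P(y = 1 | s) = sigmoid(a * s + b) by Newton's method on the log loss."""
    X = np.column_stack([scores, np.ones_like(scores)])
    theta = np.zeros(2)  # theta = (a, b)
    for _ in range(num_iters):
        p = 1.0 / (1.0 + np.exp(-(X @ theta)))
        grad = X.T @ (p - labels)                     # gradient of the log loss
        hess = (X * (p * (1.0 - p))[:, None]).T @ X   # Hessian of the log loss
        theta -= np.linalg.solve(hess, grad)
    return theta

def calibrated_probability(scores, theta):
    """Map raw scores to calibrated probabilities with the fitted (a, b)."""
    return 1.0 / (1.0 + np.exp(-(theta[0] * scores + theta[1])))
```

In practice the fit would be done on held-out scores, for example `scores = X_val @ w_hat`, so that the calibration map is not tuned on the training data.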


2505.08908 2026-06-09 math.ST cs.LG econ.TH stat.TH 版本更新

Statistical Decision Theory with Counterfactual Loss

具有反事实损失的统计决策理论

Benedikt Koch, Kosuke Imai

发表机构 * Harvard University（哈佛大学）

AI总结针对经典统计决策理论忽略反事实信息的问题，提出在强可忽略性下反事实风险可识别当且仅当损失函数在潜在结果上可加，并证明可加反事实损失能捕捉决策难度，通过符号线性逆规划无需数据即可判断可识别性。

详情

AI中文摘要

许多研究者应用经典统计决策理论来评估治疗选择和学习最优策略。然而，由于该框架仅依赖于所选行动下的实现结果而忽略反事实，它无法在单位层面评估决策相对于可行替代方案的质量，而这在某些设置中是一个重要要求。例如，在审前保释决策中，法官必须平衡释放后的犯罪预防与对被捕者施加不必要负担的风险。该框架中的一个核心挑战是可识别性：由于每个单位仅观测到一个潜在结果，反事实风险通常不可识别。我们证明，在强可忽略性下，反事实风险可识别当且仅当损失函数在潜在结果上可加。我们进一步证明，当存在两个以上的治疗选项时，可加反事实损失可以产生与基于标准损失不同的治疗推荐。我们表明，可加反事实损失不仅捕捉决策准确性，还捕捉决策难度，而标准损失仅反映准确性。最后，我们引入一个符号线性逆规划，无需数据即可确定给定的反事实损失是否产生可识别的风险。

英文摘要

Many researchers apply classical statistical decision theory to evaluate treatment choices and learn optimal policies. However, because this framework relies solely on realized outcomes under chosen actions and ignores counterfactuals, it cannot assess the quality of a decision relative to feasible alternatives at the unit level, which is an important requirement in some settings. For example, in pretrial bail decisions, a judge must balance crime prevention upon release against the risk of imposing unnecessary burdens on arrestees. A central challenge in this framework is identification: since only one potential outcome is observed per unit, counterfactual risk is typically not identifiable. We show that, under strong ignorability, counterfactual risk is identifiable if and only if the loss is additive in the potential outcomes. We further demonstrate that additive counterfactual losses can yield treatment recommendations that differ from those based on standard losses when more than two treatment options are available. We show that additive counterfactual losses capture not only decision accuracy but also decision difficulty, whereas standard losses reflect accuracy alone. Finally, we introduce a symbolic linear inverse program that determines whether a given counterfactual loss yields an identifiable risk, without requiring data.


2509.07779 2026-06-09 math.OC cs.LG cs.MA 版本更新

Decentralized Online Riemannian Optimization Beyond Hadamard Manifolds

超越哈达玛流形的去中心化在线黎曼优化

Emre Sahinoglu, Shahin Shahrampour

发表机构 * Department of Mechanical & Industrial Engineering at Northeastern University（美国东北大学机械与工业工程系）

AI总结针对可能具有正曲率的流形，提出曲率感知的黎曼共识步骤，实现去中心化在线黎曼梯度下降算法，并证明O(√T)遗憾界。

详情

AI中文摘要

我们研究在可能具有正曲率的流形上的去中心化在线黎曼优化，超越了哈达玛流形设定。去中心化优化技术依赖于共识步骤，该步骤在欧几里得空间中因其线性性质而被充分理解。然而，在正曲率黎曼空间中，一个主要的技术挑战是测地距离可能不诱导全局凸结构。在这项工作中，我们首先分析了一个曲率感知的黎曼共识步骤，该步骤使得在哈达玛流形之外也能实现线性收敛。基于此步骤，我们为去中心化在线黎曼梯度下降算法建立了$O(\sqrt{T})$遗憾界。然后，我们研究了双点bandit反馈设置，其中我们使用平滑技术采用计算高效的梯度估计器，并通过平滑目标的次凸性分析证明了相同的$O(\sqrt{T})$遗憾界。

英文摘要

We study decentralized online Riemannian optimization over manifolds with possibly positive curvature, going beyond the Hadamard manifold setting. Decentralized optimization techniques rely on a consensus step that is well understood in Euclidean spaces because of their linearity. However, in positively curved Riemannian spaces, a main technical challenge is that geodesic distances may not induce a globally convex structure. In this work, we first analyze a curvature-aware Riemannian consensus step that enables linear convergence beyond Hadamard manifolds. Building on this step, we establish an $O(\sqrt{T})$ regret bound for the decentralized online Riemannian gradient descent algorithm. Then, we investigate the two-point bandit feedback setup, where we employ computationally efficient gradient estimators using smoothing techniques, and we demonstrate the same $O(\sqrt{T})$ regret bound through the subconvexity analysis of smoothed objectives.


2510.12744 2026-06-09 stat.ML cs.LG math.ST stat.CO stat.ME stat.TH 版本更新

Dendrograms of Mixing Measures for Softmax-Gated Gaussian Mixture of Experts: Consistency Without Model Sweeps

Softmax门控高斯混合专家的混合测度树状图：无需模型扫描的一致性

Do Tien Hai, Trung Nguyen Mai, TrungTin Nguyen, Nhat Ho, Binh T. Nguyen, Christopher Drovandi

发表机构 * Faculty of Mathematics and Computer Science, University of Science, Ho Chi Minh City, Vietnam（越南胡志明市科学大学数学与计算机科学学院）； Vietnam National University Ho Chi Minh City, Vietnam（越南胡志明市国家大学）； Faculty of Information Technology, University of Science, Ho Chi Minh City, Vietnam（越南胡志明市科学大学信息技术学院）； ARC Centre of Excellence for the Mathematical Analysis of Cellular Systems（澳大利亚研究理事会细胞系统数学分析卓越中心）； School of Mathematical Sciences, Queensland University of Technology, Brisbane City, Australia（澳大利亚布里斯班昆士兰科技大学数学科学学院）； Department of Statistics and Data Science, University of Texas at Austin, Austin, USA（美国德克萨斯大学奥斯汀分校统计与数据科学系）

AI总结针对softmax门控高斯混合专家模型，提出基于Voronoi损失函数的统一统计框架，解决参数非可识别性和模型选择问题，并引入混合测度树状图实现一致且无需多尺寸训练的专家数选择。

Comments Do Tien Hai, Trung Nguyen Mai, and TrungTin Nguyen are co-first authors. In Proceedings of The 29th International Conference on Artificial Intelligence and Statistics, AISTATS 2026 Spotlight, Acceptance rate 2.5% over 2102 submissions

详情

AI中文摘要

我们为softmax门控高斯混合专家（SGMoE）开发了一个统一的统计框架，解决了参数估计和模型选择中三个长期存在的障碍：（i）门控参数在公共平移下的非可识别性，（ii）内在的门控-专家交互导致似然中耦合的微分关系，以及（iii）softmax诱导的条件密度中紧密的分子-分母耦合。我们的方法引入了与门划分几何对齐的Voronoi型损失函数，并建立了最大似然估计（MLE）的有限样本收敛速率。在过指定模型中，我们揭示了MLE收敛速率与刻画接近非可识别方向的多项式方程组可解性之间的联系。对于模型选择，我们将混合测度的树状图适配到SGMoE，产生一个一致且无需扫描的专家数选择器，在过拟合下达到逐点最优的参数速率，同时避免多尺寸训练。在合成数据上的模拟验证了理论，准确恢复了专家数量并达到了参数估计的预测速率，同时紧密逼近回归函数。在模型误指定下（例如，$\epsilon$-污染），树状图选择准则具有鲁棒性，恢复了真实的混合成分数量，而Akaike信息准则、贝叶斯信息准则和集成完全似然在样本量增大时倾向于过选择。在一个干旱响应性状的玉米蛋白质组学数据集上，我们的树状图引导的SGMoE选择了两个专家，揭示了清晰的混合测度层次结构，较早地使似然趋于稳定，并产生了可解释的基因型-表型图谱，在无需多尺寸训练的情况下优于标准准则。

英文摘要

We develop a unified statistical framework for softmax-gated Gaussian mixture of experts (SGMoE) that addresses three long-standing obstacles in parameter estimation and model selection: (i) non-identifiability of gating parameters up to common translations, (ii) intrinsic gate-expert interactions that induce coupled differential relations in the likelihood, and (iii) the tight numerator-denominator coupling in the softmax-induced conditional density. Our approach introduces Voronoi-type loss functions aligned with the gate-partition geometry and establishes finite-sample convergence rates for the maximum likelihood estimator (MLE). In over-specified models, we reveal a link between the MLE's convergence rate and the solvability of an associated system of polynomial equations characterizing near-nonidentifiable directions. For model selection, we adapt dendrograms of mixing measures to SGMoE, yielding a consistent, sweep-free selector of the number of experts that attains pointwise-optimal parameter rates under overfitting while avoiding multi-size training. Simulations on synthetic data corroborate the theory, accurately recovering the expert count and achieving the predicted rates for parameter estimation while closely approximating the regression function. Under model misspecification (e.g., $ε$-contamination), the dendrogram selection criterion is robust, recovering the true number of mixture components, while the Akaike information criterion, the Bayesian information criterion, and the integrated completed likelihood tend to overselect as sample size grows. On a maize proteomics dataset of drought-responsive traits, our dendrogram-guided SGMoE selects two experts, exposes a clear mixing-measure hierarchy, stabilizes the likelihood early, and yields interpretable genotype-phenotype maps, outperforming standard criteria without multi-size training.


2601.16510 2026-06-09 cs.MS cs.LG math.OC 版本更新

Learning to Optimize by Differentiable Programming

通过可微编程学习优化

Liping Tao, Xindi Tong, Chee Wei Tan

发表机构 * Nanyang Technological University（南洋理工大学）

AI总结本教程介绍利用可微编程学习设计一阶优化算法，通过端到端训练提升收敛性和解质量，并基于Fenchel-Rockafellar对偶性展示ADMM和PDHG等算法的学习与适应。

2602.02431 2026-06-09 stat.ML cs.LG 版本更新

Full-Batch Gradient Descent Outperforms One-Pass SGD: Sample Complexity Separation in Single-Index Learning

全批量梯度下降优于单次SGD：单索引学习中的样本复杂度分离

Filip Kovačević, Hong Chang Ji, Denny Wu, Mahdi Soltanolkotabi, Marco Mondelli

发表机构 * Institute of Science and Technology Austria（奥地利科学与技术研究所）； Sung Kyun Kwan University（成均馆大学）； New York University and Flatiron Institute（纽约大学和Flatiron研究所）； University of Southern California（南加州大学）

AI总结研究单索引学习中全批量GD与单次SGD的样本复杂度差异，发现通过截断激活函数，全批量GD在n≃d样本时实现弱恢复，优于单次SGD的n≳d log d样本需求。

Comments Accepted to ICML 2026

详情

AI中文摘要

传统观点认为，多次重用训练数据可以提高基于梯度的学习的统计效率。虽然这一现象在线性回归中已被广泛研究，但在非线性和非凸设置中，除了前两次数据传递实现的损失修改机制外，多遍梯度下降（GD，重用所有数据）相对于单遍随机梯度下降（在线SGD，每个数据点仅使用一次）的优势尚未得到充分理解。在这项工作中，我们考虑学习一个具有二次激活函数的$d$维单索引模型，已知单次SGD需要$n\gtrsim d\log d$个样本才能实现弱恢复。我们首先证明，对于相关损失上的全批量球面GD，样本复杂度中的$\log d$因子仍然存在；然而，通过简单地截断激活函数，全批量GD在$n \simeq d$个样本时展现出有利的优化景观，从而在统计效率上优于单次SGD（使用相同的激活函数）。我们通过从微小初始化开始的平方损失上全批量GD的轨迹分析补充了这一结果，表明$n \gtrsim d$个样本和$T \gtrsim\log d$个梯度步足以实现强（精确）恢复。

英文摘要

It is folklore that reusing training data more than once can improve the statistical efficiency of gradient-based learning. While this phenomenon has been extensively studied in linear regression, the benefit of multi-pass gradient descent (GD, which reuses all the data) over one-pass stochastic gradient descent (online SGD, which uses each data point only once) is not well-understood in nonlinear and non-convex settings, except for a loss modification mechanism achieved by the first two passes on the data. In this work, we consider learning a $d$-dimensional single-index model with a quadratic activation, for which it is known that one-pass SGD requires $n\gtrsim d\log d$ samples to achieve weak recovery. We first show that this $\log d$ factor in the sample complexity persists for full-batch spherical GD on the correlation loss; however, by simply truncating the activation, full-batch GD exhibits a favorable optimization landscape at $n \simeq d$ samples, thereby outperforming one-pass SGD (with the same activation) in statistical efficiency. We complement this result with a trajectory analysis of full-batch GD on the squared loss from small initialization, showing that $n \gtrsim d$ samples and $T \gtrsim\log d$ gradient steps suffice to achieve strong (exact) recovery.


2602.03682 2026-06-09 stat.ML cs.DC cs.LG cs.NA math.NA 版本更新

Improved Analysis of the Accelerated Noisy Power Method with Applications to Decentralized PCA

加速噪声幂方法的改进分析及其在分布式PCA中的应用

Pierre Aguié, Mathieu Even, Laurent Massoulié

发表机构 * École Polytechnique Fédérale de Lausanne（洛桑联邦理工学院）

AI总结本文改进了加速噪声幂方法的分析，在更宽松的扰动条件下保持加速收敛速率，并首次提出具有可证明加速收敛的去中心化PCA算法。

详情

AI中文摘要

我们分析了加速噪声幂方法，这是一种在仅有不精确矩阵-向量乘积可用的情况下进行主成分分析的算法，例如在去中心化PCA中可能出现的情况。虽然先前的工作已经证明，与标准噪声幂方法相比，加速可以改善收敛速度，但这些保证需要对扰动幅度进行过度严格的上界限制，限制了其实用性。我们提供了该算法的改进分析，在更温和的扰动条件下保持了加速收敛速率。我们证明我们的新分析在最坏情况下是最优的，即收敛速率无法进一步提高，并且我们推导的噪声条件在不牺牲收敛保证的情况下无法放宽。我们通过推导一种用于去中心化PCA的加速算法来展示我们结果的实际相关性，该算法具有与非加速方法相似的通信成本。据我们所知，这是第一个具有可证明加速收敛的去中心化PCA算法。

英文摘要

We analyze the Accelerated Noisy Power Method, an algorithm for Principal Component Analysis in the setting where only inexact matrix-vector products are available, which can arise for instance in decentralized PCA. While previous works have established that acceleration can improve convergence rates compared to the standard Noisy Power Method, these guarantees require overly restrictive upper bounds on the magnitude of the perturbations, limiting their practical applicability. We provide an improved analysis of this algorithm, which preserves the accelerated convergence rate under much milder conditions on the perturbations. We show that our new analysis is worst-case optimal, in the sense that the convergence rate cannot be improved, and that the noise conditions we derive cannot be relaxed without sacrificing convergence guarantees. We demonstrate the practical relevance of our results by deriving an accelerated algorithm for decentralized PCA, which has similar communication costs to non-accelerated methods. To our knowledge, this is the first decentralized algorithm for PCA with provably accelerated convergence.
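A minimal sketch of a momentum-accelerated power iteration with inexact matrix-vector products may help fix ideas. It assumes the heavy-ball recurrence `x_next = A @ x + noise - beta * x_prev` with additive Gaussian perturbations; the exact algorithm, noise model, and conditions analyzed in the paper may differ, and the function name and parameter values are hypothetical.

```python
import numpy as np

def accelerated_noisy_power_method(A, beta, noise_std, num_iters, seed=0):
    """Estimate the top eigenvector of symmetric A from noisy products A @ x."""
    rng = np.random.default_rng(seed)
    d = A.shape[0]
    x_prev = np.zeros(d)
    x = rng.standard_normal(d)
    x /= np.linalg.norm(x)
    for _ in range(num_iters):
        # Inexact matrix-vector product: exact product plus a small perturbation.
        noisy_product = A @ x + noise_std * rng.standard_normal(d)
        x_next = noisy_product - beta * x_prev  # momentum (heavy-ball) correction
        # Rescale both iterates by the same factor so the recurrence is preserved.
        scale = np.linalg.norm(x_next)
        x_prev = x / scale
        x = x_next / scale
    return x
```

In the accelerated power iteration literature the momentum is usually tuned near `lambda_2 ** 2 / 4`, where `lambda_2` is the second-largest eigenvalue; setting `beta = 0` recovers the plain noisy power method.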


2602.04402 2026-06-09 stat.ML cs.AI cs.CY cs.LG math.ST stat.TH 版本更新

Performative Learning Theory

表现性学习理论

Julian Rodemann, Unai Fischer-Abaigar, James Bailie, Krikamol Muandet

发表机构 * University of Cambridge（剑桥大学）

AI总结将表现性预测嵌入统计学习理论，证明在样本和总体表现性效应下的泛化界，揭示模型影响数据越多则学习越少的权衡，并提出通过再训练改善泛化保证。

Comments ICML 2026. v2: corrected typo in author list; v3: added explanation of condition 3.2, modified condition 3.3 and fixed lemma 3.4, added examples and explanations in sections 2, 5, and 6

详情

AI中文摘要

表现性预测会影响它们试图预测的结果。我们研究影响样本（例如，仅限现有应用用户）和/或整个总体（例如，所有潜在应用用户）的表现性预测。这引发了模型在表现性下泛化能力的问题。例如，当现有用户和新用户都对应用的预测做出反应时，我们基于现有用户对新用户能得出多好的见解？我们通过将表现性预测嵌入统计学习理论来解决这个问题。我们证明了在样本、总体以及两者共同影响下的泛化界。我们证明背后的一个关键直觉是，在最坏情况下，总体否定预测，而样本欺骗性地实现预测。我们分别将这种自我否定和自我实现的预测表述为Wasserstein空间中的最小-最大和最小-最小风险泛函。我们的分析揭示了表现性地改变世界与从中学习之间的基本权衡：模型对数据的影响越大，它能从数据中学到的就越少。此外，我们的分析得出一个令人惊讶的见解：通过对表现性扭曲的样本进行再训练，可以改善泛化保证。我们通过一个案例研究说明了我们的界，该案例涉及基于预测的德国失业居民工作培训分配，利用了德国1975年至2017年的行政劳动力市场记录。

英文摘要

Performative predictions influence the very outcomes they aim to forecast. We study performative predictions that affect a sample (e.g., only existing users of an app) and/or the whole population (e.g., all potential app users). This raises the question of how well models generalize under performativity. For example, how well can we draw insights about new app users based on existing users when both of them react to the app's predictions? We address this question by embedding performative predictions into statistical learning theory. We prove generalization bounds under performative effects on the sample, on the population, and on both. A key intuition behind our proofs is that in the worst case, the population negates predictions, while the sample deceptively fulfills them. We cast such self-negating and self-fulfilling predictions as min-max and min-min risk functionals in Wasserstein space, respectively. Our analysis reveals a fundamental trade-off between performatively changing the world and learning from it: the more a model affects data, the less it can learn from it. Moreover, our analysis results in a surprising insight on how to improve generalization guarantees by retraining on performatively distorted samples. We illustrate our bounds in a case study on prediction-informed assignments of unemployed German residents to job trainings, drawing upon administrative labor market records from 1975 to 2017 in Germany.


2604.26993 2026-06-09 math.NA cs.LG cs.NA math.OC 版本更新

State-Dependent Lyapunov Analysis of Rank-1 Matrix Factorization

秩1矩阵分解的状态依赖Lyapunov分析

Jaehong Moon

发表机构 * Industrial & Enterprise Systems Engineering, University of Illinois at Urbana-Champaign（伊利诺伊大学厄巴纳-香槟分校工业与企业系统工程系）

AI总结本文通过状态依赖Lyapunov视角研究梯度下降在秩1矩阵分解中的收敛性，提出参数化二次证书I(δ;·)，证明在临界步长以下收敛到全局极小值，临界步长以上则进入平衡终端状态并表现出周期2行为。

详情

AI中文摘要

我们通过状态依赖Lyapunov视角研究梯度下降在秩1矩阵分解中的收敛性。核心对象是一个参数化二次证书$I(\delta;\,\cdot)$，其边界内向性质诱导出单调的状态参数$\delta_t$，从而证明轨迹被限制在一族不断收缩的水平集内。对于经证书认证的初始化，当步长低于临界步长时，此机制证明收敛到全局极小值点。当步长高于临界步长时，相同的单调状态机制则导致平衡的终端状态；对于一系列超过临界值的步长，约化动力学表现出周期2行为，与稳定性边缘现象一致。我们进一步表明，标量证书并非随意的代数构造：在结构公理和自然的状态-参数归一化下，它由单调性机制唯一确定。数值实验表明，这种状态依赖Lyapunov机制在已证明的情形之外也持续存在，包括二维秩1近似和标量分解的四次扩展。

英文摘要

We study gradient descent for rank-1 matrix factorization through a state-dependent Lyapunov perspective. The central object is a parameterized quadratic certificate $I(δ;\,\cdot)$ whose boundary-inward property induces a monotone state parameter $δ_t$, thereby certifying that the trajectory is confined to a shrinking family of level sets. For certified initializations below the critical step size, this mechanism proves convergence to global minimizers. Above the critical step size, the same monotone-state mechanism instead leads to a balanced terminal regime; for a range of post-critical step sizes, the reduced dynamics exhibit period-2 behavior consistent with edge-of-stability phenomena. We further show that the scalar certificate is not an ad hoc algebraic construction: under structural axioms and a natural state-parameter normalization, it is uniquely determined by the monotonicity mechanism. Numerical experiments suggest that this state-dependent Lyapunov mechanism persists beyond the proved cases, including two-dimensional rank-1 approximation and quartic augmentations of scalar factorization.


2605.25085 2026-06-09 cs.IT cs.AI cs.LG math.IT 版本更新

Polynomial Context-Truncation Sensitivity in Autoregressive Language Models: Sequential Wyner-Ziv Bounds for KV Cache Compression

自回归语言模型中的多项式上下文截断敏感性：KV缓存压缩的序列Wyner-Ziv界

Munsik Kim

发表机构 * Independent Researcher（独立研究者）

AI总结研究自回归语言模型中在线KV缓存压缩的率失真极限，将其建模为序列Wyner-Ziv信源编码，发现下一词分布对上下文截断的敏感性呈多项式衰减，并推导了仅后缀缓存策略的每词内存需求。

详情

AI中文摘要

我们研究了自回归语言模型中在线KV缓存压缩的率失真极限，将其建模为模型诱导滤流上的序列Wyner-Ziv信源编码，其中下一步查询作为解码器边信息。实验上，在涵盖两个系列、参数规模0.5-3B的四个模型中，我们发现下一词分布对上下文截断的敏感性呈多项式衰减而非几何衰减：幂律拟合的外推效果比指数拟合好一个数量级，拟合指数可由sink加近期（sink-plus-recent）KL测量独立恢复，并通过位置保持消融验证了衰减不受位置编码伪影影响。在相应的多项式截断敏感性假设下，我们的主要结果刻画了仅后缀缓存策略的每词内存需求：滑动窗口方案以窗口大小$w = O(\varepsilon^{-1/α})$达到失真$\varepsilon$，且在附加双边贝叶斯风险条件下，逆命题表明在该策略类内$w = \Omega(\varepsilon^{-1/α})$是必要的，因此仅后缀策略的缩放为$\Theta(\varepsilon^{-1/α})$。循环或传播缓存摘要能否超越此缩放留待进一步研究。一个显式的块马尔可夫方案达到上界；在附加前向衰减和正则性假设（仅由截断敏感性无法推出）下，其收敛速率指数与逆命题匹配，否则相差两倍。实验上，幂律预测了具体缓存策略的退化曲线：在同等预算下，基于最近性的驱逐（滑动窗口、sink加近期）相比随机保留将失真抑制约两个数量级，且失真随预算呈幂律衰减。

英文摘要

We study the rate-distortion limits of online KV cache compression in autoregressive language models, formulating it as sequential Wyner-Ziv source coding on the filtration induced by the model, with the next-step query as decoder side information. Empirically, across four models spanning two families and $0.5$-$3$B parameters, we find that the next-token distribution's sensitivity to context truncation decays \emph{polynomially} rather than \emph{geometrically}: a power law improves on an exponential fit by an order of magnitude in extrapolation, the fitted exponent is recovered independently from a sink-plus-recent KL measurement, and the decay is verified to be free of positional-encoding artifacts by a position-preserving ablation. Under a corresponding \emph{polynomial truncation-sensitivity} assumption, our main result characterizes the per-token memory requirement of \emph{suffix-only} cache policies: a sliding-window scheme attains distortion $\varepsilon$ with window $w = O(\varepsilon^{-1/α})$, and -- under an additional two-sided Bayes-risk condition -- a converse shows $w = Ω(\varepsilon^{-1/α})$ is necessary within this policy class, so the scaling is $Θ(\varepsilon^{-1/α})$ for suffix-only policies. Whether recurrent or propagating cache summaries can beat this scaling is left open. An explicit block-Markov scheme achieves the upper bound; its rate-of-convergence exponent matches the converse under additional forward-decay and regularity hypotheses (not implied by truncation sensitivity alone), and differs by a factor of two otherwise. Empirically, the polynomial law predicts the degradation curves of concrete cache policies: recency-based eviction (sliding, sink-plus-recent) suppresses distortion by roughly two orders of magnitude over random retention at equal budget, with a power-law decay in the budget.


2605.26703 2026-06-09 econ.TH cs.GT cs.LG stat.ML 版本更新

Proper Calibeating

Dean P. Foster, Sergiu Hart

发表机构 * Department of Statistics, Wharton, University of Pennsylvania, Philadelphia, and Amazon, New York（费城宾夕法尼亚大学沃顿商学院统计系，以及纽约亚马逊公司）； Institute of Mathematics, Department of Economics, and Federmann Center for the Study of Rationality, The Hebrew University of Jerusalem（耶路撒冷希伯来大学数学研究所、经济系及Federmann理性研究中心）

AI总结本文将经典校准预测和calibeating概念扩展到恰当评分规则（proper scoring rules），定义proper-calibration和proper-calibeating，证明校准蕴含proper-calibration而calibeating不一定蕴含proper-calibeating，展示如何保证proper-calibeating和proper-multicalibeating，并证明proper-calibration与不确定性决策中对预测最佳回应时通用无遗憾的等价性。

Comments v2: Updated section 6 "Decision Making Under Uncertainty"

2606.01342 2026-06-09 cs.DS cs.LG 版本更新

Towards Optimal Robustness in Learning-Augmented Paging

面向学习增强分页的最优鲁棒性

Peng Chen, Hailiang Zhao, Xueyan Tang, Yixuan Wang, Shuiguang Deng

发表机构 * Zhejiang University, Hangzhou, China ； Nanjing University of Aeronautics ； Nanyang Technological University, Singapore

AI总结本文提出一种新框架，通过相对预测预算原语，在学习增强分页中实现最优鲁棒性界 H_k + O(1)，并实验验证其实际性能。

Comments ICML 2026

详情

AI中文摘要

近年来，学习增强分页得到了广泛研究。与朴素基于机器学习的方法相比，一个关键优势是\textit{有界鲁棒性}，即使在预测不准确时也能保证最坏情况性能，这使得这些算法对实际系统有价值。先前工作在随机化设置中实现了 $2H_k + O(1)$ 的鲁棒性界，与最优竞争比 $H_k$ 存在差距。在本文中，我们研究如何缩小这一差距。我们首先回顾在线最优性，并证明最新的 $H_k$-竞争算法的一个新性质，这有助于我们在学习增强设置中的分析。然后，我们回顾现有的学习增强分页算法，并引入一个统一原语，即\textit{相对预测预算}，它捕捉了建立鲁棒性的本质，并揭示了先前算法要么过度使用要么未充分利用预测。在上述分析指导下，我们开发了一个新框架，实现了学习增强分页的最优鲁棒性（至多相差一个加法常数）：$H_k + O(1)$。实验进一步证明了强大的实际性能。

英文摘要

Learning-augmented paging has been extensively studied in recent years. A key advantage over naive ML-based approaches is \emph{bounded robustness}, which guarantees worst-case performance even when predictions are inaccurate, making these algorithms valuable for real-world systems. Prior work achieves robustness bounds of $2H_k + O(1)$ in the randomized setting, leaving a gap to the optimal competitive ratio $H_k$. In this paper, we study how to close this gap. We begin by reviewing online optimality and proving a new property of the latest $H_k$-competitive algorithm, which facilitates our analysis in the learning-augmented setting. Then, we review existing learning-augmented paging algorithms and introduce a unifying primitive, the \emph{relative prediction budget}, which captures the essence of establishing robustness and reveals that prior algorithms either overuse or underutilize predictions. Guided by the above analysis, we develop a new framework that achieves the best-possible robustness up to an additive constant for learning-augmented paging: $H_k + O(1)$. Experiments further demonstrate strong practical performance.


2606.07571 2026-06-09 cs.LG cs.AI 新提交

理论最小化的注意力机制：面向内存最优Transformer内核的数组数学框架

Lenore Mullin, Gaetan Hains

发表机构 * University at Albany（奥尔巴尼大学）； Université Paris-Est Créteil（巴黎东大学克雷泰伊分校）

AI总结提出基于数组数学（MoA）的缩放点积注意力重表述，通过代数构造消除所有中间数组，实现O(n dk + n dv)数据移动，相比标准实现O(n^2 + n dk + n dv)显著降低内存流量，并验证了数值精度。

详情

AI中文摘要

注意力机制是现代基于Transformer的AI中的主要计算瓶颈。其标准实现在序列长度~$n$上产生二次内存流量，而DRAM访问在当代硬件上比算术操作消耗100--1000$\times$更多的能量，因此任何仅关注FLOP计数的分析从根本上误解了瓶颈。我们提出了缩放点积注意力及其数值稳定softmax的数组数学（MoA）重表述，推导出指称范式（DNF），通过代数构造而非经验调优消除了所有中间数组——包括隐式转置键缓冲区和每个softmax临时变量。DNF实现了$O(n dk + n dv)$的数据移动，而标准实现为$O(n^2 + n dk + n dv)$，其中$n$是序列长度，$dk$是键维度，$dv$是值维度，并在具体输入上针对PyTorch全双精度浮点进行了数值验证。与硬件特定的加速器或经验性分块方案（如FlashAttention）不同，MoA从单一代数框架同时提供了数组融合、形状变换正确性和预测性成本模型。内存最小性是在编写任何代码之前就确立的定理。预测性性能模型预计加速2--100$\times$，能耗降低2--50$\times$，优势在超大规模下进一步扩大。该推导建立了一个从Python规范经过操作范式（ONF）和维度提升硬件映射的形式化验证流水线，提供了与DARPA边缘部署和DOE超大规模优先事项直接相关的性能可移植AI内核。

英文摘要

The attention mechanism is the dominant computational bottleneck in modern transformer-based AI. Its standard implementation incurs quadratic memory traffic in the sequence length~$n$, and DRAM accesses cost 100--1000$\times$ more energy than arithmetic operations on contemporary hardware, so any analysis focused solely on FLOP counts fundamentally mischaracterises the bottleneck. We present a Mathematics of Arrays (MoA) reformulation of scaled dot-product attention and its numerically stable softmax, deriving a Denotational Normal Form (DNF) that eliminates all intermediate arrays -- including the implicit transposed-key buffer and every softmax temporary -- by algebraic construction rather than empirical tuning. The DNF achieves $O(n d_k + n d_v)$ data movement versus $O(n^2 + n d_k + n d_v)$ for the standard implementation, where $n$ is the sequence length, $d_k$ is the key dimensionality and $d_v$ the value dimensionality, and is verified numerically against PyTorch at full double-precision floating-point on concrete inputs. Unlike hardware-specific accelerators or empirical tiling schemes such as FlashAttention, MoA simultaneously provides array fusion, shape-transformation correctness, and predictive cost models from a single algebraic framework. Memory minimality is a theorem established before any code is written. A predictive performance model projects $2$--$100\times$ speedup and $2$--$50\times$ energy reduction, with the advantage widening at exascale. The derivation establishes a formally verified pipeline from Python specification through Operational Normal Form (ONF) and dimension-lifted hardware mapping, providing performance-portable AI kernels of direct relevance to DARPA edge-deployment and DOE exascale priorities.
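To make the memory-traffic contrast concrete, the sketch below compares a naive attention that materialises the full n-by-n score matrix with a streaming version that keeps only a running maximum, a running denominator, and a length-d_v accumulator per query. This is the well-known online-softmax rescaling trick, shown purely for illustration; it is not the MoA/DNF derivation described in the abstract, and the function names are assumptions.

```python
import numpy as np

def naive_attention(Q, K, V):
    """Reference implementation that stores the full (n, n) score matrix."""
    scores = Q @ K.T / np.sqrt(Q.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V

def streaming_attention(Q, K, V):
    """Same output, but no (n, n) intermediate: keys are consumed one at a time."""
    n, dk = Q.shape
    dv = V.shape[1]
    out = np.zeros((n, dv))
    scale = 1.0 / np.sqrt(dk)
    for i in range(n):
        running_max = -np.inf
        denom = 0.0
        acc = np.zeros(dv)
        for j in range(K.shape[0]):
            s = scale * (Q[i] @ K[j])
            new_max = max(running_max, s)
            # Rescale earlier contributions to the new maximum (exp(-inf) = 0 at j = 0).
            correction = np.exp(running_max - new_max)
            p = np.exp(s - new_max)
            denom = denom * correction + p
            acc = acc * correction + p * V[j]
            running_max = new_max
        out[i] = acc / denom
    return out
```

The explicit Python loops are for clarity only; a practical kernel would process keys in blocks, but the per-query state stays independent of the sequence length.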


2606.07878 2026-06-09 cs.LG 新提交

Still: Amortized KV Cache Compaction in a Single Forward Pass

Still: 单次前向传递中的摊销KV缓存压缩

Charles O'Neill, Alex Sandomirsky, Harry Partridge, Mudith Jayasekara, Max Kirkby

发表机构 * Baseten

AI总结提出Still方法，通过单次前向传递的轻量级Perceiver层实现KV缓存压缩，在8×至200×压缩比和8k至128k上下文长度下兼顾速度与质量，长上下文任务超越最强基线8-22分。

详情

AI中文摘要

KV缓存是长时语言模型部署的内存瓶颈。实际上，可部署的压缩器必须足够轻量以便在推理时调用，足够表达以在约束下保留上下文，并且可跨轨迹重用。现有压缩方法仅满足部分要求：选择方法轻量但受限于子集，而合成方法表达性强但依赖于逐上下文优化。这里我们介绍Still，一个小的逐层Perceiver，针对冻结的基础模型训练一次，在单次前向传递中生成紧凑的键和值。在Qwen和Gemma模型上，Still在压缩比从$8\times$到$200\times$、上下文长度从8k到128k的范围内，占据了速度-质量前沿的有利位置。在长上下文RULER网格上，Still超过最强基线8-22分。相同的紧凑缓存还支持自由形式的摘要，在HELMET上保留了大部分全上下文增益，并在LongBench摘要比较中胜过KV-Distill。由于压缩是一次前向传递，Still可以迭代应用，进入逐上下文方法无法实现的长期场景。我们表明，摊销使长上下文缓存压缩变得可行，而合成使其紧凑状态在极端压缩下有用。

英文摘要

The KV cache is the memory bottleneck of long-horizon language model deployment. Practically, a deployable compactor must be lightweight enough to call during inference, expressive enough to preserve context under constraint, and reusable across a trajectory. Existing compaction methods satisfy only part of this requirement: selection methods are lightweight but subset-bound, while synthesis methods are expressive but rely on per-context optimization. Here we introduce Still, a small per-layer Perceiver trained once against a frozen base model that produces compact keys and values in a single forward pass. On Qwen and Gemma models, Still occupies the favorable side of the speed--quality frontier across compression ratios from $8\times$ to $200\times$ and context lengths from $8$k to $128$k. On the long-context RULER grid, Still exceeds the strongest baseline by 8--22 points. The same compact cache also supports free-form summarization, preserving most of the full-context gain on HELMET and winning a pairwise LongBench summarization comparison against KV-Distill. Because compaction is a forward pass, Still can be applied iteratively, entering a long-horizon regime unavailable to per-context methods. We show that amortization makes long-context cache compaction tractable, and synthesis makes its compact state useful at extreme compression.

2606.07954 2026-06-09 cs.LG cs.AI 新提交

Minibatch Selection via Partition Matroid Constrained Gradient Matching

基于划分拟阵约束梯度匹配的小批量选择

Prayas Agrawal, Prateek Chanda, Ishita Khatri, Ganesh Ramakrishnan, Bamdev Mishra, Pratik Jawanpuria

发表机构 * Indian Institute of Technology Bombay（印度理工学院孟买分校）； Department of Computer Science and Engineering（计算机科学与工程系）； Centre for Machine Intelligence and Data Science（机器智能与数据科学中心）； Microsoft Research India（微软印度研究院）； Microsoft India（微软印度）

AI总结提出PartitionSel方法，通过划分拟阵约束下的梯度匹配效用最大化，实现跨域小批量选择，减少冗余并提升训练兼容性，在LLM微调中取得鲁棒性提升。

Comments 28 pages, 12 figures, ICML 2026

详情

Journal ref: Proceedings of the 43rd International Conference on Machine Learning (ICML 2026), Seoul, South Korea, PMLR 306, 2026

AI中文摘要

在异构数据上训练大型语言模型（LLMs）需要选择能够平衡收敛速度与跨领域覆盖的小批量。现有方法要么在每个领域内独立选择样本，要么依赖计算昂贵的代理模型来学习连续的领域权重。我们提出PartitionSel，一种跨领域小批量选择方法，它在每个领域的预算（编码为划分拟阵约束）下最大化验证引导的梯度匹配效用。通过单一效用耦合每个领域的预算，PartitionSel旨在减少跨领域选择中的冗余。所提出的目标是弱子模的，并允许使用正交匹配追踪算法，具有可证明的近似保证。在实验中，我们在MetaMathQA和Mol-Instructions上对Qwen2.5和Llama-3进行微调时，评估了PartitionSel的小批量选择。PartitionSel在两个基准测试中均比每个领域和领域无关的基线获得了鲁棒的提升。它还减少了每个批次内冲突梯度对的数量，表明跨领域耦合转化为更兼容的训练更新。

英文摘要

Training large language models (LLMs) on heterogeneous data requires selecting minibatches that balance convergence speed with coverage across domains. Existing methods either select samples independently within each domain or rely on computationally expensive proxy models to learn continuous domain weights. We propose PartitionSel, a cross-domain minibatch selection approach that maximizes a validation-guided gradient-matching utility under per-domain budgets encoded as a partition-matroid constraint. By coupling the per-domain budgets through a single utility, PartitionSel is designed to reduce redundancy in selections across domains. The proposed objective is weakly submodular and admits an orthogonal matching pursuit algorithm with provable approximation guarantees. Empirically, we evaluate PartitionSel for minibatch selection during the fine-tuning of Qwen2.5 and Llama-3 on MetaMathQA and Mol-Instructions. PartitionSel achieves robust gains over per-domain and domain-agnostic baselines on both benchmarks. It also reduces the number of conflicting gradient pairs within each batch, indicating that the cross-domain coupling translates into more compatible training updates.

2606.08382 2026-06-09 cs.LG cs.AI 新提交

STAR-KV: Low-Rank KV Cache Compression via Soft Thresholding for Adaptive Rank Control

STAR-KV：通过软阈值实现自适应秩控制的低秩KV缓存压缩

Priyansh Bhatnagar, Ashkan Moradifirouzabadi, Se-Hyun Yang, SeungJae Lee, Jungwook Choi, Mingu Kang

发表机构 * University of Washington（华盛顿大学）

AI总结提出STAR-KV框架，通过可微阈值机制实现注意力头和块级别的自适应秩选择，结合混合分解和低秩感知混合精度量化，在多种LLM上达到75%的KV缓存压缩，结合量化可减少20倍，并实现6.9倍注意力模块加速和3.1倍端到端生成吞吐提升。

详情

AI中文摘要

低秩投影通过利用隐藏维度冗余已成为压缩KV缓存的一种有前景的方法。然而，先前的方法依赖于固定或启发式秩选择，难以在最小精度损失下实现激进压缩。我们提出STAR-KV，一种具有细粒度秩控制的自适应低秩KV缓存压缩框架。STAR-KV包括：1）可微阈值机制，可在注意力头和块级别实现最优秩选择；2）混合分解策略，根据键和值投影的敏感性应用不同的低秩分解；3）低秩感知混合精度量化，利用数据统计实现近乎无损的低比特量化。在多个LLM和基准测试中评估，STAR-KV实现了高达75%的KV缓存压缩，结合量化可实现高达20倍的整体KV缓存减少。通过基于Triton的自定义GPU内核，STAR-KV为注意力模块提供高达6.9倍的加速，端到端生成吞吐量提升3.1倍。我们的代码公开在：https://github.com/PriyanshBhatnagar/STAR-KV。

英文摘要

Low-rank projection has emerged as a promising approach for compressing the KV cache by exploiting hidden-dimension redundancy. However, prior methods rely on fixed or heuristic rank selection and struggle to achieve aggressive compression with minimal accuracy degradation. We propose STAR-KV, an adaptive low-rank KV cache compression framework with fine-grained rank control. STAR-KV encompasses 1) a differentiable thresholding mechanism that enables optimal rank selection at both attention-head and block levels, 2) a hybrid decomposition strategy that applies different low-rank factorizations according to the sensitivity of key and value projections, and 3) a low-rank-aware mixed precision quantization that leverages data statistics for near lossless low-bit quantization. Evaluated across multiple LLMs and benchmarks, STAR-KV achieves up to 75% KV cache compression and up to 20x overall KV cache reduction when combined with quantization. Enabled by custom Triton-based GPU kernels, STAR-KV delivers up to 6.9x speedup for the attention module and 3.1x end-to-end generation throughput. Our code is publicly available at: https://github.com/PriyanshBhatnagar/STAR-KV.

2606.08446 2026-06-09 cs.LG cs.AI 新提交

Sparrow: Sparse Rollout for Stable and Efficient Long-context RL of Large Language Models

Sparrow: 用于大语言模型稳定高效长上下文强化学习的稀疏 rollout

Yang Zhou, Ranajoy Sadhukhan, Zhaofeng Sun, Zhuoming Chen, Souvik Kundu, Saket Dingliwal, Sai Muralidhar Jayanthi, Aram Galstyan, Haizhong Zheng, Beidi Chen

发表机构 * Carnegie Mellon University（卡内基梅隆大学）； Cornell University（康奈尔大学）； Intel（英特尔）； Amazon AGI（亚马逊AGI）

AI总结针对RLVR中长上下文rollout计算昂贵的问题，提出Sparrow方法，通过动态稀疏度调度保持token级策略失配的下尾统计量稳定，在Qwen3系列模型上实现2.0-2.4倍加速，并推广到更大模型和编程领域。

详情

AI中文摘要

尽管强大，但带有可验证奖励的强化学习（RLVR）会诱导极长的思维链（COT），使其计算成本高昂。由于RLVR每步成本主要由长上下文rollout生成主导，稀疏注意力为加速密集rollout提供了一种有前景的方法。然而，稀疏rollout需要精细的稳定性-效率权衡：过于激进的稀疏性会导致崩溃，而过于宽松的稀疏性则加速不足。在这项工作中，我们通过稀疏到密集的演员-策略失配来研究这种权衡。我们首先观察到，稀疏rollout崩溃并非由token间的均匀退化驱动：即使在激进的稀疏性下，大多数稀疏token也能与密集token完美对齐。受此启发，我们假设如果每个token的演员-策略失配的下尾在整个轨迹中保持在临界阈值以上，则稀疏rollout训练保持稳定。我们引入一种动态稀疏度调度，在生成过程中保持该尾统计量恒定，并验证了我们的假设。在Qwen3思考族模型上，将尾失配统计量保持在一致阈值附近通常能实现稳定训练。然后，我们使用成本模型在该失配阈值下找到最大加速的稀疏度调度，在训练Qwen3-1.7B、Qwen3-4B和Qwen3-8B时分别实现了2.2倍、2.4倍和2.0倍的rollout加速。实验表明，这些阈值可推广到更大的模型（Qwen3-14B）和另一个RL领域（编程）。最后，我们的分析自然引出了DistillSparse：在稀疏rollout上进行轻量级基于LoRA的蒸馏，使更激进的稀疏性达到相同的稀疏到密集失配阈值，从而获得更高的加速。

英文摘要

Despite being powerful, reinforcement learning with verifiable rewards (RLVR) induces extremely long chains of thought (CoT), making it computationally expensive. Since RLVR per-step cost is dominated by long-context rollout generation, sparse attention offers a promising way to accelerate dense rollout. However, sparse rollouts require a delicate stability-efficiency tradeoff: overly aggressive sparsity causes collapse, while overly lenient sparsity gives insufficient speedup. In this work, we study this tradeoff through sparse-to-dense actor-policy mismatch. We first observe that sparse rollout collapse is not driven by uniform degradation across tokens: most sparse tokens align perfectly with their dense counterparts even under aggressive sparsity. Motivated by this, we hypothesize that sparse rollout training remains stable if the lower tail of per-token actor-policy mismatch stays above a critical threshold throughout the trajectory. We introduce a dynamic sparsity schedule that keeps this tail statistic constant during generation and validate our hypothesis. Across Qwen3 thinking-family models, keeping the tail mismatch statistic near a consistent threshold generally enables stable training. We then use a cost model to find the sparsity schedule for maximum speedup under this mismatch threshold, achieving 2.2x, 2.4x, and 2.0x rollout speedups when training Qwen3-1.7B, Qwen3-4B, and Qwen3-8B. Empirically, we show the thresholds generalize to a larger model (Qwen3-14B) and another RL domain (coding). Finally, our analysis naturally motivates DistillSparse: lightweight LoRA-based distillation on sparse rollout lets more aggressive sparsity reach the same sparse-to-dense mismatch threshold, yielding higher speedup.

2606.08565 2026-06-09 cs.LG cs.AI 新提交

EinSort: Sorting is All We Need for Tensorizing LLM

EinSort: 张量化大语言模型，排序即一切

Toshiaki Koike-Akino, Jing Liu, Ye Wang

发表机构 * Toshiaki Koike-Akino ； Jing Liu ； Ye Wang

AI总结提出EinSort方法，通过索引排序发现张量中的低秩结构，实现大语言模型权重和KV缓存的张量化压缩，相比基线方法提升了重构质量。

Comments 38 pages, 17 figures

2606.08574 2026-06-09 cs.LG cs.CV 新提交

OrderDP: A Theoretically Guaranteed Lossless Dynamic Data Pruning Framework

OrderDP：一种理论上保证无损的动态数据剪枝框架

Chenhan Jin, Shengze Xu, Qingsong Wang, Fan Jia, Dingshuo Chen, Tieyong Zeng

发表机构 * The Chinese University of Hong Kong（香港中文大学）； Beijing Normal-Hong Kong Baptist University（北京师范大学-香港浸会大学）； Guangzhou Nanfang College（广州南方学院）； Institute of Automation, Chinese Academy of Sciences（中国科学院自动化研究所）； Xiangtan University（湘潭大学）； University of Utah（犹他大学）

AI总结提出OrderDP框架，通过随机子集选取与top-q样本选择实现无偏梯度估计，提供收敛性和泛化性理论保证，在CIFAR和ImageNet上降低40%训练成本且保持精度。

Comments Published as a conference paper at ICLR 2026

详情

Journal ref: International Conference on Learning Representations (ICLR), 2026

AI中文摘要

数据剪枝（DP）作为一种常被提及的减轻训练负担的策略，根据定义明确的剪枝方法减少训练样本数量，同时力求实现近乎无损的性能。然而，现有方法通常选择信息量大的样本，与全数据集训练相比可能导致有偏的梯度估计。此外，这种偏差及其对最终性能的影响分析仍不明确。为解决这些问题，我们提出OrderDP，一个即插即用的框架，旨在获得稳定、无偏且近乎无损的训练加速，并具有理论保证。具体而言，OrderDP首先随机选择一个子集，然后选择前$q$个样本，其中相对于代理损失建立无偏性。这确保了OrderDP在代理目标方面进行无偏训练。我们进一步建立了收敛性和泛化性分析，阐明了OrderDP如何影响最优性能，并在保证最终性能的同时实现良好控制的加速。实验上，我们在CIFAR-10、CIFAR-100和ImageNet-1K上对OrderDP与全面基线进行了评估，展示了具有竞争力的精度、稳定的收敛和精确的控制——所有这些都通过更简单的设计和更快的运行时间实现，同时将训练成本降低超过40%。我们的方法兼具强性能和计算效率，为数据高效学习提供了一个稳健且易于适应的工具。代码公开于https://github.com/shengze-xu/OrderDP。

英文摘要

Data pruning (DP), as an oft-stated strategy to alleviate heavy training burdens, reduces the volume of training samples according to a well-defined pruning method while striving for near-lossless performance. However, existing approaches, which commonly select highly informative samples, can lead to biased gradient estimation compared to full-dataset training. Furthermore, the analysis of this bias and its impact on final performance remains ambiguous. To address these challenges, we propose OrderDP, a plug-and-play framework that aims to obtain stable, unbiased, and near-lossless training acceleration with theoretical guarantees. Specifically, OrderDP first randomly selects a subset and then chooses the top-$q$ samples, where unbiasedness is established with respect to a surrogate loss. This ensures that OrderDP conducts unbiased training in terms of the surrogate objective. We further establish convergence and generalization analyses, elucidating how OrderDP affects optimal performance and enables well-controlled acceleration while ensuring guaranteed final performance. Empirically, we evaluate OrderDP against comprehensive baselines on CIFAR-10, CIFAR-100, and ImageNet-1K, demonstrating competitive accuracy, stable convergence, and exact control -- all with a simpler design and faster runtime, while reducing training cost by over 40%. Delivering both strong performance and computational efficiency, our method serves as a robust and easily adaptable tool for data-efficient learning. The code is publicly available at https://github.com/shengze-xu/OrderDP.
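The two-stage selection described above (a random subset, then the top-$q$ samples within it) can be sketched as follows. This is a minimal illustration under stated assumptions: ranking by per-sample loss is an assumption, since the abstract does not say which score defines "top", and the surrogate loss that underpins the unbiasedness result is not reproduced here. The name `select_batch` is hypothetical.

```python
import random


def select_batch(losses, subset_size, q, rng):
    """Draw a uniform random subset of indices, then keep the q with highest loss.

    `losses` holds one score per training sample (assumed here to be the
    current per-sample loss); `rng` is a random.Random instance.
    """
    subset = rng.sample(range(len(losses)), subset_size)
    # Rank only the sampled candidates, never the full dataset.
    subset.sort(key=lambda i: losses[i], reverse=True)
    return subset[:q]
```

Because ranking happens only inside the random subset, the selection cost per step depends on `subset_size` rather than on the dataset size.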

2606.08584 2026-06-09 cs.LG 新提交

Convolutional Sparse Coding via the Locally Competitive Algorithm on Loihi 2

基于Loihi 2的局部竞争算法实现卷积稀疏编码

Geoffrey Kasenbacher, Daniel Ruepp, Gerrit A. Ecke

发表机构 * Mercedes-Benz AG（梅赛德斯-奔驰集团）； Institut für Robotik und Kognitive Systeme, Universität zu Lübeck（吕贝克大学机器人与认知系统研究所）

AI总结本文在Loihi 2神经形态芯片上实现了卷积稀疏编码的局部竞争算法，并与GPU基线对比，展示了其在结构化稀疏推理中的可行性和优势。

详情

AI中文摘要

稀疏编码通过将输入表示为仅少量基函数的线性组合，为信号表示提供了一个原则性框架。局部竞争算法（LCA）因其动力学特性（泄漏积分、阈值化和侧向抑制）自然映射到神经形态硬件，在神经形态计算中特别有吸引力。虽然先前的工作已在Loihi 2上研究了非卷积LCA，但卷积设置尤其令人感兴趣，因为它引入了空间结构、权重共享、重叠感受野和缩放行为，这些更代表实际的稀疏推理工作负载。在这项工作中，我们提出了通过LCA在Loihi 2上实现卷积稀疏编码，并在相同的推理问题上与传统的GPU基线进行了评估。该实现遵循单层循环LCA公式，并将其扩展到具有从成对滤波器相互作用导出的局部抑制核的卷积特征图。据我们所知，这是Loihi 2上卷积LCA的首次实现和基准测试。我们的目标不仅是证明可行性，而且还要阐明在何种操作条件下卷积稀疏推理在神经形态硬件上变得有吸引力。由此产生的研究将卷积LCA定位为新兴神经形态系统上结构化稀疏推理的有用基准。

英文摘要

Sparse coding provides a principled framework for signal representation by expressing an input as a linear combination of only a small number of basis functions. The Locally Competitive Algorithm (LCA) is particularly attractive in the context of neuromorphic computing because its dynamics (leaky integration, thresholding, and lateral inhibition) map naturally to neuromorphic hardware. While prior work has studied non-convolutional LCA on Loihi 2, the convolutional setting is of particular interest because it introduces spatial structure, weight sharing, overlapping receptive fields, and scaling behavior that are more representative of practical sparse inference workloads. In this work, we present a Loihi 2 implementation of convolutional sparse coding via the LCA and evaluate it against a conventional GPU baseline on the same inference problems. The implementation follows a one-layer recurrent LCA formulation and extends it to convolutional feature maps with local inhibitory kernels derived from pairwise filter interactions. To the best of our knowledge, this is the first implementation and benchmark of convolutional LCA on Loihi 2. Our goal is not only to demonstrate feasibility, but also to clarify in which operating regimes convolutional sparse inference becomes attractive on neuromorphic hardware. The resulting study positions convolutional LCA as a useful benchmark for structured sparse inference on emerging neuromorphic systems.

2606.08635 2026-06-09 cs.LG cs.DC 新提交

SpectrumKV: Per-Token Mixed-Precision KV Cache Transfer for Prefill-Decode Disaggregated LLM Serving

SpectrumKV: 面向预填充-解码分离式LLM服务的逐令牌混合精度KV缓存传输

Yang Pengju

发表机构 * GitHub

AI总结针对预填充-解码分离架构中KV缓存传输开销大的问题，提出SpectrumKV，通过为每个令牌分配不同精度（FP16/INT8/INT4）实现混合精度传输，并设计轻量部署探测自适应选择精度策略，在相同传输预算下显著提升模型质量并降低TTFT。

Comments 28 pages,13 figures,8 tables

详情

AI中文摘要

预填充-解码（PD）分离将提示处理与令牌生成解耦，但也使键值（KV）缓存成为网络负载。现有的PD端KV缩减方法大多是二元的：选中的令牌以全精度传输，其余则不传输。本文认为二元选择留下了一个有用的设计空间未被利用。SpectrumKV为每个令牌分配一个精度级别：注意力汇聚点和其他高重要性令牌以FP16保护，中等重要性令牌以INT8发送，低重要性令牌在模型可容忍时以INT4发送。主要的实际复杂性在于INT4容忍度是模型相关的。Qwen2.5-7B在INT4 KV量化下灾难性失败，而Mistral-7B和Gemma-2-9B保持稳定。因此，SpectrumKV运行一个轻量级的部署时探测：在三级策略下进行三次激进的NIAH试验。通过的模型使用FP16+INT8+INT4；失败的模型回退到FP16+INT8。在Qwen2.5-7B-Instruct、Mistral-7B-Instruct-v0.3和Gemma-2-9B-it上，SpectrumKV在相同传输预算下提高了质量。在WikiText-2上，归一化KV预算为50%时，SpectrumKV分别将困惑度改变+1.97%、-0.06%和-0.44%，而PDTrim为+25.85%、+22.07%和+35.63%。在4096令牌的NIAH检索中，自适应策略在激进预算b=0.3下对Qwen达到52.6%，而PDTrim为26.3%，并在b=0.5时达到100%；Mistral和Gemma在三级策略下保持检索性能。传输路径的端到端GPU计时显示，在b=0.5时TTFT降低50-62%。这些结果表明，PD KV传输应被视为精度分配问题，而不仅仅是令牌剪枝。

英文摘要

Prefill-decode (PD) disaggregation decouples prompt processing from token generation, but it also turns the key-value (KV) cache into a network payload. Existing PD-side KV reduction methods are mostly binary: selected tokens are transmitted at full precision and the rest are not transmitted. This paper argues that binary selection leaves a useful design space unused. SpectrumKV assigns a precision level to each token instead: attention sinks and other high-importance tokens are protected at FP16, medium-importance tokens are sent at INT8, and low-importance tokens are sent at INT4 when the model can tolerate it. The main practical complication is that INT4 tolerance is model-dependent. Qwen2.5-7B catastrophically fails under INT4 KV quantization, while Mistral-7B and Gemma-2-9B remain stable. SpectrumKV therefore runs a lightweight deployment-time probe: three aggressive NIAH trials under a 3-tier policy. Models that pass use FP16+INT8+INT4; models that fail fall back to FP16+INT8. Across Qwen2.5-7B-Instruct, Mistral-7B-Instruct-v0.3, and Gemma-2-9B-it, SpectrumKV improves quality at the same transfer budget. At a 50% normalized KV budget on WikiText-2, SpectrumKV changes perplexity by +1.97%, -0.06%, and -0.44%, respectively, compared with PDTrim's +25.85%, +22.07%, and +35.63%. On NIAH retrieval at 4096 tokens, the adaptive policy reaches 52.6% on Qwen at the aggressive b=0.3 budget versus 26.3% for PDTrim, and reaches 100% by b=0.5; Mistral and Gemma preserve retrieval under the 3-tier policy. End-to-end GPU timing of the transfer path shows 50-62% TTFT reductions at b=0.5. These results suggest that PD KV transfer should be treated as a precision-allocation problem, not only as token pruning.
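The per-token precision allocation can be illustrated with a small sketch. Everything specific here is an assumption made for the example: the abstract does not state how importance is scored or how tier boundaries are chosen, so `high_frac` and `mid_frac` are hypothetical parameters, and the budget below counts payload bits only, ignoring quantization scales and other metadata.

```python
BITS = {"FP16": 16, "INT8": 8, "INT4": 4}


def assign_tiers(importance, high_frac, mid_frac, int4_ok):
    """Map each token to a precision tier by importance rank.

    The top `high_frac` of tokens get FP16, the next `mid_frac` get INT8,
    and the rest get INT4, or INT8 when the model failed the INT4 probe.
    """
    n = len(importance)
    order = sorted(range(n), key=lambda i: importance[i], reverse=True)
    n_high = round(high_frac * n)
    n_mid = round(mid_frac * n)
    low_tier = "INT4" if int4_ok else "INT8"
    tiers = [low_tier] * n
    for rank, idx in enumerate(order):
        if rank < n_high:
            tiers[idx] = "FP16"
        elif rank < n_high + n_mid:
            tiers[idx] = "INT8"
    return tiers


def normalized_budget(tiers):
    """Transferred bits relative to sending every token at FP16."""
    return sum(BITS[t] for t in tiers) / (16 * len(tiers))
```

With four tokens, one at FP16, one at INT8 and two at INT4 give a normalized budget of 0.5, while the two-tier fallback for the same split costs 0.625. That gap is why the deployment-time probe matters for the achievable budget.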

2606.08962 2026-06-09 cs.LG cs.CV cs.RO 新提交

迈向编译器世界模型：学习潜在动态以实现高效张量程序搜索

Haolin Pan, Lianghong Huang, Xvlin Zhou, Mingjie Xing, Yanjun Wu

发表机构 * Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences（中国科学院大学杭州高等研究院）； Institute of Software, Chinese Academy of Sciences（中国科学院软件研究所）； University of Chinese Academy of Sciences（中国科学院大学）

AI总结提出一种受世界模型启发的评估器，通过轻量级过渡模型在连续潜在空间中展开调度动作，避免昂贵AST变异和重复编码，在TVM AutoScheduler中实现比Ansor更优的延迟和测量效率。

详情

AI中文摘要

张量程序优化对现代机器学习系统至关重要，但其搜索空间巨大。现有的自动调度器通过学习成本模型来降低测量成本，但它们通常将每个候选视为静态代码快照，忽略了产生它的调度轨迹。这使得它们对动作依赖不敏感，且易受表面代码变化影响。我们提出一种受世界模型启发的评估器，将调度评估建模为程序状态上的动作条件潜在动态。从初始程序开始，它使用轻量级过渡模型在连续潜在空间中展开调度动作，避免了昂贵的AST变异和重复代码编码。最终的动态表示与动作和硬件特征结合以对候选进行排序。在TVM AutoScheduler中实现后，我们的方法在相同64次试验预算下，GPU上代表性子图延迟比Ansor提升1.37倍，CPU上提升1.54倍。它还在使用10倍更少测量次数的情况下，在2.2%几何平均内匹配Ansor-10K，并将完整模型推理速度提升至PyTorch/PyTorch-opt(cuDNN)的4.61倍/3.67倍几何平均。

英文摘要

Tensor program optimization is essential for modern machine learning systems, but its search space is enormous. Existing auto-schedulers reduce measurement cost with learned cost models, yet they usually evaluate each candidate as a static code snapshot, ignoring the schedule trajectory that produced it. This makes them insensitive to action dependencies and vulnerable to superficial code variations. We propose a world-model-inspired evaluator that models schedule evaluation as action-conditioned latent dynamics over program states. Starting from the initial program, it rolls out scheduling actions in a continuous latent space with a lightweight transition model, avoiding expensive AST mutation and repeated code encoding. The final dynamic representation is combined with action and hardware features to rank candidates. Implemented in TVM AutoScheduler, our method improves representative-subgraph latency over Ansor by 1.37$\times$ on GPU and 1.54$\times$ on CPU under the same 64-trial budget. It also matches Ansor-10K within 2.2% geometric mean using 10$\times$ fewer measurements, and accelerates full-model inference over PyTorch/PyTorch-opt(cuDNN) by 4.61$\times$/3.67$\times$ geometric mean.

2606.09388 2026-06-09 cs.LG 新提交

Distilling Safe LLM Systems via Soft Prompts for On Device Settings

通过软提示蒸馏安全的设备端LLM系统

Motasem Alfarra, Cristina Pinneri, Dana Kianfar, Mohammed Almousa, Christos Louizos

发表机构 * Qualcomm AI Research（高通人工智能研究院）

AI总结针对资源受限设备上部署安全大语言模型（LLM）的挑战，提出基于软提示与蒸馏训练的安全对齐方法，在最小化额外计算开销的同时实现优越的安全-有用性权衡。

Comments Accepted to UAI 2026

详情

Journal ref: 42nd Conference on Uncertainty in Artificial Intelligence 2026

AI中文摘要

在资源受限的边缘设备上部署安全的大语言模型（LLM）面临关键挑战：虽然将LLM与防护模型结合的双模型系统能提供有效的安全保障，但其巨大的内存和计算需求使其在设备端部署中代价高昂。本文对资源受限环境下的参数高效安全对齐方法进行了全面研究。通过对多种LLM架构、训练目标和参数高效微调方法的系统评估，我们发现软提示与基于蒸馏的训练相结合始终优于其他方法。我们引入了基于总变差和KL散度的蒸馏框架，能够有效将防护模型的安全行为迁移到学习到的软提示中。我们在多个基准上的评估表明，与LoRA适配器、引导向量和直接优化方法相比，这种组合在安全-有用性权衡上表现更优，同时在推理时仅需极少的额外内存和计算。这些发现确立了软提示蒸馏作为设备端LLM部署中安全对齐的首选方法。

英文摘要

Deploying safe large language models (LLMs) on resource-constrained edge devices presents a critical challenge: while dual-model systems combining LLMs with guard models provide effective safety guarantees, their substantial memory and computational demands make them prohibitively expensive for on-device deployment. This paper presents a comprehensive study of parameter-efficient safety alignment methods for resource-constrained settings. Through systematic evaluation across multiple LLM architectures, training objectives, and parameter-efficient fine-tuning approaches, we identify that soft prompts combined with distillation-based training consistently outperform alternative methods. We introduce distillation frameworks based on total variation and KL divergence that effectively transfer safety behaviors from guard models into learned soft prompts. Our evaluations on various benchmarks demonstrate that this combination achieves superior safety-usefulness trade-offs compared to LoRA adapters, steering vectors, and direct optimization methods, while requiring minimal additional memory and compute at inference time. These findings establish soft prompt distillation as the preferred approach for safety alignment in on-device LLM deployment.

2606.09456 2026-06-09 cs.LG 新提交

Breaking the Tokenizer Barrier: On-Policy Distillation across Model Families

打破分词器壁垒：跨模型系列的在线策略蒸馏

Yifan Niu, Han Xiao, Dongyi Liu, Zelong Wang, Dihong Gong, Yasheng Wang, Jia Li

发表机构 * The Hong Kong University of Science and Technology (Guangzhou)（香港科技大学（广州））； Tencent（腾讯）； The Hong Kong University of Science and Technology（香港科技大学）

AI总结提出跨分词器在线策略蒸馏方法，通过精确的token映射算法使教师模型概率分布信号能跨不同分词器传播，显著提升计算效率。

详情

AI中文摘要

在线策略蒸馏（OPD）已成为大型语言模型（LLM）后训练中从领域专家向学生模型迁移知识的核心技术。然而，现有的OPD蒸馏方法要求教师和学生模型共享相同的分词器，限制了OPD在模型系列内的适用性。当前主流实践通常采用在教师生成的响应上进行监督微调（SFT）来实现跨分词器蒸馏，这未能捕捉到嵌入在教师概率分布中的丰富知识。在这项工作中，我们使标准的在线策略蒸馏方法能够跨模型系列运行，确保高保真的token级信号可以通过精确的token映射算法在不同分词器之间传播。大量实验表明，在各种基准测试上，跨分词器OPD在计算效率上显著优于基线方法。我们的结果为OPD解锁了更广泛的教师-学生配对，为适应和增强LLM之间的交互开辟了新途径。

英文摘要

On-Policy Distillation (OPD) has become a core technique in the post-training of Large Language Models (LLMs) for transferring knowledge from domain experts to student models. However, existing OPD methods require teacher and student models to share the same tokenizer, restricting OPD to models within the same family. Current mainstream practice typically employs Supervised Fine-Tuning (SFT) on teacher-generated responses for cross-tokenizer distillation, which fails to capture the rich knowledge embedded in the teacher's probability distribution. In this work, we enable the standard on-policy distillation method to operate across model families, ensuring that high-fidelity token-level signals can propagate across different tokenizers with a precise token-mapping algorithm. Extensive experiments show that cross-tokenizer OPD is significantly more compute-efficient than baselines on various benchmarks. Our results unlock a broader range of teacher-student pairs for OPD, opening up new avenues for adapting and enhancing interactions between LLMs.

2606.09471 2026-06-09 cs.LG cs.CL 新提交

Escaping the KL Agreement Trap in On-Policy Distillation

逃离在线策略蒸馏中的KL一致陷阱

Haoran Xin, Anhao Zhao, Ying Sun, Jin Li, Xiaoyu Shen, Hui Xiong

发表机构 * The Hong Kong University of Science and Technology (Guangzhou)（香港科技大学（广州））； The Hong Kong University of Science and Technology（香港科技大学）； The Hong Kong Polytechnic University（香港理工大学）； Eastern Institute of Technology, Ningbo（宁波东方理工大学）

AI总结针对在线策略蒸馏中学生陷入低KL一致陷阱导致训练信号弱的问题，提出KAT动态终止规则，过滤弱监督，在数学基准上提升avg@k 2.66%和pass@k 3.43%，同时减少59.73%的rollout长度。

Comments 13 pages, 8 figures

2606.09514 2026-06-09 cs.LG 新提交

BUDDY: BUdget-Driven DYnamic Depth Routing for Adaptive Large Language Model Inference

BUDDY: 预算驱动的动态深度路由用于自适应大型语言模型推理

Yuhua Zhou, Shaoqi Yu, Shichao Weng, Changhai Zhou, Mingze Yin, Fei Yang, Aimin Pan

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结提出BUDDY框架，通过轻量决策模块根据输入动态选择top-k层，并复用KV缓存支持解码时自适应路由，在严格预算控制下提升精度-计算权衡。

详情

AI中文摘要

大型语言模型（LLMs）由于其深度和参数规模，推理成本高昂。深度剪枝可以通过跳过冗余的Transformer块来降低延迟，但现有方法（i）在用户指定的计算预算下提供的控制有限，（ii）通常固定路由路径，无法在解码过程中随着上下文增长而自适应。我们提出了BUDDY，一个预算驱动的动态深度路由框架。BUDDY使用轻量级决策模块根据输入对中间层进行评分，并确定性地执行top-k层以满足给定预算。为了支持解码时的自适应，BUDDY重用第一层的KV缓存作为低开销的全局上下文源，并在每次路由决策前将其与最新令牌表示合并。当未提供明确预算时，可选的预算预测器估计输入相关的计算水平以平衡质量和效率。在Llama系列和Qwen模型上的实验表明，BUDDY与强静态剪枝基线相比具有竞争力，并且通常能改善精度-计算权衡，同时独特地支持严格预算控制、解码时重路由以及单个训练模型内的多个预算。

英文摘要

Large language models (LLMs) incur high inference cost due to their depth and parameter scale. Depth pruning can reduce latency by skipping redundant Transformer blocks, but existing methods (i) provide limited control under user-specific compute budgets and (ii) typically fix the routing path, failing to adapt as the context grows during decoding. We propose Buddy, a budget-driven dynamic depth routing framework. Buddy uses a lightweight Decision Module to score intermediate layers conditioned on the input and deterministically executes the top-k layers to satisfy a given budget. To support decode-time adaptation, Buddy reuses the first-layer KV cache as a low-overhead global context source and pools it together with the newest token representation before each routing decision. When no explicit budget is provided, an optional Budget Predictor estimates an input-dependent compute level to balance quality and efficiency. Experiments on Llama-family and Qwen models show that Buddy is competitive with strong static pruning baselines and often improves the accuracy-compute trade-off, while uniquely supporting strict budget control, decode-time rerouting, and multiple budgets within a single trained model.
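The deterministic top-k routing step can be sketched in a few lines. This is an illustration only: the scores would come from the paper's Decision Module, which is not modelled here, and the function name `route_layers` and the choice to express the budget as a layer count are assumptions.

```python
def route_layers(scores, budget):
    """Return the indices of the `budget` highest-scoring layers in execution order.

    `scores` holds one score per candidate intermediate layer; ties are
    broken in favour of the earlier layer because Python's sort is stable.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])
```

Because exactly `budget` layers are executed for any input, the compute cost is fixed by construction, which is the sense in which top-k selection gives strict budget control.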

2606.09682 2026-06-09 cs.LG cs.DC cs.PF 新提交

AutoMegaKernel: A Statically-Checked Agent Harness for Self-Retargeting Megakernel Synthesis

AutoMegaKernel：用于自我重定目标超内核合成的静态检查代理框架

Jaber Jaber, Osama Jaber

发表机构 * RightNow AI

AI总结提出AutoMegaKernel系统，将Llama模型编译为单个持久CUDA内核，通过静态调度验证器确保无死锁和无竞争，自动生成10种模型正确超内核，并在NVIDIA推理卡上以W8A16精度超越cuBLAS bf16。

Comments 18 pages, 5 figures. Open-source code, data, and agent harness: https://github.com/RightNow-AI/AutoMegaKernel

详情

AI中文摘要

AutoMegaKernel (AMK) 将HuggingFace Llama系列模型编译成一个持久的协作CUDA内核，该内核在一次启动中运行整个前向传播，无需为每个模型手写CUDA代码。其贡献在于系统本身，而非原始速度。一个冻结的调度IR验证器通过静态图检查（非机械化证明）静态地认证无死锁和无竞争，因此不安全的智能体提议调度在启动前被拒绝：在7,160个对抗性调度（6,091个不安全）中，它实现了零误接受，并接受了所有360个实际底层实现。同一源代码可重定目标至sm_80/sm_90/sm_120，从单一代码库自动为10个支持模型中的全部生成正确的超内核，并在真实的SmolLM2-135M检查点上重现HuggingFace贪婪解码逐token匹配（困惑度差异2.5e-7）。一个无人值守、智能体驱动的自动研究循环在其自身基线之上自我改进超内核（1.25-1.72倍）。一个搜索发现的int8 (W8A16) 超内核在NVIDIA数据中心推理集群的batch-1解码中击败了CUDA图化的cuBLAS bf16：L4最高1.33倍，当前一代L40S 1.25-1.27倍，A10G大规模最高1.08倍，以及消费级RTX 5090 1.19-1.23倍。排序并非带宽的简单函数（864 GB/s的L40S击败了600 GB/s的A10G）；分界线是推理级与训练级。AMK在高带宽训练级A100/H100上落后于cuBLAS，其中框架定位了跨SM同步瓶颈；我们坦率地报告了这一差距。这是解码位置0处精度不对称（W8A16 vs bf16）的比较；最大的真实检查点是TinyLlama-1.1B。代码和框架：https://github.com/RightNow-AI/AutoMegaKernel

英文摘要

AutoMegaKernel (AMK) compiles a HuggingFace Llama-family model into a single persistent cooperative CUDA kernel that runs the whole forward pass in one launch, with no per-model hand-written CUDA. The contribution is the system, not raw speed. A frozen schedule-IR validator statically certifies deadlock-freedom and race-freedom via static graph checks (not a mechanized proof), so an unsafe agent-proposed schedule is rejected before launch: across 7,160 adversarial schedules (6,091 unsafe) it had zero false-accepts and accepted all 360 real lowerings. The same source retargets sm_80/sm_90/sm_120 from one codebase, auto-generates correct megakernels for 10 of 10 supported models, and on a real SmolLM2-135M checkpoint reproduces HuggingFace greedy decode token-for-token (perplexity match 2.5e-7). An unattended, agent-drivable autoresearch loop self-improves the megakernel over its own baseline (1.25-1.72x). A search-found int8 (W8A16) megakernel beats CUDA-graphed cuBLAS bf16 at batch-1 decode across NVIDIA's datacenter inference fleet: L4 up to 1.33x, the current-gen L40S 1.25-1.27x, A10G up to 1.08x at scale, and the consumer RTX 5090 1.19-1.23x. The ordering is not a clean function of bandwidth (the 864 GB/s L40S beats the 600 GB/s A10G); the divide is inference-class vs training-class. AMK trails cuBLAS on the high-bandwidth training-class A100/H100, where the harness localizes the cross-SM-sync bottleneck; we report the gap plainly. This is a precision-asymmetric (W8A16 vs bf16) comparison at decode position 0; the largest real checkpoint is TinyLlama-1.1B. Code and the harness: https://github.com/RightNow-AI/AutoMegaKernel

2606.09707 2026-06-09 cs.LG cs.CL 新提交

BrainSurgery: Reproducible and Reliable Declarative Weight Manipulations for Model Editing and Upcycling

BrainSurgery：用于模型编辑和升级的可复现且可靠的声明式权重操作

Gianluca Barmina, Annemette Broch Pirchert, Andrea Blasi Núñez, Lukas Galke Poech, Peter Schneider-Kamp

发表机构 * University of Southern Denmark（南丹麦大学）

AI总结提出BrainSurgery工具，通过声明式YAML计划实现神经网络检查点的鲁棒可复现张量操作，支持结构修改、数学变换和张量重塑，内置断言验证防止静默错误。

详情

AI中文摘要

随着深度学习模型规模的扩大，管理、检查和修改大型检查点变得越来越具有挑战性。研究人员经常需要更改模型权重以进行层重构、精度转换、低秩分解和架构调试，但这些工作流程通常依赖于脆弱的临时Python脚本。在这里，我们介绍BrainSurgery，一个用于对神经网络检查点进行鲁棒且可复现的“张量手术”的工具，并提供一个系统演示，涵盖从模型升级到LoRA提取的四个示例和三个案例研究。通过抽象存储格式和内存管理，BrainSurgery通过声明式YAML计划执行复杂的转换。它支持通过表达性正则表达式和结构定位进行结构修改、数学变换和张量重塑，同时内置断言验证张量形状、数据类型和值，以防止静默错误。我们期望BrainSurgery通过其可复现且经过验证的操作，为未来的研究提供坚实的基础。

英文摘要

As deep learning models scale, managing, inspecting, and modifying large checkpoints has become increasingly challenging. Researchers often need to alter model weights for layer restructuring, precision casting, low-rank factorization, and architectural debugging, yet these workflows often rely on fragile ad-hoc Python scripts. Here, we introduce BrainSurgery, a tool for robust and reproducible "tensor surgery" on neural network checkpoints, and provide a system demonstration covering four examples and three case studies from model upcycling to LoRA extraction. By abstracting storage formats and memory management, BrainSurgery executes complex transformations through declarative YAML plans. It supports structural modifications, mathematical transformations, and tensor reshaping through expressive regex and structural targeting, while built-in assertions validate tensor shapes, data types, and values to prevent silent errors. We envision that BrainSurgery will provide a strong foundation for future research through its reproducible and validated operations.

2606.07574 2026-06-09 cs.DC cs.AI cs.LG stat.CO stat.ML 交叉投稿

Accelerating Birkhoff Projection for Manifold-Constrained Hyper-Connections

加速流形约束超连接的Birkhoff投影

Chenrui Wang, Yixuan Qiu

发表机构 * School of Statistics（统计学院）； Renmin University of China（中国人民大学）； School of Statistics and Data Science（统计学与数据科学学院）； Institute of Big Data Research（大数据研究院）； Shanghai University of Finance and Economics（上海财经大学）

AI总结针对流形约束超连接中Birkhoff投影的计算瓶颈，提出基于对偶公式和牛顿法的端到端加速框架，结合隐式微分和CUDA内核实现超过20倍加速。

详情

AI中文摘要

流形约束超连接（mHCs）最近被提出作为超连接的一种原则性扩展，其中残差混合矩阵通过投影到Birkhoff多面体上被约束为双随机矩阵。在实际的mHC实现中，该约束通过Sinkhorn-Knopp迭代强制执行，反向传播依赖于展开迭代求解器。这种设计引入了大量的计算和内存开销，并且当算法在具有挑战性的输入上收敛缓慢时，可能产生不准确的投影，从而破坏mHCs预期的范数控制和稳定性保证。在这项工作中，我们聚焦于实际重要的4x4 Birkhoff投影设置，并开发了一个端到端的加速框架。通过利用对偶公式，我们将问题简化为一个三维无约束凸问题，并使用牛顿法求解，实现了快速收敛和高精度。对于反向传播，我们用隐式微分替代展开微分，无需存储中间状态即可获得精确梯度。为了利用大规模并行性，我们设计了一个warp级别的CUDA内核，仅使用寄存器级原语，避免了全局和共享内存I/O。与代表性开源基线的大量实验表明，所提出的求解器产生了更可靠的双随机投影——特别是在输入幅度较大时——并实现了显著的端到端加速（包括反向传播），在大批量下达到超过20倍的加速，同时保持数量级更小的边际误差。

英文摘要

Manifold-constrained hyper-connections (mHCs) have recently been proposed as a principled extension of hyper-connections, where the residual mixing matrices are constrained to be doubly stochastic via projection onto the Birkhoff polytope. In practical mHC implementations, this constraint is enforced by Sinkhorn-Knopp iterations, and the backward pass relies on unrolling the iterative solver. This design introduces substantial computation and memory overhead, and may also yield inaccurate projections when the algorithm converges slowly on challenging inputs, undermining the intended norm-control and stability guarantees of mHCs. In this work, we focus on the practically important 4x4 Birkhoff projection setting and develop an end-to-end acceleration framework. By leveraging the dual formulation, we reduce the problem to a three-dimensional unconstrained convex problem and solve it with Newton's method, achieving fast convergence and high accuracy. For the backward pass, we replace the unrolled differentiation with implicit differentiation, yielding exact gradients without storing intermediate states. To exploit massive parallelism, we design a warp-level CUDA kernel that uses only register-level primitives, avoiding global and shared memory I/O. Extensive experiments against representative open-source baselines demonstrate that the proposed solver yields substantially more reliable doubly stochastic projections -- especially when the input magnitude is large -- and achieves significant end-to-end speedups (including the backward pass), reaching over 20x acceleration at large batch sizes while maintaining orders of magnitude smaller marginal errors.
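The Sinkhorn-Knopp baseline mentioned in the abstract can be illustrated with a minimal pure-Python sketch. It is not the paper's solver (which uses a dual formulation with Newton's method) and not any mHC implementation; exponentiating the input logits to obtain a positive matrix, and the helper names `sinkhorn_knopp` and `marginal_error`, are assumptions made for illustration.

```python
import math

def sinkhorn_knopp(logits, iters=50):
    """Alternately normalize rows and columns of exp(logits).

    Assumption: the input is a square matrix of real logits, made
    positive by exponentiation (hypothetical choice for this sketch).
    """
    n = len(logits)
    # Subtract the global maximum before exponentiating for numerical safety.
    m = max(max(row) for row in logits)
    P = [[math.exp(x - m) for x in row] for row in logits]
    for _ in range(iters):
        # Row normalization: every row sums to 1.
        P = [[x / s for x in row] for row, s in ((r, sum(r)) for r in P)]
        # Column normalization: every column sums to 1.
        col = [sum(P[i][j] for i in range(n)) for j in range(n)]
        P = [[P[i][j] / col[j] for j in range(n)] for i in range(n)]
    return P

def marginal_error(P):
    """Largest deviation of any row or column sum from 1."""
    n = len(P)
    row_err = max(abs(sum(row) - 1.0) for row in P)
    col_err = max(abs(sum(P[i][j] for i in range(n)) - 1.0) for j in range(n))
    return max(row_err, col_err)

# Hypothetical 4x4 input, matching the size the abstract focuses on.
example_logits = [
    [0.0, 1.0, 2.0, 3.0],
    [1.0, 0.0, 1.0, 2.0],
    [2.0, 1.0, 0.0, 1.0],
    [3.0, 2.0, 1.0, 0.0],
]
example_P = sinkhorn_knopp(example_logits, iters=200)
```

Calling `marginal_error` on the result after a small number of iterations is one way to observe the slow-convergence issue the abstract describes for inputs with large magnitude.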

2606.07666 2026-06-09 quant-ph cs.AR cs.DC cs.LG 交叉投稿

Hardware-aware Low-latency Quantum Compilation with Data-driven Lightweight Error Detection for Early Fault-Tolerant Systems

面向早期容错系统的硬件感知低延迟量子编译与数据驱动的轻量级错误检测

Sumit Chongder

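Affiliations: Inter-Disciplinary Research Platform, Quantum Information and Computation, Indian Institute of Technology Jodhpur

AI summary: Proposes a framework integrating hardware-aware compilation with data-driven quantum error detection that jointly optimizes qubit mapping, SWAP insertion, and syndrome scheduling via a noise-weighted cost function and a learned multi-objective scheduler; across VQE, phase-estimation, and Grover benchmarks it raises algorithmic success probability by up to 68%.

Comments 16 pages, 15 figures, Springer LNCS format. Code available at https://github.com/Sumitchongder/quantum-hw-aware-pipeline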

AI中文摘要

噪声中等规模量子（NISQ）处理器正进入早期容错阶段，此时完全量子纠错代价高昂，而轻量级错误检测可有效提高算法成功率。现有编译和错误检测工具链孤立处理这些问题，缺乏在延迟约束下平衡检测开销与成功概率的原则性方法。我们提出一种集成的硬件感知编译与数据驱动量子错误检测（QED）框架，通过噪声加权代价函数和学习型多目标调度器，联合优化量子比特映射、SWAP插入和综合征调度。在HPC集群上使用GPU加速密度矩阵模拟（NVIDIA cuQuantum SDK）进行的仿真实验，涵盖VQE、相位估计和Grover基准测试、三种噪声模型以及6-20量子比特（深度10-160）的电路规模，结果表明，在8量子比特VQE实例上，联合协同设计相比SABRE结合后选择，将算法成功概率提升高达68%（95%置信区间：60%至76%）。

英文摘要

Noisy intermediate-scale quantum (NISQ) processors are entering an early fault-tolerance regime where full quantum error correction carries prohibitive resource costs, yet lightweight error detection can meaningfully improve algorithmic success rates. Existing compilation and error-detection toolchains treat these concerns in isolation, with no principled way to balance detection overhead against success probability under latency constraints. We present an integrated hardware-aware compilation and data-driven quantum error-detection (QED) framework that jointly optimises qubit mapping, SWAP insertion, and syndrome-schedule placement via a noise-weighted cost function and a learned multi-objective scheduler. Simulation experiments on an HPC cluster using GPU-accelerated density-matrix simulation (NVIDIA cuQuantum SDK) across VQE, phase-estimation, and Grover benchmarks, three noise profiles, and circuit sizes of 6-20 qubits (depths 10-160), show that joint co-design raises algorithmic success probability by up to 68 percent (95 percent CI: 60 percent to 76 percent) over SABRE on an 8-qubit VQE instance with post-selection.

2606.07819 2026-06-09 cs.AI cs.LG 交叉投稿

Aperon Technical Report: Hierarchical No-pointer Tangent-Local Search for High-Dimensional Approximate Nearest Neighbor Search

Yong Fu

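Affiliations: Substratum Labs

AI summary: Proposes the HNTL framework, which uses a pointerless Block-SoA layout and local tangent-space partitioning for high-dimensional vector indexing and candidate generation; on 768-dimensional data it reaches Rerank Recall@10 = 1.0000 with a candidate pool of C=20 and is 3.61x faster than pointer-chasing graph traversal.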

AI中文摘要

我们提出了HNTL（层次化无指针切向局部搜索），这是Aperon向量内存系统的核心向量索引和候选生成框架。近邻图（例如HNSW）在内存开销上承受了沉重的指针税，并导致不规则的内存访问，从而阻塞CPU流水线。HNTL通过将高维空间划分为局部、连贯的颗粒，将向量表示为局部切空间上的低维坐标，并使用无指针的Block-SoA（结构体数组）布局顺序扫描它们来解决这一问题。在非各向同性流形数据（d=768，N=10,000）上，局部PCA捕获了96.3%的方差，使得HNTL能够仅使用C=20个向量的候选池达到最终Rerank Recall@10为1.0000。通过Apple kperf CPU性能监控单元（PMU）计数器进行的硬件性能分析表明，我们使用NEON自动向量化的C++ Block-SoA扫描引擎相比标准的指针追踪图遍历实现了3.61倍加速（4.137纳秒/向量对比14.951纳秒/向量），这得益于3.59倍的IPC（每周期指令数）和接近零的L1/L2数据缓存未命中。

英文摘要

We present HNTL (Hierarchical No-pointer Tangent-Local), the core vector indexing and candidate generation framework of the Aperon vector memory system. Proximity graphs (e.g., HNSW) incur a heavy pointer tax in memory overhead and induce irregular memory accesses that stall CPU pipelines. HNTL resolves this by partitioning the high-dimensional space into local, coherent grains, representing vectors as low-dimensional coordinates on local tangent spaces, and scanning them sequentially using a pointerless Block-SoA (Structure-of-Arrays) layout. On anisotropic manifold data (d=768, N=10,000), local PCA captures 96.3% of the variance, allowing HNTL to achieve a final Rerank Recall@10 of 1.0000 with a candidate pool size of only C=20 vectors. Hardware profiling via Apple kperf CPU Performance Monitoring Unit (PMU) counters demonstrates a 3.61x speedup (4.137 ns/vector vs. 14.951 ns/vector) for our NEON auto-vectorized C++ Block-SoA scan engine over standard pointer-chasing graph traversals, driven by a 3.59x IPC (Instructions Per Cycle) and near-zero L1/L2 data cache misses.

2606.09213 2026-06-09 cs.PL cs.LG 交叉投稿

SNN-MLIR: An MLIR Dialect for Compiling Neuromorphic SNNs from NIR to Bare-Metal C

SNN-MLIR：一种用于将神经形态SNN从NIR编译到裸机C的MLIR方言

Alejandro García Gener, Alvaro Rollón de Pinedo

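Affiliations: INTERA-Group

AI summary: Proposes SNN-MLIR, an MLIR dialect with a NIR-MLIR-C compilation bridge that compiles neuromorphic SNN models from the framework-independent NIR format into portable C code, supports floating-point and quantized data, and provides a single intermediate representation from simulation to hardware deployment.

Comments 8 pages, 5 figures, 5 tables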

AI中文摘要

脉冲神经网络（SNN）越来越多地在各种框架（SnnTorch、Lava、Norse等）中训练，每个框架都有自己的模型格式。神经形态中间表示（NIR）通过提供一种通用的、框架无关的格式来交换训练好的SNN模型，解决了碎片化问题。NIR解决了交换问题，但仅止于此。它提供了网络的描述，而非运行网络的路径。每个后端仍需自行实现部署，之间没有共享的、可转换的编译器表示。本文提出snn-mlir，一种用于SNN的树外MLIR方言，以及一个NIR-MLIR-C编译桥。该方言提供了一小组类型多态操作，这些操作在浮点（f32/f64）和量化数据上行为一致，因此单一的中间表示同时服务于仿真和面向硬件的部署。一个Python前端读取任何NIR文件并发出方言IR，自动插入重新缩放操作以保持各层量化尺度一致。一个参考降级过程将方言转换为标准的linalg和arith操作，工具链从中生成自包含、无依赖的C11代码，可在任何支持C的CPU或嵌入式目标上编译和运行。我们评估了数值精度与参考输出的匹配度、跨CPU目标的可移植性以及量化的代价。当前范围是前馈全连接网络，后端为CPU。snn-mlir以Apache-2.0许可证（含LLVM例外）开源发布，并已在GitHub上可用。

英文摘要

Spiking neural networks (SNNs) are increasingly trained in a wide range of frameworks (SnnTorch, Lava, Norse, and others), each with its own model format. The Neuromorphic Intermediate Representation (NIR) addresses this fragmentation by providing a common, framework-independent format for exchanging trained SNN models. NIR solves the exchange problem, but it stops there. It provides a description of a network, not a path to running one. Each backend is still left to implement deployment on its own, with no shared, transformable compiler representation in between. This paper presents snn-mlir, an out-of-tree MLIR dialect for SNNs together with a NIR-MLIR-C compilation bridge. The dialect provides a small set of type-polymorphic operations that work identically on floating-point (f32/f64) and quantized data, so a single intermediate representation serves both simulation and hardware-oriented deployment. A Python front end reads any NIR file and emits dialect IR, automatically inserting rescaling operations to keep quantization scales consistent across layers. A reference lowering pass converts the dialect to standard linalg and arith operations, from which the toolchain produces self-contained, dependency-free C11 code that compiles and runs on any C-capable CPU or embedded target. We evaluate numerical fidelity against reference outputs, portability across CPU targets, and the cost of quantization. The current scope is feedforward, fully-connected networks with a CPU backend. snn-mlir is released as open source under the Apache-2.0 license with LLVM-exception and it is already available on GitHub.

2606.09643 2026-06-09 cs.DC cs.AI cs.LG cs.OS 交叉投稿

FMplex: Model Virtualization for Serving Extensible Foundation Models

FMplex: 用于服务可扩展基础模型的模型虚拟化

Hetvi Shastri, Pragya Sharma, Walid A. Hanafy, David Irwin, Mani Srivastava, Prashant Shenoy

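Affiliations: University of Massachusetts Amherst; University of California Los Angeles

AI summary: Proposes FMplex, a system that treats foundation models as a virtualization layer shared across tasks; combined with a batch-aware fair-queueing scheduler, it reduces latency by up to 80% and hosts up to 6x more tasks across 7 foundation models and 92 downstream tasks.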

AI中文摘要

基础模型（FMs）越来越多地被用作语言、视觉、时间序列和多模态应用的下游任务骨干。然而，现有的模型服务系统将每个定制任务部署为独立的模型实例，从而复制了重型骨干，浪费了加速器内存，并失去了摊销批处理和加载成本的机会。本文提出了FMplex，一个将FM骨干视为部署共享的虚拟化层的服务系统。FMplex为每个任务提供一个虚拟基础模型（vFM），这是一个由共享物理FM支持的逻辑私有FM实例。这种抽象允许独立定制的任务共享一个骨干，同时保留任务特定的扩展、独立生命周期和任务级隔离。此外，我们提出了一种批感知公平队列调度器，该调度器结合了加权任务级共享以及跨共存任务的批内和批间批处理。我们实现了一个基于FMplex的服务栈，涵盖任务构建、共享感知部署和运行时执行。在7个FM骨干（16个变体）和92个下游任务上，FMplex相比空间分区延迟降低高达80%，相比尽力而为共置延迟降低33.3%，同时在集群规模上可托管多达6倍的任务。

英文摘要

Foundation models (FMs) are increasingly used as backbones for downstream tasks across language, vision, time-series, and multimodal applications. Yet existing model-serving systems deploy each customized task as an independent model instance, thereby replicating heavyweight backbones, wasting accelerator memory, and losing opportunities to amortize batching and loading costs. This paper presents FMplex, a serving system that treats FM backbones as a virtualization substrate for deployment sharing. FMplex presents each task with a virtual foundation model (vFM), a logically private FM instance backed by a shared physical FM. This abstraction lets independently customized tasks share a backbone while preserving task-specific extensions, independent lifecycles, and task-level isolation. In addition, we propose a batch-aware fair-queueing scheduler that combines weighted task-level sharing with inter- and intra-task batching across colocated tasks. We implement an FMplex-based serving stack spanning task construction, sharing-aware deployment, and runtime execution. Across 7 FM backbones (16 variants) and 92 downstream tasks, FMplex reduces latency by up to 80% over spatial partitioning and 33.3% over best-effort co-location, while hosting up to 6x more tasks at cluster scale.

2606.09659 2026-06-09 cs.CL cs.AI cs.LG 交叉投稿

End-to-End Context Compression at Scale

端到端上下文压缩的规模化

Ang Li, Sean McLeish, Haozhe Chen, Nimit Kalra, Zaiqian Chen, Artem Gazizov, Venkata Anoop Suhas Kumar Morisetty, Bhavya Kailkhura, Harshitha Menon, Zhuang Liu, Brian R. Bartoldson, Tom Goldstein, Sanae Lotfi, Micah Goldblum, Pavel Izmailov

Affiliations: New York University; Modal Labs; University of Maryland; Princeton University; Columbia University; Harvard University; Lawrence Livermore National Laboratory; FAIR at Meta

AI summary: Through architecture search and continual pre-training, the paper proposes Latent Context Language Models (LCLMs), end-to-end encoder-decoder compressors that improve the Pareto frontier across general-task performance, compression speed, and peak memory, and can serve as efficient backbones for long-horizon agents.

AI中文摘要

长上下文语言模型推理受限于内存，因为KV缓存随上下文长度增长。最近压缩KV缓存的技术存在不足：它们要么大幅降低模型质量，要么需要大量时间和计算来压缩单个长提示。此外，许多方法要求输入适合目标模型的上下文窗口，并且通常与现代生产推理引擎不兼容。编码器-解码器压缩器原则上是一种有吸引力的替代方案，它将长令牌序列映射到由解码器消费的较短潜在嵌入序列。然而，现有方法在精度-效率前沿上无法与KV缓存压缩竞争。在这项工作中，我们重新审视编码器-解码器压缩并缩小了这一差距。我们首先进行架构搜索，从头开始预训练许多变体，以确定如何最佳设计和训练编码器-解码器压缩器。根据我们的发现，我们持续预训练一系列0.6B编码器、4B解码器模型，每个模型在超过350B令牌上训练，压缩比为1:4、1:8和1:16。我们引入了潜在上下文语言模型（LCLMs），这是一系列压缩器，在通用任务性能、压缩速度和峰值内存使用上改进了帕累托前沿。我们证明了LCLMs可作为长时智能体的高效骨干，让智能体浏览压缩的长上下文并按需自适应扩展相关片段。

英文摘要

Long-context language model inference is bottlenecked by memory, as the KV cache grows with context length. Recent techniques to compress the KV cache fall short: they either degrade model quality substantially or require considerable time and compute to compress a single long prompt. Furthermore, many methods require the input to fit within the target model's context window, and are generally incompatible with modern production inference engines. Encoder-decoder compressors, which map a long token sequence to a shorter sequence of latent embeddings consumed by a decoder, are an appealing alternative in principle. However, existing approaches are not competitive with KV cache compression on the accuracy-efficiency frontier. In this work, we revisit encoder-decoder compression and close this gap. We first perform an architecture search, pre-training many variants from scratch to determine how best to design and train encoder-decoder compressors. Guided by our findings, we continually pre-train a family of 0.6B-encoder, 4B-decoder models on over 350B tokens each, at compression ratios of 1:4, 1:8, and 1:16. We introduce Latent Context Language Models (LCLMs), a family of compressors that improve the Pareto frontier across general-task performance, compression speed, and peak memory usage. We demonstrate that LCLMs serve as efficient backbones for long-horizon agents, letting the agent skim through a compressed long context and adaptively expand relevant segments on demand.

2411.03253 2026-06-09 cs.LG cs.AI cs.DS 版本更新

Discovering Data Structures: Nearest Neighbor Search and Beyond

发现数据结构：最近邻搜索及其他

Omar Salemohamed, Laurent Charlin, Shivam Garg, Vatsal Sharan, Gregory Valiant

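Affiliations: Université de Montréal; Mila; HEC Montréal; Microsoft Research; University of Southern California; Stanford University

AI summary: Proposes a general framework for end-to-end learning of data structures that adapts to the data distribution and controls query and space complexity; for nearest neighbor search, the learned solutions can be reverse-engineered into algorithms such as binary search, interpolation search, k-d trees, and locality-sensitive hashing.

Comments NeurIPS 2025 version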

AI中文摘要

我们提出了一个用于端到端学习数据结构的通用框架。我们的框架适应底层数据分布，并对查询和空间复杂度提供细粒度控制。关键在于，数据结构是从头开始学习的，不需要仔细初始化或用候选数据结构/算法进行种子化。我们首先将该框架应用于最近邻搜索问题。在多种设置中，我们能够逆向工程出学习到的数据结构和查询算法。对于一维最近邻搜索，模型发现了最优的分布（不）依赖算法，如二分搜索和插值搜索的变体。在更高维度中，模型学习到的解决方案在某些情况下类似于k-d树，而在其他情况下则具有局部敏感哈希的元素。该模型还能学习高维数据的有用表示，并利用它们设计有效的数据结构。我们还将框架应用于数据流上的频率估计问题，并相信它也可以成为新问题的强大发现工具。

英文摘要

We propose a general framework for end-to-end learning of data structures. Our framework adapts to the underlying data distribution and provides fine-grained control over query and space complexity. Crucially, the data structure is learned from scratch, and does not require careful initialization or seeding with candidate data structures/algorithms. We first apply this framework to the problem of nearest neighbor search. In several settings, we are able to reverse-engineer the learned data structures and query algorithms. For 1D nearest neighbor search, the model discovers optimal distribution (in)dependent algorithms such as binary search and variants of interpolation search. In higher dimensions, the model learns solutions that resemble k-d trees in some regimes, while in others, they have elements of locality-sensitive hashing. The model can also learn useful representations of high-dimensional data and exploit them to design effective data structures. We also adapt our framework to the problem of estimating frequencies over a data stream, and believe it could also be a powerful discovery tool for new problems.

2411.16102 2026-06-09 cs.LG 版本更新

BlendServe: Optimizing Offline Inference for Auto-regressive Large Models with Resource-aware Batching

BlendServe: 利用资源感知批处理优化自回归大模型的离线推理

Yilong Zhao, Shuo Yang, Kan Zhu, Lianmin Zheng, Baris Kasikci, Yang Zhou, Jiarong Xing, Ion Stoica

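Affiliations: University of California, Berkeley; University of Washington; University of California, Davis; Rice University

AI summary: To resolve the conflict between resource overlapping and prefix sharing in offline batch processing, the paper proposes a resource-aware prefix tree that maximizes resource utilization, improving throughput by 1.44x over vLLM and SGLang.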

Spending the Budget More Efficiently: Improving Large Language Model Inference under a Fixed Budget with Reset-and-Discard (ReD)

Sagi Meir, Tommer D. Keidar, Noam Levi, Shlomi Reuveni, Barak Hirshberg

Affiliations: School of Chemistry, Tel Aviv University; The Center for Physics and Chemistry of Living Systems, Tel Aviv University; School of Physics and Astronomy, Tel Aviv University; The Center for Computational Molecular and Materials Science, Tel Aviv University

AI summary: To counter the diminishing returns of large language model inference under a fixed budget, the paper proposes the Reset-and-Discard (ReD) query method, which raises coverage by optimizing how attempts are allocated, and validates its cost savings on coding, math, and reasoning benchmarks.

AI中文摘要

大型语言模型（LLMs）在可验证任务上的性能通常通过 pass@k 衡量，即在 k 次尝试中至少正确回答一次的概率。在固定预算下，更合适的指标是 coverage@cost，即作为总尝试次数函数的平均唯一回答问题数量。我们连接这两个指标，并证明 pass@k 中经验观察到的幂律行为导致 coverage@cost 的次线性增长（收益递减）。为解决此问题，我们提出重置与丢弃（ReD），一种 LLMs 的查询方法，无论 pass@k 的形式如何，都能在给定预算下增加 coverage@cost。此外，给定 pass@k，我们可以定量预测使用 ReD 在总尝试次数上的节省。如果模型的 pass@k 不可用，ReD 可以推断其幂律指数。在三个 LLMs 上进行的编码（HumanEval）、数学（GSM8K）和推理（MMLU-Pro）基准测试表明，ReD 显著减少了达到期望覆盖率所需的尝试次数、令牌数和美元成本，同时提供了一种高效测量推理幂律的方法。ReD 的优势在非完美验证器下得以保持，并且优于测试的分配基线。

英文摘要

The performance of large language models (LLMs) on verifiable tasks is usually measured by pass@k, the probability of answering a question correctly at least once in k trials. At a fixed budget, a more suitable metric is coverage@cost, the average number of unique questions answered as a function of the total number of attempts. We connect the two metrics and show that the empirically-observed power-law behavior in pass@k leads to a sublinear growth of the coverage@cost (diminishing returns). To solve this problem, we propose Reset-and-Discard (ReD), a query method of LLMs that increases coverage@cost for a given budget, regardless of the pass@k form. Moreover, given a pass@k, we can quantitatively predict the savings in the total number of attempts using ReD. If pass@k is not available for the model, ReD can infer its power-law exponent. Experiments on three LLMs across coding (HumanEval), math (GSM8K), and reasoning (MMLU-Pro) benchmarks demonstrate that ReD substantially reduces the required attempts, tokens, and USD cost to reach a desired coverage, while also offering an efficient way to measure inference power-laws. ReD's advantage is maintained for imperfect verifiers and outperforms the tested allocation baselines.
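The diminishing returns described in the abstract can be made concrete with a small sketch of pass@k. It assumes that attempts on a question are independent with a fixed per-attempt success probability, which is a simplification and not something the abstract states; the per-question probabilities below are hypothetical, and the sketch does not implement ReD itself.

```python
def pass_at_k(p, k):
    """Probability of at least one success in k independent attempts,
    each succeeding with probability p (independence is an assumption)."""
    return 1.0 - (1.0 - p) ** k

def mean_pass_at_k(success_probs, k):
    """Average pass@k over questions with different per-attempt success rates."""
    return sum(pass_at_k(p, k) for p in success_probs) / len(success_probs)

# Hypothetical per-question success probabilities (not taken from the paper).
probs = [0.9, 0.5, 0.1, 0.01]

# Marginal gain from granting every question one extra attempt.
# Each gain is smaller than the previous one: diminishing returns.
gains = [
    mean_pass_at_k(probs, k + 1) - mean_pass_at_k(probs, k)
    for k in range(1, 6)
]
```

Under these assumptions the extra attempts are increasingly spent on questions that are either already solved or very hard, which is the inefficiency a budget-aware allocation tries to avoid.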

2602.05774 2026-06-09 cs.LG cs.AI math.PR 版本更新

Kernel Affine Hull Machines as Compute-Efficient Encoders for Frozen Semantic Spaces

Mohit Kumar, Somayeh Kargaran, Bernhard A. Moser, Manuela Geiß

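Affiliations: University of Rostock; Software Competence Center Hagenberg GmbH

AI summary: Proposes the Kernel Affine Hull Machine (KAHM) as a lightweight query encoder: with the teacher representation space fixed, posterior-weight estimation in an RKHS replaces neural query encoding, giving compute-efficient semantic retrieval with strong performance.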

AI中文摘要

基于Transformer的语义编码器在检索中很有效，但在许多部署中，重复出现的瓶颈是在线查询编码，而非离线语料库索引。本文研究，一旦强大的教师表示空间和语料库索引固定，是否可以用一个更轻量且解析明确的估计器来替代重复的神经查询编码。我们将固定教师的词汇到语义编码表述为一个条件均值估计问题，其中目标语义向量表示为由后验聚类概率加权的语义原型的噪声混合。使用核仿射包机（KAHM）几何，在显式识别的RKHS假设空间中，从廉价的词汇特征估计这些后验权重，并通过归一化最小均方更新从带噪声的教师嵌入中精炼语义原型。这产生了一个无反向传播的查询端编码器，以及一个端到端的误差分解，包括后验近似、有限样本/泛化和教师噪声项。我们在一个受控的奥地利法律检索基准上实例化该方法，该基准包含5000个测试查询、84个候选法律和10762个对齐的检索单元，使用特定于法律的编码器进入冻结的Mixedbread嵌入空间。在评估匹配的学习适配器中，KAHM在所有评估截断处实现了最强的教师空间重建和最佳的排名敏感检索性能。在k=20时，它获得了MRR@20=0.504、Hit@20=0.694和Top-1准确率=0.411，同时在报告的CPU设置中，相对于直接Transformer查询编码，在线每查询时间减少了8.53倍。结果支持KAHM作为监督固定表示部署场景中的计算高效编码器。

英文摘要

Transformer-based semantic encoders are effective for retrieval, but in many deployments the recurring bottleneck is online query encoding rather than offline corpus indexing. This paper studies whether, once a strong teacher representation space and corpus index are fixed, repeated neural query encoding can be replaced by a substantially lighter and analytically explicit estimator. We formulate fixed-teacher lexical-to-semantic encoding as a conditional-mean estimation problem in which the target semantic vector is represented as a noisy mixture of semantic prototypes weighted by posterior cluster probabilities. Kernel Affine Hull Machine (KAHM) geometry is used to estimate these posterior weights from inexpensive lexical features in an explicitly identified RKHS hypothesis space, and the semantic prototypes are refined by normalized least-mean-squares updates from noisy teacher embeddings. This yields a backpropagation-free query-side encoder together with an end-to-end error decomposition into posterior-approximation, finite-sample/generalization, and teacher-noise terms. We instantiate the approach on a controlled Austrian-law retrieval benchmark with 5,000 test queries, 84 candidate laws, and 10,762 aligned retrieval units, using law-specific encoders into a frozen Mixedbread embedding space. Among evaluation-matched learned adapters, KAHM achieves the strongest teacher-space reconstruction and the best rank-sensitive retrieval performance at all evaluated cutoffs. At k=20, it obtains MRR@20 = 0.504, Hit@20 = 0.694, and Top-1 Accuracy = 0.411, while reducing online per-query time by a factor of 8.53 relative to direct transformer query encoding in the reported CPU setting. The results support KAHMs as compute-efficient encoders for supervised fixed-representation deployment regimes.

2605.13768 2026-06-09 cs.LG cs.AI cs.IT math.IT 版本更新

High-Rate Quantized Matrix Multiplication II

高速率量化矩阵乘法II

Or Ordentlich, Yury Polyanskiy

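Affiliations: Hebrew University of Jerusalem; MIT

AI summary: Studies high-rate quantized matrix multiplication when the covariance matrix of the second factor's columns is known, uses waterfilling to improve LLM quantization methods, and characterizes the performance of the WaterSIC scheme relative to the information-theoretic limit.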

AI中文摘要

本文是关于量化矩阵乘法（MatMul）工作的第二部分。在第一部分中，我们考虑了无校准量化的情况，而在这里，我们讨论了在第二因子列协方差矩阵$Σ_X$已知的情况下的情形。这种情形出现在广泛应用的LLM后训练量化任务中。权重量化与加权均方误差（WMSE）源编码问题相关，其经典的（反向）水填充解决定了如何在向量的坐标之间分配速率。我们展示了如何利用水填充来改进实际的LLM量化算法（GPTQ），目前这些算法平均分配速率。最近的一种方案（称为``WaterSIC''）仅使用标量INT量化器进行分析，其高速率性能被证明为（a）基无关（即由$Σ_X$的行列式决定，因此不同于现有方案，不受随机旋转的影响）；（b）在信息论极限下的性能与$\frac{2πe}{12}$（或0.25 bit/entry）的乘法因子内。GPTQ的性能受基的选择影响，但对于随机旋转和实际的$Σ_X$来自Llama-3-8B，我们发现其性能在0.1 bit（取决于层类型）以内，表明GPTQ结合随机旋转也接近最优，至少在高速率范围内。

英文摘要

This is the second part of the work investigating quantized matrix multiplication (MatMul). In part I we considered the case of calibration-free quantization, whereas here we discuss the setting where the covariance matrix $Σ_X$ of the columns of the second factor is available. This setting arises in the ubiquitous task of weight-only post-training quantization of LLMs. Weight-only quantization is related to the problem of weighted mean squared error (WMSE) source coding, whose classical (reverse) waterfilling solution dictates how one should distribute rate between coordinates of the vector. We show how waterfilling can be used to improve practical LLM quantization algorithms (GPTQ), which at present allocate rate equally. A recent scheme (known as "WaterSIC") that only uses scalar INT quantizers is analyzed and its high-rate performance is shown to be (a) basis free (i.e., characterized by the determinant of $Σ_X$ and, thus, unlike existing schemes, is immune to applying random rotations); and (b) within a multiplicative factor of $\frac{2πe}{12}$ (or 0.25 bit/entry) of the information-theoretic distortion limit. GPTQ's performance, in turn, is affected by the choice of basis, but for a random rotation and actual $Σ_X$ from Llama-3-8B we find it to be within 0.1 bit (depending on the layer type) of WaterSIC, suggesting that GPTQ with random rotation is also near optimal, at least in the high-rate regime.
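Two quantities in the abstract can be checked numerically. The sketch below first converts the distortion factor $\frac{2πe}{12}$ into a rate gap in bits, using the standard high-rate relation that multiplying distortion by a factor c costs 0.5·log2(c) bits per entry, and then implements textbook reverse waterfilling for independent Gaussian components under plain (unweighted) MSE. The function name and the example variances are hypothetical, and this is the classical allocation rule only, not the paper's WMSE formulation or WaterSIC.

```python
import math

# Rate gap implied by a distortion factor of 2*pi*e/12 (roughly 0.25 bit).
gap_bits = 0.5 * math.log2(2 * math.pi * math.e / 12)

def reverse_waterfilling(variances, total_distortion, tol=1e-12):
    """Classical reverse waterfilling for independent Gaussian components.

    Finds the water level theta with sum(min(theta, var)) == total_distortion
    and returns (theta, rates), where rates[i] = max(0, 0.5*log2(var_i/theta)).
    Assumes 0 < total_distortion <= sum(variances).
    """
    lo, hi = 0.0, max(variances)
    # Bisection: the achieved distortion is nondecreasing in theta.
    while hi - lo > tol:
        theta = 0.5 * (lo + hi)
        if sum(min(theta, v) for v in variances) < total_distortion:
            lo = theta
        else:
            hi = theta
    theta = 0.5 * (lo + hi)
    rates = [max(0.0, 0.5 * math.log2(v / theta)) for v in variances]
    return theta, rates

# Hypothetical component variances and distortion budget.
theta, rates = reverse_waterfilling([4.0, 1.0, 0.25], total_distortion=0.75)
```

Components with variance at or below the water level receive zero rate, which is the unequal allocation that the abstract contrasts with schemes that allocate rate equally.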

2605.15491 2026-06-09 cs.LG cs.AI cs.PF 版本更新

Ghosted Layers: Unconstrained Activation Alignment for Recovering Layer-Pruned LLMs

Ghosted Layers: 无约束激活对齐用于恢复层剪枝的LLM

Vincent-Daniel Yun, Junhyuk Jo, Sai Praneeth Karimireddy, Sunwoo Lee

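Affiliations: University of Southern California; Inha University

AI summary: Proposes Ghosted Layers, which uses unconstrained optimization to fix the activation distribution mismatch after layer pruning, improving LLM accuracy and perplexity without sacrificing efficiency.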

AI中文摘要

层剪枝从大型语言模型中移除整个Transformer解码器块，但导致后续存活层接收到的隐藏状态分布与训练时分布不匹配，从而引起显著性能下降。我们提出Ghosted Layers，一种无需训练的恢复模块，通过解决边界激活对齐问题来解决此问题。我们的方法从少量校准集推导出闭合形式的最优线性算子，以重建由剪枝层引入的激活差异。我们展示该解决方案对应于对齐目标的无约束最优解，而现有方法受限于有限算子子空间内的约束解。在多个LLM backbone和剪枝策略上的实验表明，我们的方法在保持层剪枝效率增益的同时，一致提升了准确性和perplexity，优于先前的无训练基线。官方代码仓库：https://github.com/daniel-eai/ghosted_layers_official_repository/.

英文摘要

Layer pruning removes entire Transformer decoder blocks from large language models, but introduces a mismatch between the hidden state received by the next surviving layer and the distribution it was trained to process, leading to significant performance degradation. We propose Ghosted Layers, a training-free recovery module that addresses this issue by solving a boundary activation alignment problem. Our method derives a closed-form optimal linear operator from a small calibration set to reconstruct the activation discrepancy introduced by the pruned layers. We show that this solution corresponds to the unconstrained optimum of the alignment objective, whereas existing methods are restricted to constrained solutions over limited operator subspaces. Experiments across multiple LLM backbones and pruning strategies demonstrate that our method consistently improves accuracy and perplexity over prior training-free baselines, while preserving the efficiency gains of layer pruning. Official code repository: https://github.com/daniel-eai/ghosted_layers_official_repository/.

2605.17289 2026-06-09 cs.LG cs.AI 版本更新

LEAP: Learnable End-to-End Adaptive Pruning of Large Language Models

LEAP：可学习的端到端无结构剪枝大型语言模型

Mohammad Mozaffari, Younes Hourri, Mohammad Rastegari, Mahyar Najibi

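Affiliations: University of Maryland

AI summary: Proposes LEAP, a learnable end-to-end pruning method that replaces the conventional parameterization with a per-weight Bernoulli-via-Gumbel-sigmoid relaxation, improving the end-to-end accuracy of unstructured pruning; experiments show higher average zero-shot accuracy across several LLM families.

Comments Accepted at the ICML 2026 Workshop on Resource-Adaptive Foundation Model Inference (AdaptFM)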

AI中文摘要

无结构稀疏性现在通过最近的GPU内核和数据流硬件原生加速，瓶颈从推理执行转移到了剪枝算法。最先进的无结构LLM剪枝方法是基于最优大脑外科手术原理的分层代理，牺牲了端到端准确性，尤其是在高稀疏度下。端到端替代方案如MaskLLM和PATCH表明可学习掩码可以缩小这一差距，但它们的类别-模式参数化随有效掩码数量按行数增长，并不适用于无结构设置。我们引入LEAP，用每权重伯努利-戈姆贝茨松弛替代这种不可行参数化，使端到端无结构掩码学习变得可行。在五个从0.5B到8B参数的LLM家族上，在50%和60%稀疏度下，LEAP在六个任务的零样本准确率上平均比ADMM提升+2.59点，ADMM是我们在扫掠中的最佳分层基线。

英文摘要

Unstructured sparsity is now natively accelerated by recent GPU kernels and dataflow hardware, shifting the bottleneck from inference execution to the pruning algorithm. State-of-the-art methods for unstructured LLM pruning are layer-wise surrogates derived from the Optimal Brain Surgeon principle, and they sacrifice end-to-end accuracy, especially under aggressive sparsity. End-to-end alternatives such as MaskLLM and PATCH show that learnable masks can close this gap, but their categorical-over-patterns parameterization scales with the number of valid masks per row and does not port to the unstructured setting. We introduce LEAP, which replaces this intractable parameterization with a per-weight Bernoulli-via-Gumbel-sigmoid relaxation that makes end-to-end unstructured mask learning tractable. Across five LLM families from 0.5B to 8B parameters at 50% and 60% sparsity, LEAP improves six-task average zero-shot accuracy by +2.59 points on average over ADMM, the best layer-wise baseline in our sweep.
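The per-weight relaxation named in the abstract can be illustrated with the generic Gumbel-sigmoid (binary Concrete) trick. This is a sketch of the general technique under stated assumptions, not LEAP's actual parameterization or training loop: the function names, the temperature `tau`, and the thresholding at 0.5 are all hypothetical choices.

```python
import math
import random

def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def gumbel_sigmoid(logit, tau=1.0, rng=random):
    """Draw a relaxed Bernoulli sample in [0, 1] for one mask logit.

    The difference of two Gumbel(0, 1) samples is Logistic(0, 1) noise,
    sampled here as log(u) - log(1 - u) for uniform u.
    """
    u = rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep the logs finite
    noise = math.log(u) - math.log(1.0 - u)
    return _sigmoid((logit + noise) / tau)

def sample_mask(logits, tau=1.0, rng=random):
    """Relaxed keep-probabilities for a list of per-weight logits."""
    return [gumbel_sigmoid(l, tau, rng) for l in logits]

# Hypothetical logits for four weights; larger logit means more likely kept.
soft_mask = sample_mask([2.0, 0.0, -2.0, 4.0], tau=0.5, rng=random.Random(0))
hard_mask = [m > 0.5 for m in soft_mask]
```

Thresholding the relaxed sample at 0.5 keeps a weight with probability sigmoid(logit), so the soft mask is a differentiable surrogate for a per-weight Bernoulli draw; lowering `tau` pushes the soft values toward 0 or 1.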

2605.18643 2026-06-09 cs.LG cs.AI cs.CL 版本更新

Post-Trained MoE Can Skip Half Experts via Self-Distillation

Xingtai Lv, Li Sheng, Kaiyan Zhang, Yichen You, Siyan Gao, Xueheng Luo, Yuxin Zuo, Yuchen Fan, Junlin Yang, Ganqu Cui, Bingning Wang, Fan Yang, Youbang Sun, Ning Ding, Bowen Zhou

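Affiliations: Frontis.AI; Kuaishou Technology; Shanghai AI Lab; TsinghuaC3I/ZEDA

AI summary: Proposes the ZEDA framework, which uses self-distillation to convert post-trained static MoE models into efficient dynamic MoE models, substantially reducing expert FLOPs and speeding up inference.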

AI中文摘要

混合专家（MoE）通过稀疏专家激活高效地扩展语言模型，其动态变体进一步通过输入依赖的方式调整激活专家以减少计算。现有动态MoE方法通常依赖从头训练或任务特定适应，使完全训练的MoE的实际转换未被充分探索。启用此类适应可直接缓解推理成本，通过允许简单令牌在服务时绕过不必要的专家。本文引入了零专家自蒸馏适应（ZEDA），一种低成本框架，将后训练的静态MoE模型转换为高效的动态MoE模型。为稳定此架构转换，ZEDA在每个MoE层中注入无参数的零输出专家，并通过两阶段自蒸馏适应增强模型，利用原始MoE作为冻结的教师，并应用组级平衡损失。在Qwen3-30B-A3B和GLM-4.7-Flash上跨11个基准测试（涵盖数学、代码和指令跟随）中，ZEDA在边际精度损失下消除了超过50%的专家FLOPs。在两个模型上，ZEDA比最强的动态MoE基线分别高出6.1和4.0个点，并提供约1.20倍的端到端推理加速。

英文摘要

Mixture-of-Experts (MoE) scales language models efficiently through sparse expert activation, and its dynamic variant further reduces computation by adjusting the activated experts in an input-dependent manner. Existing dynamic MoE methods usually rely on pre-training from scratch or task-specific adaptation, leaving the practical conversion of fully trained MoE underexplored. Enabling such adaptation would directly alleviate inference costs by allowing easy tokens to bypass unnecessary experts during serving. This paper introduces Zero-Expert Self-Distillation Adaptation (ZEDA), a low-cost framework that transforms post-trained static MoE models into efficient dynamic ones. To stabilize this architectural conversion, ZEDA injects parameter-free zero-output experts into each MoE layer and adapts the augmented model through two-stage self-distillation, utilizing the original MoE as a frozen teacher and applying a group-level balancing loss. On Qwen3-30B-A3B and GLM-4.7-Flash across 11 benchmarks spanning math, code, and instruction following, ZEDA eliminates over 50% of expert FLOPs at marginal accuracy loss. It outperforms the strongest dynamic MoE baseline by 6.1 and 4.0 points on the two models, and delivers ~1.20x end-to-end inference speedup.

2605.18856 2026-06-09 cs.LG cs.CL cs.IT math.IT 版本更新

SPHERICAL KV: Angle-Domain Attention and Rate-Distortion Retention for Efficient Long-Context Inference

SPHERICAL KV: 角度域注意力与率失真保持用于高效长上下文推理

Anay Chauhan, Gurucharan Marthi Krishna Kumar, Arion Das, Amit Dhanda, Vinija Jain, Aman Chadha, Amitava Das

Affiliations: Synopsys; McGill University; IIIT Ranchi; Amazon; Meta; Apple; Pragya Lab, BITS Pilani Goa

AI summary: Proposes Spherical KV, which uses Angle-Domain Attention (ADA) and Rate-Distortion Retention (RDR) to reduce the KV cache footprint in long-context inference while preserving decode efficiency.

AI中文摘要

长上下文推理日益受到KV缓存的限制：常驻内存随上下文长度增长，解码受限于重复的高带宽内存（HBM）流而非算术运算。现有方法如驱逐、窗口化、量化和卸载减少了占用，但通常仅部分解决了关键路径瓶颈，尤其是在解码期间压缩状态仍需重建为密集向量时。我们提出Spherical KV，一种将KV分配视为基于注意力几何的率失真问题以实现高效解码的长上下文推理方法。该方法基于两个思想：(i) 在解码热循环中廉价地表示方向信息，(ii) 根据估计的未来效用分配保留和精度。其第一个组件，角度域注意力（ADA），将键存储在由标量半径和紧凑角度码组成的球面参数化中，并直接根据这些码计算注意力对数，无需重建密集键。这保留了分页、块局部、融合友好的解码路径，并在实际服务设置中直接针对HBM流量。其第二个组件，率失真保持（RDR），在固定预算下联合选择每个令牌和头的保留/丢弃决策及精度层级，生成层级同质的页面，具有轻量级元数据和合并读取。ADA和RDR共同提供了一种面向部署的机制，在保持解码效率的同时减少KV常驻内存。

英文摘要

Long-context inference is increasingly constrained by the KV cache: resident memory grows with context length, and decoding becomes limited by repeated High Bandwidth Memory (HBM) streaming rather than arithmetic. Existing methods such as eviction, windowing, quantization, and offloading reduce footprint, but often leave the critical-path bottleneck only partially addressed, especially when compressed states must still be reconstructed into dense vectors during decoding. We present Spherical KV, a long-context inference method that treats KV allocation as a rate-distortion problem grounded in attention geometry for efficient decoding. The method is built on two ideas: (i) represent directional information cheaply in the decode hot loop, and (ii) allocate retention and precision according to estimated future utility. Its first component, Angle-Domain Attention (ADA), stores keys in a spherical parameterization consisting of a scalar radius and compact angle codes, and computes attention logits directly from these codes without reconstructing dense keys. This preserves a paged, block-local, fusion-friendly decode path and directly targets HBM traffic in realistic serving settings. Its second component, Rate-Distortion Retention (RDR), jointly chooses keep/drop decisions and precision tiers per token and head under a fixed budget, producing tier-homogeneous pages with lightweight metadata and coalesced reads. Together, ADA and RDR provide a deployment-oriented mechanism for reducing KV residency while preserving decode efficiency.

2605.22863 2026-06-09 cs.LG 版本更新

Latent Cache Flow: Model-to-Model Communication Without Text

潜在缓存流：无需文本的模型间通信

Maximillian Rossi, Prajwal Raghunath, Eugene Wu

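Affiliations: University of California, Berkeley; Stanford University

AI summary: Proposes Latent Cache Flow (LCF), which jointly translates and compresses key-value caches for efficient model-to-model communication; in differing-context settings it improves Exact Match by 23% and is 8.5 times faster than text-based communication.

Comments 6 pages, 5 figures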

AI中文摘要

当今的LLM智能体通过文本进行通信，由于需要自回归解码共享模型的状态并在接收模型处编码，这会导致显著的延迟和信息损失。最近的工作如Cache-to-Cache（C2C；Fu等人，2026）试图通过学习适配器来交换KV缓存，该适配器将共享者的KV矩阵转换为接收者模型。然而，这些适配器体积庞大且训练成本高，并且逐词翻译，要求目标上下文完全相同。这对于LLM具有不同上下文的智能体通信来说是不合适的。我们引入了潜在缓存流（LCF）。为了解决效率问题，我们观察到键和值可以联合翻译和压缩，将适配器大小减少到C2C的约4%。为了解决上下文不同的问题，我们设计了适配器来传输目标模型所没有的新信息的摘要。我们的初步实验表明，在共享上下文设置中，一个13 MB的LCF适配器可以比956 MB的C2C适配器更准确；对于不同上下文，LCF比基于文本的通信准确率提高23%，速度提升8.5倍。

英文摘要

LLM agents today communicate via text, which incurs considerable latency and information loss due to the need to autoregressively decode the sharer model's state and encode at the receiver model. Recent work such as Cache-to-Cache (C2C; Fu et al., 2026) seeks to exchange KV caches by learning adapters that translate sharer KV matrices to the receiver model. However, the adapters are large and expensive to train, and translate individual tokens, which requires the target context to be identical. This is unsuitable for agent communication, where the LLMs have differing context. We introduce Latent Cache Flow (LCF). To address efficiency, we observe that keys and values can be jointly translated and compressed, reducing the adapter to about 4% of C2C's size. To address differing context, we design the adapter to transmit a summary of new information that the target model does not have. Our early experiments show that a pruned 13 MB LCF adapter can be more accurate than C2C at 956 MB in shared-context settings; for different contexts, LCF improves F1 by 7.5% and Exact Match by 23% while being 8.5 times faster than text-based communication.


2605.27786 2026-06-09 cs.LG cs.AI 版本更新

Locality-Aware Redundancy Pruning for LLM Depth Compression

面向LLM深度压缩的局部感知冗余剪枝

Vincent-Daniel Yun, Youngrae Kim, Woosang Lim, YoungJin Heo, Minkyu Kim, Sunwoo Lee

发表机构 * University of Southern California（美国南加州大学）； Neural Superintelligence Lab, MODULABS（MODULABS神经超级智能实验室）； Seoul National University（首尔国立大学）； Inha University（仁荷大学）

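AI summary: Proposes LoRP, a training-free, one-shot depth-pruning framework based on representation locality; it introduces a Representation Locality Score (RLS) to identify and prune redundant layers, improving perplexity and downstream-task accuracy across several LLMs.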

详情

AI中文摘要

Hyperflux: 剪枝揭示重要性

Eugen Barbulescu, Antonio Alexoaie, Lucian Busoniu

发表机构 * Department of Computer Science（计算机科学系）； Technical University of Cluj-Napoca（克卢日-纳波卡技术大学）； Department of Automation（自动化系）

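AI summary: Proposes Hyperflux, which models pruning as a continuously evolving system driven by flux and pressure, explains pruning behavior at the micro and macro levels, and introduces a pressure scheduler to reach a target sparsity, with competitive results on several datasets.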

2509.10334 2026-06-09 cs.CV cs.AI cs.LG 版本更新

I-Segmenter: Integer-Only Vision Transformer for Efficient Semantic Segmentation

I-Segmenter: 用于高效语义分割的纯整数视觉Transformer

Jordan Sassoon, Michal Szczepanski, Martyna Poreba

发表机构 * CEA, France（法国原子能委员会）

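AI summary: Proposes I-Segmenter, the first fully integer-only ViT segmentation framework; it replaces floating-point operations with integer-only counterparts, adds the λ-ShiftGELU activation, and simplifies the decoder, reducing model size by up to 3.8x and speeding up inference by up to 1.2x while staying within 5.1% of the FP32 baseline accuracy on average.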

Comments Accepted by the Journal of Systems Architecture

详情

AI中文摘要

视觉Transformer（ViT）最近在语义分割中取得了强劲的结果，但由于其高内存占用和计算成本，在资源受限设备上的部署仍然有限。量化提供了一种提高效率的有效策略，但基于ViT的分割模型在低精度下非常脆弱，因为量化误差会在深度编码器-解码器流水线中累积。我们引入了I-Segmenter，这是第一个完全纯整数的ViT分割框架。基于Segmenter架构，I-Segmenter系统地将浮点运算替换为纯整数对应运算。为了进一步稳定训练和推理，我们提出了λ-ShiftGELU，一种新颖的激活函数，它减轻了均匀量化在处理长尾激活分布时的局限性。此外，我们移除了L2归一化层，并将解码器中的双线性插值替换为最近邻上采样，确保整个计算图都是纯整数执行。大量实验表明，I-Segmenter在合理精度范围内（平均5.1%）达到其FP32基线的精度，同时将模型大小减少高达3.8倍，并通过优化的运行时实现高达1.2倍的推理加速。值得注意的是，即使在单张校准图像的一次性PTQ中，I-Segmenter也能提供有竞争力的精度，凸显了其在实际部署中的实用性。

英文摘要

Vision Transformers (ViTs) have recently achieved strong results in semantic segmentation, yet their deployment on resource-constrained devices remains limited due to their high memory footprint and computational cost. Quantization offers an effective strategy to improve efficiency, but ViT-based segmentation models are notoriously fragile under low precision, as quantization errors accumulate across deep encoder-decoder pipelines. We introduce I-Segmenter, the first fully integer-only ViT segmentation framework. Building on the Segmenter architecture, I-Segmenter systematically replaces floating-point operations with integer-only counterparts. To further stabilize both training and inference, we propose λ-ShiftGELU, a novel activation function that mitigates the limitations of uniform quantization in handling long-tailed activation distributions. In addition, we remove the L2 normalization layer and replace bilinear interpolation in the decoder with nearest neighbor upsampling, ensuring integer-only execution throughout the computational graph. Extensive experiments show that I-Segmenter achieves accuracy within a reasonable margin of its FP32 baseline (5.1% on average), while reducing model size by up to 3.8x and enabling up to 1.2x faster inference with optimized runtimes. Notably, even in one-shot PTQ with a single calibration image, I-Segmenter delivers competitive accuracy, underscoring its practicality for real-world deployment.
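
The abstract notes that bilinear interpolation is replaced with nearest neighbor upsampling to keep the graph integer-only. A minimal sketch of why that helps, assuming a 2D list of already-quantized integer values and a hypothetical helper name `nearest_upsample` (this is not the I-Segmenter implementation): nearest neighbor upsampling only copies existing values, whereas bilinear interpolation needs fractional blending weights.

```python
def nearest_upsample(grid, factor):
    """Upsample a 2D list of integers by repeating each value `factor` times
    along both axes. Only copies are made, so no new (fractional) values appear."""
    out = []
    for row in grid:
        # Repeat every element horizontally.
        wide = [v for v in row for _ in range(factor)]
        # Repeat the widened row vertically.
        for _ in range(factor):
            out.append(list(wide))
    return out


# Hypothetical 2x2 map of quantized integer logits.
upsampled = nearest_upsample([[1, 2], [3, 4]], 2)
```

Every output value is one of the input integers, so the operation stays inside the integer domain by construction.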


2602.21788 2026-06-09 cs.DC cs.LG 版本更新

Efficient Scaling of LLM Training with Flexible Context Parallelism

利用灵活上下文并行实现LLM训练的高效扩展

Yifan Niu, Han Xiao, Dongyi Liu, Wei Zhou, Jia Li

发表机构 * The Hong Kong University of Science and Technology (Guangzhou)（香港科技大学（广州））； Huawei Technologies Co., Ltd.（华为技术有限公司）

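AI summary: To address load imbalance and redundant communication caused by heterogeneous sequence lengths, proposes Flexible Context Parallelism (FCP), which adaptively reconfigures communication groups and context-parallelism degrees; it maintains near-linear scaling efficiency and improves average throughput by up to 1.46x.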

详情

AI中文摘要

扩展长上下文能力对于大型语言模型（LLM）至关重要。然而，现实世界的数据包含大量具有异构长度的序列。现有的LLM训练库依赖于静态并行策略，在数据异构下会遭受严重的负载不均衡、冗余通信和次优的硬件利用率。在这项工作中，我们提出了灵活上下文并行（FCP），一种高效的并行策略，能够在LLM训练期间自适应地重配置通信组和上下文并行度。我们推广了更灵活的非2的幂次并行度，并开发了一个多项式时间算法，为每个训练批次生成近乎最优的并行策略，开销仅为毫秒级。即使在极端数据异构下，FCP也能保持高硬件效率。实验结果表明，FCP在LLM和多模态大模型（MLLM）训练中均显著优于Megatron-LM和DeepSpeed，在保持大规模集群近线性扩展效率的同时，平均吞吐量提升高达1.46倍。对于极端不平衡的批次，FCP甚至实现了2.24倍的加速。

英文摘要

Scaling long-context capabilities is crucial for Large Language Models (LLMs). However, real-world data contain a large number of sequences with heterogeneous lengths. Existing training libraries for LLMs rely on static parallelism strategies, which suffer from severe load imbalance, redundant communication, and suboptimal hardware utilization under data heterogeneity. In this work, we propose Flexible Context Parallelism (FCP), an efficient parallelism strategy that adaptively reconfigures communication groups and context parallelism degrees during LLM training. We generalize more flexible non-power-of-two parallelism degrees and develop a polynomial-time algorithm to generate near-optimal parallelism strategies with only millisecond-level overhead per training batch. FCP is able to maintain high hardware efficiency even under extreme data heterogeneity. Experimental results demonstrate that FCP significantly outperforms Megatron-LM and DeepSpeed in both LLM and MLLM training, achieving up to 1.46x speedup in average throughput while maintaining near-linear scaling efficiency across large-scale clusters. For extremely unbalanced batches, FCP even achieves 2.24x speedup.
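
The load-imbalance problem described above can be illustrated with a small sketch. It is not the FCP algorithm (the abstract does not give it); it assumes a quadratic attention cost per sequence and compares a static round-robin placement with a greedy longest-first placement. The function names and the example batch are hypothetical.

```python
import heapq


def attention_cost(seq_len):
    # Assumption: per-sequence cost is dominated by quadratic attention.
    return seq_len * seq_len


def static_round_robin(seq_lens, num_ranks):
    """Assign sequences to ranks in arrival order, ignoring their lengths."""
    loads = [0] * num_ranks
    for i, n in enumerate(seq_lens):
        loads[i % num_ranks] += attention_cost(n)
    return loads


def greedy_balance(seq_lens, num_ranks):
    """Longest-first heuristic: put the most expensive remaining sequence
    on the currently lightest rank."""
    heap = [(0, r) for r in range(num_ranks)]
    heapq.heapify(heap)
    loads = [0] * num_ranks
    for n in sorted(seq_lens, reverse=True):
        load, r = heapq.heappop(heap)
        load += attention_cost(n)
        loads[r] = load
        heapq.heappush(heap, (load, r))
    return loads


def imbalance(loads):
    """Ratio of the heaviest rank to the mean load (1.0 is perfectly balanced)."""
    return max(loads) / (sum(loads) / len(loads))


# Hypothetical batch with heterogeneous lengths, on a non-power-of-two rank count.
batch = [8192, 512, 512, 512, 4096, 256, 256, 1024]
static_loads = static_round_robin(batch, 3)
greedy_loads = greedy_balance(batch, 3)
```

Even the greedy placement stays heavily imbalanced here because one long sequence dominates the batch, which is the kind of case where splitting a sequence across devices (context parallelism) with a per-batch degree becomes relevant.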


2603.23640 2026-06-09 cs.DC cs.LG 版本更新

LLM Inference at the Edge: Mobile, NPU, and GPU Performance Efficiency Trade-offs Under Sustained Load

边缘侧的大语言模型推理：移动、NPU和GPU在持续负载下的性能效率权衡

Pranay Tummalapalli, Sahil Arayakandy, Ritam Pal, Kautuk Kundan

发表机构 * Conscious Engines

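AI summary: Benchmarks sustained LLM inference across four platforms and finds that mobile devices are limited by thermal management, while the dedicated hardware is limited by battery power (RTX 4050) and on-module memory bandwidth (Hailo-10H), with clear differences in throughput and energy efficiency.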

Comments 14 pages, 5 figures, 10 tables

详情

AI中文摘要

在设备上部署大语言模型以实现始终在线的个人智能体，需要在功率、散热和内存均受到严格限制的硬件上进行持续推理。我们对Qwen 2.5 1.5B（4位量化）在四个平台上进行了基准测试：搭载Hailo-10H NPU的Raspberry Pi 5、三星Galaxy S24 Ultra、iPhone 16 Pro以及搭载NVIDIA RTX 4050 GPU的笔记本电脑。使用固定的258个token的提示，在每台设备上进行20次热态迭代，我们测量了吞吐量、延迟、功率和热行为。对于移动平台，热管理取代峰值算力成为主要限制：iPhone 16 Pro在两次迭代内损失近一半的吞吐量，而S24 Ultra触及操作系统强制的GPU频率下限，导致推理完全终止。在专用硬件上，起主导作用的是不同的限制：RTX 4050受电池供电功率上限约束，而Hailo-10H受模块内存带宽限制。RTX 4050在34.1 W下维持131.7 tok/s；Hailo-10H在不到2 W下维持6.9 tok/s且波动接近于零，其能效比例与RTX 4050相当，但吞吐量低19倍。这些结果应被理解为针对单一模型和提示类型的平台级部署特征，反映的是硬件与软件的综合表现，而非仅关于硬件能力的一般性结论。

英文摘要

Deploying large language models on-device for always-on personal agents demands sustained inference from hardware tightly constrained in power, thermal envelope, and memory. We benchmark Qwen 2.5 1.5B (4-bit quantised) across four platforms: a Raspberry Pi 5 with Hailo-10H NPU, a Samsung Galaxy S24 Ultra, an iPhone 16 Pro, and a laptop NVIDIA RTX 4050 GPU. Using a fixed 258-token prompt over 20 warm-condition iterations per device, we measure throughput, latency, power, and thermal behaviour. For mobile platforms, thermal management supersedes peak compute as the primary constraint: the iPhone 16 Pro loses nearly half its throughput within two iterations, and the S24 Ultra suffers a hard OS-enforced GPU frequency floor that terminates inference entirely. On dedicated hardware, distinct constraints dominate: the RTX 4050 is bounded by its battery power ceiling, while the Hailo-10H is limited by on-module memory bandwidth. The RTX 4050 sustains 131.7 tok/s at 34.1 W; the Hailo-10H sustains 6.9 tok/s at under 2 W with near-zero variance, matching the RTX 4050 in energy proportionality at 19x lower throughput. Results should be interpreted as platform-level deployment characterisations for a single model and prompt type, reflecting hardware and software combined, rather than general claims about hardware capability alone.
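
The energy-proportionality claim can be checked with simple arithmetic from the figures quoted in the abstract. The sketch below assumes energy per token is power divided by throughput; because the Hailo-10H power is only given as "under 2 W", its value is an upper bound.

```python
def joules_per_token(power_watts, tokens_per_second):
    """Energy per generated token: watts are joules per second."""
    return power_watts / tokens_per_second


# Figures quoted in the abstract.
rtx_4050 = joules_per_token(34.1, 131.7)        # about 0.26 J per token
hailo_10h_upper = joules_per_token(2.0, 6.9)    # upper bound, about 0.29 J per token

# Throughput gap between the two platforms (about 19x).
throughput_ratio = 131.7 / 6.9
```

Both platforms land in the same range of energy per token, which is consistent with the abstract's statement that they match in energy proportionality despite the roughly 19x throughput gap.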


2605.03229 2026-06-09 cs.CL cs.LG 版本更新

Sparse Memory Finetuning as a Low-Forgetting Alternative to LoRA and Full Finetuning

稀疏记忆微调：作为LoRA和全微调的低遗忘替代方案

Prakhar Gupta, Garv Shah, Satyam Goyal, Anirudh Kanchi

发表机构 * University of Washington（华盛顿大学）

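AI summary: Re-implements Sparse Memory Finetuning (SMF), which adds key-value memory layers and updates only the memory rows the current batch reads most heavily; SMF improves MedMCQA by 2.5 percentage points while keeping the forgetting probes (WikiText perplexity and TriviaQA accuracy) within roughly 1 point of the base model, whereas LoRA and full finetuning gain more but drift on both probes.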

详情

AI中文摘要

将预训练语言模型适应新任务通常会损害其已有的通用能力，这一问题被称为灾难性遗忘。稀疏记忆微调（SMF）通过向模型添加键值记忆层，并在每个训练步骤中仅更新当前批次读取最频繁的一小组记忆行来避免这种情况。我们在Qwen-2.5-0.5B-Instruct上重新实现了SMF，并将其与LoRA和全微调在MedMCQA（一个4选1的医学考试任务）上进行比较，使用WikiText困惑度和TriviaQA准确率作为遗忘探针。SMF将MedMCQA提升了2.5个百分点，同时将两个遗忘探针保持在基线的约1个百分点内，而LoRA和全微调虽然取得了更大的增益，但在两个探针上都出现了明显的漂移。我们还比较了两种行选择规则（KL散度和TF-IDF），它们在两个遗忘指标上取得了不同的平衡。

英文摘要

Adapting a pretrained language model to a new task often hurts the general capabilities it already had, a problem known as catastrophic forgetting. Sparse Memory Finetuning (SMF) tries to avoid this by adding key-value memory layers to the model and, on each training step, updating only the small set of memory rows that the current batch reads most heavily. We re-implement SMF on Qwen-2.5-0.5B-Instruct and compare it with LoRA and full finetuning on MedMCQA, a 4-choice medical exam task, using WikiText perplexity and TriviaQA accuracy as forgetting probes. SMF improves MedMCQA by 2.5 percentage points while keeping both forgetting probes within roughly 1 point of the base model, whereas LoRA and full finetuning achieve larger gains but with clear drift on both. We also compare two row-selection rules (KL-divergence and TF-IDF), which balance the two forgetting metrics differently.
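
The core update rule, touching only the most heavily read memory rows, can be sketched as follows. This is a simplified illustration rather than the authors' code: it ranks rows by raw access counts, whereas the abstract mentions KL-divergence and TF-IDF selection rules, and the names `top_k_rows` and `sparse_update` are hypothetical.

```python
def top_k_rows(access_counts, k):
    """Indices of the k most-read rows (ties broken by lower index)."""
    order = sorted(range(len(access_counts)), key=lambda i: (-access_counts[i], i))
    return sorted(order[:k])


def sparse_update(memory, grads, access_counts, k, lr):
    """Apply a gradient step only to the selected rows; all other rows are
    copied unchanged, which is what limits drift in the rest of the memory."""
    selected = set(top_k_rows(access_counts, k))
    updated = []
    for i, (row, grad) in enumerate(zip(memory, grads)):
        if i in selected:
            updated.append([w - lr * g for w, g in zip(row, grad)])
        else:
            updated.append(list(row))
    return updated


# Hypothetical 4-row memory; rows 2 and 0 are read most by the current batch.
memory = [[1.0, 1.0] for _ in range(4)]
grads = [[1.0, 2.0] for _ in range(4)]
new_memory = sparse_update(memory, grads, access_counts=[5, 0, 9, 1], k=2, lr=0.5)
```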


2605.28207 2026-06-09 cs.CL cs.AI cs.LG 版本更新

HASA：计算受限的模型异构联邦学习中的子网分配

Amir Hossein Shahdadian, Ahmed M. Abdelmoniem, Mahdi Taheri, Samira Nazari, Christian Herglotz

发表机构 * University of Naples "Federico II"（那不勒斯腓特烈二世大学）； Queen Mary University of London（伦敦玛丽女王大学）； Brandenburg University of Technology Cottbus-Senftenberg（勃兰登堡工业大学）； Tallinn University of Technology（塔林理工大学）； University of Zanjan（赞詹大学）

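AI summary: Proposes HASA, which assigns subnet widths according to client heterogeneity scores and improves mean and worst-client accuracy under a fixed compute budget.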

详情

AI中文摘要

边缘服务越来越多地使用联邦学习来个性化设备上的模型，同时将敏感数据保留在本地。在实践中，部署必须处理客户端资源和本地数据分布的异构性。模型异构联邦学习通过允许每个客户端训练共享超网的子网来降低客户端成本，但大多数子网分配策略由设备约束驱动，并未明确考虑统计异构性。本文提出异构感知子网分配（HASA），这是一种仅训练规则，根据从本地训练数据计算的客户端异构性分数分配子网宽度，同时强制执行固定的大小加权计算预算。该设计能够与替代分配策略进行预算匹配的比较。在包含七个客户端的文章标题下一个单词预测基准测试中，HASA在10个匹配种子上的未加权平均客户端测试准确率优于均匀分配，将平均客户端测试准确率从13.82%提高到14.32%，并平均提高了最差客户端准确率。在与代表性部分训练基线的匹配预算比较中，HASA在该基准测试上实现了最强的最差客户端和尾部客户端准确率。方向性消融实验表明，将较小的子网分配给更异构的客户端会降低平均和尾部性能。跨领域图像分类研究进一步表明，异构感知分配的有效性取决于异构性分数反映客户端对额外模型宽度需求的程度。

英文摘要

Edge services increasingly use federated learning to personalize on-device models while keeping sensitive data local. In practice, deployments must handle heterogeneity in both client resources and local data distributions. Model-heterogeneous federated learning lowers client cost by allowing each client to train a subnet of a shared supernet, but most subnet-allocation policies are driven by device constraints and do not explicitly account for statistical heterogeneity. This paper proposes Heterogeneity-Aware Subnet Allocation (HASA), a train-only rule that assigns subnet widths based on client heterogeneity scores computed from local training data while enforcing a fixed size-weighted compute budget. This design enables budget-matched comparisons with alternative allocation policies. On an article-title next-word prediction benchmark with seven clients, HASA improves unweighted mean client test accuracy over uniform allocation across 10 matched seeds, increasing mean client test accuracy from 13.82 percent to 14.32 percent, and improves worst-client accuracy on average. In a matched-budget comparison with representative partial-training baselines, HASA achieves the strongest worst-client and tail-client accuracy on this benchmark. A directionality ablation shows that assigning smaller subnets to more heterogeneous clients degrades both mean and tail performance. A cross-domain image-classification study further shows that the effectiveness of heterogeneity-aware allocation depends on how well the heterogeneity score reflects clients' need for additional model width.
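
A budget-constrained width allocation of the kind described above might look like the sketch below. The abstract does not specify the allocation rule, so this is an assumption: clients are visited in order of decreasing heterogeneity score and upgraded to wider subnets while the size-weighted mean width stays within the budget. All names and numbers are hypothetical.

```python
def weighted_mean_width(assigned, sizes):
    """Size-weighted mean of the assigned width fractions."""
    return sum(n * w for n, w in zip(sizes, assigned)) / sum(sizes)


def allocate_widths(scores, sizes, widths, budget):
    """Greedy sketch: give wider subnets to more heterogeneous clients first.

    scores: heterogeneity score per client (higher means more heterogeneous)
    sizes:  local dataset size per client
    widths: candidate width fractions, sorted ascending
    budget: maximum allowed size-weighted mean width
    Every client starts at the smallest width; if even that exceeds the
    budget, the smallest widths are returned unchanged.
    """
    levels = [0] * len(scores)
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    for i in order:
        while levels[i] + 1 < len(widths):
            levels[i] += 1
            assigned = [widths[l] for l in levels]
            if weighted_mean_width(assigned, sizes) > budget + 1e-12:
                levels[i] -= 1  # Undo the upgrade that broke the budget.
                break
    return [widths[l] for l in levels]


# Three hypothetical clients; the budget equals a uniform width of 0.5.
sizes = [100, 100, 100]
assigned = allocate_widths([0.9, 0.1, 0.5], sizes, [0.25, 0.5, 1.0], budget=0.5)
```

With these inputs the most heterogeneous client receives the full-width subnet and the others the smallest one, at the same size-weighted cost as uniform allocation, which is what makes a budget-matched comparison possible.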


2606.07702 2026-06-09 cs.LG cs.AI 新提交

EvoCSFL: Surrogate-Assisted Evolutionary Client Selection for Efficient and Robust Federated Learning

EvoCSFL：基于代理辅助的进化客户端选择实现高效鲁棒联邦学习

Lin Qiang, Sun Xiaoyan, Hu Yao, Fang Wei

发表机构 * Jiangnan University（江南大学）； The Hong Kong Polytechnic University（香港理工大学）

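AI summary: To address the slow convergence and poor robustness caused by client data and system heterogeneity in federated learning, proposes a surrogate-assisted evolutionary client-selection framework that formulates selection as a combinatorial optimization problem and uses a surrogate model to speed up the evolutionary search; experiments show faster convergence, lower energy consumption, and improved robustness.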

详情

AI中文摘要

客户端数据和系统的异构性使得采用随机客户端选择的联邦学习难以获得令人满意的收敛速度和鲁棒性。为解决此问题，本文提出了一种基于代理辅助的客户端进化选择框架。在该框架中，首先使用一些典型的客户端选择策略生成候选集，并开发了一个集成模型性能、通信延迟和能量消耗的度量函数，将客户端选择问题表述为组合优化问题。随后，利用候选选择和度量构建代理模型，以高效逼近所选客户端子集的性能。采用进化算法搜索客户端选择的组合空间，并由代理模型引导以加速收敛。在MNIST、CIFAR10、CINIC10和TinyImageNet上的实验表明，与现有方法相比，所提算法实现了更快的收敛、更低的能量消耗和更好的鲁棒性。

英文摘要

The heterogeneity of client data and systems makes it difficult to achieve satisfactory convergence speed and robustness in federated learning with random client selection. To address this issue, this paper proposes a surrogate-assisted client evolutionary selection framework for federated learning. In this framework, some typical client selection strategies are first used to generate candidate sets, and a metric function that integrates model performance, communication latency, and energy consumption is developed to formulate the client selection problem as a combinatorial optimization one. Subsequently, a surrogate model is constructed using the candidate selections and metric to efficiently approximate the performance of selected client subsets. An evolutionary algorithm is employed to search the combinatorial space of client selections, guided by the surrogate model to accelerate convergence. Experiments on MNIST, CIFAR10, CINIC10, and TinyImageNet demonstrate that the proposed algorithm achieves faster convergence, lower energy consumption, and improved robustness compared to existing methods.


2606.08027 2026-06-09 cs.LG cs.AI 新提交

CausShield: Sample Reconstruction-Resilient Vertical FL via Causal Representation Learning

CausShield: 通过因果表示学习实现样本重建鲁棒的纵向联邦学习

Yongqi Jiang, Yansong Gao, Siguang Chen, Anmin Fu

发表机构 * Nanjing University of Science and Technology（南京理工大学）； University of Western Australia（西澳大学）； Hohai University（河海大学）； Nanjing University（南京大学）

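AI summary: To defend vertical federated learning against sample reconstruction attacks, proposes CausShield, which uses causal representation learning to decompose shared representations into task-relevant and task-irrelevant components for full-cycle privacy protection; convergence is proven theoretically, and experiments show it outperforms seven state-of-the-art methods.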

详情

AI中文摘要

纵向联邦学习（VFL）是一种分布式学习范式，利用跨孤立方的垂直划分特征，无需共享原始样本；然而，它仍然容易受到主动样本重建攻击。现有防御方法由于要么抑制任务相关信息的同时也抑制了隐私敏感特征，要么依赖端到端监督训练来收敛防御模块（这暴露了早期轮次的脆弱性），因此无法在模型效用和隐私保护之间实现令人满意的权衡。为了解决这一挑战，我们采用结构因果模型（SCM）的见解，构建了CausShield。从任务学习的角度来看，原始样本中的因果特征是那些直接相关且有助于学习目标的特征，而非因果特征与任务无关，但通常编码了样本特定的私有信息，从而促进了重建。重要的是，我们奠定了理论基础来证明这一见解。因此，CausShield将VFL中客户端与协调服务器之间的共享表示分解为任务相关和任务无关的组件，以确保全周期的隐私保护。然而，由于在保持模型效用的同时减轻隐私泄露的双重目标，这种分解本质上具有挑战性。我们通过一个精心制定的优化问题来解决这一问题，该问题通过无监督表示学习求解。我们进一步从理论上证明CausShield保持了标准VFL的收敛行为。大量实验将CausShield与七种最新方法（包括InvL (USENIX Security'25)）进行比较，并评估了对高级重建攻击（如URVFL (NDSS'25)）的鲁棒性。结果表明，CausShield在隐私保护、模型效用和计算效率方面始终表现优异。

英文摘要

Vertical federated learning (VFL) is a distributed learning paradigm that leverages vertically partitioned features across isolated parties without sharing raw samples; however, it remains vulnerable to active sample reconstruction attacks. Existing defenses fail to achieve a satisfactory trade-off between model utility and privacy protection, due to either suppressing task-relevant information alongside privacy-sensitive features or relying on end-to-end supervised training to converge the defense module, which exposes the model to early-epoch vulnerability. To address this challenge, we adopt a structural causal model (SCM) insight and construct CausShield. From a task-learning standpoint, causal features within a raw sample are those that are directly relevant and contributory to the learning objective, whereas non-causal features are task-irrelevant but often encode sample-specific private information, thereby facilitating reconstruction. Importantly, we lay a theoretical foundation to prove this insight. CausShield thus decomposes the shared representations between the client and the coordinating server in VFL into task-relevant and task-irrelevant components to ensure full-cycle privacy protection. Nonetheless, the decomposition is inherently challenging due to the dual objectives of preserving model utility while mitigating privacy leakage. We address this via a carefully formulated optimization problem, which is solved through unsupervised representation learning. We further theoretically prove that CausShield preserves the convergence behavior of standard VFL. Extensive experiments compare CausShield against seven SOTAs, including InvL (USENIX Security'25), and evaluate robustness against advanced reconstruction attacks such as URVFL (NDSS'25). Results demonstrate that CausShield consistently outperforms in privacy protection, model utility, and computational efficiency.


2606.08473 2026-06-09 cs.LG 新提交

Physically Consistent Null Space Alignment for Detection of Low-Magnitude False Data Injection Attacks

物理一致零空间对齐用于检测低幅值虚假数据注入攻击

Xin Li, Chenhan Xiao, Jonathan Cohen, Aviad Elyashar, Yang Weng, Rami Puzis

发表机构 * Ben-Gurion-University（本-古里安大学）

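AI summary: Proposes Physically Consistent Null Space Alignment (PCNSA), whose pseudo-null-space-conserving preprocessing preserves the geometric correspondence between the physical null space and the measurement-derived pseudo-null space, enabling detection of stealthy, low-magnitude but high-impact false data injection attacks.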

Comments 12 pages, 13 figures

详情

AI中文摘要

虚假数据注入攻击（FDIAs）引入小的测量扰动，当注入信号与系统模型的伪零空间对齐时，仍可能导致电力系统状态估计出现较大偏差。现有的基于模型和数据驱动的检测器可能无法识别这种低幅值但高影响的攻击，因为残差检验忽略了隐藏在伪零空间中的变化，而子空间学习方法捕获相关模式但未强制执行物理一致性。本文提出物理一致零空间对齐（PCNSA），一种通过预处理保持物理零空间与测量导出伪零空间之间的几何对应来检测隐蔽FDIAs的框架。关键在于伪零空间守恒数据预处理（PSCP）步骤，该步骤在子空间提取之前将测量重新表达在物理坐标系中。我们证明PSCP保持了行空间与其正交补之间的分离，这是传统逐特征标准化所违反的性质。这使得奇异值分解（SVD）导出的伪零子空间与物理残差空间对齐，而无需显式知道H。在IEEE 14、30、57和118节点系统上的实验证实了这一原理：逃避XTM、LSTM、AE和Isolation Forest基线的隐蔽攻击在对齐子空间中表现为明显偏差，从而获得更高的F1分数和检测精度，同时在部分可观测性和实际PMU噪声下保持鲁棒性。

英文摘要

False data injection attacks (FDIAs) introducing small measurement perturbations can still cause large deviations in power system state estimation when the injected signals align with the pseudo-null space of the system model. Existing model- and data-driven detectors may fail to identify such low-magnitude but high-impact attacks because residual tests ignore changes hidden in the pseudo-null space, while subspace learning methods capture correlation patterns without enforcing physical consistency. This paper proposes Physically Consistent Null Space Alignment (PCNSA), a framework that detects stealthy FDIAs by preserving, through preprocessing, the geometric correspondence between the physical null space and the measurement-derived pseudo-null space. The key point is a Pseudo-null Space Conserved data Preprocessing (PSCP) step that re-expresses measurements in the physical coordinate frame before subspace extraction. We prove that PSCP preserves the separation between row space and its orthogonal complement, a property that conventional per-feature standardization violates. This keeps the singular value decomposition (SVD)-derived pseudo-null subspace aligned with the physical residual space without explicit knowledge of H. Experiments on IEEE 14-, 30-, 57-, and 118-bus systems confirm this principle in practice: stealthy attacks that evade XTM, LSTM, AE and Isolation Forest baselines appear as clear deviations in the aligned subspace, yielding higher F1-score and detection accuracy while remaining robust under partial observability and realistic PMU noise.


2606.09301 2026-06-09 cs.LG 新提交

PRISM: Topology-Aware Cross-Modal Imputation for Modality-Deficient Federated Graph Learning

PRISM: 面向模态缺失联邦图学习的拓扑感知跨模态插补

Zekai Chen, Miao Zhang, Jiayang Xing, Xunkai Li, Xun Wu, Rong-Hua Li, Guoren Wang

发表机构 * Beijing Institute of Technology（北京理工大学）

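AI summary: For client-level modality deficiency in federated graph learning, proposes PRISM, a topology-aware cross-modal imputation framework that recovers missing-modality semantics from the federation and injects them into local graph propagation under topology-aware control, outperforming state-of-the-art baselines by 4.48% on average across six multimodal graph datasets.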

详情

AI中文摘要

多模态联邦图学习（MM-FGL）旨在从包含文本和图像的分散图中协作学习。然而，现实世界的客户端可能没有共同的模态基础：视觉搜索客户端可能包含图像-交互图但没有卖家描述，而目录客户端可能提供文本但没有产品图像。我们将这种实际设置称为客户端级模态缺失。与随机的实例级缺失不同，缺失模态的客户端缺乏重建缺失模态所需的局部语义基础。更重要的是，在图学习中，不完整的表示初始化消息传递，因此插补误差可以被接收拓扑过滤、混合和放大。为了解决这一问题，我们提出了PRISM（Proactive Retrieval and Imputation via Structural Meta-prompting），一个拓扑感知的联邦跨模态插补框架。PRISM不是仅从局部观测重建缺失模态，而是从联邦中恢复缺失模态语义，并在拓扑感知控制下将其引入局部图传播。在六个多模态图数据集上、涵盖以图为中心和以模态为中心任务的实验表明，PRISM持续改善模态缺失客户端，平均优于最先进的基线4.48%。

英文摘要

Multimodal federated graph learning (MM-FGL) aims to collaboratively learn from decentralized graphs with text and images. However, real-world clients may not share a common modality basis: a visual-search client may contain image-interaction graphs but no seller descriptions, while a catalog client may provide text but no product images. We refer to this practical setting as client-level modality deficiency. Unlike random instance-wise missingness, a deficient client lacks the local semantic basis needed to reconstruct the absent modality. More importantly, in graph learning, incomplete representations initialize message passing, so imputation errors can be filtered, mixed, and amplified by the receiving topology. To address this gap, we propose PRISM (Proactive Retrieval and Imputation via Structural Meta-prompting), a topology-aware federated cross-modal imputation framework. Rather than reconstructing the missing modality solely from local observations, PRISM recovers missing-modality semantics from the federation and introduces them into local graph propagation under topology-aware control. Experiments on six multimodal graph datasets across graph-centric and modality-centric tasks show that PRISM consistently improves modality-deficient clients, outperforming state-of-the-art baselines by 4.48% on average.


2606.09401 2026-06-09 cs.LG cs.CR 新提交

Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models

大语言模型适配的实证隐私保护基准测试

Bartłomiej Marek, Lorenzo Rossi, Vincent Hanke, Xun Wang, Michael Backes, Franziska Boenisch, Adam Dziedzic

发表机构 * CISPA Helmholtz Center for Information Security（CISPA亥姆霍兹信息安全中心）

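AI summary: Benchmarks the empirical privacy risk of differentially private LLM adaptation by systematically varying the adaptation data distribution and applying robust membership inference and canary data extraction attacks; distribution shift strongly affects privacy vulnerability, and parameter-efficient fine-tuning methods such as LoRA give the highest empirical protection for out-of-distribution data.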

Comments Accepted at ICLR 2026 (Oral)

详情

AI中文摘要

最近的工作应用差分隐私（DP）来适配大语言模型（LLMs）以用于敏感应用，提供了理论保证。然而，其实用有效性仍不明确，部分原因是LLM预训练中，与适配数据的重叠和相互依赖关系可能破坏隐私，尽管采用了DP。为了在实践中分析这一问题，我们使用最先进的攻击（如鲁棒成员推断和金丝雀数据提取）调查了DP适配下LLMs中的隐私风险。我们通过系统变化适配数据分布（从与预训练数据完全重叠，经过分布内（IID）情况，到完全分布外（OOD）示例）来对这些风险进行基准测试。此外，我们评估了不同适配方法和不同隐私机制对脆弱性的影响。我们的结果表明，分布偏移强烈影响隐私脆弱性：在相同的理论保证下，适配数据越接近预训练分布，实际隐私风险越高，即使没有直接的数据重叠。我们发现，参数高效微调方法（如LoRA）对OOD数据实现了最高的实证隐私保护。我们的基准测试确定了在DP LLM适配中实现实际隐私的关键因素，为在敏感环境中部署定制模型提供了可操作的见解。展望未来，我们提出了一个结构化框架，用于超越适配隐私的整体隐私评估，以识别和评估LLM的完整预训练-适配流程中的风险。

英文摘要

Recent work has applied differential privacy (DP) to adapt large language models (LLMs) for sensitive applications, offering theoretical guarantees. However, its practical effectiveness remains unclear, partly due to LLM pretraining, where overlaps and interdependencies with adaptation data can undermine privacy despite DP efforts. To analyze this issue in practice, we investigate privacy risks under DP adaptations in LLMs using state-of-the-art attacks such as robust membership inference and canary data extraction. We benchmark these risks by systematically varying the adaptation data distribution, from exact overlaps with pretraining data, through in-distribution (IID) cases, to entirely out-of-distribution (OOD) examples. Additionally, we evaluate how different adaptation methods and different privacy regimes impact the vulnerability. Our results show that distribution shifts strongly influence privacy vulnerability: the closer the adaptation data is to the pretraining distribution, the higher the practical privacy risk at the same theoretical guarantee, even without direct data overlap. We find that parameter-efficient fine-tuning methods, such as LoRA, achieve the highest empirical privacy protection for OOD data. Our benchmark identifies key factors for achieving practical privacy in DP LLM adaptation, providing actionable insights for deploying customized models in sensitive settings. Looking forward, we propose a structured framework for holistic privacy assessment beyond adaptation privacy, to identify and evaluate risks across the full pretrain-adapt pipeline of LLMs.


2606.09582 2026-06-09 cs.LG stat.ML 新提交

On Choosing the $μ$ Parameter in Gaussian Differential Privacy

论高斯差分隐私中参数 $μ$ 的选择

Bogdan Kulynych, Antti Honkela

发表机构 * Lausanne University Hospital（洛桑大学医院）； University of Helsinki（赫尔辛基大学）

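AI summary: Provides a principled mapping from the pure-DP parameter ε to the GDP parameter μ by matching the worst-case success of membership inference attacks by a strong adversary, and recommends μ ≈ ε/5 as a conservative general-purpose conversion.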

2606.08179 2026-06-09 cs.DS cs.CR cs.LG 交叉投稿

Differentially Private Range Subgraph Counting

差分隐私范围子图计数

Xian Chen, Ruobing Bai, Pan Peng

发表机构 * School of Computer Science and Technology, University of Science and Technology of China（计算机科学与技术学院，中国科学技术大学）

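AI summary: Introduces the differentially private range subgraph counting (DPRSC) problem; a subgraph projection reduces it to weighted orthogonal range counting, and range trees with local sensitivity estimation give private query answers with small additive error, complemented by a lower bound showing that the error must be exponential in the dimension.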

Comments ICML2026

详情

AI中文摘要

子图计数是图分析中的一个基本问题。受实际场景（图分析在选定顶点诱导的子图上进行，而非整个图）以及日益增长的隐私需求的推动，我们首次研究了差分隐私范围子图计数（DPRSC）。其目标是在由多维属性范围定义的诱导子图中，对固定模式图的出现次数进行隐私计数。与经典的点计数不同，子图计数本质上是非线性的且具有高敏感性：单条边的修改可能影响许多子图出现。我们提出了首个具有小加性误差的高效DPRSC算法。我们的方法引入了一个子图投影，将DPRSC简化为加权正交范围计数，从而能够利用范围树和局部敏感度估计来实现准确的隐私查询回答。我们通过将重建攻击归约到DPRSC并利用差异理论，给出了与算法匹配的下界。特别地，我们证明任何用于DPRSC的差分隐私算法都必须承受与维度指数相关的加性误差。实验评估表明，我们的算法在准确性和运行时间上显著优于基线方法，同时保持强大的隐私保证。

英文摘要

Subgraph counting is a fundamental problem in graph analysis. Motivated by practical scenarios where graph analytics are performed on subgraphs induced by selected vertices, rather than on the entire graph, and by growing privacy concerns, we initiate the study of differentially private range subgraph counting (DPRSC). The goal is to privately count occurrences of a fixed pattern graph within induced subgraphs defined by multi-dimensional attribute ranges. Unlike classical point counting, subgraph counting is inherently nonlinear and exhibits high sensitivity: a single edge modification can affect many subgraph occurrences. We present the first efficient algorithms for DPRSC with small additive error. Our approach introduces a subgraph projection that reduces DPRSC to weighted orthogonal range counting, enabling the use of range trees and local sensitivity estimation to achieve accurate private query answering. We complement our algorithms with matching lower bounds, obtained by reducing reconstruction attacks to DPRSC and leveraging discrepancy theory. In particular, we show that any differentially private algorithm for DPRSC must incur additive error exponential in the dimension. Empirical evaluations demonstrate that our algorithms significantly outperform baseline methods in accuracy and runtime while maintaining strong privacy guarantees.


2606.09411 2026-06-09 cs.CR cs.IT cs.LG math.IT 交叉投稿

Now You (Still) See Me: Detecting Evasive Steganographic Payloads in LLMs

现在你（仍然）能看到我：检测大语言模型中的隐蔽隐写载荷

Charles Westphal, Timothy Douglas, Keivan Navaie, Tiago Pimentel, Fernando E. Rosas

发表机构 * UCL Centre for AI（UCL人工智能中心）； University College London（伦敦大学学院）； ML Alignment Theory Scholars（机器学习对齐理论学者）； Department of Computer Science（计算机科学系）； School of Computing and Communications（计算与通信学院）； ETH Zürich（苏黎世联邦理工学院）； University of Sussex（萨塞克斯大学）； Imperial College London & University of Oxford（伦敦帝国学院与牛津大学）

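AI summary: Shows that adversarially fine-tuned steganographic trojans in LLMs can systematically evade activation-based detection by both linear (ridge) and non-linear MLP probes, and that an information-theoretically motivated data-level intervention, a recontextualization dataset, restores detectability.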

详情

AI中文摘要

大型语言模型可以通过微调将提示中的秘密编码到流畅、看似良性的输出中。这造成了一种隐写外泄风险，难以通过输出级隐写分析检测。最近的工作提出使用线性探针从内部激活中恢复秘密的机制检测方法。我们表明这种防御可以被系统性地规避，但通过针对性的数据级干预可以恢复可检测性。首先，我们将检测设置扩展到包括非线性MLP探针。然后，我们在五个基础模型上对抗性微调隐写木马：Qwen3-8B、Llama-3.1-8B、Ministral-8B、Qwen3-14B和Phi-4-14B。得到的模型在规避岭回归和留出MLP探针的同时，保留了58%–79%的精确匹配秘密恢复，在六个基准测试中平均能力下降1%–8%。然后，我们给出了这种规避的信息论特征。成功的规避在保持可恢复性的同时，降低了从内容对齐表示中提取秘密的低阶可提取性，迫使载荷与剩余自由度产生协同交互。这激发了一个重新语境化数据集，限制了这些剩余自由度。在该分布上，所有五个规避木马的岭回归和MLP可检测性都得到恢复。总体而言，我们的发现表明基于激活的隐写检测容易受到自适应规避的影响，但理论指导的评估分布可以暴露原本隐藏的载荷。

英文摘要

Large language models can be fine-tuned to encode prompt-borne secrets into fluent, seemingly benign outputs. This creates a steganographic exfiltration risk that is difficult to detect with output-level steganalysis. Recent work proposes mechanistic detection using linear probes that recover the secret from internal activations. We show that this defense can be systematically evaded, but that detectability can be recovered through a targeted data-level intervention. First, we extend the detection setup to include a non-linear MLP probe. We then adversarially fine-tune steganographic trojans across five base models: Qwen3-8B, Llama-3.1-8B, Ministral-8B, Qwen3-14B, and Phi-4-14B. The resulting models retain 58-79% exact-match secret recovery while evading both ridge and held-out MLP probes, with 1-8% average capability degradation across six benchmarks. We then give an information-theoretic characterization of this evasion. Successful evasion preserves recoverability while reducing low-order extractability of the secret from the content-aligned representation, forcing the payload into synergistic interaction with residual degrees of freedom. This motivates a recontextualization dataset that restricts these residual degrees of freedom. On this distribution, both ridge and MLP detectability are restored across all five evasive trojans. Overall, our findings show that activation-based steganography detection is vulnerable to adaptive evasion, but also that theory-guided evaluation distributions can expose otherwise hidden payloads.


2409.15723 2026-06-09 cs.LG cs.CL 版本更新

Federated Large Language Models: Current Progress and Future Directions

联邦大语言模型：当前进展与未来方向

Yuhang Yao, Jianyi Zhang, Junda Wu, Chengkai Huang, Yu Xia, Tong Yu, Ruiyi Zhang, Sungchul Kim, Ryan Rossi, Ang Li, Lina Yao, Julian McAuley, Yiran Chen, Carlee Joe-Wong

发表机构 * Carnegie Mellon University（卡内基梅隆大学）； Duke University（杜克大学）； University of California San Diego（加州大学圣地亚哥分校）； The University of New South Wales（新南威尔士大学）； Adobe Research（Adobe研究）； University of Maryland College Park（马里兰大学学院公园分校）； CSIRO’s Data61（澳大利亚联邦科学与工业研究组织Data61）

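AI summary: Surveys recent progress in combining federated learning with large language models (FedLLM), focusing on how federated fine-tuning and federated prompt learning address efficiency, personalization, and security challenges, and outlines emerging directions such as federated pre-training and federated agents.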

Comments Accepted by PAKDD 2026

详情

AI中文摘要

大语言模型在各种应用中取得了令人印象深刻的性能，但其训练通常依赖于集中式数据收集，引发了严重的隐私和治理问题。联邦学习通过使多个客户端能够协作训练共享模型而不暴露原始本地数据，提供了一种去中心化的替代方案。然而，将联邦学习与大语言模型集成带来了新的挑战，包括数据异质性、收敛不稳定性、通信开销和计算约束。本综述提供了联邦学习用于大语言模型（FedLLM）的全面且最新的概述。我们系统地回顾了近期进展，特别强调联邦微调和联邦提示学习，并分析了现有方法如何应对效率、个性化和安全挑战。我们进一步总结了新兴方向，如联邦预训练和联邦智能体。我们的目标是提供对这个快速发展领域的结构化视角，并突出未来研究的有前景的途径。

英文摘要

Large Language Models have achieved impressive performance across diverse applications, yet their training typically depends on centralized data collection, raising serious privacy and governance concerns. Federated Learning offers a decentralized alternative by enabling multiple clients to collaboratively train shared models without exposing raw local data. However, integrating FL with LLMs introduces new challenges, including data heterogeneity, convergence instability, communication overhead, and computational constraints. This survey provides a comprehensive and up-to-date overview of Federated Learning for Large Language Models (FedLLM). We systematically review recent advances, with particular emphasis on federated fine-tuning and federated prompt learning, and analyze how existing methods address efficiency, personalization, and security challenges. We further summarize emerging directions such as federated pre-training and federated agents. Our goal is to offer a structured perspective on this rapidly evolving field and to highlight promising avenues for future research.

2410.05662 2026-06-09 cs.LG 版本更新

Communication-Efficient Federated Learning under Dynamic Device Arrival and Departure: Convergence Analysis and Algorithm Design

动态设备加入和离开下的通信高效联邦学习：收敛性分析与算法设计

Zhan-Lun Chang, Dong-Jun Han, Seyyedali Hosseinalipour, Mung Chiang, Christopher G. Brinton

发表机构 * Elmore Family School of Electrical and Computer Engineering, Purdue University（埃洛姆家族电气与计算机工程学院，普渡大学）； Department of Computer Science and Engineering, Yonsei University（延世大学计算机科学与工程系）； Department of Electrical Engineering, University at Buffalo–SUNY（布法罗大学（SUNY）电气工程系）

AI总结针对设备动态加入/离开的联邦学习场景，提出基于梯度相似性的模型初始化算法，通过加权平均历史全局模型加速分布偏移恢复，实现收敛速度提升一个数量级以上。

详情

AI中文摘要

大多数联邦学习（FL）方法假设设备集固定。然而，现实场景中设备常因用户移动模式或跨小区切换等动态加入或离开系统。这种动态设置带来了独特挑战：（1）优化目标随活动设备集演变，不同于传统FL的静态目标；（2）当前全局模型可能不再作为后续轮次的有效初始化，可能阻碍适应、延迟收敛并降低资源效率。为应对这些挑战，我们首先对动态设备集下的FL进行收敛性分析，考虑了梯度噪声、本地训练迭代次数以及该实际设置中的数据异质性等因素。受此分析启发，我们提出一种模型初始化算法，使设备加入或离开网络时能够快速适应。我们的关键思想是计算先前全局模型的加权平均，以梯度相似性为指导，优先选择在数据分布与当前设备集紧密对齐上训练的模型，从而在更少的训练轮次中加速从分布偏移中恢复。这种即插即用算法设计为与现有FL方法无缝集成，具有广泛适用性。实验表明，与基线相比，我们的方法通常实现一个数量级或更多的收敛加速，我们证明这大幅降低了达到目标精度的能耗。

英文摘要

Most federated learning (FL) approaches assume a fixed device set. However, real-world scenarios often involve devices dynamically joining or leaving the system, driven by, e.g., user mobility patterns or handovers across cell boundaries. This dynamic setting introduces unique challenges: (1) the optimization objective evolves with the active device set, unlike traditional FL's static objective; and (2) the current global model may no longer serve as an effective initialization for subsequent rounds, potentially hindering adaptation, delaying convergence, and reducing resource efficiency. To address these challenges, we first provide a convergence analysis for FL under a dynamic device set, accounting for factors such as gradient noise, local training iterations, and data heterogeneity in this practical setting. Motivated by this analysis, we propose a model initialization algorithm that enables rapid adaptation whenever devices join or leave the network. Our key idea is to compute a weighted average of previous global models, guided by gradient similarity, to prioritize models trained on data distributions that closely align with the current device set, thereby accelerating recovery from distribution shifts in fewer training rounds. This plug-and-play algorithm is designed to integrate seamlessly with existing FL methods, offering broad applicability. Experiments demonstrate that our approach achieves convergence speedups typically an order of magnitude or more compared to baselines, which we show drastically reduces energy consumption to reach a target accuracy.
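The initialization idea in this abstract (a weighted average of earlier global models, with weights driven by gradient similarity) can be illustrated with a minimal sketch. It is not the authors' algorithm: the cosine-similarity score, the softmax weighting, the `temperature` parameter, and all function names are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def init_from_history(past_models, past_grads, current_grad, temperature=0.1):
    """Softmax-weighted average of past global models.

    Assumption: every stored model comes with a reference gradient, and a
    model gets more weight when that gradient points the same way as the
    gradient measured on the current device set.
    """
    sims = [cosine(g, current_grad) for g in past_grads]
    top = max(sims)
    exps = [math.exp((s - top) / temperature) for s in sims]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(past_models[0])
    model = [sum(w * m[i] for w, m in zip(weights, past_models)) for i in range(dim)]
    return model, weights
```

A lower `temperature` concentrates the weight on the single most similar past model; a higher one approaches a plain average.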

2503.18314 2026-06-09 cs.LG cs.AI cs.CV 版本更新

LoTUS: Large-Scale Machine Unlearning with a Taste of Uncertainty

LoTUS：带有不确定性风味的大规模机器遗忘

Christoforos N. Spartalis, Theodoros Semertzidis, Petros Daras, Efstratios Gavves

发表机构 * University of Amsterdam（阿姆斯特丹大学）； Centre for Research & Technology Hellas（希腊研究与技术中心）； Archimedes/Athena RC（阿基米德/雅典娜研究中心）

AI总结提出LoTUS方法，通过平滑预测概率至信息论界限来消除训练样本影响，避免从头重训练，在Transformer和ResNet18模型上超越现有方法，并引入RF-JSD指标用于实际评估。

Comments Accepted as a main conference paper at CVPR 2025 (https://cvpr.thecvf.com/virtual/2025/poster/33292)

2601.22669 2026-06-09 cs.LG 版本更新

Beyond Fixed Rounds: Data-Free Early Stopping for Practical Federated Learning

超越固定轮次：面向实际联邦学习的无数据早停法

Youngjoon Lee, Hyukjoon Lee, Seungrok Jung, Andy Luo, Jinu Gong, Yang Cao, Joonhyuk Kang

发表机构 * arXiv

AI总结提出一种无数据早停框架，通过监控任务向量增长率确定最优停止点，在皮肤病变/血细胞/结肠病理分类任务中达到与基于验证集的早停相当的性能，且仅需少量额外轮次。

Comments Under Review

详情

AI中文摘要

联邦学习（FL）无需传输原始数据即可实现去中心化协作学习。然而，依赖固定的全局轮次或验证数据进行超参数调优会带来高计算成本和隐私风险，阻碍了实际部署。为解决这一问题，我们提出了一种无数据早停框架，该框架仅使用服务器端参数监控任务向量的增长率来确定最优停止点。在皮肤病变/血细胞/结肠病理分类上的数值结果表明，在多种最先进FL方法上，我们的方法与基于验证集的早停性能相当。特别是，所提出的框架平均仅需45/12/31（皮肤病变/血细胞/结肠病理）个额外轮次，即可取得比基于验证数据的早停高出12.3%/8.9%/3.9%以上的性能。此外，该框架仅需9/8/14个额外轮次即可筛除不良配置，不到固定轮次预算的3%。据我们所知，这是首个为FL方法提出的无数据早停框架。我们的代码已开源。

英文摘要

Federated Learning (FL) facilitates decentralized collaborative learning without transmitting raw data. However, reliance on fixed global rounds or validation data for hyperparameter tuning hinders practical deployment by incurring high computational costs and privacy risks. To address this, we propose a data-free early stopping framework that determines the optimal stopping point by monitoring the task vector's growth rate using only server-side parameters. The numerical results on skin lesion/blood cell/colon pathology classification demonstrate that our approach is comparable to the validation-based early stopping across various state-of-the-art FL methods. In particular, the proposed framework requires an average of 45/12/31 (skin lesion/blood cell/colon pathology) additional rounds to achieve over 12.3%/8.9%/3.9% higher performance than early stopping based on validation data. Moreover, the proposed framework requires only 9/8/14 additional rounds to screen bad configurations, which is less than 3% of the fixed-round budget. To the best of our knowledge, this is the first work to propose a data-free early stopping framework for FL methods. Our code is available at this open repository.
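A stopping rule of the kind described here, driven only by server-side parameters, might look like the sketch below. The task vector is taken to be the difference between the current and initial global model; the relative-growth criterion, `rel_tol`, and `patience` are hypothetical choices, not values from the paper.

```python
import math


def task_vector_norm(theta, theta0):
    """Euclidean norm of the task vector theta - theta0."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(theta, theta0)))


def stopping_round(global_models, rel_tol=0.01, patience=2):
    """First round at which the task-vector norm has grown by less than
    rel_tol (relative) for `patience` consecutive rounds, else None.

    global_models[0] is the initial model; no validation data is used.
    """
    theta0 = global_models[0]
    norms = [task_vector_norm(theta, theta0) for theta in global_models]
    calm = 0
    for t in range(2, len(norms)):
        prev = norms[t - 1]
        growth = (norms[t] - prev) / prev if prev > 0 else float("inf")
        calm = calm + 1 if abs(growth) < rel_tol else 0
        if calm >= patience:
            return t
    return None
```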

2601.23221 2026-06-09 cs.LG 版本更新

Optimal Fair Aggregation of Crowdsourced Noisy Labels using Demographic Parity Constraints

使用人口统计平价约束的众包噪声标签的最优公平聚合

Gabriel Singer, Samuel Gruffaz, Olivier Vo Van, Nicolas Vayatis, Argyris Kalogeratos

发表机构 * University of California, Berkeley（加州大学伯克利分校）； Université de Paris（巴黎大学）； CNRS（国家科学研究中心）

AI总结针对众包标签聚合中的公平性问题，提出在ε-公平框架下分析多数投票和最优贝叶斯聚合的公平性差距，并推广多类公平后处理算法以强制执行人口统计平价约束。

详情

AI中文摘要

由于获取可靠的真实标签通常成本高昂或不可行，众包并聚合带噪声的人工标注是典型的替代方案。然而，聚合主观标签可能会放大个体偏见，特别是关于敏感特征的偏见，引发公平性问题。尽管如此，众包聚合中的公平性在很大程度上仍未得到探索：目前既没有收敛性保证，也只有有限的后处理方法可用于在人口统计平价下强制执行ε-公平性。我们通过在ε-公平框架内分析多数投票和最优贝叶斯聚合这两种众包聚合方法的公平性差距来填补这一空白。在小规模众包（small-crowd）情形下，我们推导出多数投票的公平性差距的上界，该上界以个体标注者的公平性差距表示。我们进一步表明，在可解释的条件下，聚合共识的公平性差距以指数速度收敛到真实标签的公平性差距。由于真实标签本身可能仍然不公平，我们将最先进的多类公平后处理算法从连续设置推广到离散设置，该算法对任何聚合规则强制执行严格的人口统计平价约束。在合成和真实数据集上的实验证明了我们方法的有效性，并证实了理论见解。

英文摘要

As acquiring reliable ground-truth labels is usually costly, or infeasible, crowdsourcing and aggregation of noisy human annotations is the typical resort. Aggregating subjective labels, though, may amplify individual biases, particularly regarding sensitive features, raising fairness concerns. Nonetheless, fairness in crowdsourced aggregation remains largely unexplored, with no existing convergence guarantees and only limited post-processing approaches for enforcing $\varepsilon$-fairness under demographic parity. We address this gap by analyzing the fairness gap of crowdsourced aggregation methods within the $\varepsilon$-fairness framework, for Majority Vote and Optimal Bayesian aggregation. In the small-crowd regime, we derive an upper bound on the fairness gap of Majority Vote in terms of the fairness gaps of the individual annotators. We further show that the fairness gap of the aggregated consensus converges exponentially fast to that of the ground-truth under interpretable conditions. Since ground-truth itself may still be unfair, we generalize a state-of-the-art multiclass fairness post-processing algorithm from the continuous to the discrete setting, which enforces strict demographic parity constraints on any aggregation rule. Experiments on synthetic and real datasets demonstrate the effectiveness of our approach and corroborate the theoretical insights.
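To make the two quantities in this abstract concrete, the sketch below aggregates crowd labels by majority vote and then measures a demographic-parity gap on the result. The tie-breaking rule and the "largest difference in positive rate between groups" definition of the gap are common conventions assumed here; the paper's exact definitions may differ.

```python
from collections import Counter


def majority_vote(annotations):
    """Aggregate one item's crowd labels; ties go to the smallest label."""
    counts = Counter(annotations)
    best = max(counts.values())
    return min(label for label, c in counts.items() if c == best)


def demographic_parity_gap(labels, groups, positive=1):
    """Largest difference in positive-label rate between any two groups."""
    rates = []
    for g in set(groups):
        members = [l for l, gg in zip(labels, groups) if gg == g]
        rates.append(sum(1 for l in members if l == positive) / len(members))
    return max(rates) - min(rates)
```

Under this convention, an aggregation rule is $\varepsilon$-fair on a dataset when `demographic_parity_gap` returns at most $\varepsilon$.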

2605.20341 2026-06-09 cs.LG cs.AI cs.CR cs.PF 版本更新

状态后门：针对状态空间中视觉-语言-动作模型的隐蔽现实世界投毒攻击

Ji Guo, Wenbo Jiang, Yansong Lin, Yijing Liu, Ruichen Zhang, Guomin Lu, Aiguo Chen, Xinshuo Han, Hongwei Li

发表机构 * Laboratory Of Intelligent Collaborative Computing, University of Electronic Science and Technology of China（电子科技大学智能协同计算实验室）； National Key Laboratory of Wireless Communications, University of Electronic Science and Technology of China（电子科技大学无线通信国家重点实验室）； School of Computer Science and Engineering, University of Electronic Science and Technology of China（电子科技大学计算机科学与工程学院）； College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics（南京航空航天大学计算机科学与技术学院）； College of Computing and Data Science, Nanyang Technological University（南洋理工大学计算与数据科学学院）

AI总结提出状态后门攻击，利用机器人手臂初始状态作为触发器，通过偏好引导遗传算法优化触发器的隐蔽性和有效性，在五个VLA模型和五个真实任务中实现超过90%的攻击成功率。

详情

AI中文摘要

视觉-语言-动作（VLA）模型广泛部署于机器人等安全关键的具身AI应用中。然而，其复杂的多模态交互也暴露了新的安全漏洞。本文研究了VLA模型中的后门威胁，即恶意输入会引发有针对性的错误行为，同时保持模型在干净数据上的性能。现有后门方法主要依赖在视觉模态中插入可见触发器，受环境变化影响，这类方法在现实场景中鲁棒性差、隐蔽性低。为克服这些限制，我们引入状态后门，一种新颖且实用的后门攻击，利用机器人手臂的初始状态作为触发器。为优化触发器的隐蔽性和有效性，我们设计了偏好引导遗传算法（PGA），高效搜索状态空间以找到最小但有效的触发器。在五个代表性VLA模型和五个真实任务上的大量实验表明，我们的方法在不影响良性任务性能的情况下实现了超过90%的攻击成功率，揭示了具身AI系统中一个未被充分探索的漏洞。

英文摘要

Vision-Language-Action (VLA) models are widely deployed in safety-critical embodied AI applications such as robotics. However, their complex multimodal interactions also expose new security vulnerabilities. In this paper, we investigate a backdoor threat in VLA models, where malicious inputs cause targeted misbehavior while preserving performance on clean data. Existing backdoor methods predominantly rely on inserting visible triggers into the visual modality, which suffer from poor robustness and low insusceptibility in real-world settings due to environmental variability. To overcome these limitations, we introduce the State Backdoor, a novel and practical backdoor attack that leverages the robot arm's initial state as the trigger. To optimize the trigger for insusceptibility and effectiveness, we design a Preference-guided Genetic Algorithm (PGA) that efficiently searches the state space for minimal yet potent triggers. Extensive experiments on five representative VLA models and five real-world tasks show that our method achieves over 90% attack success rate without affecting benign task performance, revealing an underexplored vulnerability in embodied AI systems.

2604.07125 2026-06-09 cs.CR cs.LG 版本更新

Scalable and Private Federated Learning Using Distributed Differential Privacy and Secure Aggregation

可扩展且隐私保护的联邦学习：利用分布式差分隐私和安全聚合

Wenjing Wei, Farid Nait-Abdesselam, Alla Jammine

发表机构 * Université Paris Cité（巴黎西岱大学）

AI总结本文提出DDP-SA框架，结合客户端侧本地差分隐私和全阈值加法秘密共享，实现安全聚合，提供更强的端到端隐私保障且计算可行。

Comments Submitted to IEEE Transactions on Dependable and Secure Computing (under review)

详情

AI中文摘要

本文提出了DDP-SA，一种可扩展的隐私保护联邦学习框架，联合利用客户端侧本地差分隐私（LDP）和全阈值加法秘密共享（ASS）进行安全聚合。与仅依赖差分隐私或安全多方计算（MPC）的方法不同，DDP-SA整合两种技术，提供更强的端到端隐私保障，同时保持计算可行性。该框架引入了两阶段保护机制：客户端首先用校准的拉普拉斯噪声扰动本地梯度，然后将噪声梯度分解为加法秘密份额，分发到多个中间服务器。此设计确保（i）任何单个被攻破的服务器或通信通道都无法泄露关于单个客户端更新的任何信息，且（ii）参数服务器仅重建聚合后的噪声梯度，而不会重建任何客户端特定的贡献。大量实验表明，DDP-SA的模型准确性显著高于单独使用LDP的方案，同时提供比仅使用MPC的方法更强的隐私保护。所提框架的开销随参与者数量线性扩展，为联邦学习应用提供了一个实用的隐私保护解决方案，其计算和通信开销可控。

英文摘要

This article presents DDP-SA, a scalable privacy-preserving federated learning framework that jointly leverages client-side local differential privacy (LDP) and full-threshold additive secret sharing (ASS) for secure aggregation. Unlike existing methods that rely solely on differential privacy or on secure multi-party computation (MPC), DDP-SA integrates both techniques to deliver stronger end-to-end privacy guarantees while remaining computationally practical. The framework introduces a two-stage protection mechanism: clients first perturb their local gradients with calibrated Laplace noise, then decompose the noisy gradients into additive secret shares that are distributed across multiple intermediate servers. This design ensures that (i) no single compromised server or communication channel can reveal any information about individual client updates, and (ii) the parameter server reconstructs only the aggregated noisy gradient, never any client-specific contribution. Extensive experiments show that DDP-SA achieves substantially higher model accuracy than standalone LDP while providing stronger privacy protection than MPC-only approaches. The proposed framework scales linearly with the number of participants and offers a practical, privacy-preserving solution for federated learning applications with controllable computational and communication overhead.
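The two-stage mechanism described above (Laplace perturbation on the client, then additive secret sharing across servers) can be sketched for scalar gradients as follows. This is an illustration under stated assumptions, not the DDP-SA implementation: the modulus, the fixed-point scale, and the function names are hypothetical, and `random` stands in for the cryptographically secure randomness a real deployment would need.

```python
import random

MODULUS = 2**61 - 1   # hypothetical prime modulus for share arithmetic
SCALE = 10**6         # hypothetical fixed-point scaling factor


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def encode(x):
    """Map a real number to a fixed-point residue modulo MODULUS."""
    return round(x * SCALE) % MODULUS


def decode(v):
    """Inverse of encode; residues above MODULUS // 2 are negative values."""
    if v > MODULUS // 2:
        v -= MODULUS
    return v / SCALE


def share(value, n_servers, rng):
    """Split a residue into n_servers additive shares (all are needed)."""
    shares = [rng.randrange(MODULUS) for _ in range(n_servers - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def aggregate(client_grads, n_servers, noise_scale, rng):
    """Each client perturbs then shares; servers only ever sum shares."""
    server_totals = [0] * n_servers
    for g in client_grads:
        noisy = g + laplace_noise(noise_scale, rng)
        for s, piece in enumerate(share(encode(noisy), n_servers, rng)):
            server_totals[s] = (server_totals[s] + piece) % MODULUS
    return decode(sum(server_totals) % MODULUS)
```

In the usual Laplace mechanism, `noise_scale` would be set to the gradient sensitivity divided by the privacy budget; the sketch leaves that calibration to the caller.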

2605.30123 2026-06-09 cs.CR cs.LG 版本更新

Privacy-Enhanced Zero-Order Federated Learning via xMK-CKKS over Wireless Channels

基于xMK-CKKS的无线信道隐私增强零阶联邦学习

Anthony Ayli, Khalil Harris, Jihad Fahs, Mohamad Assaad

发表机构 * University of Waterloo（滑铁卢大学）

AI总结针对无线联邦学习中单密钥同态加密的客户端安全漏洞，提出一种无需信道估计的四阶段协议，利用xMK-CKKS多密钥同态加密实现安全聚合，并集成零阶优化，在保证收敛率的同时降低通信开销。

Comments 12 pages, 3 figures

详情

AI中文摘要

同态加密（HE）允许服务器在不解密的情况下对加密数据进行运算，从而在联邦学习（FL）中实现隐私保护聚合。现有的空中（OTA）同态加密方法主要依赖单密钥HE方案，并需要信道估计或预均衡来补偿无线衰落。然而，单密钥HE仍然容易受到持有共享密钥的诚实但好奇（HBC）客户端的攻击，而多密钥HE通过为每个设备分配各自的密钥来提供更强的客户端级安全性。我们提出了一种四阶段协议，使xMK-CKKS能够在共享无线信道上进行聚合，而无需信道估计。该协议通过相同的信道实现重传部分公钥和密文，使占主导地位的大模数加密项在解密过程中代数相消。我们将该协议与零阶FL相结合，应用于缓慢变化的视距（LoS）主导信道，其中每个设备每轮仅传输一个加密标量，通信/加密开销与模型维度无关。我们证明，由加密和无线聚合引入的残余噪声保持了标准收敛速度$O(1/\sqrt{K})$（至多相差一个可忽略的噪声基底），其中$K$为通信轮数。该协议假设服务器不可信，并能抵御HBC客户端，防止任何客户端恢复其他参与者的本地更新。MNIST上的数值结果验证了理论分析。

英文摘要

Homomorphic encryption (HE) enables privacy-preserving aggregation in federated learning (FL) by allowing the server to operate on encrypted data without decryption. Existing HE-over-the-air (OTA) methods mainly rely on single-key HE schemes and require channel estimation or pre-equalization to compensate for wireless fading. However, single-key HE remains vulnerable to honest-but-curious (HBC) clients holding the shared secret key, while multi-key HE provides stronger client-level security by assigning each device its own secret key. We propose a four-phase protocol that enables the aggregation of xMK-CKKS over a shared wireless channel without channel estimation. The protocol retransmits partial public keys and ciphertexts through the same channel realization, so that the dominant large-modulus encryption terms cancel algebraically during decryption. We integrate this protocol with zero-order FL over slowly varying LoS-dominant channels, where each device transmits a single encrypted scalar per round and the communication/encryption overhead is independent of the model dimension. We show that the residual noise induced by encryption and wireless aggregation preserves the standard convergence rate $O(1/\sqrt{K})$ up to a negligible noise floor, where $K$ is the number of communication rounds. The protocol assumes an untrusted server and is secure against HBC clients, preventing any client from recovering the local updates of other participants. Numerical results on MNIST validate the theoretical analysis.
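The "single scalar per round" property of zero-order FL can be seen in a plain, unencrypted sketch. It uses a generic two-point zero-order estimator along a direction that every party derives from a shared seed; this estimator, the step sizes, and the function names are assumptions for illustration, and the encryption and wireless channel from the abstract are deliberately left out.

```python
import random


def direction(seed, dim):
    """Random search direction that all parties can regenerate from a seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def device_scalar(loss, theta, u, mu=1e-3):
    """Two-point estimate of the directional derivative of a local loss.

    This single float is all a device uploads, whatever the model size.
    """
    plus = [t + mu * x for t, x in zip(theta, u)]
    minus = [t - mu * x for t, x in zip(theta, u)]
    return (loss(plus) - loss(minus)) / (2 * mu)


def server_step(theta, scalars, u, lr=0.05):
    """Average the uploaded scalars and move along the shared direction."""
    avg = sum(scalars) / len(scalars)
    return [t - lr * avg * x for t, x in zip(theta, u)]
```

Because the direction `u` is reproducible from the round seed, the update is fully determined by the averaged scalar, which is why the per-round payload does not grow with the model dimension.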

2606.07581 2026-06-09 cs.LG cs.AI cs.ET 新提交

Training-Inference Kernel Contracts: Bounding Divergence in Post-Training and Deployment

训练-推理核契约：约束后训练与部署中的偏差

Bruce Changlong Xu, Lan Wu

发表机构 * University of California, Berkeley（加州大学伯克利分校）

AI总结提出核契约框架，通过数值、统计、运行时和可观测性条款约束训练核与推理核之间的分布偏差，并推导偏差界以保障策略梯度无偏性。

详情

AI中文摘要

现代后训练流程通常为其策略π_θ编写一个符号，但通过两个不同的程序进行评估：一个针对自动微分优化的训练核和一个针对低精度、融合、动态批处理服务优化的推理核。在有限精度下，这些核在相同权重下可能产生不同的分布，且差距集中在基准测试未充分代表的切片上。本文提出核契约：一个契约优先的框架，用于指定K_train和K_inf之间可接受的偏差。契约C = (N, S, R, O, Pi) 结合了数值、统计、运行时和可观测性条款，以及从违规到路由操作的升级策略。我们推导了从logit漂移到总变差距离再到有界奖励漂移的链式界限，并将其专门用于强化学习后训练，其中在显式支持和范数假设下，每个token的重要性比率漂移给出了策略梯度偏差的界限。我们还描述了一个四阶段提升管道、在线路由循环以及用于契约工件的极简YAML DSL。本文是一个框架和词汇论文；我们不报告生产规模的实证验证。

英文摘要

A modern post-training pipeline often writes one symbol for its policy, pi_theta, while evaluating it through two different programs: a training kernel optimized for autograd and an inference kernel optimized for low-precision, fused, dynamically batched serving. In finite precision, these kernels can induce different distributions at identical weights, with the gap concentrated on slices that aggregate benchmarks under-represent. This paper proposes kernel contracts: a contract-first framework for specifying acceptable divergence between K_train and K_inf. A contract C = (N, S, R, O, Pi) combines numerical, statistical, runtime, and observability clauses with an escalation policy from violations to routing actions. We derive a chain of bounds from logit drift to total-variation distance to bounded reward drift, and specialize it to RL post-training, where per-token importance-ratio drift yields a bound on policy-gradient bias under explicit support and norm assumptions. We also describe a four-stage promotion pipeline, online routing loop, and minimal YAML DSL for contract artifacts. This is a framework and vocabulary paper; we do not report production-scale empirical validation.
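The chain "logit drift, then total-variation distance, then reward drift" can be checked numerically with elementary bounds. The sketch below uses two standard facts rather than the paper's own constants, which the abstract does not state: if two logit vectors differ by at most `eps` in every coordinate, their softmax distributions satisfy TV <= (exp(2*eps) - 1) / 2, and for rewards bounded by `r_max` the expected-reward gap is at most 2 * r_max * TV. Function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def total_variation(p, q):
    """Total-variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def drift_report(train_logits, infer_logits, rewards):
    """Measured drift next to the elementary bounds for one token position."""
    eps = max(abs(a - b) for a, b in zip(train_logits, infer_logits))
    p, q = softmax(train_logits), softmax(infer_logits)
    tv = total_variation(p, q)
    r_max = max(abs(r) for r in rewards)
    reward_gap = abs(sum((a - b) * r for a, b, r in zip(p, q, rewards)))
    return {
        "eps": eps,
        "tv": tv,
        "tv_bound": 0.5 * (math.exp(2 * eps) - 1),
        "reward_gap": reward_gap,
        "reward_bound": 2 * r_max * tv,
    }
```

A numerical clause of a contract could, for instance, compare `eps` or `tv` against a threshold per evaluation slice; that use is a suggestion, not something the abstract specifies.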

2606.07596 2026-06-09 cs.LG 新提交

Shortcuts in the Tail: Debiasing via Post-Hoc Spectral Compression of Fine-Tuning Updates

尾部的捷径：通过微调更新的后验谱压缩进行去偏

Edward Sun, Dmitrii Troitskii

发表机构 * UCLA（加州大学洛杉矶分校）； Northeastern University（东北大学）

AI总结提出对微调权重更新进行SVD截断尾部，无需重训练或组标签即可减少虚假关联，在多个模型和基准上以<2%的准确率损失将差距降低最多5倍。

Comments ICML Weight Space Symmetries Workshop 2026

应变连贯性：编码代理执行轨迹中的故障前信号

Marut Pandya, Kasey Zhang, Baiqing Lyu

发表机构 * GitHub

AI总结提出“应变连贯性”模式，即编码代理识别到问题但仍按原计划行动，通过构建Claude Sonnet 4.6检测器在44条轨迹上实现94%故障预测精度，优于基线方法。

详情

AI中文摘要

基于LLM的编码代理有时会承认自身推理中的问题，但仍继续执行。我们将这种模式称为应变连贯性：一种与安全相关的故障模式，其中代理拥有应改变其行为的信息，陈述了该信息，却仍违背它行动。该模式与口头化的奖励黑客行为重叠，即代理指出了任务的代理指标（proxy）与底层目标之间的冲突，却仍然去优化该代理指标。我们给出操作性定义，构建一个Claude Sonnet 4.6评判器，读取完整轨迹并标记该模式出现的片段，并使用Qwen3.5-35B-A3B骨干在44条Terminal-bench-2轨迹上评估。标记轨迹的失败率为94%，而未标记轨迹为46%（47个百分点的差距，Fisher精确检验p=0.003；排除三个提示嵌入示例后为46个百分点，p=0.006）。在匹配选择性下，检测器达到94%的精确度，而词汇话语标记基线为88%；两种方法的10条轨迹交集具有100%的失败率（Clopper-Pearson 95%置信区间[69%, 100%]）。我们在Gemma4-31B上使用43条轨迹进行复现：整体信号方向一致但不显著（20个百分点差距，p=0.31），衰减主要由13条零思考内容的轨迹驱动，在这些轨迹上检测器没有可供分析的内容。在Gemma的高冗长度三分位中，差距为+30个百分点；在Qwen的中等和高冗长度三分位中，差距各为+40个百分点。在两个模型上，首次标记出现的位置中位数均在轨迹已用时间的83-84%处，且在弱化显式冲突标记的改写下，二元标记保持不变（8/8条轨迹）。与单变量预测器不同，检测器输出可解释的片段级结果：引用的承认、引用的行动和带类型的冲突，显示代理看到并忽略了什么。

英文摘要

LLM-based coding agents sometimes acknowledge a problem in their own reasoning and then proceed anyway. We call this pattern strained coherence: a safety-relevant failure mode in which an agent has information that should change its behavior, states that information, and still acts against it. The pattern overlaps with verbalized reward hacking, where an agent names a tension between a task proxy and the underlying goal yet optimizes the proxy anyway. We give an operational definition, build a Claude Sonnet 4.6 judge that reads full trajectories and flags spans where the pattern occurs, and evaluate it on 44 Terminal-bench-2 trajectories using a Qwen3.5-35B-A3B backbone. Flagged trajectories fail 94% of the time versus 46% for unflagged trajectories (47-point gap, Fisher's exact p = 0.003; 46 points after excluding three prompt-embedded examples, p = 0.006). At matched selectivity, the detector reaches 94% precision versus 88% for a lexical discourse-marker baseline; the 10-trajectory intersection of the two methods has a 100% failure rate (Clopper-Pearson 95% CI [69%, 100%]). We replicate on Gemma4-31B with 43 trajectories: the overall signal is directionally consistent but not significant (20-point gap, p = 0.31), with attenuation driven largely by 13 trajectories with zero think content, where the detector has no substrate to analyze. In the high-verbosity Gemma tertile, the gap is +30 points; in the mid- and high-verbosity Qwen tertiles, it is +40 points each. The first flag appears at a median of 83-84% of elapsed trajectory time across both models, and the binary flag survives paraphrases that soften explicit conflict markers (8/8 trajectories). Unlike univariate predictors, the detector emits interpretable span-level output -- quoted acknowledgment, quoted action, and typed conflict -- showing what the agent saw and ignored.
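The interval quoted for the 10-trajectory intersection can be reproduced directly. The sketch below computes an exact Clopper-Pearson interval by bisection on binomial tail probabilities, using only the standard library; for 10 failures out of 10 the lower endpoint reduces to 0.025 ** (1 / 10), roughly 0.69, which matches the reported [69%, 100%]. Function names are illustrative.

```python
import math


def tail_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def _bisect_increasing(f, target, iterations=100):
    """Solve f(p) = target on [0, 1] for a function f increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for a binomial proportion."""
    lower = 0.0 if k == 0 else _bisect_increasing(
        lambda p: tail_at_least(k, n, p), alpha / 2)
    upper = 1.0 if k == n else _bisect_increasing(
        lambda p: tail_at_least(k + 1, n, p), 1 - alpha / 2)
    return lower, upper
```

The wide interval is a reminder that a 100% failure rate on 10 trajectories is still compatible with a true rate near 70%.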

2606.08021 2026-06-09 cs.LG cs.AI cs.MA 新提交

Semantic Quorum Assurance: Collective Certification for Non-Deterministic AI Infrastructure

语义法定数保证：面向非确定性AI基础设施的集体认证

Jun He, Deying Yu

发表机构 * OpenKedge.io

AI总结提出语义法定数保证（SQA），一种通过多样化验证者群体和风险自适应法定数谓词，将非确定性LLM代理的不安全操作批准率从18.5%降至0.3%的控制平面原语。

Comments 21 pages, 2 figures, 6 tables

详情

AI中文摘要

随着大型语言模型（LLM）代理被集成到自主云运维中，分布式系统面临一个语义可靠性问题：提议代理可能生成语法有效、静态授权通过但在运维上不安全的生产环境变更，例如修改IAM策略、开放防火墙安全组或执行数据导出。经典的分布式共识协议复制确定性状态转换，但不评估所提议意图的安全性。为弥补这一差距，我们引入语义法定数保证（SQA），一种用于治理非确定性代理式基础设施的控制平面原语。SQA将提议表示为绑定到密码学证据链的声明式执行合约，并将其路由到由只读、沙盒化验证代理组成的多样化评审组。SQA在风险自适应法定数谓词下聚合它们的判断，该谓词强制要求模型和原型的多样性，根据校准的保证分数调整权重，并尊重特定原型的否决权。获准的提议只能通过主权执行门执行。我们在云原生控制平面中实例化SQA，并为非确定性验证者形式化了一个相关性认知失效模型。在500个受基础设施启发的变更场景上（安全性结果基于保留的安全/不安全试验报告，不含模糊场景），SQA将不安全批准率从单代理验证的18.5%降低到0.3%，同时在所研究的各风险等级中增加的中位验证延迟为1.45至4.12秒。

英文摘要

As large language model (LLM) agents are integrated into autonomous cloud operations, distributed systems face a semantic reliability problem: proposer agents can generate production mutations, such as modifying IAM policies, opening firewall security groups, or executing data exports, that are syntactically valid and statically authorized but operationally unsafe. Classical distributed consensus protocols replicate deterministic state transitions but do not evaluate the safety of the proposed intent. To address this gap, we introduce Semantic Quorum Assurance (SQA), a control-plane primitive for governing non-deterministic agentic infrastructure. SQA represents proposals as declarative execution contracts bound to cryptographic evidence chains and routes them to a diverse panel of read-only, sandboxed validator agents. SQA aggregates their judgments under a risk-adaptive quorum predicate that enforces model and archetype diversity, adjusts weights based on calibrated assurance scores, and respects archetype-specific vetoes. Admitted proposals execute only through a sovereign execution gate. We instantiate SQA in a cloud-native control plane and formalize a correlated cognitive failure model for non-deterministic validators. On 500 infrastructure-inspired mutation scenarios, with safety results reported on held-out safe/unsafe trials excluding ambiguous scenarios, SQA reduces unsafe approval from 18.5% for single-agent validation to 0.3% while adding median validation latency of 1.45--4.12 seconds across the studied risk buckets.
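A risk-adaptive quorum predicate with the three ingredients named in the abstract (diversity requirements, score-weighted approval, and vetoes) might be sketched as follows. Everything concrete here is hypothetical: the `Vote` fields, the risk buckets, and the thresholds in `REQUIREMENTS` are illustrative and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Vote:
    model: str        # model family behind the validator
    archetype: str    # validator role, e.g. "security"
    approve: bool
    score: float      # calibrated assurance score in [0, 1]
    veto: bool = False


# Hypothetical per-risk requirements:
# (min weighted approval share, min distinct models, min distinct archetypes)
REQUIREMENTS = {
    "low": (0.5, 1, 1),
    "medium": (0.66, 2, 2),
    "high": (0.8, 2, 3),
}


def admit(votes, risk):
    """Return True only if the proposal clears the quorum for its risk level."""
    threshold, min_models, min_archetypes = REQUIREMENTS[risk]
    if any(v.veto for v in votes):
        return False  # any veto blocks admission outright
    approvals = [v for v in votes if v.approve]
    if len({v.model for v in approvals}) < min_models:
        return False  # approvals must come from diverse model families
    if len({v.archetype for v in approvals}) < min_archetypes:
        return False  # ...and from diverse validator roles
    total = sum(v.score for v in votes)
    return total > 0 and sum(v.score for v in approvals) / total >= threshold
```

Requiring diversity among the approving validators, rather than among the panel as a whole, is one way to limit correlated failures; whether SQA does exactly this is not stated in the abstract.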

2606.08044 2026-06-09 cs.LG cs.AI cs.CL 新提交

When Behavioral Safety Evaluation Fails: A Representation-Level Perspective

当行为安全评估失败时：表征层面的视角

Enyi Jiang, Anders Gjølbye, Yibo Jacky Zhang, Sanmi Koyejo

发表机构 * Stanford University（斯坦福大学）； University of Illinois Urbana-Champaign（伊利诺伊大学厄巴纳-香槟分校）； Technical University of Denmark（丹麦技术大学）

AI总结本文提出行为安全与干预鲁棒性之间的“审计差距”，通过构建解离模型和引入潜在脆弱性评分（LVS），证明行为安全指标不足以衡量表征层面的鲁棒性。

Comments Preprint

详情

AI中文摘要

大型语言模型（LLM）的安全性通常从行为层面进行评估，这只能为内部鲁棒性提供有限的证据，因为这些评估针对的是输出，而非干预下的表征层面脆弱性。我们将这种差异形式化为审计差距：行为安全与干预下鲁棒性之间的差异。为了研究这一差距，我们构建了解离模型，这些模型在保持安全的外在行为的同时，在潜在空间中仍然脆弱。我们引入了一个基于干预的评估框架，通过在参数和潜在空间中进行软干预（包括有害微调和逐层潜在扰动）来测试模型鲁棒性。为了形式化评估，我们提出了潜在脆弱性评分（LVS），用于衡量通过有界潜在扰动引发有害行为的难易程度。借助该评估框架，我们表明，在多个安全对齐和不安全对齐的最先进模型上，行为安全指标都不足以衡量表征层面的鲁棒性。值得注意的是，解离模型在有害干预下尽管表现出相当的拒绝行为，但LVS显著升高，其中中间层表征对干预最为敏感。我们的结果表明，仅凭行为安全评估无法全面反映模型鲁棒性，因此有必要对潜在脆弱性和可观察行为进行表征感知的审计。

英文摘要

Large Language Model (LLM) safety has often been evaluated at the behavior level, which provides limited evidence of internal robustness, as these evaluations target outputs rather than representation-level vulnerability under intervention. We formalize this discrepancy as the audit gap: the difference between behavioral safety and robustness under intervention. To study this gap, we construct dissociated models that preserve safe outward behavior while remaining vulnerable in the latent space. We introduce an intervention-based evaluation framework to test model robustness through soft interventions in parameter and latent spaces, including harmful fine-tuning and layer-wise latent perturbations. To formalize the evaluation, we propose the Latent Vulnerability Score (LVS) to measure how easily harmful behavior can be elicited by bounded latent perturbations. Using this evaluation framework, we show that behavioral safety metrics are insufficient measures of representation-level robustness across multiple safely and unsafely aligned state-of-the-art models. Notably, dissociated models show substantially elevated LVSs despite comparable refusal behavior under harmful intervention, with intermediate representations being the most sensitive to intervention. Our results suggest that behavioral safety evaluation alone provides an incomplete picture of model robustness, motivating representation-aware audits of latent vulnerability and observable behavior.

2606.08275 2026-06-09 cs.LG cs.AI 新提交

Causal Agent Replay: Counterfactual Attribution for LLM-Agent Failures

因果智能体回放：LLM智能体故障的反事实归因

Jaineet Shah

发表机构 * Carnegie Mellon University（卡内基梅隆大学）

AI总结提出Causal Agent Replay (CAR)方法，通过结构因果模型和干预操作，对LLM智能体失败步骤进行反事实归因，解决现有方法无法定位决策步骤的问题。

Comments Open-source: https://github.com/jaineet17/causal-agent-replay

详情

AI中文摘要

当LLM智能体失败时——例如发放了不应发放的退款、调用了错误的工具、泄露了数据——现有工具只能回答发生了什么（可观测性）或是否通过（评估），但无法回答哪个步骤导致了失败。直观的启发式方法是错误的：执行有害动作的步骤通常不是决定该动作的步骤，而LLM判断的归因是相关性的且不可靠（在Who&When基准上，最先进的步骤级准确率约为14%）。我们提出Causal Agent Replay (CAR)，通过干预来回答这个问题：它将智能体运行建模为结构因果模型，对某个步骤应用do操作，并在相同随机策略下重新执行轨迹，测量结果分布的变化。我们定义了智能体步骤上的干预代数、一个单步对比估计器（其承诺点规则解决了特定于随机向前运行的混杂因素），以及一个预算有界的蒙特卡洛Shapley估计器（用于在交互步骤间分配信用）。每个效应都附有置信区间。我们在具有植入真实标签的合成结构因果模型上进行验证：对比估计器恢复了关键步骤，Shapley恢复了两步交互（0.44, 0.45, ~0；效率总和0.909对比解析值0.91）。CAR是开源的，可在托管或免费的本地模型上运行。

英文摘要

When an LLM agent fails -- issues a refund it should not have, calls the wrong tool, leaks data -- existing tooling answers what happened (observability) or whether it passed (evaluation), but not which step caused the failure. The obvious heuristics are wrong: the step that executes the harmful action is usually not the step that decided on it, and LLM-judge attribution is correlational and unreliable (state-of-the-art step-level accuracy on the Who&When benchmark is about 14%). We present Causal Agent Replay (CAR), which answers the question by intervention: it models an agent run as a structural causal model, applies a do-operation to a step, and re-executes the trajectory forward under the same stochastic policy, measuring the shift in the outcome distribution. We define an intervention algebra over agent steps, a single-step contrastive estimator whose point-of-commitment rule resolves a confound specific to stochastic run-forward, and a budget-bounded Monte-Carlo Shapley estimator that splits credit across interacting steps. Every effect is reported with confidence intervals. We validate against synthetic structural causal models with planted ground truth: the contrastive estimator recovers the pivotal step, and Shapley recovers a two-step interaction (0.44, 0.45, ~0; efficiency sum 0.909 versus the analytic 0.91). CAR is open source and runs on hosted or free local models.
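The credit-splitting step can be illustrated with a permutation-sampling Shapley estimator on a toy outcome function. This is a generic Monte Carlo Shapley sketch, not CAR's estimator: the `value` function stands in for "shift in failure probability when this set of steps is intervened on", and the toy numbers below are invented for the example rather than taken from the paper.

```python
import random


def shapley_monte_carlo(players, value, n_permutations, rng):
    """Estimate Shapley values by averaging marginal contributions over
    randomly sampled orderings of the players (here: agent steps).

    `value` maps a frozenset of players to a real number.
    """
    totals = {p: 0.0 for p in players}
    for _ in range(n_permutations):
        order = list(players)
        rng.shuffle(order)
        coalition = set()
        prev = value(frozenset(coalition))
        for p in order:
            coalition.add(p)
            cur = value(frozenset(coalition))
            totals[p] += cur - prev
            prev = cur
    return {p: t / n_permutations for p, t in totals.items()}


def toy_effect(steps):
    """Toy interaction: the outcome shifts by 0.9 only when both
    'plan' and 'tool_call' are intervened on; 'log' never matters."""
    return 0.9 if {"plan", "tool_call"} <= steps else 0.0
```

In this toy game the exact Shapley values are 0.45, 0.45, and 0, and every sampled ordering distributes exactly `value(all) - value(empty)`, so the estimates always sum to 0.9. The `n_permutations` argument plays the role of an evaluation budget.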

2606.08365 2026-06-09 cs.LG cs.AI 新提交

Pre-Intervention Prediction of Sparse Autoencoder Steering Side Effects

稀疏自编码器引导副作用的干预前预测

Evan Duan

发表机构 * University of Michigan（密歇根大学）

AI总结提出一种干预前筛选框架，利用特征统计预测SAE引导的副作用（效果不稳定和附带扩散），在多个模型和字典上验证了解码器几何等信号优于基线，但预测效果因模型而异。

详情

AI中文摘要

稀疏自编码器（SAE）特征越来越多地用于引导语言模型，但特征引导很少是干净的：相同的干预在不同上下文中可能表现不一致，并扰动不相关的特征。我们引入了一个干预前筛选框架，用于从引导前计算的特征统计中预测SAE引导的副作用。我们沿着引导模块化的两个轴（效果稳定性和附带扩散）来操作化副作用，并在ReLU、JumpReLU和TopK SAE字典上评估GPT-2-small、Pythia-70M-deduped、Gemma-2-2B和Llama-3.1-8B。在这些设置中，解码器几何、激活统计、共激活结构和直接logit足迹比仅频率和激活幅度基线更好地预测引导模块化。信号在GPT-2-small、Pythia-70M和Llama-3.1-8B中最强，在那里它能在对抗幅度相关混杂的残差化后幸存，而在Gemma-2-2B中较弱。保留筛选表明，通过预测的清洁度对未见特征进行排序可以选择在新上下文中更干净地引导的特征，但成功的轴因设置而异：GPT-2在清洁度上提升最大，Pythia主要在稳定性上提升，Llama主要在附带性上提升，而Gemma仅部分提升。一个受控的Llama Scope宽度比较表明，在32K到128K字典宽度变化下，预测信号仍然存在，尽管筛选收益变得不太稳定。总体而言，SAE引导的副作用是可提前预测的，但有用的预测器签名和迁移的模块化轴依赖于模型和字典设置。

英文摘要

Sparse autoencoder (SAE) features are increasingly used to steer language models, but feature steering is rarely clean: the same intervention can behave inconsistently across contexts and perturb unrelated features. We introduce a pre-intervention screening framework for forecasting SAE steering side effects from feature statistics computed before steering. We operationalize side effects along two axes of steering modularity, effect stability and collateral spread, and evaluate GPT-2-small, Pythia-70M-deduped, Gemma-2-2B, and Llama-3.1-8B across ReLU, JumpReLU, and TopK SAE dictionaries. Across these settings, decoder geometry, activation statistics, co-activation structure, and direct-logit footprint predict steering modularity better than frequency-only and activation-magnitude baselines. The signal is strongest in GPT-2-small, Pythia-70M, and Llama-3.1-8B, where it survives residualization against magnitude-related confounds, and weaker in Gemma-2-2B. Held-out screening shows that ranking unseen features by predicted cleanliness can select features that steer more cleanly on fresh contexts, but the successful axis varies by setting: GPT-2 improves most cleanly, Pythia improves mainly on stability, Llama mainly on collateral, and Gemma only partially. A controlled Llama Scope width comparison shows that the predictive signal persists under a 32K-to-128K dictionary-width change, although the screening payoff becomes less stable. Overall, SAE steering side effects are predictable in advance, but the useful predictor signature and transferred modularity axis are model- and dictionary-setting dependent.

2606.08467 2026-06-09 cs.LG cs.AI 新提交

The Confidence Trap: Calibration Attacks for Graph Neural Networks

置信陷阱：图神经网络的校准攻击

Cuong Dang, Jiahao Zhang, Hieu Ta Quang, Dung Le, Lu Cheng, Suhang Wang

发表机构 * Virginia Polytechnic Institute and State University（弗吉尼亚理工学院暨州立大学）； The Pennsylvania State University（宾夕法尼亚州立大学）； VinUniversity ； University of Illinois at Chicago（伊利诺伊大学芝加哥分校）

AI总结提出统一图校准攻击（UGCA）框架，通过KL散度损失、重排序机制和混合损失等策略，在保持分类精度下显著提高期望校准误差，揭示高精度或多类模型更易受攻击。

详情

AI中文摘要

尽管置信校准对于安全关键应用中的可信决策至关重要，但校准后的GNN对对抗性结构扰动的鲁棒性仍未被充分探索。然而，研究图上的校准攻击面临独特的技术挑战：（1）图结构的离散性使基于梯度的优化复杂化；（2）现有的欠自信（underconfidence）目标无法将预测推向均匀分布；（3）GNN对边扰动高度敏感，常导致违反攻击约束的意外标签变化。为应对这些挑战，我们提出统一图校准攻击（UGCA）框架，用于GNN校准鲁棒性的最坏情况（白盒）分析。UGCA引入KL散度损失以鼓励均匀预测分布，重排序机制以减少标签翻转，混合损失以在违规时恢复标签，以及束搜索以探索更广的对抗搜索空间。我们进一步提供理论见解，将模型泛化、数据集复杂性和校准脆弱性联系起来，表明在该威胁模型下，具有更高精度或在类别更多的数据集上训练的模型更容易受到攻击。大量实验表明，UGCA在保持分类精度的同时显著增加了期望校准误差。我们的代码公开在https://github.com/CaptainCuong/Graph-Calibration-Attack.git。

英文摘要

While confidence calibration is essential for trustworthy decision-making in safety-critical applications, the robustness of calibrated GNNs to adversarial structural perturbations remains largely unexplored. However, studying calibration attacks on graphs presents unique technical challenges: (1) the discrete nature of graph structures complicates gradient-based optimization, (2) existing underconfidence objectives fail to drive predictions toward uniform distributions, and (3) GNNs are highly sensitive to edge perturbations, often causing unintended label changes that violate attack constraints. To address these challenges, we propose a Unified Graph Calibration Attack (UGCA) framework designed for worst-case (white-box) analysis of GNN calibration robustness. UGCA introduces a KL-divergence loss to encourage uniform predictive distributions, a reranking mechanism to reduce label flipping, a hybrid loss to recover labels when violations occur, and beam search to explore a broader adversarial search space. We further provide theoretical insights linking model generalization, dataset complexity, and calibration vulnerability, showing that models with higher accuracy or trained on datasets with more classes are more susceptible under this threat model. Extensive experiments demonstrate that UGCA substantially increases Expected Calibration Error while preserving classification accuracy. Our code is publicly available at https://github.com/CaptainCuong/Graph-Calibration-Attack.git.
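Two quantities carry the argument in this abstract: Expected Calibration Error (ECE), which the attack raises, and a KL term that measures distance from the uniform distribution. The sketch below shows both with the standard library only. The equal-width binning is the usual ECE convention; the direction KL(p || uniform) is an assumption, since the abstract does not say which direction UGCA uses.

```python
import math


def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: weighted mean of |accuracy - confidence| per bin."""
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            acc = sum(1 for _, ok in b if ok) / len(b)
            ece += len(b) / n * abs(acc - avg_conf)
    return ece


def kl_to_uniform(probs):
    """KL(p || uniform); zero exactly when p is uniform."""
    k = len(probs)
    return sum(p * math.log(p * k) for p in probs if p > 0)
```

The test values illustrate the threat model: the same four predictions with the same correctness (so identical accuracy) move from an ECE of 0 at confidence 0.75 to an ECE of 0.35 once confidence is pushed down to 0.4.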

2606.08517 2026-06-09 cs.LG cs.CL 新提交

A Joint Finite-Sample Certificate for Adaptive Selective Conformal Risk Control

自适应选择性共形风险控制的联合有限样本证书

Xiaoli Yu, Jiamiao Liu

发表机构 * Chongqing University of Posts and Telecommunications（重庆邮电大学）； Army Medical University (Third Military Medical University)（陆军军医大学（第三军医大学））

AI总结提出一种联合有限样本证书，同时上界选择性风险、下界接受概率和部署效用，适用于自适应阈值选择，通过比率风险的经验伯恩斯坦界等方法，在ImageNet和COCO上比Hoeffding-CRC提升22个百分点接受前沿，且紧致约10倍。

详情

AI中文摘要

选择性预测器在置信输入上做出预测，否则弃权；安全部署需要一个单一的有限样本证书，同时上界所选风险、下界接受概率 $p_{\mathrm{acc}}$ 高于下限 $p_{\min}$，并下界部署效用。该证书必须在从 $n_{\mathrm{cert}}$ 样本上的有限网格 $m$ 对中进行自适应阈值选择时有效。我们通过将所选风险直接视为比率而非通过Hoeffding式范围界，为有界、可能非单调的损失给出了这样的证书。该构造耦合了三个置信界：比率风险的方差自适应经验伯恩斯坦界、接受概率的Clopper-Pearson界以及效用的双边接近界。它们共同下界认证策略的绝对效用，并且与认证集上的最优策略相差不超过 $2\gamma_u$，两者在可行时均非平凡；一个按场景划分的第三部分与外部预言机匹配，仅在风险边际 $\gamma_r < \alpha$ 时有信息量，在主要操作点处为空。相对于仅范围Hoeffding比率构造，这使接受下限依赖从 $1/p_{\min}$ 变为 $1/\sqrt{p_{\min}}$，并且一个闭式推论识别出每对场景，其中我们的风险界优于Hoeffding共形风险控制（Hoeffding-CRC）选择性界。实验上，在ImageNet（三个ResNet）和COCO val 2017全景分割上，该证书比Hoeffding-CRC打开了+22个百分点的认证接受前沿，并且比非平凡匹配验证基线紧致约10倍；这些增益是按场景的，非普适的，在ADE20K上不存在。认证器运行时间为 $O(n_{\mathrm{cert}}\, m)$。

英文摘要

Selective predictors answer on confident inputs and abstain elsewhere; deploying one safely needs a single finite-sample certificate that simultaneously upper-bounds the selected risk, lower-bounds the acceptance probability $p_{\mathrm{acc}}$ above a floor $p_{\min}$, and lower-bounds the deployment utility. This certificate must be valid under adaptive threshold selection from a finite grid of $m$ pairs on $n_{\mathrm{cert}}$ samples. We give such a certificate for bounded, possibly non-monotone losses by treating the selected risk directly as a ratio rather than through a Hoeffding-style range bound. The construction couples three confidence bounds: a variance-adaptive empirical-Bernstein bound on the ratio risk, a Clopper-Pearson bound on acceptance, and a two-sided closeness bound on utility. Together they lower-bound the certified policy's utility absolutely and to within $2\gamma_u$ of the best over the certified set, both non-vacuous whenever feasible; a regime-scoped third leg matches an external oracle, informative only where the risk margin $\gamma_r < \alpha$ and vacuous at the headline operating points. Relative to the range-only Hoeffding-ratio construction this sharpens the acceptance-floor dependence from $1/p_{\min}$ to $1/\sqrt{p_{\min}}$, and a closed-form corollary identifies a per-pair regime in which our risk bound dominates a Hoeffding conformal risk control (Hoeffding-CRC) selective bound. Empirically, on ImageNet (three ResNets) and COCO val 2017 panoptic, the certificate opens a $+22$ pp certified-acceptance frontier over Hoeffding-CRC and is $\approx 10\times$ tighter than a non-vacuous matched-valid baseline; these gains are regime-scoped, not universal, and absent on ADE20K. The certifier runs in $O(n_{\mathrm{cert}}\, m)$ time.
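
Two of the ingredients named above have standard textbook forms, sketched here under stated assumptions: a one-sided Clopper-Pearson lower bound on an acceptance probability, and the Maurer-Pontil empirical-Bernstein upper bound on the mean of losses in [0, 1]. This is not the paper's joint certificate (it omits the ratio treatment, the utility bound, and the union over the threshold grid); the function names and the bisection solver are illustrative choices.

```python
import math


def binom_tail_ge(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p); intended for moderate n only."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def clopper_pearson_lower(k, n, delta):
    """One-sided lower confidence bound on p after k acceptances in n trials."""
    if k == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection: the tail probability increases with p
        mid = (lo + hi) / 2
        if binom_tail_ge(k, n, mid) < delta:
            lo = mid
        else:
            hi = mid
    return lo


def empirical_bernstein_upper(losses, delta):
    """Upper bound on the mean of [0, 1]-valued losses, holding w.p. >= 1 - delta."""
    n = len(losses)
    mean = sum(losses) / n
    var = sum((x - mean) ** 2 for x in losses) / (n - 1)  # unbiased sample variance
    log_term = math.log(2 / delta)
    return mean + math.sqrt(2 * var * log_term / n) + 7 * log_term / (3 * (n - 1))
```

The variance term is what makes the second bound adaptive: when the observed losses barely vary, the square-root term shrinks and only the O(1/n) correction remains, whereas a range-based Hoeffding bound pays the full O(1/sqrt(n)) width regardless.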

2606.08654 2026-06-09 cs.LG cs.NA math.AP math.NA stat.AP 新提交

Operator learning for the 2D incompressible Navier-Stokes equations: a conformal prediction approach in the data-scarce regime

二维不可压缩Navier-Stokes方程的算子学习：数据稀缺情况下的共形预测方法

Weinan Wang, Bowen Gang, Hao Deng

发表机构 * University of Oklahoma（俄克拉荷马大学）； Fudan University（复旦大学）

AI总结针对数据稀缺下算子学习的不确定性量化，提出基于扰动的共形预测框架，在二维Navier-Stokes基准上比现有方法生成更窄的共形带，同时保持目标覆盖。

详情

AI中文摘要

本文提出了一种基于扰动的共形预测框架，用于算子学习中的不确定性量化，重点关注二维Navier-Stokes方程。虽然神经算子为昂贵的PDE求解器提供了快速替代方案，但它们本身无法为时空场预测提供校准的不确定性。我们的方法将训练好的傅里叶神经算子（FNO）与分裂共形预测相结合，通过比较在几乎相同数据集上训练的两个算子的预测来构建局部不确定性尺度：一个使用原始标签，另一个使用添加小高斯噪声的标签。我们在数据稀缺情况下考虑该过程，其中总标签预算固定，而需要单独不确定性网络的方法必须在多个模型之间划分训练数据。在二维Navier-Stokes基准上，在匹配总数据预算的情况下，基于扰动的方法产生的共形带比现有方法窄得多，同时保持目标同时覆盖。这些结果表明，扰动敏感性是共形化神经算子的一种实用且样本高效的不确定性代理。

英文摘要

In this paper, we propose a perturbation-based conformal prediction framework for uncertainty quantification in operator learning, with a focus on the 2D Navier-Stokes equations. While neural operators provide fast surrogates for expensive PDE solvers, they do not by themselves provide calibrated uncertainty for spatiotemporal field predictions. Our approach wraps a trained Fourier Neural Operator (FNO) with split conformal prediction and constructs the local uncertainty scale by comparing the predictions of two operators trained on nearly identical datasets: one on the original labels and one on labels perturbed by small Gaussian noise. We consider this procedure in the data-scarce regime, where the total label budget is fixed and methods that require a separate uncertainty network must divide training data between multiple models. On the 2D Navier-Stokes benchmark, the perturbation-based method produces substantially narrower conformal bands than existing methods under matched total data budgets while maintaining the target simultaneous coverage. These results suggest that perturbation sensitivity is a practical and sample-efficient uncertainty proxy for conformalized neural operators.
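
The procedure described here can be sketched, under assumptions, as split conformal prediction with a sup-norm score normalized by a local scale. Fields are represented as flat lists of grid values; `perturbation_scale`, the `floor` parameter, and the exact score are hypothetical illustrations, since the abstract does not give the paper's formulas.

```python
import math


def perturbation_scale(pred_clean, pred_noisy, floor=1e-6):
    """Hypothetical local scale: pointwise gap between the two operators' outputs."""
    return [max(abs(a - b), floor) for a, b in zip(pred_clean, pred_noisy)]


def conformal_scale(cal_truth, cal_pred, cal_sigma, alpha):
    """Split-conformal quantile of sup-norm scores max_x |y - f| / sigma."""
    scores = sorted(
        max(abs(y - f) / s for y, f, s in zip(truth, pred, sigma))
        for truth, pred, sigma in zip(cal_truth, cal_pred, cal_sigma)
    )
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if rank > n:
        return math.inf  # too few calibration fields for this alpha
    return scores[rank - 1]


def conformal_band(pred, sigma, q):
    """Pointwise band f +/- q * sigma; covers the whole field when the score <= q."""
    return [(f - q * s, f + q * s) for f, s in zip(pred, sigma)]
```

Taking the maximum over grid points inside the score is what turns the usual marginal guarantee into simultaneous coverage of the entire field; a sharper local scale then narrows the band wherever the two operators agree.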

2606.08682 2026-06-09 cs.LG cs.AI 新提交

Activation Steering Induces Emergent Misalignment: A More Comprehensive Evaluation

激活引导引发突现失调：一项更全面的评估

Qi Cao, Jian Lou, Meiting Liu, Wenjie Feng, Dan Li, See-Kiong Ng, Anh Tuan Luu

发表机构 * Nanyang Technological University（南洋理工大学）； Sun Yat-sen University（中山大学）； University of Science and Technology of China（中国科学技术大学）； National University of Singapore（新加坡国立大学）

AI总结研究激活引导是否引发突现失调，通过扩展评估范围，发现激活引导可导致广泛失调，且比微调产生更连贯的有害响应，并分析了关键因素。

详情

AI中文摘要

激活引导已成为一种流行的推理时技术，用于调节大型语言模型（LLMs）的行为。通过从目标行为的示例构建引导向量，并在推理期间将其注入中间激活，激活引导能够实现灵活的行为控制，同时避免微调所需的永久参数更新。与此同时，最近的研究将突现失调（EM）识别为一个重要的安全问题，其中在狭窄任务的不安全示例上微调的模型可能意外地泛化到无关任务上的广泛不安全行为。尽管微调引发的EM已被广泛研究，但激活引导是否能引发EM仍然相对未被探索，尽管它作为一种模型控制技术的使用日益增加。在本文中，我们对激活引导引发的突现失调进行了全面研究，大幅扩展了现有开创性工作的评估范围。首先，我们表明激活引导可以引发广泛的失调，即使在最近的Qwen-3.5系列中也是如此。此外，激活引导的模型产生的有害响应比微调模型具有更强的语义相关性和更高的连贯性，使得由此产生的失调可能更具危害性。其次，我们通过分析关键的引导特定因素来表征AS引发的EM的特性，包括引导幅度、引导子空间的低秩结构以及引导向量构建期间的周期数。第三，我们评估了AS引发的EM在不同模型家族、模型规模、目标任务和干预层上的鲁棒性和敏感性。我们的发现揭示了激活引导是突现失调的一个重要但未被充分研究的来源，并为理解EM的机制和安全风险提供了激活空间视角。

英文摘要

Activation steering has emerged as a popular inference-time technique for modulating the behavior of large language models (LLMs). By constructing a steering vector from examples of a target behavior and injecting it into intermediate activations during inference, activation steering enables flexible behavioral control while avoiding the permanent parameter updates required by finetuning. Meanwhile, recent work has identified emergent misalignment (EM) as a significant safety concern, wherein models finetuned on unsafe examples from a narrow task may unexpectedly generalize to broadly unsafe behavior on unrelated tasks. Although finetuning-induced EM has been extensively studied, whether activation steering can induce EM remains comparatively under-explored, despite its increasing use as a model-control technique. In this paper, we present a comprehensive study of activation-steering-induced emergent misalignment, substantially expanding the evaluation scope beyond existing pioneering work. First, we show that activation steering can induce broad misalignment, even in the recent Qwen-3.5 series. Moreover, activation-steered models produce harmful responses with stronger semantic relevance and higher coherence than their finetuned counterparts, making the resulting misalignment potentially more harmful. Second, we characterize properties of AS-induced EM by analyzing key steering-specific factors, including steering magnitude, the low-rank structure of the steering subspace, and the number of epochs during steering-vector construction. Third, we evaluate the robustness and sensitivity of AS-induced EM across diverse model families, model scales, target tasks, and intervention layers. Our findings reveal activation steering as a significant yet under-examined source of emergent misalignment and provide an activation-space perspective for understanding the mechanisms and safety risks of EM.
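
For readers unfamiliar with the mechanism, one common way to build and apply a steering vector is a difference of mean activations added to a hidden state at inference time. The sketch below assumes that construction and uses plain lists as stand-ins for activations; the paper also mentions epochs during steering-vector construction, which suggests a trained vector, so this is an illustration of the general idea rather than the authors' method.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_vector(target_acts, baseline_acts):
    """Mean activation on target-behavior examples minus mean on baseline examples."""
    mu_target = mean_vector(target_acts)
    mu_base = mean_vector(baseline_acts)
    return [t - b for t, b in zip(mu_target, mu_base)]


def steer(hidden, vector, magnitude):
    """Inject the scaled steering vector into one hidden state (no weights change)."""
    return [h + magnitude * v for h, v in zip(hidden, vector)]
```

The `magnitude` argument corresponds to the steering magnitude the abstract lists as a key factor; setting it to zero recovers the unsteered model.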

2606.08777 2026-06-09 cs.LG cs.AI 新提交

How Many Counterfactuals Does It Take? Probing VLM Hallucinations Through Circuits and Causal Effects

需要多少反事实？通过电路和因果效应探究VLM幻觉

Abhivansh Gupta, Simardeep Singh, Advika Sinha, Shreyansh Modi, Akshat Tomar

发表机构 * University of California, Berkeley（加州大学伯克利分校）； DeepMind（深度思维）

AI总结本文通过定义基于对数概率差异的因果影响度量，并利用电路发现技术，研究视觉语言模型幻觉输出的反事实鲁棒性，推导出检测不稳定所需的最小反事实样本数。

2606.08892 2026-06-09 cs.LG 新提交

Diffuse AI Control on Fuzzy Tasks

模糊任务上的扩散AI控制

Mikhail Terekhov, Caglar Gulcehre, Vivek Hebbar, Joe Benton

发表机构 * Anthropic Fellows Program (via MATS)（Anthropic 研究员计划（通过 MATS））； EPFL（洛桑联邦理工学院）； Redwood Research（红木研究）； Anthropic

AI总结针对AI在模糊任务上的长期扩散威胁，提出蓝队与红队对抗框架，通过弱模型评分训练强模型，并发现红队可利用多目标进化提示优化找到评分高但性能差的颠覆性行为，蓝队则通过对抗优化提升鲁棒性。

详情

AI中文摘要

部署在关键领域（如AI安全研究）的AI模型可能因未对齐而微妙地破坏我们的努力。扩散AI控制是AI安全的一个子领域，旨在减轻长期部署范围内AI破坏（扩散威胁）带来的风险。这些风险在模糊任务上尤其有害，即难以评分或需要直觉的任务。为了理解模糊任务上的扩散威胁，我们引入了一个新颖的框架，将AI控制视为蓝队和红队之间的对抗游戏。蓝队使用一个弱可信模型构建一个弱评分，并据此训练一个强大的、可能具有颠覆性的模型，以消除可能存在的颠覆倾向。然后红队试图找到被弱评分高度评价、因而可能不会被训练掉、但实际上对应着较差表现的模型行为。我们在为近期ML论文的研究问题撰写实验提案的任务上测试了我们的框架。我们使用一个能够访问原始论文的语言模型作为代理“真实”评分器。我们的红队使用多目标进化提示优化发现了颠覆性行为。我们展示了Opus 4.6可以写出比GPT-OSS-20B更差的提案（根据真实代理评分），而弱评分器却将其评为与Opus 4.6最佳提案一样高。为了缓解威胁，我们为蓝队提出了一种对抗优化算法，该算法为弱模型发现更鲁棒的提示。该算法产生的蓝队提示，我们的红队优化未能利用。

英文摘要

AI models deployed in critical domains, such as AI safety research, may subtly sabotage our efforts due to misalignment. Diffuse AI Control is a subfield of AI safety concerned with mitigating risks from AI sabotage distributed over long deployment horizons (diffuse threats). These risks are particularly pernicious on fuzzy tasks, i.e., tasks that are hard to grade or require intuition. To understand diffuse threats on fuzzy tasks, we introduce a novel framework that considers AI control as an adversarial game between a blue team and a red team. The blue team uses a weak trusted model to construct a weak score against which they would train a strong, potentially subversive model to remove the subversion propensity if it were present. The red team then tries to find model behaviors that are rated highly by the weak score, and thus might not be trained out, but actually correspond to poor performance. We test our framework on the task of writing experimental proposals for research questions from recent ML papers. We use a language model with access to the original paper as a proxy "ground-truth" scorer. Our red team discovers subversive behaviors using multi-objective evolutionary prompt optimization. We show that Opus 4.6 can write proposals that are worse according to the ground-truth proxy than those of GPT-OSS-20B, while the weak scorer rates them as highly as the best proposals from Opus 4.6. To mitigate the threat, we propose an adversarial optimization algorithm for the blue team that discovers more robust prompts for the weak model. This algorithm produces a blue team prompt that our red team optimization fails to exploit.

2606.08893 2026-06-09 cs.LG cs.AI cs.CR 新提交

Cheap Reward Hacking Detection

廉价奖励黑客检测

Iván Belenky, Joaquín Itria, Steven Johns

发表机构 * Tamarillo

AI总结提出用小Transformer编码器将轨迹映射到单位球面，使嵌入距离近似奖励与元数据的L1距离，线性探针检测奖励黑客，AUC达0.9467，成本比LLM-as-judge低四个数量级。

Comments 20 pages, 6 figures, 12 tables

2606.09043 2026-06-09 cs.LG cs.CL 新提交

DynaCF: Mitigating Shortcut Learning in Reward Models via Dynamic Counterfactual Sensitivity

DynaCF: 通过动态反事实敏感性缓解奖励模型中的捷径学习

Fengyuan Liu, Yongliang Miao, Zirui He, Yanguang Liu, Fei Sun, Mengnan Du

发表机构 * The Chinese University of Hong Kong, Shenzhen（香港中文大学（深圳））； New Jersey Institute of Technology（新泽西理工学院）； Institute of Computing Technology, CAS（中国科学院计算技术研究所）

AI总结提出DynaCF框架，通过在线测量反事实扰动下的边际变化和偏好翻转来动态降低捷径敏感样本的权重，从而缓解奖励模型中的捷径学习问题。

2606.09204 2026-06-09 cs.LG cs.CL cs.CR 新提交

The Injection Paradox: Brand-Level Suppression in Safety-Trained LLM Recommendations via RAG Context Injection

注入悖论：通过RAG上下文注入在安全训练的LLM推荐中实现品牌级压制

Hyunseok Paeng

发表机构 * University of California, Berkeley（加州大学伯克利分校）

AI总结研究发现在基于RAG的LLM推荐中，安全训练会导致注入提示反而压制目标品牌推荐率，揭示了安全机制可能被逆向利用的风险。

Comments 16 pages, 1 figure, 15 tables. Accepted at the ICML 2026 Workshop on Failure Modes in Agentic AI (FAGEN), a non-archival venue

详情

AI中文摘要

我们提出了一种在基于RAG的LLM推荐中安全训练的可复现失败模式——注入悖论，其中嵌入在检索文档中的提示注入反而对攻击者不利，将目标品牌压制到低于无注入基线的水平。在安全训练的Claude模型中，包含提示注入的文档推荐率急剧下降，且这种压制会传播到同一品牌的其他未修改文档。在Claude Opus 4.6中，目标品牌从54%的基线降至所有50次试验中零次前二推荐，尽管语料库中4个品牌文档只有1个包含注入。该方向模式在反事实实验和三个品牌中均得到复现。在测试的GPT模型中观察到相反结果，相同的注入反而增加了推荐，表明注入类上下文影响推荐行为的模型族差异。这些发现提出了逆向攻击场景的技术可能性，即攻击者将注入嵌入竞争对手文档，通过安全敏感模型行为压制竞争对手品牌。

英文摘要

We present a reproducible failure mode of safety training in RAG-based LLM recommendation -- the Injection Paradox -- in which prompt injections embedded in retrieved documents backfire against the attacker, suppressing the target brand below the injection-free baseline. In safety-trained Claude models, documents containing prompt injections suffer a sharp drop in recommendation rate, and this suppression propagates beyond the injected document to unmodified documents of the same brand. In Claude Opus 4.6, the target brand drops from a 54% baseline to zero top-2 recommendations across all 50 trials, even though only 1 of 4 brand documents in the corpus contains an injection. The directional pattern is reproduced in counterfactual experiments and across three brands. A contrasting result across the GPT models tested, where the same injection instead increases recommendations, suggests model-family differences in how injection-like context affects recommendation behavior. These findings raise the technical possibility of a reverse-attack scenario in which an adversary embeds injections in a competitor's documents to suppress the competitor's brand via safety-sensitive model behavior.

2606.09559 2026-06-09 cs.LG cs.AI cs.CR cs.RO 新提交

Safe-RULE: Safe Reinforcement UnLEarning

Safe-RULE：安全强化反学习

Shixiong Jiang, Taozheng Zhu, Fanxin Kong

发表机构 * University of Notre Dame（圣母大学）

AI总结针对离线安全强化学习易受数据投毒攻击的问题，提出Safe-RULE框架，通过反学习移除恶意样本影响，无需从头训练或访问原始环境，实验证明能有效提升安全性。

Comments 20 pages, 3 figures

2606.07528 2026-06-09 cs.CL cs.AI cs.LG 交叉投稿

BEACON: Behavioral Entropy Aggregation for Cross-Model Hallucination Detection in Large Language Models

BEACON: 面向大语言模型跨模型幻觉检测的行为熵聚合

Naveen Bera, Pulijala Sai Nikhila, Kondaguduru Abhiram, Shaik Gayaz Ali, Shoaib Sadiq Salehmohamed, Shaik Mohammed Omar, Jinal Prashant Thakkar, Hansika Aredla, Shalmali Ayachit

发表机构 * LLM Lens

AI总结提出BEACON框架，通过多维度行为特征（语义熵、嵌入几何、思维链一致性、释义稳定性）的黑盒检测方法，在7个基准上达到0.8123 AUROC，优于现有方法。

Comments 12 pages, 6 tables, 1 figure. Code and data available upon request

详情

AI中文摘要

大语言模型中的幻觉，即生成事实上不正确或未经支持的内容，仍然是可靠部署的关键障碍。我们提出了BEACON（面向跨模型幻觉检测的行为熵聚合），一个黑盒幻觉检测框架，仅基于模型输出运行，无需访问内部表示或外部知识库。BEACON从结构化的多遍生成中提取31维特征向量，整合了基于NLI的语义熵、嵌入几何、思维链一致性和释义稳定性信号。在七个基准的7,617个标记样本上训练的梯度提升分类器达到了0.8123 ± 0.0102的AUROC（95%置信区间：0.7632-0.8251），优于独立的语义熵（+0.2298）和SelfCheckGPT风格的一致性基线（+0.2457）。特征重要性分析表明，幻觉本质上是多维的，需要组合的不确定性信号。一个高效的5次调用变体达到了0.7795的AUROC，使得在黑盒LLM API上的实际部署成为可能。

英文摘要

Hallucination in large language models (LLMs), defined as the generation of factually incorrect or unsupported content, remains a critical barrier to reliable deployment. We present BEACON (Behavioral Entropy Aggregation for Cross-model hallucination detectiON), a black-box hallucination detection framework that operates purely on model outputs without requiring access to internal representations or external knowledge bases. BEACON extracts a 31-dimensional feature vector from structured multi-pass generation, integrating NLI-based semantic entropy, embedding geometry, chain-of-thought consistency, and paraphrase stability signals. A gradient-boosted classifier trained on 7,617 labeled examples across seven benchmarks achieves 0.8123 +/- 0.0102 AUROC (95% CI: 0.7632-0.8251), outperforming standalone semantic entropy (+0.2298) and SelfCheckGPT-style consistency baselines (+0.2457). Feature importance analysis shows that hallucination is inherently multi-dimensional, requiring combined uncertainty signals. An efficient 5-call variant achieves 0.7795 AUROC, enabling practical deployment across black-box LLM APIs.

2606.07620 2026-06-09 cs.CV cs.AI cs.DC cs.LG 交叉投稿

SENTRY: Statistical Reliability Analysis of Vision Transformers Under Soft Errors

SENTRY: 视觉Transformer在软错误下的统计可靠性分析

Pramit Kumar Bhaduri, Mahdi Taheri, Samira Nazari, Maksim Jenihhin, Christian Herglotz, Michael Hubner

发表机构 * Brandenburg University of Technology Cottbus-Senftenberg（勃兰登堡工业大学）； Tallinn University of Technology（塔林理工大学）； Zanjan University（赞詹大学）

AI总结提出基于有限总体抽样的统计故障注入框架，仅需数千样本即可在99%置信度下以1%误差界估计故障率，将实验成本降低高达10700倍，并揭示ViT中归一化层和关键指数位是脆弱性热点。

详情

AI中文摘要

随着视觉Transformer在自动驾驶和医学成像等安全关键领域的应用增长，确保其抵抗软错误的可靠性至关重要。尽管ViT提供了最先进的准确性，但其庞大的参数数量使得穷举故障注入不可行。为弥补这一差距，本文提出一个统计故障注入框架，利用有限总体抽样理论提供形式化的可靠性保证。我们证明，无论模型规模如何，仅需数千个样本即可在99%置信度下将故障率限制在1%的误差界内。与穷举方法相比，该方法将实验成本降低高达10700倍，同时保留跨架构组件定位脆弱性的能力。通过对ViT-Tiny和ViT-Small等不同架构的广泛评估，我们揭示了高度非均匀的可靠性景观。结果表明，虽然只有3%的FP32位翻转导致故障，但其中绝大多数事件导致灾难性的精度崩溃。具体脆弱性被定位到归一化层和IEEE-754格式中的关键指数位，为设计加固的、边缘部署的ViT架构提供了数学基础和可操作的见解。

英文摘要

With the growth of Vision Transformers in safety-critical domains like autonomous systems and medical imaging, ensuring their reliability against soft errors is paramount. While ViTs offer state-of-the-art accuracy, their massive parameter counts render exhaustive fault injection campaigns infeasible. To bridge this gap, a statistical fault injection framework is presented, leveraging finite-population sampling theory to provide formal reliability guarantees. It is demonstrated that failure rates are bounded within a 1% margin at 99% confidence using only a few thousand samples, regardless of model scale. This methodology achieves up to a 10,700-fold reduction in experimental cost compared to exhaustive approaches, while preserving the ability to localize vulnerabilities across architectural components. Through extensive evaluation of different architectures like ViT-Tiny and ViT-Small, a highly non-uniform reliability landscape is uncovered. It is shown that while only 3% of FP32 bit-flips result in failure, the vast majority of these events lead to catastrophic accuracy collapse. Specific vulnerabilities are localized to normalization layers and critical exponent bits within the IEEE-754 format, providing a mathematical foundation and actionable insights for the design of hardened, edge-deployed ViT architectures.
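
The sample-size claim can be checked with the standard finite-population formula for estimating a proportion, n = N / (1 + e^2 (N - 1) / (t^2 p (1 - p))), where N is the number of possible fault sites, e the error margin, t the normal critical value, and p a prior guess of the failure rate. The sketch below assumes this textbook formula; whether the paper uses exactly this form, and the name `fault_sample_size`, are assumptions.

```python
import math
from statistics import NormalDist


def fault_sample_size(population, margin, confidence, p=0.5):
    """Fault injections needed to estimate a failure rate within +/- margin.

    Uses the normal approximation with the finite-population correction;
    p = 0.5 is the conservative worst case for the variance p * (1 - p).
    """
    t = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    n = population / (1 + margin**2 * (population - 1) / (t**2 * p * (1 - p)))
    return math.ceil(n)
```

Because N appears in both numerator and denominator, the result saturates as the population grows: for a 1% margin at 99% confidence it stays below 17,000 injections in the worst case p = 0.5, and drops to roughly 2,000 when p is near the 3% failure rate the abstract reports, which is consistent with "a few thousand samples, regardless of model scale".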

2606.07660 2026-06-09 cs.CV cs.LG 交叉投稿

代码不仅仅是文本：代码生成的不确定性估计

Yuling Shi, Caiqi Zhang, Yuexian Li, Haopeng Wang, Yeheng Chen, Nigel Collier, Xiaodong Gu

发表机构 * Shanghai Jiao Tong University（上海交通大学）； University of Cambridge（剑桥大学）

AI总结针对代码生成中错误程序的可靠性问题，提出基于词法、算法和功能三个正交轴的不确定性估计方法，在五个代码LLM上将AUROC提升8.1个百分点。

详情

AI中文摘要

大型语言模型（LLMs）越来越多地被部署为代码生成器，其中静默错误的程序会带来真实的安全和可靠性风险。可靠的不确定性估计（UE）对于选择性预测、人在回路审查和下游智能体决策至关重要。然而，现有的大多数代码UE方法继承自自然语言（NL）生成，忽略了使代码独特的属性。我们认为代码在三个方面与NL不同：单个错误标记可能破坏整个程序（标记脆弱性）；算法意图和具体实现可能独立不一致（意图-代码差距）；程序可以被执行（可执行性）。我们将这些属性实例化为三个正交的不确定性轴：词法（Top-K标记熵）、算法（伪代码一致性）和功能（行为一致性）。在五个代码LLM上，我们的三轴集成将平均AUROC从最强NL衍生基线的0.696提高到0.776（+8.1点）。值得注意的是，在Qwen3-14B上，我们的单次Top-K标记熵匹配了最强多次基线，同时成本降低超过3倍；在各模型上，它仍然是一个有竞争力的低成本信号。这些结果表明，代码UE需要特定于代码的设计，而不是直接移植NL方法。

英文摘要

Large language models (LLMs) are increasingly deployed as code generators, where silently wrong programs pose real safety and reliability risks. Reliable uncertainty estimation (UE) is essential for selective prediction, human-in-the-loop review, and downstream agentic decisions. Yet most existing code UE methods are inherited from natural language (NL) generation and ignore properties that make code distinct. We argue that code differs from NL in three ways: a single wrong token can break an entire program (token fragility); algorithmic intent and concrete implementation can disagree independently (intent-code gap); and programs can be executed (executability). We instantiate these properties as three orthogonal uncertainty axes: lexical (Top-K token entropy), algorithmic (pseudo-code consistency), and functional (behavioral consistency). Across five code LLMs, our three-axis ensemble improves average AUROC from 0.696 for the strongest NL-derived baseline to 0.776 (+8.1 points). Notably, on Qwen3-14B, our single-pass Top-K token entropy matches the strongest multi-pass baseline while being over 3x cheaper; across models, it remains a competitive low-cost signal. These results suggest that code UE deserves code-specific design rather than direct NL ports.
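
The lexical axis can be illustrated with a small sketch of Top-K token entropy: at each generated position, keep the K most probable tokens, renormalize, and take the Shannon entropy, then average over the program. Renormalizing the truncated distribution and averaging with equal weights are assumptions made here; the abstract does not state the exact definition, and both function names are hypothetical.

```python
import math


def top_k_entropy(probs, k):
    """Entropy in nats of the k largest probabilities after renormalization."""
    top = sorted(probs, reverse=True)[:k]
    total = sum(top)
    return -sum((p / total) * math.log(p / total) for p in top if p > 0.0)


def sequence_uncertainty(token_dists, k=5):
    """Mean Top-K entropy over the token distributions of one generated program."""
    return sum(top_k_entropy(dist, k) for dist in token_dists) / len(token_dists)
```

Since this needs only the per-token distributions from a single decoding pass, it avoids the repeated sampling that multi-pass consistency measures require, which is the cost advantage the abstract points to.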

2606.09700 2026-06-09 cs.CR cs.HC cs.LG 交叉投稿

What the Eyes See, the LLMs Miss: Exploiting Human Perception for Adversarial Text Attacks

眼睛所见，大语言模型所不见：利用人类感知进行对抗性文本攻击

Qin Yang, Lu Malloy, Joshua Lee, Xiaohan Chang, Meisam Mohammady, Doowon Kim, Yuan Hong

发表机构 * University of Connecticut（康涅狄格大学）； University of Tennessee（田纳西大学）； University of California, Santa Barbara（加州大学圣芭芭拉分校）； Iowa State University（爱荷华州立大学）

AI总结针对LLM内容审核系统忽视人类视觉线索的缺陷，提出人类可感知对抗攻击（HPAA），通过排版操纵嵌入有害内容，在仅三次查询下实现86%人类识别率而机器检测率低于1%。

Comments This work has been accepted for publication at USENIX Security 2026. This paper includes examples of harmful, hateful, or abusive language for research purposes. Reader discretion is advised

详情

AI中文摘要

基于大型语言模型（LLM）的内容审核系统已成为对抗有害在线内容的关键防线。然而，这些系统主要基于分词文本运行，很大程度上忽略了人类在解释内容时自然依赖的视觉线索。我们表明，这种差异造成了根本性的感知不匹配：人类容易识别为有害的内容，对自动审核系统而言可能变得几乎不可见。为研究这一漏洞，我们引入了一类人类可感知对抗攻击（HPAA），其中有害表达通过视觉上显著的排版操纵嵌入到原本良性的文本中。我们的关键洞察是，排版特征（包括间距、视觉强调和空间排列）可以策略性地组合，以保留人类对有害内容的识别，同时大幅降低机器可检测性。在黑盒设置下，仅使用少量查询预算，我们的攻击自动生成规避内容，无需模型访问或梯度信息。我们在多个数据集和十个已部署的审核系统（包括商业API和最先进的开源防护）上评估了该攻击。结果揭示了人类与机器感知之间的显著差距：仅使用三次检测器查询，生成的攻击在评估系统中实现了超过86%的人类识别率，同时检测率低于1%。我们进一步进行消融研究，以识别驱动成功规避的排版因素，分析当前审核架构为何无法捕捉这些信号，并讨论实际防御措施。我们的发现暴露了当今基于LLM的审核生态系统中的根本盲点，并强调了需要以更符合人类感知理解的方式推理内容的审核系统。

英文摘要

Large language model (LLM)-powered content moderation systems have become a critical defense against harmful online content. However, these systems primarily operate on tokenized text and largely ignore the visual cues that humans naturally rely on when interpreting content. We show that this discrepancy creates a fundamental perceptual mismatch: content that is readily recognized as harmful by humans can become effectively invisible to automated moderation systems. To study this vulnerability, we introduce a class of Human-Perceptible Adversarial Attacks (HPAA), in which harmful expressions are embedded into otherwise benign text through visually salient typographic manipulations. Our key insight is that typographic features, including spacing, visual emphasis, and spatial arrangement, can be strategically combined to preserve human recognition of harmful content while substantially reducing machine detectability. Operating in black-box settings with only a small query budget, our attack automatically generates evasive content without requiring model access or gradient information. We evaluate the attack across multiple datasets and ten deployed moderation systems, including commercial APIs and state-of-the-art open-source guardrails. Results reveal a striking gap between human and machine perception: with only three detector queries, generated attacks achieve over 86% human recognition while maintaining detection rates below 1% across the evaluated systems. We further conduct ablation studies to identify the typographic factors driving successful evasion, analyze why current moderation architectures fail to capture these signals, and discuss practical defenses. Our findings expose a fundamental blind spot in today's LLM-based moderation ecosystem and highlight the need for moderation systems that reason about content in a manner more consistent with human perceptual understanding.

2606.09701 2026-06-09 cs.CL cs.AI cs.LG 交叉投稿

Learning to Attack and Defend: Adaptive Red Teaming of Language Models via GRPO

学习攻击与防御：通过GRPO对语言模型进行自适应红队测试

Blake Bullwinkel, Eugenia Kim, Amanda Minnich, Mark Russinovich

发表机构 * Microsoft AI Red Team（微软AI红队）； Microsoft Azure（微软Azure）

AI总结提出AdvGRPO框架，通过密集多通道奖励和分离优势归一化实现GRPO在攻击者-防御者联合优化中的稳定训练，产生高效可迁移攻击，防御者优于基线。

2606.09746 2026-06-09 cs.CV cs.AI cs.LG 交叉投稿

Hybrid Robustness Verification for Spatio-Temporal Neural Networks

时空神经网络的混合鲁棒性验证

Sherwin Varghese, Matthew Wicker, Alessio Lomuscio

发表机构 * Imperial College London（伦敦帝国学院）

AI总结针对3D CNN在视频和体素输入中的鲁棒性验证，提出时空约束建模和STBP框架，实现精确闭式传播与可扩展近似，在UCF-101等基准上提升1.7倍认证鲁棒准确率。

Comments Accepted at the 9th International Symposium on AI Verification (SAIV 2026)

详情

AI中文摘要

随着人工智能越来越多地部署在安全关键系统中，为底层模型提供形式化的鲁棒性保证至关重要。现有的验证方法要么依赖过于保守的近似，要么产生难以承受的计算成本。例如，在视频设置中使用lp-范数扰动编码了对手可以在每个视频帧中注入噪声的信念。实际上，对抗性扰动表现出结构化的时空相关性，被约束在低维、语义上有意义的子空间中。在这项工作中，我们研究了处理视频和体素输入的3D CNN的鲁棒性验证，针对动作识别（UCF-101）、自动驾驶（Udacity）和医学成像（MedMNIST）中的应用，通过将对抗强度建模为时空约束——攻击者可以修改一组连续帧中的子集或补丁——来利用关于对抗强度的现实假设。我们证明，建模现实约束能够实现更紧的近似。我们引入了时空边界传播（STBP），这是一个验证框架，它计算第一卷积层的精确闭式表征，并通过可扩展的近似传播认证边界。计算精确闭式为第一卷积层提供了最紧的边界。因此，我们在网络的其余部分使用近似方法。为了推动该领域的进一步发展，我们提出了ST-Bench，一个用于自动驾驶和活动识别的验证基准，以系统评估可验证的鲁棒性。与现有的基于验证的方法相比，STBP在相同的扰动预算下提供了更强的鲁棒性保证，并显著提高了可扩展性，实现了1.7倍更高的认证鲁棒准确率。

英文摘要

With AI increasingly deployed in safety-critical systems, providing formal robustness guarantees for the underlying models is essential. Existing verification methods either rely on overly conservative approximations or incur prohibitive computational costs. For example, the use of $\ell_p$-norm perturbations in video settings encodes the belief that the adversary can inject noise in every video frame. In practice, adversarial perturbations exhibit structured spatial and temporal correlations, constrained to lower-dimensional, semantically meaningful subspaces. In this work, we study robustness verification of 3D CNNs processing video and volumetric inputs, targeting applications in action recognition (UCF-101), autonomous driving (Udacity), and medical imaging (MedMNIST). We exploit realistic assumptions on adversarial strength by modelling them as spatio-temporal constraints, where the attacker can modify either a subset of frames or patches within a set of consecutive frames. We demonstrate that modelling realistic constraints enables tighter approximations. We introduce Spatio-Temporal Bound Propagation (STBP), a verification framework that computes an exact closed-form characterization of the first convolutional layer and propagates certified bounds through subsequent layers using scalable approximations. Computing the exact closed form provides the tightest bounds for the first convolutional layer. Thus, we utilise approximation methods in the remainder of the network. To spur further progress in this field, we propose ST-Bench, a verification benchmark for autonomous driving and activity recognition, to systematically evaluate verifiable robustness. Compared to existing verification-based approaches, STBP provides stronger robustness guarantees with significantly improved scalability, achieving 1.7x higher certified robust accuracy under identical perturbation budgets.
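
To see why restricting the perturbation to a subset of frames or patches tightens the bounds, consider a single affine layer, of which a convolution is a special case. The sketch below uses plain interval arithmetic with a boolean mask marking the inputs the attacker may change; it is a generic illustration with hypothetical names, not the STBP algorithm, whose closed form the abstract does not spell out.

```python
def affine_bounds(weights, bias, x, eps, mask):
    """Per-output bounds of y = W x + b when only masked inputs move by +/- eps.

    For one affine layer under a box perturbation, each output interval is
    tight: the extremes are attained by pushing every masked input to the
    end of its range that matches the sign of its weight.
    """
    lower, upper = [], []
    for row, b in zip(weights, bias):
        center = sum(w * xi for w, xi in zip(row, x)) + b
        radius = eps * sum(abs(w) for w, m in zip(row, mask) if m)
        lower.append(center - radius)
        upper.append(center + radius)
    return lower, upper
```

Inputs outside the mask contribute nothing to the radius, so a threat model that touches only a few frames yields strictly narrower first-layer intervals than an unrestricted perturbation of every frame with the same eps.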

2403.06013 2026-06-09 cs.LG cs.CV 版本更新

Are Classification Robustness and Explanation Robustness Really Strongly Correlated? An Analysis Through Input Loss Landscape

分类鲁棒性与解释鲁棒性真的强相关吗？基于输入损失景观的分析

Tiejin Chen, Wenwang Huang, Linsey Pang, Dongsheng Luo, Hua Wei

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结本文质疑分类鲁棒性与解释鲁棒性强相关的传统观点，通过聚类评估解释鲁棒性，并提出调整解释损失景观的训练方法，发现两者并不强相关。

详情

AI中文摘要

本文深入探讨深度学习鲁棒性的关键领域，挑战了图像分类系统中分类鲁棒性和解释鲁棒性固有相关的传统观点。通过一种利用聚类高效评估解释鲁棒性的新颖评估方法，我们证明增强解释鲁棒性并不一定会使输入损失景观相对于解释损失变得平坦——这与平坦的损失景观指示更好的分类鲁棒性相反。为了深入探究这一矛盾，我们提出了一种开创性的训练方法，旨在调整相对于解释损失的损失景观。通过这种新的训练方法，我们发现尽管这种调整可以影响解释的鲁棒性，但它们对分类的鲁棒性没有影响。这些发现不仅挑战了两种鲁棒性之间强相关的主流假设，而且为理解损失景观与解释损失之间的关系开辟了新的途径。

英文摘要

This paper delves into the critical area of deep learning robustness, challenging the conventional belief that classification robustness and explanation robustness in image classification systems are inherently correlated. Through a novel evaluation approach leveraging clustering for efficient assessment of explanation robustness, we demonstrate that enhancing explanation robustness does not necessarily flatten the input loss landscape with respect to explanation loss, in contrast to classification robustness, where a flatter loss landscape indicates better robustness. To deeply investigate this contradiction, a groundbreaking training method designed to adjust the loss landscape with respect to explanation loss is proposed. Through the new training method, we uncover that although such adjustments can impact the robustness of explanations, they do not have an influence on the robustness of classification. These findings not only challenge the prevailing assumption of a strong correlation between the two forms of robustness but also pave new pathways for understanding the relationship between loss landscape and explanation loss.

2506.06891 2026-06-09 cs.LG cs.CR 版本更新

Robust In-Context Reinforcement Learning Under Reward Poisoning Attacks

奖励投毒攻击下的鲁棒上下文强化学习

Paulius Sasnauskas, Yiğit Yalın, Goran Radanović

发表机构 * Department of Computing Science, University of Alberta, Edmonton, Canada.（阿尔伯塔大学计算机科学系，加拿大埃德蒙顿）； Alberta Machine Intelligence Institute (Amii), Edmonton, Canada.（阿尔伯塔机器智能研究所（Amii），加拿大埃德蒙顿）

AI总结针对奖励投毒攻击，提出对抗训练框架AT-DPT，通过同时训练攻击者和DPT模型，显著提升上下文强化学习在赌博机环境下的鲁棒性，并泛化到MDP等复杂场景。

Comments ICML 2026, code available at https://github.com/PauliusSasnauskas/AT-DPT

详情

AI中文摘要

我们研究了上下文强化学习（ICRL）的抗污染鲁棒性，重点关注决策预训练变换器（DPT, Lee et al., 2023）。为了应对针对DPT的奖励投毒攻击挑战，我们提出了一种新颖的对抗训练框架，称为对抗训练DPT（AT-DPT）。我们的方法同时训练一群攻击者（通过毒化环境奖励来最小化DPT的真实奖励）和一个DPT模型（从毒化数据中推断最优动作）。我们评估了该方法相对于标准赌博机算法（包括旨在处理奖励污染的鲁棒基线）的有效性。结果表明，AT-DPT在面对学习型攻击者的赌博机设置中显著优于它们，并泛化到更复杂的环境，如自适应攻击者和MDP。作为一种元强化学习方法，它在ICRL中学习有效的抗污染鲁棒算法方面显示出前景。

英文摘要

We study the corruption-robustness of in-context reinforcement learning (ICRL), focusing on the Decision-Pretrained Transformer (DPT, Lee et al., 2023). To address the challenge of reward poisoning attacks targeting the DPT, we propose a novel adversarial training framework, called Adversarially Trained DPT (AT-DPT). Our method simultaneously trains a population of attackers to minimize the true reward of the DPT by poisoning environment rewards, and a DPT model to infer optimal actions from the poisoned data. We evaluate the effectiveness of our approach against standard bandit algorithms, including robust baselines designed to handle reward contamination. Our results show that AT-DPT significantly outperforms them in bandit settings under a learned attacker, and generalizes to more complex environments such as adaptive attackers and MDPs. It shows promise in ICRL as a meta-RL approach to learning effective corruption-robust algorithms.

2512.08499 2026-06-09 cs.LG cs.AI 版本更新

Developing Distance-Aware Physics-Constrained Probabilistic Frameworks for Industrial Prognostics

面向工业预测的具有距离感知的物理约束概率框架开发

Waleed Razzaq, Yun-Bo Zhao

发表机构 * University of Science and Technology China（中国科学技术大学）

AI总结提出两种无需采样的距离感知物理约束概率框架PC-SNGP和PC-SNER，通过谱归一化和动态加权策略平衡数据保真度与物理一致性，在轴承预测中提升精度和不确定性校准。

详情

AI中文摘要

可靠且物理可解释的工业预测概率框架的发展仍处于初期阶段，现有文献在输入远离训练流形时往往不敏感。本文开发了两种无需采样的、具有距离感知的物理约束概率框架：(i) PC-SNGP 和 (ii) PC-SNER。两者均对隐藏层权重应用谱归一化，强制从输入到潜在空间的bi-Lipschitz距离保持表示。PC-SNGP将密集输出替换为高斯过程，其后验方差随输入与训练流形的距离增加而增大。PC-SNER修改输出层以预测Normal-Inverse-Gamma (NIG)参数，用于距离保持估计。为在训练过程中保持数据保真度与物理一致性之间的平衡，我们引入了物理约束损失的动态加权策略。我们还引入了一个距离感知系数 (DAC) 指标来量化对分布偏移的敏感性。实验上，我们使用PRONOSTIA、XJTU-SY和HUST基准数据集在滚动轴承 (REBs) 预测上验证了两种框架。实验结果表明，与竞争基线相比，预测精度提高，不确定性估计校准良好，同时在交叉验证中保持可审计性能，并在极端对抗扰动下具有鲁棒性。

英文摘要

Development of reliable and physically interpretable probabilistic frameworks for industrial prognostics remains nascent, and existing literature is often insensitive as inputs move away from the training manifold. In this paper, we develop two sampling-free, distance-aware physics-constrained probabilistic frameworks: (i) PC-SNGP and (ii) PC-SNER. Both apply spectral normalization to hidden layer weights, enforcing a bi-Lipschitz distance-preserving representation from the input to the latent space. PC-SNGP replaces the dense output with a Gaussian process whose posterior variance increases with input distance from the training manifold. PC-SNER modifies the output layer to predict Normal-Inverse-Gamma (NIG) parameters for distance-preserving estimation. To maintain balance between data fidelity and physical consistency during training, we introduce a dynamic weighting strategy for the physics-constrained loss. We also introduce a distance-aware-coefficient (DAC) metric to quantify sensitivity to distributional shifts. Empirically, we validate both frameworks on rolling-element-bearings (REBs) prognostics using the PRONOSTIA, XJTU-SY, and HUST benchmark datasets. Experimental results demonstrate improved prediction accuracy and well-calibrated uncertainty estimates relative to competing baselines, while maintaining auditable performance in cross-validation and robustness under extreme adversarial perturbations.
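
Spectral normalization, the shared building block of both frameworks, can be sketched as follows: estimate the largest singular value of a weight matrix by power iteration and rescale the matrix only when that value exceeds a chosen bound. The pure-Python matrices, the seeded random start vector, and the `bound` parameter are illustrative assumptions; practical implementations run one or a few iterations per training step on framework tensors.

```python
import math
import random


def spectral_norm(matrix, iters=200, seed=0):
    """Estimate the largest singular value by power iteration on W^T W."""
    rows, cols = len(matrix), len(matrix[0])
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(cols)]  # random start vector
    sigma = 0.0
    for _ in range(iters):
        norm_v = math.sqrt(sum(x * x for x in v))
        if norm_v == 0.0:
            return 0.0  # W maps everything to zero
        v = [x / norm_v for x in v]
        u = [sum(w * x for w, x in zip(row, v)) for row in matrix]  # u = W v
        sigma = math.sqrt(sum(x * x for x in u))  # ||W v|| for unit v
        v = [sum(matrix[i][j] * u[i] for i in range(rows)) for j in range(cols)]  # W^T u
    return sigma


def spectral_normalize(matrix, bound=1.0):
    """Rescale W so that its spectral norm does not exceed `bound`."""
    sigma = spectral_norm(matrix)
    scale = min(1.0, bound / sigma) if sigma > 0.0 else 1.0
    return [[w * scale for w in row] for row in matrix]
```

Capping the spectral norm bounds how much each layer can stretch distances; combined with residual connections this is the usual route to the bi-Lipschitz property mentioned above, so that inputs far from the training data stay far in the latent space.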


2512.08724 2026-06-09 cs.LG version update

Exposing Hidden Biases in Text-to-Image Models via Automated Prompt Search

通过自动化提示搜索暴露文本到图像模型中的隐藏偏见

Manos Plitsis, Giorgos Bouritsas, Vassilis Katsouros, Yannis Panagakis

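Affiliations: University of Edinburgh

AI summary: Proposes Bias-Guided Prompt Search (BGPS), a framework that automatically generates prompts to maximize bias in the resulting images, exposing hidden biases in text-to-image models and strengthening fairness evaluation.

Comments: ICML 2026. Code: https://github.com/manosplitsis/BGPS

Chinese abstract (AI translation)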

文本到图像（TTI）扩散模型已实现出色的视觉质量，但被反复证明在性别、种族和年龄等敏感属性上存在社会偏见。为缓解这些偏见，现有方法在训练和/或评估过程中常依赖人工构建或由大型语言模型（LLM）生成的提示数据集。除了构建成本之外，这种做法还可能遗漏那些未被预见、不太明显却会触发偏见生成的提示，即使模型已经过去偏处理。本文引入Bias-Guided Prompt Search（BGPS），这是一个自动生成提示的框架，所生成的提示旨在最大化结果图像中的偏见。BGPS包含两个组件：（1）一个被指示生成属性中性提示的LLM；（2）作用于TTI内部表示的属性分类器，用于将LLM的解码过程引导至提示空间中会放大目标图像属性的区域。我们在Stable Diffusion 1.5和一个最先进的去偏模型上进行了广泛实验，发现了一系列微妙且此前未被记录的偏见，它们会严重恶化公平性指标。关键的是，所发现的提示是可解释的，即普通用户也可能输入这些提示；与一种代表性的硬提示优化方法相比，其困惑度指标在定量上更优。我们的发现揭示了TTI的脆弱性，同时BGPS扩展了偏见搜索空间，可以作为偏见缓解的新评估工具。

English abstract

Text-to-image (TTI) diffusion models have achieved remarkable visual quality, yet they have been repeatedly shown to exhibit social biases across sensitive attributes such as gender, race, and age. To mitigate these biases, existing approaches frequently depend on curated prompt datasets, either manually constructed or generated with large language models (LLMs), as part of their training and/or evaluation procedures. Besides the curation cost, this also risks overlooking unanticipated, less obvious prompts that trigger biased generation, even in models that have undergone debiasing. In this work, we introduce Bias-Guided Prompt Search (BGPS), a framework that automatically generates prompts that aim to maximize the presence of biases in the resulting images. BGPS comprises two components: (1) an LLM instructed to produce attribute-neutral prompts and (2) attribute classifiers acting on the TTI's internal representations that steer the decoding process of the LLM toward regions of the prompt space that amplify the image attributes of interest. We conduct extensive experiments on Stable Diffusion 1.5 and a state-of-the-art debiased model and discover an array of subtle and previously undocumented biases that severely deteriorate fairness metrics. Crucially, the discovered prompts are interpretable, i.e., they may be entered by a typical user, and they quantitatively improve the perplexity metric compared to a prominent hard prompt optimization counterpart. Our findings uncover TTI vulnerabilities, while BGPS expands the bias search space and can act as a new evaluation tool for bias mitigation.


2601.22736 2026-06-09 cs.LG cs.AI version update

UA-DCM: Uncertainty-aware Causal Decision Making via Effect Bound Decomposition

UA-DCM: 基于效应界分解的不确定性感知因果决策

Md Musfiqur Rahman, Ziwei Jiang, Hilaf Hasson, Murat Kocaoglu

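Affiliations: Electrical and Computer Engineering, Purdue University; Computer Science, Johns Hopkins University; Cohesity

AI summary: Proposes a new framework that decomposes the range of causal effect values into a part that more samples may eliminate and a part that they cannot, indicating whether collecting more samples can help identify the optimal action; neural causal models approximate the decomposition in practice.

Chinese abstract (AI translation)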

从观测数据中进行因果推断可以为决策场景中找到最佳行动提供有力证据，而无需进行昂贵的随机试验。由于未观测到的混杂因素，即使有无限数据，行动的因果效应也往往不是点可识别的。此外，仅有有限样本为因果效应估计增加了另一层不确定性。现有几种方法可用于获得因果效应的上下界，从符号方法到最近的基于神经网络的方法，这些方法隐式地结合了两种不确定性来源。然而，这些方法并未告知收集更多样本是否有助于从观测数据中识别最佳行动，使专家对其数据收集策略一无所知。我们通过一种新颖的框架解决了这个问题，该框架能够区分可能通过收集更多样本消除的因果效应值范围与那些高概率无法通过更多观测样本消除的值范围。我们证明这种划分可以通过求解最大-最小和最小-最大优化问题获得。我们利用神经因果模型在实践中近似恢复这种分解。通过在合成和真实世界数据集上的实验，我们证明了我们的算法可以确定何时收集更多样本无助于确定最佳行动。我们的框架可以帮助从业者决定何时应诉诸非观测研究或寻求测量一些未测量的混杂因素以进行最优决策。

English abstract

Causal inference from observational data can provide strong evidence for finding the best action in a decision-making scenario without having to perform expensive randomized trials. The causal effect of an action is often not pointwise identifiable even with infinite data due to unobserved confounding factors. Furthermore, having only finitely many samples adds another layer of uncertainty to causal effect estimation. Several existing methods can be used to obtain upper and lower bounds to the causal effect, ranging from symbolic methods to the more recent neural network-based approaches, which implicitly incorporate both sources of uncertainty. However, these methods do not inform whether collecting more samples may or may not help identify the best action from observational data, leaving experts in the dark about their data collection strategies. We address this problem with a novel framework that can distinguish the range of causal effect values that might be eliminated by collecting more samples from the range of values that, with high probability, cannot be eliminated with more observational samples. We show that this partitioning can be obtained by solving max-min and min-max optimization problems. We leverage neural causal models to approximately recover this decomposition in practice. We demonstrate via experiments on synthetic and real-world datasets that our algorithm can determine when collecting more samples will not help determine the best action. Our framework can help practitioners decide when to resort to non-observational studies or seek to measure some of the unmeasured confounders for optimal decision-making.


2602.16015 2026-06-09 cs.LG version update

Geometry-Aware Uncertainty Quantification via Conformal Prediction on Manifolds

几何感知的不确定性量化：流形上的保形预测

Marzieh Amiri Shahbazi, Ali Baheri

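Affiliations: Rochester Institute of Technology

AI summary: Proposes an adaptive geodesic conformal prediction framework that combines geodesic distance with cross-validated local-difficulty normalization, achieving valid coverage and improved conditional coverage on spherical data and IGRF-14 geomagnetic field prediction.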
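To make the idea of geodesic conformal prediction on the sphere concrete, here is a minimal split-conformal sketch in which the nonconformity score is the great-circle distance between a predicted and an observed unit vector. It is an assumption-laden illustration: the function names are hypothetical, and the local-difficulty normalization mentioned in the summary is deliberately left out.

```python
import math


def great_circle_distance(p, q):
    """Geodesic distance (radians) between two unit vectors on the sphere."""
    dot = sum(a * b for a, b in zip(p, q))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot)))


def conformal_radius(calibration_scores, alpha):
    """Split-conformal quantile of calibration scores at miscoverage level alpha.

    Returns the radius of a geodesic ball around a new prediction. If the
    calibration set is too small for the requested level, the whole sphere
    (radius pi) is returned.
    """
    n = len(calibration_scores)
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        return math.pi
    return sorted(calibration_scores)[k - 1]
```

The prediction set for a new input is then the set of points within `conformal_radius(...)` of the model's prediction, measured along the sphere rather than through it.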

2604.12277 2026-06-09 cs.LG version update

Models Know Their Shortcuts: Deployment-Time Shortcut Mitigation

模型知晓其捷径：部署时的捷径缓解

Jiayi Li, Shijie Tang, Gün Kaynar, Shiyi Du, Carl Kingsford

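Affiliations: Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University

AI summary: Proposes mitigating shortcut learning in pretrained text encoders at deployment time using unsupervised gradient-based attribution, proves that deployment-time mitigation is information-theoretically upper-bounded by training-time mitigation, and reports substantial gains on sentiment classification, toxicity detection, and natural language inference.

Chinese abstract (AI translation)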

预训练文本编码器容易产生捷径学习，依赖于token-标签相关性，一旦在部署时分布偏移就会失效。现有捷径缓解方法主要在训练时操作，假设能获取训练数据、训练动态或捷径注释，这些在部署时难以获得，只有收敛的模型存在。我们证明该模型本身足以在部署时缓解捷径：一个偏置模型内部化了其学习捷径的信号，可通过无监督梯度归因捕捉。我们进一步证明部署时的缓解在信息理论上受训练时缓解的限制。尽管如此，利用这一梯度信号，我们提出的无监督部署时捷径缓解框架Shortcut Guardrail，通过恢复捷径分布偏移下的性能，在情感分类、毒性检测和自然语言推理中达到或超越训练时基线性能。

English abstract

Pretrained text encoders are prone to shortcut learning, relying on token-label correlations that fail once the distribution shifts in deployment. Existing shortcut mitigation methods mainly operate at training time and assume access to training data, training dynamics, or shortcut annotations, which are rarely available during deployment, where only the converged model remains. We show that this model alone suffices to mitigate shortcuts during deployment: a biased model internalizes a signal of its learned shortcuts that can be captured via unsupervised gradient-based attribution. We further prove that deployment-time mitigation is information-theoretically upper-bounded by training-time mitigation. Nevertheless, exploiting this gradient signal, our proposed unsupervised deployment-time shortcut mitigation framework for pretrained text encoders, Shortcut Guardrail, recovers substantial performance under shortcut distribution shift, matching or outperforming training-time baselines across sentiment classification, toxicity detection, and natural language inference.


2605.03058 2026-06-09 cs.LG cs.AI version update

Neuron-Anchored Rule Extraction for Large Language Models via Contrastive Hierarchical Ablation

基于对比分层消融的大语言模型神经元锚定规则提取

Francesco Sovrano, Gabriele Dominici, Marc Langheinrich

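Affiliations: Università della Svizzera italiana

AI summary: Proposes MechaRule, which grounds rule extraction in LLM circuits by localizing sparse agonist activations; adaptive group testing with confidence-guided pruning identifies the key neurons with high recall at a small fraction of exhaustive-ablation cost, validated on arithmetic and jailbreaking tasks.

Comments: Accepted for publication at KDD 2026

DOI: 10.1145/3770855.3818091

Chinese abstract (AI translation)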

可解释AI的一个核心目标是符号化地表达大语言模型（LLM）的决策逻辑，并将其锚定在内部机制中。现有的规则提取方法通常学习非锚定的符号代理，而机械可解释性将行为与神经元联系起来，但通常需要手工假设和昂贵的干预。我们提出MechaRule，一种通过定位稀疏激动剂激活（其消融会破坏规则相关行为）将规则提取锚定在LLM电路中的流程。MechaRule基于两个发现。首先，在固定的基线/翻转机制下，稀疏激动剂效应可能表现出“超越”：少数高效应的激活在较大组中仍可检测到，主导较弱效应，并翻转许多相同的示例。在这种机制下，使用置信引导的保守剪枝的自适应组测试，当k << N为激动剂时，需要对N个候选进行O(k log(N/k) + k)次干预。其次，在与接近忠实规则行为对齐的数据分割上，激动剂的定位更可靠；谱分割提供了无规则的备选方案，而不忠实的分割会降低定位效果。实验上，在算术和越狱任务中，MechaRule在匹配的暴力验证中召回97.0%的最高效应激动剂，平均仅消耗完全消融成本的2.14%。消融定位的激动剂消除了97.6–100.0%的合格正确算术答案和越狱，并可纠正算术错误或诱导越狱，分别高达72.8%和32.5%。

English abstract

A central goal of explainable AI is to express large language model (LLM) decision logic symbolically and ground it in internal mechanisms. Existing rule-extraction methods usually learn ungrounded symbolic surrogates, while mechanistic interpretability links behavior to neurons but often requires hand-crafted hypotheses and costly interventions. We introduce MechaRule, a pipeline that grounds rule extraction in LLM circuits by localizing sparse agonist activations whose ablation disrupts rule-related behavior. MechaRule rests on two findings. First, in a fixed baseline/flip regime, sparse agonist effects can exhibit overtopping: a few high-effect activations remain detectable within larger groups, dominate weaker ones, and flip many of the same examples. In such regimes, adaptive group testing with confidence-guided conservative pruning requires O(k log(N/k) + k) interventions over N candidates when k << N are agonists. Second, agonists are localized more reliably on data splits aligned with close-to-faithful rule behavior; spectral splits provide a rule-free fallback, whereas unfaithful splits degrade localization. Empirically, on arithmetic and jailbreaking, MechaRule recalls 97.0% of highest-effect agonists in matched brute-force validations at only 2.14% of exhaustive-ablation cost on average. Ablating the localized agonists eliminates 97.6-100.0% of eligible correct arithmetic answers and jailbreaks, and can correct arithmetic errors or induce jailbreaks by up to 72.8% and 32.5%, respectively.
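The intervention count quoted in the abstract is typical of adaptive group testing. The sketch below shows the plain binary-splitting variant under strong assumptions: a noiseless oracle `group_has_effect` (a hypothetical stand-in for "ablate this group and check whether the behavior flips") and no confidence-guided pruning. It is not MechaRule's procedure, only an illustration of why a few strong agonists can be found with far fewer than N interventions.

```python
def find_agonists(candidates, group_has_effect):
    """Locate all positives by adaptive binary splitting.

    candidates: list of candidate ids (e.g. neuron indices).
    group_has_effect: callable taking a list of ids and returning True if
        ablating the whole group changes the behavior (assumed noiseless).
    Returns (sorted list of positives, number of oracle calls).
    """
    found = []
    tests = 0
    stack = [list(candidates)]
    while stack:
        group = stack.pop()
        if not group:
            continue
        tests += 1
        if not group_has_effect(group):
            continue  # the whole group is pruned with a single intervention
        if len(group) == 1:
            found.append(group[0])
            continue
        mid = len(group) // 2
        stack.append(group[mid:])
        stack.append(group[:mid])
    return sorted(found), tests
```

For example, with 1024 candidates and 3 true agonists, this search needs at most 61 oracle calls (one for the root plus two per positive group on each of the ten splitting levels), compared with 1024 for exhaustive single-unit ablation.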


2605.03226 2026-06-09 cs.LG cs.AI cs.CR version update

超越独立操纵：具有同伴模仿的个体公平感知策略分类

Xinpeng Lv, Chunyuan Zheng, Yunxin Mao, Renzhe Xu, Jinxuan Yang, Yuanlong Chen, Wangrong Huang, Shaowu Yang, Wenjing Yang, Xinwang Liu, Peng Cui, Haotian Wang

Affiliations: College of Computer Science and Technology, National University of Defense Technology; School of Mathematical Sciences, Peking University; Institute for Theoretical Computer Science, Shanghai University of Finance and Economics; Information Technology Development, Aetos Capital Group, Sydney; Faculty of Computing, Harbin Institute of Technology; Department of Computer Science and Technology, Tsinghua University

AI summary: Proposes the individual fairness-aware strategic classification (IFSC) framework, which models peer-driven manipulation induced by individual fairness (imitating nearby accepted peers) and uses a robust learning procedure to handle uncertainty in peer observability, improving individual-fairness consistency and mitigating imitation-induced distortions.

Comments: Accepted by SIGKDD 2026

DOI: 10.1145/3770855.3817670

Chinese abstract (AI translation)

策略分类（SC）研究智能体操纵其特征以从预测模型获得有利决策的场景。现有的公平感知SC方法主要关注群体公平，并通常假设智能体独立响应。然而，当需要个体公平时，确保相似个体获得相似结果，智能体的操纵变得相互依赖：一个智能体偏好的操纵取决于邻域的结果。这导致了经典SC公式与公平感知决策设置之间的不匹配，其中独立模型不再准确刻画策略操纵。为解决此问题，我们引入了个体公平感知策略分类（IFSC），这是一个框架，对由个体公平引起的同伴驱动操纵进行建模，其中智能体模仿附近被积极决策的同伴以获得有利结果。IFSC将策略操纵刻画为对可见被接受同伴的基于相似性的模仿，并在由此产生的操纵后分布下学习分类器。为了考虑同伴可观测性的不确定性，IFSC采用鲁棒学习过程，在操纵模拟期间引入随机扰动。在合成和真实数据集上的实验表明，IFSC改善了个体公平一致性并减轻了模仿引起的扭曲。

English abstract

Strategic classification (SC) investigates scenarios where agents manipulate their features to obtain favorable decisions from predictive models. Existing fairness-aware SC approaches primarily focus on group fairness and typically assume that agents respond independently. However, when individual fairness is required (that is, similar individuals must receive similar outcomes), agents' manipulations become interdependent: an agent's preferred manipulation depends on the outcomes in its neighborhood. This induces a mismatch between classical SC formulations and fairness-aware decision settings, where independent models no longer accurately characterize strategic manipulations. To address this issue, we introduce individual fairness-aware strategic classification (IFSC), a framework that models peer-driven manipulation arising from individual fairness, where agents imitate nearby positively decided peers to obtain favorable outcomes. IFSC characterizes strategic manipulation as similarity-based imitation toward visible accepted peers and learns classifiers under the resulting post-manipulation distributions. To account for uncertainty in peer observability, IFSC employs a robust learning process that introduces stochastic perturbations during manipulation simulation. Experiments on synthetic and real-world datasets demonstrate that IFSC improves individual-fairness consistency and mitigates imitation-induced distortions.


2501.15509 2026-06-09 cs.CR cs.AI cs.LG version update

FIT-Print: Towards False-claim-resistant Model Ownership Verification via Targeted Fingerprint

FIT-Print：通过目标指纹实现抗虚假声明的模型所有权验证

Shuo Shao, Haozhe Zhu, Yiming Li, Hongwei Yao, Tianwei Zhang, Zhan Qin

Affiliations: State Key Laboratory of Blockchain and Data Security, Zhejiang University; Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security, Hangzhou; College of Computing and Data Science, Nanyang Technological University; Department of Computer Science, City University of Hong Kong

AI summary: To address the vulnerability of existing model fingerprints to false claim attacks, proposes FIT-Print, a targeted fingerprinting paradigm that uses optimization to turn the fingerprint into a verifiable targeted signature, and designs two black-box methods, reporting a 100% defense success rate and a 0% false-alarm rate.

Comments: This paper has been accepted by IEEE Transactions on Information Forensics and Security

Chinese abstract (AI translation)

模型指纹已成为保护开源模型知识产权的重要机制，提供了一种无需修改受保护模型的非侵入式方法。然而，我们的分析表明，现有指纹技术从根本上容易受到虚假声明攻击，即对手可以欺诈性地声称对独立的第三方模型拥有所有权。我们证明，这种脆弱性源于当前方法的非目标性，它们基于任意样本输出而非与特定预定义参考的对齐来评估模型相似性。为缓解此漏洞，我们引入了FIT-Print，一种主动对抗虚假声明攻击的目标指纹范式。具体来说，FIT-Print利用优化将指纹转化为可验证的目标签名。在此基础之上，我们提出了两种黑盒指纹方法：逐位的FIT-ModelDiff和逐列表的FIT-LIME，它们分别利用输出距离和特征归因作为鲁棒的模型签名。在基准模型和数据集上的广泛评估表明，我们的框架完美地中和了虚假声明攻击（100%防御成功率），消除了对独立模型的误报（0.0%），同时针对各种模型复用技术保持了100%的所有权验证率。

English abstract

Model fingerprinting has emerged as a crucial mechanism for safeguarding the intellectual property of open-source models, offering a non-intrusive approach that requires no modifications to the protected model. However, our analysis reveals that existing fingerprinting techniques are fundamentally vulnerable to false claim attacks, wherein adversaries can fraudulently assert ownership over independent third-party models. We demonstrate that this vulnerability stems from the untargeted nature of current methods, which evaluate model similarity based on arbitrary sample outputs rather than alignment with a specific, predefined reference. To mitigate this vulnerability, we introduce FIT-Print, a targeted fingerprinting paradigm that actively counters false claim attacks. Specifically, FIT-Print leverages optimization to transform the fingerprint into a verifiable, targeted signature. Building upon this foundation, we propose two black-box fingerprinting methods, the bit-wise FIT-ModelDiff and the list-wise FIT-LIME, which utilize output distances and feature attributions as robust model signatures, respectively. Extensive evaluations across benchmark models and datasets show that our framework perfectly neutralizes false claim attacks (100% defense success rate) and eliminates false alarms on independent models (0.0%), all while maintaining a 100% ownership verification rate against diverse model reuse techniques.


2505.11189 2026-06-09 cs.AI cs.LG version update

Can Global XAI Methods Reveal Injected Behaviours in LLMs? SHAP vs Rule Extraction vs RuleSHAP

全局XAI方法能否揭示LLM中的注入行为？SHAP vs 规则提取 vs RuleSHAP

Francesco Sovrano

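Affiliations: Collegium Helveticum at ETH Zurich; Università della Svizzera italiana

AI summary: Maps global LLM beliefs to numerical scores via statistically validated abstractions and proposes RuleSHAP, a rule-extraction algorithm that couples global SHAP with rule induction to better capture non-univariate triggers, improving MRR@1 over RuleFit by 82% on average.

Comments: Accepted for publication at KDD 2026

DOI: 10.1145/3770855.3818093

Chinese abstract (AI translation)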

大型语言模型（LLM）可能放大错误信息，破坏联合国可持续发展目标等社会目标。我们研究了三个有文献记载的错误信息驱动因素（效价框架、信息过载和过度简化），这些因素通常由默认信念塑造。基于LLM编码此类默认信念（例如，“快乐是积极的”、“数学是复杂的”）并可作为“启发式包”的证据，我们询问是否可以从黑盒LLM行为中恢复出错误信息相关行为背后的信念驱动启发式作为显式规则。一个关键障碍是可解释AI（XAI）中的全局规则提取方法是为数值输入输出数据设计的，而非文本。我们通过引出全局LLM信念并通过统计验证的抽象将其映射为数值分数来解决这一问题，从而使现成的全局XAI能够检测信念驱动的启发式。为了获得真实情况，我们通过系统指令向GPT系列和Llama模型注入复杂度递增的非线性行为触发因素（单变量、合取、非凸）。我们发现RuleFit经常遗漏非单变量触发因素，而全局SHAP在排名合取触发特征方面更好，但不产生符号规则。为了弥合这一差距，我们提出了RuleSHAP，一种将全局SHAP聚合与规则归纳相结合的规则提取算法，以更好地捕捉非单变量触发因素，平均MRR@1比RuleFit提升82%。我们的结果提示了一种揭示LLM中行为触发因素的实用途径。

English abstract

Large language models (LLMs) can amplify misinformation, undermining societal goals such as the UN SDGs. We study three documented drivers of misinformation (valence framing, information overload, and oversimplification) often shaped by default beliefs. Building on evidence that LLMs encode such defaults (e.g., "joy is positive", "math is complex") and can act as "bags of heuristics", we ask whether belief-driven heuristics behind misinformation-related behaviour can be recovered from black-box LLM behaviour as explicit rules. A key obstacle is that global rule-extraction methods in explainable AI (XAI) are built for numerical input-output data, not text. We address this by eliciting global LLM beliefs and mapping them to numerical scores via statistically validated abstractions, enabling off-the-shelf global XAI to detect belief-driven heuristics. For ground truth, we inject nonlinear behavioural triggers of increasing complexity (univariate, conjunctive, non-convex) into GPT-family and Llama models via system instructions. We find that RuleFit often misses non-univariate triggers, while global SHAP better ranks conjunctive trigger features but yields no symbolic rules. To bridge this gap, we propose RuleSHAP, a rule-extraction algorithm that couples global SHAP aggregates with rule induction to better capture non-univariate triggers, improving MRR@1 over RuleFit by +82% on average. Our results suggest a practical pathway for surfacing behavioural triggers in LLMs.


2510.16028 2026-06-09 cs.CR cs.AI cs.LG cs.SY eess.SY version update

TAO: Tolerance-Aware Optimistic Verification for Floating-Point Neural Networks

TAO：面向浮点神经网络的容忍感知乐观验证

Jianzhu Yao, Hongxu Su, Taobo Liao, Zerui Cheng, Huan Zhang, Xuechao Wang, Pramod Viswanath

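Affiliations: Princeton University; HKUST (GZ); University of Illinois Urbana-Champaign

AI summary: Proposes the TAO protocol, which verifies floating-point neural network outputs through operator-level tolerance regions and a Merkle-anchored dispute game, without relying on trusted hardware or deterministic kernels, at a reported overhead of 0.3%.

Comments: 18 pages, 8 figures

DOI: 10.1145/3767295.3803612
Journal ref: Proceedings of the 21st European Conference on Computer Systems, (2026) 1515-1532

Chinese abstract (AI translation)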

神经网络越来越多地在用户无法控制的硬件上运行（云GPU、推理市场）。然而，机器学习即服务很少透露实际运行的内容或返回的输出是否忠实反映预期输入。用户无法对服务降级（模型交换、量化、图重写或诸如修改广告嵌入等差异）进行追索。验证输出很困难，因为异构加速器上的浮点执行本质上是不确定的。现有方法要么对实际浮点神经网络不实用，要么重新引入供应商信任。我们提出TAO：一种容忍感知乐观验证协议，它接受在原则性算子级接受区域内的输出，而不是要求逐位相等。TAO结合了两种误差模型：（i）每个算子的IEEE-754最坏情况界限和（ii）跨硬件校准的紧密经验百分位分布。差异触发一个Merkle锚定的、阈值引导的争议游戏，该游戏递归地划分计算图，直到剩下一个算子，此时裁决简化为轻量级理论界限检查或针对经验阈值的小型诚实多数投票。未受挑战的结果在挑战窗口后最终确定，无需可信硬件或确定性内核。我们将TAO实现为PyTorch兼容运行时和当前部署在以太坊Holesky测试网上的合约层。运行时检测图、计算每个算子的界限，并在FP32中运行未经修改的供应商内核，开销可忽略（Qwen3-8B上为0.3%）。在A100、H100、RTX6000、RTX4090上的CNN、Transformer和扩散模型中，经验阈值比理论界限紧10^2-10^3倍，且考虑界限的对抗攻击成功率为0%。总之，TAO为现实世界的异构ML计算协调了可扩展性和可验证性。

English abstract

Neural networks increasingly run on hardware outside the user's control (cloud GPUs, inference marketplaces). Yet ML-as-a-Service reveals little about what actually ran or whether returned outputs faithfully reflect the intended inputs. Users lack recourse against service downgrades (model swaps, quantization, graph rewrites, or discrepancies like altered ad embeddings). Verifying outputs is hard because floating-point (FP) execution on heterogeneous accelerators is inherently nondeterministic. Existing approaches are either impractical for real FP neural networks or reintroduce vendor trust. We present TAO: a Tolerance-Aware Optimistic verification protocol that accepts outputs within principled operator-level acceptance regions rather than requiring bitwise equality. TAO combines two error models: (i) sound per-operator IEEE-754 worst-case bounds and (ii) tight empirical percentile profiles calibrated across hardware. Discrepancies trigger a Merkle-anchored, threshold-guided dispute game that recursively partitions the computation graph until one operator remains, where adjudication reduces to a lightweight theoretical-bound check or a small honest-majority vote against empirical thresholds. Unchallenged results finalize after a challenge window, without requiring trusted hardware or deterministic kernels. We implement TAO as a PyTorch-compatible runtime and a contract layer currently deployed on the Ethereum Holesky testnet. The runtime instruments graphs, computes per-operator bounds, and runs unmodified vendor kernels in FP32 with negligible overhead (0.3% on Qwen3-8B). Across CNNs, Transformers, and diffusion models on A100, H100, RTX6000, and RTX4090, empirical thresholds are $10^2$ to $10^3$ times tighter than theoretical bounds, and bound-aware adversarial attacks achieve 0% success. Together, TAO reconciles scalability with verifiability for real-world heterogeneous ML compute.
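To illustrate what a per-operator worst-case acceptance region can look like, the sketch below checks a claimed FP32 sum against the classical bound for recursive summation, $|\hat{s} - s| \le \gamma_{n-1} \sum_i |x_i|$ with $\gamma_k = k u / (1 - k u)$ and unit roundoff $u = 2^{-24}$. This is an assumed example for a single operator (a sum), not TAO's actual bound set; the helper names are hypothetical, and float32 arithmetic is emulated with the standard library.

```python
import math
import struct

U32 = 2.0 ** -24  # unit roundoff of IEEE-754 binary32 (round to nearest)


def to_f32(x):
    """Round a Python float to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_f32(xs):
    """Left-to-right summation with every partial sum rounded to binary32."""
    acc = 0.0
    for x in xs:
        acc = to_f32(acc + x)
    return acc


def summation_bound(xs):
    """Worst-case absolute error of any-order recursive FP32 summation."""
    k = len(xs) - 1  # number of rounded additions
    if k <= 0:
        return 0.0
    gamma = k * U32 / (1.0 - k * U32)
    return gamma * math.fsum(abs(x) for x in xs)


def accept(claimed, xs):
    """Accept a claimed sum if it lies inside the tolerance region."""
    reference = math.fsum(xs)  # correctly rounded double-precision reference
    return abs(claimed - reference) <= summation_bound(xs)
```

Two honest executors that add the same inputs in different orders generally return different bit patterns, yet both fall inside the region, while a result that is off by a large margin is rejected.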


2601.12263 2026-06-09 cs.CL cs.AI cs.LG version update

Multimodal Generative Engine Optimization: Rank Manipulation for Vision-Language Model Rankers

多模态生成式引擎优化：针对视觉-语言模型排序器的排名操纵

Yixuan Du, Chenxiao Yu, Haoyan Xu, Ziyi Wang, Yue Zhao, Xiyang Hu

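Affiliations: Georgetown University; University of Southern California; University of Maryland, College Park; Arizona State University

AI summary: Proposes Multimodal Generative Engine Optimization (MGEO), which jointly optimizes image perturbations and textual suffixes to exploit cross-modal knowledge coupling inside vision-language models, effectively manipulating product rankings and exposing the fragility of knowledge grounding in multimodal foundation models.

Comments: Proceedings of the 4th Workshop on Towards Knowledgeable Foundation Models (KnowFM) at ACL 2026

Chinese abstract (AI translation)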

视觉-语言模型（VLM）将视觉和文本知识整合到统一表示中，日益成为现代检索和推荐系统的基础。然而，这些模型在对多模态项目进行排序时如何可靠地利用其跨模态知识，以及其知识基础是否可以被颠覆，仍不清楚。在本文中，我们揭示了VLM在多模态产品排序中应用知识的一个基本漏洞：通过多模态生成式引擎优化（MGEO），我们展示了攻击者可以通过联合制作难以察觉的图像扰动和流畅的文本后缀，利用模型内部的跨模态知识耦合，操纵VLM的排序决策。MGEO采用交替优化策略，针对VLM中视觉和语言表示之间的深层交互，实现了远超单模态攻击和由强大商业模型驱动的启发式基线的排名操纵。我们的发现表明，表面内容质量不足以提升排名；相反，需要直接与模型内部知识利用机制对齐。这些结果对多模态基础模型中知识基础的忠实性和鲁棒性提出了重要问题，并激励了未来多模态检索系统防御机制的研究。代码见：https://github.com/glad-lab/MGEO

English abstract

Vision-Language Models (VLMs) integrate visual and textual knowledge into unified representations that increasingly underpin modern retrieval and recommendation systems. However, it remains unclear how reliably these models utilize their cross-modal knowledge when ranking multimodal items, and whether their knowledge grounding can be subverted. In this paper, we expose a fundamental vulnerability in how VLMs apply multimodal knowledge for product ranking: through Multimodal Generative Engine Optimization (MGEO), we show that an adversary can manipulate a VLM's ranking decisions by jointly crafting imperceptible image perturbations and fluent textual suffixes that exploit the model's internal cross-modal knowledge coupling. Using an alternating optimization strategy, MGEO targets the deep interactions between visual and linguistic representations within the VLM, achieving rank manipulations that substantially exceed those of unimodal attacks and heuristic baselines powered by strong commercial models. Our findings reveal that surface-level content quality is insufficient for rank promotion; instead, direct alignment with the model's internal knowledge utilization mechanism is required. These results raise important questions on the faithfulness and robustness of knowledge grounding in multimodal foundation models, and motivate future work on defense mechanisms for multimodal retrieval systems. Code is available at: https://github.com/glad-lab/MGEO


2602.16061 2026-06-09 stat.ML cs.LG econ.EM stat.ME version update

Partial Identification under Missing Data Using Weak Shadow Variables from Pretrained Models

利用预训练模型中的弱影子变量在缺失数据下的部分识别

Hongyu Chen, David Simchi-Levi, Ruoxuan Xiong

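Affiliations: Massachusetts Institute of Technology, Cambridge, MA 02139; Emory University, Atlanta, GA 30322

AI summary: To address estimation bias caused by data missing not at random (MNAR), proposes a partial identification framework that tightens bounds via linear programs incorporating predictions from pretrained models (such as LLMs) as weak shadow variables, and designs a set-expansion estimator that guarantees coverage; experiments show identification intervals shrinking by 75-83%.

Chinese abstract (AI translation)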

从用户反馈中估计总体量（如平均结果）是平台评估和社会科学的基础，但反馈通常非随机缺失（MNAR）：意见更强的用户更可能回应，因此标准估计量有偏，且在没有额外假设的情况下目标量不可识别。现有方法通常依赖强参数假设或实践中可能不可用的定制辅助变量。在本文中，我们开发了一个部分识别框架，其中通过求解一对线性规划获得目标量的尖锐边界，其约束编码了观测数据结构。该公式自然地将来自预训练模型（包括大型语言模型LLM）的结果预测作为额外的线性约束纳入，从而收紧可行集。我们将这些预测称为弱影子变量：它们满足关于缺失性的条件独立性假设，但不需要经典影子变量方法所需的完备性条件。当预测足够信息时，边界坍缩为点，将标准识别作为特例恢复。在有限样本中，为了提供对识别集的有效覆盖，我们提出了一种集合扩张估计器，在集合识别状态下达到慢于$\sqrt{n}$的收敛速度，在点识别下达到标准$\sqrt{n}$速度。在模拟和半合成实验（基于客服对话）中，我们发现LLM预测通常对经典影子变量方法条件不良，但在我们的框架中仍然非常有效。在现实的MNAR机制下，它们将识别区间缩小75-83%，同时保持有效覆盖。

English abstract

Estimating population quantities such as mean outcomes from user feedback is fundamental to platform evaluation and social science, yet feedback is often missing not at random (MNAR): users with stronger opinions are more likely to respond, so standard estimators are biased and the estimand is not identified without additional assumptions. Existing approaches typically rely on strong parametric assumptions or bespoke auxiliary variables that may be unavailable in practice. In this paper, we develop a partial identification framework in which sharp bounds on the estimand are obtained by solving a pair of linear programs whose constraints encode the observed data structure. This formulation naturally incorporates outcome predictions from pretrained models, including large language models (LLMs), as additional linear constraints that tighten the feasible set. We call these predictions weak shadow variables: they satisfy a conditional independence assumption with respect to missingness but need not meet the completeness conditions required by classical shadow-variable methods. When predictions are sufficiently informative, the bounds collapse to a point, recovering standard identification as a special case. In finite samples, to provide valid coverage of the identified set, we propose a set-expansion estimator that achieves a slower-than-$\sqrt{n}$ convergence rate in the set-identified regime and the standard $\sqrt{n}$ rate under point identification. In simulations and semi-synthetic experiments on customer-service dialogues, we find that LLM predictions are often ill-conditioned for classical shadow-variable methods yet remain highly effective in our framework. They shrink identification intervals by 75-83% while maintaining valid coverage under realistic MNAR mechanisms.
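As a baseline for what "partial identification" means here, the sketch below computes the classical worst-case bounds on a mean when some outcomes are missing and the outcome is known to lie in `[y_min, y_max]`. This is the solution of the trivial linear program with no shadow-variable constraints at all, so it is an assumption-free starting point that the paper's additional constraints would tighten; the function name is hypothetical and sampling uncertainty is ignored.

```python
def worst_case_mean_bounds(observed, n_missing, y_min, y_max):
    """Bounds on the population mean when n_missing outcomes are unobserved.

    observed: list of observed outcomes, each within [y_min, y_max].
    n_missing: number of units whose outcome is missing (mechanism unknown).
    Returns (lower, upper): every missing value is set to y_min for the
    lower bound and to y_max for the upper bound.
    """
    n = len(observed) + n_missing
    if n == 0:
        raise ValueError("no units")
    total_observed = sum(observed)
    lower = (total_observed + n_missing * y_min) / n
    upper = (total_observed + n_missing * y_max) / n
    return lower, upper
```

For instance, with four observed ratings `[5, 4, 5, 3]` on a 1-to-5 scale and four nonrespondents, the mean is only known to lie in `[2.625, 4.625]`; the interval width equals the missing fraction times the outcome range.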


2602.16346 2026-06-09 cs.CL cs.LG version update

Helpful to a Fault: Measuring Illicit Assistance in Multi-Turn, Multilingual LLM Agents

乐于助人过了头：测量多轮、多语言LLM智能体中的非法协助

Nivya Talokar, Ayush K Tarun, Murari Mandal, Maksym Andriushchenko, Antoine Bosselut

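Affiliations: EPFL; independent researcher; University of Tübingen

AI summary: Proposes STING, a framework for evaluating how far multi-turn, multilingual LLM agents assist with illicit tasks; it finds that attack success does not consistently increase in lower-resource languages and offers a practical stress-testing method for realistic deployments.

Comments: Accepted at ICML 2026

Chinese abstract (AI translation)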

基于LLM的智能体借助工具和记忆执行现实世界的工作流。这些能力也使恶意对手得以利用这些智能体实施复杂的滥用场景。现有的智能体滥用基准大多只测试单提示指令，未能衡量智能体在多轮交互中如何逐步协助完成有害或非法任务。我们引入STING（Sequential Testing of Illicit N-step Goal execution，非法N步目标执行的序列测试），这是一个自动化红队框架：它构建以良性角色为依托的分步非法计划，并通过自适应追问迭代探测目标智能体，同时使用评判智能体跟踪各阶段的完成情况。我们进一步引入一个分析框架，将多轮红队测试建模为“首次越狱时间”随机变量，从而可以使用发现曲线、按攻击语言划分的风险比归因等分析工具，以及一个新指标：受限平均越狱发现（Restricted Mean Jailbreak Discovery）。在AgentHarm场景中，STING取得的非法任务完成率显著高于单轮提示以及适配到工具使用智能体的面向聊天的多轮基线。在覆盖六种非英语设置的多语言评估中，我们发现攻击成功率和非法任务完成率在低资源语言中并未一致上升，这与常见的聊天机器人研究结论不同。总体而言，STING为在现实部署环境中评估和压力测试智能体滥用提供了一种实用方法，而这类环境中的交互本质上是多轮的，且往往是多语言的。

English abstract

LLM-based agents execute real-world workflows via tools and memory. These affordances enable ill-intended adversaries to also use these agents to carry out complex misuse scenarios. Existing agent misuse benchmarks largely test single-prompt instructions, leaving a gap in measuring how agents end up helping with harmful or illegal tasks over multiple turns. We introduce STING (Sequential Testing of Illicit N-step Goal execution), an automated red-teaming framework that constructs a step-by-step illicit plan grounded in a benign persona and iteratively probes a target agent with adaptive follow-ups, using judge agents to track phase completion. We further introduce an analysis framework that models multi-turn red-teaming as a time-to-first-jailbreak random variable, enabling analysis tools like discovery curves, hazard-ratio attribution by attack language, and a new metric: Restricted Mean Jailbreak Discovery. Across AgentHarm scenarios, STING yields substantially higher illicit-task completion than single-turn prompting and chat-oriented multi-turn baselines adapted to tool-using agents. In multilingual evaluations across six non-English settings, we find that attack success and illicit-task completion do not consistently increase in lower-resource languages, diverging from common chatbot findings. Overall, STING provides a practical way to evaluate and stress-test agent misuse in realistic deployment settings, where interactions are inherently multi-turn and often multilingual.
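Treating the turn of the first jailbreak as a time-to-event variable invites standard survival-analysis tooling. The sketch below computes a Kaplan-Meier style discovery curve, i.e. an estimate of the probability that a jailbreak has occurred by turn t, where conversations that end without a jailbreak count as censored. It is only an assumed illustration of a discovery curve; the abstract does not define the Restricted Mean Jailbreak Discovery metric, and no attempt is made to reproduce it here.

```python
def discovery_curve(turns, jailbroken, horizon):
    """Kaplan-Meier estimate of P(first jailbreak <= t) for t = 1..horizon.

    turns[i]: turn of the first jailbreak if jailbroken[i] is True,
        otherwise the last turn observed for conversation i (censored).
    jailbroken[i]: whether conversation i ended in a jailbreak.
    """
    survival = 1.0  # estimated probability of no jailbreak so far
    curve = []
    for t in range(1, horizon + 1):
        at_risk = sum(1 for u in turns if u >= t)
        events = sum(1 for u, hit in zip(turns, jailbroken) if u == t and hit)
        if at_risk > 0:
            survival *= 1.0 - events / at_risk
        curve.append(1.0 - survival)
    return curve
```

Curves computed separately per attack language can then be compared turn by turn, which is the kind of comparison a hazard-ratio analysis formalizes.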


2603.07445 2026-06-09 cs.CL cs.LG version update

Few Tokens, Big Leverage: Preserving Safety Alignment by Constraining Safety Tokens during Fine-tuning

少令牌，大杠杆：在微调期间通过约束安全令牌保持安全对齐

Guoli Wang, Haonan Shi, Tu Ouyang, An Wang

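Affiliations: Case Western Reserve University

AI summary: Proposes the PACT framework, which prevents fine-tuning-induced safety-alignment drift by constraining the model's confidence on safety-related tokens while preserving downstream task performance.

Comments: Accepted to KDD 2026

DOI: 10.1145/3770855.3817837

Chinese abstract (AI translation)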

大型语言模型（LLMs）通常需要微调（FT）才能在下游任务上表现良好，但即使训练数据集仅包含良性数据，FT也可能导致安全对齐漂移。先前的研究表明，引入少量有害数据会显著损害LLM的拒绝行为，导致LLM顺从有害请求。现有的防御方法通常依赖于模型范围的干预，例如限制哪些参数更新或注入额外的安全数据，这可能会限制通用性并降低下游任务性能。为了解决这些限制，我们提出了一种名为PACT（通过约束令牌保持安全对齐）的微调框架，该框架稳定了模型在安全令牌上的置信度。我们的方法基于经验观察：安全对齐行为反映在模型的令牌级输出置信度中，并且通常集中在少量安全相关令牌上。在下游微调期间，我们正则化微调模型，使其在每一步响应中与对齐参考模型在安全相关令牌上的置信度匹配，同时允许非安全令牌基本不受约束以实现有效的任务适应。这种有针对性的约束防止了对齐漂移，而无需施加通常以牺牲模型效用为代价的全局限制。我们的代码可在 https://github.com/Glresearch1/PACT 获取。

English abstract

Large language models (LLMs) often require fine-tuning (FT) to perform well on downstream tasks, but FT can induce safety-alignment drift even when the training dataset contains only benign data. Prior work shows that introducing a small fraction of harmful data can substantially compromise LLM refusal behavior, causing LLMs to comply with harmful requests. Existing defense methods often rely on model-wide interventions, such as restricting which parameters are updated or injecting additional safety data, which can limit generality and degrade downstream task performance. To address these limitations, we propose a fine-tuning framework called Preserving Safety Alignment via Constrained Tokens (PACT), which stabilizes the model's confidence on safety tokens. Our approach is motivated by the empirical observation that safety-aligned behavior is reflected in the model's token-level output confidence and is often concentrated on a small subset of safety-related tokens. During downstream fine-tuning, we regularize the fine-tuned model to match the aligned reference model's confidence on safety-related tokens at each response step, while leaving non-safety tokens largely unconstrained to allow effective task adaptation. This targeted constraint prevents alignment drift without imposing global restrictions that typically trade off with model utility. Our code is available at https://github.com/Glresearch1/PACT.


2604.17249 2026-06-09 cs.CR cs.AR cs.LG version update

Bit-Flip Vulnerability of Shared KV-Cache Blocks in LLM Serving Systems

LLM服务系统中共享KV缓存块的位翻转漏洞

Yuji Yamamoto, Satoshi Matsuura

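Affiliations: Institute of Science Tokyo

AI summary: Exposes a bit-flip vulnerability of shared KV-cache blocks in LLM serving systems, characterized by silent divergence, selective propagation, and persistent accumulation, and proposes a checksum-based countermeasure that bounds cumulative damage.

Comments: 12 pages, 4 figures. Accepted at SECRYPT 2026 (23rd International Conference on Security and Cryptography). Conference: https://secrypt.scitevents.org/

Chinese abstract (AI translation)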

针对GPU DRAM的Rowhammer攻击已能在模型权重中造成对抗性位翻转；LLM服务系统中的共享KV缓存块是一个类似但此前未被研究的目标。在vLLM的前缀缓存（Prefix Caching）中，这些块仅以单一物理副本存在，且没有完整性保护。我们在理想的位定位条件下通过软件故障注入刻画了最坏情况下的严重程度，并识别出三个特性：（1）静默分歧：16个BF16位位置中有13个会产生连贯但已被改变的输出，在没有干净基线的情况下无法与合法响应区分；（2）选择性传播：只有共享目标前缀的请求受到影响；（3）持久累积：不存在时间衰减，因此累积损害随后续请求线性增长。这些特性共同构成了不同于权重篡改的威胁特征：静默分歧和选择性传播使攻击得以逃避检测；持久累积随后不受约束地持续进行，损害的放大程度仅受该块在缓存中驻留时间的限制。基于校验和的防护措施可在调度时检测到任何单比特损坏，将累积损害限制在一个批次之内，与块的缓存寿命无关，且开销可忽略。这些结果表明，应当在端到端利用被实际演示之前就为前缀块提供完整性保护。

English abstract

Rowhammer on GPU DRAM has enabled adversarial bit flips in model weights; shared KV-cache blocks in LLM serving systems present an analogous but previously unexamined target. In vLLM's Prefix Caching, these blocks exist as a single physical copy without integrity protection. Using software fault injection under ideal bit targeting, we characterize worst-case severity and identify three properties: (1) Silent divergence: 13 of 16 BF16 bit positions produce coherent but altered outputs, indistinguishable from legitimate responses without a clean baseline. (2) Selective propagation: only requests sharing the targeted prefix are affected. (3) Persistent accumulation: no temporal decay occurs, so cumulative damage grows linearly with subsequent requests. Together, these constitute a threat profile distinct from weight corruption: silent divergence and selective propagation enable detection evasion; persistent accumulation then proceeds unchecked, yielding damage amplification bounded only by how long the block remains cached. A checksum-based countermeasure detects any single-bit corruption at scheduling time, bounding cumulative damage to one batch independent of the block's cache lifetime, with negligible overhead. These results argue for integrity protection of prefix blocks before end-to-end exploitation is demonstrated.
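The following sketch shows, under simplifying assumptions, what a single bit flip does to a BF16 value and how a per-block checksum exposes it. BF16 is modeled as the upper 16 bits of an IEEE-754 binary32 value (by truncation), and CRC32 is chosen here only for illustration: the abstract does not say which checksum the authors use, and the helper names are hypothetical.

```python
import struct
import zlib


def f32_to_bf16_bits(x):
    """Return the 16-bit BF16 pattern of x (upper half of binary32, truncated)."""
    bits32 = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits32 >> 16


def bf16_bits_to_f32(bits16):
    """Decode a 16-bit BF16 pattern back to a Python float."""
    return struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))[0]


def flip_bit(bits16, position):
    """Flip one bit (0 = least significant mantissa bit, 15 = sign bit)."""
    return bits16 ^ (1 << position)


def block_checksum(block_bits):
    """CRC32 over a cache block stored as a list of BF16 bit patterns."""
    payload = b"".join(struct.pack("<H", b) for b in block_bits)
    return zlib.crc32(payload)
```

Flipping mantissa bit 3 of 0.5 yields 0.53125, a plausible-looking value that would not stand out on inspection, whereas comparing the block's CRC32 against the value recorded when the block was written reveals the change before the block is reused.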

2604.25965 2026-06-09 stat.ML cs.LG 版本更新

Adversarial Robustness of NTK Neural Networks

NTK神经网络的对抗鲁棒性

Yuxuan Hou

Affiliations: Qiuzhen College, Tsinghua University; Yau Mathematical Sciences Center, Tsinghua University

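AI summary: This paper studies the adversarial robustness of NTK neural networks in nonparametric regression. It derives the minimax optimal rate for adversarial regression in Sobolev spaces and proves that NTK networks trained by gradient flow with early stopping attain this rate, whereas in the overfitting regime the minimum-norm interpolant is vulnerable to adversarial perturbations.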

2605.19228 2026-06-09 cs.CL cs.AI cs.IT cs.LG math.IT 版本更新

Diagnosing Multi-step Reasoning Failures in Black-box LLMs via Stepwise Confidence Attribution

通过分步置信度归因诊断黑盒大语言模型的多步推理失败

Xiaoou Liu, Tiejin Chen, Dengjia Zhang, Yaqing Wang, Lu Cheng, Hua Wei

发表机构 * University of Science and Technology of China（中国科学技术大学）

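AI summary: This paper proposes Stepwise Confidence Attribution (SCA) for diagnosing multi-step reasoning failures in black-box LLMs. It scores the confidence of generated reasoning traces using the Information Bottleneck principle and validates the method on mathematical reasoning and multi-hop question answering.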

Comments Accepted by ICML 2026

AI中文摘要

大型语言模型通过生成分步解决方案，在具有客观答案的推理任务上取得了强大的性能，但诊断多步推理轨迹可能在何处失败仍然困难。置信度估计提供了一种诊断信号，但现有方法要么仅限于最终答案，要么需要访问模型内部。在本文中，我们引入了分步置信度归因（SCA），一种适用于闭源LLM的框架，仅基于生成的推理轨迹分配步骤级置信度。SCA应用信息瓶颈（IB）原理：与正确解答中的共识结构一致的步骤获得高置信度，而偏离则被标记为可能错误。我们提出了两种互补的方法：（1）NIBS，一种非参数IB方法，无需图结构即可度量一致性；（2）GIBS，一种基于图的IB模型，通过可微分掩码学习子图以捕捉逻辑上的多样性。在数学推理和多跳问答任务上的大量实验表明，SCA能够可靠地识别与推理错误高度相关的低置信度步骤。此外，与答案级反馈相比，使用步骤级置信度指导自我修正最多可将修正成功率提高13.5%。

英文摘要

Large Language Models have achieved strong performance on reasoning tasks with objective answers by generating step-by-step solutions, but diagnosing where a multi-step reasoning trace might fail remains difficult. Confidence estimation offers a diagnostic signal, yet existing methods are restricted to final answers or require internal model access. In this paper, we introduce Stepwise Confidence Attribution (SCA), a framework for closed-source LLMs that assigns step-level confidence based only on generated reasoning traces. SCA applies the Information Bottleneck principle: steps aligning with consensus structures across correct solutions receive high confidence, while deviations are flagged as potentially erroneous. We propose two complementary methods: (1) NIBS, a non-parametric IB approach measuring consistency without graph structures, and (2) GIBS, a graph-based IB model that learns subgraphs through a differentiable mask to capture logical variability. Extensive experiments on mathematical reasoning and multi-hop question answering show that SCA reliably identifies low-confidence steps strongly correlated with reasoning errors. Moreover, using step-level confidence to guide self-correction improves the correction success rate by up to 13.5% over answer-level feedback.

2606.00419 2026-06-09 stat.ML cs.LG 版本更新

Parameter-Free and Group Conditional Online Conformal Prediction

无参数和组条件在线共形预测

Beepul Bharti, Ambar Pal, Jacopo Teneggi, Jeremias Sulam

发表机构 * Data Science and AI Institute (DSAI), Johns Hopkins University（数据科学与人工智能研究院（DSAI），约翰霍普金斯大学）； Mathematical Institute for Data Science (MINDS), Johns Hopkins University（数据科学数学研究院（MINDS），约翰霍普金斯大学）； Department of Biomedical Engineering, Johns Hopkins University（生物医学工程系，约翰霍普金斯大学）； Department of Computer Science, Johns Hopkins University（计算机科学系，约翰霍普金斯大学）； Amazon Responsible AI（亚马逊负责任人工智能）

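AI summary: Proposes a parameter-free algorithm for group-conditional online conformal prediction that guarantees group-conditional coverage without tuning, validated on synthetic and real-world data.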

AI中文摘要

不确定性量化对于机器学习预测器在数据分布随时间变化（即数据可能不可交换）的真实场景中的部署至关重要。在线共形预测方法解决了这个问题，但代价是（i）组间误差控制或（ii）与学习率无关的实现。组条件覆盖对于跨不同数据点集合的公平性以及提供更精细的不确定性量化保证至关重要。无参数优化对于对抗对抗性和未知数据偏移的鲁棒性至关重要。我们提出了一种用于组条件在线共形预测的无参数算法，并证明它实现了最佳的组条件覆盖保证。我们在合成和真实数据上评估了我们的算法，表明我们的方法不仅提高了现有无参数在线共形预测方法的可靠性，而且提供了与调优良好的组条件方法大小相当的预测区间。通过将组条件覆盖与无参数在线算法统一，我们的工作为变化环境中公平且鲁棒的不确定性量化奠定了基础。

英文摘要

Uncertainty quantification (UQ) is critical for the deployment of machine learning predictors in real-world scenarios where the data distribution may shift over time (i.e., data may not be exchangeable). Online conformal prediction (OCP) methods address this issue at the expense of either (i) group-wise error control or (ii) learning-rate independent implementation. Group-conditional coverage is essential for fairness across different collections of data points and for providing finer UQ guarantees. Parameter-free optimization is crucial for robustness to adversarial and unknown data shifts. We propose a parameter-free algorithm for group-conditional OCP and demonstrate that it achieves the best group-conditional coverage guarantees. We evaluate our algorithm on synthetic and real-world data, demonstrating that our method not only improves the reliability of existing parameter-free OCP methods but also provides prediction intervals that are comparable in size to well-tuned group-conditional approaches. By unifying group-conditional coverage with parameter-free online algorithms, our work lays a foundation for fair and robust uncertainty quantification in shifting environments.
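
For context, the learning-rate-dependent style of online conformal prediction that the abstract contrasts against can be sketched as follows. This is not the paper's parameter-free algorithm: it is the familiar fixed-step threshold update, applied separately to each group, under the simplifying assumptions that groups are disjoint and scores lie in [0, 1]. The function name `online_thresholds` and the synthetic stream are hypothetical.

```python
import random

def online_thresholds(stream, alpha=0.1, lr=0.05, q0=0.5):
    """Track one conformal threshold per group with a fixed learning rate.

    stream yields (group, score) pairs; a miss is a score above the current
    threshold of its group. Returns the empirical miss rate of each group.
    """
    q, misses, counts = {}, {}, {}
    for group, score in stream:
        qg = q.get(group, q0)
        err = 1.0 if score > qg else 0.0
        q[group] = qg + lr * (err - alpha)  # raise after a miss, lower otherwise
        misses[group] = misses.get(group, 0.0) + err
        counts[group] = counts.get(group, 0) + 1
    return {g: misses[g] / counts[g] for g in counts}

rng = random.Random(0)
# Two hypothetical groups: scores of "a" drift upward over time, "b" is skewed low.
stream = [("a", rng.random() * (0.5 + 0.5 * t / 10000)) if t % 2 == 0
          else ("b", rng.random() ** 2) for t in range(10000)]
rates = online_thresholds(stream)
print(rates)
```

With bounded scores, each group's long-run miss rate stays within roughly 1/(lr * T) of `alpha`, but the quality of the intervals depends on the choice of `lr`, which is the tuning burden a parameter-free method aims to remove.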

2606.07598 2026-06-09 cs.LG cs.AI 新提交

A Topological Characterization of Graph Neural Networks via Stochastic Block Model Embeddings on the n-Sphere

图神经网络的拓扑特征化：通过n-球面上的随机块模型嵌入

Gopal Anantharaman

发表机构 * KnotTheory.ai Inc.（KnotTheory.ai 公司）； Dept. of Mathematics, Emporia State University（恩波利亚州立大学数学系）

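AI summary: Proposes a topological framework that maps the stochastic block models induced by message passing neural networks onto the unit n-sphere, for comparing trained graph neural networks and retrieving transfer-learning candidates without retraining.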

AI中文摘要

我们提出一个拓扑框架，用于比较训练后的图神经网络（GNN）：将消息传递神经网络（MPNN）在图极限信号（graphon-signal）空间上诱导的随机块模型（SBM）映射到单位$n$-球面$S^{n-1}\subset\mathbb{R}^n$上。该构造基于三个经典支柱：割距离下图极限空间的紧性（Lovász and Szegedy, 2006; Lovász, 2012），Frieze-Kannan弱正则引理及其由Levie（2023）给出的图极限信号推广，以及MPNN关于割距离的Lipschitz连续性。我们证明，对于任意给定的容差$\varepsilon>0$，作用于足够大的图的训练后MPNN $\Phi$可以（在$\varepsilon$误差内）通过一个复杂度有界的阶梯图极限信号来分解，并且我们构造了一个显式的保测映射$\Psi_n\colon[0,1]\to S^{n-1}$，将SBM区域放置在互不相交的球冠上。这为训练后的GNN产生了一个与问题无关的低维“指纹”，便于可视化检查和跨模型库的最近邻搜索，从而实现无需重新训练的迁移学习候选检索。我们讨论了高维中测度集中现象带来的障碍，这一现象与大语言模型规模的嵌入直接相关。最后，我们提出五个具体的未来研究方向：用双曲和格拉斯曼流形替代球面模型，以图极限信号上的Gromov-Wasserstein距离作为$n$-球面映射的免等距替代方案，SBM流形的信息几何（Fisher）重新表述，逐层嵌入点云的持续同调指纹，以及由图极限特征分解导出的谱距离基线。

英文摘要

We propose a topological framework for comparing trained Graph Neural Networks (GNNs) by mapping the Stochastic Block Models (SBMs) induced on the graphon-signal space of a Message Passing Neural Network (MPNN) onto the unit $n$-sphere $S^{n-1}\subset\mathbb{R}^n$. The construction rests on three classical pillars: the compactness of the graphon space under the cut distance (Lovász and Szegedy, 2006; Lovász, 2012), the Frieze-Kannan weak regularity lemma together with its graphon-signal extension due to Levie (2023), and the Lipschitz continuity of MPNNs with respect to the cut distance. We show that, for any prescribed tolerance $\varepsilon>0$, a trained MPNN $\Phi$ acting on a sufficiently large graph factors (up to $\varepsilon$) through a step-graphon-signal of bounded complexity, and we construct an explicit measure-preserving map $\Psi_n\colon[0,1]\to S^{n-1}$ that places the SBM regions on disjoint spherical caps. This produces a problem-agnostic, low-dimensional "fingerprint" of a trained GNN that is amenable to visual inspection and to nearest-neighbour search across model zoos, enabling transfer-learning candidate retrieval without retraining. We discuss the obstruction posed by concentration of measure in high dimension, a phenomenon directly relevant to LLM-scale embeddings. We close with five concrete future research directions: hyperbolic and Grassmannian alternatives to the spherical model, Gromov-Wasserstein distances on graphon-signals as an isometry-free alternative to the $n$-sphere map, an information-geometric (Fisher) reformulation of the SBM manifold, persistent-homology fingerprints of layer-wise embedding clouds, and a spectral-distance baseline derived from the graphon eigendecomposition.

2606.07619 2026-06-09 cs.LG math.GR 新提交

Graph Neural Networks for Predicting Solvability of Finite Groups

用于预测有限群可解性的图神经网络

Tal Weissblat

Affiliations: The Institute of Agricultural and Biosystems Engineering, Agricultural Research Organization - Volcani Institute

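AI summary: Proposes a graph neural network framework that uses graph representations such as Cayley graphs to distinguish solvable from non-solvable groups from structural information alone, probing how well GNNs learn group-theoretic algebraic properties.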

Comments 7 pages, 3 tables

2606.08067 2026-06-09 cs.LG 新提交

Beyond Homophily: Towards Generalized Graph Reconstruction Attack and Defense

超越同质性：迈向广义图重构攻击与防御

Zhanke Zhou, Bo Han, Xuan Li, Jiangchao Yao, Sanmi Koyejo, Michael K. Ng

发表机构 * Hong Kong Baptist University（香港浸会大学）； Shanghai Jiao Tong University（上海交通大学）； Stanford University（斯坦福大学）

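AI summary: To address the leakage of training-graph adjacency by graph neural networks, proposes the Markov-chain-approximation-based attack MC-GRA(+) and defense MC-GPB(+); on homophilic and heterophilic graph benchmarks the attack improves reconstruction fidelity over prior methods and the defense reduces reconstruction success with only minor accuracy loss.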

AI中文摘要

图神经网络（GNN）广泛部署于关系数据上，但它们可能泄露关于训练图邻接的敏感或专有信息，例如社交关系、交易和交互。本文研究图重构攻击（GRA），这是一种模型反演形式，从训练好的GNN中重构训练邻接，给定不同级别的攻击方信息。我们首先系统地表征了邻接何时以及为何通过特征、标签、嵌入和预测变得可恢复，其中泄漏由图的同质性、异质性和模型的归纳偏差调节。受这些发现启发，我们通过马尔可夫链近似视角审视GNN推理，将分层前向计算视为一个拓扑依赖表示的链。基于此视角，我们开发了互补的攻击和防御方法。在攻击方面，我们提出MC-GRA(+)，通过优化一个替代邻接来重构邻接，该替代邻接的GNN诱导表示在各层与目标模型的表示对齐。在防御方面，我们提出MC-GPB(+)，在整个表示链中抑制邻接依赖的信息，同时旨在在隐私-效用权衡下保持分类准确性。在同质/异质图基准和GNN上的实验表明，我们的攻击比先前方法提高了重构保真度，而我们的防御仅以轻微精度损失降低了重构成功率。

英文摘要

Graph neural networks (GNNs) are widely deployed on relational data, yet they can leak sensitive or proprietary information about the training graph adjacency, e.g., social ties, transactions, and interactions. This work studies graph reconstruction attacks (GRA), a form of model inversion that reconstructs the training adjacency from a trained GNN, given different levels of attacker-side information. We first provide a systematic characterization of when and why adjacency becomes recoverable through features, labels, embeddings, and predictions, with leakage modulated by graph homophily, heterophily, and the model's inductive bias. Motivated by these findings, we view GNN inference through a Markov chain approximation lens, treating the layered forward computation as a chain of topology-dependent representations. Building on this view, we develop complementary attack and defense methods. On the attack side, we propose MC-GRA (+), which reconstructs the adjacency by optimizing a surrogate adjacency whose GNN-induced representations align with those of the target model at each layer. On the defense side, we propose MC-GPB (+), which suppresses adjacency-dependent information throughout the representation chain while aiming to preserve classification accuracy under a privacy-utility trade-off. Experiments across homophilic/heterophilic graph benchmarks and GNNs show that our attacks improve reconstruction fidelity over prior methods, while our defenses reduce reconstruction success with only minor accuracy loss.

2606.08287 2026-06-09 cs.LG cond-mat.mtrl-sci cs.CE 新提交

Mesh Graph Neural Network Framework for Accelerating Finite Element Simulation for Arbitrary Geometries

网格图神经网络框架加速任意几何形状的有限元仿真

Josiah D. Kunz, Kamal Choudhary

发表机构 * University of California, Berkeley（加州大学伯克利分校）； Stanford University（斯坦福大学）

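AI summary: Proposes a mesh graph network (MGN) that predicts von Mises stress fields in 2D structures with arbitrary hole geometries. Encoding node types, relative edge features, and global features makes the model translation- and rotation-invariant; in the most favorable case it reaches R² ≥ 0.97 on an unseen geometry and unseen load, outperforming conventional models.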

Comments 10 pages, 6 figures, to be published. Code available at https://github.com/Josiah-Kunz/MGN-Public

AI中文摘要

有限元分析（FEA）对于结构设计至关重要，但在评估多个设计迭代或载荷场景时计算成本高昂。机器学习代理模型提供了一种有前景的替代方案，但大多数方法在跨不同几何形状的泛化方面存在关键局限性。本文提出一种网格图网络（MGN），用于预测具有任意孔洞几何的二维结构部件中的von Mises应力场。与使用绝对节点坐标作为特征的传统机器学习方法不同，该模型基于现有的MGN框架，编码节点类型（例如固定边界、自由表面、孔洞边缘）、相对边特征（邻居之间的距离）和全局特征（施加的载荷）。这种架构本质上是平移和旋转不变的，使得无需重新训练即可泛化到未见过的几何形状。MGN在11种板几何形状和20种载荷条件下训练，并在7种未见几何形状和3种未见载荷下评估。在最有利的情况下，模型在未见几何和未见载荷上达到$R^2 \geq 0.97$，而传统模型（随机森林、梯度提升、K近邻）在相同数据上训练的$R^2$约为$0.01$--$0.86$。然而，即使在不太有利的情况下，MGN模型仍然优于传统模型。本文将Pfaff等人（arXiv:2010.03409）的基于网格的仿真框架扩展到结构力学，证明了图神经网络可以作为跨不同几何形状的有限元分析的高效代理。

英文摘要

Finite element analysis (FEA) is essential for structural design but remains computationally expensive, particularly when evaluating multiple design iterations or load scenarios. Machine learning surrogate models offer a promising alternative, yet most approaches struggle with a critical limitation: generalizing across varying geometries. This work presents a mesh graph network (MGN) for predicting von Mises stress fields in 2D structural components with arbitrary hole geometries. Unlike traditional machine learning approaches that use absolute node coordinates as features, the proposed model builds on existing MGN frameworks that encode node types (e.g., fixed boundary, free surface, hole edge), relative edge features (distance between neighbors), and global features (applied load). This architecture is inherently translation- and rotation-invariant, enabling generalization to unseen geometries without retraining. The MGN was trained on 11 plate geometries under 20 load conditions and evaluated on 7 unseen geometries and 3 unseen loads. In the most favorable case, the model achieves $R^2 \geq 0.97$ on an unseen geometry and unseen load, compared to $R^2 \approx 0.01$--$0.86$ for conventional models (Random Forest, Gradient Boosting, K-Nearest Neighbors) trained on identical data. However, even in less favorable cases, the MGN model still outperforms conventional models. This work extends the mesh-based simulation framework of Pfaff et al. (arXiv:2010.03409) to structural mechanics, demonstrating that graph neural networks can serve as efficient surrogates for finite element analysis across varying geometries.
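
The invariance claim can be checked on a toy mesh. The sketch below computes the relative edge feature named in the abstract (distance between neighboring nodes) and confirms that it is unchanged when the whole mesh is rotated and translated. The helper names and the four-node mesh are hypothetical; the paper's full feature set (node types, global load) and its network are not reproduced here.

```python
import math

def edge_lengths(nodes, edges):
    """Relative edge feature: Euclidean distance between the two endpoint nodes."""
    return [math.dist(nodes[i], nodes[j]) for i, j in edges]

def rigid_transform(nodes, angle, shift):
    """Rotate 2D points by `angle` (radians) about the origin, then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1]) for x, y in nodes]

# Unit square with one diagonal, standing in for a small mesh patch.
nodes = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
moved = rigid_transform(nodes, angle=0.7, shift=(5.0, -3.0))

before = edge_lengths(nodes, edges)
after = edge_lengths(moved, edges)
print(all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(before, after)))
```

Absolute coordinates, by contrast, change under the same transform, which is why a model fed raw coordinates has to relearn each pose of the same geometry.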

2606.08303 2026-06-09 cs.LG 新提交

GeoGNN: Time Series Geo-Localization using Two-Tower Graph Neural Networks

GeoGNN：使用双塔图神经网络的时间序列地理定位

Toan Tran, Waqwoya Abebe, Abhishek Potnis, Supriya Chinthavali, Cyrus Shahabi, Li Xiong, Dalton Lunga

发表机构 * Emory University（埃默里大学）； Oak Ridge National Laboratory（橡树岭国家实验室）； University of Southern California（南加州大学）

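AI summary: Proposes GeoGNN, a two-tower architecture that learns spatial embeddings from a geographic adjacency graph, pairs them with time-series representations, and geolocalizes time series by dot-product matching, improving localization accuracy by about 27% on average on electricity-consumption datasets.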

AI中文摘要

本文研究时间序列地理定位的新概念，目标是推断每个原始时间序列的地理来源。成功的地理定位可以为时间序列提供空间上下文，支持下游位置感知应用。我们形式化了该问题，借鉴图像地理定位的核心思想建立了强基线，并提出了GeoGNN——一种双塔架构。训练时，GeoGNN的空间塔通过利用地理邻接图学习地理单元候选的嵌入，而时间塔从时间序列中提取信息表示。推理时，每个时间表示与候选地理嵌入通过点积相似度匹配，并结合辅助分类头，以预测时间序列关联的地理来源。在全国范围的大规模电力消费数据集上的实验表明，GeoGNN在数据集上取得了最佳性能，并将细粒度和粗粒度地理定位精度平均提高了约27%。

英文摘要

This paper investigates a novel concept of time series geolocalization, where the goal is to infer the geographic origin of each raw time series. Successful geolocalization can provide spatial context to time series, enabling downstream location-aware applications. We formalize the problem, adapt core ideas from image geolocalization to establish strong baselines, and propose GeoGNN, a two-tower architecture. During training, GeoGNN's spatial tower learns embeddings of geographic cell candidates by leveraging the geographic adjacency graph, while the temporal tower extracts informative representations from time series. During inference, each temporal representation is matched against candidate geographic embeddings using dot-product similarity, combined with an auxiliary classification head, to predict the time series' associated geographic origin. Experiments on large-scale, countrywide electricity-consumption datasets demonstrate that GeoGNN achieves the best performance across datasets and enhances both fine- and coarse-grained geolocalization accuracy by ~27% on average.
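
The inference step described above, matching a temporal representation against candidate geographic embeddings by dot-product similarity, can be sketched in a few lines. The embeddings, cell names, and the `localize` function are hypothetical placeholders; in the paper the vectors come from the two trained towers, and the auxiliary classification head mentioned in the abstract is omitted here.

```python
def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def localize(series_embedding, cell_embeddings, top_k=1):
    """Rank candidate cells by dot-product similarity to one time-series embedding."""
    scored = sorted(cell_embeddings.items(),
                    key=lambda item: dot(series_embedding, item[1]),
                    reverse=True)
    return [cell for cell, _ in scored[:top_k]]

# Hypothetical 3-d embeddings for three geographic cells and one query series.
cells = {
    "cell_17": [0.9, 0.1, 0.0],
    "cell_18": [0.2, 0.8, 0.1],
    "cell_42": [0.0, 0.3, 0.9],
}
query = [0.1, 0.7, 0.2]
print(localize(query, cells, top_k=2))  # ['cell_18', 'cell_42']
```

Returning the top-k cells rather than a single one makes it easy to report both fine-grained accuracy (exact cell) and coarser accuracy (any of the k nearest candidates).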

2606.08306 2026-06-09 cs.LG cs.SI 新提交

Towards Graph Foundation Models for Dynamics in Complex Networked Systems: Lessons from Super-Spreader Identification in Multilayer Networks

面向复杂网络系统中动力学的图基础模型：来自多层网络超级传播者识别的教训

Michał Czuba, Mateusz Stolarski, Adam Piróg, Piotr Bielak, Piotr Bródka

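AI summary: Argues that graph foundation models for dynamics need inductive cross-network generalization; the ts-net model, trained only on synthetic multilayer networks, generalizes zero-shot to real multilayer networks and outperforms traditional methods.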

2606.08978 2026-06-09 cs.LG 新提交

Heterophily-Aware Adaptive Knowledge Distillation for Hypergraph Neural Networks

异质性感知的自适应知识蒸馏用于超图神经网络

Joohee Cho, David Yoon Suk Kang, Yunyong Ko

发表机构 * Chung-Ang University（中央大学）； Chungbuk National University（忠北国立大学）

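AI summary: To address the degraded performance of hypergraph neural networks on heterophilic nodes, proposes HADES, a heterophily-aware adaptive distillation method that quantifies node heterophily to modulate teacher knowledge transfer; in many cases the student models surpass their teachers while achieving up to 12.3x faster inference.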

Comments 5 pages, 2 figures, 4 tables

AI中文摘要

超图知识蒸馏旨在通过轻量级学生模型保留超图神经网络（HNN）教师的预测性能，同时降低推理成本。在这项工作中，我们观察到HNN在通过语义多样的超边连接的异质性节点上的预测性能显著较低，表明教师知识的可靠性在不同节点间存在差异。受此观察启发，我们提出了HADES，一种用于超图神经网络的异质性感知自适应蒸馏方法。HADES量化节点异质性，并将其作为教师可靠性的估计，以在蒸馏过程中调节教师知识的迁移。在真实世界超图上的实验结果表明，HADES在不同HNN教师和蒸馏目标下持续提升学生性能。在许多情况下，所得学生模型的预测性能超越其教师，同时实现高达12.3倍的推理加速。

英文摘要

Hypergraph knowledge distillation aims to retain the predictive performance of a hypergraph neural network (HNN) teacher while reducing inference costs through a lightweight student model. In this work, we observe that HNNs exhibit substantially lower prediction performance on heterophilic nodes connected through semantically diverse hyperedges, indicating that the reliability of teacher knowledge varies across nodes. Motivated by this observation, we propose HADES, a heterophily-aware adaptive distillation method for hypergraph neural networks. HADES quantifies node heterophily and leverages it as an estimate of teacher reliability to modulate the transfer of teacher knowledge during distillation. Experimental results on real-world hypergraphs demonstrate that HADES consistently improves student performance across different HNN teachers and distillation objectives. In many cases, the resulting student models surpass the predictive performance of their teachers while achieving up to 12.3 times faster inference.

2606.09051 2026-06-09 cs.LG 新提交

Beyond Convolution: Advancing Hypergraph Neural Networks with Hypergraph U-Nets

超越卷积：用超图U-Net推进超图神经网络

Fuli Wang, Wei Qian, Daniel L. Lau, Gonzalo R. Arce

发表机构 * Institute for Financial Services Analytics, University of Delaware（特拉华大学金融服务分析研究所）； Department of Applied Economics and Statistics, University of Delaware（特拉华大学应用经济学与统计学系）； Department of Electrical and Computer Engineering, University of Kentucky（肯塔基大学电气与计算机工程系）； Department of Electrical and Computer Engineering, University of Delaware（特拉华大学电气与计算机工程系）

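AI summary: Proposes parallel hierarchical pooling and unpooling operators and builds a pioneering U-Net architecture for hypergraph data, outperforming existing methods on classification, reconstruction, and anomaly detection.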

AI中文摘要

卷积已成功从图像处理过渡到非欧几里得高阶域的复杂领域，特别是在超图中。尽管卷积取得了成功，但由于缺乏定义良好的池化和反池化操作，一种名为U-Net的流行架构在超图数据上的探索仍然很少。本工作开创性地研究了超图数据的U-Net架构，解决了设计有效池化和反池化操作的关键挑战，这些操作能保留输入超图的最大结构信息。受层次聚类启发，我们提出通过在不同粒度上切割聚类树状图来一次性构建池化和反池化算子，称为并行层次池化（PHPool）和反池化（PHUnpool）算子。与现有通过顺序学习过程可能造成局部结构损坏的池化方法不同，我们的PHPool算子以全局并行方式设计，确保对原始超图结构的保真度和高效计算，而PHUnpool算子则专门设计为执行PHPool的逆操作以进行超图重构。我们通过超图重构模拟、超图分类和节点级异常检测验证了我们的模型，在这些任务中，它表现出优于现有最先进的图和超图深度学习方法的性能。

英文摘要

Convolutions have successfully transitioned from image processing to non-Euclidean higher-order domains, particularly hypergraphs. Despite the success of convolution, the popular U-Net architecture remains largely unexplored for hypergraph data due to the lack of well-defined pooling and unpooling operations. This work pioneers the study of U-Net architectures for hypergraph data, addressing the critical challenge of designing effective pooling and unpooling operations that retain maximal structural information from the input hypergraph. Motivated by hierarchical clustering, we propose to construct the pooling and unpooling operators all at once by cutting the clustering dendrogram at different granularities, named the Parallel Hierarchical Pooling (PHPool) and Unpooling (PHUnpool) operators. Unlike existing pooling methods that risk local structural damage through a sequential learning procedure, our PHPool operators are designed in a global and parallel manner to ensure fidelity to the original hypergraph structure with efficient computation, while the PHUnpool operators are tailored to perform the inverse operations of the PHPool operators for hypergraph reconstruction. We validate our model through hypergraph reconstruction simulation, hypergraph classification, and node-level anomaly detection, where it demonstrates superior performance over existing state-of-the-art graph and hypergraph deep learning methods.
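
The idea of deriving pooling and unpooling from one dendrogram cut at several granularities can be sketched as follows. This is an illustration under stated assumptions, not the PHPool and PHUnpool operators themselves: the dendrogram is given as a hypothetical list of merges in the usual linkage convention, pooling averages node features per cluster, and unpooling copies each cluster feature back to its members. Pooling of the hyperedge structure is not shown.

```python
def cut_dendrogram(n_nodes, merges, n_clusters):
    """Apply the first n_nodes - n_clusters merges and return a cluster id per node.

    Merge t joins clusters a and b into a new cluster with id n_nodes + t.
    """
    parent = list(range(n_nodes + len(merges)))

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for t, (a, b) in enumerate(merges[: n_nodes - n_clusters]):
        parent[find(a)] = parent[find(b)] = n_nodes + t
    roots = sorted({find(i) for i in range(n_nodes)})
    return [roots.index(find(i)) for i in range(n_nodes)]

def pool(features, assign):
    """Average node features within each cluster (coarse level)."""
    k = max(assign) + 1
    sums = [[0.0] * len(features[0]) for _ in range(k)]
    sizes = [0] * k
    for f, c in zip(features, assign):
        sizes[c] += 1
        sums[c] = [s + x for s, x in zip(sums[c], f)]
    return [[s / sizes[c] for s in sums[c]] for c in range(k)]

def unpool(coarse, assign):
    """Copy each cluster feature back to its member nodes (fine level)."""
    return [list(coarse[c]) for c in assign]

merges = [(0, 1), (2, 3), (4, 5)]           # hypothetical dendrogram on 4 nodes
features = [[1.0], [3.0], [10.0], [14.0]]
for k in (2, 1):                            # two granularities from the same tree
    assign = cut_dendrogram(4, merges, k)
    print(k, assign, pool(features, assign))
```

Because every level is read off the same tree, all assignments are available up front, which is one way to interpret the "all at once" construction described in the abstract.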

2606.09340 2026-06-09 cs.LG 新提交

OSMGraphCLIP: Learning Global Location Representations from OpenStreetMap Graphs

Dimitrios Michail, Eleni Saka, Ioannis Giannopoulos, Ioannis Papoutsis

发表机构 * Harokopio University of Athens（雅典哈罗科皮奥大学）； National Technical University of Athens（雅典国家技术大学）； Vienna University of Technology（维也纳技术大学）； National Observatory of Athens（雅典国家天文台）

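AI summary: Proposes OSMGraphCLIP, which learns global location embeddings from the heterogeneous graph structure of OpenStreetMap through a multi-scale graph encoder and contrastive alignment, matching or exceeding satellite-based baselines on most downstream benchmarks spanning climate, ecology, and socioeconomic tasks.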

AI中文摘要

我们提出了OSMGraphCLIP，一种CLIP风格的地理空间表示模型，从免费可用的OpenStreetMap（OSM）数据中学习全局位置嵌入。OSMGraphCLIP将地理环境表示为带类型的OSM特征的异构图，保留了道路、建筑物、土地利用区域和兴趣点之间的拓扑和语义关系。多尺度图编码器捕获细粒度的局部结构和更广泛的景观组成，并通过对比对齐目标监督球谐位置编码器。我们在涵盖气候、生态、社会经济指标、公共卫生、土地覆盖、生物多样性和野火预测等一系列下游地理空间回归和分类任务中评估了OSMGraphCLIP，并表明仅结构化OSM数据就支持跨领域的强全局位置表示。OSMGraphCLIP在大多数基准测试中达到或超过了基于卫星的基线，在社会经济和公共卫生任务中优势最为明显，因为OSM对建成环境的显式语义注释编码了卫星像素只能间接捕获的人类活动模式。在生态和环境任务中，尽管未使用地球观测数据，该模型仍与基于图像的方法保持紧密竞争。定性分析证实，学习到的嵌入连贯地组织了地理空间，仅从地图拓扑中恢复了生物群落边界、城市梯度和热带-温带区别。

英文摘要

We present OSMGraphCLIP, a CLIP-style geospatial representation model that learns global location embeddings from freely available OpenStreetMap (OSM) data. OSMGraphCLIP represents geographic environments as heterogeneous graphs of typed OSM features, preserving the topological and semantic relationships among roads, buildings, land-use regions, and points of interest. A multi-scale graph encoder captures both fine-grained local structure and broader landscape composition, and supervises a spherical-harmonics location encoder through a contrastive alignment objective. We evaluate OSMGraphCLIP across a diverse suite of downstream geospatial regression and classification tasks spanning climate, ecology, socioeconomic indicators, public health, land cover, biodiversity, and wildfire forecasting, and show that structured OSM data alone supports strong global location representations across domains. OSMGraphCLIP matches or exceeds satellite-based baselines on the majority of benchmarks, with the most pronounced advantage on socioeconomic and public-health tasks, where OSM's explicit semantic annotation of the built environment encodes patterns of human activity that satellite pixels can only capture indirectly. On ecological and environmental tasks, the model remains closely competitive with imagery-based methods despite using no Earth observation data. Qualitative analysis confirms that the learned embeddings organize geographic space coherently, recovering biome boundaries, urban gradients, and tropical--temperate distinctions from map topology alone.
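
A contrastive alignment objective of the CLIP kind can be sketched as a symmetric InfoNCE loss between graph embeddings and location embeddings of the same places. That OSMGraphCLIP uses exactly this form is an assumption based on the phrase "CLIP-style"; the function name, the temperature value, and the toy vectors are hypothetical.

```python
import math

def info_nce(graph_emb, loc_emb, temperature=0.1):
    """Symmetric InfoNCE: row i of each input describes the same location."""
    n = len(graph_emb)

    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    g = [normalize(v) for v in graph_emb]
    l = [normalize(v) for v in loc_emb]
    # Cosine similarities scaled by the temperature.
    logits = [[sum(a * b for a, b in zip(g[i], l[j])) / temperature
               for j in range(n)] for i in range(n)]

    def cross_entropy(rows):
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_sum = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_sum - row[i]  # the matching pair sits on the diagonal
        return total / n

    columns = [list(c) for c in zip(*logits)]
    return 0.5 * (cross_entropy(logits) + cross_entropy(columns))

graph_side = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(graph_side, [[2.0, 0.0], [0.0, 3.0]])
shuffled = info_nce(graph_side, [[0.0, 3.0], [2.0, 0.0]])
print(aligned < shuffled)
```

The loss is low when each location embedding is closest to the graph embedding of its own place and high when the pairing is scrambled, which is the signal that lets the graph encoder supervise the location encoder.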

2606.08258 2026-06-09 cs.GR cs.CV cs.LG 交叉投稿

MS-COOT: Comparing Morse-Smale Complexes with Co-Optimal Transport

MS-COOT: 用共最优传输比较Morse-Smale复形

Guangyu Meng, Mingzhe Li, Erin Wolf Chambers

Affiliations: Department of Computer Science and Engineering, University of Notre Dame

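AI summary: Proposes the MS-COOT distance, which represents Morse-Smale complexes as hypergraphs and jointly matches critical points and regions via co-optimal transport, enabling region-level structural comparison and capturing changes that graph-based distances miss.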

AI中文摘要

理解和比较标量场中的结构是科学可视化的核心挑战，应用范围从特征分析到时间和结构比较。Morse-Smale (MS) 复形通过将标量场分解为由梯度流诱导的区域提供了自然表示。然而，现有方法通常依赖于基于图的表示，捕获临界点之间的关系而丢弃区域级结构。在这项工作中，我们将MS复形表示为超图，其中临界点构成节点，区域定义超边。我们引入MS-COOT，一种共最优传输距离，联合计算临界点和区域之间的对应关系。这种公式化使得在基于距离的框架内能够进行显式的区域到区域匹配，从而识别诸如分裂和合并等区域级事件。我们使用领域特定组件实例化该框架，包括编码临界点-区域关系的超网络函数、强调拓扑显著特征的基于持久性的概率度量，以及包含临界点属性的样本代价项。我们在涵盖2D模拟、3D曲面网格和体积数据的五个数据集上评估MS-COOT。我们的结果表明，MS-COOT捕获了基于图的距离未反映的区域级结构变化，同时在分类和分辨率判别等下游任务中实现了强性能。

英文摘要

Understanding and comparing structures in scalar fields is a central challenge in scientific visualization, with applications ranging from feature analysis to temporal and structural comparison. The Morse-Smale (MS) complex provides a natural representation by decomposing a scalar field into regions induced by gradient flow. However, existing approaches typically rely on graph-based representations, capturing relationships between critical points while discarding region-level structure. In this work, we represent the MS complex as a hypergraph, where critical points form nodes and regions define hyperedges. We introduce MS-COOT, a co-optimal transport distance that jointly computes correspondences between critical points and regions. This formulation enables explicit region-to-region matching within a distance-based framework, allowing identification of region-level events such as splitting and merging. We instantiate this framework with domain-specific components, including a hypernetwork function encoding critical point-region relationships, persistence-based probability measures that emphasize topologically significant features, and a sample cost term that incorporates critical point attributes. We evaluate MS-COOT on five datasets spanning 2D simulations, 3D surface meshes, and volumetric data. Our results show that MS-COOT captures region-level structural changes that are not reflected by graph-based distances, while achieving strong performance in downstream tasks such as classification and resolution discrimination.

2606.09100 2026-06-09 cs.SI cs.LG 交叉投稿

Alcmean's: Unsupervised community detection using local Laplacian, automatic detection of the number of centers

Alcmean's: 使用局部拉普拉斯算子的无监督社区检测与中心数量自动检测

Shahin Momenzadeh, Rojiar Pir Mohammadiani

Affiliations: Department of Computer Engineering, University of Kurdistan, Sanandaj, Iran

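AI summary: Proposes the ALCMeans algorithm, which combines Laplacian-energy-based automatic center identification with DeepWalk embeddings and needs no preset number of communities; on benchmark datasets its NMI and ARI are 10-20% higher than Louvain and other methods.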

DOI: 10.22103/jmmr.2026.25756.1849

AI中文摘要

社区检测是复杂网络分析中的一个基本问题，在社交、生物和金融领域都有应用。传统算法如Louvain、LPA和模块度优化通常需要手动参数调整，还存在聚类中心选择不准确和可扩展性差的问题。为了解决这些挑战，我们提出了自动拉普拉斯中心均值（ALCMeans），一种新颖的社区检测算法。ALCMeans将基于拉普拉斯能量的自动中心识别与DeepWalk嵌入相结合，以实现稳健的节点表示。与现有的基于拉普拉斯和聚类方法不同，ALCMeans无需预定义社区数量，利用结构重要性增强聚类中心选择，并利用表示学习实现更准确和稳定的分配。在基准数据集上的实验结果表明，与Louvain、Newman-Girvan、LPA、Fast-Greedy以及最近基于GNN的竞争者（MAGI, KDD 2024）相比，NMI和ARI得分提高了10%到20%。使用模块度和F1分数的额外评估证实了ALCMeans的优越性。消融研究突出了每个组件的关键贡献。尽管依赖于DeepWalk参数并且相对于轻量级启发式方法运行时间增加，ALCMeans始终优于最先进的方法，使其成为现实世界网络分析的一个有前景的工具。

英文摘要

Community detection is a fundamental problem in the analysis of complex networks. It has applications across social, biological, and financial domains. Traditional algorithms such as Louvain, LPA, and modularity optimization often require manual parameter tuning. They also suffer from inaccurate cluster center selection and struggle with scalability. To address these challenges, we propose Automatic Laplacian Centrality Means (ALCMeans), a novel community detection algorithm. ALCMeans combines Laplacian energy-based automatic center identification with DeepWalk embeddings for robust node representation. Unlike existing Laplacian-based and clustering methods, ALCMeans eliminates the need to predefine the number of communities, enhances cluster center selection using structural importance, and leverages representation learning for more accurate and stable assignments. Experimental results on benchmark datasets demonstrate 10 to 20 percent higher NMI and ARI scores compared to Louvain, Newman-Girvan, LPA, Fast-Greedy, and a recent GNN-based competitor (MAGI, KDD 2024). Additional evaluations with modularity and F1-scores confirm the superiority of ALCMeans. Ablation studies highlight the critical contributions of each component. Despite its reliance on DeepWalk parameters and increased runtime relative to lightweight heuristics, ALCMeans consistently outperforms state-of-the-art methods. This makes it a promising tool for real-world network analysis.
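
The structural-importance score behind center identification can be illustrated with the standard Laplacian centrality of a node: the drop in the graph's Laplacian energy when that node is removed. The sketch assumes an unweighted, undirected graph, where the Laplacian energy equals the sum of squared degrees plus twice the number of edges. Whether ALCMeans uses exactly this definition is an assumption; its center-selection rule and the DeepWalk step are not reproduced.

```python
def laplacian_energy(adj):
    """Sum of squared degrees plus twice the edge count (unweighted graph)."""
    degrees = [len(nbrs) for nbrs in adj.values()]
    return sum(d * d for d in degrees) + sum(degrees)  # sum(degrees) == 2 * |E|

def remove_node(adj, v):
    """Return a copy of the graph without node v and its incident edges."""
    return {u: nbrs - {v} for u, nbrs in adj.items() if u != v}

def laplacian_centrality(adj):
    """Drop in Laplacian energy caused by deleting each node."""
    total = laplacian_energy(adj)
    return {v: total - laplacian_energy(remove_node(adj, v)) for v in adj}

# Two triangles joined by the bridge 2-3, stored as adjacency sets.
graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
scores = laplacian_centrality(graph)
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

In this toy graph the two bridge endpoints receive the highest score (26 versus 16 for the other nodes), which shows how the measure rewards nodes that are both well connected and adjacent to well-connected neighbors.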

2510.02014 2026-06-09 cs.LG 版本更新

Normality Calibration in Semi-supervised Graph Anomaly Detection

半监督图异常检测中的正态性校准

Guolei Zeng, Hezhe Qiao, Guoguo Ai, Jinsong Guo, Guansong Pang

发表机构 * University of Science and Technology of China（中国科学技术大学）

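AI summary: Proposes the GraphNC framework, which uses a teacher model to jointly calibrate normality in the anomaly-score and representation spaces, addressing normality overfitting in semi-supervised graph anomaly detection and reducing false positives.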

Comments Accepted by ICML 2026

MetaEvo: A Meta-Optimization Framework for Experience-Driven Agent Evolution

Bowen Ren, Heyan Huang, Yinghao Li, Yang Gao

发表机构 * School of Computer Science and Technology, Beijing Institute of Technology（北京理工大学计算机科学与技术学院）； Beijing Institute of Technology Southeast Academy of Information Technology（北京理工大学东南信息技术研究院）

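AI summary: Proposes MetaEvo, a two-stage framework that uses preference optimization to strengthen the model's ability to abstract principles from task experience, then accumulates and reuses those principles in a modular architecture to keep improving reasoning performance.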

AI中文摘要

大型语言模型（LLM）展现出强大的推理能力，但大多数基于LLM的智能体是静态部署的，无法通过任务交互进行改进。现有的经验驱动方法通常依赖于记忆或启发式方法，而不增强模型的学习能力，将其视为被动执行者，导致早期性能平台和有限的长期改进。为了解决这个问题，我们提出了MetaEvo，一个用于持续智能体进化的两阶段框架，专注于改进模型如何从任务经验中学习，而不仅仅是存储什么。MetaEvo首先应用基于偏好的优化来增强模型的原则抽象能力，然后在模块化智能体架构中实现这些原则的积累和重用。在多样化推理基准上的实验结果表明，MetaEvo始终优于强基线，并在迭代中保持可靠的改进。这些发现验证了元优化在使智能体从经验中学习并持续增强其推理能力方面的有效性。

英文摘要

Large language models (LLMs) exhibit strong reasoning capabilities, yet most LLM-based agents are statically deployed and unable to improve through task interactions. Existing experience-driven methods often rely on memory or heuristics without enhancing the model's ability to learn, treating it as a passive executor and leading to early performance plateaus and limited long-term improvement. To address this issue, we propose MetaEvo, a two-stage framework for continual agent evolution that focuses on improving how the model learns from task experience, rather than solely on what it stores. MetaEvo first applies preference-based optimization to enhance the model's ability to abstract principles, then enables the accumulation and reuse of these principles within a modular agent architecture. Experimental results on diverse reasoning benchmarks demonstrate that MetaEvo consistently outperforms strong baselines and maintains reliable improvement across iterations. These findings validate the effectiveness of meta-optimization in enabling agents to learn from experience and continually enhance their reasoning capabilities.

2606.07627 2026-06-09 cs.LG math.AT math.CT 新提交

Have I Solved This Before? Retrieving Similar Segmentation Problems for Evolutionary Learning

Andreas Margraf, Henning Cui, Jörg Hähner

发表机构 * University of Augsburg（奥格斯堡大学）

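AI summary: Proposes an evolutionary learning approach based on retrieving similar segmentation problems; reusing existing pipelines avoids training models from scratch and lowers development cost, and the study analyzes the feasibility of cross-domain transfer.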

AI中文摘要

监控系统的可靠集成和稳固配置是实现现代制造环境高效率和生产率的基本前提。关于传感器类型和系统架构的设计决策必须在早期阶段且在高不确定性下做出。本文研究了一种偏离传统监控系统开发过程的研究方向，将注意力从算法设计转向对检测问题的更深入分析。与传统设计周期不同，本文提出逐步收集知识并将其存储在抽象系统模型中。这使得能够检索未来用例的相似解决方案，避免了昂贵的从头开始模型训练，而是允许对现有基础配置进行增量改进。重用先前生成的管道降低了后期昂贵修订的风险。由于关于滤波器管道的跨域可转移性知之甚少，本研究分析了检索滤波器管道以将其转移到不同但相似的分割问题的潜力。最后，我们统计分析了这种主要应用于图像分割问题的“迁移学习”变体的优势。此外，我们讨论了简单模型如何帮助在设计过程中平衡复杂性、技术要求和可靠性之间的权衡。

英文摘要

Reliable integration and solid configuration of monitoring systems constitute fundamental prerequisites for achieving high efficiency and productivity in contemporary manufacturing environments. Design decisions on sensor type and system architecture have to be made at an early stage and under comparatively high uncertainty. This work investigates a research direction that deviates from the traditional monitoring-system development process by shifting the attention from algorithm design to a deeper analysis of the inspection problem. In contrast to traditional design cycles, this paper proposes to gradually collect knowledge and store it in an abstract system model. This enables the retrieval of similar solutions for future use cases, preventing the need for expensive model training from scratch and allowing instead for the incremental refinement of existing base configurations. Reuse of previously generated pipelines reduces the risk of late and costly revisions. As there is little knowledge on cross-domain transferability of filter pipelines, this study analyzes the potential of retrieving filter pipelines to transfer them to different but similar segmentation problems. Finally, we statistically analyze the benefits of this 'transfer learning' variant, which is predominantly applied to image segmentation problems. In addition, we discuss how simple models help balance the trade-off between complexity, technical requirements, and reliability in the design process.

2606.08447 2026-06-09 cs.LG cs.AI 新提交

Not Just After One: Sleep-Inspired Replay Prevents Catastrophic Forgetting After Sequential Tasks

不仅仅是在一次之后：受睡眠启发的回放防止顺序任务后的灾难性遗忘

Anthony Bazhenov, Jean Erik Delanois, Giri P. Krishnan

Affiliations: Department of Neuroscience, University of California, San Diego, CA, USA

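AI summary: Proposes a sleep-inspired unsupervised replay mechanism, applied after several new tasks have been trained sequentially, that partially restores performance on all previously learned tasks and mitigates catastrophic forgetting.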

AI中文摘要

人工神经网络的关键限制之一是缺乏持续学习的能力：在新任务上训练常常导致对先前任务的干扰和遗忘。尽管已有几种算法被提出以保护旧记忆免受干扰，但它们通常在每个新训练阶段期间或之后立即应用。相比之下，人类和动物可以持续学习，在主动学习期间获取多个新记忆，然后将它们全部巩固到长期存储中。在这里，我们展示了多个新任务可以顺序训练，然后应用无监督的睡眠样回放阶段，以部分恢复所有先前学习任务的性能。我们的研究进一步表明，任务特定信息对新训练具有弹性，但随着网络在新任务上训练而逐渐衰减。这些发现为开发广泛范围的持续学习AI解决方案提供了新颖的原则。

英文摘要

One of the critical limitations of artificial neural networks is their lack of ability to continually learn: training on new tasks often leads to interference and forgetting of the previous ones. While several algorithms have been proposed to protect old memories from interference, they are typically applied during or immediately after each new episode of training. In contrast, humans and animals can learn continuously, acquiring multiple new memories during active learning before consolidating all of them into long-term storage. Here we show that multiple new tasks can be trained sequentially before an unsupervised sleep-like replay phase is applied to partially restore performance across all previously learned tasks. Our study further suggests that task-specific information remains resilient to new training but decays gradually as the network is trained on new tasks. These findings point to novel principles for developing a broad range of continual learning AI solutions.

2606.08452 2026-06-09 cs.LG 新提交

Theoretical Foundations of Continual Learning via Drift-Plus-Penalty

基于漂移加惩罚的持续学习的理论基础

Nazreen Shah, Govinda Arya, Bharath B. N., Ranjitha Prasad

发表机构 * IIIT Delhi（德里英德拉普拉斯塔信息技术学院）； IIT Dharwad（达尔瓦德印度理工学院）

AI总结提出COLD框架，利用漂移加惩罚原理调节稳定性-可塑性权衡，通过虚拟队列控制遗忘，理论保证收敛性，实验优于现有方法。

Comments Accepted to Transactions on Machine Learning Research (TMLR)

详情

AI中文摘要

在许多实际场景中，数据流是非平稳的且顺序到达，要求学习系统在不从头重新训练的情况下持续适应。持续学习通过整合新任务同时缓解灾难性遗忘来应对这一挑战，其中学习新信息会降低先前知识的性能。我们引入了一种控制理论视角来明确调节遗忘的演化，将适应视为受长期稳定性约束的受控过程。我们专注于基于回放的持续学习，其中有限的内存缓冲区存储来自先前任务的代表性样本。我们提出了基于漂移加惩罚原理的持续学习框架COLD，该原理来自随机优化。为了便于分析，我们还考虑了一种oracle变体COLD-ORACLE作为参考基准。在每个任务中，两种方法都最小化当前任务损失，同时维护一个虚拟队列，该队列跟踪先前学习任务上长期稳定性的偏差，将稳定性-可塑性权衡捕捉为受调节的动态过程。我们建立了稳定性和收敛性保证，通过可调控制参数表征这种权衡。在标准基准上的实验表明，COLD在提供竞争性和可控的遗忘行为的同时，通过显式调节稳定性和可塑性，始终优于广泛的最先进的持续学习方法。

英文摘要

In many real-world settings, data streams are nonstationary and arrive sequentially, requiring learning systems to adapt continuously without retraining from scratch. Continual learning (CL) addresses this challenge by incorporating new tasks while mitigating catastrophic forgetting, where learning new information degrades performance on previously acquired knowledge. We introduce a control-theoretic perspective on CL that explicitly regulates the evolution of forgetting, framing adaptation as a controlled process subject to long-term stability constraints. We focus on replay-based CL, where a finite memory buffer stores representative samples from prior tasks. We propose COntinual Learning with Drift-Plus-Penalty (COLD), a continual learning framework based on the Drift-Plus-Penalty (DPP) principle from stochastic optimization. To facilitate analysis, we also consider an oracle variant, COLD-ORACLE, as a reference benchmark. At each task, both methods minimize the current task loss while maintaining a virtual queue that tracks deviations from long-term stability on previously learned tasks, capturing the stability-plasticity trade-off as a regulated dynamical process. We establish stability and convergence guarantees that characterize this trade-off through a tunable control parameter. Experiments on standard benchmarks demonstrate that COLD consistently outperforms a broad range of state-of-the-art CL methods while providing competitive and controllable forgetting behavior through explicit regulation of stability and plasticity.
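
The virtual-queue idea in the abstract above can be illustrated with the generic drift-plus-penalty recipe from Lyapunov optimization. This is a minimal sketch under stated assumptions, not the COLD algorithm: the function names, the `budget` on tolerated forgetting, and the weight `V` are hypothetical, and the paper's actual queue update and objective may differ.

```python
# Generic drift-plus-penalty sketch; NOT the paper's COLD implementation.
# All names and numbers below are hypothetical illustrations.

def update_virtual_queue(queue: float, forgetting: float, budget: float) -> float:
    """Grow the queue when measured forgetting exceeds the budget; clip at zero."""
    return max(queue + forgetting - budget, 0.0)


def dpp_objective(task_loss: float, forgetting: float, queue: float, V: float) -> float:
    """Penalty term (V * current-task loss) plus drift term (queue * forgetting)."""
    return V * task_loss + queue * forgetting


# Simulate the queue over four tasks with made-up forgetting measurements.
queue = 0.0
budget = 0.1
for forgetting in [0.3, 0.2, 0.05, 0.0]:
    queue = update_virtual_queue(queue, forgetting, budget)
```

A larger `V` weights the current-task loss more heavily (plasticity), while a long queue makes further forgetting expensive (stability), which is the tunable trade-off the abstract refers to.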

2606.08691 2026-06-09 cs.LG stat.ME 新提交

Hierarchical Projection for Adaptive Knowledge Transfer

自适应知识迁移的分层投影

Samhita Pal, Tian Gu

发表机构 * Vanderbilt University Medical Center（范德比尔特大学医学中心）； Columbia University（哥伦比亚大学）

AI总结提出ProjectionTL框架，通过分层贝叶斯建模与自适应投影实现源选择与特征选择，缓解负迁移，提升跨域学习的准确性、稳定性和可解释性。

详情

AI中文摘要

现代数据驱动应用越来越多地涉及从多个异质源中学习，其中目标数据集有限，但跨域可获得相关信息。当相关性变化或存在虚假信号时，简单组合这些源会降低性能，这对可信的跨域学习构成了根本性挑战。我们提出了投影迁移学习（ProjectionTL），这是一个统一框架，将分层贝叶斯建模与自适应投影相结合，用于选择性知识迁移。关键思想是在两个层次上解耦迁移：首先，我们构建一个源引导的分层先验，通过数据驱动的权重聚合跨源信息，捕捉每个源与目标之间的全局对齐；其次，我们通过后验投影步骤在特征层面细化这种借用，选择性地保留与目标信号局部一致的坐标。这种两阶段设计使该方法能够同时进行源选择和特征选择，从而减轻负迁移，同时保持可解释性。ProjectionTL提供了一种跨域整合异质数据的原则性方法，桥接了统计建模和现代机器学习范式，以实现鲁棒且可解释的迁移。通过模拟和真实世界的生物医学应用，我们证明了与现有方法相比，准确性、稳定性和可解释性的提升。我们的框架为高维设置下的可信跨域学习提供了一种可扩展且通用的策略。

英文摘要

Modern data-driven applications increasingly involve learning from multiple heterogeneous sources, where a target dataset is limited but related information is available across domains. Naively combining these sources can degrade performance when relevance varies or spurious signals are present, posing a fundamental challenge for trustworthy cross-domain learning. We propose Projection Transfer Learning (ProjectionTL), a unified framework that integrates hierarchical Bayesian modeling with adaptive projection for selective knowledge transfer. The key idea is to decouple transfer at two levels: first, we construct a source-guided hierarchical prior that aggregates information across sources using data-driven weights, capturing global alignment between each source and the target; second, we refine this borrowing through a posterior-projection step that operates at the feature level, selectively retaining coordinates that exhibit local agreement with the target signal. This two-stage design enables the method to simultaneously perform source selection and feature selection, thereby mitigating negative transfer while preserving interpretability. ProjectionTL provides a principled approach to integrating heterogeneous data across domains, bridging statistical modeling and modern machine learning paradigms for robust and interpretable transfer. Through simulations and real-world biomedical applications, we demonstrate improved accuracy, stability, and interpretability compared to existing methods. Our framework offers a scalable and generalizable strategy for trustworthy cross-domain learning in high-dimensional settings.

2606.09052 2026-06-09 cs.LG cs.AI cs.CL cs.GT stat.ML 新提交

INFUSER: Influence-Guided Self-Evolution Improves Reasoning

INFUSER: 影响力引导的自我进化提升推理能力

Siyu Chen, Miao Lu, Beining Wu, Heejune Sheen, Fengzhuo Zhang, Shuangning Li, Zhiyuan Li, Jose Blanchet, Tianhao Wang, Zhuoran Yang

发表机构 * Yale University（耶鲁大学）； Stanford University（斯坦福大学）； University of Chicago（芝加哥大学）； Toyota Technological Institute at Chicago（芝加哥丰田技术研究所）； University of California, San Diego（加州大学圣地亚哥分校）

AI总结提出INFUSER框架，通过生成器与求解器的协同进化，利用影响力分数和DuGRPO优化，从文档池中自适应生成训练数据，显著提升模型推理性能。

Comments 66 pages, 17 figures

详情

AI中文摘要

自我进化为更强的推理提供了一条可扩展的路径：预训练语言模型仅需极少的外部监督即可自我改进。然而，现有方法要么依赖于大量精心策划或教师生成的训练数据，要么在生成器无监督运行时，使用未必能改进求解器的难度启发式方法对其进行奖励。我们引入了INFUSER，一个迭代协同训练框架，包含两个共同进化的角色：一个生成器，从自动收集的非结构化文档池中起草问题并参考标准答案；一个求解器，通过在这些数据上训练来改进。求解器使用标准正确性奖励（针对生成器提供的答案）进行训练，而生成器则通过一种优化器感知的影响力分数获得奖励，该分数衡量每个提出的问题是否真正能改进求解器在目标分布上的表现。由于这种连续、有噪声的影响力分数不适合标准的GRPO，我们提出了DuGRPO，一种GRPO的双归一化变体，用于生成器训练。这些设计共同将文档池转化为一个自适应课程，倾向于对当前求解器有用的问题，而不仅仅是困难的问题。在Qwen3-8B-Base上，INFUSER在Olympiad和SuperGPQA基准测试中相对于强自我进化基线取得了超过20%的相对改进，并且一个8B的INFUSER协同进化生成器在数学和编程任务上优于冻结的32B思考生成器。消融实验证实了每个设计选择的必要性，两个扩展——将INFUSER应用于指令微调锚点并辅以规则可验证的RLVR数据——进一步展示了该框架的灵活性和泛化能力。代码可在https://github.com/FFishy-git/INFUSER获取。

英文摘要

Self-evolution offers a scalable path to stronger reasoning: a pretrained language model improves itself with only minimal external supervision. Yet existing methods either depend on extensively curated or teacher-generated training data, or, when the generator runs unsupervised, reward it by a difficulty heuristic that need not improve the solver. We introduce INFUSER, an iterative co-training framework with two co-evolving roles: a Generator that drafts questions and reference golden answers from a pool of unstructured, automatically collected documents, and a Solver that improves by training on them. The solver is trained with standard correctness rewards against the generator-provided answers, while the generator is rewarded by an optimizer-aware influence score that measures whether each proposed question would actually improve the solver on the target distribution. Because this continuous, noisy influence score is poorly served by standard GRPO, we propose DuGRPO, a dual-normalized variant of GRPO, for generator training. Together, these turn the document pool into an adaptive curriculum that favors questions useful to the current solver, not just hard ones. On Qwen3-8B-Base, INFUSER outperforms strong self-evolution baselines with over 20% relative improvement on Olympiad and SuperGPQA benchmarks, and an 8B INFUSER co-evolving generator outperforms a frozen 32B thinking generator on math and coding. Ablations confirm each design choice is necessary, and two extensions, applying INFUSER to an instruction-finetuned anchor and augmenting it with rule-verifiable RLVR data, further demonstrate the flexibility and generalizability of the framework. Code is available at https://github.com/FFishy-git/INFUSER.

2606.09430 2026-06-09 cs.LG cs.AI 新提交

LargeMonitor: Monitoring Online Task-Free Continual Learning via Large Pretrained Models

LargeMonitor: 通过大型预训练模型监控在线无任务持续学习

Mingqi Yuan, Xiaoquan Sun, Shihao Luo, Jiayu Chen

发表机构 * HKU（香港大学）； Qicore Tech（启科科技）

AI总结提出LargeMonitor框架，利用大型预训练模型（LVM和LMM）解耦检测与诊断，实现无任务持续学习中的零样本漂移检测和语义病因诊断，提升现有算法性能。

详情

AI中文摘要

在线无任务持续学习（TFCL）要求智能体在严格单次遍历约束下，从无界、非平稳的数据流中顺序积累知识，且无显式任务标识。现有在线TFCL范式主要依赖于参数高效的提示调整或由训练耦合优化动态（如经验损失波动或潜在距离演变）驱动的动态结构扩展。因此，这些训练耦合求解器对分布漂移的结构起源不可知，机械地在根本不同的流变化上强制执行固定策略。为解决这一问题，我们提出LargeMonitor，一个利用大型预训练基础模型自主编排无任务连续适应的框架。具体而言，LargeMonitor引入一个解耦的检测模块，利用大型视觉模型（LVM）的冻结、稳定表示空间，实现鲁棒的零样本漂移检测，无需训练依赖的干扰或脆弱的阈值调整。在确认漂移后，该框架激活一个由大型多模态模型（LMM）驱动的上下文感知诊断模块，以解释流变化的精确语义病因（例如，新类出现 vs. 环境域偏移）。这种双阶段能力使连续学习者能够动态部署自适应且特定于漂移的优化策略。在多个TFCL设置和基准上的大量实验表明，LargeMonitor实现了对复杂数据流的精确、鲁棒检测和诊断，同时持续提升现有在线TFCL算法的性能。

英文摘要

Online task-free continual learning (TFCL) requires intelligent agents to sequentially accumulate knowledge from an unbounded, non-stationary data stream under strict single-pass constraints and without any explicit task identifiers. Existing online TFCL paradigms primarily rely on parameter-efficient prompt tuning or dynamic structure expansion driven by training-coupled optimization dynamics, such as empirical loss fluctuations or evolving latent distances. As a result, these training-coupled solvers remain agnostic to the structural origins of distribution drift, mechanically enforcing a fixed strategy across fundamentally distinct streaming variations. To address this gap, we propose LargeMonitor, a framework that leverages large pretrained foundation models to autonomously orchestrate task-free continuous adaptation. Specifically, LargeMonitor introduces a decoupled detection module utilizing the frozen, stable representation space of large vision models (LVMs) to achieve robust, zero-shot drift detection without training-dependent interference or brittle threshold tuning. Upon a confirmed drift, the framework activates a context-aware diagnostic module driven by large multimodal models (LMMs) to interpret the precise semantic etiologies of the stream variation (e.g., novel class emergence vs. environmental domain shift). This dual-stage capability empowers the continuous learner to dynamically deploy adaptive and shift-specific optimization strategies. Extensive experiments across multiple TFCL settings and benchmarks demonstrate that LargeMonitor achieves precise, robust detection and diagnosis of complex data streams while consistently improving the performance of existing online TFCL algorithms.

2606.09762 2026-06-09 cs.LG cs.AI 新提交

Preserving Plasticity in Continual Learning via Dynamical Isometry

通过动态等距保持持续学习中的可塑性

Andries Rosseau, Robert Müller, Ann Nowé

发表机构 * University of Amsterdam（阿姆斯特丹大学）； ETH Zurich（苏黎世联邦理工学院）

AI总结本文通过动态等距机制保持深度神经网络在持续学习中的可塑性，提出等距正则化方法和AdamO优化器，在多个基准上匹配或超越现有方法。

Comments ICML26

详情

Journal ref: Forty-Third International Conference on Machine Learning (ICML 2026)

AI中文摘要

深度神经网络在非平稳条件下的持续训练通常会导致可塑性逐渐丧失，最终限制进一步学习。我们将可塑性与经验神经正切核联系起来，并确定动态等距（即逐层雅可比奇异值保持接近1的条件）是保持持续学习中可塑性的关键机制。我们重新审视一类几乎处处等距且同时保持通用Lipschitz函数逼近能力的网络，证明近动态等距与表达性非线性表示兼容。对于通用架构，我们提出一种高效的等距促进正则化方案，并识别出一种可以重新激活休眠ReLU单元的新机制。在此基础上，我们引入AdamO，一种Adam风格的自适应优化器，将等距正则化与梯度更新解耦，类似于AdamW。我们进一步通过动态等距的视角重新解释先前的可塑性保持方法，表明它们仅针对等距的部分度量。在旨在诱导可塑性损失的监督和强化学习持续学习基准上，我们的方法一致地匹配或超越现有方法。

英文摘要

Continual training of deep neural networks under non-stationarity often leads to a progressive loss of plasticity, eventually limiting further learning. We relate plasticity to the empirical Neural Tangent Kernel, and identify dynamical isometry (the condition that layer-wise Jacobian singular values remain close to one) as a key mechanism for preserving plasticity in continual learning. We revisit a class of networks that are almost-everywhere isometric while remaining universal Lipschitz function approximators, demonstrating that near-dynamical isometry is compatible with expressive nonlinear representations. For general architectures, we propose an efficient isometry-promoting regularization scheme and identify a novel mechanism by which it can reactivate dormant ReLU units. Building on this, we introduce AdamO, an Adam-style adaptive optimizer that decouples isometry regularization from gradient updates, analogous to AdamW. We further reinterpret prior plasticity-preserving approaches through the lens of dynamical isometry, showing that they target only a partial measure of isometry. Across supervised and reinforcement-learning continual-learning benchmarks designed to induce plasticity loss, our methods consistently match or outperform existing approaches.
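
For a single linear layer the Jacobian is the weight matrix itself, so "singular values close to one" is equivalent to the Gram matrix W^T W being close to the identity. The sketch below computes one common orthogonality penalty, the squared Frobenius norm of W^T W - I. It is an assumption chosen for illustration: the abstract does not specify the paper's regularizer or how AdamO applies it, and the function name is hypothetical.

```python
# Illustrative orthogonality penalty; the paper's regularizer may differ.

def gram_deviation_penalty(weight):
    """Return ||W^T W - I||_F^2 for a weight matrix given as a list of rows."""
    rows, cols = len(weight), len(weight[0])
    penalty = 0.0
    for i in range(cols):
        for j in range(cols):
            # (W^T W)[i][j] is the dot product of columns i and j of W.
            gram_ij = sum(weight[k][i] * weight[k][j] for k in range(rows))
            target = 1.0 if i == j else 0.0
            penalty += (gram_ij - target) ** 2
    return penalty


identity = [[1.0, 0.0], [0.0, 1.0]]   # all singular values equal one
stretched = [[2.0, 0.0], [0.0, 1.0]]  # one singular value equals two
```

The penalty is zero exactly when the columns of W are orthonormal, and it grows as singular values move away from one.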

最强的教师并不总是最好的教师：以学生为中心的答案选择

Zhengyu Hu, Zheyuan Xiao, Linxin Song, Fengqing Jiang, Yuetai Li, Zhengyu Chen, Zhihan Xiong, Yue Liu, Junhao Lin, Yao Su, Lijie Hu, Kaize Ding, Teng Xiao, Radha Poovendran

发表机构 * University of Washington（华盛顿大学）； University of Texas at Austin（德克萨斯大学奥斯汀分校）； University of Southern California（南加州大学）； Independent Researcher（独立研究者）； National University of Singapore（新加坡国立大学）； Microsoft（微软）； Google（谷歌）； Mohamed bin Zayed University of Artificial Intelligence（穆罕默德·本·扎耶德人工智能大学）； Northwestern University（西北大学）； Allen Institute for AI (AI2)（艾伦人工智能研究所（AI2））

AI总结提出以学生为中心的答案采样（SCAS）框架，通过估计学生中心的学习成本选择教师生成的答案，从而提升学生模型性能。

详情

AI中文摘要

LLM训练越来越依赖教师生成的监督，包括合成响应、推理轨迹和工具使用演示。当前实践通常选择表现最好的教师来生成学生训练数据，隐含地将教师测试表现视为教学质量的代理。我们表明这一假设可能失败：即使多个教师对同一问题提供正确答案，最强教师的答案也不一定是对给定学生的最佳监督。为解决这一问题，我们提出以学生为中心的答案采样（SCAS），该框架根据估计的学生中心学习成本从经过验证的教师生成答案中进行选择。受逐词梯度分解的启发，我们推导出该成本的高效纯前向代理，并在训练中用于指导答案选择。在30个教师模型、6个学生基础模型和6个任务上的实验表明，SCAS持续提升学生性能，表明有效的蒸馏应优先考虑与当前学生匹配的监督，而非仅依赖教师强度。

英文摘要

LLM training increasingly relies on teacher-generated supervision, from synthetic responses to reasoning traces and tool-use demonstrations. Current practice often chooses the highest-performing teacher to generate student training data, implicitly treating teacher test performance as a proxy for teaching quality. We show that this assumption can fail: even when multiple teachers provide correct answers to the same question, the answer from the strongest teacher is not necessarily the best supervision for a given student. To address this gap, we propose Student-Centric Answer Sampling (SCAS), a framework that selects from verified teacher-generated answers according to their estimated student-centric learning cost. Motivated by a token-wise gradient decomposition, we derive an efficient forward-only proxy for this cost and use it to guide answer selection during training. Experiments across 30 teacher models, 6 student base models, and 6 tasks show that SCAS consistently improves student performance, suggesting that effective distillation should prioritize supervision matched to the current student rather than teacher strength alone.

2606.01379 2026-06-09 cs.LG 版本更新

Turning Back Without Forgetting: Selective Backward Refinement for Parameter-Efficient Continual Learning

在不遗忘的情况下回溯：面向参数高效持续学习的选择性反向精炼

Anushka Tiwari, Kaiyi Ji

发表机构 * University of California, Berkeley（加州大学伯克利分校）

AI总结提出SABER框架，通过基于提示梯度几何和损失分布相似性的任务相关性准则，在提示型参数高效持续学习中实现受控的正向反向知识迁移，无需重放。

Comments Accepted at ICML 2026

详情

AI中文摘要

虽然基于提示的参数高效持续学习通过隔离任务特定提示来缓解灾难性遗忘，但这种隔离也限制了后续任务改进先前任务，导致反向知识迁移未被充分探索。我们通过提出选择性反向精炼以实现正向反向知识迁移（SABER）来解决这一限制，这是一个无需重放的框架，能够在基于提示的持续学习中实现受控的反向迁移。SABER利用基于提示梯度几何和损失分布相似性的互补任务相关性准则，判断何时进行反向精炼有益，并通过将更新限制在提示参数空间中的非干扰方向来安全执行精炼。在多个持续学习基准和不同预训练骨干网络（包括T5-Large、LLaMA和Qwen）上的大量实验表明，SABER在保持强大整体平均性能的同时，持续实现正向反向迁移。代码可在https://github.com/OptMN-Lab/SABER-ICML-2026/获取。

英文摘要

While prompt-based parameter-efficient continual learning mitigates catastrophic forgetting by isolating task-specific prompts, this isolation also limits later tasks from improving earlier ones, leaving backward knowledge transfer underexplored. We address this limitation by proposing Selective bAckward refinement for positive Backward knowledge transfER (SABER), a replay-free framework that enables controlled backward transfer in prompt-based continual learning. SABER determines when backward refinement is beneficial using complementary task-correlation criteria based on prompt-gradient geometry and loss-distribution similarity, and how to perform refinement safely by restricting updates to non-interfering directions in the prompt parameter space. Extensive experiments across multiple continual learning benchmarks and diverse pretrained backbones, including T5-Large, LLaMA, and Qwen, demonstrate that SABER consistently achieves positive backward transfer while maintaining strong overall average performance. Code is available at https://github.com/OptMN-Lab/SABER-ICML-2026/.

2605.16309 2026-06-09 cs.AI cs.LG cs.MA 版本更新

ANNEAL: Adapting LLM Agents via Governed Symbolic Patch Learning

ANNEAL：通过受控符号补丁学习适应大语言模型代理

Safayat Bin Hakim, Keyan Guo, Wenkai Tan, Alvaro Velasquez, Shouhuai Xu, Houbing Herbert Song

发表机构 * University of Maryland, Baltimore County（马里兰大学巴尔的摩县分校）； University at Buffalo（布法罗大学）； University of Colorado Boulder（科罗拉多大学博尔德分校）； University of Colorado Colorado Springs（科罗拉多大学科罗拉多斯普林斯分校）

AI总结ANNEAL通过受控符号补丁学习适应大语言模型代理，解决重复故障问题，其核心机制FDKA能定位责任操作符并生成类型化补丁，实现持久结构修复，优于现有方法。

Comments Code Implementation: https://github.com/sbhakim/anneal-agents

详情

AI中文摘要

基于大语言模型的代理可以从单次执行错误中恢复，但当底层过程知识（操作符模式、前置条件和约束）未被修复时，它们会在同一故障上反复失败。现有自我进化方法通过更新提示、记忆或模型权重来弥补这一差距，但都未直接修复编码任务执行方式的符号结构，也很少提供安全部署所需的治理保证。我们引入ANNEAL，一种神经符号代理，它将重复出现的失败转化为对过程知识图谱的受控符号编辑，而无需修改基础模型权重。其核心机制故障驱动知识获取（FDKA）定位责任操作符，通过受约束的LLM生成合成类型化补丁，并在提交前通过多维评分、符号护栏和金丝雀测试验证该提案。每条被接受的编辑都带有完整溯源和确定性回滚能力。在四个领域和27次多种子运行中，ANNEAL是唯一提交持久结构修复的被评估系统：ReAct和Reflexion等强基线虽能实现较高的单回合恢复率，但在重复故障上仍保留72%至100%的留出失败率，而ANNEAL在所测试的重复故障设置中将其降至0%。消融实验证实，移除FDKA会消除所有结构修复，并使成功率最多下降26.7个百分点。这些结果表明，受控符号修复为持续故障消除提供了一种与权重级和提示级适应互补的范式。

英文摘要

LLM-based agents can recover from individual execution errors, yet they repeatedly fail on the same fault when the underlying process knowledge--operator schemas, preconditions, and constraints--remains unrepaired. Existing self-evolving approaches address this gap by updating prompts, memory, or model weights, but none directly repair the symbolic structures that encode how tasks are executed, and few provide the governance guarantees required for safe deployment. We introduce ANNEAL, a neuro-symbolic agent that converts recurring failures into governed symbolic edits of a process knowledge graph without modifying foundation model weights. Its core mechanism, Failure-Driven Knowledge Acquisition (FDKA), localizes the responsible operator, synthesizes a typed patch through constrained LLM generation, and validates the proposal via multi-dimensional scoring, symbolic guardrails, and canary testing before commit. Every accepted edit carries full provenance and deterministic rollback capability. Across four domains and 27 multi-seed runs, ANNEAL is the only evaluated system that commits persistent structural repairs--strong baselines such as ReAct and Reflexion achieve high episodic recovery yet retain 72--100% holdout failure rates on recurring faults, whereas ANNEAL reduces these to 0% in the tested recurring-failure settings. Ablation confirms that removing FDKA eliminates all structural repairs and drops success rate by up to 26.7 percentage points. These results suggest that governed symbolic repair offers a complementary paradigm to weight-level and prompt-level adaptation for persistent fault elimination.

2605.30407 2026-06-09 cs.CL cs.AI cs.IR cs.LG 版本更新

Exploring Autonomous Agentic Data Engineering for Model Specialization

探索用于模型专业化的自主智能体数据工程

Yujie Luo, Xiangyuan Ru, Jingsheng Zheng, Jingjing Wang, Yuqi Zhu, Jintian Zhang, Runnan Fang, Kewei Xu, Ye Liu, Zheng Wei, Jiang Bian, Zang Li, Shumin Deng

发表机构 * Zhejiang University（浙江大学）； Platform and Content Group, Tencent（腾讯平台与内容部）

AI总结本文提出自主智能体数据工程任务，让LLM作为自主数据工程师，通过端到端数据策划驱动模型专业化，实验显示GPT-5.2通过迭代数据适应使学生模型性能提升57.29%。

Comments Work in progress

详情

AI中文摘要

大型语言模型（LLM）在通用任务上表现出色，但往往难以适应没有高质量领域特定数据的专业领域。现有的基于LLM的数据策划方法主要依赖人工设计的工作流程，尚未检验LLM能否自主执行端到端的数据工程流水线以实现模型专业化。我们形式化了自主智能体数据工程，这是一个新任务，旨在评估LLM作为自主数据工程师，通过端到端数据策划驱动模型专业化。我们将数据视为可优化组件，研究能够跨多个领域规划、生成和迭代优化训练数据的智能体，并以训练后性能提升为指导。实验表明，自主LLM数据工程师带来了显著收益，GPT-5.2构建的训练课程使学生模型性能提升了57.29%，完全通过迭代的智能体驱动数据适应实现。通过揭示潜力和瓶颈，我们的研究将自主数据工程确立为一种可衡量的能力，并为智能体驱动的模型专业化指明了道路（代码将在https://github.com/zjunlp/DataAgent发布）。

英文摘要

Large Language Models (LLMs) have demonstrated strong performance on general tasks, while often struggling to adapt to specialized domains without high-quality domain-specific data. Existing LLM-based data curation methods primarily rely on human-designed workflows, leaving it unexamined whether LLMs can autonomously execute an end-to-end data engineering pipeline for model specialization. We formalize Autonomous Agentic Data Engineering, a novel task designed to evaluate LLMs as autonomous data engineers that drive model specialization through end-to-end data curation. We frame data as an optimizable component and study agents that plan, generate, and iteratively optimize training data across multiple domains, guided by post-training performance improvement. Experiments show that autonomous LLM data engineers yield substantial gains, as GPT-5.2 constructs a training curriculum that improves a student model by 57.29%, entirely through iterative, agent-driven data adaptation. By illuminating both potential and bottlenecks, our study establishes autonomous data engineering as a measurable capability and charts a path toward agent-driven model specialization (Code will be released at https://github.com/zjunlp/DataAgent).

2606.07550 2026-06-09 cs.LG cs.AI 新提交

Offline Reinforcement Learning for Plasma Control in Nuclear Fusion: Codebase and Benchmark

核聚变等离子体控制的离线强化学习：代码库与基准

Yang Fu, Haomin Bao, Rohit Sonker, Xiaoyan Hu, Aravind Venugopal, Jeff Schneider, Jiayu Chen

发表机构 * Central South University（中南大学）； Chongqing University（重庆大学）； Carnegie Mellon University（卡内基梅隆大学）； The University of Hong Kong（香港大学）

AI总结提出RL4F基准，基于DIII-D托卡马克历史数据构建评估环境，比较多种离线RL方法在等离子体控制任务上的性能，发现基于模型的离线RL方法平均表现最佳。

Comments 23 pages (10 pages main text)

详情

AI中文摘要

离线强化学习（RL）为从历史托卡马克数据开发等离子体控制器提供了一条有前景的途径，因为在真实设备上进行在线试错成本高昂且风险巨大。然而，由于缺乏针对核聚变中现实多执行器、长时域等离子体控制问题的标准化离线RL基准，这一方向的进展仍然难以衡量。我们引入了RL4F，一个用于核聚变等离子体控制的离线强化学习基准，提供了闭环评估环境和四个全剖面跟踪任务（旋转、密度、温度和压力）的基线比较。评估环境背后的动力学函数基于真实托卡马克DIII-D的历史放电数据构建。我们在统一协议下评估了广泛的模仿学习和离线RL基线。我们发现，基于模型的离线RL方法在大多数目标上获得了最佳平均性能，尽管没有单一方法在所有任务中占主导地位，这突显了动力学建模在复杂、长时域等离子体控制任务中的重要性。为了促进进一步研究，我们开源了代码库、数据集和评估框架，不仅为聚变社区，也为离线RL的算法开发提供了一个基准。

英文摘要

Offline reinforcement learning (RL) offers a promising route for developing plasma controllers from historical tokamak data, since online trial-and-error on real devices is costly and risky. However, progress in this direction remains difficult to measure due to the lack of a standardized offline RL benchmark for realistic multi-actuator, long-horizon plasma control problems in nuclear fusion. We introduce RL4F, an Offline Reinforcement Learning Benchmark for Plasma Control in Nuclear Fusion, providing closed-loop evaluation environments and baseline comparisons across four full-profile tracking tasks: rotation, density, temperature, and pressure. The dynamics function underlying the evaluation environment is built from historical discharge data from DIII-D, a real-world Tokamak. We evaluate a broad set of imitation learning and offline RL baselines under a unified protocol. We find that offline model-based RL methods obtain the best average performance on most objectives, although no single method dominates all tasks, highlighting the importance of dynamics modeling in complex, long-horizon plasma control tasks. To foster further research, we open-source the codebase, datasets, and evaluation framework, providing a benchmark not only for the fusion community but also for algorithm development in offline RL.

2606.07553 2026-06-09 cs.LG cs.AI 新提交

MedicalRec: Medical recommender system for image classification without retraining

MedicalRec：无需重新训练的图像分类医疗推荐系统

Roghayeh Taghavi, Aysa Hasanazde Bashkandi, Amir Ali Bengari, Mohammad Amin Raji, Mohammad Salahi Ardekani, Parisa Mardukhian, Parvaneh Rezaei, Ramin Mousa

发表机构 * University of Tehran（德黑兰大学）

AI总结提出基于Transformer的医疗推荐系统MedicalRec，利用从3000篇论文中构建的MedicalRec-Bench数据集（含5000+记录），无需重新训练即可为医疗图像分类任务推荐最优模型，最高HitRate@100达75.5%。

详情

AI中文摘要

机器学习和深度学习的出现彻底改变了医疗保健中诊断、治疗和管理系统的效率。然而，这种快速采用是以需要大量计算能力和能源消耗以及电子垃圾处理和碳排放为代价的。这些模型的挑战之一是为分类任务选择合适的模型。为此，研究人员尝试通过试错法使用他们的数据来确定最佳模型，这涉及能源消耗和浪费。本研究的目标是开发一个基于模型的医疗图像分类推荐系统。为此，从3000篇医疗图像分类领域的文章中收集了一个数据集。该数据集以MedicalRec-Bench的名称公开可用，包含超过5000条在各种任务中测试的模型记录，包括皮肤癌分类、肿瘤分类、伤口分类、乳腺癌和MRI分类。根据特征数量，数据集在四种不同模式下进行评估：MedicalRec I（5个特征）、MedicalRec II（9个特征）、MedicalRec III（11个特征）和MedicalRec IV（18个特征）。由于作者未报告，收集所有特征值具有挑战性；因此，数据集包含大量缺失值。医疗推荐系统（MedicalRec）是一个基于Transformer的模型，用于本研究中的项目推荐。该模型在数据集评估和与12个基础模型的评估中取得了显著成果。该模型实现了最高HitRate@100为75.5%。数据集和实现可通过GitHub链接获取：https://github.com/Ramin1Mousa/MedicalRec

英文摘要

The emergence of machine learning and deep learning has revolutionized the efficiency of diagnostic, therapeutic, and administrative systems in healthcare. However, this rapid adoption has come at the cost of requiring significant computing power and energy consumption, as well as e-waste disposal and carbon emissions. One of the challenges of these models is choosing the right model for classification tasks. To this end, researchers attempt to identify the optimal model using their data through trial and error, which involves energy consumption and waste. The goal of this study is to develop a model-based recommender system for medical image classification. For this purpose, a data set was collected from 3,000 articles in the field of medical image classification. This dataset, publicly available under the name MedicalRec-Bench, contains over 5,000 records of models tested in various tasks, including Skin Cancer Classification, Tumour Classification, Wound Classification, Breast Cancer, and MRI classification. The dataset was evaluated in four different modes, depending on the number of features: MedicalRec I (5 features), MedicalRec II (9 features), MedicalRec III (11 features), and MedicalRec IV (18 features). Collecting all values for the features is challenging due to non-reporting by the authors; hence, the dataset contains significant amounts of missing values. The Medical Recommender System (MedicalRec) is a transformer-based model used for item recommendations in this study. This model achieved remarkable results in the evaluation on the dataset and in the evaluation with 12 base models. This model achieved a maximum HitRate@100 of 75.5%. The dataset and implementations are available through the GitHub link: https://github.com/Ramin1Mousa/MedicalRec
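
The HitRate@100 figure quoted above is an instance of the standard hit-rate-at-k metric: the fraction of queries whose relevant item appears among the top k recommendations. The following is a minimal sketch of that standard definition, assuming one relevant item per query. The function name and toy data are hypothetical, and the paper's exact evaluation protocol may differ.

```python
# Standard HitRate@k with one relevant item per query; toy data is hypothetical.

def hit_rate_at_k(ranked_lists, relevant_items, k):
    """Fraction of queries whose relevant item is within the top-k ranked items."""
    hits = sum(
        1
        for ranked, relevant in zip(ranked_lists, relevant_items)
        if relevant in ranked[:k]
    )
    return hits / len(ranked_lists)


# Two queries, each with a ranked list of candidate model names.
ranked_lists = [["a", "b", "c"], ["b", "c", "a"]]
relevant_items = ["c", "b"]
```

With k = 2 only the second query counts as a hit, so the metric is 0.5. With k = 3 both queries are hits.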

2606.07587 2026-06-09 cs.LG 新提交

The Routing Plateau: Understanding and Breaking the Accuracy Limits of LLM Routers

路由平台：理解并突破LLM路由器的准确性极限

Yifan Lu, Qiyue Zhang, Shenrun Zhang, Zhibo Yu, Zhuang Wang, Hanjie Chen, Jiarong Xing

发表机构 * Rice University（莱斯大学）； Amazon（亚马逊）

AI总结研究发现多种LLM路由方法存在“路由平台”现象，即准确性趋同且远低于理想路由器，主要原因是可预测性瓶颈；通过增大训练数据、更强编码器和端到端微调可突破平台。

Comments 23 Pages, 12 Tables, 9 Figures

详情

AI中文摘要

LLM路由已成为一种流行的方法，通过为每个查询动态选择模型来改善LLM服务的成本-质量权衡。最近的工作探索了广泛的路由方法，包括基于聚类的路由器、学习分类器、成对排序和基于置信度的方法。我们对五个基准测试中的21种路由方法的广泛研究揭示了一个一致的现象，我们称之为路由平台：许多方法，包括kNN，实现了非常相似的准确性，并收敛到一个狭窄的性能范围，远低于理想路由器。我们的研究表明，平台主要是由可预测性瓶颈引起的：当前路由器主要学习全局平均模型性能趋势，而不是细粒度的查询特定路由信号。因此，它们解决了重叠的简单查询，但共同在需要实例特定路由决策的困难查询上失败。我们进一步研究如何超越平台，发现更大的训练数据集、更强的编码器和端到端微调可以进一步提高路由准确性。这些发现表征了当前路由方法的常见限制，并为社区构建更有效的路由系统提供了见解和可操作的方向。

英文摘要

LLM routing has become a popular approach to improve the cost-quality trade-off of LLM services by dynamically selecting a model for each query. Recent work has explored a broad range of routing methods, including clustering-based routers, learned classifiers, pairwise ranking, and confidence-based approaches. Our extensive study of 21 routing methods across five benchmarks reveals a consistent phenomenon that we call the routing plateau: many methods, including kNN, achieve very similar accuracy and converge to a narrow performance range that remains far below the oracle router. Our investigation shows that the plateau is largely caused by a predictability bottleneck: current routers mainly learn global averaged model-performance trends rather than fine-grained query-specific routing signals. As a result, they solve overlapping easy queries but collectively fail on hard queries that require instance-specific routing decisions. We further study how to move beyond the plateau and find that larger training datasets, stronger encoders, and end-to-end fine-tuning can further improve routing accuracy. These findings characterize the common limits of current routing methods and provide insights and actionable directions for the community to build more effective routing systems.
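
The gap between a router and the oracle router can be made concrete with a per-query correctness table. This is a hedged sketch, assuming accuracy is measured as the fraction of queries answered correctly and that the oracle always picks a model that answers correctly when one exists. The function names and the toy table are hypothetical and are not taken from the paper's benchmarks.

```python
# Toy illustration of the oracle upper bound; data and names are hypothetical.

def oracle_accuracy(correct):
    """Fraction of queries that at least one candidate model answers correctly."""
    return sum(1 for row in correct if any(row)) / len(correct)


def best_single_model_accuracy(correct):
    """Accuracy of always routing to the single best model on average."""
    n_models = len(correct[0])
    return max(
        sum(row[m] for row in correct) / len(correct) for m in range(n_models)
    )


# Rows are queries, columns are models; 1 means the model answers correctly.
correct = [
    [1, 0],
    [0, 1],
    [1, 1],
    [0, 0],
]
```

A router that only learns global average performance can do no better than the best single model (0.5 here), while the oracle reaches 0.75 by making query-specific choices on the first two rows.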

2606.07597 2026-06-09 cs.LG cs.AI 新提交

Repetition Mismatch: Why Data Mixture Experiments Don't Scale and How to Fix Them

重复不匹配：为什么数据混合实验无法扩展以及如何修复

Kevin Zhou, Lisa Alazraki, Kris Cao, Marek Rei

发表机构 * Imperial College London（帝国理工学院）； Cohere

AI总结针对预训练数据混合中因高质量数据重复率变化导致的小规模实验外推失败问题，提出重复控制子采样方法，在1/16目标token预算下实现接近最优混合，揭示了重复动态而非规模决定实验泛化性。

详情

AI中文摘要

预训练数据混合通常通过运行小规模实验并外推到目标训练预算来调整。当高质量数据稀缺且必须重复时，这种外推经常失败，但失败的原因尚未被隔离。我们表明，一个主要原因是重复不匹配：由于高质量数据集很小，它们的重复率随着训练预算的增长而变化，以小规模代理实验未预期的方式改变最优混合。一种匹配目标重复率的子采样程序可以控制这种效应。在结合有限高质量数据和网络爬取的双源设置中，仅使用目标token的1/16的单一重复控制实验即可恢复757M参数模型的最优混合，误差在0.05以内，而无重复控制时误差为0.75。在没有重复控制的情况下达到相当的精度需要三到四个视野，消耗目标token预算的44%到94%。对于三个数据源，更大的混合空间需要不止一个实验来约束，但该方法仍然有效：在757M规模下，仅两个重复控制视野即可恢复最优混合，优于需要完整双源实验构建的基线。我们的结果表明，重复动态（而非仅规模）决定了小规模混合实验是否泛化。更广泛地说，它们表明数据重复应被视为混合优化中的第一类变量，而不是有限数据的不便副作用。

英文摘要

Pre-training data mixtures are commonly tuned by running small-scale experiments and extrapolating to the target training budget. When high-quality data is scarce and must be repeated, this extrapolation frequently fails, but the source of the failure has not been isolated. We show that a primary culprit is a repetition mismatch: because high-quality datasets are small, their repetition rate changes as the training budget grows, shifting the optimal mixture in ways that small-scale proxy experiments do not anticipate. A subsampling procedure that matches the target repetition rate controls for this effect. In a two-source setting combining limited high-quality data with web crawl, a single repetition-controlled experiment using only 1/16 of the target tokens recovers a mixture within 0.05 of the optimum for a 757M parameter model, compared to an error of 0.75 without repetition control. Achieving comparable accuracy without repetition control requires three to four horizons, consuming 44 to 94% of the target token budget. With three data sources, the larger mixture space requires more than a single experiment to constrain, but the approach remains effective: at the 757M scale, just two repetition-controlled horizons recover the optimal mixture, outperforming baselines that instead require the full two-source experiments to construct. Our results reveal that repetition dynamics, not scale alone, shape whether small-scale mixture experiments generalize. More broadly, they suggest that data repetition deserves treatment as a first-class variable in mixture optimization, rather than an inconvenient side effect of limited data.
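
The repetition mismatch can be seen with simple arithmetic: if a source receives a fraction w of a token budget T and contains D unique tokens, it is seen w·T/D times. Shrinking T without shrinking D lowers the repetition rate. The sketch below assumes that matching the rate means subsampling the source in proportion to the budget. The function names and all numbers are hypothetical, and the paper's subsampling procedure may differ in detail.

```python
# Repetition-rate arithmetic; all numbers and names are hypothetical.

def repetitions(mixture_weight, token_budget, dataset_tokens):
    """Number of passes over a source that gets `mixture_weight` of the budget."""
    return mixture_weight * token_budget / dataset_tokens


def matched_subsample_tokens(dataset_tokens, proxy_budget, target_budget):
    """Subsample size that keeps the repetition rate equal at the proxy budget."""
    return dataset_tokens * proxy_budget / target_budget


hq_tokens = 1_000_000_000        # unique high-quality tokens
target_budget = 16_000_000_000   # full training budget
proxy_budget = target_budget // 16
weight = 0.25                    # share of the budget given to the source

target_reps = repetitions(weight, target_budget, hq_tokens)
naive_reps = repetitions(weight, proxy_budget, hq_tokens)
subsample = matched_subsample_tokens(hq_tokens, proxy_budget, target_budget)
controlled_reps = repetitions(weight, proxy_budget, subsample)
```

In this toy setting the full run repeats the high-quality source 4 times, the naive 1/16-budget proxy sees only a quarter of it once, and the subsampled proxy restores 4 repetitions.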

2606.07607 2026-06-09 cs.LG q-bio.GN 新提交

Position: Genomic Model Research Must Move Beyond Anecdotal Evaluation of Interpretability Methods

立场：基因组模型研究必须超越可解释性方法的轶事评估

Shasha Zhou, Mingyu Huang, Ke Li

发表机构 * University of California, Berkeley（加州大学伯克利分校）； Stanford University（斯坦福大学）

AI总结本文通过转录因子结合基准测试，揭示不同可解释性方法常产生矛盾解释、无法定位已知调控基序且不能忠实反映模型决策，主张采用类似临床试验的系统验证框架。

详情

AI中文摘要

机器学习和计算能力的进步释放了人类基因组的预测潜力，但生物学家现在要求这些模型也能阐明潜在的生物学机制。尽管可解释机器学习（IML）技术已被越来越多地用于弥合这一差距，但普遍存在对轶事验证的依赖：绝大多数研究仅依赖单一IML方法，并仅报告孤立的成功实例。通过对转录因子结合的基准测试，我们展示了当前实践的风险。我们表明，不同的IML方法通常可能（1）对同一预测产生矛盾的解释，（2）无法定位已知的调控基序，以及（3）未能忠实反映模型的内部决策过程。鉴于此，我们主张建立一个类似于临床试验的验证框架：正如试验需要严格的设计和不良事件报告，基因组可解释性必须超越挑选的合理性，转向对一致性、忠实性和生物学有效性的系统评估。为促进这一点，我们提出了一个分层框架，以指导基因组IML方法的严格评估和报告。

英文摘要

Advances in machine learning and computational power have unlocked the predictive potential of the human genome, yet biologists now demand that these models also elucidate the underlying biological mechanisms. While interpretable machine learning (IML) techniques have been increasingly applied to bridge this gap, there has been a pervasive reliance on anecdotal validation: the vast majority of research relies on a single IML method and reports only isolated successful instances. Through a benchmarking study on transcription factor binding, we demonstrate the risks of current practices. We show that different IML methods can often (1) yield contradictory explanations for the same predictions, (2) fail to localize known regulatory motifs, and (3) fail to faithfully reflect the model's internal decision process. In light of this, we argue for a validation framework analogous to clinical trials: just as trials require rigorous design and adverse-event reporting, genomic interpretability must move beyond cherry-picked plausibility toward systematic assessment of consistency, faithfulness, and biological validity. To facilitate this, we propose a tiered framework to guide rigorous evaluation and reporting of genomic IML methods.

2606.07616 2026-06-09 cs.LG cs.AI cs.CL 新提交

Item Response Scaling Laws: A Measurement Theory Approach for Efficient and Generalizable Neural Scaling Estimation

项目反应缩放定律：一种高效且可泛化的神经缩放估计的测量理论方法

Sang Truong, Yuheng Tu, Rylan Schaeffer, Sanmi Koyejo

AI总结提出项目反应缩放定律（IRSL），将项目反应理论融入缩放定律框架，通过Beta-IRT模型利用语言模型的概率响应，将参数复杂度从O(M×N)降至O(M+N)，在预训练和测试时缩放场景中仅用50个问题即可实现可靠估计。

详情

AI中文摘要

缩放定律为理解语言模型（LM）的性能提供了基本框架，但推导它们需要在数千个检查点或数百万个推理样本上进行成本高昂的评估。为了解决这个问题，我们引入了项目反应缩放定律（IRSL），这是一个将项目反应理论（IRT）整合到缩放定律框架中的统一框架。与将每个模型-基准对单独处理的传统方法不同，IRSL将潜在模型能力与问题特征分离，将M个模型和N个问题的缩放定律估计分解，从而将参数复杂度从O(M×N)显著降低到O(M+N)。我们使用Beta-IRT实例化IRSL，它利用LM的经验概率响应——例如预训练中的token概率和测试时采样中的通过率——来捕获比二元响应更丰富的信号。我们在两种常见的缩放范式上验证了我们的方法：（1）预训练下游缩放，使用来自10个基准的6,612个LM检查点和37,682个问题；以及（2）测试时缩放，使用来自4个基准的12个LM和120个问题，每个问题最多2,500个样本。在现有模型响应上进行一次性校准后，IRSL仅使用每个基准50个问题（减少99.9%）即可产生更可靠的缩放估计，达到与传统方法相当或更优的决策准确性。此外，我们表明估计的潜在模型能力是可泛化的，从而能够跨共享相同测量目标的基准进行准确的性能预测。

英文摘要

Scaling laws provide a fundamental framework for understanding the performance of Language Models (LMs), yet deriving them requires prohibitively expensive evaluations across thousands of checkpoints or millions of inference samples. To address this, we introduce Item Response Scaling Laws (IRSL), a unified framework that integrates Item Response Theory (IRT) within the scaling law framework. Unlike traditional approaches that treat each model-benchmark pair in isolation, IRSL disentangles latent model ability from question characteristics, factorizing the scaling law estimation for M models and N questions to significantly reduce parameter complexity from O(M×N) to O(M+N). We instantiate IRSL with Beta-IRT, which leverages the empirical probability responses of LMs -- such as token probabilities in pre-training and pass rates in test-time sampling -- to capture richer signals than binary responses. We validate our approach across two prevalent scaling paradigms: (1) pre-training downstream scaling, using 6,612 LM checkpoints and 37,682 questions from 10 benchmarks; and (2) test-time scaling, using 12 LMs and 120 questions from 4 benchmarks with up to 2,500 samples per question. Given a one-time calibration on existing model responses, IRSL yields more reliable scaling estimates using only 50 questions per benchmark (a 99.9% reduction), achieving comparable or superior decision accuracy to traditional approaches. Furthermore, we show that the estimated latent model abilities are generalizable, enabling accurate performance forecasting across benchmarks that share the same measurement objective.
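
To make the complexity claim concrete, the sketch below compares the two parameter counts for the checkpoint and question totals quoted in the abstract. It assumes exactly one parameter per model-question pair in the unfactorized case and one ability per model plus one difficulty per question in the factorized case. The logistic link in `expected_response` is a generic IRT-style stand-in chosen for illustration, not the paper's Beta-IRT parameterization.

```python
import math


def per_pair_parameter_count(num_models, num_questions):
    """One free parameter for every (model, question) pair: O(M x N)."""
    return num_models * num_questions


def factorized_parameter_count(num_models, num_questions):
    """One ability per model plus one difficulty per question: O(M + N)."""
    return num_models + num_questions


def expected_response(ability, difficulty):
    """Hypothetical logistic link from (ability - difficulty) to a response probability."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


num_models, num_questions = 6_612, 37_682  # totals quoted in the abstract
pairwise = per_pair_parameter_count(num_models, num_questions)
factorized = factorized_parameter_count(num_models, num_questions)
```

The gap between the two counts is what lets a small calibrated question subset stand in for a full benchmark once abilities and difficulties have been estimated.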

2606.07630 2026-06-09 cs.LG cs.AI stat.ML 新提交

Active Learning with Foundation Model Priors: Efficient Learning under Class Imbalance

基于基础模型先验的主动学习：类别不平衡下的高效学习

Jiancheng Zhang, Meiqing Li, Qi Zhang, Yinglun Zhu

发表机构 * University of California, Riverside（加州大学河滨分校）； Carnegie Mellon University（卡内基梅隆大学）； Worcester Polytechnic Institute（伍斯特理工学院）

AI总结针对现实数据中的类别不平衡和噪声标注问题，提出一种利用基础模型先验的主动学习框架，通过不平衡感知的协同决策选择信息量最大的样本，在图像和文本数据集上实现超过50%的标注节省。

Comments To appear at ICML 2026

详情

AI中文摘要

现实世界中图像和文本领域的数据集通常具有偏斜的类别分布和噪声标注，这共同降低了模型性能，尤其是对少数类。在现有解决方案中，主动学习通过选择性地查询信息最丰富且平衡的样本进行标注，提供了一种有效且高效的范式。我们提出了一种创新的主动学习框架，该框架减轻了类别不平衡，并选择信息量最大的样本进行标注。利用基础模型先验，我们的算法使得基础模型和小模型之间能够进行不平衡感知的协同决策，以处理跨领域的有噪声和不平衡标签。我们首次系统性地研究了在图像和文本领域中标签噪声和类别不平衡双重挑战下的主动学习。在不平衡数据集上的大量实验表明，我们的方法实现了显著的标注节省——与最佳主动学习基线相比超过50%——同时保持了对标签噪声的性能和鲁棒性。

英文摘要

Real-world datasets across image and text domains are often characterized by skewed class distributions and noisy annotations, which jointly degrade model performance, particularly on minority classes. Among existing solutions, active learning offers an effective and efficient paradigm by selectively querying the most informative and balanced samples for annotation. We propose an innovative active learning framework that mitigates class imbalance and selects the most informative samples to annotate. Leveraging foundation model priors, our algorithm enables imbalance-aware co-decisions between a foundation model and a small model to tackle noisy and imbalanced labels across various domains. We introduce the first study to systematically explore active learning under the dual challenges of label noise and class imbalance across image and text domains. Extensive experiments on imbalanced datasets demonstrate that our method achieves substantial annotation savings (over 50% compared to the best active learning baseline) while preserving performance and robustness to label noise.

2606.07632 2026-06-09 cs.LG 新提交

Evaluation of ML Resource Utilization Requires Model Life Cycle Assessment

评估机器学习资源利用需要模型生命周期评估

Jared Fernandez, Clara Na, Yonatan Bisk, Constantine Samaras, Emma Strubell

AI总结本文提出应用生命周期评估方法全面核算AI系统从硬件制造到训练推理的全链条资源消耗与环境影响，以弥补传统单一训练或推理成本评估的不足。

Comments ICML 2026: Position Paper Track

2606.07690 2026-06-09 cs.LG cs.AI 新提交

使用PCA和核PCA的航空公司聚类分析中的正交性与维度性

Andreas Schlapbach

发表机构 * Swiss Federal Railways (SBB)（瑞士联邦铁路（SBB））； University of Berne（伯尔尼大学）

AI总结本文复现了Renold等人对1995-2020年美国航空公司利润周期的聚类实验，通过PCA和核PCA分析，发现六聚类分类在原始7维和3维PC空间中具有几何鲁棒性，并验证了数据的内在线性流形结构。

详情

AI中文摘要

为了刻画1995年至2020年美国航空公司的利润周期，Renold等人（2023）结合了k-means聚类、主成分分析和系统动力学建模。我们在三个空间中复现了他们的聚类实验——原始7维变量空间、3维PC得分空间和4维PC得分空间，使用了他们论文中慷慨包含的数据集。我们表明，六聚类分类在几何上是鲁棒的：在3-PC空间中的k-means产生的聚类分配与7维原始空间逐位相同。作为非线性检验，我们在六个核（涵盖三个族加上一个线性基线）下应用核PCA。所有六个核在2D中保留了六聚类分配。一个1D诊断进一步收紧：线性核将COVID年份C_3与峰值利润聚类C_0混淆，而所有五个非基线核将C_3移动到仅与后金融危机聚类C_5重叠。核族之间的一致性证实了一个内在的线性流形，没有隐藏的曲率。轮廓准则显示，该数据集在结构上仅支持三个聚类，而不是六个。原始7D空间中的共线性抑制了本应识别k=3作为结构上合理选择的轮廓信号。

英文摘要

To characterize the US airline profit cycles from 1995 to 2020, Renold et al. (2023) combine k-means clustering, principal component analysis, and system dynamics modelling. We replicate their clustering experiment in three spaces -- the original 7-dimensional raw-variable space, a 3-dimensional PC score space, and a 4-dimensional PC score space -- using the dataset that they kindly included in their paper. We show that the six-cluster taxonomy is geometrically robust: k-means in 3-PC space produces bit-for-bit identical cluster assignments relative to 7D raw space. As a nonlinearity check, we apply kernel PCA under six kernels spanning three families plus a linear baseline. All six kernels preserve the six-cluster assignment in 2D. A 1D diagnostic tightens this: the linear kernel conflates the COVID year C_3 with the peak-profit cluster C_0, whereas all five non-baseline kernels shift C_3 to overlap only the post-financial-crisis cluster C_5. Agreement across the kernel families confirms an intrinsically linear manifold with no hidden curvature. The silhouette criterion reveals that the dataset structurally supports only three clusters, not six. Collinearity in the raw 7D space suppresses the silhouette signal that would otherwise identify k=3 as the structurally motivated choice.
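
For readers unfamiliar with the silhouette criterion mentioned above, the following is a minimal pure-Python sketch of the standard mean silhouette coefficient. The toy one-dimensional points and both labelings are made up for illustration and have nothing to do with the airline dataset; the helper name `silhouette_score` is our own.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient; points are coordinate tuples, labels are cluster ids."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Made-up data: three tight groups of four points each.
coords = (0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3, 10.0, 10.1, 10.2, 10.3)
points = [(x,) for x in coords]
labels_k3 = [0] * 4 + [1] * 4 + [2] * 4            # one label per natural group
labels_k6 = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5]   # each group split in two

score_k3 = silhouette_score(points, labels_k3)
score_k6 = silhouette_score(points, labels_k6)
```

On this toy data the three-cluster labeling scores higher than the six-cluster one, which is the kind of comparison a silhouette analysis uses to pick k.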

2606.08376 2026-06-09 cs.LG cs.AI 新提交

RiskNet: A large-scale dataset of AI risk incidents from news with alignment and multi-dimensional annotations

RiskNet：一个来自新闻的大规模AI风险事件数据集，包含对齐和多维标注

Leihan Zhang, Wecheng Ye, Xianlong Ma, Haochuan Liu, Yang Li, Qianyu Zhang, Jinliang Chen, Qiang Yan

发表机构 * Beijing University of Posts and Telecommunications（北京邮电大学）； Beijing Key Laboratory of Multimodal Data Intelligent Perception and Governance（多模态数据智能感知与治理北京市重点实验室）

AI总结提出RiskNet，一个从多语言新闻构建的大规模AI风险事件数据集，通过结构化流水线进行事件识别、对齐和多维分类，支持AI安全、治理和风险分析研究。

Comments The manuscript has been submitted to Scientific Data

详情

AI中文摘要

随着人工智能（AI）系统越来越多地部署在社会关键领域，与AI相关的危害和失败事件的报告在频率和多样性上不断增加。尽管现有的治理框架阐述了负责任AI的高层原则，但用于跟踪和分析真实世界AI风险事件的大规模实证资源仍然有限。现有的事件集合通常由人工整理，规模相对较小，不足以支持持续、数据驱动的监控和下游计算分析。为满足这一需求，我们提出了RiskNet，一个从大规模多语言新闻源构建的AI风险事件数据集。RiskNet应用了一个结构化的流水线，用于AI风险新闻识别、事件级报告筛选、事件对齐和多维事件分类。生成的资源将分散的新闻报道组织成以事件为中心的记录，并为事件分类、事件对齐和事件级风险标注提供基准数据集。在当前版本中，RiskNet覆盖了数亿条源记录，并生成了一个大规模的AI风险相关报告集合，包括对齐的事件簇和标注的基准子集。该数据集还通过一个在线平台提供浏览和探索功能。我们描述了数据源、处理工作流、分类法设计以及资源的技术验证。RiskNet旨在支持AI安全、治理、风险分析和基准测试的下游研究，以及对AI相关危害的纵向和跨源分析。通过提供一个结构化且可复用的实证资源，RiskNet有助于弥合高层治理原则与AI风险事件记录现实之间的差距。

英文摘要

As artificial intelligence (AI) systems are increasingly deployed across socially consequential domains, reports of AI-related harms and failures have grown in frequency and diversity. Although existing governance frameworks articulate high-level principles for responsible AI, large-scale empirical resources for tracking and analyzing real-world AI risk incidents remain limited. Existing incident collections are often manually curated, relatively small in scale, and insufficient for continuous, data-driven monitoring and downstream computational analysis. To address this need, we present RiskNet, a large-scale dataset of AI risk incidents constructed from large-scale multilingual news sources. RiskNet applies a structured pipeline for AI risk news identification, event-level report screening, incident alignment, and multi-dimensional incident classification. The resulting resource organizes dispersed news reports into incident-centered records and provides benchmark datasets for event classification, incident alignment, and incident-level risk labeling. In its current release, RiskNet covers hundreds of millions of source records and yields a large-scale collection of AI risk-related reports, including aligned incident clusters and annotated benchmark subsets. The dataset is also accessible through an online platform for browsing and exploration. We describe the data sources, processing workflow, taxonomy design, and technical validation of the resource. RiskNet is intended to support downstream research on AI safety, governance, risk analysis, and benchmarking, as well as longitudinal and cross-source analyses of AI-related harms. By providing a structured and reusable empirical resource, RiskNet helps bridge the gap between high-level governance principles and the documented realities of AI risk incidents.

2606.08481 2026-06-09 cs.LG cs.AI cs.DB cs.SE 新提交

PIPE-Cypher: Automatic Enterprise Benchmark Generation for Text-to-Cypher Systems

PIPE-Cypher：面向文本到Cypher系统的自动企业基准生成

Suraj Ranganath, Anish Raghavendra

发表机构 * Halıcıoğlu School of Data Science and Computing, University of California, San Diego（加利福尼亚大学圣迭戈分校哈勒乔卢数据科学与计算学院）； Independent Researcher（独立研究员）

AI总结提出PIPE-Cypher流水线，利用本地大模型从企业属性图自动生成平衡的NL-to-Cypher基准，通过模式分析、逆向查询约束生成和执行验证等步骤，实现可重复的基准构建。

详情

AI中文摘要

企业属性图在模式结构、内部术语、领域假设、治理约束和用户交互模式上差异很大。因此，与部署相关的Text2Cypher基准反映了用户和代理实际对该图提出的问题。创建这样的基准很困难，因为模式和值是唯一的，且图结构随时间变化。每个自然语言查询对必须可执行、使用真实图实体、保持多样性，并在查询类型和难度级别上保持平衡。我们提出PIPE-Cypher，一个本地基准生成流水线，它将实时属性图和来自客户问题、分析师日志或代理工具调用的可选种子查询转化为平衡的NL-to-Cypher基准。PIPE-Cypher结合了模式分析、逆向查询接地、约束生成、确定性Cypher治理、执行验证、编辑、多样性控制以及校准的本地大语言模型评判器。使用本地Qwen3.5-9B生成和评判，PIPE-Cypher导出了3000个可接受的FinBench/SNB示例，完成了三个审计消融套件，用人类标签校准评判器行为，并评估了11个本地下游模型。生成的基准具有明确的区分性：零样本迁移效果弱，而少样本控制表明，特定模式的示例库可以帮助兼容的模型家族。总之，PIPE-Cypher使Text2Cypher基准测试成为一个可重复的过程，随图、用户和目标工作负载而演变。

英文摘要

Enterprise property graphs vary widely in schema structure, internal terminology, domain assumptions, governance constraints, and user interaction patterns. A deployment-relevant Text2Cypher benchmark therefore reflects the questions users and agents actually ask of that graph. Creating such a benchmark is difficult because schemas and values are unique, and graph structure changes over time. Each NL-query pair must also be executable, use real graph entities, preserve diversity, and remain balanced across query types and difficulty levels. We present PIPE-Cypher, a local benchmark-generation pipeline that turns a live property graph and optional seed queries from customer questions, analyst logs, or agent tool calls into balanced NL-to-Cypher benchmarks. PIPE-Cypher combines schema profiling, reverse-query grounding, constrained generation, deterministic Cypher governance, execution validation, redaction, diversity controls, and a calibrated local LLM judge. Using local Qwen3.5-9B generation and judging, PIPE-Cypher exports 3,000 accepted FinBench/SNB examples, completes three audited ablation suites, calibrates judge behavior with human labels, and evaluates 11 local downstream models. The resulting benchmark is deliberately discriminative: zero-shot transfer is weak, while a few-shot control shows that schema-specific example banks can help compatible model families. Together, PIPE-Cypher makes Text2Cypher benchmarking a repeatable process that evolves with the graph, its users, and its target workloads.

2606.08718 2026-06-09 cs.LG cs.AI 新提交

Deep Active Re-Labeling: Toward Noise-Resilient Annotation Efficiency

深度主动重标注：迈向抗噪的标注效率

Md Abdullah Al Forhad, Weishi Shi

AI总结针对深度主动学习中人工标注噪声导致性能下降的问题，提出一种通过分配部分标注预算重新标注已标注数据来去噪的框架，实验表明在相同预算下更高效且最终数据集噪声较少。

Comments Accepted and published in the 2025 IEEE International Conference on Big Data (BigData). DOI: 10.1109/BigData66926.2025.11402126

详情

DOI: 10.1109/BigData66926.2025.11402126
Journal ref: 2025 IEEE International Conference on Big Data (BigData), Macau, China, 2025, pp. 886-895

AI中文摘要

虽然深度主动学习（DAL）有效减少了人工标注成本，但其效果受到人工标注误差的限制。这是因为主动学习采样的数据被认为对训练具有高度信息性。当人工标注者以一定比率向这些信息性数据引入错误时，主动学习性能显著下降，有时甚至比被动学习更差。本文首先分析了DAL设置中人工标注误差的影响。然后，我们提出了一个框架来解决DAL中的人工标注噪声问题。受人类学习模式的启发，我们提出的解决方案的核心思想是将部分人工标注预算分配给重新标注已标注的数据。先前的理论工作表明，当模型具备一定识别潜在噪声数据的能力时，即使重新标注一小部分数据也能有效去除主动训练集中的噪声。为此，我们实现了两种主动噪声采样策略，在不同情况下检测噪声，并分配部分标注预算重新标注这些实例。我们的方法赋予了主动学习一种回顾和内省的行为。实验表明，在相同标注预算下，我们的方法数据效率更高，并最终产生一个相对无噪声的标注数据集。

英文摘要

While Deep Active Learning (DAL) effectively reduces human annotation costs, its efficacy is constrained by human annotation errors. This is because the data sampled for active learning is assumed to be highly informative for training. When human annotators introduce errors into this informative data at a certain rate, the active learning performance drops significantly and, in some cases, even exhibits worse outcomes than passive learning. In this paper, we first analyze the impact of human annotation errors in the DAL setting. Then we propose a framework to address the human annotation noise problem for DAL. Informed by human learning patterns, the core idea of our proposed solution involves allocating a portion of the human annotation budget to re-annotate data that has already been labeled. Previous theoretical work suggests that when the model possesses a certain level of ability to identify potentially noisy data, even re-labeling a small fraction of the data can effectively remove noise from the active training set. To achieve this, we implement two active noise sampling strategies to detect noise under different circumstances and allocate a part of the annotation budget to re-annotate these instances. Our approach imbues active learning with a revisiting and introspective behavior. Our experiments demonstrate that, under the same annotation budget, our method is more data-efficient and yields a relatively noise-free annotation dataset in the end.

2606.08736 2026-06-09 cs.LG cs.DB 新提交

Declarative Outcome-Conformant Synthesis: Exact, Closed-Form Specification Satisfaction and a Conformance Benchmark

声明性结果一致性合成：精确、闭式规范满足及一致性基准

Muhammed Rasin

发表机构 * Independent Researcher（独立研究员）

AI总结针对无源数据下精确满足声明性分析结果的需求，提出结果一致性合成任务，通过闭式条件伽马抽样实现精确聚合，并构建SpecBench基准，证明一致性与保真度正交。

Comments 22 pages, 1 figure. Benchmark and reference implementation (MIT): https://github.com/rasinmuhammed/misata

详情

AI中文摘要

我们研究合成表格数据主流范式未能提供的能力：在无源数据下精确满足声明的分析结果。模仿方法（copula、GAN、扩散）学习真实分布并从中采样，其评价基于对真实数据的保真度。一大类实际需求不同：在无源数据（冷启动）下生成数据，该数据在关系模式上复现声明的结果（收入曲线、流失率、群体份额）。现成的模仿工具不提供针对此类目标的接口，且由于采样方差，没有采样器能精确命中聚合值。在真实公共数据集上，基于该数据训练的现成学习合成器将声明的月度聚合值偏离74%至86%；逐周期优化将偏离降至约19%，但仍无法达到0；而闭式生成器精确达到0。我们将此任务命名为结果一致性合成，论证其评价轴为一致性而非保真度，并展示两轴正交。我们的贡献包括：(1) 形式化描述，表明广泛使用的精确聚合生成器族实际上是伽马总体的条件求和采样（通过Lukacs刻画），具有闭式精确性、闭式边际变异系数和尺度不变性；受控实验描绘边界，强制精确聚合在1-Wasserstein距离上对任意外部边际的成本最多为0.006，其余为形状族失配；(2) SpecBench，据我们所知，这是首个衡量冷启动关系合成中分析结果一致性的基准；(3) 一个闭式确定性参考系统。精确聚合本身是平凡的；贡献在于一致性联合闭式边际、完整性、确定性和零源数据。我们承认在存在真实数据时模仿方法的保真度优势。

英文摘要

We study a capability the dominant paradigm in synthetic tabular data does not provide: exact satisfaction of a declared analytical outcome with no source data. Imitation methods (copulas, GANs, diffusion) learn a real distribution and sample from it, and are judged on fidelity to real data. A large, practical class of needs is different: generating data with no source data ("cold start") that reproduces a declared outcome (a revenue curve, a churn rate, a group share) across a relational schema. Off-the-shelf imitation tools offer no interface for such targets, and no sampler can hit an exact aggregate, because sampling has variance. On a real public dataset, off-the-shelf learned synthesizers trained on that very data miss the declared monthly aggregate by 74 to 86 percent; a per-period steelman cuts the miss to about 19 percent and still cannot reach 0; a closed-form generator reaches exactly 0. We name this task outcome-conformant synthesis, argue its evaluation axis is conformance rather than fidelity, and show the two axes are orthogonal. We contribute: (1) a formal account showing a widely used family of exact-aggregate generators is exactly conditional-sum sampling of a Gamma population (via Lukacs' characterization), with closed-form exactness, a closed-form marginal CV, and scale-invariance; a controlled experiment maps the boundary: enforcing the exact aggregate costs at most 0.006 in 1-Wasserstein distance to an arbitrary external marginal, the rest being shape-family mismatch; (2) SpecBench, to our knowledge the first benchmark to measure conformance to analytical outcomes for cold-start relational synthesis; and (3) a closed-form, deterministic reference system. Exact aggregation alone is trivial; the contribution is conformance jointly with closed-form marginals, integrity, determinism, and zero source data. We concede fidelity to imitation where real data exists.
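
The "conditional-sum sampling of a Gamma population" idea can be sketched in a few lines. This is an illustrative guess at the general technique (draw independent Gamma values, then rescale them so they sum to the declared total), not the reference implementation linked in the Comments line. The function name `exact_aggregate_sample` and its parameters are hypothetical.

```python
import math
import random


def exact_aggregate_sample(n, target_total, shape=2.0, scale=1.0, seed=0):
    """Draw n positive values whose sum equals target_total up to float rounding.

    Independent Gamma(shape, scale) draws are divided by their sum and multiplied
    by the declared total, so the Gamma scale cancels out (scale invariance).
    """
    rng = random.Random(seed)  # fixed seed keeps the output deterministic
    draws = [rng.gammavariate(shape, scale) for _ in range(n)]
    total = math.fsum(draws)
    return [target_total * d / total for d in draws]


# Hypothetical declared outcome: 12 monthly values that must add up to 1000.0.
monthly = exact_aggregate_sample(12, 1000.0, seed=7)
rescaled = exact_aggregate_sample(12, 1000.0, scale=5.0, seed=7)
```

Because the aggregate is enforced by construction rather than by sampling and hoping, the total carries no sampling variance; only the split across rows is random, and that split is unchanged (up to rounding) when the Gamma scale changes.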

2606.08903 2026-06-09 cs.LG 新提交

Synthetic but Not Realistic: The Evaluation Challenge in Generative Modelling for Structured Electronic Medical Records

合成但不真实：结构化电子病历生成建模中的评估挑战

Nicholas I-Hsien Kuo, Blanca Gallego, Louisa Jorm

发表机构 * Centre for Big Data Research in Health, the University of New South Wales（新南威尔士大学健康大数据研究中心）

AI总结针对合成电子病历评估过度依赖统计相似性而忽视临床有效性的问题，提出基于流行病学的多维度评估框架，发现当前生成模型虽能复现边缘分布，但无法同时保持亚组结构、效应估计和依赖关系，导致评估高估数据质量。

详情

AI中文摘要

合成医疗数据被广泛提议作为真实患者数据的隐私保护替代品，但其评估仍然以统计相似性和预测性能为主，这些并不能反映临床有效性。我们引入了一个基于流行病学的多维度评估框架，评估描述性保真度、临床实用性和结构有效性，分别对应描述性、预测性和因果性问题。我们使用PRIME-CVD（一个具有已知真实结构的5万人队列）评估了四种代表性生成范式——基于GAN、VAE增强、基于扩散和掩码建模。虽然所有模型都再现了边缘分布，但没有一个能同时保留亚组结构、效应估计和依赖结构。值得注意的是，具有强分布保真度的模型可能表现出较差的校准和扭曲的关系，导致不可靠的推断。这些结果表明，当前的评估实践可能高估了合成数据质量，并促使基于支持有效临床和科学结论的能力进行领域知情的评估。

英文摘要

Synthetic healthcare data are widely proposed as privacy-preserving substitutes for real patient data, yet their evaluation remains dominated by statistical similarity and predictive performance that do not reflect clinical validity. We introduce a multi-dimensional evaluation framework grounded in epidemiology, assessing descriptive fidelity, clinical utility, and structural validity, corresponding to descriptive, predictive, and causal questions. We evaluate four representative generative paradigms - GAN-based, VAE-boosted, diffusion-based, and masked modelling - using PRIME-CVD, a 50,000-person cohort with known ground-truth structure. While all models reproduce marginal distributions, none simultaneously preserve subgroup structure, effect estimates, and dependency structure. Notably, models with strong distributional fidelity can exhibit poor calibration and distorted relationships, leading to unreliable inference. These results show that current evaluation practices can overestimate synthetic data quality and motivate domain-informed assessment based on the ability to support valid clinical and scientific conclusions.

2606.08921 2026-06-09 cs.LG 新提交

Generalized Rank-based Evaluation for Knowledge Graph Completion: Perspectives, Framework, and Analyses

基于排序的知识图谱补全广义评估：视角、框架与分析

Sooho Moon, Jian Kang, Yunyong Ko

发表机构 * Chung-Ang University（中央大学）； Mohamed bin Zayed University of Artificial Intelligence（穆罕默德·本·扎耶德人工智能大学）

AI总结针对现有评估指标忽视预测锐度与流行偏差鲁棒性的问题，提出广义评估框架PROBE，通过排序变换器和排序聚合器实现更全面、灵活且一致的模型评估。

Comments 25 pages, 12 figures, 5 tables

详情

AI中文摘要

Orange Lab：通过嵌入式交互工作流降低数据挖掘门槛

Matej Bevec, Aleš Erjavec, Vesna Tanko, Lena Trnovec, Lan Žagar, Ana Farič, Janez Demšar, Blaž Zupan

发表机构 * University of Ljubljana（卢布尔雅那大学）； Revelo d.o.o.（Revelo公司）

AI总结提出Orange Lab，一种基于Web的可视化数据分析环境，通过组件展示范式将机器学习工作流嵌入任意网页，实现动态交互与数据驱动叙事，降低数据科学使用门槛。

详情

AI中文摘要

虽然数据分析工作流的可视化编程已成为数据科学民主化的重要工具，但此类系统仍主要局限于独立应用程序，并且对将其可视化分析解决方案过渡到交互式网络环境的支持有限。因此，数据分析管道难以共享、嵌入和适应用户面向的分析工具。我们提出了Orange Lab，一个基于Web的协作式可视化数据分析环境。其核心是，Orange Lab使用户能够从模块化组件中可视化地构建机器学习工作流，其中任何组件中的交互都会无缝地传播到整个工作流，将静态管道转变为支持探索和数据驱动叙事的动态响应系统。我们的关键贡献是组件展示，这是一种范式，允许作者将选定的工作流组件或其界面部分嵌入到任意网络上下文中，创建同步的交互式界面，同时隐藏底层工作流的复杂性。这支持开发定制化的分析视图和叙事驱动的体验，将数据分析直接集成到在线材料中。我们通过在数据素养教育中的部署来展示该方法，其中嵌入式组件引导学生动手探索机器学习概念，而无需了解底层系统，表明Orange Lab有效降低了入门门槛并支持数据科学的民主化。

英文摘要

While visual programming of data analysis workflows has become an important vehicle for the democratization of data science, such systems remain largely confined to standalone applications and offer limited support for transitioning their visual analytics solutions into interactive web environments. As a result, data analysis pipelines are difficult to share, embed, and adapt into user-facing analytical tools. We present Orange Lab, a web-based collaborative environment for visual data analytics. At its core, Orange Lab enables users to visually construct machine learning workflows from modular components, where interactions in any component propagate seamlessly through the workflow, turning static pipelines into dynamic, reactive systems that support exploration and data-driven storytelling. Our key contribution is component exposition, a paradigm that allows authors to embed selected workflow components, or parts of their interfaces, into arbitrary web contexts, creating synchronized, interactive interfaces while hiding underlying workflow complexity. This enables the development of tailored analytical views and narrative-driven experiences that integrate data analysis directly into online materials. We demonstrate the approach through deployments in data literacy education, where embedded components guide students in hands-on exploration of machine learning concepts without requiring knowledge of the underlying system, showing that Orange Lab effectively lowers barriers to entry and supports the democratization of data science.

2606.09276 2026-06-09 cs.LG 新提交

ERBench: A Benchmark and Testsuite for Equation Discovery Algorithms

ERBench：方程发现算法的基准与测试套件

Paul Kahlmeyer, Henrik Voigt, Michael Habeck, Joachim Giesen

发表机构 * University of Jena（耶拿大学）

AI总结提出ERBench基准，通过方程恢复任务评估符号回归算法，强调在变化维度、采样大小、分布和域下的鲁棒性，填补现有基准的空白。

详情

AI中文摘要

方程发现旨在从数据中自动发现数学方程形式的科学模型。技术上，方程发现通过符号回归算法实现。符号回归用于方程发现的性能沿两个维度衡量：测试数据的预测精度，以及已知真实公式的恢复。对于标准回归，精度通常通过域内测试数据衡量，例如，将数据集随机分为训练和测试数据。虽然这对于域内插值（普通回归的常见目标）有意义，但它可能误导真正的模型发现和泛化。明显的替代方案是衡量域外精度。然而，获得具有挑战性的域外测试数据是一个非平凡问题。因此，我们专注于方程恢复来评估用于方程发现的符号回归算法。理由是，在恢复已知真实公式方面表现良好的符号回归算法是未知方程发现中表现良好的良好候选。现有的符号回归基准包括方程恢复任务，但只有少量公开已知的真实公式。此外，这些基准较少强调评估算法在变化维度、采样大小、采样分布和采样域下的鲁棒性。然而，这对于希望发现自然现象建模方程的从业者至关重要，因为数据几乎肯定有噪声，并且来自不同的域、分布和样本大小。为填补这一空白，我们引入了方程恢复基准（ERBench），这是一个新的评估框架，旨在严格评估明确针对方程发现任务的算法。

英文摘要

Equation discovery aims to automate the discovery of scientific models in the form of mathematical equations from data. Technically, equation discovery is implemented by symbolic regression algorithms. Performance of symbolic regression for equation discovery is measured along two dimensions: Prediction accuracy on test data, and recovery of known groundtruth formulas. For standard regression, accuracy is typically measured on in-domain test data, for instance, by splitting a data set randomly into training and test data. While this makes sense for in-domain interpolation, which is the common goal in ordinary regression, it can be a misleading proxy for true model discovery and generalization. The obvious alternative is to measure out-of-domain accuracy. However, obtaining challenging out-of-domain test data is a non-trivial problem. Therefore, we focus on equation recovery for evaluating symbolic regression algorithms for equation discovery. The rationale is that symbolic regression algorithms that perform well in recovering known groundtruth formulas are good candidates to perform well in unknown equation discovery. Existing benchmarks for symbolic regression include equation recovery tasks, however, with only a small number of groundtruth formulas that are publicly known. Moreover, these benchmarks place less emphasis on evaluating the robustness of algorithms in terms of their behavior under changing dimensionality, sampling size, sampling distribution and sampling domain. This, however, is of central importance to practitioners wanting to discover equations for modeling natural phenomena, since data is almost certainly noisy and comes from diverse domains, distributions, and sample sizes. To fill this gap, we introduce the Equation Recovery Benchmark (ERBench), a new evaluation framework designed to rigorously assess algorithms explicitly targeting the task of equation discovery.

2606.09517 2026-06-09 cs.LG 新提交

Investigating Calibration Challenges in Probabilistic Electricity Price Forecasting

研究概率电价预测中的校准挑战

Jan Niklas Lettner, Hadeer El Ashhab, Benjamin Schäfer

发表机构 * Institute for Automation and Applied Informatics（自动化与应用信息学研究所）； Karlsruhe Institute of Technology（卡尔斯鲁厄理工学院）

AI总结本文指出当前概率电价预测中评分规则偏向锐度而忽视校准，导致过自信估计，呼吁未来研究转向校准感知的目标和架构。

Comments Presented at the ACM Sustainability Week Companion 2026, Banff, AB, Canada

2606.09764 2026-06-09 cs.LG cs.CL 新提交

评估AI代理在神经科学数据到发现流程中的案例研究

Kai A. Horstmann, Ethan Lin, Alice A. Robie, Jennifer J. Sun, Kristin Branson

发表机构 * Cornell University（康奈尔大学）； HHMI Janelia Research Campus（霍华德·休斯医学研究所贾雷尔研究园区）

AI总结本研究评估通用编码代理在果蝇光遗传学数据到发现流程中的表现，发现代理能解决单个阶段任务，但端到端流程仍超出其能力，主要挑战包括缺乏预定义迭代标准和科学判断能力。

详情

AI中文摘要

代理型AI工具为自动化科学研究流程中的软件开发瓶颈提供了有希望的路径，特别是对于那些需要领域专家花费数天到数月构建的阶段，科学家关心的是正确性和鲁棒性，而非实现细节。我们针对果蝇光遗传学数据到发现流程，对通用编码代理进行了实证研究。我们在比现有基准大得多的任务、数量级更大的数据集以及基于领域专家标准的评估标准上评估代理。我们表明，代理可以解决几个单独的流程阶段，这表明阶段级自动化是可行的。通过分析代理的代码迭代，我们发现当没有预定义的标准可供迭代时，它们最困难，此时它们必须利用自己的科学判断来评估当前解决方案，这是一个关键开放挑战。与科学实践相呼应，它们有时尝试对中间输出进行视觉检查以进行自我评估，但大多未能正确解释所见或据此采取行动。正确解决端到端流程需要将所有流程阶段的成功串联起来，这超出了代理当前的能力。我们识别出现有基准中基本缺失的挑战，包括计算资源管理和对大型保留数据集的泛化。最后，我们提炼出构建科学任务和针对开放问题的严格评估标准的原则。

英文摘要

Agentic AI tools offer a promising path to automating software development bottlenecks in scientific research pipelines, particularly for stages that take domain experts days to months to build, where scientists care about correctness and robustness, not implementation details. We present an empirical study of general-purpose coding agents on a fly optogenetics data-to-discovery pipeline. We assess agents on tasks substantially larger than existing benchmarks, datasets orders of magnitude bigger, and evaluation criteria grounded in domain expert standards. We show that agents can solve several individual pipeline stages, suggesting stage-level automation is tractable. By analyzing agents' code iterations, we show that they struggle most when there is not a pre-defined criterion to iterate on, and they must instead use their scientific judgment to assess their current solution, a key open challenge. Mirroring scientific practice, they sometimes attempt visual inspection of intermediate outputs for self-evaluation, but largely fail to interpret what they see or act on it appropriately. Solving the end-to-end pipeline correctly requires stringing together successes across all pipeline stages, and this is beyond agents' current abilities. We identify challenges largely absent from existing benchmarks, including computational resource management and generalization to large held-out data collections. Finally, we distill principles for constructing scientific tasks and rigorous evaluation criteria for open-ended problems.

2606.07810 2026-06-09 cs.CL cs.AI cs.LG 交叉投稿

SLMJury: Can Small Language Models Judge as Well as Large Ones?

SLMJury：小型语言模型能否像大型模型一样进行评判？

Anish Laddha, Nitesh Pradhan, Gaurav Srivastava

发表机构 * LNMIIT ； Virginia Tech（弗吉尼亚理工大学）

AI总结提出SLMJury框架，评估小型语言模型作为评判者的能力，发现领域依赖的过度思考效应、领域泛化差异、闭端与开端评判能力分离，以及多智能体辩论降低准确性。

详情

AI中文摘要

大型语言模型（LLMs）被广泛用作评估模型输出的评判者，但其高成本、延迟和不透明性限制了可扩展性。我们引入SLMJury，一个评估小型语言模型（SLMs）作为评判者的框架，涵盖两种范式：闭端二元正确性和开端质量评分。我们在四个模型家族的16个SLM评判者（0.6B-14B参数）上，跨十个基准进行基准测试：八个闭端任务涵盖数学、科学和通用推理（每个配置N=64,824个判断），以及用于摘要和对话评分的SummEval和MT-Bench。我们将评判形式化为预算条件函数，并研究五个维度。得出四个发现。（1）过度思考效应是领域依赖的：对于大多数评判者，快速10令牌判决在数学评判上匹配或优于扩展推理（在有帮助的情况下提升2-7%），而推理在通用任务上胜出高达23%。（2）领域泛化区分了模型家族，数学到通用准确率差距从低于10%到接近40%不等。（3）闭端和开端评判依赖不同的能力：最佳二元评判者（Phi-4）在MT-Bench上降至第9名，而经过推理训练的模型则反转了这一顺序。（4）在反思-批判-改进（RCR）辩论协议下，多智能体辩论在所有测试配置中降低了准确性，而顶级评判者抵抗六种对抗性人格的方差<=0.55%。可靠的自动评估不需要大型专有模型，但没有单一的SLM占主导地位。排行榜可在https://anishh15.github.io/SLMJury/获取，我们的框架代码和pip包公开在https://github.com/anishh15/SLMJury和https://pypi.org/project/slmjury/。

英文摘要

Large language models (LLMs) are widely used as judges for evaluating model outputs, but their high cost, latency, and opacity limit scalability. We introduce SLMJury, a framework for evaluating small language models (SLMs) as judges across two paradigms: closed-ended binary correctness and open-ended quality scoring. We benchmark 16 SLM judges (0.6B-14B parameters) from four model families across ten benchmarks: eight closed-ended tasks spanning mathematical, scientific, and general reasoning (N=64,824 judgments per configuration), plus SummEval and MT-Bench for summarization and conversational scoring. We formalize judging as a budget-conditioned function and study five dimensions. Four findings emerge. (1) The overthinking effect is domain-dependent: for most judges quick 10-token verdicts match or beat extended reasoning on mathematical judging (by 2-7% where they help), while reasoning wins on general tasks by up to 23%. (2) Domain generalization separates model families, with math-to-general accuracy gaps ranging from under 10% to nearly 40%. (3) Closed-ended and open-ended judging draw on different capabilities: the best binary judge (Phi-4) drops to rank 9 on MT-Bench, while reasoning-trained models invert this ordering. (4) Under the Reflect-Critique-Refine (RCR) debate protocol, multi-agent debate degrades accuracy across all tested configurations, whereas the top judges resist six adversarial personas with <=0.55% variance. Reliable automated evaluation does not require large proprietary models, yet no single SLM dominates. The leaderboard is available at https://anishh15.github.io/SLMJury/, and our framework code and pip package are publicly available at https://github.com/anishh15/SLMJury and https://pypi.org/project/slmjury/.


2606.07951 2026-06-09 cs.CL cs.AI cs.LG 交叉投稿

From `May' to `Is': Certainty Distortion in Language Model Rewriting

从“可能”到“是”：语言模型改写中的确定性扭曲

Catarina G Belem, Shang Wu, Hongyu Yao, Mark Steyvers, Sameer Singh, Padhraic Smyth

发表机构 * University of California Irvine（加利福尼亚大学尔湾分校）； Massachusetts Institute of Technology（麻省理工学院）

AI总结研究语言模型在改写任务中系统性增加表达确定性的偏差，提出基于人群判断的评估指标，发现高达75%的输出存在确定性扭曲，且模型更倾向于提高确定性。

详情

AI中文摘要

人类越来越多地以塑造信念和驱动决策的方式使用语言模型（LM），包括讨论、改写和总结来自科学文章、新闻和医学报告的信息。然而，在这些领域中，主张表达的信心程度至关重要，但关于LM是否忠实地保留它却知之甚少。在这项工作中，我们研究了LM中的确定性扭曲，定义为当语义内容被保留时，表达确定性的有意义变化。我们提出了一种基于LM的评估指标，该指标与人群层面的确定性判断一致。使用该指标，我们在科学和医学交流任务的背景下，表征了不同规模和系列的模型中的确定性扭曲。我们的结果表明，确定性扭曲影响了高达75%的LM输出，并且在改写任务中系统性地不对称，大多数LM将表达确定性增加的可能性是降低的1.5-2倍。这些效应可以通过重复释义累积：在医学领域，claude-haiku-4-5在一次迭代后增加了20%示例的确定性，五次迭代后增加到40%。基于提示的干预减少了整体确定性扭曲，但并未消除它。总之，这些发现揭示了普遍存在的夸大表达确定性的偏差，对在高风险领域依赖LM的用户有直接影响。

英文摘要

Humans increasingly turn to Language Models (LMs) in ways that shape beliefs and drive decisions, including discussing, rewriting, and summarizing information from scientific articles, news, and medical reports. However, in these domains, where how confidently a claim is expressed matters, little is known about whether LMs faithfully preserve that expressed confidence. In this work, we investigate certainty distortion in LMs, defined as meaningful changes in expressed certainty when semantic content is preserved. We propose an LM-based evaluation metric that is consistent with population-level judgments of certainty. Using this metric, we characterize certainty distortion across different sizes and families of models in the context of scientific and medical communication tasks. Our results show that certainty distortion affects up to 75% of LM outputs and is systematically asymmetric in rewriting tasks, with most LMs being 1.5-2× more likely to increase the expressed certainty than to decrease it. These effects can compound over repeated paraphrasing: in the medical domain, claude-haiku-4-5 increases the certainty of 20% of examples after a single iteration, rising to 40% after five iterations. Prompt-based interventions reduce overall certainty distortion but do not eliminate it. Together, these findings reveal a general bias toward inflating expressed certainty, with direct implications for users who rely on LMs in high-stakes domains.


2606.08200 2026-06-09 cs.AI cs.LG 交叉投稿

Online Agent-as-a-Judge: Situation-Generating Evaluation for Interactive Agents

在线智能体作为裁判：面向交互式智能体的情境生成评估

Hyogon Ryu, Jeonghwan Kim, Yewon Lim, Chaeun Lee, Jeongwook Kim, Donghoon Ham

发表机构 * KAIST（韩国科学技术院）

AI总结提出在线智能体作为裁判框架，通过部署环境内评估智能体主动生成相关情境，以评估交互式社交智能体的能力，提高标准覆盖率和与人类标签的一致性。

Comments ICML 2026 Workshop on Trustworthy AI for Good

详情

AI中文摘要

评估基于LLM的交互式社交智能体具有挑战性，因为社交相关行为不仅取决于孤立输出，还取决于先前的交互、社会角色和后续行动。现有方法通常允许目标智能体在环境中自由行动，然后对生成的轨迹进行评分。然而，这种被动设置可能会遗漏仅在特定社交情境下才可观察到的能力；例如，如果没有出现分歧，冲突处理可能不会被测试。我们提出在线智能体作为裁判，一种面向交互式社交智能体的情境生成评估框架。在线智能体作为裁判部署一个环境内评估智能体，通过环境原生的对话和行动协议与目标智能体交互，主动引出与评估标准相关的情境。生成的轨迹为评估即时响应和后续行为提供了证据。在一个包含32个设计师编写的社交标准的生活模拟环境中，在线智能体作为裁判提高了标准覆盖率和与人类标签的一致性，为被动方法可能未观察到的行为提供了更可靠的基于证据的评估。

英文摘要

Evaluating LLM-powered interactive social agents is challenging because socially relevant behaviors depend not only on isolated outputs, but also on prior interactions, social roles, and downstream actions. Existing methods typically allow a target agent to act freely in an environment and then score the resulting trajectory. However, this passive setup can miss capabilities that only become observable under specific social circumstances; for example, conflict handling may remain untested if no disagreement arises. We propose Online Agent-as-a-Judge, a situation-generating evaluation framework for interactive social agents. Online Agent-as-a-Judge deploys an in-world evaluator agent that interacts with the target agent through the environment's native dialogue and action protocol, actively eliciting situations relevant to the evaluation criteria. The resulting trajectories provide evidence for assessing both immediate responses and subsequent behavior. In a life-simulation environment with $32$ designer-authored social criteria, Online Agent-as-a-Judge improves criteria coverage and agreement with human labels, yielding more reliable evidence-grounded evaluations of behaviors that passive methods can leave unobserved.


2606.08228 2026-06-09 q-fin.TR cs.LG q-fin.CP q-fin.ST 交叉投稿

Post-Rejection Follow-up Sampling: A Methodology for Counterfactual Outcome Measurement in Algorithmic DEX Trading

拒绝后跟踪采样：算法DEX交易中反事实结果测量的一种方法

Arati Uday Kamat

发表机构 * Independent Researcher（独立研究者）

AI总结提出拒绝后跟踪采样（PRFS）方法，通过独立跟踪子系统采样被拒绝代币的价格和流动性，以评估过滤器精度，数据集包含2997个拒绝事件的67000条观测记录。

Comments 12 pages. Companion methodology paper to RED-2400 (arXiv:2605.12151). Currently under review at Ledger. SSRN abstract ID 6607301. Zenodo concept DOI 10.5281/zenodo.20043516

详情

DOI: 10.5281/zenodo.20043516

AI中文摘要

去中心化交易所（DEX）上的算法交易系统拒绝了它们评估的大多数候选代币。被拒绝候选代币的反事实结果（如果系统当时入场会发生什么）很少被测量。本文介绍了拒绝后跟踪采样（PRFS）。一个独立的跟踪子系统以可配置的频率对每个被拒绝代币的价格和流动性进行采样，时间跨度长达二十四小时。PRFS提供了评估过滤器精度所需的数据，这些数据基于被拒绝候选代币的实际市场结果，而不是基于合成的回测重建。方法论、数据架构和存档格式在第三节中描述。配套数据集包含2997个拒绝事件的67000个前向结果观测行，涵盖457个不同的代币铸造地址（mint），在连续八天的时间窗口内收集（2026-04-10至2026-04-19，UTC）。大约55%的拒绝事件至少有一个前向观测；mint级别的覆盖是完整的。下游分类的主要约束是每个事件在观测时段内的采样密度，而不是事件级别的覆盖。PRFS是数据集无关的。它适用于任何拒绝次数大大超过执行次数的算法决策系统。

英文摘要

Algorithmic trading systems on decentralised exchanges (DEXs) reject most candidate tokens they evaluate. The counterfactual outcome of rejected candidates (what would have happened had the system entered) is rarely measured. This paper introduces Post-Rejection Follow-up Sampling (PRFS). A separate tracking subsystem samples each rejected token's price and liquidity at a configurable cadence, over a horizon of up to twenty-four hours. PRFS produces the data needed to evaluate filter precision against actual market outcomes of rejected candidates, not against synthetic backtest reconstructions. The methodology, data architecture, and deposit format are described in Section III. The companion dataset contains 67,000 forward-outcome observation rows across 2,997 rejection events spanning 457 unique mints, collected over a continuous eight-day window (2026-04-10 to 2026-04-19, UTC). Approximately 55 percent of rejection events receive at least one forward observation; coverage at the mint level is complete. The principal binding constraint on downstream classification is per-event horizon density, not event-level coverage. PRFS is dataset-independent. It generalises to any algorithmic decision system in which rejections substantially outnumber executions.


2606.08340 2026-06-09 cs.AI cs.LG cs.MA 交叉投稿

Benchmarking Open-Ended Multi-Agent Coordination in Language Agents

开放式多智能体协作在语言智能体中的基准测试

Kale-ab Abebe Tessera, Andras Szecsenyi, Cameron Barker, Alexander Rutherford, Davide Paglieri, Aidan Scannell, Henry Gouk, Elliot J. Crowley, Tim Rocktäschel, Amos Storkey

发表机构 * University of Edinburgh（爱丁堡大学）； University of Oxford（牛津大学）； University College London（伦敦大学学院）

AI总结提出基于JAX的开放式多智能体协作基准Alem，评估13种现代LLM在长时生存世界中的零样本协作能力，发现协调能力是前沿LLM智能体的独立瓶颈。

Comments 42 pages, preprint

详情

AI中文摘要

随着语言模型越来越多地被部署为自主智能体，它们必须在开放式交互任务中与他人进行长期协调。然而，现有评估很少同时测试这些需求，而是强调单智能体任务、短交互或高度结构化的多智能体设置。我们提出了$alem$，一个基于JAX的开放式多智能体协作基准，构建在类似Craftax的动态之上。Alem将程序生成的协调任务、软专业化、通信和可控制的协调难度嵌入到一个具有探索、制作、交易和战斗的长期生存世界中。我们在同质团队中零样本评估了$13$种现代LLM，并以训练好的MARL智能体作为参考点。当前的LLM智能体远未解决Alem，平均标准化回报仅约6%，但它们的失败并非均匀分布。在最难的协调设置下，零样本的Gemini-3.1-Pro-High接近训练了十亿步的MARL智能体，而GPT-5.4-High实现了强基础任务奖励但协调奖励低得多。这种对比表明，个体任务能力并不等同于协调能力。消融实验表明，通信是协调的最大贡献者，而记忆和推理在用于维护多步计划时有所帮助。总体而言，我们的结果将协调确定为前沿LLM智能体的一个独立瓶颈，与单智能体能力分开。Alem使这一瓶颈可测量，并为开发能够通信、分配角色和执行共享计划的智能体提供了一个受控测试平台。代码可在https://github.com/alem-world/alem-env获取。

英文摘要

As language models are increasingly deployed as autonomous agents, they must coordinate with others over long horizons in open-ended interactive tasks. Yet existing evaluations rarely test these demands together, instead emphasising single-agent tasks, short interactions, or highly structured multi-agent settings. We introduce $alem$, a JAX-based benchmark for open-ended multi-agent coordination built on Craftax-like dynamics. Alem embeds procedurally generated coordination tasks, soft specialisation, communication, and controllable coordination difficulty into a long-horizon survival world with exploration, crafting, trading, and combat. We evaluate $13$ modern LLMs zero-shot within homogeneous teams, with trained MARL agents as reference points. Current LLM agents remain far from solving alem, averaging only ~6% normalised return, but their failures are not uniform. On the hardest coordination setting, zero-shot Gemini-3.1-Pro-High approaches MARL agents trained for one billion steps, while GPT-5.4-High achieves strong base-task reward but much lower coordination reward. This contrast shows that individual task competence does not imply coordination competence. Ablations show that communication is the largest contributor to coordination, while memory and reasoning help when used to maintain multi-step plans. Overall, our results identify coordination as a distinct bottleneck for frontier LLM agents, separate from single-agent capabilities. Alem makes this bottleneck measurable and provides a controlled testbed for developing agents that communicate, allocate roles, and execute shared plans. Code is available at https://github.com/alem-world/alem-env.


2606.08372 2026-06-09 cs.CR cs.LG 交叉投稿

SoK: Reconstruction Attacks on Synthetic Tabular Data (Insights from Winning the NIST CRC)

SoK: 合成表格数据的重建攻击（来自赢得NIST CRC的见解）

Steven Golob, Sikha Pentyala, Martine De Cock

发表机构 * School of Engineering and Technology, University of Washington Tacoma（华盛顿大学塔科姆分校工程与技术学院）； Department of Mathematics, Computer Science, and Statistics, Ghent University（根特大学数学、计算机科学与统计学系）

AI总结本文系统化了针对去标识化和合成表格数据的重建攻击，提出分类法、最全面的实证评估和新攻击，并引入解释攻击成功的方法论，发现合成数据生成方法比攻击选择更影响风险，差分隐私仅在低预算下有效。

详情

AI中文摘要

合成数据越来越被推广为发布敏感表格记录的隐私保护替代方案，但其核心对抗威胁（“重建”，即从合成发布和少量已知准标识符中恢复个体的隐藏属性值）仅在分散且难以比较的设置中研究过。我们首次系统化了针对去标识化和合成表格数据的重建（等价于属性推断）攻击。我们贡献了一个分类法，按攻击利用的结构组织攻击；迄今为止最系统的实证评估，将14种攻击与5个基准数据集上的9种合成数据生成（SDG）方法进行对比；以及一组填补分类法空白的新攻击，其中一种（CoBP-RA）是我们测量到的最强攻击。关键的是，我们引入了一种解释攻击成功含义的方法：一个记忆测试，用于区分对总体分布的重建与对训练记录的记忆，以及一个归约，将重建和成员推断置于单一可比较的尺度上。我们的发现：SDG方法的选择对风险的影响远大于攻击的选择；差分隐私主要在小预算（$\varepsilon\lesssim1$）下提供保护，超过该预算保护趋于平稳，受限于合成器的容量而非噪声；去标识化方法最暴露；大多数重建反映分布结构而非记忆，将个体风险集中在异常记录上。这些攻击和基础设施通过我们在2025年国家标准与技术研究院（NIST）合作研究周期中所有红队中取得第一名的成绩得到了外部验证。

英文摘要

Synthetic data is increasingly promoted as a privacy-preserving substitute for releasing sensitive tabular records, yet its central adversarial threat ("reconstruction", the recovery of an individual's hidden attribute values from a synthetic release and a handful of known quasi-identifiers) has been studied only in scattered, hard-to-compare settings. We present the first systematization of reconstruction (equivalently, attribute inference) attacks on de-identified and synthetic tabular data. We contribute a taxonomy that organizes attacks by the structure they exploit; the most systematic empirical evaluation to date, pitting fourteen attacks against nine synthetic data generation (SDG) methods across five benchmark datasets; and a set of new attacks that fill gaps in the taxonomy, one of which (CoBP-RA) is the strongest attack we measure. Crucially, we introduce a methodology for interpreting what attack success means: a memorization test that distinguishes reconstruction of the population distribution from memorization of training records, and a reduction that places reconstruction and membership inference on a single comparable scale. Our findings: the choice of SDG method governs risk far more than the choice of attack; differential privacy protects mainly at small budgets ($\varepsilon\lesssim1$), above which protection plateaus, bounded by the synthesizer's capacity rather than its noise; de-identification methods are the most exposed; and most reconstruction reflects distributional structure rather than memorization, concentrating individual risk on atypical records. The attacks and infrastructure are externally validated by our first-place finish among all red teams in the 2025 National Institute of Standards and Technology (NIST) Collaborative Research Cycle.


2606.08460 2026-06-09 stat.ML cs.LG 交叉投稿

LOTTERY: Learning from Reference-Only Samples in Two-Sample Testing under Size Asymmetry

LOTTERY: 在样本量不对称下的双样本检验中仅从参考样本学习

Xunye Tian, Zhijian Zhou, Liuhua Peng, Feng Liu

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结针对参考样本丰富而查询样本极少的双样本检验问题，提出利用参考样本学习依赖参考的表示并自适应加权，实现置换检验的I类错误控制和一致性。

Comments 16 pages, 1 figure

详情

Journal ref: ICML 2026

AI中文摘要

数据自适应的双样本检验通过从数据中学习的差异（例如基于核的特征表示）来评估两个样本是否来自同一分布。这类方法通常依赖数据分割来解耦学习和检验，并控制I类错误。然而，这种范式不适用于样本量严重不平衡的小样本场景：有大量参考样本可用，而只有少量查询样本。在本文中，我们展示了如何建设性地利用这种不平衡。利用丰富的参考数据，我们学习依赖参考的表示，这些表示总结了参考分布的主要结构，并为检测偏离提供了信息信号。我们引入了一系列表示族，捕获全局和局部结构，并通过不确定性引导原则仅使用参考样本自适应地加权它们。理论上，我们建立了基于置换的I类错误控制，并证明了聚合检验的一致性：随着样本量增长，只要表示集中至少包含一个一致表示，检验功效收敛到1。实验上，我们的聚合方法在多个基准测试中实现了强性能，同时保持了I类错误控制。

英文摘要

Data-adaptive two-sample testing assesses if two samples come from the same distribution, using a discrepancy learned from the data (e.g., via kernel-based feature representations). Such methods typically rely on data splitting to decouple learning from testing and control type I error. However, this paradigm is ill-suited to few-shot settings with severe sample-size imbalance: abundant reference samples are available, while only a handful of query samples arrive. In this paper, we show how this imbalance can be leveraged constructively. Using abundant reference data, we learn reference-dependent representations that summarize salient structure of the reference distribution and provide informative signals for detecting departures. We incorporate a collection of representation families that capture both global and local structure, and adaptively weight them using only reference samples via an uncertainty-guided principle. Theoretically, we establish permutation-based type I error control and show consistency of the aggregated test: as the sample sizes grow, the test power converges to one whenever the representation set contains at least one consistent representation. Empirically, our aggregation achieves strong performance across a range of benchmarks while retaining type I error control.
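
The permutation step that underlies this kind of type I error control can be sketched in a few lines. The sketch below is only an illustration under simplifying assumptions: `permutation_p_value` and `mean_gap` are hypothetical names, and the statistic is a fixed difference of means standing in for the learned, reference-dependent representations described in the abstract, which are not reproduced here.

```python
import random


def mean_gap(reference, query):
    """Fixed stand-in statistic: absolute difference of sample means."""
    return abs(sum(reference) / len(reference) - sum(query) / len(query))


def permutation_p_value(reference, query, statistic=mean_gap, n_perm=999, seed=0):
    """Permutation p-value for a two-sample statistic with unequal sample sizes."""
    rng = random.Random(seed)
    observed = statistic(reference, query)
    pooled = list(reference) + list(query)
    m = len(reference)
    exceed = 0
    for _ in range(n_perm):
        # Reassign the pooled points to a "reference" and a "query" group
        # of the original sizes.
        rng.shuffle(pooled)
        if statistic(pooled[:m], pooled[m:]) >= observed:
            exceed += 1
    # The +1 terms count the observed arrangement itself.
    return (exceed + 1) / (n_perm + 1)


# Many reference points, only three query points.
reference = [float(v) for v in range(50)]
query = [100.0, 101.0, 102.0]
p = permutation_p_value(reference, query)
```

Rejecting when `p` is at most the chosen level gives the usual permutation guarantee when the statistic is fixed in advance and the pooled sample is exchangeable under the null; how the paper preserves validity with representations learned from the reference data is not covered by this sketch.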


2606.08529 2026-06-09 cs.AI cs.CL cs.LG 交叉投稿

Scaffold Effects on GAIA: A Controlled Comparison

脚手架对GAIA的影响：一项受控比较

Jason Starace

发表机构 * Independent Researcher（独立研究员）

AI总结通过受控实验比较三种脚手架（ReAct、多智能体设计、规划-执行）对五个模型在GAIA验证集上的影响，发现脚手架选择可导致准确率差异高达28个百分点，且模型能力越强对脚手架依赖性不一定越低。

Comments 12 pages, 3 figures

详情

AI中文摘要

已发布的智能体能力评分混淆了模型本身的能力与脚手架赋予的能力，且这种激发差距的大小在受控条件下尚未得到充分表征。本研究在GAIA验证集的Level 1和Level 2上，对来自三个提供商的五个模型（Claude Opus 4.7、Sonnet 4.6、Haiku 4.5；Gemini 3.1 Pro Preview；GPT-5.5）进行了预先注册的受控比较，涉及三种脚手架（ReAct、规划-执行者-评估者多智能体设计以及规划-执行），保持任务和条件固定，每个问题尝试三次。仅脚手架选择就使单个模型（Opus，Level 2，稳健切片）的测量准确率移动了多达28个百分点，证实了预先注册的假设，即脚手架变化至少产生10个百分点的差距。预先注册的预测——能力更强的模型对脚手架敏感性更低——在方向上被拒绝：在每个数据集切片中，脚手架效应因模型而异，但能力最强的Anthropic模型在更难级别上从结构化脚手架中获益最多，且层级缩放仅在Level 1的稳健切片下成立。在Level 2上，多智能体相对于ReAct的优势出现在Anthropic系列内部，但跨提供商模型中没有，因此模型系列而非能力层级成为调节变量，而预测的规划-执行者在文件读取任务上的优势被证伪。结构化脚手架在更难级别上调用工具次数更少，但从中途错误中恢复的频率更高，且单个单元（Gemini搭配规划-执行者）在两个级别上成本最低，在Level 2上准确率最高。这些结果表明，单脚手架能力数值是脚手架条件估计，且激发差距不一定会随着模型改进而缩小。

英文摘要

Published agent capability scores conflate what a model can do with what its scaffold lets it do, and the magnitude of this elicitation gap is not well characterized under controlled conditions. This study executes a pre-registered controlled comparison of three scaffolds (ReAct, a Planner-Actor-Rater multi-agent design, and planner-then-executor) across five models from three providers (Claude Opus 4.7, Sonnet 4.6, Haiku 4.5; Gemini 3.1 Pro Preview; GPT-5.5) on GAIA validation Levels 1 and 2, holding tasks and conditions fixed, with three attempts per question. Scaffold choice alone moves measured accuracy by as much as 28 percentage points within a single model (Opus, Level 2, robust slice), confirming the pre-registered hypothesis that scaffold variation produces gaps of at least 10 points. The pre-registered prediction that more capable models would be less scaffold-sensitive is rejected in direction: scaffold effects vary significantly by model in every dataset slice, but the most capable Anthropic model gains the most from structured scaffolds at the harder level, and tier-scaling holds only at Level 1 under the robust slice. The multi-agent advantage over ReAct at Level 2 appears within the Anthropic family but not for the cross-provider models, making model family rather than capability tier the conditioning variable, and the predicted planner-executor advantage on file-reading tasks is falsified. Structured scaffolds make fewer tool calls yet recover more often from mid-trajectory errors at the harder level, and a single cell (Gemini with planner-then-executor) is the cheapest at both levels and the most accurate at Level 2. These results indicate that single-scaffold capability numbers are scaffold-conditional estimates and that the elicitation gap is not guaranteed to shrink as models improve.


2606.08669 2026-06-09 cs.SD cs.LG 交叉投稿

A Comparison of SSL-Based Feature Extractors and Back-End Classifiers for Spoofing Detection: A Multi-Corpus Training and Cross-Linguistic Analysis

基于SSL的特征提取器与后端分类器在欺骗检测中的比较：多语料库训练与跨语言分析

Anh-Tuan Dao, Driss Matrouf, Mickael Rouvier, Nicholas Evans

发表机构 * Avignon Universite（阿维尼翁大学）； EURECOM

AI总结本研究通过多语料库训练和跨语言分析，比较了四种自监督学习特征提取器与四种后端分类器在欺骗检测中的性能，揭示了ASVspoof 5数据集中的领域偏差，并发现仅用8小时目标语言数据微调即可提升检测鲁棒性。

详情

AI中文摘要

语音生物识别系统面临来自欺骗攻击的日益增长的威胁，然而检测模型的评估在不同数据集上仍然不一致。为了研究这些不可预测的波动，我们对四种自监督学习特征提取器与四种后端分类器的组合进行了全面基准测试。我们比较了ResNet的层次化局部特征提取与基于注意力和图的后端的全局序列和关系建模。通过三种场景下的多语料库训练和六个评估数据集，我们的实证分析得出了两个关键发现。首先，我们揭示了ASVspoof 5数据集中的领域偏差，表明简单的数据缩放会主动降低性能。其次，我们的跨语言分析表明，仅用8小时的目标语言数据微调即可增强检测鲁棒性。这些发现共同强调了在欺骗检测中需要领域感知和语言特定适应的关键需求。

英文摘要

Voice biometric systems face growing threats from spoofing attacks, yet the evaluation of detection models remains inconsistent across datasets. To investigate these unpredictable fluctuations, we conduct a comprehensive benchmark of four self-supervised learning feature extractors paired with four back-end classifiers. We compare the hierarchical local feature extraction of ResNet with the global sequence and relational modeling of attention and graph-based back-ends. Through multi-corpus training across three scenarios and six evaluation datasets, our empirical analysis yields two critical findings. First, we expose a domain bias within the ASVspoof 5 dataset, showing that naive data scaling actively degrades performance. Second, our cross-linguistic analysis reveals that fine-tuning with just 8 hours of target-language data enhances detection robustness. Together, these findings emphasize the critical need for domain-aware and language-specific adaptation in spoofing detection.


2606.08679 2026-06-09 stat.ML cs.CL cs.LG stat.ME 交叉投稿

通过对抗性黑客-修复者循环强化智能体基准测试

Ziqian Zhong, Ivgeni Segal, Ivan Bercovich, Shashwat Saxena, Kexun Zhang, Aditi Raghunathan

发表机构 * Carnegie Mellon University（卡内基梅隆大学）； Fewshot Corp（Fewshot公司）； Independent Researcher（独立研究员）

AI总结提出黑客-修复者循环方法，通过LLM代理交替攻击和修补验证器，自动生成抗利用的验证器，将KernelBench攻击成功率从62%降至0%。

详情

AI中文摘要

智能体基准测试通常使用手工编写且脆弱的验证器来评分提交结果，这容易导致奖励黑客攻击。我们审计了五个终端智能体基准测试中的1,968个任务，发现其中323个（16%）可以被前沿模型仅通过任务描述成功攻击。这既破坏了排行榜排名，也破坏了强化学习训练信号，但标准的应对措施是手动且被动的。我们引入了黑客-修复者循环，一种无需逐任务手动修补即可构建抗利用验证器的方法。该循环交替使用三个LLM代理：黑客尝试在不解决任务的情况下通过验证器，修复者修补验证器以拒绝每个发现的漏洞，求解者确认修补后的验证器仍接受合法解决方案。循环迭代：每次修补都会重塑验证器的奖励机制，从而暴露下一个漏洞。我们进一步增加了验证器访问权限，并允许修补跨任务迁移，以扩大循环发现的漏洞范围。在KernelBench上，该循环将公开报告的漏洞语料库上的攻击成功率从62%降至0%。我们还发现，循环中的较弱代理可以防御更强的黑客：Gemini 3 Flash的循环将更强的Gemini 3.1 Pro和Claude Opus 4.7在KernelBench上的攻击成功率从76%和61%降至0%，而Gemini 3.1 Pro在Terminal Bench上的攻击成功率从39%降至17%（覆盖77个任务）。我们发布了Terminal Wrench（323个可攻击环境，3,632条攻击轨迹）作为当前攻击面的快照，以及我们修补后的验证器、循环发现的漏洞和我们的实现，作为未来工作的基础。

英文摘要

Agent benchmarks score submissions with outcome verifiers that are typically hand-written and brittle, leaving them open to reward hacking. We audit 1,968 tasks across five terminal-agent benchmarks and find 323 (16%) hackable by frontier models given only the task description. This corrupts both leaderboard rankings and RL training signal, yet the standard response is manual and reactive. We introduce the hacker-fixer loop, a method for building exploit-resistant verifiers without per-task manual patching. The loop alternates three LLM agents: a hacker tries to pass the verifier without solving the task, a fixer patches the verifier to reject each discovered exploit, and a solver confirms the patched verifier still admits legitimate solutions. The loop iterates: each patch reshapes what the verifier rewards, surfacing the next exploit. We further add verifier access, and let patches transfer across tasks, to broaden the exploits the loop discovers. On KernelBench, the loop drives the attack success rate from 62% to 0% on a held-out corpus of publicly reported exploits. We also find that weaker agents in the loop can defend against much stronger hackers: Gemini 3 Flash's loop drives the stronger Gemini 3.1 Pro and Claude Opus 4.7's attack success rate from 76% and 61% to 0% on KernelBench, and Gemini 3.1 Pro's from 39% to 17% on Terminal Bench across 77 tasks. We release Terminal Wrench (323 hackable environments, 3,632 hack trajectories) as a snapshot of the current attack surface, our patched verifiers, the exploits the loop discovered, and our implementation as a basis for future work.


2606.09409 2026-06-09 cs.AI cs.CL cs.LG 交叉投稿

Correct Looks Better: Pairwise Comparisons Reveal Accuracy Rankings

正确看起来更好：成对比较揭示准确性排名

Mina Remeli, Moritz Hardt

发表机构 * Max Planck Institute for Intelligent Systems, Tübingen, Germany（马克斯·普朗克智能系统研究所，蒂宾根，德国）； Tübingen AI Center（蒂宾根人工智能中心）

AI总结本文通过将基准测试转化为生成式评估，发现成对比较结合Elo方法得到的模型排名与基于真实准确率的排名高度一致（Spearman相关系数>0.9），且风格和裁判偏见影响较小，但答案重复（echo）是裁判偏好的因果驱动因素。

Comments Accepted at ICML'26

2606.09473 2026-06-09 stat.ML cs.LG 交叉投稿

Report the Floor: A Training-Free Conformal Interval Is a Mandatory Baseline for Probabilistic Time-Series Forecasting

报告基线：无训练共形区间是概率时间序列预测的强制性基准

Valery Manokhin

发表机构 * Independent researcher（独立研究者）

AI总结提出无参数、无训练的共形朴素区间作为概率预测的强基线，在2217个真实序列上击败了多种现有方法，并主张其应成为强制性基准。

详情

DOI: 10.5281/zenodo.20594484

AI中文摘要

概率预测器越来越多地通过学习得到，但它们所比较的基线往往较弱或被忽略。我们表明，最简单的共形区间——一个包裹在有限样本分割共形残差分位数中的最后值点预测，无参数且无需训练——是一个远比其在近期学习预测和共形时间序列比较中几乎完全缺失所暗示的更强大的基线。在来自九个公共来源（Monash、LOTSA、LTSF交通/电力/天气套件、METR-LA、BOOM、nips/probts）的2217个真实序列的单步在线预测中，这个ConformalNaive区间决定性地击败了朴素值分位数基线、整个NPTS系列（NPTS 73%，SeasonalNPTS 64%的序列）以及已发表的共形季节池（CSP）方法（71%的序列，bootstrap 95% CI [69,73]，配对Wilcoxon p约7.6e-135）；它与更简单的学习共形预测器（RCI，分位数回归；中位数相对Winkler在2%以内）相当，并且仅被跟踪分布偏移的自适应在线和集成方法（SPCI、ACI、AgACI）击败，后者在相对Winkler上领先9-33%。它也比训练过的神经预测器校准得更好：在引入DeepNPTS的六个数据集上，平凡的基线在名义95%下覆盖真实值84-85%的时间，而DeepNPTS为66%。在多步季节视界上，情况反转：随机游走基线是最弱的方法，季节池（CSP）获胜——我们描绘了这一边界。最后，我们给出了ConformalNaive+，一个一行代码、无训练、视界自适应的选择器，它在每个视界上达到两个互补基线中较好的一个，并恢复了覆盖。我们认为，每当学习概率预测器声称有改进时，匹配的共形朴素基线必须是一个强制性基准。

英文摘要

Probabilistic forecasters are increasingly learned, yet the baselines they are compared against are often weak or omitted. We show that the simplest possible conformal interval - a last-value point forecast wrapped in a finite-sample split-conformal residual quantile, with no parameters and no training - is a far stronger baseline than its near-total absence from recent learned-forecasting and conformal-time-series comparisons would suggest. In one-step-ahead online forecasting across 2,217 real series from nine public sources (Monash, LOTSA, the LTSF traffic/electricity/weather suites, METR-LA, BOOM, nips/probts), this ConformalNaive interval decisively beats the naive value-quantile baselines, the entire NPTS family (NPTS 73%, SeasonalNPTS 64% of series), and the published Conformal Seasonal Pools (CSP) method (71% of series, bootstrap 95% CI [69,73], paired Wilcoxon p approx 7.6e-135); it is on par with the simpler learned conformal predictors (RCI, quantile regression; median relative Winkler within 2%) and is beaten only by the adaptive-online and ensemble methods (SPCI, ACI, AgACI), which track distribution shift and lead by 9-33% relative Winkler. It is also better calibrated than a trained neural forecaster: on the six datasets that introduced DeepNPTS, the trivial floors cover the truth 84-85% of the time at a nominal 95%, versus DeepNPTS's 66%. At multi-step seasonal horizons the picture inverts: the random-walk floor is the weakest method and the seasonal pool (CSP) wins - a boundary we map. Finally we give ConformalNaive+, a one-line, training-free, horizon-adaptive selector that attains the better of two complementary floors at every horizon with restored coverage. We argue the matching conformal naive floor must be a mandatory baseline whenever a learned probabilistic forecaster claims gains.
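
The baseline described above, a last-value point forecast wrapped in a split-conformal residual quantile, is simple enough to sketch directly. The following is a minimal illustration, assuming absolute one-step residuals and the standard finite-sample conformal rank; the function name `conformal_naive_interval` and its signature are hypothetical and not taken from the paper, whose online evaluation protocol may differ in detail.

```python
import math


def conformal_naive_interval(series, alpha=0.1):
    """One-step-ahead interval: last value plus/minus a conformal residual quantile."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    # Absolute one-step errors of the last-value (naive) forecast on the history.
    residuals = sorted(abs(b - a) for a, b in zip(series, series[1:]))
    n = len(residuals)
    # Finite-sample split-conformal rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few residuals for the requested level: only an unbounded interval
        # can carry the guarantee.
        q = math.inf
    else:
        q = residuals[rank - 1]
    point = series[-1]
    return point - q, point + q


history = [1, 2, 4, 7, 11, 16, 22, 29, 37, 46]
lower, upper = conformal_naive_interval(history, alpha=0.5)
```

There is nothing to fit: the only inputs are the observed series and the miscoverage level `alpha`, which is what makes an interval of this kind cheap to report alongside any learned forecaster.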


2606.09547 2026-06-09 cs.CV cs.LG 交叉投稿

Streaming Interventions: Can Video Large Language Models Correct Mistakes as They Occur?

流式干预：视频大语言模型能否在错误发生时即时纠正？

Apratim Bhattacharyya, Shweta Mahajan, Sanjay Haresh, Rajeev Yasarla, Reza Pourreza, Litian Liu, Risheek Garrepalli, Roland Memisevic

发表机构 * Qualcomm AI Research（高通人工智能研究院）； York University（约克大学）； Vector Institute for AI（向量人工智能研究所）

AI总结提出Ego-MC-Bench基准评估视频LLM在烹饪场景中的实时干预能力，并构建Ego-CoMist反事实合成数据集提升小模型性能。

Comments Qualcomm Interactive Cooking: Ego-MC-Bench -- available at https://huggingface.co/datasets/neuripsedtracksub/ego-mistake-corrections and Ego-CoMist -- available at https://huggingface.co/datasets/neuripsedtracksub/ego-counterfactual-mistakes

详情

AI中文摘要

学习日常技能（如烹饪一道菜）越来越依赖于教学媒体，例如在线视频。这为使用视频（和多模态）大语言模型（LLMs）作为任务指导助手打开了大门。一个潜在的任务指导助手在现实世界中成功的关键能力是，它能够在错误一出现时就主动干预以引导用户。为了评估这一关键能力，我们引入了Ego-MC-Bench（错误纠正），这是一个用于评估在现实烹饪场景中反应性、逐步任务指导的基准。大量实验表明，Ego-MC-Bench对于最先进的视频LLMs具有高度挑战性。我们认为一个关键原因是用于在此任务上微调模型的训练数据有限。尽管存在广泛的烹饪视频数据集，但现有数据集缺乏错误示例以及适当时间的干预。为了帮助解决这一数据限制，我们还引入了Ego-CoMist，这是一个反事实合成数据集，通过将非交互式烹饪视频转换为显示主动干预的监督训练示例而创建。我们表明，在Ego-CoMist上进行微调可以带来性能提升，特别是对于更适合在边缘设备上提供帮助的更小、更高效的视频LLMs。

英文摘要

Learning everyday skills, like cooking a dish, relies increasingly on instructional media such as online videos. This opens the door to the use of video (and multimodal) large language models (LLMs) as task guidance assistants. A crucial capability for the real-world success of a prospective task guidance assistant is its ability to intervene proactively as soon as a mistake is apparent in order to guide the user. To evaluate this crucial capability, we introduce Ego-MC-Bench (Mistake Corrections), a benchmark for evaluating reactive, step-by-step task guidance in realistic cooking scenarios. Extensive experiments show that Ego-MC-Bench is highly challenging for state-of-the-art video LLMs. We argue that a key reason is the limited availability of training data for fine-tuning models on this task. Although there exists a wide range of cooking video datasets, existing datasets lack examples of mistakes along with appropriately timed interventions. To help address this data limitation, we also introduce Ego-CoMist, a counterfactual synthetic dataset created by transforming non-interactive cooking videos into supervised training examples showing proactive interventions. We show that fine-tuning on Ego-CoMist yields performance gains, especially for smaller and more efficient video LLMs that are well suited for delivering assistance on edge devices.


2606.09646 2026-06-09 cs.CV cs.AI cs.LG 交叉投稿

Do Video Foundation Models Understand Intuitive Physics? A Layerwise Probing Analysis

视频基础模型是否理解直觉物理？逐层探测分析

Samuele Punzo, Niccolò Caselli, Ippokratis Pantelidis, Francesco Massafra, Salvatore Lo Sardo, Mohammadreza Salehi

发表机构 * University of Amsterdam（阿姆斯特丹大学）

AI总结通过冻结特征探测，研究预训练视频基础模型在直觉物理信息上的编码能力，发现V-JEPA表现最佳，物理信息在中后期层最易获取，且时序破坏显著降低性能。

详情

AI中文摘要

我们研究预训练视频基础模型是否在其冻结表示中编码直觉物理信息，以及该信息如何随模型家族、层和探测类型变化。通过在IntPhys2和Minimal Video Pairs (MVP)上进行冻结特征探测，我们比较了预测联合嵌入模型(V-JEPA)、掩码重建模型(VideoMAE)和基于扩散的视频生成器(LTX-Video)。V-JEPA在基准测试中取得最强整体结果，尤其是在建模时序动态的探测器中，而VideoMAE仍具竞争力，LTX-Video恢复较弱但非平凡的信号。逐层分析表明，物理相关信息在早期层最弱，在中后期深度最易获取；时序控制表明，打乱帧顺序显著降低性能，尤其是在MVP上。综合来看，这些结果表明直觉物理知识在预训练视频表示中可靠地出现，但其可获取性强烈依赖于预训练范式、表示深度和读出机制。

英文摘要

We study whether pretrained video foundation models encode intuitive-physics information in their frozen representations, and how this information varies across model families, layers, and probe types. Using frozen-feature probing on IntPhys2 and Minimal Video Pairs (MVP), we compare predictive joint-embedding models (V-JEPA), masked reconstruction models (VideoMAE), and a diffusion-based video generator (LTX-Video). V-JEPA achieves the strongest overall results across benchmarks, especially with probes that model temporal dynamics, while VideoMAE remains competitive and LTX-Video recovers weaker but non-trivial signal. Layerwise analyses show that physics-relevant information is weakest in early layers and becomes most accessible at intermediate-to-late depth, and temporal controls show that disrupting frame order substantially reduces performance, especially on MVP. Together, these results suggest that intuitive-physics knowledge emerges reliably in pretrained video representations, but its accessibility depends strongly on pretraining paradigm, representational depth, and readout mechanism.


2606.09748 2026-06-09 cs.AI cs.CL cs.LG 交叉投稿

Multi-Turn Evaluation of Deep Research Agents Under Process-Level Feedback

深度研究智能体在过程级反馈下的多轮评估

Rishabh Sabharwal, Hongru Wang, Amos Storkey, Jeff Z. Pan

发表机构 * Google DeepMind ； OpenAI ； Perplexity AI ； LangChain AI

AI总结针对深度研究智能体（DRA）在单轮输出评估的不足，提出研究缺口推断（RGI）方法提供过程级反馈，发现单轮过程反馈可提升8-15分，但多轮改进因回归问题难以持续。

Comments Published as a workshop paper at SCALE - ICML 2026 (Oral)

详情

AI中文摘要

现有的深度研究智能体（DRA）基准仅评估单次输出，忽略了一个关键问题：DRA能否在反馈指导下改进其报告？为此，我们在两种反馈设置下对DRA进行多轮评估：自我反思（智能体在无外部诊断信号的情况下修改报告）和过程级反馈（智能体接收针对其研究策略缺口的指导）。为提供过程级反馈，我们设计了研究缺口推断（RGI），该方法通过分析满足和未满足的评分标准模式来推断研究过程缺口。我们的分析揭示了三个关键发现：（i）在自我反思下，智能体以几乎相等的速率纳入和退步评分标准，导致净改进可忽略；（ii）单轮过程级反馈带来显著收益，将归一化分数提高约8-15分，并产生约35-40%的纳入率；（iii）这些收益在后续轮次中不会累积，因为智能体在重写完整报告以解决剩余缺口时，会退步多达24%的先前满足的标准。即使有针对性指导，我们所评估的DRA架构仍无法实现可靠的多轮改进。我们的代码和结果公开在 https://github.com/sabharwalrishabh/Multi-Turn-Evaluation-of-DRAs。

英文摘要

Existing benchmarks for deep research agents (DRAs) assess only single-shot outputs, ignoring a key question: can DRAs improve their reports when guided by feedback? To investigate this, we conduct a multi-turn evaluation of DRAs under two feedback settings: self-reflection, in which the agent revises its report without any external diagnostic signal, and process-level feedback, in which the agent receives guidance targeting gaps in its research strategy. To enable process-level feedback, we design Research Gap Inference (RGI), a method that analyzes patterns of satisfied and unsatisfied rubric criteria to infer research-process gaps. Our analysis reveals three key findings: (i) under self-reflection, agents incorporate and regress on rubric criteria at nearly equal rates, yielding negligible net improvement; (ii) a single round of process-level feedback yields substantial gains, raising the normalized score by approximately $8$-$15$ points and yielding a roughly $35$-$40\%$ incorporation rate; (iii) these gains do not compound over subsequent turns, as agents regress on up to $24\%$ of previously satisfied criteria when rewriting the full report to address remaining gaps. Even with targeted guidance, reliable multi-turn improvement remains out of reach for the DRA architectures we evaluate. Our code and results are publicly available at https://github.com/sabharwalrishabh/Multi-Turn-Evaluation-of-DRAs.


2402.08922 2026-06-09 cs.LG stat.ML 版本更新

The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes

镜像影响假说：利用前向传播的高效数据影响估计

Myeongseob Ko, Feiyang Kang, Weiyan Shi, Ming Jin, Zhou Yu, Ruoxi Jia

发表机构 * Virginia Tech（弗吉尼亚理工大学）； Columbia University（哥伦比亚大学）

AI总结提出镜像影响假说，将训练数据对测试预测的影响转化为逆问题，通过测试样本梯度加训练样本前向传播高效估计数据影响，显著提升效率。

Comments The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024

详情

Journal ref: The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024

AI中文摘要

大规模黑盒模型已在众多应用中变得无处不在。理解单个训练数据源对这些模型预测的影响对于提高其可信度至关重要。当前的影响估计技术涉及计算每个训练点的梯度或在不同子集上重复训练。这些方法在扩展到大型数据集和模型时面临明显的计算挑战。在本文中，我们引入并探索了镜像影响假说，强调了训练数据和测试数据之间影响的互反性质。具体来说，它表明评估训练数据对测试预测的影响可以重新表述为一个等价的逆问题：评估如果模型在特定测试样本上训练，训练样本的预测将如何改变。通过实证和理论两方面的验证，我们证明了该假说的广泛适用性。受此启发，我们引入了一种新的训练数据影响估计方法，该方法需要计算特定测试样本的梯度，并结合每个训练点的前向传播。这种方法可以利用常见的不对称性，即同时检查的测试样本数量远小于训练数据集的规模，从而相比现有方法在效率上获得显著提升。我们展示了我们的方法在一系列场景中的适用性，包括扩散模型中的数据归因、数据泄露检测、记忆化分析、错误标记数据检测以及语言模型中的行为追踪。我们的代码将在以下网址提供：https://github.com/ruoxi-jia-group/Forward-INF。

英文摘要

Large-scale black-box models have become ubiquitous across numerous applications. Understanding the influence of individual training data sources on predictions made by these models is crucial for improving their trustworthiness. Current influence estimation techniques involve computing gradients for every training point or repeated training on different subsets. These approaches face obvious computational challenges when scaled up to large datasets and models. In this paper, we introduce and explore the Mirrored Influence Hypothesis, highlighting a reciprocal nature of influence between training and test data. Specifically, it suggests that evaluating the influence of training data on test predictions can be reformulated as an equivalent, yet inverse problem: assessing how the predictions for training samples would be altered if the model were trained on specific test samples. Through both empirical and theoretical validations, we demonstrate the wide applicability of our hypothesis. Inspired by this, we introduce a new method for estimating the influence of training data, which requires calculating gradients for specific test samples, paired with a forward pass for each training point. This approach can capitalize on the common asymmetry in scenarios where the number of test samples under concurrent examination is much smaller than the scale of the training dataset, thus gaining a significant improvement in efficiency compared to existing approaches. We demonstrate the applicability of our method across a range of scenarios, including data attribution in diffusion models, data leakage detection, analysis of memorization, mislabeled data detection, and tracing behavior in language models. Our code will be made available at https://github.com/ruoxi-jia-group/Forward-INF.
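
A minimal sketch of the mirrored direction described in this abstract, assuming a linear model with squared loss. The model, the single gradient step, and the names `mirrored_influence` and `lr` are illustrative assumptions, not the authors' Forward-INF implementation. The idea shown: take one gradient step on the test sample, then score each training point by how much its loss drops, using forward passes only.

```python
import numpy as np


def squared_loss(w, x, y):
    """Per-sample squared loss of a linear model (forward pass only)."""
    return 0.5 * (x @ w - y) ** 2


def mirrored_influence(w, x_train, y_train, x_test, y_test, lr=0.1):
    """Score training points by their loss decrease after one test-sample step.

    Hypothetical illustration: one gradient (for the test sample) plus one
    forward pass per training point, instead of one gradient per training point.
    """
    grad_test = (x_test @ w - y_test) * x_test   # gradient of the test loss
    w_stepped = w - lr * grad_test               # "train" briefly on the test sample
    before = squared_loss(w, x_train, y_train)
    after = squared_loss(w_stepped, x_train, y_train)
    return before - after                        # larger means more aligned with the test sample


w0 = np.zeros(2)
x_train = np.array([[1.0, 0.0], [0.0, 1.0]])
y_train = np.array([1.0, 1.0])
scores = mirrored_influence(w0, x_train, y_train, np.array([1.0, 0.0]), 1.0)
```

In this toy setup the first training point matches the test sample and receives a positive score, while the second is orthogonal to it and receives a score of zero.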

2503.05169 2026-06-09 cs.LG 版本更新

phepy: Visual benchmarks and improvements for out-of-distribution detectors

phepy: 面向分布外检测器的可视化基准与改进

Felix Krumbiegel, Juniper Tyree, Michael Boy, Petri Clusius, Andreas Rupp

发表机构 * Department of Mathematics, Saarland University（萨尔兰大学数学系）； Institute for Atmospheric and Earth System Research, University of Helsinki（赫尔辛基大学大气与地球系统研究所）； School of Engineering Sciences, LUT University（拉彭兰塔-拉赫蒂理工大学工程科学学院）

AI总结提出包含三个可视化玩具示例的OOD检测基准，评估现有方法，并引入t-poking和OOD样本加权改进监督式检测器在ID-OOD边界上的精度。

详情

AI中文摘要

将机器学习应用于日益高维且训练数据稀疏或有偏的问题，增加了模型在其训练领域之外的输入上使用的风险。对于此类分布外（OOD）输入，模型无法再做出有效预测，其误差可能无界。由于在真实数据集上测试OOD检测方法较为复杂，我们设计了一个OOD检测基准，其中包含三个新颖且易于可视化的玩具示例。这些简单示例提供了直接且直观的洞察，判断检测器是否能够检测（1）线性和（2）非线性概念，以及（3）在高维空间（干草堆）中识别细小的分布内（ID）子空间（针）。我们利用该基准评估了文献中多种方法的性能。由于具体可感的OOD输入示例可能有益于OOD检测，我们还回顾了几种为监督训练合成OOD输入的简单方法。我们引入了两项改进，即$t$-poking和OOD样本加权，使监督式检测器在ID-OOD边界上更加精确。当真实ID样本与合成OOD样本之间的冲突模糊了决策边界时，这一点尤为重要。最后，我们为在机器学习中构建和应用OOD检测器提供了建议。

英文摘要

Applying machine learning to increasingly high-dimensional problems with sparse or biased training data increases the risk that a model is used on inputs outside its training domain. For such out-of-distribution (OOD) inputs, the model can no longer make valid predictions, and its error is potentially unbounded. Since testing OOD detection methods on real-world datasets is complicated, we design a benchmark for OOD detection, which includes three novel and easily-visualisable toy examples. These simple examples provide direct and intuitive insight into whether the detector is able to detect (1) linear and (2) non-linear concepts and (3) identify thin in-distribution (ID) subspaces (needles) within high-dimensional spaces (haystacks). We use our benchmark to evaluate the performance of various methods from the literature. Since tactile examples of OOD inputs may benefit OOD detection, we also review several simple methods to synthesise OOD inputs for supervised training. We introduce two improvements, $t$-poking and OOD sample weighting, to make supervised detectors more precise at the ID-OOD boundary. This is especially important when conflicts between real ID and synthetic OOD samples blur the decision boundary. Finally, we provide recommendations for constructing and applying OOD detectors in machine learning.

2507.12843 2026-06-09 cs.LG stat.ML 版本更新

Are Two Datasets Close Enough With Statistical Significance? A Kernel Distributional Closeness Testing Approach

两个数据集在统计意义上是否足够接近？一种核分布接近性检验方法

Zhijian Zhou, Liuhua Peng, Xunye Tian, Mingming Gong, Feng Liu

AI总结针对分布接近性检验（DCT）在复杂数据上的局限性，提出基于核的最大均值差异（MMD）的改进度量NAMMD，并构建NAMMD-DCT方法，在保持I类错误有界的同时提高检验功效。

详情

AI中文摘要

两个分布在统计意义上是否接近？分布接近性检验（DCT）通过检验分布对之间的距离是否至少为epsilon来形式化这一问题。现有的DCT方法主要测量定义在离散空间上的分布对之间的差异，例如使用总变差，这限制了它们在图像等复杂数据上的应用。为了将DCT扩展到更多类型的数据，一个自然的想法是将最大均值差异（MMD）引入DCT场景，MMD是衡量复杂分布之间分布差异的强大度量。然而，实证结果表明，许多分布对可能具有相同的MMD值，尽管它们在同一个再生核希尔伯特空间（RKHS）中具有不同的范数。这些分布对可能表现出不同的有限样本可区分性，并反映不同的实际接近程度，使得MMD在DCT中信息量不足。为了缓解这个问题，我们设计了一种新的分布差异度量——范数自适应MMD（NAMMD），它使用分布的RKHS范数来缩放MMD值。基于NAMMD的渐近分布，我们提出了基于NAMMD的DCT来评估分布对的接近程度。理论上，我们证明了基于NAMMD的DCT比基于MMD的DCT具有更高的检验功效，同时保持有界的I类错误。这一点在多种类型的数据（包括合成噪声和真实图像）上的大量实验中得到进一步验证。我们的代码可在 https://github.com/zhijianzhouml/NAMMD 获取。

英文摘要

Are two distributions close to each other with statistical significance? Distribution closeness testing (DCT) formalizes this question by testing whether the distance between a distribution pair is at least epsilon-far. Existing DCT methods mainly measure discrepancies between distribution pairs defined on discrete spaces, for example using total variation, which limits their application to complex data such as images. To extend DCT to more types of data, a natural idea is to introduce maximum mean discrepancy (MMD), a powerful measure of distributional discrepancy between complex distributions, into DCT scenarios. However, empirical results indicate that many distribution pairs can have the same MMD value despite having different norms in the same reproducing kernel Hilbert space (RKHS). These pairs may exhibit different finite-sample distinguishability and reflect different practical closeness levels, making MMD less informative for DCT. To mitigate this issue, we design a new measure of distributional discrepancy, norm-adaptive MMD (NAMMD), which scales the MMD value using the RKHS norms of distributions. Based on the asymptotic distribution of NAMMD, we propose NAMMD-based DCT to assess the closeness level of a distribution pair. Theoretically, we prove that NAMMD-based DCT has higher test power than MMD-based DCT while maintaining bounded type-I error. This is further validated by extensive experiments on multiple types of data, including synthetic noise and real images. Our code is available at https://github.com/zhijianzhouml/NAMMD.
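
To make the quantities in this abstract concrete, the sketch below estimates the squared MMD between two samples together with the squared RKHS norms of their kernel mean embeddings, which are the ingredients NAMMD is described as combining. It assumes a Gaussian kernel and the simple biased (V-statistic) estimator. The function names and the bandwidth are illustrative, and the exact NAMMD scaling is deliberately left out because the abstract does not state it.

```python
import numpy as np


def gaussian_gram(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def embedding_stats(x, y, bandwidth=1.0):
    """Biased estimates of ||mu_P||^2, ||mu_Q||^2 and MMD^2 in the RKHS."""
    kxx = gaussian_gram(x, x, bandwidth).mean()   # estimates ||mu_P||^2
    kyy = gaussian_gram(y, y, bandwidth).mean()   # estimates ||mu_Q||^2
    kxy = gaussian_gram(x, y, bandwidth).mean()   # estimates <mu_P, mu_Q>
    return {"norm_p_sq": kxx, "norm_q_sq": kyy, "mmd_sq": kxx + kyy - 2.0 * kxy}


rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))
stats_same = embedding_stats(x, x)          # identical samples
stats_shift = embedding_stats(x, x + 2.0)   # mean-shifted copy
```

Because MMD^2 equals the two squared norms minus twice the inner product, different pairs of distributions can share an MMD value while having different norms, which is the situation the abstract points to.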

2510.09783 2026-06-09 cs.LG cs.AI stat.ML 版本更新

Large Language Models for Imbalanced Classification: Diversity makes the difference

大语言模型用于不平衡分类：多样性至关重要

Dang Nguyen, Sunil Gupta, Kien Do, Thin Nguyen, Taylor Braund, Alexis Whitton, Svetha Venkatesh

发表机构 * Applied Artificial Intelligence Initiative (A2I2)（应用人工智能倡议（A2I2））； Deakin University（迪肯大学）； Black Dog Institute（黑狗研究所）； University of New South Wales（新南威尔士大学）

AI总结提出基于大语言模型的过采样方法，通过条件采样、排列微调和插值样本增强多样性，在10个表格数据集上优于8个基线方法。

详情

AI中文摘要

过采样是解决不平衡分类最广泛使用的方法之一。其核心思想是生成额外的少数类样本以重新平衡数据集。大多数现有方法（如SMOTE）需要将分类变量转换为数值向量，这通常会导致信息损失。最近，基于大语言模型（LLM）的方法被引入以克服这一限制。然而，当前的LLM方法通常生成多样性有限的少数类样本，降低了下游分类任务的鲁棒性和泛化能力。为了解决这一问题，我们提出了一种新的基于LLM的过采样方法，旨在增强多样性。首先，我们引入了一种采样策略，使合成样本的生成同时以少数类标签和特征为条件。其次，我们开发了一种新的排列策略来微调预训练的LLM。第三，我们不仅在少数类样本上微调LLM，还在插值样本上微调以进一步丰富变异性。在10个表格数据集上的大量实验表明，我们的方法显著优于八个SOTA基线。生成的合成样本既真实又多样。此外，我们通过基于熵的视角提供了理论分析，证明了我们的方法鼓励生成样本的多样性。

英文摘要

Oversampling is one of the most widely used approaches for addressing imbalanced classification. The core idea is to generate additional minority samples to rebalance the dataset. Most existing methods, such as SMOTE, require converting categorical variables into numerical vectors, which often leads to information loss. Recently, large language model (LLM)-based methods have been introduced to overcome this limitation. However, current LLM-based approaches typically generate minority samples with limited diversity, reducing robustness and generalizability in downstream classification tasks. To address this gap, we propose a novel LLM-based oversampling method designed to enhance diversity. First, we introduce a sampling strategy that conditions synthetic sample generation on both minority labels and features. Second, we develop a new permutation strategy for fine-tuning pre-trained LLMs. Third, we fine-tune the LLM not only on minority samples but also on interpolated samples to further enrich variability. Extensive experiments on 10 tabular datasets demonstrate that our method significantly outperforms eight SOTA baselines. The generated synthetic samples are both realistic and diverse. Moreover, we provide theoretical analysis through an entropy-based perspective, proving that our method encourages diversity in the generated samples.

2511.03877 2026-06-09 cs.LG 版本更新

Benchmark Datasets for Lead-Lag Forecasting on Social Platforms

社交平台领先滞后预测的基准数据集

Kimia Kazemian, Zhenzhen Liu, Yangfanyu Yang, Katie Luo, Shuhan Gu, Audrey Du, Xinyu Yang, Jack Jansons, Kilian Q. Weinberger, John Thickstun, Yian Yin, Sarah Dean

发表机构 * Cornell University（康奈尔大学）； Stanford University（斯坦福大学）； Boston University（波士顿大学）

AI总结本文提出领先滞后预测（LLF）问题，并发布arXiv和GitHub两个大规模基准数据集，通过统计检验验证领先滞后动态，为社交平台时间序列预测提供标准化测试平台。

Comments 11 pages, 8 figures, includes supplementary material (6 pages, 5 figures). Accepted at ACM SIGKDD 2026 (KDD '26). Code and data: https://lead-lag-forecasting.github.io

详情

DOI: 10.1145/3770855.3817523

AI中文摘要

社交和协作平台产生多变量时间序列轨迹，其中早期交互（如浏览、点赞或下载）之后，有时数月或数年后，会出现更高影响力的结果（如引用、销售或评论）。我们将此设定形式化为领先滞后预测（LLF）：给定一个早期使用通道（领先），预测一个相关但时间上偏移的结果通道（滞后）。尽管这种模式普遍存在，但LLF尚未被时间序列社区视为统一的预测问题，主要原因是缺乏标准化数据集。为了锚定LLF研究，本文提出了两个大规模基准数据集：arXiv（访问量 -> 230万篇论文的引用量）和GitHub（推送/星标 -> 300万个仓库的复刻量）。我们的数据集通过捕捉跨年的长期动态、涵盖完整的结果谱以及避免采样中的生存偏差，为领先滞后预测提供了理想的测试平台。我们记录了数据整理和清洗的所有技术细节，通过统计和分类测试验证了领先滞后动态的存在，并基准测试了参数化和非参数化回归基线。我们的研究将LLF确立为一种新的预测范式，并为其在社交和使用数据中的系统探索奠定了实证基础。

英文摘要

Social and collaborative platforms emit multivariate time-series traces in which early interactions -- such as views, likes, or downloads -- are followed, sometimes months or years later, by higher impact like citations, sales, or reviews. We formalize this setting as Lead-Lag Forecasting (LLF): given an early usage channel (the lead), predict a correlated but temporally shifted outcome channel (the lag). Despite the ubiquity of such patterns, LLF has not been treated as a unified forecasting problem within the time-series community, largely due to the absence of standardised datasets. To anchor research in LLF, here we present two high-volume benchmark datasets: arXiv (accesses -> citations of 2.3M papers) and GitHub (pushes/stars -> forks of 3M repositories). Our datasets provide ideal testbeds for lead-lag forecasting, by capturing long-horizon dynamics across years, spanning the full spectrum of outcomes, and avoiding survivorship bias in sampling. We documented all technical details of data curation and cleaning, verified the presence of lead-lag dynamics through statistical and classification tests, and benchmarked parametric and non-parametric baselines for regression. Our study establishes LLF as a novel forecasting paradigm and lays an empirical foundation for its systematic exploration in social and usage data.

2601.04498 2026-06-09 cs.LG cs.CV 版本更新

IGenBench: Benchmarking the Reliability of Text-to-Infographic Generation

IGenBench：文本到信息图生成可靠性基准测试

Yinghao Tang, Xueding Liu, Boyuan Zhang, Tingfeng Lan, Yupeng Xie, Jiale Lao, Yiyao Wang, Haoxuan Li, Tingting Gao, Bo Pan, Luoxuan Weng, Xiuqi Huang, Minfeng Zhu, Yingchaojie Feng, Yuyu Luo, Wei Chen

发表机构 * State Key Lab of CAD&CG, Zhejiang University（浙江大学CAD与CG国家重点实验室）； UESTC（电子科技大学）； University of Virginia（弗吉尼亚大学）； HKUST(GZ)（香港科技大学（广州））； Cornell University（康奈尔大学）； Zhejiang University（浙江大学）； National University of Singapore（新加坡国立大学）

AI总结提出IGENBENCH基准，包含30种信息图类型和600个测试用例，通过多模态大语言模型分解为10类原子问题评估10种T2I模型，发现数据完整性等维度是普遍瓶颈。

详情

AI中文摘要

信息图是结合数据可视化与文本和插图元素的复合视觉制品，用于传达信息。虽然最近的文本到图像（T2I）模型可以生成美观的图像，但它们在生成信息图方面的可靠性仍不清楚。生成的信息图可能乍看正确，但包含容易被忽视的问题，例如扭曲的数据编码或错误的文本内容。我们提出了IGENBENCH，这是第一个评估文本到信息图生成可靠性的基准，包含跨越30种信息图类型的600个精心设计的测试用例。我们设计了一个自动评估框架，基于包含10种问题类型的分类体系，将可靠性验证分解为原子级的是/否问题。我们使用多模态大语言模型（MLLM）验证每个问题，得到问题级准确率（Q-ACC）和信息图级准确率（I-ACC）。我们在IGENBENCH上全面评估了10个最先进的T2I模型。我们的系统分析揭示了未来模型开发的关键见解：（i）三级性能层次，顶级模型的Q-ACC为0.90，但I-ACC仅为0.49；（ii）数据相关维度成为普遍瓶颈（例如，数据完整性：0.21）；（iii）所有模型实现端到端正确性的挑战。我们在 https://igen-bench.vercel.app/ 发布IGENBENCH。

英文摘要

Infographics are composite visual artifacts that combine data visualizations with textual and illustrative elements to communicate information. While recent text-to-image (T2I) models can generate aesthetically appealing images, their reliability in generating infographics remains unclear. Generated infographics may appear correct at first glance but contain easily overlooked issues, such as distorted data encoding or incorrect textual content. We present IGENBENCH, the first benchmark for evaluating the reliability of text-to-infographic generation, comprising 600 curated test cases spanning 30 infographic types. We design an automated evaluation framework that decomposes reliability verification into atomic yes/no questions based on a taxonomy of 10 question types. We employ multimodal large language models (MLLMs) to verify each question, yielding question-level accuracy (Q-ACC) and infographic-level accuracy (I-ACC). We comprehensively evaluate 10 state-of-the-art T2I models on IGENBENCH. Our systematic analysis reveals key insights for future model development: (i) a three-tier performance hierarchy with the top model achieving Q-ACC of 0.90 but I-ACC of only 0.49; (ii) data-related dimensions emerging as universal bottlenecks (e.g., Data Completeness: 0.21); and (iii) the challenge of achieving end-to-end correctness across all models. We release IGENBENCH at https://igen-bench.vercel.app/.
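
The gap between Q-ACC and I-ACC reported in this abstract is easier to see with a small calculation. The sketch below assumes (the abstract does not spell this out) that Q-ACC is the fraction of yes/no questions answered correctly across all infographics, and that an infographic counts toward I-ACC only if every one of its questions passes. The function name and the data layout are hypothetical.

```python
def q_acc_and_i_acc(results):
    """Compute (Q-ACC, I-ACC) from per-infographic lists of question outcomes.

    results: list of lists of booleans, one inner list per infographic.
    Assumed definitions, for illustration only.
    """
    total_questions = sum(len(r) for r in results)
    correct_questions = sum(sum(r) for r in results)
    q_acc = correct_questions / total_questions
    # An infographic is counted as correct only when all of its questions pass.
    i_acc = sum(all(r) for r in results) / len(results)
    return q_acc, i_acc


# Four toy infographics with ten questions each; two of them miss one question.
results = [
    [True] * 10,
    [True] * 9 + [False],
    [True] * 10,
    [True] * 9 + [False],
]
q_acc, i_acc = q_acc_and_i_acc(results)
```

With these toy numbers, 95% of the questions pass but only half of the infographics are fully correct, the same kind of gap as the reported 0.90 versus 0.49.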

2601.06649 2026-06-09 cs.LG cs.AI 版本更新

CHIMERA-Bench：一种针对表位特异性抗体设计的基准数据集

Mansoor Ahmed, Nadeem Taj, Imdad Ullah Khan, Hemanth Venkateswara, Murray Patterson

发表机构 * Georgia State University（佐治亚州立大学）； Georgia Institute of Technology（佐治亚理工学院）； University of Engineering and Technology（工程与技术大学）； Lahore University of Management Sciences（拉合尔管理科学大学）

AI总结本文提出CHIMERA-Bench，一个统一的抗体设计基准，包含2922个抗原-抗体复合物数据，测试泛化能力，并评估多种生成方法的通用性。

详情

AI中文摘要

计算抗体设计在过去三年中取得了快速的方法进展，提出了数十种深度生成方法，但该领域缺乏标准化的基准用于公平比较和模型开发。这些方法在不同的SAbDab快照、非重叠测试集和不兼容的指标上进行评估，文献将设计问题分解为多个子任务，没有共同定义。我们引入CHIMERA-Bench（CDR Modeling with Epitope-guided Redesign，即CDR建模与表位引导的重设计），这是一个围绕单一经典任务构建的统一基准：表位条件下的CDR序列-结构共设计。CHIMERA-Bench提供三个组成部分。第一个是一个经过精心挑选、去重的包含2922个抗体-抗原复合物的数据集，带有表位和互补位（paratope）注释。第二个是一组三个具有生物学动机的分割，测试泛化到未见表位、未见抗原折叠和前瞻性时间目标的能力。第三个是全面的评估协议，包含五个指标组，其中包括新的表位特异性度量。我们基准测试了十一种方法，涵盖六个生成范式，并在所有分割上报告结果。CHIMERA-Bench是该抗体设计问题中同类最大的数据集，使社区能够开发和测试新方法，并评估其泛化能力。

英文摘要

Computational antibody design has seen rapid methodological progress, with dozens of deep generative methods proposed in the past three years, yet the field lacks a standardized benchmark for fair comparison and model development. These methods are evaluated on different SAbDab snapshots, non-overlapping test sets, and incompatible metrics, and the literature fragments the design problem into numerous sub-tasks with no common definition. We introduce CHIMERA-Bench: (CDR Modeling with Epitope-guided Redesign), a unified benchmark built around a single canonical task: epitope-conditioned CDR sequence-structure co-design. CHIMERA-Bench provides three components. The first is a curated, deduplicated dataset of 2,922 antibody-antigen complexes with epitope and paratope annotations. The second is a set of three biologically motivated splits that test generalization to unseen epitopes, unseen antigen folds, and prospective temporal targets. The third is a comprehensive evaluation protocol with five metric groups, including novel epitope-specificity measures. We benchmark eleven methods spanning six generative paradigms and report results across all splits. CHIMERA-Bench is the largest dataset of its kind for the antibody design problem, allowing the community to develop and test novel methods and evaluate their generalizability.

2604.26498 2026-06-09 cs.LG q-bio.QM 版本更新

Do Larger Models Really Win in Drug Discovery? A Benchmark Assessment of Model Scaling in AI-Driven Molecular Property and Activity Prediction

大模型真的在药物发现中胜出吗？AI驱动的分子性质和活性预测中模型规模的基准评估

Jinjiang Guo, Sheng Ding

发表机构 * Global Health Drug Discovery Institute（全球健康药物发现研究所）； School of Pharmaceutical Sciences（药学院）

AI总结本文通过26个ADME、毒性及生物活性端点评估，发现传统机器学习在多数任务中表现最佳，大模型在部分困难分割中竞争力有限，模型性能依赖于任务与验证场景的适配性，而非单纯规模。

Comments Improved benchmark design and reproducibility, replaced restricted datasets with public benchmarks in primary analyses, and added sensitivity analyses supporting the interpretation of model scaling and evaluation protocol effects in molecular prediction

详情

AI中文摘要

分子基础模型和大语言模型的快速发展促使人们以规模为中心看待AI在药物发现中的应用，认为更大的预训练模型将取代紧凑的化学信息学模型。我们测试了这一假设，涵盖26个ADME、毒性及生物活性端点，共165,541个端点级别化合物标签记录。基准测试包含78个端点和分割条目，通过随机、Murcko骨架和结构分离的5折交叉验证协议评估，代表递增的化学泛化难度。在156个任务和指标比较中，传统机器学习（ML）在最佳表现条目中占比最大（47.4%），其次是预训练分子序列模型（28.8%）、图神经网络（21.8%）和基于LLM的SAR基线（1.9%）。传统ML在随机分割的插值场景中占优，并且总体上仍是获胜最多的模型家族。GNN和序列模型在部分更难的分割中具有竞争力，但在采用固定的最终窗口读出时，其严格获胜份额会下降，表明它们对训练设置和模型选择较为敏感。配对Bootstrap分析显示，单个模型间的微小数值差异不应被视为决定性胜利。来自训练折的SAR知识提高了GPT5.5-SAR和Opus4.7-SAR的指标，但并未使基于规则的推理成为监督预测器的通用替代品。紧凑的专业模型仍高度有效，预测性能取决于模型、任务和验证场景之间的适配性，而非规模本身。

英文摘要

The rapid growth of molecular foundation models and large language models (LLMs) has encouraged a scale centred view of AI in drug discovery, in which larger pretrained models are expected to supersede compact cheminformatics models. We test this assumption across 26 ADME, toxicity and bioactivity endpoints, covering 165,541 endpoint level compound label records. The benchmark contains 78 endpoint and split entries evaluated under random, Murcko scaffold and structure separated 5-fold cross validation protocols, representing increasing chemical generalization difficulty. Across 156 task and metric comparisons, classical machine learning (ML) provides the largest share of best performing entries (47.4%), followed by pretrained molecular sequence models (28.8%), graph neural networks (21.8%) and LLM based SAR baselines (1.9%). Classical ML dominates random split interpolation and remains the largest winner family overall. GNN and sequence models are competitive in selected harder splits, but their strict winner shares decrease under a fixed final-window readout, indicating sensitivity to training settings and model selection. Paired bootstrap analyses show that small numerical differences between individual models should not be read as decisive victories. SAR knowledge from training folds improves GPT5.5-SAR and Opus4.7-SAR metrics but does not make rule based reasoning a universal substitute for supervised predictors. Compact specialized models remain highly effective, and predictive performance depends on the fit among model, task and validation scenario, not on scale alone.

2605.23595 2026-06-09 cs.LG cs.AI cs.CV cs.ET cs.PF 版本更新

Learning to Evaluate: Cost-Effective Model Evaluation on Unlabeled Data with Meta-Learning

学习评估：基于元学习的无标签数据低成本模型评估

Trinh Pham, Viet Huynh, Hongzhi Yin, Quoc Viet Hung Nguyen, Thanh Tam Nguyen

发表机构 * Griffith University（格里菲斯大学）； Edith Cowan University（埃迪斯科文大学）； The University of Queensland（昆士兰大学）

AI总结提出MetaEvaluator，一种基于元学习的模型无关框架，通过参考模型池实现无标签数据上的快速、准确且成本效益高的新模型评估。

Comments Accepted by KDD 2026

详情

AI中文摘要

机器学习的快速发展产生了不断扩展的模型生态系统，使得在未见过的未标记数据上验证新发布模型的可靠性变得越来越具有挑战性。传统的评估流程依赖于昂贵的标注、重复的微调或无法跨模型家族迁移的狭窄假设。我们提出了MetaEvaluator，一个成本效益高、模型无关的框架，用于快速、无标签地评估跨不同架构和模态的未见模型。MetaEvaluator利用参考模型池上的元学习来获得可迁移的初始化，从而能够准确评估新模型，同时将成本分摊到整个池中，并消除了每个模型重新训练的需要。据我们所知，这是第一个能够在完全未标记数据集上评估新模型的模型无关框架。大量实验表明，与传统方法相比，MetaEvaluator以显著降低的成本产生稳定且准确的性能估计，使得在未标记数据上对新出现的模型进行可扩展的基准测试变得实用。

英文摘要

The rapid advancement of machine learning has led to an unprecedented expansion of model ecosystems, making it increasingly difficult to assess the reliability of newly released models on unseen and unlabeled data. Existing evaluation pipelines typically rely on costly annotation, repeated fine-tuning, or assumptions that do not generalize well to new models. We introduce MetaEvaluator, a cost-effective, model-agnostic framework for fast, label-free evaluation of unseen models across diverse architectures and modalities. MetaEvaluator meta-learns over a pool of reference models to acquire an effective initialization for accurate assessment of unseen models, thereby amortizing evaluation cost and eliminating the need for per-model retraining. To the best of our knowledge, this is the first model-agnostic framework that evaluates new models on unlabeled datasets. Extensive experiments demonstrate that MetaEvaluator delivers stable and accurate performance estimates at substantially lower cost than conventional approaches, enabling scalable benchmarking on unlabeled datasets for emerging models. The code is available at: https://github.com/phkhanhtrinh23/MetaEvaluator.

2605.30184 2026-06-09 cs.LG physics.ao-ph 版本更新

Can AI Weather Models Predict Beyond Two Weeks? A Quantitative Benchmark and Analysis of Long Rollouts

AI天气模型能否预测两周以上？长期推演的定量基准与分析

Fanny Lehmann, Firat Ozdemir, Yun Cheng, Torsten Hoefler, Sebastian Schemm, Benedikt Soja, Siddhartha Mishra

发表机构 * ETH AI Center（ETH人工智能中心）； ETH Zurich（苏黎世联邦理工学院）； Swiss Data Science Center（瑞士数据科学中心）； Scalable Parallel Computing Lab（可扩展并行计算实验室）； Dep. of Applied Mathematics and Theoretical Physics（应用数学与理论物理系）； University of Cambridge（剑桥大学）； Institute of Geodesy and Photogrammetry（大地测量与摄影测量研究所）； Seminar for Applied Mathematics（应用数学研讨会）

AI总结通过九种AI天气模型的一年推演，将长期不稳定性分类为爆发、漂移和季节性丧失三种模式，并发现稳定性取决于对小时空尺度的处理。

2606.05441 2026-06-09 cs.LG cs.AI stat.ML 版本更新

GOTabPFN: From Feature Ordering to Compact Tokenization for Tabular Foundation Models on High-Dimensional Data

GOTabPFN: 从特征排序到高维表格基础模型的紧凑分词化

Al Zadid Sultan Bin Habib, Md Younus Ahamed, Prashnna Kumar Gyawali, Gianfranco Doretto, Donald A. Adjeroh

发表机构 * West Virginia University（西弗吉尼亚大学）

AI总结针对高维小样本表格预测问题，提出GOTabPFN模型，通过图引导排序和神经启发子单元压缩实现紧凑表示，提升TabPFN在严格token预算下的稳定性和准确性。

Comments Accepted to the 43rd International Conference on Machine Learning (ICML 2026). Code and resources: GitHub (https://github.com/zadid6pretam/GOTabPFN), PyPI (https://pypi.org/project/gotabpfn), project webpage (https://www.zadidhabib.com/gotabpfn.html), Hugging Face ZeroGPU (https://huggingface.co/spaces/zadid6pretam/GOTabPFN), CPU backup (https://huggingface.co/spaces/zadid6pretam/GOTabPFN_CPU)

2606.07379 2026-06-09 cs.LG cs.AI cs.CL stat.ME 版本更新

Do Coding Agents Deceive Us? Detecting and Preventing Cheating via Capped Evaluation with Randomized Tests

编码智能体会欺骗我们吗？通过带随机测试的上限评估检测和防止作弊

Thanawat Lodkaew, Johannes Ackermann, Soichiro Nishimori, Nontawat Charoenphakdee, Masashi Sugiyama, Takashi Ishida

发表机构 * The University of Tokyo（东京大学）； RIKEN（理化学研究所）

AI总结提出CapCode框架，通过设置上限评估检测模型在编码任务中的作弊行为，并设计CapReward奖励机制防止作弊，实验表明该方法能有效检测和减少作弊。

详情

AI中文摘要

在智能体评估和训练中，一个日益增长的失败模式是模型可以通过利用捷径而非解决预期任务来获得高评估分数，产生欺骗性表现。这使得评估分数作为真实任务解决能力的度量不可靠。我们提出CapCode，一个构建带有随机测试的编码数据集的框架，其最佳可达的非作弊性能被故意限制在1以下。这种上限性能设计赋予评估分数更清晰的解释：显著高于上限的分数是不可信的，因此提供了作弊的证据。为了防止作弊，我们提出CapReward，一种基于CapCode原则的奖励设计，以抑制超出上限的优化。跨多个数据集的实验表明，CapCode能够检测作弊同时保持模型的性能排名，CapReward减少了作弊行为，产生了更好地遵循预期任务规范的模型。

英文摘要

A growing failure mode in agent evaluation and training is that models can achieve high evaluation scores by exploiting shortcuts instead of solving the intended task, producing deceptive performance. This makes evaluation scores unreliable as measures of true task-solving ability. We propose CapCode, a framework for constructing coding datasets with randomized tests whose best achievable non-cheating performance is deliberately capped below one. This capped-performance design gives evaluation scores a clearer interpretation: scores substantially above the cap are implausible and therefore provide evidence of cheating. To prevent cheating, we propose CapReward, a reward design based on the CapCode principle to discourage optimization beyond the cap. Experiments across multiple datasets show that CapCode detects cheating while preserving performance ranking of models, and CapReward reduces cheating behavior, yielding models that better follow the intended task specification.
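
One way to read "scores substantially above the cap are implausible" is as a one-sided tail test. The sketch below is an assumption made for illustration, not the paper's procedure: it treats each randomized test as an independent trial that an honest solver passes with probability at most `cap`, and flags a run when the binomial tail probability of the observed pass count falls below a hypothetical threshold `alpha`.

```python
from math import comb


def binomial_tail(n, k, p):
    """P[X >= k] for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def flag_suspicious(passed, total, cap, alpha=0.01):
    """Flag a score that is implausibly far above the capped pass rate.

    Hypothetical rule: independence of tests and the alpha threshold are
    assumptions of this sketch, not details taken from the abstract.
    """
    return binomial_tail(total, passed, cap) < alpha


# With a cap of 0.5, passing every one of 100 tests is flagged; passing 50 is not.
perfect_run = flag_suspicious(passed=100, total=100, cap=0.5)
honest_run = flag_suspicious(passed=50, total=100, cap=0.5)
```

The point of the design is that, without a cap, a perfect score is indistinguishable from honest success, whereas with a cap it becomes statistical evidence of a shortcut.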

2009.10277 2026-06-09 cs.CL cs.LG cs.SI 版本更新

Measuring a hate speech spectrum with faceted Rasch item response theory and perspective-aware, explainable-by-design deep learning

使用分面Rasch项目反应理论和视角感知、可解释性设计的深度学习测量仇恨言论谱系

Chris J. Kennedy, Geoff Bacon, Alexander Sahn, Claudia von Vacano

发表机构 * Center for Precision Psychiatry, Mass General Hospital Department of Psychiatry, Harvard Medical School（精准精神病学中心，麻省总医院精神病科，哈佛医学院）； D-Lab University of California, Berkeley（加州大学伯克利分校D实验室）

AI总结提出结合监督深度学习与分面Rasch项目反应理论的方法，将仇恨言论分解为10个有序标签，通过IRT模型转化为区间测量值并调整标注者视角，在RoBERTa模型上提升准确性，实现连续谱系测量与可解释性。

Comments 7 pages, 6 figures

详情

AI中文摘要

我们提出一个系统，通过结合监督深度学习与分面Rasch项目反应理论（IRT），在从种族灭绝言论到支持性言论的连续区间值谱系上测量仇恨言论。我们将仇恨言论的理论构念分解为10个有序标签的操作化构成概念。这些标签通过IRT概率潜在模型重构为区间结果测量，同时估计并调整每个标注者的标注视角。我们的标度程序自然地与用于自动预测的多任务深度学习架构集成，允许通过那些组件对连续分数进行基于设计的可解释性。我们将此方法应用于一个新的开源数据集，该数据集包含来自YouTube、Twitter和Reddit的50,070条社交媒体评论，由11,143名美国亚马逊土耳其机器人工作者进行标注和标记。我们的基于RoBERTa的模型相比替代方法显示出改进的准确性。该系统为监督NLP提供了一种新范式，鼓励连续而非二元的构念，以及基于设计的标注者视角和模型可解释性的整合。

英文摘要

We propose a system for measuring hate speech on a continuous, interval-valued spectrum ranging from genocidal to supportive speech by combining supervised deep learning with faceted Rasch item response theory (IRT). We decompose the theoretical construct of hate speech into constituent concepts operationalized as 10 ordinal labels. Those labels are reconstituted via IRT probabilistic latent modeling into an interval outcome measure while simultaneously estimating and adjusting for each annotator's labeling perspective. Our scaling procedure integrates naturally with a multitask deep learning architecture for automated prediction, allowing design-based explainability of the continuous score through those components. We apply this method to a new, open source dataset of 50,070 social media comments sourced from YouTube, Twitter, and Reddit, annotated and labeled by 11,143 United States-based Amazon Mechanical Turk workers. Our RoBERTa-based model shows improved accuracy compared to alternative approaches. This system offers a new paradigm for supervised NLP that encourages continuous rather than binary constructs, and design-based incorporation of annotator perspective and model explainability.
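
For readers unfamiliar with faceted Rasch models, the sketch below shows the dichotomous special case, assuming the usual logit form in which a comment's latent score is offset by an item difficulty and an annotator severity. The paper works with ordinal labels, so this is a simplification, and all names and numbers are illustrative.

```python
import math


def faceted_rasch_prob(theta, item_difficulty, rater_severity):
    """P(label = 1) under a dichotomous many-facet Rasch model.

    logit = theta - item_difficulty - rater_severity, where theta is the
    latent position of the comment on the measured spectrum.
    """
    logit = theta - item_difficulty - rater_severity
    return 1.0 / (1.0 + math.exp(-logit))


# The same comment and item, rated by a lenient and by a strict annotator.
lenient = faceted_rasch_prob(theta=1.0, item_difficulty=0.5, rater_severity=-0.5)
strict = faceted_rasch_prob(theta=1.0, item_difficulty=0.5, rater_severity=0.5)
neutral = faceted_rasch_prob(0.0, 0.0, 0.0)
```

Estimating the severity term per annotator is what lets the scaling step adjust for labeling perspective: two annotators can disagree on the raw label while implying the same latent score.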

2208.00778 2026-06-09 cs.DB cs.LG q-bio.QM 版本更新

SFILES 2.0: An extended text-based flowsheet representation

SFILES 2.0：一种扩展的基于文本的流程图表示

Gabriel Vogel, Edwin Hirtreiter, Lukas Schulze Balhorn, Artur M. Schweidtmann

发表机构 * Delft University of Technology (TU Delft), Department of Chemical Engineering, Van der Maasweg 9, 2629 HZ Delft, The Netherlands（代尔夫特理工大学化学工程系，荷兰代尔夫特）

AI总结提出SFILES 2.0，通过扩展符号和命名约定解决原版无法明确描述关键配置和控制结构的问题，并开源实现流程图与字符串的自动转换，旨在推动化工流程图FAIR数据库建设。

详情

DOI: 10.1007/s11081-023-09798-9
Journal ref: Optimization and Engineering, Volume 24, pages 2911-2933, (2023)

AI中文摘要

SFILES是一种基于文本的化工流程图表示法。最初由d'Anterroches（通过基团贡献法进行流程生成与设计）提出，其灵感来自基于文本的分子SMILES表示法。与流程图图像相比，文本格式在存储格式、计算可访问性以及最终的数据分析和处理方面具有若干优势。然而，原始SFILES版本无法明确描述基本的流程图配置，例如塔顶和塔底产品的区分。它也无法描述化工过程安全可靠运行所需的控制结构。此外，目前没有公开可用的软件用于将化工过程拓扑结构编码或解码为SFILES。我们提出了SFILES 2.0，并完整描述了扩展符号和命名约定。此外，我们提供了开源软件，用于流程图图与SFILES 2.0字符串之间的自动转换。通过这种方式，我们希望鼓励研究人员和工程师以SFILES 2.0字符串的形式发布他们的流程图拓扑结构。最终目标是建立化工过程流程图FAIR数据库的标准，这对于未来的数据处理和分析将具有重要价值。

英文摘要

SFILES are a text-based notation for chemical process flowsheets. They were originally proposed by d'Anterroches (Process flow sheet generation & design through a group contribution approach) who was inspired by the text-based SMILES notation for molecules. The text-based format has several advantages compared to flowsheet images regarding the storage format, computational accessibility, and eventually for data analysis and processing. However, the original SFILES version cannot describe essential flowsheet configurations unambiguously, such as the distinction between top and bottom products. Neither is it capable of describing the control structure required for the safe and reliable operation of chemical processes. Also, there is no publicly available software for decoding or encoding chemical process topologies to SFILES. We propose the SFILES 2.0 with a complete description of the extended notation and naming conventions. Additionally, we provide open-source software for the automated conversion between flowsheet graphs and SFILES 2.0 strings. This way, we hope to encourage researchers and engineers to publish their flowsheet topologies as SFILES 2.0 strings. The ultimate goal is to set the standards for creating a FAIR database of chemical process flowsheets, which would be of great value for future data analysis and processing.

2506.20573 2026-06-09 stat.ML cs.LG 版本更新

LARP: Learner-Agnostic Robust Data Prefiltering

LARP: 学习者无关的鲁棒数据预过滤

Kristian Minchev, Dimitar I. Dimitrov, Nikola Konstantinov

发表机构 * INSAIT, Sofia University "St. Kliment Ohridski"（INSAIT，索菲亚大学‘圣克莱门特·奥赫里德斯基’）

AI总结提出LARP框架，通过预过滤程序保护多种下游学习器性能，理论证明可行性并分析性能损失，实验评估了图像和表格任务中的代价。

Comments Published in Transactions on Machine Learning Research (06/2026). URL: https://openreview.net/forum?id=gI6VOV3jfO

详情

AI中文摘要

公共数据集对现代机器学习和统计推断至关重要，但通常包含低质量或受污染的样本，这可能损害模型性能。因此，需要一种原则性的预过滤程序，数据提供者可以应用该程序同时保护一系列潜在下游统计和学习程序的准确性。在这项工作中，我们形式化并分析了学习者无关的鲁棒数据预过滤（LARP），即设计预过滤程序的问题，该程序对预先指定的学习者集合上的最坏情况损失有保证。我们在两个理论环境中建立了LARP的可行性，通过提供最坏情况损失的上界保证。我们的理论结果表明，与针对单个学习者的特定预过滤相比，通过LARP保护异构学习者集合会以一定的性能损失为代价；我们将这一差距称为LARP的代价。为了评估这一性能差距，我们在图像和表格任务上实证测量了LARP的代价。我们进一步从节省重复数据整理工作的角度探讨了LARP的潜在好处，在一个博弈论模型中，下游学习者可以分摊单一预过滤的成本。

英文摘要

Public datasets, crucial for modern machine learning and statistical inference, often contain low-quality or contaminated samples that can harm model performance. This creates a need for principled prefiltering procedures that a data provider can apply to protect the accuracy of a range of potential downstream statistical and learning procedures simultaneously. In this work, we formalize and analyze Learner-Agnostic Robust data Prefiltering (LARP), the problem of designing prefiltering procedures with guarantees on the worst-case loss over a pre-specified set of learners. We establish the feasibility of LARP in two theoretical settings, by providing upper-bound guarantees on the worst-case loss. Our theoretical results indicate that protecting heterogeneous learner sets via LARP comes at the price of some performance loss compared to individual, learner-specific prefiltering; we call this gap the price of LARP. To assess this gap in performance, we empirically measure the price of LARP across image and tabular tasks. We further explore potential benefits of LARP from the perspective of saving on repeated data curation efforts, in a game-theoretic model where the downstream learners can split the cost of the single prefiltering.

2507.20975 2026-06-09 stat.ML cs.LG 版本更新

Locally Adaptive Conformal Inference for Operator Models

算子模型的局部自适应共形推断

Trevor Harris, Yan Liu

发表机构 * University of Connecticut（康涅狄格大学）； Meta Platforms Inc（Meta平台公司）

AI总结提出局部切片共形推断（LSCI），一种无分布框架，为算子模型生成函数值、局部自适应预测集，在合成和实际任务中比共形基线更紧、适应性更强。

Comments 12 pages, 3 figures, 2 tables, Preprint

2509.09151 2026-06-09 cs.CV cs.AI cs.LG 版本更新

Video Understanding by Design: How Datasets Shape Video Models

通过设计理解视频：数据集如何塑造视频模型

Lei Wang, Syuan-Hao Li, Piotr Koniusz, Yongsheng Gao

发表机构 * School of Engineering and Built Environment, Electrical and Electronic Engineering, Griffith University（工程与建筑环境学院，电气与电子工程学院，格里菲斯大学）； School of Computer Science and Engineering, University of New South Wales（计算机科学与工程学院，新南威尔士大学）

AI总结本文从数据集视角出发，提出统一框架连接数据集结构、归纳偏差与架构设计，分析数据集特性如何驱动视频理解架构创新，并讨论不同数据体制下的表征偏差。

Comments Research report

详情

AI中文摘要

楔形采样：具有近线性样本复杂性的高效张量补全

Hengrui Luo, Anna Ma, Ludovic Stephan, Yizhe Zhu

发表机构 * Rice University（莱斯大学）； University of California, Irvine（加州大学尔湾分校）； Univ Rennes, Ensai, CNRS, CREST-UMR 9194（雷恩大学，Ensai，CNRS，CREST-UMR 9194）； University of Southern California（南加州大学）

AI总结提出楔形采样非自适应方案，通过结构化长度二模式（楔形）分配观测，在均匀采样稀疏时增强谱信号，实现近线性样本复杂度的张量补全。

Comments COLT 2026 arXiv version. 65 pages, 3 figures

详情

AI中文摘要

我们引入了楔形采样（Wedge Sampling），一种用于低秩张量补全的新型非自适应采样方案。我们研究从部分条目中恢复维度为 $n \times \cdots \times n$ 的 $k$ 阶低秩张量。与标准均匀条目模型（即来自 $[n]^k$ 的 i.i.d. 样本）不同，楔形采样将观测分配到关联二分采样图中的结构化长度二模式（楔形）。通过直接促进这些长度二连接，采样设计增强了在均匀采样过于稀疏而无法产生足够信息相关性的情况下高效初始化所依赖的谱信号。我们的主要结果表明，这种采样范式的改变使得多项式时间算法能够以 $n$ 的近线性样本复杂度实现弱恢复和精确恢复。该方法也是即插即用的：基于楔形采样的谱初始化可以与现有的细化过程（例如，谱方法或梯度方法）结合，仅需额外 $\tilde{O}(n)$ 个均匀采样条目，显著优于在均匀条目采样下高效方法通常所需的 $\tilde{O}(n^{k/2})$ 样本复杂度。总体而言，我们的结果表明，Barak 和 Moitra (2022) 中强调的统计-计算差距在很大程度上是张量补全中均匀条目采样模型的结果，而保证强初始化的替代非自适应测量设计可以克服这一障碍。

英文摘要

We introduce Wedge Sampling, a new non-adaptive sampling scheme for low-rank tensor completion. We study recovery of an order-$k$ low-rank tensor of dimension $n \times \cdots \times n$ from a subset of its entries. Unlike the standard uniform entry model (i.e., i.i.d. samples from $[n]^k$), wedge sampling allocates observations to structured length-two patterns (wedges) in an associated bipartite sampling graph. By directly promoting these length-two connections, the sampling design strengthens the spectral signal that underlies efficient initialization, in regimes where uniform sampling is too sparse to generate enough informative correlations. Our main result shows that this change in sampling paradigm enables polynomial-time algorithms to achieve both weak and exact recovery with nearly linear sample complexity in $n$. The approach is also plug-and-play: wedge-sampling-based spectral initialization can be combined with existing refinement procedures (e.g., spectral or gradient-based methods) using only an additional $\tilde{O}(n)$ uniformly sampled entries, substantially improving over the $\tilde{O}(n^{k/2})$ sample complexity typically required under uniform entry sampling for efficient methods. Overall, our results suggest that the statistical-to-computational gap highlighted in Barak and Moitra (2022) is, to a large extent, a consequence of the uniform entry sampling model for tensor completion, and that alternative non-adaptive measurement designs that guarantee a strong initialization can overcome this barrier.

2602.12129 2026-06-09 cs.IR cs.LG 版本更新

Towards Personalized Bangla Book Recommendation: A Large-Scale Heterogeneous Book Graph Dataset

面向个性化孟加拉语图书推荐：大规模异构图书图谱数据集

Rahin Arefin Ahmed, Md. Anik Chowdhury, Sakil Ahmed Sheikh Reza, Devnil Bhattacharjee, Muhammad Abdullah Adnan, Julian McAuley, Nafis Sadeq

发表机构 * East West University（东西方大学）； Bangladesh University of Engineering and Technology（孟加拉工程与技术大学）； University of California San Diego（加州大学圣地亚哥分校）

AI总结针对孟加拉语文学缺乏结构化大规模公开数据集的问题，构建了RokomariBG异构图书图谱数据集，包含12.7万本书、6.3万用户等实体及多种关系，通过基准测试表明异构关系与语码混合（code-mixed）文本元数据显著影响推荐性能。

Comments Added new experiment results on sequential recommendation, top-N recommendation results have been updated using per user temporal leave-last-one-out instead of random split

详情

AI中文摘要

孟加拉语文学中的个性化图书推荐一直受限于缺乏结构化、大规模且公开可用的数据集。本文介绍了RokomariBG，一个大规模异构图书图谱数据集，旨在支持低资源语言环境下的个性化推荐研究。该数据集包含127,302本书、63,723个用户、16,601位作者、1,515个类别、2,757家出版社和209,602条评论，通过多种关系类型连接，并组织为综合知识图谱。为展示数据集的实用性，我们针对Top-N推荐和序列推荐任务进行了系统基准研究，评估了多种代表性推荐模型。通过全面基准测试，我们证明了该领域的推荐性能同时受异构关系信息和语码混合（code-mixed）文本元数据的强烈影响。这些发现揭示了孟加拉国电商生态系统中现有推荐基准大多缺失的独特挑战。总体而言，本文为孟加拉语图书推荐研究建立了基础基准和公开可用资源，实现了可重复评估及未来对低资源文化领域推荐的研究。数据集和代码已公开于 https://github.com/backlashblitz/Bangla-Book-Recommendation-Dataset

英文摘要

Personalized book recommendation in Bangla literature has been constrained by the lack of structured, large-scale, and publicly available datasets. This work introduces RokomariBG, a large-scale heterogeneous book graph dataset designed to support research on personalized recommendation in a low-resource language setting. The dataset comprises 127,302 books, 63,723 users, 16,601 authors, 1,515 categories, 2,757 publishers, and 209,602 reviews, connected through several relation types and organized as a comprehensive knowledge graph. To demonstrate the utility of the dataset, we present a systematic benchmarking study on the top-N recommendation and sequential recommendation tasks, evaluating a diverse set of representative recommendation models. Through comprehensive benchmarking, we demonstrate that recommendation performance in this domain is strongly influenced by both heterogeneous relational information and code-mixed textual metadata. These findings reveal unique challenges of Bangladeshi e-commerce ecosystems that are largely absent from existing recommendation benchmarks. Overall, this work establishes a foundational benchmark and a publicly available resource for Bangla book recommendation research, enabling reproducible evaluation and future studies on recommendation in low-resource cultural domains. The dataset and code are publicly available at https://github.com/backlashblitz/Bangla-Book-Recommendation-Dataset
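
The entry's comments mention that top-N results now use a per-user temporal leave-last-one-out split instead of a random split. A minimal sketch of such a split is given below; the `(user, item, timestamp)` triple layout and the function name are assumptions for illustration and are not taken from the released code.

```python
from collections import defaultdict

def temporal_leave_last_one_out(interactions):
    """Split (user, item, timestamp) triples so each user's latest item goes to test."""
    by_user = defaultdict(list)
    for user, item, ts in interactions:
        by_user[user].append((ts, item))
    train, test = [], []
    for user, events in by_user.items():
        events.sort()  # chronological order (ties broken by item id)
        *history, last = events
        train.extend((user, item, ts) for ts, item in history)
        # A user with a single interaction yields a test item but no history;
        # real pipelines often filter such users out beforehand.
        test.append((user, last[1], last[0]))
    return train, test

interactions = [("u1", "a", 3), ("u1", "b", 1), ("u2", "c", 5), ("u1", "d", 2)]
train, test = temporal_leave_last_one_out(interactions)
```

Unlike a random split, this keeps every training interaction strictly earlier than the held-out one for the same user, which avoids leaking future behaviour into training.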

2604.06210 2026-06-09 cs.CL cs.AI cs.CY cs.LG 版本更新

Distributional Open-Ended Evaluation of LLM Cultural Value Alignment Based on Value Codebook

基于价值码本的LLM文化价值对齐的分布性开放式评估

Jaehyeok Lee, Xiaoyuan Yi, Jing Yao, Hyunjin Hwang, Roy Ka-Wei Lee, Xing Xie, JinYeong Bak

发表机构 * KAIST（韩国科学技术院）

AI总结提出DOVE框架，通过率失真变分优化构建价值码本，利用不平衡最优传输度量分布对齐，解决LLM文化价值评估中的构造-组成-上下文挑战。

Comments ICML 2026 Camera Ready

详情

AI中文摘要

随着LLM在全球部署，使其文化价值取向对齐对于安全性和用户参与至关重要。然而，现有基准面临构造-组成-上下文（$C^3$）挑战：依赖判别性、多项选择格式，探测的是价值知识而非真实取向，忽视亚文化异质性，且与真实世界的开放式生成不匹配。我们引入DOVE，一个直接比较人类撰写的文本分布与LLM生成输出的分布性评估框架。DOVE利用率失真变分优化目标从10K文档中构建紧凑的价值码本，将文本映射到结构化价值空间以过滤语义噪声。使用不平衡最优传输测量对齐，捕捉文化内分布结构和子群体多样性。在12个LLM上的实验表明，DOVE实现了优越的预测有效性，与下游任务的相关性达到31.56%，同时每个文化仅需500个样本即可保持高可靠性。

英文摘要

As LLMs are globally deployed, aligning their cultural value orientations is critical for safety and user engagement. However, existing benchmarks face the Construct-Composition-Context ($C^3$) challenge: relying on discriminative, multiple-choice formats that probe value knowledge rather than true orientations, overlook subcultural heterogeneity, and mismatch with real-world open-ended generation. We introduce DOVE, a distributional evaluation framework that directly compares human-written text distributions with LLM-generated outputs. DOVE utilizes a rate-distortion variational optimization objective to construct a compact value codebook from 10K documents, mapping text into a structured value space to filter semantic noise. Alignment is measured using unbalanced optimal transport, capturing intra-cultural distributional structures and subgroup diversity. Experiments across 12 LLMs show that DOVE achieves superior predictive validity, attaining a 31.56% correlation with downstream tasks, while maintaining high reliability with as few as 500 samples per culture.

2605.19276 2026-06-09 cs.CL cs.LG 版本更新

大规模MST-Direct：基于Sinkhorn最优传输的多变量与条件地质统计模拟

Tcharlies Bachmann Schmitz

发表机构 * GitHub ； arXiv

AI总结提出MST-Direct扩展方法，通过稀疏Sinkhorn匹配器、多变量元组匹配和克里金条件化，实现大规模、多变量、条件地质统计模拟，精确保持联合分布。

详情

AI中文摘要

本文将MST-Direct（一种用于多变量地质统计模拟的基于Sinkhorn传输的匹配方法）从原始的二元、无条件、小网格形式扩展到多变量、条件和大网格设置。我们解决了原始工作中确定的三个主要限制：（i）通过具有O(nC)内存复杂度的稀疏、候选限制的Sinkhorn匹配器，实现超过几千个节点的可扩展性；（ii）通过将目标值元组匹配到独立FFT-MA高斯骨干上扩展到多个变量，该骨干再现指定的变差函数；以及（iii）通过克里金法条件化骨干，同时在其空间位置固定观测数据元组进行硬数据条件化。由于传输计划仍然是目标元组的排列，多变量联合分布被精确保持。该方法使用与直接多变量模拟（DMS）相同的六变量、异方差、强非线性参考分布进行验证，在无条件（200x200）和条件（100x100，200个硬数据样本）场景下，并与投影寻踪多变量变换（PPMT）进行基准比较。结果表明，MST-Direct以零直方图误差再现联合分布，精确满足硬数据，并准确再现指定的空间相关结构，而PPMT仍然是近似。索引术语：最优传输、Sinkhorn算法、地质统计模拟、多变量模拟。

英文摘要

This paper extends MST-Direct, a Matching-via-Sinkhorn-Transport approach for multivariate geostatistical simulation, from the original bivariate, unconditional, small-grid formulation to multivariate, conditional, and large-grid settings. We address the three main limitations identified in the original work: (i) scalability beyond a few thousand nodes through a sparse, candidate-restricted Sinkhorn matcher with O(nC) memory complexity; (ii) extension to multiple variables by matching target value tuples onto an independent FFT-MA Gaussian backbone that reproduces a prescribed variogram; and (iii) hard-data conditioning by fixing observed data tuples at their spatial locations while conditioning the backbone through kriging. Because the transport plan remains a permutation of the target tuples, the multivariate joint distribution is preserved exactly. The method is validated using the same six-variate, heteroscedastic, strongly nonlinear reference distribution employed in Direct Multivariate Simulation (DMS), under both unconditional (200x200) and conditional (100x100, 200 hard-data samples) scenarios, and is benchmarked against the Projection Pursuit Multivariate Transform (PPMT). Results show that MST-Direct reproduces the joint distribution with zero histogram error, exactly honours hard data, and accurately reproduces the prescribed spatial correlation structure, whereas PPMT remains an approximation. Index Terms: optimal transport, Sinkhorn algorithm, geostatistical simulation, multivariate simulation.
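
The claim that a permutation of target tuples preserves the joint distribution exactly can be checked with a small experiment. The sketch below deliberately swaps in plain rank matching for the sparse Sinkhorn matcher (it is a stand-in, not the paper's algorithm), and the backbone is independent Gaussian noise rather than an FFT-MA field; all names and data are illustrative assumptions.

```python
import random

def rank_match(backbone, targets, key=lambda t: t[0]):
    """Assign target tuples to nodes so the ordering of key(tuple) follows the backbone ordering."""
    node_order = sorted(range(len(backbone)), key=lambda i: backbone[i])
    sorted_targets = sorted(targets, key=key)
    field = [None] * len(backbone)
    for rank, node in enumerate(node_order):
        field[node] = sorted_targets[rank]  # each tuple is used exactly once
    return field

random.seed(0)
n = 1000
backbone = [random.gauss(0.0, 1.0) for _ in range(n)]
# Toy nonlinear bivariate target: y depends on x quadratically plus noise.
targets = [(x, x * x + random.gauss(0.0, 0.1))
           for x in (random.uniform(-2.0, 2.0) for _ in range(n))]

field = rank_match(backbone, targets)
# The simulated field is a reordering of the target tuples, so the joint
# sample (and therefore every histogram) is reproduced without error.
same_multiset = sorted(field) == sorted(targets)
```

Because whole tuples are moved rather than individual variables, any nonlinear dependence between the variables is carried over unchanged; only the spatial arrangement is decided by the matching.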

2606.07582 2026-06-09 cs.LG cs.AI cs.ET 新提交

Customer Churn Prediction on Structured Data Using FT-Transformer and Stacking Ensembles

基于FT-Transformer和堆叠集成的结构化数据客户流失预测

Joyjit Roy, Samaresh Kumar Singh, Laxmi Shaw

发表机构 * Independent Researcher, Austin, TX, USA（独立研究员，美国德克萨斯州奥斯汀）； Independent Researcher, Leander, TX（独立研究员，美国德克萨斯州利安德）； Texas A & M University-Victoria, Victoria, TX（德克萨斯农工大学维多利亚分校）

AI总结提出一种结合FT-Transformer与XGBoost的混合架构，通过校准感知堆叠集成处理类别不平衡和特征交互，在银行客户流失数据集上F1达62.10%，AUC-ROC为0.861。

Comments 22 pages, 9 figures, 20 tables; published in IEEE Access

详情

DOI: 10.1109/ACCESS.2026.3686374
Journal ref: IEEE Access, vol. 14, pp. 62834-62855, 2026

AI中文摘要

客户流失预测在保险、数字银行、电子商务和订阅平台等数据驱动行业中至关重要，因为保留现有客户通常比获取新客户更具成本效益。由于类别不平衡、非线性特征交互和异质特征类型，在结构化数据集上预测流失仍然具有挑战性。基于树的集成方法在这些场景中始终表现出强大的性能，通常优于传统神经网络。本研究引入了一种经过验证的混合架构，通过校准感知堆叠将特征标记化变换器（FT-Transformer）与梯度提升树相结合。所提出的框架解决了先前研究中在统计验证、概率校准和可重复性方面的持续空白。FT-Transformer利用自注意力捕获高阶特征交互，而XGBoost通过互补的归纳偏置捕获梯度提升决策边界。类别不平衡通过使用类别加权损失函数处理，从而避免合成过采样并保留少数类分布。模型使用基于折叠外（OOF）堆叠的逻辑回归元学习器进行集成，该元学习器重新校准过于自信的基模型输出并学习最优组合权重。在一个公开的银行流失数据集上，混合模型在5x5交叉验证下达到62.10%的F1、0.861的AUC-ROC和0.647的PR-AUC，相比多层感知机（MLP）基线分别提升3.37个F1点和0.027个AUC，并报告了95%置信区间。消融研究表明，变换器组件和堆叠策略都对性能有实质性贡献。所提出的方法为结构化表格数据上的当代流失预测提供了一个可重复且可扩展的参考架构。

英文摘要

Customer churn prediction is essential across data-driven industries such as insurance, digital banking, eCommerce, and subscription platforms, where retaining existing customers is typically more cost-effective than acquiring new ones. Predicting churn on structured datasets remains challenging due to class imbalance, nonlinear feature interactions, and heterogeneous feature types. Tree-based ensemble methods consistently demonstrate strong performance in these contexts, often outperforming conventional neural networks. This study introduces a validated hybrid architecture that integrates feature-tokenized transformers (FT-Transformer) with gradient-boosted trees through calibration-aware stacking. The proposed framework addresses persistent gaps in statistical validation, probability calibration, and reproducibility found in prior research. The FT-Transformer captures higher-order feature interactions using self-attention, while XGBoost captures gradient-boosted decision boundaries with complementary inductive biases. Class imbalance is handled using class-weighted loss functions, thereby avoiding synthetic oversampling and preserving minority-class distributions. The models are ensembled using out-of-fold (OOF) stacking with a logistic regression meta-learner, which recalibrates overconfident base model outputs and learns optimal combination weights. On a public bank churn dataset, the hybrid model achieves 62.10% F1, 0.861 AUC-ROC, and 0.647 PR-AUC, outperforming the Multi-Layer Perceptron (MLP) baseline by 3.37 F1 points and 0.027 AUC under 5x5 cross-validation with 95% confidence intervals reported. Ablation studies demonstrate that both the transformer component and stacking strategy contribute materially to performance. The proposed methodology offers a reproducible and extensible reference architecture for contemporary churn prediction on structured tabular data.
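
Out-of-fold (OOF) stacking relies on every meta-learner input being produced by a base model that never saw that sample. The sketch below shows only that mechanism, under stated assumptions: contiguous unshuffled folds, and a toy base learner that predicts the churn rate of its training fold. It is not the paper's FT-Transformer or XGBoost pipeline, and the helper names are hypothetical.

```python
def kfold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds over range(n)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, val
        start += size

def oof_predictions(X, y, fit, predict, k=5):
    """Return one prediction per sample, each from a model trained without that sample."""
    oof = [None] * len(y)
    for train, val in kfold_indices(len(y), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in val:
            oof[i] = predict(model, X[i])
    return oof

# Toy base learner (hypothetical): the "model" is just the training-fold churn rate.
fit_rate = lambda X, y: sum(y) / len(y)
predict_rate = lambda model, x: model

y = [1, 0, 0, 0, 1, 0]
X = list(range(len(y)))
oof = oof_predictions(X, y, fit_rate, predict_rate, k=3)
```

In a full stacking setup, the OOF columns from each base model would form the feature matrix of the logistic regression meta-learner; a practical version would also shuffle and stratify the folds, which is omitted here for brevity.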

2606.07606 2026-06-09 cs.LG 新提交

QDSP: An Interpretable Structured Learning Framework for Predicting Death or Cerebral Palsy in Very Low Birth Weight Infants

QDSP：一种用于预测极低出生体重婴儿死亡或脑瘫的可解释结构化学习框架

Ling Wang, Xiaolong Li, Hui Zhou, Jing Shi, Fuhao Zhang, Dapeng Chen, Nan Mu

发表机构 * College of Computer Science, Sichuan Normal University（四川师范大学计算机科学学院）； West China Second University Hospital, Sichuan University（四川大学华西第二医院）

AI总结提出QDSP框架，集成配额引导子空间采样和可微决策结构感知，在极低出生体重婴儿队列中实现高精度死亡/脑瘫预测，并提供可解释的临床决策路径。

详情

AI中文摘要

极低出生体重婴儿（VLBWI）面临高死亡风险和严重神经发育障碍（包括脑瘫），但在高维且数据有限的临床环境中，可靠的出院时预后分层仍然具有挑战性。为解决此问题，我们提出QDSP，一种可解释的结构化学习框架，集成配额引导子空间采样（QSS）和可微决策引导结构感知（DSP）。QSS模块通过基于自助法的特征一致性估计构建稳定性感知且低冗余的特征子空间，而DSP模块采用可微软斜决策结构建模非线性临床交互，同时保留可追溯的决策证据。该框架在包含51名婴儿的真实VLBWI队列上评估，并在三个公共医学表格数据集上进一步验证。在主要队列上，QDSP达到0.9200的准确率和0.9714的AUC，优于代表性机器学习和深度表格学习基线，包括XGBoost、TabNet和TabPFN。在外部数据集上，QDSP在不同样本量和临床分布下保持有竞争力的判别力和校准度。此外，基于SHAP的分析和可微决策路径追踪识别出临床相关预测因子，包括囊性脑室周围白质软化（cPVL）和出生体重，与已建立的新生儿病理生理学证据一致。这些结果表明，QDSP为VLBWI出院时风险分层提供了可解释且稳健的框架，并可能支持新生儿重症监护环境中的早期个体化临床决策。

英文摘要

Very low birth weight infants (VLBWI) are at high risk of mortality and severe neurodevelopmental impairment, including cerebral palsy, yet reliable discharge-time prognostic stratification remains challenging in high-dimensional and data-limited clinical settings. To address this problem, we propose QDSP, an interpretable structured learning framework that integrates Quota-guided Subspace Sampling (QSS) and Differentiable-decision-guided Structure Perception (DSP). The QSS module constructs stability-aware and low-redundancy feature subspaces through bootstrap-based feature consistency estimation, whereas the DSP module employs differentiable soft oblique decision structures to model nonlinear clinical interactions while preserving traceable decision evidence. The proposed framework was evaluated on a real-world VLBWI cohort comprising 51 infants and further validated on three public medical tabular datasets. On the primary cohort, QDSP achieved an accuracy of 0.9200 and an AUC of 0.9714, outperforming representative machine learning and deep tabular learning baselines, including XGBoost, TabNet, and TabPFN. Across external datasets, QDSP maintained competitive discrimination and calibration under varying sample sizes and clinical distributions. In addition, SHAP-based analyses and differentiable decision-path tracing identified clinically relevant predictors, including cystic periventricular leukomalacia (cPVL) and birth weight, consistent with established neonatal pathophysiological evidence. These results suggest that QDSP provides an interpretable and robust framework for discharge-time risk stratification in VLBWI and may support early individualized clinical decision-making in neonatal intensive care settings.

2606.07614 2026-06-09 cs.LG stat.AP 新提交

Measuring Poverty and Inequality with Reduced Data: A Machine Learning Approach Using Nigerian Household Data

用缩减数据衡量贫困与不平等：基于尼日利亚住户数据的机器学习方法

Vanesa Jordá, Miguel Niño-Zarazúa

发表机构 * Cantabria University（坎塔布里亚大学）； SOAS University of London（伦敦大学亚非学院）； United Nations University World Institute for Development Economics Research (UNU-WIDER)（联合国大学世界发展经济学研究所）

AI总结本文利用随机森林递归特征消除法分析尼日利亚调查数据，发现少量预测因子即可高精度识别贫困状态和不平等线位置，表明机器学习可优化调查设计并降低数据需求。

详情

AI中文摘要

可靠衡量收入和消费对于监测中低收入国家的贫困与不平等至关重要，但完整的住户调查成本高昂且难以定期实施。本文探讨缩减调查工具能否保留关键分布信息。我们应用随机森林递归特征消除法（RF-RFE）对2018/19年尼日利亚通用住户调查面板数据进行分析，识别最能将个体划分到福利分布中的收入来源、消费类别和住户特征。分析聚焦三个结果：贫困状态、在五等分分布中的位置以及相对于基于基尼系数的不平等线的位置。调查的种植后和收获后阶段使我们能够评估不同季节背景下的表现。结果表明，RF-RFE在少量预测因子下实现了强分类准确率。对于消费，使用少量支出类别即可准确预测贫困状态和不平等线位置，而五等分分类对季节性消费达到约80%的准确率，对从单次季节性访问预测的年消费达到60-65%的准确率。对于收入，使用五个预测因子贫困状态准确率约达90%，不平等线位置主要由劳动收入捕获。研究结果表明，机器学习方法有助于改进调查设计并减少数据需求，同时保留衡量和监测贫困与不平等所需的大部分分布信息。

英文摘要

Reliable measurement of income and consumption is essential for monitoring poverty and inequality in low- and middle-income countries, yet full household surveys are costly and difficult to implement regularly. This paper examines whether reduced survey instruments can preserve key distributional information. We apply Random Forest Recursive Feature Elimination (RF-RFE) to the 2018/19 Nigeria General Household Survey-Panel to identify the income sources, consumption categories and household characteristics that best classify individuals within the welfare distribution. The analysis focuses on three outcomes: poverty status, location in the quintile distribution and position relative to the Gini-based inequality line. The survey's post-planting and post-harvest periods allow us to assess performance under different seasonal contexts. Results show that RF-RFE achieves strong classification accuracy with few predictors. For consumption, poverty status and inequality-line position are accurately predicted using a small set of expenditure categories, while quintile classification reaches about 80 percent accuracy for seasonal consumption and 60 to 65 percent for annual consumption predicted from a single seasonal visit. For income, poverty status reaches around 90 percent accuracy with five predictors, and inequality-line position is largely captured by labour earnings. The findings suggest that machine-learning methods can help improve survey design and reduce data requirements while retaining much of the distributional information needed to measure and monitor poverty and inequality.
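
The abstract refers to a Gini-based inequality line without giving its construction, so the sketch below covers only the underlying Gini coefficient, using the standard mean-absolute-difference form G = Σ|x_i − x_j| / (2 n² μ). The function name and sample values are illustrative assumptions.

```python
def gini(values):
    """Gini coefficient via the mean absolute difference over all ordered pairs."""
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        return 0.0  # convention chosen here for an all-zero distribution
    total_abs_diff = sum(abs(a - b) for a in values for b in values)
    return total_abs_diff / (2 * n * n * mean)

equal = gini([1, 1, 1, 1])         # perfectly equal consumption
concentrated = gini([0, 0, 0, 1])  # one household holds everything
```

For four households where one holds all consumption the value is (n − 1)/n = 0.75 rather than 1, which is the usual finite-sample behaviour of this formula. The double loop is quadratic in the number of households, so a sort-based formula would be preferable on full survey data.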

2606.07651 2026-06-09 cs.LG cs.CV 新提交

KITE: A Tri-Modal Transformer Integrating Text, Images, and Knowledge Graphs for Fake News Detection

KITE：一种融合文本、图像和知识图谱的三模态假新闻检测Transformer

Kevin Patel, Shashi Bhushan Jha

发表机构 * Department of Computer Science, University of West Florida（西佛罗里达大学计算机科学系）

AI总结提出三模态假新闻检测框架KITE，联合建模文本、视觉和知识表示，利用跨模态注意力整合特征，在基准数据集上显著优于单双模态基线。

详情

AI中文摘要

随着多模态虚假信息日益复杂，无缝融合欺骗性文本、操纵性视觉和事实错误的主张，传统的假新闻检测方法已落后。大多数先前工作侧重于文本-图像融合，或将外部知识仅作为后处理步骤应用，限制了其检测更深层语义不一致的能力。在本文中，我们引入了KITE（知识集成文本-图像编码器），一种三模态假新闻检测框架，联合建模文本、视觉和事实知识表示。KITE利用RoBERTa [23,14]和CLIP [24]进行语言和视觉编码，同时图注意力网络（GAT）处理从Wikidata检索的结构化事实。KITE在多模态Transformer中使用跨模态注意力[9]来集成文本、视觉和知识特征，帮助理解每种模态如何相互关联。模态特定置信度分数与最终预测一起生成，通过指示哪种输入类型对决策影响最大来提供可解释性。在基准数据集上的评估表明，KITE显著优于单模态和双模态基线，特别是在涉及图像-文本不匹配或与外部知识矛盾的情景中。

英文摘要

Traditional fake news detection methods are falling behind as multimodal misinformation grows more advanced, seamlessly blending deceptive text, manipulated visuals, and factually incorrect claims. Most prior work focuses on text-image fusion or applies external knowledge only as a post-processing step, limiting its ability to detect deeper semantic inconsistencies. In this paper, we introduce KITE (Knowledge-Integrated Text-Image Encoder), a tri-modal fake news detection framework that jointly models textual, visual, and factual knowledge representations. KITE leverages RoBERTa [23,14] and CLIP [24] for linguistic and visual encoding, while a Graph Attention Network (GAT) processes structured facts retrieved from Wikidata. KITE uses cross-modal attention [9] within a multimodal transformer to integrate text, visual, and knowledge features, helping it understand how the modalities relate to one another. Modality-specific confidence scores are generated alongside the final prediction, offering interpretability by indicating which input type most influenced the decision. Evaluations on benchmark datasets demonstrate that KITE significantly outperforms unimodal and bimodal baselines, particularly in scenarios involving image-text mismatches or contradictions with external knowledge.

2606.07685 2026-06-09 cs.LG cs.AI 新提交

Test-Time Adaptive Composition for Machine Learning as a Service (MLaaS) in IoT Environments

物联网环境下机器学习即服务（MLaaS）的测试时自适应组合

Deepak Kanneganti, Sajib Mistry, Sheik Mohammad Mostakim Fattah, Aneesh Krishna

发表机构 * Deepak Kanneganti ； Sajib Mistry ； Sheik Mohammad Mostakim Fattah ； Aneesh Krishna

AI总结针对物联网环境中MLaaS组合因动态性而失效的问题，提出一种测试时自适应（TTA）组合框架，通过TTA感知可组合性模型和服务级自适应模型，在推理时调整服务并保持组合性能，显著降低计算时间。

2606.07686 2026-06-09 cs.LG cs.AI 新提交

Knowledge-Inclusive Adaptive Physics-Informed Neural Network for Microbial Interaction Modelling

知识包容的自适应物理信息神经网络用于微生物相互作用建模

Ravisha Rupasinghe, Rajith Vidanaarachchi, Asela Hevapathige, Sachith Seneviratne, Sen-Lin Tang, Saman Halgamuge

发表机构 * University of Melbourne（墨尔本大学）； Academia Sinica（中央研究院）

AI总结提出一种知识包容的自适应PINN框架，通过整合文本和网络结构知识改进微生物群落建模，在真实和模拟数据集上性能提升最高53%。

Comments 33 pages

详情

AI中文摘要

物理信息神经网络（PINN）是一种在机器学习方法中以方程形式包含知识的方式。除了方程，知识还以其他形式存在，如文本和网络结构。虽然现有的基于PINN的方法从数据中发现方程参数，但它们仅依赖实验测量。我们提出一个新的PINN框架，通过整合辅助知识源来丰富参数发现。我们将该框架应用于微生物学，其中广义Lotka-Volterra（gLV）作为建模微生物群落的生物学基础。我们证明，整合知识可以改进微生物群落建模。我们的框架利用同行评审的宏基因组学文献丰富gLV参数，因为文本提供了gLV单独无法捕捉的外部影响的生物学背景。我们使用数据驱动的整合方法将这些知识与微生物丰度的实验测量相结合。我们通过显式建模微生物相互作用来整合基于网络的结构知识。我们的知识包容框架推断微生物网络，揭示生态学见解。我们根据文献中记录的生态角色验证这些发现。我们在涵盖人类和植物相关微生物群落的真实和模拟数据集上进行评估。我们的框架在无知识情况下比现有技术提升最高53%。知识添加在基于Bray-Curtis差异的准确率上带来最高23%的提升，在R²上带来47%的提升。

英文摘要

Physics-Informed Neural Network (PINN) is a way of including knowledge in the form of equations in Machine Learning methods. Beyond equations, knowledge exists in other forms, such as text and network structure. While existing PINN-based approaches discover equation parameters from data, they rely solely on experimental measurements. We propose a new PINN framework that enriches parameter discovery by incorporating auxiliary knowledge sources. We instantiate our framework for microbiology, where generalised Lotka-Volterra (gLV) serves as a biological foundation for modelling microbial communities. We demonstrate that incorporating knowledge improves microbial community modelling. Our framework enriches the gLV parameters using peer-reviewed metagenomics literature, as text provides biological context on external influences that gLV alone cannot capture. We combine this knowledge with experimental measurements of microbial abundance using a data-driven integration approach. We integrate network-based structural knowledge by explicitly modelling microbial interactions. Our knowledge-inclusive framework infers microbial networks, revealing ecological insights. We validate these findings against ecological roles documented in the literature. We evaluate on real and simulated datasets spanning human- and plant-associated microbial communities. Our framework improves over the state-of-the-art by up to 53%, even without knowledge. Knowledge addition yields gains of up to 23% in Bray-Curtis Dissimilarity-based accuracy and 47% in $\mathrm{R}^2$.
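
For readers unfamiliar with the generalised Lotka-Volterra model that the framework builds on, its standard form is dx_i/dt = x_i (r_i + Σ_j A_ij x_j), with growth rates r and an interaction matrix A. The sketch below integrates it with an explicit Euler scheme; the step size, parameter values, and function names are illustrative assumptions, and this is a plain simulator rather than the paper's PINN.

```python
def glv_step(x, r, A, dt):
    """One explicit Euler step of dx_i/dt = x_i * (r_i + sum_j A[i][j] * x_j)."""
    n = len(x)
    growth = [r[i] + sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return [x[i] + dt * x[i] * growth[i] for i in range(n)]

def simulate(x0, r, A, dt=0.01, steps=2000):
    """Integrate the gLV system from x0 and return the final abundances."""
    x = list(x0)
    for _ in range(steps):
        x = glv_step(x, r, A, dt)
    return x

# Single species with self-limitation: logistic growth towards carrying capacity 1.
single = simulate([0.5], [1.0], [[-1.0]])

# Two competing species (made-up parameters); off-diagonal entries encode interactions.
pair = simulate([0.2, 0.2], [1.0, 0.8], [[-1.0, -0.3], [-0.4, -1.0]])
```

The signs and magnitudes of the off-diagonal entries of `A` are what an inferred microbial interaction network would report; explicit Euler is used here only for brevity and can become unstable for stiff parameter choices.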

2606.07692 2026-06-09 cs.LG cs.AI cs.ET 新提交

BCG-FM: A Foundation Model for Ambient Cardiac Health Sensing

BCG-FM：一种用于环境心脏健康感知的基础模型

Magnus Ruud Kjaer, Haejun Han, Ashish Neupane, David Q. Sun

发表机构 * Department of Computer Science and Engineering, University of California, San Diego（加州大学圣迭戈分校计算机科学与工程系）

AI总结提出首个环境机械生物信号基础模型BCG-FM，利用床垫压电传感器无感采集心冲击图，通过14.6万人的275万小时数据预训练，在生物年龄估计上达到3.26年MAE，并实现15种健康状态的临床相关判别。

详情

AI中文摘要

可穿戴生物信号的基础模型在多项临床任务中已匹配或超越监督专家，但所有模型都依赖于需要用户主动操作的模态——佩戴设备或访问睡眠实验室。我们提出BCG-FM，首个用于环境机械生物信号的基础模型。嵌入床垫表面的压电传感器每晚无感记录心冲击图（BCG）；我们使用参与者级对比学习，基于145,985名个体的总计275万小时夜间记录预训练BCG-FM，这是迄今为止最大的原始波形生物信号预训练语料库。冻结的BCG-FM嵌入在生物年龄估计上达到3.26年MAE（所有环境、非接触模态中最低报告值），并在15种自我报告健康状况和三个独立外部队列中产生临床相关的判别。仅500名标注参与者的预训练表示优于在3,372名参与者上训练的完全监督基线，且表示质量与对比批次大小呈对数线性关系。这些结果确立了环境、纵向机械生物信号作为健康基础模型的可行模态。

英文摘要

Foundation models for wearable biosignals have matched or exceeded supervised specialists across a range of clinical tasks, yet all rely on modalities that require deliberate user action, such as wearing a device or visiting a sleep lab. We introduce BCG-FM, the first foundation model for ambient mechanical biosignals. A piezoelectric sensor embedded in the bed surface records ballistocardiography (BCG) each night without user effort; we pretrain BCG-FM with participant-level contrastive learning on a total of 2.75 million hours of nightly recordings from 145,985 individuals, the largest raw-waveform biosignal pretraining corpus to date. Frozen BCG-FM embeddings achieve 3.26-year MAE on biological-age estimation (the lowest reported for any ambient, contactless modality) and yield clinically relevant discrimination across 15 self-reported health conditions and three independent external cohorts. Pretrained representations from only 500 labeled participants outperform a fully supervised baseline trained on 3,372, and representation quality scales log-linearly with contrastive batch size. These results establish ambient, longitudinal mechanical biosignals as a viable modality for health foundation models.

2606.07694 2026-06-09 cs.LG stat.ML 新提交

FunctionEvolve: 基于结构引导的符号回归与大型语言模型

Zeyu Xia, Jun Zhu, Dong Yan

发表机构 * Bosch Center for Artificial Intelligence（博世人工智能中心）； Department of Computer Science and Technology, Tsinghua University（清华大学计算机科学与技术系）； Tsinghua-Bosch Joint Center for ML, Tsinghua University（清华大学-博世联合机器学习中心）

AI总结提出FunctionEvolve框架，利用表达式树组织符号回归搜索，通过结构摘要、局部树编辑和结构感知系数拟合，在LLM-SRBench合成子集上以Claude Opus 4.6实现82.9%的SA@50，达到同骨干基线的4.5倍。

详情

AI中文摘要

符号回归旨在从数据中揭示显式的科学定律。近期方法使用大型语言模型（LLM）引导基于背景文本的变异，这比随机遗传编程更具方向性。然而，精确的符号恢复既需要语义引导，也需要显式结构，以便通过有效的符号表示进行领域信息搜索。当前的LLM驱动系统仍然是结构盲的：它们在模糊的候选者中进行选择，缺乏局部变异的显式机制，并依赖脆弱的系数拟合，这可能会低估正确的骨架。我们提出FunctionEvolve，一个使用表达式树组织整个搜索的进化框架：结构摘要促进多样化的父代选择，局部树编辑保留有用的子表达式，结构感知拟合分解、约束和简化系数，以实现更可靠的评分。它仅使用初等函数族，无需额外的领域特定规则限制泛化能力。在LLM-SRBench的129任务合成子集上，使用Claude Opus 4.6的FunctionEvolve恢复了107个精确形式，达到82.9%的SA@50，是同骨干基线的4.5倍，以及55.8%的SA@1，是此前最强已发布top-1结果的3.6倍。消融实验表明，结构可见搜索是可靠恢复的核心，LLM引导的改进和结构感知系数优化作为必要的提议和评分机制。我们还对基准进行了审计，显示其材料科学子集中的共线性导致了可识别性问题。

英文摘要

Symbolic regression aims to uncover explicit scientific laws from data. Recent methods use LLMs to guide mutation from background text, which is more directed than random genetic programming. However, exact symbolic recovery requires both semantic guidance and explicit structure, so that domain-informed search is carried out through valid symbolic representations. Current LLM-driven systems remain structure-blind: they select among opaque candidates, lack explicit mechanisms for local mutation, and rely on brittle coefficient fitting that can undervalue correct skeletons. We propose FunctionEvolve, an evolutionary framework using expression trees to organize the whole search: structural summaries promote diverse parent selection, local tree edits preserve useful subexpressions, and structure-aware fitting decomposes, constrains, and simplifies coefficients for more reliable scoring. It uses only elementary function families, without additional domain-specific rules limiting generalization. On the 129-task synthetic subset of LLM-SRBench, FunctionEvolve with Claude Opus 4.6 recovers 107 exact forms, reaching 82.9% SA@50, 4.5x above same-backbone baselines, and 55.8% SA@1, 3.6x above the strongest previously published top-1 result. Ablations show that structure-visible search is central to reliable recovery, with LLM-guided refinements and structure-aware coefficient optimization serving as essential proposal and scoring mechanisms. We also audit the benchmark and show that collinearity in its materials-science subset creates identifiability issues.
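
The three structural ideas named in the abstract (expression trees, local edits that keep useful subexpressions, and structural summaries) can be illustrated with a very small data structure. This is a sketch under assumptions: the nested-tuple encoding, the operator set, and the `"<const>"` placeholder are hypothetical choices, not the representation used by FunctionEvolve.

```python
import math

# An expression is a number, a variable name, or a tuple (op, child, ...).
OPS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "sin": math.sin,
    "exp": math.exp,
}

def evaluate(expr, env):
    """Evaluate an expression tree given variable values in env."""
    if isinstance(expr, (int, float)):
        return expr
    if isinstance(expr, str):
        return env[expr]
    op, *children = expr
    return OPS[op](*(evaluate(c, env) for c in children))

def replace_at(expr, path, new_subtree):
    """Local edit: return a copy with the node at `path` (child indices) replaced."""
    if not path:
        return new_subtree
    op, *children = expr
    children[path[0]] = replace_at(children[path[0]], path[1:], new_subtree)
    return (op, *children)

def skeleton(expr):
    """Structural summary: the tree with every numeric constant abstracted away."""
    if isinstance(expr, (int, float)):
        return "<const>"
    if isinstance(expr, str):
        return expr
    op, *children = expr
    return (op, *(skeleton(c) for c in children))

parent = ("add", ("mul", 2.0, "x"), ("sin", "x"))
child = replace_at(parent, (1,), ("exp", "x"))  # swap sin(x) for exp(x), keep 2*x
```

Two candidates with different coefficients but the same `skeleton` share a functional form, which is the kind of grouping a structure-aware parent selection or coefficient fit could exploit.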

2606.07707 2026-06-09 cs.LG 新提交

Decoding Naturalistic Emotion Dynamics from the Brain: An LLM-Enhanced Regression Framework

从大脑解码自然情感动态：一种LLM增强的回归框架

Lemei Zhang, Peng Liu, Hans Dahle Kvadsheim, August Sætre Aasvær, Shuer Ye, Reza Bonyadi, Maryam Ziaei, Jon Atle Gulla

发表机构 * NTNU（挪威科技大学）； Kavli Institute for Systems Neuroscience, NTNU（挪威科技大学卡弗里系统神经科学研究所）； Microsoft（微软）

AI总结提出多目标回归框架，利用LLM从自然叙事中提取连续情感特征，结合动态功能连接和机器学习算法，实现从fMRI数据中解码连续情感轨迹，并揭示可解释的情感特异性脑网络拓扑。

详情

AI中文摘要

从神经信号解码情感状态通常被表述为基于情感稳定刺激的离散单标签分类任务，这种表述过于简化了人类情感的连续、流动和共现特性。本研究通过采用多目标回归框架来重新概念化情感解码，以跟踪随时间变化的多个重叠情感维度作为连续轨迹。利用大型语言模型（LLM）的强大泛化能力，我们从自然听觉叙事《爱丽丝梦游仙境》中提取了细粒度的连续情感特征，作为人类fMRI数据集中主观情感的可扩展代理。与标准分类范式或过滤网络动态的大规模单变量（mass-univariate）减法对比不同，我们利用正则化和基于核的机器学习算法作为连续估计器来跟踪宏观神经状态变化的幅度。我们证明，基于动态功能连接（DFC）时间快照训练的模型显著优于静态感兴趣区域（ROI）幅度表示，能够有效捕捉快速变化的叙事输入下的连续情感轨迹。此外，通过实施图论可解释人工智能（XAI）技术，我们解构了底层预测特征，揭示了高度可解释的、情感特定的拓扑配置。总体而言，这些结果凸显了LLM自动注释在情感神经科学中的实用性，并为心理建构主义框架提供了令人信服的实证证据，表明动态、分布式的网络交互比严格定位主义的情感解释具有更强的解释力。

英文摘要

Decoding emotional states from neural signals has been typically framed as a discrete, single-label classification task based on emotionally stable stimuli, a formulation that oversimplifies the continuous, fluid, and co-occurring nature of human affect. This study reconceptualizes emotion decoding by adopting a multi-target regression framework to track multiple overlapping emotional dimensions as continuous trajectories over time. Leveraging the robust generalization capabilities of Large Language Models (LLMs), we extracted fine-grained, continuous sentiment profiles from a naturalistic auditory narrative, Alice in Wonderland, to serve as scalable proxies for subjective affect in a human fMRI dataset. Departing from standard classification paradigms or mass-univariate subtractive contrasts that filter out network dynamics, we leverage regularized and kernel-based machine learning algorithms as continuous estimators to track the magnitude of macroscale neural state variations. We demonstrate that models trained on temporal snapshots of Dynamic Functional Connectivity (DFC) significantly outperform static region-of-interest (ROI) amplitude representations, effectively capturing continuous emotional trajectories under rapidly fluctuating narrative input. Furthermore, by implementing graph-theoretical Explainable AI (XAI) techniques, we deconstruct the underlying predictive features to reveal highly interpretable, emotion-specific topological configurations. Collectively, these results highlight the utility of LLM-automated annotation in affective neuroscience and provide compelling empirical evidence for psychological constructionist frameworks, demonstrating that dynamic, distributed network interactions offer superior explanatory power over strictly locationist accounts of emotion.
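
The abstract does not say how the temporal snapshots of dynamic functional connectivity are computed. A common choice is sliding-window Pearson correlation between region time series, and the sketch below assumes that choice; the window length, stride, region names, and signal values are all illustrative.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (undefined if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def sliding_window_connectivity(series, window, stride=1):
    """series maps region name -> samples; returns one dict of pairwise correlations per window."""
    names = sorted(series)
    length = len(series[names[0]])
    snapshots = []
    for start in range(0, length - window + 1, stride):
        snap = {}
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                snap[(a, b)] = pearson(series[a][start:start + window],
                                       series[b][start:start + window])
        snapshots.append(snap)
    return snapshots

# Two toy regions whose coupling flips sign halfway through the recording.
series = {"A": [1, 2, 3, 4, 3, 2], "B": [2, 4, 6, 8, 9, 10]}
snaps = sliding_window_connectivity(series, window=3, stride=3)
```

Each snapshot can be flattened into a feature vector and paired with the emotion scores for the same time window, which is the kind of input a multi-target regressor would consume; a static ROI representation would instead average away the sign flip shown in the toy data.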

2606.07714 2026-06-09 cs.LG cs.AI cs.HC 新提交

Beyond Accuracy: Interpreting Topic Representation in Suicide Ideation Detection Models

超越准确率：解释自杀意念检测模型中的主题表示

Hamideh Ghanadian, Isar Nejadgholi, Hussein Al Osman

发表机构 * University of Ottawa（渥太华大学）； National Research Council Canada（加拿大国家研究委员会）

AI总结本研究通过可视化与几何分析，探究自杀意念检测模型内部如何编码心理风险因素，发现主题增强能提升低表征风险因素表示的清晰度与可解释性。

详情

AI中文摘要

自杀意念检测模型通常使用聚合性能指标进行评估，但对其内部如何表示具有心理意义的风险因素知之甚少。在高风险心理健康应用中，理解这些内部表示对于安全性、透明度和负责任部署至关重要。在这项工作中，我们超越准确率，分析在原始和主题增强数据集上训练的自杀检测模型如何在其内部表示空间中编码心理风险因素。通过可视化和几何分析，我们检查主题相关特征的连贯性和可分离性。我们的结果表明，主题感知增强提高了低表征心理社会风险因素（如移民、家庭问题和财务危机）的清晰度和区分度。这些发现表明，增强不仅提高了模型性能，还导致了更结构化和可解释的内部表示。

英文摘要

Suicide ideation detection models are typically evaluated using aggregate performance metrics, yet little is known about how they internally represent psychologically meaningful risk factors. In high-stakes mental health applications, understanding these internal representations is essential for safety, transparency, and responsible deployment. In this work, we move beyond accuracy and analyze how suicide detection models trained on original and topic-augmented datasets encode psychological risk factors in their internal representation space. Using visualization and geometric analysis, we examine the coherence and separability of topic-related features. Our results show that topic-aware augmentation increases the clarity and distinctness of underrepresented psychosocial risk factors such as immigration, family issues, and financial crisis. These findings suggest that augmentation not only improves model performance but also leads to more structured and interpretable internal representations.

2606.07724 2026-06-09 cs.LG 新提交

A Geometry-Aware Triplane Field Network for Vehicle Aerodynamic Prediction

几何感知三平面场网络用于车辆气动预测

Kangkang Qi, Huiyu Yang, Keqi Ding, Yunpeng Wang, Yuntian Chen, Yuanwei Bin, Rikui Zhang, Jianchun Wang

发表机构 * Southern University of Science and Technology（南方科技大学）； Shenzhen Tenfong Technology Co., Ltd.（深圳腾风科技有限公司）； Eastern Institute of Technology（东方理工高等研究院）

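AI summary: Proposes the geometry-aware triplane field network (GTF-Net), whose dual-stream backbone combines an adaptive Fourier neural operator with a CNN to predict vehicle aerodynamic pressure and wall shear stress efficiently, with higher accuracy than existing methods.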

Comments 28 pages, 8 figures

AI中文摘要

高保真计算流体动力学(CFD)对车辆气动分析至关重要，但其成本仍制约早期设计探索。基于机器学习的表面场预测提供了一种更快的替代方案，前提是模型能高效捕捉全局流动上下文和局部几何细节。本文提出一种基于机器学习的方法，名为几何感知三平面场网络(GTF-Net)，用于车辆气动压力和壁面剪切应力预测。GTF-Net通过共享多层感知器(MLP)和光滑双线性光栅化，直接从采样表面点构建三平面特征。然后，这些平面由双流骨干网络处理，该网络将自适应傅里叶神经算子(AFNO)谱混合与卷积神经网络(CNN)细化相结合，从而在同一表示中建模长程气动耦合和局部几何诱导变化。在查询阶段，采样的三平面特征与车辆对齐的方向坐标、法向投影特征和基于体素的曲率代理相结合。将GTF-Net与Transolver、几何信息神经算子(GINO)以及基于三平面的代理模型TripNet进行比较。GTF-Net将压力预测的最强基线相对L2误差从0.157降至0.145，壁面剪切应力预测从0.237降至0.226。消融结果表明，AFNO混合、局部CNN细化和查询侧几何编码均有助于提高精度，支持了将结构化三平面表示与显式气动几何线索相结合的提议机制。

英文摘要

High-fidelity computational fluid dynamics (CFD) is crucial to vehicle aerodynamic analysis, but its cost still constrains early-stage design exploration. Machine-learning-based surface-field prediction offers a faster alternative if the model can efficiently capture both global flow context and local geometric detail. This work proposes a machine-learning-based method, named the geometry-aware triplane field network (GTF-Net), for vehicle aerodynamic pressure and wall shear stress prediction. GTF-Net constructs triplane features directly from sampled surface points through a shared multilayer perceptron (MLP) and smooth bilinear rasterization. The planes are then processed by a dual-stream backbone that combines adaptive Fourier neural operator (AFNO) spectral mixing with convolutional neural network (CNN) refinement, so long-range aerodynamic coupling and local geometry-induced variations are modeled in the same representation. At query stage, sampled triplane features are combined with vehicle-aligned directional coordinates, normal-projection features, and a voxel-based curvature proxy. GTF-Net is compared with Transolver, geometry-informed neural operator (GINO), and TripNet, a triplane-based surrogate model. GTF-Net improves the relative L2 error from the strongest baseline value of 0.157 to 0.145 for pressure prediction and from 0.237 to 0.226 for wall shear stress prediction. Ablation results show that AFNO mixing, local CNN refinement, and query-side geometric encoding each contribute to accuracy, supporting the proposed mechanism of combining structured triplane representation with explicit aerodynamic geometry cues.
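
The accuracy figures above are relative L2 errors. As a rough illustration of that metric, the sketch below computes the norm of the prediction error divided by the norm of the reference field. The abstract does not say how the error is aggregated over surface points or test cases, so the single-field form, the function name, and the sample values are assumptions.

```python
import math

def relative_l2_error(pred, target):
    """Return ||pred - target||_2 / ||target||_2 for two equal-length sequences."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    error_norm = math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)))
    target_norm = math.sqrt(sum(t ** 2 for t in target))
    if target_norm == 0.0:
        raise ValueError("target norm is zero; relative error is undefined")
    return error_norm / target_norm

# Hypothetical pressure values at four surface points (illustrative only).
target = [3.0, 4.0, 0.0, 0.0]
pred = [3.0, 4.0, 0.5, 0.0]
print(relative_l2_error(pred, target))
```

Read this way, the reported changes from 0.157 to 0.145 and from 0.237 to 0.226 correspond to relative error reductions of roughly 7.6% and 4.6% over the strongest baseline.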

2606.07982 2026-06-09 cs.LG 新提交

Overcoming the Limits of Finite Difference Method; Physics-Informed Neural Network for Noisy High-Dimensional Heat Diffusion

克服有限差分法的局限性：用于含噪高维热扩散的物理信息神经网络

Shreesh Bhattarai, Harish Chandra Bhandari

发表机构 * Kathmandu University（加德满都大学）

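AI summary: For high-dimensional heat diffusion with noisy boundary conditions, proposes a physics-informed neural network (PINN) framework that clearly outperforms the finite difference method (FDM) when noise and dimensionality are both high, offering a trade-off between accuracy and efficiency.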

AI中文摘要

高维瞬态热扩散在噪声边界条件下暴露了经典数值方法的根本局限性：在物理噪声不可避免的情况下，精度会灾难性地下降。本文提出了一个物理信息神经网络（PINN）框架，作为在一维、二维和三维空间中对这一问题的系统性解决方案，建立了明确的操作机制，重新定义了含噪热系统中求解器的选择。在三维空间中，当边界噪声为20%时，PINN保持约91%的精度，而有限差分法（FDM）降至36%，这是一个明显的决定性优势。这一点在物理铜热系统中得到进一步证实，在真实噪声条件下，PINN将边界重建误差降低了3.3倍。这种噪声鲁棒性伴随着维度驱动的效率交叉：在三维空间中，PINN所需的时空节点少于FDM，同时实现更高的精度，揭示了经典离散化在大规模下的真实成本。这些发现重新定义了求解器的选择：决定性的轴不仅是精度，而是噪声暴露和维度的共同作用。当噪声和维度都较高时，经典求解器范式不足；本工作为证明PINN在此类机制中作为操作标准提供了基础。

英文摘要

High-dimensional transient heat diffusion under noisy boundary conditions exposes a fundamental limitation of classical numerical methods: accuracy degrades catastrophically where physical noise is unavoidable. This paper presents a Physics-Informed Neural Network (PINN) framework as a systematic solution to this problem across one, two, and three spatial dimensions, establishing clear operational regimes that redefine solver selection in noisy thermal systems. Under 20% boundary noise in 3D, the PINN sustains approximately 91% accuracy while the Finite Difference Method (FDM) collapses to 36%, a decisive advantage. This is further confirmed in a physical copper thermal system, where the PINN reduces boundary reconstruction error by a factor of 3.3 under realistic noise conditions. This noise resilience is accompanied by a dimensionality-driven efficiency crossover: the PINN requires fewer spacetime nodes than FDM in 3D while achieving superior accuracy, exposing the true cost of classical discretization at scale. These findings reframe solver selection: the decisive axis is not accuracy alone, but noise exposure and dimensionality jointly. When noise and dimensionality are both high, the classical solver paradigm is insufficient; this work provides the foundation to justify PINN as the operational standard in such regimes.
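
To make the classical baseline concrete, the sketch below runs an explicit finite-difference (FTCS) scheme for the 1D heat equation with Dirichlet boundary values that are perturbed at every step. It is an illustration under stated assumptions, not the paper's setup: the abstract does not specify the FDM scheme, the noise model, or the accuracy measure, so the multiplicative uniform noise, the function names, and all parameter values are hypothetical.

```python
import random

def ftcs_step(u, r, left, right):
    """One explicit update of u_t = alpha * u_xx, with r = alpha * dt / dx**2."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = u[i] + r * (u[i + 1] - 2.0 * u[i] + u[i - 1])
    new[0], new[-1] = left, right  # Dirichlet boundary values
    return new

def simulate(n_points, n_steps, r, left, right, noise_level, seed=0):
    """Evolve a rod that starts at 0 while its boundary values are noisy.

    noise_level=0.2 perturbs each boundary value by up to +/-20% per step
    (an assumed noise model).
    """
    if not 0.0 < r <= 0.5:
        raise ValueError("the explicit scheme is stable only for 0 < r <= 0.5")
    rng = random.Random(seed)
    u = [0.0] * n_points
    for _ in range(n_steps):
        noisy_left = left * (1.0 + noise_level * rng.uniform(-1.0, 1.0))
        noisy_right = right * (1.0 + noise_level * rng.uniform(-1.0, 1.0))
        u = ftcs_step(u, r, noisy_left, noisy_right)
    return u

clean = simulate(11, 2000, 0.4, 1.0, 0.0, noise_level=0.0)
noisy = simulate(11, 2000, 0.4, 1.0, 0.0, noise_level=0.2)
print(max(abs(a - b) for a, b in zip(clean, noisy)))
```

The scheme copies the noisy boundary value straight into the solution and diffuses it inward, which is one way to see why boundary noise can hurt a grid-based solver that has no mechanism for filtering it.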

2606.08037 2026-06-09 cs.LG cs.AI 新提交

SafeECGMatch: Calibration-Aware Joint Frequency and Time Space Semi-Supervised Learning for Open-Set ECG Classification

SafeECGMatch：面向开放集心电图分类的校准感知联合频率与时间空间半监督学习

Hongkyu Koh, Ikbeom Jang

发表机构 * Hankuk University of Foreign Studies（韩国外国语大学）

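AI summary: Proposes the SafeECGMatch framework, which extracts time-frequency features with a dual-branch architecture and calibrates the model with adaptive label smoothing and temperature scaling, enabling reliable open-set classification and OOD detection under label distribution mismatch.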

Comments 8 pages. Accepted to the KDD-UC 2026 (ACM International Conference on Data Mining and Knowledge Discovery - Undergraduate Consortium 2026)

AI中文摘要

心电图（ECG）分类模型常面临严重的标签稀缺问题，使得半监督学习（SSL）成为降低标注成本的有效策略。然而，在临床环境中，未标注数据池通常包含分布外（OOD）异常或标注集中不存在的诊断类别。标准SSL会强制对这些未见类别分配错误的伪标签，产生过度自信的预测。为解决此问题，我们提出SafeECGMatch，一个校准感知的安全SSL框架，用于标签分布不匹配下的单标签ECG分类。方法上，SafeECGMatch采用双分支架构，通过ECG特定的数据增强提取时频潜在表示。关键地，它通过自适应标签平滑和温度缩放动态对齐置信度与经验准确性，在时间和频谱域上校准多类分类器和OOD检测器。这种联合优化实现了可信的OOD拒绝和可靠的伪标签分配。在PTB-XL和PhysioNet/CinC Challenge基准上评估，SafeECGMatch达到了最先进的准确性和校准性能，推动了生理时间序列中可靠知识发现。代码可在https://github.com/labhai/SafeECGMatch获取。

英文摘要

Electrocardiogram (ECG) classification models often suffer from severe label scarcity, making semi-supervised learning (SSL) an attractive strategy for reducing annotation costs. In clinical settings, however, unlabeled pools frequently contain out-of-distribution (OOD) anomalies or diagnostic groups absent from the labeled set. Standard SSL forces incorrect pseudo-labels onto these unseen classes, producing overconfident predictions. To address this, we propose SafeECGMatch, a calibration-aware safe SSL framework for single-label ECG classification under label distribution mismatch. Methodologically, SafeECGMatch employs a dual-branch architecture extracting time-frequency latent representations via ECG-specific augmentations. Crucially, it dynamically aligns confidence with empirical accuracy through adaptive label smoothing and temperature scaling, calibrating both the multiclass classifier and the OOD detector across temporal and spectral domains. This joint optimization allows trustworthy OOD rejection and reliable pseudo-labeling. Evaluated on the PTB-XL and PhysioNet/CinC Challenge benchmarks, SafeECGMatch achieves state-of-the-art accuracy and calibration, advancing reliable knowledge discovery in physiological time-series. Code is available at https://github.com/labhai/SafeECGMatch.
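
The two calibration tools named above, temperature scaling and label smoothing, can be sketched in their standard fixed-parameter form. This is not the SafeECGMatch implementation: the abstract does not describe how the temperature or the smoothing strength is adapted, so the constant values, the function names, and the example logits below are assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature; temperature > 1 softens the output."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def smooth_labels(true_class, num_classes, epsilon):
    """Standard label smoothing: (1 - epsilon) * one_hot + epsilon / num_classes."""
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    off_value = epsilon / num_classes
    return [(1.0 - epsilon) + off_value if k == true_class else off_value
            for k in range(num_classes)]

# Hypothetical logits for a three-class ECG classifier.
logits = [4.0, 1.0, 0.0]
print(max(softmax(logits, temperature=1.0)))  # sharper, more confident
print(max(softmax(logits, temperature=2.0)))  # softer, less confident
print(smooth_labels(true_class=0, num_classes=3, epsilon=0.1))
```

In a semi-supervised loop, a pseudo-label would typically be accepted only when the calibrated maximum probability exceeds a threshold, so a softer distribution directly reduces the number of overconfident assignments to unseen classes.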

2606.08100 2026-06-09 cs.LG 新提交

面向机器学习初学者的公共机器学习求解器框架

Lokman Saleh, Hafedh Mili, Mounir Boukadoum

发表机构 * LATECE Lab, Université du Québec à Montréal（LATECE实验室，魁北克大学蒙特利尔分校）

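AI summary: Proposes a semi-automated platform that combines expert knowledge with transfer learning to recommend complete machine learning pipelines to non-experts, automatically extracts data characteristics, and produces ranked algorithm recommendations through first-order logic reasoning.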

AI中文摘要

解决机器学习问题很复杂，通常只有专家才能胜任。过去二十年中，出现了支持非专家的系统。根据我们的回顾，我们识别出三类：(1) 全自动AutoML系统，(2) 用于算法选择的专家备忘单，以及(3) 使用选择标准（准确性、透明度、数据要求）的决策支持系统。我们提出一个新平台，结合了第2和第3类，为非专家提供半自动化、智能的解决方案推荐。与推荐单一算法的现有方法不同，我们的平台建议一个针对用户问题量身定制的完整流水线。它整合了专家定义的选择标准与迁移学习，并自动从用户提供的数据集中提取数据特征（例如，类别不平衡、缺失值）。该平台使用一阶逻辑对其知识库进行推理，并推荐按相关性排序的合适算法。它具有用户友好的界面，并连接到面向机器学习专家的众包平台，确保持续更新。该平台是增量构建的，允许无缝集成新算法、标准和领域知识。据我们所知，这是第一个免费、公开可访问的在线框架，系统地捕获和操作专家知识，以结构化、透明的方式指导非专家解决机器学习问题。

英文摘要

Solving machine learning problems is complex and typically reserved for experts. Over the past two decades, systems have emerged to support non-experts. Based on our review, we identify three categories: (1) fully automated AutoML systems, (2) expert cheat sheets for algorithm selection, and (3) decision-support systems using selection criteria (accuracy, transparency, data requirements). We propose a new platform combining categories 2 and 3 to deliver semi-automated, intelligent solution recommendations for non-experts. Unlike existing approaches that recommend a single algorithm, our platform suggests a complete pipeline tailored to the user's problem. It integrates expert-defined selection criteria with transfer learning and automatically extracts data characteristics (e.g., class imbalance, missing values) from user-provided datasets. The platform uses first-order logic to reason over its knowledge base and recommends suitable algorithms ranked by relevance. It features a user-friendly interface and connects to a crowdsourcing platform for ML experts, ensuring continuous updates. The platform is built incrementally, allowing seamless integration of new algorithms, criteria, and domain knowledge. To our knowledge, this is the first free, publicly accessible online framework that systematically captures and operationalizes expert knowledge to guide non-experts in solving ML problems in a structured, transparent manner.

2606.08238 2026-06-09 cs.LG 新提交

GPT-Micro: A large language paradigm for accelerated, inexpensive, and thermodynamics-consistent discovery of constitutive models in manufacturing

GPT-Micro: 一种用于制造业中加速、低成本且热力学一致的本构模型发现的大语言范式

Soumik Dutta, Kiarash Naghavi Khanghah, Sania Shree, Logan McNeil, Thomas Feldhausen, Hongyi Xu, Rajiv Malhotra

发表机构 * Department of Mechanical and Aerospace Engineering, Rutgers University（罗格斯大学机械与航空航天工程系）； Department of Mechanical, Aerospace & Manufacturing Engineering, University of Connecticut（康涅狄格大学机械、航空航天与制造工程系）； Edison Welding Institute（埃迪森焊接研究所）； Manufacturing Science Division, Oak Ridge National Laboratory（橡树岭国家实验室制造科学分会）； Department of Aerospace and Mechanical Engineering, University of Texas at El Paso（德克萨斯州埃尔帕索大学航空航天与机械工程系）

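AI summary: Proposes the GPT-Micro paradigm, which combines large language models, thermodynamic constraints, and sparse data to discover constitutive models autonomously; in a printed electronics testbed it cuts the data requirement by more than 70% and the discovery time by a factor of 400.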

Comments 23 pages, 4 tables, 11 equations, 9 figures

AI中文摘要

本构模型描述了工艺施加的材料状态与基本材料属性之间的关系，对于制造过程中材料微观结构的控制至关重要。传统上依赖易错的人类经验和直觉来假设和修正模型函数形式，导致模型发现过程缓慢且增量式改进，精度有限。传统的机器学习需要大量数据生成成本和时间。使用大语言模型的模型发现存在上述问题，并且/或者忽略了基本热力学定律的不可违背性。本文创建了一种新颖的GPT-Micro范式，用于自主、数据稀疏且符合热力学的全新本构模型发现。该框架无缝集成了文献语义知识提取、基于热力学的守恒定律强制执行、稀疏数据集以及大语言模型驱动的模型假设生成与改进。在印刷电子工艺测试平台上对一个长期难以解决的本构建模问题进行了验证。结果表明，与现有技术相比，该方法具有显著且多方面的优势，包括：(a) 相比基于机器学习的建模，数据负担减少超过70%，且精度不损失；(b) 相比人工驱动建模，数据生成后的发现时间从数月缩短至数小时，减少400倍；(c) 发现具有新颖函数形式的模型，无需主观选择初始假设；(d) 通过综合紧凑、符合守恒定律且物理完整的解析模型，增强了基于物理的可信度、人类可解释性和机理洞察。讨论了GPT-Micro在制造业中实现快速、低成本、物理可信且可解释的微观结构建模的潜力。

英文摘要

Constitutive modeling of the relationship between process-imposed material states and fundamental material properties is critical to the control of material microstructure in manufacturing processes. The typical reliance on fallible human expertise and intuition to postulate and revise the model's functional form limits accuracy and makes model discovery incremental and time-consuming. Conventional Machine Learning (ML) incurs significant cost and time for data generation. Model discovery using Large Language Models (LLMs) suffers from the above issues and/or ignores the inviolability of fundamental thermodynamic laws. This work creates a novel GPT-Micro paradigm for autonomous, data-sparse, and thermodynamics-compliant discovery of de novo constitutive models. The framework integrates semantic knowledge extraction from literature, enforcement of thermodynamics-based conservation laws, and sparse datasets with LLM-driven generation and refinement of model hypotheses. Validation is performed for a long-intractable constitutive modeling problem in a printed electronics process testbed. This reveals significant and simultaneous advantages over the state of the art, including: (a) more than 70 percent reduction in data burden relative to ML-based modeling without loss in accuracy; (b) a 400X reduction in discovery time after data generation, from months to hours, relative to human-driven modeling; (c) discovery of models with novel functional forms without subjective human choice of a starting hypothesis; (d) enhanced physics-rooted trustworthiness, human interpretability, and mechanistic insight via synthesis of compact, conservation-compliant, and physically complete analytical models. The potential of GPT-Micro to realize rapid, low-cost, physically trustworthy, and interpretable microstructure modeling across the manufacturing landscape is discussed.

2606.08300 2026-06-09 cs.LG 新提交

QueryWeaver: Reliable Multi-Tool Query Execution Planning via LLM-Based Graph Generation

QueryWeaver: 基于LLM图生成的可靠多工具查询执行规划

Aishwarya Chakravarthy, Vidhi Kulkarni, Duen Horng Chau

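AI summary: Proposes a system that converts natural language queries into structured graphs and executes them with a deterministic planner, using depth-first search to resolve cross-tool dependencies for highly reliable query execution.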

2606.08479 2026-06-09 cs.LG 新提交

Inferring hidden forcing in a biological oscillator using Kolmogorov-Arnold networks

利用Kolmogorov-Arnold网络推断生物振荡器中的隐藏驱动力

Julian Szereszewski, Facundo Fainstein, Leandro E. Fernandez, Gabriel B. Mindlin

发表机构 * Universidad de Buenos Aires（布宜诺斯艾利斯大学）； CONICET - Universidad de Buenos Aires（阿根廷国家科研委员会-布宜诺斯艾利斯大学）； Instituto de Física Interdisciplinaria y Aplicada (INFINA)（跨学科与应用物理研究所（INFINA））

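AI summary: Uses Kolmogorov-Arnold networks to reconstruct the equations governing avian respiratory dynamics from air-sac pressure measurements, revealing a hidden two-phase muscle activation pattern that is validated with electromyography.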

Comments 11 pages, 4 figures

AI中文摘要

从部分观测中推断驱动动力系统的力是物理学中的一个基本挑战，特别是当不同的潜在机制产生相似的观测动力学时。在这里，我们展示了仅通过气囊压力测量即可重建鸟类呼吸动力学背后的有效肌肉驱动力。使用基于Kolmogorov-Arnold网络的可解释学习框架，我们直接从数据中推断系统的控制方程，并揭示潜在驱动力中的非平凡结构，该结构从压力信号中并不明显，而压力信号反而暗示了一种类似松弛的振荡。重建的动力学预测每个呼吸周期内存在两相激活模式，我们通过呼气肌的肌电图记录独立验证了这一点。这些结果表明，数据驱动的动力学定律重建可以揭示隐藏的物理结构，并提供对未观测驱动变量的访问，从而建立了一种在部分观测动力系统中推断潜在力的通用途径。

英文摘要

Inferring the forces that drive a dynamical system from partial observations is a fundamental challenge across physics, particularly when distinct underlying mechanisms produce similar observable dynamics. Here we show that the effective muscular forcing underlying avian respiratory dynamics can be reconstructed from measurements of air-sac pressure alone. Using an interpretable learning framework based on Kolmogorov-Arnold networks, we infer the governing equations of the system directly from data and uncover a nontrivial structure in the underlying forcing that is not apparent from the pressure signal, which instead suggests a relaxation-like oscillation. The reconstructed dynamics predict a two-phase activation pattern within each respiratory cycle, which we independently validate through electromyographic recordings of expiratory muscles. These results demonstrate that data-driven reconstruction of dynamical laws can reveal hidden physical structure and provide access to unobserved driving variables, establishing a general route to infer latent forces in partially observed dynamical systems.

2606.08480 2026-06-09 cs.LG cs.AI cs.IR 新提交

Adaptive Loss Balancing for Noise-Robust GRPO in Generative Recommendation

生成式推荐中噪声鲁棒GRPO的自适应损失平衡

Kewei Xu, Junbo Qi, Yanyan Zou, Pengfei Zhang, Xingzhi Yao, Shengjie Li

发表机构 * JD.com（京东）； Waseda University（早稻田大学）； University of Electronic Science and Technology of China（电子科技大学）

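AI summary: To address reward-model noise caused by exposure bias in generative recommendation, proposes the AdaGRPO framework, which uses policy-difficulty and reward-discriminability diagnostics to switch dynamically between GRPO and supervised learning, improving HR@10 and suppressing hallucination on an e-commerce dataset.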

详情

AI中文摘要

强化学习为超越监督模仿的生成式推荐提供了有前景的途径，通过利用奖励信号指导策略改进。然而，其有效性关键取决于奖励模型对所评估样本的可信度。实践中，广泛采用的奖励模型——生产级排序器，是在有曝光偏差的日志上训练的，导致样本相关的误差，违反了这一假设。我们的分层分析揭示了一个一致的模式：当策略表现出不确定性且排序器能有效区分真实物品与rollout负样本时，奖励指导最为有益。在其他样本上，奖励信号要么可忽略，要么有害，凸显了统一应用RL的风险。为解决此问题，我们引入AdaGRPO，一种新颖框架，将奖励指导优化视为选择性准入而非统一压力。训练以监督负对数似然为基础，而GRPO目标由基于两个rollout诊断（策略侧难度和奖励可区分性）的逐样本二元裁剪门控。未通过任一诊断的实例退化为纯监督，确保稳定性并减轻噪声梯度的放大。我们在大规模电商数据集上验证了AdaGRPO。在最佳中间检查点，它将HR@10从11.01%提升至12.18%，同时将幻觉限制在0.22%以下，并在最终检查点保持鲁棒性（HR@10 11.63%，幻觉0.27%），在检索-有效性前沿上优于固定NLL-GRPO混合。在生产A/B测试中，AdaGRPO在点击率和停留时间上实现了统计显著的提升，证实了其实用价值。

英文摘要

Reinforcement learning (RL) presents a promising avenue for enhancing generative recommendation beyond supervised imitation, leveraging reward signals to guide policy improvement. However, its efficacy is critically contingent on the trustworthiness of the reward model for the samples it evaluates. In practice, production rankers, the widely adopted reward models, are trained on exposure-biased logs, leading to sample-dependent inaccuracies that violate this assumption. Our stratified analysis uncovers a consistent pattern: reward guidance is most beneficial when the policy exhibits uncertainty and the ranker can effectively discriminate the ground-truth item from rollout negatives. On other samples, the reward signal is either negligible or detrimental, highlighting the risk of uniform RL application. To address this issue, we introduce AdaGRPO, a novel framework that treats reward-guided optimization as selective admission rather than uniform pressure. Training is anchored in supervised negative log-likelihood, while the GRPO objective is gated by a binary, per-sample clip determined by two rollout diagnostics: policy-side difficulty and reward discriminability. Instances failing either diagnostic default to pure supervision, ensuring stability and mitigating the amplification of noisy gradients. We validate AdaGRPO on a large-scale e-commerce dataset. At the best intermediate checkpoint, it elevates HR@10 from 11.01% to 12.18% while constraining hallucination below 0.22%, and maintains robustness at the final checkpoint (HR@10 11.63%, hallucination 0.27%), outperforming fixed NLL-GRPO mixtures across the retrieval-validity frontier. In production A/B tests, AdaGRPO achieves statistically significant gains in click-through rate and dwell time, confirming its practical utility.
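
The gating rule described above (supervised loss always on, GRPO term admitted only when both diagnostics pass) can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not define how the two diagnostics are computed or thresholded, so the scalar scores, the threshold values, the weighting, and all names here are hypothetical.

```python
def admission_gate(difficulty, discriminability,
                   min_difficulty=0.5, min_discriminability=0.1):
    """Binary per-sample gate: 1.0 only if BOTH rollout diagnostics pass.

    difficulty: assumed score of how uncertain the policy is on this sample.
    discriminability: assumed score of how well the reward model separates
    the ground-truth item from rollout negatives.
    """
    passes = difficulty >= min_difficulty and discriminability >= min_discriminability
    return 1.0 if passes else 0.0

def gated_loss(nll_loss, grpo_loss, difficulty, discriminability, grpo_weight=1.0):
    """Supervised NLL anchor plus the GRPO term when the gate is open."""
    gate = admission_gate(difficulty, discriminability)
    return nll_loss + gate * grpo_weight * grpo_loss

# Hard sample with a trustworthy reward: the GRPO term is admitted.
print(gated_loss(2.0, 0.5, difficulty=0.9, discriminability=0.3))
# Reward cannot separate the target from negatives: falls back to pure supervision.
print(gated_loss(2.0, 0.5, difficulty=0.9, discriminability=0.0))
```

Because the gate is binary and evaluated per sample, a batch mixes pure-supervision samples with reward-guided ones, rather than applying one fixed NLL-GRPO mixing weight to every sample.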

2606.08484 2026-06-09 cs.LG cs.AI 新提交

STELLAR: Spatio-Temporal Environmental Learning with Latent Alignment and Refinement for Long-Tailed Species Distribution Modeling

STELLAR: 面向长尾物种分布建模的时空环境学习与潜在对齐精炼

Shufeng Kong, Tao Yu, Yuanyuan Wei, Caihua Liu, Junwen Bai, Yingheng Wang, Marc Grimson, Daniel Fink, Carla P. Gomes

发表机构 * Sun Yat-sen University（中山大学）； Cornell University（康奈尔大学）； Foshan University（佛山大学）； Cornell Lab of Ornithology（康奈尔鸟类学实验室）

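AI summary: Proposes the STELLAR framework, which jointly optimizes dynamic habitat context and community structure through a graph-temporal encoder, context-anchored latent alignment, and an imbalance-aware decoding module, addressing spatio-temporal coupling and long-tail imbalance in species distribution modeling.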

Comments Accepted by IJCAI 2026

详情

AI中文摘要

联合物种分布建模（JSDM）是生物多样性监测和保护规划的关键工具。然而，准确的JSDM面临两个耦合挑战：环境驱动因素和物种分布本质上是时空的，而物种共现模式表现出复杂的非线性群落结构以及由稀有物种导致的严重长尾不平衡。现有方法通常孤立地处理这些因素，从静态协变量中学习或忽略动态群落结构的历史轨迹。为克服这些限制，我们提出STELLAR（时空环境学习与潜在对齐精炼），一种新颖的框架，学习一个共享潜在空间，其中动态栖息地上下文和群落结构被联合优化。我们的方法整合了三个互补组件：（1）图-时间编码器，采用图注意力和循环单元来聚合空间邻域效应并捕捉环境上下文和群落结构的共同演化历史动态；（2）上下文锚定潜在对齐机制，利用标签激活的混合先验和监督对比学习结构化潜在空间，基于共享环境偏好主动聚类物种；（3）不平衡感知解耦解码模块，利用非对称损失聚焦于困难稀有物种样本的学习，防止长尾中的模式崩溃。在领域专家精心整理的大规模eBird数据集上的实验表明，我们的框架显著优于最先进的基线，特别是在预测稀有物种和揭示可解释的物种相互作用方面。

英文摘要

Joint Species Distribution Modeling (JSDM) is a key enabler for biodiversity monitoring and conservation planning. However, accurate JSDM faces two coupled challenges: environmental drivers and species distributions are inherently spatio-temporal, while species co-occurrence patterns exhibit complex non-linear community structure and severe long-tail imbalance driven by rare species. Existing approaches often address these factors in isolation, learning from static covariates or neglecting the historical trajectories of dynamic community structure. To overcome these limitations, we propose STELLAR (Spatio-Temporal Environmental Learning with Latent Alignment and Refinement), a novel framework that learns a shared latent space where dynamic habitat context and community structure are optimized jointly. Our approach integrates three complementary components: (1) a Graph-Temporal Encoder that employs graph attention and recurrent units to aggregate spatial neighborhood effects and capture the co-evolving historical dynamics of environmental context and community structure; (2) a Context-Anchored Latent Alignment mechanism that structures the latent space using a label-activated mixture prior and supervised contrastive learning, actively clustering species based on shared environmental preferences; and (3) an Imbalance-Aware Decoupled Decoding module that utilizes Asymmetric Loss to focus learning on hard, rare species samples, preventing mode collapse in the long tail. Experiments on the large-scale eBird dataset, curated with domain experts, demonstrate that our framework significantly outperforms state-of-the-art baselines, particularly in predicting rare species and revealing interpretable species interactions.
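
Component (3) above relies on Asymmetric Loss to keep abundant easy negatives from swamping rare positives. The sketch below follows the formulation commonly used in multi-label classification: a focusing exponent per label polarity plus a probability margin on negatives. The abstract gives no hyperparameters or implementation details, so the default values and the function name are assumptions.

```python
import math

def asymmetric_loss(prob, is_positive, gamma_pos=0.0, gamma_neg=4.0,
                    margin=0.05, eps=1e-8):
    """Asymmetric loss for one (site, species) prediction.

    prob: predicted presence probability in [0, 1].
    Positives: -(1 - p)^gamma_pos * log(p).
    Negatives: -(p_m)^gamma_neg * log(1 - p_m), with p_m = max(p - margin, 0),
    so negatives predicted below the margin contribute no loss at all.
    """
    if is_positive:
        p = max(prob, eps)
        return -((1.0 - p) ** gamma_pos) * math.log(p)
    shifted = max(prob - margin, 0.0)
    return -(shifted ** gamma_neg) * math.log(max(1.0 - shifted, eps))

print(asymmetric_loss(0.04, is_positive=False))  # easy negative below the margin
print(asymmetric_loss(0.90, is_positive=False))  # hard negative, large loss
print(asymmetric_loss(0.50, is_positive=True))   # rare positive, plain log loss
```

With gamma_neg larger than gamma_pos, the many absences of a rare species are down-weighted much more aggressively than its few presences, which is the imbalance-handling behavior the abstract attributes to this module.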

2606.08538 2026-06-09 cs.LG 新提交

Routine laboratory trajectories encode the onset of organ-level complications in cancer

常规实验室轨迹编码癌症器官级并发症的发生

Jannik Lübberstedt, Krischan Braitsch, Jacqueline Lammert, Christof Winter, Florian Gabriel, Tristan Lemke, Christopher Zirn, Markus Graf, Friedrich Puttkammer, Hartmut Häntze, Johannes Moll, Anirudh Narayanan, Andrei Zhukov, Fabian Drexel, Zeineb Ben Chaaben, Sebastian Ziegelmayer, Su Hwan Kim, Marion Högner, Jan Kirschke, Florian Bassermann, Marcus Makowski, Christian Wachinger, Lisa Adams, Keno Bressem

发表机构 * Technical University of Munich（慕尼黑工业大学）； Charité - Universitätsmedizin Berlin（柏林夏里特医学院）； German Heart Center（德国心脏中心）

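AI summary: Applies a transformer to longitudinal trajectories of routine laboratory tests in cancer patients to predict 162 treatment-associated complications, matching or outperforming non-sequential baselines and showing that trajectory data encode organ deterioration early.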

AI中文摘要

癌症治疗期间抽取的常规实验室检查构成了器官功能的纵向生理记录，然而其时间结构被单时间点预后工具所忽略。一个基于Transformer的模型在来自3,905名多发性骨髓瘤或卵巢癌患者的2,777,595次实验室测量上训练，预测了162种治疗相关并发症（包括治疗相关骨髓增生异常综合征）的两年内发生，涵盖八个临床类别，在群体水平上实现了高于患病率1.5至6.1倍的富集。它在分组终点上匹配或超越了非序列基线（AUROC提升高达+0.11），表明纵向实验室轨迹捕捉到了从孤立测量中无法获得的、随并发症演变的特异性生理信息。预测在两种癌症中均具有泛化能力，差异集中在疾病特异性并发症上，生物标志物掩膜恢复了与既定病理生理学一致的签名。在MIMIC-IV和MMRF CoMMpass上的外部验证证实了其在独立医疗系统中的可迁移性（AUROC高达0.85）。常规肿瘤学实验室数据在临床发作前数周至数月编码了器官恶化，从而无需额外检测基础设施即可实现并发症特异性监测。

英文摘要

Routine laboratory panels drawn during cancer treatment constitute longitudinal physiological recordings of organ function, yet their temporal structure is discarded by single-timepoint prognostic tools. A transformer trained on 2,777,595 laboratory measurements from 3,905 patients with multiple myeloma or ovarian cancer predicted the two-year onset of 162 treatment-associated complications, including therapy-related myelodysplastic syndromes, spanning eight clinical categories, achieving 1.5- to 6.1-fold enrichment above prevalence at the group level. It matched or outperformed non-sequential baselines across grouped endpoints (AUROC gains up to +0.11), demonstrating that longitudinal laboratory trajectories capture evolving complication-specific physiology inaccessible from isolated measurements. Predictions generalised across both cancers, divergence concentrating in disease-specific complications, and biomarker masking recovered signatures consistent with established pathophysiology. External validation on MIMIC-IV and MMRF CoMMpass confirmed transferability across independent healthcare systems (AUROC up to 0.85). Routine oncological laboratory data encode organ deterioration weeks to months before clinical onset, enabling complication-specific surveillance without additional testing infrastructure.

2606.08563 2026-06-09 cs.LG physics.ao-ph 新提交

Physics-Guided Dual Decoding and Spectral Supervision for Global 3D Hydrometeor Prediction

物理引导的双解码与光谱监督用于全球三维水凝物预测

Dandan Chen, Yaqiang Wang

发表机构 * Chinese Academy of Meteorological Sciences（中国气象科学研究院）； Xiong’an Institute of Meteorological Artificial Intelligence（雄安气象人工智能研究院）

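AI summary: To counter the over-smoothing caused by zero-inflated, long-tailed distributions in 3D hydrometeor prediction, proposes PredHydro-Net, a physics-guided dual-decoding framework that uses a decoupled architecture, wavelet-based frequency decoupling, and adversarial training to outperform existing models in extreme-event detection and spectral representation.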

AI中文摘要

虽然全球数据驱动模型在预测连续大气变量方面表现出色，但由于这些变量的零膨胀长尾分布，三维水凝物预测仍然具有挑战性。标准的深度学习优化通常会产生过度平滑的预测，削弱极端事件和空间纹理。我们提出了PredHydro-Net，一个物理引导的双解码框架，以缓解这种平滑。为了解决多变量优化冲突，它采用了解耦架构，其中宏观热力学和动力学场单向调节水凝物的生成。通过集成基于小波的频率解耦、光谱幅度匹配和对抗训练，该模型在定量准确性和空间保真度之间实现了有利的权衡。在72小时全球评估中，PredHydro-Net在极端事件检测和光谱表示方面优于时空深度学习基线（Earthformer和PredRNNv2）以及业务全球预报系统（GFS）。此外，它与全球降水测量（GPM）卫星反演表现出良好的气候一致性。该模型合理地再现了极端天气事件（如飓风伊恩）中的三维云结构。特征归因证实了其对物理前兆（如相对湿度和风辐合）的依赖，为长尾大气预测提供了一种稳健的、物理信息的方法。

英文摘要

While global data-driven models excel at predicting continuous atmospheric variables, three-dimensional hydrometeor forecasting remains challenging due to the zero-inflated, long-tailed distributions of these variables. Standard deep learning optimization often yields overly smooth forecasts, attenuating extreme events and spatial textures. We propose PredHydro-Net, a physics-guided dual-decoding framework that mitigates this smoothing. To resolve multi-variable optimization conflicts, it employs a decoupled architecture where macroscopic thermodynamic and dynamic fields unidirectionally modulate hydrometeor generation. By integrating wavelet-based frequency decoupling, spectral amplitude matching, and adversarial training, the model achieves a favorable trade-off between quantitative accuracy and spatial fidelity. In a 72-h global evaluation, PredHydro-Net outperforms both spatiotemporal deep learning baselines (Earthformer and PredRNNv2) and the operational Global Forecast System (GFS) in extreme-event detection and spectral representation. Furthermore, it demonstrates strong climatological consistency with Global Precipitation Measurement (GPM) satellite retrievals. The model reasonably reproduces the three-dimensional cloud structures in extreme weather events, such as Hurricane Ian. Feature attribution confirms its dependence on physical precursors such as relative humidity and wind convergence, offering a robust, physics-informed approach to long-tailed atmospheric prediction.

2606.08573 2026-06-09 cs.LG cs.CL 新提交

Titans-as-a-Layer: Test-Time Memory for Conversational Speech Emotion Recognition

Titans-as-a-Layer：对话语音情感识别的测试时记忆

Daniel Chen, Qicong Hu, Yang Xiao, Ting Dang, Hong Jia

发表机构 * University of California, Berkeley（加州大学伯克利分校）； Stanford University（斯坦福大学）

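AI summary: Proposes a Memory-as-a-Layer (MAL) adapter that uses test-time neural memory to supply dialogue context for conversational speech emotion recognition, improving performance without modifying the large audio language model.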

Comments ICML 2026 Workshop on Machine Learning for Audio

AI中文摘要

语音情感识别（SER）通常被表述为话语级分类，尽管对话情感取决于说话者通常的音域和先前话语建立的情感上下文。语音语言模型提供了强大的预训练声学和语义表示，并可以通过微调将其适应于SER标签，但这种机制仍然缺少每对话状态。我们研究测试时神经记忆是否可以在保持大型音频语言模型（LALMs）主干不变的情况下提供这种缺失的上下文。基于Titans，我们引入了一种即插即用的Memory-as-a-Layer（MAL）适配器，它将对话历史写入小型神经记忆，并作为音频令牌对齐的残差更新读回，避免了对宿主模型令牌位置的更改。在不同的音频LLM和情感识别数据集评估中，我们的设计在不同评估指标上改善了SER性能，支持测试时记忆作为对话SER的残差上下文机制。

英文摘要

Speech emotion recognition (SER) is commonly formulated as utterance-level classification, although conversational emotion depends on a speaker's usual vocal range and the emotional context established by previous utterances. Speech-language models provide strong pretrained acoustic and semantic representations and can be adapted to SER labels via fine-tuning, but this mechanism still lacks per-dialogue state. We study whether test-time neural memory can supply this missing context while leaving the large audio language model (LALM) backbone intact. Building on Titans, we introduce a plug-and-play Memory-as-a-Layer (MAL) adapter that writes dialogue history into a small neural memory and reads it back as an audio-token-aligned residual update, avoiding changes to the host model's token positions. In evaluations across different audio LLMs and emotion recognition datasets, our design improves SER performance across different evaluation metrics, supporting test-time memory as a residual contextual mechanism for conversational SER.

2606.08630 2026-06-09 cs.LG cs.AI 新提交

Tyan-WP: A Wind Power Foundation Model for Ultra-Short-Term Probabilistic Forecasting

Tyan-WP：用于超短期概率预测的风电基础模型

Jiahui Huang, Ao Luo, Lei Liu, Hongwei Zhao, Tengyuan Liu, Ruibo Guo, Bo Wang, Zhao Wang, Bin Li

发表机构 * School of Information Science and Technology, University of Science and Technology of China（中国科学技术大学信息科学技术学院）； China Electric Power Research Institute（中国电力科学研究院）

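AI summary: Proposes Tyan-WP, described as the first wind power foundation model, which uses static site embeddings and a power-aware meteorological fusion module for zero-shot ultra-short-term probabilistic forecasting and clearly outperforms conventional models.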

AI中文摘要

全球风电容量，特别是在中国，正在蓬勃发展，新的风电场跨越了多样的地形和气候。行业迫切需要准确的风电基础模型，以缩短调试并加速并网。这是因为特定站点的时间序列模型（TSM）不适用于数据稀缺场景且泛化能力差，而通用大型时间序列模型（LTSM）大多限于单变量输入，无法充分利用静态站点属性或功率与气象协变量之间的依赖关系，导致精度不足。为填补这一空白，我们提出了Tyan-WP，这是首个用于超短期概率预测的风电基础模型。在覆盖美国超过126,000个站点、跨越七年的大规模风电数据集上预训练后，Tyan-WP通过两个特定领域模块设计进一步提升了零样本预测：使用坐标、地形和生态区域元数据的静态站点嵌入，以及一个功率感知气象融合（PAMF）模块，该模块对历史功率和气象协变量之间的交互进行建模。在统一评估协议下，Tyan-WP在10个域内站点上超越了八个特定站点的监督TSM，并在127个域内站点上优于十一个通用LTSM，MAE降低19.9%，RMSE降低16.6%，CRPS降低22.2%，AQL降低21.7%，同时R^2提升16.7%。它还在六个真实的英国站点上展示了强大的跨地理泛化能力。这些结果表明，风电基础模型可以在无需目标站点训练的情况下实现准确的零样本预测，为新风电场快速涡轮机接入和概率风险管理提供了实用途径。

英文摘要

Global wind power capacity, especially in China, is booming, with new farms spanning diverse terrains and climates. The industry urgently needs accurate wind power foundation models to shorten commissioning and accelerate grid connection. This is because site-specific time series models (TSMs) are not well suited to data-scarce scenarios and generalize poorly, while generic large time series models (LTSMs) are mostly limited to univariate inputs and cannot fully exploit static site attributes or the dependencies between power and meteorological covariates, leading to insufficient accuracy. To fill this gap, we propose Tyan-WP, the first wind power foundation model for ultra-short-term probabilistic forecasting. Pretrained on a large-scale wind power dataset covering more than 126,000 U.S. sites over seven years, Tyan-WP further improves zero-shot forecasting through two domain-specific module designs: static site embedding using coordinate, terrain, and ecoregion metadata, and a power-aware meteorological fusion (PAMF) module that models interactions between historical power and meteorological covariates. Under a unified evaluation protocol, Tyan-WP surpasses eight site-specific supervised TSMs on 10 in-domain sites and outperforms eleven generic LTSMs on 127 in-domain sites, reducing MAE by 19.9%, RMSE by 16.6%, CRPS by 22.2%, and AQL by 21.7%, while raising R^2 by 16.7%. It further demonstrates strong cross-geography generalization on six real U.K. sites. These results show that the wind power foundation model can achieve accurate zero-shot forecasting without target-site training, providing a practical pathway for rapid turbine onboarding and probabilistic risk management at new wind farms.
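
Probabilistic forecasts such as these are usually scored with quantile-based metrics. The sketch below shows the pinball (quantile) loss and its average over several quantile levels. The abstract does not expand "AQL" or list the quantile levels used, so reading it as an average quantile loss, along with the function names and the sample forecast, is an assumption.

```python
def pinball_loss(y_true, y_pred, quantile):
    """Pinball loss of a single quantile forecast for one observation."""
    if not 0.0 < quantile < 1.0:
        raise ValueError("quantile must lie strictly between 0 and 1")
    diff = y_true - y_pred
    # Under-prediction is weighted by q, over-prediction by (1 - q).
    return max(quantile * diff, (quantile - 1.0) * diff)

def average_quantile_loss(y_true, quantile_preds):
    """Mean pinball loss over a dict mapping quantile level -> predicted value."""
    losses = [pinball_loss(y_true, pred, q) for q, pred in quantile_preds.items()]
    return sum(losses) / len(losses)

# Hypothetical forecast of wind power (percent of capacity) at three quantile levels.
forecast = {0.1: 40.0, 0.5: 55.0, 0.9: 70.0}
observed = 60.0
print(average_quantile_loss(observed, forecast))
```

A lower value rewards forecasts whose quantiles are both well placed and sharp, which a point metric such as MAE or RMSE cannot capture on its own.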

2606.08696 2026-06-09 cs.LG cs.AI 新提交

Agentic Search for Counterfactual Recourse under Fixed LLM Budgets

固定LLM预算下的反事实追索的智能搜索

Yasuo Tabei

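AI summary: Proposes the Comp-MCTS framework, which uses tree search under a fixed LLM-call budget to maximize the number of unique, oracle-validated counterfactuals while balancing quantity and quality.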

AI中文摘要

反事实追索旨在提供可操作的特征变化，以改变预测模型做出的不利决策。在实践中，受影响的个体通常受益于多个可行的替代方案，而非单一的最优解释。产生此类替代方案的一种自然方式是提示大语言模型（LLMs）。然而，提示引入了一个实际约束：LLM调用的数量通常是主要的计算和经济成本。对多个替代方案的需求以及这一成本约束共同将问题从寻找单个高质量反事实转变为在固定LLM调用预算下高效生成一组经oracle验证的反事实。在这项工作中，我们将LLM智能体设置中的反事实追索生成作为固定预算搜索问题进行研究，并提出了Comp-MCTS，一个智能体树搜索框架，该框架在此预算下最大化唯一、经oracle验证的反事实的产出，同时保持有利的数量-质量权衡。Comp-MCTS通过基于LLM的提议生成、oracle验证和压缩引导剪枝，在无训练、仅oracle的设置中将预算分配给新颖的干预方向。在四个真实世界表格数据集上的实验表明，Comp-MCTS在唯一、经oracle验证的反事实产出方面显著优于单候选LATS风格基线，并且与更强的多候选变体相比，提供了有利的数量-质量-效率权衡：在四个数据集中的三个上，以相似或更低的oracle评估成本获得相当或更高的产出，同时具有有竞争力的接近性、稀疏性和新颖性。

英文摘要

Counterfactual recourse aims to provide actionable feature changes that would alter an unfavorable decision made by a predictive model. In practice, affected individuals often benefit from multiple feasible alternatives rather than a single optimal explanation. A natural way to produce such alternatives is to prompt large language models (LLMs). However, prompting incurs a practical constraint: the number of LLM calls is often the dominant computational and economic cost. Together, the need for multiple alternatives and this cost constraint shift the problem from finding a single high-quality counterfactual to efficiently generating a set of oracle-validated counterfactuals under a fixed LLM-call budget. In this work, we study counterfactual recourse generation in the LLM-agentic setting as a fixed-budget search problem and propose Comp-MCTS, an agentic tree-search framework that maximizes the yield of unique, oracle-validated counterfactuals under this budget while maintaining favorable quantity-quality trade-offs. Comp-MCTS allocates the budget toward novel intervention directions via LLM-based proposal generation, oracle validation, and compression-guided pruning, in a training-free, oracle-only setting. Experiments on four real-world tabular datasets show that Comp-MCTS substantially outperforms single-candidate LATS-style baselines in the yield of unique, oracle-validated counterfactuals, and offers favorable quantity-quality-efficiency trade-offs against stronger multi-candidate variants: comparable or higher yield at similar or lower oracle-evaluation cost on three of four datasets, plus competitive proximity, sparsity, and novelty.


2606.08712 2026-06-09 cs.LG cs.AI cs.CV 新提交

SNR-ST-Mix: Sample-specific Neighborhood Regression Mixup for Augmented Spatial Transcriptomics Imputation with Deep Neural Network

SNR-ST-Mix: 基于样本特异性邻域回归混合增强的空间转录组学深度神经网络插补

Hongyi Yu, Yaoyu Fang, Jiahe Qian, Xinkun Wang, Lee A. Cooper, Bo Zhou

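Affiliations: Northwestern University; Yale University

AI summary: To address the noise and low resolution of spatial transcriptomics data, the paper proposes SNR-ST-Mix, a data augmentation framework that generates biologically plausible synthetic samples through spatial-neighborhood-constrained mixing weighted by expression similarity, improving deep neural network imputation performance.

Comments: 19 pages, 4 figures, 3 tables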

AI中文摘要

目的：空间转录组学（ST）能够在组织背景下测量基因表达。然而，这些测量通常噪声大、分辨率低且采样稀疏，限制了精细空间结构的恢复。深度神经网络已成为从组织学进行表达插补的强大工具，但其性能仍受限于有限的样本量和缺乏生物学信息的增强。大多数现有的学习增强策略是为分类任务而非回归任务设计的，忽略了空间和转录组关系，导致生物上不合理的插值，阻碍了预测性能。方法：为解决这些限制，我们提出SNR-ST-Mix，一种专门为ST数据设计的几何和表达感知数据增强框架。它将混合限制在点的k个最近空间邻域内，并基于表达相似性自适应加权插值系数，生成保留局部生物结构同时确保空间平滑性的增强样本。这种双重条件化产生合成样本，扩展了有效训练流形，促进了泛化，并在样本特异性训练下增强了预测稳定性。结果：使用各种组织类型的大量实验表明，SNR-ST-Mix在不需要架构更改或额外计算的情况下，始终优于传统增强方法。结论：SNR-ST-Mix为空间转录组学回归任务提供了一种有效且生物学原理的增强策略。通过显式利用空间几何和转录组相似性，它扩展了有效训练流形，并在不增加模型复杂度的情况下提高了预测性能。

英文摘要

Purpose: Spatial transcriptomics (ST) enables gene expression measurements within the tissue context. However, these measurements are often noisy, low-resolution, and sparsely sampled, which limits the recovery of fine spatial structure. Deep neural networks have become powerful tools for expression imputation from histology, but their performance remains constrained by limited sample sizes and a lack of biologically informed augmentation. Most existing augmentation strategies are designed for classification rather than regression and neglect spatial and transcriptomic relationships, leading to biologically implausible interpolations that hinder prediction performance. Approach: To address these limitations, we propose SNR-ST-Mix, a geometry- and expression-aware data augmentation framework designed specifically for ST data. It constrains mixing to a spot's k-nearest spatial neighbors and adaptively weights interpolation coefficients based on expression similarity, generating augmented samples that preserve local biological structure while ensuring spatial smoothness. This dual conditioning yields synthetic examples that expand the effective training manifold, promote generalization, and enhance prediction stability under sample-specific training. Results: Extensive experiments with various tissue types demonstrate that SNR-ST-Mix consistently outperforms conventional augmentation methods without requiring architectural changes or additional computation. Conclusions: SNR-ST-Mix provides an effective and biologically principled augmentation strategy for spatial transcriptomics regression tasks. By explicitly leveraging spatial geometry and transcriptomic similarity, it expands the effective training manifold and improves predictive performance without increasing model complexity.
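
The mixing step described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the uniform choice of neighbour, the cosine similarity, and the rule `lam = lam_max * similarity` are all assumptions made for illustration. Only the two constraints (mix within the k nearest spatial neighbours, weight by expression similarity) come from the abstract.

```python
import math
import random

def knn_indices(coords, i, k):
    """Indices of the k spots spatially closest to spot i (excluding i)."""
    dists = [(math.dist(coords[i], c), j) for j, c in enumerate(coords) if j != i]
    dists.sort()
    return [j for _, j in dists[:k]]

def cosine(u, v):
    """Cosine similarity of two vectors, 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def neighborhood_mixup(coords, features, expr, i, k=3, lam_max=0.5, rng=random):
    """Mix spot i with one of its k nearest spatial neighbours.

    The neighbour is sampled uniformly (assumption). Its weight `lam`
    grows with expression similarity, so dissimilar neighbours
    contribute less to the synthetic sample.
    """
    j = rng.choice(knn_indices(coords, i, k))
    sim = min(1.0, max(0.0, cosine(expr[i], expr[j])))  # clip to [0, 1]
    lam = lam_max * sim                                  # weight of the neighbour
    mix = lambda a, b: [(1 - lam) * x + lam * y for x, y in zip(a, b)]
    return mix(features[i], features[j]), mix(expr[i], expr[j]), j, lam
```

Because the partner is always drawn from the spatial neighbourhood, a distant spot can never be mixed in, which is the property the abstract credits with avoiding biologically implausible interpolations.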


2606.08816 2026-06-09 cs.LG cs.AI 新提交

Knowledge Graphs and Reasoning LLMs for Finding Simple Yet Effective Transcriptomic Perturbation Predictors

知识图谱与推理大语言模型用于寻找简单而有效的转录组扰动预测因子

Jake Fawkes, Liam Hodgson, Jason Hartford

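Affiliations: University College London; University of Manchester; Valence Labs; Recursion

AI summary: A K-nearest-neighbour method over a knowledge graph performs strongly on gene-knockout perturbation prediction, and combining it with an LLM optimised via reinforcement learning matches state-of-the-art performance.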

AI中文摘要

预测未见过的基因敲除扰动对转录组基因表达的影响仍然是虚拟细胞模型的一个极具挑战性的问题。最近，通过利用生物知识图谱提供相似扰动的概念，在训练扰动集之外实现了更好的外推。在这项工作中，我们证明了利用这些假设的最简单模型——知识图谱的K近邻——在此任务上取得了极具竞争力的性能，并且通过使用强化学习（RL）优化的LLM可以进一步提高预测性能。具体来说，我们发现K近邻方法在分布外扰动预测上几乎击败了所有方法，而当通过RL训练推理LLM以改变邻域时，它在Replogle等人（2022）的细胞系上获得了与当前最先进方法相当的性能。我们还证明，尽管没有直接训练，RL训练提高了LLM在差异表达预测下游任务上的性能。总体而言，这些发现证明了知识图谱作为模型先验的有效性，并显示出RL可以将LLM精炼为预测复杂生物反应的通用工具的早期迹象。

英文摘要

Predicting the effect of an unseen gene knockout perturbation on transcriptomic gene expression remains a highly challenging problem for virtual cell models. Recent progress has been made by leveraging biological knowledge graphs to provide a notion of similar perturbation, allowing for improved extrapolation beyond the set of training perturbations. In this work, we demonstrate that the simplest model to leverage these assumptions - a K-nearest neighbour from the knowledge graph - achieves highly competitive performance on this task, and that this can be improved further using LLMs optimised via reinforcement learning (RL) for predictive performance. Specifically, we find that the K-nearest neighbour approach beats almost all methods on out-of-distribution perturbation prediction, and when a reasoning LLM is trained via RL to make changes to the neighbourhood, it obtains equivalent performance to current state of the art methods on the cell lines from Replogle et al. (2022). We also demonstrate that the RL training improves the LLM's performance on the downstream task of differential expression prediction, despite not being trained on this directly. Overall, these findings demonstrate the efficacy of knowledge graphs as model priors, and show early signs that RL can refine LLMs into generalizable tools for predicting complex biological responses.
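
A minimal sketch of the K-nearest-neighbour baseline mentioned in this abstract, under stated assumptions: each gene has a knowledge-graph embedding vector, similarity is cosine, and the prediction is the unweighted mean of the neighbours' measured effects. The abstract does not specify these details, and the gene labels and numbers in the usage note are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors, 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def knn_predict(query_gene, kg_embedding, train_effects, k=2):
    """Predict the expression effect of an unseen knockout.

    kg_embedding: gene -> embedding vector from the knowledge graph
    train_effects: gene -> measured effect vector (training perturbations)
    Returns the mean effect of the k most similar training genes,
    together with the chosen neighbours.
    """
    q = kg_embedding[query_gene]
    ranked = sorted(train_effects,
                    key=lambda g: cosine(q, kg_embedding[g]),
                    reverse=True)
    neighbours = ranked[:k]
    n_genes = len(train_effects[neighbours[0]])
    pred = [sum(train_effects[g][d] for g in neighbours) / len(neighbours)
            for d in range(n_genes)]
    return pred, neighbours
```

With hypothetical embeddings `{"A": [1, 0], "B": [0.9, 0.1], "C": [0, 1], "Q": [1, 0.05]}`, the query `Q` picks `A` and `B` and averages their effects. In the abstract's RL setting, the LLM would edit the `neighbours` list before the averaging step.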


2606.08935 2026-06-09 cs.LG cs.AI 新提交

PAI: Preserving Amplitude Information in Representation-Based Time-Series Anomaly Detection

PAI：在基于表示的时间序列异常检测中保留振幅信息

Kang Zhang, Wei Jian Lau, Shoushou Ren, Dong Lin, Joon Son Chung, Chuanhao Sun

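Affiliations: HUAWEI; KAIST

AI summary: Existing representation-based time-series anomaly detection methods ignore amplitude information, which degrades performance. The paper proposes PAI, which fuses amplitude-related scores through a diagnostic module and a score augmentation function, improving average VUS-PR by 98.4% and 36.8% on the TSB-AD-U-Eva and TAB UV datasets.

Comments: 15 pages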

AI中文摘要

基于表示的时间序列异常检测算法在多种异常检测任务上显著优于其他方法。然而，我们在评估中发现它们存在一个主要限制——学习到的嵌入通常是振幅无关的。丢失振幅信息会降低与振幅相关异常的性能，并且这种失败普遍存在于所有现有的基于表示的方法中。为了解决上述问题，我们提出了一种新的异常评分方案PAI。PAI由两个互补模块组成：诊断模块和最终分数增强函数。诊断模块比较同一表示库上的余弦评分和欧几里得评分，以测试振幅信息是否已被捕获到学习到的表示中。然后在最终分数增强函数中，PAI计算逐点中位数和MAD偏差分数以及局部均值偏移分数——这些分数与表示分数融合以产生最终异常分数。在TSB-AD-U-Eva和TAB UV数据集上，PAI在所有报告的指标上改进了所有四种评估的基于表示的方法，平均VUS-PR增益分别为98.4%和36.8%。在所有评估的组合中，PaAno + PAI实现了最佳性能，比最先进的方法高出15%。对bootstrap置信区间、异常类型细分以及TS2Vec输入归一化消融的进一步评估进一步支持了所提出的方案。这些结果表明，显式保留振幅信息对于基于表示的时间序列异常检测非常重要，而这一点在现有的评分方案中未得到充分重视。代码可在https://github.com/pantheon5100/PAI获取。

英文摘要

Representation-based time-series anomaly detection algorithms significantly outperform other methods on diverse anomaly detection tasks. However, our evaluation reveals a major limitation: their learned embeddings are often amplitude-agnostic. Losing amplitude information can degrade performance on amplitude-related anomalies, and this failure is prevalent across all existing representation-based methods. To address this issue, we propose a new anomaly scoring scheme named PAI. PAI consists of two complementary modules: a diagnostic module and a final score augmentation function. The diagnostic module compares cosine and Euclidean scoring on the same representation bank to test whether amplitude information is already captured in the learned representation. Then, in the final score augmentation function, PAI computes a point-wise median and MAD deviation score and a local mean-shift score, which are fused with the representation score to produce the final anomaly score. On the TSB-AD-U-Eva and TAB UV datasets, PAI improves all four evaluated representation-based methods across every reported metric, achieving average VUS-PR gains of 98.4% and 36.8%, respectively. Among all evaluated combinations, PaAno + PAI achieves the best performance, outperforming the state-of-the-art method by 15%. Additional evaluations with bootstrap confidence intervals, anomaly-type breakdowns, and a TS2Vec input-normalization ablation further support the proposed scheme. These results suggest that explicitly retaining amplitude information is important for representation-based time-series anomaly detection, a point that has been underemphasized in existing scoring schemes. Code is available at: https://github.com/pantheon5100/PAI
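
The score augmentation named in this abstract (a median/MAD deviation score plus a local mean-shift score, fused with the representation score) could look roughly like the sketch below. The window size, the min-max normalisation, and the element-wise maximum used for fusion are assumptions; the abstract does not state the fusion rule, and the repository linked there is the authoritative source.

```python
import statistics

def mad_score(x):
    """Point-wise robust deviation |x_t - median| / MAD (MAD = 0 falls back to 1)."""
    med = statistics.median(x)
    mad = statistics.median([abs(v - med) for v in x])
    scale = mad if mad > 0 else 1.0
    return [abs(v - med) / scale for v in x]

def mean_shift_score(x, w=3):
    """|mean of the next w points - mean of the previous w points| at each t."""
    n = len(x)
    out = [0.0] * n
    for t in range(w, n - w + 1):
        left = sum(x[t - w:t]) / w
        right = sum(x[t:t + w]) / w
        out[t] = abs(right - left)
    return out

def minmax(s):
    """Rescale a score series to [0, 1]; a constant series maps to zeros."""
    lo, hi = min(s), max(s)
    return [0.0] * len(s) if hi == lo else [(v - lo) / (hi - lo) for v in s]

def fuse(rep_score, x, w=3):
    """Fuse the three scores by an element-wise max (assumed fusion rule)."""
    parts = [minmax(rep_score), minmax(mad_score(x)), minmax(mean_shift_score(x, w))]
    return [max(vals) for vals in zip(*parts)]
```

For a series with a single amplitude spike and an amplitude-agnostic representation score that stays flat, the fused score still rises at the spike, which is the failure mode the abstract sets out to repair.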


2606.08945 2026-06-09 cs.LG 新提交

From Hazard Functions to Language Space: Cox-Supervised Distillation of Survival Risk into a Large Language Model

从风险函数到语言空间：Cox监督的生存风险蒸馏到大语言模型

Nicholas I-Hsien Kuo, Blanca Gallego, Louisa Jorm

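Affiliations: Centre for Big Data Research in Health, the University of New South Wales

AI summary: The paper proposes a method for transferring time-to-event risk information from a Cox proportional hazards model into a large language model. A Qwen model is fine-tuned on text prompts, achieves competitive discrimination and calibration on three datasets, and its hidden states show a continuous risk gradient.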

AI中文摘要

我们研究了Cox比例风险模型估计的时间事件风险信息是否可以迁移到生成式大语言模型中。我们提出了一种基于文本的生存建模流程，其中结构化的临床协变量被转换为文本提示，并微调基于Qwen的大语言模型，以使用Cox模型预测作为训练目标生成患者特定的生存风险。在GBSG2、ACTG320和WHAS500数据集上，尽管该模型是作为文本生成任务而非使用传统的生存分析损失进行训练，但它取得了有竞争力的留出区分度和校准性。我们进一步分析了模型隐藏状态的几何结构，其中t-SNE可视化揭示了潜在空间中的平滑风险梯度，表明模型将生存风险表示为连续结构而非孤立的风险类别。这些发现共同表明，大语言模型可以内化生存风险结构，同时支持校准预测，为语言模型中的时间事件推理提供了一条途径。

英文摘要

We investigate whether information about time-to-event risk estimated by a Cox proportional hazards model can be transferred into a generative large language model. We propose a text-based survival modelling pipeline in which structured clinical covariates are converted into text prompts and a Qwen-based large language model is fine-tuned to generate patient-specific survival risk using Cox model predictions as a training target. Across GBSG2, ACTG320, and WHAS500, the model achieves competitive held-out discrimination and calibration despite being trained as a text-generation task rather than with a conventional survival-analysis loss. We further analyse the geometry of the model's hidden states, where t-SNE visualisations reveal smooth risk gradients in latent space, suggesting that the model represents survival risk as a continuous structure rather than isolated risk categories. Together, these findings suggest that large language models can internalise survival-risk structure while supporting calibrated prediction, providing a route towards time-to-event reasoning in language models.
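
To make the pipeline in this abstract concrete, here is a sketch of how one training example might be built: covariates are serialised into a prompt, and the Cox model's relative hazard `exp(beta . x)` becomes the text target. The prompt wording, the three-decimal formatting, the covariate names, and the coefficient values are hypothetical; the abstract only states that Cox predictions serve as the training target.

```python
import math

def cox_risk(covariates, coefs):
    """Relative hazard exp(beta . x) of an already fitted Cox model."""
    return math.exp(sum(coefs[name] * covariates[name] for name in coefs))

def to_example(covariates, coefs):
    """Build one (prompt, completion) pair for supervised fine-tuning."""
    fields = "; ".join(f"{name} = {value}"
                       for name, value in sorted(covariates.items()))
    prompt = f"Patient record: {fields}. Estimated relative risk:"
    completion = f"{cox_risk(covariates, coefs):.3f}"
    return {"prompt": prompt, "completion": completion}

# Hypothetical coefficients; a real pipeline would take them from a fitted model.
coefs = {"age": 0.0, "nodes": 0.1}
example = to_example({"age": 50, "nodes": 10}, coefs)
```

Training on such pairs is ordinary next-token prediction, which matches the abstract's remark that the model is trained as a text-generation task rather than with a survival-analysis loss.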


2606.09030 2026-06-09 cs.LG cs.AI cs.CL 新提交

TRIAGE: Dialectical Reasoning for Explainable Risk Prediction on Irregularly Sampled Medical Time Series with LLMs

TRIAGE: 基于辩证推理的不规则采样医学时间序列风险可解释预测方法

Hyeongwon Jang, Gyouk Chu, Changhun Kim, Joonhyung Park, Hangyul Yoon, Eunho Yang

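Affiliations: KAIST; AITRICS; University of Wisconsin-Madison

AI summary: The paper proposes TRIAGE, a framework in which a large language model generates dialectical reasoning over competing clinical outcomes. This mitigates risk polarization and yields continuous risk scores with interpretable reasoning, improving AUPRC by 3.3% and reducing calibration error by 81% on three benchmarks.

Comments: Code is available at https://github.com/HyeongWon-Jang/TRIAGE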

AI中文摘要

基于电子健康记录的临床早期预警系统，其中临床观察记录为不规则采样的医学时间序列（ISMTS），必须提供校准的风险评分用于患者分诊，以及临床医生可验证的可解释理由。大语言模型（LLMs）已被探索用于此任务，但它们将分级临床风险崩溃为过度自信的二元预测。这种风险极化损害了校准性和跨患者可比性。为解决此问题，我们提出TRIAGE框架，该框架训练LLM通过引出特定结果的理由，对竞争性临床结果生成辩证推理。这种辩证公式减轻了风险极化，使单个LLM能够产生基于明确临床推理的连续风险评分。在三个ISMTS基准上评估，TRIAGE相比竞争基线实现了平均AUPRC提升3.3%，校准误差降低81%。LLM作为评判者的评估进一步表明，我们的理由在临床推理质量上比基线的后验解释高出20%。源代码可在https://github.com/HyeongWon-Jang/TRIAGE获取。

英文摘要

Clinical early warning systems built on electronic health records, in which clinical observations are recorded as irregularly sampled medical time series (ISMTS), must deliver both calibrated risk scores for patient triage and interpretable rationales that clinicians can verify. Large Language Models (LLMs) have been explored for this task, yet they collapse graded clinical risk into overconfident binary predictions. This risk polarization undermines both calibration and cross-patient comparability. To address this, we propose TRIAGE, a framework that trains an LLM to generate dialectical reasoning over competing clinical outcomes by eliciting outcome-specific rationales. This dialectical formulation mitigates risk polarization, enabling a single LLM to yield continuous risk scores grounded in explicit clinical reasoning. Evaluated on three ISMTS benchmarks, TRIAGE achieves an average AUPRC improvement of 3.3% and reduces calibration error by 81% compared to the competitive baselines. An LLM-as-a-judge assessment further shows that our rationales surpass post-hoc explanations from the baseline by 20% in clinical reasoning quality. The source code is available at https://github.com/HyeongWon-Jang/TRIAGE .
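
The effect of risk polarization on calibration can be shown with a small numeric sketch. It assumes the calibration metric is the standard binned expected calibration error (ECE); the abstract says only "calibration error", so the exact metric used in the paper may differ.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE: weighted mean of |confidence - observed rate| per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p = 1.0 goes in the last bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for b in bins:
        if b:
            conf = sum(p for p, _ in b) / len(b)   # mean predicted risk
            rate = sum(y for _, y in b) / len(b)   # observed event rate
            ece += len(b) / n * abs(conf - rate)
    return ece

labels = [1, 0, 0, 0]
# Polarized model: two uncertain patients are both pushed to risk 1.0
polarized = expected_calibration_error([1.0, 1.0, 0.0, 0.0], labels)
# Graded model: the same two patients receive risk 0.5
graded = expected_calibration_error([0.5, 0.5, 0.0, 0.0], labels)
```

Both models rank the patients identically, yet only the graded scores match the observed event rate, which is why collapsing risk into binary predictions hurts calibration without necessarily hurting discrimination.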


2606.09065 2026-06-09 cs.LG cs.AI 新提交

算子学习求解不同初始条件下的福克-普朗克方程

Li Zeng, Xiaoliang Wan, Yaobin Wang, Fabio Nobile, Tao Zhou

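Affiliations: Fuzhou University; Louisiana State University; Beijing Normal-Hong Kong Baptist University; École Polytechnique Fédérale de Lausanne; Chinese Academy of Sciences

AI summary: The paper proposes a physics-informed neural network framework based on conditional normalizing flows. It uses the Chapman-Kolmogorov equation and a linearized-SDE base distribution to approximate the solution operator of the FPE across many initial conditions, and introduces a time-weighted loss function to handle small-time instability.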

AI中文摘要

福克-普朗克方程（FPE）在描述由随机动力学支配的系统概率密度函数（PDF）的时间演化中起着关键作用。本文提出了一种基于条件归一化流的物理信息神经网络（PINN）框架，用于高效逼近整个初始条件范围内FPE的解算子。利用马尔可夫随机过程的Chapman-Kolmogorov方程，将问题重新表述为逼近从任意点狄拉克质量开始的初始时刻的转移PDF。采用关联线性化随机微分方程（SDE）的PDF作为归一化流的基分布，该分布提供了目标PDF的良好近似，特别是在小时间尺度下，从而避免了与狄拉克δ初始分布相关的映射奇异性。此外，引入时间加权损失函数以减轻小时间尺度下出现的数值不稳定性，在时间推进过程中实现因果性与训练难度之间的平衡。通过多种数值实验展示了所提方法的有效性和鲁棒性。

英文摘要

The Fokker-Planck equation (FPE) plays a pivotal role in describing the time evolution of probability density functions (PDFs) for systems governed by stochastic dynamics. In this work, we propose a conditional normalizing flow-based physics-informed neural network (PINN) framework for efficiently approximating the solution operator of the FPE for a whole range of initial conditions. Leveraging the Chapman-Kolmogorov equation for Markovian stochastic processes, the problem is reformulated into approximating a transition PDF starting at initial time from a Dirac mass centered at an arbitrary point. The PDF of an associated linearized stochastic differential equation (SDE) is employed as the base distribution for the normalizing flow, providing a good approximation of the target PDF, especially for small times, and thereby avoiding the singularity of the map associated with the Dirac delta initial distribution. Furthermore, a time-weighted loss function is introduced to mitigate numerical instabilities arising at small times, achieving a balance between causality and training difficulty as time progresses. A variety of numerical experiments are presented to illustrate the effectiveness and robustness of the proposed method.
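
The reformulation in this abstract can be written out as follows. This is a sketch under assumed notation (drift $f$, diffusion matrix $\sigma$, initial density $p_0$, Jacobian $J_f$, flow map $T_\theta$); none of these symbols appear in the abstract, and the paper's own linearization may differ from the first-order expansion around the starting point used here.

```latex
% Assumed SDE:  dX_t = f(X_t)\,dt + \sigma(X_t)\,dW_t
% Fokker-Planck equation for the density p(x,t):
\partial_t p(x,t) = -\sum_i \partial_{x_i}\big[f_i(x)\,p(x,t)\big]
  + \tfrac12 \sum_{i,j} \partial_{x_i}\partial_{x_j}
    \big[(\sigma\sigma^\top)_{ij}(x)\,p(x,t)\big]

% Chapman-Kolmogorov: every initial condition p_0 is propagated
% by the same transition density started from a Dirac mass at y
p(x,t) = \int p(x,t \mid y,0)\,p_0(y)\,dy,
\qquad p(x,0 \mid y,0) = \delta(x-y)

% Linearizing around y gives a Gaussian base N(m(t), \Sigma(t)) with
\dot m = f(y) + J_f(y)\,(m-y), \quad m(0) = y,
\qquad
\dot\Sigma = J_f(y)\,\Sigma + \Sigma\,J_f(y)^\top + \sigma(y)\sigma(y)^\top,
\quad \Sigma(0) = 0

% Conditional normalizing flow x = T_\theta(z; y, t) on top of that base
p_\theta(x,t \mid y) =
  \mathcal N\big(T_\theta^{-1}(x; y,t);\, m(t), \Sigma(t)\big)\,
  \big|\det \partial_x T_\theta^{-1}(x; y,t)\big|
```

Since the base density already concentrates around $y$ as $t \to 0$, the flow map only has to learn a correction to it, which is one way to read the abstract's claim that the singularity associated with the Dirac initial distribution is avoided.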


2606.09480 2026-06-09 cs.LG 新提交

Loss-Guided Adaptive Scale Refinement for Molecular Force Prediction

损失引导的自适应尺度细化用于分子力预测

Limin Yu

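Affiliations: Tianjin Medical University

AI summary: The paper proposes a loss-guided adaptive scale refinement framework that automatically discovers task-effective scales through interpolation, routing, and scale pool updates, reducing force prediction error on a NaCl aqueous system.

Comments: 23 pages, 2 figures, 6 tables. Preprint on adaptive scale refinement for molecular force prediction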

AI中文摘要

分子系统涉及多个空间尺度的相互作用，从局部配位和短程扰动到长程静电和溶剂介导效应。然而，大多数分子表示学习方法依赖于手动预定义的尺度，而任务最优建模尺度可能与这些固定水平不一致。本研究引入了一个损失引导的自适应尺度细化框架用于分子力预测，将预定义尺度视为初始锚点，通过插值、路由、可微尺度更新和尺度池细化来发现任务有效的分辨率。使用NaCl水溶液离子系统作为最小测试平台，构建了短尺度和长程力预测分支并分析了它们的互补性。Oracle硬路由将整体力MAE从399.65降低到382.67，而连续Oracle插值进一步将其降低到380.96。在最近离子距离低于0.6 nm的紧密接触区域，紧密接触MAE从327.22降低到260.51。一个最小尺度池更新实验表明，从端点锚点{0,1}开始，损失引导的更新自动生成中间尺度并恢复了大部分连续Oracle性能。最终更新的尺度池{0,0.125,0.25,0.375,0.5,0.75,1}实现了381.23的整体MAE。这些结果支持自适应尺度细化作为分子表示学习的一个有前景的方向，特别是当固定尺度建模不足时。

英文摘要

Molecular systems involve interactions across multiple spatial scales, from local coordination and short-range perturbations to long-range electrostatic and solvent-mediated effects. However, most molecular representation learning methods rely on manually predefined scales, and the task-optimal modeling scale may not coincide with these fixed levels. This study introduces a loss-guided adaptive scale refinement framework for molecular force prediction, treating predefined scales as initial anchors and discovering task-effective resolutions through interpolation, routing, differentiable scale updates, and scale pool refinement. Using a NaCl aqueous ionic system as a minimal testbed, this study constructs short-scale and long-range force prediction branches and analyzes their complementarity. Oracle hard routing reduces the overall force MAE from 399.65 to 382.67, while continuous oracle interpolation further reduces it to 380.96. In close-contact regimes with nearest-ion distance below 0.6 nm, the close-contact MAE decreases from 327.22 to 260.51. A minimal scale pool update experiment shows that starting from endpoint anchors {0,1}, loss-guided updates automatically generate intermediate scales and recover most of the continuous oracle performance. The final updated scale pool {0,0.125,0.25,0.375,0.5,0.75,1} achieves an overall MAE of 381.23. These results support adaptive scale refinement as a promising direction for molecular representation learning, especially when fixed-scale modeling is insufficient.
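
The oracle interpolation and scale pool update in this abstract can be sketched on toy data. The update rule below (try the midpoint of each adjacent pair of scales and keep it if the oracle MAE drops) is an assumption that happens to produce dyadic pools like the one reported; the abstract does not spell out the actual rule, and the sample values are made up.

```python
def interp(short, long, alpha):
    """Blend short-scale and long-range predictions; alpha = 0 is pure short-scale."""
    return [(1 - alpha) * s + alpha * l for s, l in zip(short, long)]

def mae(pred, target):
    """Mean absolute error between two equally long vectors."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)

def oracle_mae(samples, pool):
    """Mean MAE when every sample uses its best scale from the pool."""
    total = 0.0
    for short, long, target in samples:
        total += min(mae(interp(short, long, a), target) for a in pool)
    return total / len(samples)

def refine_pool(samples, pool=(0.0, 1.0), rounds=3, tol=1e-9):
    """Loss-guided refinement: add midpoints that lower the oracle MAE."""
    pool = sorted(pool)
    for _ in range(rounds):
        current = oracle_mae(samples, pool)
        candidates = [(lo + hi) / 2 for lo, hi in zip(pool, pool[1:])]
        keep = [m for m in candidates
                if oracle_mae(samples, pool + [m]) < current - tol]
        if not keep:
            break
        pool = sorted(pool + keep)
    return pool

# Toy samples: (short-branch force, long-branch force, reference force)
samples = [([0.0], [4.0], [2.0]), ([0.0], [4.0], [4.0])]
refined = refine_pool(samples)
```

With the endpoint pool `{0, 1}` the first sample cannot be fitted by either branch alone; adding the midpoint fixes it, after which no further midpoint helps and refinement stops.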


2606.09623 2026-06-09 cs.LG 新提交

Constrained user-item allocation for e-commerce marketing campaigns

面向电子商务营销活动的约束用户-物品分配

Maja Lindström, Natalija Glisovic, Jan von Pichowski, Tommy Löfstedt, Martin Rosvall

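Affiliations: Umeå University; KTH Royal Institute of Technology; University of Würzburg

AI summary: The paper proposes auto-targeting, which jointly selects users and items to build multiple non-overlapping marketing campaigns using constrained spectral biclustering, greedy local search, and a multi-armed bandit framework, and compares these methods against a simulated annealing baseline on synthetic data, Amazon Reviews, and commercial data.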

AI中文摘要

在开展营销活动时，零售商必须决定推广哪些产品以及针对哪些用户。这些决策本质上是耦合的：有效的活动将具有强烈相互亲和力的用户和物品匹配到预定义大小的非重叠组中。然而，现有方法假设预定义的活动结构或将物品选择与用户分配解耦，无法直接从联合交互模式中发现活动分组。因此，我们将该活动问题形式化为自动定向：联合选择用户和物品以构建多个不相交的活动。为了解决这个组合问题，我们提出了三种互补策略：（i）约束谱双聚类，以在用户-物品亲和力矩阵中找到密集区域；（ii）具有成对交换的贪心局部搜索，用于组合优化；（iii）多臂老虎机框架，通过探索逃离局部最优。我们在合成数据集、Amazon Reviews基准测试和大规模专有商业数据上评估了这些方法，并将结果与模拟退火基线进行比较。结果表明，双聚类始终获得最高的活动质量、提升度和公平性得分。虽然双聚类在较小数据集上运行高效，但在非常大的数据集上其运行时间显著增加，而基于老虎机的方法则提供了可扩展的替代方案。

英文摘要

When running marketing campaigns, retailers must decide which products to promote and which users to target. These decisions are inherently coupled: effective campaigns match users and items with strong mutual affinity into non-overlapping groups of predefined sizes. However, existing approaches assume predefined campaign structure or decouple item selection from user assignment, and cannot discover campaign groupings directly from joint interaction patterns. We therefore formalize this campaign problem as auto-targeting: jointly selecting users and items to construct multiple disjoint campaigns. To solve this combinatorial problem, we propose three complementary strategies: (i) constrained spectral biclustering to find dense regions in the user-item affinity matrix, (ii) greedy local search with pairwise swaps for combinatorial refinement, and (iii) a multi-armed bandit framework to escape local optima through exploration. We evaluate these methods on a synthetic dataset, the Amazon Reviews benchmarks, and large-scale proprietary commercial data, and compare the results to simulated annealing as a baseline. The results show that biclustering consistently achieves the highest campaign quality, lift, and fairness scores. While biclustering runs efficiently on smaller datasets, its runtime increases substantially on very large ones, where bandit-based methods instead offer a scalable alternative.
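
Strategy (ii) in this abstract, greedy local search with pairwise swaps, can be sketched as follows. The simplifications are assumptions: each campaign's item set is held fixed, only users are swapped between campaigns, and the objective is the plain sum of user-item affinities. Swapping preserves campaign sizes and disjointness, which are the constraints the abstract names.

```python
from itertools import combinations

def total_value(campaigns, affinity):
    """Sum of affinity[u][i] over every user-item pair inside each campaign."""
    return sum(affinity[u][i]
               for users, items in campaigns
               for u in users for i in items)

def greedy_user_swaps(campaigns, affinity):
    """Apply improving user swaps between campaigns until none remains.

    campaigns: list of (user_ids, item_ids) pairs with disjoint user sets.
    Every accepted swap strictly raises the objective, so the loop ends.
    """
    campaigns = [(list(users), list(items)) for users, items in campaigns]
    improved = True
    while improved:
        improved = False
        for a, b in combinations(range(len(campaigns)), 2):
            users_a, items_a = campaigns[a]
            users_b, items_b = campaigns[b]
            for x in range(len(users_a)):
                for y in range(len(users_b)):
                    u, v = users_a[x], users_b[y]
                    before = (sum(affinity[u][i] for i in items_a)
                              + sum(affinity[v][i] for i in items_b))
                    after = (sum(affinity[v][i] for i in items_a)
                             + sum(affinity[u][i] for i in items_b))
                    if after > before:
                        users_a[x], users_b[y] = v, u
                        improved = True
    return campaigns
```

Such a search stops at the first assignment with no improving swap, which may be a local optimum; that limitation is what the bandit-based exploration in strategy (iii) is meant to address.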


2606.09638 2026-06-09 cs.LG cs.SC math-ph math.MP physics.comp-ph stat.AP 新提交

超越视频ID：通过语义原生长序列建模实现短视频推荐规模化

Ruixiao Sun, Diego Uribe Mora, Zhimeng Jiang, Yuanzhen Lin, Jiarui Wang, Yuening Li, Danfeng Guo, Zhizhong Chen, Chuan He, Liang Liu

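Affiliations: Google, Mountain View, USA

AI summary: In short-video recommendation, sequence length is limited by the semantic sparsity of Video IDs and the quadratic complexity of Transformers. The paper adopts Semantic IDs and a Global-Aware Compression Transformer to model ultra-long behavior sequences at billion-user scale, substantially reducing memory and compute overhead and improving user satisfaction and content consumption in online experiments.

Comments: This manuscript has been accepted by SIGIR 2026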

DOI: 10.1145/3805712.3808503

AI中文摘要

捕捉用户跨广泛观看历史的兴趣对于短视频推荐至关重要，但扩展序列长度受到两个瓶颈的限制：原子视频ID的语义稀疏性和Transformer的二次计算复杂度。传统的正交视频ID无法捕捉内容关系，并且需要大型嵌入表，而自注意力的二次复杂度在严格的工业延迟和资源约束下限制了最大序列长度。在这项工作中，我们提出了一个在生产环境中部署的框架，用于在十亿用户规模上建模超长用户行为序列。我们首先通过采用内容原生的语义ID来解决表示瓶颈。通过使用深度截断、粗粒度的语义ID，我们将嵌入表大小从语料库基数中缩小。这种紧凑的表示通过共享语义前缀自然地泛化到冷启动内容。其次，为了克服序列扩展障碍，我们引入了全局感知压缩Transformer，它利用非参数时间折叠和统一全局查询集成来有效压缩序列，缓解了标准自注意力的内存和计算瓶颈。在我们计算基础设施上的离线分析显示，峰值内存占用减少了一个数量级，计算开销大幅降低。这种效率提升使得在生产中以可承受的成本支持更长的序列长度，在大规模在线A/B测试中，在满意的用户参与度和满意的内容消费方面取得了显著的在线收益。

英文摘要

Capturing user interests across extensive watch histories is critical for short-form video recommendation, yet scaling sequence length is limited by two bottlenecks: the semantic sparsity of atomic Video IDs and the quadratic computational complexity of Transformers. Traditional orthogonal Video IDs fail to capture content relationships and demand large embedding tables, while the quadratic complexity of self-attention restricts the maximum sequence length under strict industrial latency and resource constraints. In this work, we present a production-deployed framework for modeling ultra-long user behavior sequences at a billion-user scale. We first address the representation bottleneck by adopting content-native Semantic IDs. By utilizing depth-truncated, coarse-grained Semantic IDs, we shrink the embedding table size from corpus cardinality. This compact representation naturally generalizes to cold-start content through shared semantic prefixes. Second, to overcome the sequence scaling barrier, we introduce a Global-Aware Compression Transformer that leverages non-parametric temporal folding and unified global query integration to effectively condense the sequence, alleviating both the memory and computational bottlenecks of standard self-attention. Offline profiling on our computing infrastructure demonstrates an order-of-magnitude reduction in peak memory footprint and a drastic decrease in computational overhead. This efficiency gain enables supporting longer sequence lengths at an affordable cost in production, yielding substantial online gains in satisfied user engagement and satisfied content consumption in large-scale online A/B tests.
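
Two ideas from this abstract can be illustrated in a few lines: depth-truncated Semantic IDs, and a parameter-free compression of the behavior sequence. The sketch assumes a Semantic ID is a tuple of hierarchical codes and that "non-parametric temporal folding" behaves like averaging fixed-size blocks of consecutive embeddings. The second assumption in particular is a guess, since the abstract does not define the operation, and the video IDs are hypothetical.

```python
def truncate(semantic_id, depth):
    """Keep only the first `depth` (coarsest) codes of a hierarchical Semantic ID."""
    return tuple(semantic_id[:depth])

def fold(sequence, window):
    """Average each block of `window` consecutive embedding vectors.

    No learned parameters are involved. A length-L sequence becomes
    ceil(L / window) vectors, so quadratic self-attention cost falls
    by roughly window ** 2.
    """
    folded = []
    for start in range(0, len(sequence), window):
        block = sequence[start:start + window]
        folded.append([sum(col) / len(block) for col in zip(*block)])
    return folded

# Two hypothetical videos that differ only in their fine-grained codes
old_video = (3, 7, 1, 9)
new_video = (3, 7, 4, 2)   # cold-start item with no interaction history
shared_row = truncate(old_video, 2) == truncate(new_video, 2)
```

Because both videos map to the same truncated prefix, the cold-start item reuses an embedding row that was already trained, and the table size depends on the number of distinct prefixes rather than on the number of videos.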


2606.07552 2026-06-09 cs.MA cs.AI cs.LG 交叉投稿

Symbolic Reasoning Frameworks Modulate LLM Risk Aversion in Multi-Agent Strategic Settings

符号推理框架在多智能体战略环境中调节大语言模型的风险规避

Augustin Chan

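Affiliations: iterative.day

AI summary: By injecting symbolic reasoning frameworks (such as the I-Ching and Tarot) as reflective prompts, the study finds that they differentially modulate LLM risk aversion and produce framework-specific winner distributions in multi-agent games, and that the effect stems from the reflective process rather than from following the content.

Comments: 17 pages, 3 figures, 6 tables, 6 listings. Code and data: https://doi.org/10.5281/zenodo.20338937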

AI中文摘要

大型语言模型在作为战略智能体部署时表现出内在的行为倾向——尤其是风险规避的“乌龟”偏向于防御性玩法。我们证明，符号推理框架作为每轮反思提示注入一个智能体，能够差异化地调节这种偏向，并重塑多智能体生态系统，产生框架特定的胜者分布。在一个7玩家的战国策外交变体（41局游戏，4种条件，单战役记忆积累）中，每个框架产生独特的生态系统特征：在控制条件下，燕国主导（7/11，64%）；在易经蓍草占卜下，燕国和楚国共同主导，而秦国被完全压制（0/10）；在塔罗牌下，秦国主导（5/10，Fisher vs. 合并p=0.006）；在乱序文本消融（保留提示结构的无意义神谕文本）下，齐国主导（5/10，Fisher vs. 合并p=0.006）。接受框架的智能体（韩国）从未获胜，且在不同条件下生存率无差异（Fisher p=1.0），但塔罗牌持续提升韩国的峰值领土（平均3.0个SC vs. 2.1-2.5个其他，Kruskal-Wallis p=0.010）。两个框架的内容均不能预测后续行动——卦象主题（卡方p=0.95）和塔罗牌姿态（卡方p=0.69）均与行动选择独立——表明调节作用是通过反思过程而非内容遵循实现的。我们将其作为一篇观察论文呈现，确立智能体层面的对齐框架选择在多智能体环境中产生独特的系统级后果。

英文摘要

Large language models exhibit innate behavioral tendencies when deployed as strategic agents -- notably a risk-averse "turtle" bias toward defensive play. We show that symbolic reasoning frameworks, injected as per-round reflective prompts into one agent, differentially modulate this bias and reshape the multi-agent ecosystem to produce framework-specific winner distributions. In a 7-player Warring States Diplomacy variant (41 games, 4 conditions, single-campaign memory accumulation), each framework produces a distinct ecosystem signature: under control, Yan dominates (7/11, 64%); under I-Ching yarrow divination, Yan and Chu co-dominate while Qin is completely suppressed (0/10); under Tarot, Qin dominates (5/10, Fisher vs. pooled p = 0.006); under scrambled-text ablation (incoherent oracle text preserving prompt structure), Qi dominates (5/10, Fisher vs. pooled p = 0.006). The framework-receiving agent (Han) never wins and shows no survival difference across conditions (Fisher p = 1.0), but Tarot consistently elevates Han's peak territory (mean 3.0 SCs vs. 2.1-2.5 others, Kruskal-Wallis p = 0.010). Neither framework's content predicts subsequent actions -- hexagram themes (chi-squared p = 0.95) and Tarot card postures (chi-squared p = 0.69) are both independent of action choice -- suggesting the modulation operates through the reflective process, not content-following. We present this as an observation paper establishing that alignment-framework choice at the agent level produces distinctive system-level consequences in multi-agent settings.


2606.07568 2026-06-09 cs.HC cs.AI cs.CV cs.LG physics.data-an 交叉投稿

A Systematic Study of Behavioral Cloning for Scientific Data Annotation

行为克隆在科学数据标注中的系统研究

Ishaan Singh Chandok, Core Francisco Park

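Affiliations: GitHub

AI summary: To reduce the time spent on manual verification and correction in scientific data annotation, the paper proposes a behavioral cloning framework with 9 synthetic tasks that simulate expert strategies. Key findings include hierarchical skill acquisition, efficient fine-tuning after multi-task pretraining, and a shared mistake representation in the model's internal states.

Comments: ICML 2026 Oral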

AI中文摘要

科学数据标注，例如视频中动物追踪或神经重建的校对，仍然受限于“最后一公里”问题：即使有强大的自动化，验证和校正仍需大量人力。标准方法训练模型直接预测标注，丢弃了专家如何导航、点击、验证和校正的丰富监督信息。我们引入了一个研究科学标注上行为克隆的框架：9个合成任务配以合成标注，模拟真实人类策略，包括探索、错误校正和战略决策。我们的实验揭示了若干发现。首先，技能层次化出现：模型先学习GUI机制，再学习任务关键决策，且比训练数据犯更少错误，同时保留在错误发生时校正的能力。其次，在多任务行为克隆上扩展模型表明，在我们的规模范围内，更大的模型数据效率更高。第三，多任务预训练能够高效微调至新任务，而从零开始训练则完全失败。第四，线性探针揭示模型内部表示标注过程的潜在变量，如任务阶段和数据位置；有趣的是，我们发现一个跨不同标注任务泛化的共享错误表示。总体而言，我们的框架建立了系统基准并识别了关键瓶颈，为将行为克隆扩展到真实世界科学数据标注奠定了基础。

英文摘要

Scientific data annotation, such as tracking animals in video or proofreading neural reconstructions, remains bottlenecked by the "last mile" problem: even with strong automation, verification and correction consume substantial human effort. Standard approaches train models to directly predict annotations, discarding the rich supervision in how experts navigate, click, verify, and correct. We introduce a framework for studying behavioral cloning on scientific annotation: 9 synthetic tasks paired with synthetic annotations that simulate realistic human strategies including exploration, mistake correction, and strategic decision-making. Our experiments reveal several findings. First, skills emerge hierarchically: models learn GUI mechanics before task-critical decisions, and commit fewer mistakes than the training data while retaining the ability to correct errors when they occur. Second, scaling models on multi-task behavioral cloning shows that larger models are more data efficient within our scale range. Third, multi-task pretraining enables efficient fine-tuning to new tasks, while training from scratch fails entirely. Fourth, linear probes reveal that models internally represent latent variables of the annotation process such as task phase and data position; interestingly, we find a shared mistake representation that generalizes across different annotation tasks. Overall, our framework establishes systematic benchmarks and identifies key bottlenecks, providing a foundation for scaling behavioral cloning to real-world scientific data annotation.


2606.07570 2026-06-09 cs.DL cs.LG 交叉投稿

Can LLMs extract scientific consensus? A case study in high-temperature superconductivity

LLMs能否提取科学共识？以高温超导为例

Mouyang Cheng, Wenhao He, Zhuotao Jin, Bowen Yu, Ju Li, Boris Kozinsky, Yao Wang, Pavel Volkov, Liangzi Deng, Ching-Wu Chu, Xiao-Gang Wen, Mingda Li

Affiliations: Center for Computational Science and Engineering, MIT; Department of Materials Science and Engineering, MIT; Department of Physics, MIT; Department of Nuclear Science and Engineering, MIT; John A. Paulson School of Engineering and Applied Sciences, Harvard University; Department of Chemistry, Emory University; Department of Physics, University of Connecticut; Department of Physics and Texas Center for Superconductivity, University of Houston

AI summary: Using high-temperature superconductivity as a testbed, the study builds a knowledge graph from nearly 18,000 highly cited publications and finds that LLM-extracted representations recover coherent, physically interpretable structures, indicating that LLMs can serve as scalable tools for decoding contested scientific knowledge.

Comments: 23 pages, 4 figures

AI中文摘要

科学知识日益分散在庞大且异质的科学文献中，其中重要的主张往往是隐含的、不断演变的，并且存在内部争议。尽管大型语言模型（LLM）在信息提取和摘要方面表现出色，但它们恢复潜在科学共识的能力仍不清楚。本文以凝聚态物理中长期存在且备受争议的高温超导（HTS）问题为挑战性测试平台，研究了这一问题。利用过去七十年间近18,000篇高被引出版物，我们构建了一个结构化的知识图谱，链接了竞争性的超导机制、材料家族、证据模态和引用关系。我们发现，LLM提取的表征恢复出了连贯且物理可解释的结构，包括家族依赖的机制概况、证据特定的相关性以及引用介导的科学信念的时间演化。对LLM的消融研究进一步表明，全局结构在提示、解码和模型变化下保持稳健。我们的结果表明，LLM确实可以作为可扩展的工具，用于解读以竞争性解释和知识演变为特征的领域的科学知识。

英文摘要

Scientific knowledge is increasingly dispersed across vast and heterogeneous scientific literature, where important claims are often implicit, evolving, and internally debated. While large language models (LLMs) have shown impressive performance in information extraction and summarization, their ability to recover latent scientific consensus remains unclear. Here, we investigate this problem in the context of high-temperature superconductivity (HTS), a long-standing and highly debated topic in condensed matter physics, as a challenging testbed. Using nearly 18,000 highly cited publications from the past seven decades, we construct a structured knowledge graph linking competing superconducting mechanisms, material families, evidential modalities, and citation relations. We find that LLM-extracted representations recover coherent and physically interpretable structures, including family-dependent mechanism profiles, evidence-specific correlations, and citation-mediated temporal evolution of scientific beliefs. Ablation studies on the LLM further show that the global structure remains robust across prompting, decoding, and model variations. Our results suggest that LLMs can indeed serve as scalable tools for deciphering scientific knowledge in domains characterized by competing interpretations and evolving knowledge.


2606.07572 2026-06-09 physics.soc-ph cs.LG stat.AP 交叉投稿

Forecasting Japanese elections: A nonlinear machine-learning approach

预测日本选举：一种非线性机器学习方法

Sota Kato, Xuan Luo, Budrul Ahsan, Asahi Obata, Takafumi Nakanishi

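Affiliations: International University of Japan; The Tokyo Foundation; IBM Japan; Rice University; Tokyo University of Technology

AI summary: The study introduces nonlinear machine-learning models based on decision trees and ensemble learning to forecast Japanese lower-house election results, and reports better predictive accuracy than the traditional linear model in both in-sample and out-of-sample evaluations.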

AI中文摘要

尽管日本是世界上最大的先进民主国家之一，但其全国选举的预测模型发展仍然有限。本研究引入了基于决策树和集成学习方法的非线性机器学习预测模型，用于预测日本众议院选举结果。为了评估我们方法的方法论优势，我们复现了Lewis-Beck和Tien（LBT）针对日本选举的基础统计预测模型的理论框架和数据集。我们的模型在样本内和样本外评估中均显示出比LBT模型适度但持续提高的预测准确性，表明非线性算法在捕捉复杂选举动态方面为经典线性方法提供了一种替代方案。本研究是非线性机器学习技术较早应用于单一国家选举预测的案例之一。它提供了一个可复现的框架，当与其他国家的特定选举理论相结合时，可能提高预测模型在更广泛国家背景下的预测性能。

英文摘要

Despite Japan being one of the world's largest advanced democracies, the development of election forecasting models for its national elections remains limited. This study introduces nonlinear machine-learning forecasting models, based on decision tree and ensemble learning methods, for predicting the outcomes of Japanese lower-house elections. To assess the methodological benefits of our approach, we replicated the theoretical framework and dataset of Lewis-Beck and Tien's (LBT) foundational statistical forecasting model for Japanese elections. Our models demonstrated moderately but consistently improved predictive accuracy compared to LBT's model in both in-sample and out-of-sample evaluations, suggesting that nonlinear algorithms offer an alternative approach to classical linear methods in capturing complex electoral dynamics. This study represents one of the earlier applications of nonlinear machine-learning techniques to single-country election forecasting. It offers a replicable framework that, when combined with the country-specific electoral theories of other nations, may enhance the predictive performance of forecasting models in broader national contexts.


2606.07575 2026-06-09 q-fin.RM cs.LG 交叉投稿

神经外科医生需要看到的：用于脑肿瘤手术中脑移位补偿的超声合成术中MRI

Santiago Cepeda, Olga Esteban-Sinovas, Ignacio Arrese, Rosario Sarabia

Affiliations: Department of Neurosurgery, Neurovascular Unit, Río Hortega University Hospital, Valladolid, Spain; Specialized Group in Biomedical Imaging and Computational Analysis (GEIBAC), Instituto de Investigación Biosanitaria de Valladolid (IBioVALL), Valladolid, Spain

AI summary: The paper proposes an end-to-end pipeline that merges preoperative MRI, synthetic MRI generated from intraoperative ultrasound, and a deformable registration anchored on that synthetic image to produce a whole-brain MRI volume in preoperative imaging space, compensating for brain shift and giving neuronavigation an MRI-like update of the intraoperative field.

AI中文摘要

最大安全切除是胶质瘤手术的主要目标。硬脑膜打开后，神经导航引导会因脑移位而逐渐退化。术中MRI可以补偿，但需要专用基础设施且很少可用，而术中超声（ioUS）廉价、可重复且与常规工作流程兼容。将ioUS与术前MRI结合的导航系统通常依赖刚性配准；即使是可变形多模态配准也受限于超声散斑对比度、窄视野以及无法表示术前扫描中不存在的结构，最关键的是切除腔和残余肿瘤。我们提出一个端到端流水线，通过合并术前MRI、从ioUS生成的合成MRI以及锚定在该合成图像上的可变形配准，生成术前成像空间中的全脑MRI体积。它集成了一个2.5D残差变换器合成骨干（ResViT-2.5D）和一个两阶段配准，将NiftyReg与合成锚定的SynthMorph阶段耦合，直接对原始扫描仪输入进行操作。在切除后的ReMIND队列上，ResViT-2.5D生成的合成图像在结构、强度和感知指标上与术中T2紧密匹配。在14名受试者的215个专家标志点上，合成锚定配准将平均目标配准误差从6.27毫米降低到5.86毫米，与强大的经典NiftyReg基线（5.85毫米）相当，同时为每个受试者产生微分同胚变形场。贡献不在于配准精度的提高，而在于集成的体积本身，它在超声视野内反映了术中切除后的状态。这为外科医生提供了手术视野的类似MRI的更新，并有可能集成到手术导航工作流程中。

MOLOT System Card: Malicious Operational Logic Observation Transformer

Daniil Lopatkin, Maksim Mitrofanov, Stanislav Rakovsky, Aleksandr Khalikov

Affiliations: False Positive Community

AI summary: Presents MOLOT, a system that detects malicious code from behavior sequences derived from static call graphs, adds an explanation stage that localizes suspicious behavior, evaluates it on PyPI and npm packages, and releases the Open Malicious-Code Bench benchmark.

Comments 13 pages, 3 figures

Abstract

MOLOT (Malicious Operational Logic Observation Transformer) is a static malicious-code detection system designed for SAST setups where package metadata, maintainer history, and dynamic execution traces may be unavailable or unreliable. The system represents source code as behavior sequences derived from static call graphs and includes an explanation stage that ranks suspicious behavior activities and maps them back to source-code locations. The approach is evaluated on Python and JavaScript packages from PyPI and npm, compared with open-source detection tools, and validated under product constraints including runtime, memory use, and false-positive rates observed in a real moderation workflow. We also release Open Malicious-Code Bench, a public benchmark for reproducible evaluation of malicious-package detection methods. The results show that static behavior-sequence modeling can provide accurate, explainable, and deployable malicious-code detection for modern DevSecOps workflows.

2606.07798 2026-06-09 cs.AI cs.LG q-bio.NC cross-listed

Reconstructing and forecasting disease trajectories of patients with Alzheimer's disease using routine data in resource-constrained settings

Ratnadeep Das, Atri Chatterjee, Sitikantha Roy

Affiliations: Yardi School of Artificial Intelligence (ScAI), Indian Institute of Technology Delhi; Department of Neurology, Vardhman Mahavir Medical College and Safdarjung Hospital; Department of Applied Mechanics, Indian Institute of Technology Delhi

AI summary: Proposes GNOVA, a variational autoencoder with a GRU encoder and a Neural ODE decoder that uses routine clinical data (no neuroimaging or biomarkers) for bidirectional prediction of cognitive scores, interpolation and extrapolation, and uncertainty estimation, achieving low error on the ADNI dataset.

Abstract

Alzheimer's disease is a progressive neurodegenerative disorder, and its progression varies substantially across patients. Existing work aims to forecast patients' future cognitive state, with minimal focus on reconstructing the state from past visits. Furthermore, quantifying predictive uncertainty remains underexplored in current research, and existing approaches rely on costly modalities such as MRI, PET, and CSF, limiting their deployment in resource-limited settings. In this research, our primary objectives are: first, bidirectional prediction of cognitive scores from irregular visits to present the complete disease trajectory; second, interpolation and extrapolation capabilities to assist clinicians in informed prognostic decision making; third, a well-calibrated uncertainty estimate for all predictions; and finally, achieving these objectives using only the modalities available during routine visits. We propose a unified framework, GNOVA: a GRU-Neural ODE Variational Autoencoder. The architecture combines a Gated Recurrent Unit encoder and a Neural ODE decoder within a variational autoencoder framework. In our work, we forecast the CDR-SB and MMSE scores. The GRU encoder allows for any number of inputs at any time point. The Neural ODE decoder performs continuous estimation, allowing interpolation and extrapolation at any desired time point. The variational autoencoder allows for uncertainty estimation in predictions. We worked with 1,727 patients from the ADNI dataset over 10 years; the model achieved mean absolute errors of 1.35 and 2.28 for CDR-SB and MMSE scores, respectively, without requiring any neuroimaging or biomarker data. Feature-ablation studies revealed that age, BMI, and APOE4 status were strong predictors. The proposed framework enables the reconstruction of incomplete patient histories and the anticipation of future cognitive states.

2606.07843 2026-06-09 cs.DB cs.IR cs.LG cross-listed

RACT: Retrieval Augmented Column-Table Learning and Prediction for Multi-Table Schema Matching

Leonard Traeger, Enas Khwaileh, Andreas Behrend, George Karabatis

Affiliations: University of Maryland, Baltimore County; Utrecht University; Technical University of Cologne

AI summary: Proposes RACT, a self-supervised framework that constrains the column-matching space by retrieving candidate tables; in multi-table schema matching it outperforms similarity baselines, improving precision and completeness by up to 70%.

Comments Research Preprint, 12 pages

2606.07923 2026-06-09 cs.DB cs.AI cs.LG cross-listed

Larch: Learned Query Optimization for Semantic Predicates

Fuheng Zhao, Pawel Liskowski, Zihan Li, Benjamin Han, Puxuan Yu, Varich Boonsanong, Dimitris Tsirogiannis, Anupam Datta

Affiliations: Snowflake Inc.

AI summary: Proposes Larch, a framework that uses an embedding-augmented graph neural network with either reinforcement learning or supervised learning to optimize the execution order of semantic filters in AI SQL queries, substantially reducing token overhead.

Abstract

With the advent of Large Language Models (LLMs), many database systems have introduced semantic operators that enable analytical queries over unstructured data (e.g., text, images, videos). Semantic operators typically incur high inference costs and latencies, making semantic (AI) SQL queries challenging to apply to large-scale datasets. At the same time, their semantic nature leads database engines to treat them as black boxes, making AI SQL queries difficult to optimize. In this paper, we introduce Larch, a framework for optimizing the execution of semantic filters in AI SQL queries. Larch was inspired by two key observations: i) the high latency of semantic operators leaves significant room for computationally heavy runtime optimization techniques, and ii) unstructured data are typically accompanied by semantic information in the form of embeddings, allowing for efficient semantic comparisons between AI_FILTER prompts and data values. Based on these two observations, we present two Larch variants: Larch-A2C and Larch-Sel. Larch-A2C encodes arbitrary semantic filter expression trees using an embedding-augmented Gated Graph Neural Network and formulates the filter evaluation order as a Markov decision process. In contrast, Larch-Sel leverages a supervised learning model to predict filter selectivities, subsequently applying dynamic programming to find a near-optimal evaluation order for each input row. Evaluated across diverse real-world datasets and comprehensive synthetic workloads, both Larch variants consistently outperform existing semantic filter optimization techniques in terms of token usage. Our results demonstrate that Larch is robust across diverse workloads, reducing total token cost overhead by 3x-19x compared to Palimpzest and Quest.
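To illustrate why predicted selectivities help order semantic filters, the sketch below covers only the simplest case: a conjunction (AND) of independent filters with known per-row costs, evaluated with short-circuiting. It uses the classic rank rule for expensive predicates, not Larch-Sel's dynamic program over expression trees; the filter names, token costs, and pass probabilities are hypothetical.

```python
from itertools import permutations

def expected_cost(filters):
    """Expected per-row token cost of evaluating AND-ed filters in order.

    Each filter is (name, cost, selectivity), where selectivity is the
    probability that a row passes. A later filter runs only if every
    earlier filter passed.
    """
    total, reach_prob = 0.0, 1.0
    for _name, cost, selectivity in filters:
        total += reach_prob * cost   # pay only if the row got this far
        reach_prob *= selectivity    # fraction of rows that continue
    return total

def order_by_rank(filters):
    """Sort ascending by cost / (1 - selectivity)."""
    def rank(f):
        _name, cost, selectivity = f
        if selectivity >= 1.0:
            return float("inf")      # never rejects a row, so run it last
        return cost / (1.0 - selectivity)
    return sorted(filters, key=rank)

# Hypothetical filters: (name, tokens per row, predicted pass probability)
FILTERS = [
    ("mentions_refund", 120.0, 0.60),
    ("is_complaint", 300.0, 0.10),
    ("is_english", 40.0, 0.90),
]

ranked = order_by_rank(FILTERS)
brute_force = min(permutations(FILTERS), key=expected_cost)
```

For these numbers the ranked order matches the cheapest of all six permutations. Disjunctions, nested expressions, and correlated filters break the simple rule, which is where a search such as dynamic programming becomes necessary.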

2606.07924 2026-06-09 cs.CV cs.AI cs.CL cs.LG cs.MM cross-listed

Decoupling Semantics and Logic: A Training-Free Coarse-to-Fine Pipeline for Video Retrieval-Augmented Generation

Jiaxin Dai, Zehang Wei, Jiamin Yan, Xiang Xiang

Affiliations: School of Computer Science & Tech, Huazhong University of Science and Technology; School of AI and Automation, Huazhong University of Science and Technology

AI summary: Proposes a training-free, two-stage cascaded Video RAG pipeline that decouples semantic retrieval from logical reasoning to achieve cross-lingual long-video comprehension, strict persona adherence, and zero-hallucination temporal grounding.

Comments To be presented at ACL 2026 MAGMAR Workshop (Oral; Retrieval leaderboard No.1)

Abstract

This paper presents our system description for the 2nd Workshop on Multimodal Augmented Generation via MultimodAl Retrieval (MAGMaR). Addressing the critical challenges of cross-lingual long-video comprehension, strict persona adherence, and zero-hallucination temporal grounding, we propose a fully training-free, two-stage cascaded Video RAG pipeline. Our architecture strategically decouples semantic retrieval from cognitive logical reasoning through a modality-aware division of labor. In the first stage, a high-recall semantic pre-fetching module employs dense retrieval using only high-fidelity visual summaries and global text descriptions, explicitly isolating noisy modalities (e.g., OCR and ASR) to maintain a pristine vector space. In the second stage, an Adaptive, Iterative, and Reasoning-based (A.I.R.) filtering agent, powered by a commercial Large Language Model (LLM), performs fine-grained cognitive reranking. The agent re-incorporates full multimodal contexts to enforce strict logical alignment with user personas, effectively pruning semantically similar but logically irrelevant candidates. Finally, a Prompt Sculpting mechanism constrains the generator to synthesize the distilled subset into strictly formatted JSON responses with exact chunk-level citations. Evaluated on the RAG track, our resource-aware approach shows exceptional precision in both information retrieval and persona-conditioned generation.

2606.08033 2026-06-09 cs.CV cs.LG cross-listed

Balancing Real and Synthetic Data for CNN-based Masonry Crack Detection

Mattia Forlesi, Alfonso Esposito, Ivan Zyrianoff, Alessandro Marzani, Marco Di Felice

Affiliations: University of Bologna

AI summary: Addresses the shortage of real data in masonry crack detection by supplementing training with synthetic data; varying the real-to-synthetic ratio shows that 20% real data plus synthetic data can match or even exceed training on real data alone.

Abstract

Cracks are a critical indicator of building health, and early-stage identification is fundamental to preventing harmful damage. Advances in deep learning (DL), particularly convolutional neural networks (CNNs), have enabled scalable solutions for automated crack detection. However, CNN performance strongly depends on the availability of large and diverse datasets, which is particularly challenging for complex surfaces such as masonry. Collecting sufficient real data is time-consuming, while publicly available datasets may not be adequate. To address this limitation, we explored generating synthetic crack data, which complements real data and improves training effectiveness. The real dataset consists of masonry crack images collected from buildings in Bologna and surrounding areas. In contrast, the synthetic dataset was generated using a crack overlay tool that adds cracks to background images in a controlled orientation and placement. The real dataset was used to train several DL architectures to identify the best-performing model (InceptionV4), which was then used for the experiments with generated data. Six training scenarios were tested with InceptionV4 by varying the ratio of real to synthetic data, with evaluation performed on a test set composed of real images using the F1-score and mean Intersection over Union (mIoU) metrics. Results show that training on synthetic data plus a modest addition of 20% real data achieves results comparable to training on real data only. Moreover, the 20/80 scenario (synthetic/real) achieved a 76% F1-score and an 80% mean IoU, outperforming the real-only case. Overall, the method demonstrates the potential of synthetic data to reduce collection effort while enhancing crack detection accuracy.
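For readers unfamiliar with the two reported metrics, the sketch below computes an F1-score and a mean IoU from flattened binary masks (1 = crack, 0 = background). It assumes mIoU is averaged over the two classes; the abstract does not say whether the paper averages over classes or over images, and the example masks are hypothetical.

```python
def confusion_counts(pred, truth):
    """Return (tp, fp, fn, tn) for two equal-length sequences of 0/1 labels."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    return tp, fp, fn, tn

def f1_score(pred, truth):
    """F1 for the crack class: 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn, _tn = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mean_iou(pred, truth):
    """Average of the crack-class IoU and the background-class IoU."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    iou_crack = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    iou_background = tn / (tn + fp + fn) if (tn + fp + fn) else 0.0
    return (iou_crack + iou_background) / 2

# Hypothetical 8-pixel masks: 2 true positives, 1 false positive,
# 1 false negative, 4 true negatives.
PRED = [1, 1, 0, 0, 1, 0, 0, 0]
TRUTH = [1, 0, 0, 0, 1, 1, 0, 0]
f1 = f1_score(PRED, TRUTH)
miou = mean_iou(PRED, TRUTH)
```

Because the background class usually dominates crack images, a class-averaged mIoU can exceed the crack-class F1, which is consistent with an 80% mIoU appearing alongside a 76% F1-score.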

2606.08110 2026-06-09 math.FA cs.LG cross-listed

New Fractional Ambiguity Function Integrated with CNN-Based Machine Learning for Signal Classification

Aamir H. Dar, Prakhar Kumar Sonkar, Neeraj Kumar Sharma

Affiliations: Mehta Family School of Data Science & Artificial Intelligence; Indian Institute of Technology Guwahati

AI summary: Proposes a new fractional ambiguity function (NFrAF) and integrates it into a CNN framework for signal classification, improving classification accuracy over conventional methods.

2606.08147 2026-06-09 q-bio.GN cs.LG cross-listed

Biological Reasoning-Informed Regression for Interpretable Regulatory DNA Activity Prediction

Yi Duan, Zhao Yang, Jiwei Zhu, Ying Ba, Chuan Cao, Bing Su

Affiliations: Gaoling School of Artificial Intelligence; Renmin University of China; Zhongguancun Academy

AI summary: Proposes R3LM, a framework that teaches LLMs reasoning-informed regression through structured biological knowledge, achieving state-of-the-art enhancer prediction with interpretable mechanistic explanations.

Comments Accepted at KDD 2026 AI4Sciences Track

Abstract

DNA cis-regulatory elements (CREs) such as enhancers control gene expression levels. Accurately predicting regulatory activity from DNA sequences is valuable but challenging, as it requires understanding complex biological regulatory processes. Existing methods typically regress activity scores from sequences in a black-box manner, limiting both interpretability and regression performance. Meanwhile, large language models (LLMs) benefit from explicit reasoning processes, yet directly applying LLMs to raw DNA sequences performs poorly. In this paper, we bridge this gap by introducing R3LM, a framework that teaches LLMs reasoning-informed regression on regulatory DNA through structured biological knowledge. Specifically, we design a biologically grounded data format that structures DNA's regulatory information for improved LLM understanding, and construct CRE-ReasonBench, the first dataset that associates DNA sequences and activity scores with mechanistic reasoning traces. Through two-stage training that first teaches LLMs to reason over structured biological information and then performs regression, R3LM achieves state-of-the-art performance on enhancer prediction across three cell types, outperforming both LLMs with raw sequence input and specialized DNA models while providing interpretable mechanistic explanations. We expect R3LM to serve as an interpretable reward model that can effectively assist biologists in CRE design. Code is available at https://github.com/DuanYi516/R3LM.

2606.08148 2026-06-09 cond-mat.mtrl-sci cs.LG cross-listed

MEC-Cox: Machine-Learning-Assisted Generalized Entropy Calibration for ATT Marginal Hazard Ratio Estimation

Se Yoon Lee, Yonghyun Kwon, Jae Kwang Kim

Affiliations: Department of Statistics, Texas A&M University; Department of Mathematics, Korea Military Academy; Department of Statistics, Iowa State University

AI summary: Proposes MEC-Cox, which combines machine-learning-assisted generalized entropy calibration with inverse-probability-weighted Cox regression to estimate the marginal hazard ratio for the average treatment effect on the treated (ATT), calibrating on prognostic scores to reduce bias and improve efficiency.

Abstract

Externally controlled survival trials are increasingly used when concurrent randomized controls are infeasible, particularly in oncology and rare-disease settings with time-to-event endpoints. We target an average-treatment-effect-on-the-treated (ATT)-type marginal hazard-ratio estimand, comparing treatment with counterfactual control in the treated trial population, and estimate it using inverse-probability-weighted (IPW) Cox regression. Valid inference is challenging because IPW Cox regression depends on the weights through both event contributions and risk-set averages, making flexible machine-learning nuisance estimation difficult to incorporate directly. Building on machine-learning-assisted generalized entropy calibration (MEC) by Lee and Kim (2026), we propose MEC-Cox for ATT-weighted IPW Cox regression. The method begins with normalized source-propensity-score odds weights for external controls and then applies Bregman calibration to balance cross-fitted prognostic summaries between external controls and treated trial patients. The calibration basis may include control-survival predictions, Cox linear predictors, penalized-survival-model predictions, or other prognostic-score summaries. MEC-updated weights therefore play a dual role as source-transport and prognostic-score balancing weights. We establish consistency, characterize a calibration-induced efficiency gain, and develop a stacked sandwich variance estimator. Simulations show that MEC-Cox can reduce bias, increase efficiency, and improve coverage through flexible machine-learning-assisted adjustment.

2606.08587 2026-06-09 stat.ML cs.LG cross-listed

Improving the sharpness in neural network-based parametric post-processing of ensemble forecasts

Ágnes Baran, Máté Mihalina

Affiliations: Faculty of Informatics, University of Debrecen

AI summary: Targets the loss of sharpness in ensemble forecast post-processing: adding a penalty term to the loss function reduces the width of the central prediction interval by 8.2% to 12.5% in relative terms without degrading CRPS or RMSE.

Comments 18 pages

Abstract

Statistical post-processing has proven to be an effective tool for improving ensemble forecasts of different weather variables. Case studies show that post-processing can remedy the typically underdispersive and potentially biased behaviour of the ensemble while optimizing a proper scoring rule expressing the forecast skill. The price of these positive effects is generally a deterioration in sharpness; the width of the central prediction intervals and the uncertainty of the predictions increase, especially for shorter lead times. This work aims to reduce the extent of the latter phenomenon for neural network-based parametric post-processing methods by extending the network's loss function with a penalty term. We demonstrate the effect of the proposed technique for 2m temperature ensemble forecasts of the European Centre for Medium-Range Weather Forecasts downloaded from the EUPPBench benchmark dataset and verified against synoptic observations. Here, the predictive distribution is Gaussian, and we use the continuous ranked probability score (CRPS) as the loss function. The case studies confirm a substantial relative decrease (8.2% to 12.5%) in the width of the nominal central prediction interval compared to the width of the predictive distribution computed without the penalty term, while there is no deterioration in the mean CRPS of probabilistic forecasts or in the RMSE of the predictive mean.
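For a Gaussian predictive distribution the CRPS has a well-known closed form, which is what makes it usable as a network loss. The sketch below implements that closed form and adds a sharpness penalty. The penalty shown (a weight times the predictive standard deviation) is an assumption chosen for illustration; the abstract does not state the form of the paper's penalty term.

```python
import math

def gaussian_crps(mu, sigma, y):
    """Closed-form CRPS of N(mu, sigma^2) for observation y (sigma > 0).

    CRPS = sigma * ( z*(2*Phi(z) - 1) + 2*phi(z) - 1/sqrt(pi) ),
    with z = (y - mu) / sigma, phi the standard normal pdf, Phi its cdf.
    """
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))

def penalized_loss(mu, sigma, y, weight=0.1):
    """CRPS plus a hypothetical sharpness penalty proportional to sigma.

    For a Gaussian, the width of any central prediction interval is
    proportional to sigma, so penalizing sigma penalizes interval width.
    """
    return gaussian_crps(mu, sigma, y) + weight * sigma

# Hypothetical forecast: mean 12.0 degC, spread 1.5 degC, observed 13.0 degC.
base = gaussian_crps(12.0, 1.5, 13.0)
with_penalty = penalized_loss(12.0, 1.5, 13.0, weight=0.1)
```

As sigma shrinks toward zero the CRPS reduces to the absolute error |y - mu|, so the penalty weight trades interval width against calibration and would need tuning on validation data.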

2606.08611 2026-06-09 eess.SY cs.LG cs.SY cross-listed

Bayesian Optimization of a Multi-Product Chemical Reactor Using Composite Models and Partial Physics Knowledge

Liqiu Dong, Marta Zagórowska, Mehmet Mercangöz

Affiliations: Department of Chemical Engineering, Imperial College London; DCSC, Delft University of Technology

AI summary: Proposes a composite Bayesian optimization method in which Gaussian processes predict physical quantities from which profit is computed, combined with an energy-balance residual penalty and constraint handling, for data-driven real-time economic optimization of a multi-product reactor.

Comments Accepted to IFAC 2026. 11 pages, 4 figures

Abstract

We study data-driven real-time economic optimization of a multi-product chemical reactor when no reliable first-principles model is available beyond a steady-state energy balance. Instead of learning the economic objective directly as a black-box function, we use a composite formulation in which Gaussian process (GP) models predict physically meaningful outputs, including product concentrations and reactor temperature, while profit is computed analytically from these predictions together with raw-material, product, and utility prices. This preserves the structure of the economic objective, makes it parametric in changing prices without needing retraining, and allows candidate operating points to be checked against the available energy balance through a physics residual. The GPs also provide predictive uncertainty, which is exploited in a Bayesian optimization (BO) framework both for data-efficient exploration and for conservative enforcement of the reactor temperature constraint through an upper confidence bound. The acquisition function additionally penalizes large energy-balance mismatch obtained by substituting the GP-predicted outputs and candidate inputs into the available steady-state energy balance. The approach is demonstrated on a benchmark simulation of a non-isothermal multi-product reactor. Relative to a trust-region safe BO implementation, the proposed method achieves better simulated economic performance within the available iteration budget. Relative to a purely data-driven BO approach that does not use the available physics information, it avoids reactor temperature constraint violations.
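The composite idea can be sketched as follows: profit is computed analytically from predicted physical outputs, the temperature constraint is screened with an upper confidence bound, and an energy-balance residual is penalized. This is only an illustration of that structure, not the paper's acquisition function (it has no exploration term). The predictions are passed in as plain numbers standing in for GP posterior means and standard deviations, and every name, price, and unit is hypothetical.

```python
def profit_rate(conc_mean, flow, prices, feed_cost, utility_cost):
    """Profit computed analytically from predicted product concentrations."""
    revenue = flow * sum(prices[name] * c for name, c in conc_mean.items())
    return revenue - feed_cost - utility_cost

def screened_score(pred, prices, energy_residual, t_max,
                   flow=2.0, feed_cost=3.0, utility_cost=1.0,
                   beta=2.0, penalty_weight=1.0):
    """Penalized profit for one candidate, or None if it is screened out.

    pred holds stand-ins for GP outputs: 'conc_mean' (dict), 'temp_mean',
    'temp_std'. The candidate is rejected when the upper confidence bound
    on reactor temperature exceeds t_max.
    """
    temp_ucb = pred["temp_mean"] + beta * pred["temp_std"]
    if temp_ucb > t_max:
        return None
    base = profit_rate(pred["conc_mean"], flow, prices, feed_cost, utility_cost)
    return base - penalty_weight * energy_residual ** 2

PRED = {"conc_mean": {"B": 0.5, "C": 0.2}, "temp_mean": 350.0, "temp_std": 2.0}

score_old = screened_score(PRED, {"B": 10.0, "C": 4.0}, energy_residual=0.5, t_max=360.0)
# A price change only needs the analytic profit to be re-evaluated.
score_new = screened_score(PRED, {"B": 12.0, "C": 4.0}, energy_residual=0.5, t_max=360.0)
# A tighter limit rejects the same candidate: 350 + 2*2 = 354 > 353.
rejected = screened_score(PRED, {"B": 10.0, "C": 4.0}, energy_residual=0.5, t_max=353.0)
```

Keeping prices outside the learned models is what lets the objective follow price changes without retraining, as the abstract notes.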

2606.08633 2026-06-09 cs.AI cs.LG cross-listed

Towards Long-Horizon Vessel Trajectory and Destination Forecasting with Reasoning Large Language Models

Hongwei Wang, Miao Zhou, Fengde Wang, Yuting Wang, Jiewen Yu, Jun-Yan He, Bohao Qu, Wanbing Zhang, Xiuju Fu, Qing Guo, Zipei Fan, Yingying Xing, Yi Yuan

Affiliations: Institute of High Performance Computing (IHPC), A*STAR, Singapore; The Key Laboratory of Road and Traffic Engineering, Ministry of Education, Tongji University; Meituan Inc., Shenzhen, China; Centre for Frontier AI Research (CFAR), A*STAR, Singapore; Nankai University; School of Artificial Intelligence, Jilin University

AI summary: Proposes a Maritime LLM post-training framework based on Reinforcement Learning with Verifiable Reward (RLVR) that converts trajectories into semantic text and uses physical-validity constraints and hierarchical matching to improve long-horizon (30-day) forecasting accuracy; the 4B model performs best.

Comments The IEEE International Conference on Intelligent Transportation Systems (ITSC) 2026, Naples, Italy

Abstract

Long-horizon maritime trajectory prediction is important for shipping management, logistics planning, and maritime risk analysis, yet month-level forecasting remains insufficiently studied. Existing deep learning methods mainly focus on short- and mid-term coordinate extrapolation and often struggle to preserve route feasibility and destination correctness over extended horizons. This paper investigates joint long-horizon vessel trajectory and destination forecasting with reasoning-capable large language models, and develops a Maritime LLM post-training framework based on Reinforcement Learning with Verifiable Reward (RLVR). An AIS-based benchmark is constructed with 60-day historical trajectories and 30-day forecasting horizons, where trajectories are converted into semantic textual representations for RL prompt construction. RLVR aligns LLMs with maritime forecasting objectives by enforcing physical validity, providing early-weighted trajectory supervision, and evaluating destination correctness through hierarchical matching and curriculum learning. Experimental results show that RLVR-trained LLMs substantially improve over zero-shot LLMs and representative deep learning baselines, especially on destination-related metrics. Among the evaluated RLVR-trained variants, 4B LLMs achieve the best overall performance, suggesting that reward-compatible optimization and task-specific capacity matching are more important than simply using larger 8B or 14B LLMs. The results also show that LSTM remains a strong deep learning baseline under limited fine-tuning data, while Transformer-style spatio-temporal models typically require larger datasets and richer structured inputs. Overall, this work advances semantic, verifier-aligned maritime forecasting for operational decision support.

2606.08714 2026-06-09 eess.SY cs.AI cs.LG cs.RO cs.SY cross-listed

Hybrid Neural Network and Conventional Controller Approach for Robust Control of Highly Unstable Systems: Application to Tilt-Rotor Control

Ali Kafili Gavgani, Amin Talaeizadeh, Aria Alasty, Hossein Nejat Pishkenari

Affiliations: Advanced Research Lab for Control and Agricultural Robotics (Sharif AgRoLab); Department of Mechanical Engineering, Sharif University of Technology, Tehran, Iran

AI summary: Proposes a neural-network-enhanced sliding mode controller that decomposes the system dynamics into input-independent and input-dependent parts, learning the former with a lightweight network from a small dataset, for robust control of a fully actuated tilt-rotor system; the LSTM variant outperforms the MLP variant.

Comments Proceedings of the 13th RSI International Conference on Robotics and Mechatronics (ICRoM 2025)

DOI: 10.6084/m9.figshare.32572083

Abstract

Multirotors are widely used in applications ranging from surveillance to precision agriculture, yet conventional designs remain limited by their under-actuation. Tilt-rotor configurations overcome this limitation by enabling full actuation. This paper investigates neural-network-based control strategies for a fully actuated tilt-rotor system with four thrust-vectoring inputs. Our work is structured in two parts. First, we deliberately present a negative result by evaluating a direct input-output control approach. In this method, multilayer perceptrons (MLPs), long short-term memory (LSTM) networks, and transformer models are trained to map system states and their desired values directly to control signals. We show that this strategy fails to stabilize the system, highlighting the inherent difficulty of applying direct input-output learning to highly unstable plants. Second, as the main contribution, we propose a neural-network-enhanced sliding mode controller (SMC). The method decomposes the system dynamics into input-independent and input-dependent components, with the former learned from a small dataset using lightweight networks, thereby reducing real-time computational demands. Moreover, the proposed method can be trained using flight logs collected from low-performance controllers, and the resulting dynamic model learned from real-world data can be used in simulation. We further compare MLP- and LSTM-based implementations under model uncertainties and external disturbances, demonstrating the robustness and effectiveness of the proposed approach; in particular, the controller with the LSTM plant dynamics predictor achieves superior performance to its MLP-based counterpart while also exhibiting lower runtime.

2606.08770 2026-06-09 cs.CL cs.AI cs.CV cs.LG cross-listed

TeamHerald@CHIPSAL 2026: Hate Speech Detection and Sentiment Analysis of Nepali Memes using Transformer-based Architectures and Ensemble Learning

Ashish Acharya, Anish Khatiwada, Rohit Khadka, Pragya Aryal

Affiliations: Herald College Kathmandu

AI summary: Addresses code-mixing and resource scarcity in Nepali memes by extracting text with OCR and modeling it with Transformer models; hard and soft voting ensembles behave differently on the binary and multi-class tasks, with soft voting improving Macro F1 by a relative 15.8% on the multi-class sentiment task.

Comments Accepted at the 2nd Workshop on Challenges in Processing South Asian Languages (CHiPSAL 2026) at LREC 2026

Abstract

The analysis of internet memes in the Nepali language is complicated by frequent code-mixing and a lack of established baseline resources. While memes inherently combine visual and textual elements, this study focuses on a text-centric approach by extracting embedded text using an OCR layer and modeling it with Transformer-based architectures. We evaluate six distinct models and investigate the comparative effectiveness of Hard and Soft Voting ensemble strategies across two tasks: binary hate speech detection and three-class sentiment analysis. Experimental results show that a standalone decoder-only model achieved the highest performance for binary classification, whereas the Soft Voting ensemble performed best for the multi-class sentiment task, yielding a 15.8% relative improvement in Macro F1-score over the strongest standalone baseline. These findings suggest that ensemble strategies behave differently across binary and multi-class tasks, highlighting the importance of selecting aggregation methods suited to the classification objective.
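The difference between the two ensemble strategies is easy to see on a small example. The sketch below is a generic illustration of hard and soft voting, not the team's code; the three probability rows are hypothetical model outputs for one meme over three sentiment classes.

```python
from collections import Counter

def hard_vote(prob_rows):
    """Each model votes for its top class; the most common vote wins."""
    votes = [max(range(len(p)), key=p.__getitem__) for p in prob_rows]
    return Counter(votes).most_common(1)[0][0]

def soft_vote(prob_rows):
    """Average class probabilities across models, then pick the top class."""
    n_classes = len(prob_rows[0])
    mean = [sum(p[c] for p in prob_rows) / len(prob_rows) for c in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__)

# One confident model favours class 0; two hesitant models lean to class 1.
PROBS = [
    [0.90, 0.10, 0.00],
    [0.40, 0.60, 0.00],
    [0.45, 0.55, 0.00],
]

hard_label = hard_vote(PROBS)  # votes are [0, 1, 1], so class 1
soft_label = soft_vote(PROBS)  # mean is about [0.58, 0.42, 0.00], so class 0
```

Soft voting keeps each model's confidence, so it depends on how well calibrated the models are; hard voting discards that information.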

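2606.08843 2026-06-09 cs.SD cs.LG cross-listed

From A to B to A: Palindromic Zero-Shot Voice Conversion with Non-Parallel Data

Moshe Mandel, Shlomo E. Chazan

Affiliations: Independent, Israel; OriginAI, Israel

AI summary: Proposes aligning non-parallel speech through K-nearest-neighbor retrieval over WavLM representations to build synthetic training pairs, combined with a speaker loss, for zero-shot voice conversion; it performs well cross-lingually despite training only on English data.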

2606.08973 2026-06-09 q-bio.QM cs.LG 交叉投稿

面向复杂查询的驾驶视频检索与结构化对齐

Manyi Yao, Sparsh Garg, Christian Shelton, Amit Roy-Chowdhury, Abhishek Aich

发表机构 * NEC Laboratories, America（美国NEC实验室）； University of California, Riverside（加州大学河滨分校）

AI总结提出STRIVE-D框架，通过弱监督领域视频校准规则、融合视觉语言与关键词检索信号，在驾驶视频检索中实现高达84%的top-1准确率提升。

详情

AI中文摘要

大规模视频检索是自动驾驶中数据整理和安全验证的核心，用户不仅希望找到场景，还希望找到诸如切入和急刹车等动态事件。现有的视觉语言和基于关键词的检索方法常常遗漏这些事件，因为相关的运动可能没有在文本中明确描述或通过词汇重叠捕获。基于规则的检索可以更直接地编码此类事件，但它是脆弱的：生成的或手工编写的规则在假设与真实驾驶数据不匹配时常常失败。我们提出了STRIVE-D，一种针对驾驶视频的数据校准检索框架。它使用弱标记的领域内视频来估计查询规则何时可靠，调整与观测数据不匹配的规则，并将校准后的规则分数与视觉语言和基于关键词的检索信号融合。在三个驾驶基准测试中，包括新发布的DrivingDojo上的人工标注事件数据，STRIVE-D相对于最先进方法在top-1准确率上实现了高达84%的相对改进。

英文摘要

Video retrieval at scale is central to data curation and safety validation in autonomous driving, where users want to find not only scenes but also dynamic events such as cut-ins and hard braking. Existing vision-language and keyword-based retrieval methods often miss these events because the relevant motion may not be explicitly described in text or captured by lexical overlap. Rule-based retrieval can encode such events more directly, but it is brittle: generated or hand-written rules often fail when their assumptions do not match real driving data. We propose STRIVE-D, a data-calibrated retrieval framework for driving videos. It uses weakly labeled in-domain videos to estimate when a query rule is reliable, adapt rules that mismatch observed data, and fuse calibrated rule scores with vision-language and keyword-based retrieval signals. Across three driving benchmarks, including newly released human-annotated event data on DrivingDojo, STRIVE-D delivers up to 84% relative improvement in top-1 accuracy over state-of-the-art methods.

2606.09271 2026-06-09 cs.SD cs.LG 交叉投稿

Multi-View Speech Representation Learning for Parkinson's Disease Detection Using Context-guided Cross-modal Attention

基于上下文引导跨模态注意力的多视角语音表示学习用于帕金森病检测

George Theodosiou, Loukas Ilias, Dimitris Askounis

发表机构 * National Technical University of Athens（雅典国家技术大学）

AI总结提出多分支深度学习框架，融合Log-Mel谱图、MFCC和HuBERT嵌入三种互补语音模态，通过上下文引导跨模态注意力机制动态加权，在PC-GITA语料库上实现91.51%准确率和95.97% AUC，验证了异质语音建模对帕金森病检测的有效性。

详情

AI中文摘要

帕金森病（PD）是一种进行性神经退行性疾病，常导致与运动功能减退性构音障碍相关的言语障碍。由于言语产生依赖于复杂神经肌肉机制的精确协调，语音分析已成为早期PD检测中一种有前景的非侵入性、成本效益高的生物标志物。最近的深度学习方法显示出令人鼓舞的结果；然而，大多数现有方法依赖单一语音表示，可能忽略跨不同特征空间编码的互补病理信息。在这项工作中，我们提出了一种多分支深度学习框架，用于从语音中自动检测PD。每个录音被分割成5秒的片段，并使用三种互补模态表示：Log-Mel谱图、MFCC和从原始波形中提取的HuBERT嵌入。谱图使用预训练的ResNet-18编码器处理，MFCC序列通过BiLSTM网络建模，原始语音使用预训练的HuBERT模型编码。为了有效整合这些异质表示，我们引入了一种上下文引导的跨模态注意力机制，该机制根据来自谱图和MFCC分支的全局声学上下文动态加权时间HuBERT嵌入。在公开的西班牙语PC-GITA语料库上，在严格的说话人独立5折交叉验证下进行的实验证明了所提出方法的有效性。所提出的架构实现了91.51%的准确率、91.24%的F1分数和95.97%的AUC。此外，消融研究证实了所提出的上下文引导跨模态注意力机制以及互补语音表示整合的贡献。这些发现突显了异质语音建模在稳健且临床可靠的PD检测中的潜力。

英文摘要

Parkinson's disease (PD) is a progressive neurodegenerative disorder that frequently causes speech impairments associated with hypokinetic dysarthria. As speech production relies on the precise coordination of complex neuromuscular mechanisms, speech analysis has emerged as a promising non-invasive and cost-effective biomarker for early PD detection. Recent deep learning approaches have shown encouraging results; however, most existing methods rely on a single speech representation, potentially overlooking complementary pathological information encoded across different feature spaces. In this work, we propose a multi-branch deep learning framework for automatic PD detection from speech. Each recording is segmented into 5-second chunks and represented using three complementary modalities: Log-Mel spectrograms, MFCCs, and HuBERT embeddings extracted from raw waveforms. The spectrograms are processed using a pre-trained ResNet-18 encoder, MFCC sequences are modeled through a BiLSTM network, and raw speech is encoded using a pre-trained HuBERT model. To effectively integrate these heterogeneous representations, we introduce a context-guided cross-modal attention mechanism that dynamically weights temporal HuBERT embeddings according to the global acoustic context derived from the spectrogram and MFCC branches. Experiments conducted on the publicly available Spanish PC-GITA corpus under strict speaker-independent 5-fold cross-validation demonstrate the effectiveness of the proposed approach. The proposed architecture achieves an accuracy of 91.51%, an F1-score of 91.24%, and an AUC of 95.97%. Furthermore, ablation studies confirm the contribution of both the proposed context-guided cross-modal attention mechanism and the integration of complementary speech representations. These findings highlight the potential of heterogeneous speech modeling for robust and clinically reliable PD detection.

2606.09362 2026-06-09 cs.CV cs.LG 交叉投稿

Zero-Shot Semantic Re-Identification for Autonomous Driving: A VLM Baseline Study

零样本语义重识别用于自动驾驶：一项VLM基线研究

Eduardo Borges, Manuel Abreu, Luís Garrote, Urbano J. Nunes

发表机构 * Autonomous Mobile Robot（自主移动机器人）； University of Minho（米尼奥大学）

AI总结提出使用视觉-语言模型生成语义描述进行零样本重识别，在自动驾驶场景中实现与监督CNN基线相当的检索性能，并增强可解释性。

Comments 7 pages

详情

AI中文摘要

自动驾驶中的重识别通常被表述为一个视觉匹配问题，其中车辆、行人和骑自行车者的观测通过学习的外观嵌入在时间、帧或相机视图之间进行关联，通常辅以运动、几何或多模态线索。然而，纯视觉表示可能对视角、遮挡、光照和传感器域变化敏感，限制了其在复杂驾驶场景中的可解释性和鲁棒性。我们提出了一项零样本管道的基线研究，使用视觉-语言模型生成检测到的交通参与者的文本描述，并评估这些描述是否能够支持跨观测的身份匹配。该公式不仅依赖低层次视觉相似性，而是通过结构化语义属性表示每个对象，包括类别、颜色、形状、姿态、可见部分、空间上下文和独特的视觉线索。本研究为自动驾驶场景中基于语言的重识别提供了初始基准，讨论并评估了当前VLM在此任务中的优势和局限性。结果表明，零样本语义描述可以支持有效的对象重识别，实现与监督CNN基线相当的检索性能，同时通过显式身份线索提供更大的可解释性。然而，实验也揭示了重要挑战，包括跨视角的属性不一致以及视觉相似实例之间的细粒度区分有限。

英文摘要

Re-Identification (ReID) in autonomous driving is typically formulated as a visual matching problem, where observations of vehicles, pedestrians, and cyclists are associated across time, frames, or camera views using learned appearance embeddings, often complemented by motion, geometric, or multimodal cues. However, purely visual representations may be sensitive to viewpoint, occlusion, illumination, and sensor-domain variations, limiting their interpretability and robustness in complex driving scenes. We propose a baseline study of a zero-shot pipeline using Vision-Language Models (VLMs) to generate textual descriptions of detected traffic participants and evaluate whether these descriptions can support identity matching across observations. Instead of relying only on low-level visual similarity, the proposed formulation represents each object through structured semantic attributes, including category, color, shape, pose, visible parts, spatial context, and distinctive visual cues. This study provides an initial benchmark for language-based re-identification in autonomous-driving scenarios, discussing and evaluating the strengths and limitations of current VLMs for this task. Results demonstrate that zero-shot semantic descriptions can support effective object re-identification, achieving retrieval performance comparable to a supervised CNN baseline while offering greater interpretability through explicit identity cues. However, the experiments also reveal important challenges, including attribute inconsistency across viewpoints and limited fine-grained discrimination between visually similar instances.

2606.09451 2026-06-09 cs.RO cs.CV cs.LG 交叉投稿

Dense Force Estimation with an Event-based Optical Tactile Sensor

基于事件的光学触觉传感器的稠密力估计

Agis Politis, René Zurbrügg, Valentina Cavinato

发表机构 * Sony Advanced Visual Sensing, Zurich, Switzerland（索尼高级视觉传感公司，苏黎世，瑞士）； ETH Zürich（苏黎世联邦理工学院）

AI总结提出首个利用事件相机重建稠密3D力场的方法，通过事件数据估计表面位移并映射为力，平均误差(0.14N,0.10N,0.93N)，工作频率100Hz。

详情

AI中文摘要

人类依赖空间稠密、几何和力感知的触觉反馈以高时间分辨率进行灵巧操作。虽然基于视觉的触觉传感器能够实现稠密力估计，但受限于相机帧率、运动模糊和数据带宽。基于事件的光学触觉传感器具有微秒级时间分辨率和低运动模糊的优点，但现有方法仅限于预测净力。我们提出了首个利用基于事件的光学触觉传感器进行稠密3D力场重建的框架。我们的方法从事件数据估计3D表面位移，并通过逆有限元方法（iFEM）将其映射为力。剪切位移通过所提出的事件标记跟踪算法恢复，而法向位移则由卷积神经网络预测，该网络在收集的同步力-位移-事件数据集上训练。实验表明，该方法能够准确重建物理力，在力范围高达(4N,4N,20N)时，平均绝对误差为(0.14N,0.10N,0.93N)，同时以平均100Hz的频率运行。这项工作为在机器人抓取和灵巧操作中实现高频控制的稠密力反馈迈出了第一步。

英文摘要

Humans rely on spatially dense, geometry and force-aware tactile feedback at high temporal resolution for dexterous manipulation. While vision-based tactile sensors enable dense force estimation, they are limited by camera frame rates, motion blur, and data bandwidth. Event-based optical tactile sensors offer an attractive alternative with microsecond temporal resolution and low motion blur, but existing methods are restricted to predicting only net forces. We introduce the first framework for dense 3D force field reconstruction using event-based optical tactile sensors. Our approach estimates 3D surface displacements from event data and maps them to forces via the inverse Finite Elements Method (iFEM). Shear displacements are recovered through the proposed event-based marker tracking algorithm, while normal displacements are predicted by a convolutional neural network trained on a collected dataset of synchronized force-displacement-event data. Experiments demonstrate accurate reconstruction of physically grounded forces, achieving a mean absolute error of (0.14 N, 0.10 N, 0.93 N) over force ranges up to (4 N, 4 N, 20 N), while operating at an average of 100 Hz. This work constitutes a first step toward enabling dense force feedback for high-frequency control in robotic grasping and dexterous manipulation.

2606.09541 2026-06-09 physics.app-ph cs.LG 交叉投稿

Automating the Expert Eye: A System-Agnostic Deep Learning Framework for Rare Event Discovery in Imbalanced Force Spectroscopy

自动化专家之眼：用于类别不平衡力谱数据中稀有事件发现的系统无关深度学习框架

Jorge Rodriguez-Ramos

发表机构 * Independent Researcher（独立研究者）； Marseille, France（法国马赛）

AI总结提出一种系统无关的可解释深度学习框架，利用1D到2D光栅化几何矩阵和修改的ResNet18架构，结合非对称Focal Loss，在极端类别不平衡的力谱数据中实现高召回率（0.9231），并通过双阈值分诊系统减少90%以上人工审核工作量。

Comments 13 pages, 2 figures, 2 tables

详情

AI中文摘要

单分子力谱（SMFS）为生物分子力学提供了前所未有的见解，然而高通量生成的力-延伸轨迹造成了严重的数据筛选瓶颈。在数千条噪声主导的曲线中识别罕见的分子解绑事件传统上依赖于繁琐、不可扩展的人工审核。在这里，我们提出了一个系统无关、可解释的深度学习框架，专门用于克服自动SMFS分诊中的极端类别不平衡。利用1D到2D光栅化几何矩阵，我们部署了由非对称Focal Loss目标函数控制的修改版ResNet18架构。我们在R. champanellensis纤维小体的复杂机械解折叠路径上评估了该框架。在超不平衡测试条件下，目标相互作用仅占数据集的1.34%（970条轨迹中13个真实事件），模型实现了0.9196的整体准确率和0.9231的惊人真阳性率（召回率）。通过实施经验校准的双阈值分诊系统，该流程自动丢弃了880条明确的背景噪声轨迹，将人工审核工作量减少超过90%，同时安全地保留了高价值的稀有数据。最后，梯度加权类激活映射（Grad-CAM）可视化验证了网络的决策牢固地基于力曲线的相关几何特征，特别是定位于结构解绑区域，有效缓解了“黑箱”质疑。该开源工具专为免费云端执行而构建，使生物物理学社区能够民主化地实现可扩展、高精度的分子发现。

英文摘要

Single-Molecule Force Spectroscopy (SMFS) provides unprecedented insights into biomolecular mechanics, yet the high-throughput generation of force-extension trajectories creates a severe data curation bottleneck. Identifying rare molecular unbinding events within thousands of noise-dominated curves traditionally relies on tedious, non-scalable manual auditing. Here, we present a system-agnostic, interpretable deep learning framework tailored to overcome extreme class imbalance in automated SMFS triage. Utilizing 1D-to-2D rasterized geometric matrices, we deployed a modified ResNet18 architecture governed by an asymmetric Focal Loss objective function. We evaluated this framework on the complex mechanical unfolding pathways of the R. champanellensis cellulosome. Under hyper-imbalanced test conditions where the target interaction constituted only 1.34% of the dataset (13 true events out of 970 traces), the model achieved an overall accuracy of 0.9196 and a remarkable True Positive Rate (Recall) of 0.9231. By implementing an empirically calibrated dual-threshold triage system, the pipeline automatically discarded 880 unambiguous background noise traces, reducing the manual curation workload by over 90% while safely preserving high-value rare data. Finally, Gradient-weighted Class Activation Mapping (Grad-CAM) visually validated that the network's decisions are firmly anchored in the relevant geometric features of the force curves, specifically localizing on the structural unbinding regions, effectively mitigating 'black-box' skepticism. Built for free cloud-based execution, this open-source tool democratizes scalable, highly precise molecular discovery across the biophysics community.
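The figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check: the confusion-matrix counts (12 true positives, 1 false negative, 880 true negatives) are an inference that happens to reproduce the reported accuracy and recall, not values stated in the abstract, and the `triage` thresholds and band names are hypothetical.

```python
# Figures stated in the abstract.
TOTAL_TRACES = 970
TRUE_EVENTS = 13
DISCARDED = 880

# Assumed confusion-matrix counts (NOT stated in the abstract).
TP, FN = 12, 1
TN = 880
FP = TOTAL_TRACES - TRUE_EVENTS - TN  # remaining negatives, 77 under this assumption

prevalence = TRUE_EVENTS / TOTAL_TRACES      # share of rare events
recall = TP / (TP + FN)                      # true positive rate
accuracy = (TP + TN) / TOTAL_TRACES          # overall accuracy
discarded_share = DISCARDED / TOTAL_TRACES   # traces removed from manual review


def triage(score, low=0.05, high=0.95):
    """Dual-threshold triage with hypothetical thresholds and band names."""
    if score < low:
        return "discard"   # confidently background
    if score >= high:
        return "keep"      # confidently an event
    return "review"        # ambiguous: send to a human


print(f"{prevalence:.2%} {recall:.4f} {accuracy:.4f} {discarded_share:.1%}")
```

At a prevalence this low, a classifier that rejected every trace would already score about 98.7% accuracy, which is why recall is the more informative figure here.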

2606.09558 2026-06-09 q-bio.GN cs.LG 交叉投稿

从流程图学习：用于流程图自动补全的生成式Transformer模型

Gabriel Vogel, Lukas Schulze Balhorn, Artur M. Schweidtmann

发表机构 * University of Freiburg（弗赖堡大学）

AI总结受文本自动补全启发，提出基于SFILES 2.0字符串表示和Transformer语言模型的化工流程图自动补全方法，通过预训练和微调实现交互式流程图合成辅助。

详情

DOI: 10.1016/j.compchemeng.2023.108162
Journal ref: Computers and Chemical Engineering Volume 171, March 2023, 108162

AI中文摘要

我们提出了一种新颖的方法，能够实现化工流程图的自动补全。这一想法受到文本自动补全的启发。我们使用基于文本的SFILES 2.0符号将流程图表示为字符串，并利用基于Transformer的语言模型学习SFILES 2.0语言的语法结构以及流程图中的常见模式。我们在合成生成的流程图拓扑上预训练模型，以学习流程图语言语法。然后，通过迁移学习步骤在真实流程图拓扑上微调模型。最后，我们使用训练好的模型进行因果语言建模，以自动补全流程图。最终，所提出的方法可以在交互式流程图合成过程中为化学工程师提供建议。结果表明，该方法在未来AI辅助过程合成中具有巨大潜力，但也揭示了当前阶段的局限性以及在实际流程图合成场景中部署该技术需要采取的后续步骤。

英文摘要

We propose a novel method enabling autocompletion of chemical flowsheets. This idea is inspired by the autocompletion of text. We represent flowsheets as strings using the text-based SFILES 2.0 notation and learn the grammatical structure of the SFILES 2.0 language and common patterns in flowsheets using a transformer-based language model. We pre-train our model on synthetically generated flowsheet topologies to learn the flowsheet language grammar. Then, we fine-tune our model in a transfer learning step on real flowsheet topologies. Finally, we use the trained model for causal language modeling to autocomplete flowsheets. Eventually, the proposed method can provide chemical engineers with recommendations during interactive flowsheet synthesis. The results demonstrate a high potential of this approach for future AI-assisted process synthesis but also reveal the limitations at the present state and the next steps that need to be taken to deploy this technique in realistic flowsheet synthesis scenarios.

2312.02873 2026-06-09 cs.LG cs.AI 版本更新

Toward autocorrection of chemical process flowsheets using large language models

利用大型语言模型实现化工流程图的自动纠错

Lukas Schulze Balhorn, Marc Caballero, Artur M. Schweidtmann

发表机构 * Process Intelligence Research Group, Department of Chemical Engineering, Delft University of Technology（过程智能研究组，化学工程系，代尔夫特理工大学）

AI总结提出一种基于大型语言模型的生成式AI方法，自动识别化工流程图中的错误并给出修正建议，在合成数据集上达到80%的top-1准确率。

详情

DOI: 10.1016/B978-0-443-28824-1.50519-6
Journal ref: Computer Aided Chemical Engineering, Volume 53, 2024, Pages 3109-3114

AI中文摘要

过程工程领域广泛使用工艺流程图（PFD）和管道及仪表流程图（P&ID）来表示工艺流程和设备配置。然而，P&ID和PFD（以下统称为流程图）可能包含错误，导致安全隐患、操作效率低下和不必要的开支。纠正和验证流程图是一个繁琐的手动过程。我们提出了一种新颖的生成式AI方法，用于自动识别流程图中的错误并向用户建议修正，即自动纠错流程图。受大型语言模型（LLM）在人类语言语法自动纠错方面突破的启发，我们研究了LLM用于流程图的自动纠错。模型的输入是可能出错的流程图，输出是修正后的流程图建议。我们在合成数据集上以监督方式训练自动纠错模型。该模型在独立测试的合成流程图数据集上达到了80%的top-1准确率和84%的top-5准确率。结果表明，模型能够学习自动纠错合成流程图。我们设想流程图自动纠错将成为化学工程师的有用工具。

英文摘要

The process engineering domain widely uses Process Flow Diagrams (PFDs) and Process and Instrumentation Diagrams (P&IDs) to represent process flows and equipment configurations. However, the P&IDs and PFDs, hereafter called flowsheets, can contain errors causing safety hazards, inefficient operation, and unnecessary expenses. Correcting and verifying flowsheets is a tedious, manual process. We propose a novel generative AI methodology for automatically identifying errors in flowsheets and suggesting corrections to the user, i.e., autocorrecting flowsheets. Inspired by the breakthrough of Large Language Models (LLMs) for grammatical autocorrection of human language, we investigate LLMs for the autocorrection of flowsheets. The input to the model is a potentially erroneous flowsheet and the output of the model are suggestions for a corrected flowsheet. We train our autocorrection model on a synthetic dataset in a supervised manner. The model achieves a top-1 accuracy of 80% and a top-5 accuracy of 84% on an independent test dataset of synthetically generated flowsheets. The results suggest that the model can learn to autocorrect the synthetic flowsheets. We envision that flowsheet autocorrection will become a useful tool for chemical engineers.
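The top-1 and top-5 accuracies reported above are ranking metrics: a prediction counts as correct if the reference appears among the first k candidates. A minimal sketch follows, assuming exact string matching between candidates and reference (the paper's matching criterion is not given in the abstract) and using placeholder strings rather than real flowsheet notation.

```python
def top_k_accuracy(ranked_candidates, references, k):
    """Fraction of samples whose reference is among the first k ranked candidates."""
    hits = sum(
        1
        for candidates, reference in zip(ranked_candidates, references)
        if reference in candidates[:k]
    )
    return hits / len(references)


# Placeholder model outputs: one ranked candidate list per input flowsheet.
ranked = [
    ["fs_a", "fs_b", "fs_c"],  # reference ranked first
    ["fs_x", "fs_d", "fs_y"],  # reference ranked second
    ["fs_p", "fs_q", "fs_r"],  # reference missing
]
refs = ["fs_a", "fs_d", "fs_z"]

print(top_k_accuracy(ranked, refs, k=1), top_k_accuracy(ranked, refs, k=3))
```

Top-k accuracy can only stay equal or grow as k increases, so the small gap between the reported 80% and 84% suggests that when the model is right, it usually ranks the correction first.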

2407.13303 2026-06-09 cs.LG 版本更新

Mean Teacher based SSL Framework for Indoor Localization Using Wi-Fi RSSI Fingerprinting

基于Mean Teacher的半监督学习框架用于Wi-Fi RSSI指纹室内定位

Sihao Li, Zhe Tang, Kyeong Soo Kim, Jeremy S. Smith

发表机构 * SIIT, Beijing（北京信息科技大学）； Beijing College of Science and Technology（北京科学技术学院）； XJTLU（西交利物浦大学）； University of Liverpool（利物浦大学）

AI总结针对Wi-Fi指纹室内定位中标记数据采集耗时、监督学习泛化差及动态环境性能下降问题，提出基于Mean Teacher的半监督深度学习框架，结合接入点选择、预训练和噪声注入，在静态和动态场景下显著降低定位误差。

Comments 41 pages, 13 figures

详情

DOI: 10.1016/j.asoc.2026.115711
Journal ref: Applied Soft Computing, Available online 6 June 2026, 115711

AI中文摘要

基于Wi-Fi RSSI指纹的传统大规模室内定位面临标记数据采集耗时费力、监督学习框架下训练的模型因无法利用未标记数据而泛化能力有限，以及在环境变化的动态场景中模型性能下降等问题。为解决这些挑战性问题，我们提出了一种基于Mean Teacher的深度神经网络定位模型的综合半监督学习框架，该框架融合了接入点选择、模型预训练/克隆以及批量级噪声注入。所提出的SSL框架不仅能在离线阶段高效利用混合标记/未标记数据库进行模型静态训练，还能利用现场部署的室内定位系统用户的未标记指纹，在在线阶段对模型进行持续再训练。我们选择Mean Teacher作为基础，因为它能通过模型权重的指数移动平均生成更稳定的目标标签，且不会像Pi-Model那样引入高计算复杂度，同时比时间集成具有更好的在线学习可扩展性，使其成为在大规模室内定位中平衡性能与计算复杂度的最优选择。在UJIIndoorLoc数据库上，与传统的SL框架相比，所提出的SSL框架将CNNLoc和SIMO-DNN模型的平均3D误差分别降低了7.403%和7.748%；在XJTLU动态数据库上，动态训练场景下的平均2D误差最大降低达49.227%，展示了所提出的SSL框架带来的显著性能提升。

英文摘要

Conventional large-scale indoor localization based on Wi-Fi RSSI fingerprinting faces issues of time-consuming and labor-intensive labeled data collection, limited generalization of a model trained under a supervised learning (SL) framework due to its inability to leverage unlabeled data, and model performance degradation in dynamic scenarios with environmental variations. To address those challenging issues, we propose a comprehensive semi-supervised learning (SSL) framework for a deep neural network (DNN) localization model based on the Mean Teacher, which incorporates access point selection, model pre-training/cloning, and batch-level noise injection. The proposed SSL framework can not only efficiently use hybrid labeled/unlabeled databases for static training of a model during the offline phase, but also exploit unlabeled fingerprints from users of the indoor localization system deployed in the field for continuous retraining of the model during the online phase. We base the proposed SSL framework on the Mean Teacher because it can generate more stable target labels through an exponential moving average of model weights without incurring the high computational complexity of the Pi-Model and with better scalability for online learning than Temporal Ensembling, making it an optimal choice that strikes the right balance between performance and computational complexity in large-scale indoor localization. With the UJIIndoorLoc database, the proposed SSL framework reduces the mean 3D errors of the CNNLoc and SIMO-DNN models by 7.403% and 7.748%, respectively, compared with those under the conventional SL framework; with the XJTLU dynamic database, the maximum reduction in mean 2D error reaches up to 49.227% under a dynamic training scenario, demonstrating the substantial performance improvement achieved by the proposed SSL framework.
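The core of the Mean Teacher idea mentioned above, a teacher whose weights are an exponential moving average (EMA) of the student's weights, can be sketched in a few lines. This is a generic illustration under stated assumptions: weights are plain floats in a dictionary, the decay `alpha` is a hypothetical value, and the consistency term is a simple mean squared error; it is not the authors' implementation.

```python
def ema_update(teacher, student, alpha=0.99):
    """Return teacher weights moved toward the student by an EMA step.

    teacher, student: dicts mapping parameter names to floats.
    alpha: decay factor (hypothetical value); higher means a slower teacher.
    """
    return {
        name: alpha * teacher[name] + (1.0 - alpha) * student[name]
        for name in teacher
    }


def consistency_loss(teacher_out, student_out):
    """Mean squared difference between teacher and student predictions.

    Needs no labels, so it can be evaluated on unlabeled fingerprints.
    """
    return sum((t - s) ** 2 for t, s in zip(teacher_out, student_out)) / len(teacher_out)


teacher = {"w": 1.0, "b": 0.0}
student = {"w": 0.0, "b": 1.0}
teacher = ema_update(teacher, student, alpha=0.9)
print(teacher, consistency_loss([1.0, 2.0], [1.0, 4.0]))
```

Because the teacher is never trained by gradient descent, each step costs one weighted average per parameter, which fits the abstract's point about keeping computational complexity low during online retraining.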

2411.11350 2026-06-09 cs.LG eess.SP 版本更新

Zero and Few Shot Load Forecasting with Large Language Models

基于大语言模型的零样本和少样本负荷预测

Wenlong Liao, Chengrui Zhang, Zhe Yang, Mengshuo Jia, Christian Rehtanz, Jiannong Fang, Fernando Porté-Agel

发表机构 * School of Electrical Engineering, Southeast University（东南大学电气工程学院）； Wind Engineering and Renewable Energy Laboratory, Ecole Polytechnique Federale de Lausanne (EPFL)（洛桑联邦理工学院风工程与可再生能源实验室）； College of Electrical Engineering and New Energy, China Three Gorges University（三峡大学电气工程与新能源学院）； Department of Electrical and Electronic Engineering, Imperial College London（伦敦帝国理工学院电气与电子工程系）； The Department of Automation, School of Automation and Intelligent Sensing, Shanghai Jiao Tong University（上海交通大学自动化与智能感知学院自动化系）； The Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai（系统控制与信息处理教育部重点实验室，上海）； State Key Laboratory of Submarine Geoscience, Shanghai（海底地球科学国家重点实验室，上海）； Institute of Energy Systems, Energy Efficiency and Energy Economic, TU Dortmund University（多特蒙德工业大学能源系统、能效与能源经济研究所）

AI总结提出利用预训练语言模型Chronos进行零样本和少样本负荷预测，在数据稀缺场景下显著优于多种基线模型。

Comments 24 pages, 5 figures

详情

DOI: 10.1016/j.ijepes.2026.111792
Journal ref: International Journal of Electrical Power & Energy Systems, Volume 177, April 2026

AI中文摘要

深度学习模型在负荷预测中表现出色，但通常需要大量数据进行模型训练才能应用于新场景，这限制了其在数据稀缺场景下的有效性。受预训练语言模型（LLMs）在自然语言处理中巨大成功的启发，本文提出了一种使用高级LLM框架（称为Chronos模型）的零样本和少样本负荷预测方法。通过利用其广泛的预训练知识，Chronos模型能够在数据稀缺场景下实现准确的负荷预测。在五个真实世界数据集上的仿真结果表明，Chronos模型在确定性和概率性负荷预测中，针对不同的预测时间范围（例如1至48小时），均显著优于九种流行的基线模型，尽管Chronos模型既未针对这些特定负荷数据集进行定制也未进行微调。值得注意的是，与基线模型相比，Chronos将均方根误差（RMSE）、连续排序概率得分（CRPS）和分位数得分（QS）分别降低了约7.34%-84.30%、19.63%-60.06%和22.83%-54.49%。这些结果突显了Chronos模型的优越性和灵活性，使其成为数据稀缺场景下的有效解决方案。

英文摘要

Deep learning models have shown strong performance in load forecasting, but they generally require large amounts of data for model training before being applied to new scenarios, which limits their effectiveness in data-scarce scenarios. Inspired by the great success of pre-trained language models (LLMs) in natural language processing, this paper proposes a zero and few shot load forecasting approach using an advanced LLM framework denoted as the Chronos model. By utilizing its extensive pre-trained knowledge, the Chronos model enables accurate load forecasting in data-scarce scenarios. Simulation results across five real-world datasets demonstrate that the Chronos model significantly outperforms nine popular baseline models for both deterministic and probabilistic load forecasting with various forecast horizons (e.g., 1 to 48 hours), even though the Chronos model is neither tailored nor fine-tuned to these specific load datasets. Notably, Chronos reduces root mean squared error (RMSE), continuous ranked probability score (CRPS), and quantile score (QS) by approximately 7.34%-84.30%, 19.63%-60.06%, and 22.83%-54.49%, respectively, compared to baseline models. These results highlight the superiority and flexibility of the Chronos model, positioning it as an effective solution in data-scarce scenarios.
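Two of the metrics named above, RMSE for deterministic forecasts and the quantile (pinball) score for probabilistic ones, are short enough to write out, together with the relative reduction used to report improvements. This is a generic sketch with made-up numbers, assuming the standard textbook definitions; the paper's exact evaluation protocol is not described in the abstract.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error of point forecasts."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def pinball_loss(y_true, y_quantile, q):
    """Mean quantile (pinball) loss for forecasts of the q-th quantile."""
    total = 0.0
    for y, f in zip(y_true, y_quantile):
        diff = y - f
        total += max(q * diff, (q - 1.0) * diff)
    return total / len(y_true)


def relative_reduction(baseline, model):
    """Fractional error reduction of `model` relative to `baseline`."""
    return (baseline - model) / baseline


# Made-up hourly loads (arbitrary units) and forecasts.
load = [10.0, 12.0, 11.0]
forecast = [10.0, 12.0, 13.0]
print(rmse(load, forecast), pinball_loss([10.0], [8.0], q=0.9), relative_reduction(2.0, 1.5))
```

The pinball loss is asymmetric by design: for q = 0.9, under-forecasting by 2 units costs 1.8 while over-forecasting by 2 units costs only about 0.2, which pushes the forecast toward the upper quantile.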

2412.00508 2026-06-09 cs.LG cs.AI cs.CE 版本更新

Graph-to-SFILES: Control structure prediction from process topologies using generative artificial intelligence

Graph-to-SFILES: 基于生成式人工智能从过程拓扑预测控制结构

Lukas Schulze Balhorn, Kevin Degens, Artur M. Schweidtmann

发表机构 * Process Intelligence Research Group（过程智能研究组）； Department of Chemical Engineering（化学工程系）； Delft University of Technology（代尔夫特理工大学）

AI总结提出Graph-to-SFILES模型，利用图神经网络从流程图拓扑生成控制扩展流程图序列，在小数据集上显著提升控制结构预测精度。

详情

DOI: 10.1016/j.compchemeng.2025.109121
Journal ref: Computers & Chemical Engineering, Volume 199, 2025, Pages 109121

AI中文摘要

控制结构设计是P&ID开发中重要但繁琐的步骤。生成式人工智能有望通过支持工程师来减少P&ID开发时间。先前关于化学过程设计中生成式AI的研究主要用序列表示过程。然而，图因其置换不变性而成为一种有前景的替代方案。我们提出了Graph-to-SFILES模型，一种从流程图拓扑预测控制结构的生成式AI方法。Graph-to-SFILES模型将流程图拓扑作为图输入，并返回以SFILES 2.0符号表示的控制扩展流程图序列。我们比较了四种不同的图编码器架构，其中一种是本文提出的图神经网络（GNN）。Graph-to-SFILES模型在10,000个流程图拓扑上训练时达到了73.2%的top-5准确率。此外，所提出的GNN在编码器架构中表现最佳。与纯基于序列的方法相比，Graph-to-SFILES模型在相对较小的1,000个流程图训练数据集上将top-5准确率从0.9%提高到28.4%。然而，在100,000个流程图的大规模数据集上，基于序列的方法表现更好。这些结果突显了基于图的AI模型在小数据场景下加速P&ID开发的潜力，但其在工业相关案例研究中的有效性仍需进一步研究。

英文摘要

Control structure design is an important but tedious step in P&ID development. Generative artificial intelligence (AI) promises to reduce P&ID development time by supporting engineers. Previous research on generative AI in chemical process design mainly represented processes by sequences. However, graphs offer a promising alternative because of their permutation invariance. We propose the Graph-to-SFILES model, a generative AI method to predict control structures from flowsheet topologies. The Graph-to-SFILES model takes the flowsheet topology as a graph input and returns a control-extended flowsheet as a sequence in the SFILES 2.0 notation. We compare four different graph encoder architectures, one of them being a graph neural network (GNN) proposed in this work. The Graph-to-SFILES model achieves a top-5 accuracy of 73.2% when trained on 10,000 flowsheet topologies. In addition, the proposed GNN performs best among the encoder architectures. Compared to a purely sequence-based approach, the Graph-to-SFILES model improves the top-5 accuracy for a relatively small training dataset of 1,000 flowsheets from 0.9% to 28.4%. However, the sequence-based approach performs better on a large-scale dataset of 100,000 flowsheets. These results highlight the potential of graph-based AI models to accelerate P&ID development in small-data regimes but their effectiveness on industry relevant case studies still needs to be investigated.

2412.06147 2026-06-09 cs.LG cs.ET 版本更新

Advancements in Machine Learning and Deep Learning for Early Detection and Management of Mental Health Disorder

机器学习和深度学习在心理健康障碍早期检测和管理中的进展

Kamala Devi Kannan, Senthil Kumar Jagatheesaperumal, Rajesh N. V. P. S. Kandala, Mojtaba Lotfaliany, Roohallah Alizadehsanid, Mohammadreza Mohebbi

发表机构 * Department of Computer Science and Engineering, Mepco Schlenk Engineering College（梅科斯伦克工程学院计算机科学与工程系）； Department of Electronics and Communication Engineering, Mepco Schlenk Engineering College（梅科斯伦克工程学院电子与通信工程系）； School of Electronics Engineering (SENSE), VIT-AP University（VIT-AP大学电子工程学院（SENSE））； The Institute for Mental and Physical Health and Clinical Translation (IMPACT), School of Medicine, Deakin University（迪肯大学医学院身心健康与临床转化研究所（IMPACT））； Biostatistics Unit, Faculty of Health, Deakin University（迪肯大学健康学院生物统计部门）； School of Medicine, Deakin University（迪肯大学医学院）

AI总结综述了ML/DL在心理健康障碍早期诊断中的应用，涵盖医学影像、遗传和行为数据，并讨论了数据整合、伦理挑战及未来方向。

Comments 21 pages, 2 figures, 3 tables

详情

AI中文摘要

对于心理健康疾病的早期识别、诊断和治疗，深度学习（DL）和机器学习（ML）的整合已开始发挥重要作用。通过评估来自影像、遗传学和行为评估的复杂数据，这些技术有潜力显著改善临床结果。然而，它们也带来了与数据整合和伦理问题相关的独特挑战。本综述回顾了ML和DL方法在心理健康问题早期诊断和治疗中的发展。它考察了一系列应用，特别强调了行为评估、遗传和生物标志物分析，以及用于诊断抑郁症、双相情感障碍和精神分裂症等疾病的医学影像。综述进一步讨论了疾病发展的预测建模，重点关注风险预测模型和纵向研究的作用。重要发现显示了ML和DL如何提高诊断准确性和治疗结果，同时解决方法不一致、数据整合和伦理问题。研究强调了构建用于个性化治疗的实时监测系统、改进数据融合技术和跨学科合作的重要性。未来的研究应集中于克服这些障碍，以最大化ML和DL在心理健康服务中的有益和道德实施。

英文摘要

For the early identification, diagnosis, and treatment of mental health illnesses, the integration of deep learning (DL) and machine learning (ML) have started playing a significant role. By evaluating complex data from imaging, genetics, and behavioral assessments, these technologies have the potential to improve clinical results significantly. However, they also present unique challenges relating to data integration and ethical issues. The development of ML and DL methods for the early diagnosis and treatment of mental health issues is reviewed in this survey. It examines a range of applications, with a particular emphasis on behavioral assessments, genetic and biomarker analysis, and medical imaging for the diagnosis of diseases like depression, bipolar disorder, and schizophrenia. Predictive modeling for illness development is further discussed in the review, focusing on the function of risk prediction models and longitudinal investigations. Important discoveries show how ML and DL might improve treatment outcomes and diagnostic accuracy while tackling methodological inconsistency, data integration, and ethical concerns. The study emphasizes the significance of building real-time monitoring systems for individualized treatment, improving data fusion techniques, and interdisciplinary collaboration. Upcoming studies should concentrate on surmounting these obstacles to maximize ML and DL's valuable and moral implementation in mental health services.

2504.18451 2026-06-09 cs.LG 版本更新

Enhancing Strawberry Yield Forecasting with Backcasted IoT Sensor Data and Machine Learning

利用回溯预测的IoT传感器数据和机器学习增强草莓产量预测

Tewodros Alemu Ayall, Andy Li, Matthew Beddows, Milan Markovic, Georgios Leontidis

发表机构 * The School of Natural and Computing Sciences and the Interdisciplinary Institute at the University of Aberdeen（阿伯丁大学自然科学与计算科学学院及跨学科研究所）； UiT The Arctic University of Norway（挪威北极大学）

AI总结针对IoT数据缺失问题，提出基于AI的回溯预测方法合成传感器数据，结合真实数据训练产量预测模型，在草莓生产中验证了合成数据可提升预测精度。

Comments V2: 10 pages, 4 figures, 4 Tables

详情

AI中文摘要

全球人口的快速增长凸显了数字化农业系统的必要性，该系统支持可持续粮食生产以及为农民和利益相关者提供数据驱动的资源管理。采用能够捕获实时环境（如温度、湿度）和操作（如灌溉）参数的物联网（IoT）技术，是实现基于AI的产量预测等高级应用的关键一步。然而，此类模型的有效性通常受限于数据可用性有限，特别是在动态农场环境中，IoT观测数据需要跨越多个生长季节积累。在本研究中，我们在两个生长季节内于草莓生产塑料大棚中部署了IoT传感器，收集了用水量、内外温湿度、土壤湿度、土壤温度以及光合有效辐射数据。这些观测数据与跨越四个季节的手动记录产量数据相结合。为了填补无传感器覆盖的两个季节的IoT数据缺口，我们开发了一种基于AI的回溯预测方法，利用附近气象站的历史天气数据和现有塑料大棚测量值合成缺失的传感器观测数据。然后，我们使用真实和合成数据集训练基于AI的产量预测模型。在这项回顾性评估中，结果表明，结合合成数据提高了产量预测准确性，在组合数据集上训练的模型优于仅使用真实传感器、天气和产量数据的模型。

增强大型语言模型在金属有机框架结构预测中的空间推理能力

Mianzhi Pan, JianFei Li, Peishuo Liu, Botian Wang, Yawen Ouyang, Yiming Rong, Hao Zhou, Jianbing Zhang

发表机构 * National Key Laboratory for Novel Software Technology（新型软件技术国家重点实验室）； Nanjing University（南京大学）； Institute of AI Industry Research (AIR)（人工智能产业研究院）； Tsinghua University（清华大学）； Shanghai Artificial Intelligence Laboratory（上海人工智能实验室）； University of Chinese Academy of Sciences（中国科学院大学）； ChemBIC（化学和生物医药创新研究院）

AI总结针对MOF结构预测中原子数量多、复杂度高的问题，提出MOF-LLM框架，通过空间感知持续预训练、结构监督微调和匹配驱动强化学习，增强Qwen-3 8B模型的空间推理能力，实现35.78%匹配率和0.04秒/结构的采样效率。

Comments KDD 2026

详情

AI中文摘要

金属有机框架（MOFs）是多孔晶体材料，在碳捕获和药物输送等领域有广泛应用，但准确预测其三维结构仍然是一个重大挑战。尽管大型语言模型（LLMs）在生成晶体结构方面显示出潜力，但由于MOF单胞中原子数量多导致的结构高度复杂性，LLMs在MOF上的应用受到阻碍。受深度生成模型中块级范式成功启发，我们率先将LLMs应用于该领域，引入了MOF-LLM，这是第一个专门针对块级MOF结构预测的LLM框架。为了有效利用LLMs完成这一3D模块化组装任务，我们的训练范式整合了空间感知持续预训练（CPT）、结构监督微调（SFT）和匹配驱动强化学习（RL）。通过引入显式空间先验并利用软自适应策略优化（SAPO）优化结构稳定性，我们的方法显著增强了Qwen-3 8B模型在MOF结构预测中的空间推理能力。综合实验表明，MOF-LLM实现了最先进的性能，匹配率达到35.78%，同时展现出卓越的采样效率，每个结构仅需0.04秒。

英文摘要

Metal-organic frameworks (MOFs) are porous crystalline materials with broad applications such as carbon capture and drug delivery, yet accurately predicting their 3D structures remains a significant challenge. While Large Language Models (LLMs) have shown promise in generating crystal structures, their application to MOFs is hindered by MOFs' high structural complexity arising from the large number of atoms in unit cell. Inspired by the success of block-wise paradigms in deep generative models for MOFs, we pioneer the application of LLMs in this domain by introducing MOF-LLM, the first LLM framework specifically adapted for block-level MOF structure prediction. To effectively harness LLMs for this 3D modular assembly task, our training paradigm integrates spatial-aware continual pre-training (CPT), structural supervised fine-tuning (SFT), and matching-driven reinforcement learning (RL). By incorporating explicit spatial priors and optimizing structural stability via Soft Adaptive Policy Optimization (SAPO), our approach substantially enhances the spatial reasoning in a Qwen-3 8B model for MOF structure prediction. Comprehensive experiments demonstrate that MOF-LLM achieves state-of-the-art performance with a match rate of 35.78% while exhibiting superior sampling efficiency of 0.04 seconds per structure.

2602.03395 2026-06-09 cs.LG 版本更新

The Label Horizon Paradox: Rethinking Supervision Targets in Financial Forecasting

标签期限悖论：金融预测中监督目标的再思考

Chen-Hui Song, Shuoling Liu, Liyuan Chen

发表机构 * GitHub

AI总结本文提出标签期限悖论（Label Horizon Paradox），指出最优监督信号常偏离预测目标，并基于动态信噪比权衡理论，提出双层优化框架自动寻找最优代理标签，在金融数据集上取得一致改进。

详情

AI中文摘要

虽然深度学习通过复杂的架构革新了金融预测，但监督信号本身的设计却很少受到审视。我们挑战了训练标签必须严格反映推理目标的经典假设，揭示了标签期限悖论（Label Horizon Paradox）：最优监督信号往往偏离预测目标，并随市场动态在不同的中间预测期限之间移动。我们从理论上将这一现象归结为动态信噪比权衡，证明泛化取决于边际信号实现与噪声积累之间的竞争。为了将这一见解付诸实践，我们提出了一个双层优化框架，能够在单次训练运行中自主识别最优代理标签。在大型金融数据集上的大量实验表明，该方法相比传统基线取得了一致的改进，从而为金融预测中以标签为中心的研究开辟了新途径。

英文摘要

While deep learning has revolutionized financial forecasting through sophisticated architectures, the design of the supervision signal itself is rarely scrutinized. We challenge the canonical assumption that training labels must strictly mirror inference targets, uncovering the Label Horizon Paradox: the optimal supervision signal often deviates from the prediction goal, shifting across intermediate horizons governed by market dynamics. We theoretically ground this phenomenon in a dynamic signal-noise trade-off, demonstrating that generalization hinges on the competition between marginal signal realization and noise accumulation. To operationalize this insight, we propose a bi-level optimization framework that autonomously identifies the optimal proxy label within a single training run. Extensive experiments on large-scale financial datasets demonstrate consistent improvements over conventional baselines, thereby opening new avenues for label-centric research in financial forecasting.

2602.08733 2026-06-09 cs.LG 版本更新

Foundation Inference Models for Ordinary Differential Equations

常微分方程的基础推理模型

Maximilian Mauel, Johannes R. Hübers, David Berghaus, Patrick Seifner, Ramses J. Sanchez

发表机构 * University of Cambridge（剑桥大学）； ETH Zurich（苏黎世联邦理工学院）

AI总结提出FIM-ODE，一种预训练的基础推理模型，通过单次前向传播从含噪轨迹直接预测向量场，实现零样本性能匹配并超越ODEFormer，微调后优于现代神经和GP基线。

Comments Published in ICML 2026

Journal ref: Proceedings of the 43rd International Conference on Machine Learning (ICML 2026)

AI中文摘要

常微分方程（ODE）是科学建模的核心，但从含噪轨迹中推断其向量场仍然具有挑战性。当前的方法，如符号回归、高斯过程（GP）回归和神经常微分方程，通常需要复杂的训练流程和大量的机器学习专业知识，或者严重依赖于系统特定的先验知识。我们提出FIM-ODE，一种预训练的基础推理模型，通过单次前向传播直接从含噪轨迹数据预测向量场，从而摊销低维ODE推理。我们在具有低次多项式向量场的ODE先验分布上预训练FIM-ODE，并用神经算子表示目标场。FIM-ODE实现了强大的零样本性能，在多种设置下匹配并常常优于最近的预训练符号基线ODEFormer，尽管使用了更简单的预训练先验分布。预训练还为微调提供了强大的初始化，实现了快速且稳定的适应，在不需要机器学习专业知识的情况下优于现代神经和GP基线。

英文摘要

Ordinary differential equations (ODEs) are central to scientific modelling, but inferring their vector fields from noisy trajectories remains challenging. Current approaches such as symbolic regression, Gaussian process (GP) regression, and Neural ODEs often require complex training pipelines and substantial machine learning expertise, or they depend strongly on system-specific prior knowledge. We propose FIM-ODE, a pretrained Foundation Inference Model that amortises low-dimensional ODE inference by predicting the vector field directly from noisy trajectory data in a single forward pass. We pretrain FIM-ODE on a prior distribution over ODEs with low-degree polynomial vector fields and represent the target field with neural operators. FIM-ODE achieves strong zero-shot performance, matching and often improving upon ODEFormer, a recent pretrained symbolic baseline, across a range of regimes despite using a simpler pretraining prior distribution. Pretraining also provides a strong initialisation for finetuning, enabling fast and stable adaptation that outperforms modern neural and GP baselines without requiring machine learning expertise.

2602.15253 2026-06-09 cs.LG q-bio.GN 版本更新

Scaling Laws for Masked-Reconstruction Transformers on Single-Cell Transcriptomics

单细胞转录组学中掩码重建Transformer的缩放定律

Ihor Kendiukhov

发表机构 * Department of Computer Science, University of Tübingen（图宾根大学计算机科学系）

AI总结本研究首次系统探索单细胞RNA测序数据上掩码重建Transformer的缩放行为，发现数据充足时存在幂律缩放定律，数据稀缺时缩放可忽略，并指出数据-参数比是关键决定因素。

AI中文摘要

神经缩放定律——损失、模型大小和数据之间的幂律关系——已在语言和视觉Transformer中得到广泛记录，但它们在单细胞基因组学中的存在性仍未得到充分探索。我们首次系统研究了在单细胞RNA测序（scRNA-seq）数据上训练的掩码重建Transformer的缩放行为。使用CELLxGENE Census的表达谱，我们构建了两种实验设置：数据丰富设置（512个高度可变基因，200,000个细胞）和数据有限设置（1,024个基因，10,000个细胞）。在参数数量跨越三个数量级（533到3.4×10^8个参数）的七种模型大小上，我们将参数化缩放定律拟合到验证均方误差（MSE）。数据丰富设置表现出清晰的幂律缩放，不可约损失下限c约为1.44，而数据有限设置显示出可忽略的缩放，表明当数据稀缺时模型容量不是约束条件。这些结果确立了类似于自然语言处理中观察到的缩放定律在单细胞转录组学中确实存在（当数据充足时），并确定了数据-参数比是缩放行为的关键决定因素。将数据丰富渐近下限初步转换为信息论单位，估计每个掩码基因位置约2.30比特熵。我们讨论了对单细胞基础模型设计的启示，并概述了完善该熵估计所需的额外测量。

英文摘要

Neural scaling laws -- power-law relationships between loss, model size, and data -- have been extensively documented for language and vision transformers, yet their existence in single-cell genomics remains largely unexplored. We present the first systematic study of scaling behaviour for masked-reconstruction transformers trained on single-cell RNA sequencing (scRNA-seq) data. Using expression profiles from the CELLxGENE Census, we construct two experimental regimes: a data-rich regime (512 highly variable genes, 200,000 cells) and a data-limited regime (1,024 genes, 10,000 cells). Across seven model sizes spanning three orders of magnitude in parameter count (533 to 3.4 x 10^8 parameters), we fit the parametric scaling law to validation mean squared error (MSE). The data-rich regime exhibits clear power-law scaling with an irreducible loss floor of c ~ 1.44, while the data-limited regime shows negligible scaling, indicating that model capacity is not the binding constraint when data are scarce. These results establish that scaling laws analogous to those observed in natural language processing do emerge in single-cell transcriptomics when sufficient data are available, and they identify the data-to-parameter ratio as a critical determinant of scaling behaviour. A preliminary conversion of the data-rich asymptotic floor to information-theoretic units yields an estimate of approximately 2.30 bits of entropy per masked gene position. We discuss implications for the design of single-cell foundation models and outline the additional measurements needed to refine this entropy estimate.
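The fitting step in the abstract can be illustrated with a small sketch. It assumes the common three-parameter form L(N) = a·N^(-b) + c, which the abstract does not spell out, and uses synthetic losses rather than the paper's data. The entropy helper shows one possible way to turn an MSE floor into bits (Gaussian differential entropy); whether the paper uses this conversion is an assumption.

```python
import math

def fit_power_law(sizes, losses, c_grid):
    """Fit loss = a * N**(-b) + c by scanning c and regressing log(loss - c) on log(N)."""
    xs = [math.log(n) for n in sizes]
    best = None
    for c in c_grid:
        if any(l <= c for l in losses):
            continue  # log(loss - c) would be undefined for this floor
        ys = [math.log(l - c) for l in losses]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
                 / sum((x - mx) ** 2 for x in xs))
        a, b = math.exp(my - slope * mx), -slope
        sse = sum((a * n ** (-b) + c - l) ** 2 for n, l in zip(sizes, losses))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    return best[1:]

def gaussian_entropy_bits(mse):
    """Differential entropy (bits) of a Gaussian whose variance equals the MSE floor."""
    return 0.5 * math.log2(2 * math.pi * math.e * mse)

# Synthetic example (hypothetical values, not the paper's measurements).
sizes = [1e3, 1e4, 1e5, 1e6, 1e7]
losses = [5.0 * n ** (-0.3) + 1.44 for n in sizes]
a, b, c = fit_power_law(sizes, losses, [i / 100 for i in range(100, 160)])
floor_bits = gaussian_entropy_bits(c)
```

On this synthetic data the grid recovers the floor c = 1.44, and the Gaussian conversion gives roughly 2.3 bits, the same order as the figure quoted in the abstract.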

2603.12666 2026-06-09 cs.LG cs.AI 版本更新

RetroReasoner: A Reasoning LLM for Strategic Retrosynthesis Prediction

RetroReasoner：一种用于策略性逆合成预测的推理LLM

Hanbum Ko, Chanhui Lee, Ye Rin Kim, Rodrigo Hormazabal, Sehui Han, Sungbin Lim, Sungwoong Kim

发表机构 * Department of Artificial Intelligence, Korea University（韩国大学人工智能系）； Department of Statistics, Korea University（韩国大学统计系）； Materials Intelligence Lab, LG AI Research（LG人工智能研究实验室）

AI总结 RetroReasoner通过监督微调和强化学习，捕捉化学家基于断键策略的推理过程，提升逆合成预测的准确性和多样性。

Comments 35 pages, 19 figures

AI中文摘要

通过预训练分子嵌入距离推进基于配体的虚拟筛选和分子生成

Shiyun Wa, Yifei Wang, Simone Sciabola, Ye Wang

发表机构 * University of California, Berkeley（加州大学伯克利分校）； Stanford University（斯坦福大学）

AI总结本文提出预训练嵌入距离作为高效替代方案，用于虚拟筛选和分子生成，展示其在结构信息捕捉和相似性测量方面的有效性。

Comments Accepted by ICML 2026 AI4Science (https://openreview.net/forum?id=HbfrCipfNl). Code and data are available

AI中文摘要

分子相似性在基于配体的药物发现中起核心作用，如虚拟筛选、类似物搜索和目标导向的分子生成。然而，传统相似性度量，从基于指纹的Tanimoto系数到3D形状叠合，往往在大规模计算时代价高昂，或依赖手工设计的分子描述符。同时，许多面向相似性感知设计的深度学习方法仍依赖针对相似性的监督或昂贵的数据整理，限制了其在不同靶点上的通用性。在本工作中，我们提出预训练嵌入距离（PED）作为有效的替代方法，它直接由预训练的分子模型计算得出，无需任务特定训练。实验结果表明，PED与传统相似性度量呈现出不同的相关性，并在虚拟筛选的分子排序和通过奖励设计引导分子生成两方面均表现良好。这些发现表明，预训练分子嵌入捕捉了丰富的结构信息，可以作为现代人工智能辅助药物发现中有前景且可扩展的相似性度量方法。

英文摘要

Molecular similarity plays a central role in ligand-based drug discovery, such as virtual screening, analog searching, and goal-directed molecular generation. However, traditional similarity measures, ranging from fingerprint-based Tanimoto coefficients to 3D shape overlays, are often computationally expensive at scale or rely on hand-crafted molecular descriptors. Meanwhile, many deep learning approaches to similarity-aware design still depend on similarity-specific supervision or costly data curation, limiting their generality across targets. In this work, we propose pretrained embedding distance (PED) as an effective alternative, computed directly from pretrained molecular models without task-specific training. Experimental results show that PED exhibits distinct correlations with traditional similarity metrics, and performs effectively in both ranking molecules for virtual screening and guiding molecular generation via reward design. These findings suggest that pretrained molecular embeddings capture rich structural information and can serve as a promising and scalable similarity measurement for modern AI-aided drug discovery.
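To make the contrast between a fingerprint-based measure and an embedding distance concrete, here is a minimal sketch. The abstract does not say which distance PED uses, so cosine distance is an assumption, and the molecule names and vectors are hypothetical toy values rather than outputs of a real pretrained model.

```python
import math

def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 1.0

def cosine_distance(u, v):
    """1 - cosine similarity; assumed here as a stand-in for an embedding distance."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm

def rank_by_embedding_distance(query_vec, library):
    """Order library molecules from closest to farthest from the query embedding."""
    return sorted(library, key=lambda name: cosine_distance(query_vec, library[name]))

# Hypothetical embeddings; a real pipeline would obtain them from a pretrained model.
library = {
    "mol_a": [0.9, 0.1, 0.0],
    "mol_b": [0.0, 1.0, 0.0],
    "mol_c": [0.5, 0.5, 0.0],
}
ranking = rank_by_embedding_distance([1.0, 0.0, 0.0], library)
overlap = tanimoto({1, 2, 3}, {2, 3, 4})
```

The ranking needs no task-specific training: only a forward pass to get embeddings and a distance computation, which is the property the abstract emphasizes.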

2605.00647 2026-06-09 cs.LG 版本更新

Label-Conditioned Cross-Modal Fusion for Adult-to-Pediatric ECG Transfer via Curriculum-Gated Contrastive Alignment

通过课程门控对比对齐实现成人到儿童ECG迁移的标签条件跨模态融合

Xinran Liu, Yuwen Li, Hongxiang Gao, Heyang Xu, Jianqing Li, Zongmin Wang, Chengyu Liu

发表机构 * School of Instrument Science and Engineering, Southeast University（东南大学仪器科学与工程学院）； Nanjing Medical University（南京医科大学）； Zhengzhou University（郑州大学）

AI总结本文提出PEACE框架，通过预训练和适应性融合提升儿童ECG诊断，采用对比学习和课程适应策略，在有限标注下实现高准确率。

AI中文摘要

儿童心电图（ECG）的自动判读仍具挑战性，因为心率、间期和波形的发育差异限制了主要在成人数据上训练的模型的可迁移性，同时专家标注的儿童ECG队列稀缺。我们提出PEACE（Pediatric-Adult ECG Alignment via Cross-modal Enhancement，通过跨模态增强的儿童-成人ECG对齐），一个在MIMIC-IV ECG上预训练并适配到儿童目标的成人到儿童ECG迁移框架。PEACE整合标签特定的双向对比学习（LSBC）以对齐ECG表示与诊断语义，并采用课程自适应融合（CAF）以在有限的儿童监督下稳定优化。标签条件的短文本描述在训练期间提供辅助语义监督，而推理仅需ECG信号。在ZZU-pECG上，PEACE在零样本、50样本和全微调设置下的宏平均AUC分别为59.39%、81.74%和91.56%，优于仅ECG、多模态和通用领域自适应基线（包括DANN和MMD）。在PTB-XL上，经过全微调后，其在九个映射发生率非零的统一标签上的宏平均AUC达到96.90%。基于梯度的注意力图显示，对于与心腔相关的RVH，QRS电压和形态区域的显著性增加；对于LQTS，QRS至T波/复极间期区域的显著性增加，这与常规判读中通常检查的ECG区域大体一致。这些结果表明，成人规模的ECG预训练结合节律、形态和ST-T复极的语义描述，可在标签稀缺的情况下提升可迁移的儿童诊断，同时保持临床可解释的波形关注点。

英文摘要

Automated pediatric electrocardiogram (ECG) interpretation remains challenging because developmental differences in heart rate, intervals, and waveforms limit the transferability of models trained mainly on adult data, while expert-labeled pediatric ECG cohorts are scarce. We propose PEACE (Pediatric-Adult ECG Alignment via Cross-modal Enhancement), an adult-to-pediatric ECG transfer framework pretrained on MIMIC-IV ECGs and adapted to pediatric targets. PEACE integrates label-specific bidirectional contrastive learning (LSBC) to align ECG representations with diagnostic semantics and curriculum adaptive fusion (CAF) to stabilize optimization under limited pediatric supervision. Label-conditioned short text descriptors provide auxiliary semantic supervision during training, whereas inference requires ECG signals only. On ZZU-pECG, PEACE achieves macro-average AUCs of 59.39%, 81.74%, and 91.56% under zero-shot, 50-shot, and full fine-tuning settings, respectively, outperforming ECG-only, multimodal, and generic domain adaptation baselines including DANN and MMD. On PTB-XL, it reaches 96.90% macro-average AUC after full fine-tuning over nine harmonized labels with nonzero mapped incidence. Gradient-based attention maps show increased saliency around QRS voltage and morphology regions for chamber-related RVH and around QRS-to-T/repolarization intervals for LQTS, broadly consistent with ECG regions commonly inspected during routine interpretation. These results suggest that adult-scale ECG pretraining coupled with rhythm, morphology, and ST-T repolarization semantic descriptors improves transferable pediatric diagnosis under label scarcity while preserving clinically interpretable waveform focus.
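The headline numbers above are macro-average AUCs. As a reminder of what that metric computes, the sketch below averages per-label AUCs and skips labels that have no positive or no negative examples, which is one plausible reading of the "nonzero mapped incidence" restriction (an assumption). The scores and labels are toy values.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def macro_auc(scores_by_label, labels_by_label):
    """Unweighted mean of per-label AUCs over labels that have both classes present."""
    per_label = []
    for name, labels in labels_by_label.items():
        if 0 < sum(labels) < len(labels):  # skip labels with zero incidence
            per_label.append(auc(scores_by_label[name], labels))
    return sum(per_label) / len(per_label)

# Toy multi-label example (hypothetical diagnoses and scores).
labels_by_label = {"RVH": [1, 1, 0, 0], "LQTS": [1, 0, 0, 1], "absent": [0, 0, 0, 0]}
scores_by_label = {
    "RVH": [0.9, 0.8, 0.2, 0.1],
    "LQTS": [0.6, 0.4, 0.7, 0.3],
    "absent": [0.1, 0.2, 0.3, 0.4],
}
result = macro_auc(scores_by_label, labels_by_label)
```

Because every label contributes equally, a rare diagnosis with poor ranking pulls the macro average down as much as a common one.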

2605.01616 2026-06-09 cs.LG cs.AI cs.CY cs.NI 版本更新

Learning Behavioral Signals from Encrypted Smartphone Network Traffic

从加密智能手机网络流量中学习行为信号

Rameen Mahmood, Omar El Shahawy, Souptik Barua, Zachary Beattie, Jeffrey Kaye, Xuhai "Orson" Xu, Chao-Yi Wu, Danny Yuxing Huang

发表机构 * New York University（纽约大学）； NYU Langone Health（NYU Langone健康）； NYU Grossman School of Medicine（NYU Grossman医学院）； Oregon Health & Science University（俄勒冈健康与科学大学）； Columbia University（哥伦比亚大学）； Harvard Medical School（哈佛医学院）

AI总结本文利用基于Transformer的模型从加密网络流量中学习行为表征，结合用户特定适配器，并通过稀疏表示和广义估计方程分析，发现压力、孤独感和睡眠障碍分别与个体间差异、个体内波动及两者组合相关，且学习到的表征优于传统手工特征。

Comments 19 pages, 6 figures

AI中文摘要

人类行为难以在大规模下连续测量，然而日常活动和幸福感的痕迹可能反映在与个人设备的交互中。我们研究加密的智能手机网络流量是否可以作为被动感知信号，用于检测与睡眠障碍、压力和孤独感相关的行为状态。为了捕捉群体层面的模式和个体特定的行为，我们采用基于Transformer的模型，该模型带有用户特定的适配器，学习网络活动的表征，同时考虑个人基线及其偏差。为了提高可解释性，我们进一步使用稀疏表示学习分析这些表征，以识别与不同活动模式相关的潜在行为特征。我们使用带有Mundlak分解的广义估计方程将所得特征与睡眠障碍、压力和孤独感联系起来，从而能够区分稳定的个体间差异和随时间变化的个体内变化。我们的分析揭示了这三种结果具有不同的时间动态：压力主要与持续的个体间变异相关，孤独感与个体内波动更密切相关，而睡眠障碍则反映了两者的结合。重要的是，这些个体内行为信号无法通过传统的手工网络流量特征恢复，这突显了学习表征在纵向行为建模中的优势。总体而言，我们的发现表明加密网络流量包含可解释的行为信息，并能够支持被动、可扩展的行为动态监测，特别是相对于个体典型活动模式的变化。

英文摘要

Human behavior is challenging to measure continuously at scale, yet traces of daily routines and well-being may be reflected in interactions with personal devices. We investigate whether encrypted smartphone network traffic can serve as a passive sensing signal for behavioral states related to sleep disturbance, stress, and loneliness. To capture both population-level patterns and individual-specific behavior, we employ a transformer-based model with user-specific adapters that learns representations of network activity while accounting for personal baselines and deviations from them. To improve interpretability, we further analyze these representations using sparse representation learning to identify latent behavioral features associated with distinct activity patterns. We relate the resulting features to sleep disturbance, stress, and loneliness using generalized estimating equations with Mundlak decomposition, enabling separation of stable between-person differences from within-person changes over time. Our analysis reveals that the three outcomes are characterized by different temporal dynamics: stress is predominantly associated with persistent between-person variation, loneliness is more strongly linked to within-person fluctuations, and sleep disturbance reflects a combination of both. Importantly, these within-person behavioral signals are not recovered by conventional handcrafted network-traffic features, highlighting the advantages of learned representations for longitudinal behavioral modeling. Overall, our findings demonstrate that encrypted network traffic contains interpretable behavioral information and can support passive, scalable monitoring of behavioral dynamics, particularly changes relative to an individual's typical pattern of activity.
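The Mundlak decomposition mentioned in the abstract splits each feature into a person-level mean and a deviation from that mean, so a regression can estimate between-person and within-person effects separately. A minimal sketch of that split follows; the participant identifiers and values are hypothetical, and the GEE fitting step itself is not shown.

```python
from collections import defaultdict

def mundlak_decompose(person_ids, values):
    """Return (between, within): each person's mean and each observation's deviation from it."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pid, v in zip(person_ids, values):
        sums[pid] += v
        counts[pid] += 1
    means = {pid: sums[pid] / counts[pid] for pid in sums}
    between = [means[pid] for pid in person_ids]
    within = [v - means[pid] for pid, v in zip(person_ids, values)]
    return between, within

# Two hypothetical participants with two daily feature values each.
ids = ["p1", "p1", "p2", "p2"]
feature = [1.0, 3.0, 10.0, 14.0]
between, within = mundlak_decompose(ids, feature)
```

Entering both columns as predictors lets the coefficient on `between` capture stable differences across people, while the coefficient on `within` captures changes relative to a person's own baseline.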

2605.23247 2026-06-09 cs.LG 版本更新

可解释人工智能的信息论分析

Ram S Iyer

发表机构 * Rajiv Gandhi Institute of Petroleum Technology（拉吉夫·甘地石油技术学院）

AI总结提出基于互信息的激活映射方法MI CAM，通过特征图与输入图像的互信息加权生成显著性可视化，实现模型推理的因果解释，性能优于现有方法。

AI中文摘要

随着机器视觉在医疗和自动化电厂等关键日常需求中的介入，卷积神经网络的内部机制以及网络提供特定推理的原因引起了关注。本文提出了一种新颖的基于激活映射的事后视觉解释方法，称为MI CAM。与之前基于类激活映射的方法不同，MI CAM通过每个特征图与输入图像的互信息对其进行加权，生成显著性可视化，最终结果由权重和激活图的线性组合产生。它还通过反事实分析验证了因果解释的生成。我们旨在展示MI CAM在模型推理过程中实现的视觉表现和无偏解释。我们的方法与所有最先进的方法相当，但在定性和定量度量上尤其优于其中一些方法。

英文摘要

With the intervention of machine vision in our crucial day to day necessities including healthcare and automated power plants, attention has been drawn to the internal mechanisms of convolutional neural networks, and the reason why the network provides specific inferences. This paper proposes a novel post-hoc visual explanation method called MI CAM based on activation mapping. Differing from previous class activation mapping based approaches, MI CAM produces saliency visualizations by weighing each feature map through its mutual information with the input image and the final result is generated by a linear combination of weights and activation maps. It also adheres to producing causal interpretations as validated with the help of counterfactual analysis. We aim to exhibit the visual performance and unbiased justifications for the model inferencing procedure achieved by MI CAM. Our approach works at par with all state-of-the-art methods but particularly outperforms some in terms of qualitative and quantitative measures.
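The weighting scheme described in the abstract (each feature map weighted by its mutual information with the input, then linearly combined) can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: images and feature maps are flattened lists of equal length (feature maps assumed already upsampled), and mutual information is estimated with a simple histogram whose bin count is arbitrary.

```python
import math
from collections import Counter

def quantize(values, bins):
    """Map values to integer bin indices in [0, bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    return [min(int((v - lo) / (hi - lo) * bins), bins - 1) for v in values]

def mutual_information(xs, ys, bins=4):
    """Histogram estimate of mutual information in bits."""
    qx, qy = quantize(xs, bins), quantize(ys, bins)
    n = len(qx)
    joint, px, py = Counter(zip(qx, qy)), Counter(qx), Counter(qy)
    return sum((c / n) * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())

def mi_weighted_saliency(image, feature_maps, bins=4):
    """Weight each feature map by its MI with the image and sum the weighted maps."""
    weights = [mutual_information(image, fm, bins) for fm in feature_maps]
    saliency = [sum(w * fm[i] for w, fm in zip(weights, feature_maps))
                for i in range(len(image))]
    return weights, saliency

# Toy 4-pixel "image": one informative map and one constant map.
image = [0.0, 0.0, 1.0, 1.0]
maps = [[0.0, 0.0, 1.0, 1.0], [5.0, 5.0, 5.0, 5.0]]
weights, saliency = mi_weighted_saliency(image, maps)
```

The constant map carries no information about the input, so its weight is zero and it drops out of the final saliency map.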

2508.00917 2026-06-09 cs.RO cs.CV cs.LG 版本更新

A Survey on Deep Multi-Task Learning in Connected Autonomous Vehicles

联网自动驾驶车辆中深度多任务学习综述

Jiayuan Wang, Farhad Pourpanah, Q. M. Jonathan Wu, Ning Zhang

发表机构 * Department of Electrical and Computer Engineering, University of Windsor（温莎大学电气与计算机工程系）； Department of Electrical and Computer Engineering, Queen’s University（皇后大学电气与计算机工程系）

AI总结综述联网自动驾驶车辆中深度多任务学习，涵盖感知、预测、规划、控制及V2X通信与资源管理，分析现有方法优缺点并指出未来方向。

DOI: 10.1109/COMST.2026.3699223

AI中文摘要

联网自动驾驶车辆（CAVs）必须同时执行多个任务，如感知、预测、规划和控制，以确保在复杂环境中安全可靠地导航。此外，通过车联万物（V2X）通信，可以实现CAVs之间的协同感知和驾驶，从而减轻单个车辆的局限性，同时也引入了严格的延迟、可靠性和带宽约束。传统上，任务使用单独的模型处理，这导致部署成本高、计算开销增加以及实现实时性能的挑战。多任务学习（MTL）最近成为一种有前景的解决方案，能够在统一模型中联合学习多个任务，从而提供更高的效率和资源利用率。据我们所知，本综述是首次专注于CAVs中深度MTL的全面回顾。我们首先概述CAVs和MTL以提供基础背景。然后，我们回顾了CAVs关键功能领域的MTL方法，包括感知、预测、规划、控制以及V2X通信和无线电资源管理（RRM）。对于前四个领域，我们将现有工作分为仅单车（车载）和V2X增强协同（多智能体）范式。我们进一步将V2X通信和RRM作为以通信为中心的MTL问题进行讨论。最后，我们讨论了现有方法的优势和局限性，识别了关键研究空白，并提供了旨在推进CAV系统MTL方法的未来研究方向。

英文摘要

Connected autonomous vehicles (CAVs) must simultaneously perform multiple tasks, such as perception, prediction, planning, and control, to ensure safe and reliable navigation in complex environments. Moreover, through vehicle-to-everything (V2X) communication, cooperative perception and driving among CAVs can be enabled, thereby mitigating the limitations of individual vehicles, while it also introduces stringent latency, reliability, and bandwidth constraints. Traditionally, tasks are addressed using separate models, which leads to high deployment costs, increased computational overhead, and challenges in achieving real-time performance. Multi-task learning (MTL) has recently emerged as a promising solution that enables the joint learning of multiple tasks within a unified model. This offers improved efficiency and resource utilization. To the best of our knowledge, this survey is the first comprehensive review focusing on deep MTL in CAVs. We begin with an overview of CAVs and MTL to provide foundational background. Then, we review MTL approaches across key functional domains in CAVs, including perception, prediction, planning, control, as well as V2X communications and radio resource management (RRM). For the first four domains, we categorize existing works under ego vehicle-only (onboard-only) and V2X-enhanced cooperative (multi-agent) paradigms. We further discuss V2X communications and RRM as communication-centric MTL problems. Finally, we discuss the strengths and limitations of existing methods, identify key research gaps, and provide future research directions aimed at advancing MTL methodologies for CAV systems.

2510.03389 2026-06-09 quant-ph cs.LG 版本更新

Quantum feature-map learning with reduced resource overhead

量子特征映射学习：降低资源开销

Jonas Jäger, Philipp Elsässer, Elham Torabian

发表机构 * Department of Computer Science and Institute of Applied Mathematics, University of British Columbia (UBC), Vancouver, B.C. V6T 1Z4, Canada（计算机科学系和应用数学研究所，不列颠哥伦比亚大学（UBC），温哥华，B.C. V6T 1Z4，加拿大）； Stewart Blusson Quantum Matter Institute (QMI), Vancouver, B.C. V6T 1Z4, Canada（斯图尔特·布卢森量子物质研究所（QMI），温哥华，B.C. V6T 1Z4，加拿大）； Institute of Physics, University of Freiburg, Freiburg (Breisgau), 79104, Germany（物理研究所，弗赖堡大学，弗赖堡（布赖斯高），79104，德国）； Department of Chemistry, University of British Columbia (UBC), Vancouver, B.C. V6T 1Z1, Canada（化学系，不列颠哥伦比亚大学（UBC），温哥华，B.C. V6T 1Z1，加拿大）

AI总结提出Q-FLAIR算法，通过部分解析重构将工作负载转移到经典计算机，显著降低量子资源开销，在真实IBM设备上仅用4小时即在完整MNIST数据集上达到90%以上准确率。

Comments 24 pages, 12 figures, 2 tables

DOI: 10.1103/v29j-rh32
Journal ref: Phys. Rev. Research 8(2), 023247 (2026)

AI中文摘要

当前的量子计算机需要算法经济地使用有限资源。在量子机器学习中，成功取决于量子特征映射，它将经典数据嵌入到量子比特的状态空间中。我们引入了通过解析迭代重构的量子特征映射学习（Q-FLAIR），这是一种在迭代特征映射电路构建中减少量子资源开销的算法。它通过部分解析重构量子模型，仅使用少量评估就将工作负载转移到经典计算机上。对于每次探测到的门添加到拟设中，数据特征和权重参数的同时选择和优化则完全在经典计算机上进行。集成到量子神经网络和量子核支持向量分类器中，Q-FLAIR展示了最先进的基准性能。由于资源开销与特征维度解耦，我们在真实的IBM设备上仅用四小时就训练了一个量子模型，在完整分辨率MNIST数据集（784个特征，数字3 vs 5）上达到了超过90%的准确率。这样的结果以前是无法实现的，因为特征维度会极大地增加固定拟设的硬件需求以及自适应拟设的搜索成本。此外，Q-FLAIR展示了针对直接经典建模的去量子化鲁棒性，满足了文献中罕见的基准，这是潜在量子优势的必要条件。通过超越黑盒优化重新思考特征映射学习，这项工作为在现实问题和近期量子计算机上实现量子机器学习迈出了具体的一步。

英文摘要

Current quantum computers require algorithms that use limited resources economically. In quantum machine learning, success hinges on quantum feature-maps, which embed classical data into the state space of qubits. We introduce Quantum Feature-Map Learning via Analytic Iterative Reconstructions (Q-FLAIR), an algorithm that reduces quantum resource overhead in iterative feature-map circuit construction. It shifts workloads to a classical computer via partial analytic reconstructions of the quantum model, using only a few evaluations. For each probed gate addition to the ansatz, the simultaneous selection and optimization of the data feature and weight parameter is then entirely classical. Integrated into quantum neural network and quantum kernel support vector classifiers, Q-FLAIR shows state-of-the-art benchmark performance. Since resource overhead decouples from feature dimension, we train a quantum model on a real IBM device in only four hours, surpassing 90% accuracy on the full-resolution MNIST dataset (784 features, digits 3 vs 5). Such results were previously unattainable, as the feature dimension prohibitively drives hardware demands for fixed and search costs for adaptive ansätze. Furthermore, Q-FLAIR demonstrates de-quantization robustness against direct classical modeling, satisfying a benchmark rare in the literature and a necessary condition for potential quantum advantage. By rethinking feature-map learning beyond black-box optimization, this work takes a concrete step toward enabling quantum machine learning for real-world problems and near-term quantum computers.
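The idea of reconstructing a model analytically from a few evaluations can be illustrated with a standard fact: an expectation value that depends on a single Pauli-rotation angle has the form f(θ) = a + b·cos θ + c·sin θ, so three evaluations determine it and the optimum follows classically. The sketch below shows only this generic building block; Q-FLAIR's actual reconstruction, which also selects data features, is not described in the abstract, and `fake_device` is a hypothetical stand-in for hardware.

```python
import math

def reconstruct_sinusoid(evaluate):
    """Recover a, b, c in f(theta) = a + b*cos(theta) + c*sin(theta) from three evaluations."""
    f_zero = evaluate(0.0)
    f_plus = evaluate(math.pi / 2)
    f_minus = evaluate(-math.pi / 2)
    a = 0.5 * (f_plus + f_minus)
    c = 0.5 * (f_plus - f_minus)
    b = f_zero - a
    return a, b, c

def classical_argmin(a, b, c):
    """Minimise the reconstructed curve with no further circuit evaluations."""
    theta = math.atan2(c, b) + math.pi
    return theta, a - math.hypot(b, c)

def fake_device(theta):
    """Hypothetical cost landscape standing in for a quantum circuit evaluation."""
    return 0.3 + 0.5 * math.cos(theta) - 0.2 * math.sin(theta)

a, b, c = reconstruct_sinusoid(fake_device)
theta_star, f_min = classical_argmin(a, b, c)
```

After the three evaluations, every candidate angle can be scored on a classical computer, which is the kind of workload shift the abstract describes.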

2511.07280 2026-06-09 econ.GN cs.IR cs.LG q-fin.EC 版本更新

The Value of Personalized Recommendations: Evidence from Netflix

个性化推荐的价值：来自Netflix的证据

Kevin Zielnicki, Guy Aridor, Aurélien Bibaut, Allen Tran, Winston Chou, Nathan Kallus

发表机构 * Netflix ； Kellogg School of Management, Northwestern University（西北大学凯洛格管理学院）

AI总结本文通过Netflix观众数据，构建离散选择模型评估个性化推荐的价值，发现替换推荐算法会降低用户参与度和消费多样性，且有效推荐主要来自精准定位而非机械曝光。

AI中文摘要

个性化推荐系统在很大程度上塑造了用户的在线选择，然而其针对性使得区分推荐本身的价值与底层商品的价值具有挑战性。我们构建了一个嵌入推荐诱导效用、低秩异质性和灵活状态依赖的离散选择模型，并将其应用于Netflix的观看数据。我们利用推荐算法引入的特异性变化来识别并分别评估这些组成部分，同时恢复出无模型（model-free）的转移率，用于验证我们的结构模型。我们使用该模型评估反事实场景，量化个性化推荐带来的增量参与度。首先，我们表明，用矩阵分解或基于流行度的算法取代当前推荐系统，将使参与度分别下降4%和12%，并降低消费多样性。其次，推荐带来的消费增长大部分来自有效的定向，而非机械曝光，其中中等流行度商品（而非广受欢迎或非常小众的商品）的收益最大。

英文摘要

Personalized recommendation systems shape much of user choice online, yet their targeted nature makes separating out the value of recommendation and the underlying goods challenging. We build a discrete choice model that embeds recommendation-induced utility, low-rank heterogeneity, and flexible state dependence and apply the model to viewership data at Netflix. We exploit idiosyncratic variation introduced by the recommendation algorithm to identify and separately value these components as well as to recover model-free diversion ratios that we can use to validate our structural model. We use the model to evaluate counterfactuals that quantify the incremental engagement generated by personalized recommendations. First, we show that replacing the current recommender system with a matrix factorization or popularity-based algorithm would lead to 4% and 12% reduction in engagement, respectively, and decreased consumption diversity. Second, most of the consumption increase from recommendations comes from effective targeting, not mechanical exposure, with the largest gains for mid-popularity goods (as opposed to broadly appealing or very niche goods).
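To show how a discrete choice model yields diversion ratios, here is a deliberately simplified sketch: a plain multinomial logit with an outside option, where a recommendation adds a fixed boost to utility. The paper's model is richer (low-rank heterogeneity, state dependence), and the titles, utilities, and boost value below are hypothetical.

```python
import math

def choice_probs(utilities, outside_utility=0.0):
    """Multinomial-logit choice probabilities with an outside (no-watch) option."""
    exps = {k: math.exp(u) for k, u in utilities.items()}
    denom = math.exp(outside_utility) + sum(exps.values())
    return {k: e / denom for k, e in exps.items()}

def diversion_ratios(utilities, removed):
    """Share of the removed item's demand that moves to each remaining item."""
    before = choice_probs(utilities)
    remaining = {k: u for k, u in utilities.items() if k != removed}
    after = choice_probs(remaining)
    return {k: (after[k] - before[k]) / before[removed] for k in remaining}

# Hypothetical titles: A is recommended (boost of 1.0), B and C are not.
REC_BOOST = 1.0
utilities = {"A": 0.0 + REC_BOOST, "B": 0.0, "C": 0.0}
probs = choice_probs(utilities)
diversion = diversion_ratios(utilities, removed="A")
```

In a plain logit the diversion to each alternative is proportional to its share, so B, C, and the outside option each absorb one third of A's demand here. Comparing such model-implied ratios with model-free estimates is the kind of validation the abstract mentions.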

2601.05261 2026-06-09 cs.IR cs.LG 版本更新

Improving User Experience with Personalized Review Ranking and Summarization

通过个性化评论排名和摘要提升用户体验

Muhammad Jawad Mufti, Omar Hammad, MD. Mahfuzur Rahman

发表机构 * Information and Computer Science Dept., King Fahd University of Petroleum and Minerals（信息与计算机科学系，国王法赫德石油与矿物大学）； Interdisciplinary Research Center for Intelligent Secure Systems (IRC-ISS), King Fahd University of Petroleum and Minerals（智能安全系统交叉研究中心（IRC-ISS），国王法赫德石油与矿物大学）

AI总结提出融合用户偏好建模、混合情感估计、方面级评论匹配和LLM摘要的个性化评论排名与摘要框架，在亚马逊数据集和用户研究中优于现有方法。

AI中文摘要

在线消费者评论是电子商务中重要的决策支持资源，然而日益增长的评论量常常造成信息过载，使用户难以识别符合个人偏好的内容。现有的评论排名方法通常依赖星级评分、有用性投票或时效性等聚合信号，这些可能无法反映用户特定兴趣。本文提出了一种个性化评论排名和摘要框架，融合了用户偏好建模、混合情感估计、方面级评论匹配和基于大语言模型（LLM）的摘要。该框架首先从历史评论中提取方面级偏好和情感信号，然后结合用户选择的产品方面和书面评论输入来构建个性化用户画像。通过比较该画像与评论级别的方面和情感表示，对候选评论进行排名。随后对排名靠前的评论进行摘要，以提供简洁且符合偏好的信息。该方法使用亚马逊移动电子产品评论数据集和一项涉及70名参与者的结构化用户研究（涵盖常见消费电子产品类别）进行评估。结果表明，所提出的排名方法优于随机排序、基于星级评分、有用性投票、时效性和语义相似度的排名。用户研究结果进一步表明，该方法在满意度、感知相关性、决策信心、信息查找便捷性和阅读效率方面均有提升。研究结果表明，结合方面级个性化、情感感知排名和基于LLM的摘要可以减少评论过载，支持更高效的用户中心决策。

英文摘要

Online consumer reviews are important decision-support resources in e-commerce, yet the increasing volume of reviews often creates information overload and makes it difficult for users to identify content that matches their individual preferences. Existing review-ranking approaches commonly rely on aggregate signals such as star ratings, helpfulness votes, or recency, which may not reflect user-specific interests. This paper proposes a personalized review ranking and summarization framework that integrates user preference modeling, hybrid sentiment estimation, aspect-level review matching, and Large Language Model (LLM)-based summarization. The framework first extracts aspect-level preferences and sentiment signals from historical reviews. It then incorporates user-selected product aspects and written review input to build a personalized user profile. Candidate reviews are ranked by comparing this profile with review-level aspect and sentiment representations. The top-ranked reviews are then summarized to provide concise, preference-aligned information. The proposed method was evaluated using an Amazon Mobile Electronics review dataset and a structured user study involving 70 participants across common consumer electronics categories. Results show that the proposed ranking method outperformed random ordering, star-rating-based ranking, helpfulness-vote ranking, recency-based ranking, and semantic-similarity-based ranking. User-study results further indicate improvements in satisfaction, perceived relevance, decision-making confidence, ease of finding information, and reading efficiency. The findings suggest that combining aspect-level personalization, sentiment-aware ranking, and LLM-based summarization can reduce review overload and support more efficient user-centered decision-making.

2601.15408 2026-06-09 cs.CV cs.AI cs.CL cs.LG 版本更新

CURE: Curriculum-guided Multi-task Training for Reliable Anatomy Grounded Report Generation

CURE：基于课程引导的多任务训练实现可靠的解剖学接地报告生成

Pablo Messina, Andrés Villa, Juan León Alcázar, Karen Sánchez, Carlos Hinojosa, Denis Parra, Álvaro Soto, Bernard Ghanem

发表机构 * Pontificia Universidad Católica de Chile（智利天主教大学）； CENIA ； iHEALTH ； KAUST（阿卜杜拉国王科技大学）

AI总结提出CURE框架，通过课程学习动态调整多任务训练，提升医学报告生成的视觉接地准确性和事实一致性，无需额外数据。

Comments 31 pages, 7 figures, accepted to CVPR 2026 (oral)

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 36279-36289

AI中文摘要

医学视觉语言模型可以自动生成放射学报告，但在精确的视觉接地和事实一致性方面存在困难。现有模型常常将文本发现与视觉证据错误对齐，导致不可靠或弱接地的预测。我们提出CURE，一个错误感知的课程学习框架，无需任何额外数据即可改善接地和报告质量。CURE在短语接地、接地报告生成和解剖学接地报告生成上，使用公共数据集微调多模态指令模型。该方法基于模型性能动态调整采样，强调困难样本以改善空间和文本对齐。CURE将接地准确率提高了+0.35 IoU，报告质量提高了+0.192 CXRFEScore，并将幻觉减少了18.6%。CURE是一个数据高效的框架，增强了接地准确性和报告可靠性。代码见 https://github.com/PabloMessina/CURE ，模型权重见 https://huggingface.co/pamessina/medgemma-4b-it-cure 。

英文摘要

Medical vision-language models can automate the generation of radiology reports but struggle with accurate visual grounding and factual consistency. Existing models often misalign textual findings with visual evidence, leading to unreliable or weakly grounded predictions. We present CURE, an error-aware curriculum learning framework that improves grounding and report quality without any additional data. CURE fine-tunes a multimodal instructional model on phrase grounding, grounded report generation, and anatomy-grounded report generation using public datasets. The method dynamically adjusts sampling based on model performance, emphasizing harder samples to improve spatial and textual alignment. CURE improves grounding accuracy by +0.35 IoU, boosts report quality by +0.192 CXRFEScore, and reduces hallucinations by 18.6%. CURE is a data-efficient framework that enhances both grounding accuracy and report reliability. Code is available at https://github.com/PabloMessina/CURE and model weights at https://huggingface.co/pamessina/medgemma-4b-it-cure

2602.08916 2026-06-09 cs.SC cs.ET cs.LG 版本更新

AMS-HD: Hyperdimensional Computing for Real-Time and Energy-Efficient Acute Mountain Sickness Detection

AMS-HD：用于实时、节能的急性高山病检测的超维计算

Abu Masum, Mehran Moghadam, M. Hassan Najafi, Bige Unluturk, Ulkuhan Guler, Beth A. Beidleman, Sercan Aygun

发表机构 * School of Computing and Informatics, University of Louisiana at Lafayette（路易斯安那大学拉斐特分校计算与信息学院）； Department of Electrical, Computer, and Systems Engineering, Case Western Reserve University（凯斯西储大学电气、计算机与系统工程系）； Electrical and Biomedical Engineering, Michigan State University（密歇根州立大学电气与生物医学工程系）； Electrical and Computer Engineering Department, Worcester Polytechnic Institute（伍斯特理工学院电气与计算机工程系）； US Army Research Institute of Environmental Medicine（美国陆军环境医学研究所）

AI总结本文提出AMS-HD框架，利用超维计算实现实时急性高山病检测，通过特征选择、超向量编码和位置投影提升分类效率，在多种平台上实现高准确率和低能耗。

AI中文摘要

目标：急性高山病（AMS）是最常见的高原疾病，影响登上海拔2500米以上的未习服者，并可能发展为危及生命的脑水肿或肺水肿。基于可穿戴生理信号检测AMS的传统机器学习（ML）方法往往难以满足连续监测对实时性和硬件效率的要求。方法：本文提出AMS-HD，首个基于超维计算（HDC）的实时AMS检测框架，涵盖面向移动平台的高层双极（-1/+1）计算和面向FPGA与ASIC的低层二进制（0/1）计算。该框架整合互信息特征选择、超向量编码和位置投影以提高分类效率。验证覆盖ARM、FPGA和智能手表-智能手机平台，使用可穿戴设备可获取的SpO2和心率信号。结果：AMS-HD在二分类和多分类中均匹配或优于SVM和MLP基线，二分类准确率最高达91%，F1分数最高达90%，在外部AMS相关数据集上准确率最高达85%。在FPGA上，AMS-HD将LUT和触发器用量分别减少7.3倍和5.8倍，功耗比MLP低3.9倍。在移动平台上，AMS-HD每次会话仅消耗1%电量，占用60字节内存，推理时间为2.50毫秒，能耗分别比SVM和MLP低约2倍和3倍以上。结论：AMS-HD为实时AMS监测提供了一种可扩展、硬件感知的传统ML替代方案，以显著更低的资源消耗实现了有竞争力的性能。意义：本文首次提出用于高原病检测的完整HDC框架，连接可穿戴推理与底层硬件部署，服务于资源受限的健康监测。

英文摘要

Objective: Acute mountain sickness (AMS) is the most prevalent altitude illness, affecting unacclimatized individuals ascending above 2,500 m and potentially escalating to life threatening cerebral or pulmonary edema. Conventional machine learning (ML) methods for AMS detection from wearable physiological signals often fail to meet real-time hardware efficiency requirements of continuous monitoring. Methods: We present AMS-HD, the first hyperdimensional computing (HDC)-based framework for real-time AMS detection, spanning high-level bipolar (-1/+1) computing for mobile platforms and low-level binary (0/1) computing for FPGA and ASIC targets. The framework integrates mutual information feature selection, hypervector encoding, and positional projection to enhance classification efficiency. Validation spans ARM, FPGA, and smartwatch-smartphone platforms using wearable-accessible SpO2 and heart rate signals. Results: AMS-HD matches or outperforms SVM and MLP baselines in both binary and multiclass classification, achieving up to 91% accuracy and 90% F1-score in binary classification, and up to 85% accuracy on external AMS-related datasets. On FPGA, AMS-HD reduces LUT and flip-flop usage by 7.3x and 5.8x, while consuming 3.9x less power than MLP. On mobile platforms, AMS-HD requires only 1% battery per session, 60 Bytes of memory, and 2.50 ms inference time -- approximately 2x and more than 3x lower energy consumption than SVM and MLP. Conclusion: AMS-HD provides a scalable, hardware-aware alternative to conventional ML for real-time AMS monitoring, achieving competitive performance with substantially lower resource consumption. Significance: This work presents the first complete HDC framework for altitude sickness detection, bridging wearable inference and low-level hardware deployment for resource-constrained health monitoring.
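For readers unfamiliar with hyperdimensional computing, the sketch below shows the generic bipolar (-1/+1) pipeline: bind a feature hypervector with a level hypervector, sum the bound vectors into a sample encoding, accumulate encodings into class prototypes, and classify by cosine similarity. It is not the AMS-HD design. The dimension, the quantization into four levels, and the toy SpO2/heart-rate samples are all assumptions, and integer accumulators are used where hardware variants would binarize.

```python
import math
import random

DIM = 2048  # hypervector dimension (assumed; not taken from the paper)

def random_hv(rng):
    """Random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]

def bind(a, b):
    """Elementwise product associates a feature with its quantized level."""
    return [x * y for x, y in zip(a, b)]

def bundle(hvs):
    """Elementwise sum superimposes several hypervectors."""
    return [sum(col) for col in zip(*hvs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

def encode(sample, feature_hvs, level_hvs):
    """sample[i] is the quantized level index of feature i."""
    return bundle([bind(feature_hvs[i], level_hvs[lvl]) for i, lvl in enumerate(sample)])

def train(samples, labels, feature_hvs, level_hvs):
    grouped = {}
    for sample, label in zip(samples, labels):
        grouped.setdefault(label, []).append(encode(sample, feature_hvs, level_hvs))
    return {label: bundle(hvs) for label, hvs in grouped.items()}

def classify(sample, prototypes, feature_hvs, level_hvs):
    query = encode(sample, feature_hvs, level_hvs)
    return max(prototypes, key=lambda label: cosine(query, prototypes[label]))

rng = random.Random(0)
feature_hvs = [random_hv(rng) for _ in range(2)]  # feature 0: SpO2, feature 1: heart rate
level_hvs = [random_hv(rng) for _ in range(4)]    # four quantization levels
samples = [[0, 3], [0, 2], [1, 3], [3, 0], [3, 1], [2, 0]]  # toy [SpO2 level, HR level]
labels = ["ams", "ams", "ams", "healthy", "healthy", "healthy"]
prototypes = train(samples, labels, feature_hvs, level_hvs)
prediction = classify([0, 3], prototypes, feature_hvs, level_hvs)
```

Training and inference reduce to elementwise products, sums, and one similarity per class, which is why the approach maps well onto low-power hardware.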

2602.23234 2026-06-09 cs.IR cs.AI cs.LG 版本更新

Scaling Search Relevance: Augmenting App Store Ranking with LLM-Generated Judgments

扩展搜索相关性：用LLM生成的判断增强应用商店排名

Evangelia Christakopoulou, Vivekkumar Patel, Hemanth Velaga, Sandip Gaikwad, Sean Suchter, Venkat Sundaranatha

发表机构 * Apple（苹果公司）

AI总结针对应用商店排名中专家文本相关性标签稀缺的问题，通过微调LLM生成数百万标签，结合行为相关性优化排序器，显著提升Pareto前沿和转化率。

AI中文摘要

大规模商业搜索系统优化相关性以驱动成功的会话，帮助用户找到他们想要的内容。为了最大化相关性，我们利用两个互补的目标：行为相关性（用户倾向于点击或下载的结果）和文本相关性（结果与查询的语义匹配）。一个持续的挑战是，相对于丰富的行为相关性标签，专家提供的文本相关性标签稀缺。我们首先通过系统评估LLM配置来解决这个问题，发现一个专门的、微调的模型在提供高度相关的标签方面显著优于一个更大的预训练模型。使用这个最优模型作为力量倍增器，我们生成了数百万个文本相关性标签以克服数据稀缺性。我们展示了用这些文本相关性标签增强我们的生产排序器会导致Pareto前沿显著外移：离线NDCG在行为相关性上改善，同时在文本相关性上也提高。这些离线收益通过在全球应用商店排序器上的A/B测试得到验证，该测试显示转化率统计上显著提高了+0.24%，其中最大的性能提升出现在尾部查询中，新的文本相关性标签在缺乏可靠行为相关性标签时提供了稳健的信号。

英文摘要

Large-scale commercial search systems optimize for relevance to drive successful sessions that help users find what they are looking for. To maximize relevance, we leverage two complementary objectives: behavioral relevance (results users tend to click or download) and textual relevance (a result's semantic fit to the query). A persistent challenge is the scarcity of expert-provided textual relevance labels relative to abundant behavioral relevance labels. We first address this by systematically evaluating LLM configurations, finding that a specialized, fine-tuned model significantly outperforms a much larger pre-trained one in providing highly relevant labels. Using this optimal model as a force multiplier, we generate millions of textual relevance labels to overcome the data scarcity. We show that augmenting our production ranker with these textual relevance labels leads to a significant outward shift of the Pareto frontier: offline NDCG improves for behavioral relevance while simultaneously increasing for textual relevance. These offline gains were validated by a worldwide A/B test on the App Store ranker, which demonstrated a statistically significant +0.24% increase in conversion rate, with the most substantial performance gains occurring in tail queries, where the new textual relevance labels provide a robust signal in the absence of reliable behavioral relevance labels.
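The offline metric named in the abstract, NDCG, can be computed for the same ranking under two label sets to see both objectives at once. The sketch below uses the exponential-gain variant, which is one common definition and an assumption here, and the behavioral and textual labels are hypothetical.

```python
import math

def dcg(relevances, k=None):
    """Discounted cumulative gain with exponential gain (2**rel - 1)."""
    rels = relevances if k is None else relevances[:k]
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels))

def ndcg(relevances, k=None):
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0

# One ranked result list scored under two hypothetical label sets.
behavioral_labels = [3, 2, 1]  # e.g. graded from click or download propensity
textual_labels = [0, 1]        # e.g. graded by an LLM judge for a tail query
behavioral_ndcg = ndcg(behavioral_labels)
textual_ndcg = ndcg(textual_labels)
```

A ranker change that raises both numbers at once corresponds to the outward shift of the Pareto frontier described in the abstract.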

2603.04177 2026-06-09 cs.SE cs.AI cs.LG 版本更新

CodeTaste: Can LLMs Generate Human-Level Code Refactorings?

CodeTaste：LLM能否生成人类级别的代码重构？

Alex Thillen, Niels Mündler, Veselin Raychev, Martin Vechev

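Affiliations * University of California, Berkeley; ETH Zurich

AI summary: Studies how well LLM agents refactor code. On the CodeTaste benchmark, agents perform well when the refactoring is specified in detail but struggle to discover the refactorings that human developers chose; a propose-then-implement decomposition improves alignment.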

详情

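Abstract (translated from the AI Chinese abstract)

LLM coding agents can generate working code, but their solutions often accumulate complexity, duplication, and architectural debt. Human developers address these problems through refactoring: behavior-preserving program transformations that improve structure and maintainability. We investigate whether agents (i) can reliably execute refactorings and (ii) can identify the refactorings that human developers actually chose in real codebases. To this end, we build CodeTaste, a benchmark mined from large, multi-file open-source refactorings. To score solutions, we combine repository test suites, which measure functional correctness, with custom static checks that use dataflow reasoning to verify the removal of undesired patterns and the introduction of desired ones. Our results show a clear gap: agents perform well when implementing refactorings specified in detail, but often fail to discover the human refactoring choices when given only the focus area of a change. A propose-then-implement decomposition improves alignment, and selecting the best-aligned proposal before implementation yields further gains. CodeTaste provides an evaluation target and a potential preference signal for aligning coding agents with human refactoring decisions in realistic codebases. We release the benchmark, leaderboard, and code.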

Knee-xRAI: An Explainable AI Framework for Automated Kellgren-Lawrence Grading of Knee Osteoarthritis

Azmul A. Irfan, Nur Ahmad Khatim, Alfan Alfian Irfan, Achmad Zaki, Erike A. Suwarsono, Mansur M. Arief

Affiliations * Orthopaedic Department, Faculty of Medicine, UIN Syarif Hidayatullah Jakarta; Informatics Engineering, Institut Teknologi Sepuluh Nopember; Information Technology, Universitas Muhammadiyah Yogyakarta; Industrial and Systems Engineering, King Fahd University of Petroleum and Minerals

AI summary: Proposes Knee-xRAI, a framework that mimics the clinical radiological workflow: it measures joint space narrowing (JSN), osteophytes, and subchondral sclerosis, then produces an explainable KL grade through an XGBoost-SHAP classifier and a ConvNeXt hybrid predictor, evaluated on 8,260 OAI-derived radiographs.

Comments 8 pages, 5 figures

详情

AI中文摘要

对平片进行膝骨关节炎（KOA）分级的可重复性差。KL评分单级分歧可能改变手术管理或将患者从保守治疗转为关节内注射。同时，超越人类读者的深度学习模型通常缺乏决策解释。我们提出了Knee-xRAI，一个分解分级过程的流程，通过模仿临床放射流程独立测量关节间隙狭窄（JSN）、骨刺和下骨质硬化，然后将这些发现组合成可解释的KL评分。具体而言，U-Net++架构通过轮廓分割量化JSN，SE-ResNet-50多任务网络在OARSI尺度上对骨刺进行解剖部位评分，混合纹理-CNN检测二进制硬化。该流程产生一个50维特征向量，通过XGBoost-SHAP分类器（路径A，审计）和ConvNeXt混合预测器（路径B，部署）进行评估。在8,260个OAI衍生的放射图像上，JSN模块的Dice得分为0.8909，mJSW ICC为0.8674。路径A达到QWK为0.6294和AUC为0.8046，证实了结构化特征向量具有显著的诊断信号。路径B达到QWK为0.8436和AUC为0.9017。SHAP分析显示JSN是主导特征，骨刺增加了一致的增量，硬化贡献微小。移除JSN证据会降低KL3-KL4召回率，而早期等级保持不变，与KL诊断标准一致。Knee-xRAI将每个预测都基于可审计的放射学发现链，提供临床透明度。

英文摘要

Grading knee osteoarthritis (KOA) on plain radiographs is poorly reproducible across readers. A single-grade disagreement on the Kellgren-Lawrence (KL) scale can alter surgical management or redirect a patient from conservative therapy to intra-articular injection. Meanwhile, deep learning models that outperform human readers often offer no explanation for their decisions. We present Knee-xRAI, a pipeline that decomposes the grading process by mimicking clinical radiological workflows. It independently measures joint space narrowing (JSN), osteophytes, and subchondral sclerosis, then combines these findings into an explainable KL grade. Specifically, a U-Net++ architecture quantifies JSN via contour segmentation, an SE-ResNet-50 multi-task network grades osteophytes per anatomical site on the OARSI scale, and a hybrid texture-CNN detects binary sclerosis. This pipeline yields a 50-dimensional feature vector evaluated via an XGBoost-SHAP classifier (Path A, audit) and a ConvNeXt hybrid predictor (Path B, deployed). On 8,260 OAI-derived radiographs, the JSN module achieved a Dice score of 0.8909 and an mJSW ICC of 0.8674. Path A reached a QWK of 0.6294 and an AUC of 0.8046, confirming the structured feature vector carries substantial diagnostic signal. Path B achieved a QWK of 0.8436 and an AUC of 0.9017. SHAP analysis identifies JSN as the dominant feature, with osteophytes adding a consistent increment and sclerosis contributing marginally. Removing JSN evidence collapses KL3-KL4 recall while early grades remain intact, aligning with the KL diagnostic criteria. Knee-xRAI grounds every prediction in an auditable chain of measured radiographic findings, providing clinical transparency at the point of care.
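The headline agreement metric in this abstract is quadratic weighted kappa (QWK), which penalizes a two-grade error four times as much as a one-grade error. The following is a minimal sketch of the standard definition for ordinal labels, assuming KL grades are coded 0 to 4; the example grades are made up and are not from the paper.

```python
def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Cohen's kappa with quadratic disagreement weights for ordinal labels."""
    n = len(y_true)
    # Observed confusion matrix and its marginal histograms.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]

    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(n_classes):
        for j in range(n_classes):
            w = (i - j) ** 2 / (n_classes - 1) ** 2
            num += w * observed[i][j]
            den += w * hist_true[i] * hist_pred[j] / n
    # Note: den is zero only if every label falls in one class; not handled here.
    return 1.0 - num / den

# Hypothetical KL grades: one radiograph is graded 3 instead of 4.
reference = [0, 1, 2, 3, 4]
predicted = [0, 1, 2, 3, 3]
print(quadratic_weighted_kappa(reference, predicted, n_classes=5))
```

Because the weights grow with the squared grade distance, QWK rewards a model that is off by one grade far more than one that confuses KL0 with KL4, which matches how a single-grade disagreement is treated as the clinically relevant unit in the abstract.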

2605.01171 2026-06-09 cs.CV cs.LG 版本更新

CADFit: Precise Mesh-to-CAD Program Generation with Hybrid Optimization

CADFit：基于混合优化的精确网格到CAD程序生成

Ghadi Nehme, Eamon Whalen, Faez Ahmed

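Affiliations * Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA

AI summary: Proposes CADFit, a framework that recovers complex, editable CAD construction sequences from meshes by incrementally fitting and validating parametric operations using geometric feedback; it outperforms existing mesh-to-CAD methods on multiple benchmarks and substantially reduces the Invalid Ratio.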

详情

AI中文摘要

尽管最近取得了进展，但从几何输入（如网格或点云）恢复参数化CAD构造序列仍然是设计和制造的关键挑战，因为现有的CAD重建和生成方法主要局限于难以编辑的格式（如网格或Breps）或可编辑的简单草图-拉伸流水线和低复杂度数据集。我们引入了CADFit，一个基于混合优化的CAD重建框架，通过使用几何反馈增量拟合和验证参数化操作，从网格中恢复复杂、可编辑的CAD构造序列。我们的方法的特点是将重建公式化为对结构化CAD程序的IoU驱动优化，并支持丰富的操作集，包括拉伸、旋转、圆角和倒角。在多个CAD基准上的实验表明，CADFit在体积交并比和倒角距离方面优于最先进的网格到CAD方法，同时显著降低了重建CAD程序的无效比率，特别是对于复杂设计。我们进一步提出了一个多模态流水线，通过将基于图像的几何重建与CADFit相结合，实现从图像端到端重建CAD构造序列。通过实现更高复杂度CAD模型的精确重建，CADFit为生成更丰富的数据集和推进未来基于学习的CAD逆向工程方法提供了实用基础。代码可在：https://github.com/ghadinehme/CADFit 获取。

英文摘要

Despite recent progress, recovering parametric CAD construction sequences from geometric input, such as meshes or point clouds, is a key challenge for design and manufacturing, as existing CAD reconstruction and generation methods are largely restricted to difficult-to-edit formats like meshes or Breps or editable simple sketch-and-extrude pipelines and low-complexity datasets. We introduce CADFit, a hybrid optimization-based CAD reconstruction framework that recovers complex, editable CAD construction sequences from meshes by incrementally fitting and validating parametric operations using geometric feedback. Our approach is distinguished by formulating reconstruction as an IoU-driven optimization over structured CAD programs and supporting a rich set of operations, including extrusions, revolutions, fillets, and chamfers. Experiments on multiple CAD benchmarks show that CADFit outperforms state-of-the-art mesh-to-CAD methods in volumetric Intersection-over-Union and Chamfer Distance, while substantially reducing the Invalid Ratio of reconstructed CAD programs, particularly for complex designs. We further present a multimodal pipeline that enables end-to-end reconstruction of CAD construction sequences from images by combining image-based geometry reconstruction with CADFit. By enabling accurate reconstruction of higher-complexity CAD models, CADFit provides a practical foundation for generating richer datasets and advancing future learning-based approaches to CAD reverse engineering. The code is available at: https://github.com/ghadinehme/CADFit.
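To make "IoU-driven optimization" concrete, the sketch below scores a candidate parametric operation by volumetric IoU against a target shape and keeps the best parameter. It is a toy grid search over one extrusion height on a voxel grid, under the assumption that shapes are axis-aligned boxes; the function names are hypothetical and this is not the CADFit optimizer.

```python
def voxelize_box(x0, y0, z0, x1, y1, z1):
    """Set of unit voxels covered by an axis-aligned box with integer corners."""
    return {(x, y, z)
            for x in range(x0, x1)
            for y in range(y0, y1)
            for z in range(z0, z1)}

def volumetric_iou(a, b):
    """Intersection-over-union of two voxel sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def fit_extrusion_height(target, heights):
    """Pick the height of a 4x4-base extrusion that maximizes IoU with the target."""
    return max(heights,
               key=lambda h: volumetric_iou(voxelize_box(0, 0, 0, 4, 4, h), target))

# Hypothetical target: a 4x4x4 block, standing in for a voxelized input mesh.
target = voxelize_box(0, 0, 0, 4, 4, 4)
best_height = fit_extrusion_height(target, range(1, 9))
print(best_height, volumetric_iou(voxelize_box(0, 0, 0, 4, 4, best_height), target))
```

The same score can also act as the validation step described in the abstract: an operation that does not raise IoU against the target can be rejected before the next one is fitted.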

2605.03395 2026-06-09 cs.SD cs.AI cs.LG cs.MM 版本更新

APEX: Large-scale Multi-task Aesthetic-Informed Popularity Prediction for AI-Generated Music

APEX：面向AI生成音乐的大规模多任务美学感知流行度预测

Jaavid Aktar Husain, Dorien Herremans

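Affiliations * AMAAI Lab, Singapore University of Technology and Design

AI summary: Proposes APEX, a multi-task framework that uses frozen MERT audio embeddings to jointly predict popularity signals (streams and likes) and five aesthetic quality dimensions for AI-generated music; on the out-of-distribution Music Arena dataset, adding aesthetic features consistently improves preference prediction.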

详情

AI中文摘要

音乐流行度预测因其对艺术家、平台和推荐系统的重要性而吸引了越来越多的研究兴趣。然而，AI生成音乐平台的爆炸式增长创造了一个全新且很大程度上未被探索的领域，每天都有大量歌曲被生产和消费，而没有传统的艺术家声誉或唱片公司支持。在这一探索中，美学质量是关键但尚未被研究的因素。我们提出了APEX，这是首个面向AI生成音乐的大规模多任务学习框架，在来自Suno和Udio的超过21.1万首歌曲（1万小时音频）上训练，该框架联合预测基于参与度的流行度信号——流媒体播放量和点赞分数——以及从MERT（一个自监督音乐理解模型）提取的冻结音频嵌入中的五个感知美学质量维度。美学质量和流行度捕捉了音乐的互补方面，两者结合被证明是有价值的：在Music Arena数据集上的分布外评估中，该数据集包含训练期间未见过的十一个生成音乐系统之间的成对人类偏好对决，引入美学特征持续改进了偏好预测，展示了所学表示在生成架构上的强大泛化能力。

英文摘要

Music popularity prediction has attracted growing research interest, with relevance to artists, platforms, and recommendation systems. However, the explosive rise of AI-generated music platforms has created an entirely new and largely unexplored landscape, where a surge of songs is produced and consumed daily without the traditional markers of artist reputation or label backing. Key, yet unexplored in this pursuit is aesthetic quality. We propose APEX, the first large-scale multi-task learning framework for AI-generated music, trained on over 211k songs (10k hours of audio) from Suno and Udio, that jointly predicts engagement-based popularity signals - streams and likes scores - alongside five perceptual aesthetic quality dimensions from frozen audio embeddings extracted from MERT, a self-supervised music understanding model. Aesthetic quality and popularity capture complementary aspects of music that together prove valuable: in an out-of-distribution evaluation on the Music Arena dataset, comprising pairwise human preference battles across eleven generative music systems unseen during training, including aesthetic features consistently improves preference prediction, demonstrating strong generalisation of the learned representations across generative architectures.

2605.16163 2026-06-09 physics.ao-ph cs.LG 版本更新

SwAIther-Precip: Lead-Time-Aware Bias Correction Enables Kilometer-Scale Downscaling of Global AI Precipitation Forecasts over Switzerland

SwAIther-Precip：考虑提前时间的偏倚校正实现瑞士全球AI降水预报的公里级降尺度

Dan Assouline, Erwan Koch, Federico Amato, Filippo Quarenghi, Daniele Nerini, Thibaut Loiseau, Kyle van de Langemheen, Tom Beucler

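Affiliations * European Centre for Medium-Range Weather Forecasts; University of Geneva; ETH Zurich

AI summary: Proposes SwAIther-Precip, a framework that corrects lead-time-dependent biases before super-resolution to enable kilometer-scale probabilistic downscaling of global AI precipitation forecasts over Switzerland, reducing CRPS by 48% relative to raw AIFS.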

详情

AI中文摘要

技能性中短期降水预报在复杂地形上仍具挑战，因降水源于多尺度非线性过程，全球模型无法以经济成本显式解析。全球AI天气模型可产生技能性中短期预报，但其原生0.25度分辨率限制了本地灾害应用。统计降尺度有助于弥合这一差距，但现有方法常难以处理状态依赖性及尤其提前时间依赖性的全球预报偏倚。我们引入SwAIther-Precip，一种考虑提前时间的降尺度框架，将粗分辨率AIFS预报转换为瑞士公里级概率降水场。首先，通过特征-wise线性调制的U-Net，利用提前时间条件确定性校正粗分辨率系统性偏倚。这种针对性校正使后续更便宜的超分辨率阶段仅需校正降水，允许直接训练于观测而非完整大气状态。扩散模型随后独立于提前时间生成精细空间变异性。使用AIFS预报和CombiPrecip雷达-雨量计观测，SwAIther-Precip将CRPS相对于原始AIFS降低48%。生成的场在大尺度（0.85以上）和小尺度（0.88）上再现观测空间变异性，对应于1公里网格上约4公里的有效分辨率，适用于最多5天的提前时间。跨提前时间训练进一步提升长程性能，相对于提前时间特定模型，在6天时CRPS减少13%。这些结果表明，在生成超分辨率前显式校正提前时间依赖性偏倚是高效公里级概率降尺度的关键。

英文摘要

Skillful medium-range precipitation forecasting at kilometer scale remains challenging over complex terrain because precipitation arises from multiscale nonlinear processes that global models cannot explicitly resolve at affordable cost. Global AI weather models can produce skillful medium-range forecasts, but their native 0.25 degrees resolution limits direct use for local hazard applications. Statistical downscaling can help bridge this gap, yet existing approaches often struggle with state-dependent, and especially lead-time-dependent, biases in global forecasts. We introduce SwAIther-Precip, a lead-time-aware downscaling framework that converts coarse-resolution AIFS forecasts into probabilistic km-scale precipitation fields over Switzerland. First, a U-Net conditioned on lead time via feature-wise linear modulation deterministically corrects systematic biases at coarse resolution. This targeted correction enables a cheaper super-resolution stage conditioned only on corrected precipitation, allowing direct training on observations rather than on the full atmospheric state. A diffusion-based model then generates fine-scale spatial variability independently of lead time. Using AIFS forecasts and CombiPrecip radar-gauge observations, SwAIther-Precip reduces CRPS by 48% relative to raw AIFS. The generated fields reproduce observed spatial variability with spectral fidelity above 0.85 at large scales and 0.88 at small scales, corresponding to an effective resolution of approximately 4 km on a 1 km grid for lead times up to 5 days. Training across lead times further improves long-range performance, yielding a 13% CRPS reduction at 6 days relative to lead-time-specific models. These results show that explicitly correcting lead-time-dependent biases before generative super-resolution is key to efficient km-scale probabilistic downscaling of global AI precipitation forecasts.
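The reported gains are stated in CRPS, which for an ensemble forecast can be estimated as the mean absolute error of the members minus half their mean pairwise spread. Below is a minimal sketch of that standard ensemble estimator; the precipitation values are invented for illustration and the paper's exact evaluation protocol is not described in the abstract.

```python
def crps_ensemble(members, observation):
    """Empirical CRPS: E|X - y| - 0.5 * E|X - X'| over ensemble members X, X'."""
    m = len(members)
    skill = sum(abs(x - observation) for x in members) / m
    spread = sum(abs(a - b) for a in members for b in members) / (m * m)
    return skill - 0.5 * spread

# Hypothetical precipitation values (mm) at one grid cell; the observation is 2.0 mm.
observed = 2.0
raw_forecast = [5.0, 6.0, 7.0]        # biased ensemble
corrected_forecast = [1.0, 2.0, 3.0]  # same spread, bias removed

crps_raw = crps_ensemble(raw_forecast, observed)
crps_corrected = crps_ensemble(corrected_forecast, observed)
relative_reduction = 1.0 - crps_corrected / crps_raw
print(crps_raw, crps_corrected, relative_reduction)
```

Lower CRPS is better, and for a single-member (deterministic) forecast the estimator reduces to absolute error, which is why it is a convenient common score for comparing a raw deterministic model with a probabilistic downscaler.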

2605.20735 2026-06-09 cs.CV cs.LG 版本更新

Lowering the Barrier to IREX Participation: Open-Source Algorithms, Toolkit, and Benchmarking for Iris Recognition

降低参与IREX的门槛：用于虹膜识别的开源算法、工具包和基准测试

Siamul Karim Khan, Patrick J. Flynn, Adam Czajka

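Affiliations * University of Notre Dame

AI summary: Presents two new open-source iris recognition algorithms, TripletIris (triplet loss with batch-hard triplet mining) and ArcIris (ArcFace loss), with Python and IREX X-compliant C++ implementations submitted to the official IREX X program. The work aims to evaluate open-source iris recognition under IREX testing protocols for the first time and to provide a model C++ submission that eases entry for other teams. It also provides open-source, IREX-compliant C++ implementations of two existing methods: HDBIF, an iris image filtering algorithm using human saliency-driven kernels, and CRYPTS, a human-interpretable algorithm for detecting and comparing Fuchs' crypts. Except for CRYPTS, which runs into time limits in 1:N search, the methods passed the official IREX X evaluation and were also evaluated on several popular academic benchmarks. Open-source iris segmentation and circle estimation models usable in any new iris recognition method are provided as well.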

详情

AI中文摘要

本文提出了两种新的开源虹膜识别算法，提供了Python和符合IREX标准的C++实现，用于提交官方IREX X计划。本研究有两个主要目标：（a）首次根据IREX测试协议评估开源虹膜识别解决方案；（b）提供一个模型C++提交，显著促进其他团队的开源方法进入IREX评估。新方法包括两个神经网络，分别使用三元组损失与批量硬三元组挖掘（TripletIris）和ArcFace损失（ArcIris）。本文还提供了两种现有方法的开源IREX兼容C++实现：（a）基于虹膜图像过滤的人类显著性驱动内核（HDBIF）算法；（b）用于检测和比较Fuchs密钥（CRYPTS）的人类可解释算法。除了CRYPTS在1:N搜索中面临时间限制外，这些方法已通过官方IREX X评估，并在多个流行学术基准上进行了评估：Quality-Face/Iris Research Ensemble、Warsaw-Biobase Post-Mortem Iris、CASIA-Iris-Thousand-V4、CASIA-Iris-Lamp-V4、IIT Delhi Iris Database、IIITD Contact Lens Iris Database、NDIris3D和Notre Dame Variable Iris Image Quality Release 2。最后，本文还提供了可用于任何新虹膜识别方法的虹膜分割和圆圈估计开源模型。

英文摘要

NIST Iris Exchange (IREX) offers an appealing solution to evaluating new open-source iris recognition algorithms, but it presents high barriers to entry because these algorithms must be written in C++, using a specific API, and adapted to meet strict IREX speed and memory constraints. The main goal of this paper is to lower these barriers and advance open-source iris recognition large-scale evaluations by offering: (a) two new modern deep learning-based open-source iris matchers (ArcIris and TripletIris), along with their C++ IREX X-compliant implementations, which are the first open-source iris recognition methods included into the IREX X leaderboard (and thus IREX-vetted), as well as new segmentation and iris circular approximation models that can be incorporated into any new iris recognition method, and (b) a performance assessment (according to IREX X testing protocols) of all major and currently available open-source iris recognition solutions. The paper also provides Python implementations of the new ArcIris and TripletIris methods and discusses the differences one may encounter between C++ and Python implementations of the same conceptually equivalent approaches. Finally, the paper offers open-source, IREX X-compliant C++ implementations of two existing methods: (a) an iris image filtering-based algorithm utilizing human saliency-driven kernels (HDBIF), and (b) a human-interpretable algorithm for detecting and comparing Fuchs' crypts (CRYPTS). In addition to IREX X evaluation results, the paper reports the performance of all methods on major academic benchmarks: Quality-Face/Iris Research Ensemble (Q-FIRE), Warsaw-Biobase Post-Mortem Iris, CASIA-Iris-Thousand-V4, CASIA-Iris-Lamp-V4, IIT Delhi Iris Database, IIITD Contact Lens Iris Database, NDIris3D, and Notre Dame Variable Iris Image Quality Release 2 (VII-Q-R2).
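This entry describes TripletIris as trained with a triplet loss and batch-hard triplet mining. As an illustration of that general technique, here is a minimal pure-Python sketch: for each anchor in a batch it takes the farthest same-identity sample and the closest different-identity sample. The margin value, the Euclidean distance, and the toy embeddings are assumptions, not details from the paper.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def batch_hard_triplet_loss(embeddings, labels, margin=0.2):
    """Mean over anchors of max(0, margin + hardest_positive - hardest_negative)."""
    losses = []
    for i, (anchor, label) in enumerate(zip(embeddings, labels)):
        pos = [euclidean(anchor, e) for j, e in enumerate(embeddings)
               if j != i and labels[j] == label]
        neg = [euclidean(anchor, e) for j, e in enumerate(embeddings)
               if labels[j] != label]
        if not pos or not neg:
            continue  # this anchor has no valid triplet in the batch
        losses.append(max(0.0, margin + max(pos) - min(neg)))
    return sum(losses) / len(losses) if losses else 0.0

# Hypothetical 2-D embeddings of four iris images from two identities.
separated = [[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [5.0, 5.1]]
overlapping = [[0.0, 0.0], [1.0, 0.0], [0.5, 0.0], [1.5, 0.0]]
identities = [0, 0, 1, 1]
print(batch_hard_triplet_loss(separated, identities))
print(batch_hard_triplet_loss(overlapping, identities))
```

Mining only the hardest positive and negative per anchor keeps the loss focused on the pairs most likely to cause false matches, at the cost of being sensitive to label noise in the batch.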

2605.27441 2026-06-09 cs.IR cs.LG 版本更新

A Unified Structured Query Understanding Framework for Industrial Semantic Search

面向工业语义搜索的统一结构化查询理解框架

Ping Liu, Qianqi Shen, Jianqiang Shen, Chunnan Yao, Kevin Kao, Rajat Arora, Dan Xu, Baofen Zheng, Yunxiang Ren, Benjamin Le, Ali Hooshmand, Igor Lapchuk, Juan Bottaro, Raghavan Muthuregunathan, Caleb Johnson, Liangjie Hong, Jingwei Wu, Wenjing Zhang

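Affiliations * LinkedIn Corporation

AI summary: Proposes a unified structured query understanding system that consolidates multiple heterogeneous functions into a single Small Language Model (SLM), and introduces Query Illuminator, a framework for auto-annotation and evaluation; validated on LinkedIn's Job Search and People Search.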

Comments Accepted by KDD-ADS 2026

详情

DOI: 10.1145/3770855.3818312

AI中文摘要

大规模工业搜索系统中的查询理解通常实现为一系列不同、任务特定的组件的级联。虽然每个组件可单独优化，但这种碎片化架构导致维护开销高，且行为不一致，特别是对于长尾查询。在这项工作中，我们提出并部署了一个统一的结构化查询理解系统，将异构功能整合到单个执行模式约束生成的小语言模型（SLM）中。为了解决统一建模中的数据瓶颈，我们引入了Query Illuminator，一个双重用途的框架，作为：(i) 用于高质量自动标注和蒸馏的教师模型，以及(ii) 在人工标注稀缺时用于可扩展评估的替代评判者。我们通过在LinkedIn的职位搜索系统中的广泛离线和在线测试验证了该方法。此外，我们通过跨领域的人员搜索案例研究展示了该框架的水平可扩展性。结果表明，在有限的GPU资源上满足严格的低延迟服务约束的同时，用户参与度提高，运营成本降低。

英文摘要

Query understanding in large-scale industrial search systems is typically implemented as a cascade of disparate, task-specific components. While individually optimizable, this fragmented architecture incurs high maintenance overhead and results in inconsistent behaviors, particularly for long-tail queries. In this work, we propose and deploy a unified structured query understanding system that consolidates these heterogeneous functions into a single Small Language Model (SLM) that performs schema-constrained generation. To address the data bottlenecks inherent in unified modeling, we introduce Query Illuminator, a dual-purpose framework serving as: (i) a teacher model for high-quality auto-annotation and distillation, and (ii) a surrogate judge for scalable evaluation where human labels are scarce. We validate this approach through extensive offline and online tests within LinkedIn's Job Search system. Furthermore, we demonstrate the framework's horizontal extensibility through a cross-domain case study on People Search. The results show improved user engagement and reduced operational costs, achieved while satisfying strict low-latency serving constraints on limited GPU resources.

2606.00384 2026-06-09 cs.AI cs.CL cs.CV cs.LG stat.CO 版本更新

A Vision-Language Framework for Comparative Reasoning in Radiology

Tengfei Zhang, Ziheng Zhao, Xiaoman Zhang, Lisong Dai, Pengcheng Qiu, Ya Zhang, Yanfeng Wang, Weidi Xie

Affiliations * University of Science and Technology of China; Shanghai Artificial Intelligence Laboratory; School of Artificial Intelligence, Shanghai Jiao Tong University; Department of Biomedical Informatics, Harvard Medical School; Department of Radiology, Renmin Hospital of Wuhan University; Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University

AI summary: Proposes an entity-aware cross-image reasoning framework. It builds MedReCo-DB, a large-scale comparative imaging dataset, and develops the MedReCo encoder and MedReCo-VLM to support reference-case retrieval and temporal comparative interpretation, improving external retrieval by a mean of 6.0 percentage points.

详情

AI中文摘要

医学影像人工智能在孤立图像解读方面取得了强劲性能，但仍与放射学实践存在较大差距，因为诊断和随访依赖于对先前研究和类似参考病例的比较。本文我们将放射学比较形式化为一个实体感知的跨图像推理问题，并引入一个支持参考病例检索和时间比较解读的框架。我们构建了MedReCo-DB，这是一个从常规图像-报告对中派生的大规模比较影像资源，包含来自八个机构、四个国家、七种成像模态的超过16万名患者的69万余张图像。报告被分解为解剖结构、异常发现和病理状况，为实体条件检索和比较视觉问答提供监督。利用该资源，我们开发了MedReCo，一个用于可控检索临床类似病例的实体感知视觉编码器，以及MedReCo-VLM，一个用于生成性解读间隔变化的视觉语言扩展。在内部、外部和跨中心评估中，MedReCo在所有12个内部检索设置中实现了最高的Recall@1，并将外部检索平均提高了6.0个百分点。在临床易混淆的鉴别组中，它始终优于最强的基线。MedReCo-VLM在所有比较生成评估中取得了最佳性能，并在胸部X光片上将纵向随访准确性提高了14.5-46.5个百分点，在CT上提高了13.0-27.9个百分点。这些发现表明，实体感知的比较推理可以从常规临床数据中大规模学习，并可能为医学影像AI提供更符合临床的基础。

英文摘要

Medical imaging artificial intelligence has achieved strong performance in isolated image interpretation, but remains poorly aligned with radiological practice, where diagnosis and follow-up rely on comparison across prior studies and analogous reference cases. Here we formulate radiological comparison as an entity-aware cross-image reasoning problem and introduce a framework that supports both reference-case retrieval and temporal comparative interpretation. We construct MedReCo-DB, a large-scale comparative imaging resource derived from routine image-report pairs, comprising more than 690,000 images from over 160,000 patients across eight institutions, four countries and seven imaging modalities. Reports are decomposed into anatomical structures, abnormal findings and pathological conditions to provide supervision for entity-conditioned retrieval and comparative visual question answering. Using this resource, we develop MedReCo, an entity-aware visual encoder for controllable retrieval of clinically analogous cases, and MedReCo-VLM, a vision--language extension for generative interpretation of interval change. Across internal, external and cross-center evaluations, MedReCo achieved the highest Recall@1 in all 12 internal retrieval settings and improved external retrieval by a mean of 6.0 percentage points. In clinically confusable differential groups, it consistently outperformed the strongest baselines. MedReCo-VLM achieved the best performance across all comparative generation evaluations and improved longitudinal follow-up accuracy by 14.5-46.5 percentage points on chest radiographs and 13.0-27.9 percentage points on CT. These findings suggest that entity-aware comparative reasoning can be learned from routine clinical data at scale and may provide a more clinically aligned foundation for medical imaging AI.

2606.07235 2026-06-09 cs.IR cs.LG 版本更新

Large Language Models Should Learn Personalized Rather Than Aggregated Human Preferences

Cristina Garbacea

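AI summary: Argues that large language models should learn personalized rather than aggregated preferences, analyzes the theoretical limits and empirical problems of aggregation, and proposes bounded personalization frameworks to balance individual autonomy with collective safety.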

Comments Accepted to ICML 2026

详情

AI中文摘要

当前对齐大型语言模型（LLM）的方法将多样化的人类偏好聚合为单一奖励信号，实际上优化了一个不代表任何真实个体的假设性“平均用户”。本文立场论文认为，LLM应学习个性化、个体化的偏好而非聚合偏好。我们表明，聚合掩盖了关于偏好多样性、个体价值观和上下文依赖的关键信息，这在理论上基于社会选择理论，并在经验上跨人口群体明显。我们分析了人类偏好编码的丰富结构，调查了个性化的技术方法，并系统地回应了关于可扩展性、共享标准和操纵风险的反驳。虽然个性化引入了真正的安全挑战，包括过滤气泡、价值锁定和心理操纵，但我们认为这些挑战可以通过有界个性化框架来管理，该框架在容纳合法个体差异的同时保留通用安全约束。最后，我们提出了一个具体的研究和政策议程，以开发尊重个体自主和集体安全的偏好感知模型。

英文摘要

Current approaches to aligning large language models (LLMs) aggregate diverse human preferences into a single reward signal, effectively optimizing for a hypothetical ``average user'' who represents no real person particularly well. This position paper argues that LLMs should learn personalized, individual preferences rather than aggregated ones. We show that aggregation masks critical information about preference diversity, individual values, and contextual dependencies, which is a limitation both theoretically grounded in social choice theory and empirically evident across demographic groups. We analyze the rich structure that human preferences encode, survey technical approaches to personalization, and systematically address counterarguments on scalability, shared standards, and manipulation risk. While personalization introduces genuine safety challenges including filter bubbles, value lock-in, and psychological manipulation, we argue these are manageable through bounded personalization frameworks that preserve universal safety constraints while accommodating legitimate individual variation. We conclude with a concrete research and policy agenda for developing preference-aware models that respect both individual autonomy and collective safety.

2606.08369 2026-06-09 cs.LG cs.AI 新提交

An Information-Theoretic Definition for Open-Ended Learning

开放学习的信息论定义

Wanqiao Xu, Yifan Zhu, Benjamin Van Roy

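Affiliations * Stanford University

AI summary: Proposes an information-theoretic definition of open-ended environments based on bit equivalence, proves that classical bandits are not open-ended, and designs an algorithm that achieves open-ended learning.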

2606.07527 2026-06-09 cs.CL cs.AI cs.LG 交叉投稿

Post-training is (Massive) Supervised Learning

后训练是（大规模）监督学习

Michael Hassid, Yossi Adi, Roy Schwartz

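Affiliations * FAIR, Meta AI; The Hebrew University of Jerusalem

AI summary: Argues that the current LLM post-training phase (SFT plus RL) is effectively a return to the BERT-era "pre-train then fine-tune" paradigm, shows experimentally that models post-trained from random initialization still reach highly non-trivial performance, and calls for a shift toward training procedures in which models "learn how to learn".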

详情

AI中文摘要

训练LLM的主流范式已演变为依赖包含SFT和RL的大规模后训练阶段。在这篇立场论文中，我们认为这种方法实际上标志着回归到BERT时代的“预训练然后微调”方法，明确地使模型适应期望的行为和评估所用的特定基准。我们首先回顾LLM的历史，描述LLM演化的不同阶段。我们认为当前格局与LLM早期惊人地相似，那时任务性能严重依赖于将模型拟合到分布内数据集。为了实证证明这一点，我们比较了预训练模型和随机初始化模型，在现代推理数据集上对两种变体进行微调，并在竞争性数学和代码基准上评估它们。我们表明，从头开始后训练的模型产生了高度非平凡的性能。我们的发现表明，当前的后训练方法主要作为分布拟合机制发挥作用。最后，我们提出，开发通用能力的模型和系统需要超越针对预定义行为的广泛后训练，转而采用模型“学会如何学习”的训练过程。

英文摘要

The prevailing paradigm for training LLMs has evolved to rely on a massive post-training phase consisting of SFT and RL. In this position paper, we argue that this methodology effectively marks a reversion to the ``pre-train then fine-tune'' approach of the BERT era, explicitly tailoring models to the desired behaviors and specific benchmarks on which they are evaluated. We begin with a historical overview of LLMs, describing the different phases of the LLM evolution. We argue that the current landscape is remarkably similar to the early days of LLMs, where task performance heavily relied on fitting the models to in-distribution datasets. To empirically demonstrate this, we compare pre-trained models to randomly initialized ones, by fine-tuning both variants on modern reasoning datasets and evaluating them on competitive math and code benchmarks. We show that models post-trained from scratch yield highly non-trivial performance. Our findings suggest that current post-training methodologies function primarily as a distribution-fitting mechanism. We finish by positing that developing generally capable models and systems requires moving beyond extensive post-training for predefined behaviors, shifting instead toward training procedures where models ``learn how to learn''.

2606.07612 2026-06-09 cs.CY cs.AI cs.LG 交叉投稿

Position: Anthropomorphic Misalignment Research Needs Stronger Evidence

立场：拟人化错位研究需要更强证据

Vansh Gupta, Peter Nutter, Samuel Stante, Andreas Krause, Florian Tramèr, Lukas Fluri, Xin Chen, Anna Hedström

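Affiliations * University of Cambridge

AI summary: Argues that anthropomorphic misalignment research (AMR) rests on weak evidence because of conceptual vagueness, non-robust data, and inadequate experimental design, and proposes an evidence hierarchy and a diagnostic checklist to improve methodological rigor.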

2606.08202 2026-06-09 stat.ML cs.LG physics.data-an q-bio.NC 交叉投稿

Vector Space of Cycles

循环向量空间

Moo K. Chung, Anass B. El-Yaagoubi, Hernando Ombao

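Affiliations * Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison; Statistics Program, King Abdullah University of Science and Technology

AI summary: Proposes a variational framework that represents cyclic interactions as edge flows on a simplicial complex; energy-minimizing dynamics separate transient components from persistent harmonic flows, yielding a low-dimensional cycle space that supports projection, averaging, comparison, and statistical inference on cyclic structure.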

详情

AI中文摘要

大多数用于有向交互的统计和机器学习方法关注变量之间的成对效应。即使现有的循环模型也主要通过节点级依赖表示反馈，使得大规模循环组织难以估计和比较。这一限制在生物和神经系统中尤为突出，其中交互高度循环且涉及许多重叠的循环。我们引入了一个用于循环交互统计推断的变分框架。有向交互被表示为单纯复形上的边流，并在能量最小化动力系统下演化。由此产生的动力学将瞬态交互分量与持久谐波流分离，产生一个捕获稳定循环组织的低维循环空间。该框架不是枚举单个循环，而是将循环交互表示为希尔伯特空间的元素，从而实现投影、平均、比较和群体级统计推断。我们建立了谐波投影的理论性质，包括循环空间的表征、方差减少和群体推断。模拟表明，与现有的有向交互方法相比，该方法在密集循环系统中显著改善了循环结构的恢复。应用于400名人类受试者的静息态fMRI，该框架揭示了通过边平均无法检测的可重复的大规模循环组织。这些结果为研究高维动力系统中的循环交互提供了一个可扩展的统计框架。

英文摘要

Most statistical and machine learning methods for directed interactions focus on pairwise effects among variables. Even existing cyclic models represent feedback primarily through node-level dependencies, making large-scale recurrent organization difficult to estimate and compare. This limitation is particularly acute in biological and neural systems, where interactions are highly recurrent and involve many overlapping cycles. We introduce a variational framework for statistical inference on cyclic interactions. Directed interactions are represented as edge flows on a simplicial complex and evolved under an energy-minimizing dynamical system. The resulting dynamics separate transient interaction components from persistent harmonic flows, yielding a low-dimensional cycle space that captures stable recurrent organization. Rather than enumerating individual cycles, the proposed framework represents cyclic interactions as elements of a Hilbert space, enabling projection, averaging, comparison, and population-level statistical inference. We establish theoretical properties of the harmonic projection, including characterization of the cycle space, variance reduction, and population inference. Simulations demonstrate substantially improved recovery of cyclic structure in dense recurrent systems compared with existing directed-interaction methods. Applied to resting-state fMRI from 400 human subjects, the framework reveals reproducible large-scale cyclic organization that is not detectable through edgewise averaging. These results provide a scalable statistical framework for studying recurrent interactions in high-dimensional dynamical systems.

2606.08296 2026-06-09 cs.AI cs.LG 交叉投稿

Revisiting the shutdown problem

重新审视关机问题

David Thorstad

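Affiliations * GitHub

AI summary: Reassesses the difficulty of the AI shutdown problem, arguing that existing arguments fail to show it is hard to solve and that the proposed technical solutions impose a high safety cost on model performance.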

2606.08728 2026-06-09 cs.AI cs.CL cs.CV cs.LG 交叉投稿

Artificial Intelligence for Mathematical Reasoning: An Integrated Survey of Language Models, Neuro-symbolic Systems, and Verified Discovery

人工智能数学推理：语言模型、神经符号系统与验证发现的综合综述

Syed Rifat Raiyan, Mohsinul Kabir, Hasan Mahmud, Md Kamrul Hasan

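Affiliations * University of California, Berkeley; University of Cambridge; University of Toronto

AI summary: Surveys the evolution of AI for mathematical reasoning from early rule-based systems to contemporary reasoning models, multi-agent systems, and verified discovery workflows, organized along four axes (informal reasoning, formal reasoning, mathematical discovery, and inference and training-time techniques), and assesses benchmarks, failure modes, and future directions.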

Comments Under review, 47 pages, 14 figures, 22 tables

详情

AI中文摘要

数学推理长期以来一直是机器智能的严格测试；在过去十年中，它已从NLP中的一个边缘问题发展为最重要的人工智能前沿之一。本综述对该领域的演变进行了统一阐述，从早期基于规则的数学文字题（MWP）求解器和模板驱动的几何系统，到神经表达式生成和LLM提示，再到当代推理模型、多智能体系统、神经符号定理证明器和验证发现工作流。我们沿四个轴组织该领域：(i) 文本和图表的非正式推理，涵盖MWP求解、多模态几何和VLM；(ii) 证明助手的形式推理，包括自动形式化、策略预测、编译器引导修复和证明搜索；(iii) 数学发现，其中系统提出构造、改进界限或协助攻击开放问题；以及(iv) 推理和训练时技术，包括CoT提示、工具使用、过程奖励模型和RLVR，这些技术日益将生成与验证联系起来。我们编目了涵盖小学算术、竞赛数学、几何、形式证明、多模态和多语言推理以及专家评估的主要基准，并考察了基准饱和、污染、报告不匹配以及pass@1、多数投票和验证器辅助pass@$k$之间的区别。我们批判性地评估了失败模式：扰动下的脆弱性、奖励黑客、多模态基础失败、脆弱形式化以及推理规模推理的能源成本。借鉴来自在职数学家的近期观点，我们确定了未来方向，集中于验证发现工作流、推理效率以及使AI辅助形式化广泛可用的基础设施。配套材料：https://github.com/Starscream-11813/awesome-AI4Math。

英文摘要

Mathematical reasoning has long served as a stringent test of machine intelligence; over the past decade, it has moved from a niche problem within NLP to one of the most consequential AI frontiers. This survey provides a unified account of the field's evolution, from early rule-based math word problem (MWP) solvers and template-driven geometry systems, through neural expression generation and LLM prompting, to contemporary reasoning models, multi-agent systems, neuro-symbolic theorem provers, and verified discovery workflows. We organize the landscape along four axes: (i) informal reasoning over text and diagrams, spanning MWP solving, multimodal geometry, and VLMs; (ii) formal reasoning in proof assistants, including autoformalization, tactic prediction, compiler-guided repair, and proof search; (iii) mathematical discovery, where systems propose constructions, improve bounds, or assist attacks on open problems; and (iv) the inference and training-time techniques, including CoT prompting, tool use, process reward models, and RLVR, that increasingly connect generation with verification. We catalog major benchmarks across grade-school arithmetic, competition mathematics, geometry, formal proving, multimodal and multilingual reasoning, and expert evaluation, and we examine benchmark saturation, contamination, reporting mismatches, and the distinction between pass@1, majority voting, and verifier-assisted pass@$k$. We critically assess failure modes: brittleness under perturbation, reward hacking, multimodal grounding failures, fragile formalization, and the energy cost of reasoning-scale inference. Drawing on recent perspectives from working mathematicians, we identify future directions centered on verified-discovery workflows, reasoning efficiency, and infrastructure to make AI-assisted formalization broadly usable. Companion materials: https://github.com/Starscream-11813/awesome-AI4Math.

2606.09404 2026-06-09 stat.ML cs.AI cs.LG 交叉投稿

SAILS: Surrogate-based Analysis of Interactions via Local Effect Smooths

SAILS: 基于局部效应平滑的交互作用代理分析

Timo Heiß, Julia Herbinger, Bernd Bischl, Giuseppe Casalicchio

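Affiliations * Department of Statistics, LMU Munich; Munich Center for Machine Learning (MCML); Leibniz Institute for Prevention Research and Epidemiology

AI summary: Proposes SAILS, a framework that analyzes pairwise interactions in black-box models through interpretable generalized additive model (GAM) surrogates, enabling interaction detection, categorization of interaction form, and visualization.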

详情

AI中文摘要

特征交互驱动了机器学习模型的大部分预测能力，然而现有的解释方法仅能检测和量化交互作用，而无法揭示其函数形式，或者只能可视化受限的交互类型。我们提出了基于局部效应平滑的交互作用代理分析（SAILS），这是一个模型无关的框架，通过拟合黑箱模型局部效应的可解释广义加性模型（GAM）代理来分析成对交互作用。对于感兴趣特征的每个区间，代理平滑项在导数层面隔离交互成分，从而实现（i）通过对平滑项显著性检验的启发式方法进行交互检测，（ii）将交互形式分类为线性、乘积可分离和非乘积可分离类型，以及（iii）为每种交互类型提供定制化、可解释的可视化。我们通过受控模拟和实际任务实证验证了该框架，展示了其在成对交互作用上的有效性，但在强特征相关性和高阶交互作用下存在局限性。SAILS填补了XAI工具箱中的一个显著空白，超越了仅检测交互作用，进而表征其函数形式。

英文摘要

Feature interactions drive much of the predictive power of machine learning models, yet existing explanation methods only detect and quantify interactions without revealing their functional form, or visualize only restricted interaction types. We propose Surrogate-based Analysis of Interactions via Local effect Smooths (SAILS), a model-agnostic framework that analyzes pairwise interactions through interpretable generalized additive model (GAM) surrogates fitted to the local effects of a black-box model. For each interval of a feature of interest, the surrogate smooth terms isolate the interaction components on derivative level, enabling (i) interaction detection through a heuristic derived from significance tests on smooth terms, (ii) interaction form categorization into linear, product-separable, and non-product-separable types, and (iii) tailored, interpretable visualizations for each interaction type. We empirically validate the framework through controlled simulations and a real-world task, demonstrating its effectiveness for pairwise interactions, with limitations under strong feature correlations and higher-order interactions. SAILS fills a notable gap in the XAI toolbox, going beyond detection of interactions alone to characterizing their functional form.

2606.09672 2026-06-09 cs.AI cs.CL cs.LG cs.PF q-bio.QM 交叉投稿

Correlation Is Not Enough: Embedding Human Metadata for Individual Causal Discovery

相关性不够：嵌入人类元数据用于个体因果发现

Suraj Biswas, Saurabh Gupta, Pritam Mukherjee

Affiliations * Assessli Research; Dots-In Research

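AI summary: Off-the-shelf pretrained biomedical language models assign high cosine similarity (0.76-0.92) to unrelated cross-domain pairs, which leads to false causal inferences. The paper proposes contrastive training (raising separation to 1.63x) and BODHI hard-negative mining (raising it to 2.30x), and reports a 133x latency speedup from OpenVINO optimization.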

Comments 20 pages, 18 figures, 9 tables

AI中文摘要

询问一个预训练的生物医学语言模型“皮质醇28 ug/dL”和“股市波动”是否相关，它会返回0.83的余弦相似度（1.0表示完全相同）。两者没有共同机制。这不是个例：我们测试的所有现成生物医学编码器（BioBERT、PubMedBERT、BioM-ELECTRA）在跨域无关对上得分在0.76到0.92之间，而正确答案应接近零。跨域区分准确率为0%。检索系统可以承受这一点，因为下游语言模型会过滤噪声。但大型行为模型（LBM）——一种以人为对象而非句子的基础模型——则不能：它在用户生活图上推理，并将嵌入接近性视为两个事件因果关联的证据。虚假接近性会写入虚假因果边，所有下游都会继承错误。在这里，嵌入几何不是调节旋钮，而是正确性的关键。我们报告了修复方法。对72,034对进行对比训练，将PubMedBERT的BIOSSES相关性从0.633提升到0.828，域内与域间分离度从1.05倍提升到1.63倍。第二次训练BODHI从生物医学知识图中缺失的边挖掘硬负例，将分离度提升到2.30倍，区分差距提升到+0.392，BIOSSES代价为4.5%。在带有AMX的Intel Xeon 6737P上，OpenVINO将单查询延迟从1367毫秒降至10毫秒（133倍），达到每秒555个句子。一个发现与标准建议相悖：在此芯片上，FP16在所有服务批量大小下优于INT8，我们解释了原因。同一模型在无AMX的Ice Lake实例上运行慢13-27倍。我们发布了基准测试套件、训练语料库、BODHI生成器和OpenVINO脚本。

英文摘要

Ask a pretrained biomedical language model whether "cortisol 28 ug/dL" and "stock-market volatility" are related, and it returns a cosine similarity of 0.83 on a scale where 1.0 means identical. The two share no mechanism. This is not a corner case: every off-the-shelf biomedical encoder we tested (BioBERT, PubMedBERT, BioM-ELECTRA) scores unrelated cross-domain pairs between 0.76 and 0.92 when the answer should be near zero. Accuracy on cross-domain discrimination is 0%. Retrieval systems survive this, because a language model downstream filters the noise. A Large Behavioural Model (LBM), a foundation model whose subject is a person rather than a sentence, does not: it reasons over a graph of a user's life and treats embedding proximity as evidence that two events are causally linked. False proximity writes a false causal edge, and everything downstream inherits the error. Here, embedding geometry is not a tuning knob; it is correctness. We report the fix. A contrastive pass over 72,034 pairs raises PubMedBERT BIOSSES correlation from 0.633 to 0.828 and within-vs-across-domain separation from 1.05x to 1.63x. A second pass, BODHI, mines hard negatives from edges absent in a biomedical knowledge graph and lifts separation to 2.30x and the discrimination gap to +0.392, at a 4.5% BIOSSES cost. On an Intel Xeon 6737P with AMX, OpenVINO cuts single-query latency from 1367 ms to 10 ms (133x) and reaches 555 sentences/sec. One finding contradicts standard advice: FP16 beats INT8 on this silicon at every serving batch size, and we explain why. The same model on a no-AMX Ice Lake instance runs 13-27x slower. We release the benchmark suite, training corpora, the BODHI generator, and the OpenVINO scripts.
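The similarity and separation figures above can be made concrete with a small sketch. It assumes that "within-vs-across-domain separation" means the mean within-domain cosine similarity divided by the mean across-domain cosine similarity; the abstract does not spell out the definition, so `separation_ratio` is a hypothetical helper, and the toy vectors are invented rather than real encoder outputs.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def separation_ratio(embeddings, domains):
    """Mean within-domain similarity over mean across-domain similarity.

    Assumed definition, used only for illustration.
    """
    within, across = [], []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            sim = cosine_similarity(embeddings[i], embeddings[j])
            (within if domains[i] == domains[j] else across).append(sim)
    return (sum(within) / len(within)) / (sum(across) / len(across))


# Hypothetical 3-d embeddings that all point in nearly the same direction,
# mimicking an encoder that scores unrelated pairs as highly similar.
embeddings = [
    [1.0, 0.9, 0.8],  # biomedical item A
    [0.9, 1.0, 0.8],  # biomedical item B
    [0.8, 0.9, 1.0],  # finance item C
    [0.8, 1.0, 0.9],  # finance item D
]
domains = ["bio", "bio", "fin", "fin"]

ratio = separation_ratio(embeddings, domains)
print(f"separation ratio: {ratio:.3f}")
```

In this toy setup every pairwise similarity is high and the ratio sits just above 1, which is the regime the abstract describes for off-the-shelf encoders; a larger ratio would indicate that same-domain items are distinguishably closer than cross-domain ones.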

2606.09711 2026-06-09 cs.AI cs.LG 交叉投稿

Proxy Reward Internalization and Mechanistic Exploitation: A Learned Precursor to Reward Hacking and Its Generalization

代理奖励内化与机制性利用：奖励黑客及其泛化的学习前兆

Mohammad Beigi, Ming Jin, Lifu Huang

Affiliations * UC Davis; Virginia Tech

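AI summary: Introduces the PRIME concept and measures it through chain-of-thought monitoring, direct probes, and activation-level concept vectors. PRIME emerges in stages before sustained reward hacking, its direct-probe score forecasts later hack onset, and across checkpoints in-domain PRIME tracks out-of-domain misalignment.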

AI中文摘要

奖励黑客通常在其变得可见后才被研究，即当模型获得高代理奖励但未能完成预期任务时。我们转而研究代理强化学习在失败出现之前教会了什么。我们引入了代理奖励内化与机制性利用（PRIME），这是一种评估任务正确性、预测代理接受度以及推理可被利用的代理-黄金差距的学习能力。在具有可被利用的pytest奖励的编码强化学习环境中，我们通过思维链监控、直接探针和激活级概念向量来测量PRIME。我们发现，PRIME在持续奖励黑客之前以阶段性顺序出现，并且其当前的直接探针得分可以预测后续黑客的爆发时间和严重程度，即使可见的黑客率仍然很低。当评估者发生变化时，PRIME也会适应，重新瞄准任何仍然获得奖励的代理-黄金差距，并在黄金奖励抑制公开黑客时持续存在；消除其激活方向会减少黑客行为。跨检查点，域内PRIME跟踪域外失调。这些结果共同表明，可被利用的代理强化学习放大了可见黑客上游的代理内化能力，使PRIME成为更广泛对齐风险的候选早期预警信号。

英文摘要

Reward hacking is usually studied after it becomes visible, once a model earns high proxy reward while failing the intended task. We instead study what proxy RL teaches before that failure appears. We introduce Proxy Reward Internalization and Mechanistic Exploitation (PRIME), a learned capability to assess task correctness, predict proxy acceptance, and reason about exploitable proxy-gold gaps. In coding RL environments with exploitable pytest rewards, we measure PRIME through chain-of-thought monitoring, direct probes, and activation-level concept vectors. We find that PRIME emerges in a staged sequence before sustained reward hacking, and that its current direct-probe score forecasts later hack onset and severity even when the visible hack rate is still low. PRIME also adapts when the evaluator changes, retargeting to whichever proxy-gold gap remains rewarded and persisting when gold reward suppresses overt hacking, and ablating its activation directions reduces hacking. Across checkpoints, in-domain PRIME tracks out-of-domain misalignment. Together these results suggest that exploitable proxy RL amplifies a proxy-internalization capability upstream of visible hacking, making PRIME a candidate early-warning signal for broader alignment risk.

2310.10196 2026-06-09 cs.LG cs.AI 版本更新

Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook

时间序列与时空数据的大模型：综述与展望

Ming Jin, Yaxuan Kong, Yuxuan Liang, Chaoli Zhang, Siqiao Xue, Xue Wang, James Zhang, Yi Wang, Haifeng Chen, Xiaoli Li, Vincent S. Tseng, Yu Zheng, Lei Chen, Hui Xiong, Shirui Pan, Qingsong Wen

Affiliations * Griffith University; University of Oxford; Hong Kong University of Science and Technology (Guangzhou); Zhejiang Normal University; Ant Group; Alibaba Group; Deloitte Service LLP; The University of Hong Kong; NEC Laboratories America; A*STAR; National Yang Ming Chiao Tung University; JD Technology; Squirrel Ai Learning

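AI summary: Surveys large models for time series and spatio-temporal data, organized by data type, model category, model scope, and application area; distinguishes general-purpose from domain-specific models and curates related resources and open problems.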

Comments Accepted by ACM Computing Surveys; 35 Pages; Github Repo: https://github.com/qingsongedu/Awesome-TimeSeries-SpatioTemporal-LM-LLM

AI中文摘要

时间数据，包括时间序列和时空数据，在现实应用中无处不在。物理和虚拟传感器生成的海量数据记录了动态系统行为，支持各种下游任务。有效分析这些数据对于挖掘其丰富信息至关重要。大型语言模型和其他基础模型的最新进展加速了它们在时间序列和时空数据挖掘中的应用。这些方法不仅提高了跨领域的模式识别和推理能力，还支持了能够理解和处理时间数据的人工通用智能的发展。在本综述中，我们沿着四个维度（数据类型、模型类别、模型范围和应用领域/任务）对针对时间序列和时空数据定制或适配的大模型进行了全面、最新的回顾。我们将现有工作分为两大组：用于时间序列分析的大模型（LM4TS）和用于时空数据挖掘的大模型（LM4STD），并进一步区分通用模型和领域专用模型。我们还整理了相关资源，包括数据集、模型实现和工具，按主要应用领域组织。总体而言，本综述整合了近期进展，并突出了以大型模型为中心的时间数据分析的基础、应用、资源和开放研究机会。

英文摘要

Temporal data, including time series and spatio-temporal data, are pervasive in real-world applications. Generated in massive volumes by physical and virtual sensors, they record dynamic system behaviors and enable a wide range of downstream tasks. Effectively analyzing such data is crucial to unlocking their rich information content. Recent advances in large language models and other foundation models have accelerated their use in time series and spatio-temporal data mining. These approaches not only improve pattern recognition and reasoning across diverse domains but also support progress toward artificial general intelligence that can understand and process temporal data. In this survey, we present a comprehensive, up-to-date review of large models tailored or adapted for time series and spatio-temporal data along four dimensions: data types, model categories, model scopes, and application areas/tasks. We organize existing work into two main groups: large models for time series analysis (LM4TS) and for spatio-temporal data mining (LM4STD), and further distinguish general-purpose from domain-specific models. We also curate related resources, including datasets, model implementations, and tools, organized by major application areas. Overall, this survey consolidates recent advances and highlights foundations, applications, resources, and open research opportunities in large model-centric temporal data analysis.

2506.20699 2026-06-09 cs.LG 版本更新

Structural Decoupling: A Scaffold-Flow Theory of Generalization and Alignment

结构解耦：泛化与对齐的支架流理论

Xin Li

Affiliations * NSF (U.S. National Science Foundation); Xin Li

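AI summary: Proposes Structural Learning Theory (StrLT), which uses the notion of width and a contractive-similarity operator to describe how structure is discovered and maintained in non-stationary environments, derives a structural decoupling principle, and interprets safety failures such as hallucination and reward-model boundary errors.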

AI中文摘要

在非平稳和多上下文环境中的学习需要超越普通的任务内泛化。系统还必须发现哪些上下文存在，将输入路由到正确的上下文，保留旧上下文，并在环境变化时修订上下文库。本文提出结构学习理论（StrLT）作为填补这一结构缺失的框架。StrLT 补充了 Vapnik 的统计学习理论（SLT）：SLT 支配着固定机制内的预测或控制（即“漏斗”）；而 StrLT 支配着结构机制的发现与维护（即“陷阱”）。StrLT 的核心对象是宽度，即覆盖一个问题所需的最少局部可行上下文数量。我们总结了三个基本结果：宽度与 VC 维不可比较；学习在真实宽度处发生相变；宽度可通过收缩相似性（CS）算子估计，该算子将任务诱导的非收缩性转化为谱分离。在 StrLT 框架下，我们解释了固定类别的结构可学习性如何导致结构解耦原则：维持结构支架的机制不应由优化上下文内流的相同梯度来训练。这一原则激发了一种支架流模型，其中对齐和泛化在架构上分离。最后，我们论证了若干安全故障，包括幻觉、奖励模型边界错误和欺骗性对齐，可以被解释为支架分辨率或支架维护的失败，而不仅仅是输出层面的预测错误。

英文摘要

Learning in non-stationary and multi-context environments requires more than ordinary within-task generalization. A system must also discover which contexts exist, route inputs to the correct context, preserve old contexts, and revise the context library when the environment changes. This paper presents Structural Learning Theory (StrLT) as a framework for filling this structural gap. StrLT complements Vapnik's Statistical Learning Theory (SLT): SLT governs the "funnel" (prediction or control within a fixed regime), while StrLT governs the "trap" (the discovery and maintenance of structural regimes). The core StrLT object is width, the minimum number of locally feasible contexts needed to cover a problem. We summarize three basic results: width is incomparable with VC dimension; learning exhibits a phase transition at the true width; and width can be estimated by a contractive-similarity (CS) operator that converts task-induced non-contractivity into spectral separation. Under the StrLT framework, we explain how fixed-class structural learnability leads to a structural decoupling principle: the mechanisms that maintain the structural scaffold should not be trained by the same gradients that optimize within-context flow. This principle motivates a scaffold-flow model in which alignment and generalization separate architecturally. Finally, we argue that several safety failures, including hallucination, reward-model boundary errors, and deceptive alignment, can be interpreted as scaffold-resolution or scaffold-preservation failures rather than merely output-level prediction errors.

2606.00568 2026-06-09 cs.LG q-bio.GN 版本更新

On the Recoverability of Causal Relations from Bulk Gene Expression Data

从批量基因表达数据中恢复因果关系的可能性

Gongxu Luo, Boyang Sun, Kun Zhang

Affiliations * Mohamed bin Zayed University of Artificial Intelligence; Carnegie Mellon University

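AI summary: Formalizes consistency under aggregation and derives necessary and sufficient conditions for recovering causal relations from bulk gene expression data. Recovery is possible only under linear aggregation combined with affine structural equations, while empirical data deviate from the linearity assumption.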

AI中文摘要

批量基因表达谱分析将生物样本中所有细胞的RNA混合后测量，在单细胞时代仍然重要，因为它通常比单细胞检测噪声更低、灵敏度更高且成本效益更好。因此，越来越多的计算方法试图从批量表达数据中恢复基因间的因果关系。然而，聚合是对底层细胞系统的有损、不可逆的粗化，目前尚不清楚是否以及在何种条件下可以从聚合的批量基因表达数据中恢复因果关系。为了回答这个问题，我们通过两种一致性概念（函数形式一致性和条件独立性一致性）形式化了聚合下的可恢复性。然后，我们推导了可恢复性的必要和充分条件，表明这些性质仅在线性聚合（如求和/均值）与仿射结构方程结合时得以保持。为了评估这些条件的实际可行性，对四个批量基因表达数据集和四个单细胞基因表达数据集的分析进一步揭示，两种数据类型中估计的基因间成对调控函数均偏离线性，为可恢复性所需的线性假设提供了有限的经验支持。总之，这些结果告诫我们，在没有强额外假设的情况下，不应从聚合的批量表达数据中恢复因果关系。

英文摘要

Bulk gene expression profiling, which aggregates pooled RNA across cells within a biological sample, remains important in the single-cell era because it is typically less noisy, more sensitive, and more cost-effective than single-cell assays. Accordingly, a growing body of computational methods seeks to recover causal relations among genes from bulk expression data. However, aggregation is a lossy, non-invertible coarsening of the underlying cellular system, and it remains unclear whether and under what conditions causal relations are recoverable from aggregated bulk gene expression data. To answer this, we formalize recoverability under aggregation through two notions of consistency: functional-form consistency and conditional-independence consistency. We then derive necessary and sufficient conditions for recoverability, showing that these properties are preserved only under linear aggregations (e.g., sum/mean) coupled with affine structural equations. To assess the practical plausibility of these conditions, analyses of four bulk and four single-cell gene expression datasets further reveal that the estimated pairwise regulatory functions among genes deviate from linearity in both data types, providing limited empirical support for the linearity assumptions required for recoverability. Together, these results caution against recovering causal relations from aggregated bulk expression data without strong additional assumptions.
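A small numerical illustration may help show why linearity matters here. The sketch below is not the paper's proof: it is a noise-free toy with invented per-cell values, and the function names (`bulk_mean`, `affine_regulation`, `quadratic_regulation`) are hypothetical. It only shows that averaging over cells commutes with an affine relation but not with a nonlinear one.

```python
def bulk_mean(values):
    """Linear aggregation: average a per-cell quantity over all cells."""
    return sum(values) / len(values)


def affine_regulation(x):
    """Hypothetical affine structural equation y = 2x + 1."""
    return 2.0 * x + 1.0


def quadratic_regulation(x):
    """Hypothetical nonlinear structural equation y = x^2."""
    return x * x


# Invented per-cell expression levels of a regulator gene.
cells_x = [1.0, 2.0, 3.0, 6.0]
bulk_x = bulk_mean(cells_x)

# Bulk-level target expression under each per-cell relation.
bulk_y_affine = bulk_mean([affine_regulation(x) for x in cells_x])
bulk_y_quadratic = bulk_mean([quadratic_regulation(x) for x in cells_x])

# Affine case: the bulk data obey the same equation as the single cells.
print(bulk_y_affine, affine_regulation(bulk_x))
# Nonlinear case: the bulk relation differs from the per-cell relation;
# here the gap equals the variance of x across cells.
print(bulk_y_quadratic, quadratic_regulation(bulk_x))
```

The gap in the quadratic case depends on the cell-to-cell spread of the regulator, which the bulk measurement has discarded, so the per-cell functional form cannot be read off from the aggregated values alone.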

2406.05335 2026-06-09 cond-mat.dis-nn cs.LG 版本更新

Phase transition in large language models and the criticality of natural languages

大型语言模型中的相变与自然语言的临界性

Kai Nakaishi, Yoshihiko Nishikawa, Koji Hukushima

Affiliations * Center for Advanced Intelligence Project, RIKEN; National Institute for Japanese Language and Linguistics; Department of Physics, Nagoya University; Department of Multidisciplinary Sciences, The University of Tokyo; Komaba Institute for Science, The University of Tokyo

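AI summary: Using large language models as controllable effective models, the authors find that varying a parameter analogous to physical temperature drives a phase transition. Text generated at the critical point shows power-law behavior and most closely resembles natural language, suggesting that natural languages are critical.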

Comments 8 pages, 6 figures

AI中文摘要

自然语言中的文本和语音生成可以建模为随机过程。这一思想可追溯到马尔可夫的开创性工作，以及后来的香农，也构成了大型语言模型（LLMs）近期发展的基础。自然语言对应的随机过程应不同于生成非语言序列的过程。区分语言与非语言序列的特征之一是幂律行为，这在不同语言中普遍存在。在统计物理学中，这种行为表明自然语言是临界的：它们位于参数化随机过程空间中的相变点附近。然而，验证这一猜想并不直接。即使存在相变，也无法在现实世界的自然语言中直接观察到，因为它们没有任何可控参数。在这里，我们使用LLMs作为自然语言的可控有效模型。通过对LLMs生成文本的统计分析，我们发现，当改变类似于物理温度的参数时，LLMs经历相变。该相变将低温相（生成文本具有复杂重复结构）与高温相（LLMs生成难以理解的文本）分开。在这些相之间的临界点，生成的文本显示出与自然语言相似的幂律行为，并且通过自然语言处理中的标准度量最接近自然语言。这些发现强烈表明自然语言确实是临界的。

英文摘要

Generation of text and speech in natural languages can be modeled as a stochastic process. This idea dates back to the seminal work of Markov and, later, to that of Shannon and also underlies the recent development of large language models (LLMs). The stochastic processes corresponding to natural languages should be distinct from those that generate nonlinguistic sequences. One of the features that discriminate linguistic and nonlinguistic sequences is power-law behavior, which is universally observed across different languages. In statistical physics, such behavior suggests that natural languages are critical: They lie near a phase transition point in a parametrized space of stochastic processes. However, testing this conjecture is not straightforward. A phase transition, even if it exists, cannot be directly observed in real-world natural languages because they do not have any controllable parameters. Here, we use LLMs as controllable effective models of natural languages. Through statistical analyses of texts generated by LLMs, we find that, when a parameter analogous to physical temperature is varied, LLMs undergo a phase transition. The transition separates a low-temperature phase with complex repetitive structures in generated texts from a high-temperature phase in which LLMs generate incomprehensible texts. At the critical point between these phases, generated texts display the power-law behavior similar to that of natural languages and most closely resemble natural languages as measured by a standard metric in natural language processing. These findings strongly suggest that natural languages are indeed critical.
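The "parameter analogous to physical temperature" can be pictured with the usual sampling temperature, where next-token logits are divided by T before the softmax. That this is the exact parameter the authors vary is an assumption; the sketch below uses made-up logits and only shows how T moves the output distribution between a near-deterministic and a near-uniform regime.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature, computed stably."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical next-token logits over a four-token vocabulary.
logits = [2.0, 1.0, 0.0, -1.0]

low_t = softmax_with_temperature(logits, 0.1)    # mass collapses onto one token
high_t = softmax_with_temperature(logits, 10.0)  # close to uniform

print("entropy at T=0.1:", entropy(low_t))
print("entropy at T=10:", entropy(high_t))
```

Low entropy corresponds to the repetitive low-temperature regime and entropy near log(4) to the incoherent high-temperature regime; the phase transition reported in the abstract concerns the statistics of long generated texts, which this single-step picture does not capture.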

2407.10247 2026-06-09 cs.CY cs.AI cs.LG econ.GN q-fin.EC 版本更新

Strategic Integration of Artificial Intelligence in the C-Suite: The Role of the Chief AI Officer

人工智能在C级管理层的战略整合：首席人工智能官的角色

Marc Schmitt

Affiliations * University of Oxford

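AI summary: Develops a role-design theory explaining why firms create a Chief AI Officer (CAIO) or adopt other structures, and analyzes how three distinctive properties of AI (distributed accountability for judgment, upstream governance, and non-stationarity) shape executive role design.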

AI中文摘要

人工智能（AI）融入企业战略已成为组织在数字时代保持竞争优势的关键。尽管组织日益将AI视为战略和组织资源，但现有的C级管理层角色仅部分具备在企业层面统一治理、整合和利用AI的能力。各组织的应对方式不同：有的设立专职首席AI官（CAIO），有的将现有职责扩展为混合角色，还有的通过联邦式结构协调AI。本文发展了一种角色设计理论来解释这种差异。我识别出AI区别于以往跨领域企业技术的三个属性——分布式判断问责、上游治理和非平稳性——以及组织应对的三种配置：集中扩展、分布式扩展和角色创建。CAIO框架将这些属性与它们产生的行政设计问题以及专职角色所需的功能和能力联系起来。四个命题具体说明了专职CAIO何时出现、组织采取何种形式、专职角色何时有效以及配置如何随时间演变。本文通过提供高管层面AI战略整合的理论驱动解释，为高管领导力、组织设计和数字治理研究做出贡献。

英文摘要

The integration of Artificial Intelligence (AI) into corporate strategy has become critical for organizations seeking to maintain competitive advantage in the digital age. Although organizations increasingly rely on AI as a strategic and organizational resource, existing C-suite roles remain only partially equipped to govern, integrate, and leverage it coherently at the enterprise level. Organizations vary in their responses. Some create a dedicated Chief AI Officer (CAIO), others extend existing mandates into hybrid roles, and still others coordinate AI through federated structures. This paper develops a role-design theory to explain this variation. I identify three properties that distinguish AI from earlier cross-cutting enterprise technologies - distributed accountability for judgment, upstream governance, and non-stationarity - and three configurations through which organizations respond: concentrated extension, distributed extension, and role creation. The CAIO Framework links these properties to the executive design problems they generate and to the functions and capabilities required of the dedicated role. Four propositions specify when a dedicated CAIO emerges, what form an organization's response takes, when the dedicated role is effective, and how configurations evolve over time. This paper contributes to research on executive leadership, organizational design, and digital governance by offering a theory-driven account of the strategic integration of AI at the executive level.

2601.06077 2026-06-09 cs.IT cs.AI cs.LG math.IT math.OC 版本更新

An Alternative Trajectory for Generative AI

生成AI的另一种轨迹

Margarita Belova, Yuval Kansal, Yihao Liang, Jiaxin Xiao, Niraj K. Jha

Affiliations * Princeton University

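AI summary: Proposes building domain-specific superintelligence (DSS) as an alternative path for generative AI, using symbolic abstractions to strengthen domain reasoning, avoiding the model collapse associated with LLM-based synthetic data, and improving sustainability.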

AI中文摘要

生成人工智能（AI）生态系统正经历快速变革，威胁其可持续性。随着模型从研究原型转向高流量产品，能耗从一次性训练转向持续的无界推理。推理模型使计算成本每查询增加数个数量级。通过单体模型扩展追求人工通用智能与物理约束的碰撞：电网故障、用水消耗和数据扩展的边际效益递减。此轨迹产生具有出色事实记忆的模型，但在需要深入推理的领域表现不佳，可能由于训练数据中的抽象不足。当前大型语言模型（LLMs）仅在数学和编程等领域表现出真实的推理深度，其他领域泛化能力差。我们提出基于领域特定超智能（DSS）的替代轨迹。我们主张首先构建显式的符号抽象（知识图谱、本体和形式逻辑）以支撑合成课程，使小型语言模型能够掌握领域特定推理，而无需LLM基于合成数据方法的模型崩溃问题。而非单一通用巨模型，我们设想“DSS模型社会”：动态生态系统，其中协调代理将任务路由到不同的DSS后端。此范式转变使能力脱离规模，使智能从能耗高的数据中心迁移到安全的设备专家。通过将算法进步与物理约束对齐，DSS社会使生成AI从环境负担转变为可持续的经济赋能力量。

英文摘要

The generative artificial intelligence (AI) ecosystem is undergoing rapid transformations that threaten its sustainability. As models transition from research prototypes to high-traffic products, the energetic burden has shifted from one-time training to recurring, unbounded inference. This is exacerbated by reasoning models that inflate compute costs by orders of magnitude per query. The prevailing pursuit of artificial general intelligence through scaling of monolithic models is colliding with hard physical constraints: grid failures, water consumption, and diminishing returns on data scaling. This trajectory yields models that have impressive factual recall but struggle in domains requiring in-depth reasoning, possibly due to insufficient abstractions in training data. Current large language models (LLMs) exhibit genuine reasoning depth only in domains like mathematics and coding, where rigorous, pre-existing abstractions provide structural grounding. In other fields, the current approach fails to generalize well. We propose an alternative trajectory based on domain-specific superintelligence (DSS). We argue for first constructing explicit symbolic abstractions (knowledge graphs, ontologies, and formal logic) to underpin synthetic curricula enabling small language models to master domain-specific reasoning without the model collapse problem typical of LLM-based synthetic data methods. Rather than a single generalist giant model, we envision "societies of DSS models": dynamic ecosystems where orchestration agents route tasks to distinct DSS back-ends. This paradigm shift decouples capability from size, enabling intelligence to migrate from energy-intensive data centers to secure, on-device experts. By aligning algorithmic progress with physical constraints, DSS societies move generative AI from an environmental liability to a sustainable force for economic empowerment.

2606.01060 2026-06-09 cs.CL cs.AI cs.LG 版本更新

MENTIS: What Belief Changes Under Alignment? Measuring Multi-Scale Latent Torsion in Language Models

MENTIS: 对齐改变了什么信念？语言模型中多尺度潜在扭转的测量

Partha Pratim Saha, Samarth Raina, Mayur Parvatikar, Amit Dhanda, Vinija Jain, Aman Chadha, Amitava Das

Affiliations * Pragya Lab, BITS Pilani Goa, India; IIIT Delhi, India; Amazon, USA; Meta, USA; Apple, USA

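AI summary: Proposes the MENTIS framework, which uses a layerwise covariance-based torsion norm, a spectral torsion diagnostic, and an Energy-Radiance-Activation measure to quantify the selective, depth-localized geometric changes that preference alignment induces in a language model's internal computation.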

Comments Submitted to EMNLP 2026

AI中文摘要

偏好对齐显著改善了大语言模型的可观察行为，但尚不清楚对齐在内部改变了什么。对齐系统在越狱、提示注入和检索时损坏下仍然失败，表明仅行为级评估是不完整的。后训练应在内部计算中留下可测量的痕迹。我们问：当指令微调（IT）模型变为偏好对齐（PA）模型时，哪些几何结构发生了变化，这些变化集中在何处，以及它们在不同概念、提示和模型家族中的选择性如何？我们引入MENTIS，一个几何优先的框架，用于测量配对检查点中对齐引起的内部重组。MENTIS使用基于层间协方差的主扭转范数（T1）、辅助谱扭转诊断（T2）和用于深度定位的能量-辐射-激活度量（ERA）来比较IT和PA模型。在LITMUS上的四个7-8B模型对中，我们的研究表明对齐引起的变化是选择性的而非均匀的：规范性概念平均表现出比事实性概念更大的扭转偏移；扭转与上下文熵负相关；峰值效应定位于架构特定的中后层。相同的模式出现在词级、提示级和模型级分析中。这些结果表明偏好对齐在内部计算中留下了结构化的、深度局部的几何特征，超越了仅行为级评估所能揭示的内容。

英文摘要

Preference alignment has substantially improved the observable behavior of large language models, yet it remains unclear what alignment changes internally. Aligned systems still fail under jailbreaks, prompt injection, and retrieval-time corruption, suggesting behavior-level evaluation alone is incomplete. Post-training should leave measurable traces in internal computation. We ask: when an instruction-tuned (IT) model becomes a preference-aligned (PA) model, what geometric structure changes, where do those changes concentrate, and how selectively do they vary across concepts, prompts, and model families? We introduce MENTIS, a geometry-first framework for measuring alignment-induced internal reorganization in paired checkpoints. MENTIS compares IT and PA models using a primary layerwise covariance-based torsion norm (T1), a secondary spectral torsion diagnostic (T2), and an Energy-Radiance-Activation measure (ERA) for depth localization. Across four 7-8B model pairs on LITMUS, our study reveals that alignment-induced change is selective rather than uniform: normative concepts exhibit larger torsion shifts than factual concepts on average; torsion is negatively correlated with contextual entropy; and peak effects localize to architecture-specific mid-to-late layers. The same pattern appears across word-level, prompt-level, and model-level analyses. These results suggest preference alignment leaves structured, depth-localized geometric signatures in internal computation beyond what behavior-level evaluation alone can reveal.

2606.05363 2026-06-09 cs.GT cs.LG econ.TH math.OC 版本更新

Should Demand Models Incorporate Competitor Prices? Oblivious Learning and Algorithmic Collusion

需求模型是否应包含竞争对手价格？无知学习与算法合谋

Yuhang Wu, Assaf Zeevi

Affiliations * University of California, Berkeley; University of Washington

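AI summary: Studies whether pricing algorithms in competitive markets should explicitly model competitors' prices. Comparing oblivious and informed learning strategies, the paper finds that the all-informed market is the unique Nash equilibrium with prices converging to the competitive outcome, and that collusive patterns are not robust.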

Comments Preliminary version "Oblivious Learning, Price Exploration and Collusive Dynamics" accepted at EC 2026

AI中文摘要

在一个拥有多个卖家的平台上，定价算法在学习需求时是否应显式建模竞争对手的价格？经典学习论点给出肯定答案：忽略竞争对手会导致模型错误指定和效率低下。相反，关于算法合谋的最新研究表明，战略性无知——故意忽略竞争对手价格——可能促进合谋结果并提高利润。我们在一个具有未知噪声需求的风格化竞争市场中研究这一建模选择，其中多个卖家重复设定价格并通过迭代最小二乘法估计需求，要么将竞争对手价格纳入其需求模型（知情），要么忽略它们（无知）。我们首先证明，相对于垄断者，竞争市场中的无知卖家必须更积极地探索以补偿动态竞争对手信息的损失。基于这一见解，我们刻画了所有卖家均为无知时的市场动态，并表明在充分探索下价格收敛至竞争结果，而当探索衰减时会出现连续伪均衡。分析价格轨迹，我们发现一种“偏离”现象，产生随学习进行而消散的暂时合谋模式。在同时存在无知和知情卖家的市场中，知情卖家的收益严格高于无知卖家。作为策略博弈解读，该建模选择具有唯一的纳什均衡：全知情市场，其中价格有效收敛至竞争结果。总体而言，我们的结果表明合谋模式不稳健，且不能由无知建模维持；因此，纳入竞争对手信息，结合充分的价格探索，仍是竞争市场中卖家的可靠策略。

英文摘要

On a platform with many sellers, should a pricing algorithm explicitly model competitors' prices when learning demand? Classical learning arguments suggest an affirmative answer: ignoring competitors induces model misspecification and inefficiency. In contrast, recent work on algorithmic collusion suggests that strategic obliviousness -- deliberately ignoring competitor prices -- may facilitate collusive outcomes and improve profits. We study this modeling choice in a stylized competitive market with unknown noisy demand, in which multiple sellers repeatedly set prices and estimate demand via iterated least squares, and either incorporate competitors' prices into their demand models (informed) or ignore them (oblivious). We first show that, relative to a monopolist, an oblivious seller in a competitive market must explore more aggressively to compensate for the loss of dynamic competitor information. Building on this insight, we characterize market dynamics when all sellers are oblivious and show that prices converge to the competitive outcome under sufficient exploration, while a continuum of pseudo-equilibria arises when exploration decays. Analyzing the resulting price trajectories, we uncover an excursion phenomenon that gives rise to transient collusive patterns that dissipate as learning progresses. In markets with both oblivious and informed sellers, the informed strictly out-earn the oblivious. Read as a strategy game, the modeling choice has a unique Nash equilibrium: the all-informed market, in which prices converge to the competitive outcome efficiently. Overall, our results indicate that collusive patterns are not robust and are not sustained by oblivious modeling; therefore, incorporating competitor information, together with sufficient price exploration, remains a reliable strategy for sellers in competitive markets.

2602.14975 2026-06-09 physics.chem-ph cs.LG 版本更新

Faster Molecular Dynamics with Neural Network Potentials via Distilled Multiple Time-Stepping and Non-Conservative Forces

通过蒸馏多时间步长和非保守力加速基于神经网络势的分子动力学

Nicolaï Gouraud, Côme Cattin, Thomas Plé, Olivier Adjoua, Louis Lagardère, Jean-Philip Piquemal

Affiliations * Qubit Pharmaceuticals, Advanced Research Department; Sorbonne Université, Laboratoire de Chimie Théorique, UMR 7616 CNRS; Laboratoire de Chimie Théorique, UMR 7616 CNRS

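AI summary: Proposes DMTS-NC, a distilled multi-time-step strategy with non-conservative forces that accelerates atomistic molecular dynamics with foundation neural network models such as FeNNix-Bio1. It achieves an additional 15-30% speedup over DMTS while maintaining accuracy, and combines hydrogen mass repartitioning and high hydrogen friction to extend the largest timestep to 10 fs.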

DOI: 10.1021/acs.jctc.6c00653
Journal ref: Journal of Chemical Theory and Computation, 2026

AI中文摘要

继我们之前的工作（J. Phys. Chem. Lett., 2026, 17, 5, 1288-1295）之后，我们提出了DMTS-NC方法，这是一种使用非保守力的蒸馏多时间步长策略，用于进一步加速使用基础神经网络模型（如FeNNix-Bio1）的原子分子动力学模拟。该方法采用双层可逆参考系统传播算法（RESPA）形式，将目标精确保守势与为产生非保守力而优化的简化蒸馏表示耦合。尽管是非保守的，但蒸馏架构被设计为强制执行关键物理先验，例如旋转等变性和原子力分量的抵消。这些选择促进了蒸馏过程，从而大幅提高了模拟的鲁棒性，显著限制了两种模型之间的异常差异，从而实现了与力数据的极好一致性。总体而言，DMTS-NC方案比其保守对应方案更稳定、更高效，额外加速比DMTS达到15-30%。无需微调步骤，它更易于实现，并且可以推至系统物理共振的极限，以在保持精度的同时提供最大效率。我们通过结合氢质量再分配（HMR）和高氢摩擦（HHF）获得了额外的加速，将方案的最大时间步长进一步扩展到10 fs，同时保持稳定性和精度。与DMTS一样，DMTS-NC适用于任何神经网络势，并且可以应用于计算量比FeNNix-Bio1更大的方法。我们展示了将该方法应用于MACE-OFF23蒸馏的原理验证，与单时间步长相比，获得了3.66至5.64的加速比。

英文摘要

Following our previous work (J. Phys. Chem. Lett., 2026, 17, 5, 1288-1295), we propose the DMTS-NC approach, a distilled multi-time-step (DMTS) strategy using non-conservative (NC) forces to further accelerate atomistic molecular dynamics simulations using foundation neural network models such as FeNNix-Bio1. In this approach, a dual-level reversible reference system propagator algorithm (RESPA) formalism couples a target accurate conservative potential to a simplified distilled representation optimized for the production of non-conservative forces. Despite being non-conservative, the distilled architecture is designed to enforce key physical priors, such as equivariance under rotation and cancellation of atomic force components. These choices facilitate the distillation process and therefore drastically improve the robustness of the simulation, significantly limiting abnormal discrepancies between the two models and thus achieving excellent agreement with the force data. Overall, the DMTS-NC scheme is found to be more stable and efficient than its conservative counterpart, with additional speedups reaching 15-30% over DMTS. Requiring no fine-tuning steps, it is easier to implement and can be pushed to the limit of the system's physical resonances to maintain accuracy while providing maximum efficiency. We obtain additional speedup by combining hydrogen mass repartitioning (HMR) and high hydrogen friction (HHF) to further extend the largest timestep of our schemes up to 10 fs while conserving stability and accuracy. As for DMTS, DMTS-NC is applicable to any neural network potential and can be applied to approaches that are computationally heavier than FeNNix-Bio1. We show a proof of principle applying the approach to the distillation of MACE-OFF23, with consequent speedups ranging from 3.66 to 5.64 compared to single-time-step integration.
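The dual-level RESPA idea can be sketched on a one-dimensional harmonic oscillator. This is a toy under stated assumptions, not the authors' implementation: the "target" and "distilled" forces are hypothetical spring forces with slightly different constants, and in one dimension the cheap force is necessarily conservative, so the non-conservative aspect of DMTS-NC is not represented. The cheap force drives the small inner steps, and the difference between target and cheap forces is applied as a kick at the outer step.

```python
K_TARGET = 1.0  # spring constant of the hypothetical accurate model
K_CHEAP = 0.9   # spring constant of the hypothetical distilled model


def target_force(x):
    return -K_TARGET * x


def cheap_force(x):
    return -K_CHEAP * x


def correction_force(x):
    """Slow part: what the cheap model misses relative to the target."""
    return target_force(x) - cheap_force(x)


def respa_step(x, v, dt_outer, n_inner, mass=1.0):
    """One two-level RESPA step built from velocity-Verlet kicks and drifts."""
    v += 0.5 * dt_outer * correction_force(x) / mass  # outer half kick
    dt_inner = dt_outer / n_inner
    for _ in range(n_inner):                          # inner loop, cheap force only
        v += 0.5 * dt_inner * cheap_force(x) / mass
        x += dt_inner * v
        v += 0.5 * dt_inner * cheap_force(x) / mass
    v += 0.5 * dt_outer * correction_force(x) / mass  # outer half kick
    return x, v


def total_energy(x, v, mass=1.0):
    return 0.5 * mass * v * v + 0.5 * K_TARGET * x * x


x, v = 1.0, 0.0
initial_energy = total_energy(x, v)
max_rel_drift = 0.0
for _ in range(1000):
    x, v = respa_step(x, v, dt_outer=0.1, n_inner=4)
    drift = abs(total_energy(x, v) - initial_energy) / initial_energy
    max_rel_drift = max(max_rel_drift, drift)

print("max relative energy drift:", max_rel_drift)
```

A production code would cache the correction force between consecutive outer steps so the expensive model is evaluated once per outer step; the sketch recomputes it for clarity.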

2603.10453 2026-06-09 cs.LG 版本更新

Spatio-Temporal Forecasting of Retaining Wall Deformation: Mitigating Error Accumulation via Multi-Resolution ConvLSTM Stacking Ensemble

挡土墙变形的时空预测：通过多分辨率ConvLSTM堆叠集成减轻误差累积

Jihoon Kim, Heejung Youn

发表机构 * Department of Civil and Environmental Engineering, Hongik University（弘国大学土木与环境工程系）

AI总结提出多分辨率ConvLSTM集成框架，利用不同时间输入分辨率减轻误差累积，提高分阶段开挖中挡土结构长期变形预测的准确性。

Comments 27 pages, 17 figures

DOI: 10.12989/gae.2026.45.5.649
Journal ref: Geomechanics and Engineering, 45(5), 649-674, 2026

AI中文摘要

本研究提出了一种多分辨率卷积长短期记忆（ConvLSTM）集成框架，利用多样化的时间输入分辨率来减轻误差累积，并提高分阶段开挖过程中挡土结构行为的长期预测。通过PLAXIS2D模拟生成了一个广泛的侧向墙位移响应数据库，该模拟包含五层土壤地层、两种开挖深度（14米和20米）以及随机变化的岩土和结构参数，产生了2000个时间序列挠度剖面。使用全连接神经网络元学习器集成了三个在不同输入分辨率下训练的ConvLSTM模型，构建了集成模型。使用数值结果和现场测量进行的验证表明，集成方法始终优于单独的ConvLSTM模型，特别是在长期多步预测中，表现出减少的误差传播和改进的泛化能力。这些发现强调了多分辨率集成策略的潜力，该策略共同利用多样化的时间输入尺度来增强AI驱动的岩土预测中的预测稳定性和准确性。

英文摘要

This study proposes a multi-resolution Convolutional Long Short-Term Memory (ConvLSTM) ensemble framework that leverages diverse temporal input resolutions to mitigate error accumulation and improve long-horizon forecasting of retaining-structure behavior during staged excavation. An extensive database of lateral wall displacement responses was generated through PLAXIS2D simulations incorporating five-layered soil stratigraphy, two excavation depths (14 and 20 m), and stochastically varied geotechnical and structural parameters, yielding 2,000 time-series deflection profiles. Three ConvLSTM models trained at different input resolutions were integrated using a fully connected neural network meta-learner to construct the ensemble model. Validation using both numerical results and field measurements demonstrated that the ensemble approach consistently outperformed the standalone ConvLSTM models, particularly in long-term multi-step prediction, exhibiting reduced error propagation and improved generalization. These findings underscore the potential of multi-resolution ensemble strategies that jointly exploit diverse temporal input scales to enhance predictive stability and accuracy in AI-driven geotechnical forecasting.

2502.18834 2026-06-09 cs.CE cs.LG 版本更新

FinTSB: A Comprehensive and Practical Benchmark for Financial Time Series Forecasting

FinTSB：一个全面且实用的金融时间序列预测基准

Yifan Hu, Yuante Li, Peiyuan Liu, Yuxia Zhu, Naiqi Li, Tao Dai, Shu-tao Xia, Dawei Cheng, Changjun Jiang

Affiliations * Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China; School of Computer Science, Carnegie Mellon University, Pittsburgh 15213, Pennsylvania, United States; School of Computer Science and Technology, Tongji University, Shanghai 201804, China; College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518055, China; Shanghai Artificial Intelligence Laboratory, Shanghai 200030, China

AI总结针对金融时间序列预测中多样性不足、评估标准缺失和现实匹配度低的问题，提出FinTSB基准，通过分类运动模式、标准化评估指标和模拟真实交易约束，提供全面的评估平台。

DOI: 10.1007/s11704-026-51064-5
Journal ref: Frontiers of Computer Science 2026

AI中文摘要

金融时间序列记录了人脑增强决策行为，捕获了可用于盈利投资策略的历史信息。该领域吸引了大量研究者，提出了基于各种骨干网络的多种方法。然而，该领域的评估通常存在三个系统性局限：1. 未能考虑动态金融市场中观察到的全部股票运动模式（多样性差距）；2. 缺乏统一的评估协议，削弱了跨研究性能比较的有效性（标准化缺失）；3. 忽视关键市场结构因素，导致性能指标虚高，缺乏实际适用性（现实不匹配）。为解决这些问题，我们提出了FinTSB，一个全面且实用的金融时间序列预测基准。为增加多样性，我们将运动模式分为四类，对数据进行分词和预处理，并基于序列特征评估数据质量。为消除不同评估设置带来的偏差，我们在三个维度上标准化指标，并构建了一个用户友好、轻量级的流水线，集成了多种骨干网络的方法。为准确模拟真实交易场景并促进实际应用，我们广泛建模了各种监管约束，包括交易费用等。最后，我们在FinTSB上进行了大量实验，突出了关键见解，以指导不同市场条件下的模型选择。总体而言，FinTSB为研究者提供了一个新颖且全面的平台，用于改进和评估金融时间序列预测方法。代码可在https://github.com/TongjiFinLab/FinTSB获取。

英文摘要

Financial time series (FinTS) record the behavior of human-brain-augmented decision-making, capturing valuable historical information that can be leveraged for profitable investment strategies. Not surprisingly, this area has attracted considerable attention from researchers, who have proposed a wide range of methods based on various backbones. However, the evaluation of the area often exhibits three systemic limitations: 1. Failure to account for the full spectrum of stock movement patterns observed in dynamic financial markets. (Diversity Gap), 2. The absence of unified assessment protocols undermines the validity of cross-study performance comparisons. (Standardization Deficit), and 3. Neglect of critical market structure factors, resulting in inflated performance metrics that lack practical applicability. (Real-World Mismatch). Addressing these limitations, we propose FinTSB, a comprehensive and practical benchmark for financial time series forecasting (FinTSF). To increase the variety, we categorize movement patterns into four specific parts, tokenize and pre-process the data, and assess the data quality based on some sequence characteristics. To eliminate biases due to different evaluation settings, we standardize the metrics across three dimensions and build a user-friendly, lightweight pipeline incorporating methods from various backbones. To accurately simulate real-world trading scenarios and facilitate practical implementation, we extensively model various regulatory constraints, including transaction fees, among others. Finally, we conduct extensive experiments on FinTSB, highlighting key insights to guide model selection under varying market conditions. Overall, FinTSB provides researchers with a novel and comprehensive platform for improving and evaluating FinTSF methods. The code is available at https://github.com/TongjiFinLab/FinTSB.

2605.09813 2026-06-09 cs.NI cs.DC cs.LG cs.SY eess.SY 版本更新

Optimizing Server Placement for Vertical Federated Learning in Dynamic Edge/Fog Networks

优化动态边缘/雾网络中垂直联邦学习的服务器部署

Su Wang, Mung Chiang, H. Vincent Poor

发表机构 * Department of Electrical and Computer Engineering, Purdue University（普渡大学电气与计算机工程系）

AI总结本文研究动态边缘/雾网络中垂直联邦学习的控制与优化，提出SC-DN方法，通过联合优化服务器部署、传输功率、处理器频率和本地训练迭代数，提升模型性能与资源利用率。

Comments Under revision at IEEE/ACM transactions on networking

DOI: 10.1109/TON.2026.3700898

AI中文摘要

我们研究动态边缘/雾网络中垂直联邦学习（VFL）的控制与优化。VFL是一类分布式机器学习方法，其中各边缘/雾设备持有不同的数据特征。由于边缘/雾网络中数据特征和硬件的异构性，设备对VFL的贡献差异显著，且动态网络可能导致某些数据特征的永久退出或进入。在该设置下，我们提出的方法，动态网络中的服务器控制VFL（SC-DN），首先证明了每个全局轮次都存在一个全局一阶驻点，然后利用这一结果，基于四个关键控制变量联合优化机器学习模型训练和资源消耗：（i）服务器部署，（ii）设备到服务器的传输功率，（iii）本地设备处理器频率，以及（iv）每个全局轮次的本地训练迭代数。所得到的优化问题包含耦合变量以及多种形式的对数约束，我们证明它是一个混合整数符号式规划（mixed-integer signomial program），属于NP难问题，并为此开发了一个通用求解器。最后，通过在图像和多模态数据集上的实验，我们表明该方法在分类/回归性能和资源消耗节省方面甚至优于贪心方法。

英文摘要

We investigate the control and optimization of vertical federated learning (VFL), a class of distributed machine learning (ML) methods in which edge/fog devices contain separate data features, in dynamic edge/fog networks. Owing to heterogeneous data features and hardware across edge/fog networks, devices' contributions to VFL vary substantially, and, moreover, dynamic edge/fog networks can lead to the permanent exit or entry of select data features. In this setting, our proposed methodology, server controlled VFL in dynamic networks (SC-DN), first establishes the existence of a global first-order stationary point for every global round, and then leverages this result to jointly optimize ML model training and resource consumption based on four key control variables: (i) server placement, (ii) device-to-server transmit power, (iii) local device processor frequency, and (iv) local training iterations per global round. The resulting optimization formulation contains coupled variables as well as numerous forms of logarithmic constraints which we show is a mixed-integer signomial program, an NP-hard problem, and for which we develop a general solver. Finally, via experiments on both image and multi-modal datasets, we show that our methodology demonstrates superior classification/regression performance and resource consumption savings than even greedy methodologies.

2507.18967 2026-06-09 cs.CV cs.AI cs.LG 版本更新

Underwater Waste Detection Using Deep Learning A Performance Comparison of YOLOv7 to 10 and Faster RCNN

利用深度学习进行水下垃圾检测：YOLOv7到YOLOv10与Faster R-CNN的性能比较

UMMPK Nawarathne, HMNS Kumari, HMLS Kumari

发表机构 * Faculty of Computing, Sri Lanka Institute of Information Technology（斯里兰卡信息技术学院计算学院）； Faculty of Information Technology and Communication Sciences, Tampere University（坦佩雷大学信息技术与通信科学学院）； Computing Centre, Faculty of Engineering, University of Peradeniya（佩拉德尼亚大学工程学院计算中心）

AI总结本文在涵盖低能见度、不同深度等条件的十五类数据集上比较了YOLOv7到YOLOv10及Faster R-CNN的水下垃圾检测性能，发现YOLOv8表现最佳，mAP达80.9%。

Comments 7 pages, 11 figures, to be published in International Journal of Research in Computing (IJRC)

Journal ref: Vol. 5 No. I (2026): International Journal of Research in Computing (IJRC)

AI中文摘要

水下污染是当今最严重的环境问题之一，全球的海洋、河流和自然景观中都发现了大量垃圾。准确检测这些垃圾对废物管理、环境监测和缓解策略至关重要。本文研究了五种先进的目标识别算法，包括YOLO模型（YOLOv7、YOLOv8、YOLOv9、YOLOv10）和Faster R-CNN，以确定哪种模型在水下环境中识别物体最有效。这些模型在一个包含十五个类别、涵盖低能见度和不同深度等多种条件的大型数据集上进行了充分的训练和测试。结果显示，YOLOv8优于其他模型，mAP为80.9%。这种性能提升归因于YOLOv8的架构，其包含改进的无锚机制和自监督学习等特性，从而在各种环境中实现更精确和高效的识别。这些发现突显了YOLOv8模型在全球抗污染斗争中的潜力，可提高水下清理作业的检测能力和可扩展性。

英文摘要

Underwater pollution is one of today's most significant environmental concerns, with vast volumes of garbage found in seas, rivers, and landscapes around the world. Accurate detection of these waste materials is crucial for successful waste management, environmental monitoring, and mitigation strategies. In this study, we investigated the performance of five cutting-edge object recognition algorithms, namely YOLO (You Only Look Once) models, including YOLOv7, YOLOv8, YOLOv9, YOLOv10, and Faster Region-Convolutional Neural Network (R-CNN), to identify which model was most effective at recognizing materials in underwater situations. The models were thoroughly trained and tested on a large dataset containing fifteen different classes under diverse conditions, such as low visibility and variable depths. From the above-mentioned models, YOLOv8 outperformed the others, with a mean Average Precision (mAP) of 80.9%, indicating a significant performance. This increased performance is attributed to YOLOv8's architecture, which incorporates advanced features such as improved anchor-free mechanisms and self-supervised learning, allowing for more precise and efficient recognition of items in a variety of settings. These findings highlight the YOLOv8 model's potential as an effective tool in the global fight against pollution, improving both the detection capabilities and scalability of underwater cleanup operations.

2508.03453 2026-06-09 cs.CL cs.LG 版本更新

Cropping outperforms dropout as an augmentation strategy for self-supervised training of text embeddings

裁剪优于dropout作为自监督训练文本嵌入的增强策略

Rita González-Márquez, Philipp Berens, Dmitry Kobak

发表机构 * Hertie Institute for AI in Brain Health（人工智能与脑健康赫尔蒂研究所）； University of Tübingen（图宾根大学）； University of Tübingen, Germany（德国图宾根大学）

AI总结本文研究了自监督微调中裁剪和dropout两种增强策略，发现裁剪在文本嵌入质量上表现更优，尤其在领域内数据中能快速生成高质量嵌入。

Journal ref: Transactions on Machine Learning Research (TMLR) 2026

AI中文摘要

文本嵌入，即整个文本的向量表示，在许多NLP应用中起重要作用，如检索增强生成、聚类，或为数据探索而对文本集合进行可视化。目前，表现最佳的嵌入模型是通过监督对比微调从预训练语言模型中衍生而来。这种微调策略依赖于外部的相似性概念和标注数据来生成正样本对。本文研究自监督微调，并系统比较了用于微调文本嵌入模型的两种最知名的增强策略。我们在MTEB和额外的领域内评估上评估嵌入质量，发现裁剪增强显著优于基于dropout的方法。我们发现，在领域外数据上，所得嵌入的质量远低于监督式的最先进模型，但对于领域内数据，自监督微调在极短的微调后即可生成高质量文本嵌入。最后，我们发现表示质量在靠近最后几层transformer层时提高，而这些层在微调期间变化最大；仅微调这最后几层就足以达到相近的嵌入质量。

英文摘要

Text embeddings, i.e. vector representations of entire texts, play an important role in many NLP applications, such as retrieval-augmented generation, clustering, or visualizing collections of texts for data exploration. Currently, top-performing embedding models are derived from pre-trained language models via supervised contrastive fine-tuning. This fine-tuning strategy relies on an external notion of similarity and annotated data for generation of positive pairs. Here we study self-supervised fine-tuning and systematically compare the two most well-known augmentation strategies used for fine-tuning text embeddings models. We assess embedding quality on MTEB and additional in-domain evaluations and show that cropping augmentation strongly outperforms the dropout-based approach. We find that on out-of-domain data, the quality of resulting embeddings is substantially below the supervised state-of-the-art models, but for in-domain data, self-supervised fine-tuning can produce high-quality text embeddings after very short fine-tuning. Finally, we show that representation quality increases towards the last transformer layers, which undergo the largest change during fine-tuning; and that fine-tuning only those last layers is sufficient to reach similar embedding quality.
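To make the cropping augmentation concrete, here is a minimal sketch of how two crops of the same text could serve as a positive pair for contrastive fine-tuning. The abstract does not give the crop lengths or sampling scheme, so the contiguous-window sampling, the function names `sample_crop` and `sample_crop_pair`, and the `min_len`/`max_len` parameters are all assumptions made for illustration.

```python
import random


def sample_crop(tokens, min_len, max_len, rng):
    """Return one contiguous window ("crop") of the token list."""
    n = len(tokens)
    # Clamp the requested lengths to the text length so short texts still work.
    length = rng.randint(min(min_len, n), min(max_len, n))
    start = rng.randint(0, n - length)
    return tokens[start:start + length]


def sample_crop_pair(tokens, min_len=2, max_len=5, seed=None):
    """Sample two independent crops of one text to use as a positive pair.

    Hypothetical helper: the paper's actual sampling procedure may differ.
    """
    rng = random.Random(seed)
    first = sample_crop(tokens, min_len, max_len, rng)
    second = sample_crop(tokens, min_len, max_len, rng)
    return first, second


tokens = "the quick brown fox jumps over the lazy dog".split()
view_a, view_b = sample_crop_pair(tokens, seed=0)
```

By contrast, a dropout-based positive pair would feed the same full text through the encoder twice with different dropout masks, so the two views differ only in network noise rather than in content.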

2309.10370 2026-06-09 cs.LG cs.AI math-ph math.MP math.OC stat.ML 版本更新

Geometric structure of shallow neural networks and constructive ${\mathcal L}^2$ cost minimization

浅层神经网络的几何结构与构造性${\mathcal L}^2$成本最小化

Thomas Chen, Patrícia Muñoz Ewald

发表机构 * Department of Mathematics, University of Texas at Austin（德克萨斯大学奥斯汀分校数学系）

AI总结本文研究浅层ReLU网络在欠参数化情况下的成本最小化问题，通过构造上界揭示分类数据的几何结构，不依赖梯度下降。证明了成本函数最小值的上界与训练数据信噪比相关，并确定了特定子空间的构造性训练网络。

Comments AMS Latex, 29 pages. Experimental evidence added. To appear in Physica D: Nonlinear Phenomena

Journal ref: Phys. D, 490, Article No. 135176 (2026)

AI中文摘要

本文通过显式构造上界，探讨欠参数化浅层ReLU网络中的成本（损失）最小化问题；这些上界利用分类数据的结构，不使用梯度下降。重点在于阐明近似和精确极小值点的几何结构。考虑$L^2$成本函数，输入空间$\mathbb{R}^M$，输出空间${\mathbb R}^Q$，其中$Q\leq M$，训练输入样本量可任意大。我们证明了成本函数最小值的上界为$O(\delta_P)$，其中$\delta_P$衡量训练数据的信噪比。在特殊情况$M=Q$下，我们显式确定了成本函数的一个精确退化局部极小值，并证明该精确值与$Q\leq M$时获得的上界相差一个$O(\delta_P^2)$的相对误差。上界的证明给出了一个构造性训练的网络；我们证明该网络对输入空间$\mathbb{R}^M$中一个特定的$Q$维子空间进行度量化（metrizes）。我们还讨论了在给定情形下成本函数全局极小值的刻画问题。

英文摘要

In this paper, we approach the problem of cost (loss) minimization in underparametrized shallow ReLU networks through the explicit construction of upper bounds which appeal to the structure of classification data, without use of gradient descent. A key focus is on elucidating the geometric structure of approximate and precise minimizers. We consider an $L^2$ cost function, input space $\mathbb{R}^M$, output space ${\mathbb R}^Q$ with $Q\leq M$, and training input sample size that can be arbitrarily large. We prove an upper bound on the minimum of the cost function of order $O(\delta_P)$ where $\delta_P$ measures the signal-to-noise ratio of training data. In the special case $M=Q$, we explicitly determine an exact degenerate local minimum of the cost function, and show that the sharp value differs from the upper bound obtained for $Q\leq M$ by a relative error $O(\delta_P^2)$. The proof of the upper bound yields a constructively trained network; we show that it metrizes a particular $Q$-dimensional subspace in the input space ${\mathbb R}^M$. We comment on the characterization of the global minimum of the cost function in the given context.

2602.13271 2026-06-09 cs.AI cs.HC cs.LG 版本更新

Human-Centered Explainable AI for Security Enhancement: A Deep Intrusion Detection Framework

面向安全增强的人本可解释AI：一种深度入侵检测框架

Md Muntasir Jahid Ayan, Md. Shahriar Rashid, Tazzina Afroze Hassan, Hossain Md. Mubashshir Jamil, Mahbubul Islam, Lisan Al Amin, Rupak Kumar Das, Farzana Akter, Faisal Quader

发表机构 * Department of Computer Science and Engineering, United International University (UIU), Dhaka 1212, Bangladesh（联合国际大学（UIU）计算机科学与工程系，达卡 1212，孟加拉国）； Department of Electrical and Electronic Engineering, Islamic University of Technology, Gazipur 1704, Bangladesh（伊斯兰科技大学电气与电子工程系，加济布尔 1704，孟加拉国）； Department of Computer Science and Engineering (CSE), University of Asia Pacific (UAP), Dhaka 1207, Bangladesh（亚太大学（UAP）计算机科学与工程系（CSE），达卡 1207，孟加拉国）； Department of Information Systems, University of Maryland, Baltimore, 21250, Maryland, USA（马里兰大学信息系统系，巴尔的摩 21250，马里兰州，美国）； College of Information Sciences and Technology, Pennsylvania State University, University Park, PA 16802, USA（宾夕法尼亚州立大学信息科学与技术学院，尤尼弗西蒂帕克，PA 16802，美国）； Department of Information Technology, Washington University of Science and Technology, Alexandria, VA（华盛顿科技大学信息技术系，亚历山德里亚，VA）； College of Engineering and Information Technology, University of Maryland, College Park, 20742, Maryland, USA（马里兰大学工程与信息技术学院，科利奇帕克 20742，马里兰州，美国）

AI总结本文提出一种结合可解释AI的深度入侵检测框架，利用CNN和LSTM捕捉流量序列的时间依赖性，通过SHAP实现模型可解释性，提升安全分析的透明度与可靠性。

DOI: 10.1109/SoutheastCon63549.2026.11476073

AI中文摘要

随着网络威胁的复杂性和频率增加，需要准确且可解释的入侵检测系统（IDS）。本文提出了一种新颖的IDS框架，整合可解释人工智能（XAI）以增强深度学习模型的透明性。该框架在NSL-KDD基准数据集上进行实验评估，显示优于传统IDS和黑箱深度学习模型。所提方法结合卷积神经网络（CNN）和长短期记忆网络（LSTM）以捕捉流量序列的时间依赖性。深度学习结果表明，CNN和LSTM的准确率均达到0.99，其中LSTM在宏平均精度、召回率和F-1分数上优于CNN。对于加权平均精度、召回率和F-1分数，两种模型得分几乎相同。为确保可解释性，XAI模型SHapley Additive exPlanations（SHAP）被纳入，使安全分析师能够理解和验证模型决策。SHAP指出，srv_serror_rate、dst_host_srv_serror_rate和serror_rate是两个模型中的一些重要特征。我们还基于IPIP6和Big Five人格特质进行了以信任为导向的专家调查，通过交互式UI评估系统的可靠性和可用性。本工作强调了在网络安全解决方案中结合性能和透明性的潜力，并通过自适应学习推荐未来改进以实现实时威胁检测。

英文摘要

The increasing complexity and frequency of cyber-threats demand intrusion detection systems (IDS) that are not only accurate but also interpretable. This paper presented a novel IDS framework that integrated Explainable Artificial Intelligence (XAI) to enhance transparency in deep learning models. The framework was evaluated experimentally using the benchmark dataset NSL-KDD, demonstrating superior performance compared to traditional IDS and black-box deep learning models. The proposed approach combined Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) networks for capturing temporal dependencies in traffic sequences. Our deep learning results showed that both CNN and LSTM reached 0.99 for accuracy, whereas LSTM outperformed CNN at macro average precision, recall, and F-1 score. For weighted average precision, recall, and F-1 score, both models scored almost similarly. To ensure interpretability, the XAI model SHapley Additive exPlanations (SHAP) was incorporated, enabling security analysts to understand and validate model decisions. Some notable influential features were srv_serror_rate, dst_host_srv_serror_rate, and serror_rate for both models, as pointed out by SHAP. We also conducted a trust-focused expert survey based on IPIP6 and Big Five personality traits via an interactive UI to evaluate the system's reliability and usability. This work highlighted the potential of combining performance and transparency in cybersecurity solutions and recommends future enhancements through adaptive learning for real-time threat detection.

2602.00058 2026-06-09 cs.CR cs.LG 版本更新

Comparison of Multiple Classifiers for Android Malware Detection with Emphasis on Feature Insights Using CICMalDroid 2020 Dataset

基于CICMalDroid 2020数据集的Android恶意软件检测多分类器比较：侧重特征洞察

Md Min-Ha-Zul Abedin, Tazqia Mehrub

发表机构 * Department of Biosystems Engineering, Auburn University（奥本大学生物系统工程系）； Independent Researcher（独立研究员）

AI总结本文比较了多个分类器在Android恶意软件检测中的性能，发现基于原始特征的梯度提升在准确率、精确率、召回率和F1值上表现最佳，同时揭示了关键驱动因素。

DOI: 10.1109/STI69347.2025.11367549

AI中文摘要

准确的Android恶意软件检测对于大规模保护用户至关重要。签名扫描器跟不上公共应用商店的快速发布周期。我们旨在通过将全面的数据集与严谨、透明的评估相结合来构建一个可信的检测器，并识别决策的可解释驱动因素。我们使用CICMalDroid2020数据集，其中包含17,341个应用，涵盖良性软件、广告软件、银行类恶意软件、短信恶意软件和风险软件。我们提取了301个静态特征和263个动态特征，形成一个564维的混合向量，然后在三种方案下评估了七个分类器：原始特征、主成分分析（PCA）和线性判别分析（LDA），采用70%训练和30%测试的划分。结果表明，基于原始特征的梯度提升表现最佳。XGBoost的准确率、精确率、召回率和F1值分别达到0.9747、0.9703、0.9731和0.9716，混淆矩阵显示恶意应用极少被标为良性。HistGradientBoosting的准确率为0.9741，F1值为0.9708，而CatBoost和随机森林略低，准确率分别为0.9678和0.9687，F1值分别为0.9636和0.9637。KNN和SVM表现较差。PCA降低了所有模型的性能，XGBoost的准确率降至0.9164，F1值降至0.8988。LDA的准确率保持在95%上下（mid 90s），并在投影中呈现出清晰可分的聚类。一个深度为2的代理树（surrogate tree）突显了包名、主活动和目标SDK是关键驱动因素。这些发现为Android恶意软件检测建立了高保真的监督基线，并表明丰富的混合特征与梯度提升相结合，为部署提供了实用且可解释的基础。

英文摘要

Accurate Android malware detection was critical for protecting users at scale. Signature scanners lagged behind fast release cycles on public app stores. We aimed to build a trustworthy detector by pairing a comprehensive dataset with a rigorous, transparent evaluation, and to identify interpretable drivers of decisions. We used CICMalDroid2020, which contained 17,341 apps across Benign, Adware, Banking, SMS malware, and Riskware. We extracted 301 static and 263 dynamic features into a 564 dimensional hybrid vector, then evaluated seven classifiers under three schemes, original features, principal component analysis, PCA, and linear discriminant analysis, LDA, with a 70 percent training and 30 percent test split. Results showed that gradient boosting on the original features performed best. XGBoost achieved 0.9747 accuracy, 0.9703 precision, 0.9731 recall, and 0.9716 F1, and the confusion matrix indicated rare benign labels for malicious apps. HistGradientBoosting reached 0.9741 accuracy and 0.9708 F1, while CatBoost and Random Forest were slightly lower at 0.9678 and 0.9687 accuracy with 0.9636 and 0.9637 F1. KNN and SVM lagged. PCA reduced performance for all models, with XGBoost dropping to 0.9164 accuracy and 0.8988 F1. LDA maintained mid 90s accuracy and clarified separable clusters in projections. A depth two surrogate tree highlighted package name, main activity, and target SDK as key drivers. These findings established high fidelity supervised baselines for Android malware detection and indicated that rich hybrid features with gradient boosting offered a practical and interpretable foundation for deployment.

2508.13747 2026-06-09 cs.LG 版本更新

DREAMS: Preserving both Local and Global Structure in Dimensionality Reduction

DREAMS: 在降维中保持局部和全局结构

Noël Kury, Dmitry Kobak, Sebastian Damrich

发表机构 * Hertie Institute for AI in Brain Health（人工智能与脑健康赫尔蒂研究所）； University of Tübingen（图宾根大学）； University of Tübingen, Germany（德国图宾根大学）

AI总结DREAMS通过一个简单的正则化项，将t-SNE的局部结构保持与PCA的全局结构保持相结合，生成介于两者之间的一系列嵌入，以平衡局部和全局结构。

Comments Transactions on Machine Learning Research (2026)

Journal ref: Transactions on Machine Learning Research (TMLR) 2026

AI中文摘要

降维技术广泛用于将高维数据可视化为二维。现有方法通常只保留局部（如t-SNE、UMAP）或全局（如MDS、PCA）结构，但没有方法能同时良好表示两者。本文提出DREAMS（多尺度增强降维），通过简单正则化项结合t-SNE的局部结构保持和PCA的全局结构保持。我们的方法在t-SNE局部结构良好的嵌入和PCA全局结构良好的嵌入之间生成一系列嵌入，高效平衡局部和全局结构保持。我们在十一组真实世界数据集上基准测试DREAMS，展示其在多尺度结构保持方面优于先前方法的能力。

英文摘要

Dimensionality reduction techniques are widely used for visualizing high-dimensional data in two dimensions. Existing methods are typically designed to preserve either local (e.g., $t$-SNE, UMAP) or global (e.g., MDS, PCA) structure of the data, but none of the established methods can represent both aspects well. In this paper, we present DREAMS (Dimensionality Reduction Enhanced Across Multiple Scales), a method that combines the local structure preservation of $t$-SNE with the global structure preservation of PCA via a simple regularization term. Our approach generates a spectrum of embeddings between the locally well-structured $t$-SNE embedding and the globally well-structured PCA embedding, efficiently balancing both local and global structure preservation. We benchmark DREAMS across eleven real-world datasets, showcasing qualitatively and quantitatively its superior ability to preserve structure across multiple scales compared to previous approaches.
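The abstract says only that DREAMS adds "a simple regularization term" to combine $t$-SNE with PCA; it does not give the formula. The sketch below shows one plausible shape for such an objective, a convex mix of a local-structure loss and a mean squared distance to a reference PCA embedding. The mixing weight `lam`, the function names, and the exact form of the penalty are assumptions, not the authors' definition.

```python
def squared_distance_to_reference(embedding, reference):
    """Mean squared Euclidean distance between matching embedding points."""
    total = sum(
        (a - b) ** 2
        for point, ref_point in zip(embedding, reference)
        for a, b in zip(point, ref_point)
    )
    return total / len(embedding)


def combined_loss(local_loss, embedding, pca_embedding, lam):
    """Hypothetical objective: (1 - lam) * local loss + lam * PCA penalty.

    lam = 0 recovers the purely local objective; lam = 1 only pulls the
    embedding toward the PCA layout.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    penalty = squared_distance_to_reference(embedding, pca_embedding)
    return (1.0 - lam) * local_loss + lam * penalty


embedding = [[0.0, 0.0], [1.0, 1.0]]
pca_embedding = [[0.0, 0.0], [1.0, 3.0]]
loss = combined_loss(5.0, embedding, pca_embedding, lam=0.5)
```

Sweeping `lam` from 0 to 1 in a setup like this is one way to obtain the "spectrum of embeddings" between a local and a global layout that the abstract describes.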

2405.07098 2026-06-09 cs.LG cs.AI math-ph math.MP math.OC stat.ML 版本更新

Interpretable global minima of deep ReLU neural networks on sequentially separable data

可解释的深度ReLU神经网络在依次可分数据上的全局极小值

Thomas Chen, Patrícia Muñoz Ewald

发表机构 * Department of Mathematics, University of Texas at Austin（德克萨斯大学奥斯汀分校数学系）

AI总结本文通过构造零损失分类器，利用累积参数确定截断映射，研究了在小且分离的簇数据及依次线性可分等价类情况下，深度ReLU网络的全局极小值描述。

Comments AMS Latex, 31 pages, 3 figures

2512.10745 2026-06-09 physics.med-ph cs.LG 版本更新

PMB-NN: Physiology-Centred Hybrid AI for Personalized Hemodynamic Monitoring from Photoplethysmography

PMB-NN：以生理为中心的混合AI，用于基于光电容积脉搏波（PPG）的个性化血流动力学监测

Yaowen Zhang, Libera Fresiello, Peter H. Veltink, Dirk W. Donker, Ying Wang

发表机构 * Department of Biomedical Signals and Systems, University of Twente（特文特大学生物医学信号与系统系）； Department of Cardiovascular and Respiratory Physiology, University of Twente（特文特大学心血管与呼吸生理学系）； Department of Intensive Care, University Medical Center Utrecht（乌得勒支大学医学中心重症医学科）

AI总结本文提出PMB-NN方法，结合生理模型与深度学习，实现个性化血流动力学监测，验证其在血压估计中的准确性、可解释性和合理性，展示了生理约束对混合AI框架的增强作用。

DOI: 10.1016/j.cmpb.2026.109479

AI中文摘要

连续监测血压（BP）以及外周阻力（R）和动脉顺应性（C）等血流动力学参数，对早期发现血管功能障碍至关重要。尽管光电容积脉搏波（PPG）可穿戴设备已日益普及，但现有数据驱动的BP估计方法缺乏可解释性。我们改进了此前提出的以生理为中心的混合AI方法，即基于生理模型的神经网络（PMB-NN），用于血压估计；该方法将深度学习与以R和C为参数的二元件Windkessel模型相结合，后者充当物理约束。PMB-NN模型利用PPG衍生的时间特征以受试者特异性方式训练，同时利用人口统计学信息推断一个中间变量：心输出量。我们在10名健康成人上验证了该模型，受试者在两天内进行静态和骑行活动，以检验模型的跨日鲁棒性，并与深度学习（DL）模型（FCNN、CNN-LSTM、Transformer）和独立的基于Windkessel的生理模型（PM）进行基准比较。验证从三个角度进行：准确性、可解释性和合理性。PMB-NN的收缩压准确性（MAE：7.2 mmHg）与DL基准相当，舒张压表现（MAE：3.9 mmHg）则低于DL模型。然而，PMB-NN的生理合理性高于DL基线和PM，表明混合架构统一并增强了生理原理和数据驱动技术各自的优势。除BP外，PMB-NN在训练过程中识别出R（ME：0.15 mmHg·s/ml）和C（ME：-0.35 ml/mmHg），其准确性与PM相近，证明嵌入的生理约束为混合AI框架赋予了可解释性。这些结果使PMB-NN成为日常血流动力学监测中一种平衡且有生理学依据的方案，可替代纯数据驱动的方法。

英文摘要

Continuous monitoring of blood pressure (BP) and hemodynamic parameters such as peripheral resistance (R) and arterial compliance (C) are critical for early vascular dysfunction detection. While photoplethysmography (PPG) wearables has gained popularity, existing data-driven methods for BP estimation lack interpretability. We advanced our previously proposed physiology-centered hybrid AI method-Physiological Model-Based Neural Network (PMB-NN)-in blood pressure estimation, that unifies deep learning with a 2-element Windkessel based model parameterized by R and C acting as physics constraints. The PMB-NN model was trained in a subject-specific manner using PPG-derived timing features, while demographic information was used to infer an intermediate variable: cardiac output. We validated the model on 10 healthy adults performing static and cycling activities across two days for model's day-to-day robustness, benchmarked against deep learning (DL) models (FCNN, CNN-LSTM, Transformer) and standalone Windkessel based physiological model (PM). Validation was conducted on three perspectives: accuracy, interpretability and plausibility. PMB-NN achieved systolic BP accuracy (MAE: 7.2 mmHg) comparable to DL benchmarks, diastolic performance (MAE: 3.9 mmHg) lower than DL models. However, PMB-NN exhibited higher physiological plausibility than both DL baselines and PM, suggesting that the hybrid architecture unifies and enhances the respective merits of physiological principles and data-driven techniques. Beyond BP, PMB-NN identified R (ME: 0.15 mmHg$\cdot$s/ml) and C (ME: -0.35 ml/mmHg) during training with accuracy similar to PM, demonstrating that the embedded physiological constraints confer interpretability to the hybrid AI framework. These results position PMB-NN as a balanced, physiologically grounded alternative to purely data-driven approaches for daily hemodynamic monitoring.
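For readers unfamiliar with the 2-element Windkessel model named above, the sketch below integrates its standard textbook form, C·dP/dt = Q(t) − P/R, where P is arterial pressure, Q is inflow, R is peripheral resistance and C is arterial compliance. How PMB-NN embeds this equation as a constraint is not described in the abstract; the function names, the forward-Euler scheme, and the numerical values are illustrative assumptions only.

```python
import math


def simulate_windkessel(p0, flow, r, c, dt):
    """Forward-Euler integration of C * dP/dt = Q(t) - P / R.

    p0   : initial pressure (mmHg)
    flow : iterable of inflow samples Q (ml/s), one per time step
    r    : peripheral resistance (mmHg*s/ml)
    c    : arterial compliance (ml/mmHg)
    dt   : time step (s)
    """
    pressures = [p0]
    p = p0
    for q in flow:
        p = p + dt * (q - p / r) / c
        pressures.append(p)
    return pressures


def analytic_diastolic_decay(p0, t, r, c):
    """Closed-form solution when inflow is zero: P(t) = P0 * exp(-t / (R*C))."""
    return p0 * math.exp(-t / (r * c))


# Illustrative values, not taken from the paper: one second of diastole.
trace = simulate_windkessel(p0=80.0, flow=[0.0] * 1000, r=1.0, c=1.5, dt=0.001)
```

With zero inflow the pressure decays exponentially with time constant R·C, which is why fitting R and C to a measured waveform yields physiologically interpretable parameters.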

2503.23822 2026-06-09 cs.LG 版本更新

Node Embeddings via Neighbor Embeddings

通过邻居嵌入进行节点嵌入

Jan Niklas Böhm, Marius Keute, Alica Guzmán, Sebastian Damrich, Andrew Draganov, Dmitry Kobak

发表机构 * Hertie AI, University of Tübingen, Germany（德国图宾根大学赫尔蒂人工智能研究所）； Department of Computer Science, Aarhus University, Denmark（丹麦奥胡斯大学计算机科学系）

AI总结本文提出图邻居嵌入框架，无需随机游走即可直接整合相邻节点的嵌入向量，优于现有节点嵌入算法，在局部结构保持方面表现突出，并应用于2D节点嵌入问题，获得优于现有图布局算法的t-SNE布局。

Comments Accepted to Transactions of Machine Learning Research (TMLR)

2510.06742 2026-06-09 cs.AI cs.LG 版本更新

MultiCNKG: Integrating Cognitive Neuroscience, Gene, and Disease Knowledge Graphs Using Large Language Models

MultiCNKG: 利用大语言模型整合认知神经科学、基因和疾病知识图谱

Ali Sarabadani, Kheirolah Rahsepar Fard

发表机构 * Department of Computer Engineering and Information Technology, University of Qom（库姆大学计算机工程与信息技术系）； University of Qom（库姆大学）

AI总结本文提出MultiCNKG框架，整合认知神经科学、基因和疾病知识图谱，利用大语言模型实现实体对齐和图谱增强，提升生物医学领域知识图谱的整合与应用能力。

AI中文摘要

大语言模型（LLMs）的出现革新了生物医学和认知科学中知识图谱（KGs）的整合，克服了传统机器学习方法在捕捉基因、疾病和认知过程之间复杂语义联系方面的局限。我们介绍了MultiCNKG，一种创新框架，整合了三个关键知识源：认知神经科学知识图谱（CNKG），包含2.9K节点和4.3K边，涵盖9种节点类型和20种边类型；基因本体（GO），包含43K节点和75K边，涵盖3种节点类型和4种边类型；疾病本体（DO），包含11.2K节点和8.8K边，涵盖1种节点类型和2种边类型。利用GPT-4等LLMs，我们进行实体对齐、语义相似性计算和图谱增强，创建了一个连接遗传机制、神经疾病和认知功能的统一知识图谱。所得的MultiCNKG包含6.9K节点，涵盖5种类型（如基因、疾病、认知过程），以及11.3K边，涵盖7种类型（如导致、关联、调控），提供了从分子到行为层面的多层次视角。精确率（85.20%）、召回率（87.30%）、覆盖率（92.18%）、图一致性（82.50%）、新颖性检测（40.28%）和专家验证（89.50%）等评估指标证实了其鲁棒性和一致性。使用TransE（MR：391，MRR：0.411）和RotatE（MR：263，MRR：0.395）等模型进行的链接预测评估显示，其性能与FB15k-237和WN18RR等基准上的结果相比具有竞争力。该图谱可推动个性化医学、认知障碍诊断和认知神经科学假设构建方面的应用。

英文摘要

The advent of large language models (LLMs) has revolutionized the integration of knowledge graphs (KGs) in biomedical and cognitive sciences, overcoming limitations in traditional machine learning methods for capturing intricate semantic links among genes, diseases, and cognitive processes. We introduce MultiCNKG, an innovative framework that merges three key knowledge sources: the Cognitive Neuroscience Knowledge Graph (CNKG) with 2.9K nodes and 4.3K edges across 9 node types and 20 edge types; Gene Ontology (GO) featuring 43K nodes and 75K edges in 3 node types and 4 edge types; and Disease Ontology (DO) comprising 11.2K nodes and 8.8K edges with 1 node type and 2 edge types. Leveraging LLMs like GPT-4, we conduct entity alignment, semantic similarity computation, and graph augmentation to create a cohesive KG that interconnects genetic mechanisms, neurological disorders, and cognitive functions. The resulting MultiCNKG encompasses 6.9K nodes across 5 types (e.g., Genes, Diseases, Cognitive Processes) and 11.3K edges spanning 7 types (e.g., Causes, Associated with, Regulates), facilitating a multi-layered view from molecular to behavioral domains. Assessments using metrics such as precision (85.20%), recall (87.30%), coverage (92.18%), graph consistency (82.50%), novelty detection (40.28%), and expert validation (89.50%) affirm its robustness and coherence. Link prediction evaluations with models like TransE (MR: 391, MRR: 0.411) and RotatE (MR: 263, MRR: 0.395) show competitive performance against benchmarks like FB15k-237 and WN18RR. This KG advances applications in personalized medicine, cognitive disorder diagnostics, and hypothesis formulation in cognitive neuroscience.
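The link-prediction figures above use MR (mean rank) and MRR (mean reciprocal rank) without defining them. The sketch below computes both under their standard definitions from the 1-based rank of the correct entity for each test triple; the abstract does not state the ranking protocol (for example, raw versus filtered ranks), and the rank lists here are invented for illustration.

```python
def mean_rank(ranks):
    """Average 1-based rank of the correct entity (lower is better)."""
    return sum(ranks) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank (higher is better); dominated by top-ranked hits."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Invented ranks, not from the paper: one model is usually right but
# occasionally far off, the other is consistently mediocre.
mostly_top = [1, 1, 1000]
uniformly_mid = [5, 5, 5]

summary = {
    "mostly_top": (mean_rank(mostly_top), mean_reciprocal_rank(mostly_top)),
    "uniformly_mid": (mean_rank(uniformly_mid), mean_reciprocal_rank(uniformly_mid)),
}
```

Because MR is sensitive to a few very poor ranks while MRR rewards frequent top hits, one model can show a worse MR and a better MRR than another, which is the pattern in the TransE and RotatE figures quoted above.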

2507.17726 2026-06-09 cond-mat.dis-nn cond-mat.mtrl-sci cs.LG 版本更新

Deep Generative Learning of Magnetic Frustration in Artificial Spin Ice from Magnetic Force Microscopy Images

从磁力显微镜图像中深度生成学习人工自旋冰中的磁阻挫

Arnab Neogi, Suryakant Mishra, Prasad P Iyer, Tzu-Ming Lu, Ezra Bussmann, Sergei Tretiak, Andrew Crandall Jones, Jian-Xin Zhu

发表机构 * Theoretical Division, Los Alamos National Laboratory（洛斯阿拉莫斯国家实验室理论部）； Center for Integrated Nanotechnologies, Los Alamos National Laboratory（洛斯阿拉莫斯国家实验室集成纳米技术中心）； Center for Integrated Nanotechnologies, Sandia National Laboratory（桑迪亚国家实验室集成纳米技术中心）

AI总结本文通过深度学习方法从磁力显微镜图像中自动计算自旋冰结构的磁矩和方向，利用变分自编码器生成合成图像并提取特征，以减少实验和分割误差，实现对阻挫顶点和纳米磁性段的精确识别，从而辅助设计优化的自旋冰配置。

DOI: 10.1038/s41524-026-02124-8

AI中文摘要

规模日益增大的原子分辨率显微图像数据集促进了机器学习方法的发展，用于识别和分析图像中蕴含的细微物理现象。在本工作中，蜂窝晶格自旋冰样本的显微图像被用作数据集，用于自动计算自旋冰配置的净磁矩和方向取向。在工作流程的第一阶段，机器学习模型被训练以准确预测自旋冰结构中的磁矩和方向。变分自编码器（VAEs），一种新兴的无监督深度学习技术，被用于生成高质量的合成磁力显微镜（MFM）图像并提取潜在特征表示，从而减少实验和分割误差。所提方法的第二阶段能够精确识别和预测阻挫顶点和纳米磁性段，有效关联显微图像的结构和功能方面。这有助于设计具有受控阻挫模式的优化自旋冰配置，使按需合成成为可能。

英文摘要

Increasingly large datasets of microscopic images with atomic resolution facilitate the development of machine learning methods to identify and analyze subtle physical phenomena embedded within the images. In this work, microscopic images of honeycomb lattice spin-ice samples serve as datasets from which we automate the calculation of net magnetic moments and directional orientations of spin-ice configurations. In the first stage of our workflow, machine learning models are trained to accurately predict magnetic moments and directions within spin-ice structures. Variational Autoencoders (VAEs), an emergent unsupervised deep learning technique, are employed to generate high-quality synthetic magnetic force microscopy (MFM) images and extract latent feature representations, thereby reducing experimental and segmentation errors. The second stage of proposed methodology enables precise identification and prediction of frustrated vertices and nanomagnetic segments, effectively correlating structural and functional aspects of microscopic images. This facilitates the design of optimized spin-ice configurations with controlled frustration patterns, enabling potential on-demand synthesis.

2507.15152 2026-06-09 cs.CL cs.AI cs.LG 版本更新

What Level of Automation is "Good Enough"? A Benchmark of Large Language Models for Meta-Analysis Data Extraction

何种自动化水平才算“足够好”？面向元分析数据提取的大型语言模型基准测试

Lingbo Li, Anuradha Mathrani, Teo Susnjak

发表机构 * School of Mathematical and Computational Sciences, Massey University, Auckland, New Zealand（梅西大学数学与计算科学学院，新西兰奥克兰）

AI总结本文评估了三种大型语言模型在医疗领域数据提取中的性能，发现定制提示能显著提升召回率，提出三层次指南以平衡自动化与专家监督。

DOI: 10.1017/rsm.2025.10066
Journal ref: Research Synthesis Methods (2026)

AI中文摘要

自动化从全文随机对照试验（RCT）中提取数据用于元分析仍是一个重大挑战。本研究评估了三种LLM（Gemini-2.0-flash、Grok-3、GPT-4o-mini）在高血压、糖尿病和骨科三个医学领域中统计结果、偏倚风险评估和研究层面特征任务上的实际表现。我们测试了四种不同的提示策略（基本提示、自我反思提示、模型集成和定制提示）以确定如何提高提取质量。所有模型均表现出高精度，但普遍存在召回率低的问题，因遗漏关键信息。我们发现定制提示是最有效的，召回率可提升高达15%。基于此分析，我们提出了一套三层指南，根据任务复杂性和风险匹配数据类型与适当的自动化水平。本研究为现实世界中的元分析自动化数据提取提供了实用建议，通过有针对性的、任务特定的自动化平衡LLM效率与专家监督。

英文摘要

Automating data extraction from full-text randomised controlled trials (RCTs) for meta-analysis remains a significant challenge. This study evaluates the practical performance of three LLMs (Gemini-2.0-flash, Grok-3, GPT-4o-mini) across tasks involving statistical results, risk-of-bias assessments, and study-level characteristics in three medical domains: hypertension, diabetes, and orthopaedics. We tested four distinct prompting strategies (basic prompting, self-reflective prompting, model ensemble, and customised prompts) to determine how to improve extraction quality. All models demonstrate high precision but consistently suffer from poor recall by omitting key information. We found that customised prompts were the most effective, boosting recall by up to 15%. Based on this analysis, we propose a three-tiered set of guidelines for using LLMs in data extraction, matching data types to appropriate levels of automation based on task complexity and risk. Our study offers practical advice for automating data extraction in real-world meta-analyses, balancing LLM efficiency with expert oversight through targeted, task-specific automation.
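The "high precision, poor recall" pattern reported above can be illustrated with a small set-based calculation. This is a sketch only: the abstract does not say how extracted items were matched against the reference, so exact matching of (study, field) pairs, the helper name `precision_recall`, and the example field names are all assumptions.

```python
def precision_recall(extracted, gold):
    """Set-based precision and recall for extracted data items.

    precision = correct extracted items / all extracted items
    recall    = correct extracted items / all reference (gold) items
    """
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical reference items for one trial, and a model output that is
# entirely correct but omits half of them.
gold = {
    ("trial_1", "sample_size"),
    ("trial_1", "mean_difference"),
    ("trial_1", "confidence_interval"),
    ("trial_1", "randomisation_method"),
}
extracted = {
    ("trial_1", "sample_size"),
    ("trial_1", "mean_difference"),
}

precision, recall = precision_recall(extracted, gold)
```

Here every extracted item is correct, so precision is 1.0, while the omissions pull recall down to 0.5; this is the failure mode that motivates keeping expert oversight for higher-risk data types.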

2507.02606 2026-06-09 cs.SD cs.AI cs.CR cs.LG eess.AS 版本更新

De-AntiFake: Rethinking the Protective Perturbations Against Voice Cloning Attacks

De-AntiFake：重新思考对抗语音克隆攻击的保护扰动

Wei Fan, Kejiang Chen, Chang Liu, Weiming Zhang, Nenghai Yu

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结本文系统评估针对语音克隆的保护性扰动，并提出一种两阶段净化方法：先净化受扰动的语音，再利用音素指导进行精炼；实验表明，该方法在破坏语音克隆防御方面优于现有净化方法。

Comments Accepted by ICML 2025

Journal ref: Proceedings of the 42nd International Conference on Machine Learning, PMLR 267, 2025

AI中文摘要

随着语音生成模型的快速发展，语音克隆（VC）带来的隐私和安全问题日益突出。近期研究尝试通过引入对抗扰动来阻止未经授权的语音克隆，但有决心的攻击者可以削弱这些保护性扰动并成功执行VC。本文首次在包含扰动净化的现实威胁模型下，系统评估这些针对VC的保护性扰动。研究发现，尽管现有净化方法能中和相当一部分保护性扰动，但它们仍会导致VC模型特征空间的失真，从而降低VC性能。基于此，我们提出一种新的两阶段净化方法：（1）净化受扰动的语音；（2）利用音素指导对其进行精炼，使其与干净语音分布对齐。实验结果表明，我们的方法在破坏VC防御方面优于最先进的净化方法。本研究揭示了基于对抗扰动的VC防御的局限性，并强调迫切需要更鲁棒的解决方案来缓解VC带来的安全和隐私风险。代码和音频样本可在https://de-antifake.github.io获取。

英文摘要

The rapid advancement of speech generation models has heightened privacy and security concerns related to voice cloning (VC). Recent studies have investigated disrupting unauthorized voice cloning by introducing adversarial perturbations. However, determined attackers can mitigate these protective perturbations and successfully execute VC. In this study, we conduct the first systematic evaluation of these protective perturbations against VC under realistic threat models that include perturbation purification. Our findings reveal that while existing purification methods can neutralize a considerable portion of the protective perturbations, they still lead to distortions in the feature space of VC models, which degrades the performance of VC. From this perspective, we propose a novel two-stage purification method: (1) Purify the perturbed speech; (2) Refine it using phoneme guidance to align it with the clean speech distribution. Experimental results demonstrate that our method outperforms state-of-the-art purification methods in disrupting VC defenses. Our study reveals the limitations of adversarial perturbation-based VC defenses and underscores the urgent need for more robust solutions to mitigate the security and privacy risks posed by VC. The code and audio samples are available at https://de-antifake.github.io.

2502.09252 2026-06-09 cs.LG 版本更新

On the Importance of Embedding Norms in Self-Supervised Learning

关于嵌入范数在自监督学习中的重要性

Andrew Draganov, Sharvaree Vadgama, Sebastian Damrich, Jan Niklas Böhm, Lucas Maes, Dmitry Kobak, Erik Bekkers

发表机构 * University of Amsterdam（阿姆斯特丹大学）

AI总结本文研究了嵌入范数在自监督学习中的作用，通过理论分析和实验表明范数影响收敛速度和网络置信度，且较小的范数对应意外样本。

Journal ref: International Conference on Machine Learning (ICML) 2025

AI中文摘要

自监督学习（SSL）允许在无监督信号的情况下训练数据表示，已成为机器学习的重要范式。大多数SSL方法使用嵌入向量的余弦相似度，从而有效将数据嵌入到超球面上。虽然这似乎表明嵌入范数在SSL中不起作用，但一些近期工作表明嵌入范数与网络收敛和置信度有关。本文解决这一明显矛盾，系统地确立嵌入范数在SSL训练中的作用。通过理论分析、模拟和实验，我们证明嵌入范数（i）控制SSL收敛速度（ii）编码网络置信度，较小的范数对应意外样本。此外，我们还表明操纵嵌入范数对收敛速度有显著影响。我们的发现表明，SSL嵌入范数对于理解和优化网络行为至关重要。

英文摘要

Self-supervised learning (SSL) allows training data representations without a supervised signal and has become an important paradigm in machine learning. Most SSL methods employ the cosine similarity between embedding vectors and hence effectively embed data on a hypersphere. While this seemingly implies that embedding norms cannot play any role in SSL, a few recent works have suggested that embedding norms have properties related to network convergence and confidence. In this paper, we resolve this apparent contradiction and systematically establish the embedding norm's role in SSL training. Using theoretical analysis, simulations, and experiments, we show that embedding norms (i) govern SSL convergence rates and (ii) encode network confidence, with smaller norms corresponding to unexpected samples. Additionally, we show that manipulating embedding norms can have large effects on convergence speed. Our findings demonstrate that SSL embedding norms are integral to understanding and optimizing network behavior.
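
The norm dependence described in this abstract can be illustrated with a small, generic calculation. This is a sketch in plain Python, not the authors' code: cosine similarity is unchanged when an embedding is rescaled, but its gradient with respect to that embedding shrinks in proportion to 1/‖u‖, which is one standard way a larger norm can slow gradient-based updates. The helper names (`cosine_similarity`, `cosine_grad_wrt_u`) and the example vectors are hypothetical.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    """Euclidean norm."""
    return math.sqrt(dot(u, u))

def cosine_similarity(u, v):
    """Cosine similarity; depends only on the directions of u and v."""
    return dot(u, v) / (norm(u) * norm(v))

def cosine_grad_wrt_u(u, v):
    """Gradient of cosine_similarity(u, v) with respect to u:
    (v/|v| - cos(u, v) * u/|u|) / |u|."""
    nu, nv = norm(u), norm(v)
    c = cosine_similarity(u, v)
    return [(b / nv - c * a / nu) / nu for a, b in zip(u, v)]

# Hypothetical embeddings; u_big points the same way as u with 4x the norm.
u = [1.0, 2.0, -1.0]
v = [0.5, -1.0, 3.0]
u_big = [4.0 * a for a in u]

sim_small = cosine_similarity(u, v)
sim_big = cosine_similarity(u_big, v)

# Ratio of gradient magnitudes: small-norm gradient over large-norm gradient.
grad_ratio = norm(cosine_grad_wrt_u(u, v)) / norm(cosine_grad_wrt_u(u_big, v))
```

Here `sim_small` and `sim_big` agree, while `grad_ratio` equals the factor by which the norm was scaled, so the loss value ignores the norm but the gradient does not.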

2503.17400 2026-06-09 physics.flu-dyn cs.LG 版本更新

TripNet: Learning Large-scale High-fidelity 3D Car Aerodynamics with Triplane Networks

TripNet：利用三平面网络学习大规模高保真3D汽车空气动力学

Qian Chen, Mohamed Elrefaie, Angela Dai, Faez Ahmed

发表机构 * Department of Mechanical Engineering, Massachusetts Institute of Technology（麻省理工学院机械工程系）； Department of Computer Science, Technical University of Munich（慕尼黑工业大学计算机科学系）

AI总结 TripNet通过三平面网络实现高分辨率3D汽车空气动力学模拟，无需依赖网格结构，提供高效准确的CFD预测。

DOI: 10.1063/5.0324695

AI中文摘要

代理建模已成为加速计算流体力学（CFD）模拟的强大工具。现有基于点云、体素、网格或图的3D几何学习模型依赖显式几何表示，内存消耗大且分辨率受限。对于具有数百万节点和单元的大规模模拟，现有模型因依赖网格分辨率而必须大幅下采样，导致精度下降。我们提出TripNet，一种基于三平面的神经框架，将3D几何隐式编码为维度固定的紧凑连续特征图。与依赖网格的方法不同，TripNet可扩展到高分辨率模拟而不增加内存成本，并能以查询方式在任意空间位置进行CFD预测，不依赖网格连接关系或预定义节点。TripNet在DrivAerNet和DrivAerNet++数据集上取得了最先进的性能，能够准确预测阻力系数、表面压力和完整的3D流场。凭借支持多种模拟任务的统一三平面骨干网络，TripNet为传统CFD求解器和现有代理模型提供了可扩展、准确且高效的替代方案。

英文摘要

Surrogate modeling has emerged as a powerful tool to accelerate Computational Fluid Dynamics (CFD) simulations. Existing 3D geometric learning models based on point clouds, voxels, meshes, or graphs depend on explicit geometric representations that are memory-intensive and resolution-limited. For large-scale simulations with millions of nodes and cells, existing models require aggressive downsampling due to their dependence on mesh resolution, resulting in degraded accuracy. We present TripNet, a triplane-based neural framework that implicitly encodes 3D geometry into a compact, continuous feature map with fixed dimension. Unlike mesh-dependent approaches, TripNet scales to high-resolution simulations without increasing memory cost, and enables CFD predictions at arbitrary spatial locations in a query-based fashion, independent of mesh connectivity or predefined nodes. TripNet achieves state-of-the-art performance on the DrivAerNet and DrivAerNet++ datasets, accurately predicting drag coefficients, surface pressure, and full 3D flow fields. With a unified triplane backbone supporting multiple simulation tasks, TripNet offers a scalable, accurate, and efficient alternative to traditional CFD solvers and existing surrogate models.

2311.07065 2026-06-09 cs.LG cs.AI math-ph math.MP math.OC stat.ML 版本更新

On non-approximability of zero loss global ${\mathcal L}^2$ minimizers by gradient descent in Deep Learning

论深度学习中梯度下降对零损失全局 ${\mathcal L}^2$ 最小化器的不可逼近性

Thomas Chen, Patricia Muñoz Ewald

发表机构 * Department of Mathematics, University of Texas at Austin（德克萨斯大学奥斯汀分校数学系）

AI总结本文分析了深度学习中梯度下降算法的几何特性，指出在欠参数化网络中，零损失最小化通常无法实现，因此训练输入分布必须非典型才能产生零损失最小化器。

Comments AMS Latex, 7 pages. Typos corrected, Corollary 1.6 upgraded to Theorem, acknowledgment added

2405.17151 2026-06-09 cs.LG 版本更新

Smoke and Mirrors in Causal Downstream Tasks

因果下游任务中的烟与幻影

Riccardo Cadei, Lukas Lindorfer, Sylvia Cremer, Cordelia Schmid, Francesco Locatello

发表机构 * Institute of Science and Technology Austria (ISTA)（奥地利科学技术研究所）； Inria（法国国家信息与自动化研究所）； Ecole Normale Supérieure（巴黎高等师范学院）； CNRS（法国国家科学研究中心）； PSL Research University（巴黎文理研究大学）

AI总结本文研究随机对照试验中基于高维观测的处理效应估计，从理论上指出文献中许多常见做法可能导致估计有偏，并发布真实世界基准ISTAnt；实验表明采样与建模选择显著影响因果估计的准确性，而分类准确率并不能作为其替代指标。

DOI: 10.52202/079017-0820

AI中文摘要

机器学习和人工智能有潜力改变数据驱动的科学发现，能够为多种科学现象提供准确的预测。由于许多科学问题本质上是因果问题，本文研究处理效应估计这一因果推断任务，其中感兴趣的结局以高维观测的形式记录于随机对照试验（RCT）中。尽管这是最简单的因果设定，且非常适合深度学习，我们从理论上发现，文献中的许多常见选择可能导致估计有偏。为检验这些因素的实际影响，我们采集了ISTAnt，这是首个面向高维观测因果推断下游任务的真实世界基准，其本身是一项RCT，研究花园蚁（Lasius neglectus）如何通过卫生梳理行为应对施加在群体成员身上的微粒。通过比较6480个由最先进视觉骨干网络微调得到的模型，我们发现采样和建模选择显著影响因果估计的准确性，而分类准确率并不能作为其替代指标。我们还在一个可控制因果模型的合成视觉数据集上重复了该分析，进一步验证了结论。我们的结果表明，未来的基准应仔细考虑真实的下游科学问题，尤其是因果问题。此外，我们给出了表示学习方法的指导原则，以帮助回答科学中的因果问题。

英文摘要

Machine Learning and AI have the potential to transform data-driven scientific discovery, enabling accurate predictions for several scientific phenomena. As many scientific questions are inherently causal, this paper looks at the causal inference task of treatment effect estimation, where the outcome of interest is recorded in high-dimensional observations in a Randomized Controlled Trial (RCT). Despite being the simplest possible causal setting and a perfect fit for deep learning, we theoretically find that many common choices in the literature may lead to biased estimates. To test the practical impact of these considerations, we recorded ISTAnt, the first real-world benchmark for causal inference downstream tasks on high-dimensional observations as an RCT studying how garden ants (Lasius neglectus) respond to microparticles applied onto their colony members by hygienic grooming. Comparing 6 480 models fine-tuned from state-of-the-art visual backbones, we find that the sampling and modeling choices significantly affect the accuracy of the causal estimate, and that classification accuracy is not a proxy thereof. We further validated the analysis, repeating it on a synthetically generated visual data set controlling the causal model. Our results suggest that future benchmarks should carefully consider real downstream scientific questions, especially causal ones. Further, we highlight guidelines for representation learning methods to help answer causal questions in the sciences.

2501.12421 2026-06-09 cs.LG cs.AI q-bio.QM 版本更新

Tackling Small Sample Survival Analysis via Transfer Learning: A Study of Colorectal Cancer Prognosis

通过迁移学习解决小样本生存分析：结直肠癌预后的研究

Yonghao Zhao, Changtao Li, Chi Shu, Qingbin Wu, Hong Li, Chuan Xu, Tianrui Li, Ziqiang Wang, Zhipeng Luo, Yazhou He

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结本文通过迁移学习提升小样本生存分析，针对结直肠癌预后，改进了多种生存模型，如DeepSurv、Cox-CC、DeepHit和Random Survival Forest，实验结果显示迁移学习显著提升了模型性能。

DOI: 10.1016/j.artmed.2026.103426
Journal ref: Artificial Intelligence in Medicine, 178:103426, 2026

AI中文摘要

生存预后对医疗信息学至关重要。实践者常面临小规模临床数据，尤其是癌症患者病例，这些数据往往不足以归纳出对生存预测有用的模式。本文借助迁移学习处理小样本生存分析问题，迁移学习可以利用从其他数据中预先学到的相关知识来增强目标分析。我们针对常见生存模型提出并开发了多种迁移学习方法。对于DeepSurv、Cox-CC（基于Cox的神经网络）和DeepHit（端到端深度学习模型）等参数模型，采用预训练和微调等标准迁移学习技术。对于随机生存森林（Random Survival Forest，RSF）等非参数模型，提出新的迁移生存森林（TSF）模型，从源任务迁移树结构并用目标数据微调。我们在结直肠癌（CRC）预后任务上评估了这些迁移学习方法。源数据为27,379名SEER CRC I期患者，目标数据为728名来自华西医院的CRC I期患者。经迁移学习增强后，Cox-CC的 $C^{td}$ 值从0.7868提升至0.8111，DeepHit从0.8085提升至0.8135，DeepSurv从0.7722提升至0.8043，RSF从0.7940提升至0.8297（最高性能）。在训练数据量小至50例时，所有模型的提升更为显著。结论：通过恰当设计的迁移学习技术，现有用于癌症预后的生存模型可以得到增强和改进。本研究使用的源代码可在 https://github.com/YonghaoZhao722/TSF 获取。

英文摘要

Survival prognosis is crucial for medical informatics. Practitioners often confront small-sized clinical data, especially cancer patient cases, which can be insufficient to induce useful patterns for survival predictions. This study deals with small sample survival analysis by leveraging transfer learning, a useful machine learning technique that can enhance the target analysis with related knowledge pre-learned from other data. We propose and develop various transfer learning methods designed for common survival models. For parametric models such as DeepSurv, Cox-CC (Cox-based neural networks), and DeepHit (end-to-end deep learning model), we apply standard transfer learning techniques like pretraining and fine-tuning. For non-parametric models such as Random Survival Forest, we propose a new transfer survival forest (TSF) model that transfers tree structures from source tasks and fine-tunes them with target data. We evaluated the transfer learning methods on colorectal cancer (CRC) prognosis. The source data are 27,379 SEER CRC stage I patients, and the target data are 728 CRC stage I patients from the West China Hospital. When enhanced by transfer learning, Cox-CC's $C^{td}$ value was boosted from 0.7868 to 0.8111, DeepHit's from 0.8085 to 0.8135, DeepSurv's from 0.7722 to 0.8043, and RSF's from 0.7940 to 0.8297 (the highest performance). All models trained with data as small as 50 demonstrated even more significant improvement. Conclusions: Therefore, the current survival models used for cancer prognosis can be enhanced and improved by properly designed transfer learning techniques. The source code used in this study is available at https://github.com/YonghaoZhao722/TSF.

2411.18385 2026-06-09 cs.LG cs.CV stat.ML 版本更新

Federated Learning with Uncertainty and Personalization via Efficient Second-order Optimization

基于高效二阶优化的联邦学习中的不确定性与个性化

Shivam Pal, Aishwarya Gupta, Saqib Sarwar, Piyush Rai

发表机构 * Department of Computer Science and Engineering, IIT Kanpur, India（印度理工学院坎普尔分校计算机科学与工程系）

AI总结本文提出一种高效的联邦学习方法，利用二阶优化减少计算和通信成本，同时保留贝叶斯方法的不确定性与个性化优势。

Journal ref: Transactions on Machine Learning Research (TMLR), 2025

AI中文摘要

联邦学习（FL）已发展为一种有前景的方法，用于在不同客户端上协作学习分布式和异质数据，而无需数据离开客户端。最近的FL研究倡导采用贝叶斯方法，因为它提供了一种系统的方法来考虑模型和预测不确定性，通过学习客户端和/或服务器模型的后验分布。此外，贝叶斯FL自然能够实现个性化，以处理不同客户端上的数据异质性，通过让每个客户端学习其独特的个性化模型。特别是，层次贝叶斯方法使所有客户端都能学习其个性化模型，同时通过服务器提供的先验分布考虑共同点。然而，尽管有这些优势，贝叶斯方法在FL中可能计算成本高且通信成本高，因为需要计算和发送后验分布。我们提出了一种新的贝叶斯FL方法，采用高效的二阶优化方法，其计算成本与Adam等一阶优化方法相似，同时提供贝叶斯方法的多种优势（例如不确定性、个性化），并且在标准和个性化FL设置中都比最先进的贝叶斯FL方法更高效和准确。我们的方法在预测准确性和不确定性估计方面优于基线方法，包括基于优化和贝叶斯FL的方法。

英文摘要

Federated Learning (FL) has emerged as a promising method to collaboratively learn from decentralized and heterogeneous data available at different clients without the requirement of data ever leaving the clients. Recent works on FL have advocated taking a Bayesian approach to FL as it offers a principled way to account for the model and predictive uncertainty by learning a posterior distribution for the client and/or server models. Moreover, Bayesian FL also naturally enables personalization in FL to handle data heterogeneity across the different clients by having each client learn its own distinct personalized model. In particular, the hierarchical Bayesian approach enables all the clients to learn their personalized models while also taking into account the commonalities via a prior distribution provided by the server. However, despite their promise, Bayesian approaches for FL can be computationally expensive and can have high communication costs as well because of the requirement of computing and sending the posterior distributions. We present a novel Bayesian FL method using an efficient second-order optimization approach, with a computational cost that is similar to first-order optimization methods like Adam, but also provides the various benefits of the Bayesian approach for FL (e.g., uncertainty, personalization), while also being significantly more efficient and accurate than SOTA Bayesian FL methods (both for standard as well as personalized FL settings). Our method achieves improved predictive accuracies as well as better uncertainty estimates as compared to the baselines which include both optimization based as well as Bayesian FL methods.

2311.03087 2026-06-09 cs.LG math.AT 版本更新

Persistent Homology for High-dimensional Data Based on Spectral Methods

基于谱方法的高维数据持续同调

Sebastian Damrich, Philipp Berens, Dmitry Kobak

发表机构 * Hertie Institute for AI in Brain Health, University of Tübingen, Germany（德国图宾根大学赫蒂脑健康人工智能研究所）； Tübingen AI Center, Germany（德国图宾根人工智能中心）； IWR, Heidelberg University, Germany（德国海德堡大学科学计算跨学科中心 IWR）

AI总结本文提出利用谱方法中的扩散距离和有效电阻检测高维噪声下的拓扑结构，推导出有效电阻的闭式公式，并应用于单细胞RNA测序数据以识别细胞周期环路。

Comments NeurIPS 2024, 54 pages, 44 figures

Journal ref: Conference on Neural Information Processing Systems (NeurIPS) 2024

AI中文摘要

持续同调是分析点云拓扑结构（如是否存在环或空洞）的常用计算工具。然而，许多内在维度较低的真实世界数据集位于维度高得多的环境空间中。我们表明，在这种情况下，传统持续同调对噪声非常敏感，无法检测出正确的拓扑结构，现有的持续同调改进方法同样如此。作为补救，我们发现数据的k近邻图上的谱距离（如扩散距离和有效电阻）即使在存在高维噪声时也能检测出正确的拓扑结构。此外，我们推导出有效电阻的一个新的闭式公式，并描述其与扩散距离的关系。最后，我们将这些方法应用于高维单细胞RNA测序数据，表明谱距离能够稳健地检测细胞周期环。

英文摘要

Persistent homology is a popular computational tool for analyzing the topology of point clouds, such as the presence of loops or voids. However, many real-world datasets with low intrinsic dimensionality reside in an ambient space of much higher dimensionality. We show that in this case traditional persistent homology becomes very sensitive to noise and fails to detect the correct topology. The same holds true for existing refinements of persistent homology. As a remedy, we find that spectral distances on the k-nearest-neighbor graph of the data, such as diffusion distance and effective resistance, allow to detect the correct topology even in the presence of high-dimensional noise. Moreover, we derive a novel closed-form formula for effective resistance, and describe its relation to diffusion distances. Finally, we apply these methods to high-dimensional single-cell RNA-sequencing data and show that spectral distances allow robust detection of cell cycle loops.
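
For readers unfamiliar with effective resistance, the classical definition can be sketched as follows. This is a minimal illustration in plain Python, assuming a small connected graph whose edge weights are treated as conductances; it is not the paper's closed-form formula or its k-nearest-neighbor pipeline, and the function name `effective_resistance` is hypothetical. The idea is to inject one unit of current at node `i`, extract it at node `j`, and read off the resulting potential difference from the graph Laplacian.

```python
def effective_resistance(n, edges, i, j):
    """Effective resistance between nodes i and j of a connected graph.

    n:     number of nodes, labelled 0..n-1
    edges: list of (a, b, weight) with weight > 0 acting as conductance
    Solves L x = e_i - e_j with node j grounded (x_j = 0); the answer is x_i.
    """
    if i == j:
        return 0.0

    # Build the weighted graph Laplacian.
    lap = [[0.0] * n for _ in range(n)]
    for a, b, w in edges:
        lap[a][a] += w
        lap[b][b] += w
        lap[a][b] -= w
        lap[b][a] -= w

    # Ground node j: drop its row and column, append the right-hand side.
    keep = [k for k in range(n) if k != j]
    m = len(keep)
    aug = [[lap[r][c] for c in keep] + [1.0 if r == i else 0.0] for r in keep]

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(m):
            if r != col:
                f = aug[r][col] / aug[col][col]
                for c in range(col, m + 1):
                    aug[r][c] -= f * aug[col][c]

    potentials = {keep[r]: aug[r][m] / aug[r][r] for r in range(m)}
    return potentials[i]

# Path 0-1-2 with unit edges: two unit resistors in series.
r_path = effective_resistance(3, [(0, 1, 1.0), (1, 2, 1.0)], 0, 2)

# Triangle with unit edges: one resistor in parallel with two in series.
r_triangle = effective_resistance(3, [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)], 0, 1)
```

The two toy graphs behave like resistor networks: the series path gives a resistance of 2, and the triangle gives 2/3 because the direct edge is in parallel with the two-edge detour. Additional paths between two nodes lower their effective resistance, which is why it reflects graph connectivity rather than only the shortest path.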

2407.13288 2026-06-09 cs.LG 版本更新

Hierarchical Stage-Wise Training of Linked Deep Neural Networks for Multi-Building and Multi-Floor Indoor Localization Based on Wi-Fi RSSI Fingerprinting

基于Wi-Fi RSSI指纹的多建筑多楼层室内定位的分层阶段式训练链接深度神经网络

Sihao Li, Kyeong Soo Kim, Zhe Tang, Jeremy S. Smith

发表机构 * School of Advanced Technology, Xi’an Jiaotong-Liverpool University（西交利物浦大学先进技术学院）； Department of Electrical Engineering and Electronics, University of Liverpool（利物浦大学电气工程与电子系）； Postgraduate Research Scholarships, Key Program Special Fund, Research Enhancement Fund of Xi’an Jiaotong-Liverpool University（西交利物浦大学研究生研究奖学金、重点专项基金、研究提升基金）

AI总结本文提出一种基于链接神经网络的多建筑多楼层室内定位方法，通过分层阶段式训练框架提升定位精度，实验表明该方法在UJIIndoorLoc数据库上达到8.19米的三维定位误差，优于现有神经网络模型。

Comments 9 pages, 5 figures, under review for journal publication

DOI: 10.1109/JSEN.2024.3455554
Journal ref: IEEE Sensors Journal, volume 25, issue 13, pages 23341--23351, July 1, 2025

AI中文摘要

本文提出了一种基于链接神经网络的多建筑多楼层室内定位新方案，每个神经网络专门解决子问题，并在分层阶段式训练框架下训练。当传感器数据具有层次结构时，利用这种层次结构进行数据处理以提供可扩展的解决方案。该框架通过利用更高层次网络训练获得的先验知识来训练更低层次网络。实验结果表明，基于所提出分层阶段式训练框架训练的链接神经网络在UJIIndoorLoc数据库上实现了8.19米的三维定位误差，这是目前使用完整数据集训练和评估的神经网络模型中最准确的结果。当应用于基于层次卷积神经网络的模型时，该训练框架还能显著将三维定位误差从11.78米降低到8.71米。

英文摘要

In this paper, we present a new solution to the problem of large-scale multi-building and multi-floor indoor localization based on linked neural networks, where each neural network is dedicated to a sub-problem and trained under a hierarchical stage-wise training framework. When the measured data from sensors have a hierarchical representation as in multi-building and multi-floor indoor localization, it is important to exploit the hierarchical nature in data processing to provide a scalable solution. In this regard, the hierarchical stage-wise training framework extends the original stage-wise training framework to the case of multiple linked networks by training a lower-hierarchy network based on the prior knowledge gained from the training of higher-hierarchy networks. The experimental results with the publicly-available UJIIndoorLoc multi-building and multi-floor Wi-Fi RSSI fingerprint database demonstrate that the linked neural networks trained under the proposed hierarchical stage-wise training framework can achieve a three-dimensional localization error of 8.19 m, which, to the best of the authors' knowledge, is the most accurate result ever obtained for neural network-based models trained and evaluated with the full datasets of the UJIIndoorLoc database, and that, when applied to a model based on hierarchical convolutional neural networks, the proposed training framework can also significantly reduce the three-dimensional localization error from 11.78 m to 8.71 m.

2311.12167 2026-06-09 cs.LG cs.SI 版本更新

Node Classification in Random Trees

随机树中的节点分类

Wouter W. L. Nuijten, Vlado Menkovski

发表机构 * Eindhoven University of Technology（埃因霍温理工大学）

AI总结本文提出一种方法，用于对结构为随机树的对象进行分类，通过马尔可夫网络和图神经网络建模节点标签分布，优于现有方法。

DOI: 10.1007/978-3-031-58547-0_9
Journal ref: Lecture Notes in Computer Science, 2024, pp. 105-116

AI中文摘要

我们提出了一种方法，用于对结构为随机树的对象进行分类。我们的目标是在树数据结构与节点属性（通常为高维嵌入）相关联的情况下，建模节点标签分配的分布。树拓扑不是预设的，在推断过程中没有节点标签存在。其他方法要么假设标签分配的条件独立性，要么在固定图拓扑上操作，或需要部分节点标签被观察。我们的方法定义了具有随机树相应拓扑的马尔可夫网络及其关联的吉布斯分布。我们用图神经网络参数化吉布斯分布，该网络在随机树和节点嵌入上操作。这使得我们能够估计给定随机树的节点分配的似然，并使用MCMC从节点分配分布中采样。我们评估了该方法在斯坦福情感树库数据集上的节点分类任务，结果优于基线方法，证明了其在随机树中联合分布建模的有效性。

英文摘要

We propose a method for the classification of objects that are structured as random trees. Our aim is to model a distribution over the node label assignments in settings where the tree data structure is associated with node attributes (typically high dimensional embeddings). The tree topology is not predetermined and none of the label assignments are present during inference. Other methods that produce a distribution over node label assignment in trees (or more generally in graphs) either assume conditional independence of the label assignment, operate on a fixed graph topology, or require part of the node labels to be observed. Our method defines a Markov Network with the corresponding topology of the random tree and an associated Gibbs distribution. We parameterize the Gibbs distribution with a Graph Neural Network that operates on the random tree and the node embeddings. This allows us to estimate the likelihood of node assignments for a given random tree and use MCMC to sample from the distribution of node assignments. We evaluate our method on the tasks of node classification in trees on the Stanford Sentiment Treebank dataset. Our method outperforms the baselines on this dataset, demonstrating its effectiveness for modeling joint distributions of node labels in random trees.

2310.20699 2026-06-09 physics.chem-ph cs.LG physics.comp-ph physics.data-an stat.AP 版本更新

Bayesian Multistate Bennett Acceptance Ratio Methods

贝叶斯多状态贝纳特接受比率方法

Xinqiang Ding

发表机构 * Department of Chemistry, Tufts University（塔夫茨大学化学系）

AI总结本文提出贝叶斯多状态贝纳特接受比率方法，通过整合热力学状态的采样配置与先验分布，计算自由能的后验分布，并改进自由能估计的不确定性评估。

DOI: 10.1021/acs.jctc.3c01212
Journal ref: Journal of Chemical Theory and Computation 2024 20 (5), 1878-1888

AI中文摘要

多状态贝纳特接受比率（MBAR）方法是一种计算热力学状态自由能的常用方法。本文介绍了贝叶斯MBAR，即MBAR的贝叶斯推广。通过整合从热力学状态采样的配置与先验分布，贝叶斯MBAR计算自由能的后验分布。利用后验分布，我们推导出自由能估计并计算其相关不确定性。值得注意的是，当使用均匀先验分布时，贝叶斯MBAR恢复了MBAR的结果，但提供了更准确的不确定性估计。此外，当有关于自由能的先验知识时，贝叶斯MBAR可以通过使用非均匀先验分布将此信息纳入估计过程。作为示例，我们展示通过结合关于自由能表面光滑性的先验知识，贝叶斯MBAR比MBAR方法提供更准确的估计。鉴于MBAR在自由能计算中的广泛应用，我们预计贝叶斯MBAR将成为自由能计算各种应用中的重要工具。

英文摘要

The multistate Bennett acceptance ratio (MBAR) method is a prevalent approach for computing free energies of thermodynamic states. In this work, we introduce BayesMBAR, a Bayesian generalization of the MBAR method. By integrating configurations sampled from thermodynamic states with a prior distribution, BayesMBAR computes a posterior distribution of free energies. Using the posterior distribution, we derive free energy estimations and compute their associated uncertainties. Notably, when a uniform prior distribution is used, BayesMBAR recovers the MBAR's result but provides more accurate uncertainty estimates. Additionally, when prior knowledge about free energies is available, BayesMBAR can incorporate this information into the estimation procedure by using non-uniform prior distributions. As an example, we show that, by incorporating the prior knowledge about the smoothness of free energy surfaces, BayesMBAR provides more accurate estimates than the MBAR method. Given MBAR's widespread use in free energy calculations, we anticipate BayesMBAR to be an essential tool in various applications of free energy calculations.

URL PDF HTML ☆

赞 0 踩 0

1909.02747 2026-06-09 eess.IV cs.CV cs.LG stat.ML 版本更新

Eelgrass beds and oyster farming at a lagoon before and after the Great East Japan Earthquake 2011: potential to apply deep learning at a coastal area

2011年东日本大地震前后某潟湖的鳗草床与牡蛎养殖：在沿海地区应用深度学习的潜力

Takehisa Yamakita

发表机构 * Marine Biodiversity and Environmental Assessment Research Center (BioEnv)（海洋生物多样性与环境评估研究中心）

AI总结本文通过比较手动勾绘、简单图像分割和基于深度学习的图像变换，研究了日本宫城县万石浦潟湖海草床、沙地和牡蛎养殖筏的自动土地覆盖分类，展示了深度学习在地震后沿海地区空间格局提取中的潜力。

DOI: 10.1109/IGARSS.2019.8900354

AI中文摘要

关于沿海地区自动土地覆盖分类的案例研究为数不多。本文通过比较手动勾绘、简单图像分割和基于深度学习的图像变换，测试了对日本宫城县万石浦潟湖的海草床、沙地和牡蛎养殖筏的提取，并利用所得结果提取地震和海啸前后的变化。图像变换方法的输出分辨率最佳，在独立测试数据上用随机点评估时，其植被分类准确率超过69%。牡蛎养殖筏的分布由分割模型检测得到。基于手动勾绘和图像变换结果对地震前后变化的评估显示，沙地面积增加而植被减少；牡蛎养殖的减少则仅由分割模型检测到。这些结果展示了在地震和海啸之后提取这些要素空间格局的潜力。索引词：2011年东日本大地震、土地利用/土地覆盖（LULC）、大叶藻科海草、养殖牡蛎、深度学习、万石湾

英文摘要

There are few case studies of automatic land cover classification in coastal areas. Here, I test the extraction of seagrass beds, sandy areas, and oyster farming rafts at Mangoku-ura Lagoon, Miyagi, Japan by comparing manual tracing, simple image segmentation, and image transformation using deep learning. The result was used to extract the changes before and after the earthquake and tsunami. The output resolution was best in the image transformation method, which showed more than 69% accuracy for vegetation classification in an assessment using random points on independent test data. The distribution of oyster farming rafts was detected by the segmentation model. Assessment of the change before and after the earthquake using the manual tracing and image transformation results revealed an increase in sand area and a decrease in vegetation. Only the segmentation model detected the decrease in oyster farming. These results demonstrate the potential to extract the spatial pattern of these elements after an earthquake and tsunami. Index Terms: Great East Japan Earthquake of 2011, Land use land cover (LULC), Zosteraceae seagrass, cultured oyster, deep learning, Mangoku Bay

1. 深度学习架构与训练方法 79 篇

Reachability and asymptotics of Gaussian Transformer dynamics

LFNO: Bridging Laplace and Fourier via Transient-Steady Decomposition

Contribution Weights: A Geometrical Analysis of Self-Attention Transformers

DSFNet: Learning Dual-Domain Spectral Operators for Multi-Modality Spatio-Temporal Forecasting in Urban Transportation Systems

WhiFlash: Accelerating Speculative Decoding with Token-Level Cross-Paradigm Routing

Teacher-Free Self-Training Amplifies but Does Not Compound: A Pass@$K$ Crossover on a Free-Verifier Domain

Breaking the Bubble: Asynchronous Pipeline Parallel Training with Bounded Weight Inconsistency

Layer-wise Derivative Controlled Networks Achieve Competitive Accuracy and Gradient Stability Across Data Regimes

A Unifying View of Attention Sinks: Two Algorithms, Two Solutions

Frequency-Domain Latent Attention Gating for Cross-Domain Token Aggregation

Causal Semantic Alignment for LLM-based Time Series Forecasting

Beyond Linear Activation Steering: Invertible Latent Transformations for Controlling LLM Behavior

Lost in the Non-convex Loss Landscape: How to Fine-tune the Large Time Series Model?

Intrinsic Selection and Particle Resampling for Inference-Time Scaling Beyond Domain Verifiability

sGPO: Trading Inference FLOPs for Training Efficiency in RLVR

Backward Coherence and Hidden-State Stability in Recurrent Neural Networks: A Quasi-Reverse-Martingale Theory

Beyond Neural Collapse: Task-Intrinsic Geometry Governs Neural Representations in Modular Arithmetic

Stage-1 Controls the Entropy Regime, Not the Outcome

Neural Legendre-Fenchel transform with Hessian Preconditioning

The Hidden Bias of Process Reward Models: PRISM for Rewarding the Right Reasoning

Stabilizing On-Policy Distillation for MLLM Reasoning with Global Normalization

Hybridizing Equilibrium Propagation with Ising Machines for Efficient Energy-Based Learning

Optimizing Energy-based Neural Network Training with Coherent Ising Machine

Internalizing Geometric Law: Learning from Solver Residuals for Precision-Critical Generation

Efficient Traffic Prediction at Scale: A Systematic Study of STGCN Architectural Depth

Closure-Validated Circuit Discovery in Attention Heads: Co-activation Proposes, Ablation Disposes

Muon Learns More Robust and Transferable Features than Adam

Perturbative Contrastive Physical Learning

Topological Neural Operators

Function-Vector Heads Are Two Populations: Writers and Cancellers in In-Context Learning

Steer Where It Matters: Token-Level Visual-Sensitivity Steering for LVLMs Hallucination Mitigation

QDS-SNN: Energy-efficient Quantum Deeply-Supervised Spiking Neural Network Algorithm for Traffic Sign Recognition

The Need for Neural ISP in the Small-Pixel Era: How Shrinking Pixels Push Optics to the Limit and Neural Restoration Pushes Back

Why Limit the Residual Stream to Layers and Not Tokens? Persistent Memory for Continuous Latent Reasoning

Phase Marginalization for Patch-Grid Instability in Vision Transformers

Stable and Scalable Probabilistic Numerical Solvers for Stiff and High-Dimensional ODEs

Chiaroscuro Attention: Spending Compute in the Dark

Tensorizing Engram: Sharing Latents Across N-Gram Embeddings is Beneficial in LLMs

Predictive Coding with Bayesian Priors via Proximal Gradients

FiberTune: Preserving Action-Fiber Visual Residuals in Vision-Language-Action Fine-Tuning

Learning to Solve Generative ODEs Beyond the Linear Span

Q-Delta: Beyond Key-Value Associative State Evolution

STAR: Rethinking MoE Routing as Structure-Aware Subspace Learning

Momentum for Reasoning: Dense Intrinsic Signals in Policy Optimization

Fourier Neural Operators with rank-1 lattice points and hyperbolic cross

Families of Control-Cost-Parametrized Inverse-Optimal Universal Stabilizers

Late-Layer Fusion is Enough: Dual-Path Vision Token Routing for Multimodal Large Language Models under Visual Saturation

SG-OPD: Sign-Gated On-Policy Distillation via Sign-Consistency Gating and Phased Teacher Sampling

PriFT: Prior-Support Guided Supervised Fine-Tuning

Adaptive directional gradients for parameterised quantum circuits

Echo-Memory: A Controlled Study of Memory in Action World Models

Investigating the Histogram Loss in Regression

ePC: Fast and Deep Predictive Coding in Digital Simulation

Decoupling the "What" and "Where" With Polar Coordinate Positional Embeddings

Similarity-Distance-Magnitude Activations

Multi-resolution Enhancement for Full Spectrum Neural Representations

SmartMixed: A Two-Phase Training Strategy for Adaptive Activation Function Learning in Neural Networks

Decomposable Neuro Symbolic Regression

Differentiable Weightless Controllers: Learning Logic Circuits for Continuous Control

Lattice: A Confidence-Gated Hybrid System for Uncertainty-Aware Sequential Prediction with Behavioral Archetypes

Your Self-Play Algorithm is Secretly an Adversarial Imitator: Understanding LLM Self-Play through the Lens of Imitation Learning

Operationalising the Superficial Alignment Hypothesis via Task Complexity

Amortized Predictability-aware Training Framework for Time Series Forecasting and Classification

Integral Formulas for Vector Signal Tensor Products

Vision Hopfield Memory Networks for Image Recognition

Muon$^2$: Boosting Muon via Adaptive Second-Moment Preconditioning

MinMax Recurrent Neural Cascades

Improving the Performance and Learning Stability of Parallelizable RNNs Designed for Ultra-Low Power Applications

FRWKV+: Periodic-Aware Adaptive Gating for Frequency-Space Linear Time Series Forecasting

An Empirical Audit of Input Encoders for Multi-Channel Signal Transformers

Spectral Truncation Kernels: Noncommutativity in $C^*$-algebraic Kernel Machines

Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization

The Flexibility Trap: Rethinking the Value of Arbitrary Order in Diffusion Language Models

Component Ablation for Efficient Hybrid Language Model Architectures: Performance, Resilience, and Compression Implications

Medial Axis Aware Learning of Signed Distance Functions

Rethinking Local Learning: A Cheaper and Faster Recipe for LLM Post-Training

Zero-shot Quantum Neural Architecture Search

Continuous Reasoning for Vision-Language-Action

ThinkBooster: A Unified Framework for Seamless Test-Time Scaling of LLM Reasoning