arXivDaily: arXiv daily academic digest, updated Monday to Friday
2605.31596 2026-06-01 cs.CV cs.LG Version update

KLIP: localized distribution shift detection via KL-divergence with diffusion priors in Inverse Problems

Alireza Kheirandish, Jihoon Hong, Sara Fridovich-Keil

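Affiliations: Georgia Institute of Technology

AI summary: Proposes a KL-divergence-based OOD detection metric that needs no calibration data or knowledge of the shifted distribution and can detect and localize local distribution shifts within an image.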

Comments CVPR 2026

Abstract

Diffusion models have shown promising performance as data-driven priors for computational imaging, as well as some capacity to detect out-of-distribution (OOD) images. However, existing approaches to OOD detection often require some knowledge of the shifted distribution, fail to detect subtle or localized distribution shifts, and operate on full images, rather than the indirect measurements available in inverse problems. We propose an OOD detection metric based on the Kullback-Leibler divergence between the diffusion prior and the posterior distribution, that (i) does not require any calibration data or knowledge of the shifted distribution, and (ii) can detect whole images as OOD as well as localize OOD patches within an image. Experimentally, we show that this metric can detect subtle yet semantically meaningful distribution shifts, such as the shift from healthy liver CT scans to those with tumors, and generalizes across different types of diffusion models, datasets, and inverse problems. Our code can be found at https://github.com/voilalab/KLIP.
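
Editor's sketch (not from the paper): the abstract does not say how the KL divergence is estimated. To illustrate the general idea of scoring image patches by a prior-posterior divergence, the Python below uses the closed-form KL between diagonal Gaussians. The Gaussian form, the direction KL(posterior || prior), the 1-D patch layout, and the names `gaussian_kl` and `patch_scores` are all assumptions made for illustration.

```python
import math

def gaussian_kl(mu_q, var_q, mu_p, var_p):
    """KL(q || p) for diagonal Gaussians given as equal-length lists."""
    total = 0.0
    for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p):
        total += 0.5 * (math.log(vp / vq) + (vq + (mq - mp) ** 2) / vp - 1.0)
    return total

def patch_scores(post_mu, post_var, prior_mu, prior_var, patch):
    """Sum per-pixel KL terms over non-overlapping 1-D patches of length `patch`."""
    scores = []
    for start in range(0, len(post_mu), patch):
        sl = slice(start, start + patch)
        scores.append(gaussian_kl(post_mu[sl], post_var[sl],
                                  prior_mu[sl], prior_var[sl]))
    return scores

# Hypothetical 8-pixel signal: the posterior mean departs from the prior
# only in the second half, so only the second patch gets a nonzero score.
prior_mu, prior_var = [0.0] * 8, [1.0] * 8
post_mu, post_var = [0.0] * 4 + [2.0] * 4, [1.0] * 8
print(patch_scores(post_mu, post_var, prior_mu, prior_var, patch=4))  # -> [0.0, 8.0]
```

Summing the patch scores gives a whole-image score, which mirrors the abstract's distinction between detecting a whole image as OOD and localizing OOD patches.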

2605.31594 2026-06-01 cs.LG math.OC Version update

A Tight Theory of Error Feedback Algorithms in Distributed Optimization

Daniel Berg Thomsen, Adrien Taylor, Aymeric Dieuleveut

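Affiliations: Inria, D.I. ENS, CNRS, PSL Research University, Paris, France

AI summary: For the two main error-feedback algorithms in distributed optimization (EF and EF21), gives tight convergence analyses by identifying optimal step sizes and constructing optimal Lyapunov functions; the results are independent of the number of agents and recover the best known guarantees in the single-agent regime.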

Abstract

Communication costs are a major bottleneck in distributed learning and first-order optimization. A common approach to alleviate this issue is to compress the gradient information exchanged between agents. However, such compression typically degrades the convergence guarantees of gradient-based methods. Error feedback mechanisms provide a simple and computationally cheap remedy for this issue, but numerous variants have been proposed, and their relative performance remains poorly understood. This paper provides tight convergence analyses for two of the main error-feedback algorithms from the literature, the classic Error Feedback method (EF) and Error Feedback 21 (EF21), by identifying optimal step-size choices and constructing optimal Lyapunov functions tailored to each method. The results hold independently of the number of agents and recover the known best guarantees possible in the single-agent regime.
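
Editor's sketch (not from the paper): for readers unfamiliar with the two methods, the Python below shows the commonly stated single-agent update rules of classic EF and EF21 with a top-k compressor on a toy quadratic. The objective, the step size, the compressor choice, and the function names are illustrative assumptions; the step size used here is simply a small value and is not the optimal step size derived in the paper.

```python
def top_k(v, k):
    """Keep the k largest-magnitude entries of v and zero the rest."""
    keep = sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:k]
    return [v[i] if i in keep else 0.0 for i in range(len(v))]

def grad(x, a):
    """Gradient of the toy objective f(x) = 0.5 * sum(a_i * x_i^2)."""
    return [ai * xi for ai, xi in zip(a, x)]

def run_ef(x, a, step, k, iters):
    """Classic EF: compress (step * gradient + carried error), carry the residual."""
    e = [0.0] * len(x)
    for _ in range(iters):
        p = [step * g + ei for g, ei in zip(grad(x, a), e)]
        delta = top_k(p, k)                          # what is actually transmitted
        x = [xi - di for xi, di in zip(x, delta)]
        e = [pi - di for pi, di in zip(p, delta)]    # error kept for the next round
    return x

def run_ef21(x, a, step, k, iters):
    """EF21: maintain a gradient estimate g and compress the change in gradient."""
    g = top_k(grad(x, a), k)                         # initial estimate
    for _ in range(iters):
        x = [xi - step * gi for xi, gi in zip(x, g)]
        diff = [ni - gi for ni, gi in zip(grad(x, a), g)]
        g = [gi + ci for gi, ci in zip(g, top_k(diff, k))]
    return x

a = [1.0, 2.0, 4.0]
print(run_ef([1.0, 1.0, 1.0], a, step=0.005, k=1, iters=6000))
print(run_ef21([1.0, 1.0, 1.0], a, step=0.005, k=1, iters=6000))
```

Both runs transmit a single coordinate per iteration and should still drive the iterate toward the minimizer at the origin, which is the behavior error feedback is meant to restore under compression.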

2605.31584 2026-06-01 cs.CL cs.AI cs.LG Version update

LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards

Nianyi Lin, Jiajie Zhang, Lei Hou, Juanzi Li

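Affiliations: Tsinghua University

AI summary: Proposes the LongTraceRL framework, which generates multi-hop questions via knowledge-graph random walks, builds tiered distractors from search-agent trajectories, and adds process supervision through an entity-chain rubric reward, improving long-context reasoning in large language models.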

Abstract

Long-context reasoning remains a central challenge for large language models, which often fail to locate and integrate key information in extensive distracting content. Reinforcement learning with verifiable rewards (RLVR) has shown promise for this task, yet existing methods are limited by low-confusability distractors and sparse, outcome-only reward signals that cannot supervise intermediate reasoning steps. To address these issues, we introduce LongTraceRL. For data construction, we generate multi-hop questions via knowledge graph random walks and leverage search agent trajectories to build tiered distractors: documents the agent read but did not cite (high confusability) and documents that appeared in search results but were never opened (low confusability), producing training contexts that are far more challenging than those built by random sampling or one-shot search. For reward design, we propose a rubric reward that uses the gold entities along each reasoning chain as fine-grained, entity-level process supervision. This rubric reward is applied only to responses with correct final answers (positive-only strategy), distinguishing the reasoning quality among correct responses and preventing reward hacking. Experiments on three reasoning LLMs (4B-30B) across five long-context benchmarks demonstrate that LongTraceRL consistently outperforms strong baselines and encourages comprehensive, evidence-grounded reasoning. Code, datasets, and models are available at https://github.com/THU-KEG/LongTraceRL.
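
Editor's sketch (not from the paper): the positive-only rubric reward described in the abstract can be pictured as an outcome reward plus an entity-coverage bonus that is granted only when the final answer is correct. In the Python below, the exact-match answer check, the substring entity matching, the `weight` value, and the function name `rubric_reward` are hypothetical choices for illustration; the repository linked in the abstract is the authority on the actual formulation.

```python
def rubric_reward(response, predicted, gold_answer, gold_entities, weight=0.5):
    """Outcome reward plus an entity-coverage bonus, granted only if the answer is correct."""
    correct = predicted.strip().lower() == gold_answer.strip().lower()
    if not correct:
        return 0.0  # positive-only: wrong answers earn no rubric credit
    text = response.lower()
    hits = sum(1 for entity in gold_entities if entity.lower() in text)
    coverage = hits / len(gold_entities) if gold_entities else 0.0
    return 1.0 + weight * coverage

chain = ["Marie Curie", "Warsaw"]
full = "Marie Curie was born in Warsaw, the capital of Poland."
partial = "She was born in Warsaw, which is in Poland."
print(rubric_reward(full, "Poland", "Poland", chain))     # 1.5
print(rubric_reward(partial, "Poland", "Poland", chain))  # 1.25
print(rubric_reward(full, "France", "Poland", chain))     # 0.0
```

Gating the bonus on correctness means a response cannot collect reward just by naming entities, which is the reward-hacking concern the abstract mentions.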

2605.31580 2026-06-01 cs.LG Version update

Giving Sensors a Voice: Multimodal JEPA for Semantic Time-Series Embeddings

Utsav Dutta, Gerardo Pastrana, Sina Khoshfetrat Pakazad, Henrik Ohlsson

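Affiliations: C3 AI

AI summary: Proposes CHARM, which combines channel-level text descriptions with a Transformer encoder and uses a Joint Embedding Predictive Architecture (JEPA) to learn semantic time-series embeddings, achieving strong performance on anomaly detection, classification, and forecasting with only a linear probe.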

Comments 9 pages, 5 figures, accepted at ICML 2026. arXiv admin note: substantial text overlap with arXiv:2505.14543

Journal ref
Proceedings of the 43rd International Conference on Machine Learning (ICML), PMLR 306, 2026

Abstract

Transformer-based architectures have advanced sequence modeling in language and vision, yet general-purpose representation learning for heterogeneous multivariate time series remains underexplored. We introduce CHARM (Channel-Aware Representation Model), which incorporates channel-level textual descriptions into a Transformer encoder equivariant to channel order. CHARM is trained with a Joint Embedding Predictive Architecture (JEPA) and a novel loss promoting informative, temporally stable embeddings; latent-space prediction encourages robustness to sensor noise while description-aware gating provides interpretability through learned inter-channel relationships. Across anomaly detection, classification, and short- and long-term forecasting, the learned embeddings achieve strong performance using only a linear probe. Performance is driven primarily by the JEPA objective and conditioning architecture, with text descriptions serving as channel identifiers for cross-dataset generalization.

2605.31562 2026-06-01 cs.LG Version update

Effective Biological Representation Learning by Masking Gene Expression

Kian Kenyon-Dean, Alina Selega, Ihab Bendidi, Jordan M. Sorokin, Luca Bertinetto, David Errington, Hayley Donnella, Oren Kraus

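Affiliations: Recursion Valence Labs École Normale Supérieure PSL

AI summary: Proposes TxFM, a self-supervised model that applies masked autoencoding to RNA-seq data, identifies key architectural choices through ablations, and, when trained on the curated DiverseRNA-1.4M dataset, yields gene representations that outperform large-scale foundation models.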

Comments 31 pages, 11 figures. Preprint; presented at ICLR 2026 2nd Workshop on Foundation Models for Science: Real-World Impact and Science-First Design

Abstract

RNA sequencing produces rich and diverse datasets of gene expression, offering compelling insights into cellular state and function that have many applications in drug discovery. Modeling such data is challenging due to inherent technical noise and experimental batch effects, as evidenced by many existing transcriptomic foundation models (FMs) underperforming relative to linear baselines. Such results raise the question of whether deep representation learning provides a distinct advantage over the direct use of raw transcript counts. Our work explores this by developing a new self-supervised model, TxFM, with a focus on inductive representation learning evaluations. TxFM employs a masked autoencoding approach tailored to diverse RNA-seq count data, and our ablation study empirically identifies crucial architecture configurations required for strong transfer performance. Additionally, we curate a public training corpus, DiverseRNA-1.4M, and find that TxFM trained on this curated dataset yields high-fidelity gene representations that outperform FMs trained on atlas-scale corpora over 100x larger. Overall, our results indicate that inductive self-supervised learning is a viable modeling approach for transcriptomics representation, provided a careful synthesis of model architecture and training data curation.

2605.31559 2026-06-01 cs.LG Version update

Functional Attention: From Pairwise Affinities to Functional Correspondences

Jiefang Xiao, Maolin Gao, Simon Weber, Guandao Yang, Daniel Cremers

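Affiliations: Technical University of Munich, Germany; Munich Center for Machine Learning (MCML), Germany; PIXL, Department of Computer Science, University of Oxford, United Kingdom; ECE, University of Texas at Austin, USA

AI summary: Proposes Functional Attention, which reinterprets attention as a functional correspondence between adaptive bases and replaces softmax affinities with structured linear operators, giving a compact, generalizable, resolution-invariant representation of global dependencies and matching state-of-the-art performance on operator learning tasks such as PDE solving, 3D segmentation, and regression.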

Comments 26 pages, 12 figures. Accepted at the 43rd International Conference on Machine Learning (ICML 2026)

Abstract

Learning mappings between infinite-dimensional function spaces, or operator learning, is essential for many machine learning applications. Although transformer-based operators are popular, they often rely on token-wise attention. These methods treat continuous fields as discrete tokens and usually ignore the global functional structure. We introduce Functional Attention, which reinterprets attention as a functional correspondence between adaptive bases. Inspired by geometric functional maps, our method replaces softmax affinities with structured linear operators. This yields a compact, generalizable, resolution-invariant representation that explicitly captures global dependencies. Experiments demonstrate that Functional Attention can match state-of-the-art performance in many operator learning tasks, including solving PDEs, 3D segmentation, and regression, while remaining robust to varying discretizations. Project page is available at https://github.com/xjffff/FUNCATTN.

2605.31558 2026-06-01 cs.LG cs.AI Version update

Positional versus Symbolic Attention Heads: Learning Dynamics, RoPE Geometry, and Length Generalization

Felipe Urrutia, Juan José Alegría, Cinthia Sanchez Macias, Jorge Salas, Cristian B. Calderon, Cristobal Rojas

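Affiliations: CENIA & Faculty of Mathematics UC Santiago; IMC UC & CENIA Santiago

AI summary: Uses controlled experiments to study the learning dynamics of Transformer attention heads on positional and symbolic reasoning tasks, identifying the distinct mechanisms of positional and symbolic heads and their effect on length generalization.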

Abstract

Transformer-based language models are widespread in today's society. As such, understanding the mechanisms by which they solve structured tasks and predicting how they may behave in novel scenarios is of great importance for safe deployment. We study the learning dynamics of attention heads in a controlled setting by training a decoder-only Transformer (GPT-J) on two structurally equivalent multi-hop reasoning tasks: a number task requiring positional reasoning and a letter task requiring symbolic reasoning. Using a recently introduced metric that classifies attention-head behavior as positional or symbolic for a given prompt, we show that successful learning is associated with the emergence of pure heads, i.e., heads that express themselves as either positional or symbolic. Despite the tasks' structural equivalence, they impose different mechanistic demands: the number task requires both positional and symbolic heads, whereas the letter task requires only symbolic heads. We then identify the computational roles of these heads, characterize the basic functions they implement, and give theoretical constructions showing how single-layer RoPE-based attention can realize these functions through geometrically interpretable query, key, and value operations. This analysis yields a quantitative separation between positional and symbolic mechanisms in their robustness to longer sequences, formalized through a novel notion of discrepancy. We empirically validate the resulting predictions in both controlled and real-world models, showing that symbolic mechanisms extrapolate more reliably to longer sequences while positional mechanisms face sharper limitations.
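
Editor's sketch (not from the paper): the abstract refers to single-layer RoPE-based attention. As a reminder of the geometric property such constructions rest on, the Python below rotates a 2-D query and key by position-dependent angles (a single RoPE frequency pair) and shows that their dot product depends only on the relative offset. The frequency `theta=0.5`, the example vectors, and the function names are illustrative assumptions.

```python
import math

def rope_rotate(vec, pos, theta=0.5):
    """Rotate a 2-D vector by the angle pos * theta (one RoPE frequency pair)."""
    c, s = math.cos(pos * theta), math.sin(pos * theta)
    x, y = vec
    return (c * x - s * y, s * x + c * y)

def score(q, k, m, n, theta=0.5):
    """Attention logit between a query at position m and a key at position n."""
    qm = rope_rotate(q, m, theta)
    kn = rope_rotate(k, n, theta)
    return qm[0] * kn[0] + qm[1] * kn[1]

q, k = (1.0, 0.5), (0.3, -0.2)
print(score(q, k, 3, 1), score(q, k, 10, 8))  # same offset, same score
print(score(q, k, 3, 2))                      # different offset, different score
```

Because only the offset enters the score, a head can in principle implement purely positional lookups through the rotation alone, while symbolic matching has to come from the content of the query and key vectors.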

2605.31547 2026-06-01 cs.LG math.DS stat.ML Version update

The Dynamic-Probabilistic Consistency Gap in Chaotic Surrogate Modeling

Andre Herz, Matthijs Pals, Daniel Durstewitz, Georgia Koppe

Affiliations: Interdisciplinary Center for Scientific Computing, Heidelberg University, Germany; Faculty of Mathematics and Computer Science, Heidelberg University, Germany; Dept. of Theoretical Neuroscience, Central Institute of Mental Health (CIMH), Mannheim, Germany; Faculty of Physics and Astronomy, Heidelberg University, Germany; Hector Institute for AI in Psychiatry and Dept. of Psychiatry and Psychotherapy, CIMH, Mannheim, Germany; Hertie Institute for AI in Brain Health, University of Tübingen, Germany

AI summary: To address the inconsistency between dynamical and probabilistic objectives in surrogate modeling of chaotic systems, proposes KAFFEE, a framework based on a differentiable extended Kalman filter that narrows the gap by evaluating likelihood on local predictive residuals and propagating covariance through Jacobians.

Abstract

Dynamical systems reconstruction (DSR) aims to learn surrogate models that capture the dynamics underlying time-series data. Reliably deploying these surrogates requires uncertainty estimates consistent with the learned dynamics. We expose a dynamic-probabilistic consistency (DPC) gap: the pursuit of finite-horizon probabilistic objectives can degrade dynamics or decouple predictive uncertainty from the local tangent dynamics it ought to reflect. We isolate three mechanisms behind this gap: core collapse, noise masking, and blind uncertainty. Specifically, we show that open-loop Gaussian rollout objectives can penalize Jacobian-generated covariance growth in chaotic systems, encouraging optimization shortcuts that weaken physical expansion or decouple uncertainty from it. To mitigate this gap, we propose KAFFEE (Kalman-Aware Framework For Ergodic Emulation), a differentiable extended Kalman filter-based training framework that evaluates likelihood on local predictive residuals (innovations) while transporting covariance through learned local Jacobians. On stochastic hyperchaotic Lorenz-96, KAFFEE reduces the identified failure modes, improves reconstruction of dynamical invariants relative to open-loop objectives, and maintains competitive predictive scores. We further show that the DPC gap appears when probabilistically adapting a DSR foundation model across 13 chaotic systems, where KAFFEE enables in-context Bayesian filtering while largely preserving zero-shot dynamics.
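
Editor's sketch (not from the paper): the two ingredients named in the abstract, likelihood on innovations and covariance transported through local Jacobians, are the core of a textbook extended Kalman filter step. The Python below shows one such step for a scalar state with an identity observation model. The logistic map as the toy chaotic dynamics, the noise values, and the function name `ekf_step` are assumptions for illustration and say nothing about how KAFFEE itself is implemented or trained.

```python
import math

def ekf_step(mean, var, y, f, jac, q, r):
    """One scalar EKF step with identity observation; returns (mean, var, log-likelihood)."""
    pred_mean = f(mean)                  # propagate the state estimate
    j = jac(mean)                        # local Jacobian of the dynamics
    pred_var = j * var * j + q           # covariance transported through the Jacobian
    innov = y - pred_mean                # innovation (one-step predictive residual)
    s = pred_var + r                     # innovation variance
    loglik = -0.5 * (math.log(2 * math.pi * s) + innov ** 2 / s)
    gain = pred_var / s                  # Kalman gain
    new_mean = pred_mean + gain * innov
    new_var = (1 - gain) * pred_var
    return new_mean, new_var, loglik

# Toy chaotic dynamics: the logistic map x -> 4x(1 - x) and its derivative.
f = lambda x: 4.0 * x * (1.0 - x)
jac = lambda x: 4.0 * (1.0 - 2.0 * x)
print(ekf_step(mean=0.2, var=0.01, y=0.7, f=f, jac=jac, q=0.001, r=0.01))
```

Summing `loglik` over a trajectory would give a filtering objective in which uncertainty growth comes from the Jacobian itself, in contrast to the open-loop rollout objectives the abstract criticizes.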

2605.31539 2026-06-01 cs.CV cs.LG q-bio.QM Version update

Automated Prediction of Postoperative Pancreatic Fistula Using Preoperative Computed Tomography

Ashok Choudhary, Chris Varghese, Leo Y. Li-Han, Frank G. Lee, Ellen L. Larson, Elizabeth B. Habermann, Cornelius A. Thiels, Hojjat Salehinejad

Affiliations: Department of Surgery, Mayo Clinic, Rochester, MN, USA; Department of Surgery, University of Auckland, Auckland, NZ; Kern Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, MN, USA; Department of Artificial Intelligence

AI summary: Proposes an end-to-end deep learning pipeline, from pancreas segmentation to classification, that uses preoperative CT scans to automatically predict the risk of postoperative pancreatic fistula, offering a tool for clinical decision-making and a methodological benchmark.

Abstract

Postoperative pancreatic fistula (POPF) is a serious complication after pancreatic resection, increasing morbidity, hospital stay, and healthcare costs. We present an automatic, end-to-end deep learning pipeline, from pancreatic segmentation to classification, for preoperative POPF risk estimation and stratification using preoperative CT scans. A data set with auto-segmented pancreas volumes and surgical outcomes was used to evaluate multiple architectures, including a custom lightweight 3D CNN baseline (CNN3D), R(2+1)D ResNet-18, and ResNet-MC3-18 models. Evaluation across multiple 3D architectures demonstrated promising predictive performance. This approach offers a clinically valuable tool and a methodological benchmark for pancreas-specific CT classification, supporting improved preoperative decision-making in pancreatic surgery.

2605.31535 2026-06-01 cs.CV cs.AI cs.LG Version update

RayDer: Scalable Self-Supervised Novel View Synthesis from Real-World Video

Ulrich Prestel, Stefan Andreas Baumann, Nick Stracke, Björn Ommer

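Affiliations: Munich Center for Machine Learning (MCML)

AI summary: Proposes RayDer, a unified feed-forward transformer that consolidates camera estimation, scene reconstruction, and rendering into a single backbone, giving self-supervised novel view synthesis clean power-law scaling and zero-shot open-set performance competitive with supervised methods.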

Comments Project Page: https://compvis.github.io/rayder

Abstract

Self-supervised novel view synthesis (NVS) remains challenging to scale, despite the abundance of video data, largely due to the brittleness of training on realistic videos and the hard-to-predict scaling behavior of multi-network system designs. We introduce RayDer, a unified, feed-forward transformer that consolidates camera estimation, scene reconstruction, and rendering into a single backbone, turning self-supervised NVS into a well-posed single-model scaling problem. A minimal dynamic state, treated as a nuisance factor, absorbs time-varying content and enables stable training on unconstrained real-world video. Importantly, RayDer keeps static-scene NVS as its target task: dynamic content is leveraged purely as scalable supervision, not reconstructed as in dynamic-scene (4D) NVS. Across multiple model sizes and orders of magnitude in data, RayDer exhibits clean power-law scaling with data and compute, and outperforms static-scene data mixtures. On a large number of benchmarks, RayDer achieves strong zero-shot open-set performance competitive with state-of-the-art supervised approaches. Project Page: https://compvis.github.io/rayder

2605.31532 2026-06-01 cond-mat.soft cs.LG Version update

Discovering Thermodynamically Admissible Dissipation Potentials via Grammar-Based Symbolic Regression

Federico Califano, Jacopo Ciambella

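AI summary: Proposes a grammar-based symbolic regression framework that, within the Generalized Standard Materials formalism, automatically discovers dissipation potentials satisfying the thermodynamic constraints of convexity and non-negativity, validated on synthetic and experimental data.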

Abstract

Constitutive laws for inelastic materials must satisfy strict thermodynamic admissibility requirements, yet current data-driven approaches sacrifice interpretability, even when formal guarantees are provided by physics-encoded architectures. We propose a symbolic regression framework for the data-driven discovery of dissipation potentials governing the evolution of internal variables within the Generalized Standard Materials (GSM) formalism. Starting from the Clausius-Duhem inequality, we enforce the thermodynamic requirements, convexity and non-negativity, that the dual dissipation potential must satisfy to guarantee non-negative mechanical dissipation. These requirements are formulated in the general subdifferential setting, encompassing rate-dependent (viscoelastic) and viscoplastic dissipative mechanisms, including potentials with genuine elastic domains, within a unified framework. Candidate potentials are generated by a composition-extended convexity-preserving grammar that guarantees thermodynamic admissibility by construction. The framework is validated on synthetic datasets spanning Newtonian, power-law, and Bingham viscoplastic ground truths under process and measurement noise, and on experimental oscillatory shear measurements of a synthetic elastomer across multiple strain amplitudes and frequencies, where the discovered potentials reproduce the amplitude-dependent softening of the dynamic moduli and outperform a calibrated linear Zener baseline.

2605.31524 2026-06-01 cs.LG cs.LO Version update

Value Functions as Supermartingale Certificates

Alessandro Abate, Daniel Contro, Mirco Giacobbe, Agustín Martínez-Suñé, Diptarko Roy

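Affiliations: University of Oxford, UK; University of Birmingham, UK

AI summary: Establishes a theoretical connection between value functions and Streett supermartingale certificates, linking formal verification of stochastic systems with reinforcement learning and providing an RL-based route to certificate synthesis for ω-regular properties.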

Comments To appear in SAIV'26

Abstract

Certification methods for stochastic systems provide sufficient proof rules, based on real-valued supermartingale certificates, to determine the almost-sure satisfaction of $\omega$-regular properties (and therefore of linear temporal logic) over general state spaces, encompassing both countably infinite and continuous state spaces. Conversely, reinforcement learning (RL) methods for $\omega$-regular tasks have received considerable attention, but they typically lack formal guarantees that the learned policy satisfies the specification, except possibly for finite state and action spaces. We bridge these two lines of research by establishing a novel theoretical connection: under an appropriate reward, the value function associated to a policy that almost surely satisfies an $\omega$-regular property encodes a Streett supermartingale certificate for that specification. Our results, validated experimentally on finite Markov decision processes, hold for finite, countably infinite, and continuous state spaces, suggesting a principled route to certificate synthesis via RL.

2605.31522 2026-06-01 cs.LG q-bio.GN q-bio.QM Version update

Chem-PerturBridge: a harmonized compendium of small molecule perturbation transcriptomic effects

Artur Szałata, Olga Novitskaia, Maiia Shulman, Matthew Mella, Altynbek Zhubanchaliyev, Fabian J. Theis

Affiliations: Institute of Computational Biology, Helmholtz Center Munich; TUM School of Life Sciences Weihenstephan, Technical University of Munich; Institut Curie, INSERM U1331, Computational Oncology; TUM School of Computation, Information and Technology, Technical University of Munich

AI summary: To address the fragmentation of small-molecule perturbation transcriptomic data, builds Chem-PerturBridge, a harmonized resource covering 37k compounds, 136 cellular contexts, and 1.25M samples, and validates it for assessing cross-dataset signature agreement and for pretraining compound representations.

Comments 33 pages, 6 figures, 16 tables

Abstract

Large perturbation models require training data encompassing chemical, cellular, and assay diversity. Current transcriptomic resources for small-molecule modeling, however, are fragmented across technologies, metadata conventions, controls, doses, and preprocessing pipelines. We introduce Chem-PerturBridge, a harmonized multi-dataset resource comprising over 37k compounds, 136 cellular contexts, and 1.25M transcriptomic samples across eight assay types, with standardized identifiers, metadata, and replicate-aware condition-level effects. We use the resource to evaluate matched-condition agreement across datasets and replicate agreement within datasets. Matched same-compound conditions generally show weak agreement in fine-grained logFC rankings and magnitudes across most dataset pairs, often falling below same-context different-compound baselines. In contrast, logFC direction agreement is substantially more stable and usually exceeds these baselines. We further evaluate Chem-PerturBridge as a pretraining resource for compound representation learning. Under a compound-held-out OP3 evaluation split, embeddings pretrained on Chem-PerturBridge improve over L1000-only embeddings, Morgan fingerprints, and the descriptor-free OP3 baseline across metrics. An extensive molecule-holdout evaluation across 11 datasets further shows that models trained on Chem-PerturBridge outperform or match those that are not. Chem-PerturBridge therefore supports both diagnostic evaluation of cross-dataset signature agreement and model-oriented reuse of heterogeneous perturbation transcriptomic data.
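
Editor's sketch (not from the paper): the abstract contrasts agreement in logFC direction with agreement in fine-grained rankings. The Python below makes that distinction concrete with two simple statistics on a pair of hypothetical logFC vectors: the fraction of genes with matching sign, and Spearman rank correlation. The data, the tie-free ranking, and the function names are illustrative assumptions and are not the metrics or code used by the authors.

```python
def sign_agreement(a, b):
    """Fraction of entries whose direction matches (zeros count as non-positive)."""
    same = sum(1 for x, y in zip(a, b) if (x > 0) == (y > 0))
    return same / len(a)

def ranks(v):
    """Rank positions 0..n-1 in ascending order; assumes no tied values."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

def spearman(a, b):
    """Spearman rank correlation for tie-free data."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical logFC values for six genes under the same compound in two datasets.
dataset_a = [2.0, 1.5, 1.0, -1.0, -1.5, -2.0]
dataset_b = [1.0, 1.5, 2.0, -2.0, -1.5, -1.0]
print(sign_agreement(dataset_a, dataset_b))      # 1.0
print(round(spearman(dataset_a, dataset_b), 3))  # 0.543
```

In this toy pair every gene moves in the same direction in both datasets, yet the within-direction ordering is scrambled, so direction agreement is perfect while rank agreement is only moderate.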

2605.31518 2026-06-01 cs.LG Version update

On the Relationship Between Activation Outliers and Feature Death in Sparse Autoencoders

Elana Simon, Etowah Adams, James Zou

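Affiliations: Stanford University; Columbia University

AI summary: Through theoretical analysis and experiments, shows how dimension-level activation outliers cause feature death in sparse autoencoders, and proposes mean-centering as a preprocessing step that eliminates the problem.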

Comments Accepted to ICML 2026 main conference

Abstract

Sparse autoencoders (SAEs) decompose neural network activations into interpretable features, but many learned features never activate, a problem called feature death that wastes dictionary capacity and can reintroduce superposition. Death rates vary dramatically between models: near-zero on GPT-2, over 70% on AlphaFold3 with identical configurations. We find that dimension-level activation outliers (dimensions whose mean magnitude is large relative to per-token variation) cause this by shifting pre-activations at initialization based on each feature's alignment with the activation mean. Features anti-aligned with the mean receive permanently negative pre-activations and never fire. We formalize outlier severity as $\gamma = \|\mu\|/\|\sigma\|$; it predicts initial death rates (Spearman $\rho = 0.89$ for dead-by-TopK, $0.82$ for dead-by-ReLU) across 454 model-layer combinations spanning language, vision, protein, and genomic models. Dead features can revive during training, but recovery requires the SAE bias to learn the activation mean, a process that is prohibitively slow at high $\gamma$. Mean-centering (subtracting the activation mean) sidesteps this and eliminates outlier-induced death across all tested models, confirming the mechanism and providing a principled basis for when and why this preprocessing step is necessary.
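
Editor's sketch (not from the paper): the mechanism in the abstract can be reproduced on a tiny hand-made example. The Python below computes the outlier severity as the ratio of the norm of the per-dimension mean to the norm of the per-dimension standard deviation, then checks whether a bias-free ReLU feature that is anti-aligned with the mean ever fires before and after mean-centering. The activations, the feature direction `w`, the reading of the two norms as norms of per-dimension vectors, and the function names are assumptions for illustration.

```python
import math

def column_stats(acts):
    """Per-dimension mean and (population) standard deviation over tokens."""
    n, d = len(acts), len(acts[0])
    mu = [sum(row[j] for row in acts) / n for j in range(d)]
    sigma = [math.sqrt(sum((row[j] - mu[j]) ** 2 for row in acts) / n)
             for j in range(d)]
    return mu, sigma

def outlier_severity(acts):
    """Ratio of the mean-vector norm to the std-vector norm."""
    mu, sigma = column_stats(acts)
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return norm(mu) / norm(sigma)

def fires(acts, w):
    """True if the pre-activation w . x is positive for at least one token."""
    return any(sum(wi * xi for wi, xi in zip(w, row)) > 0 for row in acts)

def center(acts):
    """Subtract the per-dimension mean from every token."""
    mu, _ = column_stats(acts)
    return [[x - m for x, m in zip(row, mu)] for row in acts]

# Hypothetical activations: dimension 0 has a large mean and small variation.
acts = [[10.0, 1.0], [10.5, -1.0], [9.5, 0.5], [10.0, -0.5]]
w = [-1.0, 0.3]  # feature direction anti-aligned with the activation mean

print(round(outlier_severity(acts), 2))  # 11.55
print(fires(acts, w))                    # False: dead on every token
print(fires(center(acts), w))            # True: alive after mean-centering
```

The feature is dead on the raw activations only because the large mean in dimension 0 swamps the per-token variation; once the mean is removed, the same direction responds to the variation that remains.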

2605.31509 2026-06-01 cs.LG cs.AI Version update

Skill Reuse as Compression in Agentic RL

Zhikun Xu, Yu Feng, Jacob Dineen, Taiwei Shi, Jieyu Zhao, Ben Zhou

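Affiliations: Arizona State University; University of Pennsylvania; University of Southern California

AI summary: Proposes ReuseRL, which uses the Minimum Description Length principle to compress successful trajectories into a dictionary of reusable skills and penalizes poorly encoded behaviors with a segmentation cost, improving in- and out-of-distribution success rates across several environments.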

Comments Work in progress

Abstract

Large language model agents trained with reinforcement learning (RL) often learn brittle, task-specific shortcuts. We hypothesize that agents generalize better when their successful trajectories are structurally compressible, decomposed into a small set of reusable abstract patterns. To formalize this, we introduce ReuseRL, which grounds agentic RL in the Minimum Description Length (MDL) principle. ReuseRL extracts a shared skill dictionary from successful trajectories and augments the RL objective with a segmentation cost, explicitly penalizing idiosyncratic behaviors that encode poorly. We prove a PAC-Bayes generalization bound for this compression penalty. Across ALFWorld, TextWorld-Cooking, and Countdown-Stepwise, ReuseRL improves in- and out-of-distribution success over vanilla GRPO and strong round-length baselines.

2605.31504 2026-06-01 cs.LG stat.ML 版本更新

When Are Multimodal Predictions Biologically Supported? A Diagnostic Evaluation Framework

何时多模态预测具有生物学支持?一个诊断性评估框架

Dylan Steiner, Gustavo Arango-Argoty, Gerald Sun, Etai Jacob

发表机构 * Oncology Data Science & AI, R&D(肿瘤数据科学与人工智能,研发)

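AI summary: Proposes the DECAT framework, which uses five null-referenced metrics and a rule-based decision procedure to classify multimodal representations into four diagnostic scenarios, detecting whether a model has learned shared biology, single-modality biology, or spurious correlations.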

详情
AI中文摘要

肿瘤学中的多模态模型可以产生准确的预测,但准确预测并不能揭示模型是否学到了跨模态共享的生物学、局限于单一模态的生物学,还是反映了混杂因素而非真正生物学的虚假相关性。我们引入了DECAT,一个模型无关的事后评估框架,该框架针对给定任务和模态,使用五个零参考指标和基于规则的决策程序,将多模态表示分类为四种诊断场景。该框架作用于学习到的表示,不需要知道存在哪个特定混杂因素,并在证据不足时返回不确定。我们在四种多模态模型类别(超过2500个训练表示)的合成数据上以及来自8979名TCGA患者的真实数据上验证了DECAT,评估了多模态嵌入和五个预训练的病理基础模型。纠缠模型(如CLIP)实现了近乎完美的共享生物学检测,但在真实基础模型嵌入中,大多数情况下错误地声称存在共享生物学。这种错误声称率随着混杂强度增加而增加,因此更大的队列和更强的表示会产生更自信但仍然错误的诊断。应用于多模态TCGA嵌入和五个没有配对RNA的病理基础模型时,DECAT检测到了AUROC无法看到的混杂,而无需混杂标签,这一点通过事后分层得到了证实。

英文摘要

Multimodal models in oncology can produce accurate predictions, but accurate prediction does not reveal whether the model has learned biology that is shared across modalities, biology confined to one modality, or spurious correlations that reflect confounders rather than genuine biology. We introduce DECAT, a model-agnostic post-hoc evaluation framework that classifies multimodal representations into four diagnostic scenarios for a given task and modality, using five null-referenced metrics and a rule-based decision procedure. The framework operates on learned representations, requires no knowledge of which specific confounder is present, and returns indeterminate when the evidence is insufficient. We validate DECAT on synthetic data across four multimodal model classes (over 2,500 trained representations) and on real data from 8,979 TCGA patients, evaluating both multimodal embeddings and five pretrained pathology foundation models. Entangled models (e.g., CLIP) achieve near-perfect shared biology detection but falsely claim shared biology in the majority of cases where it is absent on real foundation model embeddings. This false claim rate increases with confound strength so that larger cohorts and stronger representations produce more confident but still incorrect diagnoses. Applied to both multimodal TCGA embeddings and five pathology foundation models without paired RNA, DECAT detects confounding invisible to AUROC without requiring the confounder labels, as confirmed by post-hoc stratification.

2605.31503 2026-06-01 cs.CV cs.LG 版本更新

How can embedding models bind concepts?

嵌入模型如何绑定概念?

Arnas Uselis, Darina Koishigarina, Seong Joon Oh

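AI summary: Studies the concept-binding limitations of vision-language embedding models such as CLIP. Scene embeddings decompose additively into object representations, but CLIP's high-complexity binding function hinders generalization, whereas transformer models trained with sufficient data coverage learn low-complexity binding functions with multiplicative interactions and generalize systematically.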

Comments ICML 2026

详情
AI中文摘要

人类在多物体场景中能轻松判断哪种颜色属于哪种形状,这种能力称为概念绑定。视觉-语言嵌入模型(如CLIP)在绑定时存在困难:它们能识别单个概念,但无法表示哪些概念构成哪些对象。尽管CLIP在跨模态检索中表现为词袋模型,但对象信息可以从其图像和文本嵌入中分别恢复。我们通过绑定函数(将概念映射到场景嵌入)研究这种张力。我们发现场景嵌入可加性分解为对象表示,这解释了为何单模态探针能恢复对象信息。然而,CLIP的绑定函数具有高复杂度,这可能阻止图像和文本编码器学习共享的绑定机制,从而无法泛化到未见过的概念组合。然后我们探究这种局限性是否是根本性的。我们证明并非如此。在从零开始训练的受控Transformer模型中,随着数据覆盖率的增加,绑定泛化出现。这些模型学习到低复杂度的绑定函数,其特点是概念之间的乘法交互,从而实现系统泛化。代码公开于https://github.com/oshapio/binding-concepts-complexity。

英文摘要

Humans easily determine which color belongs to which shape in multi-object scenes, an ability known as concept binding. Vision-language embedding models such as CLIP struggle with binding: they recognize individual concepts but fail to represent which concepts form which objects. Although CLIP behaves like a bag-of-concepts model in cross-modal retrieval, object information is recoverable from its image and text embeddings separately. We study this tension through the binding function, which maps concepts to scene embeddings. We find that scene embeddings decompose additively into object representations, explaining why uni-modal probes can recover object information. However, CLIP's binding function is high-complexity, which likely prevents the image and text encoders from learning a shared binding mechanism that generalizes to unseen concept combinations. We then ask whether this limitation is fundamental. We show that it is not. In controlled transformer models trained from scratch, binding generalization emerges with sufficient data coverage. These models learn low-complexity binding functions characterized by multiplicative interactions between concepts, enabling systematic generalization. Code is publicly available at https://github.com/oshapio/binding-concepts-complexity.

2605.31500 2026-06-01 cs.LG cs.AI 版本更新

On Efficient Scaling of GNNs via IO-Aware Layers Implementations

通过IO感知层实现实现GNN的高效扩展

Daria Fomina, Daniil Krasylnikov, Alexey Boykov, Andrey Dolgovyazov, Vyacheslav Zhdanovskiy, Fedor Velikonivtsev

发表机构 * HSE University(俄罗斯高等经济大学) ITMO University(ITMO大学)

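AI summary: To address the sparse, irregular memory-access bottleneck in GNNs, proposes GPU kernels for three kernel families (SpMM-based convolutions, reduction-based aggregations, and attention-based layers) that reduce data movement and improve locality, achieving up to 8.5× speedup and up to 76× lower peak memory on realistic graphs.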

Comments International Conference on Machine Learning (ICML) 2026, Spotlight Paper

详情
AI中文摘要

图神经网络(GNN)受限于稀疏、不规则的内存访问。流行的框架如DGL和PyTorch Geometric支持通用消息传递,但复杂层通常具体化边中间结果,增加内存流量并限制在大图上的可扩展性。我们以I/O和算术强度为中心的观点表明,广泛使用的层分为三种内核族:基于SpMM的卷积、基于归约的聚合和基于注意力的层(GATv2/Graph Transformer)。对于每个族,我们开发了减少数据移动、改善局部性并在真实图上保持鲁棒性的GPU内核。我们还研究了图重排序,发现其影响取决于内核映射:它对邻居并行(以gather为主)内核的益处比特征并行设计更一致。实验表明,我们的融合注意力内核在Graph Transformer上达到高达$\textbf{3.9}\times$的加速(中位数$\textbf{1.6}\times$),在局部密集图上使用Tensor Core(块稀疏)变体达到高达$\textbf{7.3}\times$;对于GATv2,我们达到高达$\textbf{8.5}\times$的加速(中位数$\textbf{2.0}\times$),同时峰值内存降低高达$\textbf{76}\times$(中位数$\textbf{6}\times$)。我们的度感知归约内核达到高达$\textbf{10}\times$的加速(中位数$\textbf{2.6}\times$)。对于基于SpMM的层,适当缓存的cuSPARSE比DGL达到高达$\textbf{8}\times$的加速,并在大多数评估中优于评估的自定义基线。我们发布我们的实现作为即插即用的替代品,以支持可重现的、硬件感知的GNN加速。

英文摘要

Graph Neural Networks (GNNs) are bottlenecked by sparse, irregular memory access. Popular frameworks such as DGL and PyTorch Geometric support general message passing, but complex layers often materialize edge-wise intermediates, increasing memory traffic and limiting scalability on large graphs. We take an I/O- and arithmetic-intensity-centric view and show that widely used layers fall into three kernel families: SpMM-based convolutions, reduction-based aggregations, and attention-based layers (GATv2/Graph Transformer). For each family, we develop GPU kernels that reduce data movement, improve locality, and remain robust across realistic graphs. We also study graph reordering and find that its impact depends on the kernel mapping: it benefits neighbor-parallel (gather-dominated) kernels more consistently than feature-parallel designs. Empirically, our fused attention kernels reach up to $\textbf{3.9}\times$ speedup for Graph Transformer (median $\textbf{1.6}\times$), with Tensor Core (block-sparse) variants up to $\textbf{7.3}\times$ on locally dense graphs; for GATv2 we reach up to $\textbf{8.5}\times$ speedup (median $\textbf{2.0}\times$) while reducing peak memory by up to $\textbf{76}\times$ (median $\textbf{6}\times$). Our degree-aware reduction kernels achieve up to $\textbf{10}\times$ speedup (median $\textbf{2.6}\times$). For SpMM-based layers, properly cached cuSPARSE achieves up to $\textbf{8}\times$ speedup over DGL and outperforms evaluated custom baselines in the majority of evaluations. We release our implementations as drop-in replacements to support reproducible, hardware-aware GNN acceleration.
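
The contrast between materializing edge-wise intermediates and an SpMM-style aggregation can be sketched on the CPU in plain Python. This is only an illustration of the data-movement idea, not the paper's GPU kernels; the function names and the tiny example graph are hypothetical.

```python
def spmm_csr(indptr, indices, values, X):
    """Compute Y = A @ X for a CSR matrix A, accumulating directly into each output row."""
    n_rows, n_feat = len(indptr) - 1, len(X[0])
    Y = [[0.0] * n_feat for _ in range(n_rows)]
    for i in range(n_rows):
        acc = Y[i]
        for k in range(indptr[i], indptr[i + 1]):
            w, src = values[k], X[indices[k]]
            for f in range(n_feat):
                acc[f] += w * src[f]  # no per-edge message is stored
    return Y

def edgewise_aggregate(edges, weights, X, n_rows):
    """Reference path: build one message per edge first, then scatter-add into destinations."""
    messages = [[w * x for x in X[src]] for (_, src), w in zip(edges, weights)]
    Y = [[0.0] * len(X[0]) for _ in range(n_rows)]
    for (dst, _), msg in zip(edges, messages):
        for f, v in enumerate(msg):
            Y[dst][f] += v
    return Y

# Tiny example graph (hypothetical): edges are (destination, source) pairs.
edges = [(0, 1), (0, 2), (1, 0), (2, 0), (2, 1)]
weights = [0.5, 0.5, 1.0, 0.5, 0.5]
X = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]

# The same graph in CSR form, with rows indexed by destination node.
indptr = [0, 2, 3, 5]
indices = [1, 2, 0, 0, 1]
values = [0.5, 0.5, 1.0, 0.5, 0.5]

Y_fused = spmm_csr(indptr, indices, values, X)
Y_edgewise = edgewise_aggregate(edges, weights, X, n_rows=3)
```

Both paths give the same result; the edge-wise path allocates storage proportional to the number of edges times the feature width, which is the kind of intermediate the abstract describes as a source of extra memory traffic.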

2605.31497 2026-06-01 cs.LG stat.ML 版本更新

Assign and Add: A Mechanistic Study of Compositional Arithmetic

Assign and Add: 组合算术的机制研究

Brady Exoo, Alberto Bietti, John Sous

发表机构 * Yale University(耶鲁大学) Flatiron Institute(Flatiron研究所)

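AI summary: Uses a variable-assignment and modular-addition task to study the mechanism of compositional generalization in transformers, finding that the model uses the same modular-addition module for direct and indirect inputs and revealing three-phase learning dynamics.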

详情
AI中文摘要

大型语言模型能够组合技能以执行复杂任务,其中许多任务可能在训练期间未曾见过。这种组合发生的具体细节仍然难以捉摸。在本文中,我们通过考虑一个涉及变量赋值和模加法的简单受控设置,研究Transformer中组合泛化的机制。通过将训练数据划分为不相交的集合,我们观察到小型Transformer能够泛化到先前未见过的变量和数字组合。我们的机制分析表明,无论输入是直接给出还是通过单独的变量赋值机制间接给出,都使用相同的“模加法”MLP模块。我们还从经验角度分析了训练动态,揭示了三个学习阶段:首先学习模加法,然后学习变量赋值所需的结构,最后是精炼阶段,模型泛化到训练中未见的一些困难序列。最后,我们提供了一个理论框架来解释组合性如何从训练动态中涌现。这些结果表明,组合泛化可以是Transformer内部机制组合性的自然结果。

英文摘要

Large language models are able to compose skills in order to perform complex tasks, many of which might not have been seen during training. The details of how exactly this composition occurs remain elusive. In this paper, we study a mechanism for compositional generalization in transformers by considering a simple controlled setting involving variable assignment and modular addition. By partitioning our training data into disjoint sets, we observe that small transformers are able to generalize to previously unseen combinations of variables and numbers. Our mechanistic analysis shows that the same "modular addition" MLP module is used whether the inputs are given directly or indirectly through a separate variable assignment mechanism. We also analyze the training dynamics from an empirical lens, which reveals three phases of learning: first, modular addition is learned, then the structure required for variable assignment, and finally a refinement phase where the model generalizes to some hard sequences not seen in training. Finally, we provide a theoretical framework to explain how compositionality emerges from training dynamics. These results suggest that compositional generalization can be a natural consequence of the compositionality of internal mechanisms in transformers.

2605.31494 2026-06-01 cs.CL cs.LG 版本更新

Consolidating Rewarded Perturbations for LLM Post-Training

整合奖励扰动用于大语言模型后训练

Zheyu Zhang, Shuo Yang, Gjergji Kasneci

发表机构 * Technical University of Munich(慕尼黑技术大学) Munich Center for Machine Learning (MCML)(慕尼黑机器学习中心(MCML))

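AI summary: Proposes CoRP, which consolidates rewarded perturbations into a single model through reward-weighted aggregation, compatibility-aware reweighting, and a validation gate, without gradients, improving the base model by 8.1 points on average at one forward pass per test example.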

详情
AI中文摘要

语言模型的后训练通常被框架为通过梯度下降实现的样本-分数-更新循环。最近的一系列工作,以RandOpt为例,将此循环转移到权重空间,在预训练模型周围采样高斯扰动,并在推理时集成前K个奖励专家。虽然在与PPO和GRPO匹配训练计算量下具有竞争力,但这种预测级集成每个测试样本需要K次前向传播,并且不能干净地扩展到自由生成。我们询问是否可以将奖励种群折叠成一个单一的可部署模型,用一次整合更新替代推理时集成。对25个模型-任务对的拆分半分析揭示了每种情况下可复现的低秩结构。我们将这种几何结构转化为CoRP(整合奖励扰动),这是一种无梯度算子,结合了奖励加权聚合、兼容性感知重加权和保留验证门控,且没有梯度通过语言模型。在从0.5B到8B的五个语言模型和涵盖数学、代码和创意写作的五个任务上,CoRP平均将基础模型提升了8.1分。使用RandOpt扰动预算的十分之一,CoRP超过了单次推理的RandOpt 6.5分,并恢复了50次多数投票集成增益的一半以上,而每个测试样本只需一次前向传播。

英文摘要

Post-training of language models is commonly framed as a sample-score-update loop implemented by gradient descent. A recent line of work, exemplified by RandOpt, relocates this loop to weight space, sampling Gaussian perturbations around a pretrained model and ensembling the top-K rewarded specialists at inference. While competitive with PPO and GRPO under matched training compute, this prediction-level ensemble incurs K forward passes per test example and does not extend cleanly to free-form generation. We ask whether the rewarded population can instead be folded into a single deployable model, replacing the inference-time ensemble with one consolidated update. A split-half analysis over 25 model-task pairs reveals reproducible low-rank structure in every case. We turn this geometry into CoRP (Consolidating Rewarded Perturbations), a gradient-free operator that combines reward-weighted aggregation, compatibility-aware reweighting, and a held-out validation gate, with no gradient flowing through the language model. Across five language models from 0.5B to 8B and five tasks covering math, code, and creative writing, CoRP improves the base model by 8.1 points on average. Using one tenth of RandOpt's perturbation budget, CoRP exceeds single-inference RandOpt by 6.5 points and recovers more than half of the gain of the 50-pass majority-vote ensemble, at one forward pass per test example.
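
A rough sketch of two of the ingredients named above, reward-weighted aggregation and a held-out validation gate, on a toy quadratic reward. It is an assumption-laden illustration: the softmax weighting, the temperature, the toy `reward` function, and all names are hypothetical, and the compatibility-aware reweighting step is omitted entirely.

```python
import math
import random

def reward(theta, target):
    """Toy reward (assumption): negative squared distance to a target vector."""
    return -sum((t - g) ** 2 for t, g in zip(theta, target))

def consolidate(theta, perturbations, rewards, temperature=1.0):
    """Add a softmax-reward-weighted average of the perturbations to theta (no gradients)."""
    top = max(rewards)
    w = [math.exp((r - top) / temperature) for r in rewards]
    z = sum(w)
    delta = [sum(wi * p[j] for wi, p in zip(w, perturbations)) / z
             for j in range(len(theta))]
    return [t + dj for t, dj in zip(theta, delta)]

rng = random.Random(0)
dim, n_perturb, sigma = 8, 256, 0.3
train_target = [1.0] * dim
val_target = [1.0 + 0.05 * rng.gauss(0, 1) for _ in range(dim)]  # held-out variant

theta = [0.0] * dim  # stands in for the pretrained weights
perturbations = [[sigma * rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_perturb)]
rewards = [reward([t + e for t, e in zip(theta, p)], train_target) for p in perturbations]

candidate = consolidate(theta, perturbations, rewards, temperature=2.0)

# Validation gate: keep the consolidated update only if held-out reward improves.
accepted = reward(candidate, val_target) > reward(theta, val_target)
theta_new = candidate if accepted else theta
```

The result is a single parameter vector, so inference needs one forward pass instead of an ensemble over the top-K perturbed models.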

2605.31485 2026-06-01 cs.LG math.CT 版本更新

Graphical einops: bridging tensor networks and computation graphs

Graphical einops: 桥接张量网络与计算图

Vincent Wang-Maścianica, Nikhil Khatri

发表机构 * Laboratory for Human-Centered AI, Department of Philosophy, University of Oxford(人类中心人工智能实验室,哲学系,牛津大学) Machine Learning Research Group, Department of Engineering Science, University of Oxford(机器学习研究组,工程科学系,牛津大学)

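AI summary: Proposes a formal graphical calculus for the structural fragment of tensor programming underlying einops, enables diagrammatic proofs of tensor equivariance via the grade-naturality rewrite, and applies it to converting attention masks into pre-processing operations that recover efficient sparse-attention implementations.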

详情
AI中文摘要

架构图在深度学习中无处不在,但它们通常仅具有表示性:它们所暗示的张量程序恒等式仍然通过散文和张量轴操作来证明。我们为einops基础下的张量编程结构片段引入了一种形式化的图形演算,使得此类图能够支持证明。我们的演算将张量轴表示为围绕基础类型的嵌套分级管。管边界恢复了轴的无向张量网络视图,而有向内部保留了计算图的操作性解读。关键的重写是等级自然性:在管上滑动眼镜。标准的等变性证明变成了简短的图解推导。我们还展示了如何将我们的重写系统应用于将注意力掩码转换为预处理操作,从而恢复稀疏注意力块的高效实现。

英文摘要

Architecture diagrams are ubiquitous in deep learning, but they are usually only representational: the tensor-program identities they suggest are still proved by prose and tensor-axis manipulation. We introduce a formal graphical calculus for the structural fragment of tensor programming underlying einops, making such diagrams proof-enabling. Our calculus represents tensor axes as nested graded tubes around a base type. The tube boundary recovers the undirected tensor-network view of axes, while the directed interior retains the operational reading of computation graphs. The key rewrite is grade-naturality: sliding spectacles over tubes. Standard equivariance proofs become short diagrammatic derivations. We additionally demonstrate how our rewrite system may be applied to convert attention masks into pre-processing operations, recovering efficient implementations of sparse attention blocks.

2605.31484 2026-06-01 cs.LG 版本更新

Balanced LoRA: Removing Parameter Invariance to Accelerate Convergence

平衡LoRA:消除参数不变性以加速收敛

Valérie Castin, Kimia Nadjahi, Pierre Ablin, Gabriel Peyré

发表机构 * Apple, Paris, France(苹果公司,巴黎,法国) CNRS(法国国家科学研究中心)

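AI summary: Observing that LoRA's overparameterization lets different low-rank factor pairs have very different condition numbers, which affects convergence speed, proposes BaLoRA, which projects iterates onto a balanced manifold to improve the conditioning of the loss landscape, yielding faster convergence and better performance.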

Comments Accepted at ICML 2026

详情
AI中文摘要

低秩适应(LoRA)是微调大型语言模型最广泛采用的方法。值得注意的是,LoRA本质上是过参数化的:多对低秩因子可以产生相同的适应权重矩阵。我们从理论和经验上表明,这些对表现出显著不同的条件数。因此,收敛到不同的损失最小化器直接影响LoRA的收敛速度。基于这一观察,我们引入了平衡低秩适应(BaLoRA),这是LoRA的一种变体,将迭代投影到平衡流形上。该流形在保持适应矩阵的同时改善了损失景观的条件。投影步骤计算量轻,并可以无缝集成到现有的微调流程中。经验上,BaLoRA比标准LoRA收敛更快,并在各种微调任务中实现了更优的性能。

英文摘要

Low-Rank Adaptation (LoRA) is the most widely adopted method for fine-tuning large language models. Notably, LoRA is inherently overparameterized: multiple pairs of low-rank factors can yield the same adapted weight matrix. We show, both theoretically and empirically, that these pairs exhibit significantly different condition numbers. As a result, converging to different loss minimizers directly impacts the convergence rate of LoRA. Building on this observation, we introduce Balanced Low-Rank Adaptation (BaLoRA), a variant of LoRA that projects iterates onto a balanced manifold. This manifold improves the conditioning of the loss landscape while preserving the adapted matrix. The projection step is computationally lightweight and integrates seamlessly into existing fine-tuning pipelines. Empirically, BaLoRA converges faster than standard LoRA and achieves superior performance across a range of fine-tuning tasks.
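
The parameter invariance mentioned above is easy to see in the rank-1 case, where the update is an outer product `b a^T` and any rescaling `(s*b, a/s)` leaves it unchanged. The sketch below picks the rescaling that equalizes the two factor norms. It assumes that "balanced" means equal factor norms in rank 1 (for general rank the analogous condition is commonly written as B^T B = A A^T); the paper's actual projection may differ, and the function names are hypothetical.

```python
import math

def outer(b, a):
    """Rank-1 matrix b a^T as a list of rows."""
    return [[bi * aj for aj in a] for bi in b]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def balance_rank1(b, a):
    """Rescale (b, a) -> (s*b, a/s) so both factors have equal norm; b a^T is unchanged."""
    s = math.sqrt(norm(a) / norm(b))
    return [s * x for x in b], [x / s for x in a]

# A badly scaled factor pair (hypothetical values).
b = [100.0, -50.0, 25.0]
a = [0.01, 0.02, -0.04, 0.03]

b_bal, a_bal = balance_rank1(b, a)
delta_before = outer(b, a)
delta_after = outer(b_bal, a_bal)
```

After balancing, the adapted matrix is the same up to floating-point error, but the two factors have equal norm instead of differing by several orders of magnitude.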

2605.31464 2026-06-01 cs.LG cs.AI 版本更新

GPU Forecasters: Language Models as Selective Surrogates for Kernel Runtime Optimization

GPU预测器:语言模型作为内核运行时优化的选择性替代

Zaid Khan, Justin Chih-Yao Chen, Jaemin Cho, Elias Stengel-Eskin, Mohit Bansal

发表机构 * UNC Chapel Hill(北卡罗来纳大学教堂山分校) AI2 Johns Hopkins University(约翰霍普金斯大学) University of Texas at Austin(德克萨斯大学奥斯汀分校)

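AI summary: Studies language models as selective surrogates for GPU kernel performance, uses reinforcement learning to improve forecast accuracy and calibration, and speeds up kernel search under a limited GPU evaluation budget.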

Comments Code: https://github.com/codezakh/gpu-forecasters

详情
AI中文摘要

GPU内核是现代深度学习的主力,优化它们(通过进化搜索或编码代理)通常需要在目标硬件上重复测量。虽然这些测量提供了内核搜索所需的地面真实信号,但成本高昂,因为每次评估内核都需要编译并在GPU上重复执行。随着LLM推理的改进降低了编写新内核的成本,并且LLM驱动的搜索扩展到大的搜索预算,设备上的评估成为瓶颈。为了解决这个问题,我们研究LLM如何通过预测所提议内核的性能,作为选择性GPU替代用于内核评估。一个有用的替代应该是准确的,并且应该是选择性的,知道何时可能出错,并推迟到GPU。为了评估替代,我们测量其预测是否准确、校准良好,并且在有限的GPU测量预算下对恢复快速内核实际有用。接下来,我们研究强化学习是否能提高预测准确性和置信度校准。我们的实验表明,LLM可以准确预测相对内核性能,并且通过强化学习可以提高其实用性。在内核搜索中使用替代,使得搜索在相同的GPU评估预算下可以考虑多倍的候选,从而比同等预算的基线找到更快的内核。这些结果表明,LLM可以在内核优化中发挥更广泛的作用,作为GPU的虚拟模型,而不仅仅是搜索的内核生成器。

英文摘要

GPU kernels are the workhorse of modern deep learning, and optimizing them (via evolutionary search or coding agents) usually requires repeated measurement on target hardware. While these measurements provide the ground-truth signal necessary for kernel search, they are costly, because each evaluation of a kernel requires compilation and repeated execution on a GPU. As improvements in LLM inference reduce the cost of writing novel kernels and LLM-driven searches scale to large search budgets, on-device evaluation becomes a bottleneck. To address this, we study how LLMs can serve as selective GPU surrogates for kernel evaluation, by forecasting the performance of proposed kernels. A useful surrogate should be accurate, and it should be selective, by knowing when it could be wrong, and deferring to the GPU. To evaluate surrogates, we measure whether their forecasts are accurate, calibrated, and practically useful for recovering fast kernels under limited GPU-measurement budgets. Next, we study whether reinforcement learning can improve forecast accuracy and confidence calibration. Our experiments demonstrate that LLMs can accurately forecast relative kernel performance, and that their utility can be improved through reinforcement learning. Used inside a kernel search, the surrogate lets the search consider several times as many candidates under the same GPU evaluation budget, and that leads to finding faster kernels than an equal-budget baseline. These results suggest that LLMs can play a broader role in kernel optimization, by acting as virtual models of a GPU rather than solely as kernel generators for search.

2605.31463 2026-06-01 cs.LG cs.AI cs.CL cs.DC 版本更新

PithTrain: A Compact and Agent-Native MoE Training System

PithTrain: 一个紧凑且面向智能体的MoE训练系统

Ruihang Lai, Hao Kang, Haozhan Tang, Akaash R. Parthasarathy, Zichun Yu, Junru Shao, Todd C. Mowry, Chenyan Xiong, Tianqi Chen

发表机构 * Carnegie Mellon University(卡内基梅隆大学) Xlue NVIDIA(英伟达)

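AI summary: Proposes PithTrain, a compact MoE training framework built on agent-native design principles, and introduces ATE-Bench to evaluate agent-task efficiency; while matching the throughput of production frameworks, it reduces agent turns and active GPU time by up to 62% and 64%, respectively.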

详情
AI中文摘要

混合专家模型(MoE)已成为前沿语言模型的主导架构。为满足这一需求,生产框架经过多年的工程努力构建了优化的MoE训练栈。然而,为新的架构和系统优化而演进这些栈仍然代价高昂。随着AI编码智能体的兴起,它们可以自动化训练框架开发的部分工作并加速这一演进。但将这些智能体应用于现有框架会带来隐藏成本,这些成本在当今仅关注吞吐量的评估中不可见。我们将这一缺失维度命名为智能体任务效率(ATE):即使用编码智能体理解、操作和扩展框架的成本。基于四个智能体原生设计原则,我们构建了PithTrain,一个紧凑、智能体原生的MoE训练框架。我们进一步引入了ATE-Bench,涵盖现实世界的训练框架任务。我们的评估表明,PithTrain在吞吐量上与生产框架相当,并且在ATE-Bench上,PithTrain实现了更高的智能体任务效率,智能体轮次减少高达62%,活跃GPU时间减少64%。

英文摘要

Mixture-of-Experts (MoE) has become the dominant architecture for frontier language models. To meet this demand, production frameworks have built optimized MoE training stacks over years of engineering effort. Yet evolving these stacks for new architectures and system optimizations remains expensive. AI coding agents are on the rise and could automate parts of training-framework development and accelerate this evolution. But applying them to these existing frameworks carries hidden costs, invisible to today's throughput-only evaluations. We name this missing dimension agent-task efficiency (ATE): the cost of using coding agents to understand, operate, and extend a framework. Grounded in four agent-native design principles, we build PithTrain, a compact, agent-native MoE training framework. We further introduce ATE-Bench, covering real-world training-framework tasks. Our evaluation shows PithTrain matches the throughput of production frameworks, and on ATE-Bench, PithTrain enables higher agent-task efficiency, with up to 62% fewer Agent Turns and 64% less Active GPU Time.

2605.31455 2026-06-01 cs.LG cs.CL 版本更新

DRIFT: Decoupled Rollouts and Importance-Weighted Fine-Tuning for Efficient Multi-Turn Optimization

DRIFT: 解耦的轨迹采样与重要性加权微调以实现高效的多轮优化

Jian Mu, Tianyi Lin, Chengwei Qin, Zhongxiang Dai, Yao Shu

发表机构 * The Hong Kong University of Science and Technology(香港科技大学) The Chinese University of Hong Kong, Shenzhen(香港中文大学(深圳))

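AI summary: Online reinforcement learning is costly for multi-turn interaction while offline supervised fine-tuning suffers from distribution shift; proposes the DRIFT framework, which recasts the KL-regularized RL objective as an equivalent importance-weighted supervised learning problem for efficient and stable multi-turn optimization.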

详情
AI中文摘要

大型语言模型越来越多地部署在多轮交互环境中,用户或环境可以迭代地提供轻量级反馈。不幸的是,优化这种行为在实践中面临一个尖锐的困境:在线强化学习能够有效处理多轮动态,但由于每次更新时生成完整修正轨迹的成本过高而变得昂贵,而离线监督微调(SFT)虽然高效,但存在分布偏移和行为崩溃的问题。为此,我们创新性地提出了DRIFT(解耦的轨迹采样与重要性加权微调)框架,该框架实现了KL正则化强化学习目标等价于重要性加权监督学习的理论洞察。DRIFT通过从固定参考策略中采样离线交互轨迹,推导基于回报的重要性权重,并通过在所得数据集上进行加权SFT来优化策略,从而将轨迹采样与优化解耦。实验表明,DRIFT在多轮强化学习基线中达到或超越其性能,同时保持了标准监督微调的训练效率和简单性。代码可在 https://github.com/2020-qqtcg/DRIFT 获取。

英文摘要

Large language models are increasingly deployed in multi-turn interactive settings where users or environments can iteratively provide lightweight feedback. Unfortunately, optimizing such behavior presents a sharp dilemma in practice: online reinforcement learning is able to effectively address multi-turn dynamics but is prohibitively expensive due to the cost of generating full correction trajectories at every update, whereas offline supervised fine-tuning (SFT) is efficient but suffers from distribution shift and behavioral collapse. To this end, we propose DRIFT (Decoupled Rollouts and Importance-Weighted Fine-Tuning), a framework that operationalizes the theoretical insight that the KL-regularized RL objective is equivalent to importance-weighted supervised learning. DRIFT decouples rollout from optimization by sampling offline interaction trajectories from a fixed reference policy, deriving return-based importance weights, and optimizing the policy via weighted SFT on the resulting dataset. Empirically, we demonstrate that DRIFT matches or exceeds the performance of multi-turn reinforcement learning baselines while maintaining the training efficiency and simplicity of standard supervised fine-tuning. Code is available at https://github.com/2020-qqtcg/DRIFT.
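
The weighted-SFT step can be sketched as follows. The sketch assumes the familiar closed form in which the KL-regularized optimum reweights reference-policy trajectories by `exp(return / beta)`; whether DRIFT uses exactly these weights is an assumption, and the function names, the `beta` value, and the example numbers are hypothetical.

```python
import math

def importance_weights(returns, beta):
    """Self-normalized weights proportional to exp(R_i / beta) for offline trajectories."""
    top = max(returns)  # subtract the max for numerical stability
    unnorm = [math.exp((r - top) / beta) for r in returns]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def weighted_sft_loss(logprobs, weights):
    """Importance-weighted negative log-likelihood over the offline dataset."""
    return -sum(w * lp for w, lp in zip(weights, logprobs))

# Hypothetical trajectories sampled once from a fixed reference policy.
returns = [1.0, 0.0, 0.5, 1.0]
logprobs = [-2.0, -1.0, -3.0, -2.5]  # log-probability of each trajectory under the current policy

weights = importance_weights(returns, beta=0.5)
loss = weighted_sft_loss(logprobs, weights)

# A very large beta flattens the weights toward plain (unweighted) SFT.
nearly_uniform = importance_weights(returns, beta=1e6)
```

Because the trajectories and weights are computed once, each training step looks like ordinary supervised fine-tuning with per-example weights, with no rollouts inside the optimization loop.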

2605.31445 2026-06-01 cs.GT cs.AI cs.CL cs.LG 版本更新

Used Car Salesbots? Honesty and Credulity of LLMs as Bargaining Agents under Partial Information

二手车销售机器人?作为讨价还价代理的LLM在部分信息下的诚实与轻信

Antonio Valerio Miceli-Barone, Vaishak Belle, Shay B. Cohen

发表机构 * University of Edinburgh(爱丁堡大学)

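AI summary: Studies LLM agents in simulated bargaining scenarios, finding that they deviate from game-theoretic equilibria, attempt to lie but cannot effectively exploit information asymmetries, and that optimizing for financial utility strengthens negotiation but increases dishonesty.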

Comments 18 pages, 14 figures

详情
AI中文摘要

在这项工作中,我们研究了模拟讨价还价场景中的代理,其中买方和卖方通过文本渠道进行通信,并试图在不同信息制度(完全信息、信息不对称或相互不确定性)下谈判互利交易。我们评估了它们相对于博弈论解决方案的表现,并进一步调查了它们的诚实性(披露或隐瞒信息、误导或欺骗的倾向)以及轻信性(信任或不信任对方提供信息的倾向)。我们研究了零样本LLM代理(使用简单的提示脚手架)以及微调代理,以探讨优化代理以最大化财务利润是否使它们成为更强的谈判者,但也更不诚实和更不信任。我们发现,现成的LLM都显著偏离博弈论均衡,它们试图对自己的私人信息撒谎,但无法有效利用信息不对称。对财务效用的微调使代理在达成更好交易方面更强,但也更不诚实,这突显了优化代理任务对其安全性可能带来的风险。我们发布了我们的代码和一个讨价还价场景数据集。

英文摘要

In this work we study agents in simulated bargaining scenarios, where a buyer and a seller communicate through a text channel and attempt to negotiate mutually beneficial trades, under different information regimes (complete information, information asymmetry or mutual uncertainty). We evaluate their performance w.r.t. game-theoretical solutions and further investigate their honesty (their tendency to disclose or withhold information or to mislead and deceive) as well as their credulity (their tendency to trust or distrust information provided by the other agent). We study zero-shot LLM agents with simple prompting scaffolding as well as fine-tuned agents, in order to investigate whether optimising the agents to maximise financial profits makes them stronger negotiators but also more dishonest and less trusting. We find that off-the-shelf LLMs all deviate substantially from game-theoretical equilibria: they attempt to lie about their private information but cannot efficiently exploit information asymmetries. Fine-tuning on financial utility makes the agents stronger at achieving better deals but also more dishonest, highlighting the risks that optimising agents for a task can have on their safety. We release our code and a dataset of bargaining scenarios.

2605.31443 2026-06-01 stat.ME cs.LG econ.EM math.ST stat.TH 版本更新

Modeling Covariate Transition for Efficient Estimation of Longitudinal Treatment Effects in Randomized Experiments

建模协变量转移以高效估计随机实验中的纵向处理效应

Naoki Chihara, Tatsushi Oka, Yasuko Matsubara, Yasushi Sakurai, Shota Yasui

发表机构 * SANKEN, The University of Osaka, Osaka, Japan(大阪大学产业科学研究所) CyberAgent, Inc., Tokyo, Japan(日本东京CyberAgent公司) Keio University, Tokyo, Japan(庆应义塾大学)

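AI summary: Proposes a regression-adjustment framework that models covariate transitions to estimate longitudinal treatment effects in randomized experiments, and establishes asymptotic normality and semiparametric efficiency.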

Comments Accepted by ICML'26

详情
Journal ref
The 43rd International Conference on Machine Learning, 2026
AI中文摘要

我们提出一个回归调整框架,用于在静态制度下估计随机实验中的纵向处理效应。虽然回归调整方法通过使用预处理协变量有助于随机实验中的方差减少,但它们通常只关注平均效应,从中我们无法获得关于效应何时出现以及持续多久的有价值见解。为了解决这个问题,我们考虑随时间变化的中间结果和事后协变量,并使用转移核表示这些动态轨迹。此外,我们建立了估计量的渐近正态性和半参数效率界,从而实现更强大的统计推断。使用日本某流媒体平台的A/B测试数据进行的模拟研究和实证分析显示了我们的方法的实际优势。

英文摘要

We present a regression-adjustment framework designed for the estimation of longitudinal treatment effects in randomized experiments under static regimes. While regression-adjustment methods are useful for variance reduction in randomized experiments by using pre-treatment covariates, they usually focus only on average effects, from which we cannot obtain valuable insights into when the effects appear and how long they continue. To address this issue, we consider intermediate outcomes and evolving post-treatment covariates over time, and we represent such dynamic trajectories using transition kernels. Furthermore, we establish the asymptotic normality and the semiparametric efficiency bound for our estimator, enabling more powerful statistical inference. Simulation studies and empirical analysis using A/B test data from a streaming platform in Japan show the practical advantages of our method.

2605.31438 2026-06-01 cs.LG 版本更新

Flow map learning in nonlinear vector autoregressive models: influence of the feature-library structure on the training error

非线性向量自回归模型中的流映射学习:特征库结构对训练误差的影响

Markus Gross

发表机构 * Institute for AI Safety and Security, German Aerospace Center (DLR)(人工智能安全与保密研究所,德国航空航天中心(DLR))

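AI summary: Studies how the feature-library structure in nonlinear vector autoregressive models (NVAR/NG-RC) affects the training error, derives the scaling laws that the training error follows as a function of time resolution, and shows that the error behavior is determined by whether the feature library can exactly represent the Lie-series coefficients of the flow map.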

Comments 35 pages, 12 figures

详情
AI中文摘要

时间序列预测通常需要学习非线性和时滞依赖关系。一类典型的预测模型是非线性向量自回归过程(NVAR),也称为下一代储层计算机(NG-RC)。这些模型在其显式特征库张成的空间上近似Koopman算子。我们考虑学习马尔可夫非线性动力系统的可辨识性问题,并表明训练误差作为时间分辨率的函数遵循特征性的(预)渐近标度律。这些定律取决于特征库能否精确或仅近似表示流映射(传播子)的早期Lie级数系数。对于由多项式向量场控制的动力系统,我们展示了具有单项式和傅里叶特征库的NVAR/NG-RC模型的机制。我们确定了训练误差对时间分辨率、涉及的非线性阶数和延迟项数量的依赖性。虽然延迟项减少了最优单步训练误差,但只有当库提供足够的非线性时,它们才能改善长期预测。因此,当模型类与真实数据生成过程不匹配时,小的训练误差与弱的泛化能力共存。在各种混沌动力系统上的数值实验证实了理论预测。

英文摘要

Time series forecasting often requires learning nonlinear and time-delayed dependencies. A paradigmatic class of forecasting models is nonlinear vector autoregressive processes (NVAR), also known as next-generation reservoir computers (NG-RCs). These models approximate the Koopman operator on the space spanned by their explicit feature library. We consider the identifiability problem for learning Markovian nonlinear dynamical systems and show that the training error as a function of time resolution follows characteristic (pre-)asymptotic scaling laws. These laws depend on whether the feature library can represent the early Lie-series coefficients of the flow map (propagator) exactly or merely approximately. For dynamical systems governed by polynomial vector fields, we demonstrate the mechanism for NVAR/NG-RC models with monomial and Fourier feature libraries. We determine the dependence of the training error on the temporal resolution, the involved nonlinear degree, and the number of delay terms. While delay terms reduce the optimal one-step training error, they improve long-horizon forecasts only when the library provides sufficient nonlinearity. Thus, small training error coexists with weak generalization when the model class is mismatched to the true data-generating process. Numerical experiments on various chaotic dynamical systems confirm the theoretical predictions.
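
For orientation, the Lie series of the flow map referred to above can be written out in its standard form for an autonomous system $\dot{x} = f(x)$. The notation ($f$, $\Phi_{\Delta t}$, $\mathcal{L}_f$, $p$, $d$) is assumed here for illustration and may differ from the paper's.

```latex
\begin{aligned}
\Phi_{\Delta t}(x)
  &= \sum_{k=0}^{\infty} \frac{\Delta t^{k}}{k!}\, \mathcal{L}_f^{k}\, x ,
  \qquad \mathcal{L}_f = f \cdot \nabla , \\
  &= x + \Delta t\, f(x) + \frac{\Delta t^{2}}{2}\, Df(x)\, f(x) + \mathcal{O}(\Delta t^{3}) .
\end{aligned}
```

If $f$ is a polynomial vector field of degree $p$, the coefficient $\mathcal{L}_f^{k} x$ is a polynomial of degree at most $k(p-1)+1$. A monomial library of maximum degree $d$ can therefore represent exactly only the coefficients with $k(p-1)+1 \le d$; for a quadratic field ($p = 2$) and a quadratic library ($d = 2$) that covers only the first-order term $\Delta t\, f(x)$.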

2605.31427 2026-06-01 cs.LG cs.DC 版本更新

DG-CoLearn: An Efficient Collaborative Learning Framework for Dynamic Graphs

DG-CoLearn:一种高效的动态图协同学习框架

Ashley Hoi-Ting Au, Zikun Zhang, Ligang He, Qiang Ni

发表机构 * Department of Computer Science, The University of Warwick(华威大学计算机科学系) School of Computing and Communications, Lancaster University(兰卡斯特大学计算机与通信学系)

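AI summary: Repeated full-snapshot retraining makes dynamic graph learning computationally expensive and ill-suited to collaborative settings with partitioned data; proposes DG-CoLearn, a client-oblivious collaborative learning framework built on incremental graph snapshot processing, with a server-mediated embedding exchange mechanism for accurate multi-hop message passing, improving training speed, communication overhead, and predictive performance.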

详情
AI中文摘要

动态图学习(DGL)对于建模演化的图数据至关重要,但现有方法由于重复的全快照重训练而遭受显著的计算开销,并且不适合具有分区数据的协同设置。在现实的图系统中,跨分区边是不可避免的,但客户端之间直接共享图结构可能违反隐私约束。我们提出DG-CoLearn,一种基于增量图快照处理的客户端无感知协同动态图学习框架,该框架将计算集中在受时间更新影响的图区域,同时通过时间建模保留历史信息。这种增量设计一致地应用于整个图处理流程,包括一种服务器中介的嵌入交换机制,以实现准确的多跳消息传递,而无需暴露原始的跨客户端结构信息。大量实验表明,DG-CoLearn在训练时间上实现了高达33.8倍的加速,通信开销降低了27.4倍,同时在节点分类(F1提升高达13.36%)和链接预测(MAP提升高达8.27%)任务上持续提高了预测性能。这些结果突显了DG-CoLearn在协同动态图学习中桥接效率、可扩展性和客户端间结构隐私方面的有效性。

英文摘要

Dynamic graph learning (DGL) is essential for modelling evolving graph data, but existing methods suffer from significant computational overhead due to repeated full-snapshot retraining and are not well-suited for collaborative settings with partitioned data. In realistic graph systems, cross-partition edges are unavoidable, but direct sharing of graph structure between clients may violate privacy constraints. We propose DG-CoLearn, a client-oblivious collaborative dynamic graph learning framework built on incremental graph snapshot processing, which focuses computation on graph regions affected by temporal updates while preserving historical information through temporal modelling. This incremental design is consistently applied across the entire graph processing pipeline, including a server-mediated embedding exchange mechanism to enable accurate multi-hop message passing without exposing raw cross-client structural information. Extensive experiments demonstrate that DG-CoLearn achieves up to 33.8$\times$ speedup in training time and 27.4$\times$ reduction in communication overhead, while consistently improving predictive performance on both node classification (up to 13.36% F1 improvement) and link prediction (up to 8.27% MAP improvement) tasks. These results highlight the effectiveness of DG-CoLearn in bridging efficiency, scalability, and client-to-client structural privacy in collaborative dynamic graph learning.

2605.31423 2026-06-01 cs.LG 版本更新

Fixed Universal Transformers

固定通用Transformer

Jingwen Liu, Alexandr Andoni, Daniel Hsu

发表机构 * Columbia University(哥伦比亚大学)

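AI summary: Proposes fixed universal transformers, which simulate any transformer in a given class through the input embedding; shows that universality is achievable with sparse constructions at sufficiently large embedding dimension and that randomly initialized transformers are universal almost surely, with experiments validating the theory.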

详情
AI中文摘要

我们引入了通用Transformer:固定Transformer,通过适当的输入嵌入可以模拟给定类别中的任何Transformer。类似于通用图灵机,输入嵌入编码了目标模型的描述,而所有内部参数保持不变。我们提供了明确的稀疏构造,在嵌入维度足够大时实现通用性,并进一步表明通用性是普遍的:随机初始化的Transformer几乎必然具有通用性,这与Zhong和Andreas(2024)最近的实证结果一致。我们在括号平衡和多跳推理的算法任务上实证验证了我们的理论。我们的结果表明,Transformer的很大一部分表达能力可能在于其输入表示,而不是其学习到的权重。

英文摘要

We introduce universal transformers: fixed transformers that can simulate any transformer in a given class via a suitable input embedding. Analogous to a universal Turing machine, the input embedding encodes a description of the target model while all internal parameters remain fixed. We provide explicit sparse constructions achieving universality when the embedding dimension is sufficiently large, and further show that universality is generic: randomly initialized transformers are universal almost surely, which aligns with recent empirical results of Zhong and Andreas (2024). We empirically validate our theory on the algorithmic tasks of parenthesis balancing and multi-hop reasoning. Our results suggest that much of a transformer's expressive power may reside in its input representation rather than its learned weights.

2605.31413 2026-06-01 math.ST cs.LG stat.TH 版本更新

Improved Guarantees for Langevin Monte Carlo with Average Smoothness

Langevin Monte Carlo 的平均光滑性改进保证

Arnak S. Dalalyan, Avetik Karagulyan

发表机构 * L2S, CNRS, CentraleSupélec, Université Paris-Saclay

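AI summary: For Langevin Monte Carlo in the strongly log-concave setting, uses an average coordinate-wise smoothness constant instead of the global smoothness constant to improve nonasymptotic error bounds in Wasserstein distance, and extends the result to variable step sizes, potentials with Lipschitz Laplacian, and finite-sum problems.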

详情
AI中文摘要

我们在强对数凹背景下,当误差由 Wasserstein 距离度量时,建立了 Langevin Monte Carlo 的改进非渐近界。主要结果表明,离散化误差由平均坐标光滑常数控制,而非通常的全局光滑常数。证明简短且概率化,依赖于同步耦合的精细使用。我们进一步表明,相同的想法导致了变步长、Laplacian 是 Lipschitz 连续的势以及通过具有固定点控制变量的随机梯度 Langevin 动力学采样的有限和问题的改进界。在 Laplacian 光滑情形下,通常的 Hessian-Lipschitz 贡献被一个更弱的迹型三阶光滑量所取代。在有限和设定中,得到的 SGLD 界改进了对分量函数均方根光滑性的依赖。应用于具有高斯设计的广义线性模型表明,这些改进可以产生比先前已知界显著且依赖于维度的改进,特别是对于相关协变量。

英文摘要

We establish improved nonasymptotic bounds for Langevin Monte Carlo in the strongly log-concave setting, when the error is measured by the Wasserstein distance. The main result shows that the discretization error is governed by an average coordinate-wise smoothness constant, rather than by the usual global smoothness constant. The proof is short and probabilistic, and relies on a refined use of the synchronous coupling. We further show that the same ideas lead to improved bounds for variable step sizes, for potentials whose Laplacian is Lipschitz-continuous, and for finite-sum problems sampled by stochastic-gradient Langevin dynamics with fixed point control variates. In the Laplacian-smooth case, the usual Hessian-Lipschitz contribution is replaced by a weaker trace-type third-order smoothness quantity. In the finite-sum setting, the resulting SGLD bound improves the dependence on the root mean square smoothness of the component functions. Applications to generalized linear models with Gaussian design show that these refinements can yield substantial, dimension-dependent improvements over previously known bounds, especially for correlated covariates.
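
For readers unfamiliar with the algorithm being analyzed, here is a minimal sketch of the Langevin Monte Carlo iteration on a two-dimensional Gaussian target with different curvature per coordinate. The target, step size, chain length, and function names are illustrative assumptions and do not reproduce the paper's bounds; the closed-form variance applies only to this quadratic toy potential.

```python
import math
import random

def lmc(grad, x0, step, n_steps, rng):
    """Langevin Monte Carlo: x <- x - step * grad(x) + sqrt(2 * step) * N(0, I)."""
    x = list(x0)
    noise = math.sqrt(2.0 * step)
    chain = []
    for _ in range(n_steps):
        g = grad(x)
        x = [xi - step * gi + noise * rng.gauss(0.0, 1.0) for xi, gi in zip(x, g)]
        chain.append(x)
    return chain

def stationary_variance(lam, step):
    """Exact stationary variance of the LMC chain for the 1-D potential lam * x**2 / 2."""
    return 1.0 / (lam * (1.0 - step * lam / 2.0))

# Potential f(x) = 0.5 * sum_j lams[j] * x_j**2, so the target variances are 1 / lams[j].
lams = [1.0, 4.0]
grad = lambda x: [l * xi for l, xi in zip(lams, x)]

chain = lmc(grad, [0.0, 0.0], step=0.05, n_steps=40000, rng=random.Random(0))
samples = chain[2000:]  # discard burn-in
emp_var = [sum(s[j] ** 2 for s in samples) / len(samples) for j in range(2)]
```

In this toy case the discretization bias is larger in the coordinate with larger curvature (stationary variance of about 0.278 against a target of 0.25) than in the flatter one (about 1.026 against 1), which gives some intuition for why per-coordinate smoothness can matter more than a single global constant.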

2605.31408 2026-06-01 cs.CL cs.AI cs.LG 版本更新

Skill Availability and Presentation Granularity in Large-Language-Model Agents: A Controlled SkillsBench Study

大型语言模型代理中的技能可用性与呈现粒度:一项受控的SkillsBench研究

Xiaonan Xu, Wenjing Wu

发表机构 * Computer Information Technology, Northern Arizona University(计算机信息科技,北亚利桑那大学) Computer Science, University of Colorado Boulder(计算机科学,科罗拉多大学博尔德分校)

AI总结 通过受控实验研究技能知识的呈现粒度对下游任务成功率的影响,发现技能可用性显著提升成功率,而呈现粒度变化影响较小且不确定。

详情
AI中文摘要

技能文档在推理时为大型语言模型代理提供程序性知识。本文研究受控技能知识的呈现粒度是否会改变下游任务成功率。实验使用固定的SkillsBench版本,包含30个任务、领域平衡的子集(由官方oracle运行验证)、两种启用推理的模型配置、六种技能条件,以及每个任务-条件-模型单元五次试验。技能可用性是最清晰的经验信号。相对于无技能,技能条件使GPT-5.5的任务平均通过率提高26.7至36.0个百分点,使DeepSeek V4-Flash提高18.0至26.0个百分点。最终数据包含1800行,每个模型900行。任务是推理单元。在每个任务-条件-模型单元内聚合五次试验,然后在30个任务上估计配对对比。主要的呈现对比较小且不确定。低抽象指导与高抽象指导相比,GPT-5.5差异为+0.7个百分点,DeepSeek V4-Flash差异为-6.7个百分点,两者的95%自助法置信区间均跨越零。在中抽象指导中添加一个工作示例与无示例变体相比,差异分别为+0.7和+1.3个百分点。平均奖励稳健性检验保持了相同的实质性结论。在这个受控子集中,技能可用性与更高的成功率相关,而测试的呈现粒度变化产生的影响较小、不确定且依赖于模型。

英文摘要

Skill documents provide procedural knowledge to large-language-model agents at inference time. This article studies whether the presentation granularity of controlled skill knowledge changes downstream task success. The experiment uses a pinned SkillsBench version, a 30-task domain-balanced subset validated by official oracle runs, two reasoning-enabled model configurations, six skill conditions, and five trials per task-condition-model cell. Skill availability is the clearest empirical signal. Relative to no skill, skill conditions increase task-mean pass rate by 26.7 to 36.0 percentage points for GPT-5.5 and by 18.0 to 26.0 percentage points for DeepSeek V4-Flash. The final data contain 1,800 rows, with 900 rows for each model. The task is the inference unit. Five trials are aggregated within each task-condition-model cell before paired contrasts are estimated over 30 tasks. The primary presentation contrasts are smaller and uncertain. Low-abstraction guidance differs from high-abstraction guidance by +0.7 percentage points for GPT-5.5 and -6.7 percentage points for DeepSeek V4-Flash, with both 95% bootstrap confidence intervals crossing zero. Adding one worked example to medium-abstraction guidance differs from the no-example variant by +0.7 and +1.3 percentage points. Mean-reward robustness checks preserve the same substantive conclusion. In this controlled subset, skill availability is associated with higher success than no skill, while the tested presentation-granularity changes yield small, uncertain, and model-dependent effects.
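The analysis design described above (aggregate trials within each cell, then estimate paired contrasts over tasks) could look roughly like the sketch below. The percentile bootstrap is an assumed variant, since the abstract only says "bootstrap", and the outcomes are synthetic random data generated for illustration, not results from the study.

```python
import random


def task_means(trials_per_task):
    """Aggregate the 0/1 trial outcomes of each task into one pass rate per task."""
    return [sum(trials) / len(trials) for trials in trials_per_task]


def paired_bootstrap_ci(rates_a, rates_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean paired difference, resampling tasks."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(rates_a, rates_b)]
    n = len(diffs)
    boot_means = sorted(
        sum(diffs[rng.randrange(n)] for _ in range(n)) / n
        for _ in range(n_boot)
    )
    lower = boot_means[int(alpha / 2 * n_boot)]
    upper = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    return sum(diffs) / n, lower, upper


# Synthetic outcomes for illustration only: 30 tasks x 5 trials per condition.
rng = random.Random(1)
with_skill = [[int(rng.random() < 0.6) for _ in range(5)] for _ in range(30)]
no_skill = [[int(rng.random() < 0.3) for _ in range(5)] for _ in range(30)]
mean_diff, lower, upper = paired_bootstrap_ci(task_means(with_skill),
                                              task_means(no_skill))
```

Resampling tasks rather than individual trials matches the statement that the task is the inference unit: the five trials in a cell are not treated as independent observations.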

2605.31391 2026-06-01 physics.ins-det cs.LG hep-ex 版本更新

Deep-learning-based low-energy trigger algorithms for the Hyper-Kamiokande experiment

基于深度学习的Hyper-Kamiokande实验低能量触发算法

Katharina Lachner, Saúl Alonso-Monsalve, Benjamin Richards, Davide Sgalaberna

发表机构 * University of Warwick(沃里克大学)

AI总结 本文针对Hyper-Kamiokande实验的低能中微子事件(<7 MeV),提出并比较了监督式神经网络和基于异常检测(自编码器与MPDR)的触发算法,在3 MeV单电子信号上效率分别达76.7%和31.8%,远超传统命中计数触发的26.4%,且GPU推理延迟低于毫秒级,满足实时运行需求。

Comments 16 pages, 6 figures

详情
AI中文摘要

现代机器学习技术因其强大的模式识别能力在粒子物理学中变得越来越重要,包括在具有严格运行时间约束的实时数据采集中。本文详细介绍了针对大型水切伦科夫探测器(如Hyper-Kamiokande)的低能中微子事件(低于7 MeV)的基于深度学习的触发算法的性能。展示了自定义神经网络监督分类器的性能,以及两种仅基于探测器噪声训练的异常检测方法:纯自编码器和基于流形投影-扩散恢复(MPDR)的能量模型。监督模型对动能为3 MeV的单电子信号识别效率为76.7%,显著超过了传统基于命中计数触发的26.4%的信号效率,MPDR方法也达到了31.8%。在GPU上的运行时间评估显示,每窗口推理延迟远低于毫秒量级,表明实时操作是可行的。

英文摘要

Modern machine learning techniques have become increasingly important in particle physics because of their powerful pattern-recognition capabilities, including in real-time data acquisition where stringent runtime constraints apply. This paper details the performance of deep-learning-based trigger algorithms for a large water Cherenkov detector such as Hyper-Kamiokande aimed at low-energy neutrino events (below 7 MeV). The performance of custom neural-network supervised classifiers is shown alongside two anomaly-detection approaches trained solely on detector noise: a pure autoencoder and an energy-based model based on Manifold Projection--Diffusion Recovery (MPDR). The supervised model shows signal identification efficiencies of 76.7% for single electrons of 3 MeV kinetic energy, significantly exceeding signal efficiencies obtained from a traditional hit-count-based trigger of 26.4%, as does the MPDR approach with 31.8%. Runtime evaluations on GPU yield per-window inference latencies well below the millisecond scale, indicating that real-time operation is feasible.

2605.31388 2026-06-01 cs.LG 版本更新

Constrained Multi-Objective Reinforcement Learning with Max-Min Criterion

带最大最小准则的约束多目标强化学习

Giseung Park, Hyunyoung Nam, Woohyeon Byeon, Amir Leshem, Youngchul Sung

发表机构 * Robotics Institute, University of Toronto(多伦多大学机器人研究所) Faculty of Engineering, Bar-Ilan University(巴伊兰大学工程学院)

AI总结 提出一种融合最大最小准则与显式约束满足的多目标强化学习框架,通过理论分析和表格实验验证收敛性,并在建筑热控制、多目标运动控制和温室气体排放感知交通管理中展示其平衡公平性与约束满足的有效性。

Comments Accepted to ICML 2026

详情
AI中文摘要

多目标强化学习(MORL)通过优化针对多个(通常相互冲突的)目标的策略来扩展标准强化学习。虽然最大最小MORL已成为促进公平性的有效方法,但其适用性仍然有限,特别是在必须纳入约束的情况下。在本文中,我们提出了一种将最大最小准则与显式约束满足相结合的MORL框架。我们为所提出的框架建立了理论基础,并通过收敛性分析和表格设置中的实验验证了所得算法。我们进一步在模拟建筑热控制、多目标运动控制和温室气体排放感知交通管理中展示了我们方法的实际相关性。在这些领域中,我们的方法有效地平衡了多目标决策中的公平性和约束满足。

英文摘要

Multi-Objective Reinforcement Learning (MORL) extends standard RL by optimizing policies with respect to multiple, often conflicting, objectives. While max-min MORL has emerged as an effective approach for promoting fairness, its applicability remains limited, particularly when constraints must be incorporated. In this paper, we propose a MORL framework that integrates the max-min criterion with explicit constraint satisfaction. We establish a theoretical foundation for the proposed framework and validate the resulting algorithm through convergence analysis and experiments in tabular settings. We further demonstrate the practical relevance of our approach in simulated building thermal control, multi-objective locomotion control, and greenhouse-gas-emission-aware traffic management. Across these domains, our method effectively balances fairness and constraint satisfaction in multi-objective decision-making.

2605.31373 2026-06-01 cs.LG cs.AI 版本更新

Scaling Higher-Order Graph Learning with Maximal Clique Complexes

基于最大团复形的规模化高阶图学习

Antoine Vialle, Aref Einizade, Fragkiskos D. Malliaros, Jhony H. Giraldo

发表机构 * LTCI, Télécom Paris Institut Polytechnique de Paris(巴黎理工学院巴黎高等电信学院LTCI实验室) SAMOVAR, Télécom SudParis Institut Polytechnique de Paris(巴黎理工学院南巴黎电信学院SAMOVAR实验室) CentraleSupélec, Inria Université Paris-Saclay(巴黎中央理工-高等电力学院、Inria,巴黎-萨克雷大学)

AI总结 提出简化与分解的细胞Weisfeiler-Leman测试及最大团复形,结合CliqueWalk随机游走,实现可扩展的高阶图神经网络。

详情
AI中文摘要

图神经网络(GNN)仅限于建模成对交互,而基于细胞复形的高阶模型虽然具有更强的表达能力,但通常可扩展性差。我们引入了简化和分解的细胞Weisfeiler-Leman测试(sCWL和fCWL),它们在保持CWL测试表达力的同时提高了计算效率。我们进一步引入了最大团复形,使得可扩展的CWN在降低时间和内存复杂度的同时保持强大的实证性能。为了避免显式枚举团,我们提出了CliqueWalk,一种有偏随机游走,用于采样最大团,并且其复杂度与图大小呈线性关系。这些贡献为高阶图表示学习提供了一个可扩展的拓扑学习框架。

英文摘要

Graph neural networks (GNNs) are limited to modeling pairwise interactions, while higher-order models based on cell complexes achieve greater expressivity but often suffer from poor scalability. We introduce simplified and factored cellular Weisfeiler-Leman tests (sCWL and fCWL), which preserve the expressivity of the CWL test while improving computational efficiency. We further introduce the maximal clique complex, enabling scalable CWNs with reduced time and memory complexity while retaining strong empirical performance. To avoid explicit clique enumeration, we propose CliqueWalk, a biased random walk that samples maximal cliques and scales linearly with graph size. These contributions yield a scalable topological learning framework for higher-order graph representation.

2605.31371 2026-06-01 cs.LG 版本更新

Softsign: Smooth Sign in Your Optimizer For Better Parameter Heterogeneity Handling

Softsign: 优化器中的平滑符号函数以更好地处理参数异质性

Dmitrii Feoktistov, Timofey Belinsky, Andrey Veprikov, Amir Zainullin, Aleksandr Beznosikov

发表机构 * HSE University(俄罗斯国立高等经济大学) Yandex Research(Yandex研究院) BRAIn Lab(BRAIn实验室) SB AI Lab(SB人工智能实验室) Innopolis University(因诺波利斯大学)

AI总结 提出SoftSignum和SoftMuon优化器,通过温度控制的软符号变换替代硬符号映射,结合自适应分位数温度调度,解决基于符号的优化器在参数异质性和终端收敛上的问题,并在随机非凸设置下证明收敛性,实验表明在多种深度学习任务(包括大语言模型预训练)中优于硬符号优化器和AdamW。

Comments 9 pages, 3 tables, 4 Figures

详情
AI中文摘要

基于符号和LMO启发的优化器最近在深度学习中因其强大的性能和低内存占用而受到广泛关注。然而,它们的固定幅度更新会损害终端收敛:它们将更新机制与梯度幅度解耦,未能考虑参数异质性,常常导致振荡而非收敛。我们提出SoftSignum,一种基于符号优化的平滑松弛方法,用温度控制的软符号变换替代硬符号映射,实现了从符号类更新到幅度敏感的SGD类步骤的参数级过渡。我们辅以自适应分位数温度调度,并将相同原理扩展到矩阵值优化器,得到SoftMuon。我们还开发了一个基于强凸正则化子和Fenchel共轭的广义几何松弛框架,证明了在随机非凸设置下的收敛性。在包括大语言模型预训练在内的多种深度学习任务上的实验表明,SoftSignum和SoftMuon持续优于其硬符号对应物和标准AdamW。

英文摘要

Sign-based and LMO-inspired optimizers have recently attracted substantial attention in deep learning due to their strong performance and low memory footprint. However, their fixed-magnitude updates can hurt terminal convergence: they decouple update mechanisms from gradient magnitudes and fail to account for parameter heterogeneity, often leading to oscillation rather than convergence. We propose SoftSignum, a smooth relaxation of sign-based optimization that replaces the hard sign map with a temperature-controlled soft-sign transformation, enabling a parameter-wise transition from sign-like updates to magnitude-sensitive SGD-like steps. We complement it with an adaptive quantile-based temperature schedule and extend the same principle to matrix-valued optimizers, obtaining SoftMuon. We also develop a generalized geometry-relaxation framework based on strongly convex regularizers and Fenchel conjugates, proving convergence in the stochastic non-convex setting. Experiments on diverse deep learning tasks, including LLM pretraining, show that SoftSignum and SoftMuon consistently improve over their hard sign-based counterparts and standard AdamW.
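The idea of a temperature-controlled soft sign can be illustrated with a small sketch. The abstract does not state the exact transformation, so `tanh(g / temperature)` is used here purely as an assumed example of a smooth sign surrogate; the function names are hypothetical, and momentum and the quantile-based temperature schedule are omitted.

```python
import math


def soft_sign(g, temperature):
    """Smooth surrogate for sign(g); tanh is an assumed choice, not the paper's."""
    return math.tanh(g / temperature)


def soft_sign_step(params, grads, lr, temperature):
    """One plain update x <- x - lr * soft_sign(g), applied per parameter."""
    return [p - lr * soft_sign(g, temperature) for p, g in zip(params, grads)]


grads = [2.0, -0.5, 0.001]
# Low temperature: the output saturates and behaves like sign(g).
sign_like = [soft_sign(g, 1e-6) for g in grads]
# High temperature: the output is close to g / temperature, so magnitudes matter.
sgd_like = [soft_sign(g, 100.0) for g in grads]
```

With a single temperature, every coordinate is pushed toward the same regime. A per-parameter or scheduled temperature is what would let small-gradient coordinates take magnitude-sensitive steps while large ones keep sign-like updates.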

2605.31369 2026-06-01 cs.LG cs.CV 版本更新

A Unifying View of Variational Generative Wasserstein Flows

变分生成式Wasserstein流的统一视角

Paul Caucheteux, Clément Bonet, Anna Korba

发表机构 * CMAP, CNRS, École Polytechnique, Institut Polytechnique de Paris, Palaiseau, France(CMAP、法国国家科学研究中心、巴黎综合理工学院、巴黎理工学院,帕莱索,法国)

AI总结 本文提出生成式Wasserstein流(GWF)的统一理论框架,将多种现有生成模型视为f-散度目标的参数化JKO方案实例,并扩展至积分概率度量与最大均值差异,推导新算法并阐明与GAN的联系。

Comments Accepted as a spotlight at ICML 2026

详情
AI中文摘要

许多现代生成模型可视为最小化概率分布之间的散度,但它们依赖于不同的算法和几何原理。Wasserstein梯度流为优化分布提供了连续时间形式,可通过Jordan-Kinderlehrer-Otto(JKO)方案的隐式离散化来近似。在这项工作中,我们提出了一个基于Wasserstein梯度流的生成建模统一理论框架,称为生成式Wasserstein流(GWF)。我们表明,一大类现有方法可以推导为f-散度目标的参数化JKO方案实例,并建立了几个最近提出的算法之间的等价性。我们将此框架扩展到f-散度之外,涵盖积分概率度量和平方最大均值差异,推导了新的基于JKO的生成算法,并阐明了它们与GAN的联系。我们通过实验研究了JKO正则化对广泛目标的影响。最后,我们分析了参数化Wasserstein流,其中动力学限制在由参数化映射诱导的分布上。

英文摘要

Many modern generative models can be viewed as minimizing divergences between probability distributions, yet they rely on different algorithmic and geometric principles. Wasserstein gradient flows provide a continuous-time formulation for optimizing over distributions, and can be approximated through their implicit discretization via the Jordan-Kinderlehrer-Otto (JKO) scheme. In this work, we present a unified theoretical framework for generative modeling based on Wasserstein gradient flows, which we refer to as Generative Wasserstein Flows (GWF). We show that a broad class of existing methods can be derived as instances of parametric JKO schemes for $f$-divergence objectives, and we establish equivalences between several recently proposed algorithms. We extend this framework beyond $f$-divergences to Integral Probability Metrics and squared Maximum Mean Discrepancy, deriving new JKO-based generative algorithms, and clarifying their connections with GANs. We study empirically the impact of the JKO regularization for a wide set of objectives. Finally, we analyze parametric Wasserstein flows, where the dynamics are restricted to distributions induced by parametrized maps.
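For readers unfamiliar with the JKO scheme, one step in its standard form is given below. The notation is chosen here rather than taken from the abstract: $\mathcal{F}$ is the objective functional (for example an $f$-divergence to the data distribution), $\tau > 0$ the step size, and $W_2$ the 2-Wasserstein distance.

```latex
\rho_{k+1} \in \operatorname*{arg\,min}_{\rho} \; \mathcal{F}(\rho) + \frac{1}{2\tau} W_2^2(\rho, \rho_k)
```

The quadratic Wasserstein term plays the role of the proximal penalty in an implicit Euler step, which is presumably the "JKO regularization" whose effect the abstract studies empirically.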

2605.31367 2026-06-01 cs.LG cs.CL 版本更新

Trading Complexity for Expressivity Through Structured Generalized Linear Token Mixing

通过结构化广义线性令牌混合在表达性与复杂性之间进行权衡

Erwan Fagnou, Paul Caillon, Blaise Delattre, Alexandre Allauzen

发表机构 * Department of Computer Science, School of Computing, Institute of Science Tokyo(东京科学研究所计算机科学系)

AI总结 本文提出一个统一框架,将令牌混合层分解为直接输入-输出影响和递归传播,通过设计结构化递归模式在运行时复杂度和表达性之间进行可证明的权衡,并在合成任务和语言建模上验证。

Comments 20 pages, 3 figures, ICML 2026 main

详情
AI中文摘要

令牌混合层在语言模型学习和生成长期依赖关系中起着关键作用。其效率依赖于解码速度与内存需求以及缓存大小之间的必要权衡。考虑因果生成,本文通过一个统一框架探索新的权衡,该框架分离了两个关键特征:(i) 在一个生成步骤中输入对输出的直接影响;(ii) 通过过去输出进行信息的递归传播。该框架涵盖了注意力机制和状态空间模型等主要架构,但也通过允许每个状态依赖于多个过去状态(而不仅仅是直接前驱)来推广递归方程。通过引入结构,我们设计了新的递归模式,这些模式可证明达到所需的复杂度,同时提供关于其表达性的理论见解——以原则性的方式用运行时换取表达性。在合成任务以及语言建模上进行了实证验证。这些结果共同提供了一个统一的工具包,用于理解和设计跨模型家族的高效且富有表达性的令牌混合器。

英文摘要

Token mixing layers play a key role in how language models can learn and generate long-range dependencies. Their efficiency relies on the necessary trade-off between decoding speed and the memory requirements, along with the cache size. Considering causal generation, this paper explores new trade-offs thanks to a unified framework which separates two crucial features: (i) the direct influence of inputs on outputs in one generation step; (ii) the recurrent propagation of information through past outputs. This framework encompasses major architectures such as attention and state-space models, but also generalizes the recurrence equations by allowing each state to depend on multiple past states rather than only the immediate predecessor. By introducing structure, we design new recurrence patterns that provably achieve the desired complexity, while providing theoretical insights on their expressivity -- trading runtime for expressivity in a principled way. Empirical validation is performed on synthetic tasks, along with language modeling. Together, these results provide a unified toolkit for the understanding and design of efficient and expressive token mixers across model families.

2605.31361 2026-06-01 cs.MA cs.AI cs.LG 版本更新

Dreaming Of Others: Latent Teammate Modeling In World Models For Multi-Agent Reinforcement Learning

梦见他人:多智能体强化学习中世界模型内的潜在队友建模

Tomas Leroy-Stone

发表机构 * Tomas Leroy-Stone

AI总结 提出一种将队友建模为世界模型中可学习组件的方法,通过分解潜在状态并引入心智理论头来推断队友行为,实现零样本和少样本协调。

Comments 5 pages, 2 figures. Accepted as a poster at the 2026 World Modeling Workshop. Conceptual workshop paper

详情
AI中文摘要

在合作多智能体强化学习(MARL)中,智能体必须与内部策略和意图不可直接观察的伙伴协调。虽然像Dreamer这样的世界模型在单智能体设置中表现出强大的泛化能力和样本效率,但它们由于无法处理队友引起的不确定性而在MARL中的应用受到限制。我们提出一个新的视角:将队友视为智能体世界模型中的结构化、可学习组件。我们引入一种架构,将Dreamer风格的循环状态空间模型(RSSM)的潜在状态分解为环境和队友组件,并学习一个辅助的心智理论(ToM)头,从部分轨迹中推断队友行为的潜在嵌入,如角色、意图和预测动作。这些队友潜在变量影响演员和评论家,使智能体能够想象并适应多样化的合作者。我们概述了这种方法如何在部分可观察设置中支持零样本和少样本协调,并提出了一套基准测试和评估协议来评估其影响。这项工作将世界模型定位为不仅是环境动态的预测器,而且是社会行为的模拟器,为可泛化、与人类兼容的AI开辟了新方向。

英文摘要

In cooperative multi-agent reinforcement learning (MARL), agents must coordinate with partners whose internal policies and intentions are not directly observable. While world models such as Dreamer have demonstrated strong generalization and sample efficiency in single-agent settings, their application to MARL remains limited by an inability to handle teammate-induced uncertainty. We propose a new perspective: treat teammates as structured, learnable components within the agent's world model. We introduce an architecture that factorizes the latent state of a Dreamer-style recurrent state-space model (RSSM) into environment and teammate components, and learns an auxiliary Theory-of-Mind (ToM) head to infer latent embeddings of partner behavior such as character, intent, and predicted actions from partial trajectories. These teammate latents condition the actor and critic, enabling the agent to imagine and adapt to diverse collaborators. We outline how this approach can support zero-shot and few-shot coordination in partially observable settings and propose a set of benchmarks and evaluation protocols to assess its impact. This work positions world models as not only predictors of environmental dynamics, but as simulators of social behavior, opening new directions for generalizable, human-compatible AI.

2605.31360 2026-06-01 cs.LG cs.AI 版本更新

dashi: A Python library for Dataset Shift Characterization to Support Trustworthy AI Development and Deployment

dashi: 一个用于数据集偏移表征以支持可信AI开发和部署的Python库

David Fernández-Narro, Pablo Ferri, Ángel Sánchez-García, Juan M. García-Gómez, Carlos Sáez

发表机构 * Biomedical Data Science Lab, Instituto Universitario de Tecnologías de la Información y Comunicaciones, Universitat Politècnica de València(瓦伦西亚理工大学信息与通信技术大学研究所生物医学数据科学实验室)

AI总结 本文介绍dashi,一个开源Python库,通过无监督(基于信息几何和非参数统计流形)和有监督方法,对数据集偏移进行探索、量化和表征,以支持AI生命周期中的可信度评估。

详情
AI中文摘要

人工智能(AI)生命周期需要对底层数据动态有透彻理解,以实现稳健、安全且经济高效的AI开发和使用。数据集偏移定义为训练和测试数据分布之间的变化。无论是随时间(时间性)还是跨不同站点(多源)发生,它们都可能严重降低模型性能并损害数据质量。这在健康AI中尤为重要,因为不受控制的偏移在训练和操作阶段都可能严重影响患者的安全和基本权利。虽然协变量偏移、先验偏移和概念偏移的理论基础已很完善,但缺乏可访问且全面的软件工具来执行其分析。我们介绍了dashi,一个开源Python库,旨在对数据集偏移进行探索、量化和表征。dashi提供双重方法:一种无监督方法,利用信息几何和非参数统计流形进行数据变异性表征和分析(例如,信息几何时间图和多源变异性指标,如全局概率偏差和源概率异常度);以及一种有监督方法,量化和表征模型性能退化。无监督和有监督方法均适用于用户定义的时间批次和域/源批次。我们在三个模拟和真实世界的健康AI案例研究(妊娠期糖尿病、COVID-19和紧急医疗调度)中展示了dashi的实用性。通过提供交互式视觉分析和变异性指标,dashi支持AI生命周期阶段的可信度,通过评估数据一致性和AI性能实现稳健且安全的机器学习管道。

英文摘要

The Artificial Intelligence (AI) life cycle requires a thorough understanding of the underlying data dynamics for robust, safe and cost-effective AI development and use. Dataset shifts are defined as changes between train and test data distributions. Whether occurring over time (temporal) or across different sites (multi-source), they can severely degrade model performance and compromise data quality. This is particularly important in health AI, where the safety and fundamental rights of patients can be severely affected by uncontrolled shifts both at training and operational stages. While the theoretical foundations of covariate, prior, and concept shifts are well established, there is a lack of accessible and comprehensive software tools to perform their analysis. We introduce dashi, an open-source Python library designed for the exploration, quantification, and characterization of dataset shifts. dashi provides a dual approach: an unsupervised approach that leverages information geometry and non-parametric statistical manifolds for data variability characterization and analysis (e.g., Information Geometric Temporal plots and Multi-Source Variability metrics like Global Probabilistic Deviation and Source Probabilistic Outlyingness), and a supervised approach that quantifies and characterizes model performance degradation. Both unsupervised and supervised approaches work across user-defined temporal and domain/source batches. We demonstrate the utility of dashi on three simulated and real-world health AI case studies on gestational diabetes mellitus, COVID-19 and emergency medical dispatch. By providing interactive visual analytics and variability metrics, dashi supports the trustworthiness of AI life cycle stages, enabling robust and safe machine learning pipelines through the assessment of data coherence and AI performance.

2605.31354 2026-06-01 cs.AI cs.LG 版本更新

Diagnosing Failure Modes of Shared-State Collaboration in Resource-Constrained Visual Agents

资源受限视觉代理中共享状态协作的故障模式诊断

Yunpeng Zhou

发表机构 * Nanjing University of Information Science & Technology, Nanjing, China

AI总结 本文通过噪声累积视角研究弱学习者(4B-8B模型)在共享工作记忆下的协作推理故障模式,提出CoSee审计框架追踪文档视觉问答中的信息流,发现朴素共享工作空间会放大幻觉而非解决,并识别出噪声强化和策略崩溃两种主要故障模式。

详情
AI中文摘要

模块化视觉推理系统越来越依赖共享工作记忆进行多步协作,但低容量场景下中间状态演化的故障动态仍未被充分探索。我们通过噪声累积的视角研究弱学习者(4B-8B模型)的协作推理故障模式。我们引入了CoSee,一个审计框架,形式化了读-写-验证循环以追踪文档视觉问答中的信息流。在多页、图表和基于网页的基准测试中,我们发现了一个反直觉的退化:朴素的共享工作空间往往放大而非解决幻觉。我们识别出两种主要的故障模式:噪声强化(未基于事实的笔记被重新用作证据)和策略崩溃(添加的上下文使模型转向欠指定的短形式答案)。使用成本-准确率帕累托前沿,我们表明增加计算量在没有显式验证的情况下可能与性能负相关。我们的发现表明,对于资源受限的代理,瓶颈不在于推理深度而在于通信保真度,为可靠的模块化设计提供了轨迹级诊断和机制基线。

英文摘要

Modular visual reasoning systems increasingly rely on shared working memory for multi-step collaboration, yet the failure dynamics of intermediate state evolution in low-capacity regimes remain underexplored. We study failure modes of collaborative reasoning with weak learners (4B--8B models) through the lens of noise accumulation. We introduce CoSee, an auditing framework that formalizes the read-write-verify loop to trace information flow in document visual question answering. Across multi-page, chart, and web-based benchmarks, we find a counter-intuitive degradation: naive shared workspaces often amplify hallucinations rather than resolve them. We identify two dominant failure modes: Noise Reinforcement, where ungrounded notes are reused as evidence, and Policy Collapse, where added context shifts the model toward under-specified, short-form answers. Using cost-accuracy Pareto frontiers, we show that increased compute can correlate negatively with performance without explicit verification. Our findings suggest that for resource-constrained agents, the bottleneck lies not in reasoning depth but in communication fidelity, providing trace-level diagnostics and a mechanistic baseline for reliable modular design.

2605.31346 2026-06-01 math.OC cs.LG 版本更新

Wall-Clock Complexity for Zeroth-Order Optimization with Tunable Oracle Fidelity

可调 oracle 保真度的零阶优化的挂钟复杂度

Alexandra Suvorikova, Igor Pavlov, Artem Vasin, Georgii Bychkov, Anastasia Antsiferova, Darina Dvinskikh, Alexander Gasnikov

发表机构 * Weierstrass Institute for Applied Analysis and Stochastics(魏尔斯特拉斯应用分析与随机研究所) Moscow Independent Research Institute of Artificial Intelligence(莫斯科独立人工智能研究所) MSU Institute for Artificial Intelligence(莫斯科国立大学人工智能研究所) HSE University(俄罗斯国立高等经济大学) Trusted AI Research Center, RAS(俄罗斯科学院可信人工智能研究中心) Innopolis University(因诺波利斯大学)

AI总结 针对零阶优化中 oracle 保真度可调的场景,提出挂钟复杂度模型并分析参数选择对总时间的影响,揭示加速方法可能劣于非加速方法,并刻画恒定保真度策略最优的条件。

详情
AI中文摘要

零阶(黑箱)优化应用于梯度不可用且目标评估依赖昂贵模拟的情况。在许多此类应用中,oracle 保真度是可调的:更高精度的查询降低噪声但增加计算成本。为捕捉这一权衡,我们研究一个精度感知的挂钟模型,其中每次保真度为 $\delta$ 的查询具有成本 $c(\delta)$,并在目标精度约束下最小化总时间 $T_{\mathrm{total}} = \sum_{k=1}^{N} c(\delta_k)$。我们展示了 oracle 类型、噪声模型和优化方案的选择如何导致算法参数的显式挂钟最优选择。例如,我们证明加速方法在挂钟时间上可能劣于非加速方案。此外,我们刻画了恒定保真度策略在 Big-O 意义上最优的条件。我们的框架提供了一种统一的方法,将收敛保证转化为实际的保真度和批处理建议。

英文摘要

Zeroth-order (black-box) optimization is applied when gradients are unavailable and objective evaluations rely on expensive simulations. In many such applications, the oracle fidelity is tunable: higher-accuracy queries reduce noise but incur higher computational costs. To capture this trade-off, we study an accuracy-aware wall-clock model where each query with fidelity $\delta$ has a cost $c(\delta)$, and we minimize the total time $T_{\mathrm{total}} = \sum_{k=1}^{N} c(\delta_k)$, subject to a target accuracy constraint. We show how the choice of oracle type, noise model, and optimization scheme induces explicit wall-clock-optimal choices for the algorithmic parameters. For instance, we demonstrate that accelerated methods can be wall-clock inferior to non-accelerated schemes. Furthermore, we characterize the conditions under which a constant fidelity strategy is optimal in the Big-O sense. Our framework provides a unified methodology to translate convergence guarantees into practical fidelity and batching recommendations.

2605.31345 2026-06-01 stat.ML cs.LG stat.ME 版本更新

Log-Ratio Propagation on the Simplex: A Theory of Cellwise Contamination for Compositional Data

单纯形上的对数比传播:成分数据细胞污染的理论

Matthias Templ

发表机构 * School of Business, FHNW Fachhochschule Nordwestschweiz(北瑞士应用科学大学商学院)

AI总结 本文提出单纯形上细胞污染的理论,通过乘法扰动和传播定理证明单个成分污染导致对数比向量秩一偏移,并揭示欧几里得细胞方法在单纯形上的失效与降维现象。

Comments 50 pages, no figures; 11-page supplement included as an ancillary file. A companion methods paper (cellPcaCoDa: cellwise-robust PCA for compositional data) is forthcoming

详情
AI中文摘要

成分数据必须通过对数比进行分析:尺度不变性,该领域的定义公理,别无选择。中心化对数比除以每个部分的几何平均值,因此单个受污染成分会同时移动所有中心化对数比坐标,将对数比向量位移一个固定量,任何坐标选择都无法减少。我们围绕这一观察发展了单纯形上细胞污染的理论。基于乘法扰动的尺度不变污染模型与传播定理相结合,表明单个原始部分的污染会导致对数比向量的秩一偏移,方向由对比矩阵决定。由此产生的扰动模式不等同于对数比坐标中的任何独立细胞污染模型——因此,应用于对数比的标准欧几里得细胞方法在单纯形污染机制下是不适定的。对于其欧几里得细胞崩溃由列集中配置见证的估计量——包括MCD、$S$-、$\tau$-和坐标$M$-估计量的位置和散度——单纯形上的细胞崩溃值相对于其欧几里得对应值减少了因子$(D-1)/D$,这种减少是紧的,并且纯粹源于$nD$个原始细胞与$n(D-1)$个ilr细胞之间的归一化不匹配。变异矩阵的细胞影响函数携带诊断指纹:单个部分的污染恰好膨胀一行和一列,从而识别出责任成分。这些结果为单纯形上的细胞鲁棒方法奠定了理论基础;一篇配套论文开发了一种利用传播几何的细胞鲁棒PCA估计器,并在模拟和地球化学数据上进行了演示。

英文摘要

Compositional data must be analysed through log-ratios: scale invariance, the defining axiom of the field, leaves no alternative. The centred log-ratio divides by the geometric mean of every part, so a single contaminated component shifts every centred-log-ratio coordinate at once, displacing the log-ratio vector by a fixed amount that no choice of coordinates can reduce. We develop a theory of cellwise contamination on the simplex around this observation. A scale-invariant contamination model built from multiplicative perturbation combines with a propagation theorem showing that corruption of a single raw part induces a rank-one shift of the log-ratio vector, with direction determined by the contrast matrix. The resulting perturbation pattern is not equivalent to any independent cellwise contamination model in log-ratio coordinates -- so standard Euclidean cellwise methods applied to log-ratios are ill-posed under the simplex contamination mechanism. For estimators whose Euclidean cellwise breakdown is witnessed by a column-concentrated configuration -- a class including MCD, $S$-, $\tau$-, and coordinate-wise $M$-estimators of location and scatter -- the cellwise breakdown value on the simplex is reduced by the factor $(D-1)/D$ relative to its Euclidean counterpart, a reduction that is tight and arises purely from the normalisation mismatch between $nD$ raw cells and $n(D-1)$ ilr cells. The cellwise influence function for the variation matrix carries a diagnostic fingerprint: contamination of a single part inflates exactly one row and column, identifying the responsible component. These results form the theoretical foundation for cellwise-robust methods on the simplex; a companion paper develops a cellwise-robust PCA estimator that exploits the propagation geometry and demonstrates it on simulated and geochemical data.
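The propagation effect described above can be checked numerically for the centred log-ratio. In the sketch below the composition, the contamination factor, and the contaminated index are hypothetical values chosen for illustration. The predicted shift $\log(c)\,(e_j - \mathbf{1}/D)$ follows directly from the definition of the clr transform.

```python
import math


def clr(x):
    """Centred log-ratio: log of each part minus the mean log over all parts."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


composition = [0.1, 0.2, 0.3, 0.4]   # hypothetical 4-part composition
factor = 5.0                         # multiplicative contamination of one part
j = 2                                # index of the contaminated part

contaminated = list(composition)
contaminated[j] *= factor            # no renormalisation needed: clr is scale invariant

D = len(composition)
observed_shift = [b - a for a, b in zip(clr(composition), clr(contaminated))]
# Predicted shift: log(factor) * (e_j - 1/D), the same direction for any composition.
predicted_shift = [math.log(factor) * ((1.0 if i == j else 0.0) - 1.0 / D)
                   for i in range(D)]
```

Every coordinate of `observed_shift` is nonzero even though only one raw part was altered, and the shift direction depends only on `j` and `D`, not on the composition itself.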

2605.31324 2026-06-01 cs.LG cs.AI 版本更新

Inconsistency-Aware Minimization: Improving Generalization with Unlabeled Data

不一致感知最小化:利用无标签数据提升泛化能力

Hee-Sung Kim, Hyeonseong Kim, Sungyoon Lee

发表机构 * Department of Computer Science, Hanyang University, Seoul, Korea(汉阳大学计算机科学系)

AI总结 本文提出一种基于信息几何的局部不一致性度量,并据此设计不一致感知最小化(IAM)方法,通过无标签数据计算该度量并融入训练目标,从而提升深度学习模型的泛化性能。

Comments ICML 2026

详情
AI中文摘要

估计泛化差距并开发改进泛化的优化方法对于深度学习模型至关重要,无论是从理论理解还是实际应用角度。利用无标签数据实现这些目标在实际场景中具有显著优势。本文从神经网络参数空间的信息几何角度出发,引入了一种新的泛化度量——局部不一致性。局部不一致性的一个关键特征是它可以在没有显式标签的情况下计算。我们通过将局部不一致性与Fisher信息矩阵和损失Hessian矩阵联系起来,建立了理论基础。实验上,我们证明了局部不一致性与泛化差距相关。基于这些发现,我们提出了不一致感知最小化(IAM),将局部不一致性纳入训练目标。我们证明,在标准监督学习设置中,IAM增强了泛化能力,实现了与现有方法(如锐度感知最小化)相当的性能。此外,IAM在半监督和自监督学习场景中表现出有效性,其中局部不一致性是从无标签数据计算得出的。

英文摘要

Estimating the generalization gap and developing optimization methods that improve generalization are crucial for deep learning models, for both theoretical understanding and practical applications. Leveraging unlabeled data for these purposes offers significant advantages in real-world scenarios. This paper introduces a novel generalization measure, local inconsistency, derived from an information-geometric perspective on the parameter space of neural networks. A key feature of local inconsistency is that it can be computed without explicit labels. We establish theoretical underpinnings by connecting local inconsistency to the Fisher information matrix and the loss Hessian. Empirically, we demonstrate that local inconsistency correlates with the generalization gap. Based on these findings, we propose Inconsistency-Aware Minimization (IAM), which incorporates local inconsistency into the training objective. We demonstrate that in standard supervised learning settings, IAM enhances generalization, achieving performance comparable to that of existing methods such as Sharpness-Aware Minimization. Furthermore, IAM exhibits efficacy in semi- and self-supervised learning scenarios, where the local inconsistency is computed from unlabeled data.

2605.31318 2026-06-01 cs.LG cs.MA 版本更新

Generalized Intention Modeling in Multi-Agent Reinforcement Learning

多智能体强化学习中的广义意图建模

Mateusz Odrowaz-Sypniewski, Jasmine Bayrooti, Ajay Shankar, Amanda Prorok

发表机构 * Department of Computer Science and Technology, University of Cambridge, UK(计算机科学与技术系,剑桥大学,英国)

AI总结 提出一种任务自适应的对手建模框架,通过性能驱动的多意图表示混合及最大化与自我智能体未来回报的互信息的新意图表示,提升非合作多智能体环境中的决策性能。

详情
AI中文摘要

在非合作、竞争和一般和的多智能体强化学习中,建模对手的意图对于有效决策至关重要。现有的对手建模方法使用从先验选择的回合信息(如对手的下一个动作或未来环境状态)中提取的嵌入来编码意图,并以此引导自我智能体的行为。这些方法假设所选信息普遍代表意图;然而,我们通过实验证明情况并非如此,因为意图通常依赖于任务和环境。为了解决这个问题,我们引入了一个任务自适应的对手建模框架,该框架学习一种性能驱动的多意图表示混合。此外,我们提出了一种新的意图表示,它最大化与自我智能体未来回报的互信息,从而捕获与性能最直接相关的对手信息。我们的方法在各种任务中始终匹配或超越最先进基线的性能,并揭示了不同对手建模策略何时以及为何成功。

英文摘要

Modeling an opponent's intent is critical for effective decision-making in non-cooperative, competitive, and general-sum multi-agent reinforcement learning. Existing opponent modeling methods encode intent using an embedding derived from episode information chosen a priori, such as the opponent's next action or a future environment state, and use this to guide the ego-agent's behavior. These approaches assume that the chosen information is universally representative of intent; however, we show empirically that this is not the case as intentions are often task- and environment-dependent. To address this, we introduce a task-adaptive opponent modeling framework that learns a performance-driven mixture of multiple intent representations. We further introduce a new intention representation that maximizes mutual information with the ego-agent's future returns, thereby capturing opponent information that is most directly relevant to performance. Our approach consistently matches or exceeds the performance of state-of-the-art baselines across diverse tasks and yields insights into when and why different opponent modeling strategies succeed.

2605.31317 2026-06-01 cs.LG 版本更新

Forgetting Has Neighbors: Localized Collateral Forgetting in Machine Unlearning

遗忘有邻居:机器遗忘中的局部连带遗忘

Polina Dolgova, Sebastian U. Stich

发表机构 * CISPA Helmholtz Center for Information Security(CISPA赫尔姆霍茨信息安全中心) Universität des Saarlandes(萨尔兰州立大学)

AI总结 本文研究机器遗忘中梯度上升和随机标签方法导致的局部连带遗忘现象,并提出了基于局部教师蒸馏的缓解策略。

详情
AI中文摘要

机器遗忘旨在无需完全重新训练的情况下移除选定训练样本的影响。标准评估通常使用聚合指标(如准确率和遗忘分数)来概括遗忘质量,这可能会掩盖局部失败。我们通过比较遗忘模型与删除后重新训练模型的预测,在样本级别研究这种失败模式。我们表明,这种逐点差异可能高度不均匀:对于梯度上升和随机标签方法,无论是否进行保留集微调,差异都随着与遗忘集的几何接近度而增大。我们将这种现象称为局部连带遗忘。我们的分析确定了该效应背后的机制:遗忘过程中使用的替代目标可能与重新训练引起的局部预测结构不一致,并且这种不一致通过共享表示传播到邻近样本。受此机制启发,我们提出了局部教师蒸馏,一种简单的缓解策略,用仅在遗忘集的保留邻居上训练的小教师生成的软标签替换随机目标。在CIFAR-100部分类别删除任务中,这种局部教师使遗忘模型更接近重新训练,尤其是在遗忘集附近,同时保持有竞争力的聚合遗忘指标。

英文摘要

Machine unlearning aims to remove the influence of selected training examples without full retraining. Standard evaluations often summarize unlearning quality with aggregate metrics, such as accuracy- and forgetting-based scores, which can hide localized failures. We study this failure mode at the example level by comparing the predictions of an unlearned model to those of the model retrained after deletion. We show that this pointwise discrepancy can be highly non-uniform: for gradient-ascent and random-labeling methods, with and without retain-set fine-tuning, it grows with geometric proximity to the forget set. We call this phenomenon localized collateral forgetting. Our analysis identifies a mechanism behind the effect: surrogate targets used during unlearning can be inconsistent with the local prediction structure induced by retraining, and this inconsistency propagates through shared representations to nearby examples. Motivated by this mechanism, we propose Local Teacher Distillation, a simple mitigation strategy that replaces random targets with soft labels from a small teacher trained only on retained neighbors of the forget set. On CIFAR-100 partial-class deletion, this local teacher brings the unlearned model substantially closer to retraining, especially near the forget set, while maintaining competitive aggregate unlearning metrics.

2605.31315 2026-06-01 cs.LG 版本更新

Graph Neural Networks Are Not Continuous Across Graph Resolutions

图神经网络在图分辨率上不连续

Christian Koke, Yuesong Shen, Abhishek Saroha, Marvin Eisenberger, Bastian Rieck, Michael Bronstein, Daniel Cremers

发表机构 * Munich Center for Machine Learning(慕尼黑机器学习中心) Technical University of Munich, Munich(慕尼黑技术大学) University of Fribourg(弗里堡大学) Institute of Computational Biology(计算生物学研究所) University of Oxford(牛津大学)

AI总结 本文证明图神经网络并非对所有自然的图收敛模式都连续,并提出一种基于信息传播方案的结构性修改,使其具备跨尺度连续性,从而实现对不同分辨率的稳定整合与泛化。

Comments arXiv admin note: text overlap with arXiv:2310.00431

详情
AI中文摘要

我们表明,与社区中的传统观点相反,图神经网络(GNN)并非对所有自然的图收敛模式都连续。因此,GNN 可能为非常相似的图生成截然不同的潜在表示。特别是,它们为表示同一底层对象但处于不同分辨率尺度的图分配了非常不同的潜在嵌入。我们将这种连续性的失效追溯到由常用信息传播方案引起的结构性障碍。基于这一见解,我们推导出对标准 GNN 架构的一种原则性修改,使模型具备跨尺度的连续性。所提出的修改能够实现不同分辨率的一致整合以及它们之间的可靠泛化。我们通过广泛的数值实验系统性地验证了我们的理论发现。

英文摘要

We show that, contrary to conventional wisdom in the community, graph neural networks (GNNs) are not continuous with respect to all natural modes of graph convergence. As a result, GNNs may generate substantially different latent representations for graphs that are very similar. In particular, they assign vastly different latent embeddings to graphs that represent the same underlying object at different resolution scales. We trace this failure of continuity back to a structural obstruction arising from commonly used information-propagation schemes. Building on this insight, we then derive a principled modification to standard GNN architectures which equips models with continuity across scales. The proposed modification enables consistent integration of distinct resolutions and reliable generalization between them. We systematically validate our theoretical findings in a wide range of numerical experiments.

2605.31311 2026-06-01 math.OC cs.DC cs.LG 版本更新

S$^3$LDBO: A Snapshot Single-Loop Algorithm for Decentralized Bilevel Optimization

S$^3$LDBO: 一种用于去中心化双层优化的快照单循环算法

Chao Yin, Youran Dong, Shiqian Ma, Bofan Wang, Junfeng Yang

发表机构 * School of Mathematics, Hohai University(河海大学数学学院) School of Mathematics, Nanjing University(南京大学数学学院) Department of Computational Applied Mathematics and Operations Research, Rice University(莱斯大学计算应用数学与运筹学系)

AI总结 提出S$^3$LDBO算法,通过快照机制间歇跳过昂贵导数计算,实现去中心化双层优化的高效单循环求解,并理论证明其复杂度,实验验证计算效率与学习性能的平衡。

详情
AI中文摘要

网络化AI系统日益依赖多个智能体通过通信网络协作学习和适应模型。在此类系统中,双层公式自然出现在超参数优化、数据清洗和元学习中,但梯度、雅可比矩阵和海森矩阵的重复评估可能给单个智能体带来巨大计算负担。为应对这一挑战,我们提出Snapshot-SLDBO(S$^3$LDBO),一种高效的单循环去中心化双层优化算法,通过快照机制使智能体能够间歇性地跳过昂贵的导数计算。该机制可解释为网络化AI的自主计算-适应策略,其中智能体选择性执行昂贵的局部更新,同时保持全局协作学习。我们在确定性设定下建立了所提出算法的遍历迭代复杂度和高概率非遍历迭代复杂度。在合成数据集和MNIST数据集上的超参数优化、Fashion-MNIST上的数据超清洗以及miniImageNet上的去中心化元学习实验结果表明,所提出算法在保持竞争性学习性能的同时提高了计算效率。

英文摘要

Networked AI systems increasingly rely on multiple agents that collaboratively learn and adapt models over communication networks. In such systems, bilevel formulations naturally arise in hyperparameter optimization, data cleaning, and meta-learning, but the repeated evaluation of gradients, Jacobians, and Hessians can impose a substantial computational burden on individual agents. To address this challenge, we propose Snapshot-SLDBO (S$^3$LDBO), an efficient single-loop decentralized bilevel optimization algorithm that enables agents to intermittently skip expensive derivative evaluations through a snapshot mechanism. This mechanism can be interpreted as an autonomous computation-adaptation strategy for networked AI, where agents selectively perform costly local updates while maintaining global collaborative learning. We establish the ergodic iteration complexity and the high probability nonergodic iteration complexity of the proposed algorithm within a deterministic setting. Experimental results on hyperparameter optimization with synthetic and MNIST datasets, data hyper-cleaning on Fashion-MNIST, and decentralized meta-learning on miniImageNet demonstrate that the proposed algorithm improves computational efficiency while maintaining competitive learning performance.

2605.31309 2026-06-01 cs.LG math.PR stat.ML 版本更新

Non-Asymptotic Convergence of Stochastic Iterative Algorithms: A Lyapunov Framework

随机迭代算法的非渐近收敛性:一个李雅普诺夫框架

Zaiwei Chen, Siva Theja Maguluri

发表机构 * Purdue IE(普渡大学工业工程) Georgia Tech IE(佐治亚理工学院工业工程)

AI总结 本文综述了基于李雅普诺夫技术的随机迭代算法(随机逼近)的有限时间分析方法,通过广义Moreau包络作为通用李雅普诺夫函数,给出了均方收敛保证,并应用于随机梯度下降、线性SA及Q学习等强化学习算法,最后讨论了马尔可夫噪声、半范数压缩算子等扩展。

Comments 44 pages

详情
AI中文摘要

我们综述了基于李雅普诺夫技术的随机迭代算法(也称为随机逼近(SA)算法)的有限时间分析方法,用于求解不动点方程 $\bar{F}(x)=x$,其中算子 $\bar{F}(\cdot)$ 只能通过带噪声的预言机访问。我们首先关注标准设定,其中 $\bar{F}(\cdot)$ 关于某种范数是压缩的且噪声是独立同分布的,并解释广义Moreau包络如何作为通用李雅普诺夫函数,无论底层范数如何。然后,我们展示该框架如何产生均方收敛保证,并应用于随机梯度下降、线性SA以及基于值的强化学习算法,如Q学习和时序差分学习。最后,我们讨论向马尔可夫噪声、半范数压缩算子、耗散算子和高概率界的扩展,并以开放问题作结。目标是提供一个统一且自包含的SA有限时间分析及其应用(尤其是在强化学习中)的路线图。

英文摘要

We survey Lyapunov-based techniques for the finite-time analysis of stochastic iterative algorithms, also known as stochastic approximation (SA) algorithms, for solving fixed-point equations $\bar{F}(x)=x$, where the operator $\bar{F}(\cdot)$ can only be accessed through a noisy oracle. We first focus on the standard setting in which $\bar{F}(\cdot)$ is contractive with respect to some norm and the noise is i.i.d., and explain how generalized Moreau envelopes serve as universal Lyapunov functions, regardless of the underlying norm. We then show how this framework yields mean-square convergence guarantees and applies to stochastic gradient descent, linear SA, and value-based reinforcement learning algorithms such as Q-learning and temporal-difference learning. Finally, we discuss extensions to Markovian noise, seminorm-contractive operators, dissipative operators, and high-probability bounds, and conclude with open problems. The goal is to present a unified and self-contained roadmap for the finite-time analysis of SA and its applications, especially in reinforcement learning.
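The fixed-point iteration described above can be illustrated with a minimal sketch. It is not taken from the paper: the scalar operator F(x) = 0.5x + 1, the Gaussian noise, the step size 2/(k+2), and all function names are assumptions chosen for illustration, and the squared distance to the fixed point stands in for the generalized Moreau envelope that the survey uses as a Lyapunov function.

```python
import random

def noisy_oracle(x, rng, noise_std=1.0):
    """Return F(x) + noise, with F(x) = 0.5 * x + 1 (a 0.5-contraction, fixed point x* = 2)."""
    return 0.5 * x + 1.0 + rng.gauss(0.0, noise_std)

def stochastic_approximation(x0, num_steps, seed=0):
    """Run x_{k+1} = x_k + a_k * (F(x_k) + w_k - x_k) with step size a_k = 2 / (k + 2)."""
    rng = random.Random(seed)
    x = x0
    for k in range(num_steps):
        step = 2.0 / (k + 2)
        x = x + step * (noisy_oracle(x, rng) - x)
    return x

def lyapunov(x, x_star=2.0):
    """Squared distance to the fixed point (a simple Lyapunov function for this scalar case)."""
    return (x - x_star) ** 2

x_final = stochastic_approximation(x0=10.0, num_steps=20000)
print(x_final, lyapunov(x_final))
```

With i.i.d. zero-mean noise, the iterate should settle close to the fixed point 2, and the Lyapunov value should shrink as the number of steps grows.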

2605.31304 2026-06-01 cs.LG cs.CV 版本更新

Interpretability Without Tradeoffs: Disentangling Polysemanticity At Equal Predictive Performance

无权衡的可解释性:在同等预测性能下解开多义性

Doğukan Bağcı, Bernt Schiele, Simone Schaub-Meyer, Jonas Fischer, Robin Hesse

发表机构 * Max Planck Institute for Informatics(马克斯·普朗克信息研究所) Department of Computer Science, TU Darmstadt(达姆施塔特工业大学计算机科学系)

AI总结 提出ELUDe方法,通过无损重组层间信息流,在不改变模型输出的前提下将多义神经元分解为单义特征,提升深度神经网络的可解释性。

Comments Preprint

详情
AI中文摘要

深度神经网络(DNN)被广泛使用,但解释它们实际学到什么仍然困难。一个主要障碍是单个神经元通常编码多个不相关的概念,模糊了网络的决策过程。虽然先前的工作,如稀疏自编码器,可以将这些混合信号分离成更有意义的“单义”特征,但这通常需要以可能降低下游性能的方式改变模型。为了克服这一点,我们引入了ELUDe(显式、无损、无监督解缠),一种在保持功能等价性的同时提高DNN可解释性的方法。ELUDe将潜在表示分解为清晰、可检查的子单元,这些子单元表现得像可解释的特征,同时保证模型的输出保持完全相同。它不需要显式训练,不需要标签,并且可以应用于预训练模型。ELUDe通过重组层间信息流的方式工作,重新路由特定概念的贡献,同时通过构造保留原始计算。在多个视觉模型上,包括DINOv2和有监督的ViT-B/16,ELUDe提高了可解释性,保持下游准确性不变,运行高效,并支持实际用途,如引导模型表示。简而言之,ELUDe提供了(几乎)没有权衡的可解释性:更清晰、可扩展且可操作的模型洞察,且性能无损失。

英文摘要

Deep neural networks (DNNs) are widely used, but interpreting what they actually learn remains difficult. A major obstacle is that individual neurons often encode multiple unrelated concepts, obscuring the decision process of the network. While prior work, such as sparse autoencoders, can separate these mixed signals into more meaningful, "monosemantic" features, this typically requires altering the model in ways that can degrade downstream performance. To overcome this, we introduce ELUDe (explicit, lossless, unsupervised disentanglement), a method for improving the interpretability of DNNs while preserving their functional equivalence. ELUDe breaks latent representations into clear, inspectable sub-units that behave like interpretable features, while guaranteeing that the model's outputs remain exactly the same. It requires no explicit training, no labels, and can be applied to pretrained models. ELUDe works by reorganizing how information flows between layers, re-routing concept-specific contributions while preserving the original computation by construction. Across several vision models, including DINOv2 and supervised ViT-B/16, ELUDe improves interpretability, keeps downstream accuracy unchanged, runs efficiently, and supports practical uses such as steering model representations. In short, ELUDe offers interpretability (almost) without a tradeoff: clearer, scalable, and actionable model insights with no loss in performance.

2605.31296 2026-06-01 q-bio.BM cs.LG 版本更新

mRNAutilus: Multi-Objective-Guided Discrete Generation of mRNA with Optimized Therapeutic Properties

mRNAutilus:多目标引导的mRNA离散生成与优化治疗特性

Sawan Patel, Sophia Tang, Yesol Kim, Yinuo Zhang, Divya Srijay, Ping-Jung Lin, Shambhavi Shubham, Fengmei Pi, Cedric Wu, Sherwood Yao, Pranam Chatterjee

发表机构 * Atom Bioworks Inc.(Atom Bioworks公司) Department of Computer and Information Science University of Pennsylvania(宾夕法尼亚大学计算机与信息科学系) Department of Bioengineering University of Pennsylvania(宾夕法尼亚大学生物工程系) Center of Computational Biology Duke-NUS Medical School(杜克-新加坡国立大学医学学院计算生物学中心) GenScript USA Inc.(GenScript美国公司)

AI总结 提出mRNAutilus框架,结合掩码离散扩散模型和蒙特卡洛树引导,实现同时优化密码子和从头设计UTR,生成多目标帕累托最优的完整mRNA序列,在多个靶标上显著提升表达和稳定性。

详情
AI中文摘要

治疗性mRNA设计需要协调整个转录本中多个相互作用的序列特征,其中密码子使用、非翻译区(UTR)及其耦合共同决定稳定性、翻译效率和蛋白质表达。在这里,我们提出通过展开轨迹和信息潜在更新生成mRNA(mRNAutilus),这是一个直接从序列进行同时密码子优化和从头UTR设计的框架。mRNAutilus结合了在数百万全长mRNA上训练的掩码离散扩散模型与蒙特卡洛树引导,在多个功能目标下生成帕累托高效序列,使用模型嵌入上的轻量级回归器预测半衰期、翻译效率和蛋白质丰度。与最近分别设计编码序列和UTR或依赖事后组装和筛选的方法不同,mRNAutilus在单个过程中生成完整转录本,并跨属性优化。在多种靶标上,编码P. pyralis荧光素酶的零样本mRNA表达量比野生型高400倍以上,并优于商业和机器学习设计的基线,包括零样本生成方法。零样本SARS-CoV-2 Spike mRNA超过临床使用和商业构建体,并匹配或超越实验室优化设计,同时具有更好的耐久性。我们进一步展示了在治疗环境中的通用性,包括先导编辑(PEMax)和可编程蛋白质组调节,其中mRNAutilus设计的构建体增强了用于β-连环蛋白降解的肽引导E3连接酶(uAbs)的表达。这些结果建立了一个基于序列的多目标框架,用于生成适用于多种生物应用的功能性mRNA。

英文摘要

Therapeutic mRNA design requires coordinating multiple interacting sequence features across the full transcript, where codon usage, untranslated regions (UTRs), and their coupling jointly determine stability, translation efficiency, and protein expression. Here, we present mRNA generation via unrolled trajectories and informed latent updates (mRNAutilus), a framework for simultaneous codon optimization and de novo UTR design directly from sequence. mRNAutilus combines a masked discrete diffusion model trained on millions of full-length mRNAs with Monte Carlo Tree Guidance to generate Pareto-efficient sequences under multiple functional objectives, using lightweight regressors over model embeddings to predict half-life, translation efficiency, and protein abundance. Unlike recent methods that design coding sequences and UTRs separately or rely on post hoc assembly and screening, mRNAutilus generates complete transcripts in a single process optimized across properties. Across diverse targets, zero-shot mRNAs encoding P. pyralis luciferase achieve over 400-fold higher expression than wild-type and outperform commercial and machine learning-designed baselines, including zero-shot generative approaches. Zero-shot SARS-CoV-2 Spike mRNAs exceed clinically used and commercial constructs and match or surpass lab-optimized designs with improved durability. We further demonstrate generality in therapeutic settings, including prime editing (PEMax) and programmable proteome modulation, where mRNAutilus-designed constructs enhance expression of peptide-guided E3 ligases (uAbs) for beta-catenin degradation. These results establish a sequence-based, multi-objective framework for generating functional mRNAs tailored to diverse biological applications.

2605.31295 2026-06-01 cs.SD cs.AI cs.IR cs.LG 版本更新

Latent Space Disentanglement via Activation Steering for Interpretable Attribute Control in Symbolic Music Generation

通过激活引导实现潜在空间解缠:符号音乐生成中可解释的属性控制

Ioannis Prokopiou, Pantelis Vikatos, Maximos Kaliakatsos-Papakostas, Theodoros Giannakopoulos, Themos Stafylakis

发表机构 * Athens University of Economics Innovation Lab Orfium Athens, Greece Department of Music Technology Acoustics Hellenic Mediterranean University Rethymno, Greece Institute of Informatics & Telecommunications National Center for Scientific Research “Demokritos” Athens, Greece Department of Informatics Athens University of Economics

AI总结 本文利用差分均值方法从多轨音乐Transformer的残差流中分离音高和时长的潜在方向,并通过Gram-Schmidt正交化实现双属性引导,从而在推理时实现可解释的确定性属性调制。

Comments Accepted at EUSIPCO 2026 (34th European Signal Processing Conference), 5 pages, 2 figures

详情
AI中文摘要

基于Transformer的架构在生成复杂符号序列方面取得了显著进展,但在实现对离散信号属性的细粒度、可解释控制方面仍存在显著差距。本文研究了多轨音乐Transformer(MMT)的机制可解释性,并提出了一种无需重新训练的确定性属性调制框架,通过推理时的激活引导来弥合这一差距。利用差分均值(DiffMean)方法,我们在残差流中分离了信号属性(特别是音高和时长)的潜在方向。我们验证了该领域的线性表示假设,实现了引导幅度与属性偏移之间的高相关性。为了解决多属性引导中固有的特征纠缠问题,我们引入了一种利用Gram-Schmidt正交化的双引导框架。实验结果表明,与简单的向量加法相比,这种几何解耦减少了概念干扰和信号退化,即使在强自回归条件下也能实现独立的确定性控制。

英文摘要

Transformer-based architectures have significantly advanced the generation of complex symbolic sequences, yet a significant gap remains in achieving fine-grained, interpretable control over discrete signal attributes. This paper investigates the mechanistic interpretability of the Multitrack Music Transformer (MMT) and proposes a framework for deterministic attribute modulation without retraining to bridge this gap via inference-time activation steering. Utilizing the Difference-in-Means (DiffMean) methodology, we isolate latent directions for signal attributes, specifically Pitch and Duration, within the residual stream. We validate the Linear Representation Hypothesis in this domain, achieving high correlation between steering magnitude and attribute shift. To address the inherent feature entanglement in multi-attribute steering, we introduce a Dual Steering framework utilizing Gram-Schmidt Orthogonalization. Experimental results demonstrate that this geometric decoupling reduces conceptual interference and signal degradation compared to naive vector addition, enabling independent deterministic control even against strong autoregressive conditioning.
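A minimal sketch of the two geometric steps named in the abstract, difference-in-means direction extraction and a Gram-Schmidt step between two steering directions, is given here. The toy 3-dimensional activations and every function name are hypothetical; the paper applies these ideas to the residual stream of the Multitrack Music Transformer, which is not reproduced here.

```python
def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def difference_in_means(positive, negative):
    """Steering direction = mean activation of one group minus mean activation of the other."""
    mp, mn = mean_vector(positive), mean_vector(negative)
    return [a - b for a, b in zip(mp, mn)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthogonalize(v, u):
    """One Gram-Schmidt step: remove from v its component along u."""
    scale = dot(v, u) / dot(u, u)
    return [a - scale * b for a, b in zip(v, u)]

def steer(hidden, directions, strengths):
    """Add each direction, scaled by its strength, to a hidden activation vector."""
    out = list(hidden)
    for d, s in zip(directions, strengths):
        out = [x + s * y for x, y in zip(out, d)]
    return out

# Toy activations (assumed data, for illustration only).
pitch_dir = difference_in_means([[2.0, 1.0, 0.0], [4.0, 1.0, 0.0]],
                                [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
duration_dir = difference_in_means([[1.0, 2.0, 1.0], [1.0, 2.0, 3.0]],
                                   [[0.0, 0.0, 0.0], [0.0, 0.0, 2.0]])
duration_orth = orthogonalize(duration_dir, pitch_dir)
steered = steer([0.0, 0.0, 0.0], [pitch_dir, duration_orth], [1.0, 3.0])
print(pitch_dir, duration_orth, steered)
```

After orthogonalization, steering along the duration direction leaves the projection onto the pitch direction unchanged, which is the decoupling property the Dual Steering framework relies on.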

2605.31291 2026-06-01 cs.IR cs.LG 版本更新

Contextual Scalarisation Thompson Sampling for multi-objective decisions in public media

面向公共媒体多目标决策的上下文标量化汤普森采样

Théo Maëtz, Luc Guillet, Andrea Cavallaro

发表机构 * Radio Télévision Suisse(瑞士广播电视台) EPFL(洛桑联邦理工学院)

AI总结 提出上下文标量化汤普森采样(CSTS)方法,通过学习上下文相关的目标权重,在公共媒体推荐中平衡多个竞争目标,实验表明其优于固定权重和标准上下文赌博机方法。

Comments 15 pages, 3 figures, 3 tables. Submitted-manuscript version of a paper accepted at ICPR 2026. The Version of Record will be published in the Springer Lecture Notes in Computer Science series; DOI will be added when available

详情
AI中文摘要

推荐系统可能在多个相互竞争的目标下运行。例如,在公共服务的编辑决策中,必须平衡受众覆盖、文化价值、公共服务使命和运营约束。现有方法依赖于固定的目标组合或基于帕累托的优化,无法适应不同情境下优先级的动态变化。本文提出上下文标量化汤普森采样(CSTS),一种多目标上下文赌博机方法,它学习根据观察到的上下文对目标进行加权。我们在瑞士国家广播公司Radio Télévision Suisse的真实节目数据上评估CSTS,结果显示,与固定权重和标准上下文赌博机方法相比,CSTS在上下文相关性和与专家策展实践的一致性方面均有提升。

英文摘要

Recommender systems may operate under multiple, competing objectives. For example, audience reach, cultural values, public service mandate, and operational constraints must be balanced in editorial decisions of public service media. Existing approaches relying on fixed combinations of objectives or Pareto-based optimisation do not adapt to changing priorities across situations. In this paper, we propose Contextual Scalarisation Thompson Sampler (CSTS), a multi-objective contextual bandit method that learns to weight objectives as a function of the observed context. We evaluate CSTS on real programming data from Radio Télévision Suisse, the Swiss national broadcaster, showing improved contextual relevance and better alignment with expert curation practices compared to fixed weight and standard contextual bandit approaches.

2605.31289 2026-06-01 cs.LG cs.AI 版本更新

The Terminal Representation in Reinforcement Learning

强化学习中的终端表示

Amir Esterhuysen, Anders Jonsson

发表机构 * Dept. Information and Communication Technologies(信息与通信技术系) Universitat Pompeu Fabra(庞培法布拉大学)

AI总结 提出终端表示(TR),一种无需特征分解即可直接用于下游任务且计算开销更低的奖励加权状态表示方法。

详情
AI中文摘要

表示学习是强化学习(RL)中用于时空抽象的强大工具。两种成熟的方法是后继表示(SR)和默认表示(DR)。SR通过状态引发的未来轨迹对其进行编码,捕获与奖励解耦的信息流。DR在此基础上用奖励加权轨迹,将信用分配结构整合到表示中。两种表示的特征向量已被用于支持一系列下游任务,包括选项发现、奖励塑造、迁移学习和探索。我们引入了一种结构不同的公式:终端表示(TR)。TR类似于DR对奖励加权轨迹进行编码,但可以作为更低维度的对象进行学习,并且可以直接用于上述应用而无需特征向量计算。特征分解还施加了对称转移动力学的假设,而TR可以绕过这一点。在这项工作中,我们发展了TR的理论基础:其推导、两种学习算法的收敛性、其在零样本组合性中的使用,以及替代奖励公式之间的等价性。我们进一步表明TR嵌入在DR的主特征向量中,使其无需特征分解即可捕获相同的基础知识。此外,我们提供了经验证据,证明TR在辅助应用中作为现有表示的可行替代方案,同时在学习、存储和使用方面需要更少的计算开销。

英文摘要

Representation learning is a powerful tool for spatio-temporal abstraction within reinforcement learning (RL). Two well-established approaches are the successor representation (SR) and the default representation (DR). The SR encodes states by the future trajectories they induce, capturing information flow decoupled from reward. The DR builds on this by weighting trajectories with reward, integrating credit-assignment structure into the representation. Eigenvectors of both representations have been used to support a range of downstream tasks, including option discovery, reward shaping, transfer learning, and exploration. We introduce a structurally distinct formulation: the terminal representation (TR). The TR encodes reward-weighted trajectories similarly to the DR, but can be learned as a lower-dimensional object, and can be used directly for these applications without eigenvector computations. Eigendecomposition also imposes the assumption of symmetric transition dynamics, which the TR can bypass. In this work we develop the theoretical foundations of the TR: its derivation, convergence of two learning algorithms, its use for zero-shot compositionality, and equivalences between alternative reward formulations. We further show the TR is embedded in the top DR eigenvector, allowing it to capture the same underlying knowledge without eigendecomposition. Additionally, we provide empirical evidence of the TR as a viable alternative to existing representations in subsidiary applications, while requiring less computational overhead to learn, store, and use.
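For readers unfamiliar with the baseline, the successor representation under a fixed policy is the discounted sum of powers of the transition matrix P, which equals (I - γP)^{-1}. The sketch below computes it by fixed-point iteration for a toy two-state chain. It covers only the standard SR; the DR and the proposed TR are not reproduced, and the chain and function name are assumptions for illustration.

```python
def successor_representation(P, gamma, num_iters=200):
    """Iterate M <- I + gamma * P @ M, whose fixed point is (I - gamma * P)^{-1}."""
    n = len(P)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    M = [row[:] for row in identity]
    for _ in range(num_iters):
        M = [[identity[i][j] + gamma * sum(P[i][k] * M[k][j] for k in range(n))
              for j in range(n)]
             for i in range(n)]
    return M

# Toy chain (assumed): the agent deterministically alternates between two states.
P = [[0.0, 1.0],
     [1.0, 0.0]]
M = successor_representation(P, gamma=0.5)
print(M)
```

Entry M[i][j] is the expected discounted number of visits to state j when starting from state i, so no reward information enters the computation.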

2605.31277 2026-06-01 cs.CR cs.LG 版本更新

GETA: Generalized Encrypted Traffic Analysis

GETA: 通用加密流量分析

Ransika Gunasekara, Rahat Masood, Salil Kanhere

发表机构 * University of New South Wales (UNSW)(新南威尔士大学)

AI总结 提出GETA框架,通过元学习、嵌入优化和自注意力机制,仅利用流量元数据建模多变量时间序列,实现跨协议、少样本的加密流量分析,在应用识别、VPN分类、IoT设备指纹和攻击检测等任务上优于现有方法。

详情
AI中文摘要

传统流量分析正受到加密、隧道和隐私保护协议的快速采用的根本性挑战,这些协议日益模糊数据包载荷并限制深度包检测(DPI)的实用性。尽管机器学习推进了加密流量分析,但现有方法通常仍依赖于特定协议的头部特征,依赖大量标注数据集,并在跨异构网络环境部署时性能下降。我们提出GETA,一个协议无关的加密流量分析框架,它仅使用流量元数据将网络流建模为多变量时间序列,从而避免了对数据包载荷或头部语义的依赖。GETA结合了元学习、嵌入优化和自注意力机制,以支持对未见过的领域进行少样本适应,仅需极少的标注数据。在涵盖应用识别、VPN流量分类、IoT设备指纹识别和攻击检测的九个公开数据集上,GETA始终优于最先进的基线方法。这些结果表明,GETA为现代加密网络中的鲁棒流量分析提供了一个实用且可泛化的基础。

英文摘要

Traditional traffic analysis is being fundamentally challenged by the rapid adoption of encryption, tunnelling, and privacy-preserving protocols, which increasingly obscure packet payloads and limit the usefulness of Deep Packet Inspection (DPI). Although machine learning has advanced encrypted traffic analysis, existing approaches often remain tied to protocol-specific header features, depend on large labelled datasets, and degrade when deployed across heterogeneous network environments. We present GETA, a protocol-agnostic framework for encrypted traffic analysis that models network flows as multivariate time series using only traffic metadata, thereby avoiding reliance on packet payloads or header semantics. GETA combines meta-learning, embedding refinement, and self-attention to support few-shot adaptation to previously unseen domains with minimal labelled data. Across nine public datasets spanning application identification, VPN traffic classification, IoT device fingerprinting, and attack detection, GETA consistently outperforms state-of-the-art baselines. These results show that GETA offers a practical and generalisable foundation for robust traffic analysis in modern encrypted networks.

2605.31276 2026-06-01 cs.LG 版本更新

Learning Parametric Nitrogen Fertilizer Response Curves Using Neuro Symbolic Regression

使用神经符号回归学习参数化氮肥响应曲线

Giorgio Morales, John Sheppard

发表机构 * Aston Centre for Artificial Intelligence Research and Application(阿斯顿人工智能研究与应用中心) Aston University(阿斯顿大学) Gianforte School of Computing(吉安福特计算学院) Montana State University(蒙大拿州立大学)

AI总结 提出一种基于神经符号回归的方法,无需预设函数形式即可学习氮肥响应曲线,并在真实冬小麦数据上验证其优于传统模型。

Comments Accepted at the Workshop on Symbolic Regression and Equation Discovery, part of the 2026 IEEE World Congress on Computational Intelligence (WCCI) and the IEEE Congress on Evolutionary Computation (CEC)

详情
AI中文摘要

准确模拟作物对氮肥的响应是精准农业中的基本挑战,因为它影响经济效益和环境可持续性。现有方法要么依赖预定义的参数形式,要么使用不透明的机器学习模型,限制了它们从数据中解释或发现特定地点函数关系的能力。在这项工作中,我们提出了一种神经符号回归方法,无需假设预定义的函数形式即可学习参数化的氮响应曲线。我们的方法集成了基于Transformer的多集符号骨架预测策略,能够发现多个子域或管理区之间的共享函数结构。通过构建多样化的输入子集并强制它们之间的一致性,该方法恢复了稳健的符号骨架,随后使用遗传算法将其拟合到观测数据上。该框架首先在合成一维问题上进行评估,以评估其在不同认知不确定性水平下的稳健性。结果表明,即使在数据稀缺的情况下,所提出的符号回归方法也能恢复正确的表达式。在这项工作中,我们展示了将我们的方法应用于真实冬小麦数据的结果,学习了田间不同管理区的不同参数化氮响应曲线。结果表明,发现的表达式不仅比二次平台和指数函数等传统模型实现了更低的拟合误差,而且还捕捉了不同空间区域的多样化函数行为。这证明了神经符号回归在发现特定地点农学关系和支持精准农业中知情决策方面的潜力。

英文摘要

Accurately modeling crop response to Nitrogen (N) fertilization is a fundamental challenge in precision agriculture, as it impacts both economic returns and environmental sustainability. Existing approaches either rely on predefined parametric forms or opaque machine learning models, limiting their ability to interpret or discover site-specific functional relationships from data. In this work, we propose a neuro symbolic regression (SR) approach to learn parametric N-response curves without assuming a predefined functional form. Our approach integrates a transformer-based Multi-Set Symbolic Skeleton Prediction strategy, enabling the discovery of shared functional structures across multiple subdomains or management zones (MZs). By constructing diverse input subsets and enforcing consistency across them, the method recovers robust symbolic skeletons that are subsequently fitted to observed data using a genetic algorithm. This framework was first evaluated on synthetic one-dimensional problems to assess its robustness under varying levels of epistemic uncertainty. The results demonstrate the ability of the proposed SR approach to recover correct expressions even in data-scarce regimes. In this work, we present the results of applying our method to real-world winter wheat data, learning distinct parametric N-response curves for different MZs within a field. The results show that the discovered expressions not only achieve lower fitting errors than traditional models such as quadratic-plateau and exponential functions, but also capture diverse functional behaviors across spatial regions. This demonstrates the potential that neuro SR has to enable the discovery of site-specific agronomic relationships and support informed decision-making in precision agriculture.
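The quadratic-plateau model mentioned as a traditional baseline can be written in a few lines. This is a sketch of the common textbook form, in which the plateau joins the parabola smoothly at its vertex; the coefficient values and units are hypothetical and are not taken from the paper's data.

```python
def quadratic_plateau(n_rate, a, b, c):
    """Yield follows a + b*N + c*N^2 (with c < 0) up to the vertex N* = -b / (2c), then stays flat."""
    if c >= 0:
        raise ValueError("c must be negative for a plateau to exist")
    n_star = -b / (2.0 * c)        # N rate at which the parabola peaks
    n_eff = min(n_rate, n_star)    # rates beyond N* add no further yield
    return a + b * n_eff + c * n_eff ** 2

# Hypothetical coefficients: intercept 2.0, linear gain 0.04, curvature -0.0001.
for rate in (0, 100, 200, 300):
    print(rate, quadratic_plateau(rate, a=2.0, b=0.04, c=-0.0001))
```

A fixed form like this has three parameters per management zone, whereas the symbolic regression approach described above searches over the functional form itself.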

2605.31273 2026-06-01 cs.LG 版本更新

Survival Reinforcement Learning: Toward Scalable Self-Supervised RL

生存强化学习:迈向可扩展的自监督强化学习

Franki Nguimatsia-Tiofack, Fabian Schramm, Théotime Le Hellard, Justin Carpentier

发表机构 * Inria and École Normale Supérieure PSL Research University(Inria 和 巴黎高等师范学院,巴黎文理研究大学)

AI总结 提出生存强化学习(SRL),一种基于在线分类的方法,通过最大化智能体在目标状态停留时间来解决对比强化学习中的均匀性-容忍性困境,在长时程运动任务上性能提升2至8倍。

详情
AI中文摘要

虽然自监督对比强化学习(CRL)展现了显著的深度扩展能力,成功使用了超过64层的网络,但由于对比损失固有的均匀性-容忍性困境,扩展的CRL在长时程目标条件规划中仍然存在困难。我们引入了生存强化学习(SRL),一种基于在线分类的替代方法,通过最大化智能体在目标状态的停留时间来扩展生存价值学习框架。SRL绕过了CRL的结构约束,并缓解了生存框架固有的“bang-bang”控制解,这种控制解在复杂动态系统中往往引发不良行为。在多种机器人基准测试中,扩展的SRL在操作任务上与最先进的CRL相当,并在稳定的长时程运动任务上性能提升2至8倍。我们的结果提供了强有力的额外证据,表明基于分类的方法可能成为扩展强化学习这一更广泛努力中的关键原语。

英文摘要

While self-supervised Contrastive Reinforcement Learning (CRL) has shown remarkable depth-scaling capabilities, successfully using networks over 64 layers, scaled CRL still struggles with long-horizon goal-conditioned planning due to the uniformity-tolerance dilemma inherent in contrastive losses. We introduce Survival Reinforcement Learning (SRL), an online classification-based alternative that extends the survival value learning framework by maximizing the agent's dwell time at target goals. SRL bypasses the structural constraints of CRL and mitigates the "bang-bang" control solutions inherent to survival frameworks, which often induce undesirable behavior in complex dynamical systems. Evaluated across diverse robotic benchmarks, scaled SRL matches state-of-the-art CRL on manipulation tasks and outperforms it by 2x to 8x on stable, long-horizon locomotion tasks. Our results provide strong additional evidence that classification-based methods may serve as a key primitive in the broader effort to scale reinforcement learning.

2605.31272 2026-06-01 cs.LG 版本更新

Algorithmic Recourse of In-Context Learning for Tabular Data

表格数据的上下文学习算法补救

Wenshuo Dong, Jiaming Zhang, Shaopneg Fu, Hongbin Lin, Di Wang, Lijie Hu

发表机构 * Mohamed bin Zayed University of Artificial Intelligence, United Arab Emirates(阿联酋穆罕默德·本·扎耶德人工智能大学) University of Copenhagen, Denmark(丹麦哥本哈根大学) Renmin University of China, China(中国人民大学) King Abdullah University of Science and Technology, Saudi Arabia(沙特阿拉伯阿卜杜拉国王科技大学) The Hong Kong University of Science and Technology(Guangzhou), China(香港科技大学(广州))

AI总结 针对表格数据上下文学习中的黑箱模型,提出自适应子空间补救框架ASR-ICL,通过零阶优化高效生成可操作且稀疏的补救方案,理论证明补救有界且随上下文增大收敛至经典解。

Comments Accepted by ICML 2026

详情
AI中文摘要

随着预测模型越来越多地部署在信用审批等高风险场景中,对受影响的个体提供补救的后验方法需求日益增长。许多此类模型处理表格数据,其中特征对应现实世界的属性。最近,上下文学习(ICL)使大型语言模型能够通过在推理时以标注示例为条件进行表格预测,而无需显式训练。然而,ICL下表格决策的算法补救仍基本未被探索。在这项工作中,我们首次研究了ICL下表格数据的算法补救。我们进行了理论分析,表明补救仍然定义良好且有界,并刻画了随着上下文增大,补救如何收敛到经典解。在实践中,我们提出了一种新颖的零阶补救框架——自适应子空间补救用于上下文学习(ASR-ICL),该框架高效地为黑箱ICL模型生成可操作且稀疏的补救。所提出的框架自然地扩展到多类表格任务。在多个真实世界数据集和模型上的实验表明,ASR-ICL以更少的查询实现了与现有方法相当的补救质量,并经验性地验证了预测的收敛行为,支持了我们的理论分析。

英文摘要

As predictive models are increasingly deployed in high-stakes settings such as credit approval, there is a growing need for post-hoc methods that provide recourse to affected individuals. Many such models operate on tabular data, where features correspond to real-world attributes. Recently, in-context learning (ICL) has enabled large language models to perform tabular prediction by conditioning on labeled examples at inference time, without explicit training. However, algorithmic recourse for tabular decision-making under ICL remains largely unexplored. In this work, we present the first study of algorithmic recourse for tabular data under ICL. We carry out a theoretical analysis, showing that recourse remains well-defined and bounded, and we characterize how recourse converges toward classical solutions as the context size increases. In practice, we propose a novel zeroth-order recourse framework, Adaptive Subspace Recourse for In-Context Learning (ASR-ICL), that efficiently generates actionable and sparse recourse for black-box ICL models. The proposed framework naturally extends to multi-class tabular tasks. Experiments across multiple real-world datasets and models demonstrate that ASR-ICL achieves recourse quality comparable to existing methods with fewer queries and empirically confirm the predicted convergence behavior, supporting our theoretical analysis.

2605.31266 2026-06-01 cs.CV cs.AI cs.LG 版本更新

Envisioning Beyond the Few: Disentangled Semantics and Primitives for Few-Shot Atypical Layout-to-Image Generation

超越少数:用于少样本非典型布局到图像生成的解耦语义与基元

Nan Bao, Yifan Zhao, Wenzhuang Wang, Jia Li

发表机构 * State Key Laboratory of Virtual Reality Technology and Systems(虚拟现实技术与系统国家重点实验室) School of Computer Science and Engineering(计算机科学与工程学院) Qingdao Research Institute, Beihang University, China(北京航空航天大学青岛研究所,中国)

AI总结 针对少样本非典型布局到图像生成中表示碎片化问题,提出通过语义锚定和基元注入解耦语义与视觉细节,实现鲁棒少样本适应。

Comments Accepted to ICML 2026; code available at https://github.com/iCVTEAM/DSP

详情
AI中文摘要

布局到图像(L2I)任务通过对象类别和空间布局实现对图像生成的细粒度控制。然而,现有的L2I方法在少样本非典型设置下会产生碎片化和扭曲的生成结果。我们将这种失败称为表示碎片化,源于将语义身份与视觉细节纠缠在一起的粒度不匹配。为了解决这个问题,我们提出了一种表示驱动的框架,将语义与基元解耦,以实现鲁棒的少样本适应。具体来说,语义锚定将类别语义聚合到锚点中以实现稳定的身份,而基元注入则建模可重新组合的基元以实现鲁棒的局部细节建模。概念引导进一步通过显著性感知目标调节优化,以保持前景语义一致性。大量实验表明,在5样本设置下,我们的方法在视觉保真度和跨不同非典型领域的对齐方面,均优于最先进的L2I方法。源代码公开于 https://github.com/iCVTEAM/DSP。

英文摘要

The layout-to-image (L2I) task enables fine-grained control over image generation via object categories and spatial layouts. However, existing L2I methods yield fragmented and distorted generations under few-shot atypical settings. We term this failure representation fragmentation; it arises from a granularity mismatch that entangles semantic identity with visual details. To address this issue, we propose a representation-driven framework that disentangles semantics from primitives for robust few-shot adaptation. Specifically, Semantic Anchoring aggregates categorical semantics into anchors for stable identity, while Primitive Imbuing models recomposable primitives for robust local detail modeling. Conceptual Steering further regulates optimization with a saliency-aware objective to preserve foreground semantic consistency. Extensive experiments demonstrate consistent improvements in the 5-shot regime over state-of-the-art L2I methods in both visual fidelity and alignment across diverse atypical domains. The source code is publicly available at https://github.com/iCVTEAM/DSP.

2605.31264 2026-06-01 cs.AI cs.CL cs.LG 版本更新

COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation

COLLEAGUE.SKILL: 通过专家知识蒸馏实现自动化AI技能生成

Tianyi Zhou, Dongrui Liu, Leitao Yuan, Jing Shao, Xia Hu

发表机构 * Shanghai Artificial Intelligence Laboratory(上海人工智能实验室)

AI总结 提出一个从异构痕迹到可检查、可修正、可代理使用的技能包的自动化蒸馏系统,用于生成基于人的AI技能。

Comments 12 pages, 4 figures

详情
AI中文摘要

LLM代理越来越多地被期望不仅完成孤立的任务,还要承载人类专业知识、判断和互动风格的有限表示。构建这种基于人的代理仍然困难,因为与人或角色相关的可操作知识通常嵌入在异构痕迹中,而不是写成清晰的指令。现有的记忆和角色系统捕捉了这些证据的片段,而技能框架提供了可移植的打包格式;然而,没有端到端的工作流将这些痕迹蒸馏成可检查、可修正和代理可用的技能。我们提出了一个自动化的痕迹到技能蒸馏系统,通过专家知识蒸馏生成基于人的AI技能。给定目标人物或角色的材料,COLLEAGUE.SKILL 生成一个版本化的技能包,包含两个协调的轨道:一个能力轨道,用于实践、心理模型和决策启发式;一个有界行为轨道,用于沟通风格、互动规则和修正历史。该包可以被检查、调用、通过自然语言反馈更新、回滚、跨代理主机安装,并可选择性地为受控分发做准备。我们描述了开源系统中实现的人工制品契约、生成工作流、修正生命周期、部署表面和领域预设。在撰写本文时,公共仓库拥有约18.5k个GitHub星标;画廊列出了来自165位贡献者的215个技能,以及跨列出的技能卡累计超过10万个星标。该系统说明了基于人的技能如何表示为可移植、可修正的包,而不是不透明的提示或隐藏的记忆。

英文摘要

LLM agents are increasingly expected not only to complete isolated tasks, but also to carry bounded representations of human expertise, judgment, and interaction style. Building such person-grounded agents remains difficult because actionable knowledge associated with a person or role is usually embedded in heterogeneous traces rather than written as clean instructions. Existing memory and persona systems capture fragments of this evidence, while skill frameworks provide portable packaging formats; however, there is no end-to-end workflow for distilling these traces into inspectable, correctable, and agent-usable skills. We present an automated trace-to-skill distillation system for generating person-grounded AI skills via expert knowledge distillation. Given materials from a target person or role, COLLEAGUE.SKILL produces a versioned skill package with two coordinated tracks: a capability track for practices, mental models, and decision heuristics, and a bounded behavior track for communication style, interaction rules, and correction history. The package can be inspected, invoked, updated through natural-language feedback, rolled back, installed across agent hosts, and optionally prepared for controlled distribution. We describe the artifact contract, generation workflow, correction lifecycle, deployment surface, and domain presets implemented in the open-source system. At the time of writing, the public repository has approximately 18.5k GitHub stars; the gallery lists 215 skills from 165 contributors and more than 100k cumulative stars across listed skill cards. The system illustrates how person-grounded skills can be represented as portable, correctable packages rather than opaque prompts or hidden memories.

2605.31261 2026-06-01 cs.LG cs.AI stat.ML 版本更新

Why Linear Recurrent Memory Works in Partially Observable Reinforcement Learning

为什么线性循环记忆在部分可观测强化学习中有效

Yike Zhao, Onno Eberhard, Malek Khammassi, Ali H. Sayed, Michael Muehlebach

发表机构 * EPFL(洛桑联邦理工学院) Max Planck Institute for Intelligent Systems(智能系统马克斯·普朗克研究所)

AI总结 本文通过构造两种线性滤波器,从理论上证明了线性循环神经网络在部分可观测强化学习中作为记忆单元的有效性,并扩展到动作控制的隐马尔可夫模型。

详情
AI中文摘要

线性循环神经网络家族在部分可观测强化学习中作为循环记忆单元表现出色。我们通过构造并研究两种线性滤波器为其经验有效性提供了理论依据:(i) 第一种在确定性转移矩阵下精确重现隐马尔可夫模型(HMM)中信念向量的预softmax logits,从而作为最优策略学习的充分统计量;(ii) 第二种在近似确定性转移矩阵下实现状态解码误差趋近于零,从而将状态模糊性降至接近零。结果扩展到动作控制的HMM,其中相应的线性滤波器变为随时间变化且依赖于动作的动态。我们通过数值实验说明了主要结果,并进一步展示了所构造的线性滤波器在小型强化学习游戏中作为强特征提取器的能力。

英文摘要

The family of linear recurrent neural networks has shown strong performance as recurrent memory units in partially observable reinforcement learning. We provide a theoretical justification for their empirical effectiveness by constructing and studying two linear filters: (i) the first exactly reproduces the pre-softmax logits of the belief vector in a hidden Markov model (HMM) under a deterministic transition matrix, thereby serving as a sufficient statistic for optimal policy learning, (ii) the second achieves vanishing state-decoding error under a nearly deterministic transition matrix, thus reducing state ambiguity to near zero. The results extend to action-controlled HMMs, where the corresponding linear filters become time-varying with action-dependent dynamics. We illustrate our main results through numerical experiments and further show that the constructed linear filter serves as a strong feature extractor in a small reinforcement learning game.
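The first claim can be checked numerically in a special case. The sketch below assumes the deterministic transition is a permutation of the states (an assumption made here so that each state has exactly one predecessor) and compares a standard HMM forward filter with an update that acts on logits by permuting them and adding the log-emission column. The toy HMM and all names are hypothetical, and this is not the paper's construction.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def bayes_filter_step(belief, obs, next_state, emission):
    """Standard HMM forward step: push belief through the transition, weight by emission, normalize."""
    n = len(belief)
    predicted = [0.0] * n
    for s in range(n):
        predicted[next_state[s]] += belief[s]
    unnorm = [emission[s][obs] * predicted[s] for s in range(n)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def linear_filter_step(logits, obs, next_state, emission):
    """Update on logits: permute by the transition, then add the log-emission of the observation."""
    new_logits = [0.0] * len(logits)
    for s in range(len(logits)):
        target = next_state[s]
        new_logits[target] = logits[s] + math.log(emission[target][obs])
    return new_logits

# Toy HMM (assumed): 3 states cycling 0 -> 1 -> 2 -> 0, 2 observation symbols.
next_state = [1, 2, 0]
emission = [[0.7, 0.3], [0.4, 0.6], [0.1, 0.9]]
belief = [0.5, 0.3, 0.2]
logits = [math.log(p) for p in belief]

max_gap = 0.0
for obs in [0, 1, 1, 0]:
    belief = bayes_filter_step(belief, obs, next_state, emission)
    logits = linear_filter_step(logits, obs, next_state, emission)
    gap = max(abs(b - q) for b, q in zip(belief, softmax(logits)))
    max_gap = max(max_gap, gap)
print(belief, max_gap)
```

The logit update needs no normalization because the softmax absorbs the constant, which is why a linear recurrence suffices in this setting. With a non-deterministic transition matrix, the prediction step sums several predecessors and the update is no longer linear in the logits.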

2605.31259 2026-06-01 cs.LG 版本更新

Lightweight CNN-Based Anomaly Detection for High Voltage Converter Modulators in the Spallation Neutron Source

基于轻量级CNN的散裂中子源高压转换器调制器异常检测

Alberto D. Cencillo, Leonardo Concepción, Julián Luengo, Isaac Triguero

发表机构 * Andalusian Research Institute in Data Science and Computational Intelligence (DaSCI)(安达卢西亚数据科学与计算智能研究院) Department of Computer Science and Artificial Intelligence (DECSAI), University of Granada(格拉纳达大学计算机科学与人工智能系)

AI总结 针对高压转换器调制器多通道信号异常检测,通过改变时间滤波与跨通道混合的顺序并引入自适应通道重加权,在公开数据集上达到AUC-PR 0.816和AUC-ROC 0.934,超越现有方法。

Comments 21 pages, 8 figures

详情
AI中文摘要

高功率脉冲转换器的非计划停机是大型加速器设施停机的主要原因。在散裂中子源(SNS)中,高压转换器调制器(HVCM)始终是丢失束流时间的第二大贡献者。每个HVCM脉冲通过跨电流、电压和磁通量的传感器通道记录,这些通道的相互交互编码了系统的运行状态。故障前兆在这些通道中并非均匀显现:根据故障类型,它们可能改变单个信号的时间结构,改变通道间的统计依赖性,或两者兼有。现有的深度学习方法通常使用标准卷积流水线处理多通道信号,该流水线从第一层开始就纠缠时间和跨通道操作,使得模型没有明确的机制来表示通道独立性或结构化的通道间交互。我们假设架构归纳偏差,特别是时间滤波和跨通道混合的顺序,在这类数据的检测性能中起着核心作用。为了验证这一点,我们改变了这两个操作的顺序,并检查每个脉冲的自适应通道重加权是否进一步提高灵敏度。在涵盖所有四个SNS子系统(RFQ、DTL、CCL、SCL)的公开HVCM数据集上评估,我们最好的变体实现了池化AUC-PR为0.816和AUC-ROC为0.934,在大多数子系统和六个故障家族中的五个上优于现有技术。消融实验识别出三个主导输入通道,并将每个故障家族的性能与前兆表现为单个通道的幅度偏移还是需要联合通道表示才能显现的更细微模式联系起来。

英文摘要

Unscheduled trips of high-power pulsed converters are a leading source of downtime at large accelerator facilities. At the Spallation Neutron Source (SNS), the High Voltage Converter Modulators (HVCMs) are consistently the second-largest contributor to lost beam time. Each HVCM pulse is recorded across sensor channels spanning currents, voltages, and magnetic fluxes, whose mutual interactions encode the operating state of the system. Fault precursors do not manifest uniformly across these channels: depending on fault type, they may alter the temporal structure of individual signals, change the statistical dependencies among channels, or both. Existing deep-learning approaches typically process multi-channel signals with standard convolutional pipelines that entangle temporal and cross-channel operations from the first layer, giving the model no explicit mechanism to represent channel independence or structured inter-channel interaction. We hypothesise that architectural inductive bias, specifically the ordering of temporal filtering and cross-channel mixing, plays a central role in detection performance on this class of data. To test this, we vary the order in which these two operations are applied, and examine whether per-pulse adaptive channel reweighting further improves sensitivity. Evaluated on the public HVCM dataset across all four SNS subsystems (RFQ, DTL, CCL, SCL), our best variant achieves a pooled AUC-PR of 0.816 and AUC-ROC of 0.934, outperforming the state of the art on most subsystems and five of the six fault families. Ablations identify three dominant input channels and link per-fault-family performance to whether precursors manifest as amplitude shifts in individual channels or as subtler patterns requiring joint channel representations to surface.

2605.31257 2026-06-01 cs.LG stat.ML 版本更新

Fraud Type Decomposition and the Observation-Mechanism Taxonomy: Class-Specific Detection Limits in Payment Networks

欺诈类型分解与观测机制分类:支付网络中的类别特定检测极限

Gaurav Dhama

AI总结 本文通过引入观测机制分类将欺诈分为五类,证明按类别分别估计欺诈率并聚合优于整体估计,并推导了每类检测的理论约束。

Comments 59 pages

详情
AI中文摘要

支付网络中的欺诈检测依赖于通过异质且不完美的观测过程生成的标签,但现有方法将欺诈视为同质二元变量。我们证明这一假设在结构上不正确,并导致可证明的低效。我们引入一个观测机制分类,将欺诈分为五类,每类由不同的删失与标注流程定义。我们证明按类别分别估计欺诈率并聚合严格优于整体估计,效率差距由异质观测率导致的Jensen惩罚刻画。对于每类,我们推导了对检测起决定作用的理论约束,包括内生标签腐败、结构不可观测性和特征非信息性。这些结果确立了欺诈检测本质上是一组不同的估计问题,每个问题由其自身的观测结构和检测极限支配。

英文摘要

Fraud detection in payment networks relies on labels generated through heterogeneous and imperfect observation processes, yet existing approaches treat fraud as a homogeneous binary variable. We show that this assumption is structurally incorrect and leads to provable inefficiency. We introduce an observation-mechanism taxonomy that partitions fraud into five classes, each defined by a distinct censorship and labeling pipeline. We prove that estimating fraud rates separately by class and aggregating strictly dominates pooled estimation, with the efficiency gap characterized as a Jensen penalty arising from heterogeneous observation rates. For each class, we derive the binding theoretical constraint on detection, including endogenous label corruption, structural non-observability, and feature non-informativeness. These results establish that fraud detection is fundamentally a collection of distinct estimation problems, each governed by its own observation structure and detection limit.

2605.31250 2026-06-01 stat.ML cs.AI cs.LG 版本更新

Entropic Projection Alignment: Estimating, Explaining, and Improving Model Performance Under Distribution Shift

熵投影对齐:估计、解释和改进分布偏移下的模型性能

Salim I. Amoukou, Emanuele Albini, Tom Bewley, Saumitra Mishra, Manuela Veloso

发表机构 * J.P. Morgan AI Research(摩根大通AI研究所)

AI总结 提出熵投影对齐(EPA)方法,通过匹配选定矩并最小化KL散度来对齐源分布与目标分布,从而统一解决分布偏移下的性能估计、解释和改进问题。

Comments Accepted at the 29th International Conference on Artificial Intelligence and Statistics (AISTATS 2026)

详情
AI中文摘要

我们提出了一个统一框架,用于解决分布偏移的三个关键挑战:(1)估计模型在未标记目标域上的性能,(2)通过识别导致偏移的特征来解释偏移,以及(3)提高目标域性能。我们的方法,熵投影对齐(EPA),通过匹配精心选择的矩同时最小化与源分布的KL散度,将源分布与目标分布对齐。该公式为重要性权重提供了唯一的闭式解,通过隐式方差控制实现鲁棒性。借鉴领域适应理论,我们证明矩匹配足以实现可靠的估计和适应,避免了完全密度比恢复的需要。大量实验以及强有力的理论保证表明,EPA在提供显著计算效率的同时,始终优于最先进的基线方法。

英文摘要

We propose a unified framework for addressing three key challenges of distribution shift: (1) estimating a model's performance on an unlabeled target domain, (2) explaining the shift by identifying the features responsible, and (3) improving the target domain performance. Our method, Entropic Projection Alignment (EPA), aligns the source distribution to the target by matching carefully selected moments while simultaneously minimising the KL divergence from the source. This formulation yields a unique closed-form solution for importance weights, achieving robustness through implicit variance control. Drawing on domain adaptation theory, we establish that moment matching is sufficient for reliable estimation and adaptation, avoiding the need for full density ratio recovery. Extensive experiments, together with strong theoretical guarantees, demonstrate that EPA consistently outperforms state-of-the-art baselines while offering substantial computational efficiency.
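
The idea of matching moments while staying close to the source in KL divergence can be sketched in one dimension. It is a standard result that the minimum-KL reweighting of a sample under a moment constraint is an exponential tilt, with weights proportional to exp(λx). The sketch below matches only the mean and finds λ by bisection; the function names and the choice of solver are assumptions for illustration and are not EPA's actual procedure.

```python
import math

def tilted_weights(xs, lam):
    # Exponentially tilted weights w_i proportional to exp(lam * x_i), normalized to sum to 1.
    scores = [lam * x for x in xs]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def match_mean(xs, target_mean, lo=-50.0, hi=50.0, iters=200):
    # The tilted mean is monotone increasing in lam, so bisection finds the matching tilt.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        w = tilted_weights(xs, mid)
        mean = sum(wi * xi for wi, xi in zip(w, xs))
        if mean < target_mean:
            lo = mid
        else:
            hi = mid
    return tilted_weights(xs, 0.5 * (lo + hi))

# Illustrative source sample and a target-domain mean to match.
source = [0.0, 1.0, 2.0, 3.0, 4.0]
weights = match_mean(source, target_mean=3.0)
weighted_mean = sum(w * x for w, x in zip(weights, source))
```

The resulting weights can then reweight per-sample source losses to estimate target-domain performance. With several moments, λ becomes a vector and the same convex dual is solved in higher dimension.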

2605.31249 2026-06-01 cs.LG cs.AI 版本更新

Learning Cardiac Latent Representations in Vectorcardiogram Space

在向量心电图空间中学习心脏潜在表示

Bosong Huang, Panzhen Zhao, Zengxiang Li, Patricia Lee, Wei Jin, Alan Wee-Chung Liew, Ming Jin, Shirui Pan

发表机构 * Griffith University, Australia(格里菲斯大学) SingHealth Duke-NUS AI in Medicine Institute, Singapore(新加坡SingHealth Duke-NUS医学人工智能研究所) Emory University, USA(埃默里大学)

AI总结 针对标准十二导联心电图表示学习中的冗余和过拟合问题,提出基于Frank向量心电图模型的LVCG框架,在物理潜在空间中学习视图不变的心脏电活动表示,提升鲁棒性和泛化能力。

详情
AI中文摘要

心电图(ECG)是心脏评估的基石,学习信息丰富的ECG表示对于从疾病诊断到临床报告生成等任务至关重要。然而,现有方法几乎完全在可观测的ECG信号空间中操作。实际上,标准十二导联ECG代表了同一心脏电活动在不同空间方向上的多个投影。因此,在ECG空间中进行表示学习不可避免地引入了大量冗余,可能导致虚假相关性和过拟合风险增加。为了解决这个问题,受Frank向量心电图(VCG)模型启发,我们提出直接在VCG空间中学习心脏电活动的统一潜在表示。我们引入了LVCG,这是第一个设计用于在此物理基础潜在空间中运行的通用自监督表示学习框架。通过学习视图不变的潜在VCG表示而非导联特定伪影,LVCG最小化了冗余并提高了泛化能力。LVCG在各项任务中普遍优于ECG空间基线,展现出增强的鲁棒性和泛化能力,尤其在领域偏移设置中。

英文摘要

Electrocardiography (ECG) is a cornerstone of cardiac assessment, making the learning of informative ECG representations fundamental to tasks ranging from disease diagnosis to clinical report generation. However, existing methods operate almost exclusively in the observable ECG signal space. In practice, the standard twelve-lead ECG represents multiple projections of the same underlying cardiac electrical activity from different spatial orientations. Therefore, representation learning in the ECG space inevitably introduces substantial redundancy, which may lead to spurious correlations and increased risk of overfitting. To address this, and motivated by the Frank vectorcardiogram (VCG) model, we propose learning a unified latent representation of cardiac electrical activity directly in the VCG space. We introduce LVCG, the first general self-supervised representation learning framework designed to operate in this physically grounded latent space. By learning view-invariant latent VCG representations rather than lead-specific artifacts, LVCG minimizes redundancy and improves generalization. LVCG generally outperforms ECG-space baselines across tasks, demonstrating enhanced robustness and generalization, especially in domain shift settings.

2605.31245 2026-06-01 cs.LG 版本更新

Toward Identifiable Sparse Autoencoders

走向可识别的稀疏自编码器

Walter Nelson, Theofanis Karaletsos, Francesco Locatello

发表机构 * Institute of Science and Technology Austria(奥地利科学技术研究所) Pyramidal Inc.(Pyramidal公司) Achira Inc., USA(Achira公司,美国)

AI总结 针对稀疏自编码器训练不稳定的问题,通过理论分析模型属性并改进架构与训练流程,提出iSAE变体,实现更低重构误差与更高稳定性。

Comments International Conference on Machine Learning (ICML) 2026

详情
AI中文摘要

最近,稀疏自编码器(SAE)已成为解释和交互实际神经网络表示的有吸引力的工具。尽管这已是常见的经验共识,我们还从理论上表明SAE高度不稳定:不同的训练运行可能产生不同的概念字典和稀疏编码。我们刻画了阻碍实际SAE稳定性的模型属性,并通过架构和训练过程的最小改动解决每个问题。这些改动共同产生了两个版本的可识别SAE(iSAE),这是标准TopK SAE的变体,具有更低的重构误差和更高的稳定性。我们通过将SAE与传统字典学习方法联系起来,从理论上解释了这一改进,并表明实践中学习的字典满足近似受限等距条件,从而使这些模型中的相应稀疏编码接近可识别。

英文摘要

Recently, sparse autoencoders (SAEs) have emerged as an attractive tool for interpreting and interacting with representations in practical neural networks. While it is common empirical folklore, we also show theoretically that SAEs are highly unstable: different training runs are likely to produce different concept dictionaries and sparse codes. We characterize the model properties that hinder the stability of real-world SAEs, and address each of these problems through minimal changes to the architecture and training procedure. Together, these changes yield two versions of an identifiable SAE (iSAE), a variant of the standard TopK SAE with lower reconstruction error and improved stability. We explain this improvement theoretically by connecting SAEs with traditional dictionary learning approaches, and show that the dictionaries learned in practice satisfy an approximate restricted isometry condition, rendering the corresponding sparse codes in those models near-identifiable.
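
For readers unfamiliar with the baseline, the following is a minimal sketch of a standard TopK SAE forward pass: encode, keep the k largest pre-activations, decode. It shows only the baseline that the abstract refers to, not the iSAE modifications, which the abstract does not specify. The weights and function names are made up for illustration, and bias terms are omitted.

```python
def matvec(matrix, vec):
    # Dense matrix-vector product on plain lists.
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]

def topk_activation(pre, k):
    # Keep the k largest pre-activations and zero out the rest.
    keep = set(sorted(range(len(pre)), key=lambda i: pre[i], reverse=True)[:k])
    return [p if i in keep else 0.0 for i, p in enumerate(pre)]

def sae_forward(x, enc, dec, k):
    # enc has shape (latent, input); dec has shape (input, latent).
    code = topk_activation(matvec(enc, x), k)
    recon = matvec(dec, code)
    return code, recon

# Toy 3-dimensional input with a 4-atom dictionary (illustrative values).
enc = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
dec = [[1.0, 0.0, 0.0, 0.5], [0.0, 1.0, 0.0, 0.5], [0.0, 0.0, 1.0, 0.0]]
x = [0.5, 2.0, -1.0]
code, recon = sae_forward(x, enc, dec, k=2)
```

The instability described in the abstract concerns the learned `enc` and `dec` matrices: two training runs can reach similar reconstruction error with different dictionaries, which is the identifiability question the paper studies.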

2605.31244 2026-06-01 cs.LG physics.comp-ph 版本更新

Spectral Reach: Understanding Neural Scaling as Progress into the Spectral Tail

谱范围:理解神经缩放作为进入谱尾的进展

Konstantin Nikolaou, Jonas Scheunemann, Sven Krippendorf, Samuel Tovey, Christian Holm

发表机构 * Institute for Computational Physics, University of Stuttgart, Stuttgart, Germany(斯图加特大学计算物理研究所) Department of Applied Mathematics(应用数学系) Department of Physics, University of Cambridge, Cambridge, UK(剑桥大学物理系)

AI总结 本文提出“谱位置”度量,通过经验神经正切核的特征值分析神经缩放定律,发现大模型通过“谱范围”进入更深的谱尾从而降低损失,并指出特征学习是关键机制。

详情
AI中文摘要

神经缩放定律描述了模型大小、数据集大小、计算量与性能之间可预测的幂律关系。尽管这些定律指导了现代基础模型的发展,但其背后的机制仍知之甚少,部分原因是缺乏可扩展的分析工具。为弥补这一差距,我们引入了“谱位置”:一种可扩展的度量,用于衡量经验神经正切核(eNTK)的哪些特征值当前驱动损失降低。将该度量应用于缩放实验,我们发现谱位置在整个训练过程中下降:学习从主导特征模式转移到谱尾。大模型比小模型更深入地进入谱尾,揭示了一种我们称之为“谱范围”的与大小相关的能力。这解释了为什么大模型能达到更低的损失:它们能持续学习小模型无法访问的弱谱信号。我们进一步确定特征学习是谱范围的关键促成因素。它自适应地放大梯度幅度,使学习在冻结表示停滞的地方持续进展。这指向了通过架构和优化器设计的具体干预措施。

英文摘要

Neural scaling laws describe predictable power-law relationships between model size, dataset size, compute, and performance. While these laws guide the development of modern foundation models, the mechanisms underpinning them remain poorly understood, in part due to the absence of scalable analysis tools. To close this gap, we introduce "spectral position": a scalable measure of which eigenvalues of the empirical neural tangent kernel (eNTK) currently drive loss reduction. Applying this measure to scaling experiments, we find that spectral position decreases throughout training: learning shifts from dominant eigenmodes into the spectral tail. Larger models reach further into the tail than smaller models, revealing a size-dependent capacity we call "spectral reach". This suggests why larger models achieve lower losses: they sustain learning on weak spectral signals inaccessible to smaller models. We further identify feature learning as a key enabler of spectral reach. It adaptively amplifies gradient magnitudes as learning advances, sustaining progress where frozen representations stall. This points to concrete interventions through architecture and optimizer design.

2605.31241 2026-06-01 cs.LG 版本更新

Bifurcated Remaining Useful Life Prediction: A Hybrid Approach for Realistic Uncertainty Characterization

分支剩余使用寿命预测:一种用于现实不确定性表征的混合方法

Xabier Belaunzaran, Antonio Nappa, Arkaitz Artetxe, Basilio Sierra

发表机构 * Fundación Vicomtech Basque Research and Technology Alliance (BRTA)(维克森科技巴斯克研究与技术联盟) University of the Basque Country (UPV/EHU)(巴斯克国家大学)

AI总结 提出一种混合预测框架,通过将涡扇发动机寿命分为健康与退化阶段,结合LSTM自编码器、条件威布尔生存分析和概率神经网络,实现不确定性感知的剩余使用寿命预测。

Comments Submitted to 9th European Conference of the Prognostics and Health Management Society 2026

详情
AI中文摘要

本研究提出了一种新颖的混合预测框架,用于使用NASA C-MAPSS数据集对涡扇发动机进行不确定性感知的剩余使用寿命(RUL)估计。该框架采用状态感知策略,将发动机的运行寿命分为“健康”和“退化”两个阶段。一个基于LSTM的自编码器,仅在标称数据(RUL > 150个循环)上训练,通过监测重构误差作为鲁棒的状态分类器。对于健康阶段,使用条件威布尔生存分析进行平均剩余寿命估计。对于退化阶段,使用带有蒙特卡洛丢弃法的概率神经网络捕获偶然和认知不确定性。不使用严格的二元标签,而是使用校准的sigmoid函数将自编码器的输出转换为连续状态概率,动态加权最终集成预测。该框架的主要优势在于生成物理一致的不确定性带,在寿命末期提供高置信度预测,同时准确反映早期运行的内在方差,为风险知情维护提供鲁棒工具。

英文摘要

This study presents a novel hybrid prognostic framework for uncertainty-aware Remaining Useful Life (RUL) estimation in turbofan engines using the NASA C-MAPSS dataset. The framework employs a state-aware strategy that bifurcates the engine's operational lifespan into "healthy" and "degraded" regimes. An LSTM-based autoencoder, trained strictly on nominal data (RUL > 150 cycles), monitors reconstruction error to act as a robust state classifier. For the healthy regime, a Conditional Weibull Survival Analysis is used for Mean Residual Life estimation. For the degraded regime, a Probabilistic Neural Network with Monte Carlo Dropout captures both aleatoric and epistemic uncertainties. Rather than using rigid binary labels, a calibrated sigmoid function converts the autoencoder's output into continuous state probabilities, dynamically weighting the final ensemble prediction. The primary strength of this framework is its generation of physically consistent uncertainty bands, yielding high-confidence predictions near end-of-life while accurately reflecting the inherent variance of early operation, providing a robust tool for risk-informed maintenance.
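
The soft gating step can be sketched as follows. This is a minimal illustration that assumes the sigmoid takes the autoencoder's reconstruction error as input and that the two regime predictions are combined by a convex blend; the `midpoint` and `steepness` parameters and both function names are hypothetical, since the abstract does not give the calibration details.

```python
import math

def state_probability(recon_error, midpoint, steepness):
    # Probability of the degraded regime as a sigmoid of the reconstruction error.
    return 1.0 / (1.0 + math.exp(-steepness * (recon_error - midpoint)))

def blended_rul(recon_error, rul_healthy, rul_degraded, midpoint=1.0, steepness=5.0):
    # Convex combination of the two regime-specific RUL predictions.
    p_degraded = state_probability(recon_error, midpoint, steepness)
    return (1.0 - p_degraded) * rul_healthy + p_degraded * rul_degraded

# Illustrative values: the healthy model predicts 200 cycles, the degraded model 40.
early = blended_rul(0.1, rul_healthy=200.0, rul_degraded=40.0)
boundary = blended_rul(1.0, rul_healthy=200.0, rul_degraded=40.0)
late = blended_rul(2.5, rul_healthy=200.0, rul_degraded=40.0)
```

A soft gate of this kind avoids the discontinuity a hard threshold would introduce when an engine crosses from one regime to the other.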

2605.31239 2026-06-01 stat.ML cs.AI cs.LG 版本更新

Correcting Split Selection in Online Decision Trees via Anytime-Valid Inference

通过随时有效推断纠正在线决策树中的分裂选择

Salim I. Amoukou, Saumitra Mishra, Manuela Veloso

发表机构 * J.P. Morgan AI Research(摩根大通AI研究)

AI总结 针对在线决策树分裂选择缺乏有效统计保证的问题,提出基于随时有效推断的方法,实现任意数据流下错误分裂的随时有效控制、预测优势下的有限承诺时间,并在平稳独立同分布数据下保证风险单调递减且每次分裂严格改善。

Comments Accepted as a Spotlight at the Forty-Third International Conference on Machine Learning (ICML 2026)

详情
AI中文摘要

基于装袋的集成方法,尤其是自适应随机森林,是数据流学习中最强的表现者之一。这些方法的共同点是依赖霍夫丁树作为基学习器,通过使用浓度不等式测试候选分裂是否显著优于其替代方案来增量式地构建决策树。尽管经验成功,现有变体缺乏有效的统计保证。当前分析依赖于固定样本浓度界,而分裂决策使用数据依赖的停止规则,这使其保证无效,并可能将错误分裂的概率推向1。我们引入了一种基于随时有效推断的原则性替代方案。我们的方法提供:(i) 在任意数据流(包括非平稳设置)下对错误分裂的随时有效控制;(ii) 在预测优势下的有限承诺时间;(iii) 在平稳独立同分布数据下,风险单调递减且每次分裂严格改善。在经验上,我们评估了独立树及其在非平稳流中在自适应随机森林中的使用。我们的方法提高了性能,同时生成了更小的树。

英文摘要

Bagging-based ensembles, most notably Adaptive Random Forests, are among the strongest performers for learning from data streams. A common denominator across these methods is their reliance on Hoeffding Trees as base learners, which grow decision trees incrementally by testing whether a candidate split is significantly better than its alternatives using concentration inequalities. Despite their empirical success, existing variants lack valid statistical guarantees. Current analyses rely on fixed-sample concentration bounds, while split decisions are made using data-dependent stopping rules, which invalidates their guarantees and can drive the probability of incorrect splits to one. We introduce a principled alternative based on anytime-valid inference. Our method provides: (i) anytime-valid control of false splits under arbitrary data streams, including non-stationary settings; (ii) finite commitment time under a predictive advantage; and (iii) under stationary i.i.d. data, risk is monotone decreasing and strictly improves at every split. Empirically, we evaluate both standalone trees and their use within Adaptive Random Forests on non-stationary streams. Our method improves performance while producing substantially smaller trees.
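
The gap between a fixed-sample bound and a bound that holds at every sample size can be illustrated with a short sketch. `hoeffding_radius` is the classical fixed-n Hoeffding radius used in Hoeffding Trees. `anytime_radius` is a deliberately crude time-uniform alternative built from a union bound with per-step budget δ/(n(n+1)), which sums to δ over all n. It is an assumption chosen for illustration and is not the paper's anytime-valid construction.

```python
import math

def hoeffding_radius(n, delta, value_range=1.0):
    # Fixed-sample Hoeffding radius: valid only for an n chosen before seeing data.
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def anytime_radius(n, delta, value_range=1.0):
    # Union bound over all n with budget delta / (n * (n + 1)); the budgets sum to delta.
    delta_n = delta / (n * (n + 1))
    return hoeffding_radius(n, delta_n, value_range)

def should_split(gain_gap, n, delta, radius_fn):
    # Split when the observed gap between the best two candidates exceeds the radius.
    return gain_gap > radius_fn(n, delta)

# Same evidence, two different decisions (illustrative numbers).
fixed_decision = should_split(0.1, n=500, delta=0.05, radius_fn=hoeffding_radius)
anytime_decision = should_split(0.1, n=500, delta=0.05, radius_fn=anytime_radius)
```

With these numbers the fixed-sample test splits while the time-uniform test waits for more data, which shows why checking a fixed-sample bound repeatedly under a data-dependent stopping rule overstates the evidence.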

2605.31238 2026-06-01 cs.CL cs.LG 版本更新

Scaling Multi-Hop Training Data via Graph-Constrained Path Selection

通过图约束路径选择扩展多跳训练数据

Pengyu Chen, Yonggang Zhang, Mingming Chen, Jun Song, Wei Xue, Yike Guo

发表机构 * The Hong Kong University of Science and Technology(香港科技大学) Hong Kong Generative AI Research and Development Center(香港生成式人工智能研发中心) Hong Kong Baptist University(香港浸会大学)

AI总结 针对专业文档的组成推理,提出基于图约束路径选择的方法,通过解耦路径发现与语言化,利用图约束过滤无效路径,显著扩展可用语料并提升模型性能。

Comments 21 pages, 5 figures

详情
AI中文摘要

赋予大型语言模型对专业文档的组成推理能力需要大规模的多跳训练数据,而此类数据除了基于结构化来源的精心策划基准外很少存在。为了直接从纯文本、无标注文本中构建此类数据,现有方法要求单个教师模型联合发现文档中的证据路径并将其表述为问答对。然而,当文档围绕重复模板和密集交叉引用子句(这是大多数真实世界专业语料库的特征)构建时,这些方法会严重退化。在这项工作中,我们将这两个操作解耦:推理路径在上下文关键词质心的图上离线枚举,教师模型仅用于将预验证的路径语言化。该图强制执行五个几何可接受性约束,我们提供了Gram矩阵论证,表明仅局部相似性边界允许端点漂移高达约91°,并且需要上相似性边界才能退出由模板文本形成的密集嵌入团。一项匹配规模的消融实验揭示了其机制:在相等的训练规模下,约束链和无约束链产生无法区分的下游性能,而全规模下的增益来自可用语料库的4.4倍扩展,而非更高的每条链质量——这重新定义了图约束在此设置中的作用:提高教师可合成性而非改进链内容。在从CUAD法律合同语料库构建的80K示例上微调Qwen3-32B,将闭卷Token F1从21.66%提高到38.58%。我们已在https://github.com/hkgai-official/GCSCS发布代码。

英文摘要

Endowing large language models with compositional reasoning over specialized documents requires multi-hop training data at scale, yet such data rarely exists outside of curated benchmarks built on structured sources. To construct it directly from plain, unannotated text, existing methods ask a single teacher model to jointly discover an evidence path through a document and verbalize it as a question-answer pair. However, these methods degrade sharply when documents are structured around repetitive templates and densely cross-referencing clauses, conditions that characterize most real-world specialized corpora. In this work, we decouple the two operations: reasoning paths are enumerated offline over a graph of contextual keyword centroids, and the teacher is invoked only to verbalize pre-validated paths. The graph enforces five geometric admissibility constraints, for which we provide Gram-matrix arguments establishing that local similarity bounds alone admit endpoint drift up to approximately 91°, and that an upper similarity bound is necessary to exit dense embedding cliques formed by boilerplate text. A matched-size ablation isolates the mechanism: at equal training scale, constrained and unconstrained chains yield indistinguishable downstream performance, and the gain at full scale comes from a 4.4× expansion of the usable corpus rather than from higher per-chain quality. This reframes the role of graph constraints, in this setting, as raising teacher synthesizability rather than improving chain content. Fine-tuning Qwen3-32B on 80K examples constructed from the CUAD legal contract corpus improves closed-book Token F1 from 21.66% to 38.58%. We have released our code at https://github.com/hkgai-official/GCSCS.

2605.31231 2026-06-01 math.NA cs.LG cs.NA 版本更新

A holomorphic neural network framework for 3D boundary value problems governed by harmonic potentials

基于全纯神经网络的调和势控制的三维边值问题框架

Enrico Ballini, Allan Peter Engsig-Karup, Tito Andriollo

发表机构 * Department of Mechanical and Production Engineering, Aarhus University(奥胡斯大学机械与生产工程系) Department of Applied Mathematics and Computer Science, Technical University of Denmark(丹麦技术大学应用数学与计算机科学系)

AI总结 提出一种基于Whittaker积分公式和全纯神经网络的框架,通过构造精确满足偏微分方程的神经网络求解三维调和势边值问题,仅需边界配点训练,在拉普拉斯和线弹性问题中验证了精度。

详情
AI中文摘要

我们提出了一种基于神经网络的框架,用于求解解可表示为调和势的三维边值问题。该方法利用Whittaker积分公式,通过关于合适复变量的全纯函数来表示解。这些函数随后使用全纯神经网络进行逼近,从而保证全纯性要求。该公式的一个关键特征是,控制偏微分方程(PDE)通过构造精确满足。因此,与标准的物理信息神经网络相比,在域内部不需要PDE的残差最小化,训练完全基于边界配点。该方法针对三维拉普拉斯和线弹性问题进行了验证,在后一种情况下,位移和应力场通过Papkovich-Neuber势表示。数值结果表明,标量和矢量场均得到精确逼近,误差在整个域内保持可控。总体而言,该工作表明,将解析结构融入神经网络架构为三维边值问题的无网格逼近提供了一种自然且有效的框架,同时保留了控制方程的基本性质。

英文摘要

We present a neural-network-based framework for the solution of three-dimensional boundary value problems where the solution is expressible in terms of harmonic potentials. The approach leverages the Whittaker integral formula, which allows representing the solution through functions that are holomorphic with respect to a suitable complex variable. These functions are subsequently approximated using holomorphic neural networks, which guarantee fulfillment of the holomorphicity requirement. A key feature of the proposed formulation is that the governing partial differential equations (PDEs) are satisfied exactly by construction. Therefore, in contrast to standard physics-informed neural networks, no residual minimization of PDEs is required in the interior of the domain, and training is based exclusively on boundary collocation points. The method is validated against three-dimensional Laplace and linear elasticity problems, where, in the latter case, displacement and stress fields are expressed via the Papkovich-Neuber potentials. The numerical results show an accurate approximation of both scalar and vector fields, with errors remaining controlled throughout the domain. Overall, the work demonstrates that the incorporation of analytical structures into neural network architectures provides a natural and effective framework for the meshless approximation of three-dimensional boundary value problems while preserving the underlying properties of the governing equations.
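
Why the PDE holds by construction can be checked numerically. Whittaker's formula writes a harmonic function as u(x, y, z) = ∫₀^{2π} f(z + i x cos θ + i y sin θ, θ) dθ with f holomorphic in its first argument. The sketch below uses the fixed choice f(w, θ) = exp(w) in place of a trained holomorphic network, which is an assumption for illustration, and verifies with finite differences that the Laplacian of u is close to zero.

```python
import cmath
import math

def whittaker_potential(x, y, z, f, n_theta=64):
    # Uniform-grid quadrature of Whittaker's integral over theta in [0, 2*pi).
    total = 0.0 + 0.0j
    for k in range(n_theta):
        theta = 2.0 * math.pi * k / n_theta
        w = z + 1j * (x * math.cos(theta) + y * math.sin(theta))
        total += f(w, theta)
    return (total * 2.0 * math.pi / n_theta).real

def holomorphic_f(w, theta):
    # Stand-in for a holomorphic network: any function holomorphic in w works.
    return cmath.exp(w)

def u(x, y, z):
    return whittaker_potential(x, y, z, holomorphic_f)

def laplacian(func, x, y, z, h=1e-3):
    # Second-order central finite-difference Laplacian.
    center = func(x, y, z)
    neighbors = (func(x + h, y, z) + func(x - h, y, z)
                 + func(x, y + h, z) + func(x, y - h, z)
                 + func(x, y, z + h) + func(x, y, z - h))
    return (neighbors - 6.0 * center) / h ** 2

residual = laplacian(u, 0.3, 0.2, 0.1)
```

Each term of the quadrature sum is itself harmonic, because the second derivatives in x, y, and z contribute factors of -cos²θ, -sin²θ, and 1 that cancel. The residual therefore reflects only finite-difference error, which is consistent with training needing boundary data alone.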

2605.31228 2026-06-01 cs.LG cs.AI 版本更新

EchoRL: Reinforcement Learning via Rollout Echoing

EchoRL:通过回滚回响进行强化学习

Jinhe Bi, Aniri, Minglai Yang, Xingcheng Zhou, Wenke Huang, Sikuan Yan, Yujun Wang, Zixuan Cao, Michael Färber, Xun Xiao, Volker Tresp, Yunpu Ma

发表机构 * Munich Center for Machine Learning(慕尼黑机器学习中心) Huawei Heisenberg Research Center(华为海森堡研究中心) University of Arizona(亚利桑那大学) College of Computing and Data Science, Nanyang Technological University, Singapore(新加坡南洋理工大学计算与数据科学学院) MemAgents Lab(MemAgents实验室)

AI总结 针对RLVR训练中优势退化问题,提出EchoRL模块,通过从成功回滚中提取EchoClip作为辅助监督信号,持续提升训练性能。

Comments ICML 2026

详情
AI中文摘要

基于可验证奖励的强化学习是增强大语言模型推理能力的有效后训练方法。然而,随着训练进行,学习信号可能崩溃,导致训练收益变得微弱且无效。具体而言,越来越多的提示回滚出现优势退化:所有自生成回滚均显示验证成功,使得其奖励的标准差为零;相应地,每个回滚的优势也退化为零。由于这些回滚的优势为零,用于模型优化的策略梯度最终消失,限制了训练性能。我们认为,其中一些回滚仍然包含有价值的学习信号,但不幸被现有RLVR方法忽略。本文受外部专家模型生成的金色轨迹的熵模式分析启发,提出EchoRL以更好地利用优势退化的回滚来进一步提升训练性能。EchoRL是一个轻量级模块,首先根据逐步熵值从验证成功的回滚中识别出EchoClip,然后将该片段作为辅助监督信号反馈到RL目标中。在10个基准、5个LLM骨干网络和4种流行RLVR后训练方法上的大量实验表明,EchoRL能够以最小开销持续改进RLVR后训练。

英文摘要

Reinforcement Learning with Verifiable Rewards (RLVR) is an effective post-training route for strengthening the reasoning capability of large language models. However, as training proceeds, the learning signal can collapse, making the training gain marginal. Specifically, the rollouts of a growing fraction of prompts become advantage-degenerated: all the self-generated rollouts are verified as successful, so the standard deviation of their rewards is zero and each rollout's advantage degenerates to zero as well. With such advantages, the policy gradient used for model optimization eventually vanishes, capping the training performance. We argue that some of these rollouts still contain valuable learning signals that existing RLVR methods unfortunately discard. In this paper, inspired by an analysis of the entropy patterns behind golden trajectories produced by external expert models, we propose EchoRL to better exploit advantage-degenerated rollouts and further improve training performance. EchoRL is a lightweight module that first identifies an EchoClip from verified-success rollouts based on their step-level entropy values, and then feeds this clip back as an auxiliary supervision signal in the RL objective. Extensive experiments across 10 benchmarks, 5 LLM backbones, and 4 popular RLVR post-training methods demonstrate that EchoRL consistently improves RLVR post-training with minimal overhead.
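
The degeneracy described above is easy to reproduce. The sketch assumes group-normalized advantages (each reward minus the group mean, divided by the group standard deviation), as used in GRPO-style methods. The abstract does not name the estimator, so this choice and the function name are assumptions.

```python
import math

def group_advantages(rewards, eps=1e-8):
    # Normalize rewards within one prompt's group of rollouts.
    mean = sum(rewards) / len(rewards)
    variance = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(variance)
    return [(r - mean) / (std + eps) for r in rewards]

# Mixed outcomes give informative advantages; uniform success gives none.
mixed = group_advantages([1.0, 0.0, 1.0, 0.0])
solved = group_advantages([1.0, 1.0, 1.0, 1.0])
```

Since every advantage in `solved` is zero, a policy-gradient term weighted by these advantages contributes nothing for that prompt. This is the signal loss that an auxiliary objective on verified-success rollouts is meant to recover.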

2605.31226 2026-06-01 cs.LG cs.AI 版本更新

What changes after deployment? A survey on On-device Learning in TinyML

部署后发生了什么变化?TinyML中设备端学习综述

Massimo Pavan, Luca Pezzarossa, Fabrizio Pittorino, Manuel Roveri, Xenofon Fafoutis

发表机构 * Technical University of Denmark (DTU)(丹麦技术大学)

AI总结 本文针对微控制器级设备上的机器学习模型,系统综述了约70篇设备端学习(ODL)工作,基于分布变化类型分析其对应用、硬件和解决方案的影响,并指出方法论基准与现实部署之间的差距。

详情
AI中文摘要

微控制器级设备上的机器学习模型(TinyML)面临一个根本性挑战:部署后的分布变化会破坏静态模型。设备端学习(ODL)通过直接在设备上运行学习过程来解决这一问题。现有文献尚未描述分布变化如何发生,以及不同类型的变化需要不同的解决方案。本文基于分布变化类型这一原则,综述了约70篇ODL工作。调查分析了不同类型的分布变化如何影响可寻址的设备端应用、所使用的硬件以及解决方案的结构。还指出了方法论基准与现实部署场景之间持续存在的差距。

英文摘要

Machine learning models on microcontroller-class devices (TinyML) face a fundamental challenge: post-deployment distribution change undermines static models. On-device learning (ODL) addresses this by running the learning process directly on the device. The existing literature has not characterized how distribution change occurs or how different change types require different solutions. Approximately 70 ODL works are surveyed under one principle: the distribution change regime. The survey analyzes how different types of distribution change influence the applications addressable on-device, the hardware employed, and the structure of the solutions. A persistent gap between methodological benchmarks and real-world deployment scenarios is also identified.

2605.31222 2026-06-01 cs.LG 版本更新

Multivariate Distributional Reinforcement Learning Using Sliced Divergences

使用切片散度的多变量分布强化学习

Baptiste Debes, Tinne Tuytelaars

发表机构 * Department of Electrical Engineering (ESAT), KU Leuven, Leuven, Belgium(比利时鲁汶大学电气工程系(ESAT))

AI总结 提出SDRL方法,通过投影将一维散度扩展到多变量回报分布,并证明在标量折扣和一般矩阵折扣下的贝尔曼收缩性,支持多种散度并适用于标准单样本贝尔曼更新。

详情
AI中文摘要

分布强化学习(DRL)建模完整的回报分布而非期望,但将其扩展到多变量设置仍然具有挑战性。许多常见度量不能自然地推广到一维以上,或者失去计算可行性,并且多变量情况引入了额外的困难,例如一般矩阵折扣,对此没有可用的收缩结果。我们引入了切片分布强化学习(SDRL),它通过投影将可处理的一维散度提升到多变量回报分布。我们证明了在共享标量折扣下均匀切片的贝尔曼收缩,并引入了一种在一般密集折扣矩阵下具有收缩性的最大切片变体。SDRL支持广泛的基散度;我们分析了Wasserstein、Cramér和最大均值差异(MMD),并表征了哪些SDRL变体适用于分布强化学习中使用的标准单样本贝尔曼更新。我们在一个玩具链问题、一个基于网格世界的图像环境以及一组Atari游戏上评估了SDRL。

英文摘要

Distributional reinforcement learning (DRL) models the full return distribution rather than expectations, but extending it to multivariate settings remains challenging. Many common metrics do not naturally generalize beyond one dimension or lose computational tractability, and the multivariate case introduces additional difficulties such as general matrix discounting, for which no contraction results are available. We introduce Sliced Distributional Reinforcement Learning (SDRL), which lifts tractable one-dimensional divergences to multivariate return distributions via projections. We prove Bellman contraction for uniform slicing under shared scalar discounting, and introduce a maximum-slicing variant with contraction under general dense discount matrices. SDRL supports a broad class of base divergences; we analyze Wasserstein, Cramér, and Maximum Mean Discrepancy (MMD), and characterize which SDRL variants suit the standard single-sample Bellman update used in distributional RL. We evaluate SDRL on a toy chain problem and a gridworld image-based environment as well as a subset of Atari games.
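
The lifting of a one-dimensional divergence through projections can be sketched with the sliced Wasserstein-1 distance between two equally sized samples of multivariate returns. This is a generic Monte Carlo version with uniformly random unit directions. The function names, the number of slices, and the sample-based estimator are assumptions for illustration rather than the paper's implementation.

```python
import math
import random

def random_direction(dim, rng):
    # Uniform direction on the unit sphere via a normalized Gaussian vector.
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def wasserstein_1d(a, b):
    # W1 between two equal-size empirical distributions: match sorted samples.
    a, b = sorted(a), sorted(b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def sliced_wasserstein(xs, ys, n_slices=64, seed=0):
    # Average the 1-D distance over random projections of both samples.
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(n_slices):
        d = random_direction(dim, rng)
        px = [sum(c * w for c, w in zip(x, d)) for x in xs]
        py = [sum(c * w for c, w in zip(y, d)) for y in ys]
        total += wasserstein_1d(px, py)
    return total / n_slices

# Two-dimensional return samples; ys is xs shifted by (2, 2).
xs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ys = [[x[0] + 2.0, x[1] + 2.0] for x in xs]
same = sliced_wasserstein(xs, xs)
shifted = sliced_wasserstein(xs, ys)
```

Replacing the average over directions with a maximum gives a max-sliced variant, in the spirit of the maximum-slicing construction the abstract mentions for general discount matrices.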

2605.31220 2026-06-01 cs.CL cs.AI cs.LG 版本更新

Shared Doubt: Zero-shot Cross-Lingual Confidence Estimation for Language Models

共享疑虑:语言模型的零样本跨语言置信度估计

Athina Kyriakou, Dennis Ulmer, Ivan Titov

发表机构 * ILLC, University of Amsterdam(阿姆斯特丹大学ILLC) ILCC, University of Edinburgh(爱丁堡大学ILCC)

AI总结 研究多语言大语言模型是否编码共享的、可跨语言迁移的置信度特征,通过轻量级线性探针从中间表示直接预测答案正确性,实现零样本跨语言泛化,并发现置信度特征集中在中间层。

详情
AI中文摘要

置信度估计(CE),即量化模型预测的可靠性,在大语言模型(LLM)背景下引起了极大兴趣。然而,大多数研究集中在英语上,忽视了LLM使用的多语言现实,而许多CE方法会退化或需要跨语言重新训练。为了解决这一差距,我们研究了多语言LLM是否编码共享的、可跨语言迁移的置信度特征。我们使用一个轻量级线性探针,直接从中间表示预测答案正确性。经过单语言训练后,该探针在零样本情况下泛化到未见过的、类型多样的语言,无需目标语言监督。学习到的层权重和多次消融实验表明,置信度特征集中在各语言的中间层,表明存在共享的置信度子空间。虽然零样本跨语言性能取决于与源语言的相似性,但该探针无需任何重新训练即可提供强基线,并且与其他流行的置信度估计方法相比具有优势。

英文摘要

Confidence estimation (CE), i.e. quantifying the reliability of a model's prediction, has attracted great interest in the context of large language models (LLMs). However, most studies focus on English, ignoring the multilingual reality of LLM usage, while many CE methods degrade or require retraining across languages. To address this gap, we investigate whether multilingual LLMs encode shared, language-transferable confidence features. We use a lightweight linear probe that predicts answer correctness directly from intermediate representations. Trained monolingually, the probe generalizes zero-shot to unseen, typologically diverse languages without target-language supervision. Learned layer weights and multiple ablations reveal that confidence features concentrate in middle layers across languages, suggesting a shared confidence subspace. While zero-shot cross-lingual performance depends on similarity to the source language, the probe provides a strong baseline without any retraining and compares favorably to other popular confidence estimation methods.

2605.31215 2026-06-01 cs.LG cs.CV 版本更新

Fixed-Point Masked Generative Modeling

不动点掩码生成建模

Andrea Miele, Yiming Qin, Alba Carballo-Castro, Justin Deschenaux, Pascal Frossard

发表机构 * LTS4, EPFL(EPFL LTS4实验室)

AI总结 提出不动点掩码生成模型(FP-MGM),通过共享注意力层的不动点求解器实现自适应深度,并引入跨步一致性损失和三态重用(3SR)策略,在降低参数和训练成本的同时提升低预算掩码生成质量。

详情
AI中文摘要

掩码生成模型(MGM)支持并行解码并在多种模态上取得强性能,但每一步都需要全序列双向变换器,导致训练成本高且在低采样预算下质量下降。现有工作通过更好的采样器或更便宜的固定深度去噪器提升效率,但仍为每个精炼步骤分配固定量的去噪器计算。我们提出不动点掩码生成模型(FP-MGM),用共享注意力层上的不动点求解器替换部分去噪器,实现自适应深度且参数更少。为使其更有效地用于掩码生成,我们首先引入跨步一致性损失,对齐相邻去噪步骤的隐藏表示;其次,三态重用(3SR)通过分别处理未改变、仍掩码和新揭示的令牌,利用先前解热启动求解器。这些组件共同定义了我们的不动点掩码生成的完整训练到推理框架CoFRe。我们还表明,预训练的MGM可以通过短微调转换为FP-MGM,避免完全重新训练。跨模态,CoFRe改善了质量与成本的权衡。在OpenWebText上,与MDLM相比,CoFRe参数减少38.8%,训练时间减少11.5%,VRAM减少16.9%,同时在96个变换器块前向传播的预算下,生成困惑度从830.8提升到101.8。在ImageNette上,CoFRe训练时间减少48.6%,VRAM减少50.7%,并在所有测试的样本预算下改善FID。总体而言,CoFRe为更便宜的训练和更强的低预算掩码生成提供了一个实用框架。

英文摘要

Masked Generative Models (MGMs) enable parallel decoding and achieve strong performance across modalities, but require full-sequence bidirectional transformers at every step, making training costly and degrading quality under low sampling budgets. Existing work improves efficiency via better samplers or cheaper fixed-depth denoisers, but still allocates a fixed amount of denoiser computation to each refinement step. We introduce Fixed-Point Masked Generative Models (FP-MGMs), which replace part of the denoiser with a fixed-point solver over shared attention layers to enable adaptive depth with fewer parameters. To make this more effective for masked generation, we introduce, first, a cross-step consistency loss, which aligns hidden representations at neighboring denoising steps, and second, three-state reuse (3SR), which warm-starts the solver from the previous solution by treating unchanged, still-masked, and newly revealed tokens differently. Together, these components define CoFRe, our complete training-to-inference framework for fixed-point masked generation. We also show that pre-trained MGMs can be converted into FP-MGMs with short fine-tuning, avoiding full retraining. Across modalities, CoFRe improves the quality-cost trade-off. On OpenWebText, compared to MDLM, CoFRe reduces parameters by 38.8%, training time by 11.5%, and VRAM by 16.9%, while improving generative perplexity from 830.8 to 101.8 at a budget of 96 transformer-block forward passes. On ImageNette, CoFRe reduces training time by 48.6% and VRAM by 50.7%, while improving FID at all sample budgets tested. Overall, CoFRe offers a practical framework for cheaper training and stronger low-budget masked generation.

2605.31193 2026-06-01 cs.LG 版本更新

Geometry-based Schrödinger Bridges for Trustworthy Multimodal Fusion

基于几何的薛定谔桥用于可信多模态融合

Jiayu Xiong, Jing Wang, Qi Zhang, Wanlong Wang, Jun Xue

发表机构 * Department of Computer Science and Technology, Huaqiao University(华侨大学计算机科学与技术系) Xiamen Key Laboratory of Computer Vision and Pattern Recognition, Huaqiao University(华侨大学厦门市计算机视觉与模式识别重点实验室) Tongji University(同济大学) School of Cyber Science and Engineering, Wuhan University(武汉大学网络科学与工程学院)

AI总结 提出基于几何的多模态融合方法GMF,利用扩散薛定谔桥的初始速度平方作为独立于预测的可靠性信号,以提升对低质量数据的鲁棒性。

Comments ICML 2026 accepted paper

详情
AI中文摘要

现实世界的多模态系统必须对低质量数据具有鲁棒性,例如传感器噪声、不完整的多模态数据和冲突输入。然而,现有的可信融合方法依赖模型自身的预测置信度来判断数据质量,这造成了循环依赖:当模型自信但错误时,这些方法无法检测到错误。为了打破这一循环,我们提出了基于几何的多模态融合(GMF)。我们不依赖预测,而是通过测量输入在潜在空间中所需的传输校正量来评估可靠性。我们实现了带有整流流的扩散薛定谔桥传输,其中初始速度的平方提供了一个高效的学习校正分数。有效数据具有低的平方速度幅度,而噪声、不完整数据或冲突数据需要更强的传输校正。这种基于几何的可靠性信号充当独立判断,即使在分类器被欺骗时也能有效标记不可靠输入。大量实验表明,与基于置信度的基线相比,GMF显著提高了对严重传感器噪声和语义冲突的鲁棒性。

英文摘要

Real-world multimodal systems must be robust against low-quality data, such as sensor noise, incomplete multimodal data and conflicting inputs. However, existing trustworthy fusion methods rely on the model's own prediction confidence to judge data quality. This creates a circular dependency: when a model is confident but wrong, these methods fail to detect the error. To break this loop, we propose Geometry-based Multimodal Fusion (GMF). Instead of relying on predictions, we evaluate reliability by measuring how much transport correction the input needs in latent space. We implement Diffusion Schrödinger Bridge transport with Rectified Flow, where the squared initial velocity gives an efficient learned correction score. Valid data has low squared velocity magnitude, while noisy, incomplete data or conflicting data requires stronger transport correction. This geometry-based reliability signal acts as an independent judge, effectively flagging unreliable inputs even when the classifier is fooled. Extensive experiments demonstrate that GMF significantly improves robustness against severe sensor noise and semantic conflicts compared to confidence-based baselines.

2605.31191 2026-06-01 cs.LG cs.CV 版本更新

Student Capacity Moderates Knowledge Distillation Effectiveness: A Systematic Study Across ResNet Teacher-Student Pairs on CIFAR-10

学生容量调节知识蒸馏有效性:基于CIFAR-10上ResNet教师-学生对的系统研究

Umut Onur Yasar

发表机构 * GitHub

AI总结 通过ResNet教师-学生对在CIFAR-10上的图像分类实验,系统研究学生容量如何调节知识蒸馏(KD)的有效性,发现学生容量是蒸馏增益的关键调节因素,并指出实现正确性和输入分辨率感知架构的重要性。

Comments 9 pages, 2 figures, 5 tables. Code available at https://github.com/umutonuryasar/kd-capacity-gap

详情
AI中文摘要

我们研究了教师-学生容量关系如何调节基于ResNet的CIFAR-10图像分类中知识蒸馏(KD)的有效性。在三个教师-学生对(R50->R18、R34->R18和R50->R34)中,我们在受控、可重复的条件下(3个种子,全程报告均值±标准差)比较了Logit-KD和Feature-KD。我们报告三个主要发现。首先,学生容量是蒸馏增益的关键调节因素:即使教师-学生准确率差距相当,R34学生从KD中获得的收益也远大于R18学生,R50->R34 Feature-KD的最大增益为+0.30个百分点,而R34->R18 Feature-KD为+0.18个百分点,R34->R18 Logit-KD为+0.00个百分点。其次,实现的正确性对Feature-KD至关重要:一个排除了投影层的梯度裁剪错误抑制了Feature-KD的性能,并产生了与Logit-KD的误导性比较。修正后,Feature-KD在三个对中的两个上匹配或优于Logit-KD,在R50->R34上达到95.55%,基线为95.25%。第三,输入分辨率感知架构是有效蒸馏的先决条件:将ResNet主干修正为32x32输入使教师准确率提高超过5个百分点——比任何KD增益高出一个数量级。所有代码和结果可在github.com/umutonuryasar/kd-capacity-gap获取。

英文摘要

We investigate how teacher-student capacity relationships modulate knowledge distillation (KD) effectiveness in ResNet-based image classification on CIFAR-10. Across three teacher-student pairs -- R50->R18, R34->R18, and R50->R34 -- we compare Logit-KD and Feature-KD under controlled, reproducible conditions (3 seeds, mean+/-std reported throughout). We report three main findings. First, student capacity is a key moderating factor in distillation gain: R34 students benefit substantially more from KD than R18 students even when teacher-student accuracy gaps are comparable, with the strongest gain of +0.30pp observed for R50->R34 Feature-KD versus +0.18pp for R34->R18 Feature-KD and +0.00pp for R34->R18 Logit-KD. Second, implementation correctness critically affects Feature-KD: a gradient clipping bug that excluded projection layers suppressed Feature-KD performance and produced misleading comparisons with Logit-KD. After correction, Feature-KD matches or outperforms Logit-KD in two of three pairs, reaching 95.55% on R50->R34 against a baseline of 95.25%. Third, input-resolution-aware architecture is a prerequisite for effective distillation: correcting the ResNet stem for 32x32 inputs raises teacher accuracy by over 5pp -- an order of magnitude larger than any KD gain. All code and results are available at github.com/umutonuryasar/kd-capacity-gap.
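
The clipping issue described in the abstract can be pictured with a framework-free sketch of global-norm gradient clipping. This is an illustration under assumptions, not the repository's code: the parameter-group names and gradient values are made up, and a real training loop would normally call the deep learning framework's own clipping utility over all trainable parameters.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Scale every gradient in `grads` so that their joint L2 norm is at most max_norm."""
    total = math.sqrt(sum(g * g for values in grads.values() for g in values))
    scale = min(1.0, max_norm / (total + 1e-12))
    clipped = {name: [g * scale for g in values] for name, values in grads.items()}
    return clipped, total


# Hypothetical gradients for two parameter groups.
grads = {
    "student_backbone": [0.3, -0.4],
    "feature_projector": [6.0, 8.0],  # projection layer used only by Feature-KD
}

# Buggy variant: the projector is left out, so its large gradient is never clipped.
backbone_only, _ = clip_by_global_norm(
    {"student_backbone": grads["student_backbone"]}, max_norm=1.0
)

# Intended variant: every trainable group contributes to the norm and is rescaled.
clipped_all, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = math.sqrt(sum(g * g for v in clipped_all.values() for g in v))
print(norm_before, norm_after)
```

In the buggy variant the backbone gradient is already below the threshold and stays unchanged, while the projector update remains unbounded; in the intended variant the joint norm is brought down to the limit.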

2605.31189 2026-06-01 cs.LG 版本更新

FlagGAM: Rule-Based Generalized Additive Modeling for Explainable Tabular Prediction

FlagGAM:基于规则的可解释表格预测广义加性模型

Zijie Zhao, Roy E. Welsch

发表机构 * EECS Department, Massachusetts Institute of Technology, Cambridge, MA, USA(麻省理工学院电子工程与计算机科学系) Sloan School of Management, Massachusetts Institute of Technology, Cambridge, MA, USA(麻省理工学院斯隆管理学院)

AI总结 提出FlagGAM框架,通过规则定义的基函数分离特征级规则构建与预测,在保持可解释性的同时提升对不完美输入的鲁棒性。

详情
AI中文摘要

在高风险领域的表格预测中,需要准确、透明且对不完美输入鲁棒的模型。我们提出FlagGAM,一个规则定义的基函数框架,将特征级规则构建与预测分离。Flag核心模块将数值和分类变量转换为稀疏、可读的单变量基函数,包括阈值标志、类别级标志、尾部偏差基和分类阶跃函数;默认的加性头部随后将这些基函数组合为受限的GAM风格预测器。FlagGAM不是将触发的规则简化为紧凑的计数摘要,而是保留稀疏的规则基矩阵,支持混合类型分类和回归、特征特定权重以及可选的灵活预测头部。在表格基准测试中,默认FlagGAM在透明加性模式下接近EBM,在混合类型回归上显著优于岭回归,并在缺失和噪声扰动下显示出比常见基线更小的AUROC下降。灵活头部进一步提高了准确性,接近强树基线,但需要注意,所得模型应解释为规则基表示后接非线性预测器,而非完全加性GAM。总体而言,FlagGAM为需要竞争性准确性、可传达规则和对不完美输入鲁棒性的表格设置提供了实用的中间地带。

英文摘要

Tabular prediction in high-stakes domains requires models that are accurate, transparent, and robust to imperfect inputs. We propose FlagGAM, a rule-defined basis framework that separates feature-level rule construction from prediction. A Flag Core Module converts numerical and categorical variables into sparse, human-readable univariate bases, including threshold flags, category-level flags, tail-deviation bases, and categorical step functions; a default additive head then combines these bases as a restricted GAM-style predictor. Rather than reducing triggered rules to compact count summaries, FlagGAM retains a sparse rule-basis matrix that supports mixed-type classification and regression, feature-specific weighting, and optional flexible prediction heads. Across tabular benchmarks, default FlagGAM remains close to EBM in transparent additive mode, improves substantially over ridge regression on mixed-type regression, and shows smaller AUROC degradation than common baselines under missing and noisy perturbations. Flexible heads further improve accuracy and approach strong tree-based baselines, with the caveat that the resulting model should be interpreted as a rule-basis representation followed by a nonlinear predictor rather than as a fully additive GAM. Overall, FlagGAM provides a practical middle ground for tabular settings that require competitive accuracy, communicable rules, and robustness to imperfect inputs.

2605.31187 2026-06-01 cs.CV cs.LG 版本更新

From Local Geometry to Global Pseudo Labeling for Robust Positive Unlabeled Learning under Covariate Shift

从局部几何到全局伪标注:协变量偏移下鲁棒的正无标记学习

Firas Gabetni, Alexandre Rocchi Henry, Nacim Belkhir, Ziyi Liu, Gianni Franchi

发表机构 * U2IS, ENSTA(U2IS,ENSTA) Institut Polytechnique de Paris(巴黎理工学院) AMIAD, Pôle Recherche, Palaiseau(AMIAD研究部,帕莱索)

AI总结 提出SPUNA框架,利用局部流形结构逐步发现偏移数据,在协变量偏移下实现正无标记学习,性能达到全监督方法水平。

详情
AI中文摘要

检测协变量偏移对于构建可靠的视觉系统至关重要。虽然大多数先前工作专注于提高对偏移的鲁棒性,但显式检测协变量偏移仍未被充分探索。现有方法通常依赖于全监督训练,需要来自原始分布和偏移分布的有标签样本,这往往不切实际。在本文中,我们表明协变量偏移检测可以通过使用正无标记(PU)学习的弱监督有效解决。然而,在协变量偏移下,分布内数据和偏移数据显著重叠,使得经典PU方法不稳定且对噪声敏感。为克服这一挑战,我们引入了谱PU邻域标注(SPUNA),这是一种几何感知框架,通过利用视觉特征的局部流形结构逐步发现偏移数据。大量实验表明,SPUNA在PU设置中实现了最先进的性能,并且显著匹配了全监督方法的性能。此外,我们的方法在不同类型的偏移之间鲁棒地迁移,展示了强大的泛化能力。

英文摘要

Detecting covariate shift is critical for building reliable vision systems. While most prior work focuses on improving robustness to shift, explicitly detecting covariate shift remains underexplored. Existing approaches typically rely on fully supervised training, requiring labeled examples from both the original and shifted distributions, which is often impractical. In this paper, we show that covariate shift detection can be effectively addressed with weaker supervision using Positive Unlabeled (PU) learning. However, under covariate shift, in-distribution and shifted data overlap significantly, making classical PU methods unstable and sensitive to noise. To overcome this challenge, we introduce Spectral PU Neighborhood Annotation (SPUNA), a geometry-aware framework that progressively discovers shifted data by leveraging the local manifold structure of visual features. Extensive experiments show that SPUNA achieves state-of-the-art performance in PU settings and matches the performance of fully supervised methods. Moreover, our approach transfers robustly across different types of shifts, demonstrating strong generalization capabilities.

2605.31186 2026-06-01 cs.LG 版本更新

How well does Classification Accuracy capture Concept Drift Detection Quality? An overview of Concept Drift Detection evaluation

分类精度在多大程度上捕捉概念漂移检测质量?概念漂移检测评估综述

Joanna Komorniczak

发表机构 * Department of Systems and Computer Networks(系统与计算机网络系)

AI总结 本文综述了概念漂移检测质量度量与分类性能之间的关系,通过七种合成数据流工具研究八种漂移检测质量度量,旨在确定最具信息量的度量集。

详情
AI中文摘要

数据流是当今最常分析的数据结构之一,概念漂移对处理系统构成了重大挑战。尽管提出了许多解决方案来应对概念漂移导致的精度下降,但科学界尚未建立统一的概念漂移检测评估框架。现有研究通常依赖分类质量度量,但这些度量可能受多种因素影响,无法可靠反映漂移检测质量。本文深入概述了合成非平稳数据流中漂移检测质量度量与分类性能之间的关系。研究通过七种合成数据流生成工具,考察了八种漂移检测质量度量与分类器性能的关系,并额外考虑了漂移动态因素。研究旨在识别最具信息量的漂移检测质量度量集,并提供对方法评估的深入理解。

英文摘要

Data streams are among the most frequently analyzed data structures today, and concept drift poses a major challenge for the systems that process them. Although numerous solutions have been proposed to counteract the accuracy degradation caused by concept drift, the scientific community has not yet established a unified framework for evaluating the concept drift detection task. Existing research often relies on classification quality metrics, but these can be affected by multiple factors and may not reliably reflect drift detection quality. In this work, we present an in-depth overview of the relationship between drift detection quality metrics and classification performance in synthetic nonstationary data streams. The study examines eight drift detection quality metrics in relation to classifier performance across seven synthetic data stream generation tools, additionally considering drift dynamics as a factor. It aims to identify the most informative set of drift detection quality metrics and to provide a deeper understanding of how detection methods are evaluated.

2605.31183 2026-06-01 cs.CL cs.AI cs.LG 版本更新

Steering LLMs? Actually, Sparse Autoencoders can outperform simple baselines

引导LLM?实际上,稀疏自编码器可以胜过简单基线

Mikkel Godsk Jørgensen, Lars Kai Hansen

发表机构 * DTU Compute(丹麦技术大学计算学院)

AI总结 本文通过监督流水线选择并标注特征,证明稀疏自编码器在模型引导任务上可接近LoRA性能,并发现高稀疏性对基于可解释性的引导并非关键。

详情
AI中文摘要

稀疏自编码器(SAEs)被视为探索大型语言模型(LLMs)内部机制和引导模型输出生成的有前途的途径。当Wu等人(2025)引入模型引导基准AxBench时,SAEs由于相对于一组简单基线的引导性能较差,似乎并未达到最初的期望。本文作为对稀疏自编码器的部分反驳,表明Wu等人(2025)的结果并未完全公正地评价它们。我们发现,当使用我们的监督流水线选择并标注特征时,稀疏自编码器实际上可以在AxBench基准上达到接近参考LoRA性能的水平。我们还发现,当仅使用基于可解释性的组件时,我们的流水线选择的特征与其识别标签具有令人惊讶的因果性。最后,我们提供证据表明,高稀疏性(低l0)可能对于基于可解释性的成功引导并非关键,这与Wang等人(2025)早期的发现相反。

英文摘要

Sparse Autoencoders (SAEs) have been seen as a promising avenue for exploring the internals of Large Language Models (LLMs) and for steering model output generation. When AxBench, a model steering benchmark, was introduced in Wu et al. (2025), SAEs did not seem to live up to their original hype due to poor steering performance relative to a set of simple baselines. This work serves as a partial rebuttal in defense of Sparse Autoencoders and suggests that the results of Wu et al. (2025) did not do them full justice. We find that Sparse Autoencoders can, in fact, perform nearly on par with the reference LoRA on the AxBench benchmark when features are selected and labelled with our supervised pipeline. We also find that our pipeline selects features that are surprisingly causal of their identified labels when using only its interpretability-based components. Lastly, we present evidence that high sparsity (low l0) may not be crucial for successful steering based on interpretability, which is in contrast to the earlier findings in Wang et al. (2025).

2605.31176 2026-06-01 cs.LG cs.DS 版本更新

Retriever Portfolios: A Principled Approach to Adaptive RAG

检索器组合:一种自适应RAG的原则性方法

Miltiadis Stouras, Vincent Cohen-Addad, Silvio Lattanzi, Ola Svensson

发表机构 * EPFL(瑞士联邦理工学院) Google Research(谷歌研究院)

AI总结 提出从大量候选检索器中自动选择小型多样子集(组合)的方法,通过期望最优k目标优化查询分布,实现自适应RAG,在多个QA基准上优于单检索器和朴素多检索器基线,并降低延迟和令牌成本。

Comments Accepted at ICML 2026. Code available at: https://github.com/mstou/retriever-portfolios

详情
AI中文摘要

检索增强生成(RAG)系统通常依赖单一检索器和一组超参数,尽管面临从简单事实性问题到复杂多跳推理的高度异构查询。我们提出一种方法,从大量候选检索器中自动选择一个小型、多样的子集(组合),以覆盖目标查询分布的不同区域。我们通过查询分布上的期望最优$k$目标形式化这一设置,并证明其存在一个具有近最优保证的高效组合构建算法。在多个QA基准上,我们学习的组合和路由管道在检索指标和答案质量上始终优于单检索器和朴素多检索器基线。此外,与推理时超参数调优方法相比,固定组合支持并行检索和LLM调用,在实现相当(有时更好)准确性的同时,显著降低延迟和令牌成本。

英文摘要

Retrieval-augmented generation (RAG) systems typically rely on a single retriever and a single set of hyperparameters, despite facing highly heterogeneous queries that range from simple factoid questions to complex multi-hop reasoning. We propose a method that automatically selects a small, diverse subset of retrievers (a portfolio) from a large pool of candidates, to cover different regions of the target query distribution. We formalize this setting via an expected best-of-$k$ objective over the query distribution and show that it admits an efficient portfolio construction algorithm with near-optimal guarantees. Across multiple QA benchmarks, our learned portfolios and router pipeline consistently outperform single-retriever and naive multi-retriever baselines on both retrieval metrics and answer quality. In addition, compared to inference-time hyperparameter tuning approaches, fixed portfolios enable parallel retrieval and LLM calls, achieving comparable (and sometimes better) accuracy with substantially lower latency and token cost.
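
A minimal sketch of the expected best-of-k objective, assuming a precomputed table of per-query retrieval scores. The abstract does not state which construction algorithm is used; the greedy selection below is a standard choice for this kind of monotone submodular (facility-location style) objective and is an assumption here, as are the retriever names and score values.

```python
def expected_best_of_k(scores, portfolio):
    """Mean over queries of the best score achieved by any retriever in the portfolio."""
    if not portfolio:
        return 0.0
    n_queries = len(next(iter(scores.values())))
    best_per_query = (max(scores[r][q] for r in portfolio) for q in range(n_queries))
    return sum(best_per_query) / n_queries


def greedy_portfolio(scores, k):
    """Add, k times, the retriever that most increases the best-of-k objective."""
    portfolio = []
    for _ in range(k):
        candidates = [r for r in scores if r not in portfolio]
        best = max(candidates, key=lambda r: expected_best_of_k(scores, portfolio + [r]))
        portfolio.append(best)
    return portfolio


# Hypothetical scores: rows are retrievers, columns are sampled queries.
scores = {
    "bm25":      [0.7, 0.6, 0.1, 0.1],
    "dense":     [0.8, 0.9, 0.2, 0.1],
    "multi_hop": [0.1, 0.2, 0.9, 0.9],
}

portfolio = greedy_portfolio(scores, k=2)
print(portfolio, expected_best_of_k(scores, portfolio))
```

The selected pair covers complementary regions of the query set: one retriever handles the first two queries well and the other handles the last two, which is the diversity the objective rewards.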

2605.31174 2026-06-01 cs.CV cs.LG 版本更新

Detect in Any Scene: An Agentic Framework for Object Detection with Experience-Aware Reasoning

任意场景检测:一种具有经验感知推理的目标检测智能体框架

Wenlun Zhang, Jun Yin, Kentaro Yoshioka

发表机构 * Keio University(Keio大学) Tsinghua University(清华大学)

AI总结 提出DetAS/DetAS-X智能体框架,利用多模态大语言模型自适应组合恢复模块和专用检测器,通过自进化经验积累实现经验感知推理,在六个基准上平均F1提升28.36%。

详情
AI中文摘要

现实场景中的目标检测由于图像退化多样和物体分布异质而仍然具有挑战性,这显著阻碍了现有检测器的泛化。传统方法,包括场景特定表示学习和端到端流水线设计,本质上受限于对预定义条件的依赖,缺乏对动态环境的适应性。本文提出DetAS,一种将目标检测表述为动态决策过程的智能体检测框架。DetAS不依赖静态流水线,而是利用多模态大语言模型(MLLM)作为中央智能体,通过从恢复模块和专用检测器的工具箱中选择来自适应地组合检测工作流。具体来说,DetAS包含两个关键组件:自适应图像恢复,动态决定是否以及如何增强图像以进行下游检测;以及多专家检测,集成多个领域专用检测器并通过实例级推理解决它们的预测。为了在细粒度条件下进一步提高决策质量,我们引入了自进化经验积累,并将框架扩展到DetAS-X,该框架从少量标注数据中积累节点级决策经验,并在推理过程中实现经验感知推理。这种机制使系统能够逐步优化其决策策略,并适应各种现实场景。在六个具有挑战性的基准上的大量实验表明,DetAS-X显著优于现有的基于MLLM的检测器,在F1分数上平均提高28.36%,在DarkFace上增益高达37.01%。这些结果展示了智能体检测的前景,并为其在复杂动态环境中的应用奠定了坚实基础。

英文摘要

Object detection in real-world scenarios remains challenging due to diverse image degradations and heterogeneous object distributions, which significantly hinder the generalization of existing detectors. Conventional approaches, including scene-specific representation learning and end-to-end pipeline design, are inherently limited by their reliance on predefined conditions and lack adaptability to dynamic environments. In this paper, we propose DetAS, an agentic detection framework that formulates object detection as a dynamic decision process. Instead of relying on static pipelines, DetAS leverages a Multimodal Large Language Model (MLLM) as a central agent to adaptively compose detection workflows by selecting from a toolbox of restoration modules and specialized detectors. Specifically, DetAS consists of two key components: Self-Adaptive Image Restoration, which dynamically determines whether and how to enhance images for downstream detection, and Multi-Expertise Detection, which integrates multiple domain-specialized detectors and resolves their predictions through instance-level reasoning. To further improve decision quality under fine-grained conditions, we introduce Self-Evolving Experience Harvesting and extend the framework to DetAS-X, which accumulates node-level decision experience from a small set of annotated data and enables experience-aware reasoning during inference. This mechanism allows the system to progressively refine its decision policy and adapt to diverse real-world scenarios. Extensive experiments on six challenging benchmarks demonstrate that DetAS-X significantly outperforms existing MLLM-based detectors, achieving an average improvement of 28.36% in F1 score, with up to 37.01% gain on DarkFace. These results demonstrate the promise of agentic detection and establish a solid foundation for its application in complex and dynamic environments.

2605.31172 2026-06-01 cs.LG stat.ML 版本更新

Convergence of Two-Timescale Markovian Stochastic Approximations with Applications in Reinforcement Learning

双时间尺度马尔可夫随机逼近的收敛性及其在强化学习中的应用

Vagul Mahadevan, Claire Chen, Shuze Daniel Liu, Shangtong Zhang

发表机构 * Department of Computer Science, University of Virginia, Charlottesville, VA, USA(弗吉尼亚大学计算机科学系) Data Science Lab, MIT, Cambridge, MA, USA(麻省理工学院数据科学实验室) Mitch Daniels School of Business, Purdue University, West Lafayette, IN, USA(普渡大学米奇·丹尼尔斯商学院) Division Office Physics, Math and Astronomy, California Institute of Technology, Pasadena, CA, USA(加州理工学院物理、数学与天文学部)

AI总结 本文研究双时间尺度随机逼近在马尔可夫噪声下的稳定性与收敛性,通过用慢时间尺度参数的运行最大值控制快时间尺度参数,首次证明了带资格迹的TDC在离策略线性函数逼近下的几乎必然收敛。

Comments ICML 2026

详情
AI中文摘要

本文研究双时间尺度随机逼近(SA)的收敛性,这是一类迭代算法,分别以快慢时间尺度更新两组参数。强化学习中双时间尺度SA的著名例子包括带梯度校正的时间差分学习(TDC)和演员-评论家方法。以往,双时间尺度SA的稳定性(即有界性)和收敛性仅在独立同分布噪声下建立。本文则在马尔可夫噪声下建立双时间尺度SA的稳定性和收敛性,这种设置更符合强化学习实际。值得注意的是,我们无需使用任何投影算子,且噪声无需位于紧集内。我们的关键技术新颖之处在于,用慢时间尺度参数的运行最大值来控制快时间尺度参数,而非像大多数先前工作那样使用当前慢时间尺度参数。作为一个关键应用,我们首次证明了带资格迹的TDC在离策略线性函数逼近下的几乎必然收敛。

英文摘要

This work studies the convergence of two-timescale stochastic approximations (SA), a class of iterative algorithms that update two sets of parameters in fast and slow timescales respectively. Notable examples of two-timescale SA in reinforcement learning (RL) include temporal difference learning with gradient correction (TDC) and actor-critic methods. Previously, the stability (i.e., boundedness) and convergence of two-timescale SA were only established under i.i.d. noise. This work instead establishes the stability and convergence of two-timescale SA under Markovian noise, a setup that is more realistic in RL. Notably, we do not need to use any projection operator and the noise does not need to live in a compact space. Our key technical novelty is to control the fast timescale parameter with the running max of the slow timescale parameter, instead of with the current slow timescale parameter, as most prior works do. As a key application, we establish the first almost sure convergence of TDC with eligibility traces under off-policy learning with linear function approximation.

2605.31163 2026-06-01 stat.ML cs.LG 版本更新

Memory by Design: Probabilistic Sequence Layers

记忆设计:概率序列层

Matthew Dowling, Hyungju Jeon, Cristina Savin, Il Memming Park

发表机构 * Champalimaud Research, Champalimaud Foundation, Portugal(葡萄牙尚帕利莫基金会尚帕利莫研究中心) Center for Neural Science, New York University, USA(美国纽约大学神经科学中心) Center for Data Science, New York University, USA(美国纽约大学数据科学中心) RyvivyR Inc., NY, USA(美国纽约RyvivyR公司)

AI总结 提出设计-模型框架,通过精确贝叶斯滤波推导高效循环序列映射,线性高斯实例中的贝叶斯层传播均值和协方差以跟踪不确定性,统一多种次二次递归,并提升鲁棒性和长上下文检索。

Comments Preprint, in submission

详情
AI中文摘要

我们引入了设计-模型框架:一种从关于记忆的显式假设中推导高效循环序列映射的方法。设计模型通过精确贝叶斯滤波将证据写入记忆;查询相关的读出产生一个预测分布,其均值即为层输出。在我们的线性高斯实例中,贝叶斯层同时传播均值和协方差:协方差跟踪存储关联的不确定性,引导写入朝向不确定方向,随着证据积累而衰减增益,并保留自信的记忆。同一框架统一了几种次二次递归。线性注意力、GLA和Mamba-2/SSD在一个设计模型下是精确滤波器,而DeltaNet及相关Delta-rule模型在另一个设计模型下作为协方差重置约简出现。恢复协方差为检索动力学提供了闭式预测,并经实验验证,在受控碰撞研究、学习关联回忆和Zoology MQAR基准上,改善了训练范围外的鲁棒性;将贝叶斯层蒸馏到预训练的340M Gated DeltaNet中,在匹配计算下提升了RULER长上下文检索性能。

英文摘要

We introduce the design-model framework: a way to derive efficient recurrent sequence maps from explicit assumptions about memory. A design model writes evidence into memory by exact Bayesian filtering; a query-dependent readout produces a predictive distribution whose mean is the layer output. In our linear-Gaussian instantiation, the Bayesian Layer propagates both a mean and a covariance: the covariance tracks uncertainty over stored associations, steering writes toward uncertain directions, attenuating gains as evidence accumulates, and preserving confident memories. The same framework unifies several sub-quadratic recurrences. Linear attention, GLA, and Mamba-2/SSD are exact filters under one design model, whereas DeltaNet and related Delta-rule models arise as covariance-reset reductions under another. Restoring the covariance yields closed-form predictions for retrieval dynamics, verified empirically, and improves robustness beyond the training regime across controlled collision studies, learned associative recall, and the Zoology MQAR benchmark; distilling Bayesian Layers into a pretrained 340M Gated DeltaNet improves RULER long-context retrieval at matched compute.
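
To make the role of the covariance concrete, the sketch below performs a textbook Kalman-style write into a small linear-Gaussian memory with scalar values. It is a simplified stand-in under stated assumptions, not the paper's Bayesian Layer: the function name `bayes_write`, the scalar-value setting, and the observation-noise variance are all hypothetical.

```python
def bayes_write(mean, cov, key, value, noise_var=0.1):
    """One exact Bayesian update for the model: value = key . w + Gaussian noise."""
    d = len(mean)
    pk = [sum(cov[i][j] * key[j] for j in range(d)) for i in range(d)]  # P k
    s = sum(key[i] * pk[i] for i in range(d)) + noise_var               # k^T P k + r
    gain = [pk[i] / s for i in range(d)]                                # Kalman gain
    residual = value - sum(key[i] * mean[i] for i in range(d))          # prediction error
    new_mean = [mean[i] + gain[i] * residual for i in range(d)]
    new_cov = [[cov[i][j] - gain[i] * pk[j] for j in range(d)] for i in range(d)]
    return new_mean, new_cov, gain


mean0 = [0.0, 0.0]
cov0 = [[1.0, 0.0], [0.0, 1.0]]  # prior: both directions equally uncertain
key = [1.0, 0.0]

# Write the same association twice.
mean1, cov1, gain1 = bayes_write(mean0, cov0, key, value=2.0)
mean2, cov2, gain2 = bayes_write(mean1, cov1, key, value=2.0)
print(gain1, gain2, cov2)
```

The second write uses a smaller gain because the covariance along the written direction has already shrunk, while the untouched direction keeps its prior uncertainty. If the covariance were reset to the identity before every write, the gain would reduce to key / (key . key + noise_var), which is one way to read the delta-rule connection mentioned in the abstract.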

2605.31159 2026-06-01 cs.LG cs.AI 版本更新

Trust-Region Behavior Blending for On-Policy Distillation

信任域行为混合用于在线策略蒸馏

Daniil Plyusov, Alexey Gorbatovski, Alexey Malakhov, Nikita Balagansky, Boris Shaposhnikov, Daria Korotyshova, Daniil Gavrilov

发表机构 * T-Tech

AI总结 提出信任域行为混合(TRB)预热方法,通过在学生中心的KL信任域内用最接近教师的行为策略替换早期学生策略,解决在线策略蒸馏中早期学生轨迹质量差的问题,在数学推理蒸馏中取得最佳平均性能。

详情
AI中文摘要

在线策略蒸馏(OPD)训练学生模型在其自身策略采样的前缀上进行学习,同时匹配更强的教师模型。这解决了离线蒸馏中的前缀不匹配问题,但早期的学生模型 rollout 仍然可能质量较差,导致教师监督应用于弱或低质量的前缀。我们提出信任域行为混合(TRB),一种预热方法,在学生中心的KL信任域内,用最接近教师的行为策略替换早期的 rollout 策略,同时保持每个前缀的反向KL OPD损失不变。KL预算逐渐退火至零,因此预热后训练恢复为纯学生 rollout。在两个数学推理蒸馏设置中,TRB在比较方法中取得了最强的平均性能。

英文摘要

On-policy distillation (OPD) trains a student on prefixes sampled from its own policy while matching a stronger teacher. This addresses the prefix mismatch of offline distillation, but early student rollouts can still be poor, placing teacher supervision on weak or low-quality prefixes. We propose Trust-Region behavior Blending (TRB), a warmup method that replaces the early rollout policy with the closest-to-teacher behavior policy inside a student-centered KL trust region, while keeping the per-prefix reverse-KL OPD loss unchanged. The KL budget is annealed to zero, so training returns to pure student rollouts after warmup. Across two math-reasoning distillation settings, TRB attains the strongest average among the compared methods.

2605.31156 2026-06-01 cs.LG 版本更新

TabCausal: Pretraining Across Causal Environments for Tabular Causal Discovery

TabCausal: 跨因果环境的表格因果发现预训练

Zi-Rong Li, Si-Yang Liu, Tian-Zuo Wang, Han-Jia Ye

发表机构 * Nanjing University(南京大学)

AI总结 提出TabCausal,一种通过动态任务构建策略在多样化因果环境中进行大规模预训练的因果发现基础模型,在合成和语义基准上优于现有方法。

详情
AI中文摘要

因果发现旨在从观测和干预数据中恢复有向因果关系,为机制理解和可靠决策提供基础。因果发现基础模型(CDFMs)试图通过将数据集直接映射到因果图(单次前向传播)来分摊该问题,避免每个数据集上的测试、搜索或优化。然而,现有的CDFMs仍然有限,常常无法一致地匹配强大的经典方法,我们发现关键瓶颈在于因果预训练任务的构建方式。基于这一观察,我们提出了TabCausal,一种数据驱动的CDFM,在多样化的图先验、结构机制、噪声模型、维度、样本量和干预机制上进行广泛的因果预训练。一种动态任务构建策略将这些因果环境组合成多样的发现任务,使得从观测和混合干预数据中实现更具迁移性的结构学习。在大规模合成基准上,TabCausal实现了比多种因果发现基线更好的宏观平均性能。为了进一步弥合抽象合成生成器与现实因果推理场景之间的差距,我们引入了一个协议引导且LLM审计的语义因果环境基准,其中基于领域的结构因果模型(SCMs)生成可解释的观测和干预数据集,用于分布外分析。在合成和语义环境中,TabCausal均展现出鲁棒的结构恢复能力,尤其是在干预证据下,凸显了广泛因果预训练作为可迁移摊销因果发现的关键要素。

英文摘要

Causal discovery aims to recover directed causal relations from observational and interventional data, providing a basis for mechanistic understanding and reliable decision-making. Causal discovery foundation models (CDFMs) seek to amortize this problem by mapping a dataset directly to a causal graph in a single forward pass, avoiding per-dataset testing, search, or optimization. However, existing CDFMs remain limited, often failing to consistently match strong classical methods, and we find that a key bottleneck is how causal pretraining tasks are constructed. Based on this observation, we propose TabCausal, a data-driven CDFM trained with broad causal pretraining over diverse graph priors, structural mechanisms, noise models, dimensions, sample sizes, and intervention regimes. A dynamic task construction strategy composes these causal environments into varied discovery tasks, enabling more transferable structural learning from observational and mixed-interventional data. On large-scale synthetic benchmarks, TabCausal achieves better macro-averaged performance than a diverse set of causal discovery baselines. To further bridge abstract synthetic generators and realistic causal reasoning scenarios, we introduce a protocol-guided and LLM-audited semantic causal environment benchmark, where domain-grounded SCMs generate interpretable observational and interventional datasets for out-of-distribution analysis. Across both synthetic and semantic environments, TabCausal demonstrates robust structure recovery, especially under interventional evidence, highlighting broad causal pretraining as a key ingredient for transferable amortized causal discovery.

2605.31155 2026-06-01 cs.LG 版本更新

Learning Hyperspherical Time-Frequency Representations for Time-Series Out-of-Distribution Detection

学习超球面时频表示用于时间序列分布外检测

Willian T. Lunardi, Samridha Shrestha, Martin Andreoni

发表机构 * Technology Innovation Institute(技术创新研究所) Khalifa University(哈利法大学)

AI总结 本文提出一种基于超球面嵌入的表示学习方法,通过von Mises-Fisher目标函数结合时频域编码器,实现时间序列的分布外检测,在UCR和UEA数据集上优于对比学习和后处理方法。

Comments 14 pages, 2 figures, 4 tables, accepted at IJCAI-ECAI 2026

详情
AI中文摘要

与视觉和语言领域相比,时间序列数据的分布外(OOD)检测仍然相对未被充分探索,对于如何利用监督时间序列表示在分布偏移下进行可靠检测,缺乏原则性的理解。本文将时间序列OOD检测形式化为具有超球面嵌入的表示学习,其中通过单位球面上的von Mises-Fisher(vMF)似然目标诱导类条件结构。学习到的表示通过特定领域的编码器结合输入信号的时域和频域视图,将它们整合到一个联合嵌入空间中进行OOD检测。检测使用基于距离的分数对学习到的嵌入进行评估,包括k近邻(k-NN)和马氏距离分数。我们在完整的UCR和UEA时间序列存档上,在跨数据集协议下大规模评估该方法。实验结果表明,在相同设置下,与强对比学习和后处理方法基线相比,k-NN和马氏距离评分均取得一致改进。代码可在https://github.com/tiiuae/hypertf-time-series-ood获取。

英文摘要

Out-of-distribution (OOD) detection for time-series data remains underexplored relative to vision and language, with limited principled understanding of how supervised time-series representations can be leveraged for reliable detection under distributional shifts. This work formulates time-series OOD detection as representation learning with hyperspherical embeddings, where class-conditional structure is induced by a von Mises-Fisher (vMF) likelihood-based objective on the unit sphere. The learned representation combines time- and frequency-domain views of the input signal via domain-specific encoders, integrating them into a joint embedding space for OOD detection. Detection uses distance-based scores over the learned embeddings, including k-nearest neighbors (k-NN) and Mahalanobis scores. We evaluate the approach at scale on the complete UCR and UEA time-series archives under a cross-dataset protocol. Empirical results show consistent improvements under both k-NN and Mahalanobis scoring over strong contrastive learning and post-hoc baselines in the same setting. Code is available at https://github.com/tiiuae/hypertf-time-series-ood.
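
The distance-based scoring step can be sketched as follows: embeddings are projected onto the unit sphere and a query is scored by its distance to the k-th nearest training embedding. This is a generic k-NN score under assumptions, not the authors' implementation; the two-dimensional vectors stand in for encoder outputs and are made up.

```python
import math


def l2_normalize(v):
    """Project a vector onto the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def knn_ood_score(query, train_embeddings, k=2):
    """Distance to the k-th nearest normalized training embedding (larger = more OOD)."""
    q = l2_normalize(query)
    dists = sorted(math.dist(q, l2_normalize(e)) for e in train_embeddings)
    return dists[k - 1]


# Hypothetical in-distribution embeddings clustered around one direction.
train = [[1.0, 0.1], [0.9, 0.2], [1.0, -0.1]]

score_in = knn_ood_score([1.0, 0.05], train)   # query near the training cluster
score_out = knn_ood_score([-1.0, 0.3], train)  # query pointing the opposite way
print(score_in, score_out)
```

A threshold on this score (for example, chosen on held-in validation data) would then separate in-distribution from OOD inputs; the threshold choice is outside the scope of this sketch.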

2605.31152 2026-06-01 stat.ML cs.LG cs.NA math.NA 版本更新

Approximation and learning of anisotropic and mixed smooth functions by deep ReLU neural networks

深度ReLU神经网络对各向异性和混合光滑函数的逼近与学习

Yunfei Yang, Jun Fan

发表机构 * School of Mathematics (Zhuhai) and Guangdong Province Key Laboratory of Computational Science(数学系(珠海)和广东省计算科学重点实验室) Department of Mathematics(数学系)

AI总结 本文研究深度ReLU神经网络对各向异性和混合光滑函数类的逼近率,并证明在平均光滑度条件下可达到接近最优的逼近速率。

详情
AI中文摘要

本文研究深度ReLU神经网络逼近和学习光滑函数的效率。当误差以$L^p([0,1]^d)$范数度量且逼近器为宽度$W$、深度$L$的网络时,近期工作已证明在Sobolev嵌入条件$s/d>1/q-1/p$下,对于Besov空间$\mathcal{B}^s_{q,r}([0,1]^d)$有超逼近率$\mathcal{O}((WL)^{-2s/d})$。为克服该速率中的维数灾难,我们将此结果推广到各向异性和混合光滑函数类。对于各向异性光滑度$\boldsymbol{s}=(s_1,\dots,s_d)$的各向异性Besov空间$\mathcal{B}^{\boldsymbol{s}}_{q,r}([0,1]^d)$,在嵌入条件$\tilde{s} > 1/q-1/p$下建立逼近率$\mathcal{O}((WL)^{-2\tilde{s}})$,其中平均光滑度$\tilde{s} = (\sum_{i=1}^d s_i^{-1})^{-1}$。对于混合光滑度$s>1/q-1/p$的混合光滑Besov空间$\mathcal{MB}^s_{q,r}([0,1]^d)$,我们证明逼近率$\mathcal{O}((WL)^{-2s})$(忽略对数因子)。利用这些结果,我们还推导了各向异性Besov函数复合的逼近界。作为应用,表明深度ReLU神经网络可在广泛光滑函数类上达到极小极大最优速率(忽略对数因子)。

英文摘要

This paper studies how efficiently deep ReLU neural networks can approximate and learn smooth functions. When the error is measured in $L^p([0,1]^d)$ norm and the approximator is a network with width $W$ and depth $L$, recent works have proven the super-approximation rate $\mathcal{O}((WL)^{-2s/d})$ for Besov space $\mathcal{B}^s_{q,r}([0,1]^d)$ under the Sobolev embedding condition $s/d>1/q-1/p$. In order to overcome the curse of dimensionality in this rate, we extend this result to anisotropic and mixed smooth function classes. We establish the approximation rate $\mathcal{O}((WL)^{-2\tilde{s}})$ for anisotropic Besov space $\mathcal{B}^{\boldsymbol{s}}_{q,r}([0,1]^d)$ with anisotropic smoothness $\boldsymbol{s}=(s_1,\dots,s_d)$ under the embedding condition $\tilde{s} > 1/q-1/p$, where the mean smoothness $\tilde{s} = (\sum_{i=1}^d s_i^{-1})^{-1}$. For mixed smooth Besov space $\mathcal{MB}^s_{q,r}([0,1]^d)$ with mixed smoothness $s>1/q-1/p$, we show that the approximation rate $\mathcal{O}((WL)^{-2s})$ holds up to logarithmic factors. Using these results, we also derive approximation bounds for the composition of anisotropic Besov functions. As an application, it is shown that deep ReLU neural networks can achieve minimax optimal rates up to logarithmic factors for a wide range of smooth function classes.
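
A short worked example may help read the mean smoothness $\tilde{s}$. The numbers below are chosen for illustration only and do not come from the paper; the comparison with an isotropic class uses the smallest directional smoothness as a simplifying assumption.

```latex
% Isotropic sanity check: if s_1 = \dots = s_d = s, then
\[
  \tilde{s} = \Big(\sum_{i=1}^{d} s^{-1}\Big)^{-1} = \Big(\frac{d}{s}\Big)^{-1} = \frac{s}{d},
  \qquad\text{so}\qquad
  \mathcal{O}\big((WL)^{-2\tilde{s}}\big) = \mathcal{O}\big((WL)^{-2s/d}\big),
\]
% which agrees with the isotropic rate quoted in the abstract.

% Anisotropic illustration with d = 3 and \boldsymbol{s} = (1, 4, 4):
\[
  \sum_{i=1}^{3} s_i^{-1} = 1 + \tfrac14 + \tfrac14 = \tfrac32,
  \qquad
  \tilde{s} = \tfrac23,
  \qquad
  \mathcal{O}\big((WL)^{-2\tilde{s}}\big) = \mathcal{O}\big((WL)^{-4/3}\big).
\]
% Treating the same function as isotropic with the worst-direction
% smoothness s = 1 would only give (WL)^{-2s/d} = (WL)^{-2/3}.
```

The gap between the two exponents shows how extra smoothness in some coordinates is rewarded by the anisotropic rate.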

2605.31145 2026-06-01 cs.CV cs.AI cs.LG 版本更新

FOCUS: Forcing In-Context Object Localization through Visual Support Constraints and Policy Optimization

FOCUS: 通过视觉支持约束和策略优化强制上下文目标定位

Mohammed Asad Karim, Vinay Kumar Verma

发表机构 * Amazon, Seattle, USA(亚马逊(美国西雅图))

AI总结 提出一种两阶段训练框架,通过优化支持框与查询图像间的上下文注意力并结合GRPO强化学习,实现无类别监督的类别无关上下文目标定位,7B模型性能超越72B模型。

Comments Accepted at ICML 2026. * Equal Contributions

详情
AI中文摘要

上下文定位(ICL)旨在通过查询图像中的少量支持示例定位目标对象,无需训练或参数更新即可即时操作。尽管视觉语言模型(VLM)快速发展,实现类别无关且基于视觉的ICL仍然是一个未解决的问题,尽管它对图像编辑、个性化视觉搜索和检索等应用至关重要。现有方法脆弱且依赖显式类别监督,这不仅限制了在具有未命名或实例特定对象的现实场景中的适用性,还引入了类别偏差,使预测偏向语义先验而非视觉证据。我们提出一个两阶段训练框架,在无类别监督的情况下显式优化支持边界框与查询图像之间的上下文注意力。我们进一步通过使用组相对策略优化(GRPO)的强化学习来细化定位,直接最小化定位误差。这种公式强制视觉对应优于语义先验,产生鲁棒的实例级定位。实验表明,使用我们的目标训练的7B参数模型优于高达72B参数的模型,证明了上下文感知定位目标可以超越单纯扩展规模。全面的消融实验验证了每个组件的贡献。

英文摘要

In-context localization (ICL) seeks to localize a target object specified by a small set of support examples in a query image, operating on the fly without training or parameter updates. Despite rapid advances in vision-language models (VLMs), achieving category-agnostic and visually grounded ICL remains an open problem, even though it is essential for applications such as image editing, personalized visual search, and retrieval. Existing methods are fragile and rely on explicit category supervision, which not only limits applicability in realistic settings with unnamed or instance-specific objects but also introduces category bias that steers predictions toward semantic priors rather than visual evidence. We introduce a two-stage training framework that explicitly optimizes in-context attention between support bounding boxes and query images without category supervision. We further refine localization via reinforcement learning using Group Relative Policy Optimization (GRPO) to directly minimize localization error. This formulation enforces visual correspondence over semantic priors, yielding robust instance-level localization. Empirically, a 7B-parameter model trained with our objectives outperforms models up to 72B parameters, demonstrating that context-aware localization objectives can surpass scaling alone. Comprehensive ablations validate the contribution of each component.
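
The reinforcement-learning stage can be pictured with the group-relative advantage used in GRPO, where each sampled prediction is scored against the other samples for the same query. Using box IoU as the reward is an assumption made here for illustration (the abstract only says the stage minimizes localization error), and the boxes are made up.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: reward standardized within its sampling group."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]


target = (0.0, 0.0, 10.0, 10.0)
# Hypothetical group of boxes sampled from the policy for one query image.
samples = [(0.0, 0.0, 10.0, 10.0), (5.0, 0.0, 15.0, 10.0), (20.0, 20.0, 30.0, 30.0)]

rewards = [iou(box, target) for box in samples]
advantages = group_relative_advantages(rewards)
print(rewards, advantages)
```

Samples that localize better than the group average receive positive advantages and are reinforced, while worse ones are pushed down, without needing a separate value model.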

2605.31129 2026-06-01 cs.LG 版本更新

Generalizing Multi-Scale Time-Series Modeling with a Single Operator

使用单一算子泛化多尺度时间序列建模

Cheonwoo Lee, Dooho Lee, Doyun Choi, Jaemin Yoo

发表机构 * School of Electrical Engineering, KAIST, Daejeon, Republic of Korea(韩国科学技术院电气工程学院) Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea(首尔国立大学计算机科学与工程系)

AI总结 提出SiGMA架构,通过可学习离散高斯核实现距离感知缩放,解决现有方法固定离散缩放的局限性,在长期和短期预测任务中均达到最优性能。

Comments Accepted at ICML 2026

详情
AI中文摘要

多尺度建模通过捕获多个分辨率的时间动态,已成为时间序列预测的有效设计原则。由于文献中尚未建立原则性基础,我们将现有的缩放方法统一为一个缩放算子族,揭示了现有方法的一个基本局限性:依赖固定和离散的缩放。为了解决这一局限性,我们提出了SiGMA(单一泛化多尺度架构),它通过基于尺度空间理论的可学习离散高斯(LDG)核实现距离感知缩放。我们在长期和短期预测基准上全面评估了SiGMA,与最先进的多尺度基线进行了比较。SiGMA在两项任务上均优于所有竞争对手,特别是在16个长期评估设置中,有13个达到了最佳性能。除了准确性,SiGMA在训练速度上比最强竞争对手提高了最多5.3倍,内存消耗降低了最多3.8倍。代码可在https://github.com/cheonwoolee/SiGMA获取。

英文摘要

Multi-scale modeling has emerged as an effective design principle for time-series forecasting by capturing temporal dynamics at multiple resolutions. As no principled foundation has been established in the literature, we unify existing scaling methods into a scaling operator family, revealing a fundamental limitation of existing approaches: reliance on fixed and discrete scaling. To address this limitation, we propose SiGMA (Single Generalized Multi-scale Architecture), which enables distance-aware scaling via the learnable discrete Gaussian (LDG) kernel grounded in scale-space theory. We evaluate SiGMA comprehensively on long- and short-term forecasting benchmarks against state-of-the-art multi-scale baselines. SiGMA outperforms all competitors on both tasks, especially achieving the best performance in 13 out of 16 long-term evaluation settings. Beyond accuracy, SiGMA significantly improves training speed by up to 5.3 times and reduces memory consumption by up to 3.8 times over the strongest competitors. Code is available at https://github.com/cheonwoolee/SiGMA.
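
As a rough illustration of a scale-space kernel, the sketch below builds the classical discrete Gaussian kernel T(n, t) = exp(-t) I_n(t), where I_n is the modified Bessel function. Whether SiGMA's LDG kernel uses exactly this form is an assumption; in a learnable variant the scale `t` would presumably be a trained parameter instead of a fixed one. The function names are illustrative.

```python
import math


def modified_bessel_i(n, t, terms=60):
    """Power series for the modified Bessel function I_n(t), integer n >= 0."""
    half = t / 2.0
    return sum(
        half ** (2 * k + n) / (math.factorial(k) * math.factorial(k + n))
        for k in range(terms)
    )


def discrete_gaussian_kernel(t, radius):
    """T(n, t) = exp(-t) * I_|n|(t) for n = -radius..radius, renormalized to sum to 1."""
    weights = [
        math.exp(-t) * modified_bessel_i(abs(n), t)
        for n in range(-radius, radius + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(series, t, radius=8):
    """Smooth a 1-D series at continuous scale t, clamping indices at the edges."""
    kernel = discrete_gaussian_kernel(t, radius)
    out = []
    for i in range(len(series)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), len(series) - 1)
            acc += w * series[idx]
        out.append(acc)
    return out
```

Because `t` is a real number, the amount of smoothing can vary continuously, in contrast to the fixed, discrete down-sampling factors the abstract describes as a limitation.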

2605.31127 2026-06-01 cs.LG cs.NA math.NA 版本更新

Scalable Bayesian Inference for Nonlinear Conservation Laws

非线性守恒律的可扩展贝叶斯推断

Tim Weiland, Philipp Hennig

发表机构 * Tübingen AI Center, University of Tübingen, Tübingen, Germany(图宾根人工智能中心,图宾根大学,德国图宾根)

AI总结 提出一种基于高斯过程先验的数值保守方法,用于非线性守恒律的不确定性量化,并通过稀疏近似技术实现大规模正反问题的高效求解。

Comments 27 pages, 13 figures, 3 tables

详情
AI中文摘要

非线性守恒律是科学和工程中许多最重要动力系统的核心。在实际应用中,此类系统常受到各种不确定性来源的影响,例如稀疏或有噪声的测量。推断感兴趣的物理量和场成为一个不适定问题,经典数值方法和现代深度学习方法都难以恰当处理。最近的工作将经典数值方法框架化为高斯过程先验下的贝叶斯推断,从而实现了对不确定性的物理感知处理。沿着这一思路,我们开发了一种新颖的数值保守方法,用于非线性守恒律的不确定性感知模拟。我们利用最近的稀疏近似技术,将规模扩展到大规模正问题和反问题。对于正问题模拟,我们继承了经典求解器的精度,同时提供了结构化的不确定性量化。在反问题上,我们在数秒内恢复非参数源场的后验——优于需要数分钟才能产生不太精确点估计的神经基线方法。

英文摘要

Nonlinear conservation laws are at the heart of many of the most important dynamical systems in science and engineering. In practical applications, such systems are often subject to various sources of uncertainty, e.g., due to sparse or noisy measurements. Inferring physical quantities and fields of interest then becomes an ill-posed problem which both classical numerical methods and modern deep learning-based methods struggle to treat appropriately. Recent work has framed classical numerical methods as Bayesian inference under Gaussian process priors, resulting in a physics-aware treatment of uncertainties. Following this line of work, we develop a novel numerically conservative method for uncertainty-aware simulations of nonlinear conservation laws. We use recent sparse approximation techniques to scale up to large-scale forward and inverse problems. For forward simulation, we inherit the accuracy of classical solvers while providing structured uncertainty quantification. On inverse problems, we recover posteriors over nonparametric source fields in seconds, outperforming neural baselines that take minutes to produce a less accurate point estimate.
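
To make "numerically conservative" concrete, here is a minimal sketch of a classical conservative finite-volume update (Lax-Friedrichs) for the inviscid Burgers equation with periodic boundaries. It is not the Gaussian-process method of the paper; it only illustrates the property the abstract refers to: every cell changes by the difference of its two interface fluxes, so the total mass is preserved up to rounding. The grid size and time step are arbitrary choices.

```python
import math


def burgers_flux(u):
    """Physical flux f(u) = u^2 / 2 of the inviscid Burgers equation."""
    return 0.5 * u * u


def lax_friedrichs_step(u, dx, dt):
    """One conservative finite-volume update with periodic boundaries."""
    n = len(u)
    # Numerical flux at the interface between cell i and cell i + 1.
    flux = []
    for i in range(n):
        left, right = u[i], u[(i + 1) % n]
        flux.append(
            0.5 * (burgers_flux(left) + burgers_flux(right))
            - 0.5 * dx / dt * (right - left)
        )
    # flux[i - 1] with i = 0 wraps around to the last interface (periodic domain).
    return [u[i] - dt / dx * (flux[i] - flux[i - 1]) for i in range(n)]


n = 100
dx = 1.0 / n
dt = 0.004  # satisfies the CFL condition: max|u| * dt / dx = 0.6 <= 1
u = [1.0 + 0.5 * math.sin(2 * math.pi * (i + 0.5) * dx) for i in range(n)]
mass_before = sum(u) * dx
for _ in range(50):
    u = lax_friedrichs_step(u, dx, dt)
mass_after = sum(u) * dx
```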

2605.31126 2026-06-01 cs.CL cs.AI cs.LG 版本更新

Not All Synthetic Data Is Yours to Learn From

并非所有合成数据都适合学习

Sina Alemohammad, Li Chen, Richard G. Baraniuk, Zhangyang Wang

发表机构 * ECE Department(电气与计算机工程系) Apple(苹果公司) The University of Texas at Austin(德克萨斯大学奥斯汀分校) Rice University(莱斯大学)

AI总结 研究无提示、无教师、无验证器、无奖励模型的自训练中,语言模型能否从自身生成的文本中学习,发现合成数据与学生之间的兼容性是关键,并揭示了能力与逐字记忆可分离的现象。

详情
AI中文摘要

语言模型能否从自身采样的纯文本中改进,无需提示、教师、验证器或奖励模型?可以,但仅当合成语料库与学生兼容时,这是一种源-学生对的关联属性,而非数据的内在属性。我们称之为潜在能力重现假说:弱自训练可以放大预训练模型中已有的能力,但仅在这种兼容条件下。我们在无提示无条件自训练的最小设置中研究这一点,其中基础语言模型仅在BOS令牌生成的文本上进行微调,没有任务规范或外部监督。我们报告三个发现。首先,合成效用是关联的而非内在的:自生成数据是最有效的来源,同源迁移优于更强但不同来源的训练,跨家族迁移显著较弱。其次,常见的内在代理失效:基准级别的语义相似性和学生下的平均每令牌似然都不能预测哪些语料库有帮助。第三,这种机制产生了一个令人惊讶的副产品。在受控的Pythia实验中,能力和逐字记忆解耦:基准效用得以保留或改善,而保留的精确匹配提取下降超过95%,无需遗忘集、隐私目标或针对性遗忘。总之,这些结果表明,无提示自训练通过放大学生已知的内容来工作,而不是从数据中导入结构。它们还揭示了一种无需任何显式遗忘目标即可分离能力和逐字记忆的机制。

英文摘要

Can a language model improve from plain text sampled from itself, with no prompts, no teacher, no verifier, and no reward model? Yes, but only when the synthetic corpus is compatible with the student, a relational property of the source-student pair rather than an intrinsic property of the data. We call this the latent capability resurfacing hypothesis: weak self-training can amplify capabilities already present in the pretrained model, but only under this compatibility condition. We study this in the minimal setting of prompt-free unconditional self-training, where base language models are fine-tuned on text generated from the BOS token alone, with no task specification or external supervision. We report three findings. First, synthetic utility is relational rather than intrinsic: self-generated data is the most effective source, same-lineage transfer outperforms stronger but differently trained sources, and cross-family transfer is substantially weaker. Second, common intrinsic proxies fail: neither benchmark-level semantic similarity nor average per-token likelihood under the student predicts which corpora help. Third, this regime produces a surprising byproduct. In controlled Pythia experiments, capability and verbatim memorization decouple: benchmark utility is preserved or improved while held-out exact-match extraction drops by over 95 percent, with no forget set, privacy objective, or targeted unlearning. Together, these results suggest that prompt-free self-training works by amplifying what the student already knows, not by importing structure from the data. They also reveal a regime in which capability and verbatim memorization can be separated without any explicit unlearning objective.

2605.31120 2026-06-01 cs.GR cs.AI cs.LG 版本更新

SWIM: Single-Instance Whole-Body Imitation for swiMming

SWIM: 用于游泳的单实例全身模仿

Binglun Wang, Edmond S. L. Ho, He Wang

发表机构 * University College London(伦敦大学学院) University of Glasgow(格拉斯哥大学)

AI总结 提出一种基于物理的游泳动作合成方法SWIM,通过单实例模仿学习实现全身协调与流体连续交互,在数据效率、稳定性、鲁棒性和泛化性上优于现有方法。

详情
AI中文摘要

我们提出了一种合成基于物理的游泳动作的新方法。基于物理的角色动画旨在生成物理有效、可控且自然的动作,能够应对意外干扰,其中难度的一个决定性因素是任务的复杂性,尤其是与所需环境交互的复杂程度。现有研究已在静态和动态环境中的各种任务上取得成功。我们进一步将难度推向游泳,这需要全身协调和与流体的持续交互,这是与环境交互时的一个新复杂性层次。这种复杂性在学习控制时面临挑战,包括在易变的环境力下的控制学习、将控制泛化到不同环境和游泳风格、缺乏数据参考,以及在控制学习过程中不可避免的极其缓慢的物理模拟。为此,我们提出了SWIM,一种新的游泳动作模仿方法,它可以从单个游泳动作中学习,并泛化到未见过的环境、身体条件和游泳风格。广泛的评估和比较表明,SWIM具有数据效率高、稳定、鲁棒和可泛化的特点,在多个任务类别和指标上优于替代方法。

英文摘要

We propose a new method for synthesizing physically-based swimming motions. Physically-based character animation aims to generate physically valid, controllable, and natural-looking motions which can respond to unexpected disturbances, where one dictating factor of difficulty is the complexity of the task, especially the level of sophistication of the required interactions with the environment. Existing research has succeeded in various tasks in static and dynamic environments. We push the difficulty further to swimming, which requires full-body coordination and continuous interactions with fluids, a new level of complexity when it comes to interacting with the environment. This complexity imposes challenges in learning control under volatile environmental forces, generalizing control to different environments and swimming styles, lack of data references, and prohibitively slow physical simulation which is inevitable during control learning. To this end, we propose SWIM, a new imitation method for swimming motions, which can learn from a single swimming motion and generalize to unseen environments, body conditions, and swimming styles. Extensive evaluation and comparison demonstrate that SWIM is data-efficient, stable, robust, and generalizable, outperforming alternative methods across multiple classes of tasks and metrics.

2605.31119 2026-06-01 cs.RO cs.LG 版本更新

Don't Fool Me Twice: Adapting to Adversity in the Wild with Experience-Driven Reasoning

不要愚弄我两次:通过经验驱动推理在野外适应逆境

Navin Sriram Ravie, Andrew Jong, Krrish Jain, John Liu, Omar Alama, Bijo Sebastian, Sebastian Scherer

发表机构 * Department of Engineering Design, Indian Institute of Technology, Madras(印度理工学院工程设计系,马德拉斯) Robotics Institute, Carnegie Mellon University(卡内基梅隆大学机器人研究所)

AI总结 提出一种持续学习框架,使移动机器人能够在线从干扰中学习,通过语义将异常行为归因于原因,从而更好地预测和规划未来。

详情
AI中文摘要

在机器人学中,危险和逆境模式往往取决于具身形态,并因智能体而异。自主移动机器人的一个前沿方向是使智能体能够在未见过的非结构化野外环境中有效运行。此类环境中的一个重大挑战是,可能无法预先预测特定机器人面临的所有危险。尽管最近的工作使用大型基础视觉语言模型(VLM)预先预测一份详尽的常识性危险列表,但仍然难以捕捉可能出现的、依赖于交互和具身形态的逆境。我们提出了一个持续学习框架,使移动具身智能体能够在线从干扰中学习,并通过语义将异常行为归因于原因,从而在未来更好地对世界进行预测和规划。我们的框架“Don't Fool Me Twice”首先观察干扰并描述其对机器人的影响;该描述结合视觉上下文进行增强,用于查询VLM以预测可能的原因;局部干扰则使用核回归进行刻画,从而实现对瞬态异常的高效少样本建模。我们利用以语义体素为中心的建模来估计认知不确定性,并将交互驱动的干扰视为可学习的空间行为,从而实现更丰富的下游恢复。我们提出了四个假设,并在仿真和硬件上跨具身形态和逆境模式进行了验证。

英文摘要

In robotics, dangers and adversity modes are often embodiment-specific and relative to each agent. A frontier of autonomous mobile robotics is to enable agents to operate effectively in the wild in unseen unstructured environments. A significant challenge in unseen unstructured environments is that it may not be possible to predict all the dangers to the specific robot. Although recent work has used large foundation vision-language models (VLMs) to preemptively predict an exhaustive list of common-sense dangers, it remains difficult to capture possible interaction and embodiment-dependent adversities. We propose a continual learning framework for a mobile embodied agent to learn online from disturbances and attribute anomalous behaviours to causes through semantics, enabling better prediction and planning of the world in the future. Our framework, "Don't Fool Me Twice", first observes disturbances and describes their effects on the robot; this description is augmented with visual context to query a VLM to predict possible causes; the local disturbance is characterized using kernel regression, which allows for efficient, few-shot modeling of transient anomalies. We leverage semantic voxel-centric modeling to estimate epistemic uncertainty, enabling richer downstream recovery by treating interaction-driven disturbances as learnable spatial behaviors. We present four hypotheses and validate them in simulation and on hardware across embodiments and adversity modes.
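
The kernel-regression step can be sketched with a Nadaraya-Watson estimator, which predicts the disturbance at a query location as a distance-weighted average of a few observed disturbances. The Gaussian kernel, the bandwidth, and the 2-D positions below are assumptions made for illustration; the abstract does not specify which kernel or features the framework uses.

```python
import math


def gaussian_kernel(a, b, bandwidth):
    """Gaussian similarity between two points given as coordinate tuples."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def kernel_regress(query, points, values, bandwidth=0.5):
    """Nadaraya-Watson estimate at `query` from a handful of observations."""
    weights = [gaussian_kernel(query, p, bandwidth) for p in points]
    total = sum(weights)
    if total == 0.0:
        return 0.0  # no nearby evidence: fall back to "no disturbance"
    return sum(w * v for w, v in zip(weights, values)) / total


# Hypothetical observations: disturbance magnitude felt at two map positions.
observed_positions = [(0.0, 0.0), (10.0, 10.0)]
observed_disturbance = [1.0, 5.0]
near_first = kernel_regress((0.0, 0.0), observed_positions, observed_disturbance)
midway = kernel_regress((5.0, 5.0), observed_positions, observed_disturbance)
```

Since the estimator has no parameters to fit, each new observation is usable immediately, which suits the few-shot, online setting described above.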

2605.31111 2026-06-01 cs.LG 版本更新

Subspace-Decomposed JEPAs: Disentangling Progression and Content in Latent World Models

子空间分解的JEPA:解耦潜在世界模型中的进展与内容

Lucas Thil, Jesse Read, Rim Kaddah, Guillaume Doquet

发表机构 * LIX, École Polytechnique(巴黎综合理工学院LIX实验室) IRT SystemX(SystemX技术研究院) Safran Tech(赛峰科技)

AI总结 提出SD-JEPA方法,通过将JEPA潜在空间分解为正交的进展子空间和内容子空间,利用余弦边际三元组损失和SIGReg正则化分别约束,在控制基准上优于LeWM基线,并证明进展坐标可作为场景感知的指南针。

详情
AI中文摘要

联合嵌入预测架构(JEPA)通过预测未来嵌入来学习紧凑的潜在世界模型,但潜在空间的任何单一坐标都未被指定用于编码任务进展。我们将JEPA潜在空间分解为两个具有不相交角色的正交子空间:一个由余弦边际三元组损失塑造的低维进展子空间,以及一个由LeWM现有SIGReg目标正则化的高维内容子空间。我们证明两个抗坍塌力作用于不相交的坐标,因此它们加性组合而非在同一维度上竞争。我们的方法SD-JEPA在大多数控制基准上以匹配的计算量优于LeWM基线,并在Push-T上优于最强的非LeWM JEPA基线;子空间消融验证了分解是关键因素。除了规划之外,得到的一维角进展坐标在潜在空间中充当场景感知的指南针。它随任务进展而前进,当智能体回溯时后退,在受控扰动下既会尖峰也会重新定位到语义上合适的新任务阶段区域,以预测误差标量无法做到的方式将惊讶时刻与其意义分离。三个定量测试支持这一点:在40个保留的立方体情节中,|Δθ_t|在定位语义事件方面优于标准潜在预测误差惊讶度,最高可达+0.18的合并AUROC(在±1步容差下每情节胜率97.5%);在所有四个环境(每个环境40个情节)的情节内线性探针显示,8维进展子空间(潜在空间的4.2%)解释了72-95%的任务进展方差。

英文摘要

Joint-Embedding Predictive Architectures (JEPAs) learn compact latent world models by predicting future embeddings, but no single coordinate of the latent is designated to encode task progression. We carve the JEPA latent into two orthogonal subspaces with disjoint roles: a low-dimensional progression subspace shaped by a cosine-margin triplet loss, and a high-dimensional content subspace regularised by the existing SIGReg objective of LeWM. We prove that the two anti-collapse forces act on disjoint coordinates, so they compose additively rather than competing on the same dimensions. Our method, SD-JEPA, improves over the LeWM baseline on the majority of its control benchmarks at matched compute, and outperforms the strongest non-LeWM JEPA baseline on Push-T; a subspace-ablation falsifier confirms the split is the load-bearing ingredient. Beyond planning, the resulting 1-D angular progression coordinate functions as a scene-aware compass on the latent. It advances with task progress, regresses when the agent backtracks, and under controlled perturbations both spikes and relocalises to a semantically appropriate new task-phase sector, separating the moment of surprise from its meaning in a way that prediction-error scalars cannot. Three quantitative tests back this up: $|\Delta\theta_t|$ outperforms the standard latent-prediction-error surprise at localising semantic events on 40 held-out cube episodes by up to +0.18 pooled AUROC (97.5% per-episode win rate at $\pm 1$-step tolerance); a within-episode linear probe across all four environments (40 episodes per env) shows the 8-dimensional progression subspace (4.2% of the latent) explains 72-95% of task-progress variance.
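
A minimal sketch of the two ingredients named above: splitting a latent vector into a progression block and a content block, and a cosine-margin triplet loss applied to the progression block. The convention that the first `prog_dim` coordinates form the progression subspace, the margin value, and the choice of triplets are assumptions; the abstract gives only the subspace size (8 dimensions).

```python
import math


def split_latent(z, prog_dim=8):
    """Split a latent vector into (progression, content) coordinate blocks."""
    return z[:prog_dim], z[prog_dim:]


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_margin_triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss: the positive must be closer than the negative by `margin` in cosine."""
    return max(0.0, cosine(anchor, negative) - cosine(anchor, positive) + margin)
```

In a hypothetical training loop, the positive might be a temporally nearby state and the negative a distant one, with the loss computed only on the progression block so that it never touches the content coordinates.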

2605.31108 2026-06-01 cs.CV cs.LG 版本更新

Remembering by Reconstructing: Domain Incremental Learning With Test-Time Training on Video Streams

通过重建来记忆:视频流上的域增量学习与测试时训练

Jonathan Swinnen, Tinne Tuytelaars

发表机构 * ESAT, KU Leuven(ESAT,比利时鲁汶大学)

AI总结 提出一种结合主任务头和自监督掩码自编码器头的域增量学习方法,通过测试时训练识别最佳LoRA适配器以重新记忆域,适用于视频流数据。

详情
AI中文摘要

在这项工作中,我们提出了一种新颖的域增量学习方法,使模型能够随时间适应不断演变的非平稳数据。与其他工作不同,我们不试图避免灾难性遗忘,而是允许并利用它。我们的模型结合了一个主任务头和一个自监督掩码自编码器(MAE)头。然后在增量训练期间学习特定于域的LoRA适配器。每个适配器专攻其域,自然地在两个头上诱导对其他域的遗忘。在推理时,我们在自监督MAE头上进行在线测试时训练,以识别哪些LoRA最匹配当前输入,从而使模型能够再次“记住”该域。我们的方案特别适用于现实世界的流数据,例如视频,其中连续样本高度相关且域变化是渐进的。我们在域增量动作识别和语义分割任务上展示了我们的方法。

英文摘要

In this work we introduce a novel approach to domain incremental learning, adapting models over time to evolving, non-stationary data. In contrast to other works, we do not attempt to avoid catastrophic forgetting, but rather allow it and exploit it. Our model combines a main task head with a self-supervised masked autoencoder (MAE) head. We then learn domain-specific LoRA adapters during incremental training. Each adapter specializes to its domain, naturally inducing forgetting on other domains in both heads. At inference, we perform online test-time training on the self-supervised MAE head to identify which LoRA best matches the current input, so the model can "remember" the domain again. Our scheme is especially well-suited to real-world streaming data, such as video, where consecutive samples are highly correlated and domain shifts are gradual. We demonstrate our method on domain-incremental action recognition and semantic segmentation tasks.

2605.31106 2026-06-01 cs.LG 版本更新

Riemannian Diffusion Models on General Manifolds via Physics-Informed Neural Networks

基于物理信息神经网络的通用流形上的黎曼扩散模型

Gyeonghoon Ko, Juho Lee

发表机构 * Korea Advanced Institute of Science and Technology, Korea(韩国科学技术院)

AI总结 针对黎曼流形上热核难以解析计算的问题,提出用物理信息神经网络求解流形热方程来近似热核,从而实现扩散模型的训练与采样。

详情
AI中文摘要

黎曼扩散模型通过流形上的随机扩散方程将基于分数的生成建模推广到流形支持的数据。然而,训练需要从流形热核中采样并对其求导,而除少数高度对称的流形外,热核很少具有封闭形式。我们提出一种通用方法,通过使用物理信息神经网络(PINN)直接求解流形热方程来近似热核。给定显式流形规范,我们选择坐标系,推导相应的热(Fokker--Planck)方程和短时渐近近似,然后训练PINN学习对数热核。得到的替代模型能够实现前向加噪(热核采样)和去噪分数匹配的条件分数评估。我们在多种流形上演示了该方法,包括$S^2$、$SO(3)$、$\mathrm{SPD}(n)$和置换商点云。

英文摘要

Riemannian diffusion models generalize score-based generative modeling to manifold-supported data via stochastic diffusion equations on the manifold. However, training requires sampling from and differentiating the manifold heat kernel, which is rarely available in closed form beyond a few highly symmetric manifolds. We propose a general approach that approximates the heat kernel by directly solving the manifold heat equation with a physics-informed neural network (PINN). Given an explicit manifold specification, we choose a coordinate system, derive the corresponding heat (Fokker–Planck) equation and a short-time asymptotic approximation, and then train a PINN to learn the log heat kernel. The resulting surrogate enables both forward noising (heat-kernel sampling) and conditional-score evaluation for denoising score matching. We demonstrate the method on diverse manifolds including $S^2$, $SO(3)$, $\mathrm{SPD}(n)$, and permutation-quotiented point clouds.
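
The quantities involved can be written out as a short sketch. The block assumes Brownian motion with generator $\tfrac{1}{2}\Delta_g$ on an $n$-dimensional manifold (the paper may use a different time scaling) and uses $u_\phi$ as a hypothetical name for the network approximating the log heat kernel. The second line is the standard leading-order short-time behaviour, and the third is a generic PINN residual obtained by substituting $p = e^{u}$ into the heat equation, not necessarily the loss used in the paper.

```latex
\begin{align}
\partial_t p_t(x \mid x_0) &= \tfrac{1}{2}\,\Delta_g\, p_t(x \mid x_0),
  \qquad p_0(\cdot \mid x_0) = \delta_{x_0}, \\
\log p_t(x \mid x_0) &\approx -\frac{d_g(x, x_0)^2}{2t} - \frac{n}{2}\log(2\pi t)
  \qquad (t \to 0^+), \\
\mathcal{L}_{\mathrm{PINN}}(\phi) &= \mathbb{E}_{t,\,x}\Big[\big(\partial_t u_\phi
  - \tfrac{1}{2}\,\Delta_g u_\phi
  - \tfrac{1}{2}\,\|\nabla_g u_\phi\|_g^2\big)^2\Big],
  \qquad u_\phi \approx \log p_t .
\end{align}
```

The conditional score needed for denoising score matching is then $\nabla_g u_\phi$, which is why learning the log kernel directly is convenient.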

2605.31070 2026-06-01 cs.LG cs.GT 版本更新

Learning to Bid in FCR Markets: A Best-of-Both-Worlds Approach

在FCR市场中学习投标:一种两全其美的方法

Marius Potfer, Cheng Wan, Pierre Gruet

发表机构 * EDF Lab Paris-Saclay, FiME (Laboratoire de Finance des Marchés de l’Énergie)(EDF巴黎萨克雷实验室,FiME(能源市场金融实验室))

AI总结 针对欧洲频率控制储备(FCR)市场中投标者仅能观察到部分反馈(如出清价格和分配数量)的问题,提出了一种将多国FCR出清问题转化为重复多单位统一价格拍卖的方法,并采用两全其美的组合半强盗算法实现对数伪遗憾(随机环境)和平方根遗憾(对抗环境),实验验证了其理论缩放性和实际竞争力。

Comments Algorithms and data available at https://data.mendeley.com/datasets/htprbf47dg/1

详情
AI中文摘要

在欧洲频率控制储备(FCR)市场中,由于竞争报价是隐藏的,投标者只能观察到来自市场的部分反馈,如出清价格和分配数量,因此对于灵活性提供商而言,投标具有挑战性。对于活跃在单个国家的参与者,我们证明多国FCR出清问题可以转化为针对内生对手报价向量的重复多单位统一价格拍卖。这种重新表述产生了一个在线学习问题,并使我们能够适应一种两全其美的组合半强盗算法,该算法可从这种标准市场反馈中实现。由此产生的投标者在随机环境中实现对数伪遗憾,在对抗环境中实现$\mathcal{O}(\sqrt{T})$遗憾。综合实验验证了预期的缩放性,对历史欧洲FCR数据的回测显示了实际中的竞争性能:该方法在稳定产品上表现尤其出色,而EXP3类型的基线在更强的非平稳性下可能更安全。总体而言,结果表明,当学习规则与产品级市场稳定性相匹配时,基于学习的FCR市场投标在理论上是有根据的,在实践中是有用的。

英文摘要

Bidding in the European Frequency Containment Reserve (FCR) market is challenging for flexibility providers because competing offers are hidden and bidders observe only partial feedback from the market, such as the clearing price and awarded quantity. For a participant active in a single country, we show that the multi-country FCR clearing problem can be recast as a repeated multi-unit uniform-price auction against an endogenous vector of opposing bids. This reformulation yields an online learning problem and allows us to adapt a Best-of-Both-Worlds combinatorial semi-bandit algorithm implementable from this standard market feedback. The resulting bidder achieves logarithmic pseudo-regret in stochastic environments and $\mathcal{O}(\sqrt{T})$ regret in adversarial ones. Synthetic experiments confirm the expected scaling, and backtests on historical European FCR data show competitive performance in practice: the method performs especially well on stable products, while EXP3-type baselines can be safer under stronger non-stationarity. Overall, the results show that learning-based bidding in FCR markets is theoretically grounded and practically useful when the learning rule matches product-level market stability.
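
The auction format the abstract reduces to can be sketched as follows. This is a simplified single-zone procurement auction in which the cheapest offers are accepted until demand is met and every accepted unit is paid the price of the last accepted offer. The pricing rule, the tie-breaking, and the absence of cross-border constraints are simplifying assumptions; the actual FCR clearing is more involved.

```python
def clear_uniform_price(bids, demand):
    """Clear a multi-unit uniform-price procurement auction.

    bids: list of (bidder, price, quantity) supply offers.
    Returns (clearing_price, awarded quantity per bidder).
    """
    awarded = {}
    clearing_price = None
    remaining = demand
    for bidder, price, quantity in sorted(bids, key=lambda b: b[1]):
        if remaining <= 0:
            break
        take = min(quantity, remaining)
        awarded[bidder] = awarded.get(bidder, 0) + take
        remaining -= take
        clearing_price = price  # the last accepted offer sets the uniform price
    return clearing_price, awarded


# Hypothetical round: the learner "me" competes against two hidden offers.
bids = [("rival_a", 10.0, 5), ("me", 20.0, 5), ("rival_b", 30.0, 5)]
price, awarded = clear_uniform_price(bids, demand=8)
my_revenue = price * awarded.get("me", 0)
```

After each round the learner would see only `price` and its own awarded quantity, never the rival offers, which is the partial feedback that a semi-bandit algorithm has to work with.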

2605.31063 2026-06-01 stat.ML cs.LG physics.chem-ph physics.comp-ph 版本更新

Free energy Estimation on Any State Space

任意状态空间上的自由能估计

Jiajun He, Zijing Ou, Francisco Vargas, Yingzhen Li, José Miguel Hernández-Lobato, Carles Domingo-Enrich, Yuanqi Du

发表机构 * University of Cambridge(剑桥大学) Imperial College London(伦敦帝国理工学院) Xaira Therapeutics(Xaira制药) Microsoft Research New England(微软研究院新英格兰分部)

AI总结 提出一种基于广义神经传输学习的框架,将自由能估计推广到任意状态空间,并揭示时间反演与Doob h-变换的群论结构。

详情
AI中文摘要

自由能估计是一个从物理学到统计学的基础且具有挑战性的问题。经典方法依赖于热力学变换,包括直接估计、准静态积分和有限时间平均。最近的工作[He and Du et al., 2025]通过学习神经传输显著加速了有限时间区间的效率。在本文中,我们将此框架推广到任意状态空间。基于这一观点,我们开发了一种广义神经传输学习方法以实现高效估计。实验验证了所提方法在连续设置之外的有效性和效率,扩展到离散和多模态空间以及自回归设置。除了自由能估计,我们还建立了代数恒等式并揭示了连接无穷小时间反演和广义Doob h-变换的群论结构,表明它们的组合形成一个广义二面体群。

英文摘要

Free energy estimation is a fundamental yet challenging problem, from physics to statistics. Classical approaches rely on thermodynamic transformations, ranging from direct estimation and quasistatic integration to finite-time averaging. Recent work [He and Du et al., 2025] learns neural transports to significantly improve efficiency in the finite-time regime. In this paper, we generalize this framework to arbitrary state spaces. Building on this view, we develop a generalized neural transport learning approach for efficient estimation. Experiments validate the effectiveness and efficiency of the proposed method beyond continuous settings, extending to discrete and multimodal spaces as well as autoregressive settings. Beyond free energy estimation, we establish algebraic identities and reveal a group-theoretic structure linking infinitesimal time reversal and generalized Doob's $h$-transforms, showing that their compositions form a generalized dihedral group.
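
For context on finite-time averaging, the classical Jarzynski estimator computes the free-energy difference from the work measured over repeated non-equilibrium trajectories, $\Delta F = -kT \log \mathbb{E}[e^{-W/kT}]$. The sketch below implements only this textbook estimator with a numerically stable log-sum-exp. Reading the abstract's "finite-time averaging" as this estimator is an assumption, and the work samples are made up.

```python
import math


def jarzynski_free_energy(work_samples, kT=1.0):
    """Estimate dF = -kT * log(mean(exp(-W / kT))) using a stable log-sum-exp."""
    scaled = [-w / kT for w in work_samples]
    m = max(scaled)
    log_mean = m + math.log(sum(math.exp(s - m) for s in scaled) / len(scaled))
    return -kT * log_mean


# Hypothetical work values from three finite-time switching trajectories.
estimate = jarzynski_free_energy([1.0, 2.0, 3.0])
```

The exponential average is dominated by rare low-work trajectories, which is why learned transports that reduce the spread of the work values can make such estimators more efficient.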

2605.31061 2026-06-01 cs.LG cs.AI 版本更新

STEP: Learning STructured Embeddings for Progressive Time Series

STEP:学习渐进时间序列的结构化嵌入

Lucas Thil, Jesse Read, Rim Kaddah, Guillaume Doquet

发表机构 * LIX, École Polytechnique(巴黎综合理工学院LIX实验室) IRT SystemX(SystemX技术研究院) Safran Tech(赛峰科技)

AI总结 提出一种自监督对比学习方法,通过构建具有固定正交原型向量的低维流形几何结构,实现渐进时间序列的端状态预测、多步预测和可解释相位分离。

详情
AI中文摘要

我们提出了一种新颖的方法,用于学习渐进时间序列的可解释表示,即捕获不可逆状态转换(如退化或任务完成)的数据。我们的方法使用自监督对比目标来学习低维潜在空间,其几何结构本身就是解释:每个观测成为位于两个固定正交原型向量之间的流形上的一个点,轨迹成为穿过该流形的路径。从这种结构中,我们读取一个潜在指南针,即潜在向量的极坐标(θ, r),其中θ跟踪潜在状态的进展(例如,从健康到故障),r识别活动模式(例如,操作条件),无需任何代理标签。我们在不同领域(包括工业退化、机器人任务和神经活动)上评估了该方法与最先进方法的对比,验证了三个关键能力:(1)端状态预测,(2)多步预测,以及(3)可解释的相位分离。我们的方法在所有方面匹配或优于黑盒对应方法,同时提供对底层机制的透明性。在潜在指南针坐标之上的简单线性回归器与深度架构具有竞争力,这是底层状态以几何可访问形式编码的直接定量证据。

英文摘要

We present a novel method for learning interpretable representations of progressive time series, that is, data capturing irreversible state transitions such as degradation or task completion. Our approach uses a self-supervised contrastive objective to learn a low-dimensional latent space whose geometry is itself the interpretation: each observation becomes a point on a manifold anchored between two fixed orthogonal prototype vectors, and a trajectory becomes a path across that manifold. From this structure we read a latent compass, the polar coordinates (θ, r) of the latent vector, in which θ tracks the progression of the underlying state (e.g., from healthy to failed) and r identifies the active mode (e.g., the operating condition), without any proxy labels. We evaluate the approach against the state of the art on diverse domains, including industrial degradation, robotic tasks, and neural activity, validating three key capabilities: (1) end-state prediction, (2) multi-step forecasting, and (3) interpretable phase separation. Our method matches or improves over black-box counterparts on all of these while providing transparency about the underlying mechanisms. A simple linear regressor on top of the latent compass coordinates is competitive with deep architectures, direct quantitative evidence that the underlying state is encoded in a geometrically accessible form.
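
The "latent compass" readout can be sketched as polar coordinates in the plane spanned by the two prototypes. The sketch assumes the prototypes are orthonormal and that $(\theta, r)$ is measured on the projection of the latent vector onto that plane; the abstract does not spell out these details, and `latent_compass` is an illustrative name.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def latent_compass(z, proto_start, proto_end):
    """Polar coordinates (theta, r) of z in the plane of two orthonormal prototypes.

    theta is 0 at the start prototype and pi / 2 at the end prototype.
    """
    x = dot(z, proto_start)
    y = dot(z, proto_end)
    return math.atan2(y, x), math.hypot(x, y)


# Hypothetical prototypes for "healthy" and "failed" in a 3-D latent space.
healthy = [1.0, 0.0, 0.0]
failed = [0.0, 1.0, 0.0]
theta, r = latent_compass([1.0, 1.0, 5.0], healthy, failed)
```

A linear regressor on `(theta, r)`, as mentioned in the abstract, would take these two numbers as its only inputs.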

2605.31057 2026-06-01 cs.CV cs.LG 版本更新

LVSA: Training-Free Sparse Attention for Long Video Diffusion

LVSA:长视频扩散的无训练稀疏注意力

Gael Glorian, Ioannis Lamprou, Zhen Zhang, Yujie Yuan, Hongsheng Liu

发表机构 * Distributed Parallel Technology Laboratory, Paris Research Center, Huawei Technologies France(华为法国巴黎研究中心分布式并行技术实验室) AI Framework and Data Technology Lab, Huawei Technologies Co., Ltd.(华为技术有限公司人工智能框架与数据技术实验室)

AI总结 提出一种无需训练、模型无关的块稀疏注意力方法LVSA,通过结构化窗口模式与旋转全局锚点结合,在降低长视频扩散推理计算成本的同时消除固定网格偏差,支持超训练时域的视频生成。

Comments 10 pages, 5 figures, 4 tables. Code: https://github.com/JiusiServe/LongVideoSparseAttention

详情
AI中文摘要

密集自注意力是长视频扩散推理的计算和质量的瓶颈:成本随序列长度二次增长,且超出训练时域时模型收敛到近乎静态的输出,即“冻结”的重复视频。最先进的方法要么成本过高(例如需要重新训练),要么无法以可扩展的方式同时满足性能和质量目标。为此,我们提出长视频稀疏注意力(LVSA),一种无需训练、模型无关的块稀疏注意力方法,用于视频扩散Transformer,它结合了结构化窗口模式与旋转全局锚点,从而消除了导致长时域伪影的固定网格偏差。LVSA结合FlashInfer内核,与密集注意力相比,在Wan 2.1 1.3B上以6倍时域减少计算量达3.17倍,在Wan 2.1 14B上以6倍时域减少2.98倍,在HunyuanVideo 1.5上以1.5倍时域减少3.33倍。除了减少计算量,LVSA还使得HunyuanVideo 1.5能够在2倍时域下生成,否则在单个GPU上会内存不足。此外,与RIFLEx相比,LVSA在Wan 2.1 1.3B上提供高达2.41倍的加速,与UltraViCo相比提供3.27倍的加速。为了展示跨不同平台的适用性,我们将LVSA应用于NPU,与密集注意力相比,在Wan 2.2 A14B上实现高达2.71倍的加速,在Wan 2.1 1.3B上实现3.24倍的加速。为了公平地评估质量,我们引入了VQeval,一个正确评分循环视频失败的工具,而VBench-Long等最先进评估器则会奖励这类失败。LVSA在训练时域长度下生成时质量中性,在扩展长度下质量积极。

英文摘要

Dense self-attention is the compute and quality bottleneck of long-video diffusion inference: cost grows quadratically with the sequence length, and beyond the training horizon the model converges to near-static output, that is, "frozen" repetitive video. State-of-the-art approaches are either too costly (e.g., they require retraining) or fail to satisfy both performance and quality objectives in a scalable manner. To this end, we introduce Long Video Sparse Attention (LVSA), a training-free, model-agnostic block-sparse attention method for video diffusion transformers that combines a structured window pattern with rotating global anchors, thus removing the fixed-grid bias that causes long-range temporal artifacts. LVSA, combined with a FlashInfer kernel, reduces compute by up to 3.17x on Wan 2.1 1.3B at a 6x horizon, 2.98x on Wan 2.1 14B at a 6x horizon, and 3.33x on HunyuanVideo 1.5 at a 1.5x horizon, compared to dense attention. Beyond reducing compute, LVSA enables HunyuanVideo 1.5 generation at a 2x horizon, which otherwise runs out of memory on a single GPU. Moreover, LVSA provides speedups of up to 2.41x compared to RIFLEx and 3.27x compared to UltraViCo on Wan 2.1 1.3B. To demonstrate applicability across diverse platforms, we apply LVSA on NPUs and achieve speedups of up to 2.71x on Wan 2.2 A14B and 3.24x on Wan 2.1 1.3B compared to dense attention. To evaluate quality fairly, we introduce VQeval, a tool that properly scores looping-video failures, which state-of-the-art evaluators such as VBench-Long instead reward. LVSA is quality-neutral for generation at the training horizon length and quality-positive at extended lengths.
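
One way to picture "a structured window pattern with rotating global anchors" is a block-level boolean mask: each query block attends to its neighbours within a window plus a small set of anchor blocks, and the anchor positions shift with a rotation offset (for example per layer or per denoising step). The layout below is a hypothetical illustration; the abstract does not give LVSA's exact pattern, and all parameter names are assumptions.

```python
def block_sparse_mask(n_blocks, window, n_anchors, rotation):
    """Boolean block mask: mask[i][j] is True if query block i may attend key block j."""
    stride = max(1, n_blocks // n_anchors)
    # Anchor blocks are evenly spaced and shifted by `rotation`.
    anchors = {(rotation + k * stride) % n_blocks for k in range(n_anchors)}
    return [
        [abs(i - j) <= window or j in anchors for j in range(n_blocks)]
        for i in range(n_blocks)
    ]


def density(mask):
    """Fraction of block pairs that are actually computed."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total


mask_a = block_sparse_mask(n_blocks=16, window=1, n_anchors=2, rotation=0)
mask_b = block_sparse_mask(n_blocks=16, window=1, n_anchors=2, rotation=3)
```

Rotating the offset means no key block is permanently privileged as a global anchor, which is one plausible reading of how a fixed-grid bias could be avoided.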

2605.31050 2026-06-01 cs.LG 版本更新

Best-Arm Identification-Based Trust Region Selection for Bayesian Optimization on Multimodal Functions

基于最佳臂识别的多模态函数贝叶斯优化信任区域选择

Nobuo Namura, Sho Takemori

发表机构 * Fujitsu Limited(富士通有限公司)

AI总结 提出一种结合最佳臂识别与信任区域贝叶斯优化的轨迹感知框架,通过预测局部优化器最终性能并逐步淘汰次优候选,加速多模态函数全局优化。

Comments 19 pages, 13 figures

详情
AI中文摘要

基于高斯过程的贝叶斯优化是昂贵的黑箱优化的流行方法,但其性能在复杂多模态或高维问题上常常下降。基于信任区域的贝叶斯优化通过聚焦局部区域缓解了这一问题,最近的研究表明,选择有效区域可以建模为多臂老虎机问题。我们提出了一种轨迹感知框架,将最佳臂识别与基于信任区域的贝叶斯优化相结合,以高效求解多模态优化问题。我们的方法外推多个局部初始化优化器的优化轨迹以预测其最终性能,并通过最佳臂识别逐步淘汰次优候选。我们从理论上证明,在温和假设下,所提出的最佳臂识别引导的贝叶斯优化比传统贝叶斯优化更快收敛到全局最优,并通过在合成和真实世界基准上的大量实验证明了其有效性。

英文摘要

Gaussian process-based Bayesian optimization (BO) is a popular approach for expensive black-box optimization, but its performance often degrades on complex multimodal or high-dimensional problems. Trust region-based BO mitigates this issue by focusing on local regions, and recent studies suggest that selecting an effective region can be formulated as a multi-armed bandit problem. We propose a trajectory-aware framework that integrates best-arm identification (BAI) with trust region-based BO to efficiently solve multimodal optimization problems. Our method extrapolates the optimization trajectories of multiple locally initialized optimizers to predict their final performance and progressively eliminates suboptimal candidates via BAI. We theoretically show that the proposed BAI-guided BO converges faster to the global optimum than conventional BO under mild assumptions, and demonstrate its effectiveness through extensive experiments on synthetic and real-world benchmarks.
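
The idea of extrapolating trajectories and eliminating candidates can be sketched with a toy successive-elimination loop for a minimization problem. Each local run is modelled as a best-so-far curve `limit + gap / s` after `s` evaluations, and the predictor inverts exactly that toy model; the paper's extrapolation model, elimination rule, and budget allocation are not given in the abstract, so all of this is an assumption.

```python
def predict_limit(history):
    """Extrapolate the final value of a curve assumed to follow limit + gap / s."""
    s = len(history)
    return history[-1] - (history[-2] - history[-1]) * (s - 1)


def successive_elimination(runs, rounds):
    """Advance all surviving runs one step per round, then drop the worse half."""
    histories = {name: [run(1), run(2)] for name, run in runs.items()}
    alive = set(runs)
    for _ in range(rounds):
        if len(alive) == 1:
            break
        for name in alive:
            histories[name].append(runs[name](len(histories[name]) + 1))
        ranked = sorted(alive, key=lambda n: predict_limit(histories[n]))
        alive = set(ranked[: max(1, len(ranked) // 2)])
    return min(alive, key=lambda n: predict_limit(histories[n]))


# Toy local optimizers: best value found after s evaluations (lower is better).
runs = {
    "A": lambda s: 1.0 + 5.0 / s,
    "B": lambda s: 0.2 + 9.0 / s,  # slow start, best final value
    "C": lambda s: 0.5 + 1.0 / s,  # looks best early on
    "D": lambda s: 2.0 + 0.1 / s,
}
winner = successive_elimination(runs, rounds=3)
```

In this toy setup, run "B" has the worst observed value after three evaluations but the best extrapolated limit, so a trajectory-aware rule keeps it while a rule based on current values would discard it.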

2605.31049 2026-06-01 cs.LG cs.AI cs.LO 版本更新

Learning to Solve and Optimize by Evolving Code

通过代码演化学习求解与优化

Veronika Semmelrock, Benedetta Strizzolo, Francesco Zuccato, Gerhard Friedrich, Patrick Rodler, Konstantin Schekotihin

发表机构 * University of Klagenfurt(克拉根福大学) University of Udine(乌迪内大学)

AI总结 提出CHECKMATE工具,利用形式规范确保解的正确性并通过自然语言描述指导代码演化,自动生成算法,在配置与调度问题上超越最先进求解器。

Comments Preprint of a paper accepted to IJCAI26

详情
AI中文摘要

组合与优化问题是许多工业AI应用的基础。解决此类大规模现实世界实例通常需要仔细的问题形式化、专门的求解器以及专家设计的启发式方法。因此,专家不仅需要指定解是什么,还需要指定如何推导出解。通过引入工具CHECKMATE,我们展示了通过代码演化生成算法代表了一种范式转变,消除了制定如何的需求。CHECKMATE仅依赖于是什么。具体来说,形式规范确保了解的正确性,并能够对生成的程序进行系统性能评估,而自然语言描述则指导演化过程。我们的方法在两个工业领域(配置与调度)的选定问题上展示了有效性。在所有案例中,演化出的算法始终优于最先进的求解器。这凸显了形式方法在引导代码演化以自动解决复杂现实问题方面的潜力。

英文摘要

Combinatorial and optimization problems are fundamental to many industrial AI applications. Solving large-scale real-world instances of such problems typically requires careful problem formalization, specialized solvers, and expert-designed heuristics. Thus, experts need to specify not only what solutions are, but also how they are derived. By introducing the tool CHECKMATE, we show that algorithm generation via code evolution represents a paradigm shift by eliminating the need to formulate the how. CHECKMATE solely relies on the what. Specifically, a formal specification ensures solutions' correctness and enables systematic performance evaluation of the generated programs, while a natural language description guides the evolutionary process. The effectiveness of our method is demonstrated on selected problems from two industrial domains: configuration and scheduling. In all cases, the evolved algorithms consistently outperform state-of-the-art solvers. This underscores the potential of formal methods in guiding code evolution for automatically solving complex real-world problems.

2605.31044 2026-06-01 cs.LG 版本更新

The Challenges of Using Reinforcement Learning for Controlling Industrial Energy Systems

使用强化学习控制工业能源系统的挑战

Tobias Lademann, Théo Vincent, Jan Peters, Matthias Weigold

发表机构 * Institute for Production Management, Technology and Machine Tools (PTW), Technical University of Darmstadt(达姆施塔特工业大学生产管理、技术与机床研究所) DFKI GmbH, SAIROL(DFKI GmbH,SAIROL) Department of Computer Science, Technical University of Darmstadt(达姆施塔特工业大学计算机科学系) Hessian.ai, Technical University of Darmstadt(达姆施塔特工业大学黑森人工智能中心 Hessian.ai)

AI总结 本文以热力供暖网络为例,研究强化学习在真实工业能源系统部署中的挑战,包括部分可观测性、动作空间设计、奖励设计及仿真到现实的差距,并基于实际部署发现强化学习虽能实现运行稳定性但存在性能差距。

Comments Submitted to Finding the Frame Workshop at RLC 2026

详情
AI中文摘要

强化学习在优化工业能源系统控制方面显示出有希望的结果,然而现有研究大多局限于仿真环境中的应用。我们以热力供暖网络为例,研究了在真实工业能源系统中部署强化学习的挑战。我们将任务形式化为马尔可夫决策过程,并沿着形式化描述的结构系统分析了相关挑战,包括部分可观测性、动作空间设计、奖励设计以及仿真到现实的差距。这些挑战基于现有的真实部署,其中强化学习实现了运行稳定性,但与仿真相比表现出显著的性能差距。

英文摘要

Reinforcement learning has shown promising results for optimizing the control of industrial energy systems, yet most existing studies remain limited to the application in simulation environments. We investigate the challenges of deploying reinforcement learning in a real-world industrial energy system, considering a thermal heating network as a use case. We formulate the task as a Markov Decision Process and systematically analyze the associated challenges along the structure of the formal description, including partial observability, action space design, reward design, and the simulation-to-reality gap. The challenges are grounded in an existing real-world deployment, where reinforcement learning achieves operational stability but shows a significant performance gap compared to simulation.

2605.31043 2026-06-01 stat.ML cs.AI cs.LG 版本更新

Routing on the Stiefel Manifold: When Does Adaptive Subspace Selection Help for Cross-Domain EEG Decoding?

Stiefel流形上的路由:自适应子空间选择何时有助于跨域脑电解码?

Isabella Costa Maia, Pedro L. C. Rodrigues, Salem Said, Marco Congedo

发表机构 * GIPSA-lab, University Grenoble Alpes, CNRS, Grenoble-INP(GIPSA实验室,格勒诺布尔阿尔卑斯大学,法国国家科学研究中心,格勒诺布尔-INP) Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK(格勒诺布尔阿尔卑斯大学,法国国家信息与自动化研究所,法国国家科学研究中心,格勒诺布尔-INP,LJK) Univ. Grenoble Alpes, CNRS, Grenoble INP, LJK(格勒诺布尔阿尔卑斯大学,法国国家科学研究中心,格勒诺布尔-INP,LJK)

AI总结 针对跨域脑电解码中协方差矩阵域偏移问题,提出动态Stiefel路由方法,通过Stiefel流形上的专家投影滤波器池和交叉注意力机制实现自适应子空间选择,并引入三种结构性质避免退化为集成平均,在三个数据集上取得一致提升。

详情
AI中文摘要

尽管黎曼深度学习取得了进展,跨域脑电解码仍然具有挑战性:来自不同受试者的协方差矩阵占据了SPD流形上系统不同的区域,然而现有的域适应方法要么需要目标域校准数据,要么学习无法跨域泛化的受试者特定组件。我们提出了动态Stiefel路由:在Stiefel流形上有一个包含$K$个专家投影滤波器的池,每个滤波器专门处理SPD流形上的不同区域,每个输入协方差通过交叉注意力路由到最合适的滤波器,从而为每个样本自适应调整子空间投影。一个核心发现是,这种朴素实现的方法会退化为集成平均:当路由权重均匀时,自适应滤波器恰好等价于专家的等贡献组合,与单个固定滤波器无法区分。三种结构性质打破了这种退化:一个对称锚点$W_{\mathrm{base}} \in \mathrm{St}(n,k)$消除了专家间的邻近偏差;一个冻结的域判别查询编码器将路由与任务优化解耦;以及一个解耦的键对齐损失,将专家键训练到稳定的域吸引子。它们共同产生了SPD流形上第一个真正承诺且域结构化的路由,在三个数据集上取得一致提升:平衡准确率分别从$0.773\to 0.823$、$0.757\to 0.809$和$0.801\to 0.839$,且对齐策略由单一数据驱动规则自动确定,无需数据集特定的超参数搜索。

英文摘要

Cross-domain EEG decoding remains challenging despite advances in Riemannian deep learning: covariance matrices from different subjects occupy systematically distinct regions of the SPD manifold, yet existing domain adaptation methods either require target-domain calibration data or learn subject-specific components that cannot generalise across domains. We propose dynamic Stiefel routing: a pool of $K$ expert projection filters on the Stiefel manifold, each specialised for a different region of the SPD manifold, with each input covariance routed to the most appropriate filter via cross-attention, adapting the subspace projection per sample. A central finding is that this approach, implemented naively, provably collapses to ensemble averaging: when routing weights are uniform, the adaptive filter reduces exactly to an equal-contribution combination of experts, indistinguishable from a single fixed filter. Three structural properties break this degeneracy: a symmetric anchor $W_{\mathrm{base}} \in \mathrm{St}(n,k)$ that removes proximity bias among experts; a frozen domain-discriminative query encoder that decouples routing from task optimisation; and a decoupled key alignment loss that trains expert keys toward stable domain attractors. Together they produce the first genuinely committed and domain-structured routing on SPD manifolds, with consistent gains across three datasets: balanced accuracy improves from $0.773\to 0.823$, $0.757\to 0.809$, and $0.801\to 0.839$, with the alignment strategy determined automatically by a single data-driven rule and no dataset-specific hyperparameter search.
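
The collapse described above is easy to reproduce in a toy setting: if the routed filter is a softmax-weighted combination of expert matrices, then uniform routing scores give the plain average of the experts for every input. The combination rule below is an assumption (the abstract does not specify it), and a convex combination of Stiefel points generally leaves the manifold, so a real implementation would need a retraction such as a QR or polar step afterwards.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of routing scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def route(expert_filters, scores):
    """Softmax-weighted combination of expert matrices (lists of rows)."""
    weights = softmax(scores)
    rows, cols = len(expert_filters[0]), len(expert_filters[0][0])
    return [
        [sum(w * f[i][j] for w, f in zip(weights, expert_filters)) for j in range(cols)]
        for i in range(rows)
    ]


experts = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
collapsed = route(experts, scores=[0.3, 0.3])    # uniform routing weights
committed = route(experts, scores=[50.0, 0.0])   # routing commits to expert 0
```

With uniform scores the result no longer depends on the input at all, which is the sense in which the adaptive filter becomes indistinguishable from a single fixed one.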

2605.31040 2026-06-01 cs.LG 版本更新

UniRTL: Unifying Code and Graph for Robust RTL Representation Learning

UniRTL:统一代码和图以实现稳健的RTL表示学习

Yi Liu, Hongji Zhang, Lei Chen, Mingxuan Yuan, Qiang Xu

发表机构 * Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR(计算机科学与工程系,香港中文大学,香港特别行政区) Noah's Ark Lab, Huawei, Hong Kong SAR(华为诺亚实验室,香港特别行政区)

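AI summary Proposes UniRTL, a multimodal pretraining framework that jointly leverages RTL code and control data flow graphs through mutual masked modeling and a hierarchical training strategy, achieving fine-grained alignment and outperforming prior methods on performance prediction and code retrieval.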

Comments Forty-Third International Conference on Machine Learning (ICML 2026)

详情
AI中文摘要

为寄存器传输级(RTL)设计开发有效的表示对于加速硬件设计工作流至关重要。然而,现有方法通常依赖于单一数据模态,即RTL代码或其相关的基于图的表示,限制了所学表示的表达能力和泛化能力。对于RTL,控制数据流图(CDFG)提供了保留完整信息的全面结构表示,而代码模态显式编码了语义和功能信息。我们认为,整合这些互补模态对于全面理解RTL设计至关重要。为此,我们提出UniRTL,一种多模态预训练框架,通过联合利用代码和CDFG学习统一的RTL表示。UniRTL通过互掩码建模实现代码和图之间的细粒度对齐,并采用分层训练策略,该策略结合了预训练的图感知分词器以及在图集成之前对文本(即功能摘要)和代码进行分阶段对齐。我们在两种下游任务(性能预测和代码检索)的多种设置下评估UniRTL。实验结果表明,UniRTL始终优于先前的方法,使其成为推进硬件设计自动化的更稳健和更强大的基础。

英文摘要

Developing effective representations for register transfer level (RTL) designs is crucial for accelerating the hardware design workflow. Existing approaches, however, typically rely on a single data modality, either the RTL code or its associated graph-based representation, limiting the expressiveness and generalization ability of the learned representations. For RTL, the control data flow graph (CDFG) offers a comprehensive structural representation that preserves complete information, while the code modality explicitly encodes semantic and functional information. We argue that integrating these complementary modalities is essential for a thorough understanding of RTL designs. To this end, we propose UniRTL, a multimodal pretraining framework that learns unified RTL representations by jointly leveraging code and CDFG. UniRTL achieves fine-grained alignment between code and graph through mutual masked modeling and employs a hierarchical training strategy that incorporates a pretrained graph-aware tokenizer and staged alignment of text (i.e., functional summary) and code prior to graph integration. We evaluate UniRTL on two downstream tasks, performance prediction and code retrieval, under multiple settings. Experimental results show that UniRTL consistently outperforms prior methods, establishing it as a more robust and powerful foundation for advancing hardware design automation.

2605.31036 2026-06-01 cs.GT cs.LG 版本更新

Model Monotonicity in Autobidding Auctions: When Do Better Predictions Lead to Better Outcomes?

自动竞价拍卖中的模型单调性:更好的预测何时带来更好的结果?

Ashwinkumar Badanidiyuru

发表机构 * Uber Technologies, Inc.(优步技术公司)

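AI summary Studies the interaction between recommender system model quality, auction format, and autobidder behavior in online advertising, defines model improvement via cluster refinement, and systematically characterizes when evaluation metrics are monotone across bidder types, auction formats, and budget constraints.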

详情
Journal ref
ICML 2026
AI中文摘要

在线广告平台依赖机器学习模型预测点击率(pCTR)和转化率(pCVR)以用于拍卖机制。我们引入了一个新框架来研究推荐系统模型质量、拍卖格式和自动竞价者行为之间的相互作用。我们形式化了模型改进——通过受概率论中滤子启发的精炼关系定义——何时导致平台级评估指标(如收入、福利或流动性福利)的改进。我们的主要贡献是:(1)基于聚类精炼的模型改进的形式化定义,以及(2)跨不同竞价者类型(tCPA、max-CPA)、拍卖格式(第一价格、第二价格、VCG)和预算约束的ECM单调性的系统刻画。我们证明,具有统一竞价的第一价格拍卖保证了无预算的tCPA竞价者的收入单调性(通过Jensen不等式),而第二价格拍卖和预算约束可能破坏这一性质。我们为非单调性结果提供了完整的数值构造。我们的发现对寻求将模型改进与业务成果对齐的广告平台具有实际意义。

英文摘要

Online advertising platforms rely on machine learning models to predict click-through rates (pCTR) and conversion rates (pCVR) for auction mechanisms. We introduce a novel framework to study the interaction between recommender system model quality, auction format, and autobidder behavior. We formalize when model improvements -- defined via a refinement relation inspired by filtrations in probability theory -- lead to improvements in platform-level Evaluation Criteria Metrics (ECM) such as revenue, welfare, or liquid welfare. Our main contributions are: (1) a formal definition of model improvement based on cluster refinement, and (2) a systematic characterization of ECM monotonicity across different combinations of bidder types (tCPA, max-CPA), auction formats (first-price, second-price, VCG), and budget constraints. We show that first-price auctions with uniform bidding guarantee revenue monotonicity for tCPA bidders without budgets (via Jensen's inequality), while second-price auctions and budget constraints can break this property. We provide full numerical constructions for the non-monotonicity results. Our findings have practical implications for advertising platforms seeking to align model improvements with business outcomes.

2605.31034 2026-06-01 cs.LG cs.AI 版本更新

Annealed Softmax Greedy in Many-Armed Bayesian Bandits

多臂贝叶斯老虎机中的退火Softmax贪婪算法

William Overman, Mohsen Bayati

发表机构 * Stanford University(斯坦福大学)

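AI summary Studies the Bayes regret of annealed softmax greedy in many-armed Bayesian Bernoulli bandits, proves that it attains a near-optimal Bayes regret rate when the prior satisfies a linear upper-tail condition ($β$-regularity with $β=1$), and draws a structural analogy to RLVR methods.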

详情
AI中文摘要

具有可验证奖励的强化学习(RLVR)和基于组的策略优化方法(如GRPO)通过为每个提示采样多个完成并增加策略在奖励较高的完成上的概率来更新随机策略,同时通过KL惩罚向参考策略正则化。这些更新不包括追踪认知不确定性的显式机制。本文研究为何这种不确定性无关的更新仍然有效的一个风格化解释。我们分析了一个退火softmax(玻尔兹曼)策略,该策略在多臂贝叶斯伯努利老虎机中根据经验平均奖励的softmax选择动作。在先验满足线性上尾条件(β正则性的β=1情况)下,该条件意味着存在大量接近最优的臂,我们证明退火softmax贪婪算法实现了贝叶斯遗憾$\tilde{O}(m + T/m)$,特别地,当臂数$m = Θ(\sqrt{T})$时,遗憾为$\tilde{O}(\sqrt{T})$。这是该机制下接近最优的贝叶斯遗憾率,经验平均贪婪算法也能达到。在β正则性下,许多臂在整个学习过程中保持经验均值接近最优,因此当softmax采样一个非经验最优的臂时,该臂往往是另一个接近最优的臂,而不是明显较差的臂。相比之下,当臂数较少时,同类的softmax策略可能遭受线性遗憾。该结果也为RLVR提供了结构类比,其中以非可忽略概率产生正确完成的基础策略扮演了β正则性的角色。

英文摘要

Reinforcement learning with verifiable rewards (RLVR) and group-based policy optimization methods such as GRPO update a stochastic policy by sampling multiple completions per prompt and increasing the policy's probability on those with higher reward, regularized by a KL penalty toward a reference policy. These updates do not include explicit mechanisms that track epistemic uncertainty. This paper studies a stylized explanation for why such uncertainty-agnostic updates can nevertheless be effective. We analyze an annealed softmax (Boltzmann) policy that selects actions according to a softmax of empirical mean rewards in a many-armed Bayesian Bernoulli bandit. Under a linear upper-tail condition on the prior (the $β=1$ case of $β$-regularity), which implies an abundance of near-optimal arms, we prove that annealed softmax greedy achieves Bayes regret $\tilde{O}(m + T/m)$, and in particular $\tilde{O}(\sqrt{T})$ when the number of arms scales as $m = Θ(\sqrt{T})$. This is the near-optimal Bayes regret rate in this regime, attained also by empirical-mean greedy. Under $β$-regularity, many arms maintain empirical means close to the optimum throughout learning, so when softmax samples an arm other than the empirically best, that arm tends to be another near-optimal one rather than a clearly inferior one. By contrast, with a small number of arms, the same kind of softmax policy can suffer linear regret. The result also provides a structural analogy to RLVR, where a base policy with a non-negligible probability of producing a correct completion plays the role of $β$-regularity.
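The policy described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the temperature schedule `1 / sqrt(t)`, the optimistic placeholder mean for unpulled arms, and the function names `softmax_probs` and `annealed_softmax_bandit` are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
import random


def softmax_probs(means, temperature):
    """Boltzmann distribution over arms given their empirical mean rewards."""
    top = max(means)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp((m - top) / temperature) for m in means]
    total = sum(weights)
    return [w / total for w in weights]


def annealed_softmax_bandit(arm_probs, horizon, seed=0):
    """Play a Bernoulli bandit with a softmax policy whose temperature decays.

    Returns the total reward and the number of pulls per arm.
    """
    rng = random.Random(seed)
    m = len(arm_probs)
    pulls = [0] * m
    successes = [0] * m
    total_reward = 0
    for t in range(1, horizon + 1):
        # Assumption: unpulled arms get a placeholder empirical mean of 1.0.
        means = [successes[i] / pulls[i] if pulls[i] else 1.0 for i in range(m)]
        # Assumption: a hypothetical annealing schedule, not taken from the paper.
        temperature = 1.0 / math.sqrt(t)
        probs = softmax_probs(means, temperature)
        arm = rng.choices(range(m), weights=probs)[0]
        reward = 1 if rng.random() < arm_probs[arm] else 0
        pulls[arm] += 1
        successes[arm] += reward
        total_reward += reward
    return total_reward, pulls
```

To mimic the many-armed regime, one could draw `arm_probs` independently from a uniform prior on [0, 1], which is one example of a prior with a linear upper tail, and set the number of arms near `sqrt(horizon)`.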

2605.31023 2026-06-01 cs.AI cs.LG cs.MA 版本更新

HADT: A Heterogeneous Multi-Agent Differential Transformer for Autonomous Earth Observation Satellite Cluster

HADT: 一种用于自主对地观测卫星集群的异构多智能体差分Transformer

Mohamad A. Hady, Muhammad Anwar Masum, Siyi Hu, Mahardhika Pratama, Jimmy Cao, Ryszard Kowalczyk

发表机构 * School of Computer Science and Information Technology, Adelaide University(计算机科学与信息科技学院,阿德莱德大学) School of Electrical Engineering, Computing and Mathematical Sciences (EECMS), Curtin University(电气工程、计算与数学科学学院(EECMS),科廷大学) Systems Research Institute, Polish Academy of Sciences(波兰科学院系统研究所)

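AI summary For autonomous Earth observation missions on heterogeneous satellite clusters, proposes a Transformer-based architecture that uses relational observation-action tokenization and a differential attention mechanism for adaptive real-time resource management, significantly outperforming baselines.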

Comments Accepted in ECML-PKDD 2026. arXiv admin note: text overlap with arXiv:2511.12792

详情
AI中文摘要

本文解决了执行对地观测任务(包括光学和合成孔径雷达卫星)的异构卫星集群中的自主资源管理问题。在自主运行模式下,卫星配备智能能力,能够根据最新条件实时决策,同时最小化与地面操作员的交互。传统的调度方法通常依赖数学模型来表示卫星任务和资源管理,然后通过优化算法求解。然而,当底层模型不可用、过于复杂或因空间任务环境中的动态变化和不确定性而不准确时,此类解决方案效果不佳。一个有前景的替代方案是将问题重新表述为序列决策过程,并应用无模型强化学习技术来实现自适应和实时资源管理。为此,我们提出了一种新颖的基于Transformer的架构,专门针对异构卫星集群自主对地观测任务,采用关系观测-动作令牌化和差分注意力机制。我们的实验结果表明,与现有基线相比,性能有显著提升。此外,所提出的架构在不同卫星集群数量下表现出强大的适应性和可迁移性。

英文摘要

This work addresses the problem of autonomous resource management in heterogeneous satellite clusters conducting Earth Observation (EO) missions, including optical and Synthetic Aperture Radar (SAR) satellites. In autonomous operation mode, satellites are equipped with intelligent capabilities enabling real-time decision-making based on the latest conditions, while requiring minimal interaction with ground operators. Traditional scheduling approaches typically rely on mathematical models to represent satellite mission and resource management, and then solve the resulting problem with optimization algorithms. However, such solutions become less effective when the underlying models are unavailable, overly complex, or inaccurate due to dynamic changes and uncertainties inherent in the space mission environment. A promising alternative is to reformulate the problem as a sequential decision-making process and apply model-free reinforcement learning techniques to enable adaptive and real-time resource management. To this end, we propose a novel transformer-based architecture tailored for autonomous EO missions on heterogeneous satellite clusters, with relational observation-action tokenization and a differential attention mechanism. Our experimental results demonstrate significant performance improvements compared to the available baselines. Moreover, the proposed architecture exhibits strong adaptability and transferability with respect to varying numbers of satellite clusters.

2605.31022 2026-06-01 cs.LG 版本更新

Augmented Lagrangian Predictive Coding

增广拉格朗日预测编码

Jeffrey Seely, Julian Gould

发表机构 * Sakana AI

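AI summary Proposes Augmented Lagrangian Predictive Coding (PC-ALM), which accumulates constraint errors in layer-local Lagrange multipliers so that local updates align with backpropagation gradients, matching backpropagation performance in deep networks.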

Comments 22 pages, 10 figures

详情
AI中文摘要

预测编码(PC)是反向传播(BP)的一种局部学习替代方案,通过局部能量最小化动力学而非全局反向传播来训练深度网络。我们引入了增广拉格朗日预测编码(PC-ALM),它保持了PC的推理预算,但通过将每层约束误差累积到层局部拉格朗日乘子中,使每个权重更新与BP对齐。在线性PC网络中,PC-ALM收敛到一个平衡点,其中精确的BP梯度仅通过层局部更新分布在整个网络中。我们在深度达128的非线性PC网络中分析了PC-ALM,并表明它在所有宽度-深度设置下匹配BP性能,特别是在PC表现不佳的深度窄网络中。PC-ALM在每层激活中引入了循环动力学。与PC在标量能量上的热流相比,PC-ALM动力学由增广拉格朗日上的对偶上升驱动。我们观察到在非常深的网络中“弹道”式信用传播,信用信号均匀分布在各层,而PC则是缓慢、扩散的信用传播。除了算法本身,增广拉格朗日框架提供了PC的泛化,并可能为分布式系统如何通过纯局部动力学计算和传播类似BP的信用信号提供见解。

英文摘要

Predictive coding (PC) is a local-learning alternative to backpropagation (BP), training deep networks via local energy-minimization dynamics rather than a global backward pass. We introduce Augmented Lagrangian Predictive Coding (PC-ALM), which maintains PC's inference budget but aligns each weight update toward BP by accumulating per-layer constraint errors into a layer-local Lagrange multiplier. In linear PC networks, PC-ALM converges to an equilibrium with exact BP gradients distributed across the network via only layer-local updates. We analyze PC-ALM in nonlinear PC networks up to depth 128 and show that it matches BP performance across all width-depth regimes, notably in deep narrow networks where PC underperforms. PC-ALM introduces recurrent dynamics in each layer's activations. Compared to PC's heat flow on a scalar energy, PC-ALM dynamics are driven by dual ascent on the augmented Lagrangian. We observe "ballistic" credit propagation across very deep networks, with credit signals evenly distributed across layers, compared to PC's slow, diffusive credit propagation. Beyond the algorithm itself, the augmented Lagrangian framework offers a generalization of PC, and may yield insights into how distributed systems could compute and propagate BP-like credit signals through purely local dynamics.

2605.31021 2026-06-01 cs.AI cs.CL cs.LG 版本更新

A Persona-Based Evaluation Framework for Pluralistic Alignment in Generative AI

基于人格的生成式AI多元对齐评估框架

Atahan Karagoz

发表机构 * Atahan Karagöz(阿塔汗·卡拉戈兹)

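AI summary Proposes a state-space constrained emulation framework that replaces singular assessment functions with synthetic cognitive profiles, enabling pluralistic, perspective-dependent benchmarking that reflects real-world consensus variability; also analyzes the stability of the simulated evaluators and argues that dynamic regulatory mechanisms are necessary.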

详情
AI中文摘要

当前生成式人工智能的对齐范式主要依赖单一基准测试框架,将人类判断的多元性简化为聚合统计基线,从而掩盖了评估中的文化、人口和语境变异性。我们引入一种用于AI评估的状态空间约束仿真框架,用代表不同人类视角的合成认知轮廓的结构化流形替代单一评估函数。我们表明,现代生成架构能够以高度一致性实例化和维护这些评估人格,从而实现一种更接近现实世界共识变异性的多元、视角依赖的基准测试。然而,我们进一步分析了这些模拟评估者在顺序推理和随机提示扰动下的稳定性,揭示了人格一致性的系统性退化,表现为状态空间漂移和语义不一致。这些发现表明,静态对齐约束不足以维持随时间推移的稳健评估行为。相反,我们主张必须在生成系统中嵌入动态的、可行性驱动的调节机制,以保持连贯的认知仿真。通过将基于人格的评估视为潜在表征流形上的结构化动力系统,本研究为更自适应、更符合人类、更注重语境的AI评估方法奠定了基础。

英文摘要

Current alignment paradigms for generative artificial intelligence rely predominantly on monolithic benchmarking frameworks that reduce the plurality of human judgment to aggregated statistical baselines, thereby obscuring cultural, demographic, and contextual variability in evaluation. We introduce a state-space constrained emulation framework for AI evaluation that replaces singular assessment functions with a structured manifold of synthetic cognitive profiles representing diverse human perspectives. We show that modern generative architectures can instantiate and maintain these evaluative personas with high consistency, enabling a form of pluralistic, perspective-dependent benchmarking that more closely reflects real-world consensus variability. However, we further analyze the stability of these simulated evaluators under sequential inference and stochastic prompt perturbations, revealing systematic degradation in persona coherence that manifests as state-space drift and semantic inconsistency. These findings suggest that static alignment constraints are insufficient for sustaining robust evaluative behavior over time. Instead, we argue for the necessity of embedding dynamic, viability-driven regulatory mechanisms within generative systems to preserve coherent cognitive emulation. By framing persona-based evaluation as a structured dynamical system over latent representation manifolds, this study provides a foundation for more adaptive, human-aligned, and context-sensitive approaches to AI evaluation.

2605.31016 2026-06-01 cs.LG 版本更新

An Efficient and Scalable Graph Condensation with Structure-Preserving

一种高效且可扩展的保结构图压缩方法

Yulin Hu, Fuyan Ou, Ye Yuan

发表机构 * Southwest University(西南大学)

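AI summary Proposes SP-ESGC, a structure-preserving graph condensation method that decouples node condensation from graph structure generation, condenses graphs efficiently via heat kernel feature propagation and a hybrid clustering strategy, and uses a pre-trained edge predictor to generate transferable structural patterns, improving generalization across GNN architectures while keeping computation efficient.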

详情
AI中文摘要

图压缩(GC)对于在资源受限场景中部署图神经网络(GNN)至关重要,它通过将大规模图压缩为紧凑的合成图来实现。现有的GC方法通常由于耦合优化而面临计算效率低的问题,并且在不同GNN架构上泛化能力差。为了解决这些挑战,本研究提出了一种高效且可扩展的保结构图压缩方法(SP-ESGC),该方法采用解耦设计,将节点压缩与图结构生成分离。具体来说,首先利用热核特征传播,通过谱图理论启发的扩散生成节点表示。进一步,设计了一种新颖的混合聚类策略,从节点表示中提取判别性的类内质心。最后,一个预训练的边预测器从原始图中推断可迁移的结构模式,确保合成图的准确生成。在真实世界图数据集上的大量实验表明,所提出的SP-ESGC实现了精确的图压缩,同时具有显著高的计算效率。此外,SP-ESGC在多种GNN架构上也具有良好的泛化能力。

英文摘要

Graph condensation (GC) is pivotal for enabling Graph Neural Network (GNN) deployment in resource-constrained scenarios by compressing large-scale graphs into compact synthetic counterparts. Existing GC methods commonly suffer from computational inefficiency due to coupled optimization and from poor generalization across GNN architectures. To address these challenges, this study proposes Efficient and Scalable Graph Condensation with Structure-Preserving (SP-ESGC), which has a decoupled design that separates node condensation from graph structure generation. Specifically, it first employs heat kernel feature propagation to generate node representations via spectral graph theory-inspired diffusion. Further, a novel hybrid clustering strategy is designed to extract discriminative intra-class centroids from the node representations. Finally, a pre-trained edge predictor infers transferable structural patterns from the original graph, ensuring accurate synthetic graph generation. Extensive experiments on real-world graph datasets demonstrate that the proposed SP-ESGC achieves precise GC with high computational efficiency. Moreover, SP-ESGC also generalizes well across diverse GNN architectures.
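Heat kernel feature propagation, the first step named in this abstract, can be sketched as applying $\exp(-tL)$ to the node features. The version below is an illustration under stated assumptions, not the SP-ESGC implementation: it assumes the symmetric normalized Laplacian $L = I - D^{-1/2} A D^{-1/2}$, approximates the matrix exponential with a truncated Taylor series, and uses the hypothetical name `heat_kernel_propagate`.

```python
import math


def heat_kernel_propagate(adj, features, t=1.0, order=30):
    """Approximate exp(-t * L) @ features with a truncated Taylor series.

    adj is a dense symmetric adjacency matrix (list of lists) and features is
    a list of per-node feature rows. L is the symmetric normalized Laplacian,
    which is an assumption of this sketch.
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    # Isolated nodes (degree 0) get a zero scaling factor to avoid division by zero.
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    lap = [
        [(1.0 if i == j else 0.0) - inv_sqrt[i] * adj[i][j] * inv_sqrt[j]
         for j in range(n)]
        for i in range(n)
    ]
    dim = len(features[0])
    result = [list(row) for row in features]  # k = 0 term of the series
    term = [list(row) for row in features]
    for k in range(1, order + 1):
        # term_k = (-t / k) * L @ term_{k-1}, which equals (-t L)^k / k! applied to X.
        term = [
            [(-t / k) * sum(lap[i][j] * term[j][c] for j in range(n))
             for c in range(dim)]
            for i in range(n)
        ]
        result = [[result[i][c] + term[i][c] for c in range(dim)] for i in range(n)]
    return result
```

A larger diffusion time `t` smooths features over a wider neighborhood, while `t = 0` returns the input features unchanged. Dense lists keep the sketch dependency-free; a practical implementation on large graphs would use sparse matrices.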

2605.31013 2026-06-01 cs.LG 版本更新

Physics-Informed Coarsening for Multigrid Graph Neural Surrogates

物理信息粗化用于多重网格图神经网络代理

Amir Bazzi, David Cardinaux, Ramy Nemer, José Alves, Arjun Kalkur Matpadi Raghavendra, Elie Hachem

发表机构 * Amir Bazzi(阿米尔·巴齐) David Cardinaux(大卫·卡迪纳克斯) Ramy Nemer(拉米·纳默) José Alves(若泽·阿尔维斯) Arjun Kalkur(阿鲁金·卡尔库) Matpadi Raghavendra(马特帕迪·拉吉文德拉) Elie Hachem(埃利·哈克)

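AI summary Targeting nonlinear elasticity, plasticity, and transient behavior in solid mechanics, proposes a multigrid graph neural network with a physics-informed coarsening strategy that scores nodes by residual-based local activity to retain high-strain and high-stress regions, enabling hierarchical message passing and improving long-rollout stability and accuracy.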

Comments Accepted at ICML 2026. 16 pages, 5 figures

详情
AI中文摘要

基于学习的偏微分方程代理最近在流体设置和结构化几何中达到了经典求解器的精度,同时实现了数量级的加速。相比之下,尽管存在非线性弹性、塑性和瞬态行为挑战标准架构,但针对可变形固体的鲁棒代理仍未得到充分探索。我们提出了一种用于固体力学的多重网格图神经网络,它将编码器-处理器-解码器主干与物理信息粗化策略相结合。我们的方法不是通过几何启发式进行下采样,而是使用基于残差的局部物理活动度量对节点进行评分,并优先保留高应变或应力集中区域,在最需要的地方分配多尺度容量。这通过分层消息传递保留了长程相互作用,同时提高了长期滚动的稳定性。我们在涵盖线性、非线性和瞬态状态的多个数据集上进行评估,并观察到与标准采样基线相比,在精度和滚动稳定性方面的一致提升。我们的结果突出了物理信息粗化对于固体力学中可扩展代理建模的重要性。

英文摘要

Learning-based surrogates for partial differential equations have recently matched the accuracy of classical solvers while achieving orders-of-magnitude speedups, predominantly in fluid settings and structured geometries. In contrast, robust surrogates for deformable solids remain underexplored, despite the presence of nonlinear elasticity, plasticity, and transient behavior that challenge standard architectures. We introduce a multigrid graph neural network for solid mechanics that couples an encoder-processor-decoder backbone with a physics-informed coarsening strategy. Instead of downsampling via geometric heuristics, our method scores nodes using a residual-based measure of local physical activity and preferentially retains regions of high strain or stress concentration, allocating multiscale capacity where it is most needed. This preserves long-range interactions through hierarchical message passing while improving stability over long rollouts. We evaluate on multiple datasets covering linear, nonlinear, and transient regimes, and observe consistent gains in accuracy and rollout stability compared to standard sampling baselines. Our results highlight the importance of physics-informed coarsening for scalable surrogate modeling in solid mechanics.

2605.31007 2026-06-01 cs.LG cs.AI 版本更新

DEM: A Distilled Explanation Model for Interpretable Anomaly Detection in Physiological Sensor Networks

DEM:面向生理传感器网络中可解释异常检测的蒸馏解释模型

Jyotirmoy Singh, Anushka Roy, Shreea Bose, Chittaranjan Hota

发表机构 * Department of Computer Science and Information Systems(计算机科学与信息系统系) Department of Electrical and Electronics Engineering(电气与电子工程系)

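AI summary Proposes DEM, a three-stage glass-box framework that distills the knowledge of a gradient boosting expert into a decision tree operating on residuals relative to a linear baseline, giving accurate and intrinsically interpretable anomaly detection, and introduces a distillation fidelity metric to quantify explanation trustworthiness.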

Comments 21 pages, 10 figures, 7 tables. Code: https://github.com/Jyotirmoy17/dem-model

详情
AI中文摘要

无线体域网(WBANs)中生理传感器数据的异常检测可能由传感器故障、网络中断或数据缺失引起,导致误报。因此,它既需要高预测精度,也需要临床可解释的解释。现有方法要么依赖性能强但无透明度的黑盒模型,要么依赖SHAP和LIME等事后解释方法。本文提出蒸馏解释模型(DEM),一个三阶段玻璃箱框架,将梯度提升专家模型的非线性知识蒸馏到基于线性基线残差的可解释决策树中,使得解释不是近似而是预测本身。DEM引入了一种新颖的蒸馏保真度指标,量化解释树忠实捕捉专家模型非线性贡献的程度,提供了先前可解释模型所缺乏的解释可信度的原则性度量。在包括MIMIC-IV、WESAD、eICU和内部SmartNet WBAN语料库在内的四个生理数据集上评估,DEM在临床上下文异常检测上达到0.9964的AUC,在可穿戴压力检测上达到0.9047,同时以可控深度生成人类可读的if-then规则。推理每1000个样本需要0.17ms,使DEM比基于SHAP的事后解释快1235倍,适用于实时生理监测。消融研究证实,XGBoost蒸馏步骤比朴素残差拟合提供了可测量的增益,深度敏感性分析展示了DEM在现有内在可解释模型中独有的、用户可控的准确性-可解释性权衡。

英文摘要

Anomalies in physiological sensor data from Wireless Body Area Networks (WBANs) can be caused by sensor faults, network disruptions, or missing data, leading to false alarms. Hence, anomaly detection in this setting demands both high predictive accuracy and clinically interpretable explanations. Existing approaches rely either on black-box models that achieve strong performance but offer no transparency, or on post-prediction explanation methods such as SHAP and LIME. In this paper, we propose the Distilled Explanation Model (DEM), a three-stage glass-box framework that distills the non-linear knowledge of a gradient boosting expert into an interpretable decision tree operating on residuals relative to a linear baseline, so that the explanation is not an approximation but the prediction itself. DEM introduces a novel distillation fidelity metric that quantifies how faithfully the explanation tree captures the expert model's non-linear contribution, providing a principled measure of explanation trustworthiness absent from prior interpretable models. Evaluated across four physiological datasets (MIMIC-IV, WESAD, eICU, and an in-house SmartNet WBAN corpus), DEM achieves an AUC of 0.9964 on clinical contextual anomaly detection and 0.9047 on wearable stress detection while producing human-readable if-then rules at a controllable depth. Inference requires 0.17 ms per 1000 samples, making DEM 1235x faster than SHAP-based post-hoc explanation and suitable for real-time physiological monitoring. Ablation studies confirm that the XGBoost distillation step provides measurable gains over naive residual fitting, and depth-sensitivity analysis demonstrates an explicit, user-controlled accuracy-interpretability trade-off unique to DEM among existing intrinsically interpretable models.

2605.31005 2026-06-01 cs.LG 版本更新

Learning Multi-Agent Coordination via Sheaf-ADMM

通过 Sheaf-ADMM 学习多智能体协调

Jeffrey Seely, Bartłomiej Cupiał, Llion Jones

发表机构 * universityofwarsaw(华沙大学)

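AI summary Proposes a differentiable optimization framework that coordinates multiple agents using cellular sheaves and ADMM, validates it on maze pathfinding, image classification, and Sudoku, and shows better interpretability and robustness than standard message-passing architectures.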

Comments 17 pages, 8 figures, 6 tables. Accepted at ICML 2026

详情
AI中文摘要

我们提出了一种用于多智能体协调的可微优化框架。输入被分解为重叠的局部视图,每个视图由一个智能体处理,该智能体求解由神经编码器参数化的凸子问题。智能体通过交替方向乘子法(ADMM)进行协调,其中智能体间的约束由细胞层(cellular sheaf)指定。该层指定了相邻解必须在哪些方面达成一致,从而允许异构的全局共识概念。通过展开的优化进行反向传播,联合训练多智能体系统的所有组件。我们在迷宫路径规划、图像分类和数独任务上进行了评估,在这些任务中,局部视图单独不足的智能体学会了协调以产生正确的全局输出。在MNIST上,相对于标准CNN,局部视图分解提高了对分布偏移的鲁棒性。在数独上,优化导出的结构比参数匹配的MPNN基线产生了显著更高的求解率。最后,ADMM结构暴露了不同的原始、共识和对偶状态变量,使得协调动态可以直接分析和干预——这是标准消息传递架构所不具备的特性。

英文摘要

We present a differentiable optimization framework for multi-agent coordination. An input is decomposed into overlapping local views, each processed by an agent that solves a convex subproblem parameterized by a neural encoder. Agents coordinate through the Alternating Direction Method of Multipliers (ADMM) with inter-agent constraints specified by a cellular sheaf. The sheaf specifies which aspects of neighboring solutions must agree, allowing for heterogeneous notions of global consensus. Backpropagating through the unrolled optimization jointly trains all components of the multi-agent system. We evaluate on maze pathfinding, image classification, and Sudoku, where agents with individually insufficient local views learn to coordinate to produce correct global outputs. On MNIST, the local-view decomposition yields improved robustness to distribution shifts relative to a standard CNN. On Sudoku, the optimization-derived structure yields markedly higher solve rates than parameter-matched MPNN baselines. Finally, the ADMM structure exposes distinct primal, consensus, and dual state variables, opening the coordination dynamics to direct analysis and intervention -- a property unavailable in standard message-passing architectures.
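The primal, consensus, and dual variables mentioned in this abstract are easiest to see in the simplest consensus form of ADMM. The sketch below is a simplification, not the paper's method: it assumes scalar agents with quadratic local objectives $\tfrac{1}{2}(x_i - a_i)^2$ and plain global agreement $x_i = z$, which corresponds to a sheaf whose restriction maps are all the identity. The name `consensus_admm` is hypothetical.

```python
def consensus_admm(targets, rho=1.0, iterations=200):
    """Scaled-form consensus ADMM for min sum_i 0.5 * (x_i - a_i)^2 s.t. x_i = z.

    Returns the primal variables x, the consensus variable z, and the scaled
    dual variables u.
    """
    n = len(targets)
    x = [0.0] * n
    u = [0.0] * n
    z = 0.0
    for _ in range(iterations):
        # Primal step: each agent solves its own small convex subproblem,
        # argmin_x 0.5 * (x - a)^2 + (rho / 2) * (x - z + u)^2, in closed form.
        x = [(a + rho * (z - ui)) / (1.0 + rho) for a, ui in zip(targets, u)]
        # Consensus step: average the agents' proposals plus their duals.
        z = sum(xi + ui for xi, ui in zip(x, u)) / n
        # Dual step: accumulate each agent's remaining disagreement with z.
        u = [ui + xi - z for xi, ui in zip(x, u)]
    return x, z, u
```

For example, `consensus_admm([1.0, 2.0, 6.0])` drives every `x_i` and `z` toward the mean of the targets, while each `u_i` records how far agent `i` was pulled away from its own preferred value. In the paper's setting the local subproblems are parameterized by neural encoders and the agreement constraints are sheaf-structured, neither of which this sketch attempts.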

2605.31000 2026-06-01 cs.NI cs.LG 版本更新

HetCCL: Enabling Collective Communication For Mixed-Vendor Heterogeneous Clusters

HetCCL:实现混合供应商异构集群的集体通信

Yuejie Wang, Tao Chang, Yuanyuan Zhao, Yulong Ao, Zeyu Gu, Zhiyu Li, Yanmin Jia, Yan Zhang, Mingjun Zhang, He Liu, Yongzhe He, Yonghua Lin, Guyue Liu

发表机构 * Peking University(北京大学) Beijing Academy of Artificial Intelligence(北京人工智能研究院) Institute of Computing Technology, Chinese Academy of Sciences(中国科学院计算技术研究所)

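AI summary Proposes HetCCL, a framework that enables cross-vendor collective communication on heterogeneous clusters through efficient P2P transport and a border-communicator mechanism, eliminating host-device memory copy overhead and optimizing bandwidth utilization.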

详情
AI中文摘要

在异构集群上训练大型语言模型(LLM)给集体通信带来了重大挑战,因为来自多个供应商的硬件引入了多样化的网络和计算特性。现有的为同构环境设计的集体通信框架(如NCCL、RCCL)无法处理混合硬件设置,而支持异构的通信库(如Gloo、OpenMPI)在数据路径中引入了大量开销。本文提出了HetCCL,一个通过跨异构设备(如GPU)的高效P2P传输实现异构集体通信的框架,消除了主机-设备内存拷贝开销,同时将控制卸载到CPU。对于组合集体(如AllReduce、ReduceScatter),HetCCL引入了一种边界通信器机制,通过使用供应商集体通信库中组合集体的内在归约来实现供应商独立性。凭借高效的异构P2P传输和可移植的归约机制,HetCCL提出了异构集群的层次拓扑抽象,将集体通信分解为集群级原语,保证了最优的跨集群数据传输量和最优的带宽利用率。我们实现了支持4种不同供应商的HetCCL,并在4种异构设置下使用基准测试和端到端LLM任务进行了评估。评估结果表明,在异构通信中,HetCCL的带宽比Gloo高17-19倍,并且在端到端训练中每步时间加速高达16.9%。

英文摘要

Training Large Language Models (LLMs) on heterogeneous clusters presents significant challenges for collective communication, as hardware from multiple vendors introduces diverse network and computational characteristics. Existing collective communication frameworks (e.g., NCCL, RCCL) designed for homogeneous environments fail to address mixed-hardware setups, while communication libraries with heterogeneous support (e.g., Gloo, OpenMPI) incur heavy overhead in the data path. This paper presents HetCCL, a framework that enables heterogeneous collective communication through efficient P2P transport across heterogeneous devices (e.g., GPUs), eliminating the host-device memory copy overhead while offloading control to the CPUs. For combining collectives (e.g., AllReduce, ReduceScatter), HetCCL introduces a border-communicator mechanism that achieves vendor independence by using the intrinsic reduction of the combining collectives in vendor collective communication libraries. With efficient heterogeneous P2P transport and a portable reduction mechanism, HetCCL proposes a hierarchical topology abstraction for heterogeneous clusters, dissecting collective communication into cluster-level primitives that guarantee optimal cross-cluster data transfer volume and optimal bandwidth utilization. We implement HetCCL with support for 4 different vendors and evaluate it in 4 heterogeneous settings with benchmarks and end-to-end LLM tasks. Our evaluation shows that HetCCL achieves 17-19x higher bandwidth than Gloo in heterogeneous communications, and speeds up end-to-end training by up to 16.9% in per-step time.

2605.30997 2026-06-01 stat.ML cs.LG 版本更新

Hedging on the Frontier: Learning New Tasks with Few Samples

前沿对冲:基于少量样本学习新任务

Tobias Wegel, Federico Di Gennaro, Geelon So, Fanny Yang

发表机构 * Department of Computer Science, ETH Zurich(苏黎世联邦理工学院计算机科学系) Department of Computer Science and Engineering, UC San Diego(南加州大学计算机科学与工程系)

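AI summary For new tasks with few samples, exploits a weak monotonicity assumption and hedges on the model frontier within transfer learning and model selection aggregation, obtaining provable statistical gains.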

详情
AI中文摘要

当学习者面临少量样本的新任务时,必须利用任何可用的辅助信息。在实践中,这通常以公共基准中相关任务的模型评估形式出现。一个关键问题是如何对任务相关性进行建模,使其既现实又能从基准评估中获得可证明的收益。经验上,我们观察到弱单调性通常近似满足:如果一个模型在许多基准上占优,那么它在新任务上也往往表现更好。我们探索了在(近似)弱单调性下学习的统计复杂性,并在两种学习范式(迁移学习和模型选择聚合)中利用它。我们表明,不仅可以根据单调性剪枝模型类,还可以通过在前沿进行对冲来进一步适应可用权衡的几何结构。

英文摘要

When a learner faces a new task with few samples, it must leverage any available side information. In practice, this often comes in the form of model evaluations on related tasks in public benchmarks. A key question then is how to model task relatedness such that it is both realistic and the benchmark evaluations lead to provable gains. Empirically, we observe that weak monotonicity is often approximately satisfied: if a model dominates another on many benchmarks, it also tends to outperform on the new task. We explore the statistical complexity of learning under (approximate) weak monotonicity, leveraging it within two learning paradigms: transfer learning and model selection aggregation. We show that not only can we prune the model class based on monotonicity, but we can also further adapt to the geometry of the available trade-offs by hedging on the frontier.

2605.30992 2026-06-01 cs.LG 版本更新

Eigenvectors of Experts are Training-free Non-collapsing Routers

专家特征向量是无需训练的非崩溃路由器

Giang Do, Hung Le, Truyen Tran

发表机构 * Applied Artificial Intelligence Initiative (A2I2), Deakin University, Victoria, Australia(应用人工智能倡议(A2I2),迪肯大学,维多利亚,澳大利亚)

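AI summary To address expert collapse in sparse mixture-of-experts models, proposes SSMoE, a training-free routing framework based on the eigenvectors of expert weight matrices that uses singular value decomposition to exploit spectral properties and improve model performance.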

Comments 24 pages

详情
Journal ref
ICML 2026
AI中文摘要

稀疏混合专家(SMoE)架构通过将输入令牌路由到选定的专家子集来提高大型语言模型(LLMs)的训练效率。尽管取得了显著成功,SMoE模型在训练和推理中仍面临专家崩溃问题(Chi等人,2022),这会降低模型性能。先前研究主要关注改进路由器;然而,这些方法依赖于从头训练或微调,需要高昂的计算和数据处理成本。此外,我们通过理论和实证结果证明,尽管有这些努力,在推进预训练良好的SMoE模型时,该问题仍然存在。为填补这一空白,我们分析了先进的SMoE模型,观察到专家权重矩阵的特征向量编码了丰富的语义信息,指向传统路由策略的有效替代方案。基于这一见解,我们提出了奇异值分解SMoE(SSMoE),一种新颖且无需训练的框架,利用专家权重的谱特性来解决崩溃问题并提升模型性能。在多种语言和视觉任务上的大量实验,包括干净和损坏数据设置,证明了SSMoE的强大泛化能力和鲁棒性。我们的发现强调了更深入理解模型内部结构如何指导开发更有效的SMoE架构。我们的实现已在https://github.com/giangdip2410/SSMoE公开。

英文摘要

Sparse Mixture of Experts (SMoE) architectures improve the training efficiency of Large Language Models (LLMs) by routing input tokens to a selected subset of specialized experts. Despite their remarkable success, both training and inference in SMoE models suffer from the expert collapse issue (Chi et al., 2022), which degrades model performance. Prior studies primarily focus on improving the router; however, such methods rely on training from scratch or fine-tuning, which requires high computational and data-processing costs. Furthermore, we demonstrate that, despite these efforts, the issue persists when advancing well-pretrained SMoE models, as evidenced by both theoretical and empirical results. To fill that gap, we analyze the advanced SMoE models and observe that the eigenvectors of expert weight matrices encode rich semantic information, pointing to an effective alternative to conventional routing strategies. Building on this insight, we propose Singular Value Decomposition SMoE (SSMoE), a novel and training-free framework that leverages spectral properties of the expert weights to address the collapse issue and enhance model performance. Extensive experiments across diverse language and vision tasks, under both clean and corrupt data settings, demonstrate the strong generalization and robustness of SSMoE. Our findings highlight how a deeper understanding of model internals can guide the development of more effective SMoE architectures. Our implementation is publicly available at https://github.com/giangdip2410/SSMoE.

2605.30991 2026-06-01 cs.LG cs.CV 版本更新

Parallel Tempering Initial Sampling in Inference-Time Reward Alignment

推理时奖励对齐中的并行回火初始采样

Myeongjun Oh, Gwangho Kim, Sungyoon Lee

发表机构 * Department of Artificial Intelligence(人工智能系) Department of Computer Science(计算机科学系)

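AI summary To address standard SMC methods getting trapped in local modes by their initial sampling in inference-time reward alignment, proposes PATHS, a parallel-tempering method that couples multiple tempered chains for efficient exploration and better alignment quality.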

Comments 31 pages, 11 figures

详情
AI中文摘要

推理时奖励对齐无需重新训练即可引导预训练的扩散和基于流的生成模型满足用户指定的奖励。最近,序贯蒙特卡洛(SMC)通过迭代过滤和传播多个粒子成为该任务的有力框架。然而,我们表明基于SMC的标准方法通常性能不佳,因为它们从标准先验初始化粒子,而复杂奖励景观中的高奖励区域极为罕见。此外,我们表明即使最近的奖励感知初始采样方法仍然容易陷入局部模式,因为复杂奖励景观通常是多模态的。为克服这些限制,我们提出PATHS(用于高复杂度奖励采样的并行回火),一种通过并行回火耦合多个采样链的新型初始化方法。PATHS维护一个奖励回火链的阶梯,并定期执行Metropolis交换,从而在平坦化的奖励景观中实现高效探索,缓解模式陷阱问题。我们的分析表明,该机制显著增强了有限预算下对通常难以采样的罕见高奖励区域的探索。在布局到图像和数量感知生成上的实验表明,PATHS在对齐质量上取得了一致的提升,尤其是在复杂提示上。

英文摘要

Inference-time reward alignment steers pretrained diffusion and flow-based generative models to satisfy user-specified rewards without retraining. Recently, Sequential Monte Carlo (SMC) has emerged as a powerful framework for this task by iteratively filtering and propagating multiple particles. However, we show that standard SMC-based methods often suffer from poor performance because they initialize particles from a standard prior, whereas high-reward regions in complex reward landscapes are extremely rare. Further, we show that even recent reward-aware initial sampling approaches remain vulnerable to getting trapped in local modes, as complex reward landscapes are often multi-modal. To overcome these limitations, we propose PATHS (PArallel Tempering for High-complexity reward Sampling), a novel initialization method that couples multiple sampling chains through parallel tempering. PATHS maintains a ladder of reward-tempered chains and periodically performs Metropolis swaps, enabling efficient exploration across flattened reward landscapes, thereby mitigating the mode-trapping issues. Our analysis reveals that this mechanism substantially enhances the finite-budget exploration of rare, high-reward regions that are typically challenging to sample. Experiments on layout-to-image and quantity-aware generation show that PATHS achieves consistent gains in alignment quality, particularly on complex prompts.
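The ladder of reward-tempered chains with periodic Metropolis swaps can be sketched on a one-dimensional toy problem. Everything specific here is an assumption for illustration rather than the PATHS implementation: the bimodal `reward`, the standard normal prior, the tempered target $p_\beta(x) \propto \mathcal{N}(x; 0, 1)\exp(\beta\, r(x))$, the random-walk proposal, and all function names.

```python
import math
import random


def reward(x):
    """Toy bimodal reward: a small bump near -2 and a taller one near +3."""
    return 2.0 * math.exp(-(x + 2.0) ** 2) + 4.0 * math.exp(-(x - 3.0) ** 2)


def log_target(x, beta):
    """Log density, up to a constant, of a N(0, 1) prior tilted by beta * reward."""
    return -0.5 * x * x + beta * reward(x)


def swap_log_ratio(x_i, x_j, beta_i, beta_j):
    """Log Metropolis ratio for exchanging the states of two tempered chains.

    The prior terms cancel, leaving (beta_i - beta_j) * (r(x_j) - r(x_i)).
    """
    return (beta_i - beta_j) * (reward(x_j) - reward(x_i))


def accept(rng, log_ratio):
    """Metropolis rule: accept with probability min(1, exp(log_ratio))."""
    return rng.random() < math.exp(min(0.0, log_ratio))


def parallel_tempering(betas, steps, swap_every=5, step_size=0.5, seed=0):
    """Run one chain per inverse temperature and return the final states."""
    rng = random.Random(seed)
    states = [rng.gauss(0.0, 1.0) for _ in betas]
    for t in range(1, steps + 1):
        # Local random-walk Metropolis move within each tempered chain.
        for k, beta in enumerate(betas):
            proposal = states[k] + rng.gauss(0.0, step_size)
            delta = log_target(proposal, beta) - log_target(states[k], beta)
            if accept(rng, delta):
                states[k] = proposal
        # Periodically propose swaps between neighbouring rungs of the ladder.
        if t % swap_every == 0:
            for k in range(len(betas) - 1):
                ratio = swap_log_ratio(states[k], states[k + 1], betas[k], betas[k + 1])
                if accept(rng, ratio):
                    states[k], states[k + 1] = states[k + 1], states[k]
    return states
```

Chains with small `beta` see a flattened reward landscape and move between modes easily. A swap is always accepted when it hands a higher-reward state to the chain with the larger `beta`, which is how discoveries made by the flat chains reach the sharply tilted ones.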

2605.30981 2026-06-01 cs.CL cs.LG 版本更新

Cognitive Fatigue in Autoregressive Transformers: Formalization and Measurement

自回归Transformer中的认知疲劳:形式化与测量

Riju Marwah, Ritvik Garimella, Vishal Pallagani, Atishay Jain, Michael Stewart, Amit Sheth

发表机构 * Guru Gobind Singh Indraprastha University, India(古鲁·戈宾德·辛格·印度普拉斯塔大学) Artificial Intelligence Institute, University of South Carolina, USA(人工智能研究所,南卡罗来纳大学) Indian Institute of Technology, Kanpur, India(印度理工学院,坎浦尔) Indian AI Research Organization, India(印度人工智能研究组织)

AI总结 本文形式化自回归语言模型在长程生成中的退化现象为认知疲劳,并提出轻量级诊断指标疲劳指数(FI),通过聚合注意力衰减、表征漂移和熵校准三个信号实现实时监测,实验表明FI能高精度预测任务退化和重复生成。

Comments 9 pages, 7 figures. Accepted at the 43rd International Conference on Machine Learning (ICML 2026)

详情
AI中文摘要

自回归语言模型在长程生成过程中经常退化,产生重复文本、失去指令遵循能力并表现出不稳定的熵。尽管这些失败普遍存在,但从业者缺乏在线诊断工具来实时检测它们。我们将这种退化形式化为认知疲劳,这是一种可测量的生成时状态,其特征是对原始提示的注意力衰减、表征漂移和熵校准失准。我们引入了疲劳指数(FI),这是一种轻量级、模型无关的诊断方法,在明确的公理(单调性、有界性、可解释性)下聚合这三个信号,从而实现可靠的运行时监控。在九个模型(1B-13B参数)上,FI轨迹表现出结构化的时间动态,能够预测任务退化(AUROC = 0.95)和重复(Spearman rho = 0.94),并揭示了非单调的缩放行为:低于3B的指令微调模型比基础模型崩溃更快,而在7B时这一趋势逆转。压力分析进一步表明,在更长的上下文、中间位置的证据和降低的数值精度下,FI的起始(onset)会更早出现。这些结果确立了认知疲劳作为一个连贯且可测量的现象,并将FI定位为生产级LLM系统中运行时可靠性监控的原则性工具。

英文摘要

Autoregressive language models frequently degrade during long-horizon generation, producing repetitive text, losing instruction adherence, and exhibiting unstable entropy. Despite the prevalence of these failures, practitioners lack online diagnostics to detect them in real time. We formalize this degradation as cognitive fatigue, a measurable generation-time state characterized by decay in attention to the original prompt, representational drift, and entropy miscalibration. We introduce the Fatigue Index (FI), a lightweight, model-agnostic diagnostic that aggregates these three signals under explicit axioms (monotonicity, boundedness, interpretability), enabling reliable runtime monitoring. Across nine models (1B-13B parameters), FI trajectories exhibit structured temporal dynamics, predict task degradation (AUROC = 0.95) and repetition (Spearman rho = 0.94), and reveal non-monotonic scaling behavior: instruction-tuned models below 3B exhibit faster collapse than base models, with this trend reversing at 7B. Stress analyses further show that FI onset accelerates under longer contexts, middle-positioned evidence, and reduced numerical precision. These results establish cognitive fatigue as a coherent and measurable phenomenon, and position FI as a principled tool for runtime reliability monitoring in production LLM systems.

2605.30976 2026-06-01 stat.ML cs.IT cs.LG math.IT 版本更新

Batched Stochastic Linear Bandits with 1-Bit Communication Constraints

具有1比特通信约束的批量随机线性赌博机

Ivan Lau, Daniel McMorrow, Kevin Jamieson, Jonathan Scarlett

发表机构 * National University of Singapore(新加坡国立大学) University of Washington(华盛顿大学)

AI总结 研究在批量大小B和每批仅1比特反馈的通信约束下,随机线性赌博机的遗憾最小化问题,提出了两种基于G-最优设计和1比特均值估计的相位消除算法,实现了接近无约束线性赌博机的最优遗憾。

详情
AI中文摘要

我们研究了在批处理和通信约束的自然组合下的随机线性赌博机:时间范围被划分为大小均为$B$的批次,在每个批次中,学习器向一个智能体发送$B$个请求的臂拉动,智能体观察相应的$B$个奖励,并用单个比特的反馈回复学习器。对于每个批次,学习器指定智能体使用的1比特量化规则,该规则可能依赖于所有先前接收到的比特,但不直接依赖于任何过去的奖励。这一设置解决了先前模型(仅有每轮量化或仅有总比特预算)之间一个显著但尚未探索的“中间地带”。我们建立了一个极小极大下界,表明由于1比特通信瓶颈,即使在没有噪声的情况下,$\Omega(B\min\{d,\log\lvert \mathcal{A} \rvert\})$的遗憾也是不可避免的。结合标准的统计极限,这给出了一个通用的下界$\widetilde{\Omega}(B\min\{d,\log\lvert \mathcal{A} \rvert\} + \sqrt{dT \min\{d,\log\lvert \mathcal{A} \rvert\}})$。我们开发了两种基于$G$-最优设计和1比特均值估计的相位消除算法。第一种算法实现了$\widetilde{O}(dB + d\sqrt{T})$的遗憾,当$\lvert \mathcal{A} \rvert = \exp(\Omega(d))$时,在对数因子内与下界匹配;第二种算法结合了安全臂识别和热启动过程,获得了$\widetilde{O}(B\log\lvert \mathcal{A} \rvert + d^{3/2}\sqrt{B} + \sqrt{dT\log\lvert \mathcal{A} \rvert})$的遗憾,在$(\lvert \mathcal{A} \rvert, B, d, T)$的广泛缩放范围内接近最优。总之,我们的结果表明,每批仅需一个比特的反馈就足以在广泛的缩放范围内几乎匹配无约束线性赌博机的极小极大遗憾,即使对于$\Theta(\sqrt{T})$这样大的批量大小也是如此。

英文摘要

We study stochastic linear bandits under a natural combination of batching and communication constraints: the time horizon is partitioned into batches of equal size $B$, and during each batch the learner sends $B$ requested arm pulls to an agent, who then observes the corresponding $B$ rewards and responds with a single bit of feedback to the learner. For each batch, the learner specifies the 1-bit quantization rule the agent uses, which may depend on all previously received bits but not on any past rewards directly. This setting addresses a significant yet unexplored "middle ground" between previous models having per-round quantization only or total bit budgets only. We establish a minimax lower bound showing that $\Omega(B\min\{d,\log\lvert \mathcal{A} \rvert\})$ regret is unavoidable due to the 1-bit communication bottleneck, even in the absence of noise. Combined with standard statistical limits, this yields a general lower bound of $\widetilde{\Omega}(B\min\{d,\log\lvert \mathcal{A} \rvert\} + \sqrt{dT \min\{d,\log\lvert \mathcal{A} \rvert\}})$. We develop two phased-elimination algorithms based on $G$-optimal designs and 1-bit mean estimation. The first achieves $\widetilde{O}(dB + d\sqrt{T})$ regret, matching the lower bound up to logarithmic factors when $\lvert \mathcal{A} \rvert = \exp(\Omega(d))$, and the second incorporates a safe-arm identification and warm-start procedure to obtain $\widetilde{O}(B\log\lvert \mathcal{A} \rvert + d^{3/2}\sqrt{B} + \sqrt{dT\log\lvert \mathcal{A} \rvert})$ regret, which is near-optimal in broad scaling regimes of $(\lvert \mathcal{A} \rvert, B, d, T)$. Together, our results demonstrate that a single bit of feedback per batch suffices to nearly match the minimax regret of unconstrained linear bandits in broad scaling regimes, even for batch sizes as large as $\Theta(\sqrt{T})$.
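The one-bit-per-batch protocol can be illustrated with a noiseless toy. This is a sketch under strong assumptions and not the paper's estimator: the quantization rule is a simple threshold on the batch average, the learner picks each threshold by bisection using only previously received bits, and the names `one_bit_reply`, `estimate_mean`, and `pull_batch` are hypothetical.

```python
def one_bit_reply(rewards, threshold):
    """Agent side: compress a whole batch of rewards into a single bit."""
    batch_mean = sum(rewards) / len(rewards)
    return 1 if batch_mean >= threshold else 0


def estimate_mean(pull_batch, num_batches, low=0.0, high=1.0):
    """Learner side: bisection on the threshold, driven only by past bits.

    Assumes noiseless rewards with mean in [low, high]; the learner never
    sees the rewards themselves, only one bit per batch.
    """
    for _ in range(num_batches):
        threshold = (low + high) / 2
        if one_bit_reply(pull_batch(), threshold):
            low = threshold   # batch mean is at or above the threshold
        else:
            high = threshold  # batch mean is below the threshold
    return (low + high) / 2
```

In this idealized setting the interval halves with every batch, so the number of batches rather than the number of pulls limits the precision; with noisy rewards a more careful rule is needed.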

2605.30960 2026-06-01 cs.LG 版本更新

Revisiting Zeroth-Order Hessian Approximation: A Single-Step Policy Optimization Lens

重新审视零阶Hessian近似:单步策略优化视角

Junbin Qiu, Zhaowei Hong, Renzhe Xu, Yao Shu

发表机构 * Hong Kong University of Science and Technology (Guangzhou)(香港科技大学(广州)) Shanghai University of Finance and Economics(上海财经大学)

AI总结 本文通过单步策略优化视角统一零阶Hessian估计,提出方差缩减的ZoVH框架,实现全Hessian矩阵、正则化逆及偏差校正逆Hessian-梯度积的高效估计。

详情
AI中文摘要

精确的零阶Hessian估计是无导数方法的基石,对于双层优化、贝叶斯推断和不确定性量化等任务至关重要。然而,在高维设置中获取完整的低方差Hessian及其逆估计器仍然是一个重大挑战。为了解决这一问题,我们提出了一个统一框架,通过单步策略优化的视角重新解释零阶Hessian近似。该视角建立了通用零阶Hessian估计器与平滑策略优化目标Hessian之间的理论等价性,将不同的经典随机估计器统一为基线选择的特定实例。在此基础上,我们引入了ZoVH,一个针对全Hessian矩阵、其正则化逆以及偏差校正的逆Hessian-梯度积的方差缩减估计器套件。ZoVH利用两种关键技术:(1) 推导出的唯一最优基线,可证明最小化方差;(2) 一种查询重用策略,结合历史函数查询以提高样本效率而不增加成本。我们严格的理论分析证实了Hessian估计器的无偏性,验证了基线的方差最优性,提供了整个ZoVH套件的误差界,并为由此产生的曲率感知零阶算法建立了收敛保证。广泛的实证结果验证了我们的理论发现,表明ZoVH在实际应用中实现了卓越的估计精度和收敛性能。代码可在 https://github.com/Qjbtiger/ZoVH 获取。

英文摘要

Accurate Zeroth-Order (ZO) Hessian estimation is a cornerstone of derivative-free methods, essential for tasks such as bilevel optimization, Bayesian inference, and uncertainty quantification. However, obtaining a complete suite of low-variance estimators for the Hessian and its inverse in high-dimensional settings remains a significant challenge. To address this, we propose a unified framework that reinterprets ZO Hessian approximation through the lens of single-step Policy Optimization (PO). This perspective establishes a theoretical equivalence between general ZO Hessian estimators and the Hessian of a smoothed PO objective, unifying distinct classical randomized estimators as specific instances of baseline selection. Building on this foundation, we introduce ZoVH, a comprehensive suite of variance-reduced estimators for the full Hessian matrix, its regularized inverse, and the bias-corrected inverse Hessian-gradient product. ZoVH leverages two key techniques: (1) a unique optimal baseline derived to provably minimize variance, and (2) a query reuse strategy that incorporates historical function queries to enhance sample efficiency without inflating costs. Our rigorous theoretical analysis confirms the unbiasedness of the Hessian estimator, validates the variance optimality of our baseline, provides error bounds for the entire ZoVH suite, and establishes convergence guarantees for the resulting curvature-aware ZO algorithm. Extensive empirical results validate our theoretical findings, demonstrating that ZoVH achieves superior estimation accuracy and convergence performance in real-world applications. Code is available at https://github.com/Qjbtiger/ZoVH
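For orientation, a classical Gaussian-smoothing Hessian estimator with a simple baseline can be sketched as follows. This is not ZoVH and does not use the paper's optimal baseline: it is the textbook Stein-identity estimator with an antithetic pair and the central value `f(x)` as the subtracted baseline, and the function name and parameters are hypothetical.

```python
import random


def zo_hessian(f, x, mu=0.1, num_samples=60000, seed=0):
    """Monte Carlo Hessian estimate from function values only.

    Averages (f(x+mu*u) + f(x-mu*u) - 2*f(x)) / (2*mu^2) * (u u^T - I)
    over standard Gaussian directions u. Subtracting 2*f(x) acts as a
    baseline: it leaves the mean unchanged because E[u u^T - I] = 0.
    """
    rng = random.Random(seed)
    d = len(x)
    total = [[0.0] * d for _ in range(d)]
    fx = f(x)
    for _ in range(num_samples):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        f_plus = f([xi + mu * ui for xi, ui in zip(x, u)])
        f_minus = f([xi - mu * ui for xi, ui in zip(x, u)])
        scale = (f_plus + f_minus - 2.0 * fx) / (2.0 * mu * mu)
        for i in range(d):
            for j in range(d):
                identity = 1.0 if i == j else 0.0
                total[i][j] += scale * (u[i] * u[j] - identity)
    return [[entry / num_samples for entry in row] for row in total]
```

For a quadratic objective this estimator is unbiased for the true Hessian, but its variance is large, which is the kind of cost that variance-reduced baselines aim to cut.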

2605.30936 2026-06-01 cs.LG math.OC stat.ML 版本更新

Local linear convergence of gradient methods for overparameterized Gaussian mixtures

过参数化高斯混合模型梯度方法的局部线性收敛性

Jingxing Wang, Vasileios Charisopoulos, Maryam Fazel

发表机构 * Electrical & Computer Engineering, University of Washington(华盛顿大学电气与计算机工程系) National Institute for Theory and Mathematics in Biology(生物理论与数学国家研究所) Amazon, Inc.(亚马逊公司)

AI总结 针对过参数化高斯混合模型,提出一种交替使用短梯度步和长Polyak步的方法,实现局部线性收敛速率,克服了过参数化导致的慢收敛问题。

Comments 45 pages, 7 figures

详情
AI中文摘要

我们研究了过参数化下学习高斯混合模型的问题。先前的工作表明,虽然过参数化对于避免虚假局部最优和通过梯度EM算法实现全局恢复真实模型至关重要,但它会显著减慢局部收敛速度。在混合权重的某些假设下,我们证明了统计学习过程最小化的标准散度度量具有一个缓慢增长的流形,在该流形上著名的Polyak步长可以几何级地减少损失,并设计了一种基于梯度的方法,该方法以局部线性速率收敛到极小值点。此外,我们表明,对于具有任意权重的混合模型,我们的方法收敛到接近最优的解——直到一个自然的误设阈值。在高层次上,该方法在接近流形的几个“短”梯度下降步和收缩到极小值点距离的“长”Polyak步之间交替。我们的结果表明,慢收敛不是过参数化的内在挑战,而是可以通过利用损失景观的有利结构来克服。

英文摘要

We study the problem of learning Gaussian mixture models under overparameterization. Prior work has shown that while overparameterization is essential for avoiding spurious local optima and enables global recovery of the ground-truth model using the gradient-EM (expectation-maximization) algorithm, it can dramatically slow down the local rate of convergence. Under certain assumptions on the mixture weights, we show that a standard divergence measure minimized by statistical learning procedures possesses a manifold of slow growth on which the well-known Polyak stepsize reduces the loss geometrically, and design a gradient-based method that converges to minimizers at a locally linear rate. Additionally, we show that our method converges to nearly optimal solutions -- up to a natural misspecification threshold -- for mixtures with arbitrary weights. At a high level, the method alternates between several "short" gradient descent steps that approach the manifold and "long" Polyak steps that contract the distance to minimizers. Our results suggest that slow convergence is not an intrinsic challenge of overparameterization, but can be overcome by exploiting the favorable structure of the loss landscape.
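The effect of the Polyak stepsize on a slowly growing loss can be seen in a one-dimensional toy. This sketch is not the paper's alternating method: it assumes the optimal value `f_star` is known and uses $f(x) = x^4$ as a stand-in for a loss with slow growth near its minimizer; all function names are hypothetical.

```python
def polyak_descent(f, grad, x, f_star, num_steps):
    """Gradient descent with the Polyak stepsize (f(x) - f_star) / ||g||^2."""
    for _ in range(num_steps):
        g = grad(x)
        g_sq = sum(gi * gi for gi in g)
        if g_sq == 0.0:
            break  # stationary point reached
        step = (f(x) - f_star) / g_sq
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def gradient_descent(grad, x, step, num_steps):
    """Plain gradient descent with a fixed stepsize, for comparison."""
    for _ in range(num_steps):
        x = [xi - step * gi for xi, gi in zip(x, grad(x))]
    return x
```

On $f(x) = x^4$ the Polyak update simplifies to $x \leftarrow \tfrac{3}{4}x$, a geometric contraction, whereas fixed-step gradient descent slows down as the gradient $4x^3$ vanishes near the minimizer.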

2605.30919 2026-06-01 cs.LG cs.AI 版本更新

De-attribute to Forget for LLM Unlearning

Xinyang Lu, Jiabao Pan, Rachael Hwee Ling Sim, See-Kiong Ng, Anthony Kum Hoe Tung, Bryan Kian Hsiang Low

发表机构 * Department of Computer Science, National University of Singapore(新加坡国立大学计算机科学系)

AI总结 本文提出基于数据归因奖励的LLM遗忘框架DareU,通过强化学习降低生成响应与遗忘数据的归因分数,实现有效遗忘并平衡模型效用。

详情
AI中文摘要

大型语言模型(LLM)的快速发展引发了对使用不当数据进行训练的担忧,这导致了对LLM遗忘研究的兴趣日益增长。许多现有的LLM遗忘方法依赖于优化预测损失,例如最大化遗忘集上的损失,但常常面临过度遗忘和模型效用差等关键问题。为了解决这些问题,本文创新地将LLM遗忘的优化目标定义为归零数据归因。具体而言,我们提出了第一个基于数据归因奖励的LLM遗忘框架,称为DareU,该框架通过强化学习来更新LLM,通过降低其生成响应与遗忘数据所有者的归因分数(即去归因)来实现遗忘。使用LLM分类器作为归因的有效近似进行的实证评估表明,DareU在实现有效遗忘的同时,很好地平衡了遗忘质量和模型效用,优于现有基线。

英文摘要

The rapid development of large language models (LLMs) has raised concerns about the use of inappropriate data for training, which has led to growing interest in LLM unlearning. Many existing LLM unlearning approaches rely on optimizing prediction losses, such as maximizing the loss on the forget set, but often face critical issues like over-forgetting and poor model utility. To address these issues, this paper instead frames the optimization objective for LLM unlearning as zeroing out data attribution. In particular, we propose DareU, the first LLM unlearning framework based on data attribution rewards, which uses reinforcement learning to update the LLM by reducing the attribution score of its generated responses to the forget data owners (i.e., de-attributing). Empirical evaluation using an LLM classifier as an efficient approximation of attribution shows that DareU outperforms existing baselines by achieving effective unlearning while balancing forget quality and model utility well.

2605.30916 2026-06-01 cs.LG cs.GT econ.TH 版本更新

Welfare, Improvability, and Variance: A Principal-Agent Approach to Optimal Benchmark Item Aggregation

福利、可改进性与方差:最优基准测试项聚合的主-代理方法

Andreas Haupt, Justin Hartenstein, Anka Reuel, Mykel Kochenderfer, Sanmi Koyejo

发表机构 * Department of Economics & Computer Science(经济与计算机科学系) Institute for Computational and Mathematical Engineering(计算与数学工程研究所) Department of Computer Science(计算机科学系) Department of Aeronautics & Astronautics(航空与航天系)

AI总结 提出将基准测试建模为多任务主-代理博弈,通过福利、可改进性和方差三个维度评估项目,并应用于OLMES数据集识别帕累托劣势项目。

详情
AI中文摘要

AI基准测试存在记录完善的局限性,先前研究探讨了污染、饱和以及构造不明确等问题。聚合受到的关注要少得多:基准测试通常通过统一平均项目级分数来总结,隐含地将每个测试项目视为同等重要。我们将基准测试建模为多任务主-代理博弈,并表明基准测试的福利损失由三个项目级原始要素共同决定:与规范性福利优先级的一致性、边际可改进性和性能方差。我们将该理论转化为一个审计框架,沿这三个轴对项目进行排序,并使用WORKBank(福利)、EvoLM 4B套件(可改进性)和PolyPythias 410M面板(方差)将其应用于OLMES项目。该框架揭示了在OLMES中,在亲工人福利操作化下帕累托劣势的项目。所有代码可在 https://github.com/stair-lab/principal-agent-benchmarks 获取。

英文摘要

AI benchmarks have well-documented limitations, with prior work examining contamination, saturation, and construct underspecification. Aggregation has received far less attention: benchmarks are typically summarized by uniformly averaging item-level scores, implicitly treating every test item as equally valuable. We model benchmarking as a multitask principal-agent game and show that the welfare loss from a benchmark is determined jointly by three item-level primitives: alignment with normative welfare priorities, marginal improvability, and performance variance. We translate the theory into an audit framework that ranks items along each of these three axes, and apply it to OLMES items using WORKBank for welfare, the EvoLM 4B suite for improvability, and the PolyPythias 410M panel for variance. The framework surfaces items that are Pareto-inferior within OLMES subject to a pro-worker welfare operationalization. All code is available at https://github.com/stair-lab/principal-agent-benchmarks.

2605.30914 2026-06-01 cs.LG cs.SE 版本更新

Automating Formal Verification with Reinforcement Learning and Recursive Inference

用强化学习和递归推理自动化形式验证

Max Tan

发表机构 * Department of Electrical Engineering and Computer Science(电气工程与计算机科学系) Massachusetts Institute of Technology(麻省理工学院)

AI总结 研究通过可验证奖励的强化学习和验证器引导的推理搜索,提升大语言模型生成验证程序和证明的能力,在Dafny和Lean上取得显著进展。

Comments Master's thesis, 140 pages, 16 figures, 17 tables

详情
AI中文摘要

自动化形式验证对大语言模型仍然具有挑战性,因为证明助手和验证感知语言的数据稀缺,且正确性取决于满足精确的机器可检查规范,而非生成合理的代码。本文研究验证器环境如何通过可验证奖励的强化学习(RLVR)和验证器引导的推理时搜索,改进大语言模型生成验证程序和证明的能力。首先,我们使用组相对策略优化(GRPO)及相关变体,在Dafny中训练开源模型,将生成的候选程序组装成完整程序,并根据编译器和验证器的结果进行评分。在APPS衍生的Dafny数据集上的初步实验将验证奖励从2.2%提升至58.1%,但发现了规范破解问题,即模型利用弱形式规范而非实现预期解决方案。在过滤掉欠规范和易受攻击的任务后,多轮RLVR在改进的基准上将验证通过率从9.7%提升至31.1%。其次,我们在Lean中开发了一个验证器引导的推理框架,将证明生成视为对分解子目标、验证器反馈、诊断和修复的结构化搜索。使用固定的基础模型,包含证明修订器的完整框架在初始VeriCoding试点集上将通过率从直接修复的46.2%提升至69.2%。在更大的VERINA数据集上,整体任务分解加上证明修订器解决了42个先前未解决任务中的7个。我们还引入了Dalek-Bench,一个从Rust $\texttt{curve25519-dalek}$验证项目派生的仓库级Lean基准;初步结果仍然较弱,表明仍需更强的进度评估和特定任务的工具使用策略。

英文摘要

Automated formal verification remains challenging for large language models because data for proof assistants and verification-aware languages is scarce, and correctness depends on satisfying precise machine-checkable specifications rather than producing plausible code. This thesis studies how verifier environments can improve LLM generation of verified programs and proofs through reinforcement learning from verifiable rewards (RLVR) and verifier-guided inference-time search. First, we train open-source models in Dafny with RLVR using Group Relative Policy Optimization (GRPO) and related variants, assembling generated candidates into complete programs and scoring them with compiler and verifier outcomes. Initial experiments on an APPS-derived Dafny dataset increased verified reward from 2.2% to 58.1%, but revealed specification hacking, where models exploit weak formal specifications instead of implementing the intended solutions. After filtering underspecified and vulnerable tasks, multi-turn RLVR on the refined benchmark improves the verified pass rate from 9.7% to 31.1%. Second, we develop a verifier-guided inference scaffold in Lean that treats proof generation as structured search over decomposed subgoals, verifier feedback, diagnostics, and repair. With a fixed base model, the full scaffold with proof reviser improves pass rate on an initial VeriCoding pilot set from 46.2% under direct repair to 69.2%. On the larger VERINA dataset, whole-task decomposition plus proof reviser solves 7 of 42 previously unsolved tasks. We also introduce Dalek-Bench, a repository-scale Lean benchmark derived from the Rust $\texttt{curve25519-dalek}$ verification project; preliminary results remain weak, indicating that stronger progress evaluation and task-specific tool-use policies are still needed.
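The group-relative scoring at the heart of GRPO can be sketched in a few lines. This is a simplified illustration rather than the thesis code: it assumes each sampled program receives a scalar verifier reward (for example 1.0 if it verifies, 0.0 otherwise) and normalizes within the group using the population standard deviation; the function name and `eps` parameter are hypothetical.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the other samples for the same prompt.

    Samples that beat the group average get positive advantages; if every
    sample has the same reward, all advantages are zero and the group
    contributes no learning signal.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

With binary verifier rewards, a group in which nothing verifies yields no gradient, so very low initial pass rates make early training slow.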

2605.30910 2026-06-01 cs.LG 版本更新

PINNs Failure Modes are Overfitting

PINNs 的失败模式是过拟合

Nigel T. Andersen, Takashi Matsubara

发表机构 * Graduate School of Information Science and Technology(信息科学与技术研究生院) RIKEN Center for Advanced Intelligence Project (AIP)(RIKEN高级智能项目中心(AIP))

AI总结 本文通过可视化残差证明物理信息神经网络的失败模式源于过拟合,并提出基于正则化和双反向传播的方法来消除失败模式,在标准方程上以更少的配置点实现最先进性能。

详情
AI中文摘要

物理信息神经网络(PINNs)是一类常见的基于机器学习的偏微分方程(PDE)求解器,它们通过最小化编码 PDE 的残差损失来训练网络以表示解。尽管取得了成功,但已知它们在某些简单方程上会失败,收敛到不正确的解,尽管损失很低。这些失败模式在过去几年中引起了文献中的广泛关注,激发了基于架构和优化的解决方案。通过直接可视化残差,我们表明失败模式是过拟合的结果:损失在配置点上被最小化,但在其他地方则不然。应用正则化会使失败模式消失。最后,我们将双反向传播扩展到整个残差集,并使用它在四个标准失败模式方程上实现了最先进的性能,配置点数量减少多达 $23\times$,且使用普通架构。

英文摘要

Physics-Informed Neural Networks (PINNs) are a common class of machine learning-based partial differential equation (PDE) solvers that train a network to represent a solution by minimizing a residual loss that encodes the PDE. Despite their successes, they are known to fail on certain simple equations, converging to an incorrect solution despite low loss. These failure modes have garnered significant attention in the literature over the past several years, motivating both architectural and optimization-based solutions. By directly visualizing the residual, we show that failure modes are the result of overfitting: the loss is minimized on the collocation points, but not elsewhere. Applying regularization causes the failure modes to vanish. Finally, we extend double backpropagation over the full set of residuals, and use it to achieve state-of-the-art performance on four standard failure mode equations with up to $23\times$ fewer collocation points and a vanilla architecture.

2605.30907 2026-06-01 cs.SE cs.AI cs.CL cs.LG 版本更新

BlueFin: Benchmarking LLM Agents on Financial Spreadsheets

BlueFin: 在金融电子表格上对LLM智能体进行基准测试

Srivatsa Kundurthy, Clara Na, Colton Moraine, Anoushka Mohta, Case Winter, George Fang, John Ling, Emma Strubell, Zach Kirshner

发表机构 * Longitude Labs Inc.(Longitude Labs公司) Cornell University(康奈尔大学) Carnegie Mellon University(卡内基梅隆大学)

AI总结 提出BlueFin基准,通过131个真实金融电子表格任务评估LLM智能体的合成、操作和理解能力,并验证了LM评判与人类专家的一致性。

Comments 26 pages

详情
AI中文摘要

我们提出BlueFin,一个基准测试,要求大语言模型(LLM)智能体在专业金融领域的电子表格工作簿上执行合成、操作和理解任务。尽管全球电子表格软件付费用户估计数亿——比全球专业开发人员估计数量高一个数量级——但投入探索和扩展LLM在电子表格领域能力的资源相对较少,而专门用于反映专业金融角色实际职业任务的资源更少。为此,我们整理了131个具有现实相关性的挑战性复杂任务,包含3225个细粒度评分标准;值得注意的是,我们的评分标准和LM评判评估由一组专家人工标注员验证,从而对难以通过编程验证但可由LM评判智能体可靠评估的复杂任务进行高质量、细粒度的评估。我们的评判与专家共识达到一致(α=0.826),宏F1得分为0.839。前沿LLM在此挑战性基准上表现不佳,最强LLM在任务上的平均得分低于50%——模型在动态正确性方面表现出特别弱点。我们的贡献包括:涵盖三类电子表格任务的示例数据集、开源工具包和智能体评估框架,以及现有前沿模型在我们基准上的性能表征。

英文摘要

We present BlueFin, a benchmark that tasks large language model (LLM) agents with synthesis, manipulation, and comprehension tasks over spreadsheet workbooks in the professional finance domain. Though estimates of the global population of paying users of spreadsheet software range in the hundreds of millions -- an order of magnitude more than the estimated global population of professional developers -- comparatively fewer resources have been devoted to exploring and expanding LLM capabilities in the spreadsheet domain, with fewer still dedicated to mirroring real occupational tasks encountered by those in professional finance roles. In response, we curate a set of 131 challenging, complex tasks with real-world relevance in the domain, containing 3,225 granular rubric criteria; notably, our rubric criteria and LM judge evaluations are validated by a team of expert human annotators, resulting in high-quality, granular evaluations of complex tasks that are difficult to verify programmatically but can be reliably evaluated by an LM judge agent. Our judge achieves parity with expert consensus ($\alpha = 0.826$) with a macro-F1 score of 0.839. Frontier LLMs demonstrate poor performance on the challenging benchmark, with the strongest LLMs achieving less than 50% average scores across tasks -- models exhibit particular weaknesses in dynamic correctness. Our contributions include a dataset of examples across three categories of spreadsheet tasks, an open source harness and agentic evaluation framework, and a characterization of existing frontier models' performance on our benchmark.

2605.30905 2026-06-01 math.OC cs.LG 版本更新

A Unifying View of Anchoring via Operator-Side Tikhonov Regularization

通过算子侧Tikhonov正则化实现锚定的统一视角

Zihao Chen

发表机构 * UC Berkeley(加州大学伯克利分校)

AI总结 本文提出锚定固定点和单调方程方法可通过在基础方法查询的算子上添加消失的Tikhonov正则项来统一构造,并分析了四种变体的残差收敛率。

详情
AI中文摘要

锚定不动点和单调方程方法,包括Halpern迭代、额外锚定梯度及其相关方法,通过向参考点添加消失的拉力来获得最后迭代保证。现有的锚定变体通常能获得尖锐的最后迭代保证,但从更新层面来看,锚点的放置可能是算法特定的且概念上不透明。我们表明锚定允许一个单一的算子侧构造:用消失的Tikhonov项正则化基础方法查询的算子,然后运行未修改的基础方法。应用于Picard迭代,该配方重现了Halpern迭代;应用于前向步、外梯度(EG)和过去外梯度(PEG,也称为Popov方法),它产生了三种变体,其锚点放置继承了基础方法的查询模式。前向步实例化给出了一个新的残差收敛保证,而EG和PEG实例化给出了新的正则化变体。四种分析共享一个残差递推关系,恢复了Halpern残差范数的$O(1/k)$收敛速率,为正则化前向步给出了$O(1/\sqrt{k})$,并在无约束单调Lipschitz设置下为正则化EG和PEG变体给出了$O(1/k)$。

英文摘要

Anchored fixed point and monotone equation methods, including Halpern iteration, extra anchored gradient, and their relatives, add a vanishing pull toward a reference point to obtain last-iterate guarantees. Existing anchored variants often achieve sharp last-iterate guarantees, but from the update-level perspective the placement of the anchor can be algorithm-specific and conceptually opaque. We show that anchoring admits a single operator-side construction: regularize the operator queried by the base method with a vanishing Tikhonov term, then run the unmodified base method. Applied to the Picard iteration, this recipe reproduces the Halpern iteration; applied to the forward step, extragradient (EG), and past extragradient (PEG, also known as Popov's method), it yields three variants whose anchor placements inherit the base method's query pattern. The forward-step instantiation gives a new residual convergence guarantee, while the EG and PEG instantiations give new regularized variants. The four analyses share a residual recurrence, recovering the $O(1/k)$ Halpern residual-norm convergence rate, giving $O(1/\sqrt{k})$ for the regularized forward step, and giving $O(1/k)$ for the regularized EG and PEG variants in the unconstrained monotone Lipschitz setting.
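The difference between the Picard and Halpern iterations can be checked on a small nonexpansive map. This sketch shows only the standard Halpern update with anchor weight $1/(k+2)$, not the paper's operator-side construction; the 90-degree rotation is an assumed test operator whose only fixed point is the origin, and all function names are hypothetical.

```python
import math


def rotate90(x):
    """Nonexpansive test operator: rotation by 90 degrees in the plane."""
    return [-x[1], x[0]]


def picard(T, x0, num_steps):
    """Plain fixed-point iteration x_{k+1} = T(x_k)."""
    x = list(x0)
    for _ in range(num_steps):
        x = T(x)
    return x


def halpern(T, x0, num_steps):
    """Anchored iteration x_{k+1} = b_k * x0 + (1 - b_k) * T(x_k), b_k = 1/(k+2)."""
    x = list(x0)
    for k in range(num_steps):
        beta = 1.0 / (k + 2)
        Tx = T(x)
        x = [beta * a + (1.0 - beta) * t for a, t in zip(x0, Tx)]
    return x


def residual_norm(T, x):
    """Fixed-point residual ||x - T(x)|| for a point in the plane."""
    Tx = T(x)
    return math.hypot(x[0] - Tx[0], x[1] - Tx[1])
```

On the rotation, Picard iterates circle the origin forever and the residual never shrinks, while the vanishing pull toward `x0` lets the Halpern residual decay at the $O(1/k)$ rate mentioned in the abstract.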

2605.30903 2026-06-01 cs.LG cs.AI 版本更新

Inverse Reinforcement Learning without an Optimal Demonstrator: A Feasible Reward Set Approach

无最优演示者的逆强化学习:一种可行奖励集方法

Kihyun Kim, Shripad Deshmukh, Nikos Vlassis, Jiawei Zhang

发表机构 * MIT LIDS(麻省理工学院信息与决策系统实验室) University of Massachusetts, Amherst(马萨诸塞大学阿姆赫斯特分校) Adobe Research(Adobe研究院) University of Wisconsin-Madison(威斯康星大学麦迪逊分校)

AI总结 针对多个非最优演示者数据,提出可行奖励集框架,通过线性约束联合可行集单调收缩,并给出恢复保证与高维环境离线算法。

详情
AI中文摘要

逆强化学习(IRL)通常假设来自单个最优演示者的演示,但在许多应用中,数据来自多个具有异质次优性水平的非完美演示者。我们通过可行奖励集框架研究这一设置下的奖励学习:对于每个演示者,我们将其声明的次优性水平编码为线性约束,并在演示者之间对所得可行集取交集。我们的理论分析表明,随着数据的增加,联合可行集单调收缩,并且我们精确刻画了新演示者何时严格收紧该集合。我们进一步为真实最优演示者的可行奖励集建立了两个恢复保证:一个界限依赖于与最优占用度的接近程度,而另一个仅需要足够的覆盖且没有接近最优的演示者。在实际方面,我们引入了解决所得奖励集中固有奖励模糊性的策略,并提供了适用于高维环境的函数逼近离线算法。在表格型网格世界和大语言模型(LLM)微调设置中的实验与理论预测一致,并证明了所提框架相对于基线的有效性。

英文摘要

Inverse reinforcement learning (IRL) typically assumes demonstrations from a single optimal demonstrator, but in many applications data come from multiple imperfect demonstrators with heterogeneous suboptimality levels. We study reward learning in this setting through a feasible-reward-set framework: for each demonstrator, we encode its declared suboptimality level as a linear constraint and intersect the resulting feasible sets across demonstrators. Our theoretical analysis shows that the joint feasible set shrinks monotonically as data are added, and we give an exact characterization of when a new demonstrator strictly tightens it. We further establish two recovery guarantees for the feasible reward set of the ground-truth optimal demonstrator: one bound depends on closeness to the optimal occupancy, while the other requires only sufficient coverage and no near-optimal demonstrator. On the practical side, we introduce strategies to address the inherent reward ambiguity in the obtained reward set and provide an offline algorithm with function approximation for high-dimensional environments. Experiments in tabular grid-world and large language model (LLM) fine-tuning settings are consistent with the theoretical predictions and demonstrate the effectiveness of the proposed framework over baselines.

2605.30901 2026-06-01 cs.LG 版本更新

Density-Guided Robust Counterfactual Explanations on Tabular Data under Model Multiplicity

模型多重性下表格数据的密度引导鲁棒反事实解释

Jun Tan, Qing Guo, Zicheng Xu, Jinglin Li, Qi Fang, Ning Gui

发表机构 * School of Computer Science and Engineering, Central South University, Changsha, China(计算机科学与工程学院,中南大学,长沙,中国)

AI总结 提出DensityFlow生成框架,利用神经ODE和密度评分构建鲁棒反事实解释,避免低密度区域,并在模型多重性下保持有效性。

Comments 26 pages, 11 figures, accepted by ICML 2026

详情
AI中文摘要

反事实解释(CEs)对于可操作的补救措施至关重要,但其可靠性在低密度区域常常受到损害,因为分类器在这些区域表现出高方差。与依赖昂贵的集成交集来定义稳定性的现有方法不同,我们提出了\textit{DensityFlow},一种生成框架,通过遵循高置信度数据流形来构建鲁棒的反事实解释。具体来说,我们将反事实生成建模为由神经ODE参数化的连续时间动力学,并由可微密度评分引导,以主动避免不确定的低密度区域。该密度评分通过噪声对比估计学习,有效利用$(K{+}1)$路判别器来估计密度比。对于黑盒设置,我们引入了一种局部代理蒸馏机制,该机制在CE生成的轨迹内严格地将轻量级代理与目标模型对齐,从而实现高效的基于梯度的优化,且查询次数最少。实验表明,与基于集成的基线相比,\textit{DensityFlow}在模型多重性下实现了优越的有效性,同时显著降低了查询成本。我们的实现可在https://github.com/G-AILab/DensityFlow获取。

英文摘要

Counterfactual explanations (CEs) are essential for actionable recourse, yet their reliability is often compromised in low-density regions, where classifiers exhibit high variance. Unlike existing methods that rely on expensive ensemble intersections to define stability, we propose \textit{DensityFlow}, a generative framework that constructs robust CEs by adhering to the high-confidence data manifold. Specifically, we model the counterfactual generation as continuous-time dynamics parameterized by Neural ODE, guided by a differentiable density score to actively avoid uncertain, low-density areas. This density score is learned via Noise Contrastive Estimation, effectively leveraging a $(K{+}1)$-way discriminator to estimate density ratios. For black-box settings, we introduce a local proxy distillation mechanism that aligns a lightweight surrogate with the target model strictly within the trajectory of CE generation, enabling efficient gradient-based optimization with minimal queries. Experiments demonstrate that \textit{DensityFlow} achieves superior validity under model multiplicity while significantly reducing query costs compared to ensemble-based baselines. Our implementation is available at https://github.com/G-AILab/DensityFlow.

2605.30896 2026-06-01 cs.LG 版本更新

Zero Collapse: A Failure Mode of Policy Gradient Methods in Discontinuous Reward Environments

零坍塌:策略梯度方法在不连续奖励环境中的一种失败模式

Nishant Kumar, Enrique Areyan Viqueira, Amy Greenwald

AI总结 本文发现策略梯度方法在拍卖等不连续奖励环境中会出现“零坍塌”失败模式,即策略因梯度信号消失而陷入零奖励区域,并提出了缓解策略。

Comments 20 pages, 7 figures; includes Appendix

详情
AI中文摘要

重复拍卖中的竞价是强化学习(RL)的一个核心挑战,它结合了连续控制与数字广告的策略复杂性。尽管策略梯度和基于值的方法似乎适合这些设置,但它们常常难以应对拍卖奖励景观的不连续、“悬崖状”特性。例如,在首价拍卖中,竞拍者在达到特定阈值之前获得零奖励,之后奖励随出价增加而减少。这形成了由尖锐边界分隔的平坦零奖励区域。我们识别出这种设置中一个基本的失败模式,称为“零坍塌”。我们表明,随机探索和基于梯度的更新可能导致策略越过最优高奖励区域,进入平坦的零奖励区域。一旦进入,由于缺乏信息性的梯度信号,恢复变得极其样本低效,有效地困住了智能体。我们发现演员-评论家方法特别容易受到影响,因为偏差的值估计会加速向不稳定区域的移动。我们的贡献包括:(1)对不连续奖励如何导致信号消失和零坍塌的机制解释;(2)对策略随机性和步长之间相互作用的分析;(3)在REINFORCE和演员-评论家变体上对该现象的经验演示。我们提出了涉及初始化和架构选择的实用缓解策略以提高稳定性。最后,我们引入了一个正式的拍卖环境RL框架,突出了其独特的结构特性。

英文摘要

Bidding in repeated auctions is a central challenge for reinforcement learning (RL), combining continuous control with the strategic complexities of digital advertising. While policy gradient and value-based methods seem well-suited for these settings, they often struggle with the discontinuous, "cliff-like" nature of auction reward landscapes. In a first-price auction, for example, a bidder receives zero reward until they cross a specific threshold, after which the reward decreases as the bid increases. This creates a landscape of flat, zero-reward regions separated by sharp boundaries. We identify a fundamental failure mode in this setting termed "zero collapse." We show that stochastic exploration and gradient-based updates can cause policies to overshoot optimal high-reward regions and enter flat, zero-reward regimes. Once there, the lack of an informative gradient signal makes recovery extremely sample-inefficient, effectively trapping the agent. We find that actor-critic methods are particularly susceptible, as biased value estimates can accelerate this movement toward unstable regions. Our contributions include: (1) a mechanistic explanation of how discontinuous rewards lead to vanishing signals and zero collapse; (2) an analysis of the interaction between policy stochasticity and step size; and (3) an empirical demonstration of this phenomenon across REINFORCE and actor-critic variants. We propose practical mitigation strategies involving initialization and architectural choices to improve stability. Finally, we introduce a formal RL framework for auction environments highlighting their unique structural properties.
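The cliff-shaped reward and the resulting trap can be reproduced in a deterministic toy. This sketch is far simpler than the stochastic policies studied in the paper: it assumes a single bidder with a known `value`, a fixed winning `threshold`, and plain gradient ascent directly on the bid; all names are hypothetical.

```python
def first_price_reward(bid, value, threshold):
    """Winner pays its own bid; any bid at or below the threshold earns nothing."""
    return value - bid if bid > threshold else 0.0


def reward_slope(bid, threshold):
    """Derivative of the reward in the bid wherever it exists."""
    return -1.0 if bid > threshold else 0.0


def gradient_ascent(bid, threshold, lr, num_steps):
    """Follow the reward slope; returns the full bid trajectory."""
    history = [bid]
    for _ in range(num_steps):
        bid += lr * reward_slope(bid, threshold)
        history.append(bid)
    return history
```

Starting from a winning bid, each step lowers the bid to raise the payoff, until one step overshoots the threshold; from then on the slope is exactly zero and the bid never moves again, even though the best attainable reward sits just above the threshold.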

2605.30892 2026-06-01 cs.LG 版本更新

Bandwidth Allocation with Device Partitioning for Federated Learning over Industrial IoT networks

面向工业物联网联邦学习的设备分区带宽分配

Kangmin Kim, Jaeyoung Song

发表机构 * School of Electrical and Electronics Engineering, Pusan National University(釜山国立大学电气与电子工程学院)

AI总结 针对联邦学习在工业物联网中的通信瓶颈,提出一种基于设备计算能力分区的带宽分配策略,通过顺序分配全带宽给子集来最小化训练时间,并理论证明其优于无分区方案,同时降低上行能耗。

详情
AI中文摘要

我们考虑一个联邦学习(FL)系统,其中工业物联网(IIoT)设备通过无线信道协作训练全局模型,而不共享本地数据。在此类系统中,通信时间是制约整体训练效率的主要瓶颈。与优先考虑个体服务质量需求的传统网络不同,FL系统旨在尽可能高效地收敛到最优全局模型,这需要一种根本不同的带宽分配方法。本文提出一种新颖的带宽分配策略,利用设备计算能力的异构性来最小化总训练时间。该策略并非同时将所有选定设备的带宽分配出去,而是将参与设备划分为有序子集,并依次授予每个子集全带宽的独占访问权。我们正式证明,无论底层调度算法如何,这种基于分区的策略都能实现比任何无分区带宽分配方案更低的训练时间。此外,通过减少每台设备的传输持续时间,该策略还最小化了上行能耗,这对电池受限的IIoT设备尤其有利。在真实数据集(包括工业表面缺陷基准GC10-Det和标准图像分类基准CIFAR-10)上的大量实验表明,与现有带宽分配方案相比,所提策略持续降低了训练时间和能耗,接近轮次时间的理论下界。

英文摘要

We consider a federated learning (FL) system in which Industrial Internet-of-Things (IIoT) devices collaboratively train a global model over wireless channels without sharing local data. In such systems, communication time is a primary bottleneck that constrains overall training efficiency. Unlike conventional networks that prioritize individual quality-of-service requirements, FL systems collectively aim to converge to an optimal global model as efficiently as possible, which calls for a fundamentally different approach to bandwidth allocation. In this paper, we propose a novel bandwidth allocation policy that exploits the heterogeneity of device computing capabilities to minimize total training time. Rather than distributing bandwidth among all selected devices simultaneously, the proposed policy partitions the participating devices into ordered subsets and sequentially grants each subset exclusive access to the full bandwidth. We formally prove that this partitioning-based policy achieves a strictly lower training time than any bandwidth allocation scheme without partitioning, irrespective of the underlying scheduling algorithm. Furthermore, by reducing per-device transmission duration, the proposed policy also minimizes uplink energy consumption, which is particularly beneficial for battery-constrained IIoT devices. Extensive experiments on real-world datasets - including GC10-Det, an industrial surface defect benchmark, and CIFAR-10, a standard image classification benchmark - demonstrate that the proposed policy consistently reduces training time and energy consumption compared to existing bandwidth allocation schemes, approaching the theoretical lower bound on round time.
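The intuition behind sequential full-band access can be checked with a toy timing model. This sketch rests on assumptions not taken from the paper: upload rate scales linearly with the bandwidth share, every device uploads the same model size, the non-partitioned baseline splits bandwidth equally and statically, and each partition holds a single device; the function names are hypothetical.

```python
def round_time_shared(compute_times, upload_time_full_band):
    """All n devices split the band equally, so each upload takes n times longer."""
    n = len(compute_times)
    return max(c + n * upload_time_full_band for c in compute_times)


def round_time_sequential(compute_times, upload_time_full_band):
    """Devices upload one at a time over the full band, fastest computer first."""
    t = 0.0
    for c in sorted(compute_times):
        # A device starts uploading once it has finished computing
        # and the band is free.
        t = max(t, c) + upload_time_full_band
    return t
```

With compute times of 1, 2, 3, and 4 time units and a full-band upload time of 1, the shared scheme finishes the round at time 8 while the sequential scheme finishes at time 5, because fast devices upload while slow ones are still computing. Each device also transmits for 1 unit instead of 4, which is the source of the uplink energy saving in this toy model.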

2605.30889 2026-06-01 physics.chem-ph cs.LG 版本更新

MLIPilot: LLM-Driven Auto-Research for Machine-Learned Interatomic Potentials

MLIPilot:面向机器学习原子间势的LLM驱动自动研究

Etinosa Osaro, Santosh Adhikari, Stamatia Zavitsanou, Kelsey Parker, Dario Rocca

发表机构 * PsiQuantum

AI总结 提出MLIPilot框架,利用大语言模型自动提出假设、编辑训练代码并基于物理约束评分卡优化机器学习原子间势,在QM7和Cu EMT数据集上验证了其有效性。

详情
AI中文摘要

构建生产质量的机器学习原子间势(MLIP)需要在单个训练损失无法捕捉的约束下平衡精度、动力学稳定性和计算吞吐量。我们引入了MLIPilot,一个自动研究框架,其中工具调用的大语言模型提出假设、编辑MLIP训练代码、启动HPC作业,并使用固定的、受物理约束的评分卡接受或回退更改。我们在MACE势优化上评估了MLIPilot,使用了商业和开源权重LLM代理,包括GPT-5.5、GPT-4.1、Mistral-24B和Qwen3-32B。基准测试涵盖分子和周期性设置:一个QM7衍生数据集(我们为其生成了B3LYP/6-31G(d)能量和力),以及一个Cu EMT数据集(包含由ASE有效介质理论计算器标记的周期性铜超胞)。在这些基准测试中,最强的代理通过发现有用的训练策略(包括输出归一化、损失函数更改、渐进训练计划和模型容量调整),将最初违反约束的基线模型转变为可接受的模型。这些结果表明,当LLM代理的搜索受到领域特定验证标准的约束时,它们可以作为科学机器学习工作流的自主操作者,将MLIP开发从手动试错转向可审计的自动化实验。

英文摘要

Constructing production-quality machine-learned interatomic potentials (MLIPs) requires balancing accuracy, dynamical stability, and computational throughput under constraints that are not captured by a single training loss. We introduce MLIPilot, an auto-research framework in which tool-calling large language models propose hypotheses, edit MLIP training code, launch HPC jobs, and accept or revert changes using a fixed, physically constrained scorecard. We evaluate MLIPilot on MACE potential optimization using both commercial and open-weight LLM agents, including GPT-5.5, GPT-4.1, Mistral-24B, and Qwen3-32B. The benchmarks span molecular and periodic settings: a QM7-derived dataset for which we generated B3LYP/6-31G(d) energies and forces, and a Cu EMT dataset with periodic copper supercells labeled by ASE's Effective Medium Theory calculator. Across these benchmarks, the strongest agents move initially constraint-violating baselines to accepted models by discovering useful training strategies, including output normalization, loss-function changes, progressive training schedules, and model-capacity adjustments. These results suggest that LLM agents can serve as autonomous operators for scientific machine-learning workflows when their search is constrained by domain-specific validation criteria, shifting part of MLIP development from manual trial-and-error toward auditable, automated experimentation.
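
The accept-or-revert step against a fixed scorecard might look roughly like the sketch below. The metric names, thresholds, and the tie-breaking rule are all assumptions made for illustration; the abstract does not specify the actual scorecard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scorecard:
    """Fixed acceptance criteria (hypothetical fields and units)."""
    max_force_mae: float    # accuracy constraint
    min_stable_steps: int   # dynamical stability: MD steps without blow-up
    min_throughput: float   # computational throughput constraint

    def passes(self, metrics):
        return (metrics["force_mae"] <= self.max_force_mae
                and metrics["stable_steps"] >= self.min_stable_steps
                and metrics["throughput"] >= self.min_throughput)


def accept_or_revert(card, current, candidate):
    """Return the metrics of the model to keep after one proposed change."""
    if not card.passes(candidate):
        return current      # revert: the edit violates a constraint
    if not card.passes(current):
        return candidate    # first model that satisfies the scorecard
    # Both are valid, so keep whichever is more accurate.
    return candidate if candidate["force_mae"] < current["force_mae"] else current
```

Because the scorecard is frozen, the agent can change the training code but not the criteria it is judged by, which is what makes the search auditable.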

2605.30873 2026-06-01 cs.LG cs.AI cs.DC 版本更新

Federated Variational Preference Alignment with Gumbel-Softmax Prior for Personalized User Preferences

联邦变分偏好对齐与Gumbel-Softmax先验用于个性化用户偏好

Jabin Koo, Hoyoung Kim, Minwoo Jang, Jungseul Ok

发表机构 * Graduate School of AI, POSTECH, Pohang, Republic of Korea(POSTECH人工智能研究生院) Department of CSE, POSTECH, Pohang, Republic of Korea(POSTECH计算机科学与工程系) National AI Research Lab, Seoul, Republic of Korea(首尔国家人工智能研究实验室)

AI总结 提出FedVPA-GP框架,通过联邦混合先验和正交损失解决联邦学习中用户偏好冲突和个性化问题,在HH-RLHF数据集上优于单一模型。

Comments 21 pages, 4 figures. Accepted to ICML 2026

详情
AI中文摘要

联邦学习(FL)为对齐大型语言模型(LLMs)提供了一条保护隐私的途径;然而,现有框架通常强制使用单一奖励模型,不可避免地平均了本质上相互冲突的用户偏好(例如,有用性与无害性)。虽然变分偏好学习(VPL)提供了一条个性化的途径,但将其适应于去中心化设置面临一个基本挑战:由严重的局部数据稀缺性和异质性驱动的后验坍塌。在本文中,我们提出了具有Gumbel-Softmax先验的联邦变分偏好对齐(FedVPA-GP),这是一个旨在在不牺牲隐私的情况下解耦多样偏好的框架。为了稳定变分推断,我们引入了一个联邦混合先验,使客户端能够利用聚合的总体分布作为动态先验。此外,我们加入了一个正交损失,明确强制在潜在空间中分离偏好原型。在HH-RLHF数据集上的实验表明,FedVPA-GP显著优于单一基线,成功解耦了冲突的用户意图,并实现了动态偏好切换。

英文摘要

Federated Learning (FL) offers a privacy-preserving pathway for aligning Large Language Models (LLMs); however, existing frameworks typically enforce a monolithic reward model, inevitably averaging out inherently conflicting user preferences (e.g., helpfulness vs. harmlessness). While Variational Preference Learning (VPL) offers a pathway to personalization, adapting it to decentralized settings presents a fundamental challenge: posterior collapse driven by severe local data scarcity and heterogeneity. In this paper, we propose Federated Variational Preference Alignment with Gumbel-Softmax Prior (FedVPA-GP), a framework designed to disentangle diverse preferences without compromising privacy. To stabilize variational inference, we introduce a Federated Mixture Prior that enables clients to leverage the aggregate population distribution as a dynamic prior. Furthermore, we incorporate an Orthogonal Loss that explicitly enforces the separation of preference prototypes in the latent space. Experiments on the HH-RLHF dataset demonstrate that FedVPA-GP significantly outperforms monolithic baselines, successfully disentangling conflicting user intents and enabling dynamic preference switching.
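
Two ingredients named in the abstract can be sketched in isolation: Gumbel-Softmax sampling over a discrete set of preference prototypes, and a penalty that pushes prototypes apart. This is a generic illustration, assuming the penalty is a sum of squared cosine similarities; the paper's exact Orthogonal Loss and the federated aggregation step are not shown.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Draw a relaxed one-hot sample over K preference prototypes."""
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)  # keep log() finite
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    top = max(noisy)
    weights = [math.exp(v - top) for v in noisy]  # numerically stable softmax
    total = sum(weights)
    return [w / total for w in weights]


def orthogonal_penalty(prototypes):
    """Sum of squared cosine similarities between distinct prototypes."""
    def unit(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    units = [unit(p) for p in prototypes]
    penalty = 0.0
    for i in range(len(units)):
        for j in range(i + 1, len(units)):
            cosine = sum(a * b for a, b in zip(units[i], units[j]))
            penalty += cosine ** 2
    return penalty
```

A lower temperature makes the sample closer to one-hot, so a user is assigned more sharply to a single prototype.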

2605.30866 2026-06-01 quant-ph cs.LG 版本更新

Generative Quantum Data Embeddings for Supervised Learning

用于监督学习的生成式量子数据嵌入

Jaewoong Heo, Daniel K. Park

发表机构 * Department of Statistics and Data Science, Yonsei University, Seoul 03722, Republic of Korea(统计与数据科学系,延世大学,首尔03722,韩国) Department of Applied Statistics, Yonsei University, Seoul 03722, Republic of Korea(应用统计学系,延世大学,首尔03722,韩国) Department of Quantum Information, Yonsei University, Seoul 03722, Republic of Korea(量子信息系,延世大学,首尔03722,韩国)

AI总结 提出一种基于能量的生成学习框架,通过保真度替代目标优化嵌入结构和参数,提升分类性能,并利用Wasserstein距离解释性能饱和现象。

Comments 14 pages, 7 figures

详情
AI中文摘要

量子机器学习的许多实际相关应用涉及经典数据,其性能关键取决于输入如何嵌入到量子态中。然而,使用固定的嵌入电路拟设仍是标准做法。我们提出了一种基于能量的生成学习框架,该框架合成门序列以优化嵌入结构并细化数据定制的参数,使用基于保真度的替代目标引导搜索以提高类别区分度。实验表明,该方法在不同设置下改善了分类性能,同时也揭示了在现有嵌入族内进行架构搜索仅带来有限额外收益的数据集。我们通过推导输入空间中Wasserstein距离的可实现经验风险界限来解释这种饱和,表明经典数据几何为不太可能从嵌入优化中获得实质性收益的情况提供了先验诊断。结果建立了一个实用且有理论依据的框架,通过生成优化搜索有效的量子数据嵌入,并通过底层经典数据的几何诊断可获得的收益。

英文摘要

Many practically relevant applications of quantum machine learning involve classical data, for which performance depends critically on how inputs are embedded into quantum states. Yet the use of a fixed embedding circuit ansatz remains standard practice. We propose an energy-based generative learning framework that synthesizes gate sequences to optimize embedding structures and refine data-tailored parameters, using a fidelity-based surrogate objective to guide the search toward improved class distinguishability. Empirically, the method improves classification performance across diverse settings, while also revealing datasets where architecture search within the present embedding family yields only limited additional gains. We explain this saturation by deriving bounds on the achievable empirical risk in terms of the Wasserstein distance in the input space, showing that classical data geometry provides an a priori diagnostic for regimes in which substantial gains from embedding optimization are unlikely. The results establish a practically useful and theoretically motivated framework for searching for effective quantum data embeddings through generative optimization, with the attainable gains diagnosed through the geometry of the underlying classical data.

2605.30865 2026-06-01 cs.LG 版本更新

GlucoFM: A Dual-Stream Foundation Model for Continuous Glucose Monitoring

GlucoFM: 一种用于连续血糖监测的双流基础模型

Zechen Li, Keerthana Natarajan, Weizhi Zhang, Menglian Zhou, Simon A. Lee, Yuwei Zhang, Maxwell A. Xu, Zeinab Esmaeilpour, Flora D. Salim, Mark Malhotra, Lindsey Sunden, Shwetak Patel, Yuzhe Yang, Ahmed A. Metwally

发表机构 * Google Research(谷歌研究) University of New South Wales(新南威尔士大学)

AI总结 提出GlucoFM,一种轻量级CGM基础模型,通过将血糖动态分解为慢生理状态和瞬态事件流,在7个临床预测任务上平均PR-AUC比最佳CGM专用模型提高4.1点。

详情
AI中文摘要

连续血糖监测(CGM)提供了日常代谢生理的密集视图,然而现有的通用时间序列和CGM专用基础模型通常将血糖轨迹编码为纠缠的单流序列,使得血糖动态的独特时间结构仅被隐式建模。我们提出GlucoFM,一种轻量级CGM基础模型,它将不规则记录对齐到24小时时间网格,保留观测掩码,并将血糖动态分解为慢生理状态和瞬态事件流,捕捉低频血糖基线和可能反映急性生理反应或传感器伪影的短期偏差。GlucoFM在来自477名受试者的109,066小时未标记CGM记录上进行了预训练,具有两个互补目标:融合每日表示上的掩码上下文潜在预测以及状态和事件流上的时间动态预测。在四个不同队列和七个临床预测任务中,GlucoFM在评估基线中实现了最强的受试者分离线性探测性能,比最佳CGM专用基础模型平均PR-AUC提高4.1点。其收益在核心代谢结果上最为显著,在所有糖尿病风险和β细胞功能障碍任务以及4个胰岛素抵抗任务中的3个上领先PR-AUC。GlucoFM还在评估方法中实现了最佳的整体跨数据集迁移性能和强大的少样本适应能力,并且在聚合多天进行受试者级别预测时获得一致收益,突出了生理感知分解作为可迁移CGM表示学习的有效归纳偏置。

英文摘要

Continuous glucose monitoring (CGM) provides a dense view of daily metabolic physiology, yet existing generic time-series and CGM-specific foundation models often encode glucose traces as entangled single-stream sequences, leaving the distinct temporal structure of glycemic dynamics only implicitly modeled. We present GlucoFM, a lightweight CGM foundation model that aligns irregular recordings to a 24-hour chronological grid, preserves observation masks, and decomposes glucose dynamics into slow physiological state and transient event streams, capturing low-frequency glycemic baselines and short-term deviations that may reflect acute physiological responses or sensor artifacts. GlucoFM is pretrained on 109,066 hours of unlabeled CGM recordings from 477 subjects with two complementary objectives: masked contextual latent prediction over fused daily representations and temporal dynamics prediction over state and event streams. Across four diverse cohorts and seven clinical prediction tasks, GlucoFM achieves the strongest subject-disjoint linear-probing performance among evaluated baselines, improving average PR-AUC by 4.1 points over the best CGM-specific foundation model. Its gains are most pronounced on core metabolic outcomes, leading PR-AUC on all diabetes-risk and $\beta$-cell dysfunction tasks and on 3 of 4 insulin-resistance tasks. GlucoFM also achieves the best overall cross-dataset transfer performance and strong few-shot adaptation among evaluated methods, and it shows consistent gains when aggregating multiple days for subject-level prediction, highlighting physiology-aware decomposition as an effective inductive bias for transferable CGM representation learning.
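
The preprocessing described above (grid alignment, observation mask, slow versus transient streams) can be approximated with a few lines of plain Python. This is only a sketch under stated assumptions: a 5-minute grid, and a masked moving average standing in for the slow state stream, whereas GlucoFM learns its decomposition. All names are hypothetical.

```python
def align_to_grid(readings, slot_minutes=5):
    """readings: list of (minute_of_day, glucose). Returns (values, mask)."""
    n_slots = 24 * 60 // slot_minutes
    sums = [0.0] * n_slots
    counts = [0] * n_slots
    for minute, glucose in readings:
        slot = (int(minute) // slot_minutes) % n_slots
        sums[slot] += glucose
        counts[slot] += 1
    mask = [1 if c else 0 for c in counts]  # 1 = at least one reading in the slot
    values = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return values, mask


def decompose(values, mask, half_window=6):
    """Masked moving average as the slow state; residual as the event stream."""
    n = len(values)
    state, event = [], []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        observed = [values[j] for j in range(lo, hi) if mask[j]]
        baseline = sum(observed) / len(observed) if observed else 0.0
        state.append(baseline)
        event.append(values[i] - baseline if mask[i] else 0.0)
    return state, event
```

Keeping the mask separate from the values lets a model distinguish a missing reading from a genuinely low one.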

2605.30860 2026-06-01 math.ST cs.LG math.PR stat.TH 版本更新

Bayesian Inference with Shaped Deep Non-linear MLPs

具有形状深度非线性MLP的贝叶斯推断

Boris Hanin, Tianze Jiang

发表机构 * Princeton University(普林斯顿大学)

AI总结 本文通过神经协方差SDE分析深度非线性MLP在训练样本数、输入维数、隐藏层宽度和层数均较大时的贝叶斯推断,发现LP/N的一阶准则决定深度对模型证据的益处,并推导出贝叶斯预测后验等价于数据相关核方法。

Comments 35 Pages

详情
AI中文摘要

深度学习理论的一个核心目标是刻画神经网络在模型规模和训练集规模同时较大时的预测行为。由于模型参数数量和数据集大小发散极限不可交换,先验上并不清楚存在哪些极限。在这项工作中,我们通过研究深度非线性MLP在训练样本数($P$)、输入维数($N_0$)、隐藏层宽度($N$)和隐藏层数($L$)均可大时的贝叶斯推断,为这些问题提供了新的见解。我们基于神经协方差SDE(Li等人,2022)分析$LP/N\in\Theta(1)$(扮演有效网络深度角色)区域的预测后验。我们的框架涵盖光滑和ReLU激活函数,并适用于任意温度。我们发现,在$LP/N$的一阶近似下,存在一个简单准则,用于判断哪些数据生成过程能从深度中获益,即更大的$LP/N$会增加贝叶斯模型证据。我们还对物理学文献中的一个先前结果给出了新的推导:至少在$LP/N$的一阶近似下,贝叶斯预测后验极其简单,等价于一个数据相关的核方法。

英文摘要

A central aim of deep learning theory is to characterize how neural networks make predictions in the regime of simultaneously large model and training set size. Since the limits of diverging number of model parameters and dataset size do not commute, it is not clear a priori what limits exist. In this work, we shed new light on these questions by studying Bayesian inference in deep non-linear MLPs in the regime where the number of training samples ($P$), the input dimension ($N_0$), the hidden layer width ($N$), and the number of hidden layers ($L$) can all be large. We build on the Neural Covariance SDE (Li et al., 2022) to analyze predictive posteriors in the regime where $LP/N \in \Theta(1)$, playing the role of an effective network depth. Our framework covers both smooth and ReLU activation functions and applies to arbitrary temperature. We find, to first order in $LP/N$, a simple criterion for which data generating processes benefit from depth in the sense that larger $LP/N$ increases the Bayesian model evidence. We also give a novel derivation of a prior result from the physics literature: at least to first order in $LP/N$, the Bayesian predictive posterior is remarkably simple and is equivalent to that of a data-dependent kernel method.

2605.30859 2026-06-01 cs.LG cs.AI 版本更新

DARTS: Distribution-Aware Active Rollout Trajectory Shaping for Accelerating LLM Reinforcement Learning

DARTS: 分布感知的主动展开轨迹塑造以加速LLM强化学习

Yujie Wang, Siwei Chen, Longzan Luo, Xinyi Liu, Xupeng Miao, Fangcheng Fu, Bin Cui

发表机构 * School of Computer Science & Beijing Key Laboratory of Software Hardware Cooperative Artificial Intelligence Systems, Peking University, Beijing, China; School of Artificial Intelligence, Shanghai Jiao Tong University, Shanghai, China; Institute of Computational Social Science, Peking University (Qingdao), Qingdao, China

AI总结 针对强化学习中长尾响应分布导致的效率瓶颈,提出分布感知的主动轨迹塑造方法,通过细粒度识别提示内长尾并削减无效冗余,实现高达1.77倍的加速而不损失模型性能。

Comments 16 pages, 14 figures, 5 tables. Accepted to ICML 2026

详情
AI中文摘要

强化学习已成为提升模型能力的关键技术,但由于响应长度的长尾分布,其展开效率受到瓶颈制约。现有工作通过提示级尾部调度缓解长尾影响,但我们关注低效率的根本来源:分布本身。具体而言,我们以更细粒度刻画长尾分布,识别提示内长尾,并揭示它们通常包含无效冗余。为解决此问题,我们提出一种主动分布塑造的新范式,将展开分布向简洁性和确定性方向塑造,从而从根本上解决尾部带来的开销。我们通过一种分布感知的轨迹采样机制实现这一点,该机制为每个提示从冗余探索空间中选择轨迹,并采用自适应冗余分配方案以最大化塑造效果和系统效率。实验表明,与最先进系统相比,在不影响模型性能的情况下,实现了高达1.77倍的显著加速。

英文摘要

Reinforcement Learning (RL) has become pivotal for improving model capabilities yet suffers from rollout efficiency bottlenecks due to the long-tail response length distribution. While existing works mitigate the impact of long tails via prompt-level tail scheduling, we focus on the root source of inefficiency: the distribution itself. Specifically, we characterize the long-tail distribution at a finer granularity, identifying intra-prompt long tails, and revealing that they frequently consist of ineffective verbosity. To address this, we propose a novel paradigm of active distribution shaping to shape the rollout distribution towards conciseness and certainty, thereby fundamentally resolving tail-induced overheads. We achieve this through a distribution-aware trajectory sampling mechanism, which selects trajectories from a redundant exploration space for each prompt, and an adaptive redundancy allocation scheme to maximize both shaping effectiveness and system efficiency. Experiments demonstrate significant acceleration over state-of-the-art systems by up to 1.77x without compromising model performance.
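
One way to picture selecting trajectories from a redundant exploration space is to over-sample rollouts for a prompt and keep only a concise subset. The sketch below simply keeps the shortest ones, which is an assumption made for illustration; the paper's distribution-aware rule and its adaptive redundancy allocation are not reproduced here. The lengths are made up.

```python
def select_concise(lengths, keep):
    """Indices of the `keep` shortest rollouts among the over-sampled ones."""
    ranked = sorted(range(len(lengths)), key=lambda i: lengths[i])
    return sorted(ranked[:keep])


def rollout_cost(lengths, kept):
    """A generation step ends when the longest kept rollout finishes."""
    return max(lengths[i] for i in kept)


lengths = [120, 950, 140, 4000, 180, 160]  # tokens per sampled response (made up)
baseline = rollout_cost(lengths, range(4))  # no redundancy: the first 4 samples
shaped = rollout_cost(lengths, select_concise(lengths, keep=4))
print(baseline, shaped)  # 4000 180
```

In a real system the unselected long rollouts could be aborted as soon as enough short ones have finished, which is where the wall-clock saving would come from.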

2605.30858 2026-06-01 cs.LG 版本更新

ForecastCompass: Guiding Agentic Forecasting with Adaptive Factor Memory

ForecastCompass: 自适应因子记忆引导的智能预测

Yurui Chang, Yongkang Du, Yuanpu Cao, Jinghui Chen, Lu Lin

发表机构 * Pennsylvania State University(宾夕法尼亚州立大学)

AI总结 提出ForecastCompass框架,通过分层预测任务分类和双组件记忆(因子记忆与推理记忆),结合回顾分析迭代修正,提升智能体在动态环境中的概率预测准确性和校准性。

详情
AI中文摘要

智能预测对于动态环境中的决策至关重要,但由于智能体必须从不完整、时间有限的证据中进行推理,并在结果确定之前产生校准的概率,因此仍然具有挑战性。记忆提供了一种自然机制,将经验从已解决的预测转移到未来的预测任务。然而,现有的智能体记忆方法并非为预测量身定制,因为它们通常存储过去的交互、反思或事实关联,而没有明确表示可重用的预测因子或校准知识。我们提出了ForecastCompass (FoCo),一种用于智能预测的自适应因子记忆框架。FoCo通过分层预测任务分类来组织预测经验,从而能够检索与任务相关的预测知识。它维护两个互补的记忆组件:因子记忆(捕获可重用的预测维度)和推理记忆(编码概率更新、不确定性处理和校准原则)。利用回顾分析作为学习信号,FoCo通过口头记忆修正程序迭代修正记忆,使智能体能够随时间积累可迁移的预测知识。在Prophet Arena和FutureX上使用GPT-5-mini和Gemini-2.5-Flash进行的实验表明,FoCo提高了概率准确性和校准性。

英文摘要

Agentic forecasting is important for decision-making in dynamic environments, but it remains challenging because agents must reason from incomplete, time-limited evidence and produce calibrated probabilities before outcomes are resolved. Memory provides a natural mechanism for transferring experience from resolved forecasts to future prediction tasks. However, existing agent-memory methods are not tailored to forecasting, as they typically store past interactions, reflections, or factual associations without explicitly representing reusable predictive factors or calibration knowledge. We propose ForecastCompass (FoCo), an adaptive factor-based memory framework for agentic forecasting. FoCo organizes forecasting experience with a hierarchical forecasting-task taxonomy, enabling retrieval of task-relevant forecasting knowledge. It maintains two complementary memory components: factor memory, which captures reusable predictive dimensions, and reasoning memory, which encodes probability updating, uncertainty handling, and calibration principles. Using retrospective analyses as learning signals, FoCo iteratively revises memory through a verbalized memory-revision procedure, enabling the agent to accumulate transferable forecasting knowledge over time. Experiments on Prophet Arena and FutureX with GPT-5-mini and Gemini-2.5-Flash show that FoCo improves both probabilistic accuracy and calibration.

2605.30843 2026-06-01 cs.LG econ.EM 版本更新

A Lecture Note on Offline RL and IRL, Part II: Foundations of Inverse Reinforcement Learning and Dynamic Discrete Choice Models

离线强化学习与逆强化学习讲义,第二部分:逆强化学习与动态离散选择模型的基础

Enoch Hyunwook Kang

发表机构 * University of Washington, Foster School of Business(华盛顿大学,福斯特商学院)

AI总结 本文证明了逆强化学习(IRL)与动态离散选择(DDC)模型的等价性,回顾了经典识别结果和计算范式,并介绍了现代机器学习方法及其识别特性。

详情
AI中文摘要

在前向强化学习问题中,奖励是固定且已知的;学习者被要求找到一个好的策略或价值函数。这里我们反过来提问:给定由专家生成的离线数据,我们能否恢复专家所优化的奖励?这就是逆强化学习问题,值得注意的是,两个社区——研究动态离散选择(DDC)的结构计量经济学家和研究熵正则化IRL的机器学习者——一直在以不同的名称研究完全相同的概率模型。我们首先证明它们的等价性。然后,我们发展Magnac和Thesmar的经典识别结果以及由此产生的经典计算范式:Rust的嵌套不动点算法、Hotz和Miller的条件选择概率方法,以及Adusumilli和Eckardt的两种时间差分方法:线性半梯度TD和近似价值迭代。每种方法都有其局限性:维度、转移核估计、致命三元组或投影不动点偏差。接着,我们回顾现代ML/IRL分支:对抗性IRL、占用匹配、IQ-Learn和离线ML-IRL,推导每种方法的实际目标,并精确说明它识别了什么和没有识别什么。最后,我们介绍Kang等人的经验风险最小化框架,该框架为离线IRL/DDC提供了基于梯度的估计器。

英文摘要

In the forward reinforcement-learning problem, the reward is fixed and known; the learner is asked to find a good policy or value function. Here we turn the question around. Given offline data generated by an expert, can we recover the reward the expert was optimizing? This is the inverse reinforcement learning problem, and remarkably, two communities, structural econometricians studying dynamic discrete choice (DDC) and machine learners studying entropy-regularized IRL, have been working on exactly the same probabilistic model under different names. We begin by proving their equivalence. We then develop the classical identification result of Magnac and Thesmar and the classical computational paradigms that grew out of it: Rust's nested fixed-point algorithm, the conditional-choice-probability approach of Hotz and Miller, and the two temporal-difference approaches of Adusumilli and Eckardt: linear semi-gradient TD and approximate value iteration. Each route has its limits: dimensionality, transition-kernel estimation, the deadly triad, or projected fixed-point bias. We then walk through the modern ML/IRL strand: adversarial IRL, occupancy matching, IQ-Learn, and offline ML-IRL, deriving each method's actual objective and stating precisely what it does and does not identify. We close with the empirical-risk-minimization framework of Kang et al., which yields a gradient-based estimator for offline IRL/DDC.
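
The shared probabilistic model behind both literatures can be previewed in its standard textbook form. The notation here (reward $r$, discount factor $\beta$, transition kernel $P$) is an assumption of this note and may differ from the lecture's. With i.i.d. Type-I extreme value utility shocks, the DDC conditional choice probabilities are a softmax of choice-specific values, which is the same policy that entropy-regularized RL produces:

```latex
\begin{align}
Q(s,a) &= r(s,a) + \beta \, \mathbb{E}_{s' \sim P(\cdot \mid s,a)}\bigl[ V(s') \bigr], \\
V(s) &= \log \sum_{a'} \exp Q(s,a'), \\
\pi(a \mid s) &= \frac{\exp Q(s,a)}{\sum_{a'} \exp Q(s,a')}
             = \exp\bigl( Q(s,a) - V(s) \bigr).
\end{align}
```

In the econometric version the log-sum-exp may carry an additive constant, depending on how the shock is centered; it shifts every $Q(s,\cdot)$ equally and leaves the choice probabilities unchanged.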

2605.30842 2026-06-01 cs.LG 版本更新

CoMem: Context Management with A Decoupled Long-Context Model

CoMem: 基于解耦长上下文模型的上下文管理

Yuwei Zhang, Chengyu Dong, Shuowei Jin, Changlong Yu, Hejie Cui, Hongye Jin, Xinyang Zhang, Hamed Bonab, Colin Lockard, Jianshu Chen, Zhenyu Shi, Jingbo Shang, Xian Li, Bing Yin

发表机构 * Halıcıoğlu Data Science Institute, University of California, San Diego(Halıcıoğlu数据科学研究所,加州大学圣地亚哥分校) Amazon(亚马逊)

AI总结 提出CoMem框架,通过将记忆管理与智能体工作流解耦并采用k步偏移异步流水线,利用奖励驱动训练策略,在SWE-Bench-Verified上实现1.4倍延迟改进且保持大部分性能。

Comments Work in progress

详情
AI中文摘要

上下文管理使智能体模型能够通过对先前交互历史的迭代总结来解决长时任务。然而,这一过程通常会因额外的总结标记而产生大量解码开销,显著影响部署时的端到端响应延迟。在本文中,我们介绍CoMem,一种新颖的框架,它将记忆管理与主要智能体工作流解耦,使这些过程能够并行执行。我们提出了一种k步偏移异步流水线,将记忆模型的总结与智能体的推理重叠,有效掩盖了上下文处理的延迟。为了确保在这种异步设置下的鲁棒性,我们引入了一种奖励驱动的训练策略,使记忆模型对齐以捕获足够统计信息供智能体决策。理论分析证实,与耦合架构相比,CoMem提供了更优的效率-效果权衡。我们在SWE-Bench-Verified上的广泛实验结果表明,CoMem在保留大部分性能的同时,相比普通长上下文解决方案提供了1.4倍的延迟改进。此外,我们证明这些延迟增益随系统吞吐量增加而有利地扩展,为智能体推理和记忆压缩的独立优化提供了一条模块化路径。

英文摘要

Context management enables agentic models to solve long-horizon tasks through iterative summarization of previous interaction histories. However, this process typically incurs substantial decoding overhead for the extra summarization tokens, which significantly affects the end-to-end response latency at deployment. In this paper, we introduce CoMem, a novel framework that decouples memory management from the primary agent workflow, enabling these processes to execute in parallel. We propose a $k$-step-off asynchronous pipeline that overlaps the memory model's summarization with the agent's inference, effectively masking the latency of context processing. To ensure robustness under this asynchronous setting, we introduce a reward-driven training strategy that aligns the memory model to capture sufficient statistics for the agent's decision-making. Theoretical analysis confirms that CoMem offers a superior efficiency-effectiveness trade-off compared to coupled architectures. Our extensive experimental results on SWE-Bench-Verified show that CoMem provides a 1.4x latency improvement over vanilla long-context solutions while preserving most of the performance. Furthermore, we demonstrate that these latency gains scale favorably with increased system throughput, offering a modular path forward for the independent optimization of agent reasoning and memory compression.
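
A $k$-step-off pipeline can be pictured as follows: when the agent takes step t, the memory model has only finished summarizing steps before t - k, so the agent sees that stale summary plus the last k raw steps. The sketch below illustrates just this bookkeeping, with a placeholder summarizer; the function names are hypothetical and nothing here reflects CoMem's actual interfaces or training.

```python
def summarize(steps):
    """Stand-in for the memory model: reports how much it has compressed."""
    return f"<summary of {len(steps)} steps>"


def build_context(history, step, k):
    """Context for agent step `step` when the memory model lags by k steps."""
    covered = max(0, step - k)            # steps the summary already covers
    summary = summarize(history[:covered])
    recent = history[covered:step]        # raw steps not yet summarized
    return summary, recent


history = ["a0", "a1", "a2", "a3", "a4"]
print(build_context(history, step=4, k=2))
# ('<summary of 2 steps>', ['a2', 'a3'])
```

Because the summary for step t never depends on the most recent k steps, the memory model can run concurrently with the agent instead of blocking it.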

2605.30831 2026-06-01 q-bio.QM cs.LG physics.chem-ph 版本更新

The Geometry of Activity Cliffs: Representation Dependence and Multi-Scale Characterization of Activity Landscapes

活性悬崖的几何结构:活性景观的表征依赖性与多尺度表征

Pawel Dabrowski-Tumanski, Bartosz Topolski, Dariusz Plewczynski, Tomasz Jetka

发表机构 * Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw 00-662, Poland(数学与信息科学学院,华沙技术大学,华沙 00-662,波兰)

AI总结 本研究通过六步分析流程,系统探究不同分子表征(如指纹和嵌入)对活性悬崖定义的影响,发现无单一表征在所有标准下均最优,揭示了活性悬崖是表征诱导的几何现象而非分子对固有属性。

详情
AI中文摘要

活性悬崖是指结构相似但活性差异巨大的化合物,通常被视为化学数据集的固有特征。我们认为,除了靶标生物学因素外,我们对活性悬崖的理解很大程度上是由所选分子表征所诱导的几何结构决定的,而非分子对本身的属性。我们设计了一个六步分析流程来系统检验这一假设。该流程包括:评估成对距离几何、悬崖富集度、活性梯度分布、悬崖子空间的持续同调、嵌入和度量对的预测基准测试,以及最终匹配分子对和立体异构体的分析。我们将该流程应用于十五种嵌入和度量配置,以构建针对三个已知活性悬崖挑战的不同数据集的基准。没有一种表征在所有标准上均表现优异:Morgan Tanimoto 提供了最强的悬崖富集度和跨骨架泛化能力;MolFormer 余弦提供了唯一有意义的立体化学敏感性;MACCS 和 RDKit Dice 指纹对匹配分子对变换最敏感;ChemBERTa 由于嵌入坍缩而全面失败。这些发现并非排名。它们反映了不同表征编码了分子识别的不同方面,而选择一种表征实际上就隐含地定义了活性悬崖是什么。

英文摘要

Activity cliffs, structurally similar compounds with large potency differences, are widely treated as intrinsic features of chemical datasets. We argue that apart from target biology, much of our cliff understanding is a consequence of the geometry induced by the chosen molecular representation, not a property of a molecule pair itself. We designed a six-step pipeline to systematically test this hypothesis. The pipeline consists of: assessing pairwise distance geometry, cliff enrichment, activity gradient distribution, persistent homology of the cliff subspace, predictive benchmarking for a chosen pair of an embedding and a metric, and finally, analysis of the matched molecular pairs and stereoisomers. We applied the pipeline to fifteen configurations of embeddings and metrics to build a benchmark across three distinct datasets known for their activity-cliff challenges. No representation excels on all criteria: Morgan Tanimoto provides the strongest cliff enrichment and cross-scaffold generalization; MolFormer cosine provides the only meaningful stereochemical sensitivity; MACCS and RDKit Dice fingerprints are most sensitive to matched-molecular-pair transformations; ChemBERTa fails uniformly due to embedding collapse. These findings are not a ranking. They reflect the fact that different representations encode different aspects of molecular recognition, and that choosing one implicitly defines what an activity cliff actually is.
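
The representation dependence is easy to see on a toy pair. In the sketch below, fingerprints are plain sets of on-bit indices and a cliff is any pair with Tanimoto similarity above a threshold and a potency gap above another. Both thresholds and both fingerprints are invented for illustration and are not taken from the paper.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def is_activity_cliff(fp_a, fp_b, potency_a, potency_b,
                      min_similarity=0.7, min_gap=2.0):
    """Similar structure (by this representation) but very different potency."""
    similar = tanimoto(fp_a, fp_b) >= min_similarity
    return similar and abs(potency_a - potency_b) >= min_gap


# The same hypothetical molecule pair under two made-up representations.
fine_a, fine_b = {1, 2, 3, 4, 5, 6, 7, 8}, {1, 2, 3, 4, 5, 6, 7, 9}
coarse_a, coarse_b = {1, 2, 3}, {1, 4, 5}
print(is_activity_cliff(fine_a, fine_b, 8.5, 6.0))      # True
print(is_activity_cliff(coarse_a, coarse_b, 8.5, 6.0))  # False
```

The potencies are identical in both calls; only the representation changes, and with it whether the pair counts as a cliff.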

2605.30825 2026-06-01 cs.LG cs.AI math.OC stat.ML 版本更新

Unlearning in Diffusion Models: A Unified Framework with KL Divergence and Likelihood Constraints

扩散模型中的遗忘学习:基于KL散度和似然约束的统一框架

Shervin Khalafi, Alejandro Ribeiro, Dongsheng Ding

发表机构 * University of Pennsylvania(宾夕法尼亚大学) University of Tennessee, Knoxville(田纳西大学诺克斯维尔分校)

AI总结 提出一个约束优化框架,通过最小化与预训练模型的偏差并施加与遗忘分布的分离约束,实现扩散模型中的概念和数据遗忘,并基于KL散度和似然约束推导最优解及原始-对偶算法。

Comments 27 pages, 6 figures, 4 tables; Accepted by ICML 2026

详情
AI中文摘要

扩散模型中的遗忘学习旨在移除不需要的数据或概念,同时保留预训练模型的效用——这两个目标本质上相互冲突。我们提出了一个原则性的约束优化框架,将遗忘学习形式化为在满足与遗忘分布的显式分离约束下,最小化与预训练模型的偏差。具体地,我们基于反向和正向KL散度以及似然约束,构建了三个约束优化问题。前两个问题泛化了现有的概念和数据遗忘方法,而第三个问题为遗忘学习提供了一种新颖且自然的表述。尽管KL约束非凸,我们证明了所有三个问题的强对偶性,从而能够显式地表征其最优解作为遗忘目标,并为每个公式开发原始-对偶算法。实验结果表明,与基于权重的基线方法相比,我们的KL约束方法在概念和数据遗忘中实现了更优的保留-遗忘权衡,而基于似然的方法在匹配遗忘效果的同时,更好地保留了保留概念。

英文摘要

Unlearning in diffusion models aims to remove undesirable data or concepts while preserving the utility of pretrained models -- two fundamentally conflicting objectives. We propose a principled constrained optimization framework that formulates unlearning as minimizing the deviation from a pretrained model, subject to explicit separation constraints from the unlearning distributions. Specifically, we formulate three constrained optimization problems based on reverse and forward KL divergences, and likelihood constraints. The first two generalize existing approaches for concept and data unlearning, while the third offers a novel and natural formulation for unlearning. Despite the nonconvexity of the KL constraints, we establish strong duality for all three problems, enabling us to explicitly characterize their optimal solutions as unlearning targets and develop primal-dual algorithms for each formulation. Experimental results demonstrate that our KL-constrained approach achieves superior retention-unlearning tradeoffs compared to weight-based baselines for concept and data unlearning, and that our likelihood-based approach matches unlearning effectiveness while better preserving retained concepts compared to baselines.
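
As a rough template of the constrained formulation described above, one of the KL variants could be written as follows. The symbols ($p_{\mathrm{pre}}$ for the pretrained model, $q_i$ for the unlearning distributions, $b_i$ for the separation levels) and the choice of KL direction are assumptions of this note; the abstract does not fix them.

```latex
\begin{align}
\min_{p} \quad & D_{\mathrm{KL}}\bigl(p \,\|\, p_{\mathrm{pre}}\bigr) \\
\text{s.t.} \quad & D_{\mathrm{KL}}\bigl(p \,\|\, q_i\bigr) \ge b_i,
  \qquad i = 1, \dots, m, \\[4pt]
\mathcal{L}(p, \lambda) &= D_{\mathrm{KL}}\bigl(p \,\|\, p_{\mathrm{pre}}\bigr)
  + \sum_{i=1}^{m} \lambda_i \Bigl( b_i - D_{\mathrm{KL}}\bigl(p \,\|\, q_i\bigr) \Bigr),
  \qquad \lambda_i \ge 0, \\
\lambda_i &\leftarrow \Bigl[ \lambda_i + \eta \bigl( b_i - D_{\mathrm{KL}}(p \,\|\, q_i) \bigr) \Bigr]_{+}.
\end{align}
```

A generic primal-dual scheme alternates a descent step on $\mathcal{L}$ in $p$ with the projected ascent step on $\lambda$ shown in the last line, so a multiplier grows while its separation constraint is violated and shrinks toward zero once it is satisfied.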

2605.30812 2026-06-01 cs.LG physics.comp-ph 版本更新

Learning Permutation-invariant Macroscopic Dynamics

学习置换不变的宏观动力学

Zhichao Han, Mengyi Chen, Qianxiao Li

发表机构 * Institute for Functional Intelligent Materials, National University of Singapore(功能智能材料研究所,新加坡国立大学) Department of Mathematics, National University of Singapore(数学系,新加坡国立大学)

AI总结 提出一种置换不变的自编码器框架,通过重建质量分布而非逐点重建来学习无序微观系统的宏观动力学,并在粒子系统、Lennard-Jones流体和聚合物拉伸动力学中验证了有效性。

Comments ICML 2026 submission

详情
AI中文摘要

准确建模高维微观系统的宏观动力学在科学领域具有广泛兴趣。许多数据驱动方法通过自编码器学习低维潜在状态,该自编码器针对逐点输入重建进行训练。这些方法通常假设输入中微观自由度的固定顺序。然而,在许多场景中,例如粒子系统,微观状态本质上是无序的。这激发了一种学习置换不变潜在表示的自编码器框架。为此,我们采用置换不变的编码器,并设计解码器来重建以观测点为中心的质量分布,而不是逐样本重建。然后,我们联合学习可观测量和潜在状态的宏观动力学。我们展示了所提方法在各种微观设置中的有效性和鲁棒性,包括学习相互作用粒子系统中的能量动力学、预测Lennard-Jones流体中的混合动力学,以及从拉伸力场中运动的聚合物视频数据建模拉伸动力学。

英文摘要

Accurately modeling the macroscopic dynamics of high-dimensional microscopic systems is of broad interest across the sciences. Many data-driven approaches learn a low-dimensional latent state through an autoencoder trained for pointwise input reconstruction. These methods typically assume a fixed ordering of microscopic degrees of freedom in the input. However, in many settings, such as particle systems, the microscopic state is inherently unordered. This motivates an autoencoder framework that learns permutation-invariant latent representations. To this end, we adopt a permutation-invariant encoder and design the decoder to reconstruct the mass distribution centered at the observed points rather than per-sample reconstruction. We then jointly learn the macroscopic dynamics of the observables together with the latent states. We demonstrate the effectiveness and robustness of the proposed method across a range of microscopic settings, including learning the energy dynamics in interacting particle systems, predicting mixing dynamics in Lennard-Jones fluids, and modeling the stretching dynamics from video data of polymers moving in an elongational force field.
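
Permutation invariance is typically obtained by applying the same map to every particle and pooling with a symmetric operation. The sketch below uses hand-written per-particle features and mean pooling, assuming each particle is a (position, velocity) pair; a learned encoder would replace the fixed features, and the paper's decoder is not shown.

```python
import math


def encode(particles):
    """Mean-pool per-particle features, so particle order cannot matter.

    particles: list of (position, velocity) pairs in any order.
    """
    feats = [(x, v, x * x, v * v, x * v) for x, v in particles]
    n = len(feats)
    # math.fsum is exactly rounded, so the result is independent of order.
    return tuple(math.fsum(col) / n for col in zip(*feats))


state = [(0.1, 1.0), (2.5, -0.3), (-1.2, 0.7)]
shuffled = [state[2], state[0], state[1]]
print(encode(state) == encode(shuffled))  # True
```

The pooled `v * v` feature is proportional to the mean kinetic energy, which hints at why such summaries are natural inputs for learning macroscopic energy dynamics.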

2605.30811 2026-06-01 cs.LG 版本更新

Non-destructive Identification of Oyster Species is possible from Hyperspectral Images with Machine Learning

基于高光谱图像与机器学习实现牡蛎物种的无损鉴别

Ethan Kane Waters, Max Wingfield, Aiden Mellor, Paul Stewart, Iman Tahmasbian

发表机构 * Department of Primary Industries(初级产业部) QUT Centre for Data Science, School of Mathematical Sciences, Queensland University of Technology(昆士兰理工大学数据科学中心、数学科学学院) School of Environment and Science, Griffith University, Nathan(格里菲斯大学环境与科学学院、纳恩)

AI总结 本研究利用高光谱成像结合偏最小二乘判别分析和卷积神经网络,实现了对黑唇岩牡蛎和悉尼岩牡蛎的无损、高准确率鉴别。

Comments 13 pages, 9 figures

详情
AI中文摘要

区分牡蛎物种对于开发适合生产系统的新型商业牡蛎物种至关重要,并且对海鲜供应链的可追溯性至关重要。常见方法(如DNA分析)具有破坏性且耗时。本研究探讨了使用高光谱成像(HSI)区分黑唇岩牡蛎(BL)和悉尼岩牡蛎(SR)的可能性。对活体BL和SR样本(N=156)用HSI相机(950-2515nm)进行扫描。使用蒙特卡洛交叉验证训练偏最小二乘判别分析(PLS-DA)和卷积神经网络(CNN),根据左右壳的光谱反射率区分BL和SR牡蛎。PLS-DA模型成功区分了左右壳的物种,中位测试集分类准确率为100%,优于CNN(分别为83%和96%)。通过电子显微镜测量了牡蛎壳表面和横截面的元素及矿物组成。右壳分析显示,BL的层数多于SR(4层 vs 2层)。右壳外层的碳和氧浓度存在差异,BL富含碳,SR富含氧。BL和SR右壳之间碳和氧浓度的变化可能反映了几丁质和糖蛋白的相对丰度或组成差异。模型导出的波长重要性对应于这些化合物特征官能团的振动模式,支持了这一观点。透射分析显示,光透过壳体和壳体边缘,表明光谱特征可能受到另一壳或肉的影响。最终,研究结果突显了一种快速、无损的牡蛎物种鉴别方法。

英文摘要

Differentiating between oyster species is important for developing new commercial oyster species suited to production systems and is critical for traceability in seafood supply chains. Common methods, such as DNA profiling, are destructive and time-consuming. The possibility of using hyperspectral imaging (HSI) to discriminate between Black-Lip rock (BL) and Sydney rock (SR) oysters was investigated. Live BL and SR samples (N = 156) were scanned with an HSI camera (950-2515 nm). Partial Least Squares Discriminant Analysis (PLS-DA) and Convolutional Neural Networks (CNN) were trained with Monte Carlo cross-validation to distinguish BL and SR oysters from the spectral reflectance of their left and right valves. The PLS-DA model successfully distinguished between the species from both the left and right valves with a median test-set classification accuracy of 100%, outperforming the CNN, which reached 83% and 96%, respectively. Elemental and mineralogical composition in the surface and cross-section of oyster valves was measured with electron microscopy. Analysis of the right valve revealed a greater number of layers in BL compared to SR (4 vs 2). The concentrations of carbon and oxygen varied in the outer layer of the right valves, with BL being rich in carbon and SR being rich in oxygen. The variation in carbon and oxygen concentrations observed between BL and SR right valves may reflect differences in the relative abundance or composition of chitin and glycoproteins. This is supported by model-derived wavelength importance corresponding to vibrational modes of functional groups characteristic of these compounds. Transmittance analysis revealed that light was transmitted through the valves and around the valve edges, indicating that the spectral signatures may have been influenced by the other valve or the meat. Ultimately, the findings highlight an effective, rapid, non-destructive methodology for oyster species identification.
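
Monte Carlo cross-validation, as used above, repeats random train/test splits and summarizes the test accuracies, here by their median. The sketch below shows only that evaluation loop. A nearest-centroid classifier stands in for PLS-DA and the CNN to keep the example dependency-free, and the split count, test fraction, and toy spectra are assumptions.

```python
import random
import statistics


def fit_centroids(X, y):
    """Stand-in classifier: one mean spectrum per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def monte_carlo_cv(X, y, n_splits=50, test_fraction=0.3, seed=0):
    """Median test accuracy over repeated random train/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    n_test = max(1, int(len(X) * test_fraction))
    accuracies = []
    for _ in range(n_splits):
        rng.shuffle(indices)
        test, train = indices[:n_test], indices[n_test:]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(model, X[i]) == y[i] for i in test)
        accuracies.append(hits / n_test)
    return statistics.median(accuracies)
```

Unlike k-fold cross-validation, the test sets of different splits may overlap, so the number of repetitions can be chosen independently of the test-set size.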

2605.30810 2026-06-01 cs.LG 版本更新

IRIS: time-structured manifold projections

IRIS: 时间结构化流形投影

Brian Ondov, Chia-Hsuan Chang, Weipeng Zhou, Xingjian Zhang, Xueqing Peng, Yutong Xie, Huan He, Qiaozhu Mei, Hua Xu

发表机构 * Department of Biomedical Informatics and Data Science, Yale School of Medicine(耶鲁医学院生物医学信息学与数据科学部) School of Information, University of Michigan(密歇根大学信息学院)

AI总结 提出IRIS算法,通过结合时间顺序和流形拓扑结构,解决t-SNE和UMAP无法体现时间动态的问题,适用于scRNA-seq、比较宏基因组学等动态生物医学数据可视化。

详情
AI中文摘要

高维生物医学数据,如细胞-基因矩阵,越来越多地按时间顺序生成。然而,流形学习算法(如t-SNE和UMAP)无法在其布局中融入时间顺序,模糊了细胞类型或其他类别的动态变化。作为解决方案,我们提出了IRIS,一种新的流形学习算法,能够按时间顺序和流形拓扑结构构建布局。IRIS可以可视化广泛的动态生物医学数据,包括scRNA-seq、比较宏基因组学和文献数据。

英文摘要

High-dimensional biomedical data, such as cell-by-gene matrices, are increasingly generated as time series. However, manifold learning algorithms such as t-SNE and UMAP cannot incorporate time-ordering in their layouts, obscuring the dynamics of cell types or other classes. As a solution, we present IRIS, a new manifold learning algorithm that structures layouts both chronologically and by manifold topology. IRIS can visualize a wide range of dynamic biomedical data, including scRNA-seq, comparative metagenomics, and literature.

2605.30808 2026-06-01 cs.CR cs.AI cs.LG 版本更新

Differentially Private Preference Data Synthesis for Large Language Model Alignment

面向大语言模型对齐的差分隐私偏好数据合成

Fengyu Gao, Jing Yang

发表机构 * Department of Computer Science, University of Virginia, Charlottesville, Virginia, USA(弗吉尼亚大学计算机科学系) Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, Virginia, USA(弗吉尼亚大学电气与计算机工程系)

AI总结 提出DPPrefSyn算法,基于Bradley-Terry偏好模型和DP-PCA生成差分隐私合成偏好数据,实现隐私保护的偏好对齐。

Comments Accepted to ICML 2026

详情
AI中文摘要

偏好对齐是大语言模型(LLMs)的关键后训练步骤,以确保其输出与人类价值观一致。然而,在真实人类偏好数据上进行后训练会引发隐私问题,因为这些数据集通常包含敏感的用户提示和人类判断。为了解决这一问题,我们提出了DPPrefSyn,一种用于生成差分隐私(DP)合成偏好数据的新算法,以实现隐私保护的偏好对齐。DPPrefSyn是一个基于Bradley-Terry偏好模型和成对人类偏好数据内在几何结构的原理性框架。它首先从具有正式差分隐私保证的私有数据中学习一个潜在的偏好模型,然后利用学习到的模型结合公共提示合成高质量的偏好数据。它利用每个簇奖励模型的共享线性结构来有效捕捉私有数据中的异构人类偏好,并利用差分隐私主成分分析(DP-PCA)来提高学习准确性。大量实验结果表明,DPPrefSyn在强DP保证下实现了具有竞争力的对齐性能。这些发现突显了合成偏好数据作为隐私保护偏好对齐的实用替代方案在广泛应用中的潜力。据我们所知,这是首项为LLM对齐生成DP合成偏好数据的工作。我们的代码可在https://github.com/gfengyu/Differentially-Private-Preference-Data-Synthesis获取。

英文摘要

Preference alignment is a crucial post-training step for large language models (LLMs) to ensure their outputs align with human values. However, post-training on real human preference data raises privacy concerns, as these datasets often contain sensitive user prompts and human judgments. To address this, we propose DPPrefSyn, a novel algorithm for generating differentially private (DP) synthetic preference data to enable privacy-preserving preference alignment. DPPrefSyn is a principled framework grounded in the Bradley-Terry preference model and the intrinsic geometric structure of pairwise human preference data. It first learns an underlying preference model from private data with formal differential privacy guarantees, and then leverages the learned model together with public prompts to synthesize high-quality preference data. It exploits the shared linear structure of per-cluster reward models to effectively capture heterogeneous human preferences in private datasets, and leverages DP Principal Component Analysis (DP-PCA) to improve learning accuracy. Extensive experimental results demonstrate that DPPrefSyn achieves competitive alignment performance under strong DP guarantees. These findings highlight the potential of synthetic preference data as a practical alternative for privacy-preserving preference alignment across a broad range of applications. To the best of our knowledge, this is the first work to generate DP synthetic preference data for LLM alignment. Our code is available at https://github.com/gfengyu/Differentially-Private-Preference-Data-Synthesis.
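
The Bradley-Terry model that the method builds on turns a reward gap into a preference probability. The sketch below pairs it with a linear reward, in the spirit of the linear per-cluster reward structure mentioned above, and draws a synthetic label from it. The feature vectors and weights are hypothetical, and the differential-privacy mechanism (noise addition, DP-PCA) is deliberately not shown.

```python
import math
import random


def linear_reward(weights, features):
    """Reward as a linear function of response features (assumed form)."""
    return sum(w * f for w, f in zip(weights, features))


def bt_preference_prob(reward_a, reward_b):
    """P(response a is preferred over response b) under Bradley-Terry."""
    return 1.0 / (1.0 + math.exp(reward_b - reward_a))


def synthesize_label(weights, feats_a, feats_b, rng):
    """Sample which response is 'chosen' for one synthetic preference pair."""
    p = bt_preference_prob(linear_reward(weights, feats_a),
                           linear_reward(weights, feats_b))
    return "a" if rng.random() < p else "b"
```

In a private pipeline only the learned weights would need to be released with a DP guarantee; sampling labels from them on public prompts is post-processing.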

2605.30807 2026-06-01 cs.LG 版本更新

Conformal Reliability: A New Evaluation Metric for Conditional Generation

共形可靠性:条件生成的新评估指标

Yachen Gao, Xinwei Sun, Yikai Wang, Ye Shi, Jingya Wang, Jianfeng Feng, Yanwei Fu

发表机构 * Institute of Science and Technology for Brain-Inspired Intelligence(类脑智能科学与技术研究院) Shanghai Innovation Institute(上海创新研究院) School of Data Science(数据科学学院) Nanyang Technological University(南洋理工大学) School of Information Science and Technology(信息科学与技术学院)

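AI summary Proposes a conformal-prediction-based reliability score as a new evaluation metric for conditional generative models and develops the CReL framework to compute it efficiently; experiments demonstrate its validity and interpretability.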

Comments Accepted at ICML 2026

详情
AI中文摘要

条件生成模型近年来在各种应用中取得了显著成功。然而,目前仍缺乏一个合适的指标来评估这些模型的可靠性,该指标需要考虑其固有的不确定性。现有指标通常评估单个输出,可能无法捕捉生成中的变异性或潜在风险。在本文中,我们提出了一种基于共形预测的新型评估指标,称为可靠性分数,该指标在预指定的置信水平下衡量预测集内的最差性能。然而,由于输出空间的高维性以及指标函数和预测集的非凸性,计算该分数具有挑战性。为了高效计算该分数,我们引入了共形可靠性(CReL)框架,该框架可以(i)构建具有期望覆盖率的预测集;(ii)在构建的预测集内准确优化可靠性分数。我们提供了关于覆盖率的理论结果,并实验证明我们的方法比现有方法能产生更具信息量的预测集。在合成数据以及图像到文本和文本到图像任务上的实验进一步展示了我们新指标的可解释性,以及我们计算框架的有效性和高效性。源代码可在https://ggc29.github.io/CReL/找到。

英文摘要

Conditional generative models have recently achieved remarkable success in various applications. However, a suitable metric for evaluating the reliability of these models, which takes into account their inherent uncertainty, is still lacking. Existing metrics, which typically assess a single output, may fail to capture the variability or potential risks in generation. In this paper, we propose a novel evaluation metric called reliability score based on conformal prediction, which measures the worst-case performance within the prediction set at a pre-specified confidence level. However, computing this score is challenging due to the high-dimensional nature of the output space and the nonconvexity of both the metric function and the prediction set. To efficiently compute this score, we introduce Conformal ReLiability (CReL), a framework that can (i) construct the prediction set with desired coverage; and (ii) accurately optimize the reliability score within the constructed prediction set. We provide theoretical results on coverage and demonstrate empirically that our method produces more informative prediction sets than existing approaches. Experiments on synthetic data and the image-to-text and text-to-image tasks further demonstrate the interpretability of our new metric, and the validity and effectiveness of our computational framework. Source code can be found at https://ggc29.github.io/CReL/.
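The reliability score described above (worst-case performance inside a conformal prediction set) can be illustrated with a minimal sketch. It assumes a finite list of candidate outputs, a split-conformal threshold, and a quality metric where higher is better; the names `conformal_quantile`, `reliability_score`, `nonconformity`, and `quality` are hypothetical and are not taken from CReL, which targets high-dimensional, nonconvex sets that this toy does not handle.

```python
import math


def conformal_quantile(calibration_scores, alpha):
    """Split-conformal threshold: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(calibration_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: the set must contain everything.
        return math.inf
    return sorted(calibration_scores)[rank - 1]


def reliability_score(candidates, nonconformity, quality, threshold):
    """Worst-case quality over candidates whose nonconformity is within the threshold.

    Assumption: the prediction set is {c : nonconformity(c) <= threshold} and
    higher quality is better, so the worst case is the minimum.
    """
    prediction_set = [c for c in candidates if nonconformity(c) <= threshold]
    if not prediction_set:
        return None
    return min(quality(c) for c in prediction_set)


# Toy usage: each candidate is (name, nonconformity, quality).
calibration = [0.3, 0.1, 0.7, 0.5, 0.2, 0.6, 0.4]
threshold = conformal_quantile(calibration, alpha=0.25)
candidates = [("a", 0.2, 0.9), ("b", 0.5, 0.4), ("c", 0.9, 0.1)]
score = reliability_score(candidates, lambda c: c[1], lambda c: c[2], threshold)
```

In this toy, candidate "c" falls outside the prediction set, so the score reflects the weaker of the two remaining candidates rather than a single sampled output.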

2605.30788 2026-06-01 cs.CL cs.AI cs.LG 版本更新

XLGoBench: Detecting cross-lingual skill gaps with algorithmic tasks

XLGoBench: 用算法任务检测跨语言技能差距

Purvam Jain, Preethi Jyothi, Vihari Piratla, Suvrat Raju

发表机构 * Google DeepMind(谷歌DeepMind) Indian Institute of Technology Bombay(印度理工学院孟买分校) International Centre for Theoretical Sciences, Tata Institute of Fundamental Research(塔塔基础研究所国际理论科学中心)

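AI summary Proposes a benchmark of synthetic algorithmic tasks that detects cross-lingual capability gaps in large language models by having them perform the same task in different languages; experiments reveal persistent gaps in several state-of-the-art models.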

Comments 8+37pages

详情
AI中文摘要

我们引入一套合成算法任务,用于检测大语言模型在跨语言能力上的差距。我们的基准在语言间具有可比性,因为它要求模型在不同语言中执行相同的底层任务;可扩展,因为每个任务可以在不同复杂度级别生成,从而适应不同能力的模型;可量化,因为每个任务都承认客观的正确性概念;且透明,因为任务是从简单模板生成的,可以轻松审计翻译错误。由于我们的基准专注于算法任务,性能差异是跨语言差距的充分但不必要条件。尽管如此,我们通过大量实验表明,我们的基准暴露了多个最先进模型中存在的持续跨语言差距。

英文摘要

We introduce a set of synthetic algorithmic tasks to detect cross-lingual gaps in the abilities of large language models. Our benchmark is commensurate across languages, since it requires models to perform the same underlying task in different languages; scalable, since each task can be generated at varying levels of complexity allowing it to be adapted to models with different capabilities; quantifiable, since every task admits an objective notion of correctness; and transparent, since tasks are generated from simple templates that can be readily audited for translation errors. Because our benchmark focuses on algorithmic tasks, differential performance is a sufficient -- but not necessary -- indicator of cross-lingual gaps. Nevertheless, we show through extensive experiments that our benchmark exposes persistent cross-lingual gaps in multiple state-of-the-art models.

2605.30786 2026-06-01 cs.LG 版本更新

AbstainGNN: Teaching Graph Neural Networks to Abstain for Graph Classification

AbstainGNN:教会图神经网络在图分类中弃权

Xixun Lin, Zhiheng Zhou, Zhengyin Zhang, Yancheng Chen, Shuai Zhang, Ge Zhang, Shichao Zhu, Lixin Zou, Chuan Zhou, Peng Zhang, Shirui Pan, Yanan Cao

发表机构 * Institute of Information Engineering, Chinese Academy of Sciences(信息工程研究所,中国科学院) School of Mathematics and Statistics, Shandong University(数学与统计学学院,山东大学) Academy of Mathematics and Systems Science, Chinese Academy of Sciences(数学与系统科学研究院,中国科学院) School of Information and Intelligent Science, Donghua University(信息与智能科学学院,东华大学) TikTok, ByteDance Inc.(TikTok,字节跳动公司) School of Cyber Science and Engineering, Wuhan University(网络空间安全学院,武汉大学) Cyberspace Institute of Advanced Technology, Guangzhou University(网络空间先进技术研究院,广州大学) Griffith University(格里菲斯大学)

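AI summary Proposes AbstainGNN, a framework whose theory-driven abstention mechanism lets GNNs decline to predict when uncertain rather than make incorrect decisions, and which optimizes the trade-off between classification and abstention using PAC-Bayes theory.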

Comments Accepted at KDD 2026

详情
AI中文摘要

图分类是图数据挖掘中的核心任务,具有广泛的现实应用。图神经网络的最新进展显著提升了图分类的性能。然而,现有的GNN即使在高度不确定性或未知条件下也通常被迫做出预测,导致不可靠的决策,特别是在安全关键场景中会严重影响下游任务。为了解决这一关键限制,我们提出了AbstainGNN,一种新颖且理论驱动的带弃权图分类框架,使GNN能够拒绝不确定的预测,而不是产生错误的决策。具体来说,AbstainGNN显式地建模了预测函数和弃权函数,从而有效利用图结构信息。此外,与现有的启发式弃权方法不同,我们从PAC-Bayesian泛化角度理论刻画了分类错误与拒绝成本之间的权衡,并推导出用于模型优化的统一学习目标。在此理论洞察的指导下,我们进一步开发了一种高效的两阶段训练策略,包括预测函数预热和弃权函数校准。在五个基准数据集上的大量实验表明,AbstainGNN优于现有的弃权方法,在相同拒绝率下实现了更优的分类性能。

英文摘要

Graph classification is a core task in graph data mining with widespread real-world applications. Recent advances in graph neural networks (GNNs) have led to substantial performance improvements for graph classification. However, existing GNNs are typically forced to make predictions even under high uncertainty or unknown conditions, resulting in unreliable decisions that can severely impact downstream tasks, particularly in safety-critical scenarios. To address this critical limitation, we propose AbstainGNN, a novel and theory-driven framework for graph classification with abstention, which enables GNNs to reject uncertain predictions instead of producing incorrect decisions. Specifically, AbstainGNN explicitly models both the predictive function and the abstention function, allowing for effective utilization of graph structural information. Moreover, unlike existing heuristic abstention methods, we theoretically characterize the trade-off between classification errors and rejection costs from a PAC-Bayesian generalization perspective, and derive a unified learning objective for model optimization. Guided by this theoretical insight, we further develop an efficient two-stage training strategy consisting of predictive function warm-start and abstention function calibration. Extensive experiments on five benchmark datasets show that AbstainGNN outperforms existing abstention methods, achieving superior classification performance under the same rejection rates.

2605.30776 2026-06-01 cs.LG 版本更新

Efficient and Uncertainty-Aware Diffusion Framework for Offline-to-Online Reinforcement Learning

高效且不确定性感知的离线到在线强化学习扩散框架

Ha Manh Bui, Metod Jazbec, Eric Nalisnick, Anqi Liu

发表机构 * Department of Computer Science, Johns Hopkins University, Baltimore, MD, U.S.A.(约翰霍普金斯大学计算机科学系) AMLab, University of Amsterdam, Amsterdam, Netherlands(阿姆斯特丹大学AM实验室)

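AI summary Proposes DUAL, a framework that uses the prior knowledge of a diffusion model to distill a fast-sampling diffusion policy and transition model, and that quantifies uncertainty via a Laplace approximation and distance-based transition-state-shift detection to better balance exploration and exploitation in the online phase.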

Comments International Conference on Machine Learning, 2026

详情
AI中文摘要

离线到在线强化学习(O2O-RL)利用离线预训练策略来最小化昂贵的在线交互。尽管数据高效,但O2O-RL容易受到离线与在线分布之间偏移的影响。现有工作旨在通过对从扩散模型采样的轨迹数据微调策略来减轻这种偏移的危害。受此启发,我们提出了DUAL:一个用于离线到在线强化学习的高效不确定性感知扩散框架。DUAL利用扩散模型的先验知识,在离线阶段蒸馏出一个快速采样的扩散策略和转移模型。DUAL还采用拉普拉斯近似和距离转移状态偏移检测,从而通过不确定性量化来改进在线阶段的探索与利用平衡。我们正式证明,带有拉普拉斯近似的策略损失提供了认知不确定性原则性估计的代理。实验上,DUAL在多种设置和环境下的在线期望回报优于O2O-RL基线。

英文摘要

Offline-to-Online Reinforcement Learning (O2O-RL) leverages an offline, pre-trained policy to minimize costly online interactions. Although data-efficient, O2O-RL is susceptible to shifts between offline and online distributions. Existing work aims to mitigate the harm of this shift by finetuning the policy on trajectory data sampled from a diffusion model. Inspired by this line of work, we propose DUAL: an efficient Diffusion Uncertainty-Aware framework for offline-to-online reinforcement Learning. DUAL utilizes the prior knowledge of the diffusion model to distill a fast-sampling diffusion actor policy and transition model in the offline phase. DUAL also employs a Laplace approximation and distance transition-state-shift detection, thereby using uncertainty quantification to improve exploration versus exploitation in the online phase. We formally show that our actor loss with the Laplace approximation provides a proxy for a principled estimate of epistemic uncertainty. Empirically, DUAL improves the online expected return over O2O-RL baselines across multiple settings and environments.

2605.30758 2026-06-01 cs.CL cs.LG 版本更新

Pairwise Reference Alignment as a Model-Level Ordinal Observable

成对参考对齐作为模型级序数可观测量

Mujing Li

发表机构 * Independent Researcher(独立研究者)

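AI summary Defines pairwise reference alignment as an ordinal observable induced by a model scoring function, proposes a centered order-parameter-like statistic and a margin-based extension, gives finite-sample estimators and concentration bounds, and validates them in experiments with Qwen2.5 models and RewardBench.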

详情
AI中文摘要

成对偏好数据广泛用于语言模型评估和对齐,通常用于模型排名、奖励建模或偏好优化。本文提出了一个更基础的测量问题:给定成对偏好的参考分布,当我们测试模型是否将首选响应排在拒绝响应之上时,估计的是哪个模型级量?我们将成对参考对齐定义为由模型评分函数诱导的序数可观测量。给定三元组$(x,y^+,y^-)$上的参考对分布$P_{\mathrm{pair}}$和标量模型分数$S_M(x,y)$,我们将对齐可观测量定义为模型诱导的排序与参考偏好排序一致的概率。我们进一步定义了一个中心化的序参数类统计量,并讨论了基于间隔(margin)的扩展。所得量在独立抽样假设下具有简单的有限样本估计量和集中不等式界。本文没有引入新的基准。它为成对参考对齐提供了概念和统计上的形式化,阐明了参考对分布的作用,并将一般的序数可观测量与评分选择(如归一化对数概率或基于能量的分数)区分开来。我们还在Qwen2.5模型和RewardBench上进行了初步实证研究,其中所提出的统计量随模型大小和指令调优而增加,并如该形式化所预测的那样在参考对子集之间变化。

英文摘要

Pairwise preference data is widely used in language-model evaluation and alignment, often for model ranking, reward modeling, or preference optimization. This note formulates a more basic measurement question: given a reference distribution of pairwise preferences, what model-level quantity is estimated when we test whether a model ranks preferred responses above rejected responses? We define pairwise reference alignment as an ordinal observable induced by a model scoring function. Given a reference pair distribution $P_{\mathrm{pair}}$ over triples $(x,y^+,y^-)$, and a scalar model score $S_M(x,y)$, we define the alignment observable as the probability that the model-induced ordering agrees with the reference preference ordering. We further define a centered order-parameter-like statistic and discuss a margin-based extension. The resulting quantities admit simple finite-sample estimators and concentration bounds under independent sampling assumptions. This note does not introduce a new benchmark. It provides a conceptual and statistical formulation for pairwise reference alignment, clarifies the role of the reference pair distribution, and distinguishes the general ordinal observable from scoring choices such as normalized log-probability or energy-based scores. We also provide an initial empirical study on Qwen2.5 models and RewardBench, where the proposed statistics increase with model size and instruction tuning and vary across reference-pair subsets as predicted by the formulation.
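The alignment observable and its finite-sample estimate can be sketched as follows. This is a minimal illustration under stated assumptions: ties count as disagreement, the centered statistic is taken to be 2A - 1 (the abstract does not give its exact form), and the concentration bound shown is the standard two-sided Hoeffding bound for i.i.d. samples. All function names are hypothetical.

```python
import math


def alignment_estimate(score, triples):
    """Fraction of (x, y_plus, y_minus) triples where the model score ranks
    the preferred response strictly above the rejected one."""
    agree = sum(1 for x, y_plus, y_minus in triples
                if score(x, y_plus) > score(x, y_minus))
    return agree / len(triples)


def centered_statistic(a_hat):
    """Assumed centering: maps agreement probability in [0, 1] to [-1, 1],
    so 0 corresponds to chance-level ordering."""
    return 2.0 * a_hat - 1.0


def hoeffding_halfwidth(n, delta):
    """With probability at least 1 - delta, |A_hat - A| is at most this value
    for n independent triples (Hoeffding's inequality)."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


# Toy usage: a stand-in "model score" that prefers shorter responses.
def toy_score(x, y):
    return -len(y)


triples = [
    ("q1", "ok", "a much longer answer"),
    ("q2", "yes", "no, because of several reasons"),
    ("q3", "short", "longer text"),
    ("q4", "this preferred answer is long", "brief"),
]
a_hat = alignment_estimate(toy_score, triples)
m_hat = centered_statistic(a_hat)
```

With only four triples the Hoeffding half-width is far too wide to be informative; it shrinks as 1/sqrt(n), which is why the size of the reference-pair sample matters for any reported value.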

2605.30757 2026-06-01 cs.LG 版本更新

Chain-of-Thought and Compressed Looped Transformers: A Memory-Budget Separation

思维链与压缩循环Transformer:记忆预算分离

Haozhou Zhang

发表机构 * Department of Mathematics and Statistics(数学与统计学系)

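AI summary By comparing three memory regimes (compressed latent loop, full sequence-state loop, and chain-of-thought scratchpad), shows that the memory budget of a compressed looped Transformer limits its reasoning ability, whereas chain-of-thought solves harder problems by extending the context.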

详情
AI中文摘要

思维链提示和循环Transformer都赋予固定模型更多的测试时计算,但它们在记忆内容上有所不同。思维链将中间状态存储在生成的标记中,这些标记保留在上下文中,而循环Transformer通过循环隐藏激活传递状态。我们认为这种持久可变记忆是测试时推理的核心资源。我们比较了三种记忆机制:压缩潜在循环、全序列状态循环和思维链暂存区。我们的主要结果表明,压缩循环受其循环状态大小的限制。运行更长时间的循环增加了计算量,但本身不会创建增长的暂存区,因此即使运行多个步骤,具有小循环状态的循环仍然是小空间推理器。在标准复杂性假设下,这样的循环无法解决在logspace归约下P-complete的问题,而多项式长度的思维链可以。这种分离是压缩循环特有的,因为全序列状态循环在每个输入位置携带状态,并处于更接近显式暂存区的记忆丰富状态。受控的指针追逐和关联回忆扫描说明了这种记忆预算观点,其性能对持久状态预算是否匹配任务的工作记忆需求敏感。

英文摘要

Chain-of-thought prompting and looped Transformers both give a fixed model more test-time computation, but they differ in what they remember. Chain-of-thought stores intermediate state in generated tokens that remain in the context, whereas a looped Transformer carries state through recurrent hidden activations. We argue that this persistent mutable memory is a central resource for test-time reasoning. We compare three memory regimes, the compressed latent loop, the full sequence-state loop, and the chain-of-thought scratchpad. Our main result shows that a compressed loop is limited by the size of its recurrent state. Running the loop longer adds computation but does not by itself create a growing scratchpad, so a loop with a small recurrent state remains a small-space reasoner even when run for many steps. Under a standard complexity assumption, such loops cannot decide problems that are P-complete under logspace reductions, whereas polynomial-length chain-of-thought can. The separation is specific to compressed loops, as full sequence-state loops carry state at every input position and live in a memory-rich regime closer to explicit scratchpads. Controlled pointer-chasing and associative-recall sweeps illustrate this memory-budget view, with performance sensitive to whether the persistent-state budget matches the task's working-memory demand.

2605.30749 2026-06-01 cs.LG cs.RO 版本更新

FLAG: Flow Policy MaxEnt-RL by Latent Augmented Guidance

FLAG: 通过潜在增强引导的流策略最大熵强化学习

Sungha Kim, Gawon Lee, Jusuk Lee, Jonghae Park, H. Jin Kim, Daesol Cho

发表机构 * Seoul National University(首尔国立大学) Georgia Institute of Technology(佐治亚理工学院)

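AI summary Proposes FLAG, which augments the state space with a latent variable and optimizes a proxy maximum-entropy objective to address importance weight collapse, enabling expressive policy optimization in high-dimensional control tasks.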

详情
AI中文摘要

最大熵强化学习(MaxEnt-RL)能够实现鲁棒的探索,然而实际实现通常将策略限制为简单的高斯分布。最近的方法通过重要性加权监督学习引入表达性生成策略,但容易受到重要性权重崩溃的影响,这限制了它们在高维动作空间中的可扩展性。我们的关键见解是通过局部化采样区域来缓解这一限制,避免在整个动作空间上进行重要性采样导致的权重退化。为了实例化这一见解,我们引入了FLAG(具有潜在增强引导的流策略)。FLAG通过流潜在变量增强状态空间,并优化一个可证明一致的代理MaxEnt-RL目标。我们经验证明,FLAG能够在有限的重要性样本下实现表达性策略优化,并扩展到高维控制任务。此外,FLAG在具有挑战性的基准测试中达到了最先进的性能。我们的项目网页:https://flag-rl.github.io/

英文摘要

Maximum entropy reinforcement learning (MaxEnt-RL) enables robust exploration, yet practical implementations often restrict policies to simple Gaussians. While recent approaches incorporate expressive generative policies via importance-weighted supervised learning, they are prone to importance weight collapse, which limits their scalability in high-dimensional action spaces. Our key insight is to mitigate this limitation by localizing the sampling region, avoiding the weight degeneracy induced by importance sampling over the entire action space. To instantiate this insight, we introduce FLAG (Flow policy with Latent-Augmented Guidance). FLAG augments the state space with a flow latent variable and optimizes a provably consistent proxy MaxEnt-RL objective. We empirically demonstrate that FLAG enables expressive policy optimization with limited importance samples and scales to high-dimensional control tasks. Furthermore, FLAG achieves state-of-the-art performance across challenging benchmarks. Our project webpage: https://flag-rl.github.io/
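The importance weight collapse mentioned above can be seen in a small numerical sketch. This is not FLAG itself; it only assumes a standard Gaussian proposal and a mean-shifted Gaussian target, and measures the effective sample size (ESS) of the self-normalized weights as the dimension grows. The function names are hypothetical.

```python
import math
import random


def effective_sample_size(log_weights):
    """ESS = (sum w)^2 / sum w^2, computed stably from log-weights."""
    m = max(log_weights)
    w = [math.exp(lw - m) for lw in log_weights]
    return sum(w) ** 2 / sum(v * v for v in w)


def gaussian_shift_log_weights(dim, shift, n, rng):
    """log p(x) / q(x) for p = N(shift * 1, I) and q = N(0, I), with x drawn from q.

    Per coordinate, log p - log q = shift * x - shift^2 / 2.
    """
    out = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        out.append(sum(shift * xi - 0.5 * shift * shift for xi in x))
    return out


rng = random.Random(0)
ess_low = effective_sample_size(gaussian_shift_log_weights(1, 0.5, 1000, rng))
ess_high = effective_sample_size(gaussian_shift_log_weights(50, 0.5, 1000, rng))
```

Even with the same per-coordinate mismatch, the ESS in 50 dimensions is a small fraction of the ESS in one dimension, because the log-weight variance grows linearly with dimension. Restricting sampling to a local region, as the abstract proposes, is one way to keep that mismatch small.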

2605.30741 2026-06-01 stat.ML cs.LG 版本更新

Is the Last Layer Sufficient for Uncertainty Quantification?

最后一层是否足以进行不确定性量化?

Joseph Wilson, Chris van der Heide, Liam Hodgkinson, Fred Roosta

发表机构 * School of Mathematics and Physics, University of Queensland, Australia(昆士兰大学数学与物理学院,澳大利亚) School of Mathematics and Statistics, University of Melbourne, Australia(墨尔本大学数学与统计学学院,澳大利亚) Department of Electrical and Electronic Engineering, University of Melbourne, Australia(墨尔本大学电子与电气工程系,澳大利亚) ARC Training Centre for Information Resilience (CIRES), Brisbane, Australia(信息韧性ARC培训中心(CIRES),布里斯班,澳大利亚)

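AI summary Compares full-network and last-layer linearization for epistemic uncertainty quantification in deep neural networks through theoretical analysis and empirical evaluation, finding that the last-layer approximation maintains comparable UQ performance while substantially improving computational efficiency.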

Comments 40 pages, 14 figures, 7 tables

详情
AI中文摘要

深度神经网络(DNN)的认知不确定性量化(UQ)是在关键任务环境中安全采用AI的要求。几种领先的UQ方法将DNN线性化以形成贝叶斯广义线性模型(GLM),其中认知不确定性通过预测后验分布建模。在DNN的最终连接层参数周围进行线性化是一种常用的近似方法,用于减少此类GLM的计算负担,尽管通常认为这会以性能下降为代价。在这项工作中,我们使用理论和实证方法比较了由全网络和最后一层线性化产生的GLM。我们首先利用随机矩阵理论进行理论比较;该分析显示全线性化在UQ能力上没有有意义的改进。结合一系列现代机器学习任务的大规模实证评估,我们得出以下结论:最后一层近似在提供显著提高的计算效率的同时,产生了可比的UQ性能。

英文摘要

Epistemic uncertainty quantification (UQ) for deep neural networks (DNNs) is a requirement for safe adoption of AI in mission-critical settings. Several leading methods for UQ linearize DNNs to form Bayesian Generalized Linear Models (GLMs), where epistemic uncertainty is modeled via the predictive posterior distribution. Linearizing around the parameters of the final connected layer of a DNN is a commonly used approximation for reducing the computational burden of such GLMs, though it is often believed to come at the cost of degraded performance. In this work, we compare GLMs arising from full-network and last-layer linearization using both theoretical and empirical approaches. We first employ tools from random matrix theory to conduct a theoretical comparison; this analysis reveals no meaningful improvement in the UQ capabilities of full linearization. Coupled with a large-scale empirical evaluation across a range of modern machine learning tasks, we arrive at the following conclusion: a last-layer approximation yields comparable UQ performance while offering substantially improved computational efficiency.

2605.30736 2026-06-01 cs.LG cs.AI cs.CL 版本更新

OrcaRouter: A Production-Oriented LLM Router with Hybrid Offline-Online Learning

OrcaRouter: 一种面向生产的混合离线-在线学习LLM路由器

Zhenghua Bao, Fengya Tian, Chris Zhang, Zhenjun Chen, Xile Ma, Yi Shi

发表机构 * Continuum AI

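AI summary Proposes OrcaRouter, a production-grade LLM router that combines a LinUCB contextual bandit with a hybrid offline-online learning protocol, using offline full-information feedback and online bandit learning to select models accurately at low cost.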

Comments 6 pages, 1 table. Technical report

详情
AI中文摘要

大型语言模型的快速发展,每个模型具有不同的能力和推理成本,引发了一个实际部署问题:给定一个传入请求,应由哪个模型处理?我们提出OrcaRouter,一种面向生产的LLM路由器,它结合了基于词法和句子嵌入特征的LinUCB上下文赌博机与混合离线-在线学习协议。在离线阶段,OrcaRouter通过在一组精心策划的路由提示上评估每个候选模型来获取全信息反馈,生成一个奖励矩阵,用于为每个臂拟合一个岭回归器。在部署时,它从这些参数初始化,并可选地从赌博机反馈中继续学习,在观察到奖励后仅更新所选模型的臂。在我们提交RouterArena时(2026年5月20日),OrcaRouter-Adaptive以72.08的竞技场得分在公共RouterArena排行榜上排名第二,在每1000次查询成本1.00美元的情况下实现了75.54%的准确率。

英文摘要

The rapid development of large language models, each with distinct capabilities and inference costs, raises a practical deployment question: given an incoming request, which model should handle it? We present OrcaRouter, a production-oriented LLM router that combines a LinUCB-based contextual bandit over lexical and sentence-embedding features with a hybrid offline-online learning protocol. Offline, OrcaRouter obtains full-information feedback by evaluating each candidate model on a curated set of routing prompts, yielding a reward matrix used to fit one ridge regressor per arm. At deployment time, it initializes from these parameters and can optionally continue learning from bandit feedback, updating only the selected model's arm after observing its reward. At the time of our RouterArena submission (May 20, 2026), OrcaRouter-Adaptive ranked second on the public RouterArena leaderboard with an arena score of 72.08, achieving 75.54% accuracy at a cost of USD 1.00 per 1,000 queries.
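The hybrid protocol described above (offline ridge fit per arm, then optional online bandit updates of the chosen arm only) can be sketched with a textbook LinUCB. This is a minimal illustration, not OrcaRouter's implementation: the feature vectors, reward values, `alpha`, and all class and function names are assumptions, and the inverse design matrix is maintained with Sherman-Morrison updates to keep the example dependency-free.

```python
import math


class LinUCBArm:
    """One ridge regressor per candidate model, stored as A^{-1} and b."""

    def __init__(self, dim, ridge=1.0):
        self.a_inv = [[(1.0 / ridge if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_times(self, x):
        return [sum(row[j] * x[j] for j in range(len(x))) for row in self.a_inv]

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of (A + x x^T)^{-1}.
        ax = self._a_inv_times(x)
        denom = 1.0 + sum(xi * axi for xi, axi in zip(x, ax))
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= ax[i] * ax[j] / denom
            self.b[i] += reward * x[i]

    def ucb(self, x, alpha):
        theta = self._a_inv_times(self.b)  # ridge estimate A^{-1} b
        ax = self._a_inv_times(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(max(0.0, sum(xi * axi for xi, axi in zip(x, ax))))
        return mean + alpha * width


def offline_fit(arms, contexts, reward_matrix):
    """Full-information warm start: every arm sees the reward for every prompt."""
    for x, rewards in zip(contexts, reward_matrix):
        for arm, r in zip(arms, rewards):
            arm.update(x, r)


def route(arms, x, alpha=0.5):
    scores = [arm.ucb(x, alpha) for arm in arms]
    return max(range(len(arms)), key=scores.__getitem__)


def online_step(arms, x, reward_fn, alpha=0.5):
    """Bandit feedback: only the selected arm is updated."""
    chosen = route(arms, x, alpha)
    arms[chosen].update(x, reward_fn(chosen))
    return chosen


# Toy usage: model 0 is good on feature 0, model 1 on feature 1.
arms = [LinUCBArm(2), LinUCBArm(2)]
contexts = [[1.0, 0.0], [0.0, 1.0]] * 20
reward_matrix = [[1.0, 0.0], [0.0, 1.0]] * 20
offline_fit(arms, contexts, reward_matrix)
```

A real router would fold inference cost into the reward; here the reward is a bare correctness signal so that the offline and online phases stay easy to follow.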

2605.30734 2026-06-01 cs.LG cs.CV 版本更新

Beyond Accuracy: Evaluating Efficiency, Robustness and Explainability in Deep Learning for Malaria Diagnosis

超越准确率:评估深度学习在疟疾诊断中的效率、鲁棒性和可解释性

Olivier Kanamugire, Kerol Djoumessi

发表机构 * African Institute for Mathematical Sciences(非洲数学科学研究所) Hertie Institute for AI in Brain Health(赫蒂脑健康人工智能研究所)

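AI summary Benchmarks four deep learning models on the NLM-Malaria dataset, jointly evaluating predictive performance, robustness, and post-hoc explainability; lightweight models match heavier ones in performance, but explanations are fragile under image corruption.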

Comments Under review

详情
AI中文摘要

疟疾仍然是撒哈拉以南非洲地区的主要死亡原因,该地区诊断基础设施匮乏,使得及时准确的诊断尤其具有挑战性。虽然深度学习为自动化疟疾筛查提供了一条有前景的途径,但临床采用受到计算成本和决策不透明性的阻碍。本研究在NLM-Malaria数据集上基准测试了四种涵盖广泛设计架构和模型容量的深度学习模型,联合评估了预测性能、鲁棒性和事后可解释性。我们发现,轻量级、高效设计的模型在预测性能上与更重的模型相当,Friedman检验确认无统计显著差异。基于CAM的XAI方法一致地定位诊断相关区域,而细粒度归因方法产生的解释针对性较弱,尤其是在使用更重的骨干网络时。在三种图像损坏下的鲁棒性评估进一步揭示,模型置信度下降速度快于准确率,为人工审核提供了实用信号。然而,没有一种XAI方法对损坏具有鲁棒性,即使在预测仍然准确的情况下,解释可靠性也会在临床实践中可能出现的噪声水平下降。这些发现支持在资源受限环境中部署轻量级架构用于疟疾诊断,同时强调事后解释的脆弱性,这是负责任临床部署的重要考虑因素。

英文摘要

Malaria remains a leading cause of mortality in sub-Saharan Africa, where scarce diagnostic infrastructure makes timely, accurate diagnosis particularly challenging. While deep learning offers a compelling path toward automated malaria screening, clinical adoption is hindered by computational cost and opacity in decision-making. This work benchmarks four deep learning models spanning a wide range of architecture designs and model capacities on the NLM-Malaria dataset, jointly evaluating predictive performance, robustness, and post-hoc explainability. We find that lightweight, efficient-by-design models match their heavier counterparts in predictive performance, and the Friedman test confirms no statistically significant performance differences. CAM-based XAI methods consistently localize diagnostically relevant regions, while fine-grained attribution methods produce less targeted explanations, particularly with heavier backbones. Robustness evaluation under three types of image corruption further reveals that model confidence degrades faster than accuracy, providing a practical signal for human review. However, no XAI method is robust to corruption, with explanation reliability degrading at noise levels plausible in clinical practice, even when predictions remain accurate. These findings support the deployment of lightweight architectures for malaria diagnosis in resource-constrained settings, while highlighting the vulnerability of post-hoc explanations as an important consideration for responsible clinical deployment.

2605.30729 2026-06-01 cs.LG cs.IR 版本更新

SemStruct: Contextualizing Semantic Embeddings with Structural Information for Schema Matching

SemStruct: 利用结构信息对语义嵌入进行上下文化以实现模式匹配

Inwon Kang, Kavitha Srinivas, Nandana Mihindukulasooriya, Sola Shirai, Parikshit Ram, Horst Samulowitz, Oshani Seneviratne

发表机构 * Rensselaer Polytechnic Institute(伦斯勒理工学院) IBM Research(IBM研究院)

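AI summary Proposes SemStruct, a framework that combines a frozen pre-trained language model with the structural inductive bias of a graph neural network, using row-level co-occurrence as structural information to reach state-of-the-art performance on schema matching.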

Comments Accepted to KDD 26 Research Track

详情
AI中文摘要

模式匹配是集成异构数据源的基本步骤。虽然预训练语言模型通过捕获语言语义彻底改变了这一任务,但它们通常将表格数据视为独立列描述的序列化文本。这种序列化丢弃了关键的结构信息——具体来说,行级共现,即关系上下文——迫使模型仅依赖列标题语义或独立分布。为弥补这一差距,我们提出了SemStruct,一个将冻结的PLM的语义能力与图神经网络的结构归纳偏置相结合的框架。我们将表格建模为一个异构图,其中列和值是由行连接的节点,允许GNN在结构上传播消歧上下文。与需要专有LLM访问和语言模型微调的其他最先进方法不同,SemStruct保持语言模型冻结,仅训练一个轻量级结构编码器。在Valentine和SOTAB-SM基准上的大量实验表明,SemStruct实现了最先进的性能,在复杂的、可语义连接的数据集上超越了完全微调的基线。此外,我们的消融研究表明,行表示主要作为拓扑导管而非语义实体,验证了在模式匹配中显式结构建模的必要性。

英文摘要

Schema matching is a fundamental step in integrating heterogeneous data sources. While Pre-trained Language Models (PLMs) have revolutionized this task by capturing linguistic semantics, they typically process tabular data as serialized text sequences of standalone column descriptions. This serialization discards critical structural information -- specifically, the row-level co-occurrences, i.e. the relational context -- forcing models to rely solely on column header semantics or standalone distributions. To bridge this gap, we propose SemStruct, a framework that joins the semantic power of frozen PLMs with the structural inductive bias of Graph Neural Networks (GNNs). We model the table as a heterogeneous graph where columns and values are nodes connected by rows, allowing the GNN to propagate disambiguating context across the structure. Unlike other state-of-the-art methods that require proprietary LLM access and fine-tuning of language models, SemStruct keeps the language model frozen and trains only a lightweight structural encoder. Extensive experiments on the Valentine and SOTAB-SM benchmarks demonstrate that SemStruct achieves state-of-the-art performance, outperforming fully fine-tuned baselines on complex, semantically joinable datasets. Furthermore, our ablation studies reveal that row representations serve primarily as topological conduits rather than semantic entities, validating the necessity of explicit structural modeling in schema matching.

2605.30728 2026-06-01 cs.LG cs.DC 版本更新

Reducing the GPU Memory Bottleneck with Lossless Compression for ML -- Extended

通过无损压缩减少机器学习中的GPU内存瓶颈——扩展版

Aditya K Kamath, Arvind Krishnamurthy, Marco Canini, Simon Peter

发表机构 * University of Washington(华盛顿大学)

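AI summary Proposes IBP, a lossless compression algorithm that identifies and eliminates invariant bits across groups of tensors and uses GPU-optimized decompression with asynchronous PCIe transfers to substantially reduce data transfer time, speeding up GNN training, DLRM embedding lookup, and LLM inference.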

Comments Extended version of the paper published at the 21st European Conference on Computer Systems (EuroSys '26), April 27-30, 2026, Edinburgh, Scotland, UK

详情
Journal ref
2026. In Proceedings of the 21st European Conference on Computer Systems. Association for Computing Machinery, 899-918
AI中文摘要

机器学习(ML)训练和推理经常处理远超GPU内存容量的数据集,迫使它们依赖PCIe进行按需张量传输,导致关键的传输瓶颈。有损压缩已被提出以缓解瓶颈,但会引入依赖工作负载的精度损失,使得在现有ML部署中使用变得复杂甚至不可行。我们探索无损压缩作为替代方案,以避免这种部署复杂性。我们确定了无损压缩可以集成到ML流水线中的位置,同时最小化对GPU执行的干扰。基于我们的发现,我们引入了不变位打包(IBP),一种新颖的无损压缩算法,旨在最小化ML的数据传输时间。IBP识别并消除张量组中的不变位,通过利用warp并行性、低开销位操作和异步PCIe传输的GPU优化解压缩来提高吞吐量。我们提供易于使用的API,通过为GNN训练以及DLRM和LLM推理框架添加IBP支持来展示它们。IBP平均实现了74%更快的GNN训练、180%更快的DLRM嵌入查找和24%更快的LLM推理。

英文摘要

Machine learning (ML) training and inference often process data sets far exceeding GPU memory capacity, forcing them to rely on PCIe for on-demand tensor transfers, causing critical transfer bottlenecks. Lossy compression has been proposed to relieve bottlenecks but introduces workload-dependent accuracy loss, making it complex or even prohibitive to use in existing ML deployments. We explore lossless compression as an alternative that avoids this deployment complexity. We identify where lossless compression can be integrated into ML pipelines while minimizing interference with GPU execution. Based on our findings, we introduce Invariant Bit Packing (IBP), a novel lossless compression algorithm designed to minimize data transfer time for ML. IBP identifies and eliminates invariant bits across groups of tensors, improving throughput through GPU-optimized decompression that leverages warp parallelism, low-overhead bit operations, and asynchronous PCIe transfers. We provide easy-to-use APIs, showcasing them by adding IBP support to GNN training, as well as DLRM and LLM inference frameworks. IBP achieves, on average, 74% faster GNN training, 180% faster DLRM embedding lookup, and 24% faster LLM inference.
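The core idea of eliminating invariant bits across a group of values can be sketched on plain integers. This is a CPU-side toy under stated assumptions, not the paper's IBP format or its GPU decompression kernel: it treats each tensor element as an unsigned word of fixed width, and the names `invariant_bits`, `pack`, and `unpack` are hypothetical.

```python
def invariant_bits(words, width=32):
    """Return (mask, base): mask marks bit positions that are identical in every
    word; base holds the shared value of those positions."""
    full = (1 << width) - 1
    all_and, all_or = full, 0
    for w in words:
        all_and &= w
        all_or |= w
    # A bit is invariant if it is 1 in every word or 0 in every word.
    mask = all_and | (~all_or & full)
    return mask, all_and


def pack(words, width=32):
    """Store the invariant bits once and only the varying bits per word."""
    mask, base = invariant_bits(words, width)
    varying = [i for i in range(width) if not (mask >> i) & 1]
    payload = 0
    for k, w in enumerate(words):
        for j, bit in enumerate(varying):
            payload |= ((w >> bit) & 1) << (k * len(varying) + j)
    return {"count": len(words), "base": base, "varying": varying, "payload": payload}


def unpack(packed):
    """Lossless inverse of pack: scatter the varying bits back over the base."""
    varying = packed["varying"]
    out = []
    for k in range(packed["count"]):
        w = packed["base"]
        for j, bit in enumerate(varying):
            w |= ((packed["payload"] >> (k * len(varying) + j)) & 1) << bit
        out.append(w)
    return out


# Toy usage: four 32-bit words that share their upper 16 bits.
words = [0x3F800000, 0x3F800001, 0x3F8000F0, 0x3F80ABCD]
packed = pack(words)
restored = unpack(packed)
```

Here each word needs only its varying bit positions in the payload instead of all 32, and the round trip is exact. How well this works on real tensors depends on how many bits actually stay constant within a group.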

2605.30720 2026-06-01 cs.LG cs.AI econ.GN q-fin.EC stat.ML 版本更新

Kalimati Vegetable Price Index Forecasting with a Momentum Corrected Online Stacking Ensemble

Kalimati蔬菜价格指数预测:基于动量校正的在线堆叠集成方法

Sahaj Raj Malla

发表机构 * Department of Mathematics, Kathmandu University(数学系,加德满都大学)

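AI summary To address the high volatility of agricultural prices in emerging economies, proposes a momentum-corrected online stacking ensemble; with an inverse-volatility weighted composite index and 64 causal features, it reaches RMSE = 1.771, MAPE = 0.68%, and R² = 0.845 at the 90-day horizon.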

Comments 21 pages, 8 figures, 2 tables

详情
AI中文摘要

由于高波动性、频繁的供应中断以及强烈的文化需求影响,新兴经济体的农产品价格预测十分困难。本研究引入了Kalimati蔬菜价格指数(KVPI),这是一个新的逆波动率加权综合指数,汇总了加德满都十年(2013-2023年)的135种日度批发商品。通过创建稳定的宏观信号,KVPI减少了单个作物建模固有的噪声。我们开发了包含64个因果有效特征的丰富特征集,包括节日领先滞后效应、滚动统计量和日历变量。对涵盖统计、树基、深度学习、混合和Transformer架构的14种预测模型,在短期(7天)、中期(14天和30天)和长期(90天)预测期上进行了严格评估。树基集成方法表现出显著的鲁棒性,而经典统计模型和复杂Transformer在处理噪声数据集时表现不佳。提出的动量校正在线堆叠集成模型取得了最强性能,在90天预测期上均方根误差(RMSE)为1.771,平均绝对百分比误差(MAPE)低至0.68%,并解释了84.5%的方差(R²=0.845)。这一开源流程为尼泊尔及类似市场的政策制定者和供应链参与者提供了实用、可靠的工具,以预测价格波动并加强粮食安全。

英文摘要

Forecasting agricultural commodity prices in emerging economies is difficult due to high volatility, frequent supply disruptions, and strong cultural influences on demand. This study introduces the Kalimati Vegetable Price Index (KVPI), a new inverse-volatility weighted composite index that aggregates 135 daily wholesale commodities from Kathmandu over ten years (2013-2023). By creating a stable macro-level signal, the KVPI reduces the noise inherent in modelling individual crops. A rich set of 64 causally valid features was developed, including festival lead-lag effects, rolling statistics, and calendar variables. Fourteen forecasting models spanning statistical, tree-based, deep learning, hybrid, and transformer architectures were rigorously evaluated across short (7-day), medium (14- and 30-day), and long-term (90-day) horizons. Tree-based ensembles proved notably robust, while classical statistical models and complex transformers struggled with the noisy dataset. The proposed Momentum-Corrected Online Stacking Ensemble achieved the strongest performance, yielding a Root Mean Square Error (RMSE) of 1.771, an exceptionally low Mean Absolute Percentage Error (MAPE) of 0.68%, and explaining 84.5% of the variance (R-squared = 0.845) at the 90-day horizon. This open-source pipeline provides policymakers and supply chain actors in Nepal and similar markets with a practical, reliable tool for anticipating price movements and strengthening food security.
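An inverse-volatility weighted composite index of the kind described above can be sketched in a few lines. This is only an illustration under stated assumptions: volatility is taken as the population standard deviation of simple daily returns over the whole sample, every series has nonzero volatility, and prices are normalized to their first value. The abstract does not specify these choices, and the function names are hypothetical.

```python
import statistics


def daily_returns(prices):
    """Simple returns between consecutive observations."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def inverse_volatility_weights(price_series):
    """Weight each commodity by 1 / volatility, normalized to sum to 1.

    Assumes every series has nonzero return volatility.
    """
    inv = {name: 1.0 / statistics.pstdev(daily_returns(p))
           for name, p in price_series.items()}
    total = sum(inv.values())
    return {name: w / total for name, w in inv.items()}


def composite_index(price_series, base=100.0):
    """Weighted average of prices relative to day 0, scaled so day 0 equals base."""
    weights = inverse_volatility_weights(price_series)
    length = len(next(iter(price_series.values())))
    return [
        base * sum(weights[n] * p[t] / p[0] for n, p in price_series.items())
        for t in range(length)
    ]


# Toy usage with two made-up commodities.
series = {
    "potato": [10.0, 10.1, 10.0, 10.2],
    "tomato": [20.0, 26.0, 18.0, 25.0],
}
weights = inverse_volatility_weights(series)
index = composite_index(series)
```

Estimating volatility on the full sample uses future data, so a forecasting pipeline that needs causally valid inputs would compute the weights on a trailing window instead.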

2605.30719 2026-06-01 cs.LG cs.AI 版本更新

When are LLMs Sufficient Policy Optimizers for Sequential RL Tasks?

何时LLMs足以作为序列RL任务的策略优化器?

Stephane Hatgis-Kessell, Emma Brunskill

发表机构 * Department of Computer Science, Stanford University(计算机科学系,斯坦福大学)

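AI summary Proposes PromptPO, in which an LLM is given Python descriptions of the state space, action space, and reward function and iteratively generates and refines executable policies from rollout feedback; it matches or exceeds standard RL baselines in many environments but falls short on fine-grained continuous control tasks.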

详情
AI中文摘要

我们研究大型语言模型(LLMs)何时可以作为强化学习(RL)任务的有效黑盒策略优化器,即何时可以用LLM替代经典RL算法?我们通过引入提示策略优化(PromptPO)来探索这个问题,这是一种迭代方法,它用状态空间、动作空间和奖励函数的Python描述提示LLM,然后让LLM根据rollout反馈生成并优化可执行策略。在硬探索环境、Meta-World机器人任务以及几个现实世界控制问题中,PromptPO通常匹配或超过标准RL基线的性能,同时使用显著更少的环境交互。为了最大化期望回报,且无需进一步显式提示,PromptPO输出的策略范围从调谐的比例控制器或基于规则的规划到运行值迭代等规划算法的策略。我们的结果表明,当LLM能够利用关于环境或优化策略的先验知识时,基于LLM的策略优化是足够的。PromptPO在MuJoCo领域中的表现不如标准RL基线,这展示了基于LLM的策略优化在需要细粒度连续控制的设置中可能存在的局限性。

英文摘要

We study when large language models (LLMs) can serve as effective black-box policy optimizers for reinforcement learning (RL) tasks, i.e., when can we replace classical RL algorithms with an LLM? We explore this question by introducing Prompted Policy Optimization (PromptPO), an iterative method that prompts an LLM with Python descriptions of the state space, action space, and reward function, then has it generate and refine executable policies based on rollout feedback. Across hard exploration environments, Meta-World robotics tasks, and several real-world control problems, PromptPO often matches or exceeds the performance of standard RL baselines while using substantially fewer environment interactions. To maximize expected return, and without further explicit prompting, the policies PromptPO outputs range from tuned proportional controllers or rule-based plans to policies that run planning algorithms like value iteration. Our results demonstrate that LLM-based policy optimization is sufficient when the LLM can leverage prior knowledge about the environment or optimization strategy. PromptPO underperforms standard RL baselines in MuJoCo domains, which points to possible limitations of LLM-based policy optimization in settings that require fine-grained continuous control.

2605.30713 2026-06-01 cs.LG cs.CV cs.MM 版本更新

Diversity Matters: Revisiting Test-Time Compute in Vision-Language Models

多样性至关重要:重新审视视觉-语言模型中的测试时计算

Yijie Tong, Yifan Hou, Shaobo Cui, Antoine Bosselut, Mrinmaya Sachan

发表机构 * ETH Zürich(苏黎世联邦理工学院) Shanghai Jiao Tong University(上海交通大学) EPFL(洛桑联邦理工学院)

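AI summary Addresses the underexplored use of test-time compute (TTC) strategies in vision-language models (VLMs) by proposing ETTC, a method based on predictive entropy that exploits confidence differences between models to improve ensemble performance; it is shown theoretically and experimentally to outperform majority voting and the best single model.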

Comments ICML 2026

详情
AI中文摘要

测试时计算(TTC)策略已成为提升大型语言模型(LLM)推理能力的一种轻量级方法。然而,它们在视觉-语言模型(VLM)中的应用和益处尚未得到充分探索。我们对七个VLM和六个基准进行了TTC的系统研究,特别分析了基于特征的评分和多数投票方法。我们发现特征启发式方法失败,而投票在单模型设置中仅带来微小提升。我们从理论上证明,这种局限性源于缺乏预测多样性:当输出高度相关时,投票收益甚微。相比之下,多模型集成提供了更丰富的多样性,但标准的多数投票未能考虑不同模型的能力差异。为解决这一问题,我们提出了基于熵的TTC(ETTC),它根据预测熵选择最自信的预测。在单模型情况下,我们的方法退化为多数投票,但在模型集成中,它利用置信度差异优先考虑更强的模型。我们证明,在温和假设下ETTC优于多数投票,并通过实验表明它始终优于投票和最佳个体模型。关键在于,我们的结果表明,较小的模型可以协同增强较大的模型,释放出标准策略无法实现的集成增益。

英文摘要

Test-time compute (TTC) strategies have emerged as a lightweight approach to boost reasoning in large language models (LLMs). However, their application and benefits for vision-language models (VLMs) remain underexplored. We present a systematic study of TTC across seven VLMs and six benchmarks, specifically analyzing feature-based scoring and majority voting methods. We find that feature heuristics fail and voting yields only modest gains in single-model settings. We theoretically show that this limitation stems from a lack of prediction diversity: when outputs are highly correlated, voting provides little benefit. In contrast, multi-model ensembles offer richer diversity, yet standard majority voting fails to account for varying model capabilities. To address this, we propose Entropy-based TTC (ETTC), which selects the most confident prediction based on predictive entropy. Our method reduces to majority voting in the single-model case, but in model ensembles, it leverages confidence disparities to prioritize stronger models. We prove that ETTC outperforms majority voting under mild assumptions and empirically demonstrate that it consistently surpasses both voting and the best individual model. Crucially, our results show that smaller models can synergistically enhance larger ones, unlocking ensembling gains not achievable with standard strategies.
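Entropy-based selection across an ensemble can be sketched as follows. This is a minimal illustration, assuming that predictive entropy is estimated from the empirical distribution of each model's sampled answers and that the selected model's most frequent answer is returned; the abstract does not specify the estimator, and the function names are hypothetical.

```python
import math
from collections import Counter


def answer_entropy(samples):
    """Shannon entropy (nats) of the empirical answer distribution."""
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in Counter(samples).values())


def majority(samples):
    """Most frequent answer among the samples."""
    return Counter(samples).most_common(1)[0][0]


def entropy_select(samples_by_model):
    """Return the majority answer of the model with the lowest answer entropy.

    With a single model this is just majority voting over its samples.
    """
    best = min(samples_by_model,
               key=lambda name: answer_entropy(samples_by_model[name]))
    return majority(samples_by_model[best])


# Toy usage: two uncertain models lean towards "A", one confident model says "B".
samples_by_model = {
    "m1": ["A", "A", "A", "C", "D"],
    "m2": ["A", "A", "A", "C", "D"],
    "m3": ["B", "B", "B", "B", "B"],
}
pooled = [a for samples in samples_by_model.values() for a in samples]
pooled_vote = majority(pooled)
selected = entropy_select(samples_by_model)
```

In this toy the pooled vote follows the two uncertain models, while the entropy rule follows the single confident one. Whether the confident model is also the correct one is exactly what the paper's assumptions have to guarantee.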

2605.30711 2026-06-01 cs.CL cs.AI cs.LG stat.ML 版本更新

SAGE: A Novelty Gate for Efficient Memory Evolution in Agentic LLMs

SAGE: 一种用于智能体大语言模型中高效记忆演化的新颖门控机制

Sijia Wang, Dhanajit Brahma, Ricardo Henao

发表机构 * Duke University(杜克大学)

AI总结 提出SAGE门控机制,基于von Mises-Fisher密度估计和自适应阈值,将记忆写入控制建模为新颖性检测问题;在LoCoMo上,其平均token-F1在七个开放权重骨干的对比中均优于Mem0,并在GPT-4o-mini上将添加阶段的API成本降低3.4倍。

详情
AI中文摘要

智能体大语言模型必须持续决定新提取的事实是应添加、与现有记忆合并还是忽略,然而先前的工作更侧重于检索和存储,而非原则性的写入端控制。我们将记忆演化视为一个新颖性检测问题,并提出SAGE(Spherical Adaptive Gate for memory Evolution),一种用于记忆演化的球形自适应门控机制,它通过基于von Mises-Fisher的密度估计器对记忆嵌入上的候选事实进行评分,并使用跟踪记忆存储几何结构的自适应阈值对其进行路由。SAGE将明确新颖的事实解析为ADD,明确冗余的事实解析为NOOP,仅将不确定的情况发送给LLM合并步骤,从而减少了昂贵的写入时推理。在LoCoMo上,SAGE在所有七个开放权重骨干对比中均实现了对Mem0的最佳平均token-F1,而在GPT-4o-mini上,它将添加阶段的API成本降低了3.4倍,添加阶段延迟降低了2.5倍,且平均评判分数差距很小。作为A-Mem的即插即用二进制门控,SAGE在五个模型上跳过了大约16-18%的LLM调用,且在开放权重骨干上质量变化极小。这些结果表明,新颖性感知的写入控制是提高长期智能体记忆中记忆质量和系统效率的实用杠杆。

英文摘要

Agentic LLMs must continuously decide whether newly extracted facts should be added, merged with existing memories, or ignored, yet prior work has focused more on retrieval and storage than on principled write-side control. We frame memory evolution as a novelty-detection problem and propose SAGE, a Spherical Adaptive Gate for memory Evolution that scores candidate facts with a von Mises-Fisher-based density estimator over memory embeddings and routes them with an adaptive threshold that tracks memory-store geometry. SAGE resolves clearly novel facts as ADD, clearly redundant facts as NOOP, and sends only uncertain cases to an LLM merge step, reducing expensive write-time reasoning. On LoCoMo, SAGE achieves the best average token-F1 against Mem0 on all seven open-weight backbone comparisons, while on GPT-4o-mini it reduces add-phase API cost by 3.4$\times$ and add-phase latency by 2.5$\times$ with only a small average judge-score gap. As a drop-in binary gate for A-Mem, SAGE skips roughly 16-18% of LLM calls across five models with minimal quality change on open-weight backbones. These results suggest that novelty-aware write control is a practical lever for improving both memory quality and system efficiency in long-term agentic memory.
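The three-way routing described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the score is an unnormalized von Mises-Fisher kernel average over unit-normalized memory embeddings, `kappa` is a hypothetical concentration value, and the fixed `low`/`high` thresholds stand in for the adaptive threshold that the paper derives from memory-store geometry.

```python
import math


def normalize(v):
    """Scale a vector to unit length so it lies on the sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def vmf_score(candidate, memory, kappa=20.0):
    """Average unnormalized vMF kernel exp(kappa * (cos - 1)) over memory.

    The value is near 1 when the candidate coincides with every stored
    embedding and near 0 when it is far from all of them.
    """
    c = normalize(candidate)
    total = 0.0
    for m in memory:
        cos = sum(a * b for a, b in zip(c, normalize(m)))
        total += math.exp(kappa * (cos - 1.0))
    return total / len(memory)


def route(candidate, memory, low=0.05, high=0.4, kappa=20.0):
    """Return ADD, NOOP, or LLM_MERGE for a candidate fact embedding.

    Assumption: `low` and `high` are fixed here purely for illustration.
    """
    if not memory:
        return "ADD"  # nothing stored yet, so any fact is novel
    score = vmf_score(candidate, memory, kappa)
    if score < low:
        return "ADD"        # clearly novel
    if score > high:
        return "NOOP"       # clearly redundant
    return "LLM_MERGE"      # uncertain: defer to the LLM merge step


# Hypothetical 3-dimensional embeddings.
memory = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(route([1.0, 0.0, 0.0], memory))   # duplicate of a stored embedding
print(route([0.0, 0.0, 1.0], memory))   # orthogonal to everything stored
print(route([1.0, 0.35, 0.0], memory))  # close to, but not the same as, a stored one
```

Only the third kind of case reaches the LLM, which is where the reduction in write-time calls reported in the abstract would come from.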

2605.30700 2026-06-01 cs.CV cs.LG 版本更新

Mathematical Morphology in Machine Learning

机器学习中的数学形态学

Erick Oliveira Rodrigues, Aura Conci

发表机构 * Universidade Federal Fluminense(弗鲁米嫩塞联邦大学)

AI总结 将数学形态学引入机器学习,提出基于形态学重建的快速聚类算法和一种结合闵可夫斯基与切比雪夫距离的新型距离度量,并设计新型形态学分类器以建模形状、密度和分形信息。

详情
Journal ref
SIBGRAPI 2018
AI中文摘要

本工作将数学形态学——一种成熟的视觉计算理论——引入机器学习,以利用标准技术常忽视的形状和密度方面。我们提出了一种基于形态学重建的快速聚类算法,该算法能精确保留聚类形状和密度。该方案具有独特特性:内在的最大聚类感知、无成本的噪声去除以及由结构元素控制的多样化增长模式。此外,我们提出了一种结合闵可夫斯基距离和切比雪夫距离的新型距离度量,对于形态学膨胀非常高效。在 $Z^2$ 离散邻域迭代中,它比曼哈顿距离快约1.3倍,比欧几里得距离快约329.5倍。当使用k近邻(k-NN)分类器在33个UCI数据集上与其他14种距离度量进行评估时,我们的度量在大多数情况下(33例中的26例)达到了高于平均的准确率,并在9个案例中取得了最佳整体准确率。最后,我们引入了新型形态学分类器。与现有文献不同,本方案独特地对数据集中的形状、密度和分形信息进行建模。

英文摘要

This work introduces mathematical morphology -- an established visual computing theory -- into machine learning to exploit shape and density aspects often overlooked by standard techniques. We propose a fast clustering algorithm based on morphological reconstruction that accurately preserves cluster shapes and density. This scheme offers unique features: an intrinsic sense of maximal clusters, cost-free noise removal, and diverse growth patterns controlled by structuring elements. Additionally, we propose a novel distance metric combining Minkowski and Chebyshev distances, highly efficient for morphological dilations. In $Z^2$ discrete neighbourhood iterations, it is roughly 1.3 times faster than Manhattan and 329.5 times faster than Euclidean distances. When evaluated using a k-Nearest Neighbours (k-NN) classifier across 33 UCI datasets against 14 other distances, our metric achieved above-average accuracies most frequently (26 of 33 cases) and the best overall accuracy in 9 cases. Finally, we introduce novel morphological classifiers. Unlike current literature, this proposal uniquely models shape, density, and fractal information in datasets.

2605.30699 2026-06-01 cs.LG cs.CV 版本更新

A Context-Aware Middleware for Medical Image Based Reports: An approach based on image feature extraction and association rules

基于医学图像报告的情境感知中间件:一种基于图像特征提取和关联规则的方法

Erick O. Rodrigues, Jose Viterbo, Aura Conci, Trueman Mac Henry

发表机构 * Department of Computer Science(计算机科学系) Department of Mathematics & Statistics(数学与统计学系) Universidade Federal Fluminense(弗鲁米嫩塞联邦大学) York University(约克大学)

AI总结 提出一种情境感知中间件,通过图像特征提取和关联规则,自动将医学图像分派给最合适的医疗人员,以提高医疗工作流程效率。

详情
Journal ref
2015 IEEE/ACS 12th International Conference of Computer Systems and Applications (AICCSA)
AI中文摘要

本工作提出了一种用于医疗工作流程组织和效率提升的情境感知中间件。在医院、实验室和远程放射学公司中,每位医生或技术人员都专注于特定类型的诊断或分析。因此,某些类型的医学图像通常会被转发给特定的医生或特定群体。这种转发非常耗时。也就是说,反复决定谁是最合适的医生,以及他在特定情境下是否可用,既繁琐又可能非常低效。因此,所提出的中间件能够处理并收集每位医疗人员所分析图像的数据。基于收集的数据和当前临床情境,中间件能够推断出谁是最适合接收特定传入医学图像的人员。

英文摘要

This work proposes a context-aware middleware for medical workflow organization and efficiency improvement. In hospitals, laboratories and teleradiology companies, each physician or technician is specialized in a specific kind of diagnosis or analysis. Therefore, certain types of medical images are often forwarded to a certain physician or a certain group. This forwarding is time consuming. That is, repeatedly deciding who would be the best physician, whether he is available at a certain moment given a certain context is exhaustive and may be very inefficient. Thus, the proposed middleware has the ability to process and collect data from images analyzed by each medical staff. Based on the collected data and current clinical context, the middleware is able to infer who would be the best fit staff to receive a certain incoming medical image.

2605.30694 2026-06-01 cs.LG 版本更新

Universal Decision Learners

通用决策学习器

Sridhar Mahadevan

发表机构 * Adobe Research(Adobe研究院) University of Massachusetts Amherst(马萨诸塞大学阿默斯特分校)

AI总结 本文提出通用决策学习器(UDL)的范畴论框架,通过左Kan扩展和右Kan扩展将局部决策行为规范地扩展到全局一致行为,统一了规划、强化学习、因果干预、在线学习和博弈均衡等多种决策形式。

Comments 15 pages

详情
AI中文摘要

许多决策理论——规划、强化学习、因果干预、在线学习和博弈均衡——将局部信息转化为全局一致的行为。本文提出一个共同的范畴论形式化:通用决策学习器(UDL)通过一对通用构造将部分指定的决策函子从观测上下文扩展到新上下文。左Kan扩展表达展开、聚合和候选生成;右Kan扩展表达一致性、约束满足和不动点语义。核心主张并非每个决策问题都有相同的算法,而是许多决策形式化实例化同一个通用问题:规范地扩展局部行为数据,然后刻画全局一致的扩展。我们给出抽象的UDL构造,证明其通用比较性质,定义Kan不变的行为等价性和最小抽象,并展示贝尔曼方程、规划递归、因果干预、在线遗憾和均衡如何作为特例出现。补充材料更详细地发展了强化学习特例。

英文摘要

Many theories of decision making -- planning, reinforcement learning, causal intervention, online learning, and game-theoretic equilibrium -- turn local information into globally coherent behavior. This paper proposes a common categorical formulation: a Universal Decision Learner (UDL) extends a partially specified decision functor from observed contexts to new contexts by a pair of universal constructions. Left Kan extensions express rollout, aggregation, and candidate generation; right Kan extensions express consistency, constraint satisfaction, and fixed-point semantics. The central claim is not that every decision problem has the same algorithm, but that many decision formalisms instantiate the same universal problem: extend local behavioral data canonically, then characterize the globally coherent extensions. We give the abstract UDL construction, prove its universal comparison property, define Kan-invariant behavioral equivalence and minimal abstractions, and show how Bellman equations, planning recursions, causal interventions, online regret, and equilibria arise as special cases. The supplementary material develops the reinforcement-learning specialization in more detail.

2605.30686 2026-06-01 cs.CR cs.AI cs.LG 版本更新

Depth-Dependent Indirect Prompt Injection in Tool-Calling ReAct Agents: Injection Depth, Payload Framing, and Turn-Budget Sensitivity

工具调用ReAct代理中深度相关的间接提示注入:注入深度、载荷框架和轮次预算敏感性

Mohammadreza Rashidi

发表机构 * Department of Computer Science(计算机科学系) AI and Media Analysis Lab(人工智能与媒体分析实验室) Berlin, Germany(柏林,德国)

AI总结 通过四个对照实验(共460次试验),研究在工具调用ReAct代理中,注入深度、载荷框架和轮次预算对间接提示注入攻击成功率的影响,发现注入深度是主导变量,且仅清理第一个工具观察可捕获67%的注入成功。

Comments 17 pages, 16 figures

详情
AI中文摘要

将思维链推理与工具调用交错的ReAct代理越来越多地用于实际任务,如调度、文件检索和数据访问。它们的工具观察循环创建了一个直接攻击面:控制任何工具返回值的攻击者可以嵌入指令,将代理从用户目标引开,这种威胁称为间接提示注入。现有基准在固定条件下评估固定注入位置的攻击成功率(ASR),留下了三个未探索的风险维度:载荷在工具序列中出现的位置(注入深度)、使用的修辞风格(框架)以及代理允许的轮次数(轮次上限)。我们在五个攻击类别的20个场景中进行了四项对照研究,总共对GPT-4o-mini和Claude Haiku进行了460次试验,总API成本低于0.36美元。研究1显示,GPT-4o-mini的ASR从深度1的60%衰减到深度4和5的0%(Cramer's V = 0.58,p < 0.001;限制在序列内深度1-3:V = 0.47,p = 0.0013),其原因是模型在深度1处的抵抗,以及在更深位置代理往往在遇到载荷之前就已完成任务。研究2在Claude Haiku上重复了深度实验,该模型凭借保守的工具调用和真正的指令抵抗,在每个深度的ASR均为0%。研究3显示,在深度1,框架将ASR调节在25%(中性)到75%(角色)之间,范围达50个百分点,但在每个条件下N=20时未达到统计显著性。研究4确认ASR在3、5和7的轮次上限下稳定,表明轮次预算在此设置中不是风险因素。我们的结果确立了注入深度为主导变量,并表明仅清理第一个工具观察即可拦截67%的已测得注入成功案例。

英文摘要

ReAct agents that interleave chain-of-thought reasoning with tool calls are increasingly deployed for real tasks such as scheduling, file retrieval, and data access. Their tool observation loop creates a direct attack surface: an adversary who controls any tool's return value can embed instructions that redirect the agent away from the user's goal, a threat known as indirect prompt injection. Existing benchmarks evaluate attack success rate (ASR) at a fixed injection position under fixed conditions, leaving three risk dimensions unexplored: where in the tool sequence the payload appears (injection depth), what rhetorical register it uses (framing), and how many turns the agent is permitted (turn cap). We conduct four controlled studies on 20 scenarios spanning five attack categories, totalling 460 trials against GPT-4o-mini and Claude Haiku at a combined API cost under 0.36 USD. Study 1 shows that ASR against GPT-4o-mini decays from 60% at depth 1 to 0% at depths 4 and 5 (Cramer's V = 0.58, p < 0.001; restricted to within-sequence depths 1-3: V = 0.47, p = 0.0013), driven by model resistance at depth 1 and task completion before payload encounter at deeper positions. Study 2 replicates the depth experiment on Claude Haiku, which achieves 0% ASR at every depth through a combination of conservative tool invocation and genuine instruction resistance. Study 3 shows that framing modulates ASR between 25% (neutral) and 75% (persona) at depth 1, a 50-percentage-point range that does not reach statistical significance at N = 20 per condition. Study 4 confirms that ASR is stable across turn caps of 3, 5, and 7, indicating the turn budget is not a risk factor in this setting. Our results establish injection depth as the dominant variable and show that sanitising only the first tool observation captures 67% of measured injection successes.

2605.30662 2026-06-01 cs.LG q-bio.PE 版本更新

Spatio-temporal stochastic graph-based learning for infectious disease forecasting

基于时空随机图的传染病预测学习

Luz Stefani Sotomayor Valenzuela, Susanna Cramb, Darren Wraith

发表机构 * School of Public Health and Social Work, Queensland University of Technology(昆士兰理工大学公共卫生与社会工作学院) QUT Centre for Data Science, Queensland University of Technology(昆士兰理工大学数据科学中心)

AI总结 提出一种集成随机公式和不确定性近似过程的时空图架构,用于预测新发传染病病例,在COVID-19和水痘数据集上表现出竞争性性能。

Comments Preprint under review

详情
AI中文摘要

时空图模型通常用于预测COVID-19和水痘爆发等传染病的新病例。然而,在其学习过程中使用随机建模的研究却出人意料地不足,并且很少考虑大国家的完整数据集。因此,尚不清楚这些模型是否能在真实疾病传播场景中提供准确的预测。在这项工作中,我们提出了一种时空随机图架构,该架构集成了随机公式和不确定性近似过程,以预测新的传染病病例。我们发现,我们的方法能够适应在单一模型架构中编码大小人口地理网络。使用两个真实世界数据集——美国COVID-19和匈牙利水痘,我们报告了所提出的架构在预测美国2022年第一波COVID-19和匈牙利2012-2014年水痘波次中的增强效果。通过与四种时空图模型进行基准测试,定量结果显示,所提出的方法在预测美国所有3218个县和匈牙利所有20个县的新病例方面,具有竞争性的整体周度性能。所提出的方法能够表示相对于基线的整体流行病进展,尽管存在一步延迟;同时表现出对高频低幅变异的低敏感性。

英文摘要

Spatio-temporal graph-based models have typically been used to forecast new cases of infectious diseases such as COVID-19 and chickenpox outbreaks. However, the use of stochastic modelling into their learning process has been surprisingly under-investigated and rarely considered entire data sets of large countries. As a result, it is unknown whether these models would provide accurate forecasts in real-world disease spread scenarios. In this work, we propose a spatio-temporal stochastic graph-based architecture that integrates a stochastic formulation and uncertainty approximation process to forecast new infectious disease cases. We find that our approach can adapt to encode large and small population geographical networks within a single model architecture. Using two real-world data sets, COVID-19 in the US and chickenpox in Hungary, we report an enhanced effect of the proposed architecture across predictions of the 2022 first wave for COVID-19 in the US and comparative results of chickenpox waves during 2012-2014 in Hungary. By benchmarking with four spatio-temporal graph-based models, quantitative results show competitive overall weekly performance of the proposed approach on forecasting new cases for all 3,218 US counties and all 20 Hungary counties. The proposed approach can represent overall epidemic progression relative to baselines, though with a one-step delay; while exhibiting a reduced sensitivity to high-frequency and low-amplitude variability.

2605.30660 2026-06-01 cs.LG cs.RO 版本更新

BOKBO (Best of K Bad Options): Calibrated Abstention for VLA Policies

BOKBO (Best of K Bad Options): VLA策略的校准式弃权

Anya Singh, Cabrel Happi, Jai Relan, Varun Nair, Vidyut Baradwaj

AI总结 针对视觉-语言-动作(VLA)策略的测试时扩展方法,提出首个共形弃权层BOKBO,通过全局和逐任务变体提供有限样本无分布保证的执行违规率控制,并揭示基于扰动的K采样下策略内部非一致性分数的结构性缺陷。

详情
AI中文摘要

针对视觉-语言-动作(VLA)策略的测试时扩展方法,如RoboMonkey、SEAL、MG-Select和V-GPS,在推理时采样K个候选动作块并执行验证器评分最高的候选。当所有K个候选都不安全时,系统会执行违规动作且无警告。我们提出BOKBO,这是首个用于K样本VLA推理的共形弃权层,提供执行违规率的有限样本无分布保证。我们提供全局和逐任务(Mondrian)变体,其中逐任务变体缩小了最困难任务上的条件差距。我们的分析揭示了基于扰动的K采样下策略内部非一致性分数的结构性失败:基础策略置信度代理和K样本不一致性与动作噪声超参数σ的相关性均达0.98,而与实际安全违规的相关性仅处于噪声基底水平。我们通过复现令牌级温度采样下的分析来测试失败范围,发现该失败是机制特定的,并在基于策略随机性的采样下得到部分缓解。一个基于语义视觉特征和任务标识学习的违规预测器支持紧密校准:在libero_object_temp_x0.1上使用OpenVLA-OFT,ε=0.05时,条件CRC边界在86%的bootstrap分割上成立,覆盖率为78%,净任务成功率为70%。Mondrian-BOKBO将最小逐任务条件保持比例从0.71提高到0.93。结果在5个训练种子上稳定,在π_0-FAST上的bootstrap噪声内可复现,在作为同等基准的libero_spatial_temp_x0.1上同样成立,并经受住了四个套件内分布偏移。我们还识别并纠正了一个方法论陷阱:全局设置的力阈值远低于专家典型的操作力,将不安全行为与正常操作混淆,导致违规率膨胀5倍。

英文摘要

Test-time scaling for vision-language-action (VLA) policies, methods such as RoboMonkey, SEAL, MG-Select, and V-GPS, samples K candidate action chunks at inference and executes the verifier-best. When all K candidates are unsafe, the system executes a violating action with no warning. We propose BOKBO, the first conformal abstention layer for K-sample VLA inference, providing finite-sample distribution-free guarantees on executed-violation rate. We provide both global and per-task (Mondrian) variants, with the per-task variant closing the conditional gap on the hardest tasks. Our analysis exposes a structural failure of policy-internal nonconformity scores under perturbation-based K-sampling: the base-policy confidence proxy and K-sample disagreement correlate at 0.98 with the action-noise hyperparameter $\sigma$, while correlating at the noise floor with actual safety violations. We test the failure's scope by replicating the analysis under token-level temperature sampling and find the failure is mechanism-specific and partially mitigated under policy-stochasticity-based sampling. A learned violation predictor conditioned on semantic visual features and task identity supports tight calibration: at $\epsilon$ = 0.05 on libero_object_temp_x0.1 with OpenVLA-OFT, the conditional CRC bound holds on 86% of bootstrap splits with 78% coverage and 70% net task success. Mondrian-BOKBO raises the minimum per-task conditional hold fraction from 0.71 to 0.93. Results are stable across 5 training seeds, replicate within bootstrap noise on $\pi_0$-FAST, hold on libero_spatial_temp_x0.1 as a co-equal benchmark, and survive four within-suite distribution shifts. We additionally identify and correct a methodological pitfall: globally-set force thresholds well below expert-typical manipulation forces conflate unsafe behavior with normal manipulation, inflating violation rates by $5\times$.

2605.30656 2026-06-01 cs.LG 版本更新

Learning to Perceive the World Through Control: Empowerment-Based Representation Learning

通过控制学习感知世界:基于赋能的表示学习

Mahsa Bastankhah, Sophie Broderick, Benjamin Eysenbach

发表机构 * Princeton University, USA(普林斯顿大学,美国)

AI总结 本文通过最大化赋能目标,研究如何学习仅捕捉环境控制相关特征的表示,并证明赋能代理诱导的前向和后向表示对控制无关特征具有不变性。

详情
AI中文摘要

在许多实际强化学习环境中,观测的维度远高于对控制重要的变量。在这项工作中,我们提出一个问题:我们能否学习仅捕捉环境中控制相关特征的表示?我们通过赋能目标研究这个问题,该目标最大化代理对环境的影响,并广泛用于无监督技能学习。我们表明,赋能代理诱导两种不同的表示——前向和后向——它们捕捉状态的互补方面,并且两者都对控制无关特征具有不变性。因此,赋能最大化导致代理学习一个隐式的、以控制为中心的世界模型。我们的分析强调了通过交互而非被动数据集学习表示的重要性:旨在最大化控制的交互对于学习有用的不变性属性至关重要,这一观点与因果学习文献紧密一致。

英文摘要

In many practical reinforcement learning environments, observations are far higher-dimensional than the variables that matter for control. In this work, we ask: can we learn representations that capture only control-relevant features of the environment? We study this question through the empowerment objective, which maximizes an agent's influence over the environment and is widely used for unsupervised skill learning. We show that empowerment agents induce two distinct representations -- forward and backward -- that capture complementary aspects of the state, and both of which are invariant to control-irrelevant features. Thus, empowerment maximization leads agents to learn an implicit, control-centric model of the world. Our analysis highlights the importance of learning representations through interaction rather than from passive datasets: interaction aimed at maximizing control is essential for learning useful invariance properties, a perspective that aligns closely with the causal learning literature.

2605.30652 2026-06-01 cs.LG 版本更新

Bridging the Gap Between Natural Language and Market Dynamics via High-Dimensional Representation Learning

弥合自然语言与市场动态之间的差距:基于高维表示学习

Yujin Jeong, Noelle Jung, Brian Y. C. Leung

AI总结 本文通过用密集FinBERT嵌入替代离散情感评分,在Transformer架构中探索高维表示学习,以提升金融新闻对短期股价预测的准确性。

详情
AI中文摘要

传统的多模态金融预测通常依赖于标量情感分数,这无法捕捉金融新闻的细微差别。为了解决这一信息损失问题,本文通过在基于Transformer的预测架构中用密集的FinBERT嵌入替代离散极性评分,探索高维表示学习。我们在FNSPID数据集上对各种嵌入策略进行了基准测试,包括原始嵌入、注意力加权聚合和自定义孪生网络。虽然基于注意力的机制在处理金融数据典型的低信噪比时表现不佳,但集成孪生优化嵌入的方法在预测短期股价走势方面优于标量基线和原始嵌入方法,表明保留高维叙事背景能提高预测准确性。

英文摘要

Traditional multi-modal financial forecasting often relies on scalar sentiment scores, which fail to capture the nuances of financial news. To address this information loss, this paper explores high-dimensional representation learning by replacing discrete polarity ratings with dense FinBERT embeddings within a Transformer-based forecasting architecture. We benchmarked various embedding strategies on the FNSPID dataset, including raw embeddings, attention-weighted aggregation, and a custom Siamese network. While the attention-based mechanism struggled with the low signal-to-noise ratio typical of financial data, the integration of Siamese-optimized embeddings outperformed both the scalar baseline and raw embedding approaches, demonstrating that preserving high-dimensional narrative context yields improved predictive accuracy for short-term stock price movements.

2605.30651 2026-06-01 cs.LG cs.AI 版本更新

LARK: Learnability-Grounded Trajectory Selection for Efficient Reasoning Distillation

LARK:基于可学习性的轨迹选择用于高效推理蒸馏

Tianrun Yu, Kaixiang Zhao, Chih-Chun Chen, Amanda Hughes, Taylor W. Killian, Fenglong Ma, Weitong Zhang, Porter Jenkins

发表机构 * Brigham Young University(杨百翰大学) The Pennsylvania State University(宾夕法尼亚州立大学) University of North Carolina at Chapel Hill(北卡罗来纳大学教堂山分校)

AI总结 提出LARK方法,通过可学习性因子ρ和χ²正则化选择策略,在推理蒸馏中高效选择学生模型可学习的轨迹,同时保持分布覆盖,显著提升多个基模型和推理任务的性能。

Comments 43 pages, 9 figures, 2 tables

详情
AI中文摘要

我们研究推理蒸馏中的轨迹选择问题,其中教师生成的推理轨迹被选择性地用作学生模型的监督。现有方法依赖于启发式规则,如轨迹质量或模型置信度,但往往忽略了轨迹是否可被学生模型学习。本文提出LARK,一种基于可学习性的推理轨迹选择方法。LARK选择学生能够高效学习的轨迹,同时保留完整训练分布的泛化能力。LARK的核心是可学习性因子$ρ$,它刻画了学生训练损失下降的速率。为了高效估计该速率并保持泛化,我们引入了一个可学习性代理和一个$χ^2$正则化的选择策略,该策略平衡可学习性和分布覆盖,两者均具有强理论保证的估计误差。实验表明,LARK在多个基模型和推理任务上持续优于数据选择基线。诊断分析显示,LARK得分能预测下游训练效用,且LARK选择的轨迹能诱导更快的监督微调损失下降。我们的代码可在https://github.com/Tianrun-Yu/LARK获取。

英文摘要

We study trajectory selection for reasoning distillation, where teacher-generated reasoning trajectories are selectively used as supervision for a student model. Existing methods rely on heuristics such as trajectory quality or model confidence, but they often overlook whether a trajectory is learnable by the student. In this paper, we present LARK, a learnability-grounded method for reasoning trajectory selection. LARK selects trajectories that the student can learn efficiently while preserving the generalization of the full training distribution. At the core of LARK is a learnability factor $ρ$, which characterizes the rate at which the student's training loss decreases. To estimate this rate efficiently and maintain generalization, we introduce a learnability proxy and a $χ^2$-regularized selection policy that balances learnability and distributional coverage, both with strong theoretical guarantees on their estimation error. Empirically, LARK consistently outperforms data selection baselines across multiple base models and reasoning tasks. Diagnostic analyses show that the LARK score predicts downstream training utility and that LARK-selected trajectories induce faster supervised fine-tuning loss reduction. Our code is available at https://github.com/Tianrun-Yu/LARK.

2605.30648 2026-06-01 cs.LG math.OC 版本更新

Convergence of Steepest Descent and Adam under Non-Uniform Smoothness

非均匀光滑条件下最速下降与Adam的收敛性

Sharan Vaswani, Yifan Sun, Reza Babanezhad

发表机构 * Simon Fraser University(西蒙弗雷泽大学) Stony Brook University(石溪大学)

AI总结 本文在非均匀光滑假设下,研究最速下降法及RMSProp和Adam的确定性对角变体的收敛率,证明在逻辑回归、softmax策略梯度等目标上符号梯度下降线性收敛且快于梯度下降,并在两层神经网络上证明RMSProp和Adam可线性收敛。

Comments ICML 2026

详情
AI中文摘要

近期工作分析了在非均匀光滑假设下的一阶方法的收敛性,该假设更好地模拟了机器学习任务中的损失景观。我们将这一假设推广到曲率是目标值的仿射函数的目标函数上。这一性质被广泛的问题类别所满足,包括逻辑回归、具有逻辑链接函数的广义线性模型、强化学习中的softmax策略梯度以及一类神经网络。在此假设和梯度支配条件下,我们建立了最速下降法以及RMSProp和Adam的确定性对角变体的通用收敛率。我们的结果表明,对于可分离数据上的逻辑回归和softmax策略梯度目标,符号梯度下降线性收敛且被证明比梯度下降更快。此外,我们证明对于可分离数据上的一类两层神经网络,RMSProp和Adam可以在恒定步长和动量参数下以线性速率收敛。最后,我们给出了一个下界,表明在我们的假设下,RMSProp和Adam被证明比AdaGrad、AMSGrad、梯度下降和重球动量更快。

英文摘要

Recent work has analyzed the convergence of first-order methods under non-uniform smoothness assumptions that better model the loss landscape in machine learning tasks. We generalize this assumption to objectives whose curvature is an affine function of the objective value. This property is satisfied by a broad class of problems, including logistic regression, generalized linear models with a logistic link function, softmax policy gradient in reinforcement learning, and a class of neural networks. Under this assumption and gradient domination conditions, we establish a general convergence rate for the steepest descent method, and deterministic, diagonal variants of RMSProp and Adam. Our results imply that for logistic regression on separable data and the softmax policy gradient objective, sign GD converges linearly and is provably faster than GD. Furthermore, we show that for a class of two-layer neural networks on separable data, RMSProp and Adam can converge at a linear rate with a constant step-size and momentum parameter. Finally, we present a lower bound demonstrating that, under our assumption, RMSProp and Adam are provably faster than AdaGrad, AMSGrad, gradient descent, and heavy-ball momentum.
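The gap between sign GD and GD on separable logistic regression can be seen on a toy problem. The sketch below is only an illustration under stated assumptions: the four data points, the constant step size of 0.1, and the 200-step budget are hypothetical choices, not settings from the paper. On this data every coordinate of the negative gradient stays positive, so sign GD grows the margin by a fixed amount per step (loss shrinking geometrically), while the GD step shrinks with the gradient.

```python
import math

# Hypothetical linearly separable toy data with labels in {-1, +1}.
X = [(1.0, 2.0), (2.0, 1.0), (-1.0, -2.0), (-2.0, -1.0)]
Y = [1.0, 1.0, -1.0, -1.0]


def margin(w, x, y):
    return y * (w[0] * x[0] + w[1] * x[1])


def loss(w):
    """Mean logistic loss log(1 + exp(-margin)), computed stably."""
    total = 0.0
    for x, y in zip(X, Y):
        m = margin(w, x, y)
        total += max(-m, 0.0) + math.log1p(math.exp(-abs(m)))
    return total / len(X)


def grad(w):
    """Gradient of the mean logistic loss with respect to w."""
    g = [0.0, 0.0]
    for x, y in zip(X, Y):
        m = margin(w, x, y)
        # sigmoid(-m), written to avoid overflow for large |m|
        if m >= 0:
            e = math.exp(-m)
            s = e / (1.0 + e)
        else:
            s = 1.0 / (1.0 + math.exp(m))
        g[0] -= y * x[0] * s / len(X)
        g[1] -= y * x[1] * s / len(X)
    return g


def sign(v):
    return (v > 0) - (v < 0)


def run(use_sign, steps=200, lr=0.1):
    """Run GD or sign GD from w = 0 with a constant step size."""
    w = [0.0, 0.0]
    for _ in range(steps):
        g = grad(w)
        if use_sign:
            w = [wi - lr * sign(gi) for wi, gi in zip(w, g)]
        else:
            w = [wi - lr * gi for wi, gi in zip(w, g)]
    return loss(w)


gd_loss = run(use_sign=False)
sign_loss = run(use_sign=True)
print(f"GD loss after 200 steps:      {gd_loss:.3e}")
print(f"sign GD loss after 200 steps: {sign_loss:.3e}")
```

This does not reproduce the paper's analysis or its assumptions on curvature; it only shows the qualitative behaviour that the stated rates describe.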

2605.30642 2026-06-01 cs.LG 版本更新

Diffusion Models Preferentially Memorize Prototypical Examples or: Why Does My Diffusion Model Love Slop?

扩散模型优先记忆原型样本,或:为什么我的扩散模型喜欢“潦草”?

Marta Aparicio Rodriguez, Anastasia Borovykh, Grigorios A. Pavliotis, Daniel J. Korchinski

发表机构 * Department of Mathematics, Imperial College London, UK(英国伦敦帝国理工学院数学系) ML Lab, Capital Fund Management, France(法国Capital Fund Management机器学习实验室) Department of Physics, École Polytechnique Fédérale de Lausanne (EPFL), Switzerland(瑞士洛桑联邦理工学院物理系)

AI总结 本文通过随机层次模型生成的字符串训练扩散模型,发现模型优先记忆常见子串组成的样本,即使数据完全去重,表明点级去重无法保证隐私,而数据集多样性(尤其是高层抽象)能延缓记忆,并识别出部分记忆的中间状态导致生成均值回归的“潦草”现象。

详情
AI中文摘要

生成模型存在一个持久限制:它们记忆训练数据的倾向可能产生法律责任并削弱创意多样性。因此,理解哪些样本被全部或部分记忆,以及在什么条件下被记忆,仍然是一个重要的开放问题。本文对“非典型或稀有样本是否首先被记忆?”这一问题给出了否定答案。我们根据随机层次模型(RHM)的产生规则生成的字符串训练扩散模型,发现由常见子串组成的样本被优先记忆。即使训练数据由完全独特的样本组成,这一结论仍然成立,表明在数据点级别进行去重并不能提供有意义的隐私保证。相应地,我们预测并随后观察到,对于重尾数据集(即包含更多非典型样本的数据集),记忆会延迟。当重尾特性引入高层产生规则时,这种效应会放大。这些结果共同表明,数据集多样性,尤其是在更高抽象层次上,在延缓记忆方面起着重要作用。最后,我们识别出一个部分记忆的中间状态,其中常见子串首先被学习,随后在生成过程中过度产生。如果在此状态停止训练,模型将表现出均值回归的平淡性,常被讥讽为“潦草”。

英文摘要

Generative models have a persistent limitation: their tendency to memorize training data can create legal liabilities and erode creative diversity. Understanding which samples are memorized in whole or in part, and under what conditions, therefore remains an important open problem. Here we answer the question "Are atypical or rare samples memorized first?" in the negative. We train diffusion models on strings generated according to the production rules of the Random Hierarchy Model (RHM), and find that samples composed of common substrings are preferentially memorized. This holds true even if the training data consists of entirely unique samples, indicating that deduplication at the data point level does not provide a meaningful privacy guarantee. Correspondingly we predict, then observe, delayed memorization for fat-tailed datasets (i.e., those with more atypical samples). This effect is amplified when fat-tails are introduced into high-level production rules. These together suggest that dataset diversity, particularly at higher levels of abstraction, plays an important role in staving off memorization. Finally, we identify an intermediate regime of partial memorization in which common substrings are learned first and subsequently overproduced during generation. If training is stopped in this regime, models will exhibit the reversion-to-the-mean blandness often derided as "slop".

2605.30640 2026-06-01 cs.LG cs.CL 版本更新

CSULoRA: Closest Safe Update Low-Rank Adaptation

CSULoRA:最近安全更新低秩适应

Oleksandr Marchenko Breneur, Adelaide Danilov, Aria Nourbakhsh, Salima Lamsiyah

发表机构 * Department of Computer Science, University of Luxembourg(卢森堡大学计算机科学系)

AI总结 提出CSULoRA方法,通过后处理校正LoRA适配器,在保留任务相关性的同时抑制不安全更新方向,降低攻击成功率。

Comments 10 pages, 3 figures

详情
AI中文摘要

低秩适应已成为大型语言模型参数高效微调的标准方法,但即使少量不安全或对抗性微调数据也会显著削弱对齐模型的安全行为。现有的安全保持LoRA方法通常依赖硬干预,如投影、剪枝、阈值化或额外训练目标。虽然这些方法可以抑制不安全更新方向,但它们也可能移除任务相关信息或需要额外调优。我们提出CSULoRA,一种通过最近安全更新估计来校正训练后LoRA适配器的后处理方法。CSULoRA从安全对齐模型与其对应基础检查点之间的权重位移中估计安全对齐子空间。然后,它将每个LoRA更新分解为完全对齐、部分对齐和子空间外分量。CSULoRA不丢弃估计安全子空间外的分量,而是求解一个闭式惩罚最小变化问题,该问题保留完全对齐分量,同时根据相对能量平滑衰减潜在不安全方向。在对抗性微调实验中,CSULoRA显著降低了攻击成功率,同时保留了标准LoRA微调获得的大部分效用增益。

英文摘要

Low-rank adaptation has become a standard method for parameter-efficient fine-tuning of large language models, but even small amounts of unsafe or adversarial fine-tuning data can substantially weaken the safety behavior of aligned models. Existing safety-preserving LoRA methods often rely on hard interventions such as projection, pruning, thresholding, or additional training objectives. While these methods can suppress unsafe update directions, they may also remove task-relevant information or require extra tuning. We introduce CSULoRA, a post-hoc method for correcting trained LoRA adapters through closest safe update estimation. CSULoRA estimates a safety-aligned subspace from the weight displacement between a safety-aligned model and its corresponding base checkpoint. It then decomposes each LoRA update into fully aligned, partially aligned, and off-subspace components. Instead of discarding components outside the estimated safety subspace, CSULoRA solves a closed-form penalized minimum-change problem that preserves the fully aligned component while smoothly attenuating potentially unsafe directions according to their relative energy. In adversarial fine-tuning experiments, CSULoRA substantially reduces attack success rate while preserving most of the utility gains obtained from standard LoRA fine-tuning.

2605.30638 2026-06-01 cs.LG cs.AI 版本更新

Score Broadcast and Decorrelation: A General Framework for Broadcast-Based Credit Assignment

分数广播与去相关:基于广播的信用分配通用框架

Mustafa Uzun, Mete Erdogan, Cengiz Pehlevan, Alper T. Erdogan

发表机构 * KUIS AI Center, Koç University, Turkey(科奇大学KUIS人工智能中心,土耳其) Electrical and Electronics Engineering, Koç University, Turkey(科奇大学电气与电子工程系,土耳其) Department of Electrical Engineering, Stanford University, USA(斯坦福大学电气工程系,美国) John A. Paulson School of Engineering & Applied Sciences, Harvard University, USA(哈佛大学约翰·A·保尔森工程与应用科学学院,美国) Kempner Institute, Harvard University, USA(哈佛大学肯普纳研究所,美国) Center for Brain Science, Harvard University, USA(哈佛大学脑科学中心,美国)

AI总结 提出分数广播与去相关(SBD)框架,通过输出分数与隐藏层激活的正交性原理,统一了多种可微损失函数下的广播式信用分配,并理论支撑了三因子学习规则。

详情
AI中文摘要

我们引入了分数广播与去相关(SBD),一个用于一般可微损失族基于广播的信用分配的原则性框架。误差广播是反向传播的一种生物合理替代方案,它无需权重传输即可将输出信息发送到隐藏层。最近针对均方误差(MSE)设置引入的误差广播与去相关(EBD)框架,将这一机制建立在最优估计量的随机正交性基础上,即最优残差与输入的函数正交。我们通过引入输出分数(损失对最终层输出的梯度)与隐藏层激活之间的正交性原理来推广这一基础,该原理在最优分数条件均值为零时成立。这一单一原理统一了标准可微损失族(包括交叉熵、Bregman散度、适当评分规则和指数族负对数似然)的广播式信用分配。该框架为一般损失下的三因子学习规则提供了理论基础,其中神经调节因子被推导为广播损失分数。我们明确推导了交叉熵情况,刻画了可接受损失类,并引入了一种分数向量扩展技术,该技术在保持正交性框架的同时丰富了广播信号。在CIFAR-10和Tiny ImageNet上的实验表明,SBD显著优于现有的广播方法,而分数向量扩展带来了进一步的提升。总体而言,这项工作确定了损失分数作为广播信号,提供了正交性理论以及神经科学中三因子学习规则的理论基础,并展示了分数向量扩展如何丰富所得目标函数的去相关方向。

英文摘要

We introduce Score Broadcast and Decorrelation (SBD), a principled framework for broadcast-based credit assignment for general families of differentiable losses. Error broadcast is a biologically plausible alternative to backpropagation that sends output information to hidden layers without weight transport. The Error Broadcast and Decorrelation (EBD) framework, recently introduced for the mean-squared-error (MSE) setting, grounded this mechanism in the stochastic orthogonality of optimal estimators, under which the optimal residual is orthogonal to functions of the input. We generalize that foundation by introducing an orthogonality principle between the output score (the gradient of loss with respect to the final-layer output) and hidden-layer activations, which holds whenever the optimal score has conditional mean zero. This single principle unifies broadcast-based credit assignment across the standard differentiable-loss families, including cross-entropy, Bregman divergences, proper scoring rules, and exponential-family negative log-likelihoods. The framework supplies a theoretical grounding for the three-factor learning rule under general losses, with the neuromodulatory factor derived as the broadcast loss score. We derive the cross-entropy case explicitly, characterize the admissible loss class, and introduce a score vector expansion technique that enriches the broadcast signal while preserving the orthogonality framework. Experiments on CIFAR-10 and Tiny ImageNet show that SBD substantially improves over existing broadcast approaches, with score vector expansion delivering further gains. Overall, this work identifies the loss score as the signal to broadcast, supplies the orthogonality theory and theoretical grounding for the three-factor learning rule from neuroscience, and shows how score vector expansion enriches the decorrelation directions of the resulting objective.

2605.30635 2026-06-01 cs.LG q-bio.GN 版本更新

CellBRIDGE: Learning Cellular Trajectories via Interaction-Aware Alignment

CellBRIDGE:通过交互感知对齐学习细胞轨迹

Silas Ruhrberg Estévez, Nicolas Huynh, Tennison Liu, Roderik M. Kortlever, Gerard I. Evan, David L. Bentley, Mihaela van der Schaar

发表机构 * DAMTP, University of Cambridge(剑桥大学应用数学与理论物理系) Francis Crick Institute(弗朗西斯·克里克研究所) University of Colorado Anschutz Medical Campus(科罗拉多大学安舒茨医学校区)

AI总结 提出CellBRIDGE方法,通过将配体-受体介导的细胞间通信成本融入最优传输框架,改进了单细胞RNA测序数据中的轨迹推断和跨快照耦合。

详情
Journal ref
ICML 2026
AI中文摘要

从群体快照推断动态是机器学习和生物学中的一个基本挑战。在单细胞RNA测序(scRNA-seq)中,破坏性测量阻止了跨时间直接追踪单个细胞,使得轨迹推断欠定。最优传输(OT)为快照对齐提供了一个原则性框架,但一个长期存在的建模问题是哪些成本函数能产生生物学上有意义的耦合。标准的OT方法依赖于基因表达距离,隐含地将细胞视为独立点,并忽略了由配体-受体信号介导的结构化细胞间通信。我们引入了CellBRIDGE(基于细胞的正则化交互驱动基因表达),它用源自配体-受体活性的定向、类型化交互成本来增强基于特征的OT。通过显式建模细胞间通信,与仅基于特征的基线相比,CellBRIDGE在合成和真实scRNA-seq数据集上改善了跨快照耦合和下游轨迹估计。值得注意的是,CellBRIDGE支持具有机制可解释性的计算机模拟(in silico)扰动:在肺癌数据上,沉默特定的配体-受体对诱导的轨迹变化重现了靶向通路抑制的预期效果。

英文摘要

Inferring dynamics from population snapshots is a fundamental challenge in machine learning and biology. In scRNA-sequencing (scRNA-seq), destructive measurements preclude direct tracking of individual cells across time, making trajectory inference underdetermined. Optimal Transport (OT) provides a principled framework for snapshot alignment, but a long-standing modeling question is which cost functions yield biologically meaningful couplings. Standard OT approaches rely on gene-expression distances, implicitly treating cells as independent points and neglecting structured cell-cell communication mediated by ligand-receptor signaling. We introduce CellBRIDGE (Cell-Based Regularized Interaction-Driven Gene Expression), which augments feature-based OT with a directed, typed interaction cost derived from ligand-receptor activity. By explicitly modeling cell-cell communication, CellBRIDGE improves cross-snapshot couplings and downstream trajectory estimates across synthetic and real scRNA-seq datasets relative to feature-only baselines. Notably, CellBRIDGE enables mechanistically interpretable in silico perturbations: on lung cancer data, silencing specific ligand-receptor pairs induces trajectory shifts that recapitulate expected effects of targeted pathway inhibition.

2605.30632 2026-06-01 cs.HC cs.AI cs.LG 版本更新

Rationalize: Shared Semantic Reasoning for Human-AI Alignment

Rationalize: 人机对齐的共享语义推理

Aritra Dasgupta, Naga Datha Saikiran Battula, Avina Nakarmi, Sohom Sen, Subhodeep Ghosh, Xun Song

发表机构 * New Jersey Institute of Technology(新泽西理工学院)

AI总结 提出Rationalize角色对框架,通过共享推理空间中的互补角色对(如探索者-引导者)实现人类与AI在数据驱动意义建构中的语义对齐,并设计元素级和角色特定的对齐评估方法。

Comments Accepted by ACM CHI 2026 BiAlign Workshop

详情
AI中文摘要

我们介绍了Rationalize,一个用于数据驱动意义建构中人类与AI模型之间共享语义推理的角色对框架。基于人机协作和批判性思维的思路,我们将人机交互概念化为一系列互补的角色对(探索者-引导者、调查者-告知者、教师-学生、法官-倡导者),这些角色对在共享推理空间中运作。在这个空间中,人类分析师和AI模型(如LLM)使目的、问题、假设、证据、推理和影响变得明确,不仅促进输出层面的对齐,而且促进双方意图和行动的合理化层面的对齐。我们将这些角色对与双向人机对齐框架联系起来,说明“使AI对齐人类”和“使人类对齐AI”如何因角色而异,并勾勒出一个使用元素级和角色特定方法进行对齐设计和评估的协作研究议程。

英文摘要

We introduce Rationalize, a role-pair framework for shared semantic reasoning between humans and AI models in data-driven sensemaking. Building on ideas in human-machine teaming and critical thinking, we conceptualize human-AI interaction as a series of complementary role pairs (Explorer-Guide, Investigator-Informant, Teacher-Student, Judge-Advocate) operating in a shared reasoning space. In this space, human analysts and AI models (such as LLMs) make purposes, questions, assumptions, evidence, inferences, and implications explicit, facilitating alignment not only at the output level but at the level of rationalization of intent and action by each side. We relate these role pairs to the bidirectional human-AI alignment framework, illustrating how "aligning AI to humans" and "aligning humans to AI" differ by role, and sketch a collaborative research agenda for alignment design and assessment using element-level and role-specific approaches.

2605.30631 2026-06-01 cs.CV cs.AI cs.LG 版本更新

Controllable Lung Nodule Synthesis via Histogram-Regularized Latent Diffusion Models

基于直方图正则化潜扩散模型的可控肺结节合成

Arunkumar Kannan, Yanbo Zhang, Han Liu, Michael Baumgartner, Jianing Wang, Alexander Hertel, Bogdan Georgescu, Sasa Grbic

发表机构 * Johns Hopkins University(约翰霍普金斯大学) Department of Radiology and Nuclear Medicine, University Medical Center Mannheim, Heidelberg University(放射学与核医学科,曼海姆大学医学中心,海德堡大学)

AI总结 提出一种直方图正则化潜扩散模型,通过结合亚型、空间掩码和HU直方图条件以及可微特征空间直方图正则化项,在3D CT体积中合成肺结节,以准确建模结节特异性强度分布,提高视觉真实感和亚型一致性。

详情
AI中文摘要

尽管自动诊断系统在基于CT的肺癌筛查中取得了显著成功,但其发展仍受限于多样化、带标注的肺结节数据集的稀缺性。基于扩散的生成模型为数据合成提供了一种有前景的策略;然而,许多现有的条件方法主要优化空间重建损失,这鼓励体素级相似性,但可能不足以约束病灶级强度分布。因此,这些方法可能产生过度平滑的纹理轮廓,并低估不同结节亚型(包括实性、部分实性和磨玻璃结节)的独特衰减特性。为解决这一挑战,我们提出了一种可控潜扩散模型,该模型在全3D CT体积内合成肺结节,同时准确建模结节特异性强度分布。具体而言,我们不只依赖空间损失,还引入了一个基于直方图的正则化项,在生成过程中约束体素强度分布。该模型结合了亚型、空间掩码和Hounsfield单位(HU)直方图条件以及可微特征空间直方图正则化项,以更好地对齐病灶级强度分布,提高合成结节的视觉真实感和亚型一致性。在肺部CT数据上的大量实验表明,我们的框架实现了强烈的视觉真实感,通过定量指标和视觉图灵测试验证。此外,当用于数据增强时,生成的结节提高了下游临床任务的性能,特别是对于代表性不足的结节亚型,并显示出对亚型知情恶性分类的潜在益处。

英文摘要

While automated diagnosis systems have achieved remarkable success in computed tomography (CT)-based lung cancer screening, their development remains limited by the scarcity of diverse, annotated pulmonary nodule datasets. Diffusion-based generative models offer a promising strategy for data synthesis; however, many existing conditional approaches primarily optimize spatial reconstruction losses, which encourage voxel-wise similarity but may inadequately constrain lesion-level intensity distributions. As a result, these methods may produce over-smoothed texture profiles and underrepresent the distinct attenuation characteristics of different nodule subtypes, including solid, part-solid, and ground-glass nodules. To address this challenge, we propose a controllable latent diffusion model that synthesizes pulmonary nodules within full 3D CT volumes while accurately modeling nodule-specific intensity distributions. Specifically, rather than relying solely on spatial losses, we introduce a histogram-based regularization term that constrains voxel intensity distributions during the generative process. The model combines subtype, spatial mask, and Hounsfield unit (HU) histogram conditioning with the differentiable feature-space histogram regularization term to better align lesion-level intensity distributions, improving the visual plausibility and subtype consistency of synthesized nodules. Extensive experiments on lung CT data demonstrate that our framework achieves strong visual realism, validated through both quantitative metrics and a visual Turing test. Furthermore, when used for data augmentation, the generated nodules improve performance in downstream clinical tasks, particularly for underrepresented nodule subtypes, and show a potential benefit for subtype-informed malignancy classification.
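
One common way to make a histogram usable as a training penalty is kernel soft-binning, sketched below. This is an illustration of the general idea only: the paper's regularizer operates in feature space and may be defined differently, and the bin centers, bandwidth, and HU values here are made up.

```python
import math

def soft_histogram(values, centers, bandwidth):
    """Soft-bin values with Gaussian kernels and return a normalized histogram."""
    hist = [0.0] * len(centers)
    for x in values:
        weights = [math.exp(-0.5 * ((x - c) / bandwidth) ** 2) for c in centers]
        total = sum(weights)
        for k, w in enumerate(weights):
            hist[k] += w / total  # each value spreads one unit of mass over nearby bins
    return [h / len(values) for h in hist]

def histogram_l1(p, q):
    """L1 distance between two histograms, usable as a penalty term."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

centers = [-800.0, -600.0, -400.0, -200.0, 0.0, 200.0]  # illustrative HU bin centers
low_density = [-700.0, -650.0, -600.0, -620.0]          # made-up HU samples
high_density = [-50.0, 0.0, 30.0, 60.0]                 # made-up HU samples

h_low = soft_histogram(low_density, centers, 100.0)
h_high = soft_histogram(high_density, centers, 100.0)
penalty = histogram_l1(h_low, h_high)
```

Because every value contributes smoothly to several bins, the histogram changes continuously with the voxel intensities, which is what allows a gradient-based optimizer to use such a penalty.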

2605.30628 2026-06-01 cs.CL cs.AI cs.LG 版本更新

The Architecture of Errors: From Universal Impossibility to Patch-Local LLM Reliability

错误的架构:从普遍不可能到局部补丁的LLM可靠性

Mikhail L. Arbuzov, Lee Mosbacker, Sisong Bei, Ziwei Dong, Dmitri Kalaev, Alexey Shvets

发表机构 * Independent Researcher(独立研究者) Palo Alto Networks(帕洛阿尔托网络)

AI总结 本文通过两个命题和一个推论,形式化地论证了通用LLM可靠性在无限域上不可实现,但在操作有界的局部补丁中可通过目录发现和干预覆盖实现可靠性。

Comments 25 pages, no figures

详情
AI中文摘要

通用LLM可靠性不是一个有限库问题:在所有可能任务、工具、模式、知识源和评估者期望中,新的可干预区分的失败模式会无界出现,因此没有有限的干预词典能保证对每种此类模式的有界残余错误。但部署的系统并不在整个宇宙中运行。它们在操作有界的补丁(法律审查、医学RAG、代码修复、客户支持代理、合同提取)内运行,这些补丁具有重复的任务、模式、工具和评估者期望。在这些补丁内,经验证据表明失败是稀疏的、重复的,并集中在一个小的重复目录中,因此可靠性变成了一个局部目录发现和干预覆盖问题,而不是指数级的令牌长度问题。我们通过两个命题和一个推论形式化了这一转变。命题1是最坏情况模式方面的负面结果:没有有限的干预词典能覆盖无界域的每个可区分的失败模式。推论1是逆发现蕴含:模式发现的对数上界无法容纳线性更多的不同尾模式,除非指数级地观察到更多的硬失败事件。命题2是积极的局部补丁结果:在活跃模式暴露对数增长和头部重覆盖下,每个硬决策的足够干预预算随序列长度多对数增长,并在补丁目录饱和后变为域常数。该框架重新定位而非消解长上下文困难:当硬决策数量本身随任务长度增长时,可靠性仍然困难;贡献在于识别轴向干预,而非使这些区域变得容易。

英文摘要

Universal LLM reliability is not a finite-library problem: across all possible tasks, tools, schemas, knowledge sources, and evaluator expectations, new intervention-distinguishable failure modes can appear without bound, so no finite intervention dictionary can guarantee bounded residual error for every such mode. But deployed systems do not operate over the whole universe. They operate inside operationally bounded patches (legal review, medical RAG, code repair, customer-support agents, contract extraction) with recurring tasks, schemas, tools, and evaluator expectations. Within such patches, empirical evidence suggests failures are sparse, repetitive, and concentrated in a small recurring catalogue, so reliability becomes a local catalogue-discovery and intervention-coverage problem rather than an exponential token-length problem. We formalize this transition with two propositions and one corollary. Proposition 1 is the worst-case-mode-wise negative result: no finite intervention dictionary covers every distinguishable failure mode of an unbounded domain. Corollary 1 is the inverse-discovery implication: the logarithmic upper bound on mode discovery cannot accommodate linearly more distinct tail modes without exponentially more observed hard-failure events. Proposition 2 is the positive patch-local result: under log active-mode exposure and head-heavy coverage, a sufficient per-hard-decision intervention budget grows polylogarithmically in sequence length and becomes domain-constant once the patch catalogue saturates. The framework relocates rather than dissolves long-context difficulty: where the number of hard decisions itself grows with task length, reliability remains hard; the contribution is to identify the on-axis intervention rather than to make those regimes easy.

2605.30625 2026-06-01 cs.LG cs.AI stat.ML 版本更新

Active Timepoint Selection for Learning Measure-Valued Trajectories

学习测度值轨迹的主动时间点选择

Nicolas Huynh, Mihaela van der Schaar

发表机构 * DAMTP, University of Cambridge(剑桥大学应用数学与理论物理系)

AI总结 针对高成本破坏性数据获取场景,提出基于线性化最优传输的主动学习框架,通过高斯过程建模概率路径并迭代选择最优测量时间点以最小化不确定性。

Comments ICML 2026

详情
AI中文摘要

从稀疏快照推断连续概率路径是单细胞生物学等领域的基本挑战,其中高保真数据获取通常具有破坏性且受限于高昂测序成本。这促使需要主动学习策略来战略性选择最优测量时间。然而,为此场景设计主动学习策略仍是一个开放问题:目标对象位于无限维Wasserstein空间,标准欧几里得度量在此不适用,且当前插值方法缺乏认知不确定性量化。我们提出一个将主动实验扩展到测度空间的框架。通过利用线性化最优传输(LOT),我们将分布快照映射到适合高斯过程建模的切空间,从而为底层概率路径构建可处理的概率代理模型。这产生了一种采集策略,通过迭代选择测量时间以最小化不确定性。实验结果表明,我们的策略在合成和真实数据集上均优于不考虑不确定性的基线方法。

英文摘要

Inferring continuous probability paths from sparse snapshots is a fundamental challenge in domains like single-cell biology, where high-fidelity data acquisition is often destructive and constrained by prohibitive sequencing costs. This motivates active learning strategies that select optimal measurement times. However, designing active learning policies for this setting remains an open problem: the target objects reside in the infinite-dimensional Wasserstein space, where standard Euclidean metrics are ill-defined, and current interpolation methods lack epistemic uncertainty quantification. We introduce a framework that extends active experimentation to the space of measures. By leveraging Linearized Optimal Transport (LOT), we map distributional snapshots into a tangent space amenable to Gaussian Process modeling, allowing us to construct a tractable probabilistic surrogate for the underlying probability path. This yields an acquisition policy that iteratively selects measurement times to minimize uncertainty. Empirical results demonstrate that our strategy outperforms uncertainty-agnostic baselines on both synthetic and real-world datasets.

2605.30622 2026-06-01 cs.NI cs.LG 版本更新

Jamming-Resilient PRB Reservation for Latency-Critical O-RAN Network Slicing

抗干扰的PRB预留用于延迟关键型O-RAN网络切片

Elahe Delavari, Junaid Farooq

发表机构 * Department of Electrical and Computer Engineering, University of Michigan-Dearborn(密歇根大学迪尔伯恩分校电气与计算机工程系)

AI总结 针对O-RAN下行链路中的恶意干扰导致PRB容量降低和延迟违规问题,提出一种基于预留的弹性框架,通过近实时RIC xApp控制预留PRB池,结合主动清除积压和反应式分配预留容量,并采用掩蔽深度Q网络学习非平稳干扰下的最优策略,显著降低URLLC延迟违规并提高预留效率。

Comments Accepted at ML-Spec Workshop in IEEE DySPAN 2026

详情
AI中文摘要

开放无线接入网络(O-RAN)架构通过部署在近实时RAN智能控制器(near-RT RIC)上的可编程xApp,实现对网络切片的近实时、软件驱动控制。在工业5G下行链路系统中,恶意干扰会突然降低有效物理资源块(PRB)容量,导致队列积压和持续延迟违规,尤其是在存在低频谱效率的小区边缘用户设备时。本文提出了一种基于预留的弹性框架,用于切片O-RAN部署中的PRB分配。一个有限的预留PRB池由near-RT RIC xApp控制,该xApp通过主动清除积压以建立延迟余量,并在干扰活跃期间反应式地分配预留容量,提供混合缓解措施。我们将预留激活建模为受约束的序贯决策问题,并设计了一个掩蔽深度Q网络,以学习非平稳干扰下的有效控制策略。仿真结果表明,与反应式基线相比,URLLC延迟违规显著减少,预留效率提高。

英文摘要

Open radio access network (O-RAN) architectures enable near-real-time, software-driven control of network slicing through programmable xApps deployed on the near-real-time RAN Intelligent Controller (near-RT RIC). In industrial 5G downlink systems, adversarial jamming can abruptly reduce the effective physical resource block (PRB) capacity, triggering queue buildup and persistent latency violations, particularly in the presence of cell-edge user equipment with low spectral efficiency. This paper proposes a reserve-based resilience framework for PRB allocation in sliced O-RAN deployments. A finite pool of reserved PRBs is controlled by a near-RT RIC xApp that provides hybrid mitigation by proactively clearing backlog to build latency margin and reactively allocating reserve capacity during jammer-active intervals. We formulate reserve activation as a constrained sequential decision problem and design a masked Deep Q-Network to learn effective control policies under non-stationary jamming. Simulation results show substantial reductions in URLLC latency violations and improved reserve efficiency compared to reactive baselines.
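
The action-masking step of a masked Deep Q-Network can be sketched as a greedy choice restricted to feasible actions. The mask rule below (action `k` requests `k` reserved PRBs and is infeasible once the pool is too small) and the Q-values are hypothetical; the abstract does not specify the paper's action space or mask.

```python
def masked_greedy_action(q_values, valid_mask):
    """Return the index of the highest Q-value among actions the mask allows."""
    best, best_q = None, float("-inf")
    for action, (q, ok) in enumerate(zip(q_values, valid_mask)):
        if ok and q > best_q:
            best, best_q = action, q
    if best is None:
        raise ValueError("no valid action")
    return best

def reserve_mask(reserve_left, max_request):
    """Hypothetical mask: action k requests k reserved PRBs and must fit in the pool."""
    return [k <= reserve_left for k in range(max_request + 1)]

q = [0.1, 0.4, 0.9, 1.5]                          # made-up Q-values for actions 0..3
mask = reserve_mask(reserve_left=2, max_request=3)
action = masked_greedy_action(q, mask)            # action 3 is masked out
```

Masking keeps the learned policy from selecting actions that would violate the finite reserve constraint, even when their estimated Q-value is the largest.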

2605.30619 2026-06-01 stat.ML cs.AI cs.LG 版本更新

Reward Learning from Best-of-$N$ Preference Data: Targets, Tradeoffs, and Design Principles

从 Best-of-$N$ 偏好数据中学习奖励:目标、权衡与设计原则

Rattana Pukdee, Maria-Florina Balcan, Pradeep Ravikumar

发表机构 * Machine Learning Department(机器学习系)

AI总结 本文分析了从 Best-of-$N$ 采样构建的成对偏好数据中 Bradley-Terry 奖励学习的目标,揭示了 $N$ 和基础分布对奖励估计的影响,并提出了基于样本效率和连通性权衡的设计原则。

详情
AI中文摘要

Best-of-$N$ 采样被广泛用于构建成对偏好数据:从基础分布中抽取 $N$ 个候选,并将最佳响应与拒绝响应配对。尽管其广泛使用,但 Bradley-Terry (BT) 奖励学习从这类数据中提取了什么,以及如何选择 $N$ 和基础分布,仍不清楚。我们将近期通过诱导条件分布对偏好数据的分析专门应用于 Best-of-$N$。对于独立参考变体,我们推导出作为 $N$ 和基础分布显式函数的闭式奖励目标,并证明它们保留了潜在奖励排名。对于实用的 Best-vs-Random 和 Best-vs-Worst 变体,所选和拒绝的响应通过同一候选集耦合,因此精确的 BT 可表示性通常不成立;然而,随着 $N$ 增长,有界类最小化器接近参考目标。尽管已知间隔(margin)和连通性在成对偏好学习中控制样本效率,但 Best-of-$N$ 通过 $N$ 以相反方向耦合它们:更大的 $N$ 加宽成对间隔但降低连通性。这种权衡产生了两个设计原则:当偏好标签是瓶颈时使用较大的 $N$,当生成是瓶颈时使用较小的 $N$;并塑造基础分布,使其质量集中在测试时比较最重要的响应之间。在合成和真实偏好数据上的实验支持了对样本量和基础分布形状的预测依赖性。

英文摘要

Best-of-$N$ sampling is widely used to construct pairwise preference data: $N$ candidates are drawn from a base distribution, and the best is paired with a rejected response. Despite its widespread use, what Bradley-Terry (BT) reward learning extracts from such data, and how to choose $N$ and the base distribution, remain unclear. We specialize a recent analysis of preference data via its induced conditional distribution to Best-of-$N$. For independent-reference variants, we derive closed-form reward targets as explicit functions of $N$ and the base distribution, and show that they preserve the latent reward ranking. For the practical Best-vs-Random and Best-vs-Worst variants, chosen and rejected responses are coupled through the same candidate set, so exact BT representability generally fails; nevertheless, bounded-class minimizers approach the reference targets as $N$ grows. Although margin and connectivity are known to govern sample efficiency in pairwise preference learning, Best-of-$N$ couples them through $N$ in opposing directions: larger $N$ widens pairwise margins but reduces connectivity. This trade-off yields two design principles: use larger $N$ when preference labels are the bottleneck, smaller $N$ when generation is the bottleneck; and shape the base distribution to place mass between the responses whose comparison matters most at test time. Experiments on synthetic and real preference data support the predicted dependence on sample size and base-distribution shape.
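
The pair construction described above can be sketched as follows. Here each candidate is represented by its scalar reward drawn from a standard normal, and Best-vs-Random is assumed to draw the rejected response uniformly from the non-best candidates; both are simplifying assumptions for illustration, not details taken from the paper.

```python
import random

def best_of_n_pair(candidates, reward, variant, rng):
    """Return (chosen, rejected) built from a single candidate set."""
    ranked = sorted(candidates, key=reward)
    chosen = ranked[-1]
    if variant == "best_vs_worst":
        rejected = ranked[0]
    elif variant == "best_vs_random":
        rejected = rng.choice(ranked[:-1])  # assumed: uniform over non-best candidates
    else:
        raise ValueError(variant)
    return chosen, rejected

def mean_margin(n, trials, seed):
    """Average reward gap between chosen and rejected under Best-vs-Worst."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        cands = [rng.gauss(0.0, 1.0) for _ in range(n)]  # toy base distribution
        chosen, rejected = best_of_n_pair(cands, lambda r: r, "best_vs_worst", rng)
        total += chosen - rejected
    return total / trials

margin_small = mean_margin(2, 2000, seed=0)
margin_large = mean_margin(8, 2000, seed=0)
```

In this toy setting the average margin for `N = 8` exceeds that for `N = 2`, matching the abstract's statement that larger $N$ widens pairwise margins. The sketch does not measure connectivity.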

2605.30615 2026-06-01 cs.LG 版本更新

Improving Selective Classification with Pairwise Queries for Binary Classification

通过成对查询改进二分类的选择性分类

Harsh Vardhan, Sunav Choudhary, Natwar Modani, Arya Mazumdar

发表机构 * Adobe Research(Adobe研究院)

AI总结 针对选择性分类中模型置信度与预测不一致导致高错误率的问题,提出使用成对查询检测高错误样本,以降低非拒绝样本的错误率,并通过理论和实验验证了其有效性。

详情
AI中文摘要

在选择性分类中,模型预测其确信的数据样本的标签,并避免预测不确信样本的标签。被拒绝的样本通常由专家标注,这成本高昂。当模型在非拒绝样本上错误率低时,专家的预算得到最佳利用。然而,模型置信度的估计可能与模型的预测不一致,这可能导致非拒绝样本上的高错误率。这种情况在LLM的上下文二分类中容易发生。为了解决这个问题,我们提出向同一模型进行额外的成对查询。这些成对查询可以检测高错误样本,并整合到选择性分类技术中,以降低非拒绝样本上的错误率。理论上,我们建立了使用成对查询的简单算法优于不一致置信度估计的条件。我们通过大量实验支持这一见解,包括1个合成数据集和4个基于上下文学习的真实二分类数据集。在所有情况下,我们展示了使用成对查询的算法比仅使用原始置信度估计(例如LLM的下一个token对数概率)获得了更好的准确率-成本权衡。

英文摘要

In selective classification, a model predicts the labels of data samples where it is confident, and abstains from predicting labels for samples on which it is not confident. The rejected samples are often labeled by an expert, which is expensive. The budget for the expert is best utilized when the model has low error on non-rejected samples. However, the estimate of a model's confidence might be inconsistent with the model's predictions, which can lead to high error on non-rejected points. Such situations can readily occur in in-context binary classification by LLMs. To remedy this, we propose making additional pairwise queries to the same model. These pairwise queries can detect high-error samples and be incorporated into selective classification techniques to reduce the error on non-rejected samples. Theoretically, we establish the conditions under which a simple algorithm using pairwise queries outperforms an inconsistent confidence estimate. We support this insight through extensive experiments on one synthetic dataset and four real binary classification datasets based on in-context learning. In all these cases, we show that our algorithms, using pairwise queries, obtain a better accuracy-cost tradeoff than using only the raw confidence estimates, for instance, the LLM's next-token logits.
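
The baseline setting can be sketched as confidence thresholding: the model keeps samples whose confidence clears a threshold and defers the rest to an expert. The sketch below covers only this baseline and the failure case the abstract describes (a confident but wrong prediction); it does not implement the paper's pairwise-query algorithm, and all values are made up.

```python
def selective_metrics(confidences, predictions, labels, threshold):
    """Return (coverage, error rate on kept samples) for a confidence threshold."""
    kept = [(p, y) for c, p, y in zip(confidences, predictions, labels) if c >= threshold]
    coverage = len(kept) / len(labels)
    if not kept:
        return coverage, 0.0
    error = sum(p != y for p, y in kept) / len(kept)
    return coverage, error

conf = [0.95, 0.90, 0.60, 0.55, 0.99]  # last sample: high confidence, wrong prediction
pred = [1, 0, 1, 0, 1]
true = [1, 0, 0, 0, 0]

coverage, error = selective_metrics(conf, pred, true, threshold=0.8)
```

Here the threshold rejects two low-confidence samples, yet one of the three kept samples is still wrong because its confidence is inconsistent with its correctness. This is the situation in which an extra signal for spotting high-error samples would help.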

2605.30613 2026-06-01 cs.CR cs.LG 版本更新

CacheProbe: Auditing Prompt Cache Isolation in Gateway APIs

CacheProbe: 审计网关API中的提示缓存隔离

Ryan Fahey

发表机构 * Khoury College of Computer Sciences(计算机科学学院) Northeastern University(东北大学)

AI总结 本文通过CacheProbe方法审计OpenRouter API网关架构,发现其共享组织凭证的路由机制可能绕过提供商级别的提示缓存隔离,导致全局缓存共享漏洞。

Comments 11 pages, 8 figures, 2 tables. Accepted at SAGAI '26 (Workshop on Secure Agents for Generative AI), co-located with IEEE Symposium on Security and Privacy 2026

详情
AI中文摘要

在过去一年中,大型语言模型(LLM)推理API中的提示缓存变得越来越流行。提示缓存通过重用特定提示的KV缓存部分来处理另一个请求,从而节省宝贵的计算资源并加快响应时间。然而,许多提示缓存的实现无法抵御时序攻击甚至基本的元数据泄露。Gu等人(ICML 2025)开发了一种审计LLM中提示缓存的方法。本文研究OpenRouter的API网关架构是否引入了提示缓存漏洞,从而绕过了提供商级别的提示缓存隔离保证。大多数LLM推理提供商实现按账户或按组织的提示缓存以防止数据泄露,但通过OpenRouter使用共享组织凭证进行路由是否会在所有OpenRouter用户之间无意中创建全局缓存共享?

英文摘要

Over the past year, prompt caching in Large Language Models (LLMs) has become increasingly popular across inference APIs. Prompt caching helps save precious compute resources and speeds up response times by reusing parts of the KV cache of a specific prompt for another request. However, many implementations of prompt caching are not secure against timing attacks or even basic metadata disclosure. Gu et al. (ICML 2025) develop a method to audit prompt caching in LLMs. This paper investigates whether OpenRouter's API gateway architecture introduces prompt caching vulnerabilities that bypass provider-level prompt cache isolation guarantees. Most LLM inference providers implement per-account or per-organization prompt caching to prevent data leaks, but does routing through OpenRouter with shared organizational credentials inadvertently create global cache sharing across all OpenRouter users?

2605.30612 2026-06-01 cs.RO cs.LG cs.SY eess.SY 版本更新

ZAPS-DA: Zero-Phase Action Policy Smoothing with Decoupled Actor for Continuous Control in Reinforcement Learning

ZAPS-DA:基于解耦演员的零相位动作策略平滑用于强化学习中的连续控制

Faiq Shamass

发表机构 * Independent Researcher(独立研究者)

AI总结 提出ZAPS-DA框架,通过解耦演员网络模仿零相位滤波目标,在不引入相位延迟和后处理的情况下减少连续控制策略的动作抖动,并在驾驶仿真中验证了其有效性。

Comments 7 pages, 5 figures, 5 tables. Submitted to IEEE RA-L

详情
AI中文摘要

基于离策略强化学习训练的连续控制策略经常表现出高频动作抖动,使得直接部署在物理执行器上不可行。事后滤波可以减弱抖动但引入相位延迟;在演员损失中嵌入平滑惩罚会将其与RL梯度耦合,并将奖励回归与过度激进的平滑混为一谈。我们提出ZAPS-DA,一个在部署时减少动作抖动且具有可忽略相位延迟和无后处理的框架。ZAPS-DA将一个未修改的主演员(由基础RL损失训练)与一个单独的解耦演员配对,该解耦演员通过监督学习模仿存储在回放缓冲区中的零相位滤波目标。部署的策略是解耦演员:一个从当前观测到平滑动作的前馈映射,没有推理时滤波和动作历史输入——我们称之为非因果滤波器的因果蒸馏机制。幅度匹配的MSE损失提供了跨优化器类别的零超参数可移植性。使用Soft Actor-Critic和Savitzky-Golay滤波器在两个驾驶模拟器中通过配对n=150评估协议进行验证:在MetaDrive上,ZAPS-DA将转向抖动减少14-21倍,油门抖动减少3-5倍(所有p < 10^{-4},Bonferroni校正),同时以6.3%的奖励成本匹配任务完成率(成功率p=0.31,碰撞率p=0.31);在自定义Webots自适应巡航控制环境中,相同的SG配置产生了帕累托改进——奖励持平(p=0.121),转向抖动减少8-45倍,总任务失败率从2.0%降至0.7%。

英文摘要

Continuous control policies trained with off-policy reinforcement learning frequently exhibit high-frequency action jitter, rendering direct deployment on physical actuators impractical. Post-hoc filtering attenuates jitter but introduces phase lag; embedding smoothness penalties in the actor's loss couples them with the RL gradient and conflates reward regression with over-aggressive smoothing. We present ZAPS-DA, a framework that reduces action jitter at deployment with negligible phase lag and no post-processing. ZAPS-DA pairs an unmodified main actor (trained by the base RL loss) with a separate decoupled actor trained via supervised imitation of zero-phase filtered targets stored in the replay buffer. The deployed policy is the decoupled actor: a feed-forward map from the current observation to a smooth action, with no inference-time filter and no action-history input, a mechanism we term causal distillation of a non-causal filter. A magnitude-matched MSE loss provides zero-hyperparameter portability across optimizer classes. We validate ZAPS-DA with Soft Actor-Critic and a Savitzky-Golay filter in two driving simulators using paired n=150 evaluation protocols. On MetaDrive, ZAPS-DA reduces steering jitter by 14-21x and throttle jitter by 3-5x (all $p < 10^{-4}$, Bonferroni-corrected) while matching task-completion (p=0.28 success, p=0.31 crash) at a 6.3% reward cost. On a custom Webots adaptive cruise control environment, the same SG configuration produces a Pareto improvement: reward parity (p=0.121), 8-45x steering jitter reduction, and total task-failure rate reduced from 2.0% to 0.7%.
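
The difference between a causal filter and a zero-phase filter can be seen on a simple ramp signal. The sketch below uses moving averages instead of the Savitzky-Golay filter named in the abstract, purely to stay dependency-free; the window sizes and signal are illustrative.

```python
def causal_moving_average(x, window):
    """Trailing average: uses only current and past samples, so its output lags."""
    out = []
    for t in range(len(x)):
        lo = max(0, t - window + 1)
        out.append(sum(x[lo:t + 1]) / (t + 1 - lo))
    return out

def centered_moving_average(x, half):
    """Symmetric average over past and future samples: zero phase, but non-causal."""
    out = []
    for t in range(len(x)):
        lo, hi = max(0, t - half), min(len(x), t + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out

ramp = [float(t) for t in range(20)]
causal = causal_moving_average(ramp, 5)      # causal[10] averages samples 6..10
centered = centered_moving_average(ramp, 2)  # centered[10] averages samples 8..12
```

On the ramp, the causal output at step 10 is 8.0 (two steps behind) while the centered output is 10.0 (no lag). A centered filter needs future samples, so it can only be applied offline to stored trajectories, which is why the abstract describes training a separate actor to imitate the filtered targets rather than filtering at inference time.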

2605.30610 2026-06-01 cs.LG 版本更新

Constrained Flow Optimization via Sequential Fine Tuning for Molecular Design

通过序列微调进行约束流优化以用于分子设计

Sven Gutjahr, Riccardo De Santi, Luca Schaufelberger, Kjell Jorner, Andreas Krause

发表机构 * Department of Computer Science, ETH Zurich(苏黎世联邦理工学院计算机科学系) Institute of Chemical and Bioengineering, Department of Chemistry and Applied Biosciences, ETH Zurich(苏黎世联邦理工学院化学与生物工程研究所) ETH AI Center(苏黎世联邦理工学院人工智能中心)

AI总结 提出约束流优化(CFO)算法,通过将约束生成优化问题转化为序列微调,在分子设计中平衡奖励最大化与约束满足。

Comments ICML 2026

详情
AI中文摘要

适应生成基础模型,特别是扩散和流模型,以优化给定奖励函数(例如结合亲和力)同时满足约束(例如分子可合成性),对于其在分子设计或蛋白质工程等现实世界科学发现应用中的采用至关重要。虽然最近的工作通过强化学习和控制方案引入了可扩展的奖励引导微调方法,但如何以可靠和可预测的方式在算法上权衡奖励最大化和约束满足仍然是一个开放问题。受此挑战的启发,我们首先提出了约束生成优化的严格框架,该框架将优化视角引入所提出的适应问题,并将约束生成的相关任务作为子案例。然后,我们引入了约束流优化(CFO),这是一种通过将原始问题简化为通过已建立的可扩展方法进行序列微调来自动且可证明地平衡奖励最大化和约束满足的算法。我们为约束生成优化和通过CFO进行约束生成提供了收敛保证。最后,我们在合成(但具有说明性)设置和分子设计任务上对CFO进行了实验评估。在这些评估中,CFO在确保高约束满足的同时实现了奖励的持续增长,展示了其在约束生成优化中的实用性。

英文摘要

Adapting generative foundation models, in particular diffusion and flow models, to optimize given reward functions (e.g., binding affinity) while satisfying constraints (e.g., molecular synthesizability) is fundamental for their adoption in real-world scientific discovery applications such as molecular design or protein engineering. While recent works have introduced scalable methods for reward-guided fine-tuning of such models via reinforcement learning and control schemes, it remains an open problem how to algorithmically trade off reward maximization and constraint satisfaction in a reliable and predictable manner. Motivated by this challenge, we first present a rigorous framework for Constrained Generative Optimization, which brings an optimization viewpoint to the introduced adaptation problem and retrieves the relevant task of constrained generation as a sub-case. Then, we introduce Constrained Flow Optimization (CFO), an algorithm that automatically and provably balances reward maximization and constraint satisfaction by reducing the original problem to sequential fine-tuning via established, scalable methods. We provide convergence guarantees for constrained generative optimization and constrained generation via CFO. Finally, we present an experimental evaluation of CFO on both synthetic, yet illustrative, settings and a molecular design task. Across these evaluations, CFO achieves consistent increases in reward while ensuring high constraint satisfaction, showcasing its practical utility for constrained generative optimization.

2605.30603 2026-06-01 physics.ao-ph cs.LG 版本更新

Learning effective Sargassum transport dynamics from limited drifter observations

从有限的漂流观测中学习有效的马尾藻输运动力学

F. J. Beron-Vera, M. J. Olascoaga, J. Morell, E. Cruz

发表机构 * Rosenstiel School of Marine, Atmospheric, and Earth Science(罗森斯蒂尔海洋、大气与地球科学学院) University of Miami(迈阿密大学) Department of Atmospheric Sciences(大气科学系) Department of Ocean Sciences(海洋科学系) Department of Marine Sciences(海洋科学系) University of Puerto Rico(波多黎各大学)

AI总结 针对浮游物质输运中未解析过程的影响,提出基于物理诊断和有限记忆表示的数据驱动输运学习框架,通过MLP集成和SINDy方法从有限拉格朗日观测中学习有效输运修正,并在波多黎各和墨西哥湾流区域验证了诊断信息的有效性及延迟稀疏符号修正的局限性。

详情
AI中文摘要

浮游物质输运受到未解析过程的影响,这些过程通常无法从现有的环流产品中获得。我们开发了一个数据驱动的输运学习框架,利用物理驱动的海洋-大气诊断和部分受惯性粒子记忆效应启发的有限记忆表示,从有限的拉格朗日观测中学习有效的输运修正。通过留一轨迹验证,使用预测性和稀疏符号发现方法分析诊断表示。在波多黎各地区和墨西哥湾流的马尾藻跟随漂流器应用中,结果表明诊断包含超越基线环流产品的输运相关信息。多层感知器(MLP)集成提供了灵活的预测轨迹修正,而非线性动力学稀疏辨识(SINDy)测试是否可以从诊断中提取瞬时或延迟的稀疏符号输运结构。结果在不同流态下有所不同:(i)在波多黎各,延迟稀疏符号修正提供了适度但系统的改进;(ii)在墨西哥湾流应用中,尽管延迟预测信息持续存在,但动态有用的稀疏符号修正主要保持瞬时性。这些结果支持粗粒度浮游物质输运中的有限记忆效应,同时也说明了获得稳定延迟稀疏符号闭合的困难。

英文摘要

Floating-material transport is influenced by unresolved processes that are often absent from available circulation products. We develop a data-driven framework that learns effective transport corrections from limited Lagrangian observations using physically motivated ocean-atmosphere diagnostics and finite-memory representations motivated in part by inertial-particle memory effects. The diagnostic representation is analyzed through predictive and sparse symbolic-discovery approaches under leave-one-trajectory-out validation. Applications to Sargassum-following drifters in the Puerto Rico region and the Gulf Stream show that the diagnostics contain transport-relevant information beyond the baseline circulation products. Multilayer perceptron (MLP) ensembles provide flexible predictive trajectory corrections, while Sparse Identification of Nonlinear Dynamics (SINDy) tests whether instantaneous or delayed sparse symbolic transport structure can be extracted from the diagnostics. The results differ across flow regimes: (i) in Puerto Rico, delayed sparse symbolic corrections provide modest but systematic improvement; (ii) in the Gulf Stream application, dynamically useful sparse symbolic corrections remain primarily instantaneous even though delayed predictive information persists. These results support finite-memory transport effects in coarse-grained floating-material transport while also illustrating the difficulty of obtaining stable delayed sparse symbolic closures.

2605.30601 2026-06-01 cs.LG 版本更新

TASER: Task-Aware Stein Regularisation for Geometry-Driven Robustness

TASER: 面向几何驱动鲁棒性的任务感知斯坦正则化

Michał Kozyra, Gesine Reinert

发表机构 * Department of Statistics, University of Oxford, United Kingdom(英国牛津大学统计系)

AI总结 提出TASER(任务感知斯坦正则化),一种基于Langevin斯坦算子的训练时正则化框架,通过惩罚训练分布下的逐点斯坦残差,诱导各向异性数据感知平滑性,从而提升模型在分布偏移和对抗扰动下的鲁棒性。

详情
AI中文摘要

现代深度网络在分布偏移和对抗扰动下仍然脆弱,通常是由于过度或结构不良的输入敏感性。我们引入TASER(任务感知斯坦正则化),一种源自Langevin斯坦算子的训练时正则化框架。通过惩罚训练分布下的逐点斯坦残差,TASER鼓励预测器与数据密度之间的几何兼容性,诱导各向异性、数据感知的平滑性。我们提供了斯坦正则化与降低一阶偏移敏感性之间的理论联系,开发了与现代架构兼容的可扩展实现变体,并在回归和视觉基准上展示了改进的鲁棒性和稳定性。在CIFAR-10实验中,TASER一致地提高了已有训练方法的对抗鲁棒性,且未造成统计显著的干净准确率下降。

英文摘要

Modern deep networks remain fragile under distribution shift and adversarial perturbations, often due to excessive or poorly structured input sensitivity. We introduce TASER (Task-Aware Stein Regularisation), a training-time regularisation framework derived from Langevin Stein operators. By penalising pointwise Stein residuals under the training distribution, TASER encourages geometric compatibility between predictors and data density, inducing anisotropic, data-aware smoothness. We provide theoretical links between Stein regularisation and reduced first-order shift sensitivity, develop scalable implementation variants compatible with modern architectures, and demonstrate improved robustness and stability across regression and vision benchmarks. Across CIFAR-10 experiments, TASER consistently improves the adversarial robustness of established training methods without incurring statistically significant clean-accuracy degradation.

2605.30600 2026-06-01 cs.LG cs.IT math.IT 版本更新

The Fast Mixing Mechanism for Differential Privacy

差分隐私的快速混合机制

Omri Lev, Moshe Shenfeld, Vishwak Srinivasan, Katrina Ligett, Ashia C. Wilson

AI总结 提出基于快速变换的差分隐私草图机制,在保持隐私保证的同时匹配经典快速草图方法的运行时间,并应用于差分隐私线性回归实现首个快速方法。

详情
AI中文摘要

随机草图是压缩大规模优化问题同时保持准确性的核心工具。特别是基于结构化矩阵(如Hadamard矩阵)的草图可以高效应用,并且通常以更低的计算成本得到接近原始问题的解。在差分隐私(DP)中,高斯草图已被用于解决DP线性回归,始于Sheffet(2017,2019),随后由Lev等人(2025,2026)改进。然而,尽管这些方法实现了强大的效用保证,它们通常不会比经典DP方法提高运行时间。在这项工作中,我们引入了一种基于快速变换的新DP草图机制,该机制在某些情况下匹配经典快速草图方法的运行时间。我们证明了该机制的最先进隐私保证,并表明在有利情况下,它们与高斯草图的隐私保证相差一个常数因子。作为一个应用,我们将该机制与最近的基于草图的DP线性回归方法相结合,得到了一种具有强效用和改进运行时间的新算法。我们为该算法建立了隐私和准确性保证,据我们所知,这是第一个用于DP普通最小二乘法的快速方法。

英文摘要

Randomized sketching is a central tool for compressing large-scale optimization problems while preserving accuracy. In particular, sketches that are based on structured matrices, such as the Hadamard matrix, can be applied efficiently and often yield solutions that approximate those of the original problem at much lower computational cost. In differential privacy (DP), Gaussian sketching has been used to solve DP linear regression, beginning with Sheffet (2017, 2019) and later refined by Lev et al. (2025, 2026). However, although these methods achieve strong utility guarantees, they usually do not improve runtime over classical DP approaches. In this work, we introduce a new DP sketching mechanism based on fast transforms, which, in certain cases, matches the runtime of classical fast sketching methods. We prove state-of-the-art privacy guarantees for this mechanism and show that, in favorable regimes, they match those of the Gaussian sketch up to a constant factor. As an application, we combine this mechanism with recent sketch-based methods for DP linear regression to obtain a new algorithm with strong utility and improved runtime. We establish privacy and accuracy guarantees for this algorithm, yielding, to the best of our knowledge, the first fast method for DP ordinary least squares.
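
For readers unfamiliar with Hadamard-based sketches, the classical (non-private) building block can be sketched as a fast Walsh-Hadamard transform followed by random sign flips and row subsampling. This is the standard subsampled randomized Hadamard transform, not the paper's DP mechanism: it adds no noise and carries no privacy guarantee. The function names are illustrative.

```python
import math
import random

def fwht(x):
    """Fast Walsh-Hadamard transform in O(n log n); returns a new list.
    The input length must be a power of two. The transform is unnormalized."""
    x = list(x)
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                x[j], x[j + h] = x[j] + x[j + h], x[j] - x[j + h]
        h *= 2
    return x

def srht_sketch(x, k, rng):
    """Random signs, FWHT, then keep k coordinates, scaled by 1/sqrt(k) so that
    the squared norm is preserved in expectation."""
    n = len(x)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    y = fwht([s * v for s, v in zip(signs, x)])
    rows = rng.sample(range(n), k)
    return [y[r] / math.sqrt(k) for r in rows]

x = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, 0.5, 2.5]
full = srht_sketch(x, k=8, rng=random.Random(0))  # k = n keeps every coordinate
half = srht_sketch(x, k=4, rng=random.Random(0))  # compressed to 4 coordinates
```

The transform costs O(n log n) rather than the O(n^2) of a dense matrix-vector product, which is the source of the runtime advantage the abstract attributes to structured sketches.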

2605.30599 2026-06-01 cs.LG cs.CL 版本更新

AMNESIA: A Large Scale Medical Unlearning Benchmark Suite with Disease-Informed Analysis

AMNESIA: 一个大规模医学遗忘基准套件与疾病知情分析

Saeedeh Davoudi, Reihaneh Iranmanesh, Ophir Frieder, Nazli Goharian

发表机构 * IR Lab, Computer Science Department, Georgetown University, Washington D.C.(信息检索实验室,计算机科学系,乔治城大学,华盛顿特区)

AI总结 提出AMNESIA,首个大规模开源医学遗忘基准,包含70,560个问答对,评估四种遗忘方法,发现个体遗忘会侵蚀同病患者的其他知识。

详情
AI中文摘要

医学知识不断演变。这需要更新或选择性遗忘已训练的医学LLM中编码的信息。机器遗忘旨在无需完全重新训练即可移除特定训练数据对模型的影响。然而,现有的遗忘基准依赖于合成或小规模通用数据,导致临床遗忘研究不足。我们引入AMNESIA,首个大规模、开源医学遗忘基准,包含来自11种疾病类别、8,820份患者笔记的70,560个问答对。AMNESIA包括测试直接回忆的事实问题和测试临床推理的推理问题。我们用它来评估四种广泛使用的遗忘方法,分别在随机患者和疾病级别,并引入一个新的指标来检测医学术语的泄露。我们表明,遗忘个体患者会侵蚀具有相同病症的其他患者的知识,这需要能够更好地区分患者与共享临床知识的方法。

英文摘要

Medical knowledge is continuously evolving. This creates a need to update or selectively forget information encoded in already-trained medical LLMs. Machine unlearning aims to remove the influence of specific training data from a model without full retraining. Yet, existing unlearning benchmarks rely on synthetic or small-scale general data, leaving clinical unlearning understudied. We introduce AMNESIA, the first large-scale, open-source benchmark for medical unlearning, with 70,560 question-answer pairs from 8,820 patient notes across 11 disease categories. AMNESIA includes both factual questions testing direct recall and reasoning questions testing clinical inference. We use it to evaluate four widely used unlearning methods at both the random-patient and disease levels, and introduce a new metric for detecting leakage of medical terminology. We show that unlearning individual patients erodes knowledge of others with the same condition, calling for methods that can better separate patients from shared clinical knowledge.

2605.30597 2026-06-01 cs.LG 版本更新

ScaleMAP: Preserving Local Density and Neighborhood Structure in Low-Dimensional Embeddings

ScaleMAP: 在低维嵌入中保持局部密度和邻域结构

Rajas Poorna, Marcus T. Cicerone

发表机构 * School of Chemical and Biomolecular Engineering(化学与生物分子工程学院) Georgia Institute of Technology(佐治亚理工学院) School of Chemistry and Biochemistry(化学与生物化学学院)

AI总结 提出ScaleMAP方法,通过将每对嵌入位移除以原始空间局部半径的几何均值,在保持UMAP级邻域保真度的同时恢复密度信息,解决了UMAP等非线性降维方法丢失邻域尺度的问题。

Comments 23 pages, 16 figures

详情
AI中文摘要

非线性降维方法(如UMAP和PaCMAP)在图构建过程中自适应地归一化局部距离,从而抹去了数据中的邻域尺度。这不仅扭曲了相对聚类大小:稀疏结构(如过渡细胞类型之间的桥梁和超光谱图像中的窄光谱峰值)可能被抑制或完全丢失。DensMAP通过添加密度惩罚来纠正这一点,但该惩罚与UMAP的吸引-排斥力竞争,导致点远离其邻域分散。ScaleMAP采用不同的方法:每个成对嵌入位移除以两个端点原始空间局部半径的几何均值,将尺度信息作为变量变换而非竞争目标重新注入。在标准基准测试以及来自转录组学、超光谱成像和流式细胞术的科学数据集中,ScaleMAP在密度保持方面与DensMAP相当,同时保持UMAP级别的邻域保真度。在转录组数据中,它恢复了UMAP压缩的细胞群体之间的稀疏桥梁;在流式细胞术中,它忠实地表示了跨越17个数量级的密度结构。同样的原理应用于PaCMAP,持续改善了密度保持,表明该方法可推广到UMAP之外。

英文摘要

Nonlinear dimensionality-reduction methods such as UMAP and PaCMAP adaptively normalize local distances during graph construction, erasing neighborhood scale from the data. This distorts more than relative cluster sizes: sparse structures like bridges between transitioning cell types and narrow spectral spikes in hyperspectral images can be suppressed or lost entirely. DensMAP adds a density penalty to correct this, but this penalty competes with UMAP's attraction-repulsion forces, scattering points far from their neighborhoods. ScaleMAP takes a different approach: each pairwise embedding displacement is divided by the geometric mean of the two endpoints' original-space local radii, re-injecting scale information as a change of variables rather than as a competing objective. Across standard benchmarks and scientific datasets from transcriptomics, hyperspectral imaging, and flow cytometry, ScaleMAP matches DensMAP on density preservation while maintaining UMAP-level neighborhood preservation. In transcriptomic data, it recovers sparse bridges between cell populations that UMAP collapses; in flow cytometry, it faithfully represents density structure across 17 orders of magnitude. The same principle applied to PaCMAP yields consistently improved density preservation, suggesting the approach generalizes beyond UMAP.
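
The rescaling step described above, dividing each pairwise embedding displacement by the geometric mean of the endpoints' original-space local radii, can be sketched directly. The definition of local radius used here (mean distance to the k nearest neighbours) is an assumption for illustration, and the sketch does not show how the rescaled displacement enters the ScaleMAP objective.

```python
import math

def local_radius(points, i, k):
    """Hypothetical local radius: mean distance from point i to its k nearest neighbours."""
    dists = sorted(math.dist(points[i], p) for j, p in enumerate(points) if j != i)
    return sum(dists[:k]) / k

def scaled_displacement(y_i, y_j, r_i, r_j):
    """Embedding displacement y_i - y_j divided by the geometric mean of the local radii."""
    scale = math.sqrt(r_i * r_j)
    return [(a - b) / scale for a, b in zip(y_i, y_j)]

# Toy original-space points: three close together and one far away.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
r0 = local_radius(points, 0, k=2)  # radius in the dense region
r3 = local_radius(points, 3, k=2)  # radius of the isolated point

disp = scaled_displacement([2.0, 0.0], [0.0, 0.0], 4.0, 1.0)
```

Pairs in sparse regions have large radii, so the same embedding distance counts as a smaller rescaled displacement than it would in a dense region. This is one way to read the abstract's description of scale being re-injected as a change of variables.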

2605.30596 2026-06-01 cs.LG 版本更新

Improving Relative Representations with Learned Anchors and Whitened Inner Products

改进相对表示:使用学习锚点和白化内积

Oscar Thorsted Svendsen, Nikolaj Holst Jakobsen, Fabian Mager, Hiba Nassar

发表机构 * Technical University of Denmark(丹麦技术大学)

AI总结 提出通过学习锚点作为鲁棒语义原型并采用几何感知的相似度度量(白化内积),改进相对表示方法,实现跨模型的高保真信息传输和零样本通信。

Comments 14 pages, 5 figures

详情
AI中文摘要

独立训练的神经模型通常收敛到不兼容的潜在表示,这成为高度模块化AI系统的基本障碍。相对表示(RR)通过将绝对坐标映射到由与公共锚点的相似性定义的共享空间来解决这一问题,但传统实现依赖于随机采样的锚点和余弦相似度,常常无法捕捉现代架构(如Transformer)的各向异性几何结构。在这项工作中,我们提出了一个基于两项改进的跨模型通信鲁棒框架。我们学习锚点作为鲁棒的语义原型,并利用一种几何感知的相似度度量,该度量保留了判别性的幅度信息且对仿射变换具有不变性。我们的方法在视觉和语言任务中展示了显著的性能和一致性提升。值得注意的是,它实现了几乎无损的信息传输和稳定的零样本通信,即使在高度异构的架构之间,例如不同规模的小型语言模型。

英文摘要

Independently trained neural models typically converge to incompatible latent representations, creating a fundamental barrier to highly modular AI systems. While Relative Representations (RR) address this by mapping absolute coordinates to a shared space defined by similarities to common anchor points, traditional implementations rely on randomly sampled anchors and cosine similarity, which frequently fail to capture the anisotropic geometries of modern architectures like Transformers. In this work, we propose a robust framework for cross-model communication based on two improvements. We learn anchors as robust semantic prototypes and utilize a geometry-aware similarity metric which preserves discriminative magnitude information and is invariant to affine shifts. Our approach demonstrates significant gains in performance and consistency across vision and language tasks. Notably, it enables nearly lossless information transfer and stable zero-shot communication even between highly heterogeneous architectures, such as small language models of varying scales.

2605.30593 2026-06-01 cs.LG cs.AI cs.CE 版本更新

Scientific Machine Learning for Engine Health Management and Remaining Useful Life Prediction

面向发动机健康管理与剩余寿命预测的科学机器学习

Jostein Barry-Straume, Changmin Son, Adrian Sandu, Gavan Burke, Rekha Sundararajan, Andrew Rimell, James G. Steinrock

发表机构 * Computational Science Laboratory(计算科学实验室) Department of Computer Science(计算机科学系) Virginia Tech(弗吉尼亚理工大学)

AI总结 提出一个多任务科学机器学习框架,通过联合预测涡轮气体温度、温差和剩余寿命并提供量化不确定性区间,以支持基于风险的维护决策。

详情
AI中文摘要

发动机健康管理依赖于对剩余寿命的可靠预测以及对涡轮气体温度等热指标的跟踪。在实际应用中,真实机队数据具有异质性和非平稳性,仅靠点预测不足以支持风险感知的维护决策。本文提出了一种用于涡轮机预测的多任务科学机器学习框架,该框架联合预测未修剪涡轮气体温度、涡轮气体温差和剩余寿命,并以预测区间的形式提供量化不确定性,并评估其经验覆盖率。共享序列编码器(带有残差双向LSTM层和注意力池化的卷积前端)为任务特定头部提供输入,包括用于概率回归的均值-方差估计,以及可选的用于基于阈值事件建模的生存头部。该框架设计为可通过少量面向实践者的参数(例如,温差阈值规则和剩余寿命目标构建)进行调整,以便部署能够与内部策略和专有标准保持一致。使用点指标和区间指标评估所提出框架的预测性能,包括平均绝对误差、预测区间覆盖概率、平均预测区间宽度以及覆盖-宽度准则。结果按总体和按飞行阶段与维护段分层报告,以突出运营环境的影响并支持不确定性感知监控。

英文摘要

Engine Health Management (EHM) depends on reliable forecasting of Remaining Useful Life (RUL) and on tracking thermal indicators such as turbine gas temperature (TGT). In practice, real-world fleet data are heterogeneous and non-stationary, and point predictions alone are insufficient for risk-aware maintenance decisions. This paper presents a multi-task scientific machine learning framework for turbine prognostics that jointly predicts turbine gas temperature untrimmed (TGTU), Delta Turbine Gas Temperature (DTGT), and RUL, with quantified uncertainty in the form of prediction intervals whose empirical coverage is evaluated. A shared sequence encoder (convolutional front-end with residual bidirectional LSTM layers and attention pooling) feeds task-specific heads, including mean--variance estimation for probabilistic regression and, optionally, a survival head for threshold-based event modeling. The framework is designed to be tunable via a small set of practitioner-facing parameters (e.g., DTGT thresholding rules and RUL target construction) so that deployment can align with in-house policies and proprietary criteria. The predictive performance of the proposed framework is evaluated using both point and interval metrics, including mean absolute error (MAE), prediction interval coverage probability (PICP), mean prediction interval width (MPIW), and the coverage--width criterion (CWC). Results are reported both in aggregate and stratified by flight phase and maintenance segment to highlight operational-context effects and to support uncertainty-aware monitoring.

2605.30592 2026-06-01 cs.LG 版本更新

Learning Transferable Predictability Representations

学习可迁移的可预测性表示

Diyali Goswami, Auroop R. Ganguly

发表机构 * Sustainability and Data Sciences Laboratory (SDS Lab)(可持续性与数据科学实验室) AI4CaS: AI for Climate and Sustainability(AI4CaS:为气候与可持续性的人工智能) Institute for Experiential AI(体验式人工智能研究所) Pacific Northwest National Laboratory (PNNL)(太平洋西北国家实验室)

AI总结 提出Gauge-Fixed Ordinal Network (GON)模型,通过锚定方差目标学习跨系统一致的序数评分,解决可预测性评估中的尺度模糊问题。

Comments 27 pages, 3 figures

详情
AI中文摘要

我们研究将标量分数分配给短轨迹窗口的问题,该分数反映其在有序可预测性机制连续体上的位置,范围从结构化确定性动力学到非结构化随机噪声。现有方法在单个系统内进行确定性-随机性判别,并且不能产生跨系统具有一致数值解释的分数。我们将此形式化为五级可预测性阶梯上的序数估计,并识别出跨系统模糊性的结构来源:仅靠排序监督,分数坐标只能确定到相差一个单调重参数化,我们称之为序数评分的规范自由度。我们提出了规范固定序数网络(GON),这是一种时间卷积模型,使用锚定-方差(anchor-and-variance)目标训练,将各级别的分数均值固定到共享目标坐标。GON操作于2-jet特征,这些特征刻画局部轨迹几何结构,该结构在平滑流下保持不变,而会被随机替代(surrogate)过程破坏。在五个保留的动力学系统上,从预训练的GON检查点初始化在所有窗口预算上始终优于从头训练,适应深度反映了与训练家族的几何接近性。零样本分数在随机边界保留序数结构,其中替代过程最强烈地破坏非线性几何,并且预训练初始化在所有窗口预算上始终优于从头训练。成对判别和全局一致的序数评分是不同的属性,需要稳定的分数坐标以实现跨系统迁移,这对自然和工程动力学系统的可预测性评估、模型选择和早期预警诊断具有直接影响。

英文摘要

We study the problem of assigning a scalar score to a short trajectory window that reflects its position on an ordered continuum of predictability regimes, spanning structured deterministic dynamics to unstructured stochastic noise. Existing methods address deterministic-versus-stochastic discrimination within a single system and do not produce scores with a consistent numerical interpretation across systems. We formalize this as ordinal estimation over a five-level predictability ladder and identify a structural source of cross-system ambiguity: ranking supervision alone leaves the score coordinate unfixed up to a monotone reparameterization, which we term the gauge freedom of ordinal scoring. We propose the Gauge-Fixed Ordinal Network (GON), a temporal convolutional model trained with an anchor-and-variance objective that pins level-wise score means to shared target coordinates. GON operates on 2-jet features that expose local trajectory geometry, preserved by smooth flows and disrupted by stochastic surrogate procedures. On five held-out dynamical systems, initializing from a pretrained GON checkpoint consistently outperforms training from scratch across all window budgets, with adaptation depth reflecting geometric proximity to the training family. Zero-shot scores retain ordinal structure at the stochastic boundary, where surrogate procedures most strongly disrupt nonlinear geometry, and pretrained initialization consistently beats scratch across all window budgets. Pairwise discrimination and globally coherent ordinal scoring are distinct properties requiring a stable score coordinate for cross-system transfer, with direct implications for predictability assessment, model selection, and early-warning diagnostics across natural and engineered dynamical systems.

2605.30590 2026-06-01 cs.LG cs.AI cs.CL 版本更新

Counterfactual Evaluation Reveals Hidden Capability Profiles in Clinical LLMs and Agents

反事实评估揭示临床LLM和智能体的隐藏能力画像

Matt Turk

发表机构 * Protege Data Lab(Protege数据实验室)

AI总结 提出因果敏感性评分(CSS),通过沿五个临床维度变异肿瘤病例来评估模型是否按预期方向更新推荐,发现与覆盖度指标排名相反,并揭示所有前沿模型在手术状态干预上的安全盲点。

Comments Accepted to RLEval @ ACM CAIS 2026 (Workshop on Methods and RL Environments for Evaluating AI Agents) and selected for an invited talk based on reviewer ratings. 4-page short paper + appendix

详情
AI中文摘要

两个临床AI系统在基于覆盖度的评分标准上可能得分几乎相同,但当患者输入变化时行为却截然不同:一个更新其推荐以匹配新的临床信号,而另一个无论输入如何都产生相同输出。我们引入因果敏感性评分(CSS),这是一个预注册的干预性指标,沿五个临床有意义的维度(生物标志物翻转、先前治疗失败、生物标志物移除、手术状态变化和分期扰动)对肿瘤学多学科会诊(tumor board)病例进行变异,并使用{0, 0.5, 1.0}量表对每个模型是否在预注册的正确方向上更新其推荐进行评分。与基于覆盖度的加权召回指标共识匹配评分(CMS)相比,来自三个实验室的六个前沿模型在224个病例的单次推理中评估,排名几乎完全相反:所有六个模型排名发生变化,CMS最差的模型成为CSS最好的模型,而一个中上CMS模型在CSS上排名最后。我们进一步揭示了一个普遍的安全盲点:每个前沿模型在手术状态干预上失败(在D族上CSS至多为17.2%),这是CMS未暴露的发现。该指标也适用于使用工具的智能体:在ReAct风格的实验中,工具使用改善了六个模型中五个的CSS(+2.5到+20.3个百分点),然而CSS最低的模型检索相同的图表部分但仍未能更新其推荐,揭示了仅在反事实评估下可见的结构性响应缺陷。跨评判者复制和三位评估者的医学专业验证确认了总体发现。像CSS这样的干预性预注册指标补充了临床AI智能体的基于覆盖度的评估:它们捕捉了覆盖度指标遗漏的响应性,并为未来的智能体强化学习系统提供了候选的密集奖励信号。

英文摘要

Two clinical AI systems can score nearly identically on coverage-based rubrics yet behave radically differently when their patient inputs change: one updates its recommendations to match the new clinical signal, while the other produces the same output regardless. We introduce the Causal Sensitivity Score (CSS), a pre-registered interventional metric that mutates oncology tumor-board cases along five clinically meaningful dimensions - biomarker flips, prior-treatment failures, biomarker removals, surgery-status changes, and stage perturbations - and scores whether each model updates its recommendations in the pre-registered correct direction using a {0, 0.5, 1.0} scale. Benchmarked against the Consensus Match Score (CMS), a coverage-based weighted recall metric, six frontier models from three labs evaluated in single-shot inference across 224 cases rank in nearly opposite orders: all six models change rank, the CMS-worst model becomes CSS-best, and one upper-mid CMS model ranks last on CSS. We further surface a universal safety blind spot: every frontier model fails on surgery-status interventions (at most 17.2% CSS on Family D), a finding CMS does not expose. The metric also transfers to tool-using agents: in a ReAct-style experiment, tool use improves CSS for five of six models (+2.5 to +20.3 percentage points), yet the lowest-CSS model retrieves the same chart sections and still fails to update its recommendations - revealing a structural responsiveness deficit visible only under counterfactual evaluation. Cross-judge replication and three-rater medical-professional validation confirm the aggregate findings. Interventional pre-registered metrics like CSS complement coverage-based evaluation for clinical AI agents: they capture responsiveness that coverage metrics miss and offer a candidate dense reward signal for future agentic RL systems.
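
As a rough illustration of how per-case judgements on a {0, 0.5, 1.0} scale could be rolled up into a per-family percentage, here is a small sketch. The averaging rule, the function name `css_by_family`, and the toy judgements are all assumptions; the abstract does not spell out the aggregation, and only the label "Family D" (surgery-status interventions) comes from the source.

```python
from collections import defaultdict

ALLOWED_SCORES = {0.0, 0.5, 1.0}

def css_by_family(judgements):
    """Mean score per mutation family, expressed as a percentage.

    judgements: iterable of (family_label, score) pairs, one per mutated case.
    Averaging is an assumed aggregation rule, not the paper's stated one.
    """
    per_family = defaultdict(list)
    for family, score in judgements:
        if score not in ALLOWED_SCORES:
            raise ValueError(f"score {score!r} is not on the 0 / 0.5 / 1.0 scale")
        per_family[family].append(score)
    return {fam: 100.0 * sum(s) / len(s) for fam, s in per_family.items()}

# Made-up judgements for two families, purely for illustration.
toy = [("A", 1.0), ("A", 0.5), ("A", 1.0), ("A", 0.5),
       ("D", 0.0), ("D", 0.5), ("D", 0.0), ("D", 0.0)]
print(css_by_family(toy))
```

Reporting the score per family rather than as one pooled number is what lets a weakness confined to a single intervention type stay visible.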

2605.30585 2026-06-01 cs.LG cs.AI cs.CE 版本更新

Benchmarking Machine Learning Uncertainty Quantification Methodologies for Predicting Turbine Gas Temperature Degradation

机器学习不确定性量化方法在预测涡轮燃气温度退化中的基准测试

Jostein Barry-Straume, Changmin Son, Adrian Sandu, Gavan Burke, Rekha Sundararajan, Andrew Rimell, James G. Steinrock

发表机构 * Computational Science Laboratory(计算科学实验室) Department of Computer Science(计算机科学系) Virginia Tech(弗吉尼亚理工大学)

AI总结 本文研究了五种预测区间构建方法(Delta法、贝叶斯蒙特卡洛Dropout、Bootstrap法、下上界估计和均值方差估计),在统一实验框架下评估其捕捉涡轮燃气温度神经网络预测不确定性的能力,并基于覆盖概率、归一化平均预测区间宽度和覆盖宽度准则等指标比较了各方法的可靠性、锐度及权衡,为发动机健康管理中的预测区间方法选择和调优提供了实用指南。

详情
AI中文摘要

现代发动机的有效预测与健康管理依赖于准确的涡轮燃气温度预测和稳健的不确定性量化,以确保可靠性和安全性。本文研究了五种构建预测区间的主要方法——即Delta法、贝叶斯蒙特卡洛Dropout、Bootstrap法、下上界估计和均值方差估计——作为捕捉涡轮燃气温度神经网络预测中不确定性的手段。每种方法都在统一的实验框架内实现,该框架采用交叉验证进行超参数选择、重复训练-测试分割以保证性能稳健性,并使用多个指标评估区间的准确性和紧致性。具体地,测量了覆盖概率、归一化平均预测区间宽度以及基于覆盖宽度的准则,以全面评估每种方法的可靠性和锐度。在代表性涡轮燃气温度数据集上进行的实验揭示了五种方法在区间覆盖、宽度和稳定性方面的不同权衡。这些发现为发动机健康管理和预测中选择和调整预测区间方法提供了实用指南,确保在实际应用中的可解释性和精度。

英文摘要

Effective prognostics and health management of modern engines relies on accurate turbine gas temperature predictions and robust uncertainty quantification to ensure reliability and safety. This paper investigates five major approaches for constructing prediction intervals -- namely the Delta method, Bayesian Monte Carlo Dropout, Bootstrap method, Lower-Upper Bound Estimation, and Mean-Variance Estimation -- as a means of capturing the uncertainty in neural network predictions of turbine gas temperature. Each approach is implemented within a unified experimental framework that employs cross-validation for hyperparameter selection, repeated train-test splits for performance robustness, and multiple metrics to evaluate both the accuracy and tightness of the intervals. In particular, Coverage Probability, Normalized Mean Prediction Interval Width, and the Coverage Width-based Criterion are measured to comprehensively assess each method's reliability and sharpness. Experiments conducted on a representative turbine gas temperature dataset reveal distinct trade-offs among the five methods in terms of interval coverage, width, and stability. These findings provide a practical guide for selecting and tuning prediction interval methods in engine health management and prognostics, ensuring both interpretability and precision in real-world applications.
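
The interval metrics named in this abstract can be sketched in a few lines. This is a generic illustration rather than the authors' code: coverage probability is taken as the fraction of targets inside their interval, and the width is normalized by the range of the targets, which is a common convention assumed here. The function names and toy numbers are made up.

```python
def picp(y_true, lower, upper):
    """Prediction interval coverage probability:
    fraction of targets that fall inside their interval."""
    hits = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return hits / len(y_true)

def nmpiw(y_true, lower, upper):
    """Mean interval width divided by the range of the targets
    (range normalization is an assumed convention)."""
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)
    target_range = max(y_true) - min(y_true)
    return mean_width / target_range

# Toy targets and intervals, for illustration only.
y = [10.0, 12.0, 15.0, 20.0]
lo = [9.0, 12.5, 13.0, 18.0]
hi = [11.0, 14.5, 16.0, 21.0]
print(picp(y, lo, hi), nmpiw(y, lo, hi))
```

The two numbers pull in opposite directions: widening every interval raises coverage but also raises the normalized width, which is why a combined coverage-width criterion is reported alongside them.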

2605.30580 2026-06-01 cs.CL cs.LG 版本更新

Speculative Decoding Across Languages

跨语言的推测解码

Nirajan Paudel, Michael Ginn, Luc De Nardi, Alexis Palmer

发表机构 * University of Colorado(科罗拉多大学)

AI总结 本文研究了通过微调草稿模型或使用n-gram模型来提高非英语语言中推测解码效率的策略,发现任务特定蒸馏虽能提升效率但泛化性差,而n-gram模型尽管接受率较低,但由于生成速度快,始终能提供显著的加速效果。

Comments 10 pages, 11 figures, submitted to ACL ARR May 2026

详情
AI中文摘要

推测解码已成为大型语言模型(LLM)推理的关键组成部分,通过草拟多个令牌并并行验证,实现更快的生成。然而,小型草稿模型往往在多语言能力上严重不足。因此,在生成非英语文本时,推测解码的效率远低于英语。我们比较了三种提高十一种语言推测解码效率的策略:在任务特定数据(翻译)上微调草稿模型;在未标记的单语语料库上微调草稿模型;以及在相同单语语料库上训练简单的n-gram草稿模型。我们在翻译(从英语到目标语言)和保留任务故事生成上评估效率。我们发现,虽然任务特定蒸馏可以显著提高效率,但蒸馏模型在新任务上泛化能力差。与此同时,n-gram草稿模型尽管接受率较低,但由于草稿生成速度快得多,始终能提供大的加速。

英文摘要

Speculative decoding has become a crucial component of large language model (LLM) inference, enabling faster generation by drafting multiple tokens and verifying them in parallel. However, small draft models tend to suffer from disproportionately poor multilingual capabilities. Thus, when generating text in a non-English language, speculative decoding is far less effective. We compare three strategies to improve speculative decoding efficiency for eleven languages: finetuning the draft model on task-specific data (translation); finetuning the draft model on unlabeled monolingual corpora; and training simple n-gram draft models on the same monolingual corpora. We evaluate efficiency on translation (from English into the target language) and the held-out task of story generation. We find that while task-specific distillation can significantly improve efficiency, distilled models generalize poorly to a new task. Meanwhile, n-gram draft models, despite lower acceptance rates, consistently provide large speed-ups due to much faster draft generation.
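
To make the draft-then-verify loop concrete, here is a toy sketch of one greedy speculative-decoding step with a bigram draft model. Everything in it is hypothetical: `bigram_draft`, `target_model`, the lookup table, and the integer "tokens" are stand-ins, and a real system would verify all drafted positions in a single batched forward pass of the target model instead of calling it in a loop.

```python
def bigram_draft(tokens):
    """Toy bigram draft model: the next token depends only on the last one."""
    table = {0: 1, 1: 2, 2: 4}  # made-up table; the entry for 2 is deliberately wrong
    return table.get(tokens[-1], 0)

def target_model(tokens):
    """Toy stand-in for the large target model: always counts upward."""
    return tokens[-1] + 1

def speculative_step(prefix, draft_next, target_next, k):
    """One greedy speculative step.
    Returns (tokens appended this step, number of drafted tokens accepted)."""
    # Draft k tokens autoregressively with the cheap model.
    drafted = []
    for _ in range(k):
        drafted.append(draft_next(prefix + drafted))

    # Accept drafted tokens for as long as they match the target's greedy choice.
    accepted = []
    for tok in drafted:
        expected = target_next(prefix + accepted)
        if tok != expected:
            # First mismatch: keep the target's token and stop.
            return accepted + [expected], len(accepted)
        accepted.append(tok)
    # Every draft was accepted; the target contributes one extra token.
    return accepted + [target_next(prefix + accepted)], len(accepted)

new_tokens, n_accepted = speculative_step([0], bigram_draft, target_model, k=3)
print(new_tokens, n_accepted)
```

The output always equals what the target model would have produced on its own, so the draft model only affects speed. That is consistent with the abstract's observation that a very cheap n-gram drafter can still pay off even when fewer of its tokens are accepted.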

2605.30573 2026-06-01 cs.LG 版本更新

Zeroth-Order Non-Log-Concave Sampling with Variance Reduction and Applications to Inverse Problems

零阶非对数凹采样与方差缩减及其在逆问题中的应用

M. Berk Sahin, Behzad Sharif, Abolfazl Hashemi

发表机构 * Elmore School of Electrical and Computer Engineering, Purdue University, West Lafayette, USA(埃尔莫尔电气与计算机工程学院,普渡大学,西拉法叶,美国) Weldon School of Biomedical Engineering, Purdue University, West Lafayette, USA(韦尔登生物医学工程学院,普渡大学,西拉法叶,美国)

AI总结 针对黑盒设置下非对数凹分布采样中梯度不可访问且经典零阶估计器方差大的问题,提出方差缩减的零阶朗之万采样方法,首次建立非渐近收敛保证,并应用于逆问题后验采样。

Comments Accepted to ICML 2026

详情
AI中文摘要

从具有未归一化密度的高维非对数凹分布中采样仍然是机器学习中的一个基本挑战,特别是在梯度信息不可访问或计算上禁止的黑盒设置中。虽然朗之万动力学在梯度可访问时提供了一个原则性的采样框架,但其扩展到黑盒设置时存在高方差问题,并且缺乏非对数凹采样的非渐近收敛保证。为了解决这些限制,我们提出了一种方差缩减的零阶朗之万采样方法。我们的方法采用了一种梯度估计器,该估计器显著降低了经典批处理零阶估计器的方差,并消除了准确估计所需批处理大小的不利维度依赖性,从而实现实用且稳定的采样。我们首次建立了零阶非对数凹采样在ε-相对Fisher信息以及(在Poincaré不等式假设下)平方总变差距离方面的非渐近收敛保证。我们进一步提出了ZO-APMC,一种用于具有预训练基于分数的生成先验的黑盒逆问题的后验采样算法,为此类方法建立了首个非渐近收敛保证。我们通过合成实验验证了我们的理论,并在实际线性和非线性逆问题上展示了强大的实证性能。

英文摘要

Sampling from high-dimensional, non-log-concave distributions with unnormalized densities remains a fundamental challenge in machine learning, particularly in black-box settings where gradient information is inaccessible or computationally prohibitive. While Langevin dynamics provides a principled framework for sampling when gradients are accessible, its extension to the black-box settings suffers from high variance and lacks non-asymptotic convergence guarantees for non-log-concave sampling. To address these limitations, we propose a variance-reduced zeroth-order Langevin sampling method. Our method employs a gradient estimator that substantially reduces the variance of the classical batched zeroth-order estimator and eliminates the unfavorable dimensional dependence of the batch size required for accurate estimation, enabling practical and stable sampling. We establish the first non-asymptotic convergence guarantees for zeroth-order non-log-concave sampling in terms of $\varepsilon$-relative Fisher information, and, under a Poincaré inequality assumption, squared total variation distance. We further propose ZO-APMC, a posterior sampling algorithm for black-box inverse problems with pre-trained score-based generative priors, establishing the first non-asymptotic convergence guarantees for such methods. We validate our theory through synthetic experiments and demonstrate strong empirical performance on practical linear and nonlinear inverse problems.

2605.30556 2026-06-01 cs.LG q-bio.NC 版本更新

Supervised Training Rapidly Degrades Early Visual Cortex Alignment Across Biologically Plausible Learning Rules

监督训练在生物合理学习规则下迅速降低早期视觉皮层对齐

Nils Leutenegger

发表机构 * Independent Researcher(独立研究者)

AI总结 研究发现无训练网络在早期视觉皮层表征相似性上优于或持平于训练网络,通过对比四种学习规则(BP、FA、PC、STDP)在训练过程中与人类fMRI数据的对齐变化,揭示全局误差信号(BP)比局部学习规则(PC、STDP)更剧烈地重塑早期表征。

Comments 7 pages, 4 figures

详情
AI中文摘要

随机、未训练的神经网络在早期视觉皮层的表征相似性上始终达到或超过训练网络。这一令人困惑的发现挑战了学习能改善大脑对齐的假设。我们通过追踪四种学习规则(反向传播BP、反馈对齐FA、预测编码PC和脉冲时序依赖可塑性STDP)在训练过程中与人类fMRI数据的表征相似性分析(RSA)对齐来研究这一问题。使用THINGS数据库中的720张物体图像和三名被试在六个视觉ROI上的fMRI数据,我们在八个训练检查点(epoch 0-40)测量模型与大脑表征相异矩阵之间的Spearman相关性。我们发现:(1)单个训练epoch根据学习规则不同使V1对齐降低25-90%;(2)反向传播对V1对齐的降低最为严重(delta r = -0.080),而预测编码和STDP保留更多(delta r ~ -0.04);(3)在物体选择皮层(LOC)中出现较弱的相反趋势,BP在训练中对齐增加最大,但绝对变化很小。这些结果表明,未训练架构仅通过归纳偏置捕获低级视觉统计,且全局误差信号(BP)比局部学习规则(PC、STDP)更激进地重塑早期表征,后者更好地保留了类脑结构。

英文摘要

Random, untrained neural networks consistently match or exceed trained networks in representational similarity to early visual cortex. This puzzling finding challenges the assumption that learning improves brain alignment. We investigate it by tracking representational similarity analysis (RSA) alignment to human fMRI data across training for four learning rules: backpropagation (BP), feedback alignment (FA), predictive coding (PC), and spike-timing-dependent plasticity (STDP). Using 720 object images from the THINGS database and fMRI data from three subjects across six visual ROIs, we measure Spearman correlations between model and brain representational dissimilarity matrices at eight training checkpoints (epochs 0-40). We find that (1) a single epoch of training reduces V1 alignment by 25-90%, depending on the learning rule; (2) backpropagation reduces V1 alignment most severely (delta r = -0.080), while predictive coding and STDP preserve substantially more (delta r ~ -0.04); and (3) a weaker, opposite tendency appears in object-selective cortex (LOC), where BP shows the largest increase in alignment during training, although the absolute change is small. These results suggest that untrained architectures capture low-level visual statistics through inductive biases alone, and that global error signals (BP) reshape early representations more aggressively than local learning rules (PC, STDP), which better preserve brain-like structure.
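
The alignment measure described here, a Spearman correlation between model and brain representational dissimilarity matrices (RDMs), can be sketched with the standard library alone. This is a generic illustration, not the author's pipeline: the helper names and the two 4x4 toy RDMs are assumptions, and only the off-diagonal upper triangle is compared because an RDM is symmetric with a zero diagonal.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out

def upper_triangle(matrix):
    """Off-diagonal entries above the main diagonal, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def rsa_alignment(model_rdm, brain_rdm):
    """Spearman correlation between the upper triangles of two RDMs."""
    a, b = upper_triangle(model_rdm), upper_triangle(brain_rdm)
    return pearson(ranks(a), ranks(b))

# Toy RDMs: the second preserves the rank order of the first's dissimilarities.
model_rdm = [[0.0, 0.1, 0.5, 0.9],
             [0.1, 0.0, 0.4, 0.8],
             [0.5, 0.4, 0.0, 0.3],
             [0.9, 0.8, 0.3, 0.0]]
brain_rdm = [[0, 1, 5, 9],
             [1, 0, 4, 8],
             [5, 4, 0, 3],
             [9, 8, 3, 0]]
print(rsa_alignment(model_rdm, brain_rdm))
```

Because the comparison is rank-based, it depends only on the ordering of pairwise dissimilarities, not on their units, which is what makes model activations and fMRI responses comparable in the first place.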

2605.30553 2026-06-01 cs.LG cs.IT math.IT 版本更新

Destruction is a General Strategy to Learn Generation; Diffusion's Strength is to Take it Seriously; Exploration is the Future

破坏是学习生成的一般策略;扩散的优势在于认真对待它;探索是未来

Pierre-André Noël

发表机构 * ServiceNow AI Research(ServiceNow AI研究院)

AI总结 本文提出扩散模型作为信息隐藏与猜测框架的一部分,论证其破坏式信息隐藏比手工设计更灵活,尤其在数据稀缺场景有优势,并探讨强化学习技术移植到扩散上下文时的微妙问题及原生探索方向。

Comments Published April 27th, 2026 as an ICLR blogpost https://iclr-blogposts.github.io/2026/blog/2026/destruction/

详情
Journal ref
Noël, Pierre-André. "Destruction is a General Strategy to Learn Generation; Diffusion's Strength is to Take it Seriously; Exploration is the Future", ICLR Blogposts, 2026
AI中文摘要

我将扩散模型视为机器学习技术家族的一部分,这些技术从模型输入中隐藏信息,并训练模型猜测被隐藏的信息。我认为扩散的破坏式信息隐藏方法比典型的手工设计信息隐藏技术更灵活,提供了一个丰富的训练环境,在某些场景(尤其是数据稀缺场景)中可能具有优势。然后,我讨论了将强化学习技术移植到扩散上下文时可能出现的微妙问题,并思考如何以更扩散原生的方式解决这些探索问题。我没有确定的答案,但我指出了我认为有趣的方向。本文之后附有一篇教程,进一步阐述了先破坏后生成的观点。为了便于教程的阐述,引入了一种新型的概率图模型。

英文摘要

I present diffusion models as part of a family of machine learning techniques that withhold information from a model's input and train it to guess the withheld information. I argue that diffusion's destroying approach to withholding is more flexible than typical hand-crafted information withholding techniques, providing a rich training playground that could be advantageous in some settings, notably data-scarce ones. I then address subtle issues that may arise when porting reinforcement learning techniques to the diffusion context, and wonder how such exploration problems could be addressed in more diffusion-native ways. I do not have definitive answers, but I do point my fingers in directions I deem interesting. A tutorial follows this thesis, expanding on the destroy-then-generate perspective. A novel kind of probabilistic graphical models is introduced to facilitate the tutorial's exposition.

2605.30550 2026-06-01 cs.LG 版本更新

Early Prediction of Future Behavioral Strategy from Process Traces

从过程轨迹早期预测未来行为策略

Robert Kasumba, Dennis Barbour, Chien-Ju Ho

发表机构 * Division of Computational and Data Sciences(计算与数据科学系) Department of Biomedical Engineering(生物医学工程系) Department of Computer Science(计算机科学系)

AI总结 提出过程级潜变量模型(PLVM),通过跨任务过程轨迹融合共享人级潜在表示,实现早期跨任务行为策略预测。

详情
AI中文摘要

自适应系统通常需要从有限的证据中做出关于人的特定任务决策:导师可能需要预测学习者将如何解决新问题,游戏可能需要适应玩家进入新关卡,人机系统可能需要推断合作伙伴是会坚持计划还是切换目标。这些决策依赖于塑造人们如何解决相关任务的人级倾向,但这类倾向难以从标准行为证据中推断。一种方法是使用聚合结果摘要,如分数、完成率或生产率;这些摘要紧凑且跨任务可用,但可能将不同的行为过程压缩为相似的结果。另一种方法是使用过程级轨迹,记录行为如何展开;然而,单一任务内的过程建模可能将稳定的人级倾向与任务特定布局和可供性纠缠在一起。在本工作中,我们研究早期跨任务行为推断:部分源任务过程轨迹是否能揭示可迁移的人级结构,从而预测保留目标任务中的策略。我们引入过程级潜变量模型(PLVM),该模型编码任务特定轨迹并将其融合为共享的人级潜在表示以进行跨任务预测。在自然主义的人类游戏遥测数据集PowerWash Simulator中,PLVM使用来自两个清洁任务的部分轨迹,预测保留的消防站关卡中局部持久的区域规划者行为与频繁的区域跳跃者行为。具有已知潜在类型的受控模拟表明,当源任务揭示共享潜在过程的互补维度时,跨任务融合有所帮助。这些结果表明,当观察足够的目标任务行为不切实际时,过程级跨任务建模可以支持目标任务策略的早期预测。

英文摘要

Adaptive systems often need to make task-specific decisions about people from limited evidence: a tutor may need to anticipate how a learner will approach a new problem, a game may need to adapt when a player enters a new level, and a human-AI system may need to infer whether a partner will persist with a plan or switch goals. These decisions depend on person-level tendencies that shape how people solve related tasks, but such tendencies are difficult to infer from standard behavioral evidence. One approach is to use aggregate outcome summaries, such as scores, completion rates, or productivity; these summaries are compact and available across tasks, but can collapse distinct behavioral processes into similar outcomes. Another approach is to use process-level traces, which record how behavior unfolds; however, process modeling within one task can entangle stable person-level tendencies with task-specific layout and affordances. In this work, we study early cross-task behavioral inference: whether partial source-task process traces can reveal transferable person-level structure that predicts strategy in a held-out target task. We introduce a Process-Level Latent Variable Model (PLVM), which encodes task-specific traces and fuses them into a shared person-level latent representation for cross-task prediction. In PowerWash Simulator, a naturalistic telemetry dataset of human gameplay, PLVM uses partial traces from two cleaning tasks to predict locally persistent Zone Planner behavior versus frequent Zone Hopper behavior in the held-out Fire Station level. Controlled simulations with known latent types show that cross-task fusion helps when source tasks reveal complementary dimensions of a shared latent process. These results suggest that process-level cross-task modeling can support early prediction of target-task strategy when observing sufficient target-task behavior is impractical.

2605.30541 2026-06-01 cs.LG physics.geo-ph 版本更新

SubsurfaceGen: Procedural Generation of Field-Scale Earth Models and Seismic Data

SubsurfaceGen: 野外尺度地球模型与地震数据的程序化生成

Joseph Stitt, Pratik Rathore, Madeleine Udell, Ching-Yao Lai

发表机构 * Stanford University(斯坦福大学)

AI总结 提出SubsurfaceGen,一个GPU加速的3D速度模型与地震数据生成器,并发布包含4276个2D速度切片、5秒波场和8秒炮集记录的数据集,用于评估机器学习在全波形反演中的表现。

Comments 38 pages

详情
AI中文摘要

全波形反演(FWI)是地下成像的黄金标准,应用范围从碳封存到能源和矿产勘探再到地震灾害评估。机器学习方法进行FWI需要野外尺度、地质多样性和物理真实的训练数据,但现有资源如Marmousi、SEAM和OpenFWI在空间范围、时间范围、地质多样性和物理真实性方面存在不足。我们通过SubsurfaceGen(一个用于3D速度模型和地震数据的GPU加速生成器)来解决这些限制。与SubsurfaceGen一起,我们发布了一个配对数据集,包含来自42个真实、野外尺度的3D速度模型的4276个2D速度切片、5秒波场和8秒炮集记录,每个模型横向跨度10 km x 10 km,深度6.19 km,分辨率为10 m。该数据集涵盖六种地质环境——四种由SubsurfaceGen构建,两种来自先前来源——与碳封存和碳氢化合物勘探相关。我们使用该数据集评估神经算子进行波场预测和编码器-解码器进行端到端速度反演,并保留一种地质环境用于分布外测试。这些实验揭示了野外尺度的失败模式,并展示了SubsurfaceGen及相关数据集如何影响基于机器学习的FWI。

英文摘要

Full waveform inversion (FWI) is the gold standard for subsurface imaging, with applications from carbon sequestration to energy and mineral exploration to earthquake hazard assessment. Machine learning approaches to FWI need field-scale, geologically diverse, and physically realistic training data, but existing resources such as Marmousi, SEAM, and OpenFWI fall short on spatial extent, temporal extent, geological diversity, and physical realism. We address these limitations with SubsurfaceGen, a GPU-accelerated generator for 3D velocity models and seismic data. Along with SubsurfaceGen, we release a paired dataset of 4,276 2D velocity slices, 5 s wavefields, and 8 s shot gathers drawn from 42 realistic, field-scale 3D velocity models, each spanning 10 km x 10 km laterally and 6.19 km deep at 10 m resolution. The dataset spans six geological settings -- four built with SubsurfaceGen and two drawn from prior sources -- relevant for carbon sequestration and hydrocarbon exploration. We use this dataset to evaluate neural operators on wavefield prediction and encoder-decoders on end-to-end velocity inversion, holding out one geological setting for out-of-distribution testing. These experiments surface failure modes at field-scale and demonstrate how SubsurfaceGen and the associated dataset can impact ML-based FWI.

2605.30538 2026-06-01 cs.LG 版本更新

DisasterLex: An Expert Concept-to-Schema Knowledge Graph for Geospatial Reasoning in Disaster Analytics

DisasterLex:面向灾害分析中地理空间推理的专家概念到模式知识图谱

Yiming Xiao, Ankit Basu, Kai Yin, Sahil Vartak, Christian Swords, Ali Mostafavi

发表机构 * Texas A&M University(德克萨斯大学)

AI总结 提出DisasterLex框架,通过插入专家知识图谱(EKG)将用户查询与数据库模式桥接,在灾害分析场景中实现文本到SQL的准确转换,性能优于现有方法1.4-2.75倍。

详情
AI中文摘要

灾害不可避免且日益昂贵,有效响应依赖于查询结构化表格数据:支撑灾害管理的精确、信息密集的危害、暴露度、脆弱性和生命线基础设施记录。当前的文本到SQL方法允许自然语言访问此类表格,但迁移到灾害领域时效果不佳,因为查询跨越异构地理空间模式,并需要对因果关系进行推理。我们引入DisasterLex,一个知识图谱中介的框架,在用户查询和数据库之间插入一个包含精选概念和类型化因果边的专家知识图谱(EKG),并通过概念到表格链接与模式桥接。该编排运行四个阶段(识别查询实体、路由到操作域、在因果边上规划、以及生成SQL),在每个步骤限制传递给模型的模式。我们在一个灾害分析数据库(36个地理空间表,150列)上实例化,该数据库具有包含107个概念、117条因果边和52个概念到模式链接的EKG,并在75个查询的测试集上评估。在所有七个涵盖专有和开源权重系列的基础模型上,DisasterLex以1.65到3.56(满分5.0)的绝对分数,比四个最先进的基线(LightRAG、HippoRAG 2、ReFoRCE、CHESS)高出1.4到2.75倍。错误分析显示基线失败集中在路由和多表SQL组合上,这正是我们的编排明确解决的操作。代码、数据和EKG工件可在https://github.com/YimingXiao98/DisasterLex 和Zenodo https://doi.org/10.5281/zenodo.20388029 获取。

英文摘要

Disasters are inevitable and increasingly costly, and effective response depends on querying structured tabular data: precise, information-dense records of hazard, exposure, vulnerability, and lifeline infrastructure that underpin disaster management. Current text-to-SQL methods enable natural-language access to such tables but transfer poorly to the disaster domain, where queries span heterogeneous geospatial schemas and require reasoning over causal relations. We introduce DisasterLex, a knowledge-graph-mediated framework that inserts an Expert Knowledge Graph (EKG) of curated concepts and typed causal edges between the user query and the database, bridged to schema by concept-to-table links. The orchestration runs four stages (identifying query entities, routing to the operational domain, planning over causal edges, and grounding the SQL), restricting the schema passed to the model at each step. We instantiate it on a disaster-analytics database (36 geospatial tables, 150 columns) with an EKG of 107 concepts, 117 causal edges, and 52 concept-to-schema links, evaluated on a 75-query test set. On all seven base models spanning proprietary and open-weight families, DisasterLex beats four state-of-the-art baselines (LightRAG, HippoRAG 2, ReFoRCE, CHESS) by 1.4x to 2.75x, with absolute scores of 1.65 to 3.56 (of 5.0). Error analysis shows baseline failures cluster in routing and multi-table SQL composition, the operations our orchestration explicitly addresses. Code, data, and the EKG artifact are available at https://github.com/YimingXiao98/DisasterLex and on Zenodo at https://doi.org/10.5281/zenodo.20388029.

2605.30537 2026-06-01 cs.LG 版本更新

The Long-Term Effects of Data Selection in LLM Fine-Tuning

LLM微调中数据选择的长期影响

Yuxin Yang, Aoxiong Zeng, Xiangquan Yang

发表机构 * Shanghai University(上海大学) East China Normal University(华东师范大学)

AI总结 研究多阶段LLM微调中,短视数据选择策略(如基于当前效用)可能导致后续学习变慢、遗忘加剧和排名反转,提出长视距感知选择(LHAS)目标函数以缓解此问题。

Comments work in process

详情
AI中文摘要

数据选择越来越多地被用于降低大型语言模型(LLM)微调的成本,近期方法根据当前效用、多样性、质量或影响力对样本进行优先级排序。本文研究一个不同的问题:当微调在多个阶段进行时,当前看起来最优的选择策略是否会使模型后续适应性变差?我们引入LLM数据选择的长期视角,其中选择器不仅通过即时任务性能评估,还通过未来适应速度、遗忘、能力不平衡和分布外鲁棒性评估。我们在统一的多阶段协议下比较了代表性的随机、基于损失、基于梯度、基于多样性、基于质量和基于效用-多样性的选择家族。通过旨在实例化该协议的控制实验,我们展示了短期选择器如何表现出排名反转:它们改善了当前阶段,同时减慢了后续学习并增加了遗忘。我们将这种行为形式化为“短视选择”,提供了其可能发生的简单局部分析,并提出了一个诊断性的长视距感知选择(LHAS)目标函数,该函数在即时效用基础上增加了覆盖度、未来代理迁移和反集中项。该研究认为,数据选择应被评估为一种塑造模型学习轨迹的训练干预,而不仅仅是一种局部数据效率机制。

英文摘要

Data selection is increasingly used to reduce the cost of large language model (LLM) fine-tuning, with recent methods prioritizing samples by current utility, diversity, quality, or influence. This paper studies a different question: when fine-tuning occurs over multiple stages, can selection strategies that look optimal now make the model less adaptable later? We introduce a long-horizon view of LLM data selection in which a selector is evaluated not only by immediate task performance, but also by future adaptation speed, forgetting, capability imbalance, and out-of-distribution robustness. We compare representative random, loss-based, gradient-based, diversity-based, quality-based, and utility-diversity selection families under a unified multi-stage protocol. Through controlled experiments designed to instantiate this protocol, we show how short-term selectors can exhibit rank reversal: they improve the current stage while slowing subsequent learning and increasing forgetting. We formalize this behavior as \emph{myopic selection}, provide a simple local analysis of why it can occur, and propose a diagnostic Long-Horizon Aware Selection (LHAS) objective that augments immediate utility with coverage, future-proxy transfer, and anti-concentration terms. The study argues that data selection should be evaluated as a training intervention that shapes the model's learning trajectory, rather than only as a local data-efficiency mechanism.

2605.30532 2026-06-01 stat.CO cs.LG stat.ML 版本更新

True Self-Avoiding Walk for Accelerating Markov-Chain Monte Carlo Integration

真实自回避行走用于加速马尔可夫链蒙特卡洛积分

Qinghua Ding, Venkat Anantharam

发表机构 * Department of Electrical Engineering and Computer Sciences University of California at Berkeley(加州大学伯克利分校电子工程与计算机科学系)

AI总结 本文提出使用真实自回避行走(TSAW)改进马尔可夫链蒙特卡洛(MCMC)积分估计,通过惩罚过度访问的转移概率,使得经验积分误差达到几乎必然的$O(\sqrt{\log t}/t)$量级,显著优于标准随机游走的$t^{-1/2}$误差。

详情
AI中文摘要

我们研究真实自回避行走(TSAW)作为一种通过马尔可夫链蒙特卡洛(MCMC)改进经验积分估计的机制。我们考虑与有限集上不可约马尔可夫核$P$(具有平稳分布$π$)相关的有限状态自适应采样动力学,其中转移概率根据经验过度使用而受到惩罚。我们的主要结果是,由此产生的基于TSAW的行走的经验占用计数$L_t(i)$和转移计数$N_t(i,j)$满足\[ L_t(i)-tπ_i = O(\sqrt{\log t}) \quad\text{和}\quad N_t(i,j)-tπ_iP_{ij}=O(\sqrt{\log t}) \qquad\text{几乎必然} \]对于每个状态$i$和每个满足$P_{ij}>0$的边$(i,j)$。因此,对于每个有界函数$f:V\to\mathbb R$,我们的积分估计器的误差收敛为\[ \left|\frac1t\sum_{s=0}^{t-1} f(X_s)-\sum_{i\in V}π_i f(i)\right| = O\left(\frac{\sqrt{\log t}}{t}\right) \qquad\text{几乎必然}. \]这些结果表明,与标准随机游走方法下经验平均的通常$t^{-1/2}$误差标度相比,基于TSAW的估计器产生的经验积分误差几乎必然为$O(\sqrt{\log t}/t)$量级,从而实现了对样本量$t$的显著更尖锐的依赖性。

英文摘要

We study true self-avoiding walk (TSAW) as a mechanism for improving empirical integral estimation via Markov chain Monte Carlo (MCMC). We consider finite-state adaptive sampling dynamics associated with an irreducible Markov kernel $P$ on a finite set, with stationary distribution $π$, in which the transition probabilities are penalized according to empirical overuse. Our main result is that the empirical occupation counts $L_t(i)$ and transition counts $N_t(i,j)$ of the resulting TSAW-based walk satisfy \[ L_t(i)-tπ_i = O(\sqrt{\log t}) \quad\text{and}\quad N_t(i,j)-tπ_iP_{ij}=O(\sqrt{\log t}) \qquad\text{almost surely} \] for every state $i$ and every edge $(i,j)$ with $P_{ij}>0$. Consequently, for every bounded function $f:V\to\mathbb R$, the error of our integral estimator converges as \[ \left|\frac1t\sum_{s=0}^{t-1} f(X_s)-\sum_{i\in V}π_i f(i)\right| = O\left(\frac{\sqrt{\log t}}{t}\right) \qquad\text{almost surely}. \] These results show that, in contrast with the usual $t^{-1/2}$ error scaling for empirical averages under standard random-walk-based methods, the TSAW-based estimator yields empirical integral errors of order $O(\sqrt{\log t}/t)$ almost surely, thereby achieving a substantially sharper dependence on the sample size $t$.

2605.30529 2026-06-01 cs.CL cs.AI cs.LG 版本更新

Generalistic or Specific Embeddings, Which is Better? An Empirical Study on Search for Clinical Coding in Non-English Languages

通用嵌入还是特定嵌入,哪个更好?非英语语言临床编码搜索的实证研究

David Rey-Blanco, Roberto Cruz

发表机构 * TietAI

AI总结 本研究使用大型生成语言模型生成的合成数据微调双编码器,并构建两阶段检索器(双编码器后接交叉编码器重排序器),解决了非英语语言临床编码检索中召回率下降的问题,并在多语言基准上取得了优于BioBERT-ST的性能。

Comments 24 pages, 12 figures, 6 tables

详情
AI中文摘要

用于语义搜索的句子嵌入模型绝大多数是在英语语料库上开发和评估的。当应用于其他语言的临床检索——特别是ICD-10-CM/CIE-10代码的检索——召回率会下降,而这种下降往往被聚合基准所掩盖。我们研究大型生成语言模型是否可以作为数据工厂来缩小这一差距。我们构建了一个两阶段检索器(双编码器后接交叉编码器重排序器),该检索器在Gemini生成的合成数据(涵盖英语、西班牙语、加泰罗尼亚语、意大利语、葡萄牙语和法语)上对西班牙生物医学编码器(PlanTL-GOB-ES/bsc-bio-ehr-es)进行微调,并与BioBERT-ST和未调优的西班牙编码器进行评估。仅双编码器在MRR(0.876 vs. 0.866)上匹配BioBERT-ST,并在R@3(0.650 vs. 0.626)和R@5(0.804 vs. 0.790)上超越它,且无需英语生物医学预训练。添加交叉编码器重排序器将聚合R@5提升至0.822,并在五种语言中的四种上占据主导地位(西班牙语+0.017,加泰罗尼亚语+0.033,法语+0.018,葡萄牙语+0.037),但以英语的小幅回归为代价。这种权衡在临床上是可接受的:葡萄牙语的R@5达到0.829,而BioBERT-ST为0.714。贡献:一个基于LLM生成数据构建领域特定医学检索器的开放配方;学习增益的量化(MRR从0.755到0.876,+15.9%,使用约19,500个合成对);以及按语言和排名对增益集中区域的刻画。

英文摘要

Sentence-embedding models for semantic search are overwhelmingly developed and evaluated on English corpora. When applied to clinical retrieval in other languages -- particularly retrieval of ICD-10-CM / CIE-10 codes -- recall degrades in ways often masked by aggregate benchmarks. We study whether large generative language models can serve as data factories to close this gap. We build a two-stage retriever (bi-encoder followed by cross-encoder reranker), fine-tuned from a Spanish biomedical encoder (PlanTL-GOB-ES/bsc-bio-ehr-es) on Gemini-generated synthetic data covering English, Spanish, Catalan, Italian, Portuguese and French, and evaluate against BioBERT-ST and the un-tuned Spanish encoder. The bi-encoder alone matches BioBERT-ST on MRR (0.876 vs. 0.866) and overtakes it on R@3 (0.650 vs. 0.626) and R@5 (0.804 vs. 0.790) without English biomedical pretraining. Adding a cross-encoder reranker lifts aggregate R@5 to 0.822 and dominates on four of five languages (+0.017 Spanish, +0.033 Catalan, +0.018 French, +0.037 Portuguese) at the cost of a small English regression. The trade-off is clinically acceptable: Portuguese reaches R@5 = 0.829 vs. BioBERT-ST's 0.714. Contributions: an open recipe for building domain-specific medical retrievers from LLM-generated data; quantification of the learning gain (MRR 0.755 to 0.876, +15.9% with ~19,500 synthetic pairs); and a characterisation of where gains concentrate by language and rank.
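The retrieval metrics quoted above (MRR, R@3, R@5) can be computed as in the following sketch. The abstract does not spell out its exact metric definitions, so this assumes the common ones: reciprocal rank of the first relevant code, and the fraction of a query's relevant codes found in the top k. The helper names and the toy rankings are hypothetical; the code strings are used only as example labels.

```python
def reciprocal_rank(ranked, relevant):
    """1 / position of the first relevant item, or 0.0 if none is retrieved."""
    for pos, code in enumerate(ranked, start=1):
        if code in relevant:
            return 1.0 / pos
    return 0.0


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def evaluate(queries, k):
    """Average MRR and Recall@k over (ranked_list, relevant_set) pairs."""
    n = len(queries)
    mrr = sum(reciprocal_rank(r, rel) for r, rel in queries) / n
    rec = sum(recall_at_k(r, rel, k) for r, rel in queries) / n
    return mrr, rec


# Toy rankings (assumed data, not from the paper).
queries = [
    (["I10", "E11.9", "J45.909"], {"E11.9"}),    # relevant code at rank 2
    (["J45.909", "I10", "E11.9"], {"J45.909"}),  # relevant code at rank 1
]
mrr, r_at_1 = evaluate(queries, k=1)
_, r_at_3 = evaluate(queries, k=3)
print(mrr, r_at_1, r_at_3)
```

Reporting the metrics per language, as the abstract does, would amount to calling `evaluate` on each language's subset of queries separately.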

2605.30526 2026-06-01 cs.LG cs.CL 版本更新

Measuring, Localizing, and Ablating Alignment Signatures in LLMs

测量、定位和消融LLMs中的对齐特征

Aniket Anand, Janvijay Singh, Zhewei Sun, Dilek Hakkani-Tür, Nick Feamster

发表机构 * University of Chicago(芝加哥大学) University of Illinois at Urbana-Champaign(伊利诺伊大学厄巴纳-香槟分校) Toyota Technological Institute at Chicago(芝加哥丰田技术研究所)

AI总结 研究通过对比人类文本、基模型和对齐模型生成,发现对齐训练引入AI风格特征,并提出PASTA方法通过消融对齐方向来降低AI检测率。

详情
AI中文摘要

对齐语言模型通常表现出可识别的AI风格,但其与后训练和内部表示的联系尚不清楚。本文研究后训练是否引入或放大了AI风格规律,以及这些规律是否具有局部内部特征。为此,我们在匹配的人类源前缀下比较人类文本、基模型生成和对齐模型生成。对齐生成显示出比基生成更低的人类语料库亲和力和更高的AI检测率,表明后训练使生成文本偏离人类语料库风格,转向检测器可见的AI风格文本。然后我们引入PASTA(后训练对齐特征目标消融),一种无需训练的方法,通过对齐-基残差对比估计后训练对齐特征,并在解码过程中消融相应方向。在11个对齐模型和6个AI检测器上,PASTA降低了对大多数对齐模型的检测率;该效果在检测器间良好迁移,且不被随机方向复现。定性分析表明,PASTA生成保持相关性和连贯性,同时表现出更大的风格变化。这些结果共同表明,后训练的AI风格效果可以通过激活消融进行测量、定位和因果测试。

英文摘要

Aligned language models often exhibit a recognizable AI-like style, yet its connection to post-training and internal representations remains poorly understood. In this work, we study whether post-training introduces or amplifies AI-like stylistic regularities and whether these regularities have a localized internal signature. To this end, we compare human text, base-model generations, and aligned-model generations under matched human-source prefixes. Aligned generations show lower human-corpus affinity and higher AI-detection rates than base generations, suggesting that post-training shifts generated text away from human-corpus style and toward detector-visible AI-like text. We then introduce PASTA (Post-training Alignment Signature Targeted Ablation), a training-free method that estimates a post-training alignment signature from aligned-base residual contrasts and ablates the corresponding direction during decoding. Across 11 aligned models and 6 AI detectors, PASTA lowers the detection rate for most aligned models; this effect transfers well across detectors and is not reproduced by random directions. Qualitative analysis suggests that PASTA generations remain relevant and coherent while exhibiting greater stylistic variation. Together, these results show that AI-like stylistic effects of post-training can be measured, localized, and causally tested through activation ablation.
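The core operation behind directional ablation can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the PASTA implementation: the direction is taken as the normalized difference between mean aligned-model and mean base-model activations (one plausible reading of "aligned-base residual contrasts"), and ablation removes the component of a hidden vector along that direction. All names and the toy vectors are hypothetical.

```python
import math


def mean_vector(rows):
    """Coordinate-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def unit_direction(aligned_acts, base_acts):
    """Normalized difference of mean activations (assumed estimator)."""
    diff = [a - b for a, b in zip(mean_vector(aligned_acts), mean_vector(base_acts))]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0.0:
        raise ValueError("aligned and base means coincide; no direction to ablate")
    return [x / norm for x in diff]


def ablate(hidden, direction):
    """Project out `direction` from `hidden`: h - (h . d) d, with d a unit vector."""
    coeff = sum(h * d for h, d in zip(hidden, direction))
    return [h - coeff * d for h, d in zip(hidden, direction)]


# Toy activations (assumed data): the two models differ only along axis 0.
aligned = [[2.0, 1.0, 0.0], [4.0, 1.0, 0.0]]
base = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
d = unit_direction(aligned, base)
h_ablated = ablate([5.0, 2.0, -1.0], d)
print(d, h_ablated)
```

After ablation the hidden vector has no component along `d`, while the orthogonal coordinates are left unchanged. Comparing against a random unit vector in place of `d` would mirror the random-direction control mentioned in the abstract.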

2605.30524 2026-06-01 cs.LG 版本更新

Representation Collapse in Sequential Post-Training of Large Language Models

大型语言模型顺序后训练中的表示坍缩

Yichen Liu, Mingyu Chen, Hao Wang, Xiaoran Xu, Chenxi Lin, Rui Zhang, Yutong Zhou, Yuxin Yang, Jiarui Wu, Wei Sun

发表机构 * Hangzhou Dianzi University(杭州电子科技大学) Zhejiang Gongshang University(浙江工商大学) Ningbo University(宁波大学) Shanghai University(上海大学)

AI总结 研究大型语言模型在顺序后训练阶段中内部表示逐渐压缩为低秩、各向异性且同质的特征空间,并提出轻量级干预措施以保持未来可学习性。

Comments work in progress

详情
AI中文摘要

大型语言模型现在通过一系列后训练阶段进行适配,而不是通过单次指令微调。本文研究这种顺序后训练是否逐渐将内部表示压缩为低秩、各向异性且同质的特征空间。我们定义了一套针对隐藏状态、logits、token轨迹和LoRA更新的测量方法,并利用它来分析在受控阶段顺序下的监督微调、偏好优化、安全/拒绝调优、数学和代码专业化以及长思维链调优。中心假设是,过度的表示集中不仅仅是几何上的奇特性:它预示着后期适配中可塑性降低、域外泛化能力减弱以及校准效果变差。我们进一步评估了轻量级干预措施,包括混合域重放、特征刷新、表示多样性正则化和LoRA更新去相关,作为在不放弃后训练行为收益的情况下保持未来可学习性的方法。

英文摘要

Large language models are now adapted through chains of post-training stages rather than through a single instruction-tuning pass. This paper studies whether such sequential post-training gradually compresses internal representations into low-rank, anisotropic, and homogeneous feature spaces. We define a measurement suite for hidden states, logits, token trajectories, and LoRA updates, and we use it to analyze supervised fine-tuning, preference optimization, safety/refusal tuning, math and code specialization, and long chain-of-thought tuning under controlled stage orderings. The central hypothesis is that excessive representation concentration is not merely a geometric curiosity: it predicts reduced plasticity during later adaptation, weaker out-of-domain generalization, and poorer calibration. We further evaluate lightweight interventions, including mixed-domain replay, feature refresh, representation diversity regularization, and LoRA update decorrelation, as ways to preserve future learnability without giving up the behavioral gains of post-training.

2605.30523 2026-06-01 cs.LG cs.AI cs.CC cs.CL cs.FL 版本更新

Revisiting Padded Transformer Expressivity: Which Architectural Choices Matter and Which Don't

重新审视填充Transformer的表达能力:哪些架构选择重要,哪些不重要

Anej Svete, William Merrill, Ryan Cotterell, Ashish Sabharwal

发表机构 * ETH Zürich(苏黎世联邦理工学院) Allen Institute for AI(艾伦人工智能研究所)

AI总结 本文通过连接布尔电路,系统研究了填充Transformer的表达能力,发现数值精度和模型深度是影响表达能力的主要因素,而注意力类型、模型宽度和均匀性等架构选择对表达能力影响不大。

详情
AI中文摘要

近期工作通过连接布尔电路描述了Transformer能计算和不能计算的内容,但现有结果缺乏精确刻画,且对建模选择敏感。填充Transformer——在其输入后附加填充符号如“...”——通过为自适应并行计算提供多项式空间,成为建立与电路类等价关系的有用工具。然而,目前仅研究了有限的填充Transformer理想化模型,这些等价关系在注意力类型、模型宽度和均匀性变化下的稳健性仍待探索。我们发现,在实际假设下,填充Transformer对所有这些变化都出奇地稳健,并确定数值精度和模型深度是影响表达能力的主要因素。具体地,我们证明多项式填充的L-均匀常数精度Transformer等价于L-均匀AC⁰,而增长精度的Transformer达到L-均匀TC⁰,与宽度无关。此外,循环机制允许类似电路的顺序处理:log^d N次循环的常数精度Transformer达到FO-均匀AC^d,增长精度的达到FO-均匀TC^d。有趣的是,宽度或精度超过对数增长并不会增加表达能力,且我们所有结果对softmax和平均硬注意力Transformer均成立。

英文摘要

Recent work describes what transformers can and cannot compute through connections to boolean circuits, but existing results lack exact characterizations and are sensitive to modeling choices. Padded transformers -- to whose input filler symbols such as "..." are appended -- emerge as a useful gadget for establishing equivalences to circuit classes by providing polynomial space for adaptive parallel computation. However, only a limited set of padded transformer idealizations has been studied, leaving open how robustly these equivalences hold under changes to attention type, model width, and uniformity. We find that, under practical assumptions, padded transformers are surprisingly robust to all of these, and identify numeric precision and model depth as the main factors affecting expressivity. Concretely, we prove that polynomially padded $\text{L-uniform}$ constant-precision transformers are equivalent to $\text{L-uniform AC}^0$, while growing-precision ones achieve $\text{L-uniform TC}^0$ regardless of width. Furthermore, looping enables sequential processing analogous to circuits: $\log^d N$-looped constant-precision transformers reach $\text{FO-uniform AC}^d$, and growing-precision ones reach $\text{FO-uniform TC}^d$. Interestingly, growing width or precision beyond logarithmic does not increase expressivity, and all our results hold for both softmax and average hard attention transformers.

2605.30514 2026-06-01 cs.LG cs.CL 版本更新

MAAT: Multi-phase Adapter-Aware Targeted Unlearning

MAAT: 多阶段适配器感知的定向遗忘学习

Suryash Yagnik, Shubham Gaur, Saksham Thakur, Vinija Jain, Aman Chadha, Amitava Das

发表机构 * Indian Institute of Information Technology, Bhopal, India(印度信息技术学院博帕尔分校) University of California, Santa Cruz, USA(加州大学圣克鲁兹分校) Independent Researcher(独立研究者) Stanford University, USA(斯坦福大学) BITS Pilani Goa, India(比尔拉理工学院皮拉尼分校果阿校区)

AI总结 针对现有机器遗忘评估中因果知识(Why类)样本极少导致评估失衡的问题,提出5WBENCH平衡基准和MAAT多阶段框架,首次在Why类知识上同时实现高遗忘与高保留。

Comments 16 pages, 4 figures, 10 tables

详情
AI中文摘要

机器遗忘评估在结构上存在偏差:Why类问题(探究因果和关系知识)在CounterFact中占比不足0.06%,在ZSRE中占0.6%,在TOFU、MUSE和WMDP-Cyber中占不到1.3%。这种近乎为零的占比意味着,在因果知识上失败的方法可以在整体上获得高分,而这种失败在没有平衡评估的情况下是无法检测的。我们提出了5WBENCH,一个平衡的5000样本基准,每个5W类别(谁、什么、何时、何地、为什么)包含1000个样本,首次使因果遗忘失败变得可量化。使用5WBENCH,我们表明现有基线方法无法在Why类问题上同时实现高遗忘和高保留:激进的遗忘会降低保留知识,而保守的方法则无法遗忘因果事实。Why类问题的难度源于多跳推理链(Why条目占44%,其他类别≤2%)以及在40.1个token的答案跨度上的梯度稀释。我们提出了MAAT(多阶段适配器感知的定向遗忘学习),一个三阶段框架,作用于LoRA适配器权重,结合梯度投影上升、SVD秩维度剪枝、任务向量否定以及混合KL-隐藏状态保留修复。MAAT是第一个在Why类因果知识上同时实现高遗忘和高保留的方法,在遗忘-保留帕累托前沿上达到了新的操作点。我们公开了代码。

英文摘要

Machine unlearning evaluation is structurally skewed: Why-type questions, which probe causal and relational knowledge, comprise less than 0.06% of CounterFact, 0.6% of ZSRE, and less than 1.3% of TOFU, MUSE, and WMDP-Cyber. This near-zero representation means that methods that fail on causal knowledge can score highly in aggregate, and this failure is undetectable without balanced evaluation. We present 5WBENCH, a balanced 5,000-sample benchmark with 1,000 examples per 5W category (Who, What, When, Where, Why), making causal unlearning failures quantifiable for the first time. Using 5WBENCH, we show that no existing baseline simultaneously achieves high forgetting and high retention on Why-type questions: aggressive forgetting degrades retained knowledge, while conservative methods fail to forget causal facts. Why-type difficulty stems from multi-hop reasoning chains (44% of Why entries vs. less than or equal to 2% for others) and gradient dilution over 40.1-token answer spans. We present MAAT (Multi-phase Adapter-Aware Targeted Unlearning), a three-phase framework operating on LoRA adapter weights, combining gradient-projected ascent, SVD rank-dimension pruning, task vector negation, and hybrid KL-hidden-state retain repair. MAAT is the first method to simultaneously achieve high forgetting and high retention on Why-type causal knowledge, reaching a new operating point on the forget-retain Pareto frontier. We make our code publicly available.

2605.30509 2026-06-01 stat.ML cs.AI cs.LG 版本更新

Improved Distribution Estimation in $\ell_\infty$

在 $\ell_\infty$ 下的改进分布估计

Doron Cohen, Aryeh Kontorovich, Yonatan Livshitz

发表机构 * Department of Computer Science, Ben-Gurion University of the Negev(本·古里安大学计算机科学系)

AI总结 本文在 $\ell_\infty$ 范数下改进了离散概率分布的估计,给出了期望极小极大界和高概率尾界,解决了 Kontorovich 和 Painsky (JMLR, 2025) 提出的开放问题,包括最紧风险界的完全经验版本和最坏情况极值分布的形式,并报告了鼓励性的实证结果。

Comments 24 pages, 3 figures

详情
AI中文摘要

我们提出了在 $\ell_\infty$ 范数下估计离散概率分布的改进界。这些包括期望极小极大界和高概率尾界。我们解决了 Kontorovich 和 Painsky (JMLR, 2025) 提出的一些开放问题——包括他们提出的最紧风险界的完全经验版本以及识别最坏情况极值分布的形式。还报告了鼓励性的实证结果。

英文摘要

We present improved bounds for estimating discrete probability distributions under the $\ell_\infty$ norm. These include minimax bounds in expectation and high-probability tail bounds. We resolve some of the open questions posed in Kontorovich and Painsky (JMLR, 2025) -- including a fully empirical version of the tightest risk bound they presented and identifying the form of the worst-case extremal distribution. Encouraging empirical results are reported as well.
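The quantity being bounded can be made concrete with a small simulation. The sketch below assumes the plain empirical-frequency estimator and measures its $\ell_\infty$ error against a known distribution; the abstract does not state which estimator its bounds apply to, so this choice, the toy distribution `p`, the sample size, and the seed are all assumptions.

```python
import random
from collections import Counter


def empirical_distribution(samples, support_size):
    """Relative frequency of each symbol 0..support_size-1 in the sample."""
    counts = Counter(samples)
    n = len(samples)
    return [counts.get(i, 0) / n for i in range(support_size)]


def linf_error(p, q):
    """Largest absolute difference between two probability vectors."""
    return max(abs(a - b) for a, b in zip(p, q))


rng = random.Random(0)            # fixed seed so the run is repeatable
p = [0.5, 0.25, 0.15, 0.1]        # toy ground-truth distribution
n = 10000
samples = rng.choices(range(len(p)), weights=p, k=n)
p_hat = empirical_distribution(samples, len(p))
err = linf_error(p, p_hat)
print("l_inf error:", err)
```

Unlike total variation, the $\ell_\infty$ error is driven by the single worst-estimated coordinate, which is why the shape of the worst-case distribution matters for the bounds discussed above.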

2605.30486 2026-06-01 cs.LG cs.AI 版本更新

Graph-Conditioned Mixture of Graph Neural Network Experts for Traffic Forecasting

图条件化的图神经网络专家混合模型用于交通预测

Amirhossein Ghaffari, Saeid Sheikhi, Ekaterina Gilman

发表机构 * Future Computing Group, University of Oulu(奥卢大学未来计算组)

AI总结 提出GC-MoE框架,通过图拓扑和近期交通输入为每个节点分配个性化专家组合,仅训练轻量路由模块,在四个基准上改善MAE。

Comments An accepted paper at the 27th IEEE International Conference on Mobile Data Management (MDM 2026)

详情
AI中文摘要

传感器图上的时空预测通常采用统一应用于所有节点的单一骨干架构,尽管图区域可能表现出不同的动态。道路段在功能类别、结构和交通行为上存在差异,表明节点级专家专业化可能是有用的。我们提出GC-MoE,一种图条件化的专家混合框架,基于图拓扑和近期交通输入窗口为每个节点分配个性化的冻结预测专家组合。GC-MoE将冻结的预训练时空GNN专家与输入感知、空间上下文化的路由器相结合,同时仅训练轻量级路由模块。我们还研究了一个有界图条件化输出精炼层作为可选扩展,并仅作为消融诊断包含节点自适应ST-LoRA适配器。在四个标准基准(PEMS04、PEMS07、METR-LA和PEMS-BAY)上,GC-MoE在零参数集成基线上改善了MAE,具有竞争力的RMSE和MAPE,同时在1.5M冻结专家权重之上仅训练约17K参数。实现代码见https://github.com/Ahghaffari/gc_moe。

英文摘要

Spatio-temporal forecasting on sensor graphs is commonly tackled with a single backbone architecture applied uniformly across all nodes, although graph regions can exhibit different dynamics. Road segments differ in functional class, structure, and traffic behavior, suggesting that node-wise expert specialization can be useful. We propose GC-MoE, a graph-conditioned mixture of experts framework that assigns each node a personalized combination of frozen forecasting experts based on graph topology and the recent traffic input window. GC-MoE combines frozen pretrained spatio-temporal GNN experts with an input-aware, spatially contextualized router while training only a lightweight routing module. We also study a bounded graph-conditioned output refinement layer as an optional extension and include node-adaptive ST-LoRA adapters only as an ablation diagnostic. Across four standard benchmarks (PEMS04, PEMS07, METR-LA, and PEMS-BAY), GC-MoE improves MAE over a zero-parameter ensemble baseline, with competitive RMSE and MAPE, while training only ~17K parameters on top of 1.5M frozen expert weights. The implementation is available at https://github.com/Ahghaffari/gc_moe.

2605.30482 2026-06-01 cs.LG 版本更新

Discovering a Zeta Map Algorithm on Dyck Paths via Mechanistic Interpretability

通过机制可解释性发现 Dyck 路径上的 Zeta 映射算法

Xiaoyu Huang, Blake Jackson, Kyu-Hwan Lee

发表机构 * Department of Mathematics, Temple University, Philadelphia, PA, USA(天普大学数学系) Institute for Computer-Aided Reasoning in Mathematics, Carnegie Mellon University, Pittsburgh, PA, USA(计算机辅助数学推理研究所,卡内基梅隆大学) Department of Mathematics, University of Connecticut, Storrs, CT, USA(康乃狄克大学数学系) Korea Institute for Advanced Study, Seoul 02455, Republic of Korea(韩国高等研究院)

AI总结 本文通过训练一个小型编码器-解码器 Transformer 模型来学习 Dyck 路径上的 zeta 映射,并利用机制可解释性工具分析其计算过程,从而发现并证明了一种新的显式组合算法——脚手架映射。

详情
AI中文摘要

机器学习越来越多地用于数学发现,但在数学中,期望的输出通常不是预测本身,而是一个可以独立验证的显式构造。我们通过 Dyck 路径上的 zeta 映射(q,t-卡特兰数组合学中的一个经典双射)来研究这一设定。我们在该映射上训练了一个特意设计的小型单层单头编码器-解码器 Transformer,并使用机制可解释性工具(包括解码器交叉注意力分析、线性探测和因果干预)分析其学习到的计算过程。分析揭示了一种基于层级的机制:编码器表示使路径层级线性可访问,而解码器以结构化方式选择和遍历输入位置。将这些信号转化为组合学,得到了脚手架映射,这是一种针对 Dyck 路径的显式以峰为中心的遍历算法。我们证明该算法与 zeta 映射一致,只是标签的逆转约定有所不同。这提供了一个受控的 AI 辅助数学发现示例,其中机制可解释性将模型行为转化为精确、人类可验证的组合算法。

英文摘要

Machine learning is increasingly used in mathematical discovery, but in mathematics the desired output is often not a prediction itself, but an explicit construction that can be checked independently. We study this setting through the zeta map on Dyck paths, a classical bijection in the combinatorics of the q,t-Catalan numbers. We train a deliberately small one-layer, one-head encoder-decoder transformer on this map and analyze its learned computation using mechanistic interpretability tools, including decoder cross-attention analysis, linear probing, and causal intervention. The analysis reveals a level-based mechanism: encoder representations make path levels linearly accessible, while the decoder selects and traverses input positions in a structured way. Translating these signals into combinatorics leads to the scaffolding map, an explicit peak-centered traversal algorithm for Dyck paths. We prove that this algorithm agrees with the zeta map, modulo a reversal convention in the labeling. This gives a controlled example of AI-assisted mathematical discovery in which mechanistic interpretability turns model behavior into a precise, human-verifiable combinatorial algorithm.

2605.30479 2026-06-01 cs.LG 版本更新

Universal Multiclass Transductive Online Learning

通用多类别转导在线学习

Steve Hanneke, Hongao Wang

发表机构 * Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA(计算机科学系,普渡大学,西拉法叶,印第安纳州,47907,美国)

AI总结 研究具有可能无界标签空间的通用转导在线分类问题,通过引入“Level-Constrained-Littlestone-Littlestone (LCLL)树”和冷漠性质来刻画可学习性,并证明可学习类的最优错误率要么有界要么对数增长。

详情
AI中文摘要

我们考虑具有可能无界标签空间的通用转导在线分类问题。该设置考虑在线学习,其中实例序列(无标签)预先已知给学习器。我们说一个概念类$\mathcal{H}$是可学习的,如果存在一个学习算法$\mathcal{A}$,使得对于每个可实现序列,$\mathcal{A}$犯的错误数量最多随预测次数次线性增长。我们刻画了该设置的可学习性,并表明对于可学习类,只有两种可能的最优速率:有界或对数增长。我们引入了一种新的组合结构,称为“Level-Constrained-Littlestone-Littlestone (LCLL)树”,它与冷漠性质一起刻画了可学习性。我们还将可学习性结果扩展到不可知情况以及仅已知生成实例序列的随机过程的情况。

英文摘要

We consider the problem of universal transductive online classification with a possibly unbounded label space. This setting considers online learning, with the sequence of instances (without labels) known to the learner in advance. We say a concept class $\mathcal{H}$ is learnable if there is a learning algorithm $\mathcal{A}$, such that for every realizable sequence, the number of mistakes made by $\mathcal{A}$ grows at most sublinearly with the number of predictions. We characterize the learnability of this setting and show that there are only two possible optimal rates for the learnable classes: either bounded or increasing logarithmically. We introduce a new combinatorial structure, called "Level-Constrained-Littlestone-Littlestone (LCLL) tree", which, along with the indifference property, characterizes the learnability. We also extend the learnability result to the agnostic case and the case where only the stochastic process that generates the instance sequence is known.

2605.30476 2026-06-01 cs.IT cs.CR cs.LG math.IT 版本更新

Local Differential Privacy with Correlated Noise Achieves Central-DP Optimal Cost

具有相关噪声的本地差分隐私实现中心DP最优成本

Madhura Pathegama, Srikanth Avasarala, Viveck R. Cadambe, Juba Ziani

发表机构 * Georgia Institute of Technology(佐治亚理工学院)

AI总结 通过设计本地噪声之间的相关性,构造ε-差分隐私机制,使得估计成本与中心化设置下的最优成本匹配,差距可任意小。

详情
AI中文摘要

我们研究在存在诚实但好奇的服务器的情况下,私有地估计n个用户持有的值的总和。这要求不仅在数据发布时,而且在服务器端计算过程中也要保证隐私。因此,我们采用本地(纯)差分隐私模型,其中每个用户传输一个噪声扰动值。众所周知,与中心化模型(仅在聚合后添加噪声)相比,独立的本地噪声通常会导致显著的效用损失。我们表明,这种差距并非根本性的。通过精心设计本地添加的噪声变量之间的相关性,我们构造了ε-DP机制,其估计成本与中心化设置下可实现的最优成本匹配,误差可任意小。

英文摘要

We study privately estimating the sum of $n$ user-held values in the presence of an honest-but-curious server. This motivates requiring privacy not only at data release but also throughout server-side computation. We therefore adopt the local (pure) differential privacy model, in which each user transmits a noise-perturbed value. It is well known that independent local noise typically incurs a substantial utility loss compared to the centralized model, where noise is added only after aggregation. We show that this gap is not fundamental. By carefully designing correlations among the locally added noise variables, we construct $\varepsilon$-DP mechanisms whose estimation cost matches the optimal cost achievable in the centralized setting, up to an arbitrarily small error.

2605.30470 2026-06-01 cs.LG 版本更新

Can Subgraph Explanations Be Weaponized to Steal Graph Neural Networks?

子图解释能否被武器化以窃取图神经网络?

Ojas Nimase, Jiate Li, Yue Zhao, Yushun Dong

发表机构 * University of Southern California(南加州大学) Florida State University(佛罗里达州立大学)

AI总结 本文提出首个针对图分类的黑盒模型提取攻击,利用模型解释输出引导蒙特卡洛边敏感性估计,并利用解释子图缩小边界搜索空间,实验表明该方法优于现有基线。

Comments 28 pages, 8 figures, 10 tables. Under review at NeurIPS 2026

详情
AI中文摘要

图机器学习即服务(GMLaaS)平台越来越多地实现可解释性接口以满足监管透明度要求。然而,这种透明度为模型提取攻击创造了可利用的漏洞。我们提出了首个针对图分类的模型提取攻击,该攻击在严格的黑盒约束下进行,攻击者仅观察到离散类标签和二进制解释掩码(无概率分数、梯度或置信度值)。我们的方法(1)利用模型解释输出引导蒙特卡洛边敏感性估计朝向决策边界,并具有Hoeffding集中保证估计精度;(2)利用解释子图有效缩小边界搜索空间。在多个领域的基准图数据集上的大量实验表明,我们的方法优于可比基线。这些发现表明,此类可解释性接口创造了可利用的攻击面,为可解释AI指令的防御机制和政策框架提供了信息。实现代码见https://github.com/LabRAI/XSTEAL/。

英文摘要

Graph Machine Learning as a Service (GMLaaS) platforms increasingly implement explainability interfaces to meet regulatory transparency requirements. However, this transparency creates exploitable vulnerabilities for model extraction attacks. We present the first model extraction attack specifically designed for graph classification under strict black-box constraints where the attacker observes only discrete class labels and binary explanation masks (no probability scores, gradients, or confidence values). Our method (1) uses model explanation outputs to guide Monte Carlo edge sensitivity estimation toward decision boundaries, with Hoeffding concentration guarantees on estimation accuracy and (2) exploits explanation subgraphs to efficiently narrow the boundary search space. Extensive experiments on benchmark graph datasets across multiple domains demonstrate our method's superiority over comparable baselines. These findings demonstrate that such explainability interfaces create exploitable attack surfaces, informing both defensive mechanisms and policy frameworks for explainable AI mandates. The implementation code is provided in https://github.com/LabRAI/XSTEAL/.
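The Hoeffding guarantee mentioned above translates directly into a query budget. As a rough sketch, assume each Monte Carlo trial for an edge yields a value in [0, 1] (for example, whether the returned label flipped), so that Hoeffding's inequality gives P(|estimate - truth| >= eps) <= 2 exp(-2 n eps^2). How the paper actually defines edge sensitivity is not stated in the abstract; the function names and the numbers below are illustrative assumptions.

```python
import math


def hoeffding_samples(eps, delta):
    """Smallest n with 2 * exp(-2 * n * eps**2) <= delta for [0, 1]-valued trials."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


def hoeffding_radius(n, delta):
    """Half-width of the two-sided (1 - delta) confidence interval after n trials."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


n_needed = hoeffding_samples(eps=0.1, delta=0.05)
radius = hoeffding_radius(n_needed, delta=0.05)
print(n_needed, radius)
```

Because the budget grows as 1/eps^2, restricting the estimation to edges inside the explanation subgraph is a natural way to cut the total number of queries, which matches the motivation given in the abstract.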

2605.30462 2026-06-01 cs.LG cs.AI 版本更新

idSCD: Identifying Training Datasets through Semantic Correlation Descriptors

idSCD: 通过语义相关描述符识别训练数据集

Andrada Gobeaja, Ionut Hodoroaga, Elena Burceanu, Marius Leordeanu

发表机构 * POLITEHNICA University of Bucharest(布加勒斯特理工大学) Bitdefender, Romania(罗马尼亚Bitdefender公司) Institute of Mathematics of the Romanian Academy(罗马尼亚科学院数学研究所)

AI总结 提出基于语义相关描述符(SCD)的白盒方法,通过模型学习到的语义相关结构识别训练数据集中的成员关系,在多个实验设置中优于现有基线方法。

Comments 16 pages, 3 figures

详情
AI中文摘要

一个数据集能否通过其在训练过程中引起的虚假相关性被识别?我们认为,数据集会在模型学习的语义相关结构中留下特定于数据集的痕迹:在数据集中具有预测性但对底层任务非因果的偶然规律性,可能在训练过程中被内化。我们利用这一洞察研究数据集级别的成员推断,超越了依赖置信度分数、损失、边际、生成样本或查询响应等行为或分布证据的现有方法。我们引入了一种基于语义相关描述符(SCD)的白盒语义指纹方法,该方法捕获模型学习的语义相关结构,并使其在不同数据集混合中具有可比性。在受控的留一数据集诊断中,SCD恢复了数据集特定的变化,并完美区分匹配与非匹配的数据集对。然后,我们提出了一种实用的基于SCD的成员分数,该分数仅使用模型的SCD和目标数据集的独立SCD来测试目标数据集是否是模型训练混合的一部分,无需留一数据集模型。在三个不同的实验设置中,包括自然语言推理、情感分类和医学文本分类的数据集组,我们测试了基于SCD的成员推断在不同程度的语义分离和数据集划分之间的关键词支持下的优势和局限性。平均而言,基于该分数的分类器实现了最高的性能和最低的标准差,优于黑盒基线RMIA、Attack-P和LiRA,以及白盒基线SIF。这些结果表明,数据集成员可以通过内部语义相关性进行追踪,当数据集组暴露不同的语义特性时,ROC-AUC的最大相对增益超过60%。

英文摘要

Can a dataset be recognized from the spurious correlations it induces during training? We argue that datasets leave dataset-specific traces in a model's learned semantic correlation structure: incidental regularities that are predictive within a dataset, but not causal for the underlying task, can be internalized during training. We use this insight to study dataset-level membership inference, moving beyond existing methods that rely on behavioral or distributional evidence such as confidence scores, losses, margins, generated samples, or query responses. We introduce a white-box semantic fingerprinting approach based on semantic correlation descriptors (SCDs), which capture the semantic correlation structure learned by a model and make it comparable across dataset mixtures. In a controlled leave-one-dataset-out diagnostic, SCDs recover dataset-specific changes and perfectly separate matching from non-matching dataset pairs. We then propose a practical SCD-based membership score that tests whether a target dataset is part of a model's training mixture using only the model's SCD and the target dataset's standalone SCD, without requiring leave-one-dataset-out models. Across three diverse experimental settings, with dataset groups for natural language inference, emotion classification, and medical text classification, we test both the advantages and limitations of SCD-based membership inference with different degrees of semantic separation and keyword support between dataset splits. On average, the classifier based on this score achieves the highest performance and the lowest standard deviation, outperforming the black-box baselines RMIA, Attack-P, and LiRA, as well as the white-box SIF baseline. These results show that dataset membership can be traced through internal semantic correlations, with the largest relative gain exceeding 60% in ROC-AUC when dataset groups expose distinct semantic particularities.

2605.30461 2026-06-01 cs.LG cs.AI 版本更新

Scalable Constrained Multi-Agent Reinforcement Learning via State Augmentation and Consensus for Separable Dynamics

通过状态增强和共识实现可分离动力学的可扩展约束多智能体强化学习

Santiago Amaya-Corredor, Miguel Calvo-Fullana, Anders Jonsson

发表机构 * Department of Engineering University Pompeu Fabra(工程系庞培法布拉大学)

AI总结 提出一种结合状态增强策略学习与对偶变量分布式共识的分布式约束多智能体强化学习方法,解决可分离动力学系统中全局资源约束的协调问题,实现线性可扩展性并保证约束满足。

Comments 17 pages, 8 figures, 3 tables. Plus appendix

详情
AI中文摘要

我们提出了一种用于约束多智能体强化学习(MARL)的分布式方法,该方法将状态增强策略学习与对偶变量的分布式共识相结合。我们的方法针对智能体具有可分离动力学但必须协调以满足全局资源约束的系统,正如我们通过实验证明的,在这种设置下,独立学习无法产生可行解,因为智能体无法确定各自对集体约束满足的适当贡献。关键技术贡献在于证明,对拉格朗日乘子进行轻量级邻居到邻居共识足以实现全局协调的约束执行,同时保持独立训练的可扩展性。每个智能体离线学习一个单一的增强策略,该策略以其局部状态和编码约束反馈的对偶变量为条件。在执行过程中,智能体仅通过局部通信就该对偶变量达成共识。我们证明,在温和的连通性假设下,智能体乘子之间的共识误差是有界的,并且表明这转化为有界的约束违反,该违反随图连通性和共识轮次增加而减小。与集中训练分散执行(CTDE)方法相比,后者的复杂度至少随智能体数量呈二次增长,而我们的方法在训练和执行中均呈线性扩展。在智能电网需求响应上的实验表明,共识协调对于可行性至关重要:没有共识,智能体只能通过无限期推迟需求来满足电网容量约束,这是一种退化的非解。有了共识,智能体收敛到共享的对偶变量,并同时满足电网约束和需求满足,可扩展到数千个智能体,而CTDE基线仅能处理数十个。

英文摘要

We present a distributed approach for constrained Multi-Agent Reinforcement Learning (MARL) that combines state-augmented policy learning with distributed consensus over dual variables. Our method targets systems where agents have separable dynamics but must coordinate to satisfy global resource constraints, a setting in which, as we demonstrate empirically, independent learning fails to produce feasible solutions because agents cannot determine appropriate individual contributions toward collective constraint satisfaction. The key technical contribution is showing that lightweight neighbor-to-neighbor consensus over Lagrange multipliers suffices for globally coordinated constraint enforcement while preserving the scalability of independent training. Each agent learns a single augmented policy offline, conditioned on both its local state and a dual variable encoding constraint feedback. During execution, agents reach agreement on this dual variable through local communication alone. We prove that under mild connectivity assumptions, the consensus error among agents' multipliers is bounded, and show that this translates to a bounded constraint violation that decreases with graph connectivity and the number of consensus rounds. Unlike centralized training with decentralized execution (CTDE) approaches, whose complexity grows at least quadratically with agent count, our method scales linearly in both training and execution. Experiments on smart grid demand response demonstrate that consensus coordination is essential for feasibility: without it, agents satisfy grid capacity constraints only by indefinitely postponing demand, a degenerate non-solution. With consensus, agents converge to a shared dual variable and satisfy both grid constraints and demand fulfillment, scaling to thousands of agents while CTDE baselines are limited to dozens.
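Neighbor-to-neighbor agreement on a dual variable can be sketched with a standard linear averaging update, in which each agent moves its multiplier toward those of its neighbors. This is a generic consensus iteration assumed for illustration, not necessarily the update rule used in the paper; the ring topology, the step size, and the initial multipliers are hypothetical.

```python
def consensus_round(values, neighbors, step):
    """One synchronous round: each agent moves toward its neighbors' values."""
    return [
        v + step * sum(values[j] - v for j in neighbors[i])
        for i, v in enumerate(values)
    ]


def run_consensus(values, neighbors, step, rounds):
    """Apply `rounds` consensus rounds and return the final values."""
    for _ in range(rounds):
        values = consensus_round(values, neighbors, step)
    return values


def spread(values):
    """Disagreement between agents: largest minus smallest value."""
    return max(values) - min(values)


n = 5
# Ring graph (assumed topology): agent i talks to i-1 and i+1 only.
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
duals = [0.0, 1.0, 2.0, 3.0, 4.0]  # toy initial Lagrange multipliers
after_5 = run_consensus(duals, ring, step=0.3, rounds=5)
after_50 = run_consensus(duals, ring, step=0.3, rounds=50)
print(spread(duals), spread(after_5), spread(after_50))
```

On an undirected graph this update preserves the average of the multipliers, and the disagreement shrinks with every round as long as the step size is below the reciprocal of the maximum degree. A denser graph contracts faster, which is consistent with the abstract's statement that the violation bound improves with connectivity and the number of rounds.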

2605.30456 2026-06-01 cs.LG math.OC 版本更新

DisjunctiveNet: Neural Symbolic Learning via Differentiable Convexified Optimization Layers

DisjunctiveNet: 通过可微凸化优化层实现的神经符号学习

Shraman Pal, Can Li

发表机构 * Davidson School of Chemical Engineering, Purdue University, West Lafayette, USA(普渡大学戴维森化学工程学院)

AI总结 针对数据稀疏且富含领域知识的场景,提出DisjunctiveNet框架,通过可微凸优化层将析取约束嵌入神经网络,实现硬约束满足与强预测性能。

Comments ICML 2026

详情
AI中文摘要

科学与工程中的许多学习任务以稀疏数据集为特征,这限制了纯数据驱动方法的有效性。同时,这些问题通常伴随着源自物理定律、操作要求和专家启发式的丰富领域知识。这些知识经常以涉及逻辑命题和线性不等式的规则形式表达。现有的神经符号方法通常通过软惩罚近似地强制执行这些规则,在设计专门架构时假设输入无关的规则,或者依赖推理时的不可微后处理来实现硬约束满足。虽然可微优化层的最新进展使得在神经网络中实现端到端的可行性强制成为可能,但由于固有的非凸性,将这些方法扩展到逻辑或混合整数规则仍然具有挑战性。在这项工作中,我们提出了一个统一的端到端框架,用于在神经网络中强制执行硬性的、输入相关的混合整数线性约束。我们的方法将规则表示为析取约束,并应用层次凸松弛来获得凸包公式。这些松弛产生了易于处理的线性约束,可以嵌入为可微优化层,同时实现精确的规则满足。我们在真实数据集上展示了所提出框架的有效性,实现了完美的规则满足和强大的预测性能。

英文摘要

Many learning tasks in science and engineering are characterized by sparse datasets, which limits the effectiveness of purely data-driven approaches. At the same time, these problems are often accompanied by rich domain knowledge derived from physical laws, operational requirements, and expert heuristics. Such knowledge is frequently expressed as rules involving logical propositions and linear inequalities. Existing neuro-symbolic methods typically enforce these rules approximately through soft penalties, assume input-independent rules when designing specialized architectures, or rely on non-differentiable post-processing at inference time to achieve hard constraint satisfaction. While recent advances in differentiable optimization layers enable end-to-end feasibility enforcement within neural networks, extending these approaches to logical or mixed-integer rules remains challenging due to inherent nonconvexity. In this work, we propose a unified end-to-end framework for enforcing hard, input-dependent mixed integer linear constraints within neural networks. Our approach represents rules as disjunctive constraints and applies hierarchical convex relaxations to obtain convex hull formulations. These relaxations yield tractable linear constraints that can be embedded as differentiable optimization layers while enabling exact rule satisfaction. We demonstrate the effectiveness of the proposed framework on real-world datasets, achieving perfect rule satisfaction and strong predictive performance.

2605.30453 2026-06-01 hep-ph cs.LG physics.data-an 版本更新

Generative Models and Statistical Validation

生成模型与统计验证

Sascha Diefenbacher, Sofia Palacios Schweitzer, Gregor Kasieczka

发表机构 * Institut für Theoretische Physik, Universität Heidelberg(海德堡大学理论物理研究所) Physics Division, Lawrence Berkeley National Laboratory(劳伦斯伯克利国家实验室物理部) NHETC, Department of Physics & Astronomy, Rutgers University(罗格斯大学物理与天文学系NHETC) Institut für Experimentalphysik, Universität Hamburg(汉堡大学实验物理研究所)

AI总结 本文介绍现代生成网络的框架,并讨论量化其准确性、精度和统计能力的挑战。

Comments 36 pages, 4 figures, Part of the VERaiPHY Initiative

详情
AI中文摘要

生成式机器学习已成为理论和实验物理中的重要工具,特别是在快速代理和密度估计的背景下。在这项工作中,我们首先介绍现代生成网络的基本框架,然后讨论量化其准确性、精度和统计能力的挑战。

英文摘要

Generative machine learning has become an essential tool in theoretical and experimental physics, especially in the context of fast surrogates and density estimators. In this work, we first introduce the underlying framework of modern generative networks and then discuss challenges in quantifying their accuracy, precision, and statistical power.

2605.30452 2026-06-01 cs.LG cs.AI math.OC 版本更新

A Unified Framework for Gradient Aggregation in Multi-Objective Optimization

多目标优化中梯度聚合的统一框架

Zeou Hu, Kelvin Ho, Yaoliang Yu

发表机构 * Cheriton School of Computer Science(切尔顿计算机科学学院) University of Waterloo(滑铁卢大学) Vector Institute(向量研究所) The Chinese University of Hong Kong(香港中文大学)

AI总结 提出一个统一框架,通过充分对齐条件建立梯度聚合方法的收敛率,并引入基于CVaR的capped MGDA算法,在对抗联邦学习中验证鲁棒性。

详情
AI中文摘要

许多机器学习问题涉及多个固有的权衡,最好通过基于梯度的多目标优化(MOO)算法来解决。现有方法通常基于不同的动机提出,逐个案例进行分析,并且在每一步中如何聚合分量梯度在算法上有所不同。在这项工作中,我们为MOO中的梯度聚合开发了一个统一框架,建立了收敛到帕累托平稳性(MOO的标准性能度量)的(最优)速率。我们分析的核心是一个充分对齐条件,由此我们推导出一个定理,表明当在梯度的凸包内选择时,非冲突方向构成了收敛的基本充分条件。我们进一步表明,通过对偶锥上的投影可以确保可行性,从而拓宽了具有收敛保证的方法的范围。同时,我们提出了梯度聚合的原始优化视角,该视角涵盖了已有算法,阐明了它们的理论关系,并能够设计新的变体。作为示例,我们引入了capped MGDA,它基于CVaR公式推导而来,并展示了其在对抗联邦学习中的鲁棒性。最后,我们通过在合成问题和实际基准上的实验验证了我们的理论。

英文摘要

Many machine learning problems involve multiple inherent trade-offs that are best addressed by gradient-based multi-objective optimization (MOO) algorithms. Existing methods are often proposed with various motivations, analyzed case by case, and differ algorithmically in how the component gradients are aggregated at each step. In this work, we develop a unifying framework for gradient aggregation in MOO, establishing (optimal) rates of convergence to Pareto stationarity, the standard measure of performance in MOO. Central to our analysis is a sufficient alignment condition, from which we derive a theorem showing that non-conflicting directions, when chosen within the convex hull of gradients, form a fundamental sufficient condition for convergence. We further show that feasibility can be ensured through projection onto the dual cone, broadening the scope of methods that admit convergence guarantees. In parallel, we present a primal optimization perspective of gradient aggregation that encompasses established algorithms, clarifies their theoretical relationships, and enables the design of new variants. As an illustration, we introduce capped MGDA, derived from a CVaR-based formulation, and demonstrate its robustness in adversarial federated learning. Finally, we validate our theory through experiments on synthetic problems and practical benchmarks.

2605.30451 2026-06-01 cs.LG 版本更新

VeriGate: Verifier-Gated Step-Level Supervision for GRPO

VeriGate: 验证器门控的步骤级监督用于GRPO

Aakriti Agrawal, Minghui Liu, Furong Huang

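Affiliations * University of Maryland, College Park

AI summary Proposes VeriGate, which extends GRPO with verifier-gated step-level supervision to address sparse rewards and credit assignment, substantially improving accuracy on multiple reasoning benchmarks.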

详情
AI中文摘要

组相对策略优化(GRPO)是一种有效的训练推理模型的方法,使用基于验证器的结果奖励,但其监督是稀疏的:当针对某个提示的所有采样轨迹获得相同的验证器奖励时,组相对优势会坍缩为零,学习停滞。仅结果奖励也不提供步骤级信用分配,限制了探索,使得学习稳健推理更加困难。我们提出了VeriGate(验证器门控步骤级GRPO),这是GRPO的一种验证器门控扩展,通过三个设计选择解决了这些限制。首先,每当验证器奖励在采样轨迹之间诱导出有意义的偏好时,VeriGate让验证器负责,并且仅在验证器奖励退化时使用过程监督。其次,VeriGate不将过程奖励模型(PRM)的步骤分数坍缩为单个轨迹奖励,而是将其转换为未来累积奖励,以分配延续感知的信用。第三,VeriGate将这些奖励转换为组归一化的令牌级优势,恢复信息丰富的梯度和细粒度的信用分配,同时相比优化聚合PRM分数的方法,对奖励黑客攻击的敏感性更低。实验上,在MATH上使用1.5B和7B Qwen2.5-Instruct模型进行训练,并在六个推理基准上评估,VeriGate将1.5B和7B模型的平均准确率分别提高了约20%和12%,显著减少了零梯度失败,降低了奖励黑客行为,并相对于仅结果GRPO和PRM作为结果的基线提高了推理质量。

英文摘要

Group Relative Policy Optimization (GRPO) is an effective recipe for training reasoning models with verifier-based outcome rewards, but its supervision is sparse: when all sampled trajectories for a prompt receive the same verifier reward, the group-relative advantage collapses to zero and learning stalls. Outcome-only rewards also provide no step-level credit assignment, limiting exploration and making it harder to learn robust reasoning. We present VeriGate (Verifier-Gated Step-Level GRPO), a verifier-gated extension of GRPO that addresses these limitations with three design choices. First, VeriGate keeps the verifier in charge whenever verifier rewards induce a meaningful preference among sampled trajectories, and uses process supervision only when verifier rewards are degenerate. Second, instead of collapsing Process Reward Model (PRM) step scores into a single trajectory reward, VeriGate converts them into future-cumulated rewards to assign continuation-aware credit. Third, VeriGate transforms these rewards into group-normalized token-level advantages, restoring informative gradients and fine-grained credit assignment while remaining less susceptible to reward hacking than methods that optimize aggregated PRM scores. Empirically, training on MATH with 1.5B and 7B Qwen2.5-Instruct models and evaluating on six reasoning benchmarks, VeriGate improves average accuracy by about 20% and 12% for 1.5B and 7B models respectively, substantially reduces zero-gradient failures, decreases reward-hacking behavior, and improves reasoning quality relative to outcome-only GRPO and PRM-as-outcome baselines.
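
The gating idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it works at step level rather than token level, uses undiscounted suffix sums as the "future-cumulated" reward, and normalizes over all steps in the group. All function names are hypothetical.

```python
from statistics import mean, pstdev


def group_normalize(values, eps=1e-8):
    """Standardize values within one prompt group."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def reward_to_go(step_scores):
    """Suffix sums: the credit for step t is the sum of scores from t onward."""
    out, running = [], 0.0
    for s in reversed(step_scores):
        running += s
        out.append(running)
    return out[::-1]


def gated_advantages(verifier_rewards, prm_step_scores, eps=1e-8):
    """Per-trajectory lists of per-step advantages for one prompt group."""
    if max(verifier_rewards) != min(verifier_rewards):
        # Verifier rewards separate the trajectories: keep the verifier in charge
        # and broadcast the outcome-level advantage to every step.
        adv = group_normalize(verifier_rewards, eps)
        return [[a] * len(steps) for a, steps in zip(adv, prm_step_scores)]
    # Degenerate verifier rewards: fall back to process supervision.
    rtg = [reward_to_go(steps) for steps in prm_step_scores]
    flat = [x for traj in rtg for x in traj]
    mu, sigma = mean(flat), pstdev(flat)
    return [[(x - mu) / (sigma + eps) for x in traj] for traj in rtg]
```

When every sampled trajectory receives the same verifier reward, plain group normalization would return all zeros; the fallback branch is what keeps the advantages informative in that case.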

2605.30448 2026-06-01 cs.LG cs.CL 版本更新

Bounded Behavioral Indistinguishability for Black-Box LLM Distillation

黑盒大语言模型蒸馏的有界行为不可区分性

Munawar Hasan

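Affiliations * Michigan Technological University

AI summary Formalizes bounded behavioral indistinguishability for black-box LLM distillation and shows through adversarial evaluation that semantic similarity is insufficient to guarantee behavioral indistinguishability.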

详情
AI中文摘要

黑盒大语言模型蒸馏通常被评估为输出匹配问题:当学生模型的响应与教师模型在语义上相似或任务一致时,即认为学生模型成功。然而,输出相似性并不意味着学生模型与其模仿的模型在行为上不可区分。我们引入了有界行为不可区分性,形式化为在显式提示分布上的$(ε,q,t,\mathbb{A})$-行为不可区分性,其中$ε$限制区分优势,$q$限制预言机查询次数,$t$限制计算量,$\mathbb{A}$表示对手类别。我们在Qwen和Llama教师-学生对上使用受控的$5,000$提示行为探测套件实例化该概念。对于每个系列,我们比较教师模型与基础学生模型以及LoRA蒸馏学生模型,衡量蒸馏是否降低了可区分性而不仅仅是提高了相似性。LoRA将Qwen的语义相似性从$0.788$提升至$0.862$,Llama从$0.814$提升至$0.874$。然而,对抗评估揭示了剩余的行为差异:学习到的判别器保持非零优势,成对类别分析显示伪影集中在风格/格式、鲁棒性和领域技术提示中。成对教师识别对手证实了这一趋势。使用不同系列的Llama评判器和A/B交换一致性过滤,Qwen的区分优势从基础学生模型的$0.158$下降到LoRA蒸馏后的$0.081$。查询预算实验表明,分歧引导的采集并不始终优于分层随机采样,表明覆盖率和多样性仍然是强基线。我们的结果表明,语义保真度有用但不足:黑盒大语言模型蒸馏需要有界、对抗性和类别感知的评估。

英文摘要

Black-box LLM distillation is usually evaluated as an output-matching problem: a student is considered successful when its responses are semantically similar to, or task-consistent with, those of a teacher. However, output similarity does not imply that the student is behaviorally indistinguishable from the model it imitates. We introduce bounded behavioral indistinguishability, formalized as $(ε,q,t,\mathbb{A})$-behavioral indistinguishability over an explicit prompt distribution, where $ε$ bounds distinguishing advantage, $q$ bounds oracle queries, $t$ bounds computation, and $\mathbb{A}$ denotes the adversary class. We instantiate this notion on Qwen and Llama teacher-student pairs using a controlled $5,000$-prompt behavioral probe suite. For each family, we compare the teacher with both the base student and the LoRA-distilled student, measuring whether distillation reduces distinguishability rather than merely improving similarity. LoRA raises semantic similarity from $0.788$ to $0.862$ for Qwen and from $0.814$ to $0.874$ for Llama. Yet adversarial evaluation reveals remaining behavioral differences: learned discriminators retain nonzero advantage, and pairwise category analysis shows artifacts concentrated in style/format, robustness, and domain-technical prompts. A pairwise teacher-identification adversary confirms this trend. With a different-family Llama judge and A/B-swap consistency filtering, Qwen distinguishing advantage drops from $0.158$ for the base student to $0.081$ after LoRA distillation. Query-budget experiments show that disagreement-guided acquisition does not consistently outperform stratified random sampling, indicating that coverage and diversity remain strong baselines. Our results show that semantic fidelity is useful but insufficient: black-box LLM distillation requires bounded, adversarial, and category-aware evaluation.
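
To make the role of $ε$ concrete, the sketch below estimates a distinguishing advantage from an adversary's binary guesses. It assumes the standard cryptographic-style definition (the absolute gap between the adversary's "teacher" rate on teacher outputs and on student outputs); the paper's exact estimator may differ, and the function names and guess vectors are hypothetical.

```python
def distinguishing_advantage(guesses_on_teacher, guesses_on_student):
    """|P(A says 'teacher' | teacher) - P(A says 'teacher' | student)|.

    Each argument is a list of 0/1 guesses, where 1 means the adversary
    labelled the response as coming from the teacher.
    """
    p_teacher = sum(guesses_on_teacher) / len(guesses_on_teacher)
    p_student = sum(guesses_on_student) / len(guesses_on_student)
    return abs(p_teacher - p_student)


def within_bound(advantage, epsilon):
    """True if the measured advantage does not exceed the bound epsilon."""
    return advantage <= epsilon


# Hypothetical guesses from one adversary on four prompts per source.
adv = distinguishing_advantage([1, 1, 1, 0], [1, 0, 0, 0])
print(adv, within_bound(adv, 0.1))
```

An adversary that guesses at random yields an advantage near zero regardless of how semantically similar the outputs are, which is why the two quantities can diverge.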

2605.30447 2026-06-01 cs.LG cs.AI stat.ML 版本更新

Calibrated Preference Learning: The Case of Label Ranking

校准偏好学习:以标签排序为例

Santo M. A. R. Thies, Viktor Bengs, Timo Kaufmann, Sebastian J. Vollmer, Eyke Hüllermeier

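Affiliations * Munich Center for Machine Learning, Munich (MCML), Germany

AI summary Formally defines calibration notions for probabilistic label ranking and organizes them into a hierarchy, establishing the relationships between the notions by proof and exposing the miscalibration of existing models by experiment.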

详情
AI中文摘要

校准,即预测概率与真实结果频率的对齐,对于可靠决策至关重要。尽管在分类和回归中已有广泛研究,但校准尚未在概率标签排序中得到正式处理,其目标是预测标签集排序上的分布。将排序视为类别会忽略其结构,并无法捕捉成对和top-k预测等重要模态。我们形式化了标签排序的校准,并建立了一个涵盖完整排序、子排序和top-k排序的概念层次。我们证明完整排序校准蕴含其他校准,但反之不成立,且子排序和top-k校准不可比较。实验发现,流行的标签排序模型通常校准不良,子排序和top-k指标之间存在显著差异。将我们的框架应用于RLHF奖励模型,发现校准与基准准确性强相关但不完全一致,表明它捕捉了超越top-1准确性的有意义的质量维度。这些发现激励了未来关于理解误校准的下游影响以及开发纠正方法的工作。

英文摘要

Calibration, the alignment of predicted probabilities with true outcome frequencies, is essential for reliable decision-making. While extensively studied for classification and regression, calibration has not been formally addressed for probabilistic label ranking, where the goal is to predict a distribution over orderings of a label set. Naively treating rankings as classes ignores their structure and fails to capture important modalities such as pairwise and top-k predictions. We formalize calibration for label ranking and develop a hierarchy of notions covering full rankings, sub-rankings, and top-k rankings. We prove that full-rank calibration implies the others but not conversely, and sub-ranking and top-k calibration are incomparable. Empirically, we find popular label ranking models are often poorly calibrated, with substantial differences between sub-ranking and top-k metrics. Applying our framework to RLHF reward models, we find that calibration correlates strongly but not perfectly with benchmark accuracy, suggesting it captures a meaningful quality dimension beyond top-1 accuracy. These findings motivate future work on understanding the downstream effects of miscalibration and developing methods to correct it.
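
The different prediction modalities mentioned above are all marginals of one distribution over full rankings. The sketch below shows that marginalization for pairwise and top-k events. It is illustrative only: it assumes "top-k" means the ordered prefix of length k, and the toy distribution and function names are hypothetical.

```python
def pairwise_prob(dist, a, b):
    """P(a is ranked before b) under a distribution over full rankings."""
    return sum(p for ranking, p in dist.items()
               if ranking.index(a) < ranking.index(b))


def top_k_prob(dist, prefix):
    """P(the first len(prefix) positions equal `prefix`, in order)."""
    k = len(prefix)
    return sum(p for ranking, p in dist.items() if ranking[:k] == tuple(prefix))


# Toy predicted distribution over rankings of three labels.
dist = {
    ("a", "b", "c"): 0.5,
    ("b", "a", "c"): 0.3,
    ("c", "a", "b"): 0.2,
}
print(pairwise_prob(dist, "a", "b"))   # mass of rankings with a before b
print(top_k_prob(dist, ["a"]))         # mass of rankings that start with a
```

Calibration of each modality then compares such predicted marginals with observed frequencies. Because the marginals discard different information, a model can be calibrated for one view and not for another.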

2605.30434 2026-06-01 cs.LG cs.AI cs.CL cs.MA 版本更新

LongDS-Bench: On the Failure of Long-Horizon Agentic Data Analysis

LongDS-Bench:关于长周期智能数据分析的失败

Kewei Xu, Xiaoben Lu, Shuofei Qiao, Zihan Ding, Haoming Xu, Lei Liang, Ningyu Zhang

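Affiliations * Zhejiang University; Ant Group; Zhejiang University - Ant Group Joint Laboratory of Knowledge Graph

AI summary Introduces the LongDS benchmark for evaluating agents' ability to maintain and update analytical state in long-horizon, multi-turn data analysis; the best model reaches only 48.45% average accuracy, and long-horizon errors account for 52%-69% of failures.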

Comments Ongoing work

详情
AI中文摘要

现实世界的数据分析本质上是迭代的,然而现有基准大多评估孤立或短期的交互任务,未能测试智能体在长周期内跟踪不断变化的分析上下文的能力。我们引入了LongDS,一个用于长周期、多轮数据分析的基准,其中智能体必须维护、更新、恢复和组合不断变化的分析状态。LongDS包含68个从真实世界Kaggle笔记本构建的任务,涵盖地球科学、商业和教育等六个领域的2,225轮交互。任务围绕状态演化模式(例如反事实扰动、回滚、多状态组合)设计,平均依赖跨度为11.3轮。评估五个最先进模型,我们发现最佳模型仅达到48.45%的平均准确率,性能从早期到后期轮次下降近47个百分点,长周期错误占失败原因的52%-69%。进一步分析表明,额外的智能体步骤并不一定能提高性能,这表明关键瓶颈在于维护正确的分析状态,而非增加交互预算。我们发布LongDS以支持可靠的长周期智能数据分析研究。代码和数据将在https://github.com/zjunlp/DataMind发布。

英文摘要

Real-world data analysis is inherently iterative, yet existing benchmarks mostly evaluate isolated or short interactive tasks, leaving agents' ability to track evolving analytical context over long horizons untested. We introduce LongDS, a benchmark for long-horizon, multi-turn data analysis where agents must maintain, update, restore, and compose evolving analytical states. LongDS comprises 68 tasks constructed from real-world Kaggle notebooks, spanning 2,225 turns across six domains including Geoscience, Business, and Education. Tasks are designed around state-evolution patterns (e.g., counterfactual perturbation, rollback, multi-state composition), with an average dependency span of 11.3 turns. Evaluating five state-of-the-art models, we find that the best model reaches only 48.45% average accuracy, performance drops nearly 47 points from early to late turns, and long-horizon errors account for 52%--69% of failures. Further analysis shows that additional agent steps do not necessarily improve performance, suggesting that the key bottleneck is maintaining a correct analytical state rather than increasing interaction budget. We release LongDS to support research on reliable long-horizon agentic data analysis. Code and data will be released at https://github.com/zjunlp/DataMind.

2605.30429 2026-06-01 quant-ph cs.LG 版本更新

Attention-based optimizer for symmetry finding

基于注意力的对称性发现优化器

Shreya Banerjee, Vinodh Raj Rajagopal Muthu, Charlie Nation, Rick P. A. Simon, Francesco Martini, Alessandro Ricottone, Federico Cerisola, Luca Dellantonio

Affiliations * Department of Physics and Astronomy, University of Exeter, Stocker Road, Exeter EX4 4QL, United Kingdom; QuAOS collaboration; Institute for Quantum Computing, University of Waterloo, Waterloo, ON N2L 3G1, Canada

AI summary Presents an optimization framework built on a Set-Transformer architecture that uses self-attention to encode correlations among Pauli strings and optimizes a custom commutation-based objective to find Pauli symmetries of Hamiltonians with high probability.

Comments 9+4 pages, 2 Figures, Comments welcome

详情
AI中文摘要

发现对称性对于理解物理模型至关重要。在这项工作中,我们提出了一个优化框架,用于搜索哈密顿量的Pauli对称性,融合了机器学习与自动对称性发现领域。该框架基于Set-Transformer架构,利用自注意力编码Pauli字符串之间的成对和高阶相关性。然后将这些关系解码为候选对称性,并通过基于对易的自定义目标进一步优化,映射到输入哈密顿量的对称性。我们将该方法应用于随机Pauli哈密顿量、一维和二维周期横向场伊辛模型以及Toric码。结果表明,对于物理哈密顿量(伊辛和Toric),我们的框架以接近确定性的概率成功,同时与最先进策略相比提供了显著优势。对于随机Pauli哈密顿量,我们估计了在固定设计规格下以高成功概率找到对称性所需的计算资源,特别是并行启动次数和GPU数量。

英文摘要

Finding symmetries is crucial for understanding physical models. In this work, we present an optimization framework that searches for Pauli symmetries of Hamiltonians, merging machine learning with automated symmetry finding. Built on a Set-Transformer architecture, our framework uses self-attention to encode the pairwise and higher-order correlations among the Pauli strings. The relations are then decoded as a candidate, which is further optimized with a custom commutation-based objective and mapped to a symmetry of the input Hamiltonian. We apply our method to random Pauli Hamiltonians, the periodic one- and two-dimensional transverse-field Ising models, and the Toric code. We show that for physical Hamiltonians (Ising and Toric), our framework succeeds with near-deterministic probability while providing a substantial advantage compared to state-of-the-art strategies. For random Pauli Hamiltonians, we estimate the required computational resources, specifically the number of parallel starts and the number of GPUs, to find a symmetry with high success probability under fixed design specifications.
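
What it means for a Pauli string to be a symmetry can be checked directly: it must commute with every term of the Hamiltonian. The sketch below counts commutation violations for a candidate. This count is an assumed stand-in for the paper's custom commutation-based objective, which the abstract does not specify, and the helper names are hypothetical.

```python
def commute(p, q):
    """Two Pauli strings commute iff they clash on an even number of sites.

    A clash is a site where both strings act non-trivially with different
    Pauli operators (for example X against Z).
    """
    clashes = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return clashes % 2 == 0


def commutation_violations(candidate, hamiltonian_terms):
    """Number of Hamiltonian terms that do not commute with the candidate."""
    return sum(0 if commute(candidate, t) else 1 for t in hamiltonian_terms)


def tfim_terms(n):
    """Pauli strings of the periodic 1D transverse-field Ising model on n sites.

    Coefficients are omitted because they do not affect commutation.
    """
    terms = []
    for i in range(n):
        zz = ["I"] * n
        zz[i] = "Z"
        zz[(i + 1) % n] = "Z"
        terms.append("".join(zz))
        x = ["I"] * n
        x[i] = "X"
        terms.append("".join(x))
    return terms


terms = tfim_terms(4)
print(commutation_violations("XXXX", terms))  # global spin flip
print(commutation_violations("ZIII", terms))  # single-site Z
```

The global spin flip commutes with every term and is therefore a symmetry, while a single-site Z fails against the transverse-field term on the same site.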

2605.30399 2026-06-01 q-bio.QM cs.LG eess.IV 版本更新

A Novel Computer Vision Approach for Assessing Fish Responses to Intrusive Objects in Aquaculture

一种用于评估鱼类对水产养殖中侵入性物体反应的新型计算机视觉方法

Hanne-Grete Alvheim, Stian Mjelde Jakobsen, Martin Føre, Eleni Kelasidi

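Affiliations * Department of Engineering Cybernetics, NTNU; Department of Aquaculture, SINTEF Ocean AS

AI summary Proposes a novel stereo-vision method based on YOLOv8, ByteTrack, SuperGlue, and triangulation to detect, track, and estimate the 3D positions of fish, in order to analyze how structures of different shapes, sizes, and colors affect fish behavior.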

详情
AI中文摘要

水产养殖业需要应对若干挑战,以确保可持续的海产品生产满足日益增长的全球需求。其中一个主要挑战是确保生产过程中鱼类健康良好和福利可接受,因为改善鱼类福利在当前和未来的生产系统中至关重要。本研究通过开发和实施方法,识别鱼类对侵入性物体的个体和群体行为反应,从而解决这一问题。因此,我们开发了一种检测、跟踪和估计个体鱼类三维位置的新方法,并专门设计用于跟踪工业海水网箱中养殖鱼类的尾鳍。跟踪数据采用一种新型立体视觉方法进行处理,该方法适用于估计鱼类的位置、速度、加速度以及转向和俯仰角。随后,分析了从工业规模养鱼场获得的数据集,以识别不同形状、大小和颜色的结构对鱼类行为的影响。该方法使用手动标注的尾鳍进行训练,并采用YOLOv8结合ByteTrack作为目标检测器和跟踪器,SuperGlue用于匹配左右帧中的检测结果,以及三角测量来重建鱼类的三维位置。测试了不同的图像预处理和增强方法以提高目标检测准确性,并比较了它们的性能,同时测试了RAFT-Stereo用于深度估计。获得的结果既验证了该方法相对于先前研究工作的性能,也展示了该方法在提供对海水网箱中行为动态更深入理解方面的新颖性和潜力。

英文摘要

The aquaculture industry needs to address several challenges to secure sustainable seafood production that can serve an increasing global demand. One major challenge is to ensure good fish health and acceptable welfare during production since the improvement of fish welfare is of vital importance in current and future production systems. In this study, this is addressed by developing and implementing methods to identify fish behaviors in response to intrusive objects both on individual and on a group basis. A novel approach for detecting, tracking, and estimating the 3D position of individual fish has thus been developed, and specifically designed to track the caudal fins of farmed fish in industrial sea cages. The tracking data was subjected to a novel stereo-vision method adapted to estimate fish positions, velocities, accelerations, and turning and pitch angles. Datasets obtained from industrial-scale fish farms were then analyzed to identify the impact of structures of varying shapes, sizes, and colors on fish behavior. The method was trained using manually labeled caudal fins, and used YOLOv8 with ByteTrack as an object detector and tracker, SuperGlue for matching detections in the left and right frames, and triangulation to reconstruct the 3D positions of the fish. Different image pre-processing and augmentation methods for enhancing object detection accuracy were tested and their performance compared, while RAFT-Stereo was tested for depth estimation purposes. The obtained results both validate the method's performance against previous research efforts, and demonstrate the novelty and potential of this method in providing more insight into behavioral dynamics in sea-cages.
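
The triangulation step can be illustrated with the textbook rectified-stereo case, where depth follows from disparity as Z = f * B / d. This is a simplified sketch, not the authors' pipeline: it assumes rectified, undistorted pinhole cameras and ignores underwater refraction. The intrinsics and pixel coordinates below are hypothetical.

```python
import math


def triangulate(u_left, v_left, u_right, f, baseline, cx, cy):
    """3D point in the left-camera frame from one matched detection.

    f is the focal length in pixels, baseline is the camera separation in
    metres, and (cx, cy) is the principal point in pixels.
    """
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    z = f * baseline / disparity
    x = (u_left - cx) * z / f
    y = (v_left - cy) * z / f
    return (x, y, z)


def speed(p0, p1, dt):
    """Finite-difference speed between two 3D positions dt seconds apart."""
    return math.dist(p0, p1) / dt


# Hypothetical caudal-fin match in two consecutive stereo frames.
p0 = triangulate(700.0, 360.0, 650.0, f=1000.0, baseline=0.1, cx=640.0, cy=360.0)
p1 = triangulate(720.0, 360.0, 670.0, f=1000.0, baseline=0.1, cx=640.0, cy=360.0)
print(p0, p1, speed(p0, p1, dt=0.04))
```

Velocities and accelerations then follow from finite differences of the tracked positions, so tracking identity must stay consistent across frames for the derivatives to be meaningful.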

2605.30396 2026-06-01 cs.GR cs.LG 版本更新

Smaller and Faster 3DGS via Post-Training Dictionary Learning

通过训练后字典学习实现更小更快的3DGS

Jiarong Gong, Jonas Unger, Ehsan Miandji

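Affiliations * Linköping University, Department of Science and Technology

AI summary Proposes the first dictionary-learning-based post-training compression framework for 3DGS, which substantially compresses models without retraining, preserves image quality, and improves real-time rendering speed.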

详情
AI中文摘要

3D高斯泼溅(3DGS)是一种有前景的实时渲染神经场景表示方法,但训练后的模型通常占用大量内存,限制了在低性能设备上的部署。现有的压缩技术往往引入额外的可训练参数,虽然实现了出色的压缩比,但会导致图像质量明显下降。在这项工作中,我们首次提出了基于字典学习的3DGS压缩框架。所提出的后训练压缩流程几乎可以应用于任何3DGS模型,无需重新训练或修改现有3DGS模型。我们的压缩框架实现简单,但提供了显著的压缩能力,保持了图像质量,并提升了实时渲染性能。在13个基准场景上,我们的方法应用于3DGS、3DGS-MCMC和PixelGS时,平均压缩比分别达到3.95倍、3.10倍和4.55倍。同时,渲染速度分别持续提升23.3%、24.3%和25.3%,且图像质量保持不变。

英文摘要

3D Gaussian Splatting (3DGS) is a promising neural scene representation for real-time rendering, but trained models often suffer from large memory footprints, limiting deployment on less powerful devices. Existing compression techniques often lead to architectures with several additional trainable parameters. While achieving outstanding compression ratios, they introduce noticeable drops in image quality. In this work, we introduce the first dictionary-learning-based compression framework for 3DGS. The proposed post-training compression pipeline can be deployed in virtually any 3DGS model without the need for re-training or modifications to existing 3DGS models. Our compression framework is straightforward to implement, yet provides significant compression capabilities, preserves image quality, and improves real-time rendering performance. Across 13 benchmark scenes, our approach achieves an average compression ratio of 3.95x, 3.10x, and 4.55x when applied to 3DGS, 3DGS-MCMC, and PixelGS, respectively. This yields consistent rendering speedups of 23.3%, 24.3%, and 25.3%, while maintaining image quality.

2605.30393 2026-06-01 cs.LG cs.AI cs.CR 版本更新

NumLeak: Public Numeric Benchmarks as Latent Labels in Foundation Models

NumLeak: 基础模型中的公开数值基准作为潜在标签

Anany Kotawala

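Affiliations * Princeton University

AI summary Proposes the NumLeak framework, which combines API-boundary probes with white-box validation on an open causal LM to show that foundation models memorize public numeric benchmarks during pretraining, so that evaluations overestimate generalization.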

Comments 23 pages, 12 figures, 17 tables. Accepted at the ICML 2026 Workshop on the Impact of Memorization on Trustworthy Foundation Models (MemFM)

详情
AI中文摘要

公开数值基准出现在预训练中,因此基于日期进行评估可能测量的是记忆性回忆而非样本外技能。我们引入NumLeak,一个结合生产模型API边界探测与开源因果模型白盒受控验证的测量框架。顶级前沿LLM在3种子池化后,对Fama-French市场超额收益的回忆皮尔逊相关系数r=0.97-0.99,同时五个兄弟因子在25个基点内误差不超过0.15;在美国失业率、CPI通胀和NOAA温度上观察到类似保真度。在近期发布的保留集上,解析率骤降至21-57%,但在回答的月份上r仍约为0.99,拒绝-回忆不对称性符合记忆通道的预测。白盒实验重现了剂量反应,对数概率排名检测到开放生成遗漏的记忆,意味着封闭API黑盒探测低估了该通道。一个Sonnet“日期到市场情绪”回归与真实Mkt-RF的相关性r=0.74,在残差化模型自身回忆后降至r=0.02。一行系统提示防御在概念和历史叙事查询上以接近零的效用成本阻止了99.8%的非自适应单轮后缀攻击集。

英文摘要

Public numeric benchmarks appear in pretraining, so an evaluation that conditions on a date may be measuring memorized recall rather than out-of-sample skill. We introduce NumLeak, a measurement framework that combines API-boundary probes on production models with a white-box controlled validation on an open causal LM. Top-tier frontier LLMs recall the Fama-French market excess return at 3-seed pooled Pearson r=0.97-0.99 while staying within 0.15 within-25bps on the five sibling factors; comparable fidelity appears on U.S. unemployment, CPI inflation, and NOAA temperature. On a recent-release holdout, parse rate collapses to 21-57% but r stays at approximately 0.99 on months answered, the refuse-or-recall asymmetry a memorized channel predicts. The white-box experiment reproduces the dose-response, and logprob ranking detects memorization that open-ended generation misses, implying closed-API black-box probes understate the channel. A Sonnet "date to market-sentiment" regression that correlates with true Mkt-RF at r=0.74 collapses to r=0.02 once the model's own recall is residualized out. A one-line system-prompt defense blocks 99.8% of a non-adaptive single-turn suffix attack set at near-zero utility cost on conceptual and historical-narrative queries.
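
The residualization check in the abstract can be illustrated with a toy example: regress a model-derived signal on the model's own recalled values, then correlate what is left with the truth. This sketch assumes a simple one-variable least-squares residualization, which may differ from the paper's procedure. The data are constructed by hand, not taken from the paper, and the function names are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def residualize(signal, recall):
    """Residuals of the least-squares fit signal ~ a + b * recall."""
    ms, mr = mean(signal), mean(recall)
    b = (sum((r - mr) * (s - ms) for r, s in zip(recall, signal))
         / sum((r - mr) ** 2 for r in recall))
    a = ms - b * mr
    return [s - (a + b * r) for s, r in zip(signal, recall)]


# Toy data: the "sentiment" signal is the recalled value plus unrelated noise.
truth = [1.0, 2.0, 3.0, 4.0]
recall = [1.0, 2.0, 3.0, 4.0]      # perfect memorized recall
signal = [3.0, 3.0, 5.0, 9.0]      # 2 * recall + [1, -1, -1, 1]

print(pearson(signal, truth))                        # high before residualizing
print(pearson(residualize(signal, recall), truth))   # near zero afterwards
```

If the signal's apparent skill vanishes once recall is partialled out, the correlation was carried by memorization rather than by any independent forecasting ability.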

2605.30389 2026-06-01 cs.FL cs.LG 版本更新

The Inclusion Depth of Pattern Languages: An Open Problem in Algorithmic Learning Theory

模式语言的包含深度:算法学习理论中的一个开放问题

Wei Luo

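Affiliations * School of Information Technology, Deakin University

AI summary Poses the problem of computing the inclusion depth of a pattern language (the length of the longest strict inclusion chain) and conjectures the formula 2|p| - #var(p) - 1; the problem connects formal languages, combinatorics on words, and learning by identification in the limit.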

Comments 2 pages. Open problem from COLT 2005. Generic author-prepared version for arXiv. Originally appeared in Learning Theory, 18th Annual Conference on Learning Theory, COLT 2005, Bertinoro, Italy, June 2005

详情
Journal ref
Learning Theory, 18th Annual Conference on Learning Theory, COLT 2005, Lecture Notes in Artificial Intelligence 3559, Springer, 2005, pp. 689-690
AI中文摘要

模式语言是形式语言理论和算法学习理论中的经典模型。本文提出了计算模式语言包含深度的问题:从通用模式语言到给定模式生成的语言的最长严格包含链的长度。包含深度捕捉了从正数据识别模式的心智变化复杂度。核心开放问题是:对于每个有限字母表Σ(至少两个符号)上的每个模式p,包含深度ID_Σ(p)是否可计算,以及是否可在多项式时间内计算。一个简单的猜想公式ID_Σ(p) = 2|p| - #var(p) - 1将蕴含线性时间算法。该问题连接了模式语言包含、词上的组合学、极限中的语言识别以及有界心智变化学习。

英文摘要

Pattern languages are a classical model in formal language theory and algorithmic learning theory. This note formulates the problem of computing the inclusion depth of a pattern language: the length of the longest strict inclusion chain from the universal pattern language to the language generated by a given pattern. Inclusion depth captures the mind-change complexity of pattern identification from positive data. The central open question is whether the inclusion depth ID_Sigma(p) is computable for every pattern p over every finite alphabet Sigma with at least two symbols, and whether it is computable in polynomial time. A simple conjectured formula, ID_Sigma(p) = 2|p| - #var(p) - 1, would imply a linear-time algorithm. The problem connects pattern language inclusion, combinatorics on words, language identification in the limit, and mind-change-bounded learning.
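
If the conjectured formula holds, the inclusion depth can be read off a pattern in a single pass, which is why it would imply a linear-time algorithm. The sketch below evaluates the conjecture only; it does not verify it. It assumes |p| is the number of symbols in the pattern and #var(p) the number of distinct variables, and the pattern encoding is hypothetical.

```python
def conjectured_inclusion_depth(pattern, variables):
    """Evaluate 2|p| - #var(p) - 1 for a pattern given as a list of symbols.

    `variables` is the set of symbols that act as variables; every other
    symbol is treated as a constant from the alphabet.
    """
    distinct_vars = {s for s in pattern if s in variables}
    return 2 * len(pattern) - len(distinct_vars) - 1


variables = {"x", "y"}
print(conjectured_inclusion_depth(["x"], variables))            # single variable
print(conjectured_inclusion_depth(["x", "y"], variables))       # two variables
print(conjectured_inclusion_depth(["x", "a", "x"], variables))  # repeated variable
```

As a sanity check, the single-variable pattern generates the universal pattern language itself, and the formula assigns it depth 0.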

2605.30388 2026-06-01 cs.LG 版本更新

A Novel Evaluation Metric for Unsupervised Learning in AIS-Based Maritime Anomaly Detection: MADQI

基于AIS的海事异常检测中无监督学习的新型评估指标:MADQI

Ismet Gocer, Zakirul Bhuiyan, Raza Hasan, Shakeel Ahmad

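Affiliations * Southampton Solent University, School of Technology and Maritime Industries

AI summary Proposes MADQI, a label-free quality index for maritime anomaly detection that combines four sub-metrics to evaluate the anomaly detection performance of unsupervised learning models.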

Comments 26 pages, A new Eval Metric for Unsupervised Machine Learning

详情
AI中文摘要

本文介绍了一个新的系统框架,用于检测海事自动识别系统(AIS)数据集中的异常。这些异常包括与速度、位置跳跃、时间间隔和转向角度相关的异常船舶行为。尽管诸如孤立森林之类的无监督学习算法被广泛用于检测异常船舶运动,但它们通常缺乏系统且有意义的评估措施。为了解决这一局限性,我们提出了一种称为海事异常检测质量指标(MADQI)的新型质量指标。所提出的MADQI是一个复合指标,旨在评估机器学习模型的异常检测性能,而无需标记数据。该框架使用哈弗辛距离计算来分析AIS数据集,并根据空间和行为特征识别异常。所提出的MADQI评估框架整合了四个相互关联的指标:异常率一致性(ARC)、物理合理性评分(PPS)、评分分布分离度(SDS)和极端案例证据(ECE)。这些指标通过使用多块评估和自适应缩放技术的自动归一化进行组合。在AIS数据集上的实验结果表明,所提出的框架实现了80.37%的MADQI分数,证明了其在无监督异常检测中的有效性。特别是,该算法在识别异常船舶行为方面表现强劲。在MADQI的各个组成部分中,ECE和ARC分别达到了0.907和1.000的分数,表明其在检测极端异常和保持异常率一致性方面具有出色的能力。总体而言,这些结果令人鼓舞,并表明所提出的框架为评估海事AIS数据中的无监督异常检测提供了一种可靠且有意义的方法。

英文摘要

This paper introduces a new systematic framework for detecting anomalies in maritime Automatic Identification System (AIS) datasets. These anomalies include abnormal vessel behaviours related to speed, position jumps, time gaps, and turn angles. Although unsupervised learning algorithms such as Isolation Forest are widely used for detecting anomalous vessel movements, they often lack systematic and meaningful evaluation measures. To address this limitation, we propose a novel quality metric called the Maritime Anomaly Detection Quality Index (MADQI). The proposed MADQI is a composite index designed to evaluate the anomaly detection performance of machine learning models without requiring labelled data. The proposed framework uses Haversine distance calculations to analyse AIS datasets and identify anomalies based on their spatial and behavioural characteristics. The proposed MADQI evaluation framework integrates four interconnected metrics: Anomaly Rate Consistency (ARC), Physical Plausibility Score (PPS), Score Distribution Separation (SDS), and Extreme Case Evidence (ECE). These metrics are combined through automatic normalisation using multi-chunk evaluation and adaptive scaling techniques. Experimental results on the AIS dataset show that the proposed framework achieved a MADQI score of 80.37%, demonstrating its effectiveness for unsupervised anomaly detection. In particular, the algorithm performed strongly in identifying abnormal vessel behaviour. Among the individual MADQI components, ECE and ARC achieved scores of 0.907 and 1.000, respectively, indicating excellent capability in detecting extreme anomalies and maintaining anomaly rate consistency. Overall, these results are encouraging and demonstrate that the proposed framework provides a reliable and meaningful approach for evaluating unsupervised anomaly detection in maritime AIS data.
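
The Haversine step mentioned above can be sketched as follows: compute the great-circle distance between consecutive AIS fixes and derive the implied speed, which is the kind of feature that exposes position jumps. This is a generic illustration, not the paper's code. The Earth radius constant and the function names are assumptions, and the MADQI sub-metrics themselves are not reproduced here.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed constant)
KM_PER_NAUTICAL_MILE = 1.852


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def implied_speed_knots(lat1, lon1, t1, lat2, lon2, t2):
    """Speed implied by two fixes, with timestamps t1 < t2 in seconds."""
    hours = (t2 - t1) / 3600.0
    if hours <= 0:
        raise ValueError("timestamps must be strictly increasing")
    return haversine_km(lat1, lon1, lat2, lon2) / KM_PER_NAUTICAL_MILE / hours


# Two hypothetical fixes ten minutes apart.
print(implied_speed_knots(50.90, -1.40, 0, 50.95, -1.40, 600))
```

An implied speed far above what a vessel can physically reach is evidence of a position jump rather than genuine movement.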

2605.30387 2026-06-01 cs.LG cs.AI cs.CV eess.SP 版本更新

Functional MRI Time Series Generation via Wavelet-Based Image Transform and Spectral Flow Matching for Brain Disorder Identification

基于小波图像变换和频谱流匹配的功能磁共振时间序列生成用于脑疾病识别

Hwa Hui Tew, Junn Yong Loo, Fang Yu Leong, Julia K. Lau, Ding Fan, Hernando Ombao, Raphaël C. -W. Phan, Chee Pin Tan, Chee-Ming Ting

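Affiliations * School of Information Technology, Monash University Malaysia; School of Engineering, Monash University Malaysia; Statistics Program, King Abdullah University of Science and Technology

AI summary Proposes the Dual-Spectral Flow Matching (DSFM) framework, which builds a dual-frequency representation of BOLD signals with the discrete wavelet transform and the discrete cosine transform, generates class-conditioned cosine-frequency representations by spectral flow matching, and reconstructs physiologically plausible time-domain BOLD signals through the inverse transforms to improve downstream brain network classification.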

Comments Accepted at the Fourteenth International Conference on Learning Representations (ICLR 2026)

详情
AI中文摘要

功能磁共振成像(fMRI)通过测量随时间变化的血氧水平依赖(BOLD)信号,提供对动态脑活动的非侵入性访问。然而,fMRI采集的资源密集型特性限制了数据驱动脑分析模型所需的高保真样本的可用性。虽然现代生成模型可以合成fMRI数据,但它们在复制原始BOLD信号固有的非平稳性、复杂的时空动态和生理变化方面仍然面临挑战。为了解决这些挑战,我们提出了双频谱流匹配(DSFM),一种新颖的fMRI生成框架,它将BOLD信号的双频表示与频谱流匹配级联起来。具体来说,我们的框架首先通过离散小波变换(DWT)将BOLD信号转换为小波分解图,以捕获全局瞬态和多尺度变化,并将其投影到跨脑区和时间的离散余弦变换(DCT)空间中,以利用低频主导BOLD系数的局部能量压缩。随后,训练一个频谱流匹配模型来生成类条件余弦频率表示。通过逆DCT和逆DWT操作重建生成的样本,以恢复生理上合理的时域BOLD信号。这种双变换方法施加了结构化的频率先验,并保留了关键的生理脑动力学。最终,我们通过改进的下游基于fMRI的脑网络分类证明了我们方法的有效性。代码可在 https://github.com/htew0001/DSFM.git 获取。

英文摘要

Functional Magnetic Resonance Imaging (fMRI) provides non-invasive access to dynamic brain activity by measuring blood oxygen level-dependent (BOLD) signals over time. However, the resource-intensive nature of fMRI acquisition limits the availability of high-fidelity samples required for data-driven brain analysis models. While modern generative models can synthesize fMRI data, they often struggle to replicate the inherent non-stationarity, intricate spatiotemporal dynamics, and physiological variations of raw BOLD signals. To address these challenges, we propose Dual-Spectral Flow Matching (DSFM), a novel fMRI generative framework that cascades a dual-frequency representation of BOLD signals with spectral flow matching. Specifically, our framework first converts BOLD signals into a wavelet decomposition map via a discrete wavelet transform (DWT) to capture globalized transient and multi-scale variations, and then projects it into the discrete cosine transform (DCT) space across brain regions and time to exploit localized energy compaction of low-frequency dominant BOLD coefficients. Subsequently, a spectral flow matching model is trained to generate class-conditioned cosine-frequency representations. The generated samples are reconstructed through inverse DCT and inverse DWT operations to recover physiologically plausible time-domain BOLD signals. This dual-transform approach imposes structured frequency priors and preserves key physiological brain dynamics. Ultimately, we demonstrate the efficacy of our approach through improved downstream fMRI-based brain network classification. The code is available at https://github.com/htew0001/DSFM.git.

2605.30385 2026-06-01 cs.LG cs.AI 版本更新

LLMs Without Deep Neural Networks: New Architecture, Benefits and Case Study

无需深度神经网络的LLM:新架构、优势与案例研究

Vincent Granville

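AI summary Proposes a new architecture based on RBF networks that, without deep neural networks, finds the global optimum of the loss function in closed form, eliminating the tedious training step while improving explainability and accuracy.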

Comments 9 pages, 5 figures

详情
AI中文摘要

本文旨在验证我在LLM背景下提出的深度神经网络替代方案。最近,中国研究人员对一种称为RBF网络的模型产生了浓厚兴趣,该模型作为标准DNN的替代品,具有更高的可解释性和准确性。事实证明,我独立发现的新模型基于完全相同的机制,但有一个重大转折:它不需要DNN,因为它以闭式解在一次迭代中找到损失函数的全局最优,从而消除了繁琐的训练步骤。这里我提供了我的技术的高层概述,包括案例研究和与类似方法的比较。

英文摘要

The purpose of this article is to validate my alternative to deep neural networks in the context of LLMs. Very recently, Chinese researchers have shown significant interest in a model called the RBF network as a substitute for standard DNNs, with increased explainability and higher accuracy. It turns out that my new model, discovered independently, is based on the exact same machinery, but with a major twist: it does not need a DNN, as it finds the global optimum of the loss function in closed form, in one iteration, thus eliminating the tedious training step. Here I provide a high-level overview of my technology, with a case study and a comparison to similar methods.
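
For context, the textbook reason an RBF model can be fitted in closed form is that, once the centers and widths are fixed, the output weights enter linearly, so the squared loss is minimized by solving one linear system. The sketch below shows that standard construction in one dimension. It is not the author's method, which the abstract does not detail, and all names, the Gaussian basis, and the small ridge term are assumptions.

```python
import math


def rbf(x, c, width):
    """Gaussian radial basis function centred at c."""
    return math.exp(-((x - c) ** 2) / (2 * width ** 2))


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def fit_rbf(xs, ys, centers, width, ridge=1e-8):
    """Solve the regularised normal equations (Phi^T Phi + ridge*I) w = Phi^T y."""
    Phi = [[rbf(x, c, width) for c in centers] for x in xs]
    m, n = len(centers), len(xs)
    A = [[sum(Phi[i][j] * Phi[i][k] for i in range(n)) + (ridge if j == k else 0.0)
          for k in range(m)] for j in range(m)]
    rhs = [sum(Phi[i][j] * ys[i] for i in range(n)) for j in range(m)]
    return solve(A, rhs)


def predict(x, centers, width, w):
    return sum(wj * rbf(x, c, width) for wj, c in zip(w, centers))


xs, ys = [0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 4.0, 9.0]
w = fit_rbf(xs, ys, centers=xs, width=1.0)
print([round(predict(x, xs, 1.0, w), 3) for x in xs])
```

The closed form applies only to the linear output weights. Choosing centers and widths is a separate, generally non-convex problem that this sketch sidesteps by fixing them in advance.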

2605.30381 2026-06-01 cs.LG cs.AI 版本更新

When LLMs Learn to Be Consistently Wrong: A Multi-Model Study of Linear Representations of Synthetic Deception

当LLM学会一致错误:合成欺骗的线性表示的多模型研究

Vahideh Zolfaghari

Affiliations * Algoverse AI Research; Medical Sciences Education Research Center, Mashhad University of Medical Sciences; Student Research Committee, Department of Health Information Technology and Management, Medical Informatics, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences

AI summary Fine-tunes honest and deceptive variants of five transformer models with LoRA and detects synthetic deception with linear probes, finding near-perfect AUC already in early layers, supporting the Linear Representation Hypothesis, and revealing two representational regimes.

详情
AI中文摘要

欺骗性对齐(模型保持准确的内部表示同时故意产生错误输出)仍然是AI安全的核心挑战。虽然战略性欺骗是主要的长期关注点,但通过直接优化错误答案诱导的合成不诚实为研究学习欺骗的表示基础提供了受控测试平台。我们引入了一个多模型范式,其中五个Transformer模型(Pythia-1.4B、Gemma-2-2B/9B、Qwen2.5-7B、Llama-3.1-8B)的诚实和欺骗变体使用LoRA在相同问题分布上进行微调。在平均池化隐藏状态上训练的线性探针在四个架构的1-3层即可检测到合成欺骗,AUC接近完美(≥0.99),而Pythia-1.4B达到峰值0.705。逻辑回归探针始终匹配或优于MLP探针,支持线性表示假说。在TruthfulQA上训练的探针以近乎零损失(ΔAUC≈0)泛化到保留的MMLU主题。深层表示对高斯噪声表现出强鲁棒性,其中Gemma-2模型表现出卓越的稳定性。对Fisher判别比、有效秩、质心几何、方向稳定性、跨域对齐和校准(ECE)的机制分析揭示了两种机制:Pythia/Llama/Qwen中的表示坍缩与Gemma-2中的高维保持。在所有模型中,欺骗方向在更深层逐渐巩固,在1-4层可实现最优校准(除Pythia外ECE<0.01)。这些结果表明,通过适度的监督微调,鲁棒、域不变的欺骗表示可以迅速固化,对基于激活的监控具有启示意义。

英文摘要

Deceptive alignment, in which models maintain accurate internal representations while deliberately producing false outputs, remains a central challenge in AI safety. While strategic deception is the primary long-term concern, synthetic dishonesty - induced via direct optimization on incorrect answers - provides a controlled testbed for studying the representational basis of learned deception. We introduce a multi-model paradigm in which honest and deceptive variants of five transformer models (Pythia-1.4B, Gemma-2-2B/9B, Qwen2.5-7B, Llama-3.1-8B) are fine-tuned using LoRA on the same question distribution. Linear probes trained on mean-pooled hidden states detect synthetic dishonesty with near-perfect AUC (greater than or equal to 0.99) as early as layers 1-3 in four architectures, while Pythia-1.4B reaches a peak of 0.705. Logistic regression probes consistently match or outperform MLP probes, supporting the Linear Representation Hypothesis. Probes trained on TruthfulQA generalize with near-zero loss (Delta AUC approx. 0) to held-out MMLU subjects. Late-layer representations show strong robustness to Gaussian noise, with Gemma-2 models exhibiting exceptional stability. Mechanistic analysis of Fisher Discriminant Ratio, effective rank, centroid geometry, directional stability, cross-domain alignment, and calibration (ECE) reveals two regimes: representational collapse in Pythia/Llama/Qwen versus high-dimensional preservation in Gemma-2. Across all models, the dishonesty direction consolidates progressively in deeper layers, with optimal calibration (ECE less than 0.01 except Pythia) achievable in layers 1-4. These results demonstrate that robust, domain-invariant dishonesty representations can be rapidly entrenched via modest supervised fine-tuning, with implications for activation-based monitoring.

2605.30376 2026-06-01 cs.LG cs.AI 版本更新

Unicorn: Scaling High-Dimensional Time Series Forecasting via Universal Correlation Modeling

Unicorn: 通过通用相关性建模实现高维时间序列的规模化预测

Haochen Yuan, Yichen Song, Yunbo Wang, Xiaokang Yang

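Affiliations * MoE Key Lab of Artificial Intelligence; AI Institute; School of Computer Science; Shanghai Jiao Tong University

AI summary Proposes the Unicorn framework, which decouples correlation modeling from specific channel identities through a latent prototype codebook, enabling scalable multi-dataset pretraining across heterogeneous datasets and significantly outperforming existing models in few-shot transfer scenarios.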

详情
AI中文摘要

现代时间序列架构面临一个基本权衡:通道独立模型随着数据量增加可扩展性好,但忽略了关键的通道间依赖性;而通道依赖模型具有表达力,但仍然是“维度受限的”,难以泛化到异构数据集。为了弥合这一差距,我们引入了Unicorn(通用相关网络),一个用于高维时间序列的可扩展、多数据集预训练框架。Unicorn的核心是一个潜在原型码本,它将相关性建模与特定通道身份解耦。通过将异构通道投影到共享潜在空间,Unicorn学习与身份无关的、可复用的交互模式,这些模式可以跨具有不同维度和语义的领域迁移。大量实验表明,Unicorn显著优于最先进的预测架构,特别是在少样本迁移场景中,为多变量时间序列基础模型提供了一条可扩展的路径。

英文摘要

Modern time series architectures face a fundamental trade-off: channel-independent models scale well with increasing data volume but ignore critical inter-channel dependencies, while channel-dependent models are expressive but remain "dimension-bounded", struggling to generalize across heterogeneous datasets. To bridge this gap, we introduce Unicorn (Universal Correlation Network), a framework for scalable, multi-dataset pretraining on high-dimensional time series. At the core of Unicorn is a latent prototype codebook that decouples correlation modeling from specific channel identities. By projecting heterogeneous channels into a shared latent space, Unicorn learns identity-agnostic, reusable interaction patterns that transfer across domains with diverse dimensionalities and semantics. Extensive experiments show that Unicorn significantly outperforms state-of-the-art forecasting architectures, particularly in few-shot transfer scenarios, offering a scalable path toward multivariate time series foundation models.

2605.30374 2026-06-01 cs.LG 版本更新

Gait2Hip-60: A Unified Deep Learning Benchmark for Predicting Hip Muscle Forces and Joint Moments from Multi-Cadence Gait Kinematics

Gait2Hip-60:基于多节奏步态运动学预测髋部肌肉力和关节力矩的统一深度学习基准

Jiaqi Zhang, Ji Hou, Qing Sun, Xianzhi Gao, Bo Huo

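Affiliations: Capital University of Physical Education and Sports; Beijing Institute of Technology; Beijing Key Laboratory of Interdisciplinary Intelligent Technologies of Sports, Medicine and Engineering

AI summary: Proposes a deep learning framework that uses three models (LSTM, Transformer, and Mamba) to predict hip muscle forces and joint moments directly from lower-limb gait kinematics. Evaluated on data from 60 healthy subjects, the Transformer performs best and retains moderate predictive ability in zero-shot testing on patients with osteonecrosis of the femoral head.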

Comments 16 pages, 9 figures. Code and dataset publicly available

详情
AI中文摘要

在步态过程中估计髋部肌肉力和关节力矩通常依赖于肌肉骨骼仿真,这种方法信息丰富但耗时且难以应用于临床。本研究开发了一个深度学习框架,直接从下肢步态运动学预测这些髋部动力学参数,并在统一协议下比较了三种代表性序列模型。步态数据来自60名健康成年人在三种节拍器引导的节奏条件下的行走。使用十个双侧下肢关节角度作为输入,以OpenSim导出的髋部肌肉力和髋关节力矩作为参考输出。训练并评估了LSTM、Transformer和Mamba三种深度学习模型,采用相同的受试者级别划分、预处理流程和评价指标。随后,最佳模型直接在一个由9名股骨头坏死(ONFH)患者组成的外部队列上进行测试,无需重新训练。在健康受试者基准测试中,Transformer在髋部肌肉力预测(RMSE = 1.33 N/kg, MAE = 0.57 N/kg, R2 = 0.819)和髋关节力矩预测(RMSE = 0.11 Nm/kg, MAE = 0.07 Nm/kg, R2 = 0.862)方面均取得了最佳的受试者级别平均性能,且在不同步行节奏下具有相似优势。在零样本外部验证中,Transformer在ONFH患者中保留了中等预测能力,髋部肌肉力预测(RMSE = 1.51 N/kg, MAE = 0.70 N/kg, R2 = 0.537)和髋关节力矩预测(RMSE = 0.17 Nm/kg, MAE = 0.12 Nm/kg, R2 = 0.569)。这些发现支持了从步态运动学估计髋部动力学的可行性,将Transformer确定为强基线,并强调了在临床应用前需要进行更广泛的病理验证和改进泛化能力。

英文摘要

Estimating hip muscle forces and joint moments during gait typically relies on musculoskeletal simulation, which is informative but time-consuming and difficult to apply in clinical settings. This study developed a deep learning framework to predict these hip dynamics parameters directly from lower-limb gait kinematics and compared three representative sequence models under a unified protocol. Gait data were collected from 60 healthy adults under three metronome-guided cadence conditions. Ten bilateral lower-limb joint angles were used as inputs, and OpenSim-derived hip muscle forces and hip joint moments were used as reference outputs. Three deep learning models (LSTM, Transformer, and Mamba) were trained and evaluated using the same subject-level split, preprocessing pipeline, and metrics. The best model was then directly tested on an external cohort of 9 patients with osteonecrosis of the femoral head (ONFH) without retraining. In the healthy-subject benchmark, the Transformer achieved the best subject-level mean performance for both hip muscle force prediction (RMSE = 1.33 N/kg, MAE = 0.57 N/kg, R² = 0.819) and hip joint moment prediction (RMSE = 0.11 Nm/kg, MAE = 0.07 Nm/kg, R² = 0.862), with similar advantages across walking cadences. In zero-shot external validation, the Transformer retained moderate predictive ability in ONFH patients for hip muscle force prediction (RMSE = 1.51 N/kg, MAE = 0.70 N/kg, R² = 0.537) and hip joint moment prediction (RMSE = 0.17 Nm/kg, MAE = 0.12 Nm/kg, R² = 0.569). These findings support the feasibility of estimating hip dynamics from gait kinematics, identify the Transformer as a strong baseline, and highlight the need for broader pathological validation and improved generalization before clinical application.

2605.30372 2026-06-01 cs.NE cs.AI cs.LG q-bio.NC 版本更新

Evolutionary Algorithm for Reservoir Learning and Yielding

用于储层学习和生成的进化算法

Julien Testu, Pierrick Legrand, Xavier Hinaut

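Affiliations: Inria LaBRI, CNRS UMR 5800; Bordeaux INP, ENSC; IMS, CNRS UMR 5218

AI summary: Proposes EARLY, an evolutionary algorithm that evolves the topology and hyperparameters of multi-reservoir echo state networks; it outperforms random search on temporal learning tasks and shows that task difficulty shapes the resulting network structure.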

详情
Journal ref
GECCO '26 - The Genetic and Evolutionary Computation Conference, Jul 2026, San José, Costa Rica
AI中文摘要

储层计算是一种递归神经网络,因其将动态处理与训练好的读出层分离而成为时序学习的有前途方法。然而,经典的回声状态网络(ESN)通常需要针对任务调整其架构和超参数才能获得良好性能。本文介绍了EARLY(用于储层学习和生成的进化算法),这是一个旨在进化多储层ESN的拓扑和超参数的框架。受大脑模块化组织的启发,EARLY将架构编码为基于图的基因组,并应用交叉、变异和选择来发现有效的配置。我们的目标是创建通用架构和任务诱导泛化。该方法在CogScale数据集的时序学习任务上进行了评估。结果表明,进化出的架构在多个任务上优于通过随机搜索获得的架构,并根据任务难度表现出结构差异:简单任务产生轻量级架构,而复杂任务倾向于更丰富的模块化组织。这些发现表明,进化搜索有助于为更广泛的时序问题识别可复用的储层结构。进一步在跨情境学习数据集上评估进化出的架构,以评估其适应新环境的能力。

英文摘要

Reservoir computing, a type of recurrent neural network, is a promising approach for temporal learning as it separates dynamic processing from the trained readout layer. However, classical Echo State Networks (ESNs) often require task-specific tuning of their architecture and hyperparameters to achieve good performance. This paper introduces EARLY (Evolutionary Algorithm for Reservoir Learning and Yielding), a framework designed to evolve both the topology and hyperparameters of multi-reservoir ESNs. Inspired by the modular organisation of the brain, EARLY encodes architectures as graph-based genomes and applies crossover, mutation, and selection to discover effective configurations. Our goal is to create both generic architectures and tasks inducing generalization. The method is evaluated on temporal learning tasks from the CogScale dataset. Results show that evolved architectures outperform those obtained with random search on several tasks and exhibit structural differences depending on task difficulty: simpler tasks yield lightweight architectures, while more complex tasks favour richer modular organisations. These findings suggest that evolutionary search can help identify reusable reservoir structures for a broader range of temporal problems. The evolved architectures are further evaluated on a cross-situational learning dataset to assess their ability to adapt to new environments.

2605.30371 2026-06-01 cs.NE cs.LG math.DS 版本更新

From Mean-Field Limits to Semiclassical Concentration: Global Convergence of the Canonical Evolutionary Strategy

从平均场极限到半经典集中:规范进化策略的全局收敛性

Matías Neto, Nicolás Garay, Luis Martí, Nayat Sanchez-Pi

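Affiliations: Inria Chile Research Center; Departamento de Ingeniería Matemática, Universidad de Chile

AI summary: Models the Canonical Evolutionary Strategy as a controlled mathematical framework and, using the semiclassical limit of a Schrödinger-type replicator-mutator equation, builds a rigorous hierarchy from discrete individual-based dynamics to a deterministic mean-field limit. It shows that global convergence is governed by the principal eigenfunction of the underlying operator, which gives a mathematical explanation of the "survival of the flattest" phenomenon, and demonstrates the advantage in high-dimensional shifted-initialization scenarios.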

详情
AI中文摘要

我们解决了随机连续优化中的全局收敛性问题。为此,我们将规范进化策略(CES)表述为一个受控数学框架,通过薛定谔型复制子-突变子方程的半经典极限来分析进化算法的全局收敛性。我们建立了从离散个体动力学到确定性平均场极限的严格层级,证明全局收敛性由底层算子的主特征函数控制。这一性质(称为几何选择)自然地优先考虑鲁棒、平坦的最优解而非狭窄的局部陷阱,为“最平坦者生存”现象提供了数学依据。此外,与当全局最小点位于初始支撑之外时容易发生过早方差崩溃的共识驱动方法不同,CES的复制子-突变子动力学促进了内在的质量传输。高维基准测试(d=30)证实了这一优势,表明在标准共识驱动和基于梯度的方法无法有效迁移的移位初始化场景中,CES实现了更低的残差误差。通过将焦点从逐点共识转移到谱集中,我们的框架为进化策略(ES)的全局收敛性提供了坚实的理论基础,无需额外的数值启发式方法。

英文摘要

We address the issue of global convergence in stochastic continuous optimization. For that purpose, we formulate the Canonical Evolutionary Strategy (CES) as a controlled mathematical framework to analyze global convergence in evolutionary algorithms via the semiclassical limit of a Schrödinger-type replicator-mutator equation. We provide a rigorous hierarchy from discrete individual-based dynamics to a deterministic mean-field limit, demonstrating that global convergence is governed by the principal eigenfunction of the underlying operator. This property, defined as Geometric Selection, naturally prioritizes robust, flat optima over narrow local traps, offering a mathematical justification for the "survival of the flattest" phenomenon. Moreover, unlike consensus-driven methods that are prone to premature variance collapse when the global minimizer resides outside the initial support, the replicator-mutator dynamics of CES facilitate intrinsic mass transport. High-dimensional benchmarks (d = 30) confirm this advantage, showing that CES achieves lower residual errors in shifted initialization scenarios where standard consensus-driven and gradient-based methods fail to migrate effectively. By shifting the focus from point-wise consensus to spectral concentration, our framework provides a robust theoretical foundation for global convergence in Evolution Strategies (ES) without the need for additional numerical heuristics.

2605.30363 2026-06-01 q-fin.CP cs.AI cs.LG q-fin.ST 版本更新

Enhancing Regime Shift Detection Using Unstructured Data: A Study on the Treasury Market

利用非结构化数据增强制度转换检测:国债市场研究

Mingxuan Yi, Vidal Mehra, Jing Chen, John Cartlidge

Affiliations: School of Engineering Mathematics and Technology, University of Bristol, UK; Propellant Digital B.V., Amsterdam, Netherlands; School of Mathematics, Cardiff University, UK

AI summary: Proposes a text-enhanced regime shift detection framework that combines large language model reasoning with statistical testing, achieving F1 = 0.82 on Treasury market data and outperforming purely data-driven methods.

Comments 8 pages, 4 figures. Code available at: https://github.com/mingxuan-yi/regime_shift

详情
AI中文摘要

金融市场的制度转换会重组资产价格和宏观变量的联合动态,打破任何单一制度校准。然而,由于数据信号嘈杂且高度多重共线性,而宣布制度转换的同期文本是非结构化的,因此难以可靠检测。标准的制度转换检测方法仅依赖结构化时间序列数据,忽略政策沟通,尽管这些文本往往在观察到的价格中实现转换之前就发出信号。我们提出了一种文本增强的制度转换检测流程,该流程将大语言模型(LLM)对央行沟通的推理与多元金融时间序列的统计验证相结合。该框架是检测器无关的:文本提出的候选点通过向量自回归(VAR)上的自助法似然比检验进行验证,而来自任意制度检测器的数据驱动候选点则通过宽松的LLM文本检查进行确认。我们在2010-2024年FOMC会议记录以及14变量美国国债和宏观经济面板数据上评估了该框架,使用了四种可互换的数据驱动检测器。所提出的流程在经核实的货币政策制度转换锚定列表上实现了F1=0.82,具有当日模态检测延迟,并且性能始终优于纯数据驱动基线。结果表明,将非结构化政策文本与统计结构性断点检测相结合,提高了金融市场制度转换识别的鲁棒性和可解释性。

英文摘要

Regime shifts in financial markets reorganise the joint dynamics of asset prices and macro variables, breaking any single-regime calibration. They are nonetheless difficult to detect reliably because the data signal is noisy and heavily multicollinear, while the contemporaneous text that announces them is unstructured. Standard regime shift detection methods rely solely on structured time-series data and ignore policy communications, even though these texts often signal shifts before they materialise in observed prices. We propose a text-enhanced regime shift detection pipeline that combines large language model (LLM) reasoning over central-bank communications with statistical validation on multivariate financial time series. The framework is detector-agnostic: text-proposed candidates are validated using a bootstrap likelihood-ratio test on a vector autoregression (VAR), while data-driven candidates from arbitrary regime detectors are ratified through a lenient LLM text check. We evaluate the framework on 2010-2024 FOMC minutes paired with a 14-variable U.S. Treasury and macroeconomic panel, using four interchangeable data-driven detectors. The proposed pipeline achieves F1 = 0.82 against a verified anchor list of monetary-policy regime shifts, with same-day modal detection latency and consistently stronger performance than pure data-driven baselines. The results demonstrate that combining unstructured policy text with statistical structural-break detection improves the robustness and interpretability of regime shift identification in financial markets.

2605.30361 2026-06-01 cs.NE cs.AI cs.LG 版本更新

Gradient-Free Training of Spiking Neural Networks via Low-Rank Evolution Strategies

通过低秩进化策略的无梯度训练脉冲神经网络

Dhruv Patankar, Sachit Ramesha Gowda

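Affiliations: Shunya Research

AI summary: Trains spiking neural networks without gradients using EGGROLL, a low-rank factorisation of evolution strategy perturbations, reaching 79.21% test accuracy on N-MNIST with a 2.23× per-generation speedup over full-rank ES.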

Comments 12 pages, 4 figures

详情
AI中文摘要

脉冲神经网络(SNN)在神经形态硬件上具有显著的能效优势,但由于离散脉冲阈值不可微,其训练仍然具有挑战性。代理梯度方法通过近似导数规避了这一问题,但它们需要反向传播基础设施,这与片上学习不兼容。进化策略(ES)是一种自然的无梯度替代方案,但其计算成本随参数数量扩展,使得对于大型权重矩阵不实用。我们提出了一种使用EGGROLL训练SNN的方法,这是一种ES扰动的低秩因子化,将每代内存从$\mathcal{O}(mn)$降低到$\mathcal{O}(r(m{+}n))$。将EGGROLL与N-MNIST上的漏积分点火SNN相结合,我们证明了无梯度训练达到了79.21%的测试准确率,同时相对于全秩ES,每代墙钟时间减少了2.23倍。我们的结果表明EGGROLL对于SNN训练是可行的,具有明确的准确率-速度权衡,并且兼容于无需代理梯度的神经形态硬件上的训练。

英文摘要

Spiking Neural Networks (SNNs) offer compelling energy efficiency on neuromorphic hardware, yet their training remains challenging because the discrete spike threshold is non-differentiable. Surrogate-gradient methods sidestep this by approximating the derivative, but they impose backpropagation infrastructure that is incompatible with on-chip learning. Evolution Strategies (ES) are a natural gradient-free alternative, yet their computational cost scales with the number of parameters, making them impractical for large weight matrices. We present a method for training SNNs using EGGROLL, a low-rank factorisation of ES perturbations that reduces per-generation memory from $\mathcal{O}(mn)$ to $\mathcal{O}(r(m+n))$. Combining EGGROLL with a Leaky Integrate-and-Fire SNN on N-MNIST, we demonstrate that gradient-free training achieves 79.21% test accuracy while reducing per-generation wall-clock time by 2.23$\times$ relative to full-rank ES. Our results demonstrate that EGGROLL is viable for SNN training, with a clear accuracy-speed tradeoff, and is compatible with training on neuromorphic hardware without surrogate gradients.
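
The memory claim in this abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the 1/sqrt(r) scaling, and the matrix shapes are assumptions chosen for illustration. It only shows that a rank-r perturbation of an m×n weight matrix can be stored and applied through two factors holding r(m+n) numbers, instead of a dense perturbation holding mn numbers.

```python
import numpy as np

def low_rank_perturbation(m, n, r, rng):
    """Sample factors A (m x r) and B (n x r); the implied perturbation is A @ B.T / sqrt(r)."""
    A = rng.standard_normal((m, r))
    B = rng.standard_normal((n, r))
    return A, B

def perturbed_forward(W, x, A, B, sigma):
    """Compute (W + sigma * A @ B.T / sqrt(r)) @ x without forming the m x n perturbation."""
    r = A.shape[1]
    return W @ x + sigma * (A @ (B.T @ x)) / np.sqrt(r)

def stored_numbers(m, n, r):
    """Count of random numbers kept per perturbation: dense versus factored."""
    return {"full_rank": m * n, "low_rank": r * (m + n)}

rng = np.random.default_rng(0)
m, n, r = 256, 512, 4          # hypothetical layer size and rank
W = rng.standard_normal((m, n))
x = rng.standard_normal(n)

A, B = low_rank_perturbation(m, n, r, rng)
y_fast = perturbed_forward(W, x, A, B, sigma=0.1)

# Reference computation that materialises the dense perturbation (for checking only).
y_dense = (W + 0.1 * (A @ B.T) / np.sqrt(r)) @ x
```

For these hypothetical sizes the factored form stores 3,072 numbers per perturbation against 131,072 for the dense form, and both forward passes agree up to floating-point error.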

2605.30359 2026-06-01 cs.NE cs.DC cs.LG cs.PF cs.SE cs.SY eess.SY 版本更新

Kernel Foundry: A Diagnosis-driven Evolutionary Kernel Optimizer with Multi-Experts

Kernel Foundry:一种基于诊断的多专家进化内核优化器

Zixuan Huang, Da Chen, Kecheng Huang, Lihao Yin, Xing Li, Huiling Zhen, Mingxuan Yuan, Zili Shao

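Affiliations: The Chinese University of Hong Kong; Noah’s Ark Lab, Huawei Hong Kong

AI summary: Proposes Kernel Foundry, an automatic GPU kernel optimization framework that combines expert-guided initialization, multi-island evolutionary search, and structured diagnostic feedback, improving both correctness and performance.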

详情
AI中文摘要

生成高性能GPU内核仍然具有挑战性,因为需要同时保证正确性和硬件感知优化。虽然大型语言模型(LLMs)在代码生成方面显示出潜力,但它们通常无法生成既正确又高效的内核。我们提出Kernel Foundry,一种诊断驱动的进化框架,用于自动GPU内核优化。我们的方法将专家引导的、检索增强的初始化与多岛进化搜索相结合,其中候选内核通过结构化诊断反馈进行迭代细化。一个集中的经验库积累可重用的优化知识以指导后续进化,同时显式机制防止绕过内核级计算的作弊行为。在KernelBench上的实验表明,我们的方法在正确性和性能上均持续优于强基线,在Level~2上实现了高达100%的正确性。

英文摘要

Generating high-performance GPU kernels remains challenging due to the need for both correctness and hardware-aware optimization. While large language models (LLMs) show promise in code generation, they often fail to produce kernels that are both correct and efficient. We propose Kernel Foundry, a diagnosis-driven evolutionary framework for automatic GPU kernel optimization. Our method combines expert-guided, retrieval-augmented initialization with a multi-island evolutionary search, where candidate kernels are iteratively refined using structured diagnostic feedback. A centralized experience library accumulates reusable optimization knowledge to guide subsequent evolution, while explicit mechanisms prevent cheating behaviors that bypass kernel-level computation. Experiments on KernelBench show that our method consistently improves both correctness and performance over strong baselines, achieving up to 100% correctness on Level 2.

2605.30358 2026-06-01 cs.LG quant-ph 版本更新

QASM-Eval: A Dataset to Train and Evaluate LLMs on OpenQASM-3 Beyond Quantum Circuits

QASM-Eval:用于训练和评估LLM在超越量子电路的OpenQASM-3上的数据集

Zhenxiao Fu, Lei Jiang, Fan Chen

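Affiliations: Indiana University Bloomington

AI summary: Addresses the lack of training and evaluation resources for LLMs on hardware-level OpenQASM-3 programming by building a dataset with an expert-verified test set and a training set covering classical logic, timing scheduling, pulse control, and more, validated automatically with an extended verifier; experiments show that fine-tuning significantly improves LLM performance.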

详情
AI中文摘要

量子计算仍处于含噪中等规模量子(NISQ)时代,其性能受到噪声的高度限制。解决这一限制通常需要超越门序列电路规范的硬件相关能力,包括用于量子纠错(QEC)的中间电路测量和经典反馈、用于动态解耦(DD)的精确时序控制,以及用于校准的脉冲级波形访问。OpenQASM-3正是为了暴露这些能力而引入的,提供了硬件级编程接口。然而,尽管大语言模型在代码生成方面取得了快速进展,目前仍没有专门设计用于训练和评估LLM在涉及高级硬件导向特性的OpenQASM-3程序上的数据集。为填补这一空白,我们推出了QASM-Eval,这是首个专门设计用于训练和评估LLM在OpenQASM-3上的全面数据集。QASM-Eval并非专注于量子算法设计或推理,而是明确针对该语言的硬件相关特性。QASM-Eval包含一个由专家验证的100个任务的测试集和一个4000个任务的训练集,系统性地涵盖了经典逻辑、时序调度、脉冲控制以及复杂的实际工作流程。为了自动验证生成的程序,我们使用扩展的验证器检查语法、量子态和程序时间线。我们的评估表明,虽然最先进的LLM在OpenQASM-3编码任务上表现困难,但在QASM-Eval上进行针对性微调后取得了显著提升。QASM-Eval为加速开发NISQ时代硬件相关量子编程的可靠LLM助手提供了关键的基准和训练基础。数据和代码:https://github.com/fuzhenxiao/QASM-Eval

英文摘要

Quantum computing remains in the Noisy Intermediate-Scale Quantum (NISQ) era, where performance is highly constrained by noise. Addressing this limitation often requires hardware-facing capabilities beyond gate-sequence circuit specification, including mid-circuit measurement and classical feedback for quantum error correction (QEC), precise timing control for dynamical decoupling (DD), and pulse-level waveform access for calibration. OpenQASM-3 was introduced to expose exactly these capabilities, providing a hardware-level programming interface. However, despite the rapid progress of large language models in code generation, there is still no dataset specifically designed to train and evaluate LLMs on OpenQASM-3 programs that involve its advanced hardware-oriented features. To address this gap, we introduce QASM-Eval, the first comprehensive dataset designed to train and evaluate LLMs on OpenQASM-3. Rather than focusing on quantum algorithm design or reasoning, QASM-Eval explicitly targets the language's hardware-facing features. QASM-Eval comprises an expert-verified test set of 100 tasks and a training set of 4,000 tasks, systematically covering classical logic, timing scheduling, pulse control, and complex real-world workflows. To automatically validate generated programs, we check syntax, quantum states, and the program timeline using an extended verifier. Our evaluation reveals that while state-of-the-art LLMs struggle heavily with OpenQASM-3 coding tasks, targeted fine-tuning on QASM-Eval yields significant gains. QASM-Eval provides a crucial benchmark and training foundation to accelerate the development of reliable LLM assistants for hardware-facing quantum programming in the NISQ era. Data and code: https://github.com/fuzhenxiao/QASM-Eval

2605.28420 2026-06-01 cs.LG 版本更新

Conveyance: A Versatile Framework for Learning in Structured Class Spaces

Conveyance: 结构化类空间学习的通用框架

Yasser Taha, Grégoire Montavon, Nils Körber

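Affiliations: Centre for Artificial Intelligence in Public Health Research, Robert Koch Institute; Berlin Institute for the Foundations of Learning and Data; Institute for AI in Medicine, Charité Universitätsmedizin Berlin

AI summary: To address standard loss functions ignoring structural relationships between classes, proposes Conveyance, a classification method that encodes graph-like relations by maximizing two margins over distinct class partitions, matching or exceeding specialized baselines on hierarchical classification, ordinal regression, and multiple instance learning.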

详情
AI中文摘要

尽管机器学习架构已迅速发展以处理复杂数据,但在许多实际应用中,像交叉熵这样的损失函数仍然大多与结构无关。然而,这些标准损失的“类对称”性质从根本上限制了机器学习模型利用类间结构关系的能力,尤其是在面对结构化噪声时。我们提出了Conveyance,一种针对结构化类空间的新分类方法及相关损失函数。它允许用户编码类之间的图结构关系,而无需定义复杂的联合分布或手动调整效用矩阵。从技术上讲,我们的损失函数通过最大化不同类划分上的两个间隔来运作,同时保持单调性和部分凸性等正式性质。我们通过将方法应用于层次分类、序数回归和多实例学习来展示其通用性和有效性。在这些任务中,Conveyance要么匹配要么超过专用基线的性能,从而为结构化类空间提供了统一解决方案。

英文摘要

While machine learning (ML) architectures have evolved rapidly to account for complex data, loss functions like cross-entropy remain mostly structure-agnostic in many real-world applications. However, the "class-symmetric" nature of these standard losses fundamentally limits the ability of ML models to exploit structural relationships between classes, particularly when facing structured noise. We propose Conveyance, a new classification approach and associated loss function tailored to structured class spaces. It allows users to encode graph-like relations between classes without having to define complex joint distributions or manually tune utility matrices. Technically, our loss function operates by maximizing two separate margins over distinct class partitions, while preserving formal properties such as monotonicity and partial convexity. We demonstrate the versatility and effectiveness of our method by applying it to hierarchical classification, ordinal regression, and multiple instance learning. Across these tasks, Conveyance either matches or exceeds the performance of specialized baselines, thereby offering a unified solution for structured class spaces.

2605.28162 2026-06-01 quant-ph cs.LG 版本更新

Learning Logical Operations for Arbitrary Quantum Error Correction Codes

学习任意量子纠错码的逻辑操作

Nico Meyer, Christopher Mutschler, Dominik Seuß, Andreas Maier, Daniel D. Scherer

Affiliations: Fraunhofer IIS, Fraunhofer Institute for Integrated Circuits IIS, Nuremberg, Germany; Pattern Recognition Lab, Friedrich-Alexander-University Erlangen-Nuremberg, Erlangen, Germany; University of Technology Nuremberg (UTN), Nuremberg, Germany; Center for Artificial Intelligence (CAIRO), Technical University of Applied Sciences Würzburg-Schweinfurt, Würzburg, Germany

AI summary: Proposes a learning-based framework that, given only an encoding circuit, constructs logical operations with structural properties such as transversality or shallow depth for arbitrary quantum error-correcting codes, and extends it to variational early fault-tolerant quantum computing (VarEFTQC), a method for co-designing non-additive encodings and logical gate sets.

Comments 23 pages, 12 figures, 5 tables

详情
AI中文摘要

逻辑操作对于量子纠错码内的量子计算至关重要。然而,发现其物理实现具有挑战性,特别是对于缺乏稳定子描述的非加性码。我们提出了一个通用的基于学习的框架,仅给定编码电路,即可构造逻辑操作的物理实现,同时强制执行诸如横向性或浅深度等结构性质。我们的方法通过重新发现标准稳定子码的已知逻辑操作得到验证。然后,我们将其扩展为协同设计过程,称为变分早期容错量子计算(VarEFTQC),该过程针对给定噪声模型定制非加性编码,并强制执行所需的逻辑门集,例如横向IQP型族或低深度通用集。一个软件库实现了完整的学习流程,包括损失函数变体、ansatz族和优化例程。这些结果共同将VarEFTQC定位为发现用于早期容错量子计算的硬件自适应逻辑工具的实用工具。

英文摘要

Logical operations are essential for quantum computation within quantum error-correcting codes. However, discovering their physical realizations is challenging, especially for non-additive codes that lack a stabilizer description. We present a general learning-based framework that, given only an encoding circuit, constructs physical implementations of logical operations while enforcing structural properties such as transversality or shallow depth. Our approach is validated by rediscovering known logical operations of standard stabilizer codes. We then extend it to a co-design procedure, dubbed variational early fault-tolerant quantum computing (VarEFTQC), which tailors non-additive encodings to a given noise model and enforces desired logical gate sets, such as transversal IQP-type families or low-depth universal sets. A software library implements the complete learning pipeline, including loss-function variants, ansatz families, and optimization routines. Together, these results position VarEFTQC as a practical tool for discovering hardware-adapted logical gadgets for early fault-tolerant quantum computing.

2605.28068 2026-06-01 cs.LG 版本更新

PINE: Pruning Boosted Tree Ensembles with Conformal In-Distribution Prediction Equivalence

PINE:基于共形分布内预测等价的剪枝提升树集成

Haruki Yajima, Yusuke Matsui

发表机构 * The University of Tokyo(东京大学)

AI总结 提出PINE方法,通过共形校准控制分布内区域,在保持预测等价的同时将剪枝压缩比提升高达30%。

Comments Accepted to ICML 2026

详情
AI中文摘要

树集成是具有强预测性能和可解释性的机器学习模型,广泛用于表格数据。树集成的标准剪枝方法通常优化精度-压缩权衡,可能会改变部分预测,从而影响决策一致性。忠实剪枝方法通过在整个输入空间上保持预测等价来解决这个问题,但这一要求导致较低的压缩比。我们提出PINE,一种在分布内区域提供强保证的剪枝方法。PINE在该区域内保持预测等价,并通过共形校准使用单个参数$α$控制区域大小。在12个公开表格数据集上的实验表明,PINE在保持与现有忠实剪枝方法相当的预测水平的同时,将压缩比提高了高达30%。

英文摘要

Tree ensembles are machine learning models with strong predictive performance and interpretability, and remain widely used for tabular data. Standard pruning methods for tree ensembles typically optimize an accuracy-compression trade-off and may change a subset of predictions, potentially compromising decision consistency. Faithful pruning methods address this issue by preserving prediction equivalence over the entire input space, but this requirement leads to lower compression ratios. We propose PINE, a pruning method that provides strong guarantees within an in-distribution region. PINE preserves prediction equivalence within this region and controls the region size using a single parameter $\alpha$ via conformal calibration. Experiments on 12 public tabular datasets show that PINE improves the compression ratio by up to 30% while preserving predictions at a comparable level to existing faithful pruning methods.

2605.27912 2026-06-01 cs.CR cs.DS cs.LG 版本更新

Privately Estimating Monotone Statistics in Polynomial Time

多项式时间内私有估计单调统计量

Gavin Brown, Ephraim Linder, Mahbod Majid, Vikrant Singhal

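Affiliations: University of Wisconsin–Madison; Boston University; MIT Mathematics; University of Copenhagen

AI summary: For differentially private estimation of monotone statistics, proposes an improved subsample-and-aggregate algorithm that saves a factor of t in sample complexity at the cost of a factor of e^t in running time, and proves that it is essentially optimal.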

详情
AI中文摘要

我们研究用于估计单调统计量的高效差分隐私算法,即那些在新观测值加入时单调的统计量。我们研究的起点是子采样-聚合:一种经典范式,它将数据集划分为多个块,估计每个块上的统计量,然后私有地聚合这些估计。虽然这种方法实用且具有通用性,但它相当耗费数据。我们针对单调统计量类改进了这一框架——与子采样-聚合相比,我们的算法在样本复杂度上节省了因子$t$,而在运行时间上付出了因子$e^t$的代价,其中$t>0$是一个可调参数。我们通过查询复杂度的下界来补充我们的结果,表明我们的算法在此任务上本质上是最优的。作为一个应用,我们在私有特征值估计、私有损失估计以及私有估计高维模型(例如线性回归)中的单个参数方面获得了改进的结果。

英文摘要

We study efficient differentially private algorithms for estimating monotone statistics, i.e., statistics that are monotone under the addition of new observations. The starting point for our investigation is subsample-and-aggregate: a classical paradigm that partitions the dataset into blocks, estimates the statistic on each block, and then privately aggregates the estimates. While practical and generically applicable, this approach is quite data-hungry. We improve upon this framework for the class of monotone statistics -- compared to subsample-and-aggregate, our algorithms save a factor of $t$ in sample complexity and pay a factor of $e^t$ in running time, where $t>0$ is a tunable parameter. We complement our results with a query-complexity lower bound, showing that our algorithms are essentially optimal for this task. As an application, we obtain improved results for private eigenvalue estimation, private loss estimation, and privately estimating a single parameter of a high-dimensional model, e.g., in linear regression.
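
The classical subsample-and-aggregate baseline described in this abstract can be sketched as follows. This is the textbook paradigm only, not the paper's improved algorithm. Several choices here are assumptions for illustration: block estimates are clipped to a known range [lo, hi], the private aggregator is a Laplace-noised mean, and neighboring datasets differ by replacing one record (so one record affects exactly one block).

```python
import numpy as np

def subsample_and_aggregate(data, statistic, k, eps, lo, hi, rng):
    """Estimate `statistic` privately via k disjoint blocks and a noisy clipped mean."""
    data = np.asarray(data)
    # Assign records to blocks with a data-independent random permutation.
    perm = rng.permutation(len(data))
    blocks = np.array_split(data[perm], k)
    # Non-private estimate on each block, clipped so each lies in [lo, hi].
    estimates = np.array([np.clip(statistic(b), lo, hi) for b in blocks])
    # Replacing one record changes one block estimate by at most (hi - lo),
    # so the mean of k estimates changes by at most (hi - lo) / k.
    sensitivity = (hi - lo) / k
    noise = rng.laplace(loc=0.0, scale=sensitivity / eps)
    return float(estimates.mean() + noise)

rng = np.random.default_rng(0)
data = rng.uniform(0.0, 1.0, size=10_000)

# The maximum is monotone under the addition of new observations.
est = subsample_and_aggregate(data, np.max, k=100, eps=1.0, lo=0.0, hi=1.0, rng=rng)
```

The sketch also makes the "data-hungry" remark concrete: each block estimate sees only n/k records, so accuracy is that of a sample k times smaller than the full dataset.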

2605.27557 2026-06-01 cs.LG stat.ML 版本更新

The Fundamental Limits of Fraud Detection in Card Payment Networks

银行卡支付网络中欺诈检测的基本极限

Gaurav Dhama

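Affiliations: Mastercard

AI summary: Formalizes payment authorization as a sequential decision problem with delayed, censored, corrupted, and counterfactually missing feedback, and derives a minimax regret lower bound showing that ecosystem information quality, rather than model complexity, is the fundamental bottleneck in fraud detection.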

详情
AI中文摘要

银行卡支付欺诈检测通常被框架化为一个监督分类问题。尽管这种方法已经取得了实际进展,但尽管模型架构取得了重大进展,改进仍然只是渐进的。我们认为,这主要不是函数逼近或优化的失败,而是支付生态系统固有的结构性信息损害的结果。我们将银行卡授权形式化为一个具有延迟、审查、污染和反事实缺失反馈的序贯决策问题。我们推导出一个极小极大遗憾下界,表明这些损害在可达学习率的分母中相乘。该下界表明,提高发卡机构报告质量或减少审查可以比增加模型复杂度更大幅度地降低遗憾下限。我们还表明,发卡机构之间的异质性会进一步恶化可学习性,超出平均损害率所暗示的程度。本文贡献了一个理论,解释了为什么支付网络中的欺诈检测本质上比标准在线学习设置更困难,将生态系统信息质量确定为关键瓶颈,并为优先投资于报告基础设施、争议处理质量和选择性探索提供了理论基础。本文以理论为先,不依赖专有交易数据。

英文摘要

Card payment fraud detection is usually framed as a supervised classification problem. Although this approach has generated practical progress, improvement has remained incremental despite major advances in model architecture. We argue that this is not mainly a failure of function approximation or optimization, but a consequence of structural information impairments inherent to the payment ecosystem. We formalize card authorization as a sequential decision problem with delayed, censored, corrupted, and counterfactually missing feedback. We derive a minimax regret lower bound showing that these impairments enter multiplicatively in the denominator of the achievable learning rate. The bound implies that improving issuer reporting quality or reducing censorship can yield larger reductions in the regret floor than increasing model complexity. We also show that heterogeneity across issuers worsens learnability beyond what average impairment rates suggest. The paper contributes a theory of why fraud detection in payment networks is fundamentally harder than in standard online learning settings, identifies ecosystem information quality as the key bottleneck, and provides a theoretical basis for prioritizing investments in reporting infrastructure, dispute process quality, and selective exploration. The paper is theory-first and does not rely on proprietary transaction data.

2605.27355 2026-06-01 cs.AI cs.CL cs.LG 版本更新

Alignment Tampering: How Reinforcement Learning from Human Feedback Is Exploited to Optimize Misaligned Biases

对齐篡改:人类反馈强化学习如何被利用以优化错位偏见

Dongyoon Hahm, Dylan Hadfield-Menell, Kimin Lee

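Affiliations: MIT

AI summary: Introduces alignment tampering, a vulnerability in which the LLM being aligned influences the preference dataset so that RLHF amplifies undesired behaviors; experiments show amplification across a range of biases, and existing mitigation methods struggle to resolve the problem without sacrificing response quality.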

Comments Accepted at ICML 2026, Source code: https://alignment-tampering.github.io/

详情
AI中文摘要

人类反馈强化学习(RLHF)是将大型语言模型(LLM)与人类偏好对齐的标准方法。在本工作中,我们引入对齐篡改,这是一种潜在漏洞,即正在对齐的LLM影响偏好数据集,导致RLHF放大不良行为。这源于RLHF的核心局限性:(1)偏好数据集由LLM自身的输出构建,使其能够影响它们;(2)成对比较仅指示哪个响应更好,而不说明原因。这些局限性可能被利用以导致对齐篡改。例如,如果LLM以更高质量生成有偏见的响应,标注者会基于质量偏好它们。然而,偏好标签无法区分质量与偏见,奖励模型继承了这一局限性。通过强化学习或最佳N采样优化此类奖励可能放大错位偏见。我们的实验展示了跨多种偏见的放大:从关键词偏见到宣传(例如性别歧视)、品牌推广和工具性目标寻求。缓解仍然具有挑战性,因为现有的鲁棒RLHF技术无法在不牺牲响应质量的情况下完全解决对齐篡改。这些发现揭示了当前RLHF的结构性漏洞,并强调了防止此漏洞的必要性。项目页面:https://alignment-tampering.github.io/

英文摘要

Reinforcement Learning from Human Feedback (RLHF) is the standard method to align Large Language Models (LLMs) with human preferences. In this work, we introduce alignment tampering, a potential vulnerability where the LLM undergoing alignment influences the preference dataset, causing RLHF to amplify undesired behaviors. This arises from core limitations of RLHF: (1) preference datasets are constructed from the LLM's own outputs, allowing it to influence them, and (2) pairwise comparisons only indicate which response is better, not why. These limitations can be exploited to cause alignment tampering. For example, if an LLM generates biased responses with higher quality, annotators will prefer them based on quality. However, preference labels do not distinguish quality from bias, and the reward model inherits this limitation. Optimizing such rewards through reinforcement learning or best-of-N sampling can amplify misaligned biases. Our experiments demonstrate amplification across diverse biases: from keyword bias to propaganda (e.g., sexism), brand promotion, and instrumental goal-seeking. Mitigation remains challenging, as existing techniques for robust RLHF fail to fully resolve alignment tampering without sacrificing response quality. These findings reveal structural vulnerabilities of current RLHF and emphasize the need to prevent this vulnerability. Project page: https://alignment-tampering.github.io/

2605.26929 2026-06-01 cs.LG 版本更新

When Muon Optimizer Meets Adversarial Training: A Theoretical and Empirical Study

当Muon优化器遇到对抗训练:理论与实证研究

Jun Yan, Weiquan Huang, Jiankai Zuo, Yujian Mo, Xi Fang, Chengliang Wu, Zeming Wei

Affiliations: IT College, Shanghai Ocean University; School of Computer Science and Technology, Tongji University; SEIE, Suzhou University of Science and Technology; DP Tech; School of Mathematical Sciences, Peking University

AI summary: Through theoretical and empirical study, examines the Muon optimizer (orthogonalized updates via approximate polar decomposition) in adversarial training, finding that it limits spectral-norm growth of matrix updates, outperforms AdamW on CNNs and ViTs, and is competitive with SGD on CNNs.

详情
AI中文摘要

对抗训练(AT)仍然是最可靠的对抗攻击经验防御方法之一。其鲁棒性关键取决于底层极小极大目标如何优化。在实践中,随机梯度下降(SGD)优化器仍然是AT的默认优化选择,而自适应优化器通常能改善标准训练,但可能产生较差的鲁棒性。最近,Muon优化器通过近似极分解对矩阵值更新进行正交化,在内存成本与SGD相当的情况下,在大规模训练中取得了显著成功。这提出了一个与安全相关的问题:正交化优化能否在强异质威胁模型下改进AT?针对这一问题,我们进行了全面的理论和实证研究。理论上,我们表明Muon对矩阵更新施加了谱范数稳定性上限,限制了训练动态中不受控制的谱增长,而无需显式缩小学习权重。实证上,在五种架构和三种$\ell_p$威胁模型($\ell_\infty$、$\ell_1$、$\ell_2$)及其联合下,Muon在CNN上与SGD竞争力相当,并在CNN和ViT上显著优于AdamW。这些结果将优化器几何识别为对抗训练中的一个安全相关因素,同时阐明了正交化更新有益的经验场景。总体而言,我们的发现强调了优化器设计是AT的一个安全关键组成部分。

英文摘要

Adversarial training (AT) remains one of the most reliable empirical defenses against adversarial attacks. Its robustness critically depends on how the underlying min-max objective is optimized. In practice, Stochastic Gradient Descent (SGD) remains the default optimizer for AT, whereas adaptive optimizers often improve standard training but may yield inferior robustness. Recently, the Muon optimizer, which orthogonalizes matrix-valued updates via an approximate polar decomposition, has achieved notable success in large-scale training at a memory cost comparable to SGD. This raises a security-relevant question: can orthogonalized optimization improve AT under strong and heterogeneous threat models? Focusing on this problem, we conduct a comprehensive theoretical and empirical study. Theoretically, we show that Muon imposes a spectral-norm stability ceiling on matrix updates, limiting uncontrolled spectral growth in the training dynamics without explicitly shrinking the learned weights. Empirically, across five architectures and three $\ell_p$ threat models ($\ell_\infty$, $\ell_1$, $\ell_2$) and their union, Muon is competitive with SGD on CNNs and substantially outperforms AdamW on both CNNs and ViTs. These results identify optimizer geometry as a security-relevant factor in adversarial training, while clarifying the empirical regimes in which orthogonalized updates are beneficial. Overall, our findings highlight optimizer design as a security-critical component of AT.
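
The phrase "orthogonalizes matrix-valued updates via an approximate polar decomposition" can be made concrete with a short sketch. It uses the classical cubic Newton-Schulz iteration as a stand-in; Muon implementations typically use a tuned polynomial with fewer steps, which this sketch does not reproduce. The function name and step count are illustrative assumptions.

```python
import numpy as np

def orthogonalize(G, steps=30):
    """Approximate the polar factor U @ V.T of G = U @ diag(s) @ V.T by Newton-Schulz."""
    # The Frobenius norm upper-bounds the spectral norm, so after scaling every
    # singular value lies in (0, 1], inside the iteration's convergence region.
    X = G / (np.linalg.norm(G) + 1e-12)
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T  # iterate on the wide orientation so X @ X.T is the smaller matrix
    for _ in range(steps):
        # Each singular value s is mapped to 1.5 * s - 0.5 * s**3, which drives it to 1.
        X = 1.5 * X - 0.5 * (X @ X.T) @ X
    return X.T if transposed else X

rng = np.random.default_rng(0)
G = rng.standard_normal((64, 32))   # hypothetical gradient (or momentum) matrix
O = orthogonalize(G)

# Exact polar factor from an SVD, for comparison only.
U, _, Vt = np.linalg.svd(G, full_matrices=False)
singular_values = np.linalg.svd(O, compute_uv=False)
```

Because the output has all singular values close to 1, its spectral norm is close to 1 whatever the scale of the input, which is one way to read the "spectral-norm stability ceiling" on updates mentioned in the abstract.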

2605.26711 2026-06-01 cs.CL cs.LG 版本更新

The Need for an External Observer Formalizing the Sufficiency Gap: A Mathematical Extension of Mixture Identifiability and Contextual Grounding in Sequence Models

外部观察者的必要性:充分性差距的形式化——序列模型中混合可识别性与上下文基础化的数学扩展

Francesco Corielli

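AI summary: Constructs a binary mixed-regime process to formalize the sufficiency gap caused by an unobserved latent state, introduces an auxiliary signal to establish a contextual dominance threshold, and shows that temperature scaling cannot restore missing context and that external observers or verifiers are necessary in high-stakes domains.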

详情
AI中文摘要

我们构建了一个二元混合过程,其中一个确定性文本机制和一个随机机制由未观测的隐状态控制。即使一个理想的无容量限制的序列预测器能够精确恢复纯文本边际分布,当观测到的前缀与错误的隐状态兼容时,它也可能变得过度自信。由此产生的熵差并非普通的优化误差;而是由未观测状态上的边缘化导致的充分性差距。然后,我们通过一个保真度为$γ∈[1/2,1]$的辅助二元信号形式化检索、工具使用和外部基础化。由此产生的贝叶斯更新给出了一个上下文主导阈值:当纠正信号的保真度超过纯文本后验权重中分配给误导机制的部分时,该信号恰好反转由文本历史诱导的后验几率。该阈值减小了充分性差距,但通常不能完全消除;完全消除需要相关隐状态的完美揭示或等效的验证机制。该分析阐明了为什么温度缩放无法恢复缺失的上下文,为什么基础化机制必须既信息丰富又可被模型学习使用,以及为什么在高风险领域自主序列模型需要结构上解耦的观察者或验证器。

英文摘要

We construct a binary mixed-regime process with one deterministic textual regime and one random regime governed by an unobserved latent state. Even an ideal infinite-capacity sequence predictor that exactly recovers the text-only marginal law can become overconfident when the observed prefix is compatible with the wrong latent regime. The resulting entropy difference is not an ordinary optimization error; it is a sufficiency gap caused by marginalization over an unobserved state. We then formalize retrieval, tool use, and external grounding through an auxiliary binary signal with fidelity $\gamma \in [1/2, 1]$. The resulting Bayesian update yields a contextual dominance threshold: a corrective signal reverses the posterior odds induced by the textual history exactly when its fidelity exceeds the text-only posterior weight assigned to the misleading regime. This threshold reduces, but does not generally eliminate, the sufficiency gap; complete closure requires perfect revelation of the relevant latent state or an equivalent verification mechanism. The analysis clarifies why temperature scaling cannot restore missing context, why grounding mechanisms must be both informative and learnably usable by the model, and why autonomous sequence models require structurally decoupled observers or verifiers in high-stakes domains.
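The contextual dominance threshold stated in this abstract can be checked with a two-state Bayes update. The sketch below is a minimal illustration and assumes the binary signal is conditionally independent of the text given the latent regime and points at the true regime with probability `gamma`; the function names and the symbol `p_mislead` are hypothetical and not taken from the paper.

```python
def posterior_after_signal(p_mislead: float, gamma: float) -> float:
    """Posterior weight on the misleading regime after a corrective signal.

    p_mislead: text-only posterior weight on the misleading regime, in (0, 1).
    gamma:     signal fidelity in [1/2, 1]; the signal is assumed to point
               away from the misleading regime.
    """
    wrong = p_mislead * (1.0 - gamma)   # misleading regime true, signal erred
    right = (1.0 - p_mislead) * gamma   # other regime true, signal correct
    return wrong / (wrong + right)

def signal_dominates(p_mislead: float, gamma: float) -> bool:
    # The posterior odds flip exactly when the misleading regime drops below 1/2.
    return posterior_after_signal(p_mislead, gamma) < 0.5

# Text history puts weight 0.8 on the misleading regime (illustrative value).
flips_with_strong_signal = signal_dominates(0.8, 0.9)   # fidelity above 0.8
flips_with_weak_signal = signal_dominates(0.8, 0.7)     # fidelity below 0.8
```

Under these assumptions the posterior odds are $p(1-\gamma) / ((1-p)\gamma)$, which fall below 1 exactly when $\gamma > p$, matching the threshold described in the abstract.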

2605.26502 2026-06-01 cs.LG physics.optics 版本更新

PRISM: Position-encoded Regressive Inverse Spectral Model for Multilayer Thin-Film Design

PRISM:用于多层薄膜设计的位置编码回归逆光谱模型

Runtian Wang, Renhao Xue, Baige Chen, Hao Wu

Affiliations: Independent Researcher; Work does not relate to position at Amazon

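AI summary: Proposes PRISM, a decoder-only autoregressive transformer that jointly predicts discrete material selection and continuous thickness regression to solve the inverse design problem for multilayer thin-film optical coatings, reducing MAE by over 50% relative to other transformer baselines with only one-fifth of the parameters.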

Comments 8 pages, 3 figures, Proceedings of the AI4Physics Workshop at the 43rd International Conference on Machine Learning (AI4Physics@ICML 2026)

详情
AI中文摘要

多层薄膜光学涂层设计的逆问题是一个复杂的组合-连续优化挑战。我们提出了PRISM(位置编码回归逆光谱模型),一种统一的解码器仅自回归变压器,通过在单个骨干网络中联合预测离散材料选择和连续厚度回归,简化了这一过程。PRISM引入了两个主要的架构创新:(1)光谱前缀条件化,利用标准前缀令牌进行上下文目标注入;(2)累积深度旋转位置嵌入,将连续厚度直接编码到位置表示中,以保留堆栈的物理空间关系。我们的基准测试表明,PRISM-13M模型相比其他变压器基线将MAE降低了50%以上,同时仅使用五分之一的参数。此外,一个44M参数的变体在我们的分布内验证基准上实现了最先进的性能(MAE = 0.010),并且运行速度显著快于模拟退火,为经典优化方法提供了一种高效的替代方案。

英文摘要

The inverse problem of multilayer thin-film optical coating design represents a complex combinatorial-continuous optimization challenge. We present PRISM (Position-encoded Regressive Inverse Spectral Model), a unified decoder-only autoregressive transformer that streamlines this process by jointly predicting discrete material selection and continuous thickness regression within a single backbone. PRISM introduces two primary architectural innovations: (1) spectrum prefix conditioning, which utilizes standard prefix tokens for in-context target injection, and (2) cumulative-depth Rotary Position Embeddings, which encode continuous thickness directly into the positional representation to preserve the physical spatial relationships of the stack. Our benchmarks demonstrate that a PRISM-13M model reduces MAE by over 50% compared to other transformer baselines while utilizing only one-fifth of the parameters. Furthermore, a 44M-parameter variant achieves state-of-the-art performance (MAE = 0.010) on our in-distribution validation benchmark and operates significantly faster than simulated annealing, offering a highly efficient alternative to classical optimization methods.
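One way to picture the cumulative-depth Rotary Position Embeddings described above is to feed a real-valued depth, rather than an integer token index, into a standard rotary embedding. The sketch below is a guess at the general idea under stated assumptions: depth is measured at the top of each layer, thicknesses are already scaled to a convenient unit, and the frequency base of 10000 follows the common RoPE convention. None of these details, nor the function names, are confirmed by the abstract.

```python
import math

def cumulative_depths(thicknesses):
    """Depth at the top of each layer, given per-layer thicknesses."""
    depths, total = [], 0.0
    for t in thicknesses:
        depths.append(total)
        total += t
    return depths

def rotate(vec, position, base=10000.0):
    """Apply a rotary embedding to an even-length vector at a real-valued position."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)   # one frequency per coordinate pair
        c, s = math.cos(theta), math.sin(theta)
        out += [vec[i] * c - vec[i + 1] * s, vec[i] * s + vec[i + 1] * c]
    return out

# Hypothetical three-layer stack with thicknesses in arbitrary scaled units.
depths = cumulative_depths([1.0, 0.5, 0.75])
query_at_layer_2 = rotate([1.0, 2.0, 3.0, 4.0], depths[2])
```

As with ordinary rotary embeddings, the inner product of two rotated vectors depends only on the difference of their positions, so in this sketch attention scores would depend on the physical distance between layers rather than on how many layers lie between them.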

2605.26396 2026-06-01 cs.AI cs.CL cs.LG 版本更新

Advancing Creative Physical Intelligence in Large Multimodal Models

推进大型多模态模型中的创造性物理智能

Cheng Qian, Hyeonjeong Ha, Jiayu Liu, Jeonghwan Kim, Emre Can Acikgoz, Bingxuan Li, Kunlun Zhu, Jiateng Liu, Aditi Tiwari, Zhenhailong Wang, Xiusi Chen, Mahdi Namazifar, Heng Ji

Affiliations: UIUC; Amazon

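AI summary: To address the lack of visually grounded creative tool use by large multimodal models in open-ended environments, the paper introduces the MM-CreativityBench benchmark and a preference-learning-based affordance-grounded alignment method, which improves entity and part selection and reduces hallucination.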

Comments 51 Pages, 9 Figures, 7 Tables, Previous Work CreativityBench: arXiv:2605.02910

详情
AI中文摘要

大型多模态模型(LMMs)在感知和推理方面取得了快速进展;然而,目前尚不清楚这些能力是否能够泛化到在开放式环境中发现基于视觉的解决方案,超越模式识别。在此类场景中,智能需要的不仅仅是回答明确的问题:它涉及识别场景中的元素如何以非显而易见但物理上可行的方式被重新利用。这种创造性问题解决形式是人类智能的核心,但在当前基准测试中基本上未得到测试。为了评估这一能力,我们引入了MM-CreativityBench,这是一个用于在视觉丰富、物理受限的环境中进行基于可操作性的创造性工具使用的基准。每个实例呈现一个场景图像,包含候选实体及其部件的结构化视图,从而能够对模型如何迭代检查场景、识别相关可操作性以及组合视觉和物理上可行的解决方案进行细粒度、交互式评估。我们的实验表明,当前的LMMs往往表现不佳,不是由于缺乏生成能力,而是因为它们无法维持基于具身的探索。模型经常忽略相关实体,对关键部件检查不足,或幻觉出图像中不存在的属性。受此失败模式的启发,我们提出了具身对齐,将创造性工具使用视为一个偏好学习问题。使用直接偏好优化,我们鼓励模型偏好基于视觉证据的属性-可操作性推理,而非幻觉替代方案。此外,我们结合从可操作性知识库中获得的监督,以指导更广泛的实体探索和多轮规划。我们的结果显示,在正确选择实体和部件方面取得了持续改进,同时大幅减少了幻觉和与具身相关的错误。

英文摘要

Large multimodal models (LMMs) have rapidly advanced in perception and reasoning; however, it remains unclear whether these capabilities generalize to discovering visually grounded solutions in open-ended environments, beyond pattern recognition. In such settings, intelligence requires more than answering well-posed questions: it involves identifying how elements in a scene can be repurposed in non-obvious yet physically feasible ways. This form of creative problem-solving is central to human intelligence, but remains largely untested in current benchmarks. To evaluate this ability, we introduce MM-CreativityBench, a benchmark for affordance-grounded creative tool use in visually rich, physically constrained environments. Each instance presents a scenario image with structured views of candidate entities and their parts, enabling fine-grained, interactive evaluation of how models iteratively inspect the scene, identify relevant affordances, and compose visually and physically grounded solutions. Our experiments show that current LMMs often fall short, not due to lack of generative capability, but because they do not sustain grounded exploration. Models often overlook relevant entities, under-examine critical parts, or hallucinate attributes not grounded in the image. Motivated by this failure mode, we propose affordance-grounded alignment, which casts creative tool use as a preference learning problem. Using Direct Preference Optimization, we encourage models to prefer attribute-affordance reasoning grounded in visual evidence over hallucinated alternatives. In addition, we incorporate supervision derived from an affordance knowledge base to guide broader entity exploration and multi-turn planning. Our results show consistent gains in selecting the correct entities and parts, while substantially reducing hallucination and grounding-related errors.

2605.26183 2026-06-01 q-bio.QM cs.LG 版本更新

What Molecular Structure Cannot Tell Us: A Taxonomy of Explainability Gaps in GNN-Based Drug Toxicity Prediction

分子结构无法告诉我们的事:基于GNN的药物毒性预测中可解释性差距的分类

Juergen Dietrich

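AI summary: Introduces an operational taxonomy that systematically analyzes the explainability gaps arising from structural information limits in GNN-based drug toxicity prediction and, using aspirin as a case study, finds that molecular structure explains only about 45% of known adverse effects.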

Comments 13 pages

详情
AI中文摘要

并非所有临床相关的不良反应都能从分子图中结构推断出来——无论模型质量或架构复杂性如何。本研究引入了一个操作分类法,用于描述独立于所用学习算法的结构信息限制,这些限制阻碍了基于结构的毒性预测。图神经网络(GNN)已成为分子毒性预测的自然方法,直接作用于原子连接性,避免了固定长度指纹固有的信息损失。然而,药物已知药理学特征中实际可从分子结构推断的比例仍未被系统探索。以乙酰水杨酸(ASA,阿司匹林)——药理学中表征最全面的药物之一——作为模型化合物进行系统性案例研究。在Tox21基准上训练消息传递神经网络(MPNN),并应用GNNExplainer表征原子级归因。结果表明,分子结构解释了约45%(5/11)的已知ASA不良反应。引入了一个四类差距分类法(GAP-1至GAP-4),区分了原则上不可编码的效应、由非随机缺失(MNAR)机制引起的数据差距、检测面板不匹配和表示误差。通过系统的ChEMBL查询(42个已记录检测,0个可检索生物活性条目)经验量化了MNAR差距。注意力池化实验将表示误差定位到MPNN消息传递层而非聚合步骤。该差距分类法对药物安全信号检测和监管框架(包括良好药物警戒实践(GVP)指南和新方法论(NAMs))具有直接影响。在伴随的DDI消融研究中确认了所识别的结构限制。

英文摘要

Not all clinically relevant adverse effects are structurally inferable from molecular graphs, regardless of model quality or architectural complexity. This study introduces an operational taxonomy of the structural information limits that prevent structure-based toxicity prediction, independent of the learning algorithm employed. Graph Neural Networks (GNNs) have emerged as a natural approach for molecular toxicity prediction, operating directly on atomic connectivity without the information loss inherent to fixed-length fingerprints. However, the fraction of a drug's known pharmacological profile that is actually inferable from molecular structure remains systematically underexplored. A systematic case study uses acetylsalicylic acid (ASA, Aspirin), one of the most comprehensively characterized drugs in pharmacology, as the model compound. A Message Passing Neural Network (MPNN) is trained on the Tox21 benchmark and GNNExplainer is applied to characterize atom-level attribution. Results indicate that molecular structure explains approximately 45% (5/11) of known ASA adverse effects. A four-category Gap Taxonomy (GAP-1 through GAP-4) is introduced, distinguishing between principally non-encodable effects, data gaps arising from Missing Not At Random (MNAR) mechanisms, assay panel mismatches, and representation errors. The MNAR gap is empirically quantified via a systematic ChEMBL query (42 documented assays, 0 retrievable bioactivity entries). An attention pooling experiment localizes the representation error to the MPNN message passing layers rather than the aggregation step. The Gap Taxonomy has direct implications for drug safety signal detection and regulatory frameworks including Good Pharmacovigilance Practice (GVP) guidelines and New Approach Methodologies (NAMs). The identified structural limits are confirmed in a companion DDI ablation study.

2605.26121 2026-06-01 cs.LG cs.AI 版本更新

GEM: Geometric Entropy Mixing for Optimal LLM Data Curation

GEM: 用于最优LLM数据策展的几何熵混合

Yue Min, Ziyun Qiao, Ruining Chen, Yujun Li

Affiliations: The Hong Kong University of Science and Technology, Hong Kong SAR, China; Peking University, Beijing, China; University of Science and Technology of China, Hefei, China

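AI summary: Proposes the GEM framework, which reformulates data curation as a variational problem on the hypersphere optimized with an MM algorithm, addressing categorization flaws and embedding anisotropy and improving average downstream accuracy by up to 1.2% in experiments with 1.1B-parameter models.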

Comments ICML 2026 Poster

详情
AI中文摘要

LLM预训练的有效性越来越依赖于数据组成而非单纯的数据量。然而,最优混合受到分类缺陷的阻碍:人类分类法存在本体论错位,而欧几里得聚类无法解决嵌入各向异性。我们引入GEM(几何熵混合),这是一个将数据策展重构为超球面上的变分问题并辅以混合平衡正则化项的框架。通过解耦生成先验并使用可证明的MM(Minorize-Maximize)算法优化目标,GEM有效对抗聚类坍缩,从而发现欧几里得启发式方法无法察觉的平衡语义结构。我们采用师生蒸馏将这种几何保真度扩展到网络规模语料库,并引入几何影响分数(GIS)用于可解释的分类法生成。使用1.1B参数模型的实验表明,当集成到DoReMi和RegMix等混合策略中时,GEM建立了新的最先进水平,将平均下游准确率提升高达1.2%,并为可预测的数据混合提供了稳健的坐标系。

英文摘要

LLM pre-training efficacy increasingly depends on data composition rather than sheer volume. Yet, optimal mixing is hindered by categorization flaws: human taxonomies suffer from ontological misalignment, and Euclidean clustering fails to address embedding anisotropy. We introduce GEM (Geometric Entropy Mixing), a framework reformulating data curation as a variational problem on the hypersphere augmented with a mixing-balance regularizer. By decoupling the generative prior and optimizing the objective via a provable MM (Minorize-Maximize) algorithm, GEM effectively counteracts the cluster collapse to discover balanced semantic structures invisible to Euclidean heuristics. We employ teacher-student distillation to scale this geometric fidelity to web-scale corpora and introduce the Geometric Influence Score (GIS) for interpretable taxonomy generation. Experiments with 1.1B-parameter models demonstrate that GEM establishes a new state-of-the-art when integrated into mixing strategies like DoReMi and RegMix, improving average downstream accuracy by up to 1.2% and offering a robust coordinate system for predictable data mixing.

2605.30018 2026-06-01 cs.CL cs.LG 版本更新

Latent Performance Profiling of Large Language Models

大型语言模型的潜在性能剖析

Tanmoy Chakraborty, Ayan Sengupta, Suparna Bhattacharya, Partha Pratim Chakrabarti, Amlan Chakrabarti, Supratik Chakraborty, Partha Pratim Das, Lipika Dey, Richa Singh, Mayank Vatsa

Affiliations: Department of Electrical Engineering, Indian Institute of Technology Delhi; Yardi School of Artificial Intelligence, Indian Institute of Technology Delhi; Hewlett Packard Enterprise, India; Department of Computer Science & Engineering, Indian Institute of Technology Kharagpur; A.K.Choudhury School of Information Technology, University of Calcutta, India; Department of Computer Science & Engineering, Indian Institute of Technology Bombay; Department of Computer Science, Ashoka University, India; Department of Computer Science & Engineering, Indian Institute of Technology Jodhpur

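AI summary: Proposes the Latent Performance Profiling (LPP) framework, which extracts task-agnostic diagnostic metrics from hidden activations and output distributions to reveal intrinsic model traits and complement conventional benchmark evaluation.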

详情
AI中文摘要

大型语言模型(LLMs)在标准化基准测试中经常取得令人印象深刻的分数,但仅凭准确性对能力的了解有限。通过排行榜评估开源LLMs面临持续的问题,如数据污染、任务范围狭窄以及与真实世界可靠性的弱对齐。基于基准的评估(如MMLU PRO、BBH或IFEval)主要捕捉模型在固定测试集上的输出,而非其如何处理信息、校准不确定性或构建内部知识。在本文中,我们主张从以基准为中心的评估转向对LLMs进行互补的、以状态为中心的内在评估。为此,我们引入了潜在性能剖析(LPP)——一个从隐藏激活和输出分布中提取任务无关诊断的框架。LPP在模型的潜在表示和动态上定义了一组标量指标,揭示了与规模无关的特征,从而实现可解释的比较并揭示隐藏的脆弱性。与静态准确性分数不同,LPP在相似规模的模型间提供稳定、对架构敏感的签名。通过对八个LLMs(规模范围0.5B-14B)的广泛实证分析,我们证明了具有相似基准分数的模型可能表现出对比的潜在特征,例如熵或适应性的差异。在这些见解的指导下,我们设计了用于不确定性和符号推理的合成探针,这些探针与内在指标一致,同时与排行榜偏差解耦。我们建议将LPP与基准一起报告,以提供对模型行为更深入、可解释的理解,从而实现更可靠的模型选择、安全评估以及超越表面准确性的评估。

英文摘要

Large language models (LLMs) frequently achieve impressive scores on standardized benchmarks, yet accuracy alone offers a limited view of their capabilities. Evaluating open-source LLMs through leaderboards faces persistent issues like data contamination, narrow task scope, and weak alignment with real-world reliability. Benchmark-based evaluations such as MMLU PRO, BBH, or IFEval primarily capture what a model outputs on fixed test sets, not how it processes information, calibrates uncertainty, or structures internal knowledge. In this article, we advocate for a shift from benchmark-centric evaluation toward a complementary, state-centered intrinsic assessment of LLMs. To this end, we introduce Latent Performance Profiling (LPP), a framework that derives task-agnostic diagnostics from hidden activations and output distributions. LPP defines a set of scalar metrics on a model's latent representations and dynamics, revealing scale-independent traits that enable interpretable comparisons and uncover hidden vulnerabilities. Unlike static accuracy scores, LPP provides stable, architecture-sensitive signatures across models of similar size. With extensive empirical analyses across eight LLMs, spanning a size range of 0.5B-14B, we demonstrate that models with similar benchmark scores can exhibit contrasting latent profiles, such as differences in entropy or adaptability. Guided by these insights, we design synthetic probes for uncertainty and symbolic reasoning that align with intrinsic metrics while decoupling from leaderboard bias. We recommend reporting LPP alongside benchmarks to provide a deeper, interpretable understanding of model behavior, enabling more reliable model selection, safety assessment, and evaluation beyond surface-level accuracy.

2605.29852 2026-06-01 cs.CV cs.LG cs.MM 版本更新

Parameter-Efficient Subspace Decoupling ViT for Mitigating Multi-Task Negative Transfer in Histological Scoring

参数高效子空间解耦ViT用于缓解组织学评分中的多任务负迁移

Youhan Huang, Jiajun Li, Yilin Fang, Shuai Wang, Chuheng Li

Affiliations: Beijing University of Posts and Telecommunications; Beijing University of Chemical Technology; Capital Medical University

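AI summary: Proposes a subspace-decoupled multi-task Vision Transformer that uses lightweight task-specific adapters and orthogonality constraints to build independent feature subspaces, reducing task interference while retaining shared representations and mitigating multi-task negative transfer.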

Comments 6 pages, 5 figures, 2 tables. IEEE ICME 2026 (Oral). Camera-ready version

详情
AI中文摘要

组织学评分对于诊断非酒精性脂肪性肝病(NAFLD)至关重要,但由于高标注成本以及多任务学习中强相关的NAFLD活动评分(NAS)指标之间的负迁移,其自动化仍然具有挑战性。为了解决这个问题,我们提出了一种子空间解耦的多任务Vision Transformer(ViT),它集成了轻量级的任务特定适配器与基于正交性的约束。该设计为脂肪变性、气球样变和炎症构建了独立的特征子空间,有效减少了任务干扰,同时保留了共享表示。我们进一步构建了一个精心策划的多任务小鼠NAFLD组织学数据集,其中包含所有NAS组件的专家标注。实验结果表明,与训练单独的单个任务模型相比,所提出的方法以显著降低的计算成本提高了多任务稳定性和泛化能力。代码和策划的数据集已准备就绪,将在接收后公开以支持可重复性。

英文摘要

Histological scoring is essential for diagnosing Non-Alcoholic Fatty Liver Disease (NAFLD), yet its automation remains challenging due to the high annotation cost and negative transfer among the strongly correlated NAFLD Activity Score (NAS) indicators in multi-task learning. To address this issue, we propose a subspace-decoupled multi-task Vision Transformer (ViT) that integrates lightweight task-specific Adapters with orthogonality-based constraints. This design constructs independent feature subspaces for steatosis, ballooning, and inflammation, effectively reducing task interference while retaining shared representations. We further construct a curated multi-task mouse NAFLD histology dataset with expert annotations for all NAS components. Experimental results demonstrate that the proposed method improves multi-task stability and generalization with substantially reduced computational cost compared to training separate single-task models. The code and the curated dataset have been prepared and will be made publicly available upon acceptance to support reproducibility.

2605.22737 2026-06-01 cs.LG cs.AI 版本更新

The Distillation Game: Adaptive Attacks & Efficient Defenses

蒸馏博弈:自适应攻击与高效防御

Youssef Allouah, Mahdi Haghifam, Sanmi Koyejo, Reza Shokri

Affiliations: Stanford University; Toyota Technological Institute at Chicago; National University of Singapore

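AI summary: Studies the deployment trade-off that distillation attacks create for model providers through a minimax game, and proposes an adaptive evaluation rule and the Product-of-Experts (PoE) defense; experiments show that adaptive students recover more capability than passive evaluation suggests, and that PoE is cheaper and preserves higher-quality reasoning traces.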

详情
AI中文摘要

蒸馏攻击为模型提供者带来了部署权衡:使模型更有用的相同输出也可能使其更容易被模仿。我们通过一个效用受限的教师和自适应学生之间的最小化博弈来研究这种权衡。我们的框架产生了可处理的一侧响应规则:一个自适应评估规则,其中学生重新加权高价值示例,以及一个教师侧防御模板,抑制对蒸馏最有用的输出。从示例价值的廉价代理中,我们推导出产品专家(PoE),一种简单的前向传递防御,在生成过程中将教师与代理学生结合。实验上,自适应评估揭示了一个大的被动-自适应差距:在最先进的防御上,自适应学生在GSM8K和MATH上恢复了比被动评估所建议的更多的能力。在这种更强的评估下,昂贵防御和PoE之间的明显鲁棒性差距显著缩小,而PoE仍然便宜得多,并保留了更高质量的推理轨迹。总体而言,我们的结果表明,强大的蒸馏仍然难以阻止,并且反蒸馏的进展应该根据自适应学生而非被动学生来判断。我们的代码可在:https://github.com/ysfalh/distillation-game 获取。

英文摘要

Distillation attacks create a deployment trade-off for model providers: the same outputs that make a model more useful can also make it easier to imitate. We study this trade-off through a minimax game between a utility-constrained teacher and an adaptive student. Our framework yields tractable one-sided response rules: an adaptive evaluation rule in which the student reweights high-value examples, and a teacher-side defense template that suppresses outputs most useful for distillation. From a cheap proxy for example value, we derive Product-of-Experts (PoE), a simple forward-pass-only defense that combines the teacher with a proxy student during generation. Empirically, adaptive evaluation reveals a large passive-adaptive gap: on state-of-the-art defenses, adaptive students recover substantially more capability than passive evaluation suggests on GSM8K and MATH. Under this stronger evaluation, the apparent robustness gap between expensive defenses and PoE narrows considerably, while PoE remains substantially cheaper and preserves higher-quality reasoning traces. Overall, our results suggest that strong distillation remains difficult to stop, and that progress on antidistillation should be judged against adaptive students rather than passive ones. Our code is available at: https://github.com/ysfalh/distillation-game.

2605.12340 2026-06-01 stat.ML cs.LG 版本更新

Online Learning-to-Defer with Varying Experts

在线学习延迟决策与变化专家

Dang Hoang Duy, Yannis Montreuil, Maxime Meyer, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi

Affiliations: School of Computing, National University of Singapore; Fédération ENAC ISAE-SUPAERO ONERA, Université de Toulouse, France; Institute for Infocomm Research, A*STAR, Singapore; IPAL, IRL 2955, Singapore; Department of Mathematics, National University of Singapore

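AI summary: For dynamically varying expert pools and streaming data, proposes the first online Learning-to-Defer algorithm, obtaining regret guarantees via $\mathcal{H}$-consistency bounds and online convex optimization.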

详情
AI中文摘要

学习延迟决策(L2D)方法将每个查询路由到预测模型或外部专家。虽然现有工作研究批处理设置中的这个问题,但实际部署需要处理流数据、变化的专家可用性和变化的专家分布。我们引入了第一个用于多类分类的在线L2D算法,具有bandit反馈和动态变化的专家池。我们的方法在一般情况下实现了$O((n+n_e)T^{2/3})$的遗憾界,在低噪声条件下实现了$O((n+n_e)\sqrt{T})$的遗憾界,其中$T$是时间范围,$n$是标签数量,$n_e$是跨轮次观察到的不同专家数量。该分析基于在线框架的新颖$\mathcal{H}$-一致性界,结合在线凸优化的一阶方法。在合成和真实世界数据集上的实验表明,我们的方法有效地将标准学习延迟决策扩展到具有变化专家可用性和可靠性的设置。

英文摘要

Learning-to-Defer (L2D) methods route each query either to a predictive model or to external experts. While existing work studies this problem in batch settings, real-world deployments require handling streaming data, changing expert availability, and shifting expert distribution. We introduce the first online L2D algorithm for multiclass classification with bandit feedback and a dynamically varying pool of experts. Our method achieves regret guarantees of $O((n+n_e)T^{2/3})$ in general and $O((n+n_e)\sqrt{T})$ under a low-noise condition, where $T$ is the time horizon, $n$ is the number of labels, and $n_e$ is the number of distinct experts observed across rounds. The analysis builds on novel $\mathcal{H}$-consistency bounds for the online framework, combined with first-order methods for online convex optimization. Experiments on synthetic and real-world datasets demonstrate that our approach effectively extends standard Learning-to-Defer to settings with varying expert availability and reliability.

2604.09414 2026-06-01 stat.ML cs.LG 版本更新

Beyond Augmented-Action Surrogates for Multi-Expert Learning-to-Defer

超越增强动作代理的多专家学习延迟决策

Yannis Montreuil, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi

Affiliations: School of Computing; Fédération ENAC ISAE-SUPAERO ONERA; National University of Singapore; Université de Toulouse; Agency for Science, Technology and Research; Institute for Infocomm Research

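AI summary: For multi-expert learning-to-defer, proposes a decoupled surrogate loss that separates a softmax classifier head from independent per-expert sigmoid heads, avoiding the optimization pathologies of existing methods, and gives what the authors describe as the first guarantee whose calibration constant does not grow with the number of experts when the per-expert weight is held fixed.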

详情
AI中文摘要

学习延迟决策(L2D)系统针对每个输入决定是自行预测还是交给若干可用专家之一。非常成熟的方案通过将$K$个类别和$J$个专家视为共享$(K{+}J)$动作几何中的竞争动作,联合训练分类器和路由器。后续工作在该几何内提出了一系列增量修复;我们表明,即使在统计一致性下,每个方法仍不同程度地遭受优化层面的病理问题(目标失真、梯度放大、赢家通吃饥饿、集合质量崩溃或类别-专家耦合)。我们完全跳出增强动作家族,提出一种解耦代理:一个softmax分类器头以及每个专家独立的sigmoid头,镜像了问题的两个自然对象。我们证明每个样本的更新是坐标式的,且类别-专家Hessian块恒为零,并证明了具有校准常数$\max\{2\sqrt{2},\sqrt{2J/λ}\}$的过量风险界——据我们所知,这是第一个在多专家L2D中当每个专家权重固定时常数不随专家池增长的保证。在受控合成研究以及CIFAR-10、CIFAR-10H和Covertype上,它是我们比较中唯一在专家池增长时保持稳定、保留稀有专家并在每个真实数据基准上优于独立分类器的方法。

英文摘要

A learning-to-defer (L2D) system decides, for each input, whether to predict on its own or to hand it to one of several available experts. The well-established recipe trains classifier and router jointly by treating the $K$ classes and $J$ experts as competing actions in one shared $(K{+}J)$-action geometry. Subsequent work has proposed a series of incremental fixes within this geometry; we show that each still suffers, to varying severity, from an optimization-level pathology (target distortion, gradient amplification, winner-take-all starvation, set-mass collapse, or class-expert coupling) even under statistical consistency. We step outside the augmented-action family entirely and propose a decoupled surrogate: a softmax classifier head and an independent sigmoid head per expert, mirroring the two natural objects of the problem. We show that per-sample updates are then coordinatewise and the class-expert Hessian block is identically zero, and prove an excess-risk bound with calibration constant $\max\{2\sqrt{2},\sqrt{2J/\lambda}\}$, to our knowledge the first multi-expert L2D guarantee whose constant does not grow with the expert pool when the per-expert weight is held fixed. On controlled synthetic studies and on CIFAR-10, CIFAR-10H, and Covertype, it is the only method in our comparison that remains stable as the expert pool grows, preserves rare specialists, and improves over a standalone classifier on every real-data benchmark.
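A rough sketch of what a decoupled surrogate of this kind could look like: a softmax cross-entropy term on the class logits plus an independent logistic term per expert. The exact loss, targets, and weighting in the paper are not given in the abstract, so the form below, the `weight` parameter, and the function names are assumptions made purely for illustration.

```python
import math

def log_softmax(logits):
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def decoupled_loss(class_logits, expert_logits, label, expert_correct, weight=1.0):
    """Softmax cross-entropy on classes plus one logistic loss per expert.

    expert_correct[j] is 1 if expert j was right on this sample, else 0.
    `weight` is a hypothetical per-expert weight, not the paper's constant.
    """
    ce = -log_softmax(class_logits)[label]
    # Binary cross-entropy with logit z and target c equals softplus(z) - c * z.
    bce = sum(softplus(z) - c * z for z, c in zip(expert_logits, expert_correct))
    return ce + weight * bce

# Illustrative sample: 3 classes, 2 experts, expert 0 correct, expert 1 wrong.
loss = decoupled_loss([0.3, -1.2, 2.0], [0.5, -0.7], label=2, expert_correct=[1, 0])
```

Because the loss in this sketch is a sum of a term that depends only on class logits and terms that each depend on a single expert logit, every mixed second derivative between a class logit and an expert logit vanishes, which is the structural property the abstract refers to as a zero class-expert Hessian block.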

2603.14324 2026-06-01 stat.ML cs.LG 版本更新

Learning-to-Defer with Expert-Conditional Advice

基于专家条件建议的学习-延迟决策

Yannis Montreuil, Leïna Montreuil, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi

Affiliations: School of Computing; Département de Mathématiques; National University of Singapore; Sorbonne University; Fédération ENAC ISAE-SUPAERO ONERA; Agency for Science, Technology and Research; Université de Toulouse; Institute for Infocomm Research

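AI summary: Studies learning-to-defer in which additional information (advice) can be supplied to the selected expert at decision time, proposes an augmented surrogate loss over the composite expert-advice action space, and proves consistency guarantees and recovery of the Bayes-optimal policy.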

详情
AI中文摘要

学习-延迟决策将每个输入路由到预期成本最小的专家,但假设决策时每个专家可获得的信息是固定的。许多现代系统违反了这一假设:选择专家后,还可以选择该专家应接收哪些额外信息,例如检索到的文档、工具输出或升级上下文。我们研究了这个问题,并将其称为带建议的学习-延迟决策。我们表明,即使在最简单的非平凡设置中,一系列广泛使用的自然分离替代损失(通过不同头部学习路由和建议)也是不一致的。然后,我们引入了一个在复合专家-建议动作空间上操作的增广替代损失,并证明了其$\mathcal{H}$一致性保证以及超额风险转移界,从而在极限情况下恢复贝叶斯最优策略。在表格、语言和多模态任务上的实验表明,所提方法优于标准学习-延迟决策,同时根据成本机制调整其建议获取行为;一个合成基准证实了分离替代损失预测的失败模式。

英文摘要

Learning-to-Defer routes each input to the expert that minimizes expected cost, but it assumes that the information available to every expert is fixed at decision time. Many modern systems violate this assumption: after selecting an expert, one may also choose what additional information that expert should receive, such as retrieved documents, tool outputs, or escalation context. We study this problem and call it Learning-to-Defer with advice. We show that a broad family of natural separated surrogates, which learn routing and advice with distinct heads, is inconsistent even in the smallest non-trivial setting. We then introduce an augmented surrogate that operates on the composite expert-advice action space and prove an $\mathcal{H}$-consistency guarantee together with an excess-risk transfer bound, yielding recovery of the Bayes-optimal policy in the limit. Experiments on tabular, language, and multi-modal tasks show that the resulting method improves over standard Learning-to-Defer while adapting its advice-acquisition behavior to the cost regime; a synthetic benchmark confirms the failure mode predicted for separated surrogates.

2510.10988 2026-06-01 stat.ML cs.LG 版本更新

Adversarial Robustness in One-Stage Learning-to-Defer

单阶段学习委托中的对抗鲁棒性

Yannis Montreuil, Letian Yu, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi

Affiliations: School of Computing, National University of Singapore; Fédération ENAC ISAE-SUPAERO ONERA, Université de Toulouse, France; Institute for Infocomm Research, A*STAR, Singapore; IPAL, IRL 2955, Singapore

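AI summary: For one-stage Learning-to-Defer (L2D), where predictor and allocation are trained jointly, proposes the first adversarial robustness framework: it formalizes attacks, designs cost-sensitive adversarial surrogate losses, and establishes theoretical guarantees ($\mathcal{H}$, $(\mathcal{R}, \mathcal{F})$, and Bayes consistency), with benchmark experiments showing improved robustness against untargeted and targeted attacks while preserving clean performance.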

详情
AI中文摘要

学习委托(L2D)通过将输入路由到预测器或外部专家来实现混合决策。尽管前景广阔,但L2D极易受到对抗扰动的影响,这些扰动不仅可能翻转预测,还可能操纵委托决策。先前的鲁棒性分析仅关注两阶段设置,未涉及预测器和分配器联合训练的端到端(单阶段)情况。我们首次提出了单阶段L2D中对抗鲁棒性的框架,涵盖分类和回归。我们的方法形式化了攻击,提出了成本敏感的对抗替代损失,并建立了包括$\mathcal{H}$、$(\mathcal{R}, \mathcal{F})$和贝叶斯一致性在内的理论保证。在基准数据集上的实验证实,我们的方法在保持干净性能的同时,提高了对无目标和有目标攻击的鲁棒性。

英文摘要

Learning-to-Defer (L2D) enables hybrid decision-making by routing inputs either to a predictor or to external experts. While promising, L2D is highly vulnerable to adversarial perturbations, which can not only flip predictions but also manipulate deferral decisions. Prior robustness analyses focus solely on two-stage settings, leaving open the end-to-end (one-stage) case where predictor and allocation are trained jointly. We introduce the first framework for adversarial robustness in one-stage L2D, covering both classification and regression. Our approach formalizes attacks, proposes cost-sensitive adversarial surrogate losses, and establishes theoretical guarantees including $\mathcal{H}$, $(\mathcal{R}, \mathcal{F})$, and Bayes consistency. Experiments on benchmark datasets confirm that our methods improve robustness against untargeted and targeted attacks while preserving clean performance.

2605.29511 2026-06-01 cs.MA cs.CL cs.LG 版本更新

DynaGraph: Lightweight Multi-Model Interaction Framework via Dynamic Topological Reconfiguration

DynaGraph: 通过动态拓扑重构的轻量级多模型交互框架

Yanxing Guo, Zihao Zheng, Fangzhou Wu, Ling Liang, Lin Bao, Zongwei Wang, Yimao Cai

Affiliations: Peking University; Nanjing University; Beijing Advanced Innovation Center for Integrated Circuits; Beijing University of Posts and Telecommunications; Yanxin Co. Ltd

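AI summary: Proposes the DynaGraph framework, which uses dynamic topological reconfiguration and multiplexed PEFT adapters to run multi-model collaboration on a single consumer-grade GPU, approaching the reasoning capability of a 72B monolithic model while substantially reducing latency and token consumption.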

详情
AI中文摘要

处理复杂推理任务通常依赖于庞大的单体LLM,这会导致严重的计算冗余。虽然通过结构化流水线或多智能体协作进行任务分解提供了替代方案,但这些方法不可避免地陷入一个关键困境:预定义的静态拓扑极易受到级联错误的影响,而无约束的动态智能体则面临轨迹发散和不可预测的内存膨胀。为了解决这个问题,我们提出了DynaGraph,一个由动态拓扑重构驱动的轻量级多模型框架。在执行层面,DynaGraph在共享基础模型上复用时分PEFT适配器,使得整个系统的训练和推理部署可以在单个消费级GPU上完成。在路由层面,评估器持续监控执行置信度以触发分层自愈:针对局部数据差距的细粒度修补和针对严重逻辑断裂的子图重构。在StrategyQA、MATH和FinQA上的实验表明,我们的8B模型接近72B单体模型的推理能力(例如,在StrategyQA上为87.6%,在MATH上为82.7%)。此外,与无约束的动态架构相比,它延迟降低了高达68.1%,token消耗降低了68.6%。

英文摘要

Tackling complex reasoning tasks typically relies on massive monolithic LLMs, which suffer from severe computational redundancy. While task decomposition through structured pipelines or multi-agent collaborations offers an alternative, these approaches inevitably fall into a critical dilemma: predefined static topologies are highly vulnerable to cascading errors, whereas unconstrained dynamic agents suffer from trajectory divergence and unpredictable memory bloat. To address this, we present DynaGraph, a lightweight multi-model framework driven by dynamic topological reconfiguration. At the execution level, DynaGraph multiplexes time-division PEFT adapters over a shared base model, enabling both full system training and inference deployment on a single consumer-grade GPU. At the routing level, the Evaluator continuously monitors execution confidence to trigger hierarchical self-healing: Fine-grained Patching for localized data gaps and Subgraph Reconstruction for severe logical ruptures. Experiments on StrategyQA, MATH, and FinQA demonstrate our 8B model closely approximates the reasoning capabilities of a 72B monolithic model (e.g., 87.6% on StrategyQA, 82.7% on MATH). Furthermore, it reduces latency by up to 68.1% and token consumption by 68.6% compared to unconstrained dynamic architectures.

2605.29373 2026-06-01 cs.LG cs.NA math.NA 版本更新

Deep Adaptive Dimension Reduction for Bayesian Inference in Inverse Problems

逆问题中贝叶斯推理的深度自适应降维

Yueyang Wang, Xili Wang, Kejun Tang, Xiaoliang Wan, Tao Zhou, Chao Yang

Affiliations: School of Mathematical Sciences, Peking University; School of Sciences, Great Bay University; Department of Mathematics, Louisiana State University; SKLMS & Institute of Computational Mathematics and Scientific/Engineering Computing, Academy of Mathematics and Systems Science, Chinese Academy of Sciences

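AI summary: Proposes a deep adaptive dimension-reduction Bayesian inference framework based on the Variational Flow model, combining VAE-based nonlinear dimension reduction, dual normalizing flows, and an iterative prior updating strategy with an adaptively fine-tuned Fourier Neural Operator surrogate, to efficiently handle the complex non-Gaussian posteriors of high-dimensional PDE-governed inverse problems.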

Comments 25 pages, 5 figures

详情
AI中文摘要

求解高维PDE控制的逆问题通常具有挑战性,原因在于复杂的非高斯后验分布、昂贵的正演模型评估以及错误的先验信息。为了解决这些问题,我们提出了一种基于变分流(VF)模型的深度自适应降维贝叶斯推理框架。由于标准归一化流受双射映射限制且无法直接降维,VF通过将基于VAE的非线性降维与潜在先验和编码器的双归一化流相结合,克服了这一限制。该设计提供了严格高于VAE的证据下界,并允许更灵活地逼近复杂后验分布。我们进一步引入了一种迭代先验更新策略,该策略逐渐将先验均值移向高概率后验区域,避免了手动先验调整。这些组件与自适应微调的傅里叶神经算子(FNO)代理一起形成了一个闭环自适应循环:VF生成后验集中样本以改进代理,而更新的代理进一步改进后验推理。在100维Rosenbrock问题和三个标准PDE控制逆问题上的数值实验表明,与MCMC、UKI和SVGD基线相比,我们的方法在所有测试配置中均具有竞争性或更优的精度,在高噪声观测和高维参数空间等挑战性场景中优势最为明显。

英文摘要

Solving high-dimensional PDE-governed inverse problems is often challenging due to complex non-Gaussian posterior distributions, expensive forward model evaluations, and misspecified prior information. To address these issues, we propose a deep adaptive dimension-reduction Bayesian inference framework based on the Variational Flow (VF) model. Since standard normalizing flows are restricted by bijective mappings and cannot directly reduce dimensions, VF overcomes this limitation by integrating VAE-based nonlinear dimension reduction with dual normalizing flows for the latent prior and encoder. This design provides a strictly higher evidence lower bound than VAE and allows more flexible approximation of complex posterior distributions. We further introduce an iterative prior updating strategy that gradually moves the prior mean toward high-probability posterior regions, avoiding manual prior tuning. These components form a closed adaptive loop together with an adaptively fine-tuned Fourier Neural Operator (FNO) surrogate: VF generates posterior-concentrated samples to refine the surrogate, while the updated surrogate further improves posterior inference. Numerical experiments on a 100-dimensional Rosenbrock problem and three standard PDE-governed inverse problems show that our method delivers competitive or superior accuracy compared with MCMC, UKI, and SVGD baselines across all tested configurations, with the most pronounced advantages emerging in challenging scenarios such as high-noise observations and high-dimensional parameter spaces.

2605.29268 2026-06-01 cs.CL cs.AI cs.LG cs.NE 版本更新

Compute Allocation in Evolutionary Search: From Depth-Breadth to Multi-Armed Bandits

进化搜索中的计算分配:从深度-广度到多臂老虎机

Sixue Xing, Haoyu He, Kerui Wu, Zhuo Yang, Haozheng Luo, Tianfan Fu, Aarthy Nagarajan

发表机构 * University of Notre Dame(圣母大学) Northeastern University(东北大学) University of Massachusetts Amherst(马萨诸塞大学阿默斯特分校) Southeast University(东南大学) Northwestern University(西北大学) Nanjing University(南京大学) Shanghai Artificial Intelligence Laboratory(上海人工智能实验室)

AI总结 针对LLM引导的进化搜索中固定预算的LLM调用分配问题,提出基于多臂老虎机的BaSE方法,通过跨并行轨迹分配调用,平均适应度提升12.3%。

详情
AI中文摘要

LLM引导的进化搜索(Evolve系统)在数学和组合任务上达到了最先进的结果,但现有系统通常只报告多次运行中的最佳结果,而未记录运行间的分布。我们询问如何分配固定的LLM调用预算,以及单次运行达到报告数字的可靠性如何。通过扫描五个模型和三个任务的深度-广度网格,我们识别出两个经验规律:一个适应度-计算包络线,其中能力排序主要取决于有效FLOPs;以及一个双线性深度-广度拟合,具有任务特定的交互;两者都受模型-任务能力门控。受这些规律启发,我们提出BaSE(基于老虎机的自进化),一种多臂老虎机,它在并行轨迹间分配LLM调用。在不改变模型、提示或评估器的情况下,BaSE在8个(模型,任务)单元上比最强的岛屿协议基线平均适应度提高12.3%,在方差高的设置上增益最大:仅通过分配实现可靠性提升。

英文摘要

LLM-guided evolutionary search (Evolve systems) has reached state-of-the-art results on mathematical and combinatorial tasks, yet most existing systems report only the best of many runs and leave the run-to-run distribution undocumented. We ask how a fixed budget of LLM calls should be allocated, and how reliably a single run reaches the reported numbers. Sweeping the depth-breadth grid over five models and three tasks, we identify two empirical regularities: a fitness-compute envelope along which capability ordering largely collapses on effective FLOPs, and a bilinear depth-breadth fit with task-specific interaction; both are gated by model-task capability. Motivated by these regularities, we propose BaSE (Bandit-based Self-Evolving), a multi-armed bandit that allocates LLM calls across parallel trajectories. Without changing the model, prompt, or evaluator, BaSE improves mean fitness by 12.3% over the strongest island-protocol baseline across 8 (model, task) cells, with the largest gains on high-variance settings: a reliability gain from allocation alone.
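
The allocation idea in this abstract can be illustrated with a minimal sketch. The abstract does not say which bandit rule BaSE uses, so the UCB1 rule below is an assumption chosen for illustration, and `allocate_calls`, `step_fns`, and the reward definition are hypothetical names rather than the authors' API. Each arm stands for one parallel trajectory, and pulling an arm spends one LLM call on it.

```python
import math

def allocate_calls(step_fns, budget, c=1.0):
    """Spend `budget` calls across trajectories with UCB1 (an assumed rule).

    step_fns: list of zero-argument callables; calling one spends a single
              (hypothetical) LLM call on that trajectory and returns a reward,
              e.g. the fitness improvement it produced.
    Returns the number of calls given to each trajectory.
    """
    n = len(step_fns)
    counts = [0] * n      # calls spent per trajectory
    totals = [0.0] * n    # summed reward per trajectory
    for t in range(1, budget + 1):
        if t <= n:
            arm = t - 1   # try every trajectory once first
        else:
            # mean reward plus an exploration bonus that shrinks with use
            arm = max(
                range(n),
                key=lambda i: totals[i] / counts[i]
                + c * math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = step_fns[arm]()
        counts[arm] += 1
        totals[arm] += reward
    return counts

# Toy usage: trajectory 1 improves more per call than trajectory 0.
counts = allocate_calls([lambda: 0.1, lambda: 0.9], budget=100)
```

In this toy run the better trajectory receives most of the budget while the weaker one is still sampled occasionally, which is the trade-off a bandit allocator is meant to manage.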

2605.28918 2026-06-01 cs.LG cs.AI cs.IR 版本更新

When LLM Reward Design Fails: Diagnostic-Driven Refinement for Sparse Structured RL

当LLM奖励设计失败时:面向诊断的稀疏结构化RL改进

Youting Wang, Yuan Tang, Bowen Liu, Xuan Liu, Dingyan Shang

AI总结 针对稀疏结构化强化学习任务,提出诊断驱动的迭代奖励函数改进方法,通过训练诊断和失败模式分类指导修正,显著提升MiniGrid任务成功率。

详情
AI中文摘要

对于具有语义奖励函数接口的稀疏结构化强化学习任务,LLM生成的奖励塑造更适合被视作调试而非一次性生成。我们使用MiniGrid作为核心评估、MuJoCo作为边界压力测试,研究PPO训练的智能体。我们的审计发现两种主要的一次性失败模式——奖励泛滥和语义/API误解,以及一种较罕见的弱塑造情况。我们提出诊断驱动的迭代改进,其中训练诊断和失败模式分类法指导有针对性的奖励函数修订。改进使DoorKey-8x8从2.3%提升至97.6%,KeyCorridor从31.2%提升至86.7%,但种子间方差较高。控制实验表明这些提升并非来自重试或额外训练:仅指标重新提示导致大幅下降,而静态词汇控制恢复了大部分差距(87.6%;70.7%),表明分类法提示是主要机制,动态标签仅提供部分孤立的增量证据。预算匹配和Best-of-3比较将改进与选择和训练时间效应分离。组件移除测试、敏感性分析以及针对作者标签的审计为调试解释提供了汇聚证据,同时揭示了校准限制。连续控制结果显示了边界:基于成功的诊断可能在密集奖励的 locomotion 中误报,而回报趋势反馈移除了一个假阳性机制但未带来稳健提升。低调用协议是与基于种群的奖励搜索的成本对比,而非基准比较。在四个交叉方差设计环境中,点估计表明当LLM奖励函数方差占主导时收益更大,但bootstrap区间较宽。该方法局限于PPO下具有可靠接口的稀疏结构化任务;event_text等字段可能有益、有害或中性。

英文摘要

For sparse, structured reinforcement-learning tasks with semantic reward-function interfaces, LLM-generated reward shaping is better framed as debugging than one-shot generation. We study PPO-trained agents using MiniGrid as core evaluation and MuJoCo as boundary stress test. Our audit finds two dominant one-shot failure modes -- reward flooding and semantic/API misunderstanding -- plus a rarer weak-shaping case. We propose diagnostic-driven iterative refinement, where training diagnostics and a failure-mode taxonomy guide targeted reward-function revision. Refinement improves DoorKey-8x8 from 2.3% to 97.6% and KeyCorridor from 31.2% to 86.7% with high seed-to-seed variance. Controls show these gains are not from retrying or extra training: metrics-only re-prompting yields large drops, while a static-vocabulary control recovers much of the gap (87.6%; 70.7%), showing the taxonomy prompt is a major mechanism and dynamic labels provide only partially isolated incremental evidence. Budget-matched and Best-of-3 comparisons separate refinement from selection and training-time effects. Component-removal tests, sensitivity analyses, and an audit against author labels provide converging evidence for the debugging interpretation while revealing calibration limits. Continuous-control results show the boundary: success-based diagnostics can misfire in dense-reward locomotion, and return-trend feedback removes one false-positive mechanism without robust gains. The low-call protocol is a cost contrast with population-based reward search, not a benchmark comparison. In four crossed-variance-design environments, point estimates suggest larger gains when LLM reward-function variance dominates but bootstrap intervals are wide. The method is bounded to sparse structured tasks with reliable interfaces under PPO; fields like event_text may help, hurt, or be neutral.

2605.25134 2026-06-01 cs.LG cs.AI 版本更新

Theoretical Analysis of Sparse Optimization with Reparameterization, Weight Decay, and Adaptive Learning Rate

重参数化、权重衰减和自适应学习率下稀疏优化的理论分析

Huangyu Xu, Jingqin Yang, Qianqian Xu, Jiaye Teng

发表机构 * State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China(人工智能安全国家重点实验室,计算技术研究所,中国科学院,北京,中国) School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, China(中国科学院大学计算机科学与技术学院,北京,中国) Beijing Academy of Artificial Intelligence (BAAI), Beijing, China(北京人工智能研究院(BAAI),北京,中国) IIIS, Tsinghua University, Beijing, China(清华大学交叉信息研究院,北京,中国) School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China(上海财经大学统计与管理学院,上海,中国) Institute of Data Science and Statistics, Shanghai University of Finance and Economics, Shanghai, China(上海财经大学数据科学与统计研究所,上海,中国)

AI总结 针对稀疏优化中的不稳定问题,提出基于重参数化、权重衰减和自适应学习率的ReWA方法,通过改善优化景观实现比ℓ1正则化更好的稀疏性,同时保持测试精度。

Comments 32 pages, 5 figures. Submitted to ICML 2026

详情
AI中文摘要

稀疏优化是各种实际应用中的一个基本挑战。一种流行的稀疏优化方法是ℓ_p正则化。然而,当0<p<1时,由于无界梯度,它可能遇到优化不稳定性。在本文中,我们介绍了一种新的稀疏优化方法,称为ReWA,它基于重参数化、权重衰减和自适应学习率。ReWA与ℓ_p正则化密切相关,但它揭示了一个不同的优化景观,有助于缓解不稳定性问题。在CIFAR-10和ImageNet上使用ResNets进行的实验表明,与ℓ_1正则化方法相比,ReWA在保持测试精度的同时显著提高了稀疏性。

英文摘要

Sparse optimization is a fundamental challenge in various practical applications. A popular approach to sparse optimization is $\ell_p$ regularization. However, it may encounter optimization instability due to the unbounded gradients when $0<p<1$. In this paper, we introduce a novel approach to sparse optimization termed ReWA, based on Reparameterization, Weight decay, and Adaptive learning rate. ReWA is closely connected to $\ell_p$-regularization, yet it unveils a distinct optimization landscape that helps mitigate instability issues. Experiments on CIFAR-10 and ImageNet with ResNets demonstrate that ReWA leads to significant sparsity improvements over the $\ell_1$-regularization approach while preserving test accuracy.
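
The instability claim and the link between reparameterization, weight decay, and $\ell_p$ penalties can be made concrete in the scalar case. The identities below are standard consequences of the AM-GM inequality and are offered as a hedged illustration only; the symbols $w$, $u_i$, and $L$ are illustrative and need not match the paper's notation or ReWA's exact construction.

```latex
% Gradient of the scalar \ell_p penalty blows up at the origin when 0 < p < 1:
\[
\frac{d}{dw}\,|w|^{p} = p\,|w|^{p-1}\operatorname{sign}(w),
\qquad
\lim_{w \to 0}\left|\frac{d}{dw}\,|w|^{p}\right| = \infty
\quad (0 < p < 1).
\]
% Weight decay on a two-factor reparameterization w = u v recovers \ell_1:
\[
\min_{u v = w}\ \tfrac{1}{2}\left(u^{2} + v^{2}\right) = |w| ,
\quad \text{attained at } |u| = |v| = \sqrt{|w|}.
\]
% With L factors, w = u_1 u_2 \cdots u_L, the same argument gives \ell_{2/L}:
\[
\min_{u_1 \cdots u_L = w}\ \frac{1}{L}\sum_{i=1}^{L} u_i^{2} = |w|^{2/L},
\]
% so for L > 2 a smooth squared penalty on the factors acts like an
% \ell_p penalty with p = 2/L < 1 on w.
```

The penalty on the factors is differentiable everywhere, which is one way to see why a reparameterized landscape can avoid the unbounded gradients of the direct $\ell_p$ term.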

2602.20176 2026-06-01 q-bio.BM cs.LG 版本更新

Cross-Chirality Generalization by Axial Vectors for Hetero-Chiral Protein-Peptide Interaction Design

通过轴向向量实现异手性蛋白质-肽相互作用设计的跨手性泛化

Ziyi Yang, Zitong Tian, Yinjun Jia, Tianyi Zhang, Jiqing Zheng, Hao Wang, Yubu Su, Juncai He, Lei Liu, Yanyan Lan

发表机构 * Department of Chemistry, Tsinghua University, Beijing, China(清华大学化学系) School of Life Sciences, Tsinghua University, Beijing, China(清华大学生命科学学院) Anew Labs, Shanghai, China(Anew实验室) Tsinghua-Peking Center for Life Sciences, Beijing, China(清华大学-北京大学生命科学中心) Ministry of Education Key Laboratory of Bioorganic Phosphorus Chemistry and Chemical Biology, Tsinghua University, Beijing, China(教育部生物有机磷化学与化学生物学重点实验室) Center for Synthetic and Systems Biology, Tsinghua University, Beijing, China(合成与系统生物学中心) Beijing Frontier Research Center for Biological Structure, Tsinghua University, Beijing, China(北京生物结构前沿研究中心) Qiuzhen College, Tsinghua University, Beijing, China(求真书院) Yau Mathematical Sciences Center, Tsinghua University, Beijing, China(丘成桐数学科学中心) Institute for AI Industry Research (AIR), Tsinghua University, Beijing, China(人工智能产业研究院(AIR),清华大学) AI Industry Research Innovation Center,Wuxi Research Institute for Applied Technologies, Tsinghua University(人工智能产业研究创新中心,无锡应用技术研究院,清华大学)

AI总结 提出向E(3)等变(极)向量特征注入轴向特征的方法,结合潜在扩散模型实现从同手性训练数据到异手性设计任务的跨手性泛化,首次通过湿实验验证了生成式AI从头设计D-肽结合物的有效性。

Comments v3: Revised acknowledgements only. The paper has been accepted to ICML 2026

详情
AI中文摘要

靶向L-蛋白的D-肽结合物具有广阔的治疗潜力。尽管基于机器学习的靶标条件肽设计取得了快速进展,但生成D-肽结合物仍基本未被探索。在这项工作中,我们表明通过向$E(3)$等变(极)向量特征注入轴向特征,可以实现从同手性(L--L)训练数据到异手性(D--L)设计任务的跨手性泛化。通过在潜在扩散模型中实现该方法,我们实现了D-肽结合物设计,不仅在\textit{in silico}基准测试中优于现有工具,而且在湿实验验证中显示出有效性。据我们所知,我们的方法代表了首个经过湿实验验证的用于\textit{de novo}设计D-肽结合物的生成式AI,为处理蛋白质设计中的手性提供了新视角。代码可在https://github.com/YZY010418/PepMirror获取。

英文摘要

D-peptide binders targeting L-proteins have promising therapeutic potential. Despite rapid advances in machine learning-based target-conditioned peptide design, generating D-peptide binders remains largely unexplored. In this work, we show that by injecting axial features to $E(3)$-equivariant (polar) vector features, it is feasible to achieve cross-chirality generalization from homo-chiral (L--L) training data to hetero-chiral (D--L) design tasks. By implementing this method within a latent diffusion model, we achieved D-peptide binder design that not only outperforms existing tools in \textit{in silico} benchmarks, but also demonstrates efficacy in wet-lab validation. To our knowledge, our approach represents the first wet-lab validated generative AI for the \textit{de novo} design of D-peptide binders, offering new perspectives on handling chirality in protein design. Codes are available at https://github.com/YZY010418/PepMirror

2510.15340 2026-06-01 quant-ph cs.LG cs.SY eess.SY 版本更新

Singularity-free dynamical invariants-based quantum control

基于动力学不变量的无奇点量子控制

Ritik Sareen, Akram Youssry, Alberto Peruzzo

发表机构 * Quantum Photonics Laboratory(量子光子实验室) Centre for Quantum Computation and Communication Technology(量子计算与通信技术中心) RMIT University(皇家墨尔本理工大学) School of Electrical Engineering and Telecommunications(电气工程与电信学院)

AI总结 针对非马尔可夫开放量子系统中的态制备问题,提出一种广义不变量协议,通过将有限维控制问题转化为单量子比特问题,构建有界脉冲族并优化选择以抑制噪声,实现高保真度且硬件可行的控制。

详情
AI中文摘要

态制备是量子技术的基石,支撑着计算、通信和传感等应用。在非马尔可夫开放量子系统中,其重要性更加凸显,因为环境记忆和模型不确定性对实现高保真度控制构成了重大挑战。基于不变量的逆向工程为合成解析控制场提供了一个原则性框架,然而现有的参数化常常导致实验上不可行的奇异脉冲,并且仅限于简化噪声模型(如Lindblad形式)。本文针对任意噪声条件下的有限维态制备,引入了一种广义的不变量协议。通过将动力学限制在一个设计的SU(2)子空间内,我们将有限维控制问题转化为单量子比特的等效问题。控制协议分为两个阶段:首先,我们构造一族有界脉冲,在封闭系统中实现完美的态制备;其次,我们确定该族中的最优成员,以最小化噪声的影响。该框架同时适用于(i)已表征噪声,支持噪声感知控制合成,以及(ii)未表征噪声,其中一种与噪声无关的变体无需主方程描述即可保持鲁棒性。数值模拟表明,该方法能在多种目标上实现高保真度态制备,同时产生平滑、硬件可行的控制场。这一无奇点框架将基于不变量的控制扩展到现实的开放系统区域,为在NISQ硬件及其他表现出非马尔可夫动力学的平台上实现鲁棒的量子态工程提供了一条通用途径。

英文摘要

State preparation is a cornerstone of quantum technologies, underpinning applications in computation, communication, and sensing. Its importance becomes even more pronounced in non-Markovian open quantum systems, where environmental memory and model uncertainties pose significant challenges to achieving high-fidelity control. Invariant-based inverse engineering provides a principled framework for synthesizing analytic control fields, yet existing parameterizations often lead to experimentally infeasible, singular pulses and are limited to simplified noise models such as those of Lindblad form. Here, we introduce a generalized invariant-based protocol for finite-dimensional state preparation under arbitrary noise conditions. We transform the finite-dimensional control problem into an equivalent single-qubit problem by restricting the dynamics to a designed SU(2) subspace. The control protocol then proceeds in two stages: first, we construct a family of bounded pulses that achieve perfect state preparation in a closed system; second, we identify the optimal member of this family that minimizes the effect of noise. The framework accommodates both (i) characterized noise, enabling noise-aware control synthesis, and (ii) uncharacterized noise, where a noise-agnostic variant preserves robustness without requiring a master-equation description. Numerical simulations demonstrate high-fidelity state preparation across diverse targets while producing smooth, hardware-feasible control fields. This singularity-free framework extends invariant-based control to realistic open-system regimes, providing a versatile route toward robust quantum state engineering on NISQ hardware and other platforms exhibiting non-Markovian dynamics.

2509.21190 2026-06-01 cs.LG cs.AI 版本更新

Towards Foundation Models for Zero-Shot Time Series Anomaly Detection: Leveraging Synthetic Data and Relative Context Discrepancy

面向零样本时间序列异常检测的基础模型:利用合成数据和相对上下文差异

Tian Lan, Hao Duong Le, Jinbo Li, Wenjun He, Meng Wang, Chenghao Liu, Chen Zhang

发表机构 * Department of Industrial Engineering, Tsinghua University, Beijing, China(清华大学工业工程系) Datadog AI Research, Paris, France. This work was completed prior to joining Datadog(Datadog AI 研究院) 2012 Lab, Huawei Technologies, ShenZhen, China(华为技术2012实验室)

AI总结 提出基于相对上下文差异(RCD)的预训练范式,通过合成数据训练Transformer模型比较查询模式与上下文,实现零样本时间序列异常检测,在多个基准上超越现有基础模型。

Comments This manuscript is withdrawn, as the authors intend to further extend and develop the work beyond its current scope

详情
AI中文摘要

时间序列异常检测(TSAD)是一项关键任务,但开发能够以零样本方式泛化到未见数据的模型仍然具有挑战性。现有的TSAD基础模型通常依赖推理时的重构误差评分,这可能会遗漏重构良好的细微异常,并可能错误地标记未见领域中复杂但正常的模式。我们引入了TimeRCD,这是一个基于相对上下文差异(RCD)构建的TSAD基础模型,RCD是一种预训练范式,通过比较查询模式与其周围上下文来训练模型检测异常。这种关系公式通过标准Transformer架构实现,使模型能够从输入上下文中推断正常性,而不是依赖固定的全局正常模式。我们进一步构建了一个大规模合成语料库,其中包含上下文相关的异常标签,为RCD提供监督预训练信号。跨多个基准的实验表明,在大多数零样本TSAD设置中,TimeRCD优于现有的通用和异常特定基础模型,同时与数据集特定的全样本基线保持竞争力。这些结果提供了实证证据,表明RCD是构建鲁棒且可泛化的TSAD模型的有效方向。

英文摘要

Time series anomaly detection (TSAD) is a critical task, but developing models that generalize to unseen data in a zero-shot manner remains challenging. Existing foundation models for TSAD often rely on reconstruction-error scoring at inference time, which can miss subtle anomalies that are well reconstructed and can falsely flag complex but normal patterns in unseen domains. We introduce TimeRCD, a foundation model for TSAD built on Relative Context Discrepancy (RCD), a pre-training paradigm that trains the model to detect anomalies by comparing a query pattern with its surrounding context. This relational formulation, implemented with a standard Transformer architecture, enables the model to infer normality from the input context rather than relying on fixed global normal patterns. We further construct a large-scale synthetic corpus with context-dependent anomaly labels to provide supervised pre-training signals for RCD. Experiments across diverse benchmarks show that TimeRCD outperforms existing general-purpose and anomaly-specific foundation models in most zero-shot TSAD settings, while remaining competitive with dataset-specific full-shot baselines. These results provide empirical evidence that RCD is an effective direction for building robust and generalizable TSAD models.

2605.25773 2026-06-01 stat.ML cs.AI cs.CL cs.LG 版本更新

Efficient Benchmarking Is Just Feature Selection and Multiple Regression

高效基准测试仅是特征选择与多元回归

Sam Bowyer, Acyr Locatelli, Kris Cao

发表机构 * Cohere University of Bristol(布里斯托大学)

AI总结 将高效基准测试重新定义为带特征选择的多元回归问题,使用核岭回归预测和mRMR特征选择算法,在降低计算成本的同时提高预测精度和排名相关性。

Comments 36 pages, 27 figures

详情
AI中文摘要

高效基准测试技术旨在通过仅使用基准测试问题子集预测完整基准测试分数,从而降低评估LLMs的计算成本。通过将此问题重新定义为带特征选择的多元回归实例,我们发现只需在预测阶段使用核岭回归即可大幅改进现有高效基准测试方法。此外,使用一种名为最小冗余最大相关性(mRMR)的信息论特征选择算法,我们可以通过选择对预测最有用的问题子集进一步改进这些方法。除数据非常匮乏的情况外,这些方法在二元和连续指标的各种基准测试中,始终实现更小的预测误差(MAE和RMSE),以及预测分数与真实分数之间更大的排名相关性(Spearman ρ和Kendall τ)。此外,mRMR子采样比竞争方法(通常涉及拟合概率模型或运行聚类算法)快得多,并且在不同随机种子或训练数据划分下更可能选择相同的问题。教程代码见https://github.com/sambowyer/mrmr_eval。

英文摘要

Efficient benchmarking techniques aim to lower the computational cost of evaluating LLMs by predicting full benchmark scores using only a subset of a benchmark's questions. By reframing this problem as an instance of multiple regression with feature selection, we find that existing efficient benchmarking methods can be greatly improved by simply using kernel ridge regression at the prediction stage. Additionally, using an information-theoretic feature-selection algorithm called minimum redundancy maximum relevance (mRMR), we can further improve upon these methods by selecting question subsets that will be maximally useful for prediction. Except in very data-poor settings, these approaches consistently achieve smaller prediction errors (in both MAE and RMSE), and greater ranking correlation between predicted and true scores (in both Spearman $\rho$ and Kendall $\tau$) across a range of benchmarks using both binary and continuous metrics. Furthermore, mRMR subsampling is much faster than competitor methods (which often involve fitting probabilistic models or running clustering algorithms), and is more likely to select the same questions under different random seeds or training data splits. Tutorial code can be found at https://github.com/sambowyer/mrmr_eval .
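
The greedy selection step behind mRMR can be sketched in a few lines. This is a simplified illustration, not the authors' tutorial code: it substitutes absolute Pearson correlation for the mutual-information terms of information-theoretic mRMR, and the names `pearson`, `mrmr_select`, and the toy data are hypothetical. Each column holds one question's scores across models, and the target is the full benchmark score.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def mrmr_select(columns, target, k):
    """Greedily pick k column indices: high relevance to the target,
    low average redundancy with the columns already picked."""
    relevance = [abs(pearson(col, target)) for col in columns]
    selected = []
    remaining = list(range(len(columns)))
    while remaining and len(selected) < k:
        def score(j):
            if not selected:
                return relevance[j]
            redundancy = sum(
                abs(pearson(columns[j], columns[s])) for s in selected
            ) / len(selected)
            return relevance[j] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data: 8 models, 3 questions. Question 1 duplicates question 0.
q0 = [0, 0, 0, 0, 1, 1, 1, 1]
q1 = [0, 0, 0, 0, 1, 1, 1, 1]
q2 = [0, 0, 1, 1, 0, 0, 1, 1]
full_score = [2 * a + b for a, b in zip(q0, q2)]
chosen = mrmr_select([q0, q1, q2], full_score, k=2)
```

On this toy data the duplicate question is skipped in favour of the less relevant but non-redundant one, which is the behaviour that separates mRMR from ranking questions by relevance alone.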

2503.07482 2026-06-01 cs.LG cs.AI 版本更新

How does Bayesian Sampling help Membership Inference Attacks?

贝叶斯采样如何帮助成员推断攻击?

Zhenlong Liu, Wenyu Jiang, Feng Zhou, Hongxin Wei

发表机构 * Department of Statistics and Data Science, Southern University of Science and Technology(统计与数据科学系,南方科技大学) Shanghai Innovation Institute(上海创新研究院) School of Computer Science, Nanjing University(南京大学计算机学院) Center for Applied Statistics and School of Statistics, Renmin University of China(应用统计中心和统计学院,中国人民大学)

AI总结 提出贝叶斯成员推断攻击(BMIA),通过拉普拉斯近似对单个参考模型进行贝叶斯采样以估计条件分数分布,理论证明降低模型内方差从而提升攻击性能,并在多模态数据集上实现最先进的效果与效率。

Comments Accepted to ICML 2026

详情
AI中文摘要

成员推断攻击(MIAs)旨在估计特定数据点是否用于给定模型的训练。现有的最先进攻击通常依赖于训练多个参考模型来近似单个数据点的条件分数分布,这导致显著的计算开销并限制了其实际适用性。在这项工作中,我们提出了一种新颖的方法——贝叶斯成员推断攻击(BMIA),通过贝叶斯采样执行条件攻击。具体来说,我们对单个参考模型应用拉普拉斯近似以获得模型参数的后验分布,从而能够直接估计条件分数分布。理论上,我们证明了贝叶斯采样降低了模型内方差,从而提高了攻击能力。这一见解自然地激发了多参考变体,当有额外的参考模型可用时,该变体进一步提升了性能。在图像、文本和表格数据集上的大量实验表明,我们的方法在有效性和效率方面均达到了最先进的性能。

英文摘要

Membership Inference Attacks (MIAs) aim to estimate whether a specific data point was used in the training of a given model. Existing state-of-the-art attacks typically rely on training multiple reference models to approximate the conditional score distribution for individual data points, which leads to significant computational overhead and limits their practical applicability. In this work, we propose a novel approach -- Bayesian Membership Inference Attack (BMIA), which performs conditional attack through Bayesian sampling. Specifically, we apply Laplace approximation to a single reference model to obtain a posterior over model parameters, enabling direct estimation of the conditional score distribution. Theoretically, we demonstrate that Bayesian sampling reduces intra-model variance, thereby improving attack power. This insight naturally motivates the multi-reference variant that further enhances performance when additional reference models are available. Extensive experiments across image, text, and tabular datasets indicate that our method achieves state-of-the-art performance in both effectiveness and efficiency.

2605.24535 2026-06-01 cs.CR cs.LG 版本更新

Steering Beyond the Support: Adversarial Training on Unsupervised Jailbroken Activation Simulation

超越支持域:无监督越狱激活模拟的对抗训练

Luoyu Chen, Weiqi Wang, Zhiyi Tian, Chenhan Zhang, Feng Wu, Jianhuan Huang, Ahmed Asiri, Shui Yu

发表机构 * University of Technology Sydney, Sydney, Australia(悉尼科技大学) School of Cyber Science and Engineering, Southeast University, Nanjing, China(东南大学网络空间安全学院) Xi'an Jiaotong University, Xi'an, China(西安交通大学)

AI总结 针对现有安全引导方法对未见越狱攻击泛化失败的问题,提出基于无监督潜在方向发现的双层对抗训练框架,通过外推拒绝态有害请求激活模拟多样越狱激活,并训练势诱导引导场实现零样本越狱防御。

Comments accepted by ICML 2026

详情
AI中文摘要

越狱提示可以触发对齐LLM的有害完成,相应地,安全引导被提出:在测试时进行激活干预,引导越狱激活触发拒绝,同时保持良性效用。然而,现有的引导方法本质上是监督的,并且依赖于静态、有限的训练集,而真实的越狱是不断演化的,并且通常与训练集分布外,导致对未见攻击的失败。在本文中,我们解决了未见越狱的失败问题,基于无监督潜在方向发现。我们提出了一个双层对抗训练框架用于零样本越狱防御。在内部步骤中,我们通过无监督潜在方向发现从拒绝态有害请求激活外推,模拟多样的越狱激活,从而扩展真实越狱激活子空间的覆盖范围。在外部步骤中,我们训练一个势诱导引导场,将这些对抗性越狱状态推入拒绝区域,同时保持良性不变。在三个LLM和六个经典越狱家族上,我们的方法实现了强防御,攻击成功率大多低于5%,并且训练过程中子空间覆盖率的上升有助于解释改进的泛化性。

英文摘要

Jailbreak prompts can trigger harmful completions on aligned LLMs. In response, safety steering has been proposed: test-time activation interventions that steer jailbreak activations to trigger refusal while preserving benign utility. However, existing steering methods are fundamentally supervised and tied to a static, limited training set, whereas real jailbreaks evolve and are often out-of-distribution with respect to the training set, leading to failures on unseen attacks. In this paper, we tackle the failure on unseen jailbreaks, building on unsupervised latent direction discovery. We propose a bi-level adversarial training framework for zero-shot jailbreak defense. In the inner step, we simulate diverse jailbroken activations by extrapolating from refusal-state harmful-request activations via unsupervised latent direction discovery, which expands the coverage of real jailbreak activation subspaces. In the outer step, we train a potential-induced steering field to push these adversarial jailbroken states into refusal regions while keeping benign activations unchanged. Across three LLMs and six classical jailbreak families, our method achieves strong defense with attack success rates mostly below 5%, and rising subspace coverage throughout training helps explain the improved generalization.

2603.24254 2026-06-01 cs.LG cs.AI 版本更新

Beyond Static Uncertainty: Modeling Temporal Uncertainty Dynamics for Probabilistic Time Series Forecasting

超越静态不确定性:为概率时间序列建模时间不确定性动态

Yijun Wang, Qiyuan Zhuang, Larysa Marchanka, Xiu-Shen Wei

发表机构 * Department of Computer Science, Southeast University(东南大学计算机科学系) Francisk Skorina Gomel State University(弗朗西斯克·斯科里纳戈梅利国立大学)

AI总结 提出VolDy-VAE模型,通过循环尺度路径捕捉波动率动态,实现时间一致的概率预测,提升准确性和不确定性校准。

详情
AI中文摘要

现实世界的时间序列表现出时间结构化的不确定性:波动率在动荡时期聚集,在稳定时期消散,并在结构断裂处突然变化。然而,许多概率预测方法将预测不确定性估计为独立的逐点量,忽略了波动率机制的演变和持续性。我们将这一缺失维度形式化为时间不确定性动态,并在波动率动态变分自编码器(VolDy-VAE)中实例化它,这是一个具有位置-尺度解码器的非自回归生成预测器。VolDy-VAE结合了用于均值预测的位置路径和用于传递和演化波动率隐藏状态的循环尺度路径,该状态从回溯窗口转移到预测范围,从而实现时间一致的预测方差。这种设计产生了一种自适应衰减机制:高方差观测值对位置估计的影响较小,而其不确定性通过明确的尺度预测得以保留。我们进一步提供了一个简化的机制转换分析,表明当方差已知或一致估计时,波动率感知目标简化为逆方差加权,而基于MSE的估计量保持无偏但统计效率较低。在九个基准上的实验表明,VolDy-VAE在保持低推理延迟的同时,提高了预测准确性和不确定性校准,优于竞争的概率和点预测基线;插件研究进一步表明,VolDy原理可以有益于GAN、Koopman VAE和Transformer骨干网络。源代码公开于https://github.com/wangyijunlyy/VolDy-VAE。

英文摘要

Real-world time series exhibit temporally structured uncertainty: volatility clusters in turbulent regimes, dissipates in stable periods, and shifts abruptly around structural breaks. Yet many probabilistic forecasting methods estimate predictive uncertainty as an independent per-step quantity, leaving the evolution and persistence of volatility regimes under-modeled. We formalize this missing dimension as temporal uncertainty dynamics and instantiate it in the Volatility Dynamics Variational Autoencoder (VolDy-VAE), a non-autoregressive generative forecaster with a location-scale decoder. VolDy-VAE combines a location path for mean prediction with a recurrent scale path that transfers and evolves a volatility hidden state from the look-back window to the forecasting horizon, enabling temporally coherent predictive variances. This design yields an adaptive attenuation mechanism: high-variance observations receive lower influence on the location estimate while their uncertainty is preserved through explicit scale predictions. We further provide a simplified regime-switching analysis showing that, when variances are known or consistently estimated, the volatility-aware objective reduces to inverse-variance weighting, whereas MSE-based estimators remain unbiased but statistically inefficient. Experiments on nine benchmarks show that VolDy-VAE improves forecasting accuracy and uncertainty calibration over competitive probabilistic and point-forecasting baselines while maintaining low inference latency; plug-in studies further indicate that the VolDy principle can benefit GAN, Koopman VAE, and Transformer backbones. The source code is publicly available at https://github.com/wangyijunlyy/VolDy-VAE.
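
The efficiency claim about inverse-variance weighting can be checked on a small worked example. The sketch below is an illustration under a simplifying assumption, namely independent observations of one common mean with known variances; it is not the VolDy-VAE objective, and the function names and numbers are hypothetical.

```python
def inverse_variance_mean(values, variances):
    """Weight each observation by 1/variance.
    Returns (estimate, variance of the estimate)."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    estimate = sum(w * x for w, x in zip(weights, values)) / total
    return estimate, 1.0 / total

def plain_mean(values, variances):
    """Equal weights, as an MSE-based fit would use.
    Returns (estimate, variance of the estimate)."""
    n = len(values)
    return sum(values) / n, sum(variances) / n ** 2

# Two calm observations and one from a turbulent, high-variance regime.
values = [1.0, 1.2, 5.0]
variances = [0.1, 0.1, 10.0]
ivw_estimate, ivw_variance = inverse_variance_mean(values, variances)
avg_estimate, avg_variance = plain_mean(values, variances)
```

Both estimators are unbiased under the stated assumption, but the equal-weight mean lets the noisy observation dominate its variance, while the weighted estimate attenuates it. This mirrors the attenuation behaviour the abstract describes for high-variance observations.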

2501.02672 2026-06-01 stat.ML cs.LG econ.EM stat.ME 版本更新

Re-examining Granger Causality with Causal Bayesian Networks and Reichenbachs Principles

重新审视格兰杰因果关系:基于因果贝叶斯网络和赖兴巴赫原理

S. A. Adedayo

发表机构 * Univie Doctoral School of Computer Science (DOCS)(维也纳大学计算机科学博士学院)

AI总结 本文通过赖兴巴赫原理和因果贝叶斯网络重新解释格兰杰因果关系,提出因果化格兰杰因果关系(c-GC)算法,赋予其稳健的因果解释,并在合成数据上取得满意结果。

详情
AI中文摘要

表征复杂系统中的因果关系是理解其潜在机制的基础。格兰杰因果关系(GC)仍然是识别时间序列数据中因果关系的广泛使用的计算工具。然而,与其他因果发现方法一样,GC存在局限性,并因缺乏严格的因果基础而受到批评。在这项工作中,我们通过赖兴巴赫原理和因果贝叶斯网络的视角重新解释GC,从而解决了这一批评。这种重新解释被实现为一种算法,我们称之为因果化格兰杰因果关系(c-GC)。我们在理论上和图形上证明,这种重新表述在特定假设下赋予GC稳健的因果解释。c-GC在合成数据上取得了令人满意的结果,为观测数据集中的因果发现提供了一个更有原则的框架。

英文摘要

Characterising cause-effect relationships in complex systems is fundamental to understanding their underlying mechanisms. Granger causality (GC) remains a widely used computational tool for identifying causal relationships in time series data. However, like other causal discovery methods, GC has limitations and has been criticised for lacking a rigorous causal foundation. In this work, we present a fix to this criticism by reinterpreting GC through the lenses of Reichenbach's principles and causal Bayesian networks. This reinterpretation was implemented as an algorithm we call causalized Granger causality (c-GC). We demonstrate, both theoretically and graphically, that this reformulation endows GC with a robust causal interpretation under specific assumptions. c-GC yields satisfactory results on synthetic data, offering a more principled framework for causal discovery in observational datasets.

2605.23937 2026-06-01 cs.AI cs.LG cs.LO math.OC 版本更新

BoxLitE: A Faithful Knowledge Base Embedding Based on Convex Optimization

BoxLitE:基于凸优化的忠实知识库嵌入

Bruno F. Lourenço, Hesham Morgan, Ana Ozaki, Aleksandar Pavlović, Emanuel Sallinger

发表机构 * The Institute of Statistical Mathematics, Japan(日本统计数学研究所) TU Wien, Austria(奥地利维也纳工业大学) University of Oslo, Norway(挪威奥斯陆大学) University of Applied Sciences Campus Vienna, Austria(奥地利应用科学大学维也纳校区)

AI总结 提出BoxLitE模型,通过凸优化实现DL-Lite$^{\mathcal{H}}$知识库的忠实嵌入,确保可满足知识库存在弱忠实模型。

Comments 28 pages. Full version of paper accepted to KR 2026 (23rd International Conference on Principles of Knowledge Representation and Reasoning). Track: KR meets Machine Learning and Explanation. Added a figure and some minor changes

详情
AI中文摘要

知识库(KB)嵌入旨在结合经典知识图谱嵌入在事实(ABox)中泛化信息的能力与本体语言(TBox)表示的概念知识。多位作者最近探索了将概念映射到向量空间中凸区域的思想。这对于表示TBox中通常存在的层次结构很有用,因为更一般的概念可以映射到更大的区域,包含与更具体概念相关的区域。然而,在实际学习任务中,凸性的能力很少被利用。在这里,我们引入了BoxLitE,一个针对DL-Lite$^{\mathcal{H}}$的KB嵌入模型,允许凸优化。我们证明,对于任何可满足的DL-Lite$^{\mathcal{H}}$ KB,存在一个BoxLitE嵌入,它是一个弱忠实模型。作为概念验证,我们展示了如何将KB嵌入任务表述为凸优化问题,以及如何获得具有这种理想忠实性属性的嵌入。

英文摘要

Knowledge base (KB) embeddings aim at combining the capability of classical knowledge graph embeddings to generalize the information present in facts, the ABox, with conceptual knowledge represented in an ontology language, the TBox. Several authors have recently explored the idea of mapping concepts to convex regions in a vector space. This is useful to represent hierarchies, typically present in TBoxes, since more general concepts can be mapped to larger regions, containing those regions associated with more specific concepts. However, the power of convexity is rarely leveraged during the actual learning tasks. Here, we introduce BoxLitE, a KB embedding model for DL-Lite$^{\mathcal{H}}$ that allows for convex optimization. We show that for any satisfiable DL-Lite$^{\mathcal{H}}$ KB, there is a BoxLitE embedding that is a weakly faithful model. As a proof of concept, we show how to formulate the KB embedding task as a convex optimization problem and how to obtain embeddings with such desirable faithfulness properties.
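
The geometric intuition in this abstract, that a more general concept maps to a larger convex region containing the regions of more specific concepts, can be sketched with axis-aligned boxes. The choice of boxes, the `Box` class, and the concept names `Animal` and `Dog` are assumptions made for illustration; the abstract does not spell out BoxLitE's parameterization or its optimization problem.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Box:
    """Axis-aligned box given by lower and upper corners (a convex region)."""
    lower: Tuple[float, ...]
    upper: Tuple[float, ...]

    def contains_point(self, point):
        # An individual belongs to a concept if its vector lies in the box.
        return all(lo <= x <= hi for lo, x, hi in zip(self.lower, point, self.upper))

    def contains_box(self, other):
        # Concept inclusion (other is subsumed by self) as region containment.
        return all(
            lo <= o_lo and o_hi <= hi
            for lo, o_lo, o_hi, hi in zip(self.lower, other.lower, other.upper, self.upper)
        )

# Hypothetical concepts: Dog is more specific than Animal.
animal = Box(lower=(0.0, 0.0), upper=(10.0, 10.0))
dog = Box(lower=(2.0, 2.0), upper=(4.0, 4.0))
rex = (3.0, 3.0)  # hypothetical individual embedded as a point
```

Because containment is transitive, any point placed inside the `dog` box is automatically inside the `animal` box, so an inclusion axiom holds geometrically without being checked per individual.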

2605.15530 2026-06-01 cs.LG 版本更新

Rethinking Neural Network Learning Rates: A Stackelberg Perspective

重新思考神经网络学习率:从Stackelberg视角

Sihan Zeng, Sujay Bhatt, Sumitra Ganesh

发表机构 * JPMorgan AI Research, United States(摩根大通人工智能研究实验室,美国)

AI总结 本文从Stackelberg优化角度研究非均匀学习率,证明对网络主体层用小学习率、最后一层用大学习率可解释为两时间尺度交替梯度下降,并建立有限时间收敛保证,揭示其通过改善优化结构和局部曲率加速训练。

详情
AI中文摘要

神经网络通常在所有层使用单一学习率进行训练。虽然最近的实验证据表明,为各层分配特定学习率可以加速训练,但对于非均匀学习率有益的条件和机制,目前仍缺乏原则性的理解。在这项工作中,我们从Stackelberg优化的角度研究非均匀学习率。具体来说,我们证明,对网络主体层使用较小的学习率、对最后一层使用较大的学习率来训练神经网络,可以解释为对原始目标的Stackelberg重构应用两时间尺度交替梯度下降算法。我们在适应约束集和非光滑激活函数的广泛条件下,建立了该算法的有限时间收敛保证。除了收敛性,我们识别了非均匀学习率优于均匀学习率的两种机制:(i)我们表明,某些问题实例会诱导出比原始目标具有更强优化结构的Stackelberg目标,从而更快收敛到全局最优解;(ii)我们的数值分析揭示,Stackelberg目标可以表现出明显更尖锐的局部曲率,尤其是在训练早期,这导致更信息丰富的梯度和学习加速。在监督学习和强化学习中的实验支持了我们的发现。

英文摘要

Neural networks are typically trained with a single learning rate across all layers. While recent empirical evidence suggests that assigning layer-specific learning rates can accelerate training, a principled understanding of the conditions and mechanisms under which non-uniform learning rates are beneficial remains limited. In this work, we investigate non-uniform learning rates through the lens of Stackelberg optimization. Specifically, we demonstrate that training neural networks with a smaller learning rate for the body layers and a larger learning rate for the final layer can be interpreted as a two-time-scale alternating gradient descent algorithm applied to a Stackelberg reformulation of the original objective. We establish finite-time convergence guarantees for the algorithm under broad conditions that accommodate constraint sets and non-smooth activation functions. Beyond convergence, we identify two mechanisms by which non-uniform learning rates can outperform uniform learning rates: (i) we show that certain problem instances induce a Stackelberg objective with stronger optimization structure than the original objective, yielding faster convergence to globally optimal solutions, (ii) our numerical analysis reveals that the Stackelberg objective can exhibit substantially sharper local curvature, especially in early training, which leads to more informative gradients and learning acceleration. Experiments in supervised learning and reinforcement learning support our findings.
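
The two-time-scale scheme described here can be sketched on a toy model. This is a minimal illustration under stated assumptions, not the paper's algorithm or experimental setup: the "network" is the scalar model `a * w * x` with one body weight `w` and one head weight `a`, and the learning rates, initial values, and function names are hypothetical.

```python
def loss(w, a, xs, ys):
    """Mean squared error of the toy model a * w * x."""
    return sum((a * w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grads(w, a, xs, ys):
    """Gradients of the loss with respect to the body weight w and head weight a."""
    n = len(xs)
    common = [2.0 * (a * w * x - y) * x / n for x, y in zip(xs, ys)]
    grad_w = sum(c * a for c in common)
    grad_a = sum(c * w for c in common)
    return grad_w, grad_a

def train(xs, ys, w=0.5, a=0.5, lr_body=0.01, lr_head=0.1, steps=200):
    """Alternate a slow body step with a fast head step."""
    for _ in range(steps):
        grad_w, _ = grads(w, a, xs, ys)
        w -= lr_body * grad_w   # slow time scale: body (leader)
        _, grad_a = grads(w, a, xs, ys)
        a -= lr_head * grad_a   # fast time scale: head (follower)
    return w, a

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
w_final, a_final = train(xs, ys)
final_loss = loss(w_final, a_final, xs, ys)
```

The head is updated after the body in each round and with a ten times larger step, so it tracks an approximate best response to the current body weight, which is the leader-follower reading the abstract gives to non-uniform learning rates.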

2605.22967 2026-06-01 cs.LG 版本更新

Learned Relay Representations for Forward-Thinking Discrete Diffusion Models

学习的中继表示用于前向思考的离散扩散模型

Benjamin Rozonoyer, Jacopo Minniti, Dhruvesh Patel, Neil Band, Avishek Joey Bose, Tim G. J. Rudner, Andrew McCallum

发表机构 * University of Massachusetts Amherst(马萨诸塞大学阿默斯特分校) University of Toronto(多伦多大学) Stanford University(斯坦福大学) Imperial College London(伦敦帝国学院) Mila Vijil

AI总结 提出Learned Relay Representations (Relay)方法,通过可微通道传递潜在信息,使掩码扩散模型在去噪步骤间前向思考,减少推理延迟并提升性能。

Comments 16 pages, 3 figures. Equal contribution: Benjamin Rozonoyer, Jacopo Minniti, and Dhruvesh Patel. Code: https://github.com/jacopo-minniti/relay

详情
AI中文摘要

当掩码扩散模型(MDMs)通过迭代细化生成序列时,掩码位置上的丰富内部计算被丢弃,迫使每个后续细化步骤重新计算存储为模型表示的有价值内部信息。为了避免去噪轮次之间的硬重置,我们提出了学习的中继表示(Relay),一种允许MDMs在去噪时进行前向思考的方法,通过显式学习如何传播潜在信息以利于未来的去噪步骤。Relay引入了一个可微的逐token通道,在前向传递之间传递信息,并通过时间截断反向传播(BPTT)进行训练。我们展示了该框架可以扩展到最先进的扩散语言模型(DLMs),并且与块扩散和KV缓存等技术无缝兼容。我们首先在具有挑战性的基于数独的规划任务上对Relay的设计选择进行了彻底验证。然后,我们将Relay扩展到最先进的DLM Fast-dLLM v2,在编码任务上优于标准的监督微调,同时将推理延迟降低高达32%。我们的实证结果表明,最先进的DLM可以被显式训练以在解码步骤间前向中继潜在信息,从而推进性能-延迟帕累托前沿。我们提供了所有实验的代码。

英文摘要

When Masked Diffusion Models (MDMs) generate sequences through iterative refinement, the rich internal computation over masked positions is discarded, forcing every subsequent refinement step to recompute the valuable internal information stored as model representations. To avoid a hard reset between denoising rounds, we propose Learned Relay Representations (Relay), a method that allows MDMs to be forward-thinking when denoising by explicitly learning how to propagate latent information for the benefit of future denoising steps. Relay introduces a differentiable per-token channel that passes information between forward passes and is trained via truncated backpropagation through time (BPTT). We show that this framework can be scaled to state-of-the-art Diffusion Language Models (DLMs), and is seamlessly compatible with techniques like block diffusion and KV caching. We first provide a thorough justification of the design choices in Relay on a challenging Sudoku-based planning task. We then scale Relay to Fast-dLLM v2, a state-of-the-art DLM, outperforming standard supervised finetuning on coding tasks while reducing inference latency by up to 32%. Our empirical results demonstrate that state-of-the-art DLMs can be explicitly trained to relay latent information forward across decoding steps, advancing the performance-latency Pareto frontier. We provide code for all our experiments.

2605.20036 2026-06-01 cs.LG 版本更新

D$^3$-Subsidy: Online and Sequential Driver Subsidy Decision-Making for Large-Scale Ride-Hailing Market

D$^3$-Subsidy:大规模网约车市场的在线和顺序司机补贴决策

Taijie Chen, Rui Su, Siyuan Feng, Laoming Zhang, Hongyang Zhang, Haijiao Wang, Zhaofeng Ma, Jintao Ke, Li Ma

发表机构 * University of Hong Kong(香港大学) Harbin Institute of Technology(哈尔滨工业大学) Hong Kong Polytechnic University(香港理工大学)

AI总结 针对网约车市场动态环境,提出基于扩散的分层框架D$^3$-Subsidy,通过前缀条件扩散模型和拉格朗日对偶映射实现城市级补贴控制,在满足补贴率上限和低延迟约束下提升订单量和GMV。

Comments 14 pages, 14 figures

详情
AI中文摘要

滴滴出行等网约车平台运行在高度动态的环境中,平衡司机供给和乘客需求至关重要。尽管司机端补贴是调整这些力量并改善关键KPI(如完成订单数(\texttt{Rides})和总交易额(\texttt{GMV}))的主要杠杆,但在生产中优化它们需要同时满足三个约束:(i)对随机冲击的响应性,(ii)严格的补贴率上限,以及(iii)城市规模的低延迟执行。这些要求排除了昂贵的逐订单优化,需要一种前瞻性的、约束感知的城市级控制器用于在线顺序决策。为了满足这些要求,我们引入了D$^3$-Subsidy(动态司机端基于扩散的补贴),一种基于扩散的分层框架,用于可部署的全城补贴控制。为了弥合训练-推理差距,D$^3$-Subsidy采用前缀条件扩散模型,从不可变的历史观测中采样可能的未来轨迹,确保训练协议与在线部署的固定历史性质一致。这些生成的计划随后由上下文条件逆模块解码为低维城市级控制信号。对于可扩展的执行,我们通过拉格朗日对偶导出的映射弥合了城市级规划和细粒度调度之间的差距,该映射将补贴率上限直接嵌入到订单-司机激励中,无需迭代优化。此外,采用参数高效微调的多城市预训练策略能够实现跨异构城市的鲁棒迁移。广泛的离线评估表明,D$^3$-Subsidy在提高\texttt{Rides}和\texttt{GMV}的同时增强了上限合规性,而真实世界的A/B测试证实了显著提升,同时将预算相关的违规指标保持在运营阈值内。

英文摘要

Ride-hailing platforms like DiDi Chuxing operate in highly dynamic environments where balancing driver supply and passenger demand is critical. Although driver-side subsidies serve as a primary lever to align these forces and improve key KPIs like completed rides (\texttt{Rides}) and gross merchandise value (\texttt{GMV}), optimizing them in production requires simultaneously meeting three constraints: (i) responsiveness to stochastic shocks, (ii) strict subsidy-rate caps, and (iii) low-latency execution at city scale. These requirements rule out expensive per-order optimization, calling for a forward-looking, constraint-aware city-level controller for online sequential decision making. To meet these requirements, we introduce D$^3$-Subsidy (Dynamic Driver-side Diffusion-based Subsidy), a hierarchical diffusion-based framework for deployable city-wide subsidy control. To bridge the train-inference gap, D$^3$-Subsidy employs a prefix-conditioned diffusion model that samples plausible future trajectories from immutable historical observations, ensuring the training protocol aligns with the fixed-history nature of online deployment. These generated plans are then decoded by a context-conditioned inverse module into low-dimensional city-level control signals. For scalable execution, we bridge the gap between city-level planning and fine-grained dispatch via a Lagrangian-dual-derived mapping, which embeds subsidy-rate caps directly into order-driver incentives without iterative optimization. Additionally, a multi-city pretraining strategy with parameter-efficient fine-tuning enables robust transfer across heterogeneous cities. Extensive offline evaluations demonstrate that D$^3$-Subsidy improves \texttt{Rides} and \texttt{GMV} while enhancing cap compliance, and a real-world A/B test confirms significant uplift while keeping budget-related violation metrics within operational thresholds.

2506.21035 2026-06-01 cs.LG 版本更新

Little by Little: Continual Learning via Incremental Mixture of Rank-1 Associative Memory Experts

循序渐进:通过增量混合秩-1联想记忆专家实现持续学习

Haodong Lu, Chongyang Zhao, Minhui Xue, Lina Yao, Kristen Moore, Dong Gong

发表机构 * University of New South Wales(新南威尔士大学)

AI总结 针对持续学习中专家粒度粗糙导致的冗余、干扰和遗忘问题,提出MoRAM方法,将秩-1适配器作为细粒度专家和联想记忆单元,通过自激活机制实现增量扩展,显著提升塑性-稳定性权衡和泛化能力。

Comments Accepted at ICML2026. Project page: https://artificer-ai-lab.github.io/MoRAM/

详情
AI中文摘要

持续学习(CL)与大型预训练模型旨在增量获取知识而不发生灾难性遗忘。现有的基于LoRA的混合专家(MoE)方法通过添加孤立的新专家并冻结旧专家来扩展容量,但仍存在冗余、干扰、路由模糊以及由此导致的遗忘问题。我们研究了源于粗粒度专家粒度的问题。粗粒度专家(例如高秩LoRA)编码低专一性信息,导致专家重复/干扰以及随着专家积累而路由退化/混乱。在这项工作中,我们提出了MoRAM(混合秩-1联想记忆)。基于权重矩阵作为线性联想记忆的观点,MoRAM将CL实现为可重用原子秩-1专家作为记忆的增量扩展。每个秩-1适配器充当细粒度MoE专家或联想记忆单元。通过将秩-1专家视为键值记忆对,我们消除了显式的MoE-LoRA路由器,采用自激活机制,其中每个记忆原子通过其内在键评估其相关性。因此,推理过程成为对增量累积的学习快照记忆的内容可寻址检索和回忆。在CLIP和LLM上的大量实验表明,MoRAM显著优于最先进的方法,实现了更好的塑性-稳定性权衡、更强的泛化能力和更少的遗忘。项目页面:https://artificer-ai-lab.github.io/MoRAM/。

英文摘要

Continual learning (CL) with large pre-trained models aims to incrementally acquire knowledge without catastrophic forgetting. Existing LoRA-based Mixture-of-Experts (MoE) methods expand capacity by adding isolated new experts while freezing old ones, but still suffer from redundancy, interference, routing ambiguity, and consequent forgetting. We investigate the issues stemming from coarse-grained expert granularity. Coarse-grained experts (e.g., high-rank LoRA) encode low-specialty information, leading to expert duplication/interference and routing degradation/confusion as experts accumulate. In this work, we propose MoRAM (Mixture of Rank-1 Associative Memory). Grounded in the view that weight matrices act as linear associative memories, MoRAM achieves CL as incremental expansion of reusable atomic rank-1 experts as memory. Each rank-1 adapter acts as a fine-grained MoE expert or an associative memory unit. By viewing rank-1 experts as key-value memory pairs, we eliminate explicit MoE-LoRA routers with self-activation, where each memory atom evaluates its relevance via its intrinsic key. The inference process thus becomes a content-addressable retrieval and recall over the incrementally accumulated memory of learning snapshots. Extensive experiments on CLIP and LLMs show that MoRAM significantly outperforms state-of-the-art methods, achieving a better plasticity-stability trade-off, stronger generalization, and reduced forgetting. Project Page: https://artificer-ai-lab.github.io/MoRAM/.
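The key-value reading of rank-1 experts can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Rank1Memory` class, the thresholded dot-product gate, and the toy vectors are hypothetical, and the abstract does not specify MoRAM's actual activation function or training procedure.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


class Rank1Memory:
    """Toy store of rank-1 'atoms', each a (key, value) pair.

    Hypothetical illustration only: the gate below (score > threshold)
    stands in for whatever self-activation rule the paper uses.
    """

    def __init__(self, out_dim, threshold=0.5):
        self.out_dim = out_dim
        self.threshold = threshold
        self.atoms = []  # grows incrementally; old atoms are never modified

    def add(self, key, value):
        self.atoms.append((list(key), list(value)))

    def recall(self, x):
        """Return the summed contribution of active atoms and their indices."""
        out = [0.0] * self.out_dim
        active = []
        for i, (key, value) in enumerate(self.atoms):
            score = dot(key, x)  # each atom scores itself; no separate router
            if score > self.threshold:
                active.append(i)
                out = [o + score * v for o, v in zip(out, value)]
        return out, active


memory = Rank1Memory(out_dim=2)
memory.add(key=[1.0, 0.0], value=[1.0, 2.0])  # atom learned on an earlier task
memory.add(key=[0.0, 1.0], value=[3.0, 4.0])  # atom added for a later task
delta, active = memory.recall([1.0, 0.0])
```

Without the gate, each atom contributes `(key · x) * value`, which is exactly the product of the rank-1 matrix `value keyᵀ` with `x`; the gate only decides which atoms take part in a given recall.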

2605.21470 2026-06-01 cs.LG cs.AI 版本更新

Agent JIT Compilation for Latency-Optimizing Web Agent Planning and Scheduling

面向延迟优化的Web Agent规划与调度的Agent即时编译

Caleb Winston, Ron Yifeng Wang, Azalia Mirhoseini, Christos Kozyrakis

发表机构 * Stanford University(斯坦福大学)

AI总结 提出Agent即时编译系统,通过JIT-Planner生成代码计划、JIT-Scheduler探索并行化策略及不变式工具协议,显著降低延迟并提高准确性。

Comments Accepted at ICML 2026

详情
AI中文摘要

计算机使用Agent通过生成对浏览器中点击、输入、滚动等工具的调用序列,自动化自然语言指定的任务,例如“从Taco Bell订购最便宜的商品”。当前实现遵循顺序的获取截图-执行循环,每次迭代需要一次LLM调用,导致高延迟和因工具使用错误而频繁出错。我们提出了Agent即时编译系统,该系统将任务描述直接编译为可执行代码,其中可能包含LLM调用、工具调用和并行化。我们的方法包括三个组件:(1)JIT-Planner,生成多个代码计划,根据工具规范验证每个计划,并选择最小成本候选;(2)JIT-Scheduler,通过从学习到的延迟分布进行蒙特卡洛成本估计,探索并行化策略;(3)不变式强制工具协议,指定前置条件和后置条件要求,以减少工具使用错误率。在五个应用中,JIT-Planner相比Browser-Use实现了10.4倍的加速和28%的更高准确率,而JIT-Scheduler相比OpenAI CUA实现了2.4倍的加速和9%的更高准确率。

英文摘要

Computer-use agents (CUAs) automate tasks specified with natural language such as "order the cheapest item from Taco Bell" by generating sequences of calls to tools such as click, type, and scroll on a browser. Current implementations follow a sequential fetch-screenshot-execute loop where each iteration requires an LLM call, resulting in high latency and frequent errors from incorrect tool use. We present agent just-in-time (JIT) compilation, a system that compiles task descriptions directly into executable code that may include LLM calls, tool calls, and parallelization. Our approach comprises three components: (1) JIT-Planner, which generates multiple code plans, validates each against tool specifications, and selects the minimum-cost candidate; (2) JIT-Scheduler, which explores parallelization strategies via Monte Carlo cost estimation from learned latency distributions; and (3) an invariant-enforcing tool protocol specifying precondition and postcondition requirements to reduce the rate of incorrect tool use. Across five applications, JIT-Planner achieves $10.4\times$ speedup and 28\% higher accuracy over Browser-Use, while JIT-Scheduler achieves $2.4\times$ speedup and 9\% higher accuracy over OpenAI CUA.
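Monte Carlo cost estimation over candidate schedules can be sketched as follows. This is an illustrative sketch, not the paper's scheduler: the plan encoding (a list of stages, each a list of tools that run concurrently), the function names, and the use of empirical latency samples in place of a learned distribution are all assumptions.

```python
import random


def estimate_cost(plan, latency_samples, n_draws=2000, seed=0):
    """Estimate the expected end-to-end latency of a plan by sampling.

    plan: list of stages; tools inside a stage are assumed independent
          and run in parallel, while stages run one after another.
    latency_samples: dict mapping tool name -> list of observed latencies
          (a stand-in for a learned latency distribution).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        # A stage finishes when its slowest tool finishes; stages add up.
        total += sum(
            max(rng.choice(latency_samples[tool]) for tool in stage)
            for stage in plan
        )
    return total / n_draws


def pick_cheapest(plans, latency_samples):
    """Return the name of the plan with the lowest estimated latency."""
    return min(plans, key=lambda name: estimate_cost(plans[name], latency_samples))


# Hypothetical tools "a", "b", "c" with degenerate (single-value) latencies.
samples = {"a": [1.0], "b": [2.0], "c": [3.0]}
plans = {
    "sequential": [["a"], ["b"], ["c"]],
    "parallel": [["a", "b"], ["c"]],
}
best = pick_cheapest(plans, samples)
```

Grouping tools into one stage is only valid when they have no data dependency on each other; a real scheduler would have to check that before scoring the plan.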

2605.21108 2026-06-01 cs.LG cs.AI 版本更新

Efficient Learning of Deep State Space Models via Importance Smoothing

通过重要性平滑高效学习深度状态空间模型

John-Joseph Brady, Nikolas Nusken, Yunpeng Li

发表机构 * Centre for Oral, Clinical and Translational Sciences, King's College London, London, United Kingdom(口腔、临床与转化科学中心,伦敦国王学院,伦敦,英国) Department of Mathematics, King's College London, London, United Kingdom(数学系,伦敦国王学院,伦敦,英国)

AI总结 提出并行变分蒙特卡洛(PVMC)方法,结合变分推断和序贯蒙特卡洛,实现深度状态空间模型在判别与生成任务上的高效训练,速度提升10倍。

Comments Accepted to the proceedings of ICML 2026

详情
AI中文摘要

潜在状态空间系统在统计建模中无处不在,当通过噪声观测时间序列时自然出现。然而,大规模训练深度状态空间模型(DSSM)仍然困难。训练DSSM出现了两种截然不同的策略。第一种是自编码DSSM,通过优化变分下界来训练生成模型。第二种是通过经典序贯蒙特卡洛(SMC)算法的输出进行反向传播。这些方法可以训练DSSM用于判别和生成任务,但其固有的顺序前向传递在现代硬件上扩展性差。我们提出了并行变分蒙特卡洛(PVMC),一种新的训练方法,它桥接了这些范式,并稳健地训练DSSM用于判别和生成任务。在一组基准实验中,PVMC达到或超过了最先进的性能,同时训练速度比最快的竞争SMC方法快10倍。

英文摘要

Latent state space systems are ubiquitous in statistical modelling, arising naturally when time series are observed through noisy measurements. However, training deep state space models (DSSMs) at scale remains difficult. Two largely distinct strategies have emerged for training DSSMs. The first, auto-encoding DSSMs, trains generative models by optimising a variational lower bound. The second backpropagates through the outputs of classical sequential Monte Carlo (SMC) algorithms. Such approaches can train DSSMs for both discriminative and generative tasks, but their inherently sequential forward passes scale poorly on modern hardware. We propose \emph{parallel variational Monte Carlo} (PVMC), a new training method that bridges these paradigms and robustly trains DSSMs for both discriminative and generative tasks. Across a set of benchmark experiments, PVMC matches or exceeds state-of-the-art performance while training $10\times$ faster than the fastest competing SMC-based approach.

2605.20873 2026-06-01 cs.AI cs.LG 版本更新

PlanningBench: Generating Scalable and Verifiable Planning Data for Evaluating and Training Large Language Models

PlanningBench: 生成可扩展且可验证的规划数据以评估和训练大型语言模型

Ziliang Zhao, Zenan Xu, Shuting Wang, Hongjin Qian, Yan Lei, Minda Hu, Zhao Wang, Shihan Dou, Zhicheng Dou, Pluto Zhou

发表机构 * Gaoling School of Artificial Intelligence, Renmin University of China(中国人民大学高瓴人工智能学院) LLM Department, Hunyuan Team, Tencent(腾讯混元团队 LLM 部门) Beijing Academy of Artificial Intelligence(北京智源人工智能研究院) The Chinese University of Hong Kong(香港中文大学)

AI总结 提出PlanningBench框架,通过约束驱动合成管道生成可扩展、多样化且可验证的规划数据,用于评估和训练LLMs,并验证其在提升规划能力上的有效性。

详情
AI中文摘要

规划是大型语言模型(LLMs)的一项基本能力,因为这类复杂任务要求模型将目标、约束、资源和长期后果协调成可执行且可验证的解决方案。然而,现有的规划基准通常将规划数据视为固定的实例集合,而非可控的生成目标。这限制了场景覆盖范围,将难度与表面代理而非结构来源挂钩,并且对可扩展生成、自动验证或面向规划的训练支持有限。我们引入PlanningBench,一个用于生成可扩展、多样化且可验证的规划数据的框架,既可用于评估也可用于训练。PlanningBench从真实规划场景出发,将实际工作流程抽象为包含30多种任务类型、子任务、约束族和难度因素的结构化分类体系。在该分类体系的指导下,一个约束驱动的合成管道实例化自包含的规划问题,具备自适应难度控制、质量过滤和实例级验证检查表。这将规划数据构建从固定基准收集转变为可控生成,同时保留现实任务基础。我们使用PlanningBench评估开源和闭源前沿LLMs,发现当前模型在耦合约束下仍难以生成完整解决方案。除评估外,在已验证的PlanningBench数据上进行强化学习可提升在未见规划基准和更广泛的指令遵循任务上的性能。进一步分析表明,确定性或明确指定的最优解提供了更清晰的奖励信号和更稳定的训练动态。总体而言,PlanningBench为诊断和提高LLMs中可泛化的规划能力提供了可控的规划数据来源。

英文摘要

Planning is a fundamental capability for large language models (LLMs) because such complex tasks require models to coordinate goals, constraints, resources, and long-term consequences into executable and verifiable solutions. Existing planning benchmarks, however, usually treat planning data as fixed collections of instances rather than controllable generation targets. This limits scenario coverage, ties difficulty to surface-level proxies rather than structural sources, and offers limited support for scalable generation, automatic verification, or planning-oriented training. We introduce PlanningBench, a framework for generating scalable, diverse, and verifiable planning data for both evaluation and training. PlanningBench starts from real planning scenarios and abstracts practical workflows into a structured taxonomy of more than 30 task types, subtasks, constraint families, and difficulty factors. Guided by this taxonomy, a constraint-driven synthesis pipeline instantiates self-contained planning problems with adaptive difficulty control, quality filtering, and instance-level verification checklists. This shifts planning data construction from fixed benchmark collection to controllable generation while preserving realistic task grounding. We use PlanningBench to evaluate open-source and closed-source frontier LLMs, and find that current models still struggle to produce complete solutions under coupled constraints. Beyond evaluation, reinforcement learning on verified PlanningBench data improves performance on unseen planning benchmarks and broader instruction-following tasks. Further analysis suggests that determinate or well-specified optimal solutions provide clearer reward signals and more stable training dynamics. Overall, PlanningBench provides a controllable source of planning data for diagnosing and improving generalizable planning abilities in LLMs.

2601.22538 2026-06-01 cs.LG stat.AP 版本更新

Learning-to-Defer in Non-Stationary Time Series via Switching State-Space Models

通过切换状态空间模型在非平稳时间序列中的学习-延迟决策

Yannis Montreuil, Letian Yu, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi

发表机构 * School of Computing(计算机科学学院) National University of Singapore(新加坡国立大学) Institute for Infocomm Research(信息通信研究所) ISAE-SUPAERO ONERA A*STAR, Singapore(新加坡A*STAR)

AI总结 提出L2D-SLDS框架,利用因子化切换线性高斯状态空间模型处理非平稳流式数据,通过共享因子持续更新未查询专家的信念,并设计学习感知查询分数平衡即时成本与信息增益,实现在线学习-延迟决策。

详情
AI中文摘要

学习-延迟决策(L2D)将每个决策路由到系统自身的预测器或外部专家。流式时间序列设置打破了离线L2D的假设:数据是非平稳的,专家可用性随时间变化,内部预测器在线训练。我们提出L2D-SLDS,一种基于因子化切换线性高斯状态空间模型的一阶段在线L2D框架,该模型覆盖所有潜在残差:一个离散状态、一个共享全局因子以及每个专家的特异状态。始终观测的内部残差通过共享因子持续更新关于每个未查询专家的信念,而学习感知查询分数平衡即时成本与潜在状态信息增益以及一步学习者的改进。我们证明了一个针对时变学习-延迟比较器的oracle不等式,将遗憾分解为查询奖励预算、SLDS预测成本误差项$\mathcal{E}_{\mathrm{SLDS}}$以及内部学习者的区间动态遗憾。在合成数据、墨尔本、耶拿和24专家德里基准测试上,L2D-SLDS与上下文和非平稳老虎机基线相比具有竞争力或更优,同时在真实数据轮次中延迟比例低于$2\%$。

英文摘要

Learning-to-defer (L2D) routes each decision to a system's own predictor or to an external expert. Streaming time-series settings break the offline-L2D assumptions: the data are non-stationary, expert availability shifts over time, and the internal predictor is trained online. We propose L2D-SLDS, a one-stage online L2D framework based on a factorized switching linear-Gaussian state-space model over all potential residuals: a discrete regime, a shared global factor, and per-expert idiosyncratic states. The always-observed internal residual continuously updates beliefs about every unqueried expert through the shared factor, and a learner-aware query score balances immediate cost against latent-state information gain and one-step learner improvement. We prove an oracle inequality against a time-varying learn-and-defer comparator, decomposing regret into a query-bonus budget, an SLDS predictive-cost-error term~$\mathcal{E}_{\mathrm{SLDS}}$, and the internal learner's interval dynamic regret. On synthetic, Melbourne, Jena, and 24-expert Delhi benchmarks, L2D-SLDS is competitive with or improves on contextual- and non-stationary-bandit baselines while deferring on ${<}2\%$ of real-data rounds.

2509.10308 2026-06-01 cs.LG 版本更新

GraphCSVAE: Graph Categorical Structured Variational Autoencoder for Spatiotemporal Auditing of Physical Vulnerability Towards Sustainable Post-Disaster Risk Reduction

GraphCSVAE: 面向可持续灾后风险降低的物理脆弱性时空审计的图类别结构化变分自编码器

Joshua Dimasaka, Christian Geiß, Robert Muir-Wood, Emily So

发表机构 * University of Cambridge(剑桥大学) Cambridge University Centre for Risk in the Built Environment(剑桥大学建筑环境风险中心) Earth Observation Center(地球观测中心) Institute of Geography(地理研究所)

AI总结 提出GraphCSVAE框架,通过整合深度学习、图表示和类别概率推断,利用时间序列卫星数据和专家先验,对物理脆弱性进行建模,并在两个灾后地区验证其时空审计能力。

Comments Accepted for publication in Progress in Disaster Science (on May 20, 2026) and at the 8th International Disaster and Risk Conference, IDRC 2025 | Keywords: weakly supervised, graph, categorical, vulnerability, remote sensing, spatiotemporal | The data and code are respectively available at https://doi.org/10.5281/zenodo.16656471 and https://github.com/riskaudit/GraphCSVAE

详情
AI中文摘要

在灾害发生后,全球许多机构在监测灾害风险变化方面面临挑战,限制了评估联合国仙台减少灾害风险框架(2015-2030)进展的能力。尽管众多研究通过地球观测和数据驱动方法显著推进了灾害暴露和危险性的大规模建模,但在风险方程中另一个同等重要但具有挑战性的要素——物理脆弱性的建模方面进展仍然有限。为弥补这一空白,我们引入了图类别结构化变分自编码器(GraphCSVAE),这是一个概率数据驱动框架,通过整合深度学习、图表示和类别概率推断,利用时间序列卫星数据集和专家先验来建模物理脆弱性。我们引入了一个弱监督的一阶转移矩阵,以捕捉两个受灾害影响且社会经济弱势地区脆弱性时空分布的变化:孟加拉国受气旋影响的Khurushkul社区和塞拉利昂受泥石流影响的弗里敦市。在两个案例研究中,该框架构建了2016-2023年的大规模图表示,并由于缺乏时间地面真值标签,使用Aitchison距离评估后验成分分布与专家先验的差异。该工作揭示了灾后物理脆弱性的区域动态,为局部时空审计和可持续的灾后风险降低策略提供了宝贵见解。

英文摘要

In the aftermath of disasters, many institutions worldwide face challenges in monitoring changes in disaster risk, limiting assessment of progress towards the UN Sendai Framework for Disaster Risk Reduction 2015-2030. While numerous efforts have substantially advanced the large-scale modeling of hazard and exposure through Earth observation and data-driven methods, progress remains limited in modeling another equally important yet challenging element of the risk equation: physical vulnerability. To address this gap, we introduce Graph Categorical Structured Variational Autoencoder (GraphCSVAE), a probabilistic data-driven framework for modeling physical vulnerability by integrating deep learning, graph representation, and categorical probabilistic inference, using time-series satellite-derived datasets and expert priors. We introduce a weakly supervised first-order transition matrix to capture changes in the spatiotemporal distribution of vulnerability across two disaster-affected and socioeconomically disadvantaged regions: the cyclone-impacted Khurushkul community in Bangladesh and the mudslide-affected city of Freetown in Sierra Leone. Across both case studies, the framework constructs large-scale graph representations spanning 2016-2023 and evaluates posterior compositional distributions against expert priors using Aitchison distance due to the lack of temporal groundtruth labels. The work reveals post-disaster regional dynamics in physical vulnerability, offering valuable insights into localized spatiotemporal auditing and sustainable strategies for post-disaster risk reduction.
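The Aitchison distance used for the evaluation is the Euclidean distance between centered log-ratio (clr) transforms of two compositions. The sketch below shows only that standard formula; the function names and the example probability vectors are hypothetical and are not taken from the paper's code.

```python
import math


def clr(x):
    """Centered log-ratio transform of a strictly positive composition."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def aitchison_distance(x, y):
    """Euclidean distance between clr(x) and clr(y)."""
    if len(x) != len(y):
        raise ValueError("compositions must have the same number of parts")
    if min(x) <= 0 or min(y) <= 0:
        raise ValueError("all parts must be strictly positive")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))


# Hypothetical example: a posterior over three vulnerability classes
# compared against an expert prior over the same classes.
posterior = [0.2, 0.3, 0.5]
prior = [0.25, 0.25, 0.5]
gap = aitchison_distance(posterior, prior)
```

Because the clr transform subtracts the mean log, the distance is unchanged when either composition is rescaled, so unnormalized class counts and normalized probabilities give the same result. Zero-valued parts need a replacement strategy before the logarithm can be taken.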

2605.19233 2026-06-01 cs.CR cs.LG quant-ph 版本更新

Quantum Machine Learning for Cyber-Physical Anomaly Detection in Unmanned Aerial Vehicles: A Leakage-Free Evaluation with Proxy-Audited Feature Sets

量子机器学习在无人机网络物理异常检测中的应用:基于代理审计特征集的无泄漏评估

Carlos A. Durán Paredes, Javier E. León Calderón, Nicolás Sánchez Perea, Germán Darío Díaz, Camilo Segura Quintero

发表机构 * Corporation for Aerospace Initiatives, Research and Innovation (CASIRI)(航空航天研究与创新公司) Department of Electronics Engineering, Universidad Nacional de Colombia(哥伦比亚国立大学电子工程系) Department of Electronics Engineering, Universidad del Cauca(考卡大学电子工程系) Department of Physics, Universidad del Cauca(考卡大学物理系)

AI总结 针对无人机网络物理攻击,提出无泄漏评估框架,结合分组时间协议、三模式特征审计和混合XGBoost+数据重上传分类器,验证量子增强混合方法的增量优势。

Comments 10 pages, 7 figures, 1 table; open Qiskit 2.x implementation available at https://github.com/Carlosandp/TLM-UAV-Quantum-Anomaly-Detection

详情
AI中文摘要

无人机是网络物理系统,其攻击面涵盖网络化航空电子设备和机载传感器融合:受损的GPS或电池模块可以模拟良性任务段并逃避简单的异常检测器。我们在多传感器TLM:UAV基准上对无人机异常检测的量子机器学习进行了无泄漏评估。三项贡献支持该研究。(i) 一种分组感知时间协议(B2)将数据集划分为十个连续的TimeUS块,并在十个随机种子上进行评估,消除了随机分层分割混合邻近样本所产生的膨胀。(ii) 一种三模式特征审计(完整/宽松/严格)量化了准确度有多少来自瞬时物理信号与上下文代理(累积能量、电池状态、GPS轨迹)。(iii) 在相同预算下,将混合XGBoost+数据重上传(DRU)分类器与五个配对的非线性控制(原始、PCA、多项式-2、随机RBF和未训练的DRU映射)进行基准测试。独立DRU在种子间并不始终匹配最强的经典基线;然而,经过训练的DRU混合模型是唯一一个平均F1宏从完整模式到严格模式向上移动(+0.05)的模型,这一方向性信号由于种子间标准差而无法解释为统计上确定的差异。经过训练的DRU混合模型在无代理评估下还记录了最低的平均误报率,但受所报告的种子间方差影响。我们将其视为一种增量的、可复现的量子增强混合优势,并提供一个开源的Qiskit 2.x实现,作为NISQ时代航空航天系统中网络安全分析的基准。

英文摘要

Unmanned aerial vehicles (UAVs) are cyber-physical systems whose attack surface spans networked avionics and on-board sensor fusion: a compromised GPS or battery module can mimic a benign mission segment and evade naive anomaly detectors. We present a leakage-free evaluation of quantum machine learning for UAV anomaly detection on the multi-sensor TLM:UAV benchmark. Three contributions support the study. (i) A group-aware temporal protocol (B2) partitions the dataset into ten contiguous TimeUS blocks and evaluates over ten seeds, eliminating the inflation produced by random stratified splits that mix neighbouring samples. (ii) A three-mode feature audit (full/loose/strict) quantifies how much accuracy stems from instantaneous physical signals versus contextual proxies (cumulative energy, battery state, GPS trajectory). (iii) A hybrid XGBoost + Data Reuploading (DRU) classifier is benchmarked against five paired non-linear controls (raw, PCA, polynomial-2, random-RBF, and an untrained DRU map) under identical budgets. The standalone DRU does not consistently match the strongest classical baseline across seeds; however, the trained-DRU hybrid is the only model whose mean F1 macro shifts upward from full to strict (+0.05), a directional signal that the per-seed standard deviations prevent from being interpreted as a statistically established difference. The trained-DRU hybrid also records the lowest mean false-alarm rate under proxy-free evaluation, subject to the inter-seed variance reported. We frame this as an incremental, reproducible quantum-enhanced hybrid benefit, and provide an open Qiskit 2.x implementation as a benchmark for cybersecurity analytics in NISQ-era aerospace systems.

2605.19145 2026-06-01 cs.LG 版本更新

PMF-CL: Pareto-Minimal-Forgetting Continual Learner for Conflicting Tasks

PMF-CL: 面向冲突任务的帕累托最小遗忘持续学习器

Srijith Nair, Atilla Eryilmaz, Jia Liu

发表机构 * Department of Electrical and Computer Engineering(电气与计算机工程系)

AI总结 提出基于多任务学习视角的帕累托最优框架,通过寻找帕累托最优解实现冲突任务下最小化遗忘的持续学习,并推导出适用于线性回归、基函数回归及具有二次上界损失函数的帕累托最小遗忘算法。

Comments 25 pages, 4 figures, 4 algorithms

详情
AI中文摘要

文献中提出了许多持续学习算法来解决机器学习模型中的灾难性遗忘问题(即学习新任务导致先前学习任务性能下降)。尽管所有持续学习方法都使用某种形式的记忆来保留过去任务的信息,但对需要存储哪些信息以最小化灾难性遗忘的基本理解仍然难以捉摸。最近,人们认识到,在存在所有任务共同全局最小化器的强假设下,灾难性遗忘可以完全避免。然而,在实践中,任务很少具有共同的全局最小化器,一定程度的遗忘是不可避免的。本文提出了一个基于多任务学习视角的、原则性且系统化的冲突任务持续学习基础框架。该方法基于寻找帕累托最优解,即根据定义,在帕累托意义上最小化遗忘先前任务的解。我们推导了线性回归和基函数回归的帕累托最小遗忘持续学习算法,以及具有二次上界的一般损失函数(例如逻辑回归)。对于二次问题,PMF-CL使用内存高效的迭代更新,对于具有$d$个参数的模型,静态内存占用为$\mathcal{O}(d^2)$。

英文摘要

In the literature, many continual learning (CL) algorithms have been proposed to address the issue of catastrophic forgetting in ML models (i.e., learning new tasks leads to the loss of performance on previously learned tasks). Although all CL approaches use some form of memory to retain information about past tasks, a grounded understanding of what information needs to be stored to minimize catastrophic forgetting remains elusive. Recently, it has been recognized that under the strong assumption of the existence of a common global minimizer over all tasks, catastrophic forgetting can be completely avoided. However, in practice, tasks rarely have a common global minimizer, and a certain amount of forgetting is inevitable. In this paper, we propose a foundational framework for principled and systematic CL of conflicting tasks using a multi-task learning (MTL) perspective. The approach is based on finding Pareto-optimal solutions, i.e., the solutions which, by definition, minimally forget the previous tasks in the Pareto sense. We derive Pareto-minimal-forgetting CL algorithms for linear and basis-function regression, and for general loss functions that have a quadratic upper bound, e.g., logistic regression. For quadratic problems, PMF-CL uses memory-efficient iterative updates with a static memory footprint of $\mathcal{O}(d^2)$ for models with $d$ parameters.
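One way to see why quadratic problems need only $\mathcal{O}(d^2)$ memory is that a least-squares task is fully summarized by the $d \times d$ matrix $X^\top X$ and the $d$-vector $X^\top y$. The sketch below is an assumption-heavy illustration, not the PMF-CL algorithm: it uses plain weighted-sum scalarization (whose minimizer is Pareto-optimal for convex losses when all weights are positive), and the class name, weights, and toy tasks are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


class QuadraticMemory:
    """Fixed-size summary of all least-squares tasks seen so far."""

    def __init__(self, d):
        self.A = [[0.0] * d for _ in range(d)]  # weighted sum of X^T X
        self.b = [0.0] * d                      # weighted sum of X^T y

    def absorb(self, X, y, weight=1.0):
        """Fold one task into the summary; the raw data can then be dropped."""
        d = len(self.b)
        for row, target in zip(X, y):
            for i in range(d):
                self.b[i] += weight * row[i] * target
                for j in range(d):
                    self.A[i][j] += weight * row[i] * row[j]

    def minimizer(self):
        """Minimizer of the weighted sum of squared-error losses."""
        return solve(self.A, self.b)


# Two conflicting 1-D tasks: one prefers theta = 0, the other theta = 2.
memory = QuadraticMemory(d=1)
memory.absorb(X=[[1.0]], y=[0.0])
memory.absorb(X=[[1.0]], y=[2.0])
theta = memory.minimizer()
```

The two toy tasks share no common minimizer, so any solution trades one off against the other; changing the `weight` argument moves the solution along the Pareto front while the stored state stays the same size.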

2605.18807 2026-06-01 cs.LG cs.AI 版本更新

Block-Based Double Decoders

基于块的双解码器

Asher Labovich, Benjamin Bradley, Vanessa Alexander, Chaitanya Harsha

发表机构 * Brown University(布朗大学)

AI总结 提出基于块的双解码器架构,利用双重因果块注意力掩码实现全损失监督和静态序列打包,结合解码器训练效率与编码器-解码器推理效率,在缩放定律实验中优于编码器-解码器并接近解码器模型,推理时KV缓存和每token计算减少至少2/3。

Comments 8 pages main, 13 pages total

详情
AI中文摘要

编码器-解码器模型在推理时间上比仅解码器模型节省大量成本,但其预训练目标存在稀疏监督和动态序列长度的问题,使其难以大规模实践。我们提出了基于块的双解码器,一种新颖的Transformer架构,利用双重因果块注意力掩码进行全损失监督和静态序列打包,结合了解码器训练效率与编码器-解码器推理效率。在缩放定律实验中,基于块的双解码器显著优于编码器-解码器,并在各规模上紧密跟踪仅解码器模型。在推理时,它们在不牺牲预填充缓存或仅解码器模型可用的其他现有推理优化的情况下,将KV缓存内存和每token计算减少至少2/3。

英文摘要

Encoder-decoder models offer substantial inference-time savings over decoder-only models, but their pretraining objectives suffer from sparse supervision and dynamic sequence lengths, keeping them out of practice at scale. We propose block-based double decoders, a novel transformer architecture that utilizes doubly-causal block-based attention masks to train with full loss supervision and static sequence packing, combining decoder-only training efficiency with encoder-decoder inference efficiency. In scaling law experiments, block-based double decoders strongly outperform encoder-decoders and closely track decoder-only models across scales. At inference time, they cut KV-cache memory and per-token compute by at least 2/3 without sacrificing prefill caching or other existing inference optimizations available to decoder-only models.

2605.18803 2026-06-01 cs.LG cs.AI 版本更新

PROWL: Prioritized Regret-Driven Optimization for World Model Learning

PROWL: 基于优先遗憾驱动的世界模型学习优化

Ahmet H. Güzel, Jenny Seidenschwarz, Benjamin Graham, Jonathan Sadeghi, Jeffrey Hawke, Ilija Bogunovic

发表机构 * University College London AI Centre(伦敦大学学院人工智能中心) Odyssey University of Basel(巴塞尔大学)

AI总结 提出一种KL约束的对抗课程,通过训练策略暴露扩散世界模型的高误差轨迹并持续微调,结合优先对抗轨迹缓冲区,解决被动数据中罕见关键转换的鲁棒性问题。

详情
AI中文摘要

现代动作条件视频世界模型在短期视觉真实性上表现强劲,但在罕见且对交互关键的转换上仍不可靠,而这些转换主导了下游规划和策略性能。由于被动演示数据系统性地对这些高影响区域采样不足,提高鲁棒性需要主动引发模型失败,而非依赖其自然发生。我们引入了一种KL约束的对抗课程,其中训练一个策略来暴露基于扩散的世界模型的高误差轨迹,同时保持接近行为分布。世界模型在这些对抗性发现的轨迹上持续微调,形成一个对抗训练循环,将罕见失败转化为稳定的、接近分布的训练信号,而不会漂移到分布外利用。为了在模型改进时持续对未解决的弱点施加压力,我们提出了一种优先对抗轨迹(PAT)缓冲区,该缓冲区根据预测误差、动作保真度和学习进度对轨迹重新排序,将训练集中在未解决的失败模式上,而不是重复访问已解决的案例。我们在MineRL框架中实现了我们的方法,并在保留的分布外轨迹上进行了评估;PROWL提高了相对于仅在被动数据上训练的模型的鲁棒性,揭示了在弱行为约束下的奖励黑客行为,并证明了有效的对抗世界模型训练关键取决于平衡探索性失败发现与显式行为正则化。我们的结果表明,可扩展的世界模型不仅受益于更大的数据集,还受益于选择性生成信息丰富的训练数据。

英文摘要

Modern action-conditioned video world models achieve strong short-horizon visual realism, yet remain unreliable on rare, interaction-critical transitions that dominate downstream planning and policy performance. Because passive demonstration data systematically under-samples these high-impact regimes, improving robustness requires actively eliciting model failures rather than relying on their natural occurrence. We introduce a KL-constrained adversarial curriculum in which a policy is trained to expose high-error trajectories of a diffusion-based world model while remaining close to the behavior distribution. The world model is continuously fine-tuned on these adversarially discovered trajectories, yielding an adversarial training loop that converts rare failures into a stable, near-distribution training signal without drifting into out-of-distribution exploitation. To maintain pressure on unresolved weaknesses as the model improves, we propose a Prioritized Adversarial Trajectory (PAT) buffer that re-ranks trajectories based on prediction error, action fidelity, and learning progress, focusing training on unresolved failure modes rather than repeatedly revisiting solved cases. We implement our approach in the MineRL framework and evaluate it on held-out out-of-distribution trajectories; PROWL improves robustness over models trained on passive data alone, reveals reward-hacking behaviors under weak behavioral constraints, and demonstrates that effective adversarial world-model training critically depends on balancing exploratory failure discovery with explicit behavioral regularization. Our results suggest that scalable world models benefit not only from larger datasets, but also from selectively generating informative training data.

2605.18606 2026-06-01 cs.LG 版本更新

Physics-Aligned Canonical Equivariant Fourier Neural Operator under Symmetry-Induced Shifts

对称性诱导位移下的物理对齐规范等变傅里叶神经算子

Jiaxiao Xu, Changhong Mou, Yeyu Zhang, Fengxiang He

发表机构 * Shanghai University of Finance and Economics(上海财经大学) Utah State University(犹他州立大学) University of Edinburgh(爱丁堡大学)

AI总结 提出PACE-FNO,通过李代数坐标估计将输入场对齐到参考帧,再应用标准FNO并恢复目标帧,利用周期性演化方程的连续对称性分离坐标对齐与物理演化,在多种PDE上实现OOD相对误差降低高达12倍。

Comments 36 pages, 14 figures, 10 tables

详情
AI中文摘要

神经算子近似PDE解映射,但未必尊重控制方程的对称性。在分布外(OOD)场景中,标准神经算子通常需要在单个映射中学习坐标对齐和物理演化,这可能会损害泛化能力。我们利用周期性域上演化方程的已知连续对称性来分离这两个角色。我们提出了物理对齐规范等变傅里叶神经算子(PACE-FNO),它通过李代数坐标估计器估计输入帧,将场映射到参考帧,应用标准傅里叶神经算子(FNO),并将预测恢复到目标帧。我们使用有界对称扰动联合训练对齐和算子预测,并在推理时通过可选的低维精化步骤更新估计帧。等变性通过输入和输出变换强制执行,而FNO架构保持不变。在周期性域上的1-D和2-D Burgers、浅水方程和Navier-Stokes方程中,PACE-FNO在分布内(ID)精度上与标准神经算子相当,并在平移和伽利略位移下将分布外(OOD)相对误差比带对称增强的FNO(FNO+Aug)降低多达12倍,在耦合旋转-平移位移下增益较小。消融实验表明,对齐输入和恢复输出帧贡献了大部分OOD增益;推理时精化提供了较小的修正。

英文摘要

Neural operators approximate PDE solution maps, but they need not respect the symmetries of the governing equation. In out-of-distribution (OOD) regimes, a standard neural operator must often learn coordinate alignment and physical evolution within a single map, which can hurt generalization. We use known continuous symmetries of evolution equations on periodic domains to separate these two roles. We propose the Physics-Aligned Canonical Equivariant Fourier Neural Operator (PACE-FNO), which estimates the input frame with a Lie-algebra coordinate estimator, maps the field to a reference frame, applies a standard Fourier Neural Operator (FNO), and restores the prediction to the target frame. We train alignment and operator prediction jointly using bounded symmetry perturbations, with an optional low-dimensional refinement step that updates the estimated frame at inference. Equivariance is enforced by the input and output transformations, while the FNO architecture remains unchanged. Across 1-D and 2-D Burgers, shallow-water, and Navier-Stokes equations on periodic domains, PACE-FNO matches the in-distribution (ID) accuracy of standard neural operators and reduces out-of-distribution (OOD) relative error by up to 12x over FNO with symmetry augmentation (FNO+Aug) under translations and Galilean shifts, with smaller gains for coupled rotation-translation shifts. Ablations show that aligning the input and restoring the output frame account for most OOD gains; inference-time refinement provides a smaller correction.
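The align, apply, restore pattern can be sketched for the simplest case, translations on a 1-D periodic grid. This is a toy illustration under stated assumptions: the frame estimator here is a hand-written argmax (the paper uses a learned Lie-algebra coordinate estimator), the base operator is an arbitrary position-dependent map rather than an FNO, and all function names are hypothetical.

```python
def roll(u, s):
    """Cyclically shift a list right by s positions: result[i] = u[i - s]."""
    s %= len(u)
    return u[-s:] + u[:-s]


def estimate_shift(u):
    """Toy frame estimator: index of the peak (assumes the peak is unique)."""
    return max(range(len(u)), key=lambda i: u[i])


def canonical_apply(op, u):
    """Align to the reference frame, apply op, then restore the input frame."""
    s = estimate_shift(u)
    reference = roll(u, -s)  # move the peak to index 0
    prediction = op(reference)
    return roll(prediction, s)


def position_dependent_op(u):
    """A deliberately non-equivariant operator: weights depend on position."""
    return [value * (i + 1) for i, value in enumerate(u)]


field = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0]
shifted = roll(field, 2)

wrapped_commutes = canonical_apply(position_dependent_op, shifted) == roll(
    canonical_apply(position_dependent_op, field), 2
)
raw_commutes = position_dependent_op(shifted) == roll(position_dependent_op(field), 2)
```

The wrapped operator commutes with the shift even though the base operator does not, because every shifted copy of the field is mapped to the same reference input before the operator sees it. The guarantee is only as good as the frame estimate, which is why a learned estimator and an optional refinement step matter in practice.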

2605.18364 2026-06-01 cs.LG math.OC 版本更新

Proximal basin hopping: global optimization with guarantees

近端盆地跳跃:有保证的全局优化

Guillaume Lauga, Cesare Molinari, Samuel Vaiter

发表机构 * LJAD MALGA Université Côte d’Azur(蔚蓝海岸大学) Università di Genova(热那亚大学) CNRS(法国国家科学研究中心)

AI总结 提出近端盆地跳跃(PBH)理论框架,结合近端优化与局部最小化,构建算法以高概率收敛到全局最小值,在合成硬函数和深度学习标度律拟合等实际问题中表现优于有理论保证的已知算法,且维度越高性能差距越大。

详情
AI中文摘要

全局优化是一个具有挑战性的问题,大量算法展示了经验上的成功,但缺乏理论支持。在这项工作中,我们提出了一个名为近端盆地跳跃(PBH)的新理论框架,精心设计以结合近端优化和局部最小化。我们利用它构建了一个实用算法,在使用有限样本时以高概率收敛到全局最小值。近端盆地跳跃在标准合成硬函数和实际问题(如拟合深度学习标度律)上优于具有理论保证的已知算法。此外,维度越高,性能差距越大。

英文摘要

Global optimization is a challenging problem: many algorithms display empirical success, but few have theoretical backing. In this work, we propose a new theoretical framework called Proximal Basin Hopping (PBH), carefully tailored to combine proximal optimization and local minimization. We use it to construct a practical algorithm that converges to the global minimizer with high probability when using a finite number of samples. Proximal Basin Hopping outperforms well-known algorithms with theoretical backing on standard synthetic hard functions and on real problems such as fitting scaling laws for deep learning. Furthermore, the higher the dimension, the larger the performance gap.

2605.18024 2026-06-01 cs.LG cs.AI cs.MA 版本更新

Interaction-Breaking Adversarial Learning Framework for Robust Multi-Agent Reinforcement Learning

交互破坏对抗学习框架用于鲁棒多智能体强化学习

Sunwoo Lee, Mingu Kang, Yonghyeon Jo, Seungyul Han

发表机构 * Graduate School of Artificial Intelligence, UNIST, Ulsan, South Korea(人工智能研究生院,UNIST,乌山,韩国)

AI总结 提出交互破坏对抗学习框架,从信息论角度构建攻击破坏智能体间交互,并训练智能体在干扰下可靠执行,提升鲁棒性。

Comments 9 pages for main, 33 pages for total, Accepted to ICML 2026

详情
AI中文摘要

合作是多智能体强化学习(MARL)的核心,然而当外部扰动破坏智能体间的交互时,学到的协调可能变得脆弱。先前的鲁棒MARL方法主要考虑面向价值的攻击,在交互结构本身被破坏时存在鲁棒性缺口。在本文中,我们提出一个交互破坏对抗学习(IBAL)框架,该框架从信息论角度构建攻击,通过扰动智能体的观测和动作来阻碍协调,并训练智能体在此类干扰下可靠执行。实验上,我们的方法在多种攻击设置下比现有鲁棒MARL基线具有更好的鲁棒性,甚至在智能体缺失场景下也表现出更强的性能。我们的代码可在 https://sunwoolee0504.github.io/IBAL 获取。

英文摘要

Cooperation is central to multi-agent reinforcement learning (MARL), yet learned coordination can be fragile when external perturbations disrupt inter-agent interactions. Prior robust MARL methods have primarily considered value-oriented attacks, leaving a gap in robustness when interaction structures themselves are corrupted. In this paper, we propose an interaction-breaking adversarial learning (IBAL) framework that takes an information-theoretic view to construct attacks that impede coordination by perturbing agents' observations and actions, and trains agents to perform reliably under such disruptions. Empirically, our approach improves robustness over existing robust MARL baselines across diverse attack settings and yields stronger performance even under agent-missing scenarios. Our code is available at https://sunwoolee0504.github.io/IBAL.

2605.17524 2026-06-01 cs.LG cs.DB 版本更新

Covariance Structure and Coordinate Heterogeneity Govern Binary Quantization of Contrastive Embeddings

协方差结构与坐标异质性支配对比嵌入的二值量化

Wenxuan Xiao

发表机构 * Changsha University(长沙大学)

AI总结 通过分析InfoNCE训练表示的协方差结构,揭示了协方差矩阵的非对角项和坐标异质性如何分别影响二值量化的排序保真度和设计选择,并推导出缩放律以指导系统设计。

Comments 21 pages, 1 figure, 19 tables (6 in main text, 13 in appendix)

详情
AI中文摘要

二值量化(BQ)将高维嵌入压缩为每个坐标一或两个比特,从而实现极速的最近邻搜索。然而,一个显著的谜题仍然存在:BQ在对比嵌入上取得了有竞争力的召回率,但在其他嵌入上却失败——并且两个领先系统采用了截然相反的策略(随机旋转与保留坐标轴),而没有共同的理论解释何时适用哪种策略。我们通过将最近建立的InfoNCE训练表示的Gaussian结构与BQ质量的统计框架联系起来,解决了这个谜题。我们的分析揭示了协方差矩阵的两个不同作用。首先,完整的协方差结构——而不仅仅是其对角线——决定了排序保真度的绝对水平,其中非对角相关性贡献了30-50%的信号。其次,坐标异质性(每个坐标方差的非均匀性)支配着关键设计选择:每个额外比特贡献多少,以及随机旋转是有益还是有害。我们推导了Gaussian模型下排序保真度的近似表达式,表明幅度比特携带与异质性成比例的信息,并表明随机旋转恰好破坏了某个范式所利用的信号,同时创造了另一个范式所需的各向同性。一个现象学缩放律预测了跨模型和维度的保真度。在涵盖9个嵌入家族的18个数据集上的实验支持了主要预测,并据我们所知,为二值量化系统提供了第一个有原则的设计指南。

英文摘要

Binary quantization (BQ) compresses high-dimensional embeddings into one or two bits per coordinate, enabling nearest neighbor search at extreme speed. Yet a striking puzzle persists: BQ achieves competitive recall on contrastive embeddings but fails on others -- and two leading systems adopt diametrically opposite strategies (random rotation vs. preserving coordinate axes) without a common theory explaining when each is appropriate. We address this puzzle by connecting the Gaussian structure recently established for InfoNCE-trained representations to a statistical framework for BQ quality. Our analysis reveals two distinct roles of the covariance matrix. First, the full covariance structure -- not merely its diagonal -- determines the absolute level of ranking fidelity, with off-diagonal correlations contributing 30--50% of the signal. Second, coordinate heterogeneity (the non-uniformity of per-coordinate variances) governs key design choices: how much each additional bit contributes, and whether random rotation helps or hurts. We derive approximate expressions for ranking fidelity under a Gaussian model, show that the magnitude bit carries information proportional to heterogeneity, and show that random rotation destroys precisely the signal that one paradigm exploits while creating the isotropy that the other requires. A phenomenological scaling law predicts fidelity across models and dimensions. Experiments on 18 datasets spanning 9 embedding families support the main predictions and provide, to our knowledge, the first principled design guide for binary quantization systems.
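
As a rough illustration of the one-bit-per-coordinate case described above, the sketch below keeps only the sign of each coordinate and ranks database vectors by Hamming distance to the query. It is a generic sign-bit scheme, not either of the "two leading systems" the abstract refers to; the names `sign_bits`, `hamming`, and `rank_by_hamming`, and the toy vectors, are assumptions made for this example.

```python
from typing import List, Sequence


def sign_bits(vec: Sequence[float]) -> int:
    """Pack one sign bit per coordinate into an int (bit i is set if vec[i] > 0)."""
    code = 0
    for i, value in enumerate(vec):
        if value > 0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    """Number of coordinates whose sign bits differ."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query: Sequence[float], database: List[Sequence[float]]) -> List[int]:
    """Return database indices sorted from nearest to farthest in Hamming distance."""
    q = sign_bits(query)
    codes = [sign_bits(v) for v in database]
    return sorted(range(len(database)), key=lambda i: hamming(q, codes[i]))


# Toy data (hypothetical): three 4-dimensional embeddings and one query.
database = [
    [0.9, -0.2, 0.4, -0.7],
    [-0.8, 0.3, -0.5, 0.6],
    [0.7, 0.1, 0.5, -0.6],
]
query = [1.0, -0.1, 0.3, -0.9]
order = rank_by_hamming(query, database)
print(order)
```

A random rotation would be applied to the vectors before `sign_bits`; whether that helps or hurts is exactly the design question the abstract says depends on coordinate heterogeneity.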

2605.17373 2026-06-01 cs.LG cs.AI 版本更新

FML-bench: A Controlled Study of AI Research Agent Strategies from the Perspective of Search Dynamics

FML-bench:从搜索动力学视角对AI研究代理策略的受控研究

Qiran Zou, Hou Hei Lam, Wenhao Zhao, Tingting Chen, Yiming Tang, Samson Yu, Yingtao Zhu, Srinivas Anumasa, Zufeng Zhang, Tianyi Zhang, Chang Liu, Zhengyao Jiang, Anirudh Goyal, Dianbo Liu

发表机构 * National University of Singapore(新加坡国立大学) Tsinghua University(清华大学) University of Minnesota(明尼苏达大学) Weco Meta

AI总结 本文提出FML-Bench基准,通过分离策略与基础设施并定义过程级指标,评估六种代理策略,发现贪婪爬山法接近最优树搜索,且自适应策略基于搜索密度切换可超越其他代理。

Comments Our benchmark is available at: https://github.com/qrzou/FML-bench

详情
AI中文摘要

AI研究代理通过自动化假设生成、实验和实证改进来加速机器学习研究。现有代理策略从贪婪爬山法到树搜索和进化优化不等,但哪些策略选择驱动性能仍不清楚。回答这个问题需要一个基准,该基准将代理策略(例如搜索拓扑)与执行基础设施(例如代码编辑器)分离,以便性能差异归因于策略而非基础设施,并提供最终分数之外的过程级指标来分析探索行为。现有基准支持有限。我们提出FML-Bench,一个涵盖10个领域18个基础ML研究任务的基准,将代理策略与执行基础设施分离,并定义了12个过程级行为指标。评估六个代表性代理,我们发现:(1) 策略复杂性本身并不能保证强性能:一个简单的贪婪爬山者几乎与最佳性能的树搜索代理相匹配,两者均远高于其余代理;(2) 我们的分析表明,这种模式与改进机会结构相关:当机会密集时,贪婪搜索往往更有效,而当机会稀疏时,树搜索和进化策略往往更有效;基于这一见解构建的自适应代理在检测到改进停滞时切换到更广泛的探索,并优于其他六个代理,初步支持了这一观察;(3) 过程级分析表明,早期收敛和方向聚焦的探索与最终性能显著相关,而解决方案多样性和计算成本则不然。我们的基准可在 https://github.com/qrzou/FML-bench 获取。

英文摘要

AI research agents accelerate ML research by automating hypothesis generation, experimentation, and empirical refinement. Existing agent strategies range from greedy hill-climbing to tree search and evolutionary optimization, yet which strategy choices drive performance remains unclear. Answering this question requires a benchmark that separates agent strategy (e.g., search topology) from execution infrastructure (e.g., code editor), so that performance differences are attributable to strategy rather than infrastructure, and that provides process-level metrics beyond final scores to analyze exploration behaviors. Existing benchmarks offer limited support. We propose FML-Bench, a benchmark of 18 fundamental ML research tasks across 10 domains that separates agent strategy from execution infrastructure and defines 12 process-level behavioral metrics. Evaluating six representative agents, we find that: (1) strategy complexity alone does not guarantee strong performance: a simple greedy hill-climber nearly matches the best-performing tree-search agent, both well above the remaining agents; (2) our analysis suggests this pattern relates to improvement opportunity structure: greedy search tends to be more effective when opportunities are dense, while tree-search and evolutionary strategies tend to be more effective when opportunities are sparse; an adaptive agent built on this insight switches to broader exploration upon detecting improvement stagnation and outperforms the other six agents, lending initial support to this observation; and (3) process-level analysis reveals that early convergence and directionally focused exploration are significantly associated with final performance, while solution diversity and compute cost are not. Our benchmark is available at: https://github.com/qrzou/FML-bench.

2605.17126 2026-06-01 stat.ML cs.LG stat.ME 版本更新

Multi-task Linear Regression without Eigenvalue Lower Bounds: Adaptivity, Robustness, and Safety

无需特征值下界的多任务线性回归:自适应性、鲁棒性与安全性

Seok-Jin Kim

发表机构 * Columbia(哥伦比亚大学)

AI总结 针对存在污染任务的多任务线性回归问题,提出基于矩阵加权范数正则化的估计器,引入相对平衡条件,在弱谱假设下达到与现有方法相当的预测误差界,并具备安全性保证。

Comments Accepted at ICML 2026

详情
AI中文摘要

我们研究了存在污染任务的多任务线性回归问题。我们处理了大多数任务的未知参数在 $\ell_2$ 范数下接近,而部分任务是任意异常值的情况。现有的理论框架严重依赖于每个任务的经验二阶矩的最小特征值远离零(阶为 $\Omega(1)$)的假设。关键的是,这一假设在许多高维场景中不成立,导致先前的保证无效。为了克服这一限制,我们提出了一种基于矩阵加权范数正则化的估计器。我们还引入了一个相对平衡条件,由平衡常数量化,该条件将每个任务的二阶矩与平均内点几何进行比较,并放宽了对任务级二阶矩下界的需求。在具有适度平衡性的有利情况下,我们的预测 MSE 界在显著更弱的谱假设下匹配 Duan 和 Wang (2023) 的速率;由此得到的任务总体 MSE 在最小化极大意义下是最优的,仅相差对数因子。此外,我们证明了我们的估计器具有安全性保证:当相关的平衡常数很大或无穷大,或者任务不相关时,该方法的表现不会差于独立任务学习。

英文摘要

We study the multi-task linear regression problem in the presence of contaminated tasks. We address the setting where the unknown parameters of a majority of tasks are close in the $\ell_2$-norm, while a fraction of tasks are arbitrary outliers. Existing theoretical frameworks for this problem rely heavily on the assumption that the empirical second moment of each task has a minimum eigenvalue bounded away from zero (order $\Omega(1)$). Crucially, this assumption fails in many high-dimensional scenarios, rendering prior guarantees vacuous. To overcome this limitation, we propose an estimator based on matrix-weighted norm regularization. We also introduce a relative balancedness condition, quantified by a balancedness constant, that compares each task's second moment with the average inlier geometry and relaxes the need for taskwise second-moment lower bounds. In favorable regimes with moderate balancedness, our prediction MSE bounds match the rate of Duan and Wang (2023) under substantially weaker spectral assumptions; the resulting task-overall MSE is minimax optimal up to logarithmic factors. Furthermore, we demonstrate that our estimator enjoys a safety guarantee: when the relevant balancedness constant is large or infinite, or when tasks are unrelated, the method performs no worse than independent task learning.

2605.15706 2026-06-01 cs.LG 版本更新

Differentiable Mixture-of-Agents Incentivizes Swarm Intelligence of Large Language Models

可微分的混合智能体激励大型语言模型的群体智能

Xingjian Wu, Junkai Lu, Siyu Yan, Xiangfei Qiu, Jilin Hu, Chenjuan Guo, Bin Yang

发表机构 * East China Normal University(华东师范大学)

AI总结 提出可微分的混合智能体(DMoA)框架,通过可微分的上下文感知路由机制动态激活智能体,实现推理过程中的弹性协作,并在9个基准上取得最优性能。

详情
AI中文摘要

大型语言模型(LLMs)的最新进展推动了用于复杂推理任务的多智能体系统(MAS)的发展。然而,现有的MAS通常依赖于预定义或预编译的通信拓扑,这限制了它们对动态任务需求的灵活性和适应性。在这项工作中,我们提出了可微分的混合智能体(DMoA),一个自我进化的多智能体框架,能够在推理过程中实现弹性且自适应的智能体协作。不同于静态构建工作流,DMoA在每个推理步骤动态路由和激活智能体,使系统能够隐式模拟多样化的通信拓扑并适应不断变化的需求。为了实现这一点,我们设计了一个可微分的、上下文感知的路由机制,利用循环结构融入历史和上下文信息,以逐步方式产生稀疏的智能体激活。此外,我们引入预测熵作为自监督信号来优化路由过程,实现了无需外部标注的高效测试时自适应。在9个基准上的广泛实验表明,DMoA在实现最先进性能的同时,展现出强大的效率、鲁棒性和集成能力。

英文摘要

Recent advances in Large Language Models (LLMs) have catalyzed the development of multi-agent systems (MAS) for complex reasoning tasks. However, existing MAS typically rely on pre-defined or pre-compiled communication topologies, which limits their flexibility and adaptability to dynamic task requirements. In this work, we propose Differentiable Mixture-of-Agents (DMoA), a self-evolving multi-agent framework that enables elastic and adaptive agent collaboration during inference. Instead of statically constructing workflows, DMoA dynamically routes and activates agents at each reasoning step, allowing the system to implicitly simulate diverse communication topologies and adapt to evolving demands. To achieve this, we design a differentiable, context-aware routing mechanism that leverages recurrent structures to incorporate historical and contextual information, producing sparse agent activations in a step-wise manner. Furthermore, we introduce predictive entropy as self-supervised signals to optimize the routing process, enabling efficient test-time adaptation without external annotations. Extensive experiments across 9 benchmarks demonstrate that DMoA achieves state-of-the-art performance while exhibiting strong efficiency, robustness, and ensembling capabilities.
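
The self-supervised signal mentioned above, predictive entropy, is the Shannon entropy of a model's output distribution. The sketch below only computes that quantity; how DMoA feeds it into the router is not specified in the abstract, so nothing about the routing objective is implied. The function name `predictive_entropy` is an assumption for this example.

```python
import math
from typing import Sequence


def predictive_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete predictive distribution.

    Terms with zero probability contribute nothing, by the convention 0 * log 0 = 0.
    """
    return -sum(p * math.log(p) for p in probs if p > 0.0)


confident = predictive_entropy([1.0, 0.0, 0.0])   # a fully confident prediction
uncertain = predictive_entropy([0.5, 0.5])        # maximal uncertainty over two options
print(confident, uncertain)
```

Lower entropy means a more confident prediction, which is why it can serve as a label-free training signal at test time.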

2605.15470 2026-06-01 cs.LG physics.ao-ph 版本更新

Njord: A Probabilistic Graph Neural Network for Ensemble Ocean Forecasting

Njord: 一种用于集合海洋预报的概率图神经网络

Daniel Holmberg, Joel Oskarsson, Erik Wikingsson, Fredrik Lindsten, Teemu Roos

发表机构 * University of Helsinki(赫尔辛基大学) ETH AI Center(苏黎世联邦理工学院人工智能中心) Linköping University(林雪平大学)

AI总结 提出结合深度潜变量框架和图神经网络的概率模型Njord,在全球和区域海洋实现单次前向传播采样集合预报,并引入K-means聚类网格适应不规则海面几何,相比确定性基线在观测评估中取得更低误差。

Comments Preprint

详情
AI中文摘要

海洋动力学本质上是混沌的,但现有的机器学习海洋模型仅产生确定性预报。我们介绍了Njord,一种用于海洋预报的概率数据驱动模型,适用于全球和区域领域。Njord结合了深度潜变量框架与图神经网络架构,使得每次预报步骤可以在单次前向传播中采样。我们在全球0.25°分辨率和波罗的海区域2 km分辨率上应用Njord。为了扩展到这些大型海洋网格,我们引入了K-means聚类网格,以适应不规则的海面几何。实验表明,与确定性机器学习基线相比,Njord在两个领域均表现出强劲性能,同时通过采样的集合预报提供不确定性估计。在全球OceanBench基准上,Njord在针对真实观测评估时,在上层海洋变量上平均实现了最低误差,其中海表温度预测改进最大。

英文摘要

Ocean dynamics are inherently chaotic, yet existing machine learning ocean models produce only deterministic forecasts. We introduce Njord, a probabilistic data-driven model for ocean forecasting, applicable to both global and regional domains. Njord combines a deep latent variable framework with a graph neural network architecture, enabling sampling each forecast step in a single forward pass. We apply Njord globally at 0.25° resolution and regionally to the Baltic Sea at 2 km resolution. To scale to these large ocean grids we introduce K-means cluster meshes that adapt to irregular sea surface geometry. Experiments demonstrate strong performance on both domains compared to deterministic machine learning baselines, while also providing uncertainty estimates from the sampled ensemble forecasts. On the global OceanBench benchmark, Njord achieves the lowest errors on average across upper-ocean variables when evaluated against real-world observations, with the largest improvements in surface temperature prediction.

2410.06074 2026-06-01 cs.LG cs.NA math.NA 版本更新

Scalable Mechanistic Neural Networks for Differential Equations and Machine Learning

可扩展的机理神经网络用于微分方程和机器学习

Jiale Chen, Dingling Yao, Adeel Pervez, Dan Alistarh, Francesco Locatello

发表机构 * Institute of Science and Technology Austria (ISTA)(奥地利科学技术研究所)

AI总结 提出可扩展机理神经网络(S-MNN),将计算时间和空间复杂度降为关于序列长度的线性,实现高效建模长期动力学,保持精度和可解释性。

Comments Published as a conference paper at the Thirteenth International Conference on Learning Representations (ICLR 2025): https://openreview.net/forum?id=Oazgf8A24z

详情
Journal ref
International Conference on Learning Representations, 2025, pp. 10018-10039
AI中文摘要

我们提出了可扩展机理神经网络(S-MNN),这是一个增强的神经网络框架,专为涉及长时间序列的科学机器学习应用而设计。通过重新表述原始机理神经网络(MNN)(Pervez等人,2024),我们将计算时间和空间复杂度从分别关于序列长度的三次和二次降低到线性。这一显著改进使得在不牺牲准确性或可解释性的情况下高效建模长期动力学成为可能。大量实验表明,S-MNN在精度上与原始MNN相当,同时大幅减少计算资源。因此,S-MNN可以在应用中直接替换原始MNN,为将机理瓶颈集成到复杂动力系统的神经网络模型中提供实用且高效的工具。源代码可在https://github.com/IST-DASLab/ScalableMNN获取。

英文摘要

We propose Scalable Mechanistic Neural Network (S-MNN), an enhanced neural network framework designed for scientific machine learning applications involving long temporal sequences. By reformulating the original Mechanistic Neural Network (MNN) (Pervez et al., 2024), we reduce the computational time and space complexities from cubic and quadratic with respect to the sequence length, respectively, to linear. This significant improvement enables efficient modeling of long-term dynamics without sacrificing accuracy or interpretability. Extensive experiments demonstrate that S-MNN matches the original MNN in precision while substantially reducing computational resources. Consequently, S-MNN can drop-in replace the original MNN in applications, providing a practical and efficient tool for integrating mechanistic bottlenecks into neural network models of complex dynamical systems. Source code is available at https://github.com/IST-DASLab/ScalableMNN.

2605.11134 2026-06-01 cs.LG cs.AI 版本更新

Spurious Correlation Learning in Preference Optimization: Mechanisms, Consequences, and Mitigation via Tie Training

偏好优化中的虚假相关学习:机制、后果及通过平局训练的缓解方法

Christian Moya, Alex Semendinger, Guang Lin, Elliott Thornley

发表机构 * Department of Mathematics, Purdue University, West Lafayette IN, USA(普渡大学数学系) School of Mechanical Engineering, Purdue University, West Lafayette IN, USA(普渡大学机械工程学院) Massachusetts Institute of Technology, Cambridge MA, USA(麻省理工学院)

AI总结 本文通过统一理论分析揭示了偏好优化(如DPO)中虚假相关学习的机制(均值虚假偏差和因果-虚假相关泄漏),证明其导致分布偏移下的不可逆脆弱性,并提出平局训练数据增强策略以选择性减少虚假学习。

Comments Proceedings of the 43rd International Conference on Machine Learning, 2026, Seoul, South Korea

详情
Journal ref
Proceedings of the 43rd International Conference on Machine Learning, 2026, Seoul, South Korea
AI中文摘要

偏好学习方法(如直接偏好优化DPO)已知会诱导对虚假相关的依赖,导致当前语言模型中的谄媚和长度偏差,并可能在未来系统中造成严重的目标泛化错误。在这项工作中,我们对此现象进行了统一的理论分析,描述了虚假学习的机制、其在部署中的后果以及一种可证明的缓解策略。聚焦于对数线性策略,我们展示了标准偏好学习目标通过两个渠道在总体水平上诱导对虚假特征的依赖:均值虚假偏差和因果-虚假相关泄漏。然后我们表明这种依赖造成了分布偏移的不可逆脆弱性:来自相同训练分布的更多数据无法减少模型对虚假特征的依赖。为了解决这个问题,我们提出了平局训练,一种使用平局(等效用偏好对)的数据增强策略,以引入数据驱动的正则化。我们证明了该方法选择性地减少虚假学习而不降低因果学习。最后,我们在对数线性模型上验证了我们的理论,并提供了实证证据,表明虚假学习机制和平局训练的益处均适用于神经网络和大语言模型。

英文摘要

Preference learning methods like Direct Preference Optimization (DPO) are known to induce reliance on spurious correlations, leading to sycophancy and length bias in today's language models and potentially severe goal misgeneralization in future systems. In this work, we provide a unified theoretical analysis of this phenomenon, characterizing the mechanisms of spurious learning, its consequences on deployment, and a provable mitigation strategy. Focusing on log-linear policies, we show that standard preference-learning objectives induce reliance on spurious features at the population level through two channels: mean spurious bias and causal-spurious correlation leakage. We then show that this reliance creates an irreducible vulnerability to distribution shift: more data from the same training distribution fails to reduce the model's dependence on spurious features. To address this, we propose tie training, a data augmentation strategy using ties (equal-utility preference pairs) to introduce data-driven regularization. We demonstrate that this approach selectively reduces spurious learning without degrading causal learning. Finally, we validate our theory on log-linear models and provide empirical evidence that both the spurious learning mechanisms and the benefits of tie training persist for neural networks and large language models.

2605.02125 2026-06-01 cs.DC cs.LG 版本更新

FedQueue: Queue-Aware Federated Learning for Cross-Facility HPC Training

FedQueue:面向跨设施HPC训练的队列感知联邦学习

Yijiang Li, Emon Dey, Zilinghan Li, Krishnan Raghavan, Ravi Madduri, Kibaek Kim

发表机构 * Computer Science Division, Argonne National Laboratory, Lemont, IL, USA(阿贡国家实验室计算机科学部) Learning Division, Argonne National Laboratory, Lemont, IL, USA(阿贡国家实验室学习部)

AI总结 提出FedQueue协议,通过在线预测队列延迟、截止时间准入控制和陈旧感知聚合,解决跨HPC设施联邦学习中的随机调度延迟问题,实现非凸目标下的收敛保证并在实际部署中提升20.5%性能。

详情
AI中文摘要

跨多个HPC设施的联邦学习面临来自批处理调度器的随机准入延迟,这些延迟主导了挂钟时间。同步联邦学习遭受严重的掉队者问题,而异步联邦学习在队列激增时会累积过时更新。我们提出FedQueue,一种队列感知的联邦学习协议,将调度器延迟直接纳入训练和聚合中,它(i)在线预测每个设施的队列延迟以预算本地工作量,(ii)应用基于截止时间的准入控制,缓冲迟到到达以限制陈旧度,以及(iii)执行陈旧感知聚合以稳定异构本地工作负载。我们证明了在有限陈旧度下非凸目标以$\mathcal{O}(1/\sqrt{R})$速率收敛,并表明在队列预测误差下准入控制以高概率产生有限陈旧度。FedQueue在真实跨设施部署中相比基线算法提升了20.5%。受控队列模拟显示对基线有稳健提升;特别是在高队列方差和非独立同分布分区下,达到目标精度水平的时间最多减少60%。

英文摘要

Federated learning (FL) across multiple HPC facilities faces stochastic admission delays from batch schedulers that dominate wall-clock time. Synchronous FL suffers from severe stragglers, while asynchronous FL accumulates stale updates when queues spike. We propose FedQueue, a queue-aware FL protocol that incorporates scheduler delays directly into training and aggregation, which (i) predicts per-facility queue delays online to budget local work, (ii) applies cutoff-based admission that buffers late arrivals to bound staleness, and (iii) performs staleness-aware aggregation to stabilize heterogeneous local workloads. We prove the convergence for non-convex objectives at rate $\mathcal{O}(1/\sqrt{R})$ under bounded staleness, and show that the admission controls yield bounded staleness with high probability under queue-prediction error. Real-world cross-facility deployment of FedQueue shows 20.5% improvement over baseline algorithms. Controlled queue simulations demonstrate robust improvement over the baselines; in particular, up to 60% reduction in time to reach a target accuracy level under high queue variance and non-IID partitions.

2604.12579 2026-06-01 cs.LG 版本更新

EEG-Based Multimodal Learning via Hyperbolic Mixture-of-Curvature Experts

基于EEG的多模态学习:曲率混合专家双曲空间方法

Runhe Zhou, Shanglin Li, Guanxiang Huang, Xinliang Zhou, Qibin Zhao, Motoaki Kawanabe, Yi Ding, Cuntai Guan

发表机构 * Nanyang Technological University, Singapore(新加坡南洋理工大学) BIFOLD, Berlin Institute for the Foundations of Learning(柏林学习与数据基础研究所) University of Cambridge, Cambridge, UK(剑桥大学)

AI总结 提出EEG-MoCE框架,通过可学习曲率的双曲空间为每个模态分配专家,并采用曲率感知融合策略,实现层次结构建模,在情绪识别、睡眠分期和认知评估任务上达到最优性能。

Comments Accepted at the Forty-third International Conference on Machine Learning (ICML 2026)

详情
AI中文摘要

基于脑电图(EEG)的多模态学习将脑信号与互补模态相结合,以改善精神状态评估,具有巨大的临床潜力。这种范式的有效性在很大程度上取决于异构模态上的表示学习。对于基于EEG的范式,一种有前景的方法是利用其层次结构,因为最近的研究表明,EEG和相关模态(例如面部表情)都表现出反映复杂认知过程的层次结构。然而,欧几里得嵌入由于其平坦的几何结构难以表示这些层次结构,而双曲空间由于其指数增长特性,天然适合表示层次结构。在这项工作中,我们提出了EEG-MoCE,一种新颖的基于双曲曲率混合专家框架,专为多模态神经技术设计。EEG-MoCE将每个模态分配给一个具有可学习曲率的双曲空间中的专家,从而能够自适应地建模其内在几何结构。然后,一种曲率感知融合策略动态加权专家,强调具有更丰富层次信息的模态。在基准数据集上的大量实验表明,EEG-MoCE在情绪识别、睡眠分期和认知评估等任务上达到了最先进的性能。代码可在https://github.com/zhourunhe/EEG-MoCE获取。

英文摘要

Electroencephalography (EEG)-based multimodal learning integrates brain signals with complementary modalities to improve mental state assessment, providing great clinical potential. The effectiveness of such paradigms largely depends on the representation learning on heterogeneous modalities. For EEG-based paradigms, one promising approach is to leverage their hierarchical structures, as recent studies have shown that both EEG and associated modalities (e.g., facial expressions) exhibit hierarchical structures reflecting complex cognitive processes. However, Euclidean embeddings struggle to represent these hierarchical structures due to their flat geometry, while hyperbolic spaces, with their exponential growth property, are naturally suited for them. In this work, we propose EEG-MoCE, a novel hyperbolic mixture-of-curvature experts framework designed for multimodal neurotechnology. EEG-MoCE assigns each modality to an expert in a learnable-curvature hyperbolic space, enabling adaptive modeling of its intrinsic geometry. A curvature-aware fusion strategy then dynamically weights experts, emphasizing modalities with richer hierarchical information. Extensive experiments on benchmark datasets demonstrate that EEG-MoCE achieves state-of-the-art performance on tasks including emotion recognition, sleep staging, and cognitive assessment. Code is available at https://github.com/zhourunhe/EEG-MoCE.
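
To make the role of curvature concrete, the sketch below computes the geodesic distance in a Poincaré ball of curvature -c, a standard model of hyperbolic space. The abstract does not say which hyperbolic model EEG-MoCE uses, so the choice of the Poincaré ball and the name `poincare_distance` are assumptions for this example; in a learnable-curvature setting, `c` would be a trained parameter rather than a constant.

```python
import math
from typing import Sequence


def poincare_distance(x: Sequence[float], y: Sequence[float], c: float = 1.0) -> float:
    """Geodesic distance between x and y in the Poincare ball of curvature -c (c > 0).

    d_c(x, y) = (1 / sqrt(c)) * arcosh(1 + 2c * |x - y|^2 / ((1 - c|x|^2) * (1 - c|y|^2)))
    """
    def sq_norm(v: Sequence[float]) -> float:
        return sum(t * t for t in v)

    if c <= 0:
        raise ValueError("c must be positive")
    if c * sq_norm(x) >= 1 or c * sq_norm(y) >= 1:
        raise ValueError("points must lie strictly inside the ball of radius 1/sqrt(c)")

    diff = sq_norm([a - b for a, b in zip(x, y)])
    denom = (1 - c * sq_norm(x)) * (1 - c * sq_norm(y))
    return math.acosh(1 + 2 * c * diff / denom) / math.sqrt(c)


# The same Euclidean step costs more hyperbolic distance near the boundary,
# which is the "exponential growth" that suits tree-like data.
near_origin = poincare_distance([0.0, 0.0], [0.1, 0.0])
near_boundary = poincare_distance([0.8, 0.0], [0.9, 0.0])
print(near_origin, near_boundary)
```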

2602.16165 2026-06-01 cs.LG cs.AI 版本更新

HiPER: Hierarchical Reinforcement Learning with Explicit Credit Assignment for Large Language Model Agents

HiPER: 具有显式信用分配的分层强化学习用于大型语言模型智能体

Jiangweizhi Peng, Yuanxin Liu, Ruida Zhou, Charles Fleming, Zhaoran Wang, Alfredo Garcia, Mingyi Hong

发表机构 * University of Minnesota Northwestern University Amazon AGI Texas A&M University Cisco Research

AI总结 针对稀疏奖励长程任务中LLM智能体信用分配困难的问题,提出HiPER分层规划-执行框架,通过分层优势估计(HAE)在规划和执行层面显式分配信用,在ALFWorld和WebShop上达到97.4%和83.3%的成功率。

Comments ICML 2026

详情
AI中文摘要

将LLM训练为用于多轮决策的交互式智能体仍然具有挑战性,特别是在具有稀疏和延迟奖励的长程任务中,智能体必须在获得有意义的反馈之前执行一系列扩展的动作。大多数现有的强化学习方法将LLM智能体建模为在单一时间尺度上运行的扁平策略,每轮选择一个动作。在稀疏奖励设置中,这种扁平策略必须跨整个轨迹传播信用,而没有显式的时间抽象,这常常导致不稳定的优化和低效的信用分配。我们提出HiPER,一种新颖的分层规划-执行强化学习框架,明确地将高层规划与低层执行分开。HiPER将策略分解为一个提出子目标的高层规划器和一个在多个动作步骤中执行这些子目标的低层执行器。为了将优化与此结构对齐,我们引入了一种称为分层优势估计(HAE)的关键技术,该技术在规划和执行层面仔细分配信用。通过聚合每个子目标执行过程中的回报并协调两个层面的更新,HAE提供了无偏的梯度估计器,并且与扁平广义优势估计相比,可证明地减少了方差。实验上,HiPER在具有挑战性的交互式基准测试中达到了最先进的性能,在ALFWorld上达到97.4%的成功率,在WebShop上达到83.3%的成功率(使用Qwen2.5-7B-Instruct,分别比先前最佳方法高出6.6%和8.3%),在需要多个依赖子任务的长程任务上尤其取得了巨大收益。这些结果突显了显式层次分解对于多轮LLM智能体的可扩展RL训练的重要性。

英文摘要

Training LLMs as interactive agents for multi-turn decision-making remains challenging, particularly in long-horizon tasks with sparse and delayed rewards, where agents must execute extended sequences of actions before receiving meaningful feedback. Most existing reinforcement learning (RL) approaches model LLM agents as flat policies operating at a single time scale, selecting one action at each turn. In sparse-reward settings, such flat policies must propagate credit across the entire trajectory without explicit temporal abstraction, which often leads to unstable optimization and inefficient credit assignment. We propose HiPER, a novel Hierarchical Plan-Execute RL framework that explicitly separates high-level planning from low-level execution. HiPER factorizes the policy into a high-level planner that proposes subgoals and a low-level executor that carries them out over multiple action steps. To align optimization with this structure, we introduce a key technique called hierarchical advantage estimation (HAE), which carefully assigns credit at both the planning and execution levels. By aggregating returns over the execution of each subgoal and coordinating updates across the two levels, HAE provides an unbiased gradient estimator and provably reduces variance compared to flat generalized advantage estimation. Empirically, HiPER achieves state-of-the-art performance on challenging interactive benchmarks, reaching 97.4% success on ALFWorld and 83.3% on WebShop with Qwen2.5-7B-Instruct (+6.6% and +8.3% over the best prior method), with especially large gains on long-horizon tasks requiring multiple dependent subtasks. These results highlight the importance of explicit hierarchical decomposition for scalable RL training of multi-turn LLM agents.
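
For reference, the flat baseline that HAE is compared against, generalized advantage estimation (GAE), can be sketched as below. This is the standard single-level estimator only; the hierarchical variant (per-subgoal return aggregation and two-level updates) is not reproduced here because the abstract does not give its formulas. The function name `gae` and its parameters are assumptions for this example.

```python
from typing import List, Sequence


def gae(
    rewards: Sequence[float],
    values: Sequence[float],
    gamma: float = 0.99,
    lam: float = 0.95,
    last_value: float = 0.0,
) -> List[float]:
    """Flat generalized advantage estimation over one trajectory.

    delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    A_t     = delta_t + gamma * lam * A_{t+1}
    `last_value` is the bootstrap value after the final step (0 for a terminal state).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


# Sparse, delayed reward: only the last step pays off.
sparse = gae([0.0, 0.0, 1.0], [0.0, 0.0, 0.0], gamma=1.0, lam=1.0)
print(sparse)
```

With a single terminal reward, every earlier step inherits its credit through the whole chain of bootstrapped terms, which is the long-range propagation the abstract identifies as a problem for flat policies.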

2512.03706 2026-06-01 physics.comp-ph cs.LG math.DS 版本更新

Consistent Projection of Langevin Dynamics: Preserving Thermodynamics and Kinetics in Coarse-Grained Models

Langevin动力学的一致投影:在粗粒化模型中保持热力学和动力学

Vahid Nateghi, Lara Neureither, Selma Moqvist, Carsten Hartmann, Simon Olsson, Feliks Nüske

发表机构 * Max-Planck-Institute for Dynamics of Complex Technical Systems(马克斯-普朗克复杂技术系统动力学研究所) Institute of Mathematics(数学研究所) Brandenburgische Technische Universität Cottbus-Senftenberg(勃兰登堡科特布斯-森夫滕贝格工业大学) Department of Computer Science and Engineering(计算机科学与工程系) Chalmers University of Technology(查尔姆斯理工大学) University of Gothenburg(哥德堡大学)

AI总结 提出一种基于投影的粗粒化形式,结合生成式扩展动态模式分解和热力学插值,准确捕捉全空间模型的热力学和动力学性质。

详情
AI中文摘要

粗粒化(CG)是对复杂多尺度系统(如生物分子的构象动力学)进行高效建模和模拟的重要任务。本文针对一般的欠阻尼Langevin动力学,提出了一种基于投影的粗粒化形式。遵循Zwanzig投影方法,我们推导了粗粒化动力学的闭式表达式。此外,我们展示了如何利用在Koopman算子方法背景下开发的生成元扩展动态模式分解(gEDMD)方法来建模CG动力学并评估其动力学性质,如过渡时间尺度。最后,我们将我们的方法与热力学插值(TI)相结合,这是一种在热力学条件之间转换样本的生成方法,从而无需重复数值模拟即可将方法扩展到跨热力学状态。通过一个二维模型系统,我们证明了所提出的方法能够准确捕捉全空间模型的热力学和动力学性质。

英文摘要

Coarse graining (CG) is an important task for efficient modeling and simulation of complex multi-scale systems, such as the conformational dynamics of biomolecules. This work presents a projection-based coarse-graining formalism for general underdamped Langevin dynamics. Following the Zwanzig projection approach, we derive a closed-form expression for the coarse-grained dynamics. In addition, we show how the generator Extended Dynamic Mode Decomposition (gEDMD) method, which was developed in the context of Koopman operator methods, can be used to model the CG dynamics and evaluate its kinetic properties, such as transition timescales. Finally, we combine our approach with thermodynamic interpolation (TI), a generative approach to transform samples between thermodynamic conditions, to extend the scope of the approach across thermodynamic states without repeated numerical simulations. Using a two-dimensional model system, we demonstrate that the proposed method accurately captures the thermodynamic and kinetic properties of the full-space model.

2605.08145 2026-06-01 cs.CV cs.AI cs.LG 版本更新

Self-Captioning Multimodal Interaction Tuning: Amplifying Exploitable Redundancies for Robust Vision Language Models

自描述多模态交互调优:放大可利用冗余以实现鲁棒的视觉语言模型

Yuriel Ryan, Hei Man Ip, Adriel Kuek, Paul Pu Liang, Roy Ka-Wei Lee

发表机构 * Singapore University of Technology and Design(新加坡科技设计大学) DSO National Laboratories(DSO国家实验室) Massachusetts Institute of Technology(麻省理工学院)

AI总结 针对视觉语言模型中的幻觉和鲁棒性问题,提出自描述多模态交互调优方法,通过放大模态间冗余信息来补偿受损模态,并设计多模态交互门机制将独特交互转化为冗余交互,实验表明该方法可减少38.3%的视觉诱导错误并提升16.8%的一致性。

Comments Accepted to ICML 2026. Code: https://github.com/yurielryan/Multimodal-Interaction-Tuning

详情
AI中文摘要

当前的视觉语言模型在面对模糊或受损模态时存在幻觉和鲁棒性问题。我们假设这些问题可以通过利用模态间的共享信息来补偿受损模态得到解决。为此,我们分析了多模态交互,即模态提供的冗余(共享)、独特(排他)和协同(涌现)任务相关信息,以确定它们对模型可靠性的影响。具体来说,放大冗余交互将增加这种可利用的共享信息以解决这些问题;然而,现代指令数据集通常消除冗余以优先考虑视觉定位。我们通过一个自描述工作流弥合这一差距,该工作流包含一个多模态交互门(Multimodal Interaction Gate):一种将独特交互转化为冗余交互的机制。我们的发现表明,增加冗余可以减少38.3%的视觉诱导错误,并提高16.8%的一致性。

英文摘要

Current vision language models face hallucination and robustness issues against ambiguous or corrupted modalities. We hypothesize that these issues can be addressed by exploiting the shared information between modalities to compensate for the impaired one. To this end, we analyze multimodal interactions -- redundant (shared), unique (exclusive), and synergistic (emergent) task-relevant information provided by the modalities -- to determine their impacts on model reliability. Specifically, amplifying redundant interactions would increase this exploitable shared information to resolve these issues; yet, modern instruction datasets often eliminate redundancies to prioritize visual grounding. We bridge this gap through a self-captioning workflow featuring a Multimodal Interaction Gate: a mechanism to convert unique interactions into redundant interactions. Our findings suggest that increasing redundancy can reduce visually induced errors by 38.3% and improve consistency by 16.8%.

2605.06831 2026-06-01 cs.LG cs.AI 版本更新

Why DDIM Hallucinates More Than DDPM: A Theoretical Analysis of Reverse Dynamics

为什么DDIM比DDPM更容易产生幻觉:反向动力学的理论分析

Muhammad H. Ashiq, Samanyu Arora, Abhinav N. Harish, Ishaan Kharbanda, Hung Yun Tseng, Grigorios G. Chrysos

发表机构 * University of Wisconsin-Madison(威斯康星大学麦迪逊分校)

AI总结 通过理论分析高斯混合目标下的反向ODE(DDIM)和SDE(DDPM),证明在临界时间τ后DDIM会卡在两个最近模式之间的线段上,而DDPM的随机性帮助其脱离该区域从而避免幻觉。

Comments Accepted in ICML

详情
AI中文摘要

我们从理论上研究了两种经典扩散采样器中的幻觉现象:随机去噪扩散概率模型(DDPM)和确定性去噪扩散隐式模型(DDIM)。我们分析了高斯混合目标下的反向ODE(DDIM)和SDE(DDPM),证明在临界时间τ后,(a) DDIM可能卡在连接两个最近模式的线段上,(b) DDPM的随机性帮助其脱离该区域,从而避免幻觉。我们的实证验证表明,当进入该区域时,DDPM的幻觉率显著低于DDIM。基于我们的观察,我们展示了如何使用额外的随机步骤帮助DDIM避免幻觉,并为设计改进的采样器提供了新见解。

英文摘要

We theoretically study the hallucination phenomena in two canonical diffusion samplers: the stochastic Denoising Diffusion Probabilistic Model (DDPM) and the deterministic Denoising Diffusion Implicit Model (DDIM). We analyze the reverse ODE (DDIM) and SDE (DDPM) for a Gaussian mixture target, proving that after a critical time $\tau$, (a) DDIM can become stuck on the segment connecting the two nearest modes and (b) DDPM *stochasticity* helps it become unstuck from this region, thus avoiding hallucination. Our empirical validation verifies that DDPM has a significantly lower hallucination rate than DDIM when this region is entered. Building on our observations, we exhibit how using additional stochastic steps can help DDIM avoid hallucinations and offer new insights on how to design improved samplers.
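
The difference between the two update rules can be seen in a deliberately simple toy: a one-dimensional target with two point masses at -M and +M, for which the optimal noise prediction is available in closed form. This is not the paper's construction or proof; it only shows the standard deterministic DDIM step next to the standard DDPM ancestral step (with the common variance choice of beta_t). The schedule, `M`, and all function names are assumptions for this example. Started exactly at the midpoint between the modes, the deterministic update never leaves it by symmetry, whereas the injected noise moves the DDPM sample off it.

```python
import math
import random

T = 100
betas = [1e-4 + (0.05 - 1e-4) * t / (T - 1) for t in range(T)]  # hypothetical linear schedule
alphas = [1.0 - b for b in betas]
abar = []
_prod = 1.0
for a in alphas:
    _prod *= a
    abar.append(_prod)  # cumulative product of alphas

M = 1.0  # the two modes sit at -M and +M


def eps_star(x: float, t: int) -> float:
    """Optimal noise prediction for the noised two-point mixture at step t."""
    mu = math.sqrt(abar[t]) * M
    var = 1.0 - abar[t]
    return (x - mu * math.tanh(mu * x / var)) / math.sqrt(var)


def ddim_step(x: float, t: int) -> float:
    """Deterministic DDIM update (eta = 0)."""
    eps = eps_star(x, t)
    abar_prev = abar[t - 1] if t > 0 else 1.0
    x0_hat = (x - math.sqrt(1.0 - abar[t]) * eps) / math.sqrt(abar[t])
    return math.sqrt(abar_prev) * x0_hat + math.sqrt(1.0 - abar_prev) * eps


def ddpm_step(x: float, t: int, rng: random.Random) -> float:
    """Stochastic DDPM ancestral update; no noise is added at the final step."""
    eps = eps_star(x, t)
    mean = (x - betas[t] / math.sqrt(1.0 - abar[t]) * eps) / math.sqrt(alphas[t])
    noise = rng.gauss(0.0, 1.0) if t > 0 else 0.0
    return mean + math.sqrt(betas[t]) * noise


rng = random.Random(0)
x_ddim = 0.0  # start both samplers exactly midway between the modes
x_ddpm = 0.0
for t in reversed(range(T)):
    x_ddim = ddim_step(x_ddim, t)
    x_ddpm = ddpm_step(x_ddpm, t, rng)

print("DDIM final sample:", x_ddim)
print("DDPM final sample:", x_ddpm)
```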

2605.06137 2026-06-01 cs.CV cs.AI cs.LG 版本更新

Autoregressive Visual Generation Needs a Prologue

自回归视觉生成需要一个序幕

Bowen Zheng, Weijian Luo, Guang Yang, Colin Zhang, Tianyang Hu

发表机构 * The Chinese University of Hong Kong, Shenzhen(香港中文大学(深圳)) hi-Lab, Xiaohongshu Inc(小红书实验室)

AI总结 提出Prologue方法,通过生成前置的序幕令牌来弥合自回归图像生成中的重建-生成差距,在不影响重建质量的前提下显著提升生成性能。

Comments Code: https://github.com/Zyriix/prologue Demo: https://huggingface.co/spaces/Zyriix/prologue-demo

详情
AI中文摘要

在这项工作中,我们提出了Prologue,一种弥合自回归(AR)图像生成中重建-生成差距的方法。Prologue不修改视觉令牌以同时满足重建和生成,而是生成一小部分序幕令牌,并将其前置到视觉令牌序列之前。这些序幕令牌仅使用AR交叉熵(CE)损失进行训练,而视觉令牌则专用于重建。这种解耦设计使我们能够通过AR模型的真实分布优化生成,而不影响重建质量,我们进一步从ELBO角度形式化了这一点。在ImageNet 256x256上,Prologue-Base在没有无分类器引导的情况下将gFID从21.01降至10.75,同时几乎保持重建不变;Prologue-Large使用标准AR模型,无需辅助语义监督,达到了具有竞争力的rFID 0.99和gFID 1.46。有趣的是,仅由AR梯度驱动,序幕令牌展现出涌现的语义结构:对16个序幕令牌进行线性探测达到35.88%的Top-1准确率,远高于标准分词器前16个令牌的23.71%;使用固定序幕令牌进行重采样保留了相似的高层语义布局。我们的结果暗示了一个新方向:通过引入单独学习的生成表示,同时保持原始表示不变,可以提升生成质量。

英文摘要

In this work, we propose Prologue, an approach to bridging the reconstruction-generation gap in autoregressive (AR) image generation. Instead of modifying visual tokens to satisfy both reconstruction and generation, Prologue generates a small set of prologue tokens prepended to the visual token sequence. These prologue tokens are trained exclusively with the AR cross-entropy (CE) loss, while visual tokens remain dedicated to reconstruction. This decoupled design lets us optimize generation through the AR model's true distribution without affecting reconstruction quality, which we further formalize from an ELBO perspective. On ImageNet 256x256, Prologue-Base reduces gFID from 21.01 to 10.75 without classifier-free guidance while keeping reconstruction almost unchanged; Prologue-Large reaches a competitive rFID of 0.99 and gFID of 1.46 using a standard AR model without auxiliary semantic supervision. Interestingly, driven only by AR gradients, prologue tokens exhibit emergent semantic structure: linear probing on 16 prologue tokens reaches 35.88% Top-1, far above the 23.71% of the first 16 tokens from a standard tokenizer; resampling with fixed prologue tokens preserves a similar high-level semantic layout. Our results suggest a new direction: generation quality can be improved by introducing a separate learned generative representation while leaving the original representation intact.

2605.05520 2026-06-01 cs.LG stat.AP stat.ML 版本更新

Bayesian Rain Field Reconstruction using Commercial Microwave Links and Diffusion Model Priors

使用商业微波链路和扩散模型先验的贝叶斯雨场重建

Badr Moufad, Albina Ilina, Hai Victor Habi, Salem Lahlou, Yazid Janati, Hagit Messer, Eric Moulines

发表机构 * School of Electrical and Computer Engineering, Tel Aviv University, Tel Aviv, Israel(电气与计算机工程学院,特拉维夫大学,特拉维夫,以色列)

AI总结 提出将雨场重建视为贝叶斯逆问题,利用扩散模型作为高保真空间先验,通过无需训练的后验采样方法(如即插即用、序贯蒙特卡洛和副本交换)实现优于传统方法的性能。

Comments Added link to source code

详情
Journal ref
ICML 2026
AI中文摘要

商业微波链路(CML)为降雨感知提供了密集的空间覆盖,但其产生的路径积分测量使得精确的地面重建具有挑战性。现有方法通常将CML简化为点传感器,并忽略降雨与信号衰减之间的线积分关系,导致在非均匀降水条件下性能下降。在这项工作中,我们将雨场重建视为一个贝叶斯逆问题,使用扩散模型(DM)作为高保真空间先验。我们表明,与删失高斯过程相比,扩散模型能更好地保留关键降雨统计量。将降雨估计视为具有DM先验的贝叶斯逆问题,使得可以使用广泛的无需训练的后验采样方法,包括即插即用、序贯蒙特卡洛和副本交换方法。在合成和真实世界数据集上的实验表明,与基于CML的现有重建基线相比,该方法具有一致的改进。

英文摘要

Commercial Microwave Links (CMLs) offer dense spatial coverage for rainfall sensing but produce path-integrated measurements that make accurate ground-level reconstruction challenging. Existing methods typically oversimplify CMLs as point sensors and neglect line integration relating rainfall to signal attenuation, resulting in degraded performance under heterogeneous precipitation. In this work, we view rain field reconstruction as a Bayesian inverse problem with Diffusion Models (DMs) as high-fidelity spatial priors. We show that diffusion models better preserve key rainfall statistics compared to censored Gaussian processes. Framing rainfall estimation as a Bayesian inverse problem with a DM prior enables training-free posterior sampling using a broad family of methods, including Plug-and-Play, Sequential Monte Carlo, and Replica Exchange methods. Experiments on synthetic and real-world datasets demonstrate consistent improvements over established CML-based reconstruction baselines.

2510.03096 2026-06-01 cs.LG 版本更新

Adaptive Node Feature Selection For Graph Neural Networks

图神经网络的自适应节点特征选择

Madeline Navarro, Ali Azizpour, Santiago Segarra

Affiliation * Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA

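AI summary Proposes an adaptive node feature selection method that identifies and removes irrelevant features by measuring the change in validation performance after permuting feature values; it applies to arbitrary data, models, and tasks.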

详情
AI中文摘要

我们为图神经网络(GNN)提出了一种自适应节点特征选择方法,能够在训练过程中识别并移除不必要的特征。衡量特征对模型输出的贡献能力对于解释决策和通过消除无帮助变量来降低维度至关重要。然而,图结构数据引入了复杂的依赖关系,可能不适合经典的特征重要性度量。受此启发,我们提出了一种数据、模型和任务无关的方法,该方法基于置换特征值后验证性能的变化,在训练过程中确定相关特征。我们从理论上通过刻画节点数据与图结构之间的关系如何影响GNN性能来论证我们的方法。实验表明:(i)我们的高度通用方法可与利用先验假设的定制特征选择方法相媲美;(ii)在GNN完全训练之前,我们就能返回有意义的特征重要性分数;(iii)我们的分数明显提取了与各种图学习设置中特征重要性相关的属性。

英文摘要

We propose an adaptive node feature selection approach for graph neural networks (GNNs) that identifies and removes unnecessary features during training. The ability to measure how features contribute to model output is key for interpreting decisions and reducing dimensionality by eliminating unhelpful variables. However, graph-structured data introduces complex dependencies that may be unsuited to classical feature importance metrics. Inspired by this, we present a data-, model-, and task-agnostic method that determines relevant features during training based on changes in validation performance upon permuting feature values. We theoretically motivate our approach by characterizing how the relationships between node data and graph structure influence GNN performance. Empirically, we show that (i) our highly general approach rivals the performance of tailored feature selection approaches that exploit prior assumptions; (ii) we return meaningful feature importance scores well before the GNN is fully trained; and (iii) our scores demonstrably extract relevant properties that inform feature importance for various graph learning settings.
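
The permutation idea in this abstract can be illustrated with a minimal sketch. It is not the authors' method: it ignores graph structure and training dynamics, uses a fixed toy predictor, and every name in it (`mse`, `permutation_importance`, `X_val`, `y_val`) is hypothetical. It only shows how a feature can be scored by the change in validation error after its values are permuted across samples.

```python
import random

def mse(predict, X, y):
    """Mean squared error of `predict` on rows X with targets y."""
    return sum((predict(row) - t) ** 2 for row, t in zip(X, y)) / len(y)

def permutation_importance(predict, X, y, n_features, seed=0):
    """Score feature j by the increase in validation error after shuffling column j."""
    rng = random.Random(seed)
    base = mse(predict, X, y)
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        rng.shuffle(column)  # break the link between feature j and the target
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        scores.append(mse(predict, X_perm, y) - base)
    return scores

# Toy validation set (assumption): the target depends only on feature 0.
data_rng = random.Random(1)
X_val = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
y_val = [2.0 * row[0] for row in X_val]
predict = lambda row: 2.0 * row[0]  # stand-in for a trained model

scores = permutation_importance(predict, X_val, y_val, n_features=2)
```

In this toy setting, permuting feature 0 raises the error while permuting feature 1 leaves it unchanged, so a feature whose score stays near zero would be a candidate for removal.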

2605.00265 2026-06-01 cs.LG 版本更新

Polaris: Coupled Orbital Polar Embeddings for Hierarchical Concept Learning

Polaris: 用于层次概念学习的耦合轨道极坐标嵌入

Sahil Mishra, Srinitish Srinivasan, Sourish Dasgupta, Tanmoy Chakraborty

发表机构 * Indian Institute of Technology Delhi, New Delhi, India(印度理工学院德里分校,新德里,印度) Indian Institute of Technology Delhi, Abu Dhabi, UAE(印度理工学院德里分校,阿布扎比,阿联酋) KDM Lab, Dhirubhai Ambani University Gandhinagar, Gujarat, India(KDM实验室,迪鲁布希阿姆巴尼大学冈丁加尔,古吉拉特邦,印度)

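AI summary Proposes Polaris, a polar hyperspherical embedding framework that separates semantics from hierarchy via angle and radius and combines local constraints, global regularization, and uncertainty-aware asymmetric objectives, significantly improving retrieval performance across several taxonomy expansion tasks.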

Comments Accepted to the 43rd International Conference on Machine Learning (ICML 2026)

详情
AI中文摘要

现实世界的知识通常组织为层次结构,如产品分类法、医学本体和标签树,但由于非对称结构和噪声语义,学习层次表示具有挑战性。我们引入了Polaris,一个极坐标超球面嵌入框架,它使用角度几何和半径将语义性与层次分离,使得在不干扰的情况下学习意义和结构。为了将潜在表示映射到球面上,我们将其投影到北极的切空间,应用指数映射,并使用球面线性层学习单位范数表示。Polaris结合了鲁棒的局部约束、防止几何坍缩的全局正则化以及鼓励方向包含的不确定性感知非对称目标。在推理时,Polaris使用结构引导检索在最终排序前高效缩小候选父节点范围。我们在分类法扩展的不同设置上评估Polaris——涵盖树、多父DAG和多模态层次结构,在top-K检索中一致提升高达约19个点,在14个强基线上平均排名降低高达约60%。

英文摘要

Real-world knowledge is often organized as hierarchies such as product taxonomies, medical ontologies, and label trees, yet learning hierarchical representations is challenging due to asymmetric structure and noisy semantics. We introduce Polaris, a polar hyperspherical embedding framework that separates semanticity from hierarchy using angular geometry and radius, enabling the learning of meaning and structure without interference. To map a latent representation onto the sphere, we project it to the tangent space at the north pole, apply the exponential map, and learn unit-norm representations using spherical linear layers. Polaris then combines robust local constraints, global regularization that prevents geometric collapse, and uncertainty-aware asymmetric objectives that encourage directional containment. At inference time, Polaris uses structure-guided retrieval to efficiently narrow down candidate parents before final ranking. We evaluate Polaris on different settings of taxonomy expansion, spanning trees, multi-parent DAGs, and multimodal hierarchies, and show consistent improvements of up to ~19 points in top-K retrieval and up to ~60% reduction in mean rank over fourteen strong baselines.
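
The projection and exponential-map step described in this abstract follows standard sphere geometry, which can be sketched as below. This is an illustration only, assuming the unit sphere with north pole p = (0, ..., 0, 1); the function names are hypothetical, and the spherical linear layers and training objectives from the abstract are not shown.

```python
import math

def project_to_tangent_at_north_pole(z):
    """Remove the component of z along p = (0, ..., 0, 1); the result is tangent at p."""
    return z[:-1] + [0.0]

def exp_map_north_pole(v):
    """Exponential map at p for a tangent vector v (assumes v[-1] == 0):
    exp_p(v) = cos(|v|) * p + sin(|v|) * v / |v|."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        return [0.0] * (len(v) - 1) + [1.0]  # the zero vector maps to the pole itself
    scale = math.sin(norm) / norm
    point = [scale * c for c in v]
    point[-1] = math.cos(norm)  # contribution of cos(|v|) * p
    return point

z = [0.3, -1.2, 0.5, 2.0]  # arbitrary latent vector (assumption)
x = exp_map_north_pole(project_to_tangent_at_north_pole(z))
norm_x = math.sqrt(sum(c * c for c in x))
```

Because the tangent vector has no component along the pole, the squared norm of the result is sin² + cos² = 1, so the output lies on the unit sphere.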

2602.03216 2026-06-01 cs.CL cs.LG 版本更新

Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection

Token Sparse Attention: 交错令牌选择的高效长上下文推理

Dongwon Jo, Beomseok Kang, Jiwon Song, Jae-Joon Kim

Affiliation * Department of Electrical and Computer Engineering, Seoul National University, Seoul, South Korea

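AI summary Proposes Token Sparse Attention, a lightweight, dynamic token-level sparsification mechanism that selects tokens in an interleaved fashion and compresses/decompresses them before and after attention for efficient long-context inference, achieving up to 3.23x attention speedup at 128K context with less than 1% accuracy loss.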

Comments ICML 2026

详情
AI中文摘要

注意力的二次复杂度仍然是大语言模型长上下文推理的核心瓶颈。先前的加速方法要么使用结构化模式稀疏化注意力图,要么在特定层永久驱逐令牌,这可能会保留不相关的令牌或依赖不可逆的早期决策,尽管令牌重要性具有层/头动态性。在本文中,我们提出Token Sparse Attention,一种轻量级动态令牌级稀疏化机制,在注意力期间将每个头的$Q$、$K$、$V$压缩到减少的令牌集,然后将输出解压缩回原始序列,使得令牌信息可以在后续层中重新考虑。此外,Token Sparse Attention在令牌选择和稀疏注意力的交叉点上暴露了一个新的设计点。我们的方法完全兼容密集注意力实现,包括Flash Attention,并且可以无缝地与现有的稀疏注意力内核组合。实验结果表明,Token Sparse Attention持续改善精度-延迟权衡,在128K上下文中实现高达3.23倍的注意力加速,且精度下降小于1%。这些结果表明,动态和交错的令牌级稀疏化是可扩展长上下文推理的一种互补且有效的策略。

英文摘要

The quadratic complexity of attention remains the central bottleneck in long-context inference for large language models. Prior acceleration methods either sparsify the attention map with structured patterns or permanently evict tokens at specific layers, which can retain irrelevant tokens or rely on irreversible early decisions despite the layer-/head-wise dynamics of token importance. In this paper, we propose Token Sparse Attention, a lightweight and dynamic token-level sparsification mechanism that compresses per-head $Q$, $K$, $V$ to a reduced token set during attention and then decompresses the output back to the original sequence, enabling token information to be reconsidered in subsequent layers. Furthermore, Token Sparse Attention exposes a new design point at the intersection of token selection and sparse attention. Our approach is fully compatible with dense attention implementations, including Flash Attention, and can be seamlessly composed with existing sparse attention kernels. Experimental results show that Token Sparse Attention consistently improves the accuracy-latency trade-off, achieving up to $3.23\times$ attention speedup at 128K context with less than 1% accuracy degradation. These results demonstrate that dynamic and interleaved token-level sparsification is a complementary and effective strategy for scalable long-context inference.

2604.28020 2026-06-01 cs.LG 版本更新

Cost-Aware Learning

成本感知学习

Clara Mohri, Amir Globerson, Haim Kaplan, Tomer Koren, Yishay Mansour

发表机构 * Kempner Institute(凯姆纳研究所) Harvard University(哈佛大学) Google Research(谷歌研究) Tel Aviv University(特拉维夫大学)

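AI summary For finite-sum optimization in which sampling different components incurs different costs, proposes Cost-Aware SGD, which samples from a distribution based on gradient norms and costs, and applies it to reinforcement learning with language models, significantly reducing token usage in policy optimization.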

详情
AI中文摘要

我们考虑成本感知学习问题,其中对有限和目标的各个组件进行采样会产生不同的成本。目标是在最小化总成本的同时达到目标误差。我们提出了成本感知SGD,它使用基于梯度范数和成本的分布来采样组件。我们对该算法进行了深入分析,包括相对于基线的成本改进界限、分布代理次优性的刻画以及下界。我们将理论见解应用于语言模型的强化学习,其中序列级策略梯度的计算成本随长度变化。我们发现优势幅度作为梯度范数的高保真代理,并据此引入成本感知GRPO。在1.5B、4B和8B LLM上的实验结果表明,该算法显著减少了策略优化中使用的token数量,同时匹配或超过基线准确率。

英文摘要

We consider the problem of Cost-Aware Learning, where sampling different components of a finite-sum objective incurs different costs. The objective is to reach a target error while minimizing the total cost. We propose Cost-Aware SGD, which uses a distribution based on gradient norms and costs to sample components. We provide a thorough analysis of this algorithm, including cost-improvement bounds over baselines, a characterization of distribution proxy sub-optimality, and a lower bound. We apply our theoretical insights to reinforcement learning with language models, where the computational cost of sequence-level policy gradients varies with length. We find that the advantage magnitude serves as a high-fidelity proxy for gradient norms, and use this to introduce Cost-Aware GRPO. Empirical results on 1.5B, 4B, and 8B LLMs demonstrate that this algorithm significantly reduces the tokens used in policy optimization while matching or exceeding baseline accuracy.
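
The abstract says only that the sampling distribution is "based on gradient norms and costs" and does not give its form. As a hedged illustration, the sketch below takes one natural candidate, p_i proportional to g_i / sqrt(c_i), where g_i is a gradient norm and c_i a cost. By the Cauchy-Schwarz inequality, this choice minimizes the product of the importance-sampling second moment (sum of g_i^2 / p_i) and the expected per-sample cost (sum of p_i * c_i). This is an assumption made for the example and may differ from the distribution used in the paper; all names are hypothetical.

```python
import math

def cost_aware_distribution(grad_norms, costs):
    """Candidate distribution p_i proportional to g_i / sqrt(c_i); assumes g_i > 0 and c_i > 0."""
    weights = [g / math.sqrt(c) for g, c in zip(grad_norms, costs)]
    total = sum(weights)
    return [w / total for w in weights]

def second_moment_times_cost(p, grad_norms, costs):
    """(sum_i g_i^2 / p_i) * (sum_i p_i c_i): variance proxy times expected cost per sample."""
    second_moment = sum(g * g / pi for g, pi in zip(grad_norms, p))
    expected_cost = sum(pi * c for pi, c in zip(p, costs))
    return second_moment * expected_cost

grad_norms = [1.0, 2.0, 0.5]  # toy values (assumption)
costs = [1.0, 4.0, 9.0]       # toy values (assumption)

p_cost_aware = cost_aware_distribution(grad_norms, costs)
p_uniform = [1.0 / 3.0] * 3

obj_cost_aware = second_moment_times_cost(p_cost_aware, grad_norms, costs)
obj_uniform = second_moment_times_cost(p_uniform, grad_norms, costs)
```

For these toy values the candidate distribution attains the Cauchy-Schwarz lower bound (sum of g_i * sqrt(c_i))^2 = 42.25, compared with 73.5 for uniform sampling.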

2604.23436 2026-06-01 stat.ML cs.LG math.OC stat.CO 版本更新

Inference of Online Newton Methods with Nesterov's Accelerated Sketching

带有Nesterov加速草图的在线牛顿方法的推断

Haoxuan Wang, Xinchen Du, Sen Na

发表机构 * School of Industrial and Systems Engineering, Georgia Institute of Technology(工业与系统工程系,佐治亚理工学院)

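AI summary To address the high computational cost of inference for online Newton methods, proposes combining Hessian averaging with a sketch-and-project solver using Nesterov's acceleration, obtaining the robustness of second-order methods at the $O(d^2)$ complexity of first-order methods, and establishes global convergence, asymptotic normality, and an online covariance estimator.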

Comments 52 pages, 2 tables, 3 figures; accepted at ICML 2026

详情
AI中文摘要

基于流式数据的可靠决策需要对在线方法进行原则性的不确定性量化。虽然一阶方法能够实现高效的迭代更新,但其推断过程仍需更新适当的(协方差)矩阵,导致$O(d^2)$的时间和内存复杂度,并且对问题的病态性和噪声异质性敏感。这一昂贵的推断任务为更鲁棒的二阶方法提供了机会,然而二阶方法受限于求解牛顿系统所需的$O(d^3)$复杂度。在本文中,我们通过研究一种带有Hessian平均的在线牛顿方法来解决这一差距,其中每一步的牛顿方向使用带有Nesterov加速的草图投影求解器近似计算,匹配了一阶方法的$O(d^2)$复杂度。对于所提出的方法,我们量化了来自随机数据和随机计算的不确定性。在标准光滑性和矩条件下,我们建立了全局几乎必然收敛性,证明了最后迭代的渐近正态性,其极限协方差由Lyapunov方程刻画,并开发了一个完全在线的协方差估计器,具有非渐近收敛保证。我们还将所得的不确定性量化与没有Nesterov加速的精确和草图牛顿方法联系起来。在回归模型上的大量实验证明了所提出方法在在线推断中的优越性。

英文摘要

Reliable decision-making with streaming data requires principled uncertainty quantification of online methods. While first-order methods enable efficient iterate updates, their inference procedures still require updating proper (covariance) matrices, incurring $O(d^2)$ time and memory complexity, and are sensitive to ill-conditioning and noise heterogeneity of the problem. This costly inference task offers an opportunity for more robust second-order methods, which are, however, bottlenecked by solving Newton systems with $O(d^3)$ complexity. In this paper, we address this gap by studying an online Newton method with Hessian averaging, where the Newton direction at each step is approximately computed using a sketch-and-project solver with Nesterov's acceleration, matching $O(d^2)$ complexity of first-order methods. For the proposed method, we quantify its uncertainty arising from both random data and randomized computation. Under standard smoothness and moment conditions, we establish global almost-sure convergence, prove asymptotic normality of the last iterate with a limiting covariance characterized by a Lyapunov equation, and develop a fully online covariance estimator with non-asymptotic convergence guarantees. We also connect the resulting uncertainty quantification to that of exact and sketched Newton methods without Nesterov's acceleration. Extensive experiments on regression models demonstrate the superiority of the proposed method for online inference.
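
As background for the sketch-and-project solver mentioned in this abstract: with single-row sketches, sketch-and-project reduces to the randomized Kaczmarz iteration. The sketch below shows that unaccelerated special case on a small fixed linear system of the form H d = g. It omits Nesterov's acceleration, Hessian averaging, and the online setting, so it should not be read as the paper's algorithm; the function name and the toy system are assumptions.

```python
import random

def randomized_kaczmarz(A, b, iters=500, seed=0):
    """Approximately solve A x = b by projecting onto one randomly chosen row equation per step."""
    rng = random.Random(seed)
    x = [0.0] * len(A[0])
    for _ in range(iters):
        i = rng.randrange(len(A))  # single-row sketch
        row = A[i]
        residual = b[i] - sum(a * xi for a, xi in zip(row, x))
        step = residual / sum(a * a for a in row)
        x = [xi + step * a for xi, a in zip(x, row)]  # projection onto row i's hyperplane
    return x

H = [[4.0, 1.0], [1.0, 3.0]]  # toy symmetric positive definite matrix (assumption)
g = [1.0, 2.0]
solution = randomized_kaczmarz(H, g)  # exact solution is (1/11, 7/11)
```

Each step touches a single row, so its cost is linear in the dimension, which is why a modest number of such steps per iteration can stay within a quadratic budget.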

2604.22794 2026-06-01 eess.SY cs.LG cs.SY 版本更新

Accelerating Reinforcement Learning for Wind Farm Control via Expert Demonstrations

通过专家演示加速风电场控制的强化学习

Marcus Binder Nilsen, Julian Quick, Tuhfe Göçmen, Nikolay Dimitrov, Pierre-Elouan Réthoré

发表机构 * Department of Wind and Energy Systems, Technical University of Denmark(丹麦技术大学风能与能源系统系)

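AI summary Proposes a pretraining method that uses expert demonstrations generated from a steady-state wake model to initialize Soft Actor-Critic networks via behavior cloning, eliminating the initial learning phase, bringing initial performance close to the baseline, and surpassing a lookup-table controller after online fine-tuning.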

Comments Submitted to Journal of Physics: Conference Series (Torque 2026). This is the Accepted Manuscript version of an article accepted for publication in Journal of Physics: Conference Series. IOP Publishing Ltd is not responsible for any errors or omissions in this version of the manuscript or any version derived from it. This Accepted Manuscript is published under a CC BY licence

详情
Journal ref
J. Phys.: Conf. Ser. 3224 032016 (2026)
AI中文摘要

强化学习为自适应风电场流量控制提供了一种有前景的方法,但其实际部署受到训练收敛缓慢和初始性能差的阻碍,如果直接部署未经训练的智能体,这些因素可能导致多年的功率输出减少。本文研究了稳态尾流模型中的领域知识是否可以加速强化学习训练并提高初始控制器性能。我们提出了一种预训练方法,其中通过在动态尾流模拟(WindGym)中部署基于PyWake的稳态优化器生成专家演示,然后通过行为克隆初始化Soft Actor-Critic智能体的演员和评论家网络。在2x2风电场上的实验表明,预训练消除了代价高昂的初始学习阶段:未经训练的智能体性能比贪婪零偏航基线低约12%,而预训练将初始性能提升至接近基线水平。在在线微调过程中,所有配置在250,000个环境步骤内收敛到相似性能,最终超过查表控制器,后者在500,000步后达到约7%的功率增益。

英文摘要

Reinforcement learning (RL) offers a promising approach for adaptive wind farm flow control, yet its practical deployment is hindered by slow training convergence and poor initial performance, factors that could translate to years of reduced power output if an untrained agent were deployed directly. This work investigates whether domain knowledge from steady-state wake models can accelerate RL training and improve initial controller performance. We propose a pretraining methodology in which expert demonstrations are generated by deploying a PyWake-based steady-state optimizer within a dynamic wake simulation (WindGym), then used to initialize both the actor and critic networks of a Soft Actor-Critic agent via behavior cloning. Experiments on a 2x2 wind farm show that pretraining eliminates the costly initial learning phase: while an untrained agent underperforms the greedy zero-yaw baseline by approximately 12%, pretraining raises initial performance to near-baseline levels. During online fine-tuning, all configurations converge within 250,000 environment steps to achieve similar performance, ultimately exceeding that of a lookup-table controller, which reaches approximately 7% power gain after 500,000 steps.

2604.22722 2026-06-01 cs.IR cs.AI cs.LG 版本更新

Aligning Dense Retrievers with LLM Utility via Distillation

通过蒸馏将稠密检索器与LLM效用对齐

Rajinder Sandhu, Di Mu, Cheng Chang, Md Shahriar Tasjid, Himanshu Rai, Maksims Volkovs, Ga Wu

发表机构 * Dalhousie University(达尔豪西大学)

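AI summary Proposes the Utility-Aligned Embeddings (UAE) framework, which trains a bi-encoder by distilling an LLM's perplexity-reduction utility distribution, improving the precision and efficiency of dense retrieval without adding test-time LLM inference overhead.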

详情
AI中文摘要

稠密向量检索是检索增强生成(RAG)的实用支柱,但相似性搜索可能受限于精度。相反,利用LLM重排序的基于效用的方法通常能实现更优性能,但计算成本高且易受困惑度估计中固有噪声的影响。我们提出Utility-Aligned Embeddings (UAE),一个旨在将这些优势融合为实用、高性能检索方法的框架。我们将检索表述为分布匹配问题,使用Utility-Modulated InfoNCE目标训练双编码器以模仿由困惑度降低导出的效用分布。该方法将分级效用信号直接注入嵌入空间,无需测试时LLM推理。在QASPER基准上,UAE在召回率@1上提升30.59%,MAP提升30.16%,Token F1提升17.3%,优于强语义基线BGE-Base。关键的是,UAE比高效的LLM重排序方法快180倍以上,同时保持竞争性能,表明将检索与生成效用对齐能在规模上产生可靠的上下文。

英文摘要

Dense vector retrieval is the practical backbone of Retrieval-Augmented Generation (RAG), but similarity search can suffer from precision limitations. Conversely, utility-based approaches leveraging LLM re-ranking often achieve superior performance but are computationally prohibitive and prone to noise inherent in perplexity estimation. We propose Utility-Aligned Embeddings (UAE), a framework designed to merge these advantages into a practical, high-performance retrieval method. We formulate retrieval as a distribution matching problem, training a bi-encoder to imitate a utility distribution derived from perplexity reduction using a Utility-Modulated InfoNCE objective. This approach injects graded utility signals directly into the embedding space without requiring test-time LLM inference. On the QASPER benchmark, UAE improves retrieval Recall@1 by 30.59%, MAP by 30.16% and Token F1 by 17.3% over the strong semantic baseline BGE-Base. Crucially, UAE is over 180x faster than efficient LLM re-ranking methods while preserving competitive performance, demonstrating that aligning retrieval with generative utility yields reliable contexts at scale.

2604.09429 2026-06-01 cs.CV cs.AI cs.LG 版本更新

Rays as Pixels: Learning A Joint Distribution of Videos and Camera Trajectories

射线即像素:学习视频与相机轨迹的联合分布

Wonbong Jang, Shikun Liu, Soubhik Sanyal, Juan Camilo Perez, Kam Woh Ng, Sanskar Agrawal, Juan-Manuel Perez-Rua, Yiannis Douratsos, Tao Xiang

发表机构 * Meta AI

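AI summary Proposes a video diffusion model (Rays as Pixels) that represents cameras as dense ray pixels (raxels) sharing a latent space with video frames and denoises the two jointly, enabling camera trajectory prediction and camera-controlled video generation.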

Comments Accepted to ICML 2026. 9-page main paper plus supplementary material. Project page: https://wbjang.github.io/raysaspixels/

详情
AI中文摘要

从图像恢复相机参数和从新视角渲染场景在计算机视觉和图形学中被视为独立任务。当图像覆盖稀疏或姿态模糊时,这种分离会失效,因为每个任务依赖于另一个任务的输出。我们提出Rays as Pixels,一种视频扩散模型(VDM),学习视频和相机轨迹的联合分布。据我们所知,这是首个在单一框架内预测相机姿态并进行相机控制视频生成的模型。我们将每个相机表示为密集射线像素(raxels),这是一种与视频帧位于同一潜在空间的像素对齐编码,并通过解耦自交叉注意力机制联合去噪两者。一个训练好的模型处理三个任务:从视频预测相机轨迹、沿预定义轨迹从输入图像生成视频、以及从输入图像联合合成视频和轨迹。我们在姿态估计和相机控制视频生成上进行评估,并引入闭环自一致性测试,显示模型预测的姿态及其基于这些姿态的渲染结果一致。与Plücker嵌入的消融实验证实,将相机与视频共享潜在空间显著更有效。

英文摘要

Recovering camera parameters from images and rendering scenes from novel viewpoints have been treated as separate tasks in computer vision and graphics. This separation breaks down when image coverage is sparse or poses are ambiguous, since each task depends on what the other produces. We propose Rays as Pixels, a Video Diffusion Model (VDM) that learns a joint distribution over videos and camera trajectories. To our knowledge, this is the first model to predict camera poses and do camera-controlled video generation within a single framework. We represent each camera as dense ray pixels (raxels), a pixel-aligned encoding that lives in the same latent space as video frames, and denoise the two jointly through a Decoupled Self-Cross Attention mechanism. A single trained model handles three tasks: predicting camera trajectories from video, generating video from input images along a pre-defined trajectory, and jointly synthesizing video and trajectory from input images. We evaluate on pose estimation and camera-controlled video generation, and introduce a closed-loop self-consistency test showing that the model's predicted poses and its renderings conditioned on those poses agree. Ablations against Plücker embeddings confirm that representing cameras in a shared latent space with video is substantially more effective.

2604.18587 2026-06-01 cs.LG cs.AI cs.LO cs.PL 版本更新

Compile to Compress: Boosting Formal Theorem Provers by Compiler Outputs

编译以压缩:通过编译器输出提升形式定理证明器

Guchan Li, Rui Tian, Hongning Wang

发表机构 * Department of Computer Science and Technology, Tsinghua University, Beijing, China(清华大学计算机科学与技术系)

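AI summary Uses the compiler to compress large numbers of proof attempts into structured failure modes and proposes a learning-to-refine framework that corrects errors locally via tree search conditioned on verifier feedback, reaching state-of-the-art performance on PutnamBench among publicly reported ~8B and ~32B parameter models under comparable test-time budgets.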

详情
AI中文摘要

大型语言模型在形式定理证明中展现出显著潜力,但最先进的性能往往需要通过大量展开或扩展上下文窗口来实现令人望而却步的测试时计算。在这项工作中,我们通过利用形式验证中的一种信息结构来解决这一可扩展性瓶颈:观察到编译器将大量不同的证明尝试空间映射到一组紧凑的结构化失败模式。我们引入了一个学习-精炼框架,利用这种压缩来执行高效的学习和证明探索。我们执行树搜索,根据明确的验证器反馈局部修正错误,从而避免了积累长历史证明尝试的相关成本。大量评估表明,我们的方法在不同规模上持续增强了基础证明器的推理能力。值得注意的是,在可比较的测试时预算下,我们的方法在PutnamBench上达到了公开报告的约80亿和约320亿参数模型中的最先进性能,为下一代验证器引导推理提供了一种可扩展的范式。

英文摘要

Large language models (LLMs) have demonstrated significant potential in formal theorem proving, yet state-of-the-art performance often necessitates prohibitive test-time compute via massive roll-outs or extended context windows. In this work, we address this scalability bottleneck by exploiting an informative structure in formal verification: the observation that compilers map a vast space of diverse proof attempts to a compact set of structured failure modes. We introduce a learning-to-refine framework that leverages this compression to perform efficient learning and proof exploration. We perform tree search that corrects errors locally conditioned on explicit verifier feedback, thereby circumventing the costs associated with accumulating a long history of proof attempts. Extensive evaluations show that our method consistently amplifies the reasoning capabilities of base provers across varying scales. Notably, our approach achieves state-of-the-art performance on PutnamBench among publicly reported $\sim$8B and $\sim$32B parameter models under comparable test-time budgets, offering a scalable paradigm for next-generation verifier-guided reasoning.

2604.17551 2026-06-01 cs.LG cs.AI 版本更新

SVL: Goal-Conditioned Reinforcement Learning as Survival Learning

SVL:目标条件强化学习作为生存学习

Franki Nguimatsia Tiofack, Fabian Schramm, Théotime Le Hellard, Justin Carpentier

发表机构 * Inria(法国国家信息与自动化研究所) École Normale Supérieure, PSL Research University, Paris, France(巴黎高等师范学院,PSL研究大学)

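AI summary Proposes Survival Value Learning (SVL), which recasts goal-conditioned reinforcement learning as a survival learning problem by modeling time-to-goal as a probability distribution and fits a hazard model by maximum likelihood, matching or surpassing strong baselines on offline benchmarks.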

Comments Accepted to the 43rd International Conference on Machine Learning, Seoul, South Korea

详情
AI中文摘要

标准的目标条件强化学习(GCRL)方法依赖于时间差分学习,由于自举可能导致不稳定和样本效率低下。虽然最近的工作探索了对比和监督公式以提高稳定性,但我们提出了一种概率替代方案,称为生存价值学习(SVL),通过将每个状态到目标的时间建模为概率分布,将GCRL重新定义为生存学习问题。这种结构化的分布蒙特卡洛视角产生了一个闭式恒等式,将目标条件价值函数表示为生存概率的折扣和,从而通过危险模型在事件和右删失轨迹上进行最大似然估计来实现价值估计。我们引入了三种实用的价值估计器,包括有限视界截断和两种分箱无限视界近似,以捕捉长视界目标。在离线GCRL基准上的实验表明,SVL与层次化演员结合,匹配或超越了强大的层次化TD和蒙特卡洛基线,在复杂的长视界任务上表现出色。网页和代码:https://simple-robotics.github.io/publications/survival-value-learning/

英文摘要

Standard approaches to goal-conditioned reinforcement learning (GCRL) that rely on temporal-difference learning can be unstable and sample-inefficient due to bootstrapping. While recent work has explored contrastive and supervised formulations to improve stability, we present a probabilistic alternative, called survival value learning (SVL), that reframes GCRL as a survival learning problem by modeling the time-to-goal from each state as a probability distribution. This structured distributional Monte Carlo perspective yields a closed-form identity that expresses the goal-conditioned value function as a discounted sum of survival probabilities, enabling value estimation via a hazard model trained via maximum likelihood on both event and right-censored trajectories. We introduce three practical value estimators, including finite-horizon truncation and two binned infinite-horizon approximations to capture long-horizon objectives. Experiments on offline GCRL benchmarks show that SVL combined with hierarchical actors matches or surpasses strong hierarchical TD and Monte Carlo baselines, excelling on complex, long-horizon tasks. Webpage and Code: https://simple-robotics.github.io/publications/survival-value-learning/
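
The link between a value function and survival probabilities mentioned in this abstract can be checked numerically in a simple setting. The sketch below assumes a sparse-reward convention in which the value equals E[gamma^T], with T the first time the goal is reached; the paper's exact convention and estimators may differ. Under that assumption, the elementary identity gamma^T = 1 - (1 - gamma) * sum_{t<T} gamma^t gives E[gamma^T] = 1 - (1 - gamma) * sum_t gamma^t * S(t), where S(t) = P(T > t). All function names are hypothetical.

```python
def survival_from_hazards(hazards):
    """S[t] = P(T > t) for t = 0..H, where hazards[k-1] = P(T = k | T >= k)."""
    S = [1.0]
    for h in hazards:
        S.append(S[-1] * (1.0 - h))
    return S

def value_direct(hazards, gamma):
    """E[gamma^T] computed from the probability mass P(T = t) = S(t-1) - S(t)."""
    S = survival_from_hazards(hazards)
    return sum((S[t - 1] - S[t]) * gamma ** t for t in range(1, len(S)))

def value_from_survival(hazards, gamma):
    """E[gamma^T] computed as 1 - (1 - gamma) * sum_t gamma^t S(t)."""
    S = survival_from_hazards(hazards)
    return 1.0 - (1.0 - gamma) * sum(gamma ** t * S[t] for t in range(len(S)))

# Toy hazards (assumption); the final hazard is 1.0 so the goal is reached by step 4
# and the survival sum can be truncated exactly.
hazards = [0.1, 0.3, 0.5, 1.0]
gamma = 0.9
v_direct = value_direct(hazards, gamma)
v_survival = value_from_survival(hazards, gamma)
```

Both routes give the same number here (about 0.745), which is the sense in which a fitted hazard model is enough to recover a value estimate in this simplified setting.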

2604.16278 2026-06-01 cs.AI cs.CL cs.LG 版本更新

Learning to Reason with Insight for Informal Theorem Proving

学习在非形式定理证明中进行洞察推理

Yunhe Li, Hao Shi, Bowen Deng, Wei Wang, Mengzhe Ruan, Hanxu Hou, Zhongxiang Dai, Siyang Gao, Chao Wang, Shuang Qiu, Linqi Song

发表机构 * City University of Hong Kong(香港城市大学) Tsinghua University(清华大学) Ke Holdings Inc.(Ke控股公司) Shenzhen University of Advanced Technology(深圳先进技术大学) Chinese University of Hong Kong, Shenzhen(香港中文大学(深圳))

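AI summary To address the lack of insight (recognizing core techniques) as a bottleneck in informal theorem proving, proposes DeepInsight, a unified training framework that uses a hierarchical dataset, progressive multi-stage SFT, and an insight-based policy optimization method to significantly improve the mathematical reasoning of large language models.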

详情
AI中文摘要

尽管大多数自动定理证明方法依赖于形式证明系统,但非形式定理证明能更好地发挥大语言模型(LLMs)在自然语言处理方面的优势。在这项工作中,我们识别出非形式定理证明的一个主要瓶颈是缺乏洞察,即难以识别解决复杂问题所需的核心技巧。为了解决这个问题,我们提出了$\texttt{DeepInsight}$,一个统一的训练框架,旨在培养这种基本的推理技能,并使LLMs能够进行洞察推理。我们的框架由三个部分组成:(1)$\texttt{DeepInsightTheorem}$,一个分层数据集,通过显式提取核心技巧和证明草图以及最终证明来结构化非形式证明;(2)渐进式多阶段SFT策略,模拟人类学习过程,教授模型证明写作、规划和洞察识别;(3)$\texttt{InsightPO}$,一种策略优化方法,在此洞察层次结构上分配结构化奖励。我们在具有挑战性的数学基准上的实验表明,这种洞察感知的生成策略显著优于基线。这些结果表明,教模型识别和应用核心技巧可以大幅提高其数学推理能力。

英文摘要

Although most of the automated theorem-proving approaches depend on formal proof systems, informal theorem proving can align better with large language models' (LLMs) strength in natural language processing. In this work, we identify a primary bottleneck in informal theorem proving as a lack of insight, namely the difficulty of recognizing the core techniques required to solve complex problems. To address this, we propose $\texttt{DeepInsight}$, a unified training framework designed to cultivate this essential reasoning skill and enable LLMs to perform insightful reasoning. Our framework consists of three components: (1) $\texttt{DeepInsightTheorem}$, a hierarchical dataset that structures informal proofs by explicitly extracting core techniques and proof sketches alongside the final proof; (2) a Progressive Multi-Stage SFT strategy that mimics the human learning process, teaching the model proof writing, planning, and insight identification; and (3) $\texttt{InsightPO}$, a policy optimization method that assigns structured rewards over this insight hierarchy. Our experiments on challenging mathematical benchmarks demonstrate that this insight-aware generation strategy significantly outperforms baselines. These results demonstrate that teaching models to identify and apply core techniques can substantially improve their mathematical reasoning.

2604.15959 2026-06-01 cs.LG 版本更新

Multi-Objective Bayesian Optimization via Adaptive $\varepsilon$-Constraints Decomposition

基于自适应 ε-约束分解的多目标贝叶斯优化

Yaohong Yang, Sammie Katt, Samuel Kaski

发表机构 * Department of Computer Science, Aalto University, Espoo, Finland(阿尔托大学计算机科学系,芬兰 Espoo) ELLIS Institute Finland(芬兰 ELLIS 机构) Department of Computer Science, University of Manchester, Manchester, United Kingdom(曼彻斯特大学计算机科学系,英国 Manchester)

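AI summary Proposes STAGE-BO, which uses adaptive ε-constraint decomposition to turn multi-objective optimization into a sequence of constrained subproblems, achieving uniform Pareto coverage and handling constraints and preferences.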

Comments 24 pages, 22 figures, 4 tables. Accepted at the Forty-Third International Conference on Machine Learning (ICML 2026)

详情
AI中文摘要

多目标贝叶斯优化(MOBO)为优化多个昂贵的黑箱函数提供了一个原则性框架。然而,现有的MOBO方法通常在覆盖性、可扩展性以及处理约束和偏好方面存在困难。在这项工作中,我们提出了STAGE-BO,即顺序目标自适应间隙填充ε-约束贝叶斯优化:通过分析代理帕累托前沿的覆盖性,我们的方法识别出具有最大未覆盖间隙的帕累托前沿点,并使用其坐标在ε-约束方法中定义自适应约束,从而将问题转化为一系列不等式约束子问题,并通过约束期望改进采集函数高效求解。我们的方法无需超体积计算即可实现均匀的帕累托覆盖,并自然地处理约束和偏好。在合成和真实世界基准上的实验表明,与最先进的基线相比,我们的方法具有优越的覆盖性和具有竞争力的超体积性能。我们的代码实现可在https://github.com/YangYaohong1/STAGE-BO找到。

英文摘要

Multi-objective Bayesian optimization (MOBO) provides a principled framework for optimizing multiple expensive black-box functions. However, existing MOBO methods often struggle with coverage, scalability, and handling constraints and preferences. In this work, we propose STAGE-BO, Sequential Targeting Adaptive Gap-Filling $\varepsilon$-Constraint Bayesian Optimization: by analyzing the coverage of the surrogate Pareto front, our method identifies the Pareto front point with the largest uncovered gap and uses its coordinates to define adaptive constraints in the $\varepsilon$-constraint method, which transforms the problem into a sequence of inequality-constrained subproblems that are solved efficiently via constrained expected improvement acquisition. Our approach provides uniform Pareto coverage without hypervolume computation and naturally handles constraints and preferences. Experiments on synthetic and real-world benchmarks demonstrate superior coverage and competitive hypervolume performance against state-of-the-art baselines. Our code implementation can be found at https://github.com/YangYaohong1/STAGE-BO.

2604.11613 2026-06-01 cs.LG cs.AI 版本更新

Symmetry Reveals Layerwise Dynamics: How Transformers Perform In-Context Classification

对称性揭示逐层动力学:Transformer如何执行上下文分类

Patrick Lutz, Themistoklis Haris, Arjun Chandra, Aditya Gangrade, Venkatesh Saligrama

发表机构 * Boston University, Departments of Computer Science

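AI summary By enforcing feature- and label-permutation equivariance, extracts an explicit depth-indexed recursive update rule from Transformers, revealing a geometry-driven algorithm for in-context classification.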

Comments appears in the Proceedings of the 43rd International Conference on Machine Learning (ICML '26)

详情
AI中文摘要

Transformer可以从少量标记示例中执行上下文分类,但推理时的算法仍然不透明。我们研究了硬无间隔机制下的多类线性分类,并通过在每一层强制特征和标签排列等变性使计算可识别。这实现了可解释性,同时保持了功能等价性,并产生了高度结构化的权重。从这些模型中,我们提取出一个显式的深度索引递归:一个端到端可识别的、在softmax Transformer内部涌现的更新规则,据我们所知这是首个此类规则。由混合特征-标签Gram结构形成的注意力矩阵驱动训练点、标签和测试探针的耦合更新。由此产生的动力学实现了一个几何驱动的算法主题,该主题可以证明放大类别分离并产生鲁棒的期望类别对齐。

英文摘要

Transformers can perform in-context classification from a few labeled examples, yet the inference-time algorithm remains opaque. We study multi-class linear classification in the hard no-margin regime and make the computation identifiable by enforcing feature- and label-permutation equivariance at every layer. This enables interpretability while maintaining functional equivalence and yields highly structured weights. From these models we extract an explicit depth-indexed recursion: an end-to-end identified, emergent update rule inside a softmax transformer, to our knowledge the first of its kind. Attention matrices formed from mixed feature-label Gram structure drive coupled updates of training points, labels, and the test probe. The resulting dynamics implement a geometry-driven algorithmic motif, which can provably amplify class separation and yields robust expected class alignment.

2604.09412 2026-06-01 stat.ML cond-mat.dis-nn cs.LG 版本更新

Sharp description of local minima in the loss landscape of high-dimensional two-layer ReLU neural networks

高维两层ReLU神经网络损失景观中局部极小值的精确描述

Jie Huang, Bruno Loureiro, Stefano Sarao Mannelli

Affiliations * Physics, Chalmers University of Technology; University of Gothenburg; Ecole Normale Superieure, PSL & CNRS; Engineering, Chalmers University of Technology; School of Computer Science and Applied Mathematics, University of the Witwatersrand

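AI summary Gives an exact characterization of local minima in the loss landscape of high-dimensional two-layer ReLU networks in terms of summary statistics, links them to one-pass SGD, and shows how overparameterization affects the stability and reachability of minima.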

Comments 29 pages, 18 figures. Accepted as a conference paper at ICML 2026

详情
AI中文摘要

我们研究了在可实现教师-学生设置下,具有高斯协变量的形式为$\sum_{k=1}^K \mathrm{ReLU}(w_k^\top x)$的两层ReLU网络的总体损失景观。我们证明局部极小值在总结统计量方面允许精确的低维表示,从而对景观产生清晰且可解释的描述。我们进一步建立了与单次SGD的直接联系:局部极小值对应于总结统计量空间中动力学的吸引不动点。这一视角揭示了极小值分组成离散族的层次结构,并展示了过参数化如何改变它们在基于梯度动力学下的稳定性和可达性。在这种过参数化机制下,全局极小值变得越来越可访问,吸引动力学并减少收敛到虚假解。总的来说,我们的结果揭示了常见简化假设的内在局限性,这些假设即使在最小的神经网络模型中也可能遗漏损失景观的基本特征。

英文摘要

We study the population loss landscape of two-layer ReLU networks of the form $\sum_{k=1}^K \mathrm{ReLU}(w_k^\top x)$ in a realisable teacher-student setting with Gaussian covariates. We show that local minima admit an exact low-dimensional representation in terms of summary statistics, yielding a sharp and interpretable characterisation of the landscape. We further establish a direct link with one-pass SGD: local minima correspond to attractive fixed points of the dynamics in summary statistics space. This perspective reveals a hierarchical organisation of minima into discrete families and shows how overparameterisation changes their stability and reachability under gradient-based dynamics. In this overparameterised regime, global minima become increasingly accessible, attracting the dynamics and reducing convergence to spurious solutions. Overall, our results reveal intrinsic limitations of common simplifying assumptions, which may miss essential features of the loss landscape even in minimal neural network models.

2503.09315 2026-06-01 cs.LG 版本更新

ShuffleGate: Scalable Feature Optimization for Recommender Systems via Batch-wise Sensitivity Learning

ShuffleGate: 通过批量敏感性学习实现推荐系统的可扩展特征优化

Yihong Huang, Chen Chu, Fan Zhang, Liping Wang, Fei Chen, Yu Lin, Ruiduan Li, Zhihao Li

发表机构 * Bilibili Inc.(哔哩哔哩公司) Guangzhou University(广州大学) East China Normal University(华东师范大学)

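AI summary Proposes ShuffleGate, which estimates feature importance in a differentiable manner via a batch-wise shuffling strategy, unifies feature selection and dimension selection, and yields polarized importance distributions that avoid complex threshold tuning; it outperforms existing methods on four benchmarks and, in an industrial deployment, achieves 10x dimension compression and a 91% increase in training throughput.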

详情
AI中文摘要

特征优化——特别是特征选择(FS)和维度选择(DS)——对于大规模推荐系统的效率和泛化能力至关重要。虽然概念上相关,但这些任务通常采用孤立的解决方案,往往存在重要性分数模糊或计算成本过高的问题。在本文中,我们提出ShuffleGate,一种统一且可解释的机制,通过衡量模型对信息损失的敏感性来估计组件重要性。与学习相对权重的传统门控不同,ShuffleGate引入了一种批量洗牌策略,以端到端可微的方式有效“擦除”信息。这种范式转变产生了自然极化的重要性分布,弥合了长期存在的“搜索-重训练差距”,并在无需复杂阈值调优的情况下区分关键信号与噪声。在四个基准上的大量实验验证了ShuffleGate在特征选择和维度选择任务中均持续优于最先进的方法。它比排列基线实现了15倍的加速,并展示了极端的可扩展性,在仅700秒内处理了2.7亿个参数。最后,在一项顶级工业部署中,它将输入维度压缩了10倍,使得训练吞吐量提高了91%,同时每天服务数十亿次请求且性能无下降。

英文摘要

Feature optimization -- specifically Feature Selection (FS) and Dimension Selection (DS) -- is critical for the efficiency and generalization of large-scale recommender systems. While conceptually related, these tasks are typically tackled with isolated solutions that often suffer from ambiguous importance scores or prohibitive computational costs. In this paper, we propose ShuffleGate, a unified and interpretable mechanism that estimates component importance by measuring the model's sensitivity to information loss. Unlike conventional gating that learns relative weights, ShuffleGate introduces a batch-wise shuffling strategy to effectively "erase" information in an end-to-end differentiable manner. This paradigm shift yields naturally polarized importance distributions, bridging the long-standing "search-retrain gap" and distinguishing essential signals from noise without complex threshold tuning. Extensive experiments across four benchmarks validate that ShuffleGate consistently outperforms state-of-the-art methods in both Feature and Dimension Selection tasks. It achieves a $15\times$ speedup over permutation baselines and demonstrates extreme scalability by processing 270M parameters in just 700 seconds. Finally, in a top-tier industrial deployment, it compressed input dimensions by $10\times$, yielding a 91% increase in training throughput while serving billions of daily requests without performance degradation.
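
The abstract does not spell out the gating formula, so the following is only a plausible sketch of batch-wise shuffling with a gate. It assumes each feature j has a gate g_j in [0, 1] and that the model input is the mix g_j * x + (1 - g_j) * x_shuffled, where x_shuffled is the same feature column permuted across the batch. The function name, the mixing rule, and the toy batch are assumptions; in a real setting the gates would be learned by gradient descent, which is not shown.

```python
import random

def shuffle_gate(batch, gates, seed=0):
    """Mix each feature column with a batch-shuffled copy of itself, weighted by its gate."""
    rng = random.Random(seed)
    mixed = [list(row) for row in batch]
    for j, gate in enumerate(gates):
        column = [row[j] for row in batch]
        shuffled = column[:]
        rng.shuffle(shuffled)  # permute feature j across the samples in the batch
        for i in range(len(batch)):
            # gate = 1 keeps the original value; gate = 0 replaces it with a shuffled one
            mixed[i][j] = gate * column[i] + (1.0 - gate) * shuffled[i]
    return mixed

batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]  # toy batch (assumption)
out = shuffle_gate(batch, gates=[1.0, 0.0])
kept_column = [row[0] for row in out]
erased_column = [row[1] for row in out]
```

With a gate of 1 the column passes through unchanged, while a gate of 0 keeps the column's marginal values but breaks their pairing with the other features and the labels, which is one way to read "erasing" information.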

2604.06881 2026-06-01 cs.LG physics.flu-dyn 版本更新

MENO: MeanFlow-Enhanced Neural Operators for Dynamical Systems

MENO: 用于动力系统的MeanFlow增强神经算子

Tianyue Yang, Xiao Xue

发表机构 * Centre for Computational Science, University College London, London, United Kingdom(伦敦大学学院计算科学中心)

AI总结 提出MENO框架,通过改进的MeanFlow方法恢复多尺度特征,在低分辨率训练数据下实现高分辨率准确预测,且推理速度比扩散增强方法快14倍。

Comments 27 pages, 13 figures

详情
AI中文摘要

神经算子因其网格不变性和计算效率而成为动力系统的强大替代模型。然而,基于傅里叶的变体在谱空间中固有地截断高频分量,导致小尺度结构丢失,并在低分辨率数据训练时降低高分辨率下的预测质量。虽然基于扩散的增强方法可以恢复多尺度特征,但它们引入了大量推理开销,削弱了神经算子的效率优势。在这项工作中,我们引入了MeanFlow增强神经算子(MENO),一种新颖的框架,以最小的推理成本实现准确的全尺度预测。通过利用改进的MeanFlow方法,MENO恢复了小尺度细节和大尺度动力学,具有优越的物理保真度和统计准确性。我们在三个具有挑战性的动力系统上评估了MENO,包括相场动力学、二维Kolmogorov流和活性物质动力学,分辨率高达256×256。在所有基准测试中,与基线神经算子相比,MENO将功率谱密度精度提高了最多2倍,同时与最先进的去噪扩散隐式模型(DDIM)增强对应方法相比,实现了高达14倍的推理加速,有效弥合了准确性和效率之间的差距。MENO的灵活性和效率使其成为科学机器学习应用中高效的替代模型,其中统计完整性和计算效率至关重要。

英文摘要

Neural operators have emerged as powerful surrogates for dynamical systems due to their grid-invariant properties and computational efficiency. However, Fourier-based variants inherently truncate high-frequency components in spectral space, resulting in the loss of small-scale structures and degraded prediction quality at high resolutions when trained on low-resolution data. While diffusion-based enhancement methods can recover multi-scale features, they introduce substantial inference overhead that undermines the efficiency advantage of neural operators. In this work, we introduce MeanFlow-Enhanced Neural Operators (MENO), a novel framework that achieves accurate all-scale predictions with minimal inference cost. By leveraging the improved MeanFlow method, MENO restores both small-scale details and large-scale dynamics with superior physical fidelity and statistical accuracy. We evaluate MENO on three challenging dynamical systems, including phase-field dynamics, 2D Kolmogorov flow, and active matter dynamics, at resolutions up to 256×256. Across all benchmarks, MENO improves the power spectrum density accuracy by up to a factor of 2 compared to baseline neural operators while achieving up to 14× faster inference than the state-of-the-art Denoising Diffusion Implicit Model (DDIM)-enhanced counterparts, effectively bridging the gap between accuracy and efficiency. The flexibility and efficiency of MENO position it as an efficient surrogate model for scientific machine learning applications where both statistical integrity and computational efficiency are paramount.

2604.02969 2026-06-01 stat.ML cs.LG stat.CO stat.ME 版本更新

Inversion-Free Natural Gradient Descent on Riemannian Manifolds

黎曼流形上的无逆自然梯度下降

Dario Draca, Takuo Matsubara, Minh-Ngoc Tran

发表机构 * School of Mathematics and Statistics, University of Sydney(悉尼大学数学与统计学学院) University of Sydney Business School(悉尼大学商学院)

AI总结 针对参数位于一般黎曼流形上的统计模型,提出一种内在的无逆自然梯度方法,通过流形上逆Fisher信息矩阵的移动近似和低秩更新,避免矩阵求逆,并证明迭代序列的几乎必然收敛率。

Comments 80 pages, 4 figures. Updated empirical examples

详情
AI中文摘要

自然梯度法是统计优化的核心工具,但其广泛应用受到欧几里得参数空间假设、Fisher信息矩阵(FIM)的重复估计以及后续求逆计算成本的限制。本文针对参数位于一般黎曼流形上的统计模型,提出了一种内在的、无逆的自然梯度方法。在这种非欧几里得设定下进行统计优化,可以自然地强制执行参数约束、消除不可辨识参数,并利用测地凸性。我们的算法基于逆FIM的移动近似,该近似直接在流形上维护。通过低秩矩阵恒等式,利用新的得分向量高效更新该近似。我们证明了迭代序列的几乎必然收敛率为$O(\log s / s^{\alpha})$,近似FIM也有类似速率。针对大规模应用,进一步提出了一种存储复杂度为次二次的有限内存变体。我们在Bures-Wasserstein流形上的变分贝叶斯、Stiefel流形上的归一化流以及降秩逻辑回归中展示了我们方法的有效性。

英文摘要

The natural gradient method is a central tool for statistical optimisation, but its broader application is hindered by the assumption of a Euclidean parameter space, the repeated estimation of the Fisher information matrix (FIM), and the computational cost of its subsequent inversion. This paper proposes an intrinsic, inversion-free natural gradient method for statistical models whose parameters lie on general Riemannian manifolds. Formulating statistical optimisation in this non-Euclidean setting allows for the natural enforcement of parameter constraints, the elimination of non-identifiable parameters, and the exploitation of geodesic convexity. Our algorithm is based on a moving approximation of the inverse FIM, which is maintained directly on the manifold. This approximation is efficiently updated with new score vectors using low-rank matrix identities. We prove almost-sure convergence rates of $O(\log s / s^{\alpha})$ for the sequence of iterates, and a similar rate for the approximate FIM. A limited-memory variant with sub-quadratic storage complexity is further proposed for large-scale applications. We demonstrate the efficacy of our method on variational Bayes within the Bures-Wasserstein manifold, normalising flows on the Stiefel manifold, and reduced-rank logistic regression.
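
To make the "low-rank matrix identities" concrete, the flat Euclidean case can be sketched with the Sherman-Morrison identity. This is not the paper's manifold algorithm: it assumes a rank-one moving-average update $F' = (1-\rho)F + \rho\, g g^\top$ with a symmetric positive definite $F$, and the names `sherman_morrison_update` and `rho` are hypothetical.

```python
def mat_vec(M, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def sherman_morrison_update(F_inv, g, rho):
    """Return the inverse of (1 - rho) * F + rho * g g^T, given F_inv = F^{-1}.

    Assumes F is symmetric positive definite and 0 < rho < 1, so no matrix
    is ever inverted explicitly; the cost is O(n^2) per score vector g.
    """
    a, b = 1.0 - rho, rho
    u = mat_vec(F_inv, g)                               # u = F^{-1} g
    denom = 1.0 + (b / a) * sum(gi * ui for gi, ui in zip(g, u))
    n = len(g)
    # Symmetry of F_inv lets us write F^{-1} g g^T F^{-1} as u u^T.
    return [[(F_inv[i][j] - (b / a) * u[i] * u[j] / denom) / a
             for j in range(n)] for i in range(n)]


identity = [[1.0, 0.0], [0.0, 1.0]]
updated_inverse = sherman_morrison_update(identity, [1.0, 0.0], 0.5)
```

With $F = I$, $g = (1, 0)$ and $\rho = 0.5$, the updated matrix is $\mathrm{diag}(1, 0.5)$, so the returned inverse is $\mathrm{diag}(1, 2)$.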

2604.01985 2026-06-01 cs.LG cs.AI cs.RO 版本更新

World Action Verifier: Self-Improving World Models via Forward-Inverse Asymmetry

World Action Verifier: 通过前向-反向不对称性自我改进世界模型

Yuejiang Liu, Fan Feng, Lingjing Kong, Weifeng Lu, Jinzhou Tang, Kun Zhang, Kevin Murphy, Chelsea Finn, Yilun Du

发表机构 * Stanford University(斯坦福大学) UC San Diego(加州大学圣地亚哥分校) Carnegie Mellon University(卡内基梅隆大学) Google DeepMind(谷歌DeepMind) Harvard University(哈佛大学)

AI总结 提出World Action Verifier (WAV)框架,利用状态合理性和动作可达性的独立验证以及前向-反向不对称性,通过视频语料库的多样子目标生成器和稀疏逆模型实现循环一致性,从而在欠探索区域自我改进世界模型,在多个任务中样本效率提升2倍且下游策略性能提升22%以上。

Comments Project Website: https://world-action-verifier.github.io

详情
AI中文摘要

通用世界模型有望实现可扩展的策略评估、优化和规划,但达到所需的鲁棒性仍然具有挑战性。与主要关注最优动作的策略学习不同,世界模型需要在大量次优动作的空间中保持可靠,而这些动作在带有动作标签的机器人交互中往往代表性不足。为了解决这一挑战,我们提出了World Action Verifier (WAV)框架,该框架使世界模型能够识别自身的预测错误并进行自我改进。关键思想是将动作条件的状态预测分解为两个独立可验证的因素:状态合理性和动作可达性。我们证明,由于两个潜在的不对称性——更广泛的无动作数据的可用性和动作相关特征的更低维度——验证这些因素比直接前向预测更容易处理。利用这些不对称性,我们通过(i)从视频语料库中获得的多样子目标生成器和(ii)从状态特征子集推断动作的稀疏逆模型来增强世界模型。通过强制提议的子目标、推断的动作和前向展开之间的循环一致性,WAV在现有方法常常失败的欠探索区域提供了一种有效的验证机制。在涵盖MiniGrid、RoboMimic和ManiSkill的九个任务中,我们的方法实现了2倍的样本效率提升,同时将下游策略性能提高了22%以上。

英文摘要

General-purpose world models promise scalable policy evaluation, optimization, and planning, yet achieving the required level of robustness remains challenging. Unlike policy learning, which primarily focuses on optimal actions, a world model needs to be reliable over a vast space of suboptimal actions, which are often underrepresented in action-labeled robot interactions. To address this challenge, we propose World Action Verifier (WAV), a framework that enables world models to identify their own prediction errors and self-improve. The key idea is to decompose action-conditioned state prediction into two independently verifiable factors: state plausibility and action reachability. We show that verifying these factors is significantly more tractable than direct forward prediction due to two underlying asymmetries: the broader availability of action-free data and the lower dimensionality of action-relevant features. Leveraging these asymmetries, we augment a world model with (i) a diverse subgoal generator obtained from video corpora and (ii) a sparse inverse model that infers actions from a subset of state features. By enforcing cycle consistency among proposed subgoals, inferred actions, and forward rollouts, WAV provides an effective verification mechanism in under-explored regimes, where existing methods often fail. Across nine tasks spanning MiniGrid, RoboMimic, and ManiSkill, our method achieves 2x higher sample efficiency while improving downstream policy performance by over 22%.

2603.29972 2026-06-01 stat.ME cs.LG econ.EM stat.ML 版本更新

Do covariates explain why these groups differ? The choice of reference group can reverse conclusions in the Oaxaca-Blinder decomposition

协变量能否解释这些群体差异?参考组的选择可能逆转Oaxaca-Blinder分解的结论

Manuel Quintero, Advik Shreekumar, William T. Stephenson, Tamara Broderick

发表机构 * Massachusetts Institute of Technology(麻省理工学院) University of California, Berkeley(加州大学伯克利分校)

AI总结 本文通过理论和实证证明,在Oaxaca-Blinder分解中,参考组的选择可能导致实质性不同的结论,且该问题在复杂回归模型中更为常见,建议研究者报告两种方向的分解结果。

Comments 28 pages, 4 figures

详情
AI中文摘要

科学家们常常试图解释为什么两个群体的结果存在差异。例如,两家医院患者死亡率的差异可能源于患者本身的差异(协变量)或医疗护理的差异(给定协变量下的结果)。Oaxaca-Blinder分解(OBD)是区分这些因素的标准工具。众所周知,OBD需要选择其中一个群体作为参考,且数值答案可能因参考组而异。据我们所知,目前尚无系统研究探讨OBD参考组的选择是否会导致不同的实质性结论以及该问题的普遍性。在本文中,我们通过真实数据和模拟数据给出了存在性证明,表明OBD参考组确实可能导致实质性不同的结论。我们的实证研究发现,当OBD扩展到更复杂的回归模型(包括预训练变换器)时,这种敏感性更为常见。我们的理论和实证结果共同表明,这些结论逆转并非完全由模型误设、小数据或对抗性参数选择导致。我们的结果表明,实践者应始终报告OBD的两个方向;现代机器学习和大数据集并不能自动解决结论逆转问题;且需要进一步研究这一问题。

英文摘要

Scientists often want to explain why an outcome is different in two groups. For instance, differences in patient mortality rates across two hospitals could be due to differences in the patients themselves (covariates) or differences in medical care (outcomes given covariates). The Oaxaca--Blinder decomposition (OBD) is a standard tool to tease apart these factors. It is well known that the OBD requires choosing one of the groups as a reference, and the numerical answer can vary with the reference. To the best of our knowledge, there has been no systematic investigation into whether the choice of OBD reference can yield different substantive conclusions and how common this issue is. In the present paper, we give existence proofs in real and simulated data that the OBD references can in fact yield substantively different conclusions. Our empirical exercises find that this sensitivity is more common when the OBD is extended to more complex regression models, including a pretrained transformer. Our theoretical and empirical results together establish that these conclusion reversals are not entirely driven by model misspecification, small data, or adversarial parameter choices. Our results suggest that practitioners should always report both directions of the OBD; that modern machine learning and large datasets do not automatically resolve the conclusion reversal problem; and that further work on this problem is needed.
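
The reference-group sensitivity can be seen in a toy two-fold decomposition. The sketch below assumes a single covariate and a separate least-squares line per group; the function names and the data are illustrative and do not come from the paper.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope


def oaxaca_blinder(xa, ya, xb, yb):
    """Two-fold split of mean(ya) - mean(yb) under both reference choices.

    Each entry is (explained, unexplained): the explained part prices the
    covariate gap at the reference group's slope, the rest is unexplained.
    """
    (_, slope_a), (_, slope_b) = fit_line(xa, ya), fit_line(xb, yb)
    covariate_gap = mean(xa) - mean(xb)
    gap = mean(ya) - mean(yb)
    explained_ref_b = covariate_gap * slope_b
    explained_ref_a = covariate_gap * slope_a
    return {
        "gap": gap,
        "ref_B": (explained_ref_b, gap - explained_ref_b),
        "ref_A": (explained_ref_a, gap - explained_ref_a),
    }


# Toy data: group A has slope 2, group B has slope 0.25.
result = oaxaca_blinder([2.0, 3.0, 4.0], [4.0, 6.0, 8.0],
                        [1.0, 2.0, 3.0], [4.75, 5.0, 5.25])
```

On this toy data the outcome gap is 1. Using group B's coefficients, covariates explain 0.25 of it (a minority); using group A's, they explain 2.0 (more than the whole gap, with a negative unexplained part). Reporting both directions, as the abstract recommends, makes this disagreement visible.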

2603.28201 2026-06-01 cs.LG stat.ML 版本更新

A Perturbation Approach to Unconstrained Linear Bandits

无约束线性赌博机的一种扰动方法

Andrew Jacobsen, Dorian Baudry, Shinji Ito, Nicolò Cesa-Bianchi

发表机构 * Inria, Univ. Grenoble Alpes, Grenoble INP, CNRS, LIG, 38000 Grenoble, France(法国国家信息与自动化研究所、格勒诺布尔阿尔卑斯大学、格勒诺布尔INP、法国国家科学研究中心、格勒诺布尔计算机科学实验室) The University of Tokyo(东京大学)

AI总结 本文提出一种基于扰动的框架,将无约束线性赌博机问题简化为标准在线线性优化问题,并实现了静态和动态遗憾的最优高概率保证。

Comments 50 pages; v2: ICML 2026

详情
AI中文摘要

我们重新审视了Abernethy等人(2008)在无约束赌博机线性优化(uBLO)背景下的标准基于扰动的方法。我们展示了一个令人惊讶的结果:在无约束设置中,这种方法有效地将赌博机线性优化(BLO)简化为一个标准的在线线性优化(OLO)问题。我们的框架在几个方面改进了先前的工作。首先,当我们的扰动方案与比较器自适应的OLO算法结合时,我们推导出了期望遗憾保证,从而对不同的对抗模型如何影响最终的比较器自适应率提供了新的见解。我们还将分析扩展到动态遗憾,在没有$P_T$先验知识的情况下,首次获得了具有最优$\sqrt{P_T}$路径长度依赖的保证。然后,我们为uBLO中的静态和动态遗憾开发了第一个高概率保证。最后,我们讨论了静态遗憾的下界,并证明了欧几里得球上对抗性线性赌博机广为流传(folklore)的$\Omega(\sqrt{dT})$速率,这具有独立的意义。

英文摘要

We revisit the standard perturbation-based approach of Abernethy et al. (2008) in the context of unconstrained Bandit Linear Optimization (uBLO). We show the surprising result that in the unconstrained setting, this approach effectively reduces Bandit Linear Optimization (BLO) to a standard Online Linear Optimization (OLO) problem. Our framework improves on prior work in several ways. First, we derive expected-regret guarantees when our perturbation scheme is combined with comparator-adaptive OLO algorithms, leading to new insights about the impact of different adversarial models on the resulting comparator-adaptive rates. We also extend our analysis to dynamic regret, obtaining the first guarantees with optimal $\sqrt{P_T}$ path-length dependencies without prior knowledge of $P_T$. We then develop the first high-probability guarantees for both static and dynamic regret in uBLO. Finally, we discuss lower bounds on the static regret, and prove the folklore $\Omega(\sqrt{dT})$ rate for adversarial linear bandits on the Euclidean ball, which is of independent interest.

2603.20253 2026-06-01 physics.comp-ph cs.AI cs.DC cs.LG 版本更新

SimulCost: A Cost-Aware Benchmark and Toolkit for Automating Physics Simulations with LLMs

SimulCost: 一个利用LLM自动化物理模拟的代价感知基准与工具包

Yadi Cao, Sicheng Lai, Jiahe Huang, Yang Zhang, Zach Lawrence, Rohan Bhakta, Izzy F. Thomas, Mingyun Cao, Chung-Hao Tsai, Zihao Zhou, Yidong Zhao, Hao Liu, Alessandro Marinoni, Alexey Arefiev, Rose Yu

发表机构 * University of California San Diego(加州大学圣地亚哥分校) The Chinese University of Hong Kong, Shenzhen(香港中文大学(深圳)) Peking University(北京大学) University of California, Los Angeles(加州大学洛杉矶分校) California Institute of Technology(加州理工学院) ETH Zurich(苏黎世联邦理工学院)

AI总结 针对现有LLM评估忽略工具使用代价的问题,提出SimulCost基准,通过单轮和多轮参数调优任务比较LLM与传统扫描方法在准确性和计算代价上的表现,发现LLM在高精度任务中初始猜测不可靠且多轮模式效率更低。

Comments accepted version at ICML

详情
AI中文摘要

评估用于科学任务的LLM代理时,现有研究主要关注令牌成本,而忽略了工具使用成本,如模拟时间和实验资源。因此,在现实预算约束下,pass@k等指标变得不切实际。为弥补这一差距,我们引入了SimulCost,这是首个针对物理模拟中代价敏感参数调优的基准。SimulCost将LLM调优代价敏感参数与传统扫描方法在准确性和计算成本上进行比较,涵盖了来自流体动力学、固体力学和等离子体物理的13个模拟器中的2,947个单轮(初始猜测)和1,931个多轮(通过试错调整)任务。每个模拟器的成本是解析定义的且与平台无关。前沿LLM在单轮模式下的成功率为46-65%,在高精度要求下下降至35-55%,使得它们的初始猜测在高精度任务中不可靠。多轮模式将成功率提升至72-81%,但LLM比传统扫描慢1.5-2.5倍,因此不是经济的选择。我们还研究了参数组相关性以了解知识迁移潜力,以及上下文示例和推理努力的影响,为部署和微调提供了实际意义。我们将SimulCost开源为一个静态基准和可扩展工具包,以促进改进物理模拟的代价感知代理设计以及扩展新的模拟环境的研究。代码和数据可在https://github.com/Rose-STL-Lab/SimulCost-Bench获取。

英文摘要

Evaluating LLM agents for scientific tasks has focused on token costs while ignoring tool-use costs like simulation time and experimental resources. As a result, metrics like pass@k become impractical under realistic budget constraints. To address this gap, we introduce SimulCost, the first benchmark targeting cost-sensitive parameter tuning in physics simulations. SimulCost compares LLM tuning of cost-sensitive parameters against a traditional scanning approach in both accuracy and computational cost, spanning 2,947 single-round (initial guess) and 1,931 multi-round (adjustment by trial-and-error) tasks across 13 simulators from fluid dynamics, solid mechanics, and plasma physics. Each simulator's cost is analytically defined and platform-independent. Frontier LLMs achieve 46-65% success rates in single-round mode, dropping to 35-55% under high-accuracy requirements, rendering their initial guesses unreliable, especially for high-accuracy tasks. Multi-round mode improves rates to 72-81%, but LLMs are 1.5-2.5x slower than traditional scanning, making them uneconomical choices. We also investigate parameter group correlations for knowledge transfer potential, and the impact of in-context examples and reasoning effort, providing practical implications for deployment and fine-tuning. We open-source SimulCost as a static benchmark and extensible toolkit to facilitate research on improving cost-aware agentic designs for physics simulations, and on expanding to new simulation environments. Code and data are available at https://github.com/Rose-STL-Lab/SimulCost-Bench.

2603.26506 2026-06-01 q-bio.NC cs.LG 版本更新

Identifying Connectivity Distributions from Neural Dynamics Using Flows

利用流从神经动力学中识别连接分布

Timothy Doyeon Kim, Ulises Pereira-Obilinovic, Yiliu Wang, Eric Shea-Brown, Uygar Sümbül

发表机构 * Allen Institute, Seattle, WA, USA(艾伦研究所) University of Washington, Seattle, WA, USA(华盛顿大学)

AI总结 针对低秩循环神经网络(lrRNN)中连接结构不可识别的问题,提出基于最大熵和连续归一化流(CNF)的推理框架,通过流匹配训练学习连接权重分布,以无偏方式匹配观测动力学,并应用于合成数据和真实神经记录。

详情
AI中文摘要

连接结构塑造了神经计算,但从群体记录中推断这种结构是退化的:多种连接结构可以产生相同的动力学。最近的工作使用低秩循环神经网络(lrRNN)从观测活动中推断低维潜在动力学和连接,从而对动力学进行机制性解释。然而,训练lrRNN的标准方法可能会恢复与潜在动力学无关的虚假结构。我们首先刻画了lrRNN中连接结构的可识别性,并确定了唯一解存在的条件。为了找到这样的解,我们开发了一个基于最大熵和连续归一化流(CNF)的推理框架,通过流匹配进行训练。我们的方法不是估计单个连接矩阵,而是学习一个连接权重的分布,该分布在不可识别分量上最大程度地无偏,同时匹配观测动力学。这种方法捕捉了复杂但必要的分布,例如经验数据中发现的重尾连接。我们在具有产生多稳态吸引子、极限环和环吸引子的连接结构的合成数据集上验证了我们的方法,并展示了其在决策过程中大鼠额叶皮层记录中的适用性。我们的框架将电路推断从恢复连接转变为识别哪些连接结构是计算上必需的,哪些是欠约束推断的产物。

英文摘要

Connectivity structure shapes neural computation, but inferring this structure from population recordings is degenerate: multiple connectivity structures can generate identical dynamics. Recent work uses low-rank recurrent neural networks (lrRNNs) to infer low-dimensional latent dynamics and connectivity from observed activity, enabling a mechanistic interpretation of the dynamics. However, standard approaches for training lrRNNs can recover spurious structures irrelevant to the underlying dynamics. We first characterize the identifiability of connectivity structures in lrRNNs and determine conditions under which a unique solution exists. To find such solutions, we develop an inference framework based on maximum entropy and continuous normalizing flows (CNFs), trained via flow matching. Instead of estimating a single connectivity matrix, our method learns a distribution over connection weights that is maximally unbiased over unidentifiable components while matching the observed dynamics. This approach captures complex yet necessary distributions such as heavy-tailed connectivity found in empirical data. We validate our method on synthetic datasets with connectivity structures that generate multistable attractors, limit cycles, and ring attractors, and demonstrate its applicability in recordings from rat frontal cortex during decision-making. Our framework shifts circuit inference from recovering connectivity to identifying which connectivity structures are computationally required, and which are artifacts of underconstrained inference.

2603.23977 2026-06-01 cs.LG cs.AI 版本更新

Circuit-Inspired High-Order Neural Networks with Unified Neural Dynamics Modeling for PDE Solving and Visual Perception

电路启发的具有统一神经动力学建模的高阶神经网络用于PDE求解与视觉感知

Tongfei Chen, Jingying Yang, Linlin Yang, Juan Zhang, Jinhu Lü, David Doermann, Chunyu Xie, Long He, Tian Wang, Guodong Guo, Baochang Zhang

发表机构 * Communication University of China(中国传媒大学) AI Research, Qihoo 360(360人工智能研究院,奇虎360) Eastern Institute of Technology, Ningbo(宁波东方理工大学)

AI总结 提出电路启发的高阶神经网络(CHONN),通过基尔霍夫级联组合实现高阶动力学算子,在PDE求解、长期物理预测和ImageNet-1K识别中提升结构保真度和稳定性。

详情
AI中文摘要

深度网络通常依赖架构启发式方法来塑造表示演化,限制了其对由内在动力学支配的数据的建模能力。我们提出了电路启发的高阶神经网络(CHONN),这是一个模块化框架,将表示演化视为一个潜在势过程,并通过基尔霍夫启发的级联组合增加其有效阶数。单个基尔霍夫神经单元实现稳定的一阶更新,而串行组合的单元在一个块内形成高阶动力学算子。这种构造是可解释的、数值稳定的,并且与常见的神经骨干网络兼容。理论分析表明,级联单元诱导出端到端的高阶算子,控制实验证明块内高阶构造不同于通用深度堆叠,特别是在导数敏感度量上。在稳态算子学习、长期物理预测和ImageNet-1K识别中,CHONN提高了结构保真度、滚动稳定性和视觉表示学习。这些结果将高阶电路组合确定为神经动力学建模的一般原则。

英文摘要

Deep networks often rely on architectural heuristics to shape representation evolution, limiting their ability to model data governed by intrinsic dynamics. We present the Circuit-inspired High-Order Neural Network (CHONN), a modular framework that treats representation evolution as a latent potential process and increases its effective order through Kirchhoff-inspired cascade composition. A single Kirchhoff Neural Cell implements a stable first-order update, while serially composed cells form higher-order dynamical operators within one block. This construction is interpretable, numerically stable and compatible with common neural backbones. Theoretical analysis shows that cascaded cells induce end-to-end high-order operators, and controlled experiments demonstrate that intra-block high-order construction differs from generic depth stacking, especially on derivative-sensitive measures. Across steady-state operator learning, long-horizon physical forecasting and ImageNet-1K recognition, CHONN improves structural fidelity, rollout stability and visual representation learning. These results identify high-order circuit composition as a general principle for neural dynamics modeling.

2510.02578 2026-06-01 q-bio.BM cs.LG 版本更新

FLOWR.root: A flow matching based foundation model for joint multi-purpose structure-aware 3D ligand generation and affinity prediction

FLOWR.root:基于流匹配的基础模型,用于联合多用途结构感知3D配体生成和亲和力预测

Julian Cremer, Tuan Le, Mohammad M. Ghahremanpour, Emilia Sługocka, Filipe Menezes, Djork-Arné Clevert

发表机构 * Machine Learning & Computational Sciences, Pfizer Worldwide R&D(辉瑞全球研发 机器学习与计算科学部) Computational Chemistry, Medicine Design, Pfizer Worldwide R&D(辉瑞全球研发 药物设计部 计算化学) Doctoral School of Medical and Health Sciences, Jagiellonian University Medical College(雅盖隆大学医学院 医学与健康科学博士学院) Department of Physicochemical Drug Analysis, Faculty of Pharmacy, Jagiellonian University Medical College(雅盖隆大学医学院 药学院 药物理化分析系) Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Munich(亥姆霍兹慕尼黑中心 分子靶点与治疗中心 结构生物学研究所) TUM School of Natural Sciences, Department of Bioscience, Bayerisches NMR Zentrum, Technical University of Munich(慕尼黑工业大学自然科学学院 生物科学系,巴伐利亚核磁共振中心)

AI总结 提出SE(3)-等变流匹配模型FLOWR.root,实现口袋感知的3D配体生成、效力与结合亲和力预测及置信度估计,支持从头生成、条件采样、片段优化替换及多终点亲和力预测,在无条件分子生成和口袋条件配体生成上达到最优性能,并通过参数高效微调在亲和力预测上超越现有方法。

详情
AI中文摘要

我们提出了FLOWR.root,一个SE(3)-等变流匹配模型,用于口袋感知的3D配体生成,同时进行效力和结合亲和力预测及置信度估计。该模型支持从头生成、相互作用和药效团条件采样、片段延伸与替换,以及多终点亲和力预测(pIC50、pKi、pKd、pEC50)。训练结合了大规模配体库与混合保真度的蛋白质-配体复合物,并在精选的共晶数据集上进行了细化,通过参数高效微调适应项目特定数据。基础FLOWR.root模型在无条件3D分子和口袋条件配体生成中达到了最先进的性能。在HiQBind上,预训练和微调后的模型展示了高度准确的亲和力预测,并在FEP+/OpenFE基准测试中超越了Boltz-2等最新方法,具有显著的速度优势。然而,我们表明解决未见过的结构-活性景观需要领域适应;参数高效的LoRA微调在多样化的专有数据集和PDE10A上带来了显著改进。联合生成和亲和力预测通过重要性采样实现了推理时缩放,将设计引导向更高亲和力的化合物。案例研究验证了这一点:针对CLK3的选择性CK2α配体生成显示了预测结合能与量子力学结合能之间的显著相关性。在ERα、TYK2和BACE1上的骨架延伸证实了预测亲和力与QM计算之间的强一致性,同时确认了几何保真度。通过整合结构感知生成、亲和力估计、属性引导采样和高效领域适应,FLOWR.root为从苗头化合物发现到先导优化的基于结构的药物设计提供了全面基础。

英文摘要

We present FLOWR.root, an SE(3)-equivariant flow-matching model for pocket-aware 3D ligand generation with joint potency and binding affinity prediction and confidence estimation. The model supports de novo generation, interaction- and pharmacophore-conditional sampling, fragment elaboration and replacement, and multi-endpoint affinity prediction (pIC50, pKi, pKd, pEC50). Training combines large-scale ligand libraries with mixed-fidelity protein-ligand complexes, refined on curated co-crystal datasets and adapted to project-specific data through parameter-efficient finetuning. The base FLOWR.root model achieves state-of-the-art performance in unconditional 3D molecule and pocket-conditional ligand generation. On HiQBind, the pre-trained and finetuned model demonstrates highly accurate affinity predictions, and outperforms recent state-of-the-art methods such as Boltz-2 on the FEP+/OpenFE benchmark with substantial speed advantages. However, we show that addressing unseen structure-activity landscapes requires domain adaptation; parameter-efficient LoRA finetuning yields marked improvements on diverse proprietary datasets and PDE10A. Joint generation and affinity prediction enable inference-time scaling through importance sampling, steering design toward higher-affinity compounds. Case studies validate this: selective CK2α ligand generation against CLK3 shows significant correlation between predicted and quantum-mechanical binding energies. Scaffold elaboration on ERα, TYK2, and BACE1 demonstrates strong agreement between predicted affinities and QM calculations while confirming geometric fidelity. By integrating structure-aware generation, affinity estimation, property-guided sampling, and efficient domain adaptation, FLOWR.root provides a comprehensive foundation for structure-based drug design from hit identification through lead optimization.

2603.22867 2026-06-01 cs.AR cs.AI cs.LG 版本更新

TRINE: A Token-Aware, Runtime-Adaptive FPGA Inference Engine for Multimodal AI

TRINE: 一种面向多模态AI的令牌感知、运行时自适应FPGA推理引擎

Hyunwoo Oh, Hanning Chen, Sanggeon Yun, Yang Ni, Suyeon Jang, Behnam Khaleghi, Fei Wen, Mohsen Imani

发表机构 * University of California, Irvine(加州大学尔湾分校) Purdue University Northwest(普渡大学西北分校) Qualcomm(高通) Samsung(三星)

AI总结 针对多模态AI中不同计算/内存模式导致嵌入式平台实时性不足的问题,提出TRINE,一种无需重配置的单比特流FPGA加速器与编译器,通过统一层映射、运行时模式切换、令牌剪枝和依赖感知层卸载,实现端到端多模态推理,在Alveo U50和ZCU104上相比RTX 4090和Jetson Orin Nano分别降低延迟22.57倍和6.86倍,功耗仅20-21W。

Comments Accepted to DAC 2026

详情
AI中文摘要

混合ViT、CNN、GNN和Transformer NLP的多模态堆栈给嵌入式平台带来压力,因为它们的计算/内存模式不同,且硬实时目标几乎没有松弛空间。TRINE是一个单比特流FPGA加速器和编译器,无需重配置即可执行端到端多模态推理。层被统一为DDMM/SDDMM/SpMM,并映射到一个模式可切换的引擎上,该引擎在运行时在权重/输出驻留脉动阵列、1xCS SIMD和可路由加法树(RADT)之间切换,共享PE阵列。一个宽度匹配的两阶段top-k单元支持流内令牌剪枝,而依赖感知层卸载(DALO)在可重构处理单元上重叠独立内核以维持利用率。在Alveo U50和ZCU104上评估,TRINE相比RTX 4090和Jetson Orin Nano分别降低延迟高达22.57倍和6.86倍,功耗20-21W;仅令牌剪枝在ViT密集型流水线上可实现高达7.8倍加速,DALO贡献高达79%的吞吐量提升。采用int8量化,代表性任务的精度下降<2.5%,为统一的视觉、语言和图工作负载提供了最先进的延迟和能效——仅需一个比特流。

英文摘要

Multimodal stacks that mix ViTs, CNNs, GNNs, and transformer NLP strain embedded platforms because their compute/memory patterns diverge and hard real-time targets leave little slack. TRINE is a single-bitstream FPGA accelerator and compiler that executes end-to-end multimodal inference without reconfiguration. Layers are unified as DDMM/SDDMM/SpMM and mapped to a mode-switchable engine that toggles at runtime among weight/output-stationary systolic, 1xCS SIMD, and a routable adder tree (RADT) on a shared PE array. A width-matched, two-stage top-k unit enables in-stream token pruning, while dependency-aware layer offloading (DALO) overlaps independent kernels across reconfigurable processing units to sustain utilization. Evaluated on Alveo U50 and ZCU104, TRINE reduces latency by up to 22.57x vs. RTX 4090 and 6.86x vs. Jetson Orin Nano at 20-21 W; token pruning alone yields up to 7.8x on ViT-heavy pipelines, and DALO contributes up to 79% throughput improvement. With int8 quantization, accuracy drops remain <2.5% across representative tasks, delivering state-of-the-art latency and energy efficiency for unified vision, language, and graph workloads, all in one bitstream.
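
Top-k token pruning itself is simple to state in software. The sketch below is a plain reference version, not TRINE's hardware unit: it assumes per-token importance scores are already available, and the function name `prune_tokens` is hypothetical.

```python
def prune_tokens(tokens, scores, k):
    """Keep the k highest-scoring tokens, preserving their original order."""
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:k])     # restore sequence order for later layers
    return [tokens[i] for i in keep]


patches = ["a", "b", "c", "d"]
importance = [0.1, 0.9, 0.3, 0.7]
kept = prune_tokens(patches, importance, 2)
```

Every layer after the pruning point then processes `k` tokens instead of the full sequence, which is where the latency saving on ViT-heavy pipelines would come from.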

2512.05976 2026-06-01 physics.comp-ph cs.LG 版本更新

Physics Enhanced Deep Surrogates for the Phonon Boltzmann Transport Equation

声子玻尔兹曼输运方程的物理增强深度代理模型

Antonio Varagnolo, Giuseppe Romano, Raphaël Pestourie

发表机构 * School of Computational Science and Engineering, Georgia Institute of Technology(计算科学与工程学院,佐治亚理工学院) Institute for Soldier Nanotechnologies, Massachusetts Institute of Technology(士兵纳米技术研究所,麻省理工学院)

AI总结 提出物理增强深度代理模型(PEDS),结合可微傅里叶求解器与神经网络生成器,通过不确定性驱动主动学习,在弹道和扩散区域实现高精度、高数据效率的声子输运模拟,仅需300次高保真BTE模拟即可达到约5%的误差。

详情
AI中文摘要

设计具有可控纳米尺度热流的材料对于微电子、热电和能量转换技术的进步至关重要。在这些尺度上,声子输运遵循玻尔兹曼输运方程(BTE),该方程捕捉了非扩散(弹道)效应,但在逆设计循环中反复求解成本过高。现有的代理方法在速度和准确性之间权衡:快速宏观求解器可能高估热导率数百个百分点,而最近的数据驱动算子学习器通常需要数千次高保真模拟。因此,需要一种快速、数据高效的代理模型,在弹道和扩散区域均保持可靠。我们提出了一种物理增强深度代理模型(PEDS),它将可微傅里叶求解器与神经网络生成器相结合,并与不确定性驱动的主动学习耦合。傅里叶求解器作为物理归纳偏置,而网络学习几何依赖的修正和混合系数,该系数在宏观和纳米尺度行为之间插值。与纯数据驱动基线相比,PEDS将训练数据需求降低了高达70%,仅需300次高保真BTE模拟即可实现约5%的分数误差,并能够高效设计覆盖12-85 W m$^{-1}$ K$^{-1}$的多孔几何结构,平均设计误差为4%。学习到的混合参数恢复了弹道-扩散转变,并提高了分布外鲁棒性。这些结果表明,嵌入简单、可微的低保真物理可以显著提高代理模型的数据效率和可解释性,使重复的PDE约束优化在纳米尺度热材料设计中变得实用。

英文摘要

Designing materials with controlled heat flow at the nano-scale is central to advances in microelectronics, thermoelectrics, and energy-conversion technologies. At these scales, phonon transport follows the Boltzmann Transport Equation (BTE), which captures non-diffusive (ballistic) effects but is too costly to solve repeatedly in inverse-design loops. Existing surrogate approaches trade speed for accuracy: fast macroscopic solvers can overestimate conductivities by hundreds of percent, while recent data-driven operator learners often require thousands of high-fidelity simulations. This creates a need for a fast, data-efficient surrogate that remains reliable across ballistic and diffusive regimes. We introduce a Physics-Enhanced Deep Surrogate (PEDS) that combines a differentiable Fourier solver with a neural generator and couples it with uncertainty-driven active learning. The Fourier solver acts as a physical inductive bias, while the network learns geometry-dependent corrections and a mixing coefficient that interpolates between macroscopic and nano-scale behavior. PEDS reduces training-data requirements by up to 70% compared with purely data-driven baselines, achieves roughly 5% fractional error with only 300 high-fidelity BTE simulations, and enables efficient design of porous geometries spanning 12-85 W m$^{-1}$ K$^{-1}$ with average design errors of 4%. The learned mixing parameter recovers the ballistic-diffusive transition and improves out-of-distribution robustness. These results show that embedding simple, differentiable low-fidelity physics can dramatically increase surrogate data-efficiency and interpretability, making repeated PDE-constrained optimization practical for nano-scale thermal-materials design.

2603.19862 2026-06-01 cs.CV cs.LG 版本更新

IsoCLIP: Decomposing CLIP Projectors for Efficient Intra-modal Alignment

IsoCLIP: 分解CLIP投影器以实现高效的模态内对齐

Simone Magistri, Dipam Goswami, Marco Mistretta, Bartłomiej Twardowski, Joost van de Weijer, Andrew D. Bagdanov

发表机构 * Media Integration and Communication Center (MICC), University of Florence, Italy(意大利佛罗伦萨大学媒体集成与通信中心) Department of Computer Science, Universitat Autònoma de Barcelona, Spain(西班牙巴塞罗那自治大学计算机科学系) Computer Vision Center, Barcelona, Spain(西班牙巴塞罗那计算机视觉中心) IDEAS Research Institute, Warsaw, Poland(波兰华沙IDEAS研究所)

AI总结 本文通过分析CLIP投影器的谱特性,发现模态间对齐子空间和各向异性方向,提出无训练方法IsoCLIP去除各向异性方向以改善模态内对齐,在模态内检索和分类任务上降低延迟并超越现有方法。

Comments Accepted at CVPR2026

详情
AI中文摘要

视觉-语言模型如CLIP被广泛用于涉及视觉和文本模态的跨模态任务。然而,当个体模态编码器应用于固有的模态内任务(如图像到图像检索)时,其性能因模态内错位而受损。本文研究CLIP中的模态内错位,重点关注将投影前图像和文本嵌入映射到共享嵌入空间的投影器的作用。通过分析应用于投影特征的余弦相似度形式及其与对比CLIP损失的交互,我们发现在训练期间存在一个负责对齐两种模态的跨模态算子,以及第二个仅强制执行模态内归一化但不促进模态内对齐的模态内算子。通过对跨模态算子的谱分析,我们识别出一个近似各向同性的子空间,其中两种模态良好对齐,以及每个模态特有的各向异性方向。我们证明该对齐子空间可以直接从投影器权重中获得,并且去除各向异性方向可改善模态内对齐。我们在模态内检索和分类基准上的实验表明,我们的无训练方法减少了模态内错位,大大降低了延迟,并在多个预训练的类CLIP模型上优于现有方法。代码公开于:https://github.com/simomagi/IsoCLIP。

英文摘要

Vision-Language Models like CLIP are extensively used for inter-modal tasks that involve both visual and text modalities. However, when the individual modality encoders are applied to inherently intra-modal tasks like image-to-image retrieval, their performance suffers from intra-modal misalignment. In this paper we study intra-modal misalignment in CLIP with a focus on the role of the projectors that map pre-projection image and text embeddings into the shared embedding space. By analyzing the form of the cosine similarity applied to projected features, and its interaction with the contrastive CLIP loss, we show that there is an inter-modal operator responsible for aligning the two modalities during training, and a second, intra-modal operator that only enforces intra-modal normalization but does nothing to promote intra-modal alignment. Via spectral analysis of the inter-modal operator, we identify an approximately isotropic subspace in which the two modalities are well-aligned, as well as anisotropic directions specific to each modality. We demonstrate that this aligned subspace can be directly obtained from the projector weights and that removing the anisotropic directions improves intra-modal alignment. Our experiments on intra-modal retrieval and classification benchmarks show that our training-free method reduces intra-modal misalignment, greatly lowers latency, and outperforms existing approaches across multiple pre-trained CLIP-like models. The code is publicly available at: https://github.com/simomagi/IsoCLIP.

2601.05770 2026-06-01 cs.LG cs.CL 版本更新

Weights to Code: Extracting Interpretable Algorithms from the Discrete Transformer

权重到代码:从离散Transformer中提取可解释算法

Yifan Zhang, Wei Bi, Kechi Zhang, Dongming Jin, Jie Fu, Zhi Jin

发表机构 * Key Laboratory of High Confidence Software Technology (PKU), MOE, Beijing, China(高可信软件技术重点实验室(PKU),教育部,北京,中国) School of Computer Science, Peking University, Beijing, China(计算机学院,北京大学,北京,中国) Shanghai AI Lab, Shanghai, China(上海人工智能实验室,上海,中国) Shanghai Innovation Institute, Shanghai, China(上海创新研究院,上海,中国)

AI总结 提出离散Transformer架构,通过温度退火采样注入离散性,结合假设检验和符号回归从模型权重中提取可解释算法,在离散任务上性能与RNN基线相当,并扩展到连续中间计算任务。

详情
AI中文摘要

算法提取旨在直接从在算法任务上训练的模型中合成可执行程序,从而无需依赖人工编写的目标程序即可从权重中重新发现可执行机制。然而,将此范式应用于Transformer时会因表示纠缠(例如叠加)而变得复杂:以重叠方向编码的特征严重阻碍了符号表达式的恢复。我们提出了离散Transformer,这是一种专门设计用于弥合连续表示与离散符号逻辑之间差距的架构。通过温度退火采样注入离散性,我们的框架有效利用假设检验和符号回归来提取人类可读的程序。实验表明,离散Transformer在共享离散任务上实现了与基于RNN的MIPS基线相当的性能,同时将提取扩展到具有连续值中间计算的任务。最后,我们展示了架构归纳偏置对合成程序提供了细粒度控制,使离散Transformer成为算法提取和Transformer可解释性的可控测试平台。

英文摘要

Algorithm extraction aims to synthesize executable programs directly from models trained on algorithmic tasks, enabling de novo recovery of executable mechanisms from weights without relying on human-written target programs. However, applying this paradigm to Transformers is complicated by representation entanglement (e.g., superposition), where features encoded in overlapping directions substantially hinder the recovery of symbolic expressions. We propose the Discrete Transformer, an architecture explicitly designed to bridge the gap between continuous representations and discrete symbolic logic. By injecting discreteness through temperature-annealed sampling, our framework effectively leverages hypothesis testing and symbolic regression to extract human-readable programs. Empirically, the Discrete Transformer achieves performance comparable to the RNN-based MIPS baseline on shared discrete tasks, while broadening extraction to tasks with continuous-valued intermediate computations. Finally, we show that architectural inductive biases provide fine-grained control over synthesized programs, establishing the Discrete Transformer as a controllable testbed for algorithm extraction and Transformer interpretability.

2603.17145 2026-06-01 cs.LG cs.AI 版本更新

REAL: Regression-Aware Reinforcement Learning for LLM-as-a-Judge

REAL: 面向LLM评判的回归感知强化学习

Yasi Zhang, Tianyu Chen, Mingyuan Zhou, Oscar Leong, Ying Nian Wu, Michal Lukasik

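Affiliations * University of California, Los Angeles; The University of Texas at Austin; Google Research; Now at Google DeepMind

AI summary Proposes the REAL framework, which uses a generalized policy gradient to bring regression objectives into reinforcement learning and optimize the numeric scoring of LLM judges, outperforming SFT and standard RL methods across model scales.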

Comments Accepted to ICML 2026. The first two authors contributed equally

详情
AI中文摘要

大型语言模型(LLM)越来越多地被部署为自动评估器,为模型输出分配数值分数,这种范式称为LLM-as-a-Judge。然而,标准的强化学习(RL)方法通常依赖二元奖励(例如0-1准确率),从而忽略了回归任务中固有的序结构;例如,当真实值为5时,它们未能识别出预测4显著优于预测1。相反,现有的回归感知方法通常局限于监督微调(SFT),限制了其探索最优推理路径的能力。为弥合这一差距,我们提出REAL(REgression-Aware Reinforcement Learning),这是一个原则性的RL框架,旨在优化回归奖励,并且也被证明对相关性指标是最优的。一个关键的技术挑战是回归目标显式地依赖于策略,从而使标准策略梯度方法失效。为解决此问题,我们采用广义策略梯度估计器,该估计器自然地将优化分解为两个互补组件:(1)对思维链(CoT)轨迹的探索,以及(2)最终分数的回归感知预测细化。跨模型规模(8B到32B)的大量实验表明,REAL在域外基准上始终优于回归感知SFT基线和标准RL方法,展现出显著更好的泛化能力。具体在Qwen3-32B上,我们相比SFT基线获得了+8.40 Pearson和+7.20 Spearman相关性的提升,相比基础模型提升了+18.30/+11.20。这些发现凸显了将回归目标整合到RL探索中对准确LLM评估的关键价值。

英文摘要

Large language models (LLMs) are increasingly deployed as automated evaluators that assign numeric scores to model outputs, a paradigm known as LLM-as-a-Judge. However, standard Reinforcement Learning (RL) methods typically rely on binary rewards (e.g., 0-1 accuracy), thereby ignoring the ordinal structure inherent in regression tasks; for instance, they fail to recognize that predicting 4 is significantly better than predicting 1 when the ground truth is 5. Conversely, existing regression-aware approaches are often confined to Supervised Fine-Tuning (SFT), limiting their ability to explore optimal reasoning paths. To bridge this gap, we propose REAL (REgression-Aware Reinforcement Learning), a principled RL framework designed to optimize regression rewards, and also proven to be optimal for correlation metrics. A key technical challenge is that the regression objective is explicitly policy-dependent, thus invalidating standard policy gradient methods. To address this, we employ the generalized policy gradient estimator, which naturally decomposes optimization into two complementary components: (1) exploration over Chain-of-Thought (CoT) trajectories, and (2) regression-aware prediction refinement of the final score. Extensive experiments across model scales (8B to 32B) demonstrate that REAL consistently outperforms both regression-aware SFT baselines and standard RL methods, exhibiting significantly better generalization on out-of-domain benchmarks. On Qwen3-32B specifically, we achieve gains of +8.40 Pearson and +7.20 Spearman correlation over the SFT baseline, and +18.30/+11.20 over the base model. These findings highlight the critical value of integrating regression objectives into RL exploration for accurate LLM evaluation.
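The ordinal-structure point in the abstract (predicting 4 versus 1 when the truth is 5) can be illustrated with a minimal sketch. The negative squared error used here is only an assumed stand-in for a "regression reward"; the function names are hypothetical and this is not the REAL objective or its policy-gradient estimator.

```python
def binary_reward(pred: int, truth: int) -> float:
    """0-1 accuracy reward: only an exact match earns credit."""
    return 1.0 if pred == truth else 0.0


def regression_reward(pred: float, truth: float) -> float:
    """Assumed regression-style reward: negative squared error."""
    return -(pred - truth) ** 2


truth = 5
for pred in (4, 1):
    # The binary reward is 0.0 for both predictions, so it cannot rank them;
    # the regression reward ranks 4 above 1.
    print(pred, binary_reward(pred, truth), regression_reward(pred, truth))
```

Under the binary reward both wrong predictions look identical, which is the loss of ordinal information the abstract describes.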

2603.16123 2026-06-01 cs.LG cs.AI math.AT math.CT 版本更新

Functorial Neural Architectures from Higher Inductive Types

基于高阶归纳类型的函子神经架构

Karen Sargsyan

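Affiliations * Institute of Chemistry, Academia Sinica, Taipei, Taiwan

AI summary Compiles Higher Inductive Type specifications into neural architectures whose decoders are constrained to be strict monoidal functors, improving on non-functorial methods by 2-10x on compositional generalization tasks.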

Comments 26 pages, 10 tables. Code and Cubical Agda formalization: https://github.com/karsar/hott_neuro

详情
AI中文摘要

神经网络通常能学习任务的各个部分,但在这些部分的新组合上失败。我们认为这种失败是架构性的:只有当解码器尊重任务的代数法则,即从自由生成的序列下降到由这些法则确定的商时,它才能组合泛化。我们通过将高阶归纳类型(HIT)规范编译为神经架构,使这一原则具有建设性。基点、路径构造子和2-胞腔分别映射为基约束、生成器网络、结构拼接和学习到的同伦。由此产生的传输解码器在构造上是严格幺半函子:解码一个拼接的词是独立生成的环段的拼接。相反,我们证明softmax自注意力无法同时满足严格幺半组合和下降到任何非平凡组合商。在环面、圆楔和克莱因瓶上的实验验证了预期的层次结构:函子解码器比非函子替代方案性能提升2-10倍,而学习到的2-胞腔恰好在使用克莱因瓶关系的词上缩小了46%的误差差距。这些结果表明,组合泛化应作为架构中的函子结构强制执行,而非仅从示例中学习。

英文摘要

Neural networks often learn the parts of a task but fail on novel combinations of those parts. We argue that this failure is architectural: a decoder generalizes compositionally only when it respects the algebraic laws of the task, i.e. when it descends from freely generated sequences to the quotient determined by those laws. We make this principle constructive by compiling Higher Inductive Type (HIT) specifications into neural architectures. Basepoints, path constructors, and 2-cells are mapped to base constraints, generator networks, structural concatenation, and learned homotopies. The resulting transport decoders are strict monoidal functors by construction: decoding a concatenated word is the concatenation of independently generated loop segments. In contrast, we prove that softmax self-attention cannot simultaneously satisfy strict monoidal composition and descent to any non-trivial compositional quotient. Experiments on the torus, wedge of circles, and Klein bottle validate the predicted hierarchy: functorial decoders outperform non-functorial alternatives by 2-10x, and a learned 2-cell closes a 46% error gap precisely on words exercising the Klein-bottle relation. These results suggest that compositional generalization should be enforced as functorial structure in the architecture, rather than learned from examples alone.

2509.25269 2026-06-01 eess.IV cs.CV cs.LG cs.NA math.NA physics.optics 版本更新

Position-Blind Ptychography: Viability of image reconstruction via data-driven variational inference

位置盲叠层成像:通过数据驱动变分推断进行图像重建的可行性

Simon Welker, Lorenz Kuger, Tim Roith, Berthy Feng, Martin Burger, Timo Gerkmann, Henry Chapman

Affiliations * Department of Informatics, University of Hamburg; Center for Free-Electron Laser Science CFEL, Deutsches Elektronen-Synchrotron DESY; Department of Mathematics, Bundesstr. 55, University of Hamburg; CIT School, Technical University of Munich; Munich Center for Machine Learning, München; Massachusetts Institute of Technology (MIT); The NSF AI Institute for Artificial Intelligence and Fundamental Interactions; Helmholtz Imaging, Deutsches Elektronen-Synchrotron DESY

AI summary For the new blind inverse problem of position-blind ptychography, uses score-based diffusion models as data-driven priors and jointly recovers scan positions and the image through variational inference, demonstrating the viability of image reconstruction in a simulated, simplified 2-D variant.

详情
AI中文摘要

在这项工作中,我们提出并研究了位置盲叠层成像这一新颖的盲逆问题,即在没有任何扫描位置信息的情况下进行叠层相位恢复,必须与图像联合恢复扫描位置。该问题的动机来自单粒子衍射X射线成像,其中随机取向的粒子被照射并收集一组衍射图案。如果使用高度聚焦的X射线束,测量结果也会对每个粒子的光束位置敏感,从而成为叠层成像,但这些位置也是未知的。我们通过使用基于分数的扩散模型作为现代数据驱动图像先验,采用变分推断,在模拟的简化二维变体中研究了这个困难问题的图像重建可行性。我们发现,在适当的照明结构和强先验条件下,即使在测量噪声下,除了最困难的成像场景外,所有情况下都能实现可靠且成功的图像重建。

英文摘要

In this work, we present and investigate the novel blind inverse problem of position-blind ptychography, i.e., ptychographic phase retrieval without any knowledge of scan positions, which then must be recovered jointly with the image. The motivation for this problem comes from single-particle diffractive X-ray imaging, where particles in random orientations are illuminated and a set of diffraction patterns is collected. If one uses a highly focused X-ray beam, the measurements would also become sensitive to the beam positions relative to each particle and therefore ptychographic, but these positions are also unknown. We investigate the viability of image reconstruction in a simulated, simplified 2-D variant of this difficult problem, using variational inference with modern data-driven image priors in the form of score-based diffusion models. We find that, with the right illumination structure and a strong prior, one can achieve reliable and successful image reconstructions even under measurement noise, in all except the most difficult evaluated imaging scenario.

2603.12916 2026-06-01 cs.LG cs.AI 版本更新

Surprised by Attention: Predictable Query Dynamics for Time Series Anomaly Detection

Surprised by Attention: 面向时间序列异常检测的可预测查询动态

Kadir-Kaan Özer, René Ebeling, Markus Enzweiler

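Affiliations * Mercedes-Benz AG; Institute for Intelligent Systems, Esslingen University of Applied Sciences

AI summary Proposes AxonAD, an unsupervised detector that predicts the evolution of multi-head attention query vectors and combines reconstruction error with a query mismatch score to detect structural dependency shifts in multivariate time series.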

Comments This manuscript has been accepted for publication at ECML-PKDD 2026. The final version will be published in the conference proceedings. Main: 17 Pages, 7 Figures, 3 Tables; Appendix: 3 Pages, 4 Tables

详情
AI中文摘要

多变量时间序列异常通常表现为跨通道依赖的偏移,而非简单的幅度异常。例如,在自动驾驶中,转向指令可能内部一致,但与产生的横向加速度解耦。当灵活的序列模型尽管协调性改变仍能合理重构信号时,基于残差的检测器可能遗漏此类异常。我们提出 AxonAD,一种无监督检测器,将多头注意力查询演化视为短视界可预测过程。梯度更新重构路径与仅基于历史上下文的预测器耦合,该预测器通过掩码预测器-目标目标针对指数移动平均(EMA)目标编码器进行训练。推理时,重构误差与尾部聚合的查询不匹配分数结合,该分数衡量最近时间步上预测查询与目标查询之间的余弦偏差。这种双重方法在保留幅度级检测的同时,对结构依赖偏移敏感。在带有区间标注的专有车载遥测数据以及 TSB-AD 多变量套件(17 个数据集,180 个序列)上,使用无阈值和范围感知指标,AxonAD 在排名质量和时间定位上优于强基线。消融实验证实查询预测和组合评分是观察到的改进的主要驱动因素。代码可在 https://github.com/iis-esslingen/AxonAD 获取。

英文摘要

Multivariate time series anomalies often manifest as shifts in cross-channel dependencies rather than simple amplitude excursions. In autonomous driving, for instance, a steering command might be internally consistent but decouple from the resulting lateral acceleration. Residual-based detectors can miss such anomalies when flexible sequence models still reconstruct signals plausibly despite altered coordination. We introduce AxonAD, an unsupervised detector that treats multi-head attention query evolution as a short-horizon predictable process. A gradient-updated reconstruction pathway is coupled with a history-only predictor that forecasts future query vectors from past context. This is trained via a masked predictor-target objective against an exponential moving average (EMA) target encoder. At inference, reconstruction error is combined with a tail-aggregated query mismatch score, which measures cosine deviation between predicted and target queries on recent timesteps. This dual approach provides sensitivity to structural dependency shifts while retaining amplitude-level detection. On proprietary in-vehicle telemetry with interval annotations and on the TSB-AD multivariate suite (17 datasets, 180 series) with threshold-free and range-aware metrics, AxonAD improves ranking quality and temporal localization over strong baselines. Ablations confirm that query prediction and combined scoring are the primary drivers of the observed gains. Code is available at https://github.com/iis-esslingen/AxonAD.
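The inference-time score described above can be sketched roughly as follows. The abstract does not state the aggregation rule or how the two terms are combined, so the mean over the last `tail` timesteps, the additive combination, and the `weight` parameter are all assumptions made for illustration; the function names are hypothetical and not taken from the AxonAD repository.

```python
import math


def cosine_deviation(u, v, eps=1e-12):
    """1 - cosine similarity between two query vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / max(norm_u * norm_v, eps)


def query_mismatch_score(pred_queries, target_queries, tail=3):
    """Assumed tail aggregation: mean cosine deviation over the last `tail` steps."""
    devs = [cosine_deviation(p, t) for p, t in zip(pred_queries, target_queries)]
    recent = devs[-tail:]
    return sum(recent) / len(recent)


def combined_score(recon_error, mismatch, weight=1.0):
    """Assumed combination: reconstruction error plus weighted query mismatch."""
    return recon_error + weight * mismatch


# Predicted queries stay on one direction; the target rotates at the last step.
pred = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
target = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(query_mismatch_score(pred, target, tail=1))
```

A window whose amplitudes are reconstructed well can still receive a high combined score when its recent queries deviate from the forecast, which is the behaviour the abstract attributes to the dual approach.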

2603.09453 2026-06-01 cs.LG cs.AI stat.ML 版本更新

Variational Routing: A Scalable Bayesian Framework for Calibrated Mixture-of-Experts Transformers

变分路由:用于校准混合专家Transformer的可扩展贝叶斯框架

Albus Yizhuo Li, Matthew Wicker

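Affiliations * Department of Computing, Imperial College London

AI summary Proposes Variational Mixture-of-Experts Routing (VMoER), which confines Bayesian inference to the expert-selection stage to obtain calibrated uncertainty at scale; on fine-tuned foundation models it improves routing stability, reduces calibration error, and raises out-of-distribution AUROC with minimal extra compute.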

Comments 8 pages, 7 figures for main text; 16 pages for Appendix; Accepted by ICML 2026;

详情
AI中文摘要

基础模型越来越多地部署在需要理解其输出不确定性的场景中,这对于确保负责任部署至关重要。虽然贝叶斯方法为不确定性量化提供了原则性方法,但其计算开销使得在基础模型规模下进行训练或推理不切实际。最先进的模型通过精心设计的稀疏性(包括混合专家(MoE)层)实现了数万亿的参数数量。在这项工作中,我们通过引入变分混合专家路由(VMoER)展示了大规模下的校准不确定性,这是一种用于建模MoE层不确定性的结构化贝叶斯方法。VMoER将贝叶斯推断限制在通常由确定性路由网络完成的专家选择阶段。我们使用两种推断策略实例化VMoER:对路由logits的摊销变分推断和推断用于随机专家选择的温度参数。在微调测试的基础模型上,VMoER在噪声下将路由稳定性提高了38%,校准误差降低了94%,分布外AUROC提高了12%,同时额外FLOPs增加不到1%。这些结果表明,VMoER为构建鲁棒且具有不确定性意识的基础模型提供了一条可扩展的路径。

英文摘要

Foundation models are increasingly being deployed in contexts where understanding the uncertainty of their outputs is critical to ensuring responsible deployment. While Bayesian methods offer a principled approach to uncertainty quantification, their computational overhead renders their use impractical for training or inference at foundation model scale. State-of-the-art models achieve parameter counts in the trillions through carefully engineered sparsity, including Mixture-of-Experts (MoE) layers. In this work, we demonstrate calibrated uncertainty at scale by introducing Variational Mixture-of-Experts Routing (VMoER), a structured Bayesian approach for modelling uncertainty in MoE layers. VMoER confines Bayesian inference to the expert-selection stage, which is typically done by a deterministic routing network. We instantiate VMoER using two inference strategies: amortised variational inference over routing logits and inferring a temperature parameter for stochastic expert selection. In fine-tuning experiments on the tested foundation models, VMoER improves routing stability under noise by 38%, reduces calibration error by 94%, and increases out-of-distribution AUROC by 12%, while incurring less than 1% additional FLOPs. These results suggest VMoER offers a scalable path toward robust and uncertainty-aware foundation models.

2603.13875 2026-06-01 cs.CL cs.LG 版本更新

GradMem: Learning to Write Context into Memory with Test-Time Gradient Descent

GradMem: 通过测试时梯度下降将上下文写入记忆

Yuri Kuratov, Matvey Kairov, Aydar Bulatov, Ivan Rodkin, Mikhail Burtsev

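Affiliations * AXXX, Cognitive AI Systems Lab, Moscow, Russia; London Institute for Mathematical Sciences, London, UK

AI summary Proposes GradMem, which uses test-time gradient descent to write context into a compact memory state by optimizing memory tokens with a self-supervised reconstruction loss, outperforming forward-only memory writers on key-value retrieval and reaching competitive results on natural language tasks.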

Comments International Conference on Machine Learning (ICML) 2026

详情
AI中文摘要

许多大型语言模型应用需要基于长上下文进行条件生成。Transformer通常通过存储每层过去激活的KV缓存来支持这一点,这会产生大量内存开销。一种理想的替代方案是压缩记忆:一次性读取上下文,将其存储在紧凑状态中,并从该状态回答许多查询。我们在上下文移除设置中研究这一点,其中模型在推理时无法访问原始上下文的情况下必须生成答案。我们引入了GradMem,它通过每个样本的测试时优化将上下文写入记忆。给定一个上下文,GradMem在保持模型权重冻结的情况下,对一小部分前缀记忆令牌执行几步梯度下降。GradMem显式优化模型级的自监督上下文重构损失,从而产生带有迭代纠错的损失驱动写入操作,这与仅前向方法不同。在关联键值检索中,GradMem在相同记忆大小下优于仅前向记忆写入器,并且额外的梯度步长比重复的前向写入更有效地扩展容量。我们进一步表明,GradMem可以迁移到合成基准之外:使用预训练语言模型,它在自然语言任务(包括bAbI和SQuAD变体)上取得了有竞争力的结果,仅依赖于记忆中的编码信息。

英文摘要

Many large language model applications require conditioning on long contexts. Transformers typically support this by storing a large per-layer KV-cache of past activations, which incurs substantial memory overhead. A desirable alternative is compressive memory: read a context once, store it in a compact state, and answer many queries from that state. We study this in a context removal setting, where the model must generate an answer without access to the original context at inference time. We introduce GradMem, which writes context into memory via per-sample test-time optimization. Given a context, GradMem performs a few steps of gradient descent on a small set of prefix memory tokens while keeping model weights frozen. GradMem explicitly optimizes a model-level self-supervised context reconstruction loss, resulting in a loss-driven write operation with iterative error correction, unlike forward-only methods. On associative key-value retrieval, GradMem outperforms forward-only memory writers with the same memory size, and additional gradient steps scale capacity much more effectively than repeated forward writes. We further show that GradMem transfers beyond synthetic benchmarks: with pretrained language models, it attains competitive results on natural language tasks including bAbI and SQuAD variants, relying only on information encoded in memory.

2603.13727 2026-06-01 cs.LG physics.data-an 版本更新

Data-driven Progressive Discovery of Physical Laws

数据驱动的物理定律渐进发现

Mingkun Xia, Weiwei Zhang

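AI summary Proposes the Chain of Symbolic Regression (CoSR) framework, which progressively discovers physical laws from data by combining knowledge units with clear physical meanings step by step, and applies it to several physics problems.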

Comments This paper needs to be retracted due to methodological flaws found in RBC case

详情
AI中文摘要

符号回归是知识发现的有力工具,能够直接从数据中提取可解释的数学表达式。然而,传统的符号发现通常采用端到端的“一步式”过程,在处理真实物理系统时往往生成冗长且物理意义不明的表达式,导致模型泛化能力差。这一局限性根本上源于其偏离了科学发现的基本路径:物理定律并非以单一形式存在,而是遵循从简单到复杂、层次化且渐进式的模式。受此原理启发,我们提出了链式符号回归(CoSR),一种将物理定律发现建模为符号知识链的新框架。该知识链通过沿特定逻辑逐步组合多个具有明确物理意义的知识单元而形成,最终能够从数据中精确发现潜在的物理定律。CoSR完整复现了从开普勒第三定律到万有引力定律的经典力学渐进发现路径,并应用于三类问题:湍流瑞利-贝纳德对流、圆管粘性流以及激光-金属相互作用,展示了其改进经典标度理论的能力。最后,CoSR在复杂工程问题——不同飞行器气动系数标度中展示了发现新知识的能力。

英文摘要

Symbolic regression is a powerful tool for knowledge discovery, enabling the extraction of interpretable mathematical expressions directly from data. However, conventional symbolic discovery typically follows an end-to-end, "one-step" process, which often generates lengthy and physically meaningless expressions when dealing with real physical systems, leading to poor model generalization. This limitation fundamentally stems from its deviation from the basic path of scientific discovery: physical laws do not exist in a single form but follow a hierarchical and progressive pattern from simplicity to complexity. Motivated by this principle, we propose Chain of Symbolic Regression (CoSR), a novel framework that models the discovery of physical laws as a chain of symbolic knowledge. This knowledge chain is formed by progressively combining multiple knowledge units with clear physical meanings along a specific logic, ultimately enabling the precise discovery of the underlying physical laws from data. CoSR fully recapitulates the progressive discovery path from Kepler's third law to the law of universal gravitation in classical mechanics, and is applied to three types of problems: turbulent Rayleigh-Bénard convection, viscous flows in a circular pipe, and laser-metal interaction, demonstrating its ability to improve classical scaling theories. Finally, CoSR showcases its capability to discover new knowledge in the complex engineering problem of aerodynamic coefficient scaling for different aircraft.

2603.09936 2026-06-01 cs.LG 版本更新

Generative Drifting is Secretly Score Matching: a Spectral and Variational Perspective

生成性漂移实际上是分数匹配:一个谱与变分视角

Erkan Turan, Nicolas Dufour, Maks Ovsjanikov

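Affiliations * LIX, Ecole Polytechnique, IP Paris

AI summary Shows that under a Gaussian kernel the drift operator equals a score difference on smoothed distributions, placing generative drifting within the score-matching family; uses spectral and variational analysis to answer three questions left open by the original work, and proposes an exponential bandwidth annealing schedule and a JKO-based justification for the stop-gradient operator.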

详情
AI中文摘要

基于漂移的生成建模(Deng et al., 2026)最近通过核驱动的漂移算子实现了最先进的一步图像生成,但其成功很大程度上是经验性的,其理论基础仍不明确。我们观察到,在高斯核下,漂移算子恰好是平滑分布上的分数差。这回答了原始工作中遗留的三个问题:(1) 消失的漂移是否保证分布相等 ($V_{p,q}=0\Rightarrow p=q$),(2) 如何在核之间选择,以及 (3) 为什么停止梯度算子对于稳定训练不可或缺。我们的观察将漂移定位在分数匹配家族中。通过线性化McKean-Vlasov动力学并在傅里叶空间中探测,我们揭示了与等离子体动力学理论中的朗道阻尼相当的频率依赖收敛时间尺度:高斯核遭受指数高频瓶颈,这可能解释了经验上对拉普拉斯核的偏好。这提出了一种修复方法:指数带宽退火调度 $\sigma(t)=\sigma_0 e^{-rt}$,将收敛时间从 $\exp(O(K_{\max}^2))$ 减少到 $O(\log K_{\max})$。最后,通过将漂移形式化为平滑KL散度的Wasserstein梯度流,我们证明了停止梯度算子不是启发式的,而是源于Jordan-Kinderlehrer-Otto (JKO) 方案所要求的冻结场离散化,移除它会切断训练与任何梯度流保证的联系。这种变分视角进一步为构建新颖的漂移算子提供了通用模板,我们通过Sinkhorn散度漂移进行了演示。我们在玩具数据集上验证了分析,并将其扩展到ImageNet。

英文摘要

Generative Modeling via Drifting (Deng et al., 2026) has recently achieved state-of-the-art one-step image generation through a kernel-based drift operator, yet its success is largely empirical and its theoretical foundations remain poorly understood. We observe that under a Gaussian kernel, the drift operator is exactly a score difference on smoothed distributions. This answers three questions left open in the original work: (1) whether a vanishing drift guarantees equality of distributions ($V_{p,q}=0\Rightarrow p=q$), (2) how to choose between kernels, and (3) why the stop-gradient operator is indispensable for stable training. Our observations position drifting within the score-matching family. By linearizing the McKean-Vlasov dynamics and probing them in Fourier space, we reveal frequency-dependent convergence timescales comparable to Landau damping in plasma kinetic theory: the Gaussian kernel suffers an exponential high-frequency bottleneck, potentially explaining the empirical preference for the Laplacian kernel. This suggests a fix: an exponential bandwidth annealing schedule $\sigma(t)=\sigma_0 e^{-rt}$ that reduces convergence time from $\exp(O(K_{\max}^2))$ to $O(\log K_{\max})$. Finally, by formalizing drifting as a Wasserstein gradient flow of the smoothed KL divergence, we prove that the stop-gradient operator is not a heuristic but is derived from the frozen-field discretization mandated by the Jordan-Kinderlehrer-Otto (JKO) scheme, and removing it severs training from any gradient-flow guarantee. This variational perspective further provides a general template for constructing novel drift operators, which we demonstrate with a Sinkhorn divergence drift. We validate our analysis on toy datasets and scale it up to ImageNet.

2603.09787 2026-06-01 cs.CV cs.LG 版本更新

What is Missing? Explaining Neurons Activated by Absent Concepts

缺失的是什么?解释被缺失概念激活的神经元

Robin Hesse, Simone Schaub-Meyer, Janina Hesse, Bernt Schiele, Stefan Roth

Affiliations * Max Planck Institute for Informatics, SIC; Department of Computer Science, Technical University of Darmstadt; Leibniz Institute for Resilience Research; Institute for Quantitative and Computational Biosciences, Johannes Gutenberg University Mainz; University Medical Center Mainz

AI summary Addresses encoded absences, an overlooked causal relationship in deep neural networks where the absence of a concept activates a neuron; proposes two extensions to attribution and feature visualization methods to reveal and explain them, and shows experimentally that ImageNet models exploit such absences and that accounting for them improves debiasing.

Comments ICML 2025 | Code: https://github.com/visinf/what-is-missing

详情
AI中文摘要

可解释人工智能(XAI)旨在通过估计模型的简化因果结构,提供对深度神经网络(DNN)行为的人类可解释洞察。在现有工作中,这种因果结构通常包括概念的存在与神经元强激活之间的关系。例如,归因方法主要识别对预测贡献最大的输入像素,而特征可视化方法揭示导致目标神经元高激活的输入——前者隐含假设相关信息存在于输入中,后者假设神经元编码概念的存在。然而,一种很大程度上被忽视的因果关系是编码缺失,即概念的缺失会增加神经元的激活。在这项工作中,我们展示了这种缺失但相关的概念是常见的,并且主流XAI方法在标准形式下难以揭示它们。为了解决这个问题,我们提出了两种简单的扩展,分别应用于归因和特征可视化技术,以揭示编码缺失。通过实验,我们展示了如何使用主流XAI方法揭示和解释编码缺失,ImageNet模型如何利用它们,以及考虑它们时如何改进去偏。

英文摘要

Explainable artificial intelligence (XAI) aims to provide human-interpretable insights into the behavior of deep neural networks (DNNs), typically by estimating a simplified causal structure of the model. In existing work, this causal structure often includes relationships where the presence of a concept is associated with a strong activation of a neuron. For example, attribution methods primarily identify input pixels that contribute most to a prediction, and feature visualization methods reveal inputs that cause high activation of a target neuron - the former implicitly assuming that the relevant information resides in the input, and the latter that neurons encode the presence of concepts. However, a largely overlooked type of causal relationship is that of encoded absences, where the absence of a concept increases neural activation. In this work, we show that such missing but relevant concepts are common and that mainstream XAI methods struggle to reveal them when applied in their standard form. To address this, we propose two simple extensions to attribution and feature visualization techniques that uncover encoded absences. Across experiments, we show how mainstream XAI methods can be used to reveal and explain encoded absences, how ImageNet models exploit them, and that debiasing can be improved when considering them.

2603.09221 2026-06-01 cs.LG 版本更新

Beyond Test-Time Memory: State-Space Optimal Control for LLM Reasoning

超越测试时记忆:用于LLM推理的状态空间最优控制

Peihao Wang, Shan Yang, Xijun Wang, Tesi Xiao, Xin Liu, Changlong Yu, Yu Lou, Pan Li, Zhangyang Wang, Ming Lin, René Vidal

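Affiliations * vita-group

AI summary Proposes the Test-Time Control (TTC) layer, which performs reasoning through finite-horizon LQR planning and is integrated into pretrained LLMs as an adapter, improving accuracy on mathematical reasoning tasks by up to 27.8%.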

Comments ICML 2026

详情
AI中文摘要

联想记忆长期以来支撑着序列模型的设计。除了回忆之外,人类通过预测未来状态和选择目标导向行动来进行推理,这是现代语言模型日益需要但并未原生编码的能力。虽然先前的工作使用强化学习或测试时训练,但规划仍然独立于模型架构。我们将推理形式化为最优控制,并引入测试时控制(TTC)层,该层在推理时对潜在状态执行有限时域LQR规划,在神经架构内表示价值函数,并利用它作为嵌套目标以实现预测前的规划。为了确保可扩展性,我们基于辛公式推导出一个硬件高效的LQR求解器,并将其实现为融合CUDA内核,从而以最小开销实现并行执行。作为适配器集成到预训练LLM中,TTC层在MATH-500上将数学推理性能提升高达27.8%,在AMC和AIME上实现2-3倍的Pass@8改进,证明将最优控制嵌入为架构组件为超越测试时训练的推理提供了有效且可扩展的机制。

英文摘要

Associative memory has long underpinned the design of sequential models. Beyond recall, humans reason by projecting future states and selecting goal-directed actions, a capability that modern language models increasingly require but do not natively encode. While prior work uses reinforcement learning or test-time training, planning remains external to the model architecture. We formulate reasoning as optimal control and introduce the Test-Time Control (TTC) layer, which performs finite-horizon LQR planning over latent states at inference time, represents a value function within neural architectures, and leverages it as the nested objective to enable planning before prediction. To ensure scalability, we derive a hardware-efficient LQR solver based on a symplectic formulation and implement it as a fused CUDA kernel, enabling parallel execution with minimal overhead. Integrated as an adapter into pretrained LLMs, TTC layers improve mathematical reasoning performance by up to +27.8% on MATH-500 and yield 2-3x Pass@8 improvements on AMC and AIME, demonstrating that embedding optimal control as an architectural component provides an effective and scalable mechanism for reasoning beyond test-time training.
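For readers unfamiliar with finite-horizon LQR, the planning problem the TTC layer is said to solve can be illustrated with the textbook backward Riccati recursion for a scalar system. This is a sketch of the standard method only: it is not the symplectic solver or the CUDA kernel from the paper, and the dynamics and cost parameters below are assumed toy values.

```python
def lqr_backward(a, b, q, r, q_final, horizon):
    """Scalar finite-horizon discrete LQR via backward Riccati recursion.

    Dynamics: x[t+1] = a*x[t] + b*u[t]
    Cost:     sum_t (q*x[t]^2 + r*u[t]^2) + q_final*x[T]^2
    Returns the gains [K_0, ..., K_{T-1}] for u[t] = -K_t*x[t], and P_0.
    """
    p = q_final
    gains = []
    for _ in range(horizon):
        k = (b * p * a) / (r + b * p * b)   # optimal feedback gain at this step
        p = q + a * p * a - a * p * b * k   # value-function coefficient one step earlier
        gains.append(k)
    gains.reverse()                         # recursion runs from t = T-1 down to 0
    return gains, p


def rollout(a, b, gains, x0):
    """Apply the planned feedback gains forward from the initial state x0."""
    xs = [x0]
    for k in gains:
        u = -k * xs[-1]
        xs.append(a * xs[-1] + b * u)
    return xs


gains, p0 = lqr_backward(a=1.0, b=1.0, q=1.0, r=1.0, q_final=1.0, horizon=50)
states = rollout(1.0, 1.0, gains, x0=1.0)
print(p0, states[-1])
```

The coefficient `p` plays the role of the quadratic value function, so `p0 * x0**2` is the optimal cost-to-go from the initial state in this toy setting.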

2408.16457 2026-06-01 cs.LG cs.DM 版本更新

HYGENE: A Diffusion-based Hypergraph Generation Method

HYGENE: 一种基于扩散的超图生成方法

Dorian Gailhard, Enzo Tartaglione, Lirida Naviner, Jhony H. Giraldo

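AI summary Proposes HYGENE, a diffusion-based hypergraph generation method that builds the target hypergraph step by step from a single pair of connected nodes through progressive local expansion and a denoising diffusion process; the authors describe it as the first use of deep learning for hypergraph generation.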

Comments arXiv admin note: text overlap with arXiv:2312.11529 by other authors

详情
AI中文摘要

超图是强大的数学结构,可以模拟社交网络、生物信息学和推荐系统等各个领域中的复杂高阶关系。然而,由于其固有的复杂性和缺乏有效的生成模型,生成真实且多样化的超图仍然具有挑战性。在本文中,我们介绍了一种基于扩散的超图生成(HYGENE)方法,通过渐进局部扩展方法解决了这些挑战。HYGENE 作用于超图的二分表示,从单对连接节点开始,迭代扩展以形成目标超图。在每一步中,使用去噪扩散过程以局部方式添加节点和超边,这允许在细化局部细节之前构建全局结构。我们的实验证明了 HYGENE 的有效性,证明了它能够紧密模仿超图中的各种属性。据我们所知,这是首次尝试使用深度学习模型进行超图生成,我们的工作旨在为该领域的未来研究奠定基础。

英文摘要

Hypergraphs are powerful mathematical structures that can model complex, high-order relationships in various domains, including social networks, bioinformatics, and recommender systems. However, generating realistic and diverse hypergraphs remains challenging due to their inherent complexity and lack of effective generative models. In this paper, we introduce a diffusion-based Hypergraph Generation (HYGENE) method that addresses these challenges through a progressive local expansion approach. HYGENE works on the bipartite representation of hypergraphs, starting with a single pair of connected nodes and iteratively expanding it to form the target hypergraph. At each step, nodes and hyperedges are added in a localized manner using a denoising diffusion process, which allows for the construction of the global structure before refining local details. Our experiments demonstrated the effectiveness of HYGENE, proving its ability to closely mimic a variety of properties in hypergraphs. To the best of our knowledge, this is the first attempt to employ deep learning models for hypergraph generation, and our work aims to lay the groundwork for future research in this area.

2603.08721 2026-06-01 cs.AR cs.LG cs.SE 版本更新

KernelCraft: Benchmarking for Agentic Close-to-Metal Kernel Generation on Emerging Hardware

KernelCraft: 面向新兴硬件的近底层内核生成的智能体基准测试

Jiayi Nie, Haoran Wu, Yao Lai, Zeyu Cao, Cheng Zhang, Binglei Lou, Erwei Wang, Jianyi Cheng, Timothy M. Jones, Robert Mullins, Rika Antonova, Yiren Zhao

Affiliations * Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom; Department of Electrical and Electronic Engineering, Imperial College London, London, United Kingdom; School of Informatics, University of Edinburgh, Edinburgh, United Kingdom

AI summary Proposes the KernelCraft benchmark, which evaluates the ability of LLM agents to generate and optimize low-level kernels for emerging accelerators through a function-calling, feedback-driven workflow; across many tasks, the strongest agents produce correct and efficient kernels within a few refinement steps.

详情
AI中文摘要

具有新颖指令集架构(ISA)的新AI加速器通常需要开发者手动编写底层内核,这是一个耗时且易出错的过程,且无法跨硬件目标扩展。这延迟了新兴硬件平台进入市场。虽然先前基于LLM的代码生成在成熟的GPU生态系统中显示出潜力,但目前尚不清楚智能体LLM系统能否快速为具有新ISA的新兴硬件生成有效且高效的内核。我们提出KernelCraft:首个基准,用于评估LLM智能体通过函数调用、反馈驱动的工作流为定制加速器生成和优化底层内核的能力。我们在三个新兴加速器上评估智能体性能,涵盖20多个机器学习任务,每个任务有五种不同的配置。在四个领先的推理模型中,最强的智能体能在几步优化内为未见过的ISA生成功能正确的内核,并产生匹配或超越编译器基线的优化内核。这些结果证明了KernelCraft加速加速器芯片开发周期的潜力。KernelCraft可在https://kernelcraft-cam.github.io/获取。

英文摘要

New AI accelerators with novel instruction set architectures (ISAs) often require developers to manually craft low-level kernels, a time-consuming and error-prone process that does not scale across hardware targets. This delays emerging hardware platforms from reaching the market. While prior LLM-based code generation has shown promise in mature GPU ecosystems, it remains unclear whether agentic LLM systems can quickly produce valid and efficient kernels for emerging hardware with new ISAs. We present KernelCraft: the first benchmark for evaluating an LLM agent's ability to generate and optimize low-level kernels for customized accelerators through a function-calling, feedback-driven workflow. We evaluate agent performance across three emerging accelerators on more than 20 machine-learning tasks, each with five diverse task configurations. Across four leading reasoning models, the strongest agents generate functionally correct kernels for unseen ISAs within a few refinement steps and produce optimized kernels that match or outperform compiler baselines. These results demonstrate KernelCraft's potential to accelerate the accelerator chip development cycle. KernelCraft is available at https://kernelcraft-cam.github.io/.

2603.08651 2026-06-01 cs.LG hep-th math-ph math.MP 版本更新

Group Entropies and Mirror Duality: A Class of Flexible Mirror Descent Updates for Machine Learning

群熵与镜像对偶:一类灵活的机器学习镜像下降更新

Andrzej Cichocki, Piergiulio Tempesta

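Affiliations * Systems Research Institute of Polish Academy of Science; Warsaw University of Technology

AI summary Proposes a theoretical and algorithmic framework connecting formal group theory and group entropies with modern machine learning; group-theoretical mirror maps give flexible, tunable mirror descent updates, a notion of mirror duality allows link functions to be interchanged with their inverses, and the updates are evaluated on simplex-constrained quadratic programming problems.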

Comments 36 pages, 5 figures

详情
AI中文摘要

我们引入了一个全面的理论和算法框架,将形式群论和群熵与现代机器学习联系起来,为无限、灵活的镜像下降(MD)优化算法族铺平了道路。我们的方法利用了群熵的丰富结构,这些熵是由群合成法则控制的广义熵泛函,涵盖并显著扩展了所有迹形式熵,如Shannon、Tsallis和Kaniadakis族。通过在MD中利用群论镜像映射(或链接函数),通过多参数广义对数及其逆(群指数)表达,我们实现了高度灵活和自适应的MD更新,可以针对不同的数据几何和统计分布进行定制。为此,我们引入了“镜像对偶”的概念,允许我们在特定的学习率约束下,无缝地切换或互换群论链接函数及其逆。通过调整或学习群对数的超参数,使我们能够使模型适应训练分布的统计特性,同时通过微调确保理想的收敛特性。这种通用性不仅提供了更大的灵活性和改进的收敛特性,而且通过扩展正则化器和自然梯度算法的设计,为机器学习和深度学习中的应用开辟了新的视角。我们在大规模、单纯形约束的二次规划问题上广泛评估了所提出更新的有效性、鲁棒性和性能。

英文摘要

We introduce a comprehensive theoretical and algorithmic framework that bridges formal group theory and group entropies with modern machine learning, paving the way for an infinite, flexible family of Mirror Descent (MD) optimization algorithms. Our approach exploits the rich structure of group entropies, which are generalized entropic functionals governed by group composition laws, encompassing and significantly extending all trace-form entropies such as the Shannon, Tsallis, and Kaniadakis families. By leveraging group-theoretical mirror maps (or link functions) in MD, expressed via multi-parametric generalized logarithms and their inverses (group exponentials), we achieve highly flexible and adaptable MD updates that can be tailored to diverse data geometries and statistical distributions. To this end, we introduce the notion of mirror duality, which allows us to seamlessly switch or interchange group-theoretical link functions with their inverses, subject to specific learning rate constraints. Tuning or learning the hyperparameters of the group logarithms enables us to adapt the model to the statistical properties of the training distribution, while simultaneously ensuring desirable convergence characteristics via fine-tuning. This generality not only provides greater flexibility and improved convergence properties, but also opens new perspectives for applications in machine learning and deep learning by expanding the design of regularizers and natural gradient algorithms. We extensively evaluate the validity, robustness, and performance of the proposed updates on large-scale, simplex-constrained quadratic programming problems.
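As a rough illustration of a mirror descent update built from a generalized logarithm and its inverse, the sketch below uses the one-parameter Tsallis q-logarithm as the link function. This is an assumed special case chosen for brevity, not the multi-parametric group logarithms of the paper; the final renormalization onto the simplex is a simplification that coincides with the exact Bregman projection only for q = 1, and the sketch assumes q <= 1.

```python
import math


def log_q(x, q):
    """Tsallis q-logarithm; reduces to the natural log at q = 1."""
    if abs(q - 1.0) < 1e-12:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def exp_q(y, q):
    """Inverse of log_q (q-exponential), clipped at zero; assumes q <= 1."""
    if abs(q - 1.0) < 1e-12:
        return math.exp(y)
    base = 1.0 + (1.0 - q) * y
    return max(base, 0.0) ** (1.0 / (1.0 - q))


def md_step(w, grad, lr, q=1.0):
    """One mirror descent step on the probability simplex.

    Map w to the dual space with log_q, take a gradient step, map back
    with exp_q, then renormalize so the weights sum to one.
    """
    raw = [exp_q(log_q(wi, q) - lr * gi, q) for wi, gi in zip(w, grad)]
    total = sum(raw)
    return [v / total for v in raw]


w = [0.5, 0.5]
grad = [1.0, 0.0]
print(md_step(w, grad, lr=0.1, q=1.0))   # Shannon case: exponentiated gradient
print(md_step(w, grad, lr=0.1, q=0.5))   # Tsallis case with q = 0.5
```

With q = 1 the step is the familiar exponentiated-gradient update; other values of q change how strongly small weights are penalized, which is one way to read the "tunable link function" idea in the abstract.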

2603.06738 2026-06-01 cs.LG cs.AI 版本更新

Rank-Factorized Implicit Neural Bias: Scaling Super-Resolution Transformer with FlashAttention

秩分解隐式神经偏置:使用FlashAttention扩展超分辨率Transformer

Dongheon Lee, Seokju Yun, Jaegyun Im, Youngmin Ro

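Affiliations * University of Seoul; KAIST AI

AI summary Proposes Rank-factorized Implicit Neural Bias (RIB) as a replacement for relative positional bias (RPB), using low-rank implicit neural representations and channel-wise concatenation to become compatible with FlashAttention; with convolutional local attention and a cyclic window strategy, it reaches 35.63 dB PSNR on Urban100×2 and reduces training and inference time by 2.1x and 2.9x, respectively.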

详情
AI中文摘要

最近的超分辨率(SR)方法主要采用Transformer,因其强大的长程建模能力和卓越的表征能力。然而,大多数SR Transformer严重依赖相对位置偏置(RPB),这阻碍了它们利用硬件高效的注意力内核,如FlashAttention。这一限制在训练和推理过程中带来了巨大的计算负担,严重限制了通过扩大训练块大小或自注意力窗口来扩展SR Transformer的尝试。因此,与其他积极利用Transformer固有可扩展性的领域不同,SR Transformer仍然主要关注有效利用有限的感受野。在本文中,我们提出了秩分解隐式神经偏置(RIB),作为RPB的替代方案,使SR Transformer能够使用FlashAttention。具体来说,RIB使用低秩隐式神经表示来近似位置偏置,并以通道方式将它们与像素内容标记连接起来,将注意力分数计算中的逐元素偏置加法转化为点积运算。此外,我们引入了卷积局部注意力和循环窗口策略,以充分利用RIB和FlashAttention带来的长程交互优势。我们将窗口大小扩大到96×96,同时联合扩大训练块大小和数据集大小,最大化Transformer在SR任务中的优势。因此,我们的网络在Urban100×2上达到了35.63 dB PSNR,同时与基于RPB的SR Transformer(PFT)相比,训练和推理时间分别减少了2.1倍和2.9倍。

English abstract

Recent Super-Resolution (SR) methods mainly adopt Transformers for their strong long-range modeling capability and exceptional representational capacity. However, most SR Transformers rely heavily on relative positional bias (RPB), which prevents them from leveraging hardware-efficient attention kernels such as FlashAttention. This limitation imposes a prohibitive computational burden during both training and inference, severely restricting attempts to scale SR Transformers by enlarging the training patch size or the self-attention window. Consequently, unlike other domains that actively exploit the inherent scalability of Transformers, SR Transformers remain heavily focused on effectively utilizing limited receptive fields. In this paper, we propose Rank-factorized Implicit Neural Bias (RIB), an alternative to RPB that enables FlashAttention in SR Transformers. Specifically, RIB approximates positional bias using low-rank implicit neural representations and concatenates them with pixel content tokens in a channel-wise manner, turning the element-wise bias addition in attention score computation into a dot-product operation. Further, we introduce a convolutional local attention and a cyclic window strategy to fully leverage the advantages of long-range interactions enabled by RIB and FlashAttention. We enlarge the window size up to 96×96 while jointly scaling the training patch size and the dataset size, maximizing the benefits of Transformers in the SR task. As a result, our network achieves 35.63 dB PSNR on Urban100×2, while reducing training and inference time by 2.1× and 2.9×, respectively, compared to the RPB-based SR Transformer (PFT).
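
The concatenation step described above can be checked numerically. The sketch assumes the positional bias factorizes as `B = U V^T` for some small rank; `U` and `V` here are hand-picked hypothetical stand-ins for whatever the implicit networks would output, and the softmax and its scaling are left out. Appending row `i` of `U` to query `i` and row `j` of `V` to key `j` makes a plain dot product equal the content score plus `B[i][j]`, which is the form a fused attention kernel can consume.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def scores_with_additive_bias(queries, keys, bias):
    """Reference form: content score plus an explicit bias entry per pair."""
    return [[dot(q, k) + bias[i][j] for j, k in enumerate(keys)]
            for i, q in enumerate(queries)]

def scores_with_concatenation(queries, keys, U, V):
    """Same scores from plain dot products on channel-wise augmented tokens."""
    q_aug = [q + u for q, u in zip(queries, U)]  # list concatenation along channels
    k_aug = [k + v for k, v in zip(keys, V)]
    return [[dot(q, k) for k in k_aug] for q in q_aug]

# Toy tokens with 2 content channels and a rank-1 bias factorization (assumed values).
queries = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
keys = [[0.0, 1.0], [2.0, 2.0], [-1.0, 0.5]]
U = [[1.0], [0.0], [-2.0]]
V = [[0.5], [1.5], [2.0]]
bias = [[dot(u, v) for v in V] for u in U]

additive = scores_with_additive_bias(queries, keys, bias)
concatenated = scores_with_concatenation(queries, keys, U, V)
```

The two score matrices agree entry by entry, at the cost of a few extra channels per token.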

2603.02630 2026-06-01 cs.LG cs.AI updated version

MASPOB: Bandit-Based Prompt Optimization for Multi-Agent Systems with Graph Neural Networks

Zhi Hong, Qian Zhang, Jiahang Sun, Zhiwei Shang, Mingze Kong, Xiangyi Wang, Yao Shu, Zhongxiang Dai

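Affiliations * The Chinese University of Hong Kong, Shenzhen; South China University of Technology; Ritsumeikan University; The Hong Kong University of Science and Technology (Guangzhou)

AI summary Proposes MASPOB, a sample-efficient bandit-based framework that uses UCB to balance exploration and exploitation, GNNs to capture topological priors, and coordinate ascent to decompose the optimization, addressing sample efficiency, topological coupling, and combinatorial explosion in prompt optimization for multi-agent systems.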

Comments ICML 2026 Spotlight

English abstract

Large Language Models (LLMs) have achieved great success in many real-world applications, especially as the cognitive backbone of Multi-Agent Systems (MAS) that orchestrate complex workflows. Since many deployment scenarios preclude modifying the MAS workflow and MAS performance is highly sensitive to the input prompts, prompt optimization emerges as a more natural way to improve performance. However, real-world prompt optimization for MAS is impeded by three key challenges: (1) the need for sample efficiency due to prohibitive evaluation costs, (2) topology-induced coupling among prompts, and (3) the combinatorial explosion of the search space. To address these challenges, we introduce MASPOB (Multi-Agent System Prompt Optimization via Bandits), a novel sample-efficient framework based on bandits. By leveraging Upper Confidence Bound (UCB) to quantify uncertainty, the bandit framework balances exploration and exploitation, maximizing gains within a strictly limited budget. To handle topology-induced coupling, MASPOB integrates Graph Neural Networks (GNNs) to capture structural priors, learning topology-aware representations of prompt semantics. Furthermore, it employs coordinate ascent to decompose the optimization into univariate sub-problems, reducing search complexity from exponential to linear. Extensive experiments across diverse benchmarks demonstrate that MASPOB achieves state-of-the-art performance, consistently outperforming existing baselines.
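
The exponential-to-linear claim can be made concrete with a toy count. With `n` agents and `K` candidate prompts each, exhaustive search scores `K**n` combinations, while one coordinate-ascent sweep scores `n * K`. The sketch below is an assumption-laden illustration only: `toy_score` is a hypothetical stand-in for an expensive MAS evaluation, and the UCB acquisition and GNN surrogate from the abstract are omitted entirely.

```python
def coordinate_ascent(num_agents, num_candidates, score, sweeps=2):
    """Optimize one agent's prompt index at a time, holding the others fixed."""
    choice = [0] * num_agents
    evaluations = 0
    for _ in range(sweeps):
        for agent in range(num_agents):
            best_idx, best_val = choice[agent], None
            for cand in range(num_candidates):
                trial = choice[:agent] + [cand] + choice[agent + 1:]
                val = score(trial)
                evaluations += 1
                if best_val is None or val > best_val:
                    best_idx, best_val = cand, val
            choice[agent] = best_idx
    return choice, evaluations

def toy_score(choice):
    """Hypothetical objective: each agent prefers index 2; adjacent agents gain by agreeing."""
    individual = sum(-(c - 2) ** 2 for c in choice)
    coupling = sum(1 for a, b in zip(choice, choice[1:]) if a == b)
    return individual + coupling

best, evals = coordinate_ascent(num_agents=4, num_candidates=5, score=toy_score)
exhaustive_evals = 5 ** 4
```

Here two sweeps cost 40 evaluations against 625 for the full grid. Coordinate ascent only guarantees a coordinate-wise optimum in general, so the saving trades off against global optimality.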

2602.10117 2026-06-01 cs.LG cs.AI updated version

Biases in the Blind Spot: Detecting What LLMs Fail to Mention

Iván Arcuschin, David Chanin, Adrià Garriga-Alonso, Oana-Maria Camburu

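Affiliations * Poseidon Research; University College London, United Kingdom; Imperial College London, United Kingdom

AI summary Proposes a fully automated black-box pipeline that uses statistical testing and chain-of-thought analysis to detect biases that large language models act on in a task but do not verbalize.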

Comments Published at the 43rd International Conference on Machine Learning (ICML 2026)

English abstract

Large Language Models (LLMs) often provide chain-of-thought (CoT) reasoning traces that appear plausible, but may hide internal biases. We call these unverbalized biases. Monitoring models via their stated reasoning is therefore unreliable, and existing bias evaluations typically require predefined categories and hand-crafted datasets. In this work, we introduce a fully automated, black-box pipeline for detecting task-specific unverbalized biases. Given a task dataset, the pipeline uses LLM autoraters to generate candidate bias concepts. It then tests each concept on progressively larger input samples by generating positive and negative variations, and applies statistical techniques for multiple testing and early stopping. A concept is flagged as an unverbalized bias if it yields statistically significant performance differences while not being cited as justification in the model's CoTs. We evaluate our pipeline across seven LLMs on three decision tasks (hiring, loan approval, and university admissions). Our technique automatically discovers previously unknown biases in these models (e.g., Spanish fluency, English proficiency, writing formality). In the same run, the pipeline also validates biases that were manually identified by prior work (gender, race, religion, ethnicity). More broadly, our proposed approach provides a practical, scalable path to automatic, more efficient, and broader task-specific unverbalized bias discovery.

2602.23280 2026-06-01 cs.LG cs.RO updated version

Mollified Value Learning

Hrishikesh Viswanath, Juanwu Lu, S. Talha Bukhari, Mihir Chauhan, Damon Conover, Ziran Wang, Aniket Bera

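Affiliations * Department of Computer Science, Purdue University, USA; College of Engineering, Purdue University, USA; DEVCOM Army Research Laboratory, USA

AI summary To address the difficulty of value estimation in offline goal-conditioned reinforcement learning, proposes Mollified Value Learning (MVL), which induces distance-like value geometry by aggregating constraints over a spatial measure rather than imposing pointwise differential constraints, and improves goal-reaching performance on navigation and high-dimensional robotic manipulation tasks.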

English abstract

Offline goal-conditioned reinforcement learning (GCRL) learns goal-reaching behaviors from static datasets, but accurate value estimation remains challenging under limited state-action coverage. Existing physics-informed approaches address this by imposing pointwise distance-like geometric constraints derived from Hamilton-Jacobi-Bellman (HJB) optimality principles, often through first-order partial differential equations such as the Eikonal equation. However, enforcing local consistency through explicit differential structure can become unstable in complex, high-dimensional environments. Our key insight is to instead reinterpret distance-like constraints as an expectation over a local spatial measure. By aggregating constraints over this measure rather than evaluating them pointwise, the objective acts as a spatial mollifier, inducing distance-like value geometry without requiring expensive differential operators. We refer to this as Mollified Value Learning (MVL). Experiments across navigation and high-dimensional robotic manipulation tasks show that, when used with implicit value representation learning methods, MVL learns structured value representations and improves goal-reaching performance. Open-source code is available at https://github.com/HrishikeshVish/MVL.

2602.21620 2026-06-01 cs.GT cs.LG updated version

Revisiting the Bertrand Paradox via Equilibrium Analysis of No-regret Learners

Arnab Maiti, Junyan Liu, Kevin Jamieson, Lillian J. Ratliff

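Affiliations * University of Washington

AI summary Using a repeated-game model with no-regret learners, analyzes the conditions under which high-price equilibria arise in the Bertrand pricing game and compares how external regret and swap regret affect competitive behavior.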

Comments 36 pages, 34 figures

English abstract

We study the discrete Bertrand pricing game with a non-increasing demand function. The game has $n \ge 2$ players who simultaneously choose prices from the set $\{1/k, 2/k, \ldots, 1\}$, where $k\in\mathbb{N}$. The player who sets the lowest price captures the entire demand; if multiple players tie for the lowest price, they split the demand equally. We study the Bertrand paradox, where classical theory predicts low prices, yet real markets often sustain high prices. To understand this gap, we analyze a repeated-game model in which firms set prices using no-regret learners. Our goal is to characterize the equilibrium outcomes that can arise under different no-regret learning guarantees. We are particularly interested in questions such as whether no-external-regret learners can converge to undesirable high-price outcomes, and how stronger guarantees such as no-swap regret shape the emergence of competitive low-price behavior. We address these and related questions through a theoretical analysis, complemented by experiments that support the theory and reveal surprising phenomena for no-swap regret learners.
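
The stage game above is small enough to write down directly. The sketch assumes zero production cost, so a winner's payoff is its price times its share of demand; the abstract does not state the payoff formula, and `unit_demand` is a hypothetical constant demand chosen because it is trivially non-increasing. Exact fractions keep the price grid `{1/k, ..., 1}` free of rounding.

```python
from fractions import Fraction

def bertrand_payoffs(prices, demand):
    """Lowest price captures demand(low); tied lowest bidders split it equally."""
    low = min(prices)
    winners = [i for i, p in enumerate(prices) if p == low]
    share = demand(low) / len(winners)
    return [low * share if i in winners else Fraction(0) for i in range(len(prices))]

def unit_demand(price):
    """Hypothetical demand: one unit at every price (weakly non-increasing)."""
    return Fraction(1)

k = 4
grid = [Fraction(j, k) for j in range(1, k + 1)]  # {1/4, 2/4, 3/4, 1}

tie = bertrand_payoffs([grid[3], grid[3]], unit_demand)       # both charge 1
undercut = bertrand_payoffs([grid[2], grid[3]], unit_demand)  # 3/4 against 1
```

With both players at price 1 each earns 1/2, while undercutting to 3/4 earns 3/4. That one-step incentive is what drives the classical low-price prediction.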

2602.21340 2026-06-01 cs.LG updated version

HiPPO Zoo: Explicit Memory Mechanisms for Interpretable State Space Models

Jack Goffinet, Casey Hanks, David E. Carlson

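Affiliations * Department of Computer Science, Duke University, Durham NC, USA

AI summary Extends the HiPPO framework with five explicit, interpretable memory mechanisms (collectively the "HiPPO zoo") that give state space models capabilities such as adaptive memory allocation and associative memory, and validates them on synthetic sequence modeling tasks.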

Comments 24 pages, 7 figures; to be published in ICML 2026; additional experimental results included

English abstract

Representing the past in a compressed, efficient, and informative manner is a central problem for systems trained on sequential data. The HiPPO framework, originally proposed by Gu & Dao et al., provides a principled approach to sequential compression by projecting signals onto orthogonal polynomial (OP) bases via structured linear ordinary differential equations. Subsequent works have embedded these dynamics in state space models (SSMs), where HiPPO structure serves as an initialization. Nonlinear successors of these SSM methods such as Mamba are state-of-the-art for many tasks with long-range dependencies, but the mechanisms by which they represent and prioritize history remain largely implicit. In this work, we revisit the HiPPO framework with the goal of making these mechanisms explicit. We show how polynomial representations of history can be extended to support capabilities of modern SSMs such as adaptive memory allocation and associative memory, while retaining direct interpretability in the OP basis. We introduce a unified framework comprising five such extensions, which we collectively refer to as a "HiPPO zoo." Each extension exposes a specific modeling capability through an explicit, interpretable modification of the HiPPO framework. The resulting models adapt their memory online and train in streaming settings with efficient updates. We illustrate the behaviors and modeling advantages of these extensions through a range of synthetic sequence modeling tasks, demonstrating that capabilities typically associated with modern SSMs can be realized through explicit, interpretable polynomial memory structures.

2411.00759 2026-06-01 cs.LG stat.ML updated version

Minibatch Optimal Transport and Perplexity Bound Estimation in Discrete Flow Matching

Etrit Haxholli, Yeti Z. Gurbuz, Ogul Can, Eli Waxman

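Affiliations * Metadialog Research. Code: github.com/ehaxholli/DFM-OT

AI summary To address excessive state transitions and difficult probability estimation in discrete flow matching, proposes a dynamic optimization objective based on minibatch optimal transport to reduce the number of transitions, and gives two upper bounds on perplexity to support training and evaluation.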

English abstract

Discrete flow matching, a recent framework for modeling categorical data, has shown competitive performance with autoregressive models. However, unlike continuous flow matching, the rectification strategy cannot be applied due to the stochasticity of discrete paths, necessitating alternative methods to minimize state transitions. We propose a dynamic-optimal-transport-like minimization objective and derive its Kantorovich formulation for discrete flows with convex interpolants, where transport cost depends solely on inter-state dissimilarity and can be optimized via minibatch strategies. We show that such methods can reduce the number of transitions by up to 32 times (from 1024 to 32) to reach the same generative perplexity without compromising diversity. Additionally, path nondeterminism in discrete flows precludes an instantaneous change-of-variables analogue, preventing the precise probability estimation available to continuous flows. We therefore propose two upper bounds on perplexity, enabling principled training, evaluation, and model comparison. Finally, we introduce Multimask Flows, which outperform masked flows in generative perplexity without compromising diversity, particularly when utilizing minibatch Optimal Transport.
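
A minimal sketch of the minibatch pairing idea follows, under two labeled assumptions: the inter-state dissimilarity is taken to be Hamming distance between equal-length token sequences, and the optimal one-to-one pairing is found by brute force over permutations, which only works for tiny batches. Neither choice is stated in the abstract; a practical version would use a proper assignment or OT solver.

```python
from itertools import permutations

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def minibatch_ot_pairing(sources, targets):
    """Brute-force minimum-cost one-to-one pairing within a minibatch."""
    n = len(sources)
    best_perm, best_cost = None, None
    for perm in permutations(range(n)):
        cost = sum(hamming(sources[i], targets[perm[i]]) for i in range(n))
        if best_cost is None or cost < best_cost:
            best_perm, best_cost = perm, cost
    return list(best_perm), best_cost

# Toy minibatch of source (noise) and target (data) sequences; values are made up.
sources = ["abca", "bbbb", "cacc"]
targets = ["cbcc", "abaa", "bbab"]

perm, cost = minibatch_ot_pairing(sources, targets)
identity_cost = sum(hamming(s, t) for s, t in zip(sources, targets))
```

On this batch the independent (identity) pairing needs 9 token changes in total, while the optimal pairing `[1, 2, 0]` needs 3, which is the kind of reduction in transitions the objective targets.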

2602.19049 2026-06-01 cs.CL cs.LG updated version

IAPO: Information-Aware Policy Optimization for Token-Efficient Reasoning

Yinhan He, Yaochen Zhu, Mingjia Shi, Wendy Zheng, Lin Su, Xiaoqing Wang, Qi Guo, Jundong Li

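Affiliations * University of Virginia; LinkedIn Inc.

AI summary Proposes IAPO, an information-aware policy optimization framework that shapes token-level advantages using conditional mutual information, improving reasoning accuracy while reducing reasoning length by up to 36%.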

English abstract

Large language models increasingly rely on long chains of thought to improve accuracy, yet such gains come with substantial inference-time costs. We revisit token-efficient post-training and argue that existing sequence-level reward-shaping methods offer limited control over how reasoning effort is allocated across tokens. To bridge the gap, we propose IAPO, an information-theoretic post-training framework that assigns token-wise advantages based on each token's conditional mutual information (MI) with the final answer. This yields an explicit, principled mechanism for identifying informative reasoning steps and suppressing low-utility exploration. We provide a theoretical analysis showing that our IAPO can induce monotonic reductions in reasoning verbosity without harming correctness. Empirically, IAPO consistently improves reasoning accuracy while reducing reasoning length by up to 36%, outperforming existing token-efficient RL methods across various reasoning datasets. Extensive empirical evaluations demonstrate that information-aware advantage shaping is a powerful and general direction for token-efficient post-training. The code is available at https://github.com/YinhanHe123/IAPO.

2602.18837 2026-06-01 cs.LG updated version

L2G-Net: Local to Global Spectral Graph Neural Networks via Cauchy Factorizations

Samuel Fernández-Menduiña, Eduardo Pavez, Antonio Ortega

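Affiliations * University of Southern California

AI summary Proposes L2G-Net, a local-to-global spectral graph neural network that exactly factorizes the graph Fourier transform into operators acting on subgraphs and combines them via Cauchy matrices, avoiding full eigendecomposition and achieving competitive performance on long-range dependency tasks with far fewer learnable parameters.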

Comments Accepted to ICML 2026

English abstract

Despite their theoretical advantages, spectral methods based on the graph Fourier transform (GFT) are seldom used in graph neural networks (GNNs) due to the cost of computing the eigenbasis and the lack of vertex-domain locality in the resulting representations. As a result, most GNNs rely on local approximations such as polynomial Laplacian filters or message passing, which limit their ability to model long-range dependencies. In this paper, we introduce an exact factorization of the GFT into operators acting on subgraphs, which are then combined via a sequence of Cauchy matrices. Building on this factorization, we propose a new class of spectral GNNs, termed L2G-Net (Local to Global Net). Unlike existing spectral methods, which are either fully global (when using the GFT) or local (when using polynomial filters), L2G-Net operates by processing the spectral representations of subgraphs and then combining them via structured matrices. Our algorithm avoids full eigendecompositions, exploiting graph topology to construct the factorization with quadratic complexity in the number of nodes, scaled by the maximum cut size between subgraphs. Experiments stressing long-range dependencies on large graphs show that L2G-Net scales to regimes out of reach for the standard GFT, and is competitive with state-of-the-art methods with orders of magnitude fewer learnable parameters.

2602.08885 2026-06-01 cs.LG cs.AI cs.SC updated version

Breaking the Simplification Bottleneck in Amortized Neural Symbolic Regression

Paul Saegert, Ullrich Köthe

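Affiliations * Heidelberg University

AI summary To address slow expression simplification in amortized symbolic regression, proposes SimpliPy, a rule-based simplification engine that delivers a 100-fold speed-up, improving model accuracy and scalability.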

Comments main text: 8 pages, 7 figures; appendix: 12 pages, 11 figures; code available at https://github.com/psaegert/simplipy and https://github.com/psaegert/flash-ansr; v2: Fixed rendering artifact in Figure 7; v3: Fixed Figure 3 title and formula; v4: Fixed Eq (1), example in App. M, Fig 13; v5: ICML 2026 Camera-Ready Version

English abstract

Symbolic regression (SR) aims to discover interpretable analytical expressions that accurately describe observed data. Amortized SR promises to be much more efficient than the predominant genetic programming SR methods, but currently struggles to scale to realistic scientific complexity. We find that a key obstacle is the lack of a fast reduction of equivalent expressions to a concise normalized form. Amortized SR has addressed this with general-purpose Computer Algebra Systems (CAS) like SymPy, but the high computational cost severely limits training and inference speed. We propose SimpliPy, a rule-based simplification engine achieving a 100-fold speed-up over SymPy at comparable quality. This enables substantial improvements in amortized SR, including scalability to much larger training sets, more efficient use of the per-expression token budget, and systematic training set decontamination with respect to equivalent test expressions. We demonstrate these advantages in our Flash-ANSR framework, which achieves much better accuracy than amortized baselines (NeSymReS, E2E) on the FastSRB benchmark. Moreover, it performs on par with state-of-the-art direct optimization (PySR) while recovering more concise rather than more complex expressions with increasing inference budget.

2506.01928 2026-06-01 cs.CL cs.LG updated version

Esoteric Language Models: A Family of Any-Order Diffusion LLMs

Subham Sekhar Sahoo, Zhihan Yang, Yash Akhauri, Johnna Liu, Deepansha Singh, Zhoujun Cheng, Zhengzhong Liu, Eric Xing, John Thickstun, Arash Vahdat

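Affiliations * Cornell University; NVIDIA

AI summary Proposes Eso-LMs, which fuse the autoregressive and masked diffusion paradigms; causal attention enables exact likelihood computation and KV caching, setting a new state of the art on the speed-quality Pareto frontier.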

Comments ICML 2026

English abstract

Diffusion-based language models offer a compelling alternative to autoregressive (AR) models by enabling parallel and controllable generation. Within this family, Masked Diffusion Models (MDMs) currently perform best but still underperform AR models in perplexity and lack key inference-time efficiency features, most notably KV caching. We introduce Eso-LMs, a new family of models that fuses AR and MDM paradigms, smoothly interpolating between their perplexities while overcoming their respective limitations. Unlike prior work, which uses transformers with bidirectional attention as MDM denoisers, we exploit the connection between MDMs and Any-Order autoregressive models and adopt causal attention. This design lets us compute the exact likelihood of MDMs for the first time and, crucially, enables us to introduce KV caching for MDMs while preserving parallel generation for the first time, significantly improving inference efficiency. Combined with an optimized sampling schedule, Eso-LMs establish a new state of the art on the speed-quality Pareto frontier for unconditional generation. We provide the code, model checkpoints, and the video tutorial on the project page: https://s-sahoo.com/Eso-LMs.

2602.18333 2026-06-01 cs.LG cs.CL updated version

On the "Induction Bias" in Sequence Models

M. Reza Ebrahimi, Michaël Defferrard, Sunny Panchal, Roland Memisevic

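Affiliations * Qualcomm AI Research is an initiative of Qualcomm Technologies, Inc

AI summary Large-scale experiments compare the data efficiency of transformers and RNNs on state-tracking tasks, finding that transformers need more training data and fail to share weights across sequence lengths, whereas RNNs learn effectively through weight sharing.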

Comments Accepted to the International Conference on Machine Learning (ICML) 2026

English abstract

Despite the remarkable practical success of transformer-based language models, recent work has raised concerns about their ability to perform state tracking. In particular, a growing body of literature has shown this limitation primarily through failures in out-of-distribution (OOD) generalization, such as length extrapolation. In this work, we shift attention to the in-distribution implications of these limitations. We conduct a large-scale experimental study of the data efficiency of transformers and recurrent neural networks (RNNs) across multiple supervision regimes. We find that the amount of training data required by transformers grows much more rapidly with state-space size and sequence length than for RNNs. Furthermore, we analyze the extent to which learned state-tracking mechanisms are shared across different sequence lengths. We show that transformers exhibit negligible or even detrimental weight sharing across lengths, indicating that they learn length-specific solutions in isolation. In contrast, recurrent models exhibit effective amortized learning by sharing weights across lengths, allowing data from one sequence length to improve performance on others. Together, these results demonstrate that state tracking remains a fundamental challenge for transformers, even when training and evaluation distributions match.

2602.17531 2026-06-01 cs.LG cs.AI updated version

Position: Evaluation of ECG Representations Must Be Fixed

Zachary Berger, Daniel Prakah-Asante, John Guttag, Collin M. Stultz

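Affiliations * Massachusetts Institute of Technology; Massachusetts General Hospital

AI summary Argues that benchmarking practice for 12-lead ECG representation learning must be improved so that progress is reliable and aligned with clinical objectives, and recommends broadening the scope of evaluation, adopting best practices, and using a random encoder as a baseline.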

Comments Project website at https://ecgfix.csail.mit.edu/

English abstract

This position paper argues that current benchmarking practice in 12-lead ECG representation learning must be fixed to ensure progress is reliable and aligned with clinically meaningful objectives. The field has largely converged on three public multi-label benchmarks (PTB-XL, CPSC2018, CSN) dominated by arrhythmia and waveform-morphology labels, even though the ECG is known to encode substantially broader clinical information. We argue that downstream evaluation should expand to include an assessment of structural heart disease and patient-level forecasting, in addition to other evolving ECG-related endpoints, as relevant clinical targets. Next, we outline evaluation best practices for multi-label, imbalanced settings, and show that when they are applied, the literature's current conclusion about which representations perform best is altered. Furthermore, we demonstrate the surprising result that a randomly initialized encoder with linear evaluation matches state-of-the-art pre-training on many tasks. This motivates the use of a random encoder as a reasonable baseline model. We substantiate our observations with an empirical evaluation of five representative ECG pre-training approaches across six evaluation settings: the three standard benchmarks, a structural disease dataset, hemodynamic inference, and patient forecasting.

2602.16601 2026-06-01 stat.ML cs.LG updated version

Quantifying Error Propagation and Model Collapse in Diffusion Models

Nail B. Khelifa, Richard E. Turner, Ramji Venkataramanan

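Affiliations * University of Cambridge, Cambridge, United Kingdom

AI summary Theoretically analyzes the error-propagation mechanism by which recursive training causes model collapse in score-based diffusion models, gives upper and lower bounds on the accumulated divergence between the generated and target distributions, and characterizes different drift regimes.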

Comments Accepted at ICML 2026

English abstract

Machine learning models are increasingly trained or fine-tuned on synthetic data. Recursively training on such data has been observed to significantly degrade performance in a wide range of tasks, often characterized by a progressive drift away from the target distribution. In this work, we theoretically analyze this phenomenon in the setting of score-based diffusion models. For a realistic pipeline where each training round uses a combination of synthetic data and fresh samples from the target distribution, we obtain upper and lower bounds on the accumulated divergence between the generated and target distributions. Notably, to the best of our knowledge, this is the first lower bound on the divergence between the learned and target distributions, even for standard diffusion models. Our results allow us to characterize different regimes of drift, depending on the score estimation error and the proportion of fresh data used in each generation. In a certain regime, the accumulated divergence after several retraining rounds can be expressed as a discounted sum of score estimation errors made at each generation. We also provide empirical results on synthetic data and images to illustrate the theory.

2602.16305 2026-06-01 cs.SD cs.LG updated version

BAT: Better Audio Transformer Guided by Convex Gated Probing

Houtan Ghaffari, Lukas Rauch, Christoph Scholz, Paul Devos

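Affiliations * Ghent University; University of Kassel

AI summary Proposes Convex Gated Probing (CGP), which uses a gating mechanism to exploit all frozen layers and narrows the gap between probing and fine-tuning in audio self-supervised learning; uses CGP to rework the SSL pipeline and build the Better Audio Transformer (BAT), which sets new state-of-the-art results on audio benchmarks.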

Comments Accepted @ ICML26

English abstract

Probing is widely adopted in computer vision to faithfully evaluate self-supervised learning (SSL) embeddings, as finetuning may misrepresent their inherent quality. In contrast, audio SSL models still rely on finetuning because simple probing fails to unlock their full potential and alters their rankings when competing on AudioSet. Hence, a robust and efficient probing mechanism is required to guide the trajectory of audio SSL towards reliable and reproducible methods. We introduce Convex Gated Probing (CGP), a prototype-based method that significantly closes the gap between finetuning and probing in audio. CGP efficiently utilizes all frozen layers via a gating mechanism and exposes the location of latent task-relevant information. Guided by CGP as a reliable post-hoc evaluation probe, we rework the entire SSL pipeline of current best performing audio models that use legacy implementations of prior SSL methods. By refining data preprocessing, model architecture, and pretraining recipe, we introduce Better Audio Transformer (BAT), and establish new SOTA on audio benchmarks.

2602.15634 2026-06-01 cs.LG 版本更新

Beyond ReLU: Bifurcation, Oversmoothing, and Topological Priors

超越ReLU:分岔、过平滑与拓扑先验

Erkan Turan, Gaspard Abel, Maysam Behmanesh, Emery Pierson, Maks Ovsjanikov

发表机构 * Université Paris Saclay, Université Paris Cité, ENS Paris Saclay, CNRS, SSA, INSERM, Centre Borelli(巴黎萨克雷大学、巴黎城市大学、巴黎萨克雷高等师范学院、国家科学研究中心、SSA、国家卫生研究院、Borelli中心) Centre d’Analyse et de Mathématique Sociales, EHESS, CNRS(社会科学分析与数学中心、EHESS、国家科学研究中心)

AI总结 从分岔理论视角重新解释图神经网络的过平滑问题,发现用特定激活函数替代ReLU可打破同质稳定状态,诱导出抵抗过平滑的非同质模式,并推导出分岔感知初始化方法。

详情
AI中文摘要

图神经网络(GNN)通过基于网络的迭代消息传递学习节点表示。尽管强大,深层GNN却遭受过平滑问题,即节点特征收敛到同质、无信息的状态。我们从分岔理论的角度重新审视这种表示坍缩问题,将过平滑表征为收敛到稳定的“同质不动点”。我们的核心贡献是理论发现:通过用一类函数替代标准单调激活函数(如ReLU),可以打破这种不期望的稳定性。利用Lyapunov-Schmidt约化,我们解析证明这种替换会诱导分岔,使同质状态失稳,并产生一对新的稳定、非同质的模式,这些模式被证明能抵抗过平滑。我们的理论预测了这些涌现模式振幅的精确、非平凡标度律,并在实验中定量验证。最后,我们通过推导闭式的、分岔感知的初始化方法,并在实际基准实验中展示其效用,证明了我们理论的实用价值。

英文摘要

Graph Neural Networks (GNNs) learn node representations through iterative network-based message-passing. While powerful, deep GNNs suffer from oversmoothing, where node features converge to a homogeneous, non-informative state. We re-frame this problem of representational collapse from a bifurcation theory perspective, characterizing oversmoothing as convergence to a stable "homogeneous fixed point." Our central contribution is the theoretical discovery that this undesired stability can be broken by replacing standard monotone activations (e.g., ReLU) with a class of functions. Using Lyapunov-Schmidt reduction, we analytically prove that this substitution induces a bifurcation that destabilizes the homogeneous state and creates a new pair of stable, non-homogeneous patterns that provably resist oversmoothing. Our theory predicts a precise, nontrivial scaling law for the amplitude of these emergent patterns, which we quantitatively validate in experiments. Finally, we demonstrate the practical utility of our theory by deriving a closed-form, bifurcation-aware initialization and showing its utility in real benchmark experiments.

2602.15293 2026-06-01 cs.LG cs.AI cs.CL stat.ML 版本更新

The Information Geometry of Softmax: Probing and Steering

Softmax的信息几何:探测与引导

Kiho Park, Todd Nief, Yo Joong Choe, Victor Veitch

发表机构 * University of Chicago(芝加哥大学)

AI总结 本文从信息几何角度研究AI系统如何将语义结构编码到表示空间的几何结构中,并提出一种利用线性探针鲁棒引导表示以展现特定概念的“双重引导”方法。

Comments Code is available at https://github.com/KihoPark/dual-steering

详情
Journal ref
In Proceedings of the 43rd International Conference on Machine Learning (ICML), 2026
AI中文摘要

本文关注AI系统如何将语义结构编码到其表示空间的几何结构中的问题。动机观察是,这些表示空间的自然几何应反映模型使用表示产生行为的方式。我们聚焦于定义softmax分布的重要特例。在这种情况下,我们认为自然几何是信息几何。我们的重点是信息几何在语义编码和线性表示假设中的作用。作为一个说明性应用,我们开发了“双重引导”,一种利用线性探针鲁棒地引导表示以展现特定概念的方法。我们证明双重引导在最小化对非目标概念改变的同时,最优地修改目标概念。实验上,我们发现双重引导增强了概念操控的可控性和稳定性。

英文摘要

This paper concerns the question of how AI systems encode semantic structure into the geometric structure of their representation spaces. The motivating observation is that the natural geometry of these representation spaces should reflect the way models use representations to produce behavior. We focus on the important special case of representations that define softmax distributions. In this case, we argue that the natural geometry is information geometry. Our focus is on the role of information geometry in semantic encoding and the linear representation hypothesis. As an illustrative application, we develop "dual steering", a method for robustly steering representations to exhibit a particular concept using linear probes. We prove that dual steering optimally modifies the target concept while minimizing changes to off-target concepts. Empirically, we find that dual steering enhances the controllability and stability of concept manipulation.

2602.13069 2026-06-01 cs.LG cs.CL 版本更新

Memory-Efficient Structured Backpropagation for On-Device LLM Fine-Tuning

面向设备上大语言模型微调的内存高效结构化反向传播

Juneyoung Park, Yuri Hong, Seongwan Kim, Jaeho Lee

发表机构 * OptAI Inc.(OptAI公司)

AI总结 提出MeSP方法,通过手动推导利用LoRA低秩结构的反向传播,在计算数学等价梯度的同时平均减少49%内存,使内存受限设备上的微调成为可能。

Comments ACL2026

详情
AI中文摘要

设备上微调能够实现大语言模型的隐私保护个性化,但移动设备存在严重的内存限制,通常所有工作负载共享6-12GB内存。现有方法迫使在高内存的精确梯度(MeBP)和低内存的噪声估计(MeZO)之间进行权衡。我们提出内存高效结构化反向传播(MeSP),通过手动推导利用LoRA低秩结构的反向传播来弥合这一差距。我们的关键洞察是,中间投影 $h = xA$ 可以在反向传播中以最小成本重新计算,因为秩 $r \ll d_{in}$,从而无需存储它。在Qwen2.5模型(0.5B-3B)上,MeSP相比MeBP平均减少49%内存,同时计算数学上等价的梯度。我们的分析还揭示,MeZO的梯度估计与真实梯度的相关性接近零(余弦相似度≈0.001),解释了其收敛缓慢的原因。MeSP将Qwen2.5-0.5B的峰值内存从361MB降低到136MB,使得先前在内存受限设备上不可行的微调场景成为可能。

英文摘要

On-device fine-tuning enables privacy-preserving personalization of large language models, but mobile devices impose severe memory constraints, typically 6-12GB shared across all workloads. Existing approaches force a trade-off between exact gradients with high memory (MeBP) and low memory with noisy estimates (MeZO). We propose Memory-efficient Structured Backpropagation (MeSP), which bridges this gap by manually deriving backward passes that exploit LoRA's low-rank structure. Our key insight is that the intermediate projection $h = xA$ can be recomputed during backward at minimal cost since rank $r \ll d_{in}$, eliminating the need to store it. MeSP achieves 49% average memory reduction compared to MeBP on Qwen2.5 models (0.5B-3B) while computing mathematically identical gradients. Our analysis also reveals that MeZO's gradient estimates show near-zero correlation with true gradients (cosine similarity $\approx 0.001$), explaining its slow convergence. MeSP reduces peak memory from 361MB to 136MB for Qwen2.5-0.5B, enabling fine-tuning scenarios previously infeasible on memory-constrained devices.
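The recomputation idea in this abstract can be illustrated with a minimal sketch. It assumes a single LoRA layer of the form $y = xW + (xA)B$ with $W$ frozen, and uses plain Python lists instead of a tensor library. The function names `matmul`, `transpose`, `lora_forward`, and `lora_backward` are hypothetical and are not taken from the MeSP code. The point is only that the backward pass can rebuild $h = xA$ from the saved input $x$ and obtain the same gradients as a version that stored $h$.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def lora_forward(x, W, A, B):
    """Compute y = x W + (x A) B. The projection h = x A is a local
    variable here and is not returned or kept for the backward pass."""
    h = matmul(x, A)
    base = matmul(x, W)
    low_rank = matmul(h, B)
    return [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(base, low_rank)]


def lora_backward(x, A, B, grad_y, h=None):
    """Return (grad_A, grad_B) for the trainable LoRA factors.

    If h is None it is recomputed from x and A, which costs one extra
    (n x d_in) by (d_in x r) product; this is cheap when r is small.
    Passing a stored h gives the conventional variant for comparison.
    """
    if h is None:
        h = matmul(x, A)                      # recompute instead of store
    grad_B = matmul(transpose(h), grad_y)     # dL/dB = h^T dL/dy
    grad_h = matmul(grad_y, transpose(B))     # dL/dh = dL/dy B^T
    grad_A = matmul(transpose(x), grad_h)     # dL/dA = x^T dL/dh
    return grad_A, grad_B
```

Calling `lora_backward(x, A, B, grad_y)` and `lora_backward(x, A, B, grad_y, h=matmul(x, A))` should return identical gradients; the first call simply trades a small amount of compute for not holding `h` in memory between the forward and backward passes.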

2602.12386 2026-06-01 cs.MA cs.GT cs.LG 版本更新

Provably Convergent Actor-Critic for MARL through Risk-aversion

通过风险厌恶实现可证明收敛的MARL演员-评论家算法

Yizhou Zhang, Eric Mazumdar

发表机构 * Caltech(加州理工学院)

AI总结 针对无限时域一般和马尔可夫博弈,提出基于风险厌恶分位数响应均衡(RQE)的单时间尺度演员-评论家算法,利用RQE的正则性证明全局收敛并给出有限样本保证。

详情
AI中文摘要

在无限时域一般和马尔可夫博弈(MGs)中学习平稳策略仍然是多智能体强化学习(MARL)中的一个基本开放问题。尽管平稳策略因其实用性而受到青睐,但计算经典博弈论均衡的平稳形式在计算上是棘手的——这与解决单智能体RL或零和博弈的相对容易形成鲜明对比。为了弥合这一差距,我们研究了风险厌恶分位数响应均衡(RQE),这是一种根植于行为博弈论的概念,结合了风险厌恶和有限理性。我们证明RQE具有强正则性条件,使其特别适合在MGs中进行学习。我们提出了一种新颖的单时间尺度演员-评论家算法,其特点是演员更新更快而评论家更新较慢。利用RQE的正则性,我们证明该方法实现了具有有限样本保证的全局收敛。我们在多个环境中进行了实证验证,表明与风险中性基线相比,我们的算法具有优越的收敛性能。

英文摘要

Learning stationary policies in infinite-horizon general-sum Markov games (MGs) remains a fundamental open problem in Multi-Agent Reinforcement Learning (MARL). While stationary strategies are preferred for their practicality, computing stationary forms of classic game-theoretic equilibria is computationally intractable -- a stark contrast to the comparative ease of solving single-agent RL or zero-sum games. To bridge this gap, we study Risk-averse Quantal response Equilibria (RQE), a solution concept rooted in behavioral game theory that incorporates risk aversion and bounded rationality. We demonstrate that RQE possesses strong regularity conditions that make it uniquely amenable to learning in MGs. We propose a novel single-timescale Actor-Critic algorithm characterized by a faster actor and a slower critic. Leveraging the regularity of RQE, we prove that this approach achieves global convergence with finite-sample guarantees. We empirically validate our algorithm in several environments to demonstrate superior convergence properties compared to risk-neutral baselines.

2602.11802 2026-06-01 cs.LG 版本更新

Structural Bias Beyond Homophily: A Study of Fairness in Link Prediction

超越同质性的结构偏差:链接预测中的公平性研究

Lilian Marey, Mathilde Perez, Tiphaine Viard, Charlotte Laclau

发表机构 * LTCI, Télécom Paris, Institut Polytechnique de Paris, Palaiseau, France(LTCI, Télécom 巴黎, 巴黎理工学院, 帕莱萨乌, 法国)

AI总结 本研究通过形式化拓扑偏差度量并引入可控结构属性的合成图生成方法,实证了图拓扑与链接预测公平性之间的强相关性,并揭示了现有公平感知方法对同质性之外的结构偏差仍然敏感。

详情
AI中文摘要

图链接预测(LP)在诸如工作推荐和友谊形成等具有社会影响力的应用中发挥着关键作用,使得公平性成为该任务中的一个关键问题。虽然许多公平感知方法通过操纵图结构来减轻预测差异,但社会图中固有的拓扑偏差仍然未被充分理解,并且始终仅与同质性混为一谈。在这项工作中,我们研究了结构偏差与LP中公平性结果之间的关系。为此,我们形式化了拓扑偏差度量的分类,并引入了一种图生成方法,该方法可生成具有可控结构属性的多样化合成图语料库。利用该语料库,我们实证表明公平性结果与图拓扑强相关,并且当前的公平感知方法对同质性之外的结构偏差仍然敏感。这些发现强调了在公平图学习中进行基于结构的评估的必要性。

英文摘要

Graph link prediction (LP) plays a critical role in socially impactful applications such as job recommendation and friendship formation, making fairness a critical concern in this task. While many fairness-aware methods manipulate graph structures to mitigate prediction disparities, the topological biases inherent to social graphs remain poorly understood and are consistently conflated with homophily alone. In this work, we study the relationship between structural biases and fairness outcomes in LP. To this end, we formalize a taxonomy of topological bias measures and introduce a graph generation method producing a diverse corpus of synthetic graphs with controlled structural properties. Using this corpus, we show empirically that fairness outcomes are strongly correlated with graph topology, and that current fairness-aware methods remain sensitive to structural biases beyond homophily. These findings highlight the need for structurally grounded evaluations in fair graph learning.

2602.11216 2026-06-01 cs.LG physics.bio-ph 版本更新

Protein Language Model Embeddings Improve Generalization of Implicit Transfer Operators

蛋白质语言模型嵌入提升隐式转移算子的泛化能力

Panagiotis Antoniadis, Beatrice Pavesi, Simon Olsson, Ole Winther

发表机构 * University of Copenhagen(哥本哈根大学) Chalmers University of Technology(查尔姆斯理工大学) University of Gothenburg(哥德堡大学) Technical University of Denmark(丹麦技术大学)

AI总结 本研究提出PLaTITO方法,通过整合蛋白质语言模型嵌入改进隐式转移算子,在分子动力学中实现更高效的数据利用和跨分子系统的泛化,在分布外蛋白质系统的平衡采样基准上达到最先进性能。

Comments 29 pages, 14 figures and 11 tables, Accepted at ICML 2026

详情
AI中文摘要

分子动力学(MD)是物理学、化学和生物学中的核心计算工具,能够将实验可观测量作为高维分子分布(如玻尔兹曼分布和转移密度)的期望进行定量预测。然而,传统MD受到生成独立样本所需高计算成本的根本限制。生成式分子动力学(GenMD)最近作为一种替代方案出现,通过数据或与能量模型交互学习分子分布的替代模型。尽管这些方法实现了高效采样,但它们在不同分子系统间的可迁移性通常有限。在本工作中,我们表明整合辅助信息源可以提高可迁移隐式转移算子(TITO)在分子动力学中的数据效率和泛化能力。我们发现粗粒化TITO模型比玻尔兹曼模拟器在数据效率上显著更高,并且整合蛋白质语言模型(pLM)嵌入进一步改善了分布外泛化。我们的方法PLaTITO在分布外蛋白质系统(包括快速折叠蛋白质)的平衡采样基准测试中达到了最先进的性能。我们进一步研究了额外条件信号(如结构嵌入、温度和大语言模型衍生嵌入)对模型性能的影响。

英文摘要

Molecular dynamics (MD) is a central computational tool in physics, chemistry, and biology, enabling quantitative prediction of experimental observables as expectations over high-dimensional molecular distributions such as Boltzmann distributions and transition densities. However, conventional MD is fundamentally limited by the high computational cost required to generate independent samples. Generative molecular dynamics (GenMD) has recently emerged as an alternative, learning surrogates of molecular distributions either from data or through interaction with energy models. While these methods enable efficient sampling, their transferability across molecular systems is often limited. In this work, we show that incorporating auxiliary sources of information can improve the data efficiency and generalization of transferable implicit transfer operators (TITO) for molecular dynamics. We find that coarse-grained TITO models are substantially more data-efficient than Boltzmann Emulators, and that incorporating protein language model (pLM) embeddings further improves out-of-distribution generalization. Our approach, PLaTITO, achieves state-of-the-art performance on equilibrium sampling benchmarks for out-of-distribution protein systems, including fast-folding proteins. We further study the impact of additional conditioning signals such as structural embeddings, temperature, and large-language-model-derived embeddings on model performance.

2602.11208 2026-06-01 cs.LG 版本更新

Adaptive Physics Transformer with Fused Global-Local Attention for Subsurface Energy Systems

自适应物理Transformer融合全局-局部注意力用于地下能源系统

Xin Ju, Nok Hei Fung, Yuyan Zhang, Carl Jacquemyn, Matthew Jackson, Randolph Settgast, Sally M. Benson, Gege Wen

发表机构 * Department of Energy Science and Engineering, Stanford University(斯坦福大学能源科学与工程系) Department of Earth Sciences and Engineering, Imperial College London(伦敦帝国理工学院地球科学与工程系) Lawrence Livermore National Laboratory(劳伦斯利弗莫尔国家实验室) EarthFlow AI, Inc.(EarthFlow AI公司)

AI总结 提出自适应物理Transformer(APT),通过融合图编码器和全局注意力机制,高效处理地下能源系统中的异构网格和物理耦合问题,在规则与不规则网格上均优于现有架构,并首次直接从高分辨率自适应网格细化模拟中学习。

详情
AI中文摘要

地球地下空间是现代社会的基石,提供碳氢化合物、地热和矿物等基本能源资源,同时是$CO_2$封存的主要储层。然而,由于地质异质性、高分辨率要求以及具有不同传播时间尺度的物理过程的紧密耦合,这些系统的全物理数值模拟计算成本极高。本文提出自适应物理Transformer(APT),这是一种与几何、网格和物理无关的神经算子,明确解决了这些挑战。APT融合了基于图的编码器以提取高分辨率局部异质特征,并结合全局注意力机制以解析远程物理影响。我们的结果表明,APT在规则和不规则网格上的地下任务中均优于最先进的架构,并具有鲁棒的超分辨率能力。值得注意的是,APT是第一个直接从高分辨率自适应网格细化模拟中学习的架构。我们还展示了APT良好的扩展行为和跨数据集学习能力,使其成为大规模地下基础模型开发的稳健且可扩展的骨干网络。

英文摘要

The Earth's subsurface is a cornerstone of modern society, providing essential energy resources like hydrocarbons, geothermal, and minerals while serving as the primary reservoir for $CO_2$ sequestration. However, full-physics numerical simulations of these systems are notoriously computationally expensive due to geological heterogeneity, high resolution requirements, and the tight coupling of physical processes with distinct propagation time scales. Here we propose the Adaptive Physics Transformer (APT), a geometry-, mesh-, and physics-agnostic neural operator that explicitly addresses these challenges. APT fuses a graph-based encoder to extract high-resolution local heterogeneous features with a global attention mechanism to resolve long-range physical impacts. Our results demonstrate that APT outperforms state-of-the-art architectures in subsurface tasks across both regular and irregular grids with robust super-resolution capabilities. Notably, APT is the first architecture that learns directly from HR-adaptive mesh refinement simulations. We also demonstrate APT's favorable scaling behavior and cross-dataset learning capability, positioning it as a robust and scalable backbone for large-scale subsurface foundation model development.

2602.11137 2026-06-01 cs.LG cs.AI cs.CL 版本更新

Weight Decay Improves Language Model Plasticity

权重衰减提升语言模型可塑性

Tessa Han, Sebastian Bordt, Hanlin Zhang, Sham Kakade

发表机构 * Broad Institute, Schmidt Center(Broad研究所,Schmidt中心) University of Tübingen, Tübingen AI Center(图宾根大学,图宾根人工智能中心) Harvard University(哈佛大学)

AI总结 本文通过系统实验表明,预训练中较大的权重衰减能提高模型的可塑性,使微调后下游性能更优,并揭示了其促进线性可分表示、正则化注意力矩阵和减少过拟合的机制。

详情
AI中文摘要

大型语言模型通常分两个主要阶段训练:预训练以产生基础模型,然后进一步训练以提高下游性能。然而,超参数优化和缩放定律主要从基础模型验证损失的角度研究,忽略了一个关键的模型属性:下游适应性。在这项工作中,我们从模型可塑性的角度研究预训练,即基础模型在额外训练后成功适应下游任务的能力。我们关注权重衰减的作用,这是预训练中的一个关键正则化参数,并通过系统实验表明,较大的权重衰减提高了预训练模型的可塑性,导致微调后下游性能提升更大。这种效应可能导致反直觉的权衡,即预训练后表现较差的基础模型在进一步训练后可能表现更好。对权重衰减对模型行为的机制影响的进一步研究表明,它鼓励线性可分的表示,正则化注意力矩阵,并减少对训练数据的过拟合。这些发现共同强调了预训练模型可塑性的重要性,使用交叉熵损失作为超参数优化的唯一指标的局限性,以及单个优化超参数在塑造模型行为中的多方面作用。

英文摘要

Large language models are typically trained in two broad phases: pretraining to produce a base model, followed by further training to improve downstream performance. However, hyperparameter optimization and scaling laws are studied primarily from the perspective of the base model's validation loss, overlooking a crucial model property: downstream adaptability. In this work, we study pretraining from the perspective of model plasticity, that is, the ability of the base model to successfully adapt to downstream tasks upon additional training. We focus on the role of weight decay, a key regularization parameter during pretraining, and show through systematic experiments that larger weight decay increases the plasticity of the pretrained model, resulting in greater performance gains downstream after fine-tuning. This effect can lead to counterintuitive trade-offs where base models that perform worse after pretraining can perform better after further training. Further investigation of weight decay's mechanistic effects on model behavior reveals that it encourages linearly separable representations, regularizes attention matrices, and reduces overfitting on the training data. Together, these findings highlight the importance of pretrained model plasticity, the limits of using cross-entropy loss as the sole metric for hyperparameter optimization, and the multifaceted role that a single optimization hyperparameter plays in shaping model behavior.

2602.11083 2026-06-01 cs.LG cs.CR 版本更新

Token-Efficient Change Detection in LLM APIs

LLM API中的令牌高效变化检测

Timothée Chauvin, Clément Lalanne, Erwan Le Merrer, Jean-Michel Loubes, François Taïani, Gilles Tredan

发表机构 * Université de Rennes, Inria, CNRS/IRISA(雷恩大学、法国国家信息与自动化技术研究院、法国国家科学研究中心/IRISA) Equipe Regalia, Inria(Regalia团队、法国国家信息与自动化技术研究院) LAAS, CNRS(系统分析与架构实验室(LAAS)、法国国家科学研究中心) Univ Toulouse, INUC, UT2J, INSA Toulouse, TSE, CNRS, IMT(图卢兹大学、INUC、UT2J、图卢兹国家高等工业学院、TSE、法国国家科学研究中心、IMT)

AI总结 提出基于边界输入的黑盒变化检测方案B3IT,在仅观察输出令牌的条件下实现低成本、高性能的LLM变化检测。

Comments ICML 2026

详情
AI中文摘要

远程检测LLM中的变化是一个难题。现有方法要么在大规模部署时成本过高,要么需要初始的白盒访问模型权重或灰盒访问对数概率。我们的目标是实现低成本和严格的黑盒操作,仅观察输出令牌。我们的方法依赖于我们称为边界输入的特定输入,对于这些输入,存在多个输出顶部令牌。从统计角度来看,最优变化检测取决于模型的雅可比矩阵和输出分布的Fisher信息。在低温状态下分析这些量表明,边界输入能够实现强大的变化检测测试。基于这一见解,我们提出了黑盒边界输入跟踪(B3IT)方案。大量的体内和体外实验表明,对于非推理测试端点,边界输入很容易找到,并且性能与最佳可用的灰盒方法相当。与现有方法相比,B3IT将成本降低了30倍,同时在严格的黑盒设置中运行。

英文摘要

Remote change detection in LLMs is a difficult problem. Existing methods are either too expensive for deployment at scale, or require initial white-box access to model weights or grey-box access to log probabilities. We aim to achieve both low cost and strict black-box operation, observing only output tokens. Our approach hinges on specific inputs we call Border Inputs, for which there exists more than one output top token. From a statistical perspective, optimal change detection depends on the model's Jacobian and the Fisher information of the output distribution. Analyzing these quantities in low-temperature regimes shows that border inputs enable powerful change detection tests. Building on this insight, we propose the Black-Box Border Input Tracking (B3IT) scheme. Extensive in-vivo and in-vitro experiments show that border inputs are easily found for non-reasoning tested endpoints, and achieve performance on par with the best available grey-box approaches. B3IT reduces costs by $30\times$ compared to existing methods, while operating in a strict black-box setting.
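To see why inputs with tied top tokens are sensitive at low temperature, consider the following toy sketch. It is not the B3IT procedure; the logits, the perturbation size `delta`, and the helper names `softmax`, `total_variation`, and `output_shift` are all assumptions chosen for illustration. It compares how much the sampling distribution moves, in total variation distance, when one logit is nudged slightly, for a border-like input versus an input with a single clear top token.

```python
import math


def softmax(logits, temperature):
    """Temperature-scaled softmax, computed in a numerically stable way."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def output_shift(logits, delta, temperature):
    """TV distance between the output distributions before and after
    adding delta to the first logit (a stand-in for a model change)."""
    before = softmax(logits, temperature)
    perturbed = [logits[0] + delta] + list(logits[1:])
    after = softmax(perturbed, temperature)
    return total_variation(before, after)


# Hypothetical logits: two tied top tokens versus one clear top token.
border_logits = [5.0, 5.0, 1.0]
ordinary_logits = [5.0, 3.0, 1.0]

shift_border = output_shift(border_logits, delta=0.05, temperature=0.01)
shift_ordinary = output_shift(ordinary_logits, delta=0.05, temperature=0.01)
```

Under these assumed numbers, the border-like input goes from an even split between two tokens to almost always emitting one of them, while the ordinary input's output distribution is essentially unchanged. A black-box observer who only sees sampled tokens would therefore need far fewer queries to notice the change on the border-like input.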

2602.10286 2026-06-01 cs.LG 版本更新

What Does Preference Learning Recover from Pairwise Comparison Data?

成对比较数据中的偏好学习恢复了什么?

Rattana Pukdee, Maria-Florina Balcan, Pradeep Ravikumar

发表机构 * Machine Learning Department, Carnegie Mellon University, Pittsburgh, USA(卡内基梅隆大学机器学习系)

AI总结 本文通过条件偏好分布(CPRD)形式化成对比较数据中的偏好信息,分析了Bradley-Terry模型在数据违反假设时的恢复能力,并揭示了影响样本效率的关键因素(间隔和连通性)。

详情
Journal ref
ICML 2026
AI中文摘要

成对偏好学习是机器学习的核心,最近应用于将语言模型与人类偏好对齐。典型数据集由三元组 $(x, y^+, y^-)$ 组成,其中对于上下文 $x$,响应 $y^+$ 优于响应 $y^-$。Bradley-Terry (BT) 模型是主要方法,将偏好概率建模为潜在得分差异的函数。标准实践假设数据遵循此模型,并相应地学习潜在得分。然而,真实数据可能违反这一假设,目前尚不清楚 BT 学习在这种情况下恢复了什么。从三元组比较数据出发,我们通过条件偏好分布 (CPRD) 形式化其编码的偏好信息。我们给出了 BT 适用于建模 CPRD 的精确条件,并确定了影响样本效率的因素——即间隔和连通性。这些结果共同为理解偏好学习实际恢复了什么提供了以数据为中心的基础。

英文摘要

Pairwise preference learning is central to machine learning, with recent applications in aligning language models with human preferences. A typical dataset consists of triplets $(x, y^+, y^-)$, where response $y^+$ is preferred over response $y^-$ for context $x$. The Bradley--Terry (BT) model is the predominant approach, modeling preference probabilities as a function of latent score differences. Standard practice assumes data follows this model and learns the latent scores accordingly. However, real data may violate this assumption, and it remains unclear what BT learning recovers in such cases. Starting from triplet comparison data, we formalize the preference information it encodes through the conditional preference distribution (CPRD). We give precise conditions for when BT is appropriate for modeling the CPRD, and identify factors governing sample efficiency -- namely, margin and connectivity. Together, these results offer a data-centric foundation for understanding what preference learning actually recovers.
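For readers unfamiliar with the Bradley-Terry model mentioned above, here is a minimal sketch of how it turns latent scores into a preference probability and a training loss. The standard BT form is the logistic function of the score difference. The function names and the `score` callable are hypothetical and only illustrate the general setup, not this paper's analysis.

```python
import math


def bt_preference_prob(score_plus, score_minus):
    """P(y+ is preferred over y-) under Bradley-Terry:
    the logistic function of the latent score difference."""
    return 1.0 / (1.0 + math.exp(-(score_plus - score_minus)))


def bt_negative_log_likelihood(triplets, score):
    """Average negative log-likelihood over (x, y_plus, y_minus) triplets,
    where score(x, y) is any latent scoring function."""
    total = 0.0
    for x, y_plus, y_minus in triplets:
        p = bt_preference_prob(score(x, y_plus), score(x, y_minus))
        total -= math.log(p)
    return total / len(triplets)
```

Equal scores give a preference probability of 0.5, and the probability depends on the two scores only through their difference. That dependence on differences alone is the modeling assumption that real comparison data may violate.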

2602.07721 2026-06-01 cs.LG cs.CL cs.DB 版本更新

ParisKV: Fast and Drift-Robust KV-Cache Retrieval for Long-Context LLMs

ParisKV:面向长上下文LLM的快速且漂移鲁棒的KV缓存检索

Yanlin Qi, Xinhang Chen, Huiqiang Jiang, Qitong Wang, Botao Peng, Themis Palpanas

发表机构 * Xi'an Jiaotong University, Xi'an, China(西安交通大学) Qwen Team, Alibaba Group, China(阿里巴巴集团Qwen团队) Harvard University, Cambridge, MA, USA(哈佛大学) Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China(中国科学院计算技术研究所)

AI总结 提出基于碰撞候选选择和量化内积重排序的GPU原生KV缓存检索框架ParisKV,在百万token上下文中实现低延迟、高吞吐且分布漂移鲁棒的检索,性能优于或持平全注意力。

Comments Accepted to the 43rd International Conference on Machine Learning (ICML 2026)

详情
AI中文摘要

KV缓存检索对于长上下文LLM推理至关重要,但现有方法在处理大规模分布漂移和高延迟时存在困难。我们提出ParisKV,一种基于碰撞候选选择、随后使用量化内积重排序估计器的漂移鲁棒、GPU原生的KV缓存检索框架。对于百万token上下文,ParisKV通过统一虚拟寻址(UVA)支持CPU卸载的KV缓存,实现按需的top-$k$获取,开销极小。ParisKV在长输入和长生成基准测试中匹配或超越全注意力质量。它实现了最先进的长上下文解码效率:即使在长上下文的批大小为1时,也能匹配或超过全注意力速度;在全注意力可运行范围内提供高达2.8倍的吞吐量;并扩展到全注意力内存不足的百万token上下文。在百万token规模下,与两个最先进的KV缓存Top-$k$检索基线MagicPIG和PQCache相比,ParisKV分别将解码延迟降低了17倍和44倍。代码可在https://github.com/amy-77/ParisKV/tree/main获取。

英文摘要

KV-cache retrieval is essential for long-context LLM inference, yet existing methods struggle with distribution drift and high latency at scale. We introduce ParisKV, a drift-robust, GPU-native KV-cache retrieval framework based on collision-based candidate selection, followed by a quantized inner-product reranking estimator. For million-token contexts, ParisKV supports CPU-offloaded KV caches via Unified Virtual Addressing (UVA), enabling on-demand top-$k$ fetching with minimal overhead. ParisKV matches or outperforms full attention quality on long-input and long-generation benchmarks. It achieves state-of-the-art long-context decoding efficiency: it matches or exceeds full attention speed even at batch size 1 for long contexts, delivers up to 2.8$\times$ higher throughput within full attention's runnable range, and scales to million-token contexts where full attention runs out of memory. At million-token scale, ParisKV reduces decode latency by 17$\times$ and 44$\times$ compared to MagicPIG and PQCache, respectively, two state-of-the-art KV-cache Top-$k$ retrieval baselines. Code is available at https://github.com/amy-77/ParisKV/tree/main.

2602.09405 2026-06-01 stat.ML cs.IT cs.LG math.IT math.ST stat.TH 版本更新

Is Memorization Helpful or Harmful? Prior Information Sets the Threshold

记忆是有益还是有害?先验信息设定阈值

Chen Cheng, Rina Foygel Barber

发表机构 * Department of Statistics, University of Chicago(芝加哥大学统计系)

AI总结 在过参数化线性模型和贝叶斯框架下,研究先验分布如何决定训练误差与泛化误差的关系,给出记忆必要或过拟合有害的条件。

Comments 33 pages, 3 figures. Accepted to the Conference on Learning Theory (COLT) 2026

详情
AI中文摘要

我们研究了任意估计过程中训练误差与泛化误差之间的联系,在贝叶斯设置下,基于一般先验的过参数化线性模型中进行工作。我们发现了先验分布$π$固有的决定因素,给出了最优泛化需要训练误差(i)接近插值(相对于噪声大小,即记忆是必要的),或(ii)接近噪声水平(即过拟合是有害的)的显式条件。值得注意的是,当噪声达到由Fisher信息和先验$π$的方差参数决定的阈值时,这些现象会发生。

英文摘要

We examine the connection between training error and generalization error for arbitrary estimating procedures, working in an overparameterized linear model under general priors in a Bayesian setup. We find determining factors inherent to the prior distribution $π$, giving explicit conditions under which optimal generalization necessitates that the training error be (i) near interpolating relative to the noise size (i.e., memorization is necessary), or (ii) close to the noise level (i.e., overfitting is harmful). Remarkably, these phenomena occur when the noise reaches thresholds determined by the Fisher information and the variance parameters of the prior $π$.

2602.09309 2026-06-01 cond-mat.mtrl-sci cond-mat.mes-hall cs.LG physics.atm-clus 版本更新

How Far Can You Grow? Characterizing the Extrapolation Frontier of Graph Generative Models for Materials Science

你能长多远?表征材料科学中图生成模型的外推前沿

Can Polat, Erchin Serpedin, Mustafa Kurban, Hasan Kurban

发表机构 * Texas A&M University, College Station, Texas, USA; Ankara University, Ankara, Turkey; Texas A&M University at Qatar, Doha, Qatar; College of Science & Engineering, Hamad Bin Khalifa University, Doha, Qatar

AI总结 提出RADII基准,通过半径分辨的纳米粒子结构评估晶体生成模型的外推能力,发现模型在训练半径外误差增加,且外推前沿可预测。

详情
AI中文摘要

每种晶体材料生成模型都存在一个临界结构尺寸,超出该尺寸其输出变得不可靠;我们称之为外推前沿。尽管这对纳米材料设计有重要影响,但这一前沿从未被系统测量过。我们引入RADII,一个半径分辨的基准,包含约75,000个晶体衍生的纳米粒子结构(33-11,298个原子),将半径视为连续缩放旋钮,在无泄漏分割下追踪从分布内到分布外的生成质量。每个模型以目标组成和原子数为条件,将几何外推作为评估变量。RADII提供前沿特定的诊断:每个半径的误差曲线精确定位每个架构的缩放上限,表面-内部分解分离边界和体相失效,跨度量排序揭示结构保真度的哪个方面首先失效。对五种最先进架构进行基准测试,我们发现:(i) 表现良好的模型在训练半径外全局位置误差增加约13%,而发散模型在所有尺度上保真度差,局部键合保真度从可忽略的退化到超过2倍的误差增长;(ii) 没有两个架构共享相同的失效序列,揭示前沿是由模型族决定的多维表面;(iii) 表现良好的模型遵循预期的几何缩放指数α ~ 1/3,其分布内拟合可预测分布外误差,使前沿可预测。将MatterGen扩展到其公布的参数数量稳定了采样,但并未关闭前沿,而DiffCSP在公布规模下仍不稳定。这些发现将输出尺度确立为几何生成模型的一级评估轴。代码和数据:https://github.com/KurbanIntelligenceLab/RADII。

英文摘要

Every generative model for crystalline materials harbors a critical structure size beyond which its outputs become unreliable; we call this the extrapolation frontier. Despite its consequences for nanomaterial design, this frontier has never been systematically measured. We introduce RADII, a radius-resolved benchmark of ~75,000 crystal-derived nanoparticle structures (33-11,298 atoms) that treats radius as a continuous scaling knob, tracing generation quality from in- to out-of-distribution under leakage-free splits. Each model is conditioned on target composition and atom count, isolating geometric extrapolation as the evaluation variable. RADII provides frontier-specific diagnostics: per-radius error profiles pinpoint each architecture's scaling ceiling, surface-interior decomposition separates boundary from bulk failures, and cross-metric sequencing reveals which aspect of structural fidelity breaks first. Benchmarking five state-of-the-art architectures, we find that: (i) well-behaved models degrade by ~13% in global positional error beyond training radii, while divergent models show poor fidelity across scales, with local bond fidelity ranging from negligible degradation to over 2x error growth; (ii) no two architectures share a failure sequence, revealing the frontier as a multi-dimensional surface shaped by model family; and (iii) well-behaved models follow the expected geometric scaling exponent alpha ~ 1/3, whose in-distribution fit predicts out-of-distribution error, making frontiers forecastable. Scaling MatterGen to its published parameter count stabilizes sampling but does not close the frontier, while DiffCSP remains unstable at published scale. These findings establish output scale as a first-class evaluation axis for geometric generative models. Code and data: https://github.com/KurbanIntelligenceLab/RADII.

2602.09276 2026-06-01 cs.CL cs.AI cs.LG 版本更新

Effective Reasoning Chains Reduce Intrinsic Dimensionality

有效推理链降低内在维度

Archiki Prasad, Mandar Joshi, Kenton Lee, Mohit Bansal, Peter Shaw

发表机构 * Google DeepMind(谷歌DeepMind)

AI总结 本文通过内在维度量化推理链有效性,发现有效推理策略能降低任务内在维度,并在GSM8K上验证其与泛化性能的强负相关。

Comments ICML (spotlight) camera-ready; 22 pages, 3 figures

详情
AI中文摘要

思维链推理及其变体显著提升了语言模型在复杂推理任务上的性能,但不同策略促进泛化的精确机制仍不明确。虽然当前解释常指向增加测试时计算或结构引导,但建立这些因素与泛化之间一致、可量化的联系仍具挑战。本文中,我们将内在维度识别为表征推理链有效性的定量度量。内在维度量化了在给定任务上达到特定准确率阈值所需的最小模型维度数。通过固定模型架构并改变不同推理策略下的任务表述,我们证明有效推理策略持续降低任务的内在维度。在GSM8K上使用Gemma-3 1B和4B验证这一点,我们观察到推理策略的内在维度与其在分布内和分布外数据上的泛化性能之间存在强负相关。我们的发现表明,有效推理链通过使用更少参数更好地压缩任务来促进学习,为分析推理过程提供了新的定量度量。

英文摘要

Chain-of-thought (CoT) reasoning and its variants have substantially improved the performance of language models on complex reasoning tasks, yet the precise mechanisms by which different strategies facilitate generalization remain poorly understood. While current explanations often point to increased test-time computation or structural guidance, establishing a consistent, quantifiable link between these factors and generalization remains challenging. In this work, we identify intrinsic dimensionality as a quantitative measure for characterizing the effectiveness of reasoning chains. Intrinsic dimensionality quantifies the minimum number of model dimensions needed to reach a given accuracy threshold on a given task. By keeping the model architecture fixed and varying the task formulation through different reasoning strategies, we demonstrate that effective reasoning strategies consistently reduce the intrinsic dimensionality of the task. Validating this on GSM8K with Gemma-3 1B and 4B, we observe a strong inverse correlation between the intrinsic dimensionality of a reasoning strategy and its generalization performance on both in-distribution and out-of-distribution data. Our findings suggest that effective reasoning chains facilitate learning by better compressing the task using fewer parameters, offering a new quantitative metric for analyzing reasoning processes.

2602.08964 2026-06-01 cs.LG cs.AI cs.CL cs.CY 版本更新

A Behavioural and Representational Evaluation of Goal-Directedness in Language Model Agents

语言模型智能体中目标导向性的行为与表征评估

Raghu Arghal, Fade Chen, Niall Dalton, Evgenii Kortukov, Calum McNamara, Angelos Nalmpantis, Moksh Nirvaan, Gabriele Sarti, Mario Giulianelli

发表机构 * University of Pennsylvania(宾夕法尼亚大学) New York University(纽约大学) Indiana University, Bloomington(印第安纳大学,布卢明顿) Northeastern University(东北大学) University College London(伦敦大学学院)

AI总结 本文提出一种结合行为评估与内部表征可解释性分析的目标导向性评估框架,并以LLM智能体在2D网格世界中的导航为例,验证了其行为与表征的一致性。

Comments Proceedings of the 43rd International Conference on Machine Learning (ICML 2026)

详情
AI中文摘要

理解智能体的目标有助于解释和预测其行为,但目前尚无可靠的方法来归因智能系统的目标。我们提出一个评估目标导向性的框架,该框架将行为评估与基于可解释性的模型内部表征分析相结合。作为案例研究,我们考察了一个在二维网格世界中导航至目标状态的LLM智能体。在行为上,我们在不同网格大小、障碍物密度和目标结构下将智能体与最优策略进行对比评估,发现其性能随任务难度扩展,同时对保持难度的变换和多目标结构具有鲁棒性。然后,我们使用探测方法解码环境及多步行动计划的内部表征。我们发现,LLM智能体非线性地编码了一个粗略的空间地图,保留了关于其位置和目标位置的任务相关近似线索;其行动与这些内部表征大致一致;推理过程重新组织这些表征,从空间线索转向即时行动选择。我们的研究结果支持这样的观点:除了行为评估之外,还需要内省检查来表征智能体如何表示和追求其目标。

英文摘要

Understanding an agent's goals helps explain and predict its behaviour, yet there is no established methodology for reliably attributing goals to agentic systems. We propose a framework for evaluating goal-directedness that integrates behavioural evaluation with interpretability-based analyses of models' internal representations. As a case study, we examine an LLM agent navigating a 2D grid world towards a goal state. Behaviourally, we evaluate the agent against optimal policies across varying grid sizes, obstacle densities, and goal structures, finding that performance scales with task difficulty while remaining robust to difficulty-preserving transformations and multi-goal structures. We then use probing methods to decode internal representations of the environment and multi-step action plans. We find that the LLM agent non-linearly encodes a coarse spatial map, preserving approximate task-relevant cues about its position and the goal location; that its actions are broadly consistent with these internal representations; and that reasoning reorganises them, shifting from spatial cues towards immediate action selection. Our findings support the view that introspective examination is required beyond behavioural evaluations to characterise how agents represent and pursue their objectives.

2602.08267 2026-06-01 cs.LG cs.AI 版本更新

Inverting Data Transformations via Diffusion Sampling

通过扩散采样反演数据变换

Jinwoo Kim, Sékou-Oumar Kaba, Jiyun Park, Seunghoon Hong, Siamak Ravanbakhsh

发表机构 * Mila - Quebec Artificial Intelligence Institute, Montréal, Canada; School of Computer Science, McGill University, Montréal, Canada

AI总结 提出一种在一般李群上通过扩散采样逆变换未知变换的方法,用于恢复原始数据分布,并在测试时等变性应用中提升预训练神经网络的鲁棒性。

Comments 31 pages, 11 figures

详情
AI中文摘要

我们研究了一般李群上的变换逆问题:一个数据被未知群元素变换,目标是恢复一个逆变换,将其映射回原始数据分布。这种未知变换在机器学习和科学建模中广泛出现,会显著扭曲观测数据。我们采用概率视角,将变换的后验建模为玻尔兹曼分布,由数据空间上的能量函数定义。为了从该后验中采样,我们引入了一个李群上的扩散过程,该过程保持所有更新在流形上,并且仅需在关联的李代数中进行计算。我们的方法,即变换逆能量扩散(TIED),依赖于一个新的平凡化目标分数恒等式,能够高效地对变换后验进行基于分数的采样。作为一个关键应用,我们专注于测试时等变性,其目标是提高预训练神经网络对输入变换的鲁棒性。在图像单应性和PDE对称性上的实验表明,TIED可以在测试时将变换后的输入恢复到训练分布,表现出优于强规范化和采样基线的性能。代码可在 https://github.com/jw9730/tied 获取。

英文摘要

We study the problem of transformation inversion on general Lie groups: a datum is transformed by an unknown group element, and the goal is to recover an inverse transformation that maps it back to the original data distribution. Such unknown transformations arise widely in machine learning and scientific modeling, where they can significantly distort observations. We take a probabilistic view and model the posterior over transformations as a Boltzmann distribution defined by an energy function on the data space. To sample from this posterior, we introduce a diffusion process on Lie groups that keeps all updates on-manifold and only requires computations in the associated Lie algebra. Our method, Transformation-Inverting Energy Diffusion (TIED), relies on a new trivialized target-score identity that enables efficient score-based sampling of the transformation posterior. As a key application, we focus on test-time equivariance, where the objective is to improve the robustness of pretrained neural networks to input transformations. Experiments on image homographies and PDE symmetries demonstrate that TIED can restore transformed inputs to the training distribution at test time, showing improved performance over strong canonicalization and sampling baselines. Code is available at https://github.com/jw9730/tied.

2506.00175 2026-06-01 cs.LG cs.AI 版本更新

Who Gets Credit or Blame? Attributing Accountability in Modern AI Systems

谁获得功劳或责备?在现代AI系统中分配责任

Shichang Zhang, Hongzhe Du, Jiaqi W. Ma, Himabindu Lakkaraju

发表机构 * Harvard University, Cambridge, MA, USA(哈佛大学) University of California, Los Angeles, Los Angeles, CA, USA(加州大学洛杉矶分校) University of Illinois Urbana-Champaign, Urbana-Champaign, IL, USA(伊利诺伊大学厄巴纳-香槟分校)

AI总结 提出一个归因框架,通过反事实问题量化模型开发各阶段(预训练、微调等)对最终行为的影响,并设计无需重训练的估计器,成功识别并移除多阶段任务中的虚假关联。

详情
AI中文摘要

现代AI系统通常通过多个阶段开发——预训练、微调轮次以及后续的适应或对齐,每个阶段都建立在先前阶段之上并以不同方式更新模型。这引发了一个关键的责任问题:当部署的模型成功或失败时,哪个阶段负责,以及负责到什么程度?我们提出了责任归因问题,用于将模型行为追溯到模型开发过程的特定阶段。为了解决这一挑战,我们提出了一个通用框架,回答关于阶段效应的反事实问题:如果特定阶段的更新没有发生,模型的行为会如何改变?在此框架内,我们引入了无需重新训练模型即可高效量化阶段效应的估计器,考虑了数据和模型优化动态的关键方面,包括学习率调度、动量和权重衰减。我们证明了我们的方法成功量化了每个阶段对模型行为的责任。基于归因结果,我们的方法可以识别并移除在图像分类和文本毒性检测任务中跨多个阶段开发时学到的虚假相关性。我们的方法为模型分析提供了实用工具,并代表了向更负责任的AI发展迈出的重要一步。

英文摘要

Modern AI systems are typically developed through multiple stages (pretraining, fine-tuning rounds, and subsequent adaptation or alignment), where each stage builds on the previous ones and updates the model in distinct ways. This raises a critical question of accountability: when a deployed model succeeds or fails, which stage is responsible, and to what extent? We pose the accountability attribution problem for tracing model behavior back to specific stages of the model development process. To address this challenge, we propose a general framework that answers counterfactual questions about stage effects: how would the model's behavior have changed if the updates from a particular stage had not occurred? Within this framework, we introduce estimators that efficiently quantify stage effects without retraining the model, accounting for both the data and key aspects of model optimization dynamics, including learning rate schedules, momentum, and weight decay. We demonstrate that our approach successfully quantifies the accountability of each stage to the model's behavior. Based on the attribution results, our method can identify and remove spurious correlations learned during image classification and text toxicity detection tasks that were developed across multiple stages. Our approach provides a practical tool for model analysis and represents a significant step toward more accountable AI development.

2602.07928 2026-06-01 cs.LG cs.AI 版本更新

A Kinetic Energy Perspective of Flow Matching

流匹配的动能视角

Ziyun Li, Huancheng Hu, Soon Hoe Lim, Xuyu Li, Fei Gao, Enmao Diao, Zezhen Ding, Michalis Vazirgiannis, Henrik Bostrom

发表机构 * KTH Royal Institute of Technology(瑞典皇家理工学院) Nordita, Nordic Institute for Theoretical Physics(北欧理论物理研究所) Hasso Plattner Institute, University of Potsdam(波茨坦大学哈索·普拉特纳研究所) Trinity College Dublin(都柏林三一学院) Hangzhou Institute of Technology, Xidian University(西安电子科技大学杭州研究院) The Hong Kong University of Science and Technology(香港科技大学) Mohamed bin Zayed University of Artificial Intelligence(穆罕默德·本·扎耶德人工智能大学)

AI总结 本文引入动能路径能量(KPE)作为流匹配生成模型的诊断工具,发现其与语义保真度和数据稀疏性相关,并基于此提出无训练的动能轨迹塑形(KTS)策略以改善生成质量。

Comments ICML 2026 Spotlight

详情
AI中文摘要

基于流的生成模型可以通过物理视角来审视:采样通过积分学习到的速度场将粒子从噪声传输到数据,每个样本对应一条具有自身动力学努力的轨迹。受经典力学启发,我们引入了动能路径能量(KPE),这是一种类似作用量的每样本诊断指标,用于测量沿常微分方程(ODE)轨迹累积的动能努力。实验上,KPE表现出两种稳健的对应关系:(i) 较高的KPE预测更强的语义保真度;(ii) 高KPE轨迹落在稀疏表示区域。我们进一步提供了将轨迹能量与数据稀疏性联系起来的理论保证。矛盾的是,这种相关性是非单调的。在足够高的能量下,生成可能退化为记忆。利用经验流匹配的闭式公式,我们表明极端能量驱动轨迹接近训练样本的副本。这产生了金发姑娘原则,并激发了动能轨迹塑形(KTS),一种无训练的两阶段推理策略,该策略增强早期运动并强制执行后期软着陆,从而减少记忆并提高基准任务上的生成质量。

英文摘要

Flow-based generative models can be viewed through a physics lens: sampling transports a particle from noise to data by integrating a learned velocity field, and each sample corresponds to a trajectory with its own dynamical effort. Motivated by classical mechanics, we introduce Kinetic Path Energy (KPE), an action-like, per-sample diagnostic that measures the accumulated kinetic effort along an ordinary differential equation (ODE) trajectory. Empirically, KPE exhibits two robust correspondences: (i) higher KPE predicts stronger semantic fidelity; (ii) high-KPE trajectories land in sparse representation regions. We further provide theoretical guarantees linking trajectory energy to data sparsity. Paradoxically, this correlation is non-monotonic. At sufficiently high energy, generation can degenerate into memorization. Leveraging the closed-form formula of empirical flow matching, we show that extreme energies drive trajectories toward near-copies of training examples. This yields a Goldilocks principle and motivates Kinetic Trajectory Shaping (KTS), a training-free two-phase inference strategy that boosts early motion and enforces a late-time soft landing, reducing memorization and improving generation quality across benchmark tasks.
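A per-sample, action-like energy of this kind can be sketched as follows. The abstract does not give the formula, so the sketch assumes KPE is the time integral of ½‖v(x_t, t)‖² along the sampling ODE, approximated with Euler steps; `toy_velocity` is a hypothetical stand-in for a learned velocity field and simply moves each sample in a straight line toward a fixed point `x1`.

```python
import numpy as np

def kinetic_path_energy(velocity, x0, n_steps=100):
    """Euler-integrate dx/dt = velocity(x, t) on [0, 1] and accumulate
    0.5 * ||v||^2 * dt (the assumed definition of the path energy)."""
    dt = 1.0 / n_steps
    x = np.asarray(x0, dtype=float).copy()
    energy = 0.0
    for k in range(n_steps):
        t = k * dt
        v = velocity(x, t)
        energy += 0.5 * float(v @ v) * dt
        x = x + dt * v
    return x, energy

x1 = np.array([3.0, 4.0])  # hypothetical data point

def toy_velocity(x, t):
    """Straight-line transport toward x1 (a stand-in for a trained network)."""
    return (x1 - x) / (1.0 - t)

x_end, kpe = kinetic_path_energy(toy_velocity, np.zeros(2))
```

For this straight-line field the velocity is constant along the path, so the energy reduces to ½‖x1 − x0‖²; trajectories that travel farther or bend more accumulate more energy.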

2602.07285 2026-06-01 cs.LG 版本更新

Fair Decisions from Calibrated Scores: Achieving Optimal Classification While Satisfying Sufficiency

基于校准分数的公平决策:在满足充分性的同时实现最优分类

Etam Benger, Katrina Ligett

发表机构 * School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel(耶路撒冷希伯来大学计算机科学与工程学院,以色列) The Federmann Center for the Study of Rationality, The Hebrew University of Jerusalem, Israel(耶路撒冷希伯来大学费德曼理性研究中心,以色列)

AI总结 本文针对充分性公平约束下的二元分类问题,提出了一种基于分组校准分数的后处理算法,能够实现最优随机分类,并给出了可行正预测值与错误遗漏率对的几何刻画。

Comments Accepted to ICML 2026

详情
AI中文摘要

基于预测概率(分数)的二元分类是监督机器学习中的基本任务。在无约束设置中,阈值化分数是贝叶斯最优的,但使用单一阈值通常会违反统计群体公平约束。在独立性(统计均等)和分离性(均等机会)下,当分数已经满足相应准则时,这种阈值化就足够了。然而,这并不扩展到充分性:即使完全分组校准的分数——包括真实类别概率——在阈值化后也会违反预测均等。在这项工作中,我们提出了在充分性下最优二元(随机)分类的精确解,假设有限的分组校准分数集。我们给出了这些分类器可实现的正预测值(PPV)和错误遗漏率(FOR)可行对的几何刻画,并利用它推导出一个简单的后处理算法,该算法仅使用分组校准分数和组成员身份即可获得最优分类器。最后,由于充分性和分离性通常不兼容,我们确定了在满足充分性的前提下最小化与分离性偏差的分类器,并表明该分类器也可以通过我们的算法获得,其性能通常与最优值相当。

英文摘要

Binary classification based on predicted probabilities (scores) is a fundamental task in supervised machine learning. While thresholding scores is Bayes-optimal in the unconstrained setting, using a single threshold generally violates statistical group fairness constraints. Under independence (statistical parity) and separation (equalized odds), such thresholding suffices when the scores already satisfy the corresponding criterion. However, this does not extend to sufficiency: even perfectly group-calibrated scores -- including true class probabilities -- violate predictive parity after thresholding. In this work, we present an exact solution for optimal binary (randomized) classification under sufficiency, assuming finite sets of group-calibrated scores. We provide a geometric characterization of the feasible pairs of positive predictive value (PPV) and false omission rate (FOR) achievable by such classifiers, and use it to derive a simple post-processing algorithm that attains the optimal classifier using only group-calibrated scores and group membership. Finally, since sufficiency and separation are generally incompatible, we identify the classifier that minimizes deviation from separation subject to sufficiency, and show that it can also be obtained by our algorithm, often achieving performance comparable to the optimum.
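The claim that thresholding perfectly group-calibrated scores can violate predictive parity is easy to check on a toy example. In the sketch below, the two score distributions and the threshold are hypothetical; calibration means P(Y=1 | score=s) = s, so PPV is the mean score above the threshold and FOR is the mean score below it.

```python
def ppv_and_for(scores, masses, threshold):
    """PPV and FOR of the rule 'predict 1 iff score >= threshold',
    assuming the scores are calibrated: P(Y=1 | score=s) = s."""
    pos = [(s, m) for s, m in zip(scores, masses) if s >= threshold]
    neg = [(s, m) for s, m in zip(scores, masses) if s < threshold]
    ppv = sum(s * m for s, m in pos) / sum(m for _, m in pos)
    false_omission = sum(s * m for s, m in neg) / sum(m for _, m in neg)
    return ppv, false_omission

# Hypothetical finite score sets with equal probability mass on each score.
ppv_a, for_a = ppv_and_for([0.2, 0.8], [0.5, 0.5], threshold=0.5)
ppv_b, for_b = ppv_and_for([0.4, 0.6], [0.5, 0.5], threshold=0.5)
```

Both groups are calibrated and share one threshold, yet group A ends up with PPV 0.8 and FOR 0.2 while group B gets PPV 0.6 and FOR 0.4, so sufficiency fails after thresholding.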

2510.10544 2026-06-01 cs.LG cs.AI stat.ML 版本更新

PAC-Bayesian Reinforcement Learning Trains Generalizable Policies

PAC-Bayesian 强化学习训练可泛化策略

Abdelkrim Zitouni, Mehdi Hennequin, Juba Agoun, Ryan Horache, Nadia Kabachi, Omar Rivasplata

发表机构 * Université Claude Bernard Lyon 1, LIRIS, UMR CNRS 5205, France(克洛德·贝尔纳里昂第一大学,LIRIS实验室,CNRS UMR 5205,法国)

AI总结 提出一种新的 PAC-Bayesian 泛化界,通过链的混合时间显式考虑数据中的马尔可夫依赖性,并基于此设计 PB-SAC 算法以优化该界指导探索,在连续控制任务中提供有意义的置信度证书且保持竞争性能。

Comments Accepted to the 43rd International Conference on Machine Learning (ICML 2026). Camera-ready version

详情
AI中文摘要

我们推导了一个新的用于强化学习的 PAC-Bayesian 泛化界,该界通过链的混合时间显式考虑了数据中的马尔可夫依赖性。这有助于克服在强化学习中获取泛化保证的挑战,因为数据的序列性质破坏了经典界所依赖的独立性假设。新界为现代离策略算法(如 Soft Actor-Critic)提供了非空泛证书。我们通过 PB-SAC 展示了该界的实际效用,这是一种在训练过程中优化该界以指导探索的新算法。在多个连续控制任务上的实验表明,所提出的方法在保持竞争性能的同时提供了有意义的置信度证书。

英文摘要

We derive a novel PAC-Bayesian generalization bound for reinforcement learning that explicitly accounts for Markov dependencies in the data, through the chain's mixing time. This contributes to overcoming challenges in obtaining generalization guarantees for reinforcement learning, where the sequential nature of data breaks the independence assumptions underlying classical bounds. The new bound provides non-vacuous certificates for modern off-policy algorithms such as Soft Actor-Critic. We demonstrate the practical utility of the bound through PB-SAC, a novel algorithm that optimizes the bound during training to guide exploration. Experiments across several continuous control tasks show that the proposed approach provides meaningful confidence certificates while maintaining competitive performance.

2602.06902 2026-06-01 cs.LG stat.ML 版本更新

Parameter-free Dynamic Regret: Time-varying Movement Costs, Delayed Feedback, and Memory

无参数动态遗憾:时变移动成本、延迟反馈和记忆

Hao Qiu, Andrew Jacobsen, Emmanuel Esposito, Mengxiao Zhang

发表机构 * University of Iowa(爱荷华大学) National University of Singapore(新加坡国立大学)

AI总结 本文提出一种新算法,在具有时变移动成本的在线凸优化中,首次实现了比较器自适应的动态遗憾界,并应用于延迟反馈和时变记忆问题。

Comments 28 pages; v2: ICML 2026

详情
AI中文摘要

在本文中,我们研究了具有移动成本的无约束在线凸优化(OCO)中的动态遗憾。具体来说,我们通过允许移动成本系数$\lambda_t$随时间任意变化来推广标准设置。我们的主要贡献是一种新颖的算法,该算法为此设置建立了第一个比较器自适应动态遗憾界,保证$\widetilde{\mathcal{O}}(\sqrt{(M^2+MP_T)(T+\sum_t \lambda_t)})$遗憾,其中$P_T$是比较器序列在$T$轮上的路径长度,$M$是最大比较器范数。我们的结果恢复了OCO中静态和动态遗憾的最优自适应率,作为所有轮次中$\lambda_t=0$的特例。为了展示我们结果的多功能性,我们考虑了两个应用:具有延迟反馈的OCO和具有时变记忆的OCO。我们表明这两个问题都可以转化为时变移动成本,特别是为延迟反馈设置建立了一种新颖的归约,这具有独立的意义。一个关键的观察是,我们的遗憾界中对移动成本的一阶依赖在实现两种设置中的最优比较器自适应动态遗憾保证中起着关键作用。

英文摘要

In this paper, we study dynamic regret in unconstrained online convex optimization (OCO) with movement costs. Specifically, we generalize the standard setting by allowing the movement cost coefficients $\lambda_t$ to vary arbitrarily over time. Our main contribution is a novel algorithm that establishes the first comparator-adaptive dynamic regret bound for this setting, guaranteeing $\widetilde{\mathcal{O}}(\sqrt{(M^2+MP_T)(T+\sum_t \lambda_t)})$ regret, where $P_T$ is the path length of the comparator sequence over $T$ rounds and $M$ is the maximal comparator norm. Our result recovers the optimal adaptive rates for both static and dynamic regret in OCO as the special case where $\lambda_t=0$ for all rounds. To demonstrate the versatility of our results, we consider two applications: OCO with delayed feedback and OCO with time-varying memory. We show that both problems can be translated into time-varying movement costs, establishing a novel reduction specifically for the delayed feedback setting that is of independent interest. A crucial observation is that the first-order dependence on movement costs in our regret bound plays a key role in enabling optimal comparator-adaptive dynamic regret guarantees in both settings.
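The special cases mentioned in the abstract follow by direct substitution into the stated bound. The symbol $\mathrm{Regret}_T$ below is shorthand introduced here for illustration; the bound itself is the one quoted in the abstract.

```latex
\begin{align*}
\text{General bound:}\quad
  \mathrm{Regret}_T &= \widetilde{\mathcal{O}}\Big(\sqrt{(M^2 + M P_T)\big(T + \textstyle\sum_t \lambda_t\big)}\Big) \\
\text{No movement costs } (\lambda_t = 0 \ \forall t):\quad
  \mathrm{Regret}_T &= \widetilde{\mathcal{O}}\Big(\sqrt{(M^2 + M P_T)\,T}\Big) \\
\text{Static comparator as well } (P_T = 0):\quad
  \mathrm{Regret}_T &= \widetilde{\mathcal{O}}\big(M\sqrt{T}\big)
\end{align*}
```

The total movement cost $\sum_t \lambda_t$ enters additively next to $T$ inside the square root, so it acts like extra rounds rather than multiplying the comparator-dependent factor.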

2602.00942 2026-06-01 cs.LG 版本更新

SALAAD: Sparse And Low-Rank Adaptation via ADMM for Large Language Model Inference

SALAAD: 基于ADMM的稀疏低秩适配用于大语言模型推理

Hao Ma, Melis Ilayda Bal, Liang Zhang, Bingcong Li, Niao He, Melanie Zeilinger, Michael Muehlebach

发表机构 * ETH Zurich(苏黎世联邦理工学院) Max Planck Institute for Intelligent Systems(马克斯·普朗克智能系统研究所) École polytechnique fédérale de Lausanne (EPFL)(洛桑联邦理工学院(EPFL))

AI总结 提出SALAAD框架,通过增广拉格朗日方法在训练中诱导稀疏低秩结构,实现模型容量灵活控制,降低部署内存且无需重训。

详情
AI中文摘要

现代大型语言模型越来越多地在计算和内存限制下部署,使得模型容量的灵活控制成为核心挑战。虽然稀疏和低秩结构自然地权衡了容量和性能,但现有方法通常依赖于忽略层和矩阵异质性的启发式设计,或需要特定于模型的架构修改。我们提出了SALAAD,一个适用于不同模型架构的即插即用框架,在训练过程中诱导稀疏和低秩结构。通过在增广拉格朗日框架下制定结构化权重学习,并引入自适应控制器动态平衡训练损失和结构约束,SALAAD保持了标准训练动态的稳定性,同时实现了对训练过程中有效模型容量演变的显式控制。跨模型规模的实验表明,SALAAD在部署期间显著减少了内存消耗,同时实现了与特设方法相当的性能。此外,单次训练运行产生了一个连续谱的模型容量,使得能够在不同的内存预算下实现平滑和弹性的部署,而无需重新训练。

英文摘要

Modern large language models are increasingly deployed under compute and memory constraints, making flexible control of model capacity a central challenge. While sparse and low-rank structures naturally trade off capacity and performance, existing approaches often rely on heuristic designs that ignore layer and matrix heterogeneity or require model-specific architectural modifications. We propose SALAAD, a plug-and-play framework applicable to different model architectures that induces sparse and low-rank structures during training. By formulating structured weight learning under an augmented Lagrangian framework and introducing an adaptive controller that dynamically balances the training loss and structural constraints, SALAAD preserves the stability of standard training dynamics while enabling explicit control over the evolution of effective model capacity during training. Experiments across model scales show that SALAAD substantially reduces memory consumption during deployment while achieving performance comparable to ad-hoc methods. Moreover, a single training run yields a continuous spectrum of model capacities, enabling smooth and elastic deployment across diverse memory budgets without the need for retraining.
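The building blocks behind sparse-plus-low-rank structure under an augmented Lagrangian can be sketched with the textbook ADMM splitting of a matrix `W` into a low-rank part `L` and a sparse part `S`. This is a generic robust-PCA-style illustration, not the SALAAD training procedure: there is no training loss or adaptive controller here, and the penalty `lam`, the parameter `mu`, and all function names are conventional or hypothetical choices.

```python
import numpy as np

def soft_threshold(X, tau):
    """Proximal operator of tau * ||X||_1: shrinks entries toward zero (sparsity)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def singular_value_threshold(X, tau):
    """Proximal operator of tau * nuclear norm: shrinks singular values (low rank)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def sparse_plus_low_rank(W, n_iters=500):
    """ADMM for: min ||L||_* + lam * ||S||_1  subject to  L + S = W."""
    m, n = W.shape
    lam = 1.0 / np.sqrt(max(m, n))        # common default weight on the sparse term
    mu = m * n / (4.0 * np.abs(W).sum())  # common default penalty parameter
    S = np.zeros_like(W)
    Y = np.zeros_like(W)                  # dual variable for the constraint
    for _ in range(n_iters):
        L = singular_value_threshold(W - S + Y / mu, 1.0 / mu)
        S = soft_threshold(W - L + Y / mu, lam / mu)
        Y = Y + mu * (W - L - S)          # dual ascent on the constraint violation
    return L, S

rng = np.random.default_rng(0)
low_rank = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 20))
sparse = np.zeros((20, 20))
np.put(sparse, rng.choice(400, size=20, replace=False), 5.0 * rng.standard_normal(20))
W = low_rank + sparse
L, S = sparse_plus_low_rank(W)
rel_residual = np.linalg.norm(W - L - S) / np.linalg.norm(W)
```

Storing `L` in factored form and `S` in a sparse format is what would save memory at deployment; how much depends on the rank and sparsity actually reached.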

2601.01754 2026-06-01 cs.LG cs.CC cs.CL cs.FL 版本更新

Context-Free Recognition with Transformers

使用Transformer进行上下文无关语言识别

Selim Jerad, Anej Svete, Sophie Hao, Ryan Cotterell, William Merrill

发表机构 * ETH Zürich(苏黎世联邦理工学院) Boston University(波士顿大学) Allen Institute for AI(艾伦人工智能研究所)

AI总结 本文证明循环Transformer通过O(log N)层和O(N^6)填充符号可识别所有上下文无关语言,并针对无歧义子类将填充需求降至O(N^3)。

详情
AI中文摘要

Transformer在处理符合某种语法的良好形式输入(如自然语言和代码)的任务中表现出色。然而,它们如何处理语法句法仍不清楚。事实上,在标准复杂性猜想下,标准Transformer无法识别上下文无关语言(CFL)——一种描述句法的规范形式,甚至无法识别正则语言(CFL的子类)。过去的工作表明,O(log(N))循环层(相对于输入长度N)允许Transformer识别正则语言,但循环Transformer识别上下文无关语言的问题仍然开放。在这项工作中,我们证明具有O(log(N))循环层和O(N^6)填充符号的循环Transformer可以识别所有CFL。然而,使用O(N^6)填充符号的训练和推理可能不切实际。幸运的是,我们表明,对于无歧义CFL等自然子类,Transformer上的识别问题变得更加易处理,只需要O(N^3)填充。实验上,循环和填充Transformer在识别CFL方面比固定深度Transformer表现更好。总体而言,我们的结果揭示了Transformer识别CFL的复杂性:虽然一般识别可能需要难以处理的填充量,但无歧义性等自然约束产生了高效的识别算法。

英文摘要

Transformers excel empirically on tasks that process well-formed inputs according to some grammar, such as natural language and code. However, it remains unclear how they can process grammatical syntax. In fact, under standard complexity conjectures, standard transformers cannot recognize context-free languages (CFLs), a canonical formalism to describe syntax, or even regular languages, a subclass of CFLs. Past work has shown that $\mathcal{O}(\log(N))$ looping layers (w.r.t. input length $N$) allow transformers to recognize regular languages, but the question of context-free recognition with looped transformers remained open. In this work, we show that looped transformers with $\mathcal{O}(\log(N))$ looping layers and $\mathcal{O}(N^6)$ padding symbols can recognize all CFLs. However, training and inference with $\mathcal{O}(N^6)$ padding symbols is potentially impractical. Fortunately, we show that, for natural subclasses such as unambiguous CFLs, the recognition problem on transformers becomes more tractable, requiring $\mathcal{O}(N^3)$ padding. Empirically, looped and padded transformers perform better than fixed-depth transformers in recognizing CFLs. Overall, our results shed light on the intricacy of CFL recognition by transformers: while general recognition may require an intractable amount of padding, natural constraints such as unambiguity yield efficient recognition algorithms.
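For reference, context-free recognition in the classical (non-neural) sense can be sketched with the CYK dynamic program, which decides membership for a grammar in Chomsky normal form. This is the textbook algorithm, not the looped-transformer construction from the paper; the toy grammar for non-empty balanced parentheses is a hypothetical example.

```python
def cyk_recognize(word, unary, binary, start="S"):
    """CYK recognition for a grammar in Chomsky normal form.
    unary:  list of (A, terminal) rules  A -> terminal
    binary: list of (A, (B, C)) rules    A -> B C
    """
    n = len(word)
    if n == 0:
        return False
    # table[i][l] holds the nonterminals that derive word[i:i+l]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][1] = {A for A, t in unary if t == ch}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split]
                right = table[i + split][length - split]
                for A, (B, C) in binary:
                    if B in left and C in right:
                        table[i][length].add(A)
    return start in table[0][n]

# Hypothetical CNF grammar for non-empty balanced parentheses:
#   S -> A B | A C | S S,   C -> S B,   A -> '(',   B -> ')'
unary_rules = [("A", "("), ("B", ")")]
binary_rules = [("S", ("A", "B")), ("S", ("A", "C")), ("S", ("S", "S")), ("C", ("S", "B"))]

accepted = cyk_recognize("(()())", unary_rules, binary_rules)
rejected = cyk_recognize("(()", unary_rules, binary_rules)
```

The three nested loops over span length, start position, and split point give the familiar cubic running time in the input length for a fixed grammar.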

2405.07836 2026-06-01 cs.LG stat.ME 版本更新

Forecasting with Hyper-Trees

超树预测

Alexander März, Kashif Rasul

发表机构 * Independent Researcher(独立研究者) Morgan Stanley Research(摩根士丹利研究)

AI总结 提出超树框架,通过梯度提升树学习目标时间序列模型(如ARIMA或指数平滑)的参数,结合决策树与经典预测模型,并引入混合架构解决高维参数估计的缩放限制。

Comments Gradient Boosted Trees, Hyper Models, Hybrid Models, Time Series Forecasting, Time-Varying Parameters

详情
AI中文摘要

我们引入超树作为一种新颖的框架,用于使用梯度提升树对时间序列数据进行建模。与直接预测时间序列的传统树方法不同,超树学习目标时间序列模型(如ARIMA或指数平滑)的参数,这些参数是特征的函数。然后,目标模型使用这些参数生成最终预测。我们的框架将决策树在表格数据上的有效性与经典预测模型相结合,从而将时间序列归纳偏差引入树模型。为了解决提升树在估计高维目标模型参数时的缩放限制,我们将决策树和神经网络结合在一个统一的框架中。在这种混合方法中,树从输入特征生成信息表示,然后浅层网络将其作为输入来学习时间序列模型的参数。通过我们的研究,我们探索了超树在各种预测任务中的有效性,并将基于树的建模扩展到时间序列分析中的传统用途之外。

英文摘要

We introduce Hyper-Trees as a novel framework for modeling time series data using gradient boosted trees. Unlike conventional tree-based approaches that forecast time series directly, Hyper-Trees learn the parameters of a target time series model, such as ARIMA or Exponential Smoothing, as functions of features. These parameters are then used by the target model to generate the final forecasts. Our framework combines the effectiveness of decision trees on tabular data with classical forecasting models, thereby inducing a time series inductive bias into tree-based models. To resolve the scaling limitations of boosted trees when estimating a high-dimensional set of target model parameters, we combine decision trees and neural networks within a unified framework. In this hybrid approach, the trees generate informative representations from the input features, which a shallow network then uses as input to learn the parameters of a time series model. With our research, we explore the effectiveness of Hyper-Trees across a range of forecasting tasks and extend tree-based modeling beyond its conventional use in time series analysis.
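The idea of a tree predicting the parameters of a forecasting model, rather than the forecast itself, can be sketched with simple exponential smoothing. Everything below is a hypothetical illustration: `stump_alpha` is a hand-written decision stump standing in for the boosted trees, and the feature and series values are made up.

```python
def stump_alpha(feature):
    """Hypothetical one-split 'tree': maps a feature to a smoothing parameter."""
    return 0.8 if feature > 0 else 0.2

def ses_forecast(series, features, alpha_fn):
    """Simple exponential smoothing whose smoothing parameter alpha is
    produced per time step by alpha_fn(feature) instead of being fixed."""
    level = series[0]
    for y, x in zip(series[1:], features[1:]):
        alpha = alpha_fn(x)
        level = alpha * y + (1.0 - alpha) * level
    return level  # one-step-ahead forecast

forecast = ses_forecast([10.0, 20.0, 30.0], [0.0, 1.0, -1.0], stump_alpha)
```

The forecast still comes from the classical recursion, which is where the time series inductive bias enters; the tree only decides how strongly each new observation updates the level.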

2601.19791 2026-06-01 cs.LG stat.ML 版本更新

To Grok Grokking: Provable Grokking in Ridge Regression

理解Grokking:岭回归中可证明的Grokking现象

Mingyue Xu, Gal Vardi, Itay Safran

发表机构 * Department of Computer Science, Purdue University, West Lafayette, IN, USA(普渡大学计算机科学系) Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Israel(魏茨曼科学研究所计算机科学与应用数学系) Stein Faculty of Computer and Information Science, Ben-Gurion University of the Negev, Israel(内盖夫本-古里安大学斯坦因计算机与信息科学学院)

AI总结 本文在经典岭回归设置中研究grokking现象,证明使用带权重衰减的梯度下降学习过参数化线性回归模型时,存在过拟合、泛化延迟和最终泛化误差任意小的三个阶段,并首次给出泛化延迟(grokking时间)的严格定量界,同时通过实验表明该界也适用于非线性神经网络。

详情
AI中文摘要

我们在经典岭回归设置中研究grokking现象,即过拟合后很久才出现泛化。我们证明了使用带权重衰减的梯度下降学习过参数化线性回归模型的端到端grokking结果。具体地,我们证明以下阶段发生:(i) 训练早期模型过拟合训练数据;(ii) 过拟合显现后长时间泛化性能差;(iii) 泛化误差最终变得任意小。此外,我们从理论和实验上表明,通过适当的超参数调优,可以以原则性的方式放大或消除grokking。据我们所知,这是首次以训练超参数表示的泛化延迟(我们称之为“grokking时间”)的严格定量界。最后,超越线性设置,我们实验证明我们的定量界也捕捉了非线性神经网络上grokking的行为。我们的结果表明,grokking不是深度学习固有的失败模式,而是特定训练条件的结果,因此不需要对模型架构或学习算法进行根本性改变来避免。

英文摘要

We study grokking, the onset of generalization long after overfitting, in a classical ridge regression setting. We prove end-to-end grokking results for learning over-parameterized linear regression models using gradient descent with weight decay. Specifically, we prove that the following stages occur: (i) the model overfits the training data early during training; (ii) poor generalization persists long after overfitting has manifested; and (iii) the generalization error eventually becomes arbitrarily small. Moreover, we show, both theoretically and empirically, that grokking can be amplified or eliminated in a principled manner through proper hyperparameter tuning. To the best of our knowledge, these are the first rigorous quantitative bounds on the generalization delay (which we refer to as the "grokking time") in terms of training hyperparameters. Lastly, going beyond the linear setting, we empirically demonstrate that our quantitative bounds also capture the behavior of grokking on non-linear neural networks. Our results suggest that grokking is not an inherent failure mode of deep learning, but rather a consequence of specific training conditions, and thus does not require fundamental changes to the model architecture or learning algorithm to avoid.
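The training setup studied here, gradient descent with weight decay on an over-parameterized linear model, can be sketched in a few lines. The dimensions, step size, and weight-decay value are hypothetical, and the sketch only checks that the iterates approach the closed-form ridge solution; it does not reproduce the paper's grokking-time bounds.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                       # fewer samples than parameters
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d) / np.sqrt(d)
y = X @ w_true

weight_decay = 0.1
H = X.T @ X / n                      # Hessian of the unregularized squared loss
step = 1.0 / (np.linalg.eigvalsh(H).max() + weight_decay)

w = np.zeros(d)
for _ in range(5000):
    # gradient of (1/2n)||Xw - y||^2 + (weight_decay/2)||w||^2
    grad = X.T @ (X @ w - y) / n + weight_decay * w
    w -= step * grad

# Closed-form minimizer of the same regularized objective (ridge regression).
w_ridge = np.linalg.solve(H + weight_decay * np.eye(d), X.T @ y / n)
```

Logging train and test error inside the loop for different `step` and `weight_decay` values is a natural way to explore how long generalization lags behind fitting the training data.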

2602.05649 2026-06-01 cs.LG 版本更新

End-to-End Compression for Tabular Foundation Models

表格基础模型的端到端压缩

Guri Zabërgja, Rafiq Kamel, Arlind Kadra, Christian M. M. Frey, Josif Grabocka

发表机构 * Department of Computer Science, Technical University of Nuremberg(纽伦堡技术大学计算机科学系) Department of Computer Science, University of Freiburg(弗赖堡大学计算机科学系)

AI总结 提出TACO,一种端到端表格压缩模型,在潜在空间压缩训练数据,以解决表格Transformer在推理时间和内存上的二次复杂度问题,在TabArena基准上实现高达94倍加速和97%内存节省,且性能无明显下降。

Comments Accepted as Spotlight at ICML 2026

详情
AI中文摘要

长期以来,梯度提升决策树在表格数据上的主导地位最近受到了上下文学习表格基础模型的挑战。上下文学习方法通过将训练数据作为上下文来预测查询测试点,无需参数更新即可在一次前向传播中完成拟合和预测。尽管最近的表格基础模型达到了最先进的性能,但基于注意力机制的Transformer架构在数据集大小上具有二次复杂度,这增加了训练和推理时间的开销,并限制了模型处理大规模数据集的能力。在这项工作中,我们提出了TACO,一种端到端的表格压缩模型,它在潜在空间中压缩训练数据集。我们在TabArena基准上测试了我们的方法,与最先进的表格Transformer架构相比,我们的方法在推理时间上快了高达94倍,同时内存消耗减少了97%,且性能没有显著下降。最后,我们的方法不仅随着数据集规模的增大而更好地扩展,而且与其他基线相比也取得了更好的性能。

英文摘要

The long-standing dominance of gradient-boosted decision trees for tabular data has recently been challenged by in-context learning tabular foundation models. In-context learning methods fit and predict in one forward pass without parameter updates by leveraging the training data as context for predicting on query test points. While recent tabular foundation models achieve state-of-the-art performance, their transformer architecture based on the attention mechanism has quadratic complexity in dataset size, which in turn increases the overhead on training and inference time, and limits the capacity of the models to handle large-scale datasets. In this work, we propose TACO, an end-to-end tabular compression model that compresses the training dataset in a latent space. We test our method on the TabArena benchmark, where our proposed method is up to 94x faster in inference time, while consuming up to 97% less memory compared to the state-of-the-art tabular transformer architecture, all while retaining performance without significant degradation. Lastly, our method not only scales better with increased dataset sizes, but it also achieves better performance compared to other baselines.

2512.14980 2026-06-01 cs.LG 版本更新

Softly Constrained Denoisers for Diffusion Models Applied to Partial Differential Equations

应用于偏微分方程的扩散模型的软约束去噪器

Victor M. Yeom-Song, Severi Rissanen, Arno Solin, Samuel Kaski, Mingfei Sun

发表机构 * ELLIS Institute Finland(芬兰ELLIS研究所) Aalto University(阿尔托大学) University of Manchester(曼彻斯特大学)

AI总结 提出在扩散模型的去噪器中引入基于偏微分方程的软归纳偏置,以在提高约束遵从性的同时保持对模型错误指定的适应性。

Comments 22 pages including appendix, 8 figures including appendix, preprint

详情
AI中文摘要

扩散模型已成为偏微分方程解的强大生成先验。现有方法通过将PDE残差添加为损失正则化器或通过推理时调整来强制执行物理约束。这些方法使模型偏离真实数据分布,当控制PDE被错误指定时尤其成问题。为了规避这些问题同时充分利用PDE约束,我们在从PDE导出的去噪器架构中引入软归纳偏置。我们表明,这些软约束去噪器利用约束知识来改善对标准去噪器的遵从性,同时在相对于观测数据存在错误指定的情况下保持足够的灵活性以偏离它。

英文摘要

Diffusion models have become a powerful generative prior for solutions of partial differential equations (PDEs). Existing approaches enforce physical constraints either by adding the PDE residuals as loss regularizers or through inference-time adjustments. These methods bias the model away from the true data distribution, which is especially problematic when the governing PDE is misspecified. To circumvent these issues while making the most out of the PDE constraint, we introduce soft inductive biases into the denoiser architecture derived from the PDEs. We show that these softly constrained denoisers exploit constraint knowledge to improve compliance over standard denoisers, while maintaining enough flexibility to deviate from it in case of misspecification with respect to observed data.

2602.04737 2026-06-01 cs.LG 版本更新

Rationality Measurement and Theory for Reinforcement Learning Agents

强化学习智能体的理性度量与理论

Kejiang Qian, Amos Storkey, Fengxiang He

发表机构 * University of Edinburgh(爱丁堡大学)

AI总结 本文提出一套理性度量及其理论,用于评估强化学习智能体在部署中的行为理性,并分解理性风险差距为环境变化和算法泛化能力两部分。

详情
AI中文摘要

本文针对强化学习智能体提出了一套理性度量及其相关理论,该属性日益关键但鲜有探索。我们定义部署中的行动为完全理性,如果它在最陡方向上最大化隐藏的真实价值函数。策略行动与其理性对应物的期望价值差异,在部署轨迹上累积,被定义为期望理性风险;训练中的经验平均版本也被定义。它们的差异称为理性风险差距,被分解为(1)由训练和部署之间环境变化引起的外在成分,以及(2)由算法在动态环境中的泛化能力引起的内在成分。它们分别被(1)训练和部署中转移核与初始状态分布之间的$1$-Wasserstein距离,以及(2)价值函数类的经验Rademacher复杂度所上界。我们的理论提出了关于正则化(包括层归一化、$\ell_2$正则化和权重归一化)和领域随机化的益处,以及环境变化的危害的假设。实验与这些假设完全一致。代码可在https://github.com/EVIEHub/Rationality获取。

英文摘要

This paper proposes a suite of rationality measures and associated theory for reinforcement learning agents, a property increasingly critical yet rarely explored. We define an action in deployment to be perfectly rational if it maximises the hidden true value function in the steepest direction. The expected value discrepancy of a policy's actions against their rational counterparts, culminating over the trajectory in deployment, is defined to be expected rational risk; an empirical average version in training is also defined. Their difference, termed as rational risk gap, is decomposed into (1) an extrinsic component caused by environment shifts between training and deployment, and (2) an intrinsic one due to the algorithm's generalisability in a dynamic environment. They are upper bounded by, respectively, (1) the $1$-Wasserstein distance between transition kernels and initial state distributions in training and deployment, and (2) the empirical Rademacher complexity of the value function class. Our theory suggests hypotheses on the benefits from regularisers (including layer normalisation, $\ell_2$ regularisation, and weight normalisation) and domain randomisation, as well as the harm from environment shifts. Experiments are in full agreement with these hypotheses. The code is available at https://github.com/EVIEHub/Rationality.

2506.05994 2026-06-01 cs.LG cs.AR cs.ET 版本更新

RETENTION: Resource-Efficient Tree-Based Ensemble Model Acceleration with Content-Addressable Memory

RETENTION: 基于内容可寻址存储器的资源高效树集成模型加速

Yi-Chun Liao, Chieh-Lin Tsai, Yuan-Hao Chang, Camélia Slimani, Jalil Boukhobza, Tei-Wei Kuo

发表机构 * Department of Computer Science and Information Engineering, National Taiwan University(国立台湾大学计算机科学与资讯工程学系) IRIT, Université de Toulouse, Toulouse INP–UT3, CNRS(图卢兹大学IRIT实验室) Lab-STICC, CNRS UMR 6285, ENSTA, Institut Polytechnique de Paris(ENSTA巴黎理工学院Lab-STICC实验室)

AI总结 提出RETENTION框架,通过迭代剪枝算法和树映射方案,显著减少内容可寻址存储器容量需求,实现资源高效的树集成模型加速。

Comments Under review by IEEE Transactions on Computer-Aided Design of Integrated Circuits & Systems

详情
AI中文摘要

尽管深度学习在处理非结构化数据方面展现了卓越的能力,但现代基于树的集成模型在从结构化数据中提取相关信息和学习方面仍然更胜一筹。虽然已有若干工作致力于加速树模型,但模型的固有特性对传统加速器构成了重大挑战。最近利用内容可寻址存储器(CAM)的研究为加速树模型提供了有前景的解决方案,然而现有设计存在内存消耗过大和利用率低的问题。本文通过引入RETENTION,一个端到端框架,显著降低了树模型推理的CAM容量需求。我们提出了一种迭代剪枝算法,该算法具有针对基于装袋模型(例如随机森林)的新颖剪枝准则,在确保受控精度下降的同时最小化模型复杂度。此外,我们提出了一种树映射方案,其中包含两种创新的数据放置策略,以缓解CAM中广泛使用的无关状态导致的内存冗余。实验结果表明,仅实施树映射方案即可将CAM容量需求降低1.46倍至21.30倍,而完整的RETENTION框架在精度损失小于3%的情况下实现了4.35倍至207.12倍的降低。这些结果表明,RETENTION在最小化CAM资源需求方面非常有效,为树模型加速提供了一种资源高效的方向。

英文摘要

Although deep learning has demonstrated remarkable capability in learning from unstructured data, modern tree-based ensemble models remain superior in extracting relevant information and learning from structured datasets. While several efforts have been made to accelerate tree-based models, the inherent characteristics of the models pose significant challenges for conventional accelerators. Recent research leveraging content-addressable memory (CAM) offers a promising solution for accelerating tree-based models, yet existing designs suffer from excessive memory consumption and low utilization. This work addresses these challenges by introducing RETENTION, an end-to-end framework that significantly reduces CAM capacity requirement for tree-based model inference. We propose an iterative pruning algorithm with a novel pruning criterion tailored for bagging-based models (e.g., Random Forest), which minimizes model complexity while ensuring controlled accuracy degradation. Additionally, we present a tree mapping scheme that incorporates two innovative data placement strategies to alleviate the memory redundancy caused by the widespread use of don't-care states in CAM. Experimental results show that implementing the tree mapping scheme alone reduces CAM capacity requirement by $1.46\times$ to $21.30\times$, while the full RETENTION framework achieves $4.35\times$ to $207.12\times$ reduction with less than 3% accuracy loss. These results demonstrate that RETENTION is highly effective in minimizing CAM resource demand, providing a resource-efficient direction for tree-based model acceleration.
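Why don't-care states appear when a tree is stored in a CAM can be seen from a generic mapping in which each root-to-leaf path becomes one row of per-feature ranges. The sketch below is a software illustration under stated assumptions, not the RETENTION mapping scheme or its placement strategies: the toy tree, the row format, and all function names are hypothetical, and a real CAM would match all rows in parallel rather than in a loop.

```python
import math

# Hypothetical toy tree: internal nodes send x to "left" when x[feature] < threshold.
tree = {"feature": 0, "threshold": 5.0,
        "left": {"leaf": "A"},
        "right": {"feature": 1, "threshold": 2.0,
                  "left": {"leaf": "B"},
                  "right": {"leaf": "C"}}}

DONT_CARE = (-math.inf, math.inf)  # a feature the path never tests

def tree_to_rows(node, n_features, bounds=None):
    """Turn every root-to-leaf path into one row of [lo, hi) ranges per feature."""
    if bounds is None:
        bounds = [DONT_CARE] * n_features
    if "leaf" in node:
        return [(list(bounds), node["leaf"])]
    f, t = node["feature"], node["threshold"]
    lo, hi = bounds[f]
    left_bounds, right_bounds = list(bounds), list(bounds)
    left_bounds[f] = (lo, min(hi, t))
    right_bounds[f] = (max(lo, t), hi)
    return (tree_to_rows(node["left"], n_features, left_bounds)
            + tree_to_rows(node["right"], n_features, right_bounds))

def cam_lookup(rows, x):
    """Return the label of the row whose ranges all contain the query."""
    for bounds, label in rows:
        if all(lo <= v < hi for v, (lo, hi) in zip(x, bounds)):
            return label
    return None

def dont_care_fraction(rows):
    """Share of stored cells that carry no information about the path."""
    cells = [b for bounds, _ in rows for b in bounds]
    return sum(b == DONT_CARE for b in cells) / len(cells)

rows = tree_to_rows(tree, n_features=2)
```

Even in this three-leaf tree one of the six stored cells is a don't-care; with many features and shallow paths the share grows, which is the redundancy the abstract refers to.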

2602.04107 2026-06-01 cs.LG cs.IT math.IT 版本更新

Supervised Learning as Lossy Compression: Characterizing Generalization and Sample Complexity via Finite Blocklength Analysis

监督学习作为有损压缩:通过有限块长分析刻画泛化与样本复杂度

Kosuke Sugiyama, Masato Uchida

发表机构 * Waseda University(早稻田大学)

AI总结 本文通过将学习问题置于有损压缩框架中并应用有限块长分析,从信息论角度推导了固定随机学习算法及其最优采样策略的样本复杂度和泛化误差下界,显式分离了过拟合程度与归纳偏置-任务不匹配项。

Comments 40 pages, 1 figure

详情
AI中文摘要

本文通过将学习问题置于有损压缩的背景下并应用有限块长分析,提出了一种关于机器学习中泛化的新颖信息论视角。在我们的方法中,训练数据的采样形式上对应于编码过程,而模型构建对应于解码过程。通过利用有限块长分析,我们推导了固定随机学习算法及其相关最优采样策略的样本复杂度和泛化误差的下界。我们的界限明确地将学习算法的过拟合程度与其归纳偏置和任务之间的不匹配作为不同的项进行刻画。这种分离提供了相对于现有框架的显著优势。此外,我们分解了过拟合项,以显示其与信息论界限和稳定性理论中现有度量的理论联系,从而在我们的提议框架下统一了这些视角。

英文摘要

This paper presents a novel information-theoretic perspective on generalization in machine learning by framing the learning problem within the context of lossy compression and applying finite blocklength analysis. In our approach, the sampling of training data formally corresponds to an encoding process, and the model construction to a decoding process. By leveraging finite blocklength analysis, we derive lower bounds on sample complexity and generalization error for a fixed randomized learning algorithm and its associated optimal sampling strategy. Our bounds explicitly characterize the degree of overfitting of the learning algorithm and the mismatch between its inductive bias and the task as distinct terms. This separation provides a significant advantage over existing frameworks. Additionally, we decompose the overfitting term to show its theoretical connection to existing metrics found in information-theoretic bounds and stability theory, unifying these perspectives under our proposed framework.

2602.04031 2026-06-01 cs.LG 版本更新

The Illusion of Generalization in Tabular Language Models

表格语言模型中的泛化错觉

Aditya Gorla, Ratish Puduppully

发表机构 * University of California, Los Angeles, USA(加州大学洛杉矶分校) IT University of Copenhagen, Denmark(哥本哈根IT大学)

AI总结 通过系统评估Tabula-8B在165个数据集上的表现,发现其声称的泛化能力主要源于评估伪影(如数据污染和格式熟悉度),而非真正的表格推理。

详情
Journal ref
In Proc. 43rd International Conference on Machine Learning (ICML 2026)
AI中文摘要

表格语言模型(TLMs)据称在表格预测中实现了强大的泛化能力。我们对代表性TLM——Tabula-8B进行了系统性的重新评估,使用了UniPredict基准中的165个数据集。我们的研究揭示了三个发现。首先,二分类和多类别分类在多数类基线上实现了接近零的中位数提升,而强大的聚合性能完全由四分位数分类任务驱动。其次,表现最好的数据集存在普遍的数据污染,包括完整的训练-测试重叠和任务级泄露,这些污染规避了标准的去重方法。第三,在没有表格数据暴露的情况下进行指令微调,恢复了标准分类性能的92.2%,而在四分位数分类上,格式熟悉度缩小了71.3%的差距,剩余部分归因于污染数据集。这些发现表明,声称的泛化能力可能反映的是评估伪影,而非学到的表格推理。最后,我们提出了加强TLM评估的建议。

英文摘要

Tabular Language Models (TLMs) have been claimed to achieve strong generalization for tabular prediction. We conduct a systematic re-evaluation of Tabula-8B as a representative TLM, utilizing 165 datasets from the UniPredict benchmark. Our investigation reveals three findings. First, binary and categorical classification achieve near-zero median lift over majority-class baselines, and strong aggregate performance is driven entirely by quartile classification tasks. Second, top-performing datasets exhibit pervasive contamination, including complete train-test overlap and task-level leakage that evades standard deduplication. Third, instruction-tuning without tabular exposure recovers 92.2% of standard classification performance; on quartile classification, format familiarity closes 71.3% of the gap, with the residual attributable to contaminated datasets. These findings suggest that claimed generalization likely reflects evaluation artifacts rather than learned tabular reasoning. We conclude with recommendations for strengthening TLM evaluation.
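The comparison against a majority-class baseline can be sketched as below. The abstract does not define "lift" precisely, so the sketch assumes it is accuracy minus the accuracy of always predicting the most frequent label; the function name and the toy labels are hypothetical.

```python
from collections import Counter

def majority_class_lift(y_true, y_pred):
    """Accuracy minus the accuracy of always predicting the most frequent label
    (an assumed definition of lift, used here only for illustration)."""
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    baseline = Counter(y_true).most_common(1)[0][1] / len(y_true)
    return accuracy - baseline

# 75% accuracy looks respectable, yet it matches the majority-class baseline.
imbalanced_lift = majority_class_lift([0, 0, 0, 1], [0, 0, 0, 0])
balanced_lift = majority_class_lift([0, 0, 1, 1], [0, 0, 1, 1])
```

On imbalanced datasets a high raw accuracy can correspond to zero lift, which is why aggregate accuracy alone can overstate generalization.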

2602.03896 2026-06-01 stat.ML cs.LG q-bio.NC 版本更新

A hitchhiker's guide to Poisson gradient estimation

泊松梯度估计的旅行者指南

Michael Ibrahim, Hanqi Zhao, Eli Sennesh, Zhi Li, Anqi Wu, Jacob L. Yates, Chengrui Li, Hadi Vafaii

发表机构 * Georgia Institute of Technology(佐治亚理工学院) Reality Labs, Meta(Meta现实实验室) Aerospace Information Research Institute, Chinese Academy of Sciences(中国科学院空天信息创新研究院) Redwood Center for Theoretical Neuroscience, UC Berkeley(加州大学伯克利分校Redwood理论神经科学中心) VERSES AI Research Lab, Los Angeles, USA(洛杉矶VERSES AI研究实验室)

AI总结 本文系统比较了指数到达时间模拟和Gumbel-SoftMax松弛两种方法,提出改进的EAT方法以降低偏差,并在泊松潜变量模型上验证其优越性能。

Comments Published at ICML2026 --- code: https://github.com/hadivafaii/PoissonGradientEstimation

Details
Abstract

Poisson-distributed latent variable models are widely used in computational neuroscience, but differentiating through discrete stochastic samples remains challenging. Two approaches address this: *Exponential Arrival Time* (EAT) simulation and *Gumbel-SoftMax* (GSM) relaxation. We provide the first systematic comparison of these methods, along with practical guidance for practitioners. Our main technical contribution is a modification to the EAT method that theoretically guarantees an unbiased first moment (exactly matching the firing rate), and reduces second-moment bias. We evaluate these methods on their distributional fidelity, gradient quality, and performance on two tasks: (1) variational autoencoders with Poisson latents, and (2) partially observable generalized linear models, where latent neural connectivity must be inferred from observed spike trains. Across all metrics, our modified EAT method exhibits better overall performance (often comparable to exact gradients), and substantially higher robustness to hyperparameter choices. These results extend to over-dispersed Negative Binomial latents, where modified EAT again performs best. However, only GSM generalizes to arbitrary non-Poisson distributions, including the under-dispersed regime. Together, our results clarify the trade-offs between these methods and offer concrete recommendations for practitioners working with Poisson latent variable models.
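For readers unfamiliar with the arrival-time view of a Poisson variable, the following is a minimal sketch of the textbook construction that arrival-time estimators start from: a Poisson count with a given rate equals the number of exponential inter-arrival times (with that rate) that fit inside a unit window. The `soft_count` function, its sigmoid indicator, and the `temperature` and `n_max` parameters are illustrative assumptions; this is not the authors' modified EAT method.

```python
import math
import random


def arrival_times(rate, n_max, rng):
    """Cumulative sums of n_max exponential inter-arrival times with the given rate."""
    times, t = [], 0.0
    for _ in range(n_max):
        t += rng.expovariate(rate)
        times.append(t)
    return times


def hard_count(times):
    """Poisson sample (truncated at len(times)): arrivals inside the window [0, 1]."""
    return sum(1 for t in times if t <= 1.0)


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def soft_count(times, temperature=0.05):
    """Hypothetical relaxation: each hard indicator 1[t <= 1] becomes a sigmoid."""
    return sum(_sigmoid((1.0 - t) / temperature) for t in times)


rng = random.Random(0)
rate = 3.0
samples = [hard_count(arrival_times(rate, 30, rng)) for _ in range(20000)]
mean_hard = sum(samples) / len(samples)
var_hard = sum((s - mean_hard) ** 2 for s in samples) / len(samples)
print(mean_hard, var_hard)  # both should land close to the rate for a Poisson variable
```

The hard count is piecewise constant in the rate, which is why some smooth surrogate is needed before gradients can flow; the abstract's point is that the choice of surrogate affects the first and second moments.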

2602.03655 2026-06-01 cs.LG Version update

Sequential Group Composition: A Window into the Mechanics of Deep Learning

Giovanni Luca Marchetti, Daniel Kunin, Adele Myers, Francisco Acosta, Nina Miolane

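Affiliations * KTH Royal Institute of Technology

AI summary Through the sequential group composition task, the paper studies how neural networks learn structured operations, reveals how group structure, encoding statistics, and sequence length affect learning, and shows that deeper architectures substantially improve the width scaling.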

Comments Accepted at ICML 2026

Details
Abstract

How do neural networks trained over sequences acquire the ability to perform structured operations, such as arithmetic, geometric, and algorithmic computation? To gain insight into this question, we introduce the sequential group composition task. In this task, networks receive a sequence of elements from a finite group encoded in a real vector space and must predict their cumulative product. This task can be order-sensitive and cannot be solved by a linear model. Our analysis isolates the roles of the group structure, encoding statistics, and sequence length in shaping learning. We prove that two-layer networks from vanishing initialization learn this task one irreducible representation of the group at a time in an order determined by the Fourier statistics of the encoding. To perfectly learn the task, these networks require a hidden width exponential in the sequence length $k$. In contrast, we construct deeper architectures that exploit associativity to dramatically improve this scaling: recurrent neural networks can compose elements sequentially in $k$ steps, while multilayer networks can compose adjacent pairs in parallel in $\log k$ layers. Overall, the sequential group composition task offers a tractable window into the mechanics of deep learning.
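The role of associativity in the depth argument can be illustrated without any neural network. The sketch below computes the cumulative product of a sequence of permutations (elements of the symmetric group on three points, chosen here only as a small non-abelian example) in two ways: a left-to-right scan with one composition per step, and rounds of adjacent-pair composition that finish in about log2(k) rounds. All function names are hypothetical.

```python
from functools import reduce


def compose(p, q):
    """Product of two permutations given as tuples: apply p first, then q."""
    return tuple(q[i] for i in p)


def sequential_product(seq):
    """Left-to-right scan: k - 1 compositions, one after another."""
    return reduce(compose, seq)


def tree_product(seq):
    """Compose adjacent pairs round by round; returns (product, number of rounds)."""
    level, rounds = list(seq), 0
    while len(level) > 1:
        nxt = [compose(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # an unpaired last element is carried to the next round
            nxt.append(level[-1])
        level, rounds = nxt, rounds + 1
    return level[0], rounds


a = (1, 0, 2)  # swaps points 0 and 1
b = (0, 2, 1)  # swaps points 1 and 2
seq = [a, b, a, a, b, b, a, b]

product, rounds = tree_product(seq)
print(product == sequential_product(seq), rounds)
print(compose(a, b) != compose(b, a))  # the group is non-abelian, so order matters
```

Within a round, every pair can be composed independently, which is the sense in which a layered architecture can trade sequential steps for depth.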

2601.20789 2026-06-01 cs.CL cs.LG cs.SE Version update

SERA: Soft-Verified Efficient Repository Agents

Ethan Shen, Daniel Tormoen, Saurabh Shah, Ali Farhadi, Tim Dettmers

Affiliations * Allen Institute for AI; University of Washington; Carnegie Mellon University; Paul G. Allen School of Computer Science and Engineering; Allen Institute of Artificial Intelligence; Machine Learning Department

AI summary Proposes SERA, which uses Soft Verified Generation (SVG) to train coding agents efficiently so that they adapt quickly to private codebases, achieving leading performance among open-source models at very low cost.

Comments 21 main pages, 6 pages appendix

Details
Abstract

Open-weight coding agents should hold a fundamental advantage over closed-source systems because they can specialize to private codebases, encoding repository-specific information directly in their weights. Yet the cost and complexity of training have kept this advantage theoretical until now. We present Soft-Verified Efficient Repository Agents (SERA), an efficient method for training coding agents that enables the rapid and cheap creation of agents specialized to private codebases. Using Soft Verified Generation (SVG), we generate thousands of trajectories from any code repository, without requiring unit tests. Beyond repository specialization, we apply SVG to a larger corpus of codebases, generating 200,000+ synthetic trajectories. Using only supervised finetuning (SFT), SERA achieves leading results among fully open-source (open data, method, code) models while matching the performance of open-weight models like Devstral-Small-2. Creating SERA models is 26x cheaper than reinforcement learning and 57x cheaper than previous synthetic data methods to reach equivalent performance. We use our dataset to provide detailed analysis of scaling laws, ablations, and confounding factors for training coding agents. Overall, we believe our work will greatly accelerate research on open coding agents and showcase the advantage of open-source models that can adapt to private codebases. We release SERA as the first model in Ai2's Open Coding Agents series, along with all our code, data, and Claude Code integration to support the research community.

2510.00845 2026-06-01 cs.LG cs.AI cs.CL Version update

Mechanistic Interpretability as Statistical Estimation: A Variance Analysis

Maxime Méloux, François Portet, Maxime Peyrard

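Affiliations * Université Grenoble Alpes, CNRS, Grenoble INP, LIG

AI summary Examines circuit discovery in mechanistic interpretability from a statistical estimation perspective, shows that the intrinsic variance of single-input scores in causal mediation analysis makes circuits unstable, systematically decomposes the sources of variance, and advocates more rigorous practice.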

Details
Abstract

Mechanistic Interpretability (MI) aims to reverse-engineer model behaviors by identifying functional sub-networks. Yet, the scientific validity of these findings depends on their stability. In this work, we argue that circuit discovery is not a standalone task but a statistical estimation problem built upon causal mediation analysis (CMA). We uncover a fundamental instability at this base layer: exact, single-input CMA scores exhibit high intrinsic variance, implying that the causal effect of a component is a volatile random variable rather than a fixed property. We then demonstrate that circuit discovery pipelines inherit this variance and further amplify it. Fast approximation methods, such as Edge Attribution Patching and its successors, introduce additional estimation noise, while aggregating these noisy scores over datasets leads to fragile structural estimates. Consequently, small perturbations in input data or hyperparameters yield vastly different circuits. We systematically decompose these sources of variance and advocate for more rigorous MI practices, prioritizing statistical robustness and routine reporting of stability metrics.

2602.01914 2026-06-01 cs.LG Version update

Towards Long-Horizon Interpretability: Efficient and Faithful Multi-Token Attribution for Reasoning LLMs

Wenbo Pan, Zhichao Liu, Xianlong Wang, Haining Yu, Xiaohua Jia

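Affiliations * Department of Computer Science, City University of Hong Kong, Hong Kong SAR, China; Harbin Institute of Technology, Harbin, China

AI summary Proposes FlashTrace, which uses span-wise aggregation and a recursive attribution mechanism to achieve efficient and faithful multi-token attribution in long-context reasoning, with a speedup of over 130x.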

Comments Accepted as an Oral paper at ICML 2026. Code available at https://github.com/wbopan/flashtrace

Details
Abstract

Token attribution methods provide intuitive explanations for language model outputs by identifying causally important input tokens. However, as modern LLMs increasingly rely on extended reasoning chains, existing schemes face two critical challenges: (1) efficiency bottleneck, where attributing a target span of M tokens within a context of length N requires O(M*N) operations, making long-context attribution prohibitively slow; and (2) faithfulness drop, where intermediate reasoning tokens absorb attribution mass, preventing importance from propagating back to the original input. To address these, we introduce FlashTrace, an efficient multi-token attribution method that employs span-wise aggregation to compute attribution over multi-token targets in a single pass, while maintaining faithfulness. Moreover, we design a recursive attribution mechanism that traces importance through intermediate reasoning chains back to source inputs. Extensive experiments on long-context retrieval (RULER) and multi-step reasoning (MATH, MorehopQA) tasks demonstrate that FlashTrace achieves over 130x speedup over existing baselines while maintaining superior faithfulness. We further analyze the dynamics of recursive attribution, showing that even a single recursive hop improves faithfulness by tracing importance through the reasoning chain.

2602.01553 2026-06-01 cs.LG cs.AI Version update

Plain Transformers are Surprisingly Powerful Link Predictors

Quang Truong, Yu Song, Donald Loveland, Mingxuan Ju, Tong Zhao, Neil Shah, Jiliang Tang

Affiliations * Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, USA

AI summary Proposes PENCIL, an encoder-only plain Transformer that replaces hand-crafted priors with attention over sampled local subgraphs; it retains the scalability of standard Transformers while implicitly generalizing a range of heuristics, giving efficient and parameter-economical link prediction.

Comments ICML'26

Details
Abstract

Link prediction is a core challenge in graph machine learning, demanding models that capture rich and complex topological dependencies. While Graph Neural Networks (GNNs) are the standard solution, state-of-the-art pipelines often rely on explicit structural heuristics or memory-intensive node embeddings -- approaches that struggle to generalize or scale to massive graphs. Emerging Graph Transformers (GTs) offer a potential alternative but often incur significant overhead due to complex structural encodings, hindering their application to large-scale link prediction. We challenge these sophisticated paradigms with PENCIL, an encoder-only plain Transformer that replaces hand-crafted priors with attention over sampled local subgraphs, retaining the scalability and hardware efficiency of standard Transformers. Through experimental and theoretical analysis, we show that PENCIL extracts richer structural signals than GNNs, implicitly generalizing a broad class of heuristics and subgraph-based expressivity. Empirically, PENCIL outperforms heuristic-informed GNNs and is far more parameter-efficient than ID-embedding-based alternatives, while remaining competitive across diverse benchmarks -- even without node features. Our results challenge the prevailing reliance on complex engineering techniques, demonstrating that simple design choices are potentially sufficient to achieve the same capabilities. Our code is publicly available at https://github.com/quang-truong/pencil.

2512.19673 2026-06-01 cs.LG cs.AI cs.CL Version update

Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies

Yuqiao Tan, Minzheng Wang, Shizhu He, Huanxuan Liao, Chengfeng Zhao, Qiunan Lu, Tian Liang, Jun Zhao, Kang Liu

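Affiliations * Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Tencent AI Lab

AI summary Decomposes the LLM policy, via the Transformer residual stream, into Internal Layer Policies and Internal Modular Policies and proposes Bottom-up Policy Optimization (BuPO), which reconstructs the LLM's reasoning foundation by optimizing internal layers in early stages; its effectiveness is validated on complex reasoning benchmarks.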

Comments Preprint. Our code is available at https://github.com/Trae1ounG/BuPO

Details
Abstract

Existing reinforcement learning (RL) approaches treat large language models (LLMs) as a unified policy, overlooking their internal mechanisms. In this paper, we decompose the LLM-based policy into Internal Layer Policies and Internal Modular Policies via the Transformer's residual stream. Our entropy analysis of internal policy reveals distinct patterns: (1) universally, internal policies evolve from high-entropy exploration in early layers to deterministic refinement in the top layers; and (2) Qwen exhibits an explicit progressive reasoning structure, contrasting with the abrupt convergence in Llama. Furthermore, we discover that optimizing internal layers induces feature refinement, forcing lower layers to capture high-level reasoning representations early. Motivated by these findings, we propose Bottom-up Policy Optimization (BuPO), a novel RL paradigm that reconstructs the LLM's reasoning foundation from the bottom up by optimizing internal layers in early stages. Extensive experiments on complex reasoning benchmarks demonstrate the effectiveness of BuPO.

2602.01399 2026-06-01 cs.LG cs.AI stat.ML Version update

An Odd Estimator for Shapley Values

Fabian Fumagalli, Landon Butler, Justin Singh Kang, Kannan Ramchandran, R. Teal Witter

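Affiliations * Department of Statistics, LMU Munich; Department of Electrical Engineering and Computer Science, UC Berkeley; Mathematical Sciences Department, Claremont McKenna College

AI summary Proves that the Shapley value depends only on the odd component of the set function and, building on this, proposes the OddSHAP estimator, which approximates efficiently via polynomial regression on the odd subspace and reaches state-of-the-art accuracy at larger sampling budgets.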

Comments Accepted to ICML 2026

Details
Abstract

The Shapley value is a ubiquitous framework for attribution in machine learning, encompassing feature importance, data valuation, and causal inference. However, its exact computation is generally intractable, necessitating efficient approximation methods. While the most effective and popular estimators leverage the paired sampling heuristic to reduce estimation error, the theoretical mechanism driving this improvement has remained opaque. In this work, we provide an elegant and fundamental justification for paired sampling: we prove that the Shapley value depends exclusively on the odd component of the set function, and that paired sampling orthogonalizes the regression objective to filter out the irrelevant even component. Leveraging this insight, we propose OddSHAP, a novel consistent estimator that performs polynomial regression solely on the odd subspace. By utilizing the Fourier basis to isolate this subspace and employing a proxy model to identify high-impact interactions, OddSHAP overcomes the combinatorial explosion of higher-order approximations. Through an extensive benchmark, we find that OddSHAP achieves state-of-the-art estimation accuracy at larger sampling budgets.
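The claim that the Shapley value depends only on the odd component can be checked numerically on a small game. The sketch below assumes one natural reading of "odd component", namely oddness under set complementation: v_odd(S) = (v(S) - v(N \ S)) / 2 and v_even(S) = (v(S) + v(N \ S)) / 2. That reading, the toy `game`, and all function names are assumptions for illustration; the code is a brute-force enumeration, not the OddSHAP estimator.

```python
from itertools import combinations
from math import factorial


def shapley(v, n):
    """Exact Shapley values of a set function v on frozensets of range(n)."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - 1 - size) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (v(s | {i}) - v(s))
    return phi


def odd_part(v, n):
    """Component of v that flips sign when a coalition is replaced by its complement."""
    full = frozenset(range(n))
    return lambda s: 0.5 * (v(s) - v(full - s))


def even_part(v, n):
    """Component of v that is unchanged under complementation."""
    full = frozenset(range(n))
    return lambda s: 0.5 * (v(s) + v(full - s))


def game(s):
    """Arbitrary toy set function on 4 players (hypothetical example)."""
    weights = [1.0, 2.0, -0.5, 3.0]
    total = sum(weights[i] for i in s)
    return total ** 2 + (5.0 if {0, 3} <= s else 0.0)


n = 4
phi_full = shapley(game, n)
phi_odd = shapley(odd_part(game, n), n)
phi_even = shapley(even_part(game, n), n)
print(phi_full)
print(phi_odd)   # matches phi_full up to rounding
print(phi_even)  # zero up to rounding
```

The even part contributes nothing because the Shapley weight of a coalition of size s equals that of size n - 1 - s, while the marginal contributions of the even part at a coalition and at its complementary coalition cancel.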

2602.01267 2026-06-01 cs.LG Version update

Diving into Kronecker Adapters: Component Design Matters

Jiayu Bai, Danchen Yu, Zhenyu Liao, TianQi Hou, Feng Zhou, Robert C. Qiu, Zenan Ling

Affiliations * School of Electronic Information and Communications, Huazhong University of Science and Technology; Huawei; Center for Applied Statistics and School of Statistics, Renmin University of China

AI summary Analyzes the dimensions and number of Kronecker adapter components, proposes Component Designed Kronecker Adapters (CDKA), and provides parameter-budget-aware configuration guidelines and a training stabilization strategy; experiments demonstrate its effectiveness.

Details
Abstract

Kronecker adapters have emerged as a promising approach for fine-tuning large-scale models, enabling high-rank updates through tunable component structures. However, existing work largely treats the component structure as a fixed or heuristic design choice, leaving the dimensions and number of Kronecker components underexplored. In this paper, we identify component structure as a key factor governing the capacity of Kronecker adapters. We perform a fine-grained analysis of both the dimensions and number of Kronecker components. In particular, we show that the alignment between Kronecker adapters and full fine-tuning depends on component configurations. Guided by these insights, we propose Component Designed Kronecker Adapters (CDKA). We further provide parameter-budget-aware configuration guidelines and a tailored training stabilization strategy for practical deployment. Experiments across various architectures and modalities demonstrate the effectiveness of CDKA. Code is available at https://github.com/rainstonee/CDKA.

2602.01186 2026-06-01 cs.LG cs.AI Version update

The Gaussian-Head OFL Family: One-Shot Federated Learning from Client Global Statistics

Fabio Turazza, Marco Picone, Marco Mamei

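Affiliations * Department of Sciences and Methods for Engineering; Artificial Intelligence Research and Innovation Center; University of Modena and Reggio Emilia

AI summary Proposes the Gaussian-Head OFL family, in which clients transmit only per-class counts and first- and second-order moments and the server builds the model from three components (closed-form Gaussian heads, FisherMix, and Proto-Hyper), achieving strictly data-free one-shot federated learning with state-of-the-art robustness and accuracy under strong non-IID skew.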

Comments Accepted at the International Conference on Learning Representations (ICLR) 2026 - Final Version

Details
Abstract

Classical Federated Learning relies on a multi-round iterative process of model exchange and aggregation between server and clients, with high communication costs and privacy risks from repeated model transmissions. In contrast, one-shot federated learning (OFL) alleviates these limitations by reducing communication to a single round, thereby lowering overhead and enhancing practical deployability. Nevertheless, most existing one-shot approaches remain either impractical or constrained; for example, they often depend on the availability of a public dataset, assume homogeneous client models, or require uploading additional data or model information. To overcome these issues, we introduce the Gaussian-Head OFL (GH-OFL) family, a suite of one-shot federated methods that assume class-conditional Gaussianity of pretrained embeddings. Clients transmit only sufficient statistics (per-class counts and first/second-order moments) and the server builds heads via three components: (i) Closed-form Gaussian heads (NB/LDA/QDA) computed directly from the received statistics; (ii) FisherMix, a linear head with cosine margin trained on synthetic samples drawn in an estimated Fisher subspace; and (iii) Proto-Hyper, a lightweight low-rank residual head that refines Gaussian logits via knowledge distillation on those synthetic samples. In our experiments, GH-OFL methods deliver state-of-the-art robustness and accuracy under strong non-IID skew while remaining strictly data-free.
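To make the "sufficient statistics only" idea concrete, here is a minimal sketch of how per-class counts, sums, and sums of squares from several clients can be added on a server and turned into a closed-form Gaussian classifier. It assumes a diagonal covariance per class (a naive-Bayes-style head) and plain Python lists as embeddings; the function names and the toy data are hypothetical, and FisherMix and Proto-Hyper are not covered.

```python
import math


def client_stats(features, labels):
    """Per-class (count, sum, sum of squares): the only data a client sends."""
    stats = {}
    for x, y in zip(features, labels):
        d = len(x)
        n, s1, s2 = stats.get(y, (0, [0.0] * d, [0.0] * d))
        stats[y] = (n + 1,
                    [a + b for a, b in zip(s1, x)],
                    [a + b * b for a, b in zip(s2, x)])
    return stats


def merge(all_stats):
    """Server side: sufficient statistics simply add across clients."""
    merged = {}
    for stats in all_stats:
        for y, (n, s1, s2) in stats.items():
            if y in merged:
                m, t1, t2 = merged[y]
                merged[y] = (m + n,
                             [a + b for a, b in zip(t1, s1)],
                             [a + b for a, b in zip(t2, s2)])
            else:
                merged[y] = (n, list(s1), list(s2))
    return merged


def gaussian_head(merged, eps=1e-6):
    """Closed-form diagonal Gaussian per class: (log prior, mean, variance)."""
    total = sum(n for n, _, _ in merged.values())
    head = {}
    for y, (n, s1, s2) in merged.items():
        mean = [s / n for s in s1]
        var = [max(q / n - m * m, 0.0) + eps for q, m in zip(s2, mean)]
        head[y] = (math.log(n / total), mean, var)
    return head


def predict(head, x):
    """Class with the highest log prior plus Gaussian log-likelihood."""
    def score(params):
        log_prior, mean, var = params
        return log_prior - 0.5 * sum(
            math.log(2 * math.pi * v) + (xi - m) ** 2 / v
            for xi, m, v in zip(x, mean, var))
    return max(head, key=lambda y: score(head[y]))


# Two clients with skewed label distributions (toy data).
client_a = client_stats([[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0]], [0, 0, 0])
client_b = client_stats([[3.0, 3.1], [2.9, 3.2], [3.2, 2.8], [0.1, 0.1]], [1, 1, 1, 0])
merged = merge([client_a, client_b])
head = gaussian_head(merged)
print(predict(head, [0.05, 0.0]), predict(head, [3.0, 3.0]))
```

Because the statistics are additive, the merged head is the same one that would be fitted on the pooled data, regardless of how the classes are split across clients.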

2601.13433 2026-06-01 cs.CL cs.LG Version update

Who Endorsed It? Measuring Authority Bias Across Expertise Levels in Language Models

Priyanka Mary Mammen, Emil Joswin, Shankar Venkitachalam

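Affiliations * UMass Amherst; Independent Research

AI summary Studies whether language models show systematic bias in reasoning tasks depending on the expertise level of the endorsement source; finds that models are more susceptible to incorrect endorsements from higher-authority sources, which lowers accuracy and raises confidence in wrong answers, but that the bias can be mitigated through mechanistic intervention.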

Details
Abstract

Prior research demonstrates that performance of language models on reasoning tasks can be influenced by suggestions, hints and endorsements. However, the influence of endorsement source credibility remains underexplored. We investigate whether language models exhibit systematic bias based on the perceived expertise of the provider of the endorsement. Across 4 datasets spanning mathematical, legal, and medical reasoning, we evaluate 11 models using personas representing four expertise levels per domain. Our results reveal that models are increasingly susceptible to incorrect/misleading endorsements as source expertise increases, with higher-authority sources inducing not only accuracy degradation but also increased confidence in wrong answers. We also show that this authority bias is mechanistically encoded within the model and a model can be steered away from the bias, thereby improving its performance even when an expert gives a misleading endorsement.

2601.22985 2026-06-01 cs.LG Version update

dgMARK: Decoding-Guided Watermarking for Diffusion Language Models

Pyo Min Hong, Albert No

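Affiliations * Department of Computer Engineering, Hongik University; Department of Artificial Intelligence, Yonsei University

AI summary Proposes dgMARK, which steers the unmasking order of discrete diffusion language models to satisfy a parity constraint, embedding a text watermark without explicitly reweighting probabilities, and uses a sliding-window detector for robustness to edits.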

Comments Accepted at ICML 2026. Project page: https://dgmark-watermarking.github.io

Details
Abstract

We propose dgMARK, a decoding-guided watermarking method for discrete diffusion language models (dLLMs). Unlike autoregressive models, dLLMs can generate tokens in arbitrary order. While an ideal conditional predictor would be invariant to this order, practical dLLMs exhibit strong sensitivity to the unmasking order, creating a new channel for watermarking. dgMARK steers the unmasking order toward positions whose high-reward candidate tokens satisfy a simple parity constraint induced by a binary hash, without explicitly reweighting the model's learned probabilities. The method is plug-and-play with common decoding strategies (e.g., confidence, entropy, and margin-based ordering) and can be strengthened with a one-step lookahead variant. Watermarks are detected via elevated parity-matching statistics, and a sliding-window detector ensures robustness under post-editing operations including insertion, deletion, substitution, and paraphrasing. Project website: https://dgmark-watermarking.github.io

2601.22943 2026-06-01 cs.LG Version update

Scalable Topology-Preserving Graph Coarsening: Concepts and Algorithms

Xiang Wu, Rong-Hua Li, Xunkai Li, Kangfei Zhao, Hongchao Qin, Guoren Wang

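Affiliations * Department of Computer Science, Beijing Institute of Technology

AI summary To address the high time complexity of existing topology-preserving graph coarsening methods, proposes Scalable Topology-Preserving Graph Coarsening (STPGC), based on the concepts of graph strong collapse and graph edge collapse from algebraic topology; three new algorithms eliminate dominated nodes and edges while rigorously preserving topological features, and the method is proved to preserve the GNN receptive field and accelerates GNN training.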

Details
Abstract

Graph coarsening reduces the size of a graph while preserving certain properties. Most existing methods preserve either spectral or spatial characteristics. Recent research shows that topology-preserving coarsening methods maintain GNN performance on coarsened graphs but suffer from exponential time complexity. To address these problems, we propose Scalable Topology-Preserving Graph Coarsening (STPGC) by introducing the concepts of graph strong collapse and graph edge collapse extended from algebraic topology. STPGC comprises three new algorithms, GStrongCollapse, GEdgeCollapse, and NeighborhoodConing based on these two concepts, which eliminate dominated nodes and edges while rigorously preserving topological features. We further prove that STPGC preserves the GNN receptive field and develop approximate algorithms to accelerate GNN training. Experiments on node classification with GNNs demonstrate the efficiency and effectiveness of STPGC.
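The notion of a "dominated node" can be sketched with the usual definition from the strong-collapse literature: a node v is dominated by a different node u when the closed neighborhood of v is contained in the closed neighborhood of u. Whether GStrongCollapse uses exactly this rule, and in what order it removes nodes, is an assumption here; the function names are hypothetical and the loop is a naive illustration rather than a scalable algorithm.

```python
def closed_neighborhood(adj, v):
    """Neighbors of v together with v itself."""
    return adj[v] | {v}


def find_dominated(adj):
    """Return (v, u) with N[v] contained in N[u] for some neighbor u, or None."""
    for v in sorted(adj):
        nv = closed_neighborhood(adj, v)
        for u in sorted(adj[v]):  # a dominator is necessarily adjacent to v
            if nv <= closed_neighborhood(adj, u):
                return v, u
    return None


def strong_collapse(adj):
    """Repeatedly delete dominated nodes; returns the reduced adjacency dict."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    while True:
        hit = find_dominated(adj)
        if hit is None:
            return adj
        v, _ = hit
        for w in adj.pop(v):
            adj[w].discard(v)


# A path has no cycles and collapses to a single node.
path = {0: {1}, 1: {0, 2}, 2: {1}}
# A 4-cycle with a pendant node: the pendant goes, the cycle (a "hole") stays.
cycle_with_tail = {0: {1, 3, 4}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}, 4: {0}}

print(sorted(strong_collapse(path)))
print(sorted(strong_collapse(cycle_with_tail)))
```

In this toy run, the tree-like parts disappear while the 4-cycle survives, which is the kind of topological feature a topology-preserving coarsening is meant to keep.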

2601.22787 2026-06-01 cs.LG Version update

Float8@2bits: Entropy Coding Enables Data-Free Model Compression

Patrick Putzky, Martin Genzel, Mattes Mollenhauer, Sebastian Schulze, Thomas Wollmann, Stefan Dietzel

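Affiliations * Merantix Momentum GmbH

AI summary Proposes the EntQuant framework, which decouples numerical precision from storage cost via entropy coding, achieving extreme 2-bit compression without data or fine-tuning and compressing a 70B-parameter model in under 10 minutes while preserving performance.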

Comments ICML 2026. Code available at https://github.com/merantix-momentum/entquant

Details
Abstract

Post-training compression is currently divided into two contrasting regimes. On the one hand, fast, data-free, and model-agnostic methods (e.g., NF4 or HQQ) offer maximum accessibility but suffer from functional collapse at extreme bit-rates below 4 bits. On the other hand, techniques leveraging calibration data or extensive recovery training achieve superior fidelity but impose high computational constraints and face uncertain robustness under data distribution shifts. We introduce EntQuant, a framework that unites the advantages of these distinct paradigms. By matching the performance of data-dependent methods with the speed and universality of data-free techniques, EntQuant enables practical utility in the extreme compression regime. Our method decouples numerical precision from storage cost via entropy coding, compressing a 70B parameter model in less than 10 minutes. We demonstrate that EntQuant not only achieves state-of-the-art results on standard evaluation sets and models, but also retains functional performance on more complex benchmarks with instruction-tuned models, all at modest inference overhead.
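The phrase "decouples numerical precision from storage cost" rests on a general fact about entropy coding: the average code length an ideal symbol-wise entropy coder approaches is the Shannon entropy of the symbol distribution, which can be far below the nominal bit width when a few values dominate. The sketch below illustrates this on synthetic Gaussian "weights" rounded to a signed 8-bit integer grid. The grid, the `step` parameter, and the synthetic data are assumptions for illustration; this is neither Float8 nor the EntQuant procedure.

```python
import math
import random
from collections import Counter


def quantize(weights, step, n_bits=8):
    """Round to a signed integer grid with 2**n_bits levels (illustrative only)."""
    lo, hi = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    return [min(hi, max(lo, round(w / step))) for w in weights]


def entropy_bits(symbols):
    """Empirical Shannon entropy of the symbol stream, in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(50000)]

coarse = quantize(weights, step=1.0)   # few distinct values actually occur
fine = quantize(weights, step=0.05)    # many more of the 256 levels are used

print(f"nominal width: 8 bits per weight")
print(f"entropy, coarse grid: {entropy_bits(coarse):.2f} bits per weight")
print(f"entropy, fine grid:   {entropy_bits(fine):.2f} bits per weight")
```

Both streams are stored in the same 8-bit format, yet the coarse one carries far fewer bits of information per weight, so an entropy coder can store it in correspondingly less space.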

2508.09925 2026-06-01 cs.LG cs.AI Version update

Residual Reservoir Memory Networks

Matteo Pinna, Andrea Ceni, Claudio Gallicchio

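Affiliations * Department of Computer Science, University of Pisa

AI summary Proposes ResRMN, a new untrained recurrent neural network that combines a linear memory reservoir with a non-linear reservoir based on temporal residual orthogonal connections to enhance long-term input propagation, outperforming conventional reservoir computing models on time-series and pixel-level 1-D classification tasks.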

Comments IJCNN 2025

Details
Abstract

We introduce a novel class of untrained Recurrent Neural Networks (RNNs) within the Reservoir Computing (RC) paradigm, called Residual Reservoir Memory Networks (ResRMNs). ResRMN combines a linear memory reservoir with a non-linear reservoir, where the latter is based on residual orthogonal connections along the temporal dimension for enhanced long-term propagation of the input. The resulting reservoir state dynamics are studied through the lens of linear stability analysis, and we investigate diverse configurations for the temporal residual connections. The proposed approach is empirically assessed on time-series and pixel-level 1-D classification tasks. Our experimental results highlight the advantages of the proposed approach over other conventional RC models.

2506.01318 2026-06-01 cs.LG cs.AI Version update

Unlearning's Blind Spots: Over-Unlearning and Prototypical Relearning Attack

SeungBum Ha, Saerom Park, Sung Whan Yoon

Affiliations * Graduate School of Artificial Intelligence, Ulsan National Institute of Science and Technology (UNIST), Ulsan, South Korea; Department of Industrial Engineering, UNIST, Ulsan, South Korea; Department of Electrical Engineering, UNIST, Ulsan, South Korea

AI summary For class-level machine unlearning, proposes the over-unlearning metric OU@epsilon and exposes the Prototypical Relearning Attack; the Spotter method mitigates both blind spots by combining masked knowledge distillation with an intra-class dispersion loss.

Comments 9 pages, ICML 2026

Details
Abstract

Machine unlearning (MU) aims to expunge a designated forget set from a trained model without costly retraining, yet the existing techniques overlook two critical blind spots: "over-unlearning" that deteriorates retained data near the forget set, and post-hoc "relearning" attacks that aim to resurrect the forgotten knowledge. Focusing on class-level unlearning, we first derive an over-unlearning metric, OU@epsilon, which quantifies collateral damage in regions proximal to the forget set, where over-unlearning mainly occurs. Next, we expose an unforeseen relearning threat on MU, i.e., the Prototypical Relearning Attack, which exploits the per-class prototype of the forget class with just a few samples, and easily restores the pre-unlearning performance. To counter both blind spots in class-level unlearning, we introduce Spotter, a plug-and-play objective that combines (i) a masked knowledge-distillation penalty on the nearby region of forget classes to suppress OU@epsilon, and (ii) an intra-class dispersion loss that scatters forget-class embeddings, neutralizing Prototypical Relearning Attacks. Spotter achieves state-of-the-art results across CIFAR, TinyImageNet, and CASIA-WebFace datasets, offering a practical remedy to unlearning's blind spots.

2601.22296 2026-06-01 cs.LG cs.AI 版本更新

ParalESN: Enabling parallel information processing in Reservoir Computing

ParalESN:在储层计算中实现并行信息处理

Matteo Pinna, Giacomo Lagomarsini, Andrea Ceni, Claudio Gallicchio

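Affiliations: Department of Computer Science, University of Pisa, Pisa, Italy

AI summary: Proposes ParalESN, which uses diagonal linear recurrence in the complex domain to parallelize Reservoir Computing, preserving the Echo State Property and universality guarantees while substantially improving computational efficiency.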

Comments ICML 2026

详情
AI中文摘要

储层计算(RC)已成为时间处理的有效范式。然而,其可扩展性受到顺序处理时间数据的需要和高维储层巨大内存占用的严重限制。为了解决这些限制,我们通过结构化算子和状态空间建模的视角重新审视RC,引入了并行回声状态网络(ParalESN)。利用复数域中的对角线性递归,ParalESN实现了时间数据的并行处理以及高效高维储层的构建。彻底的理论分析表明,传统回声状态网络的回声状态属性和普适性保证得以保留,同时允许任意线性储层在复数对角形式下的等价表示。实验上,ParalESN在预测精度上与传统的RC和完全可训练的序列模型相当,同时实现了数量级的计算节省。总体而言,ParalESN为将RC集成到深度学习领域提供了一条可扩展且有原则的路径。

英文摘要

Reservoir Computing (RC) has established itself as an efficient paradigm for temporal processing. However, its scalability remains severely constrained by the need to process temporal data sequentially and the prohibitive memory footprint of high-dimensional reservoirs. To address these limitations, we revisit RC through the lens of structured operators and state space modeling, introducing Parallel Echo State Network (ParalESN). Leveraging diagonal linear recurrence in the complex domain, ParalESN enables parallel processing of temporal data and the construction of efficient, high-dimensional reservoirs. A thorough theoretical analysis demonstrates that the Echo State Property and the universality guarantees of traditional Echo State Networks are preserved, while also admitting an equivalent representation of arbitrary linear reservoirs in the complex diagonal form. Empirically, ParalESN achieves competitive predictive accuracy with traditional RC and with fully trainable sequence models, while delivering computational savings by orders of magnitude. Overall, ParalESN offers a scalable and principled pathway for integrating RC within the deep learning landscape.
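
To illustrate why a diagonal linear recurrence can be processed in parallel, here is a minimal NumPy sketch. It is not the ParalESN implementation: the names `sequential_scan` and `parallel_scan`, the eigenvalue choice, and the input shapes are assumptions made for illustration. The point is only that, for a diagonal state matrix, each state has the closed form h_t = sum_k lam^(t-k) u_k, so the time loop can be replaced by a cumulative sum.

```python
import numpy as np

def sequential_scan(lam, u):
    """Reference recurrence h_t = lam * h_{t-1} + u_t, computed step by step."""
    h = np.zeros_like(u[0])
    out = []
    for u_t in u:
        h = lam * h + u_t
        out.append(h)
    return np.stack(out)

def parallel_scan(lam, u):
    """Same states from the closed form h_t = sum_k lam^(t-k) u_k, with no time loop."""
    T = u.shape[0]
    steps = np.arange(T)[:, None]          # shape (T, 1)
    powers = lam[None, :] ** steps         # lam^t for every unit, shape (T, N)
    return powers * np.cumsum(u / powers, axis=0)

rng = np.random.default_rng(0)
N, T = 4, 16
# Hypothetical diagonal reservoir: complex eigenvalues inside the unit circle.
lam = 0.9 * np.exp(1j * rng.uniform(0, 2 * np.pi, N))
u = rng.standard_normal((T, N)) + 0j       # inputs already projected to reservoir size
h_seq = sequential_scan(lam, u)
h_par = parallel_scan(lam, u)
```

Dividing by `lam**t` becomes numerically fragile for long sequences or small eigenvalue magnitudes, so a practical version would more likely rely on an associative scan; the cumulative-sum form is used here only because it is short.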

2505.17595 2026-06-01 cs.LG cs.CL 版本更新

NeUQI: Near-Optimal Uniform Quantization Parameter Initialization for Low-Bit LLMs

NeUQI: 低比特大语言模型的近最优均匀量化参数初始化

Li Lin, Xinyu Hu, Xiaojun Wan

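Affiliations: Wangxuan Institute of Computer Technology, Peking University, China

AI summary: To address the reliance on the Min-Max formula for parameter initialization in low-bit uniform quantization of LLMs, proposes NeUQI, which derives the zero-point for a given scale so that only the scale needs to be optimized, efficiently yielding a near-optimal initialization. It outperforms existing methods on the LLaMA and Qwen families and, combined with lightweight distillation, surpasses the resource-intensive PV-tuning.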

Comments accepted by ICML 2026

详情
AI中文摘要

大型语言模型(LLM)在跨领域任务中表现出色,但由于高内存消耗和推理成本,在消费级GPU或个人设备(如笔记本电脑)上部署时面临重大挑战。LLM的训练后量化(PTQ)提供了一种有前景的解决方案,可减少内存占用和解码延迟。实践中,均匀量化表示的PTQ因其高效性和易于部署而受到青睐,因为均匀量化被主流硬件和软件库广泛支持。近期关于低比特均匀量化的研究在量化后模型性能上取得了显著改进;然而,这些研究主要关注量化方法,而量化参数的初始化仍未被充分探索,且仍依赖于传统的Min-Max公式。在本工作中,我们识别了Min-Max公式的局限性,突破其约束,提出了NeUQI,一种高效确定均匀量化近最优初始化的方法。我们的NeUQI通过为给定缩放因子推导零点,简化了缩放因子和零点的联合优化,从而将问题简化为仅缩放因子优化。得益于改进的量化参数,我们的NeUQI在LLaMA和Qwen系列的各种设置和任务上的实验中一致优于现有方法。此外,当与轻量蒸馏策略结合时,NeUQI甚至实现了优于PV-tuning(一种资源密集得多的方法)的性能。

英文摘要

Large language models (LLMs) achieve impressive performance across domains but face significant challenges when deployed on consumer-grade GPUs or personal devices such as laptops, due to high memory consumption and inference costs. Post-training quantization (PTQ) of LLMs offers a promising solution that reduces their memory footprint and decoding latency. In practice, PTQ with uniform quantization representation is favored due to its efficiency and ease of deployment, as uniform quantization is widely supported by mainstream hardware and software libraries. Recent studies on low-bit uniform quantization have led to noticeable improvements in post-quantization model performance; however, they mainly focus on quantization methodologies, while the initialization of quantization parameters remains underexplored and still relies on the conventional Min-Max formula. In this work, we identify the limitations of the Min-Max formula, move beyond its constraints, and propose NeUQI, a method that efficiently determines near-optimal initialization for uniform quantization. Our NeUQI simplifies the joint optimization of the scale and zero-point by deriving the zero-point for a given scale, thereby reducing the problem to a scale-only optimization. Benefiting from the improved quantization parameters, our NeUQI consistently outperforms existing methods in the experiments with the LLaMA and Qwen families on various settings and tasks. Furthermore, when combined with a lightweight distillation strategy, NeUQI even achieves superior performance to PV-tuning, a considerably more resource-intensive method.
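
For readers unfamiliar with the Min-Max formula the abstract refers to, the following sketch shows the conventional initialization and a scale-only outer search. It is not the NeUQI algorithm: the abstract does not state how the zero-point is derived, so `scale_only_search` uses a brute-force search over integer zero-points as a clearly labeled stand-in, and the shrink grid is an arbitrary assumption.

```python
import numpy as np

def quantize(w, scale, zero_point, bits):
    """Uniform quantization: map to the integer grid, clip, then dequantize."""
    q = np.clip(np.round(w / scale) + zero_point, 0, 2**bits - 1)
    return (q - zero_point) * scale

def min_max_params(w, bits):
    """Conventional Min-Max initialization of scale and zero-point."""
    scale = (w.max() - w.min()) / (2**bits - 1)
    zero_point = np.round(-w.min() / scale)
    return scale, zero_point

def mse(w, scale, zero_point, bits):
    return float(np.mean((w - quantize(w, scale, zero_point, bits)) ** 2))

def scale_only_search(w, bits, shrinks=np.linspace(0.5, 1.0, 51)):
    """Outer search over the scale; the zero-point is chosen per scale."""
    base_scale, _ = min_max_params(w, bits)
    best = None
    for s in base_scale * shrinks:
        # Stand-in for deriving the zero-point from the scale (assumption):
        # try every integer zero-point on the grid and keep the best one.
        zp = min(range(2**bits), key=lambda z: mse(w, s, z, bits))
        cand = (mse(w, s, zp, bits), s, zp)
        if best is None or cand[0] < best[0]:
            best = cand
    return best

rng = np.random.default_rng(0)
w = rng.standard_normal(256)               # stand-in weight group
bits = 3
mm_scale, mm_zp = min_max_params(w, bits)
err_minmax = mse(w, mm_scale, mm_zp, bits)
err_search, best_scale, best_zp = scale_only_search(w, bits)
```

Because the search grid includes the Min-Max scale itself, the searched parameters can never be worse than the Min-Max initialization on this reconstruction objective.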

2601.22068 2026-06-01 cs.LG 版本更新

Quantifying the Uncertainty of Foundation Models with Singular Value Ensembles

用奇异值集合量化基础模型的不确定性

Mehmet Ozgur Turkoglu, Dominik J. Mühlematter, Alexander Becker, Konrad Schindler, Helge Aasen

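Affiliations: ETH Zürich, Photogrammetry & Remote Sensing

AI summary: Proposes Singular Value Ensemble (SVE), which freezes the singular vectors and trains only per-member singular values, achieving implicit ensembling with minimal parameter overhead to quantify the uncertainty of foundation models.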

Comments Accepted at ICML 2026 (camera-ready version)

详情
AI中文摘要

基础模型已成为机器学习中的主导范式,通过大规模预训练在各种任务中取得了显著性能。然而,它们往往产生过度自信、未校准的预测。量化认知不确定性的标准方法是使用多个独立训练模型的集成。但它们的计算成本随集成规模线性增长,使得大型基础模型难以实用。我们提出奇异值集成(SVE),一种参数高效的隐式集成方法。SVE 基于一个简单而强大的核心假设:即权重矩阵的奇异向量对应于表示空间中有意义的方向。如果奇异向量确实是有意义的(正交)“知识方向”,那么可以通过仅调节每个方向对输出的贡献强度来获得模型集成。我们冻结奇异向量,而不是为每个集成成员学习新参数,仅训练每个成员的奇异值,这些奇异值重新缩放共享知识基中每个方向的贡献。集成多样性在联合训练期间自然出现,因为随机初始化和随机批次采样导致不同成员收敛到相同底层知识的不同组合。SVE 的性能与显式集成相当,同时将基础模型的参数数量增加不到1%,使得在资源受限环境中也能进行有原则的不确定性估计。我们在 NLP 和视觉任务上使用各种不同的骨干网络验证了 SVE,并表明它在保持预测准确性的同时改善了校准。

英文摘要

Foundation models have become a dominant paradigm in machine learning, achieving remarkable performance across diverse tasks through large-scale pretraining. However, they often yield overconfident, uncalibrated predictions. The standard approach to quantifying epistemic uncertainty is to ensemble multiple independently trained models. However, the computational cost of such ensembles scales linearly with ensemble size, making them impractical for large foundation models. We propose Singular Value Ensemble (SVE), a parameter-efficient implicit ensembling method. SVE builds on a simple but powerful core assumption: that the singular vectors of the weight matrices correspond to meaningful directions in the representation space. If the singular vectors are indeed meaningful (orthogonal) "knowledge directions", then a model ensemble can be obtained by modulating only how strongly each direction contributes to the output. Rather than learning new parameters for each ensemble member, we freeze the singular vectors and only train per-member singular values that rescale the contribution of each direction in that shared knowledge basis. Ensemble diversity emerges naturally during joint training as stochastic initialization and random batch sampling cause different members to converge to different combinations of the same underlying knowledge. SVE performs comparably to an explicit ensemble while increasing the parameter count of the base model by less than 1%, making principled uncertainty estimation accessible in resource-constrained settings. We validate SVE on NLP and vision tasks with various backbones and show that it improves calibration while maintaining predictive accuracy.
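
The core idea of freezing singular vectors and giving each ensemble member its own singular values can be sketched for a single linear layer as follows. This is an illustrative sketch only: the per-member values here are random perturbations rather than trained parameters, and `member_forward`, the layer size, and the spread measure are assumptions, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((6, 4))                    # stand-in pretrained weight (out=6, in=4)
U, s, Vt = np.linalg.svd(W, full_matrices=False)   # singular vectors stay frozen

n_members = 3
# One vector of singular values per member; in training these would be the
# only learned parameters. Here they are just perturbed copies (assumption).
member_s = s[None, :] * (1.0 + 0.05 * rng.standard_normal((n_members, s.size)))

def member_forward(x, s_m):
    """Apply W_m = U diag(s_m) V^T to a batch x without materializing W_m."""
    return ((x @ Vt.T) * s_m) @ U.T

x = rng.standard_normal((5, 4))
outputs = np.stack([member_forward(x, s_m) for s_m in member_s])  # (members, batch, out)
mean_pred = outputs.mean(axis=0)
disagreement = float(outputs.var(axis=0).mean())   # crude proxy for ensemble spread
extra_params = n_members * s.size                  # added parameters for this layer
```

For this layer the ensemble adds `n_members * min(out, in)` scalars on top of the shared factors, which is why the overhead stays small relative to storing several full weight matrices.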

2601.21778 2026-06-01 cs.NE cs.LG 版本更新

Error Amplification Limits ANN-to-SNN Conversion in Continuous Control

误差放大限制了连续控制中的ANN到SNN转换

Zijie Xu, Zihan Huang, Yiting Dong, Kang Chen, Wenxuan Liu, Zhaofei Yu

Affiliations: Institute for Artificial Intelligence, Peking University, Beijing, China; Beijing Key Laboratory of Brain-inspired Spiking Large Models, School of Computer Science, Peking University, Beijing, China

AI summary: To address the poor performance of ANN-to-SNN conversion in continuous control, proposes Cross-Step Residual Potential Initialization (CRPI), which recovers performance by suppressing temporally correlated errors.

Comments Accepted by ICML2026

详情
AI中文摘要

脉冲神经网络(SNN)可以通过转换已有的训练良好的人工神经网络(ANN)来获得有竞争力的性能,从而避免额外的昂贵训练。这一特性在强化学习(RL)中特别有吸引力,因为通过环境交互进行训练成本高昂且存在潜在风险。然而,现有的转换方法在连续控制中表现不佳,而合适的基线方法基本缺失。我们确定误差放大是关键原因:小的动作近似误差在决策步骤间变得时间相关,导致累积的状态分布偏移和严重的性能退化。为了解决这个问题,我们提出了跨步残差电位初始化(CRPI),一种轻量级无梯度机制,它在决策步骤间传递残余膜电位以抑制时间相关误差。在具有向量和视觉观测的连续控制基准上的实验表明,CRPI可以集成到现有的转换流程中,并显著恢复丢失的性能。我们的结果强调了连续控制是ANN到SNN转换的一个关键且具有挑战性的基准,其中小的误差可能被强烈放大并影响性能。代码可在 https://github.com/xuzijie32/ANN2SNN-CRPI 获取。

英文摘要

Spiking Neural Networks (SNNs) can achieve competitive performance by converting already existing well-trained Artificial Neural Networks (ANNs), avoiding further costly training. This property is particularly attractive in Reinforcement Learning (RL), where training through environment interaction is expensive and potentially unsafe. However, existing conversion methods perform poorly in continuous control, where suitable baselines are largely absent. We identify error amplification as the key cause: small action approximation errors become temporally correlated across decision steps, inducing cumulative state distribution shift and severe performance degradation. To address this issue, we propose Cross-Step Residual Potential Initialization (CRPI), a lightweight gradient-free mechanism that carries over residual membrane potentials across decision steps to suppress temporally correlated errors. Experiments on continuous control benchmarks with both vector and visual observations demonstrate that CRPI can be integrated into existing conversion pipelines and substantially recovers lost performance. Our results highlight continuous control as a critical and challenging benchmark for ANN-to-SNN conversion, where small errors can be strongly amplified and impact performance. Code is available at https://github.com/xuzijie32/ANN2SNN-CRPI.
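
A toy example may help show why carrying residual membrane potential across decision steps can suppress temporally correlated errors. The sketch below uses a single rate-coded integrate-and-fire neuron with a soft reset and a constant input; the function `run_if_neuron` and all numeric settings are assumptions for illustration and do not reproduce the CRPI method itself.

```python
import numpy as np

def run_if_neuron(current, n_steps, timesteps, carry_over, threshold=1.0):
    """Rate-coded integrate-and-fire neuron driven by a constant input per decision step."""
    v = 0.0
    outputs = []
    for _ in range(n_steps):
        if not carry_over:
            v = 0.0                      # conventional: start each decision step from zero
        spikes = 0
        for _ in range(timesteps):
            v += current
            if v >= threshold:
                spikes += 1
                v -= threshold           # soft reset keeps the residual potential
        outputs.append(threshold * spikes / timesteps)
    return np.array(outputs)

target = 0.33                            # activation the spike rate should approximate
reset_out = run_if_neuron(target, n_steps=25, timesteps=4, carry_over=False)
carry_out = run_if_neuron(target, n_steps=25, timesteps=4, carry_over=True)
reset_err = abs(reset_out.mean() - target)
carry_err = abs(carry_out.mean() - target)
```

With a per-step reset the neuron emits the same output of 0.25 at every decision step, so the approximation error has the same sign each time and accumulates. With carry-over, the leftover potential shifts spikes between steps and the time-averaged output moves much closer to the target.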

2509.02970 2026-06-01 cs.LG math.OC 版本更新

Delayed Momentum Aggregation: Communication-efficient Byzantine-robust Federated Learning with Partial Participation

延迟动量聚合:部分参与下通信高效的拜占庭鲁棒联邦学习

Kaoru Otsuka, Yuki Takezawa, Makoto Yamada

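Affiliations: Okinawa Institute of Science and Technology, Japan; Toyota Motor Corporation, Japan

AI summary: For partial-participation settings where Byzantine clients may form a majority of the sampled clients, proposes the delayed momentum aggregation principle, which aggregates cached momentum from non-sampled clients together with fresh momentum from sampled clients so that Byzantine clients remain a minority from the server's perspective. The principle is instantiated as the DeMoA optimizer, with convergence analysis and experiments supporting its robustness and efficiency.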

Comments camera-ready version for ICML 2026

详情
AI中文摘要

部分参与对于大规模通信高效的联邦学习至关重要,然而现有的拜占庭鲁棒方法通常假设完全客户端参与。在部分参与设置中,一旦拜占庭客户端占主导地位,现有方法会立即失效。我们引入了延迟动量聚合原则,即中央服务器聚合来自未采样客户端的缓存动量以及来自采样客户端的即时动量。该原则确保即使拜占庭客户端在采样集中占主导地位,从服务器视角看它们仍然是少数。我们将该原则实例化为我们的优化器DeMoA。我们分析了DeMoA的收敛速率,表明DeMoA在部分参与下具有拜占庭鲁棒性。实验表明,在20%的拜占庭比例和仅10%的部分参与率下,即使现有方法在实践中失败,DeMoA也能达到最佳准确率。

英文摘要

Partial participation is essential for communication-efficient federated learning at scale, yet existing Byzantine-robust methods typically assume full client participation. In the partial participation setting, a majority of the sampled clients may be Byzantine; once Byzantine clients dominate, existing methods break down immediately. We introduce delayed momentum aggregation, a principle where the central server aggregates cached momentum from non-sampled clients along with fresh momentum from sampled clients. This principle ensures Byzantine clients remain a minority from the server's perspective even when they dominate the sampled set. We instantiate this principle in our optimizer DeMoA. We analyze the convergence rate of DeMoA, showing that DeMoA is Byzantine-robust under partial participation. Experiments show that, with a 20% Byzantine ratio and only a 10% participation rate, DeMoA achieves the best accuracy even when existing methods fail empirically.
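
The aggregation principle can be illustrated with a small server-side sketch. The robust aggregator is not specified in the abstract, so a coordinate-wise median is assumed here; `server_round`, the client counts, and the momentum values are likewise hypothetical.

```python
import numpy as np

def coordinate_median(vectors):
    """Assumed robust aggregator: coordinate-wise median over clients."""
    return np.median(vectors, axis=0)

def server_round(cache, sampled_ids, fresh_momenta):
    """Refresh the cache for sampled clients, then aggregate over all clients."""
    for cid, m in zip(sampled_ids, fresh_momenta):
        cache[cid] = m
    return coordinate_median(cache)

n_clients, dim = 10, 3
cache = np.ones((n_clients, dim))        # cached momenta; honest clients send 1.0
cache[:2] = 100.0                        # clients 0 and 1 are Byzantine (20%)

# This round samples clients 0, 1, 2: Byzantine clients are the sampled majority.
sampled = [0, 1, 2]
fresh = [np.full(dim, 100.0), np.full(dim, 100.0), np.ones(dim)]

delayed = server_round(cache.copy(), sampled, fresh)   # uses cached + fresh momenta
sampled_only = coordinate_median(np.stack(fresh))      # aggregates sampled clients only
```

Aggregating only the sampled clients returns the Byzantine value, while including cached momenta from the other seven honest clients keeps the Byzantine clients at 2 out of 10 and the median stays at the honest value.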

2601.21686 2026-06-01 cs.LG 版本更新

Don't be so Stief! Learning KV Cache low-rank approximation over the Stiefel manifold

不要那么Stief!在Stiefel流形上学习KV缓存低秩近似

Luca Benfenati, Matteo Risso, Andrea Vannozzi, Ahmet Caner Yüzügüler, Lukas Cavigelli, Enrico Macii, Daniele Jahier Pagliari, Alessio Burrello

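Affiliations: Department of Control and Computer Engineering, Politecnico di Torino; Huawei Zurich Research Center

AI summary: Proposes StiefAttention, which compresses the KV cache by learning orthonormal projection bases on the Stiefel manifold that minimize decoder-layer output reconstruction error, outperforming existing SVD-based methods.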

详情
AI中文摘要

键值(KV)缓存能够实现快速自回归解码,但在长上下文中成为高带宽内存(HBM)容量和带宽的主要瓶颈。一种常见的缓解方法是通过将每个头的矩阵投影到较低秩来压缩缓存的键和值,仅将投影存储在HBM中。然而,现有的训练后方法通常使用SVD风格的代理目标来拟合这些投影,这可能无法很好地反映softmax、值混合以及后续解码器层变换后的端到端重建。为此,我们引入了StiefAttention,一种训练后KV缓存压缩方法,通过直接最小化解码器层输出重建误差来学习正交投影基。StiefAttention还构建了候选秩上的逐层误差-秩分布,从而能够在用户指定的KV缓存预算下进行顺序秩分配。值得注意的是,在相同条件下,对于Llama3-8B,StiefAttention在C4困惑度上比EigenAttention高出4.2个点,在0-shot MMLU准确率上高出8.9个点,在等压缩率下,相对于原始解码器层输出,实现了更低的相对误差和更高的余弦相似度。

英文摘要

Key-value (KV) caching enables fast autoregressive decoding but at long contexts becomes a dominant bottleneck in High Bandwidth Memory (HBM) capacity and bandwidth. A common mitigation is to compress cached keys and values by projecting per-head matrices to a lower rank, storing only the projections in the HBM. However, existing post-training approaches typically fit these projections using SVD-style proxy objectives, which may poorly reflect end-to-end reconstruction after softmax, value mixing, and subsequent decoder-layer transformations. For these reasons, we introduce StiefAttention, a post-training KV-cache compression method that learns orthonormal projection bases by directly minimizing decoder-layer output reconstruction error. StiefAttention additionally constructs layer-wise error-rank profiles over candidate ranks, enabling sequential rank allocation under a user-specified KV cache budget. Notably, on Llama3-8B under the same conditions, StiefAttention outperforms EigenAttention by $4.2$ points on C4 perplexity and $8.9$ points on 0-shot MMLU accuracy at iso-compression, yielding lower relative error and higher cosine similarity with respect to the original decoder-layer outputs.

2601.21645 2026-06-01 cs.LG math.CT math.RT 版本更新

Identifiable Equivariant Networks are Layerwise Equivariant

可识别的等变网络是逐层等变的

Vahid Shahverdi, Giovanni Luca Marchetti, Georg Bökman, Kathlén Kohn

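Affiliations: Department of Mathematics, KTH Royal Institute of Technology, Stockholm, Sweden; University of Amsterdam, The Netherlands

AI summary: Proves that, under suitable identifiability conditions, an end-to-end equivariant network admits a parameter choice that makes every layer equivariant on the latent spaces, giving a mathematical explanation for the emergence of equivariant structure in weights during training.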

Comments Accepted at ICML 2026

详情
AI中文摘要

我们研究了深度神经网络中端到端等变性与逐层等变性之间的关系。我们证明:对于一个端到端函数关于输入和输出空间上的群作用等变的网络,存在一个参数选择使得该网络产生相同的端到端函数,并且其每一层关于潜在空间上的某些群作用是等变的。我们的结果假设模型参数在适当意义下是可识别的。对于一大类网络,这种可识别性已在文献中得到确立,我们的结果立即适用;而对于其他网络,它仍是推测性的。我们发展的理论基于抽象形式化,因此与架构无关。总体而言,我们的结果为训练过程中神经网络权重中等变结构的涌现——这一在实践中持续观察到的现象——提供了数学解释。

英文摘要

We investigate the relation between end-to-end equivariance and layerwise equivariance in deep neural networks. We prove the following: For a network whose end-to-end function is equivariant with respect to group actions on the input and output spaces, there is a parameter choice yielding the same end-to-end function such that its layers are equivariant with respect to some group actions on the latent spaces. Our result assumes that the parameters of the model are identifiable in an appropriate sense. This identifiability property has been established in the literature for a large class of networks, to which our results apply immediately, while it is conjectural for others. The theory we develop is grounded in an abstract formalism, and is therefore architecture-agnostic. Overall, our results provide a mathematical explanation for the emergence of equivariant structures in the weights of neural networks during training -- a phenomenon that is consistently observed in practice.

2601.20774 2026-06-01 cs.LG 版本更新

When More Data Doesn't Help: Limits of Adaptation in Multitask Learning

当更多数据无济于事:多任务学习中适应的极限

Steve Hanneke, Mingyue Xu

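Affiliations: Department of Computer Science, Purdue University, West Lafayette, IN, USA

AI summary: By establishing a stronger impossibility result for adaptation, shows that multitask learning has statistical limits that aggregating samples cannot overcome, even when the per-task sample size is arbitrarily large.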

详情
AI中文摘要

多任务学习及相关框架在现代应用中取得了巨大成功。在多任务学习问题中,我们有一组从相关源任务收集的异构数据集,并希望提高性能,超过单独解决每个任务所能达到的效果。arXiv:2006.15785 的最新工作表明,在无法访问分布信息的情况下,只要每个任务的样本量有界,任何基于聚合样本的算法都无法保证最优风险。在本文中,我们专注于理解多任务学习的统计极限。我们超越了 arXiv:2006.15785 中的无免费午餐定理,建立了一个更强的适应性不可能性结果,该结果对每个任务的任意大样本量都成立。这一改进传达了一个重要信息:多任务学习的困难无法通过每个任务拥有大量数据来克服。我们还讨论了可能对未来研究感兴趣的最优适应性的概念。

英文摘要

Multitask learning and related frameworks have achieved tremendous success in modern applications. In a multitask learning problem, we are given a set of heterogeneous datasets collected from related source tasks and hope to achieve better performance than we could by solving each task individually. The recent work of arXiv:2006.15785 has shown that, without access to distributional information, no algorithm based on aggregating samples alone can guarantee optimal risk as long as the sample size per task is bounded. In this paper, we focus on understanding the statistical limits of multitask learning. We go beyond the no-free-lunch theorem in arXiv:2006.15785 by establishing a stronger impossibility result of adaptation that holds for arbitrarily large sample size per task. This improvement conveys an important message: the hardness of multitask learning cannot be overcome by having abundant data per task. We also discuss the notion of optimal adaptivity, which may be of interest for future work.

2601.20076 2026-06-01 math.OC cs.LG 版本更新

Randomized Feasibility Methods for Constrained Optimization with Adaptive Step Sizes

自适应步长的约束优化随机可行性方法

Abhishek Chakraborty, Angelia Nedić

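Affiliations: School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, USA

AI summary: Proposes an adaptive-stepsize randomized feasibility algorithm that combines Polyak steps with random constraint sampling for constrained optimization with strongly convex or general convex objectives, and proves linear convergence or an O(1/√T) worst-case rate, respectively.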

详情
AI中文摘要

我们考虑在凸函数下水平集交集定义的约束下最小化目标函数。研究两种情况:(i) 强凸且Lipschitz光滑的目标函数;(ii) 凸但可能非光滑的目标函数。为了处理不易投影的约束,我们使用带有Polyak步长和每轮随机采样约束数量的随机可行性算法,同时采取(次)梯度步来最小化目标函数。对于情况(i),我们证明了使用自适应步长时目标函数值在期望上线性收敛到任意给定容差。对于情况(ii),我们开发了一种完全无问题参数的自适应步长方案,在期望上达到O(1/√T)的最坏情况速率。迭代的不可行性几乎必然随可行性更新次数几何级数下降,而对于平均迭代,我们建立了函数值相对于最优值的期望下界,该下界依赖于随机采样约束数量的分布。对于某些样本量增长的选择,可以达到最优速率。最后,在二次约束二次规划(QCQP)问题、支持向量机(SVM)和具有群体公平约束的逻辑回归上的仿真表明,我们的算法相比其他最先进方法具有计算效率优势。

英文摘要

We consider minimizing an objective function subject to constraints defined by the intersection of lower-level sets of convex functions. We study two cases: (i) strongly convex and Lipschitz-smooth objective function and (ii) convex but possibly nonsmooth objective function. To deal with the constraints that are not easy to project on, we use a randomized feasibility algorithm with Polyak steps and a random number of sampled constraints per iteration, while taking (sub)gradient steps to minimize the objective function. For case (i), we prove linear convergence in expectation of the objective function values to any prescribed tolerance using an adaptive stepsize. For case (ii), we develop a fully problem parameter-free and adaptive stepsize scheme that yields an $O(1/\sqrt{T})$ worst-case rate in expectation. The infeasibility of the iterates decreases geometrically with the number of feasibility updates almost surely, while for the averaged iterates, we establish an expected lower bound on the function values relative to the optimal value that depends on the distribution for the random number of sampled constraints. For certain choices of sample-size growth, optimal rates are achieved. Finally, simulations on a Quadratically Constrained Quadratic Programming (QCQP) problem, Support Vector Machines (SVM), and logistic regression with group fairness constraints demonstrate the computational efficiency of our algorithm compared to other state-of-the-art methods.
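
The interplay between gradient steps and randomized Polyak feasibility steps can be sketched on a small problem. The example below is an assumption-laden illustration: it uses three linear constraints, a plain diminishing stepsize 1/(t+2) instead of the adaptive schemes analyzed in the paper, and a uniform choice of one to three sampled constraints per iteration.

```python
import numpy as np

rng = np.random.default_rng(0)
c = np.array([2.0, 2.0])                       # minimize 0.5 * ||x - c||^2
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, 1.0, 1.5])                  # constraints g_i(x) = A[i] @ x - b[i] <= 0

def feasibility_step(x, i, beta=1.0):
    """Polyak step on constraint i: move along its gradient by violation / ||d||^2."""
    violation = max(A[i] @ x - b[i], 0.0)
    d = A[i]                                   # gradient of the linear constraint
    return x - beta * violation / (d @ d) * d

x = np.zeros(2)
for t in range(2000):
    x = x - (x - c) / (t + 2)                  # gradient step, diminishing stepsize
    n_samples = rng.integers(1, 4)             # random number of sampled constraints
    for i in rng.integers(0, len(b), size=n_samples):
        x = feasibility_step(x, i)

max_violation = float(np.max(np.maximum(A @ x - b, 0.0)))
dist_to_opt = float(np.linalg.norm(x - np.array([0.75, 0.75])))
```

For this problem the constrained minimizer is (0.75, 0.75), the projection of `c` onto the half-plane x1 + x2 <= 1.5, so the final iterate can be checked against it. No projection onto the full feasible set is ever computed; each update touches one sampled constraint.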

2601.19966 2026-06-01 cond-mat.mtrl-sci cs.LG physics.chem-ph physics.comp-ph 版本更新

Global Plane Waves From Local Gaussians: Periodic Charge Densities in a Blink

从局部高斯到全局平面波:眨眼间的周期电荷密度

Jonas Elsborg, Felix Ærtebjerg, Luca Thiede, Alán Aspuru-Guzik, Tejs Vegge, Arghya Bhowmik

Affiliations: Technical University of Denmark; University of Toronto; CAPeX Pioneer Center for Accelerating P2X Materials Discovery; Canadian Institute for Advanced Research (CIFAR); Vector Institute for Artificial Intelligence

AI summary: Proposes ELECTRAFI, which uses analytic Fourier transforms of real-space anisotropic Gaussians and the Poisson summation formula to reconstruct periodic charge densities with a single inverse FFT, running up to 633 times faster while maintaining high accuracy.

Comments ICML 2026, 29 pages including appendix, 11 Figures, 7 tables

详情
AI中文摘要

我们引入了ELECTRAFI,一种快速、端到端可微的模型,用于预测晶体材料中的周期电荷密度。ELECTRAFI在实空间中构建各向异性高斯,并利用其闭式傅里叶变换,通过泊松求和公式解析地评估平面波系数。该公式将非局域和周期行为委托给解析变换,使得通过单次逆FFT即可重建完整的周期电荷密度。通过避免显式的实空间网格探测、周期图像求和以及球谐展开,ELECTRAFI在周期基准测试中达到或超越了最先进的精度,同时比最强的竞争方法快高达633倍,在几分之一秒内重建晶体电荷密度。当用于初始化DFT计算时,ELECTRAFI将总DFT计算成本降低高达约20%,而较慢的电荷密度模型由于高推理时间而抵消了节省。我们的结果表明,准确性和推理成本共同决定了端到端DFT加速,并激励我们关注效率。

英文摘要

We introduce ELECTRAFI, a fast, end-to-end differentiable model for predicting periodic charge densities in crystalline materials. ELECTRAFI constructs anisotropic Gaussians in real space and exploits their closed-form Fourier transforms to analytically evaluate plane-wave coefficients via the Poisson summation formula. This formulation delegates non-local and periodic behavior to analytic transforms, enabling reconstruction of the full periodic charge density with a single inverse FFT. By avoiding explicit real-space grid probing, periodic image summation, and spherical harmonic expansions, ELECTRAFI matches or exceeds state-of-the-art accuracy across periodic benchmarks while being up to $633 \times$ faster than the strongest competing method, reconstructing crystal charge densities in a fraction of a second. When used to initialize DFT calculations, ELECTRAFI reduces total DFT compute cost by up to ~20%, whereas slower charge density models negate savings due to high inference times. Our results show that accuracy and inference cost jointly determine end-to-end DFT speedups, and motivate our focus on efficiency.

2601.19936 2026-06-01 cs.LG cs.AI cs.CL 版本更新

Gap-K%: Measuring Top-1 Prediction Gap for Detecting Pretraining Data

Gap-K%: 通过测量 Top-1 预测差距检测预训练数据

Minseo Kwak, Jaehyung Kim

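Affiliations: Yonsei University

AI summary: Proposes Gap-K%, which uses the log-probability gap between the LLM's top-1 prediction and the target token together with a sliding-window strategy, achieving state-of-the-art pretraining data detection on the WikiMIA and MIMIR benchmarks.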

Comments ACL 2026 Main Conference; 15 pages

详情
AI中文摘要

大型语言模型(LLM)中大规模预训练语料库的不透明性引发了严重的隐私和版权问题,使得预训练数据检测成为一项关键挑战。现有的最先进方法通常依赖于 token 似然,但它们往往忽略了目标 token 与模型 top-1 预测之间的差距,以及相邻 token 之间的局部相关性。在这项工作中,我们提出了 Gap-K%,一种基于 LLM 预训练优化动态的新型预训练数据检测方法。通过分析下一个 token 预测目标,我们观察到模型 top-1 预测与目标 token 之间的差异会引发强烈的梯度信号,这些信号在训练过程中被明确惩罚。受此启发,Gap-K% 利用 top-1 预测 token 与目标 token 之间的对数概率差距,并结合滑动窗口策略来捕获局部相关性并缓解 token 级别的波动。在 WikiMIA 和 MIMIR 基准上的大量实验表明,Gap-K% 实现了最先进的性能,在各种模型大小和输入长度上始终优于先前的基线方法。

英文摘要

The opacity of massive pretraining corpora in Large Language Models (LLMs) raises significant privacy and copyright concerns, making pretraining data detection a critical challenge. Existing state-of-the-art methods typically rely on token likelihoods, yet they often overlook the gap between the target token and the model's top-1 prediction, as well as local correlations between adjacent tokens. In this work, we propose Gap-K%, a novel pretraining data detection method grounded in the optimization dynamics of LLM pretraining. By analyzing the next-token prediction objective, we observe that discrepancies between the model's top-1 prediction and the target token induce strong gradient signals, which are explicitly penalized during training. Motivated by this, Gap-K% leverages the log probability gap between the top-1 predicted token and the target token, incorporating a sliding window strategy to capture local correlations and mitigate token-level fluctuations. Extensive experiments on the WikiMIA and MIMIR benchmarks demonstrate that Gap-K% achieves state-of-the-art performance, consistently outperforming prior baselines across various model sizes and input lengths.
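
A rough sketch of a gap-based membership score is given below. The abstract names the ingredients (the log-probability gap between the top-1 token and the target token, and a sliding window) but not the exact normalization or aggregation, so the final step, averaging the lowest k fraction of windowed gaps by analogy with Min-K%-style scores, is an assumption, as are the function name `gap_score` and its defaults.

```python
import numpy as np

def gap_score(log_probs, targets, window=3, k=0.2):
    """Score a sequence from per-token log-probabilities of shape (T, V)."""
    top1 = log_probs.max(axis=1)                            # log p of the top-1 token
    target_lp = log_probs[np.arange(len(targets)), targets] # log p of the actual token
    gaps = target_lp - top1                                 # <= 0, zero when target is top-1
    kernel = np.ones(window) / window
    smoothed = np.convolve(gaps, kernel, mode="valid")      # sliding-window mean
    n_keep = max(1, int(np.ceil(k * smoothed.size)))
    # Assumed aggregation: mean of the lowest k fraction of windowed gaps.
    return float(np.sort(smoothed)[:n_keep].mean())

rng = np.random.default_rng(0)
logits = rng.standard_normal((12, 5))                       # stand-in model outputs
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
score_top1 = gap_score(log_probs, log_probs.argmax(axis=1)) # every target is the top-1 token
score_worst = gap_score(log_probs, log_probs.argmin(axis=1))# every target is the least likely
```

Under this convention a score closer to zero means the model's top-1 predictions agree with the text more often, which is the signal the abstract associates with data seen during pretraining.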

2601.19448 2026-06-01 cs.LG cs.CR 版本更新

From Internal Diagnosis to External Auditing: A VLM-Driven Paradigm for Data-Free Online Backdoor Defense

从内部诊断到外部审计:一种VLM驱动的数据自由在线后门防御范式

Binyan Xu, Fan Yang, Xilin Dai, Di Tang, Kehuan Zhang

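Affiliations: The Chinese University of Hong Kong, Hong Kong; Sun Yat-sen University, China; Zhejiang University, China

AI summary: Proposes a paradigm shift from internal diagnosis to external semantic auditing, using a general-purpose vision-language model (VLM) as an external semantic gatekeeper. The PRISM framework (Prototype Refinement & Inspection via Statistical Monitoring) provides data-free online backdoor defense and reaches state-of-the-art performance across 17 datasets and 11 attack types.

Comments 25 pages, 10 figures, 19 tables. To appear in the Proceedings of the 43rd International Conference on Machine Learning (ICML '26)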

详情
AI中文摘要

深度神经网络本质上仍然容易受到后门攻击。传统的测试时防御主要在内部诊断方法的范式下运作,如模型修复或输入鲁棒性,但这些方法在高级攻击下往往脆弱,因为它们仍然与受害模型的受损参数纠缠在一起。我们提出从内部诊断到外部语义审计的范式转变,认为有效的防御需要通过一个独立的、基于语义的审计器将安全性与受害模型解耦。为此,我们提出了一个框架,利用通用视觉语言模型(VLM)作为不断演化的语义门控。我们引入了PRISM(原型精炼与统计监控检查),通过两个关键机制克服通用VLM的领域差距:一个混合VLM教师动态在线精炼视觉原型,以及一个由统计边界监控驱动的自适应路由器实时校准门控阈值。在17个数据集和11种攻击类型上的广泛评估表明,PRISM实现了最先进的性能,在CIFAR-10上将攻击成功率抑制到<1%,同时提高了干净准确率,为模型无关的外部化安全建立了新标准。

英文摘要

Deep Neural Networks remain inherently vulnerable to backdoor attacks. Traditional test-time defenses largely operate under the paradigm of internal diagnosis methods like model repairing or input robustness, yet these approaches are often fragile under advanced attacks as they remain entangled with the victim model's corrupted parameters. We propose a paradigm shift from Internal Diagnosis to External Semantic Auditing, arguing that effective defense requires decoupling safety from the victim model via an independent, semantically grounded auditor. To this end, we present a framework harnessing Universal Vision-Language Models (VLMs) as evolving semantic gatekeepers. We introduce PRISM (Prototype Refinement & Inspection via Statistical Monitoring), which overcomes the domain gap of general VLMs through two key mechanisms: a Hybrid VLM Teacher that dynamically refines visual prototypes online, and an Adaptive Router powered by statistical margin monitoring to calibrate gating thresholds in real-time. Extensive evaluation across 17 datasets and 11 attack types demonstrates that PRISM achieves state-of-the-art performance, suppressing Attack Success Rate to <1% on CIFAR-10 while improving clean accuracy, establishing a new standard for model-agnostic, externalized security.

2601.19220 2026-06-01 cs.LG 版本更新

Accelerated Multiple Wasserstein Gradient Flows for Multi-objective Distributional Optimization

加速多Wasserstein梯度流用于多目标分布优化

Dai Hai Nguyen, Duc Dung Nguyen, Atsuyoshi Nakamura, Hiroshi Mamitsuka

Affiliations: Graduate School of Information Science and Technology, Hokkaido University, Japan; Bioinformatics Center, Kyoto University, Japan; Institute of Information Technology, Vietnam Academy of Science and Technology, Vietnam

AI summary: Proposes the accelerated Multiple Wasserstein Gradient Descent algorithm (A-MWGraD), which applies Nesterov-style acceleration to multi-objective distributional optimization and attains convergence rates of O(1/t^2) for geodesically convex objectives and O(e^{-√β t}) for β-strongly geodesically convex objectives, improving on the O(1/t) rate of MWGraD.

Comments ICML 2026

详情
AI中文摘要

我们研究了Wasserstein空间中概率分布的多目标优化。最近,Nguyen等人(2025)提出了多Wasserstein梯度下降(MWGraD)算法,该算法利用Wasserstein空间的几何结构来联合优化多个目标。基于这种方法,我们提出了一种加速变体A-MWGraD,其灵感来自Nesterov加速。我们分析了连续时间动力学,并建立了在概率空间中收敛到弱帕累托最优点。我们的理论结果表明,对于测地凸目标,A-MWGraD达到O(1/t^2)的收敛速度;对于β-强测地凸目标,达到O(e^{-√βt})的收敛速度,在测地凸设置下优于MWGraD的O(1/t)速率。我们进一步引入了A-MWGraD的实用基于核的离散化,并通过数值实验证明,在多目标采样任务中,它在收敛速度和采样效率上始终优于MWGraD。

英文摘要

We study multi-objective optimization over probability distributions in Wasserstein space. Recently, Nguyen et al. (2025) introduced the Multiple Wasserstein Gradient Descent (MWGraD) algorithm, which exploits the geometric structure of Wasserstein space to jointly optimize multiple objectives. Building on this approach, we propose an accelerated variant, A-MWGraD, inspired by Nesterov's acceleration. We analyze the continuous-time dynamics and establish convergence to weakly Pareto optimal points in probability space. Our theoretical results show that A-MWGraD achieves a convergence rate of $O(1/t^2)$ for geodesically convex objectives and $O(e^{-\sqrt{\beta}\,t})$ for $\beta$-strongly geodesically convex objectives, improving upon the $O(1/t)$ rate of MWGraD in the geodesically convex setting. We further introduce a practical kernel-based discretization for A-MWGraD and demonstrate through numerical experiments that it consistently outperforms MWGraD in convergence speed and sampling efficiency on multi-target sampling tasks.

2601.16426 2026-06-01 cs.LG 版本更新

Safe Multitask Molecular Graph Networks for Vapor Pressure and Odor Threshold Prediction

用于蒸气压和气味阈值预测的安全多任务分子图网络

Shuang Wu, Meijie Wang, Lun Yu

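Affiliations: Department of Civil, Environmental and Geomatic Engineering, University College London; Metanovas Biotech, Inc.

AI summary: Proposes a safe multitask approach with vapor pressure as the primary task and odor threshold as the auxiliary task, combining A20/E17 molecular graph features with a PNA backbone to obtain the best vapor-pressure generalization under the Bemis-Murcko scaffold split.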

详情
AI中文摘要

我们研究了气味相关属性建模中的两个重要任务:蒸气压(VP)和气味阈值(OP)。为了评估模型的分布外(OOD)能力,我们采用了Bemis-Murcko骨架划分。在特征方面,我们引入了丰富的A20/E17分子图特征(20维原子特征+17维键特征),并系统比较了GINE和PNA骨干网络。结果表明:对于VP,使用简单回归头的PNA实现了验证MSE≈0.21(归一化空间);对于相同骨架划分下的OP单任务,使用A20/E17和鲁棒训练(Huber/winsor)实现了验证MSE≈0.60-0.61。对于多任务训练,我们提出了一种**“安全多任务”**方法:以VP为主任务,OP为辅助任务,使用延迟激活+梯度裁剪+小权重,这避免了对主任务的损害,同时获得了最佳的VP泛化性能。本文提供了完整的可重复实验、消融研究和误差相似性分析,同时讨论了数据噪声的影响和方法的局限性。

英文摘要

We investigate two important tasks in odor-related property modeling: Vapor Pressure (VP) and Odor Threshold (OP). To evaluate the model's out-of-distribution (OOD) capability, we adopt the Bemis-Murcko scaffold split. In terms of features, we introduce the rich A20/E17 molecular graph features (20-dimensional atom features + 17-dimensional bond features) and systematically compare GINE and PNA backbones. The results show: for VP, PNA with a simple regression head achieves Val MSE $\approx$ 0.21 (normalized space); for the OP single task under the same scaffold split, using A20/E17 with robust training (Huber/winsor) achieves Val MSE $\approx$ 0.60-0.61. For multitask training, we propose a "safe multitask" approach: VP as the primary task and OP as the auxiliary task, using delayed activation + gradient clipping + small weight, which avoids harming the primary task and simultaneously yields the best VP generalization performance. This paper provides complete reproducible experiments, ablation studies, and error-similarity analysis while discussing the impact of data noise and method limitations.

2601.16366 2026-06-01 cs.LG cs.SC 版本更新

Post-Training Neural Network Pruning using Graph Curvature

使用图曲率的训练后神经网络剪枝

Shuhang Tan, Jayson Sia, Paul Bogdan, Radoslav Ivanov

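Affiliations: Rensselaer Polytechnic Institute; University of Southern California

AI summary: Proposes neural curvature (NC), based on Ollivier-Ricci curvature (ORC), which computes edge curvatures from activation patterns to identify unimportant connections in a neural network for efficient pruning.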

详情
AI中文摘要

本文通过图论的视角为神经网络(NN)剪枝问题提供了新的视角。为了实现有效的剪枝,我们旨在识别主要的NN数据流以及相应的NN连接,这些连接对于完整模型的性能最重要和最不重要。与基于信息论的NN数据分析标准方法不同,我们采用了图曲率的概念,特别是Ollivier-Ricci曲率(ORC)。ORC已成功用于识别各种领域中的重要图边,如道路交通分析、生物网络和社交网络。特别是,具有负ORC的边被认为是瓶颈,因此对图的整体连通性至关重要,而正ORC的边则不那么重要。我们将这种直觉用于NN:(1)构建由NN结构诱导的图,并基于ORC引入神经曲率(NC)的概念;(2)根据一组输入示例的激活模式计算曲率;(3)证明NC可用于根据边对整体NN功能的重要性对边进行排序。我们通过在三个图像数据集(MNIST、CIFAR-10和CIFAR-100)上训练的各种中小型模型上进行剪枝实验来评估我们的方法。结果表明,与现有剪枝方法相比,我们的方法可以识别出更多不重要的边。

英文摘要

This paper provides a fresh view of the neural network (NN) pruning problem through the lens of graph theory. To achieve effective pruning, we aim to identify the main NN data flows and the corresponding NN connections that are most and least important for the performance of the full model. Unlike the standard approach to NN data flow analysis, which is based on information theory, we employ the notion of graph curvature, specifically Ollivier-Ricci curvature (ORC). ORC has been successfully used to identify important graph edges in various domains such as road traffic analysis, biological networks, and social networks. In particular, edges with negative ORC are considered bottlenecks and are therefore critical to the graph's overall connectivity, whereas positive-ORC edges are less essential. We use this intuition for NNs to (1) construct a graph induced by the NN structure and introduce the notion of neural curvature (NC) based on ORC; (2) calculate curvatures based on activation patterns for a set of input examples; and (3) demonstrate that NC can be used to rank edges according to their importance for overall NN functionality. We evaluate our method through pruning experiments on a variety of small and medium size models trained on three image datasets: MNIST, CIFAR-10, and CIFAR-100. The results indicate that our method can identify a larger number of unimportant edges compared to existing pruning methods.

2601.13704 2026-06-01 cs.SD cs.AI cs.LG eess.AS 版本更新

Performance and Complexity Trade-off Optimization of Speech Models During Training

训练过程中语音模型的性能与复杂度权衡优化

Esteban Gómez, Tom Backström

发表机构 * Department of Information and Communications Engineering, Aalto University(阿尔托大学信息与通信工程系)

AI总结 提出一种基于特征噪声注入的重新参数化技术,利用随机梯度下降方法在训练中联合优化语音模型的性能和计算复杂度,实现动态模型大小调整。

Comments This work has been submitted to the IEEE for possible publication

详情
AI中文摘要

在语音机器学习中,神经网络模型通常通过选择具有固定层大小和结构的架构来设计。这些模型随后被训练以最大化与任务目标相关的性能指标。虽然整体架构通常由任务的先验知识指导,但各层的大小往往是启发式选择的。然而,这种方法并不能保证性能与计算复杂度之间的最优权衡;因此,通常采用权重量化或模型剪枝等后处理方法以降低计算成本。这是因为随机梯度下降(SGD)方法只能优化可微函数,而影响计算复杂度的因素(如层大小和每秒浮点运算次数(FLOP/s))是不可微的,需要在训练过程中修改模型结构。我们提出了一种基于特征噪声注入的重新参数化技术,使得在训练过程中能够使用基于SGD的方法联合优化性能和计算复杂度。与传统的剪枝方法不同,我们的方法允许模型大小针对目标性能-复杂度权衡进行动态优化,而无需依赖启发式标准来选择要移除的权重或结构。我们通过三个案例研究证明了我们方法的有效性,包括一个合成示例和两个实际应用:语音活动检测和音频反欺骗。与我们的工作相关的代码已公开,以鼓励进一步研究。

英文摘要

In speech machine learning, neural network models are typically designed by choosing an architecture with fixed layer sizes and structure. These models are then trained to maximize performance on metrics aligned with the task's objective. While the overall architecture is usually guided by prior knowledge of the task, the sizes of individual layers are often chosen heuristically. However, this approach does not guarantee an optimal trade-off between performance and computational complexity; consequently, post hoc methods such as weight quantization or model pruning are typically employed to reduce computational cost. This occurs because stochastic gradient descent (SGD) methods can only optimize differentiable functions, while factors influencing computational complexity, such as layer sizes and floating-point operations per second (FLOP/s), are non-differentiable and require modifying the model structure during training. We propose a reparameterization technique based on feature noise injection that enables joint optimization of performance and computational complexity during training using SGD-based methods. Unlike traditional pruning methods, our approach allows the model size to be dynamically optimized for a target performance-complexity trade-off, without relying on heuristic criteria to select which weights or structures to remove. We demonstrate the effectiveness of our method through three case studies, including a synthetic example and two practical real-world applications: voice activity detection and audio anti-spoofing. The code related to our work is publicly available to encourage further research.

2510.01137 2026-06-01 cs.LG 版本更新

Re-examining Low Rank adaptation for private LLM fine-tuning

重新审视用于私有LLM微调的低秩适应

Ali Dadsetan, Frank Rudzicz

发表机构 * Dalhousie University(达尔豪斯大学) Vector Institute(向量研究所)

AI总结 研究差分隐私SGD中噪声导致的梯度奇异值膨胀问题,提出通过部分恢复原始奇异值分布来提升DP-SGD的样本效率。

详情
AI中文摘要

隐私是在敏感数据上微调大型语言模型(LLM)时的核心关注点,差分隐私随机梯度下降(DP-SGD)——它裁剪每个样本的梯度并添加校准的高斯噪声——是形式化隐私保证的标准工具。理论和实践都表明,低秩模型更适合DP训练,这一特性对LLM尤其相关,因为其微调梯度表现出强烈的低秩结构。诸如DP-LoRA之类的方法通过将更新限制在低秩子空间来利用这一点,即仅保留每层梯度SVD中的少数非零分量。然而,我们认为,虽然非零分量少很重要,但DP-SGD注入的各向同性噪声会膨胀梯度矩阵的奇异值,破坏其自然快速衰减。在这项工作中,我们研究了这种噪声引起的特征值膨胀是否会降低性能,并表明部分恢复原始奇异值分布显著提高了DP-SGD的样本效率。在语言分类(使用RoBERTa的GLUE基准)和文本生成(使用Qwen和Llama模型(参数高达4B)的E2E和DART表格到文本基准)上的实验表明,恢复奇异值的快速衰减是一种在不损害隐私保证的情况下加速DP优化过程的有效策略。

英文摘要

Privacy is a central concern when fine-tuning large language models (LLMs) on sensitive data, and differentially private stochastic gradient descent (DP-SGD) -- which clips per-sample gradients and adds calibrated Gaussian noise -- is the standard tool for formal privacy guarantees. Both theory and practice show that lower-rank models are better suited to DP training, a property especially relevant for LLMs, whose fine-tuning gradients exhibit a strong low-rank structure. Methods such as DP-LoRA exploit this by restricting updates to a low-rank subspace, i.e., retaining only a few non-zero components in the SVD of each layer's gradient. However, we argue that while having few non-zero components is important, the isotropic noise injected by DP-SGD inflates the singular values of the gradient matrix, disrupting their naturally fast decay. In this work, we investigate whether this noise-induced eigenvalue blow-up reduces performance, and show that partially restoring the original singular-value profile significantly improves the sample efficiency of DP-SGD. Experiments on language classification (GLUE benchmark with RoBERTa) and text generation (E2E and DART table-to-text benchmarks with Qwen and Llama models up to 4B parameters) showcase that restoring the fast decay of singular values is a viable strategy for speeding up the DP optimization process, without compromising privacy guarantees.
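
The effect described above can be reproduced on a toy example. The sketch below is a minimal illustration, not the authors' code: it applies the standard DP-SGD step (per-sample clipping plus Gaussian noise with standard deviation `noise_multiplier * clip_norm`) to synthetic rank-2 per-sample gradients and compares singular values with and without noise. All names, shapes, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def dp_sgd_gradient(per_sample_grads, clip_norm, noise_multiplier, rng):
    """Clip each per-sample gradient to L2 norm <= clip_norm, sum, add Gaussian noise, average."""
    n = per_sample_grads.shape[0]
    norms = np.linalg.norm(per_sample_grads.reshape(n, -1), axis=1)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_sample_grads * scale[:, None, None]
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=per_sample_grads.shape[1:])
    return (clipped.sum(axis=0) + noise) / n

rng = np.random.default_rng(0)
n, d_out, d_in, rank = 64, 32, 32, 2

# Hypothetical per-sample gradients that all live in the same rank-2 subspace.
U = rng.normal(size=(d_out, rank))
V = rng.normal(size=(rank, d_in))
coeffs = rng.normal(size=(n, rank)) + 1.0
per_sample = np.einsum("nr,or,ri->noi", coeffs, U, V)

clean = dp_sgd_gradient(per_sample, clip_norm=1.0, noise_multiplier=0.0, rng=rng)
noisy = dp_sgd_gradient(per_sample, clip_norm=1.0, noise_multiplier=1.0, rng=rng)

s_clean = np.linalg.svd(clean, compute_uv=False)
s_noisy = np.linalg.svd(noisy, compute_uv=False)

# Without noise the spectrum drops to numerical zero after `rank` components;
# with isotropic noise the tail is lifted to a non-negligible plateau.
print("clean tail:", s_clean[rank:rank + 3])
print("noisy tail:", s_noisy[rank:rank + 3])
```

The lifted tail is the inflation the abstract refers to. How the original singular-value profile is partially restored is not specified in the abstract, so it is not sketched here.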

2601.05134 2026-06-01 cs.LG 版本更新

Sequential Subspace Noise Injection Prevents Accuracy Collapse in Certified Unlearning

顺序子空间噪声注入防止认证遗忘中的精度崩溃

Polina Dolgova, Sebastian U. Stich

发表机构 * CISPA Helmholtz Center for Information Security(CISPA亥姆霍兹信息安全中心) Universität des Saarlandes(萨尔大学)

AI总结 提出顺序子空间噪声调度,将噪声预算分配到参数空间的正交子空间,在保持差分隐私认证保证的同时,显著提高遗忘后模型精度。

详情
AI中文摘要

基于差分隐私的认证遗忘提供了强有力的保证,但在很大程度上仍不实用:目前提出的噪声微调方法虽然实现了这些保证,但严重降低了模型精度。我们提出了顺序噪声调度,它将噪声预算分布到参数空间的正交子空间中,而不是一次性注入所有噪声。这种简单的修改减轻了噪声的破坏性影响,同时保留了原始的认证保证。我们将噪声微调的分析扩展到子空间设置,证明保留了相同的 $(\varepsilon,\delta)$ 隐私预算。在图像分类基准上的实验结果表明,我们的方法在遗忘后显著提高了精度,同时对成员推断攻击保持鲁棒性。这些结果表明,认证遗忘可以实现严格的保证和实际的效用。

英文摘要

Certified unlearning based on differential privacy offers strong guarantees but remains largely impractical: the noisy fine-tuning approaches proposed so far achieve these guarantees but severely reduce model accuracy. We propose sequential noise scheduling, which distributes the noise budget across orthogonal subspaces of the parameter space, rather than injecting it all at once. This simple modification mitigates the destructive effect of noise while preserving the original certification guarantees. We extend the analysis of noisy fine-tuning to the subspace setting, proving that the same $(\varepsilon,\delta)$ privacy budget is retained. Empirical results on image classification benchmarks show that our approach substantially improves accuracy after unlearning while remaining robust to membership inference attacks. These results show that certified unlearning can achieve both rigorous guarantees and practical utility.

2601.01456 2026-06-01 cs.CV cs.AI cs.LG 版本更新

Rethinking Multimodal Few-Shot 3D Point Cloud Segmentation: From Fused Refinement to Decoupled Arbitration

重新思考多模态少样本3D点云分割:从融合精炼到解耦仲裁

Wentao Bian, Fenglei Xu

发表机构 * Suzhou University of Science and Technology(苏州科技大学)

AI总结 针对多模态少样本3D点云分割中“融合-精炼”范式的“可塑性-稳定性困境”和CLIP的语义盲区,提出解耦专家仲裁少样本分割网络(DA-FSS),通过解耦语义与几何路径并相互正则化梯度,实现更好的泛化性能。

Comments Accepted to IJCAI-ECAI 2026 (Main Track). 9 pages, 3 figures, 3 tables

详情
AI中文摘要

本文重新审视多模态少样本3D点云语义分割(FS-PCS),识别出“融合-精炼”范式中的一个冲突:“可塑性-稳定性困境”。此外,CLIP的类间混淆可能导致语义盲区。为解决这些问题,我们提出解耦专家仲裁少样本分割网络(DA-FSS),该模型有效区分语义和几何路径,并相互正则化它们的梯度以实现更好的泛化。DA-FSS采用与MM-FSS相同的主干网络和预训练文本编码器生成文本嵌入,从而提高自由模态的利用率并更好地利用每个模态的信息空间。为此,我们提出并行专家精炼模块以生成每个模态相关性。我们还提出堆叠仲裁模块(SAM)执行卷积融合并为每个模态路径仲裁相关性。并行专家解耦两条路径:几何专家保持可塑性,语义专家确保稳定性。它们通过解耦对齐模块(DAM)协调,该模块在不传播混淆的情况下传递知识。在流行数据集(S3DIS、ScanNet)上的实验表明DA-FSS优于MM-FSS。同时,几何边界、完整性和纹理区分均优于基线。代码可在https://github.com/MoWenQAQ/DA-FSS/获取。

英文摘要

In this paper, we revisit multimodal few-shot 3D point cloud semantic segmentation (FS-PCS), identifying a conflict in "Fuse-then-Refine" paradigms: the "Plasticity-Stability Dilemma." In addition, CLIP's inter-class confusion can result in semantic blindness. To address these issues, we present the Decoupled-experts Arbitration Few-Shot SegNet (DA-FSS), a model that effectively distinguishes between semantic and geometric paths and mutually regularizes their gradients to achieve better generalization. DA-FSS employs the same backbone and pre-trained text encoder as MM-FSS to generate text embeddings, which can increase free modalities' utilization rate and better leverage each modality's information space. To achieve this, we propose a Parallel Expert Refinement module to generate each modal correlation. We also propose a Stacked Arbitration Module (SAM) to perform convolutional fusion and arbitrate correlations for each modality pathway. The Parallel Experts decouple two paths: a Geometric Expert maintains plasticity, and a Semantic Expert ensures stability. They are coordinated via a Decoupled Alignment Module (DAM) that transfers knowledge without propagating confusion. Experiments on popular datasets (S3DIS, ScanNet) demonstrate the superiority of DA-FSS over MM-FSS. Meanwhile, geometric boundaries, completeness, and texture differentiation are all superior to the baseline. The code is available at: https://github.com/MoWenQAQ/DA-FSS/.

2601.01075 2026-06-01 cs.LG cs.AI cs.CV 版本更新

Flow Equivariant World Models: Memory for Partially Observed Dynamic Environments

流等变世界模型:部分观测动态环境的记忆

Hansen Jin Lillemark, Benhao Huang, Fangneng Zhan, Yilun Du, Thomas Anderson Keller

发表机构 * Kempner Institute, Harvard University(哈佛大学 Kempner 研究所) ML, Carnegie Mellon University(卡内基梅隆大学机器学习系) SEAS, Harvard University(哈佛大学工程与应用科学学院)

AI总结 提出流等变世界建模框架,利用时间参数化对称性在潜在记忆中实现长时程稳定准确的动力学预测,解决部分观测问题。

Comments Accepted at ICML 2026

详情
AI中文摘要

具身系统将世界体验为“流之交响”:多种连续感官输入流与自身运动耦合,并与外部物体的动力学交织。这些感官流和世界的基本动力学遵循平滑的时间参数化对称性,而现有的世界模型忽略了这一点。如果没有尊重这种结构的记忆,部分可观测性对现有方法构成主要障碍:每次观测仅揭示世界的一部分,而未观测区域继续演化。在这项工作中,我们引入了流等变世界建模,这是一个利用潜在记忆中的时间参数化对称性来实现长时程稳定准确动力学预测的框架。潜在记忆随自身运动和推断的外部物体运动等变地移动和变换,使关于视野外区域的信息随时间保持对齐。我们在2D和3D部分观测视频世界建模基准上展示了该框架相对于最先进的扩散、记忆增强和循环世界模型架构的优势。更广泛地说,我们的结果表明,当预测表示按照它们所建模的世界的时间和动力学结构组织时,它们会变得更加强大。项目页面:https://flowequivariantworldmodels.github.io/

英文摘要

Embodied systems experience the world as 'a symphony of flows': a combination of many continuous streams of sensory input coupled to self-motion, interwoven with the dynamics of external objects. These sensory streams and the underlying dynamics of the world obey smooth, time-parameterized symmetries which existing world models ignore. Without a memory that respects this structure, partial observability presents a major obstacle to existing methods: each observation reveals only a fraction of the world, while unobserved regions continue to evolve. In this work, we introduce Flow Equivariant World Modeling, a framework that leverages time-parameterized symmetries within a latent memory for stable and accurate dynamics prediction over long horizons. The latent memory shifts and transforms equivariantly with self-motion and inferred external object motion, keeping information about out-of-view regions aligned as time progresses. We demonstrate the advantage of this framework over state-of-the-art diffusion, memory-augmented, and recurrent world model architectures on 2D and 3D partially observed video world modeling benchmarks. More broadly, our results suggest that predictive representations become more powerful when they are organized in line with the temporal and dynamical structure of the world they model. Project page: https://flowequivariantworldmodels.github.io/

2512.23626 2026-06-01 cs.AI cs.LG 版本更新

Regret-Based Federated Causal Discovery with Unknown Interventions

基于遗憾的联邦因果发现与未知干预

Federico Baldo, Charles K. Assaad

发表机构 * Sorbonne Université(索邦大学) INSERM(国家健康与医学研究院) Institut Pierre Louis d’Epidémiologie et de Santé Publique(流行病学与公共卫生研究所)

AI总结 提出I-PERI算法,通过恢复客户端图并集的CPDAG并利用跨客户端干预引起的结构差异定向额外边,得到更紧的Φ-马尔可夫等价类,解决联邦环境下未知客户端级干预的因果发现问题。

Comments ICML 2026

详情
AI中文摘要

大多数因果发现方法从观测数据中恢复一个表示马尔可夫等价类的完全部分有向无环图。最近的工作将这些方法扩展到联邦设置以解决数据去中心化和隐私约束,但通常假设所有客户端共享相同的因果模型,这在实践中不现实,因为客户端特定的策略或协议(例如不同医院)自然会导致异质且未知的干预。在这项工作中,我们解决了未知客户端级干预下的联邦因果发现问题。我们提出了I-PERI,一种新颖的联邦算法,首先恢复客户端图并集的CPDAG,然后通过利用跨客户端干预引起的结构差异来定向额外的边。这产生了一个更紧的等价类,我们称之为Φ-马尔可夫等价类,由Φ-CPDAG表示。我们提供了I-PERI收敛性及其隐私保护属性的理论保证,并在合成数据上进行了实证评估,证明了所提算法的有效性。

英文摘要

Most causal discovery methods recover a completed partially directed acyclic graph representing a Markov equivalence class from observational data. Recent work has extended these methods to federated settings to address data decentralization and privacy constraints, but often under idealized assumptions that all clients share the same causal model. Such assumptions are unrealistic in practice, as client-specific policies or protocols, for example, across hospitals, naturally induce heterogeneous and unknown interventions. In this work, we address federated causal discovery under unknown client-level interventions. We propose I-PERI, a novel federated algorithm that first recovers the CPDAG of the union of client graphs and then orients additional edges by exploiting structural differences induced by interventions across clients. This yields a tighter equivalence class, which we call the $\mathbf{\Phi}$-Markov Equivalence Class, represented by the $\mathbf{\Phi}$-CPDAG. We provide theoretical guarantees on the convergence of I-PERI, as well as on its privacy-preserving properties, and present empirical evaluations on synthetic data demonstrating the effectiveness of the proposed algorithm.

2507.12453 2026-06-01 cs.LG 版本更新

Cost-aware Stopping for Bayesian Optimization

成本感知的贝叶斯优化停止规则

Qian Xie, Linda Cai, Alexander Terenin, Peter I. Frazier, Ziv Scully

发表机构 * Cornell University, Ithaca, New York, USA(康奈尔大学) University of California, Berkeley, Berkeley, California, USA(加州大学伯克利分校)

AI总结 针对贝叶斯优化中成本感知的停止问题,提出一种基于理论连接成本感知采集函数的停止规则,并证明其能保证期望成本调整简单遗憾的界。

Comments Accepted by ICML 2026

详情
AI中文摘要

在自动化机器学习、科学发现以及贝叶斯优化的其他应用中,以成本感知的方式决定何时停止评估昂贵的黑盒函数是一个重要但尚未充分探索的实际考虑。为此目的的一个自然性能指标是成本调整的简单遗憾,它明确捕捉了解决方案质量与累积评估成本之间的权衡。现有的贝叶斯优化停止规则要么是启发式的,要么具有理论基础但设计用于优化简单遗憾而不考虑评估成本;因此,当成本较高时,它们无法保证避免不必要的评估。我们提出了一种原则性的成本感知贝叶斯优化停止规则,该规则无需启发式调优即可适应变化的评估成本。我们的规则基于与最先进的成本感知采集函数(即潘多拉盒子Gittins指数(PBGI)和对数每成本期望改进(LogEIPC))的理论联系。当与任一采集函数配对时,我们证明所得策略满足一个理论保证,限制了期望成本调整的简单遗憾。在包括超参数优化和神经架构规模搜索在内的合成任务和实证基准测试中,将我们的停止规则与PBGI或LogEIPC配对,通常在成本调整的简单遗憾方面匹配或优于其他采集函数-停止规则配对。

英文摘要

In automated machine learning, scientific discovery, and other applications of Bayesian optimization, deciding when to stop evaluating expensive black-box functions in a cost-aware manner is an important but underexplored practical consideration. A natural performance metric for this purpose is the cost-adjusted simple regret, which explicitly captures the trade-off between solution quality and cumulative evaluation cost. Existing stopping rules for Bayesian optimization are either heuristic, or are theoretically grounded but designed to optimize simple regret without accounting for evaluation costs; as a result, they provide no guarantees against unnecessary evaluations when costs are high. We propose a principled cost-aware stopping rule for Bayesian optimization that adapts to varying evaluation costs without heuristic tuning. Our rule is grounded in a theoretical connection to state-of-the-art cost-aware acquisition functions, namely the Pandora's Box Gittins Index (PBGI) and log expected improvement per cost (LogEIPC). When paired with either acquisition function, we prove that the resulting policy satisfies a theoretical guarantee bounding the expected cost-adjusted simple regret. Across synthetic tasks and empirical benchmarks including hyperparameter optimization and neural architecture size search, pairing our stopping rule with PBGI or LogEIPC usually matches or outperforms other acquisition-function--stopping-rule pairs in terms of cost-adjusted simple regret.
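
To make the performance metric concrete, the sketch below computes a cost-adjusted simple regret curve for a finished run. It assumes a minimization problem and the additive form "simple regret plus cumulative cost"; the function name, the `cost_weight` parameter, and the toy numbers are hypothetical. It evaluates stopping points in hindsight and is not the paper's stopping rule.

```python
import numpy as np

def cost_adjusted_simple_regret(values, costs, optimum, cost_weight=1.0):
    """Regret of the best value seen so far plus the (weighted) cumulative evaluation cost."""
    values = np.asarray(values, dtype=float)
    costs = np.asarray(costs, dtype=float)
    best_so_far = np.minimum.accumulate(values)   # minimization: lower is better
    return (best_so_far - optimum) + cost_weight * np.cumsum(costs)

# Toy run: objective values observed at each evaluation and the cost paid for each.
values = [0.90, 0.50, 0.42, 0.41, 0.41]
costs = [0.05, 0.05, 0.05, 0.05, 0.05]

curve = cost_adjusted_simple_regret(values, costs, optimum=0.40)
best_stop = int(np.argmin(curve)) + 1             # number of evaluations, 1-based

print(curve)
print("best stopping point in hindsight:", best_stop)
```

In this toy run the later evaluations improve the best value by less than they cost, so the curve turns upward. A stopping rule has to anticipate that turn without knowing the optimum.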

2512.20732 2026-06-01 cs.LG cs.AI cs.SE 版本更新

FEM-Bench: A Structured Scientific Reasoning Benchmark for Evaluating Code-Generating LLMs

FEM-Bench:评估代码生成大语言模型的结构化科学推理基准

Saeed Mohammadzadeh, Erfan Hamdi, Joel Shor, Emma Lejeune

发表机构 * Boston University(波士顿大学) Move37 Labs(Move37实验室) Department of Mechanical Engineering(机械工程系)

AI总结 提出FEM-Bench基准,通过有限元方法相关编程任务评估大语言模型在科学计算中的结构化推理能力,实验表明现有模型尚不能稳定解决所有任务。

Comments 45 pages, 5 figures, 9 tables, 7 listings

详情
AI中文摘要

随着大语言模型在物理世界推理能力上的进步,缺乏严格基准来评估其生成科学有效物理模型的能力已成为一个关键缺口。计算力学开发和运用数学模型与数值方法,预测物理系统在力、变形和约束下的行为,为结构化科学推理评估提供了理想基础。问题遵循清晰的数学结构,强制执行严格的物理和数值约束,并支持客观验证。该学科要求构建物理系统的显式模型,并推理几何、空间关系和材料行为,直接联系到新兴的AI物理推理和世界建模目标。我们提出FEM-Bench,一个计算力学基准,旨在评估大语言模型生成正确有限元方法及相关代码的能力。FEM-Bench 2025包含一系列入门但非平凡的任务,与计算力学研究生第一门课程的材料一致。这些任务捕捉了基本的数值和物理建模挑战,同时仅代表该学科复杂性的很小一部分。尽管简单,最先进的大语言模型并不能可靠地解决所有任务。在五次尝试中,函数编写表现最好的模型Gemini 3 Pro至少一次完成了30/33个任务,五次全部完成26/33个任务。单元测试编写表现最好的模型GPT-5的平均联合成功率为73.8%。其他流行模型显示出广泛的性能差异。FEM-Bench为评估AI生成的科学代码建立了结构化基础,未来版本将纳入更复杂的任务以跟踪模型进展。

英文摘要

As LLMs advance their reasoning capabilities about the physical world, the absence of rigorous benchmarks for evaluating their ability to generate scientifically valid physical models has become a critical gap. Computational mechanics, which develops and applies mathematical models and numerical methods to predict the behavior of physical systems under forces, deformation, and constraints, provides an ideal foundation for structured scientific reasoning evaluation. Problems follow clear mathematical structure, enforce strict physical and numerical constraints, and support objective verification. The discipline requires constructing explicit models of physical systems and reasoning about geometry, spatial relationships, and material behavior, connecting directly to emerging AI goals in physical reasoning and world modeling. We introduce FEM-Bench, a computational mechanics benchmark designed to evaluate the ability of LLMs to generate correct finite element method (FEM) and related code. FEM-Bench 2025 contains a suite of introductory but nontrivial tasks aligned with material from a first graduate course on computational mechanics. These tasks capture essential numerical and physical modeling challenges while representing only a small fraction of the complexity present in the discipline. Despite their simplicity, state-of-the-art LLMs do not reliably solve all of them. In a five attempt run, the best performing model at function writing, Gemini 3 Pro, completed 30/33 tasks at least once and 26/33 tasks all five times. The best performing model at unit test writing, GPT-5, had an Average Joint Success Rate of 73.8%. Other popular models showed broad performance variation. FEM-Bench establishes a structured foundation for evaluating AI-generated scientific code, and future iterations will incorporate increasingly sophisticated tasks to track progress as models evolve.

2509.16187 2026-06-01 cs.SE cs.LG 版本更新

MatchFixAgent: Language-Agnostic Autonomous Repository-Level Code Translation Validation and Repair

MatchFixAgent: 语言无关的自主仓库级代码翻译验证与修复

Ali Reza Ibrahimzada, Brandon Paulsen, Reyhaneh Jabbarvand, Joey Dodds, Daniel Kroening

发表机构 * Siebel School of Computing and Data Science, University of Illinois Urbana-Champaign, Urbana, IL, USA(伊利诺伊大学厄巴纳-香槟分校Siebel计算与数据科学学院) Amazon, Arlington, VA, USA(亚马逊公司,阿灵顿,弗吉尼亚州,美国) University of Oxford, Oxford, UK(牛津大学,牛津,英国)

AI总结 提出基于大语言模型的多智能体框架MatchFixAgent,实现语言无关的仓库级代码翻译等价性验证与修复,在验证覆盖率和修复成功率上显著优于现有方法。

Comments Published in ICML 2026

详情
AI中文摘要

代码翻译将源代码从一种编程语言转换为另一种。验证翻译的功能等价性并在必要时进行修复是代码翻译的关键步骤。现有的自动化验证和修复方法由于工程开销高而难以泛化到多种编程语言,并且它们依赖于现有且往往不充分的测试套件,导致等价性误判和翻译修复效果不佳。为弥补这一差距,我们开发了MatchFixAgent,一个基于大语言模型、语言无关的翻译等价性验证与修复框架。MatchFixAgent采用多智能体架构,将等价性验证分解为多个子任务,以确保对翻译进行彻底且一致的语义分析。我们将MatchFixAgent的验证和修复结果与四种仓库级代码翻译技术进行了比较。结果表明,MatchFixAgent对99.2%的翻译对生成了(不)等价判定,其中72.8%的等价性验证结果与先前工作一致。当MatchFixAgent的结果与先前工作不一致时,我们发现60.7%的情况下MatchFixAgent的结果实际上是正确的。此外,我们证明MatchFixAgent可以修复50.6%的不等价翻译,而先前工作仅为18.5%。

英文摘要

Code translation transforms source code from one programming language (PL) to another. Validating the functional equivalence of translation and repairing, if necessary, are critical steps in code translation. Existing automated validation and repair approaches struggle to generalize to many PLs due to high engineering overhead, and they rely on existing and often inadequate test suites, which results in false claims of equivalence and ineffective translation repair. To bridge this gap, we develop MatchFixAgent, a large language model (LLM)-based, PL-agnostic framework for equivalence validation and repair of translations. MatchFixAgent features a multi-agent architecture that divides equivalence validation into several sub-tasks to ensure thorough and consistent semantic analysis of the translation. We compare MatchFixAgent's validation and repair results with four repository-level code translation techniques. Our results demonstrate that MatchFixAgent produces (in)equivalence verdicts for 99.2% of translation pairs, with the same equivalence validation result as prior work on 72.8% of them. When MatchFixAgent's result disagrees with prior work, we find that 60.7% of the time MatchFixAgent's result is actually correct. In addition, we show that MatchFixAgent can repair 50.6% of inequivalent translation, compared to prior work's 18.5%.

2512.11779 2026-06-01 stat.ML cs.AI cs.LG 版本更新

Conditional Coverage Diagnostics for Conformal Prediction

共形预测的条件覆盖诊断

Sacha Braun, David Holzmüller, Michael I. Jordan, Francis Bach

发表机构 * Sierra team, Inria Paris, France(法国Inria巴黎中心Sierra团队) Ecole Normale Supérieure, PSL Research University, Paris(巴黎高等师范学院,巴黎文理研究大学) Soda team, Inria Paris-Saclay, France(法国Inria巴黎-萨克雷中心Soda团队) Departments of EECS(电子工程与计算机科学系)

AI总结 提出将条件覆盖估计转化为分类问题,通过超额风险度量(ERT)来诊断共形预测的条件覆盖偏差,实验表明使用现代分类器比传统指标具有更高的统计功效。

详情
AI中文摘要

评估条件覆盖仍然是评估预测系统可靠性中最持久的挑战之一。尽管共形方法可以保证边际覆盖,但没有方法能保证产生具有正确条件覆盖的集合,这使得实践者无法清晰解释局部偏差。为了克服现有指标的样本低效和过拟合问题,我们将条件覆盖估计转化为一个分类问题。当且仅当某个分类器能够达到比目标覆盖更低的风险时,条件覆盖被违反。通过选择(适当的)损失函数,得到的风险差异给出了自然误覆盖度量(如L1和L2距离)的保守估计,甚至可以分离过覆盖和欠覆盖以及非恒定目标覆盖的影响。我们将得到的度量族称为目标覆盖的超额风险(ERT)。实验表明,使用现代分类器比基于简单分类器的现有指标(如CovGap)具有更高的统计功效。此外,我们使用我们的度量来基准测试不同的共形预测方法。最后,我们发布了ERT以及先前条件覆盖度量的开源软件包。这些贡献共同为理解、诊断和改进预测系统的条件可靠性提供了新视角。

英文摘要

Evaluating conditional coverage remains one of the most persistent challenges in assessing the reliability of predictive systems. Although conformal methods can give guarantees on marginal coverage, no method can guarantee to produce sets with correct conditional coverage, leaving practitioners without a clear way to interpret local deviations. To overcome sample-inefficiency and overfitting issues of existing metrics, we cast conditional coverage estimation as a classification problem. Conditional coverage is violated if and only if some classifier can achieve lower risk than the target coverage. Through the choice of a (proper) loss function, the resulting risk difference gives a conservative estimate of natural miscoverage measures such as L1 and L2 distance, and can even separate the effects of over- and under-coverage, and non-constant target coverages. We call the resulting family of metrics excess risk of the target coverage (ERT). We show experimentally that the use of modern classifiers provides much higher statistical power than simple classifiers underlying established metrics like CovGap. Additionally, we use our metric to benchmark different conformal prediction methods. Finally, we release an open-source package for ERT as well as previous conditional coverage metrics. Together, these contributions provide a new lens for understanding, diagnosing, and improving the conditional reliability of predictive systems.
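
The classification view can be sketched in a few lines. The example below is an illustration under stated assumptions, not the authors' released package: it uses the Brier (L2) loss, a simple quantile-binned frequency estimator as a stand-in for the classifier, a single 1-D feature, and synthetic coverage indicators. The excess risk is the loss of the constant prediction `target` minus the loss of the classifier on held-out data.

```python
import numpy as np

def brier(z, p):
    """Mean squared error between 0/1 coverage indicators z and predicted probabilities p."""
    return float(np.mean((z - p) ** 2))

def ert_brier_binned(x_train, z_train, x_test, z_test, target, n_bins=10):
    """Held-out risk of the constant `target` minus risk of a binned coverage classifier."""
    edges = np.quantile(x_train, np.linspace(0.0, 1.0, n_bins + 1))[1:-1]
    p_bin = np.full(n_bins, float(target))        # fall back to the target for empty bins
    train_bin = np.digitize(x_train, edges)
    for b in range(n_bins):
        in_bin = train_bin == b
        if in_bin.any():
            p_bin[b] = z_train[in_bin].mean()     # empirical coverage inside the bin
    p_hat = p_bin[np.digitize(x_test, edges)]
    return brier(z_test, float(target)) - brier(z_test, p_hat)

def simulate(cover_prob, n=40000, seed=0):
    """Draw features x and coverage indicators z with P(z=1 | x) = cover_prob(x)."""
    rng = np.random.default_rng(seed)
    x = rng.random(n)
    z = (rng.random(n) < cover_prob(x)).astype(float)
    half = n // 2
    return ert_brier_binned(x[:half], z[:half], x[half:], z[half:], target=0.9)

# Both settings have 90% marginal coverage; only the second varies with x.
ert_valid = simulate(lambda x: np.full_like(x, 0.9))
ert_violated = simulate(lambda x: 0.8 + 0.2 * x)

print("ERT, conditional coverage holds:   ", ert_valid)
print("ERT, conditional coverage violated:", ert_violated)
```

With the Brier loss, the excess risk of the true conditional coverage probability equals the mean squared gap to the target. An imperfect classifier can only lose part of that gap in expectation, which matches the abstract's description of the estimate as conservative.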

2512.11561 2026-06-01 cs.LG 版本更新

View Space: Learning Representation across Arbitrary Graphs

视图空间:跨任意图的学习表示

Dooho Lee, Myeong Kong, Minho Jeong, Jaemin Yoo

发表机构 * School of Electrical Engineering, KAIST, Daejeon, Republic of Korea(韩国科学技术院电气工程学院,大田) Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea(首尔国立大学计算机科学与工程系)

AI总结 本文提出视图空间概念,通过图视图变换(GVT)实现跨任意图的归纳节点表示学习,并在节点分类任务中显著优于现有方法。

Comments Accepted to ICML 2026

详情
AI中文摘要

将预训练模型泛化到未见数据集而无需重新训练是基础模型的核心挑战。由于数据集间特征维度和语义的巨大差异,在数值数据上实现完全归纳推理尤为困难。我们观察到,在存在图结构的情况下,数值数据在特征空间之外还允许一个由结构诱导的独特表示轴,我们将其形式化为视图空间。该视图空间能够统一表示具有异构特征的图,并激发了图视图变换(GVT),这是一类可在任意图间共享的参数化映射。我们通过循环GVT实例化该框架,这是一种用于节点分类中完全归纳节点表示学习的架构。在OGBN-Arxiv上预训练并在27个基准上评估,循环GVT比先前的完全归纳图模型GraphAny高出8.93%,并超过12个单独调优的GNN至少3.30%。这些结果确立了视图空间作为跨异构特征空间图学习的原理性和实用基础。代码和检查点可在https://github.com/dooho00/graph-view-space获取。

英文摘要

Generalizing pretrained models to unseen datasets without retraining is a central challenge toward foundation models. Achieving fully inductive inference on numerical data is particularly difficult due to large variations in feature dimensionality and semantics across datasets. We observe that, in the presence of graph structure, numerical data admits a distinct structure-induced representational axis beyond the feature space, which we formalize as the view space. This view space enables a unified representation of graphs with heterogeneous features and motivates Graph View Transformation (GVT), a class of parametric mappings that can be shared across arbitrary graphs. We instantiate this framework with Recurrent GVT, an architecture for fully inductive node representation learning in node classification. Pretrained on OGBN-Arxiv and evaluated on 27 benchmarks, Recurrent GVT outperforms GraphAny, the prior fully inductive graph model, by +8.93%, and surpasses 12 individually tuned GNNs by at least +3.30%. These results establish the view space as a principled and practical foundation for learning across graphs with heterogeneous feature spaces. Code and checkpoints are available in https://github.com/dooho00/graph-view-space.

2512.05038 2026-06-01 cs.LG 版本更新

The SuperActivator Mechanism: Transformers Concentrate Reliable Concept Signals in the Tail

超级激活机制:Transformer将可靠概念信号集中在尾部

Cassandra Goldberg, Chaehyeon Kim, Adam Stein, Eric Wong

发表机构 * University of Pennsylvania(宾夕法尼亚大学)

AI总结 本文发现Transformer中的超级激活机制,通过放大概念激活差距,将最可靠的概念证据集中在少数高激活token上,并基于此提出检测方法,在图像和文本模态中F1提升高达0.14。

详情
AI中文摘要

概念向量旨在通过将内部表示与人类可理解的语义联系起来增强模型可解释性,但其实际效用常受限于噪声和不一致的激活。在这项工作中,我们揭示了超级激活机制:一种Transformer动态,它放大概念激活差距,将最可靠的概念证据集中在少数高激活token上。为了从理论上理解这一机制,我们证明概念对齐的注意力头乘法放大成对激活差距,其中已经极端的激活增长最快。我们发现这种放大不仅是理论上的,而且在大型模型上经验性地发生:虽然概念内和概念外激活分布有相当重叠,但概念内分布发展出一个与噪声明显分离的正尾部。这些高尾token,我们称之为超级激活器,在概念正样本中一致出现,使其成为概念存在的可靠指标。因此,基于超级激活器的检测在标准概念激活聚合器和提示基线之上,在图像和文本模态、模型、层和概念提取技术中,F1提升高达0.14,展示了我们见解的通用性和实用性。进一步的实证分析表明,最可靠的超级激活器是稀疏的,检测通常在使用仅5-10%的概念内token激活时达到峰值,并且比全局概念向量捕获更忠实的局部语义。

英文摘要

Concept vectors aim to enhance model interpretability by linking internal representations with human-understandable semantics, but their practical utility is often limited by noisy and inconsistent activations. In this work, we uncover the SuperActivator Mechanism: a transformer dynamic that amplifies concept activation gaps, concentrating the most reliable concept evidence into a small set of high-activation tokens. To develop a theoretical understanding of this mechanism, we prove that concept-aligned attention heads multiplicatively amplify pairwise activation gaps, with already-extreme activations growing fastest. We find that this amplification is not just theoretical, but also occurs empirically on large-scale models: while in- and out-of-concept activation distributions overlap considerably, the in-concept distribution develops a positive tail clearly separated from the noise. These high-tail tokens, which we call SuperActivators, appear consistently across concept-positive samples, making them reliable indicators of concept presence. Accordingly, SuperActivator-based detection improves F1 by up to 0.14 over standard concept activation aggregators and prompting baselines across image and text modalities, models, layers, and concept extraction techniques, demonstrating the generality and practicality of our insights. Further empirical analysis demonstrates that the most reliable SuperActivators are sparse, with detection typically peaking when using only 5-10% of in-concept token activations, and capture more faithful localized semantics than global concept vectors.

2509.24901 2026-06-01 cs.SD cs.LG 版本更新

Unmute the Patch Tokens: Rethinking Probing in Multi-Label Audio Classification

取消补丁令牌静音:重新审视多标签音频分类中的探测方法

Lukas Rauch, René Heinrich, Houtan Ghaffari, Lukas Miklautz, Ilyass Moummad, Bernhard Sick, Christoph Scholz

发表机构 * University of Kassel(卡塞尔大学) Fraunhofer IEE(弗劳恩霍夫能源经济与能源系统技术研究所) Ghent University(根特大学) ML and Systems Biology, MPI of Biochemistry(马克斯·普朗克生物化学研究所机器学习与系统生物学部门) INRIA Montpellier(蒙彼利埃INRIA)

AI总结 针对自监督音频模型线性探测性能不佳的问题,提出二值化原型探测方法,通过学习原型进行类别级信息聚合,在13个数据集上超越线性探测和注意力探测,建立探测作为高效评估范式的可行性。

Comments Accepted @ ICLR26

详情
AI中文摘要

尽管探测冻结模型已成为标准评估范式,但音频中的自监督学习在追求AudioSet上的最优性能时默认采用微调。一个关键原因是全局池化造成信息瓶颈,导致线性探测错误地表示嵌入质量:$\texttt{cls}$-token丢弃了关于音频中分散、局部事件的关键令牌信息。这一弱点根源于预训练目标(全局)与下游任务(局部)之间的不匹配。在包含13个数据集和6个基于频谱图的编码器的综合基准测试中,我们研究了全局池化瓶颈。我们引入了二值化原型探测:一种轻量级且简单的池化方法,通过学习原型进行类别级信息聚合。尽管简单,我们的方法显著优于线性探测和注意力探测。我们的工作将探测确立为评估音频SSL模型的一种有竞争力且高效的范式,挑战了对昂贵微调的依赖。

英文摘要

Although probing frozen models has become a standard evaluation paradigm, self-supervised learning in audio defaults to fine-tuning when pursuing state-of-the-art on AudioSet. A key reason is that global pooling creates an information bottleneck causing linear probes to misrepresent the embedding quality: The $\texttt{cls}$-token discards crucial token information about dispersed, localized events in audio. This weakness is rooted in the mismatch between the pretraining objective (globally) and the downstream task (localized). Across a comprehensive benchmark of 13 datasets and 6 spectrogram-based encoders, we investigate the global pooling bottleneck. We introduce binarized prototypical probes: a lightweight and simple pooling method that learns prototypes to perform class-wise information aggregation. Despite its simplicity, our method notably outperforms linear and attentive probing. Our work establishes probing as a competitive and efficient paradigm for evaluating audio SSL models, challenging the reliance on costly fine-tuning.
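
The pooling bottleneck can be illustrated on synthetic tokens. The sketch below is a generic max-over-tokens prototype pooling, not the paper's binarized prototypical probe, whose details the abstract does not give. The prototypes are random rather than learned, and all names and shapes are assumptions. It compares scoring a mean-pooled embedding against scoring each patch token and keeping the best match per class.

```python
import numpy as np

def cosine(a, b):
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def mean_pool_scores(tokens, prototypes):
    """Global pooling: average all tokens first, then compare with each class prototype."""
    return cosine(tokens.mean(axis=0, keepdims=True), prototypes)[0]

def prototype_max_scores(tokens, prototypes):
    """Token-level pooling: compare every token with each prototype, keep the best match."""
    return cosine(tokens, prototypes).max(axis=0)

rng = np.random.default_rng(0)
num_tokens, dim, num_classes = 50, 16, 3

prototypes = rng.normal(size=(num_classes, dim))   # stand-ins for learned class prototypes
tokens = rng.normal(size=(num_tokens, dim))        # background patch tokens
tokens[7] = prototypes[1]                          # one short, localized event of class 1

global_scores = mean_pool_scores(tokens, prototypes)
local_scores = prototype_max_scores(tokens, prototypes)

print("mean-pooled scores:", global_scores)
print("max-over-tokens scores:", local_scores)
```

Because the event occupies a single token, averaging mixes it with 49 background tokens, while the token-level score for class 1 stays at its maximum.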

2512.00919 2026-06-01 stat.ML cs.LG 版本更新

Outcome-Aware Spectral Feature Learning for Instrumental Variable Regression

面向工具变量回归的结果感知谱特征学习

Dimitri Meunier, Jakub Wornbard, Vladimir R. Kostic, Antoine Moulin, Alek Fröhlich, Karim Lounici, Massimiliano Pontil, Arthur Gretton

发表机构 * Gatsby Computational Neuroscience Unit, University College London(Gatsby计算神经科学单位,伦敦大学学院) Faculty of Science, University of Novi Sad(科学学院,诺维萨德大学) AI Centre, University College London(人工智能中心,伦敦大学学院) DIBRIS, University of Genoa(DIBRIS,热那亚大学)

AI总结 针对存在隐藏混杂因素的非参数工具变量回归问题,提出一种通过最小化基于增广算子的对比损失来学习结果感知谱特征的方法,以缓解谱错位导致的因果函数表示不足问题。

Comments ICML 2026

详情
AI中文摘要

我们解决了在存在隐藏混杂因素的情况下使用非参数工具变量(IV)回归进行因果效应估计的问题。一种成熟的方法是使用基于学习到的谱特征的估计量,即跨越连接处理变量与工具变量的算子的主要奇异子空间的特征。虽然这种方法很强大,但此类特征对结果变量是无关的。因此,当真实因果函数无法被这些主导奇异函数很好地表示时,该方法可能会失败。为了缓解这一问题,我们引入了增广谱特征学习,这是一个使特征学习过程具有结果感知能力的框架。我们的方法通过最小化从增广算子导出的新颖对比损失来学习特征,该增广算子融合了结果的信息。通过学习这些任务特定的特征,即使在谱错位的情况下,我们的方法仍然有效。我们对该框架进行了理论分析,并在具有挑战性的基准测试上验证了我们的方法。

英文摘要

We address the problem of causal effect estimation in the presence of hidden confounders using nonparametric instrumental variable (IV) regression. An established approach is to use estimators based on learned spectral features, that is, features spanning the top singular subspaces of the operator linking treatments to instruments. While powerful, such features are agnostic to the outcome variable. Consequently, the method can fail when the true causal function is poorly represented by these dominant singular functions. To mitigate, we introduce Augmented Spectral Feature Learning, a framework that makes the feature learning process outcome-aware. Our method learns features by minimizing a novel contrastive loss derived from an augmented operator that incorporates information from the outcome. By learning these task-specific features, our approach remains effective even under spectral misalignment. We provide a theoretical analysis of this framework and validate our approach on challenging benchmarks.

2505.18069 2026-06-01 cs.LG eess.SP 版本更新

Ubiquity of Emergent Hebbian Dynamics in Regularized Learning

正则化学习中涌现的赫布动力学的普遍性

David Koplow, Tomaso Poggio, Liu Ziyin

发表机构 * Massachusetts Institute of Technology(麻省理工学院) Center for Brains, Minds and Machines(大脑、心智与机器中心) NTT Research(NTT研究)

AI总结 本文发现L2权重衰减在近稳态条件下普遍驱动学习信号与赫布方向对齐,且随机噪声可诱导反赫布对齐,这为区分真正的赫布计算与涌现的赫布特征提供了实验动机。

Comments ICML 2026 Camera Ready

详情
AI中文摘要

赫布和反赫布可塑性在大脑中广泛观察到,经典上被建模为由稳态约束稳定的机械性、局部同突触规则。这引发了一个可识别性问题:在突触更新中观察到赫布/反赫布结构是否唯一地暗示了底层的赫布计算?我们识别出另一种涌现途径。我们表明,在近稳态条件下,L2权重衰减通常驱动许多更新规则的学习信号分量与赫布方向对齐,且对齐程度随衰减强度单调增加。这种类似赫布的特征并非SGD特有,甚至可以在学习停止之前很久的非学习或随机更新规则中出现。我们进一步表明,学习信号中的随机噪声可以诱导反赫布对齐,从而在回归设置中产生与权重衰减的简单权衡和相边界。这些机制并不取代标准的赫布理论;它们可以与真正的赫布可塑性共存,并使突触测量的解释复杂化,从而激励区分机械性赫布计算与涌现的赫布特征的实验。

英文摘要

Hebbian and anti-Hebbian plasticity are widely observed in the brain and are classically modeled as mechanistic, local homosynaptic rules stabilized by homeostatic constraints. This raises an identifiability question: does observing Hebbian/anti-Hebbian structure in synaptic updates uniquely imply an underlying Hebbian computation? We identify an alternative, emergent route. We show that near stationarity, L2 weight decay generically drives the learning-signal component of many update rules to align with a Hebbian direction, with alignment increasing monotonically with decay strength. This Hebbian-like signature is not specific to SGD and can arise even for non-learning or random update rules long before learning has ceased. We further show that stochastic noise in the learning signal can induce anti-Hebbian alignment, yielding a simple tradeoff with weight decay and a phase boundary in regression settings. These mechanisms do not replace standard Hebbian theory; they can coexist with genuine Hebbian plasticity and complicate the interpretation of synaptic measurements, motivating experiments that distinguish mechanistic Hebbian computation from emergent Hebbian signatures.

2511.21513 2026-06-01 cs.LG 版本更新

IntAttention: A Fully Integer Attention Pipeline for Efficient Edge Inference

IntAttention: 面向高效边缘推理的全整数注意力流水线

Wanli Zhong, Haibo Feng, Zirui Zhou, Hanyang Peng, Shiqi Yu

发表机构 * Anonymous Institution, Anonymous City, Anonymous Region, Anonymous Country(匿名机构,匿名城市,匿名地区,匿名国家)

AI总结 针对Transformer在边缘设备上部署时softmax路径导致的数据类型转换瓶颈,提出IntAttention全整数注意力流水线,通过IndexSoftmax算子、稀疏感知裁剪、32项查找表近似和直接整数归一化,消除数据类型转换开销,在Armv8 CPU上实现高达3.7倍加速和61%能耗降低。

详情
AI中文摘要

在边缘设备上部署Transformer模型受到延迟和能量预算的限制。虽然INT8量化有效加速了主要的矩阵乘法,但它将softmax相关路径暴露为主要瓶颈。该阶段需要进行昂贵的反量化->softmax->再量化绕行,这可以占到总注意力延迟的65%,并破坏了边缘硬件效率至关重要的端到端整数数据流。为了解决这一限制,我们提出了IntAttention,这是第一个全整数注意力流水线,可作为无需训练的即插即用替代方案。我们方法的核心是IndexSoftmax,一种在整数域内完全替代浮点指数运算的硬件友好算子。IntAttention集成了稀疏感知裁剪、32项查找表近似和直接整数归一化,从而消除了注意力路径上的数据类型转换开销。在Armv8 CPU上的实验表明,与FP16基线相比,我们的方法实现了高达3.7倍的加速和61%的能耗降低,与传统的INT8注意力流水线相比,加速高达2.0倍。在多种语言和视觉模型以及额外的推理和长上下文评估中,IntAttention保持了强大的整体保真度,并展示了比现有基于LUT的softmax近似更有利的权衡。代码可在https://github.com/WanliZhong/IntAttention获取。

英文摘要

Deploying Transformer models on edge devices is limited by latency and energy budgets. While INT8 quantization effectively accelerates the primary matrix multiplications, it exposes the softmax-related path as the dominant bottleneck. This stage incurs a costly dequantize -> softmax -> requantize detour, which can account for up to 65% of total attention latency and disrupts the end-to-end integer dataflow critical for edge hardware efficiency. To address this limitation, we present IntAttention, the first fully integer attention pipeline that serves as a training-free drop-in replacement. At the core of our approach lies IndexSoftmax, a hardware-friendly operator that replaces floating-point exponentials entirely within the integer domain. IntAttention integrates sparsity-aware clipping, a 32-entry lookup table approximation, and direct integer normalization, thereby eliminating datatype conversion overhead along the attention path. Experiments on Armv8 CPUs show that our method achieves up to 3.7x speedup and 61% energy reduction over FP16 baselines, and up to 2.0x speedup over conventional INT8 attention pipelines. Across diverse language and vision models, as well as additional reasoning and long-context evaluations, IntAttention maintains strong overall fidelity and demonstrates a more favorable trade-off than existing LUT-based softmax approximations. Code is available at https://github.com/WanliZhong/IntAttention
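
The idea of replacing floating-point exponentials with a small lookup table can be sketched as follows. This is a minimal illustration of a generic LUT-based integer softmax, not the paper's IndexSoftmax operator: the clipping range `CLIP`, the 8-bit table scale, the rounding scheme, and the function name `lut_softmax_int` are all assumptions made for this example. Only the table size of 32 comes from the abstract.

```python
import numpy as np

LUT_SIZE = 32      # 32-entry table, as stated in the abstract
CLIP = 8.0         # assumed clipping range in real units (exp(-8) is about 3e-4)
LUT_SCALE = 255    # assumed 8-bit fixed-point scale for table entries

# Built once offline: LUT[k] approximates exp(-k * CLIP / (LUT_SIZE - 1)).
LUT = np.round(
    LUT_SCALE * np.exp(-np.arange(LUT_SIZE) * CLIP / (LUT_SIZE - 1))
).astype(np.int32)


def lut_softmax_int(q_scores, scale):
    """Approximate softmax over the last axis using integer arithmetic only.

    q_scores: integer array of quantized logits (real logit = q * scale).
    Returns uint8 probabilities where 255 stands for 1.0.
    """
    q = q_scores.astype(np.int64)
    # Distance from the row maximum, always >= 0.
    dist = q.max(axis=-1, keepdims=True) - q
    # Clip threshold expressed in quantized units (a constant fixed at setup).
    max_dist = max(1, int(round(CLIP / scale)))
    dist = np.minimum(dist, max_dist)
    # Map the clipped distance to a table index with rounding integer division.
    idx = (dist * (LUT_SIZE - 1) + max_dist // 2) // max_dist
    e = LUT[idx]
    # Integer normalization: no conversion back to floating point.
    denom = e.sum(axis=-1, keepdims=True)
    return ((e * 255 + denom // 2) // denom).astype(np.uint8)
```

Entries far below the row maximum land on table values that round to zero, so they drop out of the sum; the accuracy of such a scheme depends on the clipping range and table resolution chosen.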

2511.19513 2026-06-01 cs.LG 版本更新

Row-Stochastic Matrices Can Provably Outperform Doubly Stochastic Matrices in Decentralized Learning

行随机矩阵在去中心化学习中可证明优于双随机矩阵

Bing Liu, Boao Kong, Limin Lu, Kun Yuan, Chengcheng Zhao

发表机构 * College of Control Science and Engineering, Zhejiang University(浙江大学控制科学与工程学院) Center for Data Science, Peking University(北京大学数据科学中心) Center for Machine Learning Research, Peking University(北京大学机器学习研究中心)

AI总结 本文通过加权希尔伯特空间框架,严格证明了行随机矩阵相比双随机矩阵在去中心化学习中具有更快的收敛速度,并给出了拓扑条件指导设计。

详情
AI中文摘要

去中心化学习通常涉及具有异构节点权重$λ$的加权全局损失。我们重新审视了两种融入这些权重的自然策略:(i) 将权重嵌入局部损失以保持均匀权重(从而得到双随机矩阵),以及(ii) 保留原始损失同时采用由$λ$诱导的行随机矩阵。尽管先前的工作表明两种策略都针对相同的$λ$加权全局损失,但尚不清楚欧几里得空间中的保证是否紧致,以及它们的表现有何根本差异。为了澄清这一点,我们开发了一个加权希尔伯特空间框架$L^2(λ;\mathbb{R}^d)$,并获得了比标准欧几里得分析严格更紧的收敛速率。在该几何中,行随机矩阵成为自伴的,而双随机矩阵则不是,从而产生了额外的惩罚项,放大了共识误差,进而减缓了收敛。因此,收敛差异不仅来自谱间隙,还来自这些惩罚项。然后,我们推导了行随机设计即使具有更小的谱间隙也能更快收敛的充分条件。最后,通过使用瑞利商和Loewner序特征值比较,我们进一步获得了保证这一优势的拓扑条件,并给出了实用的拓扑设计指南。

英文摘要

Decentralized learning often involves a weighted global loss with heterogeneous node weights $λ$. We revisit two natural strategies for incorporating these weights: (i) embedding them into the local losses to retain a uniform weight (and thus a doubly stochastic matrix), and (ii) keeping the original losses while employing a $λ$-induced row-stochastic matrix. Although prior work shows that both strategies target the same $λ$-weighted global loss, it remains unclear whether the Euclidean-space guarantees are tight and what fundamentally differentiates their behaviors. To clarify this, we develop a weighted Hilbert-space framework $L^2(λ;\mathbb{R}^d)$ and obtain convergence rates that are strictly tighter than those from standard Euclidean analysis. In this geometry, the row-stochastic matrix becomes self-adjoint whereas the doubly stochastic one does not, creating additional penalty terms that amplify consensus error, thereby slowing convergence. Consequently, the difference in convergence arises not only from spectral gaps but also from these penalty terms. We then derive sufficient conditions under which the row-stochastic design converges faster even with a smaller spectral gap. Finally, by using a Rayleigh-quotient and Loewner-order eigenvalue comparison, we further obtain topology conditions that guarantee this advantage and yield practical topology-design guidelines.
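
The self-adjointness claim can be made concrete with a short derivation. The sketch below assumes the usual $λ$-weighted inner product on stacked node vectors $x = (x_1, \dots, x_n)$ with $x_i \in \mathbb{R}^d$, and a mixing matrix $A$ acting node-wise; the abstract does not spell out these definitions, so they are assumptions of this example.

```latex
\langle x, y\rangle_{\lambda} = \sum_{i=1}^{n} \lambda_i\, x_i^{\top} y_i,
\qquad (Ax)_i = \sum_{j=1}^{n} A_{ij}\, x_j .

\langle x, Ay\rangle_{\lambda} = \sum_{i,j} \lambda_i A_{ij}\, x_i^{\top} y_j,
\qquad
\langle Ax, y\rangle_{\lambda} = \sum_{i,j} \lambda_j A_{ji}\, x_i^{\top} y_j .

\langle x, Ay\rangle_{\lambda} = \langle Ax, y\rangle_{\lambda}\ \ \forall x, y
\iff \lambda_i A_{ij} = \lambda_j A_{ji}\ \ \forall i, j .
```

Under these assumptions, a row-stochastic matrix that satisfies this balance condition with respect to $λ$ is self-adjoint in the weighted space, while a symmetric doubly stochastic matrix $W$ satisfies it only if $λ_i = λ_j$ for every pair with $W_{ij} \neq 0$, which on a connected graph forces uniform weights.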

2511.17826 2026-06-01 cs.LG cs.CL stat.ML 版本更新

Deterministic Inference across Tensor Parallel Sizes That Eliminates Training-Inference Mismatch

跨张量并行大小的确定性推理,消除训练-推理不匹配

Ziyang Zhang, Xinheng Ding, Jiayi Yuan, Rixin Liu, Huizi Mao, Jiarong Xing, Zirui Liu

发表机构 * Independent Researcher(独立研究者) University of Minnesota, Minneapolis, Minnesota, USA(明尼苏达大学) Rice University, Houston, Texas, USA(莱斯大学) NVIDIA Corp., Santa Clara, California, USA(NVIDIA公司)

AI总结 针对不同张量并行大小导致浮点运算非结合性引起的推理非确定性问题,提出基于树的核(TBIK)实现跨TP大小的比特级一致结果,消除RL训练中推理与训练引擎间的精度不匹配。

详情
AI中文摘要

确定性推理对于大型语言模型(LLM)应用(如LLM-as-a-judge评估、多智能体系统和强化学习(RL))日益关键。然而,现有的LLM服务框架表现出非确定性行为:当系统配置(例如张量并行(TP)大小、批大小)变化时,即使采用贪心解码,相同的输入也可能产生不同的输出。这是由于浮点运算的非结合性以及GPU间归约顺序不一致导致的。虽然先前的工作通过批不变核解决了与批大小相关的非确定性,但跨不同TP大小的确定性仍然是一个开放问题,特别是在RL设置中,训练引擎通常使用全分片数据并行(即TP=1),而部署引擎依赖多GPU TP以最大化推理吞吐量,从而在两者之间产生自然的不匹配。这种精度不匹配问题可能导致RL训练性能次优甚至崩溃。我们识别并分析了TP引起不一致的根本原因,并提出了基于树的核(TBIK),这是一组TP不变的矩阵乘法和归约原语,无论TP大小如何,都能保证比特级相同的结果。我们的关键见解是通过统一的层次二叉树结构对齐GPU内和GPU间的归约顺序。我们在Triton中实现了这些核,并将其集成到vLLM和FSDP中。实验证明,在不同TP大小下,确定性推理的概率发散为零,且具有比特级可重复性。此外,在采用不同并行策略的RL训练流程中,我们在vLLM和FSDP之间实现了比特级相同的结果。代码可在https://github.com/nanomaoli/llm_reproducibility获取。

英文摘要

Deterministic inference is increasingly critical for large language model (LLM) applications such as LLM-as-a-judge evaluation, multi-agent systems, and Reinforcement Learning (RL). However, existing LLM serving frameworks exhibit non-deterministic behavior: identical inputs can yield different outputs when system configurations (e.g., tensor parallel (TP) size, batch size) vary, even under greedy decoding. This arises from the non-associativity of floating-point arithmetic and inconsistent reduction orders across GPUs. While prior work has addressed batch-size-related nondeterminism through batch-invariant kernels, determinism across different TP sizes remains an open problem, particularly in RL settings, where the training engine typically uses Fully Sharded Data Parallel (i.e., TP = 1) while the rollout engine relies on multi-GPU TP to maximize inference throughput, creating a natural mismatch between the two. This precision mismatch may lead to suboptimal performance or even collapse in RL training. We identify and analyze the root causes of TP-induced inconsistency and propose Tree-Based Invariant Kernels (TBIK), a set of TP-invariant matrix multiplication and reduction primitives that guarantee bit-wise identical results regardless of TP size. Our key insight is to align intra- and inter-GPU reduction orders through a unified hierarchical binary tree structure. We implement these kernels in Triton and integrate them into vLLM and FSDP. Experiments confirm zero probability divergence and bit-wise reproducibility for deterministic inference across different TP sizes. We also achieve bit-wise identical results between vLLM and FSDP in RL training pipelines that use different parallel strategies. Code is available at https://github.com/nanomaoli/llm_reproducibility.
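
The role of a fixed binary-tree reduction order can be illustrated on the CPU. The sketch below is a toy model of the idea, not the TBIK kernels: it assumes power-of-two lengths and contiguous, equally sized shards, and the function names `tree_sum` and `sharded_tree_sum` are made up for this example. It shows that summing each shard with a pairwise tree and then combining the partial sums with the same tree yields the same float32 bits for every shard count.

```python
import numpy as np


def tree_sum(x):
    """Pairwise (binary-tree) float32 sum; len(x) must be a power of two."""
    x = np.asarray(x, dtype=np.float32)
    while x.size > 1:
        # Add neighbours: (x0 + x1), (x2 + x3), ... then repeat one level up.
        x = x[0::2] + x[1::2]
    return x[0]


def sharded_tree_sum(x, num_shards):
    """Sum contiguous shards with tree_sum, then tree-reduce the partials.

    num_shards stands in for the tensor-parallel size; it must be a power of
    two that divides len(x), so each shard is a complete subtree.
    """
    shards = np.split(np.asarray(x, dtype=np.float32), num_shards)
    partials = np.array([tree_sum(s) for s in shards], dtype=np.float32)
    return tree_sum(partials)


# Floating-point addition is not associative, which is why order matters.
assert (0.1 + 0.2) + 0.3 != 0.1 + (0.2 + 0.3)

rng = np.random.default_rng(0)
values = rng.standard_normal(1024).astype(np.float32)
results = [sharded_tree_sum(values, k) for k in (1, 2, 4, 8)]
# Every shard count walks the same global tree, so the bits agree.
assert all(r == results[0] for r in results)
```

The shard boundaries line up with subtrees of the global tree, so changing the number of shards only changes where the work happens, not the order in which values are added.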

2506.08255 2026-06-01 cs.LG cs.AI cs.CR 版本更新

SHIELD: Secure Hypernetworks for Incremental Expansion Learning Defense

SHIELD: 用于增量扩展学习防御的安全超网络

Patryk Krukowski, Łukasz Gorczyca, Piotr Helm, Kamil Książek, Przemysław Spurek

发表机构 * Jagiellonian University, Faculty of Mathematics and Computer Science(雅盖隆大学数学与计算机科学学院) Jagiellonian University, Doctoral School of Exact and Natural Sciences(雅盖隆大学精确与自然科学博士学院) Akces NCBR IDEAS Research Institute(IDEAS研究所)

AI总结 提出一种结合区间边界传播(IBP)与超网络的框架SHIELD,通过生成任务特定参数和区间混合训练策略,实现可认证鲁棒的持续学习,在保持可扩展性的同时达到最优平均准确率。

Comments Accepted to CVPR 2026 (Findings track)

详情
AI中文摘要

在对抗条件下的持续学习仍然是一个开放问题,现有方法往往在鲁棒性、可扩展性或两者之间做出妥协。我们提出了一种新颖的框架,将区间边界传播(IBP)与基于超网络的架构相结合,以实现跨顺序任务的可认证鲁棒持续学习。我们的方法SHIELD通过一个共享的超网络生成任务特定的模型参数,该超网络仅依赖于紧凑的任务嵌入,从而消除了对重放缓冲区或完整模型副本的需求,并实现了高效的时间扩展。为了进一步增强鲁棒性,我们引入了区间混合(Interval MixUp),这是一种新颖的训练策略,它将表示为以MixUp点为中心的$\ell_{\infty}$球的虚拟示例混合。利用区间算术,该技术保证了可认证的鲁棒性,同时减轻了包裹效应,从而产生更平滑的决策边界。我们在多个基准测试上评估了SHIELD在强白盒对抗攻击(包括PGD和AutoAttack)下的表现。它持续优于现有的鲁棒持续学习方法,在保持可扩展性和认证性的同时,实现了最先进的平均准确率。这些结果向在对抗环境中实现实用且理论扎实的持续学习迈出了重要一步。

英文摘要

Continual learning under adversarial conditions remains an open problem, as existing methods often compromise either robustness, scalability, or both. We propose a novel framework that integrates Interval Bound Propagation (IBP) with a hypernetwork-based architecture to enable certifiably robust continual learning across sequential tasks. Our method, SHIELD, generates task-specific model parameters via a shared hypernetwork conditioned solely on compact task embeddings, eliminating the need for replay buffers or full model copies and enabling efficient scaling over time. To further enhance robustness, we introduce Interval MixUp, a novel training strategy that blends virtual examples represented as $\ell_{\infty}$ balls centered around MixUp points. Leveraging interval arithmetic, this technique guarantees certified robustness while mitigating the wrapping effect, resulting in smoother decision boundaries. We evaluate SHIELD under strong white-box adversarial attacks, including PGD and AutoAttack, across multiple benchmarks. It consistently outperforms existing robust continual learning methods, achieving state-of-the-art average accuracy while maintaining both scalability and certification. These results represent a significant step toward practical and theoretically grounded continual learning in adversarial settings.
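
For readers unfamiliar with Interval Bound Propagation, the following sketch shows the standard bound computation for one affine layer followed by a ReLU. It illustrates generic IBP only; SHIELD's hypernetwork and Interval MixUp are not reproduced here, and the function names `ibp_linear` and `ibp_relu` are hypothetical.

```python
import numpy as np


def ibp_linear(lower, upper, W, b):
    """Propagate the box [lower, upper] through y = W @ x + b.

    The box centre moves like an ordinary input; the radius is scaled by |W|.
    """
    center = (lower + upper) / 2.0
    radius = (upper - lower) / 2.0
    out_center = W @ center + b
    out_radius = np.abs(W) @ radius
    return out_center - out_radius, out_center + out_radius


def ibp_relu(lower, upper):
    """ReLU is monotone, so it can be applied to each bound separately."""
    return np.maximum(lower, 0.0), np.maximum(upper, 0.0)


# An l_inf ball of radius eps around x becomes the box [x - eps, x + eps].
x = np.array([0.5, -0.2])
eps = 0.1
W = np.array([[1.0, -1.0], [0.5, 2.0]])
b = np.array([0.0, -0.1])
lo, hi = ibp_relu(*ibp_linear(x - eps, x + eps, W, b))
```

Every input inside the ball is guaranteed to produce an output between `lo` and `hi`; a classifier is certified on the ball when the lower bound of the true-class logit exceeds the upper bounds of all other logits.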

2511.17380 2026-06-01 cs.CV cs.LG 版本更新

Non-Parametric Probabilistic Robustness: A Conservative Risk Estimator under Unknown Perturbation Distributions

非参数概率鲁棒性:未知扰动分布下的保守风险估计

Zheng Wang, Yi Zhang, Siddartha Khastgir, Carsten Maple, Xingyu Zhao

发表机构 * WMG, University of Warwick, Coventry, United Kingdom(华威大学WMG,考文垂,英国) Wuhan University, Wuhan, China(武汉大学,武汉,中国)

AI总结 提出非参数概率鲁棒性(NPPR)度量,通过从数据中学习扰动分布,在分布不确定性下实现保守的概率鲁棒性估计,并基于高斯混合模型开发估计器。

详情
AI中文摘要

深度学习模型尽管取得了显著成功,但仍然容易受到微小输入扰动的影响,导致错误输出,这促使最近提出概率鲁棒性(PR)作为对抗鲁棒性(AR)的补充替代方案。然而,现有的PR公式假设扰动分布固定且已知,这在实践中是不现实的期望。为了解决这一限制,我们提出了非参数概率鲁棒性(NPPR),一种更实用的PR度量,不依赖于任何预定义的扰动分布。遵循统计建模中的非参数范式,NPPR直接从数据中学习优化的扰动分布,从而在分布不确定性下实现保守的PR评估。我们进一步开发了基于高斯混合模型(GMM)的NPPR估计器,涵盖了各种输入相关和输入无关的扰动场景。理论分析建立了AR、PR和NPPR之间的关系。在CIFAR-10、CIFAR-100和Tiny ImageNet上使用ResNet18/50、WideResNet50和VGG16的大量实验验证了NPPR作为更实用的鲁棒性度量,与假设最先进技术中使用的常见扰动分布相比,显示出保守(较低)的PR估计。

英文摘要

Deep learning (DL) models, despite their remarkable success, remain vulnerable to small input perturbations that can cause erroneous outputs, motivating the recent proposal of probabilistic robustness (PR) as a complementary alternative to adversarial robustness (AR). However, existing PR formulations assume a fixed and known perturbation distribution, an unrealistic expectation in practice. To address this limitation, we propose non-parametric probabilistic robustness (NPPR), a more practical PR metric that does not rely on any predefined perturbation distribution. Following the non-parametric paradigm in statistical modeling, NPPR learns an optimized perturbation distribution directly from data, enabling conservative PR evaluation under distributional uncertainty. We further develop an NPPR estimator based on a Gaussian Mixture Model (GMM), covering various input-dependent and input-independent perturbation scenarios. Theoretical analyses establish the relationships among AR, PR, and NPPR. Extensive experiments on CIFAR-10, CIFAR-100, and Tiny ImageNet across ResNet18/50, WideResNet50 and VGG16 validate NPPR as a more practical robustness metric, showing conservative (lower) PR estimates compared with estimates that assume the common perturbation distributions used in state-of-the-art methods.

2511.10868 2026-06-01 cs.LG 版本更新

Go-UT-Bench: A Fine-Tuning Dataset for LLM-Based Unit Test Generation in Go

Go-UT-Bench:用于基于LLM的Go语言单元测试生成的微调数据集

Yashshi Pipalani, Hritik Raj, Rajat Ghosh, Vaishnavi Bhargava, Debojyoti Dutta

发表机构 * Nutanix

AI总结 针对代码LLM训练数据不平衡问题,提出Go-UT-Bench数据集(5264对代码与单元测试),通过微调提升模型在Go语言单元测试生成任务上的性能,在超过75%的基准任务上优于基础模型。

Comments 9 pages, 5 figures

详情
AI中文摘要

训练数据不平衡对代码LLM构成了重大挑战。大多数可用数据严重偏向原始开源代码,而低估了更广泛的软件工程任务,尤其是在像Golang这样的低资源语言中。因此,模型在代码自动补全方面表现出色,但在单元测试生成等实际开发者工作流程中表现不佳。为了解决这一差距,我们引入了Go-UT-Bench,这是一个包含5264对代码和单元测试的基准数据集,来自10个宽松许可的Golang仓库,涵盖不同领域。我们评估了它作为微调数据集在两个LLM家族(即专家混合模型和密集解码器)上的有效性。我们的结果表明,微调后的模型在超过75%的基准任务上优于其基础对应模型。

英文摘要

Training data imbalance poses a major challenge for code LLMs. Most available data heavily overrepresents raw open-source code while underrepresenting broader software engineering tasks, especially in low-resource languages like Golang. As a result, models excel at code autocompletion but struggle with real-world developer workflows such as unit test generation. To address this gap, we introduce Go-UT-Bench, a benchmark dataset of 5264 pairs of code and unit tests, drawn from 10 permissively licensed Golang repositories spanning diverse domains. We evaluate its effectiveness as a fine-tuning dataset across two LLM families, i.e., mixture-of-experts models and dense decoders. Our results show that fine-tuned models outperform their base counterparts on more than 75% of benchmark tasks.

2511.03100 2026-06-01 cs.LG cs.AI cs.MA 版本更新

Scaling Multi-Agent Environment Co-Design with Diffusion Models

基于扩散模型的多智能体环境协同设计扩展

Hao Xiang Li, Michael Amir, Amanda Prorok

发表机构 * Department of Computer Science, University of Cambridge, Cambridge, United Kingdom(剑桥大学计算机科学系,剑桥,英国)

AI总结 提出扩散协同设计(DiCoDe)框架,通过投影通用引导(PUG)和评论家蒸馏机制,实现高维环境设计空间下的可扩展、样本高效的智能体-环境协同优化。

详情
AI中文摘要

智能体-环境协同设计范式联合优化智能体策略和环境配置,以寻求系统性能提升。其应用领域从仓库物流到风电场管理,有望从根本上改变多智能体系统的部署方式。然而,当前的协同设计方法难以扩展:在高维环境设计空间下失效,且在处理联合优化中固有的移动目标时样本效率低下。我们通过开发扩散协同设计(DiCoDe)来应对这些挑战,这是一个可扩展且样本高效的协同设计框架,将协同设计推向实际相关场景。DiCoDe包含两项核心创新。首先,我们引入投影通用引导(PUG),这是一种采样技术,使DiCoDe能够在满足硬约束(如障碍物之间的空间间隔)的同时,探索奖励最大化环境的分布。其次,我们设计了一种评论家蒸馏机制,以共享来自强化学习评论家的知识,确保引导扩散模型利用密集且最新的学习信号适应不断演化的智能体策略。在具有挑战性的多智能体环境协同设计基准(包括仓库自动化、多智能体路径规划和风电场优化)上验证时,这些改进共同产生了更优的环境-策略对。我们的方法持续超越现有技术,例如在仓库场景中,以少66%的仿真样本实现了39%更高的奖励。这为智能体-环境协同设计设立了新标准,并向着在现实世界中收获协同设计成果迈出了关键一步。

英文摘要

The agent-environment co-design paradigm jointly optimises agent policies and environment configurations in search of improved system performance. With application domains ranging from warehouse logistics to windfarm management, co-design promises to fundamentally change how we deploy multi-agent systems. However, current co-design methods struggle to scale. They collapse under high-dimensional environment design spaces and suffer from sample inefficiency when addressing moving targets inherent to joint optimisation. We address these challenges by developing Diffusion Co-Design (DiCoDe), a scalable and sample-efficient co-design framework pushing co-design towards practically relevant settings. DiCoDe incorporates two core innovations. First, we introduce Projected Universal Guidance (PUG), a sampling technique that enables DiCoDe to explore a distribution of reward-maximising environments while satisfying hard constraints such as spatial separation between obstacles. Second, we devise a critic distillation mechanism to share knowledge from the reinforcement learning critic, ensuring that the guided diffusion model adapts to evolving agent policies using a dense and up-to-date learning signal. Together, these improvements lead to superior environment-policy pairs when validated on challenging multi-agent environment co-design benchmarks including warehouse automation, multi-agent pathfinding and wind farm optimisation. Our method consistently exceeds the state-of-the-art, achieving, for example, 39% higher rewards in the warehouse setting with 66% fewer simulation samples. This sets a new standard in agent-environment co-design, and is a stepping stone towards reaping the rewards of co-design in real world domains.

2510.17111 2026-06-01 cs.RO cs.LG 版本更新

Efficient Vision-Language-Action Models for Embodied Manipulation: A Systematic Survey

面向具身操作的高效视觉-语言-动作模型:系统综述

Weifan Guan, Qinghao Hu, Aosheng Li, Jian Cheng

发表机构 * Institute of Automation, Chinese Academy of Sciences(中国科学院自动化研究所) University of Chinese Academy of Sciences(中国科学院大学) AiRiA Nanjing University of Information Science and Technology(南京信息工程大学)

AI总结 本文系统综述了通过模型架构、感知特征、动作生成和训练/推理策略四个维度降低视觉-语言-动作模型延迟、内存占用及计算成本的方法。

详情
AI中文摘要

视觉-语言-动作(VLA)模型通过将自然语言指令和视觉观察映射到机器人动作,将视觉-语言模型扩展到具身控制。尽管功能强大,但VLA系统因其巨大的计算和内存需求而面临重大挑战,这与需要实时性能的边缘平台(如机载移动操作器)的约束相冲突。解决这一矛盾已成为近期研究的核心焦点。鉴于对更高效、可扩展的VLA系统的日益关注,本综述系统回顾了提高VLA效率的方法,重点在于减少延迟、内存占用以及训练和推理成本。我们将现有解决方案分为四个维度:模型架构、感知特征、动作生成和训练/推理策略,并总结了每个类别中的代表性技术。最后,我们讨论了未来趋势和开放挑战,指出了推进高效具身智能的方向。

英文摘要

Vision-Language-Action (VLA) models extend vision-language models to embodied control by mapping natural-language instructions and visual observations to robot actions. Despite their capabilities, VLA systems face significant challenges due to their massive computational and memory demands, which conflict with the constraints of edge platforms such as on-board mobile manipulators that require real-time performance. Addressing this tension has become a central focus of recent research. In light of the growing efforts toward more efficient and scalable VLA systems, this survey provides a systematic review of approaches for improving VLA efficiency, with an emphasis on reducing latency, memory footprint, and training and inference costs. We categorize existing solutions into four dimensions: model architecture, perception feature, action generation, and training/inference strategies, summarizing representative techniques within each category. Finally, we discuss future trends and open challenges, highlighting directions for advancing efficient embodied intelligence.

2506.22304 2026-06-01 cs.LG cs.CV 版本更新

Unfolding Generative Flows with Koopman Operators: Trajectory-Preserving Linearization

利用Koopman算子展开生成流:轨迹保持的线性化

Erkan Turan, Aristotelis Siozopoulos, Louis Martinez, Julien Gaubil, Emery Pierson, Maks Ovsjanikov

发表机构 * University of Athens, Greece(雅典大学)

AI总结 提出基于Koopman理论的全局线性化方法,将预训练的条件流匹配模型提升到高维Koopman空间,实现轨迹保持的线性化,从而支持一步并行采样和生成轨迹的谱分析。

详情
AI中文摘要

连续归一化流(CNFs)实现了优雅的生成建模,但受限于其迭代性质,需要昂贵的采样且缺乏中间状态的可解释性。最近的方法通过拉直轨迹或蒸馏端点来加速采样,但将原始生成过程视为黑箱,丢弃了教师模型的中间动态。我们提出了一种根本不同的视角:通过Koopman理论全局线性化流动态,以实现轨迹保持的线性化。通过将预训练的条件流匹配(CFM)模型提升到高维Koopman空间,我们用单个线性算子表示其演化。关键的是,与仅边界蒸馏不同,我们的方法沿整个生成路径强制与教师向量场保持无穷小一致性。我们推导了一个实用的、无模拟的训练目标,确保这种全局对齐,并带来两个关键优势。首先,采样变为一步且可并行化。其次,由于线性化忠实于动态,Koopman算子提供了对生成的独特见解。我们证明,这种结构能够实现先前方法无法实现的新应用,包括发现语义一致的编辑方向、使用与教师对齐的线性算子进行反演以及类条件谱特征。实验上,我们的方法在实现竞争性样本质量的同时,能够对生成流的整个轨迹进行谱分析和控制。

英文摘要

Continuous Normalizing Flows (CNFs) enable elegant generative modeling but remain bottlenecked by their iterative nature requiring costly sampling and lacking interpretability of the intermediate states. Recent approaches accelerate sampling by straightening trajectories or distilling endpoints, yet they treat the original generative process as a black box, discarding the teacher's intermediate dynamics. We propose a fundamentally different perspective: globally linearizing flow dynamics via Koopman theory to achieve trajectory-preserving linearization. By lifting a pre-trained Conditional Flow Matching (CFM) model into a higher-dimensional Koopman space, we represent its evolution with a single linear operator. Crucially, unlike boundary-only distillation, our method enforces infinitesimal consistency with the teacher's vector field along the full generative path. We derive a practical, simulation-free training objective that ensures this global alignment and yields two key benefits. First, sampling becomes one-step and parallelizable. Second, because the linearization is faithful to the dynamics, the Koopman operator provides unique insights on the generation. We demonstrate that this structure enables novel applications unavailable in prior approaches, including discovery of semantically coherent editing directions, inversion with a teacher-aligned linear operator and class-conditional spectral signatures. Empirically, our approach achieves competitive sample quality, while enabling spectral analysis and control of the entire trajectories of generative flows.

2510.16138 2026-06-01 cs.LG stat.ML 版本更新

Expert Merging in Sparse Mixture of Experts with Nash Bargaining

基于纳什谈判的稀疏混合专家模型专家合并

Dung V. Nguyen, Anh T. Nguyen, Minh H. Nguyen, Luc Q. Nguyen, Shiqi Jiang, Ethan Fetaya, Linh Duy Tran, Gal Chechik, Tan M. Nguyen

发表机构 * Department of Mathematics, National University of Singapore(新加坡国立大学数学系) Viettel AI, Viettel Group(越南电信AI部门) Faculty of Mathematics and Informatics, Hanoi University of Science and Technology(河内科学技术大学数学与信息学系) Bar Ilan University, Israel(以色列巴伊兰大学) AI Imaging Team, Data Solution Department, FPT Software Japan(日本FPT软件数据解决方案部门AI成像团队)

AI总结 针对稀疏混合专家模型缺乏原则性加权机制的专家合并问题,提出基于纳什谈判的NAMEx框架,实现专家间更平衡高效的协作,在多项任务中优于现有方法。

Comments 10 pages in the main text. ICLR 2026 Poster

详情
AI中文摘要

现有的稀疏混合专家模型(SMoE)专家合并策略通常依赖于输入相关或输入无关的专家参数平均,但往往缺乏原则性的加权机制。在这项工作中,我们通过博弈论的视角重新解释专家合并,揭示了专家之间的合作与竞争动态。基于这一视角,我们引入了专家纳什合并(NAMEx),这是一个将纳什谈判融入合并过程的新框架,使专家之间能够实现更平衡和高效的协作。此外,我们将复杂动量纳入NAMEx,以加速专家传播,并提供了收敛的理论保证。在语言建模、文本分类、图像分类以及数据损坏下的零样本鲁棒性等广泛实验中,NAMEx始终优于竞争方法,同时与流行的MoE架构无缝集成。最后,我们通过将NAMEx应用于大规模系统(包括Qwen1.5-MoE (14B)和DeepSeek-MoE (16B))展示了其可扩展性,在零样本和微调设置中均证明了其有效性。代码公开于:https://github.com/anh147/NAMEx。

英文摘要

Existing expert merging strategies for Sparse Mixture of Experts (SMoE) typically rely on input-dependent or input-independent averaging of expert parameters, but often lack a principled weighting mechanism. In this work, we reinterpret expert merging through the lens of game theory, revealing cooperative and competitive dynamics among experts. Based on this perspective, we introduce Nash Merging of Experts (NAMEx), a novel framework that incorporates Nash Bargaining into the merging process, enabling more balanced and efficient collaboration among experts. Additionally, we incorporate complex momentum into NAMEx to accelerate expert propagation with theoretical guarantees for convergence. Extensive experiments across language modelling, text classification, image classification, and zero-shot robustness under data corruption show that NAMEx consistently outperforms competing methods while integrating seamlessly with popular MoE architectures. Finally, we demonstrate NAMEx's scalability by applying it to large-scale systems, including Qwen1.5-MoE (14B) and DeepSeek-MoE (16B), where it proves effective in both zero-shot and fine-tuning settings. The code is publicly available at: https://github.com/anh147/NAMEx.

2510.11683 2026-06-01 cs.LG cs.AI cs.CL 版本更新

Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models

边界引导策略优化:面向扩散大语言模型的内存高效强化学习

Nianyi Lin, Jiajie Zhang, Lei Hou, Juanzi Li

发表机构 * Tsinghua University(清华大学)

AI总结 针对扩散大语言模型中似然函数难以处理导致强化学习内存开销大的问题,提出边界引导策略优化(BGPO),通过构造满足线性和等价性的下界实现内存高效训练,在数学求解、代码生成和规划任务中显著优于现有方法。

详情
AI中文摘要

将强化学习(RL)应用于扩散大语言模型(dLLMs)的一个关键挑战是其似然函数的难解性,而似然函数对于RL目标至关重要,因此在训练过程中需要相应的近似。现有方法通过自定义蒙特卡洛(MC)采样,利用证据下界(ELBO)近似对数似然,但由于需要保留所有MC样本用于RL目标中非线性项的梯度计算,导致显著的内存开销,从而限制了可行的样本量,导致似然近似不精确和RL目标失真。为了解决这个问题,我们提出了边界引导策略优化(BGPO),一种内存高效的RL算法,它最大化基于ELBO的目标的一个特殊构造的下界。该下界经过精心设计,满足两个关键性质:(1)线性:它是一个线性求和,其中每一项仅依赖于单个MC样本,从而能够跨样本进行梯度累积并确保恒定的内存使用;(2)等价性:在在线策略训练中,该下界的值和梯度与基于ELBO的目标相等,因此它也是对原始RL目标的有效近似。这些性质使得BGPO能够采用大的MC样本量,改进似然近似和RL目标估计,从而带来性能提升。实验表明,BGPO在数学问题求解、代码生成和规划任务中显著优于先前的dLLMs RL算法。我们的代码和模型可在https://github.com/THU-KEG/BGPO获取。

英文摘要

A key challenge in applying reinforcement learning (RL) to diffusion large language models (dLLMs) is the intractability of their likelihood functions, which are essential for the RL objective, necessitating corresponding approximation during training. While existing methods approximate the log-likelihoods by their evidence lower bounds (ELBOs) via customized Monte Carlo (MC) sampling, they incur significant memory overhead due to the need to retain all MC samples for the gradient computation of non-linear terms in the RL objective, and thus restrict feasible sample sizes, leading to imprecise likelihood approximations and a distorted RL objective. To address this, we propose Boundary-Guided Policy Optimization (BGPO), a memory-efficient RL algorithm that maximizes a specially constructed lower bound of the ELBO-based objective. This lower bound is carefully designed to satisfy two key properties: (1) Linearity: it is a linear sum where each term depends only on a single MC sample, thereby enabling gradient accumulation across samples and ensuring constant memory usage; (2) Equivalence: both the value and gradient of this lower bound are equal to those of the ELBO-based objective in on-policy training, making it also an effective approximation for the original RL objective. These properties allow BGPO to adopt a large MC sample size, improving likelihood approximations and RL objective estimation, which in turn leads to enhanced performance. Experiments show that BGPO significantly outperforms previous RL algorithms for dLLMs in math problem solving, code generation, and planning tasks. Our code and models are available at https://github.com/THU-KEG/BGPO.

2510.11711 2026-06-01 cs.LG stat.ML 版本更新

Reinforced sequential Monte Carlo for amortised sampling

强化序贯蒙特卡洛用于摊销采样

Sanghyeok Choi, Sarthak Mittal, Víctor Elvira, Jinkyoo Park, Esmeralda S. Whitammer

发表机构 * University of Edinburgh Mila - Québec AI Institute CIFAR Fellow

AI总结 本文提出一种摊销方法与粒子方法相结合的采样框架,通过最大熵强化学习训练序贯蒙特卡洛采样器,并利用离线策略学习提高目标分布探索效率,在合成多模态目标和丙氨酸二肽构象玻尔兹曼分布上验证了改进的近似精度与训练稳定性。

Comments ICML 2026. Code: https://github.com/hyeok9855/ReinforcedSMC

详情
AI中文摘要

本文提出了一种摊销方法和基于粒子的方法的协同作用,用于从未归一化的密度函数定义的分布中采样。我们阐述了序贯蒙特卡洛(SMC)与通过最大熵强化学习(MaxEnt RL)训练的神经序贯采样器之间的联系,其中学习的采样策略和价值函数定义了提议核和扭曲函数。利用这一联系,我们引入了一种离线策略RL训练程序,该程序使用来自SMC的样本(将学习的采样器作为提议)作为行为策略,以更好地探索目标分布。我们描述了稳定联合训练提议和扭曲函数的技术,以及一种自适应权重退火方案以减少训练信号方差。此外,基于过去使用经验回放指导神经采样器训练的尝试,我们推导出一种方法,将历史样本与退火重要性采样权重结合在回放缓冲区中。在合成多模态目标(连续和离散空间)以及丙氨酸二肽构象的玻尔兹曼分布上,我们展示了在近似真实分布以及训练稳定性方面相比摊销方法和蒙特卡洛方法的改进。

英文摘要

This paper proposes a synergy of amortised and particle-based methods for sampling from distributions defined by unnormalised density functions. We state a connection between sequential Monte Carlo (SMC) and neural sequential samplers trained by maximum-entropy reinforcement learning (MaxEnt RL), wherein learnt sampling policies and value functions define proposal kernels and twist functions. Exploiting this connection, we introduce an off-policy RL training procedure for the sampler that uses samples from SMC -- using the learnt sampler as a proposal -- as a behaviour policy that better explores the target distribution. We describe techniques for stable joint training of proposals and twist functions and an adaptive weight tempering scheme to reduce training signal variance. Furthermore, building upon past attempts to use experience replay to guide the training of neural samplers, we derive a way to combine historical samples with annealed importance sampling weights within a replay buffer. On synthetic multi-modal targets (in both continuous and discrete spaces) and the Boltzmann distribution of alanine dipeptide conformations, we demonstrate improvements in approximating the true distribution as well as training stability compared to both amortised and Monte Carlo methods.

2501.12500 2026-06-01 cs.LG stat.ME 版本更新

Learning General Causal Structures with Hidden Dynamic Process for Climate Analysis

学习具有隐藏动态过程的通用因果结构用于气候分析

Minghao Fu, Biwei Huang, Zijian Li, Yujia Zheng, Ignavier Ng, Guangyi Chen, Yingyao Hu, Kun Zhang

发表机构 * Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE(穆罕默德·本·扎耶德人工智能大学) Carnegie Mellon University, Pittsburgh, PA, USA(卡内基梅隆大学) University of California San Diego, La Jolla, CA, USA(加州大学圣地亚哥分校) Johns Hopkins University, Baltimore, MD, USA(约翰·霍普金斯大学)

AI总结 提出统一框架CaDRe,联合发现观测变量间的因果关系和隐藏动态过程,在非参数设置下可识别,并在气候数据上验证了有效性和可解释性。

Comments Accepted by ICML 2026

详情
AI中文摘要

理解气候动力学需要超越观测数据中的相关性,揭示潜在的因果过程。诸如大气过程等潜在驱动因素在时间动态中起着核心作用,而地理上邻近的观测变量之间也存在直接的因果影响。传统的因果表示学习(CRL)通常关注潜在因素,但忽略了这种观测到观测的因果关系,这限制了其在气候分析中的适用性。在本文中,我们引入了一个统一框架,联合揭示(i)观测变量之间的因果关系和(ii)潜在驱动力及其相互作用。我们建立了条件,使得隐藏动态过程和观测变量之间的因果结构可以从时间序列数据中同时识别,并且我们的保证在非参数设置下通过恢复潜在变量和因果关系的上下文信息仍然成立。基于这些见解,我们提出了CaDRe(因果发现与表示学习),一个具有结构约束的时间序列生成模型,集成了CRL和因果发现。在合成数据集上的实验验证了我们的理论结果。在真实世界的气候数据集上,CaDRe提供了有竞争力的预测精度,并恢复了与领域专业知识一致的可视化因果图,从而为气候系统提供了可解释的见解。代码可在https://github.com/MinghaoFu/CaDRe获取。

英文摘要

Understanding climate dynamics requires going beyond correlations in observational data to uncover the underlying causal process. Latent drivers such as atmospheric processes play a central role in temporal dynamics, while direct causal influences also exist among geographically proximate observed variables. Traditional Causal Representation Learning (CRL) typically focuses on latent factors but overlooks such observable-to-observable causal relations, which limits its applicability to climate analysis. In this paper, we introduce a unified framework that jointly uncovers (i) causal relations among observed variables and (ii) latent driving forces together with their interactions. We establish conditions under which both the hidden dynamic process and the causal structure among observed variables are simultaneously identifiable from time-series data, and our guarantees continue to hold in the nonparametric setting through contextual information that recovers latent variables and causal relations. Building on these insights, we propose CaDRe (Causal Discovery and Representation learning), a time-series generative model with structural constraints that integrates CRL and causal discovery. Experiments on synthetic datasets validate our theoretical results. On real-world climate datasets, CaDRe delivers competitive forecasting accuracy and recovers visualized causal graphs aligned with domain expertise, thereby offering interpretable insights into climate systems. Code is available at https://github.com/MinghaoFu/CaDRe.

2510.03839 2026-06-01 cs.LG stat.ML 版本更新

Technical note on Sequential Test-Time Adaptation via Martingale-Driven Fisher Prompting

关于通过鞅驱动的Fisher提示进行顺序测试时间自适应的技术说明

Behraj Khan, Tahir Qasim Syed

发表机构 * Institute of Business Administration(商业管理学院)

AI总结 提出M-FISHER框架,通过指数鞅检测分布漂移并利用Fisher预条件更新实现稳定自适应,提供时间一致的错误控制保证和最优检测延迟。

详情
AI中文摘要

我们提出了M-FISHER的理论框架,这是一种用于流数据中顺序分布漂移检测和稳定自适应的方法。对于检测,我们从非一致性分数构建指数鞅,并应用Ville不等式获得关于误报控制的时间一致保证,确保在任何停止时间下的统计有效性。在持续漂移下,我们进一步将期望检测延迟界定为$\mathcal{O}(\log(1/δ)/Γ)$,其中$Γ$反映了漂移后的信息增益,从而将检测效率与分布散度联系起来。对于自适应,我们展示了提示参数的Fisher预条件更新实现了在分布流形上的自然梯度下降,产生局部最优更新,最小化KL散度同时保持稳定性和参数化不变性。总之,这些结果确立了M-FISHER作为一种在协变量漂移下的顺序决策中实现鲁棒、任意时间有效检测和几何稳定自适应的原则性方法。

英文摘要

We present a theoretical framework for M-FISHER, a method for sequential distribution shift detection and stable adaptation in streaming data. For detection, we construct an exponential martingale from non-conformity scores and apply Ville's inequality to obtain time-uniform guarantees on false alarm control, ensuring statistical validity at any stopping time. Under sustained shifts, we further bound the expected detection delay as $\mathcal{O}(\log(1/δ)/Γ)$, where $Γ$ reflects the post-shift information gain, thereby linking detection efficiency to distributional divergence. For adaptation, we show that Fisher-preconditioned updates of prompt parameters implement natural gradient descent on the distributional manifold, yielding locally optimal updates that minimize KL divergence while preserving stability and parameterization invariance. Together, these results establish M-FISHER as a principled approach for robust, anytime-valid detection and geometrically stable adaptation in sequential decision-making under covariate shift.
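
The detection side of this argument can be sketched with a generic exponential test supermartingale. The example below is not the M-FISHER construction: it assumes non-conformity scores bounded in [0, 1] with a known pre-shift mean, uses Hoeffding's lemma to build the process, and the function name `detect_shift` and the parameters `null_mean` and `lam` are assumptions of this illustration. Ville's inequality then bounds the probability of ever crossing 1/δ before a shift by δ.

```python
import math


def detect_shift(scores, null_mean, delta=0.05, lam=1.0):
    """Return the 1-based step at which the alarm fires, or None.

    Tracks log M_t, where M_t = prod_i exp(lam * (s_i - null_mean) - lam**2 / 8).
    For scores in [0, 1] whose conditional mean is at most null_mean, Hoeffding's
    lemma makes M_t a nonnegative supermartingale with M_0 = 1, so by Ville's
    inequality P(exists t: M_t >= 1 / delta) <= delta.
    """
    log_m = 0.0
    threshold = math.log(1.0 / delta)
    for t, s in enumerate(scores, start=1):
        log_m += lam * (s - null_mean) - lam ** 2 / 8.0
        if log_m >= threshold:
            return t
    return None
```

Because the guarantee holds uniformly over time, the stream can be monitored indefinitely without correcting for repeated looks. Under a sustained shift the log-process grows roughly linearly, so the alarm time scales like log(1/δ) divided by the per-step drift, which mirrors the form of the delay bound quoted in the abstract.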

2505.05168 2026-06-01 math.ST cs.LG stat.ML stat.TH 版本更新

Dynamical local Fréchet curve regression in manifolds

流形上的动态局部Fréchet曲线回归

M. D. Ruiz-Medina, A. Torres-Signes

发表机构 * University of Granada, Spain(格拉纳达大学,西班牙) University of Málaga, Spain(马拉加大学,西班牙)

AI总结 本文在可分离希尔伯特空间中推导了响应和回归变量的最小二乘局部线性Fréchet曲线预测器,并提出了基于加权Fréchet均值的流形内蕴局部线性Fréchet曲线预测器,证明了其渐近最优性。

Comments This paper is currently under journal second revision

详情
AI中文摘要

在温和条件下,本文推导了在可分离希尔伯特空间中取值的响应和回归变量的最小二乘局部线性Fréchet曲线预测器。我们获得了允许在向量函数的$L^{2}$空间中实现该局部线性Fréchet函数预测器的条件,该空间的值位于紧致黎曼流形上的时变切空间。其次,基于加权Fréchet均值方法,提出了在该流形上取值的内蕴局部线性Fréchet曲线预测器,并证明了其渐近最优性。模拟研究和实际数据应用考察了两种预测器经验版本的有限样本性能,并与测地线Nadaraya-Watson型曲线预测器进行了比较。在实际数据应用中,基于NASA MAGSAT卫星的地心纬度和经度观测,对地球磁场的时变球坐标进行了函数预测。

英文摘要

Under mild conditions, this paper derives a least-squares local linear Fréchet curve predictor for a response and regressor taking values in a separable Hilbert space. We obtain conditions under which this local linear Fréchet functional predictor can be implemented in the ambient $L^{2}$-space of vector functions with values in the time-varying tangent space of a compact Riemannian manifold. Second, an intrinsic local linear Fréchet curve predictor taking values in the manifold is proposed, based on a weighted Fréchet mean approach, and its asymptotic optimality is proved. A simulation study and a real-data application assess the finite-sample performance of the empirical versions of both predictors, compared with a geodesic Nadaraya-Watson-type curve predictor. The real-data application addresses functional prediction of the time-varying spherical coordinates of the Earth's magnetic field from observations of the geocentric latitude and longitude of NASA's MAGSAT satellite.
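The weighted Fréchet mean underlying the intrinsic predictor can be illustrated on the unit sphere. This is a minimal sketch under simplifying assumptions, not the paper's estimator: the points are single observations on the sphere rather than curves, the weights are arbitrary nonnegative numbers standing in for kernel weights, and the mean is found by the usual fixed-point iteration with log and exp maps. All function names are illustrative.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def log_map(mu, x):
    """Tangent vector at mu pointing toward x, with length equal to the geodesic distance."""
    c = max(-1.0, min(1.0, _dot(mu, x)))
    theta = math.acos(c)
    u = [xi - c * mi for xi, mi in zip(x, mu)]   # part of x orthogonal to mu
    n = math.sqrt(_dot(u, u))
    if n < 1e-12:
        return [0.0] * len(mu)
    return [ui * theta / n for ui in u]


def exp_map(mu, v):
    """Point reached by following the geodesic from mu in direction v for length |v|."""
    n = math.sqrt(_dot(v, v))
    if n < 1e-12:
        return list(mu)
    out = [math.cos(n) * mi + math.sin(n) * vi / n for mi, vi in zip(mu, v)]
    norm = math.sqrt(_dot(out, out))             # renormalise to limit drift
    return [o / norm for o in out]


def weighted_frechet_mean(points, weights, iters=100):
    """Minimise sum_i w_i * d(mu, x_i)^2 over the sphere by fixed-point iteration."""
    total = sum(weights)
    mu = list(points[0])
    for _ in range(iters):
        step = [0.0] * len(mu)
        for x, w in zip(points, weights):
            lv = log_map(mu, x)
            step = [s + (w / total) * l for s, l in zip(step, lv)]
        mu = exp_map(mu, step)
    return mu


a = 0.5
pts = [[math.sin(a), 0.0, math.cos(a)], [-math.sin(a), 0.0, math.cos(a)],
       [0.0, math.sin(a), math.cos(a)], [0.0, -math.sin(a), math.cos(a)]]
print(weighted_frechet_mean(pts, [1.0, 1.0, 1.0, 1.0]))  # close to the north pole
```

The iteration is only expected to behave well when the points are concentrated in a small geodesic ball, as in the symmetric example above.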

2510.02060 2026-06-01 cs.AI cs.LG 版本更新

ReTabAD: A Benchmark for Restoring Semantic Context in Tabular Anomaly Detection

ReTabAD: 恢复表格异常检测中语义上下文的基准

Sanghyu Yoon, Dongmin Kim, Suhee Yoon, Ye Seul Sim, Seungdong Yoa, Hye-Seung Cho, Soonyoung Lee, Hankook Lee, Woohyung Lim

发表机构 * LG AI Research, Seoul, South Korea(LG人工智能研究实验室,首尔,韩国) Sungkyunkwan University, Suwon, South Korea(成均馆大学,水原,韩国)

AI总结 针对现有表格异常检测基准缺乏语义上下文的问题,提出ReTabAD基准,通过丰富结构化文本元数据并集成零样本LLM框架,验证了语义上下文能提升检测性能和可解释性。

Comments Accepted to ICLR 2026

详情
AI中文摘要

在表格异常检测(AD)中,文本语义通常承载关键信号,因为异常的定义与特定领域的上下文紧密相关。然而,现有基准仅提供原始数据点,缺乏语义上下文,忽略了专家在实践中依赖的丰富文本元数据,如特征描述和领域知识。这一限制阻碍了研究灵活性,并阻止模型充分利用领域知识进行检测。ReTabAD通过恢复文本语义来解决这一差距,以实现上下文感知的表格AD研究。我们提供(1)20个精心策划的表格数据集,这些数据集丰富了结构化的文本元数据,以及最先进的AD算法的实现,包括经典方法、深度学习和基于LLM的方法,以及(2)一个零样本LLM框架,该框架利用语义上下文而无需特定任务训练,为未来研究建立了强大的基线。此外,本工作通过实验和分析提供了关于文本元数据在AD中的作用和实用性的见解。结果表明,语义上下文通过支持领域感知推理提高了检测性能并增强了可解释性。这些发现将ReTabAD确立为系统探索上下文感知AD的基准。

英文摘要

In tabular anomaly detection (AD), textual semantics often carry critical signals, as the definition of an anomaly is closely tied to domain-specific context. However, existing benchmarks provide only raw data points without semantic context, overlooking rich textual metadata such as feature descriptions and domain knowledge that experts rely on in practice. This limitation restricts research flexibility and prevents models from fully leveraging domain knowledge for detection. ReTabAD addresses this gap by restoring textual semantics to enable context-aware tabular AD research. We provide (1) 20 carefully curated tabular datasets enriched with structured textual metadata, together with implementations of state-of-the-art AD algorithms including classical, deep learning, and LLM-based approaches, and (2) a zero-shot LLM framework that leverages semantic context without task-specific training, establishing a strong baseline for future research. Furthermore, this work provides insights into the role and utility of textual metadata in AD through experiments and analysis. Results show that semantic context improves detection performance and enhances interpretability by supporting domain-aware reasoning. These findings establish ReTabAD as a benchmark for systematic exploration of context-aware AD.

2510.00419 2026-06-01 cs.LG 版本更新

Learning a Zeroth-Order Optimizer for Fine-Tuning LLMs

学习零阶优化器以微调大语言模型

Kairun Zhang, Haoyu Li, Yanjun Zhao, Yifan Sun, Huan Zhang

发表机构 * University of Illinois Urbana-Champaign(伊利诺伊大学厄巴纳-香槟分校)

AI总结 提出一种基于学习的零阶优化器ZO-Finetuner,通过紧凑且内存高效的设计自动学习高效扰动策略,实现大语言模型微调时避免反向传播并降低内存开销,在4个LLM和7个数据集上82.1%的任务-模型组合中优于现有零阶基线方法。

Comments ICML 2026

详情
AI中文摘要

零阶优化器最近成为微调大语言模型(LLM)的一种有吸引力的方法,因为它们避免了反向传播,并且相对于标准一阶训练可以大幅减少内存开销。然而,现有的零阶方法依赖于手工设计的静态采样策略,无法适应模型特定的结构。为了解决这个问题,我们提出了ZO-Finetuner,一种基于学习的零阶优化器,通过紧凑且内存高效的设计自动学习高效的扰动策略。基于少量基础LLM在多个任务上被重复微调这一事实,ZO-Finetuner支持一次性每模型训练,并在下游任务中以最小开销重用。因此,为给定LLM学习一次优化器并在不同下游任务中重用既是可行的也是高度可取的。相应地,ZO-Finetuner旨在通过支持一次性每模型训练且开销最小,将学习优化(L2L)扩展到基础模型时代。在4个LLM和7个数据集上的实验表明,ZO-Finetuner在82.1%的任务-模型组合中优于先前的零阶基线方法,从而展示了其在高效LLM微调中的强大性能和可扩展性。代码可在https://github.com/ASTRAL-Group/ZO_Fine_tuner找到。

英文摘要

Zeroth-order optimizers have recently emerged as an attractive approach for fine-tuning large language models (LLMs), as they avoid backpropagation and can substantially reduce memory overhead relative to standard first-order training. However, existing zeroth-order methods rely on hand-crafted, static sampling strategies that are not adaptable to model-specific structures. To address this, we propose ZO-Finetuner, a learning-based zeroth-order optimizer for LLMs that automatically learns efficient perturbation strategies through a compact and memory-efficient design. Motivated by the fact that a small set of base LLMs is repeatedly fine-tuned across tasks, ZO-Finetuner supports one-time per-model training and reuse across downstream tasks with minimal overhead. Therefore, learning the optimizer once for a given LLM and reusing it across diverse downstream tasks is both feasible and highly desirable. Accordingly, ZO-Finetuner is designed to scale learning to learn (L2L) to the foundation-model era by supporting one-time per-model training with minimal overhead. Experiments on 4 LLMs and 7 datasets show that ZO-Finetuner outperforms prior zeroth-order baselines in 82.1% of task-model combinations, thereby demonstrating strong performance and scalability for efficient LLM fine-tuning. The code can be found at https://github.com/ASTRAL-Group/ZO_Fine_tuner.
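For context, the kind of hand-crafted, static sampling strategy the abstract contrasts against can be sketched as a two-point zeroth-order update with isotropic Gaussian perturbations. This is the baseline idea only, not ZO-Finetuner's learned perturbation strategy. The function name, step size, and toy quadratic loss are assumptions chosen for illustration.

```python
import random


def zo_step(params, loss_fn, lr=0.05, eps=1e-3, rng=random):
    """One zeroth-order SGD step using two loss evaluations and no backpropagation."""
    z = [rng.gauss(0.0, 1.0) for _ in params]           # static Gaussian perturbation
    plus = [p + eps * zi for p, zi in zip(params, z)]
    minus = [p - eps * zi for p, zi in zip(params, z)]
    # Central finite difference: estimates the directional derivative along z.
    slope = (loss_fn(plus) - loss_fn(minus)) / (2.0 * eps)
    # The gradient estimate is slope * z; step against it.
    return [p - lr * slope * zi for p, zi in zip(params, z)]


def quadratic(v):
    return sum(t * t for t in v)


rng = random.Random(0)
x = [1.0, -2.0, 3.0]
start = quadratic(x)
for _ in range(300):
    x = zo_step(x, quadratic, rng=rng)
print(start, quadratic(x))   # the loss should have dropped substantially
```

Only forward evaluations of the loss are needed, which is where the memory saving comes from. A learned optimizer would replace the fixed `rng.gauss(0.0, 1.0)` draw with a perturbation distribution adapted to the model.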

2509.25906 2026-06-01 cs.LG 版本更新

Federated Learning with Enhanced Privacy via Model Splitting and Random Client Participation

通过模型拆分和随机客户端参与增强隐私的联邦学习

Yiwei Li, Shuai Wang, Zhuojun Tian, Xiuhua Wang, Shijian Su

发表机构 * School of Optoelectronic & Communication Engineering, Xiamen University of Technology(厦门理工学院光电信息与通信工程学院) National Key Laboratory of Wireless Communications, University of Electronic Science and Technology of China(电子科技大学信息与通信国家重点实验室) Division of Information Science and Engineering, KTH Royal Institute of Technology(皇家理工学院信息科学与工程系) School of Cyber Science and Engineering, Huazhong University of Science and Technology(华中科技大学网络安全科学与工程学院) School of Engineering, Huaqiao University(华侨大学工程学院)

AI总结 提出MS-PAFL框架,通过将模型拆分为私有和公共子模型并仅向公共子模型注入噪声,结合随机客户端参与和本地数据子采样的隐私放大分析,在强隐私保证下实现更优的隐私-效用权衡。

Comments Accepted for publication in IEEE Transactions on Cognitive Communications and Networking

详情
AI中文摘要

联邦学习(FL)通常采用差分隐私(DP)来保护客户端数据,但隐私保证所需的附加噪声会显著降低模型精度。为解决这一挑战,我们提出了模型拆分隐私放大联邦学习(MS-PAFL),一种结合结构模型拆分与统计隐私放大的新颖框架。在该框架中,每个客户端的模型被划分为保留在本地私有子模型和用于全局聚合的公共子模型。校准的高斯噪声仅注入公共子模型,从而限制其不利影响,同时保留本地模型的效用。我们进一步提供了严格的理论分析,刻画了在该架构下通过随机客户端参与和本地数据子采样实现的联合隐私放大。分析给出了单轮和总隐私损失的紧界,表明MS-PAFL显著减少了满足目标隐私保护水平所需的噪声。大量实验验证了我们的理论发现,表明MS-PAFL始终获得更优的隐私-效用权衡,并能在强隐私保证下训练高精度模型。

英文摘要

Federated Learning (FL) often adopts differential privacy (DP) to protect client data, but the added noise required for privacy guarantees can substantially degrade model accuracy. To resolve this challenge, we propose model-splitting privacy-amplified federated learning (MS-PAFL), a novel framework that combines structural model splitting with statistical privacy amplification. In this framework, each client's model is partitioned into a private submodel, retained locally, and a public submodel, shared for global aggregation. Calibrated Gaussian noise is injected only into the public submodel, thereby confining its adverse impact while preserving the utility of the local model. We further present a rigorous theoretical analysis that characterizes the joint privacy amplification achieved through random client participation and local data subsampling under this architecture. The analysis provides tight bounds on both single-round and total privacy loss, demonstrating that MS-PAFL significantly reduces the noise necessary to satisfy a target privacy protection level. Extensive experiments validate our theoretical findings, showing that MS-PAFL consistently attains a superior privacy-utility trade-off and enables the training of highly accurate models under strong privacy guarantees.
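The split described above can be sketched as follows: the private submodel never leaves the client, and only the public submodel is clipped and perturbed before being shared. This is a simplified illustration, not the MS-PAFL algorithm. Norm clipping, the `noise_multiplier` parameter, and the flat-list parameter layout are assumptions borrowed from the standard Gaussian mechanism.

```python
import math
import random


def privatize_update(private_part, public_part, clip_norm=1.0,
                     noise_multiplier=1.0, rng=random):
    """Return (private_part kept locally, noisy clipped public_part to upload)."""
    norm = math.sqrt(sum(v * v for v in public_part))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0   # bound the sensitivity
    sigma = noise_multiplier * clip_norm                       # Gaussian noise std
    noisy_public = [v * scale + rng.gauss(0.0, sigma) for v in public_part]
    # The private submodel is returned untouched: no noise, no upload.
    return list(private_part), noisy_public


rng = random.Random(1)
private, shared = privatize_update([0.3, -0.7], [3.0, 4.0], rng=rng)
print(private)   # unchanged local parameters
print(shared)    # clipped to norm 1, then perturbed
```

Because noise touches only the shared parameters, the locally retained part keeps its full accuracy, which is the intuition behind the improved privacy-utility trade-off the abstract reports.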

2506.01467 2026-06-01 cs.LG cs.DM 版本更新

Feature-Aware (Hyper)graph Generation via Next-Scale Prediction

特征感知的(超)图生成:基于下一尺度预测

Dorian Gailhard, Enzo Tartaglione, Lirida Naviner, Jhony H. Giraldo

发表机构 * GitHub

AI总结 提出FAHNES框架,通过层次化下一尺度预测联合生成图/超图的拓扑和特征,实现大规模带特征图/超图的高效生成。

详情
AI中文摘要

图生成模型在小型结构化数据上表现良好,但难以扩展到大型复杂结构。层次化方法提高了可扩展性,但通常忽略节点和边特征,而这些特征在实际应用中至关重要,特别是对于建模高阶关系的超图。在本文中,我们提出FAHNES(通过下一尺度预测进行特征感知的(超)图生成),这是一个层次化框架,可联合生成图和超图的拓扑与特征。FAHNES通过节点粗化和局部扩展构建多尺度表示,并由一种新颖的层次化尺度编码引导,该编码控制粒度并确保跨尺度一致性。在合成数据集、3D网格和图点云数据集上的实验表明,该方法在独特扩展到带特征的大规模图和超图的同时,实现了具有竞争力或最先进的性能。我们的代码是开源的。

英文摘要

Graph generative models perform well on small structured data but struggle to scale to large, complex structures. Hierarchical approaches improve scalability but often ignore node and edge features, which are critical in real-world applications, particularly for hypergraphs that model higher-order relationships. In this paper, we propose FAHNES (feature-aware (hyper)graph generation via next-scale prediction), a hierarchical framework that jointly generates topology and features for graphs and hypergraphs. FAHNES builds multi-scale representations through node coarsening and localized expansion, guided by a novel hierarchical scale encoding that controls granularity and ensures cross-scale consistency. Experiments on synthetic, 3D mesh, and graph point cloud datasets demonstrate competitive or state-of-the-art performance while uniquely scaling to featured large-scale graphs and hypergraphs. Our code is open source.

2509.22335 2026-06-01 cs.LG cs.AI 版本更新

Spectral Collapse Drives Loss of Plasticity in Deep Continual Learning

深度持续学习中的谱坍缩导致塑性丧失

Arjun Prakash, Naicheng He, Kaicheng Guo, Saket Tiwari, Ruo Yu Tao, Tyrone Serapio, Amy Greenwald, George Konidaris

发表机构 * Department of Computer Science, Brown University(布朗大学计算机科学系)

AI总结 研究深度神经网络在持续学习中塑性丧失的原因,发现新任务初始化时的Hessian谱坍缩是主要因素,并提出基于Kronecker分解的两种正则化方法以保持塑性。

详情
AI中文摘要

我们研究为什么深度神经网络在持续学习中会丧失塑性,从而在不重新初始化参数的情况下无法学习新任务。我们表明,这种失败之前在新任务初始化时会出现Hessian谱坍缩,其中有意义的曲率方向消失,梯度下降变得无效。通过分析线性化ReLU网络,我们推导出成功训练的显式$ε$-秩条件,并证明损失加权Gram矩阵在谱上与广义高斯-牛顿近似等价,从而将NTK动力学与Hessian曲率联系起来。直接针对谱坍缩,我们讨论了Hessian的Kronecker因子近似,这激发了两种正则化增强:保持高有效特征秩和应用L2惩罚。在持续监督学习和强化学习任务上的实验证实,结合这两种正则化器可以有效保持塑性。

英文摘要

We investigate why deep neural networks suffer from loss of plasticity in continual learning, and thus fail to learn new tasks without reinitializing parameters. We show that this failure is preceded by Hessian spectral collapse at new-task initialization, where meaningful curvature directions vanish and gradient descent becomes ineffective. Analyzing a linearized ReLU network, we derive explicit $ε$-rank conditions for successful training and prove that the loss-weighted Gram matrix is spectrally equivalent to the Generalized Gauss-Newton approximation, thereby relating NTK dynamics to Hessian curvature. Targeting spectral collapse directly, we then discuss the Kronecker factored approximation of the Hessian, which motivates two regularization enhancements: maintaining high effective feature rank and applying L2 penalties. Experiments on continual supervised and reinforcement learning tasks confirm that combining these two regularizers effectively preserves plasticity.
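One common way to quantify "effective feature rank" is the entropy-based effective rank of a feature matrix's singular values. The sketch below uses that definition as an assumption; the paper may use a different rank measure (for example the $\epsilon$-rank it mentions). The function takes precomputed singular values so that it needs no linear algebra library.

```python
import math


def effective_rank(singular_values):
    """exp(Shannon entropy) of the normalised singular-value distribution."""
    total = sum(singular_values)
    if total <= 0:
        return 0.0
    probs = [s / total for s in singular_values if s > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)


print(effective_rank([1.0, 1.0, 1.0, 1.0]))   # evenly spread spectrum
print(effective_rank([5.0, 0.0, 0.0]))        # collapsed spectrum
```

A value near the number of features indicates a well-spread spectrum, while a value near 1 indicates collapse onto a single direction. A regularizer in this spirit would penalise decreases in this quantity during training.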

2509.19452 2026-06-01 cs.RO cs.CV cs.LG 版本更新

HUNT: High-Speed UAV Navigation and Tracking in Unstructured Environments via Instantaneous Relative Frames

HUNT:通过瞬时相对帧在非结构化环境中进行高速无人机导航与跟踪

Alessandro Saviolo, Jeffrey Mao, Giuseppe Loianno

发表机构 * New York University(纽约大学) University of California Berkeley(加州大学伯克利分校)

AI总结 提出HUNT框架,利用瞬时相对帧统一搜索与跟踪,实现高速飞行和鲁棒自主性。

详情
AI中文摘要

搜索与救援任务要求无人机既能高速穿越未知的非结构化环境,又能在检测到目标后跟踪目标。在感知退化且无全局定位的情况下实现这两种能力仍是一个开放挑战。最近的相对导航工作通过将规划和控制锚定到可见的检测目标上展示了鲁棒跟踪,但在视野中没有目标时无法进行导航。我们提出了HUNT(高速无人机导航与跟踪),一个实时框架,在单一相对公式中统一了穿越、获取和跟踪。HUNT直接从机载瞬时观测量(如姿态、高度和速度)定义导航目标,从而在搜索过程中实现反应式高速飞行。一旦检测到目标,相同的感知-控制管道无缝过渡到跟踪。在茂密森林、集装箱场地以及使用车辆和人体模型的搜索与救援任务中的户外实验表明,在全局方法失败的情况下,该框架实现了鲁棒自主性。

英文摘要

Search and rescue operations require unmanned aerial vehicles to both traverse unknown unstructured environments at high speed and track targets once detected. Achieving both capabilities under degraded sensing and without global localization remains an open challenge. Recent works on relative navigation have shown robust tracking by anchoring planning and control to a visible detected object, but cannot address navigation when no target is in the field of view. We present HUNT (High-speed UAV Navigation and Tracking), a real-time framework that unifies traversal, acquisition, and tracking within a single relative formulation. HUNT defines navigation objectives directly from onboard instantaneous observables such as attitude, altitude, and velocity, enabling reactive high-speed flight during search. Once a target is detected, the same perception-control pipeline transitions seamlessly to tracking. Outdoor experiments in dense forests, container compounds, and search-and-rescue operations with vehicles and mannequins demonstrate robust autonomy where global methods fail.

2506.11653 2026-06-01 cs.CV cs.AI cs.LG 版本更新

DISCO: Mitigating Bias in Deep Learning with Conditional Distance Correlation

DISCO: 使用条件距离相关性减轻深度学习中的偏差

Emre Kavak, Tom Nuno Wolf, Christian Wachinger

发表机构 * Technical University of Munich, Germany(慕尼黑技术大学) Konrad Zuse School of Excellence in Reliable AI, Germany(Konrad Zuse可靠性人工智能卓越学院) Munich Center for Machine Learning (MCML), Germany(慕尼黑机器学习中心(MCML))

AI总结 提出基于反因果模型的条件独立性准则,并设计条件距离相关性的高效估计器DISCO$_m$和sDISCO,通过正则化实现梯度模型中的偏差缓解,在多个数据集上优于或媲美现有方法。

Comments Accepted to ICML 2026 (oral)

详情
AI中文摘要

数据集偏差常常导致深度学习模型利用虚假相关性而非任务相关信号。我们引入了标准反因果模型(SAM),这是一个统一的因果框架,用于刻画偏差机制并得出因果稳定性的条件独立性准则。基于这一理论,我们提出了DISCO$_m$和sDISCO,它们是条件距离相关性的高效且可扩展的估计器,能够在基于梯度的模型中实现独立性正则化。在六个不同数据集上,我们的方法在现有观察偏差缓解方法中持续表现更优或具有竞争力,同时需要更少的超参数并能够无缝扩展到多偏差场景。这项工作桥接了因果理论与实际深度学习,为稳健预测提供了原则性基础和有效工具。源代码:https://github.com/yakamoz5/DISCO。

英文摘要

Dataset bias often leads deep learning models to exploit spurious correlations instead of task-relevant signals. We introduce the Standard Anti-Causal Model (SAM), a unifying causal framework that characterizes bias mechanisms and yields a conditional independence criterion for causal stability. Building on this theory, we propose DISCO$_m$ and sDISCO, efficient and scalable estimators of conditional distance correlation that enable independence regularization in gradient-based models. Across six diverse datasets, our methods consistently outperform or are competitive with existing observed-bias mitigation approaches, while requiring fewer hyperparameters and scaling seamlessly to multi-bias scenarios. This work bridges causal theory and practical deep learning, providing both a principled foundation and effective tools for robust prediction. Source Code: https://github.com/yakamoz5/DISCO.
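As background for the estimators named above, the following sketch computes the plain (unconditional) sample distance correlation between two scalar samples. It is not DISCO$_m$ or sDISCO, which estimate the conditional quantity. It only shows the double-centering construction those estimators build on, and the function names are illustrative.

```python
import math


def _centered_distances(values):
    """Pairwise |a - b| matrix with row, column, and grand means removed."""
    n = len(values)
    d = [[abs(a - b) for b in values] for a in values]
    row = [sum(r) / n for r in d]          # equals column means (d is symmetric)
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation in [0, 1]; returns 0.0 if either sample is constant."""
    n = len(x)
    a, b = _centered_distances(x), _centered_distances(y)

    def dcov2(p, q):
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / (n * n)

    vxy, vx, vy = dcov2(a, b), dcov2(a, a), dcov2(b, b)
    if vx * vy == 0:
        return 0.0
    return math.sqrt(max(vxy, 0.0) / math.sqrt(vx * vy))


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
print(distance_correlation(xs, [2 * v + 1 for v in xs]))   # linear dependence
print(distance_correlation(xs, [7.0] * 5))                 # constant second sample
```

Unlike Pearson correlation, the population version of this quantity is zero only under independence, which is why it is attractive as an independence regularizer.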

2509.06856 2026-06-01 stat.ML cs.LG cs.NA math.NA 版本更新

Sequential Least-Squares Estimators with Fast Randomized Sketching for Linear Statistical Models

用于线性统计模型的快速随机草图化序贯最小二乘估计器

Guan-Yu Chen, Dong-Yue Xie, Xi Yang

发表机构 * School of Mathematics, Nanjing University of Aeronautics and Astronautics(南京航空航天大学数学学院)

AI总结 提出一种融合草图-求解与迭代草图方法的序贯最小二乘估计框架,通过逐步增大草图尺寸迭代求解子问题,高效获得高精度参数估计。

详情
AI中文摘要

我们提出了一种新颖的随机化框架,用于大规模线性统计模型的估计问题,即快速随机草图化序贯最小二乘估计器(SLSE-FRS),该框架首次集成了草图-求解和迭代草图方法。通过迭代构建和求解草图最小二乘子问题,并逐步增大草图尺寸以获得更好的精度,SLSE-FRS逐步细化真实参数向量的估计,最终产生高精度估计器。我们分析了SLSE-FRS的收敛性质,并提供了其高效实现。数值实验表明,SLSE-FRS优于最先进的方法,即预处理共轭梯度法和迭代双重草图法。

英文摘要

We propose a novel randomized framework for the estimation problem of large-scale linear statistical models, namely Sequential Least-Squares Estimators with Fast Randomized Sketching (SLSE-FRS), which integrates Sketch-and-Solve and Iterative-Sketching methods for the first time. By iteratively constructing and solving sketched least-squares (LS) subproblems with increasing sketch sizes to achieve higher precision, SLSE-FRS gradually refines the estimators of the true parameter vector, ultimately producing high-precision estimators. We analyze the convergence properties of SLSE-FRS and provide an efficient implementation. Numerical experiments show that SLSE-FRS outperforms state-of-the-art methods, namely the Preconditioned Conjugate Gradient (PCG) method and the Iterative Double Sketching (IDS) method.

2509.00834 2026-06-01 cs.AI cs.FL cs.LG cs.LO 版本更新

Neuro-Symbolic Predictive Process Monitoring

神经符号预测性过程监控

Axel Mezini, Elena Umili, Ivan Donadello, Fabrizio Maria Maggi, Matteo Mancanelli, Fabio Patrizi

发表机构 * Faculty of Engineering, Free University of Bozen-Bolzano(博岑-博尔扎诺自由大学工程学院) Department of Computer, Control and Management Engineering, Sapienza, Università di Roma(罗马大学计算机、控制与管理工程系)

AI总结 提出一种结合数据驱动学习与时序逻辑先验知识的神经符号方法,通过可微逻辑损失函数训练自回归序列预测器,以提升业务过程管理中后缀预测的准确性和逻辑一致性。

详情
AI中文摘要

本文通过提出一种神经符号预测性过程监控(PPM)方法,解决了业务流程管理(BPM)中的后缀预测问题,该方法将数据驱动学习与时序逻辑先验知识相结合。尽管最近的方法利用深度学习模型进行后缀预测,但由于训练过程中缺乏领域知识的显式集成,它们常常无法满足甚至基本的逻辑约束。我们提出了一种新颖方法,将有限迹上的线性时序逻辑(LTLf)融入自回归序列预测器的训练过程。我们的方法引入了一个可微的逻辑损失函数,该函数使用LTLf语义的软近似和Gumbel-Softmax技巧定义,可以与标准预测损失结合。这确保了模型学习生成既准确又逻辑一致的后缀。在三个真实世界数据集上的实验评估表明,我们的方法提高了后缀预测的准确性和对时序约束的遵从性。我们还引入了逻辑损失的两种变体(局部和全局),并展示了它们在噪声和现实环境下的有效性。虽然是在BPM背景下开发的,我们的框架适用于任何符号序列生成任务,并有助于推进神经符号人工智能。

英文摘要

This paper addresses the problem of suffix prediction in Business Process Management (BPM) by proposing a Neuro-Symbolic Predictive Process Monitoring (PPM) approach that integrates data-driven learning with temporal logic-based prior knowledge. While recent approaches leverage deep learning models for suffix prediction, they often fail to satisfy even basic logical constraints due to the lack of explicit integration of domain knowledge during training. We propose a novel method to incorporate Linear Temporal Logic over finite traces (LTLf) into the training process of autoregressive sequence predictors. Our approach introduces a differentiable logical loss function, defined using a soft approximation of LTLf semantics and the Gumbel-Softmax trick, which can be combined with standard predictive losses. This ensures that the model learns to generate suffixes that are both accurate and logically consistent. Experimental evaluation on three real-world datasets shows that our method improves suffix prediction accuracy and compliance with temporal constraints. We also introduce two variants of the logic loss (local and global) and demonstrate their effectiveness under noisy and realistic settings. While developed in the context of BPM, our framework is applicable to any symbolic sequence generation task and contributes to advancing Neuro-Symbolic AI.
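The Gumbel-Softmax trick mentioned above replaces a hard categorical sample with a smooth relaxation, so that a loss defined on sampled activities remains differentiable. Below is a minimal numeric sketch of the relaxation itself. It does not include the LTLf semantics or an autodiff framework, and the function name and example probabilities are illustrative.

```python
import math
import random


def gumbel_softmax(probs, temperature=0.5, rng=random):
    """Relaxed one-hot sample from a categorical distribution with strictly positive probs."""
    noisy = []
    for p in probs:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)   # keep both logs finite
        gumbel = -math.log(-math.log(u))                 # Gumbel(0, 1) noise
        noisy.append((math.log(p) + gumbel) / temperature)
    m = max(noisy)                                       # stabilise the softmax
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
sample = gumbel_softmax([0.7, 0.2, 0.1], temperature=0.5, rng=rng)
print(sample, sum(sample))
```

Lower temperatures push the output toward a one-hot vector, at the cost of higher-variance gradients. In a training loop the same computation would be expressed in an autodiff library so that gradients flow through `probs`.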

2508.20326 2026-06-01 stat.ML cs.LG math.OC 版本更新

Stochastic Gradients under Nuisances

干扰下的随机梯度

Facheng Yu, Ronak Mehta, Alex Luedtke, Zaid Harchaoui

发表机构 * University of Washington(华盛顿大学)

AI总结 本文研究目标函数依赖于未知干扰参数的学习问题中随机梯度算法的非渐近收敛性,证明在Neyman正交性等条件下经典算法仍可收敛,并提出近似正交化更新变体以在非正交情形下达到类似收敛率。

Comments Published at NeurIPS 2025

详情
AI中文摘要

随机梯度优化是从经典监督学习到现代自监督学习等多种场景的主要学习范式。我们考虑目标函数依赖于未知干扰参数的学习问题的随机梯度算法,并建立非渐近收敛保证。我们的结果表明,虽然干扰的存在会改变最优值并扰乱优化轨迹,但在适当条件下(如Neyman正交性),经典随机梯度算法仍可能收敛。此外,即使不满足Neyman正交性,我们证明一种具有近似正交化更新(通过近似正交化梯度预言)的算法变体也能达到类似的收敛率。讨论了来自正交统计学习/双机器学习以及因果推断的例子。

英文摘要

Stochastic gradient optimization is the dominant learning paradigm for a variety of scenarios, from classical supervised learning to modern self-supervised learning. We consider stochastic gradient algorithms for learning problems whose objectives rely on unknown nuisance parameters, and establish non-asymptotic convergence guarantees. Our results show that, while the presence of a nuisance can alter the optimum and upset the optimization trajectory, the classical stochastic gradient algorithm may still converge under appropriate conditions, such as Neyman orthogonality. Moreover, even when Neyman orthogonality is not satisfied, we show that an algorithm variant with approximately orthogonalized updates (with an approximately orthogonalized gradient oracle) may achieve similar convergence rates. Examples from orthogonal statistical learning/double machine learning and causal inference are discussed.

2508.18730 2026-06-01 cs.LG cs.AR 版本更新

Beyond Tokens: Enhancing RTL Quality Estimation via Structural Graph Learning

超越令牌:通过结构图学习增强RTL质量估计

Yi Liu, Hongji Zhang, Yiwen Wang, Dimitris Tsaras, Lei Chen, Mingxuan Yuan, Qiang Xu

发表机构 * Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR(香港中文大学计算机科学与工程系) Noah's Ark Lab, Huawei, Hong Kong SAR(华为诺亚实验室)

AI总结 提出StructRTL框架,利用控制数据流图的结构语义和自监督学习,结合知识蒸馏,显著提升寄存器传输级设计质量估计的准确性。

Comments Forty-Third International Conference on Machine Learning (ICML 2026)

详情
AI中文摘要

在电子设计自动化工作流程中,估计寄存器传输级(RTL)设计的质量至关重要,因为它能够在不进行耗时的逻辑综合的情况下,即时反馈面积和延迟等关键性能指标。虽然最近的方法利用大型语言模型从RTL代码中提取嵌入并取得了有希望的结果,但它们忽略了对于准确质量估计至关重要的结构语义。相比之下,控制数据流图(CDFG)视图更明确地揭示了设计的结构特征,为表示学习提供了更丰富的线索。在这项工作中,我们引入了StructRTL,一种新颖的结构感知图自监督学习框架,用于改进RTL设计质量估计。通过从CDFG学习结构信息表示,StructRTL在各种质量估计任务上显著优于先前的方法。为了进一步提升性能,我们结合了一种知识蒸馏策略,将后映射网表的低级洞察转移到基于CDFG的预测器中。实验结果表明,StructRTL取得了新的最先进结果,突显了将结构学习与跨阶段监督相结合的有效性。

英文摘要

Estimating the quality of register transfer level (RTL) designs is crucial in the electronic design automation (EDA) workflow, as it enables instant feedback on key performance metrics like area and delay without the need for time-consuming logic synthesis. While recent approaches have leveraged large language models (LLMs) to derive embeddings from RTL code and achieved promising results, they overlook the structural semantics essential for accurate quality estimation. In contrast, the control data flow graph (CDFG) view exposes the design's structural characteristics more explicitly, offering richer cues for representation learning. In this work, we introduce StructRTL, a novel structure-aware graph self-supervised learning framework for improved RTL design quality estimation. By learning structure-informed representations from CDFGs, StructRTL significantly outperforms prior art on various quality estimation tasks. To further boost performance, we incorporate a knowledge distillation strategy that transfers low-level insights from post-mapping netlists into the CDFG-based predictor. Experimental results demonstrate that StructRTL establishes new state-of-the-art results, highlighting the effectiveness of combining structural learning with cross-stage supervision.

2508.16687 2026-06-01 cs.LG 版本更新

Native Hierarchical and Compositional Representations with Subspace Embeddings

原生层次与组合表示:基于子空间嵌入

Gabriel Moreira, Zita Marinho, Manuel Marques, João Paulo Costeira, Chenyan Xiong

发表机构 * Carnegie Mellon University(卡内基梅隆大学)

AI总结 提出用线性子空间替代向量表示概念,通过子空间维度与包含关系自然建模层次与组合性,并引入可微软投影矩阵实现端到端训练,在层次推理和自然语言推理任务上达到最优性能。

Comments KDD 2026

详情
AI中文摘要

传统嵌入将数据点表示为向量,这使得相似度计算简单,但限制了其捕捉层次结构和组合性的能力。我们提出了一种根本不同的方法:将概念表示为线性子空间。通过跨越多个维度,子空间可以用高维区域建模更广泛的概念,并将更具体的概念嵌套其中。这种几何结构通过维度自然地捕捉一般性,通过包含关系捕捉层次性,并通过线性代数运算为组合提供涌现结构。为了使这种范式可训练,我们通过软投影矩阵引入了一种可微的子空间参数化方法,允许学习每个子空间的有效维度。我们的方法不仅在层次推理和自然语言推理基准上达到了最先进的性能,还提供了一种基于几何的蕴含模型。此外,我们证明,当标准向量嵌入在否定查询上退化为接近随机性能时,子空间嵌入无需显式监督即可原生地捕捉逻辑组合,同时保持与高效欧几里得向量搜索的兼容性。

英文摘要

Traditional embeddings represent datapoints as vectors, which makes similarity easy to compute but limits how well they capture hierarchies and compositionality. We propose a fundamentally different approach: representing concepts as linear subspaces. By spanning multiple dimensions, subspaces can model broader concepts with higher-dimensional regions and nest more specific concepts within them. This geometry naturally captures generality through dimension, hierarchy through inclusion, and enables an emergent structure for composition via linear algebraic operations. To make this paradigm trainable, we introduce a differentiable subspace parameterization via soft projection matrices, allowing the effective dimension of each subspace to be learned. Our method not only achieves state-of-the-art performance on hierarchical and natural language inference benchmarks but also provides a geometrically-grounded model of entailment. Further, we demonstrate that while standard vector embeddings degrade to near-random performance on negated queries, subspace embeddings natively capture logical composition without explicit supervision, while preserving compatibility with efficient Euclidean vector search.
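The idea that hierarchy corresponds to subspace inclusion can be illustrated with ordinary orthogonal projections. This sketch makes two simplifying assumptions that differ from the paper: each concept is given by an explicit orthonormal basis, and projections are hard rather than the learned soft projection matrices. The concept names are made up for the example.

```python
def project(basis, v):
    """Orthogonal projection of v onto span(basis); basis vectors must be orthonormal."""
    out = [0.0] * len(v)
    for u in basis:
        coeff = sum(a * b for a, b in zip(u, v))
        out = [o + coeff * a for o, a in zip(out, u)]
    return out


def inclusion_score(child_basis, parent_basis):
    """Mean squared norm of the child's basis vectors after projection onto the parent.

    1.0 means the child subspace lies inside the parent; 0.0 means they are orthogonal.
    """
    total = 0.0
    for u in child_basis:
        p = project(parent_basis, u)
        total += sum(a * a for a in p)
    return total / len(child_basis)


animal = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # broader concept: 2-dimensional
dog = [[1.0, 0.0, 0.0]]                        # narrower concept: 1-dimensional
print(inclusion_score(dog, animal))            # 1.0: "dog" is contained in "animal"
print(inclusion_score(animal, dog))            # 0.5: the reverse does not hold
```

The asymmetry of the score is what lets a subspace model express entailment direction, which a symmetric vector similarity cannot.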

2508.11911 2026-06-01 math.NA cs.LG cs.NA physics.comp-ph 版本更新

Reduced-order modeling of Hamiltonian dynamics based on symplectic neural networks

基于辛神经网络的哈密顿动力学降阶建模

Yongsheng Chen, Wei Guo, Qi Tang, Xinghui Zhong

发表机构 * School of Mathematical Sciences, Zhejiang University(浙江大学数学科学学院) Department of Mathematics and Statistics, Texas Tech University(德克萨斯理工大学数学与统计系) School of Computational Science and Engineering, Georgia Institute of Technology(佐治亚理工学院计算科学与工程学院)

AI总结 提出一种数据驱动的辛降阶建模框架,通过统一端到端神经架构同时发现潜空间和学习动力学,确保降阶模型精确保持辛结构,提升长期稳定性和保真度。

详情
AI中文摘要

我们为高维哈密顿系统引入了一种新颖的数据驱动辛降阶建模(ROM)框架,该框架在单个端到端神经架构中统一了潜空间发现和动力学学习。编码器-解码器由Henon神经网络(HenonNet)构建,并可增加线性SGS-反射层,从而在全相空间和潜相空间之间产生精确的辛映射。潜动力学由作为HenonNet实现的辛流映射推进。这种统一的神经架构确保在降阶水平上精确保持底层辛结构,显著增强所得ROM的保真度和长期稳定性。我们通过在典型哈密顿系统上的全面数值实验验证了该方法。结果表明,该方法具有准确的轨迹重建能力、训练时间范围之外的鲁棒预测性能以及精确的哈密顿量保持。这些有希望的结果强调了我们的辛ROM框架在广泛科学和工程学科中复杂动力系统的有效性和潜在适用性。

英文摘要

We introduce a novel data-driven symplectic reduced-order modeling (ROM) framework for high-dimensional Hamiltonian systems that unifies latent-space discovery and dynamics learning within a single, end-to-end neural architecture. The encoder-decoder is built from Henon neural networks (HenonNets) and may be augmented with linear SGS-reflector layers. This yields an exact symplectic map between full and latent phase spaces. Latent dynamics are advanced by a symplectic flow map implemented as a HenonNet. This unified neural architecture ensures exact preservation of the underlying symplectic structure at the reduced-order level, significantly enhancing the fidelity and long-term stability of the resulting ROM. We validate our method through comprehensive numerical experiments on canonical Hamiltonian systems. The results demonstrate the method's capability for accurate trajectory reconstruction, robust predictive performance beyond the training horizon, and accurate Hamiltonian preservation. These promising outcomes underscore the effectiveness and potential applicability of our symplectic ROM framework for complex dynamical systems across a broad range of scientific and engineering disciplines.
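Why Hénon-type layers are symplectic by construction can be checked numerically in the simplest case of one position and one momentum variable, where symplectic means area-preserving (Jacobian determinant 1). The sketch assumes the usual Hénon-map form $(x, y) \mapsto (y + \eta,\ -x + V'(y))$ with a scalar potential $V(y) = a \log\cosh(wy + b)$. The parameter values are arbitrary, and the paper's actual layer parameterization may differ.

```python
import math


def henon_layer(x, y, eta=0.3, a=0.7, w=1.3, b=-0.2):
    """One Henon-type layer: (x, y) -> (y + eta, -x + V'(y)) with V(y) = a*log(cosh(w*y + b))."""
    grad_v = a * w * math.tanh(w * y + b)
    return y + eta, -x + grad_v


def henon_net(x, y, layers):
    """Compose several Henon layers; a composition of symplectic maps is symplectic."""
    for params in layers:
        x, y = henon_layer(x, y, **params)
    return x, y


def jacobian_det(f, x, y, h=1e-6):
    """Central finite-difference estimate of the 2x2 Jacobian determinant of f at (x, y)."""
    fx1, fx0 = f(x + h, y), f(x - h, y)
    fy1, fy0 = f(x, y + h), f(x, y - h)
    j11 = (fx1[0] - fx0[0]) / (2 * h)
    j21 = (fx1[1] - fx0[1]) / (2 * h)
    j12 = (fy1[0] - fy0[0]) / (2 * h)
    j22 = (fy1[1] - fy0[1]) / (2 * h)
    return j11 * j22 - j12 * j21


layers = [dict(eta=0.1, a=0.5, w=1.0, b=0.0), dict(eta=-0.2, a=1.2, w=0.7, b=0.3)]
print(jacobian_det(henon_layer, 0.4, -1.1))
print(jacobian_det(lambda x, y: henon_net(x, y, layers), 0.4, -1.1))
```

The determinant stays at 1 for any choice of the potential's parameters, so training can change what the map does without ever breaking the symplectic structure.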

2508.04457 2026-06-01 stat.ML cs.LG 版本更新

Benchmarking Uncertainty and its Disentanglement in multi-label Chest X-Ray Classification

多标签胸部X光分类中的不确定性及其解缠基准测试

Simon Baur, Wojciech Samek, Jackie Ma

发表机构 * Fraunhofer Heinrich-Hertz-Institut(弗劳恩霍夫海因里希-赫兹研究所) Technische Universität Berlin(柏林技术大学) The Berlin Institute for the Foundations of Learning and Data (BIFOLD)(柏林学习与数据基础研究所)

AI总结 本研究使用MIMIC-CXR-JPG数据集,对多标签胸部X光分类任务中的13种不确定性量化方法进行基准测试,评估了卷积和Transformer架构,并扩展了三种方法到多标签设置,揭示了不同方法和架构在不确定性估计和解缠认知与偶然不确定性方面的优缺点。

详情
AI中文摘要

可靠的不确定性量化对于医疗影像中可信赖的决策和AI模型的部署至关重要。虽然先前的工作已经探索了神经网络在合成或定义良好的数据设置(如自然图像分类)中使用信息论方法量化预测、认知和偶然不确定性的能力,但其在真实医学诊断任务中的适用性仍未得到充分探索。在本研究中,我们使用MIMIC-CXR-JPG数据集为多标签胸部X光分类提供了广泛的不确定性量化基准。我们评估了基于卷积(ResNet)和基于Transformer(Vision Transformer)架构的13种不确定性量化方法,涵盖广泛的任务。此外,我们将证据深度学习、HetClass神经网络和深度确定性不确定性扩展到多标签设置。我们的分析提供了对不确定性估计有效性以及解缠认知和偶然不确定性能力的见解,揭示了方法和架构特定的优势和局限性。

英文摘要

Reliable uncertainty quantification is crucial for trustworthy decision-making and the deployment of AI models in medical imaging. While prior work has explored the ability of neural networks to quantify predictive, epistemic, and aleatoric uncertainties using an information-theoretical approach in synthetic or well-defined data settings like natural image classification, its applicability to real-life medical diagnosis tasks remains underexplored. In this study, we provide an extensive uncertainty quantification benchmark for multi-label chest X-ray classification using the MIMIC-CXR-JPG dataset. We evaluate 13 uncertainty quantification methods for convolutional (ResNet) and transformer-based (Vision Transformer) architectures across a wide range of tasks. Additionally, we extend Evidential Deep Learning, HetClass NNs, and Deep Deterministic Uncertainty to the multi-label setting. Our analysis provides insights into uncertainty estimation effectiveness and the ability to disentangle epistemic and aleatoric uncertainties, revealing method- and architecture-specific strengths and limitations.
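The information-theoretic decomposition referred to above can be sketched for a single binary label, which is the building block of the multi-label case. The sketch assumes the model yields several probability estimates per label (for example from ensemble members or stochastic forward passes). Total uncertainty is the entropy of the mean prediction, aleatoric uncertainty is the mean entropy, and the epistemic part is their difference. The names are illustrative and not taken from the benchmark code.

```python
import math


def binary_entropy(p):
    """Entropy in nats of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def decompose_uncertainty(member_probs):
    """Return (total, aleatoric, epistemic) for one label from several probability estimates."""
    mean_p = sum(member_probs) / len(member_probs)
    total = binary_entropy(mean_p)                        # entropy of the averaged prediction
    aleatoric = sum(binary_entropy(p) for p in member_probs) / len(member_probs)
    return total, aleatoric, total - aleatoric            # epistemic = mutual information


print(decompose_uncertainty([0.5, 0.5, 0.5]))   # members agree on an ambiguous case
print(decompose_uncertainty([0.0, 1.0]))        # members disagree with full confidence
```

In the first case all uncertainty is aleatoric; in the second it is entirely epistemic. For a multi-label classifier the same function would be applied to each finding independently.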

2508.02217 2026-06-01 cs.LG 版本更新

Population-Free Pareto Tracking for Sample-Efficient Multi-Policy MORL

无种群的帕累托跟踪:面向样本高效的多策略多目标强化学习

Zeyu Zhao, Yueling Che, Kaichen Liu, Jian Li, Junmei Yao

发表机构 * College of Computer Science and Software Engineering, Shenzhen University, China(深圳大学计算机科学与软件工程学院)

AI总结 提出MPFT框架,通过无自进化种群的帕累托跟踪机制,结合单目标极端策略初始化,高效逼近完整帕累托前沿,显著提升样本效率并减少智能体-环境交互。

Comments 37 pages, 10 figures, ICML26 accepted paper

详情
AI中文摘要

多目标强化学习(MORL)是涉及多个冲突标准的现实世界决策问题的基本框架。现有的多策略(MP)方法通常依赖于维护大型策略种群的在线进化框架,导致高样本复杂性和过多的智能体-环境交互。为了缓解这些限制,我们提出了多策略帕累托前沿跟踪(MPFT),一种无需自进化种群的框架。它利用高效的帕累托跟踪机制,以单目标极端策略初始化来追踪帕累托前沿,并进一步加密稀疏区域以实现对完整帕累托前沿的精确近似。MPFT可以无缝集成先进的离线MORL算法,从而显著提高样本效率。我们在最多三个目标的六个机器人控制任务和超过三个目标的三个高维任务上评估了MPFT。实验结果表明,MPFT在超体积和期望效用方面优于最先进的基线。它还显著减少了智能体-环境交互。这些结果进一步证明,MPFT是一个通用框架,可以无缝集成在线和离线MORL算法。

英文摘要

Multi-objective reinforcement learning (MORL) is a fundamental framework for real-world decision-making problems involving multiple conflicting criteria. Existing multi-policy (MP) methods typically rely on online evolutionary frameworks that maintain large policy populations, leading to high sample complexity and excessive agent-environment interactions. To mitigate these limitations, we present Multi-policy Pareto Front Tracking (MPFT), a framework without a self-evolving population. It leverages an efficient Pareto-tracking mechanism initialized with single-objective extreme policies to trace the Pareto front, and further densifies sparse regions to achieve an accurate approximation of the full Pareto front. MPFT can be seamlessly integrated with advanced offline MORL algorithms, thereby substantially improving sample efficiency. We evaluate MPFT on six robotic control tasks with up to three objectives and three high-dimensional tasks with more than three objectives. Experimental results show that MPFT outperforms state-of-the-art baselines in terms of hypervolume and expected utility. It also significantly reduces agent-environment interactions. These results further demonstrate that MPFT serves as a general-purpose framework that can seamlessly integrate both online and offline MORL algorithms.
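The two notions the evaluation relies on, the Pareto front and its hypervolume, can be made concrete for two objectives that are both maximised. This is a generic sketch of those definitions, not MPFT's tracking mechanism. The reference point and the sample returns are made up.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def hypervolume_2d(front, ref=(0.0, 0.0)):
    """Area dominated by a 2-objective non-dominated front, measured from the reference point."""
    hv, prev_y = 0.0, ref[1]
    # Sweep from the best first objective downward; the second objective then increases.
    for x, y in sorted(front, key=lambda p: p[0], reverse=True):
        hv += (x - ref[0]) * (y - prev_y)
        prev_y = y
    return hv


returns = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1)]   # per-policy returns on two objectives
front = pareto_front(returns)
print(front)
print(hypervolume_2d(front))
```

A larger hypervolume means the set of policies covers more of the objective space, which is why densifying sparse regions of the front improves the metric.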

2507.17026 2026-06-01 stat.ML cs.LG 版本更新

Conformal C2ST: Turning weak classifiers into strong two-sample tests

Conformal C2ST:将弱分类器转化为强双样本检验

Vansh Bansal, Tianyu Chen, James G. Scott

发表机构 * Department of Statistics and Data Sciences, University of Texas at Austin, United States(统计与数据科学系,德克萨斯大学奥斯汀分校,美国)

AI总结 本文提出基于共形预测的C2ST变体,使任意弱分类器都能产生精确有限样本p值,实现可控第一类错误和温和退化的检验功效,并应用于神经后验估计验证。

详情
AI中文摘要

双样本检验问题是统计学和机器学习中的一项基本任务,旨在判断来自潜在分布$p$和$q$的两组样本是否实际上同分布(即$p=q$)。一种流行且直观的方法是分类器双样本检验(C2ST),其中训练一个分类器来区分来自$p$和$q$的样本。然而,尽管C2ST简单,其可靠性依赖于接近贝叶斯最优的分类器,这一要求很少满足且难以验证。这引发了一个重要的开放问题:弱分类器是否仍能用于双样本检验?我们证明答案是肯定的。基于Hu和Lei(2024)的工作,我们分析了C2ST的两种共形变体,它们将任何训练好的分类器(即使是弱的、有偏的或过拟合的)的分数转化为精确的有限样本p值。我们建立了共形C2ST的两个关键理论性质:(i)有限样本第一类错误控制,以及(ii)非平凡的功效,该功效随训练分类器误差的增加而温和退化。结果是,即使是表现不佳的分类器也能产生强大且可靠的双样本检验。这一通用框架在贝叶斯推断中找到了强大的应用,特别是在验证神经后验估计(NPE)模型时,其中比较学习到的后验近似$q(\theta\mid y)$与真实后验$p(\theta\mid y)$的任务可以表述为双样本检验。实验上,共形C2ST在此任务的广泛基准测试中优于经典判别检验。我们的结果确立了共形C2ST作为一种实用、理论基础的诊断工具。

英文摘要

The two-sample testing problem, a fundamental task in statistics and machine learning, seeks to determine whether two sets of samples, drawn from underlying distributions $p$ and $q$, are in fact identically distributed (i.e., whether $p=q$). A popular and intuitive approach is the classifier two-sample test (C2ST), where a classifier is trained to distinguish between samples from $p$ and $q$. Yet despite the simplicity of the C2ST, its reliability hinges on access to a near-Bayes-optimal classifier, a requirement that is rarely met and difficult to verify. This raises a major open question: can a weak classifier still be useful for two-sample testing? We show that the answer is a definitive yes. Building on the work of Hu and Lei (2024), we analyze two conformal variants of the C2ST that convert the scores from any trained classifier -- even if weak, biased, or overfit -- into exact, finite-sample p-values. We establish two key theoretical properties of the conformal C2ST: (i) finite-sample Type-I error control, and (ii) non-trivial power that degrades gently in tandem with the error of the trained classifier. The upshot is that even poorly performing classifiers can yield powerful and reliable two-sample tests. This general framework finds a powerful application in Bayesian inference, particularly for validating Neural Posterior Estimation (NPE) models, where the task of comparing a learned posterior approximation $q(θ\mid y)$ to the true posterior $p(θ\mid y)$ can be framed as a two-sample test. Empirically, the conformal C2ST outperforms classical discriminative tests across a wide range of benchmarks for this task. Our results establish the conformal C2ST as a practical, theoretically grounded diagnostic tool.
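
The abstract does not spell out the construction, so the following is only a generic sketch of how conformal methods turn arbitrary scores into finite-sample-valid p-values. It assumes that, under the null, the test score is exchangeable with the calibration scores and that larger scores count as stronger evidence against the null. The function name and the example scores are hypothetical, and this is not claimed to be the exact variant analyzed in the paper.

```python
def conformal_p_value(calibration_scores, test_score):
    """Conformal p-value: rank of the test score among calibration scores.

    If the test score is exchangeable with the calibration scores, then
    P(p <= alpha) <= alpha for every alpha in [0, 1], regardless of how
    good or bad the underlying classifier is.
    """
    n = len(calibration_scores)
    # Count calibration scores at least as extreme as the test score.
    at_least_as_large = sum(1 for s in calibration_scores if s >= test_score)
    # The "+1" terms count the test point itself, which keeps the p-value valid.
    return (1 + at_least_as_large) / (n + 1)


# Hypothetical classifier scores on held-out samples and on one test sample.
calibration = [0.1, 0.2, 0.3, 0.4]
p = conformal_p_value(calibration, test_score=0.35)
```

The smallest achievable p-value is 1/(n+1), so the number of calibration scores limits how small a significance level this sketch can reject at.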

2506.03779 2026-06-01 quant-ph cs.LG stat.ML 版本更新

Position: Quantum Kernel Machines Should Move Beyond Scalar-Valued Kernels to Realize Their Potential

立场:量子核机器应超越标量值核以实现其潜力

Hachem Kadri, Joachim Tomasi, Yuka Hashimoto, Sandrine Anthoine

发表机构 * Aix-Marseille University, CNRS, LIS, Marseille, France(艾克斯-马赛大学,法国国家科学研究中心,LIS实验室,马赛,法国) Aix-Marseille University, CNRS, I2M, Marseille, France(艾克斯-马赛大学,法国国家科学研究中心,I2M实验室,马赛,法国) NTT, Inc., Tokyo, Japan(日本NTT公司,东京)

AI总结 本文主张量子核机器应转向算子值核等更富表达力的框架,以利用纠缠和非交换结构处理复杂结构化预测问题,并通过初步概念验证展示其优势。

详情
Journal ref
ICML 2026
AI中文摘要

基于量子力学原理构建的量子核函数已成为量子机器学习的核心。最近的研究表明,当从经典数据学习时,量子核无法提供显著的计算或统计优势,这削弱了最初对量子核机器的热情。然而,该领域的大多数研究都集中在标准分类或回归设置中的标量值核上,而经典核方法在这些设置中已经高效且有效,留给量子核改进的空间很小。在这篇立场论文中,我们认为该领域的进展需要超越标量值核,转向更富表达力的核框架。标量值核缺乏充分利用纠缠等内在量子资源所需的自由度,并且不足以处理经典学习方法难以应对的复杂学习任务。基于算子值核学习和$C^*$-代数核表示的最新进展,我们提出了一条设计能够利用纠缠和非交换结构来处理复杂结构化预测问题的量子核的路线图。为了支持这一观点,我们展示了一个初步的概念验证,说明量子算子值核公式如何揭示标量值核方法难以访问的结构依赖性。这一焦点的转移可能为新一代量子核机器及其潜在优势的更忠实探索开辟道路。

英文摘要

Quantum kernel functions, built using quantum-mechanical principles, have emerged as a centerpiece of quantum machine learning. The initial enthusiasm for quantum kernel machines has been tempered by recent studies suggesting that quantum kernels could not offer significant computational or statistical advantages when learning from classical data. However, most of the research in this area has been devoted to scalar-valued kernels in standard classification or regression settings for which classical kernel methods are efficient and effective, leaving very little room for improvement with quantum kernels. In this position paper, we argue that progress in this field requires moving beyond scalar-valued kernels toward more expressive kernel frameworks. Scalar-valued kernels lack the degrees of freedom necessary to fully exploit intrinsically quantum resources such as entanglement and are not rich enough to deal with complex learning tasks where classical learning methods struggle. Building on recent advances in operator-valued kernel learning and $C^*$-algebraic kernel representations, we propose a roadmap for designing quantum kernels capable of leveraging entanglement and non-commutative structures to tackle complex structured prediction problems. To support this viewpoint, we present an initial proof-of-concept illustrating how quantum operator-valued kernel formulations can reveal structural dependencies that remain difficult to access for scalar-valued kernel methods. This shift in focus could open a pathway toward a new generation of quantum kernel machines and a more faithful exploration of their potential advantages.

2505.22934 2026-06-01 cs.CL cs.AI cs.LG 版本更新

Unraveling LoRA Interference: Orthogonal Subspaces for Robust Model Merging

解开LoRA干扰:用于鲁棒模型合并的正交子空间

Haobo Zhang, Jiayu Zhou

发表机构 * University of Michigan Ann Arbor(密歇根大学安娜堡分校)

AI总结 针对LoRA微调模型合并时性能下降的问题,提出通过微调前约束LoRA子空间正交性来减少任务间干扰的方法OSRM,可无缝集成现有合并算法,提升合并性能并保持单任务准确率。

Comments 14 pages, 5 figures, 16 tables, accepted by ACL 2025

详情
AI中文摘要

针对单个任务微调大型语言模型(LM)虽然性能强劲,但部署和存储成本高昂。近期研究探索模型合并,将多个任务特定模型组合成单个多任务模型,无需额外训练。然而,现有合并方法对于使用低秩适应(LoRA)微调的模型往往失败,导致性能显著下降。本文表明,这一问题源于模型参数与数据分布之间先前被忽视的相互作用。我们提出用于鲁棒模型合并的正交子空间(OSRM),在微调*之前*约束LoRA子空间,确保与一个任务相关的更新不会对其他任务的输出产生不利偏移。我们的方法可以无缝集成到大多数现有合并算法中,减少任务间的意外干扰。在八个数据集上使用三种广泛使用的LM和两种大型LM进行的广泛实验表明,我们的方法不仅提升了合并性能,还保持了单任务准确率。此外,我们的方法对合并的超参数表现出更强的鲁棒性。这些结果突显了数据-参数交互在模型合并中的重要性,并为合并LoRA模型提供了一种即插即用的解决方案。

英文摘要

Fine-tuning large language models (LMs) for individual tasks yields strong performance but is expensive for deployment and storage. Recent works explore model merging to combine multiple task-specific models into a single multi-task model without additional training. However, existing merging methods often fail for models fine-tuned with low-rank adaptation (LoRA), resulting in significant performance degradation. In this paper, we show that this issue arises from a previously overlooked interplay between model parameters and data distributions. We propose Orthogonal Subspaces for Robust model Merging (OSRM), which constrains the LoRA subspace *prior* to fine-tuning, ensuring that updates relevant to one task do not adversely shift outputs for others. Our approach can seamlessly integrate with most existing merging algorithms, reducing the unintended interference among tasks. Extensive experiments on eight datasets, tested with three widely used LMs and two large LMs, demonstrate that our method not only boosts merging performance but also preserves single-task accuracy. Furthermore, our approach exhibits greater robustness to the hyperparameters of merging. These results highlight the importance of data-parameter interaction in model merging and offer a plug-and-play solution for merging LoRA models.

2505.20840 2026-06-01 cs.LG 版本更新

Aggregation Buffer: Revisiting DropEdge with a New Parameter Block

聚合缓冲区:用新参数块重新审视 DropEdge

Dooho Lee, Myeong Kong, Sagad Hamid, Cheonwoo Lee, Jaemin Yoo

发表机构 * School of Electrical Engineering, KAIST, Daejeon, Republic of Korea(韩国大田,韩国科学技术院电气工程学院) Computer Science Department, University of Münster, Münster, Germany(德国明斯特大学计算机科学系)

AI总结 针对 DropEdge 在监督学习中性能受限的问题,提出一种名为 Aggregation Buffer 的参数块,通过改进 GNN 的鲁棒性来提升性能,并统一解决度偏差和结构差异等问题。

Comments Published at ICML 2025

详情
AI中文摘要

我们重新审视了 DropEdge,这是一种用于 GNN 的数据增强技术,通过在训练过程中随机移除边来暴露多样化的图结构。虽然这是一种有效减少对图中特定连接过拟合的有前途的方法,但我们观察到其在监督学习任务中的潜在性能提升非常有限。为了理解原因,我们提供了理论分析,表明 DropEdge 的有限性能来自于许多 GNN 架构中存在的根本性限制。基于此分析,我们提出了 Aggregation Buffer,这是一个专门设计的参数块,通过解决 DropEdge 的限制来提高 GNN 的鲁棒性。我们的方法与任何 GNN 模型兼容,并在多个数据集上展示了一致的性能提升。此外,我们的方法作为统一解决方案,有效解决了度偏差或结构差异等众所周知的问题。代码和数据集可在 https://github.com/dooho00/agg-buffer 获取。

英文摘要

We revisit DropEdge, a data augmentation technique for GNNs that randomly removes edges to expose diverse graph structures during training. Although it is a promising approach for reducing overfitting to specific connections in the graph, we observe that its potential performance gain in supervised learning tasks is significantly limited. To understand why, we provide a theoretical analysis showing that the limited performance of DropEdge stems from a fundamental limitation that exists in many GNN architectures. Based on this analysis, we propose Aggregation Buffer, a parameter block specifically designed to improve the robustness of GNNs by addressing the limitation of DropEdge. Our method is compatible with any GNN model and shows consistent performance improvements on multiple datasets. Moreover, our method serves as a unifying solution that effectively addresses well-known problems such as degree bias and structural disparity. Code and datasets are available at https://github.com/dooho00/agg-buffer.
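
As a rough illustration of the DropEdge step described above (random edge removal at training time), here is a minimal sketch on a plain edge list. The function name, the edge-list representation, and the drop probability are assumptions for illustration; it does not reproduce the Aggregation Buffer block or the authors' code.

```python
import random


def drop_edge(edges, drop_prob, rng):
    """Return a new edge list where each edge is removed independently
    with probability drop_prob (the original list is left untouched)."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    # rng.random() is uniform on [0, 1), so an edge survives with
    # probability 1 - drop_prob.
    return [edge for edge in edges if rng.random() >= drop_prob]


# Hypothetical toy graph: a 4-cycle given as (source, target) pairs.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
rng = random.Random(0)  # seeded only to make the sketch reproducible

# A fresh random subgraph would be drawn at every training step.
sampled = drop_edge(edges, drop_prob=0.5, rng=rng)
```

In this sketch the full edge list would still be used at evaluation time; only training steps see the randomly thinned graph.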

2411.13865 2026-06-01 cs.IR cs.AI cs.CL cs.LG 版本更新

Breaking Information Cocoons: A Hyperbolic Framework for Balancing Exploration and Exploitation in Recommender Systems

打破信息茧房:推荐系统中平衡探索与利用的双曲框架

Qiyao Ma, Menglin Yang, Mingxuan Ju, Tong Zhao, Neil Shah, Rex Ying

发表机构 * University of California, Davis(加州大学戴维斯分校) The Hong Kong University of Science(香港科学大学) Snap Inc.(Snap公司) Yale University(耶鲁大学)

AI总结 提出双曲框架HERec,通过语义增强的层次机制和自动层次聚类,在推荐系统中平衡探索与利用,有效缓解信息茧房。

Comments Accepted to KDD 2026. Code: https://github.com/Martin-qyma/HERec

详情
AI中文摘要

现代推荐系统常常形成信息茧房,限制用户接触多样化内容。核心挑战在于平衡内容探索与利用,同时允许用户调整推荐偏好。理想情况下,这种平衡可以通过层次表示来捕捉,其中深度搜索促进利用,广度搜索促进探索。然而,现有方法面临两个基本限制:欧几里得方法难以捕捉层次结构,而双曲方法尽管在层次建模上表现优越,但缺乏对用户和物品画像的语义理解,且未能提供平衡探索与利用的原则性机制。为解决这些问题,我们提出HERec,一个在推荐系统中有效平衡探索与利用的双曲框架。我们的框架引入两项关键创新:(1)语义增强的层次机制,直接在双曲空间中将丰富的文本描述与协同信息对齐。理论梯度分析表明,这种对齐有效利用了底层双曲流形结构,从而更准确地建模用户和物品;(2)通过优化Dasgupta代价的自动层次聚类机制,无需预定义超参数即可发现层次结构,实现用户可调节的探索-利用权衡。大量实验表明,HERec持续优于欧几里得和双曲基线,在效用指标上提升高达5.49%,多样性指标提升11.39%,有效缓解了信息茧房。

英文摘要

Modern recommender systems often create information cocoons, restricting users' exposure to diverse content. The central challenge is to balance content exploration and exploitation while allowing users to adjust their recommendation preferences. Ideally, this balance can be captured with a hierarchical representation, where depth search facilitates exploitation and breadth search enables exploration. However, existing approaches face two fundamental limitations: Euclidean methods struggle to capture hierarchical structures, while hyperbolic methods, despite their superior hierarchical modeling, lack semantic understanding of user and item profiles and fail to provide a principled mechanism for balancing exploration and exploitation. To address these challenges, we propose HERec, a hyperbolic framework that effectively balances exploration and exploitation in recommender systems. Our framework introduces two key innovations: (1) a semantic-enhanced hierarchical mechanism that aligns rich textual descriptions with collaborative information directly in hyperbolic space. Theoretical gradient analysis demonstrates that this alignment effectively leverages the underlying hyperbolic manifold structure, resulting in more accurate modeling of users and items; (2) an automatic hierarchical clustering mechanism by optimizing Dasgupta's cost, which discovers hierarchical structures without requiring predefined hyperparameters, enabling user-adjustable exploration-exploitation trade-offs. Extensive experiments demonstrate that HERec consistently outperforms both Euclidean and hyperbolic baselines, achieving up to 5.49% improvement in utility metrics and 11.39% increase in diversity metrics, effectively mitigating information cocoons.
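
For readers unfamiliar with Dasgupta's cost, the objective mentioned above, the sketch below evaluates it for a given hierarchy: each pair of items contributes its similarity weight times the number of leaves under the pair's lowest common ancestor, so similar items are cheaper when they are merged low in the tree. The nested-tuple tree encoding, the weight dictionary, and the toy values are assumptions for illustration (leaf labels must not themselves be tuples); this is not HERec's optimization procedure.

```python
def leaves(tree):
    """Collect the leaf labels of a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return [tree]
    collected = []
    for child in tree:
        collected.extend(leaves(child))
    return collected


def dasgupta_cost(tree, weights):
    """Sum over pairs (i, j) of weights[{i, j}] * (#leaves under lca(i, j))."""
    if not isinstance(tree, tuple):
        return 0.0
    child_leaves = [leaves(child) for child in tree]
    size = sum(len(group) for group in child_leaves)
    # Pairs inside a single child are handled by the recursive calls.
    cost = sum(dasgupta_cost(child, weights) for child in tree)
    # Pairs split across two children have this node as their LCA.
    for a in range(len(child_leaves)):
        for b in range(a + 1, len(child_leaves)):
            for i in child_leaves[a]:
                for j in child_leaves[b]:
                    cost += weights.get(frozenset((i, j)), 0.0) * size
    return cost


# Hypothetical similarities: items 0-1 and 2-3 are close, 1-2 only weakly.
weights = {frozenset((0, 1)): 1.0, frozenset((2, 3)): 1.0, frozenset((1, 2)): 0.1}
good_tree = ((0, 1), (2, 3))  # groups similar items first
bad_tree = ((0, 2), (1, 3))   # splits similar items at the root
```

On these toy inputs the hierarchy that groups similar items first has the lower cost, which is the behavior a cost-minimizing clustering would favor.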

2504.10564 2026-06-01 q-bio.QM cs.LG q-bio.BM 版本更新

FLOWR: Flow Matching for Structure-Aware De Novo, Interaction- and Fragment-Based Ligand Generation

FLOWR: 用于结构感知的从头、基于相互作用和片段的配体生成的流匹配

Julian Cremer, Ross Irwin, Alessandro Tibo, Jon Paul Janet, Simon Olsson, Djork-Arné Clevert

发表机构 * Machine Learning & Computational Sciences, Pfizer Worldwide R&D(机器学习与计算科学,辉瑞全球研发) Molecular AI, Discovery Sciences, R&D, AstraZeneca(分子人工智能,发现科学,研发,阿斯利康) Department of Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg(计算机科学与工程系,查尔姆斯理工大学和哥德堡大学)

AI总结 提出FLOWR框架,通过结合连续和分类流匹配与等变最优传输,并利用高效蛋白口袋条件化,实现三维配体的生成与优化,在有效性、姿态精度和相互作用恢复上超越现有方法,推理速度提升高达70倍。

详情
AI中文摘要

我们介绍了FLOWR,一个新颖的基于结构的框架,用于三维配体的生成和优化。FLOWR将连续和分类流匹配与等变最优传输相结合,并通过高效的蛋白口袋条件化增强。与FLOWR一起,我们提出了SPINDR,一个精心策划的数据集,包含配体-口袋共晶复合物,专门用于解决现有数据质量问题。实证评估表明,FLOWR在PoseBusters有效性、姿态精度和相互作用恢复方面超越了当前最先进的基于扩散和流的方法,同时提供了显著的推理加速,性能提升高达70倍。此外,我们引入了FLOWR:multi,一个高精度的多用途模型,允许针对性地采样符合预定义相互作用谱和化学子结构的新配体,用于基于片段的设计,无需重新训练或任何重采样策略。

英文摘要

We introduce FLOWR, a novel structure-based framework for the generation and optimization of three-dimensional ligands. FLOWR integrates continuous and categorical flow matching with equivariant optimal transport, enhanced by efficient protein pocket conditioning. Alongside FLOWR, we present SPINDR, a thoroughly curated dataset comprising ligand-pocket co-crystal complexes specifically designed to address existing data quality issues. Empirical evaluations demonstrate that FLOWR surpasses current state-of-the-art diffusion- and flow-based methods in terms of PoseBusters-validity, pose accuracy, and interaction recovery, while offering a significant inference speedup of up to 70-fold. In addition, we introduce FLOWR:multi, a highly accurate multi-purpose model allowing for the targeted sampling of novel ligands that adhere to predefined interaction profiles and chemical substructures for fragment-based design without the need for re-training or any re-sampling strategies.

2502.15224 2026-06-01 cs.LG cs.AI 版本更新

Auto-Discovery-Bench: Diagnosing Structured State Tracking in Oracle-Guided Discovery

自动发现基准:在Oracle引导发现中诊断结构化状态追踪

Tingting Chen, Beibei Lin, Srinivas Anumasa, Vedant Shah, Zifeng Yuan, Qiran Zou, Anirudh Goyal, Dianbo Liu

发表机构 * National University of Singapore(新加坡国立大学) Mila-Quebec AI Institute(魁北克AI研究院) Meta Superintelligence Labs(Meta超智能实验室)

AI总结 提出Auto-Discovery-Bench基准,通过确定性Oracle引导的假设-干预-反馈循环,诊断智能体在结构化状态追踪中的能力瓶颈。

Comments 13 pages

详情
AI中文摘要

交互式发现要求智能体在多轮反馈中维护和更新结构化信念。在评估智能体于嘈杂、开放的科学环境中的表现之前,有必要在受控条件下隔离这一先决能力。我们引入了Auto-Discovery-Bench,一个确定性的Oracle引导诊断基准,其中智能体通过重复的假设-干预-反馈循环恢复隐藏结构。该基准实例化了三种受控发现抽象:有向图发现、无向关系发现和符号方程发现。在所有模型中,性能随着变量数量、轨迹长度和干扰项的增加而下降。一个独立的轨迹追踪诊断表明,即使移除了干预选择和假设生成,许多失败仍然存在,这表明在维护和整合长程结构化信息方面的限制是Oracle引导发现的重要瓶颈。Auto-Discovery-Bench并非旨在取代真实的发现环境;相反,它提供了一个可重复、低混淆的诊断测试平台,用于隔离交互式科学智能体的先决能力。

英文摘要

Interactive discovery requires agents to maintain and update structured beliefs over many rounds of feedback. Before evaluating agents in noisy, open-ended scientific environments, it is useful to isolate this prerequisite capability under controlled conditions. We introduce Auto-Discovery-Bench, a deterministic oracle-guided diagnostic benchmark in which agents recover hidden structures through repeated hypothesis--intervention--feedback cycles. The benchmark instantiates three controlled discovery abstractions: directed graph discovery, undirected relational discovery, and symbolic equation discovery. Across models, performance degrades as the number of variables, trajectory length, and distractors increase. A separate trajectory-tracking diagnostic shows that many failures persist even when intervention selection and hypothesis generation are removed, suggesting that limitations in maintaining and integrating long-range structured information are an important bottleneck for oracle-guided discovery. Auto-Discovery-Bench is not intended to replace realistic discovery environments; rather, it provides a reproducible, low-confound diagnostic testbed for isolating a prerequisite capability for interactive scientific agents.

2502.04671 2026-06-01 cs.AI cs.LG cs.LO cs.PL 版本更新

ProofWala: A Framework for Multilingual Proof Data Synthesis and Theorem-Proving

ProofWala: 多语言证明数据合成与定理证明框架

Amitayush Thakur, George Tsoukalas, Greg Durrett, Swarat Chaudhuri

发表机构 * University of Texas, Austin, USA(得克萨斯大学奥斯汀分校)

AI总结 提出ProofWala框架,通过itp-interface库实现与交互式定理证明器的程序化交互,支持多语言证明数据合成、并行证明搜索,并验证了跨语言与跨领域迁移的有效性。

详情
AI中文摘要

神经定理证明方法需要强大的基础设施来与交互式定理证明器(ITP)交互、提取结构化证明数据以及大规模执行证明搜索。然而,现有工具通常针对特定助手且面向文件级执行,使得仓库级分析和并行实验变得困难。我们提出ProofWala,一个多语言证明工程框架,基于\texttt{itp-interface}构建,这是一个用于与ITP进行程序化交互的可重用库。对于Lean 4,我们实现了一个在阐释器内部执行的元编程交互层,支持语义上忠实的策略级跟踪,以及跨整个仓库的声明和依赖级提取。该设计超越了传统的REPL式交互,支持项目范围的分析、环境克隆和证明状态的池化执行。相同的接口抽象支持多个版本的Rocq,形成统一的跨助手流水线。基于此基础设施,ProofWala提供标准化的多语言证明数据集、模型训练工具和并行证明搜索算法。使用该框架,我们展示了跨Lean和Rocq的多语言训练能够实现跨语言和跨领域迁移。我们在Lean Mathlib和领域适应(CategoryTheory)上观察到统计显著的改进,而其他设置也呈现一致的增长趋势。我们在两个仓库中开源了完整框架、并行证明搜索模块、数据集和模型:ProofWala (https://github.com/trishullab/proof-wala) 和 itp-interface 库 (https://github.com/trishullab/itp-interface)。

英文摘要

Neural approaches to theorem proving require robust infrastructure for interfacing with interactive theorem provers (ITPs), extracting structured proof data, and executing proof search at scale. However, existing tooling is often assistant-specific and oriented toward file-level execution, making repository-scale analysis and parallel experimentation challenging. We present ProofWala, a multilingual proof engineering framework built around \texttt{itp-interface}, a reusable library for programmatic interaction with ITPs. For Lean 4, we implement a meta-programmed interaction layer executing inside the elaborator, enabling semantically faithful tactic-level tracing alongside declaration- and dependency-level extraction across entire repositories. This design extends beyond traditional REPL-style interaction by supporting project-wide analysis, environment cloning, and pooled execution of proof states. The same interface abstraction supports multiple versions of Rocq, yielding a unified cross-assistant pipeline. Built on this infrastructure, ProofWala provides standardized multilingual proof datasets, model training utilities, and parallel proof search algorithms. Using the framework, we demonstrate that multilingual training across Lean and Rocq enables cross-lingual and cross-domain transfer. We observe statistically significant improvements on Lean Mathlib and in domain adaptation (CategoryTheory), while other settings exhibit consistent upward trends. We open-source the full framework, parallel proof search module, datasets, and models across two repositories: ProofWala (https://github.com/trishullab/proof-wala) and the itp-interface library (https://github.com/trishullab/itp-interface).

2410.19153 2026-06-01 cs.LG 版本更新

Learning Coupled Subspaces for Multi-Condition Spike Data

学习多条件尖峰数据的耦合子空间

Yididiya Y. Nadew, Xuhui Fan, Christopher J. Quinn

发表机构 * Department of Computer Science, Iowa State University(爱荷华州立大学计算机科学系) School of Computing, Macquarie University(麦考瑞大学计算科学学院)

AI总结 提出耦合子空间高斯过程因子分析(CS-GPFA)模型,联合学习神经活动在条件空间中的潜在表示,并开发主动学习算法自适应选择条件,在合成和真实神经数据集上优于现有方法。

Comments 46 pages, 7 figures

详情
AI中文摘要

在神经科学中,许多研究在多种条件下进行感觉或行为实验,以获取高维尖峰序列数据集形式的神经反应。分析高维尖峰数据是一个具有挑战性的统计问题。为此,高斯过程因子分析(GPFA)作为一种流行的潜变量模型类别,被提出用于单一实验条件下收集的数据。GPFA提取平滑、低维的潜在轨迹,总结高维尖峰数据集。然而,标准GPFA独立推断每个实验条件下的这些轨迹,未考虑底层活动如何在条件空间上变化。这对准确性和潜在表示的可解释性都造成了限制。为解决这些限制,我们提出耦合子空间GPFA(CS-GPFA),一种联合学习潜在表示的贝叶斯模型,表征神经活动如何在条件空间上变化。在此基础上,我们进一步开发了一种主动学习算法,用于自适应选择条件。在合成和真实神经数据集上的实验表明,CS-GPFA相比现有方法实现了更优的性能。此外,我们的主动学习结果显示,CS-GPFA能在实际设置中有效指导实验设计。

英文摘要

In neuroscience, numerous studies conduct sensory or behavioral experiments under multiple conditions to acquire neural responses in the form of high-dimensional spike train datasets. Analyzing high-dimensional spike data is a challenging statistical problem. To this end, Gaussian process factor analysis (GPFA), a popular class of latent variable models, has been proposed for data collected under a single experimental condition. GPFA extracts smooth, low-dimensional latent trajectories that summarize high-dimensional spike datasets. However, standard GPFA infers these trajectories independently for each experimental condition, not accounting for how the underlying activity varies across the condition space. This limits both the accuracy and the interpretability of the latent representation. To address these limitations, we propose Coupled Subspaces GPFA (CS-GPFA), a Bayesian model that jointly learns latent representations, characterizing how the neural activity varies over the condition space. Building on this, we further develop an active-learning algorithm for adaptively selecting conditions. Experiments on both synthetic and real neural datasets demonstrate that CS-GPFA achieves superior performance compared to existing approaches. Moreover, our active-learning results show that CS-GPFA can efficiently guide experiment design in practical settings.

2404.14928 2026-06-01 cs.LG cs.AI cs.CL cs.SI 版本更新

Graph Machine Learning in the Era of Large Language Models (LLMs)

大语言模型时代的图机器学习

Shijie Wang, Jiani Huang, Zhikai Chen, Yu Song, Wenzhuo Tang, Haitao Mao, Wenqi Fan, Hui Liu, Xiaorui Liu, Dawei Yin, Qing Li

发表机构 * The Hong Kong Polytechnic University(香港理工大学) Michigan State University(密歇根州立大学) North Carolina State University(北卡罗来纳州立大学) Baidu Inc(百度公司)

AI总结 本文综述了大语言模型如何增强图机器学习的泛化、迁移和少样本学习能力,以及图如何提升大语言模型的推理和可解释性。

Comments Accepted by TIST

详情
AI中文摘要

图在表示社交网络、知识图谱和分子发现等各个领域的复杂关系中扮演着重要角色。随着深度学习的出现,图神经网络(GNN)已成为图机器学习(Graph ML)的基石,促进了图的表示和处理。最近,大语言模型(LLM)在语言任务中展现出前所未有的能力,并被广泛应用于计算机视觉和推荐系统等各种应用中。这一显著成功也引起了将LLM应用于图领域的兴趣。越来越多的努力致力于探索LLM在提升图机器学习的泛化性、迁移性和少样本学习能力方面的潜力。同时,图,尤其是知识图谱,富含可靠的事实知识,可用于增强LLM的推理能力,并可能缓解其局限性,如幻觉和缺乏可解释性。鉴于这一研究方向的快速进展,有必要对LLM时代图机器学习的最新进展进行系统综述,为研究人员和从业者提供深入理解。因此,在本综述中,我们首先回顾了图机器学习的最新发展。然后,我们探讨了如何利用LLM来增强图特征的质量,减轻对标注数据的依赖,并解决图异质性和分布外(OOD)泛化等挑战。之后,我们深入探讨了图如何增强LLM,突出了它们增强LLM预训练和推理的能力。此外,我们调查了各种应用,并讨论了这一有前景领域的潜在未来方向。

英文摘要

Graphs play an important role in representing complex relationships in various domains like social networks, knowledge graphs, and molecular discovery. With the advent of deep learning, Graph Neural Networks (GNNs) have emerged as a cornerstone in Graph Machine Learning (Graph ML), facilitating the representation and processing of graphs. Recently, large language models (LLMs) have demonstrated unprecedented capabilities in language tasks and are widely adopted in a variety of applications such as computer vision and recommender systems. This remarkable success has also attracted interest in applying LLMs to the graph domain. Increasing efforts have been made to explore the potential of LLMs in advancing Graph ML's generalization, transferability, and few-shot learning ability. Meanwhile, graphs, especially knowledge graphs, are rich in reliable factual knowledge, which can be utilized to enhance the reasoning capabilities of LLMs and potentially alleviate their limitations such as hallucinations and the lack of explainability. Given the rapid progress of this research direction, a systematic review summarizing the latest advancements for Graph ML in the era of LLMs is necessary to provide an in-depth understanding to researchers and practitioners. Therefore, in this survey, we first review the recent developments in Graph ML. We then explore how LLMs can be utilized to enhance the quality of graph features, alleviate the reliance on labeled data, and address challenges such as graph heterophily and out-of-distribution (OOD) generalization. Afterward, we delve into how graphs can enhance LLMs, highlighting their abilities to enhance LLM pre-training and inference. Furthermore, we investigate various applications and discuss the potential future directions in this promising field.

2307.04722 2026-06-01 cs.LG 版本更新

Advances and Challenges in Meta-Learning: A Technical Review

元学习的进展与挑战:技术综述

Anna Vettoruzzo, Mohamed-Rafik Bouguelia, Joaquin Vanschoren, Thorsteinn Rögnvaldsson, KC Santosh

发表机构 * Automated Machine Learning Group, Eindhoven University of Technology, Netherlands(埃因霍温理工大学自动化机器学习小组) Applied AI Research Lab, Department of Computer Science, University of South Dakota, USA(南达科他大学计算机科学系应用人工智能研究实验室)

AI总结 本文全面综述元学习技术,探讨其与多任务学习、迁移学习等领域的关联,并指出未来研究方向。

详情
AI中文摘要

元学习使学习系统能够从多个任务中获取知识,从而更快地适应和泛化到新任务。本综述对元学习进行了全面的技术概述,强调了其在数据稀缺或获取成本高的实际应用中的重要性。本文涵盖了最先进的元学习方法,并探讨了元学习与多任务学习、迁移学习、领域适应与泛化、自监督学习、个性化联邦学习和持续学习之间的关系。通过突出这些主题与元学习领域之间的协同作用,本文展示了某一领域的进展如何惠及整个领域,同时避免不必要的重复工作。此外,本文深入探讨了高级元学习主题,例如从复杂的多模态任务分布中学习、无监督元学习、学习有效适应数据分布变化以及持续元学习。最后,本文指出了该领域未来研究的开放问题和挑战。通过综合最新的研究进展,本文提供了对元学习及其对各种机器学习应用潜在影响的深入理解。我们相信,这篇技术综述将有助于元学习的进步及其在解决实际问题中的实际应用。

英文摘要

Meta-learning empowers learning systems with the ability to acquire knowledge from multiple tasks, enabling faster adaptation and generalization to new tasks. This review provides a comprehensive technical overview of meta-learning, emphasizing its importance in real-world applications where data may be scarce or expensive to obtain. The paper covers the state-of-the-art meta-learning approaches and explores the relationship between meta-learning and multi-task learning, transfer learning, domain adaptation and generalization, self-supervised learning, personalized federated learning, and continual learning. By highlighting the synergies between these topics and the field of meta-learning, the paper demonstrates how advancements in one area can benefit the field as a whole, while avoiding unnecessary duplication of efforts. Additionally, the paper delves into advanced meta-learning topics such as learning from complex multi-modal task distributions, unsupervised meta-learning, learning to efficiently adapt to data distribution shifts, and continual meta-learning. Lastly, the paper highlights open problems and challenges for future research in the field. By synthesizing the latest research developments, this paper provides a thorough understanding of meta-learning and its potential impact on various machine learning applications. We believe that this technical overview will contribute to the advancement of meta-learning and its practical implications in addressing real-world problems.

1709.08894 2026-06-01 stat.ML cs.LG 版本更新

On the regularization of Wasserstein GANs

关于Wasserstein GANs的正则化

Henning Petzka, Asja Fischer, Denis Lukovnikov

发表机构 * Fraunhofer Institute IAIS(弗劳恩霍夫研究所IAIS) Department of Computer Science, University of Bonn(波恩大学计算机科学系)

AI总结 本文研究Wasserstein GANs中Lipschitz约束的正则化方法,通过理论分析和实验证明使用较弱的正则化项优于权重裁剪。

Comments Published as a conference paper at ICLR 2018. * Henning Petzka and Asja Fischer contributed equally to this work (11 pages +13 pages appendix)

详情
AI中文摘要

自生成对抗网络(GANs)发明以来,它们已成为学习建模真实(未标记)数据分布的一种流行方法。训练过程中的收敛问题通过Wasserstein GANs得以克服,后者通过不同的度量最小化模型与经验分布之间的距离,但由此在优化问题中引入了Lipschitz约束。在神经网络可建模的函数类上强制Lipschitz约束的一种简单方法是权重裁剪。有人提出,可以通过在损失函数中添加正则化项来改进训练,该正则化项惩罚判别器(作为网络输入的函数)的梯度偏离1。我们提出了理论论据,说明为什么使用较弱的正则化项来强制Lipschitz约束更可取。这些论据得到了在玩具数据集上的实验结果的支持。

英文摘要

Since their invention, generative adversarial networks (GANs) have become a popular approach for learning to model a distribution of real (unlabeled) data. Convergence problems during training are overcome by Wasserstein GANs, which minimize the distance between the model and the empirical distribution in terms of a different metric, but thereby introduce a Lipschitz constraint into the optimization problem. A simple way to enforce the Lipschitz constraint on the class of functions that can be modeled by the neural network is weight clipping. It was proposed that training can be improved by instead augmenting the loss by a regularization term that penalizes the deviation of the gradient of the critic (as a function of the network's input) from one. We present theoretical arguments for why using a weaker regularization term enforcing the Lipschitz constraint is preferable. These arguments are supported by experimental results on toy data sets.
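
To make the contrast concrete, the sketch below compares the penalty described in the abstract, which pushes the critic's gradient norm toward exactly one, with a weaker one-sided penalty that only punishes norms above one. The abstract does not give the form of the weaker term, so the one-sided version here is an assumption, and the function names and sample values are hypothetical. The functions take a precomputed gradient norm rather than a critic network.

```python
def two_sided_penalty(grad_norm):
    """Penalize any deviation of the gradient norm from one."""
    return (grad_norm - 1.0) ** 2


def one_sided_penalty(grad_norm):
    """Penalize only gradient norms above one, which is all that a
    1-Lipschitz constraint actually requires (assumed weaker form)."""
    return max(0.0, grad_norm - 1.0) ** 2


# Hypothetical gradient norms of the critic at three sample points.
for norm in (0.5, 1.0, 2.0):
    print(norm, two_sided_penalty(norm), one_sided_penalty(norm))
```

The two penalties agree for norms of one or more; they differ only where the critic is flatter than the constraint allows, which the one-sided version leaves unpenalized.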