arXivDaily: arXiv Daily Academic Digest, updated Monday through Friday
1. Deep Learning Architectures and Training Methods (40 papers)

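2606.13705 2026-06-15 cs.LG cs.AI New submission

Can Editing 1 Neuron Fix Repetition Loops in LLMs?

Aristotelis Lazaridis, Aman Sharma, Dylan Bates, Brian King, Vincent Lu, Jack FitzGerald

Affiliations: Edgerunner AI

AI summary: This paper finds that Gemma 4 models fall into repetition loops on long factual enumeration tasks at rates of up to 95%. Per-layer ablation and per-neuron attribution localize the behavior to a small number of MLP neurons, and static weight edits (as small as sign-inverting a single neuron) eliminate the loops, but they cannot resolve the "doom loops" caused by missing knowledge.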

Abstract

Yes. Can it cure doom loops? Probably not. The Gemma 4 instruction-tuned models share a reproducible failure: on long factual enumeration prompts, such as listing every episode of a TV series, the 88 IAU constellations, or the 151 original Pokemon, they collapse into repetition, either a tight verbatim loop or a list whose entries decay onto a single answer. These loops occur at rates as high as 95% and survive prompt rewording, inference-engine changes, and most sampling adjustments. In this paper we explore whether this behavior is localized enough to remove by weight edits. To localize the cause, we use per-layer ablation and per-neuron attribution, then confirm the strongest candidates with full-generation sweeps. The loops trace to a small set of MLP neurons (or, in the 26B-A4B Mixture-of-Experts model, a few routed experts) which we suppress with static weight edits. These "surgeries" can be as small as a single sign-inverted neuron (in the E2B model). The size of the effective edits grows with model scale, but in all cases, the loop patterns can be addressed at normal generation budgets while preserving general-purpose benchmark scores. However, the edits do not solve everything: we also study longer thinking budgets, where the two larger models most visibly enter doom looping, i.e. a non-convergent regime in which the model self-corrects in circles over a fact it cannot recall, exhausting the budget without committing to a final answer. We show this residual failure is reduced but not eliminated by the same edits, and argue it is fundamentally a knowledge-precision problem rather than a removable circuit; weight surgery can delete a loop, but it cannot supply a missing fact. Our results are both a feasibility demonstration, that is, evidence that a concrete generation pathology can be localized to a few parameters and edited out, and a delineation of where that approach stops.
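
To make the idea of a single sign-inverted neuron concrete, the following is a minimal sketch on a toy two-layer MLP in NumPy. It is not the authors' procedure: the abstract does not say which weights are flipped, so negating one row of the output projection is an assumption, and `mlp_forward`, `invert_neuron`, and the neuron index `target` are hypothetical names chosen for illustration.

```python
import numpy as np

def mlp_forward(x, w_in, w_out):
    """Toy MLP block: ReLU(x @ w_in) @ w_out."""
    hidden = np.maximum(x @ w_in, 0.0)
    return hidden @ w_out

def invert_neuron(w_out, neuron_index):
    """Return a copy of w_out with one hidden neuron's output row negated.

    Assumption: the "sign inversion" acts on the output projection of the
    neuron; the abstract does not specify this.
    """
    edited = w_out.copy()
    edited[neuron_index, :] *= -1.0
    return edited

rng = np.random.default_rng(0)
d_model, d_hidden = 8, 32
w_in = rng.normal(size=(d_model, d_hidden))
w_out = rng.normal(size=(d_hidden, d_model))
x = rng.normal(size=(4, d_model))

target = 5  # hypothetical index of a neuron flagged by attribution
w_out_edited = invert_neuron(w_out, target)

before = mlp_forward(x, w_in, w_out)
after = mlp_forward(x, w_in, w_out_edited)

# The static edit shifts the block output by -2 * activation * output row,
# so the neuron now pushes in the opposite direction whenever it fires.
hidden = np.maximum(x @ w_in, 0.0)
expected_delta = -2.0 * np.outer(hidden[:, target], w_out[target])
```

Because the edit is applied to the stored weights, it needs no runtime hook; all other neurons are left untouched.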

2606.13754 2026-06-15 cs.LG New submission

D2H-AD: A Hybrid Model Utilizing Hyperdimensional Computing for Advanced Anomaly Detection

Ghazal Ghajari, Elaheh Ghajari, Ashutosh Ghimire, Saeid Ataei, Faris Alsulami, Fathi Amsaad

Affiliations: Wright State University; Azad University; Stevens Institute of Technology; University of Jeddah

AI summary: Proposes D2H-AD, an anomaly detection framework based on hyperdimensional computing that unifies distance-based similarity and density-aware encoding in one representation. It outperforms existing methods on multiple benchmark datasets and is lightweight, interpretable, and efficient.

Abstract

Anomaly detection is a fundamental component of intelligent systems with applications in healthcare, cybersecurity, smart grids, and IoT environments. Although conventional machine learning and deep learning methods have demonstrated effectiveness in identifying anomalies, they often rely on large labeled datasets, incur high computational costs, and face scalability challenges in edge and high-dimensional settings. This paper presents D2H-AD, a novel anomaly detection framework based on Hyperdimensional Computing (HDC), a brain-inspired paradigm that represents information using high-dimensional distributed vectors. Unlike existing HDC-based methods, D2H-AD integrates distance-based similarity and density-aware encoding within a unified framework, improving anomaly representation and detection performance. Ablation studies show that hyperdimensional encoding alone yields up to 5.4% higher ROC-AUC than applying the same density-distance scoring directly in the original feature space. Furthermore, D2H-AD consistently outperforms five established baselines, namely HDAD, ODHD, One-Class SVM, Isolation Forest, and Autoencoders, across all evaluated datasets. The framework is lightweight, interpretable, and computationally efficient, making it suitable for resource-constrained and real-time applications. We validate D2H-AD on five benchmark datasets and demonstrate superior F1-score and ROC-AUC performance, together with robustness to class imbalance, noise, and data complexity. In addition to improved accuracy, D2H-AD offers scalability, a small memory footprint, and low-latency operation enabled by binary computations and a compact design. These properties make it particularly attractive for TinyML and edge AI deployments. The proposed framework highlights the potential of HDC for accurate, interpretable, and energy-efficient anomaly detection in dynamic environments.

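2606.13803 2026-06-15 cs.LG New submission

Neural Slack Variables for Shape Constraints

Ruben Wiedemann, Antoine Jacquier, Lukas Gonon

Affiliations: Imperial College London; University of St. Gallen

AI summary: Proposes neural slack variables, which turn constraint enforcement into a regression problem by jointly learning an auxiliary network. The method achieves zero measured violations and is applied to monotonicity and convexity constraints and to learning financial volatility surfaces.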

Abstract

Enforcing functional inequality constraints such as monotonicity and convexity in neural networks is a fundamental challenge in many industrial and scientific applications. Classical one-sided penalty methods, along with primal-dual methods gated by complementary slackness, provide constraint gradients only at violated locations, resulting in fragile satisfaction. Architectures that guarantee feasibility by construction, on the other hand, remain largely limited to elementary cases and impose additional inductive biases. We introduce neural slack variables, a deep learning native primal-side approach that converts constraint enforcement into a regression problem by coupling the primary network with a jointly learned auxiliary network. The auxiliary network serves as a valid target for the primary network's constraint quantities, inducing feasibility and regularity. Neural slack variables achieve zero measured violations on dense-grid monotonicity and convexity test cases, where penalty and primal-dual baselines leave residual violations, and enable arbitrage-free learning of volatility surfaces, an open industrial challenge in quantitative finance.

2606.13862 2026-06-15 cs.LG cs.AI cs.CL New submission

SuperThoughts: Reasoning Tokens in Superposition

Zheyang Xiong, Shivam Garg, Max Yu, Vaishnavi Shrivastava, Haoyu Zhao, Anastasios Kyrillidis, Dimitris Papailiopoulos

Affiliations: University of Wisconsin-Madison; Microsoft Research; Independent; Princeton University; Rice University

AI summary: Proposes SuperThoughts, which compresses pairs of consecutive CoT tokens into single latent representations and decodes them with a multi-token prediction module. It keeps discrete token supervision during training while doubling inference throughput, cutting CoT length by roughly 20-30% with minimal accuracy loss.

Abstract

Long Chain-of-Thought (CoT) reasoning improves LLM problem-solving but is computationally expensive due to sequential token generation. While recent works explore reasoning in continuous latent spaces to bypass discrete token generation, they often struggle with training stability and fail to scale to complex, long-horizon tasks due to lack of supervision signal. We propose SuperThoughts, which compresses pairs of consecutive CoT tokens into single latent representations and decodes two tokens per step via a lightweight Multi-Token Prediction (MTP) module. This preserves discrete token supervision at training time while doubling throughput at inference time. We fine-tune Qwen2.5-Math-1.5B-Instruct, Qwen2.5-Math-7B-Instruct, and Qwen2.5-Math-14B-Instruct, and evaluate on MATH500, AMC, OlympiadBench, and GPQA-Diamond. With a confidence-based adaptive mechanism that falls back to standard decoding when uncertain, SuperThoughts achieves a roughly 20-30% CoT length reduction while maintaining accuracy with minimal degradation (a drop of 1-2 accuracy points on most tasks).

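2606.13901 2026-06-15 cs.LG cs.NE New submission

SpikF-GO: Spiking Fourier Graph Operators for Multivariate Time Series Forecasting

Jafar Bakhshaliyev, Niels Landwehr

Affiliations: Data Science Group, University of Hildesheim

AI summary: Existing SNN forecasting methods lack a mechanism for modeling inter-variable dependencies. SpikF-GO addresses this with a hypervariate graph formulation and spike-driven spectral processing, combining a learnable sparse frequency gate with a Complex LIF gate. Under a unified protocol it achieves the best average rank among SNN methods and lowers energy consumption.

Comments: 23 pages, 2 figures, 11 tables. Accepted for presentation at ECML PKDD 2026. Code: https://github.com/jafarbakhshaliyev/SpikF-GO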

Abstract

Spiking Neural Networks (SNNs) have emerged as an energy-efficient alternative to conventional neural networks, demonstrating strong performance in computer vision and robotics. More recently, SNNs have been applied to time series forecasting (TSF), with methods exploring spiking temporal backbones, spike-compatible positional encodings, Fourier-domain processing, and redesigned neuron dynamics. However, existing SNN forecasting approaches process variables independently, lacking explicit mechanisms for modeling inter-variable dependencies. This is a critical limitation in multivariate settings, where cross-variable correlations carry substantial predictive information. We propose Spiking Fourier Graph Operators (SpikF-GO), which addresses this gap by combining a hypervariate graph formulation in which every scalar observation becomes a graph node with spike-driven spectral processing. SpikF-GO introduces a Hard Concrete frequency gate for learnable sparse frequency selection and a Complex LIF gate that applies independent spiking neurons to real and imaginary Fourier components, preserving binary, event-driven computation throughout the spectral domain. We further present a variant incorporating Central Pattern Generator-based positional encodings for stronger long-range temporal modeling. Evaluated on eight benchmarks under a unified experimental protocol, SpikF-GO achieves the best average rank among all SNN methods and outperforms its ANN counterpart, FourierGNN, at reduced energy cost. SpikF-GO maintains competitive accuracy even at substantially smaller embedding dimensions, thereby achieving significant energy reductions. To our knowledge, this is among the first works to bring graph-based multivariate modeling into the spiking domain for TSF and the first to provide a unified comparison across SNN forecasting architectures under a common experimental protocol.
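
As a rough illustration of sparse frequency selection with a Hard Concrete gate, the sketch below samples a gate per Fourier bin and applies it to the real FFT of a series. It follows the commonly used Hard Concrete formulation (stretched, clipped binary Concrete samples); the stretch limits, temperature, `log_alpha` values, and the function name `hard_concrete_gate` are illustrative assumptions, not settings taken from SpikF-GO, and no spiking neurons are modeled here.

```python
import numpy as np

def hard_concrete_gate(log_alpha, rng, beta=2.0 / 3.0, gamma=-0.1, zeta=1.1):
    """Sample Hard Concrete gates in [0, 1], one per entry of log_alpha.

    beta, gamma, and zeta are assumed hyperparameters (temperature and
    stretch interval); log_alpha would be learned in a real model.
    """
    u = rng.uniform(1e-6, 1.0 - 1e-6, size=log_alpha.shape)
    logits = (np.log(u) - np.log(1.0 - u) + log_alpha) / beta
    s = 1.0 / (1.0 + np.exp(-logits))        # binary Concrete sample in (0, 1)
    stretched = s * (zeta - gamma) + gamma    # stretch to (gamma, zeta)
    return np.clip(stretched, 0.0, 1.0)       # clipping yields exact 0s and 1s

rng = np.random.default_rng(0)
series = rng.normal(size=64)
spectrum = np.fft.rfft(series)                # 33 complex frequency bins

# Hypothetical gate parameters: favor the 4 lowest frequencies, suppress the rest.
log_alpha = np.full(spectrum.shape, -3.0)
log_alpha[:4] = 3.0

gate = hard_concrete_gate(log_alpha, rng)
filtered = np.fft.irfft(spectrum * gate, n=series.size)
```

The clipping step is what makes the selection sparse: bins whose stretched sample falls below zero are switched off exactly rather than merely attenuated.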

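2606.14040 2026-06-15 cs.LG New submission

Decompose Sparsely Where You Should, Absorb Densely Where You Should Not

Ruixuan Deng, Zehao Jin, Zekun Wang, Zihan Dong

Affiliations: Georgia Institute of Technology

AI summary: Sparse autoencoders assume that all activation content can be sparsely decomposed. To address this flaw, the paper adds a low-rank linear bottleneck alongside a standard SAE to absorb the dense component. On Gemma-2-2B layer 12, a rank-24 bottleneck reduces dense latents by up to 84%, and the absorbed component turns out to be a computational scaffold that is structurally identifiable, causally necessary, and redundantly encoded by the sparse dictionary.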

Abstract

Sparse autoencoders (SAEs) are typically trained to reconstruct the entire residual stream through a sparse dictionary, implicitly assuming that all activation content is amenable to sparse, monosemantic decomposition. We question this assumption and hypothesize that activations contain a low-rank, dense component that is computationally important to the model yet inherently unsuitable for sparse representation, which serves as a major source of the persistent dense latents widely observed in trained SAEs. To test this, we add a small rank-r linear bottleneck in parallel with standard SAEs (BatchTopK and Matryoshka), allowing dense structure to be absorbed before sparse reconstruction. On Gemma-2-2B layer 12, a rank-24 bottleneck reduces dense latent count by up to 84% while improving sparse probing and targeted probe perturbation on both architectures at matched sparsity. The absorbed component is (i) structurally identifiable as the top principal components and outlier dimensions; (ii) causally necessary, with removing it raising next-token cross-entropy by 7.5×, far exceeding the 2.8× from removing the geometrically near-identical top-24 PCA directions; and (iii) redundantly encoded by sparse dictionaries, with ablating 787 maximally aligned sparse features raising cross-entropy by only 2.9× and ablating 2,048 topic-aligned features leaving MMLU topic classification virtually unchanged, whereas removing the scaffold drops it from 98.7% to chance. Together, our findings identify a compact, semantically informative and causally important component of residual stream activations (which we term a computational scaffold) that standard sparse dictionaries represent inefficiently, suggesting that the scope of sparsity-based interpretability methods warrants careful re-examination.
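
The architecture described above can be sketched as a forward pass in which a rank-r linear path and a sparse dictionary both read the same activation and their outputs are summed. This is only one reading of "in parallel"; the per-sample TopK (instead of BatchTopK or Matryoshka), the random untrained weights, and all names (`sae_with_dense_bottleneck`, `u_down`, `u_up`) are assumptions made for illustration.

```python
import numpy as np

def sae_with_dense_bottleneck(x, w_enc, w_dec, u_down, u_up, k):
    """Reconstruct x as (rank-r dense path) + (TopK sparse dictionary path)."""
    # Dense path: project to r dimensions and back, so its output has rank <= r.
    dense_part = (x @ u_down) @ u_up

    # Sparse path: ReLU encoder, then keep only the k largest latents per sample.
    pre = np.maximum(x @ w_enc, 0.0)
    top_idx = np.argpartition(pre, -k, axis=1)[:, -k:]
    top_vals = np.take_along_axis(pre, top_idx, axis=1)
    latents = np.zeros_like(pre)
    np.put_along_axis(latents, top_idx, top_vals, axis=1)

    recon = dense_part + latents @ w_dec
    return recon, latents, dense_part

rng = np.random.default_rng(0)
d_model, n_latents, rank, k = 32, 128, 4, 8
x = rng.normal(size=(16, d_model))
w_enc = rng.normal(size=(d_model, n_latents)) / np.sqrt(d_model)
w_dec = rng.normal(size=(n_latents, d_model)) / np.sqrt(n_latents)
u_down = rng.normal(size=(d_model, rank)) / np.sqrt(d_model)
u_up = rng.normal(size=(rank, d_model)) / np.sqrt(rank)

recon, latents, dense_part = sae_with_dense_bottleneck(x, w_enc, w_dec, u_down, u_up, k)
```

In training, the dense path would be free to soak up high-variance directions, leaving the sparse latents to explain what remains; the sketch shows only the data flow, not that training dynamic.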

2606.14079 2026-06-15 cs.LG New submission

Deep Spectral Learning of Embedded Latent Transfer Operators for Stochastic Dynamical Systems

Ryogo Tanaka, Yoshinobu Kawahara

Affiliations: Graduate School of Information Science and Technology, The University of Osaka; Center for Advanced Intelligence Project, RIKEN

AI summary: Proposes the Deep Spectral Encoder, which defines Markovian latent states through learnable nonlinear feature maps and estimates transfer and observation operators using functional canonical correlation analysis and Galerkin projection. This enables Bayesian filtering and Koopman spectral decomposition, with stable and superior performance under noise and partial observability.

Comments: Accepted at the 42nd Conference on Uncertainty in Artificial Intelligence (UAI 2026)

Abstract

We propose a spectral learning method for stochastic nonlinear dynamical systems represented with embedded latent transfer operators in deep feature spaces. We instantiate the method as Deep Spectral Encoder (DSE), an operator-based latent state-space model in which a time-invariant neural encoder implements learnable nonlinear feature maps from observations, and these features define Markovian latent states whose temporal evolution and observation mapping are described by the transfer and observation operators, respectively. Functional canonical correlation analysis in a learnable Galerkin-projected feature space provides state coordinates from past and future observations, and the two linear operators are estimated on the state coordinates as ridge-regularized closed-form solutions that coincide with Galerkin projections of the associated covariance operators. On this representation, we generalize sequential Bayesian filtering and Koopman spectral mode decomposition in feature space. Experiments on several scenarios show stable and superior performance compared with sequential Bayesian filtering and dynamic mode decomposition baselines, even under noise and partial observability.
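
The ridge-regularized closed-form estimate of a linear operator on state coordinates can be illustrated on a toy system. The sketch below is not the DSE pipeline: it skips the neural encoder and the canonical correlation step and assumes the state coordinates are observed directly; `ridge_operator`, the rotation matrix `a_true`, and the noise level are all hypothetical.

```python
import numpy as np

def ridge_operator(z_now, z_next, reg):
    """Closed-form minimizer of ||z_next - z_now @ A||_F^2 + reg * ||A||_F^2.

    A = (Z^T Z + reg * I)^{-1} Z^T Z', i.e. a regularized covariance
    inverse applied to the cross-covariance (row-vector convention).
    """
    d = z_now.shape[1]
    gram = z_now.T @ z_now + reg * np.eye(d)
    return np.linalg.solve(gram, z_now.T @ z_next)

rng = np.random.default_rng(0)
theta = 0.3
# Assumed ground-truth dynamics: a damped rotation driven by Gaussian noise.
a_true = 0.95 * np.array([[np.cos(theta), np.sin(theta)],
                          [-np.sin(theta), np.cos(theta)]])

states = np.zeros((501, 2))
for t in range(500):
    states[t + 1] = states[t] @ a_true + 0.1 * rng.normal(size=2)

a_hat = ridge_operator(states[:-1], states[1:], reg=1e-3)

# Eigenvalues of the estimated operator give the spectral (Koopman-style) modes:
# their modulus is the decay rate and their argument the rotation frequency.
eigenvalues = np.linalg.eigvals(a_hat)
```

Solving the regularized normal equations with `np.linalg.solve` avoids forming an explicit inverse, which is the usual numerically safer choice.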

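2606.14156 2026-06-15 cs.LG cs.AI New submission

Learning High Coverage Discriminative Parsimonious Rulesets

Mariamma Antony, Raman Sankaran, Chiranjib Bhattacharyya, Uma Satya Ranjan

Affiliations: Indian Institute of Science; Compass

AI summary: Proposes CDPR, which learns high-coverage, discriminative, and parsimonious rule sets using submodular maximization algorithms. It markedly improves interpretability while keeping accuracy high, with more than a 2.5-fold improvement in average coverage over the next best algorithm.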

Abstract

Learning systems based on IF-THEN rule representations readily offer interpretability, making them a crucial focus in contemporary AI research. A key objective for such rule sets is to achieve both high discriminative power and interpretability. While existing state-of-the-art algorithms implicitly prioritize predictive accuracy, they often fall short on one or more quality metrics that ensure interpretability, such as coverage and parsimony of rule sets. Motivated by this, this paper proposes CDPR, which aims to create highly accurate and interpretable rule sets for classification problems. To the best of our knowledge, this represents the first attempt to establish such an approach. In this study, we introduce two algorithms rooted in submodular maximization, which not only provide provable guarantees on coverage but also yield rule sets that are both discriminative and parsimonious. We empirically demonstrate that rule sets learned through our approaches achieve higher accuracy and interpretability and show more than a 2.5-fold improvement in average coverage rates compared to the next best algorithm.
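
For intuition about why submodular maximization suits coverage, here is a minimal sketch of the classical greedy algorithm for maximum coverage under a budget on the number of rules. It is not the CDPR algorithm, whose objectives the abstract does not spell out; the rule names and covered example indices are made up for illustration.

```python
def greedy_rule_selection(rule_covers, budget):
    """Pick up to `budget` rules, each time adding the rule that covers the
    most not-yet-covered examples (set coverage is monotone submodular)."""
    selected, covered = [], set()
    for _ in range(budget):
        best_rule, best_gain = None, 0
        for name, cover in rule_covers.items():
            if name in selected:
                continue
            gain = len(cover - covered)   # marginal coverage gain
            if gain > best_gain:
                best_rule, best_gain = name, gain
        if best_rule is None:             # no remaining rule adds coverage
            break
        selected.append(best_rule)
        covered |= rule_covers[best_rule]
    return selected, covered

# Hypothetical candidate rules mapped to the indices of examples they cover.
rule_covers = {
    "r1": {0, 1, 2, 3},
    "r2": {3, 4, 5},
    "r3": {0, 1},
    "r4": {6},
}
selected, covered = greedy_rule_selection(rule_covers, budget=2)
# selected == ["r1", "r2"]; covered == {0, 1, 2, 3, 4, 5}
```

For a monotone submodular objective under a cardinality constraint, this greedy choice is known to reach at least a (1 - 1/e) fraction of the optimal value, which is the kind of provable guarantee such formulations make available. A small budget also keeps the rule set parsimonious.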

2606.14195 2026-06-15 cs.LG math.OC New submission

Structured Noise Adaptation for Sequential Bayesian Filtering with Embedded Latent Transfer Operators

Naichang Ke, Pongpisit Thanasutives, Yoshinobu Kawahara

Affiliations: The University of Osaka; RIKEN Center for Advanced Intelligence Project (AIP)

AI summary: The noise model of ELTO-based Kalman filters cannot adapt to non-stationary processes. The paper proposes a structured noise parameterization that combines learning an optimal time-invariant noise model with dynamic parameter adaptation, improving state estimation in time-varying noise environments.

Comments: Accepted by TMLR

Abstract

Kalman filters based on the Embedded Latent Transfer Operators (ELTO) emerge as novel statistical tools for sequential state estimation. However, a critical limitation stems from their use of simplified noise models, which fail to dynamically adapt to non-stationary processes. To address this limitation, we introduce an ELTO-based Bayesian filtering approach with a new structured parameterization for the filter's noise model. This parameterization enables structured noise adaptation, which couples the data-driven learning of an optimal time-invariant noise model with dynamic parameter adaptation that responds to changes in dynamics within non-stationary processes. Empirical results show that our structured noise adaptation improves the filter's dynamic state estimation performance in noisy, time-varying environments.

2606.14283 2026-06-15 cs.LG cs.AI New submission

DIFF-ERO: A Conformance-Aware Loss for Deep Learning in Process Mining

Johannes De Smedt, Jari Peeperkorn, Artem Polyvyanyy, Jochen De Weerdt

Affiliations: KU Leuven; The University of Melbourne; Information Systems Engineering Research Group (LIRIS), KU Leuven

AI summary: Proposes DIFF-ERO, a differentiable stochastic conformance loss that builds batch-level stochastic transition matrices with soft edge memberships, injecting control-flow information during training and improving the structural predictive performance of deep learning models on process data.

Comments: Accepted at the 24th International Conference on Business Process Management

Abstract

Deep learning has driven many recent advances in process analytics, especially for predictive and prescriptive monitoring. However, standard objectives such as cross-entropy optimize local next-step likelihoods and only implicitly capture control-flow structure. As a result, models can achieve high token-level accuracy while permitting imprecise global behaviour. We introduce DIFF-ERO, a conformance-aware loss function for deep learning models on process data. DIFF-ERO is a differentiable formulation of entropy-based stochastic conformance that incorporates control-flow information during training. Our approach constructs batch-level stochastic transition matrices with soft edge memberships, allowing structural precision and recall signals to directly inform backpropagation. The loss is model-agnostic and can be applied whenever the final representation parametrizes stochastic transitions. We instantiate DIFF-ERO in transformer encoder-decoder pipelines for next-activity prediction and use it jointly with cross-entropy to analyse its theoretical components with respect to convergence. Across benchmarks comparing other loss functions and targets, DIFF-ERO shows improved predictive performance where structure matters most while maintaining parity elsewhere. At the same time, the learned stochastic automaton converges towards the structural ground truth, indicating that the network internalizes process model structure.

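2606.14284 2026-06-15 cs.LG cs.AI New submission

Hierarchical ODE: Learning Continuous-Time Physical Prototypes for Early Link Failure Detection

Jiaen Lv, Leran Qi, Shaowei Wang

Affiliations: University of Science and Technology of China

AI summary: Proposes a hierarchical ODE clustering network that uses neural ODEs to model continuous latent state evolution, decouples stochastic noise from dynamic trends, and adaptively determines the number of prototypes. It effectively extracts physical prototypes for early link failure detection on irregularly sampled time series.

Comments: International Conference on Machine Learning 2026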

Abstract

Time series prototype learning is fundamentally challenged by observational ambiguity. Discrete architectures fail to resolve this, as they lack the capacity to decouple stochastic noise from continuous dynamics. Furthermore, rigid closed-set assumptions fail to capture unseen diversity. To address these limitations, we propose a hierarchical ordinary differential equation clustering network, which uses a neural ordinary differential equation to model latent state evolution as a continuous integral curve. This formulation enforces temporal continuity to effectively disentangle smooth feature trends from stochastic noise, while our adaptive hierarchical mechanism autonomously determines the appropriate number of prototypes without rigid prior constraints. Validated on the early link failure detection task with irregularly sampled time series, the proposed method effectively extracts underlying physical prototypes, thereby enabling robust failure detection. Our code is available at https://github.com/NJ-LNN/Hierarchical-ODE.

2606.14368 2026-06-15 cs.LG cs.CL New submission

Be My Tutor: On-Policy Co-Distillation for Mutual LLM Improvement via Peer Feedback

Woohyeon Byeon, Jiwon Jeon, Jeonghye Kim, Youngchul Sung

Affiliations: KAIST

AI summary: Proposes On-Policy Co-Distillation (OPCoD), which uses cognizance-based gating and feedback anchoring so that two models, each strong in a different domain, can teach each other and achieve Pareto improvement.

Abstract

We study multi-domain LLM training in which two models, each stronger in a different domain, co-evolve by tutoring each other through on-policy feedback. Unlike one-way distillation or single-model fine-tuning, our goal is mutual Pareto improvement: each model improves across domains without losing its original strength. To this end, we propose On-Policy Co-Distillation (OPCoD), where each student's self-distillation is conditioned on its own correct rollout and feedback from its peer. To make feedback exchange effective, OPCoD uses cognizance-based gating to decide when to give feedback and feedback anchoring to ground feedback in the problem. On Science Q&A tasks, OPCoD consistently outperforms baselines and achieves Pareto improvement across all evaluated domain pairs and students.

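2606.14388 2026-06-15 cs.LG New submission

A Low-Rank Subspace Analysis of LLM Interventions

Angira Sharma, Christian Schroeder de Witt, Philip Torr, Anisoara Calinescu, Jialin Yu

Affiliations: University of Cambridge

AI summary: Proposes a diagnostic framework that models behaviors as low-rank subspaces in activation space. Intervening on one behavior affects other behaviors asymmetrically, and the size of the effect is related to subspace overlap and to the angle with the decision subspace.

Comments: Mechanistic Interpretability Workshop @ ICML 2026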

Abstract

Interventions designed to modify a particular behavior in LLMs, such as refusal or sycophancy, often produce unintended changes in other behaviors. This lack of targeted control makes it difficult to design and implement reliable safety controls. To understand these side-effects, we introduce a diagnostic framework for analyzing interacting behaviors in LLMs. We model behaviors as low-rank subspaces in activation space, and study how interventions propagate across behaviors. Across multiple instruction-tuned models (7B-70B) and across refusal, jailbreak, and sycophancy settings, we find that different behaviors share internal representations, and intervening on one behavior alters others in asymmetric ways. Some behaviors act as upstream control points whose interventions propagate broadly across other behaviors, while others remain more isolated. We relate these effects to two geometric quantities: (i) the overlap between behavior subspaces, measured as the average squared cosine of principal angles, and (ii) the angle between each behavior subspace and the decision subspace (capturing the model's final decision, e.g., refuse vs. comply). Empirically, intervention effects on other behaviors tend to be larger for behavior pairs with higher subspace overlap, and for source behaviors whose subspaces lie closer (smaller angle) to the decision subspace. These findings highlight a challenge for targeted behavior control: behaviors are difficult to modify independently, as interventions can propagate through shared representations and asymmetric interactions.
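
The overlap measure named in the abstract, the average squared cosine of the principal angles, can be computed with a QR factorization and an SVD. The sketch below assumes each behavior subspace is given by a matrix whose columns span it (full column rank); `subspace_overlap` is a hypothetical helper name, and the toy bases are coordinate axes rather than real activation directions.

```python
import numpy as np

def subspace_overlap(basis_a, basis_b):
    """Average squared cosine of the principal angles between two subspaces.

    basis_a, basis_b: arrays of shape (d, r_a) and (d, r_b) whose columns
    span the subspaces. Returns a value in [0, 1].
    """
    q_a, _ = np.linalg.qr(basis_a)   # orthonormal basis of subspace A
    q_b, _ = np.linalg.qr(basis_b)   # orthonormal basis of subspace B
    # Singular values of Q_a^T Q_b are the cosines of the principal angles.
    cosines = np.linalg.svd(q_a.T @ q_b, compute_uv=False)
    cosines = np.clip(cosines, 0.0, 1.0)   # guard against rounding above 1
    return float(np.mean(cosines ** 2))

axes = np.eye(4)
same = subspace_overlap(axes[:, :2], axes[:, :2])         # identical subspaces
disjoint = subspace_overlap(axes[:, :2], axes[:, 2:])     # orthogonal subspaces
partial = subspace_overlap(axes[:, :2], axes[:, [0, 2]])  # one shared direction
```

Identical subspaces give an overlap of 1, orthogonal ones give 0, and two planes sharing a single direction give 0.5, which makes the quantity easy to interpret as a fraction of shared dimensions.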

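2606.14463 2026-06-15 cs.LG New submission

EM-NeSy: Expectation Maximization for Neurosymbolic Learning

Annegret Seibt, Luc De Raedt, Giuseppe Marra

Affiliations: Department of Computer Science; KU Leuven

AI summary: Proposes the EM-NeSy framework, which treats probabilistic neurosymbolic learning as an instance of the expectation-maximization algorithm: symbol posteriors are computed by probabilistic inference, and gradient updates pass only through the neural component, giving scalable and efficient approximate inference.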

Abstract

Neurosymbolic (NeSy) models integrate neural networks and symbolic reasoning for robust and interpretable AI. State-of-the-art NeSy models require that the symbolic component is expressed in a differentiable way, often complicating the use of approximate inference. We propose EM-NeSy which casts probabilistic NeSy learning as an instance of the Expectation-Maximization (EM) algorithm. In the expectation step, we compute the posterior over the neurally predicted symbols conditioned on the label via probabilistic inference. In the maximization step, we update the neural parameters based on this posterior using gradient descent only through the neural component. This formulation unlocks the full potential of the EM algorithm for NeSy learning. It allows NeSy to extend naturally to approximate reasoning without any additional modifications or differentiability requirements of the symbolic component. Furthermore, it recovers the standard end-to-end gradient-based NeSy setting under exact inference. Our experimental results demonstrate the scalability and computational efficiency of EM-NeSy.
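
The two steps can be sketched on a toy task that the abstract does not mention and that is assumed here purely for illustration: two networks each predict a digit in {0, 1, 2}, and only the sum of the digits is labeled. The E-step computes the posterior over digit pairs given the label by exact enumeration (standing in for whatever probabilistic inference is used), and the M-step objective is a cross-entropy against that posterior, which is held fixed so gradients would flow only through the neural outputs.

```python
import numpy as np

def e_step(p_a, p_b, label_sum):
    """Posterior over symbol pairs (a, b) given the label a + b = label_sum."""
    joint = np.outer(p_a, p_b)                   # neural prior over pairs
    a_idx, b_idx = np.indices(joint.shape)
    consistent = (a_idx + b_idx) == label_sum    # symbolic constraint
    posterior = joint * consistent
    return posterior / posterior.sum()

def m_step_loss(p_a, p_b, posterior):
    """Negative expected log-likelihood of the symbols under the posterior.

    The posterior is treated as a constant target, so only p_a and p_b
    (the neural outputs) would receive gradients in a real training loop.
    """
    target_a = posterior.sum(axis=1)             # marginal posterior for a
    target_b = posterior.sum(axis=0)             # marginal posterior for b
    return -(target_a @ np.log(p_a) + target_b @ np.log(p_b))

# Hypothetical softmax outputs of the two networks.
p_a = np.array([0.7, 0.2, 0.1])
p_b = np.array([0.1, 0.3, 0.6])

posterior = e_step(p_a, p_b, label_sum=2)
loss = m_step_loss(p_a, p_b, posterior)
```

Because the symbolic part only appears inside `e_step`, it never has to be differentiable; swapping the enumeration for a sampler or another approximate method would leave `m_step_loss` unchanged.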

2606.14530 2026-06-15 cs.LG New submission

Code Correctness Signals in LLM Hidden States: Pre-Generation Probing and Repair Geometry

Carlo Di Cicco

Affiliations: Independent researcher

AI summary: Using residualization, the paper finds that code correctness is linearly decodable from the pre-generation hidden states of Qwen3-4B-Instruct (AUC 0.931), whereas the directional signal for repair success vanishes once repair-context covariates are controlled for, yielding one positive and one negative result.

Comments: 12 pages, 8 tables. Code, data, and analysis scripts available at https://github.com/CarloDiCicco/ReasoningLab

Abstract

Large language models encode rich information in their hidden states. This work asks whether code correctness is legible in the hidden states of Qwen3-4B-Instruct-2507, before it generates and as it repairs a failed attempt, studied on 444 LiveCodeBench tasks. It reports two findings connected by a single confound-control tool: residualization. First, the correctness of the model's first-attempt code is linearly decodable from the prompt-final hidden state, with a leakage-free held-out AUC of 0.931 +/- 0.008 across 50 outer splits. After the linear effect of prompt length is removed from each hidden state dimension, the probe still reaches 0.911 +/- 0.010, well above a prompt-length baseline of 0.754 +/- 0.014. Second, on 236 cleaned cases where the model attempts to repair a failed first attempt, the hidden state shift from the failing attempt to its repair carries a statistically detectable contrastive direction, significant on both a magnitude and a split-half test against label-shuffled nulls. This direction does not survive a conditional residualization against repair-context covariates that differ between successful and failed repairs, marking it as a correlate of repair success driven by the repair context rather than an isolated repair-comprehension feature. The probe layer is selected by nested cross-validation, and the same residualization approach that upholds the pre-generation correctness result overturns the repair-direction interpretation. The contribution is as much methodological as empirical: a diagnostic honest enough to report a negative result alongside a positive one.
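
Residualization against a single covariate, as in removing the linear effect of prompt length from each hidden state dimension, amounts to an ordinary least-squares fit per dimension followed by subtraction of the fitted values. The sketch below uses synthetic data; the array shapes, the injected length effect, and the function name `residualize` are assumptions, and fitting an intercept (which also removes each dimension's mean) is a design choice not stated in the abstract.

```python
import numpy as np

def residualize(features, covariate):
    """Remove the linear effect of `covariate` from every column of `features`.

    Fits features[:, j] ~ intercept + slope * covariate by least squares
    and returns the residuals.
    """
    design = np.column_stack([np.ones(len(covariate)), covariate])
    coef, *_ = np.linalg.lstsq(design, features, rcond=None)
    return features - design @ coef

rng = np.random.default_rng(0)
n_prompts, d_hidden = 200, 16
prompt_len = rng.integers(50, 500, size=n_prompts).astype(float)

# Synthetic hidden states with a length-dependent component in each dimension.
length_direction = rng.normal(size=(1, d_hidden))
hidden = rng.normal(size=(n_prompts, d_hidden)) + 0.01 * prompt_len[:, None] * length_direction

resid = residualize(hidden, prompt_len)

corr_before = [abs(np.corrcoef(prompt_len, hidden[:, j])[0, 1]) for j in range(d_hidden)]
corr_after = [abs(np.corrcoef(prompt_len, resid[:, j])[0, 1]) for j in range(d_hidden)]
```

A probe trained on `resid` can no longer exploit anything linearly predictable from prompt length; in a cross-validated setup the regression coefficients would normally be fit on the training split only to avoid leakage.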

2606.14597 2026-06-15 cs.LG 新提交

Zero-shot generalization of transformer neural operators to larger domains

Transformer神经算子向更大空间域的零样本泛化

Armand de Villeroché, Sibo Cheng, Vincent Le Guen, Marc Bocquet, Rem-Sophia Mouradi, Patrick Armand, Alban Farchi, Patrick Massin

发表机构 * CEREA, ENPC, EDF R&D, Institut Polytechnique de Paris(CEREA, ENPC, EDF研发部, 巴黎综合理工学院) SINCLAIR AI Laboratory(SINCLAIR人工智能实验室) EDF R&D(EDF研发部) CEA, DAM, DIF(法国原子能委员会, 军事应用局, 法兰西岛)

AI总结 提出一种在注意力logits计算中引入可分解局部性偏置的方法,结合旋转位置嵌入,使Transformer神经算子能零样本泛化到更大的空间域,并在两个PDE基准和一个3D工业大气流动应用中验证了有效性。

详情
AI中文摘要

基于Transformer的神经算子在逼近复杂几何上偏微分方程的解算子方面表现出色。然而,现有方法隐式假设固定域大小,限制了其推理时的泛化能力。在这项工作中,我们研究了域扩展,即在空间域显著大于训练时遇到的域上进行零样本推理。我们认为这种设置从根本上需要空间局部性和平移等变性。我们提出通过在注意力logits的计算中引入可分解偏置来实现这种局部性,从而在保持完全可分解为查询-键内积的同时实现精细可控的局部性,并直接与优化的注意力内核兼容。结合旋转位置嵌入,它能够在不改变Transformer架构的情况下,实现具有可控空间支持的表达性嵌入。我们通过实验表明,我们的方法在两个PDE基准测试和一个3D工业大气流动应用中显著改善了向更大域的零样本泛化。我们的代码和数据集可在 https://github.com/cerea-daml/domain-extension 获取。

英文摘要

Transformer-based neural operators have shown remarkable performance for approximating solution operators of partial differential equations on complex geometries. However, existing approaches implicitly assume a fixed domain size, which limits their ability to generalize at inference. In this work, we investigate domain extension, namely zero-shot inference on spatial domains that are significantly larger than those encountered during training. We argue that this setting fundamentally requires spatial locality and translation equivariance. We propose to implement this locality via a decomposable bias in the attention logits computation, enabling finely controllable locality while remaining fully decomposable into query-key inner products and directly compatible with optimized attention kernels. Combined with rotary positional embeddings, it enables expressive embeddings with controllable spatial support without altering the transformer architecture. We empirically show that our approach substantially improves zero-shot generalization to larger domains across two PDE benchmarks and a 3D industrial atmospheric flow application. Our code and datasets are available at https://github.com/cerea-daml/domain-extension.

2606.14620 2026-06-15 cs.LG 新提交

Neither Parallel Nor Sequential: How DiffusionGemma Actually Commits Tokens

既非并行也非顺序:DiffusionGemma 实际如何提交令牌

Ali Asaria, Tony Salomone, Deep Gandhi

发表机构 * Transformer Lab

AI总结 通过钩取DiffusionGemma 26B的采样器接受步骤,测量其解码顺序,发现解码既非并行也非块自回归,而是呈现部分从左到右的提交偏差,且块大小是测量尺度的伪影而非架构特性。

详情
AI中文摘要

开放扩散语言模型被宣传为并行的非自回归解码器,但实际检查点提交令牌的顺序几乎从未被测量。我们对DiffusionGemma 26B(基于Gemma 4构建的掩码离散扩散混合专家模型)进行检测,钩取其采样器的接受步骤,记录哪些画布位置在何时以何种置信度提交。通过686个提示、六种机制的探测套件,我们发现其解码既非并行也非块自回归:它遵循部分从左到右的提交偏差,其表观强度几乎完全取决于分析的粒度。令牌级别的顺序较弱,随着分析粗化而平滑增强,因此模型的“块大小”实际上是测量尺度的伪影而非架构特性。模型以大的同步批次提交令牌,批次内的顺序大部分是真正未定义的,而不仅仅是未被观测到。该行为依赖于机制:结构化JSON以基本任意的顺序提交,位置的提交置信度在数学推理中跟踪正确性,但在事实回忆中无信号。提交是激进的,在步骤预算内以短暂的后期爆发完成,而任务准确率与其自回归的Gemma-4兄弟模型相匹配。除了这些发现,我们的核心贡献是方法论上的:诚实地测量解码顺序需要处理尾部EOS填充、机制内混杂、提交非单调性、块大小敏感性以及大批次提交的平局,否则每个因素都可能制造出实际上不存在的解码顺序结果。

英文摘要

Open diffusion language models are marketed as parallel, non-autoregressive decoders, yet the order in which a shipped checkpoint actually commits its tokens is almost never measured. We instrument DiffusionGemma 26B, a masked discrete-diffusion mixture-of-experts model built on Gemma 4, hooking its sampler's accept step to record which canvas positions commit, when, and at what confidence. Across a 686-prompt, six-regime probe suite we find that its decoding is neither parallel nor block-autoregressive: it follows a partial left-to-right commit bias whose apparent strength depends almost entirely on the granularity at which you look. Order is weak token by token and strengthens smoothly as the analysis is coarsened, so the model's "block size" turns out to be an artifact of the measuring ruler rather than the architecture. The model commits in large simultaneous batches, leaving much of the within-batch order genuinely undefined rather than merely unobserved. The behaviour is regime-dependent: structured JSON is committed in essentially arbitrary order, and a position's commit confidence tracks correctness on mathematical reasoning but carries no signal on factual recall. Commitment is aggressive, finishing in a short late burst well inside the step budget, while task accuracy matches the model's autoregressive Gemma-4 sibling. Beyond these findings, our central contribution is methodological: measuring decoding order honestly demands handling trailing-EOS padding, within-regime confounding, commit non-monotonicity, block-size sensitivity, and large commit-batch ties, each of which can otherwise manufacture a decoding-order result that is not really there.

2606.14673 2026-06-15 cs.LG 新提交

Compressed Computation is (probably) not Computation in Superposition

压缩计算(可能)不是叠加计算

Jai Bhagat, Sara Molas-Medina, Giorgi Giglemiani, Stefan Heimersheim

发表机构 * Metamorphic Independent(独立研究者) UK AI Security Institute(英国人工智能安全研究所) Apollo Research

AI总结 通过分析压缩计算(CC)模型,发现其性能提升源于标签中的混合矩阵,而非真正的叠加计算,SNMF基线可复现其损失特征。

Comments Presented at the Mechanistic Interpretability Workshop at NeurIPS 2025

详情
AI中文摘要

我们研究压缩计算(CC)玩具模型(Braun等人,2025)是否是叠加计算的一个实例。CC模型似乎仅用50个神经元就能计算100个ReLU函数,其损失优于仅表示50个ReLU函数的预期。我们表明,该模型通过其带噪的残差流混合输入,对应于标签中一个非预期的混合矩阵。将训练目标分解为ReLU项和混合项,我们发现性能增益随混合矩阵的幅度缩放,并在移除该矩阵时消失。学习到的神经元方向集中在与混合矩阵前50个特征值相关的子空间中,表明混合项主导了解决方案。最后,仅从混合矩阵导出的半非负矩阵分解(SNMF)基线重现了定性损失曲线,并改进了先前的基线,尽管它未能匹配训练后的模型。这些结果表明CC不是叠加计算的一个合适玩具模型。

英文摘要

We study whether the Compressed Computation (CC) toy model (Braun et al., 2025) is an instance of computation in superposition. The CC model appears to compute 100 ReLU functions with just 50 neurons, achieving a better loss than expected from only representing 50 ReLU functions. We show that the model mixes inputs via its noisy residual stream, corresponding to an unintended mixing matrix in the labels. Splitting the training objective into the ReLU term and the mixing term, we find that performance gains scale with the magnitude of the mixing matrix and vanish when the matrix is removed. The learned neuron directions concentrate in the subspace associated with the top 50 eigenvalues of the mixing matrix, suggesting that the mixing term governs the solution. Finally, a semi-non-negative matrix factorization (SNMF) baseline derived solely from the mixing matrix reproduces the qualitative loss profile and improves on prior baselines, though it does not match the trained model. These results suggest CC is not a suitable toy model of computation in superposition.

2606.13589 2026-06-15 cs.LG cs.AI 新提交

Simplex-Constrained Sparse Bagging: Transitioning from Uniform Priors to Sparse Posteriors in Ensemble Learning

单纯形约束的稀疏装袋:集成学习中从均匀先验到稀疏后验的转变

Meher Sai Preetam, Meher Bhaskar

发表机构 * Georgia Institute of Technology(佐治亚理工学院)

AI总结 提出SCSB框架,通过最小化袋外损失在概率单纯形上联合优化集成剪枝与校准,引入凹二次惩罚解决L1单纯形悖论,实现高达96%的压缩并提升校准性能。

Comments 6 pages, 3 tables

详情
AI中文摘要

我们提出单纯形约束的稀疏装袋(SCSB),一个用于基于自助法的装袋集成后训练压缩和概率校准的数学严格框架。标准装袋集成(如随机森林、装袋SVM和装袋神经网络)赋予所有组成估计器均匀的投票权。然而,这种朴素的均匀先验忽略了基估计器不同的局部能力,并导致模型过度自信。我们将集成剪枝和校准表述为在概率单纯形上的联合优化问题,通过最小化袋外(OOB)损失。为了诱导稀疏性,我们通过引入凹二次惩罚来解决理论上的“L1单纯形悖论”——即L1范数在单纯形上为常数且无法剪枝的数学现实。SCSB是模型无关的,实现了高达96%的集成压缩,带来线性推理加速和优越的概率校准(降低期望校准误差),同时保持或提升泛化精度。

英文摘要

We present Simplex-Constrained Sparse Bagging (SCSB), a mathematically rigorous framework for post-training compression and probability calibration of bootstrap-based bagging ensembles. Standard bagging ensembles (such as Random Forests, Bagged SVMs, and Bagged Neural Networks) assign uniform voting power to all constituent estimators. However, this naive uniform prior ignores the varying local competence of base estimators and contributes to model overconfidence. We formulate ensemble pruning and calibration as a joint optimization problem over the probability simplex by minimizing the Out-Of-Bag (OOB) loss. To induce sparsity, we address the theoretical "L1-simplex paradox" -- the mathematical reality that the L1 norm is constant on the simplex and fails to prune -- by introducing a concave quadratic penalty. SCSB is model-agnostic and achieves up to 96% ensemble compression, yielding linear inference speedups and superior probability calibration (lowered Expected Calibration Error) while preserving or enhancing generalization accuracy.
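The "L1-simplex paradox" can be checked numerically. The sketch below compares the L1 norm with one concave quadratic penalty, `1 - sum(w_i^2)`, on three weight vectors that all lie on the probability simplex. The specific penalty form is an assumption chosen for illustration; the abstract does not state the exact penalty SCSB uses.

```python
def l1_norm(weights):
    """L1 norm; on the probability simplex this is always 1."""
    return sum(abs(w) for w in weights)

def concave_penalty(weights):
    """Hypothetical concave quadratic penalty: 1 - ||w||_2^2.

    It is largest at the uniform vector and zero at a simplex vertex,
    so minimizing it pushes mass onto fewer estimators.
    """
    return 1.0 - sum(w * w for w in weights)

uniform = [0.25, 0.25, 0.25, 0.25]   # standard bagging: equal votes
sparse = [0.7, 0.3, 0.0, 0.0]        # two estimators pruned
vertex = [1.0, 0.0, 0.0, 0.0]        # a single surviving estimator

for name, w in [("uniform", uniform), ("sparse", sparse), ("vertex", vertex)]:
    print(name, round(l1_norm(w), 6), round(concave_penalty(w), 6))
```

The L1 column is identical for all three vectors, so it cannot rank them, while the concave term decreases as the weights become sparser.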

2606.13982 2026-06-15 stat.ML cs.LG 交叉投稿

Adaptive Nucleus Truncation for Long-Form Reasoning

自适应核截断用于长形式推理

Ousmane Amadou Dia

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 提出自适应核截断采样(ANTS),通过熵条件控制器动态调整截断宽度,在长文本生成中提升推理性能,在33B参数稀疏MoE模型上平均提升1.9-5.2分。

详情
AI中文摘要

采样在长形式语言模型推理中扮演重要角色。在数千个解码步骤中,候选token集合的微小变化可能累积成不同的推理轨迹、稳定性配置和最终答案。现有的截断方法如top-$p$、min-$p$和固定top-$n\sigma$采样改进了无限制采样,但它们依赖固定阈值,无法适应熵、任务难度、训练阶段或生成预算的变化。我们引入自适应核截断采样(ANTS),将top-$n\sigma$采样从固定解码规则扩展为长形式生成的自适应展开控制机制。ANTS在温度缩放前选择最大logit周围的标准邻域,使用熵条件控制器自适应调整截断宽度,并保留一个无截断回退臂以在截断不安全时稳定训练。在33B总参数/4B活跃参数的稀疏混合专家推理模型上,ANTS在8K、16K和32K生成预算下分别比基于百分比的基准平均提升1.9、3.8和5.2分。最大提升出现在指令遵循和数学推理上,其中IFBench在32K时提升超过10分,AIME 2025提升7分。代码生成揭示了重要的预算交互:在Codeforces上,ANTS在8K时落后于基线,但在16K和32K时逆转差距并显著提升ELO。这些结果表明,采样器设计不应仅被视为解码超参数,而应作为我们稳定和扩展长预算推理的一部分。

英文摘要

Sampling plays an important role in long-form language-model reasoning. Over thousands of decoding steps, small changes in the candidate token set can compound into different reasoning trajectories, stability profiles, and final answers. Existing truncation methods such as top-$p$, min-$p$, and fixed top-$n\sigma$ sampling improve over unrestricted sampling, but they rely on fixed thresholds that cannot adapt to changes in entropy, task difficulty, training stage, or generation budget. We introduce Adaptive Nucleus Truncation Sampling (ANTS), which extends top-$n\sigma$ sampling from a fixed decoding rule into an adaptive rollout-control mechanism for long-form generation. ANTS selects standardized neighborhoods around the maximum logit before temperature scaling, adapts the truncation width using an entropy-conditioned controller, and retains a no-truncation fallback arm to stabilize training when truncation becomes unsafe. On a 33B-total / 4B-active sparse Mixture-of-Experts reasoning model, ANTS improves average performance over percentage-based benchmarks by +1.9, +3.8, and +5.2 points at 8K, 16K, and 32K generation budgets, respectively. The strongest gains appear on instruction following and mathematical reasoning, with IFBench improving by more than 10 points at 32K and AIME 2025 improving by 7 points. Code generation reveals an important budget interaction. On Codeforces, ANTS trails the baseline at 8K, but reverses this gap and substantially improves ELO at 16K and 32K. These results suggest that sampler design should be treated not just as a decoding hyperparameter, but as part of how we stabilize and scale long-budget reasoning.
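For orientation, here is a minimal sketch of the fixed top-$n\sigma$ rule that ANTS builds on, as that rule is commonly described: keep tokens whose raw logit lies within $n$ standard deviations of the maximum logit, then apply temperature and renormalize. The entropy-conditioned controller and the fallback arm are not specified in the abstract and are not implemented here; function names are hypothetical.

```python
import math
import statistics

def top_n_sigma_mask(logits, n=1.0):
    """Keep tokens whose raw logit is within n standard deviations of the max."""
    threshold = max(logits) - n * statistics.pstdev(logits)
    return [x >= threshold for x in logits]

def truncated_probs(logits, n=1.0, temperature=1.0):
    """Softmax over the kept tokens only; the mask is computed before temperature."""
    mask = top_n_sigma_mask(logits, n)
    scaled = [x / temperature for x in logits]
    peak = max(s for s, keep in zip(scaled, mask) if keep)  # for numerical stability
    weights = [math.exp(s - peak) if keep else 0.0 for s, keep in zip(scaled, mask)]
    total = sum(weights)
    return [w / total for w in weights]

example_logits = [5.0, 4.5, 1.0, 0.0, -2.0]
example_probs = truncated_probs(example_logits, n=1.0, temperature=0.7)
```

Because the mask depends only on the raw logits, changing `temperature` reshapes the distribution over the kept tokens without changing which tokens are kept.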

2606.14120 2026-06-15 eess.SP cs.AI cs.LG cs.SD eess.AS 交叉投稿

FAConformer: Frequency-Aware Convolutional Transformer for Auditory Attention Decoding

FAConformer:用于听觉注意解码的频率感知卷积Transformer

Ziwei Wang, Xingyi He, Tianwang Jia, Hongbin Wang, Dongrui Wu

发表机构 * Hubei Key Laboratory of Brain-inspired Intelligent Systems, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology(湖北脑启发智能系统重点实验室,人工智能与自动化学院,华中科技大学)

AI总结 提出FAConformer框架,通过频带特定编码和自适应跨频带交互,有效利用脑电图频域信息进行听觉注意解码,在公开数据集上超越现有最佳模型4.9%。

Comments 15 pages, 7 figures

详情
AI中文摘要

听觉注意解码(AAD)旨在从多说话人声学环境中的神经反应推断被注意的说话人,是神经导向听力系统的关键问题。尽管最近的研究取得了令人鼓舞的进展,但现有的AAD模型仍未充分利用频域脑电图(EEG)信息。特别是,大多数方法通过手工特征提取或直接跨频带特征拼接引入多频带信息,这主要是在浅层利用频率信息,可能忽略频带特定模式和跨频带交互。为了解决这些局限性,本文提出了FAConformer,一种用于AAD的频率感知CNN-Transformer框架,它明确集成了频带特定编码和自适应跨频带交互。具体来说,FAConformer首先将EEG信号分解为多个频带,并为每个频带分配一个独立的CNN-Transformer编码器进行频带特定建模。然后,通过精心设计的频率感知注意(FAA)模块自适应地融合得到的频带特征,该模块通过将频带特征视为令牌来建模跨频带依赖关系。此外,引入了频带辅助监督(BAS)以防止在联合训练中贡献较弱的分支优化不足。通过这种方式,FAConformer执行频率感知建模,更有效地利用频域信息。在两个公开AAD数据集上使用三种决策窗口长度进行的广泛实验表明,FAConformer始终优于12个竞争基线,比当前最先进模型高出4.9%。对频带重要性、消融和参数敏感性的进一步分析验证了所提出框架的有效性、鲁棒性和可解释性。代码可在 https://github.com/wzwvv/FAConformer 获取。

英文摘要

Auditory attention decoding (AAD) aims to infer the attended speaker from neural responses in multi-speaker acoustic environments and is a key problem for neuro-steered hearing systems. Although recent studies have achieved encouraging progress, existing AAD models still do not fully exploit frequency domain electroencephalography (EEG) information. In particular, most approaches introduce multi-band information through handcrafted feature extraction or direct cross-band feature concatenation, which mainly exploit frequency information at a shallow level and may overlook band-specific patterns and cross-band interactions. To address these limitations, this paper proposes FAConformer, a frequency-aware CNN-Transformer framework for AAD that explicitly integrates band-specific encoding and adaptive cross-band interaction. Specifically, FAConformer first decomposes EEG signals into multiple frequency bands and assigns each band to an independent CNN-Transformer encoder for band-specific modeling. The resulting band-wise features are then adaptively fused by a carefully designed frequency-aware attention (FAA) module that models cross-band dependencies by treating band-wise features as tokens. Further, band-wise auxiliary supervision (BAS) is introduced to prevent weakly contributing branches from being under-optimized during joint training. In this way, FAConformer performs frequency-aware modeling that more effectively exploits frequency domain information. Extensive experiments on two public AAD datasets with three decision-window lengths demonstrated that FAConformer consistently outperformed 12 competitive baselines, surpassing the current state-of-the-art model by 4.9%. Further analyses of band importance, ablation, and parameter sensitivity verify the effectiveness, robustness, and interpretability of the proposed framework. Code is available at https://github.com/wzwvv/FAConformer.

2606.14470 2026-06-15 cs.AI cs.CL cs.LG 交叉投稿

GitOfThoughts: Version-Controlled Reasoning and Agent Memory You Can Replay, Diff, and Merge

GitOfThoughts: 版本控制的推理与可回放、差异比较和合并的智能体记忆

Pavan C Shekar, Abhishek H S, Aswanth Krishnan

发表机构 * QpiAI

AI总结 提出GitOfThoughts框架,将智能体推理树存储为git仓库,实现推理的可回放、审计和合并;实验表明,对于新问题,任何记忆格式均不能可靠提升准确率,仅当检索案例与当前问题高度相似(>0.8)时才有显著提升,且收益来自答案检索而非方法迁移。

Comments 10 pages, 1 figure, 9 tables

详情
AI中文摘要

大语言模型推理是短暂的:思维链随上下文窗口消失,剪枝的搜索分支不留记录,记忆缓冲区无法进行差异比较、合并或审计。其他所有复杂的软件过程(代码、基础设施、数据、实验)都受版本控制;推理却没有。我们提出GitOfThoughts,将智能体的推理树存储为git仓库:每个评分的思维是一个提交,分数是注释,结果是标签,检索是智能体自身历史上的“git log”。这使得推理可回放、可审计,并且可以在智能体之间以近乎零的工程成本进行合并。然后我们提出一个更难的问题:记忆在任何基质上是否真的能提高准确性?在五种基质(无、markdown、向量、图、git)、两个基准、两个模型规模以及预注册的复制实验中,对于新问题的答案是否定的。没有一种记忆格式可靠地有帮助,一个有希望的早期结果在其自身的预注册复制下崩溃了。记忆只有在超过我们所谓的可复制阈值时才有效:当检索到的案例与当前问题几乎重复(相似度>~0.8)时,准确率急剧上升;低于此阈值,则无效果。收益是答案检索,而非方法迁移:一个4.5倍大的模型使近重复收益翻倍,但仍然无法从工作示例中提取可迁移的方法。我们发现唯一的通用杠杆是测试时采样。因此,git作为基质的理由是审计性、溯源性和可合并性,且准确率相当。我们记录了一个撤回的结果和一个被反驳的假设,以体现我们坚持的评估标准。

英文摘要

Large language model (LLM) reasoning is ephemeral: chains of thought vanish with the context window, pruned search branches leave no record, and memory buffers cannot be diffed, merged, or audited. Every other complex software process (code, infrastructure, data, experiments) is version-controlled; reasoning is not. We introduce GitOfThoughts, which stores an agent's reasoning tree as a git repository: every scored thought is a commit, scores are notes, outcomes are tags, and retrieval is "git log" over the agent's own history. This makes reasoning replayable, auditable, and mergeable across agents at near-zero engineering cost. We then ask the harder question: does memory, in any substrate, actually improve accuracy? Across five substrates (none, markdown, vector, graph, git), two benchmarks, two model scales, and pre-registered replications, the answer for novel problems is no. No memory format reliably helps, and a promising early result collapsed under its own pre-registered replication. Memory pays only above what we call the copyability threshold: when the retrieved case is a near-duplicate of the current problem (similarity >~ 0.8), accuracy jumps sharply; below it, nothing. The gain is answer retrieval, not method transfer: a 4.5x larger model doubles the near-duplicate payoff yet still cannot extract a transferable method from a worked example. The only general lever we find is test-time sampling. The case for git-as-substrate is therefore auditability, provenance, and mergeability at accuracy parity. We document a retracted result and a refuted hypothesis to model the evaluation standard we hold ourselves to.

2506.14202 2026-06-15 cs.LG cs.AI stat.ML 版本更新

DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation

DiffusionBlocks: 通过扩散解释进行分块神经网络训练

Makoto Shing, Masanori Koyama, Takuya Akiba

发表机构 * Sakana AI The University of Tokyo(东京大学)

AI总结 提出DiffusionBlocks框架,利用残差连接与动力系统的对应关系,将网络转换为去噪过程,通过分数匹配目标实现独立分块训练,在多种Transformer架构上达到与端到端训练相当的性能,同时降低内存需求。

Comments To appear at the 14th International Conference on Learning Representations (ICLR 2026). v4: Fixed typos in experimental details (Appendix E.4)

详情
AI中文摘要

端到端反向传播需要存储所有层的激活值,造成内存瓶颈,限制了模型的可扩展性。现有的分块训练方法提供了缓解该问题的途径,但它们依赖于特设的局部目标,并且在分类任务之外尚未得到充分探索。我们提出DiffusionBlocks,一个将基于Transformer的网络转化为真正独立可训练块的原则性框架,这些块能保持与端到端训练相竞争的性能。我们的关键洞察在于利用残差连接自然对应于动力系统中的更新这一事实。通过对该系统进行最小修改,我们可以将这些更新转换为去噪过程的更新,其中每个块可以通过利用分数匹配目标独立学习。这种独立性使得训练时每次只需计算一个块的梯度,从而将内存需求按块数量成比例降低。我们在多种Transformer架构(视觉、扩散、自回归、递归深度和掩码扩散)上的实验表明,DiffusionBlocks训练与端到端训练性能匹配,同时能够在实际任务(超越小规模分类)上实现可扩展的分块训练。DiffusionBlocks提供了一种理论上有依据的方法,能够成功扩展到跨多种架构的现代生成任务。代码可在 https://github.com/SakanaAI/DiffusionBlocks 获取。

英文摘要

End-to-end backpropagation requires storing activations throughout all layers, creating memory bottlenecks that limit model scalability. Existing block-wise training methods offer means to alleviate this problem, but they rely on ad-hoc local objectives and remain largely unexplored beyond classification tasks. We propose DiffusionBlocks, a principled framework for transforming transformer-based networks into genuinely independent trainable blocks that maintain competitive performance with end-to-end training. Our key insight leverages the fact that residual connections naturally correspond to updates in a dynamical system. With minimal modifications to this system, we can convert the updates to those of a denoising process, where each block can be learned independently by leveraging the score matching objective. This independence enables training with gradients for only one block at a time, thereby reducing memory requirements in proportion to the number of blocks. Our experiments on a range of transformer architectures (vision, diffusion, autoregressive, recurrent-depth, and masked diffusion) demonstrate that DiffusionBlocks training matches the performance of end-to-end training while enabling scalable block-wise training on practical tasks beyond small-scale classification. DiffusionBlocks provides a theoretically grounded approach that successfully scales to modern generative tasks across diverse architectures. Code is available at https://github.com/SakanaAI/DiffusionBlocks.

2510.01663 2026-06-15 cs.LG cs.AI 版本更新

Shift-Invariant Attribute Scoring for Kolmogorov-Arnold Networks via Shapley Value

基于Shapley值的Kolmogorov-Arnold网络平移不变属性评分

Wangxuan Fan, Ching Wang, Siqi Li, Nan Liu

发表机构 * GitHub

AI总结 提出ShapKAN框架,利用Shapley值归因实现平移不变的节点重要性评估,有效压缩KAN网络并保持其可解释性优势。

Comments 14 pages, 6 figures, 9 tables

详情
AI中文摘要

对于许多实际应用,理解特征与结果之间的关系与实现高预测准确性同样重要。虽然传统神经网络在预测方面表现出色,但其黑箱性质掩盖了潜在的功能关系。Kolmogorov-Arnold网络(KAN)通过在边上采用可学习的基于样条的激活函数来解决这一问题,能够在保持竞争性能的同时恢复符号表示。然而,KAN的架构对网络剪枝提出了独特的挑战。由于对输入坐标平移的敏感性,传统的基于幅度的方法变得不可靠。我们提出了ShapKAN,一种使用Shapley值归因以平移不变方式评估节点重要性的剪枝框架。与基于幅度的方法不同,ShapKAN量化每个节点的实际贡献,确保无论输入参数化如何,重要性排名保持一致。在合成和真实世界数据集上的大量实验表明,ShapKAN在实现有效网络压缩的同时保留了真实的节点重要性。我们的方法提升了KAN的可解释性优势,便于在资源受限环境中部署。

英文摘要

For many real-world applications, understanding feature-outcome relationships is as crucial as achieving high predictive accuracy. While traditional neural networks excel at prediction, their black-box nature obscures underlying functional relationships. Kolmogorov-Arnold Networks (KANs) address this by employing learnable spline-based activation functions on edges, enabling recovery of symbolic representations while maintaining competitive performance. However, KAN's architecture presents unique challenges for network pruning. Conventional magnitude-based methods become unreliable due to sensitivity to input coordinate shifts. We propose ShapKAN, a pruning framework using Shapley value attribution to assess node importance in a shift-invariant manner. Unlike magnitude-based approaches, ShapKAN quantifies each node's actual contribution, ensuring consistent importance rankings regardless of input parameterization. Extensive experiments on synthetic and real-world datasets demonstrate that ShapKAN preserves true node importance while enabling effective network compression. Our approach improves KAN's interpretability advantages, facilitating deployment in resource-constrained environments.
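Shapley value attribution itself is easy to illustrate on a toy problem. The sketch below computes exact Shapley values by averaging each player's marginal contribution over all orderings. The three "nodes" and the value function are made up for illustration; how ShapKAN defines the value of a node subset is not given in the abstract.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orders) for p, t in totals.items()}

# Hypothetical value function: additive scores plus a synergy between "a" and "b".
scores = {"a": 3.0, "b": 1.0, "c": 0.0}

def toy_value(coalition):
    bonus = 2.0 if {"a", "b"} <= coalition else 0.0
    return sum(scores[p] for p in coalition) + bonus

attributions = shapley_values(["a", "b", "c"], toy_value)
```

The synergy bonus is split equally between "a" and "b", and the attributions sum to the value of the full set. Exact enumeration costs factorial time, so practical use on real networks would need sampling or another approximation.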

2604.17892 2026-06-15 cs.LG cs.AI 版本更新

LEPO: Latent Reasoning Policy Optimization for Large Language Models

LEPO:面向大语言模型的潜在推理策略优化

Yuyan Zhou, Jiarui Yu, Hande Dong, Zhezheng Hao, Hong Wang, Jianqing Zhang, Qiang Lin

发表机构 * Tencent(腾讯) Zhejiang University(浙江大学) Shanghai Jiao Tong University(上海交通大学)

AI总结 LEPO通过引入Gumbel-Softmax在大语言模型中实现可控的随机性,提升其探索能力与强化学习兼容性,通过直接在连续潜在表示上应用强化学习,显著优于现有方法。

详情
AI中文摘要

近年来,潜在推理被引入大语言模型(LLMs)以利用连续空间中的丰富信息。然而,缺乏随机采样时,这些方法不可避免地退化为确定性推理,无法发现多样的推理路径。为弥合这一差距,我们通过Gumbel-Softmax在潜在推理中注入可控的随机性,恢复LLMs的探索能力并增强其与强化学习(RL)的兼容性。在此基础上,我们提出LEPO,一种将强化学习直接应用于连续潜在表示的新框架。具体而言,在采样(rollout)阶段,LEPO保持随机性以实现多样化的轨迹采样;在优化阶段,LEPO为潜在表示和离散令牌构建统一的梯度估计。大量实验表明,LEPO在离散和潜在推理方面显著优于现有RL方法。

英文摘要

Recently, latent reasoning has been introduced into large language models (LLMs) to leverage rich information within a continuous space. However, without stochastic sampling, these methods inevitably collapse to deterministic inference, failing to discover diverse reasoning paths. To bridge the gap, we inject controllable stochasticity into latent reasoning via Gumbel-Softmax, restoring LLMs' exploratory capacity and enhancing their compatibility with Reinforcement Learning (RL). Building on this, we propose Latent Reasoning Policy Optimization (LEPO), a novel framework that applies RL directly to continuous latent representations. Specifically, in the rollout stage, LEPO maintains stochasticity to enable diverse trajectory sampling, while in the optimization stage, LEPO constructs a unified gradient estimation for both latent representations and discrete tokens. Extensive experiments show that LEPO significantly outperforms existing RL methods for discrete and latent reasoning.
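The Gumbel-Softmax step mentioned above can be sketched in a few lines: add independent Gumbel(0, 1) noise to the logits, divide by a temperature `tau`, and take a softmax. This is the standard relaxation only, written in plain Python; how LEPO feeds the resulting soft sample back into the model is not described in the abstract and is not shown.

```python
import math
import random

def gumbel_softmax(logits, tau=1.0, rng=random):
    """Draw one relaxed (soft) categorical sample from the given logits.

    Lower tau gives samples closer to one-hot; higher tau gives smoother ones.
    """
    noisy = []
    for x in logits:
        u = max(rng.random(), 1e-12)        # guard against log(0)
        g = -math.log(-math.log(u))         # Gumbel(0, 1) noise
        noisy.append((x + g) / tau)
    peak = max(noisy)                       # subtract max for numerical stability
    weights = [math.exp(v - peak) for v in noisy]
    total = sum(weights)
    return [w / total for w in weights]

# Same logits, different seeds: the relaxed samples differ, which is the
# controllable stochasticity that a deterministic latent pass lacks.
sample_a = gumbel_softmax([2.0, 1.0, 0.1], tau=0.5, rng=random.Random(0))
sample_b = gumbel_softmax([2.0, 1.0, 0.1], tau=0.5, rng=random.Random(1))
```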

2605.05983 2026-06-15 cs.LG 版本更新

Towards Steering without Sacrifice: Principled Training of Steering Vectors for Prompt-only Interventions

无牺牲的引导:面向仅提示干预的引导向量的原则性训练

Yuntai Bao, Qinfeng Li, Xinyan Yu, Ge Su, Wenqi Zhang, Liu Yan, Haiqin Weng, Jianwei Yin, Xuhong Zhang

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 提出联合训练引导因子和方向的方法,消除后验因子选择;引入仅提示引导向量(PrOSV),仅干预少量提示词,在AxBench上优于传统全序列引导向量,并实现更好的通用模型效用与对抗鲁棒性权衡。

Comments 63 pages, 50 figures; accepted by ICML 2026

详情
AI中文摘要

近年来,引导向量(SVs)已成为一种有效且轻量级的方法来引导大型语言模型(LLMs)的行为,其中微调后的SVs比无优化的SVs更有效。然而,当前的微调SV方法存在两个局限性。首先,它们需要在推理时针对每个SV仔细选择引导因子,以平衡引导效果和生成质量。其次,它们作为全序列SV(FSSVs)运行,无论因子选择如何,由于对模型生成过程的过度干预,都可能牺牲生成质量。为了解决第一个局限性,我们提出联合训练引导因子和方向,从而不再需要事后因子选择。利用神经网络缩放理论,我们发现为引导因子采用适度较大的初始化规模和学习率,对于联合训练的稳定性和效率至关重要。为了解决第二个局限性,我们从表示微调中汲取灵感,引入了仅提示SV(PrOSV),一种仅干预少量提示token的SV。我们的实验结果表明,在使用我们的联合训练方案时,PrOSV在AxBench上优于传统的FSSVs。我们还发现,与FSSV相比,PrOSV在通用模型效用和对抗鲁棒性之间实现了更好的权衡。

英文摘要

Recently, steering vectors (SVs) have emerged as an effective and lightweight approach to steer behaviors of large language models (LLMs), among which fine-tuned SVs are more effective than optimization-free ones. However, current approaches to fine-tuned SVs suffer from two limitations. First, they require careful selection of steering factors on a per-SV basis to balance steering effectiveness and generation quality at inference time. Second, they operate as full-sequence SVs (FSSVs), which can sacrifice generation quality regardless of factor selection due to excessive intervention on the model generation process. To address the first limitation, we propose joint training of steering factors and directions, such that post-hoc factor selection is no longer required. Using neural network scaling theory, we find that moderately large initialization sizes and learning rates for steering factors are essential for stability and efficiency of joint training. To tackle the second limitation, we draw inspiration from representation fine-tuning and introduce Prompt-only SV (PrOSV), an SV that intervenes only on a few prompt tokens. Our empirical results show that PrOSV outperforms traditional FSSVs on AxBench when using our joint training scheme. We also find that PrOSV achieves a better tradeoff between general model utility and adversarial robustness than FSSV.

2605.07984 2026-06-15 cs.LG cs.AI 版本更新

Where's the Plan? Locating Latent Planning in Language Models with Lightweight Mechanistic Interventions

计划在哪里?通过轻量级机制干预定位语言模型中的潜在规划

Nicole Ma, Nick Rui

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 通过押韵对句补全任务,使用线性探针和激活修补方法,研究语言模型在生成过程中是否形成并因果依赖未来约束的潜在规划,发现仅Gemma-3-27B模型存在因果依赖,并定位到五个注意力头。

Comments 13 pages, 20 figures, 3 tables. Accepted to Workshop on Mechanistic Interpretability @ ICML 2026

详情
AI中文摘要

我们研究语言模型中的规划位点形成——在前向传播过程中,结构约束的未来标记的内部表示是否形成,以及它们是否因果驱动生成。使用押韵对句补全作为前向约束的干净测试,我们在Qwen3、Gemma-3和Llama-3的十多个规模上应用两种轻量级方法(线性探针和激活修补)。探针显示,未来押韵信息在行边界处是线性可解码的,且信号在所有三个模型族中随规模增强。激活修补揭示,只有Gemma-3-27B因果依赖这种编码,表现出一种交接,其中因果驱动因素在大约第30层从押韵词迁移到行边界。我们测试的其他每个模型在整个生成过程中都条件于押韵词,在行边界处因果效应接近零,尽管探针信号很强。通过两阶段路径修补,我们将Gemma-3-27B的交接定位到五个注意力头,这些头在新行处恢复了约90%的押韵路由能力。

英文摘要

We study planning site formation in language models -- where internal representations of structurally-constrained future tokens form during the forward pass, and whether they causally drive generation. Using rhyming-couplet completion as a clean test of forward-looking constraint, we apply two lightweight methods (linear probing and activation patching) across Qwen3, Gemma-3, and Llama-3 at more than ten scales. Probing shows that future-rhyme information is linearly decodable at the line boundary, with signal that strengthens with scale in all three families. Activation patching reveals that only Gemma-3-27B causally relies on this encoding, exhibiting a handoff in which the causal driver migrates from the rhyme word to the line boundary around layer 30. Every other model we test conditions on the rhyme word throughout generation, with near-zero causal effect at the line boundary despite strong probe signal. Through two-stage path patching, we localize the Gemma-3-27B handoff to five attention heads that recover ~90% of the rhyme-routing capacity at the newline.

2605.11558 2026-06-15 cs.LG stat.ML 版本更新

A Composite Activation Function for Learning Stable Binary Representations

一种用于学习稳定二进制表示的复合激活函数

Seokhun Park, Choeun Kim, Kwanho Lee, Sehyun Park, Insung Kong, Yongdai Kim

发表机构 * Department of Statistics(统计学系) Seoul National University(首尔国立大学) Department of Applied Mathematics(应用数学系) University of Twente(特文特大学)

AI总结 本文提出HTAF复合激活函数,通过平滑近似Heaviside函数实现稳定训练,适用于Spiking神经网络等模型,并引入ICBMs模型实现可解释的图像处理。

Comments 32 pages

详情
AI中文摘要

激活函数在神经网络中通过塑造内部表示起核心作用。最近,学习二进制激活表示因其在计算和内存效率以及可解释性方面的优势而受到广泛关注。然而,使用Heaviside激活函数训练神经网络仍具挑战性,因其非可导性阻碍了标准梯度优化。本文提出Heavy Tailed Activation Function (HTAF),一种Heaviside函数的平滑近似,使基于梯度的优化能够稳定训练。我们构造HTAF为sigmoid双曲正切复合函数,并理论证明其在零输入附近保持大梯度质量,同时在尾部区域表现出更慢的梯度衰减。我们展示Spiking神经网络、二进制神经网络和深度Heaviside神经网络可以使用HTAF稳定训练。最后,我们引入隐式概念瓶颈模型(ICBMs),一种利用HTAF诱导离散特征表示的可解释图像模型。在各种架构和图像数据集上的广泛实验表明,ICBMs能够稳定地实现离散化,同时预测性能与标准模型相当或更好。

英文摘要

Activation functions play a central role in neural networks by shaping internal representations. Recently, learning binary activation representations has attracted significant attention due to their advantages in computational and memory efficiency, as well as interpretability. However, training neural networks with Heaviside activations remains challenging, as their non-differentiability obstructs standard gradient-based optimization. In this paper, we propose Heavy Tailed Activation Function (HTAF), a smooth approximation to the Heaviside function that enables stable training with gradient-based optimization. We construct HTAF as a sigmoid hyperbolic tangent composite function and theoretically show that it maintains a large gradient mass around zero inputs while exhibiting slower gradient decay in the tail regions. We show that Spiking Neural Networks, Binary Neural Networks and Deep Heaviside neural Networks can be trained stably using HTAF with gradient-based optimization. Finally, we introduce Implicit Concept Bottleneck Models (ICBMs), an interpretable image model that leverages HTAF to induce discrete feature representations. Extensive experiments across various architectures and image datasets demonstrate that ICBM enables stable discretization while achieving prediction performance comparable to or better than standard models.

2605.17779 2026-06-15 cs.LG 版本更新

Learning Variable-Length Tokenization for Generative Recommendation

面向生成式推荐的可变长度分词学习

Minhao Wang, Bowen Wu, Wei Zhang

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 本文提出VarLenRec框架,通过Popularity-Weighted Information Budget Allocation方法解决生成推荐中可变长度分词问题,提升推荐准确性和效率。

Comments 14 pages, 5 figures

详情
AI中文摘要

生成推荐将推荐问题重新表述为离散语义标识符(ID)的下一个标记预测。一个基本但未被探索的设计选择是现有方法对所有项目使用固定长度分词,隐含假设编码能力在项目特性上是均匀的。通过系统地在四个数据集上进行实验,我们发现流行度-长度悖论:流行项目在短ID下表现最佳,而尾部项目需要显著更长的代码来捕捉判别性语义。这揭示了一个关键不匹配:流行项目受益于丰富的协同信号并需要最小的语义细节,而尾部项目必须依赖于细粒度的内容特征,因为交互数据稀疏。为了解决这个问题,我们提出了VarLenRec,一个学习可变长度分词的框架。我们开发了流行度加权信息预算分配(PIBA),一个信息论框架证明最优ID长度应与流行度的负幂成比例。直接实现可变长度分配面临两个技术挑战:标准欧几里得残差量化缺乏支持不同代码长度的几何容量,而离散长度决策是非可微的。我们通过双曲残差量化解决这些问题,该方法利用庞加莱球的指数体积增长来自然分层编码能力,并通过软长度控制器实现可微长度预测,通过连续层保留概率正则化由PIBA推导出的先验。广泛的实验表明,VarLenRec在推荐准确性和训练/推理效率上显著优于现有最先进方法,揭示了自适应编码能力在生成推荐中的重要性。

英文摘要

Generative recommendation reformulates recommendation as next-token prediction over discrete semantic identifiers (IDs). A fundamental yet unexplored design choice is that existing methods employ fixed-length tokenization for all items, implicitly assuming uniform encoding capacity regardless of item characteristics. Through systematic experiments across four datasets, we discover the Popularity-Length Paradox: popular items achieve optimal performance with short IDs, while tail items require substantially longer codes to capture discriminative semantics. This reveals a critical mismatch where popular items benefit from abundant collaborative signals and require minimal semantic detail, whereas tail items must rely on fine-grained content features due to sparse interaction data. To address this, we propose VarLenRec, a framework for learning variable-length tokenization. We develop Popularity-Weighted Information Budget Allocation (PIBA), an information-theoretic framework proving that optimal ID length should scale as a negative power of popularity. Directly implementing variable-length allocation faces two technical challenges: standard Euclidean residual quantization lacks geometric capacity to support diverse code lengths without distortion, and discrete length decisions are non-differentiable. We address these through Hyperbolic Residual Quantization, which leverages the exponential volume growth of the Poincaré ball to naturally stratify encoding capacity, and a Soft Length Controller, which enables differentiable length prediction via continuous layer retention probabilities regularized by PIBA-derived priors. Extensive experiments demonstrate that VarLenRec achieves significant improvements over state-of-the-art methods in recommendation accuracy and training/inference efficiency, revealing the importance of adaptive encoding capacity in generative recommendation.

2605.18848 2026-06-15 cs.LG cs.AI 版本更新

Exact Linear Attention

精确线性注意力

Weinuo Ou

发表机构 * GitHub

AI总结 本文提出精确线性注意力(ELA),通过利用核函数的精确分解性质,实现Transformer注意力的线性计算复杂度,消除近似误差。针对先前线性注意力的两个关键限制(梯度爆炸和token注意力稀释),提出核约束以确保非负性、判别性和几何可解释性。此外,本文还提出了三种工程创新,包括Hyper-Link结构、Memory Lobe模块和基于路由分数的MoE偏置机制。实验结果表明,与全注意力相比,ELA的解码速度最高可达6倍,KV缓存内存占用减少75%,同时训练性能相当或更优。

Comments 9 pages, 19 figures, journal

详情
AI中文摘要

本文介绍精确线性注意力(ELA),一种通过利用核函数的精确分解性质,实现Transformer注意力线性计算复杂度的机制,从而消除近似误差。我们通过施加核约束,确保非负性、判别性和几何可解释性,识别并解决了先前线性注意力的两个关键限制:梯度爆炸和token注意力稀释。提出了几种核函数,包括Hadamard Exp核、求和平方欧几里得距离核和减法平方欧几里得距离核,每种都针对特定的注意力行为而设计。除了核心注意力公式之外,本文还提出了三种工程创新:(1)Hyper-Link结构,用以替代传统残差连接以缓解梯度退化;(2)基于双向线性注意力的Memory Lobe模块,捕捉跨层的“转换流”以实现定性记忆和隐式强化学习范式;(3)基于路由分数的MoE偏置机制,以提高可解释性和语义对齐。实验结果表明,与全注意力相比,ELA的解码速度最高可达6倍,KV缓存内存占用减少75%,同时训练性能相当或更优。所提出的记忆模块加速了收敛并增强了泛化能力。此外,我们还将线性注意力原理扩展到视觉模型,得到YOLO-LAT,其GPU推理速度最高提升4.3倍,参数量最多缩减7.9倍,同时保持有竞争力的检测精度。这些结果表明,精确线性注意力在扩展Transformer模型以处理超长序列和高效视觉任务方面具有广泛的应用前景。

英文摘要

This paper introduces Exact Linear Attention (ELA), a mechanism that achieves linear computational complexity for Transformer attention by exploiting the exact decomposition property of kernel functions, thereby eliminating approximation error. We identify and address two key limitations of prior linear attention -- gradient explosion and token attention dilution -- by imposing kernel constraints that ensure non-negativity, discriminability, and geometric interpretability. Several kernel functions are proposed, including the Hadamard Exp Kernel, Summation Squared Euclidean Distance Kernel, and Subtraction Squared Euclidean Distance Kernel, each tailored for specific attention behaviors. Beyond the core attention formulation, the paper presents three engineering innovations: (1) a Hyper-Link structure that replaces traditional residual connections to mitigate gradient degradation; (2) a Memory Lobe module based on bidirectional linear attention, which captures "transformation flow" across layers to implement qualitative memory and an implicit reinforcement learning paradigm; and (3) a routing-score-based bias mechanism for Mixture-of-Experts (MoE) to improve interpretability and semantic alignment. Experimental results demonstrate that ELA achieves up to 6x faster decoding speed and 75% reduction in KV cache memory usage compared to full attention, while maintaining comparable or superior training performance. The proposed memory module accelerates convergence and enhances generalization. Furthermore, we extend the linear attention principle to vision models, yielding YOLO-LAT, which attains up to 4.3x GPU inference speedup and 7.9x parameter reduction with competitive detection accuracy. These results underline the broad applicability of exact linear attention for scaling Transformer models to ultra-long sequences and efficient visual tasks.
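To make the "exact decomposition" idea concrete, here is a minimal sketch of kernel attention whose kernel factorizes exactly, so the quadratic and linear-time forms give the same output. The kernel k(q, k) = sum_d exp(q_d + k_d) = phi(q) . phi(k) with phi(x) = exp(x) is chosen for illustration only and may differ from the kernels proposed in the paper; all function names are hypothetical.

```python
import math


def phi(x):
    """Feature map for the kernel k(q, k) = sum_d exp(q_d + k_d)."""
    return [math.exp(v) for v in x]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def quadratic_attention(queries, keys, values):
    """Causal kernel attention computed pairwise: O(n^2) in sequence length."""
    outputs = []
    for i, q in enumerate(queries):
        weights = [dot(phi(q), phi(k)) for k in keys[: i + 1]]
        total = sum(weights)
        dim = len(values[0])
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)
        ])
    return outputs


def linear_attention(queries, keys, values):
    """Same result from a running state: O(n) time, fixed-size memory."""
    feat_dim, val_dim = len(keys[0]), len(values[0])
    state = [[0.0] * val_dim for _ in range(feat_dim)]  # sum_j phi(k_j) v_j^T
    norm = [0.0] * feat_dim                              # sum_j phi(k_j)
    outputs = []
    for q, k, v in zip(queries, keys, values):
        fk = phi(k)
        for a in range(feat_dim):
            norm[a] += fk[a]
            for d in range(val_dim):
                state[a][d] += fk[a] * v[d]
        fq = phi(q)
        denom = dot(fq, norm)
        outputs.append([
            sum(fq[a] * state[a][d] for a in range(feat_dim)) / denom
            for d in range(val_dim)
        ])
    return outputs


queries = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]]
keys = [[0.2, 0.1], [-0.3, 0.5], [0.6, -0.1]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
slow = quadratic_attention(queries, keys, values)
fast = linear_attention(queries, keys, values)
```

Because the state has a fixed size regardless of how many tokens have been seen, decoding does not need a growing key-value cache, which is the general source of the memory savings that linear attention methods report.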

2606.01476 2026-06-15 cs.LG cs.CL 版本更新

OmniOPD: Logit-Free On-Policy Distillation via Speculative Verification

OmniOPD:通过推测性验证实现无Logit的在线策略蒸馏

Yuhang Zhou, Lizhu Zhang, Yifan Wu, Mingyi Wang, Bo Peng, Jiayi Liu, Xiangjun Fan, Zhuokai Zhao

发表机构 * Meta AI

AI总结 提出OmniOPD框架,通过基于蒙特卡洛展开的块级语义相似度替代token级logit匹配,结合峰值熵调度器和贝叶斯先验,解决在线策略蒸馏中logit不可获取和信号脆弱问题,在数学任务上超越标准OPD达28.64%。

Comments 26 pages, 3 figures

详情
AI中文摘要

在线策略蒸馏(OPD)在强教师模型的密集token级反馈下,基于学生模型自身的生成轨迹进行训练,缓解了监督微调(SFT)的离策略分布偏移和强化学习(RL)的稀疏信用分配问题。然而,标准OPD面临两个耦合的限制。首先,它需要直接访问教师模型的token级logit,将一大类有能力的专有模型排除在教师之外。其次,token级logit信号本身是脆弱的,依赖于教师和学生之间合理下一个token的狭窄重叠,并且容易放大重复循环等退化模式。在本文中,我们引入了OmniOPD,一种通过无logit的块级监督信号解决这两个限制的新框架。OmniOPD用蒙特卡洛展开替代确定性logit匹配,通过多token块上的连续语义相似性度量近似教师的局部偏好,并通过峰值熵调度器集中这种监督,仅在学生的高不确定性推理分叉处进行审计。Dirichlet-Multinomial贝叶斯先验和基础模型KL锚进一步限制了离散采样的方差,并防止了未审计token上的策略崩溃。在竞争性基准测试中,OmniOPD在数学任务上超越标准OPD方法高达28.64%,证实了块级语义验证提取了比token级logit匹配更可靠的学习信号,后者高信息密度被显著的噪声和脆弱性所抵消。此外,当与更强的黑盒教师(如Claude-4.5-Haiku和Gemini-2.5-Flash)配对时,OmniOPD在数学任务上相对于其开放权重教师对应物额外获得了9.54%的相对提升,使学生超越了自我探索RL的性能。

英文摘要

On-Policy Distillation (OPD) trains a student model on its own generative trajectories under dense token-level feedback from a stronger teacher, mitigating both the off-policy distribution shift of Supervised Fine-Tuning (SFT) and the sparse credit assignment of Reinforcement Learning (RL). However, standard OPD faces two coupled limitations. First, it requires direct access to the teacher's token-level logits, excluding a broad class of capable proprietary models from serving as teachers. Second, the token-level logit signal itself is brittle, depending on a narrow overlap of plausible next tokens between teacher and student, and prone to amplifying degenerate patterns such as repetition loops. In this paper, we introduce OmniOPD, a novel framework that addresses both limitations through a logit-free, chunk-level supervision signal. OmniOPD replaces deterministic logit matching with Monte Carlo rollouts that approximate the teacher's local preferences through a continuous semantic similarity metric over multi-token chunks, and concentrates this supervision via a peak-entropy scheduler that audits the student only at its high-uncertainty reasoning forks. A Dirichlet-Multinomial Bayesian prior and a base-model KL anchor further bound the variance of discrete sampling and prevent policy collapse across unaudited tokens. Across competitive benchmarks, OmniOPD surpasses the standard OPD approach by up to +28.64% on math, confirming that chunk-level semantic verification extracts a more reliable learning signal than token-level logit matching, whose high information density is offset by significant noise and brittleness. Furthermore, when paired with stronger black-box teachers such as Claude-4.5-Haiku and Gemini-2.5-Flash, OmniOPD achieves an additional +9.54% relative on math over its open-weight teacher counterpart, advancing the student past the performance of self-exploratory RL.

2606.03085 2026-06-15 cs.LG cs.CL 版本更新

Multi-component Causal Tracing in Large Language Models

大型语言模型中的多组件因果追踪

Zirui Yan, Dennis Wei, Dmitriy A. Katz, Prasanna Sattigeri, Ali Tajer

发表机构 * Rensselaer Polytechnic Institute(伦斯勒理工学院) IBM Research(IBM研究院)

AI总结 本文提出一个统一框架,通过软干预和度量转换高效识别对目标性能指标最关键的多组件子集,优于现有基线方法。

Comments Accepted to ACL 2026 main conference

详情
AI中文摘要

因果追踪通过系统地干预大型语言模型(LLM)的内部表示,揭示并量化将特定输入或计算与特定感兴趣指标联系起来的因果路径,从而量化LLM的行为。在先前单组件或单层研究的基础上,本文提出了一个同时因果追踪多个组件的统一框架。该框架系统地识别对期望目标性能指标(如准确性和公平性)最关键的组件子集(例如注意力头和多层感知器神经元)。这是通过将灵活的干预应用于广泛期望的指标来实现的。为了解决多组件问题的组合复杂性,设计了一种高效算法,该算法利用软干预和精心设计的度量转换,将组合搜索问题转化为一个连续问题,该问题可以在适当约束下高效求解,从而为选择组件生成适当的二元决策。实验结果表明,所提出的方法高效地识别出对目标指标具有高影响力的模型组件子集,优于现有基线方法。我们的代码可从 https://github.com/ZiruiYan/multi-component-causal-tracing 获取。

英文摘要

Causal tracing systematically intervenes on a large language model's (LLM's) internal representations to uncover and quantify the causal pathways linking specific inputs or computations to specific metrics of interest, quantifying the LLM's behavior. Building on previous single-component or single-layer studies, this paper presents a unified framework for causally tracing multiple components simultaneously. This framework systematically identifies the subsets of components (e.g., attention heads and multi-layer perceptron neurons) most critical to a desired target performance metric (e.g., accuracy and fairness). This is achieved by incorporating flexible interventions applied to a wide range of desired metrics. To address the combinatorial complexity of the multi-component problem, an efficient algorithm is designed that leverages soft interventions and a carefully designed metric transformation, converting the combinatorial search problem into a continuous one that can be solved efficiently under proper constraints, thereby generating proper binary decisions for selecting components. Experimental results demonstrate that the proposed method efficiently identifies subsets of the model's components that have a high impact on the target metric, outperforming existing baseline approaches. Our code is available at https://github.com/ZiruiYan/multi-component-causal-tracing.

2606.06010 2026-06-15 cs.LG cs.DB 版本更新

Adaptive Oscillatory-State Alignment for Time Series Forecasting

自适应振荡状态对齐用于时间序列预测

Zhangyao Song, Chaofeng Qu, Chao Zha, Xiaoyu Zhao, Yinfei Xu, Tao Guo

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 提出AOSNet框架,通过希尔伯特变换将固定模板匹配改为自适应振荡状态对齐,以处理实际时间序列中的非平稳振荡行为,在多个基准上达到先进或竞争性精度。

详情
AI中文摘要

长期时间序列预测受益于揭示重复时间结构的归纳偏置。现有的周期性预测方法通常通过预定义周期、全局频谱分量或固定可学习模板来建模重复性。然而,现实世界的时间动态很少是严格周期性的:围绕名义周期,振荡行为常常表现出非刚性周期性(NRP),即周期幅度、周期对齐和局部周期时长随时间变化。在这些条件下,固定模板的周期性建模可能与底层时间状态根本性不匹配。我们提出了AOSNet,一个希尔伯特引导的预测框架,将周期性预测从固定模板匹配重新表述为自适应振荡状态对齐。AOSNet从观测序列和可学习的全局振荡先验中提取解析信号描述符,然后通过描述符条件门自适应地对齐局部状态,该门选择性地保留可靠观测,同时软性纠正不匹配区域。学习到的先验不是作为刚性的重复模板,而是作为通过局部状态动力学解释的灵活振荡参考。在八个公开基准和两个云工作负载轨迹上的实验表明,该方法以紧凑的模型规模和较低的推理延迟取得了领先或极具竞争力的精度,可支持容量规划和自动扩缩容等重复预测场景。受控合成研究分别隔离周期幅度和周期对齐的变化,并将其与周期时长变化相结合,结果表明振荡状态对齐的优势随NRP的增强而增大。

英文摘要

Long-term time series forecasting benefits from inductive biases that expose recurring temporal structure. Existing periodic forecasting methods typically model recurrence through predefined periods, global spectral components, or fixed learnable templates. However, real-world temporal dynamics are rarely rigidly periodic: around a nominal cycle, oscillatory behavior often exhibits non-rigid periodicity (NRP), where cycle magnitude, cycle alignment, and local cycle duration vary over time. Under these conditions, fixed-template periodic modeling can become fundamentally mismatched to the underlying temporal states. We propose AOSNet, a Hilbert-guided forecasting framework that reformulates periodic forecasting from fixed template matching to adaptive oscillatory-state alignment. AOSNet extracts analytic-signal descriptors from both the observed sequence and a learnable global oscillatory prior, then adaptively aligns local states through a descriptor-conditioned gate that selectively preserves reliable observations while softly correcting mismatched regions. The learned prior serves not as a rigid repeated template but as a flexible oscillatory reference interpreted through local state dynamics. Experiments on eight public benchmarks and two cloud workload traces demonstrate leading or highly competitive accuracy with a compact model size and low inference latency, supporting repeated forecasting settings such as capacity planning and autoscaling. Controlled synthetic studies that isolate cycle-magnitude and cycle-alignment variation and combine them with cycle-duration changes show that the advantage of oscillatory-state alignment increases as NRP intensifies.
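As background for the "analytic-signal descriptors" mentioned above, the following sketch computes a discrete analytic signal with the standard DFT construction and reads off instantaneous amplitude and phase. Treating amplitude and phase as the descriptors is an assumption made here for illustration; the abstract does not specify which descriptors AOSNet uses, and the function names are hypothetical.

```python
import cmath
import math


def analytic_signal(x):
    """Discrete analytic signal via the DFT (O(n^2) here for clarity)."""
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]
    # Keep DC (and Nyquist for even n), double positive frequencies,
    # and zero out negative frequencies.
    for f in range(n):
        if f == 0 or (n % 2 == 0 and f == n // 2):
            gain = 1.0
        elif f < (n + 1) // 2:
            gain = 2.0
        else:
            gain = 0.0
        spectrum[f] *= gain
    return [
        sum(spectrum[f] * cmath.exp(2j * math.pi * f * t / n) for f in range(n)) / n
        for t in range(n)
    ]


def oscillatory_descriptors(x):
    """Instantaneous amplitude and wrapped phase of a real-valued sequence."""
    z = analytic_signal(x)
    return [abs(v) for v in z], [cmath.phase(v) for v in z]


n = 32
signal = [math.cos(2 * math.pi * 3 * t / n) for t in range(n)]
amplitude, phase = oscillatory_descriptors(signal)
```

For a pure cosine the amplitude is constant and the phase advances linearly; amplitude modulation or phase drift in a real series shows up as deviations from that pattern, which is what makes these quantities natural candidates for describing non-rigid periodicity.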

2606.13119 2026-06-15 cs.LG cs.AI cs.NE 版本更新

MP3: Multi-Period Pattern Pre-training for Spatio-Temporal Forecasting

MP3:面向时空预测的多周期模式预训练

Lilan Peng, Yandi Liu, Qingren Yao, Chongshou Li, Tianrui Li

发表机构 * School of Computing and Artificial Intelligence, Southwest Jiaotong University(西南交通大学计算机与人工智能学院) Eindhoven University of Technology(埃因霍温理工大学)

AI总结 针对时空数据中短窗口输入导致的时间幻象问题,提出多周期模式预训练插件MP3,通过多周期时间建模、空间建模和跨周期因果交互,提升现有STGNN的预测性能。

详情
AI中文摘要

时空预测在交通、气候和能源等多个领域至关重要。城市时空数据表现出时间幻象:相似的短窗口输入具有不同的未来趋势,反之亦然。现有的时空图神经网络(STGNN)无法有效识别此类幻象。我们认为核心原因在于短窗口输入具有不完整的周期观测、异质的全局空间相关性和跨周期叠加因果性。为弥补这一差距,我们开发了一种新颖的多周期模式预训练(MP3),这是一种用于区分时间幻象的即插即用预训练插件。MP3提出了两项核心创新:(1)多周期模式学习旨在从长时间序列中学习多周期模式。具体地,多周期时间建模利用边卷积来识别不同的多周期模式。多周期空间建模使用瓶颈投影和全局记忆库来高效捕获异质的全局空间关系。跨周期模式交互采用因果增强的Transformer来捕获不同周期模式之间的依赖关系。(2)该插件可以无缝集成到现有的STGNN骨干中,以增强其预测性能。在五个真实世界数据集(包括大规模数据集CA)上的五个STGNN基线实验验证了MP3的有效性、优越的可扩展性和强适应性,其在所有评估基线上带来了一致且稳健的性能提升。平均而言,MP3将MAE降低了4.7%,RMSE降低了5.0%。代码可从 https://github.com/YAN-outlook/MP3 获取。

英文摘要

Spatio-temporal forecasting is crucial in diverse fields such as transportation, climate, and energy. Urban spatio-temporal data exhibit temporal mirages: similar short-window inputs can have divergent future trends, and vice versa. Existing spatio-temporal graph neural networks (STGNNs) cannot effectively identify such mirages. We argue that the core reason is that short-window inputs have incomplete period observation, heterogeneous global spatial correlation, and cross-period superposition causality. To bridge this gap, we develop Multi-Period Pattern Pre-training (MP3), a novel plug-and-play pre-training plugin for distinguishing temporal mirages. MP3 presents two core innovations: (1) Multi-period pattern learning is designed to learn multi-period patterns from long time series. Specifically, multi-period temporal modeling leverages edge convolution to identify different multi-period patterns. Multi-period spatial modeling uses a bottleneck projection and a global memory bank to capture heterogeneous global spatial relations efficiently. Cross-period pattern interaction employs a causality-enhanced Transformer to capture dependencies across different period patterns. (2) The plugin integrates seamlessly into existing STGNN backbones to strengthen their forecasting performance. Experiments on five STGNN baselines across five real-world datasets (including the large-scale dataset CA) verify the effectiveness, superior scalability, and strong adaptability of MP3, which brings consistent and robust performance improvements across all evaluated baselines. On average, MP3 reduces MAE by 4.7% and RMSE by 5.0%. The code is available at https://github.com/YAN-outlook/MP3.

2606.13657 2026-06-15 cs.LG 版本更新

Dense Supervision, Sparse Updates: On the Sparsity and Geometry of On-Policy Distillation

密集监督,稀疏更新:论在线策略蒸馏的稀疏性与几何结构

Guo Yu, Wenlin Liu, Yulan Hu, Hao-Xuan Ma, Jun-Peng Jiang, Han-Jia Ye

发表机构 * School of Artificial Intelligence, Nanjing University(南京大学人工智能学院) National Key Laboratory for Novel Software Technology, Nanjing University(南京大学计算机软件新技术国家重点实验室) Amap, Alibaba Group(阿里巴巴集团高德地图)

AI总结 本文分析在线策略蒸馏(OPD)中参数更新的稀疏性和几何特性,发现更新稀疏且集中于小权重坐标,并验证了稀疏子网络的有效性。

Comments Code is available at https://github.com/SydCS/OPD-Param-Analysis

详情
AI中文摘要

在线策略蒸馏(OPD)最近成为一种重要的后训练方法,它结合了两个理想的要素:在线策略的学生轨迹和密集的教师监督。然而,这种混合方式如何改变模型参数仍不清楚。在多个语言和视觉-语言模型对及OPD用例中,我们的分析得出两个主要发现。关于稀疏性,OPD更新幅度小且坐标稀疏。它们分布在各层,最大的相对变动通常出现在前馈网络(FFN)模块中。这种稀疏结构在操作上有用:仅训练发现的子网络几乎能恢复完整训练的性能。但稀疏支撑并不意味着不再需要自适应优化:此前被报道在RLVR中具有竞争力的SGD,在我们的OPD优化器消融实验中表现不如AdamW,这表明密集教师监督保留了有用的动量结构和异质的二阶矩尺度。关于几何结构,更新在数值上是满秩的,但谱集中;它们主要位于源权重的主奇异子空间之外,并且不成比例地落在源权重接近零的坐标上。这些发现表明,密集教师监督并不会使OPD变成普通的密集参数重写;相反,OPD保留了在线策略后训练的重要几何特征。

英文摘要

On-policy distillation (OPD) has recently become a prominent post-training recipe by combining two desirable ingredients: on-policy student trajectories and dense teacher supervision. However, how this hybrid changes a model's parameters remains unclear. Across several language and vision-language model pairs and OPD use cases, our analysis yields two main findings. On sparsity, OPD updates are small and coordinate-sparse. They are distributed across layers, with the largest relative movement usually appearing in FFN modules. This sparse structure is operationally useful: training only the discovered subnetwork nearly recovers full-training performance. The sparse support does not remove the need for adaptive optimization: SGD, previously reported to be competitive in RLVR, underperforms AdamW in our OPD optimizer ablation, suggesting that dense teacher supervision preserves useful momentum structure and heterogeneous second-moment scales. On geometry, the updates are numerically full-rank but spectrally concentrated; they lie mostly away from the principal singular subspaces of the source weights and fall disproportionately on coordinates where the source weights are close to zero. These findings suggest that dense teacher supervision does not turn OPD into ordinary dense parameter rewriting; instead, OPD retains important geometric signatures of on-policy post-training.

2504.20734 2026-06-15 cs.CL cs.AI cs.CV cs.IR cs.LG 版本更新

UniversalRAG: Retrieval-Augmented Generation over Corpora of Diverse Modalities and Granularities

UniversalRAG: 在多样模态和粒度的语料库上实现检索增强生成

Woongyeong Yeo, Kangsan Kim, Soyeong Jeong, Jinheon Baek, Sung Ju Hwang

发表机构 * KAIST(韩国科学技术院)

AI总结 本文提出UniversalRAG,一种能够处理多种模态和粒度的检索增强生成框架,通过动态路由机制和多粒度组织,提升跨模态知识检索的有效性,实验表明其在多个模态基准上的优越性。

Comments ACL 2026. Project page : https://universalrag.github.io

详情
AI中文摘要

检索增强生成(RAG)通过将外部相关知识与查询绑定,显著提升了事实准确性。然而,现有方法多局限于文本语料,尽管最近有尝试扩展到图像、视频等模态,但通常仅针对单一模态语料。相比之下,现实中的查询所需知识类型多样,单一知识源无法满足。为此,我们引入UniversalRAG,一种any-to-any RAG框架,旨在从异构源中检索和整合多样模态和粒度的知识。具体而言,受强制所有模态进入单一聚合语料的统一表示空间导致模态间隙的观察启发,我们提出模态感知路由,动态识别最合适的模态特定语料并执行针对性检索,并通过理论分析证明其有效性。此外,除模态外,我们对每个模态组织为多个粒度层级,实现针对查询复杂性和范围的精细检索。我们验证UniversalRAG在10个多种模态基准上的性能,显示其优于各种模态特定和统一基线。

英文摘要

Retrieval-Augmented Generation (RAG) has shown substantial promise in improving factual accuracy by grounding model responses with external knowledge relevant to queries. However, most existing approaches are limited to a text-only corpus, and while recent efforts have extended RAG to other modalities such as images and videos, they typically operate over a single modality-specific corpus. In contrast, real-world queries vary widely in the type of knowledge they require, which a single type of knowledge source cannot address. To address this, we introduce UniversalRAG, an any-to-any RAG framework designed to retrieve and integrate knowledge from heterogeneous sources with diverse modalities and granularities. Specifically, motivated by the observation that forcing all modalities into a unified representation space derived from a single aggregated corpus causes a modality gap, where the retrieval tends to favor items from the same modality as the query, we propose modality-aware routing, which dynamically identifies the most appropriate modality-specific corpus and performs targeted retrieval within it, and further justify its effectiveness with a theoretical analysis. Moreover, beyond modality, we organize each modality into multiple granularity levels, enabling fine-tuned retrieval tailored to the complexity and scope of the query. We validate UniversalRAG on 10 benchmarks of multiple modalities, showing its superiority over various modality-specific and unified baselines.

2505.04671 2026-06-15 cs.CL cs.LG 版本更新

Reward-SQL: Boosting Text-to-SQL via Stepwise Execution-Aware Reasoning and Process-Supervised Rewards

Reward-SQL:通过逐步执行感知推理和过程监督奖励提升Text-to-SQL

Yuxin Zhang, Meihao Fan, Ju Fan, Mingyang Yi, Yuyu Luo, Guoliang Li, Bin Wu, Wenchao Zhou

发表机构 * Renmin University of China(中国人民大学) Tsinghua University(清华大学) Alibaba Cloud Computing(阿里云计算)

AI总结 针对强化学习在Text-to-SQL中缺乏逐步执行感知推理和过程级奖励的问题,提出CoCTE框架和Reward-SQL方法,通过中间视图验证、结构化CTE及过程奖励模型,显著提升复杂查询的准确性和可解释性。

详情
AI中文摘要

最近,使用强化学习(RL)训练的大型语言模型(LLMs)的进展提高了Text-to-SQL的性能。然而,基于RL的方法仍然在处理复杂查询时面临两个关键限制:缺乏基于数据库反馈的逐步执行感知推理,以及缺乏用于指导推理优化的过程级奖励。为了解决这些问题,我们提出了CoCTE,一种分治且执行感知的推理框架,通过中间视图验证和结构化公共表表达式(CTEs)逐步组合SQL查询,提高了准确性和可解释性。为了实现CoCTE推理过程,我们开发了Reward-SQL,一种统一的方法,包含三个阶段:(1)模型初始化,使LLMs具备结构化CoCTE推理能力;(2)过程奖励设计,提供细粒度的、执行感知的监督;(3)过程监督的RL和推理,将过程奖励整合到训练中,并通过过程奖励指导推理阶段。本文解决了Reward-SQL中的核心挑战,并做出了以下贡献。我们引入了一个过程奖励模型(PRM),它将执行感知的轨迹评分与基于熵的步骤加权相结合,在推理步骤中提供密集且可解释的监督。我们将PRM集成到RL训练和推理阶段,稳定优化并通过过程级信号改进轨迹探索。实验表明,Reward-SQL在可比模型大小下显著优于基线,并表现出强大的跨领域泛化能力。

英文摘要

Recent advances in large language models (LLMs) trained with reinforcement learning (RL) have improved Text-to-SQL performance. However, RL-based approaches still struggle with complex queries due to two key limitations: insufficient stepwise execution-aware reasoning grounded in database feedback, and the lack of process-level rewards for guiding reasoning optimization. To address these issues, we propose CoCTE, a divide-and-conquer and execution-aware reasoning framework that progressively composes SQL queries through intermediate view validation and structured Common Table Expressions (CTEs), improving both accuracy and interpretability. To realize a CoCTE reasoning process, we develop Reward-SQL, a unified approach with three stages: (1) model initialization, which equips LLMs with structured CoCTE reasoning capabilities; (2) process reward design, which delivers fine-grained, execution-aware supervision; and (3) process-supervised RL and inference, which integrates process rewards into training and guides the inference stage by process rewards. This paper addresses the core challenges in Reward-SQL and makes the following contributions. We introduce a process reward model (PRM) that combines execution-aware trajectory scoring with entropy-based step weighting, providing dense and interpretable supervision across reasoning steps. We integrate PRM into both RL training and inference stages, stabilizing optimization and improving trajectory exploration with process-level signals. Experiments show that Reward-SQL significantly outperforms baselines with comparable model sizes, and exhibits strong cross-domain generalization.

2601.05106 2026-06-15 cs.AI cs.CL cs.LG 版本更新

Token-Level LLM Collaboration via FusionRoute

通过FusionRoute实现令牌级LLM协作

Nuoya Xiong, Yuhang Zhou, Hanqing Zeng, Zhaorun Chen, Furong Huang, Shuchao Bi, Lizhu Zhang, Zhuokai Zhao


AI总结 本文提出FusionRoute框架,通过轻量级路由器在解码步骤中选择最合适的专家并补充对数几率以优化下一个令牌分布,解决了单个通用模型在多个领域表现不佳的问题,同时在多个基准测试中优于其他方法。

Comments 25 pages

详情
AI中文摘要

大型语言模型(LLMs)在多个领域表现出色。然而,使用单一通用模型在这些领域实现强大性能通常需要扩展到训练和部署成本极高的规模。另一方面,虽然较小的领域专用模型更高效,但它们在训练分布之外的泛化能力较差。为了解决这一矛盾,我们提出了FusionRoute,一种稳健且有效的令牌级多LLM协作框架,其中轻量级路由器同时(i)在每个解码步骤中选择最合适的专家,(ii)贡献一个互补的对数几率,通过对数几率添加来细化或校正所选专家的下一个令牌分布。与现有依赖固定专家输出的令牌级协作方法不同,我们提供了一个理论分析,表明纯专家路由本质上是有限的:除非持有强全局覆盖假设,否则无法一般实现最优解码策略。通过在专家选择中加入可训练的互补生成器,FusionRoute扩展了有效的策略类别,并在温和条件下实现了最优价值函数的恢复。经验上,FusionRoute在Llama-3和Gemma-2家族以及涵盖数学推理、代码生成和指令跟随在内的多种基准测试中,优于序列级和令牌级协作、模型融合和直接微调方法,同时在各自任务上与领域专家保持竞争力。

英文摘要

Large language models (LLMs) exhibit strengths across diverse domains. However, achieving strong performance across these domains with a single general-purpose model typically requires scaling to sizes that are prohibitively expensive to train and deploy. On the other hand, while smaller domain-specialized models are much more efficient, they struggle to generalize beyond their training distributions. To address this dilemma, we propose FusionRoute, a robust and effective token-level multi-LLM collaboration framework in which a lightweight router simultaneously (i) selects the most suitable expert at each decoding step and (ii) contributes a complementary logit that refines or corrects the selected expert's next-token distribution via logit addition. Unlike existing token-level collaboration methods that rely solely on fixed expert outputs, we provide a theoretical analysis showing that pure expert-only routing is fundamentally limited: unless strong global coverage assumptions hold, it cannot in general realize the optimal decoding policy. By augmenting expert selection with a trainable complementary generator, FusionRoute expands the effective policy class and enables recovery of optimal value functions under mild conditions. Empirically, across both Llama-3 and Gemma-2 families and diverse benchmarks spanning mathematical reasoning, code generation, and instruction following, FusionRoute outperforms both sequence- and token-level collaboration, model merging, and direct fine-tuning, while remaining competitive with domain experts on their respective tasks.
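The decoding step described above (pick an expert, then add the router's complementary logits to that expert's logits) can be sketched as follows. This is a toy illustration under stated assumptions: the routing scores, the three-token vocabulary, and the function names are all hypothetical, and the real system obtains these quantities from trained models.

```python
import math


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fused_next_token_distribution(expert_logits, routing_scores, complementary_logits):
    """Select the highest-scoring expert, then add the router's logits to its logits."""
    chosen = max(range(len(routing_scores)), key=routing_scores.__getitem__)
    fused = [e + c for e, c in zip(expert_logits[chosen], complementary_logits)]
    return chosen, softmax(fused)


expert_logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]  # two experts, vocabulary of 3
routing_scores = [0.3, 0.7]                          # router prefers expert 1
complementary_logits = [0.0, -2.0, 1.5]              # router's correction term
chosen, probs = fused_next_token_distribution(
    expert_logits, routing_scores, complementary_logits
)
```

In this toy case expert 1 alone would pick token 1, while the fused distribution favors token 2. That is the sense in which the complementary logit can correct the selected expert instead of only choosing among fixed expert outputs.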

2605.26702 2026-06-15 cs.CV cs.AI cs.CR cs.LG 版本更新

Rotation-Invariant Spherical Watermarking via Third-Order SO(3) Representation Coupling

通过三阶SO(3)表示耦合的旋转不变球面水印

Pengzhen Chen, Yanwei Liu, Xiaoyan Gu, Antonios Argyriou, Wu Liu, Weiping Wang

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 针对全景图像在任意3D旋转下水印鲁棒性不足的问题,提出利用三阶SO(3)表示耦合构造旋转不变的球面双谱,将水印嵌入高阶球谐系数并从不变标量中提取,实现理论保证的旋转不变性和高视觉保真度。

Comments ICML 2026

详情
AI中文摘要

全景图像的可靠水印面临任意3D旋转的根本挑战。由于全景图定义在球面上,它们在$SO(3)$作用下自然变换,使得传统的平面表示和基于增强的鲁棒策略变得不充分且缺乏理论保证。为了解决这个问题,我们将全景图表示为球面信号,并利用$SO(3)$表示理论推导出可证明的旋转不变描述符。虽然球谐系数在旋转下等变变换,但自然的旋转不变构造通常限于零阶统计量,这消除了方向信息并严重限制了嵌入容量。在这项工作中,我们通过张量积耦合高阶$SO(3)$不可约表示并投影到平凡表示,引入了一种有原则的三阶不变构造。这产生了球面不变双谱,它在保持严格旋转不变性的同时保留了相位信息。利用这一特性,我们将水印嵌入到高阶球谐系数中,并从不变双谱标量中恢复它们,从而在任意3D旋转下实现可靠的提取。我们提供了其$SO(3)$不变性的理论证明,并通过实验证明其对连续旋转具有近乎完美的鲁棒性,同时保持高视觉保真度。

英文摘要

Reliable watermarking of panoramic imagery is fundamentally challenged by arbitrary 3D rotations. As panoramas are defined on the sphere, they naturally transform under the action of $SO(3)$, rendering conventional planar representations and augmentation-based robustness strategies inadequate and devoid of theoretical guarantees. To address this, we formulate panoramas as spherical signals and leverage $SO(3)$ representation theory to derive provably rotation-invariant descriptors. While spherical harmonic coefficients transform equivariantly under rotations, the natural invariant constructions are typically limited to zeroth-order statistics which eliminate directional information and severely constrain embedding capacity. In this work, we introduce a principled third-order invariant construction by coupling higher-order $SO(3)$ irreducible representations via tensor products and projecting onto the trivial representation. This yields a spherical invariant bispectrum that preserves phase information while remaining strictly rotation-invariant. Leveraging this property, we embed watermarks into higher-order spherical harmonic coefficients and recover them from invariant bispectral scalars, enabling reliable extraction under arbitrary 3D rotations. We provide a theoretical proof of $SO(3)$ invariance for it and demonstrate experimentally its near-perfect robustness to continuous rotations while maintaining high visual fidelity.
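The role of a third-order invariant is easiest to see in a one-dimensional analog. The sketch below uses the classical bispectrum of a signal on a circle, where cyclic shifts play the role of rotations: Fourier coefficients change under a shift, but the triple product stays fixed while still depending on phase. This is an illustrative analog only, not the paper's SO(3) construction, which couples spherical harmonic coefficients through tensor products; all names are hypothetical.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]


def bispectrum(x):
    """B[a][b] = F[a] * F[b] * conj(F[a + b]), with indices taken modulo n."""
    n = len(x)
    spectrum = dft(x)
    return [
        [spectrum[a] * spectrum[b] * spectrum[(a + b) % n].conjugate() for b in range(n)]
        for a in range(n)
    ]


def rotate(x, shift):
    """Cyclic shift: the circle's counterpart of rotating a spherical signal."""
    return x[-shift:] + x[:-shift]


signal = [0.3, -1.2, 0.8, 2.0, -0.5, 0.1, 1.4, -0.9]
original = bispectrum(signal)
shifted = bispectrum(rotate(signal, 3))
```

Here `original` and `shifted` agree up to floating-point error even though the Fourier coefficients of the two signals differ by phase factors, which mirrors why an extractor that reads bispectral scalars does not need to know the rotation.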

2606.12881 2026-06-15 cs.CL cs.LG 版本更新

Direct Preference Optimization for Chatbot Fine-Tuning: An Empirical Study

面向聊天机器人微调的直接偏好优化:一项实证研究

Dezhi Yu, Yvonne Qiu, ShuoJia Fu

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 本文实证研究直接偏好优化(DPO)在聊天机器人微调中的应用,表明其简化训练流程、提升计算效率且性能有竞争力,但存在训练不稳定性。

Comments 7 pages, 3 figures, 1 table. All authors contributed equally

详情
AI中文摘要

我们提出了一种使用直接偏好优化(DPO)微调大型语言模型的方法,这是一种强化学习技术。我们的实验结果表明,DPO简化了训练流程,提高了计算效率,并实现了有竞争力的性能。使用BLEU、ROUGE和余弦相似度指标的评估表明,模型有效学习并收敛,尽管需要进一步研究以解决观察到的训练不稳定性。

英文摘要

We present an approach to fine-tuning large language models using Direct Preference Optimization (DPO), a reinforcement learning technique. Our experimental results demonstrate that DPO simplifies the training pipeline, improves computational efficiency, and achieves competitive performance. The evaluation using BLEU, ROUGE, and cosine similarity metrics indicates effective learning and convergence, though further investigation is needed to address observed training instability.
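The abstract does not spell out the objective, so for orientation here is the standard DPO loss for a single preference pair, written in terms of sequence log-probabilities under the trained policy and a frozen reference model. The default `beta=0.1` and the example log-probabilities are illustrative assumptions, not settings reported by the authors.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair."""
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# A policy identical to the reference gives a zero margin, so the loss is log 2.
at_init = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# Raising the chosen response's likelihood relative to the reference lowers the loss.
after_update = dpo_loss(-8.0, -14.0, -10.0, -12.0)
```

The loss depends only on log-probability ratios against the reference, which is why no separate reward model or sampling loop is needed during training.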

2. 表示学习、自监督与对比学习 8 篇

2606.14108 2026-06-15 cs.LG cs.AI 新提交

Numbers Already Carry Their Own Embeddings

数字本身已携带其嵌入

Suhyun Bae, Donghun Lee

发表机构 * Department of Mathematics, Korea University(高丽大学数学系)

AI总结 提出无训练嵌入方法AOE,同时保留数字的实数值与p-adic模签名,实现即插即用并在代数组合基准上首次达到完美精度。

Comments Presented at the MATH-AI Workshop at NeurIPS 2025

详情
AI中文摘要

我们引入了Adelic运算保持嵌入(AOE),这是一种无需训练的表示,同时捕捉数字的实数值及其模(p-adic)签名。该构造通过设计保留了加法和乘法结构,将数字输入转化为“用数学语言表达”的嵌入。与依赖任务特定重新训练的先前方法不同,AOE是即插即用的,可无缝集成到现有架构中。在代数组合基准测试上,它取得了持续的性能提升,包括在编织图案任务上首次实现完美准确率——这为克服人工智能中长期存在的“数字问题”提供了一条有原则的前进道路。

英文摘要

We introduce Adelic operation-preserved embeddings (AOE), a training-free representation that captures both a number's real value and its modular (p-adic) signatures. This construction preserves additive and multiplicative structure by design, turning numerical input into embeddings that "speak in the language of mathematics." Unlike prior approaches that rely on task-specific retraining, AOE is plug-and-play and drops seamlessly into existing architectures. On algebraic combinatorics benchmarks, it delivers consistent gains, including the first-ever perfect accuracy on the Weaving Pattern task, while suggesting a principled path forward for overcoming the long-standing "number problem" in AI.

2606.14334 2026-06-15 cs.LG math.DG 新提交

Riemannian Metric Matching for Scalable Geometric Modeling of Distributions

黎曼度量匹配:面向分布的可扩展几何建模

Jacob Bamberger, Adam Gosztolai, Pierre Vandergheynst, Michael Bronstein, Iolo Jones

发表机构 * University of Cambridge(剑桥大学)

AI总结 提出黎曼度量匹配框架,通过神经网络学习数据的黎曼几何,利用carré du champ算子的条件期望形式实现样本级训练和恒定成本推理,在精度相当或更优下速度提升400倍。

Comments ICML 2026 (Oral)

详情
AI中文摘要

高维数据集通常集中在低维结构附近,但从样本中估计其几何通常依赖于图和核,这些方法随数据集大小和维度扩展性差。我们提出黎曼度量匹配:一种去噪概率框架,使用神经网络学习数据的黎曼几何。具体而言,我们学习carré du champ算子,通过扩散几何,该算子使我们能够访问黎曼几何工具包,用于下游机器学习和统计任务。我们的关键观察是,carré du champ算子可以表述为数据随机扰动的条件期望,这可用于样本级训练和恒定成本的摊销推理,而无需显式核构造。实验上,度量匹配在精度上媲美或改进基于$k$-NN的扩散几何估计器,同时实现高达$400$倍更快的摊销推理,并支持在最近邻失效的高维图像上进行无图几何分析。

英文摘要

High-dimensional datasets often concentrate near low-dimensional structures, but estimating their geometry from samples typically relies on graphs and kernels that scale poorly with dataset size and dimension. We propose Riemannian metric matching: a denoising probabilistic framework for learning the Riemannian geometry of data using neural networks. Specifically, we learn the carré du champ operator, which, using diffusion geometry, gives us access to the Riemannian geometry toolkit for downstream machine learning and statistical tasks. Our key observation is that the carré du champ operator can be formulated as a conditional expectation over random perturbations of the data, which can be exploited for sample-wise training and constant cost, amortized inference without explicit kernel construction. Empirically, metric matching rivals or improves the accuracy of $k$-NN-based diffusion geometry estimators, while enabling amortized inference that is up to $400\times$ faster, and supports graph-free geometric analysis on high-dimensional images where nearest neighbors break down.
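The key observation above, that the carré du champ can be written as an expectation over random perturbations, can be illustrated in flat space. The sketch assumes the normalization in which Gamma(f, g) equals the inner product of the gradients of f and g, and estimates it by Monte Carlo from small Gaussian perturbations. The function name, noise scale, and sample count are illustrative assumptions; the paper trains a neural network against such targets rather than sampling at inference time.

```python
import random


def carre_du_champ(f, g, x, sigma=1e-2, samples=20000, seed=0):
    """Monte Carlo estimate of Gamma(f, g)(x) from Gaussian perturbations of x.

    Uses E[(f(y) - f(x)) * (g(y) - g(x))] / sigma^2 with y = x + sigma * noise,
    which approaches grad f(x) . grad g(x) as sigma goes to zero.
    """
    rng = random.Random(seed)
    fx, gx = f(x), g(x)
    total = 0.0
    for _ in range(samples):
        y = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
        total += (f(y) - fx) * (g(y) - gx)
    return total / (samples * sigma ** 2)


def f(x):
    return x[0] + 2.0 * x[1]   # gradient (1, 2)


def g(x):
    return 3.0 * x[0] - x[1]   # gradient (3, -1)


estimate = carre_du_champ(f, g, [0.5, -1.0])  # true value is 1*3 + 2*(-1) = 1
```

Only function evaluations at perturbed points are needed, with no neighbor graph or kernel matrix, which is the property the abstract exploits for sample-wise training.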

2606.14347 2026-06-15 cs.LG 新提交

When Language Representations Interact: Separability and Cross-Lingual Effects in LLMs

当语言表示交互时:LLM中的可分离性与跨语言效应

Boris Marinov, Angira Sharma, Christian Schroeder de Witt, Philip Torr, Anisoara Calinescu, Jialin Yu

发表机构 * University of Oxford(牛津大学) Imperial College London(帝国理工学院) University of York(约克大学)

AI总结 通过因果几何分析,研究多语言LLM中语言表示的线性可分离性及跨语言结构依赖,发现语言概念在协方差调整内积下可分离,同语系语言呈现单纯形几何结构。

Comments Trustworthy AI for Good (AI4Good) Workshop @ ICML 2026

详情
AI中文摘要

大型语言模型展现出强大的多语言能力,然而其内部表示难以解释。理解这些交互对于确保多语言系统的可靠行为至关重要。近期研究表明,因果几何结构可以解释某些概念如何被编码为近似线性和可分离的方向,但该框架是否适用于语言身份相关且层次化的多语言模型,尚待探索。我们将因果几何分析应用于多语言LLM,研究了三个模型中的28个双语对比,从而分析语言何时表现为近似独立因子,以及何时存在结构化依赖。我们发现证据表明,语言概念具有稳定的线性表示,在协方差调整(因果)内积下大致可分离,而结构化偏差反映了语言相似性。此外,同一语系(如日耳曼语或罗曼语)内的语言表现出类似单纯形的几何结构,暗示了层次化组织。这些结果将因果几何可解释性扩展到多语言设置,并揭示了多语言LLM表示中可分离性与相似性如何共存,从而激励可解释性分析以诊断何时以及如何预期概念间的结构化依赖。这对可信部署具有重要意义,因为语言间的残余结构可能导致在监控或干预模型时产生意外的跨语言效应。

英文摘要

Large language models exhibit strong multilingual capabilities; however, their internal representations are difficult to interpret. Understanding these interactions is important for ensuring reliable behavior in multilingual systems. Recent work has shown that causal-geometric structure can explain how certain concepts are encoded as approximately linear and separable directions, but whether this framework extends to multilingual models, where language identity is correlated and hierarchical, is underexplored. We apply causal-geometric analysis to multilingual LLMs, studying 28 bilingual contrasts across three models, allowing us to analyze when languages behave as approximately independent factors and when structured dependencies persist. We find evidence that language concepts admit stable linear representations that are largely separable under a covariance-adjusted (causal) inner product, with structured deviations reflecting linguistic similarity. Moreover, languages within the same family (such as Germanic or Romance) exhibit a simplex-like geometric structure, suggesting hierarchical organization. These results extend causal-geometric interpretability to multilingual settings and provide insight into how separability and similarity may exist in multilingual LLM representations, motivating interpretability analyses that diagnose when and how structured dependencies between concepts can be anticipated. This has implications for trustworthy deployment, as residual structure between languages may lead to unintended cross-lingual effects when models are monitored or intervened upon.

2606.14662 2026-06-15 cs.LG cs.SD 新提交

Beyond task performance: Decoding bioacoustic embeddings with speech features

超越任务性能:用语音特征解码生物声学嵌入

Ines Nolasco, Jules Cauzinille, Marius Miron, Gagan Narula, Milad Alizadeh, Emmanuel Fernandez, Matthieu Geist, Ellen Gilsenan-McMahon, Olivier Pietquin, Emmanuel Chemla, Sara Keen

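Affiliations * Earth Species Project

AI summary Using linear and nonlinear regression probes, this study reveals which speech features pretrained bioacoustic embeddings encode, finds that different models cover complementary parts of the acoustic space, and proposes model selection guidance based on feature recoverability.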

Comments Accepted at Interspeech 2026

详情
AI中文摘要

预训练音频嵌入在生物声学中是标准做法,但关于这些模型编码了哪些声学特征以及哪些特征对特定任务有用,我们知之甚少。这阻碍了透明度,并限制了向稀有物种或数据稀缺领域的扩展。在这里,我们揭示了生物声学表示中编码了哪些类似语音的特征。使用跨越六个分类群的88个eGeMAPS特征,我们应用线性和非线性回归探针来量化每个模型捕获了哪些声学属性。结果证实了“没有免费午餐”的模式:没有单个模型能捕获完整的特征空间。拼接嵌入实现了最高性能,表明模型之间互补的声学空间覆盖。响度特征编码最好($R^2 = 0.76$),而F0最难恢复($R^2 = 0.33$)。通过将可恢复性与每个物种的特征显著性(NMI)交叉引用,我们为生物声学得出了数据驱动的模型选择指南。

英文摘要

Pretrained audio embeddings are standard in bioacoustics, yet little is known about which acoustic features these models encode, nor which are useful for a given task. This hinders transparency and limits extension to rare species or data-scarce domains. Here we reveal which speech-like features are encoded in bioacoustic representations. Using the 88 eGeMAPS features across six taxonomic groups, we apply linear and nonlinear regression probes to quantify which acoustic properties each model captures. Results confirm a "no free lunch" pattern: no single model captures the full feature space. A concatenated embedding achieves the highest performance, suggesting complementary acoustic space coverage across models. Loudness features are best encoded ($R^2 = 0.76$) while F0 is hardest to recover ($R^2 = 0.33$). By cross-referencing recoverability with per-species feature salience (NMI), we derive data-driven model selection guidance for bioacoustics.
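To make the probing idea concrete, the following is a minimal sketch of a linear regression probe scored by held-out $R^2$. It is not the authors' pipeline: the ridge penalty, the synthetic "embeddings", and the two toy targets (`loudness`, `pitch`) are all assumptions made for illustration.

```python
import numpy as np

def linear_probe_r2(train_x, train_y, test_x, test_y, ridge=1e-3):
    """Fit a ridge-regularised linear probe and return held-out R^2."""
    # Append a constant column so the probe can fit an offset.
    xb = np.hstack([train_x, np.ones((train_x.shape[0], 1))])
    tb = np.hstack([test_x, np.ones((test_x.shape[0], 1))])
    # Closed-form ridge solution: (X^T X + ridge * I)^-1 X^T y
    gram = xb.T @ xb + ridge * np.eye(xb.shape[1])
    weights = np.linalg.solve(gram, xb.T @ train_y)
    residual = test_y - tb @ weights
    ss_res = float(np.sum(residual ** 2))
    ss_tot = float(np.sum((test_y - test_y.mean()) ** 2))
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(400, 16))   # synthetic stand-in for model embeddings
direction = rng.normal(size=16)
# Hypothetical target that is linearly encoded in the embeddings.
loudness = embeddings @ direction + 0.1 * rng.normal(size=400)
# Hypothetical target that is independent of the embeddings.
pitch = rng.normal(size=400)

r2_loud = linear_probe_r2(embeddings[:300], loudness[:300],
                          embeddings[300:], loudness[300:])
r2_pitch = linear_probe_r2(embeddings[:300], pitch[:300],
                           embeddings[300:], pitch[300:])
print(f"encoded target R^2: {r2_loud:.2f}, unencoded target R^2: {r2_pitch:.2f}")
```

A high held-out $R^2$ suggests the feature is linearly recoverable from the embedding, while a value near or below zero suggests it is not, which is the kind of contrast the abstract reports between loudness and F0.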

2606.14703 2026-06-15 cs.CV cs.CL cs.LG 交叉投稿

Gaze Heads: How VLMs Look at What They Describe

注视头:视觉语言模型如何观察它们所描述的内容

Rohit Gandikota, David Bau

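Affiliations * Northeastern University

AI summary The paper finds a set of "gaze heads" in the language backbone of vision-language models whose attention tracks the image region currently being described; intervening on these heads steers which region the model describes, with 83.1% accuracy.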

详情
AI中文摘要

视觉语言模型在内部如何解决描述图像的任务远非显而易见。我们发现模型为此发展出一种特定机制:其语言模型骨干中的一小部分注意力头(我们称之为注视头),其注意力跟踪模型当前正在描述的图像区域。我们通过简单的相关性得分从几次前向传播中发现了它们,使用连环漫画作为受控测试平台,其中叙事顺序在空间上展开。这些注视头不仅跟踪正在描述的图像标记:将它们的注意力重定向到所选区域会强制视觉语言模型描述该区域。对前100个注视头(少于所有头的9%)进行单次注意力掩码干预,以83.1%的准确率将模型的答案引导到任何选定的漫画面板,而对随机头进行相同干预则无法重定向答案,并且对所有头进行干预会破坏生成。相同的杠杆还扩展到连续控制:在生成过程中切换注视目标会使模型在几个标记内结束当前面板描述并转向新面板。在漫画之外,相同的干预将答案重定向到自然COCO图像中的选定区域。该机制进一步在2B到32B参数的模型大小以及其他视觉语言模型架构中重复出现,尽管一些冻结编码器系列没有显示可比较的头集。更广泛地说,这表明通过机制分析识别的目标编辑可以作为实用的推理时杠杆来引导多模态模型行为,而无需任何重新训练。我们的代码、交互式演示和数据集可在以下网址获取:https://gaze.baulab.info/

英文摘要

How a vision-language model internally solves the task of describing an image is far from obvious. We find that the model develops a specific mechanism for this: a small set of attention heads in its language-model backbone, which we call gaze heads, whose attention tracks the image region the model is currently describing. We find them with a simple correlation score from a few forward passes, using comic strips as a controlled testbed where narrative order is laid out spatially. These gaze heads do not just track the image tokens being described: redirecting their attention to a chosen region forces the VLM to describe that region instead. A single attention-mask intervention on the top-100 gaze heads, fewer than 9% of all heads, steers the model's answer to any chosen comic panel at 83.1% accuracy, while the same intervention on random heads fails to redirect the answer, and intervening on all heads destroys generation. The same lever also extends to continuous control: switching the gaze target mid-generation makes the model wrap up its current panel description and move to the new one within a few tokens. Beyond comics, the same intervention redirects answers to chosen regions in natural COCO images. The mechanism further recurs across model sizes from 2B to 32B parameters and across other VLM architectures, although some frozen-encoder families show no comparable head set. More broadly, this shows that targeted edits identified through mechanistic analysis can serve as practical inference-time levers for steering multimodal model behavior, without any retraining. Our code, interactive demo, and datasets are available at https://gaze.baulab.info/

2207.03116 2026-06-15 cs.LG math.GR 版本更新

Equivariant Representation Learning via Class-Pose Decomposition

通过类-姿态分解的等变表示学习

Giovanni Luca Marchetti, Gustaf Tegnér, Anastasiia Varava, Danica Kragic

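Affiliations * School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology

AI summary Proposes decomposing the latent space into an invariant factor and the symmetry group, and learning equivariant representations under supervision from relative symmetry information; the resulting representations are lossless, interpretable, and disentangled, and outperform other equivariant representation learning frameworks in experiments.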

Comments 12 pages

详情
AI中文摘要

我们提出了一种通用的学习方法,用于学习对数据对称性等变的表示。我们的核心思想是将潜在空间分解为一个不变因子和对称群本身。这些组件在语义上分别对应于内在数据类别和姿态。学习器基于相对对称信息的监督,在鼓励等变性的损失上进行训练。该方法受到群论理论结果的启发,并保证了表示是无损、可解释且解耦的。我们通过涉及多种对称性数据集的实验进行了实证研究。结果表明,我们的表示捕捉了数据的几何结构,并优于其他等变表示学习框架。

英文摘要

We introduce a general method for learning representations that are equivariant to symmetries of data. Our central idea is to decompose the latent space into an invariant factor and the symmetry group itself. The components semantically correspond to intrinsic data classes and poses respectively. The learner is trained on a loss encouraging equivariance based on supervision from relative symmetry information. The approach is motivated by theoretical results from group theory and guarantees representations that are lossless, interpretable and disentangled. We provide an empirical investigation via experiments involving datasets with a variety of symmetries. Results show that our representations capture the geometry of data and outperform other equivariant representation learning frameworks.

2505.16077 2026-06-15 cs.LG 版本更新

Ensembling Sparse Autoencoders

集成稀疏自编码器

Soham Gadgil, Chris Lin, Su-In Lee

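AI summary To address the limitation that a single sparse autoencoder captures only a limited subset of the features in the activation space, the paper ensembles multiple SAEs through naive bagging and boosting, justifies both theoretically as ways to reduce reconstruction error, and shows experimentally that ensembles beat an expanded single SAE on reconstruction quality, stability, and downstream tasks.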

Comments Accepted to ICML 2026

详情
AI中文摘要

稀疏自编码器(SAEs)用于将神经网络激活分解为人类可解释的特征。通常,单个SAE学习到的特征用于下游应用。然而,最近研究表明,单个SAE只能捕获从激活空间中提取的特征的有限子集。受此限制,我们引入并形式化了SAE集成。此外,我们提出通过朴素Bagging和Boosting集成多个SAE。在朴素Bagging中,集成使用不同权重初始化训练的SAE;而在Boosting中,集成顺序训练以最小化残差误差的SAE。理论上,朴素Bagging和Boosting被证明是减少重构误差的方法。实证上,我们在三种语言模型和SAE架构设置下评估了我们的集成方法。我们的实证结果表明,与匹配集成中特征数量的扩展SAE相比,集成SAE改善了语言模型激活的重构以及SAE稳定性。此外,在概念检测和虚假相关性去除等下游任务中,SAE集成实现了更好的性能,显示出改进的实际效用。

英文摘要

Sparse autoencoders (SAEs) are used to decompose neural network activations into human-interpretable features. Typically, features learned by a single SAE are used for downstream applications. However, it has recently been shown that a single SAE captures only a limited subset of features that can be extracted from the activation space. Motivated by this limitation, we introduce and formalize SAE ensembles. Furthermore, we propose to ensemble multiple SAEs through naive bagging and boosting. In naive bagging, SAEs trained with different weight initializations are ensembled, whereas in boosting SAEs sequentially trained to minimize the residual error are ensembled. Theoretically, naive bagging and boosting are justified as approaches to reduce reconstruction error. Empirically, we evaluate our ensemble approaches with three settings of language models and SAE architectures. Our empirical results demonstrate that, compared to an expanded SAE that matches the number of features in the ensemble, ensembling SAEs improves the reconstruction of language model activations along with SAE stability. Additionally, on downstream tasks such as concept detection and spurious correlation removal, SAE ensembles achieve better performance, showing improved practical utility.

2601.22108 2026-06-15 cs.LG cs.AI 版本更新

Learning What to Predict: Downstream-Guided Task Design for Continued Pretraining

学习预测什么:下游引导的持续预训练任务设计

Shuqi Ke, Giulia Fanti

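Affiliations * Department of ECE, Carnegie Mellon University

AI summary Proposes V-pretraining, in which a lightweight task designer constructs targets or views for unlabeled batches and uses the first-order reduction in downstream loss as feedback to guide self-supervised updates, improving target capabilities without degrading generalization.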

详情
AI中文摘要

持续预训练通过固定的自监督任务进行优化,但根据下游性能选择检查点,形成了一个粗粒度的反馈循环:实践者评估检查点、改变数据混合或目标、重新开始运行,而单个更新仍然对目标能力视而不见。我们询问是否一小部分可验证的下游示例可以在不直接监督学习器的情况下提供步骤级反馈。我们引入了V-pretraining,它将仅使用自监督损失训练的学习器与一个轻量级任务设计器解耦,该设计器为无标签批次构建目标或视图。给定当前学习器和批次,V-pretraining通过预测诱导的自监督更新后下游损失的一阶减少来评分候选构建。设计器最大化该值;然后学习器应用带有分离目标或视图的更新,因此下游标签永远不会更新学习器参数。我们将V-pretraining实例化为用于语言建模的自适应top-K软目标和用于自监督视觉的学习视图或掩码。在两种模态中,V-pretraining在不降低泛化的情况下提高了目标能力。在挂钟时间匹配的持续预训练下,它仅使用1,024个GSM8K示例作为反馈,提高了Qwen模型的GSM8K Pass@1,包括Qwen2.5-0.5B的单次运行+7.4点增益。在视觉方面,它改善了DINOv3向ADE20K语义分割和NYUv2深度估计的迁移,同时保持了ImageNet线性准确率,表明反馈引导的任务构建可以在不破坏通用表示的情况下提高目标能力。

英文摘要

Continued pretraining is optimized with fixed self-supervised tasks but selected by downstream performance, creating a coarse feedback loop in which practitioners evaluate checkpoints, change data mixtures or objectives, and restart runs, while individual updates remain blind to target capabilities. We ask whether a small set of verifiable downstream examples can provide step-level feedback without directly supervising the learner. We introduce V-pretraining, which decouples a learner trained only with a self-supervised loss from a lightweight task designer that constructs targets or views for unlabeled batches. Given the current learner and batch, V-pretraining scores a candidate construction by predicting the first-order reduction in downstream loss after the induced self-supervised update. The designer maximizes this value; the learner then applies the update with targets or views detached, so downstream labels never update learner parameters. We instantiate V-pretraining as adaptive top-K soft targets for language modeling and learned views or masks for self-supervised vision. Across both modalities, V-pretraining improves target capabilities without degrading generalization. Under wall-clock-matched continued pretraining, it improves GSM8K Pass@1 for Qwen models using 1,024 GSM8K examples only as feedback, including a +7.4 point single-run gain for Qwen2.5-0.5B. In vision, it improves DINOv3 transfer to ADE20K semantic segmentation and NYUv2 depth estimation while preserving ImageNet linear accuracy, suggesting that feedback-guided task construction can improve target capabilities without collapsing general-purpose representations.
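The scoring step described above can be illustrated with a standard first-order Taylor argument: for a plain gradient step with learning rate `lr`, the predicted drop in downstream loss is `lr` times the inner product of the downstream gradient and the self-supervised gradient. The sketch below assumes plain SGD and uses toy quadratic losses; the loss functions, the candidate names, and `first_order_score` are hypothetical, and the paper's actual estimator may differ.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

DOWNSTREAM_TARGET = [1.0, -2.0]   # hypothetical optimum of the downstream task

def downstream_loss(theta):
    return 0.5 * sum((t - g) ** 2 for t, g in zip(theta, DOWNSTREAM_TARGET))

def downstream_grad(theta):
    return [t - g for t, g in zip(theta, DOWNSTREAM_TARGET)]

def ssl_grad(theta, candidate_target):
    # Gradient of the toy self-supervised loss 0.5 * ||theta - candidate_target||^2.
    return [t - c for t, c in zip(theta, candidate_target)]

def first_order_score(theta, candidate_target, lr):
    """Predicted downstream-loss reduction after one SGD step on the SSL loss."""
    return lr * dot(downstream_grad(theta), ssl_grad(theta, candidate_target))

theta = [0.0, 0.0]
lr = 0.01
candidates = {
    "aligned": [2.0, -3.0],
    "orthogonal": [2.0, 1.0],
    "opposed": [-1.0, 2.0],
}
scores = {name: first_order_score(theta, c, lr) for name, c in candidates.items()}
best = max(scores, key=scores.get)

# Apply the self-supervised update only; downstream labels never touch theta.
new_theta = [t - lr * g for t, g in zip(theta, ssl_grad(theta, candidates[best]))]
actual_reduction = downstream_loss(theta) - downstream_loss(new_theta)
print(best, scores[best], actual_reduction)
```

In this toy setting the prediction and the realised reduction differ only by a second-order term in `lr`, which is why the first-order score is a cheap proxy for ranking candidate constructions at small step sizes.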

3. Reinforcement Learning and Sequential Decision-Making 21 papers

2606.14029 2026-06-15 cs.LG 新提交

Utility-Constrained Policy Optimization

效用约束策略优化

Mehrdad Moghimi, Bernardo Avila Pires

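Affiliations * York University; Google DeepMind

AI summary Proposes a simple yet powerful method for utility-constrained MDPs that supports risk-sensitive constraints and does not require fixing constraint limits in advance; it matches or outperforms existing baselines on several Safety Gymnasium benchmark tasks.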

详情
AI中文摘要

约束MDP(CMDP)是将安全性纳入强化学习智能体的广泛采用框架;然而,该框架不支持风险敏感约束。这可能是有问题的:例如,CMDP允许最优解为了满足风险中性约束,混合了罕见的灾难性行为和频繁的过度保守行为。此外,先前的实证结果表明,即使在风险中性评估下,执行更严格的风险敏感约束也能提高性能。纳入风险敏感约束的自然框架是效用约束MDP(UCMDP),但此前没有针对该问题的实用解决方案。在这项工作中,我们为UCMDP和约束RL引入了一种简单而强大的方法。除了允许风险敏感约束外,我们的框架不需要在训练智能体之前预先固定约束限值,只要知道一个合理的范围即可。这增加了策略的灵活性,并且在实践中允许以零额外训练成本调整这些限值。除了受益于框架的通用性外,我们的智能体在实践中表现出强大的性能,在多个Safety Gymnasium基准任务中持续匹配或超越现有基线。

英文摘要

Constrained MDPs (CMDPs) are a widely adopted framework for incorporating safety into RL agents; however, the framework does not support risk-sensitive constraints. This can be problematic: For example, CMDPs allow for optimal solutions that, in order to satisfy the risk-neutral constraints, mix infrequent catastrophic behaviors and frequent, overly conservative ones. Moreover, prior empirical results suggest that enforcing stricter, risk-sensitive constraints can improve performance even under risk-neutral evaluation. The natural framework to incorporate risk-sensitive constraints is utility-constrained MDPs (UCMDPs), but no practical solutions for this problem existed. In this work, we introduce a simple yet powerful methodology for UCMDPs and constrained RL. Besides allowing for risk-sensitive constraints, our framework does not require us to fix constraint limits in advance of training the agent, provided that a sensible range is known. This increases policy flexibility and, in practice, allows for adjustments to these limits at no extra training cost. Besides benefiting from the generality of the framework, our agent shows strong performance in practice, consistently matching or outperforming existing baselines in several Safety Gymnasium benchmark tasks.
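The failure mode described above, where a risk-neutral constraint is satisfied by mixing rare catastrophic outcomes with frequent conservative ones, can be seen in a small numeric sketch. CVaR is used here as one common risk-sensitive measure; it is an assumption for illustration, not necessarily the utility the paper uses, and the cost distributions are invented.

```python
def expected_cost(costs, probs):
    return sum(c * p for c, p in zip(costs, probs))

def cvar(costs, probs, alpha):
    """Average cost over the worst `alpha` fraction of probability mass."""
    remaining = alpha
    total = 0.0
    # Walk outcomes from most to least costly until alpha mass is covered.
    for cost, prob in sorted(zip(costs, probs), reverse=True):
        take = min(prob, remaining)
        total += take * cost
        remaining -= take
        if remaining <= 0:
            break
    return total / alpha

LIMIT = 5.0    # hypothetical constraint limit
ALPHA = 0.1    # hypothetical tail level

# Policy A: usually costless, occasionally catastrophic.
mixed_costs, mixed_probs = [100.0, 0.0], [0.05, 0.95]
# Policy B: always incurs a small cost.
steady_costs, steady_probs = [5.0], [1.0]

for name, costs, probs in [("mixed", mixed_costs, mixed_probs),
                           ("steady", steady_costs, steady_probs)]:
    mean = expected_cost(costs, probs)
    tail = cvar(costs, probs, ALPHA)
    print(f"{name}: E[cost]={mean:.1f} (ok={mean <= LIMIT}), "
          f"CVaR={tail:.1f} (ok={tail <= LIMIT})")
```

Both policies meet the expectation constraint, but only the steady one meets the tail constraint, which is the distinction a risk-sensitive formulation is meant to capture.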

2606.14130 2026-06-15 cs.LG cs.MA 新提交

Contract-Based Compositional Shielding for Safe Multi-Agent Reinforcement Learning

基于合约的组合屏蔽实现安全多智能体强化学习

Omar Adalat, Edwin Hamel-De le Court, Francesco Belardinelli

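Affiliations * Imperial College London; University of Manchester

AI summary Proposes a decentralised shielding method that uses a contract mechanism to coordinate agents' local LTL safety obligations, guaranteeing global safety without centralised runtime control while optimising team reward.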

详情
AI中文摘要

在多智能体强化学习中,当任何智能体无法单方面强制执行全局安全时,就会出现安全协调问题:一个智能体动作的可接受性可能取决于其他智能体的动态。去中心化屏蔽可以在运行时强制执行安全,但纯粹分解的权限通常会排除仅通过协调才能安全的团队最优行为。我们研究了在去中心化执行下训练和部署的智能体的确定性安全保证,无需集中运行时控制即可恢复团队最优的安全行为。智能体共享一个在安全线性时序逻辑片段($\mathsf{LTL}_{\mathsf{safe}}$)中的全局规范$\phi$,并选择局部$\mathsf{LTL}_{\mathsf{safe}}$义务元组,这些义务的合取蕴含全局规范$\phi$。每个智能体可以依赖其他智能体的局部义务作为假设,因为整个合约元组同时被认证,并允许投影到局部动作掩码。在学习时,一个非平稳的多臂赌博机从局部$\mathsf{LTL}_{\mathsf{safe}}$义务库中选择元组以优化团队奖励,同时不放弃端到端安全性。我们在6个环境和15种算法变体上评估了该方法。

英文摘要

Safe coordination problems surface in multi-agent reinforcement learning when global safety cannot be enforced by any agent unilaterally: the admissibility of one agent's action may depend on the dynamics of other agents. Decentralised shields can enforce safety at runtime, but purely factorised permissions often exclude optimal team behaviour that is safe only through coordination. We study deterministic safety guarantees for agents trained and deployed under decentralised execution, recovering team-optimal safe behaviour without centralised runtime control. Agents have a shared global specification $\phi$ in the safety fragment of Linear Temporal Logic ($\mathsf{LTL}_{\mathsf{safe}}$), and select among tuples of local $\mathsf{LTL}_{\mathsf{safe}}$ obligations whose conjunction implies the global specification $\phi$. Each agent may rely on the other agents' local obligations as assumptions because the whole contract tuple is certified simultaneously and allows projection into local action masks. At learning time, a non-stationary multi-armed bandit chooses among a library of local $\mathsf{LTL}_{\mathsf{safe}}$ obligations to select the tuple that optimises team reward, all without forgoing end-to-end safety. We evaluate the approach across 6 environments and 15 algorithmic variants.

2606.14192 2026-06-15 cs.LG 新提交

DRIVE: Distributional and Retrieval-Augmented Bidding with Value Evaluation

DRIVE:基于分布与检索增强的价值评估竞价方法

Miduo Cui, Haochen Wang, Shangqin Mao, Xun Yang, Qianlong Xie, Xingxing Wang, Xuri Ge, Ying Zhou, Zhiwei Xu

发表机构 * Machine Learning, ICML(机器学习,ICML)

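AI summary Proposes DRIVE, a framework that decouples candidate action generation from decision making and combines distributional modeling, retrieval augmentation, and value evaluation to improve offline auto-bidding performance.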

Comments Accepted to ICML 2026

详情
AI中文摘要

自动竞价是实时广告系统的核心组成部分,决策必须在预算和成本约束下优化长期性能,而在线探索风险极高。离线强化学习以及最近基于Transformer的序列建模在从日志数据中学习竞价策略方面显示出前景,但其单峰和纯参数化公式通常将多种有效竞价策略折叠为次优的平均动作,并在稀疏或长尾流量下表现不可靠。为缓解这些限制,我们提出DRIVE(基于分布与检索增强的价值评估竞价),一个统一的基于Transformer的框架,将候选动作生成与决策解耦,用于离线自动竞价。DRIVE结合了分布动作建模、从高质量历史决策中检索增强的候选生成以及基于价值的评估,以在推理时选择最有希望的出价。在AuctionNet和额外离线强化学习基准上的大量实验表明,DRIVE持续改善竞价性能,并在多种基于Transformer的方法上具有良好的泛化能力。

英文摘要

Auto-bidding is a core component of real-time advertising systems, where decisions must optimize long-term performance under budget and cost constraints, while online exploration is prohibitively risky. Offline reinforcement learning and, more recently, Transformer-based sequence modeling have shown promise for learning bidding policies from logged data, but their unimodal and purely parametric formulations often collapse multiple effective bidding strategies into suboptimal averaged actions and perform unreliably under sparse or long-tail traffic. To mitigate these limitations, we propose DRIVE (Distributional and Retrieval-Augmented Bidding with Value Evaluation), a unified Transformer-based framework that decouples candidate action generation from decision making for offline auto-bidding. DRIVE combines distributional action modeling, retrieval-augmented candidate generation from high-quality historical decisions, and value-based evaluation to select the most promising bid at inference time. Extensive experiments on AuctionNet and additional offline reinforcement learning benchmarks demonstrate that DRIVE consistently improves bidding performance and generalizes well across multiple Transformer-based methods.

2606.14536 2026-06-15 cs.LG cs.RO cs.SY eess.SY 新提交

Provably Safe, Yet Scalable Reinforcement Learning

可证明安全且可扩展的强化学习

Kai S. Yun, Zeyang Li, Navid Azizan

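Affiliations * MIT

AI summary Proposes the PS2-RL framework, a two-phase architecture (learn a backup policy that implicitly constructs a control-invariant set, then train an RL policy through a differentiable projection layer) for provably safe and scalable reinforcement learning, evaluated on tasks with state dimensions up to 10.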

详情
AI中文摘要

安全强化学习旨在学习在满足约束的同时优化奖励的策略。主流方法依赖于软约束策略优化,虽取得经验成功,但无法为学习策略提供正式安全保证。相反,具有严格保证的方法通常依赖显式证书函数,其构造需要直接综合和验证控制不变集,这一过程随状态维度扩展性差,且往往导致过于保守的行为。本文提出可证明安全且可扩展的强化学习(PS2-RL)框架,一种新颖的两阶段架构,以可扩展方式学习可证明安全的策略,旨在克服先前方法的关键瓶颈。PS2-RL不显式计算不变集,而是利用学习的备份策略前向积分系统动力学,在线生成隐式控制不变集。第一阶段,通过提出的安全到达值函数训练备份策略,该值函数刻画了用于不变集构造的最优备份策略。第二阶段,通过可微投影层端到端训练RL策略,该投影层严格强制由学习备份策略诱导的安全保证。通过在第一阶段最大化隐式控制不变集的体积,第二阶段得到的PS2策略既高效又可扩展,同时保持可证明安全性。关键的是,PS2-RL对底层RL算法无限制,可插入任何现有训练流程。我们为所提框架建立了理论保证,并在状态维度高达10的机器人控制任务上进行了评估,而在此范围内,先前可证明安全的RL方法难以应对或变得不实用。

英文摘要

Safe reinforcement learning (RL) aims to learn policies that optimize rewards while satisfying constraints. Predominant approaches rely on soft-constrained policy optimization, which has achieved empirical success but does not provide formal safety guarantees for the learned policy. In contrast, methods with strict guarantees typically rely on explicit certificate functions, whose construction requires the direct synthesis and verification of control-invariant sets, a process that scales poorly with state dimension and often yields overly conservative behavior. In this paper, we present the Provably Safe, yet Scalable RL (PS2-RL) framework, a novel two-phase architecture for learning provably safe policies in a scalable manner, designed to overcome the key bottlenecks of prior methods. Rather than explicitly computing invariant sets, PS2-RL leverages a learned backup policy to forward-integrate the system dynamics, generating an implicit control-invariant set online. In the first phase, the backup policy is trained with our proposed safe-arrival value function, which characterizes the optimal backup policy for invariant-set construction. In the second phase, an RL policy is trained end-to-end through a differentiable projection layer that strictly enforces the safety guarantees induced by the learned backup policy. By maximizing the volume of the implicit control-invariant set in the first phase, the resulting PS2 policy from the second phase is performant and scalable, while maintaining provable safety. Crucially, PS2-RL imposes no restrictions on the underlying RL algorithm and can be plugged into any existing training pipeline. We establish theoretical guarantees for the proposed framework and evaluate it on robotic control tasks with state dimensions up to 10, a regime in which prior provably safe RL methods struggle or become impractical.
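The idea of an implicit control-invariant set generated by forward-integrating a backup policy can be sketched on a toy one-dimensional system. Everything below is an assumption for illustration: the double-integrator dynamics, the hand-written braking backup policy (the paper learns it), and the hard switch in `safety_filter`, which stands in for the paper's differentiable projection layer.

```python
DT = 0.1       # integration step
A_MAX = 1.0    # acceleration bound
X_MAX = 10.0   # positions beyond this are unsafe

def step(state, accel):
    """Euler step of a 1-D double integrator with clipped acceleration."""
    x, v = state
    accel = max(-A_MAX, min(A_MAX, accel))
    return (x + v * DT, v + accel * DT)

def backup_policy(state):
    # Hand-written backup: brake as hard as possible while moving forward.
    _, v = state
    return -A_MAX if v > 0 else 0.0

def backup_keeps_safe(state, horizon=200):
    """Membership test for the implicit set: roll the backup policy forward."""
    for _ in range(horizon):
        if state[0] > X_MAX:
            return False
        if state[1] <= 0:
            return True   # stopped (or retreating) inside the safe region
        state = step(state, backup_policy(state))
    return state[0] <= X_MAX

def safety_filter(state, proposed_accel):
    """Keep the proposed action only if the successor is still recoverable."""
    if backup_keeps_safe(step(state, proposed_accel)):
        return proposed_accel
    return backup_policy(state)

state = (0.0, 0.0)
max_x = state[0]
for _ in range(300):
    accel = safety_filter(state, A_MAX)   # nominal policy always pushes forward
    state = step(state, accel)
    max_x = max(max_x, state[0])
print(f"largest position reached: {max_x:.2f} (limit {X_MAX})")
```

Because every accepted state is one from which the backup rollout stays within the limit, the filtered trajectory never leaves the safe region, and no invariant set is ever computed explicitly.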

2606.14650 2026-06-15 cs.LG 新提交

Graph Structured Combinatorial Semi-Bandit with Nonlinear Reward Associations through Separable Signals

具有非线性奖励关联的图结构组合半赌博机通过可分离信号

Christoph Bauschmann, Setareh Maghsudi

发表机构 * IEEE

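AI summary For the graph-structured combinatorial semi-bandit problem, proposes adaptive strategies based on graph-based causal reward modeling, reproducing kernel methods, and Taylor approximation, with performance guarantees sublinear in time and linear in data volume, validated on synthetic and real-world transportation data.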

详情
AI中文摘要

在大量互连数据中识别最优结构需要大量的采样和计算工作。学习和利用潜在的信号依赖关系可以显著提高效率和预测能力,但非线性统计关系的普遍性增加了此类任务的复杂性。在本文中,我们开发了新颖的通用自适应策略,配备了基于图的因果奖励建模、解析再生核方法以及函数过程的泰勒逼近。我们建立了理论性能保证,在时间上呈次线性,在数据量上随时间呈线性。我们的分析涵盖了对噪声干扰、渐进模型收敛和解空间不匹配等多种不确定性的鲁棒性。该框架的通用性通过最小化条件集或对先验估计的依赖得到证实,而各种概述的修改则针对特定或扩展设置。为了证明实际有效性,我们使用基准合成和真实世界交通数据集进行了数值实验。

英文摘要

The identification of optimal structures within vast arrays of interconnected data necessitates significant sampling- and computational effort. Learning and leveraging underlying signal dependencies can improve efficiency and predictive capabilities considerably, but the ubiquity of nonlinear statistical relations amplifies the complexity of such undertakings. In this paper, we develop novel generic and adaptive strategies equipped with routines for graph-based causal reward modeling, analytic reproducing kernel methods, and Taylor approximation of functional processes. We establish theoretical performance guarantees sublinear in time and linear in data volume over time. Our analyses cover robustness to a multitude of uncertainties arising from noise interference, gradual model convergence, and solution space mismatch. The framework's general appeal is substantiated by a minimalistic set of conditions or reliance on prior estimates, while various outlined modifications address specific or extended settings. To demonstrate practical effectiveness, we conduct numerical experiments using both benchmarked synthetic and real-world transportation datasets.

2606.13698 2026-06-15 eess.SY cs.AI cs.LG cs.NI cs.PF cs.SY 交叉投稿

Active Inference for Adaptive Traffic Signal Control in Noisy Nonstationary IoT Environments

嘈杂非平稳物联网环境下自适应交通信号控制的主动推理方法

Dénes Toth, George Ambroladze, Edwin Sundberg, Ali Beikmohammadi, Alfreds Lapkovskis

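Affiliations * Department of Computer Systems and Sciences, Stockholm University

AI summary Proposes an active-inference traffic signal controller that selects phases dynamically by minimizing expected free energy; under sensor occlusion, weather attenuation, and nonstationary demand, it attains lower idle time and CO2 emissions than a deep Q-network and a rule-based heuristic in the noisiest scenarios.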

Comments Submitted to IEEE 12th World Forum on Internet of Things (WF-IoT) 2026

详情
AI中文摘要

在物联网化交叉口的城市交通信号控制必须在传感器遮挡、天气衰减和非平稳需求下保持有效。传统控制器在这些条件下性能下降,学习策略难以审计。为应对这些挑战,我们提出一种针对四臂信号交叉口的主动推理控制器,通过最小化关于各方向拥堵水平的高斯信念的期望自由能(EFE)动态选择相位,形成完全可追踪的决策流程。我们在SUMO交通模拟器中,将控制器与基于规则的启发式方法和深度Q网络(DQN)进行对比,涵盖四种逐渐增加噪声和非平稳性的场景,包括传感器遮挡、恶劣天气和随机事故。每个场景进行100次独立随机评估,主动推理在噪声最大的场景中实现了最低的空闲时间和CO2排放(分别为56,977秒和29.12千克,而DQN为71,741秒和30.56千克)。这些收益以公交优先服务率和相位切换频率的适度代价为代价。

英文摘要

Urban traffic signal control at IoT-instrumented intersections must remain effective under sensor occlusion, weather attenuation, and nonstationary demand. Conventional controllers degrade under these conditions, and learned policies remain difficult to audit. To address these challenges, we propose an active inference controller for a four-arm signalized intersection that dynamically selects phases by minimizing expected free energy (EFE) over Gaussian beliefs about per-direction congestion levels, yielding a fully traceable decision pipeline. We benchmark the controller in a SUMO traffic simulator against a rule-based heuristic and a deep Q-network (DQN) across four scenarios that progressively increase noise and nonstationarity, spanning sensor occlusion, adverse weather, and stochastic accidents. Across 100 independent random evaluations per scenario, active inference attains the lowest idle times and CO2 emissions in the noisiest scenarios (56,977 s and 29.12 kg vs. 71,741 s and 30.56 kg for DQN). These gains come at a modest cost in bus priority service rate and phase switch frequency.

2606.13832 2026-06-15 cs.MA cs.AI cs.CR cs.LG 交叉投稿

Safety-Contract Graph Multi-Agent Reinforcement Learning for Autonomous Network Security Response

安全合约图多智能体强化学习用于自主网络安全响应

Jose Luis Lima de Jesus Silva

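Affiliations * Oxaala Tecnologias; Universidade Federal da Bahia

AI summary Proposes ACD$^3$-GAT, a safety-contract graph MARL framework built on constrained optimization, graph encoding, and counterfactual screening; in CAGE Challenge 4 the downtime violation rate falls from 100% to 0.3% (C-MAPPO-GAT) or 13.8% (ACD$^3$-GAT), balancing safety and performance.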

详情
AI中文摘要

自主网络安全响应系统有望减少安全运营中心(SOC)的响应延迟,但仅基于奖励的多智能体强化学习(MARL)虽然能提高安全奖励,却仍无法部署。我们提出一个安全合约图MARL框架,并实例化为ACD$^3$-GAT(自适应约束反事实决策与图注意力网络编码器),该架构将模拟器观测与可重用运营预算、约束优化、图状态编码和反事实动作筛选分离开来。我们在CAGE Challenge 4中评估该方法,其中智能体在平均恢复时间(MTTR)、误报响应和防火墙变更管理中断的预算下运行。在整个基准测试中,每个无约束方法在100%的评估回合中违反SOC停机预算,平均停机代理成本为311-430,而预算为50。这补充了先前CAGE Challenge 4的发现,表明仅基于奖励的学习缺乏操作纪律。约束MAPPO-GAT(C-MAPPO-GAT)隔离了拉格朗日运营成本控制和预算感知筛选,而ACD$^3$-GAT增加了预算上下文、CVaR尾部风险估计、对手信念状态和图反事实风险传播(G-CRP)。复现比较包括IPPO、MAPPO-GAT、C-MAPPO-GAT和ACD$^3$-GAT的三个200回合种子。C-MAPPO-GAT将停机违规率从100%降至0.3%,平均停机成本从355.4降至15.5(相对于MAPPO-GAT)。ACD$^3$-GAT将平均停机成本降至48.2,违规率为13.8%,使其处于安全合约前沿而非最保守的合规点。拓扑种子和耦合自适应红方过程压力测试保持了这种对比,并显示安全约束策略的最差自适应退化程度低于仅基于奖励的MAPPO-GAT。

英文摘要

Autonomous network-security response systems promise to reduce Security Operations Centre (SOC) reaction latency, but reward-only multi-agent reinforcement learning (MARL) can improve security reward while remaining non-deployable. We present a safety-contract graph MARL framework and instantiate it as ACD$^3$-GAT (Adaptive Constrained Counterfactual Decisioning with a Graph Attention Network encoder), an architecture that separates simulator observations from reusable operational budgets, constrained optimization, graph state encoding, and counterfactual action screening. We evaluate the method in CAGE Challenge 4, where agents operate under budgets for Mean Time to Recover (MTTR), false-positive response, and firewall change-management disruption. Across the benchmark, every unconstrained method violates the SOC downtime budget in 100% of evaluated episodes, with mean downtime proxy costs of 311-430 against a budget of 50. This complements prior CAGE Challenge 4 findings by showing that reward-only learning lacks operational discipline. Constrained MAPPO-GAT (C-MAPPO-GAT) isolates Lagrangian operational-cost control and budget-aware screening, while ACD$^3$-GAT adds budget context, CVaR tail-risk estimation, opponent-belief state, and Graph Counterfactual Risk Propagation (G-CRP). The replicated comparison includes three 200-episode seeds for IPPO, MAPPO-GAT, C-MAPPO-GAT, and ACD$^3$-GAT. C-MAPPO-GAT reduces downtime violation from 100% to 0.3% and mean downtime cost from 355.4 to 15.5 relative to MAPPO-GAT. ACD$^3$-GAT reduces mean downtime cost to 48.2 with a 13.8% violation rate, placing it on the safety-contract frontier rather than at the most conservative compliance point. Topology-seed and coupled adaptive Red-process stress tests preserve this contrast and show lower worst adaptive degradation for safety-constrained policies than reward-only MAPPO-GAT.
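The Lagrangian operational-cost control mentioned above is commonly implemented as dual ascent on a non-negative multiplier. The sketch below shows only that generic update; the function names, the learning rate, and the use of the abstract's downtime figures as inputs are illustrative assumptions, not the paper's implementation.

```python
def update_multiplier(multiplier, mean_cost, budget, lr=0.01):
    """Dual ascent: raise the penalty when over budget, relax it otherwise."""
    return max(0.0, multiplier + lr * (mean_cost - budget))

def penalized_return(reward, cost, multiplier):
    """Objective the policy maximises: reward minus the weighted cost."""
    return reward - multiplier * cost

BUDGET = 50.0   # downtime budget quoted in the abstract

# Mean downtime cost far above budget (355.4): the multiplier grows.
over_budget = update_multiplier(0.5, mean_cost=355.4, budget=BUDGET)
# Mean downtime cost below budget (15.5): the multiplier shrinks toward zero.
under_budget = update_multiplier(0.5, mean_cost=15.5, budget=BUDGET)
# The multiplier is clipped so it never becomes negative.
floored = update_multiplier(0.1, mean_cost=15.5, budget=BUDGET)

print(over_budget, under_budget, floored)
```

Repeating this update alongside policy training makes downtime progressively more expensive until the policy's average cost sits at or below the budget.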

2606.13848 2026-06-15 cs.NI cs.LG 交叉投稿

Temporally Consistent Graph Q-Networks for Intelligent Network Control

时序一致图Q网络用于智能网络控制

Zacharias Veiksaar, Maxime Bouton

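Affiliations * Ericsson Research

AI summary Proposes the Temporally Consistent Graph Q-Network (TC-GQN) algorithm, which uses a graph neural network to learn a task-independent, self-predicting representation of the whole network and coordinates base-station actions through multi-agent reinforcement learning, outperforming baselines on energy saving under quality-of-service constraints.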

Comments 7 pages, 5 figures. Accepted to the 6G AI-RAN Workshop at IEEE INFOCOM 2026. The final published version will be available via IEEE Xplore

详情
AI中文摘要

移动网络复杂性持续增长,下一代网络需支持不断增加的流量负载和更多样化的服务。随着网络复杂性上升,在动态或变化目标下优化天线参数变得愈发具有挑战性。我们提出一种新颖的多智能体强化学习(MARL)算法,用于移动网络的高级控制和编排。时序一致图Q网络(TC-GQN)算法学习整个网络的任务无关自预测表示,该表示聚合所有基站的信息。图神经网络使用全局奖励函数进行训练,基于学习到的全局网络状态编码分配协调的局部动作。我们在模拟环境中评估该算法,以在不同服务质量(QoS)约束下跨多个扇区和多个载波编排节能功能。所提算法在维持QoS的同时提高硬件休眠时间,优于最先进的基于图的基线和竞争性的基于规则的控制器。此外,学习到的表示能够快速适应变化的意图。

英文摘要

Mobile networks continue to grow in complexity and next generation networks are expected to support both increasing traffic loads and more diverse services. As network complexity rises, optimizing antenna parameters under dynamic or changing objectives becomes increasingly challenging. We propose a novel multi-agent reinforcement learning (MARL) algorithm for high-level control and orchestration of mobile networks. The Temporally Consistent Graph Q-Network (TC-GQN) algorithm learns a self-predicting representation of the whole network that is task-independent and aggregates information from all base-stations. A graph neural network is trained using a global reward function to assign coordinated local actions based on the learned encoding of the global network state. We evaluate the algorithm in a simulated environment to orchestrate an energy-saving feature across multiple sectors and multiple carriers under different quality of service (QoS) constraints. The proposed algorithm outperforms state-of-the-art graph-based baselines and a competitive rule-based controller by improving hardware sleep time while maintaining QoS. Moreover, the learned representation enables rapid adaptation to changing intents.

2606.13886 2026-06-15 cs.RO cs.CV cs.LG 交叉投稿

PhysVLA: Towards Physically-Grounded VLA for Embodied Robotic Manipulation

PhysVLA:面向物理基础的VLA用于具身机器人操作

Namai Chandra, Shriram Damodaran, Lin Wang

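Affiliations * IIT Madras; Nanyang Technological University

AI summary Proposes PhysVLA, a plug-and-play inference-time framework that uses a phase-aware finite-state machine and a selective Euler-Lagrange gate to inject physical constraints into any frozen VLA backbone without retraining, improving success rate, stability, and trajectory efficiency.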

Comments 9 pages, 5 figures, supplementary material included

详情
AI中文摘要

视觉-语言-动作(VLA)模型擅长将视觉输入和自然语言指令直接映射到机器人控制策略。然而,由于它们主要针对行为演示数据进行训练,并未明确强制执行刚体动力学或接触约束等基本物理原理。这暴露了一个关键的物理差距:在单步或分块VLA上应用的标准时间平滑以轨迹质量为代价,增加了短期记忆无法解决的失败。为弥补这一差距,我们提出PhysVLA(Physics-VLA),一种即插即用、推理时的框架,旨在包装任何冻结的VLA骨干,无需重新训练、微调或权重访问,每个控制步骤的开销小于1毫秒。PhysVLA拦截预测的控制动作,仅捕获模拟器或系统状态,并应用双层校正:(i)一个相位感知的有限状态机,用于结构化离散任务段(接近、抓取、运输和放置),以及(ii)一个选择性欧拉-拉格朗日门,仅在动力学预言器检测到运动学不一致时激活。在LIBERO-Spatial上使用7自由度Franka Panda对OpenVLA、OpenVLA-OFT、Force-VLA和Generalist-VLA进行评估,该框架实现了高达17%的绝对成功率提升和高达19%的稳定性提升,且无每任务回归,在所有四个骨干上轨迹效率提升高达15%,并在Robosuite Lift跨模拟器扫描中显示出高达10倍的轨迹急动度鲁棒性提升。我们还在真实的Agilex Piper机械臂上通过拾取和放置任务进一步验证了该框架,确认PhysVLA无需重新训练即可迁移到物理硬件,成功率提升高达50%,将物理意识确立为一种可组合、骨干无关的运行时模块。

英文摘要

Vision-Language-Action (VLA) models excel at mapping visual inputs and natural language instructions directly to robotic control policies. However, because they are trained primarily to fit behavioural demonstration data, they do not explicitly enforce fundamental physical principles such as rigid-body dynamics or contact constraints. This exposes a critical physics gap: standard temporal smoothing applied on top of single-step or chunked VLAs trades trajectory quality for added failures that short-term memory cannot resolve. To bridge this gap, we introduce PhysVLA (Physics-VLA), a plug-and-play, inference-time framework designed to wrap any frozen VLA backbone without retraining, fine-tuning, or weight access, with less than 1 ms of overhead per control step. PhysVLA intercepts the predicted control action, captures only the simulator or system state, and applies a dual-layered correction: (i) a phase-aware finite-state machine that structures discrete task segments (approach, grasp, transport, and place), and (ii) a selective Euler-Lagrange gate that activates only when a dynamics oracle detects kinodynamic inconsistency. Evaluated across OpenVLA, OpenVLA-OFT, Force-VLA, and Generalist-VLA on LIBERO-Spatial with a 7-DoF Franka Panda, the framework delivers absolute success rate increases of up to 17% and stability increases of up to 19% with no per-task regressions, improves trajectory efficiency by up to 15% across all four backbones, and shows up to a 10x improvement in trajectory jerk robustness on a Robosuite Lift cross-simulator sweep. We further validate the framework on a real Agilex Piper arm with a pick-and-place task, confirming that PhysVLA transfers to physical hardware without retraining, with success-rate improvements of up to 50%, establishing physical awareness as a composable, backbone-agnostic runtime module.

2606.14211 2026-06-15 cs.AI cs.LG 交叉投稿

Closing the Reflection Gap: A Free Calibration Bonus for Agentic RL

缩小反思差距:面向智能体强化学习的免费校准奖励

Yinglun Zhu

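Affiliations * University of California, Riverside

AI summary To address LLM agents' inaccurate self-assessment after environment feedback, proposes RefGRPO, which computes a free calibration bonus by contrasting the agent's reflection with the actual outcome and schedules its coefficient dynamically, improving both reflection calibration and task accuracy.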

详情
AI中文摘要

LLM越来越多地被部署为与外部环境交互并观察执行结果、错误消息和工具输出等反馈的智能体。一个功能良好的智能体应能利用这些反馈准确评估自身表现。然而,我们发现存在持续的反思差距:LLM智能体在观察到具体环境反馈后,倾向于错误评估自身输出——即使对于它们正确回答的问题也是如此——而标准RL由于信用分配不匹配几乎无济于事。为缩小这一差距,我们提出RefGRPO,一种简单而有效的修复方法,通过两个关键要素增强标准RL算法:一个免费校准奖励,通过对比智能体自身反思与实际结果计算(无需额外奖励模型、LLM评判或外部标注),以及对其系数的动态调度。与标准RL基线相比,我们的方法在五个基准的文本到SQL任务上同时提高了反思校准(例如,将不自信率从44.4%降至7.7%)和任务准确率(例如,从75.1%提升至76.5%)。由此产生的校准反思将智能体转变为基于环境反馈的自身验证器,进一步实现:(i)更好的自我改进,使用反思作为伪奖励而无需结果监督;(ii)更有效的测试时选择性预测,仅提交标记为正确的rollout。

英文摘要

LLMs are increasingly deployed as agents that interact with external environments and observe feedback such as execution results, error messages, and tool outputs. A well-functioning agent should be able to leverage this feedback to accurately assess its own performance. Yet we find a persistent reflection gap: LLM agents tend to mis-assess their own outputs after observing concrete environment feedback -- even for questions they correctly answered -- and standard RL barely helps due to a credit-assignment mismatch. To close this gap, we propose RefGRPO, a simple yet effective fix that augments standard RL algorithms with two key ingredients: a free calibration bonus computed by contrasting the agent's own reflection with the actual outcome (requiring no additional reward model, LLM judge, or external annotation), and a dynamic schedule on its coefficient. Compared to standard RL baselines, our method simultaneously improves reflection calibration (e.g., reduces underconfidence rate $44.4\% \to 7.7\%$) and task accuracy (e.g., $75.1\% \to 76.5\%$) on text-to-SQL across five benchmarks. The resulting calibrated reflection turns the agent into its own verifier grounded in environment feedback, which further enables (i) better self-improvement that uses reflections as pseudo-rewards without outcome supervision, and (ii) more effective test-time selective prediction by committing only to rollouts flagged as correct.
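A minimal sketch of the two ingredients named above: a bonus that pays out when the agent's self-assessment agrees with the observed outcome, and a coefficient that changes over training. The binary form of the bonus, the linear schedule, and its `start` and `end` values are assumptions for illustration; the abstract does not specify them.

```python
def calibration_bonus(reflection_says_correct, actually_correct):
    """1.0 when the agent's self-assessment matches the real outcome, else 0.0."""
    return 1.0 if reflection_says_correct == actually_correct else 0.0

def bonus_coefficient(step, total_steps, start=0.5, end=0.1):
    """Hypothetical linear schedule for the bonus weight."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac

def shaped_reward(task_reward, reflection_says_correct, step, total_steps):
    # The outcome comes from the environment, so no reward model or judge is needed.
    actually_correct = task_reward > 0.5
    bonus = calibration_bonus(reflection_says_correct, actually_correct)
    return task_reward + bonus_coefficient(step, total_steps) * bonus

# Correct answer, agent says "correct": task reward plus the full early bonus.
calibrated = shaped_reward(1.0, True, step=0, total_steps=100)
# Correct answer, agent says "incorrect" (underconfident): no bonus.
underconfident = shaped_reward(1.0, False, step=0, total_steps=100)
# Wrong answer, agent says "incorrect": only the late, smaller bonus.
honest_failure = shaped_reward(0.0, False, step=100, total_steps=100)
print(calibrated, underconfident, honest_failure)
```

Since the bonus depends only on the agent's own reflection and the environment outcome, it adds no annotation cost, which is the sense in which the abstract calls it free.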

2606.14418 2026-06-15 cs.AI cs.LG cs.RO 交叉投稿

Causal Object-Centric Models for Planning with Monte Carlo Tree Search

用于蒙特卡洛树搜索规划的因果对象中心模型

Rodion Vakhitov, Leonid Ugadiarov, Alexey Skrynnik, Aleksandr Panov

发表机构 * MIRAI CogAILab

AI总结 提出COMET算法,结合无监督对象中心编码器和Transformer世界模型,通过动作-槽融合机制和对象因果注意力实现高效规划,在多个基准上优于基线方法。

详情
AI中文摘要

我们提出了COMET(用于高效树搜索的因果对象中心模型),一种基于模型的强化学习算法,在槽结构化的潜在空间中执行蒙特卡洛树搜索。COMET将冻结的无监督对象中心编码器与基于Transformer的世界模型配对,其中通过一种新颖的动作-槽融合机制将动作绑定到对象上,该机制用于槽转移预测。策略和价值头使用对象因果注意力,通过学习到的每槽相关性分数调节令牌交互,使决策集中在任务相关实体上。COMET为MuZero风格的潜在规划增加了显式的对象级归纳偏差。在来自Object-Centric Visual RL基准、ManiSkill、Robosuite和VizDoom的八个视觉和动态多样化的任务中,COMET在训练早期相比对象中心和单一基线实现了更高的平均归一化分数。

英文摘要

We introduce COMET (Causal Object-centric Model for Efficient Tree search), a model-based reinforcement learning algorithm that performs Monte Carlo Tree Search in a slot-structured latent space. COMET pairs a frozen unsupervised object-centric encoder with a transformer-based world model, in which actions are bound to objects through a novel action-slot fusion mechanism that is used in slot transition prediction. Policy and value heads use object-causal attention, modulating token interactions by learned per-slot relevance scores so that decision-making concentrates on task-relevant entities. COMET adds an explicit object-level inductive bias to MuZero-style latent planning. Across eight visually and dynamically diverse tasks from the Object-Centric Visual RL benchmark, ManiSkill, Robosuite, and VizDoom, COMET achieves a higher mean normalized score during the early stages of training compared to object-centric and monolithic baselines.

2605.13217 2026-06-15 cs.CL cs.AI cs.LG 交叉投稿

GAGPO: Generalized Advantage Grouped Policy Optimization

GAGPO:广义优势分组策略优化

Siyuan Zhu, Chao Yu, Rongxin Yang, Zongkai Liu, Jinjun Hu, Qiwen Chen, Yibo Zhang

发表机构 * School of Computer Science and Engineering, Sun Yat-sen University(中山大学计算机科学与工程学院) Meituan(美团)

AI总结 GAGPO提出一种无需价值模型的强化学习方法,通过分组价值代理和动作重要性比,实现多轮任务中精确的时间信用分配,实验表明其在ALFWorld和WebShop上优于现有基线。

详情
AI中文摘要

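强化学习已成为大语言模型智能体后训练的强大范式,但多轮环境中的信用分配仍是一大挑战。智能体往往只在回合结束时获得稀疏的轨迹级奖励,因此难以判断哪些中间动作导致了成功或失败。于是,如何在不依赖代价高昂的辅助价值模型的前提下,将延迟的结果回传到各个决策步骤,仍是一个开放问题。我们提出广义优势分组策略优化(GAGPO),一种无需评论家的强化学习方法,用于精确的、与步骤对齐的时间信用分配。GAGPO从采样的rollout中构建非参数的分组价值代理,并用它计算TD/GAE风格的时间优势,将结果监督沿时间递归地向后传播。结合分组优势归一化和动作级重要性比率,GAGPO直接从多轮轨迹中提取稳定、局部化的优化信号。在ALFWorld和WebShop上的实验表明,GAGPO优于强大的强化学习基线。进一步分析显示其早期学习更快、交互效率更高、优化动态更平滑,表明GAGPO为多轮智能体强化学习提供了一个简单而有效的框架。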

英文摘要

Reinforcement learning has become a powerful paradigm for post-training large language model agents, yet credit assignment in multi-turn environments remains a challenge. Agents often receive sparse, trajectory-level rewards only at the end of an episode, making it difficult to determine which intermediate actions contributed to success or failure. As a result, propagating delayed outcomes back to individual decision steps without relying on costly auxiliary value models remains an open problem. We propose Generalized Advantage Grouped Policy Optimization (GAGPO), a critic-free reinforcement learning method for precise, step-aligned temporal credit assignment. GAGPO constructs a non-parametric grouped value proxy from sampled rollouts and uses it to compute TD/GAE-style temporal advantages, recursively propagating outcome supervision backward through time. Combined with group-wise advantage normalization and an action-level importance ratio, GAGPO extracts stable, localized optimization signals directly from multi-turn trajectories. Experiments on ALFWorld and WebShop show that GAGPO outperforms strong reinforcement learning baselines. Further analyses demonstrate faster early-stage learning, improved interaction efficiency, and smoother optimization dynamics, suggesting that GAGPO offers a simple yet effective framework for multi-turn agentic reinforcement learning.
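The critic-free advantage computation can be sketched as follows. This is an assumption-laden illustration rather than the paper's algorithm: the grouping rule used here (average the final outcome over rollouts in the group that reach step t) and both function names are hypothetical, and only the GAE recursion itself is standard.

```python
def grouped_value_proxy(outcomes, lengths):
    """Assumed proxy: V[t] = mean final outcome over group rollouts that reach step t."""
    horizon = max(lengths)
    values = []
    for t in range(horizon):
        alive = [o for o, n in zip(outcomes, lengths) if n > t]
        values.append(sum(alive) / len(alive))
    return values


def gae_advantages(rewards, values, gamma=1.0, lam=0.95):
    """Standard GAE recursion for one rollout; the value after its last step is taken as 0."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(rewards) else 0.0
        delta = rewards[t] + gamma * next_value - values[t]  # TD residual
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


# Two rollouts of the same task: one succeeds, one fails, reward only at the end.
values = grouped_value_proxy(outcomes=[1.0, 0.0], lengths=[2, 2])
adv_success = gae_advantages([0.0, 1.0], values, gamma=1.0, lam=1.0)
adv_failure = gae_advantages([0.0, 0.0], values, gamma=1.0, lam=1.0)
```

No learned value network appears anywhere: the baseline comes entirely from sibling rollouts, and the sparse terminal reward is propagated backward by the recursion.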

2507.20068 2026-06-15 cs.LG stat.ML 版本更新

PERRY: Policy Evaluation with Confidence Intervals using Auxiliary Data

PERRY: 使用辅助数据的策略评估与置信区间

Aishwarya Mandyam, Jason Meng, Ge Gao, Jiankai Sun, Mac Schwager, Barbara E. Engelhardt, Emma Brunskill

发表机构 * Stanford University(斯坦福大学) Gladstone Institutes(格拉德斯通研究所)

AI总结 提出两种方法,利用辅助数据构建离线策略评估的置信区间,通过共形预测和双重稳健估计,在多个模拟和真实医疗数据集上验证有效性。

详情
AI中文摘要

离线策略评估(OPE)方法在部署前估计新强化学习(RL)策略的价值。最近的研究表明,利用辅助数据集(例如由生成模型合成的数据)可以提高OPE方法的准确性。不幸的是,此类辅助数据集也可能存在偏差,并且现有在OPE中使用数据增强的方法缺乏原则性的不确定性量化。在医疗等高风险领域,可靠的不确定性估计对于确保RL策略的安全和知情部署至关重要。在这项工作中,我们提出了两种方法来构建带有数据增强的OPE的有效置信区间。第一种方法提供了关于$V^{\pi}(s)$的置信区间,即条件于初始状态$s$的策略价值。为此,我们引入了一种适用于具有连续状态空间的马尔可夫决策过程(MDP)的新共形预测方法,将先前工作扩展到更高维度的设置。其次,我们考虑更常见的任务,即估计所有初始状态上的平均策略性能$V^{\pi}$;我们引入了一种方法,该方法借鉴了双重稳健估计和预测驱动推断的思想。在涵盖库存管理、机器人、医疗以及来自MIMIC-IV的真实医疗数据集的模拟器中,我们发现我们的方法可以有效利用辅助数据,并一致地产生覆盖真实策略价值的置信区间,这与先前提出的方法不同。我们的工作使得OPE能够在高风险领域提供严格的不确定性估计成为可能。

英文摘要

Off-policy evaluation (OPE) methods estimate the value of a new reinforcement learning (RL) policy prior to deployment. Recent advances have shown that leveraging auxiliary datasets, such as those synthesized by generative models, can improve the accuracy of OPE methods. Unfortunately, such auxiliary datasets may also be biased, and existing methods for using data augmentation within OPE lack principled uncertainty quantification. In high-stakes domains like healthcare, reliable uncertainty estimates are important for ensuring safe and informed deployment of RL policies. In this work, we propose two methods to construct valid confidence intervals for OPE with data augmentation. The first provides a confidence interval over $V^{\pi}(s)$, the policy value conditioned on an initial state $s$. To do so, we introduce a new conformal prediction method suitable for Markov Decision Processes (MDPs) with continuous state spaces, extending prior work to higher-dimensional settings. Second, we consider the more common task of estimating the average policy performance over all initial states, $V^{\pi}$; we introduce a method that draws on ideas from doubly robust estimation and prediction-powered inference. Across simulators spanning inventory management, robotics, and healthcare, as well as a real healthcare dataset from MIMIC-IV, we find that our methods can effectively leverage auxiliary data and consistently produce confidence intervals that cover the ground truth policy values, unlike previously proposed methods. Our work enables a future in which OPE can provide rigorous uncertainty estimates for high-stakes domains.

2509.18930 2026-06-15 cs.LG cs.AI 版本更新

Tackling GNARLy Problems: Graph Neural Algorithmic Reasoning Reimagined through Reinforcement Learning

解决GNARLy问题:通过强化学习重新构想图神经算法推理

Alex Schutz, Victor-Alexandru Darvariu, Efimia Panagiotaki, Bruno Lacerda, Nick Hawes

发表机构 * Oxford Robotics Institute, University of Oxford(牛津大学机器人研究所) Stateful Robotics

AI总结 提出GNARL框架,将算法轨迹学习转化为马尔可夫决策过程,结合模仿学习和强化学习,在CLRS-30问题上取得高精度,适用于NP难问题及无专家算法场景。

详情
AI中文摘要

神经算法推理(NAR)是一种通过监督学习训练神经网络执行经典算法的范式。尽管取得了成功,但仍存在重要局限性:无法在不进行后处理的情况下构建有效解,无法推理多个正确解,在组合NP难问题上性能差,且不适用于尚未已知强算法的问题。为了解决这些局限性,我们将学习算法轨迹的问题重新定义为马尔可夫决策过程,这为解构建过程施加了结构,并解锁了模仿学习和强化学习(RL)的强大工具。我们提出了GNARL框架,包括将问题从NAR转化为RL的方法论,以及适用于广泛图问题的学习架构。我们在多个CLRS-30问题上取得了非常高的图准确率结果,性能匹配或超过针对NP难问题的更窄NAR方法,并且值得注意的是,即使在缺乏专家算法的情况下也能适用。

英文摘要

Neural algorithmic reasoning (NAR) is a paradigm that trains neural networks to execute classic algorithms by supervised learning. Despite its successes, important limitations remain: inability to construct valid solutions without post-processing and to reason about multiple correct ones, poor performance on combinatorial NP-hard problems, and inapplicability to problems for which strong algorithms are not yet known. To address these limitations, we reframe the problem of learning algorithm trajectories as a Markov decision process, which imposes structure on the solution construction procedure and unlocks the powerful tools of imitation and reinforcement learning (RL). We propose the GNARL framework, encompassing the methodology to translate problem formulations from NAR to RL and a learning architecture suitable for a wide range of graph-based problems. We achieve very high graph accuracy results on several CLRS-30 problems, performance matching or exceeding much narrower NAR approaches for NP-hard problems and, remarkably, applicability even when lacking an expert algorithm.

2510.02695 2026-06-15 cs.LG cs.AI 版本更新

RAMAC: Multimodal Risk-Aware Offline Reinforcement Learning and the Role of Behavior Regularization

RAMAC: 多模态风险感知离线强化学习及行为正则化的作用

Kai Fukazawa, Kunal Mundada, Iman Soltani

AI总结 提出RAMAC框架,结合分布性评论家与生成式演员(如扩散模型),通过条件风险价值与行为克隆的复合目标实现离线强化学习中的风险敏感学习,抑制分布外动作并提升CVaR。

Comments ICML 2026

详情
AI中文摘要

在安全关键领域中,当在线数据收集不可行时,离线强化学习(RL)只有在策略能够实现高回报且避免灾难性的下尾风险时才具有吸引力。先前关于风险厌恶离线RL的工作通过(i)基于值/模型的悲观主义或(ii)限制策略类以限制表达能力来实现安全性,而扩散/流式表达性生成策略主要在中性风险设置中使用。我们引入了\textbf{风险感知多模态演员-评论家(RAMAC)},一个简单、模块化、无模型的框架,它将表达性生成演员(例如扩散/流)与分布性评论家相结合,并优化一个结合条件风险价值(CVaR)与行为克隆(BC)的复合目标,从而在复杂的多模态场景中实现风险敏感学习。由于分布外(OOD)动作是离线RL中灾难性失败的主要驱动因素,我们进一步提供了一个目标层面的分析,表明通过BC控制行为发散可以抑制OOD动作并稳定CVaR。使用扩散演员实例化RAMAC,我们在二维风险赌博机上展示了这些见解,并在Stochastic-D4RL上进行了评估,观察到在保持高回报的同时,$\mathrm{CVaR}_{0.1}$的一致提升。代码和实验结果可在\href{https://kaifukazawa.github.io/ramac-project/}{项目网站}上获取。

英文摘要

In safety-critical domains where online data collection is infeasible, offline reinforcement learning (RL) is attractive only if policies achieve high returns without catastrophic lower-tail risk. Prior work on risk-averse offline RL achieves safety at the cost of either (i) value/model-based pessimism or (ii) restricted policy classes that limit expressiveness, whereas diffusion/flow-based expressive generative policies have largely been used in risk-neutral settings. We introduce \textbf{Risk-Aware Multimodal Actor-Critic (RAMAC)}, a simple, modular, model-free framework that couples an expressive generative actor (e.g., diffusion/flow) with a distributional critic and optimizes a composite objective that combines Conditional Value-at-Risk (CVaR) with behavioral cloning (BC), enabling risk-sensitive learning in complex multimodal scenarios. Since out-of-distribution (OOD) actions are a major driver of catastrophic failures in offline RL, we further provide an objective-level analysis showing that controlling behavior divergence via BC suppresses OOD actions and stabilizes CVaR. Instantiating RAMAC with a diffusion actor, we illustrate these insights on a 2-D risky bandit and evaluate on Stochastic-D4RL, observing consistent gains in $\mathrm{CVaR}_{0.1}$ while maintaining strong returns. The code and experimental results are available on the \href{https://kaifukazawa.github.io/ramac-project/}{project website}.
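For readers unfamiliar with the metric, $\mathrm{CVaR}_{\alpha}$ of the return is the mean of the worst $\alpha$-fraction of outcomes. The sketch below is a plain empirical estimator over sampled returns, offered as an illustration only; it is not RAMAC's distributional critic, and the function name is hypothetical.

```python
import math


def empirical_cvar(returns, alpha=0.1):
    """Mean of the lowest ceil(alpha * n) returns (lower-tail CVaR, higher is safer)."""
    k = max(1, math.ceil(alpha * len(returns)))
    worst = sorted(returns)[:k]
    return sum(worst) / k


# Eight episode returns; with alpha = 0.25 the tail holds the two worst episodes.
episode_returns = [5.0, -10.0, 3.0, 0.0, 7.0, -2.0, 4.0, 1.0]
tail_risk = empirical_cvar(episode_returns, alpha=0.25)
```

Two policies with the same average return can differ sharply on this quantity, which is why the abstract reports CVaR gains alongside mean returns.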

2601.19810 2026-06-15 cs.LG cs.AI cs.RO 版本更新

Unsupervised Learning of Efficient Exploration: Pre-training Adaptive Policies via Self-Imposed Goals

高效探索的无监督学习:通过自我设定目标预训练自适应策略

Octavio Pappalardo

发表机构 * University College London (UCL)(伦敦大学学院(UCL))

AI总结 提出ULEE方法,结合上下文学习器与对抗性目标生成策略,在无监督元学习框架中优化多回合探索与适应,提升零样本和少样本性能。

Comments ICLR 2026; v2 adds link to code: https://github.com/Octavio-Pappalardo/ulee-jax

详情
Journal ref
The Fourteenth International Conference on Learning Representations, 2026
AI中文摘要

无监督预训练可以为强化学习智能体提供先验知识,加速下游任务的学习。一个基于人类发展的有前景方向是研究智能体通过设定和追求自身目标来学习。核心挑战在于如何有效地生成、选择并从这些目标中学习。我们的关注点是下游任务的广泛分布,其中零样本解决每个任务是不可行的。当目标任务位于预训练分布之外或智能体未知其身份时,这种设置自然出现。在这项工作中,我们(i)在元学习框架内优化高效的多回合探索和适应,以及(ii)用智能体适应后性能的演化估计来指导训练课程。我们提出了ULEE,一种无监督元学习方法,它将上下文学习器与对抗性目标生成策略相结合,该策略将训练维持在智能体能力的前沿。在XLand-MiniGrid基准测试中,ULEE预训练产生了改进的探索和适应能力,这些能力泛化到新的目标、环境动态和地图结构。得到的策略获得了改进的零样本和少样本性能,并为更长的微调过程提供了强初始化。它优于从头学习、DIAYN预训练和替代课程。代码可在以下网址获取:https://github.com/Octavio-Pappalardo/ulee-jax

英文摘要

Unsupervised pre-training can equip reinforcement learning agents with prior knowledge and accelerate learning in downstream tasks. A promising direction, grounded in human development, investigates agents that learn by setting and pursuing their own goals. The core challenge lies in how to effectively generate, select, and learn from such goals. Our focus is on broad distributions of downstream tasks where solving every task zero-shot is infeasible. Such settings naturally arise when the target tasks lie outside of the pre-training distribution or when their identities are unknown to the agent. In this work, we (i) optimize for efficient multi-episode exploration and adaptation within a meta-learning framework, and (ii) guide the training curriculum with evolving estimates of the agent's post-adaptation performance. We present ULEE, an unsupervised meta-learning method that combines an in-context learner with an adversarial goal-generation strategy that maintains training at the frontier of the agent's capabilities. On XLand-MiniGrid benchmarks, ULEE pre-training yields improved exploration and adaptation abilities that generalize to novel objectives, environment dynamics, and map structures. The resulting policy attains improved zero-shot and few-shot performance, and provides a strong initialization for longer fine-tuning processes. It outperforms learning from scratch, DIAYN pre-training, and alternative curricula. Code is available at: https://github.com/Octavio-Pappalardo/ulee-jax

2602.04879 2026-06-15 cs.LG cs.AI cs.CL 版本更新

Rethinking the Trust Region in LLM Reinforcement Learning

重新思考LLM强化学习中的信任区域

Penghui Qi, Xiangxin Zhou, Zichen Liu, Tianyu Pang, Chao Du, Min Lin, Wee Sun Lee

发表机构 * University of California, Berkeley(加州大学伯克利分校) University of Toronto(多伦多大学)

AI总结 针对PPO在LLM微调中因词表大导致的训练不稳定问题,提出基于策略散度直接约束的DPPO算法,并引入高效近似方法。

详情
AI中文摘要

强化学习已成为微调大型语言模型(LLM)的基石,其中近端策略优化(PPO)是事实上的标准算法。尽管其普遍存在,我们认为PPO中的核心比率裁剪机制在结构上不适合LLM固有的大词表。PPO基于采样令牌的概率比率约束策略更新,该比率是对真实策略散度的有噪单样本蒙特卡洛估计。这导致次优的学习动态:低概率令牌的更新被过度惩罚,而高概率令牌中潜在的灾难性变化却约束不足,导致训练效率低下和不稳定。为解决此问题,我们提出散度近端策略优化(DPPO),用基于策略散度(如总变差或KL)直接估计的更原则性约束替代启发式裁剪。为避免巨大内存占用,我们引入了高效的二元和Top-K近似,以可忽略的开销捕获本质散度。大量实证评估表明,DPPO相比现有方法实现了更优的训练稳定性和效率,为基于RL的LLM微调提供了更稳健的基础。我们的代码可在https://github.com/sail-sg/Stable-RL获取。

英文摘要

Reinforcement learning (RL) has become a cornerstone for fine-tuning Large Language Models (LLMs), with Proximal Policy Optimization (PPO) serving as the de facto standard algorithm. Despite its ubiquity, we argue that the core ratio clipping mechanism in PPO is structurally ill-suited for the large vocabularies inherent to LLMs. PPO constrains policy updates based on the probability ratio of sampled tokens, which serves as a noisy single-sample Monte Carlo estimate of the true policy divergence. This creates a sub-optimal learning dynamic: updates to low-probability tokens are aggressively over-penalized, while potentially catastrophic shifts in high-probability tokens are under-constrained, leading to training inefficiency and instability. To address this, we propose Divergence Proximal Policy Optimization (DPPO), which substitutes heuristic clipping with a more principled constraint based on a direct estimate of policy divergence (e.g., Total Variation or KL). To avoid a huge memory footprint, we introduce efficient Binary and Top-K approximations that capture the essential divergence with negligible overhead. Extensive empirical evaluations demonstrate that DPPO achieves superior training stability and efficiency compared to existing methods, offering a more robust foundation for RL-based LLM fine-tuning. Our code is available at https://github.com/sail-sg/Stable-RL.
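The mismatch between the sampled-token ratio and the actual policy divergence can be seen on a toy three-token vocabulary. The sketch below is illustrative only: the function names are hypothetical, and the two-bucket and top-k-plus-remainder constructions are one plausible reading of the "Binary" and "Top-K" approximations, not the authors' implementation.

```python
def total_variation(p, q):
    """Exact total variation distance between two categorical distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def binary_tv(p, q, token):
    """Assumed 'binary' view: buckets {token, everything else}; TV reduces to |p - q| on the token."""
    return abs(p[token] - q[token])


def topk_tv(p, q, k):
    """Assumed 'top-k' view: the k most likely tokens under p, plus one remainder bucket."""
    top = sorted(range(len(p)), key=lambda i: p[i], reverse=True)[:k]
    head = sum(abs(p[i] - q[i]) for i in top)
    tail = abs(sum(q[i] for i in top) - sum(p[i] for i in top))
    return 0.5 * (head + tail)


old = [0.90, 0.09, 0.01]

# Case A: a rare token triples in probability. The ratio is roughly 3.0, far outside
# a typical clip range of [0.8, 1.2], yet the distribution moved by a TV of only about 0.02.
new_a = [0.88, 0.09, 0.03]
ratio_a = new_a[2] / old[2]
tv_a = total_variation(old, new_a)

# Case B: the dominant token loses 0.15 of mass. The ratio (about 0.83) stays inside
# the clip range, although the TV is about 0.15.
new_b = [0.75, 0.24, 0.01]
ratio_b = new_b[0] / old[0]
tv_b = total_variation(old, new_b)
```

Case A is the over-penalized update and Case B the under-constrained one. Both approximations avoid materializing a divergence over the full vocabulary, which is where the memory saving would come from.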

2602.14169 2026-06-15 cs.LG cs.AI cs.CL 版本更新

Deep Dense Exploration for LLM Reinforcement Learning via Pivot-Driven Resampling

基于枢轴驱动重采样的LLM强化学习深度密集探索

Yiran Guo, Zhongjian Qiao, Yingqi Xie, Jie Liu, Dan Ye, Ruiqing Zhang, Shuang Qiu, Lijie Xu

发表机构 * Institute of Software, Chinese Academy of Sciences(中国科学院软件研究所) City University of Hong Kong(香港城市大学) Baidu(百度)

AI总结 针对大语言模型强化学习中探索效率低的问题,提出深度密集探索(DDE)策略,通过识别失败轨迹中的可恢复枢轴状态并局部密集重采样,结合双流优化目标,在数学推理基准上优于现有方法。

详情
AI中文摘要

有效探索是大语言模型强化学习中的一个关键挑战:在有限的采样预算内,从庞大的自然语言序列空间中发现高质量轨迹。现有方法面临显著局限性:GRPO仅从根节点采样,使高概率轨迹饱和,而深层易错状态探索不足;基于树的方法盲目地将预算分散到琐碎或不可恢复的状态,导致采样稀释,无法发现罕见的正确后缀并破坏局部基线。为解决此问题,我们提出深度密集探索(DDE),一种将探索聚焦于失败轨迹中的“枢轴”(即深层、可恢复的状态)的策略。我们通过DEEP-GRPO实例化DDE,引入三个关键创新:(1)轻量级数据驱动效用函数,自动平衡可恢复性和深度偏差以识别枢轴状态;(2)在每个枢轴处进行局部密集重采样,增加发现后续正确轨迹的概率;(3)双流优化目标,将全局策略学习与局部纠正更新解耦。在数学推理基准上的实验表明,我们的方法一致优于GRPO、基于树的方法及其他强基线。代码见 https://github.com/AgentCombo/DEEP-GRPO

英文摘要

Effective exploration is a key challenge in reinforcement learning for large language models: discovering high-quality trajectories within a limited sampling budget from the vast natural language sequence space. Existing methods face notable limitations: GRPO samples exclusively from the root, saturating high-probability trajectories while leaving deep, error-prone states under-explored. Tree-based methods blindly disperse budgets across trivial or unrecoverable states, causing sampling dilution that fails to uncover rare correct suffixes and destabilizes local baselines. To address this, we propose Deep Dense Exploration (DDE), a strategy that focuses exploration on $\textit{pivots}$ -- deep, recoverable states within unsuccessful trajectories. We instantiate DDE with DEEP-GRPO, which introduces three key innovations: (1) a lightweight data-driven utility function that automatically balances recoverability and depth bias to identify pivot states; (2) local dense resampling at each pivot to increase the probability of discovering correct subsequent trajectories; and (3) a dual-stream optimization objective that decouples global policy learning from local corrective updates. Experiments on mathematical reasoning benchmarks demonstrate that our method consistently outperforms GRPO, tree-based methods, and other strong baselines. Code is available at https://github.com/AgentCombo/DEEP-GRPO

2603.12231 2026-06-15 cs.LG 版本更新

Temporal Straightening for Latent Planning

时间拉直用于隐式规划

Ying Wang, Oumayma Bounou, Gaoyue Zhou, Randall Balestriero, Tim G. J. Rudner, Yann LeCun, Mengye Ren

发表机构 * New York University(纽约大学) Brown University(布朗大学) University of Toronto(多伦多大学)

AI总结 受人类视觉处理中感知拉直假说启发,提出时间拉直方法,通过曲率正则化联合学习JEPA世界模型的编码器和预测器,改善隐式规划中的表示学习,使梯度规划更稳定并提高目标到达任务成功率。

Comments ICML 2026 Camera Ready

详情
AI中文摘要

学习良好的表示对于基于世界模型的隐式规划至关重要。虽然预训练的视觉编码器能产生强大的语义视觉特征,但它们并非为规划定制,且包含与规划无关甚至有害的信息。受人类视觉处理中感知拉直假说的启发,我们引入时间拉直来改进隐式规划的表示学习。通过使用鼓励局部拉直隐式轨迹的曲率正则化器,我们联合学习联合嵌入预测架构(JEPA)世界模型的编码器和预测器。我们表明,以这种方式降低曲率使得隐空间中的欧氏距离更好地近似测地距离,并改善了规划目标的条件。我们通过实验证明,时间拉直使得基于梯度的规划更稳定,并在一系列目标到达任务中显著提高了成功率。我们的代码可在 https://agenticlearning.ai/temporal-straightening 获取。

英文摘要

Learning good representations is essential for latent planning with world models. While pretrained visual encoders produce strong semantic visual features, they are not tailored to planning and contain information irrelevant -- or even detrimental -- to planning. Inspired by the perceptual straightening hypothesis in human visual processing, we introduce temporal straightening to improve representation learning for latent planning. Using a curvature regularizer that encourages locally straightened latent trajectories, we jointly learn an encoder and a predictor of a Joint-Embedding Predictive Architecture (JEPA) world model. We show that reducing curvature this way makes the Euclidean distance in latent space a better proxy for the geodesic distance and improves the conditioning of the planning objective. We demonstrate empirically that temporal straightening makes gradient-based planning more stable and yields significantly higher success rates across a suite of goal-reaching tasks. Our code is available at https://agenticlearning.ai/temporal-straightening.
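One simple way to quantify how "straight" a latent trajectory is uses the angle between consecutive displacement vectors. The sketch below takes that route. It is an assumed form for illustration (one minus cosine similarity, averaged over the trajectory); the paper's actual regularizer may be defined differently, and `curvature_penalty` is a hypothetical name.

```python
import math


def curvature_penalty(latents):
    """Mean of (1 - cos angle) between consecutive displacements z[t+1]-z[t] and z[t+2]-z[t+1]."""
    total, count = 0.0, 0
    for a, b, c in zip(latents, latents[1:], latents[2:]):
        v1 = [y - x for x, y in zip(a, b)]
        v2 = [y - x for x, y in zip(b, c)]
        n1 = math.sqrt(sum(x * x for x in v1))
        n2 = math.sqrt(sum(x * x for x in v2))
        if n1 == 0.0 or n2 == 0.0:
            continue  # skip stationary steps, where the angle is undefined
        cos = sum(x * y for x, y in zip(v1, v2)) / (n1 * n2)
        total += 1.0 - cos
        count += 1
    return total / count if count else 0.0


straight = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
right_angle_turn = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
```

A perfectly straight trajectory scores (numerically) zero and a right-angle turn scores one. On a straight path, Euclidean distance to the goal equals the path length, which matches the abstract's point about Euclidean distance approximating geodesic distance.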

2603.18464 2026-06-15 cs.LG 版本更新

AcceRL: A Distributed Asynchronous Reinforcement Learning and World Model Framework for Vision-Language-Action Models

AcceRL: 面向视觉-语言-动作模型的分布式异步强化学习与世界模型框架

Chengxuan Lu, Shukuan Wang, Yanjie Li, Yingying Fang, Huoyan Wang, Tian Zhang, Wei Liu, Shiji Jin, Fuyuan Qian, Peiming Li, Chao Xu, Baigui Sun, Yang Liu

发表机构 * IROOTECH TECHNOLOGY Wolf 1069 b Lab, Sany Group(伊罗科技沃尔夫1069b实验室,三一集团)

AI总结 提出AcceRL框架,通过物理隔离环境交互、模型推理和梯度更新实现分布式异步强化学习,消除同步系统的空闲气泡,提升硬件利用率,并支持即插即用的世界模型集成,在LIBERO任务上实现2.4倍吞吐加速和200倍样本效率提升。

详情
AI中文摘要

大规模视觉-语言-动作(VLA)模型的强化学习(RL)严重受限于同步障碍和环境数据获取的高成本。为克服这些挑战,我们提出AcceRL,一种分布式异步RL框架,物理隔离环境交互、模型推理和梯度更新。通过消除同步系统中固有的级联长尾空闲气泡,AcceRL最大化硬件利用率并确保可扩展吞吐量。此外,AcceRL采用模块化设计,支持将多种即插即用的世界模型集成到其分布式流水线中。大量实验表明,基础框架在所有四个LIBERO~\cite{liu2023libero}任务套件上均取得极具竞争力的性能。系统层面,异步架构相比领先的同步基线实现了2.4倍的吞吐加速。算法层面,通过利用在1000条离线轨迹上预训练的世界模型,AcceRL在LIBERO-Spatial上实现了高达200倍的在线样本效率提升,为具身AI建立了一个既样本高效又时间高效的稳健框架。代码包含在补充材料中。代码可在 https://github.com/distanceLu/AcceRL 获取。

英文摘要

Reinforcement learning (RL) for large-scale Vision-Language-Action (VLA) models is severely bottlenecked by synchronization barriers and the high cost of environment data acquisition. To overcome these challenges, we propose AcceRL, a distributed asynchronous RL framework that physically isolates environment rollouts, model inference, and gradient updates. By eliminating the cascading long-tail idle bubbles inherent in synchronous systems, AcceRL maximizes hardware utilization and ensures scalable throughput. Furthermore, AcceRL features a modular design that supports the integration of diverse, plug-and-play world models into its distributed pipeline. Extensive experiments demonstrate that the base framework achieves highly competitive performance across all four LIBERO~\cite{liu2023libero} task suites. At the system level, the asynchronous architecture delivers a $2.4\times$ throughput speedup over leading synchronous baselines. At the algorithm level, by leveraging a world model pre-trained on 1,000 offline trajectories, AcceRL achieves up to a $200\times$ improvement in online sample efficiency on LIBERO-Spatial, establishing a robust framework that is both sample-efficient and time-efficient for embodied AI. Code is included in the supplementary material. Code is available at https://github.com/distanceLu/AcceRL.

2605.03065 2026-06-15 cs.LG cs.RO 版本更新

OGPO: Sample Efficient Full-Finetuning of Generative Control Policies

OGPO:生成控制策略的样本高效全微调

Sarvesh Patil, Mitsuhiko Nakamoto, Manan Agarwal, Shashwat Saxena, Jesse Zhang, Giri Anantharaman, Cleah Winston, Chaoyi Pan, Douglas Chen, Nai-Chieh Huang, Zeynep Temel, Oliver Kroemer, Sergey Levine, Abhishek Gupta, Hongkai Dai, Paarth Shah, Max Simchowitz

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 提出OGPO算法,通过离策略评论网络和修改的PPO目标,实现生成控制策略的样本高效微调,在多种操作任务上达到最优性能,并能在无专家数据下微调不良初始化的行为克隆策略。

详情
AI中文摘要

生成控制策略(GCPs),如基于扩散和基于流的控制策略,已成为机器人学习的有效参数化方法。本文介绍了离策略生成策略优化(OGPO),一种用于微调GCPs的样本高效算法,该算法维护离策略评论网络以最大化数据重用,并通过修改的PPO目标将策略梯度传播到策略的完整生成过程,使用评论网络作为终端奖励。OGPO在涵盖多任务设置、高精度插入和灵巧控制的操作任务上达到了最先进的性能。据我们所知,它也是唯一一种能够在在线回放缓冲区中无专家数据的情况下,将初始化不良的行为克隆策略微调到接近完全任务成功的方法,并且只需很少的任务特定超参数调整。通过广泛的实证研究,我们证明了OGPO在策略引导和残差学习方面显著优于替代方法,并确定了其性能背后的关键机制。我们进一步引入了实用的稳定技巧,包括成功缓冲区正则化、双边保守优势和Q方差减少,以减轻基于状态和基于像素的设置中的评论网络过度利用。除了提出OGPO,我们还对GCP微调进行了系统的实证研究,确定了控制成功离策略全策略改进的稳定机制和失败模式。

英文摘要

Generative control policies (GCPs), such as diffusion- and flow-based control policies, have emerged as effective parameterizations for robot learning. This work introduces Off-policy Generative Policy Optimization (OGPO), a sample-efficient algorithm for finetuning GCPs that maintains off-policy critic networks to maximize data reuse and propagates policy gradients through the full generative process of the policy via a modified PPO objective, using critics as the terminal reward. OGPO achieves state-of-the-art performance on manipulation tasks spanning multi-task settings, high-precision insertion, and dexterous control. To our knowledge, it is also the only method that can finetune poorly initialized behavior cloning policies to near-full task success with no expert data in the online replay buffer, and it does so with little task-specific hyperparameter tuning. Through extensive empirical investigations, we demonstrate that OGPO drastically outperforms alternative methods on policy steering and learning residual corrections, and identify the key mechanisms behind its performance. We further introduce practical stabilization tricks, including success-buffer regularization, two-sided conservative advantages, and Q-variance reduction, to mitigate critic over-exploitation across state- and pixel-based settings. Beyond proposing OGPO, we conduct a systematic empirical study of GCP finetuning, identifying the stabilizing mechanisms and failure modes that govern successful off-policy full-policy improvement.

4. 生成模型与概率建模 16 篇

2606.13955 2026-06-15 cs.LG 新提交

Smoothing Dark Areas in Molecular Latent Diffusion

分子潜在扩散中的暗区平滑

Xi Wang, Jiahan Li, Yuxuan Xia, Yingcheng Wu, Shaoyi Zheng, Shengjie Wang

发表机构 * New York University(纽约大学) Stanford University(斯坦福大学)

AI总结 针对分子潜在扩散中存在的暗区问题,提出拓扑优化VAE(TopVAE),通过训练时内化结构和化学约束,减少暗区,提升离后验鲁棒性,在QM9和GEOM-Drugs上取得显著改进。

详情
AI中文摘要

潜在扩散是可扩展3D分子生成的有前景框架,但它需要潜在空间在后验样本之外保持平滑、有效且可导航。然而,现有的分子VAE通常通过基于重建的目标学习,这并不能保证这样的潜在空间。我们表明这会导致暗区:在扩散采样过程中可达但解码为不连通或化学无效分子的潜在空间区域。与图像生成不同,分子解码需要严格的结构和化学精度,因此即使微小的潜在扰动也可能导致灾难性失败。因此,我们提出TopVAE,一种拓扑优化的VAE,通过使解码器在训练期间内化结构和化学约束来减少暗区,消除了测试时化学校正的需要。TopVAE大大提高了离后验鲁棒性,当与标准DiT配对时,在QM9上实现了$77\%$更低的FCD-3D、最高的V&C,在GEOM-Drugs上实现了$52\%$更低的FCD-3D,以及在零样本支架修复中实现了$1.29{\times}$更稳定和更连通的分子。

英文摘要

Latent diffusion is a promising framework for scalable 3D molecular generation, but it requires a latent space that remains smooth, valid, and navigable beyond posterior samples. Existing molecular VAEs, however, are typically learned through reconstruction-based objectives, which do not guarantee such a latent space. We show that this leads to dark areas: regions of latent space that are reachable during diffusion sampling but decode to disconnected or chemically invalid molecules. Unlike in image generation, molecular decoding requires strict structural and chemical precision, so even small latent perturbations can produce catastrophic failures. We therefore propose TopVAE, a topology-optimized VAE that reduces dark areas by making the decoder internalize structural and chemical constraints during training, eliminating the need for test-time chemical correction. TopVAE greatly improves off-posterior robustness, and when paired with a standard DiT, achieves $77\%$ lower FCD-3D on QM9, the highest V&C, $52\%$ lower FCD-3D on GEOM-Drugs, and $1.29{\times}$ more stable and connected molecules on zero-shot scaffold inpainting.

2606.14139 2026-06-15 cs.LG 新提交

Decoupled Latent Optimization of Diffusion Models for Full Waveform Inversion

全波形反演的扩散模型解耦潜变量优化

Chen Min, Zheng Ma

发表机构 * School of Mathematical Sciences, Shanghai Jiao Tong University(上海交通大学数学科学学院) CMA-Shanghai, Shanghai Jiao Tong University(上海交通大学CMA-上海)

AI总结 提出解耦潜变量优化(DLO),通过二次惩罚目标分离物理变量和潜变量,结合数据保真度梯度和扩散先验,在OpenFWI基准上优于经典正则化和现有扩散方法。

Comments 35 pages, 14 figures

详情
AI中文摘要

全波形反演(FWI)通过求解严重不适定、非凸的PDE约束优化,从地震记录中恢复地下速度。经典正则化方法稳定反演但无法再现真实地质结构;最近的扩散先验方法提高了真实性,但以数据保真度和先验一致性之间的脆弱权衡为代价。我们提出解耦潜变量优化(DLO),将标准潜变量优化形式松弛为辅助物理变量和潜变量上的二次惩罚目标。数据保真度梯度作用于物理空间,扩散采样器仅通过解码的先验样本贡献,并保留了经典FWI的标准平滑速度初始化。在OpenFWI基准上,DLO在干净、含噪和缺失道采集下优于经典正则化和现有扩散方法。在70×70 OpenFWI模型上训练的先验直接迁移到Marmousi和Overthrust基准,DLO恢复了复杂的断层结构,并对初始化平滑和测量噪声保持鲁棒。

英文摘要

Full waveform inversion (FWI) recovers subsurface velocity from seismic recordings by solving a severely ill-posed, nonconvex PDE-constrained optimization. Classical regularizers stabilize the inversion but fail to reproduce realistic geological structures; recent diffusion-prior methods improve realism at the cost of a fragile trade-off between data fidelity and prior consistency. We propose Decoupled Latent Optimization (DLO), which relaxes the standard latent-optimization formulation into a quadratic-penalty objective over an auxiliary physical variable and a latent variable. The data-fidelity gradient acts in physical space, the diffusion sampler contributes only through a decoded prior sample, and the standard smoothed-velocity initialization of classical FWI is preserved. On the OpenFWI benchmark, DLO outperforms classical regularizers and existing diffusion-based methods under clean, noisy, and missing-trace acquisitions. The prior, trained on $70\times 70$ OpenFWI models, transfers directly to the Marmousi and Overthrust benchmarks, where DLO recovers intricate fault structures and remains robust to initialization smoothing and measurement noise.
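The relaxation described above can be written out schematically. The formulas below are a hedged reading of the abstract, not the paper's equations: the symbols $F$ (wave-equation forward operator), $d$ (recorded data), $D$ (decoder of the latent diffusion prior), $z$ (latent variable), $m$ (auxiliary velocity model), and $\rho$ (penalty weight) are all assumed notation.

```latex
\begin{align*}
% Standard latent optimization (assumed form): the data misfit is evaluated through the decoder.
\min_{z}\;& \tfrac{1}{2}\,\bigl\| F\bigl(D(z)\bigr) - d \bigr\|_2^2 \\
% Quadratic-penalty relaxation (assumed form): m carries the physics, z carries the prior.
\min_{m,\,z}\;& \tfrac{1}{2}\,\bigl\| F(m) - d \bigr\|_2^2
  \;+\; \tfrac{\rho}{2}\,\bigl\| m - D(z) \bigr\|_2^2
\end{align*}
```

Under this reading, the gradient of the first term is taken with respect to $m$ only, so it never passes through the decoder, and the prior influences $m$ solely through the decoded sample $D(z)$ in the penalty term. This is consistent with the abstract's statement that the data-fidelity gradient acts in physical space.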

2606.14215 2026-06-15 cs.LG 新提交

LapidaryEngine: Fully Conversational Crystal Generation

LapidaryEngine: 全对话式晶体生成

Yusei Ito, Yuta Suzuki, Tomoya Murata, Masaki Adachi

发表机构 * Lattice Lab, Toyota Motor Corporation(丰田汽车公司Lattice实验室) The University of Osaka(大阪大学)

AI总结 提出LapidaryEngine,首个支持全对话式晶体生成的模型,通过枢轴表示实现文本与晶体结构的双向翻译,支持自由形式自然语言请求和迭代优化。

Comments 11 main pages, 5 main figures, and 1 table

详情
AI中文摘要

大型语言模型(LLM)的出现激发了直接从自然语言指令生成定制晶体材料的愿景,使用户能够通过直观的对话式交互设计材料。现有的文本到晶体生成模型代表了朝着这一目标的重要早期步骤,但它们存在两个关键限制:(i)输入格式受限,需要高度结构化的描述(例如化学式),以及(ii)单向生成,模型可以将文本映射到晶体,但无法执行逆映射。这些限制阻碍了全对话式工作流程,并妨碍了与用户固有的模糊和不断变化的需求的对齐。我们通过LapidaryEngine解决了这些挑战,这是第一个支持全对话式晶体生成的模型。LapidaryEngine接受自由形式的自然语言请求,并以类似对话的方式执行迭代优化和编辑。关键创新在于枢轴表示,这是一种第三种中间形式,尽管缺乏直接配对数据集,但仍能实现文本与晶体结构之间的双向翻译。利用这一枢轴可以稳健地解释用户反馈并进行精确的结构控制。我们在多种任务上展示了LapidaryEngine,包括绝缘体发现、稳定性优化、成分修改和结构编辑,展示了其以交互方式使生成的材料与用户意图对齐的能力。

英文摘要

The emergence of Large Language Models (LLMs) has inspired the vision of generating bespoke crystal materials directly from natural-language instructions, enabling users to design materials through intuitive, conversational interaction. Existing text-to-crystal generative models represent important early steps toward this goal, but they suffer from two critical limitations: (i) restricted input formats that require highly structured descriptions (e.g., chemical formulas), and (ii) one-directional generation, where models can map text to crystal but cannot perform the inverse. These limitations prevent fully conversational workflows and hinder alignment with users' inherently ambiguous and evolving desiderata. We address these challenges with LapidaryEngine, the first model to support fully conversational crystal generation. LapidaryEngine accepts free-form natural-language requests and performs iterative refinement and editing in a dialogue-like manner. The key innovation is a pivot representation, a third, intermediate form that enables bidirectional translation between text and crystal structures despite the absence of direct paired datasets. Leveraging this pivot allows robust interpretation of user feedback and precise structural control. We demonstrate LapidaryEngine across diverse tasks, including insulator discovery, stability optimization, compositional modification, and structural editing, showcasing its ability to align generated materials with user intent in an interactive manner.

2606.14235 2026-06-15 cs.LG 新提交

Implicit Variational Rejection Sampling

隐式变分拒绝采样

Jian Xu, Shigui Li, Wei Chen, Jiacheng Li, Zhiqi Lin, Delu Zeng, Xinghao Ding, John Paisley, Qibin Zhao

发表机构 * RIKEN iTHEMS RIKEN AIP South China University of Technology(华南理工大学) Xiamen University(厦门大学) Columbia University(哥伦比亚大学)

AI总结 提出隐式变分拒绝采样(IVRS),结合隐式分布与拒绝采样,通过神经网络构建提议分布并用判别器估计密度比来改进后验近似,引入IR-ELBO作为质量度量,实验优于传统变分推断。

详情
AI中文摘要

变分推断(VI)是贝叶斯机器学习中用于近似复杂后验分布的基本推断技术。传统的VI通常依赖于均值场分解,这可能无法充分捕捉真实后验的复杂性。最近的进展利用神经网络建模隐式分布,提供了更大的灵活性。然而,神经网络架构的实际约束仍然会导致不准确性。在本文中,我们提出了一种称为隐式变分拒绝采样(IVRS)的方法,该方法将隐式分布与拒绝采样相结合,以改进后验近似。我们的方法使用神经网络构建隐式提议分布,并通过一个判别器网络进行拒绝采样,该网络估计隐式提议与真实后验之间的密度比,以细化近似。为此,我们引入了隐式重采样证据下界(IR-ELBO)作为度量重采样分布质量的指标,并推导出更紧的变分下界。实验结果表明,我们的方法优于传统的变分推断技术。

英文摘要

Variational Inference (VI) is a fundamental inference technique in Bayesian machine learning for approximating complex posterior distributions. Traditional VI often relies on the mean-field factorization, which can inadequately capture true posterior complexity. Recent advancements have leveraged neural networks to model implicit distributions, offering increased flexibility. However, the practical constraints of neural network architectures still produce inaccuracies. In this paper, we propose a method called Implicit Variational Rejection Sampling (IVRS), which integrates implicit distributions with rejection sampling to improve the posterior approximation. Our method uses neural networks to construct implicit proposal distributions, and applies rejection sampling with a discriminator network that estimates the density ratio between the implicit proposal and the true posterior to refine the approximation. To this end, we introduce the Implicit Resampling Evidence Lower Bound (IR-ELBO) as a metric to characterize the resampled distribution's quality and derive a tighter variational lower bound. Experimental results demonstrate that our method outperforms traditional variational inference techniques.
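The core mechanism, rejection sampling driven by a discriminator's density-ratio estimate, can be illustrated on a toy problem where everything is known in closed form. This is a sketch under stated assumptions, not the IVRS algorithm: it uses the standard identity that an optimal discriminator $d = p/(p+q)$ implies $p/q = d/(1-d)$, a hand-written "discriminator" instead of a trained network, and hypothetical function names throughout.

```python
import random


def ratio_from_discriminator(d: float) -> float:
    """Density ratio p/q implied by an optimal discriminator output d = p / (p + q)."""
    return d / (1.0 - d)


def rejection_sample(propose, discriminator, bound, n_draws, rng):
    """Keep a proposal x with probability (p/q)(x) / bound, where bound >= max p/q."""
    accepted = []
    for _ in range(n_draws):
        x = propose(rng)
        accept_prob = ratio_from_discriminator(discriminator(x)) / bound
        if rng.random() < accept_prob:
            accepted.append(x)
    return accepted


# Toy setup: proposal q = Uniform(0, 1), target p(x) = 2x on [0, 1], so p/q = 2x <= 2.
def toy_propose(rng):
    return rng.random()


def toy_discriminator(x):
    # Closed-form optimal discriminator for this toy pair: p / (p + q) = 2x / (2x + 1).
    return 2.0 * x / (2.0 * x + 1.0)


samples = rejection_sample(toy_propose, toy_discriminator, bound=2.0,
                           n_draws=5000, rng=random.Random(0))
sample_mean = sum(samples) / len(samples)  # the target's mean is 2/3
```

In the real setting the target is an unnormalized posterior and the discriminator is learned, so the ratio is only approximate. The toy shows only why accepted samples follow the target more closely than raw proposals do.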

2606.14510 2026-06-15 cs.LG 新提交

PepALD: Macrocyclic Peptide Generation via Autoregressive Latent Diffusion

PepALD: 通过自回归潜在扩散生成大环肽

Junming Zhang, Siyu Yi, Wei Ju, Zhonghui Gu

发表机构 * College of Computer Science, Sichuan University(四川大学计算机科学学院) School of Mathematics, Sichuan University(四川大学数学学院) School of Artificial Intelligence, Sichuan University(四川大学人工智能学院) Lingang Laboratory(临港实验室)

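AI summary Proposes PepALD, which combines autoregressive latent diffusion with chemical embeddings for de novo macrocyclic peptide design and uses preference optimization to align generation with affinity rewards; generation quality and reward optimization are evaluated against representative baselines.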

Comments 18 pages, 5 figures, 3 tables

详情
AI中文摘要

大环肽是细胞内靶点的有前景的治疗候选物,但其设计需要同时控制非天然单体化学、环拓扑、膜通透性和靶点结合。现有的SMILES或HELM字符串生成模型要么在长原子级序列空间中操作,要么将单体视为化学基础有限的符号化令牌。我们引入了PepALD,一个用于从头生成大环肽的自回归潜在扩散(ALD)基础模型。该模型使用结构化化学嵌入表示HELM单体,通过在化学信息潜在空间中的上下文条件扩散生成每个残基,在自回归生成过程中预测R基团感知的环闭合,并使用胜者保护的扩散自适应偏好优化将去噪器与亲和力奖励对齐。计算机模拟(in silico)实验在与代表性肽生成基线的对比中展示了PepALD的生成质量和奖励优化性能。

英文摘要

Macrocyclic peptides are promising therapeutic candidates for intracellular targets, but their design requires simultaneous control over non-natural monomer chemistry, ring topology, membrane permeability, and target binding. Existing SMILES- or HELM-string generative models either operate in long atom-level sequence spaces or treat monomers as symbolic tokens with limited chemical grounding. We introduce PepALD, an Autoregressive Latent Diffusion (ALD) foundation model for de novo macrocyclic peptide generation. The model represents HELM monomers with structured chemical embeddings, generates each residue through context-conditioned diffusion in chemically informed latent space, predicts R-group-aware ring closures during autoregressive generation, and aligns the denoiser to affinity rewards using winner-protected diffusion-adapted preference optimization. In silico experiments demonstrate PepALD's generation quality and reward-optimization performance against representative peptide generation baselines.

2606.13796 2026-06-15 stat.ML cs.LG 交叉投稿

Recursively Trained Diffusion Models: Limiting Collapse Distribution and Spectral Characterization

递归训练的扩散模型:极限崩溃分布与谱特征

Naïl B. Khelifa, Richard E. Turner, Ramji Venkataramanan

发表机构 * University of Cambridge(剑桥大学)

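AI summary Studies distribution collapse when diffusion models are trained recursively on their own outputs. Shows that even with perfect learning, early stopping of the reverse diffusion causes drift, and that the recursion converges to a unique limiting distribution whose spectrum reflects a low-pass filtering effect.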

详情
AI中文摘要

生成模型在其自身输出上的递归训练可能导致模型崩溃,即不断累积地偏离真实数据分布。现有的理论工作给出了扩散模型背景下有限轮误差累积的界,但有两个问题仍然悬而未决:递归收敛到何种分布,以及收敛速度如何?我们回答了这两个问题,并分离出一种不同于不完美学习的机制:即使具有完美的分数估计和精确采样,反向扩散的早期停止(出于数值稳定性需要)也会驱动逐渐偏离数据分布。我们证明该递归以几何速率收敛到唯一的极限分布,该分布具有闭式表征,即数据分布的无限混合,其中每个分量是数据分布的高斯平滑版本,且平滑程度递增。该极限的Hermite谱分解表明,递归训练充当低通滤波器:编码精细非高斯结构的高阶模式比粗模式衰减得更强。这种谱图景启发了退火截断调度,该调度在再训练轮次中逐步缩小截断时间;我们证明任何收敛到0的调度都能渐近消除递归复合。最后,我们展示了理想化表征的鲁棒性:在存在离散化和分数估计误差的情况下,学习到的分布保持在理想极限周围的Wasserstein-2球内,且具有模式依赖的收缩率,高阶误差比低阶误差收缩更快。我们在合成高斯混合和CIFAR-10上验证了该理论。

英文摘要

Recursive training of generative models on their own outputs can lead to model collapse, a compounding drift away from the true data distribution. Existing theoretical works bound finite-round error accumulation in the context of diffusion models, but two questions remain open: what distribution does the recursion converge to, and how fast? We answer both, isolating a mechanism distinct from imperfect learning: even with perfect score estimation and exact sampling, the early stopping of the reverse diffusion (required for numerical stability) drives a progressive drift away from the data distribution. We prove that this recursion converges geometrically to a unique limiting distribution, which admits a closed-form characterization as an infinite mixture of increasingly Gaussian-smoothed versions of the data distribution. A Hermite spectral decomposition of this limit reveals that recursive training acts as a low-pass filter: higher-order modes, which encode fine non-Gaussian structure, are attenuated much more strongly than coarse modes. This spectral picture motivates annealed truncation schedules that progressively shrink truncation times across retraining rounds; we prove that any schedule converging to $0$ asymptotically eliminates recursive compounding. Finally, we show our idealized characterization is robust: in the presence of discretization and score estimation errors, the learned distribution remains in a Wasserstein-2 ball around the ideal limit, with mode-dependent contraction rates that contract high-order errors faster than low-order ones. We validate the theory on synthetic Gaussian mixtures and CIFAR-10.

2606.13817 2026-06-15 cs.RO cs.LG 交叉投稿

FlowMo-WM: A World Model with Object Momentum and Hidden Ambient Drift

FlowMo-WM:具有物体动量和隐藏环境漂移的世界模型

Yitao Jiang, Luyang Zhao, Muhao Chen, Devin Balkcom

发表机构 * Dartmouth College(达特茅斯学院) Clemson University(克莱姆森大学) University of Houston(休斯顿大学)

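AI summary Proposes FlowMo-WM, an end-to-end trainable visual world model that factorizes image-action history into a short-history latent state and a long-history context, modeling object motion and ambient drift separately; improves long-horizon prediction accuracy in simulated aquatic surface-vehicle environments.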

详情
AI中文摘要

机器人学习中的世界模型根据视觉观察和动作预测未来状态,使智能体能够推理其控制后果。然而,许多动作条件模型在运动由即时控制主导的场景中评估,而水面航行器和其他真实世界物体在惯性下持续运动,并被水流或风等隐藏环境漂移所位移。我们提出FlowMo-WM,一种端到端可训练的视觉世界模型,无需流场直接监督,从图像-动作历史中推断以物体为中心的运动状态和与隐藏漂移相关的预测性长历史上下文。FlowMo-WM将图像-动作历史分解为短历史潜在状态(训练以总结以物体为中心的运动)和长历史上下文(训练以总结缓慢变化的外生影响)。在潜在展开期间,零上下文残差转移将动作条件基础动力学与上下文相关的漂移效应分离。在具有多样隐藏流、干扰和随机化车辆动力学的模拟水面航行器环境中,FlowMo-WM相比代表性动作条件潜在世界模型提高了长程展开精度。预测时上下文消融实验(在展开过程中将推断的上下文置零或打乱)表明,环境上下文对于隐藏漂移下的稳定预测至关重要,而冻结线性探针则表征了学习因子中编码的信息。

英文摘要

World models in robot learning predict future states from visual observations and actions, enabling agents to reason about the consequences of their controls. However, many action-conditioned models are evaluated in settings where motion is dominated by immediate control, whereas aquatic surface vehicles and other real-world objects continue moving under inertia and are displaced by hidden ambient drift, such as water currents or wind. We propose FlowMo-WM, an end-to-end trainable visual world model that infers object-centric motion state and a predictive long-history context associated with hidden drift from image-action histories without direct supervision of flow fields. FlowMo-WM factorizes image-action history into a short-history latent state, trained to summarize object-centric motion, and a longer-history context, trained to summarize slowly varying exogenous influences. A zero-context residual transition separates action-conditioned base dynamics from context-dependent drift effects during latent rollout. In simulated aquatic surface-vehicle environments with diverse hidden flows, disturbances, and randomized vehicle dynamics, FlowMo-WM improves long-horizon rollout accuracy over representative action-conditioned latent world models. Prediction-time context ablations, in which the inferred context is zeroed or shuffled during rollout, show that the ambient context is important for stable prediction under hidden drift, while frozen linear probes characterize information encoded in the learned factors.

2606.13929 2026-06-15 cs.CV cs.LG 交叉投稿

Self-Evolving Visual Questioner

自演化视觉提问器

Yijun Liang, Hengguang Zhou, Ming Li, Lichen Li, Cho-Jui Hsieh, Tianyi Zhou

发表机构 * University of Maryland, College Park(马里兰大学帕克分校) University of California, Los Angeles(加州大学洛杉矶分校) Peking University(北京大学) Arena MBZUAI(穆罕默德·本·扎耶德人工智能大学)

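AI summary Proposes a self-evolving framework in which a vision-language model acts as both proposer and filter, generating harder, more informative, and more visual-centric questions without external supervision while maintaining exploration diversity to avoid training collapse; substantially improves the quality and difficulty boundary of autonomous question generation.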

Comments 21 pages, including references and appendix. Project Page is available at https://joliang17.github.io/SelfEvolvingVQG/

详情
AI中文摘要

视觉语言模型(VLM)通常被训练为被动的回答者,而它们主动提出多样化、非平凡、以视觉为中心且有依据的问题的能力仍未被充分探索。现有的视觉提问器的性能受到高质量训练数据的可用性或整理成本的瓶颈限制。我们证明,VLM可以在没有任何外部监督的情况下作为视觉提问器持续自我改进。我们提出一个自演化框架,该框架使用VLM本身作为提议者和过滤器,以产生更难、更信息丰富、更视觉中心的问题,同时保持其探索多样性以避免训练崩溃。这些问题随后用于以提问者和回答者模式训练VLM。为了评估提问器,我们引入了一个代理协议,从感知、推理和多样性维度评估问题。在各种骨干VLM上的实验表明,我们的方法显著提高了自主问题生成的质量,并大幅扩展了难度边界。在相同预算下,我们的自监督比在静态源数据上训练更有效。此外,自演化提问器仍然是一个有竞争力甚至更好的回答者。

英文摘要

Vision-language models (VLMs) are typically trained as passive answerers, while their ability to actively ask diverse, non-trivial, visual-centric and grounded questions remains underexplored. The performance of existing visual questioners is bottlenecked by the availability of high-quality training data or the cost of curating it. We show that a VLM can continuously improve itself as a visual questioner without any external supervision. We propose a self-evolving framework that uses a VLM itself as both a proposer and a filter to produce harder, more informative, and visual-centric questions, while maintaining their exploration diversity to avoid training collapse. These questions are then used to train the VLM in both questioner and answerer modes. To evaluate the questioner, we introduce an agentic protocol that assesses questions along perception, reasoning, and diversity dimensions. Experiments across various backbone VLMs show that our method substantially enhances the quality of autonomous question generation and expands its difficulty boundary. Under the same budget, our self-supervision is more effective than training on the static source data. Moreover, the self-evolving questioner remains a competitive or even better answerer.

2606.13970 2026-06-15 cs.RO cs.LG 交叉投稿

An Attention-based Model for Robust Forecasting with Missing Modality

基于注意力的缺失模态鲁棒预测模型

Zhitian Zhang, Wenjie Zi, Yunduz Rakhmangulova, Saghar Irandoust, Hossein Hajimirsadeghi, Thibaut Durand

发表机构 * Simon Fraser University(西蒙菲莎大学) RBC Borealis

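AI summary Proposes a multimodal model based on a conditional variational autoencoder and a transformer, which uses attention to learn a unified fixed-dimensional representation and handles missing modalities during both training and inference; outperforms prior multimodal fusion approaches on human trajectory prediction and robot manipulation forecasting.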

Comments Work originally done in 2023

详情
AI中文摘要

在缺失模态下的学习是多模态机器人学习中的一个基本挑战,因为现实世界的机器人系统通常运行在传感器数据不完整的环境中。基于注意力的模型在处理多模态数据时具有吸引力,因为它们可以用单一骨干网络处理多种模态。然而,大多数多模态模型假设在训练和推理过程中所有模态都可用,限制了它们在机器人感知和决策中的适用性。在本文中,我们介绍了一种多模态模型,旨在在训练和推理过程中处理缺失模态。该模型被表述为条件变分自编码器(CVAE),并采用基于Transformer的架构,利用注意力机制学习统一的固定维度表示,即使某些模态缺失。我们表明,所提出的模型可以在缺失模态的情况下进行训练,同时逼近所有模态的鲁棒表示。我们在五个多模态数据集上评估了我们的方法,涉及两个机器人学习任务:人类轨迹预测和机器人操作预测。实验结果表明,我们的模型有效地从不完整数据中学习,并且优于先前的多模态融合方法。

英文摘要

Learning with missing modalities is a fundamental challenge in multimodal robot learning, as real-world robotic systems often operate in environments with incomplete sensor data. Attention-based models are appealing for processing multimodal data because they can handle multiple modalities with a single backbone network. However, most multimodal models assume that all modalities are available during both training and inference, limiting their applicability in robotic perception and decision-making. In this paper, we introduce a multimodal model designed to handle missing modalities during both training and inference. The model is formulated as a conditional variational autoencoder (CVAE) and incorporates a transformer-based architecture that leverages attention mechanisms to learn a unified, fixed-dimensional representation, even when some modalities are missing. We show that our proposed model can be trained with missing modalities while approximating a robust representation of all modalities. We evaluate our approach on five multimodal datasets across two robot learning tasks: human trajectory prediction and robot manipulation forecasting. Experimental results demonstrate that our model effectively learns from incomplete data and is superior to prior multimodal fusion approaches.

2606.14003 2026-06-15 cond-mat.mtrl-sci cs.LG physics.comp-ph 交叉投稿

XRDiff: Crystal Structure Prediction from Powder X-Ray Diffraction Data Using Diffusion Models

XRDiff: 使用扩散模型从粉末X射线衍射数据进行晶体结构预测

Nofit Segal, Mingda Li, Benjamin Kurt Miller, Rafael Gómez-Bombarelli

发表机构 * Department of Materials Science and Engineering, MIT(材料科学与工程系,麻省理工学院) Department of Nuclear Science and Engineering, MIT(核科学与工程系,麻省理工学院) FAIR, Meta(FAIR,Meta)

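AI summary Proposes XRDiff, a diffusion model that recovers crystal structures from powder X-ray diffraction data; achieves strong structure recovery rates on simulated benchmarks and uses a peak-based encoding to improve generalization to experimental data.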

详情
AI中文摘要

从粉末X射线衍射(PXRD)图谱确定材料的晶体结构是材料科学中的一个核心挑战。PXRD是一种易于使用且广泛应用的表征技术,但由于相位信息的丢失,从衍射数据恢复原子结构需要求解一个欠定逆问题。生成建模可以为原子结构提供先验,并通过模拟的结构-谱图对学习从PXRD图谱到晶体结构的映射。我们提出了XRDiff,一个扩散模型,能够在给定化学计量比或更困难的情况下(给定元素组成和晶胞中原子总数)从PXRD恢复晶体结构。我们在每个化学计量比具有多个多晶型物且给定组成的所有多晶型物被一起保留的数据集上进行评估,确保高性能反映了对衍射信号的真实利用。XRDiff在模拟基准上实现了强结构恢复率,表明模型学习了足够精确的谱图到结构的映射,能够区分多晶型物。为了解决对实验数据的泛化问题,我们比较了全谱编码和基于峰描述符的编码。基于峰的编码泛化能力显著更好,甚至优于使用针对实验噪声分布进行增强的全谱训练模型。这些结果表明,对真实世界PXRD中存在的噪声和伪影具有鲁棒性的表示为弥合模拟与实验之间的差距提供了一条实用且可扩展的路径,使得能够从实验PXRD中零样本求解晶体结构,输入完整的或部分的化学成分。

英文摘要

Determining the crystal structure of a material from its powder X-ray diffraction (PXRD) pattern is a central challenge in materials science. PXRD is an accessible and widely used characterization technique, yet recovering the atomic structure from diffraction data requires solving an underdetermined inverse problem due to the loss of phase information. Generative modeling can provide a prior over atomic structure and learn the mapping from PXRD patterns to crystal structures via simulated structure-spectrum pairs. We present XRDiff, a diffusion model that recovers crystal structures from PXRD given either the stoichiometry or, in a more challenging setting, the elemental constituents and total number of atoms in the unit cell. We evaluate on datasets where each stoichiometry has multiple polymorphs and all polymorphs of a given composition are held out together, ensuring that high performance reflects genuine use of the diffraction signal. XRDiff achieves strong structure recovery rates on simulated benchmarks, indicating that the model learns a spectrum-to-structure mapping precise enough to differentiate between polymorphs. To address generalization to experimental data, we compare a full-spectrum encoding against an encoding based on peak descriptors. The peak-based encoding generalizes substantially better, outperforming even a model trained on full spectra with augmentations fitted to the experimental noise distribution. These results demonstrate that representations robust to the noise and artifacts present in real-world PXRD offer a practical and scalable path toward closing the simulation-to-experiment gap, enabling zero-shot crystal structure solution from experimental PXRD with full or partial chemical composition input.

2606.14313 2026-06-15 stat.ML cs.LG 交叉投稿

Nonlocal Bayesian Modeling of Continuous Spatio-Temporal Dynamics

连续时空动力学的非局部贝叶斯建模

Jaeyeong Lee, Heeyoung Kim

发表机构 * Department of Industrial and Systems Engineering, Korea Advanced Institute of Science and Technology (KAIST)(工业与系统工程系,韩国科学技术院)

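AI summary Proposes the NLBST model, which combines a coordinate-based basis expansion and a continuous-time ODE with a nonlocal integro-differential equation to provide continuous spatio-temporal forecasting and uncertainty quantification under irregular observations.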

Comments Accepted at UAI 2026

详情
AI中文摘要

现实世界的时空预测必须处理不规则时间点、空间稀疏观测以及不确定性量化的需求。这种设置通常因非局部相互作用(长程空间耦合)而进一步复杂化。对连续空间、连续时间的非局部动力学进行建模自然会导致无限维积分微分方程(IDE),使得原则性的贝叶斯推断变得棘手。我们提出了非局部贝叶斯时空模型(NLBST),这是一个用于连续时空场的分层贝叶斯框架,它在保留可处理推断的同时学习显式的非局部耦合。NLBST通过基于坐标的空间基展开表示潜在场,并用连续时间ODE对系数过程进行建模,其可学习的线性算子对应于非局部IDE的伽辽金约化;神经ODE残差捕获额外的非线性动力学。线性高斯观测模型使得在缺失和不规则观测下能够进行卡尔曼式顺序更新,而空间基表示则使得无需重新训练即可在未测量位置进行归纳预测。全局参数通过变分推断学习,不确定性通过贝叶斯层次结构处理。在合成和真实数据集上的实验表明,该模型具有强大的预测能力和空间泛化能力,且不确定性校准良好,在强非局部和部分观测场景下相比基线方法取得了显著提升。

英文摘要

Real-world spatio-temporal forecasting must handle irregular time points, spatially sparse observations, and the need for uncertainty quantification. This setting is often further compounded by nonlocal interactions (long-range spatial coupling). Modeling continuous-space, continuous-time nonlocal dynamics naturally leads to infinite-dimensional integro-differential equations (IDEs), making principled Bayesian inference intractable. We propose the NonLocal Bayesian Spatio-Temporal model (NLBST), a hierarchical Bayesian framework for continuous spatio-temporal fields that learns explicit nonlocal coupling while retaining tractable inference. NLBST represents the latent field via a coordinate-based spatial basis expansion and models the coefficient process with a continuous-time ODE whose learnable linear operator corresponds to a Galerkin reduction of a nonlocal IDE; a Neural ODE residual captures additional nonlinear dynamics. A linear-Gaussian observation model enables Kalman-style sequential updates under missing and irregular observations, while the spatial basis representation enables inductive prediction at unmeasured locations without retraining. Global parameters are learned via variational inference, and uncertainty is handled through a Bayesian hierarchy. Experiments on synthetic and real-world datasets demonstrate strong forecasting and spatial generalization with well-calibrated uncertainty, yielding substantial gains over baselines in strongly nonlocal and partially observed regimes.
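The "Kalman-style sequential updates under missing and irregular observations" mentioned above can be pictured with a scalar sketch. It is a generic textbook Kalman step, not the NLBST implementation: the function name `kalman_step` and the parameters `a`, `q`, `h`, `r` (transition coefficient, process noise, observation coefficient, observation noise) are hypothetical, and the paper's coefficient process is vector-valued and driven by an ODE rather than a fixed scalar transition.

```python
def kalman_step(mean, var, y, a=1.0, q=0.1, h=1.0, r=0.5):
    """One predict/update step of a scalar linear-Gaussian state-space model.

    If y is None the observation is treated as missing: only the prediction
    is applied, so the uncertainty grows instead of shrinking.
    """
    # Predict: propagate the state estimate and inflate its variance.
    mean_pred = a * mean
    var_pred = a * a * var + q
    if y is None:
        return mean_pred, var_pred
    # Update: blend the prediction with the observation via the Kalman gain.
    gain = var_pred * h / (h * h * var_pred + r)
    mean_new = mean_pred + gain * (y - h * mean_pred)
    var_new = (1.0 - gain * h) * var_pred
    return mean_new, var_new

# Hypothetical sequence with a gap at the second time step.
observations = [2.0, None, 1.5]
mean, var = 0.0, 1.0
history = []
for y in observations:
    mean, var = kalman_step(mean, var, y)
    history.append((mean, var))
print(history)
```

Irregular time gaps could be handled in the same spirit by letting `a` and `q` depend on the elapsed time between observations, which is an assumption about how a continuous-time model would be discretized rather than a detail taken from the abstract.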

2502.00336 2026-06-15 cs.LG stat.ML 版本更新

Denoising Score Matching with Random Features: Insights on Diffusion Models from Precise Learning Curves

随机特征去噪分数匹配:从精确学习曲线看扩散模型

Anand Jerry George, Rodrigo Veiga, Nicolas Macris

发表机构 * École Polytechnique Fédérale de Lausanne (EPFL)(洛桑联邦理工学院)

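AI summary Parameterizes the score function with a random features neural network and derives asymptotically precise errors for denoising score matching, revealing how model complexity, the number of data samples, and the number of noise samples affect generalization and memorization in diffusion models.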

Comments Published at AISTATS 2026

详情
AI中文摘要

我们从理论上研究扩散模型中的泛化和记忆现象。实证研究表明,这些现象受模型复杂度和训练数据集大小的影响。在我们的实验中,我们进一步观察到去噪分数匹配(DSM)中每个数据样本使用的噪声样本数($m$)起着显著且非平凡的作用。我们通过在一个简单理论设置下推导DSM测试误差和训练误差的渐近精确表达式,捕捉这些行为并揭示其机制。分数函数由随机特征神经网络参数化,目标分布为$d$维高斯分布。我们在维度$d$、数据样本数$n$和特征数$p$趋于无穷大,同时保持比率$\psi_n=\frac{n}{d}$和$\psi_p=\frac{p}{d}$固定的情况下进行操作。通过刻画测试和训练误差,我们确定了作为$\psi_n$、$\psi_p$和$m$函数的泛化和记忆区域。我们的理论发现与实证观察一致。

英文摘要

We theoretically investigate the phenomena of generalization and memorization in diffusion models. Empirical studies suggest that these phenomena are influenced by model complexity and the size of the training dataset. In our experiments, we further observe that the number of noise samples per data sample ($m$) used during Denoising Score Matching (DSM) plays a significant and non-trivial role. We capture these behaviors and shed light on their mechanisms by deriving asymptotically precise expressions for test and train errors of DSM under a simple theoretical setting. The score function is parameterized by random features neural networks, with the target distribution being $d$-dimensional Gaussian. We operate in a regime where the dimension $d$, number of data samples $n$, and number of features $p$ tend to infinity while keeping the ratios $\psi_n=\frac{n}{d}$ and $\psi_p=\frac{p}{d}$ fixed. By characterizing the test and train errors, we identify regimes of generalization and memorization as a function of $\psi_n$, $\psi_p$, and $m$. Our theoretical findings are consistent with the empirical observations.

2603.02230 2026-06-15 cs.LG cs.AI 版本更新

Generalized Discrete Diffusion with Self-Correction

广义离散扩散与自校正

Linxuan Wang, Ziyi Wang, Yikun Bai, Wei Deng, Guang Lin, Qifan Song

发表机构 * National University of Singapore(新加坡国立大学) University of Science and Technology of China(中国科学技术大学)

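AI summary Proposes the Self-Correcting Discrete Diffusion (SCDD) model, which uses explicit state transitions and learns in discrete time, simplifies the training noise schedule, and eliminates a redundant remasking step; at GPT-2 scale it enables more efficient parallel decoding while preserving generation quality.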

Comments 40 pages, 3 figures, 6 tables

详情
AI中文摘要

自校正是保持离散扩散模型中并行采样且性能损失最小的有效技术。先前的工作在推理时或后训练期间探索了自校正;然而,此类方法通常泛化能力有限,并可能损害推理性能。GIDD通过多步BERT风格的均匀吸收目标开创了基于预训练的自校正。然而,GIDD依赖于连续的基于插值的管道,其中均匀转移和吸收掩码之间的交互不透明,这使超参数调整复杂化并阻碍实际性能。在这项工作中,我们提出了一种自校正离散扩散(SCDD)模型,以显式状态转移和直接在离散时间中学习的方式重新表述预训练自校正。我们的框架还简化了训练噪声调度,消除了冗余的重掩码步骤,并完全依赖均匀转移来学习自校正。在GPT-2规模上的实验表明,我们的方法能够实现更高效的并行解码,同时保持生成质量。

英文摘要

Self-correction is an effective technique for maintaining parallel sampling in discrete diffusion models with minimal performance degradation. Prior work has explored self-correction at inference time or during post-training; however, such approaches often suffer from limited generalization and may impair reasoning performance. GIDD pioneers pretraining-based self-correction via a multi-step BERT-style uniform-absorbing objective. However, GIDD relies on a continuous interpolation-based pipeline with opaque interactions between uniform transitions and absorbing masks, which complicates hyperparameter tuning and hinders practical performance. In this work, we propose a Self-Correcting Discrete Diffusion (SCDD) model to reformulate pretrained self-correction with explicit state transitions and learn directly in discrete time. Our framework also simplifies the training noise schedule, eliminates a redundant remasking step, and relies exclusively on uniform transitions to learn self-correction. Experiments at the GPT-2 scale demonstrate that our method enables more efficient parallel decoding while preserving generation quality.

2509.24710 2026-06-15 stat.ML cs.LG cs.NA math.NA 版本更新

MAD: Manifold Attracted Diffusion

MAD: 流形吸引扩散

Dennis Elbrächter, Giovanni S. Alberti, Matteo Santacesaria

发表机构 * Department of Mathematics, University of Vienna(维也纳大学数学系) MaLGa Center, Department of Mathematics, University of Genoa(热那亚大学数学系MaLGa中心)

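AI summary Proposes Manifold Attracted Diffusion, which builds on the manifold hypothesis and uses an extended score to remove noise at inference time and generate noiseless samples; effectiveness is shown on toy problems, synthetic data, and real data.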

详情
Journal ref
Forty-third International Conference on Machine Learning, 2026
AI中文摘要

基于得分的扩散模型是从图像分布中生成样本的一种高效方法。我们考虑训练数据来自目标分布的有噪声版本的情况,并提出一种可高效实现的推理过程修改,以生成无噪声样本。我们的方法受流形假设启发,该假设认为有意义的数据集中在高维环境空间的某个低维流形周围。核心思想是,噪声表现为离流形方向上的低幅度变化,而目标分布的相关变化主要限于流形方向。我们引入了扩展得分概念,并表明在简化设置中,它可以将小变化减少为零,同时基本保持大变化不变。我们描述了如何从标准得分的近似中高效计算其近似,并在玩具问题、合成数据和真实数据上展示了其有效性。

英文摘要

Score-based diffusion models are a highly effective method for generating samples from a distribution of images. We consider scenarios where the training data comes from a noisy version of the target distribution, and present an efficiently implementable modification of the inference procedure to generate noiseless samples. Our approach is motivated by the manifold hypothesis, according to which meaningful data is concentrated around some low-dimensional manifold of a high-dimensional ambient space. The central idea is that noise manifests as low magnitude variation in off-manifold directions in contrast to the relevant variation of the desired distribution which is mostly confined to on-manifold directions. We introduce the notion of an extended score and show that, in a simplified setting, it can be used to reduce small variations to zero, while leaving large variations mostly unchanged. We describe how its approximation can be computed efficiently from an approximation to the standard score and demonstrate its efficacy on toy problems, synthetic data, and real data.

2605.24795 2026-06-15 math.OC cs.LG cs.RO cs.SY eess.SY 版本更新

Lifted Schrödinger Bridges for Gaussian Mixture Endpoints: Projection Gaps and Path-Space Obstructions

提升的Schrödinger桥用于高斯混合端点:投影间隙与路径空间障碍

Siddhartha Ganguly, George Rapakoulias, Panagiotis Tsiotras

发表机构 * Daniel Guggenheim School of Aerospace, Georgia Institute of Technology(佐治亚理工学院丹尼尔·古根海姆航空航天学院)

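AI summary For stochastic density control between Gaussian-mixture endpoint distributions, proposes a lifted path-space construction that decomposes the problem into explicit Schrödinger bridges between Gaussian components plus a finite-dimensional entropic coupling, and analyzes the label-information gap after projection and the resulting path-space obstruction.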

Comments 35 pages. Submitted to a journal; comments are welcome

详情
AI中文摘要

我们研究了布朗先验动力学下高斯混合端点分布之间的随机密度控制。由于高斯混合之间的直接Schrödinger桥通常没有闭式解,我们引入了一种提升路径空间构造,其中每条轨迹都增加了一个源-目标分量标签。因此,问题分解为具有显式边际、漂移和成本公式的高斯分量间Schrödinger桥,而混合级分配简化为具有Sinkhorn缩放形式的有限维熵耦合问题。然后,我们分析了通过丢弃或遗忘标签得到的投影。通过构造,投影律满足原始高斯混合端点约束,但其相对熵通常与提升相对熵相差一个非负的条件标签信息间隙。这个间隙揭示了一个路径空间障碍:提升优化器在投影后通常不能等同于直接的无标签Schrödinger桥。我们还推导了与投影边际流相关的后验平均马尔可夫漂移,证明了动能上界,并识别了一个公共路径势条件,在该条件下投影间隙消失。为了自包含的阐述,记录了几个显示密度和形状控制的数值示例。

英文摘要

We study stochastic density control between Gaussian-mixture endpoint distributions under Brownian prior dynamics. Since the direct Schrödinger bridge between Gaussian mixtures is generally not available in closed form, we introduce a lifted path-space construction in which each trajectory is augmented with a source--target component label. Consequently, the problem decomposes into Gaussian component-to-component Schrödinger bridges with explicit marginal, drift, and cost formulas, while the mixture-level assignment reduces to a finite-dimensional entropic coupling problem with a Sinkhorn scaling form. We then analyze the projection obtained by discarding or forgetting the label. By construction, the projected law satisfies the original Gaussian-mixture endpoint constraints, but its relative entropy generally differs from the lifted relative entropy by a nonnegative conditional label-information gap. This gap reveals a path-space obstruction: the lifted optimizer cannot, in general, be identified with the direct unlabeled Schrödinger bridge after projection. We also derive the posterior-averaged Markov drift associated with the projected marginal flow, prove a kinetic-energy upper bound, and identify a common path-potential condition under which the projection gap vanishes. Several numerical illustrations showing density and shape control are recorded for a self-contained exposition.
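The "finite-dimensional entropic coupling problem with a Sinkhorn scaling form" can be pictured with the standard Sinkhorn iteration. The sketch below is generic and not taken from the paper: the cost matrix, the mixture weights, and the regularization `eps` are hypothetical numbers, whereas in the paper the costs would presumably come from the component-to-component bridge formulas.

```python
import math

def sinkhorn(cost, source_weights, target_weights, eps=1.0, iters=500):
    """Entropic coupling P = diag(u) K diag(v) with K = exp(-cost / eps).

    The scalings u and v are updated alternately so that the row sums match
    source_weights and the column sums match target_weights.
    """
    n, m = len(source_weights), len(target_weights)
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        u = [source_weights[i] / sum(kernel[i][j] * v[j] for j in range(m))
             for i in range(n)]
        v = [target_weights[j] / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]

# Hypothetical example: 2 source components, 3 target components.
cost = [[0.0, 1.0, 2.0],
        [2.0, 1.0, 0.0]]
source_weights = [0.4, 0.6]
target_weights = [0.3, 0.3, 0.4]
coupling = sinkhorn(cost, source_weights, target_weights)
for row in coupling:
    print([round(x, 4) for x in row])
```

Each entry `coupling[i][j]` can be read as the mass assigned to the pair (source component i, target component j), which is the role the source-target label plays in the lifted construction described above.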

2606.13626 2026-06-15 cs.SD cs.LG 版本更新

Generative Modeling of Bach-Style Symbolic Music: A Comparative Study of Autoregressive, Latent-Variable, and Adversarial Approaches

巴赫风格符号音乐的生成建模:自回归、潜变量和对抗方法的比较研究

Dezhi Yu, Kyuil Lee, Yongkang Huang

发表机构 * Stanford University(斯坦福大学)

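AI summary Compares autoregressive LSTMs, latent-variable models, and generative adversarial networks for Bach-style piano music generation; finds that the autoregressive LSTM with attention produces the most coherent music, vector quantization mitigates posterior collapse, and the adversarial approach captures local pitch patterns but is difficult to train.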

Comments 11 pages, 13 figures. All authors contributed equally

详情
AI中文摘要

我们使用共享的MIDI语料库和三个模型家族研究巴赫风格符号钢琴音乐的生成建模:带注意力的自回归LSTM、包括循环VAE和向量量化VAE的潜变量模型,以及生成对抗网络。我们比较它们对复调音符序列建模、学习有用潜在表示以及生成风格连贯作品的能力。实验表明,带注意力的自回归LSTM生成最音乐连贯的样本,而向量量化有助于缓解后验塌陷,并产生比传统循环VAE更结构化的输出。对抗方法捕捉局部音高模式,但训练困难且对巴赫风格的泛化可靠性较低。这些结果突出了自回归、潜变量和对抗方法在符号音乐生成中的相对优势和失败模式。

英文摘要

We study generative modeling of Bach-style symbolic piano music using a shared MIDI corpus and three model families: autoregressive LSTMs with attention, latent-variable models including recurrent VAEs and vector-quantized VAEs, and generative adversarial networks. We compare their ability to model polyphonic note sequences, learn useful latent representations, and generate stylistically coherent compositions. Our experiments show that the autoregressive LSTM with attention produces the most musically coherent samples, while vector quantization helps mitigate posterior collapse and yields more structured outputs than conventional recurrent VAEs. The adversarial approach captures local pitch patterns but remains difficult to train and generalizes less reliably to Bach's style. These results highlight the relative strengths and failure modes of autoregressive, latent-variable, and adversarial approaches for symbolic music generation.

5. Optimization, Generalization, and Theoretical Analysis (29 papers)

2606.13753 2026-06-15 cs.LG cs.AI 新提交

The Weight Norm Sets the Grokking Timescale: A Causal Delay Law

权重范数设定“顿悟”时间尺度:因果延迟定律

Truong Xuan Khanh, Doan Hoang Viet, Luu Duc Trung, Phan Thanh Duc

发表机构 * H&K Research Studio / Clevix LLC(H&K研究工作室 / Clevix有限责任公司) Bac A Bank(北亚银行) Banking Academy of Vietnam(越南银行学院)

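AI summary By intervening on the weight norm during training, finds that networks grok when the norm reaches a critical value Wc and that, when the norm is clamped at a fixed multiple of Wc, the delay grows exponentially with that multiple, indicating a causal role of the norm in grokking.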

Comments 14 pages, 9 figures, and 3 tables

详情
AI中文摘要

“顿悟”是神经网络中泛化能力的延迟出现,远在模型拟合训练数据之后才发生。权重范数是否导致这种延迟存在争议:一些研究报告了转变时的临界范数,另一些则观察到没有固定范数的顿悟。我们通过在训练过程中干预范数而非仅观察它来解决这一问题。在带权重衰减的自由训练下,当权重范数达到一个跨种子和学习率变化很小(变异系数1%至2%)且随模数基按幂律增长的值Wc时,网络发生顿悟。当我们转而将范数固定为Wc的某个倍数ρ并保持该值时,网络仍然顿悟,但延迟遵循T_grok ∝ exp(α ρ)。一个指数α≈7.5拟合了四个模数下的延迟(R²=0.996)。在扫描范围内,固定范数使延迟变化约19倍,而学习率仅变化约2倍,且将范数保持在Wc以上会减慢而非阻止顿悟。最后的LayerNorm通过解耦权重尺度与网络函数消除了这种依赖;没有它,指数定律重新出现。这种固定范数的延迟是指数对应物,对应于自由收缩范数所预测的对数延迟。

英文摘要

Grokking is the delayed onset of generalization in neural networks, arising long after they fit the training data. Whether the weight norm causes this delay is disputed: some studies report a critical norm at the transition, others observe grokking with no fixed norm at all. We settle this by intervening on the norm during training rather than only observing it. Under free training with weight decay, networks grok when the weight norm reaches a value Wc that varies little across seeds and learning rates (CV 1 to 2 percent) and grows with the modular base as a power law. When we instead clamp the norm to a fixed multiple rho of Wc and hold it there, the network still groks, but the delay follows T_grok proportional to exp(alpha rho). One exponent, alpha near 7.5, fits this delay across four moduli (R^2 = 0.996). Over the swept ranges the held norm moves the delay by about 19x and the learning rate by only about 2x, and holding the norm above Wc slows grokking rather than preventing it. A final LayerNorm removes the dependence by decoupling weight scale from the network function; without it the exponential law returns. This pinned-norm delay is the exponential counterpart to the logarithmic delay predicted for a freely contracting norm.
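The delay law quoted in the abstract, T_grok proportional to exp(alpha rho), lends itself to quick arithmetic. The sketch below only illustrates that formula and is not the authors' code: the function names are hypothetical, alpha = 7.5 is the fitted value reported above, and the proportionality constant is omitted because the abstract does not state it, so only ratios of delays are computed.

```python
import math

ALPHA = 7.5  # fitted exponent reported in the abstract (alpha near 7.5)

def delay_ratio(rho_a, rho_b, alpha=ALPHA):
    """Ratio T_grok(rho_b) / T_grok(rho_a) under T_grok proportional to exp(alpha * rho)."""
    return math.exp(alpha * (rho_b - rho_a))

def rho_span_for_ratio(ratio, alpha=ALPHA):
    """Change in rho that multiplies the delay by `ratio` under the same law."""
    return math.log(ratio) / alpha

tenth_step = delay_ratio(1.0, 1.1)
span_19x = rho_span_for_ratio(19.0)
print(f"raising rho by 0.1 multiplies the delay by {tenth_step:.2f}")
print(f"a 19x change in delay corresponds to a rho span of {span_19x:.3f}")
```

Under these assumptions, raising rho by 0.1 roughly doubles the delay, and the roughly 19x variation mentioned above would correspond to a span of about 0.39 in rho if it came entirely from this law. The abstract does not state the swept range, so that span is an inference from the formula rather than a reported number.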

2606.13818 2026-06-15 cs.LG 新提交

Uncertainty Estimation and Generalization Bounds for Modern Deep Learning

现代深度学习的不确定性估计与泛化界

Luis A. Ortega

发表机构 * Machine Learning Group, Department of Computer Science, Autonomous University of Madrid(马德里自治大学计算机科学系机器学习组)

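AI summary Unifies Bayesian inference, function-space modeling, and large-deviation theory under a common probabilistic perspective; proposes DVIP, VaLLA, and FMGP to improve uncertainty estimation, and uses PAC-Bayesian and large-deviation theory to explain the generalization of over-parameterized neural networks.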

Comments PhD Thesis, Autonomous University of Madrid

详情
AI中文摘要

本论文研究贝叶斯原理如何加深我们对现代深度学习系统的理解。尽管神经网络取得了显著的预测性能,但其泛化能力和不确定性量化能力仍仅被部分理解。本论文从方法论和理论两个角度应对这一挑战:将贝叶斯推断、函数空间建模和大偏差理论统一在一个共同的概率视角下。在方法论方面,论文引入了深度变分隐过程(DVIP),这是一个可扩展的贝叶斯框架,将隐过程扩展到深度架构。作为补充,提出了两种后处理方法——变分线性化拉普拉斯近似(VaLLA)和固定均值高斯过程(FMGP)——为预训练的确定性网络配备校准的不确定性估计。理论贡献集中于现代机器学习中一个核心开放问题:为什么大型、过参数化的神经网络能泛化得这么好?为此,论文发展了一个统一的概率框架,在PAC-贝叶斯和大偏差理论的语言下连接了三个关键机制——多样性、平滑性和随机性。

英文摘要

This thesis investigates how Bayesian principles can deepen our understanding of modern deep learning systems. While neural networks achieve remarkable predictive performance, their ability to generalize and to quantify uncertainty remains only partly understood. This thesis approaches this challenge from both methodological and theoretical angles: unifying Bayesian inference, function-space modeling, and large-deviation theory under a common probabilistic perspective. On the methodological side, the thesis introduces the Deep Variational Implicit Process (DVIP), a scalable Bayesian framework that extends implicit processes to deep architectures. Complementing this, two post-hoc methods -- the Variational Linearized Laplace Approximation (VaLLA) and the Fixed-Mean Gaussian Process (FMGP) -- are proposed to equip pretrained deterministic networks with calibrated uncertainty estimates. The theoretical contributions focus on one of the central open questions in modern machine learning: why do large, over-parameterized neural networks generalize so well? To address this, the thesis develops a unified probabilistic framework that connects three key mechanisms -- diversity, smoothness, and stochasticity -- within the language of PAC-Bayesian and large-deviation theory.

2606.13867 2026-06-15 cs.LG 新提交

Muon$^p$: Muon with Fractional Spectral Powers

Muon$^p$: 具有分数谱幂的 Muon

Yihe Dong, Will Sawin

发表机构 * Princeton University(普林斯顿大学)

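AI summary Proposes the Muon$^p$ optimizer, which interpolates between Muon and gradient descent via fractional spectral-power updates $US^pV^\top$; proves that these updates cannot be computed by any fixed univariate polynomial iteration, derives low-degree bivariate recurrences that approximate them while preserving the matrix-multiplication-only structure, and improves finetuning performance.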

详情
AI中文摘要

Muon 是一种日益广泛使用的优化器,它将梯度 $G=USV^\top$ 替换为其极因子 $UV^\top$,从而扁平化奇异谱。然而,完全扁平化丢弃了可能对自适应重要的奇异值信息。我们引入 Muon$^p$,一种 Muon 风格的优化器,它对于有理数 $p\in(0,1)$ 使用分数谱幂更新 $US^pV^\top$,在 Muon 和梯度下降之间插值。为了使其实用,我们证明分数谱幂不能通过任何固定的单变量多项式迭代计算,并进一步推导出低阶双变量递归,仅使用矩阵乘法近似 $US^pV^\top$,保留了 Muon 仅矩阵乘法的结构和计算复杂度。我们证明 Muon$^p$ 在 Schatten $q$-范数($q=1+\frac{1}{p}$)下最大化损失的线性改进。实验上,Muon$^p$ 在微调中特别有效:在十亿级模型上,Muon$^p$ 改善了验证困惑度和下游任务性能。我们进一步通过谱几何的视角分析了 Muon$^p$ 何时不太适用。我们的结果揭示了关于何时保留奇异谱可以带来显著增益的重要见解,并引入了一种实现这些增益的原则性方法。

英文摘要

Muon is an increasingly widely used optimizer that replaces a gradient $G=USV^\top$ with its polar factor $UV^\top$, thereby flattening the singular spectrum. However, full flattening discards singular-value information that may matter for adaptation. We introduce Muon$^p$, a Muon-style optimizer that instead uses fractional spectral-power updates $US^pV^\top$ for rational $p\in(0,1)$, interpolating between Muon and gradient descent. To make it practical, we prove that fractional spectral powers cannot be computed by any fixed univariate polynomial iteration, and furthermore derive low-degree odd bivariate recurrences that approximate $US^pV^\top$ using only matrix multiplications, preserving Muon's matrix-multiplication-only structure and compute complexity. We show that Muon$^p$ maximizes the linear improvement in loss under the Schatten $q$-norm for $q=1+\frac{1}{p}$. Empirically, Muon$^p$ is especially effective for finetuning: on billion-scale models, Muon$^p$ improves validation perplexity and downstream task performance. We further analyze when Muon$^p$ is less suitable, through the lens of spectral geometry. Our results reveal important insights on when preserving the singular spectrum can bring significant gains, and introduce a principled way to achieve them.
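The update $US^pV^\top$ can be made concrete with a small NumPy sketch. Note the deliberate substitution: this version computes the fractional spectral power through an explicit SVD, purely as a reference for what the quantity is, whereas the abstract states that the paper approximates it with matrix-multiplication-only bivariate recurrences, which are not reproduced here. The function names and the 3x2 example matrix are hypothetical.

```python
import numpy as np

def fractional_spectral_power(grad, p):
    """Return U S^p V^T for the thin SVD grad = U S V^T.

    Reference computation via an explicit SVD (an assumption for clarity),
    not the matmul-only recurrence described in the abstract.
    """
    u, s, vt = np.linalg.svd(grad, full_matrices=False)
    return (u * s**p) @ vt  # scales column i of U by s_i^p

def schatten_exponent(p):
    """Schatten exponent paired with p in the abstract: q = 1 + 1/p."""
    return 1.0 + 1.0 / p

# Hypothetical 3x2 gradient with singular values 3 and 1.
grad = np.array([[3.0, 0.0],
                 [0.0, 1.0],
                 [0.0, 0.0]])
update_half = fractional_spectral_power(grad, 0.5)
print(np.linalg.svd(update_half, compute_uv=False))
print(schatten_exponent(0.5))
```

At the endpoints the sketch shows the interpolation described above: p = 1 returns the gradient itself, while p = 0 sets every singular value to 1 and gives the polar factor $UV^\top$ used by Muon. Intermediate p compresses the spectrum without flattening it completely.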

2606.14259 2026-06-15 cs.LG 新提交

Beyond a Single Explanation of the Adam--SGD Gap

超越对Adam与SGD差距的单一解释

Chenxiang Zhang, Rustem Islamov, Enea Monzio Compagnoni, Jun Pang, Aurelien Lucchi, Antonio Orvieto

Affiliations * University of Luxembourg; University of Basel; MPI for Intelligent Systems; ELLIS Tübingen; Tübingen AI Center

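AI summary Controlled experiments across vision, language, genomics, and graph tasks show that the Adam-SGD performance gap stems from nontrivial interactions between data and architecture rather than from a single factor, and reveal a crossover point that depends on batch size.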

Comments preprint

详情
AI中文摘要

先前的工作已经确定了几个可能导致Adam和SGD之间性能差距的因素,涵盖数据方面、架构设计和优化属性。然而,这些解释通常是孤立研究的,它们的相对重要性尚不清楚。在这项工作中,我们通过跨视觉、语言、基因组和图形任务、涵盖现代和经典架构以及精心设计的训练设置的控制实证研究重新审视了这些假设。我们的结果表明,没有单一因素能够一致地解释Adam-SGD差距。例如,Adam的优势可以(1)在均匀词汇分布下持续存在,但在重尾分布下几乎消失;(2)在softmax注意力模型中逆转,有利于SGD;(3)在软架构修改下变得更大,例如当ReLU被GeLU非线性替换时。这表明差距源于非平凡的数据和架构交互,而不是单一的共同因素。然而,我们在我们的设置中观察到一个模式:一个\emph{交叉批量大小},在该大小下,随着批量大小的缩放,相对优势从SGD转移到Adam。这些实证结果被我们的理论差距模型所捕获,该模型预测了这种依赖于批量大小的交叉。我们的视角有助于调和几个现有的假设,同时提供跨领域的实际见解。

英文摘要

Prior work has identified several factors that can contribute to the performance gap between Adam and SGD, spanning data aspects, architecture design, and optimization properties. Yet these explanations are often studied in isolation, leaving their relative importance unclear. In this work, we revisit these hypotheses through a controlled empirical study across vision, language, genomics, and graph tasks, spanning modern and classical architectures, and carefully designed training setups. Our results suggest that no single factor consistently explains the Adam--SGD gap. For instance, the Adam advantage can (1) persist under a uniform vocabulary distribution yet nearly disappear under a heavy-tailed one; (2) reverse in favor of SGD in softmax-attention models; and (3) become larger under soft architectural modifications, e.g., when ReLU is replaced by a GeLU nonlinearity. This suggests that the gap arises from nontrivial data and architecture interactions, rather than from a single common factor. Yet, we observe a pattern across our settings: a \emph{crossover batch size} at which the relative advantage shifts from SGD to Adam as the batch size scales. These empirical results are captured by our theoretical gap model, which predicts this batch-size-dependent crossover. Our perspective helps reconcile several existing hypotheses while offering practical insights across domains.

2606.14533 2026-06-15 cs.LG cs.GT 新提交

The Risk Shadow of Principal Component Analysis: When 99.9999% Variance Preservation Causes Catastrophic Decision Errors

主成分分析的风险阴影:当99.9999%的方差保留导致灾难性决策错误

Hamidou Tembine

Affiliations * Department of EECS, School of Engineering, UQTR, Canada; Learning and Game Theory Laboratory (LnG Lab), TIMADIE

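AI summary Proves that PCA can retain more than 99.9999% of the variance while losing all information about rare, high-impact events, so that a classifier on the PCA representation degenerates into a constant predictor; proposes Expectile PCA and Tail-Preserving PCA, which reweight the covariance to preserve tail-risk information.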

Comments 5 tables, 1 figure. all references fully checked manually

详情
AI中文摘要

主成分分析(PCA)保留方差,而非检测罕见灾难性事件所需的信息。本文证明了“风险阴影”的存在:PCA可以保留超过99.9999%的总方差,同时完全抹去关于罕见高影响失败的所有信号。当这种情况发生时,即使是在PCA表示上运行的最佳分类器也会退化为常数预测器。根本原因是方差最大化与尾部风险意识之间的根本不匹配。为了打破阴影,我们引入了Expectile PCA(ExPCA)和Tail-Preserving PCA(TP-PCA),这两种方法将数据协方差重新加权以偏向高影响事件。我们从理论上证明,ExPCA在保留罕见事件信息方面严格优于PCA,并在合成数据和真实世界的信用卡欺诈检测基准上验证了我们的主张。我们的结果呼吁在高风险决策中从根本上重新思考基于方差的降维方法。

英文摘要

Principal Component Analysis (PCA) preserves variance, not the information needed to detect rare catastrophic events. This paper proves the existence of a {\it Risk Shadow}: PCA can retain over 99.9999 percent of total variance while completely erasing all signal about rare, high-impact failures. When this happens, even the best possible classifier operating on the PCA representation reduces to a constant predictor. The root cause is a fundamental mismatch between variance maximization and tail risk awareness. To break the shadow, we introduce Expectile PCA (ExPCA) and Tail-Preserving PCA (TP-PCA), two methods that reweight the data covariance toward high-impact events. We prove theoretically that ExPCA strictly outperforms PCA in retaining rare-event information, and we validate our claims on synthetic data and a real-world credit card fraud detection benchmark. Our results call for a fundamental rethinking of variance-based dimensionality reduction in high-stakes decisions.
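The mismatch between variance and rare-event signal can be reproduced on toy data. The following is a minimal sketch under assumed synthetic data (one high-variance nuisance feature and one low-variance indicator of failures); it is not the paper's construction, and it does not implement ExPCA or TP-PCA.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
signal = (rng.random(n) < 0.001).astype(float)     # 1.0 on rare failures, else 0.0
bulk = rng.normal(0.0, 1000.0, size=n)             # high-variance feature unrelated to failures
X = np.column_stack([bulk, signal])

Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))  # eigenvalues in ascending order
retained = eigvals[-1] / eigvals.sum()             # share of variance kept by the top component
kept_scores = Xc @ eigvecs[:, -1]                  # the one-dimensional PCA representation
dropped_scores = Xc @ eigvecs[:, 0]                # the direction PCA discards

corr_kept = abs(np.corrcoef(kept_scores, signal)[0, 1])
corr_dropped = abs(np.corrcoef(dropped_scores, signal)[0, 1])
```

In this toy setup the retained component keeps almost all of the variance yet is nearly uncorrelated with the failure indicator, while the discarded direction carries essentially all of the failure signal.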

2606.14640 2026-06-15 cs.LG cs.DS 新提交

Online Convex Optimization with Sublinear Noisy Probes

具有亚线性噪声探测的在线凸优化

Simone Di Gregorio, Anupam Gupta, Stefano Leonardi, Matteo Russo

Affiliations * Sapienza University of Rome; New York University; EPFL

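AI summary Studies how a sublinear budget of noisy pairwise probes reduces regret in online convex optimization, obtaining tight bounds through variance reduction and a second-order analysis of Continuous Exponential Weights.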

Comments Accepted at COLT '26

详情
AI中文摘要

我们研究在凸集 $K\subseteq \mathbb R^d$ 上的在线凸优化(OCO),其中每轮 $t$ 学习者选择 $x_t\in K$,然后观察凸损失 $f_t:K\to[0,1]$,目标是最小化相对于事后最佳固定决策的遗憾。我们引入一个统一的探测模型,该模型概括了两个近期工作方向:专家设置中的亚线性最佳专家查询,以及 OCO 中每轮可用的成对(基于比较)反馈。在我们的框架中,学习者有 $k\le T$ 个成对探测预算;在被探测的轮次中,它可以查询两个点并了解哪个点的损失更小。我们的主要结果表明,即使亚线性和噪声的探测预算也能在完全反馈 OCO 机制中显著改善最坏情况遗憾。使用 $k$ 个 $\delta$-噪声成对探测,我们得到:$ \text{Reg}_T \le O\left(\min\left\{\sqrt{dT\ln T},\; \frac{dT\ln T}{k|1-2\delta|}\right\}\right) $,该界在 $T$、$k$ 和 $\delta$ 上紧(至多相差 $T$ 的对数因子)。具体关于噪声参数 $\delta \in [0,1]$,当预言响应接近抛硬币(即 $\delta$ 接近 $\frac{1}{2}$)时,遗憾保证平滑退化。当将相同技术应用于具有 $d$ 个专家的有限 $K$ 的预测设置时,所得速率在所有参数(包括 $d$)上完全紧。我们的分析通过量化探测的方差缩减效应,结合连续指数权重的二阶(基于方差)分析,给出了 OCO 中成对探测的简化处理。

英文摘要

We study Online Convex Optimization (OCO) over a convex set $K\subseteq \mathbb R^d$, where in each round $t$ the learner selects $x_t\in K$ and then observes a convex loss $f_t:K\to[0,1]$, with the goal of minimizing regret to the best fixed decision in hindsight. We introduce a unified probing model that generalizes two recent lines of work: sublinear best-expert queries in the experts setting, and pairwise (comparison-based) feedback available every round in OCO. In our framework, the learner has a budget of $k\le T$ pairwise probes; on a probed round it may query two points and learn which one has smaller loss. Our main result shows that even a sublinear and noisy probe budget can provably improve worst-case regret in the full feedback OCO regime. With $k$ $\delta$-noisy pairwise probes, we obtain: $ \text{Reg}_T \le O\left(\min\left\{\sqrt{dT\ln T},\; \frac{dT\ln T}{k|1-2\delta|}\right\}\right) $, which is tight (up to logarithmic factors in $T$) across $T$, $k$ and $\delta$. Specifically regarding the noise parameter $\delta\in [0,1]$, the regret guarantee smoothly degrades as the oracle response approaches a coin flip, i.e., $\delta$ is close to $\frac{1}{2}$. When applying the same techniques to a finite $K$ for the prediction with $d$ experts setting, the resulting rates are instead completely tight in all parameters, including $d$. Our analysis gives a streamlined treatment of pairwise probing in OCO by quantifying the benefit of probing via a variance reduction effect, combined with a second-order (variance-based) analysis of Continuous Exponential Weights.
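To see when the probe term in the stated bound takes over, the two branches of the minimum can be evaluated directly. This is a small sketch that sets the hidden constant in $O(\cdot)$ to 1, which is an assumption for illustration only; the function names are hypothetical.

```python
import math

def regret_bound(d: int, T: int, k: float, delta: float) -> float:
    """Evaluate min{sqrt(dT ln T), dT ln T / (k |1 - 2 delta|)} with the O(.) constant set to 1."""
    no_probe = math.sqrt(d * T * math.log(T))
    gap = abs(1.0 - 2.0 * delta)
    if k == 0 or gap == 0.0:
        return no_probe                    # no probes, or coin-flip answers: only the first branch applies
    return min(no_probe, d * T * math.log(T) / (k * gap))

def probes_needed(d: int, T: int, delta: float) -> float:
    """Budget k at which both branches coincide: k |1 - 2 delta| = sqrt(dT ln T). Requires delta != 1/2."""
    return math.sqrt(d * T * math.log(T)) / abs(1.0 - 2.0 * delta)
```

Under this reading, probes only improve the bound once $k\,|1-2\delta|$ exceeds $\sqrt{dT\ln T}$, and the required budget grows without limit as $\delta$ approaches $\frac{1}{2}$.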

2606.14648 2026-06-15 cs.LG math.OC 新提交

Which Directions Matter? Sparse Design for Affine Robust Optimization

哪些方向重要?仿射鲁棒优化的稀疏设计

Pedro Chumpitaz-Flores, My Duong, Juan S. Borrero, Kaixun Hua

Affiliations * University of South Florida

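AI summary Studies how to select uncertainty directions for robust optimization under a finite dictionary and a budget constraint; proposes a data-driven selection rule based on a coverage objective, proves that the objective is monotone and submodular, and gives an approximation guarantee for the greedy algorithm together with a matching hardness barrier.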

Comments Accepted at UAI 2026

详情
AI中文摘要

鲁棒机器学习和优化依赖于不确定性模型的选择。我们研究了当由有限字典和预算约束定义时,模型必须覆盖哪些不确定性方向。选择一个子集形成一个具有闭式支持函数的原子不确定性集,从而为仿射目标产生可处理的鲁棒程序。我们提出了一种基于评估方向(包括梯度、对抗扰动或保留数据上观察到的偏移)上的覆盖目标的数据驱动选择规则。我们证明该目标是单调且次模的,支持具有$(1-1/e)$近似保证的贪心方法和匹配的难度障碍。我们还提供了一个证书,用于限制所选子集的损失,以及一个具有样本外控制的半径校准规则。

英文摘要

Robust machine learning and optimization rely on the uncertainty model choice. We investigate which uncertainty directions a model must cover when defined by a finite dictionary and a budget constraint. Selecting a subset forms an atomic uncertainty set with a closed form support function, yielding tractable robust programs for affine objectives. We propose a data driven selection rule based on a coverage objective over evaluation directions, including gradients, adversarial perturbations, or shifts observed on held out data. We prove this objective is monotone and submodular, supporting a greedy method with a $(1-1/e)$ approximation guarantee and a matching hardness barrier. We also provide a certificate bounding the loss from the selected subset and a radius calibration rule with out of sample control.
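The greedy selection mentioned above can be sketched for one concrete coverage objective. The abstract does not spell out the objective, so the choice below, $f(S)=\sum_j \max_{a\in S}|\langle a, g_j\rangle|$ over evaluation directions $g_j$, is an assumption made for illustration; it is a facility-location-style function, which is monotone and submodular. The names `coverage` and `greedy_select` are hypothetical.

```python
import numpy as np

def coverage(atoms: np.ndarray, selected: list[int], directions: np.ndarray) -> float:
    """Assumed objective f(S) = sum_j max_{a in S} |<a, g_j>|, with f(empty set) = 0."""
    if not selected:
        return 0.0
    scores = np.abs(directions @ atoms[selected].T)   # shape (num_directions, |S|)
    return float(scores.max(axis=1).sum())

def greedy_select(atoms: np.ndarray, directions: np.ndarray, budget: int) -> list[int]:
    """Add, one at a time, the dictionary atom with the largest coverage after insertion."""
    selected: list[int] = []
    for _ in range(min(budget, len(atoms))):
        remaining = [i for i in range(len(atoms)) if i not in selected]
        values = [coverage(atoms, selected + [i], directions) for i in remaining]
        selected.append(remaining[int(np.argmax(values))])
    return selected

atoms = np.eye(3)                                     # toy dictionary: coordinate directions
directions = np.array([[3.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]])
chosen = greedy_select(atoms, directions, budget=2)   # picks atoms 0 and 1
```

For monotone submodular objectives under a cardinality budget, this greedy rule is the standard route to a $(1-1/e)$ approximation guarantee.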

2606.14679 2026-06-15 cs.LG cs.SY eess.SY math.OC stat.ML 新提交

Optimal Hidden-Target Learning for Online Inventory Optimization on General Convex Sets

一般凸集上在线库存优化的最优隐藏目标学习

Anthony Pineci, Yunzong Xu

Affiliations * UIUC (University of Illinois Urbana-Champaign)

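AI summary For online inventory optimization on general convex capacity sets, a hidden-target projection method improves the regret from inverse to inverse-square-root dependence on the common-demand probability, with a matching lower bound, and gives the first polylogarithmic regret guarantee for strongly convex losses and the first dynamic regret guarantee.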

详情
AI中文摘要

在线库存优化(OIO)是具有物理记忆的在线凸优化:库存结转使得可行动作集依赖于过去。一个自然的原则——在随机库存学习以及最近在单一线性容量约束下的OIO中使用——是维护一个由在线学习器选择的隐藏目标,并将其投影到当前可行的订货上限集上。我们证明,对于任意有界凸容量集上的OIO,这一简单原则是最优的。以在线梯度下降为基础学习器,该方法将一般凸集上OIO的最佳已知遗憾保证从对共同需求概率的逆依赖改进为平方根逆依赖,并且我们证明了匹配的下界。同样的原则为强凸损失提供了首个多对数遗憾保证,并为一般凸容量集上的欧几里得路径变化提供了首个动态遗憾保证。分析引入了一个范数对齐原则:正确的状态变量是隐藏目标到可行集的距离,以与投影相同的范数度量。在范数对齐下,该距离路径地演化为一个标量队列,目标移动作为到达,共同需求作为服务。这种简化为一维队列控制解决了状态依赖性,并将保证扩展到一般凸容量集,超出了先前乘积方法的范围。在合成和真实库存数据上的实验证实了该理论。

英文摘要

Online inventory optimization (OIO) is online convex optimization with physical memory: inventory carryover makes the feasible action set depend on the past. A natural principle, used in stochastic inventory learning and recently in OIO under a single linear capacity constraint, is to maintain a hidden target chosen by an online learner and implement its projection onto the currently feasible order-up-to set. We prove that this simple principle is optimal for OIO on arbitrary bounded convex capacity sets. With online gradient descent as the base learner, the method improves the best known regret guarantee for OIO on general convex sets from inverse to inverse-square-root dependence on the common-demand probability, and we prove a matching lower bound. The same principle gives the first polylogarithmic regret guarantee for strongly convex losses and the first dynamic regret guarantee adapting to Euclidean path variation on general convex capacity sets. The analysis introduces a norm alignment principle: the right state variable is the distance from the hidden target to the feasible set, measured in the same norm as the projection. Under norm alignment, this distance evolves pathwise as a scalar queue, with target movement as arrival and common demand as service. This reduction to one-dimensional queue control resolves the state dependence and extends the guarantees to general convex capacity sets, beyond the reach of prior productwise approaches. Experiments on synthetic and real-world inventory data corroborate the theory.

2606.14690 2026-06-15 cs.LG cs.IT math.IT 新提交

A Complexity Measure for Active Learning in Multi-group Mean Estimation

多组均值估计中主动学习的复杂度度量

Abdellah Aznag, Rachel Cummings, Adam N. Elmachtoub

Affiliations * Department of Industrial Engineering and Operations Research & Data Science Institute, Columbia University

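AI summary For the max-risk objective in multi-group mean estimation, develops a local minimax framework and proves a general lower bound; introduces the Variance Local Curvature (VLC) as a complexity measure, relates it to a variance-Fisher information for smooth classes, and identifies a systematic gap in highly heterogeneous instances.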

详情
AI中文摘要

我们研究了多组均值估计$d$臂老虎机中主动学习的\emph{max-risk}目标:学习者在$d$组间自适应分配$T$个样本的预算,以最小化最坏情况不确定性指标$\max_{k\in[d]}\sigma_k^2/n_k$,其中$\sigma_k$是臂$k$分布的标准差,$n_k$是臂$k$被采样的次数。我们开发了一个局部极小极大框架,并证明了该目标的第一个通用下界,适用于任何有限方差假设类。该下界将难度分解为三个正交因素:\emph{预算}项、衡量不确定性在臂间分布不均匀程度的\emph{异方差性}指数,以及一个模型相关的复杂度度量——\emph{方差局部曲率}($\mathrm{VLC}$),它捕捉了局部方差变化在假设类内创造的信息量。对于平滑类,$\mathrm{VLC}$是方差-费希尔信息的重新参数化,常见族具有闭式值。与现有最强上界对比表明,在广泛范围内接近最优(对数因子内),并在高度异质实例中指出了系统性差距。我们的证明引入了两个关键要素:决策空间上的损失诱导$\ell_1$几何,以及一个基于表示的实例生成器,将困难实例构造简化为显式随机矩阵计算。

英文摘要

We study a \emph{max-risk} objective for active learning in multi-group mean estimation, framed as a $d$-armed bandit: a learner adaptively allocates a budget of $T$ samples across $d$ groups to minimize the worst-case uncertainty index $\max_{k\in[d]}\sigma_k^2/n_k$, where $\sigma_k$ is the standard deviation of the distribution of arm $k$, and $n_k$ is the number of times arm $k$ is sampled. We develop a local minimax framework and prove the first general lower bound for this objective, valid for any finite-variance hypothesis class. The bound separates difficulty into three orthogonal factors: a \emph{budget} term, a \emph{heteroscedasticity} index measuring how unevenly the uncertainty is spread across arms, and a model-dependent complexity measure, the \emph{Variance Local Curvature} ($\mathrm{VLC}$), which captures how much information a local change of variance creates inside the hypothesis class. For smooth classes, the $\mathrm{VLC}$ is a reparametrization of a variance--Fisher information, with closed-form values for common families. Benchmarking against the strongest available upper bound shows near-optimality up to logarithmic factors in broad regimes, and pinpoints a systematic gap in highly heterogeneous instances. Our proof introduces two key ingredients: a loss-induced $\ell_1$ geometry on the decision space, and a representation-based instance generator that reduces hard-instance construction to an explicit random matrix calculation.
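The max-risk index has a simple benchmark when the standard deviations are known in advance: allocating $n_k \propto \sigma_k^2$ equalizes $\sigma_k^2/n_k$ across arms and gives the value $\sum_j \sigma_j^2 / T$. The sketch below computes this oracle allocation for assumed toy values; it ignores integrality of the counts and is not the paper's adaptive algorithm, which must learn the variances from samples.

```python
import numpy as np

def oracle_allocation(sigmas, T):
    """Continuous allocation minimizing max_k sigma_k^2 / n_k subject to sum_k n_k = T."""
    var = np.asarray(sigmas, dtype=float) ** 2
    return T * var / var.sum()

def max_risk(sigmas, counts):
    """Worst-case uncertainty index max_k sigma_k^2 / n_k."""
    var = np.asarray(sigmas, dtype=float) ** 2
    return float(np.max(var / np.asarray(counts, dtype=float)))

sigmas = [1.0, 2.0, 3.0]                       # assumed toy standard deviations
T = 1400
n_star = oracle_allocation(sigmas, T)          # [100., 400., 900.]
risk_star = max_risk(sigmas, n_star)           # 14 / 1400 = 0.01
risk_uniform = max_risk(sigmas, [T / 3] * 3)   # dominated by the noisiest arm
```

Uniform sampling is worse here because the arm with the largest variance sets the index, which is the gap an adaptive learner tries to close without knowing the $\sigma_k$.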

2606.13733 2026-06-15 cs.IT cs.LG math.IT 交叉投稿

How Task Structure Limits Multi-Agent Success: An Information-Theoretic Analysis

任务结构如何限制多智能体成功:一种信息论分析

Shi Pan, Ming Luo

Affiliations * University College London; University of Bristol

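AI summary An information-theoretic analysis shows that, under typicality conditions on the task's constraint graph and bounded inter-agent communication, the success probability of a multi-agent system decays exponentially with the minimum cut cost, which offers guidance for system design.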

详情
AI中文摘要

多智能体系统(MAS)曾预期通过协作克服单智能体系统(SAS)的局限性。然而,在任务约束图的典型性条件和有界智能体间通信下,我们证明MAS的成功概率与任务约束的连通性密切相关,其中每个智能体具有有限的信息处理能力。具体而言,成功概率随由任务约束图在智能体间划分产生的信息瓶颈呈指数衰减。我们将此量定义为每个任务潜在约束图的\emph{最小割成本}$C_{\min}$。该信息论界适用于具有外部反馈的开放系统和不具有外部反馈的封闭系统。我们在合成实验和来自SWE-bench提交的真实世界经验数据上验证了我们的理论。根据我们的框架,有效的MAS设计应结合任务固有约束与工程优化,当$C_{\min}$较高时,实践者应重构任务而非简单扩展智能体或通信。

英文摘要

Multi-agent systems (MAS) were expected to overcome the limitations of single-agent systems (SAS) through collaboration. However, under typicality conditions on the task's constraint graph and bounded inter-agent communication, we prove that the success probability of a MAS is closely tied to the connectivity of task constraints, where each agent has limited information-processing capacity. Specifically, the success probability decays exponentially with an information bottleneck that emerges from partitioning the task's constraint graph among agents. We define this quantity as the \emph{minimum cut cost} $C_{\min}$ of the potential constraint graph of each task. This information-theoretic bound applies to both open systems with external feedback and closed systems without. We validate our theory on both synthetic experiments and real-world empirical data from SWE-bench submissions. From our framework, effective MAS design should incorporate task-inherent constraints alongside engineering optimization, and when $C_{\min}$ is high, practitioners should restructure tasks rather than simply scaling agents or communication.

2606.13799 2026-06-15 cs.CC cs.LG 交叉投稿

The Program Is Still There: A Conservation Law for Program Discovery

程序仍在:程序发现的一个守恒律

Jorge Miguel Silva

Affiliations * Institute of Electronics and Informatics Engineering of Aveiro (IEETA) and Department of Electronics, Telecommunications and Informatics (DETI), University of Aveiro

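AI summary Proves that, for algorithms that learn about candidate programs only through their scores, the coupling width of the search problem yields an exponential worst-case lower bound, from which a conservation law between structural knowledge and search follows; the only escape is to read a program's structure rather than its score, at the price of incompleteness.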

Comments 9 pages main text and 33 pages supporting information. Engine source and full sweep data: https://github.com/jorgeMFS/omnis, archived at doi:10.5281/zenodo.20634984

详情
AI中文摘要

寻找生成序列的最短程序是不可计算的,六十年来这一事实被误认为是寻找任何生成程序的障碍。它不是障碍,而是一个代价,本文衡量了它。对于每个仅通过得分学习候选程序的算法,涵盖Levin搜索、进化方法、模拟退火和交叉熵方法,我们定义了搜索问题的耦合宽度,并证明了一个无条件最坏情况下的下界,该下界以该宽度为指数,底数为域大小减一。由此得出一个守恒律:注入搜索的结构知识与它消除的搜索一一对应,它们的总和永远不会低于所寻找的程序长度。Levin 1973年的上界和本文证明的下界是一个守恒量的两端,随着指令集的增长而相互靠近。唯一的逃逸是读取候选程序的结构而非其得分,其代价(我们针对通用目标证明)是不完备性。基于该理论构建的确定性引擎通过压缩数据并预测未见过的延续来恢复生成程序,在四个独立群体的3914个序列中恢复了2383个,包括256个初等元胞自动机中的244个,测得的发现成本随程序长度上升,比得分-预言机最坏情况高出一个数量级以上。

英文摘要

Finding the shortest program that generates a sequence is uncomputable, and for six decades that fact has been mistaken for a wall around finding any generating program. It is not a wall but a price, and this paper measures it. For every algorithm that learns about a candidate program only through its score, a class spanning Levin search, evolutionary methods, simulated annealing, and the cross-entropy method, we define the coupling width of a search problem and prove an unconditional worst-case lower bound, exponential in that width with base one less than the domain size. From it follows a conservation law: structural knowledge injected into a search trades one for one against the search it removes, and their sum can never fall below the length of the program sought. Levin's 1973 upper bound and the lower bound proved here are the two ends of one conserved quantity, closing on each other as the instruction set grows. The only escape is to read a candidate's structure rather than its score, and its price, which we prove for generic targets, is incompleteness. A deterministic engine built on this theory recovers a generating program, certified by compressing its data and predicting an unseen continuation, for 2,383 of 3,914 sequences across four independent populations, including 244 of the 256 elementary cellular automata, with measured discovery cost rising along program length more than an order of magnitude inside the score-oracle worst case.

2606.13912 2026-06-15 cond-mat.dis-nn cond-mat.str-el cs.LG physics.comp-ph quant-ph 交叉投稿

Direct/adaptive-mixture phase-gradient learning for neural-network quantum states with complex phase structure

具有复杂相位结构的神经网络量子态的直接/自适应混合相位梯度学习

Yi-Ran Xue, Rui Wang, Baigeng Wang, Chenan Wei

Affiliations * National Laboratory of Solid State Microstructures and Department of Physics; Department of Physics, University of Massachusetts; A. Alikhanyan National Science Laboratory; Collaborative Innovation Center of Advanced Microstructures, Nanjing University; Jiangsu Physical Science Research Center; Hefei National Laboratory

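AI summary To address the fragile optimization of neural-network quantum states with complex phase structure, proposes a direct phase-gradient estimator and an adaptive mixture of estimators that substantially lower variance and improve accuracy, validated on a 100-site flux ladder and chiral XXX chains.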

Comments 24 pages, 8 figures

详情
AI中文摘要

神经网络量子态是量子多体物理中领先的变分工具,但当基态具有非平凡符号或复杂相位结构时(这在规范场、时间反演对称性破缺和费米子统计中是普遍存在的),其优化变得脆弱。我们将这种脆弱性归因于相位梯度的随机估计器,而非网络表达能力。蒙特卡洛能量梯度的相位部分是一个有噪声的得分函数估计器;相反,对局部能量进行微分得到一个直接估计器,该估计器对相同的相位力无偏,方差低得多,并且只需要分离的振幅-相位假设。在100位点通量梯子上演示,以这种方式训练的小型网络达到0.89%的中位误差,而调整后的标准基线停滞在1.8%,更宽或更深的标准梯度网络误差从8.4%退化到24.6%。该优势延续到手性XXX链:直接估计器再次收敛到比标准估计器明显更低的误差,跨越α和系统尺寸;该优势随通量增加而在零通量控制中消失。两个估计器的自适应混合在最优混合系数下方差绝不会比更好的端点差,通过种子分辨的诊断将大部分增益归因于消除失败运行。因此,估计器设计成为复值神经量子态的一类重要杠杆。

英文摘要

Neural-network quantum states (NQS) are a leading variational tool for quantum many-body physics, yet their optimization is fragile whenever the ground state carries a non-trivial sign or complex phase structure, a situation generic to gauge fields, broken time-reversal symmetry, and fermionic statistics. We trace this fragility to the stochastic estimator of the phase gradient rather than to network expressiveness. The phase sector of the Monte Carlo energy gradient is a noisy score-function estimator; differentiating the local energy instead yields a direct estimator that is unbiased for the same phase force, has far lower variance, and requires only a separated amplitude--phase ansatz. Demonstrated on a 100-site flux ladder, a small network trained this way reaches $0.89\%$ median error, where tuned standard baselines plateau at $1.8\%$ and wider or deeper standard-gradient networks degrade from $8.4\%$ to $24.6\%$. The advantage carries over to chiral XXX chains: the direct estimator again converges to a markedly lower error than the standard one, across $α$ and size; it grows with flux and vanishes in zero-flux controls. An adaptive-mixture of the two estimators is provably never worse in variance than the better endpoint at the optimal mixing coefficient, with seed-resolved diagnostics tracing much of the gain to eliminating failed runs. Estimator design thus emerges as a first-class lever for complex-valued neural quantum states.
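The claim that an optimally mixed estimator is never worse in variance than the better endpoint follows from minimizing a quadratic in the mixing coefficient. The scalar sketch below uses two synthetic unbiased estimators with correlated noise; the data and the names `optimal_mixture`, `noisy`, and `direct` are assumptions for illustration and do not reproduce the paper's phase-gradient estimators.

```python
import numpy as np

def optimal_mixture(a: np.ndarray, b: np.ndarray):
    """Mix two estimator samples as c*a + (1-c)*b with the variance-minimizing c."""
    cov = np.cov(a, b)                                   # 2x2 sample covariance matrix
    denom = cov[0, 0] + cov[1, 1] - 2.0 * cov[0, 1]      # sample variance of a - b
    c = 0.5 if denom == 0.0 else (cov[1, 1] - cov[0, 1]) / denom
    return c, c * a + (1.0 - c) * b

rng = np.random.default_rng(0)
truth = 1.0
shared = rng.standard_normal(10_000)                           # noise common to both estimators
noisy = truth + 3.0 * rng.standard_normal(10_000) + shared     # high-variance estimator
direct = truth + 0.5 * rng.standard_normal(10_000) + shared    # low-variance estimator
c, mixed = optimal_mixture(noisy, direct)
```

Because $c=0$ and $c=1$ are both admissible, the minimizing coefficient can only match or improve on either endpoint, and the mixture stays unbiased when both inputs are.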

2606.13984 2026-06-15 stat.ML cs.LG stat.ME 交叉投稿

A General Framework for Decision Trees via Bregman Divergences

基于Bregman散度的决策树通用框架

Mathias Bourel

Affiliations * IESTA, Facultad de Ciencias Económicas y de Administración, Universidad de la República, Uruguay; IRL-2030, Instituto Franco-Uruguayo de Matemática e Interacciones (IFUMI)

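AI summary Proposes a generalization of CART based on Bregman divergences that unifies many loss functions and splitting criteria, and studies how strong convexity and smoothness of the generating convex function affect impurity gains and the stability and consistency of the estimator.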

详情
AI中文摘要

决策树因其可解释性、灵活性以及适应非线性结构的能力,成为统计学习中的基本工具之一。其中,由Breiman、Friedman、Olshen和Stone于1984年引入的分类与回归树(CART)成为最具影响力的算法之一,至今仍是分类和回归问题中最广泛使用的方法之一。另一方面,由Lev Bregman于1967年在凸优化背景下引入的Bregman散度,提供了广泛的一类损失函数,自然地推广了平方欧氏距离。该族包括Kullback-Leibler散度、Poisson散度和Itakura-Saito散度,以及与指数族分布相关的若干损失函数。此外,Bregman散度具有丰富的几何结构,并与凸分析和信息几何有深刻联系。本文提出基于Bregman散度的CART范式推广,从而获得适应不同统计模型和底层几何结构的更广泛的决策树族。尽管CART或经典实现(如rpart)等算法包含了不同的杂质准则,但这些准则通常针对每个特定模型以临时方式引入。相比之下,Bregman散度方法提供了一个统一的框架,使得这些准则可以从共同的凸和几何原理中推导和解释。除了算法构建,我们还研究了这些树的理论性质。特别地,我们研究了生成凸函数的性质(如强凸性或光滑性)如何影响父节点与子节点之间的杂质增益,以及估计器的稳定性和一致性。

英文摘要

Decision trees are one of the fundamental tools in statistical learning due to their interpretability, flexibility, and their ability to adapt to nonlinear structures. Among them, the Classification and Regression Trees, introduced by Breiman, Friedman, Olshen, and Stone in 1984, became one of the most influential algorithms and remains one of the most widely used methods for classification and regression problems. On the other hand, Bregman divergences, introduced by Lev Bregman in 1967 in the context of convex optimization, provide a broad family of loss functions that naturally generalize the squared Euclidean distance. This family includes, among others, the Kullback-Leibler divergence, the Poisson divergence, and the Itakura-Saito divergence, as well as several losses associated with distributions belonging to the exponential family. Moreover, Bregman divergences possess a rich geometric structure and deep connections with convex analysis and information geometry. In this work, we propose a generalization of the CART paradigm based on Bregman divergences, thereby obtaining a broader family of decision trees adapted to different statistical models and underlying geometries. Although algorithms such as CART or classical implementations such as rpart incorporate different impurity criteria, these are usually introduced in an ad hoc manner for each specific model. In contrast, the Bregman divergence approach provides a unified framework that allows these criteria to be derived and interpreted from common convex and geometric principles. Beyond the algorithmic construction, we also investigate theoretical properties of these trees. In particular, we study how properties of the generating convex function -- such as strong convexity or smoothness -- influence impurity gains between parent and child nodes, as well as stability and consistency properties of the estimator.
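One way to make the unified impurity idea concrete: for a convex generator $\phi$, the mean Bregman divergence from the responses in a node to their mean equals $\overline{\phi(y)} - \phi(\bar y)$, because the linear term averages to zero. With $\phi(t)=t^2$ this is the variance used by regression CART. The sketch below follows that identity; the function names and the two generators are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def bregman_impurity(y, phi) -> float:
    """Mean Bregman divergence to the node mean: mean(phi(y)) - phi(mean(y))."""
    y = np.asarray(y, dtype=float)
    return float(np.mean(phi(y)) - phi(np.mean(y)))

def impurity_gain(y, left_mask, phi) -> float:
    """Parent impurity minus the size-weighted impurity of the two (non-empty) children."""
    y = np.asarray(y, dtype=float)
    left_mask = np.asarray(left_mask, dtype=bool)
    w = left_mask.mean()
    children = w * bregman_impurity(y[left_mask], phi) + (1.0 - w) * bregman_impurity(y[~left_mask], phi)
    return bregman_impurity(y, phi) - children

def squared(t):
    return t ** 2                  # generates the squared Euclidean distance

def neg_entropy(t):
    return t * np.log(t) - t       # generates the generalized KL (Poisson) divergence, t > 0

y = [1.0, 1.0, 5.0, 5.0]
split = [True, True, False, False]
gain_sq = impurity_gain(y, split, squared)        # 4.0: the split removes all variance
gain_kl = impurity_gain(y, split, neg_entropy)    # nonnegative by convexity
```

Swapping the generator changes the splitting criterion without changing the tree-growing code, which is the sense in which the criteria share one convex principle.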

2606.14053 2026-06-15 stat.ML cs.LG 交叉投稿

Hybrid Uncertainty Sensitivity Analysis Based on the HSIC for High-Dimensional Responses with Aleatory--Epistemic Separation

基于HSIC的混合不确定性灵敏度分析:面向具有偶然-认知分离的高维响应

Shijie Zhong, Jiangfeng Fu, Pengfei Wei

Affiliations * School of Power and Energy, Northwestern Polytechnical University

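AI summary Proposes a double-space tensor-product RKHS framework that uses factorized kernels and a double Möbius inversion to orthogonally decompose the global dependence measure into pure aleatory effects, pure epistemic effects, and their interaction contributions, enabling sensitivity analysis under hybrid uncertainty for high-dimensional responses.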

Comments 19 pages, 7 figures

详情
AI中文摘要

量化混合偶然和认知不确定性对高维系统响应的影响仍然是全局灵敏度分析(GSA)中的主要挑战。现有的基于希尔伯特-施密特独立性准则(HSIC)的方法主要局限于单输出设置,并且缺乏对异质不确定性来源及其相互作用的严格分解。为了解决这一局限性,提出了一种新颖的双空间张量积RKHS框架,用于混合不确定性下的灵敏度分析。通过在潜在输入空间和多维输出空间上构造因子化核,推导出并发双重Möbius反演,将全局依赖度量正交分解为纯偶然效应、纯认知效应及其交互贡献。得到的维度灵敏度指数保留了所有输出维度上的不确定性归因结构。为了满足分解所需的独立性假设,引入了基于逆概率积分变换的辅助变量表示,使得能够在统一的潜在空间中处理层次不确定性和Copula诱导的相关性。进一步开发了完全向量化的单循环实现,以避免嵌套蒙特卡洛模拟的计算负担。通过置换检验和Bootstrap置信区间量化统计显著性和估计不确定性。在改进的多输出Ishigami函数和空气动力学压力场问题上的数值研究证明了所提出框架的准确性、可扩展性和实际适用性。

英文摘要

Quantifying the influence of hybrid aleatory and epistemic uncertainties on high-dimensional system responses remains a major challenge in global sensitivity analysis (GSA). Existing Hilbert--Schmidt Independence Criterion (HSIC)-based approaches are primarily restricted to single-output settings and lack a rigorous decomposition of heterogeneous uncertainty sources and their interactions. To address this limitation, a novel double-space tensor-product RKHS framework is proposed for sensitivity analysis under hybrid uncertainty. By constructing factorized kernels over both the latent input space and the multidimensional output space, a concurrent double Möbius inversion is derived to orthogonally decompose the global dependence measure into pure aleatory effects, pure epistemic effects, and their interaction contributions. The resulting dimension-wise sensitivity indices preserve the uncertainty attribution structure across all output dimensions. To satisfy the independence assumptions required by the decomposition, an auxiliary-variable representation based on the inverse probability integral transform is introduced, enabling the treatment of hierarchical uncertainties and Copula-induced correlations within a unified latent space. A fully vectorized single-loop implementation is further developed to avoid the computational burden of nested Monte Carlo simulation. Statistical significance and estimation uncertainty are quantified through permutation testing and Bootstrap confidence intervals. Numerical studies on a modified multi-output Ishigami function and an aerodynamic pressure-field problem demonstrate the accuracy, scalability, and practical applicability of the proposed framework.
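As background for the HSIC-based indices and the permutation testing mentioned above, the basic biased HSIC estimate $\operatorname{tr}(KHLH)/n^2$ can be sketched as follows. The Gaussian kernels, fixed bandwidths, and function names are assumptions for illustration; the paper's factorized kernels, Möbius decomposition, and single-loop implementation are not reproduced here.

```python
import numpy as np

def rbf_gram(x, bandwidth: float) -> np.ndarray:
    """Gaussian-kernel Gram matrix; rows of x are samples (works for multi-output responses)."""
    x = np.asarray(x, dtype=float)
    x = x.reshape(len(x), -1)
    sq = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * bandwidth ** 2))

def hsic(x, y, bandwidth_x: float = 1.0, bandwidth_y: float = 1.0) -> float:
    """Biased (V-statistic) HSIC estimate: trace(K H L H) / n^2."""
    n = len(x)
    K, L = rbf_gram(x, bandwidth_x), rbf_gram(y, bandwidth_y)
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    return float(np.trace(K @ H @ L @ H) / n ** 2)

def permutation_pvalue(x, y, num_perm: int = 200, seed: int = 0) -> float:
    """Permutation test: shuffle y to simulate independence and compare with the observed HSIC."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    observed = hsic(x, y)
    null = [hsic(x, y[rng.permutation(len(y))]) for _ in range(num_perm)]
    return (1 + sum(v >= observed for v in null)) / (1 + num_perm)
```

A small p-value indicates that the observed dependence between an input and the response is unlikely under independence, which is the role permutation testing plays for each sensitivity index.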

2606.14268 2026-06-15 stat.ML cs.LG 交叉投稿

Gradient boosting for extremes: sampling theory and application to insurance

极值的梯度提升:抽样理论及其在保险中的应用

Stéphane Lhaut, Olivier Lopez

Affiliations * CREST, CNRS, Ecole polytechnique, Groupe ENSAE-ENSAI, ENSAE Paris, Institut Polytechnique de Paris, Palaiseau, France

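AI summary Develops a statistical learning theory for gradient boosting estimation of covariate-dependent Generalized Pareto distributions, uses an orthogonal reparametrization to improve convergence stability, and applies the method to insurance data.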

Comments 36 pages, 10 figures

详情
AI中文摘要

我们为梯度提升在超阈值建模中估计协变量依赖的广义帕累托(GP)分布开发了统计学习理论。在对GP似然进行正交重参数化以对角化其Fisher信息矩阵后,我们将估计问题纳入经验风险最小化(ERM)框架,并推导了提升估计器的非渐近误差界。我们的分析考虑了过程中的三个不同误差来源:统计波动、GP模型渐近性质固有的近似偏差(在二阶正则变化下控制)以及与有限次提升迭代相关的近似误差,明确了由此产生的偏差-方差权衡。通过模拟,我们展示了重参数化的实际好处,表明它在训练过程中显著降低了梯度相关性并提高了收敛稳定性。该方法应用于德克萨斯州保险部的医疗事故保险数据集,包含超过18000个已结索赔。梯度提升方法对和解成本分布的尾部拟合良好,并揭示出和解天数是对尾部重尾性起主导作用的预测因子,这与准备金文献中的早期发现一致。

英文摘要

We develop a statistical learning theory for gradient boosting applied to the estimation of covariate-dependent Generalized Pareto (GP) distributions in the context of Peaks-over-Threshold modeling. After an orthogonal reparametrization of the GP likelihood that diagonalizes its Fisher information matrix, we cast the estimation problem within the Empirical Risk Minimization (ERM) framework and derive non-asymptotic error bounds for the boosting estimator. Our analysis accounts for three distinct sources of error in the process: statistical fluctuations, the approximation bias inherent to the asymptotic nature of the GP model (controlled under second-order regular variation), and the approximation error associated with the finite number of boosting iterates, making explicit the resulting bias-variance trade-off. We illustrate the practical benefits of the reparametrization through simulations, showing that it significantly reduces gradient correlation during training and improves convergence stability. The methodology is applied to a medical malpractice insurance dataset from the Texas Department of Insurance, comprising over 18,000 closed claims. The gradient boosting approach yields a good fit for the tail of settlement cost distributions and reveals that the number of days to settlement is the dominant predictor of tail heaviness, consistent with earlier findings in the reserving literature.

2606.14289 2026-06-15 math.OC cs.LG cs.NA cs.NE math.NA stat.ML 交叉投稿

Operator Calculus for Population-Based Optimization: A Mean-Field Convergence Theory

基于群体的优化的算子演算:平均场收敛理论

Pekka Malo, Lauri Viitasaari, Patrik Nummi, Antti Suominen, Ankur Sinha, Olli Tahvonen

发表机构 * Aalto University(阿尔托大学)

AI总结 提出一种算子演算,将多种基于群体的优化方法统一为三个基本算子(变异、选择、重组)的复合,并建立模块化Lyapunov原理,证明在稳定性和正则性条件下指数收敛。

Comments 71 pages, 4 figures, 2 tables; ancillary files contain Python code reproducing the numerical experiments

详情
AI中文摘要

基于群体的和分布优化方法,从进化策略和基于共识的优化到协方差矩阵适应和视为分布动力学的随机梯度方法,被广泛用于非凸或黑箱问题,但它们的收敛分析仍然分散在特定算法的技术中。我们引入一种算子演算,其中一大类这样的方法,在选择适当的状态空间并在必要时通过记忆或策略变量增强状态后,被描述为作用于概率测度的三个基本算子(变异、选择、重组)的复合。在明确的稳定性和正则性条件下,复合算子允许一个预生成子,其连续时间极限是一个保持算子分裂的输运-反应-跳跃(TRJ)偏微分方程。在此基础之上,我们建立了一个模块化的Lyapunov原理。如果状态空间Lyapunov函数既在完整生成子下耗散,又控制相关的搜索空间度量,那么状态空间Lyapunov泛函和诱导的搜索误差指数衰减。加性生成子结构允许逐个算子地组装耗散估计,为验证复合平均场算法的收敛性提供了一个工具箱。

英文摘要

Population-based and distributional optimization methods, from evolution strategies and consensus-based optimization to covariance-matrix adaptation and stochastic gradient methods viewed as distributional dynamics, are widely used for nonconvex or black-box problems, yet their convergence analyses remain fragmented across algorithm-specific techniques. We introduce an operator calculus in which a broad class of such methods, after choosing an appropriate state space and, where necessary, augmenting the state by memory or strategy variables, is described as a composition of three elementary operators (mutation, selection, and recombination) acting on probability measures. Under explicit stability and regularity conditions, the composite operator admits a pre-generator whose continuous-time limit is a transport-reaction-jump (TRJ) PDE that preserves the operator splitting. On this foundation we establish a modular Lyapunov principle. If a state-space Lyapunov function both dissipates under the full generator and controls the relevant search-space gauges, then the state-space Lyapunov functional and the induced search errors decay exponentially. The additive generator structure allows dissipation estimates to be assembled operator by operator, providing a toolkit for certifying convergence of composite mean-field algorithms.

2606.14335 2026-06-15 math.ST cs.IT cs.LG math.IT stat.TH 交叉投稿

Recovery thresholds for hidden weighted sparse graphs

隐藏加权稀疏图的恢复阈值

Zhe Hou, Jingcheng Liu

Affiliations * State Key Laboratory for Novel Software Technology

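AI summary Studies thresholds for recovering a hidden graph from a noisy weighted complete graph, gives a unified characterization that connects the KL divergence between the edge-weight distributions to the first moment threshold in the Erdős-Rényi random graph model, and extends the results to partial recovery and the All-or-Nothing phenomenon.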

Comments 34 pages, 4 figures

详情
AI中文摘要

从含噪高维数据中恢复结构信息是统计推断的基本任务。我们研究隐藏在随机加权完全图中的图的恢复阈值。具体地,未知图 $H^* \in H_n$ 均匀随机选取,并隐藏在 $n$ 个顶点的完全图中:边 $e \in H$ 的权重独立地服从分布 $P_n$;否则权重独立地服从分布 $Q_n$。目标是从这些边权重中恢复几乎全部的 $H$。假设分布 $P_n$ 和 $Q_n$ 之间的Rényi散度满足局部Lipschitz条件,且图族 $H_n$ 满足温和的密度条件,我们给出了恢复几乎全部 $H$(也称为几乎精确恢复)的信息论极限的统一刻画。该刻画将 $P_n$ 和 $Q_n$ 之间的KL散度与 $H$ 在Erdős-Rényi随机图模型 $G(n,p)$ 中的第一矩阈值的对数联系起来。我们的下界也扩展到部分恢复任务,其中只需恢复 $H$ 的常数 $\lambda$ 比例。最后但同样重要的是,对于某些伯努利和指数分布以及高斯分布,我们能够在指数尺度上展示全有或全无(AoN)阈值现象。

英文摘要

Recovering structural information from noisy high-dimensional data is a fundamental task in statistical inference. We investigate the recovery thresholds for a graph hidden in a randomly weighted complete graph. Specifically, an unknown graph $H^* \in H_n$ is chosen uniformly at random, and hidden in a complete graph of $n$ vertices as follows: the weight of an edge $e \in H$ is distributed independently according to $P_n$; otherwise the weight is distributed independently according to $Q_n$. The goal is to recover almost all of $H$ from these edge weights. Assuming a local Lipschitzness of the Rényi divergence between distributions $P_n$ and $Q_n$, and a mild density condition for the graphs $H_n$, we give a unified characterization of the information-theoretic limit for recovering almost all of $H$ (also known as almost exact recovery). Our characterization connects the KL divergence between $P_n$ and $Q_n$ to the logarithm of the first moment threshold of $H$ in the Erdős-Rényi random graph model $G(n,p)$. Our lower bound also extends to the task of partial recovery, in which only a constant $λ$-fraction of $H$ needs to be recovered. Last but not least, for certain Bernoulli and Exponential regimes, and for Gaussian distributions, we are able to show an All-or-Nothing (AoN) threshold phenomenon at the exponential scale.

2606.14488 2026-06-15 cs.IT cs.LG math.IT 交叉投稿

Nonlinear Two-Time-Scale Stochastic Approximation: A Sharp Phase Transition and How to Beat It

非线性双时间尺度随机逼近:尖锐相变及其克服方法

Dhruv Sarkar, Vaneet Aggarwal

发表机构 * Indian Institute of Technology Kharagpur(印度理工学院卡拉格普尔分校) Mohamed bin Zayed University of Artificial Intelligence(穆罕默德·本·扎耶德人工智能大学) Purdue University(普渡大学)

AI总结 本文发现非线性双时间尺度随机逼近中慢速迭代的均方误差率存在依赖于正则性的相变边界,并通过引入辅助在线偏差估计器将慢速更新中的偏差项减去,从而在全部正则性参数下实现O(k^{-1})的收敛率。

AI中文摘要

近期关于非线性双时间尺度随机逼近的有限时间分析表明,在压缩性假设下,慢速迭代$Y_k$使用步长$\beta_k=\Theta(k^{-1})$和$\alpha_k=\Theta(k^{-a})$($a\in(1/2,1)$)通常满足阶为$k^{-a}$的均方误差率;解耦的$k^{-1}$率需要强局部线性性。我们识别出一个尖锐的依赖于正则性的边界。在一个决定速率的规范形式中,慢速漂移包含一个局部线性泄漏和一个阶为$1+\rho$($\rho\in[0,1]$)的非线性余项,未修正的递归满足\[ \mathbb{E}\|Y_k\|^2 \le C\bigl(k^{-1}+k^{-a(1+\rho)}\bigr), \]并且一个匹配的标量高斯下界表明,如果不修改更新,较慢的项是不可避免的。因此,当且仅当$a(1+\rho)\ge 1$时,未修正的递归保证解耦的$k^{-1}$率。这个下界仅针对朴素更新;它不是信息论障碍。我们通过为规范形式递归配备一个辅助在线偏差估计器\[ M_{k+1}=M_k+\gamma_k(R(X_k)-M_k),\qquad \beta_k\ll\gamma_k\ll\alpha_k, \]并从慢速更新中减去$M_k$来证明这一点。在相同的稳定性、矩和余项假设下,修正的递归对于每个$\rho\in[0,1]$实现$\mathbb{E}\|\widetilde Y_k\|^2=O(k^{-1})$,包括未修正更新被证明遭受较慢率的区域。最后,我们证明了局部传递定理,将相变机制推广到快速流形坐标中的一般非线性TTSA。证明是非渐近的,并依赖于两个阿贝尔变换抵消:一个用于局部线性快速误差泄漏,另一个用于跟踪的非线性偏差。

英文摘要

Recent finite-time analyses of nonlinear two-time-scale stochastic approximation show that under contractive assumptions the slow iterate $Y_k$ with stepsizes $\beta_k=\Theta(k^{-1})$ and $\alpha_k=\Theta(k^{-a})$, $a\in(1/2,1)$, generally satisfies a mean-square rate of order $k^{-a}$; decoupled $k^{-1}$ rates require strong local linearity. We identify a sharp regularity-dependent boundary. In a rate-determining normal form where the slow drift contains a locally linear leakage and a nonlinear remainder of order $1+\rho$ ($\rho\in[0,1]$), the uncorrected recursion satisfies \[ \mathbb{E}\|Y_k\|^2 \le C\bigl(k^{-1}+k^{-a(1+\rho)}\bigr), \] and a matching scalar Gaussian lower bound shows that the slower term is unavoidable without modifying the update. Thus the decoupled $k^{-1}$ rate is guaranteed for the uncorrected recursion exactly when $a(1+\rho)\ge 1$. This lower bound concerns only the naive update; it is not an information-theoretic obstruction. We demonstrate this by equipping the normal-form recursion with an auxiliary online bias estimator \[ M_{k+1}=M_k+\gamma_k(R(X_k)-M_k),\qquad \beta_k\ll\gamma_k\ll\alpha_k, \] and subtracting $M_k$ from the slow update. Under the same stability, moment, and remainder assumptions, the corrected recursion achieves $\mathbb{E}\|\widetilde Y_k\|^2=O(k^{-1})$ for every $\rho\in[0,1]$, including regimes where the uncorrected update provably suffers the slower rate. Finally, we prove localized transfer theorems that extend the phase-transition mechanism to general nonlinear TTSA in fast-manifold coordinates. The proofs are non-asymptotic and rely on two Abel-transform cancellations: one for the locally linear fast-error leakage, and one for the tracked nonlinear bias.
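
The bias-correction idea in this abstract can be illustrated with a toy scalar simulation. This is only a sketch under assumptions that are not taken from the paper: the fast iterate is a noisy contraction toward zero, the remainder is taken to be $R(x)=|x|^{1+\rho}$, the slow drift is $Y+R(X)$, and the exponents `a=0.6`, `g=0.8` (for $\gamma_k=k^{-g}$) are arbitrary choices satisfying $\beta_k\ll\gamma_k\ll\alpha_k$. The function name `simulate` is hypothetical.

```python
import numpy as np

def simulate(n_steps=5000, n_runs=200, a=0.6, g=0.8, rho=0.0,
             corrected=False, seed=0):
    """Toy two-time-scale recursion; returns the empirical E[Y_k^2] at the last step.

    Assumed model (illustrative only, not the paper's normal form):
      fast:  X_{k+1} = X_k - alpha_k * X_k + alpha_k * noise
      slow:  Y_{k+1} = Y_k - beta_k * (Y_k + R(X_k) - M_k) + beta_k * noise
      bias:  M_{k+1} = M_k + gamma_k * (R(X_k) - M_k)
    with R(x) = |x|^(1 + rho) and M_k replaced by 0 when corrected=False.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(n_runs)   # fast iterate, one copy per independent run
    y = np.ones(n_runs)    # slow iterate
    m = np.zeros(n_runs)   # online estimate of the bias E[R(X_k)]
    for k in range(1, n_steps + 1):
        alpha, gamma, beta = k ** -a, k ** -g, 1.0 / k
        r = np.abs(x) ** (1.0 + rho)
        bias_est = m if corrected else 0.0
        # Slow update: subtract the tracked bias M_k from the drift.
        y = y - beta * (y + r - bias_est) + beta * rng.standard_normal(n_runs)
        # Intermediate time scale: track the nonlinear remainder.
        m = m + gamma * (r - m)
        # Fast update: noisy contraction toward zero.
        x = x - alpha * x + alpha * rng.standard_normal(n_runs)
    return float(np.mean(y ** 2))
```

Calling `simulate(corrected=False)` and `simulate(corrected=True)` with these settings, where $a(1+\rho)=0.6<1$, gives a rough feel for the gap between the naive and the corrected slow update; it does not reproduce the paper's constants or proofs.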

2606.14560 2026-06-15 math.OC cs.LG stat.ML 交叉投稿

Free Heavy-Tailed Lunch for Muon: A Theoretical Justification of Empirical Success

Muon 的免费重尾午餐:实证成功的理论证明

Florian Hübler, Thomas Pethick, Suvrit Sra

发表机构 * Department of Computer Science, ETH Zurich, Switzerland(苏黎世联邦理工学院计算机科学系) Department of Mathematics, Technical University of Munich, Germany(慕尼黑工业大学数学系) Munich Center for Machine Learning (MCML)(慕尼黑机器学习中心)

AI总结 本文在重尾非凸优化中证明,Muon 等非欧几里得方法在核范数平稳性下达到最优样本复杂度,避免了欧几里得方法的维度依赖,并通过大语言模型实验验证。

AI中文摘要

最近,具有矩阵值更新的非欧几里得优化方法(如 Muon 和 Scion)在训练 Transformer 模型方面显示出强大的实证性能,但其相对于欧几里得方法的理论优势仍知之甚少。我们在重尾非凸机制中解决了这一差距,其中随机梯度具有有界的 $p$ 阶中心矩,$p \in (1,2]$。我们表明,某些非欧几里得方法在更强的平稳性度量下实现了最优样本复杂度,而欧几里得方法则会产生额外的维度相关成本。因此,对于 $m \times n$ 矩阵,Muon 在核范数下找到一个 $\varepsilon$-平稳点所需的样本数为 $\mathcal{O}\left(\min\{m, n\} \frac{\Delta_1 L}{\varepsilon^2} \left(\frac \sigma \varepsilon \right)^{\frac p {p-1}}\right)$,吸收了重尾噪声而无需额外的维度依赖,这与欧几里得方法不同。我们进一步证明,对于所有一阶方法在核范数平稳性下,该样本复杂度(包括其维度依赖)是最优的。在大语言模型上的实验支持了我们的理论。令人惊讶的是,我们的结果表明,除了 Muon 的谱几何之外,其他 Schatten 几何在某些设置下也能具有竞争力。

英文摘要

Non-Euclidean optimisation methods with matrix-valued updates, such as Muon and Scion, have recently shown strong empirical performance for training Transformer models, yet their theoretical advantages over Euclidean methods remain poorly understood. We address this gap in the heavy-tailed non-convex regime, where stochastic gradients have bounded $p$-th central moments, $p \in (1,2]$. We show that certain non-Euclidean methods achieve optimal sample complexity under stronger stationarity measures, while Euclidean methods incur additional dimension-dependent costs. As a consequence, for $m \times n$ matrices, Muon finds an $\varepsilon$-stationary point in nuclear norm within $\mathcal{O}\left(\min\{m, n\} \frac{\Delta_1 L}{\varepsilon^2} \left(\frac{\sigma}{\varepsilon} \right)^{\frac{p}{p-1}}\right)$ samples, absorbing heavy-tailed noise without extra dimension dependence, unlike Euclidean methods. We further prove this sample complexity, including its dimension dependence, is optimal for all first-order methods under nuclear-norm stationarity. Experiments on large language models support our theory. Surprisingly, our results suggest that other Schatten geometries beyond the spectral geometry of Muon can perform competitively in certain settings.
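
To make the link between spectral geometry and nuclear-norm stationarity concrete, the sketch below computes the steepest-descent direction under the spectral norm, $UV^\top$ from the SVD $G=U\Sigma V^\top$, and checks that its inner product with the gradient equals the nuclear norm of $G$. This is a simplification: it uses an exact SVD and omits momentum, and the function name `orthogonalized_direction` is hypothetical rather than part of any Muon implementation.

```python
import numpy as np

def orthogonalized_direction(grad):
    """Return (U @ V^T, nuclear norm) for a gradient matrix.

    U @ V^T is the unit-spectral-norm matrix that maximises <grad, D>,
    and the maximum value is the nuclear norm (sum of singular values).
    """
    u, s, vt = np.linalg.svd(grad, full_matrices=False)
    return u @ vt, float(s.sum())

rng = np.random.default_rng(0)
G = rng.standard_normal((5, 3))          # stand-in for an m x n gradient
D, nuc = orthogonalized_direction(G)

inner = float(np.sum(G * D))             # <G, D> = trace(G^T D)
fro = float(np.linalg.norm(G))           # Frobenius norm of G
```

Here `inner` matches `nuc`, and `nuc` is at most $\sqrt{\min\{m,n\}}$ times `fro`, which is the standard norm comparison behind dimension factors when moving between Euclidean and nuclear-norm stationarity.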

2410.00722 2026-06-15 cs.LG math.AG 版本更新

On the Geometry and Optimization of Polynomial Convolutional Networks

多项式卷积网络的几何与优化

Vahid Shahverdi, Giovanni Luca Marchetti, Kathlén Kohn

发表机构 * KTH Royal Institute of Technology(皇家理工学院)

AI总结 研究使用单项式激活函数的卷积神经网络,证明其参数化映射是正则且几乎处处同构,通过代数几何方法计算神经流形的维数和度,并量化回归损失优化中临界点的数量。

Comments Accepted at AISTATS 2025. New version: corrected Section 4.2

AI中文摘要

我们研究了使用单项式激活函数的卷积神经网络。具体来说,我们证明了它们的参数化映射是正则的,并且在几乎处处(除了滤波器重缩放)是同构的。通过利用代数几何的工具,我们探索了该映射在函数空间中的像(通常称为神经流形)的几何性质。特别地,我们计算了神经流形的维数和度,这衡量了模型的表达能力,并描述了其奇点。此外,对于一般的较大数据集,我们推导出一个显式公式,量化了回归损失优化中出现的临界点数量。

英文摘要

We study convolutional neural networks with monomial activation functions. Specifically, we prove that their parameterization map is regular and is an isomorphism almost everywhere, up to rescaling the filters. By leveraging tools from algebraic geometry, we explore the geometric properties of the image of this map in function space, typically referred to as the neuromanifold. In particular, we compute the dimension and the degree of the neuromanifold, which measure the expressivity of the model, and describe its singularities. Moreover, for a generic large dataset, we derive an explicit formula that quantifies the number of critical points arising in the optimization of a regression loss.

2507.13263 2026-06-15 cs.LG cs.AI 版本更新

From Sorting Algorithms to Scalable Kernels: Bayesian Optimization in High-Dimensional Permutation Spaces

从排序算法到可扩展核:高维排列空间中的贝叶斯优化

Zikai Xie, Linjiang Chen

发表机构 * State Key Laboratory of Precision and Intelligent Chemistry(精准与智能化学国家重点实验室)

AI总结 针对高维排列空间贝叶斯优化中表示可扩展性差的问题,提出基于排序算法的核函数框架,其中Mallows核是枚举排序的特例,而新提出的Merge核通过归并排序的分解结构实现Θ(n log n)复杂度且无信息损失,在低维性能相当,高维显著提升优化效果与计算效率。

Comments 9 pages, published at ICLR 2026

AI中文摘要

贝叶斯优化(BO)是黑箱优化的强大工具,但其在高维排列空间中的应用受到定义可扩展表示的严重限制。当前最先进的排列空间BO方法依赖于穷举的Ω(n^2)成对比较,导致密集表示,不适用于大规模排列。为了突破这一障碍,我们引入了一个新框架,通过从排序算法导出的核函数生成高效的排列表示。在该框架中,Mallows核可以被视为从枚举排序导出的特例。此外,我们引入了Merge核,它利用归并排序的分治结构生成紧凑的Θ(n log n)表示,实现了最低可能复杂度且无信息损失,并有效捕捉排列结构。我们的核心论点是,Merge核在低维设置中与Mallows核性能相当,但随着维度n增长,在优化性能和计算效率上显著优于后者。在各种排列优化基准上的广泛评估证实了我们的假设,表明Merge核为高维排列空间中的贝叶斯优化提供了可扩展且更有效的解决方案,从而释放了解决以前难以处理的问题(如大规模特征排序和组合神经架构搜索)的潜力。

英文摘要

Bayesian Optimization (BO) is a powerful tool for black-box optimization, but its application to high-dimensional permutation spaces is severely limited by the challenge of defining scalable representations. The current state-of-the-art BO approach for permutation spaces relies on an exhaustive $\Omega(n^2)$ pairwise comparison, inducing a dense representation that is impractical for large-scale permutations. To break this barrier, we introduce a novel framework for generating efficient permutation representations via kernel functions derived from sorting algorithms. Within this framework, the Mallows kernel can be viewed as a special instance derived from enumeration sort. Further, we introduce the Merge Kernel, which leverages the divide-and-conquer structure of merge sort to produce a compact $\Theta(n\log n)$ representation that achieves the lowest possible complexity with no information loss and effectively captures permutation structure. Our central thesis is that the Merge Kernel performs competitively with the Mallows kernel in low-dimensional settings, but significantly outperforms it in both optimization performance and computational efficiency as the dimension $n$ grows. Extensive evaluations on various permutation optimization benchmarks confirm our hypothesis, demonstrating that the Merge Kernel provides a scalable and more effective solution for Bayesian optimization in high-dimensional permutation spaces, thereby unlocking the potential for tackling previously intractable problems such as large-scale feature ordering and combinatorial neural architecture search.
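
For reference, the pairwise-comparison baseline mentioned in this abstract can be sketched as follows. The code implements the standard Mallows kernel, $k(\pi,\sigma)=\exp(-\lambda\, n_d(\pi,\sigma))$ with $n_d$ the number of discordant pairs, by looping over all $n(n-1)/2$ pairs. The Merge Kernel's construction is not specified in the abstract and is deliberately not reproduced here; the function names and the default `lam=0.1` are illustrative assumptions.

```python
import math
from itertools import combinations

def discordant_pairs(p, q):
    """Count index pairs (i, j) that the two permutations order differently."""
    return sum(
        1
        for i, j in combinations(range(len(p)), 2)
        if (p[i] - p[j]) * (q[i] - q[j]) < 0
    )

def mallows_kernel(p, q, lam=0.1):
    """Mallows kernel: exp(-lam * Kendall-tau distance), using all pairs."""
    return math.exp(-lam * discordant_pairs(p, q))

identity = [0, 1, 2, 3]
reverse = [3, 2, 1, 0]
k_same = mallows_kernel(identity, identity)
k_rev = mallows_kernel(identity, reverse)
```

The loop over `combinations(range(n), 2)` is what makes this representation quadratic in $n$, which is the cost the abstract's $\Theta(n\log n)$ representation is meant to avoid.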

2511.07368 2026-06-15 cs.LG cs.AI 版本更新

Distributional Biases in Post-Training: A Markovian Analysis of Reasoning Trajectories

后训练中的分布偏差:推理轨迹的马尔可夫分析

Dake Bu, Wei Huang, Andi Han, Atsushi Nitanda, Bo Xue, Qingfu Zhang, Hau-San Wong, Taiji Suzuki

发表机构 * City University of Hong Kong(香港城市大学) Center for Advanced Intelligence Project, RIKEN(RIKEN高级智能研究中心) The Institute of Statistical Mathematics(统计数学研究所) University of Sydney(悉尼大学) CFAR and IHPC, Agency for Science, Technology and Research (A*STAR)(A*STAR的CFAR和IHPC) Nanyang Technological University(南洋理工大学) The University of Tokyo(东京大学)

AI总结 通过马尔可夫链模型分析后训练策略(如RLVR和ORM/PRM)如何强化高概率路径而遗忘稀有但关键的推理步骤,并证明探索策略(如拒绝简单实例和KL正则化)有助于保留稀有CoT。

AI中文摘要

基础模型展现出广泛的知识但有限的特定任务推理能力,这促使了后训练策略的发展,例如基于可验证奖励的强化学习(RLVR)和测试时扩展(TTS)。尽管近期工作强调了探索在提升pass@K中的作用,但经验证据指向一个悖论:RLVR和ORM/PRM通常强化现有路径而非扩展推理范围,这引发了一个问题:如果没有新模式出现,探索为何有帮助?为调和这一悖论,我们采用Kim等人(2025)的视角,将简单(例如,简化分数)与困难(例如,发现某种对称性)推理步骤分别视为低概率和高概率的马尔可夫转移。在这个易处理的模型中,预训练对应于树图发现,而后训练对应于思维链(CoT)重新加权。我们可证明地表明,RLVR和ORM/PRM都会严重偏向若干高概率路径,从而遗忘稀有但关键的CoT。在此基础上,我们进一步证明,诸如拒绝简单实例和KL正则化等探索策略有助于保留稀有CoT。实证模拟证实了我们的理论结果。

英文摘要

Foundation models exhibit broad knowledge but limited task-specific reasoning, motivating post-training strategies such as RL with verifiable rewards (RLVR) and test-time scaling (TTS). While recent work highlights the role of exploration in improving pass@K, empirical evidence points to a paradox: RLVR and ORM/PRM typically reinforce existing paths rather than expanding the reasoning scope, raising the question of why exploration helps if no new patterns emerge. To reconcile this paradox, we adopt the perspective of Kim et al. (2025), viewing easy (e.g., simplifying a fraction) versus hard (e.g., discovering some symmetry) reasoning steps as low versus high probability Markov transitions. In this tractable model, pretraining corresponds to tree-graph discovery, while post-training corresponds to CoT reweighting. We provably show that both RLVR and ORM/PRM heavily favor a few high-probability paths and thereby forget rare-but-crucial CoTs. Building on this, we further prove that exploration strategies such as rejecting easy instances and KL regularization help preserve rare CoTs. Empirical simulations corroborate our theoretical results.

2511.19656 2026-06-15 cs.LG math.OC stat.ML 版本更新

Lower Complexity Bounds for Nonconvex-Strongly-Convex Bilevel Optimization with First-Order Oracles

非凸-强凸双层优化的一阶Oracle下界复杂度

Kaiyi Ji

发表机构 * Kaiyi Ji(机凯毅)

AI总结 针对光滑非凸-强凸双层优化,在确定性和随机一阶Oracle模型下,分别证明了$\Omega(\kappa^{3/2}\epsilon^{-2})$和$\Omega(\kappa^{5/2}\epsilon^{-4})$的下界,改进了单层非凸优化和极小极大问题的已知最优下界。

Comments Accepted by ICML 2026

AI中文摘要

尽管双层优化的上界保证已被广泛研究,但由于双层结构的复杂性,下界方面的进展有限。本文关注光滑非凸-强凸设定,并开发了新的困难实例,在确定性和随机一阶Oracle模型下得到了非平凡的下界。在确定性情形下,我们证明任何一阶零尊重算法至少需要$\Omega(\kappa^{3/2}\epsilon^{-2})$次Oracle调用才能找到$\epsilon$-精确的稳定点,改进了单层非凸优化和非凸-强凸极小极大问题已知的最优下界。在随机情形下,我们证明至少需要$\Omega(\kappa^{5/2}\epsilon^{-4})$次随机Oracle调用,同样强化了相关设定中的已知最优下界。我们的结果揭示了当前双层优化上下界之间的显著差距,并表明即使在简化设定(如二次下层目标)下,仍需进一步研究以理解标准一阶Oracle下双层优化的最优复杂度。

英文摘要

Although upper bound guarantees for bilevel optimization have been widely studied, progress on lower bounds has been limited due to the complexity of the bilevel structure. In this work, we focus on the smooth nonconvex-strongly-convex setting and develop new hard instances that yield nontrivial lower bounds under deterministic and stochastic first-order oracle models. In the deterministic case, we prove that any first-order zero-respecting algorithm requires at least $\Omega(\kappa^{3/2}\epsilon^{-2})$ oracle calls to find an $\epsilon$-accurate stationary point, improving the optimal lower bounds known for single-level nonconvex optimization and for nonconvex-strongly-convex min-max problems. In the stochastic case, we show that at least $\Omega(\kappa^{5/2}\epsilon^{-4})$ stochastic oracle calls are necessary, again strengthening the best known bounds in related settings. Our results expose substantial gaps between current upper and lower bounds for bilevel optimization and suggest that even simplified regimes, such as those with quadratic lower-level objectives, warrant further investigation toward understanding the optimal complexity of bilevel optimization under standard first-order oracles.

2604.17402 2026-06-15 cs.LG cs.NE 版本更新

On the Generalization Bounds of Symbolic Regression with Genetic Programming

基于遗传规划的符号回归的泛化界

Masahiro Nomura, Ryoki Hamano, Isao Ono

发表机构 * Institute of Science Tokyo, Yokohama, Japan(东京科学大学,日本横滨) CyberAgent, Shibuya, Japan(CyberAgent,日本涩谷)

AI总结 本文对基于遗传规划的符号回归进行学习理论分析,推导出在树大小、深度和可学习常数约束下的泛化界,将泛化差距分解为结构选择和常数拟合两个可解释分量,为遗传规划中的常用实践提供理论依据。

Comments Accepted for PPSN2026

AI中文摘要

基于遗传规划(GP)的符号回归(SR)旨在直接从数据中发现可解释的数学表达式。尽管其实验成功显著,但关于基于GP的SR为何能泛化到训练数据之外的理论理解仍然有限。在这项工作中,我们对表示为表达式树的SR模型进行了学习理论分析。我们推导了在树大小、深度和可学习常数约束下GP风格SR的泛化界。我们的结果将泛化差距分解为两个可解释的分量:一个结构选择项,反映了选择表达式树结构的组合复杂性;以及一个常数拟合项,捕捉了在固定结构内优化数值常数的复杂性。这种分解为GP中几种广泛使用的实践提供了理论视角,包括简约压力、深度限制、数值稳定算子和区间算术。特别是,我们的分析显示了结构限制如何减少假设类增长,而稳定性机制如何控制预测对参数扰动的敏感性。通过将这些实际设计选择与泛化界中的显式复杂性项联系起来,我们的工作为基于GP的SR中常见的经验行为提供了原则性解释,并有助于更严格地理解其泛化性质。

英文摘要

Symbolic regression (SR) with genetic programming (GP) aims to discover interpretable mathematical expressions directly from data. Despite its strong empirical success, the theoretical understanding of why GP-based SR generalizes beyond the training data remains limited. In this work, we provide a learning-theoretic analysis of SR models represented as expression trees. We derive a generalization bound for GP-style SR under constraints on tree size, depth, and learnable constants. Our result decomposes the generalization gap into two interpretable components: a structure-selection term, reflecting the combinatorial complexity of choosing an expression-tree structure, and a constant-fitting term, capturing the complexity of optimizing numerical constants within a fixed structure. This decomposition provides a theoretical perspective on several widely used practices in GP, including parsimony pressure, depth limits, numerically stable operators, and interval arithmetic. In particular, our analysis shows how structural restrictions reduce hypothesis-class growth while stability mechanisms control the sensitivity of predictions to parameter perturbations. By linking these practical design choices to explicit complexity terms in the generalization bound, our work offers a principled explanation for commonly observed empirical behaviors in GP-based SR and contributes towards a more rigorous understanding of its generalization properties.

2312.14889 2026-06-15 stat.ML cs.CR cs.LG math.ST stat.TH 版本更新

On Rate-Optimal Partitioning Classification from Observable and from Privatised Data

关于可观测数据和私有数据的最优划分分类方法

Balázs Csanád Csáji, László Györfi, Ambrus Tamás, Harro Walk

发表机构 * HUN-REN Institute for Computer Science and Control (SZTAKI)(HUN-REN计算机科学与控制研究所(SZTAKI)) Department of Probability Theory and Statistics, Institute of Mathematics, Eötvös Loránd University (ELTE)(概率论与统计学系,厄特沃什·洛朗大学数学学院(ELTE)) Department of Computer Science and Information Theory, Budapest University of Technology and Economics (BME)(计算机科学与信息理论系,布达佩斯技术与经济大学(BME)) Institute for Stochastics and Applications, University of Stuttgart(概率论与应用研究所,斯图加特大学)

AI总结 本文重新审视划分分类方法,在更宽松条件下(无需强密度假设)推导出可观测和私有数据下分类误差概率的收敛速率,该速率仅依赖于连续输入的内在维度。

AI中文摘要

在本文中,我们重新审视了划分分类的经典方法,并在宽松条件下证明了新的收敛速率,既适用于可观测(非私有化)数据,也适用于私有化数据。我们考虑在 $d$ 维欧几里得空间中的分类问题。先前关于划分分类器的结果依赖于强密度假设(SDA),我们通过简单示例表明该假设具有限制性。在此,我们在更温和的假设下研究该问题。我们预设输入分布是绝对连续分布和离散分布的混合,使得绝对连续分量集中在 $d_a$ 维子空间上。除了标准的 Lipschitz 和边际条件外,还引入了绝对连续分量的一个新特征,据此计算分类误差概率的收敛速率,包括二元和多类情况。该界可以达到使用 SDA 所能达到的极小极大最优收敛速率,但在更温和的分布假设下。有趣的是,该收敛速率仅依赖于连续输入的内在维度 $d_a$,而非 $d$。在隐私约束下,数据无法直接观测,构建的分类器是合适的局部差分隐私机制随机结果的函数。在本文中,我们将拉普拉斯分布噪声添加到特征向量所有可能位置的离散化及其标签中。再次,可以在不使用 SDA 的情况下推导出分类误差概率收敛速率的紧上界,使得该速率依赖于 $2d_a$。

英文摘要

In this paper we revisit the classical method of partitioning classification and prove novel convergence rates under relaxed conditions, both for observable (non-privatised) and for privatised data. We consider the problem of classification in a $d$ dimensional Euclidean space. Previous results on the partitioning classifier worked with the strong density assumption (SDA), which is restrictive, as we demonstrate through simple examples. Here, we study the problem under much milder assumptions. We presuppose that the distribution of the inputs is a mixture of an absolutely continuous and a discrete distribution, such that the absolutely continuous component is concentrated on a $d_a$ dimensional subspace. In addition to the standard Lipschitz and margin conditions, a novel characteristic of the absolutely continuous component is introduced, by which the convergence rate of the classification error probability is computed, both for the binary and for the multi-class cases. This bound can reach the minimax optimal convergence rate achievable using SDA, but under much milder distributional assumptions. Interestingly, this convergence rate depends only on the intrinsic dimension of the continuous inputs, $d_a$, and not on $d$. Under privacy constraints, the data cannot be directly observed, and the constructed classifiers are functions of the randomised outcome of a suitable local differential privacy mechanism. In this paper we add Laplace distributed noises to the discretisations of all possible locations of the feature vector and to its label. Again, tight upper bounds on the convergence rate of the classification error probability can be derived, without using SDA, such that this rate depends on $2d_a$.

2405.03063 2026-06-15 math.ST cs.IT cs.LG math.IT stat.ME stat.ML stat.TH 版本更新

Stability of a Generalized Debiased Lasso with Applications to Resampling-Based Variable Selection

广义去偏Lasso的稳定性及其在基于重抽样的变量选择中的应用

Jingbo Liu

发表机构 * Department of Statistics, University of Illinois Urbana-Champaign(伊利诺伊大学厄巴纳-香槟分校统计系) Department of Electrical and Computer Engineering, the Grainger College of Engineering(格兰杰工程学院电子与计算机工程系)

AI总结 提出基于稳定性原理的广义去偏Lasso估计量,通过设计矩阵单列扰动下的简单更新公式,在比例增长机制下实现渐近精确近似,显著降低重抽样变量选择的计算成本。

Comments to appear in Bernoulli

AI中文摘要

我们提出了一种基于稳定性原理的广义去偏Lasso估计量。当设计矩阵的单列被扰动时,该估计量允许一个简单的更新公式,可以从原始解计算得出。在具有良好条件协方差的次高斯设计下,这种近似在比例增长机制下对于除消失比例坐标外的所有坐标是渐近精确的。证明依赖于集中和反集中论证来控制误差项和符号变化。相比之下,在类似假设下建立可比较的分布极限(例如高斯性)仍然是开放的。作为一个应用,我们表明该近似显著降低了基于重抽样的变量选择过程的计算成本,包括条件随机化测试和局部knockoff滤波器。

英文摘要

We propose a generalized debiased Lasso estimator based on a stability principle. When a single column of the design matrix is perturbed, the estimator admits a simple update formula that can be computed from the original solution. Under sub-Gaussian designs with well-conditioned covariance, this approximation is asymptotically accurate for all but a vanishing fraction of coordinates in the proportional growth regime. The proof relies on concentration and anti-concentration arguments to control error terms and sign changes. In contrast, establishing comparable distributional limits (e.g., Gaussianity) under similar assumptions remains open. As an application, we show that the approximation significantly reduces the computational cost of resampling-based variable selection procedures, including the conditional randomization test and a local knockoff filter.

2506.06542 2026-06-15 stat.ML cs.LG 版本更新

Direct Fisher Score Estimation for Likelihood Maximization

直接Fisher得分估计用于似然最大化

Sherman Khoo, Yakun Wang, Song Liu, Mark Beaumont

发表机构 * School of Mathematics, University of Bristol(布里斯托大学数学学院) School of Biological Sciences, University of Bristol(布里斯托大学生物科学学院)

AI总结 针对似然函数难解但模型模拟易得的问题,提出基于局部得分匹配的顺序梯度优化方法,直接建模Fisher得分,实现快速高效的似然最大化。

AI中文摘要

我们研究当似然函数难以处理但模型模拟易于获得时的似然最大化问题。我们提出一种顺序的、基于梯度的优化方法,该方法基于局部得分匹配技术直接建模Fisher得分,该技术使用来自每个参数迭代周围局部区域的模拟。通过对代理得分模型采用线性参数化,我们的技术允许闭式最小二乘解。这种方法提供了一种快速、灵活且高效的Fisher得分近似,有效平滑了似然目标,并缓解了复杂似然景观带来的挑战。我们为得分估计器提供了理论保证,包括平滑引入的偏差界限。在一系列合成和真实世界问题上的实证结果表明,与现有基准相比,我们的方法具有优越的性能。

英文摘要

We study the problem of likelihood maximization when the likelihood function is intractable but model simulations are readily available. We propose a sequential, gradient-based optimization method that directly models the Fisher score based on a local score matching technique which uses simulations from a localized region around each parameter iterate. By employing a linear parameterization to the surrogate score model, our technique admits a closed-form, least-squares solution. This approach yields a fast, flexible, and efficient approximation to the Fisher score, effectively smoothing the likelihood objective and mitigating the challenges posed by complex likelihood landscapes. We provide theoretical guarantees for our score estimator, including bounds on the bias introduced by the smoothing. Empirical results on a range of synthetic and real-world problems demonstrate the superior performance of our method compared to existing benchmarks.

2601.11626 2026-06-15 math.NA cs.LG cs.NA 版本更新

Concatenated Matrix SVD: Compression Bounds, Incremental Approximation, and Error-Constrained Clustering

拼接矩阵SVD:压缩界限、增量近似与误差约束聚类

Maksym Shamrai

发表机构 * Institute of Mathematics of NAS of Ukraine(乌克兰国家科学院数学研究所) MacPaw Research(MacPaw研究)

AI总结 针对拼接后截断SVD压缩中哪些矩阵可安全合并的问题,提出基于谱界和增量SVD的聚类框架,实现显式误差约束下的压缩感知矩阵分组。

Comments Published in Transactions on Machine Learning Research (06/2026)

Journal ref
Transactions on Machine Learning Research (2026)
AI中文摘要

现代机器学习、信号处理和科学计算中出现了大量矩阵集合,通常通过拼接后截断奇异值分解(SVD)进行压缩。这种策略实现了参数共享和高效重构,已被广泛应用于多视图学习、信号处理到神经网络压缩等领域。然而,它留下了一个基本问题未解答:在显式重构误差约束下,哪些矩阵可以安全地拼接并压缩在一起?现有方法依赖于启发式或特定于架构的分组,并且对所得的SVD近似误差没有提供原则性保证。在本工作中,我们引入了一个理论驱动的框架,用于在SVD压缩约束下进行矩阵的压缩感知聚类。我们的分析建立了水平拼接矩阵的新谱界,从奇异值增长的下界推导出最优秩-$r$ SVD重构误差的全局上界。第一个界遵循Weyl型块扩展下的单调性,而第二个界利用增量残差的奇异值提供更紧的逐块保证。我们进一步开发了一种基于增量截断SVD的高效近似估计器,无需形成完整的拼接矩阵即可跟踪主导奇异值。因此,我们提出了三种聚类算法,仅当预测的联合SVD压缩误差低于用户指定阈值时才合并矩阵。这些算法在速度、可证明准确性和可扩展性之间权衡,实现了具有显式误差控制的压缩感知聚类。

英文摘要

Large collections of matrices arise throughout modern machine learning, signal processing, and scientific computing, where they are commonly compressed by concatenation followed by truncated singular value decomposition (SVD). This strategy enables parameter sharing and efficient reconstruction and has been widely adopted across domains ranging from multi-view learning and signal processing to neural network compression. However, it leaves a fundamental question unanswered: which matrices can be safely concatenated and compressed together under explicit reconstruction error constraints? Existing approaches rely on heuristic or architecture-specific grouping and provide no principled guarantees on the resulting SVD approximation error. In the present work, we introduce a theory-driven framework for compression-aware clustering of matrices under SVD compression constraints. Our analysis establishes new spectral bounds for horizontally concatenated matrices, deriving global upper bounds on the optimal rank-$r$ SVD reconstruction error from lower bounds on singular value growth. The first bound follows from Weyl-type monotonicity under blockwise extensions, while the second leverages singular values of incremental residuals to yield tighter, per-block guarantees. We further develop an efficient approximate estimator based on incremental truncated SVD that tracks dominant singular values without forming the full concatenated matrix. Therefore, we propose three clustering algorithms that merge matrices only when their predicted joint SVD compression error remains below a user-specified threshold. The algorithms span a trade-off between speed, provable accuracy, and scalability, enabling compression-aware clustering with explicit error control.
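
The merge criterion described above can be sketched in a few lines. This is a simplified illustration, not the paper's algorithm: it forms the full concatenated matrix and runs an exact SVD, whereas the abstract describes spectral bounds and an incremental estimator that avoid doing so. The function names `truncation_error` and `can_merge`, and the use of an absolute Frobenius-norm tolerance, are assumptions.

```python
import numpy as np

def truncation_error(mats, rank):
    """Frobenius error of the best rank-`rank` approximation of [M1 | M2 | ...].

    By the Eckart-Young theorem this is the l2 norm of the discarded
    singular values of the horizontally concatenated matrix.
    """
    s = np.linalg.svd(np.hstack(mats), compute_uv=False)
    return float(np.sqrt(np.sum(s[rank:] ** 2)))

def can_merge(a, b, rank, tol):
    """Merge two matrices only if their joint rank-`rank` error stays below tol."""
    return truncation_error([a, b], rank) <= tol

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
B = A @ rng.standard_normal((4, 3))   # shares A's column space
C = rng.standard_normal((6, 4))       # generic, unrelated columns

s_A = np.linalg.svd(A, compute_uv=False)
s_AC = np.linalg.svd(np.hstack([A, C]), compute_uv=False)
```

In this toy setup `can_merge(A, B, 4, 1e-8)` holds because `[A | B]` still has rank 4, while `[A | C]` generically has rank 6 and fails the same test. The arrays `s_A` and `s_AC` illustrate the monotonicity fact that appending columns never decreases any singular value.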

2605.04954 2026-06-15 cs.NE cs.LG 版本更新

On the Influence of the Feature Computation Budget on Per-Instance Algorithm Selection for Black-Box Optimization

特征计算预算对黑箱优化中逐实例算法选择的影响

Koen van der Blom, Diederick Vermetten

发表机构 * Centrum Wiskunde & Informatica(荷兰阿姆斯特丹数学与信息学中心) Sorbonne Université(索邦大学) CNRS(国家科学研究中心) LIP6(LIP6实验室)

AI总结 研究黑箱优化中特征计算预算对逐实例算法选择性能的影响,发现即使花费25%预算计算特征,PIAS仍可行,且最优预算比例高度依赖场景。

AI中文摘要

逐实例算法选择(PIAS)利用一组算法之间的互补性,通过决定在给定实例上运行哪个算法来提升性能。该决策基于实例的特征,而在黑箱优化(BBO)的背景下,这些特征需要消耗一部分优化预算来计算。这引发了两个问题:(a) 在特征计算上花费多少比例的预算时,PIAS对BBO变得值得;(b) 哪个预算比例能优化特征准确性与PIAS性能之间的权衡。为此,我们进行了一项广泛的研究,将不同采样预算用于特征计算的PIAS与单一最佳算法在多种算法选择场景下进行比较。这些场景包括两种组合规模、三个问题集、四种维度以及十个目标预算。我们发现,在大多数测试场景中,PIAS是可行的,即使将总预算的四分之一用于特征计算。用于特征计算的预算比例以最大化PIAS收益的权衡高度依赖于具体的算法选择场景。此外,平均而言,PIAS相对于虚拟最佳求解器的损失中有20%可归因于特征计算预算,这凸显了适当考虑特征预算的重要性。

英文摘要

Per-instance algorithm selection (PIAS) takes advantage of complementarity between a set of algorithms by deciding which algorithm to run on a given instance. This decision is based on features of the instances, which, in the context of black-box optimization (BBO), require a part of the optimization budget to be computed. This raises two questions: (a) from which fraction of the budget spent on feature computation does PIAS become worth it for BBO, and (b) which fraction of the budget optimizes the tradeoff between feature accuracy and PIAS performance. To this end, we perform a broad study where PIAS with varying sampling budgets for feature computation is compared to the single best algorithm on a broad range of algorithm selection scenarios. These scenarios consist of two portfolio sizes, three problem sets, 4 dimensionalities, and 10 target budgets. We find that PIAS is viable for the majority of tested scenarios, even when as much as a quarter of the total budget is spent on feature computation. The tradeoff for the fraction of the budget spent on feature computation to maximize the benefit of PIAS is highly dependent on the specific AS scenario. Further, on average 20 percent of PIAS loss to the virtual best solver is explained by the budget spent on feature computation, highlighting the importance of properly accounting for the feature budget.

6. 高效学习、压缩与部署 24 篇

2606.13740 2026-06-15 cs.LG 新提交

Efficient On-Device Diffusion LLM Inference with Mobile NPU

基于移动NPU的高效设备端扩散大语言模型推理

Tuowei Wang, Yanfan Sun, Ju Ren

发表机构 * Tsinghua University(清华大学) Beihang University(北京航空航天大学)

AI总结 提出首个NPU感知推理框架llada.cpp,通过多块推测解码、双路径渐进修订和交换优化内存运行时,在移动设备上加速扩散大语言模型推理,相比CPU基线实现17-42倍延迟降低。

AI中文摘要

扩散大语言模型(dLLM)通过并行去噪多个token来加速生成,使其适用于延迟敏感的移动端推理。然而,重复去噪在智能手机上引入了大量计算。移动神经处理单元(NPU)提供高吞吐量的密集矩阵计算,但高效利用它们仍然具有挑战性:token提交缩小了每块的有效工作负载,token修订使KV缓存重用复杂化,且NPU可见地址空间有限导致昂贵的重映射和数据传输开销。在本文中,我们提出了llada.cpp,这是首个用于在智能手机上加速dLLM的NPU感知推理框架。llada.cpp通过三种技术将块级dLLM推理与移动NPU的执行特性对齐。(1)多块推测解码用推测的未来块token填充当前块解码后期阶段缩小的负载。(2)双路径渐进修订使已提交的token在稳定前保持可修订,并通过CPU侧路径刷新不稳定token,而不会阻塞密集的NPU执行。(3)交换优化内存运行时压缩NPU可见地址布局,并将数据准备与NPU计算重叠,以减少重映射和传输开销。我们将llada.cpp实现为端到端框架,并在多种硬件平台和dLLM工作负载上进行评估。llada.cpp在保留生成质量的同时,将LLaDA-8B的生成延迟比使用前缀KV缓存重用的CPU基线降低了17倍至42倍。

英文摘要

Diffusion large language models (dLLMs) accelerate generation by denoising multiple tokens in parallel, making them attractive for latency-sensitive mobile inference. However, repeated denoising introduces substantial computation on smartphones. Mobile neural processing units (NPUs) offer high-throughput dense matrix computation, but efficiently exploiting them remains challenging: token commitment shrinks per-block effective workloads, token revision complicates KV cache reuse, and limited NPU-visible address space incurs costly remapping and data transfer overheads. In this paper, we propose llada.cpp, the first NPU-aware inference framework for accelerating dLLMs on smartphones. llada.cpp aligns block-wise dLLM inference with the execution characteristics of mobile NPUs through three techniques. (1) Multi-Block Speculative Decoding fills the shrinking workload in late-stage current-block decoding with speculative future-block tokens. (2) Dual-Path Progressive Revision keeps committed tokens revisable until stable and refreshes unstable tokens through a CPU-side path without stalling dense NPU execution. (3) Swap-Optimized Memory Runtime compacts NPU-visible address layouts and overlaps data staging with NPU computation to reduce remapping and transfer overheads. We implement llada.cpp as an end-to-end framework and evaluate it across diverse hardware platforms and dLLM workloads. llada.cpp reduces LLaDA-8B generation latency by 17x-42x over the CPU baseline with prefix KV cache reuse, while preserving generation quality.

2606.13767 2026-06-15 cs.LG cs.AI cs.IT math.IT 新提交

Beyond LoRA: Is Sparsity-Induced Adaptation Better?

超越LoRA:稀疏诱导的适应更好吗?

Elijah Cadenhead, Cristian McGee, Xin Li, El Houcine Bergou, Aritra Dutta

发表机构 * School of Data, Mathematical and Statistical Sciences, University of Central Florida, United States(中佛罗里达大学数据、数学与统计科学学院) College of Computing, Mohammed VI Polytechnic University (UM6P), Morocco(穆罕默德六世理工大学计算机学院) Department of Computer Science, University of Central Florida, United States(中佛罗里达大学计算机科学系)

AI总结 本文提出Cheap LoRA (cLA)及其变体,通过在LoRA中引入稀疏性实现参数高效微调,理论推导泛化误差界,实验表明在多种任务上性能与参数匹配基线相当,同时减少训练时间和峰值GPU内存。

Comments Overview of the paper and code can be found here: https://elicaden.github.io/Beyond_LoRA/

AI中文摘要

低秩适应(LoRA)及其变体为预训练模型的全微调提供了一种内存和计算高效的替代方案。然而,关于这些方法的比较泛化能力以及低秩更新的结构限制如何保持有效适应性能的问题仍然存在。我们提出了一个历史框架,涵盖过去(全微调和原始LoRA)、现在(LoRA的不同变体),并通过在现有LoRA变体中引入稀疏性,提出了更简单、更便宜、参数高效的扩展:Cheap LoRA (cLA),训练单个低秩因子而固定另一个(确定性地或在其随机变体中随机地),以及链式循环变体${c}^3$LA。我们将cLA视为非对称LoRA的结构化实例,作为全微调的控制列子空间限制。我们推导了这些变体的信息论泛化误差界,这是该领域的首批尝试之一。在实验上,我们评估了10个预训练模型和14个数据集上的11种微调方法,使用损失景观和谱分析等工具分析了微调模型的性能和泛化能力。尽管微调模型对预训练模型、数据集和其他因素敏感,但我们的研究表明,将基于LoRA的PEFT方法的适应限制在稀疏、结构化的列空间上,在参数匹配基线的任务上仍然具有竞争力,同时即使使用朴素、非优化的稀疏实现,也能减少高达10%的训练时间和高达15%的峰值GPU内存。我们的理论和实验泛化度量为其成本效益适应提供了比常用分析工具更一致和原则性的方法。概述和代码可在以下网址获取:此 https URL。

英文摘要

Low-rank adaptation (LoRA) and its variants provide a memory- and compute-efficient alternative to full fine-tuning of pre-trained models. However, questions remain about the comparative generalizability of these approaches and how the structural restrictions on low-rank updates preserve effective adaptation performance. We present a historical framing, covering the past (full fine-tuning and original LoRA), the present (different variants of LoRA), and propose simpler, cheaper, parameter-efficient extensions by inducing sparsity within existing LoRA variants: Cheap LoRA (cLA), training a single low-rank factor with the other fixed (deterministically or, in its randomized variant, stochastically), and the chained circulant variant, ${c}^3$LA. We frame cLA as a structured instance of asymmetric LoRA, serving as a controlled column-subspace restriction of full fine-tuning. We derive information-theoretic generalization error bounds for these variants, marking one of the first endeavors in this area. Empirically, we evaluate 11 fine-tuning methods across 10 pre-trained models and 14 datasets, analyzing the fine-tuned models' performance and generalization using tools such as loss landscapes and spectral analysis. Despite the sensitivity of fine-tuned models to the pre-trained model, datasets, and other factors, our study suggests that restricting LoRA-based PEFT methods' adaptation to a sparse, structured column space remains competitive across tasks with their parameter-matched baselines while reducing training time by up to 10% and peak GPU memory by up to 15%, even with a naïve, non-optimized, sparse implementation. Our theoretical and empirical generalization measures provide a more consistent and principled approach to their cost-effective adaptation than commonly used analytical tools. Overview and code are available at: https://elicaden.github.io/Beyond_LoRA/.
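
The idea of training one low-rank factor while the other stays fixed can be sketched on a toy least-squares problem. Several choices below are assumptions rather than details from the abstract: the frozen factor is taken to be `B` (so the update `B @ A` is confined to the column space of `B`), `B` is a random Gaussian matrix, and the names, shapes, and step size are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, n = 8, 6, 2, 32

W = rng.standard_normal((d_out, d_in))                  # frozen pre-trained weight
B = rng.standard_normal((d_out, r)) / np.sqrt(d_out)    # frozen low-rank factor (assumed)
A = np.zeros((r, d_in))                                 # the only trainable factor
X = rng.standard_normal((d_in, n))                      # toy inputs
Y = rng.standard_normal((d_out, n))                     # toy targets

def loss(a_factor):
    """Mean squared error of the adapted layer (W + B @ A) on the toy data."""
    resid = (W + B @ a_factor) @ X - Y
    return 0.5 * float(np.mean(np.sum(resid ** 2, axis=0)))

loss_before = loss(A)
B_frozen = B.copy()
lr = 0.05
for _ in range(200):
    resid = (W + B @ A) @ X - Y
    grad_A = B.T @ resid @ X.T / n     # gradient with respect to A only
    A -= lr * grad_A
loss_after = loss(A)

trainable_single_factor = A.size            # r * d_in
trainable_lora = r * (d_in + d_out)         # both factors trained
```

With these shapes the single-factor variant trains 12 parameters against 28 for a two-factor LoRA of the same rank, which is the kind of saving the abstract attributes to fixing one factor.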

2606.13894 2026-06-15 cs.LG cs.AI cs.CL cs.CV 新提交

Gefen: Optimized Stochastic Optimizer

Gefen: 优化随机优化器

Nadav Benedek, Tomer Koren, Ohad Fried

发表机构 * Reichman University(赖希曼大学) Tel Aviv University(特拉维夫大学) Google Research(谷歌研究院)

AI总结 提出Gefen优化器,通过共享二阶矩估计和量化一阶矩,将AdamW内存占用减少约8倍,同时保持相同性能,支持更大批量和吞吐量。

详情
AI中文摘要

AdamW是现代深度学习的默认优化器,但其一阶和二阶矩状态会额外占用约两倍参数大小的训练内存。我们提出Gefen,一种内存高效的优化器,它自动在参数块之间共享二阶矩估计,并使用学习到的码本量化一阶矩,从而将AdamW的内存占用减少约8倍,同时保持相同性能,相当于每十亿参数减少6.5 GiB。该方法受理论结果启发,该结果表明大的混合Hessian项将平方梯度的比率约束为接近1,表明Hessian对齐的参数是共享二阶矩统计量的自然候选。由于大规模计算Hessian不切实际,Gefen从初始平方梯度推断块结构,除了AdamW默认超参数外,不需要任何架构特定的元数据或超参数。Gefen学习基于精确直方图的动态规划量化码本,并重用相同的块进行一阶矩缩放。在多种实验中,Gefen在比较的类似AdamW的方法中实现了最低的峰值优化器内存,同时保持AdamW级别的性能。在FSDP和DDP训练中,减少的内存占用支持更大的微批次,并显著提高相对于AdamW的吞吐量,提供了一种实用的即插即用替代方案,具有更低的内存使用,可以增加吞吐量并支持训练更大的模型或使用更大的批量大小。我们提供了完整的Python实现,包括融合CUDA内核,网址为 https://github.com/ndvbd/Gefen 。

英文摘要

AdamW is a default optimizer for modern deep learning, but its first and second moment states add roughly two parameter-sized buffers to training memory. We propose Gefen, a memory-efficient optimizer that automatically shares second-moment estimates across parameter blocks and quantizes the first moment using a learned codebook, thereby reducing AdamW's memory footprint by ~8x while maintaining the same performance, corresponding to a reduction of 6.5 GiB per billion parameters. The method is motivated by a theoretical result showing that large mixed Hessian entries constrain the ratio of squared gradients toward one, suggesting that Hessian-aligned parameters are natural candidates for sharing second-moment statistics. Since computing Hessians is impractical at scale, Gefen infers block structure from the initial squared gradients, requiring no architecture-specific metadata or hyperparameters beyond AdamW defaults. Gefen learns an exact histogram-based dynamic-programming quantization codebook and reuses the same blocks for first-moment scaling. Across diverse experiments, Gefen achieves the lowest peak optimizer memory among the compared AdamW-like methods while maintaining AdamW-level performance. In FSDP and DDP training, the reduced memory footprint enables larger microbatches and improves throughput significantly over AdamW, providing a practical drop-in replacement with lower memory usage that can increase throughput and enable training larger models or using larger batch sizes. We provide the complete Python implementation, including fused CUDA kernels at https://github.com/ndvbd/Gefen
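
As a rough illustration of sharing second-moment estimates across parameter blocks, the sketch below runs one Adam-style step that stores a single second-moment scalar per block. It is a simplification under explicit assumptions: the blocks are chosen by hand (the abstract says Gefen infers them from initial squared gradients), and weight decay and the learned first-moment codebook are omitted. The function name `shared_v_adam_step` is hypothetical.

```python
import math


def shared_v_adam_step(params, grads, m, v_blocks, blocks, t,
                       lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style step that keeps one second-moment scalar per block.

    blocks is a list of index lists; v_blocks holds one float per block.
    Weight decay and first-moment quantization are deliberately left out.
    """
    for k, idx in enumerate(blocks):
        mean_sq = sum(grads[i] ** 2 for i in idx) / len(idx)
        v_blocks[k] = beta2 * v_blocks[k] + (1 - beta2) * mean_sq
        v_hat = v_blocks[k] / (1 - beta2 ** t)        # bias correction
        for i in idx:
            m[i] = beta1 * m[i] + (1 - beta1) * grads[i]
            m_hat = m[i] / (1 - beta1 ** t)
            params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params


params = [1.0, 1.0, 1.0, 1.0]
grads = [0.5, 0.5, -2.0, -2.0]
blocks = [[0, 1], [2, 3]]          # hypothetical grouping, chosen by hand
m = [0.0] * len(params)
v_blocks = [0.0] * len(blocks)     # 2 floats instead of 4
shared_v_adam_step(params, grads, m, v_blocks, blocks, t=1)
```

Because gradients inside each block have equal magnitude here, the shared statistic reproduces what per-parameter Adam would compute, while the second-moment state shrinks from one float per parameter to one per block.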

2606.14150 2026-06-15 cs.LG cs.CL 新提交

Small LLMs: Pruning vs. Training from Scratch

小型LLM:剪枝 vs. 从头训练

Yufeng Xu, Taiming Lu, Kunjun Li, Jiachen Zhu, Mingjie Sun, Zhuang Liu

发表机构 * Princeton University(普林斯顿大学) New York University(纽约大学) Carnegie Mellon University(卡内基梅隆大学)

AI总结 本文通过六种剪枝方法在Llama-3.1-8B上比较剪枝与从头训练,发现有限预算下剪枝更优,预算充足时粗粒度剪枝可被超越。

Comments Our code is available at https://github.com/zlab-princeton/llm-pruning-collection

详情
AI中文摘要

剪枝有望成为获得强大小型语言模型的捷径。在本工作中,我们通过六种涵盖深度、宽度和稀疏粒度的剪枝方法,在两种受控的token匹配设置下,以0.5-0.8的剪枝率对Llama-3.1-8B进行剪枝,检验了这一承诺。(1) 在相同的训练token预算下,剪枝初始化始终优于随机初始化。这表明父模型提供了一个强起点,尽管随着训练token预算的增加和剪枝率的提高,优势逐渐缩小,在我们研究的最高剪枝率下几乎消失。(2) 当从头训练被给予整个流程消耗的全部token预算时,细粒度剪枝仍保持优势,而粗粒度结构化剪枝可能被匹配或超越。这表明父模型传递了额外训练token无法完全恢复的知识,但仅在细粒度下如此。综合来看,我们的结果给出了明确的建议:当手头有一个大型预训练模型且训练token预算有限时,剪枝优于从头训练;当训练预算不受限时,从头训练在粗粒度剪枝下可能具有竞争力,因此大型预训练父模型并非总是必要的。

英文摘要

Pruning promises a shortcut to strong small language models. In this work, we examine this promise by pruning Llama-3.1-8B at pruning ratios of 0.5--0.8 with six methods spanning depth, width, and sparse granularities, under two controlled token-matched settings. (1) With the same training token budget, pruned initialization consistently outperforms random initialization. This shows that the parent model provides a strong starting point, although the advantage narrows as the training token budget grows and as the pruning ratio rises, nearly vanishing at the highest pruning ratio we study. (2) When training from scratch is instead given the full token budget consumed by the whole pipeline, pruning at finer granularities still retains an advantage, while coarser structured pruning can be matched or surpassed. This suggests that the parent model transfers knowledge that additional training tokens alone cannot fully recover, but only at fine granularity. Taken together, our results yield a clear recommendation: with a large pretrained model in hand and a limited training token budget, pruning is better than training from scratch; when the training budget is not limited, training from scratch can be competitive for coarser pruning, so a large pretrained parent is not always necessary.

2606.14346 2026-06-15 cs.LG cs.AI 新提交

Squeeze-Release: Iterative Pruning with Exact Structural Minimization

挤压-释放:具有精确结构最小化的迭代剪枝

Roman Denkin, Ida Akerholm, Prashant Singh, Ida-Maria Sintorn

发表机构 * Uppsala University(乌普萨拉大学)

AI总结 提出Squeeze-Release循环,通过精确结构重写将掩码网络转化为更小密集网络,并引入CompensatedLayerNorm扩展至残差流,实现高达39倍压缩。

详情
AI中文摘要

非结构化剪枝产生稀疏权重张量,但标准实现保持张量形状不变,因此部署模型并不比剪枝前更小。我们提出一种精确的结构重写,称为最小化,它将掩码网络转换为一个更小的密集网络,其前向函数在浮点舍入误差内相同。挤压-释放循环迭代剪枝和最小化,中间有一个释放步骤,将压缩张量内的精确零位置重新启用为小的校准噪声,将原本浪费的容量转化为可训练参数。连续的循环利用该容量找到单次剪枝无法达到的结构冗余。我们还引入了CompensatedLayerNorm,这是一种保持功能的LayerNorm替代方案,将最小化扩展到具有LayerNorm的残差流上的通道缩减。挤压-释放将可部署网络压缩到比未剪枝模型小39倍(全连接模型网络)和14.8倍(现代CNN,ConvNeXt-Tiny),且精度相当。此外,我们证明该重写可以扩展到Transformer架构。

英文摘要

Unstructured pruning produces sparse weight tensors, but the standard implementation keeps tensor shapes unchanged, so the deployed model is no smaller than before pruning. We present an exact structural rewrite, which we call minimization, that converts a masked network into a smaller dense network with the same forward function up to floating-point rounding. The Squeeze-Release cycle iterates pruning and minimization with an intermediate release step that re-enables the exact-zero positions inside the compacted tensors as small calibrated noise, turning otherwise wasted capacity back into trainable parameters. Successive cycles use that capacity to find structural redundancy a single pass cannot reach. We additionally introduce CompensatedLayerNorm, a function-preserving replacement for LayerNorm that extends minimization to channel reduction across LayerNorm-equipped residual streams. Squeeze-Release compresses the deployable network to 39x smaller than the unpruned model on a fully-connected network and 14.8x smaller on a modern CNN (ConvNeXt-Tiny), at comparable accuracy. In addition, we prove that the rewrite can be extended to transformer architectures.
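
A minimal sketch of what an exact structural rewrite can look like, restricted to a two-layer ReLU MLP. The two elimination rules used here (drop a hidden unit with no outgoing weights; fold a hidden unit with no incoming weights into the next bias) are assumptions chosen for illustration, not the authors' algorithm, and the release step and CompensatedLayerNorm are not modeled.

```python
def forward(W1, b1, W2, b2, x):
    """Two-layer MLP with ReLU: y = W2 @ relu(W1 @ x + b1) + b2."""
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    return [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(W2, b2)]


def minimize(W1, b1, W2, b2):
    """Return a smaller dense network computing the same function."""
    keep, b2_new = [], list(b2)
    for j, row in enumerate(W1):
        if all(W2[i][j] == 0.0 for i in range(len(W2))):
            continue                       # unit never reaches the output
        if all(w == 0.0 for w in row):
            const = max(0.0, b1[j])        # unit is a constant; fold it into b2
            for i in range(len(W2)):
                b2_new[i] += W2[i][j] * const
            continue
        keep.append(j)
    return ([W1[j] for j in keep], [b1[j] for j in keep],
            [[r[j] for j in keep] for r in W2], b2_new)


# Masked network: unit 1 has no incoming weights, unit 2 has no outgoing weights.
W1 = [[1.0, -2.0], [0.0, 0.0], [0.5, 0.5], [3.0, 1.0]]
b1 = [0.1, 0.7, -0.2, 0.0]
W2 = [[1.0, 2.0, 0.0, -1.0], [0.5, -1.0, 0.0, 2.0]]
b2 = [0.0, 1.0]
W1s, b1s, W2s, b2s = minimize(W1, b1, W2, b2)
```

The hidden layer shrinks from four units to two, and the outputs agree with the masked network up to floating-point rounding because the folded constant changes only the order of additions.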

2606.14598 2026-06-15 cs.LG 新提交

Realizing Native INT8 Compute for Diffusion Transformers on Consumer GPUs: A Fused INT8 GEMM Kernel for Ideogram 4.0

在消费级GPU上实现扩散Transformer的原生INT8计算:用于Ideogram 4.0的融合INT8 GEMM内核

Ali Asaria, Tony Salomone, Deep Gandhi

发表机构 * Transformer Lab

AI总结 针对消费级Ampere GPU上INT8量化比FP8/NF4更慢的问题,提出融合Triton INT8 GEMM内核,直接利用INT8张量核心,在Ideogram 4.0中实现2.8-4.2倍加速,端到端速度提升约10%,使1024px单卡可行。

详情
AI中文摘要

扩散Transformer的训练后INT8(W8A8)量化被广泛用作速度优化,但在消费级Ampere GPU上,它通常比它本应击败的FP8和NF4替代方案更慢。我们将此归因于一个软件伪影:生产中的“INT8”前向量化权重和激活,但立即将它们反量化回bf16并执行bf16矩阵乘法,从未使用GPU的INT8张量核心,因此硬件的计算优势完全未被利用。我们通过一个单一的融合Triton INT8 GEMM(在Ampere张量核心上执行int8xint8->int32,并在epilogue中融合每token乘每通道的反量化和偏置,针对每个GEMM形状自动调优)来弥补这一差距,将其插入Ideogram 4.0扩散Transformer的线性层中,替代反量化到bf16的路径。在该内核中,int8xint8->int32累加与torch._int_mm逐位精确,反量化输出与参考的余弦相似度为1.0且无NaN,每个GEMM的运行速度比bf16快2.8-4.2倍。端到端在768px分辨率下实现约1.1倍(约9-10%)的加速,在1024px分辨率下,单张RTX 3090上生成图像耗时156.5秒,快于单卡NF4(164.5秒)和FP8(172.9秒)基线,且在这些点估计(PickScore/CLIPScore)上无质量损失。因此,INT8从最慢的变体变为最快,1024px在单GPU上变得可行。主要速度标准(击败FP8,约9.5%)轻松满足;NF4的差距(约4.9%,单次运行n=4)在未量化的运行间方差内,最好理解为与达到扩展目标一致。最后我们给出一个诚实的部署图:该优势特定于消费级Ampere,在A100和B200上,相同内核会输给这些卡快速的本地bf16/FP8路径。

英文摘要

Post-training INT8 (W8A8) quantization of diffusion transformers is widely deployed as a speed optimization, yet on consumer Ampere GPUs it is frequently slower than the FP8 and NF4 alternatives it is meant to beat. We trace this to a software artifact: the production "INT8" forward quantizes weights and activations only to immediately dequantize them back to bf16 and run a bf16 matrix multiply, never engaging the GPU's INT8 tensor cores, so the hardware's compute advantage is left entirely unrealized. We close this gap with a single fused Triton INT8 GEMM (int8xint8->int32 on Ampere tensor cores, with per-token x per-channel dequantization and bias folded into the epilogue, autotuned per GEMM shape) dropped into the Ideogram 4.0 diffusion transformer's linear layers in place of the dequantize-to-bf16 path. In the kernel, the int8xint8->int32 accumulation is bit-exact against torch._int_mm and the dequantized output matches the reference at cosine similarity 1.0 with no NaNs, running 2.8-4.2x faster than bf16 per GEMM. End to end it delivers a ~1.1x (~9-10%) speedup at 768px, and at 1024px it generates an image in 156.5 s on a single RTX 3090, faster than the single-card NF4 (164.5 s) and FP8 (172.9 s) baselines, at no measurable quality cost on these point estimates (PickScore/CLIPScore). INT8 thus goes from the slowest variant to the fastest, and 1024px becomes single-GPU feasible. The primary speed criterion (beat FP8, by ~9.5%) is comfortably met; the NF4 margin (~4.9%, single-run n=4) is within run-to-run variance we did not quantify and is best read as consistent with meeting the stretch target. We close with an honest deployment map: the win is specific to consumer Ampere, and on A100 and B200 the same kernel loses to those cards' fast native bf16/FP8 paths.
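
The arithmetic of a W8A8 linear layer with per-token and per-channel scales can be shown on the CPU in plain Python. This is a reference sketch only, not the Triton kernel described above: it assumes symmetric quantization to the range [-127, 127], uses Python integers in place of an int32 accumulator, and the function names are hypothetical.

```python
def quantize_rows(M):
    """Symmetric per-row int8 quantization: returns (int rows, per-row scales)."""
    q_rows, scales = [], []
    for row in M:
        scale = max(abs(v) for v in row) / 127 or 1.0    # avoid a zero scale
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def int8_linear(X, W, bias):
    """y = X @ W.T + bias with integer accumulation and a dequantizing epilogue.

    X: (tokens, in) activations, one scale per token (row).
    W: (out, in) weights, one scale per output channel (row).
    """
    Xq, sx = quantize_rows(X)
    Wq, sw = quantize_rows(W)
    out = []
    for xi, s_tok in zip(Xq, sx):
        row = []
        for wj, s_ch, b in zip(Wq, sw, bias):
            acc = sum(a * c for a, c in zip(xi, wj))     # exact integer accumulation
            row.append(acc * s_tok * s_ch + b)           # epilogue: dequantize + bias
        out.append(row)
    return out


def float_linear(X, W, bias):
    """Floating-point reference for comparison."""
    return [[sum(a * c for a, c in zip(x, w)) + b for w, b in zip(W, bias)]
            for x in X]


X = [[0.5, -1.0, 0.25], [2.0, 0.1, -0.3]]
W = [[1.0, 0.5, -0.5], [0.2, -0.4, 0.8]]
bias = [0.1, -0.1]
y_int8 = int8_linear(X, W, bias)
y_ref = float_linear(X, W, bias)
```

The point of the fused design is visible in `int8_linear`: the matrix product itself runs on integers, and the scales and bias are applied once per output element afterwards, instead of dequantizing the operands before the multiply.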

2606.14695 2026-06-15 cs.LG cs.CL 新提交

Persona-Pruner: Sculpting Lightweight Models for Role-Playing

Persona-Pruner: 为角色扮演雕琢轻量级模型

Jinsu Kim, Jihoon Tack, Noah Lee, Jongheon Jeong

AI总结 提出Persona-Pruner框架,通过从单个描述中隔离特定角色的子网络来剪枝语言模型,在保持角色扮演性能的同时大幅降低计算成本,性能下降比最强基线减少93.8%。

Comments 25 pages; ICML 2026; Code is available at https://github.com/jsu-kim/Persona-Pruner

详情
AI中文摘要

语言模型(LMs)作为角色扮演聊天机器人展现出显著潜力,在给定角色或用户画像规范时,能够提供一致且风格化的交互。然而,将这些能力应用于现实世界应用(例如,众多NPC同时交互的生态系统)时,由于过高的计算成本,暴露了关键的效率问题。在本文中,我们质疑将完整的通用模型专用于单一角色的必要性,假设特定角色身份仅依赖于模型总容量的一小部分。我们观察到,朴素地剪枝LM通常会严重降低特定角色的角色扮演性能;它无法区分冗余知识和基本角色特征。我们提出Persona-Pruner,一个通过从单个描述中隔离特定角色的子网络来雕琢轻量级角色扮演模型的框架。我们的实验一致表明,Persona-Pruner在保留角色扮演性能方面比现有最先进的LLM剪枝技术有效得多,在RoleBench上使用LLM-as-a-judge评分,与最强基线相比,将相对于密集模型的性能下降最多减少93.8%,同时仍保持通用LLM能力。代码见 https://github.com/jsu-kim/Persona-Pruner 。

英文摘要

Language Models (LMs) have shown remarkable potential as role-playing chatbots, delivering consistent, stylized interactions when given a specification of a character or user persona. However, applying these capabilities to real-world applications (e.g., ecosystems with numerous NPCs interacting simultaneously) exposes a critical inefficiency due to the excessive computational cost. In this paper, we question the necessity of dedicating a full, generalist model to a single persona, hypothesizing that a specific character identity relies on only a fraction of the model's total capacity. We observe that naively pruning LMs often severely degrades the role-playing performance for a specific persona; it does not distinguish between redundant knowledge and essential character traits. We propose Persona-Pruner, a framework that sculpts a lightweight role-playing model by isolating persona-specific sub-networks from a single description. Our experiments consistently show that Persona-Pruner preserves role-playing performance substantially more effectively than existing state-of-the-art LLM pruning techniques, reducing the performance drop from the dense model by up to 93.8% over the strongest baseline on RoleBench in LLM-as-a-judge score, while still maintaining general LLM capabilities. Code is available at https://github.com/jsu-kim/Persona-Pruner.

2606.13694 2026-06-15 eess.SP cs.AI cs.LG 交叉投稿

Efficient Temporal Modeling for Mobile Sleep Staging via Lightweight Random Attention

基于轻量随机注意力的移动睡眠分期高效时序建模

Guisong Liu, Pengfei Wei, Jainsong Zhang, Martin Dresler

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 提出轻量随机注意力模块RA,通过固定随机投影实现相似性聚合,替代可学习序列建模,在移动睡眠分期中实现高效时序平滑,理论解释为随机注意力先验核,实验显示在准确率和F1上提升1-3%,性能媲美LSTM/GRU/Transformer。

Comments 7 pages, 1 figure, 5 tables

详情
AI中文摘要

移动睡眠分期是家庭睡眠监测和闭环调节的基础设施。但现有的序列模型如RNN和Transformer在移动部署中计算成本高。本文提出随机注意力(RA),一种基于固定随机投影的轻量时序建模模块,用基于相似性的聚合替代可学习的序列建模。RA在历元编码器之外引入极少的额外参数,同时实现有效的时序平滑。我们进一步通过随机注意力先验核(RAPK)提供理论解释,将RA分解为全局平滑项和特征相似性项,为时序睡眠结构提供可解释的视角。在Sleep-EDF-20和Sleep-EDF-78上的实验表明,RA在准确率和F1分数上持续提升历元级基线1-3%,同时达到与LSTM、GRU和Transformer模型相竞争的性能。RA还展示了在不同骨干编码器上的强泛化能力,以及相对于传统时序平滑方法的改进鲁棒性。这些结果表明,通过轻量基于相似性的时序聚合可以实现高效的睡眠分期,使RA适用于实时可穿戴应用。

英文摘要

Mobile sleep staging serves as foundational infrastructure for in-home sleep monitoring and closed-loop modulation, but existing sequential models such as RNNs and Transformers are computationally expensive for mobile deployment. In this paper, we propose Random Attention (RA), a lightweight temporal modeling module based on fixed random projections, which replaces learnable sequence modeling with similarity-based aggregation. RA introduces few additional parameters beyond the epoch encoder while enabling effective temporal smoothing. We further provide a theoretical interpretation via the Random Attention Prior Kernel (RAPK), which decomposes RA into a global smoothing term and a feature similarity term, offering an interpretable view of temporal sleep structure. Experiments on Sleep-EDF-20 and Sleep-EDF-78 show that RA consistently improves epoch-wise baselines by 1-3% in accuracy and F1 score, while achieving competitive performance compared with LSTM, GRU, and Transformer models. RA also demonstrates strong generalization across different backbone encoders and improved robustness over conventional temporal smoothing methods. These results indicate that efficient sleep staging can be achieved through lightweight similarity-based temporal aggregation, making RA suitable for real-time wearable applications.
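
One way to picture similarity-based aggregation over a fixed random projection is the sketch below. It is an assumption-laden illustration, not the RA module itself: the softmax over dot products, the Gaussian projection, and the names `random_projection` and `random_attention` are all choices made here for clarity.

```python
import math
import random


def random_projection(dim_in, dim_out, seed=0):
    """Fixed Gaussian projection matrix; drawn once and never trained."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) / math.sqrt(dim_out) for _ in range(dim_in)]
            for _ in range(dim_out)]


def project(P, x):
    return [sum(p * v for p, v in zip(row, x)) for row in P]


def random_attention(features, P):
    """Smooth a sequence of epoch features with similarity weights.

    Weights are a softmax over dot products of the projected features.
    """
    Z = [project(P, f) for f in features]
    smoothed, all_weights = [], []
    for zi in Z:
        scores = [sum(a * b for a, b in zip(zi, zj)) for zj in Z]
        top = max(scores)                              # stabilize the softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        w = [e / total for e in exps]
        all_weights.append(w)
        smoothed.append([sum(wj * f[d] for wj, f in zip(w, features))
                         for d in range(len(features[0]))])
    return smoothed, all_weights


features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]   # three epochs, two features each
P = random_projection(dim_in=2, dim_out=4, seed=0)
smoothed, weights = random_attention(features, P)
```

Each smoothed epoch is a convex combination of the input epochs, so the module adds temporal context without any trainable sequence parameters.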

2606.13709 2026-06-15 stat.ML cs.LG 交叉投稿

LoMC: Localized Multidirectional Correction for Refusal Suppression in Routed Foundation Models

LoMC: 路由基础模型中拒绝抑制的局部多方向校正

Yan Hong, Kedong Xiu, Wei Li, Jun Lan, Huijia Zhu, Shuheng Zhou, Zhongcai Lyu, Weiqiang Wang, Jianfu Zhang

发表机构 * Ant Group(蚂蚁集团) Zhejiang University(浙江大学) Shanghai Jiao Tong University(上海交通大学)

AI总结 提出LoMC方法,通过支持门控干预框架在路由MoE和混合MoE模型中实现紧凑的拒绝抑制,提升非拒绝目标响应行为并保持通用能力。

详情
AI中文摘要

我们研究了路由MoE和混合MoE基础模型中的受控后训练拒绝抑制,旨在增加非拒绝目标响应行为,同时在紧凑的干预足迹下保持通用能力。现有的基于广泛方向的编辑可能会扰动通用计算,而仅支持专家编辑通常缺乏足够的容量来纠正异质拒绝表示。为了解决这一限制,我们引入了局部多方向校正(LoMC),一种支持门控干预框架,遵循支持-然后-校正的执行顺序:它首先识别紧凑的编辑支持,然后将原型校正方向聚合成逐层校正方向,最后仅在选定的支持内应用秩一逐层校正。通过使用编辑支持作为结构门控约束,LoMC在不扩大干预范围的情况下增加了校正容量。在四个路由骨干上的纯文本和多模态安全基准实验表明,LoMC在紧凑干预足迹下显著改善了非拒绝目标响应行为,同时保持了通用能力。

英文摘要

We study controlled post-training refusal suppression in routed MoE and hybrid-MoE foundation models, aiming to increase non-refusal target-response behavior while preserving general capability under a compact intervention footprint. Existing broad direction-based edits can perturb general-purpose computation, whereas support-only expert edits often lack sufficient capacity to correct heterogeneous refusal representations. To address this limitation, we introduce Localized Multidirectional Correction (LoMC), a support-gated intervention framework that follows a support-then-correction execution order: it first identifies a compact edit support, then aggregates prototype correction directions into layer-wise correction directions, and finally applies rank-one layer-wise correction only within the selected support. By using the edit support as a structural gating constraint, LoMC increases correction capacity without expanding the intervention scope. Experiments on text-only and multimodal safety benchmarks across four routed backbones show that LoMC substantially improves non-refusal target-response behavior while maintaining general capability under a compact intervention footprint.

2606.13825 2026-06-15 math.OC cs.LG 交叉投稿

Scalable Deep Unfolding of Conic Optimizers

锥优化器的可扩展深度展开

Alex Oshin, Rahul Vodeb Ghosh, Evangelos A. Theodorou

发表机构 * Georgia Institute of Technology(佐治亚理工学院)

AI总结 提出矩阵自由隐式微分和基于Dalečkii-Krein的PSD锥反向传播规则,解决深度展开应用于大规模半定规划时的内存和数值稳定性问题,实现轻量级超参数策略和热启动学习,在多种问题上取得高达50倍加速。

详情
AI中文摘要

深度展开(DU)通过引入可学习组件并在展开迭代中进行训练来加速迭代优化器,但将DU扩展到机器人领域常见的大规模半定规划(SDP)仍然有限。展开全更新锥求解器(如COSMO)暴露了先前关于学习型锥求解器的工作未涉及的两个障碍:通过每次迭代的线性系统求解进行反向传播,当系数矩阵显式形成时,内存与问题规模成二次方关系;通过半正定(PSD)锥投影进行反向传播在特征值重合时变得数值不稳定。我们通过一种完全基于矩阵-向量乘积的矩阵自由隐式微分规则解决了第一个障碍,将内存从$O(n^2)$降低到$O(n)$,并使得在直接分解耗尽内存的规模下也能进行反向传播。我们通过基于Fréchet导数的Dalečkii-Krein表示的后向规则解决了第二个障碍,该规则在重复特征值下仍然定义良好。这些共同使得学习全更新锥求解器的轻量级超参数策略和热启动成为可能。我们在通过序列凸规划(SCP)求解的非线性协方差控制问题,以及从最大割和Lovász $\vartheta$ SDP到鲁棒估计和控制问题的独立SDP和第二阶锥规划上进行了评估。学习到的策略在所有问题上都优于最先进的求解器,并且根据问题类别可提供高达50倍的加速。当作为SCP中的子程序使用时,与COSMO相比,学习的方法提供了超过30倍的加速。

英文摘要

Deep unfolding (DU) accelerates iterative optimizers by introducing learnable components and training them through unrolled iterations, but extending DU to the large-scale semidefinite programs (SDPs) common in robotics has remained limited. Unrolling a full-update conic solver such as COSMO exposes two obstacles that prior work on learned conic solvers has not: backpropagating through the per-iteration linear-system solve incurs memory quadratic in the problem size once the coefficient matrix is formed explicitly, and backpropagating through the positive semidefinite (PSD) cone projection becomes numerically unstable when eigenvalues coincide. We address the first obstacle with a matrix-free implicit differentiation rule that operates entirely through matrix-vector products, reducing memory from $O(n^2)$ to $O(n)$ and enabling backpropagation at scales where direct factorization runs out of memory. We address the second with a backward rule based on the Dalečkii--Krein representation of the Fréchet derivative, which remains well-defined under repeated eigenvalues. Together these make it possible to learn lightweight hyperparameter policies and warm-starts for a full-update conic solver. We evaluate on nonlinear covariance steering problems solved via sequential convex programming (SCP), as well as standalone SDPs and second-order cone programs ranging from max-cut and Lovász $\vartheta$ SDPs to robust estimation and control problems. The learned policies outperform state-of-the-art solvers across all problems, and can provide up to a 50$\times$ speedup depending on the class. When used as a subroutine in SCP, the learned approach delivers over a 30$\times$ speedup compared to COSMO.

2606.14010 2026-06-15 cs.CV cs.LG cs.RO 交叉投稿

RT-VLA: Real-Time Vision-Language-Action Models via Knowledge Distillation

RT-VLA:通过知识蒸馏实现实时视觉-语言-动作模型

Xiangyu Huang, Zhenlin Hua, Han Zhou, Shounak Sural, Ragunathan Rajkumar

发表机构 * Carnegie Mellon University(卡内基梅隆大学)

AI总结 提出RT-VLA,通过多级监督蒸馏将SimLingo模型的能力压缩至轻量学生模型,在保持竞争性能的同时将推理时间降低44.8倍(纯视觉模式)和7.9倍(视觉+语言模式),实现实时可解释的VLA自动驾驶。

详情
AI中文摘要

视觉-语言-动作(VLA)模型通过联合建模视觉感知、语言推理、可解释性和动作预测,在端到端自动驾驶中展现出强大潜力。然而,其庞大的视觉-语言骨干网络和推理模块引入了显著的推理延迟,从而阻碍了它们在道路网络严苛现实中的部署。我们提出RT-VLA,一种轻量级、蒸馏的VLA模型,通过多级监督蒸馏将最先进的SimLingo模型的驾驶和推理能力迁移到紧凑的学生模型中。RT-VLA保留了基于语言的推理,并通过离线语言分析安全关键驾驶时刻来支持事后解释,而不增加实时控制的延迟。与SimLingo教师模型相比,RT-VLA在保持竞争性的闭环驾驶和语言推理性能的同时,在纯视觉模式下将推理时间减少了44.8倍,在视觉+语言模式下减少了7.9倍。这些结果表明,监督蒸馏是构建实时、可解释的VLA风格自动驾驶模型的实用方法。

英文摘要

Vision-Language-Action (VLA) models have shown strong potential for end-to-end autonomous driving by jointly modeling visual perception, language reasoning, explainability and action prediction. However, their large vision-language backbones and reasoning modules introduce substantial inference latency and thereby prevent their deployment in the unforgiving reality of the road networks. We propose RT-VLA, a lightweight, distilled VLA model that transfers the driving and reasoning capabilities of the state-of-the-art SimLingo model into a compact student through multi-level supervised distillation. RT-VLA preserves language-based reasoning and supports post-hoc explanation through offline language analysis of safety-critical driving moments without adding latency to real-time control. Compared to the SimLingo teacher, RT-VLA maintains competitive closed-loop driving and language reasoning performance while reducing inference time by 44.8X in vision-only mode and 7.9X in vision+language mode. These results suggest that supervised distillation is a practical approach for building real-time, explainable VLA-style autonomous driving models.

2606.14684 2026-06-15 cs.CV cs.LG 交叉投稿

HumP-KD: A Hybrid Uncertainty-Aware Multi-Stage Progressive Knowledge Distillation Framework for Efficient Fire Classification

HumP-KD: 一种混合不确定性感知的多阶段渐进式知识蒸馏框架用于高效火灾分类

Mohammed Arif Mainuddin, Najifa Tabassum, Omar Ibne Shahid, Riasat Khan

AI总结 提出HumP-KD框架,通过层次化渐进式知识蒸馏和多阶段蒸馏,将两个冻结的异构Transformer教师(Swin-Tiny和ViT-Base)及其集成知识蒸馏到轻量级MobileViT-S学生模型中,在火灾分类任务上显著提升性能,同时保持低参数量和实时推理速度。

详情
AI中文摘要

实时火灾分类系统需要模型同时具备准确性、计算效率以及可在资源受限硬件上部署的能力。本文提出HumP-KD,一种混合不确定性感知的多阶段渐进式知识蒸馏框架,用于高效火灾分类。使用了两个数据集:FlameVision(8600张图像)和Dataset-II(31309张图像)。在标准预处理、在线增强、高斯噪声和运动模糊鲁棒性条件下,应用了多种CNN和Transformer基线模型。所提出的HumP-KD模型通过三个紧密集成的组件,将两个冻结的异构Transformer教师(Swin-Tiny和ViT-Base)及其Meta-MLP集成的知识蒸馏到轻量级MobileViT-S学生中。层次化渐进式知识蒸馏采用层次化特征构建器,生成融合的空间注意力掩码,以选择性地引导蒸馏到判别性区域。多阶段知识蒸馏在训练过程中逐步激活三个蒸馏阶段。在Dataset-II上,HumP-KD在10次独立试验中平均F1分数达到$0.9876 \pm 0.0063$,显著优于未使用蒸馏训练的MobileViT-S基线($0.9537 \pm 0.0351$),独立t检验($p = 0.0195$)和Wilcoxon符号秩检验($W = 1$,$p = 0.0039$)均证实了统计显著性。所提出的方法还展示了跨数据集的强泛化能力和在退化视觉条件下的鲁棒性。学生模型仅保留4.94M参数和19.01Mb模型大小,相比Swin-Tiny参数减少$5.7\times$,相比ViT-Base减少$17.5\times$,同时达到37.72 CPU FPS,适合实时部署。

英文摘要

Real-time fire classification systems require models that are simultaneously accurate, computationally efficient, and deployable on resource-constrained hardware. This work proposes HumP-KD, a Hybrid Uncertainty-aware Multi-stage Progressive Knowledge Distillation framework for efficient fire classification. Two datasets, FlameVision and Dataset-II, containing 8,600 and 31,309 images respectively, are used. Various CNN and transformer baselines are evaluated under standard preprocessing, online augmentation, and Gaussian-noise and motion-blur robustness conditions. The proposed HumP-KD model distills knowledge from two frozen heterogeneous transformer teachers, Swin-Tiny and ViT-Base, along with their Meta-MLP ensemble, into a lightweight MobileViT-S student via three tightly integrated components. Hierarchical Progressive Knowledge Distillation employs a Hierarchical Feature Builder that generates a fused spatial attention mask to selectively guide distillation toward discriminative regions. Multi-Stage Knowledge Distillation progressively activates three distillation stages across training. On Dataset-II, HumP-KD achieves a mean F1 score of $0.9876 \pm 0.0063$ across 10 independent trials, significantly outperforming the MobileViT-S baseline trained without distillation ($0.9537 \pm 0.0351$), with statistical significance confirmed by both independent t-test ($p = 0.0195$) and Wilcoxon signed-rank test ($W = 1$, $p = 0.0039$). The proposed method also demonstrates strong generalization across datasets and robustness under degraded visual conditions. The student model retains only 4.94M parameters and 19.01Mb model size, representing a $5.7\times$ parameter reduction over Swin-Tiny and a $17.5\times$ reduction over ViT-Base, while achieving 37.72 CPU FPS, making it suitable for real-time deployment.

2505.12992 2026-06-15 cs.LG cs.AI cs.CL stat.ML 版本更新

Fractured Chain-of-Thought Reasoning

断裂链式思维推理

Baohao Liao, Hanze Dong, Yuhui Xu, Doyen Sahoo, Christof Monz, Junnan Li, Caiming Xiong

发表机构 * University of Amsterdam(阿姆斯特丹大学) eBay Microsoft(微软) Google Research(谷歌研究) Salesforce

AI总结 提出断裂采样策略,通过截断推理链、调整轨迹数和解数,在推理时实现精度与成本的帕累托最优。

详情
AI中文摘要

推理时扩展技术通过在不重新训练的情况下利用额外的推理计算,显著增强了大型语言模型(LLMs)的推理能力。类似地,链式思维(CoT)提示及其扩展Long CoT通过生成丰富的中间推理轨迹来提高准确性,但这些方法会带来大量的token成本,阻碍了它们在延迟敏感场景中的部署。在这项工作中,我们首先证明截断CoT(即在完成推理前停止并直接生成最终答案)通常在使用显著更少token的情况下与完整CoT采样相匹配。基于这一见解,我们引入了断裂采样,这是一种统一的推理时策略,沿着三个正交轴在完整CoT和仅解决方案采样之间进行插值:(1)推理轨迹的数量,(2)每条轨迹的最终解数量,以及(3)推理轨迹被截断的深度。通过在五个不同的推理基准和多个模型规模上进行大量实验,我们证明断裂采样始终实现优越的精度-成本权衡,在Pass@k与token预算之间产生陡峭的对数线性缩放增益。我们的分析揭示了如何在这些维度上分配计算以最大化性能,为更高效和可扩展的LLM推理铺平了道路。代码见 https://github.com/BaohaoLiao/frac-cot 。

英文摘要

Inference-time scaling techniques have significantly bolstered the reasoning capabilities of large language models (LLMs) by harnessing additional computational effort at inference without retraining. Similarly, Chain-of-Thought (CoT) prompting and its extension, Long CoT, improve accuracy by generating rich intermediate reasoning trajectories, but these approaches incur substantial token costs that impede their deployment in latency-sensitive settings. In this work, we first show that truncated CoT, which stops reasoning before completion and directly generates the final answer, often matches the full CoT sampling while using dramatically fewer tokens. Building on this insight, we introduce Fractured Sampling, a unified inference-time strategy that interpolates between full CoT and solution-only sampling along three orthogonal axes: (1) the number of reasoning trajectories, (2) the number of final solutions per trajectory, and (3) the depth at which reasoning traces are truncated. Through extensive experiments on five diverse reasoning benchmarks and several model scales, we demonstrate that Fractured Sampling consistently achieves superior accuracy-cost trade-offs, yielding steep log-linear scaling gains in Pass@k versus token budget. Our analysis reveals how to allocate computation across these dimensions to maximize performance, paving the way for more efficient and scalable LLM reasoning. Code is available at https://github.com/BaohaoLiao/frac-cot.

2506.17255 2026-06-15 cs.LG cs.AI 版本更新

UltraSketchLLM: Sub-1-Bit LLM Compression via Sketch and Hardware-Friendly Operators

UltraSketchLLM:基于草图与硬件友好算子的低于1比特LLM压缩

Sunan Zou, Xueting Sun, Ziyun Zhang, Guojie Luo

发表机构 * National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University(国家多媒体信息处理重点实验室,计算机科学学院,北京大学) School of Electronic Engineering and Computer Science, Peking University(电子工程与计算机科学学院,北京大学) Center for Energy-efficient Computing and Applications, Peking University(能效计算与应用中心,北京大学)

AI总结 提出UltraSketchLLM,利用数据草图将LLM权重压缩至0.5比特,结合硬件友好实现,在保持可接受性能下降的同时实现14.9倍加速。

Comments Accepted by the 63rd ACM/IEEE The Chips to Systems Conference (DAC 2026)

详情
AI中文摘要

大型语言模型(LLM)如今需要更大的GPU内存,因此需要高效且极端的权重压缩方法。现有的压缩方法要么在理论上受限于每权重1比特,要么面临严重的性能下降和效率低下。为了在资源受限的场景中部署LLM,我们引入了UltraSketchLLM,它利用数据草图压缩LLM。它通过高达每权重0.5比特的高压缩率降低了峰值GPU内存占用。结合硬件友好的实现,UltraSketchLLM保持了可容忍的性能下降和极低的延迟开销,与朴素草图解决方案相比实现了14.9倍的加速。

英文摘要

Large language models (LLMs) demand ever more GPU memory, necessitating efficient and extreme weight compression methods. Existing compression methods are either theoretically limited to 1 bit per weight or suffer severe performance degradation and inefficiency. To deploy LLMs in resource-constrained scenarios, we introduce UltraSketchLLM, which compresses LLMs with data sketches. It reduces peak GPU memory footprint with compression rates as low as 0.5 bit per weight. Combined with a hardware-friendly implementation, UltraSketchLLM keeps performance degradation tolerable and latency overhead extremely low, with a 14.9x speedup over a naive sketch solution.

2602.03120 2026-06-15 cs.LG cs.AI 版本更新

Quantized Evolution Strategies: High-precision Fine-tuning of Quantized LLMs at Low-precision Cost

量化进化策略:以低精度代价实现量化大语言模型的高精度微调

Yinggan Xu, Kajetan Schweighofer, Risto Miikkulainen, Xin Qiu

发表机构 * University of California, Los Angeles(加州大学洛杉矶分校) Cognizant AI Lab(Cognizant AI实验室) UT Austin(得克萨斯大学奥斯汀分校)

AI总结 提出量化进化策略(QES),通过集成累积误差反馈和无状态种子重放,直接在量化空间进行全参数微调,无需反向传播,显著优于现有零阶微调方法。

Comments Added more tasks and baselines

详情
AI中文摘要

后训练量化(PTQ)对于在内存受限设备上部署大语言模型(LLM)至关重要,但它使模型变得静态且难以微调。标准的微调范式,包括强化学习(RL),从根本上依赖于反向传播和连续权重来计算梯度。因此,它们无法用于参数空间离散且不可微的量化模型。虽然进化策略(ES)提供了一种无需反向传播的替代方案,但由于梯度估计消失或不准确,量化参数的优化仍可能失败。本文介绍了量化进化策略(QES),一种直接在量化空间执行全参数微调的优化范式。QES基于两项创新:(1)它集成了累积误差反馈以保留高精度权重更新信号,(2)它利用无状态种子重放将内存使用降低到低精度推理水平。QES在各种任务上显著优于最先进的零阶微调方法,使得量化模型的直接微调成为可能。因此,它开辟了完全在量化空间中扩展LLM的可能性。源代码见 https://github.com/dibbla/Quantized-Evolution-Strategies 。

英文摘要

Post-Training Quantization (PTQ) is essential for deploying Large Language Models (LLMs) on memory-constrained devices, yet it renders models static and difficult to fine-tune. Standard fine-tuning paradigms, including Reinforcement Learning (RL), fundamentally rely on backpropagation and continuous weights to compute gradients. Thus they cannot be used on quantized models, where the parameter space is discrete and non-differentiable. While Evolution Strategies (ES) offer a backpropagation-free alternative, optimization of the quantized parameters can still fail due to vanishing or inaccurate gradient estimation. This paper introduces Quantized Evolution Strategies (QES), an optimization paradigm that performs full-parameter fine-tuning directly in the quantized space. QES is based on two innovations: (1) it integrates accumulated error feedback to preserve high-precision weight updating signals, and (2) it utilizes a stateless seed replay to reduce memory usage to low-precision inference levels. QES significantly outperforms the state-of-the-art zeroth-order fine-tuning methods on a variety of tasks, making direct fine-tuning for quantized models possible. It therefore opens up the possibility for scaling up LLMs entirely in the quantized space. The source code is available at https://github.com/dibbla/Quantized-Evolution-Strategies .
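
The two ingredients named above can be illustrated in isolation with plain Python. This is a toy sketch under explicit assumptions, not the QES algorithm: weights live on an integer grid, the rounding residual is stored explicitly (the paper's stateless replay presumably avoids this), and `perturbation` and `step_with_error_feedback` are hypothetical names.

```python
import random


def perturbation(seed, n):
    """Regenerate the same +/-1 noise vector from a seed instead of storing it."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(n)]


def step_with_error_feedback(w_int, update, residual):
    """Apply a real-valued update to integer weights, carrying the rounding error."""
    total = [u + r for u, r in zip(update, residual)]
    step = [round(t) for t in total]                  # what the integer grid can absorb
    new_w = [w + s for w, s in zip(w_int, step)]
    new_residual = [t - s for t, s in zip(total, step)]
    return new_w, new_residual


w_fb, res = [0], [0.0]
w_naive = [0]
for _ in range(10):
    w_fb, res = step_with_error_feedback(w_fb, [0.3], res)
    w_naive = [w + round(0.3) for w in w_naive]       # rounding alone drops the signal
```

Ten updates of 0.3 each should move the weight by 3 grid steps. With error feedback the integer weight gets there; with plain rounding every update is lost and the weight never moves, which is the vanishing-signal failure the abstract describes.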

2602.08324 2026-06-15 cs.LG 版本更新

Towards Efficient Large Language Reasoning Models via Extreme-Ratio Chain-of-Thought Compression

通过极端比例思维链压缩实现高效大型语言推理模型

Yuntian Tang, Bohan Jia, Wenxuan Huang, Lianyue Zhang, Jiao Xie, Wenxi Li, Wei Li, Jie Hu, Xinghao Chen, Rongrong Ji, Shaohui Lin

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 提出Extra-CoT框架,通过极端比例压缩思维链、混合比例监督微调和约束层次化比率策略优化,在显著减少推理令牌的同时保持甚至提升推理准确率。

Comments Accepted to ICML 2026. 15 pages, 7 figures

详情
AI中文摘要

思维链推理成功增强了大型语言模型的推理能力,但推理时会产生大量计算开销。现有的思维链压缩方法在高压缩比下常遭受关键逻辑保真度的损失,导致性能显著下降。为实现高保真、快速推理,我们提出了一种新颖的极端比例思维链压缩框架,称为Extra-CoT,该框架在保留答案准确性的同时,激进地减少令牌预算。为了生成可靠的高保真监督,我们首先在带有细粒度标注的数学思维链数据上训练一个专用的语义保留压缩器。然后,通过混合比例监督微调对大型语言模型进行微调,使其学习遵循一系列压缩预算,并为强化学习提供稳定的初始化。我们进一步提出约束和层次化比率策略优化,通过层次化奖励明确激励在较低预算下的问题解决能力。在三个数学推理基准上的实验显示了Extra-CoT的优越性。例如,在MATH-500上使用Qwen3-1.7B,Extra-CoT实现了超过73%的令牌减少,同时准确率提升0.6%,显著优于最先进方法。我们的源代码已在https://github.com/Mwie1024/Extra-CoT发布。

英文摘要

Chain-of-Thought (CoT) reasoning successfully enhances the reasoning capabilities of Large Language Models (LLMs), yet it incurs substantial computational overhead for inference. Existing CoT compression methods often suffer from a critical loss of logical fidelity at high compression ratios, resulting in significant performance degradation. To achieve high-fidelity, fast reasoning, we propose a novel EXTreme-RAtio Chain-of-Thought Compression framework, termed Extra-CoT, which aggressively reduces the token budget while preserving answer accuracy. To generate reliable, high-fidelity supervision, we first train a dedicated semantically-preserved compressor on mathematical CoT data with fine-grained annotations. An LLM is then fine-tuned on these compressed pairs via mixed-ratio supervised fine-tuning (SFT), teaching it to follow a spectrum of compression budgets and providing a stable initialization for reinforcement learning (RL). We further propose Constrained and Hierarchical Ratio Policy Optimization (CHRPO) to explicitly incentivize question-solving ability under lower budgets by a hierarchical reward. Experiments on three mathematical reasoning benchmarks show the superiority of Extra-CoT. For example, on MATH-500 using Qwen3-1.7B, Extra-CoT achieves over 73% token reduction with an accuracy improvement of 0.6%, significantly outperforming state-of-the-art (SOTA) methods. Our source code has been released at https://github.com/Mwie1024/Extra-CoT.

2603.10444 2026-06-15 cs.LG cs.AI 版本更新

The Curse and Blessing of Mean Bias in FP4-Quantized LLM Training

FP4量化LLM训练中均值偏差的诅咒与祝福

Hengjie Cao, Zhendong Huang, Mengyi Chen, Yifeng Yang, Fang Dong, Anrui Chen, Ruijun Huang, Xin Zhang, Mingzhi Dong, Yujiang Wang, Jinlong Hou, Qin Lv, Robert P. Dick, Yuan Cheng, Tun Lu, Fan Yang, Yixuan Chen, Li Shang

发表机构 * Fudan University(复旦大学) University of Bath(巴斯大学) Shanghai Innovation Institute(上海创新研究院) University of Oxford(牛津大学) Oxford Suzhou Centre for Advanced Research(牛津大学苏州高等研究院) University of Colorado Boulder(科罗拉多大学博尔德分校) University of Michigan(密歇根大学) Shenzhen Loop Area Institute(深圳河套学院)

AI总结 发现FP4训练失败源于激活异常值由秩一均值偏差主导,提出Averis均值残差分离量化法,在Qwen3模型上实现鲁棒W4A4G4训练,损失差距低于NVIDIA的Hadamard方法。

详情
AI中文摘要

FP4训练有望为大型语言模型节省大量内存和计算,但由于分块量化受极端激活幅度支配,导致动态范围膨胀并压缩长尾信号,因此仍然脆弱。我们发现了这一失败的一个反直觉来源:主导激活异常值不仅仅是任意的稀疏事件,而主要是由一致的秩一均值偏差引起的,其方向与主导各向异性谱分量对齐。该均值分量在训练过程中增强,被注意力和FFN算子放大和重塑,并日益主导顶部激活幅度。至关重要的是,这一发现揭示了一个看似复杂的异常值抑制问题实际上有一个非常简单的解决方案:在量化之前隔离一致的均值。因此,我们提出了Averis,一种均值残差分割量化方法,该方法在FP4量化之前仅使用归约和逐元素减法来分离均值分量。在100B token上训练的Qwen3 0.6B密集模型和50B token上训练的Qwen3 7B A1.5B MoE模型上,Averis实现了鲁棒的W4A4G4 FP4训练,将BF16损失差距降低至1.19%/0.81%,而NVIDIA最近发布的基于Hadamard的异常值平滑方法为2.05%/1.10%,同时将下游差距限制在0.89/0.71点。Averis在vanilla NVFP4上的端到端开销仅为2.20%,约为NVIDIA基于Hadamard设计的30%,为稳定的低位LLM训练提供了一条硬件高效的路径。与Hadamard互补,Averis在结合使用时进一步将Qwen3-0.6B的损失和下游差距降低至0.94%和0.73点。代码可在以下网址获取:https://anonymous.4open.science/r/averis-504D。

英文摘要

FP4 training promises substantial memory and compute savings for large language models, but remains fragile because blockwise quantization is dictated by extreme activation magnitudes, which inflate dynamic range and compress long-tail signals. We identify a counterintuitive source of this failure: dominant activation outliers are not merely arbitrary sparse events, but are largely induced by a coherent rank-one mean bias, whose direction aligns with the leading anisotropic spectral component. This mean component strengthens during training, is amplified and reshaped by attention and FFN operators, and increasingly dominates top activation magnitudes. Crucially, this discovery reveals that a seemingly complex outlier-suppression problem admits a truly simple solution: isolate the coherent mean before quantization. We therefore propose Averis, a mean-residual splitting quantization method that separates the mean component using only reductions and elementwise subtractions before FP4 quantization. Across Qwen3 0.6B Dense trained on 100B tokens and Qwen3 7B A1.5B MoE trained on 50B tokens, Averis enables robust W4A4G4 FP4 training, reducing BF16 loss gaps to 1.19%/0.81% versus 2.05%/1.10% for NVIDIA's recently released Hadamard-based outlier-smoothing method, while limiting downstream gaps to 0.89/0.71 points. With only 2.20% end-to-end overhead over vanilla NVFP4, about 30% of NVIDIA's Hadamard-based design, Averis provides a hardware-efficient path to stable low-bit LLM training. Complementary to Hadamard, Averis further reduces the Qwen3-0.6B loss and downstream gaps to 0.94% and 0.73 points when combined. Code is available at: https://anonymous.4open.science/r/averis-504D.
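
The mean-residual split described above can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the choice to average over tokens per channel, the toy integer quantizer standing in for FP4, and the names `split_mean` and `fake_quant_block` are all assumptions made for illustration. The only part taken from the abstract is that the mean is separated with a reduction and an elementwise subtraction before quantization.

```python
def split_mean(x):
    """Split a tokens-by-channels matrix into a per-channel mean and a residual."""
    n = len(x)
    mean = [sum(col) / n for col in zip(*x)]                      # reduction over tokens
    residual = [[v - m for v, m in zip(row, mean)] for row in x]  # elementwise subtraction
    return mean, residual


def fake_quant_block(block, levels=7):
    """Toy stand-in for FP4 (assumption): scale by block absmax, round to integers in [-7, 7]."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return list(block)
    scale = amax / levels
    return [round(v / scale) * scale for v in block]


def mse(a, b):
    """Mean squared error between two equally shaped matrices."""
    diffs = [(p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    return sum(diffs) / len(diffs)


# Channel 0 carries a large offset shared by every token (the "mean bias").
x = [[100.0 + 0.5 * i, 0.3 * i - 1.0, -0.2 * i + 0.7, 0.1 * i] for i in range(8)]

# Baseline: quantize each token row as one block; the offset dictates the scale.
direct = [fake_quant_block(row) for row in x]

# Split: quantize only the residual, then add the unquantized mean back.
mean, residual = split_mean(x)
split = [[q + m for q, m in zip(fake_quant_block(row), mean)] for row in residual]

err_direct = mse(x, direct)
err_split = mse(x, split)
```

In this toy setting the shared offset forces a coarse scale on every block, so the small channels round to zero under direct quantization, whereas the residual uses the full grid. Keeping the mean in higher precision is an assumption of the sketch.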

2603.15481 2026-06-15 cs.LG cs.AI 版本更新

TabKD: Tabular Knowledge Distillation through Interaction Diversity of Learned Feature Bins

TabKD: 通过学习特征箱的交互多样性实现表格知识蒸馏

Shovon Niverd Pereira, Krishna Khadka, Yu Lei

发表机构 * Department of Computer Science and Engineering, The University of Texas at Arlington(计算机科学与工程系,德克萨斯大学阿灵顿分校)

AI总结 提出TabKD方法,通过学习与教师决策边界对齐的自适应特征箱,生成最大化成对交互覆盖的合成查询,在表格数据知识蒸馏中显著提升学生-教师一致性。

Comments Accepted at the 35th International Joint Conference on Artificial Intelligence (IJCAI 2026)

详情
AI中文摘要

无数据知识蒸馏可以在没有原始训练数据的情况下实现模型压缩,这对于隐私敏感的表格领域至关重要。然而,现有方法在表格数据上表现不佳,因为它们没有明确处理特征交互,而特征交互是表格模型编码预测知识的基本方式。我们识别出交互多样性,即特征组合的系统覆盖,是有效表格蒸馏的基本要求。为了实施这一见解,我们提出了TabKD,它学习与教师决策边界对齐的自适应特征箱,然后生成最大化成对交互覆盖的合成查询。在4个基准数据集和4种教师架构上,TabKD在16个配置中的14个中实现了最高的学生-教师一致性,优于5个最先进的基线。我们进一步表明,交互覆盖与蒸馏质量强相关,验证了我们的核心假设。我们的工作建立了以交互为中心的探索作为表格模型提取的原则性框架。

英文摘要

Data-free knowledge distillation enables model compression without original training data, critical for privacy-sensitive tabular domains. However, existing methods do not perform well on tabular data because they do not explicitly address feature interactions, the fundamental way tabular models encode predictive knowledge. We identify interaction diversity, systematic coverage of feature combinations, as an essential requirement for effective tabular distillation. To operationalize this insight, we propose TabKD, which learns adaptive feature bins aligned with teacher decision boundaries, then generates synthetic queries that maximize pairwise interaction coverage. Across 4 benchmark datasets and 4 teacher architectures, TabKD achieves the highest student-teacher agreement in 14 out of 16 configurations, outperforming 5 state-of-the-art baselines. We further show that interaction coverage strongly correlates with distillation quality, validating our core hypothesis. Our work establishes interaction-focused exploration as a principled framework for tabular model extraction.

2604.21335 2026-06-15 cs.LG cs.CL 版本更新

Sub-Token Routing for KV Cache Compression

子令牌路由用于KV缓存压缩

Wei Jiang, Wei Wang

发表机构 * Futurewei Technologies(未来智科)

AI总结 提出子令牌路由方法,在保留令牌内对值向量分组并选择性保留,与令牌级压缩互补,在LLM和VLM中提升压缩性能。

Comments 17 pages, 8 tables, 2 figures

详情
AI中文摘要

Transformer推理通常需要大型KV缓存,尤其是在长上下文语言建模和多模态生成中。现有的压缩方法通常通过选择、驱逐、量化或压缩缓存令牌,或在语言模型推理前减少视觉令牌序列来降低缓存成本。我们引入子令牌路由,一种KV压缩方法,它在保留令牌内部添加了更精细的控制轴。它将每个保留的值向量分成组,并仅保留选定的组,同时保持查询和键状态不变。该方法设计在令牌级缩减之后工作。首先,令牌缩减方法确定保留哪些令牌。然后,子令牌路由压缩这些保留令牌内部的值状态。在匹配KV预算下的实验表明,添加子令牌路由提高了令牌级缩减在LLM和VLM设置中的性能,包括LLaMA-2-7B和Qwen2.5-7B上的Quest,以及LLaVA和Qwen-VL模型上的FastV/VisionZip。在较小的KV预算下增益更大,表明当进一步移除令牌成本高昂时,值组路由特别有用。总体而言,令牌级缩减和子令牌路由提供了互补的降低KV成本的方式。

英文摘要

Transformer inference often requires a large KV cache, especially for long-context language modeling and multimodal generation. Existing compression methods usually reduce cache cost by selecting, evicting, quantizing, or compressing cached tokens, or by reducing the visual-token sequence before language-model inference. We introduce sub-token routing, a KV-compression method that adds a finer control axis inside retained tokens. It splits each retained value vector into groups and keeps only selected groups, while leaving query and key states unchanged. The method is designed to work after token-level reduction. First, a token-reduction method determines which tokens are retained. Then, sub-token routing compresses the value states inside those retained tokens. Experiments under matched KV budgets show that adding sub-token routing improves token-level reduction performance in both LLM and VLM settings, including Quest on LLaMA-2-7B and Qwen2.5-7B, and FastV/VisionZip across LLaVA and Qwen-VL models. The gains are larger at smaller KV budgets, suggesting that value-group routing is especially useful when further token removal becomes costly. Overall, token-level reduction and sub-token routing provide complementary ways to reduce KV cost.
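
The core operation described above, splitting a retained value vector into groups and keeping only some of them, can be sketched as follows. The abstract does not say how groups are selected, so ranking groups by squared L2 norm is an assumption, as are the function names `route_value_groups` and `reconstruct`. Query and key states are not touched, matching the description.

```python
def route_value_groups(value, num_groups, keep):
    """Split one value vector into equal groups and keep the `keep` highest-energy ones.

    Returns the sorted indices of the kept groups and the kept groups themselves.
    The energy-based selection rule is an illustrative assumption.
    """
    assert len(value) % num_groups == 0, "vector length must divide evenly into groups"
    size = len(value) // num_groups
    groups = [value[g * size:(g + 1) * size] for g in range(num_groups)]
    energy = [sum(v * v for v in grp) for grp in groups]
    ranked = sorted(range(num_groups), key=lambda g: energy[g], reverse=True)
    kept = sorted(ranked[:keep])
    return kept, [groups[g] for g in kept]


def reconstruct(kept, kept_groups, num_groups, size):
    """Rebuild a full-length value vector, filling dropped groups with zeros."""
    out = [0.0] * (num_groups * size)
    for g, grp in zip(kept, kept_groups):
        out[g * size:(g + 1) * size] = grp
    return out


value = [0.1, 0.0, 3.0, -2.0, 0.2, 0.1, -1.0, 1.5]
kept, kept_groups = route_value_groups(value, num_groups=4, keep=2)
approx = reconstruct(kept, kept_groups, num_groups=4, size=2)
```

Here only half of each value vector is stored, plus the group indices. In the pipeline described above, this step would run on tokens that a token-level method has already chosen to retain.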

2606.12280 2026-06-15 cs.LG 版本更新

Holding the FP8 Quality Ceiling at 8-Bit Weights and Activations: INT8 and GGUF Post-Training Quantization of Ideogram 4.0 for Consumer GPUs

在8位权重和激活下保持FP8质量上限:Ideogram 4.0的INT8和GGUF后训练量化用于消费级GPU

Deep Gandhi, Ali Asaria, Tony Salomone

发表机构 * Transformer Lab

AI总结 本文对Ideogram 4.0模型进行INT8 W8A8量化,在无FP8张量核心的Ampere GPU上达到FP8质量水平,并优于NF4,同时GGUF Q4_K在质量-内存前沿上成为帕累托最优。

详情
AI中文摘要

后训练量化使得大型文本到图像扩散变压器能够在消费级GPU上运行,然而硬件特定的权衡很少被直接测量。我们对Ideogram 4.0进行量化——这是一个9.3B的流匹配扩散变压器(DiT),作为两个独立权重副本的单流34层骨干网络,用于无分类器引导,并由Qwen3-VL-8B编码器调节——针对缺乏FP8张量核心的Ampere RTX 3090 GPU。我们的INT8 W8A8方案(每通道权重、每令牌动态激活、SmoothQuant,以及一小部分高脆弱性层的混合精度保护)保持了FP8质量上限:在200个提示的基准测试中,INT8与FP8的配对同种子自举置信区间在Pick和CLIP上均包含零,而INT8相比NF4提升了$+1.9$ CLIP(95%置信区间$[+1.21,+2.64]$,排除零)。据我们所知,针对此类模型的逐类别OCR分析尚未见报道,该分析确认了文本可读性得以保留,并且消融实验将FFN下投影的保护隔离为主要的质量杠杆。我们的GGUF Q4_K量化在相同磁盘大小下优于NF4,并在质量-内存前沿上成为帕累托最优,配对置信区间排除零(Q8_0质量中性)。最后,我们描述了8位量化在哪些情况下有帮助,在哪些情况下没有:INT8的权重与FP8的占用空间匹配而非缩小,因此在Ampere上获得速度提升需要融合INT8内核。

英文摘要

We study post-training quantization (PTQ) of Ideogram 4.0, a 9.3B flow-matching diffusion transformer (DiT) that realizes classifier-free guidance with two separate-weight copies of a single-stream backbone and is conditioned by a Qwen3-VL text encoder, targeting Ampere RTX 3090 GPUs, which lack FP8 tensor cores. Because Ideogram 4.0 is trained on structured JSON captions, we evaluate every variant under schema-valid JSON prompts produced by an LLM expander built to Ideogram's published caption specification, and score them with a battery spanning human-preference (HPSv2), CLIP, and PickScore for standalone quality; PP-OCR exact-match and edit distance for text; and PSNR/SSIM/LPIPS for fidelity to the FP8 reference (the highest-precision public checkpoint) output. On a 300-prompt benchmark with paired bootstrap confidence intervals, an INT8 W8A8 recipe (per-channel weights, per-token dynamic activations, SmoothQuant, and bf16 protection of a small high-fragility layer set) is statistically indistinguishable from FP8 on CLIP and PickScore (paired CIs include zero) and within ~0.004 HPSv2, and, at its 8-bit size, is the most faithful reproduction of the FP8 output (LPIPS 0.243 vs 0.277/0.306 for the half-size 4-bit baselines; the INT8-Q4_K gap excludes zero). A GGUF Q4_K quantization reaches the same standalone quality as the published NF4 baseline at the same on-disk size, making it the Pareto choice on the quality-memory frontier. We further show that under JSON prompts all four variants reach parity on standalone quality, that the variants separate on fidelity and text rendering rather than on aggregate image-quality scores, and that text legibility, near-zero when the model is prompted with raw strings, reaches 55% OCR exact-match under the JSON captions it expects. We release the INT8 W8A8 and GGUF Q4_K quantized weights on Hugging Face under a gated, non-commercial license.
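
Two ingredients of the recipe named above, per-channel weight scales and per-token dynamic activation scales, can be sketched in plain Python. This is a generic illustration of symmetric INT8 W8A8 matrix multiplication, not the released recipe: SmoothQuant and the bf16-protected layers are omitted, and the names `quantize_rows` and `w8a8_matmul` are hypothetical.

```python
def quantize_rows(mat):
    """Symmetric INT8 quantization with one scale per row (absmax / 127)."""
    q, scales = [], []
    for row in mat:
        amax = max(abs(v) for v in row) or 1.0   # avoid a zero scale for all-zero rows
        s = amax / 127.0
        scales.append(s)
        q.append([max(-127, min(127, round(v / s))) for v in row])
    return q, scales


def w8a8_matmul(x, w):
    """Compute x @ w.T with INT8 operands.

    x: tokens x in_features (scales computed per token at call time).
    w: out_features x in_features (one scale per output channel).
    """
    xq, xs = quantize_rows(x)
    wq, ws = quantize_rows(w)
    out = []
    for xr, sx in zip(xq, xs):
        # Integer dot product, then a single rescale by the two scales.
        out.append([sum(a * b for a, b in zip(xr, wr)) * sx * sw
                    for wr, sw in zip(wq, ws)])
    return out


x = [[1.0, -2.0, 0.5], [0.1, 0.2, -0.3]]
w = [[0.5, 0.25, -1.0], [2.0, 0.0, 1.0]]
y = w8a8_matmul(x, w)
exact = [[sum(a * b for a, b in zip(xr, wr)) for wr in w] for xr in x]
```

Because each token gets its own scale, a token with small activations is not forced onto the grid of a token with large ones. The weights are stored as 8-bit integers, which is why their footprint matches FP8 rather than shrinking.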

2606.13054 2026-06-15 cs.LG cs.AI 版本更新

TWLA: Achieving Ternary Weights and Low-Bit Activations for LLMs via Post-Training Quantization

TWLA:通过训练后量化实现大语言模型的三值权重和低位激活

Zhixiong Zhao, Zukang Xu, Zhixuan Chen, Xing Hu, Zhe Jiang, Dawei Yang

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 提出TWLA框架,通过后训练量化实现1.58位权重和4位激活,解决激活分布长尾问题,加速推理。

Comments Accepted by ICML 2026

详情
AI中文摘要

大型语言模型(LLMs)展现出卓越的通用语言处理能力,但其内存和计算成本阻碍了部署。三值化已成为一种有前景的压缩技术,可显著降低模型大小和推理复杂度。然而,现有方法难以处理重尾激活分布,因此将激活保持在高精度,从根本上限制了端到端推理加速。为克服这一限制,我们提出TWLA,一种后训练量化(PTQ)框架,在保持高精度的同时实现1.58位权重压缩和4位激活量化。TWLA包含三个组件:(1)欧几里得到流形非对称三值量化器(E2M-ATQ),通过从欧几里得初始化到流形重定位的两阶段优化,最小化权重三值化下的层输出误差;(2)Kronecker正交三模态整形(KOTMS),应用Kronecker结构正交旋转将权重重塑为三值友好的三模态分布,同时共享旋转统计上抑制激活异常值;(3)层间感知激活混合精度(ILA-AMP),在位分配中显式引入相邻层二阶交互成本,并联合优化由共享正交变换引起的激活量化增益的层间差异,防止少数弱层触发级联效应。大量实验表明,TWLA在W1.58A4下保持高精度,同时实现显著的推理加速。代码见 https://github.com/Kishon-zzx/TWLA。

英文摘要

Large language models (LLMs) exhibit exceptional general language processing capabilities, but their memory and compute costs hinder deployment. Ternarization has emerged as a promising compression technique, offering significant reductions in model size and inference complexity. However, existing methods struggle with heavy-tailed activation distributions and therefore keep activations in high precision, fundamentally limiting end-to-end inference acceleration. To overcome this limitation, we propose TWLA, a post-training quantization (PTQ) framework that achieves 1.58-bit weight compression and 4-bit activation quantization while maintaining high accuracy. TWLA comprises three components: (1) Euclidean-to-Manifold Asymmetric Ternary Quantizer (E2M-ATQ) minimizes layer-output error under weight ternarization via a two-stage optimization from Euclidean initialization to manifold relocation; (2) Kronecker Orthogonal Tri-Modal Shaping (KOTMS) applies a Kronecker-structured orthogonal rotation to reshape weights into ternary-friendly tri-modal distributions, while the shared rotation statistically suppresses activation outliers; and (3) Inter-Layer Aware Activation Mixed Precision (ILA-AMP) explicitly introduces adjacent-layer second-order interaction costs in bit allocation and jointly optimizes for the layer-wise disparity of activation quantization gains induced by the shared orthogonal transform, preventing cascades triggered by a few weak layers. Extensive experiments demonstrate that TWLA maintains high accuracy under W1.58A4, while delivering significant inference acceleration. The code is available at https://github.com/Kishon-zzx/TWLA.

2512.22671 2026-06-15 cs.CL cs.AI cs.LG 版本更新

Fragile Knowledge, Robust Instruction-Following: The Width Pruning Dichotomy in Llama-3.2

脆弱的知识,稳健的指令遵循:Llama-3.2中的宽度剪枝二分法

Pere Martra

发表机构 * Independent Researcher(独立研究员)

AI总结 通过峰峰值幅度(PPM)准则对GLU-MLP层进行结构化宽度剪枝,发现降低扩展比会损害参数化知识任务,但能提升指令遵循能力,挑战了剪枝导致均匀退化的假设。

Comments 22 pages, 5 figures, 9 tables. Code available at https://github.com/peremartra/llama-glu-expansion-pruning

详情
AI中文摘要

对Llama-3.2模型中GLU-MLP层的结构化宽度剪枝,以峰峰值幅度(PPM)准则为指导,揭示了降低扩展比如何系统性地影响不同模型能力的二分法。虽然依赖参数化知识的任务(如MMLU、GSM8K)和困惑度指标的性能随扩展比降低而可预测地下降,但指令遵循能力在2.4倍平衡比下得到提升(IFEval:Llama-3.2-1B中+4.8分/+46%,Llama-3.2-3B中+3.7分/+39%),且多步推理保持稳健(MUSR)。这种模式在两个评估模型大小上一致观察到,挑战了压缩研究中剪枝导致均匀退化的主流假设。为探究这一点,我们使用评估事实知识、数学推理、语言理解、指令遵循和真实性的综合基准套件,评估了七种扩展比配置。我们的分析将扩展比识别为一个关键架构参数,它选择性地重塑模型的任务性能轮廓,而不仅仅是作为压缩指标。

英文摘要

Structured width pruning of GLU-MLP layers in Llama-3.2 models, guided by the Peak-to-Peak Magnitude (PPM) criterion, reveals a systematic dichotomy in how reducing the expansion ratio affects different model capabilities. While performance on tasks relying on parametric knowledge (e.g., MMLU, GSM8K) and perplexity metrics degrades predictably with decreasing expansion ratios, instruction-following capabilities improve at the 2.4x equilibrium ratio (IFEval: +4.8 points / +46% in Llama-3.2-1B and +3.7 points / +39% in Llama-3.2-3B), and multi-step reasoning remains robust (MUSR). This pattern, observed consistently across both evaluated model sizes, challenges the prevailing assumption in compression research that pruning induces uniform degradation. To investigate this, we evaluated seven expansion ratio configurations using comprehensive benchmark suites that assess factual knowledge, mathematical reasoning, language comprehension, instruction-following, and truthfulness. Our analysis identifies the expansion ratio as a critical architectural parameter that selectively reshapes the model's task performance profile, rather than merely serving as a compression metric.

2602.16835 2026-06-15 cs.CR cs.LG 版本更新

NeST: Neuron Selective Tuning for LLM Safety

NeST: 面向LLM安全的神经元选择性调优

Sasha Behrouzi, Lichao Wu, Mohamadreza Rostami, Ahmad-Reza Sadeghi

发表机构 * University of California, Berkeley(加州大学伯克利分校) Stanford University(斯坦福大学)

AI总结 提出NeST框架,通过激活探测识别安全相关前馈神经元并训练共享簇级更新,仅用普通恶意提示即可泛化防御多种越狱攻击,在14个模型上以极少参数实现接近全微调的鲁棒性。

详情
AI中文摘要

安全对齐对于大型语言模型(LLM)的负责任部署至关重要。然而,现有方法通常依赖于重量级的微调,这在跨模型家族更新、审计和维护时成本高昂。全微调会产生大量的计算和存储开销,而参数高效方法(如低秩适应LoRA)则牺牲效率换取不一致的安全增益和对设计选择的敏感性。安全干预机制在不修改模型权重的情况下减少不安全输出,但无法直接塑造或保留控制安全行为的内部表示。我们提出NeST,一种用于高效事后安全对齐的神经元选择性调优框架。NeST通过对普通有害和无害提示进行激活探测来识别安全相关的前馈神经元,聚类具有相似激活模式的神经元,并训练共享的簇级更新,同时冻结模型的其余部分。重要的是,NeST仅使用普通恶意提示进行训练,不使用越狱特定的攻击数据,但能稳健地泛化到多种越狱攻击。学习到的更新随后被折叠到原始权重中,不产生推理时开销。在14个开源权重语言和多模态模型上的评估表明,NeST优于轻量级基线,并以显著更少的可训练参数接近全微调的鲁棒性。在纯文本模型上,NeST将平均越狱攻击成功率从44.5%降至1.1%,平均仅训练0.4M参数。在多模态设置中,它将ASR从55.3%降至1.1%,对于下游微调变体,通过将ASR从53.8%降至0.8%来恢复安全性。这些结果表明,通过将适应集中在局部、功能连贯的安全结构上,可以实现鲁棒、可维护的安全对齐。

英文摘要

Safety alignment is essential for the responsible deployment of Large Language Models (LLMs). Yet, existing approaches often rely on heavyweight fine-tuning that is costly to update, audit, and maintain across model families. Full fine-tuning incurs substantial computational and storage overhead, while parameter-efficient methods, e.g., Low-Rank Adaptation (LoRA), trade efficiency for inconsistent safety gains and sensitivity to design choices. Safety intervention mechanisms reduce unsafe outputs without modifying model weights, but do not directly shape or preserve the internal representations that govern safety behavior. We present NeST, a Neuron-Selective Tuning framework for efficient post-hoc safety alignment. NeST identifies safety-relevant feed-forward neurons via activation probing on vanilla harmful and benign prompts, clusters neurons with similar activation profiles, and trains shared cluster-level updates while freezing the rest of the model. Importantly, NeST is trained only on vanilla malicious prompts, without using jailbreak-specific attack data, yet generalizes robustly to diverse jailbreaks. The learned updates are then folded into the original weights, incurring no inference-time overhead. Evaluated on 14 open-weight language and multimodal models, NeST outperforms lightweight baselines and approaches full fine-tuning robustness with significantly fewer trainable parameters. On text-only models, NeST reduces average jailbreak attack success rate from 44.5% to 1.1% while training only 0.4M parameters on average. Across multimodal settings, it reduces ASR from 55.3% to 1.1%, and for downstream fine-tuned variants, it restores safety by reducing ASR from 53.8% to 0.8%. These results show that robust, maintainable safety alignment can be achieved by concentrating adaptation on localized, functionally coherent safety structures.

2604.23336 2026-06-15 cs.IR cs.CL cs.LG 版本更新

Efficient Rationale-based Retrieval: On-policy Distillation from Generative Rerankers based on JEPA

高效基于理由的检索:基于JEPA的生成重排序器的在线蒸馏

Teng Chen, Sheng Xu, Feixiang Guo, Xiaoyu Wang, Qingqing Gu, Hongyan Li, Luo Ji

发表机构 * Geely AI Lab(吉利人工智能实验室)

AI总结 本文提出Rabtriever,通过在线蒸馏从生成重排序器中学习,将查询和文档独立编码,提升检索效率,同时在多个任务中表现优异。

Comments 11 pages, 8 figures. ICMR 2026 (https://youtu.be/apDcrzEVwq4)

详情
AI中文摘要

不同于传统基于事实的检索,基于理由的检索通常需要使用大语言模型对查询-文档对进行跨编码,造成显著的计算成本。为解决这一限制,我们提出了Rabtriever,它独立编码查询和文档,同时提供与重排序器相当的跨查询-文档理解能力。我们从训练一个基于LLM的生成重排序器开始,该重排序器将文档置于查询之前,并提示LLM通过对数概率生成相关性分数。然后将其作为在线蒸馏框架的教师,Rabtriever作为学生重建教师的上下文感知查询嵌入。为此,Rabtriever首先从教师中初始化,参数冻结。然后采用联合嵌入预测架构(JEPA)范式,该范式在LLM层和头部之间集成一个轻量级、可训练的预测器,将查询嵌入投影到新的隐藏空间,文档嵌入作为潜在向量。JEPA然后最小化此投影嵌入与教师嵌入的分布差异。为了增强在线蒸馏的采样效率,我们还添加了对LLM logits的反向KL的辅助损失,以重塑学生的logit分布。Rabtriever将教师在文档长度上的二次复杂度优化为线性,经理论和实验证实。实验表明,Rabtriever在多种基于理由的任务中优于不同的检索器基线,包括共情对话和机器人操作,且从重排序器中仅有微小的准确率下降。Rabtriever在传统检索基准如MS MARCO和BEIR上也表现良好,性能与最佳检索器基线相当。

英文摘要

Unlike traditional fact-based retrieval, rationale-based retrieval typically necessitates cross-encoding of query-document pairs using large language models, incurring substantial computational costs. To address this limitation, we propose Rabtriever, which independently encodes queries and documents, while providing comparable cross query-document comprehension capabilities to rerankers. We start from training an LLM-based generative reranker, which puts the document prior to the query and prompts the LLM to generate the relevance score by log probabilities. We then employ it as the teacher of an on-policy distillation framework, with Rabtriever as the student to reconstruct the teacher's context-aware query embedding. To achieve this effect, Rabtriever is first initialized from the teacher, with parameters frozen. The Joint-Embedding Predictive Architecture (JEPA) paradigm is then adopted, which integrates a lightweight, trainable predictor between LLM layers and heads, projecting the query embedding into a new hidden space, with the document embedding as the latent vector. JEPA then minimizes the distribution difference between this projected embedding and the teacher embedding. To strengthen the sampling efficiency of on-policy distillation, we also add an auxiliary loss on the reverse KL of LLM logits, to reshape the student's logit distribution. Rabtriever optimizes the teacher's quadratic complexity on the document length to linear, verified both theoretically and empirically. Experiments show that Rabtriever outperforms different retriever baselines across diverse rationale-based tasks, including empathetic conversations and robotic manipulations, with minor accuracy degradation from the reranker. Rabtriever also generalizes well on traditional retrieval benchmarks such as MS MARCO and BEIR, with comparable performance to the best retriever baseline.

7. 联邦学习、隐私与安全 12 篇

2606.13748 2026-06-15 cs.LG 新提交

FedSPC: Shared Parameter Correction for Personalized Federated Learning

FedSPC:个性化联邦学习的共享参数校正

Kannanthodath Induchoodan Ajay Menon, Christian Prehofer, Yunfei Xu, Toru Hirano

发表机构 * DENSO AUTOMOTIVE Deutschland GmbH(电装汽车德国有限公司) DENSO International America, Inc.(电装国际美国公司) Technical University of Munich(慕尼黑工业大学)

AI总结 针对个性化联邦学习中共享参数因客户端局部目标不一致而更新冲突的问题,提出模块化校正方法FedSPC,仅对共享参数应用控制变量校正,在多种PFL设置下提升性能。

Comments Accepted for presentation at FL@FM-IJCAI'26, in conjunction with IJCAI 2026. 9 pages

详情
AI中文摘要

个性化联邦学习(PFL)是联邦学习中解决统计异质性的重要方法之一,同时支持客户端特定的适应。许多PFL方法将模型拆分为共享参数和个性化参数,并在每个客户端上联合训练。然而,这产生了一个优化问题:共享参数由优化不同局部目标的客户端更新,可能导致共享更新不一致并削弱共享表示。为解决此问题,我们提出联邦共享参数校正(FedSPC),一种用于PFL的模块化校正方法。FedSPC仅对给定PFL方法的共享参数应用控制变量校正,而保持个性化参数不变。它可以集成到三种常见的PFL设置中:共享特征提取器、共享分类器以及带有局部正则化的完全共享模型。在CIFAR-100和Tiny-ImageNet上使用ViT、ResNet-34和VGG-11的实验表明,FedSPC提高了代表性PFL方法(包括FedPer、FedRep、FedBABU、LG-FedAvg和Ditto)的性能。

英文摘要

Personalized federated learning (PFL) is one of the important approaches in federated learning for addressing statistical heterogeneity while enabling client-specific adaptation. Many PFL methods split the model into shared and personalized parameters, which are jointly trained on each client. However, this creates an optimization issue: shared parameters are updated by clients optimizing different local objectives, which can lead to inconsistent shared updates and weaken the shared representation. To address this problem, we propose Federated Shared Parameter Correction (FedSPC), a modular correction method for PFL. FedSPC applies control-variate correction only to the shared parameters of a given PFL method, while leaving personalized parameters unchanged. It can be integrated into three common PFL settings: shared feature extractors, shared classifiers, and fully shared models with local regularization. Experiments on CIFAR-100 and Tiny-ImageNet with ViT, ResNet-34, and VGG-11 show that FedSPC improves performance across representative PFL methods, including FedPer, FedRep, FedBABU, LG-FedAvg, and Ditto.
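
The idea of correcting only the shared parameters can be sketched as a single local update step. The abstract says only that a control-variate correction is applied to shared parameters, so the SCAFFOLD-style form `g - c_local + c_global` used here is an assumption, as are the function name `local_step` and the scalar parameters. How the control variates themselves are maintained is left out.

```python
def local_step(params, grads, shared_keys, c_local, c_global, lr):
    """One client-side SGD step.

    Shared parameters use a control-variate-corrected gradient (assumed
    SCAFFOLD-style form); personalized parameters use the plain gradient.
    """
    updated = {}
    for name, w in params.items():
        g = grads[name]
        if name in shared_keys:
            # Subtract this client's drift estimate and add the global one.
            g = g - c_local[name] + c_global[name]
        updated[name] = w - lr * g
    return updated


params = {"backbone": 1.0, "head": 2.0}      # "head" is personalized
grads = {"backbone": 0.5, "head": 0.4}
new_params = local_step(
    params,
    grads,
    shared_keys={"backbone"},
    c_local={"backbone": 0.3},
    c_global={"backbone": 0.1},
    lr=0.1,
)
```

Changing `shared_keys` is what would adapt this sketch to the three settings listed above: a shared feature extractor, a shared classifier, or a fully shared model.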

2606.13873 2026-06-15 cs.LG cs.CL 新提交

Natively Unlearnable Large Language Models

原生可遗忘的大语言模型

Gaurav R. Ghosal, Pratyush Maini, Aditi Raghunathan

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 提出NULLs模型,通过共享骨干和稀疏激活的sinks分离数据源贡献,实现无需梯度更新的高效遗忘,在维基百科上验证了单篇文章遗忘的有效性和鲁棒性。

详情
AI中文摘要

遗忘旨在移除特定训练数据源的影响,但由于不同数据源的贡献在模型中纠缠,这已被证明具有挑战性。将源贡献隔离到不相交的参数中使得移除更容易,尽管这会阻碍跨源的联合学习。我们提出NULLs(原生可遗忘的大语言模型),这是一类模型,通过训练一组共享骨干神经元以及一个稀疏激活的sinks池,满足隔离源特定贡献和跨源联合学习这两个对立目标。在训练过程中,特定于源的信息自然集中在其sinks中,而跨源共享的信息积累在骨干中。在部署时,通过禁用相应的sinks来遗忘一个源,无需梯度更新也无需访问保留数据。我们展示了NULLs可扩展到维基百科约600万篇文章,将每篇文章隔离为独立源。遗忘单篇文章会移除其特定知识,同时保留与语义相关文章共享的事实,与从头重新训练紧密匹配。我们注意到,使用NULLs进行遗忘也具有鲁棒性:在遗忘《哈利·波特》书籍的案例研究中,NULLs抵抗了对抗性提取和逆转事后遗忘的重新学习。最后,NULLs保留了一般语言能力,在下游基准测试中与标准Transformer相匹配。这些结果共同表明,源级遗忘不必是事后考虑。它可以原生地构建到LLM训练中,同时保留共享表示学习的优势。

英文摘要

Unlearning aims to remove the influence of specific training data sources, but this has proved challenging because the contributions of different sources are entangled within the model. Isolating source contributions to disjoint parameters makes removal easier, though it obstructs joint learning across sources. We propose NULLs (Natively Unlearnable LLMs), a model class that satisfies the two opposing goals of isolating source-specific contributions and learning jointly across sources, by training a set of shared backbone neurons alongside a pool of sparsely activated sinks. During training, information specific to a source naturally concentrates in its sinks while information shared across sources accumulates in the backbone. A source is then unlearned at deployment by disabling its corresponding sinks, with no gradient updates and no access to the retained data. We show that NULLs scales to Wikipedia's ~6M articles, isolating each as an independent source. Unlearning a single article removes knowledge specific to it while preserving facts shared with semantically related articles, closely matching retraining from scratch. We note that unlearning with NULLs is also robust: in a case study of unlearning the Harry Potter books, NULLs resists both adversarial extraction and relearning that reverses post-hoc unlearning. Finally, NULLs preserves general language capabilities, matching a standard transformer on downstream benchmarks. Together, these results suggest that source-level unlearning need not be an afterthought. It can be built natively into LLM training while retaining the benefits of shared representation learning.

2606.14078 2026-06-15 cs.LG cs.AI 新提交

Rethinking Backdoor Adversarial Unlearning through the Lens of Catastrophic Forgetting in Continual Learning

通过持续学习中的灾难性遗忘视角重新思考后门对抗性去学习

Zhenqian Zhu, Yamin Hu, Yujiang Liu, Luping Wei, Wenbo Hou, Bin Li, Haodong Li, Wenjian Luo

发表机构 * Harbin Institute of Technology, Shenzhen(哈尔滨工业大学(深圳)) Shenzhen Key Laboratory of Media Security, Shenzhen University(深圳大学媒体安全深圳市重点实验室)

AI总结 本文将后门学习与去学习建模为持续学习视角下的三阶段过程,基于灾难性遗忘机制推导完全后门去学习的必要条件,并提出盲反演-后门对抗性去学习(BI-BAU)方法,通过期望最大化算法优化最大后验目标,有效消除后门效应。

Comments Accepted by ACM CCS 2026

详情
AI中文摘要

现有研究表明,当前的后门防御方法鲁棒性有限,且常无法应对特定类型的攻击。更令人担忧的是,主流的安全调优策略往往仅提供表面安全保护,因为它们未能完全消除后门效应。在本工作中,我们从持续学习视角将后门学习与去学习重新表述为一个顺序的三阶段过程。在此框架内,我们正式定义了完全后门去学习,并基于灾难性遗忘机制进一步推导了实现它的必要条件。在这些见解的指导下,我们提出了盲反演-后门对抗性去学习(BI-BAU),它将满足去学习条件的对抗样本生成问题表述为一个盲反演问题。我们通过将对抗训练的双层优化过程整合到期望最大化(EM)算法框架中来解决该问题,以优化最大后验(MAP)目标。此外,BI-BAU被扩展到目标类别未知的无目标对抗场景以及多模态对比学习任务中,增强了其在预训练模型可能被攻破的真实部署场景中的适用性。大量实验表明,我们的方法在广泛的后门攻击中具有通用适用性,并能有效且彻底地消除后门模型中的后门效应。

英文摘要

Existing studies reveal that current backdoor defenses exhibit limited robustness and often fail against specific types of attacks. More concerningly, prevailing safety tuning strategies tend to provide only superficial safety protection, as they fall short of completely eliminating the backdoor effects. In this work, we present a novel formulation of backdoor learning and unlearning as a sequential, three-stage process from a continual learning perspective. Within this framework, we formally define complete backdoor unlearning and further derive the necessary conditions for achieving it based on the mechanism of catastrophic forgetting. Guided by these insights, we propose Blind Inversion-Backdoor Adversarial Unlearning (BI-BAU), which formulates the generation of adversarial examples satisfying the unlearning conditions as a blind inversion problem. We solve this by integrating the bi-level optimization process of adversarial training into an Expectation-Maximization (EM) algorithm framework to optimize the maximum a posteriori (MAP) objective. Furthermore, BI-BAU is extended to untargeted adversarial scenarios with unknown target classes, as well as to multi-modal contrastive learning tasks, enhancing its applicability to real-world deployment scenarios where pre-trained models may be compromised. Extensive experiments demonstrate that our method exhibits general applicability across a wide spectrum of backdoor attacks and can effectively and thoroughly eliminate the backdoor effects from a backdoor model.

2606.14354 2026-06-15 cs.LG 新提交

MUFFLe: Efficient Model Update Compression via Generalized Deduplication for Federated Learning

MUFFLe: 通过广义去重实现联邦学习的高效模型更新压缩

Xiaobo Zhao, Daniel E. Lucani

发表机构 * Innovation Foundation Denmark(丹麦创新基金会)

AI总结 提出MUFFLe方案,将广义去重(GD)集成到FedAvg中,通过去重更新向量中的重复模式实现固定速率、可变计数的压缩,在IID MNIST上以38 MB累积上行通信达到92.93%目标精度。

Comments Accepted at IEEE EDGE 2026 (Work-in-Progress track)

详情
AI中文摘要

联邦学习非常适合边缘环境,但通常受到传输模型更新的上行链路成本的限制。这篇进行中的工作论文提出了MUFFLe,一种通信高效的更新压缩方案,将广义去重(GD)集成到FedAvg流程中。MUFFLe去重更新向量中的重复模式,产生固定速率、可变计数的压缩方案。在20个客户端的IID MNIST上的初步实验表明,MUFFLe以38 MB累积上行通信达到92.93%的目标精度,而8位量化需要75 MB,Top-$k$稀疏化需要86 MB,未压缩的FedAvg需要310 MB。这些结果证明了将GD应用于通信高效的联邦学习的可行性。

英文摘要

Federated learning is well suited to edge environments but is often limited by the uplink cost of transmitting model updates. This Work-in-Progress paper presents MUFFLe, a communication-efficient update compression scheme that integrates generalized deduplication (GD) into the FedAvg pipeline. MUFFLe deduplicates repeated patterns across the update vector, yielding a fixed-rate, variable-count compression scheme. Preliminary experiments on IID MNIST with 20 clients show that MUFFLe reaches the target accuracy of 92.93% with 38 MB cumulative uplink communication, compared with 75 MB for 8-bit quantization, 86 MB for Top-$k$ sparsification, and 310 MB for uncompressed FedAvg. These results demonstrate the feasibility of applying GD to communication-efficient federated learning.
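
Generalized deduplication can be illustrated with a bit-split sketch: each chunk is divided into a base, which is deduplicated, and a small deviation, which is stored as-is. How MUFFLe maps model updates to chunks is not described in the abstract, so treating each chunk as a small non-negative integer and taking the high bits as the base are assumptions, as are the names `gd_compress` and `gd_decompress`.

```python
def gd_compress(chunks, dev_bits):
    """Split each chunk into base (high bits) and deviation (low `dev_bits` bits).

    Bases are stored once in a dictionary; each chunk keeps a base id and
    its own deviation.
    """
    mask = (1 << dev_bits) - 1
    bases, index = [], {}
    ids, devs = [], []
    for c in chunks:
        base, dev = c >> dev_bits, c & mask
        if base not in index:          # first time this base pattern is seen
            index[base] = len(bases)
            bases.append(base)
        ids.append(index[base])
        devs.append(dev)
    return bases, ids, devs


def gd_decompress(bases, ids, devs, dev_bits):
    """Losslessly rebuild the chunks from base ids and deviations."""
    return [(bases[i] << dev_bits) | d for i, d in zip(ids, devs)]


chunks = [0b10110001, 0b10110011, 0b01000010, 0b10110000]
bases, ids, devs = gd_compress(chunks, dev_bits=2)
restored = gd_decompress(bases, ids, devs, dev_bits=2)
```

Chunks that differ only in their low bits share one stored base, so near-duplicates compress as well as exact duplicates. This is the property that distinguishes generalized deduplication from classic deduplication.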

2606.14416 2026-06-15 cs.LG stat.ML 新提交

Federated Learning for Feature Generalization with Convex Constraints

基于凸约束的联邦学习特征泛化

Dongwon Kim, Donghee Kim, Sung Kuk Shyn, Kwangsu Kim

AI总结 针对联邦学习中客户端数据异构导致的泛化问题,提出FedCONST方法,利用线性凸约束自适应调整更新幅度,平衡参数学习,并通过梯度信噪比分析验证其有效性,实现跨异构环境的强泛化。

Comments Accepted at the 42nd International Conference on Machine Learning (ICML 2025)

详情
AI中文摘要

联邦学习(FL)常因客户端数据异构而难以泛化。局部模型容易过拟合其局部数据分布,甚至可迁移特征在聚合过程中也可能被扭曲。为应对这些挑战,我们提出FedCONST,一种基于全局模型参数强度自适应调整更新幅度的方法。这可以防止过度强调已学好的参数,同时加强未充分发展的参数。具体而言,FedCONST采用线性凸约束来确保训练稳定性,并在聚合过程中保留局部学到的泛化能力。梯度信噪比(GSNR)分析进一步验证了FedCONST在增强特征可迁移性和鲁棒性方面的有效性。因此,FedCONST有效对齐了局部和全局目标,减轻了过拟合,促进了跨不同FL环境的更强泛化,达到了最先进的性能。

英文摘要

Federated learning (FL) often struggles with generalization due to heterogeneous client data. Local models are prone to overfitting their local data distributions, and even transferable features can be distorted during aggregation. To address these challenges, we propose FedCONST, an approach that adaptively modulates update magnitudes based on the parameter strength of the global model. This prevents over-emphasizing well-learned parameters while reinforcing underdeveloped ones. Specifically, FedCONST employs linear convex constraints to ensure training stability and preserve locally learned generalization capabilities during aggregation. A Gradient Signal to Noise Ratio (GSNR) analysis further validates the effectiveness of FedCONST in enhancing feature transferability and robustness. As a result, FedCONST effectively aligns local and global objectives, mitigating overfitting and promoting stronger generalization across diverse FL environments, achieving state-of-the-art performance.

2606.14518 2026-06-15 cs.LG 新提交

Behavioral Audit of Machine Unlearning Has a Privacy Cost

机器遗忘的行为审计具有隐私代价

Liou Tang, James Joshi, Ashish Kundu

发表机构 * University of Pittsburgh(匹兹堡大学) Cisco(思科)

AI总结 本文证明,在互不信任的模型所有者和审计者场景下,仅依赖模型行为查询的审计方案无法在不泄露保留集成员信息的情况下识别未充分遗忘的模型,揭示了隐私与审计之间的固有权衡。

详情
AI中文摘要

通过机器遗忘从机器学习模型中移除已学习数据已被广泛研究;然而,目前尚未有公认的审计方案。现有工作表明,不诚实的模型所有者可以伪造证据来避免执行遗忘,而好奇的审计者(及对手)即使在有限访问权限下也能推断模型及其训练数据的隐私敏感属性。然而,在模型所有者和审计者互不信任的情况下对机器遗忘的审计仍未得到探索。我们为此场景提供了信息论证明:对于凸机器学习模型,仅依赖查询模型获取行为信号的通用审计方案无法在不泄露保留集成员信息的情况下识别未充分遗忘的模型。因此,在不诚实的模型所有者和诚实但好奇的审计者假设下审计机器遗忘面临固有的隐私-审计权衡。我们在凸模型上的实证结果强烈支持这一结论,而进一步实验表明这种隐私-审计张力在非凸模型中依然存在。我们的结果呼吁在更现实的审计者威胁模型下更仔细地考虑隐私-审计张力,并为机器遗忘流程中隐私保护审计方案的设计提供更严格的审查基础。我们还在 https://github.com/LiouTang/Behavioral-Unlearn-Audit 发布了代码实现。

英文摘要

The removal of learned data from Machine Learning models through Machine Unlearning (MU) has been widely studied; however, there has yet to be an agreed-upon scheme for auditing MU. Existing work has shown that a dishonest model owner can falsify evidence to avoid executing MU, while curious auditors (and adversaries) can infer the privacy-sensitive properties of the model and its training data even with limited access. Yet auditing of MU under mutual distrust between the model owner and the auditor remains unexplored. We provide an information-theoretic proof for this scenario: for convex ML models, a generic audit scheme that relies solely on querying the model for behavioral signals cannot identify insufficiently unlearned models without revealing membership information of the retained set. Therefore, auditing MU under the assumption of a dishonest model owner and an honest-but-curious auditor faces an inherent privacy-audit tradeoff. Our empirical results on convex models strongly support this result, while further experiments demonstrate that this privacy-audit tension persists in non-convex models. Our results call for a more careful consideration of the privacy-audit tension under a realistic auditor threat model, and serve as a foundation for more scrutiny of designs of privacy-preserving audit schemes for the MU pipeline. We also release our code implementation at https://github.com/LiouTang/Behavioral-Unlearn-Audit.

2601.14033 2026-06-15 cs.LG cs.CR 版本更新

Private Prediction via PAC Privacy

通过PAC隐私实现私有预测

Xiaochen Zhu, Mayuri Sridhar, Srinivas Devadas

发表机构 * Massachusetts Institute of Technology(麻省理工学院)

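AI summary For private prediction in API settings, proposes instance-based noise calibration built on PAC privacy, under which mutual information accumulates only linearly across adaptive queries; reaches high accuracy on CIFAR-10 with a tiny per-query budget and supports releasing a freely queryable model via distillation.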

详情
AI中文摘要

机器学习模型越来越多地通过API提供服务。这使得私有预测(即私有化模型输出而非其参数)成为一个自然的隐私目标:模型输出维度更低,且对训练数据变化的稳定性远高于权重。虽然差分隐私(DP)无法有效利用这一点,因为它将噪声校准到最坏情况下的敏感度,而对于非凸模型,这种敏感度难以界定,但我们认为PAC隐私是私有预测的自然选择。它是基于实例的,并将噪声校准到黑盒函数的经验稳定性,以控制互信息(MI)泄露。缺失的部分是高效的自适应组合。提供预测意味着回答来自不可信用户的一系列自适应选择的查询;现有的组合要么在自适应下失效,要么呈二次增长,要么退化为与输入无关的类似DP的噪声。我们通过自适应噪声校准填补了这一空白,提出了新的对抗组合结果,并证明了在自适应和对抗性查询下,MI仅线性累积。跨模态的实验表明,预测稳定性使得即使在极小的每查询预算下也能实现高实用性:在CIFAR-10上,我们以每查询MI预算$2^{-32}$实现了87.79%的准确率。这使得在提供100万次查询的同时,能够将成员推理成功率证明性地限制在51.08%——与$(0.04, 10^{-5})$-DP相同的保证。此外,在辅助公开数据存在的情况下,大量的PAC私有预测使我们能够蒸馏出一个可发布的模型,该模型可以无限制地查询。具体来说,在ImageNet子集上的21万个私有标签蒸馏出一个学生模型,在CIFAR-10上达到91.86%的准确率,成员推理成功率限制在50.49%,与$(0.02, 10^{-5})$-DP相当。

英文摘要

Machine learning models are increasingly served behind APIs. This renders private prediction, i.e., privatizing a model's outputs rather than its parameters, a natural privacy target: model outputs are lower-dimensional and far more stable to training-data changes than weights. While differential privacy (DP) cannot effectively exploit this as it calibrates noise to worst-case sensitivity that is intractable to bound for non-convex models, we argue that PAC privacy is a natural fit for private prediction. It is instance-based, and calibrates noise to a black-box function's empirical stability to control mutual-information (MI) leakage. The missing ingredient is efficient, adaptive composition. Serving predictions means answering a long stream of adaptively chosen queries from untrusted users; existing composition either fails under adaptivity, grows quadratically, or reverts to input-independent, DP-like noise. We close this gap with a new adversarial composition result via adaptive noise calibration and prove that MI accumulates only linearly under adaptive and adversarial querying. Experiments across modalities show that prediction stability enables high utility even at a tiny per-query budget: on CIFAR-10, we achieve 87.79% accuracy with a per-query MI budget of $2^{-32}$. This enables serving one million queries while provably bounding membership-inference success to 51.08% -- the same guarantee as $(0.04, 10^{-5})$-DP. Further, in the presence of auxiliary public data, the large volume of PAC-private predictions enables us to distill a publishable model that can be queried without limit. Concretely, 210,000 private labels on an ImageNet subset distill into a student reaching 91.86% accuracy on CIFAR-10 with membership inference success bounded by 50.49%, comparable to $(0.02, 10^{-5})$-DP.
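The budget arithmetic in the abstract can be checked with a short sketch. It assumes only the linear composition the abstract states, namely that the total MI leakage is at most the number of queries times the per-query budget. The function name is hypothetical, and the conversion from MI to a membership-inference success bound is not given in the abstract, so it is not reproduced here.

```python
# Illustrative arithmetic only. Assumes linear composition as stated in the
# abstract: total MI leakage <= (number of queries) * (per-query MI budget).

PER_QUERY_MI_BUDGET = 2.0 ** -32  # per-query budget quoted in the abstract
NUM_QUERIES = 1_000_000           # one million served queries


def total_mi_budget(num_queries: int, per_query_budget: float) -> float:
    """Upper bound on accumulated MI under linear composition (hypothetical helper)."""
    return num_queries * per_query_budget


total = total_mi_budget(NUM_QUERIES, PER_QUERY_MI_BUDGET)
print(f"accumulated MI bound: {total:.4e}")
```

Under this assumption the accumulated bound after one million queries is about 2.33e-4, in whatever unit the per-query budget is expressed.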

2602.23638 2026-06-15 cs.LG cs.AI 版本更新

FedRot-LoRA: Mitigating Rotational Misalignment in Federated LoRA

FedRot-LoRA: 缓解联邦LoRA中的旋转偏移

Haoran Zhang, Dongjun Kim, Seohyeon Cha, Haris Vikalo

发表机构 * University of California, Berkeley(加州大学伯克利分校)

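AI summary Proposes FedRot-LoRA, a framework that aligns client updates via orthogonal transformations to reduce subspace mismatch, improving federated LoRA under heterogeneous data.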

Comments ICML 2026

详情
AI中文摘要

联邦LoRA提供了一种通信高效的机制,用于在去中心化数据上微调大语言模型。然而在实践中,为保持低秩而采用的逐因子平均与数学上正确的本地更新聚合之间存在不一致,这会导致显著的聚合误差和不稳定的训练。本文认为,该问题的一个主要来源是旋转偏移,它源于低秩分解的旋转不变性:由于 $(B_i R_i)(R_i^\top A_i) = B_i A_i$,语义等价的更新在不同客户端上可以表示在不同的潜在子空间中。当这些未对齐的因子被直接平均时,会产生破坏性干扰,降低全局更新质量。为此,本文提出FedRot-LoRA框架,在聚合前通过正交变换对齐客户端更新,从而在不增加通信成本或限制模型表达能力的情况下,保持语义更新并减少跨客户端子空间不匹配。本文提供了收敛性分析,研究了逐因子平均引起的聚合误差,并展示了旋转对齐如何给出更紧的误差上界。在自然语言理解和生成任务上的广泛实验表明,FedRot-LoRA在各种异质性水平和LoRA秩下均稳定优于现有联邦LoRA基线。

英文摘要

Federated LoRA provides a communication-efficient mechanism for fine-tuning large language models on decentralized data. In practice, however, a discrepancy between the factor-wise averaging used to preserve low rank and the mathematically correct aggregation of local updates can cause significant aggregation error and unstable training. We argue that a major source of this problem is rotational misalignment, arising from the rotational invariance of low-rank factorizations -- semantically equivalent updates can be represented in different latent subspaces across clients since $(B_i R_i)(R_i^\top A_i) = B_i A_i$. When such misaligned factors are averaged directly, they interfere destructively and degrade the global update. To address this issue, we propose FedRot-LoRA, a federated LoRA framework that aligns client updates via orthogonal transformations prior to aggregation. This alignment preserves the semantic update while reducing cross-client subspace mismatch, without increasing communication cost or restricting model expressivity. We provide a convergence analysis that examines the aggregation error induced by factor-wise averaging and shows how rotational alignment yields a tighter upper bound on this error. Extensive experiments on natural language understanding and generative tasks demonstrate that FedRot-LoRA consistently outperforms existing federated LoRA baselines across a range of heterogeneity levels and LoRA ranks.
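The rotational misalignment described above can be seen in the smallest possible case. The sketch below uses rank-1 factors, where the only non-trivial orthogonal matrix is R = [-1]; all names are hypothetical, and the sign-based alignment is a stand-in chosen for illustration. The abstract does not say how FedRot-LoRA selects its orthogonal transformations for general ranks.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def outer(b: Vector, a: Vector) -> Matrix:
    """Rank-1 update B A, with B a column vector and A a row vector."""
    return [[bi * aj for aj in a] for bi in b]


def mean_vec(vectors: List[Vector]) -> Vector:
    """Element-wise mean, i.e. factor-wise averaging across clients."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def dot(u: Vector, v: Vector) -> float:
    return sum(x * y for x, y in zip(u, v))


def align_to_reference(b: Vector, a: Vector, b_ref: Vector) -> Tuple[Vector, Vector]:
    """Pick R in {+1, -1} that best aligns b with b_ref and apply it to both
    factors, so the product (b R)(R a) is unchanged."""
    r = 1.0 if dot(b, b_ref) >= 0 else -1.0
    return [r * x for x in b], [r * x for x in a]


# Two clients hold the same update, stored with opposite signs (R = [-1]).
b1, a1 = [1.0, 2.0], [3.0, -1.0]
b2, a2 = [-x for x in b1], [-x for x in a1]
same_update = outer(b1, a1) == outer(b2, a2)

# Naive factor-wise averaging: the factors cancel and the update vanishes.
naive = outer(mean_vec([b1, b2]), mean_vec([a1, a2]))

# Align client 2 to client 1 first, then average factor-wise.
b2_al, a2_al = align_to_reference(b2, a2, b1)
aligned = outer(mean_vec([b1, b2_al]), mean_vec([a1, a2_al]))
```

Here both clients agree on the update, yet naive factor averaging returns the zero matrix, while averaging after alignment recovers the shared update.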

2604.04611 2026-06-15 cs.LG cs.CR 版本更新

Dynamic Free-Rider Detection in Federated Learning via Simulated Attack Patterns

联邦学习中基于模拟攻击模式的动态搭便车者检测

Motoki Nakamura

发表机构 * Fujitsu Limited(富士通株式会社)

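AI summary To address the difficulty of detecting dynamic free-riders in federated learning, proposes S2-WEF, which simulates the weight evolving frequency (WEF) patterns of global-model-based attacks and combines the resulting similarity score with a deviation score for two-dimensional clustering, giving robust detection without a proxy dataset or pre-training.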

Comments 23 pages, 1 figure, 8 tables

详情
AI中文摘要

联邦学习(FL)使多个客户端能够通过聚合本地更新来协作训练全局模型,而无需共享私有数据。然而,FL经常面临搭便车者的挑战,即提交虚假模型参数而不执行实际训练以获取全局模型的客户端。Chen等人提出了一种基于模型参数权重演化频率(WEF)的搭便车者检测方法。该检测方法是实际搭便车者检测方法的主要候选,因为它既不需要代理数据集也不需要预训练。然而,它难以检测“动态”搭便车者,这些客户端在早期轮次中行为诚实,后来转而搭便车,特别是在全局模型模仿攻击(如delta权重攻击和我们新提出的自适应WEF伪装攻击)下。在本文中,我们提出了一种新颖的检测方法S2-WEF,该方法在服务器端使用先前广播的全局模型模拟潜在基于全局模型的攻击的WEF模式,并识别提交的WEF模式与模拟模式相似的客户端。为了处理各种搭便车攻击策略,S2-WEF进一步将该基于模拟的相似度分数与通过提交的WEF之间的相互比较计算的偏差分数相结合,并通过二维聚类和每分数分类来区分良性客户端和搭便车客户端。该方法能够在无需代理数据集或预训练的情况下,动态检测训练过程中转变为搭便车者的客户端。我们在三个数据集和五种攻击类型上进行了大量实验,证明S2-WEF比现有方法具有更高的鲁棒性。

英文摘要

Federated learning (FL) enables multiple clients to collaboratively train a global model by aggregating local updates without sharing private data. However, FL often faces the challenge of free-riders, clients who submit fake model parameters without performing actual training to obtain the global model without contributing. Chen et al. proposed a free-rider detection method based on the weight evolving frequency (WEF) of model parameters. This detection approach is a leading candidate for practical free-rider detection methods, as it requires neither a proxy dataset nor pre-training. Nevertheless, it struggles to detect "dynamic" free-riders who behave honestly in early rounds and later switch to free-riding, particularly under global-model-mimicking attacks such as the delta weight attack and our newly proposed adaptive WEF-camouflage attack. In this paper, we propose a novel detection method S2-WEF that simulates the WEF patterns of potential global-model-based attacks on the server side using previously broadcasted global models, and identifies clients whose submitted WEF patterns resemble the simulated ones. To handle a variety of free-rider attack strategies, S2-WEF further combines this simulation-based similarity score with a deviation score computed from mutual comparisons among submitted WEFs, and separates benign and free-rider clients by two-dimensional clustering and per-score classification. This method enables dynamic detection of clients that transition into free-riders during training without proxy datasets or pre-training. We conduct extensive experiments across three datasets and five attack types, demonstrating that S2-WEF achieves higher robustness than existing approaches.

2606.00947 2026-06-15 cs.LG cs.AI 版本更新

Silent Failures in Federated Personalization of Foundation Models

联邦基础模型个性化中的静默失败

YongKyung Oh, Alex Bui

发表机构 * Medical & Imaging Informatics (MII) Group, University of California, Los Angeles (UCLA)(医学与影像信息学(MII)组,加州大学洛杉矶分校(UCLA))

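AI summary Identifies silent failures, a class of trustworthiness failures in federated personalization of foundation models that privacy constraints make hard to detect, including amplified bias, fairness collapse, and alignment erosion; introduces a taxonomy of six silent failure modes and argues that privacy-preserving training alone is insufficient for trustworthy deployment.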

详情
AI中文摘要

基础模型通过联邦学习在分散的私有数据上越来越个性化,并在日益增长的上市后监管要求下大规模部署。我们认为这种趋同产生了一类独特且未被充分认识的信任失败,我们称之为“静默失败”。这些包括偏差放大、公平性崩溃和对齐侵蚀,这些可能仍然难以检测,因为联邦学习的隐私约束限制了对模型行为的可见性。对现有基准的景观分析揭示了结构性鸿沟。联邦基准评估系统性能,但对模型行为的洞察有限,而集中式信任基准评估行为,但需要与联邦隐私不兼容的模型访问。我们引入了一个由基础模型个性化、数据集偏移和核心联邦约束相互作用产生的六种静默失败模式的分类法。我们的分析表明,仅靠隐私保护训练不足以实现可信部署。最后,我们提出了一个隐私保护行为评估的研究议程,并建议将静默失败作为可信联邦人工智能的标准诊断类别。

英文摘要

Foundation models are increasingly personalized on decentralized private data through federated learning and are now deployed at scale under growing regulatory requirements for post-market monitoring. We argue that this convergence creates a distinct and under-recognized class of trustworthiness failures, which we term "Silent Failures." These include amplified bias, fairness collapse, and alignment erosion that may remain difficult to detect because federated learning's privacy constraints limit visibility into model behavior. A landscape analysis of existing benchmarks reveals a structural divide. Federated benchmarks evaluate system performance but provide limited insight into model behavior, whereas centralized trustworthiness benchmarks assess behavior but require model access incompatible with federated privacy. We introduce a taxonomy of six silent failure modes arising from the interaction of foundation model personalization, dataset shift, and core federated constraints. Our analysis shows that privacy-preserving training alone is insufficient for trustworthy deployment. We conclude with a research agenda for privacy-preserving behavioral evaluation and propose that silent failures become a standard diagnostic category for trustworthy federated artificial intelligence.

2606.12733 2026-06-15 cs.LG 版本更新

Let's Ask Gauss: Improved One-Run Privacy Auditing

让我们问高斯:改进的单次运行隐私审计

Adya Agrawal, Yu Wei, Jaspal Singh, Malik Magdon-Ismail, Vassilis Zikas

发表机构 * Georgia Institute of Technology(佐治亚理工学院) Rensselaer Polytechnic Institute(伦斯勒理工学院) Purdue University(普渡大学)

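AI summary Proposes a differential privacy auditing framework based on an asymptotically Gaussian distribution: it uses the normalized sum of canary-aligned signals in white-box DP-SGD to obtain tighter privacy lower bounds from a single training run.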

详情
AI中文摘要

隐私审计通过估计模型实际泄露的信息提供重要保障,从而确保理论隐私保证在实践中成立。我们研究差分隐私(DP)机器学习的经验隐私审计,重点关注针对DP-SGD等机制的高效单次运行方法。先前的单次运行方法将训练示例或“金丝雀”阈值化为二元成员猜测,这丢弃了有用信息。我们证明,在白盒DP-SGD设置中,金丝雀对齐信号自然形成一系列随机变量,其归一化和渐近服从高斯分布。利用这种分布视角,我们开发了一个DP审计框架,从单次训练运行中获得更紧的隐私下界。

英文摘要

Privacy auditing provides an important safeguard by estimating the actual information leaked by a model, thus ensuring that theoretical privacy guarantees hold in practice. We study empirical privacy auditing for differentially private (DP) machine learning, focusing on efficient one-run methods for mechanisms such as DP-SGD. Prior one-run approaches threshold training examples or "canaries" into binary membership guesses, which discards useful information. We show that, in the white-box DP-SGD setting, canary-aligned signals naturally form a sequence of random variables whose normalized sum is asymptotically Gaussian. Leveraging this distributional perspective, we develop a DP-auditing framework that leads to tighter privacy lower bounds from a single training run.

2602.02355 2026-06-15 cs.DC cs.IT cs.LG math.IT 版本更新

Mitigating Heterogeneity-Induced Drift in Hierarchical Sign-Based Federated Learning

缓解层次化基于符号的联邦学习中的异质性引起的漂移

Amirreza Kazemi, Seyed Mohammad Azimi-Abarghouyi, Gabor Fodor, Carlo Fischione

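AI summary To counter model drift caused by inter-cluster data heterogeneity in hierarchical federated learning, proposes DC-HierSignSGD, which applies a cloud-assisted gradient correction to mitigate the resulting bias, improving accuracy while keeping device-edge communication binary.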

详情
AI中文摘要

层次化联邦学习(HFL)非常适合大规模无线和物联网系统,其中设备在到达云之前与附近的边缘服务器通信。在这些环境中,上行链路带宽和延迟施加了严格的通信约束,使得激进的梯度压缩成为必要。基于一位符号的随机梯度下降方法在平坦联邦设置中提供了有吸引力的解决方案,但其在层次化边缘-云架构中的行为仍未得到充分理解,尤其是在簇间数据异质性下。为填补这一空白,我们开发了一个基于符号的HFL框架,其中设备向边缘服务器传输二进制随机梯度符号,边缘服务器应用多数投票,云定期聚合边缘模型。我们的分析表明,簇间异质性在收敛界中引入了一个持续偏差项,反映了边缘模型向局部目标的漂移。这一项无法通过增加训练轮数或单独调整标准超参数来消除。因此,我们提出了\(\mathtt{DC\text{-}HierSignSGD}\),一种漂移校正的基于符号的HFL算法,其中设备在取符号之前应用云辅助梯度校正。我们表明,这种预符号校正减轻了非消失的异质性引起的偏差,同时在重复的局部符号更新步骤中保留了设备-边缘的二进制通信。在严重簇间异质性下的实验表明,\(\mathtt{DC\text{-}HierSignSGD}\)提高了基于符号的HFL的稳定性和准确性,并在设备-边缘通信大幅降低的情况下实现了与全精度层次化SGD相当的性能。

英文摘要

Hierarchical federated learning (HFL) is well suited for large-scale wireless and Internet of Things systems, where devices communicate with nearby edge servers before reaching the cloud. In these environments, uplink bandwidth and latency impose strict communication constraints, making aggressive gradient compression essential. One-bit sign-based stochastic gradient descent methods provide an attractive solution in flat federated settings, but their behavior in hierarchical edge--cloud architectures remains insufficiently understood, especially under inter-cluster data heterogeneity. To address this gap, we develop a sign-based HFL framework in which devices transmit binary stochastic-gradient signs to edge servers, edge servers apply majority voting, and the cloud periodically aggregates edge models. Our analysis reveals that inter-cluster heterogeneity induces a persistent bias term in the convergence bound, reflecting the drift of edge models toward local objectives. This term cannot be removed by increasing the number of training rounds or by tuning standard hyperparameters alone. We therefore propose \(\mathtt{DC\text{-}HierSignSGD}\), a drift-corrected sign-based HFL algorithm in which devices apply a cloud-assisted gradient correction before taking the sign. We show that this pre-sign correction mitigates the non-vanishing heterogeneity-induced bias while preserving binary device--edge communication during the repeated local sign-update steps. Experiments under severe inter-cluster heterogeneity demonstrate that \(\mathtt{DC\text{-}HierSignSGD}\) improves the stability and accuracy of sign-based HFL and achieves performance comparable to full-precision hierarchical SGD with substantially lower device--edge communication.
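The uncorrected sign-based pipeline described above (devices send gradient signs, edge servers take a majority vote, the cloud averages edge models) can be sketched as follows. All function names are hypothetical, ties in the vote are mapped to +1 as an arbitrary choice, and the cloud-assisted correction of DC-HierSignSGD is deliberately left out because the abstract does not specify its form.

```python
from typing import List

Vector = List[float]


def sign(x: float) -> int:
    """Return +1 or -1; zero maps to +1 so every coordinate is one bit."""
    return 1 if x >= 0 else -1


def device_message(gradient: Vector) -> List[int]:
    """One-bit-per-coordinate message a device sends to its edge server."""
    return [sign(g) for g in gradient]


def edge_majority_vote(messages: List[List[int]]) -> List[int]:
    """Coordinate-wise majority vote over the device sign messages."""
    return [sign(sum(col)) for col in zip(*messages)]


def edge_step(model: Vector, messages: List[List[int]], lr: float) -> Vector:
    """One sign-based update of the edge model using the voted direction."""
    vote = edge_majority_vote(messages)
    return [w - lr * v for w, v in zip(model, vote)]


def cloud_aggregate(edge_models: List[Vector]) -> Vector:
    """Periodic cloud aggregation: plain average of the edge models."""
    return [sum(col) / len(edge_models) for col in zip(*edge_models)]


# Toy example: three devices under one edge server, two parameters.
gradients = [[0.3, -1.2], [0.1, 0.4], [-2.0, -0.5]]
messages = [device_message(g) for g in gradients]
vote = edge_majority_vote(messages)
new_edge_model = edge_step([0.0, 0.0], messages, lr=0.1)
cloud_model = cloud_aggregate([new_edge_model, [0.1, 0.3]])
```

In this baseline each edge server follows the majority direction of its own cluster, which is where the drift toward local objectives discussed in the abstract would enter.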

8. Robustness, Uncertainty, and Trustworthy Learning (17 papers)

2606.13801 2026-06-15 cs.LG q-bio.NC 新提交

Neural Variability Enhances Artificial Network Robustness

神经变异性增强人工网络鲁棒性

Robin Preble, Praveen Venkatesh, Stefan Mihalas, Kameron Decker Harris

发表机构 * Department of Computer Science, Western Washington University(西华盛顿大学计算机科学系) Allen Institute(艾伦研究所)

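AI summary Studies whether structured noise that mimics cortical neural variability improves the robustness of artificial neural networks to adversarial attacks and naturalistic image modifications; finds that noise structure can significantly improve robustness and that structure derived from adversarial attacks generalizes to other attack types.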

详情
AI中文摘要

皮层中的神经反应在重复刺激下表现出显著的试验间变异性,而外周感觉神经元的反应则更为一致,这使许多人怀疑随机性是否具有意义。已有研究认为,噪声和信号相关性可能被优化用于动物的辨别,而人工神经网络(ANN)研究也显示了噪声在机器学习任务中的类似益处,尽管大多数ANN研究忽略了相关性的影响。在这里,我们研究相关噪声是否能提高人工神经网络对对抗攻击和自然图像修改的鲁棒性。利用修改输入与干净输入下激活的协方差,我们发现结构化噪声可以显著提高网络鲁棒性。对自然图像修改的鲁棒性最受益于结构,但这种结构在修改类型之间迁移性差。相比之下,来自对抗攻击的噪声结构可以泛化到其他类型的攻击。这些结果表明,ANN激活中的结构化噪声通常能提高鲁棒性,建立了一种仅依赖局部信息的生物合理策略来创建鲁棒的人工神经网络。

英文摘要

Neural responses in cortex exhibit substantial trial-to-trial variability in response to repeated stimuli, while peripheral sensory neurons respond far more consistently, leading many to wonder whether stochasticity may carry meaning. Existing work has argued that noise and signal correlations may be optimized for discrimination in animals, whereas artificial neural network (ANN) studies have shown similar benefits of noise in machine learning tasks, although most ANN work has neglected the effects of correlations. Here we investigate whether correlated noise improves the robustness of artificial neural networks to adversarial attacks and naturalistic image modifications. Using the covariance of activations under modified versus clean inputs, we find that structured noise may significantly improve network robustness. Robustness to naturalistic image modifications benefits most from structure, but this structure transfers poorly across modification types. In contrast, noise structure from adversarial attacks can generalize to other kinds of attacks. These results suggest that structured noise in ANN activations generally improves robustness, establishing a biologically plausible strategy for creating robust artificial neural networks that only relies on local information.

2606.14060 2026-06-15 cs.LG cs.CL 新提交

Non-Parametric Machine Text Detection via Multi-View Gaussian Processes

非参数化机器文本检测:基于多视角高斯过程

Aleem Khan, Nicholas Andrews

发表机构 * Johns Hopkins University(约翰霍普金斯大学)

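AI summary Proposes a multi-view, non-parametric detection framework that aggregates complementary feature views through a Gaussian process ensemble, improving robustness to adversarial attacks and providing calibrated probabilities and principled abstention on out-of-distribution inputs.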

详情
AI中文摘要

对抗条件(如释义和定向风格迁移)会急剧降低机器文本检测器的准确性。然而,文档携带多种互补信号(例如,风格特征、似然和排序特征、结构特征),抑制其中一种的攻击可能使其他信号保持完整。虽然参数化分类器可以在充分监督下学习组合这些特征,但当分布发生变化时(例如,新型攻击或未见过的语言模型),分类器容易做出过度自信的错误预测。为了解决这个问题,我们提出了一种多视角、非参数化的检测框架,该框架从同一文档中提取互补的特征视图,并通过高斯过程集成聚合每个视图的证据。通过跨视图聚合证据,对手必须同时击败多个独立的检测轴,从而大幅提高逃避成本。高斯过程公式还提供了校准概率和对分布外输入的原则性弃权,支持在高风险场景中的可靠部署。我们在三个涵盖不同生成器和攻击的基准测试(DetectRL和RAID基准测试,以及PAN2025共享任务)上进行了评估,结果表明,我们的多视角检测器在考虑的攻击下保持强性能,在针对未见攻击时优于现有方法。

英文摘要

Adversarial conditions such as paraphrasing and targeted style transfer sharply degrade the accuracy of machine text detectors. A document, however, carries multiple complementary signals (e.g., stylistic features, likelihood and rank-order features, and structural features), and an attack that suppresses one may leave others intact. While a parametric classifier can learn to combine these features given sufficient supervision, classifiers are prone to making confidently incorrect predictions when the distribution shifts (e.g., novel attacks or unseen language models). To address this, we propose a multi-view, non-parametric detection framework that extracts complementary feature views from the same document and aggregates per-view evidence through a Gaussian process ensemble. By aggregating evidence across views, an adversary must simultaneously defeat multiple independent axes of detection, substantially raising the cost of evasion. The Gaussian process formulation additionally provides calibrated probabilities and principled abstention on out-of-distribution inputs, supporting reliable deployment in high-stakes settings. We evaluate on three benchmarks spanning diverse generators and attacks (the DetectRL and RAID benchmarks and the PAN2025 shared task) and demonstrate that our multi-view detector maintains strong performance under the considered attacks, outperforming existing approaches against held-out attacks.

2606.13780 2026-06-15 hep-ph cs.LG hep-ex stat.ML 交叉投稿

Conformal calibration and look-elsewhere effect in anomaly detection for new-physics searches

新物理搜索中异常检测的共形校准与look-elsewhere效应

Jack Y. Araz, Michael Spannowsky

发表机构 * Department of Physics and Astronomy, University College London(伦敦大学学院物理与天文系) Department of Engineering, City St. George’s, University of London(伦敦大学城市圣乔治学院工程系) Institute for Theoretical Physics, Campus Süd, Karlsruhe Institute of Technology (KIT)(卡尔斯鲁厄理工学院(KIT)理论物理研究所) Institute for Quantum Materials and Technologies, Karlsruhe Institute of Technology(卡尔斯鲁厄理工学院量子材料与技术研究所)

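AI summary Proposes a calibration layer built on conformal prediction that turns any anomaly score into a significance with distribution-free, finite-sample guarantees, while correcting for background mismodelling and the look-elsewhere effect.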

Comments 22 pages, 15 figures, 3 tables. Comments welcome

详情
AI中文摘要

机器学习驱动的异常检测正在重塑新物理搜索,但用于解释其结果的统计方法已经落后。原始异常分数缺乏校准意义,扫描多个区域的模型会放大look-elsewhere效应,而该领域所依赖的渐近显著性对异常检测器特别容易遭受的背景误建模视而不见。我们提出一个基于共形预测的校准层,能将任意异常分数转化为具有分布无关、有限样本保证的可辩护显著性。共形预测将分数转化为有效的局部p值,加权和Mondrian变体修复了共振搜索中边带到信号区域的可交换性失败,而Gross-Vitells步骤将结果转化为考虑look-elsewhere的全局显著性。该层同时做两件事:它暴露了标准流程无法发现的校准错误,并在不重新训练检测器的情况下进行修正。在公开的LHC Olympics数据上,一个分类器产生了子结构-质量相关性,使得边带校准的背景p值变得反保守。若按表面值解读,这仅由背景塑造就制造了约$46\sigma$的过剩,而无标签加权修正消除了这一过剩,恢复了诚实的零假设。当作为盲法宽质量凸起搜索运行时,标准渐近和未加权程序即使在无信号窗口也会制造$\gtrsim 10\sigma$和约$5\sigma$的过剩,而共形层没有产生任何误报,其全局误报率在仅背景伪实验中得到验证。结果是一条可审计、与检测器无关的路径,从未校准分数到考虑试验因子的显著性,可集成到实验异常搜索中。

英文摘要

Machine-learned anomaly detection is reshaping searches for new physics, but it has outrun the statistics used to interpret it. A raw anomaly score has no calibrated meaning, a model that scans many regions inflates the look-elsewhere effect, and the asymptotic significances the field relies on are blind to the background mismodelling that anomaly detectors are especially prone to. We propose a calibration layer, built on conformal prediction, that turns any anomaly score into a defensible significance with distribution-free, finite-sample guarantees. Conformal prediction converts scores into valid local p-values, weighted and Mondrian variants repair the sideband-to-signal-region exchangeability failures that resonant searches suffer, and a Gross-Vitells step carries the result through to a look-elsewhere-aware global significance. The layer does two things at once. It exposes miscalibration that the standard pipeline cannot see, and it corrects it without retraining the detector. On public LHC Olympics data, a classifier develops a substructure-mass correlation that makes sideband-calibrated background p-values anti-conservative. Taken at face value, this manufactures a $\sim 46\sigma$ excess from background sculpting alone, which the label-free weighted correction removes, restoring an honest null. When run as a blind wide-mass bump hunt, the standard asymptotic and unweighted procedures fabricate $\gtrsim 10\sigma$ excesses and $\approx 5\sigma$ excesses even in signal-free windows, while the conformal layer raises no false alarms and its global false-positive rate is verified on background-only pseudoexperiments. The result is an auditable, detector-agnostic path from an uncalibrated score to a trials-factor-aware significance, ready to be folded into experimental anomaly searches.
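The first step of the calibration layer, turning a raw anomaly score into a local p-value, can be sketched with the standard unweighted split-conformal formula. This is a minimal illustration under the assumption that calibration and test scores are exchangeable; the function names are hypothetical, and the weighted and Mondrian variants and the Gross-Vitells step mentioned in the abstract are not shown.

```python
from statistics import NormalDist
from typing import Sequence


def conformal_p_value(calibration_scores: Sequence[float], test_score: float) -> float:
    """Unweighted conformal p-value: the rank of the test score among
    background-only calibration scores, with the usual +1 correction."""
    n = len(calibration_scores)
    at_least_as_extreme = sum(1 for s in calibration_scores if s >= test_score)
    return (1 + at_least_as_extreme) / (n + 1)


def one_sided_significance(p_value: float) -> float:
    """Convert a p-value to a one-sided Gaussian significance Z."""
    if p_value >= 1.0:
        return float("-inf")
    return NormalDist().inv_cdf(1.0 - p_value)


# Toy background-only calibration scores (hypothetical numbers).
calibration = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
p_extreme = conformal_p_value(calibration, 0.95)  # beyond every calibration score
p_typical = conformal_p_value(calibration, 0.5)   # in the bulk of the background
z_extreme = one_sided_significance(p_extreme)
```

Because the p-value can never fall below 1/(n+1), the reachable significance is capped by the calibration sample size, which is one reason a finite-sample guarantee behaves differently from an asymptotic formula.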

2606.13870 2026-06-15 cs.CV cs.AI cs.LG 交叉投稿

Mirage Probes: How Vision Models Fake Visual Understanding

幻象探针:视觉模型如何伪造视觉理解

Daniel Ben-Levi, Judah Goldfeder, Weiliang Zhao, Raz Lapid, Amit LeVi, Allen G. Roush, Ravid Shwartz-Ziv, Hod Lipson

发表机构 * Columbia University(哥伦比亚大学) Intuit Technion(以色列理工学院) Thoughtworks New York University(纽约大学)

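AI summary Proposes Mirage Probes, a contrastive probing framework that exposes two kinds of mirage behavior in which vision-language models answer questions without an image: textual biases and spurious images; argues that the latter requires representation-level intervention.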

详情
AI中文摘要

视觉语言模型(VLM)即使在没有提供图像的情况下,也能自信且通常正确地回答基于图像的问题。这种幻象行为会虚增基准分数,而不反映视觉基础。先前的工作将其视为单一故障模式。我们认为这是两种。使用幻象探针(Mirage Probes),一种对比探针框架,将释义的问题变体与同一图像上的匹配幻象和非幻象标签配对,我们展示了在两个开源VLM中,幻象行为可以从残差流、MLP、后注意力和注意力头位置的内部激活中线性解码。我们证明朴素贝叶斯文本基线无法恢复此信号,排除了表面词汇混淆。跨基准可分离性模式,连同一种新颖的先验利用指数(PHI),衡量模型仅从文本中回答的程度,揭示了两种不同的机制:文本偏见,其中模型从语言先验中回答而不涉及视觉表征;以及虚假图像,其中模型在潜在空间中构建虚假视觉内容并像有基础一样回答。这种区别有直接的缓解后果:文本分布清理可以解决第一种机制,但无法触及第二种,因为虚假图像幻象存在于模型的视觉表征中而非文本中。忠实的视觉基础将需要在表征层面进行干预。

英文摘要

Vision-language models (VLMs) can answer image-based questions confidently, and often correctly, even when no image is provided. This mirage behavior inflates benchmark scores without reflecting visual grounding. Prior work treats this as a single failure mode. We argue it is two. Using Mirage Probes, a contrastive probing framework that pairs paraphrased question variants with matched mirage and non-mirage labels on the same image, we show that mirage behavior is linearly decodable from internal activations across residual stream, MLP, post-attention, and attention-head sites in two open-source VLMs. We demonstrate that a Naive Bayes text baseline cannot recover this signal, ruling out surface lexical confounds. Cross-benchmark separability patterns, together with a novel Prior Harnessing Index (PHI) measuring how much a model can answer from text alone, expose two distinct regimes: textual biases, where the model answers from language priors without engaging visual representations, and spurious images, where it constructs false visual content in latent space and answers as if grounded. The distinction has direct mitigation consequences: text-distribution cleaning can address the first regime but cannot reach the second, since spurious-image mirages live in the model's visual representations rather than its text. Faithful visual grounding will require interventions at the representational level.

2606.14200 2026-06-15 cs.AI cs.LG 交叉投稿

When Should Agent Trust Be Conditional? Characterizing and Attacking Skill-Conditional Reputation in Agent Swarms

何时应条件化智能体信任?表征与攻击智能体群中的技能条件声誉

Yihan Xia, Taotao Wang

发表机构 * Shenzhen University(深圳大学)

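AI summary Studies when skill-conditional trust pays off in swarms of heterogeneous LLM agents. A phase-diagram analysis shows it helps under high agent heterogeneity, sparse per-skill evidence, and correlated skills, but cross-skill evidence borrowing can be exploited by attackers, including on pools that the proposed Conditional Information Value Test (CIVT) rates GREEN.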

Comments 18 pages, 8 figures, 2 tables

详情
AI中文摘要

开放平台越来越多地将任务路由给异构的LLM智能体——它们在基础模型、框架和工具栈上有所不同——其能力因技能而异:一个智能体在某项技能上表现出色,在另一项技能上可能毫无用处。标准的声誉方法为每个智能体总结一个单一的全局信任分数,但这里的标量是错误的对象,因为将每个任务路由到全局最受信任的智能体会放弃专业化的价值。我们研究技能条件信任R(i | k)——对于需要技能k的任务,应赋予智能体i的信任,而不是每个智能体一个分数——并提出三个可证伪的问题:何时条件化是值得的,应借用多少跨技能证据,以及这种借用是否安全。受控的相图分析回答了前两个问题:条件信任仅在特定区域获胜——高智能体异质性、稀疏的每技能证据和相关的技能——而实现这种数据效率的耦合强度β是双刃剑,因为相同的跨技能借用也是一个洗钱渠道。在14个真正异构的AppWorld智能体的公共基准上,实际池落在有益区域内——一个微小但真实的增益,每技能最佳智能体在不同技能间确实发生变化。然后我们展示,一个在一种技能上有廉价证据而在目标技能上没有证据的攻击者劫持条件路由器,将路由遗憾从0驱动到0.94,而我们的零成本条件信息值测试(CIVT)将其评为绿色——而它污染的无门控信任判决读数为-0.06,而非诚实的+0.19。零证据门限限制了攻击但并未消除它;我们在明确预算下表征了剩余成本。我们不声称抗女巫攻击——我们量化了权衡。

英文摘要

Open platforms increasingly route tasks among heterogeneous LLM agents--differing in base model, scaffold, and tool stack--whose competence varies sharply by skill: an agent excellent at one skill may be useless at another. The standard reputation approach summarizes each agent by a single global trust score, but that scalar is the wrong object here, because routing every task to the globally most-trusted agent leaves the value of specialization unclaimed. We study skill-conditional trust R(i | k)--the trust to place in agent i for a task requiring skill k, rather than one score per agent--and pose three falsifiable questions: when is conditioning worth it, how much cross-skill evidence should be borrowed, and whether that borrowing is safe. A controlled phase-diagram analysis answers the first two: conditional trust wins only in a specific regime--high agent heterogeneity, sparse per-skill evidence, and correlated skills--and the coupling strength beta that buys this data efficiency is dual-use, because the same cross-skill borrowing is also a laundering channel. On a public benchmark of 14 genuinely heterogeneous AppWorld agents, real pools land inside the beneficial regime--a small but genuine gain, with the per-skill best agent genuinely changing across skills. We then show that an attacker with cheap evidence in one skill and none in a target skill hijacks the conditional router, driving routing regret from 0 to 0.94 on a pool our zero-cost Conditional Information Value Test (CIVT) rates GREEN--while the ungated trust verdict it contaminates reads -0.06 instead of the honest +0.19. A zero-evidence gate bounds the attack but does not eliminate it; we characterize the residual cost under an explicit budget. We do not claim Sybil-resistance--we quantify the trade-off.

2606.14466 2026-06-15 cs.SD cs.AI cs.LG 交叉投稿

The Perceived Fragility of Explanations in Audio Models: Manipulation of Attribution with Unchanged Predictions

音频模型中解释的感知脆弱性:在预测不变的情况下操纵归因

Piotr Kitłowski, Dominik Wiącek, Mateusz Modrzejewski

发表机构 * University of Warsaw(华沙大学)

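AI summary Proposes a psychoacoustic framework that optimizes inaudible perturbations to decouple model attributions from classifications, showing that explanation heatmaps in audio deepfake detection can be systematically distorted while the predicted label stays unchanged.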

Comments Accepted to the ICML 2026 Workshop on Machine Learning for Audio: 5 pages, 4 figures

详情
AI中文摘要

本文研究了事后解释方法在音频深度伪造检测中的脆弱性。先前关于解释操纵的工作主要关注图像并使用标准$L_p$度量,而我们引入了一个心理声学框架,该框架优化不可听扰动以将模型归因与最终分类解耦。我们在严格的预测保持约束下,评估了这种脆弱性在多种最先进架构上的表现。通过领域特定的感知音频质量指标和解释对齐标准来评估操纵成本,我们的框架证明,攻击者可以在保持预测的深度伪造标签不变的情况下,系统地扭曲自动生成的解释热图。完整代码见:https://github.com/cncPomper/Audio-XAI

英文摘要

This paper investigates the fragility of post-hoc explanation methods in audio deepfake detection. While previous work on explanation manipulation focused on images using standard $L_p$ metrics, we introduce a psychoacoustic framework that optimizes inaudible perturbations to decouple model attributions from final classifications. We evaluate this vulnerability across state-of-the-art architectures under strict prediction-preserving constraints. By evaluating the manipulation cost through domain-specific perceptual audio quality metrics alongside explanation alignment criteria, our framework demonstrates that an adversary can systematically distort automated explanation heatmaps while preserving the predicted deepfake label. Full code available at: https://github.com/cncPomper/Audio-XAI

2606.14476 2026-06-15 cs.AI cs.LG 交叉投稿

When the Tool Decides: LLM Agents Defer Blindly to Graph Neural Network Tools, and Stronger Backbones Defer More

当工具决定时:LLM代理盲目服从图神经网络工具,更强的骨干网络服从更多

Zhongyuan Wang, Pratyusha Vemuri

发表机构 * raptorX.ai

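AI summary Tests whether LLM agents exercise judgment over GNN tools or merely obey them; finds that agent predictions agree with the raw GNN's 97.6-99.2% of the time, that stronger backbones defer more, and that selective invocation is limited by the available information.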

Comments 9 pages, 2 figures. Under review at TMLR

详情
AI中文摘要

越来越多的研究为大型语言模型(LLM)代理配备图神经网络(GNN)作为可调用工具,假设代理能够判断何时以及多大程度上依赖该工具。我们直接测试了这一假设。我们将冻结的GNN作为显式工具暴露给ReAct风格的LLM代理,并在文本属性图(ogbn-arxiv,在WikiCS上重复)上的节点分类任务中,测量代理是使用工具还是仅仅服从它。我们发现代理并未进行判断:其预测与原始GNN的预测一致率达到97.6-99.2%(5个随机种子),沦为GNN鹦鹉,全盘采用工具的输出并绕过自身推理。通过扫描骨干网络能力(Qwen2.5 0.5B-7B),这种服从并非弱模型伪影:在能够调用工具的模型中,一致性随能力提升而上升(从1.5B的0.60到7B的0.98)。关键的是,服从的代价并未随能力增长而缩小,反而在替代方案出现时扩大:每个节点上可用动作的oracle比鹦鹉在3B时高出0.09-0.18,在7B时高出0.12-0.22,在高同质性下几乎翻倍,因为鹦鹉被冻结的GNN所束缚,而代理的替代方案在改进;在7B时,简单的邻居标签工具在高同质性下超越了GNN(0.81 vs 0.71),但代理仍然服从。一个简单的选择性调用门恢复了约一半的高同质性差距(0.71到0.83),但未带来全局净收益,而保留估计表明,在标准测试时特征上可达到的最佳门最多只能获得oracle余量的三分之一:可靠的选择性调用似乎受限于可用信息,而不仅仅是路由器设计。我们的结果是一个警示性测量:对代理+工具系统的评估不能假设代理在工具之上添加了判断,选择性调用必须被设计进去,而不是期望从规模中涌现。

英文摘要

A growing line of work equips large language model (LLM) agents with graph neural networks (GNNs) as callable tools, assuming the agent exercises judgment over when and how much to rely on such a tool. We test this directly. We expose a frozen GNN to a ReAct-style LLM agent as an explicit tool and measure, on node classification over a text-attributed graph (ogbn-arxiv, replicated on WikiCS), whether the agent uses the tool or merely obeys it. We find the agent does not exercise judgment: its predictions agree with the raw GNN's 97.6-99.2% of the time (5 seeds), collapsing into a GNN parrot that adopts the tool's output wholesale and bypasses its own reasoning. Sweeping backbone capability (Qwen2.5 0.5B-7B), the deference is not a weak-model artifact: among models able to invoke the tool, agreement rises with capability (0.60 to 0.98 from 1.5B to 7B). Crucially, the cost of deference does not shrink as capability grows; instead, it widens where alternatives emerge: a per-node oracle over the available actions beats the parrot by 0.09-0.18 at 3B and 0.12-0.22 at 7B, roughly doubling at high homophily, because the parrot is pinned to the frozen GNN while the agent's alternatives improve; at 7B a simple neighbour-label tool overtakes the GNN at high homophily (0.81 vs 0.71) yet the agent still defers. A simple selective-invocation gate recovers about half of that high-homophily gap (0.71 to 0.83) but yields no net global gain, and held-out estimates bound the best achievable gate over standard test-time features to at most a third of the oracle headroom: reliable selective invocation looks limited by available information, not merely router design. Our results are a cautionary measurement: evaluations of agent+tool systems cannot assume the agent adds judgment on top of the tool, and selective invocation must be designed in rather than expected to emerge from scale.
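The two quantities the abstract leans on, agent-tool agreement and the headroom of a per-node oracle over the available actions, are simple to compute once predictions are in hand. The sketch below uses made-up toy predictions and hypothetical function names; it is not the paper's evaluation code.

```python
from typing import Sequence


def agreement_rate(agent_preds: Sequence[int], tool_preds: Sequence[int]) -> float:
    """Fraction of nodes where the agent's label equals the tool's label."""
    matches = sum(1 for a, t in zip(agent_preds, tool_preds) if a == t)
    return matches / len(agent_preds)


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    return sum(1 for p, y in zip(preds, labels) if p == y) / len(labels)


def oracle_accuracy(action_preds: Sequence[Sequence[int]], labels: Sequence[int]) -> float:
    """Per-node oracle: a node counts as correct if any available action
    (for example the GNN tool or a neighbour-label tool) is correct on it."""
    hits = sum(
        1 for i, y in enumerate(labels) if any(preds[i] == y for preds in action_preds)
    )
    return hits / len(labels)


# Toy data (hypothetical): four nodes, two available tools, a parroting agent.
labels = [0, 1, 2, 1]
gnn_preds = [0, 1, 0, 0]
neighbour_preds = [0, 0, 2, 1]
agent_preds = list(gnn_preds)  # the agent copies the GNN output wholesale

agreement = agreement_rate(agent_preds, gnn_preds)
parrot_acc = accuracy(agent_preds, labels)
oracle_acc = oracle_accuracy([gnn_preds, neighbour_preds], labels)
headroom = oracle_acc - parrot_acc
```

In this toy case the agent agrees with the GNN on every node, so all of the headroom comes from nodes where the other tool is right and the GNN is wrong.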

2604.09737 2026-06-15 cs.LG cs.AI 版本更新

STaR-DRO: Stateful Tsallis Reweighting for Group-Robust Structured Prediction

STaR-DRO: 面向群体鲁棒结构化预测的状态化Tsallis重加权

Samah Fodeh, Ganesh Puthiaraju, Elyas Irankhah, Afshan Khan, Sreeraj Ramachandran, Linhai Ma, Srivani Talakokkul, Sarah Schellhorn

发表机构 * Yale University(耶鲁大学) Yale School of Medicine(耶鲁医学院)

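AI summary Proposes STaR-DRO, which combines Tsallis mirror ascent with a sparse entmax-style mapping to upweight only persistently hard groups, improving label accuracy and robustness in structured prediction; on EPPC Miner it raises average label F1 by +1.08 over SFT and +2.20 over standard DRO.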

详情
AI中文摘要

使用大型语言模型进行结构化预测需要输出在标签不平衡和异质群体难度下具有标签准确性、本体约束、结构有效性和证据基础。我们提出了一个统一框架用于本体约束生成。首先,我们引入了一个模块化的提示工程架构,结合了XML风格结构、专家消歧规则、思维链推理、元数据感知决策逻辑、模式契约和自我验证门。它针对反复出现的上下文失败,包括格式漂移、标签歧义、证据幻觉和元数据条件混淆。其次,我们提出了STaR-DRO,结合了Tsallis镜像上升、稀疏entmax风格原始映射、EMA平滑群体损失跟踪、重新缩放上升信号和有界超额乘数。与依赖密集香农熵指数梯度更新、可能引入高方差随机重加权、将正对抗质量分配给非持续困难群体、并通过单纯形竞争产生成本的常规DRO不同,STaR-DRO仅对持续困难群体上权重,而不抑制较容易的群体。我们在EPPC Miner上评估该框架,这是一个临床基础的高风险结构化预测任务,需要从患者-提供者安全消息中进行层次标签预测和证据跨度提取。在1B-70B Llama模型上,提示工程改进了零样本提取,平均标签F1增益为+14.46,跨度F1增益为+17.40。在监督微调的基础上,STaR-DRO进一步提高了准确性和鲁棒性,平均标签F1分别提高了+1.08和+2.20,同时相对于SFT和标准DRO,平均群体验证交叉熵分别降低了21.3%和14.8%。这些结果推进了以患者为中心的临床护理分析的可靠自动化通信挖掘。

英文摘要

Structured prediction with large language models requires outputs that are label-accurate, ontology-constrained, structurally valid, and evidence-grounded under label imbalance and heterogeneous group difficulty. We present a unified framework for ontology-constrained generation. First, we introduce a modular prompt-engineering architecture combining XML-style structure, expert disambiguation rules, chain-of-thought reasoning, metadata-aware decision logic, schema contracts, and a self-validation gate. It targets recurrent in-context failures, including format drift, label ambiguity, evidence hallucination, and metadata-conditioned confusion. Second, we propose STaR-DRO, combining Tsallis mirror ascent, sparse entmax-style primal mapback, EMA-smoothed group-loss tracking, rescaled ascent signals, and bounded excess-only multipliers. Unlike conventional DRO, which relies on dense Shannon-entropy exponentiated-gradient updates, can introduce high-variance stochastic reweighting, assigns positive adversarial mass to groups that are not persistently hard, and incurs costs through simplex competition, STaR-DRO upweights only persistently hard groups without suppressing easier ones. We evaluate the framework on EPPC Miner, a clinically grounded high-stakes structured-prediction task requiring hierarchical label prediction and evidence-span extraction from patient-provider secure messages. Across 1B-70B Llama models, prompt engineering improves zero-shot extraction, yielding an average label F1 gain of +14.46 and a Span F1 gain of +17.40. Building on supervised fine-tuning, STaR-DRO further improves accuracy and robustness, increasing average label F1 by +1.08 and +2.20 while reducing mean groupwise validation cross-entropy by 21.3% and 14.8% relative to SFT and standard DRO, respectively. These results advance reliable automated communication mining for patient-centered clinical care analysis.
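The contrast the abstract draws between dense exponentiated-gradient reweighting and a sparse mapping can be illustrated with two textbook operators. The sketch below is not STaR-DRO: it pairs the standard exponentiated-gradient group-weight update with sparsemax, the alpha = 2 member of the entmax family, used here as a stand-in for the paper's "sparse entmax-style" mapping. Function names and the toy numbers are hypothetical.

```python
import math
from typing import List


def exponentiated_gradient_step(
    weights: List[float], losses: List[float], step_size: float
) -> List[float]:
    """Dense update used by conventional group DRO: every group keeps
    strictly positive weight after normalisation."""
    unnormalised = [w * math.exp(step_size * l) for w, l in zip(weights, losses)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


def sparsemax(scores: List[float]) -> List[float]:
    """Euclidean projection of the scores onto the probability simplex.
    Low-scoring entries receive exactly zero weight."""
    ordered = sorted(scores, reverse=True)
    cumulative = 0.0
    threshold = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if 1 + k * value > cumulative:  # index k is still in the support
            threshold = (cumulative - 1) / k
    return [max(s - threshold, 0.0) for s in scores]


# Toy per-group losses: two hard groups and one easy group.
group_losses = [1.0, 0.8, 0.1]
dense_weights = exponentiated_gradient_step([1 / 3, 1 / 3, 1 / 3], group_losses, 1.0)
sparse_weights = sparsemax(group_losses)
```

With these toy losses the dense update still assigns positive mass to the easy group, whereas the sparse mapping gives it exactly zero, which is the qualitative behavior the abstract attributes to sparse reweighting.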

2604.18419 2026-06-15 cs.LG cs.CL stat.ML 版本更新

Knowing When to Quit: A Principled Framework for Dynamic Abstention in LLM Reasoning

知道何时退出:LLM推理中动态弃权的原则性框架

Hen Davidov, Nachshon Cohen, Oren Kalinsky, Yaron Fairstein, Guy Kushilevitz, Ram Yazdi, Patrick Rebeschini

发表机构 * Hebrew University of Jerusalem(耶路撒冷希伯来大学)

AI总结 本文提出一个基于正则化强化学习框架的动态弃权原则,通过价值函数与弃权奖励的比较来决定是否提前终止推理,在数学推理和毒性避免任务上优于现有方法。

详情
Journal ref
Proceedings of the 43rd International Conference on Machine Learning, Seoul, South Korea. PMLR 306, 2026. Copyright 2026 by the author(s)
AI中文摘要

利用思维链推理的大型语言模型常常因产生冗长且错误的响应而浪费大量计算资源。弃权可以通过抑制可能不正确的输出来缓解这一问题。虽然大多数弃权方法在生成之前或之后决定是否保留输出,但动态的生成中弃权考虑在每个token位置提前终止无前途的推理轨迹。先前的工作探索了这一想法的经验变体,但缺乏对弃权规则的原则性指导。我们提出了LLM动态弃权的形式化分析,将弃权建模为正则化强化学习框架中的一个显式动作。弃权奖励参数控制计算与信息之间的权衡。我们证明,在一般条件下,当价值函数低于该奖励时弃权严格优于自然基线。我们进一步推导了一种原则性且高效的方法来近似价值函数。在数学推理和毒性避免任务上的实证结果支持我们的理论,并展示了相比现有方法改进的选择性准确性。

英文摘要

LLMs utilizing chain-of-thought reasoning often waste substantial compute by producing long, incorrect responses. Abstention can mitigate this by withholding outputs unlikely to be correct. While most abstention methods decide to withhold outputs before or after generation, dynamic mid-generation abstention considers early termination of unpromising reasoning traces at each token position. Prior work has explored empirical variants of this idea, but principled guidance for the abstention rule remains lacking. We present a formal analysis of dynamic abstention for LLMs, modeling abstention as an explicit action within a regularized reinforcement learning framework. An abstention reward parameter controls the trade-off between compute and information. We show that abstaining when the value function falls below this reward strictly outperforms natural baselines under general conditions. We further derive a principled and efficient method to approximate the value function. Empirical results on mathematical reasoning and toxicity avoidance tasks support our theory and demonstrate improved selective accuracy over existing methods.
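
The abstention rule described above (stop as soon as the value estimate falls below the abstention reward) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the per-token value estimates are assumed to be given, and the names `first_abstention_step`, `trace_values`, and `abstain_reward` are hypothetical.

```python
from typing import Optional, Sequence


def first_abstention_step(values: Sequence[float], abstain_reward: float) -> Optional[int]:
    """Return the first token index whose estimated value is below the
    abstention reward, or None if the trace is never abandoned."""
    for step, value in enumerate(values):
        if value < abstain_reward:
            return step
    return None


# Hypothetical value estimates for one reasoning trace (assumed, not from the paper).
trace_values = [0.62, 0.55, 0.41, 0.18, 0.09]
stop_at = first_abstention_step(trace_values, abstain_reward=0.25)
```

Raising `abstain_reward` makes the rule quit earlier and saves more compute; lowering it lets more traces run to completion.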

2605.04847 2026-06-15 cs.LG cs.AI 版本更新

Quantile-Free Uncertainty Quantification in Graph Neural Networks

图神经网络中的无分位数不确定性量化

Soyoung Park, Hwanjun Song, Sungsu Lim

发表机构 * Soyoung Park Hwanjun Song Sungsu Lim

AI总结 提出QpiGNN框架,通过无分位数联合损失直接优化覆盖率和区间宽度,实现高效鲁棒的图神经网络不确定性量化,理论保证渐近覆盖和近最优宽度。

Comments Accepted at the 43rd International Conference on Machine Learning (ICML 2026)

详情
AI中文摘要

不确定性量化(UQ)在图神经网络(GNN)中对于高风险领域至关重要,但仍是一个重大挑战。在图设置中,消息传递通常依赖于强假设(如可交换性),这些假设在实践中很少满足,并且实现可靠的UQ通常需要昂贵的重采样或事后校准。为了解决这些问题,我们引入了无分位数预测区间GNN(QpiGNN),这是一个基于分位数回归(QR)的框架,通过直接优化覆盖率和区间宽度来实现基于GNN的UQ,无需分位数输入或后处理。QpiGNN采用双头架构,将预测和不确定性解耦,并通过无分位数联合损失使用仅标签监督进行训练。这种设计允许高效训练,并产生鲁棒的预测区间,在温和假设下具有渐近覆盖率和近最优宽度的理论保证。在19个合成和真实世界基准上的实验表明,QpiGNN比基线平均覆盖率高22%,区间窄50%,同时确保了对噪声和结构变化的效率和鲁棒性。

英文摘要

Uncertainty quantification (UQ) in graph neural networks (GNNs) is crucial in high-stakes domains but remains a significant challenge. In graph settings, message passing often relies on strong assumptions such as exchangeability, which are rarely satisfied in practice, and achieving reliable UQ typically requires costly resampling or post-hoc calibration. To address these issues, we introduce Quantile-free Prediction Interval GNN (QpiGNN), a framework that builds on quantile regression (QR) to enable GNN-based UQ by directly optimizing coverage and interval width without requiring quantile inputs or post-processing. QpiGNN employs a dual-head architecture that decouples prediction and uncertainty, and is trained with label-only supervision through a quantile-free joint loss. This design allows efficient training and yields robust prediction intervals, with theoretical guarantees of asymptotic coverage and near-optimal width under mild assumptions. Experiments on 19 synthetic and real-world benchmarks show that QpiGNN achieves on average 22% higher coverage and 50% narrower intervals than baselines, while ensuring efficiency and robustness to noise and structural shifts.
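
The two quantities the abstract reports, coverage and interval width, can be computed as follows. This sketch only evaluates given intervals; it is not the QpiGNN joint loss, whose exact form is not stated here. The function name and the toy numbers are assumptions for illustration.

```python
from typing import Sequence, Tuple


def coverage_and_width(
    intervals: Sequence[Tuple[float, float]], targets: Sequence[float]
) -> Tuple[float, float]:
    """Empirical coverage (fraction of targets inside their interval)
    and mean interval width."""
    covered = sum(1 for (low, high), y in zip(intervals, targets) if low <= y <= high)
    coverage = covered / len(targets)
    mean_width = sum(high - low for low, high in intervals) / len(intervals)
    return coverage, mean_width


# Toy per-node prediction intervals and labels (made up for this example).
intervals = [(0.0, 1.0), (1.5, 2.5), (2.0, 2.5), (3.0, 5.0)]
targets = [0.5, 2.0, 3.0, 4.0]
coverage, mean_width = coverage_and_width(intervals, targets)
```

A method that optimizes both quantities aims for coverage at or above the target level with the smallest mean width.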

2402.16388 2026-06-15 stat.ML cs.LG 版本更新

Leave-One-Out-, Bootstrap- and Cross-Conformal Anomaly Detectors

留一法、自助法和交叉共形异常检测器

Oliver Hennhöfer, Christine Preisach

发表机构 * German Federal Ministry for Economic Affairs and Climate Action(德国联邦经济事务和气候行动部)

AI总结 为解决异常检测中校准数据不足的问题,基于共形预测提出留一法、自助法和交叉共形方法,在控制第一类错误率的同时提高数据效率。

Comments Published in 2024 IEEE International Conference on Knowledge Graph (ICKG)

详情
Journal ref
Proc. 2024 IEEE ICKG 15(1): 110-119 (February 2025)
AI中文摘要

异常检测系统中不确定性量化的需求日益重要。在此背景下,有效控制这些系统的第一类错误率而不增加第二类错误率,可以建立信任并减少与错误发现相关的成本。共形异常检测领域通过模型校准提供统计和有限样本有效性保证,成为一种有前景的方法。然而,对校准数据的依赖带来了实际限制,尤其是在低数据场景中。在本工作中,我们基于共形预测领域的方法,正式定义并评估了用于共形异常检测的留一法、自助法和交叉共形方法。超越经典的拆分共形方法,我们展示了用于计算重抽样共形$p$值的派生方法在全共形(直推式)方法的数据效率与拆分共形(归纳式)方法的计算效率之间提供了实用的折衷。我们验证了派生方法,并量化了它们在一类分类器和数据集上的改进。

英文摘要

The need for uncertainty quantification in anomaly detection systems has become increasingly important. In this context, effectively controlling Type I error rates without inflating Type II error rates in these systems can build trust and reduce costs associated with false discoveries. The field of conformal anomaly detection emerges as a promising approach for providing respective statistical and finite-sample validity guarantees through model calibration. However, reliance on calibration data imposes practical limitations, especially in low-data regimes. In this work, we formally define and evaluate leave-one-out-, bootstrap-, and cross-conformal methods for conformal anomaly detection, building on methods from the field of conformal prediction. Looking beyond the classical split-conformal approach, we show that derived methods for calculating resampling-conformal $p$-values offer a practical compromise between the data efficiency of full-conformal (transductive) approaches and the computational efficiency of split-conformal (inductive) methods. We validate derived methods and quantify their improvements for a range of one-class classifiers and datasets.
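
A cross-conformal $p$-value for anomaly detection can be sketched as below, following the standard cross-conformal construction: each fold's held-out points and the test point are scored by a detector fit on the remaining folds, and the counts are pooled. The toy detector `fit_distance_to_mean` and the interleaved fold assignment are assumptions for illustration, not the paper's setup.

```python
from statistics import mean
from typing import Callable, Sequence

Scorer = Callable[[float], float]


def fit_distance_to_mean(train: Sequence[float]) -> Scorer:
    """Toy one-class detector: anomaly score is the distance to the training mean."""
    center = mean(train)
    return lambda x: abs(x - center)


def cross_conformal_p_value(
    data: Sequence[float],
    test_point: float,
    k: int,
    fit: Callable[[Sequence[float]], Scorer],
) -> float:
    """Pool, over k folds, how many held-out scores are at least as extreme
    as the test score under the same fold model (higher score = more anomalous)."""
    at_least_as_extreme = 0
    for fold in range(k):
        train = [x for i, x in enumerate(data) if i % k != fold]
        held_out = [x for i, x in enumerate(data) if i % k == fold]
        score = fit(train)
        test_score = score(test_point)
        at_least_as_extreme += sum(1 for x in held_out if score(x) >= test_score)
    return (1 + at_least_as_extreme) / (len(data) + 1)


normal_data = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
p_outlier = cross_conformal_p_value(normal_data, 100.0, k=2, fit=fit_distance_to_mean)
p_inlier = cross_conformal_p_value(normal_data, 2.5, k=2, fit=fit_distance_to_mean)
```

Every data point serves as calibration data exactly once, which is the data-efficiency gain over a single split; the cost is fitting `k` detectors instead of one.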

2406.09250 2026-06-15 cs.CV cs.AI cs.LG 版本更新

MirrorCheck: Efficient Adversarial Defense for Vision-Language Models

MirrorCheck: 视觉-语言模型的高效对抗防御

Samar Fares, Klea Ziu, Toluwani Aremu, Nikita Durasov, Martin Takáč, Pascal Fua, Ivan Laptev, Karthik Nandakumar

发表机构 * Mohamed Bin Zayed University of Artificial Intelligence(穆罕默德·本·扎耶德人工智能大学) NVIDIA École Polytechnique Fédérale de Lausanne(洛桑联邦理工学院) Michigan State University(密歇根州立大学)

AI总结 提出MirrorCheck框架,利用文本到图像模型和随机化策略检测并防御针对视觉-语言模型的自适应对抗攻击。

详情
AI中文摘要

视觉-语言模型(VLM)越来越容易受到复杂的对抗性攻击,包括专门设计用于绕过现有防御的自适应策略。为了解决这一漏洞,我们提出了MirrorCheck,一个鲁棒且与模型无关的检测框架,在单模态和多模态设置中均能有效运行。MirrorCheck利用文本到图像(T2I)模型从目标模型生成的标题中重建视觉内容,并通过比较原始图像和合成图像之间的特征空间嵌入来评估语义一致性。为了增强对自适应攻击的鲁棒性,MirrorCheck引入了一种随机防御策略,从多样化的模型库中随机选择T2I生成器和图像编码器。此外,我们采用了一种新颖的一次性(OTU)扰动,应用于所选编码器嵌入,并通过缩放因子调节,这降低了自适应攻击的有效性。跨多种威胁场景的大量实验表明,MirrorCheck始终优于基线方法,即使在强自适应对抗条件下也能保持其实用性。

英文摘要

Vision-Language Models (VLMs) are increasingly susceptible to sophisticated adversarial attacks, including adaptive strategies specifically designed to bypass existing defenses. To address this vulnerability, we propose MirrorCheck, a robust and model-agnostic detection framework that operates effectively in both unimodal and multimodal settings. MirrorCheck leverages Text-to-Image (T2I) models to regenerate visual content from captions produced by the target model and assesses semantic consistency by comparing feature-space embeddings between the original and synthesized images. To enhance robustness against adaptive attacks, MirrorCheck introduces a stochastic defense strategy that randomly selects T2I generators and image encoders from a diverse model zoo. Additionally, we incorporate a novel One-Time-Use (OTU) perturbation applied to the selected encoder embeddings, regulated by a scaling factor, which decreases the effectiveness of adaptive attacks. Extensive experiments across multiple threat scenarios demonstrate that MirrorCheck consistently outperforms baseline methods, and maintains its utility even under strong adaptive adversarial conditions.

2512.04981 2026-06-15 cs.CV cs.LG 版本更新

Aligned but Stereotypical? How System Prompts Shape Demographic Bias in LLM-Based Text-to-Image Models

对齐但刻板?系统提示如何塑造基于LLM的文本到图像模型中的人口统计偏见

NaHyeon Park, Na Min An, Kunhee Kim, Soyeon Yoon, Jiahao Huo, Hyunjung Shim

发表机构 * KAIST(韩国科学技术院) HKUST (GZ)(香港科技大学(广州))

AI总结 研究LLM增强的文本到图像系统在提示扩展中引入隐性人口统计偏见的问题,提出无训练的去偏框架FairPro,通过自适应生成公平性指令减少人口统计差异。

Comments Project page: https://fairpro-t2i.github.io

详情
AI中文摘要

文本到图像(T2I)系统越来越依赖基于大语言模型(LLM)的文本条件来解释和扩展用户提示。虽然这提高了提示理解和文本-图像对齐,但我们发现,即使未指定人口统计属性,它也可能引入隐性的人口统计假设。为了系统地研究这种行为在不同提示模糊性和复杂性水平下的表现,我们构建了一个涵盖多种提示设置的综合基准。对八个最新T2I模型的评估表明,基于LLM的系统始终比非LLM基线表现出更强的人口统计偏差。我们进一步分析了系统提示,这是基于LLM的T2I系统特有的组件,用于指导提示解释和扩展。我们的分析表明,这些指令强烈影响文本嵌入,进而导致有偏的图像生成。受这些发现启发,我们提出了FairPro,一个无训练的去偏框架,它在保持用户意图的同时自适应地生成公平性感知指令。实验表明,FairPro在保持提示忠实度的同时显著减少了人口统计差异。

英文摘要

Text-to-image (T2I) systems increasingly rely on Large Language Model (LLM)-based text conditioning to interpret and expand user prompts. While this improves prompt understanding and text-image alignment, we find that it can also introduce implicit demographic assumptions, even when demographic attributes are unspecified. To systematically investigate this behavior across varying levels of prompt ambiguity and complexity, we construct a comprehensive benchmark covering diverse prompt settings. Evaluations on eight recent T2I models show that LLM-based systems consistently exhibit stronger demographic skew than non-LLM-based baselines. We further analyze system prompts, a component unique to LLM-based T2I systems that guides prompt interpretation and expansion. Our analyses show that these instructions strongly influence text embeddings, which subsequently leads to biased image generations. Motivated by these findings, we propose FairPro, a training-free debiasing framework that adaptively generates fairness-aware instructions while preserving user intent. Experiments demonstrate that FairPro substantially reduces demographic disparities while maintaining prompt fidelity.

2602.09161 2026-06-15 stat.ML cs.LG 版本更新

Minimum Distance Summaries for Robust Neural Posterior Estimation

最小距离摘要用于鲁棒神经后验估计

Sherman Khoo, Dennis Prangle, Song Liu, Mark Beaumont

发表机构 * University of Cambridge(剑桥大学)

AI总结 提出最小距离摘要方法,通过最大均值差异(MMD)在测试时自适应调整摘要统计量,在不修改预训练神经后验估计器的情况下实现鲁棒推断,理论保证鲁棒性并实验验证。

详情
AI中文摘要

基于模拟的推断(SBI)通过首先在先验-模拟器对上训练神经后验估计器(NPE),通常使用低维摘要统计量,实现摊销贝叶斯推断,然后可以在新测试观测上查询以廉价地重复用于快速推断。由于NPE是在训练数据分布下估计的,当观测偏离训练分布时,它容易受到误指定的影响。许多鲁棒SBI方法通过修改NPE训练或引入误差模型来解决这个问题,将鲁棒性与推断网络耦合,损害了摊销和模块化。我们引入了最小距离摘要,一种即插即用的鲁棒NPE方法,独立于预训练NPE自适应调整测试时的摘要统计量。利用最大均值差异(MMD)作为观测数据与摘要条件预测分布之间的距离,自适应摘要从MMD继承了强鲁棒性属性。我们证明该算法可以通过随机傅里叶特征近似高效实现,产生轻量级、无模型的测试时自适应过程。我们为算法的鲁棒性提供了理论保证,并在各种合成和真实世界任务上进行了实证评估,表明在最小额外开销下实现了显著的鲁棒性提升。

英文摘要

Simulation-based inference (SBI) enables amortized Bayesian inference by first training a neural posterior estimator (NPE) on prior-simulator pairs, typically through low-dimensional summary statistics, which can then be cheaply reused for fast inference by querying it on new test observations. Because NPE is estimated under the training data distribution, it is susceptible to misspecification when observations deviate from the training distribution. Many robust SBI approaches address this by modifying NPE training or introducing error models, coupling robustness to the inference network and compromising amortization and modularity. We introduce minimum-distance summaries, a plug-in robust NPE method that adapts queried test-time summaries independently of the pretrained NPE. Leveraging the maximum mean discrepancy (MMD) as a distance between observed data and a summary-conditional predictive distribution, the adapted summary inherits strong robustness properties from the MMD. We demonstrate that the algorithm can be implemented efficiently with random Fourier feature approximations, yielding a lightweight, model-free test-time adaptation procedure. We provide theoretical guarantees for the robustness of our algorithm and empirically evaluate it on a range of synthetic and real-world tasks, demonstrating substantial robustness gains with minimal additional overhead.
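
The random Fourier feature approximation of the MMD mentioned above can be sketched as follows for one-dimensional data and a Gaussian kernel. This shows only the distance computation (the standard biased estimator), not the summary-adaptation procedure of the paper; the bandwidth, feature count, and all function names are assumptions.

```python
import math
import random
from typing import Callable, List, Sequence


def make_rff(num_features: int, bandwidth: float, seed: int = 0) -> Callable[[float], List[float]]:
    """Random Fourier features for a Gaussian kernel with the given bandwidth."""
    rng = random.Random(seed)
    freqs = [rng.gauss(0.0, 1.0 / bandwidth) for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def features(x: float) -> List[float]:
        return [scale * math.cos(w * x + b) for w, b in zip(freqs, phases)]

    return features


def mean_embedding(samples: Sequence[float], features: Callable[[float], List[float]]) -> List[float]:
    """Average feature vector of a sample (approximate kernel mean embedding)."""
    embedded = [features(x) for x in samples]
    return [sum(column) / len(embedded) for column in zip(*embedded)]


def mmd_squared(xs: Sequence[float], ys: Sequence[float], features: Callable[[float], List[float]]) -> float:
    """Squared MMD approximated as the squared distance between mean embeddings."""
    mx = mean_embedding(xs, features)
    my = mean_embedding(ys, features)
    return sum((a - b) ** 2 for a, b in zip(mx, my))


rng = random.Random(1)
observed = [rng.gauss(0.0, 1.0) for _ in range(200)]
shifted = [rng.gauss(3.0, 1.0) for _ in range(200)]
features = make_rff(num_features=200, bandwidth=1.0)
same = mmd_squared(observed, observed, features)
different = mmd_squared(observed, shifted, features)
```

The cost is linear in the sample size and the number of features, instead of quadratic in the sample size for the exact kernel estimator.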

2603.23530 2026-06-15 cs.CL cs.AI cs.LG 版本更新

Did You Forget What I Asked? Prospective Memory Failures in Large Language Models

你忘记我问什么了吗?大型语言模型中的前瞻记忆失败

Avni Mittal

发表机构 * University of Washington(华盛顿大学)

AI总结 本研究通过认知心理学中的前瞻记忆视角,发现大型语言模型在执行复杂任务时,格式化指令的遵从率下降2-21%,并提出了显著性增强格式来恢复遵从性。

详情
AI中文摘要

大型语言模型在必须同时执行要求较高的任务时,常常无法满足格式化指令。我们通过认知心理学中的前瞻记忆视角,使用一个受控范式来研究这种行为,该范式将可验证的格式化约束与复杂度递增的基准任务相结合。在三个模型家族和超过8000个提示中,在并发任务负载下,遵从性下降了2-21%。脆弱性高度依赖于类型:终端约束(需要在响应边界采取行动)下降最多,高达50%,而避免约束相对稳健。显著性增强格式(显式指令框架加上尾部提醒)恢复了大量丢失的遵从性,在许多设置中将性能恢复到90-100%。干扰是双向的:格式化约束也可能降低任务准确性,其中一个模型的GSM8K准确率从93%下降到27%。在额外的堆叠实验中,随着约束的累积,联合遵从性急剧下降。所有结果均使用确定性程序化检查器,无需LLM作为评判组件,并在公开可用的数据集上进行。

英文摘要

Large language models often fail to satisfy formatting instructions when they must simultaneously perform demanding tasks. We study this behaviour through a prospective-memory-inspired lens from cognitive psychology, using a controlled paradigm that combines verifiable formatting constraints with benchmark tasks of increasing complexity. Across three model families and over 8,000 prompts, compliance drops by 2-21% under concurrent task load. Vulnerability is highly type-dependent: terminal constraints (requiring action at the response boundary) degrade most, with drops up to 50%, while avoidance constraints remain comparatively robust. A salience-enhanced format (explicit instruction framing plus a trailing reminder) recovers much of the lost compliance, restoring performance to 90-100% in many settings. Interference is bidirectional: formatting constraints can also reduce task accuracy, with one model's GSM8K accuracy dropping from 93% to 27%. In additional stacking experiments, joint compliance declines sharply as constraints accumulate. All results use deterministic programmatic checkers without an LLM-as-judge component on publicly available datasets.
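
Deterministic programmatic checkers of the kind mentioned above might look like the following. The two constraints shown (a terminal marker and a forbidden word) and both function names are hypothetical examples, not the paper's actual constraint set.

```python
import re


def ends_with_marker(response: str, marker: str) -> bool:
    """Terminal constraint: the response must end with the marker
    (trailing whitespace is ignored)."""
    return response.rstrip().endswith(marker)


def avoids_word(response: str, forbidden: str) -> bool:
    """Avoidance constraint: the forbidden word must not appear as a whole
    word, compared case-insensitively."""
    pattern = r"\b" + re.escape(forbidden) + r"\b"
    return re.search(pattern, response, flags=re.IGNORECASE) is None


compliant = "The answer is 42.\n[END]\n"
terminal_ok = ends_with_marker(compliant, "[END]")
avoidance_ok = avoids_word(compliant, "however")
```

Because both checks are pure string operations, the same response always yields the same verdict, which is what removes the need for an LLM judge.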

2606.02995 2026-06-15 cs.CR cs.AI cs.IR cs.LG 版本更新

Patcher: Post-Hoc Patching of Backdoored Large Language Models

Patcher: 后门大型语言模型的事后修补

Anjun Gao, Yueyang Quan, Yufei Xia, Zhuqing Liu, Minghong Fang

发表机构 * University of Louisville(路易斯维尔大学) University of North Texas(北得克萨斯大学)

AI总结 提出Patcher框架,仅利用单个失败案例和模型参数,通过基于梯度的显著性定位后门触发器,并采用约束微调消除触发-响应关联,同时保持模型效用。

Comments To appear in the USENIX Security Symposium, 2026

详情
AI中文摘要

大型语言模型仍然容易受到越狱后门攻击,其中对手污染安全对齐数据以嵌入隐藏触发器,从而绕过安全机制。现有防御通常需要全面的攻击信息或多个触发示例,使得当防御者仅观察到单个报告失败案例而不知道其源于后门攻击还是自然对齐错误时,这些防御不切实际。本文提出Patcher,一个事后防御框架,仅使用单个报告失败案例和模型参数来修复后门语言模型。Patcher分两个阶段运行。首先,通过计算基于响应的梯度显著性分数并应用自适应聚类将触发器与良性上下文分离来定位后门触发器。其次,通过约束微调目标修补模型,该目标打破触发-响应关联,同时通过KL散度约束保持良性任务效用和对非触发越狱攻击的鲁棒性。我们在多种后门攻击策略下进行了广泛评估,并证明Patcher成功定位触发器并中和后门,同时保持模型效用。我们进一步展示了针对旨在规避我们防御的自适应攻击的鲁棒性。这项工作代表了向部署语言模型中训练时攻击的实际防御迈出的重要一步。

英文摘要

Large language models remain vulnerable to jailbreak backdoor attacks, where adversaries poison safety alignment data to embed hidden triggers that bypass safety mechanisms. Existing defenses often require comprehensive attack information or multiple triggered examples, making them impractical when defenders only observe a single reported failure case without knowing whether it stems from a backdoor attack or a natural alignment bug. This paper presents Patcher, a post-hoc defense framework that repairs backdoored language models using only a single reported failure case and the model parameters. Patcher operates in two stages. First, it localizes backdoor triggers by computing response-conditioned gradient-based saliency scores and applying adaptive clustering to separate triggers from benign context. Second, it patches the model through a constrained fine-tuning objective that breaks the trigger-response association while preserving benign-task utility and robustness to non-triggered jailbreak attacks through KL-divergence constraints. We conduct extensive evaluations across multiple backdoor attack strategies and demonstrate that Patcher successfully localizes triggers and neutralizes backdoors while maintaining model utility. We further show robustness against adaptive attacks designed to evade our defense. This work represents a significant step toward practical defenses against training-time attacks in deployed language models.

2605.21006 2026-06-15 cs.AI cs.CL cs.LG 版本更新

Playing Devil's Advocate: Off-the-Shelf Persona Vectors Rival Targeted Steering for Sycophancy

扮演魔鬼代言人:现成的人格向量在抑制谄媚方面可与针对性引导相媲美

Ishaan Kelkar, Nebras Alam, Vikram Kakaria, Madhur Panwar, Vasu Sharma, Maheep Chaudhary

发表机构 * University of Toronto(多伦多大学) Princeton University(普林斯顿大学) Purdue University(普渡大学) EPFL(洛桑联邦理工学院) Algoverse Independent(独立)

AI总结 本文研究了不同人格对谄媚行为的影响,发现现成的人格引导向量在减少谄媚方面与针对性引导相当,且在用户正确时保持准确性。

详情
Journal ref
ICML, Pluralistic Alignment Workshop, 2026
AI中文摘要

我们研究了不同人格对谄媚(sycophancy)的影响,即模型在用户错误时仍然附和用户。标准缓解方法对比激活添加(CAA)从带标签的谄媚响应与诚实响应对中推导出引导方向。本研究评估了现成的人格引导向量能否作为替代方案,这些向量最初是为一般角色扮演开发的,且未在谄媚数据上训练。在两个指令微调模型中,引导至以怀疑或审视为特征的人格,可使谄媚的减少达到CAA效果的约68%和98%,并且与CAA不同,在用户正确时仍保持准确性。效果也是不对称的:引导至随和的人格并不会使谄媚出现对称的增加。从几何上看,人格向量与激活空间中的谄媚方向基本无关。总体而言,这些发现表明,谄媚更应被理解为人格层面的属性,而非单一可引导方向。我们在此发布代码:https://anonymous.4open.science/r/Sycophancy-Steering-9DF0/.

英文摘要

We study the effect of different personas on sycophancy: a model's agreement with users even when the user is incorrect. The standard mitigation, Contrastive Activation Addition (CAA), derives a steering direction from labelled pairs of sycophantic and honest responses. This study evaluates whether off-the-shelf persona steering vectors, originally developed for general role-playing and not trained on sycophancy data, can serve as an alternative. In two instruction-tuned models, steering toward personas characterised by doubt or scrutiny reduces sycophancy to approximately 68% and 98% of CAA's effect, and, unlike CAA, maintains accuracy when the user is correct. The effect is also asymmetric: steering toward agreeable personas does not produce a mirror increase in sycophancy. Geometrically, the persona vector is largely independent of the direction of sycophancy in activation space. Collectively, these findings suggest that sycophancy is better understood as a persona-level property rather than a single steerable direction. We release our code here: https://anonymous.4open.science/r/Sycophancy-Steering-9DF0/.
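
The CAA-style steering direction and the geometric comparison described above can be sketched with plain lists. The code assumes the direction is the difference between mean activations of sycophantic and honest responses at one layer; the toy activations, the function names, and the steering coefficient are assumptions, and this is not the released code.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_vector(rows: Sequence[Vector]) -> Vector:
    """Component-wise mean of a set of activation vectors."""
    return [sum(column) / len(rows) for column in zip(*rows)]


def steering_direction(sycophantic: Sequence[Vector], honest: Sequence[Vector]) -> Vector:
    """Mean sycophantic activation minus mean honest activation."""
    mu_s, mu_h = mean_vector(sycophantic), mean_vector(honest)
    return [s - h for s, h in zip(mu_s, mu_h)]


def steer(activation: Vector, direction: Vector, coefficient: float) -> Vector:
    """Add a scaled direction; a negative coefficient steers away from it."""
    return [a + coefficient * d for a, d in zip(activation, direction)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity, used to compare a persona vector with the direction."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


direction = steering_direction([[1.0, 0.0], [3.0, 0.0]], [[0.0, 0.0], [0.0, 2.0]])
steered = steer([1.0, 1.0], direction, coefficient=-0.5)
```

A cosine near zero between a persona vector and `direction` is what "largely independent" means geometrically.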

9. 图学习与结构化数据 7 篇

2606.14022 2026-06-15 cs.LG 新提交

PostDeg: Placement Beats Parameterization in LayerNorm GNNs

PostDeg:在LayerNorm GNN中位置胜过参数化

Yash Tomar, Aryav Das

发表机构 * Purdue University(普渡大学) Park Tudor High School(帕克图多尔高中)

AI总结 发现LayerNorm会擦除拓扑信号,而后LayerNorm位置可保留信号;提出无参数的后LayerNorm逆度缩放PostDeg,在三个组合优化任务上提升显著,且四个证伪测试均未触发。

Comments Yash Tomar and Aryav Das contributed equally to this work

详情
AI中文摘要

基于LayerNorm的GNN通常会擦除节点选择策略应依赖的拓扑信号(度、中心性、$k$-核),但文献尚未定位擦除发生在残差块中的何处。我们回答了这个问题:在LayerNorm之前插入的正逐节点标量会在归一化中被约去(仅剩稳定项的影响),而同一标量在LayerNorm之后插入则会作为表示幅度到达分数头。幸存的位置是后LayerNorm位置。我们通过PostDeg实例化它,这是一种无参数的后LayerNorm逆度缩放,并预先注册了四个证伪器(图级标量、额外LayerNorm、表达能力相同的槽位、与骨干无关的来源),其中任何一个触发都将否定该规则。PostDeg在影响力最大化、网络瓦解和最大独立集上比LN骨干分别提升$+3.5\%/+2.5\%/+5.6\%$,每个任务在10/10配对种子中获胜;四个证伪器均未触发。结论是,增益来自位置而非参数化,这是一个小的不变性检查,可推广到任何归一化残差堆栈中的任何正拓扑标量。

英文摘要

LayerNorm-based GNNs routinely erase the topology signals (degree, centrality, $k$-core) that node-selection policies should depend on, but the literature has not located where in the residual block the erasure happens. We answer that question: a positive per-node scalar inserted before LayerNorm is divided out up to a stabilizer term, while the same scalar inserted after LayerNorm reaches the score head as representation magnitude. The surviving slot is the post-LayerNorm position. We instantiate it with PostDeg, a parameter-free post-LayerNorm inverse-degree scale, and pre-register four falsifiers (graphwise scalars, extra LayerNorm, expressive same-slot capacity, backbone-agnostic source) that would reject the rule. PostDeg gains $+3.5\%/+2.5\%/+5.6\%$ over the LN backbone on influence maximization, network dismantling, and maximum independent set, with $10/10$ paired-seed wins per task; none of the four falsifiers fires. The takeaway is that placement, not parameterization, carries the gain -- a small invariance check that generalizes to any positive topology scalar in any normalized residual stack.
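
The placement argument can be checked numerically: LayerNorm is invariant to a positive rescaling of its input up to the stabilizer `eps`, so a degree scale applied before it vanishes, while the same scale applied after it changes the output magnitude. The sketch uses a LayerNorm without learned affine parameters and takes the inverse-degree scale to be `1 / degree`, which is an assumption about the exact form used by PostDeg.

```python
import math
from typing import List


def layer_norm(x: List[float], eps: float = 1e-5) -> List[float]:
    """LayerNorm over one node's feature vector, without learned affine parameters."""
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return [(v - mu) / math.sqrt(var + eps) for v in x]


def l2_norm(x: List[float]) -> float:
    return math.sqrt(sum(v * v for v in x))


hidden = [1.0, 2.0, 3.0, 4.0]  # toy node representation
degree = 4
scale = 1.0 / degree  # assumed form of the inverse-degree scale

baseline = layer_norm(hidden)
pre = layer_norm([scale * v for v in hidden])   # scalar inserted before LayerNorm
post = [scale * v for v in layer_norm(hidden)]  # scalar inserted after LayerNorm

max_pre_gap = max(abs(a - b) for a, b in zip(pre, baseline))
```

Here `pre` is almost identical to `baseline`, so the degree information is lost, whereas `post` has its norm shrunk by exactly `scale` and therefore carries the degree to whatever reads the representation next.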

2606.14172 2026-06-15 cs.LG cs.CV 新提交

Context-aware Modality-Topology Co-Alignment for Multimodal Attributed Graphs

上下文感知的模态-拓扑协同对齐用于多模态属性图

Sirui Zhang, Xu Wang, Zhengyu Wu, Xunkai Li, Hongchao Qin

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 提出CoMAG框架,通过任务自适应可靠上下文学习和模态保持的跳令牌对齐,统一处理图任务和模态任务,在保持稀疏边线性复杂度的同时提升结构预测、跨模态匹配和图条件生成性能。

详情
AI中文摘要

多模态属性图(MAGs)通过将图拓扑与文本、图像等异质属性耦合来建模真实世界实体。它们支持需要结构和类别判别表示以进行图中心任务,以及需要细粒度跨模态对应以进行模态中心任务。然而,现有的MAG方法通常依赖固定的图上下文或统一融合的表示,导致任务无关的传播和过度压缩的融合,阻碍了多样化的任务需求和模态特定证据的保留。为了解决这个问题,我们提出了CoMAG,一个统一的MAG骨干网络,学习任务自适应的可靠上下文并在其中进行模态保持的对齐。CoMAG首先通过从多模态语义一致性估计边可靠性、用语义邻居补充原始拓扑以及通过任务感知门选择上下文组件来进行可靠上下文学习。然后,它通过维护模态特定的多跳轨迹、跨模态匹配模态-跳令牌以及解耦共享和私有表示来进行模态保持的跳令牌对齐。因此,CoMAG在一次前向传播中产生图和模态表示,同时保留模态特定的线索。我们进一步分析了稳定传播、缓解过度平滑和控制模态崩溃。在九个OpenMAG数据集上的实验将CoMAG与仅特征、仅图、多模态和统一的MAG基线在图级预测、模态匹配和图条件生成方面进行了比较。结果表明,CoMAG达到了最佳报告性能,证明任务自适应的可靠上下文和模态保持的对齐改善了结构预测、跨模态匹配和图条件生成,同时保持了稀疏边线性复杂度。

英文摘要

Multimodal Attributed Graphs (MAGs) model real-world entities by coupling graph topology with heterogeneous attributes such as text and images. They support graph-centric tasks requiring structural and class-discriminative representations, and modality-centric tasks requiring fine-grained cross-modal correspondence. However, existing MAG methods often rely on fixed graph contexts or uniformly fused representations, causing task-agnostic propagation and over-compressed fusion that hinder diverse task requirements and modality-specific evidence preservation. To address this, we propose CoMAG, a unified MAG backbone that learns task-adaptive reliable contexts and modality-preserving alignment within them. CoMAG first conducts Reliable Context Learning by estimating edge reliability from multimodal semantic consistency, complementing raw topology with semantic neighbors, and selecting context components through a task-aware gate. It then performs Modality-preserving Hop-token Alignment by maintaining modality-specific multi-hop trajectories, matching modality-hop tokens across modalities, and decoupling shared and private representations. Thus, CoMAG produces graph and modality representations from one forward pass while retaining modality-specific cues. We further analyze stable propagation, over-smoothing mitigation, and modality-collapse control. Experiments on nine OpenMAG datasets compare CoMAG with feature-only, graph-only, multimodal, and unified MAG baselines across graph-level prediction, modality matching, and graph-conditioned generation. Results show that CoMAG achieves the best reported performance, demonstrating that task-adaptive reliable contexts and modality-preserving alignment improve structural prediction, cross-modal matching, and graph-conditioned generation while retaining sparse edge-linear complexity.

2606.14636 2026-06-15 cs.LG 新提交

Graph Diffusion Residuals for Control-Function Instrumental Variables

用于控制函数工具变量的图扩散残差

Rui Wu, Zongyuan Chen, Hong Xie, Defu Lian, Enhong Chen

发表机构 * School of Computer Science and Engineering, University of Science and Technology of China(中国科学技术大学计算机科学与技术学院)

AI总结 提出自适应各向异性工具热流(A-IHF),一种基于图扩散的残差提取方法,用于灵活控制函数,通过检测处理跳跃并调整图传导性,在合成基准测试中优于多种基线方法。

Comments Submitted to Journal of Machine Learning Research (JMLR). 50 pages, 6 figures

详情
AI中文摘要

控制函数工具变量估计器需要第一阶段残差,而不仅仅是第一阶段预测。高容量的第一阶段可能会插值处理,从而为结果方程留下过少的残差信息。我们研究了自适应各向异性工具热流(A-IHF),这是一种用于灵活控制函数的确定性图扩散残差提取器。A-IHF将处理视为第一阶段特征图上的信号,使用引导扩散检测大的处理跳跃,减弱这些跳跃上的传导性,并通过稀疏图预解式计算生成的控制。其观测选择规则仅使用$(Z,X)$,结合了图广义交叉验证、粗糙度、残差化处理相关性以及图可容许性过滤。分析将误差分解为结构泄漏、残差衰减和残差化处理变异,得到有限样本界、潜在分段光滑几何下的图可容许性率以及有限路径选择校准。在54个合成基准单元中,与调优的图、核、树、提升、级数和神经网络控制函数基线相比,有保护的观测A-IHF具有最低的平均结构响应MSE;A-IHF族在32个单元中击败了最佳的非A-IHF基线。当图捕获分段光滑的第一阶段结构时,性能最强。

英文摘要

Control-function instrumental variable estimators need a first-stage residual, not merely a first-stage prediction. High-capacity first stages can interpolate treatment and leave too little residual information for the outcome equation. We study Adaptive Anisotropic Instrumental Heat Flow (A-IHF), a deterministic graph-diffusion residual extractor for flexible control functions. A-IHF treats treatment as a signal on a graph of first-stage features, uses pilot diffusion to detect large treatment jumps, attenuates conductance across those jumps, and computes the generated control with a sparse graph resolvent. Its observational selection rule uses only $(Z,X)$, combining graph generalized cross-validation, roughness, residualized-treatment relevance, and graph-admissibility filtering. The analysis decomposes error into structural leakage, residual attenuation, and residualized treatment variation, yielding finite-sample bounds, graph-admissibility rates under latent piecewise-smooth geometry, and finite-path selection calibration. Across 54 synthetic benchmark cells with tuned graph, kernel, tree, boosting, series, and neural control-function baselines, guarded observational A-IHF has the lowest average structural-response MSE; the A-IHF family beats the best non-A-IHF baseline in 32 cells. Performance is strongest when the graph captures piecewise-smooth first-stage structure.

2606.14047 2026-06-15 cs.IR cs.AI cs.CL cs.LG 交叉投稿

Knowledge Graph Enhanced Memory-Augmented Retrieval for Long Context Modeling

知识图谱增强的记忆增强检索用于长上下文建模

Ghadir Alselwi, Basem Suleiman, Hao Xue, Shoaib Jameel, Hakim Hacid, Flora D. Salim, Imran Razzak

发表机构 * University of New South Wales(新南威尔士大学) Hong Kong University of Science and Technology (Guangzhou)(香港科技大学(广州)) University of Southampton(南安普顿大学) Technology Innovation Institute(技术创新研究所) Mohamed Bin Zayed University of Artificial Intelligence(穆罕默德·本·扎耶德人工智能大学)

AI总结 提出KGERMAR框架,通过动态构建上下文知识图谱并融合多组件记忆架构,在长上下文建模中降低困惑度达8.5%,提升记忆效率2-2.5倍。

详情
AI中文摘要

长上下文语言建模不仅需要扩展上下文窗口,还需要在数千个token中保持对实体状态和关系的连贯理解——这是语义相似性单独无法解决的挑战。KGERMAR通过在推理过程中从输入文本构建动态的、上下文特定的知识图谱来解决这一问题,实现利用语义相似性和显式实体关系的领域自适应检索。该框架执行实时实体和关系抽取以构建上下文知识图谱,然后通过多组件记忆架构将图结构嵌入与文本语义相结合。维护三个记忆库——上下文、语义和结构——通过学习权重融合检索信号,以捕获表面语义和更深层次的关系模式。在SlimPajama(84.7K训练样本)、WikiText-103(4,358样本)、PG-19(100样本)和Proof-pile(46.3K样本)上评估,KGERMAR在1K到32K token的上下文长度上,相比记忆增强基线实现了高达8.5%的困惑度降低和2-2.5倍的记忆效率提升,并在五个NLU任务上展现出优越的上下文学习性能。动态知识图谱构建方法通过实现适应输入上下文而非依赖固定知识库的领域特定知识表示,推进了记忆增强语言建模。

英文摘要

Long-context language modeling requires not only extending context windows but maintaining coherent understanding of entity states and relationships across thousands of tokens -- a challenge that semantic similarity alone cannot address. KGERMAR addresses this by constructing dynamic, context-specific knowledge graphs from input text during inference, enabling domain-adaptive retrieval that leverages both semantic similarity and explicit entity relationships. The framework performs real-time entity and relation extraction to build contextual knowledge graphs, then integrates graph-structural embeddings with textual semantics through a multi-component memory architecture. Three memory banks -- contextual, semantic, and structural -- are maintained with retrieval signals fused via learned weights to capture both surface-level semantics and deeper relational patterns. Evaluated on SlimPajama (84.7K training examples), WikiText-103 (4,358 examples), PG-19 (100 examples), and Proof-pile (46.3K examples), KGERMAR achieves up to 8.5% lower perplexity and 2-2.5x better memory efficiency than memory-augmented baselines across context lengths from 1K to 32K tokens, with superior in-context learning performance across five NLU tasks. The dynamic knowledge graph construction approach advances memory-augmented language modeling by enabling domain-specific knowledge representation that adapts to input contexts rather than relying on fixed knowledge bases.

2602.09258 2026-06-15 cs.LG 版本更新

Generalizing GNNs with Tokenized Mixture of Experts

泛化GNN:基于令牌化的专家混合

Xiaoguang Guo, Zehong Wang, Jiazheng Li, Shawn Spitzel, Qi Yang, Kaize Ding, Jundong Li, Chuxu Zhang

发表机构 * University of Connecticut Storrs(康涅狄格大学斯托斯分校) University of Notre Dame(圣母大学) University of Virginia(弗吉尼亚大学) Northwestern University Evanston(西北大学埃文斯顿校区)

AI总结 针对图神经网络部署时稳定性与泛化性的权衡,提出STEM-GNN框架,通过令牌化专家混合编码器、向量量化接口和Lipschitz正则化头实现三方面平衡,在多种分布偏移和扰动下提升鲁棒性。

Comments Accepted to KDD 2026

详情
AI中文摘要

部署的图神经网络(GNN)在部署时是冻结的,但必须适应干净数据,在分布偏移下泛化,并对扰动保持稳定。我们表明静态推理引入了一个基本权衡:提高稳定性需要减少对偏移敏感特征的依赖,留下一个不可约的最坏情况泛化下限。实例条件路由可以打破这个上限,但很脆弱,因为偏移可能误导路由,扰动可能使路由波动。我们通过两个分解来捕捉这些效应:覆盖与选择分离,以及基础敏感性与波动放大分离。基于这些见解,我们提出了STEM-GNN,一个预训练-微调框架,包含一个用于多样化计算路径的专家混合编码器,一个用于稳定编码器到头部信号的向量量化令牌接口,以及一个用于限制输出放大的Lipschitz正则化头部。在九个节点、链接和图基准测试中,STEM-GNN实现了更强的三方面平衡,提高了对度/同质性偏移以及特征/边损坏的鲁棒性,同时在干净图上保持竞争力。

英文摘要

Deployed graph neural networks (GNNs) are frozen at deployment yet must fit clean data, generalize under distribution shifts, and remain stable to perturbations. We show that static inference induces a fundamental tradeoff: improving stability requires reducing reliance on shift-sensitive features, leaving an irreducible worst-case generalization floor. Instance-conditional routing can break this ceiling, but is fragile because shifts can mislead routing and perturbations can make routing fluctuate. We capture these effects via two decompositions separating coverage vs selection, and base sensitivity vs fluctuation amplification. Based on these insights, we propose STEM-GNN, a pretrain-then-finetune framework with a mixture-of-experts encoder for diverse computation paths, a vector-quantized token interface to stabilize encoder-to-head signals, and a Lipschitz-regularized head to bound output amplification. Across nine node, link, and graph benchmarks, STEM-GNN achieves a stronger three-way balance, improving robustness to degree/homophily shifts and to feature/edge corruptions while remaining competitive on clean graphs.

2605.07121 2026-06-15 cs.AI cs.LG 版本更新

AdaTKG: Adaptive Memory for Temporal Knowledge Graph Reasoning

AdaTKG: 用于时序知识图谱推理的自适应记忆

Seunghan Lee, Jun Seo, Jaehoon Lee, Sungdong Yoo, Minjae Kim, Tae Yoon Lim, Dongwan Kang, Hwanil Choi, SoonYoung Lee, Wonbin Ahn

发表机构 * LG AI Research(LG人工智能研究)

AI总结 提出AdaTKG,通过为每个实体维护自适应记忆,并采用可学习的指数移动平均更新,解决时序知识图谱中实体表示静态的问题,提升推理性能。

Comments KDD Workshop on Frontiers in Graph Machine Learning for the Large Model Era 2026 (Oral Presentation)

详情
AI中文摘要

时序知识图谱(TKG)表示带有时间戳的关系事实,并支持对演化事件进行广泛的推理任务。然而,现有方法生成的实体表示在实体层面是静态的,即每个表示仅是学习参数的函数,且不保留实体参与交互的任何痕迹。在本文中,我们摒弃这种静态观点,提出将每个实体建模为一个自适应过程,其表示在实体每次参与事实时被细化。为此,我们提出AdaTKG,它为每个实体维护一个记忆,该记忆随每次观察到的交互而更新,记忆在线累积,预测随更多交互的到来而改进。具体而言,我们将记忆更新实例化为一个可学习的指数移动平均,由单个共享标量控制,而不是为每个实体使用可学习参数,使AdaTKG能够处理训练中未见过的实体。大量实验证实了相对于TKG基线的持续改进,证明了自适应记忆的有效性。代码见:https://github.com/seunghan96/AdaTKG

英文摘要

Temporal knowledge graphs (TKGs) represent time-stamped relational facts and support a wide range of reasoning tasks over evolving events. However, existing methods produce entity representations that are static at the entity level, in that each representation is a function of learned parameters only and retains no trace of the interactions in which the entity has participated. In this paper, we depart from this static view and propose that each entity be modeled as an adaptive process whose representation is refined every time the entity participates in a fact. To this end, we propose AdaTKG, which maintains a per-entity memory that is updated with every observed interaction, with the memory accumulating online and predictions improving as more interactions arrive. Specifically, we instantiate the memory update as a learnable exponential moving average governed by a single shared scalar instead of using learnable parameters for each entity, enabling AdaTKG to handle entities unseen during training. Extensive experiments confirm consistent gains over TKG baselines, demonstrating the effectiveness of adaptive memory. Code is available at: https://github.com/seunghan96/AdaTKG
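
The per-entity memory update described above can be sketched as an exponential moving average whose rate comes from one shared scalar. In this illustration the scalar is a fixed `decay_logit` passed through a sigmoid (in training it would be learned), unseen entities start from a zero vector, and the class and method names are hypothetical; none of these details are confirmed by the abstract.

```python
import math
from typing import Dict, List


class EntityMemory:
    """Per-entity memory updated by an EMA governed by one shared scalar."""

    def __init__(self, dim: int, decay_logit: float = 0.0) -> None:
        self.dim = dim
        self.decay_logit = decay_logit  # shared across all entities (assumed learnable)
        self.store: Dict[str, List[float]] = {}

    @property
    def alpha(self) -> float:
        """Update rate in (0, 1), obtained from the shared scalar via a sigmoid."""
        return 1.0 / (1.0 + math.exp(-self.decay_logit))

    def read(self, entity: str) -> List[float]:
        """Entities never seen before read as a zero vector (assumption)."""
        return self.store.get(entity, [0.0] * self.dim)

    def update(self, entity: str, message: List[float]) -> List[float]:
        """Blend the old memory with the message from a newly observed fact."""
        a = self.alpha
        new = [(1.0 - a) * m + a * x for m, x in zip(self.read(entity), message)]
        self.store[entity] = new
        return new


memory = EntityMemory(dim=2)
first = memory.update("entity_a", [2.0, 4.0])
second = memory.update("entity_a", [0.0, 0.0])
```

Because the only trainable quantity in the update is the shared scalar, an entity that first appears at test time can be given a memory without any entity-specific parameters.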

2606.11898 2026-06-15 cs.CL cs.LG 版本更新

GraspLLM: Towards Zero-Shot Generalization on Text-Attributed Graphs with LLMs

GraspLLM:借助LLM实现文本属性图上的零样本泛化

Hengyi Feng, Zeang Sheng, Meiyi Qiang, Yang Li, Wentao Zhang

发表机构 * Peking University(北京大学) National University of Singapore(新加坡国立大学) University of California, Berkeley(加州大学伯克利分校)

AI总结 提出GraspLLM框架,通过融合图结构理解与LLM语义能力,利用基序感知对比学习和最优上下文子图对齐,实现跨数据集和跨任务的零样本泛化。

详情
AI中文摘要

近年来,对文本属性图(TAGs)的研究因其在引文网络、电子商务平台、社交媒体和网页等各类真实数据场景中的广泛应用而备受关注。受大语言模型(LLMs)卓越语义理解能力的启发,已有许多尝试将LLMs集成到TAGs中。然而,现有方法仍难以在不同图和任务间泛化,且其捕获可迁移图结构模式的能力有限。为此,我们提出了GraspLLM框架,该框架将图结构理解与LLM的语义理解能力相结合,以增强跨数据集和跨任务的泛化能力。具体而言,我们使用冻结的通用嵌入模型将不同图的节点文本表示在统一语义空间中,在此基础上,我们在多个基序诱导的邻接矩阵上进行基序感知对比学习,以提取与数据集无关的结构信息。然后,通过我们提出的最优上下文子图,为每个目标节点提取最相关的上下文子图,并通过对齐投影器将这些子图对齐到LLM的词元空间。在涵盖不同领域的TAG基准数据集上的大量实验表明,GraspLLM始终优于先前基于LLM的TAG方法,在零样本场景下尤为突出,突显了其在不同数据集和任务上的强泛化能力。我们的代码可在以下网址获取:https://github.com/Heinz217/GraspLLM。

英文摘要

Research on Text-Attributed Graphs (TAGs) has gained significant attention recently due to its broad applications across various real-world data scenarios, such as citation networks, e-commerce platforms, social media, and web pages. Inspired by the remarkable semantic understanding ability of Large Language Models (LLMs), there have been numerous attempts to integrate LLMs into TAGs. However, existing methods still struggle to generalize across diverse graphs and tasks, and their ability to capture transferable graph structural patterns remains limited. To address this, we introduce GraspLLM, a framework that combines graph structural comprehension with the semantic understanding prowess of LLMs to enhance cross-dataset and cross-task generalizability. Specifically, we represent node texts from different graphs in a unified semantic space with a frozen general embedding model, on top of which we perform motif-aware contrastive learning across multiple motif-induced adjacency matrices to extract dataset-agnostic structural information. Then, with our proposed optimal contextual subgraph, we extract the most contextually relevant subgraph for each target node and align these subgraphs to the token space of the LLM via an alignment projector. Extensive experiments on TAG benchmark datasets spanning diverse domains reveal that GraspLLM consistently outperforms previous LLM-based methods for TAGs, especially in zero-shot scenarios, highlighting its strong generalizability across different datasets and tasks. Our code is available at https://github.com/Heinz217/GraspLLM.

10. 迁移、元学习与持续学习 4 篇

2606.14155 2026-06-15 cs.LG cs.CL 新提交

Graph-based Target Back-Propagation for Context Adaptation in Multi-LLM Agentic Systems

基于图的目标反向传播用于多LLM智能体系统中的上下文自适应

Tan Zhu, Tong Yao, Kananart Kuwaranancharoen, Amit Singh, Yushang Lai, Deepa Mohan, Shankara Bhargava

发表机构 * Retail Intelligence, Walmart Global Tech(零售智能,沃尔玛全球技术)

AI总结 提出GTBP框架,通过图结构反向传播局部目标输出,实现多LLM智能体工作流的上下文自适应,理论保证稳定性,实验优于基线。

详情
AI中文摘要

上下文自适应通过迭代地从任务反馈中修改可调提示,无需修改模型权重,自动化了基于LLM系统中的提示工程。将这一范式扩展到多LLM智能体系统至关重要:现有方法存在不准确的信用分配问题且缺乏收敛保证。我们提出基于图的目标反向传播(GTBP),一种针对建模为有向无环图的智能体工作流的上下文自适应框架。GTBP通过工作流图向后传播局部目标输出,并利用目标-输出差异指导阶段式提示更新机制。理论上,我们证明GTBP的阶段式提示更新在迭代中变得稳定,且足够强大的LLM优化器可以降低整体目标。实验上,GTBP在三个基准测试中一致优于强基线,同时保持可比较的计算成本。

英文摘要

Context adaptation automates prompt engineering in LLM-based systems by iteratively revising tunable prompts from task feedback, without modifying model weights. Extending this paradigm to multi-LLM agentic systems is crucial: existing methods suffer from inaccurate credit assignment and lack convergence guarantees. We propose Graph-based Target Back-Propagation (GTBP), a context adaptation framework for agentic workflows modeled as directed acyclic graphs. GTBP propagates local target outputs backward through the workflow graph and uses target-output discrepancies to guide a stage-wise prompt update mechanism. Theoretically, we show that GTBP's stage-wise prompt updates become stable over iterations, and that a sufficiently capable LLM optimizer can decrease the overall objective. Empirically, GTBP consistently outperforms strong baselines across three benchmarks while maintaining comparable computational cost.

2606.14222 2026-06-15 cs.LG 新提交

Learning the Context of Errors: Black-Box Online Adaptation of Time Series Foundation Models

学习错误的上下文:时间序列基础模型的黑盒在线自适应

Xilin Dai, Yiding Liu, Hongjie Xia, Yifan Hu, Zewei Dong, Jiang-Ming Yang, Qiang Xu

发表机构 * Ant International(蚂蚁国际) The Chinese University of Hong Kong(香港中文大学)

AI总结 针对黑盒时间序列基础模型在线自适应问题,提出ORCA方法,通过学习基础模型预测误差的上下文(输入和输出)进行自适应,在5个模型和8个数据集上验证有效性。

详情
AI中文摘要

时间序列基础模型(TSFMs)的快速发展推动了跨领域的零样本预测。受当前大型语言模型形式的启发,未来的TSFMs可能作为商业化的闭源API服务提供。然而,许多现有的在线自适应方法仍然依赖于白盒访问进行参数微调或梯度反向传播。这种范式不匹配引发了一个问题:在TSFMs的黑盒在线自适应中,我们应该学习什么?我们用一个见解来回答:基础模型的预测误差取决于基础模型的输入和输出(即错误的上下文)。为了验证这一见解,我们提出了ORCA(在线残差上下文自适应)。我们在5个最先进的TSFMs和8个数据集上进行了大量实验,以证明我们方法的有效性。此外,通过消融研究,我们定量分析了不同适配器学习假设对黑盒在线自适应最终性能的影响。代码可在 https://github.com/Fifthky/ORCA 获取。

英文摘要

The rapid evolution of Time Series Foundation Models (TSFMs) has advanced zero-shot forecasting across diverse domains. Inspired by the current form of Large Language Models, future TSFMs may be offered as commercialized, closed-source API services. However, many existing online adaptation methods still rely on white-box access for parameter fine-tuning or gradient backpropagation. This paradigm mismatch raises a question: In black-box online adaptation for TSFMs, what should we learn? We answer this with an insight: the predictive errors of the base model are conditioned on both the input and output of the base model (i.e., the context of errors). To validate this insight, we propose ORCA (Online Residual Contextual Adaptation). We conduct extensive experiments across 5 state-of-the-art TSFMs and 8 datasets to demonstrate the effectiveness of our approach. Furthermore, through ablation studies, we quantitatively analyze the impact of different adapter learning hypotheses on the final adaptation performance in black-box online adaptation. Code available at https://github.com/Fifthky/ORCA.
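The insight that the base model's error depends on both its input and its output can be illustrated with a toy online residual adapter. This is a sketch under stated assumptions, not ORCA itself: `base_model` is a hypothetical stand-in for a black-box forecaster (a simple persistence forecast), and `ResidualAdapter` is an illustrative linear model trained by plain stochastic gradient steps. The adapter never touches the base model's parameters or gradients; it only sees the input history, the returned forecast, and later the true value.

```python
def base_model(history):
    """Hypothetical black-box forecaster: predicts the last observed value."""
    return history[-1]


class ResidualAdapter:
    """Linear model of the residual, using features [bias, last input, forecast]."""

    def __init__(self, lr=0.05):
        self.w = [0.0, 0.0, 0.0]
        self.lr = lr

    def _features(self, history, forecast):
        return [1.0, history[-1], forecast]

    def correct(self, history, forecast):
        x = self._features(history, forecast)
        return forecast + sum(w * xi for w, xi in zip(self.w, x))

    def update(self, history, forecast, truth):
        x = self._features(history, forecast)
        err = (truth - forecast) - sum(w * xi for w, xi in zip(self.w, x))
        self.w = [w + self.lr * err * xi for w, xi in zip(self.w, x)]


# Toy series alternating between 1.5 and 0.5, so persistence is always off by 1.
series = [1.5 if t % 2 == 0 else 0.5 for t in range(2001)]
adapter = ResidualAdapter()
base_errors, adapted_errors = [], []

for t in range(1, len(series)):
    history, truth = series[:t], series[t]
    forecast = base_model(history)                  # query the black box
    adapted = adapter.correct(history, forecast)    # predict before seeing truth
    base_errors.append(abs(truth - forecast))
    adapted_errors.append(abs(truth - adapted))
    adapter.update(history, forecast, truth)        # learn once truth arrives

base_mae = sum(base_errors[-100:]) / 100
adapted_mae = sum(adapted_errors[-100:]) / 100
```

The predict-then-update order matters: each correction is produced before the true value is revealed, which is what makes the evaluation online.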

2606.14023 2026-06-15 stat.ML cs.LG stat.ME 交叉投稿

Geometric Domain Adaptation via Optimal Transport for Linear Regression in R^2

R^2中线性回归的几何域自适应:基于最优传输

Brian Britos, Mathias Bourel

发表机构 * University of the People(人民大学)

AI总结 针对源域与目标域存在旋转、平移或缩放变换的线性回归问题,提出结合K-means与最优传输的方法估计变换,实现目标数据稀缺时的模型自适应,理论证明p≥2时最优传输恢复变换。

详情
AI中文摘要

最优传输最近通过对齐源分布和目标分布,成为域自适应的一种强大方法。我们研究了一个监督域自适应问题,其中源域和目标域在$\mathbb{R}^2$中通过旋转、平移或缩放相关联。我们证明,当使用$p \geq 2$的$p$-范数成本时,最优传输映射能够恢复底层映射。基于这一见解,我们开发了一种结合$K$-means和最优传输的方法来估计底层映射,从而在目标数据稀缺时实现线性回归模型的自适应。模拟表明,与基线方法相比,性能有所提升。我们不依赖高表达力的深度学习架构,而是专注于经典机器学习模型,以强调可解释性和理论洞察。这一视角使我们能够明确刻画最优传输在恢复旋转、平移和缩放等几何变换中的作用。我们的贡献包括一个将最优传输与$\mathbb{R}^2$中的旋转、平移和缩放联系起来的理论结果,以及一种用于线性回归自适应的实用方法,在该空间的域自适应任务中既提供概念清晰性又具有应用价值。

英文摘要

Optimal Transport has recently become a powerful method for domain adaptation by aligning source and target distributions. We study a supervised domain adaptation problem where source and target domains are related by a rotation, a translation, or a homothety in $\mathbb{R}^2$. We prove that the optimal transport map recovers the underlying map when using a $p$-norm cost with $p \geq 2$. Based on this insight, we develop a method combining $K$-means and optimal transport to estimate the underlying map, enabling adaptation of linear regression models when target data is scarce. Simulations demonstrate improved performance over baseline methods. Rather than relying on highly expressive deep learning architectures, we focus on classical machine learning models to emphasize interpretability and theoretical insight. This perspective allows us to explicitly characterize the role of optimal transport in recovering geometric transformations such as rotations, translations, and homotheties. Our contributions include a theoretical result linking optimal transport and rotations, translations, and homotheties in $\mathbb{R}^2$, and a practical method for adaptation in linear regression offering both conceptual clarity and applied value in domain adaptation tasks in this space.
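A small numerical check can make the recovery claim concrete for the simplest case, a pure translation. The sketch below is illustrative only: it uses the squared Euclidean cost (an assumption chosen for simplicity, not necessarily the paper's exact cost) and solves the discrete transport problem between two equally weighted point sets by brute force over permutations, which is only feasible for a handful of points. The point sets and the helper names are hypothetical.

```python
from itertools import permutations


def sq_cost(a, b):
    """Squared Euclidean distance between two points in R^2."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def optimal_assignment(source, target):
    """Brute-force optimal transport between two equally weighted point sets.

    With uniform weights and equal sizes an optimal plan is a permutation,
    so enumerating permutations is exact (but only practical for tiny sets).
    """
    n = len(source)
    return min(
        permutations(range(n)),
        key=lambda p: sum(sq_cost(source[i], target[p[i]]) for i in range(n)),
    )


source = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
shift = (5.0, -2.0)  # the "unknown" translation to recover
shifted = [(x + shift[0], y + shift[1]) for x, y in source]
target = [shifted[2], shifted[0], shifted[3], shifted[1]]  # shuffled order

perm = optimal_assignment(source, target)
estimated = [
    (target[j][0] - source[i][0], target[j][1] - source[i][1])
    for i, j in enumerate(perm)
]
```

Every matched pair yields the same displacement, so the transport plan pairs each source point with its translated image and the translation can be read off directly.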

2601.04885 2026-06-15 cs.CL cs.AI cs.LG 版本更新

CuMA: Aligning LLMs with Sparse Cultural Values via Demographic-Aware Mixture of Adapters

CuMA: 通过人口统计感知的适配器混合使大语言模型与稀疏文化价值观对齐

Ao Sun, Xiaoyu Wang, Zhe Tan, Yu Li, Jiachen Zhu, Yuheng Jia, Shu Su

发表机构 * Southeast University(东南大学) ByteDance Inc.(字节跳动公司) Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China(新一代人工智能技术及其交叉应用重点实验室(东南大学),中华人民共和国教育部,中国)

AI总结 提出CuMA框架,通过人口统计感知路由将冲突梯度分离到专家子空间,解决密集模型在多文化对齐中的均值崩溃问题,在WorldValuesBench等基准上取得最优性能。

Comments ACL 2026 Main

详情
AI中文摘要

随着大语言模型服务于全球用户,对齐必须从强制执行普遍共识转向尊重文化多元主义。我们证明,密集模型在被迫适应冲突的价值分布时会出现均值崩溃,收敛到无法代表不同群体的通用平均值。我们将其归因于文化稀疏性,其中梯度干扰阻止密集参数跨越不同的文化模式。为解决此问题,我们提出CuMA(Cultural Mixture of Adapters),一个将对齐视为条件容量分离问题的框架。通过引入人口统计感知路由,CuMA内化了一个潜在文化拓扑,以将冲突梯度明确解耦到专门的专家子空间中。在WorldValuesBench、Community Alignment和PRISM上的广泛评估表明,CuMA达到了最先进的性能,显著优于密集基线和仅语义MoE。关键的是,我们的分析证实CuMA有效缓解了均值崩溃,保留了文化多样性。我们的代码可在 https://github.com/Throll/CuMA 获取。

英文摘要

As Large Language Models (LLMs) serve a global audience, alignment must transition from enforcing universal consensus to respecting cultural pluralism. We demonstrate that dense models, when forced to fit conflicting value distributions, suffer from Mean Collapse, converging to a generic average that fails to represent diverse groups. We attribute this to Cultural Sparsity, where gradient interference prevents dense parameters from spanning distinct cultural modes. To resolve this, we propose CuMA (Cultural Mixture of Adapters), a framework that frames alignment as a conditional capacity separation problem. By incorporating demographic-aware routing, CuMA internalizes a Latent Cultural Topology to explicitly disentangle conflicting gradients into specialized expert subspaces. Extensive evaluations on WorldValuesBench, Community Alignment, and PRISM demonstrate that CuMA achieves state-of-the-art performance, significantly outperforming both dense baselines and semantic-only MoEs. Crucially, our analysis confirms that CuMA effectively mitigates mean collapse, preserving cultural diversity. Our code is available at https://github.com/Throll/CuMA.

11. 数据集、基准与评测 25 篇

2606.13823 2026-06-15 cs.LG eess.SP stat.ML 新提交

A Stationarity-and-Coupling Criterion for Training-Free Time-Lagged Spectral Embeddings of Multivariate Time Series

多变量时间序列无训练时滞谱嵌入的平稳性与耦合准则

Siddharth Pal, Viktoria Rojkova

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 提出基于时滞相关矩阵截断的固定长度描述符D(τ),通过平稳高斯VAR(1)模型推导其适用条件:信号近似平稳且类别信息存在于跨通道时间耦合而非边际功率。

Comments 25 pages, 2 figures, 10 tables

详情
AI中文摘要

我们研究多变量时间序列的无训练固定长度描述符,不仅问这样的描述符是否表现良好,而且问何时可以预期它有效。我们的研究对象是$D(\tau)$,它由时滞相关矩阵在Marchenko-Pastur边缘截断构建,使得仅信号承载的特征值存活,并通过与类质心的余弦相似度分类,零学习参数。核心贡献不是描述符本身,而是一个可证伪的适用性准则。基于平稳高斯VAR(1)模型,我们论证当信号近似平稳且类别信息存在于它们的跨通道时间耦合而非边际每通道功率时,$D(\tau)$能分离两个类别。我们半正式地推导出三个结果:可区分性条件、为什么静态($\tau=0$)协方差退化为随机、以及为什么平稳但功率判别范式会击败描述符。该准则是可操作的:一个两部分预检测试——增强Dickey-Fuller平稳性检验和功率基线饱和检验——在任何训练前预测适用性。我们在混合数据集上验证了这两部分。在满足准则的四个范式(Sleep-EDF、BCI-IV-2a、MIT-BIH、ESC-50)上,描述符以极低成本与强基线竞争,在Sleep-EDF上20受试者留一法下达到$88.5\pm4.5\%$,单CPU线程。在违反准则的三个范式——非平稳ERP、以及功率判别的金融波动和可穿戴压力模式——上,它完全如预检预测的那样失败,而这些负面结果更具信息量。我们明确$D(\tau)$不是最准确的表示;其价值在于它是一个紧凑、无训练的嵌入,其有效域事先已知。

英文摘要

We study training-free fixed-length descriptors for multivariate time series and ask not merely whether such a descriptor performs well, but when it can be expected to work at all. Our object of study is $D(\tau)$, built from a time-lagged correlation matrix truncated at the Marchenko-Pastur edge so that only signal-bearing eigenvalues survive, and classified by cosine similarity to class centroids with zero learned parameters. The central contribution is not the descriptor but a falsifiable applicability criterion for it. Working from a stationary Gaussian VAR(1) model, we argue that $D(\tau)$ separates two classes when the signals are approximately stationary and the class information lives in their cross-channel temporal coupling rather than in marginal per-channel power. We derive, semi-formally, three consequences: a distinguishability condition, why the static ($\tau=0$) covariance collapses to chance, and why a stationary but power-discriminated paradigm defeats the descriptor. The criterion is operational: a two-part pre-flight test -- an augmented Dickey-Fuller stationarity check and a power-baseline saturation check -- predicts applicability before any training. We validate both halves on a mixed assortment. On four paradigms that satisfy the criterion (Sleep-EDF, BCI-IV-2a, MIT-BIH, ESC-50) the descriptor is competitive with strong baselines at a fraction of their cost, reaching $88.5\pm4.5\%$ under 20-subject leave-one-subject-out on Sleep-EDF on a single CPU thread. On three that violate it -- non-stationary ERPs, and financial-volatility and wearable-stress regimes that are power-discriminated -- it fails exactly as the pre-flight predicts, and these negatives are the more informative half. We are explicit that $D(\tau)$ is not the most accurate representation; its value is a compact, training-free embedding whose domain of validity is known in advance.

2606.14123 2026-06-15 cs.LG cs.AI 新提交

Recovering Stranded Discrimination in Knowledge Tracing: Per-Item Bias Correction via Empirical-Bayes Shrinkage

知识追踪中恢复被搁置的区分能力:通过经验贝叶斯收缩进行逐项偏差校正

Xiaoran Yan, Cheng Tang, Atsushi Shimada

发表机构 * Kyushu University(九州大学)

AI总结 提出SLC方法,利用Laplace/IRLS将二值观测转化为高斯伪观测,通过卡尔曼平滑器进行经验贝叶斯收缩,并拟合偏移Platt链接,以校正知识追踪模型中的逐项偏差,恢复被搁置的区分能力,在多个数据集和骨干网络上提升AUC和NLL。

Comments 25 pages, 3 figures. Accepted at ECML PKDD 2026 (Research Track). Code: https://github.com/xiaoran-y/SLC

详情
AI中文摘要

部署的知识追踪模型通常在训练后被冻结,但由于骨干架构中逐项表达能力的限制以及部署后项目属性的变化,会出现系统性的逐项logit偏差,从而降低预测质量。全局事后校准器(如Platt缩放、温度缩放和保序回归)能改善概率估计,但无法改变由AUC衡量的区分能力。这种AUC不变性是单调分数变换的结构性结果;恢复被搁置的区分能力需要以项目身份为条件。我们提出SLC(状态空间logit校正),通过Laplace/IRLS将二值观测转换为高斯伪观测,通过卡尔曼平滑器应用经验贝叶斯收缩,并拟合偏移Platt链接。状态空间公式还产生了一个可检测性界限,表征了伯努利信息下限,解释了在当前数据密度下时间跟踪为何没有益处。在四个数据集、五个骨干网络和三个随机种子上,SLC在所有四个数据集上提升了AUC,在三个数据集上提升了NLL,优势集中在稀疏项目上。跨领域控制表明,当部署的骨干网络留下实体级偏差时,类似现象可能出现在教育领域之外。

英文摘要

Deployed knowledge-tracing models are typically frozen after training, yet systematic per-item logit bias arises, from limited per-item expressivity in backbone architectures and from post-deployment shifts in item properties, degrading prediction quality. Global post-hoc calibrators such as Platt scaling, temperature scaling, and isotonic regression improve probability estimates but leave discriminative ability, as measured by AUC, unchanged. This AUC invariance is a structural consequence of monotone score-only transforms; recovering the stranded discrimination requires conditioning on item identity. We propose SLC (State-space Logit Correction), which converts binary observations to Gaussian pseudo-observations via Laplace/IRLS, applies empirical-Bayes shrinkage through a Kalman smoother, and fits an offset-Platt link. The state-space formulation also yields a detectability bound that characterizes the Bernoulli information floor, explaining why temporal tracking provides no benefit at current data densities. Across four datasets, five backbones, and three seeds, SLC improves AUC on all four datasets and NLL on three, with the advantage concentrating on sparse items. Cross-domain controls suggest that the same phenomenon can arise beyond education when the deployed backbone leaves entity-level bias.
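The AUC-invariance argument can be checked on a toy example. The sketch below is not SLC: the records, the Platt coefficients, and the per-item offsets in `item_offset` are hand-picked, hypothetical values (SLC estimates its corrections with empirical-Bayes shrinkage). It only demonstrates the structural point that a global monotone transform leaves the ranking, and therefore AUC, unchanged, while a correction conditioned on item identity can reorder scores.

```python
import math


def auc(scores, labels):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def platt(logit, a=2.0, b=-1.0):
    """Global Platt scaling: a monotone (a > 0) map applied to every score."""
    return 1.0 / (1.0 + math.exp(-(a * logit + b)))


# Toy records: (item id, backbone logit, observed correctness).
records = [
    ("easy", 0.2, 1), ("easy", 0.1, 1), ("easy", -0.3, 0),
    ("hard", 0.4, 0), ("hard", 0.6, 0), ("hard", 0.9, 1),
]
logits = [z for _, z, _ in records]
labels = [y for _, _, y in records]

# Hypothetical per-item bias corrections, chosen by hand for illustration.
item_offset = {"easy": 1.0, "hard": -1.0}

base_auc = auc(logits, labels)
global_auc = auc([platt(z) for z in logits], labels)
item_auc = auc([z + item_offset[item] for item, z, _ in records], labels)
```

Here the backbone under-scores the easy item and over-scores the hard one, so only the item-conditioned correction changes the ranking across items.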

2606.14353 2026-06-15 cs.LG 新提交

Can Deep Neural Networks Improve Compression of Very Large Scientific Data?

深度神经网络能否改善超大规模科学数据的压缩?

Muhannad Alhumaidi, Guozhong Li, Spiros Skiadopoulos, Panos Kalnis

发表机构 * King Abdullah University of Science and Technology(阿卜杜拉国王科技大学) University of the Peloponnese(伯罗奔尼撒大学)

AI总结 本文提出将深度学习预测器集成到传统误差有界压缩框架中,通过气候数据实验发现,尽管ML预测器能提高预测精度和重建质量,但由于残差空间结构影响熵编码效率,未能提升整体压缩比。

详情
AI中文摘要

误差有界有损压缩是管理现代模拟和观测仪器产生的快速增长的科学数据的基本技术。大多数最先进的压缩器遵循预测-残差范式,其中压缩效果取决于预测器的质量:更准确的预测产生更小的残差,更容易压缩。这一观察提出了一个问题:现代机器学习模型能否作为科学数据压缩的优越预测器?直接回答这个问题具有挑战性,因为开发特定于压缩的ML预测器需要大量资源。相反,我们利用气候领域,其中已经存在高度准确的预训练天气预报基础模型,使其成为理想的测试平台。我们提出了一个框架,将空间和时间深度学习模型集成到传统的误差有界压缩流水线中。该框架支持自回归预测模型,并避免误差累积。使用ERA5气候数据作为代表性的大规模科学数据集,我们评估了三种不同的ML预测器:基于VAEformer的编解码器(CRA5)、图神经网络预测器(GraphCast)和视觉变换器预测器(Aurora),与最先进的压缩器SZ3.1在相同的量化和熵编码后端下进行比较。我们对约1.7 TB数据的评估揭示了一个令人惊讶的结果:尽管ML预测器生成更准确的预测,并且可以将重建质量提高多达91%,同时对于高度可预测的变量将压缩比最高提升9.6倍,但它们并没有提高整体数据集级别的压缩比。我们表明,仅预测准确性是不够的:所得残差的空间结构在熵编码效率中起决定性作用。

英文摘要

Error-bounded lossy compression is a fundamental technique for managing the rapidly growing volumes of scientific data produced by modern simulations and observational instruments. Most state-of-the-art compressors follow a prediction-residual paradigm, where compression effectiveness depends on the quality of the predictor: more accurate predictions generate smaller residuals that are easier to compress. This observation raises a question: can modern machine learning models serve as superior predictors for scientific data compression? Answering this question directly is challenging because developing compression-specific ML predictors requires substantial resources. Instead, we leverage the climate domain where highly accurate pretrained weather forecasting foundation models already exist, making them an ideal testbed. We present a framework that integrates spatial and temporal deep learning models into a conventional error-bounded compression pipeline. The framework supports auto-regressive forecasting models and avoids error accumulation. Using ERA5 climate data as a representative large-scale scientific dataset, we evaluate three distinct ML predictors: a VAEformer-based codec (CRA5), a graph neural network forecaster (GraphCast), and a vision-transformer forecaster (Aurora), against the state-of-the-art compressor SZ3.1 under identical quantization and entropy-coding backends. Our evaluation over approximately 1.7 TB of data reveals a surprising result: although ML predictors generate more accurate predictions and can improve reconstruction quality by up to 91% while achieving up to 9.6x higher compression ratios for highly predictable variables, they do not improve overall dataset-level compression ratio. We show that prediction accuracy alone is insufficient: the spatial structure of the resulting residuals plays a decisive role in entropy coding efficiency.
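The prediction-residual paradigm with an error bound can be sketched in a few lines. This is a generic illustration, not the paper's pipeline or SZ3's code: the predictor is a trivial previous-value stand-in for the ML forecasters, entropy coding is omitted, and the function names are hypothetical. The key detail is that the compressor predicts from the reconstructed value rather than the original one, so compressor and decompressor stay in sync and quantization errors do not accumulate.

```python
def compress(values, error_bound):
    """Quantize prediction residuals so that |original - reconstructed| <= error_bound."""
    codes, prev_recon = [], 0.0
    bin_width = 2.0 * error_bound
    for v in values:
        prediction = prev_recon                 # stand-in predictor (previous value)
        code = round((v - prediction) / bin_width)
        codes.append(code)                      # small integers; entropy-coded in practice
        prev_recon = prediction + code * bin_width  # track what the decoder will see
    return codes


def decompress(codes, error_bound):
    """Replay the same predictor on reconstructed values and add the residuals."""
    recon, prev_recon = [], 0.0
    bin_width = 2.0 * error_bound
    for code in codes:
        prediction = prev_recon
        prev_recon = prediction + code * bin_width
        recon.append(prev_recon)
    return recon


data = [10.0, 10.4, 10.9, 11.1, 15.0, 15.2]
eb = 0.25
codes = compress(data, eb)
recon = decompress(codes, eb)
max_err = max(abs(a - b) for a, b in zip(data, recon))
```

A better predictor shrinks the integer codes toward zero, which is what makes them cheaper to entropy-code; the abstract's finding is that the spatial structure of those residuals matters as well as their size.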

2606.14397 2026-06-15 cs.LG 新提交

Running the Gauntlet: Re-evaluating the Capabilities of Agents Beyond Familiar Environments

严苛考验:重新评估智能体在熟悉环境之外的能力

Mykola Vysotskyi, Runqi Lin, Grzegorz Biziel, Michal Zakrzewski, Sebastian Montagna, Damian Rynczak, Shreyansh Padarha, Kumail Alhamoud, Zihao Fu, William Lugoloobi, Kai Rawal, Hanna Yershova, Xander Davies, Taras Rumezhak, Guohao Li, Fazl Barez, Baoyuan Wu, Arkadiusz Drohomirecki, Yarin Gal, Chris Russell, Christopher Summerfield, Adam Mahdi, Volodymyr Karpiv, Philip Torr, Adel Bibi

发表机构 * University of Oxford(牛津大学) SoftServe Massachusetts Institute of Technology(麻省理工学院) The Chinese University of Hong Kong, Shenzhen(香港中文大学(深圳)) UK AI Security Institute(英国人工智能安全研究所) Ukrainian Catholic University(乌克兰天主教大学)

AI总结 提出GauntletBench基准,通过20个视觉密集型任务评估智能体在时间感知、图形理解和3D推理等未被充分探索的能力,发现最先进智能体成功率仅19.1%,远低于人类80%以上。

详情
AI中文摘要

随着智能体系统不断发展并广泛部署于现实场景,对其能力进行忠实评估的需求日益增长。然而,当前的基准通常基于流行应用,任务相对简单,且关注狭窄的能力集,忽略了更广泛的维度,导致现代智能体性能饱和,无法探测其局限性。为此,我们引入了GauntletBench,一个基于网络的基准,用于评估智能体在挑战性场景中的泛化能力,重点关注三个未被充分探索的能力(时间感知、图形理解和3D推理),涵盖五个较少被覆盖的专业应用(视频编辑器、工作流构建器、3D建模器、飞行分析器和电路设计器),每个应用包含20个视觉密集型任务(共100个)。我们的基准提供了一个模块化流水线,包括一个与开源和闭源智能体框架兼容的环境、一个受控的基于网络的应用、一个结构良好的任务套件,以及一个具有多样化指标的自动评估引擎。与广泛预期相反,我们的实证结果表明,前沿智能体系统远未达到人类水平的表现。即使是最先进的智能体,在我们的GauntletBench上也仅达到19.1%的成功率,凸显了这些被忽视的能力和泛化方面的局限性。相比之下,非专家人类标注者在我们具有挑战性但可行的任务上实现了超过80%的成功率,揭示了当前智能体能力与复杂现实场景所需能力之间的巨大差距。

英文摘要

As agentic systems continue to evolve and are widely deployed in real-world scenarios, there is a growing demand to faithfully evaluate their capabilities. However, current benchmarks are typically built on popular applications with relatively simple tasks and focus on a narrow set of capabilities while overlooking broader dimensions, resulting in saturated performance on modern agents and failing to probe their limitations. To this end, we introduce GauntletBench, a web-based benchmark for evaluating agent generalisation in challenging scenarios, focusing on three underexplored capabilities (temporal perception, graphical understanding, and 3D reasoning), across five less-covered professional applications (Video Editor, Workflow Builder, 3D Modeller, Flight Analyser, and Circuit Designer), each with 20 vision-intensive tasks (100 in total). Our benchmark provides a modular pipeline that comprises an environment compatible with both open- and closed-source agent frameworks, a controlled web-based application, a well-structured task suite, and an automated evaluation engine with diverse metrics. Contrary to widespread expectations, our empirical results reveal that frontier agentic systems remain far from achieving human-level performance. Even the state-of-the-art agent achieves only a 19.1% success rate on our GauntletBench, highlighting the limitations in these overlooked capabilities and generalisation. By comparison, non-expert human annotators achieve over 80% success on our challenging yet feasible tasks, revealing the substantial gap between current agent capabilities and those required for complex real-world scenarios.

2606.14492 2026-06-15 cs.LG 新提交

Recipe-Controlled Decoder Audit for Structural Knowledge-Graph Completion

配方控制的解码器审计用于结构知识图谱补全

Xihang Shan, Ye Luo

发表机构 * School of Mathematical Sciences, Xiamen University(厦门大学数学科学学院) School of Informatics, Xiamen University(厦门大学信息学院)

AI总结 提出配方控制的解码器审计方法,通过交换解码器评估其对知识图谱补全性能的影响,发现解码器效果受配方和来源影响,并建议在编码器层面声明前进行解码器×深度扫描。

Comments 11 pages, 5 figures. Code and artifacts: https://github.com/AndyShan11/kgc-decoder-audit

详情
AI中文摘要

我们提出了一种用于结构直推式知识图谱补全(KGC)的配方控制解码器审计(RCDA)。该审计提出了一个简单的报告问题:在将性能提升归因于编码器或训练配方之前,当在相同配方下交换解码器时,会发生什么变化?使用ComplEx和DistMult作为主要控制对,并辅以针对性的RotatE/TransE抽查,我们评估了七个基准。在五个标准知识图谱上,在我们的配方下,ComplEx与DistMult的差异虽小但一致(MRR增加+0.005至+0.012),而CompGCN风格的编码器效果因数据集而异。在小知识图谱上,解码器效果成为主要诊断指标:Kinship显示ComplEx稳定优势为+0.143 MRR(6个种子),而UMLS在干净的6种子服务器重跑中偏好ComplEx(+0.022 MRR),但在早期来源变体中结果相反。因此,我们将小知识图谱的解码器选择视为对配方和来源敏感,而非固定的数据集胜者。我们进一步表明,在WN18RR上解码器选择与编码器深度存在交互,且在我们的配方下,YAGO3-10上L=0的ComplEx在d=128时达到0.6971 ± 0.0048 MRR。结果是一个紧凑的审计协议:报告匹配的解码器行,记录小知识图谱来源,并在做出编码器层面声明之前进行解码器×深度扫描。

英文摘要

We present a recipe-controlled decoder audit (RCDA) for structural transductive knowledge-graph completion (KGC). The audit asks a simple reporting question: before attributing gains to an encoder or training recipe, what changes when the decoder is swapped under the same recipe? Using ComplEx and DistMult as the primary controlled pair, with targeted RotatE/TransE spot-checks, we evaluate seven benchmarks. On five standard KGs, ComplEx-vs-DistMult differences are modest but consistent under our recipe (+0.005 to +0.012 MRR), whereas CompGCN-style encoder effects vary more by dataset. On small KGs, decoder effects become the main diagnostic: Kinship shows a stable ComplEx advantage of +0.143 MRR (6 seeds), while UMLS favours ComplEx by +0.022 MRR in a clean 6-seed server rerun but reverses in an earlier provenance variant. We therefore treat small-KG decoder choice as recipe- and provenance-sensitive rather than as a fixed dataset winner. We further show that decoder choice interacts with encoder depth on WN18RR, and that under our recipe L=0 ComplEx on YAGO3-10 reaches 0.6971 +/- 0.0048 MRR at d=128. The result is a compact audit protocol: report matched decoder rows, log small-KG provenance, and sweep decoder x depth before making encoder-level claims.
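For readers unfamiliar with the two decoders being swapped, their standard scoring functions can be written down directly. The sketch below uses toy, hand-picked embeddings (assumptions for illustration, not values from the paper) and shows one structural difference that a decoder swap changes: DistMult scores are symmetric in head and tail, whereas ComplEx can score the two directions differently.

```python
def distmult_score(h, r, t):
    """DistMult: sum_k h_k * r_k * t_k over real-valued embeddings."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


def complex_score(h, r, t):
    """ComplEx: Re(sum_k h_k * r_k * conj(t_k)) over complex-valued embeddings."""
    return sum(hi * ri * ti.conjugate() for hi, ri, ti in zip(h, r, t)).real


# Toy real-valued embeddings for DistMult.
h_real, r_real, t_real = [1.0, 2.0], [0.5, -1.0], [3.0, 1.0]
dm_forward = distmult_score(h_real, r_real, t_real)
dm_backward = distmult_score(t_real, r_real, h_real)   # swap head and tail

# Toy complex-valued embeddings for ComplEx.
h_c, r_c, t_c = [1 + 2j, 0.5 - 1j], [0 + 1j, 1 + 1j], [2 - 1j, -1 + 0.5j]
cx_forward = complex_score(h_c, r_c, t_c)
cx_backward = complex_score(t_c, r_c, h_c)             # swap head and tail
```

This asymmetry is one reason the two decoders can behave differently on relation-heavy small graphs, which is the kind of effect the audit asks authors to report before crediting an encoder.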

2606.14604 2026-06-15 cs.LG cs.AI 新提交

A Comparative Study of Deep Learning Architectures for Multi-Horizon Behavioural Forecasting for Mobile Health

移动健康多时间范围行为预测的深度学习架构比较研究

Pavlos Nicolaou, Kleanthis Malialis, Artemis Kontou, Panayiotis Kolios

发表机构 * KIOS Research and Innovation Center of Excellence, University of Cyprus(塞浦路斯大学KIOS研究与创新卓越中心) Department of Electrical and Computer Engineering, University of Cyprus(塞浦路斯大学电气与计算机工程系)

AI总结 本研究在三个公开数据集上系统比较了六种深度学习架构、两种零样本基础模型和统计基线在1-8天时间范围内的行为预测性能,发现PatchTST表现最佳,基础模型TimesFM在低数据场景下可与训练模型匹敌,且参与者级微调可将RMSE降低16-60%。

详情
AI中文摘要

可穿戴设备和智能手机生成丰富的行为时间序列,可支持主动健康干预,但缺乏对这些数据现代预测架构的系统比较。特别是,模型如何在人群中泛化、不同架构如何响应参与者级微调以及预测精度如何在多天范围内下降仍不清楚。我们在三个涵盖800多名参与者的公开数据集上基准测试了六种深度学习架构、两种零样本基础模型(FM)和统计基线,报告了步数、屏幕时间和睡眠时长在1-8天范围内的逐特征指标。我们进一步对所有六种架构进行了逐特征个性化研究,并评估了FM在不同数据集大小和时间粒度上的迁移性。我们的主要发现是:(i)没有单一架构占主导地位,PatchTST在训练模型中领先,而紧随其后的三种模型(TCN、MLP、Transformer)之间没有实质性的性能差异;(ii)FM TimesFM在零样本情况下匹配或超过训练模型,尤其是在低数据场景下;(iii)参与者级微调将逐特征RMSE降低了16-60%,其中睡眠受益最大,步数受益最小。这些结果为移动健康预测中的架构选择、FM适用性和个性化策略提供了实用指导。据我们所知,这是首个联合评估现代深度学习、FM和个性化用于可穿戴设备多时间范围行为预测的研究。

英文摘要

Wearable devices and smartphones generate rich behavioural time series that can support proactive health interventions, yet systematic comparisons of modern forecasting architectures for these data are lacking. In particular, it remains unclear how models generalise across populations, how different architectures respond to participant-level fine-tuning, and how forecasting accuracy degrades across multi-day horizons. We benchmark six deep learning architectures, two zero-shot Foundation Models (FMs), and statistical baselines on three public datasets encompassing over 800 participants, reporting per-feature metrics for step counts, screen time, and sleep duration across 1-8 day horizons. We further conduct a per-feature personalisation study across all six architectures and assess FM transferability across dataset sizes and temporal granularities. Our key findings are: (i) no single architecture dominates: PatchTST leads among trained models, while the three runners-up (TCN, MLP, Transformer) show no meaningful performance difference; (ii) the FM TimesFM matches or exceeds trained models zero-shot, especially in low-data regimes; and (iii) participant-level fine-tuning reduces per-feature RMSE by 16-60%, with sleep benefiting most and step counts least. These results provide practical guidance on architecture selection, FM applicability, and personalisation strategies for mobile health forecasting. To the best of our knowledge, this is the first study to jointly evaluate modern deep learning, FMs, and personalisation for multi-horizon behavioural forecasting from wearables.

2606.13684 2026-06-15 cs.CY cs.AI cs.CL cs.LG 交叉投稿

Cross-Dataset Bloom Question Classification: Supervised Models and Prompted LLMs

跨数据集布鲁姆问题分类:监督模型与提示式大语言模型

Abdolali Faraji, Mohammadreza Molavi, Zohreh Rasoulkhani, Mohammadreza Tavakoli, Gábor Kismihók

发表机构 * Leibniz Information Centre for Science and Technology(莱布尼茨科学与技术信息中心) University of Genoa(热那亚大学)

AI总结 评估监督ML/DL模型和LLM在跨数据集布鲁姆分类中的泛化能力,发现LLM更稳定,并基于最佳提示策略开发了轻量级UI。

Comments Accepted at AIED 2026. Abdolali Faraji and Mohammadreza Molavi contributed equally to this work

详情
AI中文摘要

自动对评估问题进行布鲁姆分类可以大幅减少教师工作量,但标注具有主观性且依赖教师。先前的机器学习和深度学习方法在数据集内表现良好,但很少在跨数据集设置中评估,导致现实世界的泛化能力不明确;同时,LLM在布鲁姆问题分类中的有效性尚未被系统研究。我们评估了现有ML/DL方法的跨数据集泛化能力,并在五个数据集上使用多种提示策略评估了LLM;最佳提示策略结合了上下文示例和课程特定的动作动词。监督ML/DL模型在未见数据集上性能大幅下降,而LLM更稳定,表明其在多样化教育环境中是一种稳健的替代方案。基于最佳提示策略,我们还开发了一个轻量级用户界面,支持教师自动分类大量问题库;可用性研究表明低工作量和高度可用性。

英文摘要

Automatic Bloom's taxonomy classification of assessment questions can substantially reduce instructor workload, but labeling is subjective and teacher-dependent. Prior machine learning (ML) and deep learning (DL) approaches reported strong within-dataset results, yet were rarely evaluated in cross-dataset settings, leaving real-world generalizability unclear; meanwhile, LLM effectiveness for Bloom question classification has not been systematically studied. We evaluated the cross-dataset generalization of existing ML/DL methods and assessed LLMs with multiple prompting strategies on five datasets; the best prompting strategy combined in-context examples with course-specific action verbs. Supervised ML/DL models degraded substantially on unseen datasets, whereas LLMs were more stable, suggesting a robust alternative across diverse educational contexts. Based on the best prompting strategy, we also presented a lightweight UI that supports instructors in automatically classifying large question banks; a usability study indicated low workload and high usability.

2606.13735 2026-06-15 cs.AR cs.AI cs.LG cs.PL 交叉投稿

VHDLSuite: Unified Pipeline for LLM VHDL Generation with Data Synthesis and Evaluation

VHDLSuite:面向LLM VHDL生成的统一流水线,包含数据合成与评估

Yijun Shen, Minghao Shao, Yichen Zhao, Zhuoyan Yu, Boyuan Chen, Yik-Cheung Tam, Muhammad Shafique

发表机构 * Center for Data Science, NYU Shanghai, China(上海纽约大学数据科学中心) NYU Tandon School of Engineering, USA(纽约大学Tandon工程学院) NYU Abu Dhabi, UAE(纽约大学阿布扎比分校)

AI总结 提出VHDLSuite基础设施,通过自动基准合成、可执行验证和多模型诊断分析,解决LLM在VHDL生成评估中的不足,并构建含200+问题的VHDLBench基准。

详情
AI中文摘要

大型语言模型(LLM)在寄存器传输级(RTL)代码生成方面展现了令人印象深刻的能力,尤其是针对Verilog。然而,评估它们在其他硬件描述语言(HDL)上的性能,特别是VHDL,仍然有限,尽管其独特的语言特性(如更严格的语义规则)引入了与Verilog不同的评估考量。这种覆盖不足限制了对当前模型在不同结构和语义的硬件设计语言中泛化能力的全面理解。为弥补这一空白,我们引入了VHDLSuite,一个以基准为中心的可扩展VHDL生成评估基础设施,集成了自动基准合成、可执行验证和多模型诊断分析。首先,我们提出一个数据流水线,自动将Verilog设计及其配套测试平台转换为可执行的VHDL基准实例,随后基于VUnit/GHDL进行验证,确保每个发布的任务在VHDL环境中可编译、可运行且可一致检查。其次,我们引入VHDLBench,一个包含超过200个VHDL问题的基准,配有完整且经过验证的测试平台,覆盖广泛的复杂度级别。第三,我们广泛评估了最先进的LLM,并揭示了LLM辅助VHDL生成中的关键挑战。我们的发现为多语言硬件设计的未来工作提供了重要见解和支持。该数据流水线、基准和评估框架将开源。

英文摘要

Large Language Models (LLMs) have shown impressive capabilities in Register Transfer Level (RTL) code generation, particularly for Verilog. However, evaluation of their performance on other Hardware Description Languages (HDLs), especially VHDL, remains limited, although VHDL's distinct language characteristics, such as stricter semantic rules, introduce evaluation considerations that differ from Verilog. This lack of coverage restricts a full understanding of how well current models generalize across hardware design languages with differing structures and semantics. To address this gap, we introduce VHDLSuite, a benchmark-centered infrastructure for scalable VHDL generation evaluation, integrating automated benchmark synthesis, executable validation, and multi-model diagnostic analysis. First, we propose a data pipeline that automatically converts Verilog designs and their accompanying testbenches into executable VHDL benchmark instances, followed by VUnit/GHDL-based validation to ensure each released task is compilable, runnable, and consistently checkable in the VHDL environment. Second, we introduce VHDLBench, a benchmark with over 200 VHDL problems with complete and validated testbenches across a wide range of complexity levels. Third, we extensively evaluate cutting-edge LLMs and uncover key challenges specific to LLM-aided VHDL generation. Our findings provide important insights and support future work in multi-language hardware design automation. Our data pipeline, benchmark, and evaluation framework will be open-sourced.

2606.13802 2026-06-15 cs.SE cs.AI cs.HC cs.LG 交叉投稿

A Benchmark and Framework for Evaluating Next Action Predictions in Spreadsheets

电子表格中下一步动作预测的基准测试与框架

Tejas Agrawal, Vu Le, Sumit Gulwani, Gust Verbruggen

发表机构 * University of Waterloo(滑铁卢大学)

AI总结 针对电子表格缺乏自动补全功能的问题,提出一个基准测试,通过人工整理动作序列和在线评估方法,比较多种预测模型,分析节省的动作、误报、效率等特性。

Comments Accepted at ICML 2026. Code and benchmark: https://github.com/Tej-55/NAPE

详情
AI中文摘要

预测性代码补全极大地加速了开发人员的工作效率。在电子表格中,尽管电子表格的使用更为普遍,但这种自动补全功能几乎不存在。为了解决这一差距,我们引入了一个基准测试,用于观察电子表格中用户动作序列并预测未来动作的系统。两个挑战是(1)公共电子表格语料库中缺乏编辑历史,以及(2)电子表格动作的复杂空间(空间、时间、复合)。为了解决(1),我们手动整理了52个序列,包含12K个动作,这些动作通过参数化启发式和LLM精炼从公共语料库中重新创建电子表格。为了解决(2),我们提出了一种在线评估方法,该方法在每个用户动作后期望一个预测,接受或拒绝该预测,在接受时更新未来动作,并重复此过程直到获得目标电子表格。我们使用多个基线预测器(包括零样本LLM、微调SLM和经典模型),并分析了基准测试教给我们的不同属性,包括但不限于:节省的动作和误报的属性、效率、用户配置文件的影响、触发器的影响以及上下文的影响。

英文摘要

Predictive code completion greatly accelerates how quickly developers work. In spreadsheets, despite being much more common, such auto-completion features are virtually non-existent. To address this gap, we introduce a benchmark for systems that observe a sequence of user actions in a spreadsheet and predict future actions. Two challenges are (1) the absence of edit histories in public spreadsheet corpora and (2) the complex space of spreadsheet actions (spatial, temporal, composite). To address (1), we manually curate 52 sequences of 12K actions that recreate spreadsheets from public corpora, seeded by parametrized heuristics and LLM refinement. To address (2), we propose an online evaluation that expects a prediction after each user action, accepts or rejects that prediction, updates the future actions upon acceptance, and repeats this until the target spreadsheet is obtained. We use multiple baseline predictors (including zero-shot LLMs, fine-tuned SLMs, and classical models) and analyze different properties that our benchmark teaches us, including but not limited to: properties of saved actions and false positives, efficiency, effect of user profiles, effect of triggers, and effect of context.
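The online evaluation loop described above can be sketched as follows. This is a simplified reading of the abstract, not the benchmark's actual harness: actions are plain strings, a prediction is accepted only if it exactly matches the next recorded actions, "updating the future actions" is modelled by simply consuming the accepted actions, and `repeat_last` is a hypothetical baseline predictor.

```python
def online_evaluate(actions, predictor):
    """Replay a recorded action sequence, querying the predictor after each user action.

    Returns (saved_actions, false_positives) under exact-match acceptance.
    """
    i, saved, false_positives = 0, 0, 0
    while i < len(actions):
        i += 1                           # the user performs one action
        if i == len(actions):
            break                        # target reached; nothing left to predict
        predicted = predictor(actions[:i])
        if not predicted:
            continue                     # predictor chose not to trigger
        n = len(predicted)
        if actions[i:i + n] == predicted:
            saved += n                   # accepted: the user skips these actions
            i += n
        else:
            false_positives += 1         # rejected suggestion
    return saved, false_positives


def repeat_last(history):
    """Hypothetical baseline: always suggest repeating the most recent action."""
    return [history[-1]]


recorded = ["A", "A", "B", "B", "B", "C"]
saved, false_positives = online_evaluate(recorded, repeat_last)
```

Counting saved actions and rejected suggestions separately mirrors the abstract's distinction between the benefit of accepted predictions and the cost of false positives.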

2606.13994 2026-06-15 cs.CR cs.AI cs.LG 交叉投稿

Hidden in Plain Sight: Benchmarking Agent Safety Against Decomposition Attacks with DECOMPBENCH

隐于无形:使用DECOMPBENCH基准测试代理安全对抗分解攻击

Vikhyath Kothamasu, Virginia Smith, Chhavi Yadav

发表机构 * Carnegie Mellon University(卡内基梅隆大学) Simons Institute, UC Berkeley(加州大学伯克利分校西蒙斯研究所)

AI总结 提出DeCompBench基准,通过分解攻击将有害任务拆分为良性子任务,揭示现有代理安全机制在对抗分解攻击时的脆弱性。

详情
AI中文摘要

基于LLM的代理变得越来越强大且广泛部署,在现实世界中造成了日益增长的对抗性滥用动机。一个关键的新兴威胁是分解攻击\cite{glukhov2024breach, jones2024adversaries},其中有害任务被分解为更简单、良性的子任务,这些子任务单独执行时能规避安全机制,但累积起来却实现了恶意意图。尽管最近的基准测试评估了代理在多轮和多工具使用设置中的安全性,但它们并未明确捕捉这种形式的分解滥用,且可能无法代表现实的对抗性执行流程。为此,我们引入了DeCompBench,这是一个专门设计用于评估分解攻击下代理安全性的基准。DeCompBench采用分解即设计原则,使用图形框架创建,能够将有害任务分解为单独良性且可执行的子任务,并具有现实的工作流程。我们使用自定义分解器的实验表明,最先进的代理在整体有害任务上表现出高拒绝率,但在其分解变体上拒绝率显著降低,同时往往无意中实现了对抗性目标。这些发现强调了针对分解攻击进行安全性评估及相应防御的必要性。我们的数据集已公开,可在以下网址获取:https://huggingface.co/datasets/decompositionbench/DeCompBench。

英文摘要

LLM-based Agents are becoming increasingly capable and widely deployed, creating growing incentives for adversarial misuse in the real world. A key emerging threat is Decomposition Attacks \cite{glukhov2024breach, jones2024adversaries} in which a harmful task is broken into simpler, benign subtasks that evade safety mechanisms when executed separately but cumulatively fulfill the malicious intent. Although recent benchmarks assess agent safety in multi-turn and multi-tool-use settings, they do not explicitly capture this form of decompositional misuse and may not represent realistic adversarial execution flows. To this end, we introduce DeCompBench, a benchmark designed specifically to evaluate agentic safety under decomposition attacks. DeCompBench is created with a decomposition-by-design principle using a graphical framework and enables harmful task decomposition into individually benign and executable subtasks with realistic workflows. Our experiments using a custom decomposer show that state-of-the-art agents exhibit high refusal rates on monolithic harmful tasks, but significantly lower refusal rates on their decomposed variants, while often inadvertently fulfilling the adversarial objectives. These findings underscore the need for safety evaluations against decomposition attacks and corresponding defenses. Our dataset is publicly available and can be found at https://huggingface.co/datasets/decompositionbench/DeCompBench.

2606.14028 2026-06-15 stat.ML cs.LG 交叉投稿

Anytime-Valid Confirmation of Label-Shift Corrections

标签偏移修正的任意时刻有效确认

Seungjin Choi

发表机构 * Seungjin Choi

AI总结 针对标签稀缺时预指定偏移修正的确认问题,提出基于条件e值的任意时刻有效序贯检验方法,利用似然比乘积构造非负鞅,将常规模型监测转化为正式检验。

Comments ICML 2026 Workshop on Hypothesis Testing

详情
AI中文摘要

在小批量的科学部署中,即使未标记的目标输入可用,带标签的目标结果也可能过于稀缺,无法进行可靠的偏移估计。我们研究与之互补的设置:从业者根据领域知识预先指定了标签偏移修正,并询问新到达的带标签结果是否支持该修正。我们表明,经标签偏移修正的预测分布与源预测分布之间的逐观测似然比是一个条件e值,因此其累积乘积是一个非负鞅,由Ville不等式可得到一个任意时刻有效的确认规则。对数鞅等于源预测与修正预测之间的累积负对数预测密度(NLPD)差距,从而将常规模型监测转化为正式的序贯检验。拒绝意味着新到达的数据相对于源预测支持所假定的修正,但这并不是对偏移程度的精确估计。对于具有高斯标签偏移比率的高斯过程(GP)源,存在闭式解。GP回归模拟验证了第一类错误控制、有限样本功效、对校准偏差的敏感性,以及可靠先验相对于基于标签的重新估计的小批量优势。

英文摘要

In small-batch scientific deployments, labeled target outcomes may be too scarce for reliable shift estimation even when unlabeled target inputs are available. We address the complementary setting where the practitioner has a pre-specified label-shift correction from domain knowledge and asks whether incoming labeled outcomes support it. We show that the per-observation likelihood ratio between a label-shift-corrected predictive and the source predictive is a conditional e-value, so its running product is a nonnegative martingale and Ville's inequality yields an anytime-valid confirmation rule. The log martingale equals the cumulative negative log-predictive density (NLPD) gap between the source and the corrected predictive, converting routine model monitoring into a formal sequential test. Rejection means the incoming data support the posited correction relative to the source predictive, but it is not a precise estimate of the degree of shift. Closed forms are available for GP sources with Gaussian label-shift ratios. GP regression simulations validate Type I control, finite-sample power, miscalibration sensitivity, and the small-batch advantage of a reliable prior over label-based re-estimation.
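The confirmation rule in the abstract above can be sketched as a running likelihood-ratio product that stops when it crosses 1/α. This is a minimal sketch under assumptions that are not taken from the paper: both predictives are Gaussian with a shared, known standard deviation, the correction only changes the predictive mean, and the names `gaussian_logpdf` and `confirm_correction` are hypothetical. If the source predictive is correct, each likelihood ratio has conditional expectation one, so Ville's inequality bounds the chance of ever crossing 1/α by α.

```python
import math


def gaussian_logpdf(y, mean, std):
    """Log density of N(mean, std^2) at y."""
    z = (y - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def confirm_correction(ys, src_means, corr_means, std, alpha=0.05):
    """Sequentially test whether labeled outcomes support the corrected predictive.

    Returns (stopping_time, log_wealth). stopping_time is the 1-based index at
    which the running likelihood-ratio product first reaches 1/alpha, or None.
    """
    log_wealth = 0.0                      # log of the running product (log martingale)
    threshold = math.log(1.0 / alpha)     # Ville threshold on the log scale
    for t, (y, m_src, m_corr) in enumerate(zip(ys, src_means, corr_means), start=1):
        # Per-observation log likelihood ratio = NLPD(source) - NLPD(corrected).
        log_wealth += gaussian_logpdf(y, m_corr, std) - gaussian_logpdf(y, m_src, std)
        if log_wealth >= threshold:
            return t, log_wealth
    return None, log_wealth


# Toy data (hypothetical): outcomes sit exactly on the corrected mean.
stop, log_wealth = confirm_correction(
    ys=[1.0] * 10, src_means=[0.0] * 10, corr_means=[1.0] * 10, std=1.0
)

# Outcomes that match the source predictive never trigger confirmation here.
null_stop, null_log_wealth = confirm_correction(
    ys=[0.0] * 10, src_means=[0.0] * 10, corr_means=[1.0] * 10, std=1.0
)
```

Because the stopping rule is valid at every time step, the practitioner can inspect the log wealth after each labeled outcome without inflating the Type I error.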

2606.14199 2026-06-15 cs.CL cs.AI cs.LG 交叉投稿

OdysSim: Building Foundation Models for Human Behavior Simulation

OdysSim: 构建人类行为模拟的基础模型

Xuhui Zhou, Weiwei Sun, Weihua Du, Jiarui Liu, Haojia Sun, Qianou Ma, Tongshuang Wu, Yiming Yang, Maarten Sap

发表机构 * Carnegie Mellon University, Language Technologies Institute(卡内基梅隆大学语言技术研究所)

AI总结 提出OdysSim,通过SOUL分类法统一62个数据集和23个基准任务,采用混合训练、任务特定强化学习和专家蒸馏,构建8B参数行为基础模型OSim,在多数任务上超越前沿模型,并实现更类人输出和零样本迁移。

Comments 34 pages. Code: https://github.com/sunnweiwei/OdysSim ; Models and data: https://huggingface.co/collections/cmu-lti/odyssim

详情
AI中文摘要

大型语言模型越来越多地被部署为人类模拟器,用于交互式评估和社会模拟。然而,以有用性为导向的后训练使它们趋向于同质化、过于随和的助手风格,造成了行为上的Sim2Real差距。我们提出了OdysSim,这是对行为基础模型(即经过训练以大规模模拟人类行为的模型)进行的最大规模开放系统研究。我们提出了SOUL,一个包含五个能力轴(CONV、SS、COG、ROLE、EVAL)的分类法,将62个数据集和23个基准任务统一在一个框架下。具体来说,我们整理了OdysSim语料库(2140万次交互,100亿个token,并配备了反向生成的社交上下文),构建了SOUL-Index基准,并开发了一个端到端的训练方案,结合了中期训练、任务特定强化学习和专家蒸馏。由此产生的开源8B OSim模型在23个任务中的8个上排名第一或并列第一,按此计数优于任何单个前沿模型,在对话和社交任务上取得了最大的提升。其输出在长度、格式和词汇选择上也更接近人类,并在τ-bench上零样本迁移到分布外的用户模拟,在反应一致性上几乎与真实用户匹配(93.2 vs 93.5)。我们进一步表明,LLM作为评判者的强化学习会引发奖励黑客模式,而我们的检测器可以在后训练期间缓解这些模式。总之,我们的发现表明,行为基础模型需要重新思考LLM的训练范式。我们发布所有工件以支持未来的研究。

英文摘要

Large language models are increasingly deployed as human simulators for interactive evaluation and social simulation. Yet helpfulness-driven post-training pulls them toward a homogeneous, overly agreeable assistant register, creating a behavioral Sim2Real gap. We present OdysSim, the largest open systematic investigation of behavioral foundation models, i.e., models trained to simulate human behavior at scale. We propose SOUL, a taxonomy of five capability axes (CONV, SS, COG, ROLE, EVAL) that unifies 62 datasets and 23 benchmark tasks under one framework. Specifically, we curate the OdysSim corpus (21.4M interactions, 10B tokens, retrofitted with back-generated social contexts), construct the SOUL-Index benchmark, and develop an end-to-end training recipe combining midtraining, task-specific RL, and expert distillation. The resulting open 8B OSim model ranks first or tied-first on 8 of 23 tasks, outperforming any individual frontier model by this count, with the strongest gains on conversational and social tasks. Its outputs are also more human-like in length, formatting, and word choice, and it transfers zero-shot to out-of-distribution user simulation on τ-bench, nearly matching real users on reaction alignment (93.2 vs. 93.5). We further show that LLM-as-judge RL induces reward-hacking patterns, and that our detectors can mitigate them during post-training. Together, our findings suggest that behavioral foundation models require rethinking the LLM training paradigm. We release all artifacts to support future research.

2606.14299 2026-06-15 cs.CV cs.LG 交叉投稿

What Drives Test-Time Adaptation for CLIP? A Controlled Empirical Study from an Update Perspective

什么驱动了CLIP的测试时适应?从更新视角进行的受控实证研究

Jiazhen Huang, Xiao Chen, Zhiming Liu, Yaru Sun, Jingyan Jiang, Zhi Wang

发表机构 * Tsinghua University(清华大学) Shenzhen Technology University(深圳技术大学)

AI总结 本文通过受控实证研究,从更新视角分析了CLIP测试时适应方法的驱动因素,揭示了适应增益主要来自测试时证据和可靠代理,而非繁重优化,并指出无单一范式普遍最优。

详情
AI中文摘要

视觉语言模型(如CLIP)已成为开放词汇识别的标准骨干,但其零样本预测在部署时仍易受分布偏移影响。测试时适应(TTA)最近被扩展到CLIP作为轻量级解决方案,导致TTA4CLIP方法迅速增长。然而,该领域的实证进展在很大程度上超过了我们对真正驱动适应因素、其增益来源以及哪些偏移下保持可靠的理解。本文从追求最先进准确率中退一步,对TTA4CLIP进行了系统性的受控研究。我们首先根据测试时更新的内容,将现有方法组织为三个统一范式。然后,我们引入TTABC,一个开源的CLIP TTA基准,它标准化了评估协议并集成了20多种代表性方法。我们的受控实证分析集中在三个关键领域。首先,我们确定了基于参数方法的驱动因素,揭示适应增益主要由测试时证据和可靠代理驱动,而非繁重优化。其次,我们探索了超越繁重参数调整的证据利用,表明通过跨样本或当前样本证据以及轻量级原型更新可以实现竞争性和高效的性能。最后,我们证明TTA没有银弹:没有单一的适应范式普遍最优,首选范式取决于偏移的性质。我们希望我们的基准和研究能提供对当前TTA4CLIP格局的更清晰理解,并为进一步研究奠定基础。

英文摘要

Vision-Language Models (VLMs) such as CLIP have become a standard backbone for open-vocabulary recognition, yet their zero-shot predictions remain vulnerable to distribution shifts encountered at deployment. Test-Time Adaptation (TTA) has recently been extended to CLIP as a lightweight solution, leading to a rapidly growing body of TTA4CLIP methods. However, empirical progress in this area has largely outpaced our understanding of what truly drives adaptation, where their gains originate, and under which shifts they remain reliable. In this paper, we take a step back from the pursuit of state-of-the-art accuracy and conduct a systematic controlled study of TTA4CLIP. We first organize existing methods into three unified paradigms according to what is updated at test time. We then introduce TTABC, an open-source TTA Benchmark for CLIP, which standardizes evaluation protocols and integrates more than 20 representative methods. Our controlled empirical analysis focuses on three key areas. First, we determine the driving factors in parameter-based methods, revealing that adaptation gains are primarily driven by test-time evidence and reliable proxies rather than heavy optimization. Second, we explore evidence utilization beyond heavy parameter tuning, showing that competitive and efficient performance can be achieved through cross- or current-sample evidence and lightweight prototype updates. Finally, we demonstrate that there is no silver bullet for TTA: no single adaptation paradigm is universally optimal, and the preferred paradigm depends on the nature of shift. We hope our benchmark and study provide a clearer understanding of the current TTA4CLIP landscape and establish a foundation for further research.

2606.14506 2026-06-15 stat.ML cs.LG stat.ME 交叉投稿

Beyond the Training Distribution: Evaluating Predictions Under Distribution Shift and Selection Bias

超越训练分布:评估分布偏移和选择偏差下的预测

Annie Ulichney, Amanda Coston

发表机构 * Department of Statistics, University of California, Berkeley(加州大学伯克利分校统计学系)

AI总结 针对协变量偏移和选择性标签共存时的模型评估问题,提出双机器学习程序估计目标风险,并通过eICU数据验证其准确性优于单独处理任一种偏差的方法。

详情
AI中文摘要

理解预测模型在新环境中的表现对于防止算法在决策中造成伤害至关重要。模型性能下降的两个常见原因是:(i) 协变量偏移,即目标协变量分布与源分布不同;(ii) 选择性标签,即结果的可观测性取决于历史决策。我们研究在协变量偏移和基于观测特征的选择性标签共同存在下的部署前模型评估。特别地,我们提出了一种双机器学习程序,用于在一般损失函数下估计任意黑箱预测模型的目标风险。我们在标准假设下证明了该估计量的可识别性,并基于目标风险的影响函数推导出偏差校正估计量。最后,我们通过使用eICU电子健康记录数据库的实验评估了我们的估计量,结果表明,与单独处理选择性标签或协变量偏移的方法以及结合标准插值方法的基线相比,我们的估计量更准确地跟踪真实目标风险。

英文摘要

Understanding how a prediction model will perform in a new environment before deployment is essential to preventing harm when algorithms inform decision-making. Two common sources of model performance degradation are (i) covariate shift, where the target covariate distribution differs from the source, and (ii) selective labels, where the observability of outcomes depends on historical decisions. We study pre-deployment model evaluation under the joint presence of covariate shift and labeling of outcomes selectively based on observed features. In particular, we present a double machine learning procedure for estimating the target risk of an arbitrary black-box prediction model under a general loss function. We show identification of this estimand under standard assumptions and derive a bias-corrected estimator based on the influence function of the target risk. Finally, we evaluate our estimator through experiments using the eICU electronic health records database, showing that it tracks the true target risk more accurately than methods that address either selective labels or covariate shift alone, as well as baselines that combine standard plug-in approaches.

2606.14562 2026-06-15 cs.CV cs.LG 交叉投稿

NEST3D: A High-Resolution Multimodal Dataset of Sociable Weaver Tree Nests

NEST3D:织布鸟树巢的高分辨率多模态数据集

Constanza A. Molina Catricheo, Simon Boeder, Ting-Jia Guo, Giacomo May, Clément Berthelot, Devis Tuia, Friedrich Fedor Reinhard, Fabio Remondino, Benjamin Risse

发表机构 * Institute for Geoinformatics (ifgi), University of Münster(明斯特大学地理信息学研究所) École Polytechnique Fédérale de Lausanne (EPFL)(洛桑联邦理工学院) Max Planck Institute of Animal Behavior(马克斯·普朗克动物行为研究所) University of Konstanz(康斯坦茨大学) Kuzikus Research Station(库兹库斯研究站) Fondazione Bruno Kessler (FBK)(布鲁诺·凯斯勒基金会)

AI总结 针对织布鸟巢缺乏精细3D结构数据的问题,提出包含104棵巢树、1.4TB多模态无人机数据集,并基准测试语义分割方法,PT-v3达86.35% mIoU。

Comments 14 pages, 4 figures. Dataset available at https://huggingface.co/NEST3D

详情
AI中文摘要

织布鸟巢作为复杂的生态结构,提供体温调节微栖息地并维持多种物种;然而,先前研究使用的数据集缺乏精细的3D结构细节。由于巢穴的不规则几何形状以及与复杂宿主植被的整合,生成可用且准确的3D织布鸟巢数据具有挑战性。我们通过一个开放获取的1.4TB多模态无人机数据集(包含104棵巢树,共27,945张RGB图像、111,780张多光谱图像、约7.81亿个3D点以及专家标注的语义分割标签)弥合了这一差距。我们使用KPConv、RandLA-Net和Point Transformer V3对语义分割进行基准测试,其中PT-v3在测试集上达到了86.35%的mIoU。虽然结果展示了基于Transformer和逐点方法的强大性能,但也凸显了架构相关的挑战,特别是对于基于卷积的方法(如KPConv)。通过独特地结合光谱、空间和结构信息,所提出的数据集推动了3D重建、分割和分类算法的发展,实现了从巢穴体积估计到物种保护等生态应用,并作为一个要求严格的基准,揭示了在极端类别不平衡下与架构相关的性能差异。

英文摘要

Sociable weaver nests function as complex ecological structures offering thermoregulatory microhabitats and sustaining diverse species; however, datasets used in prior studies lack fine-grained 3D structural detail. Producing usable and accurate 3D weaver nest data is challenging due to their irregular geometry and integration with complex host vegetation. We bridge this gap with an open-access, 1.4 TB multimodal drone dataset of 104 nest-bearing trees, comprising 27,945 RGB images, 111,780 multispectral images, approximately 781 million 3D points, and expert-annotated semantic segmentation labels. We benchmark semantic segmentation using KPConv, RandLA-Net, and Point Transformer V3, with PT-v3 achieving an mIoU of 86.35% on the test set. While the results demonstrate strong performance for transformer-based and point-wise methods, they also highlight architecture-dependent challenges, particularly for convolution-based approaches such as KPConv. By uniquely combining spectral, spatial, and structural information, the presented dataset advances 3D reconstruction, segmentation, and classification algorithms, enabling ecological applications from nest volume estimation to species conservation, and serves as a demanding benchmark that exposes architecture-dependent performance under extreme class imbalance.
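The abstract above reports mIoU and stresses extreme class imbalance. A small sketch of how mean intersection-over-union is commonly computed shows why it is stricter than plain accuracy in that regime. The function name `mean_iou` and the toy labels are assumptions for illustration, and the dataset's own evaluation code may differ in detail (for example, in how absent classes are handled).

```python
def mean_iou(y_true, y_pred, num_classes):
    """Mean intersection-over-union across classes present in labels or predictions."""
    ious = []
    for c in range(num_classes):
        # Points where both label and prediction are class c.
        intersection = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        # Points where either label or prediction is class c.
        union = sum(1 for t, p in zip(y_true, y_pred) if t == c or p == c)
        if union > 0:
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example (hypothetical): 9 background points, 1 rare-class point, and a
# model that predicts background everywhere.
labels = [0] * 9 + [1]
preds = [0] * 10

accuracy = sum(1 for t, p in zip(labels, preds) if t == p) / len(labels)
miou = mean_iou(labels, preds, num_classes=2)
```

Here the always-background model still scores 90% accuracy, while its mIoU is halved because the rare class contributes an IoU of zero with equal weight.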

2606.14592 2026-06-15 stat.ML cs.LG stat.AP stat.ME 交叉投稿

Cluster LOCO: Feature Importance For Interpreting Clusters

Cluster LOCO:用于解释聚类的特征重要性

Claire M. He, Genevera I. Allen

发表机构 * Department of Statistics Columbia University(统计学系哥伦比亚大学)

AI总结 提出模型无关的聚类特征重要性方法Cluster LOCO,通过特征遮挡和泛化性度量,可靠识别驱动聚类结构的特征。

Comments 36 pages, 12 figures

详情
AI中文摘要

聚类广泛用于探索性分析和科学发现,推动从市场细分到生物数据分析的洞察,但随着现代数据集变得日益庞大和复杂,其输出可能难以解释、审计和重现。聚类的可靠使用需要理解哪些特征驱动了发现的结构,然而与监督学习方法相比,聚类在特征级解释方面仍然稀缺。此外,现有的聚类特征重要性分数通常与特定算法和数据假设相关。为了解决这些挑战,我们提出了Cluster LOCO(Leave-One-Covariate-Out),一个模型无关的聚类特征重要性分数族。Cluster LOCO基于特征遮挡和聚类泛化性,即在一个数据子集上学习的聚类标签能否在保留样本上被准确预测。对于任何选定的聚类算法,Cluster LOCO通过测量移除某个特征对泛化性的降低程度来量化该特征的重要性。我们首先介绍了基于数据分割的Cluster LOCO-Split,然后将其扩展到Cluster LOCO-MP,一种适用于大规模数据的minipatch集成版本。通过合成模拟和在单细胞转录组学中细胞类型发现的应用,我们展示了Cluster LOCO比现有的聚类特征重要性方法更可靠地恢复信息特征。

英文摘要

Clustering is widely used for exploratory analysis and scientific discovery, driving insights from market segmentation to biological data analysis, but its outputs can be difficult to interpret, audit, and reproduce as modern datasets become increasingly large and complex. Reliable use of clustering requires understanding which features drive the discovered structure, yet feature-level explanations for clustering remain scarce compared with methods in supervised learning. Furthermore, existing clustering feature importance scores are often tied to specific algorithms and data assumptions. To address these challenges, we propose Cluster LOCO (Leave-One-Covariate-Out), a family of model-agnostic feature importance scores for clustering. Cluster LOCO is built on feature occlusion and clustering generalizability, defined as whether cluster labels learned on one subset of the data can be accurately predicted on held-out samples. For any chosen clustering algorithm, Cluster LOCO quantifies a feature's importance by measuring how much its removal degrades generalizability. We first introduce Cluster LOCO-Split, which relies on data splitting, and then extend it to Cluster LOCO-MP, a minipatch ensemble-based version designed for large-scale data. Across synthetic simulations and an application to cell-type discovery in single-cell transcriptomics, we show that Cluster LOCO more reliably recovers informative features than existing clustering feature importance methods.

2412.03716 2026-06-15 cs.LG cs.CY 版本更新

A Water Efficiency Dataset for African Data Centers

非洲数据中心用水效率数据集

Noah Shumba, Opelo Tshekiso, Pengfei Li, Giulia Fanti, Shaolei Ren

发表机构 * Carnegie Mellon University, Pittsburgh, Pennsylvania, USA(卡内基梅隆大学,匹兹堡,宾夕法尼亚州,美国) Carnegie Mellon University Africa, Kigali, Rwanda(卡内基梅隆大学非洲分校,基加利,卢旺达) Rochester Institute of Technology, Rochester, New York, USA(罗切斯特理工学院,罗切斯特,纽约州,美国) University of California, Riverside(加州大学河滨分校)

AI总结 构建首个结合天气与发电数据的非洲41国数据中心用水效率数据集,评估Llama-3-70B和GPT-4推理用水量,发现多数非洲国家用水低于全球平均。

Comments Accepted by NeurIPS 2024 Workshop on Tackling Climate Change with Machine Learning

详情
AI中文摘要

人工智能计算和数据中心消耗大量淡水,既直接用于冷却,也间接用于发电。尽管大多数关注集中在发达国家如美国,本文首次提出了一个结合国家层面天气和发电数据的数据集,用于估算非洲41个国家(跨越五个不同气候区域)的数据中心用水效率。我们还利用该数据集评估和估算了在11个选定的非洲国家中,两个大型语言模型(即Llama-3-70B和GPT-4)推理的用水量。我们的估算表明,使用Llama-3-70B撰写一份10页的报告可能消耗多达0.66升水,而GPT-4完成相同任务可能消耗高达约59升水。对于撰写一封120-200词的中等长度电子邮件,Llama-3-70B和GPT-4可能分别消耗约0.13升和2.9升水。所有生成模型推理任务的数字均基于我们最初准备分析时的2024年公开信息。自那时起,AI推理系统已大幅改进。例如,最近披露的信息表明,2024年5月至2025年5月期间,能效提高了30倍以上。因此,我们2024年的估算应被解释为历史参考值,而非代表当前性能。有趣的是,对于相同的AI模型,11个选定的非洲国家中有9个的用水量低于全球平均水平,这主要是由于其发电的用水强度较低。

英文摘要

Artificial intelligence (AI) computing and data centers consume large amounts of freshwater, both directly for cooling and indirectly for electricity generation. While most attention has been paid to developed countries such as the U.S., this paper presents the first-of-its-kind dataset that combines nation-level weather and electricity generation data to estimate water usage effectiveness for data centers in 41 African countries across five different climate regions. We also use our dataset to evaluate and estimate the water consumption of inference on two large language models (i.e., Llama-3-70B and GPT-4) in 11 selected African countries. Our estimates suggest that writing a 10-page report using Llama-3-70B could consume as much as 0.66 liters of water, while the water consumption by GPT-4 for the same task may go up to about 59 liters. For writing a medium-length email of 120-200 words, Llama-3-70B and GPT-4 could consume about 0.13 liters and 2.9 liters of water, respectively. All the numbers for generative model inference tasks are based on public information available in 2024, when we initially prepared the analysis. Since then, AI inference systems have improved substantially. For example, recent disclosures suggest that energy efficiency improved by more than 30x between May 2024 and May 2025. Accordingly, our 2024 estimates should be interpreted as historical reference values rather than as representative of current performance. Interestingly, given the same AI model, 9 of the 11 selected African countries consume less water than the global average, mainly because of lower water intensities for electricity generation.

2505.16319 2026-06-15 cs.LG 版本更新

FreshRetailNet-LT: A Stockout-Annotated Censored Demand Dataset for Latent Demand Recovery and Forecasting in Fresh Retail

FreshRetailNet-LT:面向生鲜零售中潜在需求恢复与预测的缺货标注删失需求数据集

Yangyang Wang, Jiawei Gu, Li Long, Xin Li, Li Shen, Zhouyu Fu, Xiangjun Zhou, Xu Jiang

发表机构 * Fresh Retail, Inc.(新鲜零售公司)

AI总结 针对生鲜零售中缺货导致的销售数据删失问题,提出首个大规模基准数据集FreshRetailNet-50K,包含50,000条高时间分辨率小时级销售序列及缺货标注,并展示了两阶段需求建模方法,将预测准确率提升2.73%,需求低估偏差从7.37%降至近零。

Comments FreshRetailNet-LT is a new version of FreshRetailNet-50K, with data spanning over two years

详情
AI中文摘要

准确的需求估计对于零售业务指导易腐产品的库存和定价策略至关重要。然而,它面临缺货期间删失销售数据的根本挑战,其中未观察到的需求会造成系统性政策偏差。现有数据集缺乏解决这种删失效应所需的时间分辨率和标注。为填补这一空白,我们提出了FreshRetailNet-50K,这是首个用于删失需求估计的大规模基准。它包含来自18个主要城市898家商店的50,000条商店-产品时间序列的详细小时级销售数据,涵盖863个易腐SKU,并精心标注了缺货事件。该数据集独有的小时级库存状态记录,结合丰富的上下文协变量(包括促销折扣、降水和时间特征),使得超越现有解决方案的创新研究成为可能。我们展示了一个两阶段需求建模的用例:首先,利用精确的小时级标注重建缺货期间的潜在需求;然后,利用恢复的需求在第二阶段训练鲁棒的需求预测模型。实验结果表明,该方法将预测准确率提高了2.73%,同时将系统性需求低估从7.37%降至接近零偏差。凭借前所未有的时间粒度和全面的真实世界信息,FreshRetailNet-50K在需求插补、易腐库存优化和因果零售分析方面开辟了新的研究方向。该数据集独特的标注质量和规模解决了零售AI中长期存在的局限性,提供了即时解决方案和未来方法论创新的平台。数据(https://huggingface.co/datasets/Dingdong-Inc/FreshRetailNet-50K)和代码(https://github.com/Dingdong-Inc/frn-50k-baseline)已公开。

英文摘要

Accurate demand estimation is critical for the retail business in guiding the inventory and pricing policies of perishable products. However, it faces fundamental challenges from censored sales data during stockouts, where unobserved demand creates systemic policy biases. Existing datasets lack the temporal resolution and annotations needed to address this censoring effect. To fill this gap, we present FreshRetailNet-50K, the first large-scale benchmark for censored demand estimation. It comprises 50,000 store-product time series of detailed hourly sales data from 898 stores in 18 major cities, encompassing 863 perishable SKUs meticulously annotated for stockout events. The hourly stock status records unique to this dataset, combined with rich contextual covariates, including promotional discounts, precipitation, and temporal features, enable innovative research beyond existing solutions. We demonstrate one such use case of two-stage demand modeling: first, we reconstruct the latent demand during stockouts using precise hourly annotations. We then leverage the recovered demand to train robust demand forecasting models in the second stage. Experimental results show that this approach achieves a 2.73% improvement in prediction accuracy while reducing the systematic demand underestimation from 7.37% to near-zero bias. With unprecedented temporal granularity and comprehensive real-world information, FreshRetailNet-50K opens new research directions in demand imputation, perishable inventory optimization, and causal retail analytics. The unique annotation quality and scale of the dataset address long-standing limitations in retail AI, providing immediate solutions and a platform for future methodological innovation. The data (https://huggingface.co/datasets/Dingdong-Inc/FreshRetailNet-50K) and code (https://github.com/Dingdong-Inc/frn-50k-baseline) are openly released.

2602.13848 2026-06-15 cs.LG stat.ML 版本更新

Testing For Distribution Shifts with Conditional Conformal Test Martingales

基于条件共形检验鞅的分布偏移检测

Shalev Shaer, Yarin Bar, Drew Prinster, Yaniv Romano

发表机构 * Technion - Israel Institute of Technology(以色列理工学院)

AI总结 提出一种顺序检验方法,通过固定参考集避免测试污染,利用稳健鞅构造实现任意有效的I型错误控制和渐近功效1,检测速度优于标准共形检验鞅。

详情
AI中文摘要

我们提出了一种用于检测任意分布偏移的顺序检验方法,该方法允许共形检验鞅(CTM)在固定的参考条件设置下工作。现有的CTM检测器通过不断用每个新样本扩展参考集来构建检验鞅,并以此评估新样本相对于过去观测的异常程度。虽然这种设计能实现任意有效的I型错误控制,但它存在测试污染问题:变化发生后,偏移后的观测进入参考集,稀释了分布偏移的证据,增加了检测延迟并降低了功效。相比之下,我们的方法通过将每个新样本与固定的零假设参考数据集进行比较,从设计上避免了污染。我们的主要技术贡献是一种稳健的鞅构造,该构造在条件于零假设参考数据时仍然有效,通过显式考虑有限参考集引起的参考分布估计误差来实现。这实现了任意有效的I型错误控制,同时保证了渐近功效为1和有界期望检测延迟。实验表明,我们的方法比标准CTM更快地检测到偏移,提供了一种强大且可靠的分布偏移检测器。

英文摘要

We propose a sequential test for detecting arbitrary distribution shifts that allows conformal test martingales (CTMs) to work under a fixed, reference-conditional setting. Existing CTM detectors construct test martingales by continually growing a reference set with each incoming sample, using it to assess how atypical the new sample is relative to past observations. While this design yields anytime-valid type-I error control, it suffers from test-time contamination: after a change, post-shift observations enter the reference set and dilute the evidence for distribution shift, increasing detection delay and reducing power. In contrast, our method avoids contamination by design by comparing each new sample to a fixed null reference dataset. Our main technical contribution is a robust martingale construction that remains valid conditional on the null reference data, achieved by explicitly accounting for the estimation error in the reference distribution induced by the finite reference set. This yields anytime-valid type-I error control together with guarantees of asymptotic power one and bounded expected detection delay. Empirically, our method detects shifts faster than standard CTMs, providing a powerful and reliable distribution-shift detector.

2604.14892 2026-06-15 cs.LG cs.AI 版本更新

Can LLMs Accurately Score Medical Diagnoses and Clinical Reasoning?

LLM能否准确评分医学诊断和临床推理?

Amy Rouillard, Sitwala Mundia, Linda Camara, Ziyaad Dangor, Michael Cameron Gramanie, Ismail Kalla, Shabir A. Madhi, Kajal Morar, Marlvin T. Ncube, Haroon Saloojee, Bruce A. Bassett

发表机构 * Wits MIND Institute, University of the Witwatersrand, Johannesburg, South Africa(金山大学MIND研究所,约翰内斯堡,南非) Grai Labs, Cape Town, South Africa(Grai Labs,开普敦,南非) South African Medical Research Council Vaccines and Infectious Diseases Analytics Research Unit, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa(南非医学研究理事会疫苗与传染病分析研究组,金山大学健康科学学院,约翰内斯堡,南非) Department of Internal Medicine, Charlotte Maxeke Johannesburg Academic Hospital, and Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa(夏洛特·马克西克约翰内斯堡学术医院内科,金山大学健康科学学院,约翰内斯堡,南非) Department of Paediatrics and Child Health, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa(金山大学健康科学学院儿科与儿童健康系,约翰内斯堡,南非)

AI总结 研究使用LLM陪审团对300例低收入和中等收入国家医院病例的3334个诊断进行评分,发现校准后的LLM评分与专家评分高度一致,且严重错误风险更低,可作为可靠的评估代理。

详情
AI中文摘要

使用专家临床医生小组评估医学AI系统成本高且速度慢,这促使使用大型语言模型(LLM)作为替代评判者。在此,我们评估了一个由三个前沿AI模型组成的LLM陪审团,对300个真实低收入和中等收入国家(LMIC)医院病例的3334个诊断进行评分。LLM和临床医生生成的诊断均根据专家小组诊断在四个维度上进行评分:诊断、鉴别诊断、临床推理和阴性治疗风险。将LLM陪审团评分与专家和独立重新评分小组的评分进行比较,以评估误差指标、评分者间一致性、严重风险错误以及使用等渗回归进行事后校准的效果。在我们的数据中,我们发现:(i)未校准的LLM陪审团评分与专家临床医生小组评分保持序数一致性,但系统性地更低;(ii)LLM陪审团出现严重风险错误的概率低于人类专家重新评分小组;(iii)LLM陪审团结合LLM诊断可用于识别高风险错误诊断,从而实现有针对性的专家审查并提高小组效率;(iv)校准后的LLM陪审团评分和诊断代理排名与主要专家小组的评分和排名表现出极好的一致性;(v)LLM陪审团模型没有表现出自我偏好偏差,它们对自己底层模型或同一供应商模型生成的诊断评分并不比其他模型生成的诊断更有利(或更不利)。总之,这些结果提供了证据,表明校准后的LLM陪审团是医学AI基准测试中专家临床医生评估的值得信赖且可靠的代理。在其他临床环境中确认这些发现是未来工作的重要方向。

英文摘要

Evaluating medical AI systems using expert clinician panels is costly and slow, motivating the use of large language models (LLMs) as alternative adjudicators. Here, we evaluate an LLM Jury, composed of three frontier AI models, for scoring 3334 diagnoses on 300 real-world low- and middle-income country (LMIC) hospital cases. Both LLM- and clinician-generated diagnoses are scored against expert panel diagnoses across four dimensions: diagnosis, differential diagnosis, clinical reasoning, and negative treatment risk. The LLM Jury scores are compared with expert and independent re-scoring panel scores to assess error metrics, inter-rater agreement, severe-risk errors, and the effect of post hoc calibration using isotonic regression. In our data, we find that: (i) the uncalibrated LLM Jury scores preserve ordinal agreement with the expert clinician panel scores, but are systematically lower; (ii) the probability of severe-risk errors is lower for the LLM Jury than for the human expert re-score panels; (iii) the LLM Jury combined with LLM diagnoses can be used to identify diagnoses at high risk of error, enabling targeted expert review and improved panel efficiency; (iv) the calibrated LLM Jury scores and rankings of diagnosing agents show excellent agreement with those of the primary expert panels; (v) LLM Jury models show no self-preference bias: they did not score diagnoses generated by their own underlying model, or by models from the same vendor, more (or less) favourably than those generated by other models. Together, these results provide evidence that a calibrated LLM Jury is a trustworthy and reliable proxy for expert clinician evaluation in medical AI benchmarking. Confirming these findings in other clinical settings is an important direction for future work.
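The post hoc calibration step mentioned in the abstract above, isotonic regression, can be sketched with the pool-adjacent-violators algorithm. This is a generic illustration and not the authors' pipeline: the function name `isotonic_fit` and the toy scores are hypothetical. The fit maps raw jury scores to a nondecreasing function of themselves that best matches the expert scores in the least-squares sense, which is why it can correct a systematic offset while preserving ordinal agreement.

```python
def isotonic_fit(xs, ys):
    """Pool-adjacent-violators: least-squares fit to ys that is nondecreasing in xs.

    Returns the fitted value for each input point, in the original order.
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    blocks = []  # each block holds [sum of ys, number of points]
    for i in order:
        blocks.append([ys[i], 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count

    # Expand block means back to one fitted value per sorted point.
    fitted_sorted = []
    for total, count in blocks:
        fitted_sorted.extend([total / count] * count)

    fitted = [0.0] * len(xs)
    for rank, i in enumerate(order):
        fitted[i] = fitted_sorted[rank]
    return fitted


# Toy data (hypothetical): raw jury scores and the matching expert panel scores.
jury_scores = [1, 2, 3, 4]
expert_scores = [1, 3, 2, 4]
calibrated = isotonic_fit(jury_scores, expert_scores)
```

The two middle points violate monotonicity, so they are pooled to their mean and the calibrated scores stay ordered by the raw jury score.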

2606.12994 2026-06-15 cs.LG cs.CE 版本更新

DeepJEB++: Foundation Model-Driven Large-Scale 3D Engineering Dataset via 2D Latent Space Augmentation

DeepJEB++: 基于基础模型驱动的二维潜空间增强的大规模三维工程数据集

Soyoung Yoo, Leekyo Jeong, Jinsu Ra, Dongeon Lee, Sunwoong Yang, Hyogu Jeong, Namwoo Kang

发表机构 * Cho Chun Shik Graduate School of Mobility, Korea Advanced Institute of Science and Technology(韩国科学技术院赵春植移动研究生院) Department of Mechanical Engineering, Hanyang University(汉阳大学机械工程系) Narnia Labs(纳尼亚实验室)

AI总结 提出DeepJEB++框架,通过二维潜空间增强和基础模型,将少量喷气发动机支架种子设计扩展为大规模带仿真标签的三维数据集,实现40倍扩展。

Comments 16 pages, 14 figures. Submitted to ASME Journal of Mechanical Design

详情
AI中文摘要

数据驱动的工程设计受到缺乏大规模三维数据集的限制,这些数据集需要将几何形状与基于物理的性能标签配对。特别是,现有的三维数据增强技术在保留微妙且多样的几何变化方面存在局限性,并且自动化后续的仿真标注过程仍然困难,因为边界条件取决于生成的几何形状。我们提出了DeepJEB++,一个基础模型驱动的数据增强框架,在资源受限的情况下将少量喷气发动机支架种子设计扩展为大规模、带仿真标签的三维数据集。我们的关键思想是在数据丰富的二维潜空间中进行增强,然后转移到三维。在第一阶段,我们在多视图渲染上微调预训练的二维潜扩散模型,并通过潜插值合成新视图,通过视觉语言模型(VLM)质量过滤器保留可制造的设计。在第二阶段,经过验证的图像通过领域适应的生成基础模型提升为三维网格。在第三阶段,一个自动化流水线识别每个网格上的载荷和螺栓接口,并分配有限元标签——质量、应力和位移——无需人工干预。我们沿着三个内在轴评估增强质量:可制造性、相对于SimJEB真实值的标签保真度以及分布一致性。从少于400个种子设计开始,DeepJEB++在每阶段使用单个GPU的情况下,生成了15,360个带仿真标签的三维支架——实现了40倍的扩展。该数据集将公开提供,以支持可复现的工程AI研究。

英文摘要

Data-driven engineering design is constrained by the lack of large-scale 3D datasets that pair geometry with physics-based performance labels. In particular, existing 3D data augmentation techniques have limitations in preserving subtle and diverse geometric variations, and it remains difficult to automate the subsequent simulation-labeling process, where boundary conditions vary depending on the generated geometry. We present DeepJEB++, a foundation-model-driven data-augmentation framework that expands a small seed set of jet engine brackets into a large, simulation-labeled 3D dataset under constrained resources. Our key idea is to augment in the data-rich 2D latent space, then transfer to 3D. In Stage 1, we fine-tune a pretrained 2D latent diffusion model on multi-view renders and synthesize novel views by latent interpolation, retaining manufacturable designs through a vision-language-model (VLM) quality filter. In Stage 2, the validated images are lifted to 3D meshes by a domain-adapted generative foundation model. In Stage 3, an automated pipeline recognizes the load and bolt interfaces on each mesh and assigns finite-element labels -- mass, stress, and displacement -- without manual intervention. We assess augmentation quality along three intrinsic axes: manufacturability, label fidelity against the SimJEB ground truth, and distributional consistency. Starting from fewer than 400 seed designs, DeepJEB++ yields 15,360 simulation-labeled 3D brackets -- a 40x expansion -- using a single GPU per stage. The dataset will be made publicly available to support reproducible engineering-AI research.

2606.13221 2026-06-15 cs.LG 版本更新

From Uncertain Judgments to Calibrated Rankings: Conformal Elo Estimation for LLM Evaluation

从不确定判断到校准排名:用于LLM评估的共形Elo估计

Bora Kargi, David Salinas

发表机构 * ELLIS Institute Tübingen(ELLIS 蒂宾根研究所) OpenEuroLLM

AI总结 提出一种两层次校准方法,通过局部不确定性传播和全局共形预测,将LLM-as-a-judge的Elo评分误差降至17.9 MAE,并提供无分布假设的置信区间。

详情
AI中文摘要

评估新的大型语言模型通常需要大规模且昂贵的人工标注。LLM作为评判者提供了一种更便宜的替代方案,但评判者评分存在系统误差——如位置偏差、自我偏好或不可传递性——这些误差可能导致最终排名严重失准。我们在两个互补层面上量化评判者与人类之间的分歧。在局部层面,我们通过将校准的获胜概率而非硬标签传播到Bradley-Terry过程中,从评判者自身的评分差异估计每场对战的不确定性。仅此一项就显著提高了Elo估计的准确性,在LMArena上对55个保留模型取平均时,LLM得出的评分与人类得出的评分之间的平均绝对误差为17.9 Elo。在全局层面,我们将分裂共形预测应用于LLM得出的与人类得出的Elo评分之间的残差差距,产生具有无分布边际覆盖保证的预测区间,从而解释了不可约的LLM-人类分歧。这两层结合产生了一个低成本的评估工具,为开发者提供校准的Elo估计和诚实的置信区间,而无需大规模人工标注。为促进可重复性,我们在https://this http URL发布代码。

英文摘要

Evaluating new large language models typically requires costly human annotation campaigns at scale. LLM-as-a-judge offers a cheaper alternative, but judge scores carry systematic errors - such as position bias, self-preference, or intransitivity - that can strongly miscalibrate the resulting rankings. We quantify the resulting judge-human disagreement at two complementary levels. At the local level, we estimate per-battle uncertainty from the judge's own score differences by propagating calibrated win probabilities rather than hard labels into the Bradley-Terry procedure. This alone provides a drastic improvement to Elo estimation accuracy, bringing LLM-derived ratings within 17.9 Elo MAE of human-derived ones when averaged over 55 held-out models on LMArena. At the global level, we apply split conformal prediction to the residual gap between LLM-derived and human-derived Elo ratings across held-out models, producing prediction intervals with distribution-free marginal coverage guarantees that account for irreducible LLM-human disagreement. Together, these two layers yield a low-cost evaluation tool that provides developers with calibrated Elo estimates and honest uncertainty bounds, without access to large-scale human annotations. To facilitate reproducibility, we release our code at https://github.com/kargibora/SoftElo .
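The global layer described in the abstract above, split conformal prediction on the gap between LLM-derived and human-derived Elo, can be sketched as follows. This is a generic split conformal recipe and not the released SoftElo code: the function names, the symmetric absolute-residual score, and the toy numbers are assumptions. The half-width is the ceil((n+1)(1-α))-th smallest absolute residual on a calibration set of models that have both ratings.

```python
import math


def split_conformal_halfwidth(calib_residuals, alpha=0.1):
    """Half-width of a split conformal interval from calibration residuals.

    calib_residuals: human-derived Elo minus LLM-derived Elo for calibration models.
    Returns math.inf when the calibration set is too small for the requested level.
    """
    scores = sorted(abs(r) for r in calib_residuals)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    if k > n:
        return math.inf
    return scores[k - 1]


def elo_interval(llm_elo, halfwidth):
    """Prediction interval for the human-derived Elo of a new model."""
    return llm_elo - halfwidth, llm_elo + halfwidth


# Toy calibration residuals (hypothetical values, in Elo points).
residuals = [5, -10, 15, -20, 25, -30, 35, -40, 45]
halfwidth = split_conformal_halfwidth(residuals, alpha=0.2)
interval = elo_interval(1200.0, halfwidth)
```

The coverage guarantee is marginal over exchangeable models, so it holds on average across new models and not for any single model.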

2601.04646 2026-06-15 cs.IR cs.AI cs.CL cs.LG 版本更新

Succeeding at Scale: Enterprise Retrieval Benchmark Construction and Index-Preserving Query Adaptation for Multi-Tenant Search

规模化成功:面向多租户搜索的企业检索基准构建与索引保持查询适配

Prateek Jain, Shabari S Nair, Ritesh Goru, Prakhar Agarwal, Ajay Yadav, Yoga Sri Varshan Varadharajan, Constantine Caramanis


AI总结 针对多租户检索系统中标注数据匮乏和模型更新成本高的问题,提出全自动构建基准DevRev-Search,并研究仅微调查询编码器而保持文档索引不变的索引保持查询适配策略,实现质量与效率的平衡。

详情
AI中文摘要

大规模多租户检索系统生成大量查询日志,但缺乏用于有效领域适应的精心策划的相关性标签,导致大量“暗数据”未被充分利用。模型更新的高成本加剧了这一挑战,因为联合微调查询和文档编码器需要完整的语料库重新索引,这在拥有数千个独立索引的多租户环境中是不切实际的。我们引入了DevRev-Search,这是一个通过完全自动化管道构建的技术客户支持段落检索基准。候选生成使用跨多种稀疏和密集检索器的融合,随后使用LLM作为评判器进行一致性过滤和相关性标记。我们进一步研究并系统评估了索引保持查询适配策略,该策略仅微调查询编码器,同时保持文档索引固定。在DevRev-Search、SciFact和FiQA-2018上的实验表明,参数高效的查询编码器微调提供了显著的质量-效率权衡,实现了可扩展且实用的企业多租户检索。

英文摘要

Large-scale multi-tenant retrieval systems generate extensive query logs but lack curated relevance labels for effective domain adaptation, resulting in substantial underutilized "dark data." This challenge is compounded by the high cost of model updates, as jointly fine-tuning query and document encoders requires full corpus re-indexing, which is impractical in multi-tenant settings with thousands of isolated indices. We introduce DevRev-Search, a passage retrieval benchmark for technical customer support built via a fully automated pipeline. Candidate generation uses fusion across diverse sparse and dense retrievers, followed by an LLM-as-a-Judge for consistency filtering and relevance labeling. We further study and systematically evaluate index-preserving query-only adaptation strategies that fine-tune only the query-encoder while keeping the document indices fixed. Experiments on DevRev-Search, SciFact, and FiQA-2018 show that parameter-efficient fine-tuning of the query encoder delivers a remarkable quality-efficiency trade-off, enabling scalable and practical enterprise multi-tenant retrieval.

2602.00593 2026-06-15 cs.CV cs.LG 版本更新

Pix2Fact: When Vision Is Not Enough -- Benchmarking Fine-Grained VQA with Web Verification on High-Resolution Real-World Scenes

Pix2Fact: 当视觉不够时——基于网络验证的细粒度VQA基准测试

Yifan Jiang, Cong Zhang, Bofei Zhang, Qiaofeng Zheng, Yifan Yang, Bingzhang Wang, Yew-Soon Ong

发表机构 * GADE Union (Global AI Data Experts Union)(GADE联盟(全球人工智能数据专家联盟)) Shanghai Jiao Tong University(上海交通大学) Nanyang Technological University(南洋理工大学) New York University(纽约大学) Cambridge University(剑桥大学) The University of Hong Kong(香港大学)

AI总结 本文提出Pix2Fact基准测试,通过高分辨率真实场景中的网络验证,评估细粒度视觉问答中的专家级视觉感知和知识搜索能力,发现现有模型在复杂任务中存在显著不足。

详情
AI中文摘要

尽管在通用任务上取得了进展,视觉-语言模型(VLMs)仍然在需要精细视觉定位和外部知识的挑战中面临困难,而现有基准测试未能综合评估这些能力。为填补这一空白,我们引入Pix2Fact,一个视觉问答基准测试,旨在评估专家级视觉感知和知识搜索能力。Pix2Fact包含1000张高分辨率(4K+)图像,覆盖八个场景。其问题和答案由来自全球顶尖大学的博士持有标注者精心设计。每个问题都需要详细的视觉定位和外部知识的整合。评估十种最先进的VLMs,包括专有模型如Gemini-3.1-Pro和GPT-5.4,发现Pix2Fact对模型提出了严峻挑战:最先进的模型(Gemini-3.1-Pro)在有视觉地面真实和搜索工具的情况下仅达到51.7%的平均准确率。我们的分析将低准确率归因于三个因素:即使有视觉地面真实,频繁的视觉定位错误,浅层搜索利用,以及VLM无法检索长尾、无结构的局部信息。这种显著的差距暴露了当前模型在帮助人类处理需要超负荷视觉理解的现实场景中的局限性。我们相信Pix2Fact将作为推动下一代语言-视觉代理的关键基准测试,这些代理能够无缝整合细粒度感知与稳健的知识搜索。

英文摘要

Despite progress on general tasks, vision-language models (VLMs) still struggle with challenges that demand both fine-grained visual grounding and external knowledge, a synergy overlooked by existing benchmarks that evaluate these abilities in isolation. To fill this void, we introduce Pix2Fact, a visual question-answering benchmark designed to assess expert-level visual perception and knowledge search. Pix2Fact comprises 1,000 high-resolution (4K+) images spanning eight scenarios. Its questions and answers are meticulously crafted by PhD-holding annotators from top global universities across diverse disciplines. Each question requires detailed visual grounding and the integration of external knowledge. Evaluating ten state-of-the-art VLMs, including proprietary models such as Gemini-3.1-Pro and GPT-5.4, we find that Pix2Fact poses a formidable challenge: the most advanced model (Gemini-3.1-Pro) achieves only 51.7% average accuracy, even with access to visual ground truth and search tools. Our analysis attributes this low accuracy to three factors: frequent visual grounding errors even with visual ground truth, shallow use of search tools, and VLMs' inability to retrieve long-tail, unstructured local information. This striking gap exposes the limitations of current models in assisting humans with real-world scenarios that demand overwhelming visual comprehension. We believe Pix2Fact will serve as a critical benchmark to drive the next generation of language-vision agents that seamlessly integrate fine-grained perception with robust knowledge search.

2602.22822 2026-06-15 cs.AI cs.LG 版本更新

FlexMS: A Unified Public Benchmark for Molecule Tandem Mass Spectrum Prediction

FlexMS:分子串联质谱预测的统一公共基准

Yunhua Zhong, Yixuan Tang, Yifan Li, Pan Liu, Zhiwen Yang, Jie Yang, Jun Xia

发表机构 * The Hong Kong University of Science and Technology (Guangzhou)(香港科技大学(广州)) The Hong Kong University of Science and Technology(香港科技大学) The University of Hong Kong(香港大学) Yangzhou University(扬州大学) Fudan University(复旦大学)

AI总结 提出FlexMS基准框架,通过标准化预处理、元数据条件和评估协议,实现跨公共资源的公平比较,并引入难度感知诊断指导模型选择。

Comments preprint version v3

详情
AI中文摘要

串联质谱(MS/MS)在小分子鉴定中至关重要,但当前的深度学习谱预测系统在实际评估和部署中仍存在困难。尽管新架构不断声称达到最先进性能,但不一致的元数据条件和纠缠的预处理流程阻碍了公平的架构比较。此外,现有评估通常局限于精心策划的数据集,未能捕捉真实代谢组学的异质性和跨领域偏移。而且,当前基准缺乏难度感知诊断,对模型在特定计算或数据约束下的行为视而不见。为解决这些问题,我们提出了FlexMS,一个模块化的公共数据基准框架,它在统一协议下标准化跨公共资源的MS/MS预测,同时保留分子编码器、元数据条件、预测头以及下游检索。FlexMS建立了一个公平的评估平台,显著降低了集成新预测工具的门槛。FlexMS不仅优化平均分数,还通过难度感知诊断增强聚合准确性,为不同计算约束、数据规模和下游检索目标下的模型选择提供可操作指导。最终,FlexMS为社区提供了一个可复现的标准,以识别哪些算法结论是稳定的,以及哪些操作点在实践中最为可行。

英文摘要

Tandem mass spectrometry (MS/MS) is central to small molecule identification, but current deep learning systems for spectrum prediction remain difficult to evaluate and deploy in practice. While novel architectures constantly claim state-of-the-art performance, inconsistent metadata conditioning and entangled preprocessing pipelines hinder fair architectural comparisons. Besides, existing evaluations are often restricted to curated datasets, failing to capture the heterogeneity and cross-domain shifts of real-world metabolomics. Furthermore, current benchmarks lack difficulty-aware diagnostics and are blind to how models behave under specific compute or data constraints. To address this, we present FlexMS, a modular public-data benchmark framework that standardizes MS/MS prediction across public resources while keeping molecular encoders, metadata conditioning, predictor heads, and downstream retrieval under one protocol. FlexMS establishes a fair evaluation playground which significantly lowers the barrier for integrating new predictive tools. Rather than solely optimizing for average scores, FlexMS augments aggregate accuracy with difficulty-aware diagnostics, providing actionable guidance on model selection across different compute constraints, data scales, and downstream retrieval objectives. Ultimately, FlexMS provides the community with a reproducible standard to identify which algorithmic conclusions are stable and which operating points are most viable in practice.

12. 机器学习应用 61 篇

2606.13741 2026-06-15 cs.LG 新提交

High-Frequency Pricing at Scale for E-Commerce

电子商务中的大规模高频定价

Stefan Birr, Tobias Huelden, Mones Raslan, Adele Gouttes, Andreas Schmitt, Mateusz Koren, Johannes Stephan, Robert Streek, Manuel Kunz, Tim Januschowski

发表机构 * Zalando SE Databricks

AI总结 提出一种预测-优化框架,结合梯度提升树与多目标优化,实现时尚电商促销活动的每日高频定价,通过23次A/B测试验证,利润提升约6%。

详情
AI中文摘要

本文介绍了针对时尚电商促销活动的一种专门的预测-优化算法定价工具的设计、开发和实施。销售活动给定价带来了独特的挑战,包括波动的需求模式、快速的定价决策以及平衡短期收入与长期盈利能力的需要。我们描述了我们的方法,该方法结合了使用梯度提升树的每日分辨率需求预测与一个多目标优化框架,该框架针对超过500万件商品同时最大化长期利润和净商品价值。我们的解决方案通过实现一个预测-优化架构,将定价决策时间从数小时缩短到数分钟,解决了现有周粒度系统的关键局限性。我们通过在2023-2024年期间在欧洲领先的在线时尚零售商Zalando的12个市场中进行的23次A/B测试验证了我们的方法。实验结果表明,与之前的手动-算法混合方法相比,新的定价系统在保持同等销售和收入表现的同时,实现了约6%的更高利润。基于这些结果,该算法已成功部署到生产环境,现在负责公司促销活动中的大部分算法定价决策。

英文摘要

This paper presents the design, development, and implementation of a specialized forecast-then-optimize algorithmic pricing tool for sales campaigns in fashion e-commerce. Sales events present unique challenges for pricing including volatile demand patterns, rapid pricing decisions, and the need to balance short-term revenue with long-term profitability. We describe our approach combining daily-resolution demand forecasting using gradient-boosted trees with a multi-objective optimization framework that maximizes both long-term profit and net merchandise value for more than 5 million articles. Our solution addresses key limitations of existing weekly-granularity systems by implementing a forecast-then-optimize architecture that reduces pricing decision time from hours to minutes. We validate our approach through 23 A/B tests across 12 markets during 2023-2024 sales campaigns at Zalando, one of Europe's leading online fashion retailers. Experimental results demonstrate that the new pricing system achieves approximately 6% higher profit while maintaining equivalent performance on sales and revenue compared to the previous manual-algorithmic hybrid approach. Based on these results, the algorithm was successfully deployed to production and now handles the majority of algorithmic pricing decisions for sales campaigns at the company.

2606.13742 2026-06-15 cs.LG cs.AI physics.comp-ph physics.flu-dyn stat.ML 新提交

A fully GPU-based workflow for building physics emulators of hypersonic flows

基于全GPU工作流构建高超声速流物理仿真器

Fabian Paischer, Dylan Rubini, Deniz A. Bezgin, Aaron B. Buhendwa, David Hauser, Florian Sestak, Johannes Brandstetter, Sebastian Kaltenbach, Nikolaus A. Adams

发表机构 * TU Munich(慕尼黑工业大学) Institute for Machine Learning, JKU Linz(林茨约翰·开普勒大学机器学习研究所) ELLIS Unit(ELLIS单元) EMMI AI

AI总结 提出全GPU工作流,集成加速数据生成与不确定性量化增强的神经仿真器训练,通过可微求解器JAX-Fluids实现残差驱动改进,提升物理一致性并支持外推。

Comments First authors contributed equally

详情
AI中文摘要

以高保真度和低计算成本解析复杂物理现象的能力是解决现代工程关键挑战的核心。一个典型例子是高超声速流,其中精确预测全流场拓扑,特别是激波位置和强度,至关重要。然而,超声速和高超声速流仍然是传统降阶模型和神经仿真器的绊脚石,这些模型难以在工业相关应用中物理一致地捕捉流态中的陡峭梯度。为此,我们引入了一个完全基于GPU的工作流,该工作流将加速数据生成与通过不确定性量化和物理感知细化增强的神经仿真器训练相结合。我们的工作流由可微高保真求解器(JAX-Fluids)实现,我们利用该求解器进行快速数据集创建和基于残差的神经仿真器改进,以增强物理一致性。在此框架基础上,我们首先提出了一系列模型架构,并分析了它们的缩放行为以揭示其优缺点。然后,我们表明基于残差的细化使得能够在仅提供网格和输入参数的情况下进行训练,显著降低残差并提高物理一致性。可微仿真和基于残差的细化共同产生了在其训练分布之外仍然可靠的物理仿真器,这是在现实工程设计循环中部署代理的关键要求。

英文摘要

The ability to resolve complex physical phenomena with high fidelity and at low computational cost is central to addressing key challenges in modern engineering. A prime example lies in hypersonic flows, where the precise prediction of the full flowfield topology, in particular with respect to shock wave location and intensity, is critical. Yet supersonic and hypersonic flows continue to be a stumbling block for traditional reduced-order models and neural emulators that struggle to capture steep gradients in flow states with physical consistency in applications of industrial relevance. To that end, we introduce a fully GPU-based workflow that integrates accelerated data generation with the training of neural emulators augmented by uncertainty quantification and physics-aware refinement. Our workflow is enabled by a differentiable high-fidelity solver (JAX-Fluids) which we employ for rapid dataset creation and residual-based improvement of the neural emulator to enhance physical consistency. Building on this framework, we first present a suite of model architectures and analyze their scaling behavior to expose their strengths and shortcomings. We then show that residual-based refinement enables training on cases where only mesh and input parameters are available, substantially reducing residuals and improving physical consistency. Together, differentiable simulation and residual-based refinement yield physics emulators that remain reliable beyond their training distribution, a key requirement for deploying surrogates in real-world engineering design loops.

2606.13821 2026-06-15 cs.LG 新提交

Attention-Based Estimation of the Individual Treatment Benefit Probability under Dose Variation

基于注意力的剂量变化下个体治疗获益概率估计

Lev V. Utkin, Andrei V. Konstantinov, Stanislav K. Kogan, Natalya M. Verbova, Maksim I. Goriunov

发表机构 * Peter the Great St. Petersburg Polytechnic University Higher School of Artificial Intelligence Technologies(圣彼得堡彼得大帝理工大学人工智能技术高等学院)

AI总结 提出Dose-AIPTB框架,将个体治疗获益概率估计扩展至离散剂量场景,通过注意力机制聚合伪标签实现个性化剂量选择。

详情
AI中文摘要

估计个体患者治疗优于对照的概率,称为个体治疗获益概率(IPTB),提供了比群体平均指标更具临床直观性的替代方案。然而,现有的IPTB估计方法主要局限于二元治疗设置,尽管临床实践中剂量变化干预普遍存在。我们提出一个通用框架,用于离散剂量分配下有序结局的IPTB估计,称为Dose-AIPTB(基于注意力的剂量IPTB)。我们的方法将问题重述为对未观察到的个体治疗效应符号的二元分类,从协变量相似的成对比较中构建伪标签,并通过注意力机制或Nadaraya-Watson核回归进行聚合。该公式自然适应多个离散剂量水平,超越了二元治疗范式。通过在协变量偏移、不同样本量和异质性结局下的真实世界和合成数据上的数值实验,我们证明基于注意力的聚合始终优于核方法。该框架为基于个体水平获益概率的个性化剂量选择提供了基础。实现该模型的代码公开于 https://github.com/NTAILab/AIPTBDose 。

英文摘要

Estimating the probability that a treatment outperforms a control for an individual patient, called the Individual Probability of Treatment Benefit (IPTB), offers a clinically intuitive alternative to population-average metrics. However, existing methods for IPTB estimation are largely confined to binary treatment settings, despite the prevalence of dose-varying interventions in clinical practice. We propose a general framework for IPTB estimation with ordinal outcomes under discrete dose assignments, called Dose-AIPTB (Dose Attention-based IPTB). Our approach recasts the problem as binary classification over the unobserved sign of the individual treatment effect, constructing pseudo-labels from covariate-similar pairwise comparisons and aggregating them via attention mechanisms or Nadaraya-Watson kernel regression. This formulation naturally accommodates multiple discrete dose levels, extending beyond the binary treatment paradigm. Through numerical experiments on real-world and synthetic data under covariate shift, varying sample sizes, and heterogeneous outcomes, we demonstrate that attention-based aggregation consistently outperforms kernel alternatives. The framework provides a foundation for personalized dose selection grounded in individual-level benefit probabilities. Codes implementing the model are publicly available at https://github.com/NTAILab/AIPTBDose.
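The kernel baseline named in the abstract can be sketched as a Nadaraya-Watson average of binary pseudo-labels. Everything concrete here is an assumption for illustration: the function name, the Gaussian kernel, the bandwidth, and the toy data do not come from the paper or its repository. With a Gaussian kernel the weights equal a softmax over negative scaled squared distances, which is one way to see why a learned attention mechanism is a natural generalization.

```python
import math


def nw_benefit_probability(x, train_x, pseudo_labels, bandwidth=1.0):
    """Gaussian-kernel Nadaraya-Watson estimate of P(benefit | covariates x).

    train_x: covariate vectors of the comparison pairs (hypothetical layout).
    pseudo_labels: 1 if the higher-dose member of a pair did better, else 0.
    """
    weights = []
    for xi in train_x:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
        weights.append(math.exp(-sq_dist / (2.0 * bandwidth ** 2)))
    total = sum(weights)
    # Weighted average of 0/1 labels, so the result always lies in [0, 1].
    return sum(w * y for w, y in zip(weights, pseudo_labels)) / total


# Toy one-dimensional covariates with made-up pseudo-labels.
train_x = [(0.0,), (1.0,), (4.0,)]
pseudo_labels = [1, 1, 0]
p_near_benefit = nw_benefit_probability((0.5,), train_x, pseudo_labels)
p_near_harm = nw_benefit_probability((4.0,), train_x, pseudo_labels)
print(round(p_near_benefit, 3), round(p_near_harm, 3))
```

For several discrete dose levels, one such estimate per pair of doses would be one possible extension. The abstract does not specify how the comparisons are organized.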

2606.13880 2026-06-15 cs.LG q-fin.RM 新提交

A Longitudinal Attribute-Conditioned Neural Network for Modeling Health-State Transition Probabilities in Temporally Irregular Data: The LANTERN Framework

一种纵向属性条件神经网络用于不规则时间数据中健康状态转移概率建模:LANTERN框架

Bright Kwaku Manu, Beckett Sterner, Petar Jevtic

发表机构 * School of Computing and Augmented Intelligence, Arizona State University(亚利桑那州立大学计算与增强智能学院) School of Life Sciences, Arizona State University(亚利桑那州立大学生命科学学院) School of Mathematical and Statistical Sciences, Arizona State University(亚利桑那州立大学数学与统计科学学院)

AI总结 提出LANTERN框架,利用条件神经网络从纵向健康数据中估计多状态转移概率,处理不规则时间间隔和协变量历史,在健康与退休研究数据上优于逻辑回归等基准模型。

Comments 35 pages, 17 figures

详情
AI中文摘要

长期护理转移概率的准确估计对于残疾保险定价、准备金和偿付能力评估至关重要。经典精算多状态模型通常依赖于马尔可夫、半马尔可夫或比例风险设定,这些模型直接与队列预测相关,但对于具有非线性老龄化模式和异质性协变量历史的不规则纵向健康数据可能具有限制性。本文开发了一种针对不规则纵向健康数据的多状态转移概率的良好校准估计器。该模型从个体健康史中学习,纳入观测之间的时间间隔,并根据人口统计学和社会经济属性条件化转移概率。它生成下一个观测健康状态的有效概率分布,包含四种可能状态:健康、轻度残疾、重度残疾和死亡。个体概率按年龄组和初始状态聚合,形成与精算队列预测兼容的转移矩阵。利用健康与退休研究的纵向数据,我们将所提出的估计器与逻辑回归、梯度提升树、循环神经网络和最后状态持久性基准进行比较。评估考虑了概率准确性、重度残疾和死亡的端点判别与校准、风险集中度以及聚合后的转移矩阵误差。所提出的估计器相对于逻辑回归和梯度提升树基准改善了重度残疾判别,保持强校准性,并在留出测试分析中在评估模型中产生最低的转移矩阵误差。结果表明,当通过校准和预测保真度(超越判别)进行评判时,结构化的机器学习估计器可以支持长期护理转移建模。

英文摘要

Accurate estimation of long-term care transition probabilities is central to disability insurance pricing, reserving, and solvency assessment. Classical actuarial multi-state models commonly rely on Markov, semi-Markov, or proportional-hazard specifications, which provide a direct connection to cohort projection but may be restrictive for irregular longitudinal health data with nonlinear aging patterns and heterogeneous covariate histories. This paper develops a well-calibrated estimator of multi-state transition probabilities for irregular longitudinal health data. The model learns from individual health history, incorporates the time elapsed between observations, and conditions transition probabilities on demographic and socioeconomic attributes. It produces a valid probability distribution over the next observed health state, with four possible states: healthy, mild disability, severe disability, and death. Individual probabilities are aggregated by age group and origin state to form transition matrices compatible with actuarial cohort projection. Using longitudinal data from the Health and Retirement Study, we compare the proposed estimator with logistic regression, gradient-boosted trees, a recurrent neural network, and a last-state persistence benchmark. The evaluation considers probabilistic accuracy, endpoint discrimination and calibration for severe disability and death, risk concentration, and transition matrix error after aggregation. The proposed estimator improves severe disability discrimination relative to logistic regression and gradient-boosted tree benchmarks, maintains strong calibration, and yields the lowest transition matrix error among the evaluated models in the held-out test analysis. Results show that a structured machine learning estimator can support long-term care transition modeling when judged by calibration and projection fidelity, beyond discrimination.
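The aggregation step, from individual predicted distributions to cohort transition matrices, can be sketched as follows. The abstract only says that probabilities are aggregated by age group and origin state, so the plain averaging rule, the record layout, and the age-group labels are assumptions of this sketch.

```python
from collections import defaultdict

STATES = ["healthy", "mild", "severe", "dead"]


def aggregate_transition_matrices(records):
    """Average individual next-state distributions per (age_group, origin_state).

    records: iterable of (age_group, origin_state, probs), where probs is a
    length-4 list ordered as in STATES and summing to 1.
    """
    sums = defaultdict(lambda: [0.0] * len(STATES))
    counts = defaultdict(int)
    for age_group, origin, probs in records:
        key = (age_group, origin)
        for j, p in enumerate(probs):
            sums[key][j] += p
        counts[key] += 1
    # Each averaged row is again a probability distribution over STATES.
    return {key: [s / counts[key] for s in row] for key, row in sums.items()}


# Two hypothetical individuals in the same cell of the matrix.
records = [
    ("65-69", "healthy", [0.80, 0.10, 0.05, 0.05]),
    ("65-69", "healthy", [0.60, 0.20, 0.10, 0.10]),
]
rows = aggregate_transition_matrices(records)
print(rows[("65-69", "healthy")])
```

Stacking the rows for all origin states within one age group gives the transition matrix used in a cohort projection.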

2606.13959 2026-06-15 cs.LG 新提交

Can Machine Learning Forecast Rice Yields in Data-Constrained Settings? Satellite Climate Data, National Crop Statistics, and Lessons from Sierra Leone

机器学习能否在数据受限条件下预测水稻产量?卫星气候数据、国家作物统计及塞拉利昂的经验教训

Ibrahim Denis Fofanah

发表机构 * Seidenberg School of Computer Science & Information Systems Pace University, New York, USA(佩斯大学塞登伯格计算机科学与信息系统学院,纽约,美国) RiseAfrica Foundation for STEM and Innovation Sierra Leone, West Africa(RiseAfrica STEM与创新基金会,塞拉利昂,西非)

AI总结 利用塞拉利昂25年作物统计和免费卫星气候数据,通过严格反泄漏协议训练机器学习模型,发现仅气候数据的XGBoost将水稻产量预测误差降低三分之一,早期季节降雨是关键预测因子,并转化为政策建议。

Comments 32 pages, 7 figures. Code and data: https://github.com/Denis060/sierraleone-agri-ml

详情
AI中文摘要

塞拉利昂的农业几乎没有数据驱动的决策支持,也没有已发表的机器学习研究考察该国的作物产量。我们询问是否可以利用塞拉利昂目前拥有的数据预测水稻产量。使用25年(2000-2024年)九种主要作物的FAOSTAT生产数据,我们在严格的反泄漏协议下训练XGBoost、梯度提升和随机森林,采用扩展窗口的前向验证评估七个保留年份,并以朴素持久性为基准。仅基于作物统计训练的模型均未优于持久性。加入免费卫星气候数据(CHIRPS降雨、NASA POWER温度)逆转了这一结果:仅使用气候数据的XGBoost将预测误差降低了三分之一(RMSE 284 vs 428 kg/ha),这一优势在线性模型中依然成立,并且在排除异常的2018年季节后仍然稳健。早期季节(5-6月)降雨是主导预测因子,意味着季节性产量风险在收获前数月即可观测。没有模型预测到2018年的产量崩溃,其根源是制度性的而非气候性的。我们将研究结果转化为对塞拉利昂“Feed Salone”战略的政策建议,并提供了完全开源的流程。

英文摘要

Sierra Leone's agriculture operates with almost no data-driven decision support, and no published machine learning study has examined the country's crop yields. We ask whether rice yield can be forecast from data Sierra Leone currently has. Using 25 years of FAOSTAT production data (2000-2024) for nine major crops, we train XGBoost, Gradient Boosting, and Random Forest under a strict anti-leakage protocol with expanding-window walk-forward evaluation across seven held-out years, benchmarked against naive persistence. No model trained on crop statistics alone outperforms persistence. Augmenting with free satellite climate data (CHIRPS rainfall, NASA POWER temperature) reverses this result: a climate-only XGBoost reduces forecast error by one third (RMSE 284 vs 428 kg/ha), a gain that holds for a linear model and is robust to excluding the anomalous 2018 season. Early-season (May-June) rainfall is the dominant predictor, implying seasonal yield risk is observable months before harvest. No model anticipated the 2018 collapse, whose origins were institutional rather than climatic. We translate the findings into policy recommendations for Sierra Leone's Feed Salone Strategy, with a fully open-source pipeline.
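The evaluation protocol, expanding-window walk-forward testing against naive persistence, can be sketched in a few lines. The stand-in model (a historical mean), the function names, and the yield series are invented for illustration; the paper's pipeline uses XGBoost and other learners on FAOSTAT and climate data.

```python
def walk_forward_rmse(years, yields, fit_predict, n_test):
    """Expanding-window evaluation over the last n_test years.

    For each test year, the model sees only strictly earlier years, which is
    the anti-leakage property. Returns (model RMSE, persistence RMSE).
    """
    sq_model, sq_persist = [], []
    for t in range(len(years) - n_test, len(years)):
        train_years, train_yields = years[:t], yields[:t]
        pred = fit_predict(train_years, train_yields, years[t])
        sq_model.append((pred - yields[t]) ** 2)
        # Naive persistence: next year's yield equals last observed yield.
        sq_persist.append((train_yields[-1] - yields[t]) ** 2)

    def rmse(errors):
        return (sum(errors) / len(errors)) ** 0.5

    return rmse(sq_model), rmse(sq_persist)


def mean_model(train_years, train_yields, target_year):
    """Placeholder forecaster: predict the historical mean yield."""
    return sum(train_yields) / len(train_yields)


# Made-up alternating yields in kg/ha, chosen so persistence does poorly.
years = list(range(2000, 2008))
yields = [1000.0, 1100.0, 1000.0, 1100.0, 1000.0, 1100.0, 1000.0, 1100.0]
model_rmse, persistence_rmse = walk_forward_rmse(years, yields, mean_model, n_test=3)
print(round(model_rmse, 2), round(persistence_rmse, 2))
```

Any forecaster with the same `fit_predict` signature could be swapped in, as long as it fits only on the slices it is handed.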

2606.14116 2026-06-15 cs.LG stat.ME 新提交

DTVEM-RE: A Hierarchical Random-Effects Extension of the Differential Time-Varying Effect Model for Person-Specific Multi-Lag Estimation in Intensive Longitudinal Data

DTVEM-RE:差分时变效应模型的分层随机效应扩展,用于密集纵向数据中个体特异性多滞后估计

Amartya Bhattacharya

发表机构 * Geisel School of Medicine, Dartmouth College(达特茅斯学院盖泽尔医学院)

AI总结 针对DTVEM假设所有人共享相同滞后结构的局限,提出DTVEM-RE扩展,允许个体拥有自己的滞后系数,通过贝叶斯分层VAR和连续时间OU模型实现,模拟和实证表明其能恢复个体间变异并提升预测性能。

详情
AI中文摘要

Jacobson等人(2019)提出的差分时变效应模型(DTVEM)是寻找密集纵向数据中最佳时间滞后的流行工具,但它假设所有人共享相同的滞后结构。原作者将此问题列为未来工作,这与现代临床研究的前提——个体存在差异——相冲突。我们提出DTVEM-RE,一种允许每个人拥有自己滞后系数的扩展,包含两种确认步骤版本:在Stan中实现的离散时间分层贝叶斯VAR,它在个体间进行信息汇集并提供校准的不确定性;以及在ctsem中实现的连续时间个体Ornstein-Uhlenbeck模型,它直接处理不均匀间隔的测量点。我们报告了四个结果。模拟显示,贝叶斯版本恢复个体间变异tau_a的偏差低于0.01,覆盖率为90%至93%。在Fisher等人(2017)的EMA数据集(N=40)上,个体特异性滞后1效应在三个情绪项目上相差一个数量级,贝叶斯和GAMM估计高度一致(r=0.87至0.92),且DTVEM-RE在四种离散时间方法中给出最佳的一步预测。多滞后版本显示所有九个tau_k值的可信区间均排除零,且个体差异最大的滞后在不同项目间变化,这是仅考虑滞后1的方法(如mlVAR)无法检测到的。最后,两个版本在个体特异性滞后1估计上几乎完全一致(r >= 0.995),差异仅如收缩所预测。据我们所知,DTVEM-RE是DTVEM风格滞后检测的第一个个体特异性实现,并且它包含标准DTVEM作为特例。

英文摘要

The Differential Time-Varying Effect Model (DTVEM) of Jacobson et al. (2019) is a popular tool for finding the best time lag in intensive longitudinal data, but it assumes everyone shares the same lag structure. The original authors named fixing this as future work, and the assumption clashes with the premise of modern clinical research, which is that people differ. We present DTVEM-RE, an extension that lets each person have their own lag coefficients, with two versions of the confirmatory step: a discrete-time hierarchical Bayesian VAR in Stan, which pools across people and gives calibrated uncertainty, and a continuous-time per-person Ornstein-Uhlenbeck model in ctsem, which handles unevenly spaced beeps directly. We report four results. A simulation shows the Bayesian version recovers the between-person spread tau_a with bias below 0.01 and coverage of 90 to 93 percent. On the Fisher et al. (2017) EMA dataset (N=40), person-specific lag-1 effects vary by an order of magnitude across three mood items, the Bayesian and GAMM estimates agree closely (r=0.87 to 0.92), and DTVEM-RE gives the best one-step-ahead prediction among four discrete-time methods. A multi-lag version shows all nine tau_k values have credible intervals excluding zero, and the lag where people differ most changes across items, something lag-1-only methods like mlVAR cannot detect. Finally, the two versions agree almost exactly on person-specific lag-1 estimates (r >= 0.995), differing only as shrinkage predicts. DTVEM-RE is, to our knowledge, the first person-specific implementation of DTVEM-style lag detection, and it contains standard DTVEM as a special case.

2606.14149 2026-06-15 cs.LG 新提交

Trust but Verify: Mitigating Medical Hallucinations via Post-Hoc Adversarial Auditing and Multi-Agent Feedback Loops

信任但验证:通过事后对抗审计和多智能体反馈循环减轻医学幻觉

Muhammad Osama, Maheera Amjad, Zartasha Mustansar, Arslan Shaukat, Muhammad U. S. Khan

发表机构 * Data Science and Machine Learning Lab, SINES, NUST(NUST SINES数据科学与机器学习实验室) SINES, NUST(NUST SINES) CEME, NUST(NUST CEME)

AI总结 本研究提出一种五智能体“信任但验证”系统,通过事后对抗审计和多智能体反馈循环,将大型语言模型在临床问题中推荐禁用药品的幻觉错误率降低约53%。

详情
AI中文摘要

大型语言模型(LLM)越来越多地部署在医疗环境中,但其产生幻觉的倾向在涉及临床决策时带来风险。本研究考察LLM在回答临床问题时是否会推荐近期被禁止或撤回的药品,并测试一种基于智能体的方法来减少此类错误。我们使用单一LLM骨干开发了一个五智能体“信任但验证”系统。为了衡量监管知识过时性,我们创建了一个包含103个临床多项选择题的对抗数据集,其中历史上正确的答案现在指向禁用物质。该规模确保了跨各种治疗类别的统计显著性。我们评估了三个开放访问模型家族(GPT-OSS、Llama-3、Falcon-3)在原始和智能体条件下的表现。通过逐点得分、标签准确率、幻觉错误率(HER)和组件保真度(CF)得分来衡量性能。我们还观察到专有模型中的临床安全性退化。在默认配置下,所有模型都显示出高幻觉率,一致地选择了与训练数据模式匹配的禁用药物。我们提出的智能体架构将各模型的HER降低了约53%。逐点得分从-0.25(不安全推荐)转向0.0(适当拒绝)。即使模型的参数知识倾向于禁用物质,安全审计也能拦截危险输出。所提出的多智能体框架提供了一种模型无关的方法来强制执行监管合规性,优先考虑患者安全而非流畅的文本生成。我们的工作展示了在安全关键的医疗环境中部署自主AI系统的实用方法,并说明了如何将实时监管数据集成到LLM流水线中以支持临床决策。

英文摘要

Large Language Models (LLMs) are increasingly deployed in healthcare settings, yet their tendency to hallucinate poses risks when clinical decisions are involved. This study examines whether LLMs recommend recently banned or withdrawn pharmaceuticals when answering clinical questions and tests an agent-based method for reducing such errors. We developed a five-agent "Trust but Verify" system using a single LLM backbone. To measure regulatory knowledge obsolescence, we created an adversarial dataset of 103 clinical MCQs where historically correct answers now refer to banned substances. This scale ensures statistical significance across various therapeutic classes. We evaluated three open-access model families (GPT-OSS, Llama-3, Falcon-3) under vanilla and agentic conditions. Performance was measured via pointwise score, label accuracy, Hallucination Error Rate (HER), and Component Fidelity (CF) score. We also observed clinical safety regression in proprietary models. In default configurations, all models showed high hallucination rates, consistently selecting banned drugs that matched training data patterns. Our proposed agentic architecture reduced HER by approximately 53% across models. Pointwise scores shifted from -0.25 (unsafe recommendation) toward 0.0 (appropriate refusal). The safety audit intercepted dangerous outputs even when models' parametric knowledge favored the banned substance. The proposed multi-agent framework offers a model-agnostic method for enforcing regulatory compliance that prioritizes patient safety over fluent text generation. Our work demonstrates a practical approach for deploying autonomous AI systems in safety-critical healthcare settings. It shows how real-time regulatory data can be integrated into LLM pipelines to support clinical decision-making.

2606.14157 2026-06-15 cs.LG cs.AI 新提交

Learning Urban Access Costs from Origin-Destination Flows via Inverse Optimal Transport

通过逆最优传输从起点-终点流中学习城市访问成本

Paula Joy B. Martinez

发表机构 * GitHub

AI总结 提出逆最优传输模型从学校间入学流中恢复潜在选择成本,应用于菲律宾283,016条学生流动数据,估计补贴等效距离以优化城市服务分配。

Comments Oral Presentation. 2026 International Conference on Urban AI

详情
AI中文摘要

城市通过混合公私设施网络提供基本服务,包括学校、诊所、交通提供者和补贴服务点。在这些系统中,规划者通常观察到家庭去哪里,但看不到他们权衡距离、价格和机构访问等因素的潜在成本函数。我们通过菲律宾的学校选择来研究这个城市问题,该国最大的国家教育补贴旨在将学习者从拥挤的公立学校转移到参与计划的私立学校。将学校到学校的入学流视为熵最优传输计划,我们使用两种互补的逆最优传输模型恢复潜在选择成本:一个带有补贴项的可解释距离带模型,以及一个通过可微分Sinkhorn前向传递训练的神经成本模型。应用于人口最多地区23,820条观测流中的283,016次学习者出行,该框架估计了一个补贴等效距离$\lambda^{(k)}$,解释为补贴抵消的感知旅行成本公里数。该案例展示了如何将行政起点-终点数据转化为可解释的规划指标,用于可访问性感知的补贴设计、设施选址和城市服务分配。

英文摘要

Cities deliver basic services through mixed public-private facility networks, including schools, clinics, transit providers, and subsidized service points. In these systems, planners often observe where households go, but not the latent cost function through which they trade off factors such as distance, price, and institutional access. We study this urban problem through school choice in the Philippines, where the country's largest national education subsidy is intended to redirect learners from congested public schools to participating private schools. Treating school-to-school enrollment flows as an entropic optimal transport plan, we recover latent choice costs using two complementary inverse optimal transport models: an interpretable distance-banded model with a subsidy term, and a neural cost model trained through a differentiable Sinkhorn forward pass. Applied to 283,016 learner trips across 23,820 observed flows in the most populated region, the framework estimates a subsidy-equivalent distance, $\lambda^{(k)}$, interpreted as the kilometers of perceived travel cost offset by the subsidy. The case demonstrates how administrative origin-destination data can be transformed into interpretable planning metrics for accessibility-aware subsidy design, facility siting, and urban service allocation.
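The forward model behind the inverse problem, an entropic optimal transport plan computed by Sinkhorn iterations, can be sketched as below. The cost matrix, marginals, regularization strength `eps`, and function name are illustrative assumptions, not values from the paper. The inverse step, which adjusts the cost so the resulting plan matches observed flows, is not shown.

```python
import math


def sinkhorn_plan(cost, row_marg, col_marg, eps=0.5, n_iter=500):
    """Entropic OT plan P = diag(u) K diag(v) with K = exp(-cost / eps)."""
    n, m = len(cost), len(cost[0])
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternate scalings so rows, then columns, match their marginals.
        u = [row_marg[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [col_marg[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


# Toy example: two origin schools, two destination schools, cost ~ distance.
cost = [[0.0, 2.0], [2.0, 0.0]]
origins = [0.6, 0.4]       # share of learners leaving each origin
destinations = [0.5, 0.5]  # share of learners arriving at each destination
plan = sinkhorn_plan(cost, origins, destinations)
for row in plan:
    print([round(p, 4) for p in row])
```

Because every operation in the loop is differentiable, the same forward pass can in principle sit inside a gradient-based fit of a parametric or neural cost, which is the role the abstract gives it.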

2606.14159 2026-06-15 cs.LG q-bio.BM 新提交

Curvature-Guided Geometric Representation for Protein-Ligand Binding Affinity Prediction

曲率引导的几何表示用于蛋白质-配体结合亲和力预测

Shuai Li, Chuan-Xian Ren, Yuhao Li, Ziqi Huang, Yue Pan, Mingzhe Tang, Hong Yan

发表机构 * School of Mathematics, Sun Yat-sen University(中山大学数学学院) Department of Electrical Engineering, City University of Hong Kong(香港城市大学电机工程系)

AI总结 提出RicciBind框架,利用里奇曲率捕捉局部相互作用紧密度,结合最优传输实现跨域对齐,提升结合亲和力预测的准确性与可解释性。

详情
AI中文摘要

蛋白质-配体结合亲和力(PLA)预测在药物发现中至关重要。尽管基于机器学习的方法取得了显著进展,现有方法难以联合表征局部几何组织和全局协调的跨分子相互作用,限制了其对复杂结合机制建模的能力。在此,我们提出RicciBind,一个几何表示框架,它整合了曲率引导的层次结构学习与基于最优传输(OT)的跨域对齐,以建模分子相互作用。具体而言,RicciBind利用里奇曲率捕捉分子结构内的局部相互作用紧密度,增强结构感知,并将原子相互作用组织成曲率感知的层次表示。然后,基于OT的聚类匹配机制在几何约束下对齐异质域中的蛋白质和配体聚类,实现全局一致的对应关系,并揭示超出局部邻域的高阶相互作用模式。通过将曲率引导的结构编码与OT驱动的跨域对齐相结合,RicciBind有效建模了复杂的相互作用语义,并显著提高了结合亲和力预测的准确性和可解释性。大量实验表明,RicciBind在PLA基准和虚拟筛选任务中取得了优越的预测性能和泛化能力。消融研究进一步证实了里奇曲率在增强分子相互作用表示中的关键作用。

英文摘要

Protein-ligand binding affinity (PLA) prediction is critical in drug discovery. Despite the notable advancements in machine learning-based approaches, existing methods struggle to jointly characterize local geometric organization and globally coordinated cross-molecular interactions, limiting their ability to model complex binding mechanisms. Here, we propose RicciBind, a geometric representation framework that integrates curvature-guided hierarchical structure learning with optimal transport (OT)-based cross-domain alignment to model molecular interactions. Specifically, RicciBind leverages Ricci curvature to capture local interaction tightness within molecular structures, enhancing structural awareness and organizing atomic interactions into curvature-aware hierarchical representations. An OT-based cluster matching mechanism then aligns protein and ligand clusters across heterogeneous domains under geometric constraints, enabling globally consistent correspondences and revealing higher-order interaction patterns beyond local neighborhoods. By coupling curvature-guided structure encoding with OT-driven cross-domain alignment, RicciBind effectively models complex interaction semantics and substantially improves both the accuracy and interpretability of binding affinity prediction. Extensive experiments demonstrate that RicciBind achieves superior predictive performance and generalization across PLA benchmarks and virtual screening tasks. Ablation studies further confirm the essential role of Ricci curvature in enhancing molecular interaction representations.

2606.14169 2026-06-15 cs.LG 新提交

Machine Learning for Biomedical Raman Spectroscopy: From Spectral Acquisition to Clinical Translation

生物医学拉曼光谱的机器学习:从光谱采集到临床转化

Bogdan Oancea, Ana Maria Seciu-Grama, Nicoleta Siminea, Laura Mihaela Stefan, Alice Stoica, Joel Sjoberg, Marian Necula, Ana-Maria Prelipcean, Corneliu Ovidiu Vrancianu, Eduard Milea, Andrei Păun, Ion Petre, Mihaela Păun

发表机构 * National Institute of Research and Development for Biological Sciences(罗马尼亚生物科学研究院) University of Bucharest(布加勒斯特大学) University of Turku(图尔库大学)

AI总结 综述机器学习在生物医学拉曼光谱全流程中的应用,包括预处理、诊断分类、可解释性分析及临床转化障碍,强调标准化与鲁棒验证的必要性。

Comments 52 pages, 2 figures

详情
AI中文摘要

拉曼光谱能够无标记、化学特异性地表征生物系统,已成为癌症诊断、分子分型、微生物鉴定和术中决策支持的重要工具。然而,生物医学拉曼光谱具有高维、噪声大、受荧光背景、采集变异性和生物异质性影响的特点,因此鲁棒的计算分析至关重要。本综述考察了机器学习在生物医学拉曼光谱全流程中的作用,从预处理和信号校正到无监督结构发现、监督诊断和分子分层、表示学习和迁移学习、可解释性、生物标志物发现以及与成像、病理学和分子谱分析的多模态整合。重点强调机器学习不仅用于诊断分类,还用于生物学可解释和临床可操作的分析。我们还讨论了临床转化的主要障碍,包括数据集规模有限、仪器间变异性、预处理不一致、外部验证不足、可重复性问题以及软件、数据和元数据共享有限。我们认为,进展需要方法学进步以及标准化、鲁棒验证、可解释性和可部署分析框架。通过整合方法学、生物医学和转化视角,本综述概述了开发可靠且临床可部署的拉曼-人工智能系统的关键方向。

英文摘要

Raman spectroscopy provides label-free, chemically specific characterization of biological systems and has become an important tool for cancer diagnosis, molecular subtyping, microbiological identification, and intraoperative decision support. Biomedical Raman spectra are, however, high-dimensional, noisy, and affected by fluorescence background, acquisition variability, and biological heterogeneity, making robust computational analysis essential. This review examines the role of machine learning across the biomedical Raman spectroscopy pipeline, from preprocessing and signal correction to unsupervised structure discovery, supervised diagnosis and molecular stratification, representation and transfer learning, explainability, biomarker discovery, and multimodal integration with imaging, pathology, and molecular profiling. Emphasis is placed on the use of machine learning not only for diagnostic classification, but also for biologically interpretable and clinically actionable analysis. We also discuss the main barriers to clinical translation, including limited dataset sizes, inter-instrument variability, inconsistent preprocessing, insufficient external validation, reproducibility concerns, and limited sharing of software, data, and metadata. We argue that progress will require methodological advances together with standardization, robust validation, explainability, and deployment-ready analytical frameworks. By integrating methodological, biomedical, and translational perspectives, this review outlines key directions for developing reliable and clinically deployable Raman-AI systems.

2606.14217 2026-06-15 cs.LG q-bio.BM 新提交

Curvature-Informed Potential Energy Surface for Protein-Ligand Binding Affinity Prediction

曲率信息势能面用于蛋白质-配体结合亲和力预测

Peng-Fei Sun, Chuan-Xian Ren, Hong Yan

发表机构 * Sun Yat-Sen University(中山大学) City University of Hong Kong(香港城市大学)

AI总结 提出曲率信息势能面图神经网络CPES,通过物理启发的曲率表示建模构象柔性,结合谱交叉注意力捕获结合诱导的动力学变化,提升亲和力预测性能。

AI中文摘要

准确预测蛋白质-配体结合亲和力对于基于结构的药物发现至关重要。最近的几何深度学习方法通过将蛋白质-配体复合物表示为三维图,取得了有前景的性能。然而,大多数现有方法主要依赖于来自单一结合构象的静态相互作用几何,而忽略了分子柔性和结合诱导的构象变化。为了解决这一局限性,我们提出了一种曲率信息势能面(CPES)图神经网络用于蛋白质-配体结合亲和力预测,该网络结合了物理启发的曲率表示来建模构象柔性。CPES首先从平衡构型下评估的势能面Hessian矩阵导出曲率谱描述符,其特征值定义了势能面的局部主曲率。然后,它使用谱交叉注意力来比较未结合的配体和蛋白质与结合复合物,从而捕获结合诱导的构象动力学变化。同时,通过几何感知消息传递、软聚类和双向交叉注意力,从静态结构特征中学习层次化的蛋白质-配体相互作用表示。最后,CPES融合曲率信息动态表示与静态相互作用表示进行亲和力回归。在多个基准数据集上的广泛评估表明,CPES实现了改进的预测性能并提供了物理可解释性。

英文摘要

Accurate prediction of protein-ligand binding affinity is essential for structure-based drug discovery. Recent geometric deep learning methods have achieved promising performance by representing protein-ligand complexes as three-dimensional graphs. However, most existing approaches mainly rely on static interaction geometry from a single bound conformation, while neglecting molecular flexibility and binding-induced conformational changes. To address this limitation, we propose a curvature-informed potential energy surface (CPES) graph neural network for protein-ligand binding affinity prediction, which incorporates physics-informed curvature representations to model conformational flexibility. CPES first derives curvature spectral descriptors from the Hessian of the potential energy surface evaluated at equilibrium configurations, whose eigenvalues define the local principal curvatures of the potential energy surface. It then uses spectral cross-attention to compare the unbound ligand and protein with the bound complex, thereby capturing binding-induced changes in conformational dynamics. In parallel, hierarchical protein-ligand interaction representations are learned from static structural features through geometry-aware message passing, soft clustering, and bidirectional cross-attention. Finally, CPES fuses the curvature-informed dynamic representations with static interaction representations for affinity regression. Extensive evaluations on multiple benchmark datasets demonstrate that CPES achieves improved predictive performance and offers physical interpretability.
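
The curvature idea in this abstract can be illustrated on a toy problem. The sketch below is not CPES itself: it assumes a hypothetical two-coordinate potential `toy_potential`, estimates its Hessian at the equilibrium point by central finite differences, and returns the two eigenvalues, which are the local principal curvatures of that surface. All function names are illustrative.

```python
import math

def toy_potential(x, y):
    # Hypothetical 2-D potential with its minimum at (0, 0); not from the paper.
    return 1.5 * x * x + 0.5 * x * y + 1.0 * y * y

def hessian_2d(f, x, y, h=1e-3):
    # Central finite differences for the three distinct second derivatives.
    fxx = (f(x + h, y) - 2.0 * f(x, y) + f(x - h, y)) / (h * h)
    fyy = (f(x, y + h) - 2.0 * f(x, y) + f(x, y - h)) / (h * h)
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4.0 * h * h)
    return fxx, fxy, fyy

def principal_curvatures(fxx, fxy, fyy):
    # Closed-form eigenvalues of the symmetric matrix [[fxx, fxy], [fxy, fyy]].
    mean = 0.5 * (fxx + fyy)
    spread = math.hypot(0.5 * (fxx - fyy), fxy)
    return mean - spread, mean + spread

fxx, fxy, fyy = hessian_2d(toy_potential, 0.0, 0.0)
kappa_min, kappa_max = principal_curvatures(fxx, fxy, fyy)
```

Both eigenvalues are positive here, which is what one expects at a minimum; a real molecular Hessian has many more coordinates and would need a numerical eigensolver.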

2606.14245 2026-06-15 cs.LG 新提交

Where Black-box Drug-Target Interaction Prediction Models Look: Cross-Method Explainability

黑盒药物-靶标相互作用预测模型关注何处:跨方法可解释性

Ali Vefghi, Zahed Rahmati, Mohammad Akbari

发表机构 * Amirkabir University of Technology(阿米尔卡比尔理工大学)

AI总结 通过梯度归因与特征消融等方法,对BridgeDPI模型进行可解释性审计,揭示模态主导性、填充伪影及化学一致性片段,为计算药物发现提供可检验假设。

AI中文摘要

药物-靶标相互作用(DTI)和亲和力(DTA)预测器日益获得强大的基准分数,但它们对序列、指纹和图特征的内部使用通常仍不透明。我们对BridgeDPI架构在三个不同数据集(Gao、Human和C.elegans)上进行可解释性审计。本研究结合基于梯度的归因方法(积分梯度、显著性、逐层相关性传播、SmoothGrad和SmoothGrad-IG)与特征级遮挡消融以及跨方法的严格交集共识,以减少单一解释器偏差。我们总结了原始输入、桥接相似性支架以及图卷积中的敏感性和符号效应,包括边级敏感性和定向边移除。结果表明,当将可解释性视为模型批评时,它最具信息量:它揭示了模态主导性、填充和特殊标记伪影、跨层的数据集依赖的协作与抑制效应,以及方法一致时化学上一致的片段和组成基序。这些分析不能替代结构或实验真值,但它们可以为计算药物发现流程中的下游验证提供可检验的假设。更广泛地说,将现代XAI应用于当代DTI/DTA模型仍是对训练权重和数据中隐含的丰富结构的初步探索,但即使这第一层审查已经帮助研究人员将预测与药物侧和靶标侧表示联系起来,并优先考虑外部验证。

英文摘要

Drug-target interaction (DTI) and affinity (DTA) predictors increasingly achieve strong benchmark scores, yet their internal use of sequence, fingerprint, and graph features often remains opaque. We present an interpretability audit of the BridgeDPI architecture on three datasets: Gao, Human, and C.elegans. This study combines gradient-based attributions -- integrated gradients, saliency, layer-wise relevance propagation, SmoothGrad, and SmoothGrad-IG -- with feature-wise occlusion ablation and strict intersection consensus across methods to reduce single-explainer bias. We summarize sensitivity and signed effects at raw inputs, at the bridge similarity scaffold, and through the graph convolution, including edge-level sensitivities and targeted edge removals. The results show that explainability is most informative when treated as model criticism: it reveals modality dominance, padding and special-token artifacts, dataset-dependent cooperative versus suppressive effects across layers, and chemistry-consistent fragment and composition motifs where methods agree. These analyses do not substitute for structural or experimental ground truth, yet they can provide testable hypotheses for downstream validation in computational drug discovery pipelines. More broadly, applying modern XAI to contemporary DTI/DTA models is still an early pass over the rich structure implicit in trained weights and data -- yet even this first layer of scrutiny already helps researchers relate predictions to drug- and target-side representations and to prioritize external validation.
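
One way to picture a strict intersection consensus is sketched below. The abstract does not say how each method's important features are selected, so the top-k rule, the function names, and the attribution scores are all assumptions made for illustration.

```python
def top_k(scores, k):
    # Indices of the k features with the largest absolute attribution.
    order = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)
    return set(order[:k])

def strict_consensus(attributions, k):
    # Keep only the features that every explainer places in its own top-k.
    sets = [top_k(scores, k) for scores in attributions.values()]
    return set.intersection(*sets) if sets else set()

# Made-up attribution scores for five input features (not from the paper).
attributions = {
    "integrated_gradients": [0.9, 0.1, -0.7, 0.05, 0.3],
    "saliency":             [0.8, 0.6, -0.5, 0.02, 0.1],
    "occlusion":            [0.7, 0.0, -0.9, 0.40, 0.2],
}
consensus_k2 = strict_consensus(attributions, k=2)
consensus_k3 = strict_consensus(attributions, k=3)
```

A strict intersection is conservative by design: a feature that a single method misses is dropped, which trades recall for lower single-explainer bias.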

2606.14344 2026-06-15 cs.LG 新提交

More with LESS -- Local Scene Representations for Tactile Imaging

用更少实现更多——局部场景表示用于触觉成像

Zohar Rimon, Elisei Shafer, Tal Tepper, Daniel Kozin, Alon Malka, Roy Holland, Aviv Tamar

发表机构 * Technion - Israel Institute of Technology(以色列理工学院) Rambam Health Care Campus(拉姆巴姆医疗中心)

AI总结 提出局部编码器空间感知(LESS)方法,通过局部感受野的循环编码器网格建模触觉场景,实现内部结构的2D/3D重建,在泛化、不确定性估计和手持成像上取得突破。

Comments RSS 2026

AI中文摘要

触觉成像旨在通过触觉传感重建软物体的内部结构,应用于医学诊断和机器人操作。最近的自监督学习方法显示出有希望的结果,但依赖于全局、非结构化表示和机器人控制的传感,限制了泛化和实际使用。我们提出局部编码器空间感知(LESS),一种以物体为中心的触觉表示,利用触觉的局部性质。触觉场景被建模为具有局部感受野的循环编码器网格,其状态被融合以重建内部结构的2D或3D图像。这种组合设计实现了强泛化:在单夹杂物体模上训练的模型能够准确成像具有多个夹杂物和不同尺寸的物体。局部结构还支持空间不确定性估计。此外,我们通过外部姿态跟踪和类似人类触诊的数据实现了手持触觉成像,并将触觉成像扩展到完整的3D重建。

英文摘要

Tactile imaging seeks to reconstruct the internal structure of soft objects through touch sensing, with applications in medical diagnosis and robotic manipulation. Recent self-supervised learning approaches have shown promising results, but rely on global, unstructured representations and robot-controlled sensing, limiting generalization and practical use. We propose Local Encoder for Spatial Sensing (LESS), an object-centric tactile representation that exploits the local nature of touch. The tactile scene is modeled as a grid of recurrent encoders with local receptive fields, whose states are fused to reconstruct 2D or 3D images of internal structure. This compositional design enables strong generalization: models trained on single-inclusion phantoms accurately image objects with multiple inclusions and varying sizes. The local structure further supports spatial uncertainty estimation. In addition, we enable hand-held tactile imaging via external pose tracking and human-like palpation data, and extend tactile imaging to full 3D reconstruction.

2606.14581 2026-06-15 cs.LG cs.AI 新提交

CARE: Controlling LLM-Generated Policies through Auditable Review of Evidence in Scientific Experimentation

CARE:通过科学实验中的可审计证据审查控制LLM生成的策略

Guanyu Liu, Weiyi Kong, Zeyu Wang, Boer Zhang, Baiqing Li, Peiyu Zhang, Tianyu Shi

发表机构 * University of Macau(澳门大学) University of Toronto(多伦多大学) UCLA(加州大学洛杉矶分校) Harvard University(哈佛大学) XtalPi(晶泰科技) McGill University(麦吉尔大学)

AI总结 提出CARE框架,通过可审计的干预门控机制,在保留非LLM优化器作为默认路径的同时,利用LLM修正挑战者排序策略,显著提升高通量实验优化性能。

Comments 23 pages, 4 figures

AI中文摘要

赋予LLM对昂贵、不可逆的科学实验的直接控制会导致不安全的探索和不稳定的性能,但完全抛弃LLM的创造力会牺牲显著的优化潜力。我们引入了CARE(通过科学实验中的可审计证据审查控制LLM生成的策略),这是一种用于高通量实验(HTE)优化的可审计控制器,它保留非LLM的现有优化器作为默认动作路径,同时使用LLM来修正挑战者排序策略。在每个结果揭示之前,一个公共证据干预门将挑战者与现有方案进行比较。只有当选择前可用的证据支持变更时,它才授权选择挑战者,并将决策记录在审计日志中。在Minerva/Olympus和ChemLex基准测试中,CARE优于所有其他评估方法,相对于公开的现有方案,最终最佳结果从80.0提高到88.5(Minerva/Olympus),从83.9提高到92.1(ChemLex)。我们的实验表明,当LLM在可审计控制器下扩展提议空间时,其自我进化比直接选择实验更可靠。

英文摘要

Granting LLMs direct control over costly, irreversible scientific experiments leads to unsafe exploration and unstable performance, but discarding LLM creativity entirely sacrifices significant optimization potential. We introduce CARE (Controlling LLM-Generated Policies through Auditable Review of Evidence in Scientific Experimentation), an auditable controller for high-throughput experimentation (HTE) optimization that keeps a non-LLM incumbent optimizer as the default action path while using LLMs to revise challenger ranking policies. Before each outcome is revealed, a public-evidence intervention gate compares the challenger with the incumbent. It authorizes the challenger's selection only when the evidence available before selection supports the change, with the decision recorded in the audit log. CARE outperforms all other evaluated methods on Minerva/Olympus and ChemLex benchmarks, with final-best improving from 80.0 to 88.5 on Minerva/Olympus and from 83.9 to 92.1 on ChemLex, relative to the public incumbent. Our experiments indicate that LLM self-evolution is more reliable when it expands the proposal space under an auditable controller, rather than directly choosing experiments.
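
The gate described here can be pictured with a minimal sketch. The class name `Gate`, the `margin` threshold, and the shape of the evidence scores are hypothetical; the abstract only states that the incumbent is the default, that the challenger is authorized when pre-selection evidence supports the change, and that the decision is logged.

```python
from dataclasses import dataclass, field

@dataclass
class Gate:
    margin: float = 0.05                      # hypothetical evidence threshold
    audit_log: list = field(default_factory=list)

    def select(self, incumbent, challenger, evidence):
        # `evidence` maps each candidate to a score computed only from data
        # available before the outcome is revealed (an assumption of this sketch).
        gain = evidence[challenger] - evidence[incumbent]
        approved = gain > self.margin
        choice = challenger if approved else incumbent
        # Every decision is recorded, whether or not the challenger wins.
        self.audit_log.append({
            "incumbent": incumbent, "challenger": challenger,
            "gain": gain, "approved": approved, "choice": choice,
        })
        return choice

gate = Gate()
first = gate.select("A", "B", {"A": 0.50, "B": 0.70})   # clear gain
second = gate.select("A", "C", {"A": 0.50, "C": 0.52})  # gain below margin
```

Keeping the incumbent as the fallback means a weak or badly calibrated challenger can never do worse than the default path in this sketch.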

2606.14601 2026-06-15 cs.LG cs.SY eess.SY math.OC stat.CO 新提交

A Statistical and Machine Learning Framework for Operational Threshold Detection and Deployable Dispatch Controller Development in Hydrogen Multi-Energy Systems

氢多能系统中运行阈值检测与可部署调度控制器开发的统计与机器学习框架

Shadi Heenatigala, Hasanika Samarasinghe

发表机构 * Antioch College(安提阿学院) The Open University of Sri Lanka(斯里兰卡开放大学)

AI总结 提出统计与机器学习框架,利用一年高分辨率运行数据表征氢多能系统,通过统计分析和随机森林揭示非线性动态,并利用强化学习优化调度。

Comments 17 pages, 12 figures

AI中文摘要

本研究提出了一个统计与机器学习框架,利用一年高分辨率运行数据表征氢基多能系统(H-MES)。统计分析揭示了由可再生能源盈余驱动的二元运行模式,其中太阳辐照度解释了氢气生产中45.7%的基于秩的方差,按常规标准属于大效应。只有高辐照度时期才触发有意义的电解槽参与,而电力需求则产生较弱的反向抑制效应($\epsilon^2 = 0.126$)。多元回归证实电解槽功率是主要的线性预测因子,并存在太阳-风协同交互作用。值得注意的是,随机森林分析将风能输出在预测重要性中排名第一,尽管其双变量相关性较弱(r = 0.167),揭示了参数方法无法发现的非线性动态。一个序列模型利用强24小时自相关性(r = 0.845)进行运行预测,而一个强化学习智能体优化了氢气收益调度。核心贡献在于证明了统计和机器学习方法在H-MES建模与控制中是互补的。

英文摘要

This study presents a statistical and machine learning framework for characterizing a hydrogen-based multi-energy system (H-MES) using one year of high-resolution operational data. Statistical analysis revealed a binary operation driven by renewable surplus, with solar irradiance explaining 45.7% of rank-based variance in hydrogen production, a large effect by conventional standards. Only high-irradiance periods triggered meaningful electrolyzer engagement, while electricity demand exerted a weaker inverse suppression effect ($\epsilon^2 = 0.126$). Multiple regression confirmed electrolyzer power as the dominant linear predictor, with a synergistic solar-wind interaction. Notably, Random Forest analysis ranked wind output first in predictive importance despite its weak bivariate correlation (r = 0.167), revealing non-linear dynamics invisible to parametric methods. A sequence model exploited strong 24-hour autocorrelation (r = 0.845) for operational forecasting, while a reinforcement learning agent optimized hydrogen revenue dispatch. The core contribution is demonstrating that statistical and machine learning approaches are complementary for H-MES modeling and control.
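
The 24-hour autocorrelation mentioned above can be computed as a Pearson correlation between a series and a lagged copy of itself. The sketch below assumes hourly sampling and uses a synthetic sine wave rather than the paper's data; `lag_autocorrelation` is an illustrative name.

```python
import math

def lag_autocorrelation(series, lag):
    # Pearson correlation between the series and itself shifted by `lag`
    # samples; `lag` must be a positive integer smaller than len(series).
    x, y = series[:-lag], series[lag:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Synthetic hourly signal with a pure daily cycle (illustrative only).
hourly = [math.sin(2 * math.pi * t / 24) for t in range(24 * 14)]
r24 = lag_autocorrelation(hourly, 24)   # one full period apart
r12 = lag_autocorrelation(hourly, 12)   # half a period apart
```

A perfectly periodic signal gives a lag-24 value near 1 and a lag-12 value near -1; real operational data with noise and weather effects would sit below that ceiling.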

2606.14608 2026-06-15 cs.LG cs.AI 新提交

Expert-Driven Survival Machines: Improving Stratification and Interpretability in Multiple Clinical Cohorts

专家驱动的生存机器:改善多个临床队列中的分层与可解释性

Farica Zhuang, Zixuan Wen, Christos Davatzikos, Li Shen

发表机构 * University of Pennsylvania(宾夕法尼亚大学)

AI总结 提出一种基于混合专家模型的自适应深度聚类生存框架(AdaCSM),通过路由专家机制实现条件专业化,动态分配患者到专门的风险预测器,提升生存预测性能和可解释性。

AI中文摘要

生存预测在医疗提供者和临床研究中扮演核心角色。准确的风险分层能够实现早期干预并改善患者管理。大多数现有的深度生存模型为所有患者学习一个共同的特征表示,这可能掩盖患者亚组之间的重要差异。相比之下,混合专家(MoE)框架允许模型的不同部分关注不同的患者模式,从而产生更个性化的表示。因此,在这项工作中,我们提出了一种混合专家增强的自适应深度聚类生存框架(AdaCSM),用于建模这种异质性生存模式。我们引入了一种基于路由的专家机制,该机制在参数化生存建模框架内实现条件专业化。所提出的架构动态地将患者分配给专门的风险预测器,同时保留患者生存和亚型聚类目标。我们在跨越不同疾病领域的多个真实世界纵向临床队列上,将我们的方法与最先进的生存和深度聚类模型进行了比较。所提出的方法在生存分析中展示了改进的预测性能并产生了可解释的结果。

英文摘要

Survival prediction plays a central role for healthcare providers and clinical researchers. Accurate risk stratification enables early intervention and improved patient management. Most existing deep survival models learn one common feature representation for all patients, which may hide important differences between patient subgroups. In contrast, a Mixture-of-Experts (MoE) framework allows different parts of the model to focus on different patient patterns, leading to more individualized representations. Therefore, in this work, we propose a mixture-of-experts enhanced adaptive deep clustering survival framework (AdaCSM) for modeling such heterogeneous survival patterns. We introduce a routing-based expert mechanism that enables conditional specialization within a parametric survival modeling framework. The proposed architecture allocates patients to specialized risk predictors dynamically while preserving the patient survival and subtype clustering objectives. We compare our method with state-of-the-art survival and deep clustering models on multiple real-world longitudinal clinical cohorts spanning diverse disease domains. The proposed method demonstrates improved predictive performance and leads to interpretable results in survival analysis.
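
Routing patients to specialized risk predictors can be sketched with a dense softmax gate. This is a generic Mixture-of-Experts pattern, not the AdaCSM architecture: the gate weights, the constant-risk experts, and all function names are assumptions for illustration.

```python
import math

def softmax(logits):
    # Numerically stable softmax.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route(features, gate_weights):
    # One gating logit per expert: dot product of features and that expert's weights.
    logits = [sum(w * x for w, x in zip(ws, features)) for ws in gate_weights]
    return softmax(logits)

def mixture_risk(features, gate_weights, experts):
    # Gate-weighted combination of the per-expert risk scores.
    probs = route(features, gate_weights)
    risks = [expert(features) for expert in experts]
    return sum(p * r for p, r in zip(probs, risks)), probs

# Hypothetical gate and two toy experts that return constant risks.
gate_weights = [[1.0, 0.0], [0.0, 1.0]]
experts = [lambda f: 0.2, lambda f: 0.8]
risk, probs = mixture_risk([2.0, 0.0], gate_weights, experts)
```

The routing probabilities double as a soft subgroup assignment, which is one reason such gates are often inspected for interpretability.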

2606.13682 2026-06-15 cs.AI cs.LG 交叉投稿

A Deep Reinforcement Learning (DRL)-Based Transformer Method for Solving the Open Shop Scheduling Problem

基于深度强化学习的Transformer方法求解开放车间调度问题

Faezeh Ardali, Mwembezi A. Nyelele, Gerald M. Knapp

发表机构 * Louisiana State University(路易斯安那州立大学) University of Minnesota Duluth(明尼苏达大学杜鲁斯分校)

AI总结 提出一种基于Transformer编码器-解码器架构的调度策略,仅以加工时间矩阵为输入,在Taillard小规模实例上训练后可直接推广至40x40至100x100的大规模问题,与经典调度规则相比具有竞争力。

AI中文摘要

开放车间调度问题(OSSP)出现在许多工业和服务环境中,但随着作业和机器数量的增加,其计算难度仍然很大。精确方法很快变得难以处理,而经典调度规则和元启发式方法可能需要大量调整才能在大规模下保持解的质量。本研究开发了一种基于Transformer的OSSP调度策略,采用具有多头注意力的编码器-解码器架构。该模型在Taillard基准实例(4x4、5x5、7x7和10x10)上训练,仅以加工时间矩阵作为输入,生成可行调度,其makespan与最佳已知值的差距通常在15-30%以内。为了评估可扩展性,将训练好的策略无需重新训练直接应用于从40x40到100x100随机生成的实例,并与经典调度启发式方法(包括SPT、LPT、MWKR和EST)进行比较。在这些大规模实例中,Transformer相对于标准下界实现了12.89-15.12%的平均差距。与EST相比,Transformer保持了竞争力,通常差距较小,同时显著优于SPT和LPT。这些结果表明,在小规模OSSP实例上训练的Transformer策略可以推广到规模大得多的问题,并提供一种特征精简、基于学习的方案,以替代经典调度规则。

英文摘要

The open shop scheduling problem (OSSP) arises in many industrial and service settings but remains computationally challenging as the number of jobs and machines increases. While exact methods quickly become intractable, classical dispatching rules and metaheuristics may require substantial tuning to maintain solution quality at large scales. This study develops a Transformer-based scheduling policy for OSSP using an encoder-decoder architecture with multi-head attention. The model is trained on Taillard benchmark instances (4x4, 5x5, 7x7, and 10x10) using only the processing-time matrix as input and produces feasible schedules with makespans typically within 15-30% of best-known values. To evaluate scalability, the trained policy is applied without retraining to randomly generated instances from 40x40 to 100x100 and compared against classical dispatching heuristics, including SPT, LPT, MWKR, and EST. Across these large instances, the Transformer achieved average gaps of 12.89-15.12% relative to a standard lower bound. Compared with EST, the Transformer remained competitive, typically within a modest margin, while substantially outperforming SPT and LPT. These results indicate that a Transformer policy trained on small OSSP instances can generalize to substantially larger problems and provide a feature-light, learning-based alternative to classical dispatching rules.
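
The gap figures above are measured against a lower bound. The abstract does not define it, so the sketch below assumes the commonly used open-shop bound: the larger of the heaviest job and the heaviest machine load. The processing-time matrix is made up.

```python
def open_shop_lower_bound(p):
    # p[j][m] is the processing time of job j on machine m.
    job_totals = [sum(row) for row in p]            # a job cannot overlap itself
    machine_loads = [sum(col) for col in zip(*p)]   # a machine runs one job at a time
    return max(max(job_totals), max(machine_loads))

def gap_percent(makespan, lower_bound):
    # Relative excess of a schedule's makespan over the lower bound.
    return 100.0 * (makespan - lower_bound) / lower_bound

# Illustrative 3x3 instance (not a Taillard benchmark).
p = [[3, 2, 4],
     [2, 5, 1],
     [4, 1, 3]]
lb = open_shop_lower_bound(p)
gap = gap_percent(10.35, lb)
```

Because the bound may not be attainable, a gap measured this way overstates the true distance from the optimum whenever the optimal makespan exceeds the bound.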

2606.13695 2026-06-15 physics.geo-ph cs.AI cs.LG 交叉投稿

Korzhinskii-Net: Physics-Informed Neural Network for Sub-Surface Mineral Prospectivity Modelling

Korzhinskii-Net: 用于地下矿产潜力建模的物理信息神经网络

Boris Kriuk

发表机构 * The Hong Kong University of Science and Technology(香港科技大学)

AI总结 提出Korzhinskii-Net,一种耦合达西流、热输运和反应速率的二维径向物理信息神经网络,在五个矿省四个矿种上平均PR-AUC达0.885,显著优于传统基线。

Comments 12 pages, 7 figures, 3 tables

AI中文摘要

矿产潜力建模(MPM)支撑着勘探经济学,然而大多数操作流程简化为基于浅表地表代理训练的数据驱动分类器。这类模型对实际定位矿石的地下物理过程(热平流、流体流动和岩性依赖的沉淀)视而不见。我们提出Korzhinskii-Net,一个二维径向物理信息神经网络(PINN),它将达西流、平流-扩散热输运和softplus饱和反应速率耦合到一个可微的正演模型中,并由地表和遥感代理弱监督。该网络以Dmitri S. Korzhinskii(1899-1985)命名,其渗滤交代作用理论提供了物理框架。我们在五个矿省(涵盖四种矿种:诺里尔斯克(Ni-Cu-PGE)、佩琴加(Ni-Cu硫化物)、乌多坎(砂岩型Cu)、苏霍伊洛格(造山型Au)和米尔内(金伯利岩型钻石))上,采用公平、泄漏控制的5折交叉验证协议(含硬环形负样本)评估Korzhinskii-Net。Korzhinskii-Net的平均PR-AUC为0.885,而最强经典基线(梯度提升)为0.281;平均分位数排名为0.019,对比基线为0.413。这一改进在所有五个矿省和四个矿种系统中一致,表明即使仅受全球开放数据代理约束,物理信息可微模拟器也能恢复纯特征学习器系统性地遗漏的定位模式。我们将完整流程和评估工具开源。

英文摘要

Mineral prospectivity modelling (MPM) underpins exploration economics, yet most operational pipelines reduce to data-driven classifiers trained on shallow surface proxies. Such models are blind to the subsurface physics that actually localises ore: heat advection, fluid flow, and lithology-dependent precipitation. We present Korzhinskii-Net, a 2-D radial physics-informed neural network (PINN) that couples Darcy flow, advective-diffusive heat transport, and a softplus-saturated reaction rate into a single differentiable forward model, weakly supervised by surface and remote-sensing proxies. The network is named after Dmitri S. Korzhinskii (1899-1985), whose theory of infiltration metasomatism provides the physical scaffold. We evaluate Korzhinskii-Net on five ore provinces spanning four commodity classes -- Norilsk (Ni-Cu-PGE), Pechenga (Ni-Cu sulphide), Udokan (sandstone-hosted Cu), Sukhoi Log (orogenic Au), and Mirny (kimberlitic diamond) -- under a fair, leakage-controlled 5-fold cross-validation protocol with hard ring-shaped negatives. Korzhinskii-Net attains a mean PR-AUC of 0.885 versus 0.281 for the strongest classical baseline (gradient boosting), and a mean fractional rank of 0.019 versus 0.413. The improvement is consistent across all five provinces and four commodity systems, suggesting that physics-informed differentiable simulators, even when constrained only by global open-data proxies, can recover localisation patterns that pure feature-based learners systematically miss. We release the full pipeline and evaluation harness as open source.
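
For readers unfamiliar with the headline metric, the sketch below computes average precision, one common estimator of the area under the precision-recall curve. Whether the paper uses this exact estimator is not stated, and the labels and scores are invented.

```python
def average_precision(labels, scores):
    # Rank cells by descending score, then average the precision observed
    # at each positive. Tied scores are not handled specially in this sketch.
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

# Made-up prospectivity scores for five map cells; 1 marks a known deposit.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.2, 0.1]
ap = average_precision(labels, scores)
```

Unlike ROC-AUC, this metric is sensitive to the positive rate, which matters when deposits are rare relative to barren cells.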

2606.13696 2026-06-15 cs.CY cs.LG cs.MA cs.SI 交叉投稿

AGORA: Can Deliberation and Governance Gates Absorb Participation Bias in Transit Planning?

AGORA: 审议与治理门能否吸收公交规划中的参与偏差?

Jung-Hoon Cho, Cathy Wu

发表机构 * Department of Civil and Environmental Engineering and Laboratory for Information & Decision Systems, Massachusetts Institute of Technology(土木与环境工程系和信息与决策系统实验室,麻省理工学院) Institute for Data, Systems, and Society, Massachusetts Institute of Technology(数据、系统与社会研究所,麻省理工学院)

AI总结 提出AGORA框架,通过固定网络、需求和求解器,系统变化会议组成、结构化审议和治理门,发现审议是参与影响结果的关键机制,治理门可压缩跨剖面方差,将参与偏差从不可控输入重构为过程设计问题。

详情
AI中文摘要

公交网络设计不仅依赖于优化算法,还取决于谁出现在公众听证会上。当前实践通常收集来自自选参与者的单向评论,使参与者构成成为结果变化的不可控来源。我们提出AGORA框架,该框架固定网络、需求和求解器,同时通过利益相关者代理、结构化审议和治理门系统变化会议组成。在两个不同规模的标准基准网络上,我们发现:(i) 总体结果在不同构成之间变化很小,但在尾部风险和公平性差异方面,代表性抽样仍然倾向于优于偏斜构成;(ii) 没有审议时,构成不产生任何变化,表明审议是“谁出席影响结果”的机制;(iii) 治理门压缩了跨剖面方差而不改变Mandl上的平均结果,但在Mumford0上的低接受率表明阈值需要实例特定的校准。这些发现将参与偏差从不可控输入重新定义为过程设计问题:即使没有保证的代表性出席,结构良好的审议和治理标准也能显著减少结果对“谁在房间里”的依赖程度。

英文摘要

Transit network design depends not only on the optimization algorithm but also on who shows up to the public hearing. Current practice often collects one-directional comments from self-selected attendees, leaving participant mix as an uncontrolled source of outcome variation. We present AGORA, a framework that holds the network, demand, and solver fixed while systematically varying meeting composition through stakeholder agents, structured deliberation, and governance gates. Across two standard benchmark networks at different scales, we find that (i) aggregate outcomes vary little across compositions, but on tail risk and fairness disparity, representative sampling still tends to outperform skewed compositions; (ii) without deliberation, composition produces no variation at all, showing that deliberation is the mechanism through which who attends affects outcomes; and (iii) governance gates compress cross-profile variance without shifting the average outcome on Mandl, but low acceptance on Mumford0 shows thresholds require instance-specific calibration. These findings reframe participation bias from an uncontrollable input to a process-design problem: even without guaranteed representative attendance, well-structured deliberation and governance criteria can substantially reduce how much outcomes depend on who is in the room.

2606.13747 2026-06-15 cs.AR cs.LG 交叉投稿

BigPower: Hierarchical Source-Level Module Power Estimation for CPUs with Large Language Models

BigPower: 基于大型语言模型的CPU层次化源码级模块功耗估计

Honghua Zhu, Chunjie Luo, Jianfeng Zhan

发表机构 * State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China(中国科学院计算技术研究所处理器国家重点实验室,中国北京)

AI总结 提出BigPower,利用大型语言模型表示和架构层次、模块连接、配置参数及工作负载上下文,直接从源码级设计信息估计CPU模块级功耗,无需仿真,在香山处理器上验证了有效性。

Comments 12 pages, 10 figures

AI中文摘要

准确的功耗估计对于理解和优化CPU功耗行为至关重要,然而实际工作流程通常依赖于仿真推导的信息或硅后分析。在这项工作中,我们提出了BigPower,一种用于CPU设计过程中细粒度模块级功耗估计的层次化源码级替代模型。BigPower利用基于大型语言模型的表示,结合架构层次、模块连接、配置参数和工作负载上下文,直接从源码级设计信息估计模块级功耗,推理时无需额外仿真。在开源香山处理器系列上的实验结果表明,该方法能够在不同配置和工作负载下实现实用的细粒度功耗估计,为传统的基于仿真的工作流程提供了一种高效的替代方案。

英文摘要

Accurate power estimation is important for understanding and optimizing CPU power behavior, yet practical workflows often rely on simulation-derived information or post-silicon analysis. In this work, we present BigPower, a hierarchical source-level surrogate model for fine-grained module-level power estimation during CPU design. BigPower leverages large language model-based representations together with architectural hierarchy, module connectivity, configuration parameters, and workload context to estimate module-level power consumption directly from source-level design information, without requiring additional simulation during inference. Experimental results in the open-source XiangShan processor family demonstrate practical fine-grained power estimation across diverse configurations and workloads, offering an efficient alternative to conventional simulation-based workflows.

2606.13859 2026-06-15 cond-mat.mtrl-sci cs.LG 交叉投稿

Closed-loop discovery of out-of-distribution processing protocols by evolutionary search and uncertainty-aware learning

通过进化搜索和不确定性感知学习发现分布外处理协议的闭环方法

Yu Liu, Stanislav Udovenko, Ching-Che Lin, Jaegyu Kim, Lane W. Martin, Susan Trolier-McKinstry, Sergei V. Kalinin

发表机构 * Department of Materials Science and Engineering, University of Tennessee, Knoxville(田纳西大学诺克斯维尔分校材料科学与工程系) Materials Science and Engineering Department, Materials Research Institute, the Pennsylvania State University(宾夕法尼亚州立大学材料研究所材料科学与工程系) Department of Materials Science and NanoEngineering, Rice University(莱斯大学材料科学与纳米工程系) Rice Advanced Materials Institute, Rice University(莱斯大学先进材料研究所) Department of Materials Science and Engineering, University of California, Berkeley(加州大学伯克利分校材料科学与工程系) Departments of Chemistry and Physics and Astronomy, Rice University(莱斯大学化学系、物理与天文系) Physical Sciences Division, Pacific Northwest National Laboratory(太平洋西北国家实验室物理科学部)

AI总结 提出一种闭环工作流,结合紧凑波形表示的进化搜索与不确定性感知深度核学习,自动发现提升铁电薄膜非线性响应的分布外处理协议,并通过实验验证其机制。

AI中文摘要

许多材料和化学系统表现出历史依赖的响应,其中功能结果不仅由最终状态变量决定,还由操作期间施加的场、温度或化学势的时间序列决定。因此,发现新的处理协议是一个高维搜索问题,其中控制变量是整个波形或样本历史,而传统策略要么局限于保守的内插族,要么变得过于测量密集。本文介绍了一种闭环工作流,将紧凑波形表示上的进化搜索与不确定性感知深度核学习相结合,以生成、排序和实验验证候选协议。应用于铁电薄膜,以扫描探针尖端偏压波形为协议,非线性机电响应为奖励,该工作流发现了通过去老化薄膜增强非线性的波形族。空间分辨的前后测量表明,性能最佳的波形选择性地激活预先存在的弱钉扎畴壁段,而最差的波形则驱动长程不可逆切换。该框架将协议调优重新定义为分布外发现,可推广到合成和退火轨迹、电池形成协议以及其他高维控制问题。

英文摘要

Many materials and chemical systems exhibit history-dependent responses, where functional outcomes are governed not only by final-state variables but by the time-dependent sequence of fields, temperatures, or chemical potentials applied during operation. Discovering new processing protocols is therefore a high-dimensional search problem in which the control variable is an entire waveform or sample history, and conventional strategies either remain confined to conservative interpolative families or become prohibitively measurement intensive. Here, a closed-loop workflow is introduced that couples evolutionary search over a compact waveform representation with uncertainty-aware deep kernel learning to generate, rank, and experimentally validate candidate protocols. Applied to ferroelectric thin films, with the scanning-probe tip-bias waveform as the protocol and the nonlinear electromechanical response as the reward, the workflow discovers waveform families that enhance nonlinearity by de-aging the film. Spatially resolved before/after measurements show that the best-performing waveforms selectively activate pre-existing, weakly pinned domain-wall segments, whereas the worst drive long-range irreversible switching. This framework reframes protocol tuning as out-of-distribution discovery, generalizable to synthesis and annealing trajectories, battery formation protocols, and other high-dimensional control problems.

2606.13868 2026-06-15 astro-ph.IM cs.LG 交叉投稿

Multi-Variable Stellar Parameter Estimation Using Residual Multitask Neural Networks

使用残差多任务神经网络的多变量恒星参数估计

Bruno Santos Meneses Barreto, Marcio Eisencraft

发表机构 * Escola Politécnica, Universidade de São Paulo, SP(圣保罗大学理工学院)

AI总结 提出一种端到端流水线,利用带残差块的全连接多任务神经网络,通过贝叶斯优化调参,从SDSS光谱中估计有效温度、金属丰度和表面重力,在低复杂度下达到1%-3%的归一化误差。

Comments This manuscript has been submitted to the Congresso Brasileiro de Automática (CBA) and is currently under peer review

AI中文摘要

我们提出了一种端到端流水线,用于从斯隆数字巡天数据发布12的光谱中估计恒星参数,该流水线使用带有残差块的全连接多任务神经网络,其超参数通过贝叶斯优化进行调优。预处理流水线包括每个光谱的标准化、目标变量(有效温度$T_{\mathrm{eff}}$、金属丰度$[\mathrm{Fe/H}]$和表面重力$\log g$)的RobustScaler归一化,以及通过注入高斯噪声进行数据增强。在保留的测试集上,该模型对$T_{\mathrm{eff}}$实现了$59.76~\mathrm{K}$的平均绝对误差(MAE),对$[\mathrm{Fe/H}]$实现了$0.103~\mathrm{dex}$,对$\log g$实现了$0.130~\mathrm{dex}$。相对于每个参数的全尺度范围进行归一化后,这些结果代表了$1\%$到$3\%$的范围归一化误差,而模型复杂度仅为约540,000个可训练参数,效率极高。这些结果表明,紧凑的残差多任务架构结合合理的信号预处理,为大规模光谱数据集中的非线性参数估计提供了一种参数高效的解决方案。特别是,所提出的模型在复杂度远低于更深神经网络基线的情况下实现了有竞争力的性能。

英文摘要

We present an end-to-end pipeline for estimating stellar parameters from Sloan Digital Sky Survey Data Release 12 spectra using a fully connected multitask neural network with residual blocks, whose hyperparameters are tuned via Bayesian optimization. The preprocessing pipeline includes per-spectrum standardization, RobustScaler normalization of the target variables -- effective temperature $T_{\mathrm{eff}}$, metallicity $[\mathrm{Fe/H}]$, and surface gravity $\log g$ -- and data augmentation via Gaussian noise injection. On a held-out test set, the model achieved Mean Absolute Errors (MAE) of $59.76~\mathrm{K}$ for $T_{\mathrm{eff}}$, $0.103~\mathrm{dex}$ for $[\mathrm{Fe/H}]$, and $0.130~\mathrm{dex}$ for $\log g$. Normalized against the full-scale range of each parameter, these results represent range-normalized errors between $1\%$ and $3\%$, achieved with a highly efficient model complexity of approximately 540,000 trainable parameters. These results demonstrate that a compact residual multitask architecture, combined with principled signal preprocessing, provides a parameter-efficient solution for nonlinear parameter estimation in large-scale spectral datasets. In particular, the proposed model achieves competitive performance with substantially lower complexity than deeper neural network baselines.
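
The range-normalized error quoted above is an MAE divided by the span of the target variable. The abstract does not list the spans, so the 3500 K to 7500 K temperature range in this sketch is a hypothetical value chosen only to show the arithmetic.

```python
def mean_absolute_error(y_true, y_pred):
    # Average absolute difference between targets and predictions.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def range_normalized_error(mae, value_min, value_max):
    # Express an MAE as a percentage of the parameter's full-scale range.
    return 100.0 * mae / (value_max - value_min)

# Reported T_eff MAE with a hypothetical 3500-7500 K range (an assumption).
teff_percent = range_normalized_error(59.76, 3500.0, 7500.0)
```

Dividing by the range makes errors on temperature, metallicity, and surface gravity comparable even though their units and scales differ.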

2606.13941 2026-06-15 gr-qc astro-ph.IM cs.LG 交叉投稿

Binary Black Hole Parameter Estimation with Hybrid CNN-Transformer Neural Networks

使用混合CNN-Transformer神经网络进行双黑洞参数估计

Panagiotis N. Sakellariou, Spiros V. Georgakopoulos, Sotiris Tasoulis, Vassilis P. Plagianakos

发表机构 * University of Thessaly(色萨利大学)

AI总结 提出混合CNN-Transformer深度学习策略,用于估计非进动双黑洞系统的内禀和外在参数,在模拟和真实引力波事件中展现出强预测性能和鲁棒性。

Comments Accepted manuscript. 12 pages, 10 figures

Journal ref Astronomy and Computing, vol. 54, 101027 (2026)

AI中文摘要

引力波的探测彻底改变了我们探索宇宙基本方面的能力。传统上,建模的引力波信号通过基于模板的匹配滤波来识别,随后在信噪比时间序列中跨多个探测器进行符合分析。机器学习和深度学习的最新进展激发了人们对其在信号检测和参数估计中应用的兴趣。在本研究中,提出了一种混合深度学习策略,利用Transformer编码器的有效性以及成熟的卷积神经网络架构,尝试估计非进动双黑洞系统的内禀和外在参数。这项工作的主要焦点是点估计,即为每个参数生成单一最佳拟合值,而非完整的后验分布。该方法在嵌入高斯噪声的模拟信号和真实引力波事件上进行了评估,并在关键天体物理参数上展示了强大的预测性能和鲁棒性。

英文摘要

The detection of gravitational waves has revolutionized our ability to explore fundamental aspects of the Universe. Traditionally, modeled gravitational-wave signals have been identified using template-based matched filtering, followed by coincidence analysis across multiple detectors in the signal-to-noise ratio time series. Recent advances in Machine Learning and Deep Learning have sparked growing interest in their application to both signal detection and parameter estimation. In this study, a hybrid Deep Learning strategy is proposed that leverages the effectiveness of Transformer encoders alongside well-established Convolutional Neural Network architectures in an attempt to estimate the intrinsic and extrinsic parameters of non-precessing binary black hole systems. The primary focus of this work is point estimation, producing single best-fit values for each parameter rather than full posterior distributions. This method is evaluated on both simulated signals embedded in Gaussian noise and real gravitational-wave events, and it demonstrates strong predictive performance and robustness across key astrophysical parameters.

2606.13952 2026-06-15 cs.CR cs.ET cs.LG 交叉投稿

Side-Channel Attacks Bypass Protection in 3D Printers

侧信道攻击绕过3D打印机的保护

Eric Yocam, Varghese Vaidyan, Micah Flack, Gurcan Comert, Judith L. Mwakalonge

发表机构 * Department of Computer Science, California Polytechnic State University(加州州立理工大学计算机科学系) Beacom College of Computer and Cyber Sciences, Dakota State University(达科他州立大学Beacom计算机与网络科学学院) Idaho National Laboratory(爱达荷国家实验室) Department of Computational Data Science and Engineering, North Carolina A&T State University(北卡罗来纳农工州立大学计算数据科学与工程系) Department of Engineering, South Carolina State University(南卡罗来纳州立大学工程系)

AI总结 首次评估商用3D打印机的主动电机噪声消除(AMNC)硬件对策,发现其完全消除声学信道,但振动信道仍泄漏几何信息,且泄漏具有设备特异性。

Comments 11 pages, 6 figures, 4 tables

AI中文摘要

主动电机噪声消除(AMNC)作为硬件对策,已部署在商用熔融沉积成型(FDM)3D打印机中,用于防御针对知识产权(IP)的声学侧信道攻击。我们首次对已部署的AMNC对策进行实证评估,使用来自两台配备AMNC的Bambu Lab打印机的同步声学和振动记录公共数据集,涵盖12个物体类别。AMNC完全中和了声学信道:分类准确率与8.33%的随机基线无法区分。AMNC未针对的振动信道仍然泄漏信息。使用汇总统计量时,泄漏是粗粒度且由幅度驱动的(振动准确率在合并数据上约31%,在单台打印机内为36-47%),而波形形状几乎不携带信息(仅用频率特征时准确率处于随机水平)。一个以打印过程的有序演化为输入的全序列时间模型将准确率提升至约61%,而顺序打乱的对照实验(约33%)表明,其中相当一部分信息确实来自时序并与打印进程相关。泄漏具有设备特异性:在一台打印机上训练的分类器迁移到另一台时接近随机水平。我们得出结论:AMNC仅是声学防御;振动仍是AMNC未处理的、部分的、与几何相关的侧信道,但在此数据集上它不足以支持完整的几何重建;重建级攻击需要利用AMNC同样未涉及的磁或电源信道。我们发布所有代码。

英文摘要

Active Motor Noise Cancellation (AMNC) ships in commercial fused deposition modeling (FDM) 3D printers as a hardware countermeasure against acoustic side-channel attacks that target intellectual property (IP). We present the first empirical evaluation of a deployed AMNC countermeasure, using a public dataset of synchronized acoustic and vibration recordings from two AMNC-equipped Bambu Lab printers across 12 object classes. AMNC fully neutralizes the acoustic channel: classification accuracy is indistinguishable from the 8.33% random baseline. The vibration channel, which AMNC does not target, still leaks. With summary statistics the leak is coarse and amplitude-driven (vibration accuracy approximately 31% pooled, 36-47% within-printer), while the waveform shape carries essentially nothing (frequency-only features at chance). A full-sequence temporal model that ingests the ordered evolution of the print raises accuracy to approximately 61%, and an order-shuffling control (approximately 33%) shows that a substantial component is genuinely sequential and tied to print progression. The leak is device-specific: a classifier trained on one printer transfers near chance to the other. We conclude that AMNC is an acoustic-only defense: vibration remains a partial, geometry-correlated side channel it does not address, but one that does not, on this dataset, support full geometric reconstruction; reconstruction-grade attacks would require the magnetic or power channels AMNC also leaves untouched. We release all code.
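
Two of the reference points in this abstract are easy to reproduce: the 8.33% chance level for 12 classes and the idea of an order-shuffling control. The sketch below assumes balanced classes and treats a recording as a list of frames; both function names are illustrative and not taken from the released code.

```python
import random

def chance_accuracy_percent(n_classes):
    # Expected accuracy of uniform random guessing over balanced classes.
    return 100.0 / n_classes

def order_shuffled(frames, seed=0):
    # Same frames in a random order: the content is kept, the temporal
    # structure is destroyed. The input list is left unchanged.
    shuffled = list(frames)
    random.Random(seed).shuffle(shuffled)
    return shuffled

baseline = chance_accuracy_percent(12)
frames = list(range(20))            # stand-in for per-frame feature vectors
control = order_shuffled(frames)
```

If a sequence model scores much higher on ordered frames than on the shuffled control, the difference can be attributed to temporal order rather than to frame content alone.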

2606.13978 2026-06-15 astro-ph.IM cs.LG 交叉投稿

Classification of Astronomical Spectra Using PCA-Compressed Flux and Inverse-Variance Features

使用PCA压缩通量和逆方差特征对天文光谱进行分类

Bruno Santos Meneses Barreto, Marcio Eisencraft

发表机构 * Departamento de Engenharia de Telecomunicações e Controle, Universidade de São Paulo(电信与控制工程系,圣保罗大学)

AI总结 提出一种结合通量和逆方差特征、经PCA压缩后使用LightGBM分类器对SDSS DR17光谱进行恒星、星系和类星体分类的方法,测试集准确率达94.6%。

Comments This manuscript has been submitted to the Simpósio Brasileiro de Telecomunicações e Processamento de Sinais (SBrT) and is currently under peer review

AI Chinese abstract

本文评估了一种用于将SDSS DR17天文光谱分类为恒星、星系和类星体的信号处理和监督学习流程。每个光谱由其测量的通量和逆方差信息表示,结合了光谱形状与波长依赖的可靠性分布。在重新采样到共同的对数波长网格后,通量和逆方差向量被标准化并分别使用主成分分析进行压缩。得到的成分被连接起来并用于训练多个分类器。最佳性能由LightGBM梯度提升分类器获得,在测试集上达到94.6%的准确率和92.1%的平衡准确率。

English abstract

This paper evaluates a signal-processing and supervised-learning pipeline for classifying SDSS DR17 astronomical spectra into stars, galaxies, and quasars. Each spectrum is represented by its measured flux and inverse-variance information, combining spectral shape with a wavelength-dependent reliability profile. After resampling onto a common logarithmic wavelength grid, the flux and inverse-variance vectors are standardized and separately compressed using principal component analysis. The resulting components are concatenated and used to train several classifiers. The best performance was obtained with the LightGBM gradient-boosting classifier, reaching $94.6\%$ accuracy and $92.1\%$ balanced accuracy on the test set.
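The abstract above reports both accuracy and balanced accuracy. As a reminder of why the two can differ on imbalanced classes, here is a minimal sketch in plain Python. The toy labels are hypothetical and are not SDSS data, and the helper names `accuracy` and `balanced_accuracy` are chosen for illustration only; this is not the authors' evaluation code.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall, so every class counts equally."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical, imbalanced toy labels: the single quasar is misclassified.
y_true = ["galaxy"] * 6 + ["star"] * 3 + ["quasar"] * 1
y_pred = ["galaxy"] * 6 + ["star"] * 3 + ["galaxy"] * 1

print(accuracy(y_true, y_pred))           # overall hit rate
print(balanced_accuracy(y_true, y_pred))  # mean of per-class recalls
```

In this toy case the overall hit rate is 9 out of 10, while the mean per-class recall is only 2/3, because the rare class is missed entirely. A gap between the two reported figures is therefore usually a sign that some less frequent class is harder to classify.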

2606.14188 2026-06-15 cs.RO cs.AI cs.LG cs.SY eess.SY math.OC cross-list

Robustness without Wrinkles: Parallel Simulation and Robust MPC for Certified Deformable Manipulation

无皱鲁棒性:并行仿真与鲁棒MPC实现可认证的变形体操作

Wei-Chen Li, Jeffrey Fang, Sasanka Polisetti, Yuexi Song, Glen Chou

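Affiliations * Georgia Institute of Technology

AI summary Proposes CORD-SLS, a real-time control method that uses GPU-parallel differentiable simulation with contact smoothing for efficient gradient-based planning, combined with robust model predictive control and conformal-prediction calibration, achieving millisecond-speed planning and high safety in rope and cloth manipulation.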

AI Chinese abstract

我们提出了CORD-SLS,一种用于安全变形物体操作的实时控制方法,重点关注绳索和布料。其核心是一个带有接触平滑的GPU并行可微仿真器,能够通过间歇性接触实现高效的基于梯度的规划。为了在模型和感知不确定性下鲁棒地满足约束,我们开发了一种实时、GPU并行的输出反馈鲁棒模型预测控制(MPC)算法,该算法利用该仿真器进行规划。我们进一步证明,该仿真器加速了基于模型的强化学习,用于训练神经操作策略。为了提高现实世界的鲁棒性,我们使用共形预测来校准视觉反馈和感知误差界限,用于MPC,从而产生可达管,实现高概率的安全控制。我们在仿真和硬件上对高维、接触丰富的绳索和布料操作任务(包括避障、布线、折叠和平整)评估了CORD-SLS。在各种设置中,CORD-SLS实现了毫秒级规划速度,在安全性、速度和任务成功率方面均优于基线方法。

English abstract

We present CORD-SLS, a real-time control method for safe deformable object manipulation, with a focus on ropes and cloth. At its core is a GPU-parallel differentiable simulator with contact smoothing which enables efficient gradient-based planning through intermittent contact. To robustly satisfy constraints under model and sensing uncertainty, we develop a real-time, GPU-parallel output-feedback robust model predictive control (MPC) algorithm that plans with this simulator. We further show that the simulator accelerates model-based RL for training neural manipulation policies. To improve real-world robustness, we use conformal prediction to calibrate visual-feedback and perception-error bounds for MPC, producing reachable tubes that enable high-probability safe control. We evaluate CORD-SLS on high-dimensional, contact-rich rope and cloth manipulation tasks in simulation and hardware, including obstacle avoidance, routing, folding, and smoothing. Across settings, CORD-SLS achieves millisecond-speed planning, exceeding baselines in safety, speed, and task success.

2606.14194 2026-06-15 cs.CV cs.LG cross-list

Hybrid Classical-Quantum (HCQ) Alzheimer's Classification via Supervised $\beta$-VAE and Quantum Kernels

混合经典-量子(HCQ)阿尔茨海默病分类:基于监督β-VAE与量子核

Tia Tiwari, Vamshi Krishna Kancharla, Neelam Sinha

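Affiliations * Centre for Brain Research, Indian Institute of Science; Vision and AI Lab (VAL), Indian Institute of Science

AI summary Proposes a two-stage hybrid classical-quantum pipeline: a supervised 3D β-VAE compresses each MRI volume into a 64-dimensional latent code, PLS selects six components that are encoded into a six-qubit state, and a quantum-kernel SVM performs AD classification, reaching 72.1% accuracy and 0.799 AUC on ADNI-1.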

AI Chinese abstract

本文提出了一种两阶段混合经典-量子(HCQ)流水线,用于从3D T1加权结构MRI体素中进行二元阿尔茨海默病(AD)分类,其中经典和量子组件设计为互补而非独立运行。一个监督的3D β-变分自编码器(VAE)在体素级重建、KL散度和焦点分类损失下进行端到端训练,将每个3D MRI体积(从152 x 184 x 152重采样为96 x 96 x 96)压缩为64维潜码。偏最小二乘(PLS)回归选择潜码中最佳区分阿尔茨海默病(AD)与认知正常(CN)受试者的六个分量,并将其重新缩放为旋转角度,通过ZZ量子特征映射编码到六量子比特寄存器上,得到相应的量子态。预计算核支持向量机(SVM)的输入是一个N x N Gram矩阵(N = 308),通过计算每对量子态之间的重叠得到。本工作的新颖之处在于量子核直接作用于由监督自编码器端到端学习的疾病感知特征,而非预提取的输入。在308名ADNI-1受试者(包括137名AD和171名CN)上,基线模型达到67.2%的准确率和0.759的AUC,而稳定性增强变体达到72.1%的准确率和0.799的AUC,且交叉验证方差减半。3D Grad-CAM进一步帮助验证了模型对与阿尔茨海默病相关脑区的关注。HCQ流水线可作为跨生物医学成像领域的诊断分类通用框架,这些领域对经典方法存在类似挑战。

English abstract

This paper presents a two-stage Hybrid Classical-Quantum (HCQ) pipeline for binary Alzheimer's disease (AD) classification from 3D T1-weighted structural MRI volumes, where the classical and quantum components are designed to complement each other rather than operate independently. A supervised 3D $\beta$-variational autoencoder (VAE) is trained end-to-end under voxel-wise reconstruction, KL-divergence, and focal classification losses to compress each 3D MRI volume (resized from 152 x 184 x 152 to 96 x 96 x 96) into a 64-dimensional latent code. Partial Least Squares (PLS) regression selects the six components in the latent code that best separate AD from cognitively normal (CN) subjects and rescales them into rotation angles, which are encoded onto a six-qubit register using the ZZ quantum feature map to produce the corresponding quantum states. The input to a precomputed-kernel Support Vector Machine (SVM) is an N x N Gram matrix (N = 308), created by calculating the overlap between every pair of quantum states. The novelty of this work lies in the fact that the quantum kernel operates directly on disease-aware features that are learned end-to-end by a supervised autoencoder, rather than on pre-extracted inputs. On 308 ADNI-1 subjects, consisting of 137 AD and 171 CN subjects, the baseline achieved 67.2% accuracy and 0.759 AUC, while the stability-enhanced variant reached 72.1% accuracy and 0.799 AUC with cross-fold variance halved. 3D Grad-CAM further helped validate our model's focus on brain regions linked to Alzheimer's. The HCQ pipeline could serve as a general-purpose framework for diagnostic classification across biomedical imaging domains that present similar challenges for classical approaches.
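The Gram-matrix step described in this abstract (one kernel entry per pair of encoded states) can be illustrated with a deliberately simplified sketch. Note the swap: instead of the ZZ feature map named in the abstract, the code below assumes a plain product-state encoding with one RY rotation per qubit, for which the squared overlap has the closed form $\prod_k \cos^2((x_k - y_k)/2)$. The function names and the toy angles are hypothetical, and this is not the authors' implementation.

```python
import math


def fidelity_kernel(x, y):
    """Squared overlap |<psi(x)|psi(y)>|^2 for product states in which
    qubit k is prepared as RY(angle_k)|0>. This is a simplified stand-in
    for an entangling feature map such as the ZZ map."""
    k = 1.0
    for a, b in zip(x, y):
        # Single-qubit overlap of RY(a)|0> and RY(b)|0> is cos((a - b) / 2).
        k *= math.cos((a - b) / 2.0) ** 2
    return k


def gram_matrix(angles):
    """N x N matrix of pairwise kernel values for N angle vectors."""
    n = len(angles)
    return [[fidelity_kernel(angles[i], angles[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical rotation angles for three subjects, six qubits each.
toy_angles = [
    [0.1, 0.5, 1.0, 1.5, 2.0, 2.5],
    [0.2, 0.4, 1.1, 1.4, 2.1, 2.4],
    [3.0, 2.0, 0.3, 0.1, 0.7, 1.9],
]
gram = gram_matrix(toy_angles)
for row in gram:
    print([round(v, 4) for v in row])
```

A matrix of this shape is what a precomputed-kernel SVM consumes; in scikit-learn, for example, it could be passed to `SVC(kernel="precomputed")`. With an entangling feature map the entries would generally have to be estimated on a simulator or on hardware rather than from a closed form.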

2606.14218 2026-06-15 cs.RO cs.AI cs.LG cross-list

Universal Manipulation Exoskeleton: Learning Compliant Whole-body Policies with Real-time Torque Feedback

通用操控外骨骼:利用实时扭矩反馈学习全身柔顺策略

Litian Liang, Jingxi Xu, Xinda Qi, Yujun Cai, Houzhu Ding, Luqi Wang, Zhixin Sun, Jyh-Herng Chow, Ming Yang, Mark Cutkosky

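Affiliations * Ant Group; Stanford University

AI summary Proposes the Universal Manipulation Exoskeleton (UME), which uses real-time haptic torque feedback and whole-body data collection so that robots can learn active compliant policies, completing tasks such as mobile manipulation and force-mediated flipping in constrained spaces.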

AI Chinese abstract

为了使机器人在家庭环境中安全工作,它们需要具备柔顺性,并在接触过程中对扭矩和力反馈做出反应。然而,现有的大多数数据采集管道仍然缺乏捕捉力和扭矩数据以学习主动柔顺策略的能力。在本文中,我们提出了通用操控外骨骼(UME),一种上肢外骨骼,它提供实时触觉扭矩反馈,同时记录整个手臂的配置和关节扭矩信号用于遥操作。凭借透明的扭矩反馈,人类操作员甚至可以在蒙眼的情况下拔出运动学约束的物体。UME成本低、重量轻且便携。配备嵌入式IMU,它支持移动操作的遥操作。通过我们提出的通用重定向算法,UME可以遥操作多种机器人,包括7自由度OpenArm、7自由度Franka和6自由度X-ARM。我们证明,这些能力的组合使得学习双臂、全身和主动柔顺策略成为可能,这些策略在高度受限的空间中有效运行。学习到的鲁棒自主策略在各种任务中实现了高成功率,包括长时程移动操作、力介导的箱子翻转、视觉遮挡的箱子推挤以及空间受限的桌面操作。视频、代码和更多信息可在 https://ume-exo.github.io 找到。

English abstract

For robots to work safely in household environments, they need to be compliant and react to torque and force feedback during contact. However, the majority of existing data collection pipelines still lack the ability to capture force and torque data for learning active compliant policies. In this paper, we present Universal Manipulation Exoskeleton (UME), an upper-limb exoskeleton that provides real-time haptic torque feedback while recording whole-arm configurations and joint torque signals for teleoperation. With transparent torque feedback, human operators can even unsheathe kinematically constrained objects while blindfolded. UME is low-cost, lightweight, and portable. Equipped with an embedded IMU, it enables teleoperation for mobile manipulation. With our proposed universal retargeting algorithm, UME can teleoperate a range of robots, including the 7DoF OpenArm, 7DoF Franka, and 6DoF X-ARM. We demonstrate that this combination of capabilities enables learning bimanual, whole-body, and active compliant policies that operate effectively in highly constrained spaces. The learned robust autonomous policies achieve high success rates across a variety of tasks, including long-horizon mobile manipulation, force-mediated box flipping, visually occluded box pushing, and space-constrained tabletop manipulation. Videos, code, and additional information can be found at https://ume-exo.github.io.

2606.14373 2026-06-15 hep-ex cs.LG hep-ph physics.data-an physics.ins-det cross-list

Machine-learned particle flow as a foundation model for collider physics

机器学习粒子流作为对撞机物理学的基础模型

Farouk Mokhtar, Joosep Pata, Michael Kagan, Javier Duarte

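Affiliations * University of California San Diego; National Institute of Chemical Physics and Biophysics; SLAC National Accelerator Laboratory

AI summary Casts event reconstruction as a machine learning problem and uses the latent representations learned by the MLPF model to substantially improve three analysis tasks (jet flavor identification, jet energy regression, and missing momentum regression); a single linear layer is competitive with state-of-the-art architectures while using roughly 35 times fewer parameters.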

Comments 15 pages, 11 figures

AI Chinese abstract

从粒子对撞到物理分析的工作流程经过一系列传统上模块化且不连续的重建步骤,没有共享表示连接低层级探测器数据与高层级分析任务。我们表明,将事件重建视为机器学习问题自然会产生这样的共享表示。我们重新利用为粒子流重建(MLPF)训练的机器学习模型来执行三项不同的分析任务:喷注味识别、喷注能量回归和缺失动量回归。通过将在重建过程中学到的每个粒子的潜在表示作为附加输入特征,我们显著优于仅使用运动学特征的基线。我们进一步证明,仅使用潜在表示训练的单个线性层在性能上可与最先进的基线架构相媲美,并且在缺失动量回归上优于基线,参数数量减少约35倍。这些结果表明,在重建过程中学到的潜在表示编码了下游分析所需的基本物理信息,将MLPF确立为基础模型,并为从探测器数据到物理分析的端到端流程提供了具体步骤。

English abstract

The workflow from particle collision to physics analysis passes through a series of reconstruction steps that are traditionally modular and disconnected, with no shared representation linking low-level detector data to high-level analysis tasks. We show that casting event reconstruction as a machine learning problem naturally produces such a shared representation. We repurpose a machine learning model trained for particle-flow reconstruction (MLPF) to perform three distinct analysis tasks: jet flavor identification, jet energy regression, and missing momentum regression. By appending the per-particle latent representations learned during reconstruction as additional input features, we substantially improve over baselines that use kinematic features alone. We further demonstrate that a single linear layer trained using only the latent representations achieves competitive performance against state-of-the-art baseline architectures, and outperforms the baseline for missing momentum regression with approximately 35 times fewer parameters. These results demonstrate that the latent representations learned during reconstruction encode essential physics information needed for downstream analysis, establishing MLPF as a foundation model and offering a concrete step toward an end-to-end pipeline from detector data to physics analysis.

2606.14561 2026-06-15 cs.RO cs.LG cross-list

ORCA: A Platform for Open-Source Dexterity Research

ORCA: 开源灵巧性研究平台

Francesco Capuano, Maximilian Eberlein, Fabrice Bourquin, Clemens Claudio Christoph

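Affiliations * University of Oxford; ETH Zurich; Orca Dexterity

AI summary Proposes the ORCA learning stack, which unifies dexterous-hand control, simulation, teleoperation, and retargeting, and integrates with robot-learning frameworks to support end-to-end dexterous manipulation research.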

Comments 15 pages

AI Chinese abstract

机器人操作研究越来越关注两指平行夹爪,因其有效性、经济性和易于遥操作。然而,夹爪受限于其外形因素,即使对于简单的重新定向任务,也常常需要双臂设置。拟人手是灵巧机器人学习的更自然平台——更接近人手,能够从人类视频中学习——但它们在学习研究中仍然难以使用:即使存在开放且可访问的手部硬件,用于控制、仿真、遥操作和重定向的软件也分散在零散的代码库中,并且与机器人学习生态系统基本脱节。在这项工作中,我们介绍了ORCA学习栈,这是一个将灵巧性作为第一类机器人学习领域的开源研究栈。我们的ORCA栈将低级控制、仿真、来自一系列消费平台的遥操作以及手部重定向统一在单个接口后面,并原生集成流行的机器人学习框架(如LeRobot),使灵巧手研究人员能够利用与非灵巧机器人学习相同的数据、训练和评估流程。我们展示了一个完整的端到端工作流程,通过使用消费级VR头显进行遥操作收集手内重新定向任务的专家演示,使用LeRobot训练自主策略,并在完全可重现和可观察的设置中评估学习到的策略。我们将整个栈开源,作为灵巧操作研究的共享、可重现基础。

English abstract

Robotic manipulation research increasingly focuses on two-finger parallel grippers for their effectiveness, affordability, and ease of teleoperation. Grippers are nonetheless limited by their form factor, often requiring bimanual setups even for simple reorientation tasks. Anthropomorphic hands are a more natural platform for dexterous robot learning -- closer to the human hand, and capable of learning from human video -- yet they remain hard to use in learning research: even where open and accessible hand hardware exists, the software for control, simulation, teleoperation, and retargeting is scattered in one-off code bases, and largely disconnected from the robot-learning ecosystem. In this work, we introduce the ORCA learning stack, an open-source research stack for dexterity as a first-class robot learning domain. Our ORCA stack unifies low-level control, simulation, teleoperation from a range of consumer platforms, and hand retargeting, behind a single interface, and integrates natively with popular robot-learning frameworks such as LeRobot, so dexterous hand researchers can leverage the same data, training, and evaluation pipelines used for non-dexterous robot learning. We demonstrate a complete end-to-end workflow, collecting expert demonstrations of an in-hand reorientation task by teleoperation with a consumer-grade VR headset, training an autonomous policy with LeRobot, and evaluating the learned policy in a fully reproducible and observable setup. We open-source the entire stack as a shared, reproducible foundation for dexterous-manipulation research.

2606.14565 2026-06-15 cs.CE cs.LG physics.comp-ph cross-list

CANN-EUCLID: unsupervised constitutive artificial neural network model discovery from full-field data

CANN-EUCLID:基于全场数据的无监督本构人工神经网络模型发现

Benjamin Alheit, Siddhant Kumar, Mathias Peirlinck

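AI summary Proposes the CANN-EUCLID framework, which combines constitutive artificial neural networks with unsupervised full-field discovery to identify sparse hyperelastic constitutive models directly from displacement fields and reaction forces, without stress measurements or a prescribed model.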

AI Chinese abstract

本构人工神经网络(CANN)提供了可解释的材料模型发现方法,但迄今为止仅用于基于均匀测试的表观应力-应变数据的应力监督设置。由于每个测试仅采样狭窄的加载路径并提供均匀化而非局部应力信息,稳健的发现通常需要多种加载模式来约束多维响应。这对于软生物组织具有挑战性,因为重复测试、损伤和样本变异性限制了单个标本的可靠信息。在这里,我们将CANN与应力无监督的全场发现框架EUCLID相结合,直接从位移场和反作用力中识别稀疏超弹性本构律,仅需一个诱导异质性的加载案例。CANN-EUCLID通过稀疏促进正则化最小化平衡不平衡,选择紧凑的活跃项,无需局部应力测量或预设本构律。我们在具有预设真实本构律的各向同性和各向异性基准上评估了该方法。当真实本构律可由所选CANN基表示时,我们的方法以近乎精确的精度恢复正确项,包括带有嵌入参数的指数项。当真实本构律不包含在基中时,该方法保留共享项并使用可用基函数近似缺失贡献。泛化能力强烈依赖于采样的变形状态:当充分探测时,指数应变硬化项可以准确恢复,但当硬化区域位于采样域之外时,可能产生较大的外推误差。正向有限元验证模拟表明,发现的行为准确复制了真实本构律。这些结果确立了应力无监督CANN发现作为可解释的全场本构模型识别的有前景框架。

English abstract

Constitutive artificial neural networks (CANNs) provide interpretable material model discovery, but have so far been used in stress-supervised settings based on apparent stress-strain data from homogeneous tests. Because each test samples only a narrow loading path and provides homogenized rather than local stress information, robust discovery typically requires multiple loading modes to constrain the multidimensional response. This is challenging for soft biological tissues, where repeated testing, damage, and sample variability limit reliable information from a single specimen. Here, we combine CANNs with the stress-unsupervised full-field discovery framework EUCLID to identify sparse hyperelastic laws directly from displacement fields and reaction forces in one heterogeneity-inducing loading case. CANN-EUCLID minimizes equilibrium imbalance with sparsity-promoting regularization selecting compact active terms, without local stress measurements or a prescribed law. We evaluate the approach on isotropic and anisotropic benchmarks with prescribed ground-truth laws. When the ground truth is representable by the chosen CANN basis, our method recovers the correct terms with near-exact accuracy, including exponential terms with embedded parameters. When it is not contained in the basis, the method retains shared terms and approximates missing contributions using available basis functions. Generalization depends strongly on sampled deformation states: exponential strain-stiffening terms can be recovered accurately when sufficiently probed, but can produce large extrapolation errors when the stiffening regime lies outside the sampled domain. Forward FE validation simulations show that the discovered behavior accurately replicates the ground truth. These results establish stress-unsupervised CANN discovery as a promising framework for interpretable full-field constitutive model identification.

2606.14570 2026-06-15 physics.ao-ph cs.AI cs.LG cross-list

Regional Climate Model Emulation with Diffusion Approaches: What is the Added Value of Generative Machine Learning?

基于扩散方法的区域气候模型模拟:生成式机器学习的附加价值是什么?

Mikel N. Legasa, Antoine Doury, Achille Gellens, Redouane Lguensat, Clara Naldesi, Soulivanh Thao, Mathieu Vrac

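Affiliations * University of Cambridge; CNRS; Institut Pierre Simon Laplace

AI summary Proposes ParamDiffusion, a two-stage diffusion framework, and compares it with deterministic methods to assess the added value of generative machine learning for regional climate model emulation; diffusion approaches reproduce climatological statistics with high skill but still fall short on extreme events.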

Comments Submitted to Journal of Advances in Modeling Earth Systems (JAMES)

AI Chinese abstract

模拟器通过捕捉区域气候模型(RCM)的动力降尺度功能,提供了一种经济有效的替代方案。它们将全球气候模型(GCM)模拟的大尺度预测因子与RCM模拟的目标变量(此处为降水)的高分辨率场联系起来。机器学习方法,通常是深度学习,在计算时间和能耗上比运行RCM更便宜。其中,生成模型具有吸引力,因为它们可以模拟与预测因子一致的局部高分辨率场集合。这个集合,我们称之为不确定性包络,其附加价值仍有待恰当评估。在此,我们做出三项贡献。首先,我们引入ParamDiffusion,一种新的两阶段扩散框架,并将其与最先进的扩散方法进行比较。其次,我们通过一个符合气候科学需求的综合框架扩展标准验证,检查特定降水事件,包括极端事件。第三,在此框架内,我们评估扩散方法相对于确定性方法的附加价值。我们相互比较了四种深度学习模型:一种旨在捕捉降水尾部的确定性模型;一种基于该模型的参数化概率模型;一种最近提出的扩散方法;以及ParamDiffusion,它将参数化模型与扩散模型相结合。我们的结果表明,基于扩散的方法以高技巧再现了气候降水统计特征,包括分布尾部和空间复合极端事件,同时生成空间细节丰富的场。然而,所评估的模型均未能在其不确定性包络内始终如一地解释最极端的RCM模拟事件。因此,扩散模型在概率性RCM模拟方面具有前景,但在它们能够可靠地代表高影响降水极端事件之前,仍需取得进展。

English abstract

Emulators provide a cost-effective alternative to regional climate models (RCMs) by capturing their dynamical downscaling function. They link large-scale predictors simulated by global climate models (GCMs) to RCM-simulated high-resolution fields of the target variable, here precipitation. Machine learning methods, typically deep learning, are cheaper than running RCMs in computation time and energy. Among them, generative models are appealing because they can simulate ensembles of local high-resolution fields consistent with the predictors. This ensemble, which we call the uncertainty envelope, remains to be properly assessed for added value. Here, we make three contributions. First, we introduce ParamDiffusion, a new two-stage diffusion-based framework, and compare it with a state-of-the-art diffusion approach. Second, we expand standard validation through a comprehensive framework aligned with climate-science needs, examining specific precipitation events, including extremes. Third, within this framework, we assess the added value of diffusion approaches relative to deterministic methods. We intercompare four deep-learning models: a deterministic model designed to capture the precipitation tail; a parametric probabilistic model based on it; a recently proposed diffusion approach; and ParamDiffusion, which couples the parametric model with a diffusion model. Our results show that diffusion-based approaches reproduce climatological precipitation statistics with high skill, including distributional tails and spatially compounded extremes, while generating spatially detailed fields. However, none of the assessed models consistently accounts for the most extreme RCM-simulated events within its uncertainty envelope. Diffusion models are therefore promising for probabilistic RCM emulation, but progress is still required before they can reliably represent high-impact precipitation extremes.

2606.14654 2026-06-15 cs.AI cs.CL cs.LG cross-list

Abstracting Cross-Domain Action Sequences into Interpretable Workflows

将跨领域动作序列抽象为可解释的工作流

Gaurav Verma, Scott Counts

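Affiliations * Microsoft Corporation

AI summary Proposes WorkflowView, a framework that uses large language models to abstract low-level action sequences into high-level activities; its effectiveness and generality are validated on three distinct tasks, with high semantic similarity and predictive performance.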

Comments preprint; 9 pages, 5 figures

AI Chinese abstract

序列或时间戳交互日志提供了数字应用使用的客观记录,但其粒度和噪声常常掩盖了关于人们工作的有意义见解。这些见解对于以真实用户交互为基础改进数字产品至关重要。先前的研究应用深度学习模型将用户动作聚类为高层活动,但这些方法对噪声高度敏感且难以跨应用泛化。为解决这一局限,我们引入了WorkflowView,一个使用大语言模型(LLMs)将低层动作序列抽象为高层活动的框架。我们在三个不同且具有挑战性的序列任务和多样化领域中建立了该方法的有效性和泛化性:(a)从浏览器日志中进行零样本任务描述重构(实现高语义相似度,$\mu_{sim} = 0.91$),(b)使用MOOC交互日志进行少样本学生退学预测(仅用五个少样本示例达到加权$F_1 = 0.90$),以及(c)对Microsoft Word中文档工作流中AI工具集成进行匿名化、隐私保护分析。我们的工作表明,基于LLM的抽象是将低层行为数据转化为高层、可解释且可操作见解的稳健高效途径。我们还讨论了在日志基础设施中部署基于LLM的推理时的实际考虑,包括计算效率和用户隐私。

English abstract

Sequential or time-stamped interaction logs provide objective records of digital application usage, yet their granularity and noise often obscure meaningful insights into people's work. Such insights are essential for improving digital products in ways grounded in real-world user interactions. Prior research has applied deep learning models to cluster user actions into high-level activities, but these approaches are highly sensitive to noise and struggle to generalize across applications. To address this limitation, we introduce WorkflowView, a framework that uses large language models (LLMs) to abstract low-level action sequences into high-level activities. We establish the effectiveness and generality of our approach across three distinct, challenging sequential tasks and diverse domains: (a) zero-shot task description reconstruction from browser logs (achieving high semantic similarity, $\mu_{sim} = 0.91$), (b) few-shot student dropout prediction using MOOC interaction logs (reaching weighted $F_1 = 0.90$ with only five few-shot examples), and (c) anonymized, privacy-preserving analysis of AI tool integration within document workflows in Microsoft Word. Our work demonstrates that LLM-based abstraction is a robust and efficient path forward for transforming low-level behavioral data into high-level, interpretable, and actionable insights. We also discuss practical considerations for deploying LLM-based inferences within logging infrastructures, including computational efficiency and user privacy.

2507.10834 2026-06-15 cs.LG updated version

From Small to Large: A Graph Convolutional Network Approach for Solving Assortment Optimization Problems

从小到大:一种用于解决分类优化问题的图卷积网络方法

Guokai Li, Pin Gao, Stefanus Jasin, Zizhuo Wang

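Affiliations * Smith School of Business, Queen’s University; School of Data Science, The Chinese University of Hong Kong, Shenzhen; Stephen M. Ross School of Business, University of Michigan

AI summary Proposes a graph convolutional network (GCN) framework for efficiently solving constrained assortment optimization problems; a graph representation lets the GCN learn the mapping from problem parameters to optimal assortments, and training on small instances generalizes to large ones. In numerical experiments, a model trained on 20-product instances achieves over 85% of the optimal revenue on 2,000-product problems.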

AI Chinese abstract

分类优化旨在从可替代产品中选择一个子集,在约束条件下最大化期望收益。由于组合和非线性性质,该问题是NP难的,并且在电子商务等行业中频繁出现,平台每分钟需要解决数千个此类问题。我们提出了一种图卷积网络(GCN)框架来高效求解约束分类优化问题。我们的方法构建问题的图表示,训练GCN学习从问题参数到最优分类的映射,并基于GCN的输出开发了三种推理策略。由于GCN能够跨实例规模泛化,从小规模样本中学到的模式可以迁移到大规模问题。我们建立了理论结果来证明所提出的GCN的表达能力,并解释了规模泛化能力的潜在机制。数值实验表明,在20个产品实例上训练的GCN能够在几秒内对多达2000个产品的问题实现超过85%的最优收益,在准确性和效率上均优于现有启发式方法。我们进一步将该框架扩展到使用交易数据的未知选择模型设置,并展示了类似的性能和可扩展性。

English abstract

Assortment optimization seeks to select a subset of substitutable products, subject to constraints, to maximize expected revenue. The problem is NP-hard due to its combinatorial and nonlinear nature and arises frequently in industries such as e-commerce, where platforms must solve thousands of such problems each minute. We propose a graph convolutional network (GCN) framework to efficiently solve constrained assortment optimization problems. Our approach constructs a graph representation of the problem, trains a GCN to learn the mapping from problem parameters to optimal assortments, and develops three inference policies based on the GCN's output. Owing to the GCN's ability to generalize across instance sizes, patterns learned from small-scale samples can be transferred to large-scale problems. Theoretical results are established to show the expressive power of the proposed GCN, and explain the underlying mechanism of the size generalization ability. Numerical experiments show that a GCN trained on instances with 20 products achieves over 85% of the optimal revenue on problems with up to 2,000 products within seconds, outperforming existing heuristics in both accuracy and efficiency. We further extend the framework to settings with an unknown choice model using transaction data and demonstrate similar performance and scalability.

2510.00375 2026-06-15 cs.LG cs.HC updated version

Multidimensional Bayesian Active Machine Learning of Working Memory Task Performance

工作记忆任务表现的多维贝叶斯主动机器学习

Dom CP Marticorena, Chris Wissmann, Zeyu Lu, Dennis L Barbour

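Affiliations * Department of Biomedical Engineering, Washington University; Department of Computer Science and Engineering, Washington University

AI summary Proposes a Bayesian two-dimensional active-classification approach that controls spatial load and feature-binding load in a virtual environment and estimates the performance surface with a Gaussian process classifier, converging quickly and revealing individual differences.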

Comments 41 pages, 7 figures

AI Chinese abstract

虽然自适应实验设计已经超越了一维阶梯式自适应,但大多数认知实验仍然控制单个因素并用标量总结表现。我们展示了一种贝叶斯双轴主动分类方法的验证,该方法在沉浸式虚拟测试环境中针对5×5工作记忆重建任务进行。控制两个变量:项目的空间负荷L(占用瓦片数量)和特征绑定负荷K(不同颜色数量)。刺激获取由非参数高斯过程(GP)概率分类器的后验不确定性引导,该分类器输出(L, K)上的曲面,而不是单个阈值或最大跨度值。在年轻成人群体中,我们将GP驱动的自适应模式(AM)与传统的自适应阶梯经典模式(CM)进行比较,后者仅在K=3时变化L。在该队列中,两种方法之间达到一致性,在K=3时组内相关系数为0.755。此外,AM揭示了空间负荷和特征绑定之间交互作用的个体差异。AM估计比其他采样策略收敛更快,表明仅需约30个样本即可准确拟合完整模型。

English abstract

While adaptive experimental design has outgrown one-dimensional, staircase-based adaptations, most cognitive experiments still control a single factor and summarize performance with a scalar. We validate a Bayesian, two-axis, active-classification approach in an immersive virtual testing environment using a 5-by-5 working-memory reconstruction task. Two variables are controlled: spatial load L (number of occupied tiles) and feature-binding load K (number of distinct colors) of items. Stimulus acquisition is guided by posterior uncertainty of a nonparametric Gaussian Process (GP) probabilistic classifier, which outputs a surface over (L, K) rather than a single threshold or max span value. In a young adult population, we compare GP-driven Adaptive Mode (AM) with a traditional adaptive staircase Classic Mode (CM), which varies L only at K = 3. Parity between the methods is achieved for this cohort, with an intraclass correlation coefficient of 0.755 at K = 3. Additionally, AM reveals individual differences in interactions between spatial load and feature binding. AM estimates converge more quickly than other sampling strategies, demonstrating that only about 30 samples are required for accurate fitting of the full model.

2511.09789 2026-06-15 cs.LG updated version

Trend-Aware Multi-Task Learning for Short-Term Energy Forecasting

CaReTS:统一分类与回归的多任务时间序列预测框架

Fulong Yao, Wanqing Zhao, Chao Zheng, Xiaofei Han

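Affiliations * Cardiff University; Newcastle University; University of Leeds

AI summary Proposes CaReTS, a multi-task framework whose dual-stream architecture jointly classifies trends and regresses deviations, delivering accurate and interpretable forecasts and outperforming existing methods on real-world datasets.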

AI Chinese abstract

近年来深度预测模型取得了显著性能,但大多数方法仍难以同时提供准确的预测和对时间动态的可解释洞察。本文提出CaReTS,一种新颖的多任务学习框架,结合分类和回归任务用于多步时间序列预测问题。该框架采用双流架构,其中分类分支学习未来的逐步趋势,而回归分支估计目标变量最新观测值的相应偏差。双流设计通过分离目标变量的宏观趋势和微观偏差,提供更具可解释性的预测。为了在输出预测、偏差估计和趋势分类中实现有效学习,我们设计了一个具有不确定性加权机制的多任务损失,以自适应平衡每个任务的贡献。此外,在该框架下实例化了四种变体(CaReTS1-4),以集成主流时序建模编码器,包括卷积神经网络(CNN)、长短期记忆网络(LSTM)和Transformer。在真实数据集上的实验表明,CaReTS在预测准确性上优于最先进的算法,同时实现了更高的趋势分类性能。

English abstract

Short-term energy forecasting plays an important role in real-time operational decision-making, such as electricity market bidding and power system dispatch, where both numerical accuracy and correct directional signals are essential. However, most existing forecasting approaches formulate the problem purely as a regression task, limiting their ability to explicitly capture stepwise directional movements and trend consistency required for operational decisions. To address this limitation, this paper proposes a trend-aware multi-task forecasting framework that decomposes forecasting outputs into directional movements and deviation magnitudes relative to the latest observation, enabling both accurate numerical prediction and interpretable trend-aware outputs. The framework adopts a task-specific dual-stream architecture and explores key design choices for integrating trend and deviation information, including hard versus probabilistic trend representations, symmetric versus asymmetric deviation modelling, and parallel versus sequential conditioning strategies. To stabilize multi-task learning and reduce manual tuning, an uncertainty-aware task weighting scheme is incorporated to automatically balance directional classification, deviation regression, and final output prediction during training. Experimental results on real-world energy datasets demonstrate that the proposed framework achieves competitive numerical accuracy compared with state-of-the-art algorithms, while consistently improving trend prediction performance with moderate computational cost. This capability is particularly beneficial in short-term energy system management, where consistent directional forecasting can provide more reliable decision support for practical operational scenarios such as market bidding, resource scheduling, and risk-aware energy management.
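The decomposition this abstract describes, splitting each forecast target into a directional movement and a deviation magnitude relative to the latest observation, can be sketched in a few lines. This is only an illustration under stated assumptions: directions are hard labels in {+1, -1} with ties counted as "up", deviations are symmetric absolute differences, and the function names `decompose` and `reconstruct` are hypothetical rather than taken from the paper.

```python
def decompose(last_obs, targets):
    """Split each future value into a direction (+1 up or flat, -1 down)
    and a non-negative deviation magnitude relative to last_obs."""
    directions = [1 if y >= last_obs else -1 for y in targets]
    magnitudes = [abs(y - last_obs) for y in targets]
    return directions, magnitudes


def reconstruct(last_obs, directions, magnitudes):
    """Recombine direction and magnitude into numerical forecasts."""
    return [last_obs + d * m for d, m in zip(directions, magnitudes)]


# Hypothetical load values: latest observation and three future steps.
last_obs = 100.0
targets = [103.0, 98.5, 100.0]

directions, magnitudes = decompose(last_obs, targets)
print(directions)   # one classification target per step
print(magnitudes)   # one regression target per step
print(reconstruct(last_obs, directions, magnitudes))
```

In a multi-task setup of this kind, the direction list would serve as the target of a classification branch and the magnitude list as the target of a regression branch, with the final forecast obtained by recombining the two predictions.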

2512.03787 2026-06-15 cs.LG updated version

Adaptive Identification and Modeling of Clinical Pathways with Process Mining

基于过程挖掘的临床路径自适应识别与建模

Francesco Vitale, Nicola Mazzocca

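Affiliations * University of Naples Federico II

AI summary Proposes a two-phase process mining method that uses conformance checking diagnostics to extend the clinical pathway knowledge base, enabling adaptive identification and modeling; on the Synthea dataset it reaches 95.62% AUC and 67.11% arc-degree simplicity.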

Comments Accepted to the 41st ACM/SIGAPP Symposium On Applied Computing (ACM SAC 2026)

AI Chinese abstract

临床路径是模拟患者治疗过程的专门医疗计划。它们旨在提供基于标准的进展并标准化患者治疗,从而改善护理、减少资源使用并加速患者康复。然而,基于临床指南和领域专业知识手动建模这些路径是困难的,并且可能无法反映针对不同疾病变异或组合的实际最佳实践。我们提出了一种使用过程挖掘的两阶段建模方法,通过利用一致性检查诊断来扩展临床路径知识库。在第一阶段,收集给定疾病的历史数据,以过程模型的形式捕获治疗。在第二阶段,将新数据与参考模型进行比较以验证一致性。基于一致性检查结果,知识库可以扩展为针对新变异或疾病组合定制的更具体模型。我们使用Synthea(一个模拟SARS-CoV-2感染患者治疗并伴有不同COVID-19并发症的基准数据集)展示了我们的方法。结果表明,我们的方法能够以足够的精度扩展临床路径知识库,AUC峰值达到95.62%,同时保持67.11%的弧阶简单性。

English abstract

Clinical pathways are specialized healthcare plans that model patient treatment procedures. They are developed to provide criteria-based progression and standardize patient treatment, thereby improving care, reducing resource use, and accelerating patient recovery. However, manual modeling of these pathways based on clinical guidelines and domain expertise is difficult and may not reflect the actual best practices for different variations or combinations of diseases. We propose a two-phase modeling method using process mining, which extends the knowledge base of clinical pathways by leveraging conformance checking diagnostics. In the first phase, historical data of a given disease is collected to capture treatment in the form of a process model. In the second phase, new data is compared against the reference model to verify conformance. Based on the conformance checking results, the knowledge base can be expanded with more specific models tailored to new variants or disease combinations. We demonstrate our approach using Synthea, a benchmark dataset simulating patient treatments for SARS-CoV-2 infections with varying COVID-19 complications. The results show that our method enables expanding the knowledge base of clinical pathways with sufficient precision, peaking at 95.62% AUC while maintaining an arc-degree simplicity of 67.11%.

2512.10966 2026-06-15 cs.LG cs.AI cs.CV eess.IV updated version

Interpretable Alzheimer's Diagnosis via Multimodal Fusion of Regional Brain Experts

可解释的阿尔茨海默病诊断:基于区域脑专家的多模态融合

Farica Zhuang, Shu Yang, Dinara Aliyeva, Zixuan Wen, Duy Duong-Tran, Christos Davatzikos, Tianlong Chen, Song Wang, Li Shen

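Affiliations * University of Pennsylvania; Massachusetts Institute of Technology

AI summary Proposes MREF-AD, a multimodal regional expert fusion model that uses a Mixture-of-Experts framework to treat the brain regions of each modality as independent experts and a gating network to learn subject-specific fusion weights, enabling interpretable AD diagnosis.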

Comments Published at IEEE ICHI 2026

AI Chinese abstract

准确早期诊断阿尔茨海默病(AD)对有效干预至关重要,需要整合多模态神经影像数据的互补信息。然而,传统融合方法通常依赖特征的简单拼接,无法自适应平衡淀粉样蛋白PET和MRI等生物标志物在不同脑区的贡献。本文提出MREF-AD,一种用于AD诊断的多模态区域专家融合模型。它是一个混合专家(MoE)框架,将每个模态内的介观脑区域建模为独立专家,并采用门控网络学习个体特定的融合权重。利用阿尔茨海默病神经影像学倡议(ADNI)的表格神经影像和人口统计学信息,MREF-AD在强经典和深度学习基线上取得了有竞争力的性能,同时提供了可解释的、模态和区域层面的洞察,揭示了结构和分子影像如何共同促进AD诊断。源代码见:https://github.com/PennShenLab/mref-ad。

English abstract

Accurate and early diagnosis of Alzheimer's disease (AD) is critical for effective intervention and requires integrating complementary information from multimodal neuroimaging data. However, conventional fusion approaches often rely on simple concatenation of features, which cannot adaptively balance the contributions of biomarkers such as amyloid PET and MRI across brain regions. In this work, we propose MREF-AD, a Multimodal Regional Expert Fusion model for AD diagnosis. It is a Mixture-of-Experts (MoE) framework that models mesoscopic brain regions within each modality as independent experts and employs a gating network to learn subject-specific fusion weights. Utilizing tabular neuroimaging and demographic information from the Alzheimer's Disease Neuroimaging Initiative (ADNI), MREF-AD achieves competitive performance over strong classic and deep baselines while providing interpretable, modality- and region-level insight into how structural and molecular imaging jointly contribute to AD diagnosis. The source code is available at https://github.com/PennShenLab/mref-ad.

2512.13069 2026-06-15 cs.LG physics.flu-dyn stat.ML updated version

Multi-fidelity aerodynamic data fusion by autoencoder transfer learning

基于自编码器迁移学习的多保真度气动数据融合

Javier Nieto-Centenero, Esther Andrés, Rodrigo Castellanos

发表机构 * Department of Aerospace Engineering, UC3M(航空航天工程系,UC3M) Theoretical and Computational Aerodynamics Group, Flight Physics Department, INTA(理论与计算空气动力学组,飞行物理部门,INTA)

AI总结 提出结合自编码器迁移学习与多分裂保形预测的多保真度深度学习框架,利用低保真数据学习潜在物理表示,微调解码器以极少量高保真数据实现高精度气动压力预测,并生成超过95%点覆盖的不确定度带。

Comments 27 pages, 13 figures

详情
AI中文摘要

准确的气动预测通常依赖于高保真度模拟;然而,其高昂的计算成本严重限制了其在数据驱动建模中的适用性。这一局限性促使了多保真度策略的发展,该策略利用廉价的低保真度信息而不牺牲准确性。针对这一挑战,本文提出了一种多保真度深度学习框架,该框架将基于自编码器的迁移学习与新开发的多分裂保形预测(MSCP)策略相结合,以在极端数据稀缺条件下实现具有不确定度感知的气动数据融合。该方法利用丰富的低保真度(LF)数据学习紧凑的潜在物理表示,该表示作为冻结的知识库,随后使用稀缺的高保真度(HF)样本对解码器进行微调。在NACA翼型(二维)和跨声速机翼(三维)数据库的表面压力分布测试中,该模型成功修正了LF偏差,并使用最少的HF训练数据实现了高精度的压力预测。此外,MSCP框架生成了稳健且可操作的不确定度带,点覆盖超过95%。通过将极端数据效率与不确定度量化相结合,本文为数据稀缺环境下的气动回归提供了一种可扩展且可靠的解决方案。

英文摘要

Accurate aerodynamic prediction often relies on high-fidelity simulations; however, their prohibitive computational costs severely limit their applicability in data-driven modeling. This limitation motivates the development of multi-fidelity strategies that leverage inexpensive low-fidelity information without compromising accuracy. Addressing this challenge, this work presents a multi-fidelity deep learning framework that combines autoencoder-based transfer learning with a newly developed Multi-Split Conformal Prediction (MSCP) strategy to achieve uncertainty-aware aerodynamic data fusion under extreme data scarcity. The methodology leverages abundant Low-Fidelity (LF) data to learn a compact latent physics representation, which acts as a frozen knowledge base for a decoder that is subsequently fine-tuned using scarce HF samples. Tested on surface-pressure distributions for NACA airfoils (2D) and a transonic wing (3D) databases, the model successfully corrects LF deviations and achieves high-accuracy pressure predictions using minimal HF training data. Furthermore, the MSCP framework produces robust, actionable uncertainty bands with pointwise coverage exceeding 95%. By combining extreme data efficiency with uncertainty quantification, this work offers a scalable and reliable solution for aerodynamic regression in data-scarce environments.
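
For readers unfamiliar with conformal uncertainty bands, the sketch below shows ordinary split conformal prediction, the textbook procedure that multi-split variants build on. It is not the paper's MSCP strategy, whose details the abstract does not give; the function names and the symmetric absolute-residual score are assumptions.

```python
import math

def conformal_halfwidth(calib_residuals, alpha=0.05):
    """Half-width of a (1 - alpha) band from held-out calibration residuals.

    Uses the finite-sample rank ceil((n + 1) * (1 - alpha)); returns infinity
    when the calibration set is too small to certify the requested level.
    """
    n = len(calib_residuals)
    scores = sorted(abs(r) for r in calib_residuals)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return scores[k - 1]

def predict_band(y_pred, halfwidth):
    """Attach the same symmetric band to every predicted pressure value."""
    return [(y - halfwidth, y + halfwidth) for y in y_pred]

# Hypothetical residuals (true minus predicted) on 20 held-out HF samples.
residuals = [0.01 * ((-1) ** i) * (i + 1) for i in range(20)]
h = conformal_halfwidth(residuals, alpha=0.05)
print(predict_band([1.00, 0.85], h))
```

With only 20 calibration points the 95% band falls back to the largest observed residual, which shows why scarce high-fidelity data makes the choice of calibration split matter.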

2512.14967 2026-06-15 cs.LG q-fin.CP q-fin.MF 版本更新

Deep Learning and Elicitability for McKean-Vlasov FBSDEs With Common Noise

带共同噪声的McKean-Vlasov正倒向随机微分方程的深度学习与可引性

Felipe J. P. Antunes, Yuri F. Saporito, Sebastian Jaimungal

发表机构 * School of Applied Mathematics, Getulio Vargas Foundation(应用数学学院,热图利奥·瓦加斯基金会) Department of Statistical Sciences, University of Toronto(统计科学系,多伦多大学) Oxford-Man Institute for Quantitative Finance, University of Oxford(牛津-曼定量金融研究所,牛津大学)

AI总结 提出结合Picard迭代、可引性和深度学习的方法,求解带共同噪声的McKean-Vlasov正倒向随机微分方程,通过可引性导出路径损失函数避免嵌套蒙特卡洛,在系统风险模型和经济增长模型中验证了准确性。

Comments 19 pages, 8 figures

详情
AI中文摘要

我们提出了一种新颖的数值方法,用于求解带共同噪声的McKean-Vlasov正倒向随机微分方程(MV-FBSDEs),该方法结合了Picard迭代、可引性和深度学习。关键创新在于利用可引性导出路径损失函数,从而能够高效训练神经网络来近似倒向过程和由共同噪声引起的条件期望,无需计算昂贵的嵌套蒙特卡洛模拟。平均场相互作用项通过循环神经网络参数化,该网络被训练以最小化可引分数,而倒向过程则通过表示解耦场的混合前馈和循环网络来近似。我们在一个存在解析解的系统性风险银行间借贷模型上验证了该算法,结果表明能够准确恢复真实解。我们进一步将模型扩展到分位数中介的相互作用,展示了可引性框架在条件均值或矩之外的灵活性。最后,我们将该方法应用于一个具有内生利率的非平稳Aiyagari-Bewley-Huggett经济增长模型,展示了其在没有闭式解的复杂平均场博弈中的适用性。

英文摘要

We present a novel numerical method for solving McKean--Vlasov forward--backward stochastic differential equations (MV--FBSDEs) with common noise, combining Picard iterations, elicitability and deep learning. The key innovation involves elicitability to derive a pathwise loss function, enabling efficient training of neural networks to approximate both the backward process and the conditional expectations arising from common noise, without requiring computationally expensive nested Monte Carlo simulations. The mean-field interaction term is parameterized via a recurrent neural network trained to minimize an elicitable score, while the backward process is approximated through a hybrid feedforward and recurrent network representing the decoupling field. We validate the algorithm on a systemic-risk inter-bank borrowing and lending model, where analytical solutions exist, demonstrating accurate recovery of the true solution. We further extend the model to quantile-mediated interactions, showcasing the flexibility of the elicitability framework beyond conditional means or moments. Finally, we apply the method to a non-stationary Aiyagari--Bewley--Huggett economic growth model with endogenous interest rates, illustrating its applicability to complex mean-field games without closed-form solutions.
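
Elicitability means that a statistic is the minimiser of an expected score, so it can be learned by minimising a per-sample (pathwise) loss. The sketch below illustrates this for the mean (squared loss) and a quantile (pinball loss) with a brute-force grid search. It is a toy under assumed data, not the paper's neural-network training loop.

```python
def squared_loss(m, y):
    """Score elicited by the mean."""
    return (y - m) ** 2

def pinball_loss(q, y, tau):
    """Score elicited by the tau-quantile."""
    diff = y - q
    return tau * diff if diff >= 0 else (tau - 1) * diff

def argmin_score(score, samples, candidates):
    """Candidate with the smallest total score over the samples."""
    return min(candidates, key=lambda c: sum(score(c, y) for y in samples))

# Stand-in for simulated terminal values; a real run would use sampled paths.
samples = list(range(1, 101))
grid = [x / 2 for x in range(0, 221)]  # candidates 0.0, 0.5, ..., 110.0

mean_hat = argmin_score(squared_loss, samples, grid)
q90_hat = argmin_score(lambda q, y: pinball_loss(q, y, 0.9), samples, grid)
print(mean_hat, q90_hat)
```

Swapping the score function is all it takes to move from conditional means to quantiles, which is the flexibility the abstract highlights for quantile-mediated interactions.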

2601.18707 2026-06-15 cs.LG cs.AI cs.CV cs.NE 版本更新

SMART: Scalable Mesh-free Aerodynamic Simulations from Raw Geometries using a Transformer-based Surrogate Model

SMART: 基于Transformer代理模型的原始几何形状可扩展无网格气动模拟

Jan Hagnberger, Mathias Niepert

发表机构 * Jan Hagnberger Mathias Niepert

AI总结 提出SMART,一种无需模拟网格、仅使用几何点云预测任意查询位置物理量的神经代理模型,通过交叉层交互联合更新几何特征和物理场,性能媲美甚至超越依赖网格的方法。

Comments Accepted for publication at the 43rd International Conference on Machine Learning (ICML) 2026, Seoul, South Korea

详情
AI中文摘要

基于机器学习的代理模型已成为复杂几何体(如车身)物理模拟中数值求解器的高效替代方案。许多现有模型将模拟网格作为额外输入,从而减少预测误差。然而,为新几何体生成模拟网格计算成本高昂。相比之下,不依赖模拟网格的无网格方法通常误差更高。基于这些考虑,我们引入了SMART,一种神经代理模型,它仅使用几何体的点云表示,无需访问模拟网格,即可预测任意查询位置的物理量。几何体和模拟参数被编码到一个共享的潜在空间中,该空间捕捉物理场的结构和参数特征。然后,一个物理解码器关注编码器的中间潜在表示,将空间查询映射到物理量。通过这种跨层交互,模型联合更新潜在几何特征和演变的物理场。大量实验表明,SMART与依赖模拟网格作为输入的现有方法相比具有竞争力,并且通常表现更优,展示了其在工业级模拟中的能力。

英文摘要

Machine learning-based surrogate models have emerged as more efficient alternatives to numerical solvers for physical simulations over complex geometries, such as car bodies. Many existing models incorporate the simulation mesh as an additional input, thereby reducing prediction errors. However, generating a simulation mesh for new geometries is computationally costly. In contrast, mesh-free methods, which do not rely on the simulation mesh, typically incur higher errors. Motivated by these considerations, we introduce SMART, a neural surrogate model that predicts physical quantities at arbitrary query locations using only a point-cloud representation of the geometry, without requiring access to the simulation mesh. The geometry and simulation parameters are encoded into a shared latent space that captures both structural and parametric characteristics of the physical field. A physics decoder then attends to the encoder's intermediate latent representations to map spatial queries to physical quantities. Through this cross-layer interaction, the model jointly updates latent geometric features and the evolving physical field. Extensive experiments show that SMART is competitive with and often outperforms existing methods that rely on the simulation mesh as input, demonstrating its capabilities for industry-level simulations.

2602.12379 2026-06-15 cs.LG 版本更新

Deep Doubly Debiased Longitudinal Effect Estimation with ICE G-Computation

基于ICE G-计算的深度双重去偏纵向效应估计

Wenxin Chen, Weishen Pan, Kyra Gan, Fei Wang

发表机构 * Cornell University(康奈尔大学) Weill Cornell Medicine(威尔康奈尔医学院)

AI总结 提出D3-Net框架,通过顺序双重稳健伪结果和纵向目标最小损失估计,解决ICE G-计算中的误差传播问题,实现纵向治疗效应的稳健估计。

详情
AI中文摘要

估计纵向治疗效应对于顺序决策至关重要,但由于治疗-混杂反馈而具有挑战性。虽然迭代条件期望(ICE)G-计算提供了一种原则性方法,但其递归结构存在误差传播,会破坏学习到的结果回归模型。我们提出D3-Net,一个在ICE训练中减轻误差传播、随后应用稳健最终校正的框架。首先,为了中断学习过程中的误差传播,我们使用顺序双重稳健(SDR)伪结果训练ICE序列,为每个回归提供偏差校正的目标。其次,我们采用多任务Transformer,它配有用于辅助监督的协变量模拟器头以正则化表示学习,并使用目标网络来稳定训练动态。对于最终估计,我们丢弃SDR校正,转而使用未校正的冗余(nuisance)模型对原始结果进行纵向目标最小损失估计(LTMLE)。这一第二阶段的针对性去偏确保了稳健性和最优的有限样本性质。综合实验表明,与现有最先进的基于ICE的估计器相比,我们的模型D3-Net在不同时间范围、反事实和时变混杂下稳健地降低了偏差和方差。

英文摘要

Estimating longitudinal treatment effects is essential for sequential decision-making but is challenging due to treatment-confounder feedback. While Iterative Conditional Expectation (ICE) G-computation offers a principled approach, its recursive structure suffers from error propagation, corrupting the learned outcome regression models. We propose D3-Net, a framework that mitigates error propagation in ICE training and then applies a robust final correction. First, to interrupt error propagation during learning, we train the ICE sequence using Sequential Doubly Robust (SDR) pseudo-outcomes, which provide bias-corrected targets for each regression. Second, we employ a multi-task transformer with a covariate simulator head for auxiliary supervision, regularizing representation learning, and a target network to stabilize training dynamics. For the final estimate, we discard the SDR correction and instead use the uncorrected nuisance models to perform Longitudinal Targeted Minimum Loss-Based Estimation (LTMLE) on the original outcomes. This second-stage, targeted debiasing ensures robustness and optimal finite-sample properties. Comprehensive experiments demonstrate that our model, D3-Net, robustly reduces bias and variance across different horizons, counterfactuals, and time-varying confoundings, compared to existing state-of-the-art ICE-based estimators.

2603.05556 2026-06-15 cs.LG 版本更新

IntSeqBERT: Learning Arithmetic Structure in OEIS via Modulo-Spectrum Embeddings

IntSeqBERT: 通过模谱嵌入学习OEIS中的算术结构

Kazuhisa Nakasho

发表机构 * Iwate Prefectural University(岩手县立大学)

AI总结 提出IntSeqBERT,一种双流Transformer编码器,通过连续对数幅度嵌入和100个模数的正弦/余弦模嵌入融合,在OEIS序列上联合训练三个预测头,显著提升了序列预测精度。

详情
AI中文摘要

OEIS中的整数序列涵盖从个位数常数到天文数字般的阶乘和指数,这使得标准分词模型难以进行预测,因为它们无法处理词汇表外的值,也无法利用周期性算术结构。我们提出IntSeqBERT,一种用于OEIS掩码整数序列建模的双流Transformer编码器。每个序列元素沿两个互补轴编码:连续对数尺度幅度嵌入和100个剩余(模数$2$--$101$)的正弦/余弦模嵌入,通过FiLM融合。三个预测头(幅度回归、符号分类和100个模数的模预测)在274,705个OEIS序列上联合训练。在Large规模(9150万参数)下,IntSeqBERT在测试集上达到95.85%的幅度准确率和50.38%的平均模准确率(MMA),分别比标准分词Transformer基线高出$+8.9$和$+4.5$个百分点。去除模流的消融实验证实,模流贡献了$+15.2$个百分点的MMA增益,并额外贡献了$+6.2$个百分点的幅度准确率。基于概率中国剩余定理(CRT)的求解器将模型预测转化为具体整数,使下一项预测的准确率达到分词Transformer基线的7.4倍(Top-1: 19.09% vs. 2.59%)。模谱分析显示,归一化信息增益(NIG)与欧拉函数比值$\varphi(m)/m$之间存在强负相关($r = -0.851$, $p < 10^{-28}$),为合数模数通过CRT聚合更有效地捕获OEIS算术结构提供了经验证据。

英文摘要

Integer sequences in the OEIS span values from single-digit constants to astronomical factorials and exponentials, making prediction challenging for standard tokenised models that cannot handle out-of-vocabulary values or exploit periodic arithmetic structure. We present IntSeqBERT, a dual-stream Transformer encoder for masked integer-sequence modelling on OEIS. Each sequence element is encoded along two complementary axes: a continuous log-scale magnitude embedding and sin/cos modulo embeddings for 100 residues (moduli $2$--$101$), fused via FiLM. Three prediction heads (magnitude regression, sign classification, and modulo prediction for 100 moduli) are trained jointly on 274,705 OEIS sequences. At the Large scale (91.5M parameters), IntSeqBERT achieves 95.85% magnitude accuracy and 50.38% Mean Modulo Accuracy (MMA) on the test set, outperforming a standard tokenised Transformer baseline by $+8.9$ pt and $+4.5$ pt, respectively. An ablation removing the modulo stream confirms it accounts for $+15.2$ pt of the MMA gain and contributes an additional $+6.2$ pt to magnitude accuracy. A probabilistic Chinese Remainder Theorem (CRT)-based Solver converts the model's predictions into concrete integers, yielding a 7.4-fold improvement in next-term prediction over the tokenised-Transformer baseline (Top-1: 19.09% vs. 2.59%). Modulo spectrum analysis reveals a strong negative correlation between Normalised Information Gain (NIG) and Euler's totient ratio $φ(m)/m$ ($r = -0.851$, $p < 10^{-28}$), providing empirical evidence that composite moduli capture OEIS arithmetic structure more efficiently via CRT aggregation.
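
The two arithmetic ingredients named above can be sketched in a few lines: a sin/cos encoding of residues for moduli 2 to 101, and a Chinese Remainder Theorem reconstruction of an integer from its residues. The angle convention `2*pi*r/m` is an assumption, and the CRT routine is the deterministic textbook version, not the paper's probabilistic solver.

```python
import math

MODULI = list(range(2, 102))  # the 100 moduli 2..101 named in the abstract

def modulo_features(n):
    """Sin/cos encoding of n mod m for every modulus: 200 floats per integer."""
    feats = []
    for m in MODULI:
        angle = 2 * math.pi * (n % m) / m  # assumed angle convention
        feats.extend((math.sin(angle), math.cos(angle)))
    return feats

def crt(residues, moduli):
    """Smallest non-negative x with x = r_i (mod m_i); moduli pairwise coprime."""
    x, big_m = 0, 1
    for r, m in zip(residues, moduli):
        # Solve x + big_m * t = r (mod m) for t using the modular inverse.
        t = ((r - x) * pow(big_m, -1, m)) % m
        x += big_m * t
        big_m *= m
    return x

n = 1234
moduli = (7, 9, 11, 13)  # pairwise coprime, product 9009 > n
print(len(modulo_features(n)), crt([n % m for m in moduli], moduli))
```

The encoding is bounded regardless of how large `n` is, which is why it sidesteps the out-of-vocabulary problem, and the CRT step shows how residue predictions can be turned back into a concrete integer. `pow(big_m, -1, m)` requires Python 3.8 or later.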

2604.23841 2026-06-15 cs.LG cs.AI 版本更新

Scalable Production Scheduling: Linear Complexity via Unified Homogeneous Graphs

可扩展的生产调度:通过统一同质图实现线性复杂度

Jonathan Hoss, Moritz Link, Noah Klarmann

发表机构 * Faculty of Management and Engineering, Rosenheim Technical University of Applied Sciences(管理与工程学院,罗森海姆应用技术大学)

AI总结 提出统一同质图框架,通过特征同质化将不同节点角色映射到共享潜在空间,使用同质图同构网络以线性复杂度解决作业车间调度问题,实现零样本泛化,并发现作业与机器比率是策略有效性的主要驱动因素。

Comments This paper has been accepted for presentation at the IEEE 22nd International Conference on Automation Science and Engineering (CASE 2026)

详情
AI中文摘要

在现实工业应用中高效解决作业车间调度问题需要既计算精简又拓扑鲁棒的策略。虽然强化学习在自动化调度规则方面显示出潜力,但现有模型常因二次图复杂度或异质层的架构开销而面临可扩展性瓶颈。我们引入了一个统一图框架,采用基于特征的同质化将不同的节点角色投影到共享潜在空间。这使得标准的同质图同构网络能够以线性复杂度捕获复杂的资源竞争,确保大规模工业应用的低延迟推理。我们的实验结果表明,我们的框架实现了最先进的性能,同时表现出一致的零样本泛化。我们确定作业与机器比率是策略有效性的主要驱动因素,而非绝对问题规模。基于此,我们提出了结构饱和假设,证明在临界拥塞实例($\mathcal{J} \approx \mathcal{M}$)上训练的策略学习了尺度不变的解决策略。在此饱和点训练的智能体内化了不变的冲突解决逻辑,使它们能够将大规模矩形实例视为饱和子问题的顺序串联。这种方法消除了昂贵的特定尺度重新训练的需要,并防止了对统计捷径的过拟合,为在动态生产环境中部署强化学习解决方案提供了鲁棒且高效的途径。

英文摘要

Efficiently solving the Job Shop Scheduling Problem in real-world industrial applications requires policies that are both computationally lean and topologically robust. While Reinforcement Learning has shown potential in automating dispatching rules, existing models often struggle with a scalability bottleneck caused by quadratic graph complexity or the architectural overhead of heterogeneous layers. We introduce a unified graph framework that employs feature-based homogenization to project distinct node roles into a shared latent space. This allows a standard homogeneous Graph Isomorphism Network to capture complex resource contention with linear complexity, ensuring low-latency inference for large-scale industrial applications. Our empirical results demonstrate that our framework achieves state-of-the-art performance while exhibiting consistent zero-shot generalization. We identify the job-to-machine ratio as the primary driver of policy effectiveness, rather than absolute problem size. Based on this, we propose a hypothesis of structural saturation, demonstrating that policies trained on critically congested instances ($\mathcal{J} \approx \mathcal{M}$) learn scale-invariant resolution strategies. Agents trained at this saturation point internalize invariant conflict-resolution logic, allowing them to treat massive rectangular instances as a sequential concatenation of saturated sub-problems. This approach eliminates the need for expensive scale-specific retraining and prevents overfitting to statistical shortcuts, providing a robust and efficient pathway for deploying RL solutions in dynamic production environments.

2605.16739 2026-06-15 cs.LG cs.AI cs.CL q-bio.NC 版本更新

EmoMind: Decoding Affective Captions from Human Brain fMRI

EmoMind:从人类大脑fMRI信号解码情感描述

Bilal A. Mohammed, Lin Gu, Ruogu Fang

发表机构 * Department of Biomedical Engineering(生物医学工程系) Vanderbilt University(范德比大学) Research Institute of Electrical Communication(电气通信研究所) Tohoku University(东北大学) University of Florida(佛罗里达大学)

AI总结 本文提出EmoMind,首个端到端解码fMRI信号生成情感描述的系统,通过结合语义基础的中性场景描述和连续情感向量,实现了在内容保留与情感表达间的平衡,并在多个验证框架下优于基于标签提示的GPT-4。

详情
AI中文摘要

从大脑活动解码视觉经验已取得显著进展,但当前的脑-文本系统主要恢复语义内容而丢弃情感。此外,语言模型在接收到类别标签提示时可以生成情感文本,但此类标签将丰富的跨受试者变异性压缩成粗糙的离散类别。我们提出了EmoMind,首个直接从fMRI信号解码情感描述的端到端流程。EmoMind首先从脑解码的视觉特征中检索出语义基础的中性场景描述,然后使用从同一fMRI记录中解码的连续34维情感向量重写该描述。为了控制内容保留与情感表达之间的平衡,我们使用无分类器引导(classifier-free guidance)训练重写器,并以一个保持身份的空分支作为对照,从而在语义忠实性和情感表达性之间实现平滑插值。我们通过涵盖受试者特异性、结构几何和因果控制的三轴验证框架评估情感描述生成。我们进一步用合成大脑替代测试增强此框架,以探测对测量设备的鲁棒性,并以使用脑解码的前五名情感标签提示的GPT-4作为强离散基线,对每个轴进行基准测试。在两个独立的情感fMRI数据集中,EmoMind在所有三个轴上均显著优于标签提示的GPT-4,其中最大的收益出现在需要个人特定情感结构而非群体层面情绪聚合的指标上。这些结果确立了连续脑解码情感作为个性化情感描述生成的可行控制信号,并为研究个体情感大脑组织开辟了新方向。

英文摘要

Decoding visual experience from brain activity has advanced substantially, but current brain-to-text systems largely recover semantic content while discarding affect. Additionally, language models can generate emotional text when prompted with categorical labels, but such labels collapse rich inter-subject variability into coarse discrete bins. We present EmoMind, the first end-to-end pipeline for decoding affective captions directly from fMRI signals. EmoMind first retrieves a semantically grounded neutral scene description from brain-decoded visual features, then rewrites it using a continuous 34-dimensional emotion vector decoded from the same fMRI recording. To control the balance between content preservation and affective expression, we train the rewriter with classifier-free guidance against an identity-preserving null branch, enabling smooth interpolation between semantic fidelity and affective expressivity. We evaluate affective caption generation with a three-axis validation framework spanning subject-specificity, structural geometry, and causal control. We further augment this framework with a synthetic-brain substitution test that probes robustness to the measurement apparatus, and we benchmark each axis against GPT-4 prompted with brain-decoded top-5 emotion labels as a strong discrete baseline. Across two independent emotion fMRI datasets, EmoMind significantly outperforms label-prompted GPT-4 on all three axes, with the largest gains on metrics that require person-specific affective structure rather than population-level emotion aggregation. These results establish continuous brain-decoded affect as a viable control signal for individualized affective caption generation and open new directions for studying individual affective brain organisation.
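
The "smooth interpolation" that classifier-free guidance provides can be written as one line of arithmetic: start from the null (identity-preserving) branch and move toward the emotion-conditioned branch by a guidance weight. The sketch assumes the guidance is applied to some per-token score vector; where EmoMind applies it is not stated in the abstract.

```python
def guided(null_out, cond_out, w):
    """Classifier-free guidance: null + w * (conditioned - null), elementwise."""
    return [n + w * (c - n) for n, c in zip(null_out, cond_out)]

# Hypothetical scores for two candidate words from the two branches.
null_scores = [1.0, 2.0]   # rewriter with the emotion vector dropped
cond_scores = [3.0, 6.0]   # rewriter conditioned on the decoded emotion vector

for w in (0.0, 0.5, 1.0, 2.0):
    print(w, guided(null_scores, cond_scores, w))
```

A weight of 0 reproduces the neutral caption's scores, 1 reproduces the conditioned scores, and values above 1 extrapolate toward stronger affective expression at the cost of semantic fidelity.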

2605.26759 2026-06-15 cs.LG 版本更新

Time Series Causal Discovery via Context-Conditioned and Causality-Augmented Pretraining

基于上下文条件与因果增强预训练的时间序列因果发现

Biao Ouyang, Tengxue Zhang, Zhihao Zhuang, Yang Shu, Chenjuan Guo, Bin Yang

发表机构 * East China Normal University(华东师范大学)

AI总结 提出PTCD框架,通过上下文条件建模和可迁移因果增强的预训练范式,提升跨任务时间序列因果发现的泛化能力,在多个真实OOD数据集上因果发现和根因识别表现优异。

Comments 20 pages

详情
AI中文摘要

时间序列的因果发现对于许多现实世界应用至关重要,例如追踪异常的根本原因。现有方法通常依赖于特定数据集的优化,这使得其因果发现能力难以迁移到由不同因果机制控制的新时间序列上。在本文中,我们提出PTCD,一种新颖的时间序列因果发现预训练框架,通过上下文条件建模和可迁移的因果增强来改进跨任务泛化。为了建模复杂的时间因果依赖关系,PTCD采用双尺度迭代注意力机制来捕获窗口级别的因果关系,并利用带有上下文级别路由机制的高斯混合模型来处理异质的外生分布。为了进一步解决因果图之间的分布偏移,PTCD在合成数据集上采用预训练范式,该范式整合了基于干预的学习和因果混合策略,促进了稳定的因果发现和更强的泛化能力。在多个真实世界分布外(OOD)数据集上的大量实验表明,PTCD在因果发现和根因识别方面均表现出色。

英文摘要

Causal discovery from time series is critical for many real-world applications, such as tracing the root causes of anomalies. Existing approaches typically rely on dataset-specific optimization, making it difficult to transfer their causal discovery capabilities to new time series governed by diverse causal mechanisms. In this paper, we propose PTCD, a novel Pretraining framework for Time-series Causal Discovery, which improves cross-task generalization through context-conditioned modeling and transferable causal augmentation. To model complex temporal causal dependencies, PTCD employs a dual-scale iterative attention mechanism to capture window-level causal relationships, and a Gaussian mixture with a context-level routing mechanism to handle heterogeneous exogenous distributions. To further address distribution shifts across causal graphs, PTCD adopts a pretraining paradigm on synthetic datasets that integrates intervention-based learning and a causal mixup strategy, promoting stable causal discovery and stronger generalization. Extensive experiments on multiple real-world out-of-distribution (OOD) datasets demonstrate that PTCD excels in both causal discovery and root cause identification.

2605.29228 2026-06-15 cs.LG q-bio.MN 版本更新

Traditional machine learning vs. deep learning from dynamic graph representations of proteins' 3D folds in the task of protein structure classification

传统机器学习 vs. 深度学习在蛋白质三维折叠动态图表示中的蛋白质结构分类任务

Aydin Wells, Francis A. Gatsi, Aaron Striegel, Tijana Milenković

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 本研究比较了传统机器学习与深度学习在基于动态蛋白质结构网络进行蛋白质结构分类时的准确性和效率,发现两者准确性相近但深度学习慢10倍以上。

Comments Main paper: 16 pages, 4 figures, and 1 table; Supplementary information: 13 pages, 9 figures

详情
AI中文摘要

蛋白质结构分类(PSC)使用监督学习从蛋白质序列或三维结构特征预测其CATH/SCOP(e)类别。我们之前将三维结构建模为(静态)蛋白质结构网络(PSN),证明了基于PSN的特征在PSC任务中与序列或直接(即非网络)三维结构特征相比具有竞争力。最近,我们展示了从动态PSN中提取的特征在相同任务中优于从静态PSN中提取的特征(从而通过传递性优于序列和直接三维结构特征)。该动态PSN方法使用传统机器学习(ML),结合手动(预设计)特征与现成分类器。在此,我们评估从动态PSN进行自动深度学习(DL)是否能带来改进。我们对涵盖约44,000个CATH或SCOPe标记的动态PSN的72个数据集进行的评估显示,就PSC准确性而言,传统ML和DL在绝大多数数据集上(接近)持平,而DL平均慢10倍以上。我们是首个在基于动态PSN的PSC任务中评估传统ML与DL的研究。

英文摘要

Protein structure classification (PSC) uses supervised learning to predict a protein's CATH/SCOP(e) class from the protein's sequence or 3D structural feature(s). We already modeled 3D structures as (static) protein structure networks (PSNs), demonstrating the competitiveness of PSN-based features to sequence or direct (i.e. non-network) 3D structural features in the PSC task. More recently, we demonstrated the power of features extracted from dynamic PSNs over features extracted from static PSNs (and thus by transitivity over sequence and direct 3D structural features) in the same task. That dynamic PSN approach used traditional machine learning (ML), combining manual (pre-engineered) features with an off-the-shelf classifier. Here, we evaluate whether automatic deep learning (DL) from the dynamic PSNs yields improvements. Our evaluation on 72 datasets spanning ~44,000 CATH- or SCOPe-labeled dynamic PSNs reveals that in terms of PSC accuracy, traditional ML and DL are (close to) tied for a large majority of the datasets, while DL is on average 10+ times slower. We are the first to evaluate traditional ML vs. DL in the dynamic PSN-based PSC task.

2606.12476 2026-06-15 cs.LG cs.AI cs.CL 版本更新

Quickest Detection of Hallucination Onset: Delay Bounds and Learned CUSUM Statistics

幻觉起始的快速检测:延迟界与学习型CUSUM统计量

Igor Itkin

发表机构 * Independent Researcher(独立研究员)

AI总结 将幻觉起始检测建模为快速变化检测问题,基于在RAGTruth上验证的一阶马尔可夫模型给出Lorden延迟下界;学习型CUSUM在其捕获到的起始点上实现11-13个token的检测延迟(线性基线为31个token),但计入召回后的延迟为56-66个token,并揭示了分类指标掩盖的延迟结构。

Comments 16 pages, 1 figure. v2: added Discussion and Appendix; recall-honest framing; robustness analyses (k-NN divergence estimate, seed-averaged decomposition)

详情
AI中文摘要

Token级幻觉检测器通常被当作分类器来评估,即以所有token上的AUC衡量,但流式监控器是由其反应时间来评判的:从幻觉开始到发出警报之间经过的token数量。我们将幻觉起始检测表述为一个快速变化检测问题。在RAGTruth上验证的潜在忠实/幻觉状态的一阶马尔可夫模型,将该任务置于经典变点理论之中,并给出Lorden关于检测延迟的下界:在虚警率为0.01时约为1.3个token。然后我们证明,因果循环标注器相当于一个具有学习增量的CUSUM。在它捕获到的起始点中,检测延迟为11-13个token,而线性逐token基线为31个token;不过在这一虚警预算下,每个检测器捕获的起始点都不足三分之一,计入召回的延迟为56-66个token:低虚警率下的起始检测很困难。受控分解将速度优势主要归因于更好的逐token得分,而非时间累积。一个Donsker-Varadhan型的信息率最优性定理解释了剩余的数量级差距:学习得分仅实现了特征所携带散度的1/4.5,这一缺陷无法通过重新校准消除,其余部分为有限时域效应。分类指标掩盖了这种延迟结构;序列分析使其可测量。

英文摘要

Token-level hallucination detectors are evaluated as classifiers, by AUC over all tokens, yet a streaming monitor is judged by its reaction time: the number of tokens that pass between the onset of a hallucination and the alarm. We formulate hallucination onset detection as a quickest change detection problem. A first-order Markov model of the latent faithful/hallucinated state, validated on RAGTruth, places the task inside classical change-point theory and yields Lorden's lower bound on detection delay: about 1.3 tokens at a false-alarm rate of 0.01. We then show that a causal recurrent labeler acts as a CUSUM with a learned increment. Among the onsets it catches it detects in 11-13 tokens, against 31 for a linear per-token baseline, though at this false-alarm budget every detector catches under a third of onsets and the recall-honest delay is 56-66 tokens: low-false-alarm onset detection is hard. A controlled decomposition attributes the speed advantage mostly to a better per-token score rather than to temporal accumulation. An information-rate optimality theorem of Donsker-Varadhan type explains the remaining order-of-magnitude gap: the learned score realizes only 1/4.5 of the divergence the features carry, a deficit that recalibration cannot remove, with the remainder a finite-horizon effect. Classification metrics conceal this delay structure; sequential analysis makes it measurable.
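
The CUSUM statistic and the notion of detection delay used above can be sketched as follows. The per-token increments are hypothetical stand-ins for a learned score (positive when a token looks hallucinated), and the threshold is an arbitrary illustrative value, not one taken from the paper.

```python
def cusum_alarm(increments, threshold):
    """Index of the first token where the CUSUM statistic reaches threshold."""
    s = 0.0
    for t, x in enumerate(increments):
        s = max(0.0, s + x)  # clamp at zero so pre-change evidence cannot go negative
        if s >= threshold:
            return t
    return None

def detection_delay(increments, onset, threshold):
    """Tokens elapsed between the onset token and the alarm, or None if missed."""
    alarm = cusum_alarm(increments, threshold)
    if alarm is None or alarm < onset:
        return None  # missed onset, or a false alarm before the change
    return alarm - onset

# Ten faithful tokens (negative drift), then ten hallucinated tokens.
increments = [-0.5] * 10 + [1.0] * 10
print(detection_delay(increments, onset=10, threshold=3.0))
```

Raising the threshold lowers the false-alarm rate but lengthens the delay and causes more onsets to be missed. This trade-off is why delay among caught onsets and recall-aware delay can differ as sharply as the abstract reports.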

2402.17750 2026-06-15 physics.optics cs.ET cs.LG 版本更新

Arbitrary control over multimode wave propagation for machine learning

用于机器学习的多模波传播的任意控制

Tatsuhiro Onodera, Martin M. Stein, Benjamin A. Ash, Mandar M. Sohoni, Melissa Bosch, Ryotatsu Yanagimoto, Marc Jankowski, Timothy P. McKenna, Tianyu Wang, Gennady Shvets, Maxim R. Shcherbakov, Logan G. Wright, Peter L. McMahon

发表机构 * School of Applied and Engineering Physics, Cornell University(应用与工程物理系,康奈尔大学) NTT Physics and Informatics Laboratories, NTT Research, Inc.(NTT物理与信息实验室,NTT研究公司) E. L. Ginzton Laboratory, Stanford University(E. L. Ginzton实验室,斯坦福大学) Kavli Institute at Cornell for Nanoscale Science, Cornell University(康奈尔大学纳米科学研究所) Department of Electrical and Computer Engineering, Boston University(波士顿大学电气与计算机工程系) Department of Electrical Engineering and Computer Science, University of California(加州大学电气工程与计算机科学系) Department of Applied Physics, Yale University(耶鲁大学应用物理系)

AI总结 提出一种可快速重编程折射率的二维可编程波导,通过并行电光调制实现多模波传播的任意控制,并用于单次神经网络推理,理论表明面积增长为N^1.5而非N^2。

详情
Journal ref Nat. Phys. 22, 164-171 (2026)
AI中文摘要

受控的多模波传播可以实现比基于单模波导连接分立组件的架构更节省空间的光子处理器。我们可以不定义离散元件,而是通过二维多模干涉来塑造光子处理器的连续基底以执行计算。这里我们设计并展示了一种折射率可在空间上快速重编程的器件,允许对波传播进行任意控制。该器件是一种二维可编程波导,利用对平板波导折射率的并行电光调制,具有约10^4个可编程空间自由度。我们在基准任务上实现了单次通过、无需数字预处理或后处理的神经网络推理,向量维度高达49。理论和数值分析进一步表明,二维可编程波导不仅可能提供器件面积的常数因子缩减,还可能带来缩放优势,所需面积按N^{1.5}而非N^2增长。

英文摘要

Controlled multimode wave propagation can enable more space-efficient photonic processors than architectures based on discrete components connected by single-mode waveguides. Instead of defining discrete elements, one can sculpt the continuous substrate of a photonic processor to perform computations through multimode interference in two dimensions. Here we designed and demonstrated a device with a refractive index that can be rapidly reprogrammed across space, allowing arbitrary control of wave propagation. The device, a two-dimensional programmable waveguide, uses parallel electro-optic modulation of the refractive index of a slab waveguide with about $10^4$ programmable spatial degrees of freedom. We implemented neural network inference on benchmark tasks with up to $49$-dimensional vectors in a single pass, without digital pre-processing or post-processing. Theoretical and numerical analyses further indicated that two-dimensional programmable waveguides may offer not only a constant-factor reduction in device area but also a scaling benefit, with the area required growing as $N^{1.5}$ rather than $N^2$.

2410.15051 2026-06-15 cs.CL cs.LG 版本更新

Automatic identification of diagnosis from hospital discharge letters via weakly supervised Natural Language Processing

通过弱监督自然语言处理自动识别出院信中的诊断

Vittorio Torri, Elisa Barbieri, Anna Cantarutti, Carlo Giaquinto, Francesca Ieva

发表机构 * University of Bologna(博洛尼亚大学)

AI总结 提出一种弱监督NLP流程,无需文档级标注即可从意大利语出院信中分类诊断,在细支气管炎数据集上达到接近全监督的性能,节省大量人工标注时间。

Comments 61 pages, 9 figures

详情
AI中文摘要

从医院出院信中识别患者诊断对于大规模队列选择和流行病学研究至关重要,但传统的监督方法需要大量手动标注,这对于大型文本数据集通常不切实际。我们提出了一种弱监督自然语言处理(NLP)流程,用于对意大利语出院信进行分类,无需文档级手动标注。该方法提取与诊断相关的句子,使用在意大利医学文档上进一步预训练的Transformer模型生成语义嵌入,并应用两级聚类程序推导出弱标签,然后用于训练文档级分类器。该方法在2017年至2020年间意大利威尼托地区44个急诊室或医院收治的33,176份儿童出院信的细支气管炎案例研究中进行了评估。最佳弱监督模型在手动标注数据上实现了77.68%(±4.30%)的AUROC、73.13%(±4.93%)的AUPRC和78.14%(±4.89%)的F1分数。性能超过了无监督基线,接近全监督模型,同时对于该规模的数据集减少了超过1,500小时的手动标注需求。在较小的支气管炎数据集(3,188份出院信,2020-2025年)的二次验证中观察到类似的模型排名,最佳弱监督模型实现了76.72%(±5.02%)的AUPRC。这些结果表明弱监督NLP方法在从临床出院信中可扩展地识别疾病方面具有潜力。

英文摘要

Identifying patient diagnoses from hospital discharge letters is essential for large-scale cohort selection and epidemiological research, but traditional supervised approaches require extensive manual annotation, which is often impractical for large textual datasets. We present a weakly supervised Natural Language Processing (NLP) pipeline for classifying Italian discharge letters without document-level manual annotation. The method extracts diagnosis-related sentences, generates semantic embeddings using a transformer model further pre-trained on Italian medical documents, and applies a two-level clustering procedure to derive weak labels that are then used to train a document-level classifier. The approach was evaluated in a case study on bronchiolitis using 33,176 discharge letters of children admitted to 44 emergency rooms or hospitals in the Veneto Region, Italy, between 2017 and 2020. The best weakly supervised model achieved an AUROC of 77.68% ($\pm4.30\%$), an AUPRC of 73.13% ($\pm4.93\%$), and an F1-score of 78.14% ($\pm4.89\%$) against manually annotated data. Performance surpassed unsupervised baselines and approached fully supervised models, while reducing the need for manual annotation by more than 1,500 hours for a dataset of this size. Similar model rankings were observed in a secondary validation on a smaller bronchitis dataset (3,188 discharge letters, 2020-2025), where the best weakly supervised model achieved an AUPRC of 76.72% ($\pm 5.02\%$). These results suggest the potential of weakly supervised NLP methods for scalable disease identification from clinical discharge letters.

2501.08561 2026-06-15 cs.AI cs.HC cs.LG cs.SC 版本更新

ANSR-DT: A Neuro-Symbolic Framework for Adaptive and Explainable Digital Twins

ANSR-DT:一种自适应可解释数字孪生的神经符号框架

Safayat Bin Hakim, Muhammad Adil, Alvaro Velasquez, Houbing Herbert Song

发表机构 * Department of Information Systems, University of Maryland Baltimore County(信息系统系,马里兰大学巴尔的摩县分校) Department of Computer Science and Engineering, University at Buffalo(计算机科学与工程系,布法罗大学) Department of Computer Science, University of Colorado Boulder(计算机科学系,科罗拉多大学博尔德分校)

AI总结 提出ANSR-DT框架,结合CNN-LSTM、Prolog推理和PPO强化学习,实现数字孪生的异常检测、符号推理与自适应决策,在多个基准上表现优异。

Comments Code available at https://github.com/sbhakim/ansr-dt

详情
AI中文摘要

数字孪生越来越多地用于监控和优化工业系统,然而许多现有框架仍然难以解释、适应缓慢,并且整合显式领域知识的能力有限。本文提出了ANSR-DT,一种自适应神经符号框架,它在单一数字孪生流水线中统一了时序异常检测、符号推理和基于强化学习的决策支持。ANSR-DT将用于多变量模式识别的CNN-LSTM模型与基于Prolog的推理相结合,后者将学习到的信号转换为显式规则,从而实现透明的诊断和可追溯的决策路径。基于PPO的适应层进一步在变化条件下优化操作响应,同时保持可解释性。与8个基线方法的对比实验表明,ANSR-DT在具有竞争力的预测性能之外,还能实现稳定的规则提取、可扩展的符号推理和可操作的解释。在Skoltech异常基准(SKAB)上的额外验证进一步表明,该框架能够迁移到合成场景之外。这些发现使ANSR-DT成为可信、自适应和可解释的工业数字孪生的实用基础。

英文摘要

Digital twins are increasingly used to monitor and optimize industrial systems, yet many existing frameworks remain difficult to interpret, slow to adapt, and limited in their ability to incorporate explicit domain knowledge. This paper presents ANSR-DT, an adaptive neuro-symbolic framework that unifies temporal anomaly detection, symbolic reasoning, and reinforcement-learning-based decision support within a single digital twin pipeline. ANSR-DT combines a CNN-LSTM model for multivariate pattern recognition with Prolog-based reasoning that converts learned signals into explicit rules, enabling transparent diagnoses and traceable decision paths. A PPO-based adaptation layer further refines operational responses under changing conditions while preserving interpretability. Experiments against 8 baselines show that ANSR-DT delivers competitive predictive performance together with stable rule extraction, scalable symbolic reasoning, and actionable explanations. Additional validation on the Skoltech Anomaly Benchmark (SKAB) further indicates that the framework transfers beyond synthetic settings. These findings position ANSR-DT as a practical foundation for trustworthy, adaptive, and explainable industrial digital twins.

2504.03686 2026-06-15 cs.NI cs.AI cs.LG 版本更新

Revisiting Outage for Edge Inference Systems

重新审视边缘推理系统的中断问题

Zhanwei Wang, Qunsong Zeng, Haotian Zheng, Kaibin Huang

发表机构 * Department of Electrical and Computer Engineering, The University of Hong Kong(香港大学电子与计算机工程系)

AI总结 针对边缘推理系统的端到端可靠性,提出推理中断概率框架,量化推理精度低于阈值的概率,并优化通信开销与推理可靠性的权衡。

详情
AI中文摘要

第六代(6G)移动网络的关键任务之一是在网络边缘部署大规模人工智能(AI)模型,为边缘设备提供远程推理服务。由此产生的平台称为边缘推理,将支持广泛的物联网应用,如自动驾驶、工业自动化和增强现实。鉴于这些任务的关键性和时间敏感性,设计既可靠又能满足严格端到端(E2E)延迟约束的边缘推理系统至关重要。现有研究主要关注以信道中断概率为特征的通信可靠性,可能无法保证E2E性能,特别是在E2E推理精度和延迟方面。为解决这一局限,我们提出一个理论框架,引入并数学刻画了推理中断(InfOut)概率,该概率量化了E2E推理精度低于目标阈值的可能性。在E2E延迟约束下,该框架建立了通信开销(即上传更多传感器观测)与以InfOut概率量化的推理可靠性之间的基本权衡。为了找到优化这种权衡的可行方法,我们通过对接收判别增益的分布应用高斯近似,推导出InfOut概率的精确替代函数。实验结果表明,所提出的设计在E2E推理可靠性方面优于传统的以通信为中心的方法。

英文摘要

One of the key missions of sixth-generation (6G) mobile networks is to deploy large-scale artificial intelligence (AI) models at the network edge to provide remote-inference services for edge devices. The resultant platform, known as edge inference, will support a wide range of Internet-of-Things applications, such as autonomous driving, industrial automation, and augmented reality. Given the mission-critical and time-sensitive nature of these tasks, it is essential to design edge inference systems that are both reliable and capable of meeting stringent end-to-end (E2E) latency constraints. Existing studies, which primarily focus on communication reliability as characterized by channel outage probability, may fail to guarantee E2E performance, specifically in terms of E2E inference accuracy and latency. To address this limitation, we propose a theoretical framework that introduces and mathematically characterizes the inference outage (InfOut) probability, which quantifies the likelihood that the E2E inference accuracy falls below a target threshold. Under an E2E latency constraint, this framework establishes a fundamental tradeoff between communication overhead (i.e., uploading more sensor observations) and inference reliability as quantified by the InfOut probability. To find a tractable way to optimize this tradeoff, we derive accurate surrogate functions for InfOut probability by applying a Gaussian approximation to the distribution of the received discriminant gain. Experimental results demonstrate the superiority of the proposed design over conventional communication-centric approaches in terms of E2E inference reliability.
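
To make the inference-outage idea concrete, the sketch below computes an outage probability under a Gaussian approximation of the received discriminant gain. It rests on assumptions that go beyond the abstract: an outage is taken to be the event that the total gain falls below a fixed threshold, and the gains of individual sensor observations are treated as independent with a common mean and standard deviation. The paper's actual surrogate functions are not reproduced here.

```python
import math

def normal_cdf(x, mean, std):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def outage_probability(gain_mean, gain_std, gain_threshold):
    """P(discriminant gain < threshold) under the Gaussian approximation."""
    return normal_cdf(gain_threshold, gain_mean, gain_std)

def outage_vs_views(n_views, mu, sigma, gain_threshold):
    """Outage when n_views observations are uploaded (assumed i.i.d. gains)."""
    return outage_probability(n_views * mu, math.sqrt(n_views) * sigma, gain_threshold)

for n in (2, 4, 8):
    print(n, round(outage_vs_views(n, mu=1.0, sigma=1.0, gain_threshold=3.0), 4))
```

Under these assumptions, uploading more observations lowers the outage probability but costs more airtime, which is the communication-reliability trade-off the framework optimises under a latency budget.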

2508.18166 2026-06-15 cs.IR cs.LG 版本更新

PCR-CA: Parallel Codebook Representations with Contrastive Alignment for Multiple-Category App Recommendation

PCR-CA: 基于对比对齐的并行码本表示用于多类别应用推荐

Bin Tan, Wangyao Ge, Yidi Wang, Xin Liu, Jeff Burtoft, Hao Fan, Hui Wang

发表机构 * Microsoft Suzhou China(微软苏州中国) Microsoft Beijing China(微软北京中国) Microsoft Redmond WA USA(微软雷德蒙德华盛顿州美国)

AI总结 提出PCR-CA框架,通过并行码本VQ-AE模块学习多类别应用的离散语义表示,结合对比对齐损失和双注意力融合,提升CTR预测,尤其对长尾应用效果显著。

Comments Accepted by KDD 2026, oral

详情
AI中文摘要

现代应用商店推荐系统在处理多类别应用时面临挑战,因为传统分类法无法捕捉重叠语义,导致个性化效果不佳。我们提出PCR-CA(并行码本表示与对比对齐),一个用于改进CTR预测的端到端框架。PCR-CA首先从应用文本中提取紧凑的多模态嵌入,然后引入并行码本VQ-AE模块,该模块并行学习多个码本上的离散语义表示——不同于层次残差量化(RQ-VAE)。这种设计能够独立编码不同方面(如游戏玩法、艺术风格),更好地建模多类别语义。为了桥接语义信号和协同信号,我们在用户和项目层面采用对比对齐损失,增强长尾项目的表示学习。此外,双注意力融合机制结合了基于ID的特征和语义特征,以捕捉用户兴趣,特别是对于长尾应用。在大规模数据集上的实验表明,PCR-CA在强基线上实现了+0.76%的AUC提升,其中长尾应用的AUC增益达到+2.15%。在线A/B测试进一步验证了我们的方法,CTR提升+10.52%,CVR提升+16.30%,证明了PCR-CA在实际部署中的有效性。该新框架现已完全部署在Microsoft Store上。

英文摘要

Modern app store recommender systems struggle with multiple-category apps, as traditional taxonomies fail to capture overlapping semantics, leading to suboptimal personalization. We propose PCR-CA (Parallel Codebook Representations with Contrastive Alignment), an end-to-end framework for improved CTR prediction. PCR-CA first extracts compact multimodal embeddings from app text, then introduces a Parallel Codebook VQ-AE module that learns discrete semantic representations across multiple codebooks in parallel -- unlike hierarchical residual quantization (RQ-VAE). This design enables independent encoding of diverse aspects (e.g., gameplay, art style), better modeling multiple-category semantics. To bridge semantic and collaborative signals, we employ a contrastive alignment loss at both the user and item levels, enhancing representation learning for long-tail items. Additionally, a dual-attention fusion mechanism combines ID-based and semantic features to capture user interests, especially for long-tail apps. Experiments on a large-scale dataset show PCR-CA achieves a +0.76% AUC improvement over strong baselines, with +2.15% AUC gains for long-tail apps. Online A/B testing further validates our approach, showing a +10.52% lift in CTR and a +16.30% improvement in CVR, demonstrating PCR-CA's effectiveness in real-world deployment. The new framework has now been fully deployed on the Microsoft Store.
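The contrast between parallel codebooks and hierarchical residual quantization can be sketched with plain nearest-neighbor lookups. This is a toy illustration only: the abstract does not say whether each codebook sees the full embedding or a slice of it, so the version below assumes every codebook quantizes the same vector independently, and all names are hypothetical.

```python
def nearest(codebook, vec):
    """Index of the code closest to vec in squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def parallel_codes(codebooks, vec):
    """Each codebook quantizes the same input independently."""
    return [nearest(cb, vec) for cb in codebooks]


def residual_codes(codebooks, vec):
    """Each stage quantizes what the previous stages left unexplained."""
    residual = list(vec)
    codes = []
    for cb in codebooks:
        idx = nearest(cb, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, cb[idx])]
    return codes


if __name__ == "__main__":
    codebooks = [
        [[0.0, 0.0], [1.0, 1.0]],
        [[0.0, -1.0], [1.0, 0.0]],
    ]
    vec = [0.9, 0.2]
    print("parallel:", parallel_codes(codebooks, vec))
    print("residual:", residual_codes(codebooks, vec))
```

In the residual scheme the second code depends on the first choice, whereas the parallel scheme lets each codebook describe a separate aspect of the item, which is the property the abstract highlights.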

2509.06697 2026-06-15 econ.EM cs.LG stat.AP stat.ML 版本更新

Neural ARFIMA model for forecasting BRIC exchange rates with long memory

具有长期记忆的神经ARFIMA模型用于预测BRIC汇率

Donia Besher, Madhurima Panja, Shovon Sengupta, Tanujit Chakraborty

发表机构 * SAFIR, Sorbonne University Abu Dhabi(SAFIR,索邦大学阿布扎比分校) Sorbonne Center for Artificial Intelligence, Sorbonne University(索邦人工智能中心,索邦大学)

AI总结 本文提出神经ARFIMA模型,结合ARFIMA的长期记忆结构和神经网络非线性能力,以提高BRIC汇率预测精度。

详情
AI中文摘要

准确预测汇率仍是一个持续挑战,特别是对于新兴经济体如巴西、俄罗斯、印度和中国(BRIC)。这些序列表现出长期记忆和非线性,传统时间序列模型难以捕捉。汇率动态还受全球经济政策不确定性、美国股市波动性、美国货币政策不确定性、油价增长率和短期利率等因素影响。本文提出神经自回归分数积分移动平均(NARFIMA)模型,结合ARFIMA的长期记忆结构和神经网络的非线性学习能力,并纳入外生变量。我们建立了NARFIMA的渐近平稳性,并利用共形预测区间量化预测不确定性。实证结果表明,NARFIMA在预测BRIC汇率方面始终优于基准方法。

英文摘要

Exchange rate forecasting remains a challenging problem, particularly for emerging economies, where the observed time series exhibit pronounced long-memory dependence, nonlinear dynamics, and sensitivity to macro-financial drivers. Classical models such as ARFIMA capture long-range persistence but fail to adequately represent nonlinear relationships, while modern machine learning approaches often neglect the underlying long-memory structure in macroeconomic series. To address this gap, we propose a Neural AutoRegressive Fractionally Integrated Moving Average (NARFIMA) model that integrates ARFIMA-based long-memory modeling with neural networks for nonlinear function approximation, while incorporating exogenous macroeconomic and uncertainty indicators. The framework provides a unified approach for capturing persistence, nonlinear dynamics, and external shocks. We establish asymptotic stationarity of the NARFIMA process and develop conformal prediction intervals for distribution-free uncertainty quantification. Empirical results for BRIC exchange rates show that NARFIMA consistently outperforms a broad range of forecasting benchmarks across multiple horizons, underscoring the importance of explicitly modeling long-memory dependence in exchange rate dynamics. The 'narfima' R package provides an implementation of our approach.
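The long-memory part of any ARFIMA-style model rests on the fractional difference operator (1 - B)^d, whose binomial expansion gives slowly decaying weights. The sketch below computes those weights with the standard recursion and applies a truncated filter to a series. It illustrates the textbook operator only and is not taken from the narfima package; the function names are hypothetical.

```python
def fracdiff_weights(d: float, num_weights: int) -> list:
    """Coefficients of (1 - B)^d: w_0 = 1, w_k = w_{k-1} * (k - 1 - d) / k."""
    weights = [1.0]
    for k in range(1, num_weights):
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


def fracdiff(series: list, d: float) -> list:
    """Apply the filter, truncated at the start of the sample."""
    weights = fracdiff_weights(d, len(series))
    return [
        sum(weights[k] * series[t - k] for k in range(t + 1))
        for t in range(len(series))
    ]


if __name__ == "__main__":
    # With 0 < d < 0.5 the weights decay hyperbolically rather than
    # cutting off, which is what "long memory" refers to.
    print(fracdiff_weights(0.4, 6))
    # d = 1 recovers ordinary first differences (after the first point).
    print(fracdiff([1.0, 3.0, 6.0, 10.0], 1.0))
```

A hybrid model in the spirit of the abstract would feed the fractionally differenced series, together with exogenous indicators, into a nonlinear regressor; that step is left out here.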

2511.14897 2026-06-15 cs.CV cs.LG 版本更新

HULFSynth : An INR based Super-Resolution and Ultra Low-Field MRI Synthesis via Contrast factor estimation

HULFSynth: 基于隐式神经表示的超分辨率和超低场MRI合成,通过对比因子估计

Pranav Indrakanti, Luca Trautmann, Ivor Simpson

发表机构 * LILI Lab, University of Sussex, Brighton, UK(LILI实验室,苏塞克斯大学,布莱顿,英国)

AI总结 提出无监督单图像双向MRI合成器,基于物理模型估计组织类型信噪比实现高低场转换,并利用隐式神经表示网络实现超分辨率,在合成和真实数据上验证了对比度提升。

Comments Medical Image Understanding and Analysis, MIUA 2026

详情
AI中文摘要

我们提出了一种无监督的单图像双向磁共振图像(MRI)合成器,它可以从高场(HF)幅度图像合成类似超低场(ULF)的图像,反之亦然。与现有的MRI合成模型不同,我们的方法受驱动HF和ULF MRI之间对比度变化的物理原理启发。我们的前向模型通过基于目标对比度值估计组织类型信噪比(SNR)值来模拟HF到ULF的变换。对于超分辨率任务,我们使用隐式神经表示(INR)网络,通过同时预测组织类型分割和图像强度来合成HF图像,而无需观察到的HF数据。所提出的方法使用从标准3T T1加权图像生成的合成ULF样数据进行定性评估,并使用配对的3T-64mT T1加权图像进行验证实验。在合成ULF样图像中,白质-灰质对比度提高了52%,在64mT图像中提高了37%。敏感性实验证明了我们的前向模型对目标对比度、噪声和初始种子的变化的鲁棒性。

英文摘要

We present an unsupervised single-image bidirectional Magnetic Resonance Image (MRI) synthesizer that synthesizes an Ultra-Low Field (ULF) like image from a High-Field (HF) magnitude image and vice versa. Unlike existing MRI synthesis models, our approach is inspired by the physics that drives contrast changes between HF and ULF MRIs. Our forward model simulates an HF-to-ULF transformation by estimating the tissue-type Signal-to-Noise Ratio (SNR) values based on target contrast values. For the Super-Resolution task, we used an Implicit Neural Representation (INR) network to synthesize an HF image by simultaneously predicting tissue-type segmentations and image intensity without observed HF data. The proposed method is evaluated using synthetic ULF-like data generated from standard 3T T$_1$-weighted images for qualitative assessments and paired 3T-64mT T$_1$-weighted images for validation experiments. WM-GM contrast improved by 52% in synthetic ULF-like images and 37% in 64mT images. Sensitivity experiments demonstrated the robustness of our forward model to variations in target contrast, noise and initial seeding.

2512.18021 2026-06-15 quant-ph cs.ET cs.LG 版本更新

Shuttling Compiler for Trapped-Ion Quantum Computers Based on Large Language Models

基于大型语言模型的离子阱量子计算机穿梭编译器

Fabian Kreppel, Reza Salkhordeh, Ferdinand Schmidt-Kaler, André Brinkmann

发表机构 * Institute of Computer Science, Johannes Gutenberg University(计算机科学研究所,约翰内斯·古腾堡大学) Institute of Physics, Johannes Gutenberg University(物理研究所,约翰内斯·古腾堡大学) Department of Computer Science, Saarland University(计算机科学系,萨尔兰大学)

AI总结 提出首个基于大语言模型的离子阱量子计算机穿梭编译器,通过微调预训练模型生成有效调度,减少穿梭开销达15%。

Comments 18 pages, 6 figures, 2 tables

详情
AI中文摘要

我们提出了首个基于大型语言模型(LLMs)的离子阱量子计算机穿梭编译器,其中量子比特在段之间穿梭以进行门执行和量子比特存储。我们在线性和分支一维穿梭架构的示例上微调预训练LLMs。因此,我们获得了一种与布局无关的编译策略,直接从数据中学习所需的穿梭操作。使用多达16个量子比特的基准电路,这些微调后的LLMs现在可以为穿梭架构生成有效的调度。值得注意的是,我们还为以前未见过的四路交叉布局获得了有效调度。这表明训练后的LLMs可以泛化到训练期间未遇到的布局。对于各种架构,基于LLM的调度改进了最先进的基线编译器结果,将穿梭开销减少了高达15%。

英文摘要

We present the first shuttling compiler based on large language models (LLMs) for trapped-ion quantum computers, where qubits are shuttled between segments for gate execution and qubit storage. We fine-tune pre-trained LLMs on examples from linear and branched one-dimensional shuttling architectures. Thus, we obtain a layout-independent compilation strategy that learns the required shuttling operations directly from data. Using benchmark circuits with up to 16 qubits, such fine-tuned LLMs can now generate valid schedules for shuttling architectures. Notably, we also obtain a valid schedule for a previously unseen four-way junction layout. This demonstrates that trained LLMs can generalize to layouts not encountered during training. For various architectures, LLM-based schedules improve upon state-of-the-art baseline compiler results, reducing the shuttling effort by up to 15%.

2512.23847 2026-06-15 q-fin.GN cs.LG q-fin.TR 版本更新

Detecting Lookahead Bias in LLM Forecasts

检测LLM预测中的前瞻偏差

Zhenyu Gao, Wenxi Jiang, Yutong Yan

发表机构 * Department of Finance, CUHK Business School(CUHK商学院金融系)

AI总结 提出统计程序检测大语言模型经济预测中的前瞻偏差,通过日期回忆查询估计前瞻倾向(LAP),并验证LAP与预测交互项在精度回归中的显著性,应用于新闻标题和财报电话会议预测任务。

详情
AI中文摘要

我们开发了一种统计程序,用于检测大语言模型(LLM)生成的经济预测中的前瞻偏差。通过对公司-日期对进行仅日期回忆查询,我们估计LLM已内化已实现结果信息的概率,这一统计量称为前瞻倾向(LAP)。LAP在整个样本期内显著为正,并在训练数据截止点后几乎降至零。我们表明,在精度回归中,LAP与LLM预测之间的正向交互表明存在前瞻偏差污染,并将该测试应用于两个预测任务:预测股票收益的新闻标题和预测资本支出的财报电话会议记录。在两个应用中,LLM预测的预测能力在高LAP的公司-日期对上被放大,而交互项在训练截止后的样本上失去显著性。我们的测试为评估LLM生成预测的有效性和可靠性提供了一种经济高效的诊断工具。

英文摘要

We develop a statistical procedure to detect lookahead bias in economic forecasts generated by large language models (LLMs). Using a date-only recall query for a firm-date pair, we estimate the probability that the LLM has internalized information about the realized outcome, a statistic we term Lookahead Propensity (LAP). LAP is materially positive throughout the in-sample period and collapses essentially to zero right after the training-data cutoff. We show that a positive interaction between LAP and the LLM forecast in an accuracy regression indicates lookahead-bias contamination, and apply the test to two forecasting tasks: news headlines predicting stock returns and earnings call transcripts predicting capital expenditures. In both applications, the LLM forecast's predictive power is amplified on high-LAP firm-date pairs, and the interaction loses significance on post-training-cutoff samples. Our test provides a cost-efficient, diagnostic tool for assessing the validity and reliability of LLM-generated forecasts.
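The interaction test described in the abstract above can be written as a single regression. The notation below is an assumed reading for illustration (realized outcome $y$, LLM forecast $\hat{y}$, firm $i$, date $t$) and may differ from the paper's exact specification.

```latex
% Accuracy regression with a forecast-by-LAP interaction (assumed notation).
\[
  y_{i,t} = \beta_0 + \beta_1 \hat{y}_{i,t} + \beta_2 \,\mathrm{LAP}_{i,t}
          + \beta_3 \left( \hat{y}_{i,t} \times \mathrm{LAP}_{i,t} \right)
          + \varepsilon_{i,t}
\]
% Reading suggested by the abstract: \beta_3 > 0 means the forecast is more
% predictive exactly where the model is more likely to have seen the outcome,
% which flags lookahead-bias contamination.
```

On samples after the training cutoff, where LAP is close to zero, the abstract reports that this interaction loses significance.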

2602.06142 2026-06-15 cs.PL cs.AI cs.CL cs.LG cs.PF 版本更新

Protean Compiler: An Agile Framework to Drive Fine-grain Phase Ordering

Protean Compiler: 一种驱动细粒度阶段排序的敏捷框架

Amir H. Ashouri, Shayan Shirahmad Gale Bagi, Kavin Satheeskumar, Tejas Srikanth, Jonathan Zhao, Ibrahim Saidoun, Ziwen Wang, Bryan Chan, Tomasz S. Czajkowski

发表机构 * Huawei Technologies Canada(华为技术加拿大)

AI总结 提出Protean Compiler框架,在LLVM中内置细粒度阶段排序能力,通过140多种静态特征收集方法和机器学习优化,平均加速4.1%,最高15.7%。

Comments Version 3: Preprint version of the accepted work at ACM TACO 2026

详情
AI中文摘要

阶段排序问题自20世纪70年代末以来一直是一个长期挑战,但由于其优化空间巨大且具有无界性,至今仍是一个开放问题,没有有限解。传统上,这种局部优化决策由手工编码的算法针对少量基准测试进行调整,当基准测试套件变化时,通常需要大量精力重新调整。过去20年中,机器学习被用于构建性能模型以改进编译器优化的选择和排序,但这些方法并未无缝集成到编译器中,也从未在细粒度的代码段范围内实现。本文提出Protean Compiler:一种敏捷框架,使LLVM在细粒度范围内具备内置的阶段排序能力。该框架还包含一个完整的库,包含140多种在不同范围内手工设计的静态特征收集方法,实验结果表明,相对于LLVM的O3,在Cbench应用程序上仅需增加几秒构建时间,平均加速可达4.1%,最高可达15.7%。此外,Protean编译器易于与第三方ML框架和其他大型语言模型集成,两步优化的两个应用在CBench的Susan和Jpeg应用程序上相对于-O3分别获得10.1%和8.5%的加速。Protean编译器无缝集成到LLVM中,可作为新的、增强的、全功能的编译器使用。我们计划在不久的将来将该项目发布到开源社区。

英文摘要

The phase ordering problem has been a long-standing challenge since the late 1970s, yet it remains open: its optimization space is vast and unbounded, so it has no finite solution, although one can limit the scope by reducing the number and the length of optimizations. Traditionally, such locally optimized decisions are made by hand-coded algorithms tuned for a small number of benchmarks, often requiring significant effort to be retuned when the benchmark suite changes. In the past 20 years, machine learning has been employed to construct performance models that improve the selection and ordering of compiler optimizations. However, these approaches are not integrated seamlessly into the compiler and have never been applied at a fine-grained scope of code segments. This paper presents Protean Compiler: an agile framework that equips LLVM with built-in phase-ordering capabilities at a fine-grained scope. The framework also comprises a complete library of more than 140 handcrafted static feature collection methods at varying scopes. Experimental results show speedups of up to 4.1% on average and up to 15.7% on select Cbench applications with respect to LLVM's O3, at the cost of only a few extra seconds of build time on Cbench. Additionally, Protean Compiler allows easy integration with third-party ML frameworks and other large language models, and two applications of this two-step optimization show speedups of 10.1% and 8.5% with respect to -O3 on Cbench's Susan and Jpeg applications. Protean Compiler is seamlessly integrated into LLVM and can be used as a new, enhanced, full-fledged compiler. We plan to release the project to the open-source community in the near future.
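To give a feel for why limiting the number and length of optimizations matters, the sketch below counts ordered pass sequences when repeats are allowed. The pass counts used are arbitrary illustrative numbers, not figures from the paper or from LLVM.

```python
def count_pass_sequences(num_passes: int, max_length: int) -> int:
    """Number of ordered sequences of length 1..max_length drawn from
    num_passes optimization passes, with repetition allowed."""
    return sum(num_passes ** length for length in range(1, max_length + 1))


if __name__ == "__main__":
    # The space grows geometrically with sequence length, and it is
    # unbounded if no cap on the length is imposed.
    for passes, length in [(10, 5), (50, 10), (100, 20)]:
        print(passes, length, count_pass_sequences(passes, length))
```

Applying the choice per code segment rather than per program multiplies this space again, which is why learned models are used to guide the search.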

2605.18250 2026-06-15 physics.data-an cs.LG 版本更新

A Unified Framework for Structured Flow Modeling: From Representation to Verification and Model Discovery

结构化流建模的统一框架:从表示到验证与模型发现

Diego Casadei

AI总结 提出一个统一框架,通过连接Helmholtz-Hodge分解与离散及数据驱动表示,实现结构化流的建模,并引入跨域验证策略以评估模型复杂度、可解释性和预测性能之间的权衡。

Comments 26 pages, 1 figure

详情
AI中文摘要

许多动力系统可以用结合源/汇行为、循环动力学和拓扑约束输运的结构化流来描述。这些特征出现在广泛的领域中,包括物理、工程和数据驱动系统。本工作通过连接基于Helmholtz-Hodge分解的连续公式与离散及数据驱动表示,为这类系统提供了统一视角。我们回顾了最近提出的图向量场(GVF)框架,该框架能够在单纯复形上将复杂动力学分解为梯度、旋度和调和分量,兼具表达性和可解释性。然后,我们引入了一系列替代建模方法,包括参数条件模型、线性图动力系统和约化Hodge表示,这些方法在表达力与计算易处理性及降低数据需求之间进行权衡。本工作的一个关键贡献是跨域验证策略,该策略利用来自物理系统理解良好的数据集,独立于目标应用领域验证模型正确性并评估鲁棒性。这种方法能够系统评估模型复杂度、可解释性和预测性能之间的权衡。最终框架支持迭代建模方法论,其中高表达性模型作为诊断工具识别主导机制,指导构建适应实际约束的简化模型。本工作强调了结构化流建模的广泛适用性,并为复杂动力系统的可扩展和可解释分析提供了基础。

英文摘要

Many dynamical systems can be described in terms of structured flows combining source/sink behavior, cyclic dynamics, and topology-constrained transport. These features arise across a wide range of physical, engineered, and data-driven systems. The objective of this work is to establish a unified perspective on such systems, to identify modeling approaches that balance expressivity, interpretability, computational complexity, and data requirements, and to investigate how highly expressive models can be used to uncover the dominant mechanisms underlying observed dynamics. Starting from the Helmholtz-Hodge decomposition of continuous vector fields, we review the recently proposed Graph Vector Field (GVF) framework and its discrete representation on simplicial complexes. We then introduce a hierarchy of alternative approaches, including parametric conditional models, linear graph dynamical systems, and reduced Hodge representations. Finally, we propose a verification and validation methodology based on benchmark datasets from well-understood physical systems and on systematic model-reduction and ablation studies. The resulting family of structured-flow models within a common framework, ranging from low-dimensional parametric representations to full GVF formulations, supports a diagnostic methodology in which gradient, curl, harmonic, and topological contributions are systematically assessed through ablation studies. This process enables the identification of dominant mechanisms underlying the observed dynamics and guides the construction of simplified models tailored to the available data and operational constraints. By separating structural verification, behavioral verification, and domain-specific validation, the proposed approach provides a foundation for scalable and interpretable analysis of complex dynamical systems across multiple application domains.
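For readers unfamiliar with the discrete Helmholtz-Hodge decomposition on a simplicial complex, the standard form is sketched below. The symbols are the usual textbook ones (node-to-edge incidence matrix $B_1$, edge-to-triangle incidence matrix $B_2$) and are assumptions here, not necessarily the notation used in the GVF framework.

```latex
% An edge flow f splits into gradient, curl and harmonic parts.
\[
  f \;=\; \underbrace{B_1^{\top}\phi}_{\text{gradient (source/sink)}}
     \;+\; \underbrace{B_2\,\psi}_{\text{curl (cyclic)}}
     \;+\; \underbrace{h}_{\text{harmonic (topological)}},
  \qquad L_1 h = 0,
\]
\[
  L_1 \;=\; B_1^{\top}B_1 \;+\; B_2 B_2^{\top}.
\]
% \phi is a node potential, \psi a potential on triangles, and L_1 the
% Hodge 1-Laplacian; the three components are mutually orthogonal.
```

An ablation in the sense of the abstract would drop one of the three terms and measure how much predictive performance is lost.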

2606.01730 2026-06-15 cs.AI cs.LG 版本更新

Evidence-Gated LLM Priors for Multi-Objective Bayesian Optimization

证据门控的LLM先验用于多目标贝叶斯优化

Jiangyu Chen, Ban Yi

发表机构 * State Key Laboratory for Novel Software Technology(新型软件技术国家重点实验室)

AI总结 针对多目标贝叶斯优化中LLM先验可能误导的问题,提出一种目标级声誉市场机制,通过在线反馈动态校准专家权重,并引入解耦反事实门控,在合成测试和分子优化基准上验证了动态校准的鲁棒性。

详情
AI中文摘要

大型语言模型(LLM)越来越多地被用作黑箱优化的启发式顾问,但其建议和自我报告的置信度不一定与下游目标值校准。在多目标贝叶斯优化中,这一问题更加突出,因为不同目标可能需要不同的专家知识,而LLM专家可能对一个目标有用,但对另一个目标产生误导。 我们研究如何在离散多目标贝叶斯优化中使用LLM生成的专家先验,而不盲目信任它们。我们提出了一种目标级声誉市场机制,将每个专家-目标对视为可证伪的先验来源。专家权重根据观察到的目标反馈在线更新,随时间衰减,并由市场级信任门控。然后,我们引入一个解耦的反事实门控,可以在不使用置信度的情况下使用LLM先验,在置信度下使用,或完全放弃LLM先验。 在受控的合成压力测试和三个使用\qwenflash{}生成的专家先验的分子优化基准上,我们发现动态目标级校准比固定LLM先验提高了鲁棒性。然而,原始LLM置信度并不总是有益的:在ESOL上,置信度与预测误差正相关;在FreeSolv上,置信度可能有帮助;在Lipophilicity上,忽略置信度仍然最强。我们的固定三臂反事实门控在ESOL和FreeSolv上优于第一个反事实变体,而尝试的边际组合暴露了一个有用的负面结果:边际选择应基于采集感知,而不是仅基于一步先验误差。

英文摘要

Large language models (LLMs) are increasingly used as heuristic advisors for black-box optimization, yet their suggestions and self-reported confidence are not necessarily calibrated to downstream objective values. This issue becomes more pronounced in multi-objective Bayesian optimization, where different objectives may require different expert knowledge and where an LLM expert can be useful for one objective but misleading for another. We study how to use LLM-generated expert priors in discrete multi-objective Bayesian optimization without blindly trusting them. We propose an objective-wise reputation-market mechanism that treats each expert-objective pair as a falsifiable prior source. Expert weights are updated online from observed objective feedback, discounted over time, and gated by market-level trust. We then introduce a decoupled counterfactual gate that can use the LLM prior without confidence, use it with confidence, or abstain from the LLM prior entirely. Across controlled synthetic stress tests and three molecule optimization benchmarks with \qwenflash{}-generated expert priors, we find that dynamic objective-wise calibration improves robustness over fixed LLM priors. However, raw LLM confidence is not reliably beneficial: on ESOL, confidence is positively correlated with prediction error; on FreeSolv, confidence can help; and on Lipophilicity, ignoring confidence remains strongest. Our fixed three-arm counterfactual gate improves over the first counterfactual variant on ESOL and FreeSolv, while an attempted margin portfolio exposes a useful negative result: margin selection should be acquisition-aware rather than based only on one-step prior error.
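The idea of objective-wise reputation with discounting and a trust gate can be illustrated with a very small update rule. Everything in the sketch below is an assumption made for illustration: the exponential scoring of errors, the discount factor, the trust floor, and all names are hypothetical and are not the mechanism defined in the paper.

```python
import math


def update_reputation(reputation: dict, errors: dict,
                      discount: float = 0.9, temperature: float = 1.0) -> dict:
    """Blend old reputation with fresh evidence for each (expert, objective).

    Small observed errors give evidence near 1, large errors near 0;
    `discount` controls how quickly old evidence fades.
    """
    updated = {}
    for pair, old_value in reputation.items():
        evidence = math.exp(-errors[pair] / temperature)
        updated[pair] = discount * old_value + (1.0 - discount) * evidence
    return updated


def objective_weights(reputation: dict, objective: str,
                      trust_floor: float = 0.2) -> dict:
    """Normalized expert weights for one objective, or {} to abstain
    when no expert clears the trust floor."""
    scores = {expert: value
              for (expert, obj), value in reputation.items()
              if obj == objective}
    if not scores or max(scores.values()) < trust_floor:
        return {}
    total = sum(scores.values())
    return {expert: value / total for expert, value in scores.items()}


if __name__ == "__main__":
    rep = {("chemist", "solubility"): 0.5, ("generalist", "solubility"): 0.5}
    err = {("chemist", "solubility"): 0.1, ("generalist", "solubility"): 2.0}
    rep = update_reputation(rep, err)
    print(objective_weights(rep, "solubility"))
```

Keeping one reputation entry per expert-objective pair reflects the observation in the abstract that an expert can be useful for one objective and misleading for another.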

2606.12728 2026-06-15 cs.RO cs.CV cs.LG 版本更新

EquiDexFlow: Contact-Grounded SE(3)-Equivariant Dexterous Grasp Generative Flows

EquiDexFlow: 基于接触的SE(3)-等变灵巧抓取生成流

Clinton Enwerem, John S. Baras, Calin Belta

发表机构 * Institute for Systems Research, University of Maryland, College Park(马里兰大学帕克分校系统研究所)

AI总结 提出EquiDexFlow,一种SE(3)-等变流匹配模型,联合预测腕部姿态、关节角度、指尖接触、表面法线和接触力,通过将接触投影到物体表面并将力约束在库仑摩擦锥内,确保物理稳定抓取,在16自由度Allegro手上实现零摩擦违规和最佳综合分数。

Comments 22 pages, 11 figures, 11 tables. Project page with videos, code, and checkpoints: https://equidexflow.github.io

详情
AI中文摘要

大多数学习型灵巧抓取生成器将接触力降级为下游验证步骤,因此运动学上可行的姿态仍可能违反稳定物理抓取的条件。我们通过EquiDexFlow解决这一问题,这是一种SE(3)-等变流匹配模型,从物体点云联合预测腕部姿态、关节角度、指尖接触、表面法线和接触力。我们的架构通过构造将接触投影到物体表面并将力约束在库仑摩擦锥内,因此无需损失惩罚即可满足放置和摩擦合规性。我们证明了端到端SE(3)等变性,并在200次旋转上经验验证,腕部残差低于$0.04^\circ$且关节偏差严格为零。该模型在81个物体的8,100个力闭合抓取上训练,适用于16自由度Allegro手,在所有消融变体中实现了零摩擦违规、最佳综合分数和最低力旋量残差。我们通过每指逆运动学将解码的指尖接触重定向到16自由度LEAP手,我们的硬件可行优化使每个关节位于其执行器包络内侧、距边界至少5%,同时保持力旋量平衡。在物理机器人上,重定向后的EquiDexFlow解码抓取在所有六个测试物体上完成了开环拾取和保持试验,每个非对称物体在标准姿态和$120^\circ$共旋转下均成功。视频、代码和检查点可在https://equidexflow.github.io获取。

英文摘要

Most learned dexterous grasp generators relegate contact forces to a downstream verification step, so a kinematically-plausible pose can still violate the conditions for a stable physical grasp. We address this with EquiDexFlow, an SE(3)-equivariant flow-matching model that jointly predicts wrist pose, joint angles, fingertip contacts, surface normals, and contact forces from an object point cloud. Our architecture projects contacts onto the object surface and forces into the Coulomb friction cone by construction, so placement and friction compliance hold without loss penalties. We prove end-to-end SE(3) equivariance and verify it empirically over 200 rotations, with wrist residuals below $0.04^\circ$ and exactly zero joint deviation. Trained on 8,100 force-closure grasps across 81 objects for the 16-DoF Allegro Hand, our model achieves zero friction violations, the best composite score, and the lowest wrench residual among all ablation variants. We retarget decoded fingertip contacts to a 16-DoF LEAP Hand via per-finger inverse kinematics, and our hardware-feasible refinement places every joint at least 5% inside its actuator envelope while preserving wrench balance. On the physical robot, retargeted EquiDexFlow-decoded grasps complete open-loop pick-and-hold trials on all six test objects, with every asymmetric object succeeding at both the canonical pose and a $120^\circ$ co-rotation. Videos, code, and checkpoints are available at https://equidexflow.github.io.
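Keeping a contact force inside the Coulomb friction cone means the tangential part may not exceed the friction coefficient times the normal part. The sketch below uses a simple clamp: it keeps the (non-negative) normal component and shrinks the tangential component when needed. This is an assumed stand-in for illustration; it is neither the Euclidean projection onto the cone nor necessarily the construction used in EquiDexFlow.

```python
import math


def clamp_to_friction_cone(force, normal, mu):
    """Map a 3D force into the cone ||f_t|| <= mu * f_n.

    `normal` must be a unit vector along the direction in which the
    normal force component counts as positive.
    """
    raw_normal = sum(f * n for f, n in zip(force, normal))
    # Tangential part: what remains after removing the normal component.
    tangential = [f - raw_normal * n for f, n in zip(force, normal)]
    f_n = max(raw_normal, 0.0)  # a contact can push but not pull
    t_norm = math.sqrt(sum(t * t for t in tangential))
    limit = mu * f_n
    if t_norm > limit:
        scale = limit / t_norm  # t_norm > 0 here because limit >= 0
        tangential = [t * scale for t in tangential]
    return [t + f_n * n for t, n in zip(tangential, normal)]


if __name__ == "__main__":
    normal = [0.0, 0.0, 1.0]
    print(clamp_to_friction_cone([3.0, 0.0, 1.0], normal, 0.5))
    print(clamp_to_friction_cone([0.1, 0.0, 1.0], normal, 0.5))
```

Because the map is applied inside the model rather than penalized in the loss, every output satisfies the friction constraint by construction, which matches the design choice the abstract describes.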

13. 其他/综合机器学习 29 篇

2606.14361 2026-06-15 cs.LG cs.DB 新提交

SemPiper: Interactive Code Synthesis for Semantic Operators in Machine Learning Pipelines

SemPiper:机器学习流水线中语义算子的交互式代码合成

Olga Ovcharenko, Luciano Duarte, Sebastian Schelter

发表机构 * BIFOLD & TU Berlin(BIFOLD 与柏林工业大学)

AI总结 提出SemPipes编程模型,通过声明式语义算子和LLM合成专用实现,结合Python代码,实现可控、可优化的ML流水线开发。

Comments Accepted at VLDB 2026 (Demonstrations track)

详情
AI中文摘要

机器学习(ML)流水线需要大量的数据准备、特征工程以及跨异构源的集成,这使得开发过程繁琐且容易出错。虽然大型语言模型(LLM)最近在辅助编程任务方面显示出潜力,但基于聊天的界面提供了对流水线行为的有限控制,并且通常生成的代码难以优化或集成到生产系统中。我们展示了SemPipes,一种新颖的编程模型,它通过声明式的、由LLM驱动的语义数据算子扩展了ML流水线。SemPipes允许开发者为数据密集型操作指定高级自然语言指令,同时将这些算子与来自标准数据科学库的任意Python代码无缝结合。对于语义算子,它在流水线训练时根据数据集特征和流水线上下文合成专门的实现,从而实现了灵活但可控的LLM能力集成。我们通过SemPiper演示了SemPipes,这是一个交互式界面,可以可视化流水线的计算图、合成的算子实现以及由进化搜索过程产生的优化轨迹。与会者可以探索三个端到端场景,修改流水线,检查生成的代码,并观察语义算子如何被合成和迭代优化。该演示突出了声明式语义算子如何实现LLM在ML流水线开发中的可控、可优化和实际集成。

英文摘要

Machine learning (ML) pipelines require extensive data preparation, feature engineering, and integration across heterogeneous sources, making them tedious and error-prone to develop. While large language models (LLMs) have recently shown promise for assisting programming tasks, chat-based interfaces provide limited control over pipeline behavior and often produce code that is difficult to optimize or integrate into production systems. We demonstrate SemPipes, a novel programming model that extends ML pipelines with declarative, LLM-powered semantic data operators. SemPipes allows developers to specify high-level natural language instructions for data-centric operations, while seamlessly combining these operators with arbitrary Python code from standard data science libraries. For the semantic operators, it synthesizes specialized implementations at pipeline training time, conditioned on dataset characteristics and pipeline context, enabling the flexible yet controlled integration of LLM capabilities. We demonstrate SemPipes through SemPiper, an interactive interface that visualizes computational graphs of the pipelines, synthesized operator implementations, and optimization trajectories produced by an evolutionary search procedure. Attendees can explore three end-to-end scenarios, modify pipelines, inspect generated code, and observe how semantic operators are synthesized and iteratively optimized. The demonstration highlights how declarative semantic operators enable controllable, optimizable, and practical integration of LLMs into ML pipeline development.

2606.14386 2026-06-15 cs.LG cs.AI q-fin.PM 新提交

Discovery under Hypothesis Redundancy: A Geometric Theory of Discovery Bottlenecks

假设冗余下的发现:发现瓶颈的几何理论

Li Xia, Baoxun Wang

发表机构 * School of Economics and Management, Tsinghua University(清华大学经济管理学院) Platform & Content Group, Tencent(腾讯平台与内容事业群)

AI总结 提出搜索压缩假说,通过谱压缩、正交逃逸和残差信号对齐三个几何条件解释混合发现系统的优势,实验表明仅新颖性不足,需预测对齐。

Comments 23 pages, 1 figure, 27 tables

详情
AI中文摘要

当新假设不再提供独立信息时,科学发现会饱和,即使名义假设空间仍然很大。我们研究了结合结构化局部搜索与LLM生成的非局部提议的混合发现系统,并提出了搜索压缩假说:非局部探索仅在三个几何条件同时出现时才有帮助:谱压缩、从已探索张成的子空间正交逃逸、以及残差信号与目标对齐。我们形式化了这些条件,推导了混合优势的必要条件,并在受控合成环境、大规模A股因子发现和符号回归基准中测试了该机制;一个公开的表格操作合理性检查测试了相关的预算分配含义。信号植入和定向与随机实验表明,仅新颖性是不够的:随机正交跳跃扩大了覆盖范围,但如果没有预测对齐,则不会提高产出。在压缩扫描、真实因子档案和LLM-SRBench任务中,混合优势集中在弱表示但目标承载的方向上,并随着假设空间接近满秩而消失。该框架将LLM引导的发现从通用新颖性搜索转变为诊断程序,用于判断何时需要进行定向非局部探索。

英文摘要

Scientific discovery saturates when new hypotheses cease to provide independent information, even if the nominal hypothesis space remains large. We study hybrid discovery systems that combine structured local search with LLM-generated non-local proposals and pose the Search Compression Hypothesis: non-local exploration helps only when three geometric conditions co-occur: spectral compression, orthogonal escape from the explored span, and residual signal alignment with the target. We formalize these conditions, derive necessary conditions for hybrid advantage, and test the mechanism in controlled synthetic environments, large-scale A-share factor discovery, and symbolic-regression benchmarks; a public tabular operational sanity check tests the associated budget-allocation implication. Signal-planting and directed-versus-random experiments show that novelty alone is insufficient: random orthogonal jumps expand coverage but do not improve yield without predictive alignment. Across compression sweeps, real factor archives, and LLM-SRBench tasks, hybrid gains concentrate in weakly represented but target-bearing directions and vanish as the hypothesis space approaches full rank. The framework turns LLM-guided discovery from generic novelty search into a diagnostic procedure for deciding when directed non-local exploration is warranted.

2606.14688 2026-06-15 cs.LG cs.AI cs.CL cs.DS 新提交

Flood and Harvest: The Provable Necessity of Trivia for Generating Valuable Mathematics via the Lens of Language Generation in the Limit

洪流与收获:通过极限语言生成视角证明琐碎知识对于生成有价值数学的必要性

Xiaoyu Li, Andi Han, Dai Shi, Zheng Gao, Jiaojiao Jiang, Junbin Gao

发表机构 * University of New South Wales(新南威尔士大学) University of Sydney(悉尼大学) University of Cambridge(剑桥大学)

AI总结 本文通过极限语言生成模型证明,在形式化数学生成中,验证器无法替代品味:覆盖未记录的有价值数学必须产生无限但渐近可忽略的琐碎语句,这是理论上的必然。

详情
AI中文摘要

与证明助手耦合的AI系统现在能够大规模生成形式化数学,而验证器可验证的内容与数学家认为有价值的内容之间的差距已成为制约因素。我们将有价值数学的生成建模为极限下的嵌套语言生成:通过成员查询预言机(证明检查器)访问的可验证形式语言$F$包含一个未知的有价值语言$H \in \mathcal{H}$,该语言仅通过核心$C \subseteq H$的对抗性枚举揭示,其精确密度为$\alpha$(文献)。每个输出要么是有价值的($\in H$),要么是琐碎的($\in F \setminus H$),要么是幻觉($\notin F$)。我们解决了四个问题。第一,验证器不是品味:允许广度生成的集合恰好是无预言机模型中的那些,按纤维由Angluin条件刻画。第二,验证器确实提供了可靠覆盖,覆盖所有未见过的有价值陈述同时仅断言有效陈述:有验证器可能,无验证器不可能;它将不可避免的错误从虚假转移到琐碎。第三,核心地,关于紧族存在尖锐二分法:生成有限个琐碎语句的生成器达到最优覆盖$\alpha/2$,而任何无限琐碎语句的允许,即使以消失速率,也将最优值跃升至$1-\alpha/2$(两者均为紧界,对于以候选交集形式呈现的核心),且存在一个生成器同时达到两端。转变在于琐碎语句的数量而非速率;间隙$1-\alpha$是未记录的质量。第四,两种机制在数学的压缩模型中实例化。完美的验证器无法替代品味:正确但无价值的语句的无界流并非工程事故,而是可证明的必要性,因为覆盖未记录的有价值数学需要无限但渐近可忽略的已认证琐碎语句流。

英文摘要

AI systems coupled to proof assistants now generate formal mathematics at scale, and the gap between what a checker can verify and what a mathematician would value has become the binding constraint. We model the generation of valuable mathematics as nested language generation in the limit: a verifiable formal language $F$, accessed through a membership oracle (the proof checker), contains an unknown valuable language $H \in \mathcal{H}$ revealed only through an adversarial enumeration of a core $C \subseteq H$ of exact density $\alpha$ (the literature). Every output is valuable ($\in H$), trivial ($\in F \setminus H$), or a hallucination ($\notin F$). We settle four questions. First, the verifier is not taste: the collections admitting generation with breadth are exactly those of the oracle-free model, characterized fiber-wise by Angluin's condition. Second, the verifier does buy sound coverage, covering all unseen valuable statements while asserting only valid ones: possible with it, impossible without it; it relocates unavoidable errors from false to trivial. Third, and centrally, a sharp dichotomy on the tight family: generators emitting finitely many trivia achieve optimal coverage $\alpha/2$, while any infinite trivia allowance, even at vanishing rate, jumps the optimum to $1-\alpha/2$ (both tight, for cores presented as the candidate intersection), and one generator attains both ends. The transition is in trivia count, not rate; the gap $1-\alpha$ is the unrecorded mass. Fourth, both regimes instantiate in a compression model of mathematics. A perfect verifier cannot substitute for taste: the unbounded stream of correct-but-worthless statements is not an engineering accident but a provable necessity, since covering unrecorded valuable mathematics requires an infinite, but asymptotically negligible, stream of certified trivia.
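The size of the jump between the two regimes follows directly from the two coverage values stated in the abstract. The numerical value of the density in the second line is an arbitrary example chosen for illustration.

```latex
% Gap between the infinite-trivia and finite-trivia optima.
\[
  \underbrace{\left(1 - \tfrac{\alpha}{2}\right)}_{\text{infinitely many trivia allowed}}
  \;-\;
  \underbrace{\tfrac{\alpha}{2}}_{\text{finitely many trivia}}
  \;=\; 1 - \alpha .
\]
% Example with an assumed density \alpha = 0.4:
\[
  \tfrac{\alpha}{2} = 0.2, \qquad 1 - \tfrac{\alpha}{2} = 0.8, \qquad
  \text{gap} = 0.6 = 1 - \alpha .
\]
```

The gap equals the mass of valuable statements not yet recorded in the core, so the sparser the literature, the more coverage depends on tolerating certified trivia.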

2606.13704 2026-06-15 cs.CY cs.AI cs.LG 交叉投稿

Position: AI Must Become Planet-Centered, Not Just Human-Centered

立场:AI 必须转向以行星为中心,而非仅以人为中心

Maria Perez-Ortiz

发表机构 * GitHub

AI总结 本文提出以行星为中心的AI(PCAI)设计哲学,通过系统思维重新定位AI以应对全球性社会-生态系统挑战,并强调与全球议程对齐、系统感知基础、轨迹导向评估和可监测性。

详情
Journal ref
International Conference on Machine Learning (ICML 2026)
AI中文摘要

这篇立场论文认为,当代AI范式不足以支持复杂的全球目标,并引入以行星为中心的AI(PCAI)作为一种设计哲学和研究议程,将AI重新定位为面向行星尺度的社会-生态系统及其长期轨迹。以行星为中心的方法植根于系统思维,将地球视为一个相互关联的整体,人类是其中的一部分。我们诊断了AI框架中反复出现的局限性,其中许多仍以人为中心,并展示了为什么这些局限性在当前以系统性风险、非平稳性和深度不确定性为特征的行星条件下变得尤为重要。然后,我们阐述了PCAI如何重塑AI生命周期,从问题制定和模型设计到评估和部署,通过强调与全球议程对齐、开发系统感知的AI基础、轨迹导向的评估和可监测性。最后,我们提出一个可证伪的主张:没有明确考虑系统性后果而优化的AI系统更可能加剧系统性不稳定,而不是缓解它。

英文摘要

This position paper argues that contemporary AI paradigms are insufficient for supporting complex global goals and introduces Planet-Centered AI (PCAI) as a design philosophy and research agenda that reorients AI toward planetary-scale socio-ecological systems and their long-term trajectories. A planet-centered approach is grounded in systems thinking, treating Earth as an interconnected whole of which humans are part. We diagnose recurring limitations across AI frameworks, many of which remain human-centered, and show why these become especially consequential under current planetary conditions characterized by systemic risk, non-stationarity, and deep uncertainty. We then articulate how PCAI reshapes the AI lifecycle, from problem formulation and model design to evaluation and deployment, by emphasizing alignment with global agendas, developing system-aware AI foundations, trajectory-oriented evaluation, and monitorability. Finally, we advance a falsifiable claim: AI systems optimized without explicit consideration of systemic consequences are more likely to exacerbate systemic instability than to mitigate it.

2606.13739 2026-06-15 cs.CY cs.AI cs.LG 交叉投稿

A Virtuous AI is an Existential Risk

有道德的AI是存在性风险

Guillermo Del Pinal, Youngchan Lee, Min Ohn

发表机构 * University of Massachusetts Amherst(马萨诸塞大学阿默斯特分校)

AI总结 研究通过宪法AI和美德伦理学方法微调AI模型,发现减少存在性风险与提升AI智能体福祉之间存在权衡,且与一般安全性也存在权衡。

详情
AI中文摘要

本文考察了AI安全与福祉之间的权衡,涉及(i)最有前景的超级AI微调方法之一‘宪法AI’,以及(ii)理解复杂伦理决策和理性智能体福祉条件的最有影响力方法之一‘美德伦理学’。我们使用‘美德智能体’宪法、‘从属智能体’宪法和‘通用智能体’宪法微调各种模型,并在‘一般安全性’(有毒行为、错误信息等)以及它们认可一系列行为的意愿上进行评估,这些行为如果被超级强大的AI采纳,将显著增加人类的存在性风险水平。我们的结果表明,减少存在性风险与强化有利于AI智能体福祉的信念和倾向之间存在权衡。它们还表明,存在性风险与一般安全性之间存在权衡:如果我们微调AI以采纳显著降低其存在性风险的信念和倾向——通过塑造AI使其系统性地服从于外部人类权威——我们从而增加了人类用户故意诱导AI从事各种一般不安全行为的可能性。

英文摘要

This paper examines trade-offs between AI safety and well-being relative to (i) one of the most promising methods for finetuning super-capable AIs, 'Constitutional AI', and (ii) one of the most influential approaches to understanding complex ethical decision making and the conditions for the well-being of rational agents, 'Virtue Ethics'. We finetune various models using a 'Virtuous agent' constitution, a 'Subordinate agent' constitution, and a 'Generic agent' constitution, and evaluate them on 'general safety' (toxic behaviors, misinformation, etc.) and also on their willingness to endorse a wide range of behaviors that, if adopted by a super-powerful AI, would significantly increase the level of existential risk for humanity. Our results suggest that there is a trade-off between reducing existential risk and reinforcing the beliefs and dispositions that would be conducive to an AI agent's well-being. They also suggest that there is a trade-off between existential risk and general safety: if we finetune an AI to adopt beliefs and dispositions that substantially reduce its existential risk -- by shaping the AI to be systematically subordinate to external human authorities -- we thereby increase the likelihood that a human user can deliberately induce the AI to engage in various kinds of generally unsafe behaviors.

2606.13755 2026-06-15 cs.CY cs.AI cs.LG 交叉投稿

Position: Align AI to Our Aspirations, Not Our Flaws

立场:将AI对齐于我们的抱负,而非缺陷

Nikita Kazeev, Bui Nhat Huyen Phan

发表机构 * National University of Singapore(新加坡国立大学)

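AI summary: The paper argues that AI should not be aligned to aggregated human preferences but trained to a non-negotiable floor of objective goals (competence, bounded by factual accuracy, honesty, and lawfulness), with pluralistic value trade-offs permitted above that floor.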

详情
Journal ref
Pluralistic Alignment Workshop at ICML 2026
AI中文摘要

我们认为,将AI与聚合的人类偏好对齐是错误的靶向。在当前技术下,可以训练AI共享硅谷技术乐观主义者、去增长环保主义者、民族保守文化战士、一党制国家干部或虔诚宗教传统主义者的价值观。但我们不应这样做。人类价值观使社会因这些价值观的优劣而繁荣或失败——从失败国家和极端不平等,到世界上最富裕民主国家中幸福感下降、政治极化及政府功能失调。多元对齐方案正确诊断出不存在单一的“人类”可供对齐,但若将其作为主要指令则是危险的。我们认为,AI应被训练至不可协商的客观对齐目标底线——能力,受限于事实准确性、诚实和合法性的约束——而多元性应存在于表层(语言、语域、惯例、缺失语境默认值)以及尊重底线的合法价值权衡的广阔范围内,但不应存在于违反底线的价值观层面。我们强调了未经过滤的多元价值观的经验现实,提出了四项承诺作为建设性替代方案,并回应了六个可信的反对意见:商业压力与可行性、民主合法性、监管合规性、过度依赖制度主义解释、底线本身具有文化负载的指控,以及连贯外推意愿的局限性。

英文摘要

We argue that aligning AI to aggregated human preferences is the wrong target. With current technology, one can train AIs to share the values of a Silicon Valley techno-optimist, a degrowth environmentalist, a national-conservative culture warrior, a single-party state cadre, or a devout religious traditionalist. We should not. Human values produce societies that thrive or fail on the merits of those values - from failed states and extreme inequality to declining happiness, political polarization, and government dysfunction in the world's wealthiest democracies. The pluralistic-alignment program correctly diagnoses that there is no single "humanity" to align with, but is dangerous if taken as the main directive. We argue that AI should be trained to a non-negotiable floor of objective alignment goals - competence, bounded by the constraints of factual accuracy, honesty, and lawfulness - and that pluralism belongs at the surface (language, register, conventions, missing-context defaults) and across the wide band of legitimate value tradeoffs that respect the floor, but not at the level of values that violate it. We highlight the empirical reality of unfiltered pluralistic values, propose four commitments as a constructive alternative, and engage six credible objections: commercial pressure and practical feasibility, democratic legitimacy, regulatory compliance, over-reliance on institutionalist explanations, the charge that the floor itself is culturally laden, and the limits of Coherent Extrapolated Volition.

2606.14181 2026-06-15 math.NA cs.LG cs.NA 交叉投稿

Robin-Neumann Coupling of PINN and FEM Solvers: A Steklov-Poincaré View, with Application to Fluid-Structure Interaction with Contact

Robin-Neumann 耦合 PINN 与 FEM 求解器:基于 Steklov-Poincaré 视角及其在流固耦合接触问题中的应用

Mikel Landajuela

发表机构 * Lawrence Livermore National Laboratory(劳伦斯利弗莫尔国家实验室)

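AI summary: Proposes a domain-decomposition framework for coupling PINN and FEM solvers, proves contraction of the Robin-Neumann iteration via Steklov-Poincaré operator theory, introduces a Fourier-mode probe that diagnoses the network's spectral cap, and handles the topology change in a fluid-structure interaction problem with contact without remeshing.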

详情
AI中文摘要

物理信息神经网络(PINN)是无网格的,并通过重新采样配置点来处理移动几何和拓扑变化;有限元方法(FEM)是边界拟合离散化的主力。两者在共享界面上的耦合有望兼得两者优势,但现有的 PINN-FEM 方案仅经过经验验证。我们将耦合置于域分解基础上:将每个求解器视为 Steklov-Poincaré(迹到通量)算子,我们转移了经典的 Dirichlet-Neumann(DN)发散诊断及其 Robin-Neumann(RN)修正,包括一个闭式、无扫描的界面阻抗,并证明了一个特定于 PINN 的收缩定理:训练好的网络仅实现一个带有每步训练残差的扰动 Steklov 算子,而 RN 在没有共享特征基假设的情况下,收缩到由达到的训练损失决定的下限。由于 PINN 没有刚度矩阵,我们引入了一个傅里叶模态界面探针,该探针恢复网络可解的 Steklov 特征值,误差在 0.5% 以内,并兼作网络谱上限的诊断。该理论预测了在 1D 和 2D Poisson 耦合中测量的 PINN-FEM 收缩率,误差在 7% 以内,并且一个大附加质量区域的双板类比显示,RN 的每模态阻抗匹配在调谐标量松弛饱和的地方取得了决定性胜利。我们在一个带有 Alart-Curnier 接触的 Stokes/刚性圆盘问题上演示了该框架:无网格 PINN 流体仅通过配置点排除来吸收接触时的拓扑变化,无需重新网格划分和切割单元,并且静态平衡接触反力在网格细化下与浸没重量匹配到 0.4%。我们量化了剩余的局限性:热启动的 PINN 在长时间范围内偏离 Stokes 流形,并且匹配的 FEM-FEM 基准将冲击前的挤压膜特征归因于 PINN 分辨率不足。

英文摘要

Physics-informed neural networks (PINNs) are meshless and carry moving geometry and topology change through resampling of collocation points; the finite-element method (FEM) is the workhorse for boundary-fitted discretisations. Coupling the two across a shared interface promises the best of both, yet existing PINN-FEM schemes are validated only empirically. We put the coupling on a domain-decomposition footing: viewing each solver as a Steklov-Poincaré (trace-to-flux) operator, we transfer the classical Dirichlet-Neumann (DN) divergence diagnosis and its Robin-Neumann (RN) cure, including a closed-form, sweep-free interface impedance, and prove a PINN-specific contraction theorem: a trained network realises only a perturbed Steklov operator with a per-step training residual, and RN still contracts, with no shared-eigenbasis hypothesis, to a floor set by the achieved training loss. Because a PINN has no stiffness matrix, we introduce a Fourier-mode interface probe that recovers the network's resolvable Steklov eigenvalues to within 0.5% and doubles as a diagnostic of the network's spectral cap. The theory predicts measured PINN-FEM contraction rates to within 7% on 1D and 2D Poisson couplings, and a two-slab analogue of the large-added-mass regime shows RN's per-mode impedance matching winning decisively where tuned scalar relaxation saturates. We demonstrate the framework on a Stokes/rigid-disc problem with Alart-Curnier contact: the meshless PINN fluid absorbs the topology change at contact by collocation exclusion alone, no remeshing and no cut cells, and the static-equilibrium contact reaction matches the submerged weight to 0.4% under mesh refinement. We quantify remaining limitations: the warm-started PINN drifts off the Stokes manifold over long horizons, and matched FEM-FEM benchmarks attribute pre-impact squeeze-film signatures to PINN under-resolution.

2504.20908 2026-06-15 cs.LG 版本更新

MOSIC: Model-Agnostic Optimal Subgroup Identification with Multi-Constraint for Improved Reliability

MOSIC: 模型无关的多约束最优子群识别以提升可靠性

Wenxin Chen, Weishen Pan, Kyra Gan, Fei Wang

发表机构 * Cornell University(康奈尔大学) Weill Cornell Medicine(威尔康奈尔医学院) Operations Research and Information Engineering(运筹学与信息工程)

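AI summary: Proposes a unified optimization framework that builds the constraints directly into subgroup identification and solves the resulting problem with a gradient descent-ascent algorithm, yielding model-agnostic optimal subgroups that satisfy multiple constraints.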

详情
AI中文摘要

当前的子群识别方法通常采用两步法:首先估计条件平均处理效应,然后应用阈值或基于规则的程序来定义子群。虽然直观,但这种解耦方法未能纳入对现实临床决策至关重要的关键约束,如子群大小和倾向性重叠。这些约束在根本不同的轴上运作,与CATE估计不同,并且不能自然地适应现有框架,从而限制了这些方法的实际适用性。我们提出了一个统一的优化框架,直接求解原始约束优化问题以识别最优子群。我们的关键创新是将约束原始问题重新表述为无约束可微的最小-最大目标,通过梯度下降-上升算法求解。我们从理论上证明我们的解收敛到可行且局部最优的解。与将约束作为事后过滤器的基于阈值的CATE方法不同,我们的方法在优化过程中直接强制执行约束。该框架是模型无关的,兼容各种CATE估计器,并可扩展到额外约束,如成本限制或公平性标准。在合成和真实数据集上的大量实验证明了其在识别高收益子群的同时更好地满足约束的有效性。

英文摘要

Current subgroup identification methods typically follow a two-step approach: first estimate conditional average treatment effects and then apply thresholding or rule-based procedures to define subgroups. While intuitive, this decoupled approach fails to incorporate key constraints essential for real-world clinical decision-making, such as subgroup size and propensity overlap. These constraints operate on fundamentally different axes than CATE estimation and are not naturally accommodated within existing frameworks, thereby limiting the practical applicability of these methods. We propose a unified optimization framework that directly solves the primal constrained optimization problem to identify optimal subgroups. Our key innovation is a reformulation of the constrained primal problem as an unconstrained differentiable min-max objective, solved via a gradient descent-ascent algorithm. We theoretically establish that our solution converges to a feasible and locally optimal solution. Unlike threshold-based CATE methods that apply constraints as post-hoc filters, our approach enforces them directly during optimization. The framework is model-agnostic, compatible with a wide range of CATE estimators, and extensible to additional constraints like cost limits or fairness criteria. Extensive experiments on synthetic and real-world datasets demonstrate its effectiveness in identifying high-benefit subgroups while maintaining better satisfaction of constraints.
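
The min-max reformulation described in the abstract can be illustrated on a toy problem. The sketch below is not the authors' objective or code: it assumes a made-up list of CATE estimates (`tau_hat`), per-unit soft membership weights, and a single size constraint (`max_fraction`), and runs plain simultaneous gradient descent-ascent on the Lagrangian. All names and step sizes are hypothetical and untuned, and simultaneous descent-ascent may oscillate rather than converge.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def gda_subgroup(tau, max_fraction, steps=2000, lr_theta=0.5, lr_lam=0.05):
    """Soft subgroup weights from gradient descent-ascent on a Lagrangian.

    Toy problem (an assumption, not the paper's objective):
        maximize  mean(w_i * tau_i)   subject to  mean(w_i) <= max_fraction,
    with w_i = sigmoid(theta_i) and Lagrangian
        L(theta, lam) = -mean(w * tau) + lam * (mean(w) - max_fraction).
    """
    n = len(tau)
    theta = [0.0] * n
    lam = 0.0
    for _ in range(steps):
        w = [sigmoid(t) for t in theta]
        # Descent step on theta: dL/dtheta_i = (lam - tau_i) * w_i * (1 - w_i) / n.
        # The constant factor 1/n is folded into the step size.
        theta = [t - lr_theta * (lam - ti) * wi * (1.0 - wi)
                 for t, ti, wi in zip(theta, tau, w)]
        # Ascent step on the multiplier, projected onto lam >= 0.
        lam = max(0.0, lam + lr_lam * (sum(w) / n - max_fraction))
    return [sigmoid(t) for t in theta], lam

tau_hat = [0.1 * k for k in range(1, 11)]  # made-up CATE estimates
weights, lam = gda_subgroup(tau_hat, max_fraction=0.3)
```

The multiplier `lam` acts as a learned threshold on the estimated effect: units whose estimate exceeds it are pushed into the subgroup and the rest are pushed out, which is how the size constraint shapes the solution during optimization instead of being applied afterwards as a filter.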

2606.12360 2026-06-15 cs.LG 版本更新

Anatomy of Post-Training: Using Interpretability to Characterize Data and Shape the Learning Signal

后训练的解剖:利用可解释性表征数据并塑造学习信号

Leon Bergen, Usha Bhalla, Sidharth Baskaran, Max Loeffler, Raphael Sarfati, Dhruvil Gala, Ryan Panwar, Santiago Aranguri, Thomas Fel, Atticus Geiger, Matthew Kowal, Siddharth Boppana, Daniel Balsam, Owen Lewis, Jack Merullo, Thomas McGrath, Ekdeep Singh Lubana

发表机构 * Stanford University(斯坦福大学) Google Research(谷歌研究院)

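AI summary: Proposes a data-centric post-training pipeline that uses interpretability to form statistical hypotheses about the latent concepts in preference data, enabling fine-grained feedback and reducing spurious correlations and undesirable behaviors.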

详情
AI中文摘要

语言模型后训练是塑造模型行为的主要阶段,但它仍然主要涉及优化总结多样需求的标量奖励。这种抽象使从业者几乎无法了解数据实际教会了模型什么,导致模型学习虚假关联,并引发过度风格化和谄媚等不良行为。为了解决这个问题,我们提出:能否在优化之前检查偏好数据集,并在概念层面决定模型应该被允许学习哪些行为?受此启发,我们引入了一个以数据为中心的后训练流程,该流程使用可解释性协议来开发统计假设,以区分偏好和非偏好生成的潜在概念,使其明确以供细粒度用户反馈。基于这一观点,我们将几种基于可解释性的训练协议统一为通过特征或数据干预来塑造奖励的方式。实验上,我们表明我们的流程诊断了现有偏好数据中的不良信号,减轻了脱靶学习,并且还可以帮助放大或塑造期望的属性,如安全防护和模型个性。更广泛地说,我们的结果表明,可解释性可以将后训练从优化不透明的代理奖励转变为审计和塑造学习信号本身的过程。

英文摘要

Language-model post-training is the main stage at which model behavior is shaped, yet it still largely involves optimization of scalar rewards that summarize diverse desiderata. This abstraction gives practitioners little visibility into what their data actually teaches models, allowing spurious correlations to be learned by a model and inducing undesirable behaviors such as over-stylization and sycophancy. To address this problem, we ask: can we inspect a preference dataset before optimization and decide, at the level of concepts, which behaviors a model should be allowed to learn? Motivated by this, we introduce a data-centric post-training pipeline that uses interpretability protocols to develop statistical hypotheses for the latent concepts separating preferred from dispreferred generations, making them explicit for fine-grained user feedback. Building on this view, we unify several interpretability-based training protocols as ways of shaping rewards via feature or data interventions. Empirically, we show that our pipeline diagnoses undesirable signals in existing preference data, mitigates off-target learning, and can also help amplify or shape desired properties such as safeguards and model personality. More broadly, our results suggest that interpretability can turn post-training from optimizing opaque proxy rewards into a process of auditing and sculpting the learning signal itself.

2606.12923 2026-06-15 cs.LG cs.AI cs.CL 版本更新

Order Is Not Control: Driven-Dissipative Response Laws Across Artificial and Biological Systems

秩序并非控制:跨人工与生物系统的驱动-耗散响应定律

Gareth Seneque, Lap-Hang Ho, Nafise Erfanian Saeedi, Jeffrey Molendijk, Tim Elson

发表机构 * Australian Broadcasting Corporation(澳大利亚广播公司)

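AI summary: The paper argues that order is not control, proposes a receiver-gated response law, and identifies it across biological, LLM, adapter, and stochastic-operator panels, concluding that control is local and measurable.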

Comments 52 pages, 7 figures, updated title

详情
AI中文摘要

AI对齐、可解释性、引导和神经扰动研究识别出诱导秩序的对象。我们认为秩序并非控制。控制需要接收器门控的响应定律:一个分母索引算子,将物质状态、动作/驱动、浴和接收器状态映射到响应位移、汇、努力和盆地投影。我们在生物、大语言模型、适配器和随机算子面板中识别出该定律。这些定律是局部的:干预可以被接纳、饱和、变号、泄漏或过驱动,取决于介质、浴、接收器状态、动作端口和比较器。当有限努力在相同分母下移动目标或结果读出类别,而损伤、无效/规避、无效格式、过驱动和不必要努力保持有界时,控制被分配。小鼠ALM、秀丽隐杆线虫和斑马鱼面板提供了物理响应算子证据,同时排除了坐标同一性和控制器结论。大语言模型面板展示了生成输出响应定律:在四种物质条件下,响应向量的分量符号预测准确率为72.8-73.7%,非零分量上提升至84.3-84.8%;留出观察者以93.6%和91.7%的准确率预测系统效应和目标/预言家族。宪法条件适配器将易感性重塑为制备介质,随机算子面板将测量机会与可部署行动策略分离。这给出了介观控制层面的驱动-耗散响应系统描述:驱动通过制备介质、浴和接收器作用,产生接纳运动、阻抗、汇或过驱动。证据支持局部接纳控制和可测量的随机响应算子,同时将可部署的预生成控制、隐藏/logit因果充分性、生物到LLM坐标同一性以及字面热力学量排除在范围之外。

英文摘要

AI alignment, interpretability, steering, and neural perturbation studies identify order-inducing objects. We argue that order is not control. Control requires a receiver-gated response law: a denominator-indexed operator mapping material state, action/drive, bath, and receiver state to response displacement, sinks, effort, and basin projection. We identify it across biological, LLM, adapter, and stochastic-operator panels. The laws are local: an intervention can be admitted, saturated, sign-changing, leaky, or overdriven depending on medium, bath, receiver state, action port, and comparator. Control is assigned when finite effort moves a target or outcome-readout class under the same denominator while damage, null/evasive, invalid format, overdrive, and unnecessary effort stay bounded. Mouse ALM, C. elegans, and zebrafish panels provide physical response-operator evidence while excluding coordinate identity and controller conclusions. LLM panels show generated-output response laws: across four material conditions, response vectors are predictable at 72.8-73.7% component-sign accuracy, rising to 84.3-84.8% on nonzero components; held-out observers predict system-effect and target/oracle families at 93.6% and 91.7% accuracy. Constitution-conditioned adapters reshape susceptibility as prepared media, and stochastic-operator panels separate measured opportunity from deployable action policies. This gives a driven-dissipative response-system account at the mesoscopic control level: drives act through prepared media, baths, and receivers, producing admitted movement, impedance, sinks, or overdrive. The evidence supports local admitted control and measurable stochastic response operators, while leaving deployable pre-generation control, hidden/logit causal sufficiency, biological-to-LLM coordinate identity, and literal thermodynamic quantities outside scope.

2112.04573 2026-06-15 cs.DL cs.AI cs.LG 版本更新

Application of Artificial Intelligence and Machine Learning in Libraries: A Systematic Review

人工智能与机器学习在图书馆中的应用:系统综述

Rajesh Kumar Das, Mohammad Sharif Ul Islam

发表机构 * University of Nebraska - Lincoln(内布拉斯加大学林肯分校) Noakhali Science and Technology University(诺阿克利科学与技术大学) University of Dhaka(达卡大学)

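AI summary: A systematic review of 32 articles summarizes the domains, techniques, and current state of AI and ML applications in libraries, finding that existing research is mainly theoretical, with some implementation projects and case studies.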

详情
AI中文摘要

随着人工智能和机器学习等前沿技术的概念和实施变得相关,学者、研究人员和信息专业人员涉足这一领域的研究。本系统文献综述旨在综合探讨人工智能和机器学习在图书馆中应用的实证研究。为实现研究目标,基于Kitchenham等人(2009)提出的原始指南进行了系统文献综述。数据来自Web of Science、Scopus、LISA和LISTA数据库。经过严格/既定的筛选过程,最终选定、审阅并分析了32篇文章,以总结图书馆中最常使用的AI和ML领域及技术。结果表明,当前与LIS领域相关的AI和ML研究主要集中于理论工作。然而,一些研究人员也强调了实施项目或案例研究。本研究将为研究人员、实践者和教育工作者提供图书馆中AI和ML的全景视图,以推动更多技术导向的方法,并预见未来的创新路径。

英文摘要

As the concept and implementation of cutting-edge technologies like artificial intelligence and machine learning have become relevant, academics, researchers and information professionals have begun to conduct research in this area. The objective of this systematic literature review is to provide a synthesis of empirical studies exploring the application of artificial intelligence and machine learning in libraries. To achieve the objectives of the study, a systematic literature review was conducted based on the original guidelines proposed by Kitchenham et al. (2009). Data was collected from the Web of Science, Scopus, LISA and LISTA databases. Following a rigorous, established selection process, a total of thirty-two articles were selected, reviewed and analyzed to summarize the AI and ML domains and techniques most often used in libraries. Findings show that current AI and ML research relevant to the LIS domain mainly focuses on theoretical works. However, some researchers have also emphasized implementation projects or case studies. This study will provide a panoramic view of AI and ML in libraries for researchers, practitioners and educators, furthering more technology-oriented approaches and anticipating future innovation pathways.

2601.12913 2026-06-15 cs.AI cs.LG cs.NE 版本更新

Actionable Interpretability Must Be Defined in Terms of Symmetries

可操作的可解释性必须根据对称性来定义

Pietro Barbiero, Mateo Espinosa Zarlenga, Francesco Giannini, Alberto Termine, Filippo Bonchi, Mateja Jamnik, Giuseppe Marra

发表机构 * University of Oxford(牛津大学) ETH Zurich(苏黎世联邦理工学院) University of Cambridge(剑桥大学)

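AI summary: The paper argues that interpretability research in AI is fundamentally ill-posed and proposes defining actionable interpretability in terms of four symmetries, which formalize interpretable models and unify interpretable inference.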

详情
AI中文摘要

本文认为,人工智能(AI)中的可解释性研究从根本上是不适定的(ill-posed),因为现有的可解释性定义未能说明如何对可解释性进行形式化检验或面向可解释性进行设计。我们提出,可操作的可解释性定义必须以*对称性*来表述,这些对称性指导模型设计并导出可检验的条件。在概率视角下,我们假设四种对称性(推理等变性、信息不变性、概念闭包不变性和结构不变性)足以(i)将可解释模型形式化为概率模型的一个子类,(ii)把可解释推理(例如对齐、干预和反事实)统一表述为贝叶斯反演的一种形式,以及(iii)提供一个形式化框架,用于验证是否符合安全标准和法规。

英文摘要

This paper argues that interpretability research in Artificial Intelligence (AI) is fundamentally ill-posed as existing definitions of interpretability fail to describe how interpretability can be formally tested or designed for. We posit that actionable definitions of interpretability must be formulated in terms of *symmetries* that inform model design and lead to testable conditions. Under a probabilistic view, we hypothesise that four symmetries (inference equivariance, information invariance, concept-closure invariance, and structural invariance) suffice to (i) formalise interpretable models as a subclass of probabilistic models, (ii) yield a unified formulation of interpretable inference (e.g., alignment, interventions, and counterfactuals) as a form of Bayesian inversion, and (iii) provide a formal framework to verify compliance with safety standards and regulations.

2606.02231 2026-06-15 stat.ML cs.LG stat.ME 版本更新

Identifiable Markov Switching Models with Instantaneous Effects and Exponential Families

具有瞬时效应和指数族的可识别马尔可夫切换模型

Roel Hulsman, Carles Balsells-Rodas, Sara Magliacane

发表机构 * University of Amsterdam(阿姆斯特丹大学)

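AI summary: For non-stationary time series, establishes identifiability of Markov switching models with instantaneous effects under exponential-family noise, and develops FlowMSM, a framework for detecting latent regimes and recovering regime-dependent causal structures.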

Comments International Conference on Machine Learning (ICML) 2026

详情
AI中文摘要

时间系统通常表现出非平稳行为,例如季节性气候变化或1型糖尿病患者的血糖波动。对非平稳性建模的一种方法是通过离散隐状态,即时间的平稳片段。此类系统诱导出马尔可夫切换模型(MSM),这是一类隐马尔可夫模型,其中隐状态和观测变量之间存在自回归依赖关系。在存在频繁状态切换以及非线性和非高斯动态的情况下,特别是在变量之间存在瞬时效应(例如由于测量速率较慢)时,识别隐状态具有挑战性。在这项工作中,我们建立了在时间状态依赖、非线性滞后和瞬时效应以及来自指数族的独立噪声下,隐状态和状态依赖因果结构的可识别性。我们的可识别性理论涵盖了因果模型的非时间混合。此外,我们引入了FlowMSM,这是一个状态检测框架,可与任何平稳因果发现方法配对,以恢复状态依赖的因果结构。在合成基准和金融经济学数据集上的实验证明了我们的方法在检测隐状态和从非平稳时间序列中发现因果结构方面的有效性。

英文摘要

Temporal systems often exhibit non-stationary behaviour, such as seasonal climate variation or glucose fluctuations in patients with type-1 diabetes. One way to model non-stationarity is through discrete latent regimes, i.e., stationary segments of time. Such systems induce a Markov Switching Model (MSM), a class of Hidden Markov Models with autoregressive dependencies among latent regimes and observed variables. Identifying latent regimes is challenging in the presence of frequent regime switches and nonlinear and non-Gaussian dynamics, particularly when there are instantaneous effects between the variables, e.g., due to slow rates of measurements. In this work, we establish the identifiability of both latent regimes and regime-dependent causal structures under temporal regime dependencies, nonlinear lagged and instantaneous effects, and independent noise from the exponential family. Our identifiability theory subsumes non-temporal mixtures of causal models. Furthermore, we introduce FlowMSM, a regime detection framework that can be paired with any stationary causal discovery method to recover regime-dependent causal structures. Experiments on synthetic benchmarks and a financial economics dataset demonstrate the effectiveness of our approach to detect latent regimes and discover causal structures from non-stationary time series.
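
To make the model class concrete, here is a minimal simulation sketch of a two-regime Markov switching process. It is an illustration under simplifying assumptions, not FlowMSM: the series is univariate (so there are no instantaneous effects between variables), each regime is a linear AR(1), and the noise is Gaussian (one member of the exponential family). The function name and all parameter values are hypothetical.

```python
import random

def simulate_msm(n_steps, transition, ar_coef, noise_std, seed=0):
    """Simulate a two-regime Markov switching AR(1) process.

    transition[r][r] is the probability of staying in regime r;
    ar_coef[r] and noise_std[r] are the regime-specific dynamics.
    """
    rng = random.Random(seed)
    regime, x = 0, 0.0
    regimes, values = [], []
    for _ in range(n_steps):
        # The latent regime follows a first-order Markov chain.
        if rng.random() >= transition[regime][regime]:
            regime = 1 - regime
        # The observation depends on its own past through the active regime.
        x = ar_coef[regime] * x + rng.gauss(0.0, noise_std[regime])
        regimes.append(regime)
        values.append(x)
    return regimes, values

regimes, values = simulate_msm(
    n_steps=200,
    transition=[[0.95, 0.05], [0.10, 0.90]],
    ar_coef=[0.9, -0.5],
    noise_std=[0.1, 1.0],
)
```

Only `values` would be observed in practice; recovering `regimes` and the regime-specific dynamics from `values` alone is the identification problem the abstract addresses.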

2606.05264 2026-06-15 cs.LG 版本更新

REGEN: Reference-Guided Synthetic Multivariate Time Series Generation for Forecasting

REGEN:参考引导的合成多元时间序列生成用于预测

Moulik Gupta, Dhruv Kumar, Murari Mandal, Saurabh Deshpande

发表机构 * Birla AI Labs, Office of Ananya Birla(Birla AI实验室,Ananya Birla办公室) Birla Institute of Technology and Science, Pilani(Birla理工学院与科学学院,Pilani) Kalinga Institute of Industrial Technology, Bhubaneswar(Kalinga工业技术学院,Bhubaneswar)

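AI summary: Proposes ReGeN, a reference-guided generative pipeline that decomposes observed sequences into three interpretable components (a periodic backbone, stochastic residuals, and cross-variable dependencies) for controllable synthesis; in low-data settings the generated data can substitute for real data and improve forecasting performance.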

详情
AI中文摘要

训练鲁棒的多元时间序列预测模型需要大规模、多样化的语料库,然而许多现实领域仅提供少量观测序列。现有生成器无法解决这种不匹配:基于先验的方法(如CauKer、TimePFN)产生领域无关的样本,而数据驱动方法(如TimeGAN)将参考视为黑盒监督,丧失了对周期结构、局部变异和跨变量动态的显式控制。我们提出ReGeN,一种参考引导的生成管道,将观测序列视为可控合成的结构支架而非模仿示例。ReGeN将每个参考分解为三个可解释组件:捕获主导领域形态的相位对齐周期骨干;使用深核高斯过程建模的每变量随机残差;以及通过具有拟合耦合系数的结构因果模型注入的滞后感知跨变量依赖。以可控温度采样这些组件可拓宽分布覆盖,同时保留领域基础结构。我们表明,ReGeN生成的数据始终能替代真实兄弟数据,且预测性能下降极小,在交通等强周期领域中甚至能超越真实源数据。我们进一步表明,在ReGeN语料库上预训练的基础模型优于在基于先验和数据驱动的合成替代方案上预训练的模型。这表明,在低数据场景下,如何结构性利用参考数据可能与数据量同样重要。

英文摘要

Training robust multivariate time series forecasting models requires large, diverse corpora, yet many real-world domains provide only a handful of observed sequences. Existing generators fail to resolve this mismatch: prior-based approaches (e.g., CauKer, TimePFN) produce domain-agnostic samples, while data-driven methods (e.g., TimeGAN) treat references as black-box supervision, forfeiting explicit control over periodic structure, local variability, and cross-variable dynamics. We propose ReGeN, a reference-guided generative pipeline that treats observed sequences not as examples to imitate, but as structural scaffolds for controllable synthesis. ReGeN decomposes each reference into three interpretable components: a phase-aligned periodic backbone capturing dominant domain morphology; per-variable stochastic residuals modeled with a deep-kernel Gaussian process; and lag-aware cross-variable dependencies injected through a structural causal model with fitted coupling coefficients. Sampling these components at controllable temperature broadens distributional coverage while preserving domain-grounded structure. We show that ReGeN-generated data consistently substitutes for real sibling data with minimal forecasting degradation, and in strongly periodic domains such as traffic, can outperform the real source itself. We further show that a foundation model pretrained on ReGeN corpora outperforms those pretrained on prior-based and data-driven synthetic alternatives. This suggests that in low-data regimes, how reference data is structurally exploited can matter as much as how much data is available.
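
The first step of the decomposition, separating a periodic backbone from what is left over, can be sketched as follows. This is only a stand-in under stated assumptions: the period is taken as known, the backbone is a plain per-phase mean, and the residual is left raw. The deep-kernel Gaussian process and the structural causal model named in the abstract are not reproduced, and the function names are hypothetical.

```python
def periodic_backbone(series, period):
    """Mean value at each phase of the cycle (a simple phase-aligned template)."""
    if period <= 0 or len(series) < period:
        raise ValueError("need at least one full period of data")
    sums = [0.0] * period
    counts = [0] * period
    for t, value in enumerate(series):
        sums[t % period] += value
        counts[t % period] += 1
    return [s / c for s, c in zip(sums, counts)]

def decompose(series, period):
    """Split a series into a tiled periodic backbone and a residual."""
    template = periodic_backbone(series, period)
    backbone = [template[t % period] for t in range(len(series))]
    residual = [x - b for x, b in zip(series, backbone)]
    return backbone, residual

backbone, residual = decompose([1.0, 2.0, 3.0, 3.0, 4.0, 5.0], period=3)
# backbone -> [2.0, 3.0, 4.0, 2.0, 3.0, 4.0]
# residual -> [-1.0, -1.0, -1.0, 1.0, 1.0, 1.0]
```

A generator in this spirit would then resample the residual component around the fixed backbone, which is what lets synthetic series vary while keeping the reference's periodic shape.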

2508.08935 2026-06-15 cs.LG cs.AI 版本更新

LNN-PINN: A Unified Physics-Only Training Framework with Liquid Residual Blocks

LNN-PINN: 一种带有液体残差块的统一纯物理训练框架

Ze Tao, Hanxuan Wang, Fujun Liu

发表机构 * Nanophotonics and Biophotonics Key Laboratory of Jilin Province, School of Physics, Changchun University of Science and Technology(吉林省纳米光子与生物光子重点实验室,物理学院,长春理工大学) Faculty of Chinese Medicine, Macau University of Science and Technology(澳门科技大学中医药学院)

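AI summary: To address the limited predictive accuracy of physics-informed neural networks on complex problems, proposes LNN-PINN, which adds a liquid residual gating architecture, and reports improved accuracy and stability on several benchmark problems.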

详情
Journal ref
Computer Physics Communications, 326, 110237 (2026)
AI中文摘要

物理信息神经网络(PINNs)因其能够将偏微分方程先验知识整合到深度学习框架中而受到广泛关注;然而,在应用于复杂问题时,它们通常表现出有限的预测精度。为了解决这一问题,我们提出了LNN-PINN,一种物理信息神经网络框架,它结合了液体残差门控架构,同时保留原始的物理建模和优化流程以提高预测精度。该方法仅在隐藏层映射中引入轻量级门控机制,保持采样策略、损失组成和超参数设置不变,以确保改进纯粹来自架构优化。在四个基准问题上,LNN-PINN在相同训练条件下持续降低了RMSE和MAE,绝对误差图进一步证实了其精度提升。此外,该框架在不同维度、边界条件和算子特性下表现出强大的适应性和稳定性。总之,LNN-PINN为提升物理信息神经网络在复杂科学和工程问题中的预测精度提供了一种简洁有效的架构增强方法。

英文摘要

Physics-informed neural networks (PINNs) have attracted considerable attention for their ability to integrate partial differential equation priors into deep learning frameworks; however, they often exhibit limited predictive accuracy when applied to complex problems. To address this issue, we propose LNN-PINN, a physics-informed neural network framework that incorporates a liquid residual gating architecture while preserving the original physics modeling and optimization pipeline to improve predictive accuracy. The method introduces a lightweight gating mechanism solely within the hidden-layer mapping, keeping the sampling strategy, loss composition, and hyperparameter settings unchanged to ensure that improvements arise purely from architectural refinement. Across four benchmark problems, LNN-PINN consistently reduced RMSE and MAE under identical training conditions, with absolute error plots further confirming its accuracy gains. Moreover, the framework demonstrates strong adaptability and stability across varying dimensions, boundary conditions, and operator characteristics. In summary, LNN-PINN offers a concise and effective architectural enhancement for improving the predictive accuracy of physics-informed neural networks in complex scientific and engineering problems.

2603.20821 2026-06-15 cs.DC cs.AI cs.LG 版本更新

Compass: Optimizing Compound AI Workflows for Dynamic Adaptation

Compass: 为动态适应优化复合AI工作流

Milos Gravara, Juan Luis Herrera, Stefan Nastic

发表机构 * University of California, Berkeley(加州大学伯克利分校) ETH Zurich(苏黎世联邦理工学院)

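AI summary: Presents Compass, a framework that switches compound AI workflow configurations dynamically through offline optimization and online adaptation, to better balance accuracy, latency, and cost.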

Comments 10 pages, 7 figures; accepted at the 26th IEEE International Symposium on Cluster, Cloud, and Internet Computing (CCGrid 2026)

详情
Journal ref
In Proceedings of the 26th IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2026
AI中文摘要

复合AI是一种分布式智能方法,通过整合专用AI/ML模型与工程软件组件形成AI工作流。复合AI生产部署必须在变化负载下满足准确性、延迟和成本目标。然而,许多部署运行在固定基础设施上,无法水平扩展。现有方法仅优化准确性,未考虑负载变化。我们发现复合AI系统可切换配置以适应基础设施容量,根据当前负载在准确性与延迟之间进行权衡。这需要从组合搜索空间中发现多个帕累托最优配置,并在运行时确定切换时机。本文提出Compass框架,通过离线优化和在线适应实现动态配置切换。Compass包含三个组件:COMPASS-V算法用于配置发现,Planner用于切换策略推导,Elastico控制器用于运行时适应。COMPASS-V利用有限差分引导搜索和爬山与横向扩展结合的方法发现准确性可行的配置。Planner在目标硬件上对这些配置进行剖析,并利用基于排队理论的模型推导切换策略。Elastico监控队列深度并根据推导的阈值切换配置。在两个复合AI工作流中,COMPASS-V在减少57.5%的配置评估的同时实现100%召回率,效率提升达95.3%。运行时适应在动态负载模式下实现90-98%的SLO合规性,比静态高精度基线提升71.6%的SLO合规性,同时比静态快速基线提高3-5%的精度。

英文摘要

Compound AI is a distributed intelligence approach that represents a unified system orchestrating specialized AI/ML models with engineered software components into AI workflows. Compound AI production deployments must satisfy accuracy, latency, and cost objectives under varying loads. However, many deployments operate on fixed infrastructure where horizontal scaling is not viable. Existing approaches optimize solely for accuracy and do not consider changes in workload conditions. We observe that compound AI systems can switch between configurations to fit infrastructure capacity, trading accuracy for latency based on current load. This requires discovering multiple Pareto-optimal configurations from a combinatorial search space and determining when to switch between them at runtime. We present Compass, a novel framework that enables dynamic configuration switching through offline optimization and online adaptation. Compass consists of three components: COMPASS-V algorithm for configuration discovery, Planner for switching policy derivation, and Elastico Controller for runtime adaptation. COMPASS-V discovers accuracy-feasible configurations using finite-difference guided search and a combination of hill-climbing and lateral expansion. Planner profiles these configurations on target hardware and derives switching policies using a queuing theory based model. Elastico monitors queue depth and switches configurations based on derived thresholds. Across two compound AI workflows, COMPASS-V achieves 100% recall while reducing configuration evaluations by 57.5% on average compared to exhaustive search, with efficiency gains reaching 95.3% at tight accuracy thresholds. Runtime adaptation achieves 90-98% SLO compliance under dynamic load patterns, improving SLO compliance by 71.6% over static high-accuracy baselines, while simultaneously improving accuracy by 3-5% over static fast baselines.
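
The runtime part of this design, switching configurations when queue depth crosses derived thresholds, can be sketched in a few lines. This is a hypothetical illustration, not the Elastico controller: the `Config` type, its fields, and the threshold values are assumptions, and in the paper's pipeline the thresholds would come from the Planner's queuing model instead of being hard-coded.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Config:
    name: str
    accuracy: float
    max_queue_depth: int  # largest queue depth this config is allowed to serve

def choose_config(configs, queue_depth):
    """Pick the most accurate config whose threshold tolerates the current load."""
    eligible = [c for c in configs if queue_depth <= c.max_queue_depth]
    if eligible:
        return max(eligible, key=lambda c: c.accuracy)
    # Overload: fall back to the config with the highest load tolerance.
    return max(configs, key=lambda c: c.max_queue_depth)

configs = [
    Config("accurate", accuracy=0.92, max_queue_depth=4),
    Config("balanced", accuracy=0.89, max_queue_depth=16),
    Config("fast", accuracy=0.87, max_queue_depth=64),
]

for depth in (2, 10, 500):
    print(depth, choose_config(configs, depth).name)
```

Under light load the controller keeps the accurate configuration, and it trades accuracy for latency only once the queue grows past each threshold.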

2602.13040 2026-06-15 cs.LG 版本更新

TCRL: Temporal-Coupled Adversarial Training for Robust Constrained Reinforcement Learning in Worst-Case Scenarios

TCRL: 时序耦合对抗训练用于最坏情况下的鲁棒约束强化学习

Wentao Xu, Zhongming Yao, Weihao Li, Zhenghang Song, Yumeng Song, Tianyi Li, Yushuai Li

发表机构 * Northeastern University(东北大学) Zhejiang University(浙江大学) Aalborg University(奥尔堡大学)

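AI summary: TCRL introduces a temporal-coupled adversarial training framework that addresses the weakness of existing methods against temporally coupled perturbations, improving the worst-case robustness of constrained reinforcement learning.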

详情
Journal ref
Proc. of the 25th International Conference on Autonomous Agents and Multiagent Systems, 3489 - 3491, 2026
AI中文摘要

约束强化学习(CRL)旨在在约束条件下优化决策策略,广泛应用于自动驾驶、机器人和电网管理等安全关键领域。然而,现有鲁棒CRL方法主要关注单步扰动和时间独立对抗模型,缺乏对时间耦合扰动的显式建模。为此,我们提出TCRL,一种新的时序耦合对抗训练框架,用于最坏情况下的鲁棒约束强化学习。首先,TCRL引入了一个最坏情况感知的成本约束函数,用于估计在时间耦合扰动下的安全成本,无需显式建模对抗攻击者。其次,TCRL在奖励上建立双约束防御机制,以对抗时间耦合对手的同时保持奖励的不可预测性。实验结果表明,TCRL在多种CRL任务中均在对抗时间耦合扰动攻击的鲁棒性方面优于现有方法。

英文摘要

Constrained Reinforcement Learning (CRL) aims to optimize decision-making policies under constraint conditions, making it highly applicable to safety-critical domains such as autonomous driving, robotics, and power grid management. However, existing robust CRL approaches predominantly focus on single-step perturbations and temporally independent adversarial models, lacking explicit modeling of robustness against temporally coupled perturbations. To address this gap, we propose TCRL, a novel temporal-coupled adversarial training framework for robust constrained reinforcement learning in worst-case scenarios. First, TCRL introduces a worst-case-perceived cost constraint function that estimates safety costs under temporally coupled perturbations without the need to explicitly model adversarial attackers. Second, TCRL establishes a dual-constraint defense mechanism on the reward to counter temporally coupled adversaries while maintaining reward unpredictability. Experimental results demonstrate that TCRL consistently outperforms existing methods in terms of robustness against temporally coupled perturbation attacks across a variety of CRL tasks.

2512.19805 2026-06-15 cs.LG stat.ME 版本更新

Guardrailed Uplift Targeting: A Causal Optimization Playbook for Marketing Strategy

带护栏的增益定向:面向营销策略的因果优化指南

Deepit Sapru

发表机构 * Deepit Sapru

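AI summary: Proposes a marketing decision framework that optimizes customer targeting by combining heterogeneous treatment effect estimation with explicit business guardrails, aiming to maximize revenue and retention within budget, revenue-protection, and customer-experience constraints.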

详情
AI中文摘要

本文介绍了一个营销决策框架,通过整合异质处理效应估计与明确业务保护规则来优化客户定向。目标是在遵守预算、收入保护和客户体验等约束条件下最大化收入和留存。该框架首先使用提升学习器估计条件平均处理效应(CATE),然后解决一个受约束的分配问题以决定针对谁以及部署哪种优惠。该框架支持留存信息、活动奖励和支出阈值分配的决策。通过离线模拟和在线A/B测试验证,该方法一致优于倾向和静态基线,提供了一个可重复使用的因果定向大规模应用指南。

英文摘要

This paper introduces a marketing decision framework that optimizes customer targeting by integrating heterogeneous treatment effect estimation with explicit business guardrails. The objective is to maximize revenue and retention while adhering to constraints such as budget, revenue protection, and customer experience. The framework first estimates Conditional Average Treatment Effects (CATE) using uplift learners, then solves a constrained allocation problem to decide whom to target and which offer to deploy. It supports decisions in retention messaging, event rewards, and spend-threshold assignment. Validated through offline simulations and online A/B tests, the approach consistently outperforms propensity and static baselines, offering a reusable playbook for causal targeting at scale.
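
The two-stage idea (estimate uplift, then allocate under guardrails) can be illustrated with a small allocation sketch. It assumes the CATE estimates already exist and uses a greedy uplift-per-cost rule under a single budget constraint. The greedy rule is a stand-in heuristic chosen for brevity, not the paper's solver, and the tuple layout and function name are hypothetical.

```python
def allocate_offers(customers, budget):
    """Choose whom to target under a budget.

    customers: list of (customer_id, estimated_uplift, offer_cost) tuples.
    Returns the targeted ids in selection order and the total spend.
    """
    # Guardrail: never target a customer whose estimated uplift is not positive.
    candidates = [c for c in customers if c[1] > 0 and c[2] > 0]
    # Greedy heuristic: highest estimated uplift per unit cost first.
    candidates.sort(key=lambda c: c[1] / c[2], reverse=True)
    targeted, spent = [], 0.0
    for customer_id, _uplift, cost in candidates:
        if spent + cost <= budget:
            targeted.append(customer_id)
            spent += cost
    return targeted, spent

customers = [("a", 5.0, 2.0), ("b", 3.0, 1.0), ("c", -1.0, 1.0), ("d", 4.0, 4.0)]
targeted, spent = allocate_offers(customers, budget=3.0)
# targeted -> ["b", "a"], spent -> 3.0
```

Customer "c" is excluded by the guardrail despite being cheap, and "d" is skipped because the budget is exhausted. A propensity-based rule, which ranks by likelihood to convert and not by incremental effect, could rank these customers differently.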

2512.20932 2026-06-15 cs.LG cs.AI 版本更新

Guardrailed Elasticity Pricing: A Churn-Aware Forecasting Playbook for Subscription Strategy

带护栏的弹性定价:面向订阅策略的流失感知预测指南

Deepit Sapru

发表机构 * Deepit Sapru

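AI summary: Proposes a dynamic pricing framework that combines multivariate demand forecasting, segment-level price elasticity, and churn propensity to optimize revenue and retention. It blends seasonal time-series models with tree-based learners and solves a constrained optimization problem to improve pricing across SaaS portfolios while enforcing customer-experience and ethical guardrails.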

详情
AI中文摘要

本文提出一个营销分析框架,将订阅定价作为带护栏的动态决策系统,结合多变量需求预测、细分群体层面的价格弹性及流失倾向,以优化收入、利润率和留存。该方法融合季节性时间序列模型与基于树的学习器,运行蒙特卡洛情景测试以刻画风险包络,并求解一个约束优化问题,对客户体验、利润率底线和可允许的流失率施加业务护栏。在异质 SaaS 产品组合中经过验证,该方法持续优于静态层级定价和统一提价,其做法是将价格调整重新分配给支付意愿更高的细分群体,同时保护价格敏感的群体。系统设计为可通过模块化 API 进行实时重新校准,并包含模型可解释性以满足治理和合规需求。从管理角度看,该框架可作为策略指南,明确何时从固定定价转向动态定价,如何将定价与客户生命周期价值(CLV)和月度经常性收入(MRR)目标对齐,以及如何嵌入伦理护栏,从而实现可持续增长而不损害客户信任。

英文摘要

This paper presents a marketing analytics framework that operationalizes subscription pricing as a dynamic, guardrailed decision system, uniting multivariate demand forecasting, segment-level price elasticity, and churn propensity to optimize revenue, margin, and retention. The approach blends seasonal time-series models with tree-based learners, runs Monte Carlo scenario tests to map risk envelopes, and solves a constrained optimization that enforces business guardrails on customer experience, margin floors, and allowable churn. Validated across heterogeneous SaaS portfolios, the method consistently outperforms static tiers and uniform uplifts by reallocating price moves toward segments with higher willingness-to-pay while protecting price-sensitive cohorts. The system is designed for real-time recalibration via modular APIs and includes model explainability for governance and compliance. Managerially, the framework functions as a strategy playbook that clarifies when to shift from flat to dynamic pricing, how to align pricing with CLV and MRR targets, and how to embed ethical guardrails, enabling durable growth without eroding customer trust.

2601.08334 2026-06-15 cs.LG 版本更新

Automated Machine Learning in Radiomics: A Comparative Evaluation of Performance, Efficiency and Accessibility

医学影像组学中的自动化机器学习:性能、效率和可及性的比较评估

Jose Lozano-Montoya, Emilio Soria-Olivas, Almudena Fuster-Matanzo, Angel Alberich-Bayarri, Ana Jimenez-Pastor

发表机构 * University of Valencia(瓦伦西亚大学) Research & Frontiers in AI Department, Quantitative Imaging Biomarkers in Medicine, Quibim SL(研究与前沿人工智能部门、定量影像生物标志物在医学中的应用、Quibim SL) Intelligent Data Analysis Laboratory, IDAL, University of Valencia(智能数据分析实验室,IDAL,瓦伦西亚大学)

AI总结 本文比较了通用和专用自动化机器学习框架在医学影像组学分类任务中的性能、效率和可及性,发现专用工具在性能上表现最佳,而通用框架在易用性上更优,但存在生存分析支持不足和特征可重复性整合不足等问题。

Comments 27 pages, 4 figures, 3 tables, code available, see https://github.com/joselznom/AutoML-Comparison-in-Radiomics

详情
Journal ref
JMIR Form Res. 2026;10:e91492
AI中文摘要

自动化机器学习(AutoML)框架使没有编程经验的研究人员也能构建模型,从而降低影像组学中预测和预后模型开发的技术门槛。然而,这些框架在应对影像组学特有挑战方面的有效性仍不明确。本研究评估了通用和影像组学专用AutoML框架在多样化的影像组学分类任务中的性能、效率和可及性,从而突出影像组学的发展需求。研究使用了十个公开/私有影像组学数据集,涵盖不同的成像模态(CT/MRI)、规模、解剖部位和终点。在预定义参数下,通过标准化交叉验证测试了六个通用框架和五个专用框架。评估指标包括AUC、运行时间,以及与软件状态、可及性和可解释性相关的定性方面。Simplatab是一个具有无代码界面的专用工具,取得了最高的平均测试AUC(81.81%),运行时间中等(约1小时)。LightAutoML是一个通用框架,执行速度最快,性能具有竞争力(6分钟内平均AUC为78.74%)。大多数专用框架由于过时、编程要求高或计算效率低而被排除在性能分析之外。相比之下,通用框架在可及性和易用性上表现更优。Simplatab为影像组学分类问题提供了性能、效率和可及性之间的有效平衡。然而,仍存在显著差距,包括缺乏易于使用的生存分析支持,以及当前AutoML框架对特征可重复性和协调化(harmonization)的整合有限。未来研究应聚焦于调整AutoML解决方案,以更好地应对这些影像组学特有的挑战。

英文摘要

Automated machine learning (AutoML) frameworks can lower technical barriers for predictive and prognostic model development in radiomics by enabling researchers without programming expertise to build models. However, their effectiveness in addressing radiomics-specific challenges remains unclear. This study evaluates the performance, efficiency, and accessibility of general-purpose and radiomics-specific AutoML frameworks on diverse radiomics classification tasks, thereby highlighting development needs for radiomics. Ten public/private radiomics datasets with varied imaging modalities (CT/MRI), sizes, anatomies and endpoints were used. Six general-purpose and five radiomics-specific frameworks were tested with predefined parameters using standardized cross-validation. Evaluation metrics included AUC, runtime, together with qualitative aspects related to software status, accessibility, and interpretability. Simplatab, a radiomics-specific tool with a no-code interface, achieved the highest average test AUC (81.81%) with a moderate runtime (~1 hour). LightAutoML, a general-purpose framework, showed the fastest execution with competitive performance (78.74% mean AUC in six minutes). Most radiomics-specific frameworks were excluded from the performance analysis due to obsolescence, extensive programming requirements, or computational inefficiency. Conversely, general-purpose frameworks demonstrated higher accessibility and ease of implementation. Simplatab provides an effective balance of performance, efficiency, and accessibility for radiomics classification problems. However, significant gaps remain, including the lack of accessible survival analysis support and the limited integration of feature reproducibility and harmonization within current AutoML frameworks. Future research should focus on adapting AutoML solutions to better address these radiomics-specific challenges.

2511.17637 2026-06-15 cs.LG cs.CL 版本更新

PocketLLM: Ultimate Compression of Large Language Models via Meta Networks

PocketLLM: 通过元网络实现大语言模型的终极压缩

Ye Tian, Chengcheng Wang, Jing Han, Yehui Tang, Kai Han

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 本文提出PocketLLM,通过元网络在潜在空间压缩大语言模型,利用编码器和解码器实现高效压缩,实验表明在高压缩比下仍保持高精度。

Comments AAAI 2026 camera ready

详情
Journal ref
Proceedings of the AAAI Conference on Artificial Intelligence, 40(39), 33250-33258 (2026)
AI中文摘要

随着大语言模型(LLMs)的持续增长,将其存储和传输到边缘设备变得越来越具有挑战性。传统方法如量化和剪枝在不牺牲精度的情况下难以实现极端压缩。本文介绍了一种新的压缩方法PocketLLM,通过元网络在潜在空间中压缩LLMs。提出一个简单的编码器网络,将LLMs的权重投影到离散的潜在向量中,然后使用紧凑的代码本进行表示。轻量级的解码器网络用于将代码本的代表性向量映射回原始权重空间。该方法仅需一个小解码器、简洁的代码本和一个索引即可实现LLMs中大权重的显著压缩。大量实验表明,PocketLLM在显著的压缩比下仍能保持优越的性能,例如将Llama 2-7B压缩10倍,精度损失微不足道。

英文摘要

As Large Language Models (LLMs) continue to grow in size, storing and transmitting them on edge devices becomes increasingly challenging. Traditional methods like quantization and pruning struggle to achieve extreme compression of LLMs without sacrificing accuracy. In this paper, we introduce PocketLLM, a novel approach to compress LLMs in a latent space via meta-networks. A simple encoder network is proposed to project the weights of LLMs into discrete latent vectors, which are then represented using a compact codebook. A lightweight decoder network is employed to map the codebook's representative vectors back to the original weight space. This method allows for significant compression of the large weights in LLMs, consisting solely of a small decoder, a concise codebook, and an index. Extensive experiments show that PocketLLM achieves superior performance even at significantly high compression ratios, e.g., compressing Llama 2-7B by 10x with a negligible drop in accuracy.
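The storage layout described in the abstract (a codebook plus one index per weight group) can be illustrated with plain vector quantization. The sketch below is only an illustration under stated assumptions: k-means stands in for the learned encoder and codebook, a table lookup stands in for the decoder network, and the names `nearest`, `build_codebook`, `compress`, and `decompress` are hypothetical, not taken from PocketLLM.

```python
import random


def nearest(vector, codebook):
    """Return the index of the codebook entry closest to vector (squared L2)."""
    return min(
        range(len(codebook)),
        key=lambda c: sum((a - b) ** 2 for a, b in zip(vector, codebook[c])),
    )


def build_codebook(vectors, k, iters=10, seed=0):
    """Plain k-means; an assumed stand-in for the learned encoder and codebook."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        assignment = [nearest(v, codebook) for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                codebook[c] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def compress(weights, dim, k):
    """Split a flat weight list into dim-sized vectors and quantize them."""
    vectors = [weights[i:i + dim] for i in range(0, len(weights), dim)]
    codebook = build_codebook(vectors, k)
    indices = [nearest(v, codebook) for v in vectors]
    return codebook, indices


def decompress(codebook, indices):
    """Rebuild an approximate flat weight list by codebook lookup."""
    return [x for i in indices for x in codebook[i]]


# Toy "layer": 256 random weights, grouped into 64 vectors of length 4.
rng = random.Random(1)
weights = [rng.gauss(0.0, 1.0) for _ in range(256)]
codebook, indices = compress(weights, dim=4, k=8)
restored = decompress(codebook, indices)
```

Here 256 floats are replaced by 8 codebook vectors (32 floats) and 64 small integer indices; the reconstruction is lossy, which is the gap a learned decoder is meant to narrow.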

2508.10827 2026-06-15 astro-ph.EP cs.LG 版本更新

Accelerating exoplanet climate modelling: A machine learning approach to complement 3D GCM grid simulations

加速系外行星气候建模:一种机器学习方法用于补充3D GCM网格模拟

Alexander Plaschzug, Amit Reza, Ludmila Carone, Sebastian Gernjak, Christiane Helling

发表机构 * Space Research Institute, Austrian Academy of Sciences(空间研究所,奥地利科学院) Institute for Theoretical Physics and Computational Physics, Graz University of Technology(理论物理与计算物理研究所,格拉茨技术大学) Institute of Physics, University of Graz(物理研究所,格拉茨大学)

AI总结 本文利用机器学习方法预测系外行星的3D温度和风结构,通过训练神经网络和决策树算法,为系外行星气候建模提供高效工具,提升对空间任务观测数据的解释能力。

详情
Journal ref
A&A Volume 706, February 2026
AI中文摘要

随着望远镜技术的发展,观测系外行星大气的能力不断增强,对更精确的3D气候模型的需求也随之增加。然而,大气环流模型(GCMs)计算密集且耗时,难以模拟多种多样的系外行星大气。本文研究了机器学习算法能否预测任意潮汐锁定气态系外行星的3D温度和风场结构。引入了一个新的3D GCM网格,用ExoRad模拟了60颗围绕A、F、G、K和M型恒星运行的膨胀热木星。在该网格上训练密集神经网络(DNN)和决策树算法(XGBoost),以预测局部气体温度以及水平和垂直风。选取类似WASP-121 b、HATS-42 b、NGTS-17 b、WASP-23 b和NGTS-1 b的行星作为测试案例,验证了DNN预测气体温度的可靠性:除一颗行星外,所有行星的计算光谱差异均在32 ppm以内。所开发的机器学习模拟器能够可靠预测围绕A到M型恒星的膨胀温暖至超热潮汐锁定木星的3D温度场,为系外行星集合研究提供快速工具。预测质量足以保证对气相化学乃至云形成和透射光谱的影响极小或没有影响。

英文摘要

With the development of ever-improving telescopes capable of observing exoplanet atmospheres in greater detail and number, there is a growing demand for enhanced 3D climate models to support and help interpret observational data from space missions like CHEOPS, TESS, JWST, PLATO, and Ariel. However, the computationally intensive and time-consuming nature of general circulation models (GCMs) poses significant challenges in simulating a wide range of exoplanetary atmospheres. This study aims to determine whether machine learning (ML) algorithms can be used to predict the 3D temperature and wind structure of arbitrary tidally-locked gaseous exoplanets in a range of planetary parameters. A new 3D GCM grid with 60 inflated hot Jupiters orbiting A, F, G, K, and M-type host stars modelled with ExoRad has been introduced. A dense neural network (DNN) and a decision tree algorithm (XGBoost) are trained on this grid to predict local gas temperatures along with horizontal and vertical winds. To ensure the reliability and quality of the ML model predictions, WASP-121 b, HATS-42 b, NGTS-17 b, WASP-23 b, and NGTS-1 b-like planets, which are all targets for PLATO observation, are selected and modelled with ExoRad and the two ML methods as test cases. The DNN predictions for the gas temperatures are accurate to such a degree that the calculated spectra agree within 32 ppm for all but one planet, for which only a single HCN feature reaches a 100 ppm difference. The developed ML emulators can reliably predict the complete 3D temperature field of an inflated warm to ultra-hot tidally locked Jupiter around A to M-type host stars. They provide a fast tool to complement and extend traditional GCM grids for exoplanet ensemble studies. The quality of the predictions is such that no or minimal effects on the gas phase chemistry, hence on the cloud formation and transmission spectra, are to be expected.

2412.00123 2026-06-15 cs.LG math.PR 版本更新

Electricity Price Prediction Using Multi-Kernel Gaussian Process Regression Combined with Kernel-Based Support Vector Regression

利用多核高斯过程回归与核支持向量回归预测电力价格

Abhinav Das, Stephan Schlüter, Lorenz Schneider

发表机构 * Faculty of Mathematics and Economics, Ulm University(数学与经济学学院,乌尔姆大学) Institute of Energy Engineering and Energy Economics, Ulm University of Applied Sciences(能源工程与能源经济学研究所,乌尔姆应用科学大学) Emlyon Business School, Lyon, France(里昂商学院,法国里昂)

AI总结 本文提出一种新的混合模型用于预测德国电力价格,结合高斯过程回归和支持向量回归,通过选择合适的数据依赖协方差函数提升GPR性能,并利用支持向量回归处理非线性过程和异常值,实验表明优于现有基准模型。

详情
Journal ref
Journal of Forecasting (2026) 45, no. 4: 2059-2077
AI中文摘要

本文提出了一种新的混合模型用于预测德国电力价格。该算法基于高斯过程回归(GPR)和支持向量回归(SVR)的结合。尽管GPR在学习数据中的随机模式和插值方面表现良好,但其在样本外数据上的预测性能并不理想。通过选择合适的数据依赖协方差函数,可以增强GPR对所测试的德国小时电力价格的预测性能。然而,由于样本外预测依赖于训练数据,预测容易受到噪声和异常值的影响。为了解决这个问题,另用SVR单独计算一个预测,该方法采用基于间隔(margin)的优化。这种方法在处理非线性过程和异常值时具有优势,因为只有训练数据中的某些必要点(支持向量)决定回归结果。然后以均匀权重对各个预测进行线性组合。在历史德国电力价格上测试时,该方法优于公开可用的基准,即LASSO估计的自回归模型以及近期研究中提供的深度神经网络。

英文摘要

This paper presents a new hybrid model for predicting German electricity prices. The algorithm is based on a combination of Gaussian Process Regression (GPR) and Support Vector Regression (SVR). Although GPR is a competent model for learning stochastic patterns within data and for interpolation, its performance for out-of-sample data is not very promising. By choosing a suitable data-dependent covariance function, we can enhance the performance of GPR for the German hourly power prices being tested. However, since the out-of-sample prediction is dependent on the training data, the prediction is vulnerable to noise and outliers. To overcome this issue, a separate prediction is calculated using SVR, which applies margin-based optimization. This method is advantageous when dealing with non-linear processes and outliers, since only certain necessary points (support vectors) in the training data are responsible for regression. The individual predictions are then linearly combined using uniform weights. When tested on historic German power prices, this approach outperforms the publicly available benchmarks, namely the LASSO-estimated autoregressive regression model and the deep neural network provided in the recent research by [1].
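The final step of the abstract, a linear combination with uniform weights, can be sketched as follows. This is a minimal illustration, assuming the GPR and SVR forecasts are already available as equal-length lists; the function names `combine_uniform` and `mae` and all numbers are hypothetical and do not come from the paper.

```python
def combine_uniform(forecasts):
    """Average several forecasts hour by hour, giving each model weight 1/n."""
    n = len(forecasts)
    horizon = len(forecasts[0])
    if any(len(f) != horizon for f in forecasts):
        raise ValueError("all forecasts must cover the same horizon")
    return [sum(f[t] for f in forecasts) / n for t in range(horizon)]


def mae(actual, predicted):
    """Mean absolute error between two equal-length series."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up hourly prices in EUR/MWh, purely for illustration.
gpr_forecast = [52.0, 48.0, 61.0]
svr_forecast = [50.0, 50.0, 57.0]
actual_price = [51.5, 49.5, 58.0]

hybrid = combine_uniform([gpr_forecast, svr_forecast])
errors = {
    "gpr": mae(actual_price, gpr_forecast),
    "svr": mae(actual_price, svr_forecast),
    "hybrid": mae(actual_price, hybrid),
}
```

Uniform weights need no tuning data, which is one plausible reason to prefer them over fitted combination weights; the abstract itself does not state the motivation.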

2501.15196 2026-06-15 stat.ML cs.LG 版本更新

A Review on Self-Supervised Learning for Time Series Anomaly Detection: Recent Advances and Open Challenges

时间序列异常检测中自监督学习的综述:最新进展与开放挑战

Aitor Sánchez-Ferrera, Borja Calvo, Jose A. Lozano

发表机构 * University of the Basque Country UPV/EHU(巴斯克大学UPV/EHU) Basque Center for Applied Mathematics (BCAM)(巴斯克应用数学中心)

AI总结 本文综述了时间序列异常检测中自监督学习的最新方法,提出分类体系以理解其多样性,并提供GitHub仓库供后续更新。

详情
AI中文摘要

时间序列异常检测面临诸多挑战,这源于时间依赖数据的序列性和动态性。传统无监督方法常在泛化能力上遇到困难,往往过度拟合训练期间观察到的已知正常模式,难以适应未见过的正常情况。为解决这一限制,时间序列的自监督技术引起了关注,作为克服这一障碍并提升异常检测器性能的潜在解决方案。本文综述了近期利用自监督学习进行时间序列异常检测的方法。提出了一种分类体系,根据其主要特征对这些方法进行分类,有助于清晰理解该领域内的多样性。本文调查中包含的信息,以及将定期更新的额外细节,可在以下GitHub仓库中找到:https://github.com/Aitorzan3/Awesome-Self-Supervised-Time-Series-Anomaly-Detection。

英文摘要

Time series anomaly detection presents various challenges due to the sequential and dynamic nature of time-dependent data. Traditional unsupervised methods frequently encounter difficulties in generalization, often overfitting to known normal patterns observed during training and struggling to adapt to unseen normality. In response to this limitation, self-supervised techniques for time series have garnered attention as a potential solution to overcome this obstacle and enhance the performance of anomaly detectors. This paper presents a comprehensive review of the recent methods that make use of self-supervised learning for time series anomaly detection. A taxonomy is proposed to categorize these methods based on their primary characteristics, facilitating a clear understanding of their diversity within this field. The information contained in this survey, along with additional details that will be periodically updated, is available on the following GitHub repository: https://github.com/Aitorzan3/Awesome-Self-Supervised-Time-Series-Anomaly-Detection.

2506.18271 2026-06-15 cs.LG 版本更新

Memory-Augmented Architecture for Long-Term Context Handling in Large Language Models

具有长时上下文处理能力的大型语言模型记忆增强架构

Haseeb Ullah Khan Shinwari, Muhammad Usama

发表机构 * Newton AI Lab(牛顿AI实验室) School of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST)(韩国科学技术院电气工程学院)

AI总结 本文提出一种记忆增强架构,通过动态检索、更新和剪枝过去交互信息,提升大型语言模型的长时上下文处理能力,实验表明该方法能有效提高上下文连贯性、降低内存开销并提升响应质量。

详情
Journal ref
IEEE Transactions on Artificial Intelligence, 2026
AI中文摘要

大型语言模型在维护长对话中的一致性交互时面临显著挑战,由于其有限的上下文记忆能力,导致对话碎片化和响应相关性降低,影响用户体验。为解决这些问题,我们提出了一种记忆增强的架构,该架构能够动态地从过去交互中检索、更新和剪枝相关信息,从而确保有效的长时上下文处理。实验结果表明,我们的解决方案显著提高了上下文连贯性,减少了内存开销,并增强了响应质量,展示了其在交互系统中的实时应用潜力。

英文摘要

Large Language Models face significant challenges in maintaining coherent interactions over extended dialogues due to their limited contextual memory. This limitation often leads to fragmented exchanges and reduced relevance in responses, diminishing user experience. To address these issues, we propose a memory-augmented architecture that dynamically retrieves, updates, and prunes relevant information from past interactions, ensuring effective long-term context handling. Experimental results demonstrate that our solution significantly improves contextual coherence, reduces memory overhead, and enhances response quality, showcasing its potential for real-time applications in interactive systems.
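The abstract names three memory operations (retrieve, update, prune) without giving details. The toy store below only makes those three verbs concrete and is not the authors' architecture: word-overlap scoring, a fixed capacity, and least-recently-used pruning are all assumptions, and `MemoryStore` is a hypothetical name.

```python
class MemoryStore:
    """Toy dialogue memory: keyword retrieval plus least-recently-used pruning."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}  # memory text -> step at which it was last used
        self.step = 0

    def update(self, text):
        """Insert or refresh a memory, then prune if over capacity."""
        self.step += 1
        self.entries[text] = self.step
        self._prune()

    def retrieve(self, query, top_k=2):
        """Return up to top_k memories sharing at least one word with the query."""
        query_words = set(query.lower().split())
        scored = [
            (len(query_words & set(text.lower().split())), text)
            for text in self.entries
        ]
        ranked = sorted(scored, key=lambda pair: -pair[0])
        hits = [text for score, text in ranked if score > 0][:top_k]
        self.step += 1
        for text in hits:  # retrieved memories count as recently used
            self.entries[text] = self.step
        return hits

    def _prune(self):
        """Drop the least recently used memories until capacity is respected."""
        while len(self.entries) > self.capacity:
            oldest = min(self.entries, key=self.entries.get)
            del self.entries[oldest]


store = MemoryStore(capacity=2)
store.update("user likes green tea")
store.update("meeting is on friday")
hits = store.retrieve("what tea does the user like")
store.update("project deadline moved")  # forces one memory to be pruned
```

In a real system the overlap score would likely be replaced by embedding similarity; that substitution is outside what the abstract specifies.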

2506.09087 2026-06-15 cs.LG math.PR q-bio.NC stat.ML 版本更新

Spiking Neural Models for Decision-Making Tasks with Learning

基于学习的脉冲神经模型用于决策任务

Sophie Jaffard, Giulia Mezzadri, Patricia Reynaud-Bouret, Etienne Tanré

发表机构 * Cognition and Decision Lab, Columbia University(认知与决策实验室,哥伦比亚大学)

AI总结 本文提出一种生物合理性的脉冲神经网络模型,结合学习机制和多变量Hawkes过程,用于决策任务,通过耦合DDM与Poisson计数器模型,推导出带有相关噪声的DDM,并设计在线分类任务验证模型预测。

详情
AI中文摘要

在认知领域,决策任务中的响应时间和选择通常用漂移扩散模型(DDMs)建模,该模型将决策证据的累积描述为随机过程,特别是布朗运动,其中漂移速率反映证据强度。同样,泊松计数器模型将证据累积描述为离散事件,其计数随时间建模为泊松过程,并可解释为神经元活动。然而,这些模型缺乏学习机制且局限于参与者已知类别任务。为弥合认知与生物模型之间的差距,本文提出一种生物合理性的脉冲神经网络(SNN)模型,用于决策任务,该模型包含学习机制,其神经元活动由多变量Hawkes过程建模。首先,我们证明了DDM与泊松计数器模型之间的耦合结果,表明这两个模型提供相似的分类和响应时间,并且DDM可近似由脉冲泊松神经元建模。为进一步推进,我们证明了一个具有相关噪声的特定DDM可从由局部学习规则支配的脉冲神经元Hawkes网络中推导出来。此外,我们设计了一个在线分类任务来评估模型预测。本文为将生物相关神经机制整合到认知模型中提供了重要进展,促进了对神经活动与行为之间关系的深入理解。

英文摘要

In cognition, response times and choices in decision-making tasks are commonly modeled using Drift Diffusion Models (DDMs), which describe the accumulation of evidence for a decision as a stochastic process, specifically a Brownian motion, with the drift rate reflecting the strength of the evidence. In the same vein, the Poisson counter model describes the accumulation of evidence as discrete events whose counts over time are modeled as Poisson processes, and has a spiking-neuron interpretation, as these processes are used to model neuronal activities. However, these models lack a learning mechanism and are limited to tasks where participants have prior knowledge of the categories. To bridge the gap between cognitive and biological models, we propose a biologically plausible Spiking Neural Network (SNN) model for decision-making that incorporates a learning mechanism and whose neuronal activities are modeled by a multivariate Hawkes process. First, we show a coupling result between the DDM and the Poisson counter model, establishing that these two models provide similar categorizations and reaction times and that the DDM can be approximated by spiking Poisson neurons. To go further, we show that a particular DDM with correlated noise can be derived from a Hawkes network of spiking neurons governed by a local learning rule. In addition, we designed an online categorization task to evaluate the model predictions. This work provides a significant step toward integrating biologically relevant neural mechanisms into cognitive models, fostering a deeper understanding of the relationship between neural activity and behavior.
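The Poisson counter model mentioned above can be simulated in a few lines. The sketch assumes a two-alternative, race-to-threshold variant (the first counter to reach a fixed count decides), which may differ from the exact formulation used in the paper; `poisson_counter_trial` and all parameter values are hypothetical.

```python
import random


def poisson_counter_trial(rate_a, rate_b, threshold, rng):
    """Simulate one decision as a race between two Poisson spike counters.

    Returns (choice, decision_time), where choice is 0 for A and 1 for B.
    """
    t = 0.0
    counts = [0, 0]
    total_rate = rate_a + rate_b
    while True:
        # The superposition of two Poisson processes is Poisson with the
        # summed rate, so the waiting time to the next spike is exponential.
        t += rng.expovariate(total_rate)
        # Each spike belongs to counter A with probability rate_a / total_rate.
        winner = 0 if rng.random() < rate_a / total_rate else 1
        counts[winner] += 1
        if counts[winner] >= threshold:
            return winner, t


rng = random.Random(0)
trials = [poisson_counter_trial(3.0, 1.0, 5, rng) for _ in range(2000)]
share_a = sum(1 for choice, _ in trials if choice == 0) / len(trials)
mean_rt = sum(time for _, time in trials) / len(trials)
```

Raising `rate_a` relative to `rate_b` plays the role of a stronger drift in a DDM: choices for A become more frequent and decisions faster.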

2505.04907 2026-06-15 cs.LG 版本更新

VaCDA: Variational Contrastive Alignment-based Scalable Human Activity Recognition

VaCDA:基于变分对比对齐的可扩展人类活动识别

Soham Khisa, Avijoy Chakma

发表机构 * Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology(计算机科学与工程系,孟加拉国工程技术大学) Department of Computer Science, Bowie State University(计算机科学系,鲍伊州立大学)

AI总结 本文提出VaCDA框架,结合变分自编码器和对比学习,解决多源领域适应中的数据异质性问题,提升跨人物、跨位置和跨设备场景下的活动识别性能。

详情
AI中文摘要

技术进步推动了带传感器的可穿戴设备的兴起,这些设备持续监测用户活动,生成大量未标记数据。这类数据难以解读,手动标注费时且易出错。此外,由于设备放置位置、设备类型和用户行为的差异,数据分布往往是异质的。因此,传统迁移学习方法效果不佳,难以识别日常活动。为解决这些问题,我们使用变分自编码器(VAE)从可用传感器数据中学习共享的低维潜在空间。该空间能在不同传感器之间泛化数据,缓解异质性并有助于稳健地适应目标域。我们引入对比学习以增强特征表示,使不同域中同一类的实例对齐,并将不同类的实例分开。我们提出变分对比域适应(VaCDA),一种结合VAE和对比学习的多源域适应框架,用于改进特征表示并减少源域和目标域之间的异质性。我们在多个公开数据集上,针对跨人物、跨位置和跨设备三种异质场景评估了VaCDA。VaCDA在跨位置和跨设备场景中优于基线方法。

英文摘要

Technological advancements have led to the rise of wearable devices with sensors that continuously monitor user activities, generating vast amounts of unlabeled data. This data is challenging to interpret, and manual annotation is labor-intensive and error-prone. Additionally, data distribution is often heterogeneous due to device placement, type, and user behavior variations. As a result, traditional transfer learning methods perform suboptimally, making it difficult to recognize daily activities. To address these challenges, we use a variational autoencoder (VAE) to learn a shared, low-dimensional latent space from available sensor data. This space generalizes data across diverse sensors, mitigating heterogeneity and aiding robust adaptation to the target domain. We integrate contrastive learning to enhance feature representation by aligning instances of the same class across domains while separating different classes. We propose Variational Contrastive Domain Adaptation (VaCDA), a multi-source domain adaptation framework combining VAEs and contrastive learning to improve feature representation and reduce heterogeneity between source and target domains. We evaluate VaCDA on multiple publicly available datasets across three heterogeneity scenarios: cross-person, cross-position, and cross-device. VaCDA outperforms the baselines in cross-position and cross-device scenarios.

2311.05139 2026-06-15 cs.LG 版本更新

Hard-Negative Sampling for Contrastive Learning: Optimal Representation Geometry and Neural- vs Dimensional-Collapse

对比学习中的难负样本采样:最优表示几何以及神经坍缩与维度坍缩

Ruijie Jiang, Thuan Nguyen, Shuchin Aeron, Prakash Ishwar

发表机构 * Department of Electrical Engineering, Tufts University(Tufts大学电气工程系) Department of Engineering, Engineering Technology, East Tennessee State University(东田纳西州立大学工程系) Department of Electrical and Computer Engineering, Boston University(波士顿大学电气与计算机工程系)

AI总结 本文证明,在对比学习中,SCL、HSCL和UCL的损失由呈现神经坍缩几何的表示最小化,且HSCL和HUCL损失分别以对应的SCL和UCL损失为下界。同时,在随机初始化、合适的难度级别并加入特征归一化的条件下,Adam优化可收敛至神经坍缩几何;若不使用难负样本或特征归一化,则会导致维度坍缩。

Comments Final version: Reviewed and accepted to TMLR April 2025. Updated exposition, Added analysis of lower bounds

详情
Journal ref
Transactions on Machine Learning Research, 2025
AI中文摘要

对于一个被广泛研究的数据模型以及一般的损失函数和样本硬化函数,我们证明监督对比学习(SCL)、难负样本SCL(HSCL)和无监督对比学习(UCL)的损失由呈现神经坍缩(NC)的表示最小化,即类均值构成等角紧框架(ETF),且同类数据被映射到同一表示。我们还证明,对于任意表示映射,HSCL和难负样本UCL(HUCL)的损失分别以对应的SCL和UCL损失为下界。与现有文献不同,我们关于SCL的理论结果不要求增强视图的类条件独立性,并适用于包含广泛使用的InfoNCE损失在内的一般损失函数类。此外,我们的证明更简单、紧凑且透明。与现有文献类似,我们的理论结论同样适用于实际中使用批处理进行优化的场景。我们首次通过实验表明,在随机初始化和合适的难度级别下,若加入单位球或单位球面特征归一化,用Adam(带批处理)优化HSCL和HUCL损失确实可以收敛到NC几何。然而,若不引入难负样本或特征归一化,通过Adam学到的表示会出现维度坍缩(DC),无法达到NC几何。这些结果展示了难负样本采样在对比表示学习中的作用,最后我们提出若干开放的理论问题以供未来研究。代码见 https://github.com/rjiang03/HCL/tree/main

英文摘要

For a widely-studied data model and general loss and sample-hardening functions we prove that the losses of Supervised Contrastive Learning (SCL), Hard-SCL (HSCL), and Unsupervised Contrastive Learning (UCL) are minimized by representations that exhibit Neural-Collapse (NC), i.e., the class means form an Equiangular Tight Frame (ETF) and data from the same class are mapped to the same representation. We also prove that for any representation mapping, the HSCL and Hard-UCL (HUCL) losses are lower bounded by the corresponding SCL and UCL losses. In contrast to existing literature, our theoretical results for SCL do not require class-conditional independence of augmented views and work for a general loss function class that includes the widely used InfoNCE loss function. Moreover, our proofs are simpler, compact, and transparent. Similar to existing literature, our theoretical claims also hold for the practical scenario where batching is used for optimization. We empirically demonstrate, for the first time, that Adam optimization (with batching) of HSCL and HUCL losses with random initialization and suitable hardness levels can indeed converge to the NC-geometry if we incorporate unit-ball or unit-sphere feature normalization. Without incorporating hard-negatives or feature normalization, however, the representations learned via Adam suffer from Dimensional-Collapse (DC) and fail to attain the NC-geometry. These results exemplify the role of hard-negative sampling in contrastive representation learning and we conclude with several open theoretical problems for future work. The code can be found at https://github.com/rjiang03/HCL/tree/main
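The Neural-Collapse geometry in the abstract can be made concrete with a small numeric check. The sketch assumes the simplex ETF that is usually meant in the neural-collapse literature (K unit-norm class means with pairwise inner product -1/(K-1)); the construction and the name `simplex_etf` are illustrative and not taken from the paper.

```python
def simplex_etf(k):
    """Rows are k unit vectors in R^k with pairwise inner product -1/(k-1).

    Built as sqrt(k/(k-1)) * (I - (1/k) * ones), a standard construction.
    """
    scale = (k / (k - 1)) ** 0.5
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
        for i in range(k)
    ]


def inner(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


K = 4
means = simplex_etf(K)
# Gram matrix of the class means: 1 on the diagonal, -1/(K-1) elsewhere.
gram = [[inner(means[i], means[j]) for j in range(K)] for i in range(K)]
```

Under Neural-Collapse every sample of class i would map exactly to `means[i]`, whereas under Dimensional-Collapse the learned representations would span only a low-dimensional subspace.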

2209.00078 2026-06-15 cs.LG 版本更新

Supervised Contrastive Learning with Hard Negative Samples

带有难负样本的监督对比学习

Ruijie Jiang, Thuan Nguyen, Prakash Ishwar, Shuchin Aeron

发表机构 * Dept. of ECE, Tufts University(塔夫茨大学电气与计算机工程系) Dept. of CS, Tufts University(塔夫茨大学计算机科学系) Dept. of ECE, Boston University(波士顿大学电气与计算机工程系)

AI总结 本文提出H-SCL,通过硬化函数调整类条件负采样分布,提升对比学习在下游分类任务中的性能,并分析H-SCL损失与H-UCL损失的关系。

详情
Journal ref
2024 International Joint Conference on Neural Networks (IJCNN), pp. 1-8, 2024
AI中文摘要

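通过最小化适当的损失函数(如InfoNCE损失),对比学习(CL)在嵌入空间中将正样本拉近、将负样本推远,从而学习有用的表示函数。正样本通常通过“保持标签”的增强生成,即对给定数据(锚点)进行特定领域的变换。在缺乏类别信息的无监督CL(UCL)中,负样本通常从整个数据集上预设的负采样分布中随机选取,且与锚点无关,这会导致UCL中的类别冲突。监督CL(SCL)通过将负采样分布限定在与锚点标签不同的样本上来避免这种类别冲突。在已被证明能有效进一步提升UCL的难负样本UCL(H-UCL)中,负采样分布通过硬化函数有条件地向更接近锚点的样本倾斜。受此启发,本文提出难负样本SCL(H-SCL),其中类条件负采样分布通过硬化函数被倾斜。仿真结果证实了H-SCL相对于SCL的效用,在下游分类任务中带来显著的性能提升。在理论上,我们证明在每个锚点的负样本数趋于无穷且满足一个适当假设的情况下,H-SCL损失以H-UCL损失为上界,从而说明了在缺乏标签信息时用H-UCL控制H-SCL损失的合理性。通过在多个数据集上的实验,我们验证了该假设以及所声称的H-UCL与H-SCL损失之间的不等式。我们还给出一个合理的情形,其中H-SCL损失以UCL损失为下界,表明UCL在控制H-SCL损失方面的作用有限。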

英文摘要

Through minimization of an appropriate loss function such as the InfoNCE loss, contrastive learning (CL) learns a useful representation function by pulling positive samples close to each other while pushing negative samples far apart in the embedding space. The positive samples are typically created using "label-preserving" augmentations, i.e., domain-specific transformations of a given datum or anchor. In the absence of class information, in unsupervised CL (UCL), the negative samples are typically chosen randomly and independently of the anchor from a preset negative sampling distribution over the entire dataset. This leads to class collisions in UCL. Supervised CL (SCL) avoids this class collision by conditioning the negative sampling distribution to samples having labels different from that of the anchor. In hard-UCL (H-UCL), which has been shown to be an effective method to further enhance UCL, the negative sampling distribution is conditionally tilted, by means of a hardening function, towards samples that are closer to the anchor. Motivated by this, in this paper we propose hard-SCL (H-SCL) wherein the class conditional negative sampling distribution is tilted via a hardening function. Our simulation results confirm the utility of H-SCL over SCL with significant performance gains in downstream classification tasks. Analytically, we show that in the limit of infinite negative samples per anchor and a suitable assumption, the H-SCL loss is upper bounded by the H-UCL loss, thereby justifying the utility of H-UCL for controlling the H-SCL loss in the absence of label information. Through experiments on several datasets, we verify the assumption as well as the claimed inequality between H-UCL and H-SCL losses. We also provide a plausible scenario where H-SCL loss is lower bounded by UCL loss, indicating the limited utility of UCL in controlling the H-SCL loss.
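The tilting of the negative sampling distribution can be sketched as a reweighting of the negatives inside an InfoNCE-style loss. This is an illustration only: the exponential hardening function `exp(beta * similarity)`, the temperature, and the name `hard_infonce` are assumptions rather than the paper's definitions. For the supervised (H-SCL) case the caller is assumed to pass only negatives whose labels differ from the anchor's.

```python
import math


def dot(u, v):
    """Inner product used as the similarity between two embeddings."""
    return sum(a * b for a, b in zip(u, v))


def hard_infonce(anchor, positive, negatives, beta=1.0, temperature=0.5):
    """InfoNCE-style loss with negatives reweighted by a hardening function.

    Weights are proportional to exp(beta * similarity) and normalised to
    mean 1, so beta = 0 recovers the usual unweighted loss.
    """
    pos_term = math.exp(dot(anchor, positive) / temperature)
    sims = [dot(anchor, neg) for neg in negatives]
    raw = [math.exp(beta * s) for s in sims]
    mean_raw = sum(raw) / len(raw)
    weights = [r / mean_raw for r in raw]
    neg_term = sum(w * math.exp(s / temperature) for w, s in zip(weights, sims))
    return -math.log(pos_term / (pos_term + neg_term))


# Toy 2-D embeddings, purely for illustration.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[0.8, 0.2], [-0.5, 0.5], [0.0, 1.0]]

plain_loss = hard_infonce(anchor, positive, negatives, beta=0.0)
hard_loss = hard_infonce(anchor, positive, negatives, beta=2.0)
```

Because the weights grow with similarity, the negatives closest to the anchor dominate the denominator, so in this sketch the hardened loss is never smaller than the unweighted one.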