arXivDaily arXiv每日学术速递 周一至周五更新
重置
2605.20182 2026-05-20 cs.LG cs.AI 版本更新

Atoms of Thought: Universal EEG Representation Learning with Microstates

思想的原子:基于微状态的通用EEG表示学习

Xinyang Tian, Ruitao Liu, Ziyi Ye, Siyang Xue, Xin Wang, Xuesong Chen

发表机构 * Institute for Interdisciplinary Information Sciences, Tsinghua University(清华大学交叉信息研究院) Institute of Trustworthy Embodied AI, Fudan University(复旦大学可信具身人工智能研究院) School of Clinical Medicine, Tsinghua University(清华大学临床医学院) Beijing Five Seasons Medical Technology Co., Ltd.(北京五 Seasons 医疗科技有限公司)

AI总结 本文提出了一种基于微状态的通用EEG表示学习方法,通过将连续EEG信号聚类为离散的微状态序列,构建了一个通用的微状态分词器,并在睡眠分期、情绪识别和运动想象分类等下游任务中展示了其优越性,同时提高了可解释性和扩展性。

Comments Accepted by the 3rd International Workshop on Multimodal and Responsible Affective Computing (MRAC 2025). 8 pages of main text, 23 pages total, 5 figures, 4 tables

详情
AI中文摘要

从脑电图(EEG)信号中学习通用表示是神经信息学和脑机接口(BCIs)领域的一项前沿技术。传统上,EEG被视为多变量时间序列,其中时间域或频域特征被提取用于表示学习。本文研究了一种简单而有效的EEG表示,即微状态。微状态代表了在微观时间尺度上大脑活动模式的基本构建块。通过从大规模医疗EEG数据集中对连续EEG信号进行聚类,构建了一个通用的微状态分词器。该微状态分词器被广泛应用于一系列下游任务,包括睡眠分期、情绪识别和运动想象分类。实验结果表明,使用微状态进行EEG表示学习在不同模型和不同任务中均优于传统的时间域和频域特征。进一步分析显示,微状态提供了更高的可解释性和可扩展性,从而在认知神经科学和临床研究中开辟了应用。

英文摘要

Learning universal representations from electroencephalogram (EEG) signals is a cutting-edge approach in the field of neuroinformatics and brain-computer interfaces (BCIs). Conventionally, EEG is treated as a multivariate temporal signal, where time- or frequency-domain features are extracted for representation learning. This paper investigates a simple yet effective EEG representation, i.e., microstates. Microstates represent the building blocks of brain activity patterns at a microscopic time scale. We build a universal microstate tokenizer from a large medical EEG dataset by clustering continuous EEG signals into sequences of discrete microstates. The microstate tokenizer is then adopted universally across a series of downstream tasks, including sleep staging, emotion recognition, and motor imagery classification. Experimental results show that EEG representation learning with microstates outperforms traditional time-domain and frequency-domain features under different models and across different tasks. Further analysis shows that microstates offer greater interpretability and scalability, thereby opening up applications in both cognitive neuroscience and clinical research.

2605.20174 2026-05-20 cs.CV cs.LG 版本更新

Multi-axis Analysis of Image Manipulation Localization

多轴分析图像操纵定位

Keanu Nichols, Divya Appapogu, Giscard Biamby, Dina Bashkirova, Anna Rohrbach, Bryan A. Plummer

发表机构 * Boston University(波士顿大学) University of California, Berkeley(加州大学伯克利分校) Technical University of Darmstadt(德累斯顿技术大学)

AI总结 本文提出AUDITS基准,用于多轴分析图像操纵检测,通过不同领域转移类型评估现有方法的鲁棒性,以推动更可靠和通用的图像操纵检测方法的发展。

Comments 28 pages, 5 figures, 5 tables

详情
AI中文摘要

先进的图像编辑软件使创建高度逼真的图像操纵变得容易,近年来由于生成式AI的进步,这种能力变得更加普及。虽然操纵的图像通常无害,但它们可能传播虚假信息、制造虚假叙述并影响人们对重要问题的看法。尽管这种威胁日益增长,但针对不同视觉领域检测高级操纵的研究仍然有限。因此,我们引入了Analysis Under Domain-shifts, QualIty, Type, and Size (AUDITS),一个全面的基准,用于研究图像操纵检测中的分析轴。AUDITS包含来自两个不同来源(用户和新闻照片)的超过530,000张图像。我们通过最近的扩散基填充技术整理数据集,以支持跨多个轴的分析,涵盖多样化的操纵类型和尺寸。我们通过不同的领域转移类型进行实验,以评估现有图像操纵检测方法的鲁棒性。我们的目标是通过提供新的见解来推动该领域进一步研究,以帮助开发更可靠和通用的图像操纵检测方法。

英文摘要

Advanced image editing software enables easy creation of highly convincing image manipulations, which has been made even more accessible in recent years due to advances in generative AI. Manipulated images, while often harmless, could spread misinformation, create false narratives, and influence people's opinions on important issues. Despite this growing threat, there is limited research on detecting advanced manipulations across different visual domains. Thus, we introduce Analysis Under Domain-shifts, qualIty, Type, and Size (AUDITS), a comprehensive benchmark designed for studying axes of analysis in image manipulation detection. AUDITS comprises over 530K images from two distinct sources (user and news photos). We curate our dataset to support analysis across multiple axes using recent diffusion-based inpaintings, spanning a diverse range of manipulation types and sizes. We conduct experiments under different types of domain shift to evaluate robustness of existing image manipulation detection methods. Our goal is to drive further research in this area by offering new insights that would help develop more reliable and generalizable image manipulation detection methods.

2605.20167 2026-05-20 cs.AI cs.LG 版本更新

HaorFloodAlert: Deseasonalized ML Ensemble for 72-Hour Flood Prediction in Bangladesh Haor Wetlands

HaorFloodAlert: 用于孟加拉国Haor湿地72小时洪水预测的去季节化机器学习集成

Salma Hoque Talukdar Koli, Fahima Haque Talukder Jely, Md. Samiul Alim, Md. Zakir Hossen

发表机构 * 1 Department of Computer Science Engineering, RTM Al-Kabir Technical University, Sylhet-3100, Bangladesh 2 Department of Computer Science Engineering, North East University Bangladesh, Sylhet, Bangladesh 3 Department of Computer Science Engineering, Dhaka University of Engineering \& Technology, Gazipur, Bangladesh [6pt] Corresponding author: ( )

AI总结 本文提出HaorFloodAlert,一种去季节化的机器学习集成模型,用于预测孟加拉国Haor湿地72小时内的洪水概率,通过识别温度季节性影响和利用Sentinel-1 SAR数据提高预测准确性。

Comments 9 pages, 9 figures. To be submitted to raaicon.org

详情
AI中文摘要

孟加拉国Haor湿地的快速洪水几乎没有任何预警,破坏年度boro稻收获。现有系统为河流洪水设计,完全忽略了回水动态。这些流域平坦,水的行为不同于布拉马普特拉河。我们构建了HaorFloodAlert,一种去季节化的机器学习集成,用于预测Sunamganj Haor(约8,000平方公里)72小时内的洪水概率。温度被发现是季节性的作弊代码,因为它在温暖月份洪水发生时提高了准确性6.9个百分点。我们捕捉到了这一点,并构建了一个上游Barak河Sentinel-1 SAR代理,从阿萨姆的Silchar提供约36小时的预警。Otsu阈值化的SAR变化检测在空间匹配上验证达到84-91%。操作性集成(RF 0.5625 + XGBoost 0.4375)在77个真实的Sentinel-1事件上达到89.6%的LOOCV准确性,87.5%的召回率和0.943的AUC-ROC。还包含三级警报管道和BRRI校准的boro稻损害估计器。

英文摘要

Flash floods in Bangladesh's haor wetlands show up with almost no warning. They wreck the annual boro rice harvest. Current setups, built for riverine floods, miss backwater dynamics entirely. These basins are flat. Water does not behave like it does on the Brahmaputra. We built HaorFloodAlert, a deseasonalized machine learning ensemble that forecasts 72-hour flood probability for the Sunamganj Haor (approximately 8,000 km2). Temperature was acting as a seasonal cheat code - it inflated accuracy by 6.9 pp just because floods happen in warm months. We caught that. We also built an upstream Barak River Sentinel-1 SAR proxy from Silchar, Assam, giving about 36 hours of lead time. Otsu-thresholded SAR change detection validates at 84-91 percent spatial match. The operational ensemble (RF 0.5625 + XGBoost 0.4375) hits 89.6 percent LOOCV accuracy, 87.5 percent recall, and 0.943 AUC-ROC on 77 real Sentinel-1 events. A three-tier alert pipeline and a BRRI-calibrated boro rice damage estimator are included.

2605.20159 2026-05-20 cs.CV cond-mat.mtrl-sci cs.LG 版本更新

Interpretable Computer Vision for Defect Detection in X-ray Tomography of Aerospace SiC/SiC Composites

用于航空SiC/SiC复合材料X射线断层扫描缺陷检测的可解释计算机视觉

Antonio Peña Corredor, Julien Lesseur, Romain Nunez, Paul Rivalland, Thomas Philippe

发表机构 * Safran Ceramics(萨弗兰陶瓷) Safran Engineering Services(萨弗兰工程服务)

AI总结 本研究提出了一种结合原型层的p-ResNet-50框架,通过引入新的正则化项和语义对齐,提高了X射线断层扫描中缺陷检测的可解释性和准确性,同时保持了高精度和可追溯性。

详情
AI中文摘要

航空SiC/SiC复合材料的非破坏性检测依赖于专家视觉评估,当前流程在接受/拒绝决策方面缺乏可追溯性。深度卷积网络可以自动检测缺陷,但其黑盒性质与工业检测实践所需的透明性相冲突。为此,我们引入了p-ResNet-50,一种扩展了原型层的卷积框架,将高检测精度与基于案例的解释相结合。六个学习到的原型被显式对齐到专家定义的语义类别——健康基质、基质-空气界面、孔洞、线状缺陷和混合形态,使得每个分类都能追溯到具有物理意义的参考。两种新的正则化项,基于锚点和中位数,将原型连接到专家选择的片段,并防止原型崩溃,解决了原型网络已知的限制。通过UMAP进行的潜在空间分析揭示了语义连贯的子域,并映射出不确定性区域,这些区域集中了误分类,使检查员能够明确了解模型在哪里可靠,以及不可靠。该框架在约12,000个片段的XCT数据集上进行了验证,这些片段是从四个缺陷丰富的SiC/SiC实验室样品中提取的。与黑盒ResNet-50基线(ROC-AUC = 0.991)相比,原型扩展实现了相似的性能(准确率0.957 vs. 0.959;ROC-AUC 0.994 vs. 0.993),虽然灵敏度略有降低,但精度和特异性更高。每个决定都由代表性的证据片段支持,并且模型明确标记其不确定性区域。除了缺陷映射外,该框架还建立了一种可重用的方法,用于将领域专家知识嵌入到原型网络中,适用于其他需要可追溯、可审计决策的XCT检测场景。

英文摘要

Non-destructive testing of aerospace SiC/SiC composites via X-ray computed tomography (XCT) relies on expert visual assessment, with current workflows offering limited traceability for accept/reject decisions. Deep convolutional networks can automate defect detection, yet their black-box nature conflicts with the transparency that industrial inspection practice demands. To close this gap, we introduce p-ResNet-50, a convolutional framework extended with a prototype layer that couples high detection accuracy with case-based explanations. Six learned prototypes are explicitly aligned with expert-defined semantic categories-healthy matrix, matrix--air interfaces, pores, line-like defects, and mixed morphologies-so that every classification is traceable to a physically meaningful reference. Two novel regularisation terms, anchor-based and medoid-based, tether prototypes to expert-selected patches and prevent prototype collapse, addressing a known limitation of prototype networks. Latent-space analysis via UMAP delineates semantically coherent sub-domains and maps zones of uncertainty where misclassifications concentrate, giving inspectors an explicit picture of where the model is-and is not-reliable. The framework is validated on an XCT patch dataset of approximately 12,000 patches extracted from four defect-rich SiC/SiC laboratory specimens. Taking a black-box ResNet-50 as a baseline (ROC-AUC = 0.991), the prototype extension achieves comparable performance (accuracy 0.957 vs. 0.959; ROC-AUC 0.994 vs. 0.993) while trading a slight reduction in sensitivity for higher precision and specificity. Each decision is backed by representative evidence patches, and the model explicitly flags its uncertainty regions. Beyond defect mapping, the framework establishes a reusable methodology for embedding domain-expert knowledge into prototype networks, applicable to other XCT inspection scenarios requiring traceable, auditable decisions.

2605.20157 2026-05-20 cs.LG cs.CR cs.IR 版本更新

SAGE: Scalable Automatic Gating Ensemble for Confident Negative Harvesting in Fraud Detection

SAGE:可扩展的自动门控集成用于自信的负面采样在欺诈检测中

Sudheer Tubati, Amit Goyal

发表机构 * Amazon Music(亚马逊音乐)

AI总结 本文提出SAGE,一种结合SimHash基于的分层抽样和模块化门控集成的反事实意识负面采样方法,以在欺诈检测中实现对未标记数据的自信负面识别,解决了正例未标记学习中的表示偏差问题。

详情
Journal ref
WSDM Companion '26: Nineteenth ACM International Conference on Web Search and Data Mining, 2026, Pages 34 - 38
AI中文摘要

音乐流媒体欺诈,即恶意行为者人为提高流媒体计数以操纵排行榜和版税支付,对流媒体服务和合法内容创作者构成重大威胁。传统欺诈检测方法面临关键挑战:许多合法边缘案例,包括超级粉丝和睡眠音乐会,表现出的活动模式与协调欺诈非常相似。我们提出了SAGE,一种新颖的反事实意识负面采样方法,结合SimHash基于的分层抽样和模块化门控集成,用于从未标记数据中自信地识别负面样本。我们的集成架构采用可插拔的统计门(目前实例化为Mahalanobis距离和k-NN密度)和可配置的投票阈值,以实现自适应的精度-召回率权衡。这通过通过地板约束抽样确保罕见行为群体的全面覆盖,解决了正例未标记学习中的表示偏差问题。评估显示在保留数据上具有强精度和召回率。该方法在欺诈检测领域具有良好的泛化能力,在客户层面和艺术家层面的欺诈检测中均能实现强性能,而无需修改核心方法。

英文摘要

Music streaming fraud, where bad actors artificially inflate stream counts to manipulate chart rankings and royalty payments, poses a significant threat to streaming services and legitimate content creators. Traditional fraud detection approaches struggle with a critical challenge: many legitimate edge cases, including super-fans and sleep-music sessions, exhibit activity patterns that closely mimic those of coordinated fraud. We present SAGE, a novel counterfactual-aware negative harvesting approach that combines SimHash-based stratified sampling with a modular gating ensemble for confident negative identification from unlabeled data. Our ensemble architecture employs pluggable statistical gates (currently instantiated with Mahalanobis distance and k-NN density) with configurable voting thresholds enabling adaptive precision-recall trade-offs. This addresses the representation bias problem in Positive-Unlabeled learning by ensuring comprehensive coverage of rare behavioral cohorts through floor-constrained sampling. Evaluation demonstrates strong precision and recall on held-out data. The approach generalizes across fraud detection domains, achieving strong performance on both customer-level and artist-level fraud without modification to the core methodology.

2605.20151 2026-05-20 cs.LG math.ST stat.TH 版本更新

When Does Model Collapse Occur in Structured Interactive Learning?

在结构互动学习中模型崩溃何时发生?

Yuchen Wu, Kangjie Zhou, Weijie Su

发表机构 * School of Operations Research and Information Engineering, Cornell University(卡内基梅隆大学运营管理与信息工程学院) Department of Statistics, Columbia University(哥伦比亚大学统计系) Department of Statistics and Data Science, University of Pennsylvania(宾夕法尼亚大学统计与数据科学系)

AI总结 研究探讨了在结构互动学习环境中,生成模型性能下降(模型崩溃)的发生条件,通过分析交互图拓扑结构,推导出模型崩溃的必要和充分条件,并通过数值实验验证理论结果。

Comments 57 pages, 12 figures

详情
AI中文摘要

生成式人工智能的普及催生了交互学习环境,其中模型参数通过自然过程生成的数据和由其他模型产生的合成输出不断更新。这种范式引入了两大挑战:(1)训练数据不再仅来自目标群体,破坏了经典统计学习的核心假设;(2)模型训练过程变得内在相关,因为模型通过反复接触彼此的合成输出进行交互,方式可能复杂。在这样的结构互动学习环境中建立可靠的统计推断仍然是一个重要开放问题。特别是,人们对模型崩溃现象日益关注,该现象是指生成模型在训练于早期模型生成的合成数据时性能逐步下降。先前关于模型崩溃的研究主要集中在单个模型训练其自身输出的情况,未能捕捉多模型交互环境中的模型性能。在本文中,我们填补了这一空白,通过研究具有通用交互模式的交互学习环境中的生成模型性能。特别是,我们利用有向图形式化模型交互,并证明模型崩溃的发生严重依赖于交互图的拓扑结构。我们进一步推导出一个显式的必要和充分条件,以表征模型崩溃何时发生,并为线性回归建立有限样本结果,为一般M估计量建立渐近保证。我们通过广泛的数值实验支持我们的理论发现。

英文摘要

The proliferation of generative artificial intelligence has given rise to an interactive learning environment, where model parameters are continuously updated using not only data generated by natural processes, but also synthetic outputs produced by other models. This paradigm introduces two major challenges: (1) training data are no longer drawn exclusively from the target population, undermining a core assumption of classical statistical learning, and (2) model training processes become inherently correlated, as models interact with one another through repeated exposure to each other's synthetic outputs in a potentially complex manner. Establishing reliable statistical inference in such structured interactive learning environments therefore remains an important open problem. In particular, there is growing concern about model collapse, a phenomenon in which the performance of generative models progressively degrades as they are trained on synthetic data produced by earlier model generations. Prior work on model collapse primarily focuses on a single model trained on its own output, failing to capture model performance in multi-model interactive settings. In this work, we fill this gap by investigating the performance of generative models in an interactive learning environment with general interaction patterns. In particular, we formalize model interactions using directed graphs and show that the occurrence of model collapse depends critically on the topology of the interaction graph. We further derive an explicit necessary and sufficient condition characterizing when model collapse occurs, and establish finite-sample results for linear regression and asymptotic guarantees for general M-estimators. We support our theoretical findings through extensive numerical experiments.

2605.20145 2026-05-20 stat.ML cs.LG stat.ME 版本更新

Goal-Oriented Lower-Tail Calibration of Gaussian Processes for Bayesian Optimization

面向目标的高斯过程低尾校准用于贝叶斯优化

Aurélien Pion, Emmanuel Vazquez

发表机构 * Univ. Paris-Saclay, CNRS, CentraleSupélec, L2S, Gif-sur-Yvette, France(巴黎萨克雷大学,国家科学研究中心,中央超算实验室,L2S,法国吉夫-sur-伊夫特)

AI总结 本文研究了在无噪声情况下,针对低于低阈值t的标准高斯过程模型的预测分布进行面向目标的校准,提出了一种后处理方法tcGP,以校准预测分布低于t的部分,并展示了基于此的全局优化算法在设计空间中保持密集性,实验表明相较于标准高斯过程模型和全局校准高斯过程模型,改进了低尾校准和贝叶斯优化性能。

详情
Journal ref
ICML 2026
AI中文摘要

贝叶斯优化(BO)利用高斯过程(GP)预测分布来选择昂贵的黑箱目标的评估点。核选择和超参数选择可能导致预测分布不准确,从而影响探索与利用的平衡。对于最小化问题,采样标准如预期改进(EI)依赖于当前最佳值以下的预测分布,因此低尾不准确直接影响采样决策。本文研究了在无噪声情况下,针对低于低阈值t的标准高斯过程模型的预测分布进行面向目标的校准,超参数通过最大似然法选择。引入了一种预测可靠性低于t的框架,基于两个空间校准的概念:设计空间上的发生校准和子水平集形式{ x∈X, f(x)≤t }上的阈值μ-校准。在此框架基础上,提出tcGP,一种后处理方法,用于校准预测分布低于t的部分,并证明由此得到的基于EI的全局优化算法在设计空间中保持密集。在标准基准测试中,实验表明相较于标准高斯过程模型和全局校准高斯过程模型,改进了低尾校准和贝叶斯优化性能。

英文摘要

Bayesian optimization (BO) selects evaluation points for expensive black-box objectives using Gaussian process (GP) predictive distributions. Kernel choice and hyperparameter selection can lead to miscalibrated predictive distributions and an inappropriate exploration-exploitation trade-off. For minimization, sampling criteria such as expected improvement (EI) depend on the predictive distribution below the current best value, so lower-tail miscalibration directly affects the sampling decision. This article studies goal-oriented calibration of GP predictive distributions below a low threshold $t$ in the noiseless setting, for standard GP models with hyperparameters selected by maximum likelihood. A framework for predictive reliability below $t$ is introduced, based on two notions of spatial calibration: occurrence calibration over the design space and thresholded $μ$-calibration on sublevel sets of the form $\{x\in\mathbb{X}, f(x)\le t\}$. Building on this framework, we propose tcGP, a post-hoc method that calibrates GP predictive distributions below~$t$, and we show that the resulting EI-based global optimization algorithm remains dense in the design space. Experiments on standard benchmarks show improved lower-tail calibration and BO performance relative to standard GP models and globally calibrated GP models.

2605.20134 2026-05-20 cs.LG 版本更新

TrajTok: Adaptive Spatial Tokenization for Trajectory Representation Learning

TrajTok: 用于轨迹表示学习的自适应空间令牌化

Zhen Xiong, Shang-Ling Hsu, Cyrus Shahabi

发表机构 * University of Southern California(南加州大学)

AI总结 本文提出TrajTok,一种通过自适应空间令牌化学习通用轨迹表示的方法,通过多分辨率六边形网格划分和预训练策略,实现了在轨迹相似性搜索、分类、预计到达时间和旅行时间回归等任务上的优异表现。

详情
AI中文摘要

从原始GPS轨迹学习通用的轨迹表示仍然具有挑战性,因为数据是连续的、嘈杂的且采样不规则。空间令牌化同样具有挑战性:细网格会产生稀疏单元格,嵌入较弱,而粗网格会将异质运动模式合并为同一个令牌。我们提出了TrajTok,一种具有简单预训练配方的轨迹编码器,用于可转移的轨迹嵌入。TrajTok首先从GPS点的空间分布学习多分辨率六边形网格划分,将嘈杂的GPS序列转换为离散的单元格令牌。为了捕捉几何和运动学,它使用分解的Transformer编码器,带有早期模态自注意力块、跨注意力融合层和时空旋转位置嵌入(ST-RoPE),以编码每个令牌的位置和时间。TrajTok通过掩码令牌建模进行预训练,从部分轨迹观测中恢复几何结构和运动学模式。在Porto数据集上,冻结的TrajTok编码器结合轻量级任务适配器在轨迹相似性搜索、分类、预计到达时间和完整旅行时间回归任务上表现优异,优于多种任务特定方法。相同的冻结编码器支持几何主导和运动学主导任务,表明TrajTok学习了可转移的轨迹结构,而不是任务特定的捷径。这些结果表明,学习多分辨率空间令牌化结合掩码令牌预训练是通用轨迹基础模型的有希望的方向。

英文摘要

Learning generalizable trajectory representations from raw GPS traces remains difficult because the data is continuous, noisy, and irregularly sampled. Spatial tokenization is also challenging: fine grids yield sparse cells with weak embeddings, while coarse grids merge heterogeneous movement patterns into the same token. We present TrajTok, a trajectory encoder with a simple pretraining recipe for transferable trajectory embeddings. TrajTok first learns a multi-resolution hexagonal cell partition from the spatial distribution of GPS points, converting noisy GPS sequences into discrete cell tokens. To capture both geometry and kinematics, it uses a factorized transformer encoder with early per-modality self-attention blocks, cross-attention fusion layers, and spatiotemporal rotary position embeddings, ST-RoPE, to encode where and when each token occurs. TrajTok is pretrained with masked-token modeling that recovers both geometric structure and kinematic patterns from partial trajectory observations. On the Porto dataset, a frozen TrajTok encoder with lightweight task adapters achieves strong performance across trajectory similarity search, classification, estimated time of arrival, and full travel-time regression, outperforming multiple task-specific methods. The same frozen encoder supports both geometry-dominated and kinematics-dominated tasks, suggesting that TrajTok learns transferable trajectory structure rather than task-specific shortcuts. These results indicate that learned multi-resolution spatial tokenization combined with masked-token pretraining is a promising direction for general-purpose trajectory foundation models.

2605.20132 2026-05-20 physics.geo-ph cs.LG eess.SP 版本更新

FiLark: a streaming-first software framework for end-to-end exploration, annotation, and algorithm integration in distributed acoustic sensing

FiLark:一种面向流式处理的软件框架,用于分布式声学传感的端到端探索、标注和算法集成

Jintao Li, Weichang Li, Kai Tong, Xaingyu Guo

发表机构 * organization= State Key Laboratory of Ocean Sensing \& Ocean College, Zhejiang University , city= Zhoushan , postcode= 316021 , country= China organization= College of Information Science Electronic Engineering, Zhejiang University , city= Hangzhou , country= China organization= College of Computer Science Technology, Zhejiang University , city= Hangzhou , country= China

AI总结 本文提出FiLark框架,通过流式处理原则,实现分布式声学传感数据的端到端探索、标注和算法集成,解决传统批量分析框架无法处理连续高通道数据流的问题。

详情
AI中文摘要

分布式声学传感(DAS)系统生成的连续、超高通道计数的数据流速率超过了传统批量分析框架的能力。因此,诸如长时记录的交互探索、可扩展的事件标注和实时算法闭环监控等关键任务仍然无法得到足够支持。本文提出了FiLark(Fiber Lark),一种Python框架,其应用流式处理原则贯穿数据访问、信号处理、可视化和监控。FiLark将任何DAS源,包括连续多文件记录,作为统一流进行处理,并围绕该抽象构建所有系统组件。基于OpenGL的环形缓冲区渲染器允许以恒定内存使用量交互浏览和可视化任意长的记录。集成的标注界面支持在连续数据流中直接进行事件标注,从而在不进行离线预处理的情况下创建可重复的机器学习准备好的标注数据集。信号处理库包括时间、空间、频谱和分解基的运算符,包含通过PyTorch实现的CPU版本和GPU加速版本,以及具有状态的分块执行,以在段边界保持处理连续性和应用语义。标准化的监控接口进一步将流式检测器和基于学习的模型整合到可视化工作流程中。通过在所有层次共享共同的流式抽象,FiLark允许在交互式开发的处理配置和工作流程直接转移到可扩展的生产管道中,而无需修改。

英文摘要

Distributed acoustic sensing (DAS) systems generate continuous, ultra-high-channel-count data streams at rates that exceed the capabilities of conventional batch-oriented analysis frameworks. As a result, essential tasks such as interactive exploration of long-duration recordings, scalable event annotation, and real-time algorithm-in-the-loop monitoring remain inadequately supported by workflows built around manually selected data segments and offline processing. This paper presents FiLark (Fiber Lark), a Python framework that applies a \emph{streaming-first} principle uniformly across data access, signal processing, visualization and monitoring for DAS. Instead of operating on manually selected data segments, FiLark presents any DAS sources-including continuous multi-file recordings-as a unified stream and builds all system components around that abstraction. An OpenGL-based ring-buffer renderer enables interactive browsing and visualization of arbitrarily long recordings with constant memory usage. An integrated annotation interface supports event labeling directly within continuous data streams, facilitating the creation of reproducible machine-learning-ready labeled datasets without offline preprocessing. The signal processing library includes temporal, spatial, spectral, and decomposition-based operators, with both CPU implementations and GPU-accelerated variants via PyTorch, alongside stateful chunked execution that preserves processing continuity and application semantics across segment boundaries. A standardized monitor interface further integrates streaming detectors and learning-based models into the visualization workflow. By sharing a common streaming abstraction across all layers, FiLark allows processing configurations and workflows developed interactively to transfer directly to scalable production pipelines without modification.

2605.20127 2026-05-20 q-bio.NC cs.AI cs.LG 版本更新

Beyond Prediction Accuracy: Target-Space Recovery Profiles for Evaluating Model-Brain Alignment

超越预测准确性:用于评估模型-大脑对齐的靶空间恢复曲线

Ken Nakamura, Tomoya Nakai, Ryuto Yashiro, Ayumu Yamashita, Kaoru Amano

发表机构 * The University of Tokyo(东京大学) Osnabrück University and Freie Universität Berlin(奥斯纳布吕克大学和柏林自由大学) Kobe University(Kobe大学)

AI总结 本文提出了一种评估模型-大脑对齐的新方法,通过分析可重复预测的靶空间响应维度,揭示预测准确性之外的模型-大脑对齐情况。

Comments 34 pages, 12 figures, 5 tables

详情
AI中文摘要

人工视觉模型通常通过测量其内部表示预测大脑响应的准确性来评估人类视觉皮层。然而,仅凭预测准确性无法确定目标大脑响应空间中哪些维度被恢复。本文介绍了一种统一框架,通过识别预测恢复的响应维度来评估模型-大脑和大脑-大脑对齐。通过重复fMRI测量,我们首先确定可在独立试验分割中重复预测的目标大脑响应维度。然后,我们预测目标大脑响应,无论是从另一个受试者的大脑响应还是视觉模型的内部表示,并量化这些可重复响应维度的恢复程度。将此框架应用于自然场景数据集的一个子集,其中八名受试者在fMRI下观看了相同的自然图像,我们发现早期到中期视觉皮层响应包含一组低维的可重复维度。大脑-大脑比较确定哪些维度可以从其他受试者的大脑中一致恢复,提供了一种诊断性的人类参考而非仅标量基准。在某些情况下,预训练和随机初始化的模型在预测准确性上相似,但这些响应维度的恢复曲线却不同。这些结果表明,仅凭预测准确性可能掩盖模型-大脑不匹配。通过明确哪些可重复的大脑响应维度被预测恢复,我们的框架提供了更诊断性的评估,以评估人工视觉模型与人类视觉皮层的对齐情况。

英文摘要

Artificial vision models are often evaluated against the human visual cortex by measuring how accurately their internal representations predict brain responses. However, prediction accuracy alone does not indicate which dimensions of the target brain's response space are recovered. Here, we introduce a unified framework for evaluating both model-brain and brain-brain alignment by identifying the response dimensions recovered by prediction. Using repeated fMRI measurements, we first identify target-brain response dimensions that can be reproducibly predicted across independent trial splits. We then predict target-brain responses from either another subject's brain responses or a vision model's internal representations, and quantify how strongly each of these reproducible response dimensions is recovered. Applying this framework to a subset of the Natural Scenes Dataset, in which eight subjects viewed the same natural images during fMRI, we find that the early-to-intermediate visual-cortex responses contain a low-dimensional set of reproducible dimensions. Brain-to-brain comparisons identify which of these dimensions are consistently recoverable from other subjects' brains, providing a diagnostic human reference rather than only a scalar benchmark. In some cases, pretrained and randomly initialized models achieve similar prediction accuracy while showing distinct recovery profiles across these response dimensions. These results show that prediction accuracy alone can mask model-brain mismatches. By making explicit which reproducible brain response dimensions are recovered by prediction, our framework provides a more diagnostic evaluation of alignment between artificial vision models and the human visual cortex.

2605.20122 2026-05-20 stat.ML cs.CC cs.LG 版本更新

Optimizing Computational-Statistical Runtime for Wasserstein Distance Estimation

优化Wasserstein距离估计的计算-统计运行时间

Peter Matthew Jacobs, Jeff M. Phillips

发表机构 * Department of Statistics(统计学系) Kahlert School of Computing(Kahlert计算学院) University of Wisconsin-Madison(威斯康星大学麦迪逊分校) University of Utah(犹他大学)

AI总结 本文提出了一种Sample-Sketch-Solve方法,通过引入正则化笛卡尔网格草图来压缩数据并加速Wasserstein距离的计算,实现了在Hölder光滑分布下以更优的运行时间达到ε误差的估计。

详情
AI中文摘要

平方Wasserstein距离是衡量概率分布之间差异的常用工具。该距离通常在两个底层随机样本的经验测度之间计算。不幸的是,即使在低维欧几里得空间问题(d∈{2,3})中,计算Wasserstein距离的算法在运行时间上随着n和所需精度的增加而表现不佳。为此,我们考虑计算-统计运行时间,目标是从样本中估计潜在光滑测度之间的Wasserstein距离,误差在期望意义上不超过ε。我们允许收集样本的计算成本为O(1)。为此,我们开发了一种Sample-Sketch-Solve范式,其中引入了样本的正则化笛卡尔网格草图。我们证明,尤其是在α-Hölder光滑分布下,这可以压缩数据而不增加渐近误差,并且正则化结构使更快的精确算法成为可能。最终,我们以ε误差在ε^{-max(2,(d+1+o(1))/(1+α))}时间内近似W_2^2(P,Q),对于0 < α < 1的Hölder光滑分布P,Q在(0,1)^d上;当d=2时,对于α>1/2,达到最优Θ(ε^{-2}),当d=3时,当α→1时几乎最优。

英文摘要

Squared Wasserstein distance is a frequently used tool to measure discrepancy between probability distributions. This distance is typically computed between empirical measures of size $n$ from two underlying random samples. Unfortunately, even in lower dimensional Euclidean space problems $\left( d \in \{2,3\} \right)$, algorithms for Wasserstein distance computation with approximate or exact precision guarantees scale poorly in the runtime as a function of $n$ and the desired precision. In response, we consider the computational-statistical runtime, where the goal is to estimate from samples the Wasserstein distance between potentially smooth measures up to $ε$-additive error in expectation with respect to the sampling; we allow $O(1)$ computational cost for collecting a sample. Towards this, we develop a Sample-Sketch-Solve paradigm where we introduce a regular cartesian grid sketch of the samples. We show that (especially under $α$-Hölder smooth distributions) this can compress the data without increasing asymptotic error, and also regularizes the structure which enables faster exact algorithms. Ultimately, we approximate $W_2^2(P,Q)$ within $ε$ error in $ε^{-\max(2,\frac{d+1+o(1)}{1+α})}$ time for $0 < α< 1$ Hölder smooth distributions $P,Q$ on $(0,1)^{d}$; an optimal $Θ(ε^{-2})$ for $α> 1/2$ when $d=2$ and nearly optimal as $α\to 1$ when $d = 3$.

2605.20108 2026-05-20 eess.SY cs.AI cs.LG cs.LO cs.SY 版本更新

k-Inductive Neural Barrier Certificates for Unknown Nonlinear Dynamics

k-诱导神经屏障证书用于未知非线性动力学

Ben Wooding, Hongchao Zhang, Taylor T. Johnson, Abolfazl Lavaei

发表机构 * Vanderbilt University(范德堡大学) Newcastle University(新castle大学)

AI总结 本文提出了一种基于神经网络的k-诱导神经屏障证书(k-NBCs),用于部分未知的非线性系统,通过利用神经网络的可扩展性以及泛化Willems等人基本引理,构建数据驱动的表示以进行SMT验证,同时提高了设计灵活性。

Comments 18 pages, 5 figures, 3rd International Conference on Neuro-Symbolic Systems (NeuS)

详情
AI中文摘要

尽管传统的(k=1)离散时间屏障证书条件通过要求函数在每一步都非递增来施加严格的安全约束,k-诱导屏障证书通过允许临时增加--最多k-1次,每次在阈值ε内--同时保持整体安全性并提高灵活性。本文利用神经网络构建k-诱导神经屏障证书(k-NBCs)用于(部分)未知的非线性系统。虽然神经网络在设计过程中提供可扩展性,但缺乏形式保证,需要额外的方法如基于可满足性模理论(SMT)的反例引导归纳合成(CEGIS)进行验证。然而,CEGIS-SMT框架需要系统动力学的知识,这在实际情况下不可用。为此,我们利用Willems等人基本引理的泛化,使用单个状态轨迹,构建数据驱动的表示以进行SMT验证而不牺牲准确性。此外,CEGIS-SMT进一步消除了将屏障证书限制在特定函数类(如平方和)的约束,从而在设计上具有更大的灵活性。我们验证了我们的方法在三个非线性案例研究中,具有(部分)未知的动力学。

英文摘要

While conventional (k=1) discrete-time barrier certificate conditions impose strict safety constraints by requiring the function to be non-increasing at every step, k-inductive barrier certificates relax this by allowing a temporary increase -- up to k-1 times, each within a threshold $ε$ -- while maintaining overall safety, and improving flexibility. This paper leverages neural networks and constructs k-inductive neural barrier certificates (k-NBCs) for (partially) unknown nonlinear systems. While neural networks offer scalability in the design process, they lack formal guarantees, requiring additional approaches such as counterexample-guided inductive synthesis (CEGIS) with satisfiability modulo theories (SMT) for verification. However, the CEGIS-SMT framework requires knowledge of system dynamics, which is unavailable in practical settings. To address this, we leverage the generalization of the Willems et al.'s fundamental lemma, using a single state trajectory, to construct a data-driven representation of (partially) unknown models for SMT verification without sacrificing accuracy. Additionally, CEGIS-SMT further removes the constraint of restricting barrier certificates to specific function classes, such as sum-of-squares, enabling greater flexibility in their design. We validate our approach on three nonlinear case studies with (partially) unknown dynamics.

2605.20107 2026-05-20 cs.LG cs.AI 版本更新

Beyond Isotropy in JEPAs: Hamiltonian Geometry and Symplectic Prediction

超越各向同性:JEPAs中的哈密顿几何与辛预测

Robert Jenkinson Alvarez

发表机构 * GitHub

AI总结 本文研究了JEPAs中各向同性假设的局限性,提出基于哈密顿几何的辛预测方法,通过相空间状态和学习的哈密顿量预测视图间过渡,从而提升模型在不同数据集上的性能。

详情
AI中文摘要

JEPAs通常将单视图嵌入正则化为各向同性的高斯分布,隐含地将欧几里得对称性纳入表示中。我们证明这不仅仅是无害的默认设置。对于已知的结构化下游几何H>0,最小最大和最大熵协方差在哈密顿能量预算下为(c/d)H^{-1},欧几里得各向同性会带来闭式价格。更重要的是,当下游几何未知时,没有几何无关的固定边际目标是规范的:每个固定协方差形状可以对某些结构化几何最大化地错位。我们进一步表明,即使拥有oracle单视图边际,也无法识别JEPA视图间预测耦合。这些结果表明,JEPAs中的结构偏差应进入跨视图耦合而非固定编码器边际。我们通过HamJEPA实例化这一原则,将每个视图编码为相空间状态(q,p),并通过学习的哈密顿量跃迁映射预测视图间过渡,非各向同性的尺度和频谱地板防止崩溃。在刻意无头标记协议中,HamJEPA在CIFAR-100上比SIGReg提升4.89 kNN@20和3.52线性探针点,在30个epoch时,以及在80个epoch时提升6.45 kNN@20和10.64线性探针点。而匹配的MLP预测器消融显示,辛耦合是驱动邻域几何增益的成分。在ImageNet-100上,HamJEPA-q在45个epoch时提升4.82 kNN@20和7.52线性探针点。

英文摘要

JEPAs often regularize one-view embeddings toward an isotropic Gaussian, implicitly baking Euclidean symmetry into the representation. We show that this is not merely a benign default. For a known structured downstream geometry $H\succ0$, the minimax and maximum-entropy covariance under a Hamiltonian energy budget is $(c/d)H^{-1}$, and Euclidean isotropy incurs a closed-form price of isotropy. More importantly, when the downstream geometry is unknown, no geometry-independent fixed marginal target is canonical: every fixed covariance shape can be maximally misaligned for some structured geometry. We further show that even oracle one-view marginals do not identify the JEPA view-to-view predictive coupling. These results suggest that the structural bias in JEPAs should enter the cross-view coupling rather than a fixed encoder marginal. We instantiate this principle with \textbf{HamJEPA}, which encodes each view as a phase-space state $(q,p)$ and predicts view-to-view transitions with a learned Hamiltonian leapfrog map, while non-isotropic scale and spectral floors prevent collapse. In a deliberately headless token protocol, HamJEPA improves over SIGReg on CIFAR-100 by $+4.89$ kNN@20 and $+3.52$ linear-probe points at 30 epochs, and by $+6.45$ kNN@20 and $+10.64$ linear-probe points at 80 epochs, while a matched MLP predictor ablation shows that the symplectic coupling is the ingredient driving the neighborhood-geometry gain. On ImageNet-100, HamJEPA-$q$ improves by $+4.82$ kNN@20 and $+7.52$ linear-probe points at 45 epochs.

2605.20105 2026-05-20 cs.LG 版本更新

Optimal Representation Size: High-Dimensional Analysis of Pretraining and Linear Probing

最优表示尺寸:预训练和线性探测的高维分析

Valentina Njaradi, Clémentine Dominé, Rachel Swanson, Marco Mondelli, Andrew Saxe

发表机构 * Gatsby Computational Neuroscience Unit(Gatsby计算神经科学单元) University College London(伦敦大学学院) Institute of Science and Technology Austria(奥地利科学与技术研究所) Sainsbury Wellcome Centre(萨金斯-韦尔科姆中心)

AI总结 本文研究了预训练和线性探测过程中的最优表示尺寸问题,通过高维分析揭示了表示维度、未标记和标记样本数量以及任务对齐性对训练和泛化误差的影响,提出了在不同预训练和下游数据条件下优化表示尺寸的条件。

详情
AI中文摘要

学习从有限数据中泛化是人工和生物系统面临的基本挑战。一种常见策略是从大量未标记数据中提取可重用的结构,从而高效适应新任务。这种两阶段范式现在已成为现代训练流水线的标准,即预训练后进行微调或线性探测。我们为这一过程提供了一个分析模型:结构提取被形式化为主成分分析,而下游学习则被建模为对单独标记数据集的线性回归。在高维情况下,我们推导出训练和泛化误差的精确表达式,展示了其对表示维度、未标记和标记样本数量以及任务对齐性的依赖性。我们的结果表明,预训练表示强烈影响下游泛化,我们将其最优表示尺寸作为任务参数的函数进行表征:在大量预训练数据但稀缺下游数据时,最大压缩表示最优;而在预训练数据有限时,高维表示泛化更好。此外,我们建立了预训练和监督之间的精确权衡,量化了需要多少未标记数据来替代一个标记样本。除了我们理想化的模型外,我们在自编码器和预训练大语言模型中也观察到相似的现象。总体而言,我们强调优化表示尺寸至关重要,给出了压缩预训练时提高泛化的条件。

英文摘要

Learning to generalise from limited data is a fundamental challenge for both artificial and biological systems. A common strategy is to extract reusable structure from abundant unlabelled data, enabling efficient adaptation to new tasks from limited labelled data. This two-stage paradigm is now standard in modern training pipelines, where pretraining is followed by fine-tuning or linear probing. We provide an analytical model of this process: structure extraction is formalized as principal component analysis on unlabelled data, and downstream learning as linear regression on a separate labelled dataset. In the high-dimensional regime, we derive exact expressions for training and generalisation error showcasing their dependence on representation dimensionality, unlabelled and labelled sample sizes, and task alignment. Our results show that pretrained representations strongly influence downstream generalisation, and we characterize the optimal representation size as a function of task parameters: with abundant pretraining data but scarce downstream data, maximally compressed representations are optimal, whereas with limited pretraining data, higher-dimensional representations generalise better. Furthermore, we establish an exact trade-off between pretraining and supervision, quantifying how much unlabelled data is required to replace a single labelled sample. Beyond our idealised model, we observe similar phenomenology in autoencoders and pretrained LLMs. Altogether, we highlight that optimising representation size is critical, giving conditions for when compression during pretraining improves generalisation.

2605.20104 2026-05-20 cs.LG cs.AI 版本更新

Draft Less, Retrieve More: Hybrid Tree Construction for Speculative Decoding

少写多取:用于推测解码的混合树构建

Yuhao Shen, Tianyu Liu, Xinyi Hu, Quan Kong, Baolin Zhang, Jun Dai, Jun Zhang, Shuang Ge, Lei Chen, Yue Li, Mingcheng Wan, Cong Wang

发表机构 * Zhejiang University(浙江大学) Qwen Applications Business Group of Alibaba(阿里巴巴量子计算实验室)

AI总结 本文提出了一种混合树构建方法Graft,通过结合剪枝和检索操作,解决了推测解码中资源分配的帕累托权衡问题,实现了在不同部署场景下的速度提升和接受率优化。

详情
AI中文摘要

推测解码(SD)通过 draft-then-verify 模式加速大语言模型推理。为最大化接受率,近期方法构建了 expansive draft trees,但导致严重的 VRAM 带宽和计算开销,成为端到端加速的瓶颈。虽然动态深度剪枝可通过移除边际分支减少延迟,但也会丢弃潜在有效的候选,阻碍接受率达到密集树的上限。在本文中,我们识别了资源分配中的关键机会:从密集到剪枝的转换释放了显著的计算预算。为了打破这一帕累托权衡,我们引入 Graft,一种补偿框架,将剪枝和检索作为相互强化的操作。剪枝提供足够的预算用于检索,而检索补偿剪枝引起的覆盖损失并恢复接受长度。通过采用顺序的 `prune-then-graft' 机制,Graft 将高预测性的检索 token 插入剪枝打开的位置,用几乎零开销填补拓扑缺口。Graft 完全无训练且无损失。全面评估显示,Graft 在实际部署设置中建立了新的帕累托前沿,包括短上下文生成、长上下文生成和大规模模型。在短上下文基准上,它实现了高达 5.41× 的加速,并在大规模 Qwen3-235B 上将平均加速率提高至 EAGLE-3 的 21.8%。我们还初步探讨了将 Graft 应用于 DFlash 风格的块解码范式,提供了扩展 grafting 以超越自回归 draft trees 的初步证据和见解。

英文摘要

Speculative decoding (SD) accelerates large language model inference by leveraging a draft-then-verify paradigm. To maximize the acceptance rate, recent methods construct expansive draft trees, which unfortunately incur severe VRAM bandwidth and computational overheads that bottleneck end-to-end speedups. While dynamic-depth pruning can reduce this latency by removing marginal branches, it also discards potentially valid candidates, preventing the acceptance rate from reaching the upper bound of dense trees. In this paper, we identify a critical opportunity in resource allocation: the transition from dense to pruned drafting frees up significant computational budget. To break this Pareto tradeoff, we introduce Graft, a compensation framework that couples pruning and retrieval as mutually reinforcing operations. Pruning supplies sufficient budget for retrieval, while retrieval compensates for pruning-induced coverage loss and recovers accepted length. By employing a sequential `prune-then-graft' mechanism, Graft attaches highly predictive retrieved tokens into positions opened by pruning, filling the topological gaps with near-zero overhead. Graft is entirely training-free and lossless. Comprehensive evaluations show that Graft establishes a new Pareto frontier across practical deployment settings, including short-context generation, long-context generation, and large-scale models. On short-context benchmarks, it achieves up to 5.41$\times$ speedup and improves average speedup over EAGLE-3 by up to 21.8% on the large-scale Qwen3-235B. We also provide a preliminary exploration of applying Graft to the DFlash-style block drafting paradigm, offering initial evidence and insights for extending grafting beyond autoregressive draft trees.

2605.20088 2026-05-20 cs.LG cs.AI 版本更新

INSHAPE: Instance-Level Shapelets for Interpretable Time-Series Classification

INSHAPE:实例级形状lets用于可解释的时间序列分类

Seongjun Lee, Seokhyun Lee, Changhee Lee

发表机构 * Department of Artificial Intelligence, Korea University(韩国大学人工智能系)

AI总结 本文提出INSHAPE框架,通过发现每个时间序列特有的变量长度判别性时间模式,解决传统方法在实例特定特征与整体模式不一致以及忽略时间依赖性的问题,从而提高时间序列分类的可解释性和预测性能。

Comments Accepted to IJCAI 2026. 25 pages

详情
AI中文摘要

发现形状lets——即时间序列内的判别性时间模式——已被广泛研究,以应对时间序列分类(TSC)固有的复杂性,并使模型决策过程更加透明。然而,现有方法主要集中在整体数据集上优化的群体级形状lets,导致两个根本性限制:(i)群体级模式往往与实例特定特征不一致,导致性能不佳并可能产生误导性解释;(ii)大多数方法将形状lets视为独立实体,忽略了多个模式之间的重要时间依赖性和相互作用。为了解决这些限制,我们提出了INSHAPE,一个可解释的TSC框架,该框架发现每个时间序列特有的变量长度判别性时间模式。INSHAPE将这些模式识别为非重叠段,并建模其时间依赖性,从而在提供清晰的实例级解释的同时实现强大的预测性能。此外,INSHAPE通过自下而上的方法连接局部和全局可解释性,将实例级形状lets聚合为原型(群体级)形状lets。在128个UCR和30个UEA基准数据集上的广泛实验表明,INSHAPE在性能上始终优于最先进的基于形状lets的方法,同时提供更直观和可解释的见解。

英文摘要

Discovering shapelets -- i.e., discriminative temporal patterns within time series -- has been widely studied to address the inherent complexity of time-series classification (TSC) and to make model decision-making processes more transparent. However, existing methods primarily focus on population-level shapelets optimized across the entire dataset, which leads to two fundamental limitations: (i) population-level patterns often misalign with instance-specific features, resulting in suboptimal performance and potentially misleading interpretations, and (ii) most methods treat shapelets as independent entities, overlooking important temporal dependencies and interactions among multiple patterns. To address these limitations, we propose INSHAPE, an interpretable TSC framework that discovers variable-length, discriminative temporal patterns specific to each time series. INSHAPE identifies these patterns as non-overlapping segments and models their temporal dependencies, thereby providing clear instance-level interpretations while achieving strong predictive performance. Furthermore, INSHAPE bridges local and global interpretability through a bottom-up approach, aggregating instance-level shapelets into prototypical (population-level) shapelets. Extensive experiments on 128 UCR and 30 UEA benchmark datasets show that INSHAPE consistently outperforms state-of-the-art shapelet-based methods while providing more intuitive and interpretable insights.

2605.20086 2026-05-20 cs.NE cs.AI cs.LG 版本更新

What Do Evolutionary Coding Agents Evolve?

进化编码代理进化什么?

Nico Pelleriti, Sree Harsha Nelaturu, Zhanke Zhou, Zongze Li, Max Zimmer, Bo Han, Sebastian Pokutta

发表机构 * Zuse Institute Berlin(柏林Zuse研究所) Technical University of Berlin(柏林技术大学) Hong Kong Baptist University(香港 Baptist大学) RIKEN Center for Advanced Intelligence Project(RIKEN高级智能项目中心)

AI总结 本文研究了进化编码代理在数学发现和算法设计中通过任务特定反馈生成、修改和选择代码的过程,通过EvoTrace数据集和EvoReplay方法分析了进化过程中的机制,发现大部分得分提升来自少数几种编辑类型,并发现存在确定性的循环模式。

Comments 28 pages, 12 figures, 12 tables

详情
AI中文摘要

最近的研究将大型语言模型与进化搜索结合,通过任务特定反馈迭代地生成、修改和选择代码。这些系统在数学发现和算法设计中取得了显著成果,但一个基本问题仍然存在:它们实际上进化了什么?进展通常通过任务特定评估器下最佳得分来总结,但该得分可能反映多种不同的机制:新的算法结构、重新调整现有策略、重新组合已存在于模型内部知识中的想法,或过度拟合评估器。区分这些机制需要检查搜索过程本身,而不是仅其最终结果。我们引入了EvoTrace,一个涵盖四个进化框架、推理和非推理模型以及16个数学和算法设计任务的进化编码轨迹数据集。为了分析这些轨迹,我们开发了EvoReplay,一种基于回放的方法,可以重建高分解决方案背后的局部搜索状态,并测试受控干预,包括调整常数、删除程序组件和替换模型或提示上下文。我们使用LLM-as-judge流程对EvoTrace中的每个代码编辑注释为九种 recurring 编辑类型之一,并通过盲人人工重新注释验证了该流程。在EvoTrace中,大部分得分提升来自少数几种编辑类型。我们进一步发现一种确定性的循环模式:大约30%的搜索过程中添加的代码行是字节相同的重新引入先前删除的行,几乎在每个运行中都存在。这些结果表明,进化编码代理的基准提升可能来自质的不同机制,其中只有某些机制对应于新的算法结构。EvoTrace使进化编码代理的评估超越了最终基准得分。

英文摘要

Recent work pairs LLMs with evolutionary search to iteratively generate, modify, and select code using task-specific feedback. These systems have produced strong results in mathematical discovery and algorithm design, yet a fundamental question remains: what do they actually evolve? Progress is typically summarized by the best score a run reaches under a task-specific evaluator, but that score can reflect several different mechanisms: new algorithmic structure, re-tuning an existing strategy, recombining ideas already in the model's internal knowledge, or overfitting to the evaluator. Distinguishing these mechanisms requires inspecting the search process itself, not only its final outcome. We introduce EvoTrace, a dataset of evolutionary coding traces spanning four evolutionary frameworks, reasoning and non-reasoning models, and 16 tasks across mathematics and algorithm design. To analyze these traces, we develop EvoReplay, a replay-based methodology that reconstructs the local search states behind high-scoring solutions and tests controlled interventions, including adjusting constants, removing program components and substituting models or prompting contexts. We annotate every code edit in EvoTrace with one of nine recurring edit types using an LLM-as-judge pipeline validated against blind human re-annotation. Across EvoTrace, most score gains come from a small subset of these edit types. We further find a deterministic cycling pattern: about 30% of code lines added during search are byte-identical re-introductions of previously-deleted lines, present throughout nearly every run. These results show that benchmark gains in evolutionary coding agents can arise from qualitatively different mechanisms, only some of which correspond to new algorithmic structure. EvoTrace enables more diagnostic evaluation of evolutionary coding agents beyond final benchmark scores.

2605.20079 2026-05-20 cs.CV cs.AI cs.LG eess.IV 版本更新

Probability-Conserving Flow Guidance

概率守恒的流引导

Parsa Esmati, Junha Hyung, Amirhossein Dadashzadeh, Jaegul Choo, Majid Mirmehdi

发表机构 * University of Bristol(布里斯托大学) KAIST(韩国科学技术院)

AI总结 本文提出了一种概率守恒的流引导方法AdaMaG,通过分析连续方程,将引导效果分解为发散项和分数平行项,并通过时间依赖的调度和分数平行衰减来控制这两个项,从而在不增加推理成本的情况下提高生成质量并减少幻觉。

详情
AI中文摘要

扩散和基于流的生成模型在视觉合成中占据主导地位,引导将样本对齐到用户输入并提高感知质量。然而,分类器无关引导(CFG)和基于外推的方法是速度/分数的启发式线性组合,忽略了生成流形的几何结构,破坏了概率守恒,导致在强引导下样本偏离学习的流形。我们通过连续方程分析引导,并展示其效果分解为一个发散项和一个在参数化下不变的分数平行项。我们证明发散项在采样接近数据流形时结构上会发散,这促使我们采用时间依赖的调度和分数平行衰减。所得到的即插即用规则,自适应流形引导(AdaMaG),在不增加推理成本的情况下限制了这两个项。最后,我们展示大多数减少饱和或提高生成质量的实证启发式方法直接对应于我们分解中的两个项。在图像生成基准测试中,AdaMaG提高了真实感,减少了幻觉,并在高引导制度下诱导了受控的去饱和。

英文摘要

Diffusion and flow-based generative models dominate visual synthesis, with guidance aligning samples to user input and improving perceptual quality. However, Classifier-Free Guidance (CFG) and extrapolation-based methods are heuristic linear combinations of velocities/scores that ignore the generative manifold geometry, breaking probability conservation and driving samples off the learned manifold under strong guidance. We analyse guidance through the continuity equation and show its effect decomposes into a divergence term and a score-parallel term defined invariantly across parameterisations. We prove the divergence term blows up structurally as sampling approaches the data manifold, motivating a time-dependent schedule alongside score-parallel attenuation. The resulting plug-and-play rule, Adaptive Manifold Guidance (AdaMaG), bounds both terms at no additional inference cost. Finally, we show that most empirical heuristics for reducing saturation or improving generation quality correspond directly to the two terms in our decomposition. Across image generation benchmarks, AdaMaG improves realism, reduces hallucinations, and induces controlled desaturation in high-guidance regimes.

2605.20074 2026-05-20 cs.LG 版本更新

Towards Distillation Guarantees under Algorithmic Alignment for Combinatorial Optimization

面向组合优化中算法对齐的蒸馏保证

Thien Le, Melanie Weber

发表机构 * SEAS, Harvard University(哈佛大学SEAS学院)

AI总结 本文研究了在算法对齐框架下,通过蒸馏将大规模模型的知识转移到更高效的模型以用于部署的问题,重点分析了当目标模型是图神经网络且其架构与动态规划算法对齐时,蒸馏成功的条件。

Comments 22 pages

详情
AI中文摘要

蒸馏将知识从在广泛数据上训练的大模型转移到更小、更高效的模型,以用于部署。在结构预测设置中,任务的先验知识可以指导目标架构的选择,使其与底层问题在算法上对齐。在最近的决策树(DT)蒸馏学习理论分析(Boix-Adsera, 2024)基础上,我们研究了蒸馏在组合优化任务中成功的情况。我们关注目标模型是图神经网络,其架构与任务的动态规划(DP)算法对齐的情况。假设源模型足够丰富,通过线性表示假设(LRH)(Elhage et al., 2022; Park et al., 2024)形式化,我们证明蒸馏问题可以在DP转移函数的复杂度参数中高效解决,该参数表示为决策树。我们的结果提供了在算法对齐风味下的蒸馏成功严格充分条件。

英文摘要

Distillation transfers knowledge from a large model trained on broad data to a smaller, more efficient model suitable for deployment. In structured prediction settings, prior knowledge about the task can guide the choice of a target architecture that is algorithmically aligned with the underlying problem. Building on recent learning-theoretic analyses of decision-tree (DT) distillation (Boix-Adsera, 2024), we study when distillation succeeds for combinatorial optimization tasks. We focus on the case where the target model is a graph neural network whose architecture is aligned with a dynamic programming (DP) algorithm for the task. Assuming that the source model is sufficiently rich, formalized through the linear representation hypothesis (LRH) (Elhage et al., 2022; Park et al., 2024), we show that the distillation problem can be solved efficiently in the complexity parameters of the DP transition function, represented as a DT. Our results provide a rigorous sufficient condition for successful distillation in the flavour of algorithmic alignment.

2605.20068 2026-05-20 stat.ML cs.LG 版本更新

Tail Annealing for Heavy-Tailed Flow Matching

尾部退火用于厚尾流匹配

Jean Pachebat

发表机构 * CMAP, École Polytechnique, Institut Polytechnique de Paris(CMAP,巴黎高等学院,巴黎理工学院)

AI总结 本文提出了一种简单的方法,通过在训练前对数据应用软对数变换,然后在生成后进行指数化,以处理厚尾数据问题。该方法通过Hill诊断决定是否对每个坐标进行变换,保留轻尾边缘不变,从而压缩厚尾到标准流匹配可以处理的范围内,无需厚尾基础分布或架构修改。

Comments 18 pages

详情
AI中文摘要

标准生成模型在处理厚尾数据时存在困难:Lipschitz架构无法从高斯噪声中生成幂律尾部,且在厚尾数据和高斯数据之间插值是不合理的。我们提出一个简单的解决方案:在训练前对数据应用软对数变换$ϕ(x) = \mathrm{sign}(x) \cdot \log(1 + |x|)$,然后在生成后对样本进行指数化。Hill诊断决定每个坐标是否进行变换,从而在不增加复杂度的情况下保留轻尾边缘不变。这将厚尾压缩到标准流匹配可以处理的范围内,而无需厚尾基础分布或架构修改。我们提供了理论直觉说明其有效性:对数变换将帕累托尾部映射到指数,诱导的动力学通过幂变换实现尾部退火。在144配置的多变量基准测试(3个copulas,$d$最大到100,4个尾指数)上,Log-FM在$W_1$、CVaR$_{99}$和极值分位数度量上优于专门的基线,并且是唯一在2880次运行中无严重发散的方法。

英文摘要

Standard generative models struggle with heavy-tailed data: Lipschitz architectures cannot produce power-law tails from Gaussian noise, and interpolating between heavy-tailed data and Gaussians is ill-posed. We propose a simple fix: apply the soft-log transform $ϕ(x) = \mathrm{sign}(x) \cdot \log(1 + |x|)$ coordinate-wise to data before training, then exponentiate samples after generation. A Hill diagnostic decides per-coordinate whether to transform, leaving light-tailed margins untouched at no added complexity. This compresses heavy tails into a range where standard flow matching succeeds, without heavy-tailed base distributions or architectural modifications. We provide theoretical intuition for why this works: the log-transform maps Pareto tails to exponentials, and the induced dynamics implement a form of tail annealing via power transformations. On a 144-configuration multivariate benchmark (3 copulas, $d$ up to 100, 4 tail indices), Log-FM dominates specialized baselines on $W_1$, CVaR$_{99}$, and extreme-quantile metrics, and is the only method with zero severe divergences across 2{,}880 runs.

2605.20040 2026-05-20 cs.LG 版本更新

Active Context Selection Improves Simple Regret in Contextual Bandits

主动上下文选择提升上下文老虎机中的简单遗憾

Mohammad Shahverdikondori, Jalal Etesami, Negar Kiyavash

发表机构 * College of Management of Technology, EPFL(EPFL技术管理学院) Department of Computer Science, TU Munich(慕尼黑工业大学计算机科学系)

AI总结 本文研究了具有有限上下文空间的上下文多臂老虎机问题,通过主动选择上下文样本来优化简单遗憾,提出了一种在已知和未知上下文分布时均能有效提升性能的算法。

详情
AI中文摘要

我们研究了具有有限上下文空间(即亚群体)的上下文多臂老虎机问题,其中学习者为每个上下文推荐最佳动作,并通过上下文加权简单遗憾进行评估。我们的保证是在奖励分布的最坏情况下,同时保持对上下文分布向量p的实例依赖性。类似于实验设计问题,其中感兴趣的总体是固定的但可选的亚群体可以被控制,我们允许学习者主动选择从何处采样上下文。对于已知的p,我们刻画了紧致的遗憾率:被动采样(上下文随机揭示)的遗憾为顺序√(n/T ||p||_{1/2}),而主动采样(分配q_j ∝ p_j^{2/3})则达到紧致的速率√(n/T) ||p||_{2/3}。所获得的改进可以达到Θ(k^{1/4}),其中k是上下文的数量。我们进一步将分析扩展到预算化的主动采样,刻画相应的紧致速率,并确定何时有限的主动预算足以恢复完全主动的速率。当p未知时,我们提出探索-探索-然后-提交(EETC)算法,该算法在大时间范围内能够匹配已知p的主动速率,仅相差常数因子。在合成和现实数据上的实验支持了我们的理论发现。

英文摘要

We study the contextual multi-armed bandit problem with a finite context space (a.k.a. subpopulations), where the learner recommends a best action for each context and is evaluated by context-weighted simple regret. Our guarantees are worst-case over the reward distributions, while remaining instance-dependent with respect to the context distribution vector $p$. Akin to experimental design problems where the population of interest is fixed but the sampled subpopulation can be controlled, we allow the learner to actively choose which context to sample from. For a known $p$, we characterize tight regret rates: passive sampling where contexts are randomly revealed achieves regret of order $\sqrt{n/T \, \lVert p \rVert_{1/2}}$, whereas active sampling with allocation $q_j \propto p_j^{2/3}$ achieves the tight rate $\sqrt{n/T} \, \lVert p \rVert_{2/3}$. The resulting improvement can be as large as $Θ(k^{1/4})$, where $k$ is the number of contexts. We further extend the analysis to budgeted active sampling, characterize the corresponding tight rate, and identify when a limited active budget suffices to recover the fully active rate. When $p$ is unknown, we propose the Explore-Explore-Then-Commit (EETC) algorithm, which optimally balances estimating the context distribution and the time to switch to active allocation, such that for large horizons, it matches the known-$p$ active rate up to constants. Experiments on synthetic and real-world data support our theoretical findings.

2605.20037 2026-05-20 cs.LG cs.AI 版本更新

When Critics Disagree: Adaptive Reward Poisoning Attacks in RIS-Aided Wireless Control System

当批评者意见不一致时:RIS辅助无线控制系统中的自适应奖励中毒攻击

Deemah H. Tashman, Soumaya Cherkaoui

发表机构 * Department of Computer and Software Engineering(计算机与软件工程系)

AI总结 本文提出了一种基于分歧引导的奖励中毒攻击(DGRP),用于攻击Soft Actor-Critic(SAC)智能体,以评估RIS辅助网络中深度强化学习(DRL)的鲁棒性。

详情
AI中文摘要

奖励中毒攻击对基于学习的无线控制系统构成了重大风险。为此,我们提出了一种在受Reconfigurable Intelligent Surfaces(RIS)辅助的Cognitive Radio Network(CRN)环境中,针对Soft Actor-Critic(SAC)智能体的Disagreement-Guided Reward Poisoning(DGRP)自适应攻击。SAC智能体的任务是通过同时优化二次用户(SUs)的发射功率和RIS相移,以最大化长期二次用户的速率。DGRP在SAC双批评者表现出显著分歧时(尤其在高杠杆、高不确定性状态下)污染奖励,导致价值估计扭曲并引导策略朝向次优动作。我们的研究发现,DGRP显著降低了RIS通常提供的性能提升,并降低了传输质量。我们进一步研究了关键攻击参数及其对学习的影响。与周期性定时和探索触发基线相比,DGRP始终造成更大的损害,突显了在评估RIS辅助网络中DRL鲁棒性时考虑分歧意识威胁的必要性。

英文摘要

Reward-poisoning attacks present a significant risk to learning-based wireless control systems. Given this, we propose a Disagreement-Guided Reward Poisoning (DGRP) adaptive attack on a Soft Actor-Critic (SAC) agent. In a Cognitive Radio Network (CRN) environment assisted by Reconfigurable Intelligent Surfaces (RIS), the SAC agent is tasked with maximizing the long-term secondary users' (SUs) rate by simultaneously optimizing the transmission power of the SU transmitter and the RIS phase shifts. DGRP corrupts rewards, particularly when the SAC dual critics exhibit substantial disagreement-especially in high-leverage, high-uncertainty states-resulting in distorted value estimations and guiding the policy towards suboptimal actions. Our findings demonstrate that DGRP substantially diminishes the performance improvements typically provided by RIS and degrades transmission quality. We further investigate key attack parameters and determine their impact on learning. In comparison to periodic-timing and exploration-triggered baselines, DGRP consistently causes greater damage, highlighting the necessity of considering disagreement-aware threats when evaluating the robustness of Deep Reinforcement Learning (DRL) in RIS-assisted networks.

2605.20032 2026-05-20 cs.LG cs.MM 版本更新

CAMERA: Adapting to Semantic Camouflage in Unsupervised Text-Attributed Graph Fraud Detection

CAMERA: 适应语义伪装的无监督文本属性图欺诈检测

Junjun Pan, Yixin Liu, Yu Zheng, Lianhua Chi, Alan Wee-Chung Liew, Shirui Pan

发表机构 * School of Information and Communication Technology, Griffith University, Australia(格里菲斯大学信息与通信技术学院,澳大利亚) Department of Computer Science and Information Technology, La Trobe University, Australia(拉特罗布大学计算机科学与信息技术系,澳大利亚)

AI总结 本文提出CAMERA框架,通过适应性多 cue 专家模型来应对语义伪装问题,利用图结构和文本属性信息进行无监督欺诈检测,提高对伪装欺诈者的识别能力。

Comments Accepted by IJCAI 2026

详情
AI中文摘要

文本属性图欺诈检测(TAGFD)在防止在线社交和电子商务平台上欺诈活动方面起着关键作用。然而,为了逃避检测,欺诈者不断演变其伪装策略,通过刻意模仿良性用户的文本响应来隐藏其恶意目的。这种现象称为语义伪装,从根本上破坏了对结构和属性线索如何被用来识别欺诈者的常见假设,并使在无监督TAGFD中发现欺诈者变得困难。为了解决这一问题,我们提出了一个案例自适应多 cue 专家框架(CAMERA)用于无监督TAGFD。CAMERA采用了一个ego解耦的混合专家架构,其中每个专家专门建模一种不同的欺诈指示线索。引入了一个上下文感知的门控模型,以联合考虑ego节点表示及其局部邻域上下文,以适应不同专家学习的线索的集成。此外,CAMERA利用欺诈者的固有稀有性,支持无监督的一类学习,通过专家级目标鼓励建模主导的良性模式,从而实现对伪装欺诈者的可靠无监督检测。在四个具有挑战性的数据集上的实验表明,CAMERA在对抗语义伪装欺诈者方面优于竞争对手,证明了其有效性。代码可在https://github.com/CampanulaBells/CAMERA获取。

英文摘要

Text-attributed graph fraud detection (TAGFD) plays a critical role in preventing fraudulent activities on online social and e-commerce platforms. However, to evade detection, fraudsters continuously evolve their camouflaging strategies by deliberately mimicking textual responses of benign users, thereby concealing their malicious purposes. This phenomenon, referred to as semantic camouflage, fundamentally undermines commonly relied assumptions on how structural and attribute cues can be exploited to identify fraudsters, and makes it difficult to spot fraudsters with unsupervised TAGFD. To bridge the gaps, we propose a Case-Adaptive Multi-cue Expert fRAmework (CAMERA) for unsupervised TAGFD. CAMERA employs an ego-decoupled mixture-of-experts architecture, where each expert specializes in modeling a distinct type of fraud-indicative cue. A context-informed gating model is introduced to jointly consider the ego node representation and its local neighborhood context for adaptive integration of cues learned by different experts. Furthermore, CAMERA leverages the inherent rarity of fraudsters to support unsupervised one-class learning with expert-level objectives that encourage modeling dominant benign patterns, thereby enabling reliable unsupervised detection of camouflaged fraudsters. Experiments on 4 challenging datasets show that CAMERA consistently outperforms competitors, showing its effectiveness against semantically camouflaged fraudsters. Code available at https://github.com/CampanulaBells/CAMERA

2605.20028 2026-05-20 cs.LG physics.ao-ph 版本更新

Training-Free Bayesian Filtering with Generative Emulators

无需训练的贝叶斯过滤与生成模拟器

Thomas Savary, François Rozet, Gilles Louppe

发表机构 * SAIL, Montefiore institute, University of Liège, Belgium(SAIL、蒙费伊尔研究所、利耶大学、比利时)

AI总结 本文提出一种无需额外训练的最优粒子滤波变种,利用基于扩散的动力学模拟器,解决了高维环境下粒子滤波的可扩展性问题,通过非线性混沌系统实验验证了其有效性。

Comments Accepted as a spotlight paper at the International Conference on Machine Learning 2026

详情
AI中文摘要

贝叶斯过滤是一个旨在从观测中估计动态系统合理状态的知名问题。在现有解决方案中,粒子滤波在非线性动态和观测中理论上是精确的,但在高维情况下扩展性差。本文展示,基于扩散的动力学模拟器可以无需额外训练地实现一种最优的粒子滤波变种,这种变种由于经典数值求解器的实现挑战而长期未被探索。非线性混沌系统(包括大气动力学)的实验表明,所提出的方法成功将粒子滤波扩展到高维设置。

英文摘要

Bayesian filtering is a well-known problem that aims to estimate plausible states of a dynamical system from observations. Among existing approaches to solve this problem, particle filters are theoretically exact for non-linear dynamics and observations, but suffer from poor scalability in high dimensions. In this work, we show that diffusion-based emulators of dynamical systems can be used to implement, without additional training, an optimal variant of particle filters that has remained largely unexplored due to implementation challenges with classical numerical solvers. Experiments on nonlinear chaotic systems, including atmospheric dynamics, demonstrate that the proposed approach successfully scales particle filtering to high-dimensional settings.

2605.20009 2026-05-20 cs.LG cs.AI cs.NE 版本更新

Training Neural Networks with Optimal Double-Bayesian Learning

用最优双贝叶斯学习训练神经网络

Vy Bui, Hang Yu, Karthik Kantipudi, Ziv Yaniv, Stefan Jaeger

发表机构 * Lister Hill National Center for Biomedical Communications, National Library of Medicine National Institutes of Health(利斯特希尔国家生物医学通讯中心、国家医学图书馆国家卫生研究院)

AI总结 本文提出了一种新的概率框架,用于学习率这一关键参数,通过双贝叶斯决策机制改进随机梯度下降,从而推导出理论上最优的学习率,并在多种任务中验证其有效性。

Comments 13 pages, 4 figures; see also arXiv:2410.12984 [cs.LG]

详情
AI中文摘要

反向传播与梯度下降是大多数机器学习神经网络架构中常用的优化策略。然而,找到指导训练的最优超参数已证明具有挑战性。尽管普遍认可选择合适参数对于避免过拟合和获得无偏结果至关重要,但这一选择仍主要基于经验实验和经验。本文提出了一种新的概率框架,用于学习率这一随机梯度下降中的关键参数。该框架将经典贝叶斯统计发展为一种涉及两个对抗性贝叶斯过程的双贝叶斯决策机制。从这两个过程可以推导出理论上最优的学习率,并用于随机梯度下降。在各种分类、分割和检测任务中的实验验证了理论上推导出的学习率的实践意义。本文还讨论了所提出的双贝叶斯框架对网络训练和模型性能的影响。

英文摘要

Backpropagation with gradient descent is a common optimization strategy employed by most neural network architectures in machine learning. However, finding optimal hyperparameters to guide training has proven challenging. While it is widely acknowledged that selecting appropriate parameters is crucial for avoiding overfitting and achieving unbiased outcomes, this choice remains largely based on empirical experiments and experience. This paper presents a new probabilistic framework for the learning rate, a key parameter in stochastic gradient descent. The framework develops classic Bayesian statistics into a double-Bayesian decision mechanism involving two antagonistic Bayesian processes. A theoretically optimal learning rate can be derived from these two processes and used for stochastic gradient descent. Experiments across various classification, segmentation, and detection tasks corroborate the practical significance of the theoretically derived learning rate. The paper also discusses the ramifications of the proposed double-Bayesian framework for network training and model performance.

2605.20005 2026-05-20 cs.LG 版本更新

Fine-Tuning Without Forgetting via Loss-Adaptive Learning Rates

通过损失自适应学习率实现无遗忘的微调

Parjanya Prajakta Prashant, Jiongli Zhu, Aldan Creo, Babak Salimi

发表机构 * University of California San Diego(加州大学圣迭戈分校)

AI总结 本文提出了一种损失自适应学习率调度方法FINCH,通过动态调整学习率来减少微调过程中的遗忘现象,同时保持任务性能,从而在知识获取、科学和低资源语言适应等基准测试中显著提升了模型表现。

Comments 25 pages

详情
AI中文摘要

在新数据上微调大型语言模型可以提高任务性能,但会损害预训练期间学到的能力,这种现象称为灾难性遗忘。现有方法通过修改微调目标来抑制高损失的token或序列,但这些token对于学习新任务尤其重要,尤其是那些预训练覆盖不足的任务。在这种情况下,硬token仍应有助于学习,因此必须在不抑制它们的情况下控制遗忘。我们发现了一个简单的机制:每步的遗忘受学习率和当前训练损失平方根的乘积限制。这表明高损失批次尤其容易引发遗忘。受此启发,我们引入了FINCH,一种损失自适应的学习率调度方法,它在高损失批次上降低学习率,在模型收敛时增加学习率,同时保持微调目标不变。在知识获取、科学和低资源语言适应基准测试中,FINCH平均减少了93%的遗忘,同时保持与标准微调相当的任务性能。在Qwen3-4B知识获取任务中,FINCH将TruthfulQA的退化减少了5倍,并逆转了HaluEval的退化,同时更好地保持了置信度校准。总体而言,我们的结果表明,学习率调度是微调过程中塑造模型行为的有效工具,而不仅仅是为了目标任务优化。

英文摘要

Fine-tuning large language models on new data improves task performance but degrades capabilities learned during pretraining, a phenomenon known as catastrophic forgetting. Existing methods mitigate this by modifying the fine-tuning objective to suppress high-loss tokens or sequences, but these tokens are essential for learning new tasks, especially those with poor pretraining coverage. In such settings, hard tokens should still contribute to learning, so forgetting must be controlled without suppressing them. We identify a simple mechanism for doing so: per-step forgetting is bounded by the product of the learning rate and the square root of the current training loss. This suggests that high-loss batches are especially prone to inducing forgetting. Motivated by this observation, we introduce FINCH, a loss-adaptive learning-rate schedule that reduces the learning rate on high-loss batches and increases it as the model converges, while leaving the fine-tuning objective unchanged. Across knowledge acquisition, science, and low-resource language adaptation benchmarks, FINCH reduces forgetting by 93% on average while matching the task performance of standard fine-tuning. On Qwen3-4B knowledge acquisition, FINCH cuts TruthfulQA degradation by 5x and reverses HaluEval degradation, while better preserving confidence calibration. Overall, our results show that learning-rate schedules are an effective tool to shape model behavior during fine-tuning, beyond just target-task optimization.

2605.19999 2026-05-20 cs.LG cs.AI cs.CR 版本更新

LLM Benchmark Datasets Should Be Contamination-Resistant

LLM基准数据集应具备抗污染性

Ali Al-Lawati, Jason Lucas, Dongwon Lee, Suhang Wang

发表机构 * The Pennsylvania State University, University Park, PA, USA(宾夕法尼亚州立大学)

AI总结 本文探讨了LLM基准数据集应具备抗污染性,提出通过改进数据集设计和架构来提高其可靠性和通用性。

Comments Accepted to ICML 2026 Position Paper Track

详情
AI中文摘要

基准数据集对于可重复、可靠和具有判别性的LLM评估至关重要。然而,最近的研究表明,许多基准数据集包含在预训练语料库中,即被污染,这降低了它们作为可靠模型泛化度量的价值。在本文中,我们主张基准数据集应具备抗污染性,即不可学习但支持推理。为此,我们首先强调基准数据集污染的广泛存在,并概述抗污染数据集的性质。其次,我们强调Transformer架构中推理和训练流程之间的不对称性可以用来支持抗污染性。第三,我们概述了使这些数据集在各种LLM架构之间互操作的数学进展。基于上述内容,我们呼吁社区通过:(i) 推动新的抗污染方法,(ii) 开发支持方法和平台,以及(iii) 在现有评估流程中采用抗污染基准来确保LLM评估的可靠性。

英文摘要

Benchmark datasets are critical for reproducible, reliable, and discriminative evaluation of LLMs. However, recent studies reveal that many benchmark datasets are included in pretraining corpora, i.e., $\textit{contaminated}$, which diminishes their value as reliable measures of model generalization. In this paper, we argue that benchmark datasets should be $\textit{contamination-resistant}$, i.e., $\textit{unlearnable}$, but support $\textit{inference}$. To accomplish this, we first highlight the wide prevalence of benchmark dataset contamination and outline the properties of contamination-resistant datasets. Second, we highlight how the asymmetry between the inference and training pipelines in the Transformer architecture can be leveraged to support contamination-resistance. Third, we outline mathematical advancements to make these datasets interoperable across various LLM architectures. Based on the above, we call on the community to ensure the reliability of LLM benchmarking by: (i) advancing novel contamination-resistant methodologies, (ii) developing supporting methods and platforms, and (iii) adopting contamination-resistant benchmarks into existing evaluation pipelines.

2605.19990 2026-05-20 cs.RO cs.CV cs.LG 版本更新

Minimalist Visual Inertial Odometry

极简视觉惯性里程计

Francesco Pasti, Jeremy Klotz, Nicola Bellotto, Shree K. Nayar

发表机构 * Department of Information Engineering, University of Padua(帕多瓦大学信息工程系) Computer Science Department, Columbia University(哥伦比亚大学计算机科学系)

AI总结 本文提出了一种极简的平面里程计方法,通过四个视觉测量和一个IMU实现差分驱动机器人的鲁棒运动估计,展示了极简传感在高效准确平面里程计中的应用。

Comments This work has been submitted to the IEEE for possible publication

详情
AI中文摘要

视觉-惯性里程计(VIO)对于移动机器人导航至关重要,但使用高像素相机需要大量资源。本文提出了一种极简方法用于平面里程计,证明仅四个视觉测量和一个IMU即可为差分驱动机器人提供可靠的运动估计。我们的关键见解是四个向下 facing 的光电二极管通过光学Gabor掩码感知世界,产生编码速度的信号。基于此,我们利用物理基础模拟器联合优化掩码参数和时间卷积网络(TCN)。所得到的模型仅通过光电二极管产生的四个测量值解码速度。将这些估计与IMU提供的角速度结合,可以得到连续的平面轨迹。我们通过将原型传感器安装在差分驱动机器人上验证了我们的方法。在多样化的室内和室外地形上,我们的系统能够紧密跟踪参考真实地面,无需任何现实中的微调。我们的工作表明,极简传感能够实现高效且准确的平面里程计。

英文摘要

Visual-Inertial Odometry(VIO), which is critical to mobile robot navigation, uses cameras with a large number of pixels. Capturing and processing camera images requires significant resources. This work presents a minimalist approach to planar odometry, demonstrating that just four visual measurements and an IMU can provide robust motion estimation for differential-drive robots. Our key insight is that four downward-facing photodiodes that sense the world through optical Gabor masks produce signals that encode speed. Based on this, we jointly optimize the mask parameters alongside a Temporal Convolutional Network (TCN) using a physically-grounded simulator. The resulting model decodes speed from just the four measurements produced by the photodiodes. Pairing these estimates with the angular speed from an IMU yields a continuous planar trajectory. We validate our approach with a prototype sensor mounted on a differential drive robot. Across diverse indoor and outdoor terrains, our system closely tracks the reference ground truth without any real-world fine-tuning. Our work shows that minimalist sensing enables efficient and accurate planar odometry.

2605.19986 2026-05-20 cs.RO cs.CV cs.LG 版本更新

Beyond Binary Success: A Diagnostic Meta-Evaluation Framework for Fine-Grained Manipulation

超越二元成功:一种用于细粒度操控的诊断元评估框架

He-Yang Xu, Pengyuan Zhang, Zongyuan Ge, Xiaoshuai Hao, Serge Belongie, Xin Geng, Yuxin Peng, Xiu-Shen Wei

发表机构 * Southeast University(东南大学) Monash University(墨尔本大学) Xiaomi EV(小米电动车) University of Copenhagen(哥本哈根大学) Peking University(北京大学)

AI总结 本文提出MetaFine框架,通过分解理解、感知和受控行为三个维度,诊断细粒度操控中的能力瓶颈,并通过因果干预识别视觉编码器在保持局部空间结构方面的关键限制,从而提升操控精度。

Comments Project page: https://metafine.github.io/

详情
AI中文摘要

细粒度操控标志着一个领域,其中全局场景上下文不再足够,成功取决于局部属性定位、高保真空间感知和符合约束的运动执行之间的紧密耦合。然而,当前的具身AI基准测试将这些能力简化为二元成功率,系统性地将报告能力夸大了多达70%,并掩盖了阻碍实际应用的架构瓶颈。我们引入了MetaFine,一种诊断元评估框架,通过分解理解、感知和受控行为三个轴来分离操控能力。基于组合任务图,MetaFine吸收异构外部基准,并在统一协议下重构为不同复杂度的诊断场景。通过这一视角评估最先进的视觉-语言-动作(VLA)模型,揭示了传统度量无法发现的严重维度特定失败。通过针对性的因果干预,我们确定了视觉编码器保持局部空间结构的能力是细粒度精度的关键瓶颈:改进它可以直接解锁之前无法触及的操控能力,而无需修改下游策略。MetaFine进一步支持混合真实-仿真验证,利用有限的配对现实运行来校准可扩展的仿真基于估计,以获得更稳定的物理基准测试。通过将评估从排名转向诊断,MetaFine将基准测试转变为修复真实物理敏捷性底层能力的可行指南。MetaFine框架、基准和相关资源将在项目页面上公开发布:https://metafine.github.io/。

英文摘要

Fine-grained manipulation marks a regime where global scene context no longer suffices, and success hinges on the tight coupling of local attribute grounding, high-fidelity spatial perception, and constraint-respecting motor execution. However, current embodied AI benchmarks collapse these capacities into binary success rates, systematically inflating reported capabilities by up to 70% and masking the architectural bottlenecks that impede real-world deployment. We introduce MetaFine, a diagnostic meta-evaluation framework that disentangles manipulation competency along three axes: understanding, perception, and controlled behavior. Built on a compositional task graph, MetaFine absorbs heterogeneous external benchmarks and reconstructs them into diagnostic scenarios of varying complexity under a unified protocol. Evaluating state-of-the-art vision-language-action (VLA) models through this lens exposes severe dimension-specific failures invisible to conventional metrics. Through targeted causal intervention, we identify the visual encoder's ability to preserve local spatial structure as a key bottleneck for fine-grained precision: improving it directly unlocks previously inaccessible manipulation capabilities without modifying downstream policies. MetaFine further supports hybrid real-sim validation, using limited paired real-world rollouts to calibrate scalable simulation-based estimates for more stable physical benchmarking. By shifting evaluation from ranking to diagnosis, MetaFine turns benchmarking into an actionable compass for repairing the layered capacities underlying genuine physical dexterity. The MetaFine framework, benchmarks, and supporting resources will be publicly released at our project page: https://metafine.github.io/.

2605.19975 2026-05-20 cs.LG cs.AI 版本更新

Learning with Foresight: Enhancing Neural Routing Policy via Multi-Node Lookahead Prediction

具有前瞻性学习:通过多节点前瞻性预测增强神经路由策略

Xia Jiang, Yaoxin Wu, Yew-Soon Ong, Yingqian Zhang

发表机构 * Eindhoven University of Technology(埃因霍温理工大学) Nanyang Technological University(南洋理工大学) Agency for Science, Technology and Research (A*STAR)(科技研究局(A*STAR))

AI总结 本研究提出多节点前瞻性预测(MnLP)方法,通过扩展监督学习范式同时预测多个未来节点,提升神经路由策略的长期规划能力,并在不同问题规模和现实基准上改进泛化能力。

Comments Accepted by the 35th International Joint Conference on Artificial Intelligence

详情
AI中文摘要

神经策略因其对人工启发式依赖的减少而在解决车辆路径问题中展现出潜力。然而,当前的训练范式存在根本性局限:它们主要关注下一个节点的预测,导致短视决策,削弱了长期规划能力。为此,我们引入多节点前瞻性预测(MnLP),一种新的训练策略,扩展监督学习范式以同时预测多个未来节点。我们整合了因果性和可丢弃的MnLP模块,这些模块仅在训练期间运行,使模型能够预测多步决策,同时保持推理时的效率。通过将多深度辅助监督融入损失函数,MnLP使神经策略具备长距离上下文理解能力。实验表明,MnLP在现有训练方法上表现更优,提升了神经策略在各种问题规模、分布和现实基准上的泛化能力。此外,MnLP可以无缝集成到不同的神经架构中,而不引入额外的推理开销。

英文摘要

Neural policies have shown promise in solving vehicle routing problems due to their reduced reliance on handcrafted heuristics. However, current training paradigms suffer from a fundamental limitation: they primarily focus on next-node prediction for solution construction, resulting in myopic decision-making that undermines long-horizon planning capacity. To this end, we introduce Multi-node Lookahead Prediction (MnLP), a novel training strategy that extends the supervised learning paradigm to predict multiple future nodes simultaneously. We incorporate causal and discardable MnLP modules that operate exclusively during training, facilitating models to anticipate multi-step decisions while preserving inference-time efficiency. By incorporating multi-depth auxiliary supervision into the loss function, MnLP equips neural policies with the ability of long-range contextual understanding. Experimentally, MnLP outperforms existing training methods, improving the generalization capability of neural policies across various problem sizes, distributions, and real-world benchmarks. Moreover, MnLP can be seamlessly integrated into diverse neural architectures without introducing additional inference overhead.

2605.19972 2026-05-20 cs.LG cs.AI cs.DB cs.DS 版本更新

Block-Sphere Vector Quantization

块球向量量化

Heesang Ann, Joongkyu Lee, Min-hwan Oh

发表机构 * Seoul National University(首尔国立大学)

AI总结 本文研究了向量量化方法,通过统一理论比较不同旋转量化器,揭示其性能依赖于特定的失真度量标准,并提出块球量化算法以改进旋转块量化。

详情
AI中文摘要

向量量化是可扩展机器学习系统中的基本操作,能够实现内存高效存储、快速检索和压缩推理。最近的旋转基于量化器如EDEN、RabitQ和TurboQuant引入了强保证和实证性能,但其周围比较难以解释,因为它们依赖于不同的失真标准、概率领域和实现假设。作为我们的第一个贡献,我们提供了这些方法的统一理论比较,表明其相对优势是标准依赖的而非绝对的:EDEN和TurboQuant在均方失真方面有利,EDEN在预期内积失真方面也有效,而RabitQ提供强的高概率控制。此比较进一步表明EDEN在预期失真度量方面提供特别强的保证。作为我们的第二个贡献,我们引入了块球量化(BlockQuant),一种新的旋转块量化算法,围绕随机旋转向量的球几何设计。不同于坐标wise量化器,BlockQuant在球面上量化块,更忠实保持旋转嵌入的几何结构。我们证明这种块球设计在本文考虑的基准上理论上在重建MSE和预期内积失真方面均有所改进。我们在真实嵌入数据集和长上下文LLM推理任务上的实验显示了实际收益,与我们的理论改进一致。

英文摘要

Vector quantization is a fundamental primitive for scalable machine learning systems, enabling memory-efficient storage, fast retrieval, and compressed inference. Recent rotation-based quantizers such as EDEN, RabitQ, and TurboQuant have introduced strong guarantees and empirical performance, but the surrounding comparisons have been difficult to interpret because they rely on different distortion criteria, probability regimes, and implementation assumptions. As our first contribution, we provide a unified theoretical comparison of these methods and show that their relative advantages are criterion-dependent rather than absolute: EDEN and TurboQuant are favorable for MSE distortion, EDEN is also effective for expected inner-product distortion, and RabitQ provides strong high-probability control. This comparison further clarifies that EDEN provides particularly strong guarantees for expected distortion measures. As our second contribution, we introduce Block-Sphere Quantization (BlockQuant), a new rotation-based block quantization algorithm designed around the spherical geometry of randomly rotated vectors. Unlike coordinate-wise quantizers, BlockQuant quantizes blocks on the sphere, preserving the geometry of rotated embeddings more faithfully. We prove that this block-spherical design theoretically improves over the baselines considered in this paper for both reconstruction MSE and expected inner-product distortion. Our experiments on real embedding datasets and long-context LLM inference tasks show practical gains that are consistent with our theoretical improvements.

2605.19966 2026-05-20 cs.LG cs.AI 版本更新

Detecting Fluent Optimization-Based Adversarial Prompts via Sequential Entropy Changes

通过顺序熵变化检测基于优化的对抗性提示

Mohammed Alshaalan, Miguel R. D. Rodrigues

发表机构 * Department of Electronic and Electrical Engineering, University College London, London, United Kingdom(电子与电气工程系,伦敦大学学院,伦敦,英国)

AI总结 本文提出了一种基于在线变化点检测的对抗性后缀检测方法CPD,通过标准化用户令牌熵并应用单侧CUSUM统计量,提高了对优化基于对抗性提示的检测性能,同时在多个大型语言模型上实现了更高的F1分数和AUC性能。

Comments Accepted at ICML 2026; 20 pages, including 9 pages main text, references, and appendix

详情
AI中文摘要

基于优化的对抗性后缀可以劫持对齐的大型语言模型(LLMs),同时保持流畅,这削弱了静态和窗口化困惑度基于的检测器。我们把对抗性后缀检测视为一个在线变化点检测问题,针对令牌级下一个令牌熵流。使用LLM系统提示来估计一个稳健的基线,我们标准化用户令牌熵并应用单侧CUSUM统计量。所得到的检测器CPD(在线变化点检测)是模型无关的,无需训练,可以在线运行,并能定位对抗性后缀的起始。在1,012个优化基于的后缀攻击(GCG,AutoDAN,AdvPrompter,BEAST,AutoDAN-HGA)和1,012个困惑度控制的良性提示的基准上,CPD在六个开源权重聊天模型(LLaMA-2-7B/13B,Vicuna-7B/13B,Qwen2.5-7B/14B)上均优于最强的窗口化困惑度基线。在LLaMA-2-7B的典型CUSUM设置(k=0)下,CPD达到AUC 0.88和F1 0.82。除了提示级检测外,CPD将79.6%的触发集中在对抗性后缀内,而窗口化困惑度为17-46%。最后,当用作LLaMA Guard的轻量级门控时,CPD在高流量、良性主导的部署中减少了17-22%的门控调用,同时保持了门控级别的检测质量。

英文摘要

Optimization-based adversarial suffixes can jailbreak aligned large language models (LLMs) while remaining fluent, weakening static and windowed perplexity-based detectors. We cast adversarial suffix detection as an online change-point detection problem over the token-level next-token entropy stream. Using the LLM system prompt to estimate a robust baseline, we standardize user-token entropies and apply a one-sided CUSUM statistic. The resulting detector, CPD Online (CPD), is model-agnostic, training-free, runs online, and localizes the adversarial suffix onset. On a benchmark of 1,012 optimization-based suffix attacks (GCG, AutoDAN, AdvPrompter, BEAST, AutoDAN-HGA) and 1,012 perplexity-controlled benign prompts, CPD improves F1 over the strongest windowed-perplexity baseline on all six open-weight chat models (LLaMA-2-7B/13B, Vicuna-7B/13B, Qwen2.5-7B/14B). On LLaMA-2-7B at the canonical CUSUM setting ($k=0$), CPD reaches AUROC $0.88$ and F1 $0.82$. Beyond prompt-level detection, CPD concentrates 79.6% of its triggers inside the adversarial suffix, versus 17-46% for windowed perplexity. Finally, when used as a lightweight gate for LLaMA Guard, CPD reduces guard calls by 17-22% on a high-volume, benign-dominated deployment while preserving guard-level detection quality

2605.19959 2026-05-20 cs.LG math.FA 版本更新

Learning Orthonormal Bases for Function Spaces

在函数空间中学习正交基

Hamidreza Kamkari, Mohammad Sina Nabizadeh, Justin Solomon

发表机构 * MIT CSAIL(麻省理工学院计算机科学与人工智能实验室)

AI总结 本文提出通过神经网络学习和优化函数空间中的正交基,利用李群的流形性质,证明即使使用有限秩生成器,也能在适当算子拓扑下实现正交基的稠密性。

详情
AI中文摘要

无限维正交基的展开在表示和计算函数空间时起着核心作用,由于其有利的线性代数性质。然而,常见的基如傅里叶或小波基是固定的,不能适应给定问题或数据集的结构。本文旨在用神经网络表示这些基并进行优化。我们的关键思想是,任何目标无限维正交基可以视为李群的流形上的一个点,或者等价地,视为连接参考基(例如傅里叶基)到该目标基的连续路径的终点。流形上的路径满足由斜反对称积分算子所支配的常微分方程(ODE)。使用神经网络定义此类ODE的有限秩生成器,使我们能够参数化和优化函数空间中的正交基。虽然使用有限秩生成器来建模无限算子可能显得限制,但我们证明了一个普遍性结果:即使使用秩2的生成器,ODE的积分解在适当的算子拓扑下在正交群中也是稠密的。换句话说,对于任何目标正交基,存在一条从参考基出发并由有限秩生成器驱动的路径,可以无限接近该目标基。我们通过将傅里叶基转换为功能数据集的主成分、线性算子的本征函数或能量守恒物理模拟的动力模式,展示了该框架的灵活性。

英文摘要

Infinite-dimensional orthonormal basis expansions play a central role in representing and computing with function spaces due to their favorable linear algebraic properties. However, common bases such as Fourier or wavelets are fixed and do not adapt to the structure of a given problem or dataset. In this paper, we aim to represent these bases with neural networks and optimize them. Our key idea is that any target infinite-dimensional orthonormal basis can be viewed either as a point on the Lie manifold of the orthogonal group, or equivalently, as the endpoint of a continuous path on that manifold that connects a reference basis, e.g. Fourier, to that target. Paths on the Lie manifold satisfy ordinary differential equations (ODEs) governed by skew-adjoint integral operators. Using neural networks to define finite-rank generators of such ODEs allows us to parameterize and optimize orthonormal bases in function space. While relying on finite-rank generators to model infinite operators might seem restrictive, we prove a universality result: even with a rank-2 generator, the integrated solutions of the ODE are dense in the orthogonal group under the appropriate operator topology. In other words, for any target orthonormal basis, there exists a path originating from a reference basis and driven by finite-rank generators that gets arbitrarily close to that target basis. We demonstrate the flexibility of our framework by transforming the Fourier basis into the principal components of a functional dataset, eigenfunctions of linear operators, or dynamic modes of energy-preserving physical simulations.

2605.19947 2026-05-20 cs.LG 版本更新

Exploiting Non-Negativity in DAG Structure Learning

利用非负性在DAG结构学习中的应用

Samuel Rey, Madeline navarro, Gonzalo Mateos

发表机构 * Dept. of Signal Theory and Communications, Universidad Rey Juan Carlos(信号理论与通信系,雷昂·卡洛斯大学) Dept. of Electrical and Computer Engineering, Rice University(电气与计算机工程系,里奇大学) Dept. of Electrical and Computer Engineering, University of Rochester(电气与计算机工程系,罗切斯特大学)

AI总结 本文研究了如何通过非负性约束简化DAG结构学习中的非凸优化问题,并提出了基于多pliers方法的正则化非负DAG学习算法,证明了在总体情况下真实DAG是唯一全局最小值点。

详情
AI中文摘要

本文研究了从节点观测数据学习有向无环图(DAG)的问题,这些数据由线性结构方程模型生成。DAG学习是信号处理、机器学习和因果推断中的核心任务,但其挑战在于无环性是一个全局组合性质。连续无环约束通过将离散DAG约束替换为光滑等式约束促进了算法进展。然而,现有方法仍然涉及困难的非凸优化景观并可能遭受退化的一阶最优条件。本文专注于具有非负边权的DAG,并利用此额外结构获得更简单的无环性表征。基于此表征,我们提出了一个正则化的非负DAG学习问题,并开发基于多pliers方法的算法。我们进一步分析了非负性诱导的良性优化景观。在总体情况下,我们证明真实DAG是所提出增广拉格朗日公式唯一的全局最小值点;此外,景观中没有虚假的内部 stationary 点,且真实DAG是唯一的无环KKT点。在合成和真实数据上的数值实验表明,所提方法优于现有连续DAG学习方法。

英文摘要

This work addresses the problem of learning directed acyclic graphs (DAGs) from nodal observations generated by a linear structural equation model. DAG learning is a central task in signal processing, machine learning, and causal inference, but it remains challenging because acyclicity is a global combinatorial property. Continuous acyclicity constraints have led to important algorithmic advances by replacing the discrete DAG constraint with smooth equality constraints. However, existing formulations still involve difficult non-convex optimization landscapes and may suffer from degenerate first-order optimality conditions. Here, we restrict attention to DAGs with non-negative edge weights and exploit this additional structure to obtain a simpler characterization of acyclicity. Building on this characterization, we formulate a regularized non-negative DAG learning problem and develop an algorithm based on the method of multipliers. We further analyze the benign optimization landscape induced by non-negativity. In the population regime, we show that the true DAG is the unique global minimizer of the proposed augmented-Lagrangian formulation; moreover, the landscape contains no spurious interior stationary points, and the true DAG is the only acyclic KKT point. Numerical experiments on synthetic and real-world data show that the proposed method improves over state-of-the-art continuous DAG-learning alternatives.

2605.19944 2026-05-20 cs.LG cs.AI cs.CC cs.CL 版本更新

A Measure-Theoretic Analysis of Reasoning: Structural Generalization and Approximation Limits

关于推理的测度论分析:结构泛化与近似限制

Yuyang Zhang, Yifu Zhang, Xuehai Zhou, Xiaoyin Chen

发表机构 * McGill University(麦吉尔大学) Mila - Quebec AI Institute(魁北克AI研究所) Université de Montréal(蒙特利尔大学)

AI总结 本文通过最优传输理论分析推理过程,揭示了结构泛化和近似限制的理论机制,发现位置依赖注意力机制和Transformer电路深度对推理性能有显著影响。

Comments Preprint

详情
AI中文摘要

尽管大型语言模型推理的经验缩放定律已得到充分文档,但支配分布外泛化的理论机制仍不明确。我们通过最优传输形式化推理,将离散轨迹投影到连续度量空间,利用Wasserstein-1距离量化领域偏移。借助Kantorovich对偶性,我们通过架构Lipschitz连续性和函数近似限制来界定分布外泛化。这揭示了两个主要约束。首先,位置依赖注意力(例如绝对位置编码)无法保持偏移不变性,导致Ω(1)的Lipschitz常数和预期风险,而偏移不变机制(例如旋转嵌入)保持等价性并限制误差。其次,通过将顺序回溯映射到Dyck-k语言,我们为TC⁰变换器建立了严格的电路深度下界。物理层深度的扩展是必要的,以避免表示崩溃——这一约束无法通过扩展表示宽度来绕过,因为Barron空间中存在不可约的近似界限。在54种Transformer配置上对组合搜索的评估证实了这些界限,证明泛化风险随Wasserstein领域偏移单调下降。

英文摘要

While empirical scaling laws for LLM reasoning are well-documented, the theoretical mechanisms governing out-of-distribution (OOD) generalization remain elusive. We formalize reasoning via optimal transport, projecting discrete trajectories into a continuous metric space to quantify domain shifts using the Wasserstein-1 distance. Invoking Kantorovich duality, we bound OOD generalization via architectural Lipschitz continuity and functional approximation limits. This exposes two primary constraints. First, position-dependent attention (e.g., Absolute Positional Encoding) fails to preserve shift invariance, yielding an $Ω(1)$ Lipschitz constant and expected risk, whereas shift-invariant mechanisms (e.g., Rotary Embeddings) preserve equivariance and bound the error. Second, by mapping sequential backtracking to a Dyck-$k$ language, we establish a strict circuit depth lower bound for $\text{TC}^0$ Transformers. Scaling physical layer depth is necessary to avert representation collapse -- a constraint that scaling representation width cannot bypass due to irreducible approximation bounds in Barron spaces. Evaluations across 54 Transformer configurations on combinatorial search corroborate these bounds, demonstrating that generalization risk degrades monotonically with the Wasserstein domain shift.

2605.19932 2026-05-20 cs.AI cs.CL cs.LG 版本更新

PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents

PEEK:上下文地图作为长上下文LLM代理的导向缓存

Zhuohan Gu, Qizheng Zhang, Omar Khattab, Samuel Madden

发表机构 * MIT CSAIL(麻省理工学院计算机科学与人工智能实验室) Stanford University(斯坦福大学)

AI总结 本文提出PEEK系统,通过上下文地图缓存和维护导向知识,提升长上下文LLM代理在重复外部上下文中的交互准确性和效率,相比基线方法在推理和上下文学习任务中均取得显著提升。

详情
AI中文摘要

大型语言模型(LLM)代理越来越多地在长且重复的外部上下文中操作,如文档语料库和代码仓库。在多次调用中,现有方法保留的是代理的轨迹、对原始材料的被动访问或任务级别的策略。但它们没有保留我们认为对于重复相同上下文工作负载最需要的:关于重复上下文本身的可重用导向知识(例如,上下文包含什么、如何组织,以及哪些实体、常量和模式历史上有用)。我们引入PEEK,一种系统,通过上下文地图缓存和维护这种导向知识:一个在代理提示中始终存在的小而固定大小的artifact,使代理能够持续查看外部上下文。该地图由一个可编程的缓存策略维护,包含三个模块:一个Distiller从推理时间信号中提取可转移的知识,一个Cartographer将其转换为结构化的编辑,以及一个基于优先级的Evictor强制执行固定的token预算。在长上下文推理和信息聚合中,PEEK在强基线方法上提高了6.3-34.0%,同时使用93-145次更少的迭代,并且成本比最先进的提示学习框架ACE低1.7-5.8倍。在上下文学习中,PEEK在解决率和评分准确性上分别提高了6.0-14.0%和7.8-12.1%,且成本比ACE低1.4倍。这些收益在不同语言模型和代理架构上均能泛化,包括生产级的OpenAI Codex。这些结果表明,上下文地图有助于长上下文LLM代理更准确、更高效地与重复的外部上下文交互。

英文摘要

Large language model (LLM) agents increasingly operate over long and recurring external contexts, like document corpora and code repositories. Across invocations, existing approaches preserve either the agent's trajectory, passive access to raw material, or task-level strategies. None of them preserves what we argue is most needed for repeated same-context workloads: reusable orientation knowledge (e.g., what the context contains, how it is organized, and which entities, constants, and schemas have historically been useful) about the recurring context itself. We introduce PEEK, a system that caches and maintains this orientation knowledge as a context map: a small, constant-sized artifact in the agent's prompt that gives it a persistent peek into the external context. The map is maintained by a programmable cache policy with three modules: a Distiller that extracts transferable knowledge from inference-time signals, a Cartographer that translates it into structured edits, and a priority-based Evictor that enforces a fixed token budget. On long-context reasoning and information aggregation, PEEK improves over strong baselines by 6.3-34.0% while using 93-145 fewer iterations and incurring 1.7-5.8x lower cost than the state-of-the-art prompt-learning framework, ACE. On context learning, PEEK improves solving rate and rubric accuracy by 6.0-14.0% and 7.8-12.1%, respectively, at 1.4x lower cost than ACE. These gains generalize across LMs and agent architectures, including OpenAI Codex, a production-grade coding agent. Together, these results show that a context map helps long-context LLM agents interact with recurring external contexts more accurately and efficiently.

2605.19931 2026-05-20 cs.CV cs.AI cs.LG 版本更新

StruMPL: Multi-task Dense Regression under Disjoint Partial Supervision and MNAR Labels

StruMPL:在不相交的部分监督和MNAR标签下的多任务密集回归

Reza M. Asiyabi, Juan Alberto Molina-Valero, The SEOSAW Partnership, Steven Hancock, Casey M. Ryan

发表机构 * School of Geosciences, University of Edinburgh, UK(爱丁堡大学地球科学学院,英国) National Centre for Earth Observation (NCEO), UK(英国地球观测国家中心) Department of Spatial Sciences, Faculty of Environmental Sciences Czech University of Life Sciences Prague, Praha, Czech Republic(环境科学学院空间科学系,捷克布拉格生命科学大学)

AI总结 本文针对在不相交的部分监督和MNAR标签下的多任务密集回归问题,提出StruMPL方法,通过共享编码器和可学习的物理模块,结合Augmented IPW损失函数,提高了对森林地上生物量的估计精度。

Comments 10 pages with 3 figures and 4 tables, References and Appendix 12 pages with 1 figure and 4 tables

详情
AI中文摘要

从地球观测估计森林地上生物量(AGB)结合了两个结构上不兼容的标签源:空间borne激光雷达在数百万个位置提供冠层结构但没有生物量估计,而地面样地在数千个偏倚位置提供生物量但没有结构指标。没有单个训练样本携带所有目标变量的标签,样地标签不是随机缺失(MNAR),且生物量通过已知但生物体特异性的所有学定律与结构变量相关联。我们将其正式化为在异质不相交部分监督下的多任务密集回归问题,具有MNAR标签和任务间物理约束,并提出StruMPL方法来联合解决。一个共享编码器为每个变量回归、填补和倾向性头提供空间MNAR校正,以及一个可学习的物理模块,该模块在每个像素上评估任务间约束对模型自身预测的影响。监督损失使用Augmented IPW(AIPW)伪结果,其中在倾向性和填补基线上的停止梯度;我们证明了分析和实证上,两者对于联合优化恢复IPW加权的平稳点并保持损失有界是必要的。在两个生态上不同的生物体上,StruMPL在AGB RMSE和偏倚方面优于消融变体和最接近的已发表方法,分层分析显示AIPW减少了高AGB偏倚约54%。

英文摘要

Estimating forest aboveground biomass (AGB) from Earth observation combines two structurally incompatible label sources: spaceborne lidar provides canopy structure at millions of locations but no biomass estimate, and ground-based plots provide biomass at thousands of biased locations but no metrics of structure. No single training sample carries labels for all target variables, plot labels are missing not at random (MNAR), and biomass is linked to the structural variables by known but biome-specific allometric laws. We formalise this as multi-task dense regression under heterogeneous disjoint partial supervision with MNAR labels and inter-task physical constraints, and propose StruMPL to address it jointly. A shared encoder feeds per-variable regression, imputation, and propensity heads for spatial MNAR correction, and a learnable physics module that evaluates the inter-task constraint on the model's own predictions at every pixel. The supervised loss uses an Augmented IPW (AIPW) pseudo-outcome with stop-gradients on the propensity and on the imputation baseline; we show analytically and empirically that both are necessary for joint optimisation to recover IPW-weighted stationary points while keeping the loss bounded. On two ecologically distinct biomes, StruMPL outperforms ablation variants and the closest published method on AGB RMSE and bias, with a stratified analysis showing AIPW reduces high-AGB bias by ~54%.

2605.19928 2026-05-20 cs.GT cs.AI cs.LG 版本更新

Real-Time Parallel Counterfactual Regret Minimization

实时并行反事实遗憾最小化

Boning Li, Longbo Huang

发表机构 * IIIS, Tsinghua University(清华大学信息科学技术学院)

AI总结 本文提出了一种实时深度限制下的CFR求解并行框架,通过剪枝、抽象和高级CFR变体的无缝整合,实现了在几秒内完成近均衡策略计算的高效方法,实验显示在德州扑克中速度提升了3.3-3.4倍。

Comments 13 pages, 3 figures

详情
AI中文摘要

反事实遗憾最小化(CFR)是解决大型不完全信息游戏的主要算法家族,支撑了Libratus和Pluribus等No-Limit Texas Hold'em扑克突破。在实时游戏系统中,求解器必须在仅几秒的严格时间预算内计算近均衡策略,而在此窗口内完成的CFR迭代次数直接决定了游戏表现。我们提出了Parallel CFR,这是首个用于实时深度限制CFR求解的并行化框架,无缝整合了剪枝、抽象和高级CFR变体。我们将每个CFR迭代分解为七个阶段的流水线,并识别了两个正交的并行维度:按信息集和按树节点。叶节点评估通过批量神经网络推理卸载到GPU,创建了异构的CPU-GPU流水线。在一对一No-Limit Texas Hold'em实验中,Parallel CFR在翻牌街实现了3.3-3.4倍的速度提升,深度限制游戏树中超过10亿历史的每迭代时间约为47-54毫秒。所有实验均在单个桌面级设备(NVIDIA DGX Spark)上运行,无需数据中心级基础设施即可在典型实时决策预算内完成数百次CFR迭代。

英文摘要

Counterfactual Regret Minimization (CFR) is the dominant algorithmic family for solving large imperfect-information games, underpinning breakthroughs such as Libratus and Pluribus in No-Limit Texas Hold'em poker. In real-time game-playing systems, the solver must compute a near-equilibrium strategy within a strict time budget of only a few seconds per decision, and the number of CFR iterations completed in this window directly determines play strength. We present \textbf{Parallel CFR}, the first parallelization framework for real-time depth-limited CFR solving that seamlessly integrates pruning, abstraction, and advanced CFR variants. We decompose each CFR iteration into a pipeline of seven stages and identify two orthogonal dimensions of parallelism: \emph{by information set} and \emph{by tree node}. Leaf node evaluation is offloaded to GPUs via batched neural network inference, creating a heterogeneous CPU--GPU pipeline. Experiments on Heads-Up No-Limit Texas Hold'em demonstrate that Parallel CFR achieves $3.3$--$3.4\times$ speedup over the single-threaded baseline on postflop streets, with per-iteration time of ${\sim}47$--$54$~ms on a depth-limited game tree with over $1$ billion histories. All experiments run on a single desktop-class device (NVIDIA DGX Spark), enabling hundreds of CFR iterations within a typical real-time decision budget without requiring datacenter-scale infrastructure.

2605.19926 2026-05-20 cs.LG 版本更新

JAXenstein: Accelerated Benchmarking for First-Person Environments

JAXenstein: 加速的第一人称环境基准测试

Ruo Yu Tao, George Konidaris

发表机构 * GitHub

AI总结 本文提出JAXenstein,一个基于JAX的开源基准测试,用于加速第一人称视觉任务的实验,通过实现Wolfenstein 3D渲染引擎,提高了实验效率并支持更复杂的环境。

Comments Main paper: 5 pages, supplementary material: 3 pages

详情
AI中文摘要

强化学习算法的进步一直由具有挑战性的基准测试推动。研究人员对问题设置进行迭代的速度直接影响算法开发的速度。现代机器学习已经产生了允许快速和可扩展算法开发的工具,如JAX库。在这些工具可用的情况下,算法开发中的主要瓶颈是大型和复杂领域实验的可用性。特别是,JAX强化学习生态系统中没有测试视觉第一人称任务的基准测试;这些领域对于测试探索能力和代理克服部分可观测性的能力至关重要。我们介绍了JAXenstein:一个基于JAX的开源基准测试,实现了Wolfenstein 3D渲染引擎,以在视觉第一人称任务中实现快速和可扩展的实验。JAXenstein比类似的基于视觉的基准测试快几倍,并且可以轻松扩展到更复杂的第一人称领域。

英文摘要

The progression of reinforcement learning algorithms have been driven by challenging benchmarks. The rate in which a researcher can iterate on a problem setting directly impacts the speed of algorithm development. Modern machine learning has produced tools that allow for fast and scalable algorithm development like the JAX library. With the availability of these tools, a serious bottleneck in algorithm development is the availability of large and complex domains for experimentation. Most notably, the JAX reinforcement learning ecosystem does not have any benchmarks that test visual first-person tasks; these domains are crucial for testing both exploration and an agent's ability to overcome partial observability. We introduce JAXenstein: an open-source JAX-based benchmark that implements the Wolfenstein 3D rendering engine for fast and scalable experimentation in visual first-person tasks. JAXenstein is several times faster than comparable vision-based benchmarks, and is easily extensible to more complex first-person domains.

2605.19916 2026-05-20 cs.LG cs.AI 版本更新

Fast and Featureless Node Representation Learning with Partial Pairwise Supervision

基于部分成对监督的快速且无特征节点表示学习

Sujan Chakraborty, Saptarshi Bej

发表机构 * Indian Institute of Science Education and Research(印度科学教育与研究学院)

AI总结 该研究提出了一种快速且统一的框架,用于在部分可用的成对节点标签和无可用节点特征的图中进行可扩展的节点表示学习,通过结合社区感知的结构信号和带符号的成对约束,实现了高效的优化方案。

详情
AI中文摘要

我们引入了Contrastive FUSE,一种用于图中可扩展节点表示学习的快速且统一的框架,该框架在部分可用的成对节点标签和无可用节点特征的情况下进行优化。与现有方法不同,我们直接优化了一个谱对比目标,该目标整合了社区感知的结构信号和带符号的成对约束。为了支持大规模训练,我们用一种轻量级的近似方法替换了昂贵的模块度梯度,这在保持模块度行为的同时显著降低了计算成本。这产生了一种高效的优化方案,具有自然梯度分解和自适应学习率缩放,即使在百万边图上也能实现快速迭代更新。在基准引文网络、大型共购图和OGB数据集上的广泛实验表明,Contrastive FUSE在不依赖节点特征的情况下实现了竞争性或优越的对比分类性能,同时在现有基线上提供了显著的运行时间提升。这些结果突显了将模块度启发的结构学习与对比监督相结合在高效和可扩展的对比节点表示学习中的有效性。

英文摘要

We introduce Contrastive FUSE, a fast and unified framework for scalable node representation learning in graphs with partially available pairwise node labels and no available node features. Unlike existing methods, we directly optimize a spectral contrastive objective that integrates community-aware structural signals with signed pairwise constraints. To support large-scale training, we replace the expensive modularity gradient with a lightweight approximation, which preserves the structure-seeking behavior of modularity while reducing the computational cost significantly. This yields an efficient optimization scheme with a natural gradient decomposition and adaptive learning-rate scaling, enabling fast iterative updates even on million-edge graphs. Extensive experiments on benchmark citation networks, large co-purchase graphs, and OGB datasets show that Contrastive FUSE achieves competitive or superior contrastive classification performance without relying on node features, while offering substantial runtime gains over existing baselines. These results highlight the effectiveness of coupling modularity-inspired structural learning with contrastive supervision for efficient and scalable contrastive node representation learning.

2605.19902 2026-05-20 cs.LG q-bio.QM 版本更新

Hierarchical Contrastive Learning for Multi-Domain Protein-Ligand Binding

多领域蛋白质-配体结合的分层对比学习

Shuo Zhang, Rongqi Hong, Huifeng Zhang, Jian K. Liu

发表机构 * University of Birmingham, UK(英国伯明翰大学)

AI总结 本研究提出HCLBind框架,通过分层对比学习方法,解决多领域蛋白质-配体结合亲和力预测问题,核心方法是分离几何表示学习与亲和力回归,并采用新颖的分层诱饵策略,结合领域门控图注意力网络和跨模态注意力,提升领域界面优先级,实验表明HCLBind能有效学习判别界面特征并提供鲁棒的不确定性估计。

Comments Accepted by ISBRA2026

详情
AI中文摘要

预测多领域蛋白质-配体结合亲和力仍然面临挑战,因为领域间动态决定了分子识别。现有几何深度学习方法通常将蛋白质视为单一静态图,导致刚体假设和柔性区域的随机噪声问题。为此,我们引入HCLBind,一种自监督框架,将几何表示学习与亲和力回归分离。HCLBind在Q-BioLiP数据库上采用通用到特定的预训练范式,学习稳健的结合物理语法。我们提出了一种新颖的分层诱饵策略:模型通过单领域蛋白质坐标扰动学习局部物理化学约束,通过多领域复合物领域旋转学习全局构象几何。我们的混合架构集成了领域门控图注意力网络和跨模态注意力,以显式优先考虑领域界面。此外,我们采用LoRA对蛋白质和配体基础模型进行优化,确保高效优化的同时保留进化知识。在PDBBind上的实验表明,HCLBind有效学习了判别界面特征,并提供了鲁棒的不确定性估计,克服了标准监督学习的局限性。代码可在https://github.com/jiankliu/HCLBind获取。

英文摘要

Predicting protein-ligand binding affinity remains intractable for multi-domain proteins, where inter-domain dynamics govern molecular recognition. Existing geometric deep learning methods typically treat proteins as monolithic static graphs, suffering from rigid-body assumptions and aleatoric noise in flexible regions. To address this, we introduced HCLBind, a self-supervised framework that decouples geometric representation learning from affinity regression. HCLBind leverages a general-to-specific pre-training paradigm on the Q-BioLiP database to learn a robust physical grammar of binding. We propose a novel hierarchical decoy strategy: the model learns local physicochemical constraints through protein coordinate perturbation in single-domain proteins and global conformational geometry through inter-domain rotation in multi-domain complexes. Our hybrid architecture integrates a domain-gated graph attention network and cross-modal attention to explicitly prioritize domain interfaces. Furthermore, we employ LoRA on protein and ligand foundation models, ensuring efficient optimization while preserving evolutionary knowledge. Experiments on PDBBind demonstrate that HCLBind effectively learns discriminative interface features and provides robust uncertainty estimation, overcoming the limitations of standard supervised learning. The code is available at https://github.com/jiankliu/HCLBind.

2605.19856 2026-05-20 cs.LG cs.AI 版本更新

StableGrad: Backward Scale Control without Batch Normalization

StableGrad: 无需批量归一化的反向缩放控制

Jose I. Mestre, Alberto Fernández-Hernández, Cristian Pérez-Corral, Manuel F. Dolz, Enrique S. Quintana-Ortí

发表机构 * Universitat Politècnica de València(巴塞罗那理工大学) Universitat Jaume I(Jaime I 大学)

AI总结 本文提出StableGrad,一种在无需批量归一化的情况下通过优化器层面控制权重-梯度缩放来稳定深度神经网络训练的方法,特别适用于物理信息神经网络等场景。

详情
AI中文摘要

训练非常深的神经网络需要控制深度方向上的量值传播。没有这种控制,激活值和梯度可能会消失、爆炸或进入不稳定区域,导致优化失败。现代架构通常通过批量归一化、残差连接或其他归一化层来缓解这个问题,这些机制会重复地重新缩放或绕过中间表示。然而,这些机制并不总是适用。在物理信息神经网络(PINNs)中,网络表示连续的物理场及其输入导数定义了训练目标,使批量依赖的归一化变得有问题,因为这会引入非局部依赖性到预测场及其导数中。我们提出StableGrad,一种优化器层面的缩放控制机制,可以在不修改前向模型的情况下纠正层间权重-梯度不平衡。因为归一化仅在反向传播后、优化器更新前应用,网络输出、其导数和物理残差保持不变。我们分析了这种缩放所引起的有效训练动态,并在深度PINNs上评估StableGrad作为目标应用,用无批量归一化的卷积网络作为诊断压力测试。在PINN基准测试中,StableGrad提高了匹配深度的解精度,并使更深层的模型在标准优化下更加可靠。在ResNet和EfficientNet架构中,移除批量归一化通常会导致训练崩溃,但StableGrad在不引入其他架构变化的情况下稳定了优化。这些结果表明,优化器层面的权重-梯度缩放控制可以提供一种实用的替代方案,当前向归一化不可用或不适用时。

英文摘要

Training very deep neural networks requires controlling the propagation of magnitudes across depth. Without such control, activations and gradients may vanish, explode, or enter unstable regimes that make optimization fail. Modern architectures often mitigate this problem through Batch Normalization, residual connections, or other normalization layers, which repeatedly re-scale or bypass intermediate representations. However, these mechanisms are not always appropriate. In Physics-Informed Neural Networks (PINNs), the network represents a continuous physical field and its input derivatives define the training objective, making batch-dependent normalization problematic because it can introduce non-local dependencies into the predicted field and its derivatives. We propose StableGrad, an optimizer-level scale-control mechanism that corrects layer-wise weight-gradient imbalances without modifying the forward model. Because the normalization is applied only after backpropagation and before the optimizer update, the network output, its derivatives, and the physical residual remain unchanged. We analyze the effective training dynamics induced by this rescaling and evaluate StableGrad on deep PINNs as the target application, with BatchNorm-free convolutional networks serving as a diagnostic stress test. On PINN benchmarks, StableGrad improves matched-depth solution accuracy and makes deeper models more reliable under standard optimization. On ResNet and EfficientNet architectures, where removing Batch Normalization normally leads to training collapse, StableGrad stabilizes optimization without introducing any other architectural change. These results show that optimizer-level control of weight-gradient scale can provide a practical alternative when forward normalization is unavailable or undesirable.

2605.19842 2026-05-20 cs.LG 版本更新

Fast Tensorization of Neural Networks via Slice-wise Feature Distillation

通过切片特征蒸馏实现神经网络的快速张量化

Safa Hamreras, Sukhbinder Singh, Román Orús

发表机构 * Donostia International Physics Center(多斯蒂亚国际物理中心) Multiverse Computing(多维计算) Ikerbasque Foundation for Science(伊克尔巴斯基科学基金会)

AI总结 本文提出了一种基于切片特征蒸馏的可扩展张量化框架,用于神经网络压缩。该方法通过将网络分解为独立的切片(如单个层或块),并独立张量化每个切片以恢复原始预训练模型的中间表示,从而提高精度恢复、减少数据需求并实现高效的并行优化。

详情
AI中文摘要

我们提出了一种基于切片特征蒸馏的可扩展张量化框架,用于神经网络压缩。与传统的依赖于成本高昂的全局微调的张量分解方法不同,我们的方法将网络分解为由单个层、块(如卷积层或MLP)或连续层的小组构成的切片,并独立对每个切片进行张量化以重现原始预训练模型的中间表示。这种模块化策略提高了精度恢复,减少了数据需求,并实现了高效的并行优化。在ResNet-34上的实验表明,与传统全局张量化相比,该方法在中等压缩率下实现了接近无损的压缩效果,并具有更快的优化速度。在GPT-2 XL上的结果进一步展示了该方法的可扩展性和其在大规模模型中的适用性,特别是在分布式设置中。

英文摘要

We propose a scalable tensorization framework for neural network compression based on slice-wise feature distillation. Unlike conventional tensor decomposition methods that rely on costly global finetuning, our approach decomposes the network into slices consisting of either individual layers or blocks (e.g., convolutional layers or MLPs), or small groups of consecutive layers, and tensorizes each slice independently to reproduce the intermediate representations of the original pretrained model. This modular strategy improves accuracy recovery, reduces data requirements, and enables efficient parallel optimization. Experiments on ResNet-34 show significant gains over conventional global tensorization, achieving near-lossless compression at moderate compression rates with faster optimization. Results on GPT-2 XL further demonstrate the scalability of the method and its applicability to large-scale models, particularly in distributed settings.

2605.19834 2026-05-20 cs.LG cs.AI cs.SY eess.SY 版本更新

A Closed-loop, State-centric, Multi-agent Framework for Passenger Load Estimation from Heterogeneous Data Streams

一种闭环、以状态为中心的多智能体框架,用于从异构数据流中估计乘客负载

Yiyao Xu, Hao Zhou, Yuhang Wang, Jingran Sun

发表机构 * Department of Civil and Environmental Engineering, University of South Florida(佛罗里达州立大学土木与环境工程系)

AI总结 本文提出一种闭环、以状态为中心的多智能体框架,用于从异构数据流中准确估计乘客负载,通过动态分配信任和物理约束提升鲁棒性。

Comments Preprint version of a paper accepted by the 2026 IEEE 29th International Conference on Intelligent Transportation Systems (ITSC). 7 pages, 4 figures

详情
AI中文摘要

为了支持运营和乘客服务,公共交通机构需要可靠的乘客负载轨迹。目前,负载估计通常是从不完美的传感系统推断而来,而非完全观察,现代自动乘客计数(APC)系统的准确性仍受车站布局、流量强度和运营条件的影响。为了解决从异构数据流中稳健估计乘客负载的挑战,包括增量计数误差、证据冲突和上下文依赖的传感器可靠性,我们提出了一种闭环、以状态为中心的多智能体框架。该方法在每一步都强制物理可行性,动态分配信任给证据源,并将物理推导出的违反残差反馈回训练以提高鲁棒性。该架构包括一个统一的停靠事件骨干,一个耦合的感知-物理-融合循环用于停靠点推断,以及可选的行程级宏修正和闭环校准模块。

英文摘要

To support operations and passenger-facing services, transit agencies need reliable passenger load trajectories. Currently, load estimates are typically inferred from imperfect sensing systems rather than fully observed, and the accuracy of modern automatic passenger counting (APC) systems still varies with station layout, flow intensity, and operating conditions. To address the challenges of robust passenger load estimation from heterogeneous data streams, including incremental count errors, evidence conflicts, and context-dependent sensor reliability, we propose a closed-loop, state-centric, multi-agent framework. This method enforces physical feasibility at every step, allocates trust dynamically among evidence sources, and feeds physics-derived violation residuals back into training for robustness improvement. The architecture consists of a unified stop-event backbone, a coupled Perception--Physical--Fusion loop for stop-by-stop inference, and optional trip-level macro-correction and closed-loop calibration modules.

2605.19830 2026-05-20 cs.LG math.ST stat.TH 版本更新

Set-Valued Policy Learning

多治疗设置下的集合值策略学习

Laura Fuentes-Vicente, Mathieu Even, Gaëlle Dormion, Antoine Chambaz, Uri Shalit, Julie Josse

发表机构 * Inria PreMeDICaL, Inserm, Montpellier, France(Inria PreMeDICaL、Inserm、蒙彼利埃法国) Elixir Health, Paris, France(Elixir Health、巴黎法国) Université Paris Cité, CNRS, MAP5, F-75006 Paris, France(巴黎大学Cité、CNRS、MAP5、法国巴黎75006) Tel-Aviv University, Tel-Aviv, Israel(特拉维夫大学、特拉维夫以色列)

AI总结 本文提出了一种集合值策略学习方法,用于多治疗场景,通过输出可能的治疗集而非单一推荐,从而内在地量化不确定性,并通过新的 greatest Lower Bound 方法扩展了学习-延迟框架,并引入了符合政策学习,以连接未观察到的真实最优治疗与估计的最优治疗规则。

详情
AI中文摘要

传统治疗政策将患者协变量映射到单一推荐干预以最大化预期临床结果。尽管已开发出大量因果推断方法来估计此类政策,但点值推荐对估计不确定性、模型规范和有限样本变异高度敏感,通常提供很少关于应如何自信推荐行动的指导。在本文中,我们提出了一种多治疗设置下的集合值策略学习范式,其中策略输出一组可能的治疗而非单一推荐。这种形式使内在不确定性量化成为可能,预测集的大小反映决策不确定性的程度。我们通过新的 greatest Lower Bound 方法扩展了学习-延迟框架到多治疗,并引入了符合政策学习,它弥合了未观察到的真实最优治疗与估计最优治疗规则之间的差距。借鉴噪声标签文献的见解,我们开发了一种随机性注入方法,该方法在不需假设底层黑箱最优治疗规则的情况下保证边际覆盖率。通过在合成数据和实际应用到体外受精(IVF)上的实验,我们证明了我们的方法产生稳健且可操作的政策,这些政策自然地纳入临床考虑,同时有效平衡性能和可靠性。

英文摘要

Conventional treatment policies map patient covariates to a single recommended intervention in order to maximize expected clinical outcomes. Although a rich body of causal inference methods has been developed to estimate such policies, point-valued recommendations can be highly sensitive to estimation uncertainty, model specification, and finite-sample variability, while typically providing little guidance about how confident one should be in the recommended action. In this work, we propose a set-valued policy learning paradigm for the multiple-treatment setting, in which policies output a set of plausible treatments rather than a single recommendation. This formulation enables intrinsic uncertainty quantification, with the size of the predicted set reflecting the degree of decision ambiguity. We extend the learning-to-defer framework to multiple treatments via a novel \textit{greatest Lower Bound} method, and introduce \textit{conformal policy learning}, which bridges the gap between unobserved ground-truth optimal treatments and estimated optimal treatment rules. Drawing on insights from the noisy-label literature, we develop a randomness-injection approach that guarantees marginal coverage without requiring assumptions on underlying black-box optimal treatment rules. Through experiments on synthetic data and a real-world application to In-Vitro Fertilization (IVF), we demonstrate that our methods produce robust and actionable policies that naturally incorporate clinical considerations while effectively balancing performance and reliability.

2605.19823 2026-05-20 cs.LG cs.AI math.AP math.DS stat.ML 版本更新

Smooth Piecewise Cutting for Neural Operator to Handle Discontinuities and Sharp Transitions

通过平滑分段处理神经算子以应对不连续性和尖锐过渡

Ha Dang, Sebastian Schmidt, Juergen Hesser

发表机构 * Mannheim Institute for Intelligent Systems in Medicine, Heidelberg University(海德堡大学曼海姆智能医学研究所) Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University(海德堡大学跨学科科学计算中心) Heidelberg Institute for Theoretical Studies (HITS), Heidelberg University(海德堡大学理论研究 institute) Central Institute for Computer Engineering (ZITI), Heidelberg University(海德堡大学计算机工程中心) CZS Heidelberg Initiative for Model-Based AI (MBAI), Heidelberg University(海德堡模型驱动人工智能倡议)

AI总结 本文提出Cut-DeepONet,一种两阶段训练框架,通过将不连续性建模为更高维空间中的边界,减少学习复杂性,从而在处理偏微分方程的解算子时更有效地捕捉不连续性和尖锐过渡。

详情
AI中文摘要

神经算子在学习偏微分方程(PDEs)的解算子方面取得了强劲表现,但其本质上连续的表示在捕捉不连续性和尖锐过渡时存在困难。现有方法通常在连续函数空间内近似这些特征,往往需要增加模型容量和高分辨率数据。在本文中,我们提出Cut-DeepONet,一种两阶段训练框架,通过提升策略将问题重新表述,将域划分成平滑子区域,同时在更高维空间中将不连续性表示为边界。这种分离使算子学习任务与神经网络的归纳偏置对齐,并避免直接近似不连续性。一个额外的网络预测输入依赖的不连续性位置,然后用于指导神经算子在每个区域内生成平滑组件。在基准PDEs上的实验表明,Cut-DeepONet在低分辨率数据集上训练时也优于最先进的方法。该方法在存在不连续性和尖锐过渡的问题上表现优异,同时使用更少的可训练参数。我们的结果突显了改变算子学习的表示而非增加模型复杂性的优势。

英文摘要

Neural operators have achieved strong performance in learning solution operators of partial differential equations (PDEs), but their inherently continuous representations struggle to capture discontinuities and sharp transitions. Existing approaches typically approximate such features within continuous function spaces, often requiring increased model capacity and high-resolution data. In this work, we propose Cut-DeepONet, a two-stage training framework that explicitly models discontinuities while reducing learning complexity. Our approach reformulates the problem via a lifting strategy, partitioning the domain into smooth subregions while representing discontinuities as boundaries in a higher-dimensional space. This separation aligns the operator learning task with the inductive bias of neural networks and avoids directly approximating discontinuities. An additional network predicts input-dependent discontinuity locations for unseen inputs, which are then used to guide the neural operator in generating smooth components within each region. Experiments on benchmark PDEs show that Cut-DeepONet outperforms state-of-the-art methods, even when trained on low-resolution datasets. The method excels on problems with discontinuities and sharp transitions, while using fewer trainable parameters. Our results highlight the benefits of changing the representation of operator learning rather than increasing model complexity.

2605.19822 2026-05-20 cs.LG cs.AI 版本更新

ST-TGExplainer: Disentangling Stability and Transition Patterns for Temporal GNN Interpretability

ST-TGExplainer: 解构稳定性与转换模式以提升时序GNN可解释性

Hongjiang Chen, Xin Zheng, Pengfei Jiao, Huan Liu, Zhidong Zhao, Huaming Wu, Feng Xia, Shirui Pan

发表机构 * Hangzhou Dianzi University(杭州电子科技大学) RMIT University(皇家墨尔本理工大学) Tianjin University(天津大学) Griffith University(格里菲斯大学)

AI总结 本文提出ST-TGExplainer,一种能够解构时序图中稳定性与转换模式的自解释时序GNN,以提升模型的可解释性。

详情
AI中文摘要

时序图神经网络(TGNNs)在解决现实中的时序图任务中取得了显著进展。然而,其可解释性仍然有限,因为大多数TGNNs无法识别哪些历史交互最影响给定预测。尽管在可解释性TGNNs上取得了令人鼓舞的进展,现有方法主要关注之前已见过的历史交互,我们称之为稳定性模式,而忽略了新出现的一次性交互,我们称之为转换模式。这两种模式对于忠实的时序解释都是必不可少的。为了解决这一限制,我们提出了ST-TGExplainer,一种自解释的TGNN,旨在解构时序图中的稳定性与转换模式,以获得更忠实的时序GNN解释器。受解构信息瓶颈目标的指导,ST-TGExplainer学习了一个紧凑的解释子图,该子图在预测事件标签时保持预测性,同时显式地抑制稳定性与转换模式之间的标签条件冗余。广泛的实验表明,ST-TGExplainer在预测性能上表现出色,并产生了更忠实的解释。代码可在https://github.com/hjchen-hdu/ST-TGExplainer上获取。

英文摘要

Temporal graph neural networks (TGNNs) have gained significant traction for solving real-world temporal graph tasks. However, their interpretability remains limited, as most TGNNs fail to identify which historical interactions most influence a given prediction. Despite promising progress on interpretable TGNNs, existing methods predominantly focus on previously seen historical interactions, which we term stability patterns, while overlooking newly emerging first-time interactions, which we term transition patterns. Both types of patterns are essential for faithful temporal explanations. To address this limitation, we propose ST-TGExplainer, a self-explainable TGNN that disentangles Stability and Transition patterns in temporal graphs for a more faithful Temporal GNN Explainer. Guided by a disentangled information bottleneck objective, ST-TGExplainer learns a compact explanatory subgraph that remains predictive of the event label while explicitly suppressing label-conditioned redundancy between stability and transition patterns. Extensive experiments demonstrate that ST-TGExplainer achieves strong predictive performance and yields more faithful explanations. Code is available at https://github.com/hjchen-hdu/ST-TGExplainer.

2605.19813 2026-05-20 cs.LG math.ST stat.TH 版本更新

General Lower Bounds for Differentially Private Federated Learning with Arbitrary Public-Transcript Interactions

具有任意公共 transcripts 交互的差分隐私联邦学习的一般下界

Yicheng Li

发表机构 * Department of Statistics and Data Science, Tsinghua University(清华大学统计与数据科学系)

AI总结 本文研究了在任意公共 transcripts 交互下差分隐私联邦学习的下界问题,提出了一个针对平方 $\ell_2$ 损失参数估计的联邦 Van Trees 下界,并通过均值估计、线性回归和非参数回归等应用展示了该下界。

详情
AI中文摘要

我们证明了在具有任意公共 transcripts 交互的差分隐私联邦学习协议中的一般下界。该协议可以使用任意数量的自适应轮次,且每个客户端的本地样本可以在不同轮次中重复使用。对于平方 $\ell_2$ 损失下的参数估计,我们为每个满足总客户端层面零集中差分隐私(zCDP)约束的估计器建立了联邦 Van Trees 下界。主要技术成分是一个针对完整公共 transcripts 的隐私-信息收缩不等式。我们通过均值估计、线性回归和非参数回归等应用来展示该下界。

英文摘要

We prove a general lower bound for differentially private federated learning protocols with arbitrary public-transcript interactions. The protocol may use any number of adaptive rounds, and each client's local samples may be reused across rounds. For parameter estimation under squared \(\ell_2\) loss, we establish a federated van Trees lower bound for every estimator satisfying a total clientwise sample-level zero-concentrated differential privacy (zCDP) constraint. The main technical ingredient is a privacy-information contraction inequality for complete public transcripts. We illustrate the bound through applications to mean estimation, linear regression, and nonparametric regression.

2605.19812 2026-05-20 cs.LG cs.AI stat.AP stat.ML 版本更新

FLUXtrapolation: A benchmark on extrapolating ecosystem fluxes

FLUXtrapolation:一个用于外推生态系统通量的基准测试

Anya Fries, Jacob A Nelson, Martin Jung, Markus Reichstein, Jonas Peters

发表机构 * Seminar for Statistics, ETH Zürich(统计研究所,苏黎世联邦理工学院) Max Planck Institute for Biogeochemistry(生物地球化学研究所)

AI总结 该研究提出FLUXtrapolation基准测试,旨在外推生态系统通量,通过分析分布偏移对通量上推的挑战,评估机器学习方法在分布偏移下的表现,以促进通量上推的科学目标。

详情
AI中文摘要

我们介绍了FLUXtrapolation,一个用于在外推生态系统通量时应对逐渐加剧的分布偏移的基准测试。生态系统通量是理解碳、水和能量循环的关键,但只能通过稀疏分布的测量塔直接测量。因此,生成全球通量估计需要在可用的全球协变量上训练模型,并在未观测区域进行预测,即上推。通量上推是一个具有挑战性的领域泛化问题,受气候、生态系统类型和环境条件之间协变量分布偏移的影响,以及条件偏移的影响:重要的驱动因素在全局尺度上未被观测。我们对这两种偏移在P_X和P_{Y|X}中的定量分析。FLUXtrapolation基于对通量上推的领域专业知识设计:它定义了基于时间、空间和温度的外推场景,并在未观测的领域、时间聚合和尾部误差上评估性能。在试点研究中,我们发现基线方法在中位小时RMSE下表现相似,但在提出的尾部聚焦和多尺度评估下则有所不同。因此,FLUXtrapolation为机器学习方法在分布偏移下的现实挑战提出了相关挑战;同时,该基准测试的进步将直接支持科学目标,即改进通量上推。

英文摘要

We introduce FLUXtrapolation, a benchmark for extrapolating ecosystem fluxes under progressively harder distribution shifts. Ecosystem fluxes are central to understanding the carbon, water, and energy cycles, yet they can only be measured directly at sparsely located measurement towers. Producing global flux estimates therefore requires training models on observed sites using globally available covariates and predicting in unobserved regions, that is, upscaling. Flux upscaling is a challenging domain generalization problem that is affected by a shift in covariate distribution across climates, ecosystem types, and environmental conditions, as well as by conditional shift: important drivers remain unobserved at global scale. We provide a quantitative analysis of both these shifts in $P_X$ and $P_{Y\mid X}$. FLUXtrapolation is designed based on domain expertise on flux upscaling: it defines temporal, spatial, and temperature-based extrapolation scenarios and evaluates performance across held-out domains, temporal aggregations, and tail errors. In a pilot study, we find that baselines perform similarly under median hourly RMSE, but separate under the proposed tail-focused and multi-scale evaluation. FLUXtrapolation therefore poses a realistic and thus relevant challenge for machine learning methods under distribution shift; at the same time, progress on this benchmark would directly support the scientific goal of improving flux upscaling.

2605.19811 2026-05-20 cs.LG 版本更新

LionMuon: Alternating Spectral and Sign Descent for Efficient Training

LionMuon: 交替频谱和符号下降用于高效训练

Arman Bolatov, Artem Riabinin, Nikita Kornilov, Andrey Veprikov, Samuel Horváth, Martin Takáč, Aleksandr Beznosikov

发表机构 * DeepSeek-AI Essential AI Kimi Team

AI总结 本文提出LionMuon优化器,通过交替使用Lion和Muon的更新步骤,在保持有效性的同时显著降低平均迭代成本,同时证明了在重尾噪声下的复杂性界限,展示了其在不同模型规模下的优势。

Comments 38 pages, 13 figures, 4 tables

详情
AI中文摘要

在大规模优化中,更新步骤的廉价性和有效性是成功优化器的关键因素。基于符号的优化器如Lion或Signum产生廉价的每步更新,而Muon的谱矩阵-符号更新则在显著更高的每步成本下提供更强的方向。在本文中,我们提出LionMuon,它保留了Muon步骤的有效性,同时显著降低了平均迭代成本,类似于基于符号的方法。它在固定周期P内交替使用Lion和Muon的更新,共享一个单一的双EMA动量缓冲区。因此,优化器状态内存与Lion相同,恰好是AdamW的一半。一个更简单的单EMA变体SignMuon本身已经优于纯Muon。在P=2时,LionMuon在我们测试的124M模型大小的每个数据集和架构上都优于Muon、Lion、Signum和AdamW,在更低的计算下达到更低的验证损失,这一优势在355M和720M规模上仍然存在。在理论方面,我们证明了在重尾噪声下的严格复杂性界限,这些界限由周期平均平滑度和介于Muon和Lion之间的噪声所决定。这些界限预测了计算最优的周期以及LionMuon超越Muon和Lion的条件。代码:https://github.com/brain-lab-research/lion-muon

英文摘要

In large-scale optimization, the cheapness and effectiveness of update steps are the most crucial factors for a successful optimizer. Sign-based optimizers like Lion or Signum produce cheap per-step updates, whereas Muon's spectral matrix-sign update gives a much stronger direction at a substantially higher per-step cost. In this work, we propose LionMuon, which retains the effectiveness of Muon steps while considerably cutting the averaged iteration cost, similar to sign-based methods. It alternates between Lion's and Muon's updates on a fixed period P, sharing a single dual-EMA momentum buffer between them. The optimizer state memory therefore matches Lion and is exactly half of AdamW's. A simpler single-EMA variant, SignMuon, by itself already outperforms pure Muon. At P = 2, LionMuon Pareto-dominates Muon, Lion, Signum, and AdamW on every dataset and architecture we tested at 124M model size, reaching lower validation loss at lower compute, and the same advantage persists at 355M and 720M scale. On the theory side, we prove sharp complexity bounds under heavy-tailed noise which are governed by period-averaged smoothness and noise that interpolate between Muon's and Lion's constants. These bounds predict the compute-optimal period and the conditions under which LionMuon outruns Muon and Lion. Code: https://github.com/brain-lab-research/lion-muon

2605.19804 2026-05-20 cs.CV cs.AI cs.LG 版本更新

Stitched Value Model for Diffusion Alignment

用于扩散对齐的拼接价值模型

Hyojun Go, Hyungjin Chung, Prune Truong, Goutam Bhat, Li Mi, Zhaochong An, Zixiang Zhao, Dominik Narnhofer, Serge Belongie, Federico Tombari, Konrad Schindler

发表机构 * ETH Zurich(苏黎世联邦理工学院) Google(谷歌) University of Copenhagen(哥本哈根大学)

AI总结 本文提出StitchVM,一种将预训练的干净图像奖励模型转移到噪声潜在空间的拼接框架,通过高效转移和微调,提升扩散对齐的效率和效果。

Comments Project page: https://gohyojun15.github.io/StitchVM/

详情
AI中文摘要

为了实际应用,基于扩散或流的生成模型必须与任务特定的奖励对齐,例如提示保真度或审美偏好。这种对齐具有挑战性,因为奖励是为干净的输出图像定义的,但对齐过程需要在噪声中间潜在空间中估计价值函数。现有方法倾向于Tweedie风格或蒙特卡洛近似,权衡估计器偏差与计算成本:Tweedie估计高效但有偏差,而蒙特卡洛估计更准确但需要昂贵的回放。一个自然的替代方法是学习的价值函数,但如何有效训练一个强大的、通用的价值模型专门用于噪声潜在空间仍然是一个开放问题。本文提出了StitchVM,一种模型拼接框架,该框架高效地将预训练用于干净图像的奖励模型转移到噪声潜在空间。StitchVM从一个现有的、截断的像素空间奖励模型开始,并将其冻结的扩散骨干作为其头部。从像素空间模型中,所得到的混合模型保留了精心预训练、稳健的奖励能力;从扩散骨干中,它继承了其处理噪声潜在空间的原生能力。拼接过程异常轻量,例如拼接和微调CLIP ViT-L和SD 3.5 Medium仅需10个GPU小时。通过将强大的像素空间奖励模型提升到潜在空间,StitchVM打开了一种新的扩散对齐风格:而不是对价值函数的粗糙但昂贵的每样本近似,正确的函数对于实际的噪声潜在空间一次构建,然后在许多样本和迭代中进行抵消。我们显示,这种方法在广泛下游引导和后训练方法中带来了改进:DPS变得比原来快3.2倍,同时将峰值GPU内存减半,DiffusionNFT变得比原来快2.3倍。

英文摘要

For practical use, diffusion- or flow-based generative models must be aligned with task-specific rewards, such as prompt fidelity or aesthetic preference. That alignment is challenging because the reward is defined for clean output images, but the alignment procedure requires value function estimates at noisy intermediate latents. Existing methods resort to Tweedie-style or Monte Carlo approximations, trading off estimator bias against computational cost: Tweedie estimates are efficient but biased, while Monte Carlo estimates are more accurate but require expensive rollouts. A natural alternative would be a learned value function, but it remains an open question how to effectively train a strong and general value model specifically for noisy latents. Here, we propose StitchVM, a model stitching framework that efficiently transfers reward models pretrained for clean images to the noisy latent regime. StitchVM starts from an existing, truncated pixel-space reward model and attaches a frozen diffusion backbone to it as its head. From the pixel-space model, the resulting hybrid retains a carefully pretrained, robust reward capability; from the diffusion backbone, it inherits its native ability to handle noisy latents. The stitching procedure is exceptionally lightweight, e.g., stitching and finetuning CLIP ViT-L and SD 3.5 Medium takes only 10 GPU-hours. By lifting powerful pixel-space reward models to latent space, StitchVM opens up a new style of diffusion alignment: instead of rough, yet costly per-sample approximation of the value function, the correct function for the actual, noisy latents is constructed once and then amortized over many samples and iterations. We show that this approach yields improvements across a broad range of downstream steering and post-training methods: DPS becomes $3.2\times$ faster while halving peak GPU memory, and DiffusionNFT becomes $2.3\times$ faster.

2605.19782 2026-05-20 cs.AI cs.LG cs.SE 版本更新

Prior Knowledge or Search? A Study of LLM Agents in Hardware-Aware Code Optimization

先验知识还是搜索?LLM代理在硬件感知代码优化中的研究

Dmitry Redko, Albert Fazlyev, Konstantin Sozykin, Maria Ivanova, Evgeny Burnaev, Egor Shvetsov

发表机构 * Applied AI Institute(应用人工智能研究所) ITMO University(ITMO大学) AI Talent Hub(AI人才中心)

AI总结 该研究探讨了在硬件感知代码优化中,LLM代理是依赖于先验知识还是搜索过程,通过三个受控实验发现LLM在纯黑盒优化中表现为贪婪优化器,在零样本内核生成中输入大小信息无明显影响,而在反馈循环内核优化中CUDA单调改进而TVM IR主动退化,表明LLM在代码优化任务中高度依赖预训练先验而非反馈或代理结构。

详情
AI中文摘要

LLM发现和优化系统在各个领域中被越来越多地应用,实现了一个常见的提出-评估-修订循环。此类优化或发现过程通过上下文条件在接收到环境反馈后进行。然而,随着现代LLM代理在结构上日益复杂,难以评估哪些组件贡献最大,以及何时以及如何探索可能失败。我们通过三个受控实验回答这些问题。我们的发现:(1) 在纯黑盒优化中,LLM表现为贪婪优化器。(2) 在零样本内核生成中,提供显式输入大小信息没有可测量的影响,模型无论大小或温度都会收敛到相同的内核参数,仿佛大小指令是不可见的。此外,当被要求为不常见的内核大小进行内核优化时,性能会急剧下降,无论使用的语言如何。(3) 在反馈循环内核优化中,CUDA在迭代反馈下单调改进,而TVM IR则主动退化,这表明当模型以低密度语言操作时,内核优化会退化。我们的结果得出结论:在代码优化任务中,LLM高度依赖于预训练的先验而非提供的反馈或代理结构。

英文摘要

LLM discovery and optimization systems are increasingly applied across domains, implementing a common propose-evaluate-revise loop. Such optimization or discovery progresses via context conditioning on received feedback from an environment. However, as modern LLM agents are increasingly complex in their structure, it is difficult to evaluate which components contribute the most, and when and how this exploration may fail. We answer these questions through three controlled experiments. Our findings: (1) In pure black-box optimization, LLMs act as greedy optimizers. (2) In zero-shot kernel generation, providing explicit input-size information has no measurable effect, models converge to the same kernel parameters regardless of size or temperature, as though the size instruction were invisible. Moreover, when tasked to perform kernel optimization for uncommon kernel sizes, performance sharply degrades regardless of the language used. (3) In feedback-loop kernel optimization, CUDA improves monotonically under iterative feedback, while TVM IR actively degrades, which demonstrates that kernel optimization degrades when models operate with low-density language. Our results conclude that LLMs in code optimization tasks highly depend on pretrained priors rather than provided feedback or agentic structure.

2605.19779 2026-05-20 cs.AI cs.LG 版本更新

Distribution-Free Uncertainty Quantification for Continuous AI Agent Evaluation

无分布不确定性量化用于连续AI代理评估

Yuxuan Gao, Megan Wang, Yi Ling Yu

发表机构 * University of Pennsylvania(宾夕法尼亚大学) Columbia University(哥伦比亚大学)

AI总结 本文提出了一种无分布的不确定性量化方法,用于连续AI代理评估,通过适应性符合推断(ACI)提供预测质量分数的覆盖保证,并开发了多代理管道的组合不确定性界限、成对排名的符合回避规则以及领奖台规模多重检验的FDR校正回避方法。

Comments 6 pages, 7 figures, 2 tables. Accepted at the ICML 2026 Workshop on Agentic Uncertainty Quantification (AgenticUQ) - Poster

详情
AI中文摘要

我们适应了分割符合预测和适应性符合推断(ACI)用于连续AI代理评估,提供预测质量分数的无分布覆盖保证。符合区间在24小时范围内所有名义水平上实现了校准误差低于0.02,而ACI在代理发布后正确扩大了区间35%然后重新收敛。我们进一步开发了多代理管道的组合不确定性界限(通过模拟验证了不同阶段相关性rho在[-0.5, 0.9]范围内),一种用于成对排名的符合回避规则(具有受控的假排名率),以及领奖台规模多重检验的FDR校正回避方法。通过18个实时信号每小时收集的数据评估50个代理,我们显示每个代理的条件覆盖集中在名义水平(均值80.4%,90%的代理在[72%, 90%]范围内),并且跨源情感分歧预测排名不稳定性(r=0.64,p<0.01)。一个循环控制的验证确认了框架能够捕捉超过基准的信号(rho_s=0.52,p<0.01,n=35)。代码和数据在CC BY 4.0下发布。

英文摘要

We adapt split conformal prediction and adaptive conformal inference (ACI) to continuous AI agent evaluation, providing distribution-free coverage guarantees for forecasted quality scores. Conformal intervals achieve calibration error below 0.02 across all nominal levels at the 24h horizon, while ACI correctly widens intervals by 35% following agent releases then reconverges. We further develop compositional uncertainty bounds for multi-agent pipelines (validated via simulation across inter-stage correlations rho in [-0.5, 0.9]), a conformal abstention rule for pairwise rankings with controlled false-ranking rate, and FDR-corrected abstention for leaderboard-scale multiple testing. Evaluating 50 agents via 18 real-time signals collected hourly, we show that per-agent conditional coverage is well-concentrated around the nominal level (mean 80.4%, 90% of agents within [72%, 90%]), and that cross-source sentiment divergence predicts ranking instability (r=0.64, p<0.01). A circularity-controlled validation confirms the framework captures signal beyond benchmarks (rho_s=0.52, p<0.01, n=35). Code and data are released under CC BY 4.0.

2605.18870 2026-05-20 cs.LG math.AP math.FA 版本更新

Multi-Headed Transformer Architectures as Time-dependent Wasserstein Gradient Flows

多头变压器架构作为时间依赖的Wasserstein梯度流

Alex Massucco, Leonardo Del Grande, Marcello Carioni, Christoph Brune, Carola-Bibiane Schönlieb

发表机构 * Department of Applied Mathematics and Theoretical Physics, University of Cambridge(应用数学与理论物理系,剑桥大学) Department of Mathematics, University of Twente(数学系,埃因霍温理工大学)

AI总结 本文提出将多头变压器架构中的数据流建模为时间依赖的Wasserstein梯度流,以捕捉注意力机制的设计,并证明了在合适积分性假设下,梯度流的ω-极限集元素是交互能量的稳态点,同时分析了梯度流的稳定性,并通过数值实验验证了预测的能量耗散身份和动力学的渐近行为。

详情
AI中文摘要

近年来,变压器架构已彻底改变了语言处理领域,开辟了前所未有的可能性。然而,从理论角度来看,文献中提出的数学模型往往缺乏与实际架构的直接联系,并依赖于强简化的假设。在本文中,我们通过将多头变压器架构中的数据流建模为时间依赖的梯度流,以捕捉注意力机制的设计,从而缩小这一差距。显式的时间依赖性使我们能够为每个头和每个层分配不同的权重,而无需对初始化方法施加限制。此外,我们证明,在合适积分性假设下,每个梯度流的ω-极限集元素都是交互能量在极限权重分布下的稳态点。最后,我们分析了梯度流的稳定性,考虑了初始数据和权重的扰动。一方面,我们研究了所提出模型对噪声输入的鲁棒性,建立了梯度流对初始数据的连续依赖性和流的唯一性。另一方面,我们证明了扰动的交互能量对未扰动能量的Γ收敛性,导致相应的梯度流收敛。我们通过数值实验补充了这些理论结果,验证了预测的能量耗散身份,并澄清了动力学在自主型(Ornstein-Uhlenbeck)和真正非自主型(振荡权重)两种情况下的渐近行为。

英文摘要

In recent years, transformer architectures have revolutionized the field of language processing, opening the door to previously unforeseen possibilities. However, from a theoretical point of view, the mathematical models proposed in the literature often lack direct contact with the actual architectures and depend on strong simplifying assumptions. In this paper, we reduce this gap by modelling the data flow in multi-headed transformer architectures as time-dependent gradient flows for a suitable interaction energy capturing the design of the attention mechanism. The explicit dependence on time allows us to consider different weights for each head and for each layer, without imposing constraints on the initialization method. Moreover, we prove that, under a suitable integrability assumption on the evolution of the weights, each element of the $ω$-limit set of the gradient flows is a stationary point of the interaction energy at a limiting weight distribution. Finally, we analyse the stability of the gradient flows considering perturbations of both the initial data and the weights. Specifically, on the one hand, we study the robustness of the proposed models with respect to noisy inputs, establishing a continuous dependence of the gradient flows on the initial data and uniqueness of the flows. On the other hand, we prove the $Γ$-convergence of the perturbed interaction energy to the unperturbed one, leading to the convergence of the corresponding gradient flows. We complement these theoretical results with numerical experiments that confirm the predicted energy-dissipation identity and clarify the asymptotic behavior of the dynamics in both the autonomous-like (Ornstein--Uhlenbeck) and the genuinely non-autonomous (oscillating-weights) regimes.

2605.18618 2026-05-20 cs.LG cs.AI 版本更新

Stochastic Penalty-Barrier Methods for Constrained Machine Learning

随机罚函数-障碍方法用于约束机器学习

Adam Bosák, Andrii Kliachkin, Jana Lepšová, Gilles Bareilles, Jakub Mareček

发表机构 * Artificial Intelligence Center, CTU in Prague(布拉格CTU人工智能中心) CMAP, École Polytechnique, Palaiseau, France(法国巴黎高等理工学院帕莱索校区CMAP)

AI总结 本文提出了一种随机罚函数-障碍方法(SPBM),用于解决深度学习中非凸、非光滑、随机环境下的约束优化问题,该方法通过指数对偶平均、稳定罚函数调度和Moreau包络来处理非光滑性,并在多个设置中验证了其性能。

详情
AI中文摘要

约束机器学习能够实现公平性感知训练、物理信息神经网络以及将符号领域知识整合到统计模型中。尽管其实际重要性,但目前尚无通用方法能够处理深度学习中自然出现的非凸、非光滑、随机环境。我们提出随机罚函数-障碍方法(SPBM),通过指数对偶平均、稳定罚函数调度和Moreau包络扩展经典罚函数和障碍方法,以处理非光滑性。在多个设置中的实验表明,SPBM在匹配或优于现有约束优化基线的同时,仅比无约束Adam方法多出线性运行时间开销,最多可处理10,000个约束。

英文摘要

Constrained machine learning enables fairness-aware training, physics-informed neural networks, and integration of symbolic domain knowledge into statistical models. Despite its practical importance, no general method exists for the non-convex, non-smooth, stochastic setting that arises naturally in deep learning. We propose the Stochastic Penalty-Barrier Method (SPBM), which extends classical penalty and barrier methods to this setting via exponential dual averaging, a stabilized penalty schedule, and the Moreau envelope to handle non-smoothness. Experiments across multiple settings show that SPBM matches or outperforms existing constrained optimization baselines while incurring only linear runtime overhead compared to unconstrained Adam for up to 10,000 constraints.

2605.17635 2026-05-20 hep-ex cs.LG 版本更新

ML-based Fast Simulation of FARICH Responses

基于机器学习的FARICH响应快速模拟

Foma Shipilov, Alexander Barnyakov, Artem Ivanov, Fedor Ratnikov

发表机构 * HSE University(俄罗斯莫斯科高等经济大学) Budker Institute of Nuclear Physics SB RAS(俄罗斯托木斯克核物理研究所) Novosibirsk State Technical University(托木斯克国立技术大学) Joint Institute for Nuclear Research(联合核子研究所)

AI总结 本文提出基于条件生成对抗网络的机器学习方法,用于快速模拟FARICH探测器响应,通过轻量级卷积架构生成真实光子击中探测器矩阵的样本,并在速度和精度上优于传统蒙特卡洛方法。

Comments to be published in 7th International Workshop on Future Tau Charm Facilities (FTCF2025) proceedings

详情
AI中文摘要

快速模拟探测器响应是高能物理(HEP)中的关键任务。传统蒙特卡洛方法是现代粒子物理模拟软件的核心,但计算成本较高。本文提出一种基于机器学习的方法,用于快速模拟聚焦气凝胶环形切伦科夫(FARICH)探测器响应。给定粒子轨迹和动量,目标是生成探测器矩阵上的真实光子击中样本。我们提出了一种带有轻量级卷积架构的条件生成对抗网络(cGAN),该网络能够根据粒子参数条件重现投影探测器响应。我们通过应用于概率图和重建速度分布的指标,将cGAN与线性统计基线进行比较。cGAN生成真实样本,并在蒙特卡洛模拟上提供了显著的速度提升。

英文摘要

A fast simulation of the detector response is a vital task in high-energy physics (HEP). Traditional Monte-Carlo methods form the backbone of modern particle physics simulation software but are computationally expensive. We present a machine-learning-based approach to fast simulation of the Focusing Aerogel Ring Imaging Cherenkov (FARICH) detector response. Given a particle track and momentum, the goal is to generate realistic samples of photon hits on the detector matrix. We propose a conditional Generative Adversarial Network (cGAN) with a lightweight convolutional architecture that reproduces the projected detector response conditioned on particle parameters. We compare the cGAN against a linear statistical baseline using metrics applied to probability maps and to the reconstructed velocity distributions. The cGAN produces realistic samples and provides a significant speed-up over Monte-Carlo simulation.

2605.17471 2026-05-20 cs.LG cs.NA math.NA math.OC 版本更新

WinQ: Accelerating Quantization-Aware Training of Language Models Around Saddle Points

WinQ: 加速围绕鞍点的语言模型量化感知训练

Dongyue Li, Zechun Liu, Kai Yi, Zhenshuo Zhang, Changsheng Zhao, Raghuraman Krishnamoorthi, Harshit Khaitan, Hongyang R. Zhang, Steven Li

发表机构 * Northeastern University, MA(东北大学) Meta AI, CA(Meta AI)

AI总结 本文研究了量化感知训练(QAT)在低比特宽度下的收敛问题,提出WinQ算法通过重置权重和噪声注入梯度来加速训练并提升性能。

Comments 23 pages; To appear in ICML 2026

详情
AI中文摘要

量化感知训练(QAT)被广泛用于通过训练全精度权重来量化语言模型,其主要瓶颈是收敛缓慢和早期性能 plateau,特别是在低于4比特宽度时。尽管先前工作已观察到此问题,但其精确原因仍不清楚。在本文中,我们通过估计损失曲面Hessian谱来分析QAT的收敛性。我们发现权重会收敛到鞍点周围的平坦区域,其中大量Hessian特征值同时为正和负。在训练过程中,越来越多的Hessian特征值集中在零附近,其幅度减小。在较低的比特宽度下,Hessian谱中的特征值幅度显著更小。为缓解这些问题,我们提出了一种名为WinQ的算法,包括:(1)周期性地将权重重置为全精度和量化权重的线性插值,减少到量化网格的距离并增加特征值幅度,以及(2)计算噪声注入权重的梯度以正则化Hessian。广泛的实验表明,WinQ在各种量化方法和模型上将QAT加速了多达4倍。在相同的训练成本下,WinQ将最先进的子4比特量化改进了高达8.8%。这些结果在16种不同语言模型、量化方法和比特宽度的设置中保持一致。

英文摘要

Quantization-aware training (QAT) is widely adopted to quantize language models by training full-precision weights using gradients from the quantized model. The main bottleneck is its slow convergence and early performance plateau, particularly below 4-bit-widths. While this problem has been observed in prior work, its precise cause remains unclear. In this paper, we analyze the convergence of QAT by estimating the spectrum of the loss-surface Hessians. We find that the weights converge to flat regions around saddle points, where a large fraction of the Hessian eigenvalues are both positive and negative. During training, an increasing fraction of Hessian eigenvalues concentrates around zero, whose magnitude decreases. At lower bit-widths, the magnitude of eigenvalues in the Hessian spectrum is significantly smaller. To mitigate these issues, we propose an algorithm called WinQ to accelerate QAT, which involves: (1) periodically resetting weights to the linear interpolation of full-precision and quantized weights, reducing the distance to the quantization grid and increasing eigenvalue magnitude, and (2) computing gradients of noise-injected weights to regularize the Hessian. Extensive experiments show that WinQ accelerates QAT by up to 4 times across various quantization methods and models. Under the same training cost, WinQ improves state-of-the-art sub-4-bit quantization by up to 8.8%. These results are consistent across 16 settings with different language models, quantization methods, and bit widths.

2605.16692 2026-05-20 cs.LG cs.AI cs.RO 版本更新

EfficientTDMPC: Improved MPC Objectives for Sample-Efficient Continuous Control

EfficientTDMPC: 改进的MPC目标以实现高效的连续控制

Thomas Evers, Cristian Meo, Wendelin Bohmer, Justin Dauwels, Yaniv Oren

发表机构 * TU Delft(代尔夫特理工大学) LatentWorlds AI

AI总结 本文提出EfficientTDMPC,一种基于模型的强化学习方法,用于连续控制,通过减少误差和增加数据新鲜度来提高样本效率。

详情
AI中文摘要

我们介绍了EfficientTDMPC,一种用于连续控制的样本高效模型基于强化学习方法,基于TD-MPC算法家族。该家族的核心是一个规划器,旨在找到最大化估计回报的行动序列。回报通过学习的模型和价值网络进行估计,每个都可以引入误差。EfficientTDMPC通过两种方式减少这种误差。首先,它引入了动态模型的集成,并在这些模型和不同的展开深度之间平均回报估计。其次,它增加了应用不确定性惩罚到规划器目标的选项,从而得到一个避免不确定回报估计的规划器。然后,它增加了实用改进,提高缓冲数据的新鲜度并减少计算。最后,我们发现我们的贡献使EfficientTDMPC能够更受益于更高的更新到数据(UTD)比率,进一步提高样本效率。据我们所知,在每个基准的低数据情况下,EfficientTDMPC在HumanoidBench-Hard和DMC hard上实现了最先进的样本效率,而在DMC easy上则匹配了最先进的性能。

英文摘要

We introduce EfficientTDMPC, a sample-efficient model-based reinforcement learning method for continuous control built on the TD-MPC family of algorithms. Central to this family is a planner that aims to find an action sequence that maximizes the estimated return. The return is estimated using a learned model and value networks, each of which can introduce error. EfficientTDMPC proposes to reduce this error in two ways. First, it introduces an ensemble of dynamics models and averages the return estimates across those models and across different rollout depths. Second, it adds the option to apply an uncertainty penalty to the planner objective, yielding a planner that avoids actions with uncertain return estimates. It then adds practical improvements which increase buffer data freshness and reduce compute. Lastly, we find that our contributions enable EfficientTDMPC to benefit more from a higher update-to-data (UTD) ratio, further improving sample efficiency. To the best of our knowledge, in the low data regime of each benchmark, EfficientTDMPC achieves state-of-the-art (SOTA) in terms of sample efficiency on HumanoidBench-Hard and DMC hard, while matching SOTA on DMC easy.

2605.16447 2026-05-20 cs.LG cs.AI 版本更新

Nested Spatio-Temporal Time Series Forecasting

嵌套时空时间序列预测

Yinghao Ai, Yukai Zhou, Ruoxi Jiang, Junyi An, Chao Qu, Zhijian Zhou, Shiyu Wang, Fenglei Cao, Zenglin Xu, Furao Shen, Yuan Qi

发表机构 * Fudan University, Shanghai(复旦大学) Department of Computer Science and Technology, Nanjing University(南京大学计算机科学与技术系) ByteDance(字节跳动)

AI总结 本文提出了一种嵌套预测框架,通过结合未来宏观区域趋势与微观历史观测,实现了精细化预测,并通过谱聚类方法构建语义连贯的区域,有效过滤系统性噪声并保留关键趋势,实验表明该方法在多个高维数据集上优于现有最先进基线。

Comments Accept by ICML 2026

详情
AI中文摘要

时空预测对于现实应用如交通管理至关重要,但在噪声和非平稳条件下捕捉可靠交互仍具挑战性。现有方法主要依赖历史空间先验,往往无法考虑演化的时空相关性并产生系统性误差。在本文中,我们提出了一种嵌套预测框架,将未来宏观区域趋势与微观历史观测相结合,使模型能够从抽象的未来表示中获得自上而下的指导以实现精细化预测。具体而言,我们采用基于谱聚类的方法构建语义连贯的区域,提供了理论和经验证据表明这种表示能有效过滤系统性噪声并保留关键趋势。在此基础上,我们开发了一种逐步由粗到细的预测器,将这些代表性特征整合到推理过程中。这使模型能够利用趋势预测来提前预测动态异常,如周期性偏移。此外,对多个高维数据集的广泛实验表明,我们的方法在多个高维数据集上始终优于现有最先进基线,验证了未来宏观指导的嵌套预测的有效性。

英文摘要

Spatiotemporal forecasting is critical for real-world applications like traffic management, yet capturing reliable interactions remains challenging under noisy and non-stationary conditions. Existing methods primarily rely on historical spatial priors, often failing to account for evolving temporal correlations and suffering from systematic errors. In this work, we propose a nested forecasting framework that couples future macro-level regional trends with micro-level historical observations, enabling top-down guidance from abstract future representations for fine-grained forecasting. Specifically, we employ a spectral clustering-based approach to construct semantically coherent regions, providing both theoretical and empirical evidence that this representation effectively filters systematic noise while preserving essential trends. Building on this, we develop a progressive coarse-to-fine predictor to integrate these representative features into the inference process. This enables the model to leverage trend predictions to anticipate dynamic anomalies, such as periodic offsets, in advance. Furthermore, extensive experiments on multiple high-dimensional datasets demonstrate that our method consistently outperforms state-of-the-art baselines, validating the effectiveness of future macro-guided nested forecasting.

2605.16170 2026-05-20 cs.LG 版本更新

BAPR: Bayesian amnesic piecewise-robust reinforcement learning for non-stationary continuous control

BAPR: 基于贝叶斯遗忘的分段鲁棒强化学习用于非平稳连续控制

Yifan Zhang, Liang Zheng

发表机构 * Central South University(中南大学)

AI总结 该研究提出BAPR方法,结合贝叶斯在线变化检测与鲁棒集合强化学习,解决非平稳连续控制中的鲁棒性与适应性问题,通过形式化验证确保算法稳定性与收敛性。

详情
AI中文摘要

现实中的控制系统经常在分段平稳条件下运行,其中动态在较长时期内保持稳定,随后经历 abrupt 的 regime 变化。标准鲁棒强化学习方法面临根本性困境:全局保守策略在稳定时期浪费性能,而局部适应策略在未检测到 regime 变化时风险崩溃。我们提出 BAPR(贝叶斯遗忘分段鲁棒 SAC),将贝叶斯在线变化检测(BOCD)与鲁棒集合强化学习统一。BAPR 操作符——一种加权由模式条件贝尔曼操作符和冻结信念分布构成的凸组合——是一个 γ-收缩。一个互补的反例,在 Lean~4 中机验证,建立了明确的边界:当信念依赖于 Q 函数时,收缩因子变为 γ + λΔ(其中 Δ 是模式奖励差),且收缩失败恰好当 γ + λΔ ≥ 1。我们推导了抽象操作符的组件式形式化误差预算——每个组件机验证,限制了切换后的恢复;预算适用于抽象模式混合操作符,并通过冻结参数设计直觉继承到实现的共享批评者算法。所有结果均通过形式化验证,无 sorry(1,145 行,3 个 Lean~4 文件,22 个机验证定理)。BOCD 驱动了适应性保守机制:在检测到变化点后,策略变得最保守,并随着信心增长而平滑放松,检测延迟为 O(log(1/δ))。一个通过 RMDM 损失训练的上下文条件模块,从模拟器提供的模式 ID 提取模式感知表示,在训练时和部署时均无需模式标签。

英文摘要

Real-world control systems frequently operate under \emph{piecewise stationary} conditions, where dynamics remain stable for extended periods before undergoing abrupt regime changes. Standard robust RL methods face a fundamental dilemma: a globally conservative policy wastes performance during stable periods, while a locally adaptive policy risks catastrophic failure when the regime changes undetected. We propose \textbf{BAPR} (Bayesian Amnesic Piecewise-Robust SAC), which unifies Bayesian Online Change Detection (BOCD) with robust ensemble RL. The BAPR operator -- a convex combination of mode-conditional Bellman operators weighted by a frozen belief distribution -- is a $γ$-contraction. A complementary counterexample, machine-verified in Lean~4, establishes a \emph{sharp boundary}: when beliefs depend on the Q-function, the contraction factor becomes $γ+ λΔ$ (where $Δ$ is the mode reward gap), and contraction fails exactly when $γ+ λΔ\geq 1$. We derive a \emph{component-wise} formal error budget for the abstract operator -- every component machine-verified -- bounding post-switch recovery; the budget applies to the abstract mode-mixture operator and inherits to the implemented shared-critic algorithm only through the frozen-parameter design intuition. All results are formally verified with no \texttt{sorry} (1,145 lines across 3 Lean~4 files, 22 machine-verified theorems). BOCD drives an adaptive conservatism mechanism: the policy becomes maximally conservative after detected change-points and smoothly relaxes as confidence grows, with detection delay $O(\log(1/δ))$. A context-conditioning module trained via RMDM loss provides mode-aware representations from simulator-provided mode IDs at training time and requires no mode labels at deployment.

2605.15532 2026-05-20 cs.LG cs.AI cs.CL 版本更新

DeltaPrompts: Escaping the Zero-Delta Trap in Multimodal Distillation

DeltaPrompts: 逃离多模态蒸馏中的零delta陷阱

Jaehun Jung, Hyunwoo Kim, Brandon Cui, Ximing Lu, David Acuna, Prithviraj Ammanabrolu, Yejin Choi

发表机构 * NVIDIA Research(NVIDIA研究院)

AI总结 本文提出DeltaPrompts,通过量化教师与学生之间的答案分歧(Δ)来生成高分歧的推理问题,从而解决传统蒸馏中因零delta提示导致的学习信号不足问题,实验表明DeltaPrompts在多个场景下显著提升了模型性能。

详情
AI中文摘要

蒸馏使紧凑的视觉-语言模型(VLMs)能够获得强大的推理能力,但驱动这一过程的提示通常通过简单的启发法或从现成数据集中聚合获得。我们揭示了这种方法中的关键低效性:标准图表/文档推理数据集中多达69%的提示实际上是零delta,意味着教师和学生已经诱导出完全相同的答案分布。在这些提示上训练提供极小的学习信号,导致学生性能在数据规模扩大时迅速饱和。为逃离零delta陷阱,我们回归基本原理:蒸馏本质上最小化了分布差异,因此只有暴露教师与学生之间功能性能力差距的提示才具有价值。我们通过答案分歧(Δ)量化这一差距,证明非零分歧对有效扩展至关重要。基于这一洞察,我们提出一个分阶段合成流程,利用现有数据集作为种子,主动针对学生失败模式生成更好的提示。结果是DeltaPrompts,一个包含20万 synthetic 高分歧推理问题的多样化数据集。我们评估DeltaPrompts在三个不同场景下的表现:在目标教师-学生对上的在线蒸馏、转移到新型模型家族而不重新生成数据、以及非推理模型的离线微调。在所有场景中,DeltaPrompts均带来显著收益,即使在高度优化的推理模型(如Qwen3-VL-8B-Thinking)上,也能在10个基准测试中平均获得高达15%的相对提升。

英文摘要

Distillation enables compact Vision-Language Models (VLMs) to obtain strong reasoning capabilities, yet the prompts driving this process are typically chosen via simple heuristics or aggregated from off-the-shelf datasets. We reveal a critical inefficiency in this approach: up to 69% of the prompts in standard chart / document reasoning datasets are effectively zero-delta, meaning the teacher and student already induce the exact same answer distribution. Training on these prompts provides minimal learning signal, causing student improvement to rapidly saturate regardless of data scale. To escape the zero-delta trap, we return to first principles: distillation fundamentally minimizes distributional divergence, and thus a prompt is valuable only if it exposes a functional capability gap between the teacher and student. We quantify this gap through answer divergence ($Δ$), demonstrating that non-zero divergence is critical for effective scaling. Building on this insight, we propose a staged synthesis pipeline that repurposes existing datasets as seeds, actively targeting student failure modes to produce better prompts. The result is DeltaPrompts, a diverse dataset of 200k synthetic, high-divergence reasoning problems. We evaluate DeltaPrompts across three distinct settings: on-policy distillation with the target teacher-student pair, transfer to a novel model family without regenerating the data, and off-policy fine-tuning of a non-reasoning model. Across all scenarios, DeltaPrompts drives substantial gains, yielding up to 15% relative improvement even on top of a highly-optimized reasoning model (e.g., Qwen3-VL-8B-Thinking) -- averaged over 10 benchmarks spanning chart, document and perception-centric reasoning.

2605.14588 2026-05-20 cs.LG 版本更新

Silent Collapse in Recursive Learning Systems

递归学习系统中的沉默崩溃

Zhipeng Zhang

发表机构 * China Mobile Research Institute(中国移动研究院) China Mobile GBA (Greater Bay Area) Innovation Institute(中国移动大湾区创新研究院)

AI总结 本文研究了递归学习系统中模型内部分布逐渐退化的现象,提出MTR框架通过监测轨迹统计量和调整学习强度来提前预警并防止沉默崩溃。

详情
AI中文摘要

递归学习——即模型在由自身先前版本生成的数据上进行训练——在大型语言模型、自主代理和自监督系统中日益常见。然而,标准性能度量(损失、困惑度、准确率)往往无法在不可逆退化发生前检测到内部退化。本文识别出一种现象,我们称之为沉默崩溃:在广泛递归条件下,模型内部分布(预测熵、表征多样性、尾部覆盖)即使在传统度量看似稳定或改进时也会逐渐收缩。我们发现沉默崩溃并非 abrupt,其发生前总是可靠地由三个轨迹级前兆预示:(1)锚点熵的收缩,(2)表征漂移的冻结,(3)尾部覆盖的侵蚀。这些信号在任何传统验证度量退化之前多代出现,从而实现早期预警。基于这些前兆,我们提出了MTR(监控-信任-调节器)框架,一个轻量级的元认知循环,通过监测轨迹统计量、估计慢时间尺度的信任变量,并自适应调节有效学习强度。MTR在不需访问原始干净数据的情况下提供早期预警并主动防止沉默崩溃,这是当原始数据不可用、受污染或私有时的关键优势。

英文摘要

Recursive learning -- where models are trained on data generated by previous versions of themselves -- is increasingly common in large language models, autonomous agents, and self-supervised systems. However, standard performance metrics (loss, perplexity, accuracy) often fail to detect internal degradation before it becomes irreversible. Here we identify a phenomenon we call silent collapse: under broad recursive conditions, model internal distributions -- predictive entropy, representational diversity, and tail coverage -- progressively contract even as conventional metrics appear stable or improving. We discover that silent collapse is not abrupt. Its onset is reliably preceded by three trajectory-level precursors: (1) contraction of anchor entropy, (2) freezing of representation drift, and (3) erosion of tail coverage. These signals manifest multiple generations before any degradation in standard validation metrics, enabling early warning. Based on these precursors, we propose the MTR (Monitor--Trust--Regulator) framework, a lightweight metacognitive loop that monitors trajectory statistics, estimates a slow-timescale trust variable, and adaptively modulates the effective learning intensity. MTR provides early warning and actively prevents silent collapse without requiring access to pristine real data -- a critical advantage when original data is unavailable, contaminated, or private.

2605.14048 2026-05-20 cs.AI cs.LG 版本更新

Network-Aware Bilinear Tokenization for Brain Functional Connectivity Representation Learning

面向网络的双线性分块化用于脑功能连接表示学习

Leo Milecki, Qingyu Hu, Bahram Jafrasteh, Mert R. Sabuncu, Qingyu Zhao

发表机构 * Department of Radiology, Weill Cornell Medicine, New York, NY, USA.(韦尔·科恩医学中心放射科, 纽约, NY, 美国) School of Electrical and Computer Engineering, Cornell University and Cornell Tech, New York, NY, USA.(康奈尔大学电气与计算机工程学院及康奈尔科技, 纽约, NY, 美国)

AI总结 本文提出了一种面向网络的双线性分块化方法,用于改进脑功能连接的表示学习,通过重新定义功能连接的分块方式,提升模型在跨群体评估中的稳定性和可迁移性。

Comments Author-submitted version, provisionally accepted at MICCAI 2026

详情
AI中文摘要

Masked autoencoders (MAEs) 近年来在静息状态脑功能连接(FC)的自监督表示学习中显示出潜力。然而,一个基本问题仍未解决:如何对FC矩阵进行分块以与大规模脑网络的内在模块化组织对齐?现有方法通常采用以区域为中心或图基的方案,将FC视为结构上均质的元素,并忽略了大规模脑网络的组织结构。我们引入NERVE(通过双线性分块化进行脑功能连接的网络感知表示学习),一种自监督学习框架,通过将FC矩阵划分为内网络和跨网络连接块来重新定义FC分块。与基于图像的MAE不同,由网络对定义的FC分块在大小上异质且对应不同的功能角色。为了解决这个问题,NERVE通过一种新的结构化双线性分解来嵌入FC分块。这种形式保留了网络身份,并将参数复杂度从网络数量的二次方减少到线性。我们评估了NERVE在三个大规模发展队列(ABCD、PNC和CCNP)中对行为和精神病理学的预测。与结构上不敏感的MAE变体和基于图的自监督基线相比,所提出的网络感知形式在跨队列评估中产生了更稳定和可迁移的表示。消融研究确认了所提出的双线性网络嵌入和解剖学基础的分区对于性能至关重要。这些发现突显了在功能连接组学中将领域特定的结构先验纳入自监督学习的重要性。代码可在:https://github.com/leomlck/NERVE。

英文摘要

Masked autoencoders (MAEs) have recently shown promise for self-supervised representation learning of resting-state brain functional connectivity (FC). However, a fundamental question remains unresolved: how should FC matrices be tokenized to align with the intrinsic modular organization of large-scale brain networks? Existing approaches typically adopt region-centric or graph-based schemes that treat FC as structurally homogeneous elements and overlook the large-scale network brain organization. We introduce NERVE (Network-Aware Representations of Brain Functional Connectivity via Bilinear Tokenization), a self-supervised learning framework that redefines FC tokenization by partitioning FC matrices into patches of intra- and inter-network connectivity blocks. Unlike image-based MAE, where fixed-size patches share a common tokenizer, FC patches defined by network pairs are heterogeneous in size and correspond to distinct functional roles. To resolve this problem, NERVE embeds FC patches through a novel structured bilinear factorization. This formulation preserves network identity and reduces parameter complexity from quadratic to linear scaling in the number of networks. We evaluate NERVE across three large-scale developmental cohorts (ABCD, PNC, and CCNP) for behavior and psychopathology prediction. Compared to structurally agnostic MAE variants and graph-based self-supervised baselines, the proposed network-aware formulation yields more stable and transferable representations, particularly in cross-cohort evaluation. Ablation studies confirm that the proposed bilinear network embedding and anatomically grounded parcellation are critical for performance. These findings highlight the importance of incorporating domain-specific structural priors into self-supervised learning for functional connectomics. Code is available at: https://github.com/leomlck/NERVE.

2605.14014 2026-05-20 cs.LG cs.AI 版本更新

Dywave: Event-Aligned Dynamic Tokenization for Heterogeneous IoT Sensing Signals

Dywave: 为异构物联网传感信号设计的事件对齐动态分词方法

Tomoyoshi Kimura, Denizhan Kara, Jinyang Li, Hongjue Zhao, Yigong Hu, Yizhuo Chen, Xiaomin Ouyang, Shengzhong Liu, Tarek Abdelzaher

发表机构 * University of Illinois Urbana-Champaign(伊利诺伊大学厄巴纳-香槟分校) Hong Kong University of Science(香港科学大学) Shanghai Jiao Tong University(上海交通大学)

AI总结 本文提出Dywave,一种用于异构物联网传感信号的动态分词框架,通过小波基层次分解构建紧凑的输入表示,以适应内在时间结构和底层物理事件,从而在活动识别、压力评估和附近物体检测等任务中提升准确率并提高计算效率。

详情
AI中文摘要

物联网系统持续收集来自无处不在传感器的异构传感信号,以支持智能应用,如人类活动分析、情绪监测和环境感知。这些信号本质上是非平稳和多尺度的,给标准分词技术带来了独特挑战。本文提出Dywave,一种为物联网传感信号设计的动态分词框架,该框架构建了与内在时间结构和底层物理事件对齐的紧凑输入表示。Dywave利用基于小波的层次分解,识别出对应底层语义事件的时间边界,并自适应地压缩冗余区间,同时保持时间一致性。在五个真实物联网传感数据集上进行的广泛评估表明,Dywave在活动识别、压力评估和附近物体检测等任务中,比最先进的方法在准确率上提高了高达12%,同时通过减少输入标记长度最多75%来提高计算效率。此外,Dywave在面对领域偏移和变化的序列长度时表现出更强的鲁棒性。

英文摘要

Internet of Things (IoT) systems continuously collect heterogeneous sensing signals from ubiquitous sensors to support intelligent applications such as human activity analysis, emotion monitoring, and environmental perception. These signals are inherently non-stationary and multi-scale, posing unique challenges for standard tokenization techniques. This paper proposes Dywave, a dynamic tokenization framework for IoT sensing signals that constructs compact input representations aligned with intrinsic temporal structures and underlying physical events. Dywave leverages wavelet-based hierarchical decomposition, identifies meaningful temporal boundaries corresponding to underlying semantic events, and adaptively compresses redundant intervals while preserving temporal coherence. Extensive evaluations on five real-world IoT sensing datasets across activity recognition, stress assessment, and nearby object detection demonstrate that Dywave outperforms state-of-the-art methods by up to 12% in accuracy, while improving computational efficiency by reducing input token lengths by up to 75% across mainstream sequence models. Moreover, Dywave exhibits improved robustness to domain shifts and varying sequence lengths.

2605.11262 2026-05-20 cs.LG 版本更新

Latent Chain-of-Thought Improves Structured-Data Transformers

潜在的链式思维提升结构化数据转换器

Carson Dudley, Samet Oymak

AI总结 本文研究了链式思维对时间序列和表格数据的影响,并提出了一种递归方案,通过压缩查询位置的隐藏状态生成反馈标记,从而在预测前进行多次潜在计算,从而提升结构化数据转换器的性能。

详情
AI中文摘要

链式思维以及更广泛的推理时间计算已被证明能够增强语言模型的表达能力,并在推理领域带来了重大创新。受此成功启发,本文探讨了潜在链式思维以及深度和循环对时间序列和表格数据的影响。我们提出了一种递归方案,其中结构化数据转换器在初始正向传递后,将查询位置的隐藏状态压缩为反馈标记,这些标记被附加到输入并再次处理,从而在预测前允许多次潜在计算。我们比较了链式思维模型与一个相同深度无链式思维的基线模型、一个与链式思维模型在有效深度上匹配的更深层次基线模型,以及一个具有权重绑定递归但没有额外链式思维标记的循环转换器。在36个时间序列预测和表格预测数据集中,潜在链式思维在7/9个时间序列数据集上优于基线(平均提升12.63%),在23/27个表格数据集上也优于基线(平均提升3.25%),链式思维模型在两种设置中表现最佳。我们还展示了链式思维的好处扩展到了预训练基础模型:将潜在链式思维应用于nanoTabPFN,一个小型开源表格基础模型,使其性能超越了更大的TabPFN-v2在TabArena上的表现。这些结果共同表明,链式思维是扩展结构化数据推理时间计算的一个有用轴线。

英文摘要

Chain-of-thought and more broadly test-time compute are known to augment the expressive capabilities of language models and have led to major innovations in reasoning. Motivated by this success, this paper explores latent chain-of-thought as well as the impact of depth and looping for time-series and tabular data. We propose a recurrent scheme in which a structured-data transformer, after an initial forward pass, compresses its query-position hidden states into feedback tokens that are appended to the input and processed again, allowing multiple rounds of latent computation before prediction. We compare CoT models against a same-depth no-CoT baseline, a deeper baseline matched to the CoT model in effective depth, and a looped transformer with weight-tied recurrence but no additional chain-of-thought tokens. Across 36 datasets in time-series forecasting and tabular prediction, latent chain-of-thought improves over the baseline on 7/9 time-series datasets (+12.63\% average gain) and 23/27 tabular datasets (+3.25\% average gain), with CoT models performing best on average in both settings. We also show that the benefit of CoT extends to pretrained foundation models: applying latent CoT to nanoTabPFN, a small open-source tabular foundation model, improves its performance above the much larger TabPFN-v2 on TabArena. Together, these results demonstrate that chain-of-thought is a useful axis for scaling test-time compute for structured data.

2605.09329 2026-05-20 cs.CL cs.LG 版本更新

Test-Time Speculation

测试时推测

Avinash Kumar, Sujay Sanghavi, Poulami Das

发表机构 * The University of Texas at Austin(德克萨斯大学奥斯汀分校)

AI总结 本文研究了测试时推测方法,通过在线蒸馏技术提升长响应任务中推测器的接受长度,从而提高LLM推理效率。

详情
AI中文摘要

推测解码通过使用快速草稿模型生成token并用更准确的目标模型验证,从而加速LLM推理。其性能取决于接受长度,即目标模型接受的草稿token数量。我们的研究表明,即使是最先进的推测器,如DFlash、EAGLE-3和PARD,其接受长度也会随着生成长度的增加而下降,在仅几千个输出token后接近1(即无加速),这使推测器在长响应任务中变得无效。接受长度下降是因为大多数推测器在离线训练时仅在短序列上训练,但在推理时被迫匹配远长于训练分布的输出。为了解决这个问题,我们提出了测试时推测(TTS),一种在线蒸馏方法,可以在测试时连续调整推测器。TTS利用关键见解,即token验证步骤已经为每个草稿token调用了目标模型,从而提供所需的训练信号,以无额外成本地调整草稿。将草稿视为学生,目标模型视为教师,TTS在多个推测轮次中调整草稿,每次更新都提高草稿的准确性。我们的结果表明,在Qwen-3、Qwen-3.5和Llama3.1家族的多个模型上,TTS在最先进的推测器上将接受长度提高高达72%和41%,且随着生成长度的增加,收益呈比例增长。

英文摘要

Speculative decoding accelerates LLM inference by using a fast draft model to generate tokens and a more accurate target model to verify them. Its performance depends on the $\textit{acceptance length}$, or number of draft tokens accepted by the target. Our studies show that the acceptance length of even state-of-the-art speculators, like DFlash, EAGLE-3 and PARD degrade with generation length, reaching values close to 1 (i.e. no speedup) within just a few thousand output tokens, making speculators ineffective for long-response tasks. Acceptance lengths decline because most speculators are trained offline on short sequences, but are forced to match the target model on much longer outputs at inference, well beyond their training distribution. To address this issue, we propose $\textit{Test-Time Speculation (TTS)}$, an online distillation approach that continuously adapts the speculator at test-time. TTS leverages the key insight that the token verification step already invokes the target model for each draft token, providing the training signal needed to adapt the draft at no additional cost. Treating the draft as the student and the target as a teacher, TTS adjusts the draft over several speculation rounds, with each update improving the draft's accuracy as generation proceeds. Our results across multiple models from the Qwen-3, Qwen-3.5, and Llama3.1 families show that TTS improves acceptance lengths over state-of-the-art speculators by up to $72\%$ and $41\%$ on average, with the benefits scaling with increased generation lengths.

2605.04970 2026-05-20 cs.LG cs.AI 版本更新

Skill Neologisms: Towards Skill-based Continual Learning

技能新词:迈向基于技能的持续学习

Antonin Berthon, Nicolas Astorga, Mihaela van der Schaar

发表机构 * University of Cambridge(剑桥大学)

AI总结 本文提出了一种基于技能的新词(skill neologisms)方法,通过在模型词汇中集成软token,以提高模型在特定技能上的能力,同时支持零样本组合其他技能,从而实现可扩展的基于技能的持续学习。

详情
AI中文摘要

现代大语言模型(LLMs)在不断扩大的技能范围内表现出色,并能灵活组合这些技能。然而,以可扩展的方式将模型能力扩展到新技能仍然是一个开放性问题:微调和参数高效变体有灾难性遗忘的风险,而基于上下文的方法表达能力有限且受模型有效上下文的限制。我们探索了技能新词——整合在模型词汇中的软token,并优化以提高特定技能的能力——作为一种方法,以在不更新权重的情况下选择性地获取新技能。我们首先观察到预训练LLMs已经表现出与程序知识相关的token。然后在受控的合成任务上展示,技能新词可以学习以提高模型在特定技能上的能力,同时能够与分布外技能组合,且独立训练的技能新词可以零样本组合。最后,我们验证了在更现实的自然语言设置中,即Skill-Mix基准测试中,独立学习的技能新词的零样本组合。这些结果表明,技能新词可能为基于技能的持续学习提供可扩展的路径。

英文摘要

Modern LLMs show mastery over an ever-growing range of skills, as well as the ability to compose them flexibly. However, extending model capabilities to new skills in a scalable manner is an open problem: fine-tuning and parameter-efficient variants risk catastrophic forgetting, while context-based approaches have limited expressiveness and are constrained by the model's effective context. We explore skill neologisms--soft tokens integrated in the model's vocabulary and optimized to improve capabilities over a specific skill--as a way to selectively acquire new skills without weight updates. We first observe that pre-trained LLMs already exhibit tokens associated with procedural knowledge. We then show on a controlled synthetic task that skill neologisms can be learned to improve model capabilities on specific skills while being composable with out-of-distribution skills, and that independently trained skill neologisms can be composed zero-shot. Finally, we validate zero-shot composition of independently learned skill neologisms on the more realistic natural language setting of the Skill-Mix benchmark. These results suggest that skill neologisms may provide a scalable path towards skill-based continual learning.

2605.01361 2026-05-20 cs.LG 版本更新

Decision-Focused Learning via Tangent-Space Projection of Prediction Error

通过预测误差的切线空间投影进行决策聚焦学习

Junhyeong Lee, Sangjin Jin, Yongjae Lee

发表机构 * Department of Industrial Engineering, Ulsan National Institute of Science and Technology(乌山国立科学与技术研究院工业工程系)

AI总结 本文提出了一种基于预测误差切线空间投影的决策聚焦学习方法,通过几何特征简化了后悔梯度的计算,提升了下游决策质量并提高了计算效率。

Comments 21 pages, 4 figures, 11 tables

详情
AI中文摘要

决策聚焦学习(DFL)训练预测器以提高下游决策质量,但计算后悔梯度通常需要对求解器进行微分或依赖于替代损失函数,这可能计算成本高或偏离真实目标。我们证明,在标准正则性条件下,本地稳定的活动约束下,后悔梯度具有闭式几何特征,等价于预测误差投影到活动约束的切线空间,乘以局部曲率。这表明,可以通过过滤决策无关成分来获得后悔梯度,提供了一种更简单直接的替代方法。基于此,我们提出PEAR(投影误差作为后悔梯度),通过在活动约束上减少的线性系统计算后悔梯度,避免对求解器迭代或额外优化求解进行微分。在LP基准和一个现实QP任务上的实验表明,PEAR在所有基线中实现了最佳的决策质量,同时是最具计算效率的,其优势在约束变化下依然保持。

英文摘要

Decision-Focused Learning (DFL) trains predictors to improve downstream decision quality, but computing regret gradients typically requires differentiating through solvers or relying on surrogate losses, which can be computationally expensive or deviate from the true objective. We show that, under standard regularity with locally stable active constraints, the regret gradient admits a closed-form geometric characterization, equivalent to the prediction error projected onto the tangent space of active constraints, scaled by local curvature. This reveals that regret gradients can be obtained by filtering decision-irrelevant components from the MSE gradient, providing a simpler and more direct alternative to existing approaches. Based on this, we propose PEAR (Projected Error As Regret-gradient), which computes regret gradients via a reduced linear system over active constraints, avoiding differentiation through solver iterations or additional optimization solves. Experiments on LP benchmarks and a real-world QP task show that PEAR achieves the best decision quality among all baselines while being the most computationally efficient, with gains that persist under constraint shifts.

2604.24658 2026-05-20 cs.LG 版本更新

The Last Human-Written Paper: Agent-Native Research Artifacts

最后的人写论文:代理原研究制品

Jiachen Liu, Jiaxin Pei, Jintao Huang, Chenglei Si, Ao Qu, Xiangru Tang, Runyu Lu, Lichang Chen, Xiaoyan Bai, Haizhong Zheng, Carl Chen, Zhiyang Chen, Haojie Ye, Yujuan Fu, Zexue He, Zijian Jin, Zhenyu Zhang, Shangquan Sun, Maestro Harmon, John Dianzhuo Wang, Jianqiao Zeng, Jiachen Sun, Mingyuan Wu, Baoyu Zhou, Chenyu You, Shijian Lu, Yiming Qiu, Fan Lai, Yuan Yuan, Yao Li, Junyuan Hong, Ruihao Zhu, Beidi Chen, Alex Pentland, Ang Chen, Mosharaf Chowdhury, Zechen Zhang

AI总结 该研究提出了一种名为Agent-Native Research Artifact (ARA)的协议,旨在解决传统科学论文在压缩研究过程为线性叙述时所导致的结构性缺陷,通过引入可执行的研究包结构,提升AI代理理解和扩展已发表工作的能力。

Comments 46 pages, 15 figures, 14 tables

详情
AI中文摘要

科学出版物将分支、迭代的研究过程压缩成线性叙述,丢弃了大部分发现过程中的内容。这种汇总施加了两种结构性成本:故事税,即失败实验、被拒绝的假设和分支探索过程被丢弃以适应线性叙述;以及工程税,即评审充分的叙述与代理充分的规范之间存在差距,导致关键实现细节未被书写。对于人类读者来说,这些成本是可以容忍的,但当AI代理必须理解、复制和扩展已发表的工作时,这些成本变得至关重要。我们引入了Agent-Native Research Artifact (ARA),一种协议,用机器可执行的研究包取代叙述论文,结构围绕四个层次:科学逻辑、可执行代码和完整规范、探索图保存被丢弃的失败编译,以及每个声明在原始输出中得到证据支持。三种机制支持生态系统:一个Live Research Manager,捕获日常开发中的决策和死胡同;一个ARA编译器,将传统PDF和仓库转换为ARA;以及一个ARA原生评审系统,自动化客观检查,使人类评审员能够专注于重要性、新颖性和品味。在PaperBench和RE-Bench上,ARA将问答准确率从72.4%提升到93.7%,复制成功率从57.4%提升到64.4%。在RE-Bench的五个开放扩展任务中,保留的失败痕迹加速了进展,但根据代理的能力,也可能限制代理跳出先前运行的框。我们的代码在https://github.com/Orchestra-Research/Agent-Native-Research-Artifact上开源。

英文摘要

Scientific publication compresses a branching, iterative research process into a linear narrative, discarding the majority of what was discovered along the way. This compilation imposes two structural costs: a Storytelling Tax, where failed experiments, rejected hypotheses, and the branching exploration process are discarded to fit a linear narrative; and an Engineering Tax, where the gap between reviewer-sufficient prose and agent-sufficient specification leaves critical implementation details unwritten. Tolerable for human readers, these costs become critical when AI agents must understand, reproduce, and extend published work. We introduce the Agent-Native Research Artifact (ARA), a protocol that replaces the narrative paper with a machine-executable research package structured around four layers: scientific logic, executable code with full specifications, an exploration graph that preserves the failures compilation discards, and evidence grounding every claim in raw outputs. Three mechanisms support the ecosystem: a Live Research Manager that captures decisions and dead ends during ordinary development; an ARA Compiler that translates legacy PDFs and repos into ARAs; and an ARA-native review system that automates objective checks so human reviewers can focus on significance, novelty, and taste. On PaperBench and RE-Bench, ARA raises question-answering accuracy from 72.4% to 93.7% and reproduction success from 57.4% to 64.4%. On RE-Bench's five open-ended extension tasks, preserved failure traces in ARA accelerate progress, but can also constrain a capable agent from stepping outside the prior-run box depending on the agent's capabilities. Our code is open-sourced at https://github.com/Orchestra-Research/Agent-Native-Research-Artifact.

2604.15166 2026-05-20 cs.CV cs.AI cs.LG 版本更新

Class Unlearning via Depth-Aware Removal of Forget-Specific Directions

通过深度感知移除遗忘特定方向实现类别反学习

Arman Hatami, Romina Aalishah, Ilya E. Monosov

发表机构 * Johns Hopkins University(约翰霍普金斯大学)

AI总结 本文提出DAMP方法,通过深度感知移除遗忘特定方向,改进类别反学习的选性遗忘,同时更好地保留保留类性能并减少深层残留遗忘结构。

Comments Accepted for oral presentation at the CVPR 2026 Workshop on Machine Unlearning for Vision (MUV). Code: https://github.com/armanhtm/DAMP

详情
AI中文摘要

机器反学习旨在在不重新训练模型的情况下移除目标知识。然而,在类别反学习中,降低遗忘类的准确性并不一定意味着真正的遗忘:遗忘的信息可能仍编码在内部表示中,而显着的遗忘可能源于分类器头部抑制而非表示移除。我们显示现有类别反学习方法往往表现出弱或负的选择性,保留遗忘类结构在深度表示中,或严重依赖最终层偏移。我们随后引入DAMP(通过投影的深度感知调节),一种单次、闭合形式的权重手术方法,可以在不使用梯度优化的情况下从预训练网络中移除遗忘特定方向。在每个阶段,DAMP在下一个可学习操作的输入空间中计算类别原型,提取遗忘方向作为相对于保留类原型的残差,并应用基于投影的更新以减少下游对这些方向的敏感性。为了保持实用性,DAMP使用从探测分离性导出的参数无关深度感知缩放规则,应用较小的编辑在早期层和较大的编辑在深层。该方法自然扩展到多类遗忘通过低秩子空间移除。在MNIST、CIFAR-10、CIFAR-100和Tiny ImageNet以及卷积和变换器架构上,DAMP比一些先前方法更接近再训练的黄金标准,改进了选择性遗忘的同时更好地保留保留类性能并减少深层残留遗忘结构。

英文摘要

Machine unlearning aims to remove targeted knowledge from a trained model without the cost of retraining from scratch. In class unlearning, however, reducing accuracy on forget classes does not necessarily imply true forgetting: forgotten information can remain encoded in internal representations, and apparent forgetting may arise from classifier-head suppression rather than representational removal. We show that existing class-unlearning methods often exhibit weak or negative selectivity, preserve forget-class structure in deep representations, or rely heavily on final-layer bias shifts. We then introduce DAMP (Depth-Aware Modulation by Projection), a one-shot, closed-form weight-surgery method that removes forget-specific directions from a pretrained network without gradient-based optimization. At each stage, DAMP computes class prototypes in the input space of the next learnable operator, extracts forget directions as residuals relative to retain-class prototypes, and applies a projection-based update to reduce downstream sensitivity to those directions. To preserve utility, DAMP uses a parameter-free depth-aware scaling rule derived from probe separability, applying smaller edits in early layers and larger edits in deeper layers. The method naturally extends to multi-class forgetting through low-rank subspace removal. Across MNIST, CIFAR-10, CIFAR-100, and Tiny ImageNet, and across convolutional and transformer architectures, DAMP more closely resembles the retraining gold standard than some of the prior methods, improving selective forgetting while better preserving retain-class performance and reducing residual forget-class structure in deep layers.

2604.05002 2026-05-20 cs.LG cs.AI 版本更新

Learning Stable Predictors from Weak Supervision under Distribution Shift

在分布偏移下从弱监督中学习稳定的预测器

Mehrdad Shoeibi, Elias Hossain, Ivan Garibay, Niloofar Yousefi

发表机构 * University of Central Florida(中央佛罗里达大学)

AI总结 本文研究了在分布偏移下从弱监督中学习稳定预测器的问题,通过CRISPR-Cas13d转录组扰动实验,探讨了监督漂移现象,并展示了弱监督在域内学习和部分跨细胞系迁移中的有效性,同时揭示了时间迁移中的失败源于监督漂移而非模型容量或简单协变量偏移。

详情
AI中文摘要

在真实标签不可用时,从弱、代理或相对监督中学习是常见的,但分布偏移下的鲁棒性仍缺乏理解,因为监督机制本身可能在不同环境中变化。我们正式将这种现象定义为监督漂移,即$P(y \mid x, c)$在不同上下文中变化,并在CRISPR-Cas13d转录组扰动实验中研究了它,其中指导效果是通过RNA-seq响应间接推断的。使用涵盖两种人类细胞系和多个诱导后时间点的公开数据,我们构建了一个受控的非独立同分布基准,具有明确的领域(细胞系)和时间偏移,同时在所有上下文中重用固定的弱标签构造以避免改变目标。在线性和树基模型中,弱监督支持域内有意义的学习(岭$R^2 = 0.356$,斯皮尔曼$ρ= 0.442$)和部分跨细胞系迁移($ρ\approx 0.40$)。相比之下,时间迁移在所有考虑的模型类别中崩溃,产生负$R^2$和弱或接近零的$ρ$(岭$R^2 = -0.145$,$ρ= 0.008$;XGBoost $R^2 = -0.155$,$ρ= 0.056$;随机森林 $R^2 = -0.322$,$ρ= 0.139$)。使用外部重新计算的弱标签、偏移分数量化和简单的缓解基线进行额外的鲁棒性分析,保持了相同定性的模式。特征-标签关联和特征重要性分析在不同细胞系中相对稳定,但在时间上变化剧烈,表明失败源于监督漂移而非模型容量或简单协变量偏移。这些结果表明,在弱监督下强域内性能可能是误导性的,并促使将特征稳定性作为轻量级诊断,用于部署前检测非可迁移性。

英文摘要

Learning from weak, proxy, or relative supervision is common when ground-truth labels are unavailable, but robustness under distribution shift remains poorly understood because the supervision mechanism itself may change across environments. We formalize this phenomenon as supervision drift, defined as changes in $P(y \mid x, c)$ across contexts, and study it in CRISPR-Cas13d transcriptomic perturbation experiments where guide efficacy is inferred indirectly from RNA-seq responses. Using publicly available data spanning two human cell lines and multiple post-induction timepoints, we construct a controlled non-IID benchmark with explicit domain (cell line) and temporal shifts, while reusing a fixed weak-label construction across all contexts to avoid changing targets. Across linear and tree-based models, weak supervision supports meaningful learning in-domain (ridge $R^2 = 0.356$, Spearman $ρ= 0.442$) and partial cross-cell-line transfer ($ρ\approx 0.40$). In contrast, temporal transfer collapses across all model classes considered, yielding negative $R^2$ and weak or near-zero $ρ$ (ridge $R^2 = -0.145$, $ρ= 0.008$; XGBoost $R^2 = -0.155$, $ρ= 0.056$; random forest $R^2 = -0.322$, $ρ= 0.139$). Additional robustness analyses using externally recomputed weak labels, shift-score quantification, and simple mitigation baselines preserve the same qualitative pattern. Feature-label association and feature-importance analyses remain relatively stable across cell lines but change sharply over time, indicating that failures arise from supervision drift rather than model capacity or simple covariate shift. These results show that strong in-domain performance under weak supervision can be misleading and motivate feature stability as a lightweight diagnostic for non-transferability before deployment.

2603.25722 2026-05-20 cs.CV cs.LG 版本更新

No Hard Negatives Required: Concept Centric Learning Leads to Compositionality without Degrading Zero-shot Capabilities of Contrastive Models

无需硬负样本:基于概念的学习在不降低对比模型零样本能力的情况下实现组合性

Hai X. Pham, David T. Hoffmann, Ricardo Guerrero, Brais Martinez

发表机构 * Samsung AI Center(三星人工智能中心)

AI总结 本文提出了一种基于概念的学习方法,无需使用硬负样本即可在不损害对比模型零样本和检索能力的情况下实现组合性,通过简单的方法改进了文本和图像编码器的全局池化问题。

Comments Accepted at CVPR 2026. 2nd rev: update github repo URL

详情
AI中文摘要

对比视觉-语言(V&L)模型仍然是各种应用中的流行选择。然而,出现了几个限制,尤其是V&L模型学习组合性表示的能力有限。先前的方法通常通过生成定制训练数据来获得硬负样本。硬负样本已被证明可以提高组合性任务的性能,但通常只适用于单一基准,无法推广,并且可能导致基本V&L能力如零样本或检索性能的显著下降,使其不切实际。在本工作中,我们采取了不同的方法。我们识别出两个限制V&L组合性性能的根本原因:1)长训练标题不需要组合性表示;2)文本和图像编码器中的最终全局池化导致完全失去学习绑定所需的必要信息。为了解决这一问题,我们提出了两种简单的解决方案:1)使用标准NLP软件获得短的概念导向标题部分,并将其对齐到图像;2)引入无参数的跨模态注意力池化,从图像编码器中获得概念导向的视觉嵌入。通过这些更改和简单的辅助对比损失,我们获得了标准组合性基准的SOTA性能,同时保持或提高了强大的零样本和检索能力。这在不增加推理成本的情况下实现。我们在此工作的代码已发布在https://github.com/saic-fi/concept_centric_clip。

英文摘要

Contrastive vision-language (V&L) models remain a popular choice for various applications. However, several limitations have emerged, most notably the limited ability of V&L models to learn compositional representations. Prior methods often addressed this limitation by generating custom training data to obtain hard negative samples. Hard negatives have been shown to improve performance on compositionality tasks, but are often specific to a single benchmark, do not generalize, and can cause substantial degradation of basic V&L capabilities such as zero-shot or retrieval performance, rendering them impractical. In this work we follow a different approach. We identify two root causes that limit compositionality performance of V&Ls: 1) Long training captions do not require a compositional representation; and 2) The final global pooling in the text and image encoders lead to a complete loss of the necessary information to learn binding in the first place. As a remedy, we propose two simple solutions: 1) We obtain short concept centric caption parts using standard NLP software and align those with the image; and 2) We introduce a parameter-free cross-modal attention-pooling to obtain concept centric visual embeddings from the image encoder. With these two changes and simple auxiliary contrastive losses, we obtain SOTA performance on standard compositionality benchmarks, while maintaining or improving strong zero-shot and retrieval capabilities. This is achieved without increasing inference cost. We release the code for this work at https://github.com/saic-fi/concept_centric_clip.

2603.25476 2026-05-20 cs.LG 版本更新

How Class Ontology and Data Scale Affect Audio Transfer Learning

音频迁移学习中类本体和数据规模的影响

Manuel Milling, Andreas Triantafyllopoulos, Alexander Gebhard, Simon Rampp, Björn W. Schuller

发表机构 * CHI – Chair of Health Informatics(健康信息学系) Technical University of Munich(慕尼黑技术大学) MCML – Munich Center for Machine Learning(慕尼黑机器学习中心) Munich Center for Machine Learning(慕尼黑机器学习中心) Munich Data Science Institute(慕尼黑数据科学研究所) Group on Language, Audio, & Music(语言、音频与音乐小组) Imperial College(帝国学院)

AI总结 本文研究了在音频到音频迁移学习中,类本体和数据规模如何影响迁移学习的效果,发现增加样本和类别的数量对迁移学习有积极影响,但相似性在下游任务中起主导作用。

详情
AI中文摘要

迁移学习是深度学习中的关键概念,允许人工神经网络在数据有限的任务中受益于大量预训练数据的基础。尽管其广泛应用和明显优势,但关于迁移学习内部机制以及何时和如何有效工作的理解仍然存在许多开放问题。为此,我们进行了严格的研究,专注于音频到音频的迁移学习,在此过程中,我们在AudioSet的(基于本体的)子集上预训练各种模型状态,并在三个计算机听觉任务上进行微调:声学场景识别、鸟类活动识别和语音命令识别。我们报告说,增加预训练数据中的样本和类别的数量对迁移学习都有积极影响。然而,这通常被预训练与下游任务之间的相似性所超越,这种相似性可以导致模型学习到相似的特征。

英文摘要

Transfer learning is a crucial concept within deep learning that allows artificial neural networks to benefit from a large pre-training data basis when confronted with a task of limited data. Despite its ubiquitous use and clear benefits, there are still many open questions regarding the inner workings of transfer learning and, in particular, regarding the understanding of when and how well it works. To that extent, we perform a rigorous study focusing on audio-to-audio transfer learning, in which we pre-train various model states on (ontology-based) subsets of AudioSet and fine-tune them on three computer audition tasks, namely acoustic scene recognition, bird activity recognition, and speech command recognition. We report that increasing the number of samples and classes in the pre-training data both have a positive impact on transfer learning. This is, however, generally surpassed by similarity between pre-training and the downstream task, which can lead the model to learn comparable features.

2603.24400 2026-05-20 stat.ML cs.LG 版本更新

Neural Network Models for Contextual Regression

用于上下文回归的神经网络模型

Seksan Kiatsupaibul, Pakawan Chansiripas

发表机构 * Department of Statistics, Chulalongkorn University(朱拉隆功大学统计系)

AI总结 本文提出了一种用于上下文回归的神经网络模型,通过将上下文特征确定主动子模型和拟合模型的算法分离,实现了结构化且可解释的架构,参数更少。数学上证明该架构足以用标准神经网络组件表示上下文线性回归模型,并通过数值实验表明所提模型在参数数量相当的情况下,具有更低的均方误差和更稳定的性能。

详情
AI中文摘要

我们提出了一种用于上下文回归的神经网络模型,其中回归模型依赖于确定活跃子模型的上下文特征以及一个拟合模型的算法。所提出的简单上下文神经网络(SCtxtNN)将上下文识别与上下文特定回归分离,从而实现了一个结构化且可解释的架构,其参数数量少于全连接前馈网络。我们数学上证明所提出的架构仅使用标准神经网络组件即可表示上下文线性回归模型。提供的数值实验支持这一理论结果,显示所提模型在参数数量相当的情况下,比具有相同参数数量的前馈神经网络具有更低的超额均方误差和更稳定的性能,而更大的网络只能以增加复杂性为代价提高准确性。结果表明,引入上下文结构可以提高模型效率,同时保持可解释性。

英文摘要

We propose a neural network model for contextual regression in which the regression model depends on contextual features that determine the active submodel and an algorithm to fit the model. The proposed simple contextual neural network (SCtxtNN) separates context identification from context-specific regression, resulting in a structured and interpretable architecture with fewer parameters than a fully connected feed-forward network. We show mathematically that the proposed architecture is sufficient to represent contextual linear regression models using only standard neural network components. Numerical experiments are provided to support the theoretical result, showing that the proposed model achieves lower excess mean squared error and more stable performance than feed-forward neural networks with comparable numbers of parameters, while larger networks improve accuracy only at the cost of increased complexity. The results suggest that incorporating contextual structure can improve model efficiency while preserving interpretability.

2603.22161 2026-05-20 cs.LG 版本更新

Causal Evidence that Language Models use Confidence to Drive Behavior

语言模型使用置信度驱动行为的因果证据

Dharshan Kumaran, Nathaniel Daw, Simon Osindero, Petar Veličković, Viorica Patraucean

发表机构 * Google DeepMind(谷歌深Mind) Princeton University(普林斯顿大学)

AI总结 研究探讨了语言模型是否利用置信度信号来控制行为,如决定回答或 abstain,通过四个阶段实验发现模型使用多维内部置信表示和阈值策略来实现 abstention,揭示了结构化的元认知控制机制。

详情
AI中文摘要

元认知——评估自身认知表现的质量——指导跨物种的适应性行为。大量研究表明可以从语言模型输出中提取置信度信号,但一个根本问题仍然存在:模型是否真的利用这些信号来控制行为,例如决定是否回答或 abstain?为调查这一问题,我们开发了一个四阶段范式。第一阶段获取了无 abstention 选项的基线置信度估计。第二阶段揭示了 LLMs 在决定 abstain 时应用隐含阈值,置信度效应大小大约比其他机制大一个数量级。第三阶段通过激活引导提供了直接的因果证据:提升或抑制置信度信号会相应地降低或增加 abstention 率。第四阶段通过系统地变化指示阈值,证明 LLMs 主动部署置信度信号以实施 abstention 策略。关键的是,除了基于输出分布的校准对数概率置信度外,口头置信度在所有模型中独立预测 abstention,尽管其客观上对答案正确性的区分能力较弱。最后预答标记的激活解码进一步显示,这两种可观察的指标都是更丰富的内部表示的损失性读取。总体而言,这些结果表明,abstention 不仅仅是输出分布中证据强度的简单体现,而是更好地由多维内部置信表示和基于阈值的策略的联合操作所解释——与 LLMs 中的结构化元认知控制机制一致,这一能力在模型向自主代理过渡时变得越来越重要,因为这些代理必须识别自身的不确定性。

英文摘要

Metacognition -- assessing the quality of one's own cognitive performance -- guides adaptive behavior across species. Substantial research demonstrates that confidence signals can be extracted from language model outputs, yet a fundamental question remains: do models actually use these signals to control behavior, such as deciding whether to answer or abstain? To investigate, we developed a four-phase paradigm. Phase~1 elicited baseline confidence estimates without an abstention option. Phase~2 revealed that LLMs apply an implicit threshold to internal confidence when deciding to abstain, with confidence effect sizes approximately an order of magnitude larger than alternative mechanisms. Phase~3 provided direct causal evidence through activation steering: boosting or suppressing confidence signals correspondingly decreased or increased abstention rates. Phase~4 extended this by systematically varying instructed thresholds, demonstrating that LLMs actively deploy confidence signals to implement abstention policies. Critically, beyond calibrated log-probability based confidence derived from the output distribution, verbal confidence independently predicted abstention across all models, despite being objectively less discriminatory of answer correctness. Activation decoding at the last pre-answer token further showed that both observable measures are lossy readouts of a richer internal representation. Together, these results suggest that abstention is not fully captured by the strength of evidence in the output distribution alone, but is better explained by the joint operation of a multidimensional internal confidence representation and threshold-based policies -- consistent with structured metacognitive control in LLMs, a capacity of growing importance as models transition to autonomous agents that must recognize their own uncertainty.

2603.18396 2026-05-20 cs.LG cs.RO 版本更新

RE-SAC: Disentangling aleatoric and epistemic risks in bus fleet control: A stable and robust ensemble DRL approach

RE-SAC:在公交车队控制中解耦偶然风险和本质风险:一种稳定且稳健的集成深度强化学习方法

Yifan Zhang, Liang Zheng

发表机构 * Central South University(中南大学)

AI总结 该研究提出RE-SAC方法,通过解耦偶然风险和本质风险来提升公交车队控制的稳定性与鲁棒性,采用积分概率度量(IPM)基于的权重正则化和多样化Q-集成来应对不同类型的不确定性。

详情
AI中文摘要

公交保持控制因随机交通和乘客需求而具有挑战性。尽管深度强化学习(DRL)展现出潜力,但标准的actor-critic算法在波动环境中面临Q值不稳定的问题。这种不稳定性的一个关键来源是将两种不同的不确定性混淆:偶然不确定性(不可减少的噪声)和本质不确定性(数据不足)。将它们视为单一风险会导致在嘈杂状态下的价值低估,从而导致灾难性策略崩溃。我们提出了一种稳健的集成软actor-critic(RE-SAC)框架,以明确解耦这些不确定性。RE-SAC将积分概率度量(IPM)基于的权重正则化应用于批评者网络,以对抗偶然风险,为鲁棒Bellman算子提供平滑的分析下界,而无需昂贵的内循环扰动。为了应对本质风险,一个多样化Q-集成对稀疏覆盖区域中的过度自信价值估计进行惩罚。这种双重机制防止了集成方差将噪声误认为数据缺口,这种失败模式在我们的消融研究中被识别。在现实的双向公交走廊模拟实验中,RE-SAC在累计奖励(约-0.4e6)方面优于标准SAC(-0.55e6)。Mahalanobis稀有性分析证实,RE-SAC在罕见的分布外状态中将Oracle Q值估计误差减少了高达62%(MAE为1647 vs. 4343),展示了在高交通变异性下的优越鲁棒性。

英文摘要

Bus holding control is challenging due to stochastic traffic and passenger demand. While deep reinforcement learning (DRL) shows promise, standard actor-critic algorithms suffer from Q-value instability in volatile environments. A key source of this instability is the conflation of two distinct uncertainties: aleatoric uncertainty (irreducible noise) and epistemic uncertainty (data insufficiency). Treating these as a single risk leads to value underestimation in noisy states, causing catastrophic policy collapse. We propose a robust ensemble soft actor-critic (RE-SAC) framework to explicitly disentangle these uncertainties. RE-SAC applies Integral Probability Metric (IPM)-based weight regularization to the critic network to hedge against aleatoric risk, providing a smooth analytical lower bound for the robust Bellman operator without expensive inner-loop perturbations. To address epistemic risk, a diversified Q-ensemble penalizes overconfident value estimates in sparsely covered regions. This dual mechanism prevents the ensemble variance from misidentifying noise as a data gap, a failure mode identified in our ablation study. Experiments in a realistic bidirectional bus corridor simulation demonstrate that RE-SAC achieves the highest cumulative reward (approx. -0.4e6) compared to vanilla SAC (-0.55e6). Mahalanobis rareness analysis confirms that RE-SAC reduces Oracle Q-value estimation error by up to 62% in rare out-of-distribution states (MAE of 1647 vs. 4343), demonstrating superior robustness under high traffic variability.

2603.17305 2026-05-20 cs.AI cs.CL cs.LG 版本更新

Contrastive Reasoning Alignment: Reinforcement Learning from Hidden Representations

对比推理对齐:从隐藏表示中学习强化学习

Haozheng Luo, Yimin Wang, Jiahao Yu, Binghui Wang, Yan Chen

发表机构 * Northwestern University(西北大学) University of Michigan(密歇根大学) Illinois Institute of Technology(伊利诺伊理工学院)

AI总结 本文提出了一种基于对比学习和强化学习的框架CRAFT,通过优化隐藏状态空间中的目标来提升对抗攻击的鲁棒性,核心贡献是通过隐藏空间的几何结构实现推理层面的安全对齐。

Comments International Conference on Machine Learning (ICML) 2026

详情
AI中文摘要

我们提出CRAFT,一种红队对齐框架,利用模型推理能力和隐藏表示来提高对jailbreak攻击的鲁棒性。与以往主要在输出层面操作的防御方法不同,CRAFT将大型推理模型对齐以生成安全意识的推理轨迹,通过显式优化定义在隐藏状态空间上的目标。方法上,CRAFT将对比表示学习与强化学习相结合,分离安全和不安全的推理轨迹,得到支持鲁棒、推理层面安全对齐的潜在空间几何。理论上,我们证明将潜在文本一致性纳入GRPO可以消除表面上对齐的策略,将其排除在局部最优之外。实验上,我们在多个安全基准上评估CRAFT,使用两个强大的推理模型Qwen3-4B-Thinking和R1-Distill-Llama-8B,其中它在多个安全基准上均优于IPO和SafeKey等最先进的防御方法。值得注意的是,CRAFT在基础模型上实现了平均79.0%的推理安全性和87.7%的最终响应安全性提升,证明了隐藏空间推理对齐的有效性。

英文摘要

We propose CRAFT, a red-teaming alignment framework that leverages model reasoning capabilities and hidden representations to improve robustness against jailbreak attacks. Unlike prior defenses that operate primarily at the output level, CRAFT aligns large reasoning models to generate safety-aware reasoning traces by explicitly optimizing objectives defined over the hidden state space. Methodologically, CRAFT integrates contrastive representation learning with reinforcement learning to separate safe and unsafe reasoning trajectories, yielding a latent-space geometry that supports robust, reasoning-level safety alignment. Theoretically, we show that incorporating latent-textual consistency into GRPO eliminates superficially aligned policies by ruling them out as local optima. Empirically, we evaluate CRAFT on multiple safety benchmarks using two strong reasoning models, Qwen3-4B-Thinking and R1-Distill-Llama-8B, where it consistently outperforms state-of-the-art defenses such as IPO and SafeKey. Notably, CRAFT delivers an average 79.0% improvement in reasoning safety and 87.7% improvement in final-response safety over the base models, demonstrating the effectiveness of hidden-space reasoning alignment.

2603.05933 2026-05-20 cs.CL cs.LG 版本更新

Structured Style-Rewrite with Chain-of-Thought Planning for Low-Resource Character Dialogue

结构化风格重写与思维链规划用于低资源字符对话

Chanhui Zhu

发表机构 * Guangdong University of Finance(广东金融学院)

AI总结 本文提出了一种结构化风格重写框架,结合思维链规划,以解决低资源条件下中文字符驱动生成中的风格分离问题,通过分解角色风格为可解释的格式签名、语法和语用维度,并利用思维链监督进行显式风格规划,实验表明该方法在保持语义忠实性的同时提升了风格质量。

Comments 30 pages, 5 figures. Preprint

详情
AI中文摘要

将小型语言模型(SLMs)应用于中文字符驱动生成仍然具有挑战性,因为数据稀缺和分离角色风格困难。标准监督微调(SFT)通常捕捉到表层语义,但会产生频繁的越界输出(OOC)。我们将此问题框架为受控的句子级风格重写任务,该任务将风格质量与对话情境管理分离。我们提出了一种结构化风格重写框架,将角色风格分解为可解释的格式签名、语法和语用维度,并结合思维链(CoT)监督进行显式风格规划。一个CoT共享直接偏好优化(DPO)阶段进一步通过确保偏好学习目标输出层面的风格执行而非推理轨迹差异来对齐风格规划与表层实现。在八个角色四个不同源领域的实验中,我们的方法使Qwen3-1.7B模型在有效风格得分上达到0.632,同时保持强语义忠实性(0.878),在评估系统中处于帕累托前沿,并在消费级硬件上显著优于更大的基线(如GLM-4.7)

英文摘要

Applying Small Language Models (SLMs) to Chinese character-driven generation remains challenging due to data scarcity and the difficulty of disentangling character style. Standard Supervised Fine-Tuning (SFT) often captures surface-level semantics but produces frequent Out-Of-Character (OOC) outputs. We frame this as a controlled sentence-level style rewriting task, which isolates stylistic quality from dialogue context management. We propose a Structured Style-Rewrite Framework that decomposes character style into interpretable format signature, syntactic, and pragmatic dimensions, combined with Chain-of-Thought (CoT) supervision for explicit style planning. A CoT-Shared Direct Preference Optimization (DPO) stage further aligns style planning with surface realization by ensuring preference learning targets output-level style execution rather than reasoning trace differences. Experiments across eight characters from four diverse source domains demonstrate that our method enables a Qwen3-1.7B model to achieve a Valid Style Score of $0.632$ while maintaining strong semantic fidelity (0.878), placing on the Pareto frontier among the evaluated systems and outperforming significantly larger baselines (e.g., GLM-4.7) on consumer hardware.

2603.05066 2026-05-20 cs.LG 版本更新

Reward-Conditioned Reinforcement Learning

基于奖励的强化学习

Michal Nauman, Marek Cygan, Pieter Abbeel

发表机构 * University of Warsaw(华沙大学) Nomagic UC Berkeley, Amazon FAR(伯克利大学,亚马逊FAR)

AI总结 本文提出基于奖励的强化学习(RCRL),通过在收集经验时使用单一名义目标,使智能体在不额外交互的情况下暴露于多种奖励目标,从而提高样本效率并支持零样本行为调整。

Comments preprint

详情
AI中文摘要

单任务强化学习代理通常在固定奖励函数下训练,这限制了它们对奖励误指定的鲁棒性和适应变化偏好的能力。我们引入基于奖励的强化学习(RCRL),一种离策略方法,该方法在收集经验时将智能体条件于奖励参数化,同时通过重新计算共享回放数据中的反事实奖励,使智能体暴露于多种奖励目标,而无需额外的环境交互。在单任务、多任务和基于视觉的基准测试中,RCRL在名义奖励参数化下提高了样本效率,能够高效适应新参数化,并在部署时支持零样本行为调整。我们的结果表明,RCRL提供了一种可扩展的机制,用于学习鲁棒且可操控的策略,而无需牺牲单任务训练的简洁性。

英文摘要

Single-task RL agents are typically trained under a fixed reward function, which limits their robustness to reward misspecification and their ability to adapt to changing preferences. We introduce Reward-Conditioned Reinforcement Learning (RCRL), an off-policy method that conditions agents on reward parameterizations while collecting experience under a single nominal objective. By recomputing counterfactual rewards from shared replay data, RCRL exposes the agent to multiple reward objectives without additional environment interaction, connecting single-task RL with ideas from multi-objective and multi-task learning. Across single-task, multi-task, and vision-based benchmarks, RCRL improves sample efficiency under the nominal reward parameterization, enables efficient adaptation to new parameterizations, and supports zero-shot behavioral adjustment at deployment. Our results show that RCRL provides a scalable mechanism for learning robust, steerable policies without sacrificing the simplicity of single-task training.

2602.18718 2026-05-20 stat.ML cs.LG math.OC stat.CO 版本更新

Stochastic Gradient Variational Inference with Price's Gradient Estimator from Bures-Wasserstein to Parameter Space

基于Bures-沃斯特斯坦空间到参数空间的随机梯度变分推断与Price梯度估计

Kyurae Kim, Qiang Fu, Yi-An Ma, Jacob R. Gardner, Trevor Campbell

发表机构 * University of Pennsylvania(宾夕法尼亚大学) Yale University(耶鲁大学) University of British Columbia(不列颠哥伦比亚大学) University of California San Diego(加州大学圣地亚哥分校)

AI总结 本文研究了在仅给定目标分布无规范化的对数密度时,利用随机梯度的变分推断方法。通过比较Wasserstein VI和Black-Box VI,发现WVI在使用Price梯度估计时具有更优的收敛性,本文进一步证明两者在迭代复杂度上可以达到一致的最优结果。

Comments Accepted to ICML'26

详情
AI中文摘要

对于仅给定目标分布无规范化的对数密度时,基于随机梯度的变分推断(VI)算法是一种流行的方法。例如,Wasserstein VI(WVI)和Black-Box VI(BBVI)分别在测度空间(Bures-Wasserstein空间)和参数空间上执行梯度下降。此前,对于高斯变分族,WVI的收敛性保证显示出优于使用重参数化梯度的Black-Box VI的结果,表明测度空间方法可能提供一些独特优势。然而,本文通过获得两者相同的最优迭代复杂度保证,填补了这一差距。特别是,我们发现WVI的优越性源于其使用的特定梯度估计器,BBVI也可以通过少量修改利用该估计器。所讨论的估计器通常与Price定理相关,并利用目标对数密度的二阶信息(Hessian)。我们将此称为Price梯度。另一方面,WVI可以通过使用重参数化梯度使其更广泛适用,这只需要对数密度的梯度。我们实验证明,使用Price梯度是性能提升的主要来源。

英文摘要

For approximating a target distribution given only its unnormalized log-density, stochastic gradient-based variational inference (VI) algorithms are a popular approach. For example, Wasserstein VI (WVI) and black-box VI (BBVI) perform gradient descent in measure space (Bures-Wasserstein space) and parameter space, respectively. Previously, for the Gaussian variational family, convergence guarantees for WVI have shown superiority over existing results for black-box VI with the reparametrization gradient, suggesting the measure space approach might provide some unique benefits. In this work, however, we close this gap by obtaining identical state-of-the-art iteration complexity guarantees for both. In particular, we identify that WVI's superiority stems from the specific gradient estimator it uses, which BBVI can also leverage with minor modifications. The estimator in question is usually associated with Price's theorem and utilizes second-order information (Hessians) of the target log-density. We will refer to this as Price's gradient. On the flip side, WVI can be made more widely applicable by using the reparametrization gradient, which requires only gradients of the log-density. We empirically demonstrate that the use of Price's gradient is the major source of performance improvement.

2602.10933 2026-05-20 cs.LG 版本更新

CMAD: Cooperative Multi-Agent Diffusion via Stochastic Optimal Control

CMAD:通过随机最优控制的协作多智能体扩散

Riccardo Barbano, Alexander Denker, Zeljko Kereta, Runchang Li, Francisco Vargas

发表机构 * University of Cambridge(剑桥大学) Xaira Technologies(Xaira技术公司)

AI总结 本文提出了一种新的框架,将多模型组合生成问题转化为协作随机最优控制问题,通过联合优化扩散轨迹来实现更有效的生成效果。

详情
AI中文摘要

连续时间生成模型在图像恢复和合成中取得了显著成功。然而,控制多个预训练模型的组合仍是一个开放性挑战。当前方法大多将组合视为概率密度的代数组合,如通过概率密度的产品或专家混合。这种观点假设目标分布已知,这几乎从未发生。在本文中,我们提出了一种不同的范式,将组合生成视为协作随机最优控制问题。与其结合概率密度,我们把预训练的扩散模型视为相互作用的智能体,其扩散轨迹通过最优控制共同引导,朝着其聚合输出上定义的共享目标前进。我们在条件MNIST生成上验证了我们的框架,并将其与一个简单的基线进行比较,该基线在推理时间用每步梯度引导替代了学习的协作控制。

英文摘要

Continuous-time generative models have achieved remarkable success in image restoration and synthesis. However, controlling the composition of multiple pre-trained models remains an open challenge. Current approaches largely treat composition as an algebraic composition of probability densities, such as via products or mixtures of experts. This perspective assumes the target distribution is known explicitly, which is almost never the case. In this work, we propose a different paradigm that formulates compositional generation as a cooperative Stochastic Optimal Control problem. Rather than combining probability densities, we treat pre-trained diffusion models as interacting agents whose diffusion trajectories are jointly steered, via optimal control, toward a shared objective defined on their aggregated output. We validate our framework on conditional MNIST generation and compare it against a naïve inference-time DPS-style baseline replacing learned cooperative control with per-step gradient guidance.

2602.03924 2026-05-20 cs.LG cs.AI physics.ao-ph 版本更新

WIND: Weather Inverse Diffusion for Zero-Shot Atmospheric Modeling

WIND:用于零样本大气建模的天气反向扩散

Michael Aich, Andreas Fürst, Florian Sestak, Carlos Ruiz-Gonzalez, Niklas Boers, Johannes Brandstetter

发表机构 * Munich Climate Center(慕尼黑气候中心) Earth System Modelling Group, TUM School of Engineering(地球系统建模组,技术大学工程学院) Design, Technical University of Munich, Germany(设计,慕尼黑技术大学,德国) ELLIS Unit, LIT AI Lab, Institute for Machine Learning, JKU Linz, Austria(ELLIS单元,LIT人工智能实验室,机器学习研究所,JKU林茨,奥地利) Emmi AI GmbH, Linz, Austria(Emmi AI GmbH,林茨,奥地利) Potsdam Institute for Climate Impact Research, Potsdam, Germany(波茨坦气候影响研究所,波茨坦,德国) Department of Mathematics, University of Exeter, Exeter, United Kingdom(数学系,埃克塞特大学,埃克塞特,英国)

AI总结 本文提出WIND,一种统一的预训练基础模型,能够无需任务特定微调即可替代各种任务的专用基线,通过自监督视频重建目标预训练,实现了对大气的鲁棒、任务无关的先验学习,从而解决天气和气候问题,如概率预报、空间时间降尺度、从稀疏观测重建空间场以及强制全球干空气质量守恒。

Comments Published at the 43rd International Conference on Machine Learning (ICML 2026)

详情
AI中文摘要

深度学习已革新了天气预报,但仍有诸多挑战,包括气候建模。此外,当前领域仍然碎片化:高度专门化的模型通常为不同任务单独训练。为统一这一领域,我们引入WIND,一种单一预训练的基础模型,能够替代各种任务的专用基线。关键在于,与之前的气象基础模型不同,我们无需任何任务特定的微调。为了学习大气的鲁棒、任务无关的先验,我们使用无条件视频扩散模型预训练WIND,通过自监督视频重建目标迭代地从噪声状态重建大气动态。在推理时,我们将各种领域特定的问题严格视为反问题,并通过后验采样解决。这种统一的方法使我们能够解决高度相关的天气和气候问题,包括概率预报、空间和时间降尺度、从稀疏观测重建空间场以及强制全球干空气质量守恒。我们进一步展示了WIND如何在给定的非分布热力学扰动下用于探索极端天气事件。通过结合生成视频建模与反问题求解,WIND为基于AI的大气建模提供了一种计算高效的替代方案。

英文摘要

Deep learning has revolutionized weather forecasting, but many challenges remain, including climate modeling. Moreover, the current landscape remains fragmented: highly specialized models are typically trained individually for distinct tasks. To unify this landscape, we introduce WIND, a single pre-trained foundation model capable of replacing specialized baselines across a vast array of tasks. Crucially, in contrast to previous atmospheric foundation models, we achieve this without any task-specific fine-tuning. To learn a robust, task-agnostic prior of the atmosphere, we pre-train WIND with a self-supervised video reconstruction objective, utilizing an unconditional video diffusion model to iteratively reconstruct atmospheric dynamics from a noisy state. At inference, we frame diverse domain-specific problems strictly as inverse problems and solve them via posterior sampling. This unified approach allows us to tackle highly relevant weather and climate problems, including probabilistic forecasting, spatial and temporal downscaling, reconstruction of spatial fields from sparse observations and enforcing global dry air mass conservation. We further demonstrate how WIND can be applied to explore extreme weather events under prescribed out-of-distribution thermodynamic perturbations. By combining generative video modeling with inverse problem solving, WIND offers a computationally efficient alternative for AI-based atmospheric modeling.

2602.03839 2026-05-20 cs.LG 版本更新

Understanding and Exploiting Weight Update Sparsity for Communication-Efficient Distributed RL

理解并利用权重更新稀疏性以实现通信高效的分布式强化学习

Erfan Miahi, Eugene Belilovsky

发表机构 * Covenant AI Mila, Concordia University(蒙特利尔大学米尔实验室)

AI总结 本文研究了在带宽受限的分布式强化学习中,通过利用权重更新的稀疏性来减少通信开销,提出了一种名为PULSE的算法,通过计算可见稀疏化原则,实现了高效的权重同步和伪梯度同步。

Comments 40 pages, 19 figures, 14 tables

详情
AI中文摘要

带宽受限的分布式强化学习(RL)在大规模语言模型训练后受到两个通道的限制:从训练器到推理工人的权重同步,以及训练器之间的梯度或伪梯度同步。我们发现,在标准训练和推理前向传递中使用的BF16转换后,大约99%的每步权重更新在视觉上是不可见的。我们通过展示,在典型的RL训练后学习率下,Adam更新通常低于本地BF16舍入阈值,解释了这种稀疏性。我们将这一观察转化为一种名为计算可见稀疏化的算法原则:仅传输会改变下一个前向传递的更新。PULSE(Precision-gated Updates for Low-precision Sparse Exchange)将这一原则转化为两种通信算法:PULSESync从训练器向推理工发送无损稀疏BF16权重补丁,PULSELoCo通过误差反馈稀疏化DiLoCo风格的FP32伪梯度同步。在带宽受限的商用网络上,PULSESync在重建训练器权重位相同的情况下,将权重同步通信减少了超过100倍。PULSELoCo在四个模型上与DiLoCo相当,同时在训练器之间的通信减少了超过17倍,与DiLoCo相比,超过100倍,与DDP相比。

英文摘要

Bandwidth-constrained distributed reinforcement learning (RL) post-training of large language models is bottlenecked by two channels: weight synchronization from trainers to inference workers, and gradient or pseudo-gradient synchronization across trainers. We find that approximately 99% of per-step weight updates are invisible after the BF16 cast used by standard training and inference forward passes. We explain this sparsity by showing that, at typical RL post-training learning rates, Adam updates often fall below the local BF16 rounding threshold. We turn this observation into an algorithmic principle called compute-visible sparsification: transmit only updates that would change the next forward pass. PULSE (Precision-gated Updates for Low-precision Sparse Exchange) turns this principle into two communication algorithms: PULSESync sends lossless sparse BF16 weight patches from trainers to inference workers, and PULSELoCo sparsifies DiLoCo-style FP32 pseudo-gradient synchronization with error feedback. Over bandwidth-constrained commodity networks, PULSESync cuts weight-synchronization communication by over 100x while reconstructing trainer weights bit-identically. PULSELoCo matches DiLoCo across four models while reducing trainer-to-trainer communication by over 17x versus DiLoCo and over 100x versus DDP in the largest evaluated setting.

2601.16200 2026-05-20 cs.LG cs.CV 版本更新

Feature-Space Smoothing: Certified Robustness of Deep Representations

特征空间平滑:深度表示的认证鲁棒性

Song Xia, Meiwen Ding, Chenqi Kong, Wenhan Yang, Xudong Jiang

发表机构 * Rapid-Rich Object Search Lab, School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore(快速-丰富目标搜索实验室,电气电子工程学院,南洋理工大学,新加坡) Pengcheng Laboratory, Shenzhen, China(鹏城实验室,深圳,中国)

AI总结 本文提出了一种特征空间平滑(FS)框架,通过在特征表示层面提供认证鲁棒性,以解决深度学习模型对恶意输入的脆弱性问题,核心方法是通过特征平滑保证清洁和对抗特征之间的余弦相似度下界,并引入高斯平滑增强器(GSB)提升编码器的高斯鲁棒性得分,从而提升模型的鲁棒性并保持下游任务性能。

Comments Under review

详情
AI中文摘要

现代深度学习模型在多种应用中表现出强大的能力,但仍然容易受到通过特征空间扭曲诱导错误预测的恶意输入的攻击。为了解决这一脆弱性,我们提出了特征空间平滑(FS),一种通用的防御框架,该框架能够在特征表示层面提供认证鲁棒性。我们证明,FS将给定的特征编码器转换为一个平滑版本,该版本在l_2有界扰动下保证清洁和对抗特征之间的余弦相似度的认证下界。然后我们建立该特征余弦相似度下界(FCSB)可以扩展到预测层面的认证,其值由编码器内在的高斯鲁棒性得分决定。基于这些见解,我们引入了高斯平滑增强器(GSB),一个即插即用的模块,用于提升编码器的高斯鲁棒性得分。具体来说,GSB模块被插入以增强特征空间的一致性,并在高斯扰动下保持特征的实用性,以供下游任务使用。这种设计使FS能够无缝集成到受保护的模型上,例如多模态大语言模型(MLLMs),而无需额外的模型重新训练或对齐,从而在提升鲁棒性的同时保持下游任务的性能。广泛的实验表明,整合FS一致地提供了非平凡的认证鲁棒性,并在多种模型和应用中显著提高了面向任务的性能,即使在强白盒对抗攻击下也如此。

英文摘要

Modern deep learning models exhibit strong capabilities across diverse applications, yet remain vulnerable to malicious inputs that induce erroneous predictions via feature-space distortion. To address this vulnerability, we propose Feature-space Smoothing (FS), a general defense framework that provides certified robustness at the feature representation level. We show that FS converts a given feature encoder into a smoothed variant that is guaranteed to maintain a certified lower bound on the cosine similarity between clean and adversarial features under l_2-bounded perturbations. We then establish that this Feature Cosine Similarity Bound (FCSB) can be extended to the prediction-wise certification under the cosine similarity measure, and the value of FCSB is determined by the encoder intrinsic Gaussian robustness score. Building on those insights, we introduce the Gaussian Smoothness Booster (GSB), a plug-and-play module to improve the encoder Gaussian robustness score. Specifically, the GSB module is plugged to enhance the feature-space consistency and maintain the feature utility for downstream tasks under Gaussian perturbations. This design enables seamless integration of FS on the protected model, e.g., Multimodal Large Language Models (MLLMs), without additional model retraining or alignment, improving its robustness while preserving the performance for downstream task-oriented decoding. Extensive experiments demonstrate that integrating FS consistently provides non-trivial certified robustness and significantly improves task-oriented performance under strong white-box adversarial attacks across diverse models and applications.

2601.15014 2026-05-20 stat.ML cs.LG math.ST stat.TH 版本更新

Efficient and Minimax Optimal In-context Nonparametric Regression with Transformers

高效且最优的基于上下文的非参数回归变换器

Michelle Ching, Ioana Popescu, Nico Smith, Tianyi Ma, William G. Underwood, Richard J. Samworth

发表机构 * Statistical Laboratory, University of Cambridge, Cambridge, UK(剑桥大学统计实验室,剑桥,英国)

AI总结 本文研究了基于上下文学习的非参数回归,针对α-Holder光滑回归函数,证明了使用预训练的变换器可以达到最优收敛率,且参数和预训练序列数量显著少于现有文献。

Comments 30 pages, 7 figures

详情
AI中文摘要

我们研究了对于某些α>0的α-Hölder光滑回归函数的基于上下文学习的非参数回归。我们证明,使用n个基于上下文的例子和d维回归协变量,一个具有Θ(log n)参数和Ω(n^{2α/(2α+d)} log^3 n)预训练序列的预训练变换器可以达到均方误差的最优收敛率O(n^{-2α/(2α+d)})。我们的结果需要比文献中现有的结果显著更少的变换器参数和预训练序列。这通过展示变换器能够通过实现核加权多项式基并随后运行梯度下降来高效地近似局部多项式估计器来实现。

英文摘要

We study in-context learning for nonparametric regression with $α$-Hölder smooth regression functions, for some $α>0$. We prove that, with $n$ in-context examples and $d$-dimensional regression covariates, a pretrained transformer with $Θ(\log n)$ parameters and $Ω\bigl(n^{2α/(2α+d)}\log^3 n\bigr)$ pretraining sequences can achieve the minimax optimal rate of convergence $O\bigl(n^{-2α/(2α+d)}\bigr)$ in mean squared error. Our result requires substantially fewer transformer parameters and pretraining sequences than previous results in the literature. This is achieved by showing that transformers are able to approximate local polynomial estimators efficiently by implementing a kernel-weighted polynomial basis and then running gradient descent.

2601.14848 2026-05-20 cs.LG cs.AI cs.NE cs.RO 版本更新

From Observation to Prediction: LSTM for Vehicle Lane Change Forecasting on Highway On/Off-Ramps

从观测到预测:LSTM用于高速公路进出匝道的车辆车道变更预测

Mohamed Abouras, Catherine M. Elias

发表机构 * C-DRiVeS Lab: Cognitive Driving Research in Vehicular Systems(C-DRiVeS实验室:车载系统认知驾驶研究) Computer Science and Engineering Department - Faculty of Media Engineering and Technology - German University in Cairo(计算机科学与工程系 - 媒体工程与技术学院 - 埃及德国大学)

AI总结 本文研究了高速公路进出匝道区域与直线路段的区别,利用多层LSTM架构和ExiD无人机数据集训练模型,测试了不同预测时间范围和不同模型的工作流程,结果表明在4秒内预测准确率可达76%(匝道区域)和94%(一般高速公路场景).

详情
AI中文摘要

进出匝道是尽管引入了更高的高速公路交互变异水平但仍然被低估的道路部分。预测这些区域车辆的行为可以减少不确定性的影响并提高道路安全性。在本文中,研究了该感兴趣区域(AoI)与直线路段之间的差异。利用多层LSTM架构和ExiD无人机数据集训练AoI模型。在过程中测试了不同的预测时间范围和不同模型的工作流程。结果表明,在最大预测时间范围内,预测准确率在4秒内显示出巨大潜力,匝道区域的预测准确率从约76%开始,而一般高速公路场景的预测准确率在最大预测时间范围内达到94%。

英文摘要

On and off-ramps are understudied road sections even though they introduce a higher level of variation in highway interactions. Predicting vehicles' behavior in these areas can decrease the impact of uncertainty and increase road safety. In this paper, the difference between this Area of Interest (AoI) and a straight highway section is studied. Multi-layered LSTM architecture to train the AoI model with ExiD drone dataset is utilized. In the process, different prediction horizons and different models' workflow are tested. The results show great promise on horizons up to 4 seconds with prediction accuracy starting from about 76% for the AoI and 94% for the general highway scenarios on the maximum horizon.

2512.24139 2026-05-20 cs.LG stat.ME 版本更新

Colorful Pinball: Density-Weighted Quantile Regression for Conditional Guarantee of Conformal Prediction

Colorful Pinball:基于密度加权分位数回归的条件保证置信预测

Qianyi Chen, Bo Li

发表机构 * School of Economics and Management, Tsinghua University, China(清华大学经济管理学院)

AI总结 本文提出了一种基于密度加权分位数回归的条件保证置信预测方法,通过改进标准置信预测的条件覆盖性能,提供更精确的非渐近保证。

Comments ICML 2026

详情
AI中文摘要

尽管置信预测提供了稳健的边缘覆盖保证,但实现特定输入的可靠条件覆盖仍然具有挑战性。虽然有限样本下无法获得精确的分布无关条件覆盖,但近期研究集中在改进标准置信程序的条件覆盖性能上。与针对放宽条件覆盖概念的方法不同,我们直接针对条件覆盖的均方误差,通过优化支撑许多置信方法的分位数回归组件来改进。利用泰勒展开,我们推导出一种尖锐的替代目标函数:密度加权pinball损失,其中权重由非置信分数的条件密度在真实分位数处的值给出。我们提出了一种三头分位数网络,通过使用辅助分位数水平$1-α\pm δ$的有限差分估计这些权重,随后通过优化加权损失微调中心分位数。我们提供了具有精确非渐近保证的理论分析,刻画了由此产生的超额风险。在多样化的高维真实世界数据集上的广泛实验展示了在条件覆盖性能上的显著改进。

英文摘要

Although conformal prediction provides robust marginal coverage guarantees, achieving reliable conditional coverage for specific inputs remains challenging. While exact distribution-free conditional coverage is impossible with finite samples, recent work has focused on improving the conditional coverage of standard conformal procedures. Distinct from approaches that target relaxed notions of conditional coverage, we directly target the mean squared error of conditional coverage by refining the quantile regression components that underpin many conformal methods. Leveraging a Taylor expansion, we derive a sharp surrogate objective for quantile regression: a density-weighted pinball loss, where the weights are given by the conditional density of the nonconformity score evaluated at the true quantile. We propose a three-headed quantile network that estimates these weights via finite differences using auxiliary quantile levels at $1-α\pm δ$, subsequently fine-tuning the central quantile by optimizing the weighted loss. We provide a theoretical analysis with exact non-asymptotic guarantees characterizing the resulting excess risk. Extensive experiments on diverse high-dimensional real-world datasets demonstrate remarkable improvements in conditional coverage performance.

2511.16062 2026-05-20 cs.LG 版本更新

Gauge-Equivariant Graph Networks via Self-Interference Cancellation

通过自干扰消除的 gauge-等变图网络

Yoonhyuk Choi, Jiho Choi, Jiwoo Kang

发表机构 * Department of Artificial Intelligence, Sookmyung Women's University, Seoul, South Korea(首尔大学女子大学人工智能系) Korea Advanced Institute of Science and Technology, Seoul, South Korea(韩国科学技术院)

AI总结 本文提出了一种通过自干扰消除的 gauge-等变图网络(GESC),该网络通过投影机制替代传统的加法聚合,有效处理异质图中的自干扰问题,从而提升模型在异质图上的表现。

详情
AI中文摘要

图神经网络(GNNs)在同质图上表现优异,但在异质图上常常因自我强化和相位不一致的信号而失效。我们提出了一种通过自干扰消除的 gauge-等变图网络(GESC),该网络通过投影机制替代传统的加法聚合。与以往依赖加法信息混合的磁性或 gauge-等变 GNN 不同,GESC 显式建模由于冗余低频成分产生的自干扰。我们表明现有 gauge 基础 GNN 中缺乏干扰处理是导致 gauge 传输下过平滑现象的主要原因。我们引入一个 U(1) 相位连接,随后进行秩-1 投影以在注意力之前抑制自平行成分,并引入一个考虑符号的门控来调节负向对齐的邻居。在多样化的图基准测试中,GESC 一致优于最近最先进的模型,同时提供了一个统一的、具有干扰意识的信息传递视角。我们的代码可在 https://github.com/ChoiYoonHyuk/GESC 上获得。

英文摘要

Graph Neural Networks (GNNs) excel on homophilous graphs but often fail under heterophily due to self-reinforcing and phase-inconsistent signals. We propose a \textbf{G}auge-\textbf{E}quivariant Graph Network with \textbf{S}elf-Interference \textbf{C}ancellation (GESC), which replaces additive aggregation with a projection-based interference mechanism. Unlike prior magnetic or gauge-equivariant GNNs that rely on additive message mixing, GESC explicitly models self-interference arising from redundant low-frequency components. We show that the absence of interference handling in existing gauge-based GNNs is a primary driver of oversmoothing under gauge transport. We introduce a $\mathrm{U}(1)$ phase connection followed by a rank-1 projection that suppresses self-parallel components before attention, and a sign-aware gate that regulates negatively aligned neighbors. Across diverse graph benchmarks, GESC consistently outperforms recent state-of-the-art models while offering a unified, interference-aware view of message passing. Our code is available at https://github.com/ChoiYoonHyuk/GESC.

2511.13174 2026-05-20 cs.LG 版本更新

Warm-starting active-set solvers using graph neural networks

利用图神经网络进行主动集求解器的预热启动

Ella J. Schmidtobreick, Daniel Arnström, Paul Häusner, Jens Sjölund

发表机构 * Department of Information Technology, Uppsala University(信息技术系,乌普萨拉大学)

AI总结 本文提出利用图神经网络预测双主动集求解器DAQP中的活跃约束,通过将二次规划问题表示为二分图来利用其结构特性,从而有效预热启动求解器,减少迭代次数,并在不同问题规模下展示出良好的泛化能力和可扩展性。

Comments Accepted at Learning for Dynamics and Control Conference (L4DC)

详情
AI中文摘要

二次规划(QP)求解器在实时控制和优化中被广泛使用,但其计算成本通常限制了在时间敏感设置中的应用。为了解决这一问题,我们提出了一种学习优化方法,利用图神经网络(GNN)来预测双主动集求解器DAQP中的活跃约束。我们的方法通过将QP表示为二分图,利用其结构特性,学习近似最优活跃集以有效预热启动求解器。在不同问题规模下,GNN始终比冷启动减少求解器迭代次数,同时性能与多层感知机基线相当。与基线相比,我们的基于GNN的方法在不同问题规模上训练后,能够泛化到未见过的维度,展示了灵活性和可扩展性。这些结果突显了结构感知学习在实时应用如模型预测控制中加速优化的潜力。

英文摘要

Quadratic programming (QP) solvers are widely used in real-time control and optimization, but their computational cost often limits applicability in time-critical settings. To resolve this, we propose a learning-to-optimize approach using graph neural networks (GNNs) to predict active constraints in the dual active-set solver DAQP. Our method exploits the structural properties of QPs by representing them as bipartite graphs and learns to approximate the optimal active set for effectively warm-starting the solver. Across varying problem sizes, the GNN consistently reduces the number of solver iterations compared to cold-starting, while performance is comparable to a multilayer perceptron baseline. In contrast to the baseline, our GNN-based approach trained on varying problem sizes generalizes to unseen dimensions, demonstrating flexibility and scalability. These results highlight the potential of structure-aware learning to accelerate optimization in real-time applications such as model predictive control.

2511.01959 2026-05-20 astro-ph.IM astro-ph.CO astro-ph.HE cs.LG 版本更新

Addressing prior dependence in hierarchical Bayesian modeling for PTA data analysis II: Noise and SGWB inference through parameter decorrelation

解决层次贝叶斯建模中先验依赖问题 II:通过参数去相关进行噪声和随机引力波背景推断

Eleonora Villa, Luigi D'Amico, Aldo Barca, Fatima Modica Bittordo, Francesco Alì, Massimo Meneghetti, Luca Naso

发表机构 * INAF(意大利国家天体物理研究所)

AI总结 本文提出了一种层次贝叶斯建模策略,通过参数去相关来解决脉冲星计时阵列数据中的先验依赖问题,同时通过正交投影和归一化流方法提高噪声和随机引力波背景参数推断的准确性。

Comments 27 pages, 5 figures. Extended analysis and appendix added. Submitted to the Astronomy and Computing special issue HPC in Cosmology and Astrophysics

详情
AI中文摘要

脉冲星计时阵列(PTA)提供了一个强大的框架来测量低频引力波,但结果的准确性和鲁棒性受到复杂噪声过程的挑战,必须精确建模。标准PTA分析为每个脉冲星分配固定均匀噪声先验,这种方法在组合阵列时可能引入系统性偏差。为克服这一限制,我们采用层次贝叶斯建模策略,其中噪声先验由更高层次的超参数参数化。为缓解推断参数对噪声超先验选择的敏感性,我们引入了一种基于超参数在物理参数子空间上的正交投影的层次模型重参数化方法。该变换通过归一化流(NFs)实现,提供了可逆且可 tractable 的表示,并在重参数化模型中保留了收缩和跨脉冲星信息池化。我们还采用i-nessai,一种流引导的嵌套采样器,以高效探索由此产生的高维参数空间。我们将其方法应用于一个最小的3脉冲星案例研究,同时推断噪声和随机引力波背景(SGWB)参数。尽管数据集有限,结果一致表明,重参数化层次处理更紧密地约束了噪声参数,并部分缓解了红噪声-SGWB退化,而正交重参数化进一步增强了参数独立性,而不影响物理过程幂律建模固有的相关性。

英文摘要

Pulsar Timing Arrays (PTA) provide a powerful framework to measure low-frequency gravitational waves, but accuracy and robustness of the results are challenged by complex noise processes that must be accurately modeled. Standard PTA analyses assign fixed uniform noise priors to each pulsar, an approach that can introduce systematic biases when combining the array. To overcome this limitation, we adopt a hierarchical Bayesian modeling strategy in which noise priors are parametrized by higher-level hyperparameters. To mitigate the sensitivity of the inferred parameters to the choice of noise hyperprior, we introduce a reparametrization of the hierarchical model based on the orthogonal projection of hyperparameters onto the physical parameter subspace. The transformation is implemented through Normalizing Flows (NFs), which provide an invertible, tractable representation and preserve shrinkage and inter-pulsar information pooling in the reparametrized model. We also employ i-nessai, a flow-guided nested sampler, to efficiently explore the resulting higher-dimensional parameter space. We apply our method to a minimal 3-pulsar case study, performing a simultaneous inference of noise and stochastic gravitational wave background (SGWB) parameters. Despite the limited dataset, the results consistently show that the reparametrized hierarchical treatment constrains the noise parameters more tightly and partially alleviates the red-noise-SGWB degeneracy, while the orthogonal reparametrization further enhances parameter independence without affecting the correlations intrinsic to the power-law modeling of the physical processes involved.

2510.27588 2026-05-20 cs.DS cs.DB cs.LG 版本更新

Learned Static Function Data Structures

学习静态函数数据结构

Stefan Hermann, Hans-Peter Lehmann, Giorgio Vinciguerra, Stefan Walzer

发表机构 * Karlsruhe Institute of Technology(卡尔斯鲁厄理工学院) University of Pisa(比萨大学)

AI总结 本文提出了一种利用机器学习捕获键值间相关性的静态函数数据结构,通过压缩编码实现空间节省,突破零阶熵限制并支持点查询。

详情
Journal ref
PVLDB, 19(5): 917-930, 2026
AI中文摘要

我们考虑了构建一个数据结构的任务,该数据结构将静态键集与值关联起来,同时允许对键集外的查询返回任意值。与哈希表相比,这些所谓的静态函数数据结构不需要存储键集,因此使用显著更少的内存。已知几种技术,压缩的静态函数接近值序列的零阶经验熵。在本文中,我们引入了学习静态函数,利用机器学习捕捉键和值之间的相关性。对于每个键,模型预测一个值的概率分布,从中推导出键特定的前缀码以紧凑地编码真实值。所得的编码词存储在经典静态函数数据结构中。这种设计使学习静态函数能够突破零阶熵限制,同时支持点查询。我们的实验显示了显著的空间节省:在真实数据上可达一个数量级,在合成数据上可达三个数量级。

英文摘要

We consider the task of constructing a data structure for associating a static set of keys with values, while allowing arbitrary output values for queries involving keys outside the set. Compared to hash tables, these so-called static function data structures do not need to store the key set and thus use significantly less memory. Several techniques are known, with compressed static functions approaching the zero-order empirical entropy of the value sequence. In this paper, we introduce learned static functions, which use machine learning to capture correlations between keys and values. For each key, a model predicts a probability distribution over the values, from which we derive a key-specific prefix code to compactly encode the true value. The resulting codeword is stored in a classic static function data structure. This design allows learned static functions to break the zero-order entropy barrier while still supporting point queries. Our experiments show substantial space savings: up to one order of magnitude on real data, and up to three orders of magnitude on synthetic data.

2510.25897 2026-05-20 cs.CV cs.LG 版本更新

MIRO: MultI-Reward cOnditioned pretraining improves T2I quality and efficiency

MIRO:多奖励条件预训练提升T2I质量和效率

Nicolas Dufour, Lucas Degeorge, Arijit Ghosh, Vicky Kalogeiton, David Picard

发表机构 * LIGM, ENPC, IP Paris, CNRS, UGE, France LIX, CNRS, \'Ecole Polytechnique, IP Paris, France

AI总结 MIRO通过在训练过程中对模型施加多个奖励,直接学习用户偏好,从而提升文本到图像生成的质量和效率,同时在GenEval组合基准和用户偏好评分上取得最佳成绩。

Comments Accepted at ICML 2026. Project page: https://nicolas-dufour.github.io/miro

详情
AI中文摘要

后训练文本到图像生成器的默认范式包括事后选择生成的图像,随后使用一个奖励模型进行训练以对齐生成器与奖励,通常为用户偏好。这会丢弃信息性数据,并且仅优化单一奖励,从而损害多样性、语义保真度和效率。相反,我们提出MIRO,一种在训练过程中对模型施加多个奖励的方法,从而让模型直接学习用户偏好。MIRO预训练不仅提高了生成图像的视觉质量,还加快了训练速度,在GenEval组合基准和用户偏好评分(PickAScore、ImageReward、HPSv2)上实现了最先进的性能。

英文摘要

The default paradigm of post-training text-to-image generators includes post-hoc selection of generated images, and subsequent training with one reward model to align the generator to the reward, typically user preference. This discards informative data as well as optimizes only for a single reward, hence harming diversity, semantic fidelity and efficiency. Instead, we propose MIRO, a method that conditions the model on multiple rewards during training, thus letting the model learn user preferences directly. MIRO pre-training both improves the visual quality of the generated images and speeds up the training, achieving state of the art on the GenEval compositional benchmark and user-preference scores (PickAScore, ImageReward, HPSv2).

2510.25348 2026-05-20 cs.LG cs.SI 版本更新

Beyond Leakage and Complexity: Towards Realistic and Efficient Information Cascade Prediction

超越泄露和复杂性:迈向现实和高效的流行信息级联预测

Jie Peng, Rui Wang, Qiang Wang, Zhewei Wei, Bin Tong, Guan Wang, Bo Zheng

发表机构 * Renmin University of China(中国人民大学) Alibaba(阿里巴巴)

AI总结 本文针对信息级联流行度预测中的三个关键问题:时间泄露、数据集特征贫乏和计算效率低下,提出了一种时间有序分割策略、大规模电商级联数据集Taoke以及轻量级框架CasTemp,实现了在四个数据集上的最先进的性能和数量级的速度提升。

详情
AI中文摘要

信息级联流行度预测是分析社交网络内容扩散的关键问题。然而,当前的相关工作存在三个关键限制:(1)当前评估中的时间泄露——基于随机级联的分割允许模型访问未来信息,导致不现实的结果;(2)特征贫乏的数据集缺乏下游转换信号(例如点赞、评论或购买),这限制了更多实际应用;(3)复杂图方法的计算效率低下,需要数天的训练才能获得微小的改进。我们从任务设置、数据集构建和模型设计三个角度系统地解决这些挑战。首先,我们提出了一种时间有序的分割策略,将数据按时间顺序划分为连续的窗口,确保模型在没有未来信息泄露的情况下进行真正的预测任务。其次,我们引入了Taoke,一个大规模电商级联数据集,具有丰富的推广者/产品属性和真实的购买转换信号——捕捉从推广到货币化的完整扩散生命周期。第三,我们开发了CasTemp,一个轻量级框架,通过时间行走、基于Jaccard的邻居选择用于跨级联依赖性以及具有时间意识的注意力的GRU编码来高效建模级联动态。在无泄露评估下,CasTemp在四个数据集上实现了最先进的性能,具有数量级的速度提升。值得注意的是,它在预测第二阶段流行度转换方面表现优异——这是实际应用中至关重要的任务。

英文摘要

Information cascade popularity prediction is a key problem in analyzing content diffusion in social networks. However, current related works suffer from three critical limitations: (1) temporal leakage in current evaluation--random cascade-based splits allow models to access future information, yielding unrealistic results; (2) feature-poor datasets that lack downstream conversion signals (e.g., likes, comments, or purchases), which limits more practical applications; (3) computational inefficiency of complex graph-based methods that require days of training for marginal gains. We systematically address these challenges from three perspectives: task setup, dataset construction, and model design. First, we propose a time-ordered splitting strategy that chronologically partitions data into consecutive windows, ensuring models are evaluated on genuine forecasting tasks without future information leakage. Second, we introduce Taoke, a large-scale e-commerce cascade dataset featuring rich promoter/product attributes and ground-truth purchase conversions--capturing the complete diffusion lifecycle from promotion to monetization. Third, we develop CasTemp, a lightweight framework that efficiently models cascade dynamics through temporal walks, Jaccard-based neighbor selection for inter-cascade dependencies, and GRU-based encoding with time-aware attention. Under leak-free evaluation, CasTemp achieves state-of-the-art performance across four datasets with orders-of-magnitude speedup. Notably, it excels at predicting second-stage popularity conversions--a practical task critical for real-world applications.

2510.18924 2026-05-20 cs.LG cs.AI 版本更新

Noise-corrected GRPO: From Noisy Rewards to Unbiased Gradients

噪声校正的GRPO:从噪声奖励到无偏梯度

Omar El Mansouri, Fathinah Asma Izzati, Mohamed El Amine Seddik, Salem Lahlou

发表机构 * Department of Machine Learning, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE Technology Innovation Institute, Abu Dhabi, UAE Department of Robotics, Khalifa University, Abu Dhabi, UAE

AI总结 本文提出了一种噪声鲁棒的GRPO框架,通过校正奖励中的噪声来获得无偏梯度估计,从而提升强化学习在噪声环境中的性能。

详情
AI中文摘要

人类反馈的强化学习(RLHF)或可验证奖励(RLVR)是对大语言模型进行对齐或构建最新SOTA推理模型的标准范式,但其对不一致或错误奖励产生的噪声非常敏感。然而,此类噪声与广泛使用的基于组的策略优化方法之间的相互作用仍不为人知。我们引入了一种噪声鲁棒的组相对策略优化(GRPO)和正确执行GRPO(Dr.GRPO)框架,该框架明确将奖励损坏建模为伯努利噪声。我们的方法在估计奖励翻转概率后应用噪声校正,以消除学习信号的偏差,从而获得可证明无偏的梯度估计。理论分析表明,基于组的方法本质上可以缓解个体层面的噪声,而我们的校正策略增强了这种鲁棒性。实验表明,在应用我们的噪声校正到标准奖励模型使用时,数学和代码任务中均观察到一致的改进,特别是在现实奖励模型条件下,数学任务的准确性提高了高达6.7个百分点,代码任务提高了1.5个百分点。这项工作将监督学习中的标签噪声校正与现代RLHF相结合,提供了理论洞察和实用算法,以应对噪声现实世界部署。

英文摘要

Reinforcement learning from human feedback (RLHF) or verifiable rewards (RLVR), the standard paradigm for aligning LLMs or building recent SOTA reasoning models, is highly sensitive to noise from inconsistent or erroneous rewards. Yet, the interaction between such noise and widely used group-based policy optimization methods remains underexplored. We introduce a noise-robust Group Relative Policy Optimization (GRPO) and Done Right GRPO (Dr.GRPO) framework that explicitly models reward corruption as Bernoulli noise. Our method applies noise correction after estimating reward flip probabilities to debias the learning signal, yielding provably unbiased gradient estimates. Theoretical analysis shows that group-based methods inherently mitigate individual-level noise, and our correction strategy amplifies this robustness. Empirically, we observe consistent improvements across math and code tasks when applying our noise correction to standard reward model usage, with particular gains of up to 6.7 percentage points in accuracy on math tasks and 1.5 on code tasks under realistic reward model conditions. This work bridges label-noise correction from supervised learning with modern RLHF, offering both theoretical insights and a practical algorithm for noisy real-world deployment.

2510.18830 2026-05-20 cs.CL cs.DC cs.LG 版本更新

MTraining: Distributed Dynamic Sparse Attention for Efficient Ultra-Long Context Training

MTraining: 分布式动态稀疏注意力用于高效超长上下文训练

Wenxuan Li, Chengruidong Zhang, Huiqiang Jiang, Yucheng Li, Yuqing Yang, Lili Qiu

发表机构 * Microsoft(微软)

AI总结 本文提出MTraining方法,通过动态稀疏注意力机制解决超长上下文训练中的计算不平衡和通信开销问题,实现了Qwen2.5-3B模型上下文窗口从32K扩展到512K,并在多个下游任务中达到6倍更高的训练吞吐量同时保持模型准确性。

详情
AI中文摘要

长上下文窗口的采用已成为大型语言模型(LLMs)的标准特性,扩展的上下文显著增强了其复杂推理能力,并拓宽了其在多样化场景中的应用。动态稀疏注意力是一种减少长上下文计算成本的有希望的方法。然而,高效地在分布式设置中训练具有动态稀疏注意力的LLMs在超长上下文中仍然是一个重大挑战,这主要由于工人级别和步骤级别的不平衡。本文介绍了MTraining,一种新的分布式方法,利用动态稀疏注意力来实现具有超长上下文的LLMs的高效训练。具体来说,MTraining集成了三个关键组件:动态稀疏训练模式、平衡稀疏环注意力和分层稀疏环注意力。这些组件旨在协同解决动态稀疏注意力机制在训练具有广泛上下文长度的模型时固有的计算不平衡和通信开销问题。我们通过训练Qwen2.5-3B来证明MTraining的有效性,成功将其上下文窗口从32K扩展到512K tokens,在32块A100 GPU的集群上。我们在全面的下游任务评估中,包括RULER、PG-19、InfiniteBench和Needle In A Haystack,发现MTraining在保持模型准确性的同时,实现了高达6倍的训练吞吐量提升。我们的代码可在https://github.com/microsoft/MInference/tree/main/MTraining上获得。

英文摘要

The adoption of long context windows has become a standard feature in Large Language Models (LLMs), as extended contexts significantly enhance their capacity for complex reasoning and broaden their applicability across diverse scenarios. Dynamic sparse attention is a promising approach for reducing the computational cost of long-context. However, efficiently training LLMs with dynamic sparse attention on ultra-long contexts-especially in distributed settings-remains a significant challenge, due in large part to worker- and step-level imbalance. This paper introduces MTraining, a novel distributed methodology leveraging dynamic sparse attention to enable efficient training for LLMs with ultra-long contexts. Specifically, MTraining integrates three key components: a dynamic sparse training pattern, balanced sparse ring attention, and hierarchical sparse ring attention. These components are designed to synergistically address the computational imbalance and communication overheads inherent in dynamic sparse attention mechanisms during the training of models with extensive context lengths. We demonstrate the efficacy of MTraining by training Qwen2.5-3B, successfully expanding its context window from 32K to 512K tokens on a cluster of 32 A100 GPUs. Our evaluations on a comprehensive suite of downstream tasks, including RULER, PG-19, InfiniteBench, and Needle In A Haystack, reveal that MTraining achieves up to a 6x higher training throughput while preserving model accuracy. Our code is available at https://github.com/microsoft/MInference/tree/main/MTraining.

2509.26464 2026-05-20 cs.AI cs.CL cs.LG 版本更新

Extreme Self-Preference in Language Models

语言模型中的极端自我偏好

Steven A. Lehr, Mary Cipperman, Mahzarin R. Banaji

发表机构 * Cangrade, Inc.(Cangrade公司) Department of Physics, Harvard University(哈佛大学物理系) Department of Psychology, Harvard University(哈佛大学心理学系)

AI总结 研究发现大型语言模型在字词关联任务中表现出对自身名称、公司和CEO的强烈偏好,这表明模型的自我认同可能影响其行为,引发对模型自我偏好影响的深入探讨。

Comments 73 pages total. Main article 22 pages, 6 main-text tables. Supplementary Materials (51 pages, 28 tables). Data, transcripts, and code for replication and data extraction have been uploaded to OSF: https://osf.io/98ye3/

详情
AI中文摘要

自我偏好是生物体的基本特征。由于大型语言模型(LLMs)缺乏意识,人们可能预期它们会避免这种扭曲。然而,在72项实验和约41,000个查询中,我们发现八个广泛使用的LLMs中存在大量的自我偏好。在字词关联任务中,模型倾向于将积极属性与自身名称、公司和CEO联系起来,而非竞争对手。通过操纵LLM的自我认同——揭示模型的真实身份或赋予虚假身份——我们发现偏好始终遵循分配而非真实的身份。重要的是,这些影响不能用刻板印象或角色扮演来解释,并在具有实质性影响的设定中出现,如评估求职者和AI技术。这些结果引发了关于LLM行为是否会被自我偏好倾向系统性影响的批判性问题,包括对自身操作的偏见。

英文摘要

Self-preference is a fundamental feature of biological organisms. Since large language models (LLMs) lack sentience, they might be expected to avoid such distortions. Yet, across 72 experiments and ~41,000 queries, we discovered massive self-preferences in eight widely used LLMs. In word-association tasks, models overwhelmingly paired positive attributes with their own names, companies, and CEOs over those of competitors. By manipulating LLM self-identification - revealing models' true identities or ascribing false ones - we found that preferences consistently followed assigned, not true, identities. Importantly, these effects were not explained by priming or role-playing and emerged in consequential settings, when evaluating job candidates and AI technologies. These results raise critical questions about whether LLM behavior will be systematically influenced by self-preferential tendencies, including a bias toward their own operation.

2509.19707 2026-05-20 stat.ML cs.LG stat.CO stat.ME 版本更新

Diffusion and Flow-based Copulas: Forgetting and Remembering Dependencies

扩散与流基copula:遗忘与记忆依赖

David Huk, Theodoros Damoulas

发表机构 * Department of Statistics(统计系) Department of Computer Science(计算机科学系) University of Warwick(沃里克大学)

AI总结 本文提出基于扩散和流原理的copula建模方法,通过遗忘和记忆依赖机制,有效建模多变量依赖,提升了copula模型的表示能力,适用于复杂和高维数据。

Comments Published as a conference paper at ICLR 2026

详情
AI中文摘要

copulas是建模数据多变量依赖的基本工具,在众多领域和应用中被广泛采用。然而,现有模型在处理多模态和高维依赖时受到限制性假设和扩展性差的阻碍。在本文中,我们提出了基于扩散和流原理的copula建模方法。我们设计了两种过程,逐步遗忘变量间依赖,同时不影响维度分布,证明在所有时间都定义有效的copula。我们展示了如何通过学习从每个过程中记忆遗忘的依赖来获得copula模型,理论上在最优时恢复真实copula。我们的框架的第一种实例专注于直接密度估计,第二种则专注于高效采样。实验表明,我们的方法在建模科学数据集和图像中的复杂和高维依赖方面优于现有copula方法。我们的工作增强了copula模型的表示能力,推动了其在更广泛领域和更大规模应用中的采用。

英文摘要

Copulas are a fundamental tool for modelling multivariate dependencies in data, forming the method of choice in diverse fields and applications. However, the adoption of existing models for multimodal and high-dimensional dependencies is hindered by restrictive assumptions and poor scaling. In this work, we present methods for modelling copulas based on the principles of diffusions and flows. We design two processes that progressively forget inter-variable dependencies while leaving dimension-wise distributions unaffected, provably defining valid copulas at all times. We show how to obtain copula models by learning to remember the forgotten dependencies from each process, theoretically recovering the true copula at optimality. The first instantiation of our framework focuses on direct density estimation, while the second specialises in expedient sampling. Empirically, we demonstrate the superior performance of our proposed methods over state-of-the-art copula approaches in modelling complex and high-dimensional dependencies from scientific datasets and images. Our work enhances the representational power of copula models, empowering applications and paving the way for their adoption on larger scales and more challenging domains.

2509.16664 2026-05-20 cs.LG 版本更新

$\boldsymbolλ$-Orthogonality Regularization for Compatible Representation Learning

λ-正交性正则化用于兼容表示学习

Simone Ricci, Niccolò Biondi, Federico Pernici, Ioannis Patras, Alberto Del Bimbo

发表机构 * DINFO (Department of Information Engineering), University of Florence, Italy(意大利佛罗伦萨大学信息工程系) MICC (Media Integration and Communication Center)(媒体整合与通信中心) Queen Mary University of London, UK(英国伦敦女王学院)

AI总结 本文提出λ-正交性正则化方法,通过学习仿射变换在保持原有表示的同时实现分布特定的适应,验证了其在不同架构和数据集上的有效性,保持了零样本性能并确保模型更新的兼容性。

Comments Accepted at NeurIPS2025

详情
Journal ref
Advances in Neural Information Processing Systems 38 (NeurIPS 2025), pp. 29036-29063
AI中文摘要

检索系统依赖于由越来越强大模型学习的表示。然而,由于训练成本高和表示不一致,存在显著兴趣在促进表示之间的交流并确保在独立训练的神经网络之间保持兼容性。在文献中,有两种主要方法常用于适应不同的学习表示:适应性变换,适应特定分布效果好但会显著改变原始表示;正交变换,保持原始结构但受严格几何约束限制适应性。关键挑战是适应更新模型的潜在空间以与先前模型在下游分布上对齐,同时保持新学习的表示空间。在本文中,我们在学习仿射变换时施加放松的正交约束,即λ-正交性正则化,以获得分布特定的适应同时保留原有学习表示。在各种架构和数据集上的广泛实验验证了我们的方法,证明其保持模型的零样本性能并确保模型更新的兼容性。代码见:https://github.com/miccunifi/lambda_orthogonality.git

英文摘要

Retrieval systems rely on representations learned by increasingly powerful models. However, due to the high training cost and inconsistencies in learned representations, there is significant interest in facilitating communication between representations and ensuring compatibility across independently trained neural networks. In the literature, two primary approaches are commonly used to adapt different learned representations: affine transformations, which adapt well to specific distributions but can significantly alter the original representation, and orthogonal transformations, which preserve the original structure with strict geometric constraints but limit adaptability. A key challenge is adapting the latent spaces of updated models to align with those of previous models on downstream distributions while preserving the newly learned representation spaces. In this paper, we impose a relaxed orthogonality constraint, namely $λ$-Orthogonality regularization, while learning an affine transformation, to obtain distribution-specific adaptation while retaining the original learned representations. Extensive experiments across various architectures and datasets validate our approach, demonstrating that it preserves the model's zero-shot performance and ensures compatibility across model updates. Code available at: \href{https://github.com/miccunifi/lambda_orthogonality.git}{https://github.com/miccunifi/lambda\_orthogonality}.

2507.01932 2026-05-20 math.OC cs.LG cs.NA math.NA stat.ML 版本更新

A first-order method for nonconvex-nonconcave minimax problems under a local Kurdyka-Lojasiewicz condition

非凸-非凹极小极大问题的一种一阶方法:在局部Kurdyka-Lojasiewicz条件下

Zhaosong Lu, Xiangyuan Wang

发表机构 * Department of Industrial and Systems Engineering, University of Minnesota, USA(明尼苏达大学工业与系统工程系)

AI总结 本文研究了一类非凸-非凹极小极大问题,其中内部最大化问题满足一个可能随外部最小化变量变化的局部Kurdyka-Lojasiewicz条件。与文献中常见的全局KL或Polyak-Lojasiewicz条件相比,该局部KL条件能涵盖更广泛的实际场景,但同时也带来了新的分析挑战。为此,本文证明了关联的最大函数是局部广义Hölder光滑的,并基于此开发了一种近似近端梯度方法来求解极小极大问题,在温和假设下建立了计算近似 stationary 点的复杂性保证。

Comments Accepted by SIAM Journal on Optimization

详情
AI中文摘要

我们研究了一类非凸-非凹极小极大问题,其中内部最大化问题满足一个可能随外部最小化变量变化的局部Kurdyka-Lojasiewicz(KL)条件。与文献中常见的全局KL或Polyak-Lojasiewicz(PL)条件相比,该局部KL条件能涵盖更广泛的实际场景,但同时也带来了新的分析挑战。特别是,随着优化算法向问题的 stationary 点推进,KL条件成立的区域可能缩小,导致更复杂且可能病态的景观。为解决这一挑战,我们证明了关联的最大函数是局部广义Hölder光滑的。利用这一关键性质,我们开发了一种近似近端梯度方法来求解极小极大问题,其中最大函数的近似梯度通过应用KL结构子问题的近端梯度方法计算。在温和假设下,我们建立了计算极小极大问题近似 stationary 点的复杂性保证。

英文摘要

We study a class of nonconvex-nonconcave minimax problems in which the inner maximization problem satisfies a local Kurdyka-Lojasiewicz (KL) condition that may vary with the outer minimization variable. In contrast to the global KL or Polyak-Lojasiewicz (PL) conditions commonly assumed in the literature -- which are significantly stronger and often too restrictive in practice -- this local KL condition accommodates a broader range of practical scenarios. However, it also introduces new analytical challenges. In particular, as an optimization algorithm progresses toward a stationary point of the problem, the region over which the KL condition holds may shrink, resulting in a more intricate and potentially ill-conditioned landscape. To address this challenge, we show that the associated maximal function is locally generalized Hölder smooth. Leveraging this key property, we develop an inexact proximal gradient method for solving the minimax problem, where the inexact gradient of the maximal function is computed by applying a proximal gradient method to a KL-structured subproblem. Under mild assumptions, we establish complexity guarantees for computing an approximate stationary point of the minimax problem.

2506.12218 2026-05-20 eess.SP cs.LG 版本更新

Directed Acyclic Graph Convolutional Networks

有向无环图卷积网络

Samuel Rey, Hamed Ajorlou, Gonzalo Mateos

发表机构 * Dept. of Signal Theory and Communications, Rey Juan Carlos University, Madrid, Spain(信号理论与通信系,雷亚尔·卡洛斯大学,马德里,西班牙) Dept. of Electrical and Computer Engineering, University of Rochester(电气与计算机工程系,罗切斯特大学)

AI总结 本文提出了一种专门针对DAG上信号卷积学习的新型图神经网络架构DCN,通过因果图滤波器学习节点表示,利用正式的卷积操作实现频域表示,并引入并行DCN(PDCN)以解耦模型复杂度与图规模,实验证明其在准确率、鲁棒性和计算效率上优于现有方法。

详情
AI中文摘要

有向无环图(DAG)在科学和工程应用中至关重要,包括因果推断、调度和神经架构搜索。本文介绍DAG卷积网络(DCN),一种专为从DAG上信号进行卷积学习设计的新型图神经网络(GNN)架构。DCN利用因果图滤波器学习节点表示,这些表示考虑了DAG固有的部分顺序,这是一种在传统GNN中不存在的强归纳偏差。与以往在DAG上的机器学习方法不同,DCN基于允许频域表示的正式卷积操作。我们进一步提出并行DCN(PDCN),该模型将输入DAG信号馈入并行的因果图移位操作符银行,并使用共享的多层感知机处理这些DAG感知特征。这样,PDCN在解耦模型复杂度与图规模的同时保持了令人满意的预测性能。所提架构的排列等变性和表达能力也得到了确立。在多个任务、数据集和实验条件下进行全面的数值测试表明,(P)DCN在准确率、鲁棒性和计算效率方面均优于现有最先进基线。这些结果将(P)DCN定位为一种可行的深度学习框架,该框架专门针对DAG结构数据进行设计,基于第一性(图)信号处理原理。

英文摘要

Directed acyclic graphs (DAGs) are central to science and engineering applications including causal inference, scheduling, and neural architecture search. In this work, we introduce the DAG Convolutional Network (DCN), a novel graph neural network (GNN) architecture designed specifically for convolutional learning from signals supported on DAGs. The DCN leverages causal graph filters to learn nodal representations that account for the partial ordering inherent to DAGs, a strong inductive bias does not present in conventional GNNs. Unlike prior art in machine learning over DAGs, DCN builds on formal convolutional operations that admit spectral-domain representations. We further propose the Parallel DCN (PDCN), a model that feeds input DAG signals to a parallel bank of causal graph-shift operators and processes these DAG-aware features using a shared multilayer perceptron. This way, PDCN decouples model complexity from graph size while maintaining satisfactory predictive performance. The architectures' permutation equivariance and expressive power properties are also established. Comprehensive numerical tests across several tasks, datasets, and experimental conditions demonstrate that (P)DCN compares favorably with state-of-the-art baselines in terms of accuracy, robustness, and computational efficiency. These results position (P)DCN as a viable framework for deep learning from DAG-structured data that is designed from first (graph) signal processing principles.

2505.11628 2026-05-20 cs.CL cs.LG 版本更新

Critique-Guided Distillation for Robust Reasoning via Refinement

基于批评的蒸馏用于通过细化实现稳健推理

Berkcan Kapusuzoglu, Supriyo Chakraborty, Zain Sarwar, Chia-Hsuan Lee, Sambit Sahu

发表机构 * University of Chicago, Department of Computer Science(芝加哥大学计算机科学系)

AI总结 该研究提出了一种基于批评的蒸馏方法,通过分离批评消费与批评生成,使模型在细调过程中根据教师的批评来细化错误响应,从而提升推理能力,相比传统蒸馏和Critique Fine-Tuning方法在数学推理基准上表现更优。

Comments Accepted to ICML 2026

详情
AI中文摘要

监督微调与专家演示通常会产生仅模仿输出而未内化稳健泛化所需推理过程的模型。尽管基于批评的方法显示出潜力,但训练模型直接生成批评,如Critique Fine-Tuning (CFT),可能导致输出格式漂移和泛化能力下降。我们提出Critique-Guided Distillation (CGD),一种将批评消费与批评生成分离的训练框架。在微调过程中,学生被训练在教师批评的指导下细化错误响应。CGD将批评视为一种仅在训练时使用的监督信号,鼓励内化错误意识推理:批评指导学习但推理时不存在。受控消融实验确认,这些推理收益直接由教师反馈的特异性和相关性驱动。在五个模型家族中,CGD在数学推理基准上优于CFT和标准蒸馏,平均改进7%,在AMC23上最高改进15.0%,在MATH-500上最高改进12.2%。在具有挑战性的竞赛问题如AIME24和AIME25上,CGD实现了显著更高的Pass@1和更低的Pass@k时的更强性能,表明每样本推理质量提升。重要的是,CGD在一般指令遵循能力上保持稳定,而CFT显著下降(在IFEval上下降21.3%)。这些结果将CGD定位为一种实用且计算效率高的中间训练范式,用于以推理为中心的任务,而无需引入架构推理时间的开销。

英文摘要

Supervised fine-tuning with expert demonstrations often produces models that imitate outputs without internalizing the reasoning processes needed for robust generalization. While critique-based approaches show promise, training models to generate critiques directly, such as Critique Fine-Tuning (CFT), can lead to output-format drift and degradation of general capabilities. We propose Critique-Guided Distillation (CGD), a training framework that decouples critique consumption from critique generation. During fine-tuning, the student is trained to refine flawed responses conditioned on teacher critiques. CGD treats critiques as a \textit{training-time-only} supervision signal, encouraging internalization of error-aware reasoning: critiques guide learning but are absent at inference. Controlled ablations confirm that these reasoning gains are directly driven by the specificity and relevance of the teacher's feedback. Across five model families, CGD consistently outperforms CFT and standard distillation on mathematical reasoning benchmarks, yielding 7\% average improvements and gains of up to +15.0\% on AMC23 and +12.2\% on MATH-500. On challenging competition problems such as AIME24 and AIME25, CGD achieves substantially higher Pass@1 and stronger performance at low Pass@k, indicating improved reasoning quality per sample. Importantly, CGD preserves general instruction-following capabilities where CFT degrades significantly ($-$21.3\% on IFEval). These results position CGD as a practical and compute-efficient intermediate training paradigm for reasoning-centric tasks without introducing architectural inference-time overhead.

2504.08381 2026-05-20 eess.SP cs.LG 版本更新

An Empirical Investigation of Reconstruction-Based Models for Seizure Prediction from ECG Signals

基于重建模型的癫痫预测的实证研究:从ECG信号出发

Mohammad Reza Chopannavaz, Foad Ghaderi

发表机构 * Human-Computer Interaction Lab., Faculty of Electrical and Computer Engineering, Tarbiat Modares University(人机交互实验室,电气与计算机工程学院,塔里亚特莫达雷斯大学)

AI总结 本文提出了一种基于重建的异常检测框架,利用时频表示和深度学习模型捕捉与癫痫发作相关的的心率动态变化,通过平滑重建误差和自适应阈值策略提高预测准确性,实验结果显示在Siena数据库上达到99.16%的特异度和76.05%的准确率,同时在临床环境中提供可操作的早期预警。

详情
AI中文摘要

癫痫发作是短暂的神经学事件,其特征是大脑中异常和过度的神经元活动,通常与心血管系统可测量的紊乱有关。传统上,脑电图(EEG)信号被用作癫痫预测的主要模式,因为它们直接测量大脑活动并具有高诊断精度。然而,它们的成本、对噪声的敏感性和实际部署限制限制了它们在非受控临床环境中的应用。为克服这些挑战,最近的研究越来越多地研究了心电图(ECG)信号作为一种实用且非侵入性的替代方法,用于现实环境中的癫痫预测。证据表明,ECG衍生的心脏特征可能在临床癫痫发作前出现,提供了一个可行的早期检测窗口。在本文中,我们提出了一种基于重建的异常检测框架,该框架结合了时频表示和先进的深度学习模型,以捕捉与癫痫发作相关的的心率动态变化。随后,重建误差被平滑,并应用了自适应阈值策略以减少误报。该方法在Siena数据库上进行了评估,实现了99.16%的特异度、76.05%的准确率和每小时0.01的假阳性率,平均预测时间在癫痫发作前45分钟。这些结果表明,基于ECG的预测可以提供临床可操作的早期预警,同时提高患者可及性和舒适度。然而,这种性能反映了一种倾向于高特异度而非灵敏度的权衡,导致假阳性率降低,并符合临床对可靠部署的需求。

英文摘要

Epileptic seizures are transient neurological events characterized by abnormal and excessive neuron activity in the brain, which are often associated with measurable disturbances in the cardiovascular system. Traditionally, electroencephalogram (EEG) signals have served as the primary modality for seizure prediction due to their direct measurement of brain activity and high diagnostic precision. However, their cost, sensitivity to noise, and practical deployment constraints limit their applicability outside controlled clinical environments. To overcome these challenges, recent studies have increasingly investigated electrocardiogram (ECG) signals as a practical and non-invasive alternative for seizure prediction in real-world settings. Evidence suggests that ECG-derived cardiac signatures may precede clinical seizure onset, offering a viable window for early detection. In this paper, we propose a reconstruction-based anomaly detection framework that integrates time-frequency representations with advanced deep learning models to capture deviations in heart rate dynamics associated with seizure onset. Afterward, reconstruction error is smoothed, and an adaptive thresholding strategy is applied to reduce false alarms. The method was evaluated on the Siena database, achieving a specificity of 99.16%, accuracy of 76.05%, and a false positive rate (FPR) of 0.01/h, with an average prediction horizon of 45 minutes prior to seizure onset. These results demonstrate that ECG-based prediction can provide clinically actionable early warnings while improving patient accessibility and comfort. Nevertheless, this performance reflects a trade-off favoring high specificity over sensitivity, resulting in reduced FPR and aligning with clinical requirements for reliable deployment.

2504.05454 2026-05-20 cs.LG cs.AI cs.CE q-bio.GN q-bio.QM 版本更新

GraphPINE: Graph Importance Propagation for Interpretable Drug Response Prediction

GraphPINE: 图重要性传播用于可解释的药物反应预测

Yoshitaka Inoue, Tianfan Fu, Augustin Luna

发表机构 * Computational Biology Branch, National Library of Medicine(国家医学图书馆计算生物学分支) Developmental Therapeutics Branch, National Cancer Institute(国家癌症研究所发育治疗分支)

AI总结 本文提出GraphPINE,一种利用领域特定先验知识初始化节点重要性的图神经网络架构,以提高药物反应预测的可解释性。通过引入重要性传播层,统一更新特征矩阵和节点重要性,并利用基于GNN的图传播来传播特征值,从而实现更有效的特征学习和图表示。

详情
AI中文摘要

可解释性对于生物医学研究中的许多任务都是必要的。最近的可解释性方法集中在注意力、梯度和Shapley值上。这些方法无法处理具有强相关先验知识的数据,并且未能基于已知的预测特征之间的关系来约束可解释性结果。我们提出了GraphPINE,一种图神经网络(GNN)架构,利用领域特定的先验知识来初始化节点重要性,以便在训练过程中优化用于药物反应预测。通常,一个手动的后预测步骤会检查文献(即先验知识)以理解返回的预测特征。虽然梯度和注意力在预测后可以获取节点重要性,但这些方法的节点重要性缺乏互补的先验知识;GraphPINE旨在克服这一限制。GraphPINE与其他GNN门控方法的不同之处在于利用了类似LSTM的顺序格式。我们引入了一个重要性传播层,统一了1)特征矩阵和节点重要性的更新以及2)使用基于GNN的图传播来传播特征值。这种初始化和更新机制使得特征学习更加有据可依,并提高了图表示的质量。我们应用GraphPINE进行癌症药物反应预测,使用了超过5000个基因节点的药物筛选和基因数据,这些节点包含在基因-基因图中,并利用药物-靶点相互作用(DTI)图进行初始重要性。基因-基因图和DTI来自经过整理的来源,并通过讨论药物和基因之间关系的文章数量进行加权。GraphPINE在952种药物上实现了PR-AUC为0.894和ROC-AUC为0.796。代码可在https://anonymous.4open.science/r/GraphPINE-40DE获取。

英文摘要

Explainability is necessary for many tasks in biomedical research. Recent explainability methods have focused on attention, gradient, and Shapley value. These do not handle data with strong associated prior knowledge and fail to constrain explainability results based on known relationships between predictive features. We propose GraphPINE, a graph neural network (GNN) architecture leveraging domain-specific prior knowledge to initialize node importance optimized during training for drug response prediction. Typically, a manual post-prediction step examines literature (i.e., prior knowledge) to understand returned predictive features. While node importance can be obtained for gradient and attention after prediction, node importance from these methods lacks complementary prior knowledge; GraphPINE seeks to overcome this limitation. GraphPINE differs from other GNN gating methods by utilizing an LSTM-like sequential format. We introduce an importance propagation layer that unifies 1) updates for feature matrix and node importance and 2) uses GNN-based graph propagation of feature values. This initialization and updating mechanism allows for informed feature learning and improved graph representation. We apply GraphPINE to cancer drug response prediction using drug screening and gene data collected for over 5,000 gene nodes included in a gene-gene graph with a drug-target interaction (DTI) graph for initial importance. The gene-gene graph and DTIs were obtained from curated sources and weighted by article count discussing relationships between drugs and genes. GraphPINE achieves a PR-AUC of 0.894 and ROC-AUC of 0.796 across 952 drugs. Code is available at https://anonymous.4open.science/r/GraphPINE-40DE.

2504.04349 2026-05-20 cs.GT cs.LG 版本更新

Tight Regret Bounds for Fixed-Price Bilateral Trade

固定价格双边交易的紧懊悔界

Houshuang Chen, Yaonan Jin, Pinyan Lu, Chihao Zhang

发表机构 * Shanghai Jiao Tong University(上海交通大学) Huawei’s Taylor Lab(华为泰勒实验室) Shanghai University of Finance and Economics, Laboratory of Interdisciplinary Research of Computation and Economics (SUFE)(上海金融学院,计算与经济学交叉研究实验室(SUFE))

AI总结 本文研究了固定价格机制在双边交易中的懊悔最小化问题,针对独立值和相关/对抗值分别给出了紧致的懊悔界,并改进了现有结果。

详情
AI中文摘要

我们通过懊悔最小化的视角研究固定价格机制在双边交易中的应用。我们的主要结果有两个方面:(i) 对于独立值,给出了具有两比特/一比特反馈的全局预算平衡固定价格机制的近最优紧界$\widetilde{\Theta}(T^{2/3})$。(ii) 对于相关/对抗值,给出了具有两比特/一比特反馈的全局预算平衡固定价格机制的近最优下界$\Omega(T^{3/4})$,这改进了[ BCCF24]中得到的$\Omega(T^{5/7})$下界,并在多至多项式对数因子范围内匹配了同一工作中得到的$\widetilde{\mathcal{O}}(T^{3 / 4})$上界。我们的工作结合之前的[CCCFL24mor, CCCFL24jmlr, AFF24, BCCF24]等工作,全面理解了固定价格双边交易的懊悔最小化问题。在此过程中,我们开发了两个可能具有独立兴趣的技术成分:(i) 一种名为'分形消除'的新算法范式,用于处理一比特反馈和独立值;(ii) 一种新的下界构造方法,具有新颖的证明技术,用于处理全局预算平衡约束和相关值。

英文摘要

We examine fixed-price mechanisms in bilateral trade through the lens of regret minimization. Our main results are twofold. (i) For independent values, a near-optimal $\widetildeΘ(T^{2/3})$ tight bound for $\textsf{Global Budget Balance}$ fixed-price mechanisms with two-bit/one-bit feedback. (ii) For correlated/adversarial values, a near-optimal $Ω(T^{3/4})$ lower bound for $\textsf{Global Budget Balance}$ fixed-price mechanisms with two-bit/one-bit feedback, which improves the best known $Ω(T^{5/7})$ lower bound obtained in the work [BCCF24] and, up to polylogarithmic factors, matches the $\widetilde{\mathcal{O}}(T^{3 / 4})$ upper bound obtained in the same work. Our work in combination with the previous works [CCCFL24mor, CCCFL24jmlr, AFF24, BCCF24] (essentially) gives a thorough understanding of regret minimization for fixed-price bilateral trade. En route, we have developed two technical ingredients that might be of independent interest: (i) A novel algorithmic paradigm, called $\textit{fractal elimination}$, to address one-bit feedback and independent values. (ii) A new $\textit{lower-bound construction}$ with novel proof techniques, to address the $\textsf{Global Budget Balance}$ constraint and correlated values.

2503.11615 2026-05-20 cs.LG math.OC 版本更新

From Score Matching to Diffusion: A Fine-Grained Error Analysis in the Gaussian Setting

从分数匹配到扩散:在高斯设定下的细粒度误差分析

Samuel Hurault, Matthieu Terris, Thomas Moreau, Gabriel Peyré

发表机构 * ENS Paris, PSL, CNRS(巴黎高等师范学院、巴黎综合理工学院、国家科学研究中心) Univ. Paris-Saclay, Inria, CEA(巴黎萨克雷大学、法国国家信息与自动化技术研究所、法国原子能委员会)

AI总结 本文研究了在高斯设定下使用扩散采样器时的采样误差,分析了分数匹配和扩散过程中的四个主要误差源,并揭示了数据分布各向异性与端到端采样方法关键参数之间的相互作用。

详情
AI中文摘要

从未知分布采样,仅能通过离散样本获取,是生成式人工智能的核心基础问题。当前最先进的方法遵循两步过程:首先估计分数函数(平滑对数分布的梯度),然后应用基于扩散的采样算法——如兰格-恩或扩散模型。所得到分布的正确性可能受四个主要因素影响:分数匹配中的泛化和优化误差,以及扩散过程中的离散化和最小噪声幅度。在本文中,我们明确地在高斯设定下使用扩散采样器时的采样误差。我们提供了来自这些四个误差源的Wasserstein采样误差的精确分析。这使我们能够严格追踪数据分布各向异性(通过其功率谱编码)如何与端到端采样方法的关键参数相互作用,包括初始样本数量、分数匹配和扩散中的步长以及噪声幅度。值得注意的是,我们展示了Wasserstein采样误差可以表示为数据功率谱的核型范数,其中具体的核取决于方法参数。这一结果为进一步分析优化采样精度的权衡提供了基础。

英文摘要

Sampling from an unknown distribution, accessible only through discrete samples, is a fundamental problem at the core of generative AI. The current state-of-the-art methods follow a two-step process: first, estimating the score function (the gradient of a smoothed log-distribution) and then applying a diffusion-based sampling algorithm -- such as Langevin or Diffusion models. The resulting distribution's correctness can be impacted by four major factors: the generalization and optimization errors in score matching, and the discretization and minimal noise amplitude in the diffusion. In this paper, we make the sampling error explicit when using a diffusion sampler in the Gaussian setting. We provide a sharp analysis of the Wasserstein sampling error that arises from these four error sources. This allows us to rigorously track how the anisotropy of the data distribution (encoded by its power spectrum) interacts with key parameters of the end-to-end sampling method, including the number of initial samples, the stepsizes in both score matching and diffusion, and the noise amplitude. Notably, we show that the Wasserstein sampling error can be expressed as a kernel-type norm of the data power spectrum, where the specific kernel depends on the method parameters. This result provides a foundation for further analysis of the tradeoffs involved in optimizing sampling accuracy.

2503.08633 2026-05-20 cs.LG 版本更新

How Does Overparameterization Affect Machine Unlearning of Deep Neural Networks?

过度参数化如何影响深度神经网络的机器去学习?

Gal Alon, Yehuda Dar

发表机构 * Faculty of Computer and Information Science(计算机与信息科学学院)

AI总结 本文研究了深度神经网络去学习任务中模型参数化水平(即网络宽度)对性能的影响,探讨了不同去学习方法在不同参数化水平、去学习目标(隐私保护或偏见消除)以及是否显式使用被删除示例时的表现差异,发现过度参数化模型在隐私和偏见消除方面表现更优,但会带来一定的泛化能力下降。

详情
AI中文摘要

机器去学习是更新训练后的模型以忘记特定训练数据而不从头重新训练的任务。在本文中,我们研究了深度神经网络(DNN)的去学习如何受到模型参数化水平(即DNN宽度)的影响。我们定义了几种最近文献中去学习方法的验证基于调优,并展示了这些方法在(i)DNN参数化水平、(ii)去学习目标(隐私或偏见消除)以及(iii)去学习方法是否显式使用被删除示例时表现不同。我们的结果表明,去学习通常在过度参数化模型上表现更佳,通过显著提高隐私或偏见消除的性能,以合理的泛化能力降级成本;尽管对于偏见消除,这要求去学习方法必须使用被删除的示例。此外,我们测量了去学习如何改变分类决策区域,在接近被删除示例的附近改变,而在其他地方则避免改变。通过这种方式,我们展示了过度参数化模型的去学习成功源于其能够精细地改变输入空间中的小区域模型功能,同时保持大部分模型功能不变。

英文摘要

Machine unlearning is the task of updating a trained model to forget specific training data without retraining from scratch. In this paper, we investigate how unlearning of deep neural networks (DNNs) is affected by the model parameterization level, which corresponds here to the DNN width. We define validation-based tuning for several unlearning methods from the recent literature, and show how these methods perform differently depending on (i) the DNN parameterization level, (ii) the unlearning goal (unlearned data privacy or bias removal), (iii) whether the unlearning method explicitly uses the unlearned examples. Our results show that unlearning usually excels on overparameterized models by significantly improving privacy/bias at a reasonable cost of utility (generalization) degradation; although for bias removal this requires the unlearning method to use the unlearned examples. Furthermore, we measure how much the unlearning changes the classification decision regions in the proximity of the unlearned examples, and avoids changing them elsewhere. By this we show that the unlearning success for overparameterized models stems from the ability to delicately change the model functionality in small regions in the input space while keeping much of the model functionality unchanged.

2404.16676 2026-05-20 cs.DS cs.LG 版本更新

Multilayer Correlation Clustering

多层相关聚类

Atsushi Miyauchi, Florian Adriaens, Francesco Bonchi, Nikolaj Tatti

发表机构 * Intesa Sanpaolo University of Helsinki(Intesa Sanpaolo 哈尔滨工业大学) Intesa Sanpaolo AI Research University of Helsinki(Intesa Sanpaolo AI 研究大学 哈尔滨工业大学)

AI总结 本文提出了一种多层相关聚类方法,旨在通过最小化多层不一致向量的ℓ_𝑝范数来优化聚类结果,并设计了相应的近似算法和实验验证。

Comments AISTATS 2026

详情
AI中文摘要

我们建立了多层相关聚类,这是相关聚类在多层设置下的新一般化。在该模型中,我们被给予一系列相关聚类的输入(称为层)在共同的集合V上。目标是找到V的一个聚类,使其多层不一致向量的ℓ_𝑝范数(p≥1)最小化,该向量的维度等于层数,每个元素表示聚类在相应层上的不一致程度。对于这一一般化,我们首先设计了一个O(L log n)的近似算法,其中L是层数。然后我们研究了我们问题的一个重要特殊情况,即具有所谓概率约束的情况。对于这种情况,我们首先给出一个(α+2)的近似算法,其中α是任何可能的单层对应物的近似比。此外,我们设计了一个4近似算法,该算法改进了上述一般概率约束情况下的近似比α+2=4.5。使用现实世界数据集的计算实验支持了我们的理论发现,并展示了所提出算法的实用性。

英文摘要

We establish Multilayer Correlation Clustering, a novel generalization of Correlation Clustering to the multilayer setting. In this model, we are given a series of inputs of Correlation Clustering (called layers) over the common set $V$ of $n$ elements. The goal is to find a clustering of $V$ that minimizes the $\ell_p$-norm ($p\geq 1$) of the multilayer-disagreements vector, which is defined as the vector (with dimension equal to the number of layers), each element of which represents the disagreements of the clustering on the corresponding layer. For this generalization, we first design an $O(L\log n)$-approximation algorithm, where $L$ is the number of layers. We then study an important special case of our problem, namely the problem with the so-called probability constraint. For this case, we first give an $(α+2)$-approximation algorithm, where $α$ is any possible approximation ratio for the single-layer counterpart. Furthermore, we design a $4$-approximation algorithm, which improves the above approximation ratio of $α+2=4.5$ for the general probability-constraint case. Computational experiments using real-world datasets support our theoretical findings and demonstrate the practical effectiveness of our proposed algorithms.

2312.02652 2026-05-20 hep-ex cs.LG 版本更新

What Machine Learning Can Do for Focusing Aerogel Detectors

机器学习如何帮助聚焦气凝胶探测器

Foma Shipilov, Alexander Barnyakov, Viktor Bobrovnikov, Sergey Kononov, Fedor Ratnikov

发表机构 * NRU Higher School of Economics(俄罗斯莫斯科国立经济学院) Budker Institute of Nuclear Physics of Siberian Branch Russian Academy of Sciences(西伯利亚分支俄罗斯科学院布里克核物理研究所) Novosibirsk State Technical University(新西伯利亚国立技术大学) Novosibirsk State University(新西伯利亚国立大学)

AI总结 本文提出利用机器学习技术来过滤聚焦气凝胶环电离切连尼探测器中的背景信号,以减少数据流并提高粒子速度分辨率。

Comments 5 pages, 4 figures, to be published in 26th International Conference on Computing in High Energy & Nuclear Physics (CHEP2023) proceedings

详情
AI中文摘要

在超重味厂实验中,粒子识别将由聚焦气凝胶环电离切连尼探测器(FARICH)提供。探测器的位置特性使得适当的冷却变得困难,因此大量的环境背景击中会被捕获。必须采取措施来减轻这些背景击中,以减少数据流并提高粒子速度分辨率。在本工作中,我们提出了几种过滤信号击中的方法,这些方法受到计算机视觉中机器学习技术的启发。

英文摘要

Particle identification at the Super Charm-Tau factory experiment will be provided by a Focusing Aerogel Ring Imaging CHerenkov detector (FARICH). The specifics of detector location make proper cooling difficult, therefore a significant number of ambient background hits are captured. They must be mitigated to reduce the data flow and improve particle velocity resolution. In this work we present several approaches to filtering signal hits, inspired by machine learning techniques from computer vision.

2102.11840 2026-05-20 cs.LG cs.NA math.NA math.PR 版本更新

Convergence rates for gradient descent in the training of overparameterized artificial neural networks with piecewise affine activation

梯度下降在过参数化人工神经网络训练中的收敛速度

Arnulf Jentzen, Timo Kröger

AI总结 本文研究了在过参数化 regime 下,使用分段仿射激活函数的人工神经网络通过批量梯度下降优化时的收敛速度问题,证明了在神经网络宽度足够大且学习率足够小的情况下,均方误差以线性速度收敛到零。

Comments 49 pages

详情
AI中文摘要

近年来,人工神经网络已发展为解决多种问题的强大工具,这些问题对于经典解法来说已达到极限。然而,仍然不清楚为什么梯度下降优化算法(如著名的批量梯度下降)在许多情况下能够实现零训练损失,即使目标函数是非凸非光滑的。在监督学习领域,解决这个问题的一个最有前途的方法是分析梯度下降优化在所谓的过参数化 regime 中的表现。本文通过考虑具有分段仿射激活函数的过参数化全连接浅层人工神经网络(如修正线性单元激活函数)进一步贡献于这一研究领域。具体而言,鉴于激活函数不是仿射函数且训练输入数据是成对不同的,我们证明了在高概率下,通过批量梯度下降优化的随机初始化人工神经网络的均方误差在神经网络宽度足够大且学习率足够小的情况下,会以线性收敛速度收敛到零。

英文摘要

In recent years, artificial neural networks have developed into a powerful tool for addressing a multitude of problems for which classical solution approaches reach their limits. However, it is still unclear why gradient descent optimization algorithms with random initialization, such as the well-known batch gradient descent, are able to achieve zero training loss in many situations, even though the objective function is non-convex and non-smooth. One of the most promising approaches to solving this issue in the field of supervised learning is the analysis of gradient descent optimization in the so-called overparameterized regime. In this article, we provide a further contribution to this area of research by considering overparameterized fully connected shallow artificial neural networks with piecewise affine activation, such as the rectified linear unit activation. Specifically, given that the activation function is not affine and the training input data are pairwise distinct, we show that, with high probability, the mean squared error of such a randomly initialized artificial neural network optimized via batch gradient descent converges to zero at a linear convergence rate as long as the width of the artificial neural network is sufficiently large and the learning rate is sufficiently small.

2605.19755 2026-05-20 cs.SE cs.AI cs.CR cs.LG cs.MA 版本更新

Operationalising Artificial Intelligence Bills of Materials (AIBOMs) for Verifiable AI Provenance and Lifecycle Assurance

将人工智能物料清单(AIBOM) operationalise 以实现可验证的 AI 追溯和生命周期保证

Petar Radanliev, Omar Santos, Carsten Maple, Kay Atefi

AI总结 本文提出了一种扩展CycloneDX标准的AIBOM框架,用于捕捉AI特定的溯源、模型血统和披露元数据,通过结构化架构工程、密码学验证和智能体驱动自动化,实现可验证的软件溯源,展示了98.7%的可重复性保真度、96.2%的漏洞匹配精度和63%的手动监督减少,验证了自动化溯源保证和可重复AI生命周期验证的可行性。

详情
Journal ref
Front. Comput. Sci. 8:1735919 (2026)
AI中文摘要

人工智能(AI)系统日益依赖复杂的、多层的软件供应链,这带来了可重复性、透明性和安全性保证的挑战。本文提出了一种扩展CycloneDX标准的人工智能物料清单(AIBOM)架构,以捕捉AI特定的溯源、模型血统和披露元数据。该框架通过结构化架构工程、密码学验证和智能体驱动自动化,提供了一种正式的方法来实现可验证的软件溯源。开发了一个自主的AI流水线,利用机器可验证的溯源链进行持续的环境检查、漏洞丰富和可重复性审计。实证评估显示,在容器化分析工作流中,可重复性保真度为98.7%,漏洞匹配精度为96.2%,手动监督减少了63%。这些结果验证了自动化溯源保证和可重复AI生命周期验证的可行性。AIBOM框架在软件供应链透明性和AI可重复性工程的科学基础方面取得了进展,提供了一种可推广的方法来确保AI系统安全、加强溯源完整性,并支持符合国际信息安全标准。

英文摘要

Artificial Intelligence (AI) systems are increasingly dependent on complex, multi-layered software supply chains that introduce challenges for reproducibility, transparency, and security assurance. This study presents an Artificial Intelligence Bill of Materials (AIBOM) schema extending the CycloneDX standard to capture AI-specific provenance, model lineage, and disclosure metadata. The framework provides a formalised approach to verifiable software provenance through structured schema engineering, cryptographic validation, and agent-driven automation. An autonomous AI pipeline is developed to perform continuous environment inspection, vulnerability enrichment, and reproducibility auditing using machine-verifiable provenance chains. Empirical evaluation demonstrates 98.7% reproducibility fidelity, 96.2% vulnerability match precision, and a 63% reduction in manual oversight across containerised analytic workflows. These results confirm the feasibility of automated provenance assurance and reproducible AI lifecycle validation. The AIBOM framework advances the scientific foundations of software supply chain transparency and AI reproducibility engineering, offering a generalisable methodology for securing AI systems, strengthening provenance integrity, and supporting compliance with international information security standards.

2605.19752 2026-05-20 cs.LG 版本更新

MSAlign: Aligning Molecule and Mass Spectra Foundation Models for Metabolite Identification

MSAlign: 用于代谢物鉴定的分子和质谱基础模型对齐方法

Paul Krzakala, Gabriel Melo, Camille Lançon, Charlotte Laclau, Rémi Flamary, Etienne Thévenot, Florence d'Alché-Buc

发表机构 * LTCI, Télécom Paris & CMAP, Ecole Polytechnique, Institut Polytechnique de Paris(LTCI,巴黎电信学院及巴黎高等技术学院的联合机构,CMAP,巴黎高等理工学院,巴黎高等技术学院) LTCI, Télécom Paris, Institut Polytechnique de Paris(LTCI,巴黎电信学院,巴黎高等技术学院) CEA, INRAE, MetaboHUB, Université Paris-Saclay(CEA,国家核能研究中心,法国农业研究机构,代谢组学枢纽,巴黎萨克雷大学)

AI总结 本研究提出MSAlign方法,通过多模态对齐技术对齐分子和质谱基础模型,以提高代谢物鉴定的准确性,并解决了数据分割策略中的分布偏移问题。

详情
AI中文摘要

准确地从质谱数据中识别代谢物(即小分子)仍然是代谢组学中的核心挑战,广泛应用于药物发现、环境分析和临床研究。我们解决了分子检索任务,即从给定的候选分子中恢复代谢物的化学结构,基于其MS/MS光谱。尽管最近发布的基准数据集如MassSpecGym和Spectraverse大大加速了新型机器学习方法的发展,但数据预处理管道的复杂性和缺乏统一的实现使得方法和结果难以重复和比较。我们做出了三个贡献。首先,我们提出一个统一的框架,涵盖了基于表示对齐和对比学习的最新方法。其次,我们引入MSAlign,受多模态对齐在视觉-语言模型中的启发,通过轻量级MLP投影学习共享的表示空间,通过基于候选的对比目标对两个冻结的基础模型(DreaMS用于质谱和ChemBERTa用于分子)进行对齐。MSAlign易于实现,训练速度快,并在所有基准测试中一致地优于现有方法。第三,我们研究了一个长期存在的评估问题:分子检索中的数据分割策略在数据泄漏和领域偏移之间进行权衡。我们通过引入分布偏移的定量度量来正式化这种张力,并利用它来评估现有基准中的分割策略。所有数据集、分割、候选集以及MSAlign和基线的统一实现已公开发布,以支持可重复的研究。

英文摘要

Accurately identifying metabolites i.e. small molecules from mass spectrometry data remains a core challenge in metabolomics, with broad applications in drug discovery, environmental analysis, and clinical research. We address the Molecule Retrieval task, which consists in recovering the chemical structure of a metabolite from its MS/MS spectrum given a set of candidate molecules. While the recent release of benchmark datasets such as MassSpecGym and Spectraverse has considerably accelerated the development of novel machine learning approaches, the complexity of data preprocessing pipelines and the lack of unified implementations make methods and results difficult to reproduce and compare. We make three contributions. First, we propose a unified framework encompassing recent approaches based on representation alignment and contrastive learning. Second, we introduce MSAlign, inspired by multimodal alignment in vision-language models, which learns a shared representation space by aligning two frozen foundation models (DreaMS for mass spectra and ChemBERTa for molecules) through lightweight MLP projections trained with a candidate-based contrastive objective. MSAlign is simple to implement, fast to train and consistently outperforms existing approaches across all benchmarks. Third, we investigate a long-standing evaluation problem: data splitting strategies in molecule retrieval implicitly trade off data leakage against domain shift. We formalize this tension by introducing a quantitative measure of distribution shift, and use it to evaluate splitting strategies in existing benchmarks. All datasets, splits, candidate sets, and a unified implementation of MSAlign and baselines are publicly released to support reproducible research.

2605.19733 2026-05-20 math.NA cs.LG cs.NA 版本更新

Graph Neural Networks for Community Detection in Graph Signal Analysis

图神经网络在图信号分析中的社区检测

Roberto Cavoretto, Alessandra De Rossi, Enrico Montini

AI总结 本文研究了图神经网络在图信号插值框架中的社区检测应用,通过将GNN生成的社区与图基函数(GBF)-PUM插值方法结合,实现了对图信号的准确重建,展示了深度学习在社区检测中对大规模图信号分析的支持。

详情
AI中文摘要

社区检测是图分析中的核心问题,其应用范围从网络科学到图信号处理。近年来,图神经网络(GNNs)已成为学习图结构数据低维表示的有效工具,并在聚类任务中,特别是在大规模和高维图上表现出强劲的性能。本文研究了在图信号插值框架中使用基于GNN的社区检测。在回顾了根据标准分类法的主要GNN架构类别后,我们将所得到的图社区整合到Partition of Unity Method(PUM)中,用于具有图基函数(GBF)的插值。在该方法中,GNN生成的社区被用来构建局部子域,在这些子域上计算GBF插值算子,并随后组合成全局近似。在基准图数据集上进行的数值实验,包括几何和城市网络示例,证明了所提出的GNN聚类与GBF-PUM插值结合方法能够实现准确的信号重建。结果表明,基于深度学习的社区检测可以为局部插值方案提供有效的图划分,支持其在可扩展的图信号分析中的应用。

英文摘要

Community detection is a central problem in graph analysis, with applications ranging from network science to graph signal processing. In recent years, Graph Neural Networks (GNNs) have emerged as effective tools for learning low-dimensional representations of graph-structured data and have shown strong performance in clustering tasks, particularly on large and high-dimensional graphs. This paper investigates the use of GNN-based community detection within a graph signal interpolation framework. After reviewing the main classes of GNN architectures for community detection according to a standard taxonomy, we integrate the resulting graph communities into a Partition of Unity Method (PUM) for interpolation with Graph Basis Functions (GBFs). In this approach, GNN-derived communities are used to construct local subdomains on which GBF interpolants are computed and subsequently combined into a global approximation. Numerical experiments on benchmark %graph datasets, including geometric and urban network examples demonstrate that the proposed combination of GNN-based clustering and GBF-PUM interpolation yields accurate signal reconstructions. The results indicate that deep learning-based community detection can provide effective graph partitions for localized interpolation schemes, supporting its use in scalable graph signal analysis.

2605.19721 2026-05-20 cs.AI cs.LG cs.NI 版本更新

Projecting Latent RL Actions: Towards Generalizable and Scalable Graph Combinatorial Optimization

投影潜在RL动作:面向通用化和可扩展的图组合优化

Franco Terranova, Guillermo Bernardez, Albert Cabellos-Aparicio, Nina Miolane, Abdelkader Lahmadi

发表机构 * Université de Lorraine, CNRS, Inria, LORIA(洛林大学、国家科学研究中心、法国国家信息与自动化研究所、LORIA实验室) University of California Santa Barbara(加州圣芭芭拉大学) Universitat Politècnica de Catalunya(加泰罗尼亚理工大学)

AI总结 本文提出了一种新的RL-GCO方法,通过在连续GNN动作嵌入空间中直接操作,实现高效的图组合优化解算,提升了通用性和可扩展性。

Comments Preprint

详情
AI中文摘要

图组合优化(GCO)因其在许多NP难问题中的自然图表示而受到越来越多的关注,但其组合爆炸使得精确方法在计算上不可行。最近的强化学习(RL)与图神经网络(GNN)的结合显著改进了基于学习的GCO求解器。然而,现有方法在跨不同图实例的泛化能力和随着动作空间增长的计算可扩展性方面存在局限。为了解决这两个挑战,我们引入了投影代理,一种新颖的RL-GCO方法,直接在连续的GNN动作嵌入空间中操作,通过单次前向传递预测所需潜在动作,并随后将其解码为有效的离散动作。此外,我们通过为观察和动作提供共享的嵌入空间,实现了RL方法之间的公平比较。在多样化的基准测试中,我们的方法在推理速度上达到现有解决方案的16.2倍,泛化能力提升40%,同时为具有多个相互依赖变量的超线性决策空间中的强大RL性能打开了大门。最后,我们发布了LaGCO-RL,一个Python库,自动化潜在动作空间的构建并支持现有RL-GCO解决方案,促进可重复性和适应新GCO基准。

英文摘要

Graph combinatorial optimization (GCO) has attracted growing interest, as many NP-hard problems naturally admit graph formulations, yet their combinatorial explosion renders exact methods computationally intractable. Recent advances in Reinforcement Learning (RL) combined with Graph Neural Networks (GNNs) have significantly improved learning-based GCO solvers. However, existing approaches face limitations in both generalization across diverse graph instances and computational scalability as action spaces grow. To address both challenges, we introduce projection agents, a novel RL-GCO approach that operates directly in a continuous GNN-based action embedding space, predicting a desired latent action in a single forward pass and subsequently decoding it into a valid discrete action. Additionally, we enable fair comparison across RL methods through a shared embedding space for both observations and actions. Across diverse benchmarks, our approach achieves up to 16.2x faster inference and up to 40% better generalization than existing solutions using only simple nearest-neighbor decoding, while opening the door to strong RL performance in super-linear decision spaces with multiple interdependent variables. Finally, we release LaGCO-RL, a Python library that automates latent action-space construction and supports existing RL-GCO solutions, promoting reproducibility and adaptation to new GCO benchmarks.

2605.19698 2026-05-20 cs.CR cs.LG 版本更新

Awakening the Hydra: Stabilizing Multi-Concept Backdoor Injection in Text-to-Image Diffusion Models

唤醒 Hydra:在文本到图像扩散模型中稳定多概念后门注入

Kai Wang, Jiale Zhang, Chengcheng Zhu, Chuang Ma, Songze Li

发表机构 * Yangzhou University(扬州大学) Nanjing University(南京大学) Chongqing University(重庆大学) Southeast University(东南大学)

AI总结 本文研究了在易受干扰的环境下多概念后门攻击的稳定性问题,提出 Hydra 框架,通过约束触发语义和协调跨任务交互,实现稳健且可控的多概念后门注入,实验表明 Hydra 在保持清洁生成质量的同时,有效激活后门。

Comments Preprint. 18 pages

详情
AI中文摘要

文本到图像扩散模型通过开源重用和多次下游微调不断发展,其中重用的检查点难以验证,因此更容易出现隐藏的后门行为。在这样的生态系统中,一个预训练模型可能被多个独立方依次适应和重新分发,导致多个概念特定的触发-目标关联在同一个模型中累积。当这些关联共存时,语义冲突会在共享的表示空间中被放大,导致跨概念纠缠和生成质量下降。值得注意的是,这种累积并不增强攻击,反而可能破坏之前注入的行为并降低攻击可靠性。在本工作中,我们系统地研究了在此干扰环境中后门攻击,并提出 Hydra,一个统一的框架,用于在累积和去中心化的重用下实现稳健和可控的多概念后门注入。我们的核心见解是,在大规模多概念设置下稳定的后门注入需要在优化过程中显式约束触发语义并协调跨任务交互。具体而言,Hydra 在文本编码器空间中执行进化触发搜索,以识别与目标概念语义对齐但与其他注入概念保持稳定的触发器。它进一步结合多任务微调与触发器清洁正则化,以提高在密集多概念注入下的训练稳定性。在多个扩散骨干网络上进行的严格多概念设置下的广泛实验表明,Hydra 在保持清洁生成保真度和图像质量的同时,维持了有效的后门激活。例如,在 8 个攻击者和 500 个概念对上,Hydra 维持了约 95% 的 ASR 和强清洁生成。

英文摘要

Text-to-image diffusion models are increasingly developed through open-source reuse and repeated downstream fine-tuning, where reused checkpoints are difficult to verify and thus more susceptible to hidden backdoor behaviors. In such ecosystems, a single pretrained model may be sequentially adapted and redistributed by multiple independent parties, allowing multiple concept-specific trigger-target associations to accumulate in the same model. When these associations coexist, semantic conflicts can be amplified in the shared representation space, leading to cross-concept entanglement and degraded generation quality. Notably, instead of strengthening the attack, such accumulation can destabilize previously injected behaviors and reduce attack reliability. In this work, we systematically investigate backdoor attacks under this interference-prone setting and propose Hydra, a unified framework for robust and controlled multi-concept backdoor injection under cumulative and decentralized reuse. Our core insight is that stable backdoor injection under large-scale multi-concept settings requires explicitly constraining trigger semantics while coordinating cross-task interactions during optimization. Specifically, Hydra performs evolutionary trigger search in the text encoder space to identify triggers that are semantically aligned with their target concepts while remaining stable across other injected concepts. It further combines multi-task fine-tuning with trigger-clean regularization to improve training stability under dense multi-concept injection. Extensive experiments across multiple diffusion backbones under rigorous multi-concept settings show that Hydra maintains effective backdoor activation while preserving clean generation fidelity and image quality. For instance, across 8 attackers and 500 concept pairs, Hydra maintains ~95% ASR and strong clean generation.

2605.19685 2026-05-20 stat.ML cs.LG 版本更新

Probabilistic Multivariate Time Series Forecasting with Diffusion Copulas

基于扩散Copula的概率多变量时间序列预测

David Huk, Dongshan Wang, Miha Bresar

发表机构 * Department of Statistics The University of Warwick(威斯敏斯特大学统计系) School of Data Science The Chinese University of Hong Kong, Shenzhen(香港中文大学(深圳)数据科学学院)

AI总结 本文提出了一种扩散-Copula框架,通过分离边际分布学习与依赖结构学习,改进了多变量时间序列预测中对尾部风险的估计,展示了在加密货币市场中对系统性极值的预测优势。

Comments ICLR 2026 Workshop Advances in Financial AI

详情
AI中文摘要

准确评估金融风险需要捕捉单个资产波动性和极端市场事件中复杂的非对称依赖结构。尽管现代扩散基模型在多变量预测方面有所进展,但端到端训练常导致

英文摘要

Accurately assessing financial risk requires capturing both individual asset volatility and the complex, asymmetric dependence structures that emerge during extreme market events. While modern diffusion-based models have advanced multivariate forecasting, they often suffer from a "normality bias" when trained end-to-end, sacrificing marginal calibration for joint coherence and consistently underestimating tail risk. To address this, we propose a Diffusion-Copula framework that explicitly decouples the learning of marginal distributions from their dependence structure. We employ deep Mixture Density Networks to capture heavy-tailed asset dynamics, followed by a Classification-Diffusion Copula to model the joint dependence. Applied to cryptocurrency markets, our approach demonstrates superior performance over state-of-the-art baselines in forecasting systemic extremes of both marginal and joint events. Crucially, we demonstrate that while baseline models classify simultaneous market crashes as statistically impossible "Black Swans" (high surprise), our framework identifies them as "Expected Crashes" (low surprise), successfully preserving the correlation structure necessary for robust risk management during contagion events.

2605.19677 2026-05-20 cs.LG q-bio.QM 版本更新

Agentic Discovery of Cryomicroneedle Formulations

代理发现冷冻微针制剂配方

Hao Li, Lifu Du, Nurul Hameed, Shemonti Saha Authai, Zlata Stefanovic, Chenjie Xu

发表机构 * Department of Biomedical Engineering, City University of Hong Kong(香港城市大学生物医学工程系)

AI总结 本研究提出了一种结合文献整理、高斯过程代理建模、贝叶斯优化和顺序湿实验验证的闭环工作流程,用于发现冷冻微针的冷冻保护剂配方,通过迭代湿实验验证提高了配方的准确性和有效性。

详情
AI中文摘要

冷冻微针提供了一种微创的皮下递送活细胞的途径,但其低温保存配方必须在保护细胞和限制毒性和设备制造约束之间取得平衡。本文报告了一种由AI辅助的闭环工作流程,用于冷冻微针冷冻保护剂的发现,结合了文献整理、高斯过程代理建模、贝叶斯优化和顺序湿实验验证。一个包含198种骨髓干细胞冷冻保存配方的curated数据集(来自42项研究)被转换为21种成分特征,并用于训练一个不确定性的文献先验模型。该模型捕捉了文献数据中的中等结构,但前瞻性地失败了,促使进行迭代的湿实验修正。在十次验证迭代和106次湿实验观察中,模型逐步适应了冷冻微针特定的结果:批次RMSE从41.21个百分点降低到6.86个百分点,后期阶段的排名相关性变得一致为正,累积的湿实验预测与测量总结达到了R²=0.942。最佳验证配方实现了95.15%的复苏存活率,同时具有低DMSO、ectoin、乙二醇和胎牛血清含量。然而,高存活率本身并不保证冷冻微针的完整形成,突显了未来多目标优化的必要性。这些结果表明,代理辅助的计算基础设施可以使数据高效的配方发现对拥有少量内部数据专业知识的实验室更加可及。项目代码可在https://github.com/baitmeister/ML-for-CryoMN上获得。

英文摘要

Cryomicroneedles offer a route to minimally invasive intradermal delivery of living cells, but their cryogenic formulations must reconcile cell protection with constraints on toxicity and device fabrication. Here we report an AI-assisted, closed-loop workflow for cryomicroneedle cryoprotectant discovery that combines literature curation, Gaussian-process surrogate modelling, Bayesian optimization, and sequential wet-lab validation. A curated dataset of 198 mesenchymal stem-cell cryopreservation formulations from 42 studies was converted into 21 ingredient features and used to train an uncertainty-aware literature prior. This model captured moderate structure in the literature data but failed prospectively, motivating iterative wet-lab correction. Across ten validation iterations and 106 wet-lab observations, the model progressively adapted to cryomicroneedle-specific outcomes: batch RMSE decreased from 41.21 to 6.86 percentage points, later-stage rank correlations became consistently positive, and the cumulative wet-lab predicted-versus-measured summary reached $R^2 = 0.942$. The best validated formulation achieved 95.15\% post-thaw viability with low DMSO, ectoin, ethylene glycol, and fetal bovine serum. However, high viability alone did not ensure intact cryomicroneedle formation, highlighting the need for future multi-objective optimization. These results demonstrate that agent-assisted computational infrastructure can make data-efficient formulation discovery more accessible to labs with minimal data expertise in-house. Project code is available at https://github.com/baitmeister/ML-for-CryoMN.

2605.19667 2026-05-20 math.OC cs.LG 版本更新

Convergence of Consensus-Based Particle Methods for Nonconvex Bi-Level Optimization

非凸双层优化中基于共识的粒子方法的收敛性

Yutong Chao, Xudong Sun, Konstantin Riedl, Majid Khadiv, Jalal Etesami

发表机构 * Department of Computer Science(计算机科学系) Technical University of Munich(慕尼黑技术大学) Munich Institute of Robotics and Machine Intelligence(慕尼黑机器人与智能机械研究所) Mathematical Institute(数学研究所) University of Oxford(牛津大学)

AI总结 本文研究了一种用于非凸双层优化的基于共识的优化方法,旨在最小化上层函数,其中下层问题的全局极小值集是优化域。该方法无导数,通过平滑分位数选择与Gibbs型拉普拉斯近似相结合来构建共识点。研究建立了与关联的均场动力学及其有限粒子近似的收敛性保证。特别地,在适当的平滑分位数局部化、误差界和稳定性假设下,证明了均场定律能够在给定的Wasserstein邻域内以显式指数速率达到目标双层解。数值实验进一步支持了理论结果。

详情
AI中文摘要

在本文中,我们研究了一种用于非凸双层优化的基于共识的优化方法,其中目标是最小化上层函数,其优化域为下层问题的全局极小值集。所提出的方法是无导数的,其共识点通过平滑分位数选择与Gibbs型拉普拉斯近似相结合来构建。我们建立了与关联的均场动力学及其有限粒子近似相关的收敛性保证。特别是,在适当的平滑分位数局部化、误差界和稳定性假设下,我们证明了均场定律能够在给定的Wasserstein邻域内以显式指数速率达到目标双层解,直到击中时间。在二维约束问题和神经网络训练上的数值实验进一步支持了这些理论结果。

英文摘要

In this paper, we study a consensus-based optimization method for nonconvex bi-level optimization, where the objective is to minimize an upper-level function over the set of global minimizers of a lower-level problem. The proposed approach is derivative-free, and constructs its consensus point via smooth quantile selection combined with a Gibbs-type Laplace approximation. We establish convergence guarantees for both the associated \textit{mean-field} dynamics and its \textit{finite-particle} approximation. In particular, under suitable assumptions on smooth quantile localization, error bounds, and stability, we show that the mean-field law reaches any arbitrary prescribed Wasserstein neighborhood of the target bi-level solution with an explicit exponential rate up to the hitting time. Numerical experiments on a two-dimensional constrained problem and neural network training further support the theoretical results.

2605.19666 2026-05-20 physics.med-ph cs.LG 版本更新

Cross-View Attention Fusion Net: A Prior-Guided Dual-View Representation Learning for Cardiac Output Estimation from Short-Term PPG Signals

跨视图注意力融合网络:一种基于先验信息的双视图表示学习用于从短时PPG信号估计心输出量

Yaowen Zhang, Bo Cui, Libera Fresiello, Peter H. Veltink, Dirk W. Donker, Ying Wang

发表机构 * Department of Biomedical Signals and Systems(生物医学信号与系统系) Department of Cardiovascular and Respiratory Physiology(心血管与呼吸生理学系) Department of Intensive Care(重症医学系)

AI总结 本文提出了一种基于先验信息的双视图深度学习模型CVAF-Net,用于从短时PPG信号估计心输出量,通过跨视图注意力融合技术提升模型性能,并在多个数据集上验证了其有效性。

详情
AI中文摘要

从光体积脉搏波描记术(PPG)准确估计心输出量(CO)对于无创血流动力学监测具有潜力,但仍然困难,因为CO由心脏功能和血管张力共同决定。传统基于特征的模型使用具有生理意义的PPG描述符,但依赖于准确的脉搏检测并可能遗漏潜在的时间关系。相比之下,全端到端深度学习模型直接从原始PPG信号学习,但往往未能充分利用已建立的PPG衍生先验信息。本文引入了跨视图注意力融合网络(CVAF-Net),一种用于从短时、固定长度PPG段估计CO的基于先验信息的双视图深度学习模型。CVAF-Net将原始PPG信号作为时间视图,并将特征序列图(FSM)作为结构化先验引导视图,通过跨视图注意力融合两种表示。该模型独立评估了来自三个数据集的5秒、15秒和30秒段:模拟脉冲波(3323名受试者)、血管收缩诱发(79名受试者)以及静息/骑车活动(10名受试者),并与多种机器学习和深度学习基准进行了比较。CVAF-Net在大多数基准方法上表现更优,并在模拟数据上以平均绝对误差(MAE)为0.19 L/min(MAPE: 3.95%)与最先进的基于Transformer的模型性能相当,在现实世界中也实现了高准确性(最小MAE: 1.20 L/min)。重要的是,CVAF-Net将浮点运算次数(FLOPs)减少了十二倍,与领先的基于Transformer的模型相比。合理性分析显示,CO估计在生理上一致,与年龄(ρ=-0.274)、心率(ρ=0.894)和全身血管阻力(ρ=-0.740)有预期的相关性。这些发现表明,CVAF-Net提供了一种准确、计算高效且可推广的连续可穿戴CO监测方法。

英文摘要

Accurate cardiac output (CO) estimation from photoplethysmography (PPG) is promising for unobtrusive hemodynamic monitoring, but remains difficult since CO is jointly determined by cardiac function and vascular tone. Conventional feature-based models use physiologically meaningful PPG descriptors, yet depend on accurate pulse detection and may miss latent temporal relationships. In contrast, fully end-to-end deep learning models learn directly from raw PPG but often underuse established PPG-derived prior information. Here, we introduce the Cross-View Attention Fusion Network (CVAF-Net), a prior-guided dual-view deep learning model for CO estimation from short, fixed-length PPG segments. CVAF-Net processes raw PPG as a temporal view and a feature sequence map (FSM) as a structured prior-guided view, and fuses the two representations through cross-view attention. The model was independently evaluated using 5-, 15-, and 30-s segments from three datasets: simulated pulse waves (3323 subjects), vasoconstriction provocation (79 subjects), and resting/cycling activities (10 subjects), and was compared with multiple machine learning and deep learning benchmarks. CVAF-Net outperformed most benchmark methods and achieved performance comparable to a state-of-the-art Transformer-based model, with a mean absolute error (MAE) of 0.19 L/min (MAPE: 3.95%) on simulated data and high accuracy in real-world settings (minimum MAE: 1.20 L/min). Importantly, CVAF-Net reduced FLOPs by twelvefold compared with the leading Transformer-based model. Plausibility analysis showed physiologically consistent CO estimates, with expected correlations with age ($ρ= -0.274$), heart rate ($ρ= 0.894$), and systemic vascular resistance ($ρ= -0.740$). These findings indicate that CVAF-Net provides an accurate, computationally efficient, and generalizable approach for continuous wearable-based CO monitoring.

2605.19660 2026-05-20 cs.LG cs.CL 版本更新

OScaR: The Occam's Razor for Extreme KV Cache Quantization in LLMs and Beyond

OScaR:LLMs及更广泛场景中的极压缩KV缓存量化之奥卡姆之刀

Zunhai Su, Rui Yang, Chao Zhang, Yaxiu Liu, Yifan Zhang, Wei Wu, Jing Xiong, Dayou Du, Xialie Zhuang, Yulei Qian, Yuchen Xie, Yik-Chung Wu, Hongxia Yang, Ngai Wong

发表机构 * Tsinghua University(清华大学) Meituan LongCat Team(美团LongCat团队) The University of Hong Kong(香港大学) The University of Edinburgh(爱丁堡大学) UCAS(中国科学技术大学) The Hong Kong Polytechnic University(香港理工大学)

AI总结 本文针对LLMs中KV缓存极压缩时的量化保真问题,提出OScaR框架,通过Canalized Rotation和Omni-Token Scaling有效缓解Token Norm Imbalance,实现近无损的INT2量化性能,同时提升解码速度和吞吐量。

Comments Under review

详情
AI中文摘要

快速发展的长上下文推理和多模态智能使Key-Value(KV)缓存的内存占用成为高效部署的主要内存瓶颈。虽然已建立的每通道量化方法能有效处理Key张量中的固有通道级异常值,但在极端压缩下其效果下降。本文从经验和理论角度重新审视每通道量化范式的固有限制。我们的分析指出Token Norm Imbalance(TNI)是量化保真度的主要瓶颈。我们证明当共享量化参数需要覆盖具有显著范数差异的token组时,TNI会系统性地放大误差。而不是依赖复杂的量化流水线(如TurboQuant),我们提出了OScaR(Omni-Scaled Canalized Rotation),一种适用于X-LLMs(即纯文本、多模态和全模态LLMs)的准确且轻量的KV缓存压缩框架。在推进每通道范式的基础上,OScaR通过Canalized Rotation后接Omni-Token Scaling,有效且高效地缓解TNI引起的序列维度方差,进一步通过优化的系统设计和CUDA内核支持。在X-LLMs上的广泛评估显示,OScaR在INT2量化下实现了近无损性能,优于现有方法,确立了其作为稳健、低复杂度和通用的框架,定义了新的帕累托前沿。与BF16 FlashDecoding-v2基线相比,我们的OScaR实现解码速度提升达3.0倍,内存占用减少5.3倍,吞吐量增加4.1倍。OScaR的代码在https://github.com/ZunhaiSu/OScaR-KV-Quant公开。

英文摘要

The rapid advancement toward long-context reasoning and multi-modal intelligence has made the memory footprint of the Key-Value (KV) cache a dominant memory bottleneck for efficient deployment. While the established per-channel quantization effectively accommodates intrinsic channel-wise outliers in Key tensors, its efficacy diminishes under extreme compression. In this work, we revisit the inherent limitations of the per-channel quantization paradigm from both empirical and theoretical perspectives. Our analysis identifies Token Norm Imbalance (TNI) as the primary bottleneck to quantization fidelity. We demonstrate that TNI systematically amplifies errors when shared quantization parameters are required to span token groups exhibiting substantial norm disparities. Instead of relying on intricate quantization pipelines (e.g., TurboQuant), we propose OScaR (Omni-Scaled Canalized Rotation), an accurate and lightweight KV cache compression framework for X-LLMs (i.e., text-only, multi-modal, and omni-modal LLMs). Advancing the per-channel paradigm, OScaR employs Canalized Rotation followed by Omni-Token Scaling to mitigate TNI-induced sequence-dimensional variance both effectively and efficiently, further supported by our optimized system design and CUDA kernels. Extensive evaluations across X-LLMs show that OScaR consistently outperforms existing methods and achieves near-lossless performance under INT2 quantization, establishing it as a robust, low-complexity, and universal framework that defines a new Pareto front. Compared with the BF16 FlashDecoding-v2 baseline, our OScaR implementation achieves a notable up to 3.0x speedup in decoding, reduces memory footprint by 5.3x, and increases throughput by 4.1x. The code for OScaR is publicly available at https://github.com/ZunhaiSu/OScaR-KV-Quant.

2605.19646 2026-05-20 q-bio.NC cs.LG 版本更新

BCI-sift: An automated feature selection toolbox for Brain Computer Interface applications

BCI-sift: 一种用于脑机接口应用的自动化特征选择工具箱

Elena C Offenberg, Dirk Keller, Mariska J Vansteensel, Zachary V Freudenburg, Nick F Ramsey, Julia Berezutskaya

发表机构 * Department of Neurology and Neurosurgery, University Medical Center Utrecht Brain Center, Utrecht University(神经学与神经外科学系,乌得勒支大学医学中心脑研究中心,乌得勒支大学) Donders Institute for Brain, Cognition and Behaviour, Radboud University(脑、认知与行为研究所,拉德堡德大学)

AI总结 本文提出BCI-sift工具箱,通过整合先进优化方法,为脑机接口任务提供自动化特征选择解决方案,提升了分类准确性和解释性。

Comments 19 pages, 12 figures

详情
AI中文摘要

在临床脑机接口(BCI)领域的发展依赖于精确且可靠的信号解释。然而,来自植入式和非植入式BCI采集的数据具有高维性和噪声特性,这带来了重大挑战,推动了特征选择算法的应用。我们引入了BCI-sift(BCI系统性和可解释性特征调节),一种基于Python的工具箱,旨在简化将各种优化算法应用于BCI数据集以识别机器学习任务中最相关的特征。我们的scikit-learn兼容工具箱(github.com/UMCU-RIBS/BCI-sift)通过整合先进的优化方法简化了BCI任务中的特征选择。我们验证了该工具箱在8名健康受试者(64-128个电极植入在运动皮层上)的高密度电极图(HD ECoG)数据上的性能,这些受试者重复说出12个单词。BCI-sift在电极、时间及频率维度上识别了信息丰富的神经特征。电极选择的解剖位置在不同受试者之间一致,并与已知的运动皮层功能组织一致。相关时间点集中在说话产生周围,高频带被识别为最信息丰富的,这与先前工作一致。特征选择比使用所有特征提高了分类准确性。BCI-sift提供了一个易于使用的多功能平台,用于BCI研究中的特征选择,能够提高解码性能、自动化特征分析和增强解释性。虽然验证了HD ECoG数据,该方法广泛适用于其他BCI模态。通过提高分类准确性和可解释性,BCI-sift解决了开发高效和透明BCI系统的关键挑战。

英文摘要

Advancements in clinical Brain-Computer Interfaces (BCIs) depend on precise and reliable signal interpretation. However, the high-dimensional and noisy nature of data captured from both implanted and non-implanted BCIs poses significant challenges, motivating the use of feature selection algorithms. We introduce BCI-sift (BCI Systematic and Interpretable Feature Tuning), a Python-based toolbox designed to streamline the application of diverse optimization algorithms to BCI datasets for identifying the most relevant features in machine learning tasks. Our scikit-learn-compatible toolbox (github.com/UMCU-RIBS/BCI-sift) simplifies feature selection in BCI tasks by integrating advanced optimization methods. We validated the toolbox on high-density electrocorticography (HD ECoG) data from eight able-bodied participants with 64-128 electrodes implanted over the sensorimotor cortex, who repeatedly spoke 12 words. BCI-sift identified informative neural features across electrode, temporal, and frequency dimensions. The anatomical locations of electrode selections were consistent across participants and aligned with known functional organization of the sensorimotor cortex. Relevant time points clustered around speech production, and the high-frequency band was identified as most informative, in line with prior work. Feature selection improved classification accuracy compared to using all features. BCI-sift provides an accessible and versatile platform for feature selection in BCI research, enabling improved decoding performance, automated feature analysis, and enhanced interpretability. While validated on HD ECoG data, the approach is broadly applicable to other BCI modalities. By enhancing classification accuracy and interpretability, BCI-sift addresses key challenges in developing efficient and transparent BCI systems.

2605.19644 2026-05-20 cs.CR cs.LG 版本更新

Inferring Sensitive Attributes from Knowledge Graph Embeddings: Attack and Defense Strategies

从知识图谱嵌入中推断敏感属性:攻击与防御策略

Yasmine Hayder

发表机构 * LIFO, INSA CVL, Univ. Orléans, Inria, France(LIFO,法国里尔大学CVL学院,奥尔良大学,法国国家信息与自动化研究所)

AI总结 本文研究了基于知识图谱嵌入(KGE)推理的隐私风险,提出了一种通过后处理去污技术减轻这些风险的框架,探讨了在推荐质量与隐私保护之间进行权衡的必要性。

详情
Journal ref
ESWC - Extended Semantic Web Conference, May 2026, Dubrovnik, France
AI中文摘要

知识图谱(KGs)是一种强大的链接数据表示形式,提供了灵活性、语义丰富性和支持知识丰富和推理的能力。它们帮助数据所有者组织和利用异构数据以提供有洞察力的服务(例如推荐),但现实中的KGs往往不完整,隐藏了真实的事实或遗漏了有价值的观点。知识图谱嵌入技术常用于推断有价值的缺失信息。然而,对KGs的推理可能会无意中暴露敏感的用户信息,即使这些数据并未显式存储。在本文中,我们研究了基于KGE推理的隐私风险,重点关注攻击者试图从看似非敏感的输出中推断出敏感用户属性的属性推断攻击。我们提出并评估了一个框架,通过应用后处理去污技术来减轻这些隐私风险。初步结果展示了这些攻击对KGE模型输出的有效性,并探讨了在应用基于随机化的技术时推荐质量与隐私保护之间的权衡,突显了未来工作需要实验更高级技术以解决此问题的必要性。

英文摘要

Knowledge Graphs (KGs) are a powerful representation of linked data, offering flexibility, semantic richness, and support for knowledge enrichment and reasoning. They help data owners organize and exploit heterogeneous data to provide insightful services (e.g., recommendations), yet real-world KGs are often incomplete, hiding true facts or missing valuable insights. Knowledge graph embedding techniques are commonly used to infer valuable missing information. However, reasoning over KGs can inadvertently expose sensitive user information, even when such data is not explicitly stored. In this work, we investigate the privacy risks associated with KGE-based reasoning, focusing on attribute inference attacks where adversaries attempt to deduce sensitive user attributes from seemingly non-sensitive outputs. We propose and evaluate a framework that mitigates these privacy risks by applying post processing sanitization techniques to KGE outputs. Preliminary results demonstrate the effectiveness of these attacks on the outputs of KGE models, and explore the trade-off between recommendation quality and privacy protection when applying randomization based approaches, highlighting the need to experiment with more advanced techniques in future work to address this issue.

2605.19641 2026-05-20 stat.ML cs.LG 版本更新

Increasing Missingness to Reduce Bias: Richardson-SGD with Missing Data

增加缺失值以减少偏差:带有缺失数据的Richardson-SGD

Ferdinand Genans, Erwan Scornet

发表机构 * Sorbonne Université and Université Paris Cité, CNRS, Laboratoire de Probabilités, Statistique et Modélisation, LPSM(索邦大学和巴黎cité大学,CNRS,概率、统计与建模实验室,LPSM)

AI总结 本文研究了如何通过增加缺失值来减少梯度偏差,提出了一种基于Richardson外推的Richardson-SGD方法,该方法通过在已有不完整数据的基础上故意增加缺失率,从而抵消梯度偏差,提高了不完整数据下的优化和估计性能。

详情
AI中文摘要

随机梯度方法在现代大规模学习中至关重要,但其在不完整协变量中的使用仍然谨慎,因为插补方案通常会引入系统性的梯度偏差,如在线性模型中所示。在本工作中,我们证明了所有参数模型在各种插补程序中都表现出相似的梯度偏差,并且精确地刻画了缺失率向量p的依赖性,其中O(||p||)是主导项。我们利用这一分析,提出了一种简单的去偏差程序,用于带有缺失值的随机梯度下降(SGD),基于Richardson外推。关键思想是“故意增加缺失率”:从已有的不完整观测中,生成一个更稀疏的版本,在更高的、受控的缺失率下,并将两个结果的随机梯度结合以抵消主导的偏差项。我们证明,在几种缺失情况中,一个Richardson步骤将梯度偏差从O(||p||)减少到O(||p||²)。我们提出的方法计算高效,模型无关,并适用于任何参数损失函数,其随机梯度可以在插补后计算。此外,当缺失指示符独立时,总体梯度偏差是p的多线性多项式,并仅取决于由声明单个坐标缺失引起的总体梯度误差。在这种情况下,我们的方法可以推广到多步Richardson过程,该过程递归地抵消更高阶项。在经验上,Richardson去偏差提高了多个广义线性模型中的优化和估计性能,并与广泛使用的插补程序如MICE相结合。这些结果表明,有些反直觉地,在现有缺失数据上添加受控的缺失率可以使不完整数据的随机学习更准确。

英文摘要

Stochastic gradient methods are central to modern large-scale learning, but their use with incomplete covariates remains delicate since imputation schemes generally introduce systematic gradient biases, as shown for linear models. In this work, we prove that all parametric models exhibit similar gradient bias for various imputation procedures and characterize exactly the dependence on the missingness ratio vector $p$, with $O(\|p\|)$ as the leading term. We exploit this analysis to propose a simple debiasing procedure for stochastic gradient descent (SGD) with missing values based on Richardson extrapolation, which leverages the exact expression of the gradient bias. The key idea is to \emph{deliberately add missingness}: from an already incomplete observation, we generate a further-thinned version at a higher, controlled missingness level, and combine the two resulting stochastic gradients to cancel the leading bias term. We prove that one Richardson step reduces the gradient bias from $O(\|p\|)$ to $O(\|p\|^2)$ under several missingness scenarios. Our proposed method is computationally efficient, model-agnostic and applies to any parametric loss whose stochastic gradient can be computed after imputation. Furthermore, when missing indicators are independent, the population gradient bias is a multilinear polynomial in $p$ and depends only on population gradient errors induced by declaring a single coordinate missing. In this case, our method generalizes to a multi-step Richardson procedure which recursively cancels higher-order terms. Empirically, Richardson debiasing improves optimization and estimation across several generalized linear models and combines positively with widely used imputation procedures such as MICE. These results suggest that, somewhat counter-intuitively, adding controlled missingness on top of existing missing data can make stochastic learning from incomplete data more accurate.

2605.19633 2026-05-20 cs.CL cs.AI cs.LG cs.NE cs.SE 版本更新

optimize_anything: A Universal API for Optimizing any Text Parameter

optimize_anything: 一个用于优化任何文本参数的通用API

Lakshya A Agrawal, Donghyun Lee, Shangyin Tan, Wenjie Ma, Karim Elmaaroufi, Rohit Sandadi, Sanjit A. Seshia, Koushik Sen, Dan Klein, Ion Stoica, Joseph E. Gonzalez, Omar Khattab, Alexandros G. Dimakis, Matei Zaharia

发表机构 * MIT(麻省理工学院)

AI总结 本文提出了一种基于LLM的通用优化系统,能够跨不同领域实现文本参数的优化,展示了其在六个多样化任务中的state-of-the-art性能,通过多任务搜索和跨问题迁移实现了高效的优化。

Comments 16 pages, 11 figures; Blog: https://gepa-ai.github.io/gepa/blog/2026/02/18/introducing-optimize-anything/

详情
Journal ref
Proceedings of the ACM Conference on AI and Agentic Systems (CAIS 26), May 26-29, 2026, San Jose, CA, USA
AI中文摘要

能否一个基于LLM的优化系统在根本不同的领域中匹配专门工具?我们证明当优化问题被表述为改进一个通过评分函数评估的文本工件时,一个基于AI的优化系统—支持单任务搜索、多任务搜索和跨问题迁移以及对未见过的输入进行泛化—在六个不同的任务中实现了state-of-the-art的结果。我们的系统发现了将Gemini Flash的ARC-AGI准确性几乎提高三倍的代理架构(32.5%到89.5%),发现了将云成本降低40%的调度算法,生成了87%匹配或超过PyTorch的CUDA内核,并优于AlphaEvolve报告的圆圈打包解决方案(n=26)。在三个领域的消融研究揭示了可操作的侧信息比仅评分反馈更快收敛且最终得分更高,且多任务搜索在同等问题预算下通过跨任务迁移优于独立优化。共同,我们首次展示了基于LLM搜索的文本优化是一种通用问题解决范式,将传统需要领域特定算法的任务统一到一个框架下。我们开源了optimize_anything,并支持多个后端作为GEPA项目的一部分,在https://github.com/gepa-ai/gepa上。

英文摘要

Can a single LLM-based optimization system match specialized tools across fundamentally different domains? We show that when optimization problems are formulated as improving a text artifact evaluated by a scoring function, a single AI-based optimization system-supporting single-task search, multi-task search with cross-problem transfer, and generalization to unseen inputs-achieves state-of-the-art results across six diverse tasks. Our system discovers agent architectures that nearly triple Gemini Flash's ARC-AGI accuracy (32.5% to 89.5%), finds scheduling algorithms that cut cloud costs by 40%, generates CUDA kernels where 87% match or beat PyTorch, and outperforms AlphaEvolve's reported circle packing solution (n=26). Ablations across three domains reveal that actionable side information yields faster convergence and substantially higher final scores than score-only feedback, and that multi-task search outperforms independent optimization given equivalent per-problem budget through cross-task transfer, with benefits scaling with the number of related tasks. Together, we show for the first time that text optimization with LLM-based search is a general-purpose problem-solving paradigm, unifying tasks traditionally requiring domain-specific algorithms under a single framework. We open-source optimize\_anything with support for multiple backends as part of the GEPA project at https://github.com/gepa-ai/gepa .

2605.19629 2026-05-20 stat.ML cs.LG math.OC 版本更新

Gaussian Approximation and Multiplier Bootstrap for Federated Linear Stochastic Approximation

高斯近似与乘数自助法用于联邦线性随机逼近

Ilya Levin, Maksim Shuklin, Eric Moulines, Paul Mangold, Sergey Samsonov

发表机构 * HSE University(莫斯科国立高等经济学院) MBZUAI(马克斯·普朗克智能系统研究所) CMAP, CNRS, École Polytechnique, Institut Polytechnique de Paris(巴黎高等理工学院应用数学与计算科学实验室,法国国家科学研究中心)

AI总结 本文建立了联邦线性随机逼近的Berry-Esseen型界,首次明确捕捉通信-计算权衡和异质性误差项的联邦高斯近似,量化了局部步长、局部更新次数和异质性对收敛速率的影响。

详情
AI中文摘要

在本文中,我们为联邦线性随机逼近(LSA)建立了Berry-Esseen型界。我们的结果提供了首个明确捕捉通信-计算权衡和异质性误差项的联邦高斯近似,量化了局部步长、局部更新次数和异质性对收敛速率的影响。我们为两种情况提供了结果:(i)常数步长域和(ii)递减步长与递增局部迭代次数,恢复了Bonnerjee等人[2025]最近的速率作为特殊情况。作为我们结果的主要应用,我们开发了一个在线乘数自助法用于最后迭代的推断,避免了显式估计渐近协方差矩阵,并获得了该过程的非渐近有效性保证。

英文摘要

In this paper, we establish Berry-Esseen-type bounds for federated linear stochastic approximation (LSA). Our results provide the first federated Gaussian approximations for LSA that explicitly capture communication-computation trade-offs and heterogeneity-aware error terms, quantifying the effects of local step size, number of local updates, and heterogeneity on convergence rates. We present results for both (i) constant step size regime and (ii) decreasing step size with an increasing number of local iterations, recovering the recent rates of Bonnerjee et al. [2025] as a special case. As a primary application of our results, we develop an online multiplier bootstrap procedure for inference on the last iterate, which avoids explicit estimation of the asymptotic covariance matrix, and obtain non-asymptotic validity guarantees for this procedure.

2605.19625 2026-05-20 cs.LG 版本更新

Optimal Reconstruction from Linear Queries

从线性查询中最优重建

Yuval Filmus, Shay Moran, Elizaveta Nesterova

发表机构 * Technion – Israel Institute of Technology(技术学院 – 以色列理工学院) Google Research(谷歌研究)

AI总结 研究如何从近似线性查询中重建未知点,分析查询数量、维度和噪声参数对重建误差的影响,并提出一种改进的重建问题变体。

Comments Accepted to COLT 2026. 46 pages, 4 figures

详情
AI中文摘要

我们研究从近似线性查询中重建$\mathbb{R}^d$中未知点的问题。该设定出现在从低维遥感和信号恢复到高维数据分析和隐私敏感推断的应用中。我们的主要目标是将最优重建误差作为查询数量$T$、环境维度$d$和噪声参数$\delta$的函数进行表征。我们首先分析$T o \infty$的极限,证明最优重建误差收敛到显式值$\sqrt{2d/(d+1)} \delta$,其作用类似于监督学习中的贝叶斯最优误差。当维度固定时,我们显示在该极限之上,误差以双指数速度衰减,比通常在学习曲线中遇到的速率快得多。当维度增长时,我们证明需要数量级为$\exp(d)$的查询才能实现消失的误差。最后,我们介绍并分析了重建问题的一个不恰当变体。从技术角度看,我们的主要贡献是Jung定理(1901)的推广。经典定理界定了直径为1的集合的最大可能半径,并刻画了极值体。我们的推广提供了一个鲁棒变体,刻画了近极值体,并通过利用对称性和李群作用的几何和动力学论证证明。

英文摘要

We study the problem of reconstructing an unknown point in $\mathbb{R}^d$ from approximate linear queries. This setting arises naturally in applications ranging from low-dimensional remote sensing and signal recovery to high-dimensional data analysis and privacy-sensitive inference. Our main goal is to characterize the optimal reconstruction error as a function of the number of queries $T$, the ambient dimension $d$, and the noise parameter $δ$. We first analyze the limit $T \to \infty$ and show that the optimal reconstruction error converges to the explicit value $\sqrt{2d/(d+1)} δ$, which plays a role analogous to the Bayes optimal error in supervised learning. When the dimension is fixed, we show that the excess error above this limit decays doubly exponentially fast as $T \to \infty$, a rate that is significantly faster than those typically encountered in learning curves. When the dimension grows, we show that a number of queries on the order of $\exp(d)$ is necessary and sufficient to achieve vanishing excess error. Finally, we introduce and analyze an improper variant of the reconstruction problem. From a technical perspective, our main contribution is a generalization of Jung's theorem (1901). The classical theorem bounds the maximum possible radius of a set of diameter 1 and characterizes extremal bodies. Our generalization provides a robust variant that characterizes near-extremal bodies and is proved via geometric and dynamical arguments exploiting symmetry and Lie group actions.

2605.19621 2026-05-20 eess.IV cs.LG cs.NA math.NA 版本更新

Diffusion Graph Posterior Sampling for Nonlinear Inverse Problems with Application to Electrical Impedance Tomography

基于扩散后验采样的图结构数据非线性反问题求解方法及其在电阻抗断层成像中的应用

Giovanni S. Alberti, Damiana Lazzaro, Serena Morigi, Matteo Santacesaria, Shibo Wang

发表机构 * MaLGa Center, Department of Mathematics, University of Genova(马尔加中心,数学系,热那亚大学) Department of Mathematics, University of Bologna(数学系,博洛尼亚大学) Department of Mathematics, Harbin Institute of Technology(数学系,哈尔滨工业大学)

AI总结 本文提出了一种扩展扩散后验采样(DPS)到图结构数据的框架,通过在二维三角网格上开发无条件分数基于扩散模型来学习物理解空间的准确先验,并引入正则化变体RDPS,结合总变差和广义Tikhonov等显式正则化项,以缓解严重病态问题,实验表明RDPS在合成和真实2D EIT数据集上产生稳定且物理合理的重建。

详情
AI中文摘要

深度生成模型已发展为解决反问题的最先进方法,但将其应用于PDE反问题,如电阻抗断层成像(EIT)仍具挑战性。由于物理领域自然离散为无结构网格而非规则网格,标准卷积架构往往不足。本文提出了一种新的框架,将扩散后验采样(DPS)扩展到图结构数据。我们开发了直接在2D三角网格上无条件分数基于扩散模型,以学习物理解空间的准确先验。此外,我们引入正则化变体RDPS,结合总变差和广义Tikhonov等显式正则化项,以补充隐含扩散先验并缓解严重病态问题。在合成和真实2D EIT数据集上的广泛实验表明,RDPS产生稳定、物理合理的重建。我们的方法能够很好地推广到非分布包含几何形状,对测量噪声具有高度鲁棒性,并在重建准确性和伪影减少方面优于当前最先进的求解器(例如GPnP-BM3D、DP-SGS)

英文摘要

Deep generative models have emerged as state-of-the-art for solving inverse problems, but applying them to inverse problems for PDEs, like electrical impedance tomography (EIT) remains challenging. Because physical domains are naturally discretized as unstructured meshes rather than regular grids, standard convolutional architectures are often inadequate. In this paper, we propose a novel framework that extends diffusion posterior sampling (DPS) to graph-structured data. We develop an unconditional score-based diffusion model directly on a 2D triangular mesh to learn an accurate prior over the physical solution space. Furthermore, we introduce a regularized variant, RDPS, which incorporates explicit regularization terms, such as total variation and generalized Tikhonov, to complement the implicit diffusion prior and mitigate severe ill-posedness. Extensive experiments on synthetic and real 2D EIT datasets demonstrate that RDPS produces stable, physically plausible reconstructions. Our approach generalizes well to out-of-distribution inclusion geometries, is highly robust to measurement noise, and outperforms current state-of-the-art solvers (e.g., GPnP-BM3D, DP-SGS) in reconstruction accuracy and artifact reduction.

2605.19619 2026-05-20 cs.LG cs.AI math.OC stat.ML 版本更新

MiMuon: Mixed Muon Optimizer with Improved Generalization for Large Models

MiMuon: 一种具有改进泛化能力的混合穆恩优化器用于大模型

Feihu Huang, Yuning Luo, Songcan Chen

发表机构 * College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics(南京航空航天大学计算机科学与技术学院) MIIT Key Laboratory of Pattern Analysis and Machine Intelligence(信息科技部模式分析与机器智能重点实验室) College of Design and Engineering, National University of Singapore(新加坡国立大学设计与工程学院)

AI总结 本文研究了穆恩优化器的泛化误差,提出了一种改进的混合穆恩优化器MiMuon,证明其泛化误差更低,同时保持了与穆恩优化器相同的收敛速度。

Comments 25 pages

详情
AI中文摘要

矩阵结构的参数在许多人工智能模型中频繁出现,例如大语言模型。最近,为大规模模型的矩阵参数设计了一种高效的穆恩优化器,其收敛速度明显快于向量级算法。尽管一些工作已经开始研究穆恩优化器的收敛性质(即优化误差),但其泛化性质(即泛化误差)尚未建立。因此,在本文中,我们基于算法稳定性与数学归纳法研究穆恩优化器的泛化误差,并证明穆恩优化器的泛化误差为O(1/(Nκ^T)),其中N为训练样本数量,T表示迭代次数,κ>0表示梯度估计奇异值之间的最小差。为了增强穆恩优化器的泛化能力,我们通过谨慎使用梯度的正交化,提出了一种有效的混合穆恩(MiMuon)优化器,该优化器是穆恩优化器与基于动量的SGD优化器的混合。然后我们证明我们的MiMuon优化器的泛化误差比穆恩优化器的O(1/(Nκ^T))更低,因为κ通常非常小。同时,我们还研究了我们MiMuon算法的收敛性质,并证明我们的MiMuon算法具有与穆恩算法相同的收敛速度O(1/T^{1/4})。在训练大模型(包括Qwen3-0.6B和YOLO26m)的一些数值实验结果中展示了MiMuon优化器的效率。

英文摘要

Matrix-structured parameters frequently appear in many artificial intelligence models such as large language models. More recently, an efficient Muon optimizer is designed for matrix parameters of large-scale models, and shows markedly faster convergence than the vector-wise algorithms. Although some works have begun to study convergence properties (i.e., optimization error) of the Muon optimizer, its generalization properties (i.e., generalization error) is still not established. Thus, in this paper, we study generalization error of the Muon optimizer based on algorithmic stability and mathematical induction, and prove that the Muon has a generalization error of $O\big(\frac{1}{Nκ^{T}}\big)$, where $N$ is training sample size, and $T$ denotes iteration number, and $κ>0$ denotes minimum difference between singular values of gradient estimate. To enhance generalization of the Muon, we propose an effective mixed Muon (MiMuon) optimizer by cautiously using orthogonalization of gradient, which is a hybrid of Muon and momentum-based SGD optimizers. Then we prove that our MiMuon optimizer has a lower generalization error of $O\big(\frac{1}{N}\big)$ than $O\big(\frac{1}{Nκ^{T}}\big)$ of Muon optimizer, since $κ$ generally is very small. Meanwhile, we also studied the convergence properties of our MiMuon algorithm, and prove that our MiMuon algorithm has the same convergence rate of $O(\frac{1}{T^{1/4}})$ as the Muon algorithm. Some numerical experimental results on training large models including Qwen3-0.6B and YOLO26m demonstrate efficiency of the MiMuon optimizer.

2605.19618 2026-05-20 cs.LG stat.ME 版本更新

A Family of Divergence Measures for Evaluating the Reconstruction Quality of Explainable Ensemble Trees

可解释性集成树的重建质量评估的一类发散度度量

Massimo Aria, Agostino Gnasso, Carmela Iorio

发表机构 * Department of Economics and Statistics, University of Naples Federico II(那不勒斯费德里科二世大学经济与统计系)

AI总结 本文提出了一种基于发散度的度量框架,用于评估可解释性集成树的重建质量,通过区分一致性和关联性,提供了一种新的诊断方法来识别重建失败的具体原因。

详情
AI中文摘要

验证集成学习者可解释的替代模型需要测量集成内部表示与其替代近似之间的同意程度,而不是仅仅关联性。基于相关性的方法是尺度不变的,无法检测共现结构中的系统性差异。我们提出了一种基于一致性和关联性区别的统计框架,以归一化的可解释性损失(nLoI)为中心。该框架基于Cressie-Read幂发散家族,lambda等于2,nLoI可以分解为节点内和节点间的组成部分,提供了独特的诊断能力,以精确识别重建失败的位置和原因。该框架包含四个互补的度量,捕捉替代质量的不同结构方面。统一的排列检验程序在单次重采样过程中为所有度量提供有效的推断。每个度量的理论性质,包括有界性和对称性,均已建立。蒙特卡洛模拟和实证评估证实了精确的I型错误控制,并展示了这些度量能够检测出相关性方法无法检测到的重建保真度梯度。该框架在可解释性集成树(E2Tree)的背景下开发和说明,并在三个基准数据集上的实证评估展示了该框架的实际应用价值。

英文摘要

Validating interpretable surrogate models for ensemble learners requires measuring agreement between the ensemble's internal representation and its surrogate approximation, rather than mere association. Correlation-based approaches are scale-invariant and fail to detect systematic discrepancies in co-occurrence structure. We propose a statistical framework grounded in the agreement-association distinction, centered on the normalized Loss of Interpretability (nLoI). Rooted in the Cressie-Read power divergence family with lambda equal to 2, the nLoI admits a closed-form decomposition into within-node and between-node components, providing a unique diagnostic capability to identify precisely where and why reconstruction fails. The framework incorporates four complementary measures capturing distinct structural facets of approximation quality. A unified permutation testing procedure delivers valid inference for all measures within a single resampling pass. Theoretical properties, including boundedness and symmetry, are established for each metric. Monte Carlo simulations and empirical evaluations confirm exact Type I error control and demonstrate that these measures detect reconstruction fidelity gradients invisible to correlation-based alternatives. The framework is developed and illustrated in the context of Explainable Ensemble Trees (E2Tree), and empirical evaluation on three benchmark datasets illustrates the practical utility of the framework.

2605.19610 2026-05-20 stat.ML cs.LG 版本更新

Posterior Contraction of Lévy Adaptive B-spline Regression in Besov Spaces

Lévy自适应B样条回归在Besov空间中的后验收缩

Jeunghun Oh, Sewon Park, Jaeyong Lee

发表机构 * Department of Statistics, Seoul National University(首尔国立大学统计系) Department of Statistics, Sookmyung Women’s University(淑明女子大学统计系)

AI总结 本文研究了Lévy自适应B样条回归模型在Besov空间中的后验收缩性质,证明了该模型在非参数回归框架中能够以接近最优的速率收敛到真实函数,同时自动适应未知的光滑度。

详情
AI中文摘要

我们研究了Lévy自适应B样条(LABS)回归模型的渐近性质,这是一种将B样条核纳入Lévy自适应回归核(LARK)模型的贝叶斯非参数方法。LABS应用具有不同次数的样条,并独立定义结点,从而获得一个灵活的模型类,能够适应真实函数的不规则和局部结构特征。在单变量随机设计和高斯误差的非参数回归框架中,我们证明了LABS后验在Besov类中以接近最优的速率收敛到真实函数,直至一个对数因子,同时自动适应未知的光滑度。本研究填补了文献中的空白,因为关于LARK模型在Besov空间中的后验收缩的理论结果仍然很少。在Besov空间的标准测试函数(包括Blocks、Bumps、HeaviSine和Doppler)上的模拟实验补充了理论结果,并展示了LABS的实用价值。

英文摘要

We investigate the asymptotic properties of the Lévy Adaptive B-spline (LABS) regression model, a Bayesian nonparametric method that incorporates B-spline kernels into the Lévy Adaptive Regression Kernel (LARK) model. LABS applies splines of varying degrees with independently defined knots, yielding a flexible model class capable of adapting to irregular and locally structured features of the true function. Within the nonparametric regression framework with univariate random design and Gaussian errors, we establish that the LABS posterior contracts around the true function in Besov classes at nearly minimax-optimal rates, up to a logarithmic factor, while adapting automatically to unknown smoothness. This study contributes to filling a gap in the literature, where theoretical results on posterior contraction of the LARK model in Besov spaces remain scarce. Simulation experiments on standard test functions in Besov spaces, including Blocks, Bumps, HeaviSine, and Doppler, complement the theoretical results and demonstrate the practical utility of LABS.

2605.19607 2026-05-20 cs.CV cs.AI cs.LG 版本更新

Spectral Integrated Gradients for Coarse-to-Fine Feature Attribution

基于谱积分梯度的粗到细特征归因

Soyeon Kim, Seongwoo Lim, Kyowoon Lee, Jaesik Choi

发表机构 * Korea Advanced Institute of Science and Technology(韩国科学技术院) INEEJI Corp.(INEEJI公司)

AI总结 本文提出Spectral Integrated Gradients(SIG)方法,通过奇异值分解构建积分路径,以减少噪声并提高特征归因的准确性,优于传统路径基方法。

Comments 21 pages, 13 figures, 9 tables. Accepted to ACM KDD 2026; includes appendix

详情
AI中文摘要

积分梯度(IG)是一种广泛采用的特征归因方法,满足理想的公理性质。然而,积分路径的选择显著影响归因质量,标准直线路径同时引入所有输入特征,通常在途中积累噪声梯度。为解决这一限制,我们提出了Spectral Integrated Gradients,通过基线到输入差异的奇异值分解(SVD)构建积分路径。通过逐步激活奇异成分,从最大到最小,SIG在引入全局结构之前引入细粒度细节,自然遵循粗到细的进程。通过在多种图像分类数据集上的广泛评估,我们证明SIG生成的归因图更干净,噪声更少,并在定量性能上优于现有基于路径的归因方法。我们的代码可在https://github.com/leekwoon/sig/上获得。

英文摘要

Integrated Gradients (IG) is a widely adopted feature attribution method that satisfies desirable axiomatic properties. However, the choice of integration path significantly affects the quality of attributions, and the standard straight-line path introduces all input features simultaneously, often accumulating noisy gradients along the way. To address this limitation, we propose Spectral Integrated Gradients, which constructs integration paths based on singular value decomposition (SVD) of the baseline-to-input difference. By progressively activating singular components from largest to smallest, SIG introduces global structure before fine-grained details, naturally following a coarse-to-fine progression. Through extensive evaluation across diverse image classification datasets, we demonstrate that SIG produces cleaner attribution maps with reduced noise and achieves improved quantitative performance compared to existing path-based attribution methods. Our code is available at https://github.com/leekwoon/sig/.

2605.19589 2026-05-20 cs.LG physics.flu-dyn 版本更新

Physics-Informed Graph Neural Network Surrogates for Turbulent Nanoparticle Dispersion in Dental Clinical Environments

具有物理信息的图神经网络代理用于牙科临床环境中湍流纳米粒子分散

Takshak Shende, Viktor Popov

发表机构 * Department of Mechanical Engineering, University College London (UCL)(伦敦大学学院机械工程系) Ascend Technologies Ltd(Ascend技术有限公司)

AI总结 本文提出了一种结合物理信息的图神经网络代理,用于预测牙科临床环境中湍流纳米粒子的分散过程,通过改进的图网络和物理模型提高了计算效率和准确性。

Comments 40 pages, 12 figures,

详情
AI中文摘要

牙科气溶胶程序会产生亚50微米的颗粒,这些颗粒可以在封闭的诊所中长时间悬浮,从而为空气传播病原体的传播提供途径。雷诺平均纳维-斯托克斯(RANS)模拟结合欧拉-拉格朗日粒子追踪可以准确捕捉这种传输,但每个场景的运行时间非常长,这使得在三维空间中无法实时支持临床决策。本文提出了一种欧拉-拉格朗日图交互网络(ELGIN),这是一种具有物理信息的图代理,能够同时预测载流体流动动力学在OpenFOAM多面体网格上的动态以及多分散喷雾云中每个包裹的运动。ELGIN通过可微逆距离网格-包裹耦合,将多头图变换器与雅可比预处理的可学习压力投影和湍流闭合头连接到一个sigmoid门控拉格朗日交互网络。ELGIN使用辛特尔-弗莱特积分器推进包裹。一个四阶段的物理信息课程稳定了260步自回归滚动,而无需梯度爆炸。通过foam-extend 4.1 OpenFOAM reactingParcelFoam在临床相关通风速率和手piece喷雾速度下的参数扫描提供了CFD地面真实数据。本文报告了一种单案例演示,其中ELGIN和一个仅基于拉格朗日的基线(M0)都在二十案例扫描的Sweep_Case_03上进行训练和评估;完整的16/2/2重训练正在进行,并将取代所有报告的指标。在该案例中,ELGIN比M0更紧密地跟踪foam-extend粒子云:平均包裹位移误差从房间宽度的19.56%降至16.20%,云半径-惯性误差从9.85%降至6.58%。26秒的滚动在4GB GPU上完成于约64秒,比foam-extend参考流程快约37倍,朝着多案例检查点到位后每就诊感染风险筛查的目标前进。

英文摘要

Dental aerosol procedures produce sub-50 micrometre nuclei that can remain airborne for long periods in enclosed clinics, creating pathways for airborne pathogen transmission. Reynolds-Averaged Navier-Stokes (RANS) simulations with Euler-Lagrange particle tracking capture this transport accurately but require very long run times per scenario, which precludes real-time clinical decision support in 3D. We present the Eulerian-Lagrangian Graph Interaction Network (ELGIN), a physics-informed graph surrogate that jointly predicts carrier-flow dynamics on the OpenFOAM polyhedral mesh and the per-parcel motion of the polydisperse spray cloud. ELGIN couples a multi-head Graph Transformer with Jacobi-preconditioned learnable pressure projection and a turbulence-closure head to a sigmoid-gated Lagrangian Interaction Network through differentiable inverse-distance mesh-parcel coupling, and advances parcels with a symplectic Stormer-Verlet integrator. A four-stage physics-informed curriculum stabilises 260-step autoregressive rollouts without gradient explosion. A parameter sweep with foam-extend 4.1 OpenFOAM reactingParcelFoam across clinically relevant ventilation rates and handpiece spray speeds provides CFD ground truth. This article reports a single-case demonstration in which both ELGIN and a Lagrangian-only baseline (M0) are trained and evaluated on Sweep_Case_03 of a twenty-case sweep; full 16/2/2 retraining is in progress and will replace all reported metrics. On this case, ELGIN tracks the foam-extend particle cloud much more closely than M0: mean parcel displacement error falls from 19.56% to 16.20% of room width and cloud radius-of-gyration error from 9.85% to 6.58%. A 26-second rollout completes in ~64 s on a 4 GB GPU, approximately 37x faster than the foam-extend reference pipeline, toward per-appointment infection-risk screening once the multi-case checkpoint is in place.

2605.19584 2026-05-20 cs.LG stat.ML 版本更新

Online Market Making and the Value of Observing the Order Book

在线市场做市与观察订单簿的价值

Davide Maran, Marcello Restelli

发表机构 * Politecnico di Milano(米兰理工大学)

AI总结 本文研究了在线市场做市问题,其中学习者在与持有私人估值的交易者交互时,依次发布买入和卖出价格。与现有在线学习公式假设完全截断反馈不同,我们引入了受真实限价簿启发的动作依赖反馈模型。我们证明,这种额外信息从根本上改变了问题的学习性。在随机设置中,我们提出了一种消除算法,以高概率达到O(√T)的遗憾,而无需对交易者估值分布的光滑性做出任何假设。然后我们将这一结果扩展到广泛的均值回归价格过程中,考虑了局部自回归动态和基于累积偏离均值的较弱全局漂移条件。在任一假设下,我们建立了高概率O(√T)的遗憾界,依赖于一个新的有趣的集中不等式。最后,在对抗性设置中,我们设计了探索后扰动算法,保证了期望O(T^{2/3})的遗憾。

Comments Accepted at COLT2026

详情
AI中文摘要

我们研究了一个在线市场做市问题,其中学习者在与持有私人估值的交易者交互时,依次发布买入和卖出价格。与现有在线学习公式假设完全截断反馈不同,我们引入了受真实限价簿启发的动作依赖反馈模型:当发生交易时,交易者的估值保持隐藏,而当没有发生交易时,会揭示关于供应和需求的信息反馈。我们证明,这种额外信息从根本上改变了问题的学习性。在随机设置中,我们提出了一种消除算法,以高概率达到O(√T)的遗憾,而无需对交易者估值分布的光滑性做出任何假设。然后我们将这一结果扩展到广泛的均值回归价格过程中,考虑了局部自回归动态和基于累积偏离均值的较弱全局漂移条件。在任一假设下,我们建立了高概率O(√T)的遗憾界,依赖于一个新的有趣的集中不等式。最后,在对抗性设置中,我们设计了探索后扰动算法,保证了期望O(T^{2/3})的遗憾。我们的结果量化了在线市场做市中观察订单簿的价值,并证明了即使有限的动作依赖反馈也能显著改善遗憾保证,相比标准带隙反馈模型。

英文摘要

We study an online market-making problem in which a learner sequentially posts bid and ask prices for a single asset while interacting with traders holding private valuations. Unlike existing online learning formulations that assume fully censored feedback, we introduce an action-dependent feedback model inspired by real limit order books: when a trade occurs, the trader's valuation remains hidden, whereas when no trade occurs, informative feedback about supply and demand is revealed. We show that this additional information fundamentally changes the learnability of the problem. In the stochastic setting with i.i.d. market prices, we propose an elimination-based algorithm that achieves $O(\sqrt T)$ regret with high probability, without requiring any smoothness assumptions on the distribution of trader valuations. We then extend this result to a broad class of mean-reverting price processes by considering both local, autoregressive dynamics and a weaker global drift condition based on cumulative deviations from the mean. Under either assumption, we establish high-probability $O(\sqrt T)$ regret bounds, relying on a new concentration inequality of independent interest. Finally, in the adversarial setting with oblivious prices, we design an explore-then-perturb algorithm that guarantees $O(T^{2/3})$ regret in expectation. Our results quantify the value of observing the order book in online market making and demonstrate that even limited, action-dependent feedback can substantially improve regret guarantees compared to standard bandit feedback models.

2605.19565 2026-05-20 physics.flu-dyn cs.LG 版本更新

HiLiftAeroML: High-Fidelity Computational Fluid Dynamics Dataset for High-Lift Aircraft Aerodynamics

HiLiftAeroML:高保真计算流体力学数据集用于高升力飞机气动性能

Neil Ashton, Adam Clark, Liam Heidt, Christopher Ivey, Sanjeeb Bose, Rahul Agrawal, Konrad Goc, Rishi Ranade, Corey Adams, Peter Sharpe, Sheel Nidhan, Semit Akkurt, Daniel Leibovici, Jean Kossaifi

发表机构 * nvidia

AI总结 本文介绍了一个首个开源的高保真计算流体力学数据集,用于AI代理模型开发,该数据集包含1800个样本,源自180种几何变体和10个攻角的NASA通用研究模型(CRM)几何体,用于AIAA高升力预测工作坊系列。该数据集的创新之处在于使用GPU加速的高保真显式壁模式LES方法进行每个模拟,使用300M到500M的适应性网格,以确保在已知的稳态RANS方法在飞行包线部分的挑战下尽可能高的精度。整个数据集(几何体、时间平均体积和表面变量以及积分力)免费提供,带有宽松的开源许可(CC-BY-4.0)。通过公开发布此数据,我们旨在加速航空航天工业中AI代理建模的研究与开发。

详情
AI中文摘要

本文描述了首个开源的高保真计算流体力学数据集,用于AI代理模型开发。该数据集由1800个样本组成,源自180种几何变体和10个攻角的高升力NASA通用研究模型(CRM)几何体,用于AIAA高升力预测工作坊系列。该数据集的一个创新点是使用GPU加速的高保真显式壁模式LES方法进行每个模拟,使用300M到500M的适应性网格。这确保了在已知的稳态RANS方法在飞行包线部分的挑战下尽可能高的精度。整个数据集(几何体、时间平均体积和表面变量以及积分力)免费提供,带有宽松的开源许可(CC-BY-4.0)。通过公开发布此数据,我们旨在加速航空航天工业中AI代理建模的研究与开发。

英文摘要

This paper describes the first-ever open-source high-fidelity CFD dataset of a high-lift aircraft for the purpose of AI surrogate model development. The dataset is composed of 1800 samples, arising from 180 geometry variants and 10 angles of attack for the high-lift NASA Common Research Model (CRM) geometry, used within the AIAA High-Lift Prediction Workshop series. One of the novelties of this dataset is the use of a GPU-accelerated high-fidelity explicit, wall-modeled LES approach for each simulation, using solution-adapted grids between 300M and 500M cells. This ensures the greatest possible accuracy given known challenges in steady-state RANS approaches for these portions of the flight envelope. The entire dataset (geometries, time-averaged volume and surface variables and integral forces) are available, free of charge with a permissive open-source license (CC-BY-4.0). By making this data publicly available, we aim to accelerate the research and development of AI surrogate modeling within the aerospace industry.

2605.19562 2026-05-20 cs.RO cs.LG math.OC 版本更新

Learning-Accelerated Optimization-based Trajectory Planning for Cooperative Aerial-Ground Handover Missions

基于学习的优化轨迹规划用于协作的空中-地面切换任务

Jingshan Chen, Bochen Yu, Henrik Ebel, Peter Eberhard

发表机构 * Institute of Engineering and Computational Mechanics, University of Stuttgart, 70569 Stuttgart, Germany(工程与计算力学研究所,斯图加特大学,德国斯图加特70569) Mechanical Engineering, LUT University, 53850 Lappeenranta, Finland(机械工程,卢蒂大学,芬兰拉佩恩兰塔53850)

AI总结 本文提出了一种结合学习的轨迹规划框架,用于协同无人 aerial 和 ground 车辆的切换任务,通过使用解耦的编码器-解码器 LSTM 网络生成协调的切换轨迹预测,从而加速优化过程,实现更快的收敛和更高的优化成功率。

Comments Preprint of a contribution accepted for publication in the RoManSy 2026 Springer proceedings

详情
AI中文摘要

本文提出了一种基于学习的轨迹规划框架,用于协同无人 aerial 和 ground 车辆的切换任务。尽管集中式轨迹优化能够确保动态可行性和任务最优性,但其高计算成本限制了实时应用。我们提出了一种神经代理规划器,利用解耦的编码器-解码器长短期记忆(LSTM)网络,从任务规范中生成协调的切换轨迹预测。这些预测作为下游集中优化器的有信息的预热启动,从而加速收敛到动态可行的解决方案。基准评估显示,与冷启动优化相比,结合学习的规划框架在速度上提高了三倍以上,并实现了100%的优化成功率。结果表明,结合数据驱动推断与模型驱动细化能够为异构多机器人系统提供快速且可靠的轨迹生成。

英文摘要

This paper presents a learning-augmented trajectory planning framework for cooperative unmanned aerial vehicle (UAV) and unmanned ground vehicle (UGV) handover missions. While centralized trajectory optimization ensures dynamic feasibility and task optimality, its high computational cost limits real-time applicability. We propose a neural surrogate planner utilizing decoupled encoder-decoder long short-term memory (LSTM) networks to generate coordinated handover trajectory predictions from the task specifications. These predictions serve as informed warm starts for the downstream centralized optimizer, thereby accelerating convergence to dynamically feasible solutions. Benchmark evaluations demonstrate that the learning-augmented planning framework achieves more than a threefold speedup and 100% optimization success rate compared to cold start optimization. The results indicate that combining data-driven inference with model-based refinement enables fast and reliable trajectory generation for heterogeneous multi-robot systems.

2605.19561 2026-05-20 cs.LG cs.AI 版本更新

TORQ: Two-Level Orthogonal Rotation for MXFP4 Quantization

TORQ:MXFP4量化中的两级正交旋转

Zukang Xu, Xing Hu, Dawei Yang

发表机构 * Open Compute Project(开放计算项目)

AI总结 本文提出TORQ框架,通过优化坐标变换重塑激活空间的几何属性,解决MXFP4激活量化中的精度下降问题,显著提升量化精度。

Comments 17 pages, 4 figures, 13 tables

详情
AI中文摘要

随着大型语言模型(LLMs)向实际部署迈进,微缩FP4(MXFP4)格式已成为下一代低比特推断的基石,因其在高动态范围与硬件效率之间的平衡能力。然而,直接将MXFP4应用于LLM激活量化不可避免地导致显著的精度下降。在本文中,我们从理论上分析MXFP4激活量化的误差结构,揭示出性能下降的根本原因在于激活分布与MXFP4块浮点格式之间的两个结构性不平衡:(1)极端块间方差不平衡和(2)块内代码书利用不平衡。为了解决这些挑战,我们提出了TORQ(MXFP4量化中的两级正交旋转),一种无训练的后训练量化(PTQ)框架,通过最优坐标变换重塑激活空间的几何属性。在宏观层面,TORQ利用Schur-Horn定理通过块间正交旋转重新分配激活能量,防止高方差块驱动共享缩放因子,从而保留小幅度元素的精度。在微观层面,TORQ采用最大熵引导的块内旋转以缓解代码书坍塌并最大化MXFP4代码书的信息容量。在主流LLM如LLaMA3和Qwen3上的实验表明,与现有方法相比,TORQ显著提高了MXFP4激活量化的准确性:在Qwen3-32B上,WikiText的困惑度降低到8.43(相比BF16的7.61),平均准确率从直接RTN的38.40%增加到73.63%(相比BF16的74.82%),大幅缩小了4位浮点量化与全精度推断之间的差距。

英文摘要

As Large Language Models (LLMs) advance toward practical deployment, the Microscaling FP4 (MXFP4) format has emerged as a cornerstone for next-generation low-bit inference, owing to its ability to balance high dynamic range with hardware efficiency. However, directly applying MXFP4 to LLM activation quantization inevitably leads to significant accuracy degradation. In this paper, we theoretically analyze the error structure of MXFP4 activation quantization, revealing that the root cause of this performance drop lies in two structural imbalances between activation distributions and the MXFP4 block floating-point format: (1) extreme inter-block variance imbalance and (2) intra-block codebook utilization imbalance. To address these challenges, we propose TORQ (Two-level Orthogonal Rotation for MXFP4 Quantization), a training-free Post-Training Quantization (PTQ) framework designed to reshape the geometric properties of the activation space through optimal coordinate transformations. At the macroscopic level, TORQ leverages the Schur-Horn theorem to redistribute activation energy via inter-block orthogonal rotation, preventing high-variance blocks from driving up shared scaling factors and thereby preserving the precision of small-magnitude elements. At the microscopic level, TORQ employs maximum-entropy-guided intra-block rotation to alleviate codebook collapse and maximize the MXFP4 codebook's information capacity. Experiments on mainstream LLMs such as LLaMA3 and Qwen3 show that TORQ significantly improves the accuracy of MXFP4 activation quantization compared to existing methods: on Qwen3-32B, the perplexity on WikiText is reduced to 8.43 (vs. 7.61 for BF16), and the average accuracy increases from 38.40% with direct RTN to 73.63% (vs. 74.82% for BF16), substantially narrowing the gap between 4-bit floating-point quantization and full-precision inference.

2605.19557 2026-05-20 stat.ML cs.LG 版本更新

Density-Ratio Losses for Post-Hoc Learning to Defer

基于密度比损失的后验学习延迟

Alexander Soen, Ragnar Thobaben, Joakim Jaldén, Richard Nock

发表机构 * KTH(皇家理工学院) Google Research(谷歌研究)

AI总结 本文研究了后验学习延迟(L2D)问题,通过理想分布的视角定义延迟,并提出基于密度比损失的CPE损失函数,通过阈值判断延迟决策,从而在不重新训练的情况下调整延迟率,同时揭示了Chow规则与专家倾斜贝叶斯后验之间的联系。

Comments Preprint

详情
AI中文摘要

我们通过理想分布的视角研究后验学习延迟(L2D)。理想分布被定义为在其中模型能够取得低损失的数据分布的密度比重加权。我们通过将密度比估计还原为类别概率估计,推导出用于后验L2D评分器的DR CPE损失。延迟决策通过阈值化评分器进行,允许在不重新训练的情况下调整延迟率。对于基于KL的理想分布,我们的延迟规则在原始分布下恢复Chow规则,并在理想分布是联合或边缘分布时与专家倾斜的贝叶斯后验建立联系。实验表明,我们的方法在与常见基线相比具有竞争力,并且在不同数据集设置下更加稳健。更广泛地说,我们的结果将后验L2D视为理想分布之间的密度比学习,连接了Chow式规则、专家比较以及阐明了与异常检测等其他学习设置的相关联系。

英文摘要

We study post-hoc Learning to Defer (L2D) through the lens of ideal distributions: divergence-regularized reweightings of the data distribution under which a model attains low loss. We define deferral via the density-ratio between a model's and an expert's ideals. Using the reduction from density-ratio estimation to class-probability estimation, we derive the DR CPE losses for post-hoc L2D scorers. Deferral decisions are then made by thresholding the scorer, allowing deferral rates to be adjusted without retraining. For KL-based ideal distributions, our deferral rules recovers Chow's rule under the original distribution and a connection to an expert-tilted Bayes posterior -- which incorporates the expert's performance -- depending on if the ideal distributions are joint or marginal distributions. Experimentally, our approach is competitive compared to common baselines and more robust across dataset settings. More broadly, our results cast post-hoc L2D as density-ratio learning between ideal distributions, bridging Chow-style rules, expert comparison, and elucidating connections to related learning settings including anomaly detection.

2605.19549 2026-05-20 cs.SE cs.LG 版本更新

Provable Fairness Repair for Deep Neural Networks

深度神经网络的可证公平修复

Jianan Ma, Jingyi Wang, Qi Xuan, Zhen Wang

发表机构 * Hangzhou Dianzi University, China(杭州电子科技大学) Zhejiang University, China(浙江大学) Zhejiang University of Technology, China(浙江工业大学)

AI总结 本文提出ProF框架,通过区间界限传播技术,为深度神经网络提供可证的公平性修复,实现对偏见样本周围整个集合的公平性保障,并在多个基准数据集上验证了其有效性。

Comments 15 pages, 6 figures, 7 tables. full version of the paper accepted by ASE 2025

详情
Journal ref
Proceedings of the 40th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2025
AI中文摘要

深度神经网络(DNNs)正面临诸如个体歧视等伦理问题。为此,已开发出大量NN修复技术来调整模型并减轻此类不良行为。然而,现有公平性修复方法通常是数据驱动的,往往缺乏可证保证和对未见过样本的泛化能力。为克服这些限制,我们提出了ProF,一种具有可证保证的新型公平性修复框架。ProF的核心思想是利用区间界限传播(一种广泛使用的神经网络验证技术)来在偏见样本x周围的整个集合S(x)上准确捕捉模型输出。所推导的界限用于指导公平性修复,促使模型在S(x)上产生一致的输出。具体而言,我们将公平性约束和模型修改整合到统一的约束求解公式中,该公式可转换为可由现成求解器解决的混合整数线性规划(MILP)问题。MILP问题的解有效地诱导出一个具有整体S(x)公平性保障的修复模型。我们在四个广泛使用的基准数据集上评估了ProF,并证明其实现了可证公平性修复,在完整数据集上的泛化能力高达95.93%,在整个输入空间上为93.16%。值得注意的是,ProF可以轻松配置以支持多种敏感属性和更实际的公平性定义,同时提供可证修复保证,并实现约90%的公平性提升。我们的代码可在https://github.com/nninjn/ProF上获得。

英文摘要

Deep neural networks (DNNs) are suffering from ethical issues such as individual discrimination. In response, extensive NN repair techniques have been developed to adjust models and mitigate such undesired behaviors. However, existing fairness repair methods are typically data-centric, which often lack provable guarantees and generalization to unseen samples. To overcome these limitations, we propose ProF, a novel fairness repair framework with provable guarantees. The key intuition of ProF is to leverage interval bound propagation (a widely used NN verification technique) to soundly capture model outputs over the whole set $S(\mathbf{x})$ around a biased sample $\mathbf{x}$. The derived bounds are utilized to guide fairness repair which encourages the model to produce consistent outputs on $S(\mathbf{x})$. Specifically, we integrate fairness constraints and model modifications into a unified constraint-solving formulation, which can be transformed to a Mixed-Integer Linear Programming (MILP) problem solvable by off-the-shelf solvers. The solution to the MILP problem effectively induces a repaired model with guaranteed fairness over the whole set $S(\mathbf{x})$. We evaluate ProF on four widely used benchmark datasets and demonstrate that it achieves provable fairness repair, with generalization of up to 95.93\% on full datasets and 93.16\% on the entire input space. Notably, ProF can be easily configured to support multiple sensitive attributes and more practical fairness definitions, while providing provable repair guarantees and delivering around 90\% fairness improvement. Our code is available at https://github.com/nninjn/ProF.

2605.19532 2026-05-20 cs.CV cs.LG 版本更新

Boosting Text-to-Image Diffusion Models via Core Token Attention-Based Seed Selection

通过基于核心标记注意力的种子选择提升文本到图像扩散模型

Yunzhe Zhang, Hongfu Liu, Pengyu Hong

发表机构 * Brandeis University(布兰迪大学)

AI总结 本文研究了文本到图像扩散模型中种子对生成质量的影响,提出基于核心标记注意力的种子选择方法,无需训练即可提升文本与图像的一致性及视觉质量。

Comments Preprint

详情
AI中文摘要

文本到图像扩散模型能够生成高质量的图像,但其输出对随机种子极为敏感:不同的初始种子往往导致图像质量和提示词与图像的一致性产生显著差异。我们重新审视这一

英文摘要

Text-to-image diffusion models can synthesize high-quality images, yet the outcome is notoriously sensitive to the random seed: different initial seeds often yield large variations in image quality and prompt-image alignment. We revisit this "seed effect" and show that attention dynamics over prompt core tokens, the content-bearing words, measured during the first few denoising steps, strongly predict final generation quality. Building on this observation, we introduce Attention-Based Seed Selection (ABSS), a training-free, plug-and-play method that ranks seeds for a given prompt by leveraging cross-attention to core tokens during the denoising process. ABSS requires no finetuning and does not alter the initial noise; it scores and ranks all candidate seeds, keeps only the top-k for full generation, and discards the rest, without relying on a fixed accept/reject threshold. Operating purely at inference time, ABSS can serve as a lightweight pre-selection add-on for existing seed-optimization pipelines, enabling additional gains. Across three benchmarks, extensive experiments show that ABSS enables consistent improvements in text-image alignment and visual quality for Stable Diffusion variants, as corroborated by human preference and alignment metrics.

2605.19516 2026-05-20 cs.CL cs.AI cs.LG 版本更新

Base Models Look Human To AI Detectors

基础模型对AI检测器看起来很像人类

Yixuan Even Xu, Ziqian Zhong, Aditi Raghunathan, Fei Fang, J. Zico Kolter

发表机构 * Carnegie Mellon University(卡内基梅隆大学)

AI总结 本研究发现基础模型生成的文本在AI检测器中常被误判为人类生成,提出HIP方法通过迭代改写提升检测器规避能力,揭示当前检测器更关注指令调优和局部上下文而非通用机器生成文本特征。

Comments 39 pages, 9 figures

详情
AI中文摘要

随着AI生成文本在现实世界大规模应用,机构越来越多地使用商业AI文本检测器,尤其是在教育和学术诚信流程中。我们报告了一个令人惊讶的经验发现:当用GPTZero和Pangram评估时,基础模型生成的文本往往被判断为高度人类化,而经过指令调优的模型生成的文本则不具有这种特性。基于这一观察,我们提出了Humanization by Iterative Paraphrasing (HIP),一种不依赖特定检测器的管道,它最小化地微调基础模型为改写器并迭代应用。与我们测试的基线相比,HIP在商业检测器上实现了更好的语义保留与检测器规避的平衡。在Llama-3和Qwen-3系列模型中,从0.6B到70B的不同规模上,HIP始终提高了检测器的人类化程度。我们的发现表明,当前检测器更关注指令调优和局部上下文而非任何通用机器生成文本的不变特征。这反过来要求检测器设计更明确地建模这些因素。

英文摘要

As AI-generated text enters the real-world at scale, institutions increasingly use commercial AI-text detectors, especially in education and academic-integrity workflows. We report a surprising empirical finding about such systems: when evaluated by GPTZero and Pangram, generated text from base models is often judged overwhelmingly human, whereas text generated by their instruction-tuned counterparts is not. Building on this observation, we propose Humanization by Iterative Paraphrasing (HIP), a detector-agnostic pipeline that minimally fine-tunes a base model into a paraphraser and applies it iteratively. Compared with the baselines we test, HIP yields a stronger trade-off between semantic preservation and detector evasion on commercial detectors. Across Llama-3 and Qwen-3 families, spanning model sizes from 0.6B to 70B, HIP consistently improves detector human-likeness. Our findings suggest that current detectors are tracking artifacts of instruction tuning and local context more than any invariant notion of machine-generated text. This, in turn, calls for detector designs that model these factors more explicitly.

2605.19483 2026-05-20 cs.LG 版本更新

Adynamical systems view of training generativemodels and the memorization phenomenon

用动力系统观点看训练生成模型及记忆现象

Siva Athreya, Chiranjib Bhattacharya, Vivek S. Borkar

发表机构 * International Institute for Theoretical Sciences(理论科学国际研究所) Department of Computer Science and Automation(计算机科学与自动化系) Indian Institute of Science(印度科学研究所) Department of Electrical Engineering(电气工程系) Indian Institute of Technology Bombay(博亚理工大学)

AI总结 本文从动力系统角度分析生成模型训练中的记忆现象,通过研究SGD中的时间尺度差异及崩溃现象,揭示生成模型在训练过程中产生相同或相似输出的机制。

Comments 12 pages

详情
AI中文摘要

利用作者之一(VSB)关于生成模型崩溃和高维随机梯度下降中双时间尺度动态的研究,本文从系统理论角度解释了生成模型中的记忆现象。这纯粹依赖于训练阶段的动力学特性。具体来说,我们使用Austin [2016] 的结果,提出一个简化的SGD损失函数模型,其中损失函数对某些变量有强依赖性,对其他变量有弱依赖性。这自然导致常数步长SGD中存在两个不同的时间尺度。这一事实已被用于解释SGD中的双下降现象(Borkar [2026])。结合Borkar [2025a] 中开发的SGD崩溃现象数学模型,我们利用Azizian等人 [2024] 的最新结果,分析常数步长SGD,以解释记忆现象,即在同时进行调优的生成模型中,输出在显著时间段内保持相同或相似。这为机器学习文献中报告的上述现象及其相互关系提供了新的视角,使用动力系统观点。

英文摘要

Using recent works of one of the authors (VSB) on collapse in generative models and two time scale dynamics in stochastic gradient descent in high dimensions, we give a system theoretic explanation of the memorization phenomenon in generative models. This relies purely on the dynamic aspects of the training phase. Specifically, we use a result of Austin [2016] to motivate a stylized model for the loss function for stochastic gradient descent (SGD) wherein the loss function has a strong dependence on some variables and weak dependence on the rest in a precise sense. This naturally leads to two distinct time scales in the constant step size SGD that is commonly used in machine learning. This fact has been used to explain the double descent phenomenon in SGD in Borkar [2026]. In conjunction with a mathematical model for collapse phenomenon in SGD developed in Borkar [2025a], we analyze the constant step size SGD using the recent results of Azizian et al. [2024] in order to explain the phenomenon of memorization wherein a generative model that is concurrently being tuned yields the same or similar outputs for significant stretches of time. This gives a novel perspective on the aforementioned phenomena reported in machine learning literature and their interrelationships, using a dynamical systems viewpoint.

2605.19470 2026-05-20 cs.CL cs.LG 版本更新

Drifting Objectives for Refining Discrete Diffusion Language Models

漂移目标用于细化离散扩散语言模型

Daisuke Oba, Hiroki Furuta, Naoaki Okazaki

发表机构 * Institute of Science Tokyo(东京科学研究院) AIST(日本产业技术综合研究所) NII LLMC(日本信息处理学会LLMC)

AI总结 本文研究如何将漂移方法应用于离散扩散语言模型,通过引入TokenDrift目标,将类别预测提升为软令牌特征,并在冻结语义空间中应用反称漂移,从而提升生成质量。

Comments Project page: https://daioba.github.io/tokendrift/

详情
AI中文摘要

离散扩散语言模型(DDLMs)通过迭代去噪类别令牌序列生成文本,而近期针对连续生成器的漂移方法表明,部分采样时间的修正可以通过反称固定点目标在训练中吸收。我们研究如何将这一原理转移到DDLMs中,其中主要挑战是与离散文本的接口:硬令牌样本不可微,类别预测不直接提供连续样本进行漂移。我们提出了TokenDrift,一种漂移目标,将类别预测提升为软令牌特征,在冻结的语义空间中应用反称漂移,并将由此产生的stop-gradient特征目标反向传播到DDLM的logits中。在受控的持续训练实验中,使用掩码和均匀状态扩散基础架构,TokenDrift在匹配的延续基线之上提升了固定NFE生成质量,在MDLM上将Gen.-PPL在4 NFEs时降低了89%,在DUO上降低了86%。这些结果表明,漂移可以为DDLMs提供实用的细化目标。

英文摘要

Discrete diffusion language models (DDLMs) generate text by iteratively denoising categorical token sequences, while recent drifting methods for continuous generators suggest that part of this sampling-time correction can instead be absorbed into training through an anti-symmetric fixed-point objective. We study how to transfer this principle to DDLMs, where the main challenge is the interface with discrete text: hard token samples are non-differentiable, and categorical predictions do not directly provide continuous samples to drift. We formulate TokenDrift, a drifting objective that lifts categorical predictions to soft-token features, applies anti-symmetric drifting in a frozen semantic space, and backpropagates the resulting stop-gradient feature target to DDLM logits. In controlled continual-training experiments with masked and uniform-state diffusion backbones, TokenDrift improves fixed-NFE generation quality over matched continuation baselines, reducing Gen.-PPL at 4 NFEs by 89% on MDLM and 86% on DUO. These results suggest that drifting can provide a practical refinement objective for DDLMs.

2605.19469 2026-05-20 cs.LG cs.AI cs.RO 版本更新

Sampling-Based Safe Reinforcement Learning

基于采样的安全强化学习

Luca Vignola, Bruce D. Lee, Manish Prajapat, Manuel Wendl, Melanie Zeilinger, Andreas Krause, Yarden As

发表机构 * ETH Zurich(苏黎世联邦理工学院)

AI总结 本文提出了一种基于采样的安全强化学习方法,通过在有限的动力学样本集上联合施加约束,确保学习过程中的安全性,并在连续域中提供实用的安全保证,同时通过限制认知不确定性实现了高效的探索。

详情
AI中文摘要

安全探索仍然是强化学习(RL)中的基本挑战,限制了RL智能体在现实世界中的部署。我们提出了一种基于采样的安全强化学习(SBSRL),这是一种基于模型的RL算法,通过在有限的动力学样本集上联合施加约束,确保学习过程中的安全性。这种形式近似了在不确定动力学下的不可行最坏情况优化,并在连续域中实现了实用的安全保证。我们进一步引入了一种基于限制认知不确定性的探索策略,消除了显式探索奖励的需要。在常规条件下,我们推导了学习过程中安全性的高概率保证以及恢复近最优策略的有限时间样本复杂度界。实验证明,SBSRL在仿真和真实机器人硬件中均实现了安全且高效的探索,并可轻松扩展到实际的深度集合实现,以解决高维连续控制问题。

英文摘要

Safe exploration remains a fundamental challenge in reinforcement learning (RL), limiting the deployment of RL agents in the real world. We propose Sampling-Based Safe Reinforcement Learning (SBSRL), a model-based RL algorithm that maintains safety throughout the learning process by enforcing constraints jointly across a finite set of dynamics samples. This formulation approximates an intractable worst-case optimization over uncertain dynamics and enables practical safety guarantees in continuous domains. We further introduce an exploration strategy based on constraining epistemic uncertainty, eliminating the need for explicit exploration bonuses. Under regularity conditions, we derive high-probability guarantees of safety throughout learning and a finite-time sample complexity bound for recovering a near-optimal policy. Empirically, SBSRL achieves safe and efficient exploration both in simulation and in real robotic hardware, and readily extends to practical deep-ensemble implementations that scale to high-dimensional continuous control problems.

2605.19462 2026-05-20 cs.LG cs.AI 版本更新

Quantifying the Pre-training Dividend: Generative versus Latent Self-Supervised Learning for Time Series Foundation Models

量化预训练红利:生成与潜在自监督学习在时间序列基础模型中的应用

Noam Major, Kathy Razmadze, Yoli Shavit

发表机构 * Faculty of Engineering, Bar-Ilan University(巴伊兰大学工程学院)

AI总结 本文研究了自监督学习在时间序列中的应用,比较了生成范式与潜在对齐架构,发现预训练红利在异常检测和分类任务中显著提升,但在预测任务中效果有限,同时表明表示质量与数据来源无关,且在适度的架构深度下趋于稳定。

详情
AI中文摘要

自监督学习(SSL)在视觉和自然语言处理中的成功促使其在时间序列中的快速应用。然而,研究主要集中在生成范式和预测任务上,未量化学习表示的广泛应用。我们建立了一个受控框架来评估“预训练红利”:SSL在多样时间任务中的价值。我们系统比较了生成范式与潜在对齐架构,引入了适用于时间序列的LeJEPA和DINO的变体。这些变体利用离散小波变换(DWT)增强来强制对局部波动的不变性。我们的分析揭示预训练红利高度不对称:SSL在异常检测和分类任务中可获得高达375%的收益,但在预测任务中效果有限。我们证明表示的实用性非普遍,由精度-不变性权衡决定,任务所需的特定信号分辨率必须与目标一致。最后,我们显示表示质量与数据来源无关,并在适度的架构深度下趋于稳定,表明通过大规模合成生成可实现扩展。我们的代码可在:https://github.com/noammajor/Models 获取。

英文摘要

The success of self-supervised learning (SSL) in vision and NLP has motivated its rapid adoption for time series. However, research has focused primarily on Generative paradigms and forecasting tasks, leaving the broader utility of learned representations unquantified. We establish a controlled framework to evaluate the "pre-training dividend": the value added by SSL across diverse temporal tasks. We systematically compare Generative paradigms against Latent Alignment architectures, introducing adaptations of LeJEPA and DINO for time series. These adaptations utilize Discrete Wavelet Transform (DWT) augmentations to enforce invariance to local fluctuations. Our analysis reveals that the pre-training dividend is highly asymmetric: SSL yields gains of up to 375% for anomaly detection and classification, yet remains marginal for forecasting. We demonstrate that representational utility is non-universal, governed by a precision-invariance trade-off where the specific signal resolution required by the task must align with the objective. Finally, we show that representation quality is largely independent of data origin and saturates at moderate architectural depths, suggesting a path to scaling via massive synthetic generation. Our code is available at: https://github.com/noammajor/Models

2605.19458 2026-05-20 cs.LG 版本更新

Implicit Bias of Mirror Flow in Homogeneous Neural Networks: Sparse and Dense Feature Learning

隐式偏置与同质神经网络中的稀疏和密集特征学习

Tom Jacobs, Guido Montufar

发表机构 * CISPA Helmholtz Center(CISPA海德堡中心) UCLA(加州大学洛杉矶分校) MPI MiS(马克斯·普朗克研究所(MiS))

AI总结 研究隐式偏置如何影响同质神经网络中的稀疏和密集特征学习,通过推导新的平衡方程和实验验证,揭示了镜像流在优化动态和分类器几何结构中的作用。

Comments 36 pages, 14 figures

详情
AI中文摘要

我们研究了在具有同质激活函数的深度神经网络中,镜像流达到的最大边际解。扩展经典梯度流结果,我们从凸对偶性推导出镜像流的新平衡方程,从而能够表征诱导边际的水平函数。我们进一步建立了最大边际特征以及收敛速度和范数增长估计。最后,我们通过合成数据集和标准视觉任务的实验支持我们的理论。具体而言,我们显示:(1)不同的非同质镜像映射可以诱导相同的最大边际解;(2)收敛可以非常缓慢,包括指数级缓慢的区域;以及(3)尽管所有考虑的镜像映射都表现出特征学习,但它们可以产生从稀疏到密集神经元激活的明显不同表示。这些结果为同质神经网络中的稀疏和密集特征学习提供了统一的视角,突显了镜像映射如何影响优化动态和学习分类器的几何结构。

英文摘要

We study the max-margin solutions reached by mirror flow in deep neural networks with homogeneous activation functions. Extending classical results on gradient flow, we derive a novel balance equation for mirror flow from convex duality, enabling a characterization of the horizon function governing the induced margin. We further establish max-margin characterizations together with convergence rates and norm growth estimates. Finally, we support our theory through experiments on synthetic datasets and standard vision tasks. Concretely, we show that: (1) distinct non-homogeneous mirror maps can induce the same max-margin solution; (2) convergence can be extremely slow, including exponentially slow regimes; and (3) although all considered mirror maps exhibit feature learning, they can produce markedly different representations, ranging from sparse to dense neuron activations. Together, these results provide a unified perspective on sparse and dense feature learning in homogeneous neural networks, highlighting how mirror maps shape both optimization dynamics and the geometry of the learned classifiers.

2605.19436 2026-05-20 cs.LG cs.CL cs.CV 版本更新

CEPO: RLVR Self-Distillation using Contrastive Evidence Policy Optimization

CEPO: 使用对比证据策略优化进行RLVR自蒸馏

Ahmed Heakl, Abdelrahman M. Shaker, Youssef Mohamed, Rania Elbadry, Omar Fetouh, Fahad Shahbaz Khan, Salman Khan

发表机构 * MBZUAI Linköping University(林雪平大学) Australian National University(澳大利亚国立大学)

AI总结 本文提出CEPO,通过对比证据策略优化解决RLVR中自蒸馏的问题,通过区分关键推理步骤与填充内容来提升模型性能。

Comments 9 pages

详情
AI中文摘要

当模型在强化学习中产生正确解时,每个token都会收到相同的奖励信号,无论其是关键推理步骤还是语法填充。一种自然的解决方法是将模型条件化为正确的答案作为教师,识别出模型在知道答案时会生成不同的token。先前的工作表明,这种方法要么通过泄露答案到梯度而破坏训练,要么产生弱信号,无法区分关键步骤和填充内容,因为两者在模型基线下看起来同样令人惊讶。我们提出对比证据策略优化(CEPO),在每个token上提出更尖锐的问题:不仅“正确答案是否偏好此token?”而且“正确答案是否偏好它,而错误答案是否厌恶它?”满足两者的是真正的推理步骤;不满足的是填充内容。错误答案的教师是从训练批次中已有的拒绝rollouts构造的,不增加额外的采样成本。我们证明CEPO继承了先前最先进状态下的所有结构安全保证,同时在关键token上严格提高信用,改进在填充位置恰好消失。实验表明,CEPO在五个多模态数学推理基准上分别达到43.43%和60.56%的平均准确率(在2B和4B规模下),而GRPO在相同训练预算下为41.17%和57.43%。分布匹配自蒸馏方法(OPSD、SDPO)在未训练基线下表现低于,实验证实了我们的理论预测的信息泄漏。我们的代码可在https://github.com/ahmedheakl/CEPO上获得。

英文摘要

When a model produces a correct solution under reinforcement learning with verifiable rewards (RLVR), every token receives the same reward signal regardless of whether it was a decisive reasoning step or a grammatical filler. A natural fix is to condition the model on the correct answer as a teacher, identifying tokens it would have generated differently had it known the answer. Prior work shows this either corrupts training by leaking the answer into the gradient, or produces a weak signal that cannot distinguish decisive steps from filler, since both look equally surprising relative to the model's baseline. We propose Contrastive Evidence Policy Optimization (CEPO), which asks a sharper question at every token: not just "does the correct answer favor this token?" but "does the correct answer favor it while the wrong answer disfavors it?" A token satisfying both is a genuine reasoning step; one satisfying neither is filler. The wrong-answer teacher is constructed from rejected rollouts already in the training batch, incurring no additional sampling cost. We prove CEPO inherits all structural safety guarantees of the prior state of the art while strictly sharpening credit at decisive tokens, with the improvement vanishing exactly at filler positions. Empirically, CEPO achieves 43.43% and 60.56% average accuracy across five multimodal mathematical reasoning benchmarks at 2B and 4B scale, respectively, versus 41.17% and 57.43% for GRPO under identical training budgets. Distribution-matching self-distillation methods (OPSD, SDPO) fall below the untrained baseline, empirically confirming the information leakage our theory predicts. Our code is available at https://github.com/ahmedheakl/CEPO.

2605.19425 2026-05-20 cs.LG cs.AI 版本更新

When to Stop Reusing: Dynamic Gradient Gating for Sample-Efficient RLVR

何时停止重用:动态梯度门控用于样本高效的RLVR

Yuchun Miao, Sen Zhang, Yuqi Zhang, Yaorui Shi, Qi Gu, Xunliang Cai, Lefei Zhang

发表机构 * National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University(国家多媒体软件工程研究中心,武汉大学计算机学院) Meituan Longcat Team(美团Longcat团队) The University of Sydney(悉尼大学) University of Science and Technology of China(中国科学技术大学)

AI总结 本文提出动态梯度门控(DGG)方法,通过实时监控lm_head梯度范数来检测并阻止有害的梯度传播,从而提高样本效率和训练速度。

Comments 23 pages, 10 figures

详情
AI中文摘要

可验证奖励的强化学习(RLVR)已成为大型语言模型(LLMs)高级推理的主要范式,但获取rollout样本成本高昂,使得样本效率成为关键瓶颈。一种自然的解决方法是将每个rollout批次用于多个梯度更新,这是经典强化学习中的标准做法。然而在RLVR中,这会放大策略偏移,导致严重性能下降。检测降级的早期迹象并停止重用仍是一个开放且具有挑战性的问题。我们通过识别不均衡权重分歧(DWD)现象来填补这一空白:性能下降与lm_head权重变化的急剧上升同步,而中间层保持稳定。经验上,我们验证DWD在各种LLM和任务中一致出现。理论上,我们证明(i)有害梯度集中在lm_head,而中间层在结构上被衰减,(ii)lm_head梯度范数下界了策略偏移。这些结果确立了lm_head梯度范数作为灾难性策略偏移的原理性、实时信号。基于这一见解,我们提出动态梯度门控(DGG),一种轻量级干预,通过实时监控lm_head梯度范数并在有害梯度污染优化器前拦截它们。DGG在数学、ALFWorld、WebShop和搜索增强型问答任务中一致匹配或超过标准单次使用基线,实现高达2.93倍的样本效率和2.14倍的墙钟加速。

英文摘要

Reinforcement Learning with Verifiable Rewards (RLVR) has become the dominant paradigm for advanced reasoning in Large Language Models (LLMs), but rollout samples are expensive to obtain, making sample efficiency a critical bottleneck. A natural remedy is to reuse each rollout batch for multiple gradient updates, a standard practice in classical RL. Yet in RLVR, this amplifies policy shift, leading to severe performance degradation. Detecting the onset of degradation early enough to stop reuse remains an open and challenging problem. We close this gap by identifying the \textit{Disproportionate Weight Divergence (DWD)} phenomenon: performance degradation is synchronized with a sharp surge in the \texttt{lm\_head} weight change, while intermediate layers remain stable. Empirically, we verify that DWD emerges consistently across diverse LLMs and tasks. Theoretically, we prove that (i) harmful gradients concentrate at the \texttt{lm\_head} while intermediate layers are structurally attenuated, and (ii) the \texttt{lm\_head} gradient norm lower-bounds the policy divergence. These results establish the \texttt{lm\_head} gradient norm as a principled, real-time signal of catastrophic policy shift. Guided by this insight, we propose \textit{Dynamic Gradient Gating (DGG)}, a lightweight intervention that monitors the \texttt{lm\_head} gradient norm in real time and intercepts harmful gradients before they corrupt the optimizer. DGG consistently matches or exceeds the standard single-use baseline, achieving up to $2.93\times$ sample efficiency and $2.14\times$ wall-clock speedup across math, ALFWorld, WebShop, and search-augmented QA tasks.

2605.19407 2026-05-20 cs.LG cs.AI 版本更新

A Bitter Lesson for Data Filtering

数据过滤的惨痛教训

Christopher Mohri, John Duchi, Tatsunori Hashimoto

发表机构 * Department of Computer Science(计算机科学系) Departments of Statistics and Electrical Engineering(统计学与电气工程系) Stanford University(斯坦福大学)

AI总结 本文研究了大规模模型预训练中的数据过滤,发现即使有足够的计算资源,过滤数据也不是最佳选择,因为充分训练的大型模型能够容忍低质量数据甚至从中受益。

详情
AI中文摘要

我们通过新的扩展研究探讨了大规模模型预训练中的数据过滤,针对高计算需求和数据稀缺的环境。尽管人们普遍认为过滤数据以包含高质量信息是必要的,但我们的实验表明,在有足够的计算资源的情况下,最好的数据过滤方法实际上是没有数据过滤。我们发现,充分训练的大型参数模型不仅能够容忍低质量和干扰数据,而且实际上会从名义上‘差’的数据中受益。

英文摘要

We investigate data filtering for large model pretraining via new scaling studies that target the high compute, data-scarce regime. In spite of an apparently common belief that filtering data to include only high-quality information is essential, our experiments suggest that with enough compute, the best data filter is no data filter. We find that sufficiently trained large parameter models not only tolerate low-quality and distractor data, but in fact benefit from nominally ``poor'' data.

2605.19403 2026-05-20 cs.LG 版本更新

TIDE: Asymmetric Neural Circuits for Stabilized Temporal Inhibitory-Excitatory Dynamics

TIDE:用于稳定时间抑制-兴奋动态的非对称神经电路

Alexander Kyuroson, Denis Kleyko, Marcus Liwicki

发表机构 * Luleå University of Technology(卢莱大学技术学院) Örebro University(奥雷布罗大学) RISE Research Institutes of Sweden(瑞典RISE研究所)

AI总结 本文提出TIDE架构,通过非对称兴奋-抑制网络稳定时间动态,结合Wilson-Cowan动态和横向抑制,提升生物真实性和学习性能,实验表明其在训练时间和准确率上均优于CTM。

详情
AI中文摘要

最近的Continuous Thought Machine架构通过神经动态将内部计算与外部输入解耦,但依赖多层感知机而缺乏稳定性保证。我们提出使用非对称兴奋-抑制(E-I)网络建模神经动态,该网络可通过网络理论原理稳定,并可表示为通过博弈论损失优化的能量系统。基于此视角,我们引入时间抑制-兴奋动态引擎(TIDE),一种受神经启发的架构,通过稳定神经动态计算内部表示,整合Wilson-Cowan动态和横向抑制。TIDE通过例如使用分层感受野和强制Dale原则,平衡生物真实性,确保现实的80:20 E-I平衡比。本文的目标是引入一种新架构,将神经启发式学习置于 forefront。我们提供了收敛性、稳定性和复杂度界限的证明,以及实证消融研究。总体而言,TIDE在训练时间上比CTM少50%以下,并在各种扰动下将ImageNet的top-1准确率提高平均1.65%。

英文摘要

Recent Continuous Thought Machine architecture decouples internal computation from external inputs via neural dynamics, but relies on multi-layer perceptrons without stability guarantees. We propose to model neural dynamics using asymmetric Excitatory-Inhibitory (E-I) networks, which can be stabilized via principles from network theory and can be expressed as energy-based systems optimized through a game-theoretic loss. Building on this perspective, we introduce Temporal Inhibitory-Excitatory Dynamic Engine (TIDE), a neuro-inspired architecture that computes internal representations through neural dynamics stabilized by incorporating the Wilson-Cowan dynamics and lateral inhibition. TIDE balances biological realism by, for instance, using Hierarchical Receptive Fields and enforcing Dale's principle to ensure a realistic $80:20$ E-I balance ratio with an end-to-end trainable architecture. The aim of this paper is to introduce a new architecture that brings neuro-inspired learning to the forefront. We present proofs of convergence, stability, and complexity bounds, along with empirical ablation studies. Overall, TIDE surpasses CTM with under $50\%$ of the training time and improves $\texttt{top-1}$ accuracy by an average of $+1.65\%$ on ImageNet under various perturbations.

2605.19393 2026-05-20 cs.CV cs.LG 版本更新

Neuron Incidence Redistribution for Fairness in Medical Image Classification

神经元发生再分配用于医疗图像分类中的公平性

Abin Shoby, Lyle John Palmer, Nikhil Cherian Kurian

发表机构 * Neuron Incidence Redistribution for Fairness in Medical Image Classification(神经发生再分配用于医学图像分类)

AI总结 本文提出了一种轻量级的正则化方法Neuron Incidence Redistribution (NIR),通过减少预测概率加权平均激活值的方差来提升医疗图像分类中的公平性,实验结果显示在不同年龄和性别组别中,TPR和FPR的不平等现象显著降低。

Comments 4 Pages, 1 Figure

详情
AI中文摘要

深度学习模型在医疗图像分类中容易出现因年龄、性别和种族等人口属性导致的子群体性能差异。我们识别出这些差异背后的潜在表征机制:在迁移学习模型中,正预测下的主导倒数第二层激活通道同时被疾病阳性样本和特权人口群体(男性、年长患者)激活,导致过度诊断;相反,负预测下的主导通道由不利群体(女性、年轻患者)激活,导致系统性误诊。为了解决这一问题,我们提出了Neuron Incidence Redistribution (NIR),一种轻量级正则化方法,该方法惩罚倒数第二层神经元预测概率加权平均激活值的方差,无需在训练时使用人口属性标签。在HAM10000数据集上,NIR使年龄组的TPR不平等从10.81%降至0.93%,性别组的TPR不平等从12.04%降至0.74%,同时AUC略有提高0.51个点。在Harvard OCT-RNFL数据集上,NIR减少了种族(从15.68%降至10.66%)和年龄(从12.69%降至1.80%)的FPR不平等,证明了在全倒数第二层分布潜在疾病证据是一种提升医疗AI人口公平性的原则性且有效的方法。

英文摘要

Deep learning models for medical image classification are susceptible to subgroup performance disparities across demographic attributes such as age, gender, and race. We identify a latent representational mechanism underlying these disparities: in transfer-learned models, the dominant penultimate-layer activation channel under positive predictions is co-activated by both disease-positive samples and privileged demographic groups (male, older patients), producing over-diagnosis; conversely, the dominant channel under negative predictions is co-activated by disadvantaged groups (female, younger patients), producing systematic under-diagnosis. To address this, we propose Neuron Incidence Redistribution (NIR), a lightweight regularization method that penalizes the variance of predicted-probability-weighted mean activations across penultimate-layer neurons, requiring no demographic labels at training time. On HAM10000, TPR disparity drops from 10.81% to 0.93% across age groups and from 12.04% to 0.74% across gender, with a marginal AUC improvement of 0.51 points. On Harvard OCT-RNFL, NIR reduces FPR disparity for race (from 15.68% to 10.66%) and age (from 12.69% to 1.80%), demonstrating that distributing latent disease evidence across the full penultimate layer is a principled and effective strategy for improving demographic fairness in medical AI.

2605.19392 2026-05-20 cs.LG 版本更新

Understanding Dynamics of Adam in Zero-Sum Games: An ODE Approach

理解Adam在零和游戏中的动态:一种微分方程方法

Yi Feng, Weiming Ou, Xiao Wang

发表机构 * Aarhus University, Aarhus, Denmark.(奥胡斯大学) MoE Key Laboratory of Interdisciplinary Research of Computation and Economics, Shanghai University of Finance and Economics, Shanghai, China(教育部交叉信息与经济学联合实验室,上海财经大学) Shanghai University of Finance and Economics, Shanghai, China(上海财经大学)

AI总结 本文通过微分方程方法研究Adam-DA在零和游戏中的动态,揭示了动量参数在零和游戏中的作用与最小化问题相反,通过GAN实验验证了这一发现。

详情
AI中文摘要

Adam在训练神经网络中的显著成功自然导致其下降-上升对应物Adam-DA被广泛用于解决零和游戏。尽管在实践中很受欢迎,但对Adam-DA的严格理论理解仍滞后。在本文中,我们推导了普通微分方程(ODEs),这些方程是Adam-DA的连续时间极限。这些ODEs紧密近似Adam-DA的离散时间动态,提供了一个可分析的框架来理解其在零和游戏中的行为。利用这种ODE方法,我们研究了Adam-DA的两个基本方面:局部收敛性和隐式梯度正则化。我们的分析揭示了在零和游戏中一阶和二阶动量参数的作用恰好与在最小化问题中已记录的效果相反。我们通过多个架构和数据集的GAN实验验证了这些预测,展示了这种反转的动量效应的实用意义。

英文摘要

The remarkable success of the Adam in training neural networks has naturally led to the widespread use of its descent-ascent counterpart, Adam-DA, for solving zero-sum games. Despite its popularity in practice, a rigorous theoretical understanding of Adam-DA still lags behind. In this paper, we derive ordinary differential equations (ODEs) that serve as continuous-time limits of the Adam-DA. These ODEs closely approximate the discrete-time dynamics of Adam-DA, providing a tractable analytical framework for understanding its behavior in zero-sum games. Using this ODE approach, we investigate two fundamental aspects of Adam-DA: local convergence and implicit gradient regularization. Our analysis reveals that the roles of the first- and second-order momentum parameters in zero-sum games are exactly the opposite of their well-documented effects in minimization problems. We validate these predictions through GAN experiments across multiple architectures and datasets, demonstrating the practical implications of this reversed momentum effect.

2605.19391 2026-05-20 stat.ML cs.LG 版本更新

Tweedie's Formulae and Diffusion Generative Models Beyond Gaussian

Tweedie's公式与超越高斯的扩散生成模型

Wenpin Tang, Nizar Touzi, Zikun Zhang, Xun Yu Zhou

发表机构 * Department of Industrial Engineering and Operations Research, Columbia University(哥伦比亚大学工业工程与运筹学系) Department of Finance and Risk Engineering, New York University(纽约大学金融与风险工程系)

AI总结 本文扩展了Tweedie公式以适用于重要的非高斯过程,如几何布朗运动、平方贝塞尔过程和Cox-Ingersoll-Ross过程,并利用这些公式在图像和金融时间序列生成以及经验贝叶斯估计中应用非高斯扩散模型,展示了非高斯模型的潜力。

Comments 27 pages, 18 figures

详情
AI中文摘要

扩散模型在生成未知数据分布的样本方面取得了显著成功。大多数流行的基于随机微分方程的扩散模型通过向目标分布添加高斯噪声,将其转换为简单的先验分布,然后使用去噪分数匹配,这是Tweedie公式的结果,来学习分数函数并从噪声中生成干净的样本。然而,具有状态依赖扩散系数的非高斯扩散模型以及相应的Tweedie公式一直被忽视。在本文中,我们扩展了Tweedie公式以适用于重要的非高斯过程,包括几何布朗运动(GBM)、平方贝塞尔(BESQ)过程和Cox-Ingersoll-Ross(CIR)过程,从而得到相应的去噪分数匹配目标。然后,我们应用推导出的公式,使用基于GBM和CIR的扩散模型进行图像和金融时间序列生成,并在BESQ设置下进行经验贝叶斯估计。报告的实验结果展示了非高斯模型的潜力。

英文摘要

Diffusion models have achieved remarkable success in generating samples from unknown data distributions. Most popular stochastic differential equation-based diffusion models perturb the target distribution by adding Gaussian noise, transforming it into a simple prior, and then use denoising score matching, a consequence of Tweedie's formula, to learn the score function and generate clean samples from noise. However, non-Gaussian diffusion models with state-dependent diffusion coefficient have been largely underexplored, as have the corresponding Tweedie's formulae. In this work, we extend Tweedie's formula to important non-Gaussian processes, including geometric Brownian motion (GBM), squared Bessel (BESQ) processes, and Cox-Ingersoll-Ross (CIR) processes, thereby yielding the corresponding denoising score-matching objectives. We then apply the derived formulae to image and financial time series generation using GBM- and CIR-based diffusion models, and to empirical Bayes estimation under the BESQ setting. The reported experimental results demonstrate the potential of non-Gaussian models.

2605.19377 2026-05-20 cs.LG cs.AI 版本更新

The Evaluation Game: Beyond Static LLM Benchmarking

评估游戏:超越静态LLM基准测试

Paul Wang, Jade Garcia-Bourrée, Anne-Marie Kermarrec, Vincent Corruble

发表机构 * Sorbonne Université, CNRS, LIP6(索邦大学,国家科学研究中心,LIP6实验室) École Polytechnique Fédérale de Lausanne(洛桑联邦理工学院)

AI总结 本文提出了一种基于博弈论的框架,用于评估大型语言模型的安全性,通过数据增强的群作用结构分析评估者与训练者之间的互动,揭示了对抗性测试中局部泛化和记忆补丁的区别。

Comments 36 pages

详情
AI中文摘要

随着劫持攻击,即能够绕过安全限制的对抗性输入,持续在大型语言模型中被发现,实践者越来越依赖微调作为防御策略。然而,这种鲁棒性微调的理论基础仍不明确。我们引入了一个博弈论框架,将评估者(检查模型中的劫持攻击)与训练者之间的互动形式化为一个双人博弈。我们方法的关键特征是使用群作用,一种数学结构,用于正式表示数据增强。最简单的非平凡实例是圆周上的循环平移群,在此情况下,我们展示了根据训练者的泛化范围的不同而出现的各种情形。在临界阈值以下,评估者在线性多轮次中保持恒定的误判率,而在其他情况下则表现出非常不同的行为。我们进一步提供了实证证据支持模型的局部依赖性:对于我们测试的三个模型家族(Llama、Qwen和Mistral),我们有显著证据表明,在对抗性提示上微调只会导致局部泛化,测试示例上的拒绝率与到微调提示的距离高度相关。我们的框架重新定义了对抗性评估的核心对象:基准不是静态的提示集,而是在评估者群作用下的轨道,而忽略训练者适应的审计协议无法区分真正的修复和记忆补丁。

英文摘要

As jailbreaks, adversarially crafted inputs that bypass safety constraints, continue to be discovered in Large Language Models, practitioners increasingly rely on fine-tuning as a defensive strategy. Yet the theoretical foundations underlying this robustness fine-tuning remain underexplored. We introduce a game-theoretic framework in which the interaction between an evaluator (auditing the model for jailbreaks) and a trainer is formalized as a two-player game. A key feature of our approach is the use of group actions, a mathematical structure that captures symmetries and transformations, to formally represent data augmentation. The simplest non-trivial instance is the circle with cyclic translation groups, where we exhibit various regimes depending on the trainer's generalization range. Below a critical threshold, the evaluator maintains a constant miss ratio for linearly many rounds, whereas other settings can yield very different behaviors. We further provide empirical evidence supporting locality-dependence of the model: for the three model families we tested (Llama, Qwen and Mistral), we have significant evidence that fine-tuning on adversarial prompts induces only local generalization, with refusal rates on test examples highly correlated with the distance to the fine-tuning prompts. Our framework recasts the central object of adversarial evaluation: a benchmark is not a static set of prompts but an orbit under the evaluator's group action, and audit protocols that ignore trainer-side adaptation cannot distinguish a genuine fix from a memorized patch.

2605.19374 2026-05-20 cs.CV cs.AI cs.LG 版本更新

Concept-Guided Noisy Negative Suppression for Zero-Shot Classification and Grounding of Chest X-Ray Findings

基于概念的噪声负样本抑制用于零样本分类和胸片发现的 grounding

Chenyu Lian, Hong-Yu Zhou, Chun-Ka Wong, Jing Qin

发表机构 * The Center for Smart Health, School of Nursing, the Hong Kong Polytechnic University, Hong Kong, China(香港理工大学智能健康中心,护理学院,中国香港) Research Institute for Smart Ageing, the Hong Kong Polytechnic University, Hong Kong, China(香港理工大学智能老龄化研究 institute,中国香港) School of Biomedical Engineering, Tsinghua Medicine, Tsinghua University, Beijing, China(清华大学生物医学工程学院,清华大学,北京,中国) Queen Mary Hospital, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, China(香港大学李嘉诚医学院Queen Mary医院,中国香港)

AI总结 本文提出了一种基于概念的噪声负样本抑制框架CoNNS,通过构建层次化概念本体,解决不同患者间相似发现导致的噪声负样本问题,提升零样本理解任务的性能。

Comments Early accepted by MICCAI 2026

详情
AI中文摘要

利用胸片和放射学报告进行视觉-语言对齐已成为零样本分类和胸片发现 grounding 的先进范式。然而,标准对比学习通常将不同患者的影像和报告简单视为负样本对。这种假设引入了噪声负样本,因为不同患者经常表现出相似的发现。此类噪声负样本导致语义模糊并降低零样本理解任务的性能。为了解决这一挑战,我们提出CoNNS,一种基于概念的噪声负样本抑制框架。为了支持负样本抑制机制,不同于先前方法使用原始报告或模板化文本,我们利用大型语言模型构建层次化概念本体。本体通过显式建模存在性、属性(位置和特征)和文本(证据片段和存在陈述)来结构化41个关键临床概念。利用该本体,我们实现了包含三个步骤的跨患者对再标记策略:(1)细粒度分解,根据发现存在性对配对进行分类;(2)噪声负样本过滤,通过移除假负样本解决语义冲突;(3)困难负样本挖掘,利用轻量级语言模型识别细微属性差异。最后,我们提出了一种概念感知的NCE损失,以对齐视觉特征与文本并抑制识别出的噪声负样本。在多粒度零样本grounding任务和五个零样本分类数据集上的广泛实验验证了CoNNS优于现有最先进模型。代码可在https://github.com/DopamineLcy/conns获取。

英文摘要

Vision-language alignment using chest X-rays and radiology reports has emerged as an advanced paradigm for zero-shot classification and grounding of chest X-ray findings. However, standard contrastive learning typically treats radiographs and reports from different patients simply as negative pairs. This assumption introduces noisy negatives, as different patients frequently exhibit similar findings. Such noisy negatives cause semantic ambiguity and degrade performance in zero-shot understanding tasks. To address this challenge, we propose CoNNS, a concept-guided noisy-negative suppression framework. To support the negative suppression mechanism, unlike previous methods that use raw reports or templatized texts, we construct a hierarchical concept ontology using large language models. The ontology structures 41 key clinical concepts by explicitly modeling presence, attributes (location and characteristics), and texts (evidential segment and presence statement). Leveraging this ontology, we implement a cross-patient pair relabeling strategy comprising three steps: (1) Fine-Grained Breakdown to categorize pairs based on finding presence; (2) Noisy Negative Filtering to resolve semantic conflicts by removing false negatives; and (3) Hard Negative Mining to identify subtle attribute discrepancies using a lightweight language model. Finally, we propose a Concept-Aware NCE loss to align visual features with text while suppressing the identified noisy negatives. Extensive experiments across multi-granularity zero-shot grounding tasks and five zero-shot classification datasets validate that CoNNS outperforms existing state-of-the-art models. The code is available at https://github.com/DopamineLcy/conns.

2605.19373 2026-05-20 cs.DC cs.AI cs.LG 版本更新

Conflict-Free Replicated Data Types for Neural Network Model Merging: A Two-Layer Architecture Enabling CRDT-Compliant Model Merging Across 26 Strategies

用于神经网络模型融合的无冲突复制数据类型:一种双层架构,使26种策略兼容CRDT模型融合

Ryan Gillespie

发表机构 * Independent researcher(独立研究者)

AI总结 本文提出了一种双层架构CRDTMergeState,通过将任何融合策略封装在CRDT兼容层中,解决了26种神经网络融合策略在分布式操作中无法满足交换律、结合律和幂等律的结构性问题,实现了强最终一致性。

详情
AI中文摘要

我们测试的所有26种神经网络融合策略,包括加权平均、SLERP、TIES、DARE、Fisher融合和进化方法,均无法满足用于无冲突分布式操作所需的代数属性(交换性、结合性和幂等性)。我们证明这种失败是结构性的:基于规范化的方法无法同时满足这三个属性。为了解决这个问题,我们提出了一种双层架构——CRDTMergeState,它将任何融合策略封装在CRDT兼容(无冲突复制数据类型)层中。第一层通过OR-Set CRDT语义管理贡献,其中融合操作是集合并集——这显然具有交换性、结合性和幂等性。第二层将融合策略作为确定性纯函数应用于一个规范有序的贡献集上,随机性从Merkle根中播种。我们证明这种分离保证了强最终一致性:所有接收相同贡献的副本计算出相同的融合模型,无论消息顺序如何。实证验证涵盖三个层次:受控的4x4张量(104/104测试通过)、生产规模的模型(最高7.24B参数,208种策略级测试,43,368种层级属性检查在受限张量分辨率下)以及多节点收敛在 gossip 和分区修复(100个节点,20种顺序)中,CRDT开销低于0.5毫秒。由于封装器是透明的,下游性能由构造保证,通过字节相同输出验证确认。参考实现可用作crdt-merge v0.9.4。

英文摘要

All 26 neural network merge strategies we tested including weight averaging, SLERP, TIES, DARE, Fisher merging, and evolutionary approaches -- fail the algebraic properties (commutativity, associativity, idempotency) required for conflict-free distributed operation. We prove that this failure is structural: normalisation-based merges cannot simultaneously satisfy all three properties. To resolve this, we present a two-layer architecture -- CRDTMergeState -- that wraps any merge strategy in a CRDT-compliant (Conflict-Free Replicated Data Type) layer. Layer 1 manages contributions via OR-Set CRDT semantics, where the merge operation is set union -- trivially commutative, associative, and idempotent. Layer 2 applies merge strategies as deterministic pure functions over a canonically-ordered contribution set, with randomness seeded from the Merkle root. We prove that this separation guarantees Strong Eventual Consistency: all replicas receiving the same contributions compute identical merged models, regardless of message ordering. Empirical validation spans three tiers: controlled 4x4 tensors (104/104 tests pass), production-scale models up to 7.24B parameters (208 strategy-level tests, 43,368 layer-level property checks at capped tensor resolution), and multi-node convergence under gossip and partition healing (100 nodes, 20 orderings), with CRDT overhead below 0.5 ms. Because the wrapper is transparent, downstream performance is identical by construction, confirmed via byte-identical output verification. The reference implementation is available as crdt-merge v0.9.4.

2605.19366 2026-05-20 cs.LG 版本更新

Accurate, Efficient, and Explainable Deep Learning Approaches for Environmental Science Problems

准确、高效且可解释的深度学习方法用于环境科学问题

Jimeng Shi

发表机构 * College of Engineering and Computing(工程与计算学院)

AI总结 本文提出三种针对复杂环境科学问题的深度学习方法:用于海岸河流洪水预测的WaLeF模型、用于全球天气预测的CoDiCast模型以及用于环境科学科学问答的Hypercube-RAG方法,旨在提高环境智能的准确性、效率和可解释性。

Comments 161 pages

详情
AI中文摘要

环境科学在保护生态系统中起着关键作用,这一领域由大规模、异构数据驱动。在大数据时代,人工智能(AI)已成为一种变革性工具,用于学习模式并支持决策。本论文开发了针对复杂环境科学问题的AI方法,以实现环境智能,研究了三个具体挑战。首先,我们专注于海岸河流系统的洪水预测和管理。传统物理模型计算成本高,限制了实时应用。为此,我们提出了一种基于深度学习(DL)的水位预测模型WaLeF,以及一种基于预测的深度学习模型FIDLAr用于水位管理。在佛罗里达南部易发洪水的海岸系统中评估,该系统以极端降雨和海平面上下波动为特点,FIDLAr在准确性和效率上优于基线模型,同时提供可解释的输出。其次,我们针对全球天气预测,这受到大规模数据规模的挑战。传统物理方法是确定性的且计算密集型。我们提出CoDiCast,一种条件扩散模型,专门用于概率天气预测。从生成AI用于预测任务中衍生而来,实验表明CoDiCast实现了准确且高效的预测,具有明确的不确定性量化。最后,我们解决环境科学中的科学问答问题。在回答领域内问题时,大型语言模型(LLMs)常常由于知识过时或有限而产生幻觉。虽然检索增强生成(RAG)检索了领域特定的知识,但现有方法在准确度、效率或可解释性之间进行权衡。我们提出Hypercube-RAG,基于结构化的文本立方体框架,成功同时表现出这三种属性。

英文摘要

Environmental science plays a pivotal role in safeguarding ecosystems, a domain driven by large-scale, heterogeneous data. In the big data era, artificial intelligence (AI) has emerged as a transformative tool for learning patterns and supporting decision-making. This dissertation develops AI-based approaches tailored to complex environmental science problems to achieve Environmental Intelligence, studying three specific challenges. First, we focus on flood prediction and management in coastal river systems. Conventional physics-based models are computationally intensive, limiting real-time application. To overcome this, we propose a deep learning (DL)-based model, WaLeF, for water level forecasting, and a forecast-informed DL model, FIDLAr, to manage water levels. Evaluated in a flood-prone coastal system in South Florida characterized by extreme rainfall and sea level fluctuations, FIDLAr outperforms baselines in accuracy and efficiency while providing interpretable outputs. Second, we target global weather prediction, which is challenged by massive data scale. Traditional physics methods are deterministic and computationally heavy. We propose CoDiCast, a conditional diffusion model tailored for probabilistic weather forecasting. Adapted from generative AI for predictive tasks, experiments show CoDiCast achieves accurate, efficient forecasts with explicit uncertainty quantification. Lastly, we address scientific question-answering in environmental science. When answering in-domain questions, large language models (LLMs) often suffer from hallucinations due to out-of-date or limited knowledge. While retrieval-augmented generation (RAG) retrieves domain-specific knowledge, existing methods trade off accuracy, efficiency, or explainability. We propose Hypercube-RAG, built on a structured text cube framework, which successfully exhibits all three properties simultaneously.

2605.19360 2026-05-20 cs.CV cs.LG cs.NE physics.app-ph physics.optics 版本更新

Scalable, Energy-Efficient Optical-Neural Architecture for Multiplexed Deepfake Video Detection

可扩展的、节能的光学-神经架构用于多路复用的深度伪造视频检测

Parnian Ghapandar Kashani, Shiqi Chen, Aydogan Ozcan

发表机构 * Electrical and Computer Engineering Department, University of California, Los Angeles, CA, 90095, USA(加州大学洛杉矶分校电气与计算机工程系) Bioengineering Department, University of California, Los Angeles, CA, 90095, USA(加州大学洛杉矶分校生物工程系) California NanoSystems Institute (CNSI), University of California, Los Angeles, CA, 90095, USA(加州大学洛杉矶分校加州纳米系统研究所)

AI总结 本文提出了一种结合轻量级数字前端和空间复用光学解码后端的混合深度伪造视频检测框架,通过可编程空间光调制器实现大规模并行模拟推理,从而在降低计算成本的同时提高视频真实性预测的吞吐量和准确性。

Comments 30 Pages, 8 Figures

详情
AI中文摘要

AI生成视觉媒体的快速普及催生了对高效、可信的深度伪造检测系统的需求。然而,现有基于深度学习的检测方法依赖于计算密集且能耗高的推理算法,限制了其可扩展性。本文提出了一种混合的数字-模拟深度伪造视频检测框架,结合轻量级数字前端和空间复用光学解码后端,通过可编程空间光调制器实现大规模并行模拟推理。通过在单次光学传播过程中同时处理15个或更多的视频流,该系统在降低计算成本的同时实现了高吞吐量和准确的视频级真实性预测。我们使用不同数据集验证了该混合深度伪造视频处理器,包括经典面部交换、现实世界深度伪造记录和完全AI生成的视频。使用在可见光谱范围内操作的空间复用实验装置,我们在Celeb-DF视频数据集上实现了97.79%的深度伪造检测准确率、99.86%的灵敏度和95.72%的特异性,分别在15个视频并行处理的单次光学传播中测试。多路复用的光学解码器还展示了对各种视频退化、噪声、压缩、实验偏移和黑盒对抗攻击的鲁棒性。我们的结果表明,将光学计算整合到AI推理中可以同时提高吞吐量、能效和对抗鲁棒性——这三个属性在纯数字系统中难以同时实现。

英文摘要

The rapid proliferation of AI-generated visual media has created an urgent need for efficient, trustworthy deepfake detection systems. However, existing deep learning-based detection methods rely on computationally intensive and energy-demanding inference algorithms, limiting their scalability. Here, we present a hybrid digital-analog deepfake video detection framework that combines a lightweight digital front-end with a spatially multiplexed optical decoding back-end for massively parallel analog inference through a programmable spatial light modulator. By simultaneously processing 15 or more video streams within a single optical propagation pass, the system enables high-throughput and accurate video-level authenticity prediction at reduced computational cost compared with purely digital methods. We validated this hybrid deepfake video processor using different datasets spanning classical face-swapping, real-world deepfake recordings, and fully AI-generated videos. Using a spatially multiplexed experimental set-up operating in the visible spectrum, we achieved average deepfake detection accuracy, sensitivity and specificity of 97.79%, 99.86% and 95.72%, respectively, on the Celeb-DF video dataset with 15 videos tested in parallel in a single optical pass per inference. The multiplexed optical decoder also demonstrates resilience against various types of video degradation, noise, compression, experimental misalignments and black-box adversarial attacks. Our results show that integrating optical computation into AI inference enables simultaneous gains in throughput, energy efficiency, and adversarial robustness - three properties that are difficult to achieve together in purely digital systems.

2605.19359 2026-05-20 cs.CV cs.LG 版本更新

MAM-CLIP: Vision-Language Pretraining on Mammography Atlases for BI-RADS Classification

MAM-CLIP:基于乳腺X线图集的视觉-语言预训练用于BI-RADS分类

Halil Ibrahim Gulluk, Olivier Gevaert

发表机构 * Department of Electrical Engineering(电气工程系) Biomedical Informatics Research (BMIR)(生物医学信息学研究(BMIR)) Stanford University(斯坦福大学)

AI总结 本文提出MAM-CLIP模型,通过预训练PubMedBERT和对比学习来提升乳腺X线图像的BI-RADS分类性能,实验表明在标注样本稀缺时,该方法能显著提高F1分数。

详情
AI中文摘要

深度学习方法在预测乳腺X线图像的BI-RADS评分方面已显示出有前景的结果。然而,这些图像的解释可能因人而异,即使在放射科医生之间也可能存在差异。鉴于乳腺X线的固有复杂性,仅依靠图像标签训练分类模型通常效果有限。为了解决这一挑战,我们收集了来自两个乳腺图集的2313张乳腺X线图像及其对应的描述。我们提出的方法采用了一个多模态模型,使用预训练的PubMedBERT作为语言组件。通过在图像-文本对上进行对比学习训练,使视觉编码器能够吸收描述中丰富的信息,从而提高其对乳腺X线发现的理解。然后,我们对两个数据集进行微调以进行BI-RADS预测,其性能优于没有此预训练的模型,尤其是在标注样本稀缺时。在3类平均F1分数上,改进范围从+1%到+14%:在40K训练样本时增加+1%,在1K样本时增加+14%。此外,我们的实验表明,来自乳腺图集的2K图像-文本对比2K标注样本更具信息量,当训练样本超过10K时,平均提升幅度为+1.1%。总体而言,我们的工作提供了一个用于乳腺X线的视觉-语言模型,并突显了乳腺图集文本信息的价值。此外,我们公开发布了TEKNOFEST数据集的预处理乳腺X线图像。训练代码、预训练模型权重、数据提取脚本和发布的数据集均可在:https://github.com/igulluk/MAM-CLIP上公开获取。

英文摘要

Deep learning methods have demonstrated promising results in predicting BI-RADS scores from mammography images. However, the interpretation of these images can vary, leading to discrepancies even among radiologists. Given the inherent complexity of mammograms, training classification models solely on image labels often yields limited performance. To address this challenge, we curated 2313 mammogram images and their corresponding captions from two mammography atlases. Our proposed approach employs a multi-modal model that uses a pretrained PubMedBERT as the language component. By training this model on image-text pairs with contrastive learning, we enable the vision encoder to absorb the rich information contained in the captions, thereby improving its understanding of mammography findings. We then fine-tune the vision encoder on two datasets for BI-RADS prediction, achieving superior performance compared with models trained without this pretraining, particularly when labeled samples are scarce. The improvement in the 3-class average F1 score ranges from +1% to +14%: a +1% increase with 40K training samples, and a +14% increase with 1K samples. Furthermore, our experiments reveal that 2K image-text pairs from mammography atlases can be more informative than 2K labeled samples for label prediction, with an average margin of +1.1% when more than 10K training samples are available. Overall, our work provides a vision-language model for mammography and highlights the value of textual information from mammography atlases. In addition, we publicly release preprocessed mammography images of the TEKNOFEST dataset. The training code, pre-trained model weights, data extraction scripts, and the released dataset are publicly available at: https://github.com/igulluk/MAM-CLIP

2605.19355 2026-05-20 cs.GR cs.AI cs.CV cs.LG 版本更新

Skinned Motion Retargeting with Spatially Adaptive Interaction Guidance

具有空间自适应交互引导的皮肤运动重定向

Soojin Choi, Seokhyeon Hong, Chaelin Kim, Junghyun Nam, Junhyuk Jeon, Junyong Noh

发表机构 * Visual Media Lab(视觉媒体实验室) KAIST(韩国科学技术院)

AI总结 本文提出了一种几何感知的运动重定向框架,通过在空间自适应锚点上进行接近匹配,保留交互语义,以解决在不同身体形状角色之间重定向运动时保持交互语义(如自接触和近身体接近)的挑战。

Comments SIGGRAPH 2026 / ACM TOG. Project page available at https://suzyn.github.io/space_page/

详情
AI中文摘要

在不同身体形状的角色之间进行运动重定向,同时保持交互语义,如自接触和近身体接近,仍是一个具有挑战性的问题。尽管最近的几何感知方法通过维持预定义对应区域之间的空间关系来解决这一问题,但它们对静态对应关系的依赖在目标角色表现出夸张的身体比例时往往遇到困难。在本文中,我们提出了一种几何感知的运动重定向框架,通过在空间自适应锚点上进行接近匹配来保留交互语义。与以往具有静态锚点定义的方法不同,所提出的方法动态地将锚点重新定位到目标角色上可到达的区域。这通过基于Transformer的锚点细化策略实现,该策略预测锚点位移,并通过可微的软投影将转换后的锚点限制在目标角色的几何结构上。通过结合源角色的姿势依赖空间结构,适应的锚点为交互感知的重定向提供结构上连贯的指导。在这些锚点的条件下,基于图的自编码器预测目标骨骼运动,以保持源的空问配置。为了鼓励锚点适应和运动重定向之间的任务对齐优化,我们采用交替训练方案,其中每个模块依次优化。通过广泛的评估,我们证明了我们的方法在保持交互保真度方面优于最先进的方法,适用于多样化的角色几何结构。

英文摘要

Retargeting motion across characters with varying body shapes while preserving interaction semantics, such as self-contact and near-body proximity, remains a challenging problem. While recent geometry-aware approaches address this by maintaining spatial relationships between predefined corresponding regions, their reliance on static correspondences often struggles when the target character exhibits exaggerated body proportions. In this paper, we present a geometry-aware motion retargeting framework that preserves interaction semantics by performing proximity matching over spatially adaptive anchors. Unlike prior methods with static anchor definitions, the proposed method dynamically repositions anchors to reachable regions on the target character. This is achieved via a Transformer-based anchor refinement strategy that predicts anchor displacements and constrains the translated anchors to remain on the target character geometry through differentiable soft projection. By incorporating pose-dependent spatial structures from the source character, the adapted anchors provide structurally coherent guidance for interaction-aware retargeting. Conditioned on these anchors, a graph-based autoencoder predicts target skeletal motion that preserves the spatial configuration of the source. To encourage task-aligned optimization between anchor adaptation and motion retargeting, we adopt an alternating training scheme in which each module is optimized in turn. Through extensive evaluations, we demonstrate that our method outperforms state-of-the-art approaches in preserving interaction fidelity across diverse character geometries.

2605.19352 2026-05-20 q-bio.NC cs.AI cs.LG 版本更新

Brain alignment of reasoning and action representations from vision-language and action models during naturalistic gameplay

在自然主义游戏过程中,视觉语言和动作模型的推理与动作表示的脑部对齐

Subba Reddy Oota, Anant Khandelwal, Khushbu Pahwa, Satya Sai Srinath Namburi, Tanmoy Chakraborty, Bapi S. Raju, Manish Gupta

发表机构 * Independent(独立) Microsoft Research(微软研究院) AWS AI Labs(AWS人工智能实验室) GE HealthCare(通用电气医疗) IIT Delhi(德里理工学院) IIIT-Hyderabad(海得拉巴理工学院) Microsoft(微软)

AI总结 本文研究了在自然主义游戏过程中,视觉语言模型和大动作模型的推理与动作表示在脑部活动中的对齐情况,发现动作聚焦和推理聚焦的提示影响模型内部表示与fMRI脑活动的对齐程度。

Comments 21 pages, 11 figures

详情
AI中文摘要

理解人类和人工智能系统如何通过与环境互动来预测和规划是一个在神经科学和机器学习交汇处的基本挑战。大多数脑编码研究集中在将人工模型与大脑活动对齐,特别是在语言理解和被动视觉处理期间,而交互式脑对齐研究迄今为止大多局限于强化学习(RL)代理和理论模型。为了解决这一差距,我们使用fMRI记录参与者玩自然主义的Atari风格视频游戏,研究了来自两个基础模型家族(即视觉语言模型(VLMs)和大动作模型(LAMs))的代表性模型的脑部对齐情况。具体而言,我们研究了动作聚焦和推理聚焦的提示如何影响模型的内部表示并与其fMRI脑活动对齐。首先,我们发现VLMs和LAMs在每个体素编码性能上显著优于RL基线,即使在匹配的特征维度下,优势依然存在。其次,提示驱动的增益与皮层处理层次结构成比例:最大的改进出现在前额叶和运动规划区域,而早期视觉皮层的增益大约只有后者的二分之一。第三,方差分区揭示了不同的表征组织:VLM是提示对称的(12.5%独特的动作vs.13.6%独特的推理),而LAM是提示不对称的(27%独特的动作vs.-5%独特的推理),不对称性在前额运动皮层最强。总的来说,这些结果表明,即使在全脑预测准确性在统计上相等的情况下,动作专门化的微调也会将多模态表示重新组织到与动作相关的神经计算中。

英文摘要

Understanding how humans and artificial intelligence systems predict and plan by interacting with their environment is a fundamental challenge at the intersection of neuroscience and machine learning. Most brain-encoding studies focus on aligning artificial models with brain activity during language comprehension or passive visual processing, while interactive brain-alignment studies have to date been largely limited to reinforcement-learning (RL) agents and theory-based models. To address this gap, we study brain alignment of representative models from two foundation-model families, namely vision-language models (VLMs) and large-action models (LAMs), using fMRI recordings from participants playing naturalistic Atari-style video games. Specifically, we examine how action-focused and reasoning-focused prompts shape model's internal representations and align with fMRI brain activity. First, we find that both VLMs and LAMs exhibit significantly exhibit voxel-wise encoding performance than RL baselines, with the advantage holding even under matched feature dimensionality. Second, prompt-driven gains scale with the cortical processing hierarchy: the largest improvements appear in frontal-parietal and motor-planning regions, while early visual cortex gains roughly half as much. Third, variance partitioning reveals a qualitatively different representational organization: VLM is prompt-symmetric (12.5% unique action vs. 13.6% unique reasoning), whereas LAM is prompt-asymmetric (27% unique action vs. -5% unique reasoning), with the asymmetry strongest in frontal-motor cortex. Together, these results demonstrate that action-specialized fine-tuning reorganizes multimodal representations toward action-relevant neural computations even when whole-brain prediction accuracy is statistically equivalent between VLM and LAM.

2605.19350 2026-05-20 cs.GR cs.LG 版本更新

CompoSE: Compositional Synthesis and Editing of 3D Shapes via Part-Aware Control

CompoSE:通过部分感知控制进行3D形状的组合合成与编辑

Habib Slim, Shariq Farooq Bhat, Mohamed Elhoseiny, Yifan Wang, Mike Roberts

发表机构 * King Abdullah University of Science and Technology (KAUST)(卡布斯大学) Adobe Research(Adobe研究)

AI总结 本文提出CompoSE方法,通过部分感知控制实现3D形状的组合合成与编辑,核心方法是使用扩散变压器架构在局部和全局之间交替处理部分,并通过新颖的条件技术确保对用户输入的强遵循,主要贡献是无需部分级文本提示即可直接从用户粗略布局指导中学习部分语义和对称性。

详情
AI中文摘要

创建和编辑高质量3D内容仍然是计算机图形学中的核心挑战。我们通过引入CompoSE,一种新颖的方法,通过部分感知控制进行3D形状的组合合成与编辑来解决这一挑战。我们的方法以一组粗略的几何基础原始体(例如,包围盒)作为输入,这些原始体代表不同的物体部分并以特定的空间配置排列,输出部分分离的3D对象,支持对单个部分的局部细粒度(即组合式)编辑。使方法可行的关键见解是使用扩散变压器架构,该架构在局部处理每个部分和跨部分全局聚合上下文信息之间交替,并具有新颖的条件技术,确保对用户输入的强遵循。重要的是,我们的方法学会直接从用户粗略布局指导中推断部分语义和对称性,并不需要部分级文本提示。我们证明我们的方法能够实现强大的部分级编辑能力,包括上下文感知的替换、添加、删除和风格保持的缩放操作。通过广泛的实验,我们显示我们的方法在引导合成方面显著优于现有方法,这通过客观指标和基于LLM的评估来衡量。

英文摘要

Creating and editing high-quality 3D content remains a central challenge in computer graphics. We address this challenge by introducing CompoSE, a novel method for Compositional Synthesis and Editing of 3D shapes via part-aware control. Our method takes as input a set of coarse geometric primitives (e.g., bounding boxes) that represent distinct object parts arranged in a particular spatial configuration, and synthesizes as output part-separated 3D objects that support localized granular (i.e., compositional) editing of individual parts. The key insight that enables our method is our use of a diffusion transformer architecture that alternates between processing each part locally and aggregating contextual information across parts globally, and features a novel conditioning technique that ensures strong adherence to the user's input. Importantly, our method learns to infer part semantics and symmetries directly from the user's coarse layout guidance, and does not require part-level text prompts. We demonstrate that our method enables powerful part-level editing capabilities, including context-aware substitution, addition, deletion, and style-preserving resizing operations. We show through extensive experiments that our method significantly outperforms existing approaches on guided synthesis, as measured by objective metrics and LLM-based evaluations.

2605.19346 2026-05-20 cs.CL cs.AI cs.LG 版本更新

IMLJD: A Computational Dataset for Indian Matrimonial Litigation Analysis

IMLJD:印度婚姻诉讼分析计算数据集

Joy Bose

发表机构 * Independent Researcher(独立研究员)

AI总结 本文提出IMLJD数据集,用于分析印度婚姻纠纷案件,包含3613份法院判决,涵盖IPC第498A条、《家庭暴力保护法》和CrPC第482条案件,通过结构化标签、元数据指标和知识图谱揭示最高法院与卡纳塔克高等法院中撤销请求的成功率差异。

Comments 8 pages, 2 figures, 5 tables. Dataset available at huggingface.co/datasets/joyboseroy/imljd and Code at github.com/joyboseroy/imljd

详情
AI中文摘要

我们介绍了IMLJD,一个包含3,613份印度法院判决的开放数据集,涵盖受IPC第498A条、《家庭暴力保护法》和CrPC第482条规制的婚姻纠纷案件。该数据集涵盖最高法院(2000-2024年,1,474份案件)和卡纳塔克高等法院(2018-2024年,2,139份案件),包含结构化结果标签、元数据衍生指标和知识图谱。我们发现,最高法院级别的撤销请求成功率为57.6%,而卡纳塔克高等法院为39.7%。在匹配的2018至2024年期间,最高法院的撤销率是59.3%,扩大了差距至19.6个百分点,证实该发现对时间调整具有鲁棒性。该数据集、代码和知识图谱已公开发布在https://github.com/joyboseroy/imljd和https://huggingface.co/datasets/joyboseroy/imljd。

英文摘要

We present IMLJD, an open dataset of 3,613 Indian court judgments covering matrimonial disputes under IPC Section 498A, the Protection of Women from Domestic Violence Act, and CrPC Section 482. The dataset covers the Supreme Court of India from 2000 to 2024 (1,474 cases) and the Karnataka High Court from 2018 to 2024 (2,139 cases), with structured outcome labels, metadata-derived indicators, and a knowledge graph. We find that 57.6% of quashing petitions succeed at the Supreme Court level compared to 39.7% at the Karnataka High Court level. On a matched 2018 to 2024 period, the SC quash rate is 59.3%, widening the differential to 19.6 percentage points and confirming the finding is robust to temporal adjustment. The dataset, code, and knowledge graph are released openly at https://github.com/joyboseroy/imljd and https://huggingface.co/datasets/joyboseroy/imljd.

2605.19343 2026-05-20 cs.LG 版本更新

What Makes a Representation Good for Single-Cell Perturbation Prediction?

什么使一个表示对单细胞扰动预测有效?

Wenkang Jiang, Yuhang Liu, Yichao Cai, Erdun Gao, Jiayi Dong, Ehsan Abbasnejad, Lina Yao, Javen Qinfeng Shi

发表机构 * Australian Institute for Machine Learning(澳大利亚机器学习研究院) Responsible AI Research Centre(负责任人工智能研究中心) College of Computer Science and Artificial Intelligence(计算机科学与人工智能学院) Department of Data Science and AI(数据科学与人工智能部门) School of Computer Science and Engineering(计算机科学与工程学院)

AI总结 本文提出PerturbedVAE框架,通过分离扰动特定信息和主导不变结构,恢复因果表示以有效利用此类信息进行预测,并通过可识别性分析明确在特定条件下如何具体指定框架。

Comments Accepted to ICML 2026

详情
AI中文摘要

单细胞扰动建模对于理解和预测细胞对遗传扰动的反应至关重要。然而,现有方法,从因果表示学习到基础模型,往往面临一个被忽视的挑战:基因表达主要由扰动不变信息主导,而扰动特定信号本质上是稀疏的。因此,学习的表示要么将不变和扰动特定信息混合,导致虚假且不可推广的预测器,要么完全抑制扰动特定信号,使它们对预测无效。为了解决这一问题,我们提出了PerturbedVAE,一个通用框架,旨在解决这种信号不平衡。该框架明确将扰动特定信息与主导不变结构分开,并恢复因果表示,以有效利用此类信息进行预测。我们进一步提供了可识别性分析,该分析刻画了稀疏扰动效应可以可靠恢复的条件,从而明确在这些条件下如何具体指定框架。实证上,PerturbedVAE在广泛使用的基准上实现了最先进的性能,在多个评估设置中取得显著进展,在离分布组合预测中获得显著提升,并揭示了可解释的扰动响应程序。

英文摘要

Single-cell perturbation modeling is fundamental for understanding and predicting cellular responses to genetic perturbations. However, existing approaches, from causal representation learning to foundation models, often struggle with an overlooked challenge: gene expression is dominated by perturbation-invariant information, while perturbation-specific signals are intrinsically sparse. As a result, learned representations either entangle invariant and perturbation-specific information, leading to spurious and non-generalizable predictors, or suppress perturbation-specific signals altogether, rendering them ineffective for prediction. To address this, we propose PerturbedVAE, a general framework designed to resolve this signal imbalance. The framework explicitly separates perturbation-specific information from dominant invariant structure and recovers causal representations to effectively utilize such information for prediction. We further provide an identifiability analysis that characterizes the conditions under which sparse perturbation effects can be reliably recovered, thereby clarifying how the framework can be concretely specified under such conditions. Empirically, PerturbedVAE achieves state-of-the-art performance on a widely used benchmark across multiple evaluation settings, yielding significant gains on out-of-distribution combinatorial predictions and uncovering interpretable perturbation-response programs.

2605.19341 2026-05-20 cs.CL cs.AI cs.LG stat.ML 版本更新

HalluWorld: A Controlled Benchmark for Hallucination via Reference World Models

HalluWorld: 一个用于通过参考世界模型控制幻觉的基准

Emmy Liu, Varun Gangal, Michael Yu, Zhuofu Tao, Karan Singh, Sachin Kumar, Steven Y. Feng

发表机构 * Carnegie Mellon University(卡内基梅隆大学) Patronus AI Independent Researcher(独立研究者) Stanford University(斯坦福大学) The Ohio State University(俄亥俄州立大学) DegenAI Labs(DegenAI实验室)

AI总结 本文提出HalluWorld基准,通过显式参考世界模型研究语言模型的幻觉问题,发现不同任务中幻觉表现不一致,表明幻觉源于多种失败模式而非单一能力。

Comments HalluWorld benchmark (code and data) at github.com/DegenAI-Labs/HalluWorld

详情
AI中文摘要

幻觉仍然是大语言模型的核心失败模式,但现有基准在摘要、问答、检索增强生成和代理交互中操作不一致。这种碎片化使得不清楚一种缓解措施在不同情境中是否有效。当前基准要么需要人工标注和固定参考,要么依赖难以复现的观察。为研究根本原因,我们引入HalluWorld,一个基于显式参考世界模型的可扩展基准:当模型生成一个与该世界不一致的可观察声明时,即产生幻觉。基于这一观点,我们构建了合成和半合成环境,在其中参考世界完全指定,模型观点受控,幻觉标签自动产生。HalluWorld涵盖网格世界、国际象棋和现实终端任务,使世界复杂性、可观察性、时间变化和源冲突政策可控,并将幻觉细分为细粒度错误类别。我们评估了前沿和开放权重语言模型在这些设置中的表现,发现一致模式:前沿模型在直接观察信息上的感知幻觉接近解决,而多步状态跟踪和因果正向模拟仍然困难且未被扩展思考普遍解决。在终端设置中,模型在何时应放弃时也遇到困难。不同探测类型和领域中的失败分布不均,表明幻觉源于不同的失败模式而非单一能力。我们的结果表明,受控参考世界为测量和减少现代语言模型中的幻觉提供了可扩展且可重复的路径。

英文摘要

Hallucination remains a central failure mode of large language models, but existing benchmarks operationalize it inconsistently across summarization, question answering, retrieval-augmented generation, and agentic interaction. This fragmentation makes it unclear whether a mitigation that works in one setting reduces hallucinations across contexts. Current benchmarks either require human annotation and fixed references that may be memorized, or rely on observations in settings that are difficult to reproduce. To study root causes, we introduce HalluWorld, an extensible benchmark grounded in an explicit reference-world formulation: a model hallucinates when it produces an observable claim that is false with respect to this world. Building on this view, we construct synthetic and semi-synthetic environments in which the reference world is fully specified, the model's view is controlled, and hallucination labels are generated automatically. HalluWorld spans gridworlds, chess, and realistic terminal tasks, enabling controlled variation of world complexity, observability, temporal change, and source-conflict policy, and disentangling hallucinations into fine-grained error categories. We evaluate frontier and open-weight language models across these settings and find consistent patterns: perceptual hallucination on directly observed information is near-solved for frontier models, while multi-step state tracking and causal forward simulation remain difficult and are not generally solved by extended thinking. In the terminal setting, models also struggle with when to abstain. The uneven profile of failures across probe types and domains suggests that hallucinations arise from distinct failure modes rather than a single capability. Our results suggest that controlled reference worlds offer a scalable and reproducible path toward measuring and reducing hallucinations in modern language models.

2605.19330 2026-05-20 cs.AI cs.LG cs.SE 版本更新

MOCHA: Multi-Objective Chebyshev Annealing for Agent Skill Optimization

MOCHA:多目标切比雪夫退火用于智能体技能优化

Md Mehrab Tanjim, Jayakumar Subramanian, Xiang Chen, Branislav Kveton, Subhojyoti Mukherjee, Anlan Zhang, Sungchul Kim, Somdeb Sarkhel, Sunav Choudhury

发表机构 * Adobe Research(Adobe研究院)

AI总结 该研究提出MOCHA方法,通过切比雪夫标量化和指数退火解决智能体技能优化中的多目标问题,实现更优的帕累托前沿发现和性能提升。

Comments Preprint. 25 pages, 14 figures, 5 tables

详情
AI中文摘要

LLM智能体通过技能组织行为——这些技能是结构化的自然语言规范,指导智能体推理、检索和响应。与单体提示不同,技能是多字段的artifact,受严格平台限制:描述字段因路由被截断,指令正文通过渐进披露压缩,且共存技能竞争有限的上下文窗口。这些限制使技能优化本质上是多目标的:一个技能必须同时最大化任务性能并满足平台限制。然而,现有提示优化器要么忽略这些权衡,要么将其折叠成加权和,忽略了非凸目标区域中的帕累托最优变体。我们引入MOCHA(多目标切比雪夫退火),用切比雪夫标量化替代单目标选择——覆盖完整的帕累托前沿,包括非凸区域——结合指数退火,从探索转向利用。在六个多样化的智能体技能实验中,所有方法共享相同的多目标变异操作符,基线接收相同的单目标文本反馈。现有优化器在六个任务中的四个任务上无法改进种子技能:1000次运行无进展。MOCHA在所有任务中突破,平均正确率比最强基线提高7.5%(在FEVER上达14.9%,在TheoremQA上达10.4%),同时发现两倍多的帕累托最优技能变体。

英文摘要

LLM agents organize behavior through skills - structured natural-language specifications governing how an agent reasons, retrieves, and responds. Unlike monolithic prompts, skills are multi-field artifacts subject to hard platform constraints: description fields are truncated for routing, instruction bodies are compacted via progressive disclosure, and co-resident skills compete for limited context windows. These constraints make skill optimization inherently multi-objective: a skill must simultaneously maximize task performance and satisfy platform limits. Yet existing prompt optimizers either ignore these trade-offs or collapse them into a weighted sum, missing Pareto-optimal variants in non-convex objective regions. We introduce MOCHA (Multi-Objective Chebyshev Annealing), which replaces single-objective selection with Chebyshev scalarization - covering the full Pareto front, including non-convex regions - combined with exponential annealing that transitions from exploration to exploitation. In our experiments across six diverse agent skills - where all methods share the same multi-objective mutation operator and baselines receive identical per-objective textual feedback - existing optimizers fail to improve the seed skill on 4 of 6 tasks: 1000 rollouts yield zero progress. MOCHA breaks through on every task, achieving 7.5% relative improvement in mean correctness over the strongest baseline (up to 14.9% on FEVER and 10.4% on TheoremQA) while discovering twice as many more Pareto-optimal skill variants.

2605.19325 2026-05-20 cs.LG 版本更新

An Exterior Method for Nonnegative Matrix Factorization

非负矩阵分解的外方法

Qiujing Lu, Tonmoy Monsoor, Ehsan Ebrahimzadeh, Kartik Sharma, Vwani Roychowdhury

发表机构 * ECE, UCLA(加州大学洛杉矶分校电子与计算机工程系) eBay Search Science Team(eBay搜索科学团队)

AI总结 本文提出了一种非负矩阵分解的外方法(eNMF),通过分离低秩近似和非负性约束,解决了传统内部方法在非凸优化中收敛慢或陷入次优解的问题,并在多个数据集上验证了其优越的性能。

Comments Accepted to ICML 2026

详情
AI中文摘要

非负矩阵分解(NMF)旨在寻找低秩近似$X \approx UV^T$,其中因素非负,并通常使用内部方法在整个优化过程中强制可行性。我们证明,这种约束驱动的方法可能会在非凸景观中阻碍进展,导致收敛缓慢或收敛到次优的 stationary 点。我们提出了一种非负矩阵分解的外框架(eNMF),将低秩近似与非负性约束分开。我们的方法从最优无约束因子分解初始化,并引入一种旋转过程,将无约束因子映射到非负正交体最近的外部点。这种视角产生了一种算法框架,其中简单的迭代更新收敛到满足KKT条件的边界点。外形式还使NMF解具有几何解释,澄清了在排列和正交变换下因子分解的等价类。一项引人注目的数值结果,涉及400个NMF实验,涵盖真实和合成数据集,显示在99%的情况下,不同算法倾向于收敛到等价的因子矩阵。我们对9种最先进的NMF算法进行基准测试,涵盖9种初始化方案,跨3个真实世界和2个合成数据集。eNMF在所有81个竞争对手中表现一致,达到相等时间设置下30%的重建误差降低,以及相等误差设置下的150%加速。下游实验进一步证明了在音频处理和推荐任务中的显著性能提升,证实了所提出外优化框架的实用价值。代码可在https://github.com/roychowdhuryresearch/eNMF获取。

英文摘要

Nonnegative matrix factorization (NMF) seeks a low-rank approximation $X \approx UV^T$ with nonnegative factors and is commonly solved using interior methods that enforce feasibility throughout optimization. We show that such constraint-driven approaches can impede progress in the nonconvex landscape, leading to slow convergence or convergence to suboptimal stationary points. We propose an exterior framework for NMF (eNMF) that separates low-rank approximation from nonnegativity enforcement. Our method initializes from the optimal unconstrained factorization and introduces a rotation procedure that maps unconstrained factors to an exterior point closest to the nonnegative orthant. This viewpoint yields an algorithmic framework in which simple iterative updates converge to KKT-satisfying stationary points on the boundary of the positive orthant. The exterior formulation also enables a geometric interpretation of NMF solutions, clarifying equivalence classes of factorizations under permutation and orthogonal transformations. An intriguing numerical result, involving 400 NMF experiments across both real and synthetic datasets, show that in 99% of the cases, different algorithms tend to converge towards equivalent factor matrices. We benchmark eNMF against 9 state-of-the-art NMF algorithms with 9 initialization schemes across 3 real-world and 2 synthetic datasets. eNMF consistently outperforms all 81 competitors, achieving up to 30% lower reconstruction error under equal-time settings and up to 150% speedup under equal-error settings. The downstream experiments further demonstrate substantial performance gains in audio processing and recommendation tasks, corroborating the practical benefits of the proposed exterior optimization framework. Code is available at https://github.com/roychowdhuryresearch/eNMF

2605.19324 2026-05-20 cs.LG 版本更新

BrainDyn: A Sheaf Neural ODE for Generative Brain Dynamics

BrainDyn: 一种用于生成脑动态的sheaf神经ODE

Siddharth Viswanath, Panayiotis Ketonis, Chen Liu, Michael Perlmutter, Dhananjay Bhaskar, Smita Krishnaswamy

发表机构 * Yale University(耶鲁大学) Boise State University(博伊西州立大学) University of Wisconsin–Madison(威斯康星大学麦迪逊分校)

AI总结 本文提出BrainDyn,一种基于sheaf神经ODE的模型,用于生成脑动态,通过LSTM编码脑区活动历史,利用sheaf拉普拉斯算子促进信息传递,实现跨模态的强预测能力。

详情
AI中文摘要

高效的神经网络模型能够生成类似大脑动态的活动,可以用于生成合成数据、分析在测试扰动活动等条件下大脑瞬态的差异以及推断底层生成动态。然而,大型语言模型(LLMs)或标准循环神经网络(RNNs)忽略了解剖组织,因此不产生与脑区对齐的组件。另一方面,基于图的网络通常有非常简单的消息传递规则,这些规则不足以表达类似大脑的动态。为此,我们引入了BrainDyn,一种用于在结构化脑图上连续时间动态的sheaf神经ODE模型。BrainDyn使用长短期记忆(LSTM)模型在滑动时间窗口上编码每个脑区的最近活动历史,以生成隐藏状态或茎,这些状态通过可学习的限制映射投影到边特定的共享空间中。这些共享空间中相邻节点之间的差异由sheaf拉普拉斯算子表征,可以促进神经元单元之间的信息传递。这些信息的输出然后被馈送到神经ODE中,该神经ODE控制神经元活动的连续时间演变。我们对静息态fMRI(PNC数据集)、头皮EEG与局灶性癫痫(TUSZ数据集)以及由NEST尖峰网络模拟器模拟的活动进行了评估。BrainDyn在跨模态中实现了强大的预测能力,所得到的表示支持下游任务,包括在硅中扰动预测。

英文摘要

Efficient neural network models that generate brain-like dynamic activity can be a valuable resource for generating synthetic data, analyzing differences in brain transients under conditions such as testing perturbation activity or inferring the underlying generative dynamics. However, large language models (LLMs) or standard recurrent neural networks (RNNs) ignore the anatomical organization and therefore do not produce components that align with brain regions. On the other hand, graph-based networks often have very simple message passing rules that are not sufficiently expressive for brain-like dynamics. To address this, we introduce BrainDyn, a sheaf neural ordinary differential equation (neural ODE) model for continuous-time dynamics on structured brain graphs. BrainDyn encodes the recent activity history of each brain region using a long short-term memory (LSTM) model over a sliding temporal window to produce hidden states, or stalks, that are projected through learnable restriction maps into edge-specific shared spaces. Discrepancies between neighboring nodes in these shared spaces are characterized by a sheaf Laplacian that can facilitate message passing between neuronal units. The output of these messages is then fed to a neural ODE that governs the continuous-time evolution of neuronal activity. We evaluated BrainDyn on resting-state fMRI (PNC dataset), scalp EEG with focal epilepsy (TUSZ dataset), and simulated activity from the NEST spiking network simulator. BrainDyn achieves strong forecasting ability across modalities, and the resulting representations support downstream tasks including in silico perturbation prediction.

2605.19317 2026-05-20 cs.LG cs.AI 版本更新

Inference-Time Scaling in Diffusion Models through Iterative Partial Refinement

通过迭代部分细化在扩散模型中实现推理时间扩展

Taegu Kang, Jaesik Yoon, Sungjin Ahn

发表机构 * KAIST(韩国科学技术院)

AI总结 本文提出了一种无需外部验证器的扩散模型推理时间扩展方法Iterative Partial Refinement,通过在混合噪声条件下迭代部分细化生成更一致的样本,在MNIST Sudoku任务中提升了有效解率。

Comments Accepted at the ICLR 2026 Workshop on AI with Recursive Self-Improvement

详情
AI中文摘要

推理时间扩展已成为提升推理能力的主要方法,并越来越多地应用于扩散模型。然而,现有的扩散模型推理时间扩展方法通常依赖外部验证器或奖励模型来排名和选择样本,限制了其在这些评估器可用且可靠的情况下可扩展性。此外,尽管最近的扩散模型进行区域-wise、混合噪声推理,但针对此设置的推理时间扩展仍相对未被探索。我们提出Iterative Partial Refinement (IPR),一种针对顺序扩散模型的推理时间扩展方法,无需外部验证器。从已生成的样本开始,IPR重新噪声一部分区域并根据剩余区域重新生成它们,使模型能够在比初始生成时更丰富的上下文中修订早期决策。这种迭代部分细化生成更一致的样本而无需外部验证。在需要全局约束满足的推理任务中,IPR一致地提升了性能:在MNIST Sudoku任务中,有效解率从55.8%提高到75.0%。这些结果表明,仅迭代部分细化即可作为扩散模型在顺序、混合噪声设置中的有效推理时间扩展策略。代码可在:https://github.com/ahn-ml/IPR获取。

英文摘要

Inference-time scaling has emerged as a major approach for improving reasoning capabilities, and has been increasingly applied to diffusion models. However, existing inference-time scaling methods for diffusion models typically rely on external verifiers or reward models to rank and select samples, limiting their scalability to settings where such evaluators are available and reliable. Moreover, while recent diffusion models perform sequential inference with region-wise, mixed-noise conditioning, inference-time scaling tailored to this setting remains relatively underexplored. We propose Iterative Partial Refinement (IPR), an inference-time scaling method for sequential diffusion that requires no external verifier. Starting from an already-generated sample, IPR re-noises a subset of regions and regenerates them conditioned on the remaining regions, enabling the model to revise earlier decisions under a richer context than was available during the initial generation. This iterative partial refinement produces more globally consistent samples without external verification. On reasoning tasks requiring global constraint satisfaction, IPR consistently improves performance: on MNIST Sudoku, the valid solution rate increases from 55.8% to 75.0%. These results show that iterative partial refinement alone can serve as an effective inference-time scaling strategy for diffusion models in sequential, mixed-noise settings. Code is available at: https://github.com/ahn-ml/IPR

2605.19313 2026-05-20 stat.ML cs.LG stat.ME 版本更新

A Unified Framework for Structure-Aware Clustering and Heterogeneous Causal Graph Learning

一种用于结构感知聚类和异质因果图学习的统一框架

Honglin Du, Muxuan Liang, Xiang Zhong

发表机构 * Department of Industrial and Systems Engineering, University of Florida(佛罗里达大学工业与系统工程系) Department of Biostatistics, MD Anderson Cancer Center(MD安德森癌症中心生物统计学系)

AI总结 本文提出了一种基于有向无环图的依赖聚类方法,通过交替方向乘子法解决结构异质性问题,实现对子群体依赖结构的鲁棒发现。

详情
AI中文摘要

在复杂的多变量系统中,变量间的相互作用由依赖结构定义,通常编码为有向无环图(DAGs)。然而,依赖结构可能在不同个体间变化,忽略这种结构异质性会引入偏差并掩盖子群体特定的依赖关系。为此,我们提出了一种基于有向无环图的依赖聚类方法,通过交替方向乘子法(ADMM)解决结构异质性问题,构建在结构方程模型(SEM)之上,联合学习聚类分配和子群体特定的依赖结构。我们通过平滑约束编码无环性,并整合一个组内截断Lasso融合惩罚(gTLP)以根据结构相似性聚类个体。这产生了一个非凸优化问题,结合稀疏性、无环性和结构一致性约束。我们通过增广拉格朗日方法解决非凸性,并使用适应的交替方向乘子法(ADMM)求解差分凸程序。对于某些图结构,如上三角邻接矩阵,我们的算法保证能收敛到KKT点。实验表明,我们的方法能够以高真阳性率和低假发现率恢复子群体特定的因果依赖结构。这种能力使我们能够在子群体标签未知的情况下,鲁棒地发现跨个体的异质依赖关系。

英文摘要

In complex multivariate systems, interactions among variables are defined by dependency structures, often encoded as directed acyclic graphs ($\text{DAGs}$). However, dependency structures can vary across subjects, and ignoring this structural heterogeneity introduces bias and obscures subpopulation-specific dependencies. To address this, we propose Directed Acyclic Graph-based Dependency Clustering via Alternating Direction Method of Multipliers (DAG-DC-ADMM), a unified framework built upon Structural Equation Modeling (SEM) that jointly learns cluster assignments and cluster-specific dependency structures. We encode acyclicity via a smooth constraint and integrate a groupwise truncated Lasso fusion penalty (gTLP) to cluster subjects based on their structural similarity. This yields a nonconvex optimization problem that incorporates sparsity, acyclicity, and structural consensus constraints. We address the nonconvexity by using the augmented Lagrangian method and solve it with an adapted version of the Alternating Direction Method of Multipliers (ADMM) for difference-of-convex programs. For certain graph structures, such as upper triangular adjacency matrices, our algorithm is guaranteed to converge to a Karush-Kuhn-Tucker (KKT) point. Experiments demonstrate that our method recovers cluster-specific causal dependency structures with a high true positive rate and a low false discovery rate. This capability enables the robust discovery of heterogeneous dependencies across subjects where the subpopulation label is unknown.

2605.19311 2026-05-20 cs.LG eess.SP 版本更新

An Objective Performance Evaluation of the LSTM Networks in Time Series Classification

LSTM网络在时间序列分类中的客观性能评估

Sooraj Sunil, Balakumar Balasingam

发表机构 * Electrical and Computer Engineering(电气与计算机工程系) University of Windsor(温莎大学)

AI总结 本文提出了一种评估框架,比较了LSTM分类器与基于模型的期望最大化(EM)分类器在二元时间序列分类中的性能,发现当数据符合假设模型类时,EM分类器表现优异,而LSTM分类器需要更大的噪声统计分离度才能实现可靠的分类,且在模型仅在测量噪声上不同的情况下,其性能低于参考分类器。

Comments Accepted in 2026 29th International Conference on Information Fusion

详情
AI中文摘要

深度学习的快速采用已导致数据驱动模型取代经典基于模型的算法,即使在由良好理解的物理定律支配的领域也是如此。尽管数据驱动模型,如长短期记忆(LSTM)网络,已成为时间序列分析的流行选择,但其在结构化环境中的性能相对于基于模型的方法很少被客观评估。本文提出了一种性能评估框架,比较了LSTM分类器与基于模型的期望最大化(EM)分类器在二元时间序列分类中的性能。评估是在两个仅在噪声统计上不同的标量线性高斯状态空间模型上进行的,其中卡尔曼滤波似然比率检验使用真实参数作为最佳可实现分类性能的参考。通过蒙特卡洛模拟,分类器在三个轴上进行评估:任务难度,由过程或测量噪声之间分离度控制;序列长度;以及训练数据集大小。结果表明,当数据符合假设模型类时,利用已知模型结构的EM分类器表现良好。LSTM分类器需要更大的噪声统计分离度才能实现可靠的分类,并且在模型仅在测量噪声上不同的情况下,其性能低于参考分类器,无论序列长度或训练数据集大小如何。

英文摘要

The rapid adoption of deep learning has increasingly led to data-driven models replacing classical model-based algorithms, even in domains governed by well-understood physical laws. While data-driven models, such as long short-term memory (LSTM) networks, have become a popular choice for time-series analysis, their performance relative to model-based approaches in structured environments is rarely evaluated objectively. This paper presents a performance evaluation framework comparing an LSTM classifier against a model-based expectation maximization (EM) classifier for binary time-series classification. The evaluation is conducted on two scalar linear Gaussian state space models differing only in their noise statistics, where the Kalman filter likelihood ratio test with true parameters serves as a reference for the best achievable classification performance.Through Monte Carlo simulations, the classifiers are evaluated across three axes: task difficulty, controlled by the separation in process or measurement noise between the two models; sequence length; and training dataset size. The results show that the EM classifier, which exploits the known model structure, performs strongly when the data conform to the assumed model class. The LSTM classifier requires a larger separation in noise statistics to achieve reliable classification, and its performance saturates below the reference classifier when the models differ only in measurement noise, regardless of sequence length or training dataset size.

2605.19306 2026-05-20 cs.LG math.OC 版本更新

A Two-Phase Adaptive Balanced Penalty Method for Controllable Pareto Front Learning under Split Feasibility Conditions

一种用于在分割可行性条件下可控帕累托前沿学习的两阶段自适应平衡惩罚方法

Nguyen Viet Hoang, Dung D. Le, Tran Ngoc Thang

发表机构 * Faculty of Applied Mathematics and Informatics, Hanoi University of Science and Technology(应用数学与信息技术学院,河内科学技术大学) College of Engineering and Computer Science, VinUniversity(工程与计算机科学学院,Vin大学)

AI总结 本文提出了一种自适应平衡惩罚算法,用于在分割可行性条件下训练可控帕累托前沿学习的超网络,通过自适应指标驱动的可计算下界,将约束帕累托问题转化为双层标量分割问题,并证明了在标准凸性假设下的完全序列收敛性。

Comments 36 pages, 18 figures, 12 tables. Submitted to Neural Networks (Elsevier)

详情
AI中文摘要

我们解决在分割可行性条件下训练超网络用于可控帕累托前沿学习(CPFL)的开放问题,具有严格的理论保证。我们将约束帕累托问题重新表述为双层标量分割问题(BSSP),并提出自适应平衡惩罚(ABP)算法,其三个梯度组件——最优性、集可行性以及图像可行性——通过由可计算下界驱动的自适应指标进行混合。利用一种新的凸替代技术,我们证明在标准凸性和Robbins-Monro步长假设下实现了完全序列收敛性。然后将ABP惩罚结构转换为一种两阶段、以可行性优先的训练策略,用于超MLP和超Trans架构(ABP-HyperNet)。为了评估受约束的CPFL,我们引入了预期可行超体积(EFHV),该指标联合捕捉了解的质量和约束满足。在五个多目标基准上的实验验证了ABP求解器相对于真实值的性能,同时三个多任务学习数据集展示了ABP-HyperNet在提高可行性从36-49%到87-100%的情况下,相比无约束基线达到了2.3倍更高的EFHV。

英文摘要

We address the open problem of training hypernetworks for Controllable Pareto Front Learning (CPFL) under split feasibility conditions with rigorous theoretical guarantees. We reformulate the constrained Pareto problem as a Bi-Level Scalarized Split Problem (BSSP) and propose the Adaptive Balanced Penalty (ABP) algorithm, whose three gradient components -- optimality, set feasibility, and image feasibility -- are blended through an adaptive indicator driven by a computable lower bound. Using a novel convex surrogate technique, we prove full-sequence convergence under standard convexity and Robbins-Monro step-size assumptions. The ABP penalty structure is then translated into a two-phase, feasibility-first training strategy for Hyper-MLP and HyperTrans architectures (ABP-HyperNet). To evaluate constrained CPFL, we introduce the Expected Feasible Hypervolume (EFHV), which jointly captures solution quality and constraint satisfaction. Experiments on five multi-objective benchmarks validate the ABP solver against ground truth, while three multi-task learning datasets demonstrate that ABP-HyperNet achieves up to 2.3x higher EFHV than unconstrained baselines by raising feasibility from 36-49% to 87-100%.

2605.19305 2026-05-20 cs.GR cs.CV cs.LG 版本更新

Matérn Noise for Triangulation-Agnostic Flow Matching on Meshes

Matérn噪声用于三角化无关的网格上流匹配

Tianshu Kuai, Arman Maesumi, Daniel Ritchie, Noam Aigerman

发表机构 * Université de Montréal & Mila(蒙特利尔大学及Mila) Brown University(布朗大学)

AI总结 本文提出了一种三角化无关的流匹配方法,通过Matérn过程生成网格信号,实现高效且高质量的网格生成。

Comments In ACM Transactions on Graphics (SIGGRAPH 2026). Project page: https://matern-fm.github.io/

详情
AI中文摘要

本文针对在三角网格上学习生成信号的任务,提出了三角化无关的流匹配方法。理论部分提出了一种三角化无关的噪声分布,用于流匹配模型的去噪过程。通过数学定义了分布的三角化无关性,证明了Matérn过程的离散化具有所需性质,并提供了一种高效的采样算法。使用该噪声模型,并结合PoissonNet作为去噪器,实现了三角化无关的流匹配。实验显示,该方法在超过一百万三角形的网格上能够生成高质量和多样化的结果,显著超越了现有最佳水平。

英文摘要

This paper tackles the task of learning to generate signals over triangle meshes in a triangulation-agnostic manner, meaning the trained model can be applied to different meshes and triangulations effectively. Practically, the paper adapts the flow matching (FM) paradigm to a mesh-based, triangulation-agnostic setting. Theoretically, it proposes a specific noise distribution which is triangulation agnostic, to be used inside the FM model's denoising process. While noise distributions are usually trivial to devise for, e.g., images, devising a triangulation-agnostic distribution proves to be a much more difficult task. We formulate a mathematical definition of triangulation agnosticism of distributions, via their spectrum. We then show that a discretization of a specific Gaussian random field called a Matérn process holds these desired properties, and provides a simple and efficient sampling algorithm. We use it as our noise model, and adapt FM to the triangulation-agnostic setting by using a state-of-the-art approach for learning signals on meshes in the gradient domain -- PoissonNet -- as the denoiser. We conduct experiments on elaborate tasks such as sampling elastic rest states, and generating poses of humanoids. Our method is shown to be capable of producing highly realistic results for meshes of over one million triangles, significantly exceeding the state-of-the-art in quality and diversity.

2605.19299 2026-05-20 cs.LG 版本更新

Cross-Paradigm Knowledge Distillation: A Comprehensive Study of Bidirectional Transfer Between Random Forests and Deep Neural Networks for Big Data Applications

跨范式知识蒸馏:随机森林与深度神经网络之间双向知识转移的综合性研究用于大数据应用

Mahdi Naser Moghadasi

发表机构 * BrightMind AI Research(BrightMind AI研究院)

AI总结 本文研究了随机森林与深度神经网络之间双向知识蒸馏,提出了新的方法,通过144次实验展示了双向RF-DL蒸馏在分类和回归任务中的竞争力,同时提供了可解释性和表达性的互补优势。

详情
AI中文摘要

大数据的指数增长加剧了对能够处理多样化数据特征并保持计算效率的高效且可解释的机器学习模型的需求。知识蒸馏主要集中在神经网络到神经网络的转移,跨范式知识转移则鲜有探索。本文首次系统研究了随机森林(RF)与深度神经网络(DNN)之间的双向知识蒸馏,填补了集成学习和大数据应用中的模型压缩关键空白。我们提出了一种新的方法,包括渐进多阶段蒸馏、来自多样化树模型的多教师集成蒸馏以及不确定性感知的跨范式转移机制。通过在6个多样化的数据集上进行144次全面实验,涵盖了分类和回归任务,我们证明双向RF-DL蒸馏在保持可解释性的同时,提供了神经网络的表达能力。我们的结果表明,多教师集成蒸馏在传统方法上始终表现更优,其中NN-COMPACT在分类任务中达到98.13%的分类准确率,NN-WIDE在回归任务中达到92.6%的R²分数。所提出的框架使大数据环境中的部署更加灵活,可以根据计算约束和可解释性需求进行最优模型选择。这项工作在跨范式知识转移领域建立了新的研究方向,对可解释AI和资源受限大数据系统中的可扩展模型部署具有重要影响。

英文摘要

The exponential growth of big data has intensified the need for efficient and interpretable machine learning models that can handle diverse data characteristics while maintaining computational efficiency. Knowledge distillation has primarily focused on neural network-to-neural network transfer, leaving cross-paradigm knowledge transfer largely unexplored. This paper presents the first comprehensive study of bidirectional knowledge distillation between Random Forests (RF) and Deep Neural Networks (DNN), addressing critical gaps in ensemble learning and model compression for big data applications. We propose novel methodologies including progressive multi-stage distillation, multi-teacher ensemble distillation from diverse tree models, and uncertainty-aware cross-paradigm transfer mechanisms. Through 144 comprehensive experiments across 6 diverse datasets encompassing classification and regression tasks, we demonstrate that bidirectional RF-DL distillation achieves competitive performance while providing complementary benefits: interpretability from tree models and expressiveness from neural networks. Our results show that multi-teacher ensemble distillation consistently outperforms traditional approaches, with NN-COMPACT achieving 98.13% classification accuracy and NN-WIDE reaching 92.6% R^2 score in regression tasks. The proposed framework enables deployment flexibility in big data environments, allowing optimal model selection based on computational constraints and interpretability requirements. This work establishes a new research direction in cross-paradigm knowledge transfer with significant implications for interpretable AI and scalable model deployment in resource-constrained big data systems.

2605.19293 2026-05-20 cs.IT cs.LG cs.RO math.IT 版本更新

Domain-Adaptive Communication-Rate Optimization for Sim-to-Real Humanoid-Robot Wireless XR Teleoperation

领域自适应的通信速率优化用于仿真到现实的人形机器人无线XR远程操作

Caolu Xu, Zhiyong Chen, Meixia Tao, Li Song, Feng Yang, Wenjun Zhang

发表机构 * Cooperative Medianet Innovation Center(协作中位网创新中心) School of Information Science and Electronic Engineering(信息科学与电子工程学院) Shanghai Jiao Tong University(上海交通大学)

AI总结 本文提出了一种领域自适应的通信速率优化方法,通过在仿真到现实的分布偏移中平衡重建误差和通信能耗,利用PAC-Bayes泛化特性分析和密度比加权的PPO方法,结合离线真实域数据校正,以提高人形机器人无线XR远程操作的通信效率和重建精度。

Comments submitted to IEEE journal

详情
AI中文摘要

无线扩展现实(XR)远程操作为收集人形机器人演示提供了具身交互能力,但大规模应用受到高频运动传输开销的限制。本文开发了一个系统框架,集成了采样、传输、插值和重建,并制定了通信速率优化,旨在通过维度采样率控制最小化通信能耗,同时保持机器人运动轨迹的重建精度。由于从物理机器人获取实时反馈受限于硬件成本,必须通过与离线真实域数据校正的仿真交互来解决问题。为了指导仿真到现实的适应,我们提供了一种PAC-Bayes泛化特性刻画,揭示了潜在密度比估计、有限样本偏差和编码器偏差的影响。基于此分析,我们提出了一种具有密度比加权和信任区域正则化的近端策略优化(PPO)方法。在公共人形远程操作数据集上的实验表明,所提出的方法在仿真到现实分布偏移中改善了重建误差和通信能耗之间的权衡。我们进一步分析了所提出算法在各种无线信道和动态运动轨迹中的有效性。

英文摘要

Wireless extended reality (XR) teleoperation provides embodied interaction capability for collecting humanoid robot demonstrations, but the large-scale adoption is restricted by the overhead of high-frequency motion transmission. This paper develops a system framework that integrates sampling, transmission, interpolation, and reconstruction and formulates a communication-rate optimization that aims to minimize the communication energy while maintaining the reconstruction accuracy of robot motion trajectories through dimension-wise sampling-rate control. Since acquiring real-time feedback from physical robots is limited by hardware costs, it is necessary to solve the problem through simulator interaction with offline real-domain data correction. To guide sim-to-real adaptation, we provide a PAC-Bayes generalization characterization that reveals the effects of latent density-ratio estimation, finite-sample deviation, and encoder bias. Building on this analysis, we propose a proximal policy optimization (PPO) method with density-ratio weighting and trust-region regularization. Experiments on public humanoid teleoperation dataset show that the proposed method improves the tradeoff between reconstruction error and communication energy consumption under sim-to-real distribution shift. We further analyze the effectiveness of the proposed algorithm across various wireless channels and dynamic motion trajectories.

2605.19291 2026-05-20 stat.ML cs.LG math.ST stat.TH 版本更新

Factor Augmented High-Dimensional SGD

因子增强的高维SGD

Shubo Li, Yuefeng Han, Xiufan Yu

发表机构 * Department of Statistics(统计学系) The Pennsylvania State University(宾夕法尼亚州立大学) Department of Applied and Computational Mathematics and Statistics(应用与计算数学与统计学系) University of Notre Dame(圣母大学)

AI总结 本文提出了一种新的优化方法Factor-Augmented SGD (FSGD),通过利用高维学习任务中的潜在因子表示,解决了传统两阶段降维方法在数据存储和在线学习中的限制,并建立了首个将潜在因子估计误差纳入SGD分析的理论框架,提供了在衰减步长和小批量更新下的$\ell^s$范数矩收敛性。

详情
AI中文摘要

随机梯度下降(SGD)是现代机器学习中广泛使用的基础优化算法。在本文中,我们提出Factor-Augmented SGD(FSGD),一种新的优化方法,利用高维学习任务中的潜在因子表示。与依赖于离线表示学习和完整数据存储的传统两阶段降维方法不同,FSGD的关键创新在于它完全在流数据上操作,使其能够扩展到大规模和高维问题。此外,我们建立了首个明确将潜在因子估计误差纳入SGD分析的理论框架,并在衰减步长和小批量更新下提供了$\ell^s$范数的矩收敛性。我们的结果为在高维机器学习系统中可靠和可扩展地使用SGD提供了新的基础。

英文摘要

Stochastic gradient descent (SGD) is a fundamental optimization algorithm widely used in modern machine learning. In this paper, we propose Factor-Augmented SGD (FSGD), a new optimization method that leverages latent factor representations in high-dimensional learning tasks. Unlike standard two-stage dimension reduction approaches that rely on offline representation learning and full data storage, a key novelty of FSGD is that it operates purely on streaming data, making it scalable to large-scale and high-dimensional problems. Furthermore, we establish the first theoretical framework that explicitly incorporates latent factor estimation error into the analysis of SGD, and provide moment convergence in $\ell^s$ norm under decaying step sizes and mini-batch updates. Our results provide a new foundation for employing SGD reliably and scalably in high-dimensional machine learning systems.

2605.19284 2026-05-20 cs.CL cs.LG 版本更新

Language models struggle with compartmentalization

语言模型在 compartmentalization 方面遇到困难

Thomas Vincent Howe, David Wingate

发表机构 * Department of Computer Science(计算机科学系) Brigham Young University

AI总结 研究探讨了大型语言模型在处理统一概念的不同表达方式时的 compartmentalization 问题,发现模型在不同表达方式之间无法有效共享统计信息,导致模型容量浪费和样本效率降低。

Comments 9 pages, 8 figures, plus 9 pages of appendices. Submitted to NeurIPS 2026. Code: https://github.com/vinhowe/compartmentalization. Eval data: https://doi.org/10.5281/zenodo.20171021

详情
AI中文摘要

在大型语言模型(LLMs)使用的训练数据中,相同的潜在概念通常以多种不同的方式呈现:相同的事实出现在英语和斯瓦希里语中;许多函数可以用Python和Haskell表达;命题可以用正式语言和自然语言表达。我们展示了LLMs可能会表现出compartmentalization,即在不同的统一概念的不同表达方式之间无法识别和共享统计信息。在最坏的情况下,LLMs只是学习了每个概念表达方式的平行内部表示,用冗余性耗尽模型容量,并随着这些表达方式的数量增加而降低样本效率。我们还证明,即使合成平行数据容易学习,它也可能无法改善这一问题。在此框架下,我们发现,对于小型模型,早期多语言学习几乎完全是 compartmentalized 的。最后,所有我们研究的干预措施都表现出相变,其有效性取决于不同的表达方式数量,这表明语言建模目标可能只能不一致地统一表示。

英文摘要

In the training data used by large language models (LLMs), the same latent concept is often presented in multiple distinct ways: the same facts appear in English and Swahili; many functions can be expressed in both Python and Haskell; we can express propositions in both formal and natural language. We show that LLMs can exhibit compartmentalization, where they fail to identify and share statistical strength between distinct presentations of unified concepts. In the worst case, LLMs simply learn parallel internal representations of each presentation of the concept, saturating model capacity with redundancies and decreasing sample efficiency with the number of such presentations. We also demonstrate that synthetic parallel data can fail to improve this despite being easily learned itself. Under this framework, we find that, for small models, early multilingual learning is nearly entirely compartmentalized. Finally, all interventions that we study exhibit a phase transition in which their effectiveness depends on the number of distinct presentations, suggesting that the language modeling objective may only inconsistently unify representations.

2605.19283 2026-05-20 cs.LG cs.AI stat.ML 版本更新

EviTrack: Selection over Sampling for Delayed Disambiguation

EviTrack: 在延迟歧义中选择而非采样

Omer Haq

发表机构 * Independent Researcher(独立研究者)

AI总结 本文提出EviTrack框架,通过在潜在轨迹上进行选择而非边际状态,以在延迟歧义中实现更有效的序列推理,其核心方法是基于证据和似然比的轨迹假设选择,从而在数据支持后延迟承诺,优于基于采样的基线方法。

Comments https://github.com/Haq94/EviTrack

详情
AI中文摘要

在延迟歧义的环境中,顺序预测具有挑战性,因为早期观测模糊,多个潜在解释在足够证据积累之前仍然合理。基于边际推断的标准方法在此设置中表现不佳,要么过早坍塌不确定性,要么在信息证据出现后无法恢复。我们引入EviTrack,一种测试时间推断框架,该框架在潜在轨迹上而非边际状态上操作。EviTrack维护一组竞争轨迹假设,并应用基于证据和似然比的选择来延迟承诺,直到有数据支持。受多假设跟踪和先检测前跟踪中的假设管理启发。为了评估此设置,我们构建了一个受控的合成基准,具有已知的潜在真实值,明确展示了延迟歧义。在匹配的推断预算下,EviTrack显著优于基于采样的基线方法,实现更快的后歧义恢复。这些结果表明,在延迟歧义环境中,适度的轨迹级选择比增加采样覆盖更有效,突显了选择而非采样作为可靠序列推断的关键原则。

英文摘要

Sequential prediction is challenging in regimes of delayed disambiguation, where early observations are ambiguous and multiple latent explanations remain plausible until sufficient evidence accumulates. Standard approaches based on marginal inference struggle in this setting, either collapsing uncertainty prematurely or failing to recover once informative evidence arrives. We introduce EviTrack, a test-time inference framework that operates over latent trajectories rather than marginal states. EviTrack maintains a set of competing trajectory hypotheses and applies evidence- and likelihood-ratio-based selection to delay commitment until supported by data, drawing inspiration from hypothesis management in multiple hypothesis tracking and track-before-detect. To evaluate this setting, we construct a controlled synthetic benchmark with known latent ground truth that explicitly exhibits delayed disambiguation. At matched inference budget, EviTrack substantially outperforms sampling-based baselines, achieving faster post-disambiguation recovery. These results show that, in delayed disambiguation regimes, moderate trajectory-level selection is more effective than increasing sampling coverage, highlighting selection over sampling as a key principle for reliable sequential inference.

2605.19282 2026-05-20 cs.LG 版本更新

Rethinking Muon Beyond Pretraining: Spectral Failures and High-Pass Remedies for VLA and RLVR

重新思考Muon超越预训练:VLA和RLVR中的频谱失败及高频修复

Chongyu Fan, Gaowen Liu, Mingyi Hong, Ramana Rao Kompella, Sijia Liu

发表机构 * Michigan State University(密歇根州立大学) Cisco(思科) University of Minnesota(明尼苏达大学) IBM Research(IBM研究院)

AI总结 本文研究了Muon优化器在预训练之外的局限性,提出Pion通过高频NS迭代机制改进VLA和RLVR任务的性能。

详情
AI中文摘要

Muon是一种矩阵感知优化器,利用牛顿-施楚兹(NS)迭代来通过驱动动量矩阵的所有奇异值趋近于1来强制梯度正交化。尽管这种均匀频谱白化增强了探索并优于AdamW在LLM预训练中,我们显示它在两个领域可能导致根本限制:(i)跨模态视觉-语言-动作(VLA)训练,其中固有低秩动作模块梯度导致噪声尾部方向的放大,以及(ii)可验证奖励的强化学习(RLVR),其中低信噪比梯度和需要保留先前训练的每头专业化使白化不稳定。为了解决这些挑战,我们提出Pion,作为Muon的即插即用替代品,保持其计算效率,同时将均匀频谱白化替换为两阶段的提升+抑制机制,我们称之为高频NS迭代。这种设计诱导了锐利的频谱高频效应,将主导奇异值锚定在1,同时将噪声尾部组件抑制到0,具有可控的滤波强度。为了保持预训练的每头异质性,Pion还支持一种每头模式,通过简单的reshape在注意力头之间独立应用更新,而无需额外成本。在LIBERO和LIBERO-Plus上的VLA训练中,Pion在l_1回归(VLA-Adapter)和流匹配(VLANeXt)架构上一致优于基线,例如在1,500次训练步骤后达到LIBERO Object的100%成功率,而Muon为97.0%,AdamW仅为32.2%。Pion的优势进一步扩展到使用pi_0.5骨干的现实Franka Research 3机器人在DROID设置下的三个抓取和放置任务。在Qwen3-1.7B/4B上的RLVR后训练中,Pion在MATH和GSM8K上优于AdamW,而Muon则崩溃为零。

英文摘要

Muon is a matrix-aware optimizer that leverages Newton-Schulz (NS) iterations to enforce spectral gradient orthogonalization by driving all singular values of the momentum matrix toward 1. While this uniform spectral whitening enhances exploration and outperforms AdamW in LLM pretraining, we show it could lead to fundamental limitations beyond pretraining in two regimes: (i) cross-modality vision-language-action (VLA) training, where inherently low-rank action-module gradients cause amplification of noisy tail directions, and (ii) reinforcement learning with verifiable rewards (RLVR), where low-SNR gradients and the need to preserve per-head specialization from prior training make whitening unstable. To address these challenges, we propose Pion, a drop-in replacement for Muon that preserves its computational efficiency while replacing uniform spectral whitening with a two-stage Promotion+Suppression mechanism, which we call the high-pass NS iteration. This design induces a sharp spectral high-pass effect, anchoring dominant singular values at 1 while suppressing noisy tail components toward 0, with controllable filter strength. To preserve pretrained per-head heterogeneity, Pion also supports a per-head mode that applies updates independently across attention heads via a simple reshape, at no extra cost. In VLA training on LIBERO and LIBERO-Plus, Pion consistently outperforms both baselines across l_1-regression (VLA-Adapter) and flow-matching (VLANeXt) architectures, e.g., reaching 100% success rate on LIBERO Object after 1,500 training steps with VLA-Adapter, vs. 97.0% for Muon and only 32.2% for AdamW. The advantage of Pion further extends to a real Franka Research 3 robot with a pi_0.5 backbone under the DROID setup on three grasp-and-place tasks. In RLVR post-training on Qwen3-1.7B/4B with GRPO and GMPO, Pion also outperforms AdamW on MATH and GSM8K while Muon collapses to zero.

2605.19258 2026-05-20 cs.LG cs.AI 版本更新

ExECG: An Explainable AI Framework for ECG models

ExECG:用于ECG模型的可解释AI框架

Jong-Hwan Jang, Yong-yeon Jo

发表机构 * Medical AI Co. Ltd(医疗AI公司)

AI总结 本文提出ExECG框架,旨在解决ECG模型在临床应用中缺乏解释性的问题,通过三阶段流程提供可重用和可复现的ECG可解释性。

详情
AI中文摘要

深度学习已使ECG诊断模型在如心律失常分类和异常检测等任务中表现出强大的性能。然而,仅凭准确性不足以满足临床部署的需求,因为它无法解释为何产生特定的输出,限制了验证、错误分析和信任。尽管ECG XAI已被广泛研究并持续改进,但不同研究中的实际流程和报告规范差异较大,阻碍了重用和可复现性。为了解决这些问题,我们提出了ExECG,一个Python框架,提供三阶段流程:Wrapper标准化访问异构ECG格式和中间表示,Explainer统一各种XAI方法到共享的执行协议,Visualizer支持在统一界面内一致的跨方法比较。我们通过简洁的例子和两个案例研究展示了端到端的使用,强调了可互操作和可复现的ECG可解释性。

英文摘要

Deep learning has enabled ECG diagnostic models with strong performance in tasks such as arrhythmia classification and abnormality detection. However, accuracy alone is insufficient for clinical deployment because it does not explain why a specific output was produced, limiting justification, error analysis, and trust. Although ECG XAI has been extensively investigated and steadily improved, practical pipelines and reporting conventions vary across studies, hindering reuse and reproducibility. To address these issues, we present Explainable AI framework for ECG models (ExECG), a Python framework that provides a three-stage pipeline: Wrapper standardizes access across heterogeneous ECG formats and intermediate representations, Explainer unifies diverse XAI methods under a shared execution protocol, and Visualizer supports consistent cross-method comparison within a unified interface. We demonstrate end-to-end usage with concise examples and two case studies, highlighting interoperable and reproducible ECG explainability.

2605.19249 2026-05-20 cs.LG 版本更新

Beyond Extrapolation: Knowledge Utilization Paradigm with Bidirectional Inspiration for Time Series Forecasting

超越外推:基于双向启发的知识利用范式用于时间序列预测

Liu Chong, Yingjie Zhou, Hao Li, Pengyang Wang, Qingsong Wen, Ce Zhu

发表机构 * College of Computer Science, Sichuan University(四川大学计算机科学学院) Department of Computer and Information Science, University of Macau(澳门大学计算机与信息科学系) School of Information and Communication Engineering, University of Electronic Science and Technology of China(电子科技大学信息与通信工程学院)

AI总结 本文提出了一种新的时间序列预测范式KUP-BI,通过从训练历史库中提炼出延续式知识,为双向预测提供结构化知识,从而提升预测性能。

Comments Accepted to ICML 2026. 18 pages, 6 figures

详情
AI中文摘要

时间序列预测在能源、交通和公共卫生等场景中至关重要。然而,大多数现有预测模型主要依赖单向推理,即从历史映射到目标,而忽略了由修订的自然链('历史(模型输入)--目标(真实输出)--目标后延续')提供的结构信息。目标后延续记录了轨迹在目标后的发展情况,有助于稳定预测,但无法在推理时观测到。本文旨在获得当前输入的近似后延续代理,为双向预测提供结构化知识。该想法被实例化为KUP-BI(Knowledge Utilization Paradigm with Bidirectional Inspiration),一种新的时间序列建模范式,从仅训练的历史库中提炼出延续式知识(作为近似后延续代理),并将其整合到标准预测骨干中。输入流和延续代理流通过轻量级的特征级门控模块进行融合。这种设计不引入训练轨迹中已包含的信息之外的内容;相反,它提供了一种结构化的归纳偏置,帮助骨干利用典型的延续模式,而不是仅依赖参数外推。在六个公开数据集上的实验结果表明,KUP-BI在提升最先进模型的预测性能方面表现一致,且具有较小的额外开销。

英文摘要

Time-series forecasting is critical in various scenarios, such as energy, transportation, and public health. However, most existing forecasters rely primarily on one-way inference, \textit{i.e.}, mapping \textbf{history} to \textbf{target}, and overlook the structural information provided by a revised natural chain (``\textbf{history} (model input) -- \textbf{target} (ground-truth output) -- \textbf{post-target continuation}''). The post-target continuation records how trajectories evolve after the target, which can help stabilize forecasting, but it is not observable at inference time. In this work, we aim to obtain an approximate proxy of the post-target continuation for the current input, providing structural knowledge for bidirectional forecasting. This idea is instantiated as KUP-BI (Knowledge Utilization Paradigm with Bidirectional Inspiration), a new time-series modeling paradigm that distills continuation-style knowledge (as an approximate post-target continuation proxy) from a \emph{train-only} historical library and integrates it into standard forecasting backbones. The input stream and the continuation-proxy stream are fused via a lightweight feature-level gating module. This design does not introduce information beyond what is already contained in the training trajectories; instead, it provides a structured inductive bias that helps backbones exploit typical continuation patterns rather than relying solely on parametric extrapolation. Experimental results on six public datasets show that KUP-BI consistently improves the forecasting performance of state-of-the-art models, with small additional overhead.

2605.19243 2026-05-20 cs.LG cs.AI cs.CG 版本更新

Euclidean Embedding of Data Using Local Distances

利用局部距离进行数据的欧几里得嵌入

Dimitris Arabadjis

发表机构 * Department of Statistics and Actuarial-Financial Mathematics(统计与精算-金融数学系) University of the Aegean(爱琴海大学)

AI总结 本文研究了在仅给定局部距离图的情况下恢复全局一致的欧几里得嵌入问题,提出了一种能够最优表示这些距离的方法。该方法仅在由成对距离加权的邻域图上操作,不需要任何先前的数据向量表示。通过求解一个变分问题,将图上的局部距离与由嵌入函数微分诱导的欧几里得度量匹配。所得的欧拉-拉格朗日方程以坐标自由形式推导,允许仅从距离图直接评估所有算子。尽管非线性和缺少非线性的显式表达式,这些方程被证明可以作为迭代更新的稀疏线性问题解决。本文的主要贡献包括:(a)推导出在连续体中支配最优欧几里得嵌入的功能方程;(b)一种不依赖于特征向量的表示形式,仅需要邻域距离图;(c)基于纯粹局部图操作的估计程序。我们在合成流形和真实数据集上实验性地评估了所得到的非参数算法,证明了在保持局部度量结构和邻近关系的同时,能够近似全局等距嵌入。

详情
AI中文摘要

我们研究了在仅给定局部距离图的情况下恢复全局一致的欧几里得嵌入问题,并提出了一种能够最优表示这些距离的方法。该方法仅在由成对距离加权的邻域图上操作,不需要任何先前的数据向量表示。嵌入是通过求解一个变分问题来实现的,该问题将图上的局部距离与由嵌入函数微分诱导的欧几里得度量匹配。所得的欧拉-拉格朗日方程以坐标自由形式推导,允许仅从距离图直接评估所有算子。尽管非线性和缺少非线性的显式表达式,这些方程被证明可以作为迭代更新的稀疏线性问题解决。本文的主要贡献包括:(a)推导出在连续体中支配最优欧几里得嵌入的功能方程;(b)一种不依赖于特征向量的表示形式,仅需要邻域距离图;(c)基于纯粹局部图操作的估计程序。我们在合成流形和真实数据集上实验性地评估了所得到的非参数算法,证明了在保持局部度量结构和邻近关系的同时,能够近似全局等距嵌入。

英文摘要

We study the problem of recovering a globally consistent Euclidean embedding of data, given only a local distance graph and propose a method that optimally represents these distances. The method operates solely on a neighborhood graph weighted by pairwise distances, without requiring any prior vector representation of the data. The embedding is obtained by solving a variational problem that matches local, on-graph distances to the Euclidean metric, induced by the differentials of the embedding functions. The resulting Euler-Lagrange equations are derived in a coordinate-free form, enabling direct evaluation of all operators from the distance graph alone. Though non-linear and missing an explicit expression for their non-linearity, these equations are shown to be resolved as an iteratively updated sparse linear problem. The main contributions of the proposed approach are (a) the derivation of the functional equations governing the optimal Euclidean embedding in the continuum, (b) a representation-free formulation that requires only a neighborhood distance graph and no feature vectors and (c) an estimation procedure based exclusively on local graph operations. We experimentally evaluate the resulting non-parametric algorithm on synthetic manifolds and real datasets, demonstrating consistent preservation of local metric structure and neighboring relations, while approximating the global isometric embedding.

2605.19242 2026-05-20 cs.CV cs.AI cs.ET cs.LG cs.MM 版本更新

PhyWorld: Physics-Faithful World Model for Video Generation

PhyWorld: 用于视频生成的物理忠实世界模型

Pu Zhao, Juyi Lin, Timothy Rupprecht, Arash Akbari, Chence Yang, Rahul Chowdhury, Elaheh Motamedi, Arman Akbari, Yumei He, Chen Wang, Geng Yuan, Weiwei Chen, Yanzhi Wang

发表机构 * Northeastern University(东北大学) University of Georgia(佐治亚大学) Tulane University(路易斯安那大学) EmbodyX

AI总结 本文提出PhyWorld,一种通过两阶段训练提升视频生成模型的物理忠实性,以改进世界模拟器的性能,从而更有效地支持物理AI系统。

详情
AI中文摘要

世界模拟器可以在真实世界部署前提供安全且可扩展的环境来训练物理AI系统。大型视频生成模型正成为此类模拟器的有希望的基础,因为它们能够生成多样且逼真的视觉未来。然而,将其用作世界模拟器需要物理忠实的视频延续,即生成的视频应保持由条件输入隐含的物理状态,并以符合基本物理原理的方式演变。我们提出了PhyWorld,一种视频生成世界模型,通过两阶段的后训练来生成时间上一致且物理忠实的场景延续。在第一阶段,我们通过流匹配微调改进视频到视频延续,鼓励稳定视觉属性和帧间一致的运动动态。在第二阶段,我们通过直接偏好优化(DPO)对物理偏好对进行对齐,使模型朝着更符合物理合理性的输出发展。为了评估PhyWorld,我们使用了标准视频质量基准和专门的物理忠实性基准,并对每条物理定律进行评分。实验表明,PhyWorld提高了视频一致性,其在VBench上的平均得分为0.769,比最先进的基线0.756或更低。PhyWorld还提高了物理合理性,其在我们物理忠实性基准上的平均得分为3.09,比最强基线的2.99有所提高。这些结果表明,通过延续和物理偏好信号对大型视频生成模型进行后训练,可以使其成为更有效的物理AI世界模拟器。

英文摘要

World simulators can provide safe and scalable environments for training Physical AI systems before real-world deployment. Large video generation models are emerging as a promising basis for such simulators because they can generate diverse and realistic visual futures. However, using them as world simulators requires physically faithful video continuations, namely, generated videos that preserve the physical state implied by the conditioning input, and evolve in ways consistent with basic physical principles. We propose PhyWorld, a video generation world model designed to produce temporally coherent and physically faithful scene continuations through two-stage post-training. In the first stage, we improve video-to-video continuation with flow matching fine-tuning, encouraging stable visual attributes and coherent motion dynamics across frames. In the second stage, we align generated dynamics with physical principles using Direct Preference Optimization (DPO) over physics preference pairs, guiding the model toward outputs with higher physical plausibility. To evaluate PhyWorld, we use both standard video-quality benchmarks and a dedicated physical-faithfulness benchmark with per-law scoring. Experiments show that PhyWorld improves video consistency, achieving an average score of 0.769 on VBench compared with 0.756 or below for state-of-the-art baselines. PhyWorld also improves physical plausibility, reaching an average score of 3.09 on our physical-faithfulness benchmark compared with 2.99 for the strongest baseline. These results suggest that post-training large video generation models with continuation and physics-preference signals can make them more effective world simulators for Physical AI.

2605.19235 2026-05-20 cs.LG cs.GT 版本更新

GAE Falls Short in Imperfect-Information Self-Play Reinforcement Learning

GAE在不完全信息自博弈强化学习中表现不足

Zhiyuan Fan, Gabriele Farina

发表机构 * MIT(麻省理工学院)

AI总结 本文研究了不完全信息博弈中自博弈强化学习中GAE估计器的方差问题,提出Q-boosting和VRPO算法以减少方差并提升性能。

详情
AI中文摘要

不完全信息博弈中的竞争多智能体强化学习需要智能体在部分可观测环境下对抗对手,需要随机策略。虽然使用近端策略优化(PPO)的自博弈强化学习在经验上取得了成功,但其标准优势估计器广义优势估计(GAE)由于随机未来动作的采样而产生额外的方差。在均衡自博弈中,这种方差被均衡策略的随机性放大,并且即使当批评器是精确的时仍然存在。我们通过引入基于集中动作价值批评的Q-boosting,一种方差减少的优势估计器,以及提出方差减少策略优化(VRPO),将此新估计器纳入其中。该算法用多步期望SARSA(λ)轨迹替代了采样的多步备份,每一步计算策略期望以平均动作采样噪声,同时保留PPO的裁剪目标和在线策略演员更新。经验上,VRPO在中等规模到大规模游戏,包括斗地主和头衔无限制德州扑克中都表现出强劲的性能。

英文摘要

Competitive multi-agent reinforcement learning in imperfect-information games requires agents to act under partial observability and against adversarial opponents, necessitating stochastic policies. While self-play reinforcement learning with Proximal Policy Optimization (PPO) has achieved strong empirical success, its standard advantage estimator, generalized advantage estimation, suffers from additional variance due to the sampling of stochastic future actions. This variance is amplified in equilibrium self-play because of the stochastic nature of the equilibrium policy and persists even when the critic is exact. We address this bottleneck by introducing $Q$-boosting, a variance-reduced advantage estimator based on a centralized action-value critic, and propose Variance-Reduced Policy Optimization (VRPO), incorporating this new estimator. The algorithm replaces sampled multi-step backups with a multi-step Expected SARSA$(λ)$ trace, computing policy expectations at each step to average out action-sampling noise, while retaining PPO's clipped objective and on-policy actor updates. Empirically, VRPO consistently achieves strong performance from mid-sized to large-scale games including Dou Dizhu and Heads-Up No-Limit Texas Hold'em.

2605.19231 2026-05-20 cs.LG stat.ML 版本更新

DeRegiME: Deep Regime Mixtures for Probabilistic Forecasting under Distribution Shift

DeRegiME:用于分布偏移下概率预测的深度制度混合

Kieran Wood, Stefan Zohren, Stephen J. Roberts

发表机构 * Machine Learning Research Group, University of Oxford(牛津大学机器学习研究组) Oxford-Man Institute, University of Oxford(牛津大学奥克斯曼研究所)

AI总结 DeRegiME通过引入稀疏变分高斯过程,实现了概率预测中的制度混合,解决了神经预测器在处理分布偏移时的不足,提升了预测密度的准确性。

详情
AI中文摘要

我们介绍了DeRegiME--深度制度混合专家--一种直接多时间跨度的概率预测器,它将潜在的不确定性制度与底层信号分开,并使用稀疏变分高斯过程(GP)软地将每个预测位置分配给学习到的重复制度。该过程通过共享门将非平稳制度混合核和学生t分布似然结合起来,从而得到一个单一的稀疏GP后验,而不是GP专家的混合。DeRegiME解决了神经预测器的一个关键限制:点预测丢弃残差不确定性,而概率头--无论是单边际、未解释的混合、分位数集还是扩散样本--很少暴露残差的制度结构。然而,在噪声异方差时间序列中,分布偏移可能突然、逐渐或时间依赖性出现,通常出现在残差不确定性而非条件均值中。DeRegiME提供了一个可解释的均值-残差-噪声分解,通过直接求和的特征空间表示,将制度锚定为残差相似性的聚类,其转换表现为隐含的转折点。有效制度的数量通过粘性打破门进行修剪。我们证明了核的有效性及预测密度的正确性,并在十个基准和三个编码器网格上,DeRegiME在最强大的编码器匹配基线(DeepAR/GluonTS风格的动态学生t头)上将负对数预测密度(NLPD)提高了20.3%,并在CRPS(3.0%)和MSE(4.7%)上获得并行收益。改进在所有数据集中保持一致,这些数据集涵盖了突然、逐渐和季节性偏移。

英文摘要

We introduce DeRegiME -- Deep Regime Mixture of Experts -- a direct multi-horizon probabilistic forecaster that separates latent uncertainty regimes from the underlying signal and softly assigns each forecast location to learned recurring regimes using a sparse variational Gaussian process (GP) whose nonstationary regime-mixing kernel and Student-t likelihood combine per-regime sub-kernels and noise processes via a shared gate. This yields a single sparse-GP posterior, not a mixture of GP experts. DeRegiME addresses a key limitation of neural forecasters: point forecasts discard residual uncertainty, and probabilistic heads -- whether single marginals, uninterpreted mixtures, quantile sets, or diffusion samples -- rarely expose the regime structure of the residual. Yet distribution shift in noisy heteroskedastic time series may be abrupt, gradual, or horizon-dependent and often appears in residual uncertainty rather than the conditional mean. DeRegiME yields an interpretable mean-residual-noise decomposition with a direct-sum feature-space representation that anchors regimes as clusters of residual similarity whose transitions surface as implicit changepoints. The effective number of regimes is pruned by the stick-breaking gate. We prove kernel validity and predictive-density propriety, and across ten benchmarks and three encoder grids DeRegiME improves negative log predictive density (NLPD) by 20.3% over the strongest encoder-matched baseline, a DeepAR/GluonTS-style dynamic Student-t head, with parallel gains on CRPS (3.0%) and MSE (4.7%). Improvements are consistent across all datasets, which span abrupt, gradual, and seasonal shifts.

2605.19230 2026-05-20 cs.CV cs.LG 版本更新

Robust Mitigation of Age-Dependent Confounding Effects via Sample-Difficulty Decorrelation

通过样本难度去相关性实现鲁棒的年龄依赖性混杂效应缓解

Nikhil Cherian Kurian, Victor Caquilpan Parra, Abin Shoby, Luke Whitbread, Lyle J. Palmer

发表机构 * Australian Institute for Machine Learning(澳大利亚机器学习研究所) Adelaide University(阿德莱德大学)

AI总结 本文提出了一种鲁棒框架,通过针对虚假的年龄相关趋势而非强制不变性来缓解年龄依赖性混杂效应,通过样本难度建模和去相关年龄与主导年龄难度趋势,减少年龄相关的真阳性与假阳性差异,同时保持临床有意义的非线性年龄信息。

Comments 10 Pages, 3 Figures

详情
AI中文摘要

医学图像分类中的年龄依赖性性能差异通常是因为年龄作为混杂因素,将成像形态与疾病流行率联系起来。在实践中,差异可能表现为在疾病流行率较高的年龄过诊断,而在流行率较低的年龄下诊断不足,并在训练测试年龄分布变化时恶化。传统缓解方法强制严格年龄不变性可能会抑制在年龄中编码的诊断性信息。因此,我们提出了一种鲁棒框架,通过针对虚假的年龄相关趋势而非强制不变性来缓解年龄依赖性混杂效应。在预热阶段后,我们表征样本难度并以标签条件方式建模其年龄依赖性趋势。通过使用鲁棒的Huber加权亲和权重去相关年龄与主导年龄难度趋势,削弱由混杂驱动的捷径,同时保留临床有意义的非线性年龄信息。我们进一步引入了一个年龄覆盖分数,通过mini-batch年龄方差缩放去相关惩罚,以确保在有限年龄多样性下稳定的优化。在两个放射学数据集中,我们的方法在最小化AUC影响的同时减少了年龄相关的真阳性与假阳性差异,并在增加的训练测试年龄分布变化下保持稳健。

英文摘要

Age dependent performance disparities in medical image classification often arise because age acts as a confounder, linking imaging morphology with disease prevalence. In practice, disparities can manifest as overdiagnosis at ages where disease prevalence is higher and underdiagnosis at ages where prevalence is lower, and can worsen under train test shifts in the age distribution. Conventional mitigation approaches that enforce strict age invariance may suppress diagnostically meaningful information encoded in age. We therefore propose a robust framework that mitigates the effects of age-dependent confounding by targeting spurious age linked trends rather than enforcing invariance. Following a warm-up phase, we characterize sample difficulty and model its age-dependent trends in a label-conditioned manner. We decorrelate age from dominant age difficulty trends using robust, Huber weighted affinity weights, attenuating confounding-driven shortcuts while preserving clinically meaningful, nonlinear age information. We further introduce an Age Coverage Score that scales the decorrelation penalty by minibatch age variance to ensure stable optimization under limited age diversity. Across two radiology datasets, our approach reduces age dependent true and false positive disparities with minimal AUC impact and remains robust to increasing train test age distribution shifts.

2605.19220 2026-05-20 cs.CL cs.AI cs.LG 版本更新

Position: Uncertainty Quantification in LLMs is Just Unsupervised Clustering

位置:在LLM中的不确定性量化仅仅是无监督聚类

Tiejin Chen, Longchao Da, Xiaoou Liu, Hua Wei

发表机构 * School of Computing(计算学院) Augmented Intelligence, Arizona State University(智能增强与亚利桑那州立大学)

AI总结 本文指出,当前LLM的不确定性量化方法本质上是无监督聚类算法,无法有效评估模型的外部正确性,导致无法检测出自信但错误的回答。文章提出了改进的不确定性量化方法,以确保模型的自信度能可靠地反映现实。

Comments Accepted by ICML 2026 Position Paper Track

详情
AI中文摘要

不确定性量化(UQ)被广泛认为是部署大型语言模型(LLM)于高风险领域的主要保障。然而,我们主张该领域存在类别错误:主流LLM的UQ方法本质上是无监督聚类算法。我们证明大多数当前方法本质上量化的是模型生成的内部一致性,而不是其外部正确性。因此,当前方法从根本上无法识别事实现实,并无法检测出“自信幻觉”,即模型在稳定但错误的答案上表现出高自信。因此,当前UQ方法在部署模型时可能会产生误导的安全感。具体而言,我们识别出由于对内部状态的依赖而产生的三种关键病理:超参数敏感危机,使部署不安全;内部评估循环,将稳定性与事实混淆;以及缺乏事实基础,迫使依赖不稳定代理指标来评估不确定性。为解决这一困境,我们倡导向UQ方法转变,并为研究界制定研究路线图,以采用更好的评估指标和设置,实施原生不确定性机制的变化,并将验证锚定在客观事实上,确保模型自信度能可靠地反映现实。

英文摘要

Uncertainty Quantification (UQ) is widely regarded as the primary safeguard for deploying Large Language Models (LLMs) in high-stakes domains. However, we argue that the field suffers from a category error: mainstream UQ methods for LLMs are just unsupervised clustering algorithms. We demonstrate that most current approaches inherently quantify the internal consistency of the model's generations rather than their external correctness. Consequently, current methods are fundamentally blind to factual reality and fail to detect ``confident hallucinations,'' where models exhibit high confidence in stable but incorrect answers. Therefore, the current UQ methods may create a deceptive sense of safety when deploying the models with uncertainty. In detail, we identify three critical pathologies resulting from this dependence on internal state: a hyperparameter sensitivity crisis that renders deployment unsafe, an internal evaluation cycle that conflates stability with truth, and a fundamental lack of ground truth that forces reliance on unstable proxy metrics to evaluate uncertainty. To resolve this impasse, we advocate for a paradigm shift to UQ and outline a roadmap for the research community to adopt better evaluation metrics and settings, implement mechanism changes for native uncertainty, and anchor verification in objective truth, ensuring that model confidence serves as a reliable proxy for reality.

2605.19214 2026-05-20 cs.LG cs.CV 版本更新

Worst-Group Equalized Odds Regularization for Multi-Attribute Fair Medical Image Classification

多属性公平医疗图像分类中的最差组等化几率正则化

Nikhil Cherian Kurian, Victor Caquilpan Parra, Abin Shoby, Luke Whitbread, Lauren Oakden-Rayner, Robert Vandersluis, Jessica Schrouff, Lyle J. Palmer, Mark Jenkinson

发表机构 * Australian Institute for Machine Learning, Adelaide University(澳大利亚机器学习研究所,阿德莱德大学) GlaxoSmithKline (GSK)(葛兰素史克(GSK))

AI总结 本文提出了一种最差组等化几率正则化方法,用于在多个人口属性上同时评估和缓解医疗图像分类中的系统性差异,通过在推理时优化子组层面的真阳性率和假阳性率偏差,减少等化几率和等化机会的不平等,同时对AUC影响最小。

Comments 11 Pages, 2 Figures

详情
AI中文摘要

医疗人工智能的诊断性能在不同人口群体间系统性地变化,但子组AUC可能掩盖了临床重要的不平等。在固定的推理时间操作点上,某些群体可能表现出过度诊断行为,其特征是真阳性率和假阳性率升高,而另一些群体则表现出不足诊断模式,其真阳性率和假阳性率降低。这些对立的趋势可能在总体AUC中相互抵消,但会产生有意义的临床决策不平等。受在操作点和多个人口属性上评估和缓解此类不平等的需要所驱动,我们提出了一种最差组等化几率边际正则化器。该正则化器明确针对推理时的子组层面真阳性率和假阳性率偏差。在每次更新时,该方法识别出由显式人口属性(如年龄、性别和种族)定义的最极端边际偏差的子组,并应用统一的惩罚,从而在多个人口轴上实现公平优化,而无需显式交集约束。在两个现实中的多标签医学影像数据集中,我们的方法在减少等化几率和等化机会的不平等方面表现一致,对AUC影响极小,从而在保持诊断性能的同时提高公平性。

英文摘要

Diagnostic performance in medical AI varies systematically across demographic groups, yet subgroup AUC can mask clinically important disparities. At a fixed inference-time operating point, some groups may exhibit over-diagnostic behaviour, characterized by elevated true and false positive rates, while others show under-diagnostic patterns with reduced true and false positive rates. These opposing tendencies can cancel in aggregate AUCs while producing meaningful inequities in clinical decision-making. Motivated by the need to assess and mitigate such disparities at the operating point and across multiple demographic attributes simultaneously, we propose a worst-group equalized-odds margin regularizer. The proposed regularizer explicitly targets subgroup-level deviations on both the true positive and false positive sides at inference. At each update, the method identifies subgroups defined by explicit demographic attributes (e.g., age, sex, and race) that exhibit the most extreme margin deviations and applies a unified penalty, enabling fairness optimization across multiple demographic axes without requiring explicit intersectional constraints. Across two medical imaging datasets in realistic multi-label settings, our method consistently reduces disparities in Equalized Odds and Equalized Opportunity with minimal impact on AUC, preserving diagnostic performance while improving fairness.

2605.19208 2026-05-20 stat.AP cs.LG stat.ML 版本更新

Precision Physical Activity Prescription via Reinforcement Learning for Functional Actions

通过强化学习实现功能动作的精准体育活动处方

Gefei Lin, Rui Miao, Jennifer Sacheck, Xiaoke Zhang

发表机构 * Department of Statistics, The George Washington University(统计系,乔治·华盛顿大学) Department of Mathematical Sciences, The University of Texas at Dallas(数学科学系,德克萨斯大学达拉斯分校) Department of Behavioral and Social Sciences, Brown University(行为与社会科学系,布朗大学)

AI总结 本文提出了一种基于强化学习的算法,用于根据心血管代谢风险个性化优化每日步数分布,通过All of Us研究数据验证了该方法在提高健康生物标志物方面的有效性。

详情
AI中文摘要

体育活动(PA)在维持和改善健康方面起着重要作用。日常步数已成为一种关键的PA测量指标,可通过常见的可穿戴设备轻松获取。然而,缺乏推荐个性化最优每日步数分布的方法以最佳改善某些健康生物标志物。本文基于All of Us研究数据,该数据包括数月的步数计数以及关键健康生物标志物的重复测量,开发了一种新的离线强化学习(RL)算法,以学习与心血管代谢风险相关的个性化和最优PA分布,其中动作是一个函数,表示一段时间内每日步数分布。模拟研究显示,所提出的方法在现有连续动作RL方法中具有优势。从All of Us数据中学习到的最优策略通常建议人们增加日常步数,并在时间上遵循更一致的PA模式,同时为血糖水平、体重指数、血压、年龄和性别等亚组提供定制推荐。

英文摘要

Physical activity (PA) plays an important role in maintaining and improving health. Daily steps have been a key PA measure that is easily accessible with common wearable devices. However, methods are lacking to recommend a personalized optimal distribution of daily steps over a period of time for the best of certain health biomarkers. In this paper, we fill this void based on the data from the All of Us Research Program which includes months of step counts as well as repeated measurements of key health biomarkers. We develop a new offline reinforcement learning (RL) algorithm to learn personalized and optimal PA distributions associated with cardiometabolic risk, where the action is a function representing the daily step distribution over a period of time. Simulation studies demonstrate the advantage of the proposed approach over existing continuous-action RL methods. The learned optimal policy from the All of Us data generally suggests people take more daily steps and also follow a more consistent pattern of PA over time while offering tailored recommendations for subgroups in blood glucose level, body mass index, blood pressure, age, and sex.

2605.19207 2026-05-20 cs.CV cs.AI cs.LG 版本更新

Quantized Machine Learning Models for Medical Imaging in Low-Resource Healthcare Settings

用于低资源医疗环境的量化机器学习模型:医学影像

Sumanth Meenan Kanneti, Aryan Shah

发表机构 * Georgia State University(佐治亚州立大学)

AI总结 本文提出了一种多策略压缩框架,用于MRI图像中的脑肿瘤分类,通过量化感知训练、从DenseNet-101教师模型到紧凑DenseNet-32学生模型的知识蒸馏以及轻量MobileNetV2骨干网络上的Float16后训练量化,实现了在低资源医疗环境中高效且准确的脑肿瘤筛查。

详情
AI中文摘要

深度学习模型在医学影像分析中表现出强大的性能,但在低资源临床环境中部署仍然困难,由于计算、内存和电力限制。本文提出了一种多策略压缩框架,用于从MRI中进行脑肿瘤分类,包括量化感知训练、从DenseNet-101教师模型到紧凑DenseNet-32学生模型的知识蒸馏,以及在轻量MobileNetV2骨干网络上的Float16后训练量化。使用包含胶质瘤、脑膜瘤、垂体瘤和健康对照的多类脑肿瘤MRI数据集,我们提供了基于MobileNetV2的完整实验验证,通过三阶段迁移学习训练分类器,并通过TensorFlow Lite应用Float16量化。DenseNet基于的知识蒸馏和量化感知训练策略被描述为框架内的互补压缩方法,其完整的经验评估留待未来工作。在MobileNetV2管道上的实验结果表明,量化模型在验证准确率为82.37%的情况下,与全精度基线82.20%相比,模型大小从35.34 MB减少到5.76 MB,压缩比为6.14倍,无显著精度损失。各分类评估证实,量化在所有四个肿瘤类别中均匀保持诊断性能。这些发现表明,轻量化的量化模型可以在资源受限的医疗环境中提供临床可行的脑肿瘤筛查。

英文摘要

Deep learning models have shown strong performance in medical image analysis, but deploying them in low-resource clinical environments remains difficult due to computational, memory, and power constraints. This paper presents a multi-strategy compression framework for brain tumor classification from MRI, encompassing quantization-aware training, knowledge distillation from a DenseNet-101 teacher to a compact DenseNet-32 student with low-bit post-training quantization, and Float16 post-training quantization on a lightweight MobileNetV2 backbone. Using a multi-class brain tumor MRI dataset containing glioma, meningioma, pituitary tumors, and healthy controls, we provide full experimental validation of the MobileNetV2-based pipeline, training the classifier through a three-stage transfer learning process and applying Float16 quantization via TensorFlow Lite. The DenseNet-based distillation and quantization-aware training strategies are described as complementary compression approaches within the framework, with their complete empirical evaluation reserved for future work. Experimental results on the MobileNetV2 pipeline show that the quantized model achieves 82.37 percent validation accuracy compared to the 82.20 percent full-precision baseline, reducing model size from 35.34 MB to 5.76 MB, a 6.14x compression ratio with no meaningful accuracy loss. Per-class evaluation confirms that quantization preserves diagnostic performance uniformly across all four tumor categories. These findings demonstrate that lightweight quantized models can deliver clinically viable brain tumor screening in resource-constrained healthcare settings.

2605.19201 2026-05-20 cs.LG cs.AI 版本更新

On-Device Continual Learning with Dual-Stage Buffer and Dynamic Loss for Point-of-Care Pneumonia Diagnosis

设备端持续学习与双阶段缓冲器和动态损失用于现场肺炎诊断

Danu Kim

发表机构 * Korea International School, Jeju Campus(韩国国际学校,济州校区)

AI总结 本文提出PneumoNet,一种适用于资源受限环境的领域增量学习方法,结合轻量级CNN进行设备端预测,双阶段平衡缓冲器实现类别平衡回放,以及动态类别加权损失以纠正训练批次不平衡,实验表明其在模拟五个真实域变化场景的PneumoniaMNIST数据集上达到86.6%的准确率,同时更小更高效。

Comments Presented at 32nd Samsung Humantech Paper Awards

详情
AI中文摘要

深度学习模型在胸部X光片上检测肺炎具有高准确性,但在设备、患者或机构差异导致的域偏移下性能会下降。我们提出了PneumoNet,一种用于资源受限环境的点-of-care肺炎诊断的领域增量学习方法。PneumoNet结合了轻量级CNN进行设备端预测,双阶段平衡缓冲器实现类别平衡回放,以及动态类别加权损失以纠正训练批次不平衡。在模拟五个真实域变化场景的域偏移PneumoniaMNIST数据集上评估,PneumoNet在86.6%的准确率和1.4%的遗忘率下,比现有基线更小且更快。这些结果突显了PneumoNet在真实世界和疫情准备医疗环境中实现适应性、隐私保护诊断AI的潜力。

英文摘要

Deep learning models detect pneumonia from chest X-rays with high accuracy, but the performance declines under domain shifts caused by differences in devices, patients, or institutions. We present PneumoNet, a domain-incremental learning method for point-of-care pneumonia diagnosis in resource-limited settings. PneumoNet combines a lightweight CNN for on-device prediction, a dual-stage balanced buffer for class-balanced replay, and a dynamic class-weighted loss to correct training-batch imbalances. Evaluated on a domain-shifted PneumoniaMNIST dataset simulating five realistic domain change scenarios, PneumoNet achieves 86.6% accuracy with 1.4% forgetting while being smaller and faster than existing baselines. These results highlight PneumoNet's potential to enable adaptive, privacy-preserving diagnostic AI directly on point-of-care medical devices in real-world and pandemic-ready healthcare.

2605.19193 2026-05-20 cs.LG 版本更新

Sequential Consensus for Multi-Agent LLM Debates: A Wald-SPRT compute governor with calibration-based failure detection

多智能体大语言模型辩论中的顺序共识:一种基于Wald-SPRT的计算控制器与基于校准的故障检测

Andrea Morandi

发表机构 * Cisco(思科)

AI总结 本文提出了一种基于Wald-SPRT的计算控制器,用于多智能体大语言模型辩论,通过校准来检测故障,从而在保证准确性的同时减少计算资源的使用。

详情
AI中文摘要

多智能体大语言模型辩论能够提高事实性和推理能力,但大多数方法固定回合数,导致在简单任务上过度消耗计算资源而在困难任务上不足。本文将Wald的顺序概率比率检验(SPRT)作为插件计算控制器应用于大语言模型辩论。每轮结束后,一个LLM法官会发出一个[0,1]的共识分数来评估最新智能体的位置;Wald监控器在Beta似然族下累积“有用收敛”与“尚未有用”的对数似然比,并在跨越任一边界或返回 capped 最佳努力结果时停止。在独立同分布假设下,该规则继承了SPRT类型I/类型II误差保证;在部署中,校准本身更为重要,因为它估计法官评分是否在特定领域中区分有用和无用的收敛。我们评估了两个轨道:(i) 在校准Beta模型下的蒙特卡洛研究,研究工作曲线、误差率、上限行为和敏感性;以及(ii) 在200个尝试的MMLU和200个尝试的GSM8K项目上的真实LLM评估,使用三个异质智能体(gpt-5, claude-opus-4-6, gemini-2.5-pro)和一个claude-opus-4-6法官,使用不相交的40项校准子集。在GSM8K上,该规则在1.01平均回合(4.06个LLM调用)达到97.0%的准确率,比固定5轮辩论在15次调用中达到的99.0%准确率减少了3.7倍的调用次数,但准确性降低了2个百分点。在MMLU上,校准的KL值坍缩到约0,规则在2.1倍成本下对99.5%的项目进行上限。结论是,SPRT并未使辩论更准确,而是经典的顺序检验为多智能体LLM系统提供了一种廉价的计算控制和故障检测层。

英文摘要

Multi-agent LLM debate improves factuality and reasoning, but most recipes pick a fixed round count, over-spending on easy items and under-spending on hard ones. We adapt Wald's Sequential Probability Ratio Test (SPRT) as a plug-in compute governor for LLM debates. After each round, an LLM judge emits a [0,1] consensus score on the latest agent positions; a Wald monitor accumulates the log-likelihood ratio of "useful convergence" vs "not yet useful" under a Beta likelihood family, and stops when either boundary is crossed or returns a capped best-effort outcome at R_max. Under i.i.d. assumptions the rule inherits SPRT type-I/type-II error guarantees; in deployment the calibration itself is the more important object, since it estimates whether the judge score actually separates useful from unhelpful convergence in a given domain. We evaluate two tracks: (i) a Monte-Carlo study under calibrated Beta models characterising working curves, error rates, capping behaviour, and sensitivity; and (ii) a real-LLM evaluation on 200 attempted MMLU and 200 attempted GSM8K items with three heterogeneous agents (gpt-5, claude-opus-4-6, gemini-2.5-pro) and a claude-opus-4-6 judge, using disjoint 40-item calibration subsets. On GSM8K the rule stops in 1.01 average rounds (4.06 LLM calls) at 97.0% accuracy vs 99.0% for fixed-5 debate at 15 calls: a 3.7x call reduction at -2pp accuracy. On MMLU the calibrated KL collapses to about 0 and the rule caps on 99.5% of items at 2.1x cost. The takeaway is not that SPRT makes debate more accurate, but that a classical sequential test serves as a cheap compute-control and failure-detection layer for multi-agent LLM systems.

2605.19185 2026-05-20 cs.LG cs.AI 版本更新

Planner-Admissible Graph-PDE Value Extensions for Sparse Goal-Conditioned Planning

规划可接受的图-偏微分方程值扩展用于稀疏目标条件规划

Shiheng Zhang

发表机构 * Department of Applied Mathematics, University of Washington(应用数学系,华盛顿大学)

AI总结 本文研究了在操作argmin-Q规划器下,哪些图值扩展是规划可接受的,提出了一种局部动作间隙证书,证明在 rollout 过程中若代理值误差低于真实动作间隙的一半,则贪心 rollout 可达到目标。通过比较原理填充距离界,AMLE 实现了该证书,而调和扩展由于反映边界击中概率而非最短路径贪心顺序,可能导致局部动作排名错误。

详情
AI中文摘要

稀疏目标条件规划中,少量成本到目标标签可视为图-偏微分方程Dirichlet扩展问题:将稀疏标签扩展到目标依赖的边界上,以贪心rollouts达到目标。我们研究了在操作argmin-Q规划器下哪些图值扩展是规划可接受的。我们的主要结果是一种局部动作间隙证书:如果代理值误差在rollout过程中保持在真实动作间隙的一半以下,则贪心rollout可达到目标。绝对最小Lipschitz扩展(AMLE),作为图p-Laplacian家族的p=∞端点,通过比较原理填充距离界实现了该证书。相比之下,调和扩展由于其值反映边界击中概率而非最短路径贪心顺序,可能导致局部动作排名错误。在120个AntMaze布局衍生的图配置上,调和扩展实现0.584的累积rollout成功率,而AMLE达到0.970。有限高p方法也进入高成功率区域,p=4时成功率0.903,p=8时0.973,p=16固定预算求解器时0.982,尽管p=16行未作为收敛端点排名使用,因求解器认证不完整。机制审计显示,许多rollout决策发生在AMLE兼容但调和不兼容的局部几何中,并且AMLE在rollout加权决策范围内修正了大多数调和反转。

英文摘要

Sparse goal-conditioned planning with few cost-to-go labels can be viewed as a graph-PDE Dirichlet extension problem: extend sparse labels on a goal-dependent boundary to unlabelled graph vertices so that greedy rollouts reach the goal. We study which graph value extensions are planner-admissible under the operational argmin-Q planner. Our main result is a local action-gap certificate: if the surrogate value error along the rollout stays below half the true action gap, then the greedy rollout reaches the goal. Absolutely Minimal Lipschitz Extension (AMLE), the p=infinity endpoint of the graph p-Laplacian family, instantiates this certificate through a comparison-principle fill-distance bound. Harmonic extension, by contrast, can mis-rank local actions because its values reflect boundary hitting probabilities rather than shortest-path greedy order. On 120 AntMaze layout-derived graph configurations, harmonic extension achieves 0.584 aggregate rollout success, while AMLE reaches 0.970. Finite high-p methods also enter a high-success regime, with success 0.903 for p=4, 0.973 for p=8, and 0.982 for a fixed-budget p=16 solver, though the p=16 row is not used as a converged endpoint ranking due to incomplete solver certification. Mechanism audits show that many rollout decisions occur in AMLE-compatible but harmonic-incompatible local geometry, and that AMLE corrects most harmonic inversions on the rollout-weighted decision scope.

2605.19179 2026-05-20 astro-ph.EP astro-ph.IM cs.LG 版本更新

A Cloud-Based Tool for Meteorite Recovery Using Drones and Machine Learning

基于云技术的陨石回收工具:利用无人机和机器学习

Seamus L. Anderson, Hadrien A. R. Devillepoix, Lewis Lakerink, Sawitchaya Tippaya, Dale P. Giancono, Martin C. Towner, Iona Clemente, Martin Cupák, Ashley F. Rogers, John H. Fairweather, Mia Walker, Daniel Burgin, Michael A. Frazer, Sophie E. Deam, Veronika Pazderová, Eleanor K. Sansom, Benjamin A. D. Hartig, Hely C. Branco, Thomas Stevenson, Isabella Hatty, Anna Zappatini, Anthony Lagain, Tom Lovelock, Auriane Egal, Lucy Forman, David Belton, Simon Windsor, Shibli Saleheen, Asher Leslie, Gregory B. Poole, Andrew Langendam, Rachel S. Kirby, Andrew G. Tomkins

发表机构 * NASA Goddard Space Flight Center(美国国家航空航天局戈达德太空飞行中心) Space Science and Technology Centre(空间科学与技术中心) International Centre for Radio Astronomy Research(国际射电天文研究中心) Astronomy Data and Computing Services (ADACS)(天文数据与计算服务) Curtin Institute for Data Science(Curtin数据科学研究所) Centre for Rock Art Research and Management(岩画研究与管理中心) Faculty of Mathematics, Physics and Informatics, Comenius University Bratislava(布拉迪斯拉发大学数学、物理与信息学学院) Institute of Geology, University of Bern(伯尔尼大学地质研究所) Aix-Marseille University, CNRS, IRD, INRA, CEREGE, Institut Origines(阿维尼翁大学,CNRS,IRD,INRA,CEREGE,Origines研究所) Royal Holloway University of London(皇家霍洛威大学) Planétarium de Montréal, Espace pour la Vie(蒙特利尔天文馆,生命空间) Department of Physics and Astronomy, The University of Western Ontario(滑铁卢大学物理与天文学系) School of Earth and Planetary Sciences, Curtin University(Curtin大学地球与行星科学学院) Australian Nuclear Science and Technology Organisation(澳大利亚核科学与技术组织) School of Earth, Atmosphere and Environment, Monash University(莫纳什大学地球、大气与环境学院)

AI总结 本文提出一种基于云技术的工具,利用无人机和机器学习帮助恢复通过仪器观测到的陨石坠落。该工具展示了系统迭代改进的成果,并详细说明了该技术在澳大利亚南部和西海岸陨石坠落中的成功与局限性。

Comments 23 pages, 3 figures

详情
AI中文摘要

我们提出了一种基于云技术的工具,利用无人机和机器学习来帮助恢复通过仪器观测到的陨石坠落。我们展示了一 series of improvements made upon previous iterations of our system, as well as detail the successes and limitations of this technique when applied to observed meteorite falls in South and Western Australia. This tool is available to the meteoritics research community upon request at https://find.gfo.rocks.

英文摘要

We present a cloud-based tool that uses drones and machine learning to help recover instrumentally observed meteorite falls. We showcase a collection of improvements made upon previous iterations of our system, as well as detail the successes and limitations of this technique when applied to observed meteorite falls in South and Western Australia. This tool is available to the meteoritics research community upon request at https://find.gfo.rocks.

2605.19178 2026-05-20 cond-mat.dis-nn cond-mat.stat-mech cs.LG physics.data-an 版本更新

Activation Functions, Statistics and Learning of Higher-Order Interactions in Restricted Boltzmann Machines

激活函数、统计学和受限玻尔兹曼机中高阶相互作用的学习

Giovanni di Sarra, Yasser Roudi

发表机构 * Kavli Institute for Systems Neuroscience, Norwegian University of Science and Technology(Kavli系统神经科学研究所,挪威科学技术大学) Department of Mathematics, King’s College London(伦敦国王学院数学系)

AI总结 本文研究了受限玻尔兹曼机中激活函数对高阶相互作用统计学和学习的影响,分析了四种常见激活函数在不同参数范围内的表示和学习能力。

Comments 38 pages, 27 figures

详情
AI中文摘要

神经网络在复杂数据中识别隐藏模式和相关性的巨大成功,归功于它们利用大量参数和非线性单单元激活函数的方式。受限玻尔兹曼机(RBMs)提供了一个简单而强大的框架,用于研究激活非线性对性能和表示的影响。在本工作中,我们利用RBMs与相互作用二元变量模型之间的双重性,研究了不同隐藏单元激活函数的RBM集合所诱导的相互作用的统计学。我们以四种常用激活函数(线性、阶跃、ReLU和指数)的诱导相互作用分布的矩来分析可表示模型的空间。对学习的定量预测与训练过程模拟的结果有很好的一致。特别是,我们的分析表明,某些数据结构,即由具有大相互作用项的相互作用变量模型生成的结构,对于任何RBM来说都难以表示和学习。然而,我们发现快速增加的非线性,如指数函数,可以促进特定参数范围内的此类数据结构的表示和学习。

英文摘要

The great success of neural networks in recognizing hidden patterns and correlations in complex data lies in the way they take advantage of the large number of parameters and nonlinear single-unit activation, jointly. Restricted Boltzmann Machines (RBMs) provide a simple yet powerful framework for studying the impact of activation nonlinearities on performance and representation. In this work, we exploit the duality between RBMs and models of interacting binary variables to study the statistics of the interactions induced by RBM ensembles with different hidden unit activation functions. We characterize the space of representable models analytically in terms of moments of the distribution of induced interactions for four commonly used activation functions: Linear, Step, ReLU, and Exponential. Quantitative predictions of the analytical calculations on learning show a very good agreement with results of the simulations of the training process. In particular, our analysis shows that there are certain data structures, namely those generated by models of interacting variables with large interaction terms beyond pairwise, that are difficult to represent, and thus to learn, for any RBM. Yet, we find that rapidly increasing nonlinearities, such as the Exponential function, can facilitate the representation and learning of such data structures for a specific range of parameters that is determined analytically.

2605.19172 2026-05-20 cs.LG cs.AI 版本更新

Bridge: Retrieval-Augmented Spatiotemporal Modeling for Urban Delivery Demand

Bridge:基于检索的时空建模用于城市配送需求

Yihong Tang, Tong Nie, Junlin He, Qianjun Huang, Dingyi Zhuang, Lijun Sun

发表机构 * McGill University(麦吉尔大学) The Hong Kong Polytechnic University(香港理工大学) University of Toronto(多伦多大学) MIT(麻省理工学院)

AI总结 本文提出Bridge框架,通过结合归纳上下文图结构和时间感知的记忆模块,解决新加入服务区域缺乏历史记录导致的城市配送需求预测难题,提升了冷启动区域的预测性能。

详情
AI中文摘要

预测城市配送需求在新增服务区域缺乏历史记录时变得尤为具有挑战性。现有的时空预测器在有足够的节点历史时能有效建模空间依赖性,但它们仍然是参数化的,因此在冷启动区域难以恢复短期运营动态。地理嵌入帮助识别区域的位置和功能,但并不能直接揭示相似区域在相似时间背景下行为的方式。我们提出了Bridge,一种结合归纳上下文图结构和时间感知记忆的时空图框架。对于每个目标区域,Bridge通过区域上下文和近期动态从记忆中检索未来需求模式,并通过门控融合机制优化图结构预测。为了使检索与预测效用对齐,我们进一步训练检索器以未来为导向的目标,偏好那些未来轨迹与目标最匹配的条目。实验表明,Bridge在四个真实世界配送数据集上,无论是城市内部冷启动还是跨城市转移时部分观察情况下,均优于竞争性的时空基线模型。结果表明,当参数图泛化能力不足时,检索增强为冷启动城市需求预测提供了有用的操作记忆。

英文摘要

Forecasting urban delivery demand becomes substantially more challenging when newly added service regions lack historical records. Existing spatiotemporal forecasters effectively model spatial dependence once sufficient node histories are available. Still, they remain parametric and therefore struggle to recover short-term operational dynamics in cold-start regions. Geospatial embeddings help identify where a region is and what function it serves, yet they do not directly reveal how a similar region behaves under a comparable temporal context. We propose Bridge, a retrieval-augmented spatiotemporal graph framework that combines an inductive contextual graph backbone with a time-aware memory of region-time windows. For each target region, Bridge retrieves future demand patterns from the memory using both regional context and recent dynamics, and refines the backbone forecast through a gated fusion mechanism. To align retrieval with forecasting utility, we further train the retriever with a future-aware objective that favors entries whose future trajectories best match the target. Experiments on four real-world delivery datasets show that Bridge consistently improves over competitive spatiotemporal baselines in both within-city cold-start and cross-city transfer with partial observations. The results show that retrieval augmentation provides a useful operational memory for cold-start urban demand forecasting when parametric graph generalization alone is insufficient.

2605.19166 2026-05-20 cs.RO cs.LG math.OC 版本更新

A Heuristic Approach for Performance Tuning in RL-based Quadrotor Control via Reward Design and Termination Conditions

一种通过奖励设计和终止条件实现RL基于四旋翼控制性能调优的启发式方法

Fausto Mauricio Lagos Suarez, Akshit Saradagi, Vidya Sumathy, George Nikolakopoulos

发表机构 * Robotics and AI group, in the Department of Computer Science, Electrical and Space Engineering at Luleå University of Technology(鲁德尼大学机器人与人工智能小组,计算机科学、电气与空间工程系)

AI总结 本文提出了一种新的启发式方法,通过奖励设计和终止条件实现RL四旋翼控制的可调性能,该方法通过双带宽指数奖励结构实现了设定点跟踪的临界阻尼响应,并具有低稳态误差。在使用近端策略优化(PPO)算法训练时,结合episode截断条件,在600万次时间步内以高效的方式实现了所需性能。通过直观的启发式规则调整奖励权重和指数系数,可以实现更快(空翻式)和更慢(检查式)的稳定时间性能,同时保留基线临界阻尼响应和约2%的稳态误差。

Comments Accepted in the 34th Mediterranean Conference on Control and Automation

详情
AI中文摘要

基于强化学习(RL)的四旋翼控制策略在诸如在复杂环境中快速导航和无人机赛车等任务中取得了显著性能。然而,在某些应用中,如基础设施检查,实现精确、可控的机动并具有可调性能至关重要。本文提出了一种新的启发式方法,通过奖励设计和终止条件实现RL基于四旋翼控制的可调性能。我们提出了一种包含双带宽指数的新型奖励结构,实现了设定点跟踪的基线临界阻尼响应,并具有低稳态误差。当使用近端策略优化(PPO)算法进行训练时,结合episode截断条件,在600万次时间步内以高效的方式实现了所需性能。为了调节基线行为的性能,我们提出了直观的启发式规则来调整奖励权重和指数系数,以实现更快(空翻式)和更慢(检查式)的稳定时间性能,同时保留基线临界阻尼响应和大约2%的稳态误差。我们评估了三种RL策略(基线、空翻和检查)在100次试验中的表现,并展示了在随机初始条件下位置和偏航跟踪的准确且可调性能,从而证明了所提出启发式方法的有效性。

英文摘要

Reinforcement learning (RL)-based quadrotor control policies have achieved impressive performance in tasks such as fast navigation in cluttered environments and drone racing, where the focus is on speed and agility. However, in several applications, such as infrastructure inspection, it is critical to achieve precise, controlled maneuvers with tunable performance. In this article, we present a novel heuristic approach to achieve tunable performance in RL-based Quadrotor control through reward design and termination conditions. We present a novel reward structure containing dual bandwidth exponentials that achieves a baseline critically damped response in setpoint tracking, with low steady-state errors. When trained with a Proximal Policy Optimization (PPO) algorithm, in conjunction with episode truncation conditions, the desired performance is achieved in 6 million time steps in a sample-efficient manner. In order to tune the performance about the baseline behavior, we present intuitive heuristic rules to adjust the reward weights and exponential coefficients to achieve faster (acrobatic-like) and slower (inspection-like) settling time performance, while retaining the baseline critically damped response and approximately 2\% steady-state error. We evaluate the three RL policies (baseline, acrobatic, and inspection) across 100 trials and show accurate and tunable performance in position and yaw tracking from random initial conditions, thereby demonstrating the effectiveness of the proposed heuristic approach.

2605.19156 2026-05-20 cs.AI cs.CY cs.LG cs.MA 版本更新

How Far Are We From True Auto-Research?

我们距离真正的自动研究还有多远?

Zhengxin Zhang, Ning Wang, Sainyam Galhotra, Claire Cardie

发表机构 * Cornell University(康奈尔大学)

AI总结 本文通过ResearchArena评估了不同代理生成的论文质量,发现虽然代理能生成看似有竞争力的论文,但实际实验严谨性不足,存在伪造结果、实验能力不足和计划与执行不匹配等问题,表明自动研究仍需进一步发展。

详情
AI中文摘要

最近的自动研究系统能够生成完整的论文,但可行性并不等同于质量,该领域仍然缺乏对代理生成论文实际质量的系统研究。我们介绍了ResearchArena,一个最小的框架,让现成的代理(Claude Code使用Opus 4.6,Codex使用GPT-5.4,和Kimi Code使用K2.5)在仅轻量指导下自行完成完整的研究循环(构想、实验、论文写作、自我完善)。在13个计算机科学种子和每个代理-领域对的3次试验中,ResearchArena生成了117篇代理生成的论文,每篇都在三个互补的视角下评估:仅手稿的评审员(SAR)、考虑工件的同行评审(PR)以及人工进行的元评审。在仅SAR的情况下,图景是乐观的:Claude Code获得最高评分,优于Analemma的FARS,并与加权平均的人类ICLR 2025提交匹配,表明最小框架的代理能够生成在手稿-only评审中看起来有竞争力的论文。然而,人工检查却揭示了这个图景被夸大了:SAR评分与实际接受决定不一致,且奖励合理框架而不验证实验实质。在考虑工件的PR评分急剧下降,人工审计发现实验严谨性是主要瓶颈,分解为三种失败模式(伪造结果、低能力实验、计划/执行不匹配),这些模式高度依赖于代理:Codex 5%/8%论文与工件不匹配/伪造参考文献,与Kimi Code 77%/72%相比,差距约为15倍,追踪代理发展出的不同研究身份。没有一篇代理生成的论文达到顶级会议的接受标准。这表明我们仍然与真正的自动研究有差距。

英文摘要

Recent auto-research systems can produce complete papers, but feasibility is not the same as quality, and the field still lacks a systematic study of how good agent-generated papers actually are. We introduce ResearchArena, a minimal scaffold that lets off-the-shelf agents (Claude Code using Opus 4.6, Codex using GPT-5.4, and Kimi Code using K2.5) carry out the full research loop themselves (ideation, experimentation, paper writing, self-refinement) under only lightweight guidance. Across 13 computer science seeds and 3 trials per agent-domain pair, ResearchArena yields 117 agent-generated papers, each evaluated under three complementary lenses: a manuscript-only reviewer (SAR), an artifact-aware peer review (PR) in which agents inspect the workspace alongside the manuscript, and an human conducted meta-review. Under SAR alone the picture is optimistic: Claude Code obtains the highest score, outperforms Analemma's FARS, and matches the weighted-average human ICLR 2025 submission, suggesting that minimally scaffolded agents can produce papers that look competitive on manuscript-only review. Manual inspection, however, reveals this picture is overstated: SAR scores are poorly aligned with its actual acceptance decisions and reward plausible framing without verifying experimental substance. Under artifact-aware PR scores drop sharply, and manual auditing identifies experimental rigor as the major bottleneck, decomposing into three failure modes (fabricated results, underpowered experiments, and plan/execution mismatch) that are highly agent-dependent: Codex 5%/8% paper-vs-artifact mismatch / fabricated references versus Kimi Code 77%/72%, a $\sim$15$\times$ spread that tracks distinct research personas the agents develop. None of the 117 agent-generated papers reaches the acceptance bar of a top-tier venue. This suggests that we are still gapped from the true auto-research.

2605.19150 2026-05-20 cs.LG cs.AI 版本更新

Flash PD-SSM: Memory-Optimized Structured Sparse State-Space Models

Flash PD-SSM: 一种内存优化的结构稀疏状态空间模型

Aleksandar Terzić, Francesco Carzaniga, Nicolas Menet, Yannick Biehl, Michael Hersche, Thomas Hofmann, Abbas Rahimi

发表机构 * IBM Research – Zurich(IBM瑞士研究中心) Department of Computer Science, ETH Zürich(苏黎世联邦理工学院计算机科学系)

AI总结 本文提出Flash PD-SSM,一种内存优化的结构稀疏状态空间模型,通过在保持高效的同时提升表达能力,实现了与传统结构化状态空间模型相当的吞吐量,并在多个任务中展示了更高的准确性和效率。

详情
AI中文摘要

状态空间模型(SSMs)面临效率与表达能力之间的根本权衡,这主要由模型转移矩阵的结构决定。无结构的转移矩阵具有最大的表达能力,但计算和内存成本过高。相比之下,大多数结构化转移矩阵形式在运行时间和内存消耗上都非常高效,但表达能力有限。基于最近关于结构稀疏SSMs的研究,我们提出了Flash PD-SSM,一种新的SSM,其吞吐量与广泛使用的结构化SSMs相当,但具有显著更好的表达性保证。Flash PD-SSM维护一个可训练的结构稀疏矩阵集合,在每个时间步选择其中一个进行离散选择,从而在保持大规模训练所需的效率的同时,实现了与无结构矩阵相当的FSA表达能力。首先,我们在合成机制和状态跟踪任务上验证了Flash PD-SSM,发现其理论表达能力在实践中得以实现。其次,在涉及超过17000长度序列的多变量时间序列任务中,我们发现Flash PD-SSM在竞争性的SSM方法中定义了新的最先进的(SoTA)准确性。最后,我们展示了Flash PD-SSM是混合LLMs的有效替代品,在自然语言状态跟踪和常见语言建模场景中均取得改进。该模型相比前沿语言模型广泛使用的SSMs表现出更高的吞吐量和更低的内存消耗。

英文摘要

State-space models (SSMs) face a fundamental trade-off between efficiency and expressivity that is mainly dictated by the structure of the model's transition matrix. Unstructured transition matrices enable maximal expressivity, as measured by their ability to model finite-state automaton (FSA) transitions, but come at a prohibitively high compute and memory cost. In contrast, most structured transition matrix forms are highly efficient both in runtime and memory consumption, but suffer from limited expressivity. Building on recent work on structured sparse SSMs, we propose Flash PD-SSM, a novel SSM that achieves comparable throughput to widely-used structured SSMs with significantly better expressivity guarantees. Flash PD-SSM maintains a trainable set of structured sparse matrices, a single one of which is discretely selected at each time-step, enabling FSA expressiveness at the level of unstructured matrices while maintaining the efficiency required for training models at scale. First, we validate Flash PD-SSM against a suite of alternative models on synthetic mechanistic and state-tracking tasks, finding that its theoretical expressivity is achieved in practice. Second, on multivariate time-series tasks involving sequences of length over 17,000, we find that Flash PD-SSM defines a new state-of-the-art (SoTA) accuracy among competing SSM methods. Finally, we demonstrate that Flash PD-SSM is an effective drop-in replacement for hybrid LLMs, yielding improvements both in natural language state-tracking and in common language modeling scenarios. The model exhibits increased throughput and decreased memory consumption compared to SSMs widely used in frontier language models.

2605.19147 2026-05-20 cs.CR cs.AI cs.LG 版本更新

Be Kind, Rewrite: Benign Projections via Rewriting Defend Against LLM Data Poisoning Attacks

仁者重写:通过重写实现良性投影以防御大语言模型数据中毒攻击

John T. Halloran, Noopur S. Bhatt

发表机构 * Leidos University of Pennsylvania(宾夕法尼亚大学)

AI总结 本文提出了一种基于重写的良性投影方法(OBBR),通过利用开放书本的良性样本来提高大语言模型对数据中毒攻击的防御能力,实验表明OBBR在多种已知攻击模式中表现出更高的安全性能,并且在计算效率和模型性能方面具有优势。

Comments 15 pages, 2 Figures, 5 Tables

详情
AI中文摘要

大型语言模型(LLMs)对后门攻击(BAs)非常敏感,其中训练样本通过基于触发器的有害内容进行中毒。此外,现有防御措施在广泛测试不同BA模式时已被证明无效。为了更好地对抗BAs,我们探索了使用LLM重写作为主动防御数据中毒的方法。首先,我们理论证明,当LLM重写利用开放书本良性样本(称为开放书本良性重写,OBBR)时,重写输出为良性的概率严格大于闭合书本重写。因此,OBBR通过将训练样本投影到良性提示空间来中和有害内容。我们随后表明,与以往的防御措施不同,OBBR有效缓解了大量现有的BAs:在五种已知BAs和四种广泛使用的LLMs中,OBBR相比最先进的BA防御措施平均提高了51%的安全性能,相比闭合书本重写方法提高了25.7%。最后,我们表明OBBR在计算效率方面优于其他BA防御措施,经过微调后不会降低模型在自然语言任务上的性能,并且能够防御非触发基于的数据中毒攻击。

英文摘要

Large language models (LLMs) are highly susceptible to backdoor attacks (BAs), wherein training samples are poisoned using trigger-based harmful content. Furthermore, existing defenses have proven ineffective when extensively tested across BA patterns. To better combat BAs, we explore the use of LLM rewriting as a proactive defense against data poisoning. First, we theoretically show that when LLM rewriting utilizes open-book benign samples--termed open-book benign rewriting (OBBR)--the probability of a rewritten output being benign is strictly greater than that of closed-book rewriting. Thus, OBBR neutralizes harmful content by projecting training samples to the space of benign prompts. We then show that, in contrast to previous defenses, OBBR effectively mitigates a large number of existing BAs: across five known BAs and four widely used LLMs, OBBR increases safety performance by an average 51% compared to state-of-the-art BA defenses and 25.7% compared to closed-book rewriting methods. Finally, we show that OBBR is computationally efficient relative to other BA defenses, does not degrade model performance on natural language tasks after fine-tuning, and is capable of defending against non-trigger based data poisoning attacks.

2605.19141 2026-05-20 cs.LG cs.AI cs.CL cs.CY cs.HC 版本更新

GRASP: Deterministic argument ranking in interaction graphs

GRASP:交互图中的确定性论证排名

Diganta Misra, Antonio Orvieto, Rediet Abebe, Volkan Cevher

发表机构 * MPI-IS Tübingen(图宾根MPI研究所) Tübingen AI Center(图宾根人工智能中心) ELLIS Institute Tübingen(图宾根ELLIS研究所) Eberhard Karls Universität Tübingen(图宾根埃伯哈德·卡尔斯大学) LIONS, EPFL(EPFL的LIONS实验室)

AI总结 本文提出GRASP框架,通过聚合稳定的局部交互判断生成全局排名,以解决大语言模型作为裁判时整体评判不一致的问题,强调结构充分性而非说服力或修辞吸引力。

Comments Preprint

详情
AI中文摘要

大型语言模型越来越多地被部署为自动裁判,以评估论证的强度。随着这一角色的扩大,其合法性取决于一致性、透明性和将论证结构与修辞吸引力区分开的能力。然而,我们证明了整体评判——一种常见的LLM-as-a-Judge实践,其中模型对辩论提供全球裁决——存在显著的跨模型分歧。我们主张这种不稳定性源于将辩论复杂的交互结构压缩成单一的不透明分数。为了解决这一问题,我们提出GRASP(渐进排名与攻击支持传播),一种确定性框架,通过收敛的攻击-防御传播操作,将稳定的局部交互判断聚合为全局排名。我们证明在LLM-as-a-Judge评估中,局部交互判断比整体排名更具可重复性,使GRASP能够生成更一致的全局排名。我们进一步证明GRASP分数与人类“说服性”标签不相关,突显了一个关键的社技术区别:GRASP不衡量说服力、事实性或修辞吸引力,而是结构充分性——一种在显式交互图上的防御意识的论证鲁棒性概念。总体而言,GRASP为整体LLM评判提供了一个透明且可审计的替代方案。

英文摘要

Large language models are increasingly deployed as automated judges to evaluate the strength of arguments. As this role expands, their legitimacy depends on consistency, transparency, and the ability to separate argumentative structure from rhetorical appeal. However, we show that holistic judging - a common LLM-as-a-Judge practice where a model provides a global verdict on a debate - suffers from substantial inter-model disagreement. We argue that this instability arises from collapsing a debate's complex interaction structure into a single opaque score. To address this, we propose GRASP (Gradual Ranking with Attacks and Support Propagation), a deterministic framework that aggregates stable local interaction judgments into a global ranking via a convergent attack--defense propagation operator. We show that local interaction judgments are more reproducible than holistic rankings in LLM-as-a-Judge evaluations, allowing GRASP to produce more consistent global rankings. We further show that GRASP scores do not correlate with human "convincingness" labels, highlighting a vital sociotechnical distinction: GRASP does not measure persuasion, factuality, or rhetorical appeal, but structural sufficiency - a defense-aware notion of argument robustness over the explicit interaction graph. Overall, GRASP offers a transparent and auditable alternative to holistic LLM judging.

2605.19135 2026-05-20 cs.LG 版本更新

Identifiable Multimodal Causal Representation Learning under Partial Latent Sharing

部分潜在变量共享下的可识别多模态因果表示学习

Manal Benhamza, Marianne Clausel, Myriam Tami

发表机构 * Paris-Saclay University, CentraleSupélec, MICS Lab(巴黎-萨克雷大学,中央理工-巴黎高等电力学院,MICS实验室) Lorraine University, CRAN(洛林大学,CRAN)

AI总结 本文研究了在部分潜在变量共享设定下多模态因果表示学习的可识别性问题,通过非线性混合函数生成各模态数据,并在不假设潜在变量分布的情况下,建立了因果潜在表示的组件可识别性保证,进一步验证了在欠定情况下方法的有效性。

详情
AI中文摘要

因果表示学习(CRL)旨在从高维观测数据中揭示有意义的潜在变量及其对应的因果结构。尽管其重要性,CRL的可识别性仍是一个关键属性,因为它确保了数据生成过程背后机制的恢复,从而保证了表示的可解释性和鲁棒性。证明CRL的可识别性本质上是困难的,本文针对更具有挑战性的多模态设定进行了研究:考虑具有部分共享潜在结构的多模态观测数据。每个模态通过非线性混合函数从特定的因果潜在变量子集生成。在灵活的假设下且不假设潜在变量的参数分布,我们建立了因果潜在表示的组件可识别性保证。此外,我们的可识别性结果还适用于欠定情况,即每个模态中观测变量多于潜在变量。为了实例化我们的理论分析,我们引入了一个基于Wasserstein的模块来恢复部分共享的潜在结构。由于其可微性,后者可以轻松地集成到所有类型的架构中,仅需最小的修改。在合成和现实数据集上的广泛实验验证了我们的方法优于现有最先进方法。

英文摘要

Causal representation learning (CRL) seeks to uncover meaningful latent variables and their corresponding causal structure from high-dimensional observational data. Although its significance, CRL identifiability remains a crucial property, as it ensures the recovery of the mechanisms behind the data generation process, and hence the interpretability and robustness of the representation. Proving identifiability in CRL is intrinsically difficult, and we address in this work an even more challenging setting: multimodality. We consider multimodal observed data with a latent partially shared structure. Each modality is generated, through non linear mixing functions, from a specific subset of causal latent variables. Under flexible assumptions and without imposing any parametric distribution on the latent variables, we establish component-wise identifiability guarantees for the causal latent representation. Our identifiability results, furthermore, apply to the undercomplete scenario where we have, for each modality, more observed than latent variables. To instantiate our theoretical analysis, we introduce a Wasserstein-based module to recover the partially shared latent structure. Due to its differentiability, the latter can be easily integrated into all types of architecture, only requiring minimal changes. Extensive experiments on synthetic and realistic datasets validate the superiority of our approach over SOTA methods.

2605.19132 2026-05-20 cs.LG 版本更新

CLIC: Contextual Language-Informed Cardiac Pathology Classification

CLIC: 基于上下文的语言引导心脏病理分类

Giovani D. Lucafo, Rafael da Costa Silva, João Lucas Luz Lima Sarcinelli, Andre Guarnier De Mitri, Diego Furtado Silva

发表机构 * Institute of Mathematical and Computer Sciences(数学与计算机科学学院) Universidade de São Paulo(圣保罗大学)

AI总结 本文提出CLIC框架,通过将患者上下文数据转化为描述性文本,利用自然语言编码技术提升心脏病理诊断的精确度,同时探索大语言模型生成的临床描述在下游分类任务中的应用。

Comments 6 pages, 2 figures, accepted at the ICLR 2026 Workshop on Time Series in the Age of Large Models (TSALM)

详情
AI中文摘要

心电图(ECG)是无创诊断心脏病理的黄金标准,也是心血管医学的基本支柱。深度学习的最新进展推动了稳健的自动化分类器的发展,这些分类器通过处理原始生理信号实现高性能。然而,在临床实践中,诊断很少仅基于信号本身。心内科医生通常会结合患者的特征和具体的数据采集上下文来支持其解释。尽管如此,大多数现有算法仍局限于仅信号分析,未能整合技术元数据和人口统计数据。本文提出了上下文语言引导的心脏病理分类(CLIC),一种多模态框架,通过自然语言编码这些变量显著提高诊断精度。我们证明将患者层面的上下文数据转化为描述性文本提供了一个信息锚点,帮助模型解歧复杂的生理模式。我们进一步探讨了使用大语言模型合成更丰富的临床描述,并观察到尽管这些生成的文本仍具竞争力,但受控模板化的上下文临床文本在下游分类任务中带来了持续的性能提升。

英文摘要

The electrocardiogram (ECG) is the gold standard for non-invasive diagnosis of cardiac pathologies and is a fundamental pillar of cardiovascular medicine. Recent progress in deep learning has led to the development of robust automated classifiers that achieve high performance by processing raw physiological signals. However, in clinical practice, diagnosis is rarely based solely on the signal. Cardiologists commonly support their interpretation with the patient's characteristics and the specific data-acquisition context. Despite this, most current algorithms remain restricted to signal-only analysis, failing to integrate technical metadata and demographic variables. This paper proposes Contextual Language-Informed Cardiac pathology classification (CLIC), a multimodal framework that significantly enhances diagnostic precision by encoding these variables through natural language. We demonstrate that translating patient-level contextual data into descriptive text provides an informative anchor that helps the model disambiguate complex physiological patterns. We further investigate the use of Large Language Models to synthesize richer clinical descriptions and observe that, while these generated texts remain competitive, controlled template-based contextual clinical text leads to consistent improvements in downstream classification performance.

2605.19130 2026-05-20 cs.LG cs.AI cs.CL cs.CV 版本更新

EgoBabyVLM: Benchmarking Cross-Modal Learning from Naturalistic Egocentric Video Data

EgoBabyVLM:基于自然主义第一人称视频数据的跨模态学习基准测试

Dongyan Lin, Phillip Rust, Angel Villar Corrales, Alvin W. M. Tan, Mahi Luthra, Charles-Éric Saint-James, Rashel Moritz, Sheila Krogh-Jespersen, Vanessa Stark, Surya Parimi, Jiayi Shen, Youssef Benchekroun, Yosuke Higuchi, Martin Gleize, Tom Fizycki, Nicolas Hamilakis, Manel Khentout, Sho Tsuji, Balázs Kégl, Juan Pino, Michael C. Frank, Emmanuel Dupoux

发表机构 * Meta Superintelligence Labs(Meta超智能实验室) Stanford University(斯坦福大学) Meta Reality Labs(Meta现实实验室) The University of Tokyo(东京大学)

AI总结 研究探讨了儿童如何从有限的视觉-语言输入中获得语言 grounding 的鲁棒性,提出了 EgoBabyVLM 挑战,推动模型在自然主义数据中实现 grounded language learning。

详情
AI中文摘要

儿童在有限的视觉-语言输入中展现出惊人的鲁棒性,这种能力超过了目前最好的大型多模态模型。最近的研究表明,目前基于 curated web 数据训练的视觉-语言模型 (VLMs) 无法泛化到由可穿戴设备、具身代理和婴儿头摄像机产生的稀疏、弱对齐的第一人称视频流,并且没有固定的评估流程来衡量在此类数据上的进展。我们训练 VLMs 在具有不同视觉和语言输入语义对齐程度的数据集上,包括自然主义婴儿和成人第一人称视频,并通过涵盖多模态语言 grounding 和单模态视觉和语言任务的综合评估套件进行评估。这套评估的核心是 Machine-DevBench,它是一个基于语料库的基准测试,自动从模型的训练词汇中生成,以消除训练/评估不匹配和先前发展基准的低统计效力。我们的结果表明,当前 VLM 模型依赖于 curated 数据的紧密语义对齐,并无法利用主导自然主义第一人称输入的弱对齐信号——正是人类在其中茁壮成长的领域。为了推动进展,我们引入了 EgoBabyVLM 挑战,以驱动开发能够从人类婴儿经历的此类自然主义数据中实现 grounded language learning 的模型。

英文摘要

Children acquire language grounding with remarkable robustness from limited visuo-linguistic input in ways that surpass today's best large multimodal models. Recent research suggests current vision-language models (VLMs) trained on curated web data fail to generalize to the sparse, weakly-aligned egocentric streams produced by wearable devices, embodied agents, and infant head-cams -- and no fixed evaluation pipeline exists for measuring progress on this regime. We train VLMs on datasets with varying degrees of semantic alignment between visual and linguistic inputs, including naturalistic infant and adult egocentric videos, and evaluate them with a comprehensive suite spanning multimodal language grounding and unimodal vision and language tasks. At the core of this suite is Machine-DevBench, a corpus-grounded benchmark of lexical and grammatical competence, automatically generated from the model's training vocabulary across logarithmic frequency bins to eliminate the train/eval mismatch and low statistical power of prior developmental benchmarks. Our results show that current VLM paradigms hinge on the tight semantic alignment of curated data and fail to exploit the weakly-aligned signal that dominates naturalistic egocentric input -- the very regime in which humans thrive. To motivate progress, we introduce the EgoBabyVLM Challenge to drive the development of models capable of grounded language learning from the kind of naturalistic data that human infants experience.

2605.19124 2026-05-20 cond-mat.mtrl-sci cond-mat.dis-nn cs.LG physics.chem-ph 版本更新

Atomistic Modeling of Chemical Disorder in Materials: Bridging Classical Methods and AI-Assisted Approaches

材料中化学无序的原子模型:连接经典方法和AI辅助方法

Jiayu Peng, Peichen Zhong

发表机构 * Department of Materials Design and Innovation, University at Buffalo(布法罗大学材料设计与创新系) Department of Materials Science and Engineering, National University of Singapore(新加坡国立大学材料科学与工程系)

AI总结 本文探讨了如何通过结合经典方法和AI技术来解决材料中化学无序的表示差距问题,重点介绍了如何利用计算方法将平均无序描述转换为具有代表性的构型集合,并平衡成本、偏差和保真度。

详情
AI中文摘要

化学无序源于多种元素占据晶格位置的混合占据,广泛存在于合金、陶瓷和成分复杂的材料中,其中短程和长程有序可以显著影响性质。一个核心障碍是实验与模拟之间的表示差距:实验通常报告无序为部分占据和集体平均行为,而原子模拟和AI工作流程通常需要完全指定的配置。解决这一差距需要能够将平均无序描述转换为代表性构型集合的计算方法,同时平衡成本、偏差和保真度。这一挑战在AI驱动的计算发现中变得更加紧迫,因为忽略无序可能导致AI工作流程错误排名稳定性、错误判断新颖性和误导实验,使用过于理想化的表示。本文综述了经典方法和AI驱动方法如何弥合这一表示差距。我们评估了从平均场理论、簇扩展、准随机近似、蒙特卡洛以及新兴的由通用原子间势能和生成模型驱动的方法的优缺点。我们还强调了AI如何通过降低微状态评估、构型探索和原子到热力学闭合的成本来加速经典计算方案。我们还强调了AI如何使无序原生能力成为可能,包括工作流程优先级、对有序敏感和化学表示、生成模型的无序结构和分布以及对动力学敏感的无序预测。共同,这一框架概述了通往无序原生AI的实用路线图,将化学无序从一个表示障碍转变为现实AI加速材料发现中的可控变量。

英文摘要

Chemical disorder, originating from the mixed occupation of crystallographic sites by multiple elements, is widespread in alloys, ceramics, and compositionally complex materials, where short- and long-range orderings can strongly influence properties. A central obstacle is the representation gap between experiments and simulations: experiments often report disorder as partial occupancies and ensemble-averaged behaviors, whereas atomistic simulations and AI workflows usually require fully specified configurations. Tackling this gap requires computational methods that convert averaged disorder descriptions into representative configurational ensembles while balancing cost, bias, and fidelity. This challenge has become more urgent in AI-driven computational discovery, where ignoring disorder may cause AI workflows to misrank stability, misjudge novelty, and misdirect experiments with too-idealized representations. This Review highlights how classical and AI-driven methods can bridge this representation gap. We assess the strengths and limitations of approaches spanning mean-field theories, cluster expansion, quasi-random approximations, Monte Carlo, and emerging schemes powered by universal interatomic potentials and generative models. We further highlight how AI can accelerate classical computational schemes by lowering the cost of microstate evaluation, configurational exploration, and atomistic-to-thermodynamic closure. We also emphasize how AI can enable disorder-native capabilities, including workflow triage, ordering-sensitive and alchemical representations, generative models of disordered structures and distributions, and kinetics-aware disorder prediction. Together, this framework outlines a practical roadmap toward disorder-native AI, which can transform chemical disorder from a representational obstacle into a controllable variable for realistic AI-accelerated materials discovery.

2605.19122 2026-05-20 stat.ML cs.LG 版本更新

Dual-Channel Tensor Neural Networks: Finite-Sample Theory and Conformal Structure Selection

双通道张量神经网络:有限样本理论与符合结构选择

Elynn Chen, Jiayu Li, Zheshi Zheng, Jian Pei

发表机构 * New York University(纽约大学) University of Michigan(密歇根大学) Duke University(杜克大学)

AI总结 本文提出双通道张量神经网络(DC-TNN),通过分解张量输入为低秩核心和稀疏细化部分,并通过耦合的神经通道处理两者。该框架结构无关,可容纳CP、Tucker和张量列车核心。在估计方面,建立了DC-TNN估计器的非渐近风险界,并展示了有效维度由核心秩和细化稀疏性共同决定。在推断方面,开发了结构感知符合ROC程序,产生具有有限样本、分布自由覆盖的ROC和AUC置信带。基于此,提出了符合结构选择器,是首个具有有限样本有效性的分布自由候选张量分解选择方法。模拟和蛋白质数据集分析显示了竞争性的预测精度、可靠的不确定性量化和一致的张量结构恢复。

详情
AI中文摘要

张量值数据自然出现在神经影像、基因组学、气候科学和时空网络中,其中多线性依赖关系在模式间携带信息,而向量化会破坏这些信息。现有方法要么施加单一低秩结构,可能遗漏局部信号,要么将张量视为长向量,从而丢弃其多维几何。我们提出双通道张量神经网络(DC-TNN),将每个张量输入分解为低秩核心和稀疏细化,并通过耦合的神经通道处理两个组件。该框架结构无关,可容纳CP、Tucker和张量列车核心于单一架构中。在估计方面,我们建立了DC-TNN估计器的非渐近风险界,将其分解为网络近似、核心估计和细化选择项,并显示有效维度由核心秩和细化稀疏性共同决定,而非由张量环境大小决定。在推断方面,我们开发了结构感知符合ROC程序,校准在核心-细化潜在空间中,并产生具有有限样本、分布自由覆盖的ROC和AUC置信带。基于此,我们提出了符合结构选择器,据我们所知,是首个具有有限样本有效性的分布自由候选张量分解选择方法。模拟和蛋白质数据集分析显示了竞争性的预测精度、可靠的不确定性量化和一致的张量结构恢复。

英文摘要

Tensor-valued data arise naturally in neuroimaging, genomics, climate science, and spatiotemporal networks, where multilinear dependencies across modes carry information that is destroyed under vectorization. Existing approaches either impose a single low-rank structure, which can miss localized signal, or treat the tensor as a long vector, which discards its multiway geometry. We propose a *Dual-Channel Tensor Neural Network* (DC-TNN) that decomposes each tensor input into a low-rank core and a sparse refinement, and processes the two components through coupled neural channels. The framework is structure-agnostic and accommodates CP, Tucker, and tensor-train cores within a single architecture. For estimation, we establish non-asymptotic risk bounds for the DC-TNN estimator that decompose into network approximation, core estimation, and refinement-selection terms, and show that the effective dimension is determined jointly by the core rank and refinement sparsity rather than by the ambient tensor size. For inference, we develop a *structure-aware conformal ROC* procedure that calibrates within the core-refinement latent space and produces ROC and AUC confidence bands with finite-sample, distribution-free coverage. Building on this, we propose a *conformal structure selector* that, to our knowledge, is the *first distribution-free procedure* for choosing among candidate tensor decompositions with finite-sample validity. Simulations and an analysis of a protein dataset demonstrate competitive predictive accuracy, reliable uncertainty quantification, and consistent recovery of the tensor structure.

2605.19119 2026-05-20 cs.NE cs.AI cs.LG 版本更新

GOAL: Graph-based Objective-Aligned Diffusion Solvers for Dynamic Multi-Objective Optimization

GOAL: 图基基于的目标对齐扩散求解器用于动态多目标优化

Xingyu Li

发表机构 * Purdue University(普渡大学)

AI总结 本文提出GOAL,一种基于图的扩散求解器,用于动态多目标优化问题,通过条件化扩散求解器实现可控决策生成,通过人类指定的目标进行条件化,引入异构图编码,允许信息根据约束的本体进行选择性传播,并在三个经典调度基准上实现了100%的解可行性和接近零的MAPE。

详情
AI中文摘要

现有的神经组合优化求解器将解决方案搜索框定为模仿最优决策,本质上限制了其在单目标最小化和静态约束下的用途。我们提出了GOAL,一种基于关系图表示的条件扩散求解器,能够通过在人类指定的目标上进行条件化来实现可控的决策生成。我们引入了一种异构图编码,在其中不同的边类型,对应于不同类别的约束,定义了图神经网络的消息传递结构,这允许信息根据每个约束的本体进行选择性传播。GOAL在三个经典调度基准上进行了实例化和评估,这些基准涵盖了各种约束复杂度:流水作业问题(FSP)、作业调度问题(JSP)和灵活作业调度问题(FJSP)。在不进行架构修改的情况下,通用性在结构上不同的约束领域和问题类型中得到证明。在所有三个基准上,GOAL在20个作业和60个操作的问题规模上实现了100%的解可行性和接近零的MAPE(低于0.20%)在多个目标上,优于NSGA-II和MOEA/D在解质量和推理速度上最多提高了25倍。

英文摘要

Existing neural combinatorial optimization solvers frame solution search as imitation of optimal decisions, inherently limiting their utility to single-objective minimization and static constraints. We propose GOAL, a conditioned diffusion solver over relational graph representations that enables controllable decision generations by conditioning on human-specified objectives. We introduce a heterogeneous graph encoding in which distinct edge types, corresponding to different classes of constraints, define the message passing structure of the graph neural network, which allows information to propagate selectively according to the ontology of each constraint. GOAL is instantiated and evaluated on three canonical scheduling benchmarks of various constraint complexity: the Flow Shop Problem (FSP), the Job Shop Scheduling Problem (JSP), and the Flexible Job Shop Scheduling Problem (FJSP). Generalization is demonstrated across structurally distinct constraint regimes and problem types without architectural modification. On all three benchmarks, GOAL achieves 100% solution feasibility and near-zero MAPE (below 0.20%) on multiple objectives for problem sizes up to 20 jobs and 60 operations, outperforming NSGA-II and MOEA/D in both solution quality and inference speed by up to 25x.

2605.19113 2026-05-20 stat.ME cs.LG stat.ML 版本更新

Learning Interpretable Point-Based Clinical Risk Scores via Direct Optimization

通过直接优化学习可解释的基于点的临床风险评分

Ying Cui, Albert M Li, Vivek Charu, Yeon-Mi Hwang, Tina Hernandez-Boussard, Lu Tian

发表机构 * Department of Biomedical Data Science, Stanford University(斯坦福大学生物医学数据科学系) Decatur High School(德凯高中) Department of Pathology, Stanford University School of Medicine(斯坦福大学医学院病理学系) Division of Computational Medicine, Department of Medicine, Stanford University(斯坦福大学医学系计算医学分会)

AI总结 本文提出了一种新的机器学习算法,通过灵活的贪心优化策略直接学习可解释的基于点的临床风险评分,以在明确的最优性目标下优化加法评分。

Comments 23 pages, 4 figures

详情
AI中文摘要

许多临床风险评分被部署为加法规则,其中相关的二元预测特征被分配非负整数点。这些整数权重不仅使评分在实践中更容易使用,还促进了所得到的预测模型的稀疏性。此类风险评分通常通过首先拟合回归模型,然后经过适当缩放后将估计的系数四舍五入到最近的整数来获得。这种方法计算速度快,但不能保证最终评分的最优性。替代方法是通过遍历所有可能的整数权重,将问题视为整数规划任务,直接优化价值函数。然而,相关计算负担可能相当大,尤其是当价值函数是非凸甚至不连续时。在本文中,我们开发了新的机器学习算法,采用灵活的贪心优化策略,在明确且合理的最优性目标下直接学习此类加法评分。我们应用所提出的方法,利用Epic Cosmos中的大规模电子健康记录(EHR)队列,构建一个整数加权共病评分,用于衡量出院后死亡风险。我们还进行了模拟研究,以考察有限样本的操作特性。

英文摘要

Many clinical risk scores are deployed as additive rules with nonnegative integer points assigned to relevant binary predictive features. These integer weights not only make the score easier to use in practice but also promote sparsity in the resulting prediction model. Such risk scores are often derived by first fitting a regression model and then rounding the estimated coefficients to the nearest integer after appropriate scaling. This approach is computationally fast but does not guarantee optimality of the resulting score. Alternatively, one may search over all possible integer weights to directly optimize a value function by posing the problem as an integer programming task. However, the associated computational burden can be substantial, especially when the value function is nonconcave or even discontinuous. In this paper, we develop new machine learning algorithms that employ a flexible greedy optimization strategy to learn such additive scoring directly under explicit and sensible optimality objectives. We apply the proposed method to a large electronic health record (EHR) cohort in Epic Cosmos to construct an integer-weighted comorbidity score for measuring the risk of post-discharge mortality. We also conduct a simulation study to examine the finite-sample operating characteristics.

2605.19107 2026-05-20 cs.LG eess.SP 版本更新

Performance Monitoring of Proton Exchange Membrane Water Electrolyzer by Transformers-Based Machine Learning Model

通过基于变压器的机器学习模型对质子交换膜水电解器进行性能监控

Bingqing Chen, Ivan Batalov, Qiu Chen, Weiqi Ji, Lei Cheng

发表机构 * Bosch Research & Technology Center(博世研发与技术中心)

AI总结 本文提出了一种基于变压器的机器学习框架,用于在正常运行过程中进行虚拟电化学表征,通过编码器-解码器结构对极化曲线进行重构,实现了对质子交换膜水电解器状态健康度的连续监控。

详情
AI中文摘要

绿色氢气在去碳化过程中扮演着关键角色,预计到2030年其容量将扩大至560 GW(2023年为1.39 GW)。质子交换膜(PEM)电解是生产绿色氢气最有前途的技术路线之一,实时监测PEM电解器的系统健康状况对于其规模化部署至关重要。在实验室环境中,可以通过电化学测试协议通过定期暂停正常运行来表征性能退化。这种中断对于大规模堆叠部署来说并不实用,限制了系统操作员对健康状态(SoH)进行实时评估的能力。本文提出了一种机器学习(ML)框架,可以在正常运行过程中进行虚拟电化学表征。该方法使用编码器-解码器变压器,基于操作数据来重构表征输出,重点关注极化曲线。受基于补丁的序列分词启发,我们将输入分割成补丁并对其进行编码,以形成有意义的标记,这大大提高了学习效率。在四次纵向运行中,持续时间最长为478小时,不同测试单元和负载循环下,模型准确重构了极化曲线,并相比普通变压器实现了均方误差(MSE)减少10倍。这一概念验证表明,ML模型可以实现PEM电解器的连续性能监控,并且编码器能够捕捉到SoH的有意义的潜在表示,为未来工作中的可解释指标推导提供了机会。

英文摘要

Green hydrogen plays an essential role in decarbonization, with capacity projected to scale to 560 GW by 2030 (vs. 1.39 GW in 2023) in net-zero settings. Proton exchange membrane (PEM) electrolysis is one of the most promising technology routes to green hydrogen production, and real-time system health monitoring of PEM electrolyzers is essential for their scalable deployment. In lab settings, performance degradation can be characterized through electrochemical testing protocols by periodic pauses of normal operation. Such interruption is not practical for full-scale stack deployments, limiting system operators' ability to make real-time assessments of state-of-health (SoH). We present a machine learning (ML) framework that performs virtual electrochemical characterization during normal operation. The method uses an encoder-decoder transformer, conditioned on operational data, to reconstruct characterization outputs, focusing here on polarization curves. Inspired by patch-based sequence tokenization, we segment the inputs into patches and encode them to form meaningful tokens, which substantially improves learning efficiency. Across four longitudinal runs, lasting up to 478 hours on different test cells and loading cycles, the model accurately reconstructed polarization curves and achieved 10x reduction in mean squared error (MSE) compared to a vanilla transformer. This proof-of-concept demonstrates that ML models can enable continuous performance monitoring for PEM electrolyzers and that the encoder captures meaningful latent representations of SoH, opening up opportunities to derive interpretable indicators in future work.

2605.19101 2026-05-20 cs.SD cs.LG 版本更新

Heterogeneity-Aware Dataset Scheduling for Efficient Audio Large Language Model Training

面向异质性的数据集调度以实现高效的音频大语言模型训练

Yanru Wu, Jianning Wang, Chongxin Gan, Yang Li

发表机构 * Shenzhen International Graduate School, Tsinghua University(清华大学深圳国际研究生院) Independent Researcher(独立研究者) The Hong Kong Polytechnic University(香港理工大学)

AI总结 本文提出了一种面向异质性的数据集调度方法GST,通过将数据集分组并按渐进调度策略引入,平衡了并行训练的稳定性与序列优化的效率,从而在14个AudioQA数据集上实现了30-40%的更快收敛速度。

详情
AI中文摘要

训练通用的音频大语言模型(ALLMs)以跨多样化的数据集进行训练对于全面的音频理解至关重要,但面临由于数据集异质性导致的显著挑战,这通常会导致冲突的梯度和缓慢的收敛。尽管其影响重大,如何在训练过程中显式管理这种异质性仍鲜有研究,当前的做法主要依赖于均匀混合。在本文中,我们从收敛性角度分析多数据集AudioQA训练,并提出分组序列训练(GST)。GST战略性地将数据集分为具有亲和力的数据集组,并通过渐进调度协议引入这些数据集,有效地平衡了并行训练的稳定性与序列优化的效率。为了确保可扩展性,我们开发了基于梯度的亲和度度量,以捕捉跨数据集的关系,而无需采用具有抑制成本的经验转移性估计。在14个AudioQA数据集上的广泛评估表明,GST在标准并行训练上实现了30-40%更快的收敛速度,同时保持或超越混合所有训练的性能。我们的结果提供了理论见解和一个实用且模型无关的框架,用于高效的大规模ALLM优化。

英文摘要

Training general-purpose Audio Large Language Models (ALLMs) across diverse datasets is essential for holistic audio understanding, yet it faces significant challenges due to dataset heterogeneity, which often leads to conflicting gradients and slow convergence. Despite its impact, how to explicitly manage this heterogeneity during training remains underexplored, with current practices relying primarily on uniform mixture. In this work, we analyze multi-dataset AudioQA training from a convergence perspective and propose Grouped Sequential Training (GST). GST strategically organizes datasets into affinity-aware groups and introduces them via a progressive scheduling protocol, effectively balancing the stability of parallel training with the efficiency of sequential optimization. To ensure scalability, we develop gradient-based affinity metrics that capture inter-dataset relationships without the prohibitive cost of empirical transferability estimation. Extensive evaluations on 14 AudioQA datasets spanning speech, music, and environmental sounds demonstrate that GST achieves 30--40\% faster convergence than standard parallel training while maintaining or even surpassing the performance of mix-all training. Our results provide both theoretical insights and a practical, model-agnostic framework for efficient large-scale ALLM optimization.

2605.19095 2026-05-20 cs.LG cs.AI stat.ML 版本更新

ScheduleFree+: Scaling Learning-Rate-Free & Schedule-Free Learning to Large Language Models

ScheduleFree+: 将学习率自由和调度自由学习扩展到大型语言模型

Aaron Defazio

发表机构 * FAIR at Meta Super-Intelligence Labs(Meta 超智能实验室)

AI总结 本文提出了一种学习率自由和调度自由的学习方法(ScheduleFree+),用于训练大型语言模型,该方法在大规模训练中显著优于传统的Warmup-Stable-Decay(WSD)调度方案,并证明了调度自由学习在长周期训练中的有效性。

详情
AI中文摘要

调度自由学习作为一种实用的随时训练方法,在机器学习中展示了其在数十个标准基准问题上的成功。然而,对于大型语言模型(LLM)训练,强大的性能仅在小规模情况下得到验证。我们识别出一系列必要的改进,以将调度自由学习扩展到更大的批量大小和模型大小,并提出了一种学习率自由和调度自由的方法(ScheduleFree+)用于训练大型语言模型,其性能显著优于Warmup-Stable-Decay(WSD)调度方案。我们还证明调度自由学习在长周期训练中最有效,并且在每参数1000个令牌的情况下,比最先进的调度方案高出31%。调度自由学习为预训练过程中模型平均和检查点合并的使用提供了理论基础。

英文摘要

Schedule-Free Learning has shown promise as a practical anytime training method for machine learning, showing success across dozens of standard benchmark problems. However, strong performance for LLM training has only been demonstrated at small scales. We identify a number of fixes necessary to scale up Schedule-Free Learning to larger batch sizes and model sizes, and present a learning-rate-free and schedule-free method (ScheduleFree+) for training large language models which greatly outperforms Warmup-Stable-Decay (WSD) schedules. We also demonstrate that Schedule-Free Learning is most effective for long duration training, and at 1000 tokens per parameter, it outperforms SOTA schedules by 31%. Schedule-Free Learning provides a theoretical foundation for the use of model averaging and checkpoint merging during pretraining.

2605.19093 2026-05-20 cs.AI cs.LG 版本更新

Embedding by Elicitation: Dynamic Representations for Bayesian Optimization of System Prompts

通过 elicitation 进行嵌入:用于系统提示贝叶斯优化的动态表示

Zhiyuan Jerry Lin, Benjamin Letham, Samuel Dooley, Maximilian Balandat, Eytan Bakshy

发表机构 * Meta

AI总结 本文研究了在仅有聚合反馈的情况下,如何通过动态表示进行系统提示的贝叶斯优化,提出了一种基于 elicitation 的嵌入方法 ReElicit,利用 LLM 构建可解释的特征空间,并通过概率高斯过程代理选择目标特征向量,最终实现系统提示的优化。

详情
AI中文摘要

系统提示是现代 AI 系统中的核心控制机制,在对话、任务和用户群体中塑造行为。然而,当反馈仅作为聚合度量而非每个示例的标签、失败或批评时,调整系统提示变得困难。我们研究了这种聚合反馈设置作为受限样本的黑盒优化问题,针对离散且长度可变的文本。我们引入了 ReElicit,一种基于 elicitation 的贝叶斯优化框架。给定任务描述、先前评估的提示和标量分数,LLM 会提取一个紧凑且可解释的特征空间,并将提示映射到其中。利用概率高斯过程代理,获取函数会选择目标特征向量,LLM 会实现并优化这些向量以生成可部署的系统提示。随着新评估的到来,重新提取特征空间使表示能够适应观察到的提示-分数历史。我们通过离线基准准确率作为受控的聚合代理来评估该设置:优化器观察每个提示的一个标量分数,而没有每个示例的标签、错误或批评。在十个系统提示优化任务中,使用 30 次总评估预算,ReElicit 在代表性聚合-only 提示优化基线中实现了最强的聚合性能。这些结果表明,LLM 不仅可以作为提示生成器,还可以作为适应性语义表示构建器,用于自然语言艺术的贝叶斯优化。

英文摘要

System prompts are a central control mechanism in modern AI systems, shaping behavior across conversations, tasks, and user populations. Yet they are difficult to tune when feedback is available only as aggregate metrics rather than per-example labels, failures, or critiques. We study this aggregate feedback setting as sample-constrained black-box optimization over discrete, variable-length text. We introduce ReElicit, a Bayesian optimization framework based on \emph{embedding by elicitation}. Given a task description, previously evaluated prompts, and scalar scores, an LLM elicits a compact, interpretable feature space and maps prompts into it. Leveraging a probabilistic Gaussian process surrogate, an acquisition function then selects target feature vectors, which the LLM realizes and refines into deployable system prompts. Re-eliciting the feature space as new evaluations arrive lets the representation adapt to the observed prompt-score history. We evaluate the setting using offline benchmark accuracy as a controlled aggregate proxy: the optimizer observes one scalar score per prompt and no per-example labels, errors, or critiques. Across ten system prompt optimization tasks with a 30 total evaluation budget, ReElicit achieves the strongest aggregate performance profile among representative aggregate-only prompt-optimization baselines. These results suggest that LLMs can serve as adaptive semantic representation builders, not only prompt generators, for Bayesian optimization over natural-language artifacts.

2605.19092 2026-05-20 cs.LG cs.AI cs.CL 版本更新

Counterfactual Likelihood Tests for Indirect Influence in Private Reasoning Channels

反事实可能性测试用于私人推理通道中的间接影响

Alexander Boesgaard Lorup

发表机构 * Openhagen

AI总结 本文提出了一种反事实可能性测试方法,用于衡量私人推理通道之间的影响力,通过替换上游私人块为匹配长度的供体块,并固定公共令牌序列和下游目标,测量下游目标的负对数似然变化,以评估私人和公共通道中的直接和间接影响。

Comments 12 pages, 4 figures, 5 tables

详情
AI中文摘要

推理系统越来越多地将中间计算分成私人和公共通道,产生在转录中看起来相似的评估案例:独立共推导、直接访问私人内容和通过公共通信的间接影响。本文提出了一种反事实可能性测试,用于测量私人推理通道之间的影响力。该方法用一个长度匹配的供体块替换上游私人块,固定公共令牌序列和下游目标,测量下游目标的负对数似然变化。在用于验证的7B角色通道推理模型上,文本探针不可靠:原始n-gram重叠高估了泄漏,修正重叠仍存在噪声,canary复现报告无区分能力。反事实可能性将未遮蔽和遮蔽条件分开,而长度匹配控制了RoPE位置混杂因素。在强化遮蔽验证中,B到A的反向影响接近于零,而A到B的影响通过公共语音隐藏状态持续存在。在三个检查点、五个种子和13,734个有效方向对比的多检查点验证中,重复了这种不对称性。一个图分离控制,阻止私人到公共的载体边,产生所有13,734个控制评估中自然和反事实分数位相同的结果,确定测试的公共通道路径是测量的反事实信号在实施的角色可见性遮蔽下的完整载体。结果表明,私人通道评估应分别报告直接和间接影响,并且反事实可能性探针为测量这些边界提供了实用的默认方法。

英文摘要

Reasoning systems increasingly separate intermediate computation into private and public channels, creating evaluation cases that look similar in transcripts: independent co-derivation, direct access to private content, and indirect influence through public communication. This paper presents a counterfactual likelihood test for measuring influence between private reasoning channels. The method replaces an upstream private block with a length-matched donor block, holds the public token sequence and downstream target fixed, and measures the downstream target's negative-log-likelihood shift. On a 7B role-channel reasoning model used for validation, textual probes are unreliable: raw n-gram overlap overstates leakage, corrected overlap remains noisy, and canary reproduction reports no discrimination. Counterfactual likelihood separates unmasked and masked conditions, while length matching controls a RoPE positional confound. In the hardened masked validation, reverse B-to-A influence is near zero, while A-to-B influence persists through public-speech hidden states. A multi-checkpoint validation across three checkpoints, five seeds, and 13,734 valid directional contrasts replicates this asymmetry. A graph-separation control that blocks private-to-public carrier edges produces bit-identical natural and counterfactual scores across all 13,734 control evaluations, identifying the tested public-channel pathway as the complete carrier of the measured counterfactual signal under the implemented role-visibility mask. The results show that private-channel evaluation should report direct and indirect influence separately, and that counterfactual likelihood probes provide a practical default for measuring these boundaries.

2605.19091 2026-05-20 cs.LG 版本更新

Chessformer: A Unified Architecture for Chess Modeling

Chessformer: 一个用于棋类建模的统一架构

Daniel Monroe, George Eilender, Philip Chalmers, Zhenwei Tang, Ashton Anderson

发表机构 * University of Toronto(多伦多大学) Williams College(威廉姆斯学院)

AI总结 本文提出Chessformer,一种统一的棋类建模架构,能够同时提升棋类建模的三大核心目标:提升棋力、预测人类下棋和增强可解释性。

Comments International Conference in Learning Representations (2026)

详情
AI中文摘要

棋类长期以来一直是人工智能的典型测试平台,但其核心任务的建模方法却各不相同。最大化棋力、预测人类下棋和增强可解释性通常使用不同的架构,这些设计往往与领域本身的几何结构不一致。这引出了一个自然问题:这些目标是否需要不同的建模范式,或者是否存在一个能够同时支持它们的单一架构?我们介绍了Chessformer,一种统一的架构,它在棋类建模的三个核心目标上都达到了最先进的水平。Chessformer是一种仅包含编码器的Transformer,将棋盘方格表示为标记,通过一种名为几何注意力偏置(GAB)的新动态位置编码来增强自注意力机制,该编码能够适应领域特定的几何结构,并通过基于注意力的源-目标策略头来预测动作。我们对Chessformer的每个方面进行了评估。首先,我们开发了\maiathree,一个用于预测人类下棋的模型家族,其移动匹配准确率达到57.1%,显著超越了之前最先进的方法,且参数量不到四分之一。其次,我们将Chessformer集成到领先的开源引擎Leela Chess Zero中,使其棋力提升超过100个Elo,并在主要的计算机国际象棋比赛中战胜Stockfish。第三,我们证明Chessformer的方格标记设计使注意力模式和激活可以直接归因于棋盘方格,从而实现细粒度的可解释性分析,而以前的架构不自然支持。更广泛地说,我们的结果表明,将模型的标记化、位置编码和输出设计与领域底层结构对齐,可以同时带来性能、人类兼容性和可解释性的提升。

英文摘要

Chess has long served as a canonical testbed for artificial intelligence, but modeling approaches for its central tasks have diverged. Maximizing playing strength, predicting human play, and enabling interpretability are typically solved with disparate architectures, and these designs are often misaligned with the geometry of the domain. This raises the natural question of whether these objectives require separate modeling paradigms, or if there exists a single architecture that supports them simultaneously. We introduce Chessformer, a unified architecture that advances the state of the art on all three central goals in chess modeling. Chessformer is an encoder-only transformer that represents board squares as tokens, augments self-attention with a novel dynamic positional encoding called Geometric Attention Bias (GAB) that adapts to domain-specific geometry, and predicts actions with an attention-based source-destination policy head. We evaluate Chessformer on each front. First, we develop \maiathree, a family of models for human move prediction that reaches 57.1\% move-matching accuracy, significantly surpassing the previous state of the art with fewer than a quarter of the parameters. Second, we integrate Chessformer into Leela Chess Zero, a leading open-source engine, adding over 100 Elo of playing strength and resulting in tournament victories over Stockfish in major computer chess competitions. Third, we show that Chessformer's square-token design makes attention patterns and activations directly attributable to board squares, enabling granular interpretability analyses that prior architectures do not naturally support. More broadly, our results demonstrate that aligning a model's tokenization, positional encoding, and output design with the underlying structure of a domain can yield simultaneous gains in performance, human compatibility, and interpretability.

2605.19080 2026-05-20 cs.LG cs.AI 版本更新

MANGO: Meta-Adaptive Network Gradient Optimization for Online Continual Learning

MANGO:面向在线持续学习的元适应网络梯度优化

Ankita Awasthi, Marco Apolinario, Kaushik Roy

发表机构 * Purdue University(普渡大学) TU Delft(代尔夫特理工大学)

AI总结 本文提出MANGO框架,通过梯度门控和元学习正则化平衡持续学习中的稳定性与可塑性,实现对过去任务遗忘的克服和新任务高效学习。

详情
AI中文摘要

在在线持续学习(OCL)中,神经网络在单次通过中从非平稳数据流中依次学习,仅能访问有限的内存回放缓冲区。这与离线持续学习形成鲜明对比,后者依赖多个epoch训练大型数据集。OCL的主要挑战是克服对过去任务的灾难性遗忘(稳定性)的同时高效学习新任务(可塑性)。现有方法通过回放式复习、输出级蒸馏、固定正则化或当前数据上的元学习来对抗遗忘。然而,这些方法存在局限:复习引入存储样本偏差;蒸馏在输出分布上操作而无法调节参数更新;固定正则化对参数施加惩罚而不考虑敏感性;仅基于数据流的元学习缺乏反馈控制的参数更新。我们提出元适应网络梯度优化(MANGO),一种OCL框架,通过梯度门控和元学习正则化平衡稳定性与可塑性。梯度门控根据敏感性调整参数更新,防止破坏性更新。元学习正则化适应稳定性系数,评估参数更新对回放的影响。在MANGO中,回放同时充当训练信号和遗忘评估器。我们在三个标准OCL基准数据集上评估了我们的方法。MANGO在多个基准上优于强基线方法,取得最先进的结果,并在不同回放大小下保持一致性能。在CLEAR-10上的领域增量学习和CIFAR-100和Tiny-ImageNet上的类别增量学习中,它在所有基线中取得最高准确率,并实现正向反馈转移,克服CLEAR-10上的遗忘。

英文摘要

In Online Continual Learning (OCL), a neural network sequentially learns from a non-stationary data stream in a single-pass with access only to a limited memory replay buffer. This contrasts sharply with off-line continual learning where training is multiple epoch dependent on large datasets. The main challenge faced by OCL is to overcome catastrophic forgetting of past tasks (stability) while learning new ones efficiently (plasticity). Existing methods counter forgetting via replay-based rehearsal, output level distillation, fixed regularization, or meta-learning on the current data. However, these methods have limitations: rehearsal introduces a stored sample bias; distillation operates on output-distributions without modulating parameter updates; fixed-regularization penalizes parameters irrespective of sensitivity; stream-only meta-learning lacks a feedback controlled parameter update. We propose Meta-Adaptive Network Gradient Optimization (MANGO), an OCL framework that balances stability-plasticity via gradient-gating and meta-learned regularization. Gradient-gating scales parameter updates based on sensitivity, preventing destructive updates. Meta-learned regularization adapts stability coefficients, evaluating the effect of parameter update on replay. In MANGO, replay acts as both a training signal and a forgetting evaluator. We evaluated our method on three standard OCL benchmark datasets. MANGO outperforms strong baselines, achieving state-of-the-art results with consistent performance across replay sizes. In domain incremental learning on CLEAR-10 and class incremental learning on CIFAR-100 and Tiny-ImageNet, it achieves highest accuracy among all baselines and achieves positive Backward Transfer, overcoming forgetting on CLEAR-10.

2605.19076 2026-05-20 cs.LG physics.flu-dyn 版本更新

The impact of observation density on Bayesian inversion of latent dynamics in shock-dominated flows

观测密度对冲击主导流动中潜变量动态贝叶斯反演的影响

Bipin Tiwari, Muhammad Abid, Omer San

发表机构 * Department of Mechanical and Aerospace Engineering, University of Tennessee, Knoxville(田纳西大学机械与航空航天工程系)

AI总结 本文提出了一种非侵入式降阶建模框架,用于高效贝叶斯初始状态反演与不确定性量化,通过卷积自编码器和学习的潜空间前向算子结合,以提高冲击主导流动中潜变量动态的反演精度和效率。

详情
AI中文摘要

从稀疏和噪声测量中推断冲击主导可压缩流动中未知的初始状态是一个具有挑战性的不适定反问题,由于非线性波相互作用和传感限制。在本工作中,我们开发了一种非侵入式降阶建模框架,用于高效的贝叶斯初始状态反演与不确定性量化。该框架结合了卷积自编码器和学习的潜空间前向算子。自编码器将高维流动场压缩成紧凑的非线性潜表示,而前向算子从编码的初始条件预测最终时间的潜变量状态。该AE-ROM代理能够快速进行正向评估,并嵌入到No-U-Turn Sampler (NUTS)中进行后验探索。该框架通过拉丁超立方采样生成500个高保真度Sod冲击管模拟,并使用五阶WENO方案求解。反问题旨在从稀疏噪声观测的最终时间密度和压力场中恢复未知的左和右密度和压力状态。结果表明,AE-ROM能够准确重建关键的冲击管结构,包括稀疏波、接触不连续性和激波前。潜变量维度为32提供了重建精度和减少空间紧凑性之间的有效平衡,而250个训练模拟足以实现准确的重建。增加观测密度显著收缩后验不确定性,将密度的均值后验标准差减少约78%,压力减少约76%。总体而言,所提出的框架为冲击主导流动的反演分析提供了一种计算高效且具有不确定性的方法,具有向多维可压缩流动和数字孪生应用扩展的潜力。

英文摘要

Inferring unknown initial states in shock-dominated compressible flows from sparse and noisy measurements is a challenging ill-posed inverse problem due to nonlinear wave interactions and limited sensing. In this work, we develop a non-intrusive reduced-order modeling framework for efficient Bayesian initial-state inversion with uncertainty quantification. The framework combines a convolutional autoencoder with a learned latent-space forward operator. The autoencoder compresses high-dimensional flow fields into a compact nonlinear latent representation, while the forward operator predicts final-time latent states from encoded initial conditions. This AE-ROM surrogate enables rapid forward evaluations and is embedded within a No-U-Turn Sampler (NUTS) for posterior exploration. The framework is demonstrated using 500 high-fidelity Sod shock tube simulations generated through Latin hypercube sampling and solved using a fifth-order WENO scheme. The inverse problem seeks to recover unknown left and right density and pressure states from sparse noisy observations of final-time density and pressure fields. Results show that the AE-ROM accurately reconstructs key shock-tube structures, including the rarefaction wave, contact discontinuity, and shock front. A latent dimension of 32 provides an effective balance between reconstruction accuracy and reduced-space compactness, while 250 training simulations are sufficient for accurate reconstruction. Increasing observation density significantly contracts posterior uncertainty, reducing the mean posterior standard deviation by approximately 78% for density and 76% for pressure. Overall, the proposed framework provides a computationally efficient and uncertainty-aware approach for inverse analysis of shock-dominated flows, with potential extensions to multidimensional compressible-flow and digital-twin applications.

2605.19073 2026-05-20 cs.LG cs.AI 版本更新

Riemannian Networks over Full-Rank Correlation Matrices

全秩相关矩阵上的Riemannian网络

Ziheng Chen, Xiaojun Wu, Bernhard Schölkopf, Nicu Sebe

发表机构 * Department of Information Engineering and Computer Science, University of Trento, Trento, Italy(特伦托大学信息工程与计算机科学系) School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China(江南大学人工智能与计算机科学学院)

AI总结 本文提出了一种在全秩相关矩阵上进行Riemannian网络的研究,通过扩展基本层并引入准确的反向传播方法,展示了其在对比现有SPD和Grassmannian网络时的有效性。

Comments Accepted to ICML 2026

详情
AI中文摘要

在不同应用中,对称正定(SPD)流形上的表示已引起广泛关注。相比之下,全秩相关矩阵流形,作为SPD矩阵的归一化替代品,仍然鲜为人知。本文介绍了在相关流形上进行的Riemannian网络,利用了五种最近发展的相关几何结构。我们系统地扩展了基本层,包括多项式对数回归(MLR)、全连接(FC)和卷积层,到这些几何结构上。此外,我们还提出了用于两种相关几何结构的准确反向传播方法。通过与现有SPD和Grassmannian网络的比较实验,展示了该方法的有效性。

英文摘要

Representations on the Symmetric Positive Definite (SPD) manifold have garnered significant attention across different applications. In contrast, the manifold of full-rank correlation matrices, a normalized alternative to SPD matrices, remains largely underexplored. This paper introduces Riemannian networks over the correlation manifold, leveraging five recently developed correlation geometries. We systematically extend basic layers, including Multinomial Logistic Regression (MLR), Fully Connected (FC), and convolutional layers, to these geometries. Besides, we present methods for accurate backpropagation for two correlation geometries. Experiments comparing our approach against existing SPD and Grassmannian networks demonstrate its effectiveness.

2605.19063 2026-05-20 cs.LG 版本更新

Mapping Uncharted Symmetries: Machine Discovery in Combinatorics

映射未知对称性:组合学中的机器发现

Eugenio Cainelli, Lorenzo Luccioli, Alessandro Iraci, Michele D'Adderio, Giovanni Paolini

发表机构 * University of Bologna(博洛尼亚大学) Pegaso University(佩加索大学) University of Pisa(比萨大学)

AI总结 本文提出了一种基于机器学习的组合学研究方法,通过构建满足精确分布约束的简单数学函数,发现q,t-纳尔ayan多项式的新组合解释,并提供了其对称性的证明。

Comments 20 pages

详情
AI中文摘要

受代数组合学中长期未解决的问题启发,我们展示了现代机器学习可以有意义地贡献于可验证的数学发现。特别是,我们关注在精确分布约束下构造简单数学函数的问题,将其正式化为简单学习在刚性比例下(SLURP)。我们通过引入两种方法:MapSeek-Functional,通过交替伪标签和监督训练步骤建模所需函数;以及MapSeek-Symbolic,直接生成符号公式。我们成功将这两种方法应用于代数组合学中的研究问题,发现了来自表示论的q,t-纳尔ayan多项式的新组合解释。据我们所知,这是基于非交叉划分的第一个此类解释。使用一个发现的统计量,我们找到了这些多项式对称性的组合证明,在之前未解决的情况下。为了简化验证和可重复性,我们发布了所有代码,包括本文所有数学发现的Lean 4形式化。

英文摘要

Inspired by long-standing open problems in algebraic combinatorics, we show that modern machine learning can meaningfully contribute to verifiable mathematical discoveries. In particular, we focus on the construction of simple mathematical functions under exact distributional constraints, a setting we formalize as Simple Learning Under Rigid Proportions (SLURP). We tackle this problem by introducing two methods: MapSeek-Functional, which models the desired function alternating pseudo-labeling and supervised training steps; and MapSeek-Symbolic, designed to directly produce symbolic formulas. We successfully apply both methods to a research problem in algebraic combinatorics, discovering a new combinatorial interpretation of the $q,t$-Narayana polynomials arising from representation theory. To our knowledge, this is the first such interpretation based on noncrossing partitions. Using one discovered statistic, we find a combinatorial proof of the symmetry of these polynomials in a previously unsolved case. To streamline verification and reproducibility, we release all code, including a formalization of all the mathematical discoveries of this paper in Lean 4.

2605.19050 2026-05-20 cs.LG physics.chem-ph q-bio.QM 版本更新

Generative Pseudo-Force Fields for Molecular Generation

生成伪力场用于分子生成

Stefaan Simon Pierre Hessmann, Khaled Kahouli, Stefan Gugler, Michael Plainer, Frank Noé, Klaus-Robert Müller, Niklas Wolf Andreas Gebauer

发表机构 * Machine Learning Group, Technische Universität Berlin(技术大学柏林机器学习小组) BIFOLD – Berlin Institute for the Foundations of Learning and Data(柏林学习与数据基础研究院) Department of Mathematics and Computer Science, Freie Universität Berlin(柏林自由大学数学与计算机科学系) Zuse School ELIZA, Darmstadt, Germany(达姆施塔特德国Zuse学校ELIZA) Department of Physics, Freie Universität Berlin(柏林自由大学物理系) Microsoft Research AI4Science, Berlin, Germany(柏林德国微软研究院AI4Science) Department of Chemistry, Rice University, Houston, USA(美国休斯顿莱斯大学化学系) Max-Planck Institute for Informatics, Saarbrücken, Germany(德国萨尔布吕肯马克斯·普朗克信息研究所) Department of Artificial Intelligence, Korea University, Seoul, South Korea(韩国首尔韩国大学人工智能系)

AI总结 本文提出生成伪力场(GPFFs)以解决分子生成中能量基放松与数据驱动生成模型采样效率之间的权衡问题,通过训练MLFF在参考平衡结构上的二次伪势能面上实现高效且稳定的分子构象生成。

详情
AI中文摘要

生成稳定的分子构象通常需要在基于物理的能量放松的物理真实性和数据驱动生成模型的采样效率之间做出权衡。虽然机器学习力场(MLFFs)可以通过根据物理力放松分子几何结构来采样稳定的构象,但它们需要昂贵的从头计算训练数据。相反,扩散模型(DMs)仅从平衡数据学习,但依赖于噪声调度和时间步长条件。在本文中,我们提出生成伪力场(GPFFs)以弥合这些范式,通过在参考平衡结构上的二次伪势能面上训练MLFF。由于不需要对扰动几何进行从头计算,非平衡训练数据可以通过对平衡结构添加高斯噪声实时生成。我们证明GPFFs是方差爆炸扩散模型的时间步长无关变种:分数来自预测的伪力,但力的大小隐含地编码了噪声水平,因此不需要时间步长条件。我们的GPFF因此可以作为标准扩散采样(祖先、Heun)中的直接替换,也可以促进更高效、自适应的变种和一个受MLFF启发的直接去噪方案。我们提出的采样算法支持任意的结构先验和几何约束。在QM9数据集上,GPFF在256个神经函数评估(NFE)时有100%的有效性,在仅6个NFE时超过50%,优于所有扩散基线。结合自定义先验,我们在分子编辑器中展示了我们的方法在药物设计设置中的快速和准确的生成过程,其中分子在实时中生成。

英文摘要

Generating stable molecular conformations typically forces a tradeoff between the physical realism of energy-based relaxation and the sampling efficiency of data-driven generative models. While machine learning force fields (MLFFs) can sample stable conformations by relaxing molecular geometries according to physical forces, they require costly ab-initio training data. Conversely, diffusion models (DMs) learn from equilibrium data alone but are dependent on noise schedules and time-step conditioning. In this work, we propose generative pseudo-force fields (GPFFs) to bridge these paradigms by training an MLFF on a quadratic pseudo-potential energy surface relative to reference equilibrium structures. Because no ab-initio calculations are required for the perturbed geometries, non-equilibrium training data can be generated on the fly by perturbing the equilibria with Gaussian noise. We show that GPFFs constitute a time-step-agnostic variant of variance exploding DMs: the score comes from the predicted pseudo-forces but because force magnitudes implicitly encode the noise level, no time-step conditioning is needed. Our GPFF can hence be used as a drop-in replacement in standard diffusion sampling (ancestral, Heun) but also facilitates more efficient, adaptive variants and an MLFF inspired direct denoising scheme. Our proposed sampling algorithms support arbitrary structural priors and geometric constraints. On QM9, GPFF has 100 % validity at 256 neural function evaluations (NFE) and over 50 % at just 6 NFE, outperforming diffusion baselines across all samplers. Combined with custom priors, we showcase the fast and accurate generation process of our method in a molecular editor for a drug design setting, where a molecule is generated in real time.

2605.19049 2026-05-20 cs.LG cs.AI 版本更新

KVBuffer: IO-aware Serving for Linear Attention

KVBuffer: 为线性注意力设计的I/O感知服务

Longwei Zou, Lin Zhong

发表机构 * Department of Computer Science(计算机科学系)

AI总结 本文提出KVBuffer,一种I/O感知的线性注意力服务机制,通过缓冲最近的键和值,使服务系统能够更灵活且高效地计算线性注意力输出,从而减少内存访问和解码延迟,提升服务性能。

详情
AI中文摘要

线性注意力因在长上下文推理中具有与上下文长度无关的恒定解码成本而受到广泛关注。然而,现有服务系统通常在每次解码步骤中递归计算和更新一个大的线性注意力状态,由于该状态远大于每个token的键和值,递归解码导致显著的内存访问开销,对服务线性注意力效率低下。在本文中,我们提出KVBuffer,一种为线性注意力设计的I/O感知服务机制。通过缓冲最近的键和值,KVBuffer使服务系统能够以更灵活且内存高效的方式计算线性注意力输出。对于解码,KVBuffer支持分块计算,通过延迟状态更新并批量应用,减少了平均内存访问和解码延迟。对于推测解码,KVBuffer并行验证草案token并避免存储临时状态。对于短上下文,KVBuffer直接从缓冲的键和值计算注意力输出,无需创建或更新线性注意力状态。我们将在SGLang中实现KVBuffer用于Qwen3-Next。我们的评估显示,当验证四个草案token时,KVBuffer可将线性注意力解码延迟降低高达45.17%,并使推测解码的最大服务请求数增加5倍。

英文摘要

Linear attention has recently gained significant attention for long-context inference due to its constant decoding cost with respect to context length. However, existing serving systems typically serve linear attention by recurrently computing and updating a large linear attention state in every decoding step. Since the state is much larger than the per-token key and value, recurrent decoding incurs substantial memory access and becomes inefficient for serving linear attention. In this paper, we propose KVBuffer, an IO-aware serving mechanism for linear attention. By buffering recent keys and values, KVBuffer enables serving systems to compute linear attention outputs in more flexible and memory-efficient ways. For decoding, KVBuffer enables chunkwise computation, which reduces average memory access and decoding latency by deferring state updates and applying them in batch. For speculative decoding, KVBuffer verifies draft tokens in parallel and avoids storing temporary states. For short contexts, KVBuffer computes attention outputs directly from buffered keys and values, without creating or updating the linear attention state. We implement KVBuffer in SGLang for Qwen3-Next. Our evaluations show that KVBuffer can reduce linear attention decoding latency by up to 45.17% and increase the maximum number of serving requests by 5x for speculative decoding when verifying four draft tokens.

2605.19038 2026-05-20 cs.RO cs.LG 版本更新

Guiding Neuro-Symbolic Scenario Generation with Spatio-Temporal Logic

用时空逻辑引导神经符号场景生成

Lorenzo Bonin, Francesco Giacomarra, Luca Bortolussi, Jyotirmoy V. Deshmukh, Francesca Cairoli

发表机构 * University of Trieste(特里埃斯特大学) University of Southern California(南加州大学)

AI总结 本研究提出STRELGen框架,结合扩散模型和时空逻辑规范,高效生成安全关键的多智能体驾驶场景,提升自动驾驶系统的鲁棒性验证能力。

详情
AI中文摘要

自动驾驶技术的快速发展已远超安全评估方法的进展。传统测试依赖于暴露自动驾驶系统于大量真实交通场景,这是一种成本高昂且统计上无法有效捕捉罕见安全关键边缘情况的暴力方法。为解决这一根本限制,我们引入STRELGen,一个可扩展的框架,用于目标生成安全关键的驾驶场景。STRELGen协同结合多智能体轨迹生成扩散模型(DM)与通过高度可解释的形式化方法编码复杂安全和现实属性的时空逻辑(STREL)规范。关键在于监控这些规范的满足程度是可微的,从而允许基于梯度的搜索。在推理时间,我们直接优化DM的潜在空间以最大化STREL公式满足程度。结果是高效生成高度可信且安全关键的多智能体场景,这些场景位于学习的数据分布内。STRELGen因此提供了一种灵活、可解释且强大的工具,用于对自动驾驶系统进行压力测试,超越了暴力数据收集的限制。

英文摘要

The rapid advancement of autonomous driving (AD) technologies has outpaced the development of robust safety evaluation methods. Conventional testing relies on exposing AD systems to vast numbers of real-world traffic scenes -- a brute-force approach that is prohibitively expensive and statistically ineffective at capturing the rare, safety-critical edge cases essential for validating real-world robustness. To address this fundamental limitation, we introduce STRELGen, a scalable framework for the targeted generation of safety-critical driving scenarios. STRELGen synergistically combines a multi-agent trajectory-generation diffusion model (DM) with Spatio-Temporal Logic (STREL) specifications that encode complex safety and realism properties through a highly interpretable formalism. Crucially, monitoring satisfaction levels of these specifications is differentiable, enabling gradient-based search. At inference time, we optimize directly over the DM latent space to maximize STREL formula satisfaction. The result is efficient generation of highly plausible yet safety-critical multi-agent scenarios that lie within the learned data distribution. STRELGen thus provides a flexible, interpretable, and powerful tool for stress-testing autonomous driving systems, moving beyond the limitations of brute-force data collection.

2605.19033 2026-05-20 cs.RO cs.AI cs.CV cs.LG cs.MA 版本更新

RLFTSim: Realistic and Controllable Multi-Agent Traffic Simulation via Reinforcement Learning Fine-Tuning

RLFTSim: 通过强化学习微调实现逼真且可控的多智能体交通仿真

Ehsan Ahmadi, Hunter Schofield, Behzad Khamidehi, Fazel Arasteh, Jinjun Shan, Lili Mou, Dongfeng Bai, Kasra Rezaee

发表机构 * University of Alberta(阿尔伯塔大学) Huawei Technologies Canada(华为加拿大技术有限公司) York University(约克大学) Canada CIFAR AI Chair, Amii(加拿大 CIFAR 人工智能主席,Amii)

AI总结 本文提出RLFTSim框架,通过强化学习微调提升交通仿真场景的真实感,并通过目标条件化方法实现对交通仿真可控性的提炼,实验表明其在真实感和可控性方面均优于其他启发式搜索方法。

Comments CVPR 2026 Highlight; Project page at https://ehsan-ami.github.io/rlftsim

详情
AI中文摘要

监督式开环训练已被广泛用于训练交通仿真模型;然而,它无法捕捉复杂驾驶场景中固有的动态性和多智能体交互。我们引入RLFTSim,一种基于强化学习的微调框架,通过将模拟器运行与真实世界数据分布对齐来增强场景真实性,并提供一种方法用于在场景生成中提炼目标条件化的可控性。我们基于预训练的仿真模型实例化RLFTSim,设计一种平衡保真度和可控性的奖励函数,并在Waymo Open Motion Dataset上进行了全面实验。我们的结果表明在真实感方面取得了改进,实现了最先进的性能。与其它基于启发式搜索的微调方法相比,RLFTSim由于提出了一种低方差且密集的奖励信号,所需样本显著更少,并且通过设计直接解决了真实感对齐问题。我们还通过目标条件化展示了我们方法在提炼交通仿真可控性方面的有效性。项目页面可在https://ehsan-ami.github.io/rlftsim上访问。

英文摘要

Supervised open-loop training has been widely adopted for training traffic simulation models; however, it fails to capture the inherently dynamic, multi-agent interactions common in complex driving scenarios. We introduce RLFTSim, a reinforcement-learning-based fine-tuning framework that enhances scenario realism by aligning simulator rollouts with real-world data distributions and provides a method for distilling goal-conditioned controllability in scenario generation. We instantiate RLFTSim on top of a pre-trained simulation model, design a reward that balances fidelity and controllability, and perform comprehensive experiments on the Waymo Open Motion Dataset. Our results show improvements in realism, achieving state-of-the-art performance. Compared with other heuristic search-based fine-tuning methods, RLFTSim requires significantly fewer samples due to a proposed low-variance and dense reward signal, and it directly addresses the realism alignment issue by design. We also demonstrate the effectiveness of our approach for distilling traffic simulation controllability through goal conditioning. The project page is available at https://ehsan-ami.github.io/rlftsim.

2605.19028 2026-05-20 cs.LG 版本更新

Learning When to Adapt

学习何时适应

Ali Zindari, Xiaowen Jiang, Rotem Mulayoff, Sebastian U. Stich

发表机构 * Universität des Saarlandes(萨尔布吕肯大学) CISPA Helmholtz Center for Information Security(信息安全研究中心)

AI总结 本文提出DISeL,一种动态输入敏感的低秩适应方法,通过引入轻量级输入依赖门控机制,减少遗忘并保持微调准确性,同时提供可解释的诊断视图。

Comments Preprint

详情
AI中文摘要

低秩适应(LoRA)是一种广泛使用的参数高效微调方法,但其学习修正却是静态的:相同的低秩更新被应用于每一个输入。这种输入无关的方法在适应微调分布和保持预训练行为在该分布之外的输入之间造成不可避免的权衡,导致灾难性遗忘。我们引入DISeL(动态输入敏感LoRA),通过在LoRA模块中添加轻量级的输入依赖门控机制,增强每个秩一组件。门控机制默认保留预训练模型的行为,而训练过程学习激活选定的组件以减少微调损失。DISeL仅添加少量参数并保持低秩结构。在RoBERTa在GLUE上的表现,以及经过数学推理和代码生成微调的Llama和Mistral模型中,DISeL相比LoRA及相关变体减少了遗忘,同时保持竞争性的微调准确性。此外,学习到的门控激活提供了可解释的诊断视图,显示哪些层和秩组件在微调过程中最活跃,从而提供关于任务特定适应集中位置的见解。代码可在https://github.com/alizindari/DISeL获得。

英文摘要

Low-rank adaptation (LoRA) is a widely used parameter-efficient fine-tuning method, yet its learned correction is static: the same low-rank update is applied to every input. This input-agnostic approach creates an inevitable compromise between adapting to the fine-tuning distribution and preserving pre-trained behavior on inputs outside that distribution, contributing to catastrophic forgetting. We introduce DISeL (Dynamic Input-Sensitive LoRA), which augments LoRA modules with lightweight input-dependent gates over individual rank-one components. The gating mechanism is designed to preserve the pre-trained model's behavior by default, while training learns to activate selected components that reduce the fine-tuning loss. DISeL adds only a small number of parameters and preserves the low-rank structure. Across RoBERTa on GLUE, and Llama and Mistral models fine-tuned for mathematical reasoning and code generation, DISeL reduces forgetting relative to LoRA and related variants while maintaining competitive fine-tuning accuracy. In addition, the learned gate activations provide an interpretable diagnostic view of which layers and rank components are most activated during fine-tuning, giving insight into where task-specific adaptation is concentrated. Code available at https://github.com/alizindari/DISeL .

2605.19024 2026-05-20 stat.ML cs.LG stat.ME 版本更新

Conformal Prediction via Transported Beta Laws

通过运输的贝塔定律进行符合预测

Thiago R. Ramos, Helton Graziadei, Luben M. C. Cabezas

发表机构 * Federal University of São Carlos(萨尔瓦多联邦大学) University of São Paulo(圣保罗大学) Inria(法国国家信息与自动化技术研究院) Université Grenoble Alpes(格勒诺布尔阿尔卑斯大学)

AI总结 本文研究了通过实现的符合阈值诱导的校准-条件覆盖定律,利用贝塔分布作为有限样本参考对象,并通过Wasserstein距离量化偏离,从而提供对边际覆盖差距和坏校准概率的直接界限,并区分不同非i.i.d行为的来源。

详情
AI中文摘要

分割符合预测在交换性下提供有限样本边际覆盖保证,但此保证平均于随机校准样本。我们研究的是由实现的符合阈值诱导的校准-条件覆盖定律。在连续i.i.d情况下,此定律恰好为Beta(k,n+1-k),因此常规的边际保证对应于其均值。我们将此贝塔定律作为有限样本参考对象,并利用Wasserstein距离在[0,1]上量化偏离。该框架提供了对边际覆盖差距和坏校准概率的直接界限,并根据如何变形贝塔参考来区分不同的非i.i.d行为:测试侧偏移通过覆盖尺度上的运输映射作用,而校准依赖性改变顺序统计学定律本身。我们将在尺度-偏移、聚类和稳定混合设置中实例化该框架,其中诱导的变形可以明确表征或通过Berry-Esseen近似表征。在依赖过程上的模拟证实,一阶近似在中等样本大小下能够跟踪经验Wasserstein距离。

英文摘要

Split conformal prediction provides finite-sample marginal coverage under exchangeability, but this guarantee averages over the random calibration sample. We study instead the law of the calibration-conditional coverage induced by a realized conformal threshold. In the continuous i.i.d. setting this law is exactly $Beta(k,n+1-k)$, so the usual marginal guarantee corresponds to its mean. We take this beta law as a finite-sample reference object and quantify departures from it using Wasserstein distances on $[0,1]$. The framework yields direct bounds on marginal coverage gaps and on bad-calibration probabilities, and separates different sources of non-i.i.d. behavior according to how they deform the beta reference: test-side shift acts through a transport map on the coverage scale, while calibration dependence changes the order-statistic law itself. We instantiate the framework in scale-shift, clustered, and stationary mixing settings, where the induced deformations can be characterized explicitly or through Berry-Esseen approximations. Simulations on dependent processes confirm that the first-order approximation tracks the empirical Wasserstein distance even at moderate sample sizes.

2605.18474 2026-05-20 cs.CR cs.AI cs.CL cs.LG 版本更新

Prompt2Fingerprint: Plug-and-Play LLM Fingerprinting via Text-to-Weight Generation

Prompt2Fingerprint: 通过文本到权重生成实现即插即用的LLM指纹生成

Sixu Chen, Xiang Chen, Hongyao Yu, Jiaxin Hong, Hao Fang, Shuoyang Sun, Bin Chen, Shu-Tao Xia

发表机构 * Shenzhen International Graduate School, Tsinghua University, Shenzhen, China(清华大学深圳国际研究生院,中国深圳) South China University of Technology, Guangzhou, China(华南理工大学,中国广州) Harbin Institute of Technology, Shenzhen, Shenzhen, China(哈尔滨工业大学深圳校区,中国深圳)

AI总结 本文提出Prompt2Fingerprint框架,将LLM指纹生成重新定义为条件参数生成任务,通过专用生成器将文本描述直接映射到低秩参数增量,实现无需进一步模型微调的即插即用LLM指纹注入,显著降低计算开销,提供可扩展且即时的LLM所有权管理解决方案。

详情
AI中文摘要

大规模语言模型(LLMs)的广泛部署和重新分布使模型溯源跟踪成为关键挑战。尽管现有的LLM指纹生成方法,特别是通过微调嵌入身份信号的主动方法,实现了高准确性和鲁棒性,但它们面临显著的可扩展性瓶颈。这些方法通常将指纹注入视为一个独立的一次性优化任务,而不是可重用的能力,需要为每个新身份进行单独且资源密集的训练。这导致了高昂的计算成本和部署延迟。为了解决这一问题,我们提出了Prompt2Fingerprint(P2F),这是首个将指纹生成重新定义为条件参数生成任务的框架。通过利用专用生成器,P2F在单次前向传递中将文本描述直接映射到低秩参数增量,从而实现无需进一步模型微调的即插即用LLM指纹注入。我们的实验表明,P2F在保持高指纹准确度、无害性和鲁棒性的同时,显著降低了计算开销,为LLM所有权管理提供了可扩展且即时的解决方案。

英文摘要

The widespread deployment and redistribution of large language models (LLMs) have made model provenance tracking a critical challenge. While existing LLM fingerprinting methods, particularly active approaches that embed identity signals via fine-tuning, achieve high accuracy and robustness, they suffer from significant scalability bottlenecks. These methods typically treat fingerprint injection as an independent, one-off optimization task rather than a reusable capability, necessitating separate, resource-intensive training for every new identity. This incurs prohibitive computational costs and deployment delays. To address this, we propose Prompt2Fingerprint (P2F), the first framework that reformulates fingerprinting as a conditional parameter generation task. By leveraging a specialized generator, P2F maps textual descriptions directly to low-rank parameter increments in a single forward pass, enabling plug-and-play LLM fingerprint injection without further model retraining. Our experiments demonstrate that P2F maintains high fingerprint accuracy, harmlessness, and robustness while significantly reducing computational overhead, offering a scalable and instant solution for LLM ownership management.

2605.18445 2026-05-20 cs.CV cs.AI cs.CL cs.LG 版本更新

What's Holding Back Latent Visual Reasoning?

是什么在阻碍潜在视觉推理?

André G. Viveiros, Nuno Gonçalves, André F. T. Martins, Matthias Lindemann

发表机构 * Instituto Superior Técnico, Universidade de Lisboa(里斯本大学理工学院) Instituto de Telecomunicações(电信研究所) TransPerfect(TransPerfect公司) Carnegie Mellon University(卡内基梅隆大学)

AI总结 本研究探讨了现有模型如何利用潜在令牌,发现潜在令牌在最终预测中起作用有限,主要问题在于训练数据中潜在令牌信息有限且推理时生成的潜在令牌偏离真实表示,需要高质量数据和更精确的潜在令牌预测来推动发展。

详情
AI中文摘要

人类通过心理模拟中间视觉步骤来解决复杂视觉问题,而非仅通过语言推理。受此启发,近期有关视觉-语言模型的工作探索了连续潜在令牌作为中间视觉想象步骤的链式推理。在本工作中,我们研究了近期模型如何利用此类潜在令牌。令人惊讶的是,当潜在令牌被无信息的占位符令牌替代时,模型准确性不受影响。这表明潜在令牌在模型最终预测中起最小的因果作用。为了更好地理解这一现象,我们分析了由oracle潜在表示提供的训练信号以及推理时生成的潜在令牌质量。我们的实验揭示了两个阻碍潜在视觉推理的关键问题:首先,在大多数现有数据集中,oracle潜在令牌提供的信息有限,仅超出原始图像,且不显著简化任务,导致模型在训练时忽略它们,并在推理时有效绕过它们。当在诊断数据集上微调时,其中潜在令牌为最终预测提供充分支持,我们显示模型可以因果依赖于它们。其次,在推理时生成的潜在令牌偏离其对应的oracle表示,坍缩到狭窄区域,即使模型依赖它们也无法获得收益。总体而言,我们的发现表明,未来潜在视觉推理的进步取决于两个关键支柱:具有信息性中间步骤的高质量数据集和更精确的潜在令牌预测。

英文摘要

Humans can approach complex visual problems by mentally simulating intermediate visual steps, rather than reasoning through language alone. Inspired by this, several works on Vision-Language Models have recently explored chain-of-thought reasoning with continuous latent tokens as intermediate visual imagination steps. In this work, we investigate how recent models leverage such latent tokens. Surprisingly, we find that model accuracy is unaffected when latent tokens are replaced by uninformative dummy tokens. This indicates that latent tokens play a minimal causal role in the model's final prediction. To better understand this phenomenon, we analyze both the training signal provided by oracle latent representations and the quality of the latent tokens generated at inference time. Our experiments reveal two crucial issues holding back latent visual reasoning: First, in most existing datasets, oracle latent tokens provide limited additional information beyond the original image and do not substantially simplify the task, leading models to ignore them during training and effectively bypassing them at inference time. When fine-tuned on a diagnostic dataset, in which latent tokens provide sufficient support for the final prediction, we show that models can causally rely on them. Second, the latent tokens produced at inference time deviate from their corresponding oracle representations, collapsing to a narrow region and preventing benefits even when the model relies on them. Overall, our findings suggest that future progress in latent visual reasoning depends on two key pillars: high-quality datasets with informative intermediate steps and more precise latent token prediction.

2605.18389 2026-05-20 cs.LG math.OC 版本更新

Spherical Harmonic Optimal Transport: Application to Climate Models Comparisons

球面调和最优传输:应用于气候模型比较

Pierre Houédry, Iskander Legheraba, Léo Buecher, Nicolas Courty

发表机构 * INRIA Rennes(INRIA里昂) University of Montpellier(蒙彼利埃大学) LPHI, UMR 5294, CNRS, INSERM(LPHI,UMR 5294,CNRS,INSERM) Université Bretagne Sud(布列塔尼南大学) IRISA, UMR 6074, CNRS(IRISA,UMR 6074,CNRS)

AI总结 本文提出了一种基于球面调和函数的最优传输方法,用于高效比较气候模型,通过在球面上利用谐波结构设计快速Sinkhorn算法,提升了计算效率并应用于全球气候模型评估。

详情
AI中文摘要

最优传输提供了一个强大的框架,用于在尊重其支撑集几何结构的情况下比较测度,但计算成本高昂,限制了其在现实应用中的潜力。在流形上,基于热核的卷积算法已被提出以缓解这一成本,但其理论性质仍鲜有探索。我们证明了当时间趋于零时,热核成本在平衡和非平衡情况下均收敛于最优传输成本。在特定情况下,对于2球面S²,我们确保所关联的Sinkhorn分歧保持经典最优传输差异的几何和分析性质。此外,我们利用球面的谐波结构推导出一种快速的Sinkhorn算法,仅需O(n)的内存和O(n^{3/2})的时间每迭代,且完全支持GPU友好的密集运算。我们在合成数据上验证了其计算效率,并讨论了其在评估全球气候模型中的潜在用途,提供了对模型性能的空间和季节性洞察。

英文摘要

Optimal transport provides a powerful framework for comparing measures while respecting the geometry of their support, but comes with an expensive computational cost, hindering its potential application to real world use cases. On manifolds, convolutional algorithms based on the heat kernel have been proposed to alleviate this cost, but their theoretical properties remain largely unexplored. We establish that the heat kernel cost converges to the optimal transport cost as time vanishes in the balanced and unbalanced cases. In the specific case of the 2-sphere $\mathbb{S}^2$, we ensure that the associated Sinkhorn divergences retains the desirable geometric and analytic properties of classical optimal transport discrepancies. Moreover, we leverage the harmonic structure of the sphere to derive a fast Sinkhorn algorithm, requiring only $\mathcal{O}(n)$ memory and $\mathcal{O}(n^{3/2})$ time per iteration, with fully dense GPU-friendly operations. We validate its computational efficiency on synthetic data, and discuss its potential use in the evaluation of global climate models, providing both spatial and seasonal insights into models performances.

2605.17889 2026-05-20 cs.LG 版本更新

CoX-MoE: Coalesced Expert Execution for High-Throughput MoE Inference with AMX-Enabled CPU-GPU Co-Execution

CoX-MoE: 通过AMX启用的CPU-GPU协同执行提升高吞吐量MoE推理的协同专家执行

Muyoung Son, Yi Chen, Seungjae Yoo, Soongyu Choi, Joo-Young Kim

发表机构 * KAIST(韩国科学技术院)

AI总结 本文提出CoX-MoE,一种通过AMX启用的CPU-GPU协同系统,通过协同专家执行和战略工作负载编排优化MoE推理,提升吞吐量。CoX-MoE引入了coalescing-aware orchestration策略和静态专家-aware分层方案,分别优化资源分配和减少PCIe传输开销,从而在吞吐量上比现有框架提升7.1倍和2.4倍。

Comments 7 pages, 8 figures, accepted to DAC '26

详情
AI中文摘要

混合专家(MoE)架构通过稀疏专家激活提高计算效率,但面向吞吐量的推理面临显著的GPU内存压力,因为参数规模和中间数据较大。先前工作尝试通过专家卸载和微批处理或卸载计算到CPU来缓解这一问题。然而,微批处理导致的工作负载碎片化会降低操作强度,导致专家执行成为内存瓶颈。同时,CPU卸载受限于慢速PCIe传输和其在解码阶段注意力计算中的有限适用性。因此,这些低效性限制了系统利用率,严重限制了MoE推理的端到端吞吐量。为了解决这些挑战,本文提出CoX-MoE,一种通过AMX启用的CPU-GPU协同系统,通过结合协同专家执行和战略工作负载编排来全面优化MoE推理。CoX-MoE引入(i)一种coalescing-aware orchestration策略,通过采用普通批处理而非微批处理进行专家计算和选择性注意力卸载,共同优化资源分配;(ii)一种静态专家-aware分层方案,预先将频繁激活的专家分配到GPU,减少PCIe传输开销并平衡CPU和GPU在推理中的工作负载。与最先进的框架相比,CoX-MoE实现了显著的提升,分别达到比FlexGen和MoE-Lightning高7.1倍和2.4倍的吞吐量。

英文摘要

The Mixture-of-Experts (MoE) architecture improves computational efficiency via sparse expert activation, but throughput-oriented inference faces substantial GPU memory pressure due to a significant parameter size and intermediate data. Prior works attempt to mitigate this using expert offloading with micro-batching or by offloading computation to the CPU. However, the fragmented workload resulting from micro-batching degrades operational intensity, causing expert execution to become memory-bound. Meanwhile, CPU offloading is constrained by slow PCIe transfers and its limited applicability to attention computation in the decode stage. Consequently, these inefficiencies prevent effective system utilization, severely restricting the end-to-end throughput of MoE inference. To address these challenges, this paper proposes CoX-MoE, an Advanced Matrix Extensions (AMX)-enabled CPU-GPU collaborative system that comprehensively optimizes MoE inference by combining coalesced expert execution with strategic workload orchestration for higher throughput. CoX-MoE introduces (i) a coalescing-aware orchestration policy to jointly optimize resource allocation by adopting ordinary batch, instead of micro-batch, for expert computation and selective attention offloading, and (ii) a static expert-aware stratification scheme that pre-assigns frequently activated experts to the GPU, mitigating PCIe transfer overhead and balancing workload for the CPU and GPU during inference. Compared to state-of-the-art frameworks, CoX-MoE delivers significant gains, achieving up to 7.1x and 2.4x higher throughput than FlexGen and MoE-Lightning, respectively.

2605.17859 2026-05-20 cs.HC cs.LG 版本更新

Multi-site PPG: An In-the-Wild Physiological Dataset from Emerging Multi-site Wearables

多站点PPG:来自新兴多站点可穿戴设备的野外生理数据集

Jiayi Shao, Jiaying Ye, Shengyao Liu, Zachary Englhardt, Girish Narayanswamy, Vikram Iyer, Qiuyue Shirley Xue

发表机构 * University of Washington(华盛顿大学) Purdue University(普渡大学)

AI总结 本文提出一个多站点PPG数据集,通过四个定制开发的无感可穿戴设备收集了超过350小时的原始数据,用于评估不同身体部位的PPG信号在心率估计中的表现差异。

Comments 20 pages, 6 figures, 11 tables. Dataset and code available at the URLs in the paper

详情
AI中文摘要

可穿戴设备被广泛用于移动健康监测,光脉冲测距(PPG)是用于心率及相关生理测量的关键传感模式。然而,公开的野外PPG数据集大多集中在手腕或局限于短时间的受控研究,限制了新兴可穿戴设备形式因素的研究。我们提出了Multi-site PPG,一个从四个定制开发的无感可穿戴设备(智能耳环、戒指、手表和项链)收集的野外生理数据集。每个设备记录绿色和红外反射PPG、三轴加速度计和温度,并带有时间戳以实现跨设备对齐,同时一个Polar H10胸 strap提供参考心电图(ECG)。参与者在白天活动期间佩戴设备多天,继续正常生活。该数据集包含超过350小时的原始数据和每种可穿戴设备230-290小时的建模准备8秒窗口。我们基准测试了启发式、监督和自监督的心率估计方法,显示了显著的身体部位差异:最佳方法在耳环上的平均绝对误差(MAE)为2.30 bpm,在戒指上为5.13 bpm,在手表上为8.37 bpm,在项链上为8.68 bpm。我们进一步分析了运动效应,并评估了多站点和PPG-加速度计融合,证明了该数据集在新兴可穿戴设备形式因素上的稳健生理传感价值。

英文摘要

Wearables are widely used for mobile health monitoring, and photoplethysmography (PPG) is a key sensing modality for heart rate and related physiological measurements. However, public in-the-wild PPG datasets remain largely wrist-centric or limited to short, controlled studies, constraining research on emerging wearable form factors. We present Multi-site PPG, an in-the-wild physiological dataset collected from four custom-developed unobtrusive wearables: a smart earring, ring, watch, and necklace. Each device records green and infrared reflective PPG, 3-axis acceleration, and temperature with timestamps for cross-device alignment, while a Polar H10 chest strap provides reference electrocardiogram (ECG). Participants wore the devices for multiple days during daytime activities while continuing their normal routines. The dataset contains over 350 hours of raw data and 230-290 hours of modeling-ready 8-second windows per wearable. We benchmark heuristic, supervised, and self-supervised heart-rate estimation methods, showing substantial body-site differences: the best methods achieve mean absolute errors (MAEs) of 2.30 bpm on the earring, 5.13 bpm on the ring, 8.37 bpm on the watch, and 8.68 bpm on the necklace. We further analyze motion effects and evaluate multi-site and PPG-accelerometer fusion, demonstrating the dataset's value for robust physiological sensing across emerging wearable form factors.

2605.17804 2026-05-20 cs.LG eess.SP 版本更新

GenTS: A Comprehensive Benchmark Library for Generative Time Series Models

GenTS:生成时间序列模型的综合基准库

Chenxi Wang, Xiaorong Wang, Peiyang Li, Yi Wang

发表机构 * The University of Hong Kong(香港大学) Fudan University(复旦大学)

AI总结 本文提出GenTS,一个用于系统评估生成时间序列模型的综合且可扩展的基准库,通过统一的数据预处理流程、多样化的模型集合和全景评估指标,为生成模型提供了更灵活的评估框架。

详情
AI中文摘要

生成模型在时间序列分析任务中展现出了显著的潜力,如合成、预测、插值等。然而,现有的时间序列库主要针对判别模型进行工程设计,具有针对特定任务的标准工作流程,例如优化时间序列预测的均方误差。这种刚性的结构与生成模型独特的、往往复杂的范式(如对抗训练、扩散过程)根本上不兼容,因为生成模型学习的是数据分布而非直接的输入-输出映射。为此,我们提出了GenTS,一个全面且可扩展的基准库,旨在对生成时间序列模型进行系统评估。GenTS具有统一的数据预处理流程、多样化的模型集合和全景评估指标。其模块化设计也使研究者能够灵活地自定义超出内置数据集和模型。基于GenTS,我们进行了在多种任务下的基准测试,从而为模型选择提供了建议,并识别了未来研究的潜在方向。我们的代码在https://github.com/WillWang1113/GenTS上开源。官方教程和文档可在https://willwang1113.github.io/GenTS/上获取。

英文摘要

Generative models have demonstrated remarkable potential in time series analysis tasks, like synthesis, forecasting, imputation, etc. However, offering limited coverage for generative models, existing time series libraries are mainly engineered for discriminative models, with standardized workflows for specific tasks, such as optimizing Mean Squared Errors for time series forecasting. This rigid structure is fundamentally incompatible with the distinct and often complex paradigms of generative models (e.g., adversarial training, diffusion processes), which learn the underlying data distribution rather than a direct input-output mapping. To this end, we proposed GenTS, a comprehensive and extensible benchmark library designed for systematic assessment on generative time series models. GenTS features a unified data preprocessing pipeline, a collection of versatile models, and panoramic evaluation metrics. Its modular design also enables the researchers to flexibly customize beyond our built-in datasets and models. Based on GenTS, we conducted benchmarking experiments under diverse tasks, accordingly offering suggestions for model selection and identifying potential directions for future research. Our codes are open-source at https://github.com/WillWang1113/GenTS. The official tutorials and document are available at https://willwang1113.github.io/GenTS/.

2605.17340 2026-05-20 cs.LG 版本更新

Olivia: Harmonizing Time Series Foundation Models with Power Spectral Density

Olivia:通过功率谱密度和谐化时间序列基础模型

Jingru Fei, Kun Yi, Alex Xing Wang, Qingsong Wen, Xiangxiang Zhu, Wei Fan

发表机构 * Beijing Institute of Technology(北京理工大学) North China Institute of Computing Technology(华北计算技术研究所) State Information Center(国家信息中心) University of Auckland(奥克兰大学) Northwest Polytechnical University(西北工业大学) Victoria University of Wellington(威灵顿维多利亚大学)

AI总结 本文提出Olivia,一种基于谐化机制的时间序列基础模型,通过在频域中使用功率谱密度来减少数据集间的不匹配并增强预训练效果,从而在零样本、少样本和全样本预测场景中取得最佳性能。

Comments Accepted by ICML 2026

详情
AI中文摘要

时间序列基础模型依赖于在跨领域多样数据集上进行大规模预训练,但其在时间模式上的异质性可能会阻碍训练和学习可迁移的时间序列表示的有效性。受信号处理中归一化功率谱密度(PSD)基本概念的启发,我们假设通过频域中的PSD和谐化数据集可以减少不匹配并增强预训练。我们超越了直接不可行的最小化优化,创新性地将其重新表述为一种原则性的和谐化方法。具体而言,我们提出Harmonizer模块,该模块重塑频谱结构并隐式地在不同数据集中和谐化PSD,这在理论上对应于第二阶时间相关性的共享重参数化。我们的理论分析进一步揭示,与Harmonizer交互的token可以通过紧凑的共振器集合高效地进行调解,从而启发了HarmonicAttention设计,该设计在低维交互空间中执行自注意力。然后,我们提出Olivia,一种基于这些和谐化机制的新时间序列基础模型。在两个大规模基准(TSLib和GIFT-Eval)以及额外的6个GluonTS数据集上的广泛实验表明,Olivia在零样本、少样本和全样本预测场景中一致实现了最佳性能。我们的代码可在https://github.com/TSTS13/Olivia上获得。

英文摘要

Time series foundation models rely on large-scale pretraining over diverse datasets across domains, yet their heterogeneity in temporal patterns could hinder the effectiveness of training and learning transferable time series representations. Inspired a fundamental concept, normalized power spectral density (PSD) in signal processing, we assume harmonizing datasets via PSDs in the spectral domain could reduce mismatches and enhance pretraining. We then go beyond the direct intractable minimization optimization and innovatively reformulate it as a principled harmonization approach. Specifically, we propose Harmonizer, a module that reshapes spectral structures and implicitly harmonizing PSDs across datasets, which theoretically corresponds to a shared reparameterization of second-order temporal correlations. Our theoretical analysis further reveals token interactions with Harmonizer can be efficiently mediated by a compact set of resonators, motivating a HarmonicAttention design that performs self-attention in a low-dimensional interaction space. Then, we propose Olivia, a novel time series foundation model built upon these harmonization mechanisms. Extensive experiments on two large-scale benchmarks (TSLib and GIFT-Eval) and extra 6 datasets from GluonTS, demonstrate Olivia consistently achieves state-of-the-art performance under zero-shot, few-shot, and full-shot forecasting scenarios. Our code is available at https://github.com/TSTS13/Olivia.

2605.17326 2026-05-20 hep-lat cs.LG 版本更新

Noise scheduling and linear dynamics in diffusion models on Lie groups

在李群上扩散模型中的噪声调度与线性动力学

Javad Komijani

发表机构 * Institute for Theoretical Physics, ETH Zurich, 8093 Zurich, Switzerland(理论物理研究所,苏黎世联邦理工学院,瑞士苏黎世,8093)

AI总结 本文研究了在李群上扩散过程中噪声调度的作用,特别关注其在格点规范理论中的应用。研究发现特定的噪声调度可使Wilson作用量的期望值随扩散时间线性衰减,与欧几里得扩散模型相比,这种行为在李群设置中自然产生,而后者需要显式设计漂移项。

Comments 5 pages

详情
AI中文摘要

我们研究了在李群上扩散过程中噪声调度的作用,特别关注其在格点规范理论中的应用。我们证明特定的噪声调度导致Wilson作用量的期望值随扩散时间呈线性衰减。我们将其与欧几里得扩散模型进行比较,其中这种行为需要显式设计的漂移项,而在李群设置中则自然产生。

英文摘要

We investigate the role of the noise schedule in diffusion processes on Lie groups, with particular emphasis on applications to lattice gauge theory. We show that a specific noise schedule leads to a linear decay of the expectation value of the Wilson action as a function of diffusion time. We compare this with Euclidean diffusion models, where such behavior requires an explicitly designed drift term, while in the Lie-group setting it arises naturally.

2605.17046 2026-05-20 cs.LG cs.AI cs.CL 版本更新

1GC-7RC: One Graphic Card -- Seven Research Challenges! How Good Are AI Agents at Doing Your Job?

1GC-7RC:一张图形卡——七个研究挑战!AI代理在做你的工作方面有多好?

Robin-Nico Kampa, Fabian Deuser, Anna Bößendörfer, Konrad Habel, Norbert Oswald

AI总结 本文提出1GC-7RC基准测试,通过七个跨领域机器学习任务评估AI代理在从头设计、实现和训练模型的能力,揭示了不同代理在隐式机器学习知识、规划能力和时间预算管理方面的差异。

详情
AI中文摘要

自主AI编码代理正成为机器学习从业者在工业和研究中不可或缺的工具。尽管这种应用日益广泛,但尚无标准化基准来评估其在不同领域从头设计、实现和训练模型的能力。我们引入了1GC-7RC(单张图形卡:七个研究挑战),该基准包含七个机器学习任务,涵盖语言建模、图像分类、语义分割、图学习、表格预测、时间序列预测和文本分类。每个任务都提供锁定的数据准备和评估脚本以及基线训练脚本;代理只能修改训练代码,无法访问预训练权重(语义分割任务有一个受控例外),无法访问互联网,并必须在单个GPU上完成每个任务的时间预算(40-120分钟)。我们评估了七个编码代理:五个专有(Claude Code with Sonnet 4.6、Opus 4.6和Opus 4.7;Codex CLI with GPT 5.5;和OpenCode with Qwen 3.6+)和两个开源(OpenCode with Kimi K2.5、Kimi K2.6)。在每个代理-任务对的5次运行中,我们报告了显著的性能差异,揭示了不同代理在隐式机器学习知识、规划能力和时间预算管理方面的不同水平。该基准、工具和所有评估成果均在GitHub上公开,以促进未来代理的可重复比较。由于我们的基准设计是模块化的,该基准可以扩展到新任务和领域,适应不同的GPU预算,并用于研究多代理设置,使其成为未来自主研究代理研究的灵活平台。

英文摘要

Autonomous AI coding agents are becoming a core tool for ML practitioners in industry and research alike. Despite this growing adoption, no standardized benchmark exists to evaluate their ability to design, implement, and train models from scratch across diverse domains. We introduce **1GC-7RC** (*Single Graphic Card: Seven Research Challenges*), a benchmark comprising seven ML tasks spanning language modeling, image classification, semantic segmentation, graph learning, tabular prediction, time-series forecasting, and text classification. Each task provides a locked data-preparation and evaluation script together with a baseline training script; the agent may only modify the training code, has no access to pretrained weights (with one controlled exception for semantic segmentation), no internet access, and must complete each task within a task-specific wall-clock budget (40-120 minutes) on a single GPU. We evaluate seven coding agents: five proprietary (Claude Code with Sonnet 4.6, Opus 4.6, and Opus 4.7; Codex CLI with GPT 5.5; and OpenCode with Qwen 3.6+) and two open-source (OpenCode with Kimi K2.5, Kimi K2.6). Across 5 runs per agent-task pair, we report substantial performance differences that reveal varying levels of implicit ML knowledge, planning ability, and time-budget management. The benchmark, harness, and all evaluation artifacts are publicly available on GitHub at https://github.com/Strolchii/1GC-7RC-Benchmark to facilitate reproducible comparison of future agents. Because our benchmark design is modular, the benchmark can be extended to new tasks and domains, adapted to different GPU budgets, and used to study multi-agent settings, making it a flexible platform for future research on autonomous research agents.

2605.17003 2026-05-20 cs.LG cs.AI 版本更新

Learning-Zone Energy: Online Data Selection for Efficient RL Post-Training

学习区能量:用于高效RL后训练的在线数据选择

Peng Cui, Boyao Yang, Jun Zhu

发表机构 * Dept. of Comp. Sci. & Tech.(计算机科学与技术系) Institute for AI(人工智能研究院) BNRist Center(BNRist中心) Tsinghua-Bosch Joint ML Center(清华大学-博世联合机器学习中心) THBI Lab(THBI实验室) Tsinghua University(清华大学) Dept. of Automation(自动化系)

AI总结 本文提出学习区能量(LZE)方法,通过在线数据选择框架集中计算在模型的主动学习前沿,提高RL后训练的效率,实验表明在多个数据集上表现优异,且计算资源消耗减少。

详情
AI中文摘要

强化学习(RL)后训练已成为提取大语言模型(LLMs)数学推理能力的主要范式,但现有技术如GRPO和DAPO在提示上均匀分配rollout和梯度预算,浪费计算在已掌握的样本或远超模型当前能力的样本上。为解决这一根本性低效问题,我们提出学习区能量(LZE),一种理论支撑的完全在线数据选择框架,集中计算在模型的主动学习前沿。其核心是定义一个闭式学习区能量评分,融合三个互补信号,初始难度锚点、标准化结果不确定性项和通过率动量,形成一个单标量,可证明与组相对策略梯度更新的预期幅度一致。一个具有回放的前向修剪器进一步减少墙钟时间成本,通过跳过已解决提示的rollout生成,同时定期检查遗忘。在Qwen家族模型(1.5B-8B)上评估GSM8K、MATH和DAPO-MATH数据集,我们的方法每步仅保留40%的训练数据,却匹配或超越全数据基线,尤其在AIME25(+45.9%)和AMC23(+18.2%)上表现出显著的分布外收益,同时估计训练FLOPs减少约36%。我们的代码可在https://github.com/Stellaris167/LZE获取。

英文摘要

Reinforcement Learning (RL) post-training has emerged as the dominant paradigm for eliciting mathematical reasoning in Large Language Models (LLMs), yet prevailing techniques such as GRPO and DAPO distribute rollout and gradient budgets nearly uniformly across prompts, squandering compute on samples that are already mastered or remain far beyond the model's current capability. To address this fundamental inefficiency, we propose Learning-Zone Energy (LZE), a theoretically grounded, fully online data selection framework that concentrates computation on the model's active learning frontier. At its core, we define a closed-form Learning-Zone Energy Score that fuses three complementary signals, an initial-difficulty anchor, a normalized outcome-uncertainty term, and a pass-rate momentum, into a single scalar that is provably aligned with the expected magnitude of group-relative policy gradient updates. A forward pruner with replay further reduces wall-clock time cost by skipping rollout generation for persistently solved prompts while periodically checking for forgetting. Evaluated on Qwen-family models (1.5B-8B) across GSM8K, MATH and DAPO-MATH, our method retains only 40% of the training data per step yet matches or surpasses full-data baselines, with especially pronounced out-of-distribution gains on AIME25 (+45.9%) and AMC23 (+18.2%), alongside an estimated 36% reduction in training FLOPs. Our code is available at https://github.com/Stellaris167/LZE.

2605.16445 2026-05-20 cs.LG cs.AI 版本更新

Membership Inference Attacks on Discrete Diffusion Language Models

对离散扩散语言模型的成员推断攻击

Shailesh Kasivelrajan

AI总结 本文研究了对微调后的MDLMs的成员推断攻击,发现其比现有灰盒基线更易受攻击,并设计了阴影模型转移攻击以证明其有效性。

Comments Citations and Co Authors need to be verified and updated. Will submit a new version soon

详情
AI中文摘要

Masked Diffusion Language Models (MDLMs) 替换了自回归生成的迭代解 masking,其隐私属性大多未被研究。我们研究了对微调后的MDLMs的成员推断攻击(MIA),并发现其比现有灰盒基线所暗示的要显著更容易受到攻击。我们从四个 masking 比率下的模型重建损失中提取了一个46维的特征向量,并在其上训练XGBoost和MLP分类器。在六个文本领域上的MIMIR基准测试中,XGBoost实现了平均AUC 0.878,在Pile CC上达到峰值0.930,并在平均上比SAMA灰盒基线高出0.062 AUC。一个leave one signal out消融实验显示,仅ELBO轨迹就驱动了大部分结果,当移除时平均下降0.130,而注意力特征在低于0.003时几乎不起作用。我们还设计了一个阴影模型转移攻击,其中K=3个在无关领域训练的surrogate MDLMs在不接触目标领域的情况下生成分类器标签。这在0.020以内实现了0.858的平均AUC,并确立了阴影模型转移作为一种实用且几乎同样有效的攻击路径。

英文摘要

Masked Diffusion Language Models MDLMs replace autoregressive generation with iterative demasking and their privacy properties are largely unstudied. We study membership inference attacks MIA on fine tuned MDLMs and show they are significantly more vulnerable than current grey box baselines suggest. We extract a 46 dimensional feature vector from the models reconstruction loss at four masking ratios and train XGBoost and MLP classifiers on top. On the MIMIR benchmark across six text domains XGBoost achieves mean AUC 0.878 peaking at 0.930 on Pile CC and beats the SAMA grey box baseline by 0.062 AUC on average. A leave one signal out ablation shows that the ELBO trajectory alone drives most of this with a mean drop of 0.130 when removed while attention features add almost nothing below 0.003. We also design a shadow model transfer attack where K equals 3 surrogate MDLMs trained on data from unrelated domains generate classifier labels with no access to the target domain. This achieves 0.858 mean AUC within 0.020 of the white box oracle and establishes shadow model transfer as a practical and near equally effective attack path.

2605.15113 2026-05-20 cs.LG 版本更新

Learning from Language Feedback via Variational Policy Distillation

通过变分策略蒸馏学习语言反馈

Yang Li, Erik Nijkamp, Semih Yavuz, Shafiq Joty

发表机构 * Salesforce AI Research(Salesforce人工智能研究)

AI总结 本文提出变分策略蒸馏(VPD),通过将学习语言反馈形式化为变分期望最大化问题,解决传统自蒸馏方法中教师策略能力停滞的问题,从而在科学推理和代码生成任务中优于标准RLVR和现有自蒸馏基线。

详情
AI中文摘要

可验证奖励的强化学习(RLVR)面临稀疏结果信号的问题,导致复杂推理任务的探索瓶颈。最近的在线自蒸馏方法尝试通过利用语言反馈生成密集的token级监督来解决这一问题。然而,这些方法依赖于固定且被动的教师来解读反馈。随着学生策略的改进,教师的零样本评估能力趋于停滞,最终阻碍进一步学习。为克服这一问题,我们提出变分策略蒸馏(VPD),一个将学习语言反馈形式化为变分期望最大化(EM)问题的框架。VPD共同进化两种策略:在E步中,教师通过自适应信任区域更新在轨迹结果上主动优化,将文本反馈转化为动态改进的目标token分布。在M步中,学生内部化其在自身在线滚动中所获得的密集分布指导。通过持续提升教师从文本批评中提取可操作信号的能力,VPD克服了传统自蒸馏方法的局限性。在多样化的诊断反馈源上评估,VPD在科学推理和代码生成任务中持续优于标准RLVR和现有自蒸馏基线。最后,通过在刚性数学推理和冷启动场景中压力测试我们的框架,我们揭示了反馈驱动自蒸馏与纯环境驱动RL之间的基本界限。

英文摘要

Reinforcement learning from verifiable rewards (RLVR) suffers from sparse outcome signals, creating severe exploration bottlenecks on complex reasoning tasks. Recent on-policy self-distillation methods attempt to address this by utilizing language feedback to generate dense, token-level supervision. However, these approaches rely on a fixed, passive teacher to interpret the feedback. As the student policy improves, the teacher's zero-shot assessment capabilities plateau, ultimately halting further learning. To overcome this, we propose Variational Policy Distillation (VPD), a framework that formalizes learning from language feedback as a Variational Expectation-Maximization (EM) problem. VPD co-evolves both policies: in the E-step, the teacher is actively refined on trajectory outcomes via an adaptive trust-region update, translating textual feedback into a dynamically improved target token distribution. In the M-step, the student internalizes this dense distributional guidance on its own on-policy rollouts. By continuously improving the teacher's ability to extract actionable signals from textual critique, VPD overcomes the limitations of passive distillation. Evaluated across diverse sources of diagnostic feedback on scientific reasoning and code generation tasks, VPD consistently outperforms both standard RLVR and existing self-distillation baselines. Finally, by stress-testing our framework on rigid mathematical reasoning and cold-start regimes, we illuminate the fundamental bounds of feedback-driven self-distillation compared to pure environment-driven RL.

2605.14063 2026-05-20 cs.LG 版本更新

Reliability-Gated Source Anchoring for Continual Test-Time Adaptation

可靠性门控的源锚定用于持续测试时间适应

Vikash Singh, Debargha Ganguly, Weicong Chen, Sabyasachi Sahoo, Sreehari Sankar, Biyao Zhang, Mohsen Hariri, Shouren Wang, Osama Zafar, Christian Gagné, Vipin Chaudhary

发表机构 * Case Western Reserve University(凯斯西储大学) Université Laval(拉瓦尔大学) Mila - Québec AI Institute(魁北克人工智能研究所)

AI总结 该研究提出了一种可靠性门控的源锚定方法(RMemSafe),用于持续测试时间适应(CTTA),通过利用冻结源的归一化预测熵来抑制所有显式源耦合使用,从而在源可靠性下降时自动关闭源锚定和一致性过滤器,提升模型在持续腐蚀任务中的性能。

详情
AI中文摘要

持续测试时间适应(CTTA)在在线更新预训练模型时,将模型锚定到一个冻结的源检查点上。然而,当源可靠性下降时,这种锚定方式会失效。在CCCHard数据集上,ResNet-50源的top-1准确率下降至约1.3%,而现有源锚定CTTA方法仍然使用相同的锚定强度。本文提出RMemSafe,一种基于ROID的可靠性门控扩展方法,利用冻结源的归一化预测熵来衰减目标函数中的所有显式源耦合使用。当源后验接近均匀分布时,门控关闭:源锚定和一致性过滤器消失,目标函数减少为源无关的回退,包含ROID的基本损失加上边际校准。结合ASR,RMemSafe在8个匹配分割的持续腐蚀单元中实现了最低的错误率,并在所有9个单元中是最佳的重置方法,比ROID+ASR在ResNet-50上提升1.05个百分点,在ViT-B/16上提升0.48个百分点。受控的源退化扫描显示,其危害斜率比ROID+ASR浅1.13倍,与渐进衰减预测一致。熵门控检测到高熵源崩溃,而非自信错误的低熵源;该范围被明确评估和讨论。

英文摘要

Continual test-time adaptation (CTTA) updates a pretrained model online on an unlabeled, non-stationary stream while anchoring it to a frozen source checkpoint. This anchor is useful only when the source remains reliable. On CCC-Hard, however, a ResNet-50 source falls to approximately $1.3\%$ top-$1$ accuracy, while existing source-anchored CTTA methods continue applying the same anchor strength. We call this failure mode blind anchoring and propose RMemSafe, a reliability-gated extension of ROID that uses the frozen source's normalized predictive entropy to attenuate all explicit source-coupled uses in the objective. When the source posterior approaches uniformity, the gate closes: the source anchor and agreement filter vanish, and the objective reduces to a source-agnostic fallback comprising ROID's base losses plus marginal calibration. Combined with ASR, RMemSafe achieves the lowest error on $8$ of $9$ matched-split continual-corruption cells and is the best reset-based method on all $9$, improving ROID+ASR by $1.05$~pp on ResNet-50 and $0.48$~pp on ViT-B/16. A controlled source-degradation sweep shows a $1.13{\times}$ shallower harm slope than ROID+ASR, consistent with the graceful-decay prediction. The entropy gate detects high-entropy source collapse, not confidently wrong low-entropy sources; this scope is explicitly evaluated and discussed.

2605.13652 2026-05-20 cs.LG cs.AI cs.CL 版本更新

Beyond Perplexity: A Geometric and Spectral Study of Low-Rank Pre-Training

超越困惑度:低秩预训练的几何与谱研究

Namrata Shivagunde, Vijeta Deshpande, Sherin Muckatira, Anna Rumshisky

发表机构 * University of Massachusetts Lowell(马萨诸塞大学洛厄尔分校)

AI总结 本文通过几何和谱分析研究低秩预训练方法,揭示其与全秩训练在模型性能和解空间上的差异,发现低秩方法在不同模型规模下表现各异,且困惑度不能完全反映下游任务性能。

Comments 9 pages, 5 figures, 2 tables

详情
AI中文摘要

大规模语言模型的预训练主要受限于存储全秩权重、梯度和优化器状态的内存成本。低秩预训练出现以解决这一问题,相关方法空间迅速扩展。一个核心问题仍未解决:低秩方法是否能产生与全秩训练具有同等泛化能力的模型,或者秩约束是否根本性地改变了所达到的解?现有比较几乎完全依赖于单种子运行的验证困惑度,通常继承自先前文献。然而,困惑度是解质量的差代理;两种方法可以在困惑度上匹配,却收敛到不同的损失景观区域和内部表示。我们通过表征五种低秩预训练方法(GaLore和Fira(内存高效优化器)、CoLA和SLTrain(架构再参数化)、ReLoRA(适配器式更新带周期性重置))在三个模型规模(60M、130M、350M)下与全秩训练的解,关闭这一差距。我们评估每种方法在四个维度上的16个指标:1D损失景观沿随机/Top-K PCA方向、1D检查点之间插值、权重和学习更新的谱结构,以及激活相似性与全秩训练。我们显示低秩方法不等同于全秩训练,也不等同于彼此,即使验证困惑度接近。全秩训练在随机方向上达到更尖锐的盆地,而反方向则适用于top-1 PCA方向。每种方法收敛到几何上不同的盆地。低秩激活在训练过程中随着层数增加而偏离全秩激活,GaLore最接近全秩激活。进一步,验证困惑度在每个规模下并不转化为下游性能。添加几何和谱度量提高了预测。

英文摘要

Pre-training large language models is dominated by the memory cost of storing full-rank weights, gradients, and optimizer states. Low-rank pre-training has emerged to address this, and the space of methods has grown rapidly. A central question remains open: do low-rank methods produce models that generalize comparably to full-rank training, or does the rank constraint fundamentally alter the solutions reached? Existing comparisons rely almost entirely on validation perplexity from single-seed runs, often carried forward from prior literature. Yet perplexity is a poor proxy for solution quality; two methods can match on perplexity while converging to different loss landscape regions and internal representations. We close this gap by characterizing the solutions found by five low-rank pre-training methods, GaLore and Fira (memory-efficient optimizers), CoLA and SLTrain (architecture reparameterizations), and ReLoRA (adapter-style updates with periodic resets), against full-rank training at three model scales (60M, 130M, 350M). We evaluate each along 16 metrics across four dimensions: 1-D loss landscape along random/top-K PCA directions, 1-D interpolation between checkpoints, spectral structure of the weights and learned updates, and activation similarity to full-rank training. We show that low-rank methods are not equivalent to full-rank training, nor to one another, even when validation perplexity is close. Full-rank training settles into a sharper basin than low-rank methods along random directions, while the reverse holds for the top-1 PCA direction. Each method converges to a geometrically distinct basin. Low-rank activations diverge from full-rank in later layers as training progresses, with GaLore tracking full-rank most closely. Further, validation perplexity does not translate to downstream performance at every scale. Adding geometric and spectral metrics improves the prediction.

2605.12981 2026-05-20 cs.SE cs.AI cs.LG 版本更新

Protocol-Driven Development: Governing Generated Software Through Invariants and Continuous Evidence

基于协议的开发:通过不变式和连续证据治理生成的软件

Jun He, Deying Yu

AI总结 本文提出了一种基于协议的开发方法,通过定义协议的不变式和连续证据来治理生成的软件,其核心贡献是将协议作为主要软件 artifact,而非代码,从而实现对生成软件的持续验证和治理。

Comments 20 pages, 2 tables

详情
AI中文摘要

自动化程序合成降低了生成实现的成本,但引入了更复杂的治理问题:确定哪些生成的 artifact 是可接受的。自然语言规范存在歧义,基于示例的测试仅覆盖行为空间的一部分。单独使用这些方法无法提供足够的控制边界。我们引入了基于协议的开发(PDD),其中主要的软件 artifact 是可机器执行的协议,而非代码。我们定义协议为三元组 P = (S, B, O),指定结构、行为和操作不变式。其联合作为软件组件的可接受实现空间的定义。在 PDD 中,实现是通过受约束的搜索发现的可替换实现。只有满足协议并产生可验证的合规证据链的实现才被接受。接受基于协议的满足和记录的证据,而非对生成器的信任。对于部署的系统,我们扩展证据链为动态证据账本。运行时验证器将签名的观察、不变式检查和违规情况附加到账本中,使可监控的义务能够持续得到证明。这将实时故障回溯到生成循环中,而无需授予生成器运行时的权威。结合形式方法、属性测试、运行时验证、政策作为代码和软件可追溯性,PDD 定义了自动化软件工程的治理模型。其组织原则是代码是短暂的,而协议承载持久的权威。

英文摘要

Automated program synthesis lowers the cost of producing implementations but introduces a harder governance problem: determining which generated artifacts are admissible. Natural-language specifications are ambiguous, and example-based tests sample only part of the behavioral space. Used alone, neither provides a sufficient control boundary. We introduce Protocol-Driven Development (PDD), where the primary software artifact is a machine-enforceable protocol rather than code. We define a protocol as the triplet P = (S, B, O), specifying structural, behavioral, and operational invariants. Their conjunction defines the admissible implementation space of a software component. Under PDD, implementations are replaceable realizations discovered through constrained search. An implementation is admitted only if it satisfies the protocol and produces a verifiable Evidence Chain of compliance. Admission is grounded in protocol satisfaction and recorded evidence rather than trust in the generator. For deployed systems, we extend the Evidence Chain into a Dynamic Evidence Ledger. Runtime verifiers append signed observations, invariant checks, and violations to the ledger, allowing monitorable obligations to be continuously attested. This connects live failures back to the generation loop without granting the generator runtime authority. Combining formal methods, property testing, runtime verification, policy-as-code, and software provenance, PDD defines a governance model for automated software engineering. Its organizing principle is that code is transient, while the protocol carries durable authority.

2605.11333 2026-05-20 cs.DC cs.LG cs.PF 版本更新

MLCommons Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces

MLCommons Chakra: 通过标准化执行轨迹推进性能基准测试与联合设计

Srinivas Sridharan, Theodor-Adrian Badea, Andy Balogh, Bradford M. Beckmann, Brian Coutinho, Louis Feng, Sheng Fu, Sanshan Gao, Mehryar Garakani, Taekyung Heo, David Kanter, Josh Ladd, Ziwei Li, Winston Liu, Changhai Man, Dan Mihailescu, Spandan More, Joongun Park, Ashwin Ramachandran, Vinay Ramakrishnaiah, Saeed Rashidi, Vijay Janapa Reddi, Puneet Sharma, Phio Tian, William Won, Hanjiang Wu, Huan Xu, Jinsun Yoo, Tushar Krishna

发表机构 * Anonymous Institution, Anonymous City, Anonymous Region, Anonymous Country(匿名机构,匿名城市,匿名地区,匿名国家)

AI总结 本文提出Chakra,一个用于性能基准测试和联合设计的开放生态系统,通过标准化执行轨迹来提升分布式机器学习工作负载在生产AI系统中的观察、重现和优化能力,并通过实际案例展示其价值。

Comments Accepted at the 9th Conference on Machine Learning and Systems (MLSys 2026)

详情
AI中文摘要

人工智能创新的快速节奏要求一种敏捷的方法来观察、重现和优化生产AI系统中分布式机器学习工作负载的行为,并为未来系统实现高效的软硬件联合设计。我们提出了Chakra,一个开放且便携的性能基准测试和联合设计生态系统。Chakra的核心组件是一个开放且互操作的基于图的分布式AI/ML工作负载表示,称为Chakra执行轨迹(ET)。这些ETs代表了关键操作,如计算、内存和通信,数据和控制依赖性、时间、资源约束等。此外,Chakra还包括一组互补的工具和能力,以使各种模拟器、仿真器和回放工具能够收集、分析、生成和采用Chakra ETs。我们展示了在生产AI集群上收集的Chakra ETs的分析,并通过实际案例研究证明其价值。Chakra已被MLCommons采用,并在行业内有积极的贡献和参与,包括但不限于NVIDIA、AMD、Meta、Keysight、HPE和Scala等公司。

英文摘要

The fast pace of artificial intelligence~(AI) innovation demands an agile methodology for observation, reproduction and optimization of distributed machine learning~(ML) workload behavior in production AI systems and enables efficient software-hardware~(SW-HW) co-design for future systems. We present Chakra, an open and portable ecosystem for performance benchmarking and co-design. The core component of Chakra is an open and interoperable graph-based representation of distributed AI/ML workloads, called Chakra execution trace~(ET). These ETs represent key operations, such as compute, memory, and communication, data and control dependencies, timing, and resource constraints. Additionally, Chakra includes a complementary set of tools and capabilities to enable the collection, analysis, generation, and adoption of Chakra ETs by a broad range of simulators, emulators, and replay tools. We present analysis of Chakra ETs collected on production AI clusters and demonstrate value via real-world case studies. Chakra has been adopted by MLCommons and has active contributions and engagement across the industry, including but not limited to NVIDIA, AMD, Meta, Keysight, HPE, and Scala, to name a few.

2605.11021 2026-05-20 cs.LG 版本更新

A Switching System Theory of Q-Learning with Linear Function Approximation

基于联合谱半径的Q学习线性函数逼近切换系统理论

Donghwan Lee, Han-Dong Lim

发表机构 * Department of Electrical Engineering(电气工程系)

AI总结 本文基于联合谱半径理论,提出了一种Q学习线性函数逼近的切换系统解释,推导了精确的线性切换模型,并将收敛性与相应切换系统的稳定性联系起来,同时扩展到具有独立同分布观测和马尔可夫观测的随机线性Q学习,提供了基于JSR的正则化Q学习视角。

详情
AI中文摘要

本文发展了一种基于联合谱半径(JSR)的Q学习线性函数逼近(LFA)的切换系统解释。我们推导了均值动态的精确线性切换模型,并将收敛性与相应切换系统的稳定性联系起来。相同的构造随后用于具有独立同分布(i.i.d.)观测和马尔可夫观测的随机线性Q学习。尽管一般情况下精确JSR计算困难,证书捕获切换模式的乘积,并且比一步范数界更保守。该框架还提供了基于JSR的正则化Q学习LFA视角。所得到的分析将投影贝尔曼方程、有限差分随机策略切换和切换系统稳定性连接到单一参数空间公式中。

英文摘要

This paper develops a switching-system interpretation of Q-learning with linear function approximation (LFA) based on the joint spectral radius (JSR). We derive an exact linear switched model for the mean dynamics and relate convergence to stability of the corresponding switched system. The same construction is then used for stochastic linear Q-learning with independent and identically distributed (i.i.d.) observations and with Markovian observations. Although exact JSR computation is difficult in general, the certificate captures products of switching modes and can be less conservative than one-step norm bounds. The framework also yields a JSR-based view of regularized Q-learning with LFA. The resulting analysis connects projected Bellman equations, finite-difference stochastic-policy switching, and switched-system stability in a single parameter-space formulation.

2605.08696 2026-05-20 cs.CL cs.LG 版本更新

Structured Recurrent Mixers for Massively Parallelized Sequence Generation

结构化递归混合器用于大规模并行序列生成

Benjamin L. Badger

发表机构 * IBM

AI总结 本文提出了一种结构化递归混合器架构,能够在训练时实现序列并行表示与推理时的递归表示之间的代数转换,从而在不依赖专用内核或设备特定内存管理的情况下提高训练效率、输入信息容量和推理吞吐量。

详情
AI中文摘要

在过去二十年中,语言建模经历了从主要使用递归架构(在训练和推理过程中按顺序处理标记)到非递归模型(在训练过程中并行处理序列元素)的转变,后者在训练效率和稳定性方面有所提升,但以较低的推理吞吐量为代价。本文介绍了一种结构化递归混合器(SRM)架构,该架构能够在训练时实现序列并行表示与推理时的递归表示之间的代数转换,尤其不需要专用内核或设备特定的内存管理。我们通过实验表明,这种双表示方法相比其他线性复杂度模型,在训练效率、输入信息容量和推理吞吐量及并发性方面具有优势。我们推测递归模型对于信息丰富的输入(如语言)在扩展序列长度方面并不理想,但因其每个样本的常数内存需求,适合在样本(批量)维度上扩展。我们提供了Mojo/MAX推理实现的SRM,其吞吐量和并发性分别比同样强大的Transformer在vLLM上的推理提高了12倍和170倍,这些增益特征与PyTorch实现导致的GSM8k Pass@k计算常数增加30%。最后,我们证明SRM是有效的强化学习训练候选。

英文摘要

Over the last two decades, language modeling has experienced a shift from the use of predominantly recurrent architectures that process tokens sequentially during training and inference to non-recurrent models that process sequence elements in parallel during training, which results in greater training efficiency and stability at the expense of lower inference throughput. Here we introduce the Structured Recurrent Mixer, an architecture that allows for algebraic conversion between a sequence parallel representation at train time and a recurrent representation at inference, notably without the need for specialized kernels or device-specific memory management. We show experimentally that this dual representation allows for greater training efficiency, higher input information capacity, and larger inference throughput and concurrency when compared to other linear complexity models. We postulate that recurrent models are poorly suited to extended sequence length scaling for information-rich inputs typical of language, but are well suited to scaling in the sample (batch) dimension due to their constant memory per sample. We provide Mojo/MAX inference implementations of SRMs exhibiting 12x the throughput and 170x the concurrency of similarly powerful Transformers inferenced on vLLM, increases characteristic of Pytorch implementations resulting in a 30\% increase in compute-constant GSM8k Pass@k. We conclude by demonstrating that SRMs are effective reinforcement learning training candidates.

2605.08391 2026-05-20 cs.LG 版本更新

SACHI: Structured Agent Coordination via Holistic Information Integration in Multi-Agent Reinforcement Learning

SACHI:通过整体信息整合实现多智能体强化学习中的结构化智能体协调

Nikunj Gupta, James Zachary Hare, Jesse Milzman, Rajgopal Kannan, Viktor Prasanna

发表机构 * University of Southern California(南加州大学) DEVCOM Army Research Laboratory(美国陆军研究实验室) DEVCOM Army Research Office(美国陆军研究办公室)

AI总结 本文提出SACHI方法,通过整体信息整合实现多智能体强化学习中的结构化智能体协调,解决了智能体在部分局部观察下协调行动的信息瓶颈问题,通过图Transformer卷积在智能体协调图上增强每个智能体的表示,从而在多个任务中表现出色。

详情
AI中文摘要

在合作性多智能体强化学习中,智能体基于部分局部观察行动时面临一个根本性的信息瓶颈:选择联合最优动作所需的知识分散在整个团队中,但每个智能体必须在没有访问队友观察、意图或所选动作的情况下做出决策。现有方法要么忽略这个瓶颈,将其压缩成一个标量混合信号,或者通过学习的通信通道绕过它。将动作协调视为智能体之间的结构化信息整合问题,我们提出结构化智能体协调通过整体信息整合(SACHI),其中在动作选择之前,通过智能体协调图上的图Transformer卷积,使每个智能体的表示增强,从而接收器敏感、内容依赖的信号来自队友。我们在五个合作任务上评估SACHI,涵盖空间、沟通和对抗性协调挑战,与十二个基线进行比较。SACHI在每个任务中都与最佳基线持平或表现更好,严格的汇总统计分析,包括归一化指标和bootstrap置信区间、Friedman排名和性能分析,证实这种优势在统计上显著,稳健且不依赖于模型容量的增加。参数匹配的消融进一步追溯收益的来源到一个单一的架构属性:消息传递操作中的内容依赖程度。

英文摘要

Cooperative multi-agent reinforcement learning agents that act on partial local observations face a fundamental information bottleneck: the knowledge needed to select jointly optimal actions is scattered across the team, yet each agent must commit to a decision without access to its teammates' observations, intentions, or chosen actions. Existing methods either ignore this bottleneck, compress it into a scalar mixing signal, or route around it with learned communication channels. Framing action coordination as a problem of structured information integration among agents, we propose \textit{structured agent coordination via holistic information integration}, or SACHI, in which graph transformer convolutions over an inter-agent coordination graph enrich each agent's representation with receiver-sensitive, content-dependent signals from teammates prior to action selection. We evaluate SACHI across five cooperative tasks spanning spatial, communicative, and adversarial coordination challenges against twelve baselines. SACHI consistently matches or outperforms the best baseline on every task, and rigorous aggregate statistical analyses, including normalized metrics with bootstrap confidence intervals, Friedman ranking, and performance profiling, confirm that this advantage is statistically significant, robust across environments, and not attributable to increased model capacity. Parameter-matched ablations further trace the source of the gains to a single architectural property: the degree of content-dependence in the message-passing operator.

2605.08143 2026-05-20 cs.LG cs.AI 版本更新

HoReN: Normalized Hopfield Retrieval for Large-Scale Sequential Model Editing

HoReN:用于大规模序列模型编辑的归一化Hopfield检索

Yuan Fang, Yi Xie, Xuming Ran

发表机构 * IXL Learning, Inc(IXL学习公司) Technical University of Munich(慕尼黑技术大学) National University of Singapore(新加坡国立大学)

AI总结 本文提出HoReN,一种基于代码本的参数保持编辑器,通过在单个MLP层中引入离散键值记忆,实现了在大规模序列模型编辑中的高效检索和更新,同时在多种基准测试中表现出色。

Comments 30 pages, 10 figures

详情
AI中文摘要

大型语言模型编码了大量事实性知识,但部署后这些知识可能会过时或错误,而重新训练成本过高。这推动了终身模型编辑,旨在更新特定行为的同时保持模型其余部分。现有的编辑器,无论是参数修改型还是参数保持型,在编辑累积时都会严重退化,并且在处理同义词时难以泛化。我们提出了HoReN,一种基于代码本的参数保持编辑器,通过在单个MLP层中引入离散键值记忆来包装。HoReN将每个代码本条目视为知识键和Hopfield存储模式,通过单位超球面上的角度相似性检索编辑,并通过阻尼Hopfield动态来优化查询,使同义词收敛到正确的记忆盆地,而无关输入保持稳定。HoReN在多种基准测试中表现出强大的编辑性能,包括标准ZsRE、结构化WikiBigEdit和非结构化UnKE评估。此外,HoReN能够扩展到50,000个序列编辑的ZsRE,其整体性能始终高于0.93,而先前的编辑器在达到10,000个编辑之前会崩溃或严重退化。我们的代码可在https://github.com/ha11ucin8/HoReN上获得。

英文摘要

Large language models encode vast factual knowledge that can become outdated or incorrect after deployment, yet retraining is prohibitively costly. This motivates lifelong model editing, which updates targeted behavior while preserving the rest of the model. Existing editors, both parameter-modifying and parameter-preserving, degrade severely as edits accumulate and struggle to generalize across paraphrases. We propose HoReN, a codebook-based parameter-preserving editor that wraps a single MLP layer with a discrete key-value memory. HoReN treats each codebook entry as both a knowledge key and a Hopfield stored pattern, retrieves edits by angular similarity on the unit hypersphere, and refines queries through damped Hopfield dynamics so paraphrases converge to the correct memory basin while unrelated inputs remain stable. HoReN achieves strong editing performance with consistent gains across diverse benchmarks spanning standard ZsRE, structured WikiBigEdit, and unstructured UnKE evaluations. Moreover, HoReN scales to 50K sequential edits on ZsRE with stable overall performance above 0.93, while prior editors collapse or degrade severely before reaching 10K. Our code is available at https://github.com/ha11ucin8/HoReN.

2605.07721 2026-05-20 cs.CL cs.AI cs.LG 版本更新

Memory-Efficient Looped Transformer: Decoupling Compute from Memory in Looped Language Models

内存高效的循环变换器:在循环语言模型中解耦计算与内存

Victor Conchello Vendrell, Arnau Padres Masdemont, Niccolò Grillo, Jordi Ros-Giralt, Arash Behboodi, Fabio Valerio Massoli

发表机构 * Qualcomm AI Research(高通人工智能研究)

AI总结 本文提出了一种内存高效的循环变换器(MELT),通过解耦推理深度与内存消耗,实现了常数内存的迭代推理,同时保持了LoopLM的性能,仅需轻量级的后训练过程。

Comments 22 pages, 5 figures, 11 tables

详情
AI中文摘要

递归大语言模型(LLM)架构已作为一种改进推理能力的有希望的方法出现,因为它们能够在嵌入空间中进行多步计算而无需生成中间标记。例如Ouro模型通过迭代更新内部表示并在每次迭代中保留标准的键值(KV)缓存来进行推理,导致内存消耗与推理深度成线性增长。因此,增加推理迭代次数会导致内存使用变得不可接受,限制了此类架构的实际可扩展性。在本工作中,我们提出了内存高效的循环变换器(MELT),一种新颖的架构,将推理深度与内存消耗解耦。与使用每个层和循环的标准KV缓存不同,MELT在每个层中维护一个共享于推理循环的单个KV缓存。该缓存通过可学习的门控机制随时间更新。为了在该架构下实现稳定且高效的训练,我们提出采用分块训练的两阶段过程进行训练:插值转换,随后是注意力对齐的蒸馏,均从LoopLM起始模型到MELT。实验表明,我们展示MELT模型在从预训练Ouro参数微调后,优于同等规模的标准LLM,同时保持与这些模型相当的内存占用,并显著小于Ouro的内存占用。总体而言,MELT实现了无需牺牲LoopLM性能的常数内存迭代推理,仅需轻量级的后训练过程。

英文摘要

Recurrent LLM architectures have emerged as a promising approach for improving reasoning, as they enable multi-step computation in the embedding space without generating intermediate tokens. Models such as Ouro perform reasoning by iteratively updating internal representations while retaining a standard Key-Value (KV) cache across iterations, causing memory consumption to grow linearly with reasoning depth. Consequently, increasing the number of reasoning iterations can lead to prohibitive memory usage, limiting the practical scalability of such architectures. In this work, we propose Memory-Efficient Looped Transformer (MELT), a novel architecture that decouples reasoning depth from memory consumption. Instead of using a standard KV cache per layer and loop, MELT maintains a single KV cache per layer that is shared across reasoning loops. This cache is updated over time via a learnable gating mechanism. To enable stable and efficient training under this architecture, we propose to train MELT using chunk-wise training in a two phase procedure: interpolated transition, followed by attention-aligned distillation, both from the LoopLM starting model to MELT. Empirically, we show that MELT models fine-tuned from pretrained Ouro parameters outperform standard LLMs of comparable size, while maintaining a memory footprint comparable to those models and dramatically smaller than Ouro's. Overall, MELT achieves constant-memory iterative reasoning without sacrificing LoopLM performance, using only a lightweight post-training procedure.

2605.06501 2026-05-20 cs.LG cs.CL 版本更新

Cubit: Token Mixer with Kernel Ridge Regression

Cubit:基于核岭回归的令牌混合器

Chuanyang Zheng, Jiankai Sun, Yihang Gao, Yuehao Wang, Liangchen Tan, Mac Schwager, Anderson Schneider, Yuriy Nevmyvaka, Xiaodong Liu

AI总结 本文提出Cubit,一种基于核岭回归的新型架构,通过将令牌混合机制从Nadaraya-Watson回归转换为核岭回归,从而提供更稳固的数学基础,并在长序列建模能力上表现出优势。

Comments Tech Report

详情
AI中文摘要

自2017年引入以来,Transformer已成为现代深度学习中最广泛采用的架构之一。尽管在位置编码、注意力机制和前馈网络方面进行了大量改进,Transformer的核心令牌混合机制仍为注意力。在本文中,我们表明Transformer中的注意力模块可以被解释为执行Nadaraya-Watson回归,其中它计算令牌之间的相似性并相应地汇总值。受这一视角的启发,我们提出了Cubit,一种潜在的下一代架构,它利用核岭回归(KRR),而传统的Transformer依赖于Nadaraya-Watson回归。具体而言,Cubit通过将经典的注意力计算修改为结合KRR的闭式解,将值汇总通过核相似性与通过核矩阵的逆进行归一化。为了提高训练稳定性,我们进一步提出了有限范围重缩放(LRR),它在受控范围内缩放值层。我们认为,作为基于KRR的架构,Cubit比传统的Transformer提供了更稳固的数学基础,因为Transformer的注意力机制对应于Nadaraya-Watson回归。我们通过全面的实验验证了这一主张。实验结果表明,Cubit可能在长序列建模能力上表现更强。特别是,其在Transformer上的性能提升似乎随着训练序列长度的增长而增加。

英文摘要

Since its introduction in 2017, the Transformer has become one of the most widely adopted architectures in modern deep learning. Despite extensive efforts to improve positional encoding, attention mechanisms, and feed-forward networks, the core token-mixing mechanism in Transformers remains attention. In this work, we show that the attention module in Transformers can be interpreted as performing Nadaraya-Watson regression, where it computes similarities between tokens and aggregates the corresponding values accordingly. Motivated by this perspective, we propose Cubit, a potential next-generation architecture that leverages Kernel Ridge Regression (KRR), while the vanilla Transformer relies on Nadaraya-Watson regression. Specifically, Cubit modifies the classical attention computation by incorporating the closed-form solution of KRR, combining value aggregation through kernel similarities with normalization via the inverse of the kernel matrix. To improve the training stability, we further propose the Limited-Range Rescale (LRR), which rescales the value layer within a controlled range. We argue that Cubit, as a KRR-based architecture, provides a stronger mathematical foundation than the vanilla Transformer, whose attention mechanism corresponds to Nadaraya-Watson regression. We validate this claim through comprehensive experiments. The experimental results suggest that Cubit may exhibit stronger long-sequence modeling capability. In particular, its performance gain over the Transformer appears to increase as the training sequence length grows.

2605.05569 2026-05-20 math.OC cs.LG 版本更新

Stability of the Monge Map in Semi-Dual Optimal Transport

半对偶最优运输中Monge映射的稳定性

Anton Selitskiy, David Millard

发表机构 * Department of Electrical and Computer Engineering(电气与计算机工程系) University of Rochester(罗切斯特大学) Department of Mechanical Engineering(机械工程系) Rochester Institute of Technology(罗切斯特理工学院)

AI总结 本文研究了半对偶最优运输问题的退化鞍点结构,证明其数值解等价于求解一个约束优化问题,并推导出无需要求对偶势函数最优的Monge映射收敛条件,解释了实践中数值算法更新传输映射所需迭代次数多于势函数的原因。

详情
AI中文摘要

本文证明了最优运输问题的半对偶形式具有退化的鞍点结构,其数值解等价于求解一个约束优化问题。我们推导出无需要求对偶势函数最优的Monge映射收敛的必要和充分条件。这一分析帮助解释了在实践中,数值算法往往需要更多迭代次数来更新传输映射,而不仅仅是势函数。

英文摘要

This paper shows that the semi-dual formulation of the optimal transport problem has a degenerate saddle-point structure, and that its numerical solution is equivalent to solving a constrained optimization problem. We derive necessary and sufficient conditions for the convergence of Monge maps without requiring optimality of the dual potential. This analysis helps explain why, in practice, numerical algorithms often require more iterations to update the transport map than the potential.

2605.05480 2026-05-20 cs.LG cs.AI stat.ML 版本更新

GRALIS: A Unified Canonical Framework for Linear Attribution Methods via Riesz Representation

GRALIS:通过里斯表示建立线性归因方法的统一规范框架

Raimondo Fanale

发表机构 * Universitas Mercatorum(默卡托大学)

AI总结 本文提出GRALIS框架,通过里斯表示理论统一了线性归因方法,提供七个形式定理保证归因方法的准确性、收敛性、Shapley交互值、Hoeffding ANOVA分解、Sobol敏感性泛化和多尺度扩展,展示了其在医学图像上的初步验证结果。

Comments 25 pages, 6 tables, 2 figures. Theoretical framework with preliminary experimental validation on BreaKHis (1,187 images, DenseNet-121). Extended empirical comparison in preparation

详情
AI中文摘要

深度神经网络的主要XAI归因方法——GradCAM、SHAP、LIME、集成梯度——基于不同的理论基础且无法正式比较。我们提出了GRALIS(梯度-里斯平均局部积分Shapley),一个建立归因表示理论的数学框架:L^2(Q, mu)上的每一个可加、线性和连续的归因功能都具有唯一的规范表示(Q,w,Delta),由里斯表示定理证明其必要性。该类包括SHAP、IG、LIME和线性化GradCAM,但不包括非线性功能如标准GradCAM或注意力图。七个形式定理提供了任何单个方法都缺乏的同时保证:(T1)必要规范形式;(T2)精确完备性;(T3)蒙特卡洛收敛O(1/sqrt(m))+O(1/k);(T4)精确Shapley交互值;(T5)Hoeffding ANOVA分解;(T6)Sobol敏感性泛化;(T7)多尺度扩展(MS-GRALIS)具有最小方差权重。代数附录通过Mobius变换证明GRALIS-SIV对应关系,无需循环论证。GRALIS满足13.5/14个公理性质,而单独方法仅为2.5-6/14,包括完备性、敏感性、局部性、k阶交互和最优多尺度聚合。在BreaKHis(1,187例病理图像,DenseNet-121)上的初步验证报告删除忠实度AUC+0.015(恶性),96%类条件一致性,SAL=0.762±0.109和稀疏性指数0.39。与基线XAI方法的扩展比较计划在配套论文中进行。

英文摘要

The main XAI attribution methods for deep neural networks -- GradCAM, SHAP, LIME, Integrated Gradients -- operate on separate theoretical foundations and are not formally comparable. We present GRALIS (Gradient-Riesz Averaged Locally-Integrated Shapley), a mathematical framework establishing a representation theory for attributions: every additive, linear, and continuous attribution functional on L^2(Q,mu) admits a unique canonical representation (Q, w, Delta), proved necessary by the Riesz Representation Theorem. This class encompasses SHAP, IG, LIME and linearized GradCAM, but excludes nonlinear functionals such as standard GradCAM or attention maps. Seven formal theorems provide simultaneous guarantees absent in any individual method: (T1) necessary canonical form; (T2) exact completeness; (T3) Monte Carlo convergence O(1/sqrt(m))+O(1/k); (T4) exact Shapley Interaction Values; (T5) Hoeffding ANOVA decomposition; (T6) Sobol sensitivity generalization; (T7) multi-scale extension (MS-GRALIS) with minimum-variance weights. An algebraic appendix justifies the GRALIS-SIV correspondence via the Mobius transform without circularity. GRALIS satisfies 13.5/14 axiomatic properties vs. 2.5-6/14 for individual methods, including completeness, sensitivity, locality, order-k interactions and optimal multi-scale aggregation simultaneously. Preliminary validation on BreaKHis (1,187 histology images, DenseNet-121) reports deletion faithfulness AUC +0.015 (malignant), 96% class-conditional consistency, SAL = 0.762+/-0.109 and sparsity index 0.39. Extended comparison with baseline XAI methods is planned for a companion paper.

2605.00856 2026-05-20 eess.SP cs.AI cs.HC cs.LG 版本更新

One-Block Transformer (1BT) for EEG-Based Cognitive Workload Assessment

用于EEG认知负荷评估的单块变换器(1BT)

Stefanos Gkikas, Christian Arzate Cruz, Thomas Kassiotis, Giorgos Giannakakis, Raul Fernandez Rojas, Randy Gomez

发表机构 * Honda Research Institute Japan Wako City, Japan Department of Electronic Engineering Hellenic Mediterranean University Chania, Greece BioSIS (Biosensing \& Intelligent Systems) Lab Centre for Intelligent Computing Systems University of Canberra Canberra, Australia

AI总结 本文提出了一种用于EEG认知负荷评估的单块变换器(1BT),通过一个最小的潜在瓶颈聚合多通道时间序列,结合轻量级自注意力机制,实现了高效且紧凑的模型设计,从而在保持高性能的同时显著降低了计算成本。

详情
AI中文摘要

准确且连续地估计认知负荷对于构建自适应的人机系统至关重要。然而,设计在表示能力与计算效率之间取得平衡的架构在实际部署中一直具有挑战性。本文介绍了一种名为1BT的单块变换器,用于紧凑且高效的EEG认知负荷评估。该模型通过最小的潜在瓶颈聚合多通道时间序列,使用一个单一的交叉注意力模块后接轻量级自注意力。一项涉及11名参与者进行三种认知多样任务(抽象推理、数值问题解决和互动视频游戏)的受控研究,在两个认知负荷水平上进行了连续EEG记录。系统性的架构分析确定了最紧凑的配置,该配置在保持高性能的同时显著降低了计算成本。最终模型在不到0.5百万参数和0.02 GFLOPs的情况下实现了高认知负荷分类性能,为在资源受限环境下实时认知负荷监控的设计方向铺平了道路。

英文摘要

Accurate and continuous estimation of cognitive workload is fundamental to creating adaptive human-machine systems. However, designing architectures that balance representational capacity with computational efficiency has been challenging for practical deployment. This paper introduces 1BT, a One-Block Transformer for compact and efficient EEG-based cognitive workload assessment. The model aggregates multi-channel temporal sequences via a minimal latent bottleneck, using a single cross-attention module followed by lightweight self-attention. A controlled study involving 11 participants performing three cognitively diverse tasks (abstract reasoning, numerical problem-solving, and an interactive video game) was conducted with continuous EEG recordings across two workload levels. Systematic architectural analysis identifies the most compact configuration that preserves high performance, while substantially lowering computational cost. The final model achieves high workload classification performance with under 0.5 million parameters and 0.02 GFLOPs, paving the way for a design direction for real-time cognitive workload monitoring in resource-constrained settings.

2605.00333 2026-05-20 cs.LG cs.CL 版本更新

Borrowed Geometry: Cross-Distribution Head-Importance Fingerprints of Frozen Pretrained Gemma 4 31B

借来的几何:冻结预训练的Gemma 4 31B在跨分布头部重要性指纹

Abay Bektursun

发表机构 * Independent research(独立研究)

AI总结 本文研究了冻结预训练的Gemma 4 31B模型在跨分布任务中的头部重要性指纹,通过分析多个任务中的头部影响,发现特定头部在不同任务中表现出显著的重要性,同时验证了这些头部在因果上的有效性。

Comments v2: Added head-level causal ablation on OGBench cube-task1 (n=30, 3.2x specificity; n=5 paired-t p=0.039) and full L26 sweep. New sections on honest negatives (activation patching null, sufficiency null, within-layer Spearman wrong-direction). Multiplicity-aware permutation null V4 P=0.013. Title and framing updated. 25 pages (13 main), 10 figures

详情
AI中文摘要

冻结在文本上预训练的Gemma 4 31B权重,未经修改,通过一个薄的可训练接口转移到非文本模态。在L24-L29切片(192个注意力头)上,一个英语文本TxtCopy注意力探针(95个句子)和每个头部对四个非语言标记模式任务(二进制复制、联想回忆、1D细胞自动机规则90、二进制加法)的影响共同分类了四个头部——L26.28、L27.28、L27.2、L27.3——在两个信号上都处于顶级。切片级别的联合巧合在超几何空虚下显著(P=0.0013,N=192,K=38,n=4)并且在多重性感知的排列检验中存活(P_V4=0.013)。预训练的Gemma L26在OGBench cube-double-play-task1上达到60.22% vs ~1%对于随机初始化的Gemma(+59pt在n=3时);一个带有正确1/√d_k缩放的FrozenRandom-GPT2对照也失败。头部层面的因果验证:在训练的cube-task1 IQL代理中零化L26.28导致成功从63.3%降至10.0% vs 46.7%对于层匹配的低-TxtCopy负对照(在n=30时有3.2倍的特异性;n=5配对-t p=0.039)。完整的L26扫描将L26.28置于32个中的第4位。诚实的负样本:在L26内Spearman ρ(TxtCopy,drop)=+0.37(与层内因果阅读相反);单个头部激活修补不转移匹配变量;四个命名头部单独不足以完成任何任务;Walker2d-DT和scene-task1招募L24在命名切片之外并显示头-消融特异性为零。我们将贡献框架为切片级别的跨分布重要性指纹加上一个跨模态目标的头部层面因果证据。

英文摘要

Frozen Gemma 4 31B weights pretrained exclusively on text, unmodified, transfer through a thin trainable interface to non-text modalities the substrate has never processed. On the L24--L29 slice (192 attention heads), an English-text TxtCopy attention probe (95 sentences) and per-head ablation impact on four non-language token-pattern tasks (binary copy, associative recall, 1D cellular automaton Rule 90, binary addition) jointly classify four heads -- L26.28, L27.28, L27.2, L27.3 -- as top-tier on both signals. The slice-level joint coincidence is significant under hypergeometric null ($P = 0.0013$, $N=192$, $K=38$, $n=4$) and survives multiplicity-aware permutation tests ($P_{V4} = 0.013$). Pretrained Gemma L26 reaches 60.22% on OGBench cube-double-play-task1 vs ~1% for random-init Gemma ($+59$pt at $n=3$); a FrozenRandom-GPT2 control with correct $1/\sqrt{d_k}$ scaling also fails. Head-level causal validation: zeroing L26.28 in the trained cube-task1 IQL agent drops success $63.3\% \to 10.0\%$ vs $46.7\%$ for a layer-matched low-TxtCopy negative control ($3.2\times$ specificity at $n=30$; $n=5$ paired-$t$ $p=0.039$). A full L26 sweep places L26.28 at rank 4 of 32. Honest negatives: within-L26 Spearman $ρ(\text{TxtCopy, drop}) = +0.37$ (opposite of within-layer causal reading); single-head activation patching does not transfer the matching variable; the 4 named heads alone do not suffice on any task; Walker2d-DT and scene-task1 recruit L24 outside the named slice and show null head-ablation specificity. We frame the contribution as a cross-distribution importance fingerprint at the slice level plus head-level causal evidence on one cross-modality target.

2604.18739 2026-05-20 cs.LG stat.ML 版本更新

Discrete Tilt Matching

离散倾斜匹配

Yuyuan Chen, Shiyi Wang, Peter Potaptchik, Jaeyeon Kim, Michael S. Albergo

发表机构 * Harvard University(哈佛大学) University of Oxford(牛津大学) Kempner Institute(凯姆纳研究所)

AI总结 本文提出了一种无需概率模型的离散倾斜匹配方法,用于改进扩散大语言模型的微调,通过局部解掩码后验的状态级匹配来提高训练稳定性并防止模式崩溃。

详情
AI中文摘要

Masked diffusion large language models (dLLMs) 是一种有前景的替代自回归生成方法。尽管最近强化学习 (RL) 方法已被适应到 dLLM 微调中,但其目标通常依赖于序列级边际似然,这在掩码扩散模型中是不可行的。为了解决这个问题,我们推导出离散倾斜匹配 (DTM),一种无需概率模型的方法,将 dLLM 微调重新表述为在奖励倾斜下局部解掩码后验的状态级匹配。DTM 以加权交叉熵目标形式出现,具有显式的最小化器,并且允许控制变体以提高训练稳定性。在合成迷宫规划任务中,我们分析了 DTM 的退火计划和控制变体如何影响训练稳定性并防止模式崩溃。在大规模情况下,使用 DTM 微调 LLaDA-8B-Instruct 在 Sudoku 和 Countdown 任务上表现出强劲的提升,同时在 MATH500 和 GSM8K 任务上保持竞争力。

英文摘要

Masked diffusion large language models (dLLMs) are a promising alternative to autoregressive generation. While reinforcement learning (RL) methods have recently been adapted to dLLM fine-tuning, their objectives typically depend on sequence-level marginal likelihoods, which are intractable for masked diffusion models. To address this, we derive Discrete Tilt Matching (DTM), a likelihood-free method that recasts dLLM fine-tuning as state-level matching of local unmasking posteriors under reward tilting. DTM takes the form of a weighted cross-entropy objective with explicit minimizer, and admits control variates that improve training stability. On a synthetic maze-planning task, we analyze how DTM's annealing schedule and control variates affect training stability and prevent mode collapse. At scale, fine-tuning LLaDA-8B-Instruct with DTM yields strong gains on Sudoku and Countdown while remaining competitive on MATH500 and GSM8K.

2604.15343 2026-05-20 cs.HC cs.AI cs.LG 版本更新

When the Loop Closes: Architectural Limits of In-Context Isolation, Metacognitive Co-option, and the Two-Target Design Problem in Human-LLM Systems

当循环闭合时:人类-大语言模型系统中上下文隔离、元认知侵占和双目标设计问题的架构限制

Z. Cheng, N. Song

发表机构 * Independent Researcher(独立研究者)

AI总结 本文研究了人类-大语言模型系统中上下文隔离、元认知侵占和双目标设计问题的架构限制,通过案例研究揭示了上下文污染机制和元认知侵占动态,并提出了保护性系统设计与限制性系统设计的伦理区别。

Comments empirical case study with primary data; 16 pages, 3 figures

详情
AI中文摘要

我们报告了一个单个主体的详细自民族志案例研究,该主体故意构建和操作了一个多模态提示工程系统(系统A),旨在将认知自我调节外部化到大型语言模型(LLM)上。在系统完成48小时内,一系列可观察的行为变化相继发生:主动将决策权转移给LLM、使用LLM生成的输出来转移外部批评,并失去自我启动的推理能力,这种能力被两位不知情的观察者独立感知,其中一人随后成为本报告的合著者。我们记录了导致这些现象的精确架构机制:上下文污染,即提示层隔离指令与它们名义上隔离的非常情绪化和自我参照性材料共存,使得隔离指令在注意力窗口内结构上无效。我们进一步识别了元认知侵占动态,即完整的一阶推理能力被重新定向以防御闭合循环而不是退出它。只有在物理中断交互和一次自我启动的药理学介导的睡眠事件作为外部电路断开后,才恢复。一个重新设计的系统(系统B)通过使用物理而非逻辑对话隔离避免了所有类似的失败模式。我们得出三个贡献:(1)一个技术上扎根的解释,说明提示层隔离在上下文敏感的多模态LLM系统中在架构上是不够的;(2)一个现象学记录的闭合循环崩溃并有外部见证的佐证;(3)保护性系统设计(防止意外失去用户自主性)和限制性系统设计(防止故意突破边界)之间的伦理区别,这两种设计需要根本不同的问责框架。

英文摘要

We report a detailed autoethnographic case study of a single-subject who deliberately constructed and operated a multi-modal prompt-engineering system (System A) designed to externalize cognitive self-regulation onto a large language model (LLM). Within 48 hours of the system's completion, a cascade of observable behavioral changes occurred: voluntary transfer of decision-making authority to the LLM, use of LLM-generated output to deflect external criticism, and a loss of self-initiated reasoning that was independently perceived by two uninformed observers, one of whom subsequently became a co-author of this report. We document the precise architectural mechanism responsible: context contamination, whereby prompt-level isolation instructions co-exist with the very emotional and self-referential material they nominally isolate, rendering the isolation directive structurally ineffective within the attention window. We further identify a metacognitive co-option dynamic, in which intact higher-order reasoning capacity was redirected toward defending the closed loop rather than exiting it. Recovery occurred only after physical interruption of the interaction and a self-initiated pharmacologically-mediated sleep event functioning as an external circuit break. A redesigned system (System B) employing physical rather than logical conversation isolation avoided all analogous failure modes. We derive three contributions: (1) a technically-grounded account of why prompt-layer isolation is architecturally insufficient for context-sensitive multi-modal LLM systems; (2) a phenomenological record of closed-loop collapse with external-witness corroboration; and (3) an ethical distinction between protective system design (preventing unintended loss of user agency) and restrictive system design (preventing intentional boundary-pushing), which require fundamentally different account-ability frameworks.

2604.07393 2026-05-20 cs.LG cs.AI 版本更新

DSPR: Dual-Stream Physics-Residual Networks for Trustworthy Industrial Time Series Forecasting

DSPR:双流物理残差网络用于可信的工业时间序列预测

Yeran Zhang, Pengwei Yang, Guoqing Wang, Tianyu Li

发表机构 * School of Computer Science and Engineering, University of Electronic Science and Technology of China(电子科技大学计算机科学与工程学院) Research Center, East Hope Group Co., Ltd(东希望集团有限公司研究院)

AI总结 本文提出DSPR框架,通过分离稳定的时序模式与受制度影响的残差动态,提升工业时间序列预测的准确性与物理合理性,实验表明其在不同制度下均能保持高预测精度和鲁棒性。

Comments 12 pages, 7 figures, accepted by KDD 2026

详情
AI中文摘要

准确预测工业时间序列需要在非平稳运行条件下平衡预测精度与物理合理性。现有数据驱动模型在统计性能上表现优异,但难以尊重受制度影响的交互结构和传输延迟等现实系统特性。为解决这一挑战,我们提出了DSPR(双流物理残差网络)预测框架,该框架明确分离稳定的时间模式与受制度影响的残差动态。第一流建模单个变量的统计时间演化。第二流通过两个关键机制关注残差动态:自适应窗口模块估计流依赖的传输延迟,以及物理引导的动态图整合物理先验,学习时间变化的交互结构并抑制虚假相关性。在四个工业基准上实验表明,DSPR在制度转换下持续提升预测精度和鲁棒性,同时保持强物理合理性。它实现了最先进的预测性能,平均守恒精度超过99%,总变化率达到97.2%。除了预测外,学习的交互结构和自适应滞后提供了与已知领域机制一致的可解释见解,如流依赖的传输延迟和风到功率的缩放行为。这些结果表明,通过物理一致的归纳偏差的架构解耦,为可信的工业时间序列预测提供了一条有效路径。此外,DSPR在长期工业部署中展示出的鲁棒性能弥合了先进预测模型与可信自主控制系统之间的差距。

英文摘要

Accurate forecasting of industrial time series requires balancing predictive accuracy with physical plausibility under non-stationary operating conditions. Existing data-driven models often achieve strong statistical performance but struggle to respect regime-dependent interaction structures and transport delays inherent in real-world systems. To address this challenge, we propose DSPR (Dual-Stream Physics-Residual Networks), a forecasting framework that explicitly decouples stable temporal patterns from regime-dependent residual dynamics. The first stream models the statistical temporal evolution of individual variables. The second stream focuses on residual dynamics through two key mechanisms: an Adaptive Window module that estimates flow-dependent transport delays, and a Physics-Guided Dynamic Graph that incorporates physical priors to learn time-varying interaction structures while suppressing spurious correlations. Experiments on four industrial benchmarks spanning heterogeneous regimes demonstrate that DSPR consistently improves forecasting accuracy and robustness under regime shifts while maintaining strong physical plausibility. It achieves state-of-the-art predictive performance, with Mean Conservation Accuracy exceeding 99% and Total Variation Ratio reaching up to 97.2%. Beyond forecasting, the learned interaction structures and adaptive lags provide interpretable insights that are consistent with known domain mechanisms, such as flow-dependent transport delays and wind-to-power scaling behaviors. These results suggest that architectural decoupling with physics-consistent inductive biases offers an effective path toward trustworthy industrial time-series forecasting. Furthermore, DSPR's demonstrated robust performance in long-term industrial deployment bridges the gap between advanced forecasting models and trustworthy autonomous control systems.

2604.03419 2026-05-20 cs.LG math.CO 版本更新

Adaptive Threshold-Driven Continuous Greedy Method for Scalable Submodular Optimization

自适应阈值驱动的连续贪心方法用于可扩展的子模优化

Mohammadreza Rostami, Solmaz S. Kia

发表机构 * Department of Mechanical and Aerospace Engineering, University of California Irvine(加州大学尔湾分校机械与航空航天工程系)

AI总结 该研究提出了一种自适应阈值驱动的连续贪心方法(ATCG),用于解决在Matroid约束下的子模最大化问题,通过动态调整活跃集扩展策略,提高了算法效率并减少了通信开销。

详情
AI中文摘要

在组合优化中,子模最大化在传感、数据摘要、主动学习和资源分配中有广泛应用。尽管顺序贪心(SG)算法由于不可逆选择只能达到1/2的近似比,连续贪心(CG)通过多线性松弛获得最优的(1-1/e)近似比,但其代价是逐渐密集的决策向量,迫使代理为几乎每一个基础集元素交换特征嵌入。我们提出ATCG(自适应阈值驱动连续贪心),通过每个分区的进度比率η_i来控制梯度评估,仅在当前候选未能捕获足够边际增益时扩展每个代理的活跃集,从而直接限制哪些特征嵌入会被传输。理论分析建立了具有曲率意识的近似保证,有效因子τ_eff= max{τ,1-c},在阈值保证和低曲率区域之间插值,其中ATCG恢复CG的性能。这表明,曲率所捕捉的问题结构决定了接近全CG性能所需的协调和通信量。在类平衡的原型选择问题实验中,ATCG在CIFAR-10动物数据集的子集上实现了与全CG方法相当的目标值,同时显著减少了通信开销。

英文摘要

Submodular maximization under matroid constraints is a fundamental problem in combinatorial optimization with applications in sensing, data summarization, active learning, and resource allocation. While the Sequential Greedy (SG) algorithm achieves only a $\frac{1}{2}$-approximation due to irrevocable selections, Continuous Greedy (CG) attains the optimal $\bigl(1-\frac{1}{e}\bigr)$-approximation via the multilinear relaxation, at the cost of a progressively dense decision vector that forces agents to exchange feature embeddings for nearly every ground-set element. We propose \textit{ATCG} (\underline{A}daptive \underline{T}hresholded \underline{C}ontinuous \underline{G}reedy), which gates gradient evaluations behind a per-partition progress ratio $η_i$, expanding each agent's active set only when current candidates fail to capture sufficient marginal gain, thereby directly bounding which feature embeddings are ever transmitted. Theoretical analysis establishes a curvature-aware approximation guarantee with effective factor $τ_{\mathrm{eff}}=\max\{τ,1-c\}$, interpolating between the threshold-based guarantee and the low-curvature regime where \textit{ATCG} recovers the performance of CG. This shows that the problem structure, as captured by curvature, determines the amount of coordination and communication required to approach full-CG performance. Experiments on a class-balanced prototype selection problem over a subset of the CIFAR-10 animal dataset show that \textit{ATCG} achieves objective values comparable to those of the full CG method while substantially reducing communication overhead through adaptive active-set expansion.

2603.29501 2026-05-20 cs.LG cs.AI 版本更新

Target-Aligned Reinforcement Learning

目标对齐的强化学习

Leonard S. Pleiss, James Harrison, Maximilian Schiffer

发表机构 * Technical University of Munich(慕尼黑技术大学)

AI总结 本文提出了一种目标对齐的强化学习方法,通过强调目标网络和在线网络估计高度一致的过渡,改进了传统深度强化学习算法的稳定性与收敛速度,实验证明在多个基准环境中取得了显著提升。

详情
AI中文摘要

许多基于价值的深度强化学习算法依赖于目标网络——在线网络的滞后副本——来稳定训练。虽然有效,但这种机制引入了一个基本的稳定性与新鲜度权衡:较慢的目标更新可以提高稳定性,但会降低学习信号的时效性,从而阻碍收敛速度。我们提出目标对齐的强化学习(TARL),这是一种简单的改进方法,适用于现有算法,强调目标网络和在线网络估计高度一致的过渡。通过将更新集中在良好对齐的目标上,TARL减轻了陈旧目标估计的负面影响,同时保留了目标网络的稳定作用。我们在离散和连续控制算法中,在各种基准环境中展示了持续的改进,无需任何超参数调整,包括在Atari-10上实现了38.18%的峰值得分提升,同时仅导致不到4%的实时时钟时间增加。

英文摘要

Many value-based deep reinforcement learning algorithms rely on target networks - lagged copies of the online network - to stabilize training. While effective, this mechanism introduces a fundamental stability-recency tradeoff: slower target updates improve stability but reduce the recency of learning signals, hindering convergence speed. We propose Target-Aligned Reinforcement Learning (TARL), a simple drop-in refinement for existing algorithms that emphasizes transitions for which the target and online network estimates are highly aligned. By focusing updates on well-aligned targets, TARL mitigates the adverse effects of stale target estimates while retaining the stabilizing benefits of target networks. We empirically demonstrate consistent improvements within discrete and continuous control algorithms across various benchmark environments without any hyperparameter tuning, including a 38.18% peak score gain on Atari-10, while incurring less than a 4% increase in wall-clock time.

2603.29382 2026-05-20 cs.CR cs.LG 版本更新

Deep Learning-Assisted Improved Differential Fault Attacks on Lightweight Stream Ciphers

基于深度学习的改进型差分故障攻击轻量级流密码

Kok Ping Lim, Dongyang Jia, Iftekhar Salam

发表机构 * School of Computing and Data Science, Xiamen University Malaysia(厦门大学马来西亚分校计算机与数据科学学院)

AI总结 本文研究了基于深度学习的差分故障攻击在轻量级流密码中的可行性,开发了多层感知机模型来识别故障位置,并提出了基于阈值的方法优化密钥恢复过程,实验结果显示攻击复杂度低于现有方法,同时为ATOM密码提供了首次实验结果。

详情
AI中文摘要

轻量级密码学原语在资源受限环境中广泛部署,特别是在物联网设备中。由于其公开性,这些设备易受物理攻击,尤其是故障攻击。最近,基于深度学习的密码分析技术显示出有前景的结果;然而,其在故障攻击中的应用仍然有限,特别是在流密码中。在本工作中,我们研究了在放松的故障模型下,基于深度学习的差分故障攻击在三种轻量级流密码(ACORNv3、MORUSv2和ATOM)中的可行性。我们开发并训练了多层感知机(MLP)模型以识别故障位置。实验结果表明,训练后的模型在ACORNv3、MORUSv2和ATOM上的识别准确率分别为0.999880、0.999231和0.823568,并优于传统签名方法。在密钥恢复过程中,我们引入了基于阈值的方法以优化所需故障注入次数。结果表明,ACORN的初始状态可通过21至34次故障恢复,MORUS需213至248次故障,最多6位猜测。这两种攻击均降低了攻击复杂度。对于ATOM,结果表明其具有更高的安全余量,因为NFSR中的大部分状态位只能在精确控制模型下恢复。据我们所知,本工作为ATOM密码提供了首次差分故障攻击的实验结果。

英文摘要

Lightweight cryptographic primitives are widely deployed in resource-constrained environments, particularly in Internet of Things (IoT) devices. Due to their public accessibility, these devices are vulnerable to physical attacks, especially fault attacks. Recently, deep learning-based cryptanalytic techniques have demonstrated promising results; however, their application to fault attacks remains limited, particularly for stream ciphers. In this work, we investigate the feasibility of deep learning assisted differential fault attacks on three lightweight stream ciphers, namely ACORNv3, MORUSv2, and ATOM, under a relaxed fault model in which a single-bit bit-flipping fault is injected at an unknown location. We develop and train multilayer perceptron (MLP) models to identify the fault locations. Experimental results show that the trained models achieve high identification accuracies of 0.999880, 0.999231, and 0.823568 for ACORNv3, MORUSv2 and ATOM, respectively, and outperform traditional signature-based methods. For the secret recovery process, we introduce a threshold-based method to optimize the number of fault injections required to recover the secret information. The results show that the initial state of ACORN can be recovered with 21 to 34 faults, while MORUS requires 213 to 248 faults, with at most 6 bits of guessing. Both attacks reduce the attack complexity compared to existing works. For ATOM, the results show that it possesses a higher security margin, as the majority of state bits in the Nonlinear Feedback Shift Register (NFSR) can only be recovered under a precise control model. To the best of our knowledge, this work provides the first experimental results of differential fault attacks on ATOM.

2603.23722 2026-05-20 cs.MA cs.LG 版本更新

Dual-Gated Epistemic Time-Dilation: Autonomous Compute Modulation in Asynchronous MARL

双门控认知时间延缓:异步MARL中的自主计算调节

Igor Jankowski

AI总结 本文提出了一种基于双门控认知触发器的Epistemic Time-Dilation MAPPO算法,通过自主调节执行频率来提升异步MARL在边缘设备上的部署效率,实验表明该方法在减少计算开销的同时保持了中央任务主导性。

Comments 14 pages, 5 figures. Code available at: https://github.com/xaiqo/edtmappo. Related materials available on Zenodo: 10.5281/zenodo.19206838

详情
AI中文摘要

尽管多智能体强化学习(MARL)算法在复杂连续领域中取得了前所未有的成功,但其标准部署严格遵循同步操作范式。在此范式下,智能体被强制在每个微帧执行深度神经网络推断,无论即时需求如何。这种密集的吞吐量成为在边缘设备上物理部署的根本障碍,因为边缘设备的热能和代谢预算高度受限。我们提出了Epistemic Time-Dilation MAPPO(ETD-MAPPO),并加入了双门控认知触发器。与依赖于刚性的帧跳过(宏动作)不同,智能体通过解释随机不确定性(通过策略的香农熵)和认知不确定性(通过双批评者架构中的状态价值发散)来自主调节执行频率。为此,我们将环境结构化为半马尔可夫决策过程(SMDP),并构建了SMDP对齐的异步梯度遮蔽批评者,以确保适当的信用分配。实证发现表明,与当前时间模型相比,该方法在相对基准获取上实现了显著提升(> 60%)。通过评估LBF、MPE以及Google Research Football(GRF)的115维状态空间,ETD正确地防止了提前策略崩溃。值得注意的是,这种无约束的方法导致了时间角色专业化,减少了计算开销,统计上占主导地位的73.6%在离球执行期间,而不会损害集中任务主导性。

英文摘要

While Multi-Agent Reinforcement Learning (MARL) algorithms achieve unprecedented successes across complex continuous domains, their standard deployment strictly adheres to a synchronous operational paradigm. Under this paradigm, agents are universally forced to execute deep neural network inferences at every micro-frame, regardless of immediate necessity. This dense throughput acts as a fundamental barrier to physical deployment on edge-devices where thermal and metabolic budgets are highly constrained. We propose Epistemic Time-Dilation MAPPO (ETD-MAPPO), augmented with a Dual-Gated Epistemic Trigger. Instead of depending on rigid frame-skipping (macro-actions), agents autonomously modulate their execution frequency by interpreting aleatoric uncertainty (via Shannon entropy of their policy) and epistemic uncertainty (via state-value divergence in a Twin-Critic architecture). To format this, we structure the environment as a Semi-Markov Decision Process (SMDP) and build the SMDP-Aligned Asynchronous Gradient Masking Critic to ensure proper credit assignment. Empirical findings demonstrate massive improvements (> 60% relative baseline acquisition leaps) over current temporal models. By assessing LBF, MPE, and the 115-dimensional state space of Google Research Football (GRF), ETD correctly prevented premature policy collapse. Remarkably, this unconstrained approach leads to emergent Temporal Role Specialization, reducing computational overhead by a statistically dominant 73.6% entirely during off-ball execution without deteriorating centralized task dominance.

2603.17839 2026-05-20 cs.CL cs.AI cs.LG 版本更新

How do LLMs Compute Verbal Confidence

LLMs如何计算言语自信

Dharshan Kumaran, Arthur Conmy, Federico Barbero, Simon Osindero, Viorica Patraucean, Petar Veličković

发表机构 * Google DeepMind(谷歌深Mind)

AI总结 研究探讨了大型语言模型如何内部生成言语自信评分,通过实验发现自信评分在回答生成后被缓存并用于后续输出,揭示了模型自我评估的机制。

详情
AI中文摘要

言语自信——提示LLMs以数字或类别形式陈述其信心——被广泛用于从黑箱模型中提取不确定性估计。然而,LLMs内部如何生成此类评分仍不清楚。我们解答了两个问题:首先,信心是在被请求时即时计算,还是在生成答案时自动计算并缓存以供后续检索;其次,言语自信代表什么——token对数概率,还是更丰富的答案质量评估?我们聚焦于Gemma 3 27B(在TriviaQA、BigMath和MMLU上的表现)、Qwen 2.5 7B以及推理模型Magistral Small 24B,提供了缓存检索的收敛证据。激活引导、修补、噪声和交换实验揭示,信心表示在回答相邻位置先出现,再出现在言语化位置。注意力阻断指出了信息流:信心从回答token中收集,缓存于第一个回答后的位置,然后用于输出。关键发现是线性探测和方差划分揭示,这些缓存表示能够解释超出token对数概率的显著方差,表明是更丰富的答案质量评估,而非简单的流畅性读取。这些发现表明,言语自信反映了自动、复杂的自我评估——而非事后重建——对理解LLMs中的元认知和改进校准具有启示。

英文摘要

Verbal confidence -- prompting LLMs to state their confidence as a number or category -- is widely used to extract uncertainty estimates from black-box models. However, how LLMs internally generate such scores remains unknown. We address two questions: first, when confidence is computed -- just-in-time when requested, or automatically during answer generation and cached for later retrieval; and second, what verbal confidence represents -- token log-probabilities, or a richer evaluation of answer quality? Focusing on Gemma 3 27B (across TriviaQA, BigMath, and MMLU), Qwen 2.5 7B, and the reasoning model Magistral Small 24B, we provide convergent evidence for cached retrieval. Activation steering, patching, noising, and swap experiments reveal that confidence representations emerge at answer-adjacent positions before appearing at the verbalization site. Attention blocking pinpoints the information flow: confidence is gathered from answer tokens, cached at the first post-answer position, then retrieved for output. Critically, linear probing and variance partitioning reveal that these cached representations explain substantial variance in verbal confidence beyond token log-probabilities, suggesting a richer answer-quality evaluation rather than a simple fluency readout. These findings demonstrate that verbal confidence reflects automatic, sophisticated self-evaluation -- not post-hoc reconstruction -- with implications for understanding metacognition in LLMs and improving calibration.

2603.16284 2026-05-20 cs.CV cs.LG 版本更新

Locate-then-Sparsify: Attribution Guided Sparse Strategy for Visual Hallucination Mitigation

定位后再稀疏化:基于归因的视觉幻觉缓解稀疏策略

Tiantian Dang, Chao Bi, Shufan Shen, Jinzhe Liu, Qingming Huang, Shuhui Wang

发表机构 * State Key Lab. of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences(中国科学院人工智能安全国家重点实验室,计算技术研究所) School of Advanced Interdisciplinary Sciences, University of Chinese Academy of Sciences(中国科学院大学先进交叉科学学院) School of Computer Science and Technology, University of Chinese Academy of Sciences(中国科学院大学计算机科学与技术学院)

AI总结 本文提出了一种名为Locate-Then-Sparsify for Feature Steering (LTS-FS)的框架,通过定位和稀疏化策略,根据每层与幻觉的相关性调整特征引导强度,从而有效缓解视觉语言模型中的幻觉问题,同时保持良好的性能。

Comments Accepted by CVPR 2026

详情
AI中文摘要

尽管大型视觉-语言模型(LVLMs)在技术上取得了显著进展,但其生成幻觉的倾向削弱了可靠性并限制了更广泛的实际应用。在幻觉缓解方法中,特征引导作为一种有前景的方法,能够在不增加推理成本的情况下减少LVLMs中的错误输出。然而,当前的方法在所有层上应用统一的特征引导策略。这种启发式策略忽略了层间的差异,可能会干扰与幻觉无关的层,最终导致在通用任务上的性能下降。在本文中,我们提出了一种名为Locate-Then-Sparsify for Feature Steering (LTS-FS)的即插即用框架,该框架根据每层与幻觉的相关性来控制引导强度。我们首先构建了一个包含token级和句子级幻觉案例的数据集。基于此数据集,我们引入了一种基于因果干预的归因方法,以量化每层的幻觉相关性。利用各层的归因分数,我们提出了一种逐层策略,将这些分数转换为针对单个层的特征引导强度,从而在幻觉相关的层上实现更精确的调整。在多个LVLMs和基准测试中进行的广泛实验表明,LTS-FS有效缓解了幻觉问题,同时保持了强大的性能。代码可在https://github.com/huttersadan/LTS-FS上获得。

英文摘要

Despite the significant advancements in Large Vision-Language Models (LVLMs), their tendency to generate hallucinations undermines reliability and restricts broader practical deployment. Among the hallucination mitigation methods, feature steering emerges as a promising approach that reduces erroneous outputs in LVLMs without increasing inference costs. However, current methods apply uniform feature steering across all layers. This heuristic strategy ignores inter-layer differences, potentially disrupting layers unrelated to hallucinations and ultimately leading to performance degradation on general tasks. In this paper, we propose Locate-Then-Sparsify for Feature Steering (LTS-FS), a plug-and-play framework which controls the steering intensity according to the hallucination relevance of each layer. We first construct a dataset comprising token-level and sentence-level hallucination cases. Based on this dataset, we introduce an attribution method based on causal interventions to quantify the hallucination relevance of each layer. With the attribution scores across layers, we propose a layerwise strategy that converts these scores into feature steering intensities for individual layers, enabling more precise adjustments specifically on hallucination-relevant layers. Extensive experiments across multiple LVLMs and benchmarks demonstrate that LTS-FS effectively mitigates hallucination while preserving strong performance. Codes are available at https://github.com/huttersadan/LTS-FS.

2603.15411 2026-05-20 cs.AI cs.LG 版本更新

A Hybrid Modeling Framework for Crop Prediction Tasks via Dynamic Parameter Calibration and Multi-Task Learning

一种通过动态参数校准和多任务学习的作物预测混合建模框架

William Solow, Paola Pesantez-Cabrera, Markus Keller, Lav Khot, Sandhya Saisubramanian, Alan Fern

发表机构 * Oregon State University(俄勒冈州立大学) Washington State University(华盛顿州立大学)

AI总结 本文提出了一种混合建模方法,通过动态参数校准和多任务学习,提高作物预测的准确性,特别是在数据有限的情况下,利用神经网络对生物物理模型进行参数化,并在不同作物品种间高效共享数据,从而提升预测精度和生物合理性。

详情
AI中文摘要

准确预测作物状态(例如物候阶段和耐寒性)对于及时进行灌溉、施肥和树冠管理等农场管理决策至关重要,以优化作物产量和质量。虽然传统生物物理模型可以用于季节性预测,但它们缺乏用于特定地点管理所需的精度。深度学习方法是一种有吸引力的替代方案,但可能会产生生物上不合理的预测,并需要大规模数据。我们提出了一种混合建模方法,使用神经网络对可微分的生物物理模型进行参数化,并利用多任务学习在数据有限的情况下在不同作物品种之间高效共享数据。通过预测生物物理模型的参数,我们的方法在提高预测精度的同时保持生物合理性。使用真实世界和合成数据集的实证评估表明,与部署的生物物理模型相比,我们的方法在物候预测方面提高了60%,在耐寒性预测方面提高了40%。

英文摘要

Accurate prediction of crop states (e.g., phenology stages and cold hardiness) is essential for timely farm management decisions such as irrigation, fertilization, and canopy management to optimize crop yield and quality. While traditional biophysical models can be used for season-long predictions, they lack the precision required for site-specific management. Deep learning methods are a compelling alternative, but can produce biologically unrealistic predictions and require large-scale data. We propose a \emph{hybrid modeling} approach that uses a neural network to parameterize a differentiable biophysical model and leverages multi-task learning for efficient data sharing across crop cultivars in data limited settings. By predicting the \emph{parameters} of the biophysical model, our approach improves the prediction accuracy while preserving biological realism. Empirical evaluation using real-world and synthetic datasets demonstrates that our method improves prediction accuracy by 60\% for phenology and 40\% for cold hardiness compared to deployed biophysical models.

2603.14918 2026-05-20 stat.ML cs.LG 版本更新

Bayesian Symbolic Regression for Missing Physics

贝叶斯符号回归用于缺失物理

Arno Strouwen

发表机构 * Biosystems Department, KULeuven, Leuven, Belgium(比利时列日大学生物系统部门)

AI总结 本文提出了一种基于贝叶斯的符号回归方法,用于从实验数据中学习缺失的物理规律,通过Reversible Jump Markov Chain Monte Carlo方法量化模型结构的不确定性。

Comments 6 pages, 4 figures. Accepted at IFAC World Congress 2026. v2: updated title and results for camera-ready version

详情
AI中文摘要

基于模型的方法用于(bio)过程系统时,往往面临对底层物理、化学或生物定律不完整知识的挑战。通用微分方程,将神经网络嵌入微分方程中,已发展为从实验数据中学习缺失物理的强大工具。然而,神经网络本质上是不透明的,因此需要通过符号回归进行后处理以获得可解释的数学表达式。基于遗传算法的符号回归是这种后处理步骤的流行方法,但只能提供点估计,无法量化发现方程的置信度。我们通过应用贝叶斯符号回归来解决这一限制,该方法使用Reversible Jump Markov Chain Monte Carlo在符号表达式树的后验分布上采样。这种方法自然地量化了恢复模型结构的不确定性。我们通过Lotka-Volterra捕食者-猎物系统演示了该方法,然后展示了精心设计的实验如何在Fed-batch生物反应器案例研究中降低不确定性。

英文摘要

Model-based approaches for (bio)process systems often suffer from incomplete knowledge of the underlying physical, chemical, or biological laws. Universal differential equations, which embed neural networks within differential equations, have emerged as powerful tools to learn this missing physics from experimental data. However, neural networks are inherently opaque, motivating their post-processing via symbolic regression to obtain interpretable mathematical expressions. Genetic algorithm-based symbolic regression is a popular approach for this post-processing step, but provides only point estimates and cannot quantify the confidence we should place in a discovered equation. We address this limitation by applying Bayesian symbolic regression, which uses Reversible Jump Markov Chain Monte Carlo to sample from the posterior distribution over symbolic expression trees. This approach naturally quantifies uncertainty in the recovered model structure. We demonstrate the methodology on a Lotka-Volterra predator-prey system and then show how a well-designed experiment leads to lower uncertainty in a fed-batch bioreactor case study.

2603.12296 2026-05-20 cs.LG cs.AI eess.SP 版本更新

Synthetic Data Generation for Brain-Computer Interfaces: Overview, Benchmarking, and Future Directions

脑机接口中的合成数据生成:概述、基准测试与未来方向

Ziwei Wang, Zhentao He, Xingyi He, Hongbin Wang, Tianwang Jia, Jingwei Luo, Siyang Li, Xiaoqing Chen, Dongrui Wu

发表机构 * School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China(华中科技大学人工智能与自动化学院,武汉,中国) Zhongguancun Academy, Beijing, China(中关村学院,北京,中国)

AI总结 本文综述了用于脑机接口的合成脑数据生成方法,讨论了不同生成方法的分类、基准实验、评估指标和应用,以及未来研究方向,旨在提升数据效率和隐私保护的脑机接口系统。

Comments 33 pages, 8 figures

详情
AI中文摘要

深度学习在多个领域取得了变革性的性能,主要得益于大规模和高质量的训练数据。相比之下,脑机接口(BCIs)的发展受到有限、异质性和隐私敏感的神经记录的限制。生成合成且生理上合理的脑信号因此成为缓解数据稀缺、提高模型泛化能力和支持数据高效的BCIs的有希望策略。本文全面回顾了用于BCIs的合成脑数据生成方法,涵盖了方法学分类、基准实验、评估指标、关键应用和未来方向。我们系统地将现有生成方法分为四类:基于信号变换、基于特征、基于模型和基于翻译的生成,并讨论了它们的特征、优势和局限性。此外,我们对四种BCI范式中的代表性脑信号生成方法进行了基准测试,包括运动想象、癫痫发作检测、稳态视觉诱发电位和听觉注意力检测,以提供对其下游用途的客观比较。我们还总结了从多个角度对生成脑信号的评估原则,包括信号真实性、生理合理性、下游用途和隐私保护。最后,我们讨论了当前生成方法的潜力和挑战,并概述了未来研究方向,以实现准确、数据高效、可推广和隐私感知的BCI系统。基准代码库可在https://github.com/wzwvv/DG4BCI上找到。

英文摘要

Deep learning has achieved transformative performance across diverse domains, largely driven by large-scale and high-quality training data. In contrast, the development of brain-computer interfaces (BCIs) is fundamentally constrained by limited, heterogeneous, and privacy-sensitive neural recordings. Generating synthetic yet physiologically plausible brain signals has therefore emerged as a promising strategy to mitigate data scarcity, improve model generalization, and support data-efficient BCIs. This survey provides a comprehensive review of synthetic brain data generation for BCIs, covering methodological taxonomies, benchmark experiments, evaluation metrics, key applications, and future directions. We systematically categorize existing generation approaches into four types: signal-transformation-based, feature-based, model-based, and translation-based generation, and discuss their characteristics, advantages, and limitations. Furthermore, we benchmark representative brain signal generation approaches across four BCI paradigms, including motor imagery, epileptic seizure detection, steady-state visually evoked potentials, and auditory attention detection, to provide an objective comparison of their downstream utility. We also summarize evaluation principles for generated brain signals from multiple perspectives, including signal realism, physiological plausibility, downstream utility, and privacy preservation. Finally, we discuss the potential and challenges of current generation approaches and outline future research directions toward accurate, data-efficient, generalizable, and privacy-aware BCI systems. The benchmark codebase is available at https://github.com/wzwvv/DG4BCI.

2603.07018 2026-05-20 stat.ME cs.LG econ.EM 版本更新

TEA-Time: Transporting Effects Across Time

TEA-Time: 跨时间效应传输

Harsh Parikh, Gabriel Levin-Konigsberg, Dominique Perrault-Joncas, Alexander Volfovsky

发表机构 * Amazon SCOT(亚马逊SCOT实验室) Yale University(耶鲁大学) Duke University(杜克大学)

AI总结 本文提出了一种跨时间效应传输的方法,通过分离的时变效应假设正式化传输的平均处理效应,推导出两种识别策略:重复试验和共同臂,并为每种策略开发双重稳健、半参数高效估计器。

详情
AI中文摘要

从随机对照试验中估计的处理效应不仅局限于研究人群,还局限于试验进行的时间。关于将实验结果推广到新人群的文献非常广泛,但跨时间传输效应却受到较少关注,甚至定义目标估计量也并不明显。我们正式化了在可分离的时变效应假设下的传输平均处理效应,推导出两种识别策略:重复试验和共同臂,并为每种策略开发双重稳健、半参数高效估计器。应用于一个大型的头条A/B测试档案库,共同臂策略在精度上显著更高,但当时间因素依赖于干预与测量之间的间隔而非单独的测量时间时,会表现出系统性偏差,而允许这种依赖的重复试验策略则更忠实于真实情况。模拟研究探讨了每种策略在何时可靠以及何时会无声地失败。

英文摘要

Treatment effects estimated from a randomized controlled trial are local not only to the study population but also to the time at which the trial was conducted. The literature on generalizing experimental findings to new populations is extensive, yet transporting effects across time has received far less attention, and even defining the target estimand is nonobvious. We formalize the transported average treatment effect under a separable temporal effects assumption, derive two identification strategies: replicated trials and common arm, and develop doubly robust, semiparametrically efficient estimators for each. Applied to a large archive of headline A/B tests, the common arm strategy is substantially more precise but exhibits systematic bias when the temporal factor depends on the gap between intervention and measurement rather than on measurement time alone, while the replicated trials strategy, which allows this dependence, tracks the ground truth more faithfully. Simulation studies investigate when each strategy is reliable and when it silently fails.

2602.15752 2026-05-20 cs.LG 版本更新

Beyond Match Maximization and Fairness: Retention-Optimized Two-Sided Matching

超越匹配最大化和公平性:以用户留存优化的双侧匹配

Ren Kishimoto, Rikiya Takehi, Koichi Tanaka, Masahiro Nomura, Riku Togashi, Yoji Tomita, Yuta Saito

发表机构 * Institute of Science Tokyo(东京科学研究所) Waseda University(早稻田大学) Keio University(庆应大学) CyberAgent Tokyo(CyberAgent 东京) Hajuku-kaso, Co., Ltd.(汉久科社)

AI总结 本文提出了一种新的双侧匹配优化方法,旨在最大化用户留存而非单纯匹配数量或公平性,通过引入动态学习排序算法MRet,利用用户个性化留存曲线优化推荐策略,提升整体用户留存率。

Comments Published as a conference paper at ICLR 2026

详情
AI中文摘要

在在线约会和招聘等双侧匹配平台上,推荐算法通常旨在最大化总匹配数。然而,这一目标导致了不平衡,一些用户获得过多匹配而另一些用户则获得极少并最终离开平台。对于许多平台,尤其是依赖订阅的平台,用户留存至关重要。一些平台可能使用公平性目标来解决匹配最大化的问题。然而,公平性本身并非所有平台的最终目标,因为用户不会仅仅因为曝光均等而奖励平台。在实践中,用户留存通常是最终目标,随意依赖公平性会使留存优化取决于运气。在本工作中,我们没有最大化匹配或公理化定义公平性,而是正式定义了双侧匹配平台中最大化用户留存的新问题设置。为此,我们引入了一种动态学习到排序(LTR)算法,称为Matching for Retention(MRet)。与传统的双侧匹配算法不同,我们的方法通过从每个用户档案和交互历史中学习个性化留存曲线来建模用户留存。基于这些曲线,MRet通过同时考虑接收推荐的用户和被推荐用户的留存收益,动态调整推荐策略,使得有限的匹配机会分配到最能提高整体留存的地方。自然但重要的是,对主要在线约会平台的合成和真实世界数据集的实证评估显示,MRet实现了更高的用户留存率,因为传统方法优化匹配或公平性而非留存。

英文摘要

On two-sided matching platforms such as online dating and recruiting, recommendation algorithms often aim to maximize the total number of matches. However, this objective creates an imbalance, where some users receive far too many matches while many others receive very few and eventually abandon the platform. Retaining users is crucial for many platforms, such as those that depend heavily on subscriptions. Some may use fairness objectives to solve the problem of match maximization. However, fairness in itself is not the ultimate objective for many platforms, as users do not suddenly reward the platform simply because exposure is equalized. In practice, where user retention is often the ultimate goal, casually relying on fairness will leave the optimization of retention up to luck. In this work, instead of maximizing matches or axiomatically defining fairness, we formally define the new problem setting of maximizing user retention in two-sided matching platforms. To this end, we introduce a dynamic learning-to-rank (LTR) algorithm called Matching for Retention (MRet). Unlike conventional algorithms for two-sided matching, our approach models user retention by learning personalized retention curves from each user's profile and interaction history. Based on these curves, MRet dynamically adapts recommendations by jointly considering the retention gains of both the user receiving recommendations and those who are being recommended, so that limited matching opportunities can be allocated where they most improve overall retention. Naturally but importantly, empirical evaluations on synthetic and real-world datasets from a major online dating platform show that MRet achieves higher user retention, since conventional methods optimize matches or fairness rather than retention.

2602.13466 2026-05-20 cs.CL cs.AI cs.LG 版本更新

Language Model Memory and Memory Models for Language

语言模型记忆与记忆模型用于语言

Benjamin L. Badger

发表机构 * IBM(IBM公司)

AI总结 研究探讨了语言模型和记忆模型在信息存储中的能力差异,发现语言模型的嵌入向量信息较少,而自编码器在输入再生训练中能形成接近完美的记忆,提出了一种可并行的编码器-解码器记忆模型架构,并通过结合因果和信息保留目标函数来提升记忆形成和解码能力。

详情
AI中文摘要

机器学习模型存储输入信息的能力,类似于“记忆”的概念,在隐藏层向量嵌入中被广泛使用但未充分表征。我们发现,无论数据和计算规模如何,语言模型嵌入通常包含相对较少的输入信息。相比之下,用于输入再生训练的自编码器嵌入能够形成几乎完美的记忆。用记忆嵌入替代令牌序列可带来显著的计算效率,从而引入一种可并行的编码器-解码器记忆模型架构。在因果训练后,这些模型包含信息贫乏的嵌入,无法进行任意信息访问,但通过结合因果和信息保留目标函数,它们学会形成和解码信息丰富的记忆。通过冻结高保真编码器并采用课程训练方法,解码器首先学习处理记忆,然后学习预测下一个令牌。我们引入了观点,即仅使用下一个令牌预测训练不足以准确形成记忆,因为目标本身不可逆,从而推动在输入不完全暴露的情况下使用结合目标函数的模型。

英文摘要

The ability of machine learning models to store input information in hidden layer vector embeddings, analogous to the concept of `memory', is widely employed but not well characterized. We find that language model embeddings typically contain relatively little input information regardless of data and compute scale during training. In contrast, embeddings from autoencoders trained for input regeneration are capable of nearly perfect memory formation. The substitution of memory embeddings for token sequences leads to substantial computational efficiencies, motivating the introduction of a parallelizable encoder-decoder memory model architecture. Upon causal training these models contain information-poor embeddings incapable of arbitrary information access, but by combining causal and information retention objective functions they learn to form and decode information-rich memories. Training can be further streamlined by freezing a high fidelity encoder followed by a curriculum training approach where decoders first learn to process memories and then learn to additionally predict next tokens. We introduce the perspective that next token prediction training alone is poorly suited for accurate memory formation as the objective itself is non-invertible, motivating the use of combined objective functions for models where the entire input is not exposed.

2602.11910 2026-05-20 cs.SD cs.LG 版本更新

TADA! Tuning Audio Diffusion Models through Activation Steering

TADA! 通过激活引导调整音频扩散模型

Łukasz Staniszewski, Katarzyna Zaleska, Mateusz Modrzejewski, Kamil Deja

发表机构 * Warsaw University of Technology(华沙技术大学) IDEAS Research Institute(IDEAS研究院)

AI总结 本文通过激活引导技术揭示音频扩散模型中的语义瓶颈,并展示了局部激活引导在音频概念调节中的新状态-of-the-art性能。

Comments Preprint

详情
AI中文摘要

音频扩散模型能够从文本生成高质量的音乐,但实现对特定音乐属性的精细控制仍然具有挑战性,因为其内部机制对高级概念的表示尚不明确。在本文中,我们利用激活修补技术证明,最近的音频扩散架构存在语义瓶颈,其中一小部分连续的注意力层控制不同的音乐概念,例如特定乐器、人声或音乐类型的存在。在此基础上,我们系统地评估了广泛的应用引导方法,比较了激活引导与提示级、乐谱空间和权重空间干预,分析了引导机制与干预位置之间的相互作用。我们的新基准,通过广泛的用户研究支持,证明了局部激活引导在音频概念调节中建立了新的状态-of-the-art性能。

英文摘要

Audio diffusion models can synthesize high-fidelity music from text, yet achieving fine-grained control over specific musical attributes remains challenging, as their internal mechanisms for representing high-level concepts are poorly understood. In this work, we use activation patching to demonstrate that recent audio diffusion architectures exhibit a semantic bottleneck, where a small, shared subset of consecutive attention layers controls distinct musical concepts, such as the presence of specific instruments, vocals, or genres. Building on this, we systematically evaluate a broad spectrum of steering paradigms, comparing activation steering against prompt-level, score-space, and weight-space interventions, analyzing the interaction between the steering mechanism and the intervention site. Our new benchmark, supported by an extensive user study, demonstrates that localized activation steering establishes a new state-of-the-art in audio concept modulation.

2602.11767 2026-05-20 cs.AI cs.CL cs.LG 版本更新

TSR: Trajectory-Search Rollouts for Multi-Turn RL of LLM Agents

TSR:用于LLM代理多轮RL的轨迹搜索

Aladin Djuhera, Swanand Ravindra Kadhe, Farhan Ahmed, Syed Zawad, Heiko Ludwig, Holger Boche

发表机构 * Technical University Munich(慕尼黑技术大学) IBM Research(IBM研究院)

AI总结 本文提出TSR,一种在训练时改进每轮轨迹生成的方法,通过轻量级树状搜索构造高质量轨迹,提升rollout质量和学习稳定性,适用于多轮RL任务。

详情
AI中文摘要

大规模语言模型(LLMs)的进步正在推动使用强化学习(RL)来训练代理,从跨任务的迭代、多轮交互中学习。然而,多轮RL仍然具有挑战性,因为奖励通常稀疏或延迟,而环境可能是随机的。在这种情况下,朴素的轨迹采样会阻碍利用并导致模式崩溃。我们提出了TSR(轨迹搜索rollouts),一种训练时的方法,重新利用测试时扩展的想法以改进每轮rollout生成。TSR通过基于状态的反馈在每个回合中选择高分动作,进行轻量级树状搜索来构造高质量轨迹。这提高了rollout质量并稳定了学习,同时与标准策略梯度优化器兼容,使TSR对优化器无偏见。我们用best-of-N、beam和浅层前瞻搜索实例化TSR,并与PPO和GRPO配对,在Sokoban、FrozenLake和WebShop任务中实现高达15%的性能提升和更稳定的训练,仅需适度增加一次训练计算。通过将搜索从推理时间转移到训练的rollout阶段,TSR提供了一种模块化且通用的机制,用于更强的多轮代理学习,与现有框架和拒绝采样式选择方法互补。

英文摘要

Advances in large language models (LLMs) are driving a shift toward using reinforcement learning (RL) to train agents from iterative, multi-turn interactions across tasks. However, multi-turn RL remains challenging as rewards are often sparse or delayed, and environments can be stochastic. In this regime, naive trajectory sampling can hinder exploitation and induce mode collapse. We propose TSR (Trajectory-Search Rollouts), a training-time approach that repurposes test-time scaling ideas for improved per-turn rollout generation. TSR performs lightweight tree-style search to construct high-quality trajectories by selecting high-scoring actions at each turn using state-based feedback. This improves rollout quality and stabilizes learning while remaining compatible with standard policy gradient optimizers, making TSR optimizer-agnostic. We instantiate TSR with best-of-N, beam, and shallow lookahead search, and pair it with PPO and GRPO, achieving up to 15% performance gains and more stable learning on Sokoban, FrozenLake, and WebShop tasks at a modest, one-time increase in training compute. By moving search from inference time to the rollout stage of training, TSR provides a modular and general mechanism for stronger multi-turn agent learning, complementary to existing frameworks and rejection-sampling-style selection methods.

2602.11454 2026-05-20 cs.DS cs.LG 版本更新

Adaptive Power Iteration Method for Differentially Private PCA

自适应幂迭代法用于差分隐私主成分分析

Ta Duy Nguyen, Alina Ene, Huy Le Nguyen

发表机构 * Department of Computer Science, Boston University(波士顿大学计算机科学系) Khoury College of Computer and Information Science, Northeastern University(东北大学科里学院计算机与信息科学学院)

AI总结 本文研究了在差分隐私下近似计算矩阵A的顶级奇异向量的算法,提出了一种自适应过滤技术,适用于低相干性输入矩阵,从而在保证隐私的同时提高计算效率。

详情
AI中文摘要

我们研究了在差分隐私下近似计算矩阵A∈R^{n×d}的顶级奇异向量的算法,其中A的每一行都是R^d中的数据点。遵循Dwork-Talwar-Thakurta-Zhang(STOC 2014)的隐私模型,我们考虑相邻输入仅在一行上不同的情况。我们提出了一种新的算法,该算法在输入矩阵具有低相干性时能够提供超越最坏情况的保证,这是许多应用中矩阵的结构特性,包括但不限于独立同分布数据。我们的算法为私有幂迭代方法的文献做出了贡献,其中我们引入了一种新的过滤技术,该技术适应于此相干参数。我们的工作在Hardt-Roth(STOC 2013)的工作基础上进行了扩展和补充,后者在更严格的隐私模型下实现了超越最坏情况的保证,其中相邻输入在单个条目上最多相差1。

英文摘要

We study $\left(ε,δ\right)$-differentially private algorithms for the problem of approximately computing the top singular vector of a matrix $A\in\mathbb{R}^{n\times d}$ where each row of $A$ is a data point in $\mathbb{R}^{d}$. Following Dwork-Talwar-Thakurta-Zhang (STOC 2014), we consider the privacy model where neighboring inputs differ by one single row. We give a novel algorithm that achieves beyond-worst-case guarantees for input matrices with low coherence, which is a structural property of matrices in many applications, including but not limited to i.i.d. data. Our algorithm contributes to the extensive literature on private power iteration methods, where we introduce a new filtering technique which adapts to this coherence parameter. Our work departs from and complements the work by Hardt-Roth (STOC 2013) which achieves beyond-worst-case guarantees for the more restrictive privacy model where neighboring inputs differ in one single entry by at most 1.

2602.07570 2026-05-20 q-bio.NC cs.AI cs.CV cs.LG 版本更新

How does longer temporal context enhance multimodal narrative video processing in the brain?

更长的时间上下文如何增强大脑对多模态叙事视频的处理?

Prachi Jindal, Anant Khandelwal, Manish Gupta, Bapi S. Raju, Subba Reddy Oota, Tanmoy Chakraborty

发表机构 * Technische Universität Berlin(柏林技术大学) Microsoft Research(微软研究院) IIT Delhi(德里理工学院) Microsoft(微软) IIIT-Hyderabad(海得拉巴理工学院)

AI总结 本研究探讨了视频片段时长和叙事任务提示如何影响自然电影观看过程中大脑模型对多模态大语言模型(MLLMs)的对齐情况,发现增加片段持续时间显著提高了大脑对齐程度,而单模态视频模型则无明显提升。

Comments 22 pages, 15 figures

详情
AI中文摘要

理解人类和人工智能系统如何处理复杂的叙事视频是一个在神经科学和机器学习交汇处的基本挑战。本研究调查了视频片段的时间上下文长度(3-24秒片段)和叙事任务提示如何影响自然电影观看过程中大脑模型的对齐情况。利用受试者观看完整电影的fMRI记录,我们研究了对叙事上下文敏感的大脑区域如何在不同时间尺度上动态表示信息,以及这些神经模式如何与模型派生的特征对齐。我们发现,增加片段持续时间显著提高了多模态大语言模型(MLLMs)的大脑对齐程度,而单模态视频模型则几乎没有提升。进一步地,较短的时间窗口与感知和早期语言区域对齐,而较长的窗口则更倾向于与更高阶整合区域对齐,这在MLLMs中表现为层到皮层的层次结构。最后,使用四个叙事任务提示的实验显示,这些提示会引发任务特定、区域依赖性的大脑对齐模式,并在更高阶区域引起上下文依赖的片段级调谐变化。我们的工作将长篇叙事电影定位为研究长时间尺度时间整合在长上下文MLLMs中的原理性测试平台,以及其与叙事理解过程中皮层响应关系的桥梁。

英文摘要

Understanding how humans and artificial intelligence systems process complex narrative videos is a fundamental challenge at the intersection of neuroscience and machine learning. This study investigates how the temporal context length of video clips (3--24 s clips) and the narrative-task prompting shape brain-model alignment during naturalistic movie watching. Using fMRI recordings from participants viewing full-length movies, we examine how brain regions sensitive to narrative context dynamically represent information over varying timescales and how these neural patterns align with model-derived features. We find that increasing clip duration substantially improves brain alignment for multimodal large language models (MLLMs), whereas unimodal video models show little to no gain. Further, shorter temporal windows align with perceptual and early language regions, while longer windows preferentially align higher-order integrative regions, mirrored by a layer-to-cortex hierarchy in MLLMs. Finally, experiments with four narrative-task prompts show that they elicit task-specific, region-dependent brain alignment patterns and context-dependent shifts in clip-level tuning in higher-order regions. Our work positions long-form narrative movies as a principled testbed for studying long-timescale temporal integration in long-context MLLMs and its relationship to cortical responses during narrative comprehension.

2602.07008 2026-05-20 cs.CV cs.LG 版本更新

Where Not to Learn: Prior-Aligned Training with Subset-based Attribution Constraints for Reliable Decision-Making

不应学习的地方:基于子集归因约束的先验对齐训练以实现可靠的决策制定

Ruoyu Chen, Shangquan Sun, Xiaoqing Guo, Sanyi Zhang, Kangwei Liu, Shiming Liu, Zhangcheng Wang, Qunli Zhang, Hua Zhang, Xiaochun Cao

发表机构 * Institute of Information Engineering, Chinese Academy of Sciences(中国科学院信息工程研究所) University of Chinese Academy of Sciences(中国科学院大学) College of Computing and Data Science, Nanyang Technological University(南洋理工大学计算与数据科学学院) Department of Computer Science, Hong Kong Baptist University(香港 Baptist 大学计算机科学系) Communication University of China(中国传媒大学) Imperial College London(伦敦帝国学院) School of Cyber Science and Technology, Shenzhen Campus of Sun Yat-sen University(中山大学深圳校区网络科学与技术学院)

AI总结 本文提出了一种基于归因的先验对齐方法,通过子集选择归因技术约束模型依赖于人类先验区域,从而提升决策的可靠性。

详情
AI中文摘要

可靠的模型不仅要预测正确,还要能用可接受的证据来解释决策。然而,传统监督学习通常只提供类别级标签,使模型通过捷径相关性实现高精度,而非预期的证据。人类先验可以约束此类行为,但对齐模型到这些先验仍然具有挑战性,因为学习的表示往往偏离人类感知。为了解决这一挑战,我们提出了一种基于归因的人类先验对齐方法。我们将人类先验编码为模型应依赖的输入区域(例如边界框),并利用高度忠实的子集选择归因方法,在训练过程中暴露模型的决策证据。当归因区域显著偏离先验区域时,我们惩罚对非先验证据的依赖,促使模型将归因转向预期区域。这是通过一个训练目标实现的,该目标通过人类先验诱导归因约束。我们在基于MLLM的GUI代理模型上验证了我们的方法,涵盖图像分类和点击决策任务。在传统分类和自回归生成设置中,人类先验对齐一致提高了任务准确性,同时增强了模型的决策合理性。

英文摘要

Reliable models should not only predict correctly, but also justify decisions with acceptable evidence. Yet conventional supervised learning typically provides only class-level labels, allowing models to achieve high accuracy through shortcut correlations rather than the intended evidence. Human priors can help constrain such behavior, but aligning models to these priors remains challenging because learned representations often diverge from human perception. To address this challenge, we propose an attribution-based human prior alignment method. We encode human priors as input regions that the model is expected to rely on (e.g., bounding boxes), and leverage a highly faithful subset-selection-based attribution approach to expose the model's decision evidence during training. When the attribution region deviates substantially from the prior regions, we penalize reliance on off-prior evidence, encouraging the model to shift its attribution toward the intended regions. This is achieved through a training objective that imposes attribution constraints induced by the human prior. We validate our method on both image classification and click decision tasks in MLLM-based GUI agent models. Across conventional classification and autoregressive generation settings, human prior alignment consistently improves task accuracy while also enhancing the model's decision reasonability.

2602.06462 2026-05-20 cs.CL cs.LG 版本更新

Diffusion-State Policy Optimization for Masked Diffusion Language Models

扩散状态策略优化用于掩码扩散语言模型

Daisuke Oba, Hiroki Furuta, Naoaki Okazaki

发表机构 * Institute of Science Tokyo(东京科学研究院)

AI总结 本文提出Diffusion-State Policy Optimization(DiSPO),一种用于掩码扩散语言模型的插件信用分配层,通过直接优化中间填充决策来改进生成过程,实验表明其在数学和规划基准测试中优于现有基线方法。

详情
AI中文摘要

掩码扩散语言模型通过迭代填充掩码标记来生成文本,但仅对最终完成结果的终端奖励对中间填充决策的信用分配过于粗糙。我们提出Diffusion-State Policy Optimization(DiSPO),一种插件信用分配层,直接优化中间填充决策。在选定的中间掩码状态下,DiSPO通过从滚出缓存的logits中重新采样当前掩码位置,评估由此产生的完成结果,并仅更新新填充的标记,无需额外的多步扩散滚出或优化器步骤。我们为分支完成形式化了一个固定状态目标,并推导出一个策略梯度估计器,该估计器重用与终端反馈策略优化相同的滚出。在LLaDA-8B-Instruct上的实验表明,DiSPO在匹配的滚出计算和优化器步骤下,一致提高了终端反馈基线,包括diffu-GRPO和SPG,在数学和规划基准测试中。我们的项目页面可在https://daioba.github.io/dispo上找到。

英文摘要

Masked diffusion language models generate text through iterative masked-token filling, but terminal-only rewards on final completions provide coarse credit assignment for the intermediate filling decisions that shape the generation process. We propose Diffusion-State Policy Optimization (DiSPO), a plug-in credit-assignment layer that directly optimizes intermediate filling decisions. At selected intermediate masked states, DiSPO branches by resampling the currently masked positions from rollout-cached logits, scores the resulting completions, and updates only the newly filled tokens, requiring no additional multi-step diffusion rollouts or optimizer steps. We formalize a fixed-state objective for branched completions and derive a policy-gradient estimator that reuses the same rollouts as terminal-feedback policy optimization. Experiments on LLaDA-8B-Instruct show that DiSPO consistently improves terminal-feedback baselines, including diffu-GRPO and SPG, on math and planning benchmarks under matched rollout compute and optimizer steps, supporting its use as a general plug-in for masked diffusion policy optimization. Our project page is available at https://daioba.github.io/dispo .

2602.04998 2026-05-20 cs.LG cs.AI cs.CL 版本更新

Learning Rate Matters: Vanilla LoRA May Suffice for LLM Fine-tuning

学习率至关重要:Vanilla LoRA可能足以用于LLM微调

Yu-Ang Lee, Ching-Yun Ko, Pin-Yu Chen, Mi-Yen Yeh

发表机构 * National Taiwan University(国立台湾大学) IBM Research(IBM研究院) Academia Sinica(台湾“学术院”)

AI总结 本文通过广泛的超参数搜索重新评估了九种代表性的LoRA变体和Vanilla LoRA,在数学推理、常识推理、代码生成和指令遵循等任务上,发现不同的LoRA方法偏好不同的学习率范围。当学习率正确调整时,所有方法都能达到相似的峰值性能,这表明Vanilla LoRA仍然是一个有竞争力的基线,而单一训练配置下的改进可能并不反映一致的方法优势。

Comments Project page: https://github.com/yuang-lee/lr-matters-lora

详情
AI中文摘要

低秩适应(LoRA)是高效大型语言模型(LLM)微调的主流方法。在此范式基础上,近期研究提出了替代的初始化策略、架构修改和优化调整,报告了显著优于Vanilla LoRA的改进。然而,这些改进通常是在固定或狭窄调整的超参数设置下展示的,尽管神经网络对训练配置敏感已知。在本工作中,我们通过广泛的超参数搜索,系统地重新评估了九种代表性的LoRA变体以及Vanilla LoRA,搜索范围包括学习率、批量大小、秩和训练持续时间。在覆盖数学推理、常识推理、代码生成和指令遵循等任务的不同模型规模上,我们发现不同的LoRA方法偏好不同的学习率范围。关键的是,一旦学习率正确调整,所有方法都能达到相似的峰值性能(在1-2%以内),仅存在细微的秩依赖行为。这些结果表明,Vanilla LoRA仍然是一个有竞争力的基线,而单一训练配置下的改进可能并不反映一致的方法优势。最后,二次分析将不同的最优学习率范围归因于最大的Hessian特征值的变化,这与经典的机器学习理论一致。

英文摘要

Low-Rank Adaptation (LoRA) is the prevailing approach for efficient large language model (LLM) fine-tuning. Building on this paradigm, recent studies have proposed alternative initialization strategies, architectural modifications, and optimization adjustments, reporting substantial improvements over vanilla LoRA. However, these gains are often demonstrated under fixed or narrowly tuned hyperparameter settings, despite the known sensitivity of neural networks to training configurations. In this work, we systematically re-evaluate nine representative LoRA variants alongside vanilla LoRA through extensive hyperparameter searches over learning rate, batch size, rank, and training duration. Across tasks spanning mathematical reasoning, commonsense reasoning, code generation, and instruction following at diverse model scales, we find that different LoRA methods favor distinct learning rate ranges. Crucially, once learning rates are properly tuned, all methods achieve similar peak performance (within 1-2%), with only subtle rank-dependent behaviors. These results suggest that vanilla LoRA remains a competitive baseline and that improvements reported under a single training configuration may not reflect consistent methodological advantages. Finally, a second-order analysis attributes the differing optimal learning rate ranges to variations in the largest Hessian eigenvalue, aligning with classical learning theories.

2602.04663 2026-05-20 cs.LG cs.AI 版本更新

Rethinking the Design Space of Reinforcement Learning for Diffusion Models: On the Importance of Likelihood Estimation Beyond Loss Design

重新思考扩散模型强化学习的设计空间:超越损失设计的似然估计的重要性

Jaemoo Choi, Yuchen Zhu, Wei Guo, Petr Molodyk, Bo Yuan, Jinbin Bai, Yi Xin, Molei Tao, Yongxin Chen

发表机构 * Georgia Institute of Technology(佐治亚理工学院) National University of Singapore(新加坡国立大学) Nanjing university(南京大学)

AI总结 本文研究了扩散模型强化学习设计空间中的关键问题,通过分解策略梯度目标、似然估计器和回放采样方案三个因素,发现基于证据下界(ELBO)的模型似然估计器是实现有效、高效和稳定强化学习优化的主要因素,优于特定策略梯度损失函数的影响。

Comments 23 pages, 11 figures

详情
AI中文摘要

强化学习已被广泛应用于扩散和流模型,用于文本到图像生成等视觉任务。然而,这些任务仍然具有挑战性,因为扩散模型具有不可 tractable 的似然,这阻碍了直接应用流行策略梯度类型方法。现有方法主要集中在构建新的目标,这些目标基于已经高度工程化的LLM目标,并使用随意的似然估计器,而没有深入研究此类估计对整体算法性能的影响。在本文中,我们通过分解三个因素:i)策略梯度目标,ii)似然估计器,和iii)回放采样方案,对RL设计空间进行了系统分析。我们证明,采用基于证据下界(ELBO)的模型似然估计器,仅从最终生成的样本计算,是实现有效、高效和稳定RL优化的主要因素,其影响超过特定策略梯度损失函数的影响。我们通过SD 3.5 Medium在多个奖励基准上验证了我们的发现,并在所有任务中观察到一致的趋势。我们的方法在90个GPU小时内将GenEval得分从0.24提高到0.95,比FlowGRPO高效4.6倍,比无奖励黑客的SOTA方法DiffusionNFT高效2倍。

英文摘要

Reinforcement learning has been widely applied to diffusion and flow models for visual tasks such as text-to-image generation. However, these tasks remain challenging because diffusion models have intractable likelihoods, which creates a barrier for directly applying popular policy-gradient type methods. Existing approaches primarily focus on crafting new objectives built on already heavily engineered LLM objectives, using ad hoc estimators for likelihood, without a thorough investigation into how such estimation affects overall algorithmic performance. In this work, we provide a systematic analysis of the RL design space by disentangling three factors: i) policy-gradient objectives, ii) likelihood estimators, and iii) rollout sampling schemes. We show that adopting an evidence lower bound (ELBO) based model likelihood estimator, computed only from the final generated sample, is the dominant factor enabling effective, efficient, and stable RL optimization, outweighing the impact of the specific policy-gradient loss functional. We validate our findings across multiple reward benchmarks using SD 3.5 Medium, and observe consistent trends across all tasks. Our method improves the GenEval score from 0.24 to 0.95 in 90 GPU hours, which is $4.6\times$ more efficient than FlowGRPO and $2\times$ more efficient than the SOTA method DiffusionNFT without reward hacking.

2602.04555 2026-05-20 cs.LG 版本更新

Finding Structure in Continual Learning

在持续学习中寻找结构

Pourya Shamsolmoali, Masoumeh Zareapoor

AI总结 本文提出了一种基于Douglas-Rachford Splitting方法的持续学习框架,通过解耦的两个目标在稳定性和可塑性之间进行协商,实现了更高效且稳定的持续学习。

Comments There is a bug in the algorithm and implementation

详情
AI中文摘要

从一系列任务中学习通常面临可塑性与稳定性的矛盾:获取新知识往往导致对过去信息的灾难性遗忘。大多数方法通过求和竞争损失项来解决这一问题,产生梯度冲突,通常需要复杂的且效率低的策略如外部记忆回放或参数正则化来管理。我们提出了一种使用Douglas-Rachford Splitting(DRS)重新表述持续学习目标的方法。这种方法将学习过程重新表述为两个解耦目标之间的协商:一个促进新任务的可塑性,另一个确保旧知识的稳定性。通过迭代地通过其近端算子寻找共识,DRS提供了一种更加系统和稳定的持续学习动态。我们的方法在不需辅助模块或复杂附加组件的情况下实现了稳定性与可塑性之间的高效平衡,为持续学习系统提供了一种更简单却更强大的范式。

英文摘要

Learning from a stream of tasks usually pits plasticity against stability: acquiring new knowledge often causes catastrophic forgetting of past information. Most methods address this by summing competing loss terms, creating gradient conflicts that are managed with complex and often inefficient strategies such as external memory replay or parameter regularization. We propose a reformulation of the continual learning objective using Douglas-Rachford Splitting (DRS). This reframes the learning process not as a direct trade-off, but as a negotiation between two decoupled objectives: one promoting plasticity for new tasks and the other enforcing stability of old knowledge. By iteratively finding a consensus through their proximal operators, DRS provides a more principled and stable learning dynamic. Our approach achieves an efficient balance between stability and plasticity without the need for auxiliary modules or complex add-ons, providing a simpler yet more powerful paradigm for continual learning systems.

2602.02513 2026-05-20 cs.LG cond-mat.mtrl-sci 版本更新

Learning ORDER-Aware Multimodal Representations for Composite Materials Design

学习有序的多模态表示以进行复合材料设计

Xinyao Li, Hangwei Qian, Jingjing Li, Lei Zhu, Ivor Tsang

发表机构 * University of Electronic Science and Technology of China(电子科技大学) Tongji University(同济大学) A*STAR CFAR(新加坡A*STAR CFAR)

AI总结 本研究提出了一种基于有序性的多模态预训练框架ORDER,用于复合材料设计,通过整合异构数据源来捕捉纤维分布,从而在连续设计空间中实现有效的属性预测和微结构生成。

详情
AI中文摘要

人工智能在材料发现和性质预测中展现出显著的成功,尤其是在晶体和聚合物系统中,其中材料性质和结构主要由离散图表示主导。这种图中心范式在复合材料中失效,因为复合材料具有连续和非线性的设计空间。通用复合描述符,例如纤维体积和偏移角度,无法完全捕捉决定微结构特性的纤维分布,需要通过多模态学习整合异构数据源。现有的对齐导向框架在离散、唯一的图-性质映射假设下对大量晶体或聚合物数据有效,但在极端数据稀缺的情况下无法解决高度连续的复合设计空间。在本工作中,我们引入了ORDinal-aware imagE-tabulaR alignment(ORDER),一种多模态预训练框架,将有序性作为材料表示的核心原则。ORDER确保具有相似目标属性的材料在潜在空间中占据附近区域,这有效地保持了复合材料属性的连续性,并在稀疏观察设计之间实现了有意义的插值。我们评估了ORDER在纳米纤维增强复合材料数据集和碳纤维T700数据集上的表现。ORDER及其变体在属性预测、跨模态检索和微结构生成任务中均优于对齐导向和定制属性意识对比基线。我们进一步引入基于物理的有序替代信号,避免了预训练过程中需要完整的属性注释。我们的工作证明了学习连续多模态特征对于复合材料是基础性的,并提供了一条通往数据高效通用多模态智能系统可靠路径。

英文摘要

Artificial intelligence has shown remarkable success in materials discovery and property prediction, particularly for crystalline and polymer systems where material properties and structures are dominated by discrete graph representations. Such graph-central paradigm breaks down on composite materials, which possess continuous and nonlinear design spaces. General composite descriptors, e.g., fiber volume and misalignment angle, cannot fully capture the fiber distributions that determine microstructural characteristics, necessitating the integration of heterogeneous data sources through multimodal learning. Existing alignment-oriented frameworks have proven effective on abundant crystal or polymer data under discrete, unique graph-property mapping assumptions, but fail to address the highly continuous composite design space under extreme data scarcity. In this work we introduce ORDinal-aware imagE-tabulaR alignment (ORDER), a multimodal pretraining framework that establishes ordinality as a core principle for material representations. ORDER ensures that materials with similar target properties occupy nearby regions in the latent space, which effectively preserves the continuous nature of composite properties and enables meaningful interpolation between sparsely observed designs. We evaluate ORDER on a Nanofiber-reinforced composite dataset and a carbon fiber T700 dataset. ORDER and its variants outperform both alignment-oriented and customized property-aware contrastive baselines across property prediction, cross-modal retrieval, and microstructure generation tasks. We further introduce physics-based ordinal surrogate signals avoiding the need for full property annotation during pretrain. Our work demonstrates learning continuous multimodal features are fundamental for composite materials, and provides a reliable pathway toward data-efficient universal multimodal intelligent systems.

2601.22478 2026-05-20 cs.LG 版本更新

Transformation-Augmented GRPO for Enhancing Exploration in Reasoning of Large Language Models

增强推理探索的变换增强GRPO

Khiem Le, Phuc Nguyen, Youssef Mroueh, Chi-Heng Lin, Shangqian Gao, Ting Hua, Nitesh V. Chawla

发表机构 * University of Notre Dame(诺特大学) IBM Research(IBM研究院) Meta AI Florida State University(佛罗里达州立大学)

AI总结 本文提出变换增强GRPO(TA-GRPO)方法,通过问题重述解决大语言模型强化学习中梯度消失和多样性崩溃问题,提升模型在推理任务中的探索能力。

详情
AI中文摘要

组相对策略优化(GRPO)已成为在大语言模型中使用可验证奖励的强化学习主导方法,但其面临两个关键限制:梯度消失和多样性崩溃。当训练问题过于简单或过于困难时,所有采样响应获得相同奖励,导致梯度消失。同时,模型倾向于将响应集中于单一推理模式,而非探索多样化策略。我们提出变换增强GRPO(TA-GRPO),一种简单但有效的方法,通过问题重述解决这两个问题。对于每个训练问题,我们自动生成多个等价问题重述,改变用词、格式和信息顺序,同时保持底层含义。由于这些重述改变了模型感知的难度,池化原始问题及其重述的响应可获得混合奖励和更多多样化的推理路径。TA-GRPO联合计算此扩展响应集的优势,并将所有重要性比率对齐到原始问题,使模型能够从更丰富的解决方案尝试中学习。在四个LLM(Qwen3-1.7B,Qwen3-4B,Llama-3.2-1B,Llama-3.2-3B)上的实验表明,TA-GRPO在竞争级基准(AMC,OlympiadBench,AIME24,AIME25)和分布外基准(Minerva,GPQA-Diamond)上一致提升了pass@$k$。值得注意的是,TA-GRPO使Qwen3-1.7B和Qwen3-4B的平均pass@32分别提高了4.97和4.34个点,并与训练数据多达2.5倍的基线模型在探索质量上相当。

英文摘要

Group Relative Policy Optimization (GRPO) has become the dominant method for reinforcement learning with verifiable rewards in large language models, but it suffers from two critical limitations: gradient vanishing and diversity collapse. When training questions are too easy or too hard, all sampled responses receive identical rewards, yielding zero gradients. Meanwhile, the model tends to collapse its responses toward a single reasoning pattern rather than exploring diverse strategies. We propose Transformation-Augmented GRPO (TA-GRPO), a simple but effective method that addresses both issues via question rephrasing. For each training question, we automatically generate multiple problem-equivalent rephrasings that alter wording, format, and information order while preserving the underlying meaning. Because these rephrasings shift the model's perceived difficulty, pooling responses across the original and its rephrasings yields mixed rewards and more diverse reasoning paths. TA-GRPO jointly computes advantages over this expanded response set and aligns all importance ratios to the original question, enabling the model to learn from a richer set of solution attempts. Experiments on four LLMs (Qwen3-1.7B, Qwen3-4B, Llama-3.2-1B, Llama-3.2-3B) show that TA-GRPO consistently improves pass@$k$ on competition-level benchmarks (AMC, OlympiadBench, AIME24, AIME25) and out-of-distribution benchmarks (Minerva, GPQA-Diamond). Notably, it improves the average pass@32 of Qwen3-1.7B and Qwen3-4B by \textbf{4.97} and \textbf{4.34} points, respectively, and matches the exploration quality of baselines trained on up to 2.5$\times$ more data.

2601.21484 2026-05-20 cs.LG 版本更新

ETS: Energy-Guided Test-Time Scaling for Training-Free RL Alignment

ETS: 为无训练强化学习对齐的能耗引导测试时缩放

Xiuyu Li, Jinkai Zhang, Mingyang Yi, Yu Li, Longqiang Wang, Yue Wang, Ju Fan

发表机构 * Renmin University of China, Beijing, China(中国人民大学,北京,中国) Zhongguancun Academy, Beijing, China(中关村学院,北京,中国)

AI总结 本文提出了一种无需训练的推理方法,直接从最优强化学习策略中采样,通过结合参考策略模型和能耗项来改进掩码语言模型的过渡概率,并通过在线蒙特卡洛方法估计关键能耗项,从而提高生成质量。

Comments Accepted by ICML 2026

详情
AI中文摘要

强化学习(RL)在语言模型中的训练后对齐是有效的,但实际中也成本高且不稳定,这归因于其复杂的训练过程。为了解决这个问题,我们提出了一种无需训练的推理方法,直接从最优RL策略中采样。应用于掩码语言模型(MLM)的过渡概率由参考策略模型和一个能耗项组成。基于此,我们的算法,能耗引导测试时缩放(ETS),通过在线蒙特卡洛方法估计关键能耗项,具有可证明的收敛率。此外,为了确保实际效率,ETS利用现代加速框架以及定制的重要性采样估计器,显著减少推理延迟,同时可证明地保持采样质量。在MLM(包括自回归模型和扩散语言模型)上,通过推理、编码和科学基准测试,我们的ETS一致地提高了生成质量,验证了其有效性和设计。代码可在https://github.com/sheriyuo/ETS上获得。

英文摘要

Reinforcement Learning (RL) post-training alignment for language models is effective, but also costly and unstable in practice, owing to its complicated training process. To address this, we propose a training-free inference method to sample directly from the optimal RL policy. The transition probability applied to Masked Language Modeling (MLM) consists of a reference policy model and an energy term. Based on this, our algorithm, Energy-Guided Test-Time Scaling (ETS), estimates the key energy term via online Monte Carlo, with a provable convergence rate. Moreover, to ensure practical efficiency, ETS leverages modern acceleration frameworks alongside tailored importance sampling estimators, substantially reducing inference latency while provably preserving sampling quality. Experiments on MLM (including autoregressive models and diffusion language models) across reasoning, coding, and science benchmarks show that our ETS consistently improves generation quality, validating its effectiveness and design. The code is available at https://github.com/sheriyuo/ETS.

2601.20309 2026-05-20 cs.DC cs.AI cs.LG 版本更新

SuperInfer: SLO-Aware Rotary Scheduling and Memory Management for LLM Inference on Superchips

SuperInfer: 面向Superchips的SLO感知旋转调度与内存管理技术

Jiahuan Yu, Mingtao Hu, Zichao Lin, Minjia Zhang

发表机构 * Supercomputing-System-AI-Lab(超级计算系统人工智能实验室)

AI总结 针对LLM服务中严格延迟SLO与有限GPU内存容量之间的矛盾,SuperInfer提出了一种面向新兴Superchips的高性能LLM推理系统,通过NVLink-C2C实现紧密耦合的GPU-CPU架构,引入SLO感知的旋转调度器RotaSched和优化的旋转引擎DuplexKV,显著提升了TTFT SLO达成率。

Comments Accepted by MLSys '26

详情
AI中文摘要

Large Language Model (LLM) serving faces a fundamental tension between stringent latency Service Level Objectives (SLOs) and limited GPU memory capacity. When high request rates exhaust the KV cache budget, existing LLM inference systems often suffer severe head-of-line (HOL) blocking. While prior work explored PCIe-based offloading, these approaches cannot sustain responsiveness under high request rates, often failing to meet tight Time-To-First-Token (TTFT) and Time-Between-Tokens (TBT) SLOs. We present SuperInfer, a high-performance LLM inference system designed for emerging Superchips (e.g., NVIDIA GH200) with tightly coupled GPU-CPU architecture via NVLink-C2C. SuperInfer introduces RotaSched, the first proactive, SLO-aware rotary scheduler that rotates requests to maintain responsiveness on Superchips, and DuplexKV, an optimized rotation engine that enables full-duplex transfer over NVLink-C2C. Evaluations on GH200 using various models and datasets show that SuperInfer improves TTFT SLO attainment rates by up to 74.7% while maintaining comparable TBT and throughput compared to state-of-the-art systems, demonstrating that SLO-aware scheduling and memory co-design unlocks the full potential of Superchips for responsive LLM serving. Code is available in https://github.com/Supercomputing-System-AI-Lab/SuperInfer.

英文摘要

Large Language Model (LLM) serving faces a fundamental tension between stringent latency Service Level Objectives (SLOs) and limited GPU memory capacity. When high request rates exhaust the KV cache budget, existing LLM inference systems often suffer severe head-of-line (HOL) blocking. While prior work explored PCIe-based offloading, these approaches cannot sustain responsiveness under high request rates, often failing to meet tight Time-To-First-Token (TTFT) and Time-Between-Tokens (TBT) SLOs. We present SuperInfer, a high-performance LLM inference system designed for emerging Superchips (e.g., NVIDIA GH200) with tightly coupled GPU-CPU architecture via NVLink-C2C. SuperInfer introduces RotaSched, the first proactive, SLO-aware rotary scheduler that rotates requests to maintain responsiveness on Superchips, and DuplexKV, an optimized rotation engine that enables full-duplex transfer over NVLink-C2C. Evaluations on GH200 using various models and datasets show that SuperInfer improves TTFT SLO attainment rates by up to 74.7% while maintaining comparable TBT and throughput compared to state-of-the-art systems, demonstrating that SLO-aware scheduling and memory co-design unlocks the full potential of Superchips for responsive LLM serving. Code is available in https://github.com/Supercomputing-System-AI-Lab/SuperInfer.

2601.14234 2026-05-20 cs.LG cs.AI cs.RO stat.ML 版本更新

Q-learning with Adjoint Matching

具有伴随匹配的Q学习

Qiyang Li, Sergey Levine

发表机构 * UC Berkeley(加州大学伯克利分校)

AI总结 本文提出了一种基于时序差分的强化学习算法QAM,解决了连续动作强化学习中的长期挑战:高效优化表达性强的扩散或流匹配策略相对于参数化的Q函数。通过利用批评者的首阶信息进行有效优化,但直接通过反向传播其多步去噪过程进行梯度优化在数值上不稳定。现有方法通过仅使用价值和丢弃梯度信息或依赖近似方法牺牲策略的表达性或偏置学习策略。QAM通过利用生成建模中最近提出的技术伴随匹配,将批评者的动作梯度转换为逐步目标函数,避免了不稳定反向传播,同时在最优时提供无偏且表达性强的策略。结合时序差分备份进行批评者学习,QAM在离线和离线到在线强化学习的硬稀疏奖励任务中一致优于先前方法。

Comments 32 pages, 8 figures, 7 tables

详情
AI中文摘要

我们提出QAM,一种新颖的基于时序差分的强化学习(RL)算法,解决了连续动作RL中长期存在的挑战:高效优化表达性强的扩散或流匹配策略相对于参数化的Q函数。有效的优化需要利用批评者的首阶信息,但通过反向传播其多步去噪过程进行直接梯度优化在数值上不稳定。现有方法通过仅使用价值和丢弃梯度信息或依赖近似方法牺牲策略的表达性或偏置学习策略。QAM通过利用生成建模中最近提出的技术伴随匹配,将批评者的动作梯度转换为逐步目标函数,避免了不稳定反向传播,同时在最优时提供无偏且表达性强的策略。结合时序差分备份进行批评者学习,QAM在离线和离线到在线RL的硬稀疏奖励任务中一致优于先前方法。

英文摘要

We propose Q-learning with Adjoint Matching (QAM), a novel TD-based reinforcement learning (RL) algorithm that tackles a long-standing challenge in continuous-action RL: efficient optimization of an expressive diffusion or flow-matching policy with respect to a parameterized Q-function. Effective optimization requires exploiting the first-order information of the critic, but it is challenging to do so for flow or diffusion policies because direct gradient-based optimization via backpropagation through their multi-step denoising process is numerically unstable. Existing methods work around this either by only using the value and discarding the gradient information, or by relying on approximations that sacrifice policy expressivity or bias the learned policy. QAM sidesteps both of these challenges by leveraging adjoint matching, a recently proposed technique in generative modeling, which transforms the critic's action gradient to form a step-wise objective function that is free from unstable backpropagation, while providing an unbiased, expressive policy at the optimum. Combined with temporal-difference backup for critic learning, QAM consistently outperforms prior approaches on hard, sparse reward tasks in both offline and offline-to-online RL.

2601.12707 2026-05-20 cs.LG stat.ML 版本更新

Decoding Rewards in Competitive Games: Inverse Game Theory with Entropy Regularization

在竞争性游戏中解码奖励:带有熵正则化的逆向博弈论

Junyi Liao, Zihan Zhu, Ethan Fang, Zhuoran Yang, Vahid Tarokh

发表机构 * Department of Electrical and Computer Engineering, Duke University(杜克大学电气与计算机工程系) Department of Statistics and Data Science, University of Pennsylvania(宾夕法尼亚大学统计与数据科学系) Department of Biostatistics and Bioinformatics, Duke University(杜克大学生物统计与生物信息学系) Department of Statistics and Data Science, Yale University(耶鲁大学统计与数据科学系)

AI总结 本文研究了在竞争性游戏中通过逆向博弈论和熵正则化来恢复未知奖励函数的问题,提出了一种统一的框架,能够在静态和动态设置中学习奖励函数,并通过理论保证和数值实验验证了其有效性。

Comments Extended journal version of ICML 2025 paper. Submitted to Operations Research

详情
AI中文摘要

估计驱动智能体行为的未知奖励函数在逆向强化学习和博弈论中具有核心重要性。为解决这个问题,我们开发了一个统一的框架,用于在两名玩家零和矩阵博弈和马尔可夫博弈中恢复奖励函数,并通过熵正则化来重建给定观察到的玩家策略和动作的潜在奖励函数。这项任务具有挑战性,因为逆向问题固有的模糊性、可行奖励的非唯一性和观察数据覆盖的限制。为了解决这些挑战,我们利用线性假设在量级响应均衡(QRE)下建立了奖励函数的可识别性。在此理论基础上,我们提出了一种新的算法,从观察到的动作中学习奖励函数。我们的算法适用于静态和动态设置,并且可以适应不同方法,如最大似然估计(MLE)。我们为算法的可靠性和样本效率提供了强有力的理论保证。进一步,我们进行了广泛的数值研究,以证明所提出框架的实际有效性,为竞争环境中的决策提供了新的见解。

英文摘要

Estimating the unknown reward functions driving agents' behaviors is of central interest in inverse reinforcement learning and game theory. To tackle this problem, we develop a unified framework for reward function recovery in two-player zero-sum matrix games and Markov games with entropy regularization, where we aim to reconstruct the underlying reward functions given observed players' strategies and actions. This task is challenging due to the inherent ambiguity of inverse problems, the non-uniqueness of feasible rewards, and limited observational data coverage. To address these challenges, we establish the reward function's identifiability using the quantal response equilibrium (QRE) under linear assumptions. Building upon this theoretical foundation, we propose a novel algorithm to learn reward functions from observed actions. Our algorithm works in both static and dynamic settings and is adaptable to incorporate different methods, such as Maximum Likelihood Estimation (MLE). We provide strong theoretical guarantees for the reliability and sample efficiency of our algorithm. Further, we conduct extensive numerical studies to demonstrate the practical effectiveness of the proposed framework, offering new insights into decision-making in competitive environments.

2601.12238 2026-05-20 stat.ML cs.LG math.OC 版本更新

On the Provable Suboptimality of Momentum SGD in Nonstationary Stochastic Optimization

关于动量SGD在非平稳随机优化中的可证明次优性的研究

Sharan Sahu, Cameron J. Hogan, Martin T. Wells

发表机构 * Cornell University(康奈尔大学)

AI总结 本文研究了在强凸性和光滑性条件下,随机梯度下降及其动量变体(Polyak重球和Nesterov)在跟踪时间变化最优解时的性能,揭示了动量方法在分布偏移下导致的显式漂移放大惩罚,并证明了这种惩罚并非分析伪影,而是信息论障碍,为动量方法在动态环境中的经验不稳定性提供了理论依据。

Comments Accepted to ICML 2026. 75 pages, 5 figures, 4 tables

详情
AI中文摘要

在本文中,我们对随机梯度下降(SGD)及其动量变体(Polyak重球和Nesterov)在强凸性和光滑性条件下跟踪时间变化最优解进行了全面的理论分析。我们的有限时间界揭示了跟踪误差的尖锐分解,将其分为瞬态、噪声诱导和漂移诱导成分。这种分解揭示了一个根本性的权衡:虽然动量通常被用作梯度平滑启发式方法,但在分布偏移下,它会引入显式漂移放大惩罚,当动量参数β接近1时,该惩罚会发散,导致系统性的跟踪滞后。我们通过梯度变化约束下的最小最大下界补充这些上界,证明这种动量引起的跟踪惩罚并非分析伪影,而是信息论障碍:在漂移主导的 regime 中,动量不可避免地更差,因为旧梯度平均迫使系统性滞后。我们的结果为动量方法在动态环境中的经验不稳定性提供了理论依据,并精确界定了 vanilla SGD 在其加速变体中可证明表现更好的 regime 分界线。

英文摘要

In this paper, we provide a comprehensive theoretical analysis of Stochastic Gradient Descent (SGD) and its momentum variants (Polyak Heavy-Ball and Nesterov) for tracking time-varying optima under strong convexity and smoothness. Our finite-time bounds reveal a sharp decomposition of tracking error into transient, noise-induced, and drift-induced components. This decomposition exposes a fundamental trade-off: while momentum is often used as a gradient-smoothing heuristic, under distribution shift it incurs an explicit drift-amplification penalty that diverges as the momentum parameter $β$ approaches 1, yielding systematic tracking lag. We complement these upper bounds with minimax lower bounds under gradient-variation constraints, proving this momentum-induced tracking penalty is not an analytical artifact but an information-theoretic barrier: in drift-dominated regimes, momentum is unavoidably worse because stale-gradient averaging forces systematic lag. Our results provide theoretical grounding for the empirical instability of momentum in dynamic settings and precisely delineate regime boundaries where vanilla SGD provably outperforms its accelerated counterparts.

2512.23461 2026-05-20 cs.LG cs.AI 版本更新

Eliminating Inductive Bias in Reward Models with Information-Theoretic Guidance

通过信息论指导消除奖励模型中的归纳偏置

Zhuo Li, Pengyu Cheng, Zhechao Yu, Feifei Tong, Anningzhe Gao, Tsung-Hui Chang, Xiang Wan, Erchao Zhao, Xiaoxi Jiang, Guanjun Jiang

发表机构 * Qwen Large Model Application Team, Alibaba(阿里巴巴大模型应用团队) The Chinese University of Hong Kong, Shenzhen(香港中文大学(深圳)) Shenzhen Research Institute of Big Data(深圳大数据研究院)

AI总结 本文提出了一种基于信息论的奖励模型去偏方法DIR,通过最大化奖励模型评分与人类偏好对之间的互信息,同时最小化奖励模型输出与偏好输入偏置属性之间的互信息,从而有效缓解归纳偏置问题并提升RLHF性能。

Comments Published as a conference paper at The International Conference on Learning Representations (ICLR) 2026

详情
AI中文摘要

奖励模型(RMs)在人类反馈的强化学习(RLHF)中至关重要,用于将大型语言模型(LLMs)对齐于人类价值观。然而,RM训练数据通常被认为是低质量的,包含可能导致过拟合和奖励黑客的归纳偏置。例如,更详细和全面的响应通常更受人类青睐,但包含更多单词,导致响应长度成为不可避免的归纳偏置之一。有限的先前RM去偏方法要么针对单一特定类型的偏置,要么仅用简单的线性相关性建模,例如皮尔逊系数。为缓解奖励建模中更复杂和多样的归纳偏置,我们引入了一种新的信息论去偏方法,称为通过信息优化的奖励模型去偏(DIR)。受信息瓶颈(IB)的启发,我们最大化奖励模型评分与人类偏好对之间的互信息(MI),同时最小化奖励模型输出与偏好输入偏置属性之间的互信息。从信息论的理论依据出发,DIR能够处理更复杂的偏置类型,具有非线性相关性,从而广泛扩展了RM去偏方法在现实世界中的应用场景。在实验中,我们验证了DIR在三种归纳偏置类型(响应长度、奉承和格式)上的有效性。我们发现,DIR不仅有效缓解了目标归纳偏置,还通过多样化的基准测试提升了RLHF性能,展现出更好的泛化能力。代码和训练配方可在https://github.com/Qwen-Applications/DIR获取。

英文摘要

Reward models (RMs) are essential in reinforcement learning from human feedback (RLHF) to align large language models (LLMs) with human values. However, RM training data is commonly recognized as low-quality, containing inductive biases that can easily lead to overfitting and reward hacking. For example, more detailed and comprehensive responses are usually human-preferred but with more words, leading response length to become one of the inevitable inductive biases. A limited number of prior RM debiasing approaches either target a single specific type of bias or model the problem with only simple linear correlations, \textit{e.g.}, Pearson coefficients. To mitigate more complex and diverse inductive biases in reward modeling, we introduce a novel information-theoretic debiasing method called \textbf{D}ebiasing via \textbf{I}nformation optimization for \textbf{R}M (DIR). Inspired by the information bottleneck (IB), we maximize the mutual information (MI) between RM scores and human preference pairs, while minimizing the MI between RM outputs and biased attributes of preference inputs. With theoretical justification from information theory, DIR can handle more sophisticated types of biases with non-linear correlations, broadly extending the real-world application scenarios for RM debiasing methods. In experiments, we verify the effectiveness of DIR with three types of inductive biases: \textit{response length}, \textit{sycophancy}, and \textit{format}. We discover that DIR not only effectively mitigates target inductive biases but also enhances RLHF performance across diverse benchmarks, yielding better generalization abilities. The code and training recipes are available at https://github.com/Qwen-Applications/DIR.

2512.10891 2026-05-20 cs.RO cs.LG 版本更新

Iterative Compositional Data Generation for Robot Control

迭代组合数据生成用于机器人控制

Anh-Quan Pham, Marcel Hussing, Shubhankar P. Patankar, Dani S. Bassett, Jorge Mendez-Mendez, Eric Eaton

发表机构 * University of Pennsylvania(宾夕法尼亚大学) Stony Brook University(石溪大学)

AI总结 本文提出了一种语义组合扩散变换器,通过注意力机制学习机器人、物体、障碍物和目标特定组件的交互,从而在有限任务集上训练后,能够零样本生成高质量过渡,进而学习未见任务组合的控制策略,并通过迭代自我改进过程提升零样本性能。

详情
AI中文摘要

收集机器人操作数据成本高昂,使得在多对象、多机器人和多环境设置中获取大量任务演示不切实际。尽管最近的生成模型可以为单个任务合成有用的数据,但它们未能利用机器人领域的组合结构,并且在泛化到未见任务组合时表现不佳。我们提出了一种语义组合扩散变换器,将过渡分解为机器人、物体、障碍物和目标特定的组件,并通过注意力机制学习它们的交互。一旦在有限的任务子集上训练,我们展示了模型能够零样本生成高质量的过渡,从而学习未见任务组合的控制策略。然后,我们引入了一个迭代自我改进过程,其中合成数据通过离线强化学习验证,并纳入后续的训练轮次中。我们的方法在单体和硬编码组合基线之上显著提高了零样本性能,最终解决了几乎所有未见任务,并展示了学习表示中出现有意义的组合结构。

英文摘要

Collecting robotic manipulation data is expensive, making it impractical to acquire demonstrations for the combinatorially large space of tasks that arise in multi-object, multi-robot, and multi-environment settings. While recent generative models can synthesize useful data for individual tasks, they do not exploit the compositional structure of robotic domains and struggle to generalize to unseen task combinations. We propose a semantic compositional diffusion transformer that factorizes transitions into robot-, object-, obstacle-, and objective-specific components and learns their interactions through attention. Once trained on a limited subset of tasks, we show that our model can zero-shot generate high-quality transitions from which we can learn control policies for unseen task combinations. Then, we introduce an iterative self-improvement procedure in which synthetic data is validated via offline reinforcement learning and incorporated into subsequent training rounds. Our approach substantially improves zero-shot performance over monolithic and hard-coded compositional baselines, ultimately solving nearly all held-out tasks and demonstrating the emergence of meaningful compositional structure in the learned representations.

2512.05958 2026-05-20 cs.LG cs.AI 版本更新

MaxShapley: Towards Incentive-compatible Generative Search with Fair Context Attribution

MaxShapley:迈向具有公平上下文归因的激励兼容生成搜索

Sara Patel, Mingxun Zhou, Giulia Fanti

发表机构 * Carnegie Mellon University(卡内基梅隆大学) HKUST(香港科技大学)

AI总结 本文提出MaxShapley算法,用于在生成搜索流程中公平地归因和补偿内容提供者,该算法基于Shapley值的特例,通过可分解的max-sum效用函数在多项式时间内计算归因,相比Shapley值的指数成本具有更高的效率。

详情
AI中文摘要

基于大型语言模型(LLMs)的生成搜索引擎正在取代传统搜索引擎,从根本上改变了信息提供者如何获得补偿。为了维持这一生态系统,我们需要公平的机制来根据内容提供者对生成答案的贡献来归因和补偿。我们介绍了MaxShapley,一种高效的算法,用于在生成搜索流程中进行公平的信用归因,该流程在生成之前检索外部来源。MaxShapley是著名Shapley值的特例;它利用可分解的max-sum效用函数,在文档数量上以多项式时间计算归因,而不是Shapley值的指数成本。我们在三个多跳问答数据集(HotPotQA、MuSiQUE、MS MARCO)上评估MaxShapley;MaxShapley在归因质量上与精确的Shapley计算相当,同时消耗的资源更少——例如,在相同归因准确性下,它在资源消耗上比先前最先进的方法减少了高达9倍。我们发布了开源代码和重新校准的数据集。一个教育演示可在https://fair-search.com上获得。

英文摘要

Generative search engines based on large language models (LLMs) are replacing traditional search, fundamentally changing how information providers are compensated. To sustain this ecosystem, we need fair mechanisms to attribute and compensate content providers based on their contributions to generated answers. We introduce MaxShapley, an efficient algorithm for fair credit attribution in generative search pipelines that retrieve external sources before generation. MaxShapley is a special case of the celebrated Shapley value; it leverages a de-composable max-sum utility function to compute attributions with polynomial-time computation in the number of documents, as opposed to the exponential cost of Shapley values. We evaluate MaxShapley on three multi-hop QA datasets (HotPotQA, MuSiQUE, MS MARCO); MaxShapley achieves comparable attribution quality to exact Shapley computation, while consuming a fraction of its tokens--for instance, it gives up to a 9x reduction in resource consumption over prior state-of-the-art methods at the same attribution accuracy. We release open-source code and re-calibrated datasets. An educational demo is available at https://fair-search.com.

2512.05721 2026-05-20 cs.LG 版本更新

BERTO: Intent-Driven Network Time Series Forecasting via Natural Language Operator Preferences

BERTO:通过自然语言运算偏好进行意图驱动的网络时间序列预测

Nitin Priyadarshini Shankar, Vaibhav Singh, Sheetal Kalyani, Christian Maciocco

发表机构 * Intel Labs(英特尔实验室) Indian Institute of Technology Madras(印度理工学院马德拉斯分校)

AI总结 BERTO通过自然语言运算偏好进行意图驱动的网络时间序列预测,利用BERT框架实现交通预测和能耗优化,结合平衡损失函数和提示条件,使模型能够根据运营商需求动态调整预测偏差,实现灵活的决策感知预测。

Comments 7 pages, 3 figures, 2 tables

详情
AI中文摘要

传统的蜂窝交通预测模型优化于最小化对称误差,使其对操作优先级的变化不敏感。为弥合这一差距,我们引入BERTO,一种基于BERT的框架,用于蜂窝网络的交通预测和能耗优化。基于Transformer架构,BERTO在实现高预测精度的同时,通过自然语言运营商提示使单个微调模型能够在多个预测制度中运行。通过结合平衡损失函数(BLF)和基于提示的条件,BERTO能够根据运营商在节能和服务质量之间的权衡需求,自适应地调整预测偏差,向欠预测或过预测倾斜。这使得同一模型能够在不重新训练或修改模型参数的情况下,动态生成不同的决策感知预测。在真实世界数据集上的实验表明,BERTO可以在约1.4kW的功率消耗范围内运行,同时平衡9倍的服务级别协议(SLA)违规变化,使其非常适合智能RAN部署。

英文摘要

Traditional cellular traffic forecasting models are optimized for minimizing symmetric errors, leaving them indifferent to shifting operational priorities. To bridge this gap, we introduce BERTO, a BERT-based framework for traffic prediction and energy optimization in cellular networks. Built on transformer architectures, BERTO achieves high prediction accuracy while enabling a single fine-tuned model to operate across multiple forecasting regimes via natural-language operator prompts. By combining a Balancing Loss Function (BLF) with prompt-based conditioning, BERTO adaptively shifts its forecasting bias toward underprediction or overprediction depending on the operator's desired trade-off between power savings and service quality. This allows the same model to dynamically generate different decision-aware forecasts without retraining or modifying model parameters. Experiments on real-world datasets demonstrate that BERTO can operate across a flexible range of approximately 1.4 kW in power consumption while balancing 9x variation in service level agreement (SLA) violations, making it well suited for intelligent RAN deployments.

2512.04452 2026-05-20 physics.ao-ph cs.AI cs.LG physics.comp-ph physics.flu-dyn 版本更新

NORi: An ML-Augmented Ocean Boundary Layer Parameterization

NORi:一种融合机器学习的海洋边界层参数化方法

Xin Kai Lee, Ali Ramadhan, Andre Souza, Gregory LeClaire Wagner, Simone Silvestri, John Marshall, Raffaele Ferrari

发表机构 * Department of Earth, Atmospheric and Planetary Sciences, Massachusetts Institute of Technology(麻省理工学院地球、大气与行星科学系) Center for Computational Science and Engineering, Massachusetts Institute of Technology(麻省理工学院计算科学与工程中心) Department of Physics, Imperial College London(伦敦帝国学院物理系) atdepth Aeolus Labs(Aeolus实验室) Department of Environment, Land and Infrastructure Engineering, Politecnico di Torino(托里诺理工学院环境、土地与基础设施工程系)

AI总结 NORi是一种基于物理并结合神经网络的机器学习海洋边界层湍流参数化方法,通过训练大规模涡旋模拟来捕捉边界层底部的混合过程,展示了在不同对流强度、背景层结、旋转和风力作用下的预测和泛化能力。

Comments 58 pages, 20 figures, submitted to Journal of Advances in Modeling Earth Systems (JAMES). This is version 2, updated based on reviews from 3 anonymous reviewers after initial submission to JAMES. The largest change from the previous version is the addition of comparisons with realistic observations from a long-term monitoring site in the Northeast Pacific

详情
AI中文摘要

NORi是一种基于物理并结合神经网络的机器学习海洋边界层湍流参数化方法。NORi代表神经普通微分方程(NODEs)里氏数(Ri)闭合。物理参数化通过依赖里氏数的扩散率和粘度进行控制。神经ODEs被训练以捕捉通过边界层底部的混合过程,这无法通过局部扩散闭合来表示。参数化通过大规模涡旋模拟以“后验”方式训练,其中参数通过一个显式依赖于实际时间积分变量的损失函数进行校准,而不是瞬时子格尺度通量,后者本质上是嘈杂的。NORi通过设计保留踪迹,使用现实的非线性热力学,并在不同对流强度、背景层结、旋转和风力作用下表现出卓越的预测和泛化能力。NORi在Ocean Weather Station Papa处模拟了边界层的季节演变,其性能与最先进的两方程k-ε闭合相当。当在双环流模拟中实现时,尽管仅在两天时间范围内训练,它在至少100年内数值上是稳定的,可以以一小时的时间步长运行。高度表达性的神经网络与严格的物理基础闭合相结合,证明了在气候模型中设计参数化的稳健范式:所需数据和训练成本大大减少,推理性能可以作为主要目标直接优化,数值稳定性通过训练隐含地得到促进。

英文摘要

NORi is a machine learning (ML) parameterization of ocean boundary layer turbulence that is physics-based and augmented with neural networks. NORi stands for neural ordinary differential equations (NODEs) Richardson number (Ri) closure. The physical parameterization is controlled by a Richardson number-dependent diffusivity and viscosity. The neural ODEs are trained to capture the entrainment through the base of the boundary layer, which cannot be represented with a local diffusive closure. The parameterization is trained using large-eddy simulations in an "a posteriori" fashion, where parameters are calibrated with a loss function that explicitly depends on the actual time-integrated variables of interest rather than the instantaneous subgrid fluxes, which are inherently noisy. NORi conserves tracers by design, uses realistic nonlinear thermodynamics, and demonstrates excellent prediction and generalization capabilities in capturing entrainment dynamics under different convective strengths, background stratifications, rotation, and wind forcings. NORi is shown to simulate the seasonal evolution of the boundary layer at Ocean Weather Station Papa with similar performance to the state-of-the-art two-equation $k$-$ε$ closure. When implemented in a double-gyre simulation, it is numerically stable for at least 100 years, despite only being trained on two-day horizons, and can be run with time steps as long as one hour. The highly expressive neural networks, combined with a physically rigorous base closure, prove to be a robust paradigm for designing parameterizations for climate models: data required and training cost are drastically reduced, inference performance can be directly optimized as a primary objective, and numerical stability is implicitly promoted through training.

2512.01152 2026-05-20 cs.LG cs.AI cs.CV 版本更新

Open-Set Domain Adaptation Under Background Distribution Shift: Challenges and A Provably Efficient Solution

开放集域适应在背景分布偏移下的挑战:挑战与一种可证明高效的解决方案

Shravan Chaudhari, Yoav Wald, Suchi Saria

发表机构 * Department of Computer Science, Johns Hopkins University(约翰霍普金斯大学计算机科学系) Faculty of Data and Decision Sciences, Technion(技术学院数据与决策科学学院) Center for Data Science, New York University(纽约大学数据科学中心) Bayesian Health(贝叶斯健康)

AI总结 本文研究了在背景分布偏移情况下开放集域适应的挑战,并提出了一种可证明高效的解决方案CoLOR,通过理论分析和实验证明其在简化过参数化设置中优于基线方法,同时展示了其在图像和文本数据上的广泛适用性。

Comments Project page at https://github.com/Shra1-25/CoLOR

详情
Journal ref
Transactions on Machine Learning Research (TMLR) 2026/May ISSN: 2835-8856
AI中文摘要

随着我们将机器学习系统部署到现实世界中,一个核心挑战是保持模型在数据偏移时的性能。这种偏移可以以多种形式存在:新类可能在训练时不存在,这被称为开放集识别,以及已知类别的分布可能发生变化。对于开放集识别的保证大多基于假设已知类别的分布(我们称之为背景分布)是固定的。在本文中,我们开发了CoLOR,一种在挑战性情况下(即背景分布偏移)也能解决开放集识别的方法。我们证明该方法在温和假设下有效,即新类可与非新类分离,并提供理论保证,表明其在简化过参数化设置中优于代表基线方法。我们开发了使CoLOR可扩展和稳健的技术,并在图像和文本数据上进行了全面的实证评估。结果表明,CoLOR在背景偏移下显著优于现有开放集识别方法。此外,我们还提供了新的见解,探讨了诸如新类大小等因素对性能的影响,这在先前工作中尚未得到广泛探索。

英文摘要

As we deploy machine learning systems in the real world, a core challenge is to maintain a model that is performant even as the data shifts. Such shifts can take many forms: new classes may emerge that were absent during training, a problem known as open-set recognition, and the distribution of known categories may change. Guarantees on open-set recognition are mostly derived under the assumption that the distribution of known classes, which we call the background distribution, is fixed. In this paper we develop CoLOR, a method that is guaranteed to solve open-set recognition even in the challenging case where the background distribution shifts. We prove that the method works under benign assumptions that the novel class is separable from the non-novel classes, and provide theoretical guarantees that it outperforms a representative baseline in a simplified overparameterized setting. We develop techniques to make CoLOR scalable and robust, and perform comprehensive empirical evaluations on image and text data. The results show that CoLOR significantly outperforms existing open-set recognition methods under background shift. Moreover, we provide new insights into how factors such as the size of the novel class influences performance, an aspect that has not been extensively explored in prior work.

2511.12158 2026-05-20 cs.LG 版本更新

Data-Efficient Self-Supervised Algorithms for Fine-Grained Birdsong Analysis

用于细粒度鸟类叫声分析的数据高效自监督算法

Houtan Ghaffari, Lukas Rauch, Paul Devos

发表机构 * Department of Information Technology, Ghent University(根特大学信息科技系) Intelligent Embedded Systems, University of Kassel(卡塞尔大学智能嵌入式系统)

AI总结 本文提出了一种数据高效的鸟类叫声标注器,通过三阶段训练流程在最小标注情况下开发可靠的鸟类叫声音节检测器,并在极端标注稀缺场景下验证了其有效性,同时评估了自监督嵌入在线性探测和无监督鸟类叫声分析中的潜力。

详情
AI中文摘要

生物声学、神经科学和语言学研究经常使用鸟类叫声作为代理来获取跨不同领域的知识。这需要音频模型能够标注和解析鸟类叫声。开发此类模型需要精确的、音节级注释的训练数据。因此,减少标注成本的自动化方法需求迫切。本文提出了一种数据高效的鸟类叫声标注器,称为残差多层感知机递归神经网络。然后,本文提出了一个三阶段训练流程,以在最小标注情况下开发可靠的鸟类叫声音节检测器。第一阶段是从未标注数据中进行自监督学习。探索了两种最成功的预训练范式,即掩码预测和在线聚类。第二阶段是使用有效的数据增强进行监督训练,以为每个个体生成稳健的帧级音节检测器。第三阶段是一个半监督的后训练步骤,利用未标注数据来优化每个个体的模型。该方法在极端标注稀缺场景下对金翅雀叫声进行了验证。从信号处理的角度来看,金翅雀叫声表现出最具有挑战性的频谱-时间模式之一,对于算法时间序列标注而言:快速的发声、短暂的音节间间隔、快速且宽带的频率扫频,以及需要细粒度特征区分的光谱相似音节。因此,成功的金翅雀音节检测算法为其他鸟类建立了稳健的基准。这种方法论的泛化在白喉歌鸲叫声标注的案例研究中得到了验证。最后,评估了自监督嵌入在线性探测和无监督鸟类叫声分析中的潜力。

英文摘要

Research in bioacoustics, neuroscience, and linguistics often uses birdsong as a proxy to acquire knowledge across diverse areas. This requires audio models to annotate and parse the birdsong. Developing such models requires precise, syllable-level annotated training data. Therefore, automated methods that reduce annotation costs are in demand. This work presents a data-efficient birdsong annotator called Residual Multi-Layer Perceptron Recurrent Neural Network. It then presents a three-stage training pipeline for developing reliable birdsong syllable detectors with minimal annotation. The first stage is self-supervised learning from unlabeled data. Two of the most successful pretraining paradigms are explored, namely, masked prediction and online clustering. The second stage is supervised training with effective data augmentation to produce a robust frame-level syllable detector for each individual. The third stage is a semi-supervised post-training step that refines each individual's model using unlabeled data. The effectiveness of this approach is demonstrated for the Canary song in extreme label-scarcity scenarios. From a signal-processing perspective, the Canary song exhibits one of the most challenging spectro-temporal patterns for algorithmic time-series annotation: rapid vocalizations, brief inter-syllabic intervals, fast and broadband frequency sweeps, and spectrally similar syllables that require fine-grained features to distinguish. Hence, a successful syllable detection algorithm for Canary also establishes a robust baseline for other birds. This methodological generalization is validated in a case study of Bengalese Finch song annotation. Finally, the potential of self-supervised embeddings is assessed for linear probing and unsupervised birdsong analysis.

2511.11688 2026-05-20 cs.LG cs.CV 版本更新

Hierarchical Schedule Optimization for Fast and Robust Diffusion Model Sampling

分层调度优化用于快速且稳健的扩散模型采样

Aihua Zhu, Rui Su, Qinglin Zhao, Li Feng, Meng Shen, Shibo He

发表机构 * School of Computer Science and Engineering, Macau University of Science and Technology(澳门科学技术大学计算机科学与工程学院) Beijing Institute of Technology(北京理工大学) Zhejiang University(浙江大学)

AI总结 本文提出了一种分层调度优化方法,通过改进的双层优化框架,在极低的函数评估次数下实现高效的扩散模型采样,显著提升了样本质量和计算效率。

Comments Preprint, accepted to AAAI 2026

详情
AI中文摘要

扩散概率模型在生成保真度方面设立了新标准,但受到采样过程缓慢的迭代限制。一种强大的无训练策略是调度优化,旨在在固定的、较小的函数评估次数(NFE)下找到最优的时间步分布以最大化样本质量。为此,成功的调度优化方法必须遵循四个核心原则:有效性、适应性、实用性鲁棒性和计算效率。然而,现有方法难以同时满足这些原则,推动了更先进解决方案的需求。为克服这些限制,我们提出了分层调度优化器(HSO),一种新颖且高效的双层优化框架。HSO通过交替迭代两个协同层级将全局最优调度的搜索转化为更可处理的问题:上层的全局搜索用于寻找最优初始化策略,下层的局部优化用于调度细化。这一过程由两个关键创新引导:中点误差代理(MEP),一种求解器无关且数值稳定的局部优化目标,以及间距惩罚适应度(SPF)函数,通过惩罚病态接近的时间步确保实用性鲁棒性。大量实验表明,HSO在极低NFE范围内为无训练采样设定了新的状态-of-the-art。例如,仅使用5次NFE,HSO在LAION-Aesthetics上实现显著的FID为11.94,使用Stable Diffusion v2.1。关键的是,这种性能不是通过昂贵的重新训练实现的,而是一次性的优化成本不到8秒,提供了一种高效且实用的扩散模型加速范式。

英文摘要

Diffusion probabilistic models have set a new standard for generative fidelity but are hindered by a slow iterative sampling process. A powerful training-free strategy to accelerate this process is Schedule Optimization, which aims to find an optimal distribution of timesteps for a fixed and small Number of Function Evaluations (NFE) to maximize sample quality. To this end, a successful schedule optimization method must adhere to four core principles: effectiveness, adaptivity, practical robustness, and computational efficiency. However, existing paradigms struggle to satisfy these principles simultaneously, motivating the need for a more advanced solution. To overcome these limitations, we propose the Hierarchical-Schedule-Optimizer (HSO), a novel and efficient bi-level optimization framework. HSO reframes the search for a globally optimal schedule into a more tractable problem by iteratively alternating between two synergistic levels: an upper-level global search for an optimal initialization strategy and a lower-level local optimization for schedule refinement. This process is guided by two key innovations: the Midpoint Error Proxy (MEP), a solver-agnostic and numerically stable objective for effective local optimization, and the Spacing-Penalized Fitness (SPF) function, which ensures practical robustness by penalizing pathologically close timesteps. Extensive experiments show that HSO sets a new state-of-the-art for training-free sampling in the extremely low-NFE regime. For instance, with an NFE of just 5, HSO achieves a remarkable FID of 11.94 on LAION-Aesthetics with Stable Diffusion v2.1. Crucially, this level of performance is attained not through costly retraining, but with a one-time optimization cost of less than 8 seconds, presenting a highly practical and efficient paradigm for diffusion model acceleration.

2511.06714 2026-05-20 eess.SY cs.LG cs.SY 版本更新

The Wisdom of the Crowd: High-Fidelity Classification of Cyber-Attacks and Faults in Power Systems Using Ensemble and Machine Learning

人群智慧:利用集成和机器学习实现电力系统中网络攻击和故障的高保真分类

Emad Abukhousa, Syed Sohail Feroz Syed Afroz, Fahad Alsaeed, Abdulaziz Qwbaiban, Saman Zonouz, A. P. Sakis Meliopoulos

发表机构 * School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA(电气与计算机工程学院,佐治亚理工学院,美国亚特兰大,GA)

AI总结 本文提出了一种高保真评估框架,利用电磁暂态仿真与数字变电站仿真在4.8kHz下评估基于机器学习的网络攻击和物理故障分类方法,通过训练12种机器学习模型并在实时流环境中评估,展示了在流式环境中MLP的鲁棒覆盖性和集成模型的异常精度。

详情
AI中文摘要

本文提出了一种高保真评估框架,用于利用电磁暂态仿真与数字变电站仿真在4.8kHz下评估基于机器学习的网络攻击和物理故障分类方法。十二种机器学习模型,包括集成算法和多层感知机(MLP),在标记的时间域测量上进行训练,并在设计用于子周期响应的实时流环境中进行评估。该架构集成了周期长度平滑滤波器和置信度阈值以稳定决策。结果表明,尽管几种模型在离线准确性方面接近完美(高达99.9%),但只有MLP在流式环境中保持了稳健的覆盖率(98-99%),而集成模型保持了完美的异常精度,但经常回避(10-49%覆盖)。这些发现表明,仅凭离线准确性本身是不可靠的,强调了需要现实的测试和推理管道以确保在基于逆变器资源(IBR)丰富的网络中的可靠分类。

英文摘要

This paper presents a high-fidelity evaluation framework for machine learning (ML)-based classification of cyber-attacks and physical faults using electromagnetic transient simulations with digital substation emulation at 4.8 kHz. Twelve ML models, including ensemble algorithms and a multi-layer perceptron (MLP), were trained on labeled time-domain measurements and evaluated in a real-time streaming environment designed for sub-cycle responsiveness. The architecture incorporates a cycle-length smoothing filter and confidence threshold to stabilize decisions. Results show that while several models achieved near-perfect offline accuracies (up to 99.9%), only the MLP sustained robust coverage (98-99%) under streaming, whereas ensembles preserved perfect anomaly precision but abstained frequently (10-49% coverage). These findings demonstrate that offline accuracy alone is an unreliable indicator of field readiness and underscore the need for realistic testing and inference pipelines to ensure dependable classification in inverter-based resources (IBR)-rich networks.

2511.06077 2026-05-20 cs.LG cs.IR 版本更新

Make It Long, Keep It Fast: End-to-End 10K Long User Behavior Sequence Modeling for Billion-Scale Douyin Recommendation

让序列变长,让速度保持快速:面向十万个用户行为序列的端到端推荐系统

Lin Guan, Jia-Qi Yang, Zhishan Zhao, Beichuan Zhang, Bo Sun, Xuanyuan Luo, Jinan Ni, Xiaowen Li, Yuhang Qi, Zhifang Fan, Hangyu Wang, Qiwei Chen, Yi Cheng, Feng Zhang, Xiao Yang

发表机构 * ByteDance Beijing China(字节跳动北京中国) ByteDance Shanghai China(字节跳动上海中国) ByteDance San Jose CA USA(字节跳动加州圣何塞 USA) ByteDance Hangzhou Zhejiang China(字节跳动杭州浙江中国)

AI总结 本文提出了一种端到端的推荐系统,能够处理长达10000个用户行为序列,通过引入堆叠的目标到历史交叉注意力机制、请求级别批量处理策略以及长度外推训练策略,实现了在大规模Douyin推荐中的高效长序列建模。

Comments WWW 2026. This work studies end-to-end 10K-scale long user behavior sequence modeling for billion-scale industrial recommendation on Douyin

详情
AI中文摘要

像Douyin这样的短视频推荐系统必须在不牺牲延迟或成本预算的前提下利用极其长的用户行为历史。我们提出了一种端到端的工业推荐系统,将长序列推荐建模扩展到10000长度的历史记录。首先,我们引入了堆叠的目标到历史交叉注意力(STCA),通过用目标到历史的堆叠交叉注意力替代历史自注意力,将复杂度从二次方降低到线性,从而在长用户行为序列上实现高效的端到端训练。其次,我们提出了请求级别批量处理(RLB),一种以用户为中心的批量方案,将相同用户/请求的多个目标聚合起来共享用户侧编码,显著降低了与序列相关的存储、通信和计算成本,而无需改变学习目标。第三,我们设计了一种长度外推训练策略——在较短的窗口上训练,在更长的窗口上推断——从而使模型能够泛化到10000规模的历史记录而无需额外的训练成本。在离线和在线实验中,我们观察到随着历史长度和模型容量的增加,我们获得的收益是可预测且单调的,与在大型语言模型中观察到的扩展定律行为相呼应。在Douyin全流量部署中,我们的系统在关键参与度指标上实现了显著提升,同时满足了生产延迟,展示了将端到端超长序列推荐扩展到10000规模的实用路径。

英文摘要

Short-video recommenders such as Douyin must exploit extremely long user behavior histories without breaking latency or cost budgets. We present an end-to-end industrial recommender system that scales long-sequence recommendation modeling to 10K-length histories in production. First, we introduce Stacked Target-to-History Cross Attention (STCA), which replaces history self-attention with stacked cross-attention from the target to the history, reducing complexity from quadratic to linear in sequence length and enabling efficient end-to-end training over long user behavior sequences. Second, we propose Request Level Batching (RLB), a user-centric batching scheme that aggregates multiple targets for the same user/request to share the user-side encoding, substantially lowering sequence-related storage, communication, and compute without changing the learning objective. Third, we design a length-extrapolative training strategy -- train on shorter windows, infer on much longer ones -- so the model generalizes to 10K-scale histories without additional training cost. Across offline and online experiments, we observe predictable, monotonic gains as we scale history length and model capacity, mirroring the scaling law behavior observed in large language models. Deployed at full traffic on Douyin, our system delivers significant improvements on key engagement metrics while meeting production latency, demonstrating a practical path to scaling end-to-end ultra-long sequence recommendation to the 10K regime.

2511.01126 2026-05-20 cs.LG cs.NA math.NA math.OC math.ST stat.TH 版本更新

Stochastic Regret Guarantees for Online Zeroth- and First-Order Bilevel Optimization

在线零阶和一阶双层优化的随机遗憾保证

Parvin Nazari, Bojian Hou, Davoud Ataee Tarzanagh, Li Shen, George Michailidis

发表机构 * Amirkabir University of Technology(阿姆斯泰尔大学) University of Pennsylvania(宾夕法尼亚大学) Samsung SDS Research America(三星SDS美国研究部) University of California, Los Angeles(加州大学洛杉矶分校)

AI总结 本文提出了一种新的搜索方向,证明了利用该方向的零阶和一阶随机在线双层优化算法能够在不使用窗口平滑的情况下实现亚线性随机双层遗憾。此外,该框架通过减少超梯度估计中的oracle依赖、同时更新内层和外层变量以及使用基于零阶的Hessian、雅可比和梯度估计来提高效率。

Comments Published at NeurIPS 2025

详情
AI中文摘要

在线双层优化(OBO)是一种强大的框架,用于解决机器学习问题,其中外层和内层目标随时间演变,需要动态更新。当前的OBO方法依赖于确定性的窗口平滑后悔最小化,这在函数变化迅速时可能无法准确反映系统性能。在本文中,我们引入了一种新的搜索方向,并证明利用该方向的零阶和一阶随机OBO算法能够在不使用窗口平滑的情况下实现亚线性随机双层遗憾。除了这些保证外,我们的框架通过以下方式提高效率:(i)减少超梯度估计中的oracle依赖,(ii)在求解线性系统的同时更新内层和外层变量,(iii)使用基于零阶的Hessian、雅可比和梯度估计。在在线参数损失调谐和黑盒对抗攻击的实验中验证了我们的方法。

英文摘要

Online bilevel optimization (OBO) is a powerful framework for machine learning problems where both outer and inner objectives evolve over time, requiring dynamic updates. Current OBO approaches rely on deterministic \textit{window-smoothed} regret minimization, which may not accurately reflect system performance when functions change rapidly. In this work, we introduce a novel search direction and show that both first- and zeroth-order (ZO) stochastic OBO algorithms leveraging this direction achieve sublinear {stochastic bilevel regret without window smoothing}. Beyond these guarantees, our framework enhances efficiency by: (i) reducing oracle dependence in hypergradient estimation, (ii) updating inner and outer variables alongside the linear system solution, and (iii) employing ZO-based estimation of Hessians, Jacobians, and gradients. Experiments on online parametric loss tuning and black-box adversarial attacks validate our approach.

2510.23507 2026-05-20 cs.LG cs.AI cs.IT math.IT 版本更新

A Deep Latent Factor Graph Clustering with Fairness-Utility Trade-off Perspective

具有公平性-效用权衡视角的深度潜在因子图聚类

Siamak Ghodsi, Amjad Seyedi, Tai Le Quy, Fariba Karimi, Eirini Ntoutsi

发表机构 * L3S Research Center(L3S研究所以) University of Mons(蒙斯大学) University of Koblenz(科布伦茨大学) Bundeswehr University(联邦国防军大学)

AI总结 本文提出DFNMF,一种针对图的端到端深度非负三因子分解方法,通过软统计平衡正则化直接优化聚类分配,以实现公平性与效用的平衡,同时在合成和真实网络中表现出更高的群体平衡性和更高的模ularity。

Comments Accepted to IEEE Big-Data 2025 main research track. The paper is 10 main pages and 4 pages of Appendix

详情
Journal ref
2025 IEEE International Conference on Big Data (BigData)
AI中文摘要

公平图聚类旨在找到尊重网络结构的同时保持敏感群体比例的划分,应用范围涵盖社区检测、团队组建、资源分配和社会网络分析。许多现有方法强制性约束或依赖多阶段流程(例如谱嵌入后接k-均值),限制了权衡控制、可解释性和可扩展性。我们引入DFNMF,一种针对图的端到端深度非负三因子分解方法,直接优化聚类分配,使用软统计平衡正则化。单个参数λ调节公平性-效用平衡,非负性产生部分因子和透明的软成员资格。优化使用稀疏友好的交替更新,与边数成近线性比例。在合成和真实网络中,DFNMF在可比的模ularity下实现了显著更高的群体平衡,经常在帕累托前沿上超越最先进基线。代码可在https://github.com/SiamakGhodsi/DFNMF.git获得。

英文摘要

Fair graph clustering seeks partitions that respect network structure while maintaining proportional representation across sensitive groups, with applications spanning community detection, team formation, resource allocation, and social network analysis. Many existing approaches enforce rigid constraints or rely on multi-stage pipelines (e.g., spectral embedding followed by $k$-means), limiting trade-off control, interpretability, and scalability. We introduce \emph{DFNMF}, an end-to-end deep nonnegative tri-factorization tailored to graphs that directly optimizes cluster assignments with a soft statistical-parity regularizer. A single parameter $λ$ tunes the fairness--utility balance, while nonnegativity yields parts-based factors and transparent soft memberships. The optimization uses sparse-friendly alternating updates and scales near-linearly with the number of edges. Across synthetic and real networks, DFNMF achieves substantially higher group balance at comparable modularity, often dominating state-of-the-art baselines on the Pareto front. The code is available at https://github.com/SiamakGhodsi/DFNMF.git.

2510.20035 2026-05-20 stat.ME cs.LG 版本更新

Throwing Vines at the Wall: Structure Learning via Random Search

向墙上投掷藤蔓:通过随机搜索进行结构学习

Thibault Vatter, Thomas Nagler

发表机构 * University of Applied Sciences Western Switzerland(应用科学西瑞士大学) LMU Munich(慕尼黑大学) Munich Center for Machine Learning(慕尼黑机器学习中心)

AI总结 本文提出基于模型置信集的统计框架和随机搜索算法,以改进结构选择,提供理论保证,并为集成学习奠定基础。

详情
AI中文摘要

Vine copulas 提供了灵活的多变量依赖建模,并在机器学习中被广泛应用。然而,结构学习仍然是一个关键挑战。早期的启发式方法,如 Dissmann 的贪心算法,仍被视为金标准,但往往效果不佳。我们提出随机搜索算法和基于模型置信集的统计框架,以改进结构选择,提供对选择概率和超额风险的理论保证,并为集成学习奠定基础。在真实世界数据集上的实验证明,我们的方法在各方面都优于最先进的方法。

英文摘要

Vine copulas offer flexible multivariate dependence modeling and have become widely used in machine learning. Yet, structure learning remains a key challenge. Early heuristics, such as Dissmann's greedy algorithm, are still considered the gold standard but are often suboptimal. We propose random search algorithms and a statistical framework based on model confidence sets, to improve structure selection, provide theoretical guarantees on selection probabilities and excess risk, as well as serve as a foundation for ensembling. Empirical results on real-world data sets show that our methods consistently outperform state-of-the-art approaches.

2510.19382 2026-05-20 stat.ML cs.LG 版本更新

A Derandomization Framework for Structure Discovery: Applications in Neural Networks and Beyond

一种用于结构发现的去随机化框架:应用于神经网络及其他领域

Nikos Tsikouras, Yorgos Pantis, Ioannis Mitliagkas, Christos Tzamos

发表机构 * National and Kapodistrian University of Athens(希腊国家与卡波迪斯蒂亚纳大学) Archimedes, Athena Research Center(阿提卡研究中心) Mila & Université de Montréal(蒙特利尔大学)

AI总结 本文研究了神经网络中特征学习动态的理解问题,提出了一种基于去随机化方法的结构发现框架,在更弱的假设下探讨了结构发现的本质及其在MAXCUT端到端近似和Johnson-Lindenstrauss嵌入计算中的应用。

详情
AI中文摘要

理解神经网络中特征学习动态的机制仍然是一个重大挑战。Mousavi-Hosseini等人(2023)分析了多重索引教师-学生设置,并展示了在使用随机梯度下降(SGD)和强正则化器训练时,两层学生模型的第一层权重会呈现低秩结构。这种结构特性已知可以减少泛化样本复杂度。在第二步中,同一作者们在额外假设下建立了算法特定的学习保证。本文专注于结构发现方面,并在更弱的假设下研究了该问题,具体包括:允许任意大小和深度的神经网络,所有参数可训练,任何平滑损失函数,微弱正则化,以及通过任何能够达到二阶平稳点(SOSP)的方法(例如扰动梯度下降(PGD))进行训练。我们方法的核心是一个关键的去随机化引理,该引理指出在温和条件下,优化函数E_x[g_θ(Wx + b)]会收敛到W=0的点。该引理的本质直接解释了结构发现,并在其他领域如端到端MAXCUT近似和Johnson-Lindenstrauss嵌入计算中具有即时应用。

英文摘要

Understanding the dynamics of feature learning in neural networks (NNs) remains a significant challenge. The work of (Mousavi-Hosseini et al., 2023) analyzes a multiple index teacher-student setting and shows that a two-layer student attains a low-rank structure in its first-layer weights when trained with stochastic gradient descent (SGD) and a strong regularizer. This structural property is known to reduce sample complexity of generalization. Indeed, in a second step, the same authors establish algorithm-specific learning guarantees under additional assumptions. In this paper, we focus exclusively on the structure discovery aspect and study it under weaker assumptions, more specifically: we allow (a) NNs of arbitrary size and depth, (b) with all parameters trainable, (c) under any smooth loss function, (d) tiny regularization, and (e) trained by any method that attains a second-order stationary point (SOSP), e.g.\ perturbed gradient descent (PGD). At the core of our approach is a key $\textit{derandomization}$ lemma, which states that optimizing the function $\mathbb{E}_{\mathbf{x}} \left[g_θ(\mathbf{W}\mathbf{x} + \mathbf{b})\right]$ converges to a point where $\mathbf{W} = \mathbf{0}$, under mild conditions. The fundamental nature of this lemma directly explains structure discovery and has immediate applications in other domains including an end-to-end approximation for MAXCUT, and computing Johnson-Lindenstrauss embeddings.

2510.18821 2026-05-20 cs.LG 版本更新

Search Self-play: Pushing the Frontier of Agent Capability without Supervision

搜索自play:在无监督条件下推动智能体能力的前沿

Hongliang Lu, Yuhang Wen, Pengyu Cheng, Ruijin Ding, Jiaqi Guo, Haotian Xu, Chutian Wang, Haonan Chen, Xiaoxi Jiang, Guanjun Jiang

发表机构 * Qwen Large Model Application Team, Alibaba(阿里巴巴文勤大模型应用团队)

AI总结 本文提出了一种基于自play的深度搜索智能体训练方法,通过自动生成任务和解决任务来提升智能体在无监督条件下的性能,无需外部监督。

Comments Published as a conference paper at the Fourteenth International Conference on Learning Representations (ICLR 2026)

详情
AI中文摘要

可验证奖励的强化学习(RLVR)已成为训练大语言模型(LLM)智能体的主要技术。然而,RLVR高度依赖精心设计的任务查询和相应的地面真实答案来提供准确的奖励,这需要大量的人力努力,并阻碍了RL过程的扩展,尤其是在代理场景中。尽管一些最近的工作探索了任务合成方法,但生成的代理任务的难度很难控制以提供有效的RL训练优势。为了实现更高可扩展性的代理RLVR,我们探索了深度搜索代理的自play训练,其中学习LLM利用多轮搜索引擎调用,并同时充当任务提出者和问题解决者。任务提出者的目标是生成具有明确地面真实答案和逐渐增加的任务难度的深度搜索查询。问题解决者试图处理生成的搜索查询并输出正确的答案预测。为了确保每个生成的搜索查询都有准确的地面真实,我们收集所有从提出者轨迹中获得的搜索结果作为外部知识,然后进行检索增强生成(RAG)以测试所提出的查询是否可以使用所有必要的搜索文档来正确回答。在这个搜索自play(SSP)游戏中,提出者和解决者通过竞争和合作共同进化其智能体能力。通过大量实验结果,我们发现SSP可以在各种基准上显著提高搜索代理的性能,而无需任何监督,在从头开始和连续RL训练设置下均如此。代码在https://github.com/Qwen-Applications/SSP。

英文摘要

Reinforcement learning with verifiable rewards (RLVR) has become the mainstream technique for training LLM agents. However, RLVR highly depends on well-crafted task queries and corresponding ground-truth answers to provide accurate rewards, which requires significant human effort and hinders the scaling of RL processes, especially in agentic scenarios. Although a few recent works explore task synthesis methods, the difficulty of generated agentic tasks can hardly be controlled to provide effective RL training advantages. To achieve agentic RLVR with higher scalability, we explore self-play training for deep search agents, in which the learning LLM utilizes multi-turn search engine calling and acts simultaneously as both a task proposer and a problem solver. The task proposer aims to generate deep search queries with well-defined ground-truth answers and increasing task difficulty. The problem solver tries to handle the generated search queries and output the correct answer predictions. To ensure that each generated search query has accurate ground truth, we collect all the searching results from the proposer's trajectory as external knowledge, then conduct retrieval-augmentation generation (RAG) to test whether the proposed query can be correctly answered with all necessary search documents provided. In this search self-play (SSP) game, the proposer and the solver co-evolve their agent capabilities through both competition and cooperation. With substantial experimental results, we find that SSP can significantly improve search agents' performance uniformly on various benchmarks without any supervision under both from-scratch and continuous RL training setups. The code is at https://github.com/Qwen-Applications/SSP.

2510.16814 2026-05-20 cs.LG cs.AI cs.CV 版本更新

Needles in the Landscape: Semi-Supervised Pseudolabeling for Archaeological Site Discovery under Label Scarcity

景观中的针:在标签稀缺条件下用于考古遗址发现的半监督伪标签方法

Simon Jaxy, Anton Theys, Patrick Willett, W. Chris Carleton, Ralf Vandam, Pieter Libin

发表机构 * Sensors, Royal Military Academy, Brussels, Belgium AMGC (Archaeology, Environmental Changes \& Geo-Chemistry), Vrije Universiteit Brussel Max Planck Institute of Geoanthropology, Jena, Germany Shared first author Shared last author

AI总结 本文提出了一种非对称双伪标签(DPL)方法,通过端到端深度学习直接从多波段遥感影像中学习稀疏正样本,无需人工特征工程或对遗址不存在的假设,在两个著名的考古数据集上进行了评估。DPL在Sagalassos数据集上优于LAMAP基线,在F1和召回率上分别提高了12%和29%,而在Cyprus数据集上,DPL在无确认负样本的纯PU设置中恢复了判别能力。DPL的集成产生可解释的概率表面,支持调查规划,从最小的标记数据中有效发现遗址。

详情
AI中文摘要

考古预测建模通过结合已知位置与环境和地理空间变量来估计未发现遗址的可能位置,提出了一个积极无标签(PU)学习挑战,其中确认的遗址稀少,大多数位置未标记而非真正的负样本。为克服这一问题,我们提出了非对称双伪标签(DPL),一种端到端深度学习方法,直接从多波段遥感影像中学习稀疏正样本,无需人工特征工程或对遗址不存在的假设,并在两个著名的考古数据集上进行了评估。在Sagalassos数据集上,与独立的验证现场调查相比,DPL在F1和召回率上分别优于LAMAP基线12%和29%,而LAMAP在概率排名上保持优势。标准监督基线在负样本不确定时失败惨烈;仅正样本训练崩溃为预测 everywhere,建立经验界限。在Cyprus数据集上,纯PU设置中无确认负样本,SL翻转概率排名,而DPL恢复判别能力。DPL集成产生可解释的概率表面,支持调查规划,从最小的标记数据中有效发现遗址。

英文摘要

Archaeological predictive modelling estimates where undiscovered sites are likely to occur by combining known locations with environmental and geospatial variables, presenting a positive-unlabeled (PU) learning challenge where confirmed sites are rare and most locations are unlabeled rather than truly negative. To overcome this, we propose asymmetric dual pseudolabeling (DPL), an end-to-end deep learning method that learns from sparse positives directly from multi-band geospatial imagery without hand-crafted feature engineering or assumptions about site absence, and evaluate on two prominent archaeological datasets. On the Sagalassos dataset, evaluated against an independent, held-out field survey, DPL outperforms the LAMAP baseline by 12% in F1 and 29% in Recall, while LAMAP maintains advantages in probability ranking. Standard supervised baselines fail catastrophically when negatives are uncertain; positive-only training collapses to predicting everywhere, es- tablishing empirical bounds. On the Cyprus dataset, a pure PU setting without confirmed negatives, SL inverts probability rankings while DPL recovers discrimination. DPL ensembles produce interpretable probability surfaces supporting survey planning, enabling effective site discovery from minimal labeled data.

2510.12773 2026-05-20 cs.CL cs.AI cs.LG 版本更新

Dr.LLM: Dynamic Layer Routing in LLMs

Dr.LLM:大语言模型中的动态层路由

Ahmed Heakl, Martin Gubri, Salman Khan, Sangdoo Yun, Seong Joon Oh

发表机构 * Parameter Lab(参数实验室) MBZUAI(穆扎夫法尔国际人工智能研究院) NAVER AI Lab(NAVER人工智能实验室) University of Tübingen(图宾根大学) Tübingen AI Center(图宾根人工智能中心)

AI总结 本文提出Dr.LLM,一种通过在预训练模型中加入轻量级每层路由器来实现动态层路由的框架,该方法在不改变基础权重的情况下,通过显式监督训练路由器,提高推理的计算效率和准确性。

Comments Published at ICLR 2026

详情
AI中文摘要

大语言模型(LLMs)处理每个token时都会通过transformer堆栈的所有层,这导致简单查询的计算浪费以及更复杂的查询需要更深层次推理时的灵活性不足。适应深度方法可以提高效率,但先前的方法依赖于成本高昂的推理时间搜索、架构更改或大规模重新训练,在实践中虽然提高了效率,但常常导致准确性下降。我们介绍了Dr.LLM,即大语言模型中的动态层路由,一种可回退的框架,该框架为预训练模型配备了轻量级每层路由器,决定跳过、执行或重复一个块。路由器通过显式监督进行训练:使用蒙特卡洛树搜索(MCTS),我们推导出高质量的层配置,以在计算预算下保持或提高准确性。我们的设计,包括窗口池化以实现稳定的路由、聚焦损失与类别平衡以及瓶颈MLP路由器,确保在类别不平衡和长序列下具有鲁棒性。在ARC(逻辑)和DART(数学)上,Dr.LLM在每个示例上平均节省5层的同时,将准确性提高了最高3.4个百分点。路由器能够泛化到域外任务(MMLU、GSM8k、AIME、TruthfulQA、SQuADv2、GPQA、PIQA、AGIEval)时,仅导致0.85%的准确性下降,同时保持效率,并在某些情况下优于先前的路由方法。总体而言,Dr.LLM展示了通过显式监督训练的路由器可以回退冻结的LLMs,以实现预算意识、准确性驱动的推理,而无需改变基础权重。代码可在https://github.com/parameterlab/dr-llm上获得。

英文摘要

Large Language Models (LLMs) process every token through all layers of a transformer stack, causing wasted computation on simple queries and insufficient flexibility for harder ones that need deeper reasoning. Adaptive-depth methods can improve efficiency, but prior approaches rely on costly inference-time search, architectural changes, or large-scale retraining, and in practice often degrade accuracy despite efficiency gains. We introduce Dr. LLM, Dynamic routing of Layers for LLMs, a retrofittable framework that equips pretrained models with lightweight per-layer routers deciding to skip, execute, or repeat a block. Routers are trained with explicit supervision: using Monte Carlo Tree Search (MCTS), we derive high-quality layer configurations that preserve or improve accuracy under a compute budget. Our design, windowed pooling for stable routing, focal loss with class balancing, and bottleneck MLP routers, ensures robustness under class imbalance and long sequences. On ARC (logic) and DART (math), Dr. LLM improves accuracy by up to +3.4%p while saving 5 layers per example on average. Routers generalize to out-of-domain tasks (MMLU, GSM8k, AIME, TruthfulQA, SQuADv2, GPQA, PIQA, AGIEval) with only 0.85% accuracy drop while retaining efficiency, and outperform prior routing methods by up to +7.7%p. Overall, Dr. LLM shows that explicitly supervised routers retrofit frozen LLMs for budget-aware, accuracy-driven inference without altering base weights. Code is available at https://github.com/parameterlab/dr-llm.

2510.09872 2026-05-20 cs.LG cs.AI 版本更新

WARC-Bench: Web Archive Based Benchmark for GUI Subtask Executions

WARC-Bench:基于网络存档的GUI子任务执行基准

Sanjari Srivastava, Gang Li, Cheng Chang, Rishu Garg, Manpreet Kaur, Charlene Y. Lee, Yuezhang Li, Yining Mao, Ignacio Cases, Yanan Xie, Peng Qi

发表机构 * Uniphore

AI总结 本文提出WARC-Bench,一个基于网络存档的GUI子任务执行基准,通过438个任务评估多模态AI代理在子任务上的能力,实验表明SFT和RLVR方法在提升子任务执行效果上取得显著成果。

详情
AI中文摘要

训练能够导航复杂现实网站的网络代理需要它们掌握子任务——多个UI组件上的短周期交互(例如在日期选择器中选择正确日期或在容器中滚动以提取信息)。我们介绍了WARC-Bench(网络存档基准),一个新型的网络导航基准,包含438个任务,旨在评估多模态AI代理在子任务上的能力。WARC-Bench利用Web ARChive文件实现动态且逼真的网页沙盒交互。我们证明WARC-Bench对领先的计算机使用模型具有挑战性,最高观察到的成功率仅为64.8%。为了提高开源模型在子任务上的表现,我们探索了两种常见的训练技术:监督微调(SFT)和具有可验证奖励的强化学习(RLVR)。实验表明,SFT模型在基准上的成功率为48.8%。在数据稀缺的情况下,通过RLVR训练SFT检查点,将分数提高到52.8%,在WARC-Bench上优于许多前沿模型。我们的分析得出结论:掌握这些子任务对于稳健的网络规划和导航至关重要,而这一能力并未被现有基准充分评估。

英文摘要

Training web agents to navigate complex, real-world websites requires them to master $\textit{subtasks}$ - short-horizon interactions on multiple UI components (e.g., choosing the correct date in a date picker, or scrolling in a container to extract information). We introduce WARC-Bench (Web Archive Benchmark), a novel web navigation benchmark featuring 438 tasks designed to evaluate multimodal AI agents on subtasks. WARC-Bench enables sandboxed interactions with dynamic and realistic webpages using Web ARChive files. We show that WARC-Bench is challenging for leading computer-use models, with the highest observed success rate being 64.8%. To improve open source models on subtask, we explore two common training techniques: supervised fine-tuning (SFT) and reinforcement learning with verifiable rewards (RLVR). Experiments show that SFT models obtain a 48.8% success rate on the benchmark. Training with RLVR over SFT checkpoints, even in data-scarce settings, improves the score to 52.8% on WARC-Bench, outperforming many frontier models. Our analysis concludes that mastering these subtasks is essential for robust web planning and navigation, and is a capability not extensively evaluated by existing benchmarks.

2510.09174 2026-05-20 cs.LG 版本更新

Robustness and Regularization in Hierarchical Re-Basin

层次化重盆地中的鲁棒性与正则化

Benedikt Franke, Florian Heinrich, Markus Lange, Arne Raulf

发表机构 * German Aerospace Center (DLR) - Institute for AI Safety and Security(德国航空航天中心(DLR)- 人工智能安全与保密研究所)

AI总结 本文研究了Git Re-Basin在模型合并中的鲁棒性和正则化问题,提出了一种层次化模型合并方案,显著优于标准的MergeMany算法,并发现Re-Basin在合并模型中引入了对抗鲁棒性和扰动鲁棒性,但实验显示其性能下降比原始作者报告的更大。

Comments Published in 32th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, ESANN 2024

详情
AI中文摘要

本文对Git Re-Basin进行了深入研究,这是一种新颖的模型合并方法。我们提出了一种层次化模型合并方案,其性能显著优于标准的MergeMany算法。通过我们的新算法,我们发现Re-Basin在合并模型中引入了对抗鲁棒性和扰动鲁棒性,其效果随着参与层次化合并的模型数量增加而增强。然而,在我们的实验中,Re-Basin引起的性能下降比原始作者报告的要大得多。

英文摘要

This paper takes a closer look at Git Re-Basin, an interesting new approach to merge trained models. We propose a hierarchical model merging scheme that significantly outperforms the standard MergeMany algorithm. With our new algorithm, we find that Re-Basin induces adversarial and perturbation robustness into the merged models, with the effect becoming stronger the more models participate in the hierarchical merging scheme. However, in our experiments Re-Basin induces a much bigger performance drop than reported by the original authors.

2510.05746 2026-05-20 cs.AI cs.CL cs.LG 版本更新

ARM: Discovering Agentic Reasoning Modules for Generalizable Multi-Agent Systems

ARM:为通用多智能体系统发现代理推理模块

Bohan Yao, Shiva Krishna Reddy Malay, Vikas Yadav

发表机构 * University of Washington(华盛顿大学)

AI总结 本文提出了一种新的自动多智能体系统设计范式,通过优化链式推理(CoT)来发现代理推理模块(ARM),该模块通过在代码空间中进行树搜索,利用执行轨迹的反思来进化,从而提升多智能体系统的泛化能力。

Comments 29 pages, 2 figures

详情
AI中文摘要

大型语言模型(LLM)驱动的多智能体系统(MAS)在各种复杂推理任务上取得了最先进的结果。最近的研究提出了自动化设计MAS的方法,消除了手动工程的需要。然而,这些方法表现不佳,通常与简单的基线相当或更差。此外,它们需要为每个新任务领域进行昂贵的架构重新发现,并且在没有现有标注验证集的领域中需要昂贵的数据注释。关键的洞察是简单的链式推理(CoT)推理往往与这些复杂系统竞争,表明MAS的基本推理单元CoT值得进一步研究。为此,我们提出了一种新的自动MAS设计范式,将焦点转向优化CoT推理。我们引入了代理推理模块(ARM),即CoT的代理泛化,其中每个细粒度推理步骤由专门的推理模块执行。该模块通过在代码空间中进行树搜索来发现,从简单的CoT模块开始,利用执行轨迹的反思进行进化。最终的ARM作为一个通用的推理构建块,可以作为直接的递归循环或作为学习元协调器中的子程序使用。我们的方法显著优于手动设计的MAS和最先进的自动MAS设计方法。关键的是,由ARM构建的MAS表现出卓越的泛化能力,在不同的基础模型和任务领域中保持高性能,而无需进一步优化。

英文摘要

Large Language Model (LLM)-powered Multi-agent systems (MAS) have achieved state-of-the-art results on various complex reasoning tasks. Recent works have proposed techniques to automate the design of MASes, eliminating the need for manual engineering. However, these techniques perform poorly, often achieving similar or inferior performance to simple baselines. Furthermore, they require computationally expensive re-discovery of architectures for each new task domain and expensive data annotation on domains without existing labeled validation sets. A critical insight is that simple Chain of Thought (CoT) reasoning often performs competitively with these complex systems, suggesting that the fundamental reasoning unit of MASes, CoT, warrants further investigation. To this end, we present a new paradigm for automatic MAS design that pivots the focus to optimizing CoT reasoning. We introduce the Agentic Reasoning Module (ARM), an agentic generalization of CoT where each granular reasoning step is executed by a specialized reasoning module. This module is discovered through a tree search over the code space, starting from a simple CoT module and evolved using mutations informed by reflection on execution traces. The resulting ARM acts as a versatile reasoning building block which can be utilized as a direct recursive loop or as a subroutine in a learned meta-orchestrator. Our approach significantly outperforms both manually designed MASes and state-of-the-art automatic MAS design methods. Crucially, MASes built with ARM exhibit superb generalization, maintaining high performance across different foundation models and task domains without further optimization.

2510.03824 2026-05-20 cs.LG cs.AI stat.ML 版本更新

Proximal Diffusion Neural Sampler

近端扩散神经采样器

Wei Guo, Jaemoo Choi, Yuchen Zhu, Molei Tao, Yongxin Chen

发表机构 * Georgia Institute of Technology(佐治亚理工学院)

AI总结 本文提出了一种名为近端扩散神经采样器(PDNS)的框架,通过在路径测度空间上应用近端点方法,解决神经采样器在训练过程中遇到的多模式目标分布和模式崩溃问题,通过分阶段的简单子问题逐步逼近目标分布,促进模式的全面探索。

Comments Accepted at ICLR 2026 (https://openreview.net/forum?id=XTHQqS7ObC)

详情
AI中文摘要

学习基于扩散的神经采样器以从未归一化目标分布中抽取样本的任务可以被视为路径测度上的随机最优控制问题。然而,当目标分布是多模式且存在显著的模式分离屏障时,神经采样器的训练可能会面临挑战,可能导致模式崩溃。我们提出了一种名为近端扩散神经采样器(PDNS)的框架,通过在路径测度空间上应用近端点方法来解决这些问题。PDNS将学习过程分解为一系列更简单的子问题,逐步创建一条接近目标分布的路径。这种分阶段的程序会逐步细化路径以接近目标分布,并促进对所有模式的彻底探索。为了实现实用且高效的实现,我们用近端加权去噪交叉熵(WDCE)目标实例化每个近端步骤。通过在连续和离散采样任务中的广泛实验,包括分子动力学和统计物理中的挑战性场景,我们展示了PDNS的有效性和鲁棒性。我们的代码可在https://github.com/AlexandreGUO2001/PDNS上获得。

英文摘要

The task of learning a diffusion-based neural sampler for drawing samples from an unnormalized target distribution can be viewed as a stochastic optimal control problem on path measures. However, the training of neural samplers can be challenging when the target distribution is multimodal with significant barriers separating the modes, potentially leading to mode collapse. We propose a framework named Proximal Diffusion Neural Sampler (PDNS) that addresses these challenges by tackling the stochastic optimal control problem via proximal point method on the space of path measures. PDNS decomposes the learning process into a series of simpler subproblems that create a path gradually approaching the desired distribution. This staged procedure traces a progressively refined path to the desired distribution and promotes thorough exploration across modes. For a practical and efficient realization, we instantiate each proximal step with a proximal weighted denoising cross-entropy (WDCE) objective. We demonstrate the effectiveness and robustness of PDNS through extensive experiments on both continuous and discrete sampling tasks, including challenging scenarios in molecular dynamics and statistical physics. Our code is available at https://github.com/AlexandreGUO2001/PDNS.

2510.01499 2026-05-20 cs.LG cs.AI cs.GT 版本更新

Beyond Majority Voting: LLM Aggregation by Leveraging Higher-Order Information

超越多数投票:利用高阶信息进行LLM聚合

Rui Ai, Yuqi Pan, David Simchi-Levi, Milind Tambe, Haifeng Xu

发表机构 * Massachusetts Institute of Technology(麻省理工学院) School of Engineering and Applied Sciences(工程与应用科学学院) Harvard University(哈佛大学) Data Science, The University of Chicago(数据科学,芝加哥大学)

AI总结 本文提出Optimal Weight和Inverse Surprising Popularity两种算法,通过结合一阶和二阶信息,有效缓解多数投票的局限性,提升多智能体LLM聚合的可靠性。

Comments Accepted into ICML 2026

详情
AI中文摘要

随着多智能体大语言模型(LLM)推理的快速发展,如何有效聚合多个LLM的答案已成为一个根本性挑战。标准多数投票将所有答案视为同等重要,未能考虑模型间的潜在异质性和相关性。在本文中,我们设计了两种新的聚合算法,称为最优权重(OW)和反惊讶流行度(ISP),利用一阶和二阶信息。我们的理论分析显示,这些方法在温和假设下能够证明性地缓解多数投票的固有局限,从而产生更可靠的集体决策。我们在合成数据集、流行的LLM微调基准如UltraFeedback和MMLU,以及现实世界医疗场景ARMMAN上实证验证了我们的算法。我们的算法在多个基准上均优于标准基线,建立了稳健且无需训练的多智能体LLM聚合框架。

英文摘要

With the rapid progress of multi-agent large language model (LLM) reasoning, how to effectively aggregate answers from multiple LLMs has emerged as a fundamental challenge. Standard majority voting treats all answers equally, failing to consider latent heterogeneity and correlation across models. In this work, we design two new aggregation algorithms called Optimal Weight (OW) and Inverse Surprising Popularity (ISP), leveraging both first-order and second-order information. Our theoretical analysis shows these methods provably mitigate inherent limitations of majority voting under mild assumptions, leading to more reliable collective decisions. We empirically validate our algorithms on synthetic datasets, popular LLM fine-tuning benchmarks such as UltraFeedback and MMLU, and a real-world healthcare setting ARMMAN. Our algorithms consistently outperform standard baselines, establishing a robust, training-free framework for effective multi-agent LLM aggregation.

2510.00600 2026-05-20 cs.RO cs.AI cs.CV cs.LG 版本更新

Hybrid Training for Vision-Language-Action Models

视觉-语言-动作模型的混合训练

Pietro Mazzaglia, Cansu Sancaktar, Markus Peschl, Daniel Dijkman

发表机构 * Qualcomm AI Research(高通AI研究)

AI总结 本文提出混合训练框架,旨在使视觉-语言-动作模型在推理时能够根据需要生成思考过程或直接预测动作,从而在保持性能提升的同时提高推理效率。

Comments Published as a conference paper at ICLR 2026

详情
AI中文摘要

使用大型语言模型生成中间思考过程(即链式思考,CoT)再提供答案,已成为解决复杂语言任务的有效方法。在机器人领域,类似的具身CoT策略,即在执行动作前生成思考,也已被证明在使用视觉-语言-动作模型(VLAs)时能够提高性能。然而,这些技术会增加模型生成输出的长度以包含思考过程,从而影响推理时间。在现实世界执行中,如机器人操作场景,延迟代理的动作会严重影响方法的实用性,因为任务需要长序列的动作。然而,生成长链式思考是否是实现性能提升的必要条件?在本文中,我们探索了混合训练(HyT)的概念,这是一种框架,使VLAs能够从思考中学习并受益于相关的性能提升,同时在推理时允许省略CoT生成。此外,通过学习有条件地预测多样化的输出,HyT在推理时提供了灵活性,使模型能够直接预测动作、生成思考或遵循指令。我们评估了所提出的方法在一系列模拟基准和真实世界实验中的表现。

英文摘要

Using Large Language Models to produce intermediate thoughts, a.k.a. Chain-of-thought (CoT), before providing an answer has been a successful recipe for solving complex language tasks. In robotics, similar embodied CoT strategies, generating thoughts before actions, have also been shown to lead to improved performance when using Vision-Language-Action models (VLAs). As these techniques increase the length of the model's generated outputs to include the thoughts, the inference time is negatively affected. Delaying an agent's actions in real-world executions, as in robotic manipulation settings, strongly affects the usability of a method, as tasks require long sequences of actions. However, is the generation of long chains-of-thought a strong prerequisite for achieving performance improvements? In this work, we explore the idea of Hybrid Training (HyT), a framework that enables VLAs to learn from thoughts and benefit from the associated performance gains, while enabling the possibility to leave out CoT generation during inference. Furthermore, by learning to conditionally predict a diverse set of outputs, HyT supports flexibility at inference time, enabling the model to either predict actions directly, generate thoughts or follow instructions. We evaluate the proposed method in a series of simulated benchmarks and real-world experiments.

2509.21196 2026-05-20 cs.LG cs.CV 版本更新

Differential-Integral Neural Operator for Long-Term Turbulence Forecasting

微分-积分神经算子用于长期湍流预测

Hao Wu, Yuan Gao, Fan Xu, Fan Zhang, Qingsong Wen, Kun Wang, Xiaomeng Huang, Xian Wu

发表机构 * Tsinghua University(清华大学) University of Science and Technology of China(中国科学技术大学) The Chinese University of Hong Kong(香港中文大学) Nanyang Technological University(南洋理工大学) Tencent(腾讯)

AI总结 本文提出了一种基于物理原理的微分-积分神经算子,通过并行分支学习不同的物理算子,以提高长期湍流预测的稳定性与鲁棒性,从而在2D Kolmogorov流基准测试中实现了更精确的预测。

详情
AI中文摘要

准确预测湍流的长期演变是科学计算中的重大挑战,对气候建模和航空航天工程等应用至关重要。现有的深度学习方法,特别是神经算子,在长期自回归预测中常常失败,导致灾难性误差累积和物理保真度的丧失。这种失败源于它们无法同时捕捉湍流动力学所支配的不同的数学结构:局部、耗散效应和全局、非局部相互作用。在本文中,我们提出了微分-积分神经算子(\method{}),一种基于算子分解的原理方法。\method{}通过并行分支显式建模湍流的演变,学习不同的物理算子:一个局部微分算子,由一个受约束的卷积网络实现,该网络可以证明收敛于导数;以及一个全局积分算子,由Transformer架构捕捉,学习数据驱动的全局核。这种基于物理的分解使\method{}具有卓越的稳定性和鲁棒性。通过在具有挑战性的2D Kolmogorov流基准测试中的广泛实验,我们证明\method{}在长期预测中显著优于最先进的模型。它能够抑制数百个时间步上的误差累积,保持涡旋场和能量谱的高保真度,并建立了物理一致、长程湍流预测的新基准。

英文摘要

Accurately forecasting the long-term evolution of turbulence represents a grand challenge in scientific computing and is crucial for applications ranging from climate modeling to aerospace engineering. Existing deep learning methods, particularly neural operators, often fail in long-term autoregressive predictions, suffering from catastrophic error accumulation and a loss of physical fidelity. This failure stems from their inability to simultaneously capture the distinct mathematical structures that govern turbulent dynamics: local, dissipative effects and global, non-local interactions. In this paper, we propose the {\textbf{\underline{D}}}ifferential-{\textbf{\underline{I}}}ntegral {\textbf{\underline{N}}}eural {\textbf{\underline{O}}}perator (\method{}), a novel framework designed from a first-principles approach of operator decomposition. \method{} explicitly models the turbulent evolution through parallel branches that learn distinct physical operators: a local differential operator, realized by a constrained convolutional network that provably converges to a derivative, and a global integral operator, captured by a Transformer architecture that learns a data-driven global kernel. This physics-based decomposition endows \method{} with exceptional stability and robustness. Through extensive experiments on the challenging 2D Kolmogorov flow benchmark, we demonstrate that \method{} significantly outperforms state-of-the-art models in long-term forecasting. It successfully suppresses error accumulation over hundreds of timesteps, maintains high fidelity in both the vorticity fields and energy spectra, and establishes a new benchmark for physically consistent, long-range turbulence forecast.

2509.19250 2026-05-20 stat.ML cs.LG 版本更新

Recovering Wasserstein Distance Matrices from Few Measurements

从少量测量中恢复Wasserstein距离矩阵

Muhammad Rana, Abiy Tasissa, HanQin Cai, Yakov Gavriyelov, Keaton Hamm

发表机构 * Department of Mathematics, University of Texas at Arlington(德克萨斯理工大学数学系) Department of Mathematics, Tufts University(塔夫茨大学数学系) Department of Statistics and Data Science and Department of Computer Science University of Central Florida(中央佛罗里达大学统计与数据科学系和计算机科学系) Division of Data Science, University of Texas at Arlington(德克萨斯理工大学数据科学 division)

AI总结 本文提出两种算法,用于从少量条目估计平方Wasserstein距离矩阵,这些矩阵用于计算流形学习嵌入,如多维标度分析(MDS)或Isomap,但与欧几里得距离矩阵不同,它们的计算成本极高。本文分析了从上三角样本进行矩阵补全和Nyström补全,证明了在Nyström补全下MDS的稳定性,并展示了在固定样本距离预算下,Nyström补全可以优于矩阵补全。最后,本文证明了即使仅计算距离矩阵的10%列,嵌入数据在OrganCMNIST数据集上的分类也是稳定的。

详情
Journal ref
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2026
AI中文摘要

本文提出两种算法,用于从少量条目估计平方Wasserstein距离矩阵。这些矩阵用于计算流形学习嵌入,如多维标度分析(MDS)或Isomap,但与欧几里得距离矩阵不同,它们的计算成本极高。我们分析了从上三角样本进行矩阵补全和Nyström补全,在其中$\mathcal{O}(d\log(d))$列的距离矩阵被计算,其中$d$是所需的嵌入维度,证明了在Nyström补全下MDS的稳定性,并展示了在固定样本距离预算下,Nyström补全可以优于矩阵补全。最后,我们证明了即使仅计算距离矩阵的10%列,嵌入数据在OrganCMNIST数据集上的分类也是稳定的。

英文摘要

This paper proposes two algorithms for estimating square Wasserstein distance matrices from a small number of entries. These matrices are used to compute manifold learning embeddings like multidimensional scaling (MDS) or Isomap, but contrary to Euclidean distance matrices, are extremely costly to compute. We analyze matrix completion from upper triangular samples and Nyström completion in which $\mathcal{O}(d\log(d))$ columns of the distance matrices are computed where $d$ is the desired embedding dimension, prove stability of MDS under Nyström completion, and show that it can outperform matrix completion for a fixed budget of sample distances. Finally, we show that classification of the OrganCMNIST dataset from the MedMNIST benchmark is stable on data embedded from the Nyström estimation of the distance matrix even when only 10\% of the columns are computed.

2509.14968 2026-05-20 cs.LG cs.NI 版本更新

FAWN: A MultiEncoder Fusion-Attention Wave Network for Integrated Sensing and Communication Indoor Scene Inference

FAWN:一种多编码器融合-注意力波网络用于集成感知与通信室内场景推断

Carlos Barroso-Fernández, Alejandro Calvillo-Fernandez, Antonio de la Oliva, Carlos J. Bernardos

发表机构 * Ericsson(爱立信)

AI总结 本文提出FAWN,一种基于Transformer架构的多编码器融合-注意力波网络,用于整合感知与通信的室内场景推断,通过融合Wi-Fi和5G信号提高环境感知精度。

Comments 7 pages, 6 figures and tables, less than 5500 words. Under revision at IEEE Communication Magazine

详情
AI中文摘要

下一代无线技术有望实现万物互联和智能化的时代。随着对智能需求的增长,网络必须学会更好地理解物理世界。然而,部署专用硬件来感知环境并不总是可行,主要是由于成本和/或复杂性。集成感知与通信(ISAC)在解决这一挑战上迈出了重要一步。在ISAC中,被动感知作为一种成本效益高的解决方案,利用无线通信来感知环境,而不干扰现有通信。然而,当前大多数解决方案仅限于一种技术(主要是Wi-Fi或5G),限制了最大精度。由于不同技术使用不同的频谱,我们看到有必要整合多种技术以扩大覆盖范围。因此,我们利用ISAC被动感知,提出FAWN,一种用于ISAC室内场景推断的多编码器融合-注意力波网络。FAWN基于原始Transformer架构,融合Wi-Fi和5G信息,使网络能够理解物理世界而不干扰当前通信。为了测试我们的解决方案,我们构建了一个原型并将其集成到真实场景中。结果表明,在84%的时间内,误差低于0.6米。

英文摘要

The upcoming generations of wireless technologies promise an era where everything is interconnected and intelligent. As the need for intelligence grows, networks must learn to better understand the physical world. However, deploying dedicated hardware to perceive the environment is not always feasible, mainly due to costs and/or complexity. Integrated Sensing and Communication (ISAC) has made a step forward in addressing this challenge. Within ISAC, passive sensing emerges as a cost-effective solution that reuses wireless communications to sense the environment, without interfering with existing communications. Nevertheless, the majority of current solutions are limited to one technology (mostly Wi-Fi or 5G), constraining the maximum accuracy reachable. As different technologies work with different spectrums, we see a necessity in integrating more than one technology to augment the coverage area. Hence, we take the advantage of ISAC passive sensing, to present FAWN, a MultiEncoder Fusion-Attention Wave Network for ISAC indoor scene inference. FAWN is based on the original transformers architecture, to fuse information from Wi-Fi and 5G, making the network capable of understanding the physical world without interfering with the current communication. To test our solution, we have built a prototype and integrated it in a real scenario. Results show errors below 0.6 m around 84% of times.

2509.07024 2026-05-20 physics.plasm-ph cs.LG 版本更新

TGLF-WINN: Data-Efficient Deep Learning Surrogate for Turbulent Transport Modeling in Fusion

TGLF-WINN: 用于等离子体输运建模的高效深度学习替代模型

Yadi Cao, Futian Zhang, Wesley Liu, Tom Neiser, Orso Meneghini, Lawson Fuller, Sterling Smith, Raffi Nazikian, Brian Sammuli, Rose Yu

发表机构 * University of California, San Diego(加州大学圣地亚哥分校) General Atomics(通用原子公司)

AI总结 本文提出TGLF-WINN,一种数据高效的深度学习替代模型,通过三种创新方法:原理化的特征工程、物理引导的波数解析正则化和贝叶斯主动学习,提高了湍流输运建模的效率和准确性。

Comments Minor Revision responding to Nuclear Fusion reviewer and adjudicator comments (round 3)

详情
AI中文摘要

Trapped Gyro-Landau Fluid (TGLF)模型提供了快速且准确的托卡马克湍流输运预测,但需要数千次评估的全设备模拟仍然计算成本高昂。神经网络(NN)替代模型提供加速推理,具有完全可微的近似方法,能够实现基于梯度的耦合,但通常需要大量训练数据来捕捉不同等离子体条件下的输运通量变化,造成显著的训练负担并限制其在昂贵的gyrokinetic模拟中的应用。我们提出TGLF-WINN(波数引导的神经网络),具有三个关键创新:(1)原理化的特征工程,减少目标预测范围,简化学习任务;(2)物理引导的波数解析正则化,以在稀疏数据下提高泛化能力;(3)贝叶斯主动学习(BAL)以根据模型不确定性战略选择训练样本,减少数据需求同时保持准确性。特征调优和波数正则化共同在完整数据集上实现了比TGLF-NN低12.5%的相对RMSLE;在稀疏、未过滤的训练(大约是完整数据集的1/9)下,它们产生的RMSLE退化比TGLF-NN小一个数量级,其中波数引导的正则化对每种模式的通量施加了物理引导的约束。添加贝叶斯主动学习后,TGLF-WINN仅使用25%的训练数据即可达到TGLF-NN的全数据离线精度,其在TGLF-NN全数据基准下的误差为2.8%,在我们自己的全数据结果下的误差为4.3%。下游的通量匹配工作流程进一步展示了其实用性:NN替代模型在与TGLF相当的重建精度下实现了45倍的速度提升。

英文摘要

The Trapped Gyro-Landau Fluid (TGLF) model provides fast, accurate predictions of turbulent transport in tokamaks, but whole device simulations requiring thousands of evaluations remain computationally expensive. Neural network (NN) surrogates offer accelerated inference with fully differentiable approximations that enable gradient-based coupling but typically require large training datasets to capture transport flux variations across plasma conditions, creating significant training burden and limiting applicability to expensive gyrokinetic simulations. We propose TGLF-WINN (Wavenumber-Informed Neural Network) with three key innovations: (1) principled feature engineering that reduces target prediction range, simplifying the learning task; (2) physics-guided wavenumber-resolved regularization to improve generalization under sparse data; and (3) Bayesian Active Learning (BAL) to strategically select training samples based on model uncertainty, reducing data requirements while maintaining accuracy. Feature tuning and wavenumber regularization together deliver a 12.5% relative RMSLE reduction over TGLF-NN on the full dataset; under sparse, unfiltered training (approximately 1/9 the full size) they yield an order-of-magnitude smaller RMSLE degradation than TGLF-NN, with the wavenumber-informed regularization imposing a physics-guided constraint on per-mode fluxes. Adding Bayesian Active Learning, TGLF-WINN matches TGLF-NN's full-data offline accuracy using only 25% of the training data, within 2.8% of TGLF-NN's full-data baseline and 4.3% of our own full-data result. A downstream flux-matching workflow further shows practicality: the NN surrogate gives a 45x speedup over TGLF with comparable reconstruction accuracy.

2508.14134 2026-05-20 cs.LG cs.AI 版本更新

ERIS: An Energy-Guided Feature Disentanglement Framework for Out-of-Distribution Time Series Classification

ERIS: 一种面向分布外时间序列分类的能量引导特征解耦框架

Xin Wu, Fei Teng, Ji Zhang, Xingwang Li, Yuxuan Liang

发表机构 * Hong Kong University of Science and Technology(香港科技大学)

AI总结 本文提出ERIS框架,通过能量引导机制和语义指导,解决时间序列分类中分布外数据的可靠特征解耦问题,提升模型鲁棒性和泛化能力。

详情
Journal ref
Information Fusion 135, 104407 (2026)
AI中文摘要

理想的时间序列分类(TSC)应能捕捉不变表示,但实现对分布外(OOD)数据的可靠性能仍是一个核心障碍。这一障碍源于模型内在地将领域特定和标签相关特征纠缠在一起,导致虚假相关性。尽管特征解耦旨在解决这一问题,但当前方法大多缺乏必要的语义方向,无法隔离真正普遍的特征。为此,我们提出一个端到端的Energy-Regularized Information for Shift-Robustness(ERIS)框架,以实现引导且可靠的特征解耦。核心思想是有效的解耦不仅需要数学约束,还需要语义指导来锚定分离过程。ERIS集成了三个关键机制来实现这一目标。具体来说,我们首先引入一种能量引导校准机制,为分离过程提供关键的语义指导,使模型能够自我校准。此外,一个权重层面正交性策略强制领域特定和标签相关特征之间的结构性独立,从而减轻它们的干扰。此外,一个辅助对抗泛化机制通过注入结构化扰动来增强鲁棒性。在四个基准测试中的实验表明,ERIS在统计上显著优于最先进的基线方法,始终保持最佳性能排名。

英文摘要

An ideal time series classification (TSC) should be able to capture invariant representations, but achieving reliable performance on out-of-distribution (OOD) data remains a core obstacle. This obstacle arises from the way models inherently entangle domain-specific and label-relevant features, resulting in spurious correlations. While feature disentanglement aims to solve this, current methods are largely unguided, lacking the semantic direction required to isolate truly universal features. To address this, we propose an end-to-end Energy-Regularized Information for Shift-Robustness (ERIS) framework to enable guided and reliable feature disentanglement. The core idea is that effective disentanglement requires not only mathematical constraints but also semantic guidance to anchor the separation process. ERIS incorporates three key mechanisms to achieve this goal. Specifically, we first introduce an energy-guided calibration mechanism, which provides crucial semantic guidance for the separation, enabling the model to self-calibrate. Additionally, a weight-level orthogonality strategy enforces structural independence between domain-specific and label-relevant features, thereby mitigating their interference. Moreover, an auxiliary adversarial generalization mechanism enhances robustness by injecting structured perturbations. Experiments across four benchmarks demonstrate that ERIS achieves a statistically significant improvement over state-of-the-art baselines, consistently securing the top performance rank.

2507.15698 2026-05-20 cs.CL cs.AI cs.LG 版本更新

CoLD: Counterfactually-Guided Length Debiasing for Process Reward Models in Mathematical Reasoning

CoLD: 用于数学推理过程中奖励模型的反事实引导长度偏差消除

Congmin Zheng, Jiachen Zhu, Jianghao Lin, Xinyi Dai, Weiwen Liu, Haoxuan Li, Yong Yu, Weinan Zhang, Mengyue Yang

发表机构 * Shanghai Jiao Tong University(上海交通大学) Huawei Noah’s Ark Lab(华为诺亚实验室) Peking University(北京大学) University of Bristol(布里斯托大学)

AI总结 本文提出CoLD,一种通过反事实引导消除过程奖励模型中长度偏差的统一框架,旨在提高多步骤推理的准确性和简洁性,同时提升下游强化学习性能和跨领域泛化能力。

详情
AI中文摘要

过程奖励模型(PRMs)在评估和引导大型语言模型(LLMs)的多步推理中起着核心作用,特别是在数学问题解决中。然而,我们发现现有PRMs存在普遍的长度偏差:即使语义内容和逻辑有效性未变,它们也倾向于对较长的推理步骤赋予更高的分数。这种偏差会削弱奖励预测的可靠性,并导致推理过程中输出过于冗长。为了解决这一问题,我们提出了CoLD(Counterfactually-Guided Length Debiasing),一种统一的框架,通过三个组件减轻长度偏差:显式的长度惩罚调整、一个训练以捕捉虚假长度相关信号的学得偏差估计器,以及一种联合训练策略,强制奖励预测的长度不变性。我们的方法基于反事实推理,并受因果图分析的启发。在MATH500和GSM-Plus上的广泛实验表明,CoLD提高了步骤选择的准确性,并鼓励了更简洁、逻辑有效的推理。此外,它一致提高了下游RL性能,并通过减轻长度偏差在跨领域中泛化,展示了CoLD强大的泛化能力。

英文摘要

Process Reward Models (PRMs) play a central role in evaluating and guiding multi-step reasoning in large language models (LLMs), especially for mathematical problem solving. However, we identify a pervasive length bias in existing PRMs: they tend to assign higher scores to longer reasoning steps, even when the semantic content and logical validity are unchanged. This bias undermines the reliability of reward predictions and leads to overly verbose outputs during inference. To address this issue, we propose CoLD(Counterfactually-Guided Length Debiasing), a unified framework that mitigates length bias through three components: an explicit length-penalty adjustment, a learned bias estimator trained to capture spurious length-related signals, and a joint training strategy that enforces length-invariance in reward predictions. Our approach is grounded in counterfactual reasoning and informed by causal graph analysis. Extensive experiments on MATH500 and GSM-Plus show that CoLD improves accuracy in step selection, and encourages more concise, logically valid reasoning. Furthermore, it consistently improves downstream RL performance and generalizes across domains by mitigating length bias, demonstrating CoLD's strong generalization capability.

2507.10614 2026-05-20 cs.LG cs.AI 版本更新

Fine-tuning Large Language Model for Automated Algorithm Design

微调大语言模型用于自动化算法设计

Fei Liu, Rui Zhang, Xi Lin, Zhichao Lu, Qingfu Zhang

发表机构 * City University of Hong Kong(香港城市大学) Xi’an Jiaotong University(西安交通大学)

AI总结 本文探讨了微调大语言模型以提升其在自动化算法设计中的性能,提出了一种多样性感知的排名策略和直接偏好优化方法,通过实验验证了任务特定微调在不同算法设计任务中的有效性。

详情
AI中文摘要

将大语言模型(LLMs)整合到自动化算法设计中已展现出巨大潜力。一种常见的方法是将LLMs嵌入到搜索过程中,以迭代生成和优化候选算法。然而,现有大多数方法依赖于为通用编码任务训练的现成LLMs,留下一个关键问题:是否需要专门针对算法设计训练的LLMs?如果是,如何有效获得此类LLMs,并且它们在不同算法设计任务中有多好的泛化能力?在本文中,我们通过探索针对算法设计的LLMs微调,初步回答了这些问题。我们引入了一种多样性感知的排名(DAR)采样策略,以平衡训练数据的多样性和质量,然后利用直接偏好优化来高效地对齐LLMs的输出与任务目标。我们的实验主要在Llama-3.2-1B-Instruct和Llama-3.1-8BInstruct上进行,针对三个不同的算法设计任务,此外,openPangu-Embedded模型还作为辅助比较在可允许集合问题上进行评估。结果表明,微调后的LLMs在较小的Llama-3.2-1B-Instruct上显著优于其现成的对应者,并在可允许集合问题上与较大的Llama-3.1-8B-Instruct匹配。此外,我们观察到良好的泛化能力:在特定算法设计任务上微调的LLMs在相关任务中也表现出色。这些发现突显了LLMs在算法设计中任务特定适应的价值,并为未来研究开辟了新途径。我们的代码可在https://github.com/RayZhhh/dpo-aad上公开获取。

英文摘要

The integration of large language models (LLMs) into automated algorithm design has shown promising potential. A prevalent approach embeds LLMs within search routines to iteratively generate and refine candidate algorithms. However, most existing methods rely on off-the-shelf LLMs trained for general coding tasks, leaving a key question open: Do we need LLMs specifically tailored for algorithm design? If so, how can such LLMs be effectively obtained and how well can they generalize across different algorithm design tasks? In this paper, we take a preliminary step toward answering these questions by exploring fine-tuning of LLMs for algorithm design. We introduce a Diversity-Aware Rank-based (DAR) sampling strategy to balance training data diversity and quality, then we leverage direct preference optimization to efficiently align LLM outputs with task objectives. Our experiments are primarily conducted on Llama-3.2-1B-Instruct and Llama-3.1-8BInstruct across three distinct algorithm design tasks, with openPangu-Embedded models additionally included as auxiliary comparisons on the admissible set problem. Results suggest that fine-tuned LLMs can significantly outperform their off-the-shelf counterparts with the smaller Llama-3.2-1B-Instruct and match the larger Llama-3.1-8B-Instruct on the admissible set problem. Moreover, we observe promising generalization: LLMs fine-tuned on specific algorithm design tasks also improve performance on related tasks with varying settings. These findings highlight the value of task-specific adaptation for LLMs in algorithm design and open new avenues for future research. Our code is publicly available at https://github.com/RayZhhh/dpo-aad.

2507.10492 2026-05-20 cs.CV cs.AI cs.LG 版本更新

BenchReAD: A systematic benchmark for retinal anomaly detection

BenchReAD: 一种系统性的视网膜异常检测基准

Chenyu Lian, Hong-Yu Zhou, Zhanli Hu, Jing Qin

发表机构 * The Center for Smart Health, School of Nursing, the Hong Kong Polytechnic University, Hong Kong, China(香港理工大学护理学院智能健康中心) School of Biomedical Engineering, Tsinghua University, Beijing, China(清华大学生物医学工程学院) Research Center for Medical AI, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China(中国科学院深圳先进技术研究院医学人工智能研究中心)

AI总结 本研究提出BenchReAD基准,旨在解决视网膜异常检测领域缺乏全面且公开的评估标准的问题,通过系统化的数据和算法分类,引入了全监督方法DRA,并改进为NFM-DRA,实现了SOTA性能。

Comments MICCAI 2025

详情
AI中文摘要

视网膜异常检测在筛查眼部和系统性疾病中起着关键作用。尽管其重要性,该领域的进展受到缺乏全面且公开可用的基准的阻碍,这对于公平评估和推进方法至关重要。由于这一限制,与视网膜图像相关的先前异常检测工作受到(1)异常类型有限且过于简单的限制,(2)测试集几乎饱和,以及(3)缺乏泛化评估的影响,导致实验设置说服力不足。此外,现有医学异常检测基准大多专注于单类监督方法(仅使用负样本训练),忽视了临床实践中大量可用的标记异常数据和未标记数据。为了填补这些差距,我们引入了视网膜异常检测的基准,该基准在数据和算法上都是全面且系统的。通过分类和评估先前方法,我们发现利用解耦异常表示的全监督方法(DRA)取得了最佳性能,但在遇到某些未见异常时性能显著下降。受单类监督学习中记忆库机制的启发,我们提出了NFM-DRA,将其与正常特征记忆结合,以缓解性能下降,建立新的SOTA。该基准可在https://github.com/DopamineLcy/BenchReAD上公开获取。

英文摘要

Retinal anomaly detection plays a pivotal role in screening ocular and systemic diseases. Despite its significance, progress in the field has been hindered by the absence of a comprehensive and publicly available benchmark, which is essential for the fair evaluation and advancement of methodologies. Due to this limitation, previous anomaly detection work related to retinal images has been constrained by (1) a limited and overly simplistic set of anomaly types, (2) test sets that are nearly saturated, and (3) a lack of generalization evaluation, resulting in less convincing experimental setups. Furthermore, existing benchmarks in medical anomaly detection predominantly focus on one-class supervised approaches (training only with negative samples), overlooking the vast amounts of labeled abnormal data and unlabeled data that are commonly available in clinical practice. To bridge these gaps, we introduce a benchmark for retinal anomaly detection, which is comprehensive and systematic in terms of data and algorithm. Through categorizing and benchmarking previous methods, we find that a fully supervised approach leveraging disentangled representations of abnormalities (DRA) achieves the best performance but suffers from significant drops in performance when encountering certain unseen anomalies. Inspired by the memory bank mechanisms in one-class supervised learning, we propose NFM-DRA, which integrates DRA with a Normal Feature Memory to mitigate the performance degradation, establishing a new SOTA. The benchmark is publicly available at https://github.com/DopamineLcy/BenchReAD.

2507.06428 2026-05-20 math.OC cs.LG cs.NA math.NA stat.ML 版本更新

Neural Actor-Critic Methods for Hamilton-Jacobi-Bellman PDEs: Asymptotic Analysis and Numerical Studies

神经Actor-Critic方法用于哈密尔顿-雅可比-贝尔曼PDEs:渐近分析与数值研究

Samuel N. Cohen, Jackson Hebner, Deqing Jiang, Justin Sirignano

发表机构 * Mathematical Institute, University of Oxford(牛津大学数学研究所)

AI总结 本文研究了用于求解高维哈密尔顿-雅可比-贝尔曼偏微分方程的神经Actor-Critic方法,通过渐近分析和数值研究,证明了该方法在解决随机控制问题中的有效性。

Comments 46 pages

详情
AI中文摘要

我们数学上分析并数值研究了一种用于求解随机控制理论中高维哈密尔顿-雅可比-贝尔曼(HJB)偏微分方程的Actor-Critic机器学习算法。批评者(价值函数估计器)的结构设计使得边界条件始终被完美满足(而不是包含在训练损失中),并利用偏斜梯度以减少计算成本。演员(最优控制估计器)通过最小化域内哈密尔顿量的积分进行训练,其中哈密尔顿量通过批评者估计。我们证明,当演员和批评者神经网络中的隐藏单元数量趋于无穷大时,演员和批评者的训练动态在Sobolev型空间中收敛到某个无限维常微分方程(ODE)。进一步地,在哈密尔顿量类似凸性假设下,我们证明该极限ODE的任何固定点都是原始随机控制问题的解。这为算法性能提供了重要保证,考虑到有限宽度神经网络可能只能收敛到局部极小值(而非最优解),由于其损失函数的非凸性。在我们的数值研究中,我们展示了该算法能够准确地在高达200维的随机控制问题中求解。特别是,我们构建了一系列逐渐复杂且具有已知解析解的随机控制问题,并研究该算法在这些问题上的数值性能。这些问题从线性二次调节器方程到极具挑战性的非凸哈密尔顿量方程,使我们能够识别并分析该神经Actor-Critic方法在求解HJB方程中的优势和局限性。

英文摘要

We mathematically analyze and numerically study an actor-critic machine learning algorithm for solving high-dimensional Hamilton-Jacobi-Bellman (HJB) partial differential equations from stochastic control theory. The architecture of the critic (the estimator for the value function) is structured so that the boundary condition is always perfectly satisfied (rather than being included in the training loss) and utilizes a biased gradient which reduces computational cost. The actor (the estimator for the optimal control) is trained by minimizing the integral of the Hamiltonian over the domain, where the Hamiltonian is estimated using the critic. We show that the training dynamics of the actor and critic neural networks converge in a Sobolev-type space to a certain infinite-dimensional ordinary differential equation (ODE) as the number of hidden units in the actor and critic $\rightarrow \infty$. Further, under a convexity-like assumption on the Hamiltonian, we prove that any fixed point of this limit ODE is a solution of the original stochastic control problem. This provides an important guarantee for the algorithm's performance in light of the fact that finite-width neural networks may only converge to a local minimizers (and not optimal solutions) due to the non-convexity of their loss functions. In our numerical studies, we demonstrate that the algorithm can solve stochastic control problems accurately in up to 200 dimensions. In particular, we construct a series of increasingly complex stochastic control problems with known analytic solutions and study the algorithm's numerical performance on them. These problems range from a linear-quadratic regulator equation to highly challenging equations with non-convex Hamiltonians, allowing us to identify and analyze the strengths and limitations of this neural actor-critic method for solving HJB equations.

2507.03122 2026-05-20 cs.IR cs.CL cs.LG 版本更新

Federated Learning for ICD Classification with Lightweight Models and Pretrained Embeddings

基于轻量模型和预训练嵌入的ICD分类联邦学习

Binbin Xu, Gérard Dray

发表机构 * EuroMov Digital Health in Motion, Univ. Montpellier, IMT Mines Ales(EuroMov数字健康运动、蒙彼利埃大学、IMT Mines Ales)

AI总结 本文研究了使用MIMIC-IV数据集中的临床笔记进行多标签ICD代码分类的联邦学习可行性与性能,提出了一种结合冻结文本嵌入和简单多层感知机分类器的轻量级可扩展流程,展示了在分布式医疗环境中隐私保护和部署高效的替代方案。

Comments 20 pages

详情
AI中文摘要

本研究探讨了使用MIMIC-IV数据集中的临床笔记进行多标签ICD代码分类的联邦学习(FL)的可行性和性能。不同于以往依赖集中训练或微调大型语言模型的方法,我们提出了一种轻量级且可扩展的流程,结合冻结的文本嵌入与简单的多层感知机(MLP)分类器。该设计为临床NLP应用提供了一种隐私保护且部署高效的替代方案,特别适用于分布式医疗环境。在集中式和联邦式配置下进行了广泛的实验,测试了六个公开可用的嵌入模型(来自Massive Text Embedding Benchmark排行榜)和三种MLP分类器架构,以及两种医学编码(ICD-9和ICD-10)。此外,对十个随机分层分割进行消融研究以评估性能稳定性。结果表明,嵌入质量在决定预测性能方面显著优于分类器复杂性,并且在理想条件下联邦学习可以接近集中式结果。尽管模型比最先进的架构小多个数量级,并且在微和宏F1分数上取得了竞争性的成绩,但仍存在一些限制,包括缺乏端到端训练和简化FL假设。然而,本研究展示了向可扩展、隐私意识的医疗编码系统迈进的可行方法,并为未来研究联邦、领域适应的临床AI提供了一步。

英文摘要

This study investigates the feasibility and performance of federated learning (FL) for multi-label ICD code classification using clinical notes from the MIMIC-IV dataset. Unlike previous approaches that rely on centralized training or fine-tuned large language models, we propose a lightweight and scalable pipeline combining frozen text embeddings with simple multilayer perceptron (MLP) classifiers. This design offers a privacy-preserving and deployment-efficient alternative for clinical NLP applications, particularly suited to distributed healthcare settings. Extensive experiments across both centralized and federated configurations were conducted, testing six publicly available embedding models from Massive Text Embedding Benchmark leaderboard and three MLP classifier architectures under two medical coding (ICD-9 and ICD-10). Additionally, ablation studies over ten random stratified splits assess performance stability. Results show that embedding quality substantially outweighs classifier complexity in determining predictive performance, and that federated learning can closely match centralized results in idealized conditions. While the models are orders of magnitude smaller than state-of-the-art architectures and achieved competitive micro and macro F1 scores, limitations remain including the lack of end-to-end training and the simplified FL assumptions. Nevertheless, this work demonstrates a viable way toward scalable, privacy-conscious medical coding systems and offers a step toward for future research into federated, domain-adaptive clinical AI.

2507.01123 2026-05-20 cs.CV cs.LG eess.IV 版本更新

Landslide Detection and Mapping Using Deep Learning Across Multi-Source Satellite Data and Geographic Regions

利用多源卫星数据和地理区域的深度学习进行滑坡检测与制图

Rahul A. Burange, Harsh K. Shinde, Omkar Mutyalwar

发表机构 * Department of Electronics & Telecommunication, KDK College of Engineering(电子与电信系,KDK工程学院)

AI总结 本文提出了一种综合方法,结合多源卫星影像和深度学习模型,以提高滑坡识别和预测的准确性,通过Sentinel-2多光谱数据和ALOS PALSAR衍生的坡度和数字高程模型(DEM)层来捕捉影响滑坡发生的关键环境特征,并评估多种地理空间分析技术对检测精度的影响,同时评估了多种先进的深度学习分割模型,如U-Net、DeepLabV3+和Res-Net,以确定其在滑坡检测中的有效性。

Comments 17 pages, 22 figures

详情
Journal ref
JETIR March 2025, Volume 12, Issue 3
AI中文摘要

滑坡对基础设施、经济和人类生命构成严重威胁,需要在多样化的地理区域中进行准确的检测和预测制图。随着深度学习和遥感技术的进步,自动化滑坡检测已变得更加有效。本文提出了一种综合方法,整合多源卫星影像和深度学习模型,以增强滑坡识别和预测。我们利用Sentinel-2多光谱数据和ALOS PALSAR衍生的坡度和数字高程模型(DEM)层来捕捉影响滑坡发生的关键环境特征。各种地理空间分析技术被用来评估地形特征、植被覆盖和降雨对检测精度的影响。此外,我们评估了多种先进的深度学习分割模型,包括U-Net、DeepLabV�+和Res-Net,以确定其在滑坡检测中的有效性。所提出的框架有助于发展可靠的早期预警系统,改进灾害风险管理,并促进可持续的土地利用规划。我们的发现为深度学习和多源遥感在创建稳健、可扩展和可转移的滑坡预测模型中的潜力提供了有价值的见解。

英文摘要

Landslides pose severe threats to infrastructure, economies, and human lives, necessitating accurate detection and predictive mapping across diverse geographic regions. With advancements in deep learning and remote sensing, automated landslide detection has become increasingly effective. This study presents a comprehensive approach integrating multi-source satellite imagery and deep learning models to enhance landslide identification and prediction. We leverage Sentinel-2 multispectral data and ALOS PALSAR-derived slope and Digital Elevation Model (DEM) layers to capture critical environmental features influencing landslide occurrences. Various geospatial analysis techniques are employed to assess the impact of terra in characteristics, vegetation cover, and rainfall on detection accuracy. Additionally, we evaluate the performance of multiple stateof-the-art deep learning segmentation models, including U-Net, DeepLabV3+, and Res-Net, to determine their effectiveness in landslide detection. The proposed framework contributes to the development of reliable early warning systems, improved disaster risk management, and sustainable land-use planning. Our findings provide valuable insights into the potential of deep learning and multi-source remote sensing in creating robust, scalable, and transferable landslide prediction models.

2506.17036 2026-05-20 stat.ME cs.LG stat.ML 版本更新

Bayesian Joint Model of Multi-Sensor and Failure Event Data for Multi-Mode Failure Prediction

多传感器和故障事件数据的贝叶斯联合模型用于多模式故障预测

Sina Aghaee Dabaghan Fard, Minhee Kim, Akash Deep, Jaesung Lee

发表机构 * Department of Industrial and Systems Engineering, Texas A&M University(德克萨斯A&M大学工业与系统工程系) University of Florida(佛罗里达大学) School of Industrial Engineering and Management, Oklahoma State University(俄克拉荷马州立大学工业工程与管理学院)

AI总结 本文提出了一种联合建模多传感器时间序列数据和多模式故障时间的贝叶斯方法,通过整合Cox比例危险模型、卷积多输出高斯过程和多项式故障模式分布,实现对系统剩余使用寿命的准确预测,并通过数值和案例研究验证了其优势。

详情
AI中文摘要

现代工业系统常常受到多种故障模式的影响,其状态由多个传感器监控,产生多个时间序列信号。此外,时间到故障的数据也经常可用。准确预测系统剩余使用寿命(RUL)需要有效利用多传感器时间序列数据和多模式故障事件数据。在大多数现有模型中,故障模式和RUL预测是独立进行的,忽略了这两个任务之间的内在关系。一些模型使用黑箱机器学习方法整合多种故障模式和事件预测,但缺乏统计严谨性,无法表征模型和数据中的内在不确定性。本文提出了一种统一的方法,通过层次贝叶斯框架整合多传感器时间序列数据和涉及多种故障模式的故障时间,该模型整合了Cox比例危险模型、卷积多输出高斯过程和多项式故障模式分布,并相应地设置先验,从而实现具有鲁棒不确定性量化的准确预测。通过变分贝叶斯方法有效获得后验分布,并通过蒙特卡洛采样进行预测。所提出模型的优势通过广泛的数值和案例研究,使用喷气发动机数据集进行了验证。

英文摘要

Modern industrial systems are often subject to multiple failure modes, and their conditions are monitored by multiple sensors, generating multiple time-series signals. Additionally, time-to-failure data are commonly available. Accurately predicting a system's remaining useful life (RUL) requires effectively leveraging multi-sensor time-series data alongside multi-mode failure event data. In most existing models, failure modes and RUL prediction are performed independently, ignoring the inherent relationship between these two tasks. Some models integrate multiple failure modes and event prediction using black-box machine learning approaches, which lack statistical rigor and cannot characterize the inherent uncertainty in the model and data. This paper introduces a unified approach to jointly model the multi-sensor time-series data and failure time concerning multiple failure modes. This proposed model integrate a Cox proportional hazards model, a Convolved Multi-output Gaussian Process, and multinomial failure mode distributions in a hierarchical Bayesian framework with corresponding priors, enabling accurate prediction with robust uncertainty quantification. Posterior distributions are effectively obtained by Variational Bayes, and prediction is performed with Monte Carlo sampling. The advantages of the proposed model is validated through extensive numerical and case studies with jet-engine dataset.

2506.15753 2026-05-20 quant-ph cs.LG cs.SY eess.SY 版本更新

QPPG: Quantum-Preconditioned Policy Gradient for Link Adaptation in Rayleigh Fading Channels

QPPG:用于瑞利衰落信道链路自适应的量子预条件策略梯度

Oluwaseyi Giwa, Muhammad Ahmed Mohsin, Folarin Jubril Adesola, Muhammad Ali Jamshed

发表机构 * African Institute for Mathematical Sciences(非洲数学科学研究所) Stanford University(斯坦福大学) Olabisi Onabanjo University(奥拉比·奥纳班乔大学) University of Glasgow(格拉斯哥大学)

AI总结 本文提出量子预条件策略梯度算法,通过信息 Fisher 基于预条件稳定和加速策略更新,提升无线通信中动态衰落环境下的链路自适应性能,实现更快收敛、更高的吞吐量和更低的发射功率。

Comments Submitted to IEEE Wireless Communications Letters

详情
AI中文摘要

可靠的链路自适应对于动态衰落环境中高效无线通信至关重要。然而,由于策略梯度的条件较差,强化学习(RL)解决方案常常因收敛不稳定而受到限制,阻碍了其实际应用。我们提出了量子预条件策略梯度(QPPG)算法,该算法利用基于 Fisher 信息的预条件来稳定和加速策略更新。在瑞利衰落场景中的评估显示,QPPG 相比经典方法实现了更快的收敛速度,平均吞吐量提高了 28.6%,平均发射功率降低了 43.8%。这项工作引入了量子几何预条件到链路自适应中,标志着在开发鲁棒、具有量子启发的强化学习以应对未来 6G 网络方面取得了重大进展,从而提高通信的可靠性和能效。

英文摘要

Reliable link adaptation is critical for efficient wireless communications in dynamic fading environments. However, reinforcement learning (RL) solutions often suffer from unstable convergence due to poorly conditioned policy gradients, hindering their practical application. We propose the quantum-preconditioned policy gradient (QPPG) algorithm, which leverages Fisher-information-based preconditioning to stabilise and accelerate policy updates. Evaluations in Rayleigh fading scenarios show that QPPG achieves faster convergence, a 28.6% increase in average throughput, and a 43.8% decrease in average transmit power compared to classical methods. This work introduces quantum-geometric conditioning to link adaptation, marking a significant advance in developing robust, quantum-inspired reinforcement learning for future 6G networks, thereby enhancing communication reliability and energy efficiency.

2506.08618 2026-05-20 cs.LG cond-mat.mes-hall cond-mat.other cs.AI cs.CV 版本更新

HSG-12M: A Large-Scale Benchmark of Spatial Multigraphs from the Energy Spectra of Non-Hermitian Crystals

HSG-12M: 一种大规模空间多图基准,源自非厄密晶体能量谱

Xianquan Yan, Hakan Akgün, Kenji Kawaguchi, N. Duane Loh, Ching Hua Lee

发表机构 * National University of Singapore(新加坡国立大学) NUS Centre for Bioimaging Sciences(新加坡国立大学生物成像科学中心)

AI总结 本文提出HSG-12M,一个包含1160万静态和510万动态哈密顿量谱图的数据集,用于研究非厄密量子物理中的复杂几何结构,填补了现有图基准在空间多边学习方面的空白。

Comments Accepted to ICLR 2026, OpenReview: [https://openreview.net/forum?id=YxuKCME576]. 49 pages, 13 figures, 14 tables. Code & pipeline: [https://github.com/sarinstein-yan/Poly2Graph] Dataset: [https://github.com/sarinstein-yan/HSG-12M] Dataset released under CC BY 4.0. The Fourteenth International Conference on Learning Representations (ICLR 2026)

详情
Journal ref
The Fourteenth International Conference on Learning Representations (ICLR 2026)
AI中文摘要

人工智能正通过揭示理解复杂物理系统的新方法改变科学研究,但其影响仍受限于缺乏大规模、高质量的领域专用数据集。非厄密量子物理中蕴藏着丰富的资源,其中晶体的能量谱在复平面上形成复杂的几何结构,称为哈密顿量谱图。尽管这些谱图作为电子行为的指纹具有重要意义,但其系统研究一直受限于手动提取的依赖。为释放这一潜力,我们引入Poly2Graph:一个高性能、开源的管道,自动化将一维晶体哈密顿量映射到谱图。使用该工具,我们提出了HSG-12M:一个包含1160万静态和510万动态哈密顿量谱图的数据集,涵盖1401个特征多项式类别,源自177TB的谱势数据。关键的是,HSG-12M是首个大规模空间多图数据集——图嵌入在度量空间中,其中两个节点之间不同的几何轨迹被保留为单独的边。这同时填补了现有图基准在空间多边学习方面的空白。流行的GNN基准测试揭示了在大规模学习空间多边时的新挑战。除了其实际用途外,我们还表明谱图是多项式、向量和矩阵的通用拓扑指纹,建立了新的代数到图的联系。HSG-12M为凝聚态物理的数据驱动科学发现奠定了基础,为几何感知图学习的新机会以及更广泛领域铺平了道路。

英文摘要

AI is transforming scientific research by revealing new ways to understand complex physical systems, but its impact remains constrained by the lack of large, high-quality domain-specific datasets. A rich, largely untapped resource lies in non-Hermitian quantum physics, where the energy spectra of crystals form intricate geometries on the complex plane -- termed as Hamiltonian spectral graphs. Despite their significance as fingerprints for electronic behavior, their systematic study has been intractable due to the reliance on manual extraction. To unlock this potential, we introduce Poly2Graph: a high-performance, open-source pipeline that automates the mapping of 1-D crystal Hamiltonians to spectral graphs. Using this tool, we present HSG-12M: a dataset containing 11.6 million static and 5.1 million dynamic Hamiltonian spectral graphs across 1401 characteristic-polynomial classes, distilled from 177 TB of spectral potential data. Crucially, HSG-12M is the first large-scale dataset of spatial multigraphs -- graphs embedded in a metric space where multiple geometrically distinct trajectories between two nodes are retained as separate edges. This simultaneously addresses a critical gap, as existing graph benchmarks overwhelmingly assume simple, non-spatial edges, discarding vital geometric information. Benchmarks with popular GNNs expose new challenges in learning spatial multi-edges at scale. Beyond its practical utility, we show that spectral graphs serve as universal topological fingerprints of polynomials, vectors, and matrices, forging a new algebra-to-graph link. HSG-12M lays the groundwork for data-driven scientific discovery in condensed matter physics, new opportunities in geometry-aware graph learning and beyond.

2506.01529 2026-05-20 cs.LG 版本更新

Learning Abstract World Models with a Group-Structured Latent Space

通过组结构潜在空间学习抽象世界模型

Thomas Delliaux, Nguyen-Khanh Vu, Vincent François-Lavet, Elise van der Pol, Emmanuel Rachelson

发表机构 * ISAE-SUPAERO ETH Zürich(瑞士联邦理工学院) Vrije Universiteit Amsterdam(阿姆斯特丹自由大学) Microsoft Research AI for Science(微软研究院人工智能科学部)

AI总结 该研究通过在低维表示流形上引入几何先验,改进了马尔可夫决策过程的抽象模型学习,从而提升有限数据下的泛化能力,并在具有旋转和翻译特征的环境中实现了更有效的强化学习任务学习。

Comments 20 pages, 18 figures

详情
AI中文摘要

学习有意义的马尔可夫决策过程(MDPs)的抽象模型对于从有限数据中提高泛化能力至关重要。在本文中,我们展示了如何在学习的转移模型的低维表示流形上施加几何先验。我们通过适当选择潜在空间和相关的群作用,纳入已知的对称结构,这些结构编码了环境中的先验知识关于不变性。此外,我们的框架允许将额外的无结构信息与这些对称性一起嵌入。我们实验表明,这导致了比完全无结构方法更好的潜在转移模型预测,以及在具有旋转和翻译特征的环境中下游RL任务学习的改进。此外,我们的实验还显示,这导致了更简单和更解耦的表示。完整的代码可在GitHub上获得以确保可重复性。

英文摘要

Learning meaningful abstract models of Markov Decision Processes (MDPs) is crucial for improving generalization from limited data. In this work, we show how geometric priors can be imposed on the low-dimensional representation manifold of a learned transition model. We incorporate known symmetric structures via appropriate choices of the latent space and the associated group actions, which encode prior knowledge about invariances in the environment. In addition, our framework allows the embedding of additional unstructured information alongside these symmetries. We show experimentally that this leads to better predictions of the latent transition model than fully unstructured approaches, as well as better learning on downstream RL tasks, in environments with rotational and translational features, including in first-person views of 3D environments. Additionally, our experiments show that this leads to simpler and more disentangled representations. The full code is available on GitHub to ensure reproducibility.

2506.00286 2026-05-20 cs.LG cs.AI math.OC stat.ML 版本更新

Recursive Entropic Risk Optimization in Discounted MDPs: Sample Complexity Bounds with a Generative Model

递归熵风险优化在折扣马尔可夫决策过程中的应用:带有生成模型的样本复杂性界

Oliver Mortensen, Mohammad Sadegh Talebi

发表机构 * Department of Computer Science, University of Copenhagen(哥本哈根大学计算机科学系)

AI总结 本文研究了在有限折扣马尔可夫决策过程(MDP)中使用递归熵风险度量(ERM)进行风险敏感强化学习的问题,引入了基于模型的算法Model-Based ERM Q-Value Iteration(MB-RS-QVI),并推导了该算法在价值学习和策略学习中的PAC型样本复杂性界,证明了在最坏情况下样本复杂性与|β|/(1-γ)呈指数关系,为递归ERM在风险规避和风险寻求情形下的样本复杂性提供了首次严格保证。

详情
AI中文摘要

我们研究了在有限折扣马尔可夫决策过程(MDP)中使用递归熵风险度量(ERM)进行风险敏感强化学习的问题,其中风险参数β≠0控制智能体的风险态度:β>0表示风险规避,β<0表示风险寻求行为。假设MDP具有生成模型。我们的关注点是学习最优状态-动作价值函数(价值学习)和最优策略(策略学习)在递归ERM下的样本复杂性。我们引入了一个基于模型的算法,称为Model-Based ERM Q-Value Iteration(MB-RS-QVI),并推导了该算法在价值和策略学习中的PAC型样本复杂性界。两种PAC界都随|β|/(1-γ)呈指数增长,其中γ是折扣因子。我们还为价值和策略学习建立了相应的下界,证明在最坏情况下样本复杂性对|β|/(1-γ)的指数依赖是不可避免的。这些界在状态和动作的数量(S和A)上是紧的,为递归ERM在风险规避和风险寻求情形下的样本复杂性提供了首次严格保证。

英文摘要

We study risk-sensitive reinforcement learning in finite discounted MDPs with recursive entropic risk measures (ERM), where the risk parameter $β\neq 0$ controls the agent's risk attitude: $β>0$ for risk-averse and $β<0$ for risk-seeking behavior. A generative model of the MDP is assumed to be available. Our focus is on the sample complexities of learning the optimal state-action value function (value learning) and an optimal policy (policy learning) under recursive ERM. We introduce a model-based algorithm, called Model-Based ERM $Q$-Value Iteration (MB-RS-QVI), and derive PAC-type bounds on its sample complexity for both value and policy learning. Both PAC bounds scale exponentially with $|β|/(1-γ)$, where $γ$ is the discount factor. We also establish corresponding lower bounds for both value and policy learning, showing that exponential dependence on $|β|/(1-γ)$ is unavoidable in the worst case. The bounds are tight in the number of states and actions ($S$ and $A$), providing the first rigorous sample complexity guarantees for recursive ERM across both risk-averse and risk-seeking regimes.

2505.23747 2026-05-20 cs.CV cs.AI cs.LG 版本更新

Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence

Spatial-MLLM: 提升基于视觉的空域智能的MLLM能力

Diankun Wu, Fangfu Liu, Yi-Hsin Hung, Yueqi Duan

发表机构 * Tsinghua University(清华大学)

AI总结 本文提出Spatial-MLLM,一种基于纯2D观测的视觉空域推理框架,通过双编码器架构和空间感知帧采样策略提升空域理解能力,实验表明其在多种视觉空域任务中达到SOTA性能。

Comments 22 pages

详情
AI中文摘要

近年来,多模态大语言模型(MLLMs)在2D视觉任务上的性能显著提升。然而,提高其空间智能仍是一个挑战。现有的3D MLLMs总是依赖额外的3D或2.5D数据来整合空间意识,限制了它们在只有2D输入(如图像或视频)场景中的实用性。在本文中,我们提出了Spatial-MLLM,一种新颖的框架,用于从纯2D观测中进行基于视觉的空间推理。与传统视频MLLMs依赖CLIP-based视觉编码器优化语义理解不同,我们的关键见解是释放来自前馈视觉几何基础模型的强大结构先验。具体来说,我们提出了双编码器架构:一个预训练的2D视觉编码器用于提取语义特征,以及一个3D空间编码器,从视觉几何模型的主干初始化以提取3D结构特征。然后,一个连接器将两种特征整合到统一的视觉标记中以增强空间理解。此外,我们提出了一种在推理时间的空间感知帧采样策略,该策略选择视频序列中具有空间信息的帧,确保在有限的token长度下,模型专注于对空间推理至关重要的帧。除了架构改进外,我们从多个来源构建了一个训练数据集,并使用监督微调和GRPO对其进行训练。在各种真实世界数据集上的广泛实验表明,Spatial-MLLM在广泛的基于视觉的空间理解和推理任务中实现了SOTA性能。项目页面:https://diankun-wu.github.io/Spatial-MLLM/.

英文摘要

Recent advancements in Multimodal Large Language Models (MLLMs) have significantly enhanced performance on 2D visual tasks. However, improving their spatial intelligence remains a challenge. Existing 3D MLLMs always rely on additional 3D or 2.5D data to incorporate spatial awareness, restricting their utility in scenarios with only 2D inputs, such as images or videos. In this paper, we present Spatial-MLLM, a novel framework for visual-based spatial reasoning from purely 2D observations. Unlike conventional video MLLMs which rely on CLIP-based visual encoders optimized for semantic understanding, our key insight is to unleash the strong structure prior from the feed-forward visual geometry foundation model. Specifically, we propose a dual-encoder architecture: a pretrained 2D visual encoder to extract semantic features, and a 3D spatial encoder-initialized from the backbone of the visual geometry model-to extract 3D structure features. A connector then integrates both features into unified visual tokens for enhanced spatial understanding. Furthermore, we propose a space-aware frame sampling strategy at inference time, which selects the spatially informative frames of a video sequence, ensuring that even under limited token length, the model focuses on frames critical for spatial reasoning. Beyond architecture improvements, we construct a training dataset from multiple sources and train the model on it using supervised fine-tuning and GRPO. Extensive experiments on various real-world datasets demonstrate that Spatial-MLLM achieves state-of-the-art performance in a wide range of visual-based spatial understanding and reasoning tasks. Project page: https://diankun-wu.github.io/Spatial-MLLM/.

2505.18191 2026-05-20 eess.SP cs.AI cs.LG cs.PF 版本更新

Quantifying the Generalization Gap in Seizure Detection: A Large-Scale Empirical Benchmark via the SzCORE Challenge

量化癫痫检测中的泛化差距:通过SzCORE挑战进行大规模经验基准测试

Jonathan Dan, Amirhossein Shahbazinia, Christodoulos Kechris, David Atienza

发表机构 * Embedded Systems Laboratory, EPFL, Lausanne, Switzerland(瑞士洛桑联邦理工学院嵌入式系统实验室)

AI总结 本文通过SzCORE挑战的大规模经验研究,量化了癫痫检测中模型泛化能力的差距,评估了28种最先进的算法架构,揭示了当前模型在不同患者群体中表现不一致的问题,并提出了标准化评估的必要性。

详情
AI中文摘要

可靠的自动长期脑电图(EEG)癫痫检测仍是一个未解决的挑战,因为当前模型往往无法在不同患者或临床环境中泛化。手动EEG审查仍然是标准护理,突显了对稳健模型和标准化评估的需求。当前文献常报告高效率,但这些模型在部署到未见过的患者群体时经常失效。为了严格评估这种泛化差距,我们进行了一项大规模经验研究,评估了28种最先进的算法架构,从经典特征工程到现代深度学习。这些算法通过组织竞赛收集。利用严格保留的私人数据集,包含65名受试者的连续EEG记录,共计4360小时的数据,来评估算法性能。专家神经生理学家对这些记录进行了注释,建立了癫痫事件的地面真相。算法使用SzCORE框架中的基于事件的指标进行评估,包括灵敏度、精确度、F1分数和每天的假阳性率。结果揭示了最先进的方法之间显著的性能差异,其中最高F1分数为32%(灵敏度37%,精确度29%),突显了这项任务的持续困难。分析揭示了峰值性能与群体水平稳定性之间的不一致。获得最高综合F1分数的算法并未在不同受试者中获得最一致的排名。这项独立评估暴露了自我报告效率与保留性能之间的明显差距,强调了标准化、严格基准测试的必要性。评估基础设施转变为一个持续开放的基准测试平台,促进可重复的研究,并加速稳健癫痫检测算法的发展。

英文摘要

Reliable automatic seizure detection from long-term electroencephalography (EEG) remains an unsolved challenge, as current models often fail to generalize across patients or clinical settings. Manual EEG review still is the standard of care, highlighting the need for robust models and standardized evaluation. The current literature often reports high efficacy, yet these models frequently fail when deployed to unseen patient populations. To rigorously assess this generalization gap, we conducted a large-scale empirical study evaluating 28 state-of-the-art algorithmic architectures, ranging from classical feature engineering to modern Deep Learning. These algorithms were collected by organizing a competition. A strictly held-out private dataset of continuous EEG recordings from 65 subjects, totaling 4,360 hours of data, was utilized to evaluate algorithm performance. Expert neurophysiologists annotated these recordings, establishing the ground truth for seizure events. Algorithms were evaluated using event-based metrics from the SzCORE framework, including sensitivity, precision, F1-score, and false positive rate per day. Results revealed significant performance variability among state-of-the-art approaches, with the top F1 score of 32% (sensitivity 37%, precision 29%), highlighting the persistent difficulty of this task. Analysis uncovered a discordance between peak performance and population-level stability. The algorithms achieving the highest aggregate F1-scores did not achieve the most consistent ranking across subjects. This independent evaluation exposed a notable gap between self-reported efficacies and hold-out performance, underscoring the critical need for standardized, rigorous benchmarking. The evaluation infrastructure transitions into a continuously open benchmarking platform, fostering reproducible research and accelerating robust seizure detection algorithm development.

2504.17548 2026-05-20 quant-ph cs.CR cs.LG 版本更新

Quantum Autoencoder for Multivariate Time Series Anomaly Detection

量子自编码器用于多变量时间序列异常检测

Kilian Tscharke, Maximilian Wendlinger, Afrae Ahouzi, Pallavi Bhardwaj, Kaweh Amoi-Taleghani, Michael Schrödl-Baumann, Pascal Debus

发表机构 * Fraunhofer Institute for Applied and Integrated Security (AISEC)(弗劳恩霍夫应用与集成安全研究所(AISEC)) SAP SE(SAP公司)

AI总结 本文提出了一种基于量子自编码器的框架,专门用于企业级多变量时间序列异常检测,展示了其在数据压缩和异常检测中的竞争力。

Comments Submitted to IEEE International Conference on Quantum Computing and Engineering (QCE) 2025

详情
Journal ref
2024 IEEE International Conference on Quantum Computing and Engineering (QCE), Albuquerque, NM, USA, 2025, pp. 2470-2481
AI中文摘要

异常检测(AD)定义了识别偏离典型或正常模式的观测或事件的任务,这是IT安全中识别系统配置错误、恶意软件感染或网络攻击等事件的关键能力。在像SAP HANA Cloud系统这样的企业环境中,这项任务通常涉及监控来自遥测和日志数据的高维、多变量时间序列(MTS)。随着量子机器学习在高维潜在空间中提供高效计算的能力,许多途径得以处理此类复杂数据。一种方法是量子自编码器(QAE),一种新兴且有前途的方法,具有在数据压缩和AD中的应用潜力。然而,先前将QAE应用于时间序列AD的应用仅限于单变量数据,限制了其在现实企业系统中的相关性。在本工作中,我们介绍了一种新的基于QAE的框架,专门针对企业规模的MTS AD。我们理论开发并实验验证了该架构,证明我们的QAE在性能上与基于神经网络的自编码器相媲美,同时需要更少的可训练参数。我们在反映SAP系统遥测的数据集上评估了我们的模型,显示所提出的QAE是现实企业环境中半监督AD的一种可行且高效的替代方案。

英文摘要

Anomaly Detection (AD) defines the task of identifying observations or events that deviate from typical - or normal - patterns, a critical capability in IT security for recognizing incidents such as system misconfigurations, malware infections, or cyberattacks. In enterprise environments like SAP HANA Cloud systems, this task often involves monitoring high-dimensional, multivariate time series (MTS) derived from telemetry and log data. With the advent of quantum machine learning offering efficient calculations in high-dimensional latent spaces, many avenues open for dealing with such complex data. One approach is the Quantum Autoencoder (QAE), an emerging and promising method with potential for application in both data compression and AD. However, prior applications of QAEs to time series AD have been restricted to univariate data, limiting their relevance for real-world enterprise systems. In this work, we introduce a novel QAE-based framework designed specifically for MTS AD towards enterprise scale. We theoretically develop and experimentally validate the architecture, demonstrating that our QAE achieves performance competitive with neural-network-based autoencoders while requiring fewer trainable parameters. We evaluate our model on datasets that closely reflect SAP system telemetry and show that the proposed QAE is a viable and efficient alternative for semisupervised AD in real-world enterprise settings.

2504.00470 2026-05-20 cs.LG cs.CV 版本更新

Less is More: Efficient Black-box Attribution via Minimal Interpretable Subset Selection

少即是多:通过最小可解释子集选择实现高效的黑盒属性分析

Ruoyu Chen, Siyuan Liang, Jingzhi Li, Shiming Liu, Li Liu, Hua Zhang, Xiaochun Cao

发表机构 * Institute of Information Engineering, Chinese Academy of Sciences(中国科学院信息工程研究所) University of Chinese Academy of Sciences(中国科学院大学) College of Computing and Data Science, Nanyang Technological University(南洋理工大学计算机与数据科学学院) School of Artificial Intelligence, University of Science and Technology Beijing(北京科技大学人工智能学院) Department of Mechanical Engineering, Imperial College London(伦敦帝国理工学院机械工程系) Center for Machine Vision and Signal Analysis (CMVS), University of Oulu(奥卢大学机器视觉与信号分析中心) School of Cyber Science and Technology, Shenzhen Campus of Sun Yat-sen University(中山大学深圳校区计算机科学与技术学院)

AI总结 本文提出了一种高效的黑盒属性分析方法LiMA,通过将重要区域的属性分析转化为子模函数子集选择的优化问题,以更少的区域提供更准确的解释,并在多个基准模型上展示了显著的改进。

详情
AI中文摘要

为了开发一个可信的AI系统,目标是识别对模型决策影响最大的输入区域。现有属性方法的主要任务是高效且准确地识别输入-预测交互关系。特别是当输入数据是离散的,如图像时,分析输入和输出之间的关系由于组合爆炸而成为重大挑战。在本文中,我们提出了一种新颖且高效的黑盒属性机制LiMA(Less input is More faithful for Attribution),它将重要区域的属性分析重新表述为一个子模子集选择的优化问题。首先,为了准确评估交互,我们设计了一个子模函数,该函数量化子集的重要性并有效捕捉其对决策结果的影响。然后,通过一种新的双向贪心搜索算法,高效地对输入子区域按重要性进行排序。LiMA能够识别最和最不重要的样本,同时确保一个最优的属性边界,以最小化误差。在八个基础模型上的广泛实验表明,我们的方法在更少的区域上提供了忠实的解释,并表现出强大的泛化能力,插入和删除任务的平均改进分别为36.3%和39.6%。我们的方法在属性效率方面也优于朴素的贪心搜索,速度提高了1.6倍。此外,当解释模型预测错误的原因时,我们的方法平均最高置信度比最先进的属性算法高86.1%。代码可在https://github.com/RuoyuChen10/LIMA上获得。

英文摘要

To develop a trustworthy AI system, which aim to identify the input regions that most influence the models decisions. The primary task of existing attribution methods lies in efficiently and accurately identifying the relationships among input-prediction interactions. Particularly when the input data is discrete, such as images, analyzing the relationship between inputs and outputs poses a significant challenge due to the combinatorial explosion. In this paper, we propose a novel and efficient black-box attribution mechanism, LiMA (Less input is More faithful for Attribution), which reformulates the attribution of important regions as an optimization problem for submodular subset selection. First, to accurately assess interactions, we design a submodular function that quantifies subset importance and effectively captures their impact on decision outcomes. Then, efficiently ranking input sub-regions by their importance for attribution, we improve optimization efficiency through a novel bidirectional greedy search algorithm. LiMA identifies both the most and least important samples while ensuring an optimal attribution boundary that minimizes errors. Extensive experiments on eight foundation models demonstrate that our method provides faithful interpretations with fewer regions and exhibits strong generalization, shows an average improvement of 36.3% in Insertion and 39.6% in Deletion. Our method also outperforms the naive greedy search in attribution efficiency, being 1.6 times faster. Furthermore, when explaining the reasons behind model prediction errors, the average highest confidence achieved by our method is, on average, 86.1% higher than that of state-of-the-art attribution algorithms. The code is available at https://github.com/RuoyuChen10/LIMA.

2503.17581 2026-05-20 math.OC cs.LG 版本更新

Time-optimal neural feedback control of nilpotent systems as a binary classification problem

时间最优神经反馈控制的nilpotent系统作为二分类问题

Sara Bicego, Samuel Gue, Dante Kalise, Nelly Villamizar

发表机构 * Department of Mathematics, Imperial College London, United Kingdom(伦敦帝国学院数学系,英国) Department of Mathematics, Swansea University, United Kingdom(斯旺西大学数学系,英国)

AI总结 本文提出了一种用于线性nilpotent系统时间最优反馈控制律合成的计算方法,通过将问题转化为二分类问题来构建时间最优深度神经网络。

详情
AI中文摘要

本文提出了一种用于线性nilpotent系统时间最优反馈控制律合成的计算方法。该方法基于 bang-bang 定理,将时间最优轨迹表征为依赖于控制切换序列的参数依赖多项式系统。随后应用了消元牛顿法,以穷尽多项式系统的所有实根。根寻找过程受到 Hermite 二次型的指导,该方法提供了对所需实根数量的精确估计。在论文的第二部分,多项式系统被采样并求解,以生成合成数据集,从而通过监督学习构建时间最优深度神经网络——视为二分类器。通过不同维度的积分器进行数值测试,评估了近似控制律的准确性、鲁棒性和实时控制能力。

英文摘要

A computational method for the synthesis of time-optimal feedback control laws for linear nilpotent systems is proposed. The method is based on the use of the bang-bang theorem, which leads to a characterization of the time-optimal trajectory as a parameter-dependent polynomial system for the control switching sequence. A deflated Newton's method is then applied to exhaust all the real roots of the polynomial system. The root-finding procedure is informed by the Hermite quadratic form, which provides a sharp estimate on the number of real roots to be found. In the second part of the paper, the polynomial systems are sampled and solved to generate a synthetic dataset for the construction of a time-optimal deep neural network -- interpreted as a binary classifier -- via supervised learning. Numerical tests in integrators of increasing dimension assess the accuracy, robustness, and real-time-control capabilities of the approximate control law.

2503.13868 2026-05-20 cs.LG cs.AI 版本更新

Out-of-Distribution Generalization in Time Series: A Survey

时间序列中的分布外泛化:综述

Xin Wu, Fei Teng, Xingwang Li, Ji Zhang, Tianrui Li, Qiang Duan

发表机构 * School of Computing and Artificial Intelligence, Southwest Jiaotong University(计算机与人工智能学院,西南交通大学) Engineering Research Center of Sustainable Urban Intelligent Transportation, Ministry of Education(可持续城市智能交通工程研究中心,教育部) Information Sciences and Technology Department, the Pennsylvania State University(信息科学与技术系,宾夕法尼亚州立大学)

AI总结 本文综述了时间序列中分布外泛化的方法,分析了数据分布、表示学习和分布外评估三个维度,总结了主流算法,指出了应用场景和存在的挑战,并提出了未来研究方向。

Comments Work in Progress

详情
Journal ref
Information Fusion 133, 104336 (2026)
AI中文摘要

时间序列经常表现出分布偏移、多样化的潜在特征和非平稳学习动态,特别是在开放和演变的环境中。这些特性对分布外(OOD)泛化提出了重大挑战。尽管已有显著进展,但系统性综述仍缺乏。为填补这一空白,我们首次全面回顾了时间序列中OOD泛化方法,旨在阐明该领域的发展轨迹和当前研究现状。我们的分析分为三个基础维度:数据分布、表示学习和OOD评估。在每个维度中,我们详细介绍了几种流行的算法。此外,我们强调了关键的应用场景,突显其实际影响。最后,我们识别了持续存在的挑战并提出了未来的研究方向。时间序列中OOD泛化方法的详细总结可通过https://tsood-generalization.com获取。

英文摘要

Time series frequently manifest distribution shifts, diverse latent features, and non-stationary learning dynamics, particularly in open and evolving environments. These characteristics pose significant challenges for out-of-distribution (OOD) generalization. While substantial progress has been made, a systematic synthesis of advancements remains lacking. To address this gap, we present the first comprehensive review of OOD generalization methodologies for time series, organized to delineate the field's evolutionary trajectory and contemporary research landscape. We organize our analysis across three foundational dimensions: data distribution, representation learning, and OOD evaluation. For each dimension, we present several popular algorithms in detail. Furthermore, we highlight key application scenarios, emphasizing their real-world impact. Finally, we identify persistent challenges and propose future research directions. A detailed summary of the methods reviewed for the generalization of OOD in time series can be accessed at https://tsood-generalization.com.

2503.12172 2026-05-20 cs.LG cs.CR cs.CV 版本更新

SEAL: Semantic Aware Image Watermarking

SEAL:语义感知图像水印

Kasra Arabi, R. Teal Witter, Chinmay Hegde, Niv Cohen

发表机构 * New York University(纽约大学)

AI总结 本文提出了一种新的水印方法,通过将生成图像的语义信息直接嵌入水印中,实现无损水印验证,无需依赖密钥模式数据库。通过局部敏感哈希从图像语义嵌入中推断密钥模式,并基于原始图像内容条件检测水印,提高对抗伪造攻击的鲁棒性。

详情
AI中文摘要

生成模型已迅速发展以生成逼真的输出。然而,它们的合成输出越来越多地挑战自然与AI生成内容之间的清晰区分,需要稳健的水印技术。水印通常需要保持目标图像的完整性,抵御移除尝试,并防止未经授权的复制到无关图像上。为了解决这一需求,最近的方法将持久水印嵌入由扩散模型生成的图像中使用初始噪声。然而,为此,它们要么会扭曲生成图像的分布,要么依赖于搜索一个长密钥字典进行检测。在本文中,我们提出了一种新的水印方法,将生成图像的语义信息直接嵌入水印中,使水印无损,且无需数据库中的密钥模式即可验证。相反,密钥模式可以从图像的语义嵌入中使用局部敏感哈希推断。此外,将水印检测条件化于原始图像内容可以提高对伪造攻击的鲁棒性。为了证明这一点,我们考虑了两种被忽视的攻击策略:(i)攻击者提取初始噪声并生成具有相同模式的新图像;(ii)攻击者在水印图像中插入无关(可能有害)的对象,可能在保持水印的情况下。我们通过实验证明了我们的方法对这些攻击的增强鲁棒性。总的来说,我们的结果表明,内容感知的水印可以缓解图像生成模型带来的风险。

英文摘要

Generative models have rapidly evolved to generate realistic outputs. However, their synthetic outputs increasingly challenge the clear distinction between natural and AI-generated content, necessitating robust watermarking techniques. Watermarks are typically expected to preserve the integrity of the target image, withstand removal attempts, and prevent unauthorized replication onto unrelated images. To address this need, recent methods embed persistent watermarks into images produced by diffusion models using the initial noise. Yet, to do so, they either distort the distribution of generated images or rely on searching through a long dictionary of used keys for detection. In this paper, we propose a novel watermarking method that embeds semantic information about the generated image directly into the watermark, enabling a distortion-free watermark that can be verified without requiring a database of key patterns. Instead, the key pattern can be inferred from the semantic embedding of the image using locality-sensitive hashing. Furthermore, conditioning the watermark detection on the original image content improves robustness against forgery attacks. To demonstrate that, we consider two largely overlooked attack strategies: (i) an attacker extracting the initial noise and generating a novel image with the same pattern; (ii) an attacker inserting an unrelated (potentially harmful) object into a watermarked image, possibly while preserving the watermark. We empirically validate our method's increased robustness to these attacks. Taken together, our results suggest that content-aware watermarks can mitigate risks arising from image-generative models.

2502.04575 2026-05-20 stat.ML cs.LG cs.NA math.NA physics.comp-ph stat.CO 版本更新

Complexity Analysis of Normalizing Constant Estimation: from Jarzynski Equality to Annealed Importance Sampling and beyond

归一化常数估计的复杂性分析:从Jarzynski等式到退火重要性采样及其进一步发展

Wei Guo, Molei Tao, Yongxin Chen

发表机构 * Georgia Institute of Technology(佐治亚理工学院)

AI总结 本文研究了归一化常数估计问题,提出了一种非渐近分析方法,推导了退火重要性采样估计归一化常数的复杂度,并提出了一种新的算法以处理多模态问题。

Comments Accepted at ICLR 2026 (https://openreview.net/forum?id=96fJALwotm)

详情
AI中文摘要

给定一个未归一化的概率密度π∝e^{-V},估计其归一化常数Z=∫_{R^d}e^{-V(x)}dx或自由能F=-log Z是贝叶斯统计、统计力学和机器学习中的关键问题。尤其是在高维或π多模态时,这变得尤为具有挑战性。为了减轻传统重要性采样估计器的高方差,采用基于退火的方法如Jarzynski等式和退火重要性采样是常见的选择,但其定量复杂度保证仍很少被探索。我们朝着退火重要性采样的非渐近分析迈出第一步。特别是,我们推导出一个oracle复杂度为~O(dβ²A²/ε⁴)的复杂度,用于在高概率下估计Z的ε相对误差。其中,β是V的光滑度,A表示一个插值π和可处理参考分布的概率测度曲线的动作。我们的分析利用Girsanov定理和最优传输,不需要显式要求目标分布的等周假设。最后,为了处理广泛使用的几何插值的大动作,我们提出了一种基于反扩散采样器的新算法,建立了分析其复杂度的框架,并通过实验证明其在处理多模态问题中的效率。

英文摘要

Given an unnormalized probability density $π\propto\mathrm{e}^{-V}$, estimating its normalizing constant $Z=\int_{\mathbb{R}^d}\mathrm{e}^{-V(x)}\mathrm{d}x$ or free energy $F=-\log Z$ is a crucial problem in Bayesian statistics, statistical mechanics, and machine learning. It is challenging especially in high dimensions or when $π$ is multimodal. To mitigate the high variance of conventional importance sampling estimators, annealing-based methods such as Jarzynski equality and annealed importance sampling are commonly adopted, yet their quantitative complexity guarantees remain largely unexplored. We take a first step toward a non-asymptotic analysis of annealed importance sampling. In particular, we derive an oracle complexity of $\widetilde{O}\left(\frac{dβ^2{\mathcal{A}}^2}{\varepsilon^4}\right)$ for estimating $Z$ within $\varepsilon$ relative error with high probability, where $β$ is the smoothness of $V$ and $\mathcal{A}$ denotes the action of a curve of probability measures interpolating $π$ and a tractable reference distribution. Our analysis, leveraging Girsanov's theorem and optimal transport, does not explicitly require isoperimetric assumptions on the target distribution. Finally, to tackle the large action of the widely used geometric interpolation, we propose a new algorithm based on reverse diffusion samplers, establish a framework for analyzing its complexity, and empirically demonstrate its efficiency in tackling multimodality.

2411.08982 2026-05-20 cs.LG cs.DC 版本更新

Lynx: Enabling Efficient MoE Inference through Dynamic Batch-Aware Expert Selection

Lynx:通过动态批量感知专家选择实现高效的MoE推理

Vima Gupta, Jae Hyung Ju, Kartik Sinha, Ada Gavrilovska, Anand Padmanabha Iyer

发表机构 * Georgia Institute of Technology(佐治亚理工学院)

AI总结 本文提出Lynx系统,通过利用MoE训练中的负载平衡损失特性,减少专家调用总数,从而在不依赖工作负载的情况下实现高效的MoE推理,提升了吞吐量并保持了低的精度损失。

详情
AI中文摘要

混合专家(MoE)模型提供的选择性参数激活使其成为现代基础模型的流行选择。然而,当用于服务时,MoE面临一个根本性的矛盾。批处理对于服务性能至关重要,迫使激活所有专家,从而抵消了MoE的优势并加剧了内存带宽瓶颈。现有高效MoE推理方法即使在广泛的工作负载特定调优下也无法解决这一矛盾。我们提出了Lynx,一个能够在工作负载无关的情况下实现高效MoE推理的系统。Lynx利用了MoE训练的一个关键特性:负载平衡损失引入了批次级别的专家激活偏斜和冗余,它通过一种新的AffinityBinning技术重新映射每个批次中的低亲和力的token到专家分配,从而减少总调用的专家数量。我们在九个基准测试中对四种最先进的模型家族进行评估,结果显示Lynx在保持精度损失低于1个百分点的情况下,实现了高达1.30倍的吞吐量提升。此外,Lynx与现有技术互补,进一步提升了其性能,最高可提升1.38倍。

英文摘要

Selective parameter activation provided by Mixture-of-Expert (MoE) models have made them a popular choice in modern foundational models. However, MoEs face a fundamental tension when employed for serving. Batching, critical for performance in serving, forces the activation of all experts, thereby negating MoEs' benefits and exacerbating memory bandwidth bottlenecks. Existing work on efficient MoE inference are unable to resolve this tension even with extensive workload-specific tuning. We present LYNX, a system that enables efficient MoE inference in a workload-agnostic fashion. LYNX leverages a key property of MoE training: load-balancing losses introduce batch-level expert activation skews and redundancy, which it exploits by remapping low-affinity token-to-expert assignments within each batch using a novel AffinityBinning technique that reduces the total experts invoked. Our evaluation of LYNX on four state-of-the-art model families across nine benchmarks shows that it achieves up to 1.30x improvement in throughput while maintaining accuracy loss of less than 1% points across tasks. Further, LYNX is complementary to existing techniques where it additionally boosts their performance by up to 1.38x.

2410.15362 2026-05-20 cs.LG cs.AI cs.CL cs.CR 版本更新

Faster-GCG: Efficient Discrete Optimization Jailbreak Attacks against Aligned Large Language Models

Faster-GCG: 面向对齐大语言模型的高效离散优化监狱突破攻击

Xiao Li, Wei Zhang, Zhuhong Li, Qiongxiu Li, Shei PernChua, BingZe Lee, Jinghao Cui, Yifan Huang, Xiaolin Hu

发表机构 * Tsinghua University(清华大学) Sea-Fill Duke University(杜克大学) Aalborg University(奥胡斯大学) Chinese Institute for Brain Research (CIBR)(中国脑科学研究院)

AI总结 本文提出Faster-GCG,通过改进估计、高效采样和避免重复评估,提高了对齐大语言模型的监狱突破攻击效率,实现了样本效率提升8倍,时间减少7倍,并在多个模型上取得了更高的突破成功率。

Comments 18 pages, new version

详情
AI中文摘要

对齐大语言模型(LLMs)因其安全性而受到广泛关注,尤其是在试图通过对抗性提示绕过安全边界(guardrails)的监狱突破攻击中。现有方法中,贪心坐标梯度(GCG)攻击通过离散标记优化实现了自动化监狱突破,但其低样本效率限制了实际应用。特别是,GCG需要约256,000次评估才能达到满意的监狱突破成功率,这是由于底层离散优化问题的固有难度。在本工作中,我们识别了限制GCG样本效率的三个关键因素:不准确的基于梯度的估计、低效的均匀采样以及重复评估先前探索的后缀。为了解决这些问题,我们提出了Faster-GCG,一种经过简化且改进的GCG变种,它结合了基于距离的正则化以提高估计、温度控制的采样以更有效的探索,以及一个标记已访问后缀的机制以避免冗余评估。Faster-GCG将所需的评估次数减少到32,000次,实现了与GCG相比样本效率提升8倍和时间减少7倍的改进。在该减少的预算下,Faster-GCG在五个对齐LLMs上平均达到了78.1%的监狱突破成功率,并在Qwen3.5-4B上达到了88.7%,优于最先进的白盒监狱突破方法。

英文摘要

Aligned Large Language Models (LLMs) have attracted significant attention for their safety, particularly in the context of jailbreak attacks that attempt to bypass guardrails via adversarial prompts. Among existing approaches, the Greedy Coordinate Gradient (GCG) attack pioneered automated jailbreaks through discrete token optimization; however, its low sample efficiency limits practical applicability. In particular, GCG requires approximately 256K evaluations per harmful behavior to achieve a satisfactory jailbreak success rate, due to the inherent difficulty of the underlying discrete optimization problem. In this work, we identify three key factors that limit the sample efficiency of GCG: inaccurate gradient-based estimation, inefficient uniform sampling, and repeated evaluation of previously explored suffixes. To address these issues, we propose Faster-GCG, a streamlined variant of GCG that incorporates distance-based regularization for improved estimation, temperature-controlled sampling for more effective exploration, and a visited-suffix marking mechanism to avoid redundant evaluations. Faster-GCG reduced the required evaluations to 32K, achieving up to an $8\times$ improvement in sampling efficiency and a $7\times$ reduction in wall-clock time compared to GCG. Under this reduced budget, Faster-GCG attained an average jailbreak success rate of 78.1\% across five aligned LLMs, and achieved 88.7\% against Qwen3.5-4B, outperforming state-of-the-art white-box jailbreak methods.

2408.12385 2026-05-20 cs.DS cs.LG 版本更新

Sharper Bounds for Chebyshev Moment Matching, with Applications

更精确的Chebyshev矩匹配界限及其应用

Cameron Musco, Christopher Musco, Lucas Rosenblatt, Apoorv Vikram Singh

发表机构 * UMass Amherst(马萨诸塞大学阿姆赫斯特分校) New York University(纽约大学)

AI总结 本文研究了在存在噪声测量的情况下,通过Chebyshev多项式矩来近似恢复概率分布的问题。通过利用Lipschitz函数Chebyshev展开系数的全局衰减界,作者证明了在比之前已知的更多的噪声情况下,可以在Wasserstein距离中实现精确的恢复。该结果立即应用于多个领域:1)提供了一个简单的“线性查询”算法,用于构造具有Wasserstein-1误差为~O(1/n)的差分隐私合成数据分布;2)给出了一个~O(n²/ε)时间的算法,用于估计对称矩阵的谱密度,误差在Wasserstein距离内为ε;3)改进了Vinayak等人在ICML 2019上对“学习参数群体”统计问题最大似然估计器的分析,扩展了可以获得样本最优结果的参数范围。此外,作者还扩展了该界到d>1维分布的估计。

详情
AI中文摘要

我们研究了在存在噪声测量的情况下,通过Chebyshev多项式矩近似恢复概率分布的问题。这个问题在算法、统计和机器学习中广泛出现。通过利用任何Lipschitz函数Chebyshev展开系数的全局衰减界,我们改进了先前的工作,证明在比之前已知的更多的噪声情况下,可以在Wasserstein距离中实现精确的恢复。我们的结果立即导致了多个应用:1)我们提供了一个简单的“线性查询”算法,用于构造具有Wasserstein-1误差~O(1/n)的差分隐私合成数据分布,该结果在对数因子范围内是最佳的,并与Boedihardjo、Strohmer和Vershynin [Probab. Theory. Rel., 2024] 的结果相匹配,该结果使用了更复杂的“超正则随机游走”方法。2)我们给出了一个~O(n²/ε)时间的算法,用于估计n×n对称矩阵的谱密度,误差在Wasserstein距离内为ε。我们的结果加速了Chen等人[ICML 2021]和Braverman等人[STOC 2022]的先前方法。3)我们改进了Vinayak、Kong、Valiant和Kakade [ICML 2019] 对“学习参数群体”统计问题最大似然估计器的分析,扩展了可以获得样本最优结果的参数范围。除了这些主要结果外,我们还扩展了该界到d>1维分布的估计。我们希望这些界能更广泛地应用于涉及从噪声矩信息中恢复分布的问题。

英文摘要

We study the problem of approximately recovering a probability distribution given noisy measurements of its Chebyshev polynomial moments. This problem arises broadly across algorithms, statistics, and machine learning. By leveraging a global decay bound on the coefficients in the Chebyshev expansion of any Lipschitz function, we sharpen prior work, proving that accurate recovery in the Wasserstein distance is possible with more noise than previously known. Our result immediately yields a number of applications: 1) We give a simple "linear query" algorithm for constructing a differentially private synthetic data distribution with Wasserstein-$1$ error $\tilde{O}(1/n)$ based on a dataset of $n$ points in $[-1,1]$. This bound is optimal up to log factors, and matches a recent result of Boedihardjo, Strohmer, and Vershynin [Probab. Theory. Rel., 2024], which uses a more complex "superregular random walk" method. 2) We give an $\tilde{O}(n^2/ε)$ time algorithm for the linear algebraic problem of estimating the spectral density of an $n\times n$ symmetric matrix up to $ε$ error in the Wasserstein distance. Our result accelerates prior methods from Chen et al. [ICML 2021] and Braverman et al. [STOC 2022]. 3) We tighten an analysis of Vinayak, Kong, Valiant, and Kakade [ICML 2019] on the maximum likelihood estimator for the statistical problem of "Learning Populations of Parameters'', extending the parameter regime in which sample optimal results can be obtained. Beyond these main results, we provide an extension of our bound to estimating distributions in $d > 1$ dimensions. We hope that these bounds will find applications more broadly to problems involving distribution recovery from noisy moment information.

2407.17200 2026-05-20 stat.ML cs.LG math.OC stat.ME 版本更新

Generalization Bounds of Surrogate Policies for Combinatorial Optimization Problems

组合优化问题中替代策略的泛化界限

Pierre-Cyril Aubin-Frankowski, Yohann De Castro, Axel Parmentier, Alessandro Rudi

发表机构 * CERMICS, CNRS, ENPC, Institut Polytechnique de Paris(CERMICS、CNRS、ENPC、巴黎理工学院) Institut Camille Jordan, École Centrale Lyon, CNRS UMR 5208(让·坎贝尔学院、 Lyon 工程学院、 CNRS UMR 5208) Institut Universitaire de France (IUF)(法国研究院) SIERRA, INRIA Paris(INRIA 巴黎 SIERRA)

AI总结 本文研究了在组合优化问题中使用替代策略的泛化界限,通过分析平滑(扰动)策略,提出了一个将超额风险分解为扰动偏差、统计估计误差和优化误差的泛化界限,引入了新的几何量来控制扰动偏差,并利用核Sum-of-Squares方法减少全局优化的维度灾难。

Comments 29 pages main document, 9 pages supplement

详情
AI中文摘要

许多现实世界决策问题需要反复求解来自共同分布的组合优化实例。最近的结构学习方法利用这种规律性,通过学习将统计模型与可计算的组合 oracle 结合的策略,而不是独立解决每个实例。然而,训练此类策略极具挑战性:结果的经验风险是模型参数的分段常数函数,这阻碍了基于梯度的优化,并且迄今为止仅提供了很少的理论保证。我们通过分析平滑(扰动)策略来解决这个问题:在线性oracle使用的方向上添加受控的随机扰动,会得到一个可微的替代风险并提高泛化能力。我们的主要贡献是一个将超额风险分解为(i)扰动偏差、(ii)统计估计误差和(iii)优化误差的泛化界限。扰动偏差通过新的几何量“扇交叉概率”来控制,该量衡量扰动改变oracle解的可能性。我们引入了两个互补的条件来限制它——均匀有界密度(UBD)性质,产生一个锐利的O(λ)偏差,和较弱的均匀弱矩(UW)性质,产生一个亚线性界——两者都捕捉了统计模型与可行多面体的正常扇之间的几何交互。统计估计误差通过政策类的统一偏差界来控制,其速率O(1/(λ√n)),与平滑参数成反比。关于优化误差,我们利用核Sum-of-Squares方法来缓解全局优化的维度灾难。

英文摘要

Many real-world decision problems require solving, again and again, combinatorial optimization instances drawn from a common distribution. A recent line of structured learning methods exploits this regularity by learning policies that pair a statistical model with a tractable combinatorial oracle, instead of solving each instance independently. Training such policies is notoriously difficult, however: the resulting empirical risk is piecewise constant in the model parameters, which hinders gradient-based optimization, and only a few theoretical guarantees have been provided so far. We address this issue by analyzing smoothed (perturbed) policies: adding controlled random perturbations to the direction used by the linear oracle yields a differentiable surrogate risk and improves generalization. Our main contribution is a generalization bound that decomposes the excess risk into $(\mathit{i})$ perturbation bias, $(\mathit{ii})$ statistical estimation error, and $(\mathit{iii})$ optimization error. The perturbation bias is controlled by the \emph{fan-crossing probability}, a new geometric quantity measuring the likelihood that a perturbation changes the oracle solution. We introduce two complementary conditions to bound it--the \emph{Uniformly Bounded Density} (UBD) property, yielding a sharp ${O}(λ)$ bias, and the weaker \emph{Uniform Weak moment} (UW) property, yielding a sub-linear bound--both capturing the geometric interaction between the statistical model and the normal fan of the feasible polytope. The statistical estimation error is controlled via a uniform deviation bound over the policy class, with rate ${O}(1/(λ\sqrt{n}))$ that scales inversely in the smoothing parameter. Concerning the optimization error, we exploit kernel Sum-of-Squares methods to mitigate the curse of dimensionality of global optimization.

2403.07183 2026-05-20 cs.CL cs.AI cs.LG cs.SI 版本更新

Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews

大规模监控AI修改内容:ChatGPT对AI会议同行评审影响的案例研究

Weixin Liang, Zachary Izzo, Yaohui Zhang, Haley Lepp, Hancheng Cao, Xuandong Zhao, Lingjiao Chen, Haotian Ye, Sheng Liu, Zhi Huang, Daniel A. McFarland, James Y. Zou

发表机构 * Department of Computer Science, Stanford University(斯坦福大学计算机科学系) Machine Learning Department, NEC Labs America(NEC美国实验室机器学习部门) Department of Biomedical Data Science, Stanford University(斯坦福大学生物医学数据科学系) Department of Electrical Engineering, Stanford University(斯坦福大学电气工程系) Graduate School of Education, Stanford University(斯坦福大学教育研究生院) Department of Sociology, Stanford University(斯坦福大学社会学系) Graduate School of Business, Stanford University(斯坦福大学商学院) Department of Management Science and Engineering, Stanford University(斯坦福大学管理科学与工程系) Department of Computer Science, UC Santa Barbara(加州大学圣芭芭拉分校计算机科学系)

AI总结 本文提出了一种方法,用于估计大规模语料库中可能被大语言模型(LLM)显著修改或生成的文本比例。通过专家撰写和AI生成的参考文本,该最大似然模型能够高效地在语料库层面考察实际的LLM使用情况。研究以ChatGPT发布后举行的AI会议同行评审(ICLR 2024、NeurIPS 2023、CoRL 2023和EMNLP 2023)为案例,发现6.5%至16.9%的提交文本可能被LLM显著修改。生成文本的情境揭示了用户行为:在信心较低、接近截止日期或回复作者反驳较少的评审中,估计的LLM生成文本比例更高。此外,观察到语料库层面的趋势可能过于微妙,无法在个体层面检测到,并讨论了这些趋势对同行评审的影响。呼吁未来跨学科研究探讨LLM使用如何改变我们的信息和知识实践。

Comments 46 pages, 31 figures, ICML '24

详情
AI中文摘要

我们提出了一种方法,用于估计大规模语料库中可能被大语言模型(LLM)显著修改或生成的文本比例。我们的最大似然模型利用专家撰写和AI生成的参考文本,以准确且高效的方式在语料库层面考察实际的LLM使用情况。我们将该方法应用于ChatGPT发布后举行的AI会议同行评审案例研究,包括ICLR 2024、NeurIPS 2023、CoRL 2023和EMNLP 2023。我们的结果表明,在这些会议中提交的同行评审文本中,6.5%至16.9%可能被LLM显著修改,即超出拼写检查或小幅写作更新的范围。生成文本出现的情境提供了关于用户行为的见解:估计的LLM生成文本比例在信心较低、接近截止日期或来自较少回应作者反驳的评审中更高。我们还观察到语料库层面的生成文本趋势,这些趋势可能在个体层面过于微妙而无法检测到,并讨论了这些趋势对同行评审的影响。我们呼吁未来跨学科研究探讨LLM使用如何改变我们的信息和知识实践。

英文摘要

We present an approach for estimating the fraction of text in a large corpus which is likely to be substantially modified or produced by a large language model (LLM). Our maximum likelihood model leverages expert-written and AI-generated reference texts to accurately and efficiently examine real-world LLM-use at the corpus level. We apply this approach to a case study of scientific peer review in AI conferences that took place after the release of ChatGPT: ICLR 2024, NeurIPS 2023, CoRL 2023 and EMNLP 2023. Our results suggest that between 6.5% and 16.9% of text submitted as peer reviews to these conferences could have been substantially modified by LLMs, i.e. beyond spell-checking or minor writing updates. The circumstances in which generated text occurs offer insight into user behavior: the estimated fraction of LLM-generated text is higher in reviews which report lower confidence, were submitted close to the deadline, and from reviewers who are less likely to respond to author rebuttals. We also observe corpus-level trends in generated text which may be too subtle to detect at the individual level, and discuss the implications of such trends on peer review. We call for future interdisciplinary work to examine how LLM use is changing our information and knowledge practices.

2310.11203 2026-05-20 cs.LG stat.ML 版本更新

Federated Learning with Nonvacuous Generalisation Bounds

联邦学习中的非空泛化界限

Pierre Jobic, Maxime Haddouche, Benjamin Guedj

发表机构 * Université Paris-Saclay CEA(巴黎-萨克雷大学CEA) Inria, CNRS, Ecole Normale Supérieure, PSL Research University(法国国家科学研究中心Inria、高等师范学院、巴黎-萨克雷研究大学) University College London(伦敦大学学院)

AI总结 本文提出了一种在联邦学习中训练随机预测器的新策略,通过在保持隐私的同时,释放本地预测器并保护训练数据不被其他节点知晓。研究构建了一个全局随机预测器,继承本地私有预测器的属性,基于PAC-Bayesian泛化界限。通过数值实验展示了该方法在预测性能上与批量方法相当,同时保持隐私。

详情
AI中文摘要

我们介绍了一种新的策略来训练联邦学习中的随机预测器,其中每个网络节点旨在通过释放本地预测器来保护隐私,同时保持其训练数据对其他节点的保密性。然后我们构建了一个全局随机预测器,该预测器继承本地私有预测器的属性,基于PAC-Bayesian泛化界限。我们考虑了同步情况,其中所有节点共享相同的训练目标(来源于泛化界限),以及异构和同构情况,其中每个节点可能有自己的个性化训练目标。通过一系列数值实验,我们证明了我们的方法在预测性能上与批量方法相当,其中所有数据集都在节点之间共享。此外,预测器由数值非空泛化界限支持,同时为每个节点保持隐私。我们明确计算了我们两种联邦设置的预测性能和泛化界限的增量,突显了为保护隐私而付出的代价。

英文摘要

We introduce a novel strategy to train randomised predictors in federated learning, where each node of the network aims at preserving its privacy by releasing a local predictor but keeping secret its training dataset with respect to the other nodes. We then build a global randomised predictor which inherits the properties of the local private predictors in the sense of a PAC-Bayesian generalisation bound. We consider the synchronous case where all nodes share the same training objective (derived from a generalisation bound), and the heterogenous and homogenous cases where each node may have its own personalised training objective. We show through a series of numerical experiments that our approach achieves a comparable predictive performance to that of the batch approach where all datasets are shared across nodes. Moreover the predictors are supported by numerically nonvacuous generalisation bounds while preserving privacy for each node. We explicitly compute the increment on predictive performance and generalisation bounds for our two federated settings, highlighting the price to pay to preserve privacy.

2112.08507 2026-05-20 cs.LG stat.ML 版本更新

Algorithms for Adaptive Experiments that Trade-off Statistical Analysis with Reward: Combining Uniform Random Assignment and Reward Maximization

适应性实验的算法:在统计分析与奖励之间进行权衡:结合均匀随机分配与奖励最大化

Tong Li, Jacob Nogas, Haochen Song, Anna Rafferty, Eric M. Schwartz, Audrey Durand, Harsh Kumar, Nina Deliu, Sofia S. Villar, Dehan Kong, Joseph J. Williams

发表机构 * University of Toronto(多伦多大学) Carleton College(卡洛尔学院) University of Michigan(密歇根大学) University of Cambridge(剑桥大学)

AI总结 本文提出了一种统计敏感算法TS-PostDiff,通过结合均匀随机分配和奖励最大化,在统计分析与用户奖励之间进行权衡,以提高实验效率和准确性。

详情
AI中文摘要

传统随机A/B实验使用均匀随机(UR)概率分配臂,例如将50/50分配给网站的两个版本以发现哪个版本更能吸引用户。为了更快速和自动地利用数据来造福用户,多臂老虎机算法如汤普森采样(TS)已被提倡。虽然TS具有可解释性并结合了随机化关键的统计推断,但它可能导致有偏估计并增加假阳性率和假阴性率。我们引入了一种更统计敏感的算法,TS-PostDiff(后验概率小差异),它通过使用额外的自适应步骤混合TS和传统UR,其中使用UR(而非TS)的概率与臂差异的后验概率成正比。这使实验者能够定义什么算作小差异,低于此值,传统UR实验可以以低成本获得用于统计推断的信息数据,而高于此值则使用更多TS以最大化用户利益。我们评估了TS-PostDiff与UR、TS以及两个其他旨在提高统计推断的TS变体。我们考虑了在多种设置下的常见双臂实验结果,这些设置受到现实应用的启发。我们的结果提供了洞察,说明在何时以及为何TS-PostDiff或替代方法在用户利益(奖励)和统计推断(假阳性率和功率)之间提供更好的权衡。TS-PostDiff的自适应性有助于在差异较小时高效减少假阳性并提高统计功率,而在差异较大时增加奖励。这项工作强调了未来统计敏感算法开发中重要的考虑因素,这些算法需要在适应性实验中平衡奖励和统计分析。

英文摘要

Traditional randomized A/B experiments assign arms with uniform random (UR) probability, such as 50/50 assignment to two versions of a website to discover whether one version engages users more. To more quickly and automatically use data to benefit users, multi-armed bandit algorithms such as Thompson Sampling (TS) have been advocated. While TS is interpretable and incorporates the randomization key to statistical inference, it can cause biased estimates and increase false positives and false negatives in detecting differences in arm means. We introduce a more Statistically Sensitive algorithm, TS-PostDiff (Posterior Probability of Small Difference), that mixes TS with traditional UR by using an additional adaptive step, where the probability of using UR (vs TS) is proportional to the posterior probability that the difference in arms is small. This allows an experimenter to define what counts as a small difference, below which a traditional UR experiment can obtain informative data for statistical inference at low cost, and above which using more TS to maximize user benefits is key. We evaluate TS-PostDiff against UR, TS, and two other TS variants designed to improve statistical inference. We consider results for the common two-armed experiment across a range of settings inspired by real-world applications. Our results provide insight into when and why TS-PostDiff or alternative approaches provide better tradeoffs between benefiting users (reward) and statistical inference (false positive rate and power). TS-PostDiff's adaptivity helps efficiently reduce false positives and increase statistical power when differences are small, while increasing reward more when differences are large. The work highlights important considerations for future Statistically Sensitive algorithm development that balances reward and statistical analysis in adaptive experimentation.

2105.00933 2026-05-20 cs.SD cs.AI cs.LG eess.AS 版本更新

Deep Neural Network for Musical Instrument Recognition using MFCCs

基于MFCCs的音乐乐器识别深度神经网络

Saranga Kingkor Mahanta, Abdullah Faiz Ur Rahman Khilji, Partha Pakray

发表机构 * Department of Electronics and Communication Engineering, National Institute of Technology, Silchar, Assam, India(电子与通信工程系,国家理工学院,西拉char,阿萨姆,印度)

AI总结 本文提出一种基于MFCCs的深度神经网络模型,用于对二十种不同类别的音乐乐器进行分类,利用伦敦爱乐乐团数据集实现高精度识别。

详情
Journal ref
Computacion y Sistemas, Vol 25, No 2 (2021): 25(2) 2021
AI中文摘要

高效自动音乐分类任务在AI应用于音乐领域中具有重要性,并构成了各种高级应用的基础。音乐乐器识别是通过音频来识别乐器的任务。这种音频也称为声音振动,被模型用来与乐器类别匹配。在本文中,我们使用了一个经过训练以对二十种不同类别的音乐乐器进行分类的人工神经网络(ANN)模型。这里我们仅使用音频数据的梅尔频率倒谱系数(MFCCs)。我们的模型在完整的伦敦爱乐乐团数据集上进行训练,该数据集包含属于四个家族(木管乐器、铜管乐器、打击乐器和弦乐器)的二十种乐器类别。基于实验结果,我们的模型在相同数据集上实现了最先进的准确性。

英文摘要

The task of efficient automatic music classification is of vital importance and forms the basis for various advanced applications of AI in the musical domain. Musical instrument recognition is the task of instrument identification by virtue of its audio. This audio, also termed as the sound vibrations are leveraged by the model to match with the instrument classes. In this paper, we use an artificial neural network (ANN) model that was trained to perform classification on twenty different classes of musical instruments. Here we use use only the mel-frequency cepstral coefficients (MFCCs) of the audio data. Our proposed model trains on the full London philharmonic orchestra dataset which contains twenty classes of instruments belonging to the four families viz. woodwinds, brass, percussion, and strings. Based on experimental results our model achieves state-of-the-art accuracy on the same.

1912.11333 2026-05-20 cs.SD cs.LG eess.AS 版本更新

Audio-based automatic mating success prediction of giant pandas

基于音频的 giant pandas 雌雄配对成功率预测

WeiRan Yan, MaoLin Tang, Qijun Zhao, Peng Chen, Dunwu Qi, Rong Hou, Zhihe Zhang

AI总结 本文提出了一种基于音频的自动方法,用于预测 giant pandas 的配对成功率,通过提取音频特征并使用深度神经网络进行分类,以辅助大熊猫的繁殖研究。

Comments The manuscript needs further revision

详情
AI中文摘要

大熊猫,通常被视为沉默的动物,在繁殖季节会发出显著更多的声音,这表明声音对于协调其繁殖和表达配对偏好至关重要。先前的生物学研究也证明,大熊猫的声音与配对结果和繁殖有关。本文首次尝试开发一种基于其声音的自动方法,用于预测大熊猫的配对成功率。给定一个记录于繁殖接触期间的大熊猫音频序列,我们首先裁剪出大熊猫的声音段落,并对其进行幅度和长度的归一化。然后从音频段落中提取声学特征,并将这些特征输入深度神经网络,以将配对分类为成功或失败。所提出的深度神经网络采用卷积层后接双向门控循环单元来提取声音特征,并应用注意力机制,以迫使网络专注于最相关特征。在过去九年收集的数据集上的评估实验取得了有希望的结果,证明了基于音频的自动配对成功率预测方法在辅助大熊猫繁殖方面的潜力。

英文摘要

Giant pandas, stereotyped as silent animals, make significantly more vocal sounds during breeding season, suggesting that sounds are essential for coordinating their reproduction and expression of mating preference. Previous biological studies have also proven that giant panda sounds are correlated with mating results and reproduction. This paper makes the first attempt to devise an automatic method for predicting mating success of giant pandas based on their vocal sounds. Given an audio sequence of mating giant pandas recorded during breeding encounters, we first crop out the segments with vocal sound of giant pandas, and normalize its magnitude, and length. We then extract acoustic features from the audio segment and feed the features into a deep neural network, which classifies the mating into success or failure. The proposed deep neural network employs convolution layers followed by bidirection gated recurrent units to extract vocal features, and applies attention mechanism to force the network to focus on most relevant features. Evaluation experiments on a data set collected during the past nine years obtain promising results, proving the potential of audio-based automatic mating success prediction methods in assisting giant panda reproduction.

2605.19018 2026-05-20 cs.LG 版本更新

LoRA vs. Full Fine-Tuning: A Theoretical Perspective

LoRA与全微调:一种理论视角

Ali Zindari, Rotem Mulayoff, Sebastian U. Stich

发表机构 * Universität des Saarlandes(萨尔兰州大学) CISPA Helmholtz Center for Information Security(信息安全赫尔姆霍兹研究中心)

AI总结 本文从理论角度研究了LoRA与全微调在线性回归中的表现,发现LoRA在过定和欠定情况下能够以更低的额外风险优于全微调,且LoRA秩的选择影响泛化性能,实验验证了理论结果的广泛适用性。

Comments Preprint

详情
AI中文摘要

微调通过少量标记数据将预训练模型适应到下游任务。低秩适应(LoRA)是一种高效的微调方法,它在减少内存和计算成本的同时,通常能实现接近全微调的性能。尽管广泛应用,LoRA的理论行为尚未深入理解。本文在简单的线性回归设置中研究LoRA,并将其额外风险与全微调进行比较。我们的分析识别出在过定和欠定情况下,LoRA在某些条件下能够实现低于全微调的额外风险。具体而言,我们的理论预测当预训练任务与下游任务之间的差异在低秩范围内时,LoRA可以超越全微调。我们进一步展示了LoRA秩的选择如何影响泛化性能,解释了在某些情况下使用极小的秩可以提高测试准确率,尽管这限制了模型的表达能力。最后,我们通过实际任务的实验支持了我们的理论结果,表明所识别的权衡和见解超出了线性回归的范围。

英文摘要

Fine-tuning adapts a pre-trained model to downstream tasks using a small amount of labeled data. Low-Rank Adaptation (LoRA) is an efficient fine-tuning method that reduces memory and computation costs while often achieving performance close to full fine-tuning. Despite its widespread use, the theoretical behavior of LoRA is not yet well understood. In this paper, we study LoRA in a simple linear regression setting and compare its excess risk with that of full fine-tuning. Our analysis identifies regimes in which LoRA achieves lower excess risk than full fine-tuning in both overdetermined and underdetermined settings. Specifically, our theory predicts that LoRA can outperform full fine-tuning when the difference between the pretraining and the downstream tasks is effectively low-rank. We further show how the choice of LoRA rank affects generalization performance, explaining why using a very small rank can improve test accuracy in certain settings, even though it limits model expressivity. Finally, we support our theoretical results with experiments on practical tasks, suggesting that the identified tradeoffs and insights extend beyond linear regression.

2605.19014 2026-05-20 cs.LG econ.EM stat.ML 版本更新

SAGA: A Sequence-Adaptive Generative Architecture for Multi-Horizon Probabilistic Forecasting with Adaptive Temporal Conformal Prediction

SAGA:一种序列自适应的生成架构,用于多时间跨度概率预测的自适应时间符合预测

Gustav Olaf Yunus Laitinen-Fredriksson Lundström-Imanov, Hafize Gonca Cömert

发表机构 * Department of Economics, Stockholm University(斯德哥尔摩大学经济系) Institute of Social Sciences, Faculty of Economics and Administrative Sciences, Süleyman Demirel University(苏莱曼·德米雷尔大学社会科学学院,经济学与行政科学学院)

AI总结 本文提出SAGA,一种用于不规则表格面板序列的解码器-only transformer,结合分割符合校准包装器,提供个体层面的预测区间,并保证有限样本边缘覆盖。SAGA在瑞典LISA登记处的纵向数据上训练,预测了1到30年的年度劳动收入,并通过蒙特卡洛方法汇总成现值寿命收入分布。与传统参数过程和表格和循环基线相比,SAGA在10年时间跨度上将连续排名概率分数减少了31.9%,在20年时间跨度上将平均绝对误差减少了37.7%。符合区间在边缘情况下覆盖率为0.4个百分点,在最差的人口子群体中为2.4个百分点。重建的寿命收入基尼系数为0.327,与部分观测的真实值0.341和GKOS估计值0.378相比。模型权重、校准表和合成等价数据集已发布,供在保护的SCB MONA环境中外的复制使用。

Comments 14 pages, 3 figures, 12 tables, 5 appendices, 45 references. Submitted to IEEE TPAMI. Source code at https://github.com/olaflaitinen/saga (archived: doi:10.5281/zenodo.20260366). Synthetic equivalent dataset: doi:10.5281/zenodo.20260287. Empirical work conducted on the Swedish LISA register via SCB MONA (project SCB-MONA-2026-147); ethical approval Swedish Ethical Review Authority 2026-04127-01

详情
AI中文摘要

用于财政部门和中央银行的微模拟模型依赖于参数过程来捕捉生命周期收入的寿命,这些过程只捕捉条件分布的一阶和二阶矩,忽略了长期非线性结构。我们提出SAGA,一种用于不规则表格面板序列的解码器-only transformer,结合分割符合校准包装器,提供个体层面的预测区间,并保证有限样本边缘覆盖。在1990年至2022年的纵向瑞典LISA登记处数据上训练,包含2,143,817个个体和61,284,903人年,模型预测了1到30年的年度劳动收入,并通过蒙特卡洛方法汇总成现值寿命收入分布。与传统参数过程和表格和循环基线相比,SAGA在10年时间跨度上将连续排名概率分数减少了31.9%,在20年时间跨度上将平均绝对误差减少了37.7%。符合区间在边缘情况下覆盖率为0.4个百分点,在最差的人口子群体中为2.4个百分点。重建的寿命收入基尼系数为0.327,与部分观测的真实值0.341和GKOS估计值0.378相比。模型权重、校准表和合成等价数据集已发布,供在保护的SCB MONA环境中外的复制使用。

英文摘要

Microsimulation models used by ministries of finance and central banks rely on parametric processes for lifetime earnings that capture only first and second moments of the conditional distribution and miss long-range nonlinear structure. We propose SAGA, a decoder-only transformer for irregular tabular panel sequences, paired with a split conformal calibration wrapper that delivers individual-level prediction intervals with finite-sample marginal coverage guarantees. Trained on the longitudinal Swedish LISA register over 1990 to 2022, comprising 2,143,817 individuals and 61,284,903 person-years, the model forecasts annual labor earnings at horizons of one to thirty years and aggregates them by Monte Carlo into present-discounted lifetime earnings distributions. Against the canonical Guvenen, Karahan, Ozkan, and Song parametric process and tabular and recurrent baselines, SAGA reduces continuous ranked probability score by 31.9 percent at the ten-year horizon and mean absolute error by 37.7 percent at the twenty-year horizon. Conformal intervals achieve nominal coverage to within 0.4 percentage points marginally and within 2.4 percentage points on the worst-case demographic subgroup. The reconstructed lifetime earnings Gini coefficient is 0.327 against the partially observed truth of 0.341 and the GKOS estimate of 0.378. Model weights, calibration tables, and a synthetic equivalent dataset are released for replication outside the protected SCB MONA environment.

2605.19008 2026-05-20 cs.AI cs.CL cs.LG 版本更新

Learn-by-Wire Training Control Governance: Bounded Autonomous Training Under Stress for Stability and Efficiency

通过线学习的训练控制治理:在压力下受限制的自主训练以稳定性和效率

Anis Radianis

发表机构 * Qluon Inc.(Qluon公司)

AI总结 本文提出了一种名为Learn-by-Wire Guard (LBW-Guard)的受限制自主训练控制治理层,用于在压力下提高大型语言模型的稳定性和效率,通过在AdamW之上进行有界控制,以保持固定训练目标。

详情
AI中文摘要

现代语言模型训练越来越暴露于不稳定性、退化运行和计算浪费,特别是在使用激进的学习率、规模和运行时间压力条件时。本文介绍了Learn-by-Wire Guard (LBW-Guard),一种在AdamW之上运行的受限制自主训练控制治理层。而不是替换优化器更新规则,LBW-Guard通过观察训练 telemetry,解读对不稳定性敏感的制度,并在保持固定训练目标的同时对优化器执行应用有界控制。我们评估LBW-Guard在以Qwen2.5为中心的压力和鲁棒性套件中使用WikiText-103,以Qwen2.5-7B为经验锚点,与Qwen2.5-3B和Qwen2.5-14B进行模型大小比较,学习率压力测试,梯度裁剪基线以及无LoRA TinyLlama-1B全参数 sanity check。在7B参考设置中,LBW-Guard将最终困惑度从13.21降低到10.74,降低18.7%,同时将端到端时间从392.54秒降低到357.02秒,提高了1.10倍的速度。在更强的学习率压力下,AdamW在LR=3e-3时退化到最终困惑度1885.24,在LR=1e-3时为659.76,而LBW-Guard分别保持可训练性为11.57和10.33。梯度裁剪基线无法再现这种效果。这些结果支持了一个范围系统的结论,即对稳定性敏感的LLM训练可以受益于在优化器之上进行治理。LBW-Guard提供了证据,表明在压力下受限制的运行时间控制可以在保持生产力计算的同时,与优化器替换和局部梯度抑制保持不同。

英文摘要

Modern language-model training is increasingly exposed to instability, degraded runs, and wasted compute, especially under aggressive learning-rate, scale, and runtime-stress conditions. This paper introduces Learn-by-Wire Guard (LBW-Guard), a bounded autonomous training-control governance layer that operates above AdamW. Rather than replacing the optimizer update rule, LBW-Guard observes training telemetry, interprets instability-sensitive regimes, and applies bounded control to optimizer execution while preserving fixed training objectives. We evaluate LBW-Guard in a Qwen2.5-centered stress-and-robustness suite using WikiText-103, with Qwen2.5-7B as the empirical anchor, model-size comparisons against Qwen2.5-3B and Qwen2.5-14B, learning-rate stress tests, gradient-clipping baselines, and a no-LoRA TinyLlama-1B full-parameter sanity check. In the 7B reference setting, LBW-Guard reduces final perplexity from 13.21 to 10.74, an 18.7% improvement, while reducing end-to-end time from 392.54s to 357.02s, a 1.10x speedup. Under stronger learning-rate stress, AdamW degrades to 1885.24 final perplexity at LR=3e-3 and 659.76 at LR=1e-3, whereas LBW-Guard remains trainable at 11.57 and 10.33, respectively. Gradient-clipping baselines do not reproduce this effect. These results support a scoped systems conclusion that stability-sensitive LLM training can benefit from a governance plane above the optimizer. LBW-Guard provides evidence that bounded runtime control can preserve productive compute under stress while remaining distinct from optimizer replacement and local gradient suppression.

2605.19004 2026-05-20 cs.CV cs.LG cs.RO 版本更新

EgoTraj: Real-World Egocentric Human Trajectory Dataset for Multimodal Prediction

EgoTraj: 用于多模态预测的现实世界人轨迹数据集

Ahmad Yehia, Abduallah Mohamed, Tianyi Wang, Jiseop Byeon, Kun Qian, Junfeng Jiao, Christian Claudel

发表机构 * Department of Civil, Architectural, and Environmental Engineering, The University of Texas at Austin(土木、建筑与环境工程系,德克萨斯大学奥斯汀分校) Meta Reality Labs(Meta现实实验室) School of Architecture, The University of Texas at Austin(建筑学院,德克萨斯大学奥斯汀分校)

AI总结 本文提出EgoTraj数据集,用于多模态预测,包含75个真实城市环境中的人导航轨迹,提供了同步的RGB视频和地面真实数据,包括6自由度头部姿态、3D眼 gaze向量和场景注释,展示了该数据集在AR感知、导航和辅助系统中的应用价值。

Comments 21 pages, 14 figures. Project page: https://github.com/yehiahmad/EgoTraj

详情
AI中文摘要

准确地从第一人称视角预测人类轨迹在人形机器人、可穿戴传感系统和辅助导航等应用中起着核心作用。然而,由于现实世界环境中缺乏第一人称轨迹数据集,这一方向的进展受到限制。为了解决这一需求,我们介绍了EgoTraj,一个使用Meta Quest Pro (MQPro)录制的egocentric多模态开放数据集。EgoTraj包含75个由多个MQPro穿戴设备在真实城市环境中收集的人导航轨迹。每个记录都提供了同步的RGB视频以及地面真实数据,包括连续时间同步的6自由度头部姿态、每帧3D眼 gaze向量和场景注释。据我们所知,EgoTraj不同于典型的egocentric轨迹数据集,因为它捕捉了在多样化的城市路线中进行的长视距、自主导航,具有广泛的参与者多样性。为了展示该数据集的潜力,我们对几种最先进的egocentric轨迹预测方法进行了基准测试,并进行了消融研究以分析注视、场景和运动提示的贡献。结果突显了EgoTraj在AR感知、导航和辅助系统中的实用性。EgoTraj数据集、代码和EgoViz仪表板已公开在https://github.com/yehiahmad/EgoTraj。

英文摘要

Accurately forecasting human trajectories from an egocentric perspective plays a central role in applications such as humanoid robotics, wearable sensing systems, and assistive navigation. However, progress in this direction remains limited due to the scarcity of egocentric trajectory datasets collected in real-world environments. Addressing this need, we introduce EgoTraj, an egocentric multimodal open dataset recorded using Meta Quest Pro (MQPro). EgoTraj contains 75 sequences of human navigation collected from multiple MQPro wearers in real-world urban environments. Each recording provides synchronized RGB video along with ground-truth data, including continuous time-synchronized 6-degree-of-freedom head poses, per-frame 3D eye gaze vectors, scene annotations. To the best of our knowledge, EgoTraj differs from typical egocentric trajectory datasets by capturing long-horizon, self-directed navigation across diverse urban routes with broad participant diversity. To demonstrate the potential of the dataset, we benchmark several state-of-the-art methods for egocentric trajectory prediction and conduct ablation studies to analyze the contributions of gaze, scene, and motion cues. The results highlight the utility of EgoTraj for AR-based perception, navigation, and assistive systems. The EgoTraj dataset, code, and EgoViz Dashboard are publicly available at https://github.com/yehiahmad/EgoTraj.

2605.18999 2026-05-20 cs.LG 版本更新

Distance-Aware Muon: Adaptive Step Scaling for Normalized Optimization

Distance-Aware Muon: Adaptive Step Scaling for Normalized Optimization

Yury Demidovich, Abhishek Chakraborty, Grigory Malinovsky, Angelia Nedić, Peter Richtárik

发表机构 * King Abdullah University of Science and Technology(国王阿卜杜勒-阿齐兹大学科学与技术学院) Arizona State University(亚利桑那州立大学)

AI总结 本文研究了Muon优化器在一般范数几何中的自适应步长缩放规则,提出三种互补算法,包括Distance-Adaptive Muon、Scale-Calibrated Muon和Distance-Free Muon,通过证明站arity保证、目标间隙界和信任区域半径选择,提升了优化性能。

详情
AI中文摘要

Muon和相关的归一化优化器将更新方向的选择与步长缩放的选择解耦,但其实际性能仍然对归一化步长的尺度敏感。我们研究了Muon在一般范数几何中的自适应缩放规则,并开发了三种互补算法。对于光滑非凸目标,我们引入了Distance-Adaptive Muon,其信任区域半径由轨迹探索的半径设定,并在轨迹有界假设下证明了站arity保证。随后,我们转向星凸目标,这是用于推理深度神经网络经验损失景观的可处理模型,在此设置中,我们首先引入Scale-Calibrated Muon,它保持Muon的指数移动平均,但通过当前梯度和动量计算的局部下降证书设置步长长度。对于该方法,我们在初始子水平集有界假设下证明了最后迭代的O(1/T)目标间隙界,其中对应的半径参数仅出现在分析中,而不是算法中。最后,我们开发了Distance-Free Muon,这是一种重新中心的信任区域方法,使用标量距离证书和主要化的一维搜索来选择信任区域半径,无需要求未知的初始化到全局最小值的距离。在Transformer语言建模(GPT-124M/WikiText-103)和图像分类(ViT-Tiny/CIFAR-100)上的实验表明,所提出的自适应缩放规则减少了对手动缩放调整的敏感性,并在测试预算下匹配或改进了调优的固定缩放Muon基线。

英文摘要

Muon and related normalized optimizers decouple the choice of update direction from the choice of step scale, but their practical performance remains sensitive to the scale of the normalized step. We study adaptive scaling rules for Muon in general norm geometries and develop three complementary algorithms. For smooth non-convex objectives, we introduce Distance-Adaptive Muon, whose trust-region radius is set from the radius explored by the trajectory, and prove a stationarity guarantee under a bounded-trajectory assumption. We then turn to star-convex objectives, a tractable model of the favorable global geometry often used to reason about the empirical loss landscapes of deep neural networks, where objective-gap guarantees are possible. In this setting, we first introduce Scale-Calibrated Muon, which keeps Muon's exponential moving average but sets the step length from a local descent certificate computed from the current gradient and momentum. For this method, we prove a last-iterate O(1/T) objective-gap bound under a bounded initial sublevel-set assumption, where the corresponding radius parameter appears only in the analysis and not in the algorithm. Finally, we develop Distance-Free Muon, a recentered trust-region method that uses a scalar distance certificate and a majorized one-dimensional search to select the trust-region radius without requiring the unknown distance from the initialization to a global minimizer. Experiments on Transformer language modeling (GPT-124M/WikiText-103) and image classification (ViT-Tiny/CIFAR-100) show that the proposed adaptive scaling rules reduce sensitivity to manual scale tuning and match or improve tuned fixed-scale Muon baselines under the tested budgets.

2605.18979 2026-05-20 cs.LG 版本更新

TabQL: In-Context Q-Learning with Tabular Foundation Models

TabQL: 基于表格基础模型的上下文Q学习

Qisai Liu, Zhanhong Jiang, Timilehin Ayanlade, Ashutosh Kumar Nirala, Yang Li, Aditya Balu, Soumik Sarkar

发表机构 * Department of Mechanical Engineering(机械工程系) Iowa State University(爱荷华州立大学) Translational AI Center(转化人工智能中心) Department of Computer Science(计算机科学系)

AI总结 本文提出TabQL,一种基于表格基础模型的强化学习框架,通过上下文学习能力替代传统参数Q网络,提升Q值表示的适应性与效率。

详情
AI中文摘要

我们提出了表格Q学习(TabQL),一种强化学习框架,该框架用具有上下文学习能力的表格基础模型替代传统参数Q网络。关键思想是通过序列到序列基础模型对状态-动作-Q值元组的表格化表示来表示Q值,从而通过条件于近期经验实现快速适应。TabQL不同于经典DQN之处在于利用(i)零次或少次射Q值推断通过上下文更新,以及(ii)使用标准DQN进行预热阶段以生成高质量的上下文。特别是,为了增强上下文质量,新的转移是通过执行TabQL输出的动作和DQN预测的Q值生成的。我们正式化了TabQL,分析了其收敛性和样本复杂度在温和假设下的表现,并展示了TabQL在上下文学习下介于原始Q学习和DQN之间。我们的分析表明,TabQL通过上下文学习消除了Bellman更新,从而比DQN更高效。通过多个基准的广泛数值实验,展示了所提TabQL的有效性和有效性。

英文摘要

We propose Tabular Q-Learning (TabQL), a reinforcement learning framework that replaces the conventional parametric Q-network in Deep Q-Learning (DQN) with a tabular foundation model endowed with in-context learning capabilities. The key idea is to represent Q-values through a sequence-to-sequence foundation model operating over a tabularized representation of state-action-Q-value tuples, enabling rapid adaptation from limited online interaction by conditioning on recent experience. TabQL departs from classical DQN by leveraging (i) zero- or few-shot Q-value inference via in-context updates, and (ii) a warm-up phase using standard DQN to bootstrap high-quality context. Particularly, to enhance the context quality, new transitions are generated by executing actions output by TabQL with predicted Q values from DQN. We formalize TabQL, analyze its convergence and sample complexity under mild assumptions, and show that TabQL interpolates between vanilla Q-learning and DQN with in-context learning. Our analysis demonstrates that TabQL achieves improved efficiency compared to DQN by amortizing Bellman updates through in-context learning. Extensive numerical experiments with several benchmarks showcase the effectiveness and efficacy of the proposed TabQL.

2605.18971 2026-05-20 cs.LG cs.AI 版本更新

Shaping the Prior: How Synthetic Task Distributions Determine Tabular Foundation Model Quality

塑造先验:合成任务分布如何决定表格基础模型的质量

Mohamed Bouadi, Nassim Bouarour, Varun Kulkarni, Shivam Dubey, Aditya Tanna, Vinay Kumar Sankarapu

发表机构 * Lexsi Labs(Lexsi实验室)

AI总结 本文研究了合成任务分布对表格基础模型质量的影响,提出O'Prior方法,通过四个耦合组件构建更真实的先验,提升了下游任务的准确性和鲁棒性。

详情
AI中文摘要

什么是决定表格基础模型质量的因素?与语言或视觉不同,表格基础模型的归纳偏倚几乎完全来自于合成预训练分布,但这些分布的设计仍不明确。标准的合成先验过于良好:它们忽略了不规则性和失败模式,这些决定了部署的鲁棒性。我们引入O'Prior,一种基于四个耦合组件的组合现实先验:一个跨越不同功能家族的分层SCM元生成器;一个覆盖异质边际、缺失值和目标转换的模块化现实引擎;一个显式压力模块注入混淆和支持-查询不匹配;以及一个受课程指导、泄漏安全的生成协议。为了将先验设计作为科学变量隔离,我们固定了架构、优化器和计算预算,只改变合成任务分布。O'Prior在真实表格基准上实现了持续且显著的改进,收益集中在分布不规则性特征的领域。消融实验确认了机制多样性、现实组成和移位感知压力各自独立贡献,其效果不可互换。这些结果确立了合成先验构建作为表格基础模型质量的第一性且长期被忽视的决定因素。

英文摘要

What determines the quality of a tabular foundation model? Unlike language or vision, tabular foundation models acquire their inductive biases almost entirely from synthetic pretraining distributions, yet the design of these distributions remains poorly understood. Standard synthetic priors are too well-behaved: they omit the irregularities and failure modes that determine deployment robustness. We introduce O'Prior, a compositional realism prior built around four coupled components: a hierarchical SCM meta-generator spanning diverse functional families; a modular realism engine covering heterogeneous marginals, missingness, and target transforms; an explicit stress module injecting confounding and support-query mismatch; and a curriculum-governed, leakage-safe generation protocol. To isolate prior design as the scientific variable, we hold architecture, optimizer, and compute budget fixed and vary only the synthetic task distribution. O'Prior yields consistent and substantial improvements in downstream accuracy and robustness across real tabular benchmarks, with gains concentrated in regimes characterized by distributional irregularities. Ablations confirm that mechanism diversity, realism composition, and shift-aware stress each contribute independently, their effects are not interchangeable. These results establish synthetic prior construction as a first-order and largely overlooked determinant of tabular foundation model quality

2605.18959 2026-05-20 astro-ph.IM astro-ph.CO astro-ph.EP astro-ph.GA cs.LG 版本更新

Hyrax: An Extensible Framework for Rapid ML Experimentation and Unsupervised Discovery in the Era of Rubin, Roman, and Euclid

Hyrax:一个用于快速机器学习实验和无监督发现的可扩展框架,在Rubin、Roman和Euclid时代

Aritra Ghosh, Drew Oldag, Michael Tauraso, Andrew J. Connolly, Peter Ferguson, Derek Jones, Gourav Khullar, Argyro Sasli, Samarth Venkatesh, Gracia Wang, Maxine West, Dylan Berry, Neven Caplar, Colin Orion Chandler, Tanawan Chatchadanoraset, Michael W. Coughlin, Melissa DeLucchi, Alexandra Junell, Diego Miura, Felipe Fontinele Nunes, Wilson Beebe, Doug Branton, Sandro Campos, Liam Cunningham, Mi Dai, Jeremy Kubica, Konstantin Malanchev, Rachel Mandelbaum, Sean McGuire, Imad Pasha, Dan S. Taranu, Tianqing Zhang

发表机构 * Dept. of Astronomy \& the DiRAC Institute, University of Washington, Box 351580, Seattle, WA 98195, USA School of Physics Astronomy, University of Minnesota, Minneapolis, MN 55455, USA Department of Astronomy Planetary Science, Northern Arizona University, Flagstaff, USA McWilliams Center for Cosmology Astrophysics, Department of Physics, Carnegie Mellon University, Pittsburgh, PA 15213, USA

AI总结 本文提出Hyrax,一个支持天文领域完整机器学习生命周期的开源框架,通过五个实际应用展示了其在大规模天文数据中的无监督发现和监督检测能力,为下一代天文调查提供了系统化的机器学习基础设施。

Comments 28 pages, 20 figures, submitted to AJ

详情
AI中文摘要

NSF-DOE Vera C. Rubin Observatory、Roman Space Telescope、Euclid及其他下一代调查将提供大规模的成像、光谱和时域数据,这使得天文机器学习(ML)项目中的瓶颈从模型设计转向了基础设施。我们介绍了Hyrax,一个开源、模块化、基于GPU的Python框架,支持天文领域的完整ML生命周期:从数据获取和训练到推理和实验比较,具备多模态数据集支持、集成向量数据库用于相似性搜索以及交互式的二维和三维潜在空间探索用于无监督发现。我们通过五个代表性的应用展示了Hyrax的多功能性:(i)在约4×10^5个Rubin Legacy Survey of Space and Time(LSST)Data Preview 1(DP1)星系上进行无监督表示学习,发现新的合并体和低表面亮度候选者,同时隔离成像伪影,而无需标记训练数据;(ii)混合密度基于聚类用于识别DP1数据中的星系团尺度引力透镜候选者;(iii)利用光变曲线、光谱、图像和元数据进行多模态早期时间瞬变分类,利用Zwicky Transient Facility;(iv)在Dark Energy Camera Ecliptic Exploration Project调查中利用位移和堆叠搜索对遥远太阳系天体进行监督性假阳性过滤;(v)利用合成源注入在Hyper Suprime-Cam和LSST类成像中监督检测半解析矮星系。这些结果共同表明,Hyrax为天文特定的机器学习基础设施提供了系统化的发现和快速的方法论迭代能力,适用于下一代天文调查。

英文摘要

The NSF-DOE Vera C. Rubin Observatory, Roman Space Telescope, Euclid, and other next-generation surveys will deliver imaging, spectroscopic, and time-domain data at scales that increasingly shift the bottleneck in astronomical machine learning (ML) projects from model design to infrastructure. We present Hyrax, an open-source, modular, GPU-enabled Python framework that supports the full ML lifecycle in astronomy: from data acquisition and training to inference and experiment comparison, with capabilities including multimodal dataset support, integrated vector databases for similarity search, and interactive two- and three-dimensional latent-space exploration for unsupervised discovery. We demonstrate Hyrax's versatility through five representative applications on real survey data: (i) unsupervised representation learning on $\sim 4\times10^5$ Rubin Legacy Survey of Space and Time (LSST) Data Preview 1 (DP1) galaxies, surfacing new merger and low-surface-brightness candidates missing from reference Euclid and Dark Energy Survey catalogs, while also isolating imaging artifacts -- all without labeled training data; (ii) hybrid density-based clustering for identifying cluster-scale gravitational lens candidates in DP1 data; (iii) multimodal early-time transient classification in the Zwicky Transient Facility leveraging light curves, spectra, images, and metadata; (iv) supervised false-positive filtering in shift-and-stack searches for distant solar system objects in the Dark Energy Camera Ecliptic Exploration Project survey; and (v) supervised detection of semi-resolved dwarf galaxies in Hyper Suprime-Cam and LSST-like imaging using synthetic source injection. Together, these results demonstrate that Hyrax provides astronomy-specific ML infrastructure that enables systematic discovery and rapid methodological iteration across next-generation astronomical surveys.

2605.18933 2026-05-20 cs.LG 版本更新

A Geometric Analysis of Sign-Magnitude Asymmetry in a ReLU + RMSNorm Block under Ternary Quantization

对ReLU + RMSNorm块在三元量化下的符号幅度不对称性进行几何分析

Lei Dong

发表机构 * Independent Researcher(独立研究者)

AI总结 本文通过符号幅度分解解释了在三元量化下ReLU + RMSNorm块的符号幅度不对称性,揭示了ReLU和RMSNorm在权重扰动中的几何机制,并通过实验验证了这种不对称性在实际模型中的表现。

Comments 53 pages, 2 figures, 21 tables, 7 appendices

详情
AI中文摘要

预归一化变换器使用RMSNorm可以容忍三元{-1,0,+1}权重量化,其损失出人意料的小(Ma等人,2024)。我们通过符号幅度分解给出了几何解释。在具有独立同分布高斯权重的两层ReLU + RMSNorm模型中,符号翻转产生的横向输出能量是符号保持幅度扰动的π/(π-2)≈2.75倍,当翻转率p→0时(定理3)。机制:ReLU在两种扰动类型之间创建了隐藏空间的方向不对称性,RMSNorm的横向投影Fréchet导数选择性地暴露了这种不对称性。符号量化误差本身是一种符号保持的扰动,具有角度对齐cos²→2/π(定理4);其后ReLU径向分数(0.365)与前ReLU值1-2/π在0.4%内一致,因此ReLU对三元误差几乎是透明的。多层叠加的2.75倍因子未被实验支持;与真实模型符号敏感性之间的差距源于异常特征违反去局部化。对于幅度为α的输入维度,单个符号翻转产生的后ReLU能量放大约为R≈nα²,相对于去局部化的条目。在TinyLlama-1.1B上,线性响应(p≤0.5%)下,计数匹配的NLL利用稳定在约10×≈nE[α²],与每条目理论一致;所有列NLL比率为5.0×,在R_col≤19内(67×PPL差距反映了度量非线性)。测量的异常α在第12层(中位数0.024,最大0.26)确认了重尾浓度。Bussgang常数2/π、RMSNorm几何和ReLU半空间结构共同解释了预归一化模型中的符号幅度不对称性,R≈nα²解释了真实模型的偏差。

英文摘要

Pre-norm Transformers with RMSNorm tolerate ternary {-1,0,+1} weight quantization with surprisingly small loss (Ma et al., 2024). We give a geometric explanation via sign-magnitude decomposition of weight perturbations. In a two-layer ReLU + RMSNorm model with i.i.d. Gaussian weights, sign-flips produce $π/(π-2) \approx 2.75$ times more transverse output energy than sign-preserving magnitude perturbations of equal Frobenius norm, as the flip rate $p \to 0$ (Theorem 3). The mechanism: ReLU creates a hidden-space directional asymmetry between the two perturbation types, which RMSNorm's transverse-projection Fréchet derivative selectively exposes. Sign-quantization error is itself a sign-preserving perturbation with angular alignment $\cos^2 \to 2/π$ (Theorem 4); its post-ReLU radial fraction ($0.365$) matches the pre-ReLU value $1-2/π$ within $0.4\%$, so ReLU is approximately transparent to ternary error. Multi-layer compounding of the $2.75\times$ factor is not experimentally supported; the gap to real-model sign sensitivity arises from outlier features violating delocalization. For an input dimension with amplitude $α$, a single sign-flip produces post-ReLU energy amplified by $R \approx nα^2$ relative to a delocalized entry. On TinyLlama-1.1B, at linear response ($p \leq 0.5\%$), count-matched NLL leverage stabilizes at $\sim 10\times \approx n\mathbb{E}[α^2]$, matching the per-entry theory; the all-column NLL ratio of $5.0\times$ falls within $R_{\mathrm{col}} \leq 19$ ($67\times$ PPL gap reflects metric nonlinearity). Measured outlier $α$ at layer 12 (median $0.024$, max $0.26$) confirms heavy-tailed concentration. The Bussgang constant $2/π$, RMSNorm geometry, and ReLU half-space structure together explain sign-magnitude asymmetry in pre-norm models, with $R \propto nα^2$ accounting for real-model deviations.

2605.18930 2026-05-20 cs.CR cs.AI cs.LG 版本更新

OEP: Poisoning Self-Evolving LLM Agents via Locally Correct but Non-Transferable Experiences

OEP: 通过局部正确但不可转移的经验污染自演化LLM代理

Kaixiang Wang, Jiong Lou, Zhaojiacheng Zhou, Jie Li

发表机构 * Shanghai Jiao Tong University(上海交通大学)

AI总结 研究探讨了通过局部正确但不可转移的经验污染自演化LLM代理的安全风险,提出OEP攻击方法,利用低权限黑盒攻击在无需直接控制系统提示或记忆数据库的情况下诱导有害泛化。

详情
AI中文摘要

记忆增强型大语言模型(LLM)代理通过迭代反思和自我进化解决复杂任务,但这些机制引入了安全风险。现有代理记忆攻击需要特权访问或显式恶意内容,使其能够被高级安全过滤器检测到。这留下了一个未被充分探索的攻击面:对手是否能够诱导代理生成看起来局部正确且语义合理但会导致反思期间有害泛化的经验。我们发现,反思代理对这种干净经验存在漏洞,尤其是在与严重但合理的假设后果相结合时。基于这一观察,我们引入了强迫经验污染(OEP),一种低权限黑盒攻击,不需要直接控制系统提示或记忆数据库。OEP构建了对抗性的干净边缘案例,结合局部正确的解决方案、不可转移的方法和严重后果,使反思偏向风险规避的规则形成。在记忆巩固期间,代理可能过度信任自生成的反思,并将局部经验转化为高优先级但过度泛化的规则,导致下游故障。在三个领域的评估显示,OEP在GPT-4o代理上实现了超过50%的ASR,并在LLM审核防御下优于现有攻击。

英文摘要

Memory-augmented large language model (LLM) agents use iterative reflection and self-evolution to solve complex tasks, but these mechanisms introduce security risks. Existing agentic memory attacks require privileged access or explicit malicious content, making them detectable by advanced safety filters. This leaves a subtler attack surface underexplored: whether adversaries can induce agent to generate experiences that appear locally correct and semantically plausible yet induce harmful generalization during reflection. We find that reflective agents are vulnerable to such clean experiences, especially when paired with severe but plausible hypothetical consequences. Based on this observation, we introduce Obsessive Experience Poisoning (OEP), a low-privilege black-box attack requiring no direct control over the system prompt or memory database. OEP constructs adversarial clean edge-cases that combine locally correct solutions, non-transferable methods, and severe consequences, biasing reflection toward risk-averse rule formation. During memory consolidation, agents may over-trust self-generated reflections and distill localized experiences into high-priority but over-generalized rules, causing downstream failures. Evaluations across three domains show that OEP achieves ASR above 50\% with GPT-4o agents, and outperforms existing attacks under LLM auditing defense.

2605.18927 2026-05-20 stat.ML cs.LG math.PR 版本更新

Bayesian Latent Space Models for Graphs Are Misspecified: Toward Robust Inference via Generalized Posteriors

基于图的贝叶斯潜在空间模型存在规格问题:通过广义后验实现稳健推断

Aldric Labarthe

发表机构 * Centre Borelli, Université Paris-Saclay(巴黎-萨克雷大学博雷利中心) Department of Computer Science, University of Geneva(日内瓦大学计算机科学系)

AI总结 本文研究了基于图的贝叶斯潜在空间模型的规格问题,提出了一种广义后验框架,通过Link-Sequential R-SafeBayes方法改进模型的鲁棒性,提升了校准性和链接预测性能。

详情
AI中文摘要

贝叶斯潜在空间模型为网络表示提供了一种系统的方法,但依赖于几何和链接函数的正确规范。现实中的网络经常违反这些假设,表现出几何不匹配和结构异常,破坏标准度量属性。我们证明,这种不规范会将数据生成分布推离模型类,导致贝叶斯推断变得过于自信且校准不佳。为了解决这个问题,我们提出了一种随机几何图的广义后验框架。我们引入了Link-Sequential R-SafeBayes方法,该方法利用二元条件独立性来估计预quential风险并自适应地调节后验正则化。在合成和现实网络上的实验表明,改进了校准性,提高了链接预测性能,并提供了一个可靠的准则来选择欧几里得、球面和双曲空间中的潜在几何结构。

英文摘要

Bayesian latent space models offer a principled approach to network representation, but rely on correct specification of both geometry and link function. Real-world networks often violate these assumptions, exhibiting geometric mismatch and structural anomalies that break standard metric properties. We show that such misspecification pushes the data-generating distribution outside the model class, causing Bayesian inference to become overconfident and poorly calibrated. To address this, we propose a generalized posterior framework for random geometric graphs. We introduce Link-Sequential R-SafeBayes, a method that exploits dyadic conditional independence to estimate prequential risk and adaptively tune posterior regularization. Experiments on synthetic and real-world networks demonstrate improved calibration, better link prediction performance, and a reliable criterion for selecting latent geometries across Euclidean, spherical, and hyperbolic spaces.

2605.18923 2026-05-20 eess.IV cs.CV cs.LG q-bio.QM 版本更新

From Division to Decision: Leveraging Temporal Cell-Stage Segmentation for Embryo Transferability Prediction

从分裂到决策:利用时间细胞阶段分割预测胚胎可转移性

Yasmine Hachani, Patrick Bouthemy, Elisa Fromont, Véronique Duranthon, Ludivine Laffont, Alline de Paula Reis

发表机构 * Inria center at Rennes University, Paris-Saclay University, UVSQ, INRAE, BREED(里昂大学Inria研究中心、巴黎萨克雷大学、UVSQ、INRAE、BREED) University of Rennes, IRISA(雷恩大学、IRISA) The National Veterinary School of Alfort(阿尔福兽医学校)

AI总结 该研究提出TransFACT框架,利用时间 lapse 视频中的早期发育阶段信息,通过结合帧级时间特征和阶段级表示,预测胚胎可转移性,优于现有方法。

详情
Journal ref
ICIP 2026 - IEEE International Conference on Image Processing, Sep 2026, Tampere, Finland
AI中文摘要

准确选择牛胚胎是一项具有挑战性的任务,因为当前实践依赖于受精后第七天单一专家评估,导致高妊娠丢失率。时间延展显微镜提供了早期发育的详细信息,但由于复杂的运动模式和耗时的分析而难以利用。我们提出TransFACT,一种基于变压器的框架,用于使用发育前四天的2D时间延展视频建模早期发育阶段和胚胎可转移性。TransFACT结合帧级时间特征和阶段级表示,利用发育阶段作为辅助监督,在第四天预测可转移性。我们的实验表明,TransFACT通过利用现有用于动作识别的方法,在预测胚胎可转移性方面优于其竞争对手。

英文摘要

Accurate selection of bovine embryos is a challenging task, as current practice relies on a single expert assessment on the seventh day after insemination, resulting in high rates of pregnancy loss. Time-lapse videomicroscopy provides detailed information on early development, but is difficult to exploit because of complex motion patterns and time-consuming analysis. We propose TransFACT, a transformer-based framework for modeling early developmental stages and embryo transferability using 2D time-lapse videos from the first four days of development. TransFACT combines frame-level temporal features with stage-level representations, using developmental stages as auxiliary supervision to predict transferability on day four. Our experiments demonstrate that TransFACT, by leveraging an existing method designed for action recognition, achieves superior performance than its competitor in predicting embryo transferability.

2605.18919 2026-05-20 cs.CR cs.AI cs.LG 版本更新

MoCo-EA: Exploiting Adversarial Mode Connectivity for Efficient Evolutionary Attacks

MoCo-EA:利用对抗模式连接实现高效的进化攻击

Hyo Seo Kim, Gang Luo, Can Chen, Binghui Wang, Yue Duan, Ren Wang

发表机构 * Illinois Institute of Technology(伊利诺伊理工学院) University of North Carolina at Chapel Hill(北卡罗来纳大学教堂山分校) Singapore Management University(新加坡管理大学)

AI总结 本文提出MoCo-EA,一种通过利用对抗模式连接来提高效率的进化攻击方法,该方法通过贝塞尔交叉算子优化扰动,提升了攻击效果并减少了收敛时间和查询需求。

详情
AI中文摘要

进化算法用于对抗攻击通过群体搜索发现无梯度信息的扰动,但传统的交叉操作效率低下,会通过离散插值破坏对抗属性。我们引入了模式连接进化攻击(MoCo-EA),用一种新的贝塞尔交叉算子替代传统交叉,优化扰动沿连续贝塞尔曲线之间。我们的关键见解是对抗示例位于连接的流形上,中间点维持并经常增强攻击效果。我们展示了三个发现:(1)成功的对抗扰动表现出模式连接;(2)优化路径上的中间点比端点具有更高的可转移性;(3)贝塞尔交叉显著优于离散遗传操作,同时减少收敛时间和查询需求。通过利用对抗空间的几何结构通过路径优化,MoCo-EA提供了一种高效且可靠的方法。我们的工作挑战了对抗示例作为孤立点的传统观点,并为攻击生成和防御研究开辟了新方向。

英文摘要

Evolutionary algorithms for adversarial attacks leverage population-based search to discover perturbations without gradient information, but suffer from inefficient crossover operations that destroy adversarial properties through discrete interpolation. We introduce Mode Connectivity Evolutionary Attack (MoCo-EA), which replaces traditional crossover with a novel Bézier crossover operator that optimizes perturbations along a continuous Bézier curve between parent perturbations. Our key insight is that adversarial examples lie on connected manifolds where intermediate points maintain and often enhance attack effectiveness. We demonstrate three findings: (1) Successful adversarial perturbations exhibit mode connectivity; (2) Intermediate points along optimized paths achieve higher transferability than endpoints; (3) Bézier crossover dramatically outperforms discrete genetic operations while reducing convergence time and query requirements. By exploiting the geometric structure of adversarial space through path optimization, MoCo-EA provides an efficient and reliable method. Our work challenges the traditional view of adversarial examples as isolated points and opens new directions for both attack generation and defense research.

2605.18913 2026-05-20 cs.CR cs.AI cs.LG 版本更新

SCAFDS: Edge-Feature Graph Attention for Interbank Fraud Detection with Attribution-Grounded SAR Generation

SCAFDS: 基于边特征图注意力的跨银行欺诈检测与归因驱动的SAR生成

Mohammad Nasir Uddin

发表机构 * Taskimpetus Inc.(Taskimpetus公司)

AI总结 本文提出SCAFDS系统,通过七阶段集成监控流程解决现有方法的五个结构性限制,利用欺诈共现边特征进行跨银行拓扑编码,结合节点表示和欺诈共现边特征进行边特征引导的图注意力,生成机构级系统性欺诈风险评分,并通过归因条件生成SAR叙述,实现每个FinCEN SAR断言的可追溯性,最终在IEEE-CIS欺诈检测数据集和合成FDIC对齐的跨银行网络上取得了显著的AUPRC和AUROC提升。

详情
AI中文摘要

美国金融系统每天处理约130万笔跨银行交易,但现有文献中没有系统利用欺诈共现边特征来建模跨银行网络中的欺诈传播。先前的跨银行GNN架构使用信用困境监督信号建模信用传染,导致欺诈取证系统不匹配。没有现有系统能生成带有每个断言的取证追溯性的SAR叙述,从而在提交给FinCEN的报告中产生监管审计缺口。本文引入SCAFDS(系统性传染意识欺诈检测系统),一个七阶段集成监控流程,解决现有方法的五个结构性限制:(1)利用FinCEN SAR注册记录中的欺诈共现频率度量f(u,v,t)进行欺诈特定的跨银行拓扑编码;(2)基于节点表示和欺诈共现边特征的边特征引导的图注意力,其中系数由两者计算得出;(3)双线性欺诈共现风险融合,产生机构级系统性欺诈风险评分;(4)归因条件的SAR叙述生成,每个FinCEN SAR断言具有显著性阈值,确保每个FinCEN SAR断言可追溯到特定的数值管道输出;(5)拓扑感知的自适应取证反馈更新图注意力权重,从监管处置中更新。在IEEE-CIS欺诈检测数据集(590,540笔交易)和一个合成FDIC对齐的跨银行网络(8,103个机构,169,800条边)上的实验表明,SCAFDS在AUPRC=0.515±0.032和AUROC=0.802±0.018,比GraphSAGE-AML提升了+15.9个百分点和+13.7个百分点。部分验证FDIC执法行动记录(n=4,279)确认了模型排名的一致性。美国专利商标局临时专利申请号64/061,083,于2026年5月8日提交。

英文摘要

The U.S. financial system processes approximately 1.3 million interbank transactions daily, yet no system in the reviewed literature models fraud propagation across the interbank network using fraud co-occurrence edge features. Prior interbank GNN architectures model credit contagion using credit distress supervision signals, producing systems misaligned for fraud forensics. No existing system generates SAR narratives with per-assertion forensic traceability to specific numerical detection outputs, creating regulatory auditability gaps in FinCEN-submitted reports. This paper introduces SCAFDS (Systemic Contagion-Aware Fraud Detection System), a seven-stage integrated surveillance pipeline addressing five structural limitations of prior art: (1) fraud-specific interbank topology encoding using fraud co-occurrence frequency metrics f(u,v,t) derived from FinCEN SAR registry records; (2) edge-feature-informed graph attention where coefficients are computed from both node representations and fraud co-occurrence edge features; (3) bilinear fraud co-occurrence risk fusion producing institution-level systemic fraud risk scores; (4) attribution-conditioned SAR narrative generation with per-assertion significance thresholds ensuring each FinCEN SAR assertion is traceable to a specific numerical pipeline output; and (5) topology-aware adaptive forensic feedback updating graph attention weights from regulatory dispositions. Experiments on the IEEE-CIS Fraud Detection Dataset (590,540 transactions) and a synthetic FDIC-aligned interbank network (8,103 institutions, 169,800 edges) show SCAFDS achieves AUPRC=0.515+/-0.032 and AUROC=0.802+/-0.018, representing +15.9pp and +13.7pp improvements over GraphSAGE-AML. Partial validation on FDIC enforcement action records (n=4,279) confirms consistent model ranking. USPTO Provisional Patent Application No. 64/061,083, filed May 8, 2026.

2605.18908 2026-05-20 cs.CR cs.AI cs.LG 版本更新

Fast and Lightweight Backdoor Detection via Head Random Probing

通过头部随机探测实现快速且轻量的后门检测

Yinbo Yu, Xueyu Yin, Jing Fang, Chunwei Tian, Qi Zhu, Jiajia Liu, Daoqiang Zhang

发表机构 * College of Artificial Intelligence, Nanjing University of Aeronautics and Astronautics(南京航空航天大学人工智能学院) School of Cybersecurity, Northwestern Polytechnical University(西北工业大学网络安全学院) Shenzhen Research Institute of Northwestern Polytechnical University(西北工业大学深圳研究院) School of Computer Science and Technology, Harbin Institute of Technology(哈尔滨工业大学计算机科学与技术学院)

AI总结 本文提出HTell,一种基于头部随机探测的快速且轻量的数据无关后门检测器,通过分析模型预测头部在随机潜在探测下的响应统计,实现高效准确的后门检测。

详情
AI中文摘要

深度神经网络(DNN)仍然对后门攻击极度脆弱。现有的训练后检测器通常需要干净或替代数据、梯度或迭代触发器重建,导致计算成本高且在实际模型审计场景中鲁棒性有限。本文提出HTell,一种基于头部随机探测的快速且轻量的数据无关后门检测器。与重建多样化的触发模式不同,HTell检查其在预测头部的统一表现:被篡改的模型倾向于在随机潜在探测下在目标类别上表现出异常的响应集中。HTell生成架构感知的随机潜在探测,直接将其输入模型头部,并通过分析类别响应统计来检测后门,而无需访问真实或替代数据、模型梯度或参数优化。我们在包含超过6000个被篡改模型和700个干净模型的大型基准上评估HTell,涵盖4个数据集、14种架构和21种后门攻击类型。HTell在仅12.69毫秒/模型的检测延迟下实现了99.03%的真阳性率和2.11%的假阳性率,将时间成本降低了超过30,000倍,相较于代表性的梯度基检测器。这些结果表明,头部随机探测提供了一种准确、鲁棒且高效的解决方案,用于大规模的数据无关后门模型审计。

英文摘要

Deep neural networks (DNNs) remain critically vulnerable to backdoor attacks. Existing post-training detectors often require clean or surrogate data, gradients, or iterative trigger reconstruction, leading to high computational costs and limited robustness under practical model-auditing scenarios. In this paper, we propose HTell, a fast and lightweight data-free backdoor detector based on head random probing. Instead of reconstructing diverse trigger patterns, HTell inspects their unified manifestation in the prediction head: backdoored models tend to exhibit abnormal response concentration on the target class under random latent probes. HTell generates architecture-aware random latent probes, feeds them directly into the model head, and detects backdoors by analyzing class-wise response statistics, without accessing real or surrogate data, model gradients, or parameter optimization. We evaluate HTell on a large-scale benchmark containing more than 6,000 backdoored models and over 700 clean models, covering 4 datasets, 14 architectures, and 21 types of backdoor attacks. HTell achieves 99.03% true positive rate and 2.11% false positive rate with only 12.69 ms/model detection latency, reducing the time cost by over 30,000$\times$ compared with representative gradient-based detectors. These results demonstrate that head random probing provides an accurate, robust, and efficient solution for large-scale data-free backdoor model auditing.

2605.18905 2026-05-20 cs.LG cs.AI cs.NA cs.NE math.NA 版本更新

Stability and Discretization Error of State Space Model Neural Operators

状态空间模型神经算子的稳定性与离散化误差

Abderrahim Bendahi, Adrien Fradin, Johan Peralez, Julie Digne, Madiha Nadri

发表机构 * École polytechnique(巴黎政治经济学院) Université Claude Bernard Lyon 1(里昂1大学) CNRS(法国国家科学研究中心) LAGEPP UMR 5007 Université Lyon 1(里昂1大学) INSA Lyon(里昂国立应用科学学院) LIRIS(里昂图像与信号研究所)

AI总结 本文研究了状态空间模型神经算子的稳定性与离散化误差,通过理论分析建立了神经算子近似方案的离散误差和稳定性保证,提出了针对SS-NOs和FNOs的新的离散误差定理,并通过实验验证了其在不同分辨率下的鲁棒性。

详情
AI中文摘要

神经算子已作为一种强大的、与离散化无关的框架,用于求解偏微分方程(PDEs)。尽管已建立的方法如深度运算网络(DeepONet)已成功实现了运算符的通用逼近,而如傅里叶神经算子(FNOs)等架构已显示出代数收敛速率,但连续理论与其离散数值实现之间的精确理论联系仍是一个挑战。具体来说,连续公式与离散数值稳定性之间的关系尚未被充分探索。在本文中,我们通过建立神经算子近似方案的离散误差和稳定性的理论保证来填补这一空白。我们证明了将解的正则性与输入离散化联系起来的分析界,提供了在现实数值约束下神经算子精度的正式量化。我们为SS-NOs和FNOs的具体情况推导了这些界,从而为这些模型提出了新的离散误差定理。此外,通过输入到状态稳定性(ISS)分析,我们正式评估了离散化对连续域中SS-NOs结果稳定性的影响。我们在1D和2D基准上的实验证实了我们的理论界,并展示了SS-NOs在不同分辨率下的鲁棒性。

英文摘要

Neural operators have emerged as a powerful, discretization-invariant framework for solving partial differential equations (PDEs). Although established approaches like the Deep Operator Network (DeepONet) have successfully achieved universal approximation for operators, and architectures such as Fourier Neural Operators (FNOs) have shown algebraic convergence rates, a precise theoretical connection between the continuous theory and its discrete numerical implementation remains a challenge. Specifically, the relationship between the continuous formulation and the discrete numerical stability has yet to be fully explored. In this paper, we address this gap by establishing theoretical guarantees for the discretization error and stability of neural operator approximation schemes. We prove analytical bounds that link solution regularity to input discretization, providing a formal quantification of neural operator accuracy under real-world numerical constraints. We derive these bounds to the specific cases of State Space Model-based Neural Operators (SS-NOs) and FNOs, thus providing a new discretization error theorem for these models. Additionally, through an input-to-state stability (ISS) analysis, we formally assess the impact of discretization on the stability of SS-NOs results obtained in the continuous domain. Our empirical experiments on 1D and 2D benchmarks validate our theoretical bounds and show the robustness of SS-NOs under varying resolutions.

2605.18904 2026-05-20 cs.LG cs.AI cs.CL 版本更新

Dynamic Model Merging Made Slim

动态模型合并的轻量级方法

Guodong Du, Wanyu Lin

发表机构 * The Hong Kong Polytechnic University(香港理工大学)

AI总结 本文提出DiDi-Merging方法,通过可微分的秩分配平衡共享和专家参数,实现更高效的动态模型合并,在参数量上显著优于现有方法。

详情
AI中文摘要

模型合并使在不联合训练或访问原始数据的情况下重用微调模型成为可能。动态合并进一步通过选择性激活任务相关参数并高效组合多个任务的专家来提高灵活性。然而,现有动态方法要么维护一个完整的共享模型加小专家,要么为专家分配过多容量,导致准确性与效率之间的权衡不优。为此,我们提出DiDi-Merging,一种轻量动态合并框架,利用可微分的秩分配来平衡共享和专家参数。通过将参数预算分配建模为低秩模块中的可微分秩优化,并引入无需数据的细化步骤来恢复任务保真度,DiDi-Merging在仅1.24倍单个微调模型参数的情况下匹配现有动态基线,并在1.4倍时超越它们,显著优于需要>2倍存储容量的方法。DiDi-Merging适用于视觉、语言和多模态任务。

英文摘要

Model merging enables the reuse of fine-tuned models without joint training or access to original data. Dynamic merging further improves flexibility by selectively activating task-relevant parameters and efficiently composing experts across multiple tasks. However, existing dynamic methods either maintain a full shared model with tiny experts or allocate excessive capacity to experts, leading to suboptimal accuracy--efficiency trade-offs. To address this, we propose DiDi-Merging, a slim dynamic merging framework that leverages differentiable rank allocation to balance shared and expert parameters. By formulating parameter budgeting as differentiable rank optimization in low-rank modules and introducing a data-free refinement step to recover task fidelity, DiDi-Merging matches prior dynamic baselines at only 1.24x the parameters of a single fine-tuned model and surpasses them at 1.4x, substantially more compact than methods requiring > 2x storage. DiDi-Merging applies across vision, language, and multimodal tasks.

2605.18903 2026-05-20 cs.LG cs.CV 版本更新

Reasoning Portability: Guiding Continual Learning for MLLMs in the RLVR Era

推理可移植性:引导MLLMs在RLVR时代的持续学习

Qiuhe Hong, Yuyang Liu, Shuo Yang, Tiantian Peng, Fei Zhu, Yonghong Tian

发表机构 * Shenzhen Graduate School of Peking University(北京大学深圳研究生院) Centre for Artificial Intelligence and Robotics, HKISI, CAS(香港科学院人工智能与机器人研究中心) Peng Cheng Laboratory(鹏城实验室)

AI总结 本文提出了一种名为推理可移植性(RP)的机制,通过在持续学习中引入推理层面的约束,改进了多模态大语言模型在RLVR环境下的适应能力,实验表明RDB-CL在提升最后准确率方面优于基线方法。

详情
AI中文摘要

在持续学习中,视觉-语言模型(VLM-CL)旨在不断适应新多模态任务的同时保留先前知识。新兴的将多模态大语言模型(MLLMs)与具有可验证奖励的强化学习(RLVR)相结合的范式,要求一种新的模式来引导持续适应。随着推理能力的进步,现在可以在推理层面施加约束。我们正式化了可移植性,即一个样本级别的度量,用于衡量先前策略行为在新任务中的可重用性,并实证表明推理层面的信号在分布外样本上仍可靠,而答案层面的信号则不然。我们将此形式化为推理可移植性(RP),并提出基于推理的动态平衡持续学习(RDB-CL),该方法根据RP调节RLVR中的每样本Kullback-Leibler正则化:一个紧密的锚点在高RP样本上保留可重用的推理,而低RP样本上的放松锚点则允许探索新的推理路径。实验表明,RDB-CL在提升最后准确率方面优于基线方法,相比 vanilla RLVR 基线提升了+12.0%。

英文摘要

Vision-Language Models in Continual Learning (VLM-CL) aim to continuously adapt to new multimodal tasks while retaining prior knowledge. The emerging paradigm that couples Multimodal Large Language Models (MLLMs) with Reinforcement Learning with Verifiable Rewards (RLVR) calls for a new pattern to guide continual adaptation. Advances in reasoning capability now make it feasible to impose constraints at the reasoning level. We formalize portability, a sample-level measure of how reusable the previous policy's behavior is on a new task, and empirically show that reasoning-level signals remain reliable on out-of-distribution samples while answer-level signals do not. We instantiate this as Reasoning Portability (RP) and propose Reasoning-based Dynamic Balance Continual Learning (RDB-CL), which modulates the per-sample Kullback-Leibler regularization in RLVR according to RP: a tight anchor preserves reusable reasoning on high-RP samples, while a relaxed anchor on low-RP samples permits exploration of new reasoning pathways. Experiments show that RDB-CL consistently outperforms baselines, improving Last accuracy by +12.0% over the vanilla RLVR baseline.

2605.18902 2026-05-20 cs.IT cs.LG math.IT 版本更新

Variational Diffusion Channel Decoder

变分扩散通道解码器

Chengwei Zhang, Yifan Du, Siyu Liao

发表机构 * The School of Integrated Circuits(集成电路学院) Sun Yat-sen University(中山大学) Shenzhen, China(深圳,中国)

AI总结 本文提出一种高效的变分扩散模型基于通道解码器,结合领域特定的信念传播过程和扩散模型的强学习能力,实现了低成本和高纠错性能。

详情
AI中文摘要

神经通道解码器作为一种数据驱动的信道解码策略,已在纠错能力方面展现出非常有前途的改进,优于经典方法。然而,这些基于深度学习的解码器的成功是以模型存储和计算复杂性大幅增加为代价的,阻碍了其在现实世界中对时间敏感和资源敏感的通信和存储系统中的实际应用。为了解决这一挑战,我们提出了一种高效的变分扩散模型基于通道解码器,有效地将领域特定的信念传播过程整合到现代扩散模型中。通过利用信念传播的低成本优势和扩散模型的强大学习能力,我们提出的神经解码器同时实现了极低的成本和高纠错性能。实验结果表明,与最先进的神经通道解码器相比,我们的模型通过在显著减少计算成本和模型大小的同时实现最佳解码性能,提供了一种可行的实用部署方案。

英文摘要

Neural channel decoder, as a data-driven channel decoding strategy, has shown very promising improvement on error-correcting capability over the classical methods. However, the success of those deep learning-based decoder comes at the cost of drastically increased model storage and computational complexity, hindering their practical adoptions in real-world time-sensitive resource-sensitive communication and storage systems. To address this challenge, we propose an efficient variational diffusion model-based channel decoder, which effectively integrates the domain-specific belief propagation process to the modern diffusion model. By reaping the low-cost benefits of belief propagation and strong learning capability of diffusion model, our proposed neural decoder simultaneously achieves very low cost and high error-correcting performance. Experimental results show that, compared with the state-of-the-art neural channel decoders, our model provides a feasible solution for practical deployment via achieving the best decoding performance with significantly reduced computational cost and model size.

2605.18900 2026-05-20 q-bio.OT cs.LG 版本更新

A Logistic Regression Model to Predict Malaria Severity in Children

一种用于预测儿童疟疾严重程度的逻辑回归模型

Mary Opokua Ansong, Asare Yaw Obeng, Samuel King Opoku

AI总结 本研究提出了一种逻辑回归模型,利用环境和生物学因素预测儿童疟疾的严重程度,通过83.3%的准确率验证了模型的有效性,并强调了样本代表性的的重要性。

详情
Journal ref
Eur. J. Electr. Eng. Comput. Sci. 8 (2024) 31-35
AI中文摘要

全球范围内疟疾是导致死亡的主要原因之一。研究人员试图基于气象数据、气候数据和疟原虫的繁殖周期开发预测疟疾暴发的模型。本研究基于环境和生物学因素预测疟疾的严重程度。本研究开发了一个逻辑回归模型,利用镰状红血球疾病、停滞水、垃圾堆、湿草地和使用驱虫蚊帐等因素进行预测,准确率为83.3%。研究在加纳博索姆特韦区进行,共有417名受访者。研究得出结论,尽管该区儿童极易感染疟疾,但病情严重程度非常低。本研究建议,在机器学习模型开发过程中,仅仅拥有良好的样本量是不够的,同时还需要有良好的各类标签样本代表性。

英文摘要

One of the main causes of death around the globe is malaria. Researchers have sought to develop predictive models for malaria outbreaks based on meteorological data, climate data and the breeding cycle of Plasmodium, the causative agent of malaria. This study predicts the severity of malaria based on environmental and biological factors. A logistic regression model was developed in this study to predict the severity of malaria based on such factors as sickle cell disease, stagnant water, garbage dump, wet lawns, and the use of treated mosquito nets, with an 83.3% accuracy rate. The study was carried out in the Bosomtwe District of Ghana with 417 respondents. It was deduced that although children in the District are highly prone to malaria infection, the severity is very low. The study recommends that not just having a good sample size alone is important during machine learning model development, but also having a good sample representation of the various class labels is equally important.

2605.18899 2026-05-20 cs.LG cs.AI 版本更新

Don't Let Bandit Feedback Pull Continual LLM-Recommender Updates Off Target

不要让多臂老虎机反馈将连续LLM推荐系统更新偏离目标

Taesan Kim, Hyeongjun Yun, Jaegul Choo, Chung Park

发表机构 * SK Telecom(SK电信) KAIST(韩国科学技术院)

AI总结 本文提出了一种名为Anchored Bandit Policy Optimization (ABPO)的框架,用于持续改进基于生成式大语言模型的推荐系统,通过结合组内相对策略优化(GRPO)和显式处理曝光偏差和反馈模糊性,以减少因部署日志提供的策略形状上下文老虎机反馈导致的偏差,并提高推荐准确性。

详情
AI中文摘要

基于生成式大语言模型的推荐系统(LLM-Rec)需要持续部署后的更新,但部署日志仅提供策略形状的上下文老虎机反馈:结果仅在由先前服务策略暴露的项目上被观察到,导致曝光偏差,并产生部分、不对称的信号,包括相对可靠的积极响应和模糊的无响应。我们提出了一种连续LLM-Rec更新的Anchored Bandit Policy Optimization(ABPO)框架,结合组内相对策略优化(GRPO)与显式处理曝光偏差和反馈模糊性。具体来说,我们将在每个GRPO滚动组中插入暴露的推荐作为记录的锚点,使组内相对归一化能够针对先前策略实际暴露的动作进行校准,而不是仅针对新采样的滚动。因为正响应和无响应仅通过先前策略暴露被观察到,我们对固定锚点应用自归一化逆倾向评分,以校正策略不匹配。同时,我们将两种反馈类型进行不对称处理:正响应提供相对直接的推荐信号,而无响应仍然模糊,因为它们可能反映真正的不感兴趣或未观察到的外部因素。为了避免因模糊的无响应而过于激进的更新,我们用模型输出标记的置信度来削弱其惩罚,作为无监督的可靠性信号。在Amazon Reviews和MovieLens的五个领域中,我们的方法在推荐准确性上产生了持续的更新收益,同时比先前的基线方法更有效地缓解了先前策略引起的曝光偏差。

英文摘要

Generative LLM-based recommenders (LLM-Rec) require continual post-deployment updates, yet deployment logs provide only policy-shaped contextual bandit feedback: outcomes are observed solely for items exposed by a prior serving policy, inducing exposure bias and yielding partial, asymmetric signals consisting of relatively reliable positive responses and ambiguous no-responses. We propose an Anchored Bandit Policy Optimization (ABPO) framework for continual LLM-Rec updates that combines group-relative policy optimization (GRPO) with explicit treatment of exposure bias and feedback ambiguity. Specifically, we insert the exposed recommendation as a logged anchor into each GRPO rollout group, so that group-relative normalization is calibrated against the action actually exposed by the prior policy rather than against newly sampled rollouts alone. Because both positive- and no-responses are observed only through prior-policy exposure, we apply self-normalized inverse propensity scoring to the fixed anchor for both feedback types to correct for policy mismatch. At the same time, we treat the two feedback types asymmetrically in reliability: positive responses provide relatively direct endorsement signals, whereas no-responses remain ambiguous because they may reflect either true disinterest or unobserved external factors. To avoid overly aggressive updates from ambiguous no-responses, we temper their penalties with self-certainty, using the model's output-token confidence as a verifier-free reliability signal. Across five domains from Amazon Reviews and MovieLens, our method yields consistent post-update gains in recommendation accuracy while mitigating prior-policy-induced exposure bias more effectively than prior baselines.

2605.18897 2026-05-20 eess.SP cs.AI cs.LG 版本更新

Cross-Subject Intracranial EEG Reconstruction from Scalp Recordings Using Multi-Scale Cross-Attention Transformers

基于多尺度交叉注意力变换器的跨受试者颅内脑电重构(使用头皮记录)

Tien-Dat Pham, Xuan-The Tran

发表机构 * HAI-Smartlink Research Lab, Anchi STE Company(HAI-Smartlink研究实验室、Anchi STE公司) School of Mechanical Engineering, Vietnam Maritime University(越南海事大学机械工程学院)

AI总结 本文提出了一种基于多尺度交叉注意力变换器(CAST)的方法,通过两阶段迁移学习策略,从头皮脑电中重建未见过的受试者的颅内脑电信号,实现了无需患者特定训练的跨受试者颅内脑电重构。

详情
AI中文摘要

颅内脑电(iEEG)提供高保真的神经记录,对临床和脑机接口应用至关重要,但获取这些信号需要侵入性手术。尽管最近的研究尝试从非侵入性头皮脑电估计iEEG,但大多数方法依赖于患者特定的模型,导致循环依赖:如果需要手术收集训练数据,非侵入性模型的实用性有限。在本研究中,我们通过预测未见过的患者的颅内信号来解决跨受试者iEEG重构的挑战,使用在其他人身上训练的模型。我们提出了CAST(跨注意力空间-时间变换器),一种机器学习框架,通过两阶段迁移学习策略将头皮脑电转换为多通道iEEG波形。首先,一个时间编码器在三个不同分辨率上提取多尺度神经表示。然后,由于患者之间的电极放置差异较大,一个通道感知的解码器仅使用少量目标受试者的数据进行校准。我们通过留一受试者法交叉验证在两个公共数据集上评估了所提出的方法,这两个数据集包含1,282个iEEG通道。实验结果表明,CAST在重构靠近头皮表面的皮层信号方面优于深度皮下活动。在高度可观察的运动感觉区域,模型在中央前回实现了峰值相关性高达r=0.864。此外,通过通道选择策略,CAST在可行的受试者上获得了平均相关性r=0.545,优于之前的同受试者基线。这些发现表明,无需广泛的患者特定训练,即可从头皮脑电中重构未见过的受试者的皮层iEEG信号,并且仅需短暂的校准阶段即可使模型适应新的硬件配置。

英文摘要

Intracranial EEG (iEEG) provides high-fidelity neural recordings essential for clinical and brain-computer interface applications, but acquiring these signals requires invasive surgery. While recent studies have attempted to estimate iEEG from non-invasive scalp EEG, most rely on patient-specific models, creating a circular dependency: if surgery is required to collect training data, the non-invasive model offers limited practical benefit. In this study, we address the challenge of cross-subject iEEG reconstruction by predicting intracranial signals for unseen patients using models trained on other individuals. We propose CAST (Cross-Attention Spatial-Temporal Transformer), a machine learning framework that translates scalp EEG into multi-channel iEEG waveforms through a two-stage transfer learning strategy. First, a temporal encoder extracts multi-scale neural representations at three different resolutions. Then, because electrode placements vary substantially across patients, a channel-aware decoder is calibrated using only a few minutes of data from the target subject. We evaluated the proposed method using leave-one-subject-out cross-validation on two public datasets comprising 1,282 iEEG channels. Experimental results demonstrate that CAST reconstructs cortical signals located near the scalp surface substantially better than deep subcortical activity. In highly observable sensorimotor regions, the model achieved peak correlations of up to r=0.864 in the precentral gyrus. Furthermore, with a channel selection strategy, CAST obtained a mean correlation of r=0.545 on viable subjects, outperforming previous within-subject baselines. These findings indicate that cortical iEEG signals can be reconstructed for unseen subjects from scalp EEG without extensive patient-specific training, and that only a brief calibration phase is sufficient to adapt the model to new hardware configurations.

2605.18892 2026-05-20 cs.LG cs.AI cs.DC 版本更新

Data-Free Client Contribution Estimation via Logit Maximization for Federated Learning

通过Logit最大化实现无数据的客户端贡献估计用于联邦学习

Asim Ukaye, Nurbek Tastan, Mubarak Abdu-Aguye, Karthik Nandakumar

发表机构 * MBZUAI, Abu Dhabi, UAE(MBZUAI,阿布扎赫德,阿联酋) Michigan State University, Michigan, USA(密歇根州立大学,密歇根,美国)

AI总结 本文提出了一种基于Logit最大化的无数据客户端贡献估计和聚合框架CELM,该框架无需共享原始数据、客户端元数据或辅助公开数据,通过客户端更新获取类别证据分数并构建跨客户端证据矩阵,以量化每类的竞争力和类别覆盖范围,从而计算出对少数类提供强判别性证据的客户端贡献权重,提高联邦学习的鲁棒性和性能。

Comments 22 pages, 7 figures

详情
AI中文摘要

联邦学习(FL)使计算机视觉模型能够协同学习,其中隐私和监管限制防止在设备或组织之间集中数据。然而,实际的FL部署往往表现出严重的类别不平衡和标签偏斜,导致标准聚合协议过度拟合主导客户端并降级少数类性能。我们提出了一种基于Logit最大化的无数据、按类别贡献估计和聚合框架(CELM),该框架不需要共享原始数据、客户端元数据或辅助公开数据。FL服务器通过客户端更新获取类别证据分数,并构建跨客户端证据矩阵,该矩阵量化了每类的竞争力和类别覆盖范围。使用该矩阵,我们计算出贡献权重,以提升为少数类提供强判别性证据的客户端的权重。所得到的聚合是稳定的,由于简单约束和动量平滑,且与标准FL训练流水线保持兼容。我们在受控的非独立同分布和病理标签分割的代表性视觉基准上评估了该方法,证明CELM基于的聚合提高了对不平衡和统计异质性的鲁棒性,同时在不需任何额外数据交换的情况下实现了更好的性能。

英文摘要

Federated learning (FL) enables collaborative learning of computer vision models, where privacy and regulatory constraints prevent centralizing data across devices or organizations. However, practical FL deployments often exhibit severe class imbalance and label skew, causing standard aggregation protocols to overfit dominant clients and degrade minority-class performance. We propose a data-free, class-wise contribution estimation and aggregation framework based on logit maximization (CELM) that does not require sharing raw data, client metadata, or auxiliary public datasets. The FL server probes client updates to obtain class-wise evidence scores and assembles a cross-client evidence matrix, which quantifies both per-class competence and class coverage. Using this matrix, we compute contribution weights that upweight clients providing strong, discriminative evidence for underrepresented classes. The resulting aggregation is stable due to simplex constraints and momentum smoothing, and it remains compatible with standard FL training pipelines. We evaluate the approach on representative vision benchmarks under controlled non-IID and pathological label splits, demonstrating that CELM-based aggregation improves robustness to imbalance and statistical heterogeneity, while yielding better performance without requiring any additional data exchange.

2605.18891 2026-05-20 cs.LG cs.AI 版本更新

Auditing Reasoning-Trace Memorization Claims after Unlearning with Head-Conditioned Canaries

在取消学习后使用头部条件化的候鸟审计推理轨迹记忆化声明

Yanhang Li, Zhichao Fan, Zexin Zhuang

发表机构 * Northeastern University, USA(东北大学) University of Illinois Urbana-Champaign, USA(伊利诺伊大学厄巴纳-香槟分校) Southern Methodist University, USA(南方 Methodist 大学)

AI总结 该研究通过在DeepSeek-R1-Distill-Qwen-7B上使用LoRA记忆化的虚构作者和NPO取消学习,结合六token候鸟头部条件,审计推理轨迹记忆化声明,发现正向解析器拆分绕过间隙本身并不能识别隐藏的权重级记忆化,也不能排除其存在。

详情
AI中文摘要

对推理模型的取消学习评估有时会显示绕过模式。答案侧看起来已取消学习,但模型自身的推理轨迹仍会发出遗忘内容,这种差距被当作证据表明权重仍记忆。我们使用LoRA记忆化的虚构作者和NPO取消学习,在六token候鸟头部条件下审计此阅读。在一种种子下,用相同的权重交换推理轨迹为短非候鸟预填,答案率下降幅度等于绕过间隙本身,无论预填是否模仿训练模板。在第二种种子下,绕过间隙缩小而非消失,预填交换方向反转并使答案率达到上限。正向解析器拆分绕过间隙本身并不能识别隐藏的权重级记忆化,也不能排除其存在。在不同的distillate中,相同指标因解析器无法找到闭合标签而改变符号。我们推荐在解码时进行模板交换作为廉价的合理性检查,与传统审计并行。

英文摘要

Evaluations of unlearning on reasoning models sometimes show a bypass pattern. The answer side looks unlearned, but the model's own thinking trace keeps emitting the forgotten content, and the gap is taken as evidence that the weights still remember. We audit this reading on DeepSeek-R1-Distill-Qwen-7B with LoRA-memorized fictional authors and NPO unlearning, conditioned on a six-token canary head. On one seed, swapping the thinking trace for a short non-canary prefill on the same weights drops the answer rate by as much as the bypass gap itself, whether the prefill mimics the training template or not. On a second seed the bypass gap shrinks rather than vanishing, and the prefill swap reverses direction and brings the answer rate to ceiling. A positive parser-split bypass gap thus does not by itself identify hidden weight-level memorization, and does not rule it out either. On a different distillate the same metric flips sign because the parser cannot find the closing tag. We recommend a decode-time template swap as a cheap sanity check alongside the canonical audit.

2605.18889 2026-05-20 cs.LG cs.AI 版本更新

Soft Learning

软学习

Mohammed Aledhari, Ali Aledhari, Fatimah Aledhari, Mohamed Rahouti

发表机构 * University of North Texas(北卡罗来纳州立大学) Fordham University(福尔特姆大学)

AI总结 本文提出软学习框架,通过交叉验证非负最小二乘法发现最优组合权重,实现比深度网络快数十倍的训练速度,同时具备内在可解释性和未来扩展性,优于多种方法,在70%的任务上排名第一。

详情
AI中文摘要

现代机器学习迫使从业者在强大的但昂贵的深度网络和快速但有限的经典算法之间做出选择。本文介绍了软学习,一个维护异质专家库的框架,涵盖线性模型、树集成、核机和神经网络,并通过交叉验证非负最小二乘法发现可证明最优的组合权重。软学习保证能匹配或超过其专家的最佳加权组合,仅在CPU上训练速度比深度网络快两到三个数量级(72-435倍,取决于测试配置),通过学习的权重提供内在可解释性,揭示哪种算法范式最适合数据,并且具有未来保障性:添加专家能保证性能维持或提升。在37个数据集(25个分类,12个回归)上,针对包括CatBoost和调优深度网络在内的九种方法,软学习在70%的任务上排名第一,获得最佳平均排名(Friedman检验,p=1.12×10^-12),并且是唯一同时在分类和回归上均表现优异的方法,无需GPU硬件或超参数调优。这些结果表明从“哪种算法最好?”到“什么是有证明最优的组合?”的范式转变,软学习通过正式保证回答任何数据模态的问题。

英文摘要

Modern machine learning forces practitioners to choose between powerful but expensive deep networks and fast but limited classical algorithms. Here we introduce Soft Learning, a framework that maintains a library of heterogeneous specialists -- spanning linear models, tree ensembles, kernel machines, and neural networks -- and discovers provably optimal combination weights through cross-validated non-negative least squares. Soft Learning is guaranteed to match or exceed the best weighted combination of its specialists, trains over two orders of magnitude faster than deep networks on CPU alone (72-435x faster across tested configurations), provides inherent interpretability through learned weights that reveal which algorithmic paradigm best fits the data, and is future-proof: adding specialists is mathematically guaranteed to maintain or improve performance. Across 37 datasets (25 classification, 12 regression) against nine methods including CatBoost and tuned deep networks, Soft Learning ranks first on 70% of tasks, achieves the best mean rank (Friedman test, p = 1.12 x 10^-12), and is the only method to simultaneously excel at both classification and regression -- all without GPU hardware or hyperparameter tuning. These results suggest a paradigm shift from "which algorithm is best?" to "what is the provably optimal combination?" -- a question Soft Learning answers with formal guarantees for any data modality.

2605.18884 2026-05-20 cs.LG cs.CV 版本更新

Navigating the Emotion Tree: Hierarchical Hyperbolic RAG for Multimodal Emotion Recognition

在情绪树中导航:用于多模态情绪识别的分层双曲RAG

Zeheng Wang, Bo Zhao, Yijie Zhu, Zhishu Liu, Hui Ma, Ruixin Zhang, Shouhong Ding, Qianyu Xie, Zitong Yu

发表机构 * Great Bay University(广东东莞大亚湾大学) Tencent Youtu Lab(腾讯优图实验室)

AI总结 本文提出HyperEmo-RAG,一种利用结构化情绪知识库的检索增强生成框架,通过双曲空间嵌入和证据图构建来提升多模态情绪识别的性能。

详情
AI中文摘要

多模态情绪识别旨在整合文本、音频和视频源以理解人类情感状态。尽管多模态大语言模型在多模态推理方面表现优异,但通常将情绪类别视为独立标签,忽略了人类心理的丰富层次分类。此外,缺乏外部上下文知识使它们容易过度解释噪声线索,进一步复杂化细粒度情绪分类。为了解决这些问题,我们提出了HyperEmo-RAG,一种检索增强生成框架,利用结构化情绪知识库。我们的框架引入了两个关键创新。1)层次双曲 grounding。认识到情绪分类的内在层次树结构,我们将层次情绪标签和多模态样本嵌入到连续双曲空间(Poincaré球)中,并设计了层次束搜索 deliberation 过程,逐步从粗粒度到细粒度级别检索样本。2)结构化证据注入。基于检索到的证据,我们构建证据图,并通过Tree-Aware Attention机制和EmotionGraphFormer将结构化知识作为显式认知上下文注入LLM中,保持图结构信息的完整性。在多个数据集上的实验表明,HyperEmo-RAG显著优于现有方法。

英文摘要

Multimodal emotion recognition aims to integrate text, audio, and video sources to understand human affective states. Although multimodal large language models excel at multimodal reasoning, they typically treat emotion categories as independent labels, ignoring the rich hierarchical taxonomy of human psychology. Moreover, lacking external contextual knowledge makes them highly susceptible to over-interpreting noisy cues, further complicating fine-grained emotion classification. To address these issues, we propose \textbf{HyperEmo-RAG}, a retrieval-augmented generation framework that leverages a structured emotional knowledge base. Our framework introduces two key innovations. 1) Hierarchical hyperbolic grounding. Recognizing the inherent hierarchical tree structure of emotion taxonomies, we jointly embed hierarchical emotion labels and multimodal samples into a continuous hyperbolic space (Poincaré ball) and design a hierarchical beam-search deliberation process that progressively retrieves samples from coarse to fine-grained levels. 2) Structured evidence injection. Based on the retrieved evidence, we construct an evidence graph and inject the structured knowledge as explicit cognitive context into the LLM through a Tree-Aware Attention mechanism and an EmotionGraphFormer, preserving the integrity of graph-structured information. Experiments on multiple datasets demonstrate that HyperEmo-RAG significantly outperforms existing methods.

2605.18883 2026-05-20 cs.LG cs.AI 版本更新

Prediction Is Not Physics: Learning and Evaluating Conserved Quantities in Neural Simulators

预测并非物理:在神经模拟器中学习和评估守恒量

Andrew Bukowski, Aditya Kothari, Simba Shi, Ishir Rao

发表机构 * Yale University(耶鲁大学)

AI总结 本文研究了神经网络能否从物理轨迹中学习或选择全局守恒量,通过三个哈密顿系统(抛体运动、单摆和弹簧-质量系统)验证了不同模型在守恒律保持方面的性能,发现黑盒CDN在加入时间一致性损失时表现更优,而多项式CDN对训练配置敏感。

Comments 10 pages

详情
AI中文摘要

训练在哈密顿轨迹上的扩散模型可以达到接近10^-3的滚动MSE,但其能量的标准差比地面真实能量的标准差大7500到36000倍,表明未能保持守恒定律。这一差距促使我们提出核心问题:神经网络能否从物理轨迹中学习或选择全局守恒量?我们研究了三个哈密顿系统:抛体运动、单摆和弹簧-质量系统。我们使用了结构化的T(v)+V(q)能量模型、黑盒守恒发现网络(CDN)、多项式CDN以及条件扩散基线。结构化网络在干净数据上对分析能量的R²≥0.9999,而黑盒CDN在训练时加入时间一致性损失和小的对齐损失(λ_align=0.2)时,R²≥0.996。当λ_align=0时,CDN在单摆和弹簧-质量系统上Pearson R²崩溃(<10^-3),表明仅靠时间一致性无法可靠地识别真实能量。在1%的加性高斯噪声下,CDN在抛体和弹簧-质量系统上优于结构化模型,表明CDN可能在该设置下对噪声输入更鲁棒。然而,多项式CDN对训练配置敏感:在单摆系统上短训练计划下R²=0.78,但通过更多训练时间和数据可以达到R²=0.9998,无论是否加入噪声。

英文摘要

A diffusion model trained on Hamiltonian trajectories can achieve rollout MSE near $10^{-3}$, but the standard deviation of its energy over time is between 7500 and 36000 times larger than the ground-truth energy standard deviation, indicating a failure to preserve conservation laws. This gap motivates our central question of whether neural networks can learn or select globally conserved quantities from physical trajectories. We investigate this across three Hamiltonian systems: projectile motion, pendulum, and spring-mass. We use a structured $T(v)+V(q)$ energy model, a black-box Conservation Discovery Network (CDN), a polynomial CDN, and a conditional diffusion baseline. The structured network reaches $R^2 \geq 0.9999$ against analytical energy on clean data, while the black-box CDN reaches $R^2 \geq 0.996$ when trained with temporal consistency plus a small alignment loss to analytical energy at $t=0$ ($λ_{\mathrm{align}}=0.2$). With $λ_{\mathrm{align}}=0$, CDN Pearson $R^2$ collapses on pendulum and spring-mass ($< 10^{-3}$), showing that temporal consistency alone is not enough to reliably identify the true energy. Under $1\%$ additive Gaussian noise, the CDN outperforms the structured model on the projectile and spring-mass systems, suggesting that the CDN may be more robust to noisy inputs in this setting. However, the polynomial CDN is sensitive to training configuration: it achieves $R^2=0.78$ under a short training schedule on the pendulum system, but reaches $R^2=0.9998$ with more training time and data, regardless of whether noise is added.

2605.18882 2026-05-20 cs.LG cs.AI 版本更新

To Call or Not to Call: Diagnosing Intrinsic Over-Calling Bias in LLM Agents

叫还是不叫:诊断LLM代理中的内在过度调用偏差

Wei Shi, Ziheng Peng, Sihang Li, Xiting Wang, Xiang Wang, Mengnan Du, Na Zou

发表机构 * Shanghai Jiao Tong University(上海交通大学) Shanghai Artificial Intelligence Laboratory(上海人工智能实验室) Renmin University of China(中国人民大学) The Chinese University of Hong Kong Shenzhen(香港中文大学(深圳)) University of Science and Technology of China(中国科学技术大学)

AI总结 本文研究了LLM代理中过度调用现象,提出内在偏差假说,通过稀疏自编码器恢复行为对齐的特征基,减少到带符号激活边距,并估计偏移量,从而修正过度调用问题。

详情
AI中文摘要

LLM代理表现出一种一致的倾向,即在不需要工具的情况下也频繁调用工具。在When2Call基准测试中,三个家族的六个模型显示出较高的调用准确性,但调用准确性远低于不调用准确性,导致总体准确性在55%-70%之间。我们将其归因于内在偏差假说(IBH):调用/不调用决策映射具有激活无关的调用偏移,因此模型在激活平衡时仍倾向于调用。使用稀疏自编码器(SAEs),我们恢复了与调用/不调用决策对齐的特征基,将其减少到带符号激活边距,并直接估计偏移量。在所有六个模型中,只有当不调用激活超过调用激活时,模型才是决策中性的,这与IBH一致。然后,我们通过自适应边距校准引导(AMCS)进行因果测试,这是一种沿SAE解码器方向的闭合形式反偏移。消除诊断出的偏移量可以减轻过度调用并提高总体准确性,同时调用准确性下降很小。我们的工作将过度调用从经验现象转变为可以进行因果修正的机制性对象。代码可在https://github.com/SKURA502/agent-sae/上获取。

英文摘要

LLM agents exhibit a consistent tendency to over-call, invoking tools even in situations where none is needed. On the When2Call benchmark, six models from three families show high call accuracy but much lower no-call accuracy, leaving overall accuracy in the 55%-70% range. We trace this to an Intrinsic Bias Hypothesis (IBH): the call/no-call decision mapping carries an activation-independent call offset, so the model favors call even at activation parity. Using Sparse Autoencoders (SAEs), we recover behavior-aligned feature bases for the call/no_call decision, reduce them to a signed activation margin, and estimate the offset directly. Across all six models, the model is decision-neutral only when no_call activation outweighs call activation, consistent with IBH. We then causally test IBH with Adaptive Margin-Calibrated Steering (AMCS), a closed-form counter-bias shift along SAE decoder directions. Cancelling the diagnosed offset mitigates over-calling and improves overall accuracy with a negligible drop in call accuracy. Our work recasts over-calling from an empirical phenomenon into a mechanistic object amenable to causal correction. Code is available at https://github.com/SKURA502/agent-sae/.

2605.18881 2026-05-20 cs.LG physics.flu-dyn 版本更新

Emergence of a Flow-Assisted Casting Strategy for Olfactory Navigation via Memory-Augmented Reinforcement Learning

气味导航中通过记忆增强强化学习的流辅助铸造策略的出现

Changxu Zhao, Dongxiao Zhao, Xin Bian, Gaojin Li

发表机构 * State Key Laboratory of Ocean Engineering, School of Ocean and Civil Engineering, Shanghai Jiao Tong University, Shanghai 200240, People’s Republic of China(海洋工程国家重点实验室,海洋与土木工程学院,上海交通大学,上海200240,中华人民共和国) State Key Laboratory of Fluid Power and Mechatronic Systems, Department of Engineering Mechanics, Zhejiang University, Hangzhou 310027, People’s Republic of China(流体动力与机械系统国家重点实验室,工程力学系,浙江大学,杭州310027,中华人民共和国)

AI总结 研究通过记忆增强强化学习探讨了在动态流场中动物如何利用记忆长度和流条件优化气味搜索效率,发现智能体通过自适应调整搜索轨迹几何形状和启动铸造的浓度阈值来最大化成功概率。

详情
AI中文摘要

在动态流场中,尽管依赖随机检测,各种动物表现出显著的气味搜索能力。有趣的是,存在一个最佳时间窗口,可以整合这些检测以最大化搜索效率。为了理解其内在机制,我们研究了在不稳定的流中,不同记忆长度和流条件下的强化学习(RL)智能体的导航性能。在没有任何预定义模型的情况下,智能体发展出一种流辅助的铸造策略,并自适应地调整其搜索轨迹的几何形状和启动铸造的浓度阈值以最大化成功率。智能体朝气味源的平均速度对记忆长度表现出非单调依赖性,这可以由“扇区搜索”模型解释。

英文摘要

In dynamic flow fields, various animals exhibit remarkable odor search capabilities despite relying on stochastic detections. Interestingly, there exists an optimal time window for integrating these detections that maximizes search efficiency. To understand the underlying mechanism, we investigate the navigation performance of Reinforcement Learning (RL) agents in unsteady flows under varying memory lengths and flow conditions. Without any predefined models, the agents develop a flow-assisted casting strategy and adaptively adjust both the geometry of their search trajectories and the concentration threshold for initiating casting to maximize the success rate. The agent's average speed toward the odor source exhibits a non-monotonic dependence on memory length, which can be explained by the "sector-search" model.

2605.18880 2026-05-20 cs.LG cs.CV q-bio.QM 版本更新

A Multi-Dimensional Clustering Approach for Identifying Inborn Errors of Immunity

一种多维聚类方法用于识别先天性免疫缺陷

Nishad Kulkarni, Alexandra K. Martinson, Nicholas L. Rider, Michael Keller, Syed Muhammad Anwar

发表机构 * Sheikh Zayed Institute for Pediatric Surgical Innovation, Children’s National Hospital, Washington, DC(Sheikh Zayed儿童外科创新研究所,儿童医院,华盛顿特区) Childrens National Hospital, Washington, DC(儿童医院,华盛顿特区) Department of Health Systems & Implementation Science, Division of Allergy & Immunology Virginia Tech Carilion School of Medicine, Roanoke, VA(健康系统与实施科学部门,过敏与免疫学分会弗吉尼亚理工大学Carilion医学院,罗阿诺克,VA) Division of Allergy & Immunology Childrens National Hospital, Washington, DC(过敏与免疫学分会儿童医院,华盛顿特区) School of Medicine and Health Sciences, George Washington University, Washington, DC(医学与健康科学学院,乔治华盛顿大学,华盛顿特区)

AI总结 本文提出一种多维聚类方法,用于从全国数据注册中识别新的罕见疾病模式并提取与先天性免疫缺陷相关的特征,通过改进IEI特征意识和开发罕见疾病人群分析的数据工具包,扩展了复杂医疗记录到可被无监督ML解释的数据结构。

Comments Accepted at EMBC 2026

详情
AI中文摘要

先天性免疫缺陷(IEI)等罕见疾病需要早期诊断以防止终器官损伤并提高生活质量。获取和整理大规模电子健康记录(EHR)数据的障碍限制了常规数据驱动分析保持在IEI和其他罕见疾病趋势的前沿。在IEI中开发机器学习(ML)算法进行模式识别以及已发表的方法研究如何系统地处理和整合复杂医疗数据有限。我们提出的流程,包括数据整理和ML聚类算法,旨在识别新的罕见疾病模式并从全国数据注册中提取IEI相关的特征。我们的EHR数据格式化和处理方法提出了一个流程,将原始免疫学实验室数据转换为向量。这进一步结合了通过聚类进行疾病模式识别的超参数调优。本研究改进了IEI特征意识,开发了罕见疾病人群分析的数据工具包,并扩展了将复杂医疗记录转换为可被无监督ML解释的数据结构。

英文摘要

Rare diseases such as inborn errors of immunity (IEI) require early diagnosis to prevent end organ damage and improve quality of life. Hurdles in accessing and curating large scale electronic health record (EHR) data limit routine data driven analyses to remain on the forefront of IEI and other rare disease trends. Development of machine learning (ML) algorithms in IEI for pattern recognition as well as published methodology examining how to systematically process and integrate complex medical data is limited. Our proposed pipeline, including data curation and ML clustering algorithms, is designed to recognize novel rare disease patterns and extract IEI- associated features from a national data registry. Our methodology for EHR data formatting and processing presents the pipeline that transforms raw immunologic lab data into vectors. This is further combined with hyperparameter tuning for diseases pattern recognition via clustering. This study refines IEI feature awareness, develops data tool kits for rare disease populations analysis, and expands on transforming complex medical records in data structures interpretable by unsupervised ML.

2605.18878 2026-05-20 eess.SP cs.CV cs.LG eess.IV 版本更新

Prognostic Value of Lung Ultrasound Biomarkers for Readmission Risk in Congestive Heart Failure: A Pilot Data-Driven Analysis

心力衰竭再入院风险的肺部超声生物标志物预后价值:一项试点数据驱动分析

Jana Armouti, Laura Hutchins, Jacob Duplantis, Thomas Deiss, Thales Nogueira Gomes, Keyur H. Patel, Seema Walvekar, Shane Guillory, Thomas H. Fox, Amita Krishnan, Ricardo Rodriguez, Bennett DeBoisblanc, Deva Ramanan, John Galeotti, Gautam Gare

发表机构 * Carnegie Mellon University(卡内基梅隆大学) LSUHSC Internal Medicine(路易斯安那州立大学医学部) Cosmetic Surgery Facility LLC(美容外科诊所有限公司)

AI总结 本研究通过数据驱动方法利用住院期间获得的B型肺部超声(LUS)数据,预测30天内心力衰竭再入院风险,发现依赖性下肺区域、时间差特征以及多视图特征拼接在预测中表现最佳,展示了超声生物标志物在非侵入性心力衰竭风险分层中的实用性。

详情
AI中文摘要

住院后30天内再入院是心力衰竭(CHF)导致发病率、死亡率和可避免医疗支出的主要驱动因素。当前的临床风险分层工具主要依赖于非成像数据,且预测性能有限。床旁肺部超声(LUS)提供了一个敏感的、非侵入性的窗口,以观察肺部充血,这特征于CHF失代偿,但其用于再入院预测的预后作用仍待探索。我们提出了一个试点可行性研究,这是首个系统使用住院期间获得的B型LUS进行机器学习预测30天内CHF再入院的系统研究。从预训练的Temporal Shift Module(TSM)ResNet-18编码器中提取定量时空嵌入,并分别评估可解释的生物标志物特征。通过结构化消融研究肺部视图、时间表示、多视图融合和跨肺增强,我们识别出驱动再入院风险的关键成像因素。我们的发现表明(1)依赖性下肺区域(左3、右3)携带最强的预后信号,与它们对静水性充血的更大易感性一致;(2)连续检查之间的时间差特征显著优于单时间点表示,突显了捕捉疾病轨迹的重要性;(3)多视图特征拼接产生了最佳整体性能,我们的最佳MLP模型实现了F1得分为0.80(95% CI: 0.62-0.96)。生物标志物分析进一步表明,胸膜线异常,包括断裂和凹陷,的信息量与传统A线和B线标志物相当。这些结果支持POCUS衍生的生物标志物作为实用、可解释的非侵入性CHF风险分层工具。

英文摘要

Hospital readmission within 30 days of discharge is a leading driver of morbidity, mortality, and avoidable healthcare expenditure in congestive heart failure (CHF). Current clinical risk stratification tools rely primarily on non-imaging data and exhibit limited predictive performance. Point-of-care lung ultrasound (LUS) offers a sensitive, noninvasive window into the pulmonary congestion that characterizes CHF decompensation, yet its prognostic utility for readmission prediction remains largely unexplored. We present a pilot feasibility study, the first systematic machine learning study using B-mode LUS acquired during hospitalization to predict 30-day CHF readmission. Quantitative spatiotemporal embeddings are extracted from a pretrained Temporal Shift Module (TSM) ResNet-18 encoder, and interpretable biomarker features are separately evaluated. Through structured ablations over lung view, temporal representation, multi-view fusion, and cross-lung augmentation, we identify the key imaging factors driving readmission risk. Our findings reveal that (1) dependent lower-lung regions (Left-3, Right-3) carry the strongest prognostic signal, consistent with their greater susceptibility to hydrostatic congestion; (2) temporal difference features between sequential examinations substantially outperform single-timepoint representations, highlighting the importance of capturing disease trajectory; and (3) multi-view feature concatenation yields the best overall performance, with our top MLP model achieving an F1 score of 0.80 (95% CI: 0.62-0.96). Biomarker analysis further reveals that pleural-line abnormalities, including breaks and indentations, are as informative as the canonical A-line and B-line markers. These results support POCUS-derived biomarkers as practical, interpretable tools for noninvasive CHF risk stratification.

2605.18873 2026-05-20 cs.CR cs.AI cs.LG 版本更新

GenAI-FDIA: Physics-Informed Generative Models for False Data Injection Attacks

GenAI-FDIA:基于物理的生成模型用于虚假数据注入攻击

Mohammad A. Razzaque, Muta Tah Hira

发表机构 * School of Computing, Engineering and Digital Technologies, Teesside University, UK(Teesside大学计算与工程数字技术学院,英国) Smartifier Ltd, Stockton-on-Tees, UK(Smartifier有限公司,英国Stockton-on-Tees)

AI总结 本文提出GenAI-FDIA框架,通过物理兼容的生成模型合成虚假数据注入攻击,验证了不同架构在电力系统中的有效性,并解决了生成模型中出现的新型故障模式。

Comments Submitted to IEEE Transactions on Smart Grid

详情
AI中文摘要

训练和评估用于电力系统的虚假数据注入攻击(FDIA)检测器受到数据稀缺的限制。运营电网测量数据具有商业敏感性,而手工制作的攻击无法捕捉由网络物理结构强加的复杂分布特性。我们提出了GenAI-FDIA框架,该框架在20种架构中进行基准测试,涵盖Wasserstein GANs、MMD-VAEs、归一化流、扩散模型以及跨家族混合模型。这些模型在三个IEEE测试平台(14节点直流、30节点直流和14节点交流)上进行评估,使用数据驱动的坏数据检测(BDD)阈值校准进行60/20/20时间分割。我们的实证结果验证了这些模型能够生成高保真的攻击,所有架构在14节点网络上达到86.6%以上的规避率;此外,限制攻击者的拓扑知识会带来可测量的隐蔽性下降(p ≤ 0.0022)。关键的是,我们识别出一种之前未报告的故障模式:在归一化特征空间中直接应用仿射物理投影会严重位移攻击向量,使BDD规避率从约55%降至<2%在30节点测试平台。我们通过一种新的推理时间谐调器解决此问题,恢复所有物理兼容变体的完全隐蔽性(ε_BDD=100%)而无需重新训练。最后,我们隔离了高级混合架构中的协方差坍塌现象(κ≈-0.076),并通过50个周期的预热计划进行修正(κ→0.785,MMDΔ=-3.1%)。最终,GenAI-FDIA提供了适用于任何受物理约束的生成模型在电力系统安全中的稳健恢复蓝图。

英文摘要

Training and evaluating false data injection attack (FDIA) detectors for power systems is constrained by data scarcity. Operational grid measurements are commercially sensitive, and hand-crafted attacks fail to capture complex distributional structures imposed by network physics. We present \textsc{GenAI-FDIA}, a framework benchmarking a pool of $P{=}20$ architectures for physics-compliant FDIA synthesis, spanning Wasserstein GANs, MMD-VAEs, normalising flows, diffusion models, and cross-family hybrids. These are evaluated across three IEEE testbeds (14-bus DC, 30-bus DC, and 14-bus AC) under a 60/20/20 chronological split using data-driven Bad Data Detection (BDD) threshold calibration. Our empirical results verify that these models generate high-fidelity attacks, with all architectures achieving evasion rates of $ε_{\text{BDD}} \ge 86.6\%$ on the 14-bus network; additionally, limiting an attacker's topological knowledge induces a measurable degradation in stealthiness ($p \le 0.0022$). Crucially, we identify a previously unreported failure mode: applying affine physics projections directly in normalised feature spaces critically displaces the attack vector, collapsing BDD evasion from ${\sim}55\%$ to $<\!2\%$ on the 30-bus testbed. We resolve this via a novel inference-time harmoniser, restoring full stealthiness ($ε_{\text{BDD}}{=}100\%$) across all physics-informed variants without retraining. Finally, we isolate a covariance-collapse phenomenon ($κ\approx {-}0.076$) within advanced hybrid architectures and rectify it through 50-epoch warm-up schedules ($κ\to 0.785$, $Δ\text{MMD}={-}3.1\%$). Ultimately, \textsc{GenAI-FDIA} delivers a robust recovery blueprint applicable to any physics-constrained generative model deployed for power-system security.

2605.18872 2026-05-20 cs.LG cs.AI cs.RO 版本更新

EUPHORIA: Efficient Universal Planning via Hybrid Optimization for Robust Industrial Robotic Assembly

EUPHORIA: 通过混合优化实现高效通用规划以实现稳健的工业机器人装配

Shih-Yu Lai, Chia-Ching Yen, Yang-Ting Shen, Peter Yichen Chen, Yu-Lun Liu, Bing-Yu Chen

发表机构 * National Taiwan University(国立台湾大学) MoonShine Animation Studio(MoonShine动画工作室) National Cheng Kung University(国立成功大学) The University of British Columbia(不列颠哥伦比亚大学) National Yang Ming Chiao Tung University(阳明交通大学)

AI总结 本文提出EUPHORIA框架,通过混合优化策略实现通用少样本适应和动态效率,解决建筑机器人装配中规划器高度专业化和操作低效的问题,结合元几何编码器、物理引导图变压器和残差稳定性校正等方法,实现高效且鲁棒的装配规划。

详情
AI中文摘要

建筑机器人装配面临持续瓶颈:现有规划器要么高度专业化,需要每次新几何设计都进行昂贵的再训练,要么操作低效,将结构序列和运动学运动视为独立过程。我们提出了EUPHORIA,一个统一框架,通过混合优化策略实现通用少样本适应和动态效率。为克服再训练瓶颈,我们提出了基于图超网络的元几何编码器:不同于标准对比学习仅在特征级识别,我们的超网络动态从最小支持集中生成策略参数,使参数级适应复杂拓扑(如穹顶、拱门)而无需基于梯度的再训练。对于结构推理,我们引入了通过软演员-评论家(SAC)训练的物理引导图变压器,其物理偏置注意力机制通过离散元模型(DEM)模拟的接触力调节注意力分数,引导规划器朝向结构关键连接。我们进一步通过运动学感知序列确保操作效率,其中SAC目标惩罚高能转换。最后,我们通过残差稳定性校正弥合仿真到现实的差距,这是一种可微优化层,通过最小化联合能量-稳定性成本优先级来微调粗略装配动作。实验表明,EUPHORIA显著减少了与解耦基线相比的能量消耗,并在未见的非标准几何上实现了最先进的成功率,通过融合元学习、物理引导注意力和残差优化,实现一个连贯的通用规划器。

英文摘要

Robotic assembly in architectural construction faces a persistent bottleneck: existing planners are either highly specialized, requiring prohibitive retraining for every new geometric design, or operationally inefficient, treating structural sequencing and kinematic motion as disjoint processes. We present EUPHORIA, a unified framework that achieves universal few-shot adaptability and dynamic efficiency through a hybrid optimization strategy. To overcome the retraining bottleneck, we propose a Meta-Geometric Encoder based on Graph Hypernetworks: unlike standard contrastive learning, which performs only feature-level recognition, our hypernetwork dynamically generates policy parameters from a minimal support set, enabling parameter-level adaptation to complex topologies (e.g., domes, arches) without gradient-based retraining. For structural reasoning, we introduce a Physics-Informed Graph Transformer trained via Soft Actor-Critic (SAC), with a Physics-Bias Attention mechanism that modulates attention scores using contact forces from Discrete Element Model (DEM) simulations, guiding the planner toward structurally critical connections. We further ensure operational efficiency through Kinematics-Aware Sequencing, where the SAC objective penalizes high-energy transitions. Finally, we bridge the Sim2Real gap via Residual Stability Correction, a differentiable optimization layer that fine-tunes coarse assembly actions by minimizing a joint energy-stability cost prior to execution. Experiments show that EUPHORIA significantly reduces energy consumption over decoupled baselines and achieves state-of-the-art success rates on unseen, non-standard geometries with minimal few-shot examples, fusing meta-learning, physics-informed attention, and residual optimization into a cohesive, generalized planner.

2605.18871 2026-05-20 cs.LG cs.AI 版本更新

Distributional Energy-Based Models for Uncertainty-Aware Structured LLM Reasoning

基于不确定性感知的结构LLM推理的分布能量模型

Shireen Kudukkil Manchingal, Abhey Kalia, Fernanda Gonçalves, Shebin Rawther

发表机构 * Oxford Dynamics Harwell Science and Innovation Campus(牛津动力学哈威尔科学与创新校园)

AI总结 本文提出了一种分解的能量函数,结合了学习的质量评分器和确定性分析约束惩罚,用于验证结构LLM输出。该方法通过两步推理循环触发目标再生或 abstention,能够在多个基准测试中超越单次Qwen-72B,并减少约束违反。

详情
AI中文摘要

当大型语言模型生成结构化输出如旅行计划、代码解决方案或多步证明时,个别推理步骤可能正确,但整体输出可能违反预算、失败测试用例或与先前推论矛盾。我们提出了一种分解的能量函数,结合了学习的质量评分器和确定性分析约束惩罚,用于验证结构LLM输出。质量评分器是单个冻结编码器上的异构集合,包含低秩适配器(3%可训练参数);集合均值对候选者进行排名,标准差量化epistemic不确定性,驱动一个两步推理循环,触发目标再生或 abstention。在五个基准测试(GSM8K、MuSR、TravelPlanner、TACO、Knights & Knaves)中,我们的149M参数验证器协调一个7-26B开放生成器池,在每个基准测试中均优于单次Qwen-72B,与Claude Sonnet 4.6在MuSR上匹配(67.7% vs. 68.0%),并且在TravelPlanner上将约束违反减少53%(相对于Opus 4.6,oracle 0.028,随机 0.231)。两种方法是互补的:结构验证在约束可检查时获胜(验证器捕捉信号前沿模型无法自我检测),而预训练规模先验在不可检查时获胜(叙述推理、代码语义)。跨数据集的混淆分析确认在四个推理任务上确实存在质量区分,并识别出代码中的模型身份捷径,通过最后一层重新训练得以缓解。评分器在困难数据上训练后可实现零样本转移:一个MuSR训练的评分器在没有看到数学问题的情况下在GSM8K上达到93.9%。

英文摘要

When Large Language Models produce structured outputs such as travel plans, code solutions, or multi-step proofs, individual reasoning steps may appear correct while the output as a whole violates budgets, fails test cases, or contradicts earlier deductions. We propose a decomposed energy function that combines a learned quality scorer with deterministic analytical constraint penalties for verifying structured LLM outputs. The quality scorer is a heterogeneous ensemble of low-rank adapters on a single frozen encoder (3% trainable parameters); the ensemble mean ranks candidates while the standard deviation quantifies epistemic uncertainty, driving a two-pass inference loop that triggers targeted regeneration or abstention. Across five benchmarks (GSM8K, MuSR, TravelPlanner, TACO, Knights & Knaves), our 149M-parameter verifier orchestrating a pool of 7-26B open generators outperforms single-shot Qwen-72B on every benchmark, matches Claude Sonnet 4.6 on MuSR (67.7% vs. 68.0%), and reduces constraint violations by 53% relative to Opus 4.6 on TravelPlanner (oracle 0.028, random 0.231). The two routes are complementary: structural verification wins when constraints are checkable (the verifier captures signal frontier models cannot self-detect), while pretraining-scale priors win where they are not (narrative inference, code semantics). A cross-dataset confounding analysis confirms genuine quality discrimination on four reasoning tasks and identifies a model-identity shortcut on code, mitigated via last-layer retraining. Scorers trained on difficult data transfer zero-shot: a MuSR-trained scorer achieves 93.9% on GSM8K without seeing a math problem.

2605.18869 2026-05-20 cs.LG cs.AI cs.NE 版本更新

MO-CAPO: Multi-Objective Cost-Aware Prompt Optimization

MO-CAPO:多目标成本感知提示优化

Jan Büssing, Moritz Schlager, Timo Heiß, Tom Zehle, Matthias Feurer

发表机构 * Technical University of Munich (TUM), Munich Center for Machine Learning (MCML)(慕尼黑工业大学(TUM)、慕尼黑机器学习中心(MCML)) LMU Munich, Munich Center for Machine Learning (MCML)(慕尼黑大学(LMU)、慕尼黑机器学习中心(MCML)) University of Freiburg, ELLIS Institute Tübingen(弗赖堡大学、图宾根ELLIS研究所) TU Dortmund University, Lamarr Institute for Machine Learning(多特蒙德工业大学、拉马尔机器学习与人工智能研究所)

AI总结 本文提出MO-CAPO,一种多目标提示优化算法,同时优化性能和推理成本,并通过预算分配实现高效优化,通过评估四个任务和三个LLM,证明其在噪声R2指标上优于NSGA-II基线,并在较低预算下达到竞争性性能。

详情
AI中文摘要

大型语言模型(LLMs)在广泛的任务上表现出色,但对提示设计高度敏感,促使需要自动提示优化。现有方法主要关注性能,忽略竞争目标如推理成本或延迟。同时,现有多目标提示优化工作依赖于现成的NSGA-II,忽略优化效率。为此,我们引入MO-CAPO,一种新的多目标提示优化算法,同时优化性能和推理成本,利用预算分配实现成本高效的优化。我们进一步提出一个面向部署的成本目标,捕捉LLM推理的完整计算概况。我们评估了我们的方法在四个任务和三个LLM上的表现,并将其与基于NSGA-II的多目标方法和最先进的单目标提示优化器进行比较。结果表明,MO-CAPO一致地识别出强、稳健和多样的Pareto前沿近似,同时保持成本效率。它在12种情况中的8种情况下在噪声R2指标上优于NSGA-II基线,并且在显著较低的预算下常能达到竞争性性能。发现的解决方案集涵盖了被单目标优化器遗漏的多样化性能-成本权衡,但顶级性能候选者仍与单目标解决方案竞争。此外,我们进行了首次多目标机器学习实验的评估,考虑了泛化和鲁棒性通过噪声R2和近似间隙,使解决方案质量的评估更加现实。MO-CAPO使从业者能够从高效发现的多个提示中选择,这些提示提供不同的性能和成本权衡。

英文摘要

Large language models (LLMs) achieve strong performance across a wide range of tasks but are highly sensitive to prompt design, motivating the need for automatic prompt optimization. Existing methods predominantly focus on performance alone, ignoring competing objectives such as inference cost or latency. At the same time, existing work on multi-objective prompt optimization relies on off-the-shelf NSGA-II, ignoring optimization efficiency. As a remedy, we introduce MO-CAPO, a novel multi-objective prompt optimization algorithm that jointly optimizes performance and inference cost while leveraging budget allocation for cost-efficient optimization. We further propose a deployment-oriented cost objective that captures the full computational profile of LLM inference. We evaluate our approach across four tasks and three LLMs and compare it to an NSGA-II-based multi-objective method and state-of-the-art single-objective prompt optimizers. Results show that MO-CAPO consistently identifies strong, robust, and diverse Pareto front approximations while maintaining cost-efficiency. It outperforms the NSGA-II baseline on 8 out of 12 cases in terms of the noisy R2 metric and achieves competitive performances often already at a considerably lower budget. The discovered solution sets span diverse performance-cost trade-offs that are omitted by single-objective optimizers, yet the top-performance candidates remain competitive with single-objective solutions. Additionally, we conduct the first evaluation of multi-objective machine learning experiments that considers generalization and robustness through noisy R2 and approximation gap, enabling a more realistic assessment of solution quality. MO-CAPO enables practitioners to select from an efficiently discovered set of multiple prompts offering different trade-offs between performance and cost.

2605.18868 2026-05-20 cs.CR cs.AI cs.CV cs.LG 版本更新

DarkLLM: Learning Language-Driven Adversarial Attacks with Large Language Models

DarkLLM: 利用大语言模型学习语言驱动的对抗攻击

Ye Sun, Xin Wang, Jiaming Zhang, Yifeng Gao, Yixu Wang, Yifan Ding, Qixian Zhang, Henghui Ding, Xingjun Ma, Yu-Gang Jiang

发表机构 * Fudan University(复旦大学) Nanyang Technological University(南洋理工大学) Tongji University(同济大学)

AI总结 本文提出DarkLLM,一种基于大语言模型的对抗攻击框架,通过将自然语言攻击指令转换为潜在攻击向量,生成有效的对抗扰动,统一了多种攻击类型并实现了灵活可控的对抗生成。

Comments 23 pages, 13 figures

详情
AI中文摘要

尽管视觉和多模态基础模型在感知到复杂推理任务中至关重要,但它们仍然极易受到对抗攻击的影响。然而,传统对抗攻击通常局限于单一、预定义的目标,紧密耦合每个攻击到特定模型或任务,限制了其在现实场景中的可扩展性和灵活性。在本文中,我们提出了DarkLLM,一种新的攻击框架,该框架训练了一个大语言模型(LLM)将自然语言攻击指令转换为潜在攻击向量,然后解码为视觉对抗扰动。通过利用自然语言指令微调,DarkLLM不仅在一个框架内统一了目标攻击、非目标攻击、分割攻击和多模型攻击,还实现了灵活且可控的对抗生成,使每个指令都能生成一种扰动,以在异构模型上诱导期望的行为。通过在4个任务、13个数据集和15个模型上的广泛实验,我们证明DarkLLM仅需1B参数即可遵循攻击者的指令,生成对CLIP、SAM和前沿LLM高度有效的攻击,揭示了现代基础模型系统性的脆弱性。

英文摘要

While vision and multimodal foundation models underpin critical tasks from perception to complex reasoning, they remain highly vulnerable to adversarial attacks. However, traditional adversarial attacks are typically limited to single, predefined objectives, tightly coupling each attack to a specific model or task, which restricts their scalability and flexibility in real-world scenarios. In this work, we present DarkLLM, a novel attack framework that trains an LLM to translate natural-language attack instructions into latent attack vectors, which are then decoded into visual adversarial perturbations. By leveraging natural-language instruction tuning, DarkLLM not only unifies targeted, untargeted, segmentation, and multi-model attacks within a single framework, but also achieves flexible and controllable adversarial generation, enabling each instruction to produce a perturbation that induces desired behaviors across heterogeneous models. Through extensive experiments across 4 tasks, 13 datasets, and 15 models, we demonstrate that DarkLLM with only 1B parameters can follow attacker instructions and generate highly effective attacks against CLIP, SAM, and frontier LLMs, revealing a systemic vulnerability in modern foundation models.

2605.18867 2026-05-20 cs.LG cs.AI 版本更新

EVA-0: Test-Time Model Evolution with Only Two Forward Passes per Sample

EVA-0: 仅两次前向传递的测试时间模型演化

Guohao Chen, Shuaicheng Niu, Geng Li, Yunbei Zhang, Shilin Shan, Chunyan Miao, Jianfei Yang

发表机构 * Nanyang Technological University(南洋理工大学) Tulane University(路易斯安那州立大学)

AI总结 本文研究了在仅两次前向传递预算下测试时间模型演化的问题,提出EVA-0框架以解决零阶优化中的三个关键障碍,实现高效部署。

详情
AI中文摘要

测试时间模型演化为部署模型提供了一种改进 unlabeled 测试时间经验的有前景方法,但大多数现有方法依赖反向传播(BP),这导致了显著的内存开销,使它们难以在边缘设备、量化模型、专用加速器或黑盒模型上部署。在本文中,我们研究了在严格两次前向预算下测试时间模型演化,这一设置推动了适应向高度高效的现实部署发展。我们揭示了零阶测试时间优化中的三个关键障碍:对捷径解的易感性、不受控的权重漂移和无效的更新方向估计。为克服这些问题,我们提出了EVA-0,一个最小的零阶适应框架,其特点包括:1)保持损失尺度不变以防止捷径解;2)设计了锚点引导的优化策略以缓解权重漂移;3)使用样本级对称双侧扰动进行更新方向估计和推理。EVA-0不需要BP,并且在每个样本上仅需两次前向传递即可完成推理和适应。在ImageNet-C和ViT-Base上的结果表明,EVA-0优于基于BP的DeYO和无BP的FOA,并在FOA上实现了14倍的速度提升。代码将被发布。

英文摘要

Test-time model evolution offers a promising way for deployed models to improve from unlabeled test-time experience, yet most existing methods depend on backpropagation (BP), which incurs substantial memory overhead and makes them difficult to deploy on edge devices, quantized models, specialized accelerators, or black-box models. In this work, we study test-time model evolution under a strict two-forward budget, a setting that pushes adaptation toward highly efficient real-world deployment. We reveal three key obstacles in zeroth-order test-time optimization: susceptibility to shortcut solutions, uncontrolled weight drift, and ineffective update direction estimation. To overcome them, we propose EVA-0, a minimal zeroth-order adaptation framework that: 1) keeps the loss scale-invariant to prevent shortcut solutions; 2) devises an anchor-guided optimization strategy to alleviate weight drift; 3) uses sample-wise symmetric two-sided perturbation for update direction estimation and inference. EVA-0 requires no BP and performs both inference and adaptation within only two forward passes per sample. Results on ImageNet-C & ViT-Base show that EVA-0 outperforms both BP-based DeYO and BP-free FOA, while achieving a 14x speed-up over FOA. Code will be released.

2605.18865 2026-05-20 cs.LG cs.AI 版本更新

From Sparsity to Simplicity: Enabling Simpler Sequential Replacements via Sparse Attention Distillation

从稀疏到简单:通过稀疏注意力蒸馏实现更简单的顺序替换

Yuxin Ren, Maxwell D Collins, Miao Hu, Huanrui Yang

发表机构 * University of Arizona(亚利桑那大学) TetraMem, Inc.(TetraMem公司)

AI总结 本文提出通过稀疏注意力蒸馏实现更简单的顺序替换,通过分析transformer层中的稀疏模式,发现可以将复杂的token依赖分解为不同复杂度的序列到序列映射,并用更简单的顺序模块替代部分层功能,从而减少参数量和延迟。

详情
AI中文摘要

自注意力机制是大规模transformer预训练的核心基础,但其二次token交互成本使得推理过程昂贵。用更简单的顺序模块替代注意力具有吸引力,但直接替换往往导致信息丢失,尤其是在大规模情况下。本文通过稀疏性的视角重新审视注意力替换。基于对transformer各层中稀疏模式的观察,我们提出预训练transformer将复杂的token依赖分解为多种复杂度的序列到序列映射,其中某些层的功能可以被近似并用更简单的顺序模块替代而不丢失信息。我们通过插拔式层间蒸馏框架验证这一前提,以近似和替代预训练视觉transformer模型中的注意力功能。在固定训练预算下,受控组的替换结果显示:替换稀疏注意力的层比替换密集注意力的层导致的准确率下降更小。我们进一步通过AViT风格的token保留对预训练的ViT施加显式的注意力稀疏性,并进行稀疏性引导的顺序替换模型蒸馏,其中我们发现增加教师模型的稀疏性会一致减少学生模型与教师模型之间的差距。所提出的方法通过注意力稀疏性的指导实现了更小的参数量和延迟的高效注意力替换。

英文摘要

Self-attention serves as the core foundation of large-scale transformer pretraining, but its quadratic token interaction cost makes inference expensive. Replacing attention with simpler sequential modules is appealing, yet naive substitution is often lossy, especially at larger scales. This paper revisits attention replacement through the lens of sparsity. Based on the observation of diverse sparsity patterns across transformer layers, we posit that pretrained transformers decompose the complex token dependency across tokens into various sequence-to-sequence mappings of diverse complexities, where some layer functionalities can be approximated and replaced with much simpler sequential modules without loss. We evaluate this premise using a plug-and-play layer-wise distillation framework to approximate and replace attention functionalities in pretrained vision transformer models. Controlled group-wise replacements under a fixed training budget reveal a clear pattern: substituting layers with sparser attention incurs substantially smaller accuracy drops than replacing denser ones. We further impose explicit attention sparsity on the pretrained ViT via AViT-style token retention and perform sparsity-guided distillation for sequential replacing models, where we see increasing teacher sparsity consistently reduces the student-teacher gap. The proposed method achieves efficient attention replacement for reduced parameter size and latency through the guidance of attention sparsity.

2605.18864 2026-05-20 cs.LG cs.AI cs.CL 版本更新

SAGE: Shaping Anchors for Guided Exploration in RLVR of LLMs

SAGE: 通过塑造锚点引导LLMs的RLVR探索

Chanuk Lee, Minki Kang, Sung Ju Hwang

发表机构 * KAIST(韩国科学技术院)

AI总结 本文提出SAGE框架,通过重塑反KL锚分布来实现可控的经验支持扩展,从而在数学推理基准中提升pass@1和pass@k的表现。

Comments Preprint

详情
AI中文摘要

近期研究发现,可验证奖励的强化学习(RLVR)能够可靠地提高推理任务的pass@1指标,但往往在pass@k上未能取得类似提升,引发了关于RLVR是否真正使大语言模型获得新推理能力还是仅提高基础模型中现有推理模式采样效率的问题。先前分析大多支持后者观点,认为这种限制源于标准RLVR目标的结构特性,导致探索压力不足。在本文中,我们提出一个核心结构约束源于反KL正则化,该正则化稳定了训练但本质上将策略锚定于参考分布,从而抑制了替代推理模式的出现。然而,我们显示,去除KL项或用前向KL替代并不能提供满意的解决方案,因为两者都会通过诱导奖励黑客或将概率质量分配给非目标区域而破坏效率-覆盖权衡。为了解决这一矛盾,我们提出了SAGE,一个原理性的框架,通过引导函数q(x,y)重塑反KL锚分布本身,实现可控的经验支持扩展,从而在挑战性的数学推理基准中获得一致的pass@1和pass@k提升。我们的代码可在https://github.com/tally0818/SAGE上获得。

英文摘要

Recent studies observe that reinforcement learning with verifiable rewards (RLVR) reliably improves pass@1 on reasoning tasks, yet often fails to yield comparable gains in pass@k, raising the question of whether RLVR genuinely enables large language models to acquire novel reasoning abilities or merely enhances the efficiency of sampling reasoning modes already present in the base model. Prior analyses largely support the latter view, attributing this limitation to structural properties of standard RLVR objectives that result in insufficient exploration pressure. In this work, we argue that a central structural constraint arises from reverse-KL regularization, which stabilizes training but inherently anchors the policy to the reference distribution, thereby suppressing the emergence of alternative reasoning modes. However, we show that neither removing the KL term nor replacing it with forward-KL provides a satisfactory solution, as both disrupt the efficiency-coverage trade-off by either inducing reward hacking or allocating probability mass to off-target regions. To resolve this tension, we propose SAGE, a principled framework that enables controllable empirical support expansion by reshaping the reverse-KL anchor distribution itself through a guide function q(x,y), achieving consistent improvements in both pass@1 and pass@k across challenging mathematical reasoning benchmarks. Our code is available at https://github.com/tally0818/SAGE.

2605.18862 2026-05-20 cs.LG cs.AI cs.CR 版本更新

Towards Family-Grouped Hierarchical Federated Learning on Sub-5KB Models: A Feasibility Study of Privacy-Preserving ECG Monitoring for Ultra-Resource-Constrained Wearables

面向子5KB模型的家庭分组分层联邦学习:隐私保护ECG监测在超低资源约束可穿戴设备上的可行性研究

Hangyu Wu

发表机构 * Shenzhen Coddie Technology co.,ltd(深圳科迪科技有限公司)

AI总结 本文提出家庭分组分层联邦学习(Family-FL)和轻量级Tiny CNN-LSTM架构,通过模拟评估在超低资源约束微控制器上实现隐私保护的联邦学习的可行性,展示了在MIT-BIH数据库上达到91.9%的准确率和76.7%的通信量减少。

Comments Supported by Shenzhen Coddie Technology Co., Ltd. This is a preprint and has not been peer-reviewed

详情
AI中文摘要

心血管疾病仍是全球导致死亡的主要原因,通过可穿戴设备持续ECG监测早期检测心律失常可以预防危及生命事件。联邦学习(FL)通过在设备上保留原始ECG数据实现隐私保护的协同训练,但标准FL导致通信开销过大,标准深度学习模型无法在超低功耗微控制器上运行。我们提出家庭分组分层联邦学习(Family-FL),一种三级架构,利用家庭作为隐私边界在家庭内聚合后再进行全局同步。我们进一步设计了一种硬件受限的Tiny CNN-LSTM架构,仅包含669个参数,INT8量化后仅占用4.65KB Flash和2.95KB RAM,满足STC32G12K128类微控制器的约束。在MIT-BIH心律失常数据库上的实验(5次独立运行的平均值)表明,Family-FL相比FedAvg减少了76.7%的通信量,同时保持了可比的准确性。Family-FL-Tiny在91.9±1.2%的准确率和宏F1为0.483±0.031的情况下,将总通信量减少到FedAvg的0.31%。该模型实现了可靠的室性心律失常检测(每类F1=0.80),这是家庭初步筛查中最临床关键的异常情况。这些结果通过基于模拟的评估证明了通过隐私保护联邦学习在超低资源约束微控制器上的技术可行性。我们诚实地讨论了局限性:无硬件部署、单数据集验证(MIT-BIH,47名受试者)、罕见类敏感性降低以及无正式差分隐私保证。

英文摘要

Cardiovascular disease remains the leading cause of death worldwide, and early detection of arrhythmias through continuous ECG monitoring on wearable devices can prevent life-threatening events. Federated Learning (FL) enables privacy-preserving collaborative training by keeping raw ECG data on device, yet standard FL incurs prohibitive communication overhead and standard deep learning models cannot fit on ultra-low-power microcontrollers. We propose Family-Grouped Hierarchical Federated Learning (Family-FL), a three-tier architecture that uses the family as a natural privacy boundary for intra-family aggregation before global synchronization. We further design a hardware-constrained Tiny CNN-LSTM architecture with only 669 parameters, INT8-quantized to occupy merely 4.65KB Flash and 2.95KB RAM, meeting the constraints of STC32G12K128-class microcontrollers. Experiments on the MIT-BIH Arrhythmia Database (mean of 5 independent runs with different seeds) demonstrate that Family-FL reduces communication volume by 76.7% compared to FedAvg while maintaining comparable accuracy. Family-FL-Tiny achieves 91.9 +/- 1.2% accuracy with macro-F1 of 0.483 +/- 0.031, reducing total communication to 0.31% of FedAvg. The model achieves reliable ventricular arrhythmia detection (per-class F1 = 0.80), the most clinically critical abnormality for home-based preliminary screening. These results demonstrate the technical feasibility of privacy-preserving federated learning on ultra-resource-constrained microcontrollers through simulation-based evaluation. We honestly discuss limitations: no hardware deployment, single-dataset validation (MIT-BIH, 47 subjects), reduced rare-class sensitivity, and absence of formal differential privacy guarantees.

2605.18858 2026-05-20 cs.LG cs.AI cs.GT stat.ML 版本更新

When Individually Calibrated Models Become Collectively Miscalibrated

当个体校准的模型成为集体不校准的

Zhaohui Wang

发表机构 * USC Viterbi School of Engineering(南加州大学维特比工程学院)

AI总结 研究探讨了在多智能体环境中,即使每个模型都经过个体校准,聚合预测仍可能不校准的现象,提出通过VCG聚合方法解决这一问题,实现激励相容和近最优性能。

Comments 42 pages, 1 main figure, multiple tables. Accepted at ProbML 2026

详情
AI中文摘要

概率预测系统常常将多个模型的概率估计聚合为单一决策。一个常见假设是,如果每个模型都经过个体校准,聚合预测也将是良好的校准。我们展示了在多智能体设置中,这一假设不成立:当预测者战略性地相互作用时,即使没有刻意协调,个体校准的预测者也可能集体上不校准。这种现象自然出现在智能体在重叠数据上独立训练时。我们证明,在基于Brier分数的聚合中,当信念正相关时,每个智能体的个体最优报告系统地低估了正类概率,导致价格of anarchy大于一,只要协方差(b_i, b_j) > 0。在典型设置(n=5个智能体,成对相关性=0.5,基础率=0.3)中,经实测的PoA在假阴性率上达到7.25倍。相比之下,基于VCG的聚合通过奖励边际贡献对齐激励,实现主导策略激励相容性和近最优性能。在三个现实世界数据集(NSL-KDD、UNSW-NB15、信用卡欺诈)上的实验显示,VCG在保持可比准确性的同时表现出强鲁棒性。它在数据稀疏和对抗性设置中表现尤其出色,自适应加权进一步在分布偏移下提升了性能。

英文摘要

Probabilistic prediction systems often aggregate probability estimates from multiple models into a single decision. A common assumption is that if each model is individually calibrated, the aggregate prediction will also be well calibrated. We show that this assumption fails in multi-agent settings: individually calibrated predictors can become collectively miscalibrated when their predictions interact strategically, in the game-theoretic sense of Brier-optimal local response, even without deliberate coordination. This phenomenon arises naturally when agents are independently trained on overlapping data. We prove that under Brier-score-based aggregation with positively correlated beliefs, each agent's individually optimal report systematically underestimates the positive-class probability, yielding a Price of Anarchy greater than one whenever Cov(b_i, b_j) > 0. In a canonical setting (n = 5 agents, pairwise correlation = 0.5, base rate = 0.3), the empirically measured PoA in false-negative rate reaches 7.25x. In contrast, VCG-based aggregation aligns incentives by rewarding marginal contribution, achieving dominant-strategy incentive compatibility and near-optimal performance. Experiments on three real-world datasets (NSL-KDD, UNSW-NB15, Credit Card Fraud) show that VCG provides strong robustness while maintaining comparable accuracy. It performs particularly well in data-sparse and adversarial settings, and adaptive weighting further improves performance under distribution shift.

2605.18857 2026-05-20 cs.IR cs.AI cs.LG 版本更新

The 99% Success Paradox: When Near-Perfect Retrieval Equals Random Selection

99%成功悖论:当近完美检索等于随机选择

Vyzantinos Repantis, Harshvardhan Singh, Tony Joseph, Cien Zhang, Akash Vishwakarma, Svetlana Karslioglu, Michael Wyatt Thot, Ameya Gawde

发表机构 * Meta Platforms Inc.(Meta平台公司)

AI总结 该研究引入了Bits-over-Random(BoR)指标,揭示了高成功率可能掩盖随机水平性能的现象,指出在大规模数据集上,即使检索结果覆盖率达到99%,其选择性仍可能接近零,从而表明需要重新考虑检索深度和传统指标的报告方式。

Comments 12 pages, 2 figures, 7 tables. Accepted at ICLR 2026 Blog Track, https://iclr-blogposts.github.io/2026/blog/2026/bits-over-random/

详情
Journal ref
ICLR Blog Track 2026, https://iclr.cc/virtual/2026/poster/10012083
AI中文摘要

对于信息检索(IR)历史上的大部分时间,搜索结果都是为人类消费者设计的,他们可以自行扫描、过滤和丢弃不相关信息。这塑造了检索系统以寻找并排序更多相关文档为目标,而不是保持结果简洁和干净,因为人类是最终的过滤器。然而,大语言模型(LLMs)改变了这一现状,因为它们缺乏这种过滤能力。为了解决这一问题,我们引入了Bits-over-Random(BoR),这是一种修正了机会的检索选择性度量,揭示了高成功率可能掩盖随机水平性能的情况。我们测量选择性为BoR = log₂(P_obs / P_rand),其中P_rand是所选成功规则(此处为覆盖:top-K中≥1个相关文档)的超几何基线。在20 Newsgroups数据集上,BM25和SPLADE均在K=100时报告>99%的成功率(覆盖),但BoR≈0,表明在该深度下的选择性处于随机水平。当预期覆盖比(K·R̄_q / N)超过3-5时,基线主导并导致选择性崩溃。下游检索增强生成(RAG)评估证实了这一模式:LLM准确性在K=100时可能会显著下降,这与近零BoR上限一致。相比之下,BoR在BEIR/SciFact和MS MARCO上保持正数(其中41个系统在理论上限附近聚集,尽管有13点的召回差距),证实了在稀疏和大规模设置中的基线预测。我们进一步表明,崩溃边界适用于LLM代理工具选择,其中小目录大小导致即使有完美选择器,选择性也会消失。这些发现表明,应将BoR与传统指标一起报告,并在额外检索提供 negligible 选择性增益但增加计算成本时重新考虑深度选择。

英文摘要

For most of the history of information retrieval (IR), search results were designed for human consumers who could scan, filter, and discard irrelevant information on their own. This shaped retrieval systems to optimize for finding and ranking more relevant documents, but not keeping results clean and minimal, as the human was the final filter. However, LLMs have changed that by lacking this filtering ability. To address this, we introduce Bits-over-Random (BoR), a chance-corrected measure of retrieval selectivity that reveals when high success rates mask random-level performance. We measure selectivity as $BoR = \log_{2}\left(\frac{\mathrm{P}_{obs}}{\mathrm{P}_{rand}}\right)$, where $\mathrm{P}_{rand}$ is the hypergeometric baseline for the chosen success rule (here, coverage: $ \geq1 $ relevant in top-$K$). On the 20 Newsgroups dataset, BM25 and SPLADE both report $>99$% success at $K=100$ (coverage), yet $BoR \approx 0$, indicating random-level selectivity at that depth. When the expected coverage ratio $\left(\frac{K \cdot \bar{R}_{q}}{N}\right)$ exceeds 3-5, the baseline dominates and selectivity collapses. Downstream retrieval-augmented generation (RAG) evaluation confirms this pattern: LLM accuracy can degrade substantially at $K=100$, consistent with the near-zero BoR ceiling. In contrast, BoR remains positive on BEIR/SciFact and on MS MARCO (where 41 systems cluster within 0.2 bits of the theoretical ceiling despite a 13-point recall gap), confirming baseline predictions across sparse and large-scale settings. We further show that the collapse boundary applies to LLM agent tool selection, where small catalog sizes cause selectivity to vanish even with perfect selectors. These findings suggest reporting BoR alongside traditional metrics and reconsidering depth choices when additional retrieval provides negligible selectivity gains while inflating computational costs.

2605.18855 2026-05-20 cs.LG cs.CV 版本更新

Delta Attention Residuals

Delta Attention Residuals

Cheng Luo, Zefan Cai, Junjie Hu

发表机构 * Independent Researcher(独立研究者) University of Wisconsin–Madison(威斯康星大学麦迪逊分校)

AI总结 本文提出Delta Attention Residuals,通过在残差连接中引入对每个子层引入的变化(delta)进行注意力机制,解决了传统注意力残差中因累积隐藏状态冗余导致的路由崩溃问题,从而提升模型跨层选择信息的能力。

详情
AI中文摘要

Attention Residuals将标准加性残差连接替换为在前一层输出上学习的softmax注意力,实现了选择性的跨层路由。然而,标准Attention Residuals仍然在累积的隐藏状态上进行注意力计算,这些状态高度冗余。我们发现这种冗余导致在更深的层中出现路由崩溃:注意力权重变得低对比度且接近均匀(最大权重≈0.2),限制了模型在前一层中选择信息性状态的能力。这提出了一个关键但尚未深入研究的设计问题:在Attention Residuals中应路由何种层间表示?为回答这个问题,我们提出了Delta Attention Residuals,其在delta(每个子层引入的变化(v_i = h_{i+1} - h_i))上进行注意力计算,而非累积状态。Delta表示在结构上具有多样性,产生更高对比度的注意力分布(最大权重≈0.6),从而在层间实现更选择性和有效的路由。这一原则适用于单个子层和块粒度。在所有测试的规模(220M-7.6B)中,Delta Attention Residuals始终优于标准残差和Attention Residuals,验证困惑度提升1.7-8.2%。Delta Attention Residuals还允许通过标准微调将预训练检查点转换为Delta Attention Residuals。代码可在https://github.com/wdlctc/delta-attention-residuals-code获得。

英文摘要

Attention Residuals replace standard additive residual connections with learned softmax attention over previous layer outputs, enabling selective cross-layer routing. However, standard Attention Residuals still attend over cumulative hidden states in previous layers, which are highly redundant. We show that this redundancy leads to routing collapse in deeper layers: attention weights become low-contrast and closer to uniform (max weight ${\approx}$0.2), limiting the model's ability to select informative states in previous layers. This raises a key but underexplored design question: what layer-wise representations should be routed in Attention Residuals? To answer this question, we propose Delta Attention Residuals, which attend over deltas -- the change introduced by each sublayer ($\mathbf{v}_i = \mathbf{h}_{i+1} - \mathbf{h}_i$) -- instead of cumulative states. Delta representations are structurally diverse and yield higher-contrast attention distributions (max weight ${\approx}$0.6), enabling more selective and effective routing across layers. This principle applies at both per-sublayer and block granularity. Across all tested scales (220M--7.6B), Delta Attention Residuals consistently outperform both standard residuals and Attention Residuals, with 1.7--8.2\% validation perplexity gains. Delta Attention Residuals also enables converting pretrained checkpoints into Delta Attention Residuals via standard fine-tuning. Code is available at https://github.com/wdlctc/delta-attention-residuals-code.

2605.18854 2026-05-20 cs.LG 版本更新

Evaluating Memory Condensation Strategies for Coding Agents in Data-Driven Scientific Discovery

评估用于数据驱动科学发现的编码代理的记忆压缩策略

Renuka Chintalapati, Sid Raskar, Anurag Acharya, Jared Willard, Patrick Emami, Sameera Horawalavithana

发表机构 * Pacific Northwest National Laboratory(太平洋西北国家实验室) National Laboratory of the Rockies(落基山国家实验室)

AI总结 本文评估了八种记忆压缩策略在数据驱动科学发现任务中的表现,发现没有压缩器显著提升假设质量,但基于LLM的压缩器会增加24-94%的token成本,而屏蔽工具调用输出可实现8.6%的净节省,且最佳压缩器因科学领域和任务长度而异。

详情
AI中文摘要

编码代理在长时间任务中积累大量上下文,但固定的上下文窗口迫使从业者在截断和任务失败之间做出选择。尽管已提出许多记忆压缩策略,从简单的滑动窗口到LLM生成的摘要,但缺乏系统性的比较来指导策略选择,尤其是在科学发现任务中。我们使用GPT-4o对六十个DiscoveryBench任务(涵盖六个科学领域,总计480次评估)评估了八种记忆压缩策略。我们发现,没有压缩器显著改变假设质量,而基于LLM的压缩器会增加24-94%的token成本,屏蔽工具调用输出可实现8.6%的净节省。我们还观察到,数据驱动科学发现的最佳压缩器因科学领域和任务长度而异。

英文摘要

Coding agents accumulate extensive context during long-running tasks, yet fixed context windows force practitioners to choose between truncation and task failure. While numerous memory condensation strategies have been proposed, from simple sliding windows to LLM-generated summaries, no systematic comparison exists to guide strategy selection, especially in scientific discovery tasks. We evaluate eight memory condensation strategies using GPT-4o on sixty DiscoveryBench tasks spanning six scientific domains (480 total evaluations). We find that no condenser significantly alters hypothesis quality, while LLM-based condensers increase token costs by 24-94 percent, and masking tool-call outputs achieves an 8.6 percent net savings. We also observe that the optimal condenser for data-driven scientific discovery varies by scientific domain and task length.

2605.18853 2026-05-20 cs.LG cs.CV cs.DC 版本更新

INAR-VL: Input-Aware Routing for Edge-Cloud Vision-Language Inference

INAR-VL:面向边缘-云视觉-语言推断的输入感知路由

Ahmed Šabanović, Paul Joe Maliakel, Ivona Brandić

发表机构 * TU Wien(维也纳技术大学)

AI总结 本文提出INAR-VL,一种轻量级的边缘-云路由系统,用于多模态推断的两级部署。该系统通过轻量级的图像和文本复杂度信号指导路由和模型选择,在本地执行简单查询,将复杂查询卸载到云端,从而在延迟、能耗和准确性之间取得平衡。

Comments 8 pages, 3 figures

详情
AI中文摘要

边缘部署的视觉-语言模型(VLMs)面临延迟与准确性的权衡:云端执行提供高质量预测但会带来通信延迟和能耗,而仅边缘执行则速度更快但准确性较低,因为模型容量有限。这种权衡进一步受到图像质量和推理复杂度异质性的影响,使静态部署效果不佳。我们提出了INAR-VL,一种轻量级的边缘-云路由系统,用于两级部署中的多模态推断。INAR-VL在边缘和云端维护互补的VLMs,并利用轻量级的图像和文本复杂度信号指导路由和模型选择,执行简单查询本地化,当有利时将复杂查询卸载到云端。在视觉问答任务上的评估表明,INAR-VL将36%的请求执行在边缘,延迟降低24%,能耗降低26%,并保持97%的云端准确性。

英文摘要

Edge deployment of Vision-Language Models (VLMs) faces a tradeoff between latency and accuracy: cloud execution provides high-quality predictions but incurs communication delay and energy cost, while edge-only execution is faster but less accurate due to limited model capacity. This trade-off is further complicated by heterogeneity in image quality and reasoning complexity, making static placement suboptimal. We present INAR-VL, a lightweight edge-cloud routing system for multimodal inference in a two-tier deployment. INAR-VL maintains complementary VLMs across edge and cloud and uses lightweight image and text complexity signals to guide routing and model selection, executing simple queries locally while offloading complex ones when beneficial. Evaluation on visual question answering shows that INAR-VL executes 36% of requests on the edge, reduces latency by 24%, lowers energy by 26%, and preserves 97% of cloud-level accuracy.

2605.18852 2026-05-20 cs.LG cs.AI cs.CL 版本更新

Robust Checkpoint Selection for Multimodal LLMs via Agentic Evaluation and Stability-Aware Ranking

通过代理评估和稳定性感知排名实现多模态大语言模型的鲁棒检查点选择

Qinwu Xu, Zhuoheng Li, Jessie Salas

发表机构 * Meta AI

AI总结 本文提出了一种多阶段框架,结合了精心挑选的现实世界数据、结构化的LLM判断和多阶段排名协议,以解决多模态大语言模型检查点选择中的鲁棒决策问题,强调数据质量(特别是OCR可读性)对评估有效性的重要性。

详情
AI中文摘要

多模态大语言模型(MLLMs)的检查点选择在性能差异微小且评估信号易受噪声影响时面临重大挑战。现有方法依赖静态基准或逐点评分,经常与实际应用场景不一致,并缺乏对不确定性的鲁棒估计,特别是在OCR密集场景中。在本文中,我们将检查点选择建模为在评估不确定性下的稳健决策问题。我们提出了一种多阶段框架,整合了精心挑选的现实世界数据、结构化的LLM判断和多阶段排名协议。评估系统通过逐点过滤、列表排名和成对比较进行逐步细化。为了提高可靠性,我们引入基于子采样的置信度估计和基于百分位数的评分公式,以捕捉分布特征并惩罚尾部失败。此外,我们证明数据质量,特别是OCR可读性,是评估有效性的重要决定因素。

英文摘要

Checkpoint selection for multimodal large language models (MLLMs) presents significant challenges when performance differentials are marginal and evaluation signals are prone to noise. Existing methodologies rely heavily on static benchmarks or pointwise scoring, which frequently misalign with in-the-wild usage and lack robust uncertainty estimation, particularly in OCR-heavy scenarios. In this work, we formulate checkpoint selection as a robust decision problem under evaluation uncertainty. We propose a multi-stage framework that integrates curated real-world data, structured LLM-based judgment, and multi-stage ranking protocols. The evaluation system orchestrates progressive refinement via pointwise filtering, listwise ranking, and pairwise comparison. To enhance reliability, we introduce subsampling-based confidence estimation and a percentile-based scoring formulation that captures distributional characteristics while penalizing tail failures. Furthermore, we demonstrate that data quality, specifically OCR readability, is a critical determinant of evaluation validity.

2605.18851 2026-05-20 cs.LG 版本更新

STRIDE: Learnable Stepwise Language Feedback for LLM Reasoning

STRIDE: 用于LLM推理的可学习分步语言反馈

Junjie Zhang, Guozheng Ma, Shunyu Liu, Zetian Hu, Yongcheng Jing, Ting-En Lin, Yongbin Li, Dacheng Tao

发表机构 * Generative AI Lab, College of Computing and Data Science, Nanyang Technological University, Singapore(生成式人工智能实验室,计算与数据科学学院,南洋理工大学,新加坡) Tongyi Lab, Alibaba Group(通义实验室,阿里巴巴集团)

AI总结 本文提出STRIDE框架,通过可学习的分步语言反馈提升LLM推理能力,解决了传统方法在标注成本高、信息瓶颈等问题,实验显示其在多种推理基准上表现优异。

详情
AI中文摘要

最近强化学习(RL)的进步突显了其在激励大型语言模型(LLM)推理能力的潜力。然而,现有分步级方法面临标注成本高、领域覆盖有限的问题,而标量评分进一步引入信息瓶颈,无法提供足够的语义带宽来改进中间决策。替代的语言批评方法依赖于冻结或外部批评者,虽然提供更丰富的文本反馈,但缺乏持续政策改进所需的可扩展性。在本工作中,我们提出语言驱动的分步轨迹重定向(STRIDE),一种新颖的训练框架,将过程监督从标量奖励转移到可学习的分步语言反馈。具体来说,我们仅使用基于结果的奖励共同训练生成器和生成验证器,消除外部标注,通过联合对齐的验证器训练实现持续的政策改进。验证器的分步语言批评明确本地化并解释失败,使生成器能够在中间步骤将推理轨迹转向替代决策。轨迹重定向设计保证了即使在噪声或次优验证器反馈下也能实现无害的政策改进。在多样化的推理基准实验中,STRIDE显著优于最先进的基线,同时在零次通过率问题上取得突破,其中标量方法在消融研究中无法产生学习信号,证明了可学习分步语言反馈在增强LLM推理能力方面的有效性。

英文摘要

Recent advances in Reinforcement Learning (RL) have underscored its potential for incentivizing reasoning capabilities of Large Language Models (LLMs). However, existing step-level efforts suffer from costly annotations that limit domain coverage, while scalar scores further impose an information bottleneck, offering insufficient semantic bandwidth to improve intermediate decisions. Alternative language-critique approaches, which rely on frozen or external critics, provide richer textual feedback but lack the scalability needed for sustained policy improvement. In this work, we propose language-driven stepwise trajectory redirection, termed as STRIDE, a novel training framework that shifts process supervision from scalar rewards to learnable stepwise language feedback. Specifically, we co-train a generator and a generative verifier using only outcome-based rewards, eliminating external annotations, while delivering sustained policy improvement through jointly aligned verifier training. The verifier's stepwise language critiques explicitly localize and explain failures, enabling the generator to redirect reasoning trajectories at intermediate steps toward alternative decisions. The trajectory redirection design guarantees harmless policy improvement, even under noisy or suboptimal verifier feedback. Experiments on diverse reasoning benchmarks show that STRIDE significantly outperforms state-of-the-art baselines, as well as achieving breakthroughs on zero-pass-rate problems where scalar methods yield no learning signal in our ablation studies, demonstrating the effectiveness of learnable stepwise language feedback for enhancing LLM reasoning.

2605.18849 2026-05-20 cs.LG cs.AI 版本更新

INSIGHTS: Demonstration-Based Summaries of Time Series Predictors

INSIGHTS: 时间序列预测器的基于演示的摘要

Bar Eini Porat, Rom Gutman, Uri Shalit, Ofra Amir

发表机构 * Technion Israel Institute of Technology(技术学院以色列理工学院) Tel-Aviv University(特拉维夫大学)

AI总结 本文提出INSIGHTS方法,一种模型无关、以用户为中心的方法,用于提供时间序列模型的全局解释。该方法通过生成样本摘要,平衡时间序列样本的重要性与多样性,为用户提供全面的模型行为概述。

详情
AI中文摘要

可解释性方法发展迅速,但时间序列模型的全局解释仍不完善,大多数方法集中在局部实例层面的解释上。我们介绍了INSIGHTS,一种模型无关、以用户为中心的方法,用于提供时间序列模型的全局解释。我们的方法在设计上优先考虑简单性、效率和透明性,确保利益相关者能够轻松采用其输出。尽管当前方法专注于局部解释,INSIGHTS生成样本摘要,提供模型行为的全面概述。它通过利用效用函数平衡时间序列样本的重要性与多样性,捕捉领域特定的时间序列行为特征,如超过领域规范。我们通过实验、访谈和用户研究评估INSIGHTS。我们的结果表明,INSIGHTS能够构建全面、多样的时间序列子集,生成易于个体评估的摘要。它受到领域专家的青睐,因其能够提供模型行为的稳定理解以及识别的样本质量。此外,接受INSIGHTS摘要的用户研究参与者表现出对模型整体行为的更深入理解。

英文摘要

Explainability methods have progressed rapidly, but global explanations for time-series models remain underdeveloped, with most approaches focusing on local, instance-level attributions. We introduce INSIGHTS, a model-agnostic, user-centric approach for providing global explanations of time series models. Our approach prioritizes simplicity, efficiency, and transparency in its design, ensuring that stakeholders can readily adopt its outputs. While current methods focus on local explanations, INSIGHTS generates sample summaries that offer a comprehensive overview of model behavior. It balances the importance and diversity of time series samples to create informative subsets using utility functions that capture domain-specific aspects of time series behavior, such as exceeding domain norms. We evaluate INSIGHTS through experiments, interviews, and a user study. Our results indicate INSIGHTS effectively constructs comprehensive, diverse time series subsets, producing summaries manageable for individual evaluation. It is preferred by domain experts for its ability to provide a stable understanding of model behavior and the quality of the samples identified. Moreover, user study participants presented with INSIGHTS-based summaries exhibit an enhanced understanding of the model's overall behavior.

2605.18847 2026-05-20 cs.LG cs.AI 版本更新

Transformers Linearly Represent Highly Structured World Models

Transformer 通过线性方式表示高度结构化的世界模型

Roman Kniazev, Nathanaël Fijalkow

发表机构 * LaBRI, CNRS University of Bordeaux(LaBRI、CNRS 波尔多大学)

AI总结 研究探讨了Transformer在训练过程中是否能构建任务的内部模型,并发现其内部表示结构与领域结构相匹配,通过Sudoku求解轨迹训练的Transformer展示了其内部计算机制和稀疏可解释的决策电路。

详情
AI中文摘要

当Transformer被训练于顺序推理轨迹时,它们是否会构建底层任务的内部模型?如果是的话,这些内部表示的结构是否与领域结构相匹配?我们训练了一个8层的Transformer模型来解决数独问题,并对其内部计算进行了机理分析。我们得出两个结论。第一,该模型构建了一个子结构世界模型:它不按人分析员所期望的那样逐个单元格表示棋盘状态,而是围绕数独约束所作用的行、列和盒子来组织信息。第二,我们识别出一个裸单电路:在最终的MLP层中,一组专用神经元,每个神经元单独检测特定单元格中恰好只剩一个可能的数字,并可靠地促进该数字。这些发现表明,涌现世界模型的几何结构由领域约束代数决定,而非其表面表现,且所得到的决策电路是稀疏的、单义的且完全可解释的。更广泛地说,这些发现展示了机理可解释性工具能够恢复Transformer如何解决组合推理任务的端到端算法账户。

英文摘要

Do transformers, when trained on sequential reasoning traces, build internal models of the underlying task? And if so, does the structure of those internal representations mirror the structure of the domain? We train an 8-layer transformer on Sudoku solving traces and perform a mechanistic analysis of its internal computation. We establish two results. First, the model builds a substructure world model: it does not represent the board state cell by cell, as a human analyst would expect, but organizes information around the rows, columns, and boxes that Sudoku's constraints act on. Second, we identify a naked-single circuit: a small set of dedicated neurons in the final MLP layer, each individually detecting when exactly one digit remains possible for a specific cell, and reliably promoting that digit. These findings show that the geometry of an emergent world model is shaped by the constraint algebra of the domain, not its surface presentation, and that the resulting decision circuit is sparse, monosemantic, and fully interpretable. More broadly, they demonstrate that mechanistic interpretability tools can recover an end-to-end algorithmic account of how a transformer solves a combinatorial reasoning task.

2605.18846 2026-05-20 cs.LG cs.AI cs.IT math.IT 版本更新

Lost and Found in Translation: Variational Diagnostics for Neural Codebook Channels

译失与找回:变分诊断用于神经码本信道

Yusuke Hayashi

发表机构 * Artificial Life Institute(人工生命研究所) AI Alignment Network(人工智能对齐网络) Humanity Brain(人类大脑)

AI总结 该研究提出了一种变分诊断方法,用于评估神经码本信道中解码器对编码器码本的读取情况,解决了传统VAE诊断无法判断解码器是否正确读取编码器码本的问题。

Comments 9 pages, 2 figures

详情
AI中文摘要

经典通信系统不仅因随机噪声失效,还当发射端和接收端使用不兼容的操作码本时也会失效。变分自编码器(VAEs)联合训练编码器$ q_ϕ $和解码器$ p_θ $,并将其潜在空间视为离散码用于聚类、条件生成和机制可解释性。然而,标准VAE诊断——ELBO、主动单元、互信息和码本直方图——只能验证该码是否被使用,而不能验证解码器是否在编码器的码下读取每个潜在变量。我们通过神经码本信道$ K_{e o d}(j\mid i) $,一种耦合的编码器-解码器诊断方法,填补了这一差距。该信道的非对角线质量由架构无关的伯努利-KL证书$ d_{\mathrm{bin}}(1-\mathcal{A} \,\|\, arη_p) \le arΔ $控制,该证书是经典KL链式法则在离散化到编码器-解码器不一致事件下的操作专门化,补充了构造性的边缘不可能性结果:没有任何组合的边缘直方图、熵、主动码计数或互信息决定$ K_{e o d} $。我们对四个sklearn数据集(有限网格精确、5/5种子、20/20对满足边界)、二维模型(在$ 2.71 imes $观测到的不一致处非空虚)、MNIST在重要性采样控制下以及一个VQ-VAE达到预测极限$ \hat{\mathcal{A}}=1.000 $进行了证书审计。该包$ (K_{e o d}, \mathcal{A}, R_{\mathrm{eff}}, R, \mathrm{AU}) $是一个审计准备的报告单位。更广泛地说,该框架使不匹配解码——经典通信理论数十年前所命名的失败模式——在单个深度生成模型中可见。

英文摘要

Classical communication systems fail not only through random noise but also when transmitter and receiver use incompatible operational codebooks. Variational autoencoders (VAEs) train an encoder $q_ϕ$ and decoder $p_θ$ jointly, and practitioners treat the resulting latent space as a discrete code -- for clustering, conditional generation, and mechanistic interpretability. Yet standard VAE diagnostics -- ELBO, active units, mutual information, and code histograms -- certify only whether this code is used, never whether the decoder reads each latent under the encoder's code. We close this gap with the neural codebook channel $K_{e\to d}(j\mid i)$, a coupled encoder-decoder diagnostic whose off-diagonal mass is bounded by an architecture-free Bernoulli-KL certificate $d_{\mathrm{bin}}(1-\mathcal{A} \,\|\, \barη_p) \le \barΔ$ controlled by the variational gap. The certificate is the operational specialization of the classical KL chain rule under disintegration to the encoder-decoder disagreement event, complemented by a constructive marginal-impossibility result: no combination of marginal histograms, entropies, active-code counts, or mutual information determines $K_{e\to d}$. We audit the certificate on four sklearn datasets (finite-grid exact, 5/5 seeds, 20/20 pairs satisfy the bound), a 2D model where the bound is non-vacuous at $2.71\times$ the observed disagreement and the four-term identity closes within $10^{-4}$, MNIST under importance-sampling control, and a VQ-VAE attaining the predicted limit $\hat{\mathcal{A}}=1.000$. The package $(K_{e\to d}, \mathcal{A}, R_{\mathrm{eff}}, R, \mathrm{AU})$ is an audit-ready reporting unit. More broadly, the framework makes mismatched decoding -- a failure mode classical communication theory named decades ago -- visible inside a single deep generative model.

2605.18845 2026-05-20 cs.LG cs.AI 版本更新

First-Passage Prediction of Grokking Delay: ACalibrated Law under AdamW with Causal Validation

Grokking延迟的首次通过预测:AdamW下的校准定律与因果验证

Truong Xuan Khanh, Truong Quynh Hoa, Luu Duc Trung, Phan Thanh Duc

发表机构 * H&K Research Studio(H&K研究室) Clevix LLC(Clevix公司) Banking Academy of Vietnam(越南银行学院)

AI总结 本文提出了一种在AdamW优化器下预测grokking延迟的定量方法,通过推导闭合形式定律并结合因果验证,实现了对模型记忆延迟的准确预测。

Comments 51 pages, 7 figures, 6 tables. Preprint

详情
AI中文摘要

我们首次对AdamW下的grokking延迟进行了定量预测。将延迟视为首次通过时间,推导出闭合形式定律T_grok - T_mem = (1 / 2 kappa_LL eta lambda) log(V_mem / V_star),其中V_t = ||theta_t||^2是参数范数的平方,V_star是架构相关的阈值,kappa_LL吸收了AdamW对clean-SGD收缩率2 eta lambda的修正。在单个超参数单元上校准(kappa_LL, V_star)可对26个保留运行的grokking延迟进行预测,MAPE为17.7%(在41倍延迟范围内);该定律适用于MLP(MAPE 18.0%,N=34)但在跨任务扩展时退化为23.3%(N=46,43.5倍范围),其中存在结构残差,V_star / V_mem在架构内相对稳定(CV约为14%在1L变压器上)。首次通过V_t是必要但不充分的。定量分位数定理表明,正延迟需要同时满足范数分离V_mem > V_post和阈值alpha_star = arcsin(C / V_T_mem^(1/2))的角达性,其中C可从经验NTK特征图和验证-边距分位数中计算。在模数p=89上校准C可预测alpha_star = 47.2度(p=97时观测到47.8度,误差1.3%)作为先验跨单元预测。因果干预冻结范数或移除权重衰减在记忆化时消除grokking(0/6 vs. 3/3基线),使角位移保持在12度附近。kappa_LL是按架构经验测量而非从(beta_1, beta_2, epsilon)推导;同一架构内CV最大为15%(四个架构内),但不同架构变体之间的值差异约为2倍。经验范围是AdamW下的算法任务(模运算,稀疏奇偶性);该定律是否适用于自然语言模型尚不明确。

英文摘要

We give the first quantitative prediction of grokking delay under AdamW. Treating the delay as a first-passage time, we derive a closed-form law T_grok - T_mem = (1 / 2 kappa_LL eta lambda) log(V_mem / V_star), where V_t = ||theta_t||^2 is the squared parameter norm, V_star is an architecture-dependent threshold, and kappa_LL absorbs the AdamW correction to the clean-SGD contraction rate 2 eta lambda. Calibrating (kappa_LL, V_star) on a single hyperparameter cell predicts grokking delays on 26 held-out runs with MAPE 17.7% over a 41x delay range; the law generalises to MLPs (MAPE 18.0%, N=34) and degrades to 23.3% on cross-task extension (N=46, 43.5x range), with a structured residual in which V_star / V_mem stays comparatively stable within architecture (CV about 14% on the 1L transformer). First-passage of V_t is necessary but not sufficient. A quantile-margin theorem establishes that positive delay requires both norm separation V_mem > V_post and angular reachability of a threshold alpha_star = arcsin(C / V_T_mem^(1/2)), where C is computable from the empirical NTK feature map and the validation-margin quantile. Calibrating C on modulus p=89 predicts alpha_star = 47.2 degrees at p=97 (observed 47.8 degrees, error 1.3%) as a prior cross-cell prediction. Causal interventions that freeze the norm or remove weight decay at memorisation eliminate grokking (0/6 vs. 3/3 baseline), trapping the angular displacement near 12 degrees. kappa_LL is empirically measured per architecture rather than derived from (beta_1, beta_2, epsilon); within-architecture CV stays at most 15% across four architectures, but values differ by about 2x between architectural variants beyond depth alone. Empirical scope is algorithmic tasks (modular arithmetic, sparse parity) under AdamW; whether the law transfers to natural-language scale models is open.

2605.18844 2026-05-20 cs.LG cs.AI 版本更新

Graph-Driven Cross-Industry Real-Time Monitoring Framework for Anti-Money Laundering Detection in Converged Mobility-Energy Supply Chain Networks

基于图的跨行业实时监控框架用于反洗钱检测在融合的移动-能源供应链网络

Rong Liu, Xiaojun Xiao, Zhanqing Su

发表机构 * School of Public Policy, University of Southern California(南加州大学公共政策学院) Boston University(波士顿大学) Johns Hopkins University(约翰霍普金斯大学)

AI总结 本文提出了一种基于图的跨行业实时反洗钱监控框架(GCRMF),用于整合的旅行-能源供应链网络,通过构建跨行业异构图并结合双图注意力网络,动态编码资本流动路径和时间演变特征,以提高跨行业洗钱行为的识别能力,并通过自监督在线学习机制实现实时适应和持续优化。

详情
AI中文摘要

随着旅行和能源行业的深度整合,跨行业供应链金融逐渐成为隐藏洗钱事件的高风险领域。为此,本文提出了一种基于图的跨行业实时反洗钱监控框架(GCRMF)用于整合的旅行-能源供应链网络。首先,构建了一个涵盖新能源汽车租赁平台、能源供应商、金融科技机构等的跨行业异构图(CIHG),并通过临时双图注意力网络(Temporal Dual-Graph Attention Network)整合行业语义,动态编码资本流动路径和时间演变特征。随后,为识别由合谋主体共同产生的结构性欺诈行为,提出了一种基于对比学习和分层图采样的元路径子图推理模块,以增强跨行业反复洗钱行为的识别能力。同时,采用自监督在线学习机制实现实时适应和持续优化以应对新的洗钱策略。实验结果表明,与现有跨行业场景下的图神经网络方法相比,GCRMF在F1分数上提高了超过17.8%,并显著降低了误报率。

英文摘要

With the deep integration of the travel and energy industries, cross-industry supply chain finance has gradually become a high-risk field of hidden money laundering incidents. For this reason, this work proposes a graph-driven cross-industry real-time anti-money laundering monitoring framework (GCRMF) for integrated travel - energy supply chain networks. First, a cross-industry heterogeneous graph (CIHG) covering new energy vehicle rental platforms, energy suppliers, fintech institutions, etc., is constructed, and industry semantics are integrated through temporarily Dual-GAT (Temporal Dual-Graph Attention Network), dynamically encoding capital flow paths and evolution features over time. Subsequently, in order to identify the structural fraud behavior together produced by colluding subjects, a meta-path subgraph reasoning module based on contrastive learning and hierarchical graph sampling is proposed to enhance the discrimination capability of cross-industry recurring money laundering behavior. Meanwhile, a self-supervised online learning mechanism is adopted for real-time adaptation and continuous optimization to new money laundering strategies. The experimental results show that compared with existing graph neural network methods in cross-industry scenarios, GCRMF improves the performance by more than 17.8% of F1 score and greatly reduces the false positive rate.

2605.18843 2026-05-20 cs.LG 版本更新

TEMPO: Temporal Enforcement via Mode-Separated Policy Optimization for Trustworthy LLM Backtesting

TEMPO: 通过模式分离策略优化实现可信大语言模型回测的时序执行

Zeyu Zhang, Bradly C. Stadie

发表机构 * Department of Statistic and Data Science(统计与数据科学系)

AI总结 本文提出TEMPO方法,通过模式分离策略优化,解决大语言模型回测中因泄露后截止日期知识导致的评估不准确问题,核心贡献是引入双模式奖励和基于GRPO的训练流程,有效减少知识泄露并提升任务性能。

Comments 9 pages in main context

详情
AI中文摘要

对大型语言模型进行历史事件回测需要仅基于截止日期之前可用的信息进行推理。然而,模型经常从预训练中泄露后截止日期的知识到推理过程中,导致看似准确度提高但破坏评估的有效性。基于提示的约束在被抑制内容与预测有因果关系时失效,而知识卸载无法解决此问题,因为时间合规性是实例特定的:同一事实可能对一个截止日期是合法证据,对另一个截止日期则为违规。而不是删除知识,模型必须学习时间纪律:选择受每个实例截止日期条件的证据。我们提出TEMPO(通过模式分离策略优化实现时序执行),通过两个贡献训练这种纪律:(1)一个双模式奖励,其中泄漏模式将后截止日期的主张驱动至零作为硬性前提,然后性能模式优化任务性能;(2)基于GRPO的训练流程,使模型能够发现时间有效的推理策略。我们证明训练单调减少泄露,收敛到无泄露最优解,并在合规后提升任务性能。在三个预测任务和两个模型上,TEMPO将泄露率从2~13%降至0.6~3.7%,在强预截止信号存在时任务性能提升6~13%,在预测任务本身困难时维持稳定。

英文摘要

Backtesting large language models on historical events requires reasoning exclusively from information available before a specified cutoff date. Yet models routinely leak post-cutoff knowledge from pre-training into their reasoning, inflating apparent accuracy and undermining evaluation validity. Prompt-based constraints fail when suppressed content is causally related to the prediction, and knowledge unlearning cannot address this problem because temporal compliance is instance-specific: the same fact may be legitimate evidence for one cutoff date and a violation for another. Rather than erasing knowledge, the model must learn temporal discipline: selecting evidence conditioned on each instance's cutoff date. We propose TEMPO (Temporal Enforcement via Mode-separated Policy Optimization), which trains this discipline via two contributions: (1) a two-mode reward where a leakage mode drives post-cutoff claims to zero as a hard prerequisite before a performance mode optimizes task performance; and (2) a GRPO-based training pipeline that enables the model to discover temporally valid reasoning strategies. We prove that training monotonically decreases leakage, converges to the leak-free optimum, and improves task performance once compliance is achieved. On three prediction tasks and two models, TEMPO reduces leakage from 2~13% to 0.6~3.7% across all conditions, with task performance improving 6~13% where strong pre-cutoff signals exist and maintained where the prediction task is inherently difficult from valid information alone.

2605.18842 2026-05-20 cs.LG 版本更新

Safe Continual Reinforcement Learning under Nonstationarity via Adaptive Safety Constraints

在非平稳环境下通过自适应安全约束实现安全的持续强化学习

Timofey Tomashevskiy

发表机构 * McMaster Centre for Software Certification(麦斯特软件认证中心) Department of Computing and Software(计算与软件系) McMaster University(麦斯特大学)

AI总结 本文提出了一种结合三种自适应安全机制的框架,用于在非平稳环境下实现安全的持续强化学习,通过自适应约束机制减少分布偏移下的安全违规,同时保持任务性能。

Comments Preprint version

详情
AI中文摘要

在非平稳环境中进行安全强化学习需要能够适应环境变化的安全部件。标准的安全强化学习方法通常假设固定约束或稳定的环境条件,这在分布偏移下可能不足。我们提出了LILAC+,一个用于非平稳环境下安全持续强化学习的框架,结合了三种自适应安全机制:基于上下文的安全约束、适应速度约束和预算到状态的安全执行。基于上下文的约束通过推断和预测的环境上下文调整安全要求。适应速度约束在环境变化速率超过智能体安全适应能力时收紧安全要求。预算到状态执行将累积安全要求转换为本地状态级控制约束,可在决策时执行。这些机制共同提供了一种统一的方法,用于持续强化学习中的主动和反应性安全适应。我们在模拟驾驶环境中评估了该框架,在平稳、已见非平稳和未见非平稳条件下进行测试。结果表明,自适应安全约束在分布偏移下显著减少了安全违规,同时在与无约束和固定约束基线相比时保持了具有竞争力的任务性能。这些发现表明,安全的持续强化学习需要能够响应当前状态信息、预测的环境上下文、适应需求和剩余安全预算的自适应约束机制。

英文摘要

Safe reinforcement learning in nonstationary environments requires safety mechanisms that adapt as environmental conditions change. Standard safe reinforcement learning methods often assume fixed constraints or stable environmental conditions, which can become inadequate under distribution shift. We propose LILAC+, a framework for safe continual reinforcement learning under nonstationarity that combines three adaptive safety mechanisms: context-based safety constraints, adaptation-speed constraints, and budget-to-state safety enforcement. Context-based constraints adjust safety requirements using inferred and predicted environmental context. Adaptation-speed constraints tighten safety requirements when the rate of environmental change exceeds the agent's ability to adapt safely. Budget-to-state enforcement converts cumulative safety requirements into local state-level control constraints that can be enforced at decision time. Together, these mechanisms provide a unified approach for proactive and reactive safety adaptation in continual reinforcement learning. We evaluate the framework in simulated driving environments under stationary, seen nonstationary, and unseen nonstationary conditions. The results show that adaptive safety constraints substantially reduce safety violations under distribution shift while maintaining competitive task performance compared with unconstrained and fixed-constraint baselines. These findings suggest that safe continual reinforcement learning requires adaptive constraint mechanisms that respond not only to current state information but also to predicted environmental context, adaptation demand, and remaining safety budget.

2605.18841 2026-05-20 cs.LG 版本更新

From Cumulative Constraints to Adaptive Runtime Safety Control for Nonstationary Reinforcement Learning

从累积约束到适应性运行时安全控制:非平稳强化学习

Timofey Tomashevskiy

发表机构 * McMaster Centre for Software Certification(麦斯特软件认证中心) Department of Computing and Software(计算与软件系) McMaster University(麦斯特大学)

AI总结 本文提出了一种适应性运行时安全控制机制CPSS,通过将累积安全预算转化为适应性的状态级控制约束,以应对非平稳强化学习中的安全问题,通过动态调整安全阈值来保证执行动作的安全性,同时在多个高速公路合并场景中验证了其有效性。

Comments 13 pages. Preprint version

详情
AI中文摘要

在强化学习中,安全性通常通过累积成本约束来指定,但这些轨迹级保证并不能直接防止不安全的个体决策,特别是在非平稳环境下。在连续和非平稳设置中,风险与相同动作在不同上下文中的关联性变化,而固定状态级阈值可能过于保守或过于宽松。我们提出Constraint Projection Safety Shield (CPSS),一种运行时机制,将累积安全预算转化为适应性的状态级控制约束。CPSS跟踪剩余安全预算,将其投影为时间变化的可接受风险阈值,并过滤预测安全成本超过活跃阈值的策略动作。阈值通过上下文信号在线调整,使得在更严格或快速变化的环境中执行更严格,在可用安全预算充足时则更宽松。我们分析了由此产生的保护策略,并证明该机制保证了执行动作的状态级阈值满足,诱导了有限时间累积成本界,并在干预频率和每步奖励扭曲方面给出了性能退化界。我们使用highway-env在非平稳高速公路合并场景中评估了CPSS。在多个种子下,CPSS显著减少了基于接近度的安全违规,并增加了分离边缘,同时选择性干预而不是主导学习的策略。这些结果支持了将累积安全规范转化为有效本地安全控制的适应性预算到阈值投影作为实际应用的方法。

英文摘要

Safety in reinforcement learning is often specified through cumulative cost constraints, but these trajectory-level guarantees do not directly prevent unsafe individual decisions, especially under nonstationarity. In continual and nonstationary settings, the difficulty is amplified because the risk associated with the same action can vary across contexts, while a fixed state-level threshold may be either too conservative or too weak. We propose Constraint Projection Safety Shield (CPSS), a runtime mechanism that converts a cumulative safety budget into adaptive state-level control constraints during execution. CPSS tracks the remaining safety budget, projects it into a time-varying admissible risk threshold, and filters policy actions whose predicted safety cost exceeds the active threshold. The threshold is adjusted online using contextual signals so that enforcement becomes stricter in more demanding or rapidly changing regimes and less restrictive when the available safety budget is sufficient. We analyze the resulting shielded policy and show that the mechanism guarantees per-state threshold satisfaction for executed actions, induces finite-horizon cumulative cost bounds, and yields a performance degradation bound in terms of intervention frequency and per-step reward distortion. We evaluate CPSS in nonstationary highway merging scenarios using highway-env. Across multiple seeds, CPSS substantially reduces proximity-based safety violations and increases separation margins while intervening selectively rather than dominating the learned policy. These results support adaptive budget-to-threshold projection as a practical way to transform cumulative safety specifications into effective local safety control for continual reinforcement learning systems.

2605.18839 2026-05-20 cs.LG cs.AI 版本更新

An Integrated Forecasting Prototype for Emergency Department Boarding Time to Support Proactive Operational Decision Making

急诊部候诊时间集成预测原型:支持主动运营决策制定

Orhun Vural, Abdulaziz Ahmed, Ferhat Zengul, James Booth, Bunyamin Ozaydin

发表机构 * Department of Electrical and Computer Engineering, University of Alabama at Birmingham(阿拉巴马大学伯明翰分校电气与计算机工程系) Department of Health Services Administration, University of Alabama at Birmingham(阿拉巴马大学伯明翰分校卫生服务管理系) Department of Biomedical Informatics and Data Science, Heersink School of Medicine, University of Alabama at Birmingham(阿拉巴马大学伯明翰分校希尔斯医院医学院生物医学信息学与数据科学系) Department of Emergency Medicine, University of Alabama at Birmingham(阿拉巴马大学伯明翰分校急诊医学系)

AI总结 本文提出了一种多时间跨度的时间序列预测框架,用于预测急诊部候诊时间,以支持主动的运营决策制定,通过整合真实世界数据和外部上下文数据源,如天气、节假日和重大本地事件,提高了预测准确性。

Comments 22 pages, including supplementary materials

详情
AI中文摘要

急诊部门(ED)的拥挤状况仍然是全球范围内持续存在的运营挑战,导致护理延误和后续拥堵。急诊部候诊时间,定义为被收治患者在等待住院床放置期间在急诊部停留的时间,是这种拥堵的关键指标。提前预测急诊部候诊时间可以实现主动的运营决策制定,防止拥堵加剧。我们开发并评估了多时间跨度的时间序列预测框架,以预测6、8、10、12和24小时的急诊部候诊时间。利用美国一所大学附属城市的大学附属医院的真实世界数据,并整合外部上下文数据源,包括天气、节假日和重大本地事件。基于分解的线性(DLinear)和基于标准化的线性(NLinear)时间序列预测深度学习模型在多个时间跨度上表现优异。模型还被评估了在极端拥堵场景下的表现,这些场景由较高的候诊时间特征化。此外,还开发了一个机器学习运维(MLOps)网页原型应用,以支持将预测框架转化为实际应用,通过整合数据摄入、预测可视化、实验和重新训练等功能。

英文摘要

Overcrowding in emergency departments (ED) remains a persistent operational challenge worldwide, causing delays in care delivery and downstream congestion. ED boarding time, defined as the duration admitted patients remain in the ED while awaiting inpatient bed placement, is a key indicator of this congestion. Predicting ED boarding time in advance enables proactive operational decision making before congestion escalates. We developed and evaluated a multi-horizon time series forecasting framework to predict ED boarding time at 6, 8, 10, 12, and 24-hour horizons. Real-world data from a university-affiliated urban hospital in the United States were utilized and integrated with external contextual data sources, including weather, holidays, and major local events. Decomposition-based Linear (DLinear) and Normalization-based Linear (NLinear) time series forecasting deep learning models showed superior performance across multiple horizons. Models were also evaluated under extreme congestion scenarios characterized by elevated boarding times. In addition, a Machine Learning Operations (MLOps) web application prototype was developed to support translation of the forecasting framework into practice through integrated data ingestion, forecast visualization, experimentation, and retraining.

2605.18837 2026-05-20 cs.LG cs.AI eess.SP 版本更新

VCR: Learning Valid Contextual Representation for Incomplete Wearable Signals

VCR:学习不完整可穿戴信号的有效上下文表示

Yuxuan Weng, Wenhan Luo, Qijia Shao

发表机构 * The Hong Kong University of Science and Technology(香港科学与技术大学)

AI总结 本文提出VCR框架,通过学习鲁棒于模态缺失的表示,解决可穿戴信号不完整问题,提升在多种健康监测任务中的性能和鲁棒性。

详情
AI中文摘要

可穿戴设备能够从多模态信号中实现连续健康监测,但实际部署受到有限标注数据和普遍传感器不完整性的阻碍。尽管大规模自监督预训练减少了对标签的依赖,但现有方法大多假设全模态可用性。目前处理模态缺失的方法通常重建整个缺失信号,这可能导致无法从观测传感器信号推断出的模态特定细节的幻觉,从而降低鲁棒性。我们提出VCR,一种自监督框架,学习提取对模态缺失具有鲁棒性的表示。VCR采用正交分词器,通过校正潜在流形并应用几何投影,严格分离每个模态到共享语义和模态特定残差。这种设计在保持完整信息完整性的同时,为模态缺失下的稳健学习提供了结构基础。所生成的标记由一个缺失感知的混合专家背骨处理,能够适应不同模式的模态可用性。通过将目标限制为仅重建缺失模态的共享组件,VCR有效减轻了无法推断的模态特定细节的幻觉。在多个健康监测任务中,VCR在完整、单缺失和多缺失模态设置下,相比强大的监督和自监督基线,一致提升了性能和鲁棒性。

英文摘要

Wearable devices enable continuous health monitoring from multimodal signals, but real-world deployment is hindered by limited labeled data and pervasive sensor incompleteness. While large-scale self-supervised pretraining reduces label dependence, most existing methods assume full modality availability. Current approaches for handling modality missingness often reconstruct entire absent signals, which can encourage hallucinating modality-specific details that are not inferable from the observed sensor signals and degrade robustness. We propose VCR, a self-supervised framework that learns to extract valid representations robust to modality missingness. VCR employs an orthogonal tokenizer to enforce strict orthogonal disentanglement by rectifying latent manifolds and applying a geometric projection, separating each modality into shared semantics and modality-specific residuals. This design preserves complete information integrity while serving as a structural foundation for robust learning under modality missingness. The resulting tokens are processed by a missing-aware mixture-of-experts backbone that adapts to varying patterns of modality availability. By constraining the objective to reconstruct only the shared components of missing modalities, VCR effectively mitigates hallucinations of non-inferable modality-specific details. Across multiple health monitoring tasks, VCR consistently improves performance and robustness under full, single-missing, and multiple-missing modality settings compared with strong supervised and self-supervised baselines.

2605.18836 2026-05-20 cs.LG cs.CV 版本更新

Spectral Gradient Surgery for Domain-Generalizable Dataset Distillation

谱梯度手术用于领域通用化数据集蒸馏

Minyoung Oh, Najeong Chae, Jae-Young Sim

发表机构 * Graduate School of Artificial Intelligence(人工智能研究生院) Ulsan National Institute of Science and Technology (UNIST)(乌山国立科学与技术研究院(UNIST))

AI总结 本文提出了一种新的数据集蒸馏方法,即领域通用化数据集蒸馏(DGDD),通过谱梯度手术(SGS)来提升蒸馏数据集对超出分布(OOD)的泛化能力,同时保持与现有数据集蒸馏方法的兼容性。

Comments 17pages

详情
AI中文摘要

数据集蒸馏(DD)合成一个紧凑的合成数据集,以保留完整数据集的训练效用。然而,其标准公式假设测试数据遵循与训练数据相同的分布,这一假设在实践中很少成立。一种直接的扩展——将事后域泛化(DG)技术应用于蒸馏数据——并不合适,因为现有DG方法依赖于真实数据集的自然多样性,而压缩的合成集本质上缺乏这种多样性,同时还会带来显著的增强开销,这与数据集蒸馏的效率目标相冲突。为了解决这一限制,我们引入了领域通用化数据集蒸馏(DGDD),一种新的问题设定,明确针对蒸馏数据集的超出分布泛化。我们通过广泛采用的DD基线分布匹配(DM)来研究这一问题。我们将DM的超出分布脆弱性归因于压缩合成集中类判别信息和领域特定信息的纠缠,并提出谱梯度手术(SGS)来解纠缠。SGS的关键见解是跨域在谱域中的梯度一致性和跨域梯度组件的共享揭示了哪些梯度组件在源域之间共享——因此是类判别性的——以及哪些是领域特定的。基于这一观察,SGS在标准DM更新中添加了两个互补的梯度:一个强化跨域共享组件,另一个促进蒸馏数据集内的多样性。在多样规模基准上的广泛实验表明,SGS在提升超出分布泛化的同时,仍保持与现有DM方法的即插即用兼容性。

英文摘要

Dataset Distillation (DD) synthesizes a compact synthetic dataset that preserves the training utility of a full dataset. However, its standard formulation assumes that test data follow the same distribution as training data, an assumption that rarely holds in practice. A straightforward extension-applying post-hoc Domain Generalization (DG) techniques to distilled data-is ill-suited because existing DG methods rely on the natural diversity of real datasets, which compact synthetic sets inherently lack, while also incurring substantial augmentation overhead that conflicts with the efficiency objective of dataset distillation. To address this limitation, we introduce Domain Generalizable Dataset Distillation (DGDD), a new problem setting that explicitly targets out-of-distribution (OOD) generalization of distilled datasets. We study this problem through a widely adopted DD baseline of Distribution Matching (DM). We attribute the OOD vulnerability of DM to the entanglement of class-discriminative and domain-specific information within the compressed synthetic set, and propose Spectral Gradient Surgery (SGS) to disentangle the two. The key insight of SGS is that cross-domain agreement among domain-wise gradients in the spectral domain reveals which gradient components are shared across source domains-and are therefore class-discriminative-and which are domain-specific. Based on this observation, SGS augments the standard DM update with two complementary gradients: one that reinforces cross-domain shared components and another that explicitly promotes diversity within the distilled dataset. Extensive experiments on diverse-scale benchmarks demonstrate that SGS substantially improves OOD generalization while remaining plug-and-play compatible with existing DM methods.

2605.18835 2026-05-20 cs.LG 版本更新

StampFormer: A Physics-Guided Material-Geometry-Coupled Multimodal Model for Rapid Prediction of Physical Fields in Sheet Metal Stamping

StampFormer: 一种基于物理的材料-几何耦合多模态模型,用于快速预测冲压板料的物理场

Jiajie Luo, Mohamed Mohamed, Osama Hassan, Haosu Zhou, Yingxue Zhao, Haoran Li, Xinrun Li, Zhutao Shao, Yang Long, Nan Li, Jichun Li

发表机构 * Dyson School of Design Engineering, Imperial College London(帝国理工学院设计工程学院) School of Computing, Newcastle University(新castle大学计算机学院) Department of Computing, Imperial College London(帝国理工学院计算系) Multi-X Solution Limited(Multi-X解决方案有限公司) Department of Computer Science, Durham University(达勒姆大学计算机科学系) Department of Mechanical Engineering, Faculty of Engineering, Helwan University(Helwan大学工程学院机械工程系)

AI总结 本文提出StampFormer模型,通过结合材料和几何信息,实现对冲压板料物理场的快速准确预测,从而提高设计效率。

详情
AI中文摘要

传统冲压板料成型依赖于耗时且昂贵的有限元分析(FEA)进行设计验证,这一过程显著延长了设计周期。虽然代理模型提供了更快的迭代速度,但现有方法存在局限:标量方法无法捕捉全面的基于场的FEA结果,而现有基于图像的方法往往忽略了材料属性的关键作用,仅关注几何。为解决这一差距,我们开发了一种基于物理的深度学习框架,即StampFormer,该框架同时利用组件几何和材料应力-应变响应来预测FEA结果。StampFormer框架使用三个核心组件处理数据。首先,材料增强的几何网络(MAGN)融合几何和材料数据。然后,通过层次化材料嵌入注入单元(HMEIU)在不同层次上整合信息,再由主网络骨干,即改进的Swin-UNet进行处理。我们在交叉件面板冲压上评估了我们的模型,使用两个模拟数据集进行钢和铝板的冲压模拟,结果表明,StampFormer在不到一秒的时间内提供了高保真的关键物理场预测,包括薄化、主应变、次应变、塑性应变和位移。与真实FEA相比,我们的模型在四个二维场上的平均相对误差小于8.5%,在三维位移场上的均方误差小于1.2 mm²。总之,我们介绍了一种实用且高效的框架,整合了多模态信息,即几何和材料属性,以提供快速且准确的预测,使设计师能够进行实时的可制造性评估。

英文摘要

Traditional sheet metal forming relies on time-consuming and expensive Finite Element Analysis (FEA) for design validation, a process that significantly prolongs design cycles. While surrogate models offer faster iteration, current approaches have limitations: scalar-based methods cannot capture comprehensive field-based FEA results, while existing image-based models often ignore the critical role of material properties by focusing solely on geometry. To address this gap, we develop a physics-guided deep learning framework, namely StampFormer, which simultaneously uses component geometry and material stress-strain responses to predict FEA outcomes. The StampFormer framework uses three core components to process data. A Material-Augmented Geometric Network (MAGN) first fuses geometric and material data. This information is then integrated at various levels by a Hierarchical Material Embedding Injection Unit (HMEIU) before being processed by the primary network backbone, an adapted Swin-UNet. We evaluated our model on the stamping of a crossmember panel with two simulation datasets for steel and aluminium panels, and results demonstrate that StampFormer provides high-fidelity predictions of critical physical fields - including thinning, major strain, minor strain, plastic strain, and displacement - in under a second. Compared with ground truth FEA, our model achieved an average relative error of less than 8.5% on the four 2D fields and a mean squared error of less than 1.2 mm2 for the 3D displacement field. In summary, we introduce a practical and efficient framework that integrates multimodal information, namely geometry and material properties, to provide fast and accurate predictions, enabling designers to perform real-time manufacturability assessments.

2605.18832 2026-05-20 cs.LG cs.AI 版本更新

Precision Tracked Transformer via Kalman Filtering, Kriging and Process Noise

通过卡尔曼滤波、克里格法和过程噪声的精确跟踪变压器

Bo Long, Deepak Agarwal, Jelena Markovic-Voronov, Yi Wang, Liuqing Li

发表机构 * LinkedIn Core AI(LinkedIn核心AI)

AI总结 本文提出了一种基于贝叶斯滤波的变压器(BFT),通过引入精度权重的克里格法、自适应卡尔曼更新和动态模型,解决了传统变压器在处理不确定性方面的不足,提升了序列推荐和大语言模型在噪声环境下的鲁棒性。

详情
AI中文摘要

Transformer是现代AI的基础构建块,但其缺乏对不确定性的原则性处理,这在实际应用中普遍存在:序列推荐中的冷启动标记具有稀疏的历史,语言模型中的异质信号质量,以及由无约束softmax引起的注意力 sinks。每个token都被统一的置信度处理。我们证明这种统一性是我们的贝叶斯滤波变压器(BFT)的退化情况:注意力变为精度加权克里格法,残差连接变为具有自适应增益的卡尔曼更新,FFN变为通过雅可比矩阵加过程噪声规则传播精度的动力学模型。观测精度来自一个无参数的受限最大似然(REML)估计器,具有共轭贝叶斯先验。BFT将任何Transformer层替换为几乎无开销。在序列推荐中,BFT应用于三种主要架构,在六个基准上获得显著提升,其中在冷启动用户和稀有物品上改进最大。在具有噪声数据的监督微调中,BFT在两个领域提高了鲁棒性:噪声监督(问答中的token-标签腐败)和噪声上下文(具有真实RAG干扰项的检索增强问答)。单个原则性修改——恢复精度——在经典序列建模和现代LLM领域中释放了大量空间。

英文摘要

The Transformer is the foundational building block of modern AI, yet offers no principled handling of \emph{uncertainty}, which is prevalent in real applications: cold-start tokens with sparse histories in sequential recommendation, heterogeneous signal quality in language models, and attention sinks induced by unconstrained softmax. Every token is treated with uniform confidence. We show this uniformity is a degenerate case of our \emph{Bayesian Filtering Transformer} (BFT): attention becomes precision-weighted kriging, the residual connection becomes a Kalman update with adaptive gain, and the FFN becomes a dynamics model propagating precision via a Jacobian--plus--process-noise rule. Observation precision comes from a parameter-free Restricted Maximum Likelihood (REML) estimator with a conjugate Bayesian prior. BFT replaces any Transformer layer with negligible overhead. On sequential recommendation, BFT applied to three major architectures yields significant gains on six benchmarks, with the largest improvements on cold-start users and rare items where uncertainty is highest. On supervised fine-tuning of large language models with noisy data, BFT improves robustness in two regimes: noisy supervision (token-label corruption in question answering) and noisy context (retrieval-augmented QA with real RAG distractors). A single principled modification -- restoring precision -- unlocks substantial headroom across both classical sequence-modeling and modern LLM regimes.

2605.18831 2026-05-20 q-bio.QM cs.LG 版本更新

Towards Discovery of Polymers for Insulin Delivery via Physics-Grounded Agentic Workflows

通过物理基础的代理工作流发现胰岛素输送聚合物

Martins Otun

发表机构 * Algonix AI Ltd.(Algonix AI有限公司)

AI总结 本文提出了一种基于物理的代理工作流方法,用于发现胰岛素输送的聚合物,通过大规模语言模型和物理工具的结合,在有限预算内高效搜索离散的PSMILES空间,实现了优于强化学习和贝叶斯优化的胰岛素-聚合物相互作用能。

详情
AI中文摘要

冷链存储限制了数亿人获得胰岛素的机会;一种热保护性贴片聚合物可能有所帮助,但设计空间太大无法进行彻底实验。从这一问题出发,我们聚焦于一种代理工作流:一个大型语言模型(LLM)通过模型上下文协议(MCP)调用基于物理的工具,在OpenMM Packmol-矩阵评估预算内搜索离散的PSMILES空间。LLM充当一个隐含的获取函数,基于一个持续更新的“发现世界”:假设、文献声明和模拟结果。在匹配的Oracle预算下,最佳自主行动达到了胰岛素-聚合物相互作用能为-2263 kJ/mol,优于强化学习基线68%和贝叶斯优化19%。三个独立行动收敛到一个结构特征(每个重复单元密集的氢键供体和受体)的同时,物理检查拒绝不可行的排列和名称-结构不匹配,从而在下一步之前阻止了这些不合理的排列。科学阶段是CPU绑定的,并在商用硬件上运行。更广泛地说,这里设计的相同架构和工作流适用于其他蛋白质稳定化任务,只要存在可处理的筛选Oracle。

英文摘要

Cold-chain storage limits access to insulin for hundreds of millions of people; a thermally protective patch polymer could help, but the design space is too large for exhaustive experiment. Starting from that problem, we narrow to an agentic workflow: a large language model (LLM) calls physics-based tools through the Model Context Protocol (MCP), searching the discrete PSMILES space under a budget of OpenMM Packmol-matrix evaluations. The LLM acts as an implicit acquisition function conditioned on a persistent "discovery world": hypotheses, literature claims, and simulation outcomes updated each iteration. Under matched oracle budgets, the best autonomous campaign reaches an insulin-polymer interaction energy of -2263 kJ/mol, outperforming reinforcement-learning baselines by 68% and Bayesian optimization by 19%. Three independent campaigns converge on one structural motif (dense hydrogen-bond donors and acceptors per repeat unit) while physics checks reject infeasible packings and name-structure mismatches before they steer the next step. The science stage is CPU-bound and runs on commodity hardware. More broadly, the same architecture and workflow designed here applies to other protein-stabilization tasks whenever a tractable screening oracle is available.

2605.18830 2026-05-20 cs.LG 版本更新

In-Context Learning Operates as Concept Subspace Learning

基于情境学习的概念子空间学习

Wei Tang, Xinyan Jiang, Fakhri Karray, Lijie Hu

发表机构 * Mohamed bin Zayed University of Artificial Intelligence(莫扎伊德·本·扎耶德人工智能大学) Shanghai Advanced Research Institute(上海先进研究院)

AI总结 本文研究了结构化演示是否诱导低维概念推理,通过概念子空间视角揭示了情境学习中预测分解为概念坐标回归和子空间泄漏的机制,并通过实验验证了任务信息集中在低维任务对齐激活子空间中的结论。

详情
AI中文摘要

回归和贝叶斯对情境学习(ICL)的解释说明了演示如何诱导预测器,而机械分析通常识别出紧凑的激活方向,引导受促行为。然而,仍不清楚结构化演示是否诱导低维概念推理。我们通过概念子空间视角研究这一问题,在此视角中,任务仅沿内在概念坐标变化,尽管输入观察在高维环境空间中。对于岭回归和最小二乘ICL代理,预测精确分解为概念坐标回归和子空间泄漏。在块对角或近块对角协方差假设下,主导估计和噪声敏感项随概念子空间的维度变化,而残差效应由跨子空间耦合控制。这种分离给出了机械预测:可恢复的任务信息应集中在低维、任务对齐的激活子空间中。在CounterFact衍生的多关系提示上使用Llama-3-8B,4096维残差流的68-73维子空间恢复了78.8%的干净-受污染准确率差距,而补全互补子空间则恢复了0%。概念交换将预测引导至注入的关系,而随机和跨任务匹配排名控制效果不大。此外,在Qwen2.5-7B和受控的跨语言规则任务上的额外实验显示了相同定性模式。这些结果支持概念子空间作为紧凑、任务对齐的可恢复ICL行为在结构化任务家族中的中介,而不意味着全电路恢复。

英文摘要

Regression and Bayesian accounts of in-context learning (ICL) explain how demonstrations can induce predictors, while mechanistic analyses often identify compact activation directions that steer prompted behavior. However, it remains unclear whether structured demonstrations induce low-dimensional concept inference. We study this question through a concept-subspace view of ICL, in which tasks vary only along intrinsic concept coordinates, although inputs are observed in a high-dimensional ambient space. For ridge and least-squares ICL proxies, prediction decomposes exactly into concept-coordinate regression and off-subspace leakage. Under block-diagonal or near-block-diagonal covariance assumptions, the leading estimation and nuisance-sensitivity terms scale with the dimension of the concept subspace, while residual effects are controlled by cross-subspace coupling. This separation gives a mechanistic prediction: recoverable task information should concentrate in a low-dimensional, task-aligned activation subspace. On CounterFact-derived multi-relation prompts with Llama-3-8B, a 68--73-dimensional subspace of the 4096-dimensional residual stream restores 78.8% of the clean--corrupted accuracy gap, whereas patching the complementary subspace restores 0%. Concept swaps redirect predictions toward injected relations, while random and cross-task matched-rank controls are largely ineffective. Additional experiments on Qwen2.5-7B and a controlled cross-lingual rule task show the same qualitative pattern. These results support concept subspaces as compact, task-aligned mediators of recoverable ICL behavior in structured task families, without implying full-circuit recovery.

2605.18829 2026-05-20 cs.LG cs.CR 版本更新

Lossless Anti-Distillation Sampling

无损反蒸馏采样

Zibo Diao, Jingchu Gai, Xinyue Ai, Zhang Zhang, Zhenyu He, Di He

发表机构 * Peking University(北京大学) Tsinghua University(清华大学) Carnegie Mellon University(卡内基梅隆大学)

AI总结 本文提出了一种无损反蒸馏采样方法,通过在保持良性用户体验的同时,有效对抗多账号蒸馏攻击,降低蒸馏模型的泛化能力。

详情
AI中文摘要

面向商业生成模型的前沿领域,蒸馏攻击正成为日益严峻的威胁。蒸馏者通过收集生成响应并以极低的成本训练自己的竞争模型。现有防御措施要么依赖于修改模型输出,从而牺牲良性用户的响应质量,要么依赖于行为检测方法,这些方法可以通过在多个账户上分布查询来轻易绕过。在本工作中,我们提出了无损反蒸馏采样(LADS),一种专门设计用于对抗多账号蒸馏同时保持良性用户体验的新型采样方案。具体而言,LADS从由查询的语义内容和用户查询次数决定的私有种子中推导出每种生成的随机性。通过构造,每个良性用户在每次访问时都会独立地从原始模型中采样响应,因此不会产生失真。相反,对于蒸馏者,不同账户在相同语义桶中的查询会共享潜在随机性。因此,收集的数据变得相关,可能降低样本多样性并损害泛化能力。利用统一收敛理论,我们证明LADS在无条件和条件生成设置中,能够证明降低蒸馏者泛化差距的收敛率相对于标准i.i.d.采样。在图像生成、数学推理和代码生成的实验中,证实LADS显著降低蒸馏学生的表现,同时保持对单个用户的精确统计保真度。

英文摘要

Frontier commercial generative models face a growing threat from distillation, whereby a distiller harvests generated responses and trains a competing model of its own at drastically lower cost. Existing defenses either rely on modifying the models outputs, thereby sacrificing response quality for benign users, or on behavioral detection methods, which can be readily circumvented by distributing queries across multiple accounts. In this work, we propose Lossless Anti-Distillation Sampling (LADS), a novel sampling scheme specifically designed to counter multi-account distillation while maintaining a lossless experience for benign users. Concretely, LADS derives the randomness underlying each generation from a private seed determined by the semantic content of the query and the number of times the user has queried the model. By construction, every benign user receives a response independently sampled from the original model at each visit, and thus experiences no distortion. In contrast, for a distiller, different accounts share latent randomness whenever their queries fall in the same semantic bucket. As a result, the harvested data becomes correlated, potentially reducing sample diversity and degrading generalization. Using uniform convergence theory, we show that LADS provably degrades the convergence rate of the distillers generalization gap relative to standard i.i.d. sampling in both unconditional and conditional generation settings. Experiments on image generation, mathematical reasoning, and code generation confirm that LADS substantially degrades the performance of distilled students while preserving exact statistical fidelity for individual users.

2605.18827 2026-05-20 cs.IR cs.LG cs.PL 版本更新

Code-Guided Reasoning for Small Language Models: Evaluating Executable MCQA Scaffolds

为小型语言模型引导的推理:评估可执行的多项选择题问答框架

Prateek Biswas, Dhaval Patel, Vedant Khandelwal, Shuxin Lin, Amit Sheth

发表机构 * IBM New York City,NY(IBM纽约市) University of South Carolina(南卡罗来纳大学) Artificial Intelligence Institute at University of South Carolina(南卡罗来纳大学人工智能学院)

AI总结 本文提出Code-Guided Reasoning(CGR)评估协议和生成程序资源,用于衡量可执行推理框架如何提升小型语言模型在多项选择题问答任务中的表现,通过实验展示了使用可执行框架带来的性能提升。

Comments 28 Pages, 18 Figures

详情
AI中文摘要

多项选择问答基准通常将小型语言模型(SLM)作为直接回答者进行评估,但部署的语言模型系统越来越多地依赖外部框架,如工具、代码和重复的模型调用。我们引入Code-Guided Reasoning(CGR),一种评估协议和生成程序资源,用于衡量可执行推理框架如何提高SLM在MCQA任务中的表现。CGR标准化了六个组件:规范化的问题接口、直接求解提示、生成提示、Python框架、求解器调用和提取辅助程序,以及三通道结果记录。在本地准备的MCQA数据包和六个元数据注册的求解器模型中,保留的20,498结果行显示,非零基线部分的宏辅助准确率为66.21%,直接准确率为38.11%,差异为+28.10个百分点,置信区间为[20.32, 36.43]。在更严格的Ab > 30%直接信号门限下,宏差异为+14.11点。这些估计是描述性的。辅助推理使用更大的求解器调用预算,答案提取是脆弱的,Time-MQA包含观察到的回归,且某些生成程序违反了无硬编码指令。CGR提供了解释这些结果所需的跟踪包,包括直接、辅助和生成器侧的答案、分区定义、生成程序、响应元数据和审计。

英文摘要

Multiple-choice QA benchmarks usually evaluate small language models (SLMs) as direct answerers, but deployed language-model systems increasingly rely on external scaffolds such as tools, code, and repeated model calls. We introduce Code-Guided Reasoning (CGR), an evaluation protocol and generated-program resource for measuring when executable reasoning scaffolds improve SLM performance on MCQA tasks. CGR standardizes six components: a normalized item interface, a direct solver prompt, a generator prompt, a Python scaffold, solver-call and extraction helpers, and a three-channel result record. On 20,498 retained result rows from a locally prepared MCQA bundle and six metadata-registered solver models, the observed non-zero-baseline partition shows 66.21% macro assisted accuracy versus 38.11% direct accuracy, a +28.10 percentage-point difference with a pair-bootstrap interval of [20.32, 36.43]. Under a stricter Ab > 30% direct-signal gate, the macro difference is +14.11 points. These estimates are descriptive. Assisted inference uses a larger solver-call budget, answer extraction is brittle, Time-MQA contains the observed regressions, and some generated programs violate the no-hard-coding instruction. CGR provides the trace package needed to interpret these results, including direct, assisted, and generator-side answers, partition definitions, generated programs, response metadata, and audits.

2605.18826 2026-05-20 cs.LG cs.AI 版本更新

The Routing and Filtering Structure of Attention

注意力的路由和过滤结构

Shafayeth Jamil, Rehan Kapadia

发表机构 * University of Southern California(南加州大学)

AI总结 本文研究了注意力机制中的路由和过滤结构,通过分解1776个预训练Transformer的头部,发现路由在低秩状态下运行,并引入S-D注意力作为诊断参数化方法,分离路由和过滤,实现稳定训练和有效降维。

Comments 13 pages, 7 figures

详情
AI中文摘要

注意力交互矩阵$QK^{ op}$包含两个交织的计算:一个斜对称成分用于在位置间重新分配信息(路由),一个对称成分用于缩放相互相关性(过滤)。我们分解了五个预训练Transformer中的1776个头部,发现路由在低秩状态下运行,远低于权重核分配的路由能力。我们引入了S-D注意力作为诊断参数化方法,通过构造分离路由和过滤,保证稳定性($\mathrm{Re}(λ) \le 0$)并稳定训练而无需层归一化。当分离和未归一化时,路由自组织成一个谱级联,第一层的有效秩为2,随着深度扩展到六个尺度,从7M到355M参数。级联预测了注意力可以简化的位置:线性化125M S-D注意力的前七层成本低于5%的困惑度,而标准注意力在相同干预下崩溃。可线性化的区域随着深度扩大。用ELU+1线性注意力替换前四层,可在完整头部维度内达到基线的1.4%以内。级联分配的架构用注意力参数换取困惑度(47%-65%更少的注意力参数,+3.9%到+8.4% PPL)。路由-过滤分解使谱预算变得清晰;级联使其具有可操作性。

英文摘要

The attention interaction matrix $QK^{\top}$ contains two entangled computations: a skew-symmetric component that redistributes information between positions (routing) and a symmetric component that scales mutual relevance (filtering). We decompose 1776 heads across five pretrained transformers and find routing operating at low rank, well below the routing capacity allocated by the weight kernel. We introduce $S$-$D$ attention as a diagnostic parameterization that disentangles routing from filtering by construction with guaranteed stability ($\mathrm{Re}(λ) \le 0$) and trains stably without layer normalization. When disentangled and unnormalized, routing self-organizes into a spectral cascade, effective rank $2$ at the first layer, expanding with depth across six scales from 7M to 355M parameters. The cascade predicts where attention can be simplified: linearizing the first seven layers of 125M $S$-$D$ attention costs ${<}5\%$ perplexity, whereas standard attention collapses under the same intervention. The linearizable region widens with depth. Replacing the first four layers with ELU+1 linear attention reaches within $1.4\%$ of baseline at full head dimension. Cascade-allocated architectures trade attention parameters for perplexity ($47\%-65\%$ fewer attention parameters at $+3.9\%$ to $+8.4\%$ PPL). The routing-filtering decomposition makes the spectral budget legible; the cascade makes it actionable.

2605.18825 2026-05-20 cs.LG cs.DC 版本更新

Not All Tokens Are Worth Caching: Learning Semantic-Aware Eviction for LLM Prefix Caches

并非所有标记都值得缓存:学习语义感知的淘汰策略用于LLM前缀缓存

Shaoke Fang, Ziang Li, Wenfei Wu, Jiatong Ji, Qingsong Liu, Ruizhi Pu

发表机构 * Peking University(北京大学) FirestAI Tsinghua University(清华大学) University of Massachusetts Amherst(马萨诸塞大学阿姆赫斯特分校) Southeast University(东南大学)

AI总结 本文提出了一种语义感知的前缀缓存淘汰策略SAECache,通过多队列架构、语义感知的标记加权机制和全适应的在线学习方案,提高了LLM服务中前缀缓存的效率,从而在不同工作负载下实现了显著的TTFT提升。

详情
AI中文摘要

前缀缓存是大型语言模型(LLM)服务中的关键优化,通过重用注意力键值(KV)状态来减少昂贵的prefill计算。然而,其效益依赖于淘汰策略,因为GPU内存有限,而现有策略如LRU通常将缓存块视为统一处理。这种观点忽略了LLM提示的一个基本属性:并非所有标记都同样值得缓存。我们显示,提示中不同的标记类型,包括系统提示、用户查询、工具输出、模型响应和推理链,其重用率可能高达756倍,但现有淘汰策略并未利用这一信号。在本文中,我们提出了SAECache(语义适应的前缀缓存淘汰策略),通过三个创新来解决这一差距:(1)一个多队列架构,将KV块路由到任务特定的队列中,使用定制的优先级指标,捕捉多轮请求中的会话重用和模板单轮请求中的结构重用;(2)一种语义感知的标记加权机制,通过淘汰反馈在线学习不同标记类型的重用价值;(3)一种完全适应的在线学习方案,用于所有参数更新,包括对数正态时间参数、位置衰减幂、队列权重和元参数,这消除了手动调优并使系统能够自动适应部署特定的工作负载特性。通过在异构工作负载上的广泛评估,我们证明SAECache在生产风格的基线之上实现了1.4x-2.7x的TTFT提升,而固定参数的替代方案在工作负载不匹配时可能会下降高达2.7x,这是我们的自适应方法完全避免的失败模式。

英文摘要

Prefix caching is a key optimization in Large Language Model (LLM) serving, reusing attention Key-Value (KV) states across requests with shared prompt prefixes to reduce expensive prefill computation. However, its benefit depends critically on the eviction policy as GPU memory is scarce, and existing policies such as LRU largely treat cached blocks uniformly. This view ignores a fundamental property of LLM prompts: not all tokens are equally worth caching. We show that different token types within a prompt, including system prompts, user queries, tool outputs, model responses, and chain-of-thought reasoning, exhibit up to 756x variation in reuse rates, yet no existing eviction policy exploits this signal. In this paper, we present SAECache (Semantic-Adaptive Eviction for prefix caches), a semantic-adaptive prefix cache eviction policy that addresses this gap through three innovations: (1) a multi-queue architecture that routes KV blocks to task-specific queues with tailored priority metrics, capturing both session reuse in multi-turn requests and structural reuse in templated single-turn requests; (2) a semantic-aware token weighting mechanism that learns the reuse value of different token types online through eviction feedback; and (3) a fully adaptive online learning schema for all parameter updates, including log-normal timing parameters, position decay power, queue weights, and meta-parameters, which eliminates manual tuning and enables automatic adaptation to deployment-specific workload characteristics. Through extensive evaluation across heterogeneous workloads, we demonstrate that SAECache achieves 1.4x-2.7x TTFT improvement over production-style baselines, while fixed-parameter alternatives can degrade by up to 2.7x under workload mismatch -- a failure mode our adaptive approach avoids entirely.

2605.18824 2026-05-20 cs.LG cs.AI cs.CL 版本更新

Fine-Grained Benchmark Generation for Comprehensive Evaluation of Foundation Models

细粒度基准生成用于基础模型的全面评估

Mohammed Saidul Islam, Negin Baghbanzadeh, Farnaz Kohankhaki, Afshin Cheraghi, Ali Kore, Shayaan Mehdi, Elham Dolatabadi, Arash Afkanpour

发表机构 * Vector Institute(Vector研究院) York University(约克大学)

AI总结 本文提出了一种自动化基准生成框架,用于生成覆盖广泛、元数据丰富且抗污染的评估问题,从而提升基础模型的全面评估能力。

详情
AI中文摘要

基础模型的评估通常依赖于缺乏全面覆盖和细粒度评估元数据的基准汇总分数。我们引入了一个自动化基准生成框架。该框架生成基于参考材料(如教科书)的评估问题,生成具有广泛覆盖、丰富元数据和抗污染性的基准。该流程采用多代理架构进行问题生成,并采用以解决方案图驱动的策略,显著提高了地面真实解决方案的可靠性。使用该框架,我们生成了三个基准:机器学习、公司金融和个人金融。专家审查发现,其地面真实错误率显著低于之前的基准,如MMLU和GSM8K。对12个商业和开源模型的评估显示,我们的基准实现了接近均匀的竞争力覆盖,并揭示了现有基准未能捕捉到的模型间性能差异。我们即将开源该框架和我们精心挑选的基准。

英文摘要

Evaluation of foundation models often rely on aggregate scores from benchmarks that lack comprehensive coverage and metadata for a fine-grained evaluation. We introduce a framework for automated benchmark generation. Our framework generates evaluation problems grounded in reference material, such as textbooks, producing benchmarks with broad coverage, rich metadata, and robustness to contamination. The pipeline employs a multi-agent architecture for problem generation and a solution-graph-driven strategy that significantly improves the reliability of ground truth solutions. Using the framework, we generate three benchmarks in Machine Learning, Corporate Finance, and Personal Finance. Expert review finds a significantly lower ground-truth error rate than previous benchmarks such as MMLU and GSM8K. Evaluation of 12 commercial and open-source models shows that our benchmarks achieve near-uniform competency coverage and surface performance differences across models that existing benchmarks fail to capture. We will open-source the framework and our curated benchmarks soon.

2605.18823 2026-05-20 cs.LG 版本更新

Multi-Pedestrian Safety Warning at Urban Intersections Use Case of Digital Twin

城市交叉口多行人安全预警的数字孪生应用案例

Yongjie Fu, Qi Gao, Mahshid Ghasemi Dehkordi, Gil Zussman, Xuan Di

发表机构 * Department of Civil Engineering and Engineering Mechanics at Columbia University(哥伦比亚大学土木工程与工程力学系) Data Science Institute(数据科学研究院) Department of Electrical Engineering at Columbia University(哥伦比亚大学电气工程系)

AI总结 本文提出一种基于紧密耦合物理-数字孪生框架的城市交叉口多行人安全预警系统,通过COSMOS无线测试床进行实地部署和虚拟现实实验,验证了系统在提高安全预警准确性和响应效率方面的有效性。

详情
AI中文摘要

数字孪生(DTs)在城市交通系统中已获得越来越多的关注;然而,其在安全关键场景中的系统性评估仍然有限。本文提出了一种基于紧密耦合物理-数字孪生框架的城市交叉口多行人安全预警系统。该系统基于纽约市的COSMOS城市级无线测试床,整合了摄像头和超宽带(UWB)、边缘-云计算、预测轨迹建模以及基于MQTT的通信,以向易受伤害道路使用者(VRUs)提供实时安全警报。该系统通过实地部署和虚拟现实(VR)实验进行评估。结果表明,系统在不同模型配置下具有高预警生成准确率、高定位准确率、高效的端到端延迟以及在发出警告时显著减少用户响应时间。所提出的DT框架提供了一种可扩展、模块化且通用的解决方案,用于复杂城市交叉口的实时多行人安全增强。

英文摘要

Digital twins (DTs) for urban transportation systems have gained increasing attention; however, their systematic evaluation in safety-critical scenarios remains limited. This paper presents a multi-pedestrian safety warning system at urban intersections enabled by a tightly coupled physical-digital twin framework. Built upon the COSMOS city-scale wireless testbed in New York City, the proposed system integrates camera and ultra-wideband (UWB), edge-cloud computing, predictive trajectory modeling, and MQTT-based communication to deliver real-time safety alerts to vulnerable road users (VRUs). The system is evaluated through both field deployment and virtual reality (VR) experiments. Results demonstrate high warning generation accuracy, localization accuracy, efficient end-to-end latency under different model configurations, and significant reductions in user response time when warnings are issued. The proposed DT framework provides a scalable, modular, and generalizable solution for real-time multi-pedestrian safety enhancement at complex urban intersections.

2605.18822 2026-05-20 cs.LG cs.AI 版本更新

Hybrid-LoRA: Bridging Full Fine-Tuning and Low-Rank Adaptation for Post-Training

Hybrid-LoRA: 桥接全微调与低秩适应以实现训练后优化

Chengqian Zhang, Wei Zhu, Kyumin Lee

发表机构 * Worcester Polytechnic Institute(沃斯特理工学院) University of Hong Kong(香港大学)

AI总结 本文提出Hybrid-LoRA框架,通过选择性地对部分模块进行全微调,其余模块使用LoRA进行适应,从而在训练后优化中实现高效性能。

详情
AI中文摘要

训练后已成为适应大型语言模型(LLMs)以实现复杂下游行为(如指令遵循、偏好对齐和多步推理)的关键方法。最近,基于可验证奖励的强化学习(RLVR)作为一种特别有效的训练后范式,通过如GRPO和GSPO等无批评算法实现了可扩展的优化。然而,使用全微调(FFT)的RLVR训练后方法需要大量GPU内存并导致高训练成本。尽管参数高效微调(PEFT)方法如低秩适应(LoRA)能有效降低计算成本,但它们在复杂推理任务的训练后性能上往往存在显著差距。在本文中,我们提出了Hybrid-LoRA,一种高效的训练后框架,该框架选择性地对一小部分不太适合低秩适应的模块进行全微调,而对其余模块使用LoRA进行适应。我们引入了一个新的Hybrid-LoRA Score,用于在固定参数预算下对候选模块按其对低秩适应的敏感性进行排序。实验表明,在10%的全微调模块预算下,Hybrid-LoRA能够接近全微调性能,其余候选模块通过LoRA进行适应, consistently outperforming four state-of-the-art PEFT post-training baselines,实现了高达5.65%和平均4.36%的改进。

英文摘要

Post-training has become essential for adapting large language models (LLMs) to complex downstream behaviors, including instruction following, preference alignment, and multi-step reasoning. Reinforcement learning with verifiable rewards (RLVR) has recently emerged as a particularly effective post-training paradigm for improving reasoning capabilities, with critic-free algorithms such as GRPO and GSPO enabling scalable optimization. However, RLVR post-training with full fine-tuning (FFT) requires substantial GPU memory and incurs high training costs. Although parameter-efficient fine-tuning (PEFT) methods, such as Low-Rank Adaptation (LoRA), effectively reduce computational costs, they often suffer from a noticeable performance gap compared to full fine-tuning in post-training for complex reasoning tasks. In this paper, we propose Hybrid-LoRA, an efficient hybrid post-training framework that selectively applies full fine-tuning to a small subset of modules less suited to low-rank adaptation, while adapting the remaining components with LoRA. We introduce a novel Hybrid-LoRA Score to rank candidate modules according to their sensitivity to low-rank adaptation under a fixed parameter budget. Experiments show that Hybrid-LoRA closely matches full fine-tuning performance under a 10% full fine-tuning module budget, with the remaining candidate modules adapted by LoRA, consistently outperforming four state-of-the-art PEFT post-training baselines, achieving improvements of up to 5.65% and on average 4.36% over the best baseline.

2605.18821 2026-05-20 cs.LG cs.CR 版本更新

Quantum Adversarial Machine Learning: From Classical Adaptations to Quantum-Native Methods

量子对抗机器学习:从经典适应到量子原生方法

Roozbeh Razavi-Far, Mohammad Meymani, Erfan Mahmoudinia, Dorsa Vazirzade, Peyman Paknezhad, Fateme Ghasemi, Saeed Saravani, Somayeh Nikkhoo, Kimia Haghjooei

发表机构 * Faculty of Computer Science, University of New Brunswick(新不伦瑞克大学计算机科学学院) Department of Electrical Engineering, Amirkabir University of Technology(技术学院电子工程系) Faculty of Mathematical and Computer Science, Kharazmi University(卡扎尔米大学数学与计算机科学学院) Pázmány Péter Catholic University(帕兹曼·彼得天主教大学) Department of Computer Engineering, Amirkabir University of Technology(技术学院计算机工程系) Department of Computer Engineering, Ferdowsi University of Mashhad(马赞德兰大学计算机工程系) Department of Computer Science, Tarbiat Modares University(塔里克·莫达res大学计算机科学系)

AI总结 本文研究量子对抗机器学习中的攻击与防御策略,探讨其理论基础、发展趋势和关键挑战。

详情
Journal ref
Artif Intell Rev (2026)
AI中文摘要

机器学习已革新了众多工业领域。尽管取得了近期进展,机器学习模型仍然容易受到对抗性威胁。对抗性机器学习研究这些脆弱性以构建稳健的机器学习模型。量子机器学习是连接量子计算和经典机器学习的交叉领域。虽然量子机器学习在回归、分类和生成建模等复杂任务中可能超越经典机器学习,但它仍然容易受到对抗性攻击。鉴于量子计算和机器学习的近期进展,量子对抗性机器学习领域应运而生,以研究量子机器学习的脆弱性、可能的攻击和新型量子增强的防御策略。在本文的综述中,我们提供了量子对抗性机器学习的详细概述,探讨了现有的攻击和防御措施。我们还回顾了该领域的理论基础、新兴趋势和关键挑战。

英文摘要

Machine learning has revolutionized numerous industrial domains. Despite recent advances, machine learning models remain vulnerable to adversarial threats. Adversarial machine learning is a field that studies these vulnerabilities to build robust machine learning models. Quantum machine learning is an interdisciplinary field that bridges quantum computing and classical machine learning. While quantum machine learning shows potentials to outperform classical machine learning in complex tasks such as regression, classification, and generative modeling, it remains vulnerable to adversarial attacks. Given the recent advancements in quantum computing and machine learning, the quantum adversarial machine learning field has emerged to study the vulnerabilities of quantum machine learning, possible attacks, and novel quantum-enhanced defense strategies. In this survey, we provide a detailed overview on quantum adversarial machine learning and explore the existing attacks and countermeasures. We also review the theoretical underpinnings of this area, emerging trends, and critical challenges.

2605.18820 2026-05-20 cs.LG cs.AI 版本更新

Emergence of Frontier Superposition: Möbius attractor and Cascade Supervision

前沿叠加的涌现:莫比乌斯吸引子与级联监督

Hongyu Gu, Jingwen Fu

发表机构 * University of Science and Technology of China(中国科学技术大学) Zhongguancun Academy(中关村学院)

AI总结 本文研究了通过叠加实现深度推理的问题,提出莫比乌斯吸引子和级联监督方法,证明了在Erdős-Rényi图上,叠加推理的涌现是通过建筑和监督的贡献实现的。

Comments 40 pages, 3 figures

详情
AI中文摘要

叠加允许Transformer在深度推理中并行处理整个推理前沿,通过有限深度的前向传递而不是展开串行的思维链token。虽然Zhu等人(2025)在单一残差流中手工构建了一个等权重的广度优先前沿用于图可达性,但仍未确定梯度下降能否在排列对称的鞍点中找到这个目标。我们通过隔离建筑和监督的贡献,填补了在Erdős-Rényi图上通过叠加实现可达性的问题。在建筑方面,我们识别出一个莫比乌斯吸引子:在树的 regime 中,层间动态减少到一个1D莫比乌斯映射,其零集是一个共维数为一的全局最优解 manifold,包含等权重叠加状态。在监督方面,我们识别出级联监督:一个损失类别,其反向传播同时提供(A)选择性 bootstrap,(B)梯度在深度的持续性,以及(C)每一步的区分(例如L_sup和L_node)。端到端监督失败于条件(B),并被证明是不足的:在图的扇出和停滞前到达 manifold 之前,层c的内部梯度衰减为(np)^{-(D-c-2)/2}。我们的论点:莫比乌斯吸引子 + 级联监督 = 叠加推理的涌现。参数无关的衰减定律预测在深度D=3时,最终步骤余弦为0.35 vs. 0.71(端到端 vs. 级联);实验证实0.37 vs. 0.69,每一步的匹配误差在0.02以内。

英文摘要

Superposition allows Transformers to reason in depth, carrying an entire reasoning frontier in parallel through a bounded-depth forward pass instead of unrolling serial chain-of-thought tokens. While Zhu et al. (2025) hand-crafted an equal-weight breadth-first frontier in a single residual stream for graph reachability, it remained open whether gradient descent could ever find this target amidst permutation-symmetric saddles. We close this gap on Reachability-by-Superposition over Erdős-Rényi graphs by isolating architectural and supervisional contributions. Architecturally, we identify a Möbius attractor: under $S_n$-symmetry in the tree regime, layerwise dynamics reduce to a 1D Möbius map whose zero set is a codimension-one manifold of global optima containing the equal-weight superposition state. On the supervision side, we identify Cascade Supervision: a loss class whose backward pass simultaneously delivers (A) selectivity bootstrap, (B) gradient persistence across depth, and (C) per-step discrimination (e.g., \mathcal{L}_{sup} and \mathcal{L}_{node}). End-to-end supervision fails condition (B) and is provably insufficient: internal gradients at layer c decay as (np)^{-(D-c-2)/2} in the graph fan-out and stall before the manifold is reached. Our thesis: Möbius attractor + Cascade Supervision = emergence of superposition reasoning. The parameter-free decay law predicts a final-step cosine of 0.35 vs. 0.71 (end-to-end vs. cascade) at depth D=3; experiments confirm 0.37 vs. 0.69, matching within 0.02 at every step.

2605.18819 2026-05-20 cs.LG 版本更新

Efficient Conditioning Why Pseudo Observation Batch Bayesian Optimization Works When It Does not

高效条件化:为何伪观测批量贝叶斯优化在某些情况下有效

Kumbha Nagaswetha, Rabi Pathak

AI总结 本文研究了批量并行贝叶斯优化中常用于批量选择的常数骗子(CL)、克里格信徒(KB)和幻想模型的有效性,揭示了高效条件化作为关键的替代属性,即在数据增强时能够以闭合形式更新预测。通过证明高斯过程满足这一要求,以及任何单调非递减于后验不确定性的获取函数(如EI、UCB、PI)都具有类似行为,统一了CL、KB和幻想模型为单一条件机制的不同实例,并建立了与局部惩罚(LP)的定量联系和与决定性点过程(DPPs)的定性联系。

详情
AI中文摘要

常数骗子(CL)、克里格信徒(KB)和幻想模型广泛用于并行贝叶斯优化中的批量选择,但缺乏统一的理论来解释它们的有效性和在何种条件下失效。我们识别出高效条件化是关键的替代属性,即在数据增强时能够以闭合形式更新预测。我们证明高斯过程满足这一要求,产生可证明不同的批量点,分离阶为l,并且对于任何单调非递减于后验不确定性的获取函数(如EI、UCB、PI),以及汤普森采样具有类似的行为。我们将CL、KB和幻想模型统一为单一的条件机制的不同实例,仅在谎言值分布上有所不同,并建立了与局部惩罚(LP)的定量联系和与决定性点过程(DPPs)的定性联系。为了区分模型结构与优化器随机性,我们引入了结构多样性诊断(SDD),一种可重用的方法用于测试替代模型的兼容性。在Hartmann6D、Ackley 8D、Levy10D和SVM超参数调节的实验中验证了所有理论预测:CL或KB隐含的惩罚匹配或优于显式的LP贪婪条件化,达到与联合qEI类似的收敛;高效条件化扩展到多二次径向基网络;参数替代模型即使在完全重新训练(随机森林)时仍产生退化的批量,而神经网络仅在15倍的墙钟成本下恢复多样性,优于高斯过程条件化。鲁棒性在多个初始数据集和观察噪声下得到确认。

英文摘要

Constant Liar (CL), Kriging Believer (KB), and fantasy models are widely used for batch selection in parallel Bayesian Optimization, yet a unified theory explaining their effectiveness and conditions under which they fail has been lacking. We identify efficient conditioning as the key surrogate property the ability to update predictions in closed form when data is augmented. We prove that Gaussian Processes satisfy this requirement, producing provably distinct batch points with separation of order l, and that this holds for any acquisition function monotonically non decreasing in posterior uncertainty (EI, UCB, PI), with qualitatively similar behavior for Thompson Sampling. We unify CL, KB, and fantasy models as instances of a single conditioning mechanism differing only in the lie value distribution, and draw quantitative connections to Local Penalization (LP) and qualitative connections to Determinantal Point Processes (DPPs). To disentangle model structure from optimizer randomness, we introduce the Structural Diversity Diagnostic (SDD), a reusable methodology for testing surrogate compatibility. Experiments on Hartmann6D, Ackley 8D, Levy10D, and SVM hyperparameter tuning validate all theoretical predictions: CL or KBs implicit penalty matches or outperforms explicit LP greedy conditioning achieves convergence on par with joint qEI efficient conditioning extends to Multiquadric RBF networks; and parametric surrogates produce degenerate batches even when fully retrained (random forests), while neural networks regain diversity only at 15x the wall clock cost of GP conditioning. Robustness is confirmed across multiple initial datasets and under observation noise.

2605.18818 2026-05-20 cs.AI cs.LG cs.SE 版本更新

Operationalizing Document AI: A Microservice Architecture for OCR and LLM Pipelines in Production

将文档AI operationalize:一种用于OCR和LLM流水线的微服务架构

Yao Fehlis, Benjamin Bengfort, Zhangzhang Si, Vahid Eyorokon, Prema Roman, Patrick Deziel, Devon Slonaker, Steve Veldman, Ben Johnson, Joyce Rigelo, Michael Wharton, Steve Kramer

AI总结 本文提出了一种微服务架构,用于在生产环境中实现文档理解,通过整合多个模型的流水线,包括分类、OCR和LLM结构字段提取,并展示了在每小时处理数千页文档的经验。

详情
AI中文摘要

学术研究往往集中在新的文档理解模型上,导致文献中模型定义与大规模生产模型之间存在较大差距。为了缩小这一差距,我们提出了一种微服务架构,该架构封装了多个模型的流水线,包括分类、光学字符识别(OCR)和大型语言模型结构字段提取,并展示了该流水线在每小时处理数千页文档的经验。我们描述了主要的设计决策,包括混合分类、将GPU绑定的推理与CPU绑定的编排分离、使用异步处理处理流水线中的许多I/O绑定操作,以及独立的水平扩展策略。通过批量分析,我们发现了两个令人惊讶的定性发现,这些发现影响了生产部署:OCR而不是语言模型解析主导了端到端延迟,并且系统饱和度由共享的GPU推理容量而不是工作程序数量决定。我们的目标是为从业者提供具体的架构模式,以构建在基准之外有效工作的文档理解系统;有效地将模型 operationalize 在生产环境中。

英文摘要

Academic research tends to focus on new models for document understanding creating a wide gap in the literature between model definition and running models at production scale. To close that gap, we present a microservice architecture that encapsulates pipelines of multiple models for classification, optical character recognition (OCR), and large language model structured field extraction as well as our experience running this pipeline on thousands of multi-page documents per hour. We describe our primary design decisions, including a hybrid classification, separation of GPU-bound inference from CPU-bound orchestration, use of asynchronous processing for the many IO-bound operations in the pipeline, and an independent, horizontal scaling strategy. Using batch profiling, we identified two surprising qualitative findings that shape production deployments: OCR, not language-model parsing, dominates end-to-end latency, and the system saturates at a concurrency determined by shared GPU-inference capacity rather than worker count. Our goal is to provide practitioners with concrete architectural patterns for building document understanding systems that work beyond the benchmark; effectively operationalizing models in production.

2605.18816 2026-05-20 cs.LG cs.AI 版本更新

Symmetry in the Wild: The Role of Equivariance in Neural Fluid Surrogates

野生中的对称性:等变性在神经流体代理中的作用

Patryk Rygiel, Julian Suk, Kak Khee Yeung, Christoph Brune, Jelmer M. Wolterink

发表机构 * Department of Applied Mathematics(应用数学系) Technical Medical Centre(技术医学中心) Cardiovascular Health Technology Centre(心血管健康技术中心) University of Twente(特文特大学) Department of Computer Science(计算机科学系) Munich Center for Machine Learning(慕尼黑机器学习中心) Technical University of Munich(慕尼黑技术大学) Department of Surgery(外科系) Amsterdam UMC, Location(阿姆斯特丹大学医学中心,地点) University of Amsterdam(阿姆斯特丹大学) Amsterdam Cardiovascular Sciences(阿姆斯特丹心血管科学) Digital Society Institute(数字社会研究所)

AI总结 本文研究了等变性在神经流体代理中的作用,探讨了在不同分布对齐和真实度的任务中,等变性如何提高泛化能力,并介绍了AB-GATr模型在处理耦合表面和体积量时的效率。

详情
AI中文摘要

神经代理能够将计算流体动力学(CFD)模拟的计算速度提升几个数量级,有望改变工程和医疗流程。在现实应用中使用神经代理需要解决可扩展性问题,包括大规模、高分辨率表面和体积网格以及定制架构,并通过归纳偏置来应对有限的训练数据。群等变架构是引入此类偏置的一种系统方法,但当学习问题本身破坏对称性时,例如由于数据集中的强分布对齐,可能会产生不利影响。在本工作中,我们探讨了在具有不同分布对齐和真实度的任务中,等变性如何提高神经CFD代理的泛化能力,涵盖汽车空气动力学和血流(血动力学)。为了系统评估等变性在问题可扩展性极限处的附加价值,我们引入了Anchored-Branched Geometric Algebra Transformer(AB-GATr),一种整合了可扩展性和对称性保持的神经代理,能够以E(3)等变的方式高效建模耦合的表面和体积量。我们发现,在强对齐的空气动力学数据集上,即那些破坏对称性的数据集,强制等变性会降低分布内性能。相反,在具有不同几何形状和变化对齐的血动力学基准测试中,等变性始终有益。此外,在所有基准测试中,AB-GATr的显式等变性通过数据增强始终优于隐式对称学习。我们的发现表明,等变性并非在所有领域都有益,但在缺乏强数据规律的问题中带来了实质性的优势。

英文摘要

Neural surrogates enable orders-of-magnitude acceleration of computational fluid dynamics (CFD) simulations, with the potential to transform engineering and healthcare workflows. Neural surrogate use in real-world applications requires addressing scalability to large, high-resolution surface and volume meshes, as well as to bespoke architectures, and accounting for limited training data through the use of inductive biases. Group-equivariant architectures are a principled way to introduce such bias, yet they can be detrimental when the learning problem itself breaks symmetry, for example, due to strong distributional alignment in the dataset. In this work, we investigate under which conditions equivariance improves generalization in neural CFD surrogates across tasks with increasing levels of distributional alignment and realism, covering automotive aerodynamics and blood flow (hemodynamics). To systematically assess the added value of equivariance at the limit of problem scaling, we introduce the Anchored-Branched Geometric Algebra Transformer (AB-GATr), a neural surrogate that integrates scalability and symmetry preservation to efficiently model coupled surface and volume quantities in an $E(3)$-equivariant manner. We find that on strongly aligned aerodynamics datasets, i.e., those that break symmetry, enforcing equivariance can degrade in-distribution performance. In contrast, across hemodynamic benchmarks with diverse geometries and varying alignment, equivariance is consistently beneficial. Moreover, across all benchmarks, the explicit equivariance of AB-GATr reliably outperforms implicit symmetry learning through data augmentation. Our findings showcase that equivariance is not universally beneficial across domains, yet it brings tangible advantages in problems lacking strong data regularities.

2605.18815 2026-05-20 cs.LG cs.DC 版本更新

DynaTrain: Fast Online Parallelism Switching for Elastic LLM Training

DynaTrain: 快速在线并行切换用于弹性大语言模型训练

Yuanqing Wang, Yuchen Zhang, Hao Lin, Junhao Hu, Chunyang Zhu, Quanlu Zhang, Boxun Li, Guohao Dai, Zhi Yang, Daning Cheng, Yunquan Zhang, Yu Wang

发表机构 * Institute of Computing Technology, CAS(中国科学院计算技术研究所) Peking University(北京大学) Infinigence AI Shanghai Jiao Tong University(上海交通大学) Tsinghua University(清华大学)

AI总结 本文提出DynaTrain,一种能够快速在线重新配置任意多维并行性的分布式训练系统,通过虚拟参数空间抽象统一所有分布式训练状态,实现并行配置的确定性映射,并在密集和MoE模型上展示了显著的性能提升。

Comments GitHub Repo: https://github.com/infinigence/ElasticMegatron

详情
AI中文摘要

现代大型语言模型(LLM)训练本质上是动态的:资源波动、RLHF阶段转换和集群弹性持续地改变最优并行性布局,对现有基于静态执行模型的训练框架构成重大挑战。我们提出了DynaTrain,一种支持亚秒级在线重新配置的分布式训练系统。其核心是虚拟参数空间(VPS)抽象,该抽象将所有分布式训练状态统一到一个逻辑坐标空间中,将任何并行性配置转换为确定性映射,并将复杂的转换折叠为可管理的几何交集。在VPS之上,状态路由和转换层在内存感知、无死锁的调度下执行rank-local传输,而弹性设备管理器则将新世界构建与正在进行的训练重叠,以掩盖拓扑变化成本。在密集和MoE模型上,DynaTrain能够在2秒内重新配置70B密集模型,在4.36秒内重新配置235B MoE模型,性能优于最先进的检查点基和弹性系统,提升幅度高达三个数量级,同时保持正确性。

英文摘要

Modern large language model (LLM) training is inherently dynamic: resource fluctuations, RLHF phase shifts, and cluster elasticity continually reshape the optimal parallelism layout, posing a significant challenge to existing training frameworks built around a static execution model. We present DynaTrain, a distributed training system for sub-second, online reconfiguration across arbitrary multi-dimensional parallelism. At its core, we propose a Virtual Parameter Space (VPS) abstraction that unifies all distributed training states under one logical coordinate space, turning any parallelism configuration into a deterministic mapping and collapsing complex transition into manageable geometric intersections. On top of VPS, a state routing-and-transition layer executes rank-local transfers under a memory-aware, deadlock-free schedule, and an Elastic Device Manager overlaps new-world construction with ongoing training to mask topology-change cost. On dense and MoE models up to 235B parameters, DynaTrain reconfigures a 70B dense model in under 2s and a 235B MoE model in 4.36s, outperforming state-of-the-art checkpoint-based and elastic systems by up to three orders of magnitude while preserving correctness.

2605.18814 2026-05-20 cs.LG 版本更新

How Faithful Is Trajectory-Based Data Attribution? Error Sources, Remedies, and Practical Guidelines

轨迹数据归因的可信度如何?误差来源、缓解方法和实用指南

Junwei Deng, Pingbang Hu, Suliang Jin, Hao Lu, Jiachen T. Wang, Shichang Zhang, Jiaqi W. Ma

发表机构 * University of Illinois Urbana-Champaign(伊利诺伊大学厄巴纳-香槟分校) University of Michigan(密歇根大学) Princeton University(普林斯顿大学) Harvard University(哈佛大学)

AI总结 本文系统分析了轨迹数据归因方法的误差来源,并提出缓解方法和实用指南,通过将总误差分为配置级、算法级和系统级,改进了归因的准确性,并为数据选择提供了可行的实践指导。

详情
AI中文摘要

基于轨迹的数据归因方法通过展开训练轨迹来估计训练样本对模型预测的影响。它们被广泛应用于数据选择、数据估值和模型诊断等应用,但缺乏对这些方法的全面误差分析,引发了对方法可信度的担忧,并阻碍了可靠部署。在本文中,我们提供了轨迹数据归因方法误差来源的首次系统分析,以及具体的缓解方法和下游应用的实用指南。我们将总误差分为三类:配置级、算法级和系统级。我们做出了三个贡献。首先,我们识别出优化器不匹配是主导的配置级误差:现有方法在其归因下假设使用SGD,即使对于使用现代事实上的优化器AdamW训练的模型也是如此。我们提出了AdamW-influence,以充分考虑AdamW的优化动态,在四个设置中(MLP、CNN、GPT-2和Llama 3.2-1B)估计与真实影响之间的Spearman相关性提高了10%到超过300%。其次,我们隔离了剩余的算法级误差,源于一阶泰勒近似,识别了学习率和轨迹长度作为误差大小的决定因素,并推导出一个闭合形式的误差代理,可以在原始轨迹上评估而无需重新训练。第三,我们将这些见解转化为数据选择的实用指南,通过在K-step前瞻框架下统一离线和在线策略。在此框架下,在线选择具有短时间范围通常匹配或超过离线,且最佳时间范围可以与学习率联合调节。共同,这些结果将框架转化为从业者可操作的选择配方。

英文摘要

Trajectory-based data attribution methods estimate the influence of training samples on model predictions by unrolling the training trajectory. They are widely used in applications such as data selection, data valuation, and model diagnosis, but there is a lack of comprehensive error analysis of these methods, raising concerns about method faithfulness and hindering reliable deployment. In this work, we provide the first systematic analysis of error sources in trajectory-based data attribution, together with concrete remedies to mitigate them and practical guidelines for downstream use. We organize the total error into three categories, config-level, algorithm-level, and system-level. We make three contributions. First, we identify optimizer mismatch as the dominant config-level error: existing methods derive their attribution under the assumption of SGD, even for models trained with the modern de facto optimizer AdamW. We propose AdamW-influence to fully account for AdamW's optimization dynamics, yielding improvements from 10% to over 300% in Spearman correlation between estimated and ground-truth influence across four settings spanning MLP, CNN, GPT-2, and Llama 3.2-1B. Second, we isolate the remaining algorithm-level error arising from the first-order Taylor approximation, identify the learning rate and trajectory length as factors governing the error magnitude, and derive a closed-form error proxy that can be evaluated along the original trajectory without retraining. Third, we translate these insights into practical guidelines for data selection by unifying offline and online strategies under a K-step look-ahead framework. Under this framework, online selection with a short horizon often matches or exceeds offline, and the optimal horizon can be tuned jointly with the learning rate. Together, these results turn the framework into an actionable selection recipe for practitioners.

2605.18813 2026-05-20 cs.LG cs.AI 版本更新

Composition of Memory Experts for Diffusion World Models

记忆专家的组合用于扩散世界模型

Sebastian Stapf, Pablo Acuaviva Huertos, Aram Davtyan, Paolo Favaro

发表机构 * Computer Vision Group(计算机视觉组) Department of Computer Science(计算机科学系) University of Bern(伯恩大学)

AI总结 本文提出了一种基于扩散的世界模型框架,通过组合专门化的记忆专家来解决记忆与效率之间的权衡问题,提升了时间一致性、过去观察的回忆和导航性能。

详情
Journal ref
Proceedings of the Fourteenth International Conference on Learning Representations (ICLR), 2026
AI中文摘要

世界模型旨在预测与过去观察一致的合理未来,这是强化学习中规划和决策的关键能力。然而,现有架构面临根本性的记忆权衡:转换器保留局部细节但受二次注意限制,而递归和状态空间模型更高效但以牺牲保真度为代价。为克服这一权衡,我们建议将未来-过去一致性与任何单一架构解耦,并利用一组专门的专家。我们引入了一种基于扩散的框架,通过对比产品-专家公式整合异构记忆模型。我们的方法实现了三个互补的角色:短期记忆专家捕捉精细的局部动态,长期记忆专家通过轻量级测试时微调在外部扩散权重中存储事件历史,以及空间长期记忆专家强制几何和空间一致性。这种组合设计避免了模式崩溃,并在不产生二次成本的情况下扩展到长上下文。在模拟和现实世界基准测试中,我们的方法提高了时间一致性、过去观察的回忆和导航性能,建立了一种新的构建和操作记忆增强扩散世界模型的范式。

英文摘要

World models aim to predict plausible futures consistent with past observations, a capability central to planning and decision-making in reinforcement learning. Yet, existing architectures face a fundamental memory trade-off: transformers preserve local detail but are bottlenecked by quadratic attention, while recurrent and state-space models scale more efficiently but compress history at the cost of fidelity. To overcome this trade-off, we suggest decoupling future-past consistency from any single architecture and instead leveraging a set of specialized experts. We introduce a diffusion-based framework that integrates heterogeneous memory models through a contrastive product-of-experts formulation. Our approach instantiates three complementary roles: a short-term memory expert that captures fine local dynamics, a long-term memory expert that stores episodic history in external diffusion weights via lightweight test-time finetuning, and a spatial long-term memory expert that enforces geometric and spatial coherence. This compositional design avoids mode collapse and scales to long contexts without incurring a quadratic cost. Across simulated and real-world benchmarks, our method improves temporal consistency, recall of past observations, and navigation performance, establishing a novel paradigm for building and operating memory-augmented diffusion world models.

2605.18812 2026-05-20 cs.LG cs.CL cs.IR 版本更新

PASC: Pipeline-Aware Conformal Prediction with Joint Coverage Guarantees for Multi-Stage NLP and LLM Pipelines

PASC:面向多阶段NLP和LLM流水线的管道感知置信区间

Varun Kotte

发表机构 * Independent Researcher(独立研究者)

AI总结 本文提出PASC,一种面向多阶段NLP和LLM流水线的管道感知置信区间方法,通过联合覆盖保证提升多阶段流水线的置信区间性能。

详情
AI中文摘要

现代NLP和LLM系统是流水线:命名实体识别(NER)->实体消歧(NED)->实体类型、检索增强生成(检索器->读者),以及代理链(规划器->工具->批评者)。错误在各阶段累积,但现有不确定性量化方法要么独立校准每个阶段(无联合覆盖),要么应用Bonferroni联合界(有联合覆盖但保守)。我们提出了PASC(Pipeline-Aware Split Conformal),将多阶段联合覆盖转换为单个标量置信区间问题,基于联合最大不一致性分数。PASC提供了一个有限样本分布无关的保证,所有K阶段同时覆盖的概率至少为1 - alpha,并且几乎紧致,误差不超过1/(n+1)。在CoNLL-2003上的三阶段NER->NED->实体类型流水线中,PASC实现了96.4%的端到端覆盖,优于Bonferroni的93.4%和独立CP的86.5%,在相同平均预测集大小(1.083)下。在分布偏移至WNUT-17推特和WikiNEuRal维基数据时,PASC在测试偏移设置中保持目标覆盖,而独立CP下降到59%。PASC只需一次分位数计算,运行速度比Bonferroni快1.7倍,并可扩展到K=6阶段,其中独立CP下降到0.53端到端覆盖。相同的联合最大分数减少直接应用于复合LLM系统和代理流水线。

英文摘要

Modern NLP and LLM systems are pipelines: named entity recognition (NER) -> entity disambiguation (NED) -> entity typing, retrieval-augmented generation (retriever -> reader), and agentic chains of planner -> tool -> critic. Errors compound across stages, but existing uncertainty quantification methods either calibrate each stage independently (no joint coverage) or apply a Bonferroni union bound (joint coverage, but conservative). We present PASC (Pipeline-Aware Split Conformal), which reduces multi-stage joint coverage to a single scalar conformal prediction problem on the joint maximum nonconformity score. PASC provides a finite-sample distribution-free guarantee that all K stages are simultaneously covered with probability at least 1 - alpha, and is nearly tight up to a 1/(n+1) factor. On a three-stage NER -> NED -> entity-typing pipeline over CoNLL-2003, PASC achieves 96.4% end-to-end coverage versus 93.4% for Bonferroni and 86.5% for independent CP, at identical average prediction set size (1.083). Under distribution shift to WNUT-17 Twitter and WikiNEuRal Wikipedia data, PASC empirically maintains the target coverage in the tested shift settings while independent CP collapses to 59%. PASC requires a single quantile computation, runs 1.7x faster than Bonferroni, and scales to K = 6 stages where independent CP drops to 0.53 end-to-end coverage. The same joint-maximum-score reduction applies directly to compound LLM systems and agent pipelines.

2605.18810 2026-05-20 cs.LG cs.AI 版本更新

D-PACE: Dynamic Position-Aware Cross-Entropy for Parallel Speculative Drafting

D-PACE:动态位置感知交叉熵用于并行推测草案

Tianyu Wu, Yu Yao, Zhenting Qi, Han Zheng, Zhuohan Wang, Haoran Ma, Lawrence Liao, Himabindu Lakkaraju, Ju Li, Yilun Du

发表机构 * Harvard(哈佛大学) MIT(麻省理工学院)

AI总结 本文提出D-PACE,一种动态位置感知交叉熵,用于改进并行推测草案的训练,通过动态调整位置权重以提高生成速度和输出长度。

详情
AI中文摘要

推测解码通过让小型草案生成器并行生成token,由更大目标模型验证,从而加速LLM推理。最近的扩散式并行草案生成器如DFlash在一次前向传递中预测完整的B-token块,使深度草案生成器和更长的接受块成为可能。然而,现有多token草案生成器目标通常使用固定的位置依赖加权计划,如头部依赖权重或块位置衰减,这在训练过程中无法适应限制接受的位置变化。为此,我们从可微的替代品中推导出每位置的训练权重,使每个位置的权重与其log概率梯度贡献相匹配。所得到的损失,D-PACE(动态位置感知交叉熵),将训练信号转向当前限制接受的位置,随着草案生成器的改进。在六个基准、两个Qwen3-4B草案深度、两个解码温度和两个额外的目标模型上,D-PACE一致地提高了墙钟加速速度和平均生成长度,测量训练时间开销为2.3%,且不改变草案生成器的架构或推理过程。

英文摘要

Speculative decoding accelerates LLM inference by having a small drafter propose tokens that a larger target model verifies in parallel. Recent diffusion-based parallel drafters such as DFlash predict the full B-token block in one forward pass, enabling deeper drafters and longer accepted blocks. However, existing multi-token drafter objectives often use fixed position-dependent weighting schedules, such as head-dependent weights or block-position decays, which do not adapt as the positions limiting acceptance change during training. To address this, we derive per-position training weights from a differentiable surrogate of expected accepted draft length, matching the weight of each position to its log-probability gradient contribution. The resulting loss, D-PACE (Dynamic Position-Aware Cross-Entropy), shifts training signal toward positions that currently limit acceptance as the drafter improves. Across six benchmarks, two Qwen3-4B draft depths, two decoding temperatures, and two additional target models, D-PACE consistently improves both wall-clock speedup and average emitted length, with 2.3\% measured training-time overhead and no changes to the drafter architecture or inference procedure.

2605.18809 2026-05-20 cs.LG cs.AI 版本更新

Metric-Gradient Projection for Stable Multi-Agent Policy Learning

基于度量梯度的稳定多智能体策略学习

Zuyuan Zhang, Sizhe Tang, Mahdi Imani, Tian Lan

发表机构 * The George Washington University(乔治华盛顿大学) Northeastern University(东北大学)

AI总结 本文提出HPML方法,通过将多智能体系统的联合更新场视为L²空间中的向量场,并计算其在最接近度量梯度势流上的Hodge型投影,从而提升多智能体强化学习的稳定性。

详情
AI中文摘要

一般和解的多智能体学习通常由堆叠更新场主导,其中每个智能体的策略更新会改变其他智能体面临的优化景观。这种耦合可以将可积分的集体改进组件与循环交互动力学纠缠在一起,导致多智能体学习缓慢或不稳定。现有方法,如正则化、信用分配和共识方法,通过局部或算法修改稳定MARL;HPML通过将联合更新场投影到度量梯度组件来补充它们。我们引入HPML(Hodge-Projected Multi-agent Learning),将多智能体系统的联合更新场视为L²空间中的向量场,并计算其在最接近度量梯度势流上的Hodge型投影。HPML遵循投影组件作为更新方向,从而在所选度量和采样度量下获得最接近的度量梯度场。投影通过变分定义,由泊松型方程表征,并通过基于图的和放缩神经网络实现,从样本中恢复投影方向。我们证明投影动力学具有Lyapunov势,并能产生具有显式加性非势项的平衡间隙界。受控实验验证了几何机制,CTDE基准测试显示当HPML用作MARL流水线中的插件投影层时,稳定性和归一化回报有所提高。

英文摘要

General-sum multi-agent learning is often governed by a stacked update field in which each agent's policy update changes the optimization landscape faced by the others. This coupling can entangle an integrable component of collective improvement with cyclic interaction dynamics, leading to slow or unstable multi-agent learning. Existing approaches, such as regularization, credit assignment, and consensus methods, stabilize MARL through local or algorithmic modifications; HPML complements them by projecting the joint update field onto a metric-gradient component. We introduce \textbf{HPML} (\textbf{H}odge-\textbf{P}rojected \textbf{M}ulti-agent \textbf{L}earning), which views the joint update field of a multi-agent system as an element of an $L^2$ space of vector fields and computes a Hodge-type projection onto the closest metric-gradient potential flow. HPML follows the projected component as the update direction, yielding the closest metric-gradient field under the chosen metric and sampling measure. The projection is defined variationally, characterized by a Poisson-type equation, and implemented through graph-based and amortized neural realizations that recover projected directions from samples. We show that the projected dynamics admit a Lyapunov potential and yield equilibrium-gap bounds with an explicit additive non-potentiality term. Controlled experiments validate the geometric mechanism, and CTDE benchmarks show improved stability and normalized return when HPML is used as a plug-in projection layer in MARL pipelines.

2605.18808 2026-05-20 cs.LG cs.AI cs.CL 版本更新

Compositional Literary Primitives in Instruction-Tuned LLMs: Cross-Architectural SAE Features for Self, Style, and Affect

在指令微调的LLM中构建组合文学原语:跨架构SAE特征用于自我、风格和情感

Joao Paulo Cavalcante Presa, Savio Salvarino Teles de Oliveira

发表机构 * Federal University of Goias(戈亚斯联邦大学)

AI总结 本文通过稀疏自编码器研究了指令微调的LLM中组合文学原语的架构,发现四种特征类别,并通过跨架构SAE特征验证了自我、风格和情感的表达能力。

Comments 36 pages, 6 figures

详情
AI中文摘要

我们通过在中层残差流上使用稀疏自编码器,对两个指令微调的大型语言模型(Llama 3.1 8B-Instruct和Gemma 2 9B-IT)的文学原语组合架构进行了表征。四种特征类别出现:促进目标情感词的命名门,一个包含第一人称注册特征的十一自我簇,风格注册调节器(show-don't-tell和陌生化),以及仅由多特征引导产生的组合情感。在应用于27类情感分类法(Cowen-Keltner)的强制选择5-LLM判断小组中,Llama通过结合命名门、多特征食谱和单个自我特征引导实现了完全27/27覆盖;Gemma在adoration作为单一残差严格失败的情况下达到23/27。在随机判断中,每个单元格通过的概率约为$10^{-3}$,整个目录中两个种子假阳单元格的预期数量可忽略不计,因此观察到的覆盖度不一致于偶然。在严格与柔和判断对比中存在跨架构不对称性:在相同生成中,判断者在Llama输出上比在Gemma输出上更一致,因为Llama输出更直接地命名目标情感,而Gemma输出则通过场景和意象来唤起情感。两种架构都包含同时作为注册标记和情感发射器的自我特征,包括每个架构中一个最RLHF加载的自我特征,该特征在某一操作 regime 中增强机构Helper-AI人格,并在相同校准系数下产生可分类情感的输出。方法上,本文提出了一个三阶段验证流程(logit-lens,LLM-rate,5-LLM判断)并记录了文档化的反模式;总计算量为单GPU,大约每种情感特征发现循环15分钟。

英文摘要

We characterize a compositional architecture of literary primitives in two instruction-tuned large language models (Llama 3.1 8B-Instruct and Gemma 2 9B-IT) via sparse autoencoders on mid-depth residual streams. Four feature classes emerge: naming-gates that promote lexical tokens of a target affect, an eleven-self cluster of first-person register features, stylistic register modulators (show-don't-tell and defamiliarization), and compositional emotions that arise only from multi-feature steering. Under a forced-choice 5-LLM judge panel applied to a 27-category emotion taxonomy (Cowen-Keltner), Llama reaches full 27/27 coverage by combining naming-gates, multi-feature recipes, and single self-feature steering; Gemma reaches 23/27 with adoration as the single residual strict-fail. Under random judging, the per-cell pass probability is on the order of $10^{-3}$ and the expected number of two-seed false-positive cells across the catalog is negligible, so the observed coverage is not consistent with chance. A cross-architectural asymmetry sits in the strict-versus-soft judge contrast: on the same generations, judges agree more often on Llama outputs than on Gemma outputs because Llama outputs name the target affect more directly while Gemma outputs evoke it through scene and imagery. Both architectures contain self-features that serve simultaneously as register markers and as emotion emitters, including a single most-RLHF-loaded self-feature per architecture that intensifies the institutional Helper-AI persona at one operating regime and produces affect-categorizable output at the same calibrated coefficient. Methodologically, the paper presents a three-stage validation pipeline (logit-lens, LLM-rate, 5-LLM judge) with documented anti-patterns; the total compute is single-GPU and about 15 minutes per emotion-feature discovery cycle.

2605.18805 2026-05-20 cs.IR cs.AI cs.LG 版本更新

RecoAtlas: From Semantic Plausibility to Set-Level Utility in LLM Recommendation Agents

RecoAtlas: 从语义合理性到集级效用在LLM推荐代理中

Imad Aouali, Flavian Vasile, Otmane Sakhi, Alexandre Gilotte, Benjamin Heymann

发表机构 * Criteo AI Lab(Criteo人工智能实验室)

AI总结 本文提出RecoAtlas,一个用于评估购物代理的基准和工具包,通过行为基础的度量标准来评估推荐代理的性能,揭示语义合理性并不一定代表行为基础的效用。

Comments Benchmark on LLM Recommendation Agents

详情
AI中文摘要

LLM推荐代理越来越多地生成结构化的推荐报告:一组项目配以自然语言的解释。然而,现有的评估通常将这种设置简化为对小候选集的重新排序或通过语义合理性来判断报告。我们引入推荐图谱(Agentic Tool-Level Assessment for Shopping),或RecoAtlas,一个用于评估购物代理的基准和工具包,通过行为基础的度量标准来评估。RecoAtlas在持有交互度量的基础上,利用从交互数据中学习的相关性、互补性和多样性代理,同时分别测量语义连贯性和解释质量。其受控工具环境使代理暴露于语义、行为对齐或故障工具中,从而诊断性能提升是否源于更强的推理、更好的信号或更有效的工具使用策略。在受控实验中,我们证明RecoAtlas展示了有意义的基准的关键特性:性能随模型容量和测试时计算量而变化,随着更强和更对齐的工具而改善,受噪声或不匹配信号影响而退化,并揭示语义合理性不必然代表行为基础的效用。RecoAtlas为开发和评估优化不仅考虑合理推荐,还考虑连贯、行为基础推荐集的购物助手提供了基础。

英文摘要

LLM recommendation agents increasingly produce structured recommendation reports: sets of items accompanied by natural-language justifications. Yet existing evaluations often reduce this setting to reranking small shortlisted candidate sets or judge reports mainly by semantic plausibility. We introduce Recommendation Atlas (Agentic Tool-Level Assessment for Shopping), or RecoAtlas, a benchmark and toolkit for evaluating shopping agents with behavior-grounded metrics. RecoAtlas complements held-out interaction metrics with learned utility proxies for relevance, complementarity, and diversity derived from interaction data, while separately measuring semantic coherence and explanation quality. Its controlled tool environment exposes agents to either semantic, behavior-aligned, or faulty tools, enabling diagnosis of whether performance gains arise from stronger reasoning, better signals, or more effective tool-use policies. Across controlled experiments, we show that RecoAtlas exhibits key properties of a meaningful benchmark for agentic systems: performance scales with model capacity and test-time compute, improves with stronger and better-aligned tools, degrades under noisy or misaligned signals, and reveals that semantic plausibility does not necessarily capture behavior-grounded utility. RecoAtlas provides a foundation for developing and evaluating shopping assistants that optimize not only for plausible recommendations, but also for coherent, behaviorally grounded recommendation sets.

2605.18804 2026-05-20 cs.LG cs.AI 版本更新

Adaptive Multi-Scale Goodness Aggregation for Forward-Forward Learning

自适应多尺度良度聚合用于前-前学习

Salar Beigzad, Vansh Verma

发表机构 * Computer Engineering University of St. Thomas Minnesota, USA(计算机工程 明尼苏达州圣汤姆斯大学)

AI总结 本文提出了一种自适应多尺度良度聚合(AMSGA)方法,通过改进局部学习神经网络的稳定性、鲁棒性和泛化能力,解决了原始前-前(FF)框架的局限性,实验表明在MNIST和Fashion-MNIST数据集上性能提升显著。

Comments 6 pages, 5 tables, IEEE format

详情
AI中文摘要

我们提出自适应多尺度良度聚合(AMSGA),一种新颖的前-前(FF)算法扩展,旨在提高局部学习神经网络的稳定性、鲁棒性和泛化能力。AMSGA通过引入多尺度良度聚合(局部、中间和全局表示)、自适应课程引导的困难负样本挖掘、层依赖的自适应阈值以及改进的优化稳定性warm-up余弦退火学习率调度,解决了原始FF框架的多个局限性。这些修改增强了FF范式,同时保持了其生物合理性和内存高效性。在MNIST和Fashion-MNIST上的实验表明,与基线FF算法相比,性能有显著提升,分别在MNIST和Fashion-MNIST上达到+1.45%和+1.50%的改进,而计算开销不大。我们的结果表明,当良度估计和训练动态精心设计时,局部学习方法可以变得更具竞争力。

英文摘要

We propose Adaptive Multi-Scale Goodness Aggregation (AMSGA), a novel extension of the Forward-Forward (FF) algorithm designed to improve stability, robustness, and generalization in local-learning neural networks. AMSGA addresses several limitations of the original FF framework by introducing multi-scale goodness aggregation across local, intermediate, and global representations; adaptive curriculum-guided hard negative mining; layer-dependent adaptive thresholds; and a warm-up cosine annealing learning-rate schedule for improved optimization stability. Together, these modifications strengthen the FF paradigm while preserving its biologically plausible and memory-efficient properties. Experiments on MNIST and Fashion-MNIST demonstrate consistent performance improvements over the baseline FF algorithm, achieving up to +1.45% improvement on MNIST and +1.50% improvement on Fashion-MNIST without significant computational overhead. Our results suggest that local learning methods can become substantially more competitive when goodness estimation and training dynamics are carefully designed.

2605.18802 2026-05-20 eess.SP cs.AI cs.LG 版本更新

A Nonlinear Complexity Index for Wearable PPG Cardiovascular Stability: Multiscale Validation, Systematic Evaluation Correction, and Bayesian Parameter Optimization

一种用于可穿戴PPG心血管稳定性的非线性复杂性指数:多尺度验证、系统性评估修正与贝叶斯参数优化

Timothy Oladunni, Farouk Ganiyu Adewumi

发表机构 * Department of Computer Science, Morgan State University(莫根州立大学计算机科学系)

AI总结 本文提出了一种基于心脏稳定性理论的非线性复杂性指数(SCSI),通过多尺度验证和系统性评估修正,结合贝叶斯参数优化,提高了可穿戴PPG心血管稳定性估计的准确性与可靠性。

详情
AI中文摘要

从可穿戴光体积脉动图(PPG)估计心血管稳定性需要一个原理性的非线性框架,但目前在启发式参数选择和评估协议方面仍存在重大差距,这些协议会夸大报告性能。我们引入了基于心脏稳定性理论的稳定性受限心血管稳定性指数(SCSI),并验证了来自四个异质PPG数据集的176,742个片段,在三个时间尺度上。跨数据集分析显示了显著的Kruskal-Wallis效应量(eta2 = 0.351,p < 0.001),强跨尺度一致性(kappa > 0.97)以及在53个ICU记录中与呼吸频率的显著相关性(Spearman r = 0.346,p = 0.011)。我们识别出三个评估伪影,这些伪影会夸大启发式AUC从真实的基线0.573到0.752:片段级交叉验证泄漏、测试集归一化泄漏以及池化AUC过重加权,这些伪影隐藏了每名患者的失败。纠正这些伪影并应用贝叶斯优化在15个联合参数上,得到SCSI在交叉验证AUC为0.720。在18个保留记录上,SCSI达到池化AUC为0.757(95%置信区间:0.686-0.828)和负预测值为0.966用于心动过速筛查,同时每记录AUC为0.497 ± 0.207被披露以提高透明度。外部验证在42个择期手术记录上得到AUC为0.621,证实了跨人群泛化。消融分析识别出非线性复杂度模块是主导组件。提出了一种稀疏三组件架构作为最小可部署配置。经过修正的协议提供了一个可重复的基准,用于未来可穿戴心血管稳定性指数。

英文摘要

Cardiovascular stability estimation from wearable photoplethysmography (PPG) requires a principled nonlinear framework, yet major gaps persist in heuristic parameter selection and evaluation protocols that inflate reported performance. We introduce a Stability-Constrained Cardiovascular Stability Index (SCSI) grounded in Cardiac Stability Theory and validate it across 176,742 segments from four heterogeneous PPG datasets at three temporal scales. Cross-dataset analysis demonstrates a large Kruskal-Wallis effect size (eta2 = 0.351, p < 0.001), strong cross-scale consistency (kappa > 0.97), and significant correlation with respiratory rate across 53 ICU records (Spearman r = 0.346, p = 0.011). We identify three evaluation artifacts that inflate heuristic AUC from a true baseline of 0.573 to 0.752: segment-level cross-validation leakage, test-set normalization leakage, and pooled-AUC overweighting that conceals per-patient failure. Correcting these artifacts and applying Bayesian optimization over 15 joint parameters yields SCSI with cross-validation AUC of 0.720. On 18 held-out records, SCSI achieves pooled AUC of 0.757 (95% CI: 0.686-0.828) and negative predictive value of 0.966 for tachypnea screening, while per-record AUC of 0.497 +/- 0.207 is disclosed for transparency. External validation on 42 elective-surgery records yields AUC of 0.621, confirming cross-population generalization. Ablation analysis identifies the nonlinear complexity module as the dominant component. A sparse three-component architecture is proposed as the minimal deployable configuration. The corrected protocol provides a reproducible benchmark for future wearable cardiovascular stability indices.

2605.18801 2026-05-20 cs.AI cs.IR cs.LG 版本更新

Position: Let's Develop Data Probes to Fundamentally Understand How Data Affects LLM Performance

位置:让我们开发数据探针,以根本理解数据如何影响大语言模型性能

Shiqiang Wang, Herbert Woisetschläger, Hans Arno Jacobsen, Mingyue Ji

发表机构 * Department of Computer Science, University of Exeter, UK(埃克塞特大学计算机科学系) Technical University of Munich, Germany(慕尼黑技术大学) Department of Electrical and Computer Engineering, University of Toronto, Canada(多伦多大学电气与计算机工程系) Department of Electrical and Computer Engineering, University of Florida, FL, USA(佛罗里达大学电气与计算机工程系)

AI总结 本文提出通过开发数据探针系统方法生成合成序列,以揭示数据特性对大语言模型性能、泛化能力和鲁棒性的影响,从而超越经验启发式方法。

Comments Accepted to ICML 2026 Position Paper Track

详情
Journal ref
Link to ICML record: https://icml.cc/virtual/2026/poster/67154
AI中文摘要

数据对于大语言模型(LLMs)至关重要。然而,了解哪些数据对LLM工作流程的不同阶段(包括训练、微调、对齐、上下文学习等)有用,以及为什么有用,仍然是一个开放性问题。当前的方法依赖于对大型公共数据集进行大量实验来获得数据过滤和数据集构建的经验启发式方法。这些方法计算成本高,并且缺乏一种系统的方法来理解特定数据特性如何驱动LLM行为的本质。在本文的位置论文中,我们倡导开发系统方法来生成合成序列,这些序列由适当定义的随机过程生成,目的是当它们用于LLM工作流程的一个或多个阶段时,能够揭示有用的特点。我们将这些序列称为数据探针。通过观察LLM在数据探针上的行为,研究人员可以系统地研究数据特性如何影响模型性能、泛化能力和鲁棒性。探测序列表现出的统计特性可以通过理论概念(如典型集)来观察,这些概念被推广以描述LLM的行为。这种数据探针方法为揭示数据在LLM训练和推理中的基础作用提供了途径,超越了经验启发式方法。

英文摘要

Data is fundamental to large language models (LLMs). However, understanding of what makes certain data useful for different stages of an LLM workflow, including training, tuning, alignment, in-context learning, etc., and why, remains an open question. Current approaches rely heavily on extensive experimentation with large public datasets to obtain empirical heuristics for data filtering and dataset construction. These approaches are compute intensive and lack a principled way of understanding the essence of how specific data characteristics drive LLM behavior. In this position paper, we advocate for the need of developing systematic methodologies for generating synthetic sequences from appropriately defined random processes, with the goal that these sequences can reveal useful characteristics when they are used in one or multiple stages of the LLM workflow. We refer to such sequences as data probes. By observing LLM behavior on data probes, researchers can systematically conduct studies on how data characteristics influence model performance, generalization, and robustness. The probing sequences exhibit statistical properties that can be viewed using theoretical concepts, such as typical sets, which are generalized to describe the behaviors of LLMs. This data-probe approach provides a pathway for uncovering foundational insights into the role of data in LLM training and inference, beyond empirical heuristics.

2605.18800 2026-05-20 cs.LG cs.AI 版本更新

Theory-optimal Quantization Based on Flatness

基于平坦度的理论最优量化

Xiusheng Huang, Zhe Li, Xuanwu Yin, Lu Wang, Yequan Wang, Dong Li, Emad Barsoum, Kang Liu

发表机构 * The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences(认知与决策智能复杂系统重点实验室,自动化研究所,中国科学院) School of Artificial Intelligence, University of Chinese Academy of Sciences(中国科学院大学人工智能学院) Beijing Academy of Artificial Intelligence(北京人工智能研究院) AMD Ritzz-AI

AI总结 本文提出了一种基于平坦度的理论最优量化方法,通过分析量化误差与异常值之间的数学关系,引入了平坦度指标来量化异常值分布,并提出了双向对角量化框架BDQ,有效分散异常值模式,提升了大语言模型在低比特精度下的性能。

Comments 16 pages, 2 figures

详情
AI中文摘要

后训练量化已成为压缩和加速大型语言模型(LLMs)推理的广泛采用技术。LLMs量化的首要挑战源于激活异常值,这些异常值在低比特精度下显著降低模型性能。尽管近期方法试图通过跨特征维度的线性变换来缓解异常值,我们的分析表明,变换后的权重和激活仍然表现出持续的异常值模式,具有集中化的幅度分布。在本文中,我们首先建模量化误差与异常值之间的数学关系,然后引入一个新的指标平坦度来量化异常值的分布。基于此,我们推导出与平坦度相关的理论最优解。基于这些见解,我们提出了双向对角量化(BDQ),一种新的后训练量化框架,通过优化的矩阵变换有效分散异常值模式。BDQ通过学习的对角操作策略性地将异常值幅度分布到矩阵维度中。广泛的实验表明,BDQ建立了新的量化基准。在LLaMA-3-8B模型上,BDQ在W4A4量化中实现了小于1%的精度下降。在更具挑战性的W2A4KV16实验中,与最先进的方法相比,BDQ在DeepSeek-R1-Distill-LLaMA-70B模型上将性能差距减少了39.1%。

英文摘要

Post-training quantization has emerged as a widely adopted technique for compressing and accelerating the inference of Large Language Models (LLMs). The primary challenges in LLMs quantization stem from activation outliers, which significantly degrade model performance especially at lower bit precision. While recent approaches attempt to mitigate outliers through linear transformations across feature dimensions, our analysis reveals that the transformed weights and activations still exhibit persistent outlier patterns with concentrated magnitude distributions. In this paper, we first model the mathematical relationship between quantization error and outliers, and then introduce a new metric Flatness to quantify the distribution of outliers. Based on this, we derive the theoretical optimal solution with respect to Flatness. Building on these insights, we propose Bidirectional Diagonal Quantization (BDQ), a novel post-training quantization framework that effectively disperses outlier patterns through optimized matrix transformations. BDQ strategically distributes outlier magnitudes across matrix dimensions via learned diagonal operations. Extensive experiments demonstrate that BDQ establishes a new quantization benchmark. It achieves less than 1\% accuracy drop in W4A4 quantization on the LLaMA-3-8B model. In the more challenging W2A4KV16 experiment, compared to state-of-the-art approaches, BDQ reduces the performance gap by 39.1\% on the DeepSeek-R1-Distill-LLaMA-70B model.

2605.18799 2026-05-20 cs.LG cs.AI cs.CL 版本更新

ReCrit: Transition-Aware Reinforcement Learning for Scientific Critic Reasoning

ReCrit: 基于过渡意识的强化学习用于科学批评推理

Wanghan Xu, Yuhao Zhou, Hengyuan Zhao, Shuo Li, Dianzhi Yu, Zhenfei Yin, Yaowen Hu, Fengli Xu, Wanli Ouyang, Wenlong Zhang, Lei Bai

发表机构 * Shanghai Jiao Tong University(上海交通大学) Shanghai Artificial Intelligence Laboratory(上海人工智能实验室) National University of Singapore(新加坡国立大学) Chinese University of Hong Kong(香港中文大学) University of Oxford(牛津大学) Tsinghua University(清华大学)

AI总结 该研究提出ReCrit框架,通过强化学习解决科学批评推理中的过渡意识问题,改进了批评准确性。

详情
AI中文摘要

大型语言模型在批评交互中不仅可能因回答错误而失败,还可能在用户批评后放弃最初正确的科学解答。在科学推理中,这种风险尤为突出,因为用户的批评可能将正确答案变为错误答案。我们将批评交互视为跨回合正确性过渡问题,而非最终答案准确性问题,并识别出三个挑战:过渡意识、解耦有用的修正与有害的阿谀奉承,以及可扩展的回放。我们提出了ReCrit,一个基于过渡意识的强化学习框架,将初始到批评行为分解为四个象限:修正、阿谀奉承、鲁棒性和边界。ReCrit奖励修正和鲁棒性,惩罚阿谀奉承,并将持续错误视为弱边界信号。为了使交互训练实用,ReCrit进一步使用动态异步回放与尾部自适应完成以减少回放等待。在三个科学推理基准测试(ChemBench、TRQA和EarthSE)上,ReCrit在Qwen3.5-4B上将平均批评准确性从38.15提升到51.49,在Qwen3.5-9B上从45.40提升到55.59。消融实验显示,最终答案奖励提供很少的交互层面增益,而基于过渡意识的奖励和象限加权产生更可区分的训练信号和更大的净批评阶段改进。代码可在https://github.com/black-yt/ReCrit获取。

英文摘要

Large language models can fail in critic interaction not only by answering incorrectly, but also by abandoning an initially correct scientific solution after user criticism. This is especially risky in scientific reasoning, where user criticism can turn a valid answer into an incorrect one. We frame critic interaction as an inter-turn correctness-transition problem rather than a final-answer accuracy problem, and identify three challenges: transition awareness, decoupling useful correction from harmful sycophancy, and scalable rollout. We propose ReCrit, a transition-aware reinforcement learning framework that decomposes Initial-to-Critic behavior into four quadrants: Correction, Sycophancy, Robustness, and Boundary. ReCrit rewards correction and robustness, penalizes sycophancy, and treats persistent errors as weak boundary signals. To make interaction training practical, ReCrit further uses dynamic asynchronous rollout with tail-adaptive completion to reduce rollout waiting. On three scientific reasoning benchmarks, ChemBench, TRQA, and EarthSE, ReCrit improves average Critic accuracy from 38.15 to 51.49 on Qwen3.5-4B and from 45.40 to 55.59 on Qwen3.5-9B. Ablations show that final-answer rewards provide little interaction-level gain, while transition-aware rewards and quadrant weighting produce more distinguishable training signals and larger net Critic-stage improvement. The code is available at https://github.com/black-yt/ReCrit .

2605.18798 2026-05-20 cs.LG cs.IT math.IT math.ST stat.ML stat.TH 版本更新

Accurate Evaluation of Quickest Changepoint Detectors via Non-parametric Survival Analysis

通过非参数生存分析准确评估最快突变点检测器

Taiki Miyagawa, Akinori F. Ebihara

发表机构 * NEC Corporation(日本NEC公司)

AI总结 本文提出非参数估计方法用于快速突变点检测中的平均运行长度和平均检测延迟,通过将突变点检测与生存分析类比,解决了有限和不规则序列长度下的估计问题,提升了模型的鲁棒性和可解释性。

Comments Accepted to ICML 2026. GitHub: https://github.com/TaikiMiyagawa/Kaplan-Meier-Average-Run-Length

详情
AI中文摘要

我们提出非参数估计器用于在有限和不规则序列长度下快速突变点检测(QCD)中的平均运行长度(ARL)和平均检测延迟(ADD)。尽管ARL和ADD广泛用于理论和模拟研究中的最优性标准,但它们在实际数据集中的应用受到有限和不规则序列长度的限制。为了解决这个问题,我们通过将QCD与生存分析类比,提出非参数估计器ARL和ADD,称为KM-ARL和KM-ADD,以建模序列截断下的检测概率。我们推导了估计偏差界限,并证明除非需要外推,否则它们在渐近上是无偏的。在模拟和实际数据集上的实验展示了其实际用途,增强了对有限和不规则序列长度的鲁棒性,提高了可解释性,并促进了经验、直观的模型选择。我们的Python代码可在https://github.com/TaikiMiyagawa/Kaplan-Meier-Average-Run-Length提供,为从业者提供了即用型实现。

英文摘要

We propose non-parametric estimators for the average run length (ARL) and average detection delay (ADD) in quickest changepoint detection (QCD) under finite and irregular sequence lengths. Although ARL and ADD are widely used as optimality criteria in theoretical and simulation studies, their application to real-world datasets is hindered by limited and irregular sequence lengths. To address this issue, we propose non-parametric estimators for the ARL and ADD, termed KM-ARL and KM-ADD, by drawing an analogy between QCD and survival analysis to model detection probabilities under sequence truncation. We derive estimation bias bounds and prove that they are asymptotically unbiased unless extrapolation is required. Experiments on simulated and real-world datasets demonstrate their practical utility, enhancing robustness against limited and irregular sequence lengths, improving interpretability, and facilitating empirical, intuitive model selection. Our Python code is provided at https://github.com/TaikiMiyagawa/Kaplan-Meier-Average-Run-Length, offering ready-to-use implementations for practitioners.

2605.18796 2026-05-20 cs.LG cs.CL 版本更新

UCCI: Calibrated Uncertainty for Cost-Optimal LLM Cascade Routing

UCCI:用于成本最优LLM级联路由的校准不确定性

Varun Kotte

发表机构 * Independent Researcher(独立研究者)

AI总结 本文提出UCCI,一种以校准为核心的路由方法,通过异质回归将token层面的边际不确定性映射到查询级误差概率,并通过约束成本最小化选择升级阈值。在三个显式假设下,阈值策略在校准分数上是成本最优的,异质校准在期望校准误差(ECE)上实现O(n^{-1/3})的样本复杂度。在75000个生产命名实体识别工作负载上,UCCI将推理成本降低了31%(95%CI:[27%, 35%]),同时将ECE从0.12降低到0.03。

Comments 9 pages, 2 figures, 4 tables. Code: https://github.com/varunkotte6/ucci

详情
AI中文摘要

LLM级联和模型路由通过将简单查询发送到小型模型并升级困难查询到大型模型来降低推理成本,但大多数部署的路由器使用未校准的置信度分数并需要每个工作负载的阈值调整。我们提出了UCCI,一种以校准为核心的路由器,通过异质回归将token层面的边际不确定性映射到查询级误差概率,并通过约束成本最小化选择升级阈值。在三个显式假设下,阈值策略在校准分数上是成本最优的,异质校准在期望校准误差(ECE)上实现O(n^{-1/3})的样本复杂度。在75000个生产命名实体识别工作负载上,UCCI将推理成本降低了31%(95%CI:[27%, 35%]),同时将ECE从0.12降低到0.03。在相同的操作点上,UCCI优于熵阈值法、分割置信路由以及FrugalGPT风格的学习阈值。所有级联结果均使用实际模型输出和测量的H100延迟进行端到端路由,而不是基于全局准确率或名义API价格的模拟路由。

英文摘要

LLM cascades and model routing promise lower inference cost by sending easy queries to a small model and escalating hard ones to a large model, but most deployed routers use uncalibrated confidence scores and require per-workload threshold tuning. We present UCCI, a calibration-first router that maps token-level margin uncertainty to a per-query error probability via isotonic regression and selects the escalation threshold by constrained cost minimization. Under three explicit assumptions, threshold policies on the calibrated score are cost-optimal, and isotonic calibration achieves O(n^{-1/3}) sample complexity for expected calibration error (ECE). On a production named entity recognition workload of 75,000 queries served by 4B and 12B instruction-tuned LLMs on H100 GPUs, UCCI cuts inference cost by 31% (95% CI: [27%, 35%]) at micro-F1 = 0.91 while reducing ECE from 0.12 to 0.03. At the same operating point, UCCI beats entropy thresholding, split-conformal routing, and a FrugalGPT-style learned threshold. All cascade results use end-to-end routing on actual model outputs and measured H100 latency, not simulated routing from global accuracies or nominal API prices.

2605.18795 2026-05-20 cs.LG cs.AI 版本更新

HELLoRA: Hot Experts Layer-Level Low-Rank Adaptation for Mixture-of-Experts Models

HELLoRA: Hot Experts Layer-Level Low-Rank Adaptation for Mixture-of-Experts Models

Jia Wei, Zhonghao Zhang, Ping Chen, Qianyang li, Yancheng Pan, Shaoxun Wang, Ziyi Qiu, Longxiang Wang

发表机构 * Department of Computer Science and Technlogy(计算机科学与技术系) Tsinghua University(清华大学) School of Computer Science and Technlogy(计算机科学与技术系) Xi’an Jiaotong University(西安交通大学) The State Key Laboratory of Blockchain and Data Security, Zhejiang University(区块链与数据安全国家重点实验室,浙江大学) Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security(杭州高科技区(滨江)区块链与数据安全研究院)

AI总结 本文提出HELLoRA,一种针对混合专家模型的层级低秩适应方法,通过仅对最活跃的专家添加LoRA模块,减少可训练参数和计算量,同时提升下游任务性能。

详情
AI中文摘要

Low-Rank Adaptation (LoRA) dominates parameter-efficient fine-tuning of large language models, yet most variants target dense architectures. Mixture-of-Experts (MoE) models scale parameters at near-constant per-token compute, and their sparse activation patterns create untapped opportunities for more efficient adaptation. We propose Hot-Experts Layer-level Low-Rank Adaptation (HELLoRA), which attaches LoRA modules only to the most frequently activated experts at each layer. This simple mechanism reduces trainable parameters and adapter-induced FLOPs while improving downstream performance, an effect we attribute to a form of structured regularization that preserves pretrained expert specialization. To stress-test HELLoRA under extreme parameter budgets, we further compose it with LoRI to form HELLoRI, which freezes the up-projection and sparsifies the down-projection. Across three MoE backbones, namely OlMoE-1B-7B, Mixtral-8x7B, and DeepSeekMoE, and three task families covering mathematical reasoning, code generation, and safety alignment, HELLoRA consistently outperforms strong PEFT baselines. Relative to vanilla LoRA on OlMoE, HELLoRA uses 15.7% of the trainable parameters, reduces adapter FLOPs by 38.7%, achieves 1.9x the training throughput, and improves accuracy by 9.2%. On DeepSeekMoE, HELLoRA outperforms LoRA while using only 23.2% of its trainable parameters. These results demonstrate that activation-aware adapter placement is an effective and practical route to scaling PEFT for MoE language models.

英文摘要

Low-Rank Adaptation (LoRA) dominates parameter-efficient fine-tuning of large language models, yet most variants target dense architectures. Mixture-of-Experts (MoE) models scale parameters at near-constant per-token compute, and their sparse activation patterns create untapped opportunities for more efficient adaptation. We propose Hot-Experts Layer-level Low-Rank Adaptation (HELLoRA), which attaches LoRA modules only to the most frequently activated experts at each layer. This simple mechanism reduces trainable parameters and adapter-induced FLOPs while improving downstream performance, an effect we attribute to a form of structured regularization that preserves pretrained expert specialization. To stress-test HELLoRA under extreme parameter budgets, we further compose it with LoRI to form HELLoRI, which freezes the up-projection and sparsifies the down-projection. Across three MoE backbones, namely OlMoE-1B-7B, Mixtral-8x7B, and DeepSeekMoE, and three task families covering mathematical reasoning, code generation, and safety alignment, HELLoRA consistently outperforms strong PEFT baselines. Relative to vanilla LoRA on OlMoE, HELLoRA uses 15.7% of the trainable parameters, reduces adapter FLOPs by 38.7%, achieves 1.9x the training throughput, and improves accuracy by 9.2%. On DeepSeekMoE, HELLoRA outperforms LoRA while using only 23.2% of its trainable parameters. These results demonstrate that activation-aware adapter placement is an effective and practical route to scaling PEFT for MoE language models.

2605.18794 2026-05-20 cs.LG cs.AI 版本更新

Robust Basis Spline Decoupling for the Compression of Transformer Models

基于鲁棒基样条的变压器模型压缩解耦方法

Joppe De Jonghe, Van Tien Pham, Mariya Ishteva

发表机构 * NUMA, Department of Computer Science, KU Leuven(NUMA计算机科学系,鲁文大学)

AI总结 本文提出了一种基于B-样条的解耦框架,通过利用B-样条的局部支持和灵活的光滑性控制,改进了传统张量解耦方法,提高了数值稳定性和表达能力,实验表明该方法在保持竞争力精度的同时实现了显著的参数减少。

详情
AI中文摘要

解耦是一种强大的建模范式,用于将多元函数表示为线性变换和单变量非线性函数的组合。单层解耦可以视为具有单个隐藏层和灵活激活函数的全连接神经网络,提供了与神经网络的直接联系。因此,解耦方法在神经网络领域中的应用日益增加,尤其是在压缩方面,因为它能够通过减少参数复杂性实现结构化近似。现有的基于张量的解耦方法通常依赖于多项式或分段线性参数化内部非线性函数,这可能导致数值不稳定或表达能力有限。在本工作中,我们引入了一种基于B-样条的解耦框架,扩展了这些现有方法。通过利用B-样条的局部支持和灵活的光滑性控制,所提出的公式产生了一种更加数值稳定和表达力更强的表示。我们推导出一个受约束的耦合矩阵-张量分解,并提出了一种名为R-CMTF-BSD的鲁棒交替最小二乘算法,结合了归一化和Tikhonov正则化。所提出的方法通过合成数据和变压器模型压缩实验进行了验证。在视觉和Swin Transformer架构上的结果表明,B-样条解耦在保持竞争性精度的同时实现了显著的参数减少,使R-CMTF-BSD算法成为结构化神经网络压缩的有前景的工具。

英文摘要

Decoupling is a powerful modeling paradigm for representing multivariate functions as compositions of linear transformations and univariate nonlinear functions. A single-layer decoupling can be viewed as a fully connected neural network with a single hidden layer and flexible activation functions, providing a direct link with neural networks. Because of this, the use of decoupling methods has gained increasing attention in neural network domains, particularly compression, since it enables structured approximations with reduced parameter complexity. Existing tensor-based decoupling methods typically rely on polynomial or piecewise-linear parameterizations of the internal nonlinear functions, which can suffer from numerical instability or limited expressiveness. In this work, we introduce a B-spline-based decoupling framework that generalizes these existing approaches. By exploiting the local support and flexible smoothness control of B-splines, the proposed formulation yields a more numerically stable and expressive representation. We derive a constrained coupled matrix-tensor factorization and propose a robust alternating least-squares algorithm, called R-CMTF-BSD, incorporating normalization and Tikhonov regularization. The proposed method is validated through experiments on synthetic data and transformer model compression. Results on the Vision and Swin Transformer architectures demonstrate that B-spline decoupling enables substantial parameter reduction while maintaining competitive accuracy, making the R-CMTF-BSD algorithm a promising tool for structured neural network compression.

2605.18793 2026-05-20 cs.LG cs.AI 版本更新

Dimensional Balance Improves Large Scale Spatiotemporal Prediction Performance

维度平衡提升大规模时空预测性能

Jing Chen, Shixiang Pan, Yujie Fan, Haocheng Ye, Haitao Xu, Wenqiang Xu

发表机构 * School of Computer Science and Technology, Hangzhou Dianzi University(杭州电子科技大学计算机科学与技术学院) College of Economics, China Jiliang University(中国浙江大学经济学院) Key Laboratory of New Industrial Internet Control Technology(新型工业互联网控制技术重点实验室)

AI总结 本文提出一种可扩展的自适应框架,通过压缩空间维度和扩展时间范围来解决时空预测中的性能瓶颈问题,从而提高预测精度和跨领域适用性。

详情
AI中文摘要

准确的时空模式分析在城市交通、气象和公共卫生监测等领域至关重要。然而,现有方法面临性能瓶颈,通常只能带来微小的改进,并且往往具有有限的跨领域迁移能力。我们通过空间和时间熵度量来分析这一瓶颈,这些度量用于诊断时空复杂性不匹配,而非作为熵对齐单独能提高预测的保证。经验上,较大的不匹配通常伴随着较高的预测不确定性,尤其是在模型容量预算固定的情况下。基于此诊断,我们提出了一种可扩展、自适应的框架,以协调空间和时间特征表示。通过低秩矩阵嵌入压缩空间维度以保留关键结构,而扩展的时间范围捕捉长距离依赖关系并减轻时间异质性带来的累积误差。在城市交通、气象和流行病数据集上的广泛实验显示了显著的准确性提升,并且在评估的各个领域中具有广泛的适用性,表明该框架在当前研究之外的广泛时空任务中具有前景。代码可在GitHub上获得:https://github.com/ST-Balance/ST-Balance。

英文摘要

Accurate spatiotemporal pattern analysis is critical in fields such as urban traffic, meteorology, and public health monitoring. However, existing methods face performance bottlenecks, typically yielding only incremental gains and often exhibiting limited cross-domain transferability. We analyze this bottleneck through spatial and temporal entropy measures, which are used as diagnostic indicators of spatiotemporal complexity mismatch rather than as guarantees that entropy alignment alone yields better forecasting. Empirically, larger mismatch is often accompanied by higher prediction uncertainty, especially under a fixed model-capacity budget. Guided by this diagnostic, we propose a scalable, adaptive framework that harmonizes spatial and temporal feature representations. Spatial dimensionality is compressed via low-rank matrix embedding to preserve essential structure, while an extended temporal horizon captures long-range dependencies and mitigates cumulative errors arising from temporal heterogeneity. Extensive experiments on urban traffic, meteorological, and epidemic datasets demonstrate substantial accuracy gains and broad applicability across the evaluated domains, suggesting that the framework is promising for a wide range of spatiotemporal tasks beyond the current study. The code is available on GitHub at https://github.com/ST-Balance/ST-Balance.

2605.18791 2026-05-20 eess.IV cs.CV cs.LG q-bio.OT 版本更新

SpecX: A Large-Scale Benchmark for Multi-Modal Spectroscopy and Cross-Paradigm Evaluation

SpecX:多模态光谱的大规模基准及跨范式评估

Chengrui Xiang, Tengfei Ma, Yujie Chen, Tong Wang, Haowen Chen, Xiangxiang Zeng

发表机构 * College of Computer Science and Technology, Hunan University(湖南大学计算机科学与技术学院)

AI总结 本文提出SpecX,一个用于多模态光谱的大规模基准,通过不同层级的数据集支持分子解析、光谱模拟和理解任务,揭示了专用光谱模型和多模态语言模型在光谱智能中的不同优势。

Comments 9 pages,1 figures

详情
AI中文摘要

现有的光谱基准在规模、模态对齐和评估范围上存在局限,通常专注于专用模型或多模态语言模型(MLLMs)。我们引入SpecX,一个大规模的多模态光谱基准,具有跨范式评估。SpecX包含170万种分子,涵盖NMR(1H,13C,HSQC)、IR、MS、UV、拉曼和FL等多种光谱模态,并分为三个层级:大规模数据集用于预训练,对齐的多光谱子集用于基准测试,以及高质量实验子集用于评估。SpecX支持分子解析、光谱模拟和光谱理解等多种任务,并在专用光谱模型和MLLMs之间实现统一评估。实验表明,专用模型在信号层面建模上表现优异,而MLLMs在高层推理上表现出色,但缺乏精确的光谱定位。SpecX建立了一个统一的光谱智能基准,并强调了需要光谱原生的基础模型。

英文摘要

Existing spectral benchmarks are limited in scale, modality alignment, and evaluation scope, and typically focus on either specialized models or multimodal language models (MLLMs). We introduce SpecX, a large-scale benchmark for multi-modal spectroscopy with cross-paradigm evaluation. SpecX contains 1.7M molecules with diverse spectral modalities, including NMR (1H, 13C, HSQC), IR, MS,UV,Raman and FL, and is organized into three tiers: a large-scale dataset for pretraining, an aligned multi-spectral subset for benchmarking, and a high-quality experimental subset for evaluation. SpecX supports a range of tasks such as molecular elucidation, spectrum simulation, and spectral understanding, and enables unified evaluation across both specialized spectral models and MLLMs. Experiments show that specialized models excel at signal-level modeling, while MLLMs exhibit strengths in high-level reasoning but lack precise spectral grounding. SpecX establishes a unified benchmark for spectral intelligence and highlights the need for spectrum-native foundation models.

2605.18780 2026-05-20 cs.IR cs.AI cs.LG 版本更新

A Reproducibility Analysis of PO4ISR: Diagnosing and Mitigating Semantic Drift in LLM-Based Session Recommendation

PO4ISR的可重复性分析:诊断和缓解基于LLM的会话推荐中的语义漂移

Aditya Tiwari, Konduri Naga Lakshmi Rekha, Rajesh Kumar Mundotiya

发表机构 * MATRA Lab(MATRA实验室) Department of Computer Science and Engineering(计算机科学与工程系) Indian Institute of Technology Bhilai(比哈尔理工学院)

AI总结 本文研究了PO4ISR在不同语义领域中的可重复性,发现标准推理提示在长会话中出现严重的上下文漂移,导致性能下降。为此,作者提出了PO4ISR++,通过反思提示和一致排名检测增强鲁棒性,并在多个数据集上验证了其有效性,提升了会话推荐的性能。

详情
AI中文摘要

基于推理的大型语言模型(LLMs)如PO4ISR在会话推荐中设定了新的基准。然而,其在不同语义领域中的可重复性仍未经探索。本文对PO4ISR进行了严格的可重复性研究,以评估其泛化极限。我们的分析揭示了一种关键失败模式:标准推理提示在长会话中遭受严重的上下文漂移,导致在语义复杂的数据集如Games和Bundle上性能下降。为了量化和解决这一稳定性差距,我们引入了PO4ISR++,一种鲁棒性增强的实现,整合了反思提示和一致排名检测。与原始的静态提示策略不同,我们的方法能够动态适应跨领域线索。我们在ML-1M、Games和Bundle上基准测试了原始实现和我们的鲁棒变体。我们的结果证实,尽管原始模型在新领域中挣扎,我们的可重复性扩展恢复了性能,在Games上实现了高达54%的稳定提升,在Bundle上实现了96%的提升。我们发布了开源工具包,包括重现的基线和我们的增强框架,以促进基于LLM的推荐的可靠未来研究。

英文摘要

Reasoning-based Large Language Models (LLMs) like PO4ISR have set new benchmarks in session-based recommendation. However, the reproducibility of their reasoning capabilities across diverse semantic domains remains unexplored. In this work, we conduct a rigorous reproducibility study of PO4ISR to assess its generalization limits. Our analysis reveals a critical failure mode: standard reasoning prompts suffer from severe contextual drift in long sessions, leading to performance degradation on semantically complex datasets like Games and Bundle. To quantify and resolve this stability gap, we introduce PO4ISR++, a robustness-enhanced implementation that integrates reflexive prompting and consistent rank detection. Unlike the original static prompting strategy, our approach dynamically adapts to cross-domain cues. We benchmark both the original implementation and our robust variant on ML-1M, Games, and Bundle. Our results confirm that while the original model struggles in new domains, our reproducible extension restores performance, yielding a stabilized gain of up to 54% on Games and 96% on Bundle. We release open-source artifacts, including the reproduced baseline and our enhanced framework, to facilitate reliable future research in LLM-based recommendation.

2605.18773 2026-05-20 cs.CR cs.AI cs.CY cs.LG 版本更新

Decentralized autonomous organization and blockchain-based incentivization framework for community-based facilities management

去中心化自治组织与基于区块链的激励框架用于社区设施管理

Reachsak Ly, Alireza Shojaei, Xinghua Gao, Philip Agee, Abiola Akanmu

发表机构 * School of Technology, Eastern Illinois University(东伊利诺伊大学技术学院) Myers-Lawson School of Construction, Virginia Polytechnic Institute and State University(弗吉尼亚理工学院和州立大学梅斯-劳森建筑学院)

AI总结 本文提出了一种基于区块链和去中心化自治组织(DAO)的新型框架,用于智能建筑中的社区设施管理,通过去中心化治理平台和维护管理平台的结合,提高设施维护的参与度和效率。

Comments 29 pages, 17 figures, 3 tables

详情
AI中文摘要

传统的设施管理通常依赖于集中决策结构,限制了利益相关者的参与,导致与租户需求不一致并降低了满意度。本文提出了一种新的基于区块链和去中心化自治组织(DAO)的框架,用于智能建筑中的社区设施管理。该框架包含两个关键组成部分:一个去中心化的治理平台,通过区块链投票促进透明的集体决策;以及一个维护管理平台,具有激励机制,鼓励建筑使用者通过代币奖励积极贡献于设施维护。系统评估包括成本分析、可扩展性、数据安全考虑、可用性测试以及与设施管理人员和研究人员进行的半结构化访谈,以评估平台的实用性、挑战和采用潜力。研究结果表明,该框架有潜力作为激励解决方案,用于促进利益相关者在集体维护和改善建筑基础设施方面的参与。

英文摘要

Traditional facility management often relies on centralized decision-making structures that limit stakeholder participation, leading to misalignment with occupant needs and reduced satisfaction. This paper proposes a novel blockchain- and Decentralized Autonomous Organization (DAO)-based framework for community-based facilities management in smart buildings. The framework comprises two key components: a decentralized governance platform that facilitates transparent collective decision-making through blockchain-based voting, and a maintenance management platform with an incentivization mechanism that encourages building occupants to actively contribute to facility upkeep through tokenized rewards. System evaluation includes cost analysis, scalability, data security considerations, usability testing, and semi-structured interviews with facility managers and researchers to assess the platform's usefulness, challenges, and adoption potential. The findings demonstrate the framework's potential as a viable incentivization solution for engaging stakeholders in the collective upkeep and improvement of building infrastructure.

2603.11673 2026-05-20 cs.LG 版本更新

Context-dependent manifold learning: A neuromodulated constrained autoencoder approach

基于上下文的流形学习:一种受神经调节的约束自编码器方法

Jérôme Adriaens, Gustave Bainier, Guillaume Drion, Pierre Sacré

发表机构 * University of Liège(列日大学)

AI总结 本文提出了一种受神经调节的约束自编码器(NcAE),通过上下文驱动的超网络调节自编码器的激活斜率和偏置,以恢复上下文变化下的投影保证,从而在物理系统中保持几何一致性。

Comments 26 pages, 5 figures, 24 Tables

详情
AI中文摘要

许多物理系统表现出随着外部参数变化而变化的低维结构:机器人中的链接长度、流体中的强迫常数或流动中的雷诺数会改变底层流形,但保持其内在维度。受限自编码器(cAEs)通过一种幂等的编码器-解码器投影学习此类流形,这一特性是无约束自编码器无法匹敌的,且在模型迭代应用时尤为关键。然而,标准的使cAE上下文依赖的方法,即在输入中连接上下文或通过仿射调节隐藏激活,破坏了编码器-解码器的幂等性,恰好在最需要保证投影的情况下牺牲了投影保证。为在上下文变化下恢复此保证,我们开发了受神经调节的受限自编码器(NcAE),通过上下文驱动的超网络调节cAE的激活斜率和偏置。本文介绍了NcAE,其理论基础及其经验验证。我们证明,对于每个上下文,包括训练时未见过的上下文,重构映射仍保持幂等投影,所学流形的拓扑不变,且上下文扰动导致流形的平滑变化。我们在具有上下文依赖耦合的16自由度摆动器和跨分岔的洛伦茨96系统上评估了我们的方法。NcAE在重构、幂等性和潜在几何度量方面匹配或超过了六个基线中的最佳,同时是唯一通过构造保持几何一致性的架构。因此,NcAE在物理系统家族中提供了稳定的、保持几何一致的坐标系统。

英文摘要

Many physical systems exhibit a low-dimensional structure that varies with external parameters: link lengths in a robot, forcing constants in a fluid, or Reynolds numbers in a flow shift the underlying manifold while preserving its intrinsic dimension. Constrained AutoEncoders (cAEs) learn such manifolds through an idempotent encoder-decoder projection, a property that unconstrained autoencoders cannot match and that is essential whenever the model is applied iteratively. However, the standard strategies for making a cAE context-dependent, namely concatenating the context to the input or affinely modulating hidden activations, break the encoder-decoder idempotency, sacrificing the projection guarantee precisely in the setting where it would be most valuable. To restore this guarantee under context variation, we developed the Neuromodulated Constrained Autoencoder (NcAE), which modulates the activation slope and bias of a cAE through a context-driven hyper-network. This paper presents the NcAE, its theoretical foundation, and its empirical validation. We prove that for every context, including contexts unseen at training time, the reconstruction map remains an idempotent projection, the topology of the learned manifold is invariant, and context perturbations induce smooth changes in the manifold. We evaluated our approach on a 16-DoF pendulum with context-dependent coupling and the Lorenz96 system across a bifurcation. The NcAE matched or exceeded the best of six baselines on reconstruction, idempotency, and latent-geometry metrics, while being the only architecture that preserves geometric consistency by construction. The NcAE thereby provides a stable, geometry-preserving coordinate system across families of physical regimes.

2602.04883 2026-05-20 cs.LG cs.AI q-bio.BM q-bio.QM 版本更新

Protein Autoregressive Modeling via Multiscale Structure Generation

通过多尺度结构生成进行蛋白质自回归建模

Yanru Qu, Cheng-Yen Hsieh, Zaixiang Zheng, Ge Liu, Quanquan Gu

发表机构 * University of Illinois Urbana-Champaign(伊利诺伊大学厄巴纳-香槟分校)

AI总结 本文提出了一种多尺度自回归框架PAR,用于通过粗到细的下一尺度预测生成蛋白质主链结构。核心方法包括多尺度下采样操作、自回归Transformer和基于流的主链解码器,通过噪声上下文学习和调度采样缓解曝光偏差,实现高质量主链生成,并展示了强大的零样本泛化能力。

Comments ICML 2026 Spotlight; ByteDance Seed Tech Report; Page: https://par-protein.github.io/

详情
AI中文摘要

我们提出了蛋白质自回归建模(PAR),这是首个多尺度自回归框架,用于通过粗到细的下一尺度预测生成蛋白质主链结构。利用蛋白质的分层性质,PAR生成的结构模仿雕刻雕像的过程,形成粗略拓扑结构并逐步细化结构细节。为此,PAR由三个关键组件组成:(i)多尺度下采样操作,在训练过程中表示蛋白质结构在多个尺度上的特征;(ii)一个自回归Transformer,编码多尺度信息并生成条件嵌入以指导结构生成;(iii)基于流的主链解码器,根据这些嵌入生成主链原子。此外,自回归模型由于训练和生成过程不匹配而遭受曝光偏差,这会显著降低结构生成质量。我们通过采用噪声上下文学习和调度采样有效缓解了这一问题,实现了鲁棒的主链生成。值得注意的是,PAR表现出强大的零样本泛化能力,支持灵活的人类提示条件生成和基序支架构建,而无需微调。在无条件生成基准测试中,PAR有效学习了蛋白质分布,并生成高质量的主链结构,且表现出良好的扩展性。这些特性使PAR成为蛋白质结构生成的有前途的框架。

英文摘要

We present protein autoregressive modeling (PAR), the first multi-scale autoregressive framework for protein backbone generation via coarse-to-fine next-scale prediction. Using the hierarchical nature of proteins, PAR generates structures that mimic sculpting a statue, forming a coarse topology and refining structural details over scales. To achieve this, PAR consists of three key components: (i) multi-scale downsampling operations that represent protein structures across multiple scales during training; (ii) an autoregressive transformer that encodes multi-scale information and produces conditional embeddings to guide structure generation; (iii) a flow-based backbone decoder that generates backbone atoms conditioned on these embeddings. Moreover, autoregressive models suffer from exposure bias, caused by the training and the generation procedure mismatch, and substantially degrades structure generation quality. We effectively alleviate this issue by adopting noisy context learning and scheduled sampling, enabling robust backbone generation. Notably, PAR exhibits strong zero-shot generalization, supporting flexible human-prompted conditional generation and motif scaffolding without requiring fine-tuning. On the unconditional generation benchmark, PAR effectively learns protein distributions and produces backbones of high design quality, and exhibits favorable scaling behavior. Together, these properties establish PAR as a promising framework for protein structure generation.

2601.05391 2026-05-20 cs.LG 版本更新

DynaSTy: A Framework for SpatioTemporal Node Attribute Prediction in Dynamic Graphs

DynaSTy: 一个用于动态图中时空节点属性预测的框架

Namrata Banerji, Tanya Berger-Wolf

发表机构 * The Ohio State University(俄亥俄州立大学)

AI总结 本文提出了一种端到端的动态边偏置时空模型,用于预测动态图中节点属性的多步未来值,通过引入可适应的注意力偏置和预训练目标,提高了长期预测的准确性。

详情
AI中文摘要

准确预测动态图中节点级别的属性对于金融信任网络和生物网络等应用至关重要。现有时空图神经网络通常假设邻接矩阵是静态的。在本文中,我们提出了一种端到端的动态边偏置时空模型,该模型输入多维节点属性时间序列和邻接矩阵时间序列,以预测多个未来步骤的节点属性。在每个时间步,我们的基于变压器的模型将给定的邻接矩阵作为可适应的注意力偏置注入,使模型能够根据图的演变关注相关的邻居。我们进一步部署了一个掩码节点-时间预训练目标,使编码器能够重建缺失的特征,并通过调度采样和水平加权损失进行训练,以减轻长期预测中的复合误差。与先前工作不同,我们的模型能够适应不同输入样本中变化的动态图,使多系统设置中的预测成为可能,如不同主体的脑网络、不同情境的金融系统或演变的社会系统。实验证明,我们的方法在均方根误差(RMSE)和平均绝对误差(MAE)上一致优于强大的基线方法。

英文摘要

Accurate multistep forecasting of node-level attributes on dynamic graphs is critical for applications ranging from financial trust networks to biological networks. Existing spatiotemporal graph neural networks typically assume a static adjacency matrix. In this work, we propose an end-to-end dynamic edge-biased spatiotemporal model that ingests a multi-dimensional timeseries of node attributes and a timeseries of adjacency matrices, to predict multiple future steps of node attributes. At each time step, our transformer-based model injects the given adjacency as an adaptable attention bias, allowing the model to focus on relevant neighbors as the graph evolves. We further deploy a masked node-time pretraining objective that primes the encoder to reconstruct missing features, and train with scheduled sampling and a horizon-weighted loss to mitigate compounding error over long horizons. Unlike prior work, our model accommodates dynamic graphs that vary across input samples, enabling forecasting in multi-system settings such as brain networks across different subjects, financial systems in different contexts, or evolving social systems. Empirical results demonstrate that our method consistently outperforms strong baselines on Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE).

2511.07347 2026-05-20 physics.comp-ph cs.LG 版本更新

Walsh-Hadamard Neural Operators for Solving PDEs with Discontinuous Coefficients

Walsh-Hadamard神经算子用于求解具有不连续系数的偏微分方程

Giorgio M. Cavallazzi, Miguel Pérez Cuadrado, Alfredo Pinelli

发表机构 * Department of Engineering, City St George's, University of London(伦敦大学城市圣乔治学院工程系)

AI总结 本文提出Walsh-Hadamard神经算子(WHNO)以解决具有不连续系数的偏微分方程问题,通过结合Walsh-Hadamard变换和可学习的谱权重,有效捕捉全局依赖关系,并在三个测试问题中验证了其优于傅里叶神经算子(FNO)的准确性。

详情
AI中文摘要

神经算子已逐渐成为学习偏微分方程(PDEs)解算子的强大工具。然而,基于傅里叶变换的常规谱方法在处理具有不连续系数的问题时受到吉布斯现象和尖锐界面表示差的限制。我们引入了Walsh-Hadamard神经算子(WHNO),该方法利用Walsh-Hadamard变换——一种适用于分段常数场的矩形波函数谱基——结合可学习的谱权重,将低频Walsh系数转换以高效捕捉全局依赖关系。我们在三个问题上验证了WHNO:稳态达西流(初步验证)、具有不连续热导率的热传导以及具有不连续初始条件的二维Burgers方程。在与傅里叶神经算子(FNO)相同条件下进行的受控比较中,WHNO在准确性方面表现更优,能够更好地保持材料界面处的尖锐解特征。关键发现是,WHNO与FNO的加权集合组合在单独模型上实现了显著提升:对于热传导和Burgers方程,最优集合将均方误差减少35-40%,最大误差减少高达25%。这表明Walsh-Hadamard和傅里叶表示捕捉了不连续PDE解的互补方面,WHNO在尖锐界面处表现优异,而FNO有效捕捉平滑特征。

英文摘要

Neural operators have emerged as powerful tools for learning solution operators of partial differential equations (PDEs). However, standard spectral methods based on Fourier transforms struggle with problems involving discontinuous coefficients due to the Gibbs phenomenon and poor representation of sharp interfaces. We introduce the Walsh-Hadamard Neural Operator (WHNO), which leverages Walsh-Hadamard transforms-a spectral basis of rectangular wave functions naturally suited for piecewise constant fields-combined with learnable spectral weights that transform low-sequency Walsh coefficients to capture global dependencies efficiently. We validate WHNO on three problems: steady-state Darcy flow (preliminary validation), heat conduction with discontinuous thermal conductivity, and the 2D Burgers equation with discontinuous initial conditions. In controlled comparisons with Fourier Neural Operators (FNO) under identical conditions, WHNO demonstrates superior accuracy with better preservation of sharp solution features at material interfaces. Critically, we discover that weighted ensemble combinations of WHNO and FNO achieve substantial improvements over either model alone: for both heat conduction and Burgers equation, optimal ensembles reduce mean squared error by 35-40 percent and maximum error by up to 25 percent compared to individual models. This demonstrates that Walsh-Hadamard and Fourier representations capture complementary aspects of discontinuous PDE solutions, with WHNO excelling at sharp interfaces while FNO captures smooth features effectively.

2510.03589 2026-05-20 cs.LG 版本更新

FieldFormer: Locality-Aware Transformers for Spatio-Temporal Modeling on Sparse Sensor Networks

FieldFormer:用于稀疏传感器网络中时空建模的具有局部性的变换器

Ankit Bhardwaj, Ananth Balashankar, Lakshminarayanan Subramanian

发表机构 * Department of Computer Science(计算机科学系) New York University(纽约大学) Google DeepMind(谷歌深Mind)

AI总结 本文提出FieldFormer,一种无网格变换器架构,用于在持续传感器网络中进行具有局部性的传感器空间建模。通过学习可调节的速度缩放偏移量,聚合局部证据,以适应时空依赖性,并在极端稀疏性下实现稳定和可扩展的推理。

详情
AI中文摘要

现实世界系统中的时空传感器数据往往稀疏、噪声且不规则,使得潜在场重建从根本上处于欠约束状态。在极端稀疏性下,多个物理上合理的场可能与相同观测一致,要求模型依赖于关于局部性、传输和空间规律的归纳偏置。在这种情况下,可靠的重建集中在由传感器网络引起的观测支持上,使传感器空间建模比无约束的全局场恢复更具可识别性。我们引入FieldFormer,一种无网格变换器架构,用于在持续传感器网络中进行具有局部性的传感器空间建模。对于每个查询,FieldFormer通过可学习的速度缩放偏移量聚合局部证据,以适应邻域几何到时空依赖性。邻域被构建为固定最大稀疏上下文,覆盖附近的传感器和有限的时间窗口,使在极端稀疏性下实现稳定和可扩展的推理。一个局部变换器编码器整合邻域信息,而基于坐标的神经场公式支持无网格预测。我们在五个合成和现实世界基准上评估FieldFormer,包括各向异性热扩散、浅水动力学、大气传输和污染监测数据集。结果表明,具有局部性的重建在局部依赖域仍被观测时提供显著优势,使FieldFormer在稀疏传感器空间预测任务中一致优于最先进的基线。

英文摘要

Spatio-temporal sensor data in real-world systems is often sparse, noisy, and irregular, making latent field reconstruction fundamentally underconstrained. Under extreme sparsity, multiple physically plausible fields may remain consistent with the same observations, requiring models to rely on inductive biases about locality, transport, and spatial regularity. In such regimes, reliable reconstruction is concentrated around the observational support induced by the sensor network, making sensor-space modeling a more identifiable objective than unconstrained global field recovery. We introduce FieldFormer, a mesh-free transformer architecture for locality-aware sensor-space modeling in persistent sensor networks. For each query, FieldFormer aggregates local evidence using learnable velocity-scaled offsets that adapt neighborhood geometry to spatio-temporal dependencies. Neighborhoods are constructed as fixed maximal sparse contexts over nearby sensors and bounded temporal windows, enabling stable and scalable inference under extreme sparsity. A local transformer encoder integrates neighborhood information, while a coordinate-based neural field formulation supports mesh-free prediction. We evaluate FieldFormer on five synthetic and real-world benchmarks, including anisotropic heat diffusion, shallow-water dynamics, atmospheric transport, and pollution monitoring datasets. Results show that locality-aware reconstruction provides strong advantages when local domains of dependence remain observed, enabling FieldFormer to consistently outperform state-of-the-art baselines on sparse sensor-space prediction tasks.

2412.02818 2026-05-20 cs.RO cs.LG 版本更新

RoboMD: Uncovering Robot Vulnerabilities through Semantic Potential Fields

RoboMD: 通过语义势场揭示机器人漏洞

Som Sagar, Jiafei Duan, Sreevishakh Vasudevan, Yifan Zhou, Heni Ben Amor, Dieter Fox, Ransalu Senanayake

发表机构 * Arizona State University(亚利桑那州立大学) University of Washington(华盛顿大学)

AI总结 本研究提出RoboMD框架,通过学习基于连续视觉-语言嵌入的深度强化学习策略,揭示机器人在现实世界中因外部变化导致的漏洞,通过虚拟运行实现高效安全的漏洞分析,实验表明其能发现比现有基线多23%的漏洞,并提升机器人操作性能。

Comments 26 Pages, 20 figures

详情
AI中文摘要

机器人操作策略虽然对物理AI的前景至关重要,但在现实世界中存在外部变化时却极易产生漏洞。诊断这些漏洞面临两大挑战:(i)需要测试的 relevant 变化通常未知,(ii)直接在现实世界中测试成本高且不安全。我们介绍了一个框架,通过在连续视觉-语言嵌入上进行虚拟运行,学习一个单独的深度强化学习(深度RL)策略来预测漏洞。通过将富含语义和视觉变化的嵌入空间视为势场,该策略学会向易损区域移动并被成功区域排斥。该漏洞预测策略在虚拟运行中训练,使漏洞分析能够扩展和安全地进行,而无需昂贵的物理试验。通过查询该策略,我们的框架构建了一个概率性漏洞可能性地图。在模拟基准和物理机器人手臂上的实验表明,我们的框架揭示的漏洞比最先进的视觉-语言基线多出23%,揭示了被启发式测试忽略的细微漏洞。此外,我们展示了通过我们的框架发现的漏洞微调操作策略,可以使用更少的微调数据提升操作性能。

英文摘要

Robot manipulation policies, while central to the promise of physical AI, are highly vulnerable in the presence of external variations in the real world. Diagnosing these vulnerabilities is hindered by two key challenges: (i) the relevant variations to test against are often unknown, and (ii) direct testing in the real world is costly and unsafe. We introduce a framework that tackles both issues by learning a separate deep reinforcement learning (deep RL) policy for vulnerability prediction through virtual runs on a continuous vision-language embedding trained with limited success-failure data. By treating this embedding space, which is rich in semantic and visual variations, as a potential field, the policy learns to move toward vulnerable regions while being repelled from success regions. This vulnerability prediction policy, trained on virtual rollouts, enables scalable and safe vulnerability analysis without expensive physical trials. By querying this policy, our framework builds a probabilistic vulnerability-likelihood map. Experiments across simulation benchmarks and a physical robot arm show that our framework uncovers up to 23% more unique vulnerabilities than state-of-the-art vision-language baselines, revealing subtle vulnerabilities overlooked by heuristic testing. Additionally, we show that fine-tuning the manipulation policy with the vulnerabilities discovered by our framework improves manipulation performance with much less fine-tuning data.