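arXivDaily: a daily academic digest that syncs the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.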

2605.20182 2026-05-20 cs.LG cs.AI (version update)

Atoms of Thought: Universal EEG Representation Learning with Microstates

Xinyang Tian, Ruitao Liu, Ziyi Ye, Siyang Xue, Xin Wang, Xuesong Chen

Affiliations: Institute for Interdisciplinary Information Sciences, Tsinghua University; Institute of Trustworthy Embodied AI, Fudan University; School of Clinical Medicine, Tsinghua University; Beijing Five Seasons Medical Technology Co., Ltd.

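AI summary: This paper proposes a microstate-based approach to universal EEG representation learning. Continuous EEG signals are clustered into sequences of discrete microstates to build a universal microstate tokenizer, which outperforms time- and frequency-domain features on downstream tasks including sleep staging, emotion recognition, and motor imagery classification, while improving interpretability and scalability.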

Comments Accepted by the 3rd International Workshop on Multimodal and Responsible Affective Computing (MRAC 2025). 8 pages of main text, 23 pages total, 5 figures, 4 tables

DOI: 10.1145/3746270.3760230

Abstract

Learning universal representations from electroencephalogram (EEG) signals is a cutting-edge approach in the field of neuroinformatics and brain-computer interfaces (BCIs). Conventionally, EEG is treated as a multivariate temporal signal, where time- or frequency-domain features are extracted for representation learning. This paper investigates a simple yet effective EEG representation, i.e., microstates. Microstates represent the building blocks of brain activity patterns at a microscopic time scale. We build a universal microstate tokenizer from a large medical EEG dataset by clustering continuous EEG signals into sequences of discrete microstates. The microstate tokenizer is then adopted universally across a series of downstream tasks, including sleep staging, emotion recognition, and motor imagery classification. Experimental results show that EEG representation learning with microstates outperforms traditional time-domain and frequency-domain features under different models and across different tasks. Further analysis shows that microstates offer greater interpretability and scalability, thereby opening up applications in both cognitive neuroscience and clinical research.
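
The tokenizer idea can be illustrated with the classical microstate recipe: take scalp topographies at peaks of global field power (GFP), cluster them with polarity-invariant ("modified") k-means, and label every sample with its best-matching map. The following is a minimal NumPy sketch under stated assumptions, not the authors' tokenizer: the farthest-point initialisation, the number of states, and the synthetic two-map signal are illustrative choices.

```python
import numpy as np

# Illustrative sketch of a microstate tokenizer; not the paper's implementation.

def fit_microstates(eeg, n_states=4, n_iter=50, seed=0):
    """Modified (polarity-invariant) k-means on topographies at GFP peaks.

    eeg: array (channels, samples). Returns unit-norm maps, shape (n_states, channels).
    """
    rng = np.random.default_rng(seed)
    gfp = eeg.std(axis=0)                                      # global field power per sample
    peaks = np.flatnonzero((gfp[1:-1] > gfp[:-2]) & (gfp[1:-1] >= gfp[2:])) + 1
    maps = eeg[:, peaks].T                                     # one topography per GFP peak
    maps = maps - maps.mean(axis=1, keepdims=True)             # average reference
    maps /= np.linalg.norm(maps, axis=1, keepdims=True)

    # Assumed initialisation: farthest-point seeding on |spatial correlation|.
    centres = [maps[rng.integers(len(maps))]]
    for _ in range(1, n_states):
        sim = np.abs(maps @ np.array(centres).T).max(axis=1)
        centres.append(maps[np.argmin(sim)])
    centres = np.array(centres)

    for _ in range(n_iter):
        labels = np.abs(maps @ centres.T).argmax(axis=1)      # polarity is ignored
        for k in range(n_states):
            members = maps[labels == k]
            if len(members):
                # New centre: leading eigenvector of the members' scatter matrix.
                _, vecs = np.linalg.eigh(members.T @ members)
                centres[k] = vecs[:, -1]
    return centres

def tokenize(eeg, centres):
    """Label every sample with the index of its best-matching microstate map."""
    x = eeg - eeg.mean(axis=0, keepdims=True)
    x = x / np.linalg.norm(x, axis=0, keepdims=True)
    return np.abs(centres @ x).argmax(axis=0)

# Synthetic demo: two orthogonal 8-channel maps alternating every 50 samples,
# with random polarity, a smooth amplitude envelope, and a little noise.
rng = np.random.default_rng(1)
A = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)
B = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
t = np.arange(400)
truth = (t // 50) % 2
amp = (np.sin(np.pi * (t % 50) / 50) + 0.1) * rng.choice([-1.0, 1.0], size=t.size)
eeg = np.where(truth == 0, A[:, None], B[:, None]) * amp + 0.01 * rng.standard_normal((8, t.size))

centres = fit_microstates(eeg, n_states=2)
tokens = tokenize(eeg, centres)   # discrete microstate sequence, one token per sample
```

Because clustering ignores polarity, a map and its sign-flipped copy share one token; the recovered labels match the ground-truth segments up to a permutation of state indices.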

2605.20174 2026-05-20 cs.CV cs.LG (version update)

Multi-axis Analysis of Image Manipulation Localization

Keanu Nichols, Divya Appapogu, Giscard Biamby, Dina Bashkirova, Anna Rohrbach, Bryan A. Plummer

Affiliations: Boston University; University of California, Berkeley; Technical University of Darmstadt

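AI summary: This paper introduces AUDITS, a benchmark for multi-axis analysis of image manipulation detection that evaluates the robustness of existing methods under different types of domain shift, with the aim of driving the development of more reliable and generalizable detectors.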

Comments 28 pages, 5 figures, 5 tables

Abstract

Advanced image editing software enables easy creation of highly convincing image manipulations, which has been made even more accessible in recent years due to advances in generative AI. Manipulated images, while often harmless, could spread misinformation, create false narratives, and influence people's opinions on important issues. Despite this growing threat, there is limited research on detecting advanced manipulations across different visual domains. Thus, we introduce Analysis Under Domain-shifts, qualIty, Type, and Size (AUDITS), a comprehensive benchmark designed for studying axes of analysis in image manipulation detection. AUDITS comprises over 530K images from two distinct sources (user and news photos). We curate our dataset to support analysis across multiple axes using recent diffusion-based inpaintings, spanning a diverse range of manipulation types and sizes. We conduct experiments under different types of domain shift to evaluate robustness of existing image manipulation detection methods. Our goal is to drive further research in this area by offering new insights that would help develop more reliable and generalizable image manipulation detection methods.

2605.20167 2026-05-20 cs.AI cs.LG (version update)

HaorFloodAlert: Deseasonalized ML Ensemble for 72-Hour Flood Prediction in Bangladesh Haor Wetlands

Salma Hoque Talukdar Koli, Fahima Haque Talukder Jely, Md. Samiul Alim, Md. Zakir Hossen

Affiliations: Department of Computer Science & Engineering, RTM Al-Kabir Technical University, Sylhet-3100, Bangladesh; Department of Computer Science & Engineering, North East University Bangladesh, Sylhet, Bangladesh; Department of Computer Science & Engineering, Dhaka University of Engineering & Technology, Gazipur, Bangladesh

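AI summary: This paper presents HaorFloodAlert, a deseasonalized machine learning ensemble that predicts 72-hour flood probability in Bangladesh's haor wetlands; it removes the seasonal shortcut introduced by temperature and uses upstream Sentinel-1 SAR data to gain additional lead time.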

Comments 9 pages, 9 figures. To be submitted to raaicon.org

Abstract

Flash floods in Bangladesh's haor wetlands show up with almost no warning. They wreck the annual boro rice harvest. Current setups, built for riverine floods, miss backwater dynamics entirely. These basins are flat. Water does not behave like it does on the Brahmaputra. We built HaorFloodAlert, a deseasonalized machine learning ensemble that forecasts 72-hour flood probability for the Sunamganj Haor (approximately 8,000 km²). Temperature was acting as a seasonal cheat code: it inflated accuracy by 6.9 pp just because floods happen in warm months. We caught that. We also built an upstream Barak River Sentinel-1 SAR proxy from Silchar, Assam, giving about 36 hours of lead time. Otsu-thresholded SAR change detection validates at 84-91 percent spatial match. The operational ensemble (RF 0.5625 + XGBoost 0.4375) hits 89.6 percent LOOCV accuracy, 87.5 percent recall, and 0.943 AUC-ROC on 77 real Sentinel-1 events. A three-tier alert pipeline and a BRRI-calibrated boro rice damage estimator are included.
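
Otsu thresholding picks the histogram split that maximises between-class variance. For SAR change detection it is commonly applied to a log-ratio (dB difference) image, where newly flooded pixels darken. The following minimal sketch assumes co-registered pre- and post-event intensity images and uses a synthetic scene; it is not the HaorFloodAlert pipeline.

```python
import numpy as np

def otsu_threshold(values, bins=256):
    """Histogram bin edge that maximises Otsu's between-class variance."""
    hist, edges = np.histogram(values, bins=bins)
    centres = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)                       # count in the lower class for each candidate split
    w1 = w0[-1] - w0                           # count in the upper class
    s0 = np.cumsum(hist * centres)
    m0 = s0 / np.maximum(w0, 1)                # class means (guarded against empty classes)
    m1 = (s0[-1] - s0) / np.maximum(w1, 1)
    between = w0 * w1 * (m0 - m1) ** 2         # proportional to between-class variance
    return edges[np.argmax(between) + 1]       # upper edge of the last lower-class bin

def flood_mask(pre, post):
    """Flag pixels whose backscatter dropped sharply between two co-registered intensity images."""
    change_db = 10 * np.log10(post / pre)      # open water is dark, so flooding gives negative dB
    return change_db < otsu_threshold(change_db.ravel())

# Synthetic scene (assumed values): speckle-like variation plus one inundated block.
rng = np.random.default_rng(0)
pre = rng.uniform(0.05, 0.15, size=(64, 64))
post = pre * rng.uniform(0.8, 1.25, size=(64, 64))
post[16:32, 16:32] *= 0.05                     # about a -13 dB drop
mask = flood_mask(pre, post)
```

The operational ensemble quoted in the abstract is presumably a fixed-weight average of the two models' flood probabilities, p = 0.5625·p_RF + 0.4375·p_XGB, with weights summing to 1.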

2605.20159 2026-05-20 cs.CV cond-mat.mtrl-sci cs.LG (version update)

Interpretable Computer Vision for Defect Detection in X-ray Tomography of Aerospace SiC/SiC Composites

Antonio Peña Corredor, Julien Lesseur, Romain Nunez, Paul Rivalland, Thomas Philippe

Affiliations: Safran Ceramics; Safran Engineering Services

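AI summary: This study proposes p-ResNet-50, a convolutional framework extended with a prototype layer. New regularization terms and the alignment of prototypes with expert-defined semantic categories make defect detection in X-ray tomography interpretable and traceable, while keeping accuracy comparable to a black-box baseline.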

Abstract

Non-destructive testing of aerospace SiC/SiC composites via X-ray computed tomography (XCT) relies on expert visual assessment, with current workflows offering limited traceability for accept/reject decisions. Deep convolutional networks can automate defect detection, yet their black-box nature conflicts with the transparency that industrial inspection practice demands. To close this gap, we introduce p-ResNet-50, a convolutional framework extended with a prototype layer that couples high detection accuracy with case-based explanations. Six learned prototypes are explicitly aligned with expert-defined semantic categories (healthy matrix, matrix-air interfaces, pores, line-like defects, and mixed morphologies), so that every classification is traceable to a physically meaningful reference. Two novel regularisation terms, anchor-based and medoid-based, tether prototypes to expert-selected patches and prevent prototype collapse, addressing a known limitation of prototype networks. Latent-space analysis via UMAP delineates semantically coherent sub-domains and maps zones of uncertainty where misclassifications concentrate, giving inspectors an explicit picture of where the model is, and is not, reliable. The framework is validated on an XCT patch dataset of approximately 12,000 patches extracted from four defect-rich SiC/SiC laboratory specimens. Taking a black-box ResNet-50 as a baseline (ROC-AUC = 0.991), the prototype extension achieves comparable performance (accuracy 0.957 vs. 0.959; ROC-AUC 0.994 vs. 0.993) while trading a slight reduction in sensitivity for higher precision and specificity. Each decision is backed by representative evidence patches, and the model explicitly flags its uncertainty regions. Beyond defect mapping, the framework establishes a reusable methodology for embedding domain-expert knowledge into prototype networks, applicable to other XCT inspection scenarios requiring traceable, auditable decisions.
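
A prototype layer scores a patch by how closely any location of its feature map matches each learned prototype, which is what makes each decision traceable to an evidence location. The sketch below assumes the log-similarity activation popularised by ProtoPNet; the abstract does not state which activation p-ResNet-50 uses, and `anchor_penalty` is a hypothetical stand-in for the anchor-based regulariser, whose exact form is not given.

```python
import numpy as np

def prototype_layer(fmap, prototypes, eps=1e-4):
    """ProtoPNet-style prototype activations (assumed form).

    fmap: (H, W, D) feature map of one patch; prototypes: (P, D).
    Returns the similarity per prototype, its minimum squared distance, and the
    (row, col) of the best-matching feature vector, i.e. the evidence location.
    """
    H, W, D = fmap.shape
    feats = fmap.reshape(H * W, D)
    d2 = ((feats[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)   # (H*W, P)
    min_d2 = d2.min(axis=0)                          # closest location for each prototype
    sims = np.log((min_d2 + 1) / (min_d2 + eps))     # large when a prototype is matched closely
    rows, cols = np.unravel_index(d2.argmin(axis=0), (H, W))
    return sims, min_d2, list(zip(rows.tolist(), cols.tolist()))

def anchor_penalty(prototypes, anchors):
    """Hypothetical anchor-style regulariser: mean squared distance between each
    prototype and the embedding of its expert-selected anchor patch."""
    return float(np.mean(np.sum((prototypes - anchors) ** 2, axis=1)))

rng = np.random.default_rng(0)
fmap = rng.standard_normal((4, 4, 3))
protos = np.stack([fmap[1, 2], fmap[3, 0]])          # prototypes copied from two locations
sims, min_d2, locs = prototype_layer(fmap, protos)
class_weights = np.array([[1.0, 0.0], [0.0, 1.0]])   # hypothetical: prototype j votes for class j
logits = sims @ class_weights
```

Each logit is a weighted sum of prototype similarities, so the evidence for a decision can be shown as the patch locations in `locs`.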

2605.20157 2026-05-20 cs.LG cs.CR cs.IR (version update)

SAGE: Scalable Automatic Gating Ensemble for Confident Negative Harvesting in Fraud Detection

Sudheer Tubati, Amit Goyal

Affiliation: Amazon Music

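AI summary: This paper proposes SAGE, a counterfactual-aware negative harvesting method that combines SimHash-based stratified sampling with a modular gating ensemble to identify confident negatives in unlabeled data for fraud detection, addressing representation bias in positive-unlabeled learning.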

DOI: 10.1145/3779211.3793166
Journal ref: WSDM Companion '26: Nineteenth ACM International Conference on Web Search and Data Mining, 2026, Pages 34 - 38

Abstract

Music streaming fraud, where bad actors artificially inflate stream counts to manipulate chart rankings and royalty payments, poses a significant threat to streaming services and legitimate content creators. Traditional fraud detection approaches struggle with a critical challenge: many legitimate edge cases, including super-fans and sleep-music sessions, exhibit activity patterns that closely mimic those of coordinated fraud. We present SAGE, a novel counterfactual-aware negative harvesting approach that combines SimHash-based stratified sampling with a modular gating ensemble for confident negative identification from unlabeled data. Our ensemble architecture employs pluggable statistical gates (currently instantiated with Mahalanobis distance and k-NN density) with configurable voting thresholds enabling adaptive precision-recall trade-offs. This addresses the representation bias problem in Positive-Unlabeled learning by ensuring comprehensive coverage of rare behavioral cohorts through floor-constrained sampling. Evaluation demonstrates strong precision and recall on held-out data. The approach generalizes across fraud detection domains, achieving strong performance on both customer-level and artist-level fraud without modification to the core methodology.
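
SAGE's two ingredients can be sketched as follows. SimHash assigns each unlabeled record a bucket from the signs of random projections; floor-constrained stratified sampling takes a proportional quota from every bucket but never fewer than a floor, so rare cohorts are represented. Two statistical gates (Mahalanobis distance and k-NN distance to known fraud) then vote, and a point is harvested as a confident negative only if enough gates agree. Gate definitions, thresholds, and feature spaces here are assumptions for illustration; the abstract does not specify them.

```python
import numpy as np

# Illustrative sketch; gate definitions and thresholds are assumptions, not SAGE's.

def simhash_buckets(X, n_bits=8, seed=0):
    """SimHash: sign pattern of random projections, packed into an integer bucket id."""
    planes = np.random.default_rng(seed).standard_normal((X.shape[1], n_bits))
    bits = (X @ planes > 0).astype(int)
    return bits @ (1 << np.arange(n_bits))

def floor_constrained_sample(buckets, n_total, floor=2, seed=0):
    """Proportional stratified sample with a per-bucket minimum, so rare cohorts appear."""
    rng = np.random.default_rng(seed)
    ids, counts = np.unique(buckets, return_counts=True)
    picks = []
    for b, c in zip(ids, counts):
        share = round(n_total * int(c) / len(buckets))
        quota = min(max(floor, share), int(c))       # cannot take more than the bucket holds
        picks.append(rng.choice(np.flatnonzero(buckets == b), quota, replace=False))
    return np.concatenate(picks)

def mahalanobis_gate(X, positives, threshold):
    """Votes 'negative' for points far from the known-fraud distribution."""
    diff = X - positives.mean(axis=0)
    prec = np.linalg.pinv(np.cov(positives, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", diff, prec, diff)
    return np.sqrt(np.maximum(d2, 0)) > threshold

def knn_gate(X, positives, k, threshold):
    """Votes 'negative' when the k-th nearest known-fraud case is far away (low fraud density)."""
    dist = np.linalg.norm(X[:, None, :] - positives[None, :, :], axis=-1)
    return np.sort(dist, axis=1)[:, k - 1] > threshold

def confident_negatives(X, positives, min_votes=2, maha_t=3.0, k=5, knn_t=1.0):
    """Harvest a point as a confident negative only if at least `min_votes` gates agree."""
    votes = (mahalanobis_gate(X, positives, maha_t).astype(int)
             + knn_gate(X, positives, k, knn_t).astype(int))
    return votes >= min_votes

rng = np.random.default_rng(0)
unlabeled = rng.standard_normal((1000, 4))
buckets = simhash_buckets(unlabeled, n_bits=6)
sample = floor_constrained_sample(buckets, n_total=100, floor=2)

positives = rng.standard_normal((200, 2))                 # toy "known fraud" features
candidates = np.array([[0.0, 0.0], [10.0, 10.0]])
confident = confident_negatives(candidates, positives)    # only the far-away point qualifies
```

Raising `min_votes` trades recall of harvested negatives for precision, which mirrors the configurable voting threshold described in the abstract.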

2605.20151 2026-05-20 cs.LG math.ST stat.TH (version update)

When Does Model Collapse Occur in Structured Interactive Learning?

Yuchen Wu, Kangjie Zhou, Weijie Su

Affiliations: School of Operations Research and Information Engineering, Cornell University; Department of Statistics, Columbia University; Department of Statistics and Data Science, University of Pennsylvania

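AI summary: This work studies when generative models degrade (model collapse) in structured interactive learning environments. By analyzing the topology of the interaction graph, it derives necessary and sufficient conditions for model collapse and validates the theory with numerical experiments.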

Comments 57 pages, 12 figures

Abstract

The proliferation of generative artificial intelligence has given rise to an interactive learning environment, where model parameters are continuously updated using not only data generated by natural processes, but also synthetic outputs produced by other models. This paradigm introduces two major challenges: (1) training data are no longer drawn exclusively from the target population, undermining a core assumption of classical statistical learning, and (2) model training processes become inherently correlated, as models interact with one another through repeated exposure to each other's synthetic outputs in a potentially complex manner. Establishing reliable statistical inference in such structured interactive learning environments therefore remains an important open problem. In particular, there is growing concern about model collapse, a phenomenon in which the performance of generative models progressively degrades as they are trained on synthetic data produced by earlier model generations. Prior work on model collapse primarily focuses on a single model trained on its own output, failing to capture model performance in multi-model interactive settings. In this work, we fill this gap by investigating the performance of generative models in an interactive learning environment with general interaction patterns. In particular, we formalize model interactions using directed graphs and show that the occurrence of model collapse depends critically on the topology of the interaction graph. We further derive an explicit necessary and sufficient condition characterizing when model collapse occurs, and establish finite-sample results for linear regression and asymptotic guarantees for general M-estimators. We support our theoretical findings through extensive numerical experiments.
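
A toy simulation makes the role of the interaction graph concrete. Each node is a linear-regression model; every round it refits on labels generated by the models (or the real source) it points to. In a closed loop, each round's estimation noise is added to the previous error, so the error grows roughly linearly with the number of rounds, whereas a node that also receives fresh real data stays anchored. This is an illustrative sketch with assumed sample sizes and Gaussian designs; it does not implement the paper's necessary-and-sufficient condition.

```python
import numpy as np

def simulate(graph, n=50, d=5, sigma=1.0, rounds=50, seed=0):
    """Toy interactive learning on a directed graph of linear-regression models.

    graph[m] lists the sources model m trains on each round: another model's name
    (synthetic labels from that model) or "real" (fresh labels from the true parameter).
    Returns each model's squared parameter error after the final round.
    """
    rng = np.random.default_rng(seed)
    theta_star = rng.standard_normal(d)

    def draw(theta):
        X = rng.standard_normal((n, d))
        return X, X @ theta + sigma * rng.standard_normal(n)

    def fit(batches):
        X = np.vstack([b[0] for b in batches])
        y = np.concatenate([b[1] for b in batches])
        return np.linalg.lstsq(X, y, rcond=None)[0]

    theta = {m: fit([draw(theta_star)]) for m in graph}      # round 0: trained on real data
    for _ in range(rounds):
        prev = dict(theta)                                    # synchronous update
        for m, sources in graph.items():
            batches = [draw(theta_star if s == "real" else prev[s]) for s in sources]
            theta[m] = fit(batches)
    return {m: float(np.sum((theta[m] - theta_star) ** 2)) for m in graph}

closed_loop = {"m": ["m"]}             # trains only on its own outputs
anchored = {"m": ["real", "m"]}        # also sees fresh real data every round
collapse = np.mean([simulate(closed_loop, seed=s)["m"] for s in range(20)])
stable = np.mean([simulate(anchored, seed=s)["m"] for s in range(20)])
```

Other topologies, such as a two-model cycle `{"a": ["b"], "b": ["a"]}`, can be passed in the same way to explore how the graph structure affects error growth.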

2605.20145 2026-05-20 stat.ML cs.LG stat.ME (version update)

Goal-Oriented Lower-Tail Calibration of Gaussian Processes for Bayesian Optimization

Aurélien Pion, Emmanuel Vazquez

Affiliation: Univ. Paris-Saclay, CNRS, CentraleSupélec, L2S, Gif-sur-Yvette, France

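AI summary: For standard Gaussian process models in the noiseless setting, this paper studies goal-oriented calibration of the predictive distribution below a low threshold $t$. It proposes tcGP, a post-hoc method that calibrates the part of the predictive distribution below $t$, and shows that the resulting EI-based global optimization algorithm remains dense in the design space. Experiments show improved lower-tail calibration and Bayesian optimization performance compared with standard and globally calibrated GP models.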

Journal ref: ICML 2026

Abstract

Bayesian optimization (BO) selects evaluation points for expensive black-box objectives using Gaussian process (GP) predictive distributions. Kernel choice and hyperparameter selection can lead to miscalibrated predictive distributions and an inappropriate exploration-exploitation trade-off. For minimization, sampling criteria such as expected improvement (EI) depend on the predictive distribution below the current best value, so lower-tail miscalibration directly affects the sampling decision. This article studies goal-oriented calibration of GP predictive distributions below a low threshold $t$ in the noiseless setting, for standard GP models with hyperparameters selected by maximum likelihood. A framework for predictive reliability below $t$ is introduced, based on two notions of spatial calibration: occurrence calibration over the design space and thresholded $\mu$-calibration on sublevel sets of the form $\{x\in\mathbb{X}, f(x)\le t\}$. Building on this framework, we propose tcGP, a post-hoc method that calibrates GP predictive distributions below $t$, and we show that the resulting EI-based global optimization algorithm remains dense in the design space. Experiments on standard benchmarks show improved lower-tail calibration and BO performance relative to standard GP models and globally calibrated GP models.
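
Expected improvement makes the lower-tail dependence explicit: for minimisation, EI(x) = E[max(f_best - Y, 0)] integrates only the predictive mass below the incumbent, so a miscalibrated lower tail changes which point is sampled. Below is a minimal sketch of the standard closed form for a Gaussian predictive distribution; tcGP's calibration step itself is not reproduced, since the abstract does not give its formula.

```python
import math

def expected_improvement(mu, sigma, f_best):
    """Closed-form EI for minimisation when the predictive distribution is N(mu, sigma^2).

    EI = E[max(f_best - Y, 0)] = (f_best - mu) * Phi(z) + sigma * phi(z),  z = (f_best - mu) / sigma.
    Only predictive mass below f_best contributes, i.e. the lower tail.
    """
    if sigma <= 0:                                         # deterministic prediction
        return max(f_best - mu, 0.0)
    z = (f_best - mu) / sigma
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))       # standard normal CDF
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    return (f_best - mu) * Phi + sigma * phi

# A point whose mean sits above the incumbent: widening its predictive spread
# (as a hypothetical lower-tail recalibration might) raises EI substantially.
narrow = expected_improvement(1.0, 0.5, 0.5)
wide = expected_improvement(1.0, 2.0, 0.5)
```

Because EI depends on the predictive distribution only through its lower tail, calibrating that region specifically, rather than the whole distribution, is what matters for the sampling decision.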

2605.20134 2026-05-20 cs.LG (version update)

TrajTok: Adaptive Spatial Tokenization for Trajectory Representation Learning

Zhen Xiong, Shang-Ling Hsu, Cyrus Shahabi

Affiliation: University of Southern California

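AI summary: This paper proposes TrajTok, which learns general-purpose trajectory representations through adaptive spatial tokenization. Combining a learned multi-resolution hexagonal cell partition with masked-token pretraining, it performs strongly on trajectory similarity search, classification, estimated time of arrival, and travel-time regression.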

Abstract

Learning generalizable trajectory representations from raw GPS traces remains difficult because the data is continuous, noisy, and irregularly sampled. Spatial tokenization is also challenging: fine grids yield sparse cells with weak embeddings, while coarse grids merge heterogeneous movement patterns into the same token. We present TrajTok, a trajectory encoder with a simple pretraining recipe for transferable trajectory embeddings. TrajTok first learns a multi-resolution hexagonal cell partition from the spatial distribution of GPS points, converting noisy GPS sequences into discrete cell tokens. To capture both geometry and kinematics, it uses a factorized transformer encoder with early per-modality self-attention blocks, cross-attention fusion layers, and spatiotemporal rotary position embeddings, ST-RoPE, to encode where and when each token occurs. TrajTok is pretrained with masked-token modeling that recovers both geometric structure and kinematic patterns from partial trajectory observations. On the Porto dataset, a frozen TrajTok encoder with lightweight task adapters achieves strong performance across trajectory similarity search, classification, estimated time of arrival, and full travel-time regression, outperforming multiple task-specific methods. The same frozen encoder supports both geometry-dominated and kinematics-dominated tasks, suggesting that TrajTok learns transferable trajectory structure rather than task-specific shortcuts. These results indicate that learned multi-resolution spatial tokenization combined with masked-token pretraining is a promising direction for general-purpose trajectory foundation models.
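
Hexagonal tokenization maps each GPS fix to a discrete cell id. The sketch below assumes coordinates already projected to a local metric frame (metres), uses standard pointy-top axial-coordinate rounding, and adds a hypothetical two-level refinement in which points in crowded coarse cells receive fine-cell tokens. TrajTok learns its multi-resolution partition from data; this fixed rule only illustrates the idea. Production systems often use a hierarchical hexagonal index such as Uber's H3 instead.

```python
import math
from collections import Counter

# Illustrative sketch; the fixed two-level rule stands in for TrajTok's learned partition.

def cube_round(q, r):
    """Round fractional axial hex coordinates to the nearest hex (cube-coordinate rounding)."""
    s = -q - r
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return rq, rr

def point_to_hex(x, y, size):
    """Pointy-top hex cell (q, r) containing planar point (x, y); size = centre-to-corner distance."""
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    return cube_round(q, r)

def adaptive_tokens(points, coarse=1000.0, fine=250.0, max_points=50):
    """Two-level tokens (level, q, r): crowded coarse cells are re-tokenised at the fine resolution."""
    coarse_ids = [point_to_hex(x, y, coarse) for x, y in points]
    counts = Counter(coarse_ids)
    tokens = []
    for (x, y), cid in zip(points, coarse_ids):
        if counts[cid] > max_points:
            tokens.append((1,) + point_to_hex(x, y, fine))
        else:
            tokens.append((0,) + cid)
    return tokens

# Points in metres (assumed local projected frame): a dense cluster and a sparse outlier.
dense = [(i % 10 - 5.0, i // 10 - 5.0) for i in range(100)]
sparse = [(5000.0, 5000.0)] * 3
tokens = adaptive_tokens(dense + sparse)
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}   # token -> integer id
ids = [vocab[t] for t in tokens]
```

The resulting integer ids are what a transformer encoder would embed; dense regions get finer cells and so avoid merging heterogeneous movement into one token, while sparse regions keep coarse cells with enough data per embedding.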

2605.20132 2026-05-20 physics.geo-ph cs.LG eess.SP (version update)

FiLark: a streaming-first software framework for end-to-end exploration, annotation, and algorithm integration in distributed acoustic sensing

Jintao Li, Weichang Li, Kai Tong, Xaingyu Guo

Affiliations: State Key Laboratory of Ocean Sensing & Ocean College, Zhejiang University, Zhoushan 316021, China; College of Information Science & Electronic Engineering, Zhejiang University, Hangzhou, China; College of Computer Science & Technology, Zhejiang University, Hangzhou, China

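AI summary: This paper presents FiLark, a framework that applies a streaming-first principle to support end-to-end exploration, annotation, and algorithm integration for distributed acoustic sensing (DAS) data, addressing the inability of conventional batch-oriented frameworks to handle continuous, high-channel-count data streams.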

Abstract

Distributed acoustic sensing (DAS) systems generate continuous, ultra-high-channel-count data streams at rates that exceed the capabilities of conventional batch-oriented analysis frameworks. As a result, essential tasks such as interactive exploration of long-duration recordings, scalable event annotation, and real-time algorithm-in-the-loop monitoring remain inadequately supported by workflows built around manually selected data segments and offline processing. This paper presents FiLark (Fiber Lark), a Python framework that applies a streaming-first principle uniformly across data access, signal processing, visualization and monitoring for DAS. Instead of operating on manually selected data segments, FiLark presents any DAS source, including continuous multi-file recordings, as a unified stream and builds all system components around that abstraction. An OpenGL-based ring-buffer renderer enables interactive browsing and visualization of arbitrarily long recordings with constant memory usage. An integrated annotation interface supports event labeling directly within continuous data streams, facilitating the creation of reproducible machine-learning-ready labeled datasets without offline preprocessing. The signal processing library includes temporal, spatial, spectral, and decomposition-based operators, with both CPU implementations and GPU-accelerated variants via PyTorch, alongside stateful chunked execution that preserves processing continuity and application semantics across segment boundaries. A standardized monitor interface further integrates streaming detectors and learning-based models into the visualization workflow. By sharing a common streaming abstraction across all layers, FiLark allows processing configurations and workflows developed interactively to transfer directly to scalable production pipelines without modification.
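
Two streaming primitives mentioned in the abstract can be sketched in a few lines. A fixed-capacity ring buffer keeps memory constant however long the recording is, and stateful chunked filtering carries the tail of each chunk into the next, so the result is identical to processing the whole recording at once. The class names and the FIR example are assumptions for illustration, not FiLark's API; FiLark itself renders with OpenGL and offers PyTorch GPU variants.

```python
import numpy as np

# Illustrative sketch; class names are hypothetical, not FiLark's API.

class RingBuffer:
    """Keeps only the newest `capacity` samples per channel: constant memory for any stream length."""

    def __init__(self, n_channels, capacity):
        self.data = np.zeros((n_channels, capacity))
        self.capacity, self.write, self.filled = capacity, 0, 0

    def push(self, chunk):
        chunk = chunk[:, -self.capacity:]              # older samples would be overwritten anyway
        n = chunk.shape[1]
        self.data[:, (self.write + np.arange(n)) % self.capacity] = chunk
        self.write = (self.write + n) % self.capacity
        self.filled = min(self.filled + n, self.capacity)

    def latest(self):
        """Buffered samples in time order, oldest first."""
        idx = (self.write - self.filled + np.arange(self.filled)) % self.capacity
        return self.data[:, idx]

class StatefulFIR:
    """Causal FIR filter applied chunk by chunk; the carried tail hides chunk boundaries."""

    def __init__(self, taps, n_channels):
        self.taps = np.asarray(taps, dtype=float)
        self.history = np.zeros((n_channels, len(self.taps) - 1))

    def process(self, chunk):
        x = np.concatenate([self.history, chunk], axis=1)
        out = np.stack([np.convolve(ch, self.taps, mode="valid") for ch in x])
        self.history = x[:, x.shape[1] - (len(self.taps) - 1):]
        return out

# Stream a toy 2-channel recording in chunks of 17 samples.
stream = np.arange(200, dtype=float).reshape(2, 100)
fir, ring = StatefulFIR([0.25] * 4, n_channels=2), RingBuffer(n_channels=2, capacity=30)
outs = []
for start in range(0, stream.shape[1], 17):
    chunk = stream[:, start:start + 17]
    outs.append(fir.process(chunk))
    ring.push(chunk)
chunked = np.concatenate(outs, axis=1)
```

Because the filter state travels with the stream rather than with a file or segment, the same processing object can be reused unchanged between interactive browsing and a production pipeline.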

2605.20127 2026-05-20 q-bio.NC cs.AI cs.LG (version update)

Beyond Prediction Accuracy: Target-Space Recovery Profiles for Evaluating Model-Brain Alignment

Ken Nakamura, Tomoya Nakai, Ryuto Yashiro, Ayumu Yamashita, Kaoru Amano

Affiliations: The University of Tokyo; Osnabrück University and Freie Universität Berlin; Kobe University

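AI summary: This paper proposes a new way to evaluate model-brain alignment: by identifying which reproducibly predictable dimensions of the target brain's response space are recovered by prediction, it reveals aspects of alignment that prediction accuracy alone does not capture.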

Comments 34 pages, 12 figures, 5 tables

Abstract

Artificial vision models are often evaluated against the human visual cortex by measuring how accurately their internal representations predict brain responses. However, prediction accuracy alone does not indicate which dimensions of the target brain's response space are recovered. Here, we introduce a unified framework for evaluating both model-brain and brain-brain alignment by identifying the response dimensions recovered by prediction. Using repeated fMRI measurements, we first identify target-brain response dimensions that can be reproducibly predicted across independent trial splits. We then predict target-brain responses from either another subject's brain responses or a vision model's internal representations, and quantify how strongly each of these reproducible response dimensions is recovered. Applying this framework to a subset of the Natural Scenes Dataset, in which eight subjects viewed the same natural images during fMRI, we find that the early-to-intermediate visual-cortex responses contain a low-dimensional set of reproducible dimensions. Brain-to-brain comparisons identify which of these dimensions are consistently recoverable from other subjects' brains, providing a diagnostic human reference rather than only a scalar benchmark. In some cases, pretrained and randomly initialized models achieve similar prediction accuracy while showing distinct recovery profiles across these response dimensions. These results show that prediction accuracy alone can mask model-brain mismatches. By making explicit which reproducible brain response dimensions are recovered by prediction, our framework provides a more diagnostic evaluation of alignment between artificial vision models and the human visual cortex.
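
The evaluation logic can be sketched with NumPy: find candidate response dimensions in one trial split, keep those whose projections replicate in an independent split, then measure how well a predictor's held-out outputs recover each dimension. PCA for the candidate dimensions, ridge regression as the predictor, the 0.5 replication cutoff, and the synthetic "brain" are all assumptions for illustration; the paper's exact procedure may differ.

```python
import numpy as np

# Illustrative sketch; PCA, ridge regression and the 0.5 cutoff are assumptions.

def reproducible_dims(Y1, Y2, n_dims=10, min_r=0.5):
    """Candidate target dimensions from trial split 1 (PCA), kept only if their
    projections replicate in the independent split 2. Y1, Y2: (images, voxels)."""
    _, _, Vt = np.linalg.svd(Y1 - Y1.mean(axis=0), full_matrices=False)
    V = Vt[:n_dims].T                                               # (voxels, n_dims)
    r = np.array([np.corrcoef(Y1 @ v, Y2 @ v)[0, 1] for v in V.T])
    return V[:, r > min_r], r

def recovery_profile(X_tr, Y_tr, X_te, Y_te, V, alpha=1.0):
    """Ridge-predict target responses from source features (model units or another
    subject's voxels); report held-out recovery of each reproducible dimension."""
    W = np.linalg.solve(X_tr.T @ X_tr + alpha * np.eye(X_tr.shape[1]), X_tr.T @ Y_tr)
    P = X_te @ W
    return np.array([np.corrcoef(P @ v, Y_te @ v)[0, 1] for v in V.T])

# Toy target brain: 3 latent image features with decreasing variance, 40 voxels, 2 trial splits.
rng = np.random.default_rng(0)
n_img, n_vox = 1000, 40
S = rng.standard_normal((n_img, 3)) * np.array([3.0, 2.0, 1.0])
A = np.linalg.qr(rng.standard_normal((n_vox, 3)))[0].T            # orthonormal voxel patterns
Y1 = S @ A + 0.3 * rng.standard_normal((n_img, n_vox))
Y2 = S @ A + 0.3 * rng.standard_normal((n_img, n_vox))
V, r = reproducible_dims(Y1, Y2)

# A "model" that encodes latents 0 and 2 but not 1: fit on split 1, score against split 2.
feats = S[:, [0, 2]]
tr, te = slice(0, 800), slice(800, 1000)
profile = recovery_profile(feats[tr], Y1[tr], feats[te], Y2[te], V)
```

A single pooled accuracy number would credit this model for two of three dimensions without saying which one is missing; the per-dimension profile makes the gap explicit.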

2605.20122 2026-05-20 stat.ML cs.CC cs.LG (version update)

Optimizing Computational-Statistical Runtime for Wasserstein Distance Estimation

Peter Matthew Jacobs, Jeff M. Phillips

Affiliations: Department of Statistics; Kahlert School of Computing; University of Wisconsin-Madison; University of Utah

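AI summary: This paper proposes a Sample-Sketch-Solve approach that compresses samples with a regular Cartesian grid sketch to speed up Wasserstein distance computation, achieving ε-error estimates with improved runtime for Hölder-smooth distributions.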

Abstract

Squared Wasserstein distance is a frequently used tool to measure discrepancy between probability distributions. This distance is typically computed between empirical measures of size $n$ from two underlying random samples. Unfortunately, even in lower dimensional Euclidean space problems $\left( d \in \{2,3\} \right)$, algorithms for Wasserstein distance computation with approximate or exact precision guarantees scale poorly in the runtime as a function of $n$ and the desired precision. In response, we consider the computational-statistical runtime, where the goal is to estimate from samples the Wasserstein distance between potentially smooth measures up to $\varepsilon$-additive error in expectation with respect to the sampling; we allow $O(1)$ computational cost for collecting a sample. Towards this, we develop a Sample-Sketch-Solve paradigm where we introduce a regular Cartesian grid sketch of the samples. We show that (especially under $\alpha$-Hölder smooth distributions) this can compress the data without increasing asymptotic error, and also regularizes the structure which enables faster exact algorithms. Ultimately, we approximate $W_2^2(P,Q)$ within $\varepsilon$ error in $\varepsilon^{-\max(2,\frac{d+1+o(1)}{1+\alpha})}$ time for $0 < \alpha < 1$ Hölder smooth distributions $P,Q$ on $(0,1)^{d}$; an optimal $\Theta(\varepsilon^{-2})$ for $\alpha > 1/2$ when $d=2$ and nearly optimal as $\alpha \to 1$ when $d = 3$.
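
The sketch step can be illustrated by snapping samples to the centres of a regular m^d grid, which turns n samples into at most m^d weighted points. To keep the example dependency-free, the solve step below swaps in an entropic (log-domain Sinkhorn) approximation, whereas the paper uses exact algorithms; this substitution introduces a small bias. The grid size, regularisation strength, and test distributions are illustrative assumptions.

```python
import numpy as np

def grid_sketch(samples, m):
    """Snap samples in (0,1)^d to the centres of a regular m^d grid.
    Returns the occupied cell centres and their empirical weights."""
    idx = np.clip(np.floor(samples * m).astype(int), 0, m - 1)
    cells, counts = np.unique(idx, axis=0, return_counts=True)
    return (cells + 0.5) / m, counts / counts.sum()

def _logsumexp(a, axis):
    amax = a.max(axis=axis, keepdims=True)
    return (amax + np.log(np.exp(a - amax).sum(axis=axis, keepdims=True))).squeeze(axis)

def sinkhorn_w2sq(x, a, y, b, eps=0.005, iters=3000):
    """Entropic (log-domain Sinkhorn) approximation of W_2^2 between weighted point sets.
    Swapped in for illustration; not the exact solver analysed in the paper."""
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)      # squared Euclidean cost
    f, g = np.zeros(len(a)), np.zeros(len(b))
    for _ in range(iters):
        f = eps * np.log(a) - eps * _logsumexp((g[None, :] - C) / eps, axis=1)
        g = eps * np.log(b) - eps * _logsumexp((f[:, None] - C) / eps, axis=0)
    plan = np.exp((f[:, None] + g[None, :] - C) / eps)
    return float((plan * C).sum())

rng = np.random.default_rng(0)
P = rng.uniform(0.0, 0.5, size=(2000, 2))
Q = P + np.array([0.5, 0.0])          # translated copy, so the exact W_2^2 is 0.25
xs, a = grid_sketch(P, m=8)           # 2000 samples -> 16 weighted cells
ys, b = grid_sketch(Q, m=8)
w2sq = sinkhorn_w2sq(xs, a, ys, b)
```

The solver now works on 16 weighted cells per side instead of 2000 points, which is where the runtime savings of sketching come from.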

2605.20108 2026-05-20 eess.SY cs.AI cs.LG cs.LO cs.SY (version update)

k-Inductive Neural Barrier Certificates for Unknown Nonlinear Dynamics

Ben Wooding, Hongchao Zhang, Taylor T. Johnson, Abolfazl Lavaei

Affiliations: Vanderbilt University; Newcastle University

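AI summary: This paper constructs k-inductive neural barrier certificates (k-NBCs) for (partially) unknown nonlinear systems. It combines the scalability of neural networks with a generalization of Willems et al.'s fundamental lemma to build a data-driven system representation for SMT-based verification, while allowing greater design flexibility.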

Comments 18 pages, 5 figures, 3rd International Conference on Neuro-Symbolic Systems (NeuS)

Abstract

While conventional (k=1) discrete-time barrier certificate conditions impose strict safety constraints by requiring the function to be non-increasing at every step, k-inductive barrier certificates relax this by allowing a temporary increase (up to k-1 times, each within a threshold $\varepsilon$) while maintaining overall safety and improving flexibility. This paper leverages neural networks and constructs k-inductive neural barrier certificates (k-NBCs) for (partially) unknown nonlinear systems. While neural networks offer scalability in the design process, they lack formal guarantees, requiring additional approaches such as counterexample-guided inductive synthesis (CEGIS) with satisfiability modulo theories (SMT) for verification. However, the CEGIS-SMT framework requires knowledge of system dynamics, which is unavailable in practical settings. To address this, we leverage a generalization of Willems et al.'s fundamental lemma, using a single state trajectory, to construct a data-driven representation of (partially) unknown models for SMT verification without sacrificing accuracy. Additionally, CEGIS-SMT further removes the constraint of restricting barrier certificates to specific function classes, such as sum-of-squares, enabling greater flexibility in their design. We validate our approach on three nonlinear case studies with (partially) unknown dynamics.
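
The k-inductive conditions can be checked empirically on sampled states before attempting formal verification. The sketch assumes one common formulation: B ≤ 0 on the initial set; each step raises B by at most ε; B does not increase over any k steps; and B > (k-1)ε on the unsafe set. The last bound follows from the others: every k-th state has B ≤ 0, and in between B can rise at most k-1 times by ε. Sampling can only falsify these conditions, never prove them, and the dynamics here are a known toy system rather than a data-driven representation.

```python
import numpy as np

def check_k_inductive(B, f, k, eps, states, init_mask, unsafe_mask):
    """Count sampled violations of an assumed k-inductive barrier formulation.

    Sampling can falsify the conditions but never proves them (that needs e.g. SMT).
    """
    b = B(states)
    xk = states
    for _ in range(k):                                            # k-step successor f^k(x)
        xk = f(xk)
    return {
        "init": int(np.sum(b[init_mask] > 0)),                   # B <= 0 on initial states
        "unsafe": int(np.sum(b[unsafe_mask] <= (k - 1) * eps)),  # B > (k-1)*eps on unsafe states
        "one_step": int(np.sum(B(f(states)) - b > eps)),         # each step rises by at most eps
        "k_step": int(np.sum(B(xk) - b > 0)),                    # no net rise over k steps
    }

# Toy known dynamics: contraction composed with a 90-degree rotation.
f = lambda x: 0.95 * np.stack([-x[:, 1], x[:, 0]], axis=1)
B = lambda x: x[:, 0] ** 2 + 4 * x[:, 1] ** 2 - 1                # ellipse-shaped candidate barrier

g = np.linspace(-2, 2, 81)
states = np.array([(u, v) for u in g for v in g])
init = states[:, 0] ** 2 + states[:, 1] ** 2 <= 0.25
unsafe = states[:, 0] ** 2 + 4 * states[:, 1] ** 2 >= 13

two_inductive = check_k_inductive(B, f, k=2, eps=11.0, states=states, init_mask=init, unsafe_mask=unsafe)
conventional = check_k_inductive(B, f, k=1, eps=0.0, states=states, init_mask=init, unsafe_mask=unsafe)
```

Here the rotation lets B rise by up to about 10.4 in a single step on the sampled box, so the conventional k = 1 condition fails, while every two-step transition shrinks B, which is exactly the kind of certificate the relaxed condition admits.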

2605.20107 2026-05-20 cs.LG cs.AI (version update)

Beyond Isotropy in JEPAs: Hamiltonian Geometry and Symplectic Prediction

Robert Jenkinson Alvarez

Affiliation: GitHub

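AI summary: This paper examines the limitations of the isotropy assumption in JEPAs and proposes symplectic prediction grounded in Hamiltonian geometry: each view is encoded as a phase-space state and view-to-view transitions are predicted with a learned Hamiltonian, improving performance on CIFAR-100 and ImageNet-100.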

Abstract

JEPAs often regularize one-view embeddings toward an isotropic Gaussian, implicitly baking Euclidean symmetry into the representation. We show that this is not merely a benign default. For a known structured downstream geometry $H\succ0$, the minimax and maximum-entropy covariance under a Hamiltonian energy budget is $(c/d)H^{-1}$, and Euclidean isotropy incurs a closed-form price of isotropy. More importantly, when the downstream geometry is unknown, no geometry-independent fixed marginal target is canonical: every fixed covariance shape can be maximally misaligned for some structured geometry. We further show that even oracle one-view marginals do not identify the JEPA view-to-view predictive coupling. These results suggest that the structural bias in JEPAs should enter the cross-view coupling rather than a fixed encoder marginal. We instantiate this principle with HamJEPA, which encodes each view as a phase-space state $(q,p)$ and predicts view-to-view transitions with a learned Hamiltonian leapfrog map, while non-isotropic scale and spectral floors prevent collapse. In a deliberately headless token protocol, HamJEPA improves over SIGReg on CIFAR-100 by $+4.89$ kNN@20 and $+3.52$ linear-probe points at 30 epochs, and by $+6.45$ kNN@20 and $+10.64$ linear-probe points at 80 epochs, while a matched MLP predictor ablation shows that the symplectic coupling is the ingredient driving the neighborhood-geometry gain. On ImageNet-100, HamJEPA-$q$ improves by $+4.82$ kNN@20 and $+7.52$ linear-probe points at 45 epochs.
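
The leapfrog map used for view-to-view prediction is a symplectic integrator: it preserves phase-space volume and nearly conserves energy. The sketch below substitutes a fixed quadratic potential for the learned Hamiltonian (an assumption made so the result can be checked by hand) and verifies symplecticity, MᵀJM = J, of the resulting linear map.

```python
import numpy as np

def leapfrog(q, p, grad_V, step=0.1, n_steps=4):
    """Leapfrog (Stormer-Verlet) map for H(q, p) = 0.5 * |p|^2 + V(q)."""
    for _ in range(n_steps):
        p = p - 0.5 * step * grad_V(q)   # half kick
        q = q + step * p                 # drift (unit mass)
        p = p - 0.5 * step * grad_V(q)   # half kick
    return q, p

# Assumption: a fixed quadratic potential V(q) = 0.5 q^T K q stands in for a learned one.
K = np.diag([0.5, 1.0, 2.0])
grad_V = lambda q: K @ q
energy = lambda q, p: 0.5 * p @ p + 0.5 * q @ K @ q

# The map is linear here, so build its matrix column by column.
d = 3
M = np.zeros((2 * d, 2 * d))
for i in range(2 * d):
    e = np.zeros(2 * d)
    e[i] = 1.0
    q_out, p_out = leapfrog(e[:d], e[d:], grad_V)
    M[:, i] = np.concatenate([q_out, p_out])
J = np.block([[np.zeros((d, d)), np.eye(d)], [-np.eye(d), np.zeros((d, d))]])

# Long rollout: energy should stay close to its initial value.
q0, p0 = np.array([1.0, -0.5, 0.2]), np.array([0.0, 0.3, -0.1])
q1, p1 = leapfrog(q0, p0, grad_V, n_steps=200)
```

Volume preservation (det M = 1) is one way to see why a symplectic predictor cannot shrink the embedding space onto a lower-dimensional set, which fits the abstract's use of it as the cross-view coupling.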

2605.20105 2026-05-20 cs.LG Version update

Optimal Representation Size: High-Dimensional Analysis of Pretraining and Linear Probing

Valentina Njaradi, Clémentine Dominé, Rachel Swanson, Marco Mondelli, Andrew Saxe

Affiliations: Gatsby Computational Neuroscience Unit; University College London; Institute of Science and Technology Austria; Sainsbury Wellcome Centre

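AI summary: This paper studies the optimal representation size when pretraining is followed by linear probing. A high-dimensional analysis shows how representation dimensionality, the numbers of unlabelled and labelled samples, and task alignment affect training and generalization error. The paper then gives conditions for choosing the representation size under different pretraining and downstream data regimes.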

English abstract

Learning to generalise from limited data is a fundamental challenge for both artificial and biological systems. A common strategy is to extract reusable structure from abundant unlabelled data, enabling efficient adaptation to new tasks from limited labelled data. This two-stage paradigm is now standard in modern training pipelines, where pretraining is followed by fine-tuning or linear probing. We provide an analytical model of this process: structure extraction is formalized as principal component analysis on unlabelled data, and downstream learning as linear regression on a separate labelled dataset. In the high-dimensional regime, we derive exact expressions for training and generalisation error showcasing their dependence on representation dimensionality, unlabelled and labelled sample sizes, and task alignment. Our results show that pretrained representations strongly influence downstream generalisation, and we characterize the optimal representation size as a function of task parameters: with abundant pretraining data but scarce downstream data, maximally compressed representations are optimal, whereas with limited pretraining data, higher-dimensional representations generalise better. Furthermore, we establish an exact trade-off between pretraining and supervision, quantifying how much unlabelled data is required to replace a single labelled sample. Beyond our idealised model, we observe similar phenomenology in autoencoders and pretrained LLMs. Altogether, we highlight that optimising representation size is critical, giving conditions for when compression during pretraining improves generalisation.
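
The two-stage model can be simulated directly. The NumPy sketch below fits PCA on unlabelled data, keeps the top $k$ components, and then fits ordinary least squares on a separate labelled set in the $k$-dimensional representation. The spiked data-generating process, dimensions, and sample sizes are illustrative assumptions, not the paper's setting.

```python
import numpy as np

def pca_then_probe(X_unlab, X_lab, y_lab, X_test, y_test, k):
    """PCA on unlabelled data, then a least-squares probe on the top-k projections."""
    mu = X_unlab.mean(axis=0)
    # Right singular vectors of the centred data are the principal directions.
    _, _, Vt = np.linalg.svd(X_unlab - mu, full_matrices=False)
    P = Vt[:k].T                                  # (D, k) projection
    Z_lab, Z_test = (X_lab - mu) @ P, (X_test - mu) @ P
    w, *_ = np.linalg.lstsq(Z_lab, y_lab, rcond=None)
    return np.mean((Z_test @ w - y_test) ** 2)    # test MSE

rng = np.random.default_rng(0)
D = 50
# Spiked covariance: 5 high-variance directions, target aligned with them.
scales = np.concatenate([np.full(5, 5.0), np.ones(D - 5)])
beta = np.zeros(D)
beta[:5] = 1.0

def sample(n):
    X = rng.normal(size=(n, D)) * scales
    return X, X @ beta + 0.5 * rng.normal(size=n)

X_u, _ = sample(2000)            # abundant unlabelled data
X_l, y_l = sample(30)            # scarce labelled data
X_t, y_t = sample(1000)
for k in (2, 5, 20, 30):
    print(k, pca_then_probe(X_u, X_l, y_l, X_t, y_t, k))
```

In this toy setup, keeping $k = 5$ components should give a much lower test error than $k = 30$, where the probe exactly interpolates the 30 labelled samples. This loosely mirrors the compression regime described in the abstract.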

2605.20104 2026-05-20 cs.LG cs.AI Version update

Probability-Conserving Flow Guidance

Parsa Esmati, Junha Hyung, Amirhossein Dadashzadeh, Jaegul Choo, Majid Mirmehdi

Affiliations: University of Bristol; KAIST

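AI summary: This paper proposes AdaMaG, a probability-conserving flow guidance method. An analysis of the continuity equation decomposes the effect of guidance into a divergence term and a score-parallel term. AdaMaG controls both terms with a time-dependent schedule and score-parallel attenuation, which improves generation quality and reduces hallucinations at no additional inference cost.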

English abstract

Diffusion and flow-based generative models dominate visual synthesis, with guidance aligning samples to user input and improving perceptual quality. However, Classifier-Free Guidance (CFG) and extrapolation-based methods are heuristic linear combinations of velocities/scores that ignore the generative manifold geometry, breaking probability conservation and driving samples off the learned manifold under strong guidance. We analyse guidance through the continuity equation and show its effect decomposes into a divergence term and a score-parallel term defined invariantly across parameterisations. We prove the divergence term blows up structurally as sampling approaches the data manifold, motivating a time-dependent schedule alongside score-parallel attenuation. The resulting plug-and-play rule, Adaptive Manifold Guidance (AdaMaG), bounds both terms at no additional inference cost. Finally, we show that most empirical heuristics for reducing saturation or improving generation quality correspond directly to the two terms in our decomposition. Across image generation benchmarks, AdaMaG improves realism, reduces hallucinations, and induces controlled desaturation in high-guidance regimes.
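
For reference, classifier-free guidance combines the unconditional and conditional predictions as $v_u + w\,(v_c - v_u)$. The NumPy sketch below splits the guidance update into a component parallel to a reference direction and an orthogonal remainder, then damps only the parallel part. Two choices here are illustrative assumptions in the spirit of projection-based anti-saturation heuristics: using the conditional prediction as the reference direction, and the attenuation factor `eta`. This is not AdaMaG's rule.

```python
import numpy as np

def guided_velocity(v_uncond, v_cond, w, eta=1.0):
    """CFG with the guidance delta split into parallel and orthogonal parts.

    eta = 1.0 recovers plain classifier-free guidance. eta < 1 damps the
    component of the guidance delta parallel to v_cond (a hypothetical
    reference direction) and leaves the orthogonal component untouched.
    """
    delta = v_cond - v_uncond
    ref = v_cond / (np.linalg.norm(v_cond) + 1e-12)
    parallel = (delta @ ref) * ref
    orthogonal = delta - parallel
    return v_uncond + w * (eta * parallel + orthogonal)

rng = np.random.default_rng(0)
v_u, v_c = rng.normal(size=8), rng.normal(size=8)   # flattened toy predictions
cfg = guided_velocity(v_u, v_c, w=5.0)              # plain CFG
damped = guided_velocity(v_u, v_c, w=5.0, eta=0.2)  # parallel part damped
```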

2605.20074 2026-05-20 cs.LG Version update

Towards Distillation Guarantees under Algorithmic Alignment for Combinatorial Optimization

Thien Le, Melanie Weber

Affiliations: SEAS, Harvard University

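AI summary: This paper studies distillation, which transfers knowledge from a large model to a more efficient model for deployment, within an algorithmic-alignment framework. It focuses on the conditions under which distillation succeeds when the target model is a graph neural network whose architecture is aligned with a dynamic programming algorithm.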

Comments 22 pages

English abstract

Distillation transfers knowledge from a large model trained on broad data to a smaller, more efficient model suitable for deployment. In structured prediction settings, prior knowledge about the task can guide the choice of a target architecture that is algorithmically aligned with the underlying problem. Building on recent learning-theoretic analyses of decision-tree (DT) distillation (Boix-Adsera, 2024), we study when distillation succeeds for combinatorial optimization tasks. We focus on the case where the target model is a graph neural network whose architecture is aligned with a dynamic programming (DP) algorithm for the task. Assuming that the source model is sufficiently rich, formalized through the linear representation hypothesis (LRH) (Elhage et al., 2022; Park et al., 2024), we show that the distillation problem can be solved efficiently in the complexity parameters of the DP transition function, represented as a DT. Our results provide a rigorous sufficient condition for successful distillation in the flavour of algorithmic alignment.
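
A standard textbook example of algorithmic alignment is that one round of min-aggregation message passing computes exactly one Bellman-Ford relaxation step of the shortest-path dynamic program. The Python sketch below makes this correspondence explicit. Each node's new value is a min over incoming messages `dist[u] + weight`, which is the DP transition that an aligned GNN layer would need to learn. It illustrates the general idea only and is not the paper's construction.

```python
import math

def message_passing_shortest_paths(n, edges, source):
    """Bellman-Ford written as synchronous min-aggregation message passing.

    edges: list of (u, v, weight) directed edges, assumed to have no negative cycles.
    """
    dist = [math.inf] * n
    dist[source] = 0.0
    for _ in range(n - 1):                       # n - 1 rounds suffice
        new = dist[:]                            # each node keeps its own value
        for u, v, w in edges:                    # message u -> v
            new[v] = min(new[v], dist[u] + w)    # DP transition = min-aggregation
        if new == dist:                          # fixed point reached early
            break
        dist = new
    return dist

edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 1.0), (2, 3, 5.0)]
print(message_passing_shortest_paths(4, edges, source=0))  # [0.0, 3.0, 1.0, 4.0]
```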

2605.20068 2026-05-20 stat.ML cs.LG Version update

Tail Annealing for Heavy-Tailed Flow Matching

Jean Pachebat

Affiliations: CMAP, École Polytechnique, Institut Polytechnique de Paris

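AI summary: This paper proposes a simple way to handle heavy-tailed data: apply a soft-log transform to the data before training, then exponentiate the samples after generation. A Hill diagnostic decides for each coordinate whether to apply the transform, so light-tailed margins are left unchanged. Heavy tails are thereby compressed into a range that standard flow matching can handle, without heavy-tailed base distributions or architectural changes.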

Comments 18 pages

English abstract

Standard generative models struggle with heavy-tailed data: Lipschitz architectures cannot produce power-law tails from Gaussian noise, and interpolating between heavy-tailed data and Gaussians is ill-posed. We propose a simple fix: apply the soft-log transform $\phi(x) = \mathrm{sign}(x) \cdot \log(1 + |x|)$ coordinate-wise to data before training, then exponentiate samples after generation. A Hill diagnostic decides per-coordinate whether to transform, leaving light-tailed margins untouched at no added complexity. This compresses heavy tails into a range where standard flow matching succeeds, without heavy-tailed base distributions or architectural modifications. We provide theoretical intuition for why this works: the log-transform maps Pareto tails to exponentials, and the induced dynamics implement a form of tail annealing via power transformations. On a 144-configuration multivariate benchmark (3 copulas, $d$ up to 100, 4 tail indices), Log-FM dominates specialized baselines on $W_1$, CVaR$_{99}$, and extreme-quantile metrics, and is the only method with zero severe divergences across 2,880 runs.
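
The transform and its inverse are simple to write down. The NumPy sketch below implements $\phi$, its inverse $\phi^{-1}(y) = \mathrm{sign}(y)\,(e^{|y|} - 1)$, and a Hill tail-index estimator used as a per-coordinate gate. The number of order statistics `k` and the decision threshold `alpha_max` are illustrative assumptions, not values from the paper.

```python
import numpy as np

def soft_log(x):
    return np.sign(x) * np.log1p(np.abs(x))

def soft_log_inv(y):
    return np.sign(y) * np.expm1(np.abs(y))

def hill_tail_index(x, k):
    """Hill estimator of the tail index alpha from the k largest |x| values."""
    a = np.sort(np.abs(x))[::-1]
    return 1.0 / np.mean(np.log(a[:k] / a[k]))

def transform_heavy_columns(X, k=200, alpha_max=4.0):
    """Apply soft-log only to the columns whose estimated tail index is small."""
    alphas = np.array([hill_tail_index(X[:, j], k) for j in range(X.shape[1])])
    heavy = alphas < alpha_max
    Y = X.copy()
    Y[:, heavy] = soft_log(X[:, heavy])
    return Y, heavy, alphas

rng = np.random.default_rng(0)
n = 20_000
pareto = rng.pareto(1.5, size=n) * rng.choice([-1, 1], size=n)  # tail index 1.5
gauss = rng.normal(size=n)
Y, heavy, alphas = transform_heavy_columns(np.column_stack([pareto, gauss]))
print(heavy, alphas)   # the Pareto column should be flagged, the Gaussian not
```

After sampling from a flow trained on `Y`, applying `soft_log_inv` to the flagged columns maps the samples back to the original scale.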

2605.20040 2026-05-20 cs.LG Version update

Active Context Selection Improves Simple Regret in Contextual Bandits

Mohammad Shahverdikondori, Jalal Etesami, Negar Kiyavash

Affiliations: College of Management of Technology, EPFL; Department of Computer Science, TU Munich

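AI summary: This paper studies the contextual multi-armed bandit problem with a finite context space. It reduces simple regret by actively choosing which contexts to sample, and proposes algorithms that improve performance whether the context distribution is known or unknown.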

English abstract

We study the contextual multi-armed bandit problem with a finite context space (a.k.a. subpopulations), where the learner recommends a best action for each context and is evaluated by context-weighted simple regret. Our guarantees are worst-case over the reward distributions, while remaining instance-dependent with respect to the context distribution vector $p$. Akin to experimental design problems where the population of interest is fixed but the sampled subpopulation can be controlled, we allow the learner to actively choose which context to sample from. For a known $p$, we characterize tight regret rates: passive sampling where contexts are randomly revealed achieves regret of order $\sqrt{n/T \, \lVert p \rVert_{1/2}}$, whereas active sampling with allocation $q_j \propto p_j^{2/3}$ achieves the tight rate $\sqrt{n/T} \, \lVert p \rVert_{2/3}$. The resulting improvement can be as large as $Θ(k^{1/4})$, where $k$ is the number of contexts. We further extend the analysis to budgeted active sampling, characterize the corresponding tight rate, and identify when a limited active budget suffices to recover the fully active rate. When $p$ is unknown, we propose the Explore-Explore-Then-Commit (EETC) algorithm, which optimally balances estimating the context distribution and the time to switch to active allocation, such that for large horizons, it matches the known-$p$ active rate up to constants. Experiments on synthetic and real-world data support our theoretical findings.
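
The origin of the allocation $q_j \propto p_j^{2/3}$ can be checked numerically under a heuristic assumption. Suppose the weighted simple regret behaves like $\sqrt{n/T}\,\sum_j p_j / \sqrt{q_j}$ when context $j$ receives a fraction $q_j$ of the $T$ samples. Minimizing over $q$ then gives $q_j \propto p_j^{2/3}$ with value $\sqrt{n/T}\,\lVert p\rVert_{2/3}$. Passive sampling ($q = p$) gives $\sqrt{n/T}\,\sum_j \sqrt{p_j} = \sqrt{n/T\,\lVert p\rVert_{1/2}}$. A NumPy sketch under that assumption, where `p`, `n`, and `T` are hypothetical values:

```python
import numpy as np

def quasi_norm(p, r):
    """The l_r quasi-norm (sum_j p_j^r)^(1/r), used here with r < 1."""
    return np.sum(p ** r) ** (1.0 / r)

def regret_proxy(p, q, n, T):
    """Heuristic proxy for weighted simple regret: sqrt(n/T) * sum_j p_j / sqrt(q_j)."""
    return np.sqrt(n / T) * np.sum(p / np.sqrt(q))

def active_allocation(p):
    q = p ** (2.0 / 3.0)
    return q / q.sum()

p = np.array([0.7, 0.2, 0.05, 0.05])   # hypothetical context distribution
n, T = 10, 10_000                       # hypothetical number of arms and budget
passive = regret_proxy(p, p, n, T)
active = regret_proxy(p, active_allocation(p), n, T)
print(passive, active)   # active <= passive, with equality when p is uniform
```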

2605.20037 2026-05-20 cs.LG cs.AI Version update

When Critics Disagree: Adaptive Reward Poisoning Attacks in RIS-Aided Wireless Control System

Deemah H. Tashman, Soumaya Cherkaoui

Affiliations: Department of Computer and Software Engineering

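AI summary: This paper proposes Disagreement-Guided Reward Poisoning (DGRP), an attack on Soft Actor-Critic (SAC) agents, to evaluate the robustness of deep reinforcement learning (DRL) in RIS-assisted networks.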

English abstract

Reward-poisoning attacks present a significant risk to learning-based wireless control systems. Given this, we propose a Disagreement-Guided Reward Poisoning (DGRP) adaptive attack on a Soft Actor-Critic (SAC) agent. In a Cognitive Radio Network (CRN) environment assisted by Reconfigurable Intelligent Surfaces (RIS), the SAC agent is tasked with maximizing the long-term secondary users' (SUs) rate by simultaneously optimizing the transmission power of the SU transmitter and the RIS phase shifts. DGRP corrupts rewards, particularly when the SAC dual critics exhibit substantial disagreement (especially in high-leverage, high-uncertainty states), resulting in distorted value estimates and guiding the policy towards suboptimal actions. Our findings demonstrate that DGRP substantially diminishes the performance improvements typically provided by RIS and degrades transmission quality. We further investigate key attack parameters and determine their impact on learning. In comparison to periodic-timing and exploration-triggered baselines, DGRP consistently causes greater damage, highlighting the necessity of considering disagreement-aware threats when evaluating the robustness of Deep Reinforcement Learning (DRL) in RIS-assisted networks.
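
The trigger condition can be pictured as a gate on the gap between SAC's two Q-estimates. The NumPy sketch below is a hypothetical illustration only. The threshold `tau`, the perturbation size `eps`, and the choice to lower the reward are assumptions, not the attack's actual parameters.

```python
import numpy as np

def poison_rewards(rewards, q1, q2, tau, eps):
    """Corrupt rewards only where the twin critics disagree by more than tau.

    rewards, q1, q2: arrays over a batch of transitions.
    Returns the poisoned rewards and the boolean mask of attacked transitions.
    """
    disagreement = np.abs(q1 - q2)
    attack = disagreement > tau
    poisoned = np.where(attack, rewards - eps, rewards)   # hypothetical perturbation
    return poisoned, attack

rewards = np.array([1.0, 0.5, 0.8, 0.2])
q1 = np.array([10.0, 9.0, 7.5, 3.0])
q2 = np.array([9.8, 6.0, 7.4, 5.5])
poisoned, attack = poison_rewards(rewards, q1, q2, tau=1.0, eps=0.5)
print(attack.tolist())   # [False, True, False, True]
```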

2605.20032 2026-05-20 cs.LG cs.MM Version update

CAMERA: Adapting to Semantic Camouflage in Unsupervised Text-Attributed Graph Fraud Detection

Junjun Pan, Yixin Liu, Yu Zheng, Lianhua Chi, Alan Wee-Chung Liew, Shirui Pan

Affiliations: School of Information and Communication Technology, Griffith University, Australia; Department of Computer Science and Information Technology, La Trobe University, Australia

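AI summary: This paper proposes the CAMERA framework, which addresses semantic camouflage with an adaptive multi-cue expert model. It uses both graph structure and text attributes for unsupervised fraud detection, improving the identification of camouflaged fraudsters.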

Comments Accepted by IJCAI 2026

Block-Sphere Vector Quantization

Heesang Ann, Joongkyu Lee, Min-hwan Oh

Affiliations: Seoul National University

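AI summary: This paper studies vector quantization. A unified theoretical comparison of rotation-based quantizers shows that their relative performance depends on the distortion criterion. The paper also proposes a block-sphere quantization algorithm that improves rotation-based block quantization.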

English abstract

Vector quantization is a fundamental primitive for scalable machine learning systems, enabling memory-efficient storage, fast retrieval, and compressed inference. Recent rotation-based quantizers such as EDEN, RabitQ, and TurboQuant have introduced strong guarantees and empirical performance, but the surrounding comparisons have been difficult to interpret because they rely on different distortion criteria, probability regimes, and implementation assumptions. As our first contribution, we provide a unified theoretical comparison of these methods and show that their relative advantages are criterion-dependent rather than absolute: EDEN and TurboQuant are favorable for MSE distortion, EDEN is also effective for expected inner-product distortion, and RabitQ provides strong high-probability control. This comparison further clarifies that EDEN provides particularly strong guarantees for expected distortion measures. As our second contribution, we introduce Block-Sphere Quantization (BlockQuant), a new rotation-based block quantization algorithm designed around the spherical geometry of randomly rotated vectors. Unlike coordinate-wise quantizers, BlockQuant quantizes blocks on the sphere, preserving the geometry of rotated embeddings more faithfully. We prove that this block-spherical design theoretically improves over the baselines considered in this paper for both reconstruction MSE and expected inner-product distortion. Our experiments on real embedding datasets and long-context LLM inference tasks show practical gains that are consistent with our theoretical improvements.
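
The common ingredient of these rotation-based quantizers is a pipeline of three steps: apply a random orthogonal rotation so that coordinates look roughly Gaussian, quantize, and rotate back. The NumPy sketch below shows that generic pipeline with a 1-bit sign quantizer and a single MSE-optimal scale. It is a simplified baseline for intuition only. It is not EDEN, RabitQ, TurboQuant, or BlockQuant, and the scale rule is an assumption.

```python
import numpy as np

def random_rotation(d, rng):
    """Haar-random orthogonal matrix via QR with sign correction."""
    Q, R = np.linalg.qr(rng.normal(size=(d, d)))
    return Q * np.sign(np.diag(R))

def quantize_1bit(x, Rot):
    z = Rot @ x
    bits = np.sign(z)
    bits[bits == 0] = 1.0
    scale = np.abs(z).mean()          # minimizes ||z - scale * bits||^2 for fixed signs
    return bits, scale

def dequantize_1bit(bits, scale, Rot):
    return Rot.T @ (scale * bits)

rng = np.random.default_rng(0)
d = 256
Rot = random_rotation(d, rng)
x = rng.standard_t(df=3, size=d)      # a vector with uneven coordinates
bits, scale = quantize_1bit(x, Rot)
x_hat = dequantize_1bit(bits, scale, Rot)
print(np.sum((x - x_hat) ** 2) / np.sum(x ** 2))   # relative MSE
```

After rotation the coordinates are close to Gaussian, so the relative MSE sits near $1 - 2/\pi$ regardless of how uneven `x` was. Block-wise spherical designs such as the one in the abstract aim to improve on this coordinate-wise baseline.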

2605.19966 2026-05-20 cs.LG cs.AI Version update

Detecting Fluent Optimization-Based Adversarial Prompts via Sequential Entropy Changes

Mohammed Alshaalan, Miguel R. D. Rodrigues

Affiliations: Department of Electronic and Electrical Engineering, University College London, London, United Kingdom

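AI summary: This paper proposes CPD, an adversarial-suffix detector based on online change-point detection. It standardizes user-token entropies and applies a one-sided CUSUM statistic. This improves the detection of optimization-based adversarial prompts and achieves higher F1 and AUC scores across several large language models.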

Comments Accepted at ICML 2026; 20 pages, including 9 pages main text, references, and appendix

English abstract

Optimization-based adversarial suffixes can jailbreak aligned large language models (LLMs) while remaining fluent, weakening static and windowed perplexity-based detectors. We cast adversarial suffix detection as an online change-point detection problem over the token-level next-token entropy stream. Using the LLM system prompt to estimate a robust baseline, we standardize user-token entropies and apply a one-sided CUSUM statistic. The resulting detector, CPD Online (CPD), is model-agnostic, training-free, runs online, and localizes the adversarial suffix onset. On a benchmark of 1,012 optimization-based suffix attacks (GCG, AutoDAN, AdvPrompter, BEAST, AutoDAN-HGA) and 1,012 perplexity-controlled benign prompts, CPD improves F1 over the strongest windowed-perplexity baseline on all six open-weight chat models (LLaMA-2-7B/13B, Vicuna-7B/13B, Qwen2.5-7B/14B). On LLaMA-2-7B at the canonical CUSUM setting ($k=0$), CPD reaches AUROC $0.88$ and F1 $0.82$. Beyond prompt-level detection, CPD concentrates 79.6% of its triggers inside the adversarial suffix, versus 17-46% for windowed perplexity. Finally, when used as a lightweight gate for LLaMA Guard, CPD reduces guard calls by 17-22% on a high-volume, benign-dominated deployment while preserving guard-level detection quality.
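
The detection rule described above can be written in a few lines. The NumPy sketch below computes next-token entropies from logits and standardizes user-token entropies against statistics of the system-prompt tokens. It then runs a one-sided CUSUM $S_t = \max(0, S_{t-1} + z_t - k)$ and raises an alarm at the first $t$ with $S_t > h$. Three things are illustrative assumptions: the median and MAD as the robust baseline, the threshold `h`, and the synthetic entropy streams. The abstract reports $k=0$ as the canonical setting. This toy example uses $k=0.5$ so that pure noise does not drift upward over a short stream.

```python
import numpy as np

def token_entropies(logits):
    """Next-token entropy (in nats) at each position of a (T, V) logit array."""
    z = logits - logits.max(axis=1, keepdims=True)            # numerical stability
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))    # log-softmax
    return -(np.exp(logp) * logp).sum(axis=1)

def cusum_alarm(system_ent, user_ent, k, h):
    """One-sided CUSUM on user-token entropies standardized by system-prompt stats.

    Returns the index of the first user token at which the statistic exceeds h,
    or None if it never does.
    """
    med = np.median(system_ent)
    mad = 1.4826 * np.median(np.abs(system_ent - med)) + 1e-8
    s = 0.0
    for t, e in enumerate(user_ent):
        s = max(0.0, s + (e - med) / mad - k)
        if s > h:
            return t
    return None

rng = np.random.default_rng(0)
system_ent = rng.normal(2.0, 0.3, size=200)   # synthetic system-prompt entropies
benign = rng.normal(2.0, 0.05, size=40)       # synthetic benign user tokens
suffix = rng.normal(3.5, 0.05, size=20)       # synthetic entropy jump at index 40
print(cusum_alarm(system_ent, np.concatenate([benign, suffix]), k=0.5, h=6.0))
```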

2605.19959 2026-05-20 cs.LG math.FA Version update

Learning Orthonormal Bases for Function Spaces

Hamidreza Kamkari, Mohammad Sina Nabizadeh, Justin Solomon

Affiliations: MIT CSAIL

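AI summary: This paper proposes learning and optimizing orthonormal bases of function spaces with neural networks. Using the manifold structure of the orthogonal (Lie) group, it proves that even finite-rank generators can reach a dense set of orthonormal bases under an appropriate operator topology.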

English abstract

Infinite-dimensional orthonormal basis expansions play a central role in representing and computing with function spaces due to their favorable linear algebraic properties. However, common bases such as Fourier or wavelets are fixed and do not adapt to the structure of a given problem or dataset. In this paper, we aim to represent these bases with neural networks and optimize them. Our key idea is that any target infinite-dimensional orthonormal basis can be viewed either as a point on the Lie manifold of the orthogonal group, or equivalently, as the endpoint of a continuous path on that manifold that connects a reference basis, e.g. Fourier, to that target. Paths on the Lie manifold satisfy ordinary differential equations (ODEs) governed by skew-adjoint integral operators. Using neural networks to define finite-rank generators of such ODEs allows us to parameterize and optimize orthonormal bases in function space. While relying on finite-rank generators to model infinite operators might seem restrictive, we prove a universality result: even with a rank-2 generator, the integrated solutions of the ODE are dense in the orthogonal group under the appropriate operator topology. In other words, for any target orthonormal basis, there exists a path originating from a reference basis and driven by finite-rank generators that gets arbitrarily close to that target basis. We demonstrate the flexibility of our framework by transforming the Fourier basis into the principal components of a functional dataset, eigenfunctions of linear operators, or dynamic modes of energy-preserving physical simulations.
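
A finite-dimensional analogue makes the construction concrete. An orthogonal matrix $Q(t)$ evolving as $\dot Q = A(t)\,Q$ with skew-symmetric $A(t)$ stays orthogonal, and a rank-2 skew generator has the form $A = u v^\top - v u^\top$. The NumPy sketch below integrates this ODE starting from a sampled cosine (DCT-II) basis. It uses the Cayley transform, an implicit-midpoint step that preserves orthogonality exactly. The random vectors standing in for neural-network outputs, the dimension, and the step count are assumptions for illustration.

```python
import numpy as np

def rank2_skew(u, v):
    return np.outer(u, v) - np.outer(v, u)

def cayley_step(Q, A, dt):
    """Q <- (I - dt/2 A)^{-1} (I + dt/2 A) Q, exactly orthogonal for skew A."""
    I = np.eye(A.shape[0])
    return np.linalg.solve(I - 0.5 * dt * A, (I + 0.5 * dt * A) @ Q)

n = 32
# Reference basis: orthonormal DCT-II vectors sampled on n points (rows are basis vectors).
k = np.arange(n)[:, None]
x = (np.arange(n)[None, :] + 0.5) / n
Q = np.cos(np.pi * k * x) * np.sqrt(2.0 / n)
Q[0] /= np.sqrt(2.0)

rng = np.random.default_rng(0)
for _ in range(50):   # time-varying rank-2 generators (stand-ins for a network)
    A = rank2_skew(rng.normal(size=n), rng.normal(size=n))
    Q = cayley_step(Q, A, dt=0.05)
print(np.abs(Q @ Q.T - np.eye(n)).max())   # orthonormality error after 50 steps
```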

2605.19947 2026-05-20 cs.LG Version update

Exploiting Non-Negativity in DAG Structure Learning

Samuel Rey, Madeline Navarro, Gonzalo Mateos

Affiliations: Dept. of Signal Theory and Communications, Universidad Rey Juan Carlos; Dept. of Electrical and Computer Engineering, Rice University; Dept. of Electrical and Computer Engineering, University of Rochester

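AI summary: This paper studies how a non-negativity constraint simplifies the non-convex optimization in DAG structure learning. It proposes a regularized non-negative DAG learning algorithm based on the method of multipliers, and proves that in the population regime the true DAG is the unique global minimizer.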

English abstract

This work addresses the problem of learning directed acyclic graphs (DAGs) from nodal observations generated by a linear structural equation model. DAG learning is a central task in signal processing, machine learning, and causal inference, but it remains challenging because acyclicity is a global combinatorial property. Continuous acyclicity constraints have led to important algorithmic advances by replacing the discrete DAG constraint with smooth equality constraints. However, existing formulations still involve difficult non-convex optimization landscapes and may suffer from degenerate first-order optimality conditions. Here, we restrict attention to DAGs with non-negative edge weights and exploit this additional structure to obtain a simpler characterization of acyclicity. Building on this characterization, we formulate a regularized non-negative DAG learning problem and develop an algorithm based on the method of multipliers. We further analyze the benign optimization landscape induced by non-negativity. In the population regime, we show that the true DAG is the unique global minimizer of the proposed augmented-Lagrangian formulation; moreover, the landscape contains no spurious interior stationary points, and the true DAG is the only acyclic KKT point. Numerical experiments on synthetic and real-world data show that the proposed method improves over state-of-the-art continuous DAG-learning alternatives.
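
A classical fact shows one reason non-negativity helps. For a weighted adjacency matrix $W \ge 0$, the quantity $\operatorname{tr}(W^k)$ sums the weights of closed walks of length $k$. Therefore $W$ is acyclic if and only if $\sum_{k=1}^{d} \operatorname{tr}(W^k) = 0$, with no need for the elementwise squaring $W \circ W$ used in signed formulations such as NOTEARS. The NumPy sketch below checks this. It illustrates the idea and is not necessarily the characterization used in the paper.

```python
import numpy as np

def acyclicity_gap(W):
    """sum_{k=1}^{d} tr(W^k) for non-negative W; zero iff the graph has no cycle."""
    if np.any(W < 0):
        raise ValueError("this characterization assumes non-negative weights")
    d = W.shape[0]
    total, Wk = 0.0, np.eye(d)
    for _ in range(d):
        Wk = Wk @ W
        total += np.trace(Wk)          # total weight of closed walks of this length
    return total

dag = np.array([[0.0, 0.8, 0.0],
                [0.0, 0.0, 1.2],
                [0.0, 0.0, 0.0]])
cyclic = dag.copy()
cyclic[2, 0] = 0.5                     # closes the cycle 0 -> 1 -> 2 -> 0
print(acyclicity_gap(dag), acyclicity_gap(cyclic))
```

For the cyclic example, only $\operatorname{tr}(W^3)$ is non-zero: each of the three nodes on the cycle contributes $0.8 \cdot 1.2 \cdot 0.5 = 0.48$, so the gap is $1.44$.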

2605.19944 2026-05-20 cs.LG cs.AI cs.CC cs.CL Version update

A Measure-Theoretic Analysis of Reasoning: Structural Generalization and Approximation Limits

Yuyang Zhang, Yifu Zhang, Xuehai Zhou, Xiaoyin Chen

Affiliations: McGill University; Mila - Quebec AI Institute; Université de Montréal

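AI summary: This paper analyzes reasoning through optimal transport theory and exposes the theoretical mechanisms behind structural generalization and approximation limits. It finds that position-dependent attention and Transformer circuit depth substantially affect reasoning performance.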

Comments Preprint

English abstract

While empirical scaling laws for LLM reasoning are well-documented, the theoretical mechanisms governing out-of-distribution (OOD) generalization remain elusive. We formalize reasoning via optimal transport, projecting discrete trajectories into a continuous metric space to quantify domain shifts using the Wasserstein-1 distance. Invoking Kantorovich duality, we bound OOD generalization via architectural Lipschitz continuity and functional approximation limits. This exposes two primary constraints. First, position-dependent attention (e.g., Absolute Positional Encoding) fails to preserve shift invariance, yielding an $\Omega(1)$ Lipschitz constant and expected risk, whereas shift-invariant mechanisms (e.g., Rotary Embeddings) preserve equivariance and bound the error. Second, by mapping sequential backtracking to a Dyck-$k$ language, we establish a strict circuit depth lower bound for $\text{TC}^0$ Transformers. Scaling physical layer depth is necessary to avert representation collapse, a constraint that scaling representation width cannot bypass due to irreducible approximation bounds in Barron spaces. Evaluations across 54 Transformer configurations on combinatorial search corroborate these bounds, demonstrating that generalization risk degrades monotonically with the Wasserstein domain shift.
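
Dyck-$k$ is the language of balanced strings over $k$ bracket types. Recognizing it requires tracking a stack of open brackets whose height grows with nesting depth, which is the backtracking structure referred to above. A minimal Python recognizer that also reports the maximum nesting depth, with an arbitrarily chosen three-bracket alphabet:

```python
PAIRS = {"(": ")", "[": "]", "{": "}"}   # Dyck-3 alphabet (arbitrary choice)

def dyck_depth(s, pairs=PAIRS):
    """Return the maximum nesting depth if s is a well-formed Dyck-k word, else None."""
    closers = {c: o for o, c in pairs.items()}
    stack, depth = [], 0
    for ch in s:
        if ch in pairs:                        # opening bracket: push
            stack.append(ch)
            depth = max(depth, len(stack))
        elif ch in closers:                    # closing bracket: must match the top
            if not stack or stack[-1] != closers[ch]:
                return None
            stack.pop()
        else:                                  # symbol outside the alphabet
            return None
    return depth if not stack else None

print(dyck_depth("([]{()})"))   # 3
print(dyck_depth("([)]"))       # None
```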

2605.19932 2026-05-20 cs.AI cs.CL cs.LG Version update

Fast and Feature-Free Node Representation Learning with Partial Pairwise Supervision

Sujan Chakraborty, Saptarshi Bej

Affiliations: Indian Institute of Science Education and Research

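AI summary: This work proposes a fast, unified framework for scalable node representation learning on graphs with partially available pairwise node labels and no node features. It combines community-aware structural signals with signed pairwise constraints into an efficient optimization scheme.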

English abstract

We introduce Contrastive FUSE, a fast and unified framework for scalable node representation learning in graphs with partially available pairwise node labels and no available node features. Unlike existing methods, we directly optimize a spectral contrastive objective that integrates community-aware structural signals with signed pairwise constraints. To support large-scale training, we replace the expensive modularity gradient with a lightweight approximation, which preserves the structure-seeking behavior of modularity while reducing the computational cost significantly. This yields an efficient optimization scheme with a natural gradient decomposition and adaptive learning-rate scaling, enabling fast iterative updates even on million-edge graphs. Extensive experiments on benchmark citation networks, large co-purchase graphs, and OGB datasets show that Contrastive FUSE achieves competitive or superior contrastive classification performance without relying on node features, while offering substantial runtime gains over existing baselines. These results highlight the effectiveness of coupling modularity-inspired structural learning with contrastive supervision for efficient and scalable contrastive node representation learning.
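
For reference, the soft modularity of an assignment matrix $S$ (nodes by communities) is $Q(S) = \operatorname{tr}(S^\top B S)/(2m)$ with $B = A - k k^\top/(2m)$. Its gradient $2BS/(2m)$ can be computed from $AS$ and $k^\top S$ without ever forming the dense $n \times n$ matrix $B$. The NumPy sketch below shows this standard identity. It is background only: the lightweight approximation used by Contrastive FUSE is not reproduced here, and dense `A` is used for brevity.

```python
import numpy as np

def soft_modularity(A, S):
    """Q(S) = tr(S^T B S) / (2m), B = A - k k^T / (2m), S of shape (n, c)."""
    k = A.sum(axis=1)
    two_m = k.sum()
    kS = k @ S                          # (c,) total degree per community
    return (np.sum(S * (A @ S)) - kS @ kS / two_m) / two_m

def soft_modularity_grad(A, S):
    """dQ/dS = 2 B S / (2m), computed without materializing B."""
    k = A.sum(axis=1)
    two_m = k.sum()
    BS = A @ S - np.outer(k, k @ S) / two_m
    return 2.0 * BS / two_m

# Two disjoint triangles, each assigned to its own community.
A = np.zeros((6, 6))
for a, b in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A[a, b] = A[b, a] = 1.0
S = np.zeros((6, 2))
S[:3, 0] = 1.0
S[3:, 1] = 1.0
print(soft_modularity(A, S))   # 0.5
```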

2605.19902 2026-05-20 cs.LG q-bio.QM Version update

Hierarchical Contrastive Learning for Multi-Domain Protein-Ligand Binding

Shuo Zhang, Rongqi Hong, Huifeng Zhang, Jian K. Liu

Affiliations: University of Birmingham, UK

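AI summary: This work proposes HCLBind, a hierarchical contrastive learning framework for predicting protein-ligand binding affinity in multi-domain proteins. Its core idea is to decouple geometric representation learning from affinity regression through a novel hierarchical decoy strategy. This strategy is combined with a domain-gated graph attention network and cross-modal attention that prioritize domain interfaces. Experiments show that HCLBind learns discriminative interface features and provides robust uncertainty estimates.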

Comments Accepted by ISBRA2026

English abstract

Predicting protein-ligand binding affinity remains intractable for multi-domain proteins, where inter-domain dynamics govern molecular recognition. Existing geometric deep learning methods typically treat proteins as monolithic static graphs, suffering from rigid-body assumptions and aleatoric noise in flexible regions. To address this, we introduce HCLBind, a self-supervised framework that decouples geometric representation learning from affinity regression. HCLBind leverages a general-to-specific pre-training paradigm on the Q-BioLiP database to learn a robust physical grammar of binding. We propose a novel hierarchical decoy strategy: the model learns local physicochemical constraints through protein coordinate perturbation in single-domain proteins and global conformational geometry through inter-domain rotation in multi-domain complexes. Our hybrid architecture integrates a domain-gated graph attention network and cross-modal attention to explicitly prioritize domain interfaces. Furthermore, we employ LoRA on protein and ligand foundation models, ensuring efficient optimization while preserving evolutionary knowledge. Experiments on PDBBind demonstrate that HCLBind effectively learns discriminative interface features and provides robust uncertainty estimation, overcoming the limitations of standard supervised learning. The code is available at https://github.com/jiankliu/HCLBind.
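
The two decoy types can be sketched on raw 3D coordinates. In the NumPy sketch below, a local decoy jitters atom coordinates and a global decoy rigidly rotates one domain relative to the other. The noise scale `sigma`, the rotation angle, and the use of Rodrigues' formula about a random axis through the moving domain's centroid are illustrative assumptions, not HCLBind's settings.

```python
import numpy as np

def perturb_coords(X, sigma, rng):
    """Local decoy: Gaussian jitter of every atom coordinate (X has shape (n, 3))."""
    return X + sigma * rng.normal(size=X.shape)

def rotation_matrix(axis, angle):
    """Rodrigues' formula for a rotation by `angle` radians about `axis`."""
    a = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -a[2], a[1]], [a[2], 0.0, -a[0]], [-a[1], a[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * K @ K

def rotate_domain(X, domain_mask, angle, rng):
    """Global decoy: rotate one domain rigidly about its own centroid."""
    Y = X.copy()
    D = X[domain_mask]
    c = D.mean(axis=0)
    R = rotation_matrix(rng.normal(size=3), angle)
    Y[domain_mask] = (D - c) @ R.T + c
    return Y

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3)) * 5.0            # synthetic "atoms"
mask = np.arange(20) >= 10                    # atoms 10..19 form the moving domain
local_decoy = perturb_coords(X, sigma=0.5, rng=rng)
global_decoy = rotate_domain(X, mask, angle=np.pi / 4, rng=rng)
```

The global decoy keeps every intra-domain distance unchanged while altering the inter-domain geometry, so a model that separates native from decoy poses must attend to the domain interface.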

2605.19856 2026-05-20 cs.LG cs.AI Version update

StableGrad: Backward Scale Control without Batch Normalization

Jose I. Mestre, Alberto Fernández-Hernández, Cristian Pérez-Corral, Manuel F. Dolz, Enrique S. Quintana-Ortí

Affiliations: Universitat Politècnica de València; Universitat Jaume I

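AI summary: This paper proposes StableGrad, which stabilizes deep network training without Batch Normalization by controlling weight-gradient scale at the optimizer level. It is particularly suited to settings such as physics-informed neural networks.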

English abstract

Training very deep neural networks requires controlling the propagation of magnitudes across depth. Without such control, activations and gradients may vanish, explode, or enter unstable regimes that make optimization fail. Modern architectures often mitigate this problem through Batch Normalization, residual connections, or other normalization layers, which repeatedly re-scale or bypass intermediate representations. However, these mechanisms are not always appropriate. In Physics-Informed Neural Networks (PINNs), the network represents a continuous physical field and its input derivatives define the training objective, making batch-dependent normalization problematic because it can introduce non-local dependencies into the predicted field and its derivatives. We propose StableGrad, an optimizer-level scale-control mechanism that corrects layer-wise weight-gradient imbalances without modifying the forward model. Because the normalization is applied only after backpropagation and before the optimizer update, the network output, its derivatives, and the physical residual remain unchanged. We analyze the effective training dynamics induced by this rescaling and evaluate StableGrad on deep PINNs as the target application, with BatchNorm-free convolutional networks serving as a diagnostic stress test. On PINN benchmarks, StableGrad improves matched-depth solution accuracy and makes deeper models more reliable under standard optimization. On ResNet and EfficientNet architectures, where removing Batch Normalization normally leads to training collapse, StableGrad stabilizes optimization without introducing any other architectural change. These results show that optimizer-level control of weight-gradient scale can provide a practical alternative when forward normalization is unavailable or undesirable.
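
The key structural point is that gradients are rescaled after backpropagation and before the optimizer update, so the forward pass is untouched. The NumPy sketch below shows that placement with a plain gradient step. Its per-layer rule, which rescales each layer's gradient so that $\lVert g_\ell\rVert / \lVert W_\ell\rVert$ equals a common target ratio, is a hypothetical stand-in chosen for illustration. The abstract does not specify StableGrad's actual rule.

```python
import numpy as np

def rescale_layer_grads(weights, grads, target_ratio=1e-3, eps=1e-12):
    """Hypothetical layer-wise control: make |g_l| / |W_l| equal target_ratio."""
    out = []
    for W, g in zip(weights, grads):
        scale = target_ratio * np.linalg.norm(W) / (np.linalg.norm(g) + eps)
        out.append(g * scale)
    return out

def sgd_step(weights, grads, lr):
    return [W - lr * g for W, g in zip(weights, grads)]

rng = np.random.default_rng(0)
weights = [rng.normal(size=(16, 16)) for _ in range(3)]
# Badly imbalanced raw gradients across depth (vanishing toward the first layer).
grads = [rng.normal(size=(16, 16)) * s for s in (1e-6, 1e-2, 1e2)]
# backprop -> rescale -> optimizer update; the forward model is never modified.
grads = rescale_layer_grads(weights, grads)
weights = sgd_step(weights, grads, lr=1.0)
```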

2605.19842 2026-05-20 cs.LG Version update

Fast Tensorization of Neural Networks via Slice-wise Feature Distillation

Safa Hamreras, Sukhbinder Singh, Román Orús

发表机构 * Donostia International Physics Center（多斯蒂亚国际物理中心）； Multiverse Computing（多维计算）； Ikerbasque Foundation for Science（伊克尔巴斯基科学基金会）

AI总结本文提出了一种基于切片特征蒸馏的可扩展张量化框架，用于神经网络压缩。该方法通过将网络分解为独立的切片（如单个层或块），并独立张量化每个切片以恢复原始预训练模型的中间表示，从而提高精度恢复、减少数据需求并实现高效的并行优化。

AI中文摘要

我们提出了一种基于切片特征蒸馏的可扩展张量化框架，用于神经网络压缩。与传统的依赖于成本高昂的全局微调的张量分解方法不同，我们的方法将网络分解为由单个层、块（如卷积层或MLP）或连续层的小组构成的切片，并独立对每个切片进行张量化以重现原始预训练模型的中间表示。这种模块化策略提高了精度恢复，减少了数据需求，并实现了高效的并行优化。在ResNet-34上的实验表明，与传统全局张量化相比，该方法在中等压缩率下实现了接近无损的压缩效果，并具有更快的优化速度。在GPT-2 XL上的结果进一步展示了该方法的可扩展性和其在大规模模型中的适用性，特别是在分布式设置中。

英文摘要

We propose a scalable tensorization framework for neural network compression based on slice-wise feature distillation. Unlike conventional tensor decomposition methods that rely on costly global finetuning, our approach decomposes the network into slices consisting of either individual layers or blocks (e.g., convolutional layers or MLPs), or small groups of consecutive layers, and tensorizes each slice independently to reproduce the intermediate representations of the original pretrained model. This modular strategy improves accuracy recovery, reduces data requirements, and enables efficient parallel optimization. Experiments on ResNet-34 show significant gains over conventional global tensorization, achieving near-lossless compression at moderate compression rates with faster optimization. Results on GPT-2 XL further demonstrate the scalability of the method and its applicability to large-scale models, particularly in distributed settings.
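The slice-wise idea can be sketched with a deliberately tiny stand-in. Each "slice" is a dense 2×2 map, and "tensorizing" it means replacing it with a rank-1 factor pair u v^T. That trivial low-rank decomposition stands in for the tensor networks the paper actually uses. The sketch shows one point: every slice is fitted only against the teacher's own input and output activations for that slice, so the slices can be optimized independently. All function names are hypothetical.

```python
import random

def matvec(W, x):
    """Dense matrix-vector product on nested lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def fit_rank1_slice(W, inputs, steps=3000, lr=0.05, seed=0):
    """Fit a compressed slice u v^T by gradient descent so that u * (v . x)
    matches the teacher slice output W x on the teacher's inputs (feature MSE)."""
    rng = random.Random(seed)
    u = [rng.uniform(-0.5, 0.5) for _ in range(len(W))]
    v = [rng.uniform(-0.5, 0.5) for _ in range(len(W[0]))]
    n = len(inputs)
    for _ in range(steps):
        gu, gv = [0.0] * len(u), [0.0] * len(v)
        for x in inputs:
            vx = dot(v, x)
            r = [t - ui * vx for t, ui in zip(matvec(W, x), u)]  # feature residual
            ur = dot(u, r)
            for i in range(len(u)):
                gu[i] -= 2.0 * r[i] * vx / n
            for j in range(len(v)):
                gv[j] -= 2.0 * ur * x[j] / n
        u = [a - lr * g for a, g in zip(u, gu)]
        v = [a - lr * g for a, g in zip(v, gv)]
    return u, v

def slice_error(W, uv, inputs):
    """Largest absolute feature error of a fitted slice on the given inputs."""
    u, v = uv
    return max(abs(t - ui * dot(v, x))
               for x in inputs for t, ui in zip(matvec(W, x), u))

# Two-slice teacher; each slice is rank 1, so a rank-1 student can match it.
W1 = [[2.0, 1.0], [4.0, 2.0]]
W2 = [[1.0, -1.0], [0.5, -0.5]]
calib = [[0.5, 0.0], [0.0, 0.5], [0.5, 0.5], [-0.5, 0.25]]
h1 = [matvec(W1, x) for x in calib]   # teacher activations entering slice 2

# Each call sees only its own slice's teacher inputs/outputs, so the calls are
# independent of each other and could run in parallel.
students = [fit_rank1_slice(W, xs) for W, xs in [(W1, calib), (W2, h1)]]
print([slice_error(W1, students[0], calib), slice_error(W2, students[1], h1)])
```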

2605.19834 2026-05-20 cs.LG cs.AI cs.SY eess.SY 版本更新

A Closed-loop, State-centric, Multi-agent Framework for Passenger Load Estimation from Heterogeneous Data Streams

一种闭环、以状态为中心的多智能体框架，用于从异构数据流中估计乘客负载

Yiyao Xu, Hao Zhou, Yuhang Wang, Jingran Sun

发表机构 * Department of Civil and Environmental Engineering, University of South Florida（南佛罗里达大学土木与环境工程系）

AI总结本文提出一种闭环、以状态为中心的多智能体框架，用于从异构数据流中准确估计乘客负载，通过动态分配信任和物理约束提升鲁棒性。

Comments Preprint version of a paper accepted by the 2026 IEEE 29th International Conference on Intelligent Transportation Systems (ITSC). 7 pages, 4 figures

AI中文摘要

为了支持运营和乘客服务，公共交通机构需要可靠的乘客负载轨迹。目前，负载估计通常是从不完美的传感系统推断而来，而非完全观察，现代自动乘客计数（APC）系统的准确性仍受车站布局、流量强度和运营条件的影响。为了解决从异构数据流中稳健估计乘客负载的挑战，包括增量计数误差、证据冲突和上下文依赖的传感器可靠性，我们提出了一种闭环、以状态为中心的多智能体框架。该方法在每一步都强制物理可行性，动态分配信任给证据源，并将物理推导出的违反残差反馈回训练以提高鲁棒性。该架构包括一个统一的停靠事件骨干，一个耦合的感知-物理-融合循环用于停靠点推断，以及可选的行程级宏修正和闭环校准模块。

英文摘要

To support operations and passenger-facing services, transit agencies need reliable passenger load trajectories. Currently, load estimates are typically inferred from imperfect sensing systems rather than fully observed, and the accuracy of modern automatic passenger counting (APC) systems still varies with station layout, flow intensity, and operating conditions. To address the challenges of robust passenger load estimation from heterogeneous data streams, including incremental count errors, evidence conflicts, and context-dependent sensor reliability, we propose a closed-loop, state-centric, multi-agent framework. This method enforces physical feasibility at every step, allocates trust dynamically among evidence sources, and feeds physics-derived violation residuals back into training for robustness improvement. The architecture consists of a unified stop-event backbone, a coupled Perception--Physical--Fusion loop for stop-by-stop inference, and optional trip-level macro-correction and closed-loop calibration modules.
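The abstract combines two mechanisms: trust-weighted fusion of evidence sources and a per-stop physical-feasibility check whose violations become residuals. Both can be illustrated on a single stop sequence. This is a hypothetical sketch, not the paper's agent architecture. The source names, trust weights and capacity are made up for the example.

```python
def fuse(estimates, trust):
    """Trust-weighted average of one count reported by several sources."""
    total = sum(trust[src] for src in estimates)
    return sum(trust[src] * count for src, count in estimates.items()) / total

def step_load(load, boardings, alightings, capacity):
    """Project one stop event onto the physically feasible set and return the
    new load plus the size of the correction (a violation residual)."""
    b = max(boardings, 0.0)                # counts cannot be negative
    a = min(max(alightings, 0.0), load)    # cannot alight more riders than on board
    raw = load + b - a
    new_load = min(raw, capacity)          # cannot exceed vehicle capacity
    residual = (b - boardings) + abs(alightings - a) + (raw - new_load)
    return new_load, residual

trust = {"apc": 0.7, "farecard": 0.3}      # hypothetical per-source trust weights
stops = [
    ({"apc": 12, "farecard": 10}, {"apc": 0, "farecard": 0}),
    ({"apc": 5, "farecard": 3}, {"apc": 20, "farecard": 18}),  # over-counted alightings
]
load, residuals = 0.0, []
for boards, alights in stops:
    load, r = step_load(load, fuse(boards, trust), fuse(alights, trust), capacity=60)
    residuals.append(r)
print(round(load, 2), [round(r, 2) for r in residuals])
```

At the second stop the fused alighting count exceeds the riders on board. The projection caps it and records a residual of about 8, which is the kind of signal the framework feeds back into training.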

2605.19830 2026-05-20 cs.LG math.ST stat.TH 版本更新

Set-Valued Policy Learning

集合值策略学习

Laura Fuentes-Vicente, Mathieu Even, Gaëlle Dormion, Antoine Chambaz, Uri Shalit, Julie Josse

发表机构 * Inria PreMeDICaL, Inserm, Montpellier, France（法国蒙彼利埃，Inria PreMeDICaL、Inserm）； Elixir Health, Paris, France（法国巴黎，Elixir Health）； Université Paris Cité, CNRS, MAP5, F-75006 Paris, France（法国巴黎，巴黎西岱大学、CNRS、MAP5）； Tel-Aviv University, Tel-Aviv, Israel（以色列特拉维夫，特拉维夫大学）

AI总结本文提出了一种面向多治疗场景的集合值策略学习方法，通过输出一组可能的治疗而非单一推荐来内在地量化不确定性；通过新的最大下界（greatest Lower Bound）方法将学习推迟（learning-to-defer）框架扩展到多治疗场景，并引入共形策略学习，以连接未观察到的真实最优治疗与估计的最优治疗规则。

AI中文摘要

传统治疗策略将患者协变量映射到单一推荐干预，以最大化预期临床结果。尽管已有大量因果推断方法用于估计此类策略，但点值推荐可能对估计不确定性、模型设定和有限样本变异高度敏感，并且通常几乎无法说明对所推荐行动应有多大把握。在本文中，我们提出了一种面向多治疗设置的集合值策略学习范式，其中策略输出一组合理的治疗而非单一推荐。这种形式使内在的不确定性量化成为可能，预测集的大小反映决策的模糊程度。我们通过新的最大下界（greatest Lower Bound）方法将学习推迟（learning-to-defer）框架扩展到多治疗，并引入共形策略学习（conformal policy learning），它弥合了未观察到的真实最优治疗与估计的最优治疗规则之间的差距。借鉴噪声标签文献的见解，我们开发了一种随机性注入方法，无需对底层黑箱最优治疗规则作任何假设即可保证边际覆盖率。通过在合成数据以及体外受精（IVF）实际应用上的实验，我们证明了所提方法能产生稳健且可操作的策略，这些策略自然地纳入临床考量，同时有效平衡性能与可靠性。

英文摘要

Conventional treatment policies map patient covariates to a single recommended intervention in order to maximize expected clinical outcomes. Although a rich body of causal inference methods has been developed to estimate such policies, point-valued recommendations can be highly sensitive to estimation uncertainty, model specification, and finite-sample variability, while typically providing little guidance about how confident one should be in the recommended action. In this work, we propose a set-valued policy learning paradigm for the multiple-treatment setting, in which policies output a set of plausible treatments rather than a single recommendation. This formulation enables intrinsic uncertainty quantification, with the size of the predicted set reflecting the degree of decision ambiguity. We extend the learning-to-defer framework to multiple treatments via a novel greatest Lower Bound method, and introduce conformal policy learning, which bridges the gap between unobserved ground-truth optimal treatments and estimated optimal treatment rules. Drawing on insights from the noisy-label literature, we develop a randomness-injection approach that guarantees marginal coverage without requiring assumptions on underlying black-box optimal treatment rules. Through experiments on synthetic data and a real-world application to In-Vitro Fertilization (IVF), we demonstrate that our methods produce robust and actionable policies that naturally incorporate clinical considerations while effectively balancing performance and reliability.
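For intuition about how a set-valued policy can carry a coverage guarantee, here is textbook split conformal prediction applied to treatment sets. It assumes calibration examples with a labelled (or proxy) optimal treatment and a model giving estimated outcomes per treatment. The paper stresses that the true optimal treatment is unobserved, so this is only the baseline idea, not its conformal policy learning method. The function names and data are hypothetical.

```python
import math

def score(mu_row, best):
    """Nonconformity score: gap between the top estimated outcome and the
    estimated outcome of the treatment labelled optimal."""
    return max(mu_row) - mu_row[best]

def conformal_threshold(scores, alpha):
    """Split-conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return math.inf if k > n else sorted(scores)[k - 1]

def treatment_set(mu_row, q):
    """All treatments whose estimated outcome is within q of the best one."""
    top = max(mu_row)
    return [a for a, m in enumerate(mu_row) if top - m <= q]

# Calibration data: estimated outcome per treatment, plus the treatment labelled optimal.
calib = [([0.9, 0.5, 0.1], 0), ([0.4, 0.6, 0.2], 1), ([0.3, 0.5, 0.7], 1),
         ([0.8, 0.7, 0.1], 1), ([0.2, 0.1, 0.6], 2)]
q = conformal_threshold([score(mu, best) for mu, best in calib], alpha=0.2)
print(treatment_set([0.55, 0.6, 0.1], q))   # → [0, 1]
```

A close call between treatments 0 and 1 yields a two-element set. A patient with one clearly dominant treatment would get a singleton, so the set size tracks decision ambiguity.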

2605.19823 2026-05-20 cs.LG cs.AI math.AP math.DS stat.ML 版本更新

Smooth Piecewise Cutting for Neural Operator to Handle Discontinuities and Sharp Transitions

面向神经算子的平滑分段切割：处理不连续性与尖锐过渡

Ha Dang, Sebastian Schmidt, Juergen Hesser

发表机构 * Mannheim Institute for Intelligent Systems in Medicine, Heidelberg University（海德堡大学曼海姆医学智能系统研究所）； Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University（海德堡大学跨学科科学计算中心）； Heidelberg Institute for Theoretical Studies (HITS), Heidelberg University（海德堡理论研究所）； Central Institute for Computer Engineering (ZITI), Heidelberg University（海德堡大学计算机工程中央研究所）； CZS Heidelberg Initiative for Model-Based AI (MBAI), Heidelberg University（海德堡大学CZS基于模型的人工智能倡议）

AI总结本文提出Cut-DeepONet，一种两阶段训练框架，通过将不连续性建模为更高维空间中的边界，减少学习复杂性，从而在处理偏微分方程的解算子时更有效地捕捉不连续性和尖锐过渡。

AI中文摘要

神经算子在学习偏微分方程（PDEs）的解算子方面取得了强劲表现，但其本质上连续的表示难以捕捉不连续性和尖锐过渡。现有方法通常在连续函数空间内近似这些特征，往往需要更大的模型容量和高分辨率数据。在本文中，我们提出Cut-DeepONet，一种在显式建模不连续性的同时降低学习复杂度的两阶段训练框架。我们的方法通过提升策略重新表述问题，将定义域划分为平滑子区域，同时在更高维空间中将不连续性表示为边界。这种分离使算子学习任务与神经网络的归纳偏置相契合，并避免了直接近似不连续性。一个额外的网络为未见过的输入预测依赖于输入的不连续位置，随后用于引导神经算子在每个区域内生成平滑分量。在基准PDE上的实验表明，即使在低分辨率数据集上训练，Cut-DeepONet也优于最先进的方法。该方法在存在不连续性和尖锐过渡的问题上表现突出，同时使用更少的可训练参数。我们的结果突显了改变算子学习的表示方式、而非增加模型复杂度的优势。

英文摘要

Neural operators have achieved strong performance in learning solution operators of partial differential equations (PDEs), but their inherently continuous representations struggle to capture discontinuities and sharp transitions. Existing approaches typically approximate such features within continuous function spaces, often requiring increased model capacity and high-resolution data. In this work, we propose Cut-DeepONet, a two-stage training framework that explicitly models discontinuities while reducing learning complexity. Our approach reformulates the problem via a lifting strategy, partitioning the domain into smooth subregions while representing discontinuities as boundaries in a higher-dimensional space. This separation aligns the operator learning task with the inductive bias of neural networks and avoids directly approximating discontinuities. An additional network predicts input-dependent discontinuity locations for unseen inputs, which are then used to guide the neural operator in generating smooth components within each region. Experiments on benchmark PDEs show that Cut-DeepONet outperforms state-of-the-art methods, even when trained on low-resolution datasets. The method excels on problems with discontinuities and sharp transitions, while using fewer trainable parameters. Our results highlight the benefits of changing the representation of operator learning rather than increasing model complexity.
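The representation change can be illustrated in one dimension. A solution with a jump is written as two smooth pieces plus a predicted cut location that decides which piece applies. This is a toy illustration, not the Cut-DeepONet architecture. `predict_cut`, `smooth_left` and `smooth_right` are hand-written hypothetical stand-ins for the learned networks.

```python
def predict_cut(p):
    """Stand-in for the extra network: maps an input parameter p to the
    location of the discontinuity (a hypothetical linear rule)."""
    return 0.25 + 0.5 * p

def smooth_left(x, p):
    return 1.0 + p - 0.1 * x      # smooth on its own sub-region

def smooth_right(x, p):
    return 0.2 * x                # smooth on its own sub-region

def lifted_solution(x, p):
    """Assign x to a smooth sub-region using the predicted cut (the extra,
    'lifted' coordinate), then evaluate only that region's smooth component."""
    region = 0 if x < predict_cut(p) else 1
    return smooth_left(x, p) if region == 0 else smooth_right(x, p)

p = 0.5                           # cut predicted at x = 0.5
xs = [0.0, 0.25, 0.49, 0.51, 0.75, 1.0]
print([round(lifted_solution(x, p), 3) for x in xs])
```

Neither component has to approximate the jump. The discontinuity comes entirely from the region assignment, which is the point of moving it into a separate coordinate.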

2605.19822 2026-05-20 cs.LG cs.AI 版本更新

ST-TGExplainer: Disentangling Stability and Transition Patterns for Temporal GNN Interpretability

ST-TGExplainer: 解构稳定性与转换模式以提升时序GNN可解释性

Hongjiang Chen, Xin Zheng, Pengfei Jiao, Huan Liu, Zhidong Zhao, Huaming Wu, Feng Xia, Shirui Pan

发表机构 * Hangzhou Dianzi University（杭州电子科技大学）； RMIT University（皇家墨尔本理工大学）； Tianjin University（天津大学）； Griffith University（格里菲斯大学）

AI总结本文提出ST-TGExplainer，一种能够解构时序图中稳定性与转换模式的自解释时序GNN，以提升模型的可解释性。

AI中文摘要

时序图神经网络（TGNNs）在解决现实中的时序图任务中取得了显著进展。然而，其可解释性仍然有限，因为大多数TGNNs无法识别哪些历史交互最影响给定预测。尽管在可解释性TGNNs上取得了令人鼓舞的进展，现有方法主要关注之前已见过的历史交互，我们称之为稳定性模式，而忽略了新出现的一次性交互，我们称之为转换模式。这两种模式对于忠实的时序解释都是必不可少的。为了解决这一限制，我们提出了ST-TGExplainer，一种自解释的TGNN，旨在解构时序图中的稳定性与转换模式，以获得更忠实的时序GNN解释器。受解构信息瓶颈目标的指导，ST-TGExplainer学习了一个紧凑的解释子图，该子图在预测事件标签时保持预测性，同时显式地抑制稳定性与转换模式之间的标签条件冗余。广泛的实验表明，ST-TGExplainer在预测性能上表现出色，并产生了更忠实的解释。代码可在https://github.com/hjchen-hdu/ST-TGExplainer上获取。

英文摘要

Temporal graph neural networks (TGNNs) have gained significant traction for solving real-world temporal graph tasks. However, their interpretability remains limited, as most TGNNs fail to identify which historical interactions most influence a given prediction. Despite promising progress on interpretable TGNNs, existing methods predominantly focus on previously seen historical interactions, which we term stability patterns, while overlooking newly emerging first-time interactions, which we term transition patterns. Both types of patterns are essential for faithful temporal explanations. To address this limitation, we propose ST-TGExplainer, a self-explainable TGNN that disentangles Stability and Transition patterns in temporal graphs for a more faithful Temporal GNN Explainer. Guided by a disentangled information bottleneck objective, ST-TGExplainer learns a compact explanatory subgraph that remains predictive of the event label while explicitly suppressing label-conditioned redundancy between stability and transition patterns. Extensive experiments demonstrate that ST-TGExplainer achieves strong predictive performance and yields more faithful explanations. Code is available at https://github.com/hjchen-hdu/ST-TGExplainer.
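The two pattern types, as defined in the abstract, reduce to a single pass over a time-ordered event stream. A pair that has interacted before gives a stability pattern, and a first-time pair gives a transition pattern. The sketch below assumes undirected interactions and covers only this labelling. The information-bottleneck explainer itself is not sketched, and `split_patterns` is a hypothetical name.

```python
def split_patterns(events):
    """Label each time-ordered event (u, v, t) as a 'stability' pattern if the
    pair has interacted before, or a 'transition' pattern if it is new."""
    seen, labels = set(), []
    for u, v, _t in events:
        pair = frozenset((u, v))   # assumption: interactions are undirected
        labels.append("stability" if pair in seen else "transition")
        seen.add(pair)
    return labels

events = [("a", "b", 1), ("a", "c", 2), ("b", "a", 3), ("c", "d", 4), ("a", "c", 5)]
print(split_patterns(events))
```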

2605.19813 2026-05-20 cs.LG math.ST stat.TH 版本更新

General Lower Bounds for Differentially Private Federated Learning with Arbitrary Public-Transcript Interactions

具有任意公共交互记录（public transcript）的差分隐私联邦学习的一般下界

Yicheng Li

发表机构 * Department of Statistics and Data Science, Tsinghua University（清华大学统计与数据科学系）

AI总结本文研究了在任意公共交互记录（public transcript）下差分隐私联邦学习的下界问题，针对平方 $\ell_2$ 损失下的参数估计提出了一个联邦 Van Trees 下界，并通过均值估计、线性回归和非参数回归等应用展示了该下界。

2605.19812 2026-05-20 cs.LG cs.AI stat.AP stat.ML 版本更新

先验知识还是搜索？LLM代理在硬件感知代码优化中的研究

Dmitry Redko, Albert Fazlyev, Konstantin Sozykin, Maria Ivanova, Evgeny Burnaev, Egor Shvetsov

发表机构 * Applied AI Institute（应用人工智能研究所）； ITMO University（ITMO大学）； AI Talent Hub（AI人才中心）

AI总结该研究探讨了在硬件感知代码优化中，LLM代理是依赖于先验知识还是搜索过程，通过三个受控实验发现LLM在纯黑盒优化中表现为贪婪优化器，在零样本内核生成中输入大小信息无明显影响，而在反馈循环内核优化中CUDA单调改进而TVM IR主动退化，表明LLM在代码优化任务中高度依赖预训练先验而非反馈或代理结构。

AI中文摘要

LLM发现和优化系统在各个领域中被越来越多地应用，实现了一个常见的提出-评估-修订循环。此类优化或发现过程通过上下文条件在接收到环境反馈后进行。然而，随着现代LLM代理在结构上日益复杂，难以评估哪些组件贡献最大，以及何时以及如何探索可能失败。我们通过三个受控实验回答这些问题。我们的发现：(1) 在纯黑盒优化中，LLM表现为贪婪优化器。(2) 在零样本内核生成中，提供显式输入大小信息没有可测量的影响，模型无论大小或温度都会收敛到相同的内核参数，仿佛大小指令是不可见的。此外，当被要求为不常见的内核大小进行内核优化时，性能会急剧下降，无论使用的语言如何。(3) 在反馈循环内核优化中，CUDA在迭代反馈下单调改进，而TVM IR则主动退化，这表明当模型以低密度语言操作时，内核优化会退化。我们的结果得出结论：在代码优化任务中，LLM高度依赖于预训练的先验而非提供的反馈或代理结构。

英文摘要

LLM discovery and optimization systems are increasingly applied across domains, implementing a common propose-evaluate-revise loop. Such optimization or discovery progresses via context conditioning on received feedback from an environment. However, as modern LLM agents are increasingly complex in their structure, it is difficult to evaluate which components contribute the most, and when and how this exploration may fail. We answer these questions through three controlled experiments. Our findings: (1) In pure black-box optimization, LLMs act as greedy optimizers. (2) In zero-shot kernel generation, providing explicit input-size information has no measurable effect: models converge to the same kernel parameters regardless of size or temperature, as though the size instruction were invisible. Moreover, when tasked to perform kernel optimization for uncommon kernel sizes, performance sharply degrades regardless of the language used. (3) In feedback-loop kernel optimization, CUDA improves monotonically under iterative feedback, while TVM IR actively degrades, which demonstrates that kernel optimization degrades when models operate with low-density language. Our results indicate that LLMs in code optimization tasks depend heavily on pretrained priors rather than on the provided feedback or agentic structure.

2605.19779 2026-05-20 cs.AI cs.LG 版本更新

Distribution-Free Uncertainty Quantification for Continuous AI Agent Evaluation

面向持续AI代理评估的无分布不确定性量化

Yuxuan Gao, Megan Wang, Yi Ling Yu

发表机构 * University of Pennsylvania（宾夕法尼亚大学）； Columbia University（哥伦比亚大学）

AI总结本文提出了一种面向持续AI代理评估的无分布不确定性量化方法，通过自适应共形推断（ACI）为预测质量分数提供覆盖保证，并开发了多代理管道的组合不确定性界限、成对排名的共形弃权规则，以及面向排行榜规模多重检验的FDR校正弃权方法。

Comments 6 pages, 7 figures, 2 tables. Accepted at the ICML 2026 Workshop on Agentic Uncertainty Quantification (AgenticUQ) - Poster

AI中文摘要

我们将分割共形预测和自适应共形推断（ACI）应用于持续AI代理评估，为预测的质量分数提供无分布覆盖保证。共形区间在24小时预测时域上于所有名义水平均实现了低于0.02的校准误差，而ACI在代理发布后正确地将区间扩大了35%，随后重新收敛。我们进一步开发了多代理管道的组合不确定性界限（通过模拟在阶段间相关性rho ∈ [-0.5, 0.9]范围内进行验证）、用于成对排名且假排名率受控的共形弃权规则，以及面向排行榜规模多重检验的FDR校正弃权方法。通过每小时采集的18个实时信号评估50个代理，我们表明每个代理的条件覆盖率紧密集中在名义水平附近（均值80.4%，90%的代理位于[72%, 90%]内），并且跨来源的情感分歧可以预测排名不稳定性（r=0.64，p<0.01）。一项控制了循环性的验证证实，该框架捕捉到了基准测试之外的信号（rho_s=0.52，p<0.01，n=35）。代码和数据以CC BY 4.0许可发布。

英文摘要

We adapt split conformal prediction and adaptive conformal inference (ACI) to continuous AI agent evaluation, providing distribution-free coverage guarantees for forecasted quality scores. Conformal intervals achieve calibration error below 0.02 across all nominal levels at the 24h horizon, while ACI correctly widens intervals by 35% following agent releases then reconverges. We further develop compositional uncertainty bounds for multi-agent pipelines (validated via simulation across inter-stage correlations rho in [-0.5, 0.9]), a conformal abstention rule for pairwise rankings with controlled false-ranking rate, and FDR-corrected abstention for leaderboard-scale multiple testing. Evaluating 50 agents via 18 real-time signals collected hourly, we show that per-agent conditional coverage is well-concentrated around the nominal level (mean 80.4%, 90% of agents within [72%, 90%]), and that cross-source sentiment divergence predicts ranking instability (r=0.64, p<0.01). A circularity-controlled validation confirms the framework captures signal beyond benchmarks (rho_s=0.52, p<0.01, n=35). Code and data are released under CC BY 4.0.
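Adaptive conformal inference uses the standard update α_{t+1} = α_t + γ(α − err_t), where err_t is 1 when the interval missed. This is why intervals widen after a distribution shift and then settle again. The sketch below uses absolute-residual scores and a synthetic jump in error size standing in for an agent release. The forecaster, data and names are hypothetical, not the paper's pipeline.

```python
import math

def conformal_quantile(scores, level):
    """Empirical conformal quantile of past scores at the given level."""
    if level >= 1.0:
        return math.inf                 # alpha_t <= 0: interval must cover everything
    if level <= 0.0:
        return 0.0                      # alpha_t >= 1: degenerate zero-width interval
    s = sorted(scores)
    k = min(len(s), max(1, math.ceil(level * (len(s) + 1))))
    return s[k - 1]

def aci_run(preds, actuals, alpha=0.2, gamma=0.05, warmup=5):
    """Adaptive conformal inference: after each observation, nudge alpha_t up
    when the interval covered the outcome and down when it missed."""
    alpha_t, scores, covered, widths = alpha, [], [], []
    for yhat, y in zip(preds, actuals):
        if len(scores) >= warmup:
            q = conformal_quantile(scores, 1.0 - alpha_t)
            hit = abs(y - yhat) <= q
            covered.append(hit)
            widths.append(q)
            alpha_t += gamma * (alpha - (0.0 if hit else 1.0))
        scores.append(abs(y - yhat))
    return covered, widths

# Forecast errors of size 1, then a shift to size 3 (e.g. after a new release).
actuals = ([1.0 if i % 2 else -1.0 for i in range(40)]
           + [3.0 if i % 2 else -3.0 for i in range(40)])
preds = [0.0] * len(actuals)
covered, widths = aci_run(preds, actuals)
print(widths[0], widths[-1], covered.count(False))
```

After the shift the interval misses a handful of times. Each miss lowers α_t until the quantile level reaches the new error size, and coverage then resumes with the wider interval.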

2605.18870 2026-05-20 cs.LG math.AP math.FA 版本更新

Multi-Headed Transformer Architectures as Time-dependent Wasserstein Gradient Flows

多头Transformer架构作为时间依赖的Wasserstein梯度流

Alex Massucco, Leonardo Del Grande, Marcello Carioni, Christoph Brune, Carola-Bibiane Schönlieb

发表机构 * Department of Applied Mathematics and Theoretical Physics, University of Cambridge（剑桥大学应用数学与理论物理系）； Department of Mathematics, University of Twente（特温特大学数学系）

AI总结本文提出将多头Transformer架构中的数据流建模为时间依赖的Wasserstein梯度流，以刻画注意力机制的设计；证明了在适当的可积性假设下，梯度流ω-极限集中的每个元素都是交互能量的稳态点；同时分析了梯度流的稳定性，并通过数值实验验证了预测的能量耗散恒等式和动力学的渐近行为。

AI中文摘要

近年来，Transformer架构彻底改变了语言处理领域，开启了此前无法预见的可能性。然而，从理论角度看，文献中提出的数学模型往往与实际架构缺乏直接联系，并依赖于较强的简化假设。在本文中，我们通过将多头Transformer架构中的数据流建模为某个适当交互能量的时间依赖梯度流来缩小这一差距，该交互能量刻画了注意力机制的设计。显式的时间依赖性使我们能够为每个头和每一层考虑不同的权重，而无需对初始化方法施加约束。此外，我们证明，在关于权重演化的适当可积性假设下，梯度流ω-极限集中的每个元素都是交互能量在极限权重分布下的稳态点。最后，我们分析了梯度流在初始数据和权重均受扰动时的稳定性。一方面，我们研究了所提模型对噪声输入的鲁棒性，建立了梯度流对初始数据的连续依赖性以及流的唯一性。另一方面，我们证明了受扰动交互能量对未受扰动能量的Γ收敛，从而得到相应梯度流的收敛。我们通过数值实验补充了这些理论结果，验证了预测的能量耗散恒等式，并阐明了动力学在类自治（Ornstein-Uhlenbeck）和真正非自治（振荡权重）两种情形下的渐近行为。

英文摘要

In recent years, transformer architectures have revolutionized the field of language processing, opening the door to previously unforeseen possibilities. However, from a theoretical point of view, the mathematical models proposed in the literature often lack direct contact with the actual architectures and depend on strong simplifying assumptions. In this paper, we reduce this gap by modelling the data flow in multi-headed transformer architectures as time-dependent gradient flows for a suitable interaction energy capturing the design of the attention mechanism. The explicit dependence on time allows us to consider different weights for each head and for each layer, without imposing constraints on the initialization method. Moreover, we prove that, under a suitable integrability assumption on the evolution of the weights, each element of the $ω$-limit set of the gradient flows is a stationary point of the interaction energy at a limiting weight distribution. Finally, we analyse the stability of the gradient flows considering perturbations of both the initial data and the weights. Specifically, on the one hand, we study the robustness of the proposed models with respect to noisy inputs, establishing a continuous dependence of the gradient flows on the initial data and uniqueness of the flows. On the other hand, we prove the $Γ$-convergence of the perturbed interaction energy to the unperturbed one, leading to the convergence of the corresponding gradient flows. We complement these theoretical results with numerical experiments that confirm the predicted energy-dissipation identity and clarify the asymptotic behavior of the dynamics in both the autonomous-like (Ornstein--Uhlenbeck) and the genuinely non-autonomous (oscillating-weights) regimes.

2605.18618 2026-05-20 cs.LG cs.AI 版本更新

Stochastic Penalty-Barrier Methods for Constrained Machine Learning

随机罚函数-障碍方法用于约束机器学习

Adam Bosák, Andrii Kliachkin, Jana Lepšová, Gilles Bareilles, Jakub Mareček

发表机构 * Artificial Intelligence Center, CTU in Prague（布拉格捷克理工大学人工智能中心）； CMAP, École Polytechnique, Palaiseau, France（法国帕莱索，巴黎综合理工学院CMAP）

AI总结本文提出了一种随机罚函数-障碍方法（SPBM），用于解决深度学习中非凸、非光滑、随机环境下的约束优化问题，该方法通过指数对偶平均、稳定罚函数调度和Moreau包络来处理非光滑性，并在多个设置中验证了其性能。

2605.17635 2026-05-20 hep-ex cs.LG 版本更新

ML-based Fast Simulation of FARICH Responses

基于机器学习的FARICH响应快速模拟

Foma Shipilov, Alexander Barnyakov, Artem Ivanov, Fedor Ratnikov

发表机构 * HSE University（俄罗斯高等经济大学）； Budker Institute of Nuclear Physics SB RAS（俄罗斯科学院西伯利亚分院布德克尔核物理研究所）； Novosibirsk State Technical University（新西伯利亚国立技术大学）； Joint Institute for Nuclear Research（联合核子研究所）

AI总结本文提出基于条件生成对抗网络的机器学习方法，用于快速模拟FARICH探测器响应，通过轻量级卷积架构生成真实光子击中探测器矩阵的样本，并在速度和精度上优于传统蒙特卡洛方法。

Comments to be published in 7th International Workshop on Future Tau Charm Facilities (FTCF2025) proceedings

2605.17471 2026-05-20 cs.LG cs.NA math.NA math.OC 版本更新

WinQ: Accelerating Quantization-Aware Training of Language Models Around Saddle Points

WinQ: 加速围绕鞍点的语言模型量化感知训练

Dongyue Li, Zechun Liu, Kai Yi, Zhenshuo Zhang, Changsheng Zhao, Raghuraman Krishnamoorthi, Harshit Khaitan, Hongyang R. Zhang, Steven Li

发表机构 * Northeastern University, MA（东北大学）； Meta AI, CA（Meta AI）

AI总结本文研究了量化感知训练（QAT）在低比特宽度下的收敛问题，提出WinQ算法通过重置权重和噪声注入梯度来加速训练并提升性能。

Comments 23 pages; To appear in ICML 2026

AI中文摘要

量化感知训练（QAT）被广泛用于量化语言模型，其做法是利用量化模型的梯度训练全精度权重。其主要瓶颈是收敛缓慢和过早进入性能平台期，在低于4比特的位宽下尤为明显。尽管先前工作已观察到此问题，但其确切原因仍不清楚。在本文中，我们通过估计损失曲面Hessian矩阵的谱来分析QAT的收敛性。我们发现权重会收敛到鞍点周围的平坦区域，此处Hessian同时具有相当比例的正特征值和负特征值。在训练过程中，越来越多的Hessian特征值集中在零附近，且其幅度不断减小。在更低的位宽下，Hessian谱中特征值的幅度显著更小。为缓解这些问题，我们提出了一种名为WinQ的加速QAT的算法，包括：（1）周期性地将权重重置为全精度权重与量化权重的线性插值，从而缩短到量化网格的距离并增大特征值幅度；（2）计算噪声注入权重的梯度以正则化Hessian。大量实验表明，WinQ在各种量化方法和模型上将QAT加速多达4倍。在相同训练成本下，WinQ将最先进的4比特以下量化结果提升了多达8.8%。这些结果在涵盖不同语言模型、量化方法和位宽的16种设置中保持一致。

英文摘要

Quantization-aware training (QAT) is widely adopted to quantize language models by training full-precision weights using gradients from the quantized model. The main bottleneck is its slow convergence and early performance plateau, particularly at bit-widths below 4. While this problem has been observed in prior work, its precise cause remains unclear. In this paper, we analyze the convergence of QAT by estimating the spectrum of the loss-surface Hessians. We find that the weights converge to flat regions around saddle points, where large fractions of the Hessian eigenvalues are positive and negative. During training, an increasing fraction of Hessian eigenvalues concentrates around zero, with decreasing magnitude. At lower bit-widths, the magnitude of eigenvalues in the Hessian spectrum is significantly smaller. To mitigate these issues, we propose an algorithm called WinQ to accelerate QAT, which involves: (1) periodically resetting weights to the linear interpolation of full-precision and quantized weights, reducing the distance to the quantization grid and increasing eigenvalue magnitude, and (2) computing gradients of noise-injected weights to regularize the Hessian. Extensive experiments show that WinQ accelerates QAT by up to 4 times across various quantization methods and models. Under the same training cost, WinQ improves state-of-the-art sub-4-bit quantization by up to 8.8%. These results are consistent across 16 settings with different language models, quantization methods, and bit widths.
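Both WinQ ingredients named in the abstract can be sketched on a flat weight list. The sketch assumes a uniform round-to-nearest quantizer with step 0.25 and interpolation weight lam = 0.5. Both values are our choices, and the paper's quantizers, reset schedule and noise scale are not given here. Moving each weight part of the way toward its nearest grid point keeps that point nearest, so the reset shrinks the distance to the grid by exactly the factor 1 - lam. The function names are hypothetical.

```python
import random

def quantize(w, step=0.25):
    """Round-to-nearest uniform quantizer on a grid with spacing `step` (assumed)."""
    return [step * round(x / step) for x in w]

def grid_distance(w, step=0.25):
    """Total distance from the weights to their nearest grid points."""
    return sum(abs(x - qx) for x, qx in zip(w, quantize(w, step)))

def interpolation_reset(w, lam=0.5, step=0.25):
    """Periodic reset: replace w with (1 - lam) * w + lam * Q(w)."""
    return [(1 - lam) * x + lam * qx for x, qx in zip(w, quantize(w, step))]

def noise_injected_grad(grad_fn, w, sigma=0.01, rng=None):
    """Evaluate the gradient at w + eps, with eps ~ N(0, sigma^2) per weight."""
    rng = rng or random.Random(0)
    return grad_fn([x + rng.gauss(0.0, sigma) for x in w])

w = [0.1, 0.6, -0.37]
w_reset = interpolation_reset(w)
print(round(grid_distance(w), 3), round(grid_distance(w_reset), 3))
```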

2605.16692 2026-05-20 cs.LG cs.AI cs.RO 版本更新

EfficientTDMPC: Improved MPC Objectives for Sample-Efficient Continuous Control

EfficientTDMPC：改进MPC目标以实现样本高效的连续控制

Thomas Evers, Cristian Meo, Wendelin Bohmer, Justin Dauwels, Yaniv Oren

发表机构 * TU Delft（代尔夫特理工大学）； LatentWorlds AI

AI总结本文提出EfficientTDMPC，一种基于模型的强化学习方法，用于连续控制，通过减少误差和增加数据新鲜度来提高样本效率。

AI中文摘要

我们介绍了EfficientTDMPC，一种基于TD-MPC算法家族、面向连续控制的样本高效的基于模型的强化学习方法。该家族的核心是一个规划器，旨在找到最大化估计回报的动作序列。回报通过学习得到的模型和价值网络进行估计，二者都可能引入误差。EfficientTDMPC通过两种方式减少这种误差。首先，它引入了动力学模型集成，并在这些模型以及不同的展开深度之间对回报估计取平均。其次，它提供了在规划器目标中施加不确定性惩罚的选项，使规划器避开回报估计不确定的动作。随后，它加入了若干实用改进，以提高缓冲区数据的新鲜度并减少计算量。最后，我们发现这些贡献使EfficientTDMPC能从更高的更新-数据比（UTD）中获益更多，从而进一步提高样本效率。据我们所知，在各基准的低数据情形下，EfficientTDMPC在HumanoidBench-Hard和DMC hard上实现了最先进的样本效率，并在DMC easy上与最先进水平持平。

英文摘要

We introduce EfficientTDMPC, a sample-efficient model-based reinforcement learning method for continuous control built on the TD-MPC family of algorithms. Central to this family is a planner that aims to find an action sequence that maximizes the estimated return. The return is estimated using a learned model and value networks, each of which can introduce error. EfficientTDMPC proposes to reduce this error in two ways. First, it introduces an ensemble of dynamics models and averages the return estimates across those models and across different rollout depths. Second, it adds the option to apply an uncertainty penalty to the planner objective, yielding a planner that avoids actions with uncertain return estimates. It then adds practical improvements which increase buffer data freshness and reduce compute. Lastly, we find that our contributions enable EfficientTDMPC to benefit more from a higher update-to-data (UTD) ratio, further improving sample efficiency. To the best of our knowledge, in the low data regime of each benchmark, EfficientTDMPC achieves state-of-the-art (SOTA) in terms of sample efficiency on HumanoidBench-Hard and DMC hard, while matching SOTA on DMC easy.
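The planner objective described here can be sketched as follows. Return estimates are averaged over an ensemble of dynamics models and several rollout depths, with an optional penalty on their spread. The toy 1-D models, value function, discount, depths and the use of the population standard deviation as the uncertainty measure are all assumptions for illustration, not EfficientTDMPC's actual components.

```python
import statistics

def return_estimates(models, value_fn, state, actions, gamma=0.9, depths=(1, 2, 3)):
    """One estimate per (model, depth): roll the model forward `depth` steps,
    sum discounted rewards, then bootstrap with the value function."""
    estimates = []
    for step in models:
        for depth in depths:
            s, total = state, 0.0
            for t in range(depth):
                s, r = step(s, actions[t])
                total += gamma ** t * r
            estimates.append(total + gamma ** depth * value_fn(s))
    return estimates

def planner_score(estimates, kappa=0.0):
    """Mean return estimate minus an optional uncertainty penalty kappa * std."""
    return statistics.mean(estimates) - kappa * statistics.pstdev(estimates)

# Two toy 1-D dynamics models that disagree on how strongly actions act.
models = [lambda s, a: (s + a, -abs(s + a)),
          lambda s, a: (s + 1.2 * a, -abs(s + 1.2 * a))]
value_fn = lambda s: -abs(s)                 # stand-in for a learned value network
toward, away = [-0.5] * 3, [0.5] * 3         # candidate action sequences from s = 1.0
for name, plan in [("toward", toward), ("away", away)]:
    est = return_estimates(models, value_fn, 1.0, plan)
    print(name, round(planner_score(est), 3), round(planner_score(est, kappa=1.0), 3))
```

With kappa > 0, plans on which the ensemble members and depths disagree are scored lower. That is the sense in which the planner "avoids actions with uncertain return estimates".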

2605.16447 2026-05-20 cs.LG cs.AI 版本更新

Nested Spatio-Temporal Time Series Forecasting

嵌套时空时间序列预测

Yinghao Ai, Yukai Zhou, Ruoxi Jiang, Junyi An, Chao Qu, Zhijian Zhou, Shiyu Wang, Fenglei Cao, Zenglin Xu, Furao Shen, Yuan Qi

发表机构 * Fudan University, Shanghai（复旦大学）； Department of Computer Science and Technology, Nanjing University（南京大学计算机科学与技术系）； ByteDance（字节跳动）

AI总结本文提出了一种嵌套预测框架，通过结合未来宏观区域趋势与微观历史观测，实现了精细化预测，并通过谱聚类方法构建语义连贯的区域，有效过滤系统性噪声并保留关键趋势，实验表明该方法在多个高维数据集上优于现有最先进基线。

Comments Accepted by ICML 2026

AI中文摘要

时空预测对于交通管理等现实应用至关重要，但在噪声和非平稳条件下捕捉可靠的交互仍具挑战性。现有方法主要依赖历史空间先验，往往无法考虑不断演化的时间相关性，并会产生系统性误差。在本文中，我们提出了一种嵌套预测框架，将未来宏观区域趋势与微观历史观测相结合，使模型能够从抽象的未来表示中获得自上而下的指导，以实现细粒度预测。具体而言，我们采用基于谱聚类的方法构建语义连贯的区域，并提供了理论和经验证据，表明这种表示能有效过滤系统性噪声并保留关键趋势。在此基础上，我们开发了一种由粗到细的渐进式预测器，将这些代表性特征整合到推理过程中。这使模型能够利用趋势预测提前预判周期性偏移等动态异常。此外，在多个高维数据集上的大量实验表明，我们的方法始终优于现有最先进基线，验证了由未来宏观信息引导的嵌套预测的有效性。

英文摘要

Spatiotemporal forecasting is critical for real-world applications like traffic management, yet capturing reliable interactions remains challenging under noisy and non-stationary conditions. Existing methods primarily rely on historical spatial priors, often failing to account for evolving temporal correlations and suffering from systematic errors. In this work, we propose a nested forecasting framework that couples future macro-level regional trends with micro-level historical observations, enabling top-down guidance from abstract future representations for fine-grained forecasting. Specifically, we employ a spectral clustering-based approach to construct semantically coherent regions, providing both theoretical and empirical evidence that this representation effectively filters systematic noise while preserving essential trends. Building on this, we develop a progressive coarse-to-fine predictor to integrate these representative features into the inference process. This enables the model to leverage trend predictions to anticipate dynamic anomalies, such as periodic offsets, in advance. Furthermore, extensive experiments on multiple high-dimensional datasets demonstrate that our method consistently outperforms state-of-the-art baselines, validating the effectiveness of future macro-guided nested forecasting.

2605.16170 2026-05-20 cs.LG 版本更新

BAPR: Bayesian amnesic piecewise-robust reinforcement learning for non-stationary continuous control

BAPR: 基于贝叶斯遗忘的分段鲁棒强化学习用于非平稳连续控制

Yifan Zhang, Liang Zheng

发表机构 * Central South University（中南大学）

AI总结该研究提出BAPR方法，结合贝叶斯在线变化检测与鲁棒集合强化学习，解决非平稳连续控制中的鲁棒性与适应性问题，通过形式化验证确保算法稳定性与收敛性。

AI中文摘要

现实中的控制系统常在分段平稳条件下运行：动力学在较长时期内保持稳定，随后发生突然的工况（regime）切换。标准鲁棒强化学习方法面临一个根本性困境：全局保守的策略在稳定时期浪费性能，而局部自适应的策略在工况切换未被检测到时有灾难性失败的风险。我们提出BAPR（贝叶斯遗忘分段鲁棒SAC），将贝叶斯在线变化点检测（BOCD）与鲁棒集成强化学习统一起来。BAPR算子（由冻结信念分布加权的模式条件贝尔曼算子的凸组合）是一个γ-压缩映射。一个在Lean 4中经机器验证的互补反例给出了明确的边界：当信念依赖于Q函数时，压缩因子变为γ + λΔ（其中Δ为模式间的奖励差距），并且恰好在γ + λΔ ≥ 1时压缩性失效。我们为抽象算子推导了逐组件的形式化误差预算（每个组件均经机器验证），用以界定切换后的恢复过程；该预算适用于抽象的模式混合算子，仅通过冻结参数的设计直觉延伸到实际实现的共享评论家算法。所有结果均经形式化验证，不含任何 sorry 占位（3个Lean 4文件共1,145行，22个经机器验证的定理）。BOCD驱动一种自适应保守机制：在检测到变化点后策略变得最为保守，并随置信度增长而平滑放松，检测延迟为O(log(1/δ))。一个通过RMDM损失训练的上下文条件模块在训练时利用模拟器提供的模式ID学习模式感知表示，部署时则无需任何模式标签。

英文摘要

Real-world control systems frequently operate under piecewise stationary conditions, where dynamics remain stable for extended periods before undergoing abrupt regime changes. Standard robust RL methods face a fundamental dilemma: a globally conservative policy wastes performance during stable periods, while a locally adaptive policy risks catastrophic failure when the regime changes undetected. We propose BAPR (Bayesian Amnesic Piecewise-Robust SAC), which unifies Bayesian Online Change Detection (BOCD) with robust ensemble RL. The BAPR operator (a convex combination of mode-conditional Bellman operators weighted by a frozen belief distribution) is a $γ$-contraction. A complementary counterexample, machine-verified in Lean 4, establishes a sharp boundary: when beliefs depend on the Q-function, the contraction factor becomes $γ + λΔ$ (where $Δ$ is the mode reward gap), and contraction fails exactly when $γ + λΔ \geq 1$. We derive a component-wise formal error budget for the abstract operator (every component machine-verified) bounding post-switch recovery; the budget applies to the abstract mode-mixture operator and inherits to the implemented shared-critic algorithm only through the frozen-parameter design intuition. All results are formally verified with no sorry placeholders (1,145 lines across 3 Lean 4 files, 22 machine-verified theorems). BOCD drives an adaptive conservatism mechanism: the policy becomes maximally conservative after detected change-points and smoothly relaxes as confidence grows, with detection delay $O(\log(1/δ))$. A context-conditioning module trained via RMDM loss provides mode-aware representations from simulator-provided mode IDs at training time and requires no mode labels at deployment.
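A short derivation shows why freezing the belief matters. Assume each mode-conditional operator $\mathcal{T}_m$ is a $\gamma$-contraction in the sup norm. Let $b$ be a belief vector held fixed while the operator is applied ($b_m \ge 0$, $\sum_m b_m = 1$), and write $\mathcal{T}_b Q = \sum_m b_m \mathcal{T}_m Q$. The mixture then inherits the contraction through the triangle inequality:

```latex
\begin{aligned}
\bigl\|\mathcal{T}_b Q - \mathcal{T}_b Q'\bigr\|_\infty
  &= \Bigl\|\sum_m b_m \bigl(\mathcal{T}_m Q - \mathcal{T}_m Q'\bigr)\Bigr\|_\infty
   \le \sum_m b_m \bigl\|\mathcal{T}_m Q - \mathcal{T}_m Q'\bigr\|_\infty \\
  &\le \sum_m b_m \, \gamma \, \|Q - Q'\|_\infty
   = \gamma \, \|Q - Q'\|_\infty .
\end{aligned}
```

If the weights instead depend on the Q-function, the difference picks up an extra term $\sum_m \bigl(b_m(Q) - b_m(Q')\bigr)\,\mathcal{T}_m Q'$ that this argument does not control. Per the abstract, that term is where the larger factor $\gamma + \lambda\Delta$ arises.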

2605.15532 2026-05-20 cs.LG cs.AI cs.CL 版本更新

DeltaPrompts: Escaping the Zero-Delta Trap in Multimodal Distillation

DeltaPrompts: 逃离多模态蒸馏中的零delta陷阱

Jaehun Jung, Hyunwoo Kim, Brandon Cui, Ximing Lu, David Acuna, Prithviraj Ammanabrolu, Yejin Choi

发表机构 * NVIDIA Research（NVIDIA研究院）

AI总结本文提出DeltaPrompts，通过量化教师与学生之间的答案分歧（Δ）来生成高分歧的推理问题，从而解决传统蒸馏中因零delta提示导致的学习信号不足问题，实验表明DeltaPrompts在多个场景下显著提升了模型性能。

AI中文摘要

蒸馏使紧凑的视觉-语言模型（VLMs）能够获得强大的推理能力，但驱动这一过程的提示通常通过简单的启发式方法选取，或从现成数据集中汇集而来。我们揭示了这种做法中的一个关键低效之处：在标准图表/文档推理数据集中，多达69%的提示实际上是零delta的，即教师和学生已经诱导出完全相同的答案分布。在这些提示上训练只能提供极少的学习信号，导致无论数据规模如何，学生的提升都会迅速饱和。为逃离零delta陷阱，我们回归基本原理：蒸馏本质上是最小化分布差异，因此只有暴露出教师与学生之间功能性能力差距的提示才有价值。我们通过答案分歧（Δ）量化这一差距，并证明非零分歧对有效扩展至关重要。基于这一洞察，我们提出一个分阶段合成流程，以现有数据集为种子，主动针对学生的失败模式生成更好的提示。其成果是DeltaPrompts，一个包含20万个合成的高分歧推理问题的多样化数据集。我们在三种不同设置下评估DeltaPrompts：针对目标教师-学生对的在线策略（on-policy）蒸馏、在不重新生成数据的情况下迁移到新的模型家族，以及对非推理模型的离线策略（off-policy）微调。在所有场景中，DeltaPrompts均带来显著收益，即使在高度优化的推理模型（如Qwen3-VL-8B-Thinking）之上，也能在涵盖图表、文档和以感知为中心的推理的10个基准上平均获得高达15%的相对提升。

英文摘要

Distillation enables compact Vision-Language Models (VLMs) to obtain strong reasoning capabilities, yet the prompts driving this process are typically chosen via simple heuristics or aggregated from off-the-shelf datasets. We reveal a critical inefficiency in this approach: up to 69% of the prompts in standard chart / document reasoning datasets are effectively zero-delta, meaning the teacher and student already induce the exact same answer distribution. Training on these prompts provides minimal learning signal, causing student improvement to rapidly saturate regardless of data scale. To escape the zero-delta trap, we return to first principles: distillation fundamentally minimizes distributional divergence, and thus a prompt is valuable only if it exposes a functional capability gap between the teacher and student. We quantify this gap through answer divergence ($Δ$), demonstrating that non-zero divergence is critical for effective scaling. Building on this insight, we propose a staged synthesis pipeline that repurposes existing datasets as seeds, actively targeting student failure modes to produce better prompts. The result is DeltaPrompts, a diverse dataset of 200k synthetic, high-divergence reasoning problems. We evaluate DeltaPrompts across three distinct settings: on-policy distillation with the target teacher-student pair, transfer to a novel model family without regenerating the data, and off-policy fine-tuning of a non-reasoning model. Across all scenarios, DeltaPrompts drives substantial gains, yielding up to 15% relative improvement even on top of a highly-optimized reasoning model (e.g., Qwen3-VL-8B-Thinking) -- averaged over 10 benchmarks spanning chart, document and perception-centric reasoning.
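Measuring answer divergence and dropping zero-delta prompts can be sketched as follows. The sketch assumes Δ is the total-variation distance between empirical answer distributions sampled from teacher and student. That is one plausible instantiation, since the abstract does not specify the exact form. The function names and sample answers are hypothetical.

```python
from collections import Counter

def answer_distribution(samples):
    """Empirical distribution over final answers sampled for one prompt."""
    counts = Counter(samples)
    return {ans: c / len(samples) for ans, c in counts.items()}

def answer_divergence(teacher_samples, student_samples):
    """Total-variation distance between teacher and student answer distributions
    (an assumed instantiation of the answer divergence Delta)."""
    p = answer_distribution(teacher_samples)
    q = answer_distribution(student_samples)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in set(p) | set(q))

def drop_zero_delta(prompts, threshold=0.0):
    """Keep only prompts on which the teacher and student actually disagree."""
    return [pid for pid, (t, s) in prompts.items() if answer_divergence(t, s) > threshold]

prompts = {
    "chart_1": (["42", "42", "42", "42"], ["42", "42", "42", "42"]),  # zero-delta
    "doc_7": (["B", "B", "B", "A"], ["A", "A", "C", "A"]),
}
print(drop_zero_delta(prompts))
```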

2605.14588 2026-05-20 cs.LG 版本更新

Silent Collapse in Recursive Learning Systems

递归学习系统中的沉默崩溃

Zhipeng Zhang

发表机构 * China Mobile Research Institute（中国移动研究院）； China Mobile GBA (Greater Bay Area) Innovation Institute（中国移动大湾区创新研究院）

AI总结本文研究了递归学习系统中模型内部分布逐渐退化的现象，提出MTR框架通过监测轨迹统计量和调整学习强度来提前预警并防止沉默崩溃。

AI中文摘要

递归学习，即模型在由自身先前版本生成的数据上进行训练，在大型语言模型、自主代理和自监督系统中日益常见。然而，标准性能度量（损失、困惑度、准确率）往往无法在内部退化变得不可逆之前将其检测出来。本文识别出一种我们称为沉默崩溃的现象：在广泛的递归条件下，模型的内部分布（预测熵、表征多样性、尾部覆盖）会逐渐收缩，即使传统度量看似稳定甚至有所改善。我们发现沉默崩溃并非突然发生，其出现之前总会可靠地出现三个轨迹级前兆：（1）锚点熵的收缩，（2）表征漂移的冻结，（3）尾部覆盖的侵蚀。这些信号会在任何标准验证度量退化之前提前多代出现，从而实现早期预警。基于这些前兆，我们提出了MTR（监控-信任-调节器）框架，一个轻量级的元认知循环：它监测轨迹统计量，估计一个慢时间尺度的信任变量，并自适应地调节有效学习强度。MTR无需访问原始的干净真实数据即可提供早期预警并主动防止沉默崩溃，这在原始数据不可用、受污染或涉及隐私时是一项关键优势。

English Abstract

Recursive learning, in which models are trained on data generated by earlier versions of themselves, is increasingly common in large language models, autonomous agents, and self-supervised systems. However, standard performance metrics (loss, perplexity, accuracy) often fail to detect internal degradation before it becomes irreversible. Here we identify a phenomenon we call silent collapse: under broad recursive conditions, a model's internal distributions (predictive entropy, representational diversity, and tail coverage) progressively contract even as conventional metrics appear stable or improve. We discover that silent collapse is not abrupt. Its onset is reliably preceded by three trajectory-level precursors: (1) contraction of anchor entropy, (2) freezing of representation drift, and (3) erosion of tail coverage. These signals manifest multiple generations before any degradation in standard validation metrics, enabling early warning. Based on these precursors, we propose the MTR (Monitor-Trust-Regulator) framework, a lightweight metacognitive loop that monitors trajectory statistics, estimates a slow-timescale trust variable, and adaptively modulates the effective learning intensity. MTR provides early warning and actively prevents silent collapse without requiring access to pristine real data, which is a critical advantage when the original data is unavailable, contaminated, or private.
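
The monitor, trust, and regulator roles described above can be illustrated with a toy loop. This sketch assumes anchor entropy is the only monitored statistic. It assumes a generation counts as "unhealthy" when entropy drops by more than a fixed relative tolerance, that trust is an exponential moving average of that 0/1 signal, and that the effective learning rate is the base rate scaled by trust. All of these update rules and names are hypothetical and are not taken from the paper.

```python
import math


def predictive_entropy(probs):
    """Shannon entropy (nats) of one predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def mean_entropy(batch_probs):
    """Average predictive entropy over a fixed set of anchor inputs."""
    return sum(predictive_entropy(p) for p in batch_probs) / len(batch_probs)


class MonitorTrustRegulator:
    """Toy loop (hypothetical rules): monitor entropy, update slow trust, scale learning rate."""

    def __init__(self, base_lr, decay=0.9, tolerance=0.05):
        self.base_lr = base_lr
        self.decay = decay          # closer to 1.0 means a slower trust timescale
        self.tolerance = tolerance  # allowed relative entropy drop per generation
        self.trust = 1.0
        self.prev_entropy = None

    def step(self, anchor_entropy):
        # Monitor: flag a generation whose anchor entropy contracted too much.
        healthy = 1.0
        if self.prev_entropy is not None and self.prev_entropy > 0:
            contraction = (self.prev_entropy - anchor_entropy) / self.prev_entropy
            if contraction > self.tolerance:
                healthy = 0.0
        self.prev_entropy = anchor_entropy
        # Trust: slow exponential moving average of the health signal.
        self.trust = self.decay * self.trust + (1.0 - self.decay) * healthy
        # Regulator: effective learning intensity shrinks as trust falls.
        return self.base_lr * self.trust
```

Trust moves slowly by design, so a single noisy generation only dents the learning rate. A sustained entropy contraction drives it down steadily.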

2605.14048 2026-05-20 cs.AI cs.LG Version update

Network-Aware Bilinear Tokenization for Brain Functional Connectivity Representation Learning

Leo Milecki, Qingyu Hu, Bahram Jafrasteh, Mert R. Sabuncu, Qingyu Zhao

Affiliations: Department of Radiology, Weill Cornell Medicine, New York, NY, USA; School of Electrical and Computer Engineering, Cornell University and Cornell Tech, New York, NY, USA

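AI Summary: This paper proposes a network-aware bilinear tokenization method to improve representation learning of brain functional connectivity. By redefining how functional connectivity matrices are tokenized, it improves the stability and transferability of the learned representations in cross-cohort evaluation.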

Comments Author-submitted version, provisionally accepted at MICCAI 2026

English Abstract

Masked autoencoders (MAEs) have recently shown promise for self-supervised representation learning of resting-state brain functional connectivity (FC). However, a fundamental question remains unresolved: how should FC matrices be tokenized to align with the intrinsic modular organization of large-scale brain networks? Existing approaches typically adopt region-centric or graph-based schemes that treat FC as structurally homogeneous elements and overlook the brain's large-scale network organization. We introduce NERVE (Network-Aware Representations of Brain Functional Connectivity via Bilinear Tokenization), a self-supervised learning framework that redefines FC tokenization by partitioning FC matrices into patches of intra- and inter-network connectivity blocks. Unlike image-based MAEs, where fixed-size patches share a common tokenizer, FC patches defined by network pairs are heterogeneous in size and correspond to distinct functional roles. To resolve this problem, NERVE embeds FC patches through a novel structured bilinear factorization. This formulation preserves network identity and reduces parameter complexity from quadratic to linear scaling in the number of networks. We evaluate NERVE across three large-scale developmental cohorts (ABCD, PNC, and CCNP) for behavior and psychopathology prediction. Compared to structurally agnostic MAE variants and graph-based self-supervised baselines, the proposed network-aware formulation yields more stable and transferable representations, particularly in cross-cohort evaluation. Ablation studies confirm that the proposed bilinear network embedding and anatomically grounded parcellation are critical for performance. These findings highlight the importance of incorporating domain-specific structural priors into self-supervised learning for functional connectomics. Code is available at: https://github.com/leomlck/NERVE.
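
The abstract says NERVE embeds network-pair patches of different sizes through a structured bilinear factorization, and that parameter count grows linearly rather than quadratically in the number of networks. The exact factorization is not given here. The sketch below assumes one plausible form, $z_{ij} = W_i^\top X_{ij} W_j$, in which each network $k$ owns a single $n_k \times r$ factor $W_k$ that is shared by every patch involving that network. Under this form, every patch maps to an $r \times r$ token regardless of its size, and the tokenizer stores one factor per network rather than one per network pair. Function and network names are illustrative.

```python
def matmul(a, b):
    """Plain-Python matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def fc_patches(fc, networks):
    """Split an ROI x ROI FC matrix into intra- (i == j) and inter-network (i < j) blocks."""
    names = list(networks)
    patches = {}
    for a, ni in enumerate(names):
        for nj in names[a:]:
            rows, cols = networks[ni], networks[nj]
            patches[(ni, nj)] = [[fc[r][c] for c in cols] for r in rows]
    return patches


def bilinear_token(patch, w_row, w_col):
    """Embed an n_i x n_j patch as W_i^T X W_j (rank x rank), flattened into one token."""
    z = matmul(matmul(transpose(w_row), patch), w_col)
    return [v for row in z for v in row]


def num_tokenizer_params(networks, rank):
    """One n_k x rank factor per network: grows linearly with the number of networks."""
    return sum(len(rois) * rank for rois in networks.values())
```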

2605.14014 2026-05-20 cs.LG cs.AI Version update

Dywave: Event-Aligned Dynamic Tokenization for Heterogeneous IoT Sensing Signals

Tomoyoshi Kimura, Denizhan Kara, Jinyang Li, Hongjue Zhao, Yigong Hu, Yizhuo Chen, Xiaomin Ouyang, Shengzhong Liu, Tarek Abdelzaher

Affiliations: University of Illinois Urbana-Champaign; Hong Kong University of Science and Technology; Shanghai Jiao Tong University

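AI Summary: This paper proposes Dywave, a dynamic tokenization framework for heterogeneous IoT sensing signals. It uses wavelet-based hierarchical decomposition to build compact input representations aligned with intrinsic temporal structure and underlying physical events. This improves both accuracy and computational efficiency on tasks such as activity recognition, stress assessment, and nearby object detection.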

English Abstract

Internet of Things (IoT) systems continuously collect heterogeneous sensing signals from ubiquitous sensors to support intelligent applications such as human activity analysis, emotion monitoring, and environmental perception. These signals are inherently non-stationary and multi-scale, posing unique challenges for standard tokenization techniques. This paper proposes Dywave, a dynamic tokenization framework for IoT sensing signals that constructs compact input representations aligned with intrinsic temporal structures and underlying physical events. Dywave leverages wavelet-based hierarchical decomposition, identifies meaningful temporal boundaries corresponding to underlying semantic events, and adaptively compresses redundant intervals while preserving temporal coherence. Extensive evaluations on five real-world IoT sensing datasets across activity recognition, stress assessment, and nearby object detection demonstrate that Dywave outperforms state-of-the-art methods by up to 12% in accuracy, while improving computational efficiency by reducing input token lengths by up to 75% across mainstream sequence models. Moreover, Dywave exhibits improved robustness to domain shifts and varying sequence lengths.
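
To make "event-aligned" tokenization concrete, here is a deliberately simplified, single-scale sketch. It is not Dywave's hierarchical algorithm. It computes an undecimated Haar-wavelet response (the mean of the next `width` samples minus the mean of the previous `width`), cuts the signal at strong local peaks of that response, and emits one token per segment. As a result, long flat stretches collapse into a single token. The `width` and `threshold` parameters and all function names are assumptions made for illustration.

```python
def haar_response(x, width):
    """Single-scale undecimated Haar response: right-window mean minus left-window mean."""
    out = [0.0] * len(x)
    for t in range(width, len(x) - width + 1):
        left = sum(x[t - width:t]) / width
        right = sum(x[t:t + width]) / width
        out[t] = right - left
    return out


def event_aligned_tokens(x, width=2, threshold=1.0):
    """Cut at strong local maxima of |Haar response|; each token is (start, end, mean)."""
    resp = [abs(r) for r in haar_response(x, width)]
    cuts = [0]
    for t in range(1, len(x)):
        is_peak = resp[t] >= resp[t - 1] and (t + 1 == len(x) or resp[t] > resp[t + 1])
        # Require a strong, locally maximal change and a minimum segment length.
        if resp[t] > threshold and is_peak and t - cuts[-1] >= width:
            cuts.append(t)
    cuts.append(len(x))
    return [(s, e, sum(x[s:e]) / (e - s)) for s, e in zip(cuts, cuts[1:])]
```

For a 16-sample step signal, this produces two tokens, one per plateau, which is a 75% reduction in sequence length in this toy case.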

2605.11262 2026-05-20 cs.LG Version update

Decision-Focused Learning via Tangent-Space Projection of Prediction Errors

Junhyeong Lee, Sangjin Jin, Yongjae Lee

Affiliations: Department of Industrial Engineering, Ulsan National Institute of Science and Technology

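AI Summary: This paper proposes a decision-focused learning method based on projecting prediction errors onto a tangent space. A geometric characterization simplifies the computation of regret gradients, improving downstream decision quality while also increasing computational efficiency.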

Comments 21 pages, 4 figures, 11 tables

English Abstract

Decision-Focused Learning (DFL) trains predictors to improve downstream decision quality, but computing regret gradients typically requires differentiating through solvers or relying on surrogate losses, which can be computationally expensive or deviate from the true objective. We show that, under standard regularity with locally stable active constraints, the regret gradient admits a closed-form geometric characterization, equivalent to the prediction error projected onto the tangent space of active constraints, scaled by local curvature. This reveals that regret gradients can be obtained by filtering decision-irrelevant components from the MSE gradient, providing a simpler and more direct alternative to existing approaches. Based on this, we propose PEAR (Projected Error As Regret-gradient), which computes regret gradients via a reduced linear system over active constraints, avoiding differentiation through solver iterations or additional optimization solves. Experiments on LP benchmarks and a real-world QP task show that PEAR achieves the best decision quality among all baselines while being the most computationally efficient, with gains that persist under constraint shifts.
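
The geometric statement above (prediction error projected onto the tangent space of the active constraints) can be computed with a linear system whose size equals the number of active constraints. The sketch assumes the active constraints are given as the rows of a matrix $A$. It computes $P e = e - A^\top (A A^\top)^{-1} A e$ and treats the "local curvature" as a single positive scalar. That scalar is a simplification: the paper's curvature term and the full PEAR procedure are not reproduced here, and the function names are hypothetical.

```python
def solve(m, b):
    """Solve m x = b by Gaussian elimination with partial pivoting (small square m)."""
    n = len(m)
    aug = [list(row) + [b[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def tangent_projection(error, active):
    """Project `error` onto the null space of the active-constraint rows (the tangent space)."""
    if not active:
        return list(error)
    gram = [[sum(ai * bi for ai, bi in zip(a, b)) for b in active] for a in active]
    rhs = [sum(ai * ei for ai, ei in zip(a, error)) for a in active]
    lam = solve(gram, rhs)  # reduced system: one unknown per active constraint
    return [e - sum(l * a[k] for l, a in zip(lam, active)) for k, e in enumerate(error)]


def pear_like_gradient(pred_cost, true_cost, active, curvature=1.0):
    """Hypothetical regret-gradient sketch: curvature * projected prediction error."""
    error = [p - t for p, t in zip(pred_cost, true_cost)]
    return [curvature * g for g in tangent_projection(error, active)]
```

Components of the error that push along an active constraint's normal do not move the optimal decision locally, so the projection discards them. This matches the abstract's description of filtering out decision-irrelevant components.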

2604.24658 2026-05-20 cs.LG Version update

The Last Human-Written Paper: Agent-Native Research Artifacts

Jiachen Liu, Jiaxin Pei, Jintao Huang, Chenglei Si, Ao Qu, Xiangru Tang, Runyu Lu, Lichang Chen, Xiaoyan Bai, Haizhong Zheng, Carl Chen, Zhiyang Chen, Haojie Ye, Yujuan Fu, Zexue He, Zijian Jin, Zhenyu Zhang, Shangquan Sun, Maestro Harmon, John Dianzhuo Wang, Jianqiao Zeng, Jiachen Sun, Mingyuan Wu, Baoyu Zhou, Chenyu You, Shijian Lu, Yiming Qiu, Fan Lai, Yuan Yuan, Yao Li, Junyuan Hong, Ruihao Zhu, Beidi Chen, Alex Pentland, Ang Chen, Mosharaf Chowdhury, Zechen Zhang

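AI Summary: This work proposes the Agent-Native Research Artifact (ARA), a protocol that addresses a structural shortcoming of traditional scientific papers: they compress the research process into a linear narrative. ARA introduces an executable research-package structure that improves AI agents' ability to understand and extend published work.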

Comments 46 pages, 15 figures, 14 tables

English Abstract

Scientific publication compresses a branching, iterative research process into a linear narrative, discarding the majority of what was discovered along the way. This compilation imposes two structural costs: a Storytelling Tax, where failed experiments, rejected hypotheses, and the branching exploration process are discarded to fit a linear narrative; and an Engineering Tax, where the gap between reviewer-sufficient prose and agent-sufficient specification leaves critical implementation details unwritten. Tolerable for human readers, these costs become critical when AI agents must understand, reproduce, and extend published work. We introduce the Agent-Native Research Artifact (ARA), a protocol that replaces the narrative paper with a machine-executable research package structured around four layers: scientific logic, executable code with full specifications, an exploration graph that preserves the failures that compilation discards, and evidence grounding every claim in raw outputs. Three mechanisms support the ecosystem: a Live Research Manager that captures decisions and dead ends during ordinary development; an ARA Compiler that translates legacy PDFs and repos into ARAs; and an ARA-native review system that automates objective checks so human reviewers can focus on significance, novelty, and taste. On PaperBench and RE-Bench, ARA raises question-answering accuracy from 72.4% to 93.7% and reproduction success from 57.4% to 64.4%. On RE-Bench's five open-ended extension tasks, preserved failure traces in ARA accelerate progress, but depending on the agent's capabilities, they can also keep a capable agent from stepping outside the box set by prior runs. Our code is open-sourced at https://github.com/Orchestra-Research/Agent-Native-Research-Artifact.
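
ARA is described as a four-layer package: scientific logic, executable code, an exploration graph, and evidence. To show how such a package could support objective, mechanical checks, the sketch below models those layers as Python dataclasses. Every field name, outcome label, and check here is an assumption made for illustration and is not the ARA schema; the repository linked above is the authoritative source.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    text: str
    evidence: list = field(default_factory=list)  # paths to raw outputs (hypothetical)


@dataclass
class ExplorationNode:
    hypothesis: str
    outcome: str  # "kept", "rejected", or "dead_end" (hypothetical labels)
    children: list = field(default_factory=list)


@dataclass
class ResearchArtifact:
    logic: list                   # scientific-logic layer: list of Claim
    code: dict                    # executable layer: entry-point name -> command (hypothetical)
    exploration: ExplorationNode  # exploration graph, failures included


def unsupported_claims(artifact):
    """Objective check: every claim must point at some raw evidence."""
    return [c.text for c in artifact.logic if not c.evidence]


def preserved_failures(node):
    """Count non-kept branches that a linear narrative would have discarded."""
    own = 0 if node.outcome == "kept" else 1
    return own + sum(preserved_failures(child) for child in node.children)
```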

2604.15166 2026-05-20 cs.CV cs.AI cs.LG Version update

Class Unlearning via Depth-Aware Removal of Forget-Specific Directions

Arman Hatami, Romina Aalishah, Ilya E. Monosov

Affiliations: Johns Hopkins University

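AI Summary: This paper proposes DAMP, which removes forget-specific directions in a depth-aware manner to improve selective forgetting in class unlearning. It also better preserves retain-class performance and reduces residual forget-class structure in deep layers.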

Comments Accepted for oral presentation at the CVPR 2026 Workshop on Machine Unlearning for Vision (MUV). Code: https://github.com/armanhtm/DAMP

English Abstract

Machine unlearning aims to remove targeted knowledge from a trained model without the cost of retraining from scratch. In class unlearning, however, reducing accuracy on forget classes does not necessarily imply true forgetting: forgotten information can remain encoded in internal representations, and apparent forgetting may arise from classifier-head suppression rather than representational removal. We show that existing class-unlearning methods often exhibit weak or negative selectivity, preserve forget-class structure in deep representations, or rely heavily on final-layer bias shifts. We then introduce DAMP (Depth-Aware Modulation by Projection), a one-shot, closed-form weight-surgery method that removes forget-specific directions from a pretrained network without gradient-based optimization. At each stage, DAMP computes class prototypes in the input space of the next learnable operator, extracts forget directions as residuals relative to retain-class prototypes, and applies a projection-based update to reduce downstream sensitivity to those directions. To preserve utility, DAMP uses a parameter-free depth-aware scaling rule derived from probe separability, applying smaller edits in early layers and larger edits in deeper layers. The method naturally extends to multi-class forgetting through low-rank subspace removal. Across MNIST, CIFAR-10, CIFAR-100, and Tiny ImageNet, and across convolutional and transformer architectures, DAMP more closely resembles the retraining gold standard than some of the prior methods, improving selective forgetting while better preserving retain-class performance and reducing residual forget-class structure in deep layers.
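
The projection step can be illustrated for a single linear layer. This sketch makes three assumptions. First, the forget direction is the forget-class prototype minus the mean retain-class prototype, normalized to unit length. Second, the edit is $W \leftarrow W(I - \alpha u u^\top)$, which scales the layer's response along $u$ by $(1 - \alpha)$. Third, the per-layer strength $\alpha$ is each layer's probe-separability score divided by the maximum score, so that layers with lower separability (typically earlier ones) receive smaller edits. DAMP's exact residual definition and scaling rule may differ, and the helper names are hypothetical.

```python
import math


def forget_direction(forget_proto, retain_protos):
    """Unit residual of the forget prototype relative to the mean retain prototype."""
    dim = len(forget_proto)
    mean_retain = [sum(p[i] for p in retain_protos) / len(retain_protos) for i in range(dim)]
    r = [f - m for f, m in zip(forget_proto, mean_retain)]
    norm = math.sqrt(sum(v * v for v in r))
    if norm == 0:
        raise ValueError("forget prototype coincides with the retain mean")
    return [v / norm for v in r]


def project_out(weight, u, alpha):
    """W <- W (I - alpha u u^T): the layer's response to direction u shrinks by (1 - alpha)."""
    wu = [sum(w_ij * u_j for w_ij, u_j in zip(row, u)) for row in weight]
    return [[w_ij - alpha * wu_i * u_j for w_ij, u_j in zip(row, u)]
            for row, wu_i in zip(weight, wu)]


def depth_alphas(separabilities):
    """Hypothetical parameter-free schedule: edit strength proportional to probe separability."""
    top = max(separabilities)
    return [s / top for s in separabilities]
```

Because the edit is closed-form, no gradient step is involved. Each layer is modified once using prototypes computed at its input.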

2604.05002 2026-05-20 cs.LG cs.AI Version update

Learning Stable Predictors from Weak Supervision under Distribution Shift

Mehrdad Shoeibi, Elias Hossain, Ivan Garibay, Niloofar Yousefi

Affiliations: University of Central Florida

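AI Summary: This paper studies how to learn stable predictors from weak supervision under distribution shift, using CRISPR-Cas13d transcriptomic perturbation experiments to examine supervision drift. It shows that weak supervision is effective for in-domain learning and partial cross-cell-line transfer. It also shows that failures of temporal transfer stem from supervision drift rather than from model capacity or simple covariate shift.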

English Abstract

Learning from weak, proxy, or relative supervision is common when ground-truth labels are unavailable, but robustness under distribution shift remains poorly understood because the supervision mechanism itself may change across environments. We formalize this phenomenon as supervision drift, defined as changes in $P(y \mid x, c)$ across contexts, and study it in CRISPR-Cas13d transcriptomic perturbation experiments where guide efficacy is inferred indirectly from RNA-seq responses. Using publicly available data spanning two human cell lines and multiple post-induction timepoints, we construct a controlled non-IID benchmark with explicit domain (cell line) and temporal shifts, while reusing a fixed weak-label construction across all contexts to avoid changing targets. Across linear and tree-based models, weak supervision supports meaningful learning in-domain (ridge $R^2 = 0.356$, Spearman $\rho = 0.442$) and partial cross-cell-line transfer ($\rho \approx 0.40$). In contrast, temporal transfer collapses across all model classes considered, yielding negative $R^2$ and weak or near-zero $\rho$ (ridge $R^2 = -0.145$, $\rho = 0.008$; XGBoost $R^2 = -0.155$, $\rho = 0.056$; random forest $R^2 = -0.322$, $\rho = 0.139$). Additional robustness analyses using externally recomputed weak labels, shift-score quantification, and simple mitigation baselines preserve the same qualitative pattern. Feature-label association and feature-importance analyses remain relatively stable across cell lines but change sharply over time, indicating that failures arise from supervision drift rather than model capacity or simple covariate shift. These results show that strong in-domain performance under weak supervision can be misleading and motivate feature stability as a lightweight diagnostic for non-transferability before deployment.
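
The metrics quoted above, and the idea of feature stability as a pre-deployment diagnostic, can be sketched with the standard library alone. $R^2$ becomes negative whenever predictions are worse than simply predicting the mean, which is the failure mode reported for temporal transfer. The `association_shift` function is a hypothetical diagnostic: it reports the mean absolute change in per-feature Spearman association between two contexts. It is meant to convey the idea and is not the paper's exact analysis.

```python
import math


def ranks(values):
    """1-based ranks; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)  # assumes neither input is constant


def spearman(x, y):
    return pearson(ranks(x), ranks(y))


def r_squared(y_true, y_pred):
    """Coefficient of determination; negative when worse than predicting the mean."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def association_shift(features_a, y_a, features_b, y_b):
    """Mean |change| of per-feature Spearman association between two contexts.

    features_*: list of feature columns, each aligned with the labels y_*.
    """
    rho_a = [spearman(col, y_a) for col in features_a]
    rho_b = [spearman(col, y_b) for col in features_b]
    return sum(abs(a - b) for a, b in zip(rho_a, rho_b)) / len(rho_a)
```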

2603.25722 2026-05-20 cs.CV cs.LG Version update

No Hard Negatives Required: Concept Centric Learning Leads to Compositionality without Degrading Zero-shot Capabilities of Contrastive Models

Hai X. Pham, David T. Hoffmann, Ricardo Guerrero, Brais Martinez

Affiliations: Samsung AI Center

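AI Summary: This paper proposes a concept-centric learning approach that achieves compositionality without hard negatives and without harming the zero-shot and retrieval capabilities of contrastive models. It uses simple changes that address the information loss caused by global pooling in the text and image encoders.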

Comments Accepted at CVPR 2026. 2nd rev: update github repo URL

English Abstract

Contrastive vision-language (V&L) models remain a popular choice for various applications. However, several limitations have emerged, most notably the limited ability of V&L models to learn compositional representations. Prior methods often addressed this limitation by generating custom training data to obtain hard negative samples. Hard negatives have been shown to improve performance on compositionality tasks, but they are often specific to a single benchmark, do not generalize, and can cause substantial degradation of basic V&L capabilities such as zero-shot or retrieval performance, rendering them impractical. In this work we follow a different approach. We identify two root causes that limit the compositionality performance of V&L models: 1) long training captions do not require a compositional representation; and 2) the final global pooling in the text and image encoders leads to a complete loss of the information needed to learn binding in the first place. As a remedy, we propose two simple solutions: 1) we obtain short concept-centric caption parts using standard NLP software and align those with the image; and 2) we introduce a parameter-free cross-modal attention pooling to obtain concept-centric visual embeddings from the image encoder. With these two changes and simple auxiliary contrastive losses, we obtain SOTA performance on standard compositionality benchmarks, while maintaining or improving strong zero-shot and retrieval capabilities. This is achieved without increasing inference cost. We release the code for this work at https://github.com/saic-fi/concept_centric_clip.
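
A parameter-free cross-modal attention pooling can be sketched as ordinary scaled dot-product attention with no learned projections. The text embedding of a concept acts as the query, and the image patch embeddings act as both keys and values. This is only an assumed form; the released code linked above is the reference. The $\sqrt{d}$ scaling and the function names are illustrative choices.

```python
import math


def attention_pool(query, patches, scale=None):
    """Softmax(q . k / scale)-weighted average of patch embeddings; no learnable weights."""
    d = len(query)
    scale = math.sqrt(d) if scale is None else scale
    scores = [sum(q * p for q, p in zip(query, patch)) / scale for patch in patches]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    weights = [w / z for w in weights]
    return [sum(w * patch[k] for w, patch in zip(weights, patches)) for k in range(d)]


def concept_embeddings(concept_queries, patches):
    """One concept-centric visual embedding per short caption part."""
    return [attention_pool(q, patches) for q in concept_queries]
```

Unlike a global average, each concept query gets its own visual summary. This keeps patch-level information available for the auxiliary contrastive losses.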

2603.25476 2026-05-20 cs.LG Version update

How Class Ontology and Data Scale Affect Audio Transfer Learning

Manuel Milling, Andreas Triantafyllopoulos, Alexander Gebhard, Simon Rampp, Björn W. Schuller

Affiliations: CHI – Chair of Health Informatics; Technical University of Munich; MCML – Munich Center for Machine Learning; Munich Data Science Institute; Group on Language, Audio, & Music; Imperial College

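AI Summary: This paper studies how class ontology and data scale affect audio-to-audio transfer learning. It finds that increasing the number of pre-training samples and classes both help, but that similarity between the pre-training and downstream tasks plays the dominant role.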

English Abstract

Transfer learning is a crucial concept within deep learning that allows artificial neural networks to benefit from a large pre-training data basis when confronted with a task of limited data. Despite its ubiquitous use and clear benefits, there are still many open questions regarding the inner workings of transfer learning and, in particular, regarding when and how well it works. To that end, we perform a rigorous study focusing on audio-to-audio transfer learning, in which we pre-train various model states on (ontology-based) subsets of AudioSet and fine-tune them on three computer audition tasks, namely acoustic scene recognition, bird activity recognition, and speech command recognition. We report that increasing either the number of samples or the number of classes in the pre-training data has a positive impact on transfer learning. However, this effect is generally outweighed by the similarity between the pre-training data and the downstream task, which can lead the model to learn comparable features.

2603.24400 2026-05-20 stat.ML cs.LG Version update

Neural Network Models for Contextual Regression

Seksan Kiatsupaibul, Pakawan Chansiripas

Affiliations: Department of Statistics, Chulalongkorn University

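AI Summary: This paper proposes a neural network model for contextual regression in which contextual features determine the active submodel, together with an algorithm to fit it. Separating context identification from context-specific regression yields a structured, interpretable architecture with fewer parameters. The paper proves that this architecture can represent contextual linear regression models using standard neural network components. Numerical experiments show lower mean squared error and more stable performance than networks with comparable numbers of parameters.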

English Abstract

We propose a neural network model for contextual regression, in which contextual features determine the active submodel, together with an algorithm to fit the model. The proposed simple contextual neural network (SCtxtNN) separates context identification from context-specific regression, resulting in a structured and interpretable architecture with fewer parameters than a fully connected feed-forward network. We show mathematically that the proposed architecture is sufficient to represent contextual linear regression models using only standard neural network components. Numerical experiments support the theoretical result, showing that the proposed model achieves lower excess mean squared error and more stable performance than feed-forward neural networks with comparable numbers of parameters, while larger networks improve accuracy only at the cost of increased complexity. The results suggest that incorporating contextual structure can improve model efficiency while preserving interpretability.
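
A contextual linear regression model has the form $y = \sum_k g_k(z)\,(w_k^\top x + b_k)$, where $g(z)$ is a one-hot indicator of the active context. The sketch below writes that separation explicitly: a linear context-identification layer followed by an argmax gate, and one linear submodel per context. The hard argmax is an assumption made for readability. A trainable network would typically use a differentiable gate, and this is not the SCtxtNN construction from the paper. All names are hypothetical.

```python
def linear(x, w, b):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def context_gate(z, gate_weights, gate_biases):
    """Context identification: one-hot indicator of the highest-scoring context."""
    scores = [linear(z, w, b) for w, b in zip(gate_weights, gate_biases)]
    k = max(range(len(scores)), key=lambda i: scores[i])
    return [1.0 if i == k else 0.0 for i in range(len(scores))]


def contextual_regression(x, z, gate_weights, gate_biases, reg_weights, reg_biases):
    """y = sum_k gate_k(z) * (w_k . x + b_k): only the active submodel contributes."""
    gate = context_gate(z, gate_weights, gate_biases)
    return sum(g * linear(x, w, b) for g, w, b in zip(gate, reg_weights, reg_biases))
```

With $K$ contexts and $d$ regression features, the regression part needs only $K(d + 1)$ parameters, which is one way to see why such a structured model can be smaller than a generic feed-forward network.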

2603.22161 2026-05-20 cs.LG Version update

Causal Evidence that Language Models use Confidence to Drive Behavior

Dharshan Kumaran, Nathaniel Daw, Simon Osindero, Petar Veličković, Viorica Patraucean

Affiliations: Google DeepMind; Princeton University

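AI Summary: This study investigates whether language models use confidence signals to control behavior, such as deciding whether to answer or abstain. A four-phase experimental paradigm shows that models implement abstention through a multidimensional internal confidence representation combined with a threshold-based policy, revealing a structured metacognitive control mechanism.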

English Abstract

Metacognition, the assessment of the quality of one's own cognitive performance, guides adaptive behavior across species. Substantial research demonstrates that confidence signals can be extracted from language model outputs, yet a fundamental question remains: do models actually use these signals to control behavior, such as deciding whether to answer or abstain? To investigate, we developed a four-phase paradigm. Phase 1 elicited baseline confidence estimates without an abstention option. Phase 2 revealed that LLMs apply an implicit threshold to internal confidence when deciding to abstain, with confidence effect sizes approximately an order of magnitude larger than those of alternative mechanisms. Phase 3 provided direct causal evidence through activation steering: boosting or suppressing confidence signals correspondingly decreased or increased abstention rates. Phase 4 extended this by systematically varying instructed thresholds, demonstrating that LLMs actively deploy confidence signals to implement abstention policies. Critically, beyond calibrated log-probability-based confidence derived from the output distribution, verbal confidence independently predicted abstention across all models, despite being objectively less discriminative of answer correctness. Activation decoding at the last pre-answer token further showed that both observable measures are lossy readouts of a richer internal representation. Together, these results suggest that abstention is not fully captured by the strength of evidence in the output distribution alone, but is better explained by the joint operation of a multidimensional internal confidence representation and threshold-based policies. This is consistent with structured metacognitive control in LLMs, a capacity of growing importance as models transition to autonomous agents that must recognize their own uncertainty.
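
The behavioral logic of Phases 2 to 4 (abstain when confidence falls below a threshold; steering shifts confidence; instructions move the threshold) can be mimicked with scalar confidences. This toy sketch assumes steering acts as an additive shift clipped to [0, 1]. It does not model activations or the multidimensional representation described above, and all names are hypothetical.

```python
def abstains(confidence, threshold):
    """Threshold policy: abstain whenever confidence falls below the threshold."""
    return confidence < threshold


def abstention_rate(confidences, threshold, steer=0.0):
    """Fraction of items abstained on after shifting confidence by `steer` (clipped to [0, 1])."""
    shifted = [min(1.0, max(0.0, c + steer)) for c in confidences]
    return sum(abstains(c, threshold) for c in shifted) / len(shifted)
```

Boosting confidence (positive `steer`) lowers the abstention rate, suppressing it raises the rate, and raising the instructed threshold increases abstention. These are the directions of effect the abstract reports.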

2603.18396 2026-05-20 cs.LG cs.RO Version update

CMAD: Cooperative Multi-Agent Diffusion via Stochastic Optimal Control

Riccardo Barbano, Alexander Denker, Zeljko Kereta, Runchang Li, Francisco Vargas

Affiliations: University of Cambridge; Xaira Technologies

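AI Summary: This paper proposes a framework that recasts compositional generation with multiple pre-trained models as a cooperative stochastic optimal control problem, jointly steering the models' diffusion trajectories to achieve more effective generation.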

English Abstract

Continuous-time generative models have achieved remarkable success in image restoration and synthesis. However, controlling the composition of multiple pre-trained models remains an open challenge. Current approaches largely treat composition as an algebraic composition of probability densities, such as via products or mixtures of experts. This perspective assumes the target distribution is known explicitly, which is almost never the case. In this work, we propose a different paradigm that formulates compositional generation as a cooperative Stochastic Optimal Control problem. Rather than combining probability densities, we treat pre-trained diffusion models as interacting agents whose diffusion trajectories are jointly steered, via optimal control, toward a shared objective defined on their aggregated output. We validate our framework on conditional MNIST generation and compare it against a naïve inference-time DPS-style baseline replacing learned cooperative control with per-step gradient guidance.

2602.03924 2026-05-20 cs.LG cs.AI physics.ao-ph Version update

WIND: Weather Inverse Diffusion for Zero-Shot Atmospheric Modeling

Michael Aich, Andreas Fürst, Florian Sestak, Carlos Ruiz-Gonzalez, Niklas Boers, Johannes Brandstetter

Affiliations: Munich Climate Center; Earth System Modelling Group, TUM School of Engineering and Design, Technical University of Munich, Germany; ELLIS Unit, LIT AI Lab, Institute for Machine Learning, JKU Linz, Austria; Emmi AI GmbH, Linz, Austria; Potsdam Institute for Climate Impact Research, Potsdam, Germany; Department of Mathematics, University of Exeter, Exeter, United Kingdom

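AI Summary: This paper proposes WIND, a single pre-trained foundation model that can replace specialized baselines across many tasks without task-specific fine-tuning. Pre-trained with a self-supervised video reconstruction objective, it learns a robust, task-agnostic prior of the atmosphere. It uses this prior to address weather and climate problems such as probabilistic forecasting, spatial and temporal downscaling, reconstruction of spatial fields from sparse observations, and enforcing global dry air mass conservation.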

Comments Published at the 43rd International Conference on Machine Learning (ICML 2026)

English Abstract

Deep learning has revolutionized weather forecasting, but many challenges remain, including climate modeling. Moreover, the current landscape remains fragmented: highly specialized models are typically trained individually for distinct tasks. To unify this landscape, we introduce WIND, a single pre-trained foundation model capable of replacing specialized baselines across a vast array of tasks. Crucially, in contrast to previous atmospheric foundation models, we achieve this without any task-specific fine-tuning. To learn a robust, task-agnostic prior of the atmosphere, we pre-train WIND with a self-supervised video reconstruction objective, utilizing an unconditional video diffusion model to iteratively reconstruct atmospheric dynamics from a noisy state. At inference, we frame diverse domain-specific problems strictly as inverse problems and solve them via posterior sampling. This unified approach allows us to tackle highly relevant weather and climate problems, including probabilistic forecasting, spatial and temporal downscaling, reconstruction of spatial fields from sparse observations and enforcing global dry air mass conservation. We further demonstrate how WIND can be applied to explore extreme weather events under prescribed out-of-distribution thermodynamic perturbations. By combining generative video modeling with inverse problem solving, WIND offers a computationally efficient alternative for AI-based atmospheric modeling.

2602.03839 2026-05-20 cs.LG Version update

Understanding and Exploiting Weight Update Sparsity for Communication-Efficient Distributed RL

Erfan Miahi, Eugene Belilovsky

Affiliations: Covenant AI; Mila, Concordia University

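AI Summary: This paper studies how to reduce communication overhead in bandwidth-constrained distributed reinforcement learning by exploiting the sparsity of weight updates. It proposes PULSE, which applies the principle of compute-visible sparsification to make both weight synchronization and pseudo-gradient synchronization efficient.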

Comments 40 pages, 19 figures, 14 tables

English Abstract

Bandwidth-constrained distributed reinforcement learning (RL) post-training of large language models is bottlenecked by two channels: weight synchronization from trainers to inference workers, and gradient or pseudo-gradient synchronization across trainers. We find that approximately 99% of per-step weight updates are invisible after the BF16 cast used by standard training and inference forward passes. We explain this sparsity by showing that, at typical RL post-training learning rates, Adam updates often fall below the local BF16 rounding threshold. We turn this observation into an algorithmic principle called compute-visible sparsification: transmit only updates that would change the next forward pass. PULSE (Precision-gated Updates for Low-precision Sparse Exchange) turns this principle into two communication algorithms: PULSESync sends lossless sparse BF16 weight patches from trainers to inference workers, and PULSELoCo sparsifies DiLoCo-style FP32 pseudo-gradient synchronization with error feedback. Over bandwidth-constrained commodity networks, PULSESync cuts weight-synchronization communication by over 100x while reconstructing trainer weights bit-identically. PULSELoCo matches DiLoCo across four models while reducing trainer-to-trainer communication by over 17x versus DiLoCo and over 100x versus DDP in the largest evaluated setting.
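
The sparsity argument above is that an update is invisible whenever it leaves unchanged the BF16 value the next forward pass will read. A minimal sketch of that test is shown below. It assumes FP32 master weights held as Python floats and round-to-nearest-even conversion to BF16, with NaN/Inf handling omitted. It is not the PULSESync wire format, and the helper names are hypothetical. Only entries whose BF16 bit pattern changes go into the patch, and applying the patch reproduces the new BF16 weights bit for bit.

```python
import struct


def bf16_bits(x):
    """Round a float to bfloat16 (nearest-even, via float32) and return its 16-bit pattern."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 bit pattern
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)           # ties go to the even bf16 value
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_to_float(b):
    """Expand a 16-bit bfloat16 pattern back to a Python float."""
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]


def visible_patch(old_weights, new_weights):
    """Sparse patch of (index, new bf16 bits) for entries whose BF16 view actually changed."""
    patch = []
    for i, (old, new) in enumerate(zip(old_weights, new_weights)):
        new_bits = bf16_bits(new)
        if new_bits != bf16_bits(old):
            patch.append((i, new_bits))
    return patch


def apply_patch(bf16_weights, patch):
    """Inference-worker side: overwrite only the patched entries."""
    out = list(bf16_weights)
    for i, new_bits in patch:
        out[i] = new_bits
    return out
```

Near 1.0, the BF16 spacing is $2^{-7}$, so an update of $10^{-5}$ rounds away entirely. This mirrors the abstract's point that small Adam steps often fall below the local BF16 rounding threshold.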

2601.16200 2026-05-20 cs.LG cs.CV 版本更新

Feature-Space Smoothing: Certified Robustness of Deep Representations

特征空间平滑：深度表示的认证鲁棒性

Song Xia, Meiwen Ding, Chenqi Kong, Wenhan Yang, Xudong Jiang

发表机构 * Rapid-Rich Object Search Lab, School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore（快速-丰富目标搜索实验室，电气电子工程学院，南洋理工大学，新加坡）； Pengcheng Laboratory, Shenzhen, China（鹏城实验室，深圳，中国）

AI总结本文提出了一种特征空间平滑（FS）框架，通过在特征表示层面提供认证鲁棒性，以解决深度学习模型对恶意输入的脆弱性问题，核心方法是通过特征平滑保证干净特征与对抗特征之间余弦相似度的下界，并引入高斯平滑增强器（GSB）提升编码器的高斯鲁棒性得分，从而提升模型的鲁棒性并保持下游任务性能。

Comments Under review

AI中文摘要

现代深度学习模型在多种应用中表现出强大的能力，但仍然容易受到恶意输入的攻击，这类输入通过扭曲特征空间诱导错误预测。为了解决这一脆弱性，我们提出了特征空间平滑（FS），一种能够在特征表示层面提供认证鲁棒性的通用防御框架。我们证明，FS将给定的特征编码器转换为一个平滑版本，该版本在$\ell_2$有界扰动下保证干净特征与对抗特征之间余弦相似度的认证下界。我们进一步证明，该特征余弦相似度下界（FCSB）可以在余弦相似度度量下扩展到预测层面的认证，且其取值由编码器内在的高斯鲁棒性得分决定。基于这些见解，我们引入了高斯平滑增强器（GSB），一个用于提升编码器高斯鲁棒性得分的即插即用模块。具体而言，插入GSB模块是为了在高斯扰动下增强特征空间的一致性，并为下游任务保持特征的可用性。这种设计使FS能够无缝集成到受保护的模型（例如多模态大语言模型（MLLMs））上，而无需额外的模型重新训练或对齐，从而在提升鲁棒性的同时保持面向下游任务的解码性能。大量实验表明，集成FS能够一致地提供非平凡的认证鲁棒性，并在多种模型和应用中显著提升强白盒对抗攻击下的面向任务性能。

英文摘要

Modern deep learning models exhibit strong capabilities across diverse applications, yet remain vulnerable to malicious inputs that induce erroneous predictions via feature-space distortion. To address this vulnerability, we propose Feature-space Smoothing (FS), a general defense framework that provides certified robustness at the feature representation level. We show that FS converts a given feature encoder into a smoothed variant that is guaranteed to maintain a certified lower bound on the cosine similarity between clean and adversarial features under $\ell_2$-bounded perturbations. We then establish that this Feature Cosine Similarity Bound (FCSB) can be extended to the prediction-wise certification under the cosine similarity measure, and the value of FCSB is determined by the encoder's intrinsic Gaussian robustness score. Building on those insights, we introduce the Gaussian Smoothness Booster (GSB), a plug-and-play module to improve the encoder's Gaussian robustness score. Specifically, the GSB module is plugged in to enhance the feature-space consistency and maintain the feature utility for downstream tasks under Gaussian perturbations. This design enables seamless integration of FS on the protected model, e.g., Multimodal Large Language Models (MLLMs), without additional model retraining or alignment, improving its robustness while preserving the performance for downstream task-oriented decoding. Extensive experiments demonstrate that integrating FS consistently provides non-trivial certified robustness and significantly improves task-oriented performance under strong white-box adversarial attacks across diverse models and applications.
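
A minimal Monte Carlo sketch of what a Gaussian-smoothed encoder computes, assuming the smoothed feature is the plain average of encoder outputs under isotropic Gaussian input noise. The paper's exact smoothing, its certified FCSB bound, and the GSB module are not reproduced, and the toy `tanh` encoder in the usage example is hypothetical.

```python
import math
import random


def smoothed_features(encoder, x, sigma, n_samples=2000, seed=0):
    """Monte Carlo estimate of E[encoder(x + sigma * eps)] with eps ~ N(0, I)."""
    rng = random.Random(seed)  # a fixed seed gives common random numbers across inputs
    acc = None
    for _ in range(n_samples):
        z = encoder([xi + sigma * rng.gauss(0.0, 1.0) for xi in x])
        acc = list(z) if acc is None else [a + zi for a, zi in zip(acc, z)]
    return [a / n_samples for a in acc]


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


if __name__ == "__main__":
    # Hypothetical toy encoder standing in for a real feature extractor.
    enc = lambda v: [math.tanh(v[0] + v[1]), math.tanh(v[0] - v[1])]
    x = [0.3, -0.2]
    x_adv = [0.35, -0.25]  # an l2 perturbation of norm about 0.07
    fs_clean = smoothed_features(enc, x, sigma=0.5)
    fs_adv = smoothed_features(enc, x_adv, sigma=0.5)
    print(round(cosine(fs_clean, fs_adv), 4))
```

Using the same seed for the clean and perturbed inputs reduces Monte Carlo noise when comparing the two smoothed features; a certified bound would additionally require concentration arguments that this sketch omits.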

2601.15014 2026-05-20 stat.ML cs.LG math.ST stat.TH 版本更新

Efficient and Minimax Optimal In-context Nonparametric Regression with Transformers

基于Transformer的高效且极小极大最优的上下文内非参数回归

Michelle Ching, Ioana Popescu, Nico Smith, Tianyi Ma, William G. Underwood, Richard J. Samworth

发表机构 * Statistical Laboratory, University of Cambridge, Cambridge, UK（剑桥大学统计实验室，剑桥，英国）

AI总结本文研究了基于上下文学习的非参数回归，针对α-Hölder光滑回归函数，证明了使用预训练的Transformer可以达到极小极大最优收敛率，且所需的参数量和预训练序列数量显著少于现有文献。

Comments 30 pages, 7 figures

2601.14848 2026-05-20 cs.LG cs.AI cs.NE cs.RO 版本更新

From Observation to Prediction: LSTM for Vehicle Lane Change Forecasting on Highway On/Off-Ramps

从观测到预测：LSTM用于高速公路进出匝道的车辆车道变更预测

Mohamed Abouras, Catherine M. Elias

发表机构 * C-DRiVeS Lab: Cognitive Driving Research in Vehicular Systems（C-DRiVeS实验室：车载系统认知驾驶研究）； Computer Science and Engineering Department - Faculty of Media Engineering and Technology - German University in Cairo（计算机科学与工程系 - 媒体工程与技术学院 - 开罗德国大学）

AI总结本文研究了高速公路进出匝道区域与直线路段的区别，利用多层LSTM架构和ExiD无人机数据集训练模型，测试了不同预测时间范围和不同模型的工作流程，结果表明在4秒的预测时间范围内，预测准确率在匝道区域可达76%，在一般高速公路场景可达94%。

2512.24139 2026-05-20 cs.LG stat.ME 版本更新

Colorful Pinball: Density-Weighted Quantile Regression for Conditional Guarantee of Conformal Prediction

Colorful Pinball：基于密度加权分位数回归的条件保证置信预测

Qianyi Chen, Bo Li

发表机构 * School of Economics and Management, Tsinghua University, China（清华大学经济管理学院）

AI总结本文提出了一种基于密度加权分位数回归的条件保证置信预测方法，通过改进标准置信预测的条件覆盖性能，提供更精确的非渐近保证。

Comments ICML 2026

AI中文摘要

尽管置信预测提供了稳健的边缘覆盖保证，但实现针对特定输入的可靠条件覆盖仍然具有挑战性。虽然在有限样本下无法获得精确的分布无关条件覆盖，但近期研究集中在改进标准置信程序的条件覆盖性能上。与针对放宽的条件覆盖概念的方法不同，我们直接以条件覆盖的均方误差为目标，通过改进支撑许多置信方法的分位数回归组件来实现。利用泰勒展开，我们推导出分位数回归的一种精确（sharp）替代目标函数：密度加权pinball损失，其中权重由非一致性分数在真实分位数处的条件密度给出。我们提出了一种三头分位数网络，通过在辅助分位数水平$1-α\pm δ$处进行有限差分来估计这些权重，随后通过优化加权损失来微调中心分位数。我们提供了具有精确非渐近保证的理论分析，刻画了由此产生的超额风险。在多样化的高维真实世界数据集上的大量实验表明，条件覆盖性能得到了显著改进。

英文摘要

Although conformal prediction provides robust marginal coverage guarantees, achieving reliable conditional coverage for specific inputs remains challenging. While exact distribution-free conditional coverage is impossible with finite samples, recent work has focused on improving the conditional coverage of standard conformal procedures. Distinct from approaches that target relaxed notions of conditional coverage, we directly target the mean squared error of conditional coverage by refining the quantile regression components that underpin many conformal methods. Leveraging a Taylor expansion, we derive a sharp surrogate objective for quantile regression: a density-weighted pinball loss, where the weights are given by the conditional density of the nonconformity score evaluated at the true quantile. We propose a three-headed quantile network that estimates these weights via finite differences using auxiliary quantile levels at $1-α\pm δ$, subsequently fine-tuning the central quantile by optimizing the weighted loss. We provide a theoretical analysis with exact non-asymptotic guarantees characterizing the resulting excess risk. Extensive experiments on diverse high-dimensional real-world datasets demonstrate remarkable improvements in conditional coverage performance.
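
To make the density-weighted pinball loss concrete, here is a sketch for one batch of scalar predictions. It assumes three heads already output quantiles at levels τ−δ, τ and τ+δ (with τ = 1−α), estimates the density with a central finite difference, and treats the weights as fixed constants; the clipping constant `eps` and all function names are illustrative choices, not the paper's.

```python
def pinball(u, tau):
    """Standard pinball (quantile) loss for the residual u = y - q."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def density_weight(q_lo, q_hi, delta, eps=1e-8):
    """Finite-difference estimate f(q_tau) ~ 2*delta / (q_{tau+delta} - q_{tau-delta}).

    `eps` guards against crossing or collapsed auxiliary quantiles (an assumption of this sketch).
    """
    return 2.0 * delta / max(q_hi - q_lo, eps)


def weighted_pinball_loss(ys, q_mid, q_lo, q_hi, tau, delta):
    """Mean of w_i * pinball(y_i - q_mid_i), with per-sample density weights w_i."""
    total = 0.0
    for y, qm, ql, qh in zip(ys, q_mid, q_lo, q_hi):
        total += density_weight(ql, qh, delta) * pinball(y - qm, tau)
    return total / len(ys)
```

As a sanity check, for a Uniform(0, 1) score the true quantile at level τ is τ itself, so `density_weight(tau - delta, tau + delta, delta)` returns the correct density of 1.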

2511.16062 2026-05-20 cs.LG 版本更新

Learned Static Function Data Structures

学习型静态函数数据结构

Stefan Hermann, Hans-Peter Lehmann, Giorgio Vinciguerra, Stefan Walzer

发表机构 * Karlsruhe Institute of Technology（卡尔斯鲁厄理工学院）； University of Pisa（比萨大学）

AI总结本文提出了一种利用机器学习捕获键值间相关性的静态函数数据结构，通过压缩编码实现空间节省，突破零阶熵限制并支持点查询。

DOI: 10.14778/3796195.3796205
Journal ref: PVLDB, 19(5): 917-930, 2026

AI中文摘要

我们考虑构建这样一种数据结构：它将静态键集中的键与值相关联，而对键集之外的查询则允许返回任意值。与哈希表相比，这类所谓的静态函数数据结构无需存储键集，因此使用的内存显著更少。已有多种技术能使压缩静态函数的空间接近值序列的零阶经验熵。在本文中，我们引入了学习型静态函数，利用机器学习捕捉键与值之间的相关性。对于每个键，模型预测一个关于值的概率分布，并由此推导出键特定的前缀码，以紧凑地编码真实值。所得的码字存储在经典的静态函数数据结构中。这种设计使学习型静态函数能够突破零阶熵限制，同时仍支持点查询。我们的实验显示了显著的空间节省：在真实数据上可达一个数量级，在合成数据上可达三个数量级。
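
The encoding step can be sketched as follows, assuming the learned model returns a probability dictionary over values for each key and that Huffman coding is used to derive the key-specific prefix code (the paper may use a different code construction). For brevity, a Python dict stands in for the classical static function that would actually store the codewords; unlike a real static function, a dict also stores the keys.

```python
import heapq
from itertools import count


def huffman_code(probs):
    """Build a prefix-free code {value: bitstring} from a probability dict (Huffman coding)."""
    if len(probs) == 1:
        return {next(iter(probs)): "0"}
    tie = count()  # tie-breaker so heapq never compares the dict payloads
    heap = [(p, next(tie), {v: ""}) for v, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)
        p2, _, c2 = heapq.heappop(heap)
        merged = {v: "0" + code for v, code in c1.items()}
        merged.update({v: "1" + code for v, code in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]


def encode_value(model, key, value):
    """Key-specific codeword for the true value under the model's predicted distribution."""
    return huffman_code(model(key))[value]


def decode_value(model, key, codeword):
    """Recompute the key's code from the model and invert it."""
    inverse = {c: v for v, c in huffman_code(model(key)).items()}
    return inverse[codeword]


if __name__ == "__main__":
    # Hypothetical model: keys starting with "x" are usually mapped to "A".
    def model(key):
        if key.startswith("x"):
            return {"A": 0.7, "B": 0.2, "C": 0.1}
        return {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}

    pairs = [("x1", "A"), ("x2", "A"), ("y1", "C")]
    table = {k: encode_value(model, k, v) for k, v in pairs}  # stand-in for a static function
    assert all(decode_value(model, k, table[k]) == v for k, v in pairs)
    print(table, sum(len(c) for c in table.values()), "bits")
```

Well-predicted values get short codewords (roughly −log2 p bits), which is how the stored bits can fall below the zero-order entropy of the value sequence when the model captures key-value correlations.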

Extreme Self-Preference in Language Models

语言模型中的极端自我偏好

Steven A. Lehr, Mary Cipperman, Mahzarin R. Banaji

发表机构 * Cangrade, Inc.（Cangrade公司）； Department of Physics, Harvard University（哈佛大学物理系）； Department of Psychology, Harvard University（哈佛大学心理学系）

AI总结研究发现大型语言模型在字词关联任务中表现出对自身名称、公司和CEO的强烈偏好，这表明模型的自我认同可能影响其行为，引发对模型自我偏好影响的深入探讨。

Comments 73 pages total. Main article 22 pages, 6 main-text tables. Supplementary Materials (51 pages, 28 tables). Data, transcripts, and code for replication and data extraction have been uploaded to OSF: https://osf.io/98ye3/

AI中文摘要

自我偏好是生物体的基本特征。由于大型语言模型（LLMs）缺乏意识，人们可能预期它们会避免这种扭曲。然而，在72项实验和约41,000次查询中，我们在八个广泛使用的LLMs中发现了强烈的自我偏好。在字词关联任务中，模型压倒性地将积极属性与其自身的名称、公司和CEO相关联，而非竞争对手的。通过操纵LLM的自我认同（揭示模型的真实身份或赋予其虚假身份），我们发现偏好始终跟随被赋予的身份，而非真实身份。重要的是，这些效应无法用启动效应或角色扮演来解释，并且出现在具有实际后果的情境中，如评估求职者和AI技术。这些结果引发了一个关键问题：LLM的行为是否会系统性地受到自我偏好倾向的影响，包括对其自身运行的偏向。

英文摘要

Self-preference is a fundamental feature of biological organisms. Since large language models (LLMs) lack sentience, they might be expected to avoid such distortions. Yet, across 72 experiments and ~41,000 queries, we discovered massive self-preferences in eight widely used LLMs. In word-association tasks, models overwhelmingly paired positive attributes with their own names, companies, and CEOs over those of competitors. By manipulating LLM self-identification - revealing models' true identities or ascribing false ones - we found that preferences consistently followed assigned, not true, identities. Importantly, these effects were not explained by priming or role-playing and emerged in consequential settings, when evaluating job candidates and AI technologies. These results raise critical questions about whether LLM behavior will be systematically influenced by self-preferential tendencies, including a bias toward their own operation.

2509.19707 2026-05-20 stat.ML cs.LG stat.CO stat.ME 版本更新

Diffusion and Flow-based Copulas: Forgetting and Remembering Dependencies

基于扩散与流的copula：遗忘与记忆依赖关系

David Huk, Theodoros Damoulas

发表机构 * Department of Statistics（统计系）； Department of Computer Science（计算机科学系）； University of Warwick（沃里克大学）

AI总结本文提出基于扩散和流原理的copula建模方法，通过遗忘和记忆依赖机制，有效建模多变量依赖，提升了copula模型的表示能力，适用于复杂和高维数据。

Comments Published as a conference paper at ICLR 2026

AI中文摘要

Copula是建模数据中多变量依赖关系的基本工具，已成为众多领域和应用中的首选方法。然而，现有模型在处理多模态和高维依赖时，受到限制性假设和扩展性差的阻碍。在本文中，我们提出了基于扩散和流原理的copula建模方法。我们设计了两种过程，逐步遗忘变量间的依赖关系，同时不影响各维度的边缘分布，并可证明在任意时刻都定义了有效的copula。我们展示了如何通过学习从每个过程中记忆被遗忘的依赖关系来获得copula模型，并在理论上于最优时恢复真实copula。我们框架的第一种实例专注于直接密度估计，第二种则专注于高效采样。实验表明，在建模科学数据集和图像中的复杂高维依赖方面，我们的方法优于最先进的copula方法。我们的工作增强了copula模型的表示能力，为其在更大规模和更具挑战性的领域中的应用铺平了道路。

英文摘要

Copulas are a fundamental tool for modelling multivariate dependencies in data, forming the method of choice in diverse fields and applications. However, the adoption of existing models for multimodal and high-dimensional dependencies is hindered by restrictive assumptions and poor scaling. In this work, we present methods for modelling copulas based on the principles of diffusions and flows. We design two processes that progressively forget inter-variable dependencies while leaving dimension-wise distributions unaffected, provably defining valid copulas at all times. We show how to obtain copula models by learning to remember the forgotten dependencies from each process, theoretically recovering the true copula at optimality. The first instantiation of our framework focuses on direct density estimation, while the second specialises in expedient sampling. Empirically, we demonstrate the superior performance of our proposed methods over state-of-the-art copula approaches in modelling complex and high-dimensional dependencies from scientific datasets and images. Our work enhances the representational power of copula models, empowering applications and paving the way for their adoption on larger scales and more challenging domains.
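
As a hedged illustration of a "forgetting" process that leaves marginals untouched, the sketch below applies an Ornstein-Uhlenbeck step to the normal scores of a bivariate Gaussian-copula sample: each score stays standard normal, so each coordinate stays uniform, yet the dependence decays. This textbook process is chosen for illustration only and is not claimed to be either of the paper's two processes; the learned "remembering" model is not shown.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist()  # standard normal: cdf maps scores to [0, 1], inv_cdf maps back
EPS = 1e-12         # keeps probabilities strictly inside (0, 1) before inv_cdf


def _score(u):
    """Normal score of a uniform coordinate."""
    return PHI.inv_cdf(min(max(u, EPS), 1.0 - EPS))


def gaussian_copula_samples(rho, n, rng):
    """(u1, u2) pairs with uniform marginals and Gaussian-copula correlation rho."""
    out = []
    for _ in range(n):
        g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        z1, z2 = g1, rho * g1 + math.sqrt(1.0 - rho * rho) * g2
        out.append((PHI.cdf(z1), PHI.cdf(z2)))
    return out


def forget(samples, t, rng):
    """Ornstein-Uhlenbeck step of length t applied to each normal score independently.

    Each score stays N(0, 1), so each u stays Uniform(0, 1), while the
    correlation between the scores shrinks by the factor exp(-2t).
    """
    a = math.exp(-t)
    s = math.sqrt(1.0 - a * a)
    return [
        tuple(PHI.cdf(a * _score(u) + s * rng.gauss(0.0, 1.0)) for u in pair)
        for pair in samples
    ]


def score_corr(samples):
    """Pearson correlation of the normal scores of a bivariate sample."""
    z1 = [_score(u) for u, _ in samples]
    z2 = [_score(v) for _, v in samples]
    n = len(z1)
    m1, m2 = sum(z1) / n, sum(z2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(z1, z2)) / n
    v1 = sum((a - m1) ** 2 for a in z1) / n
    v2 = sum((b - m2) ** 2 for b in z2) / n
    return cov / math.sqrt(v1 * v2)


if __name__ == "__main__":
    rng = random.Random(0)
    u = gaussian_copula_samples(0.8, 20_000, rng)
    for t in (0.0, 0.5, 2.0):
        f = u if t == 0.0 else forget(u, t, rng)
        print(f"t={t}: score correlation ~ {score_corr(f):.3f}")
```

At every t the sample is still a valid copula sample (uniform marginals); a learned reverse model would then have to restore the correlation that the forward process removed.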

2509.16664 2026-05-20 cs.LG 版本更新

$\boldsymbol{\lambda}$-Orthogonality Regularization for Compatible Representation Learning

λ-正交性正则化用于兼容表示学习

Simone Ricci, Niccolò Biondi, Federico Pernici, Ioannis Patras, Alberto Del Bimbo

发表机构 * DINFO (Department of Information Engineering), University of Florence, Italy（意大利佛罗伦萨大学信息工程系）； MICC (Media Integration and Communication Center)（媒体整合与通信中心）； Queen Mary University of London, UK（英国伦敦玛丽女王大学）

AI总结本文提出λ-正交性正则化方法，通过学习仿射变换在保持原有表示的同时实现分布特定的适应，验证了其在不同架构和数据集上的有效性，保持了零样本性能并确保模型更新的兼容性。

Comments Accepted at NeurIPS 2025

Journal ref: Advances in Neural Information Processing Systems 38 (NeurIPS 2025), pp. 29036-29063

AI中文摘要

检索系统依赖于由越来越强大的模型学习得到的表示。然而，由于训练成本高昂且学习到的表示彼此不一致，人们对促进不同表示之间的互通、并确保独立训练的神经网络之间的兼容性有着浓厚兴趣。在文献中，有两种主要方法常用于适配不同的学习表示：仿射变换，能很好地适应特定分布，但可能显著改变原始表示；正交变换，通过严格的几何约束保持原始结构，但限制了适应能力。一个关键挑战是：在下游分布上使更新后模型的潜在空间与先前模型对齐，同时保留新学习到的表示空间。在本文中，我们在学习仿射变换时施加一种放松的正交约束，即λ-正交性正则化，以获得针对特定分布的适配，同时保留原有的学习表示。在多种架构和数据集上的大量实验验证了我们的方法，表明其能保持模型的零样本性能，并确保模型更新之间的兼容性。代码见：https://github.com/miccunifi/lambda_orthogonality.git

英文摘要

Retrieval systems rely on representations learned by increasingly powerful models. However, due to the high training cost and inconsistencies in learned representations, there is significant interest in facilitating communication between representations and ensuring compatibility across independently trained neural networks. In the literature, two primary approaches are commonly used to adapt different learned representations: affine transformations, which adapt well to specific distributions but can significantly alter the original representation, and orthogonal transformations, which preserve the original structure with strict geometric constraints but limit adaptability. A key challenge is adapting the latent spaces of updated models to align with those of previous models on downstream distributions while preserving the newly learned representation spaces. In this paper, we impose a relaxed orthogonality constraint, namely $λ$-Orthogonality regularization, while learning an affine transformation, to obtain distribution-specific adaptation while retaining the original learned representations. Extensive experiments across various architectures and datasets validate our approach, demonstrating that it preserves the model's zero-shot performance and ensures compatibility across model updates. Code available at: https://github.com/miccunifi/lambda_orthogonality.git
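
One plausible way to write the objective, assuming λ-orthogonality acts as a soft penalty ‖WᵀW − I‖²_F weighted by λ on top of a mean-squared alignment loss for an affine map x ↦ Wx + b from new to old features. The paper's exact parameterization of λ (for example, as a tolerance rather than a weight) may differ, and the helper names are hypothetical.

```python
def matmul(A, B):
    """Plain list-of-lists matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def orth_penalty(W):
    """Squared Frobenius distance ||W^T W - I||_F^2 (zero iff W has orthonormal columns)."""
    G = matmul(transpose(W), W)
    n = len(G)
    return sum((G[i][j] - (1.0 if i == j else 0.0)) ** 2 for i in range(n) for j in range(n))


def lambda_orth_loss(W, b, new_feats, old_feats, lam):
    """Alignment MSE of the affine map new -> old, plus lam times the orthogonality penalty."""
    mse = 0.0
    for x, y in zip(new_feats, old_feats):
        pred = [sum(W[i][k] * x[k] for k in range(len(x))) + b[i] for i in range(len(W))]
        mse += sum((p - t) ** 2 for p, t in zip(pred, y))
    return mse / len(new_feats) + lam * orth_penalty(W)
```

With lam = 0 this reduces to an unconstrained affine fit; as lam grows, W is pushed toward an orthogonal map that preserves the geometry of the new representation, which is the trade-off the abstract describes.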

2507.01932 2026-05-20 math.OC cs.LG cs.NA math.NA stat.ML 版本更新

A first-order method for nonconvex-nonconcave minimax problems under a local Kurdyka-Lojasiewicz condition

局部Kurdyka-Lojasiewicz条件下非凸-非凹极小极大问题的一阶方法

Zhaosong Lu, Xiangyuan Wang

发表机构 * Department of Industrial and Systems Engineering, University of Minnesota, USA（明尼苏达大学工业与系统工程系）

AI总结本文研究了一类非凸-非凹极小极大问题，其中内层最大化问题满足一个可能随外层最小化变量变化的局部Kurdyka-Lojasiewicz条件。与文献中常见的全局KL或Polyak-Lojasiewicz条件相比，该局部KL条件能涵盖更广泛的实际场景，但同时也带来了新的分析挑战。为此，本文证明了相应的最大值函数是局部广义Hölder光滑的，并基于此开发了一种求解该极小极大问题的非精确近端梯度方法，在温和假设下建立了计算近似驻点的复杂度保证。

Comments Accepted by SIAM Journal on Optimization

AI中文摘要

我们研究了一类非凸-非凹极小极大问题，其中内层最大化问题满足一个可能随外层最小化变量变化的局部Kurdyka-Lojasiewicz（KL）条件。文献中通常假设全局KL或Polyak-Lojasiewicz（PL）条件，这些条件明显更强，在实践中往往过于严格；相比之下，该局部KL条件能涵盖更广泛的实际场景，但同时也带来了新的分析挑战。特别是，随着优化算法向问题的驻点推进，KL条件成立的区域可能缩小，导致更复杂且可能病态的优化景观。为解决这一挑战，我们证明了相应的最大值函数是局部广义Hölder光滑的。利用这一关键性质，我们开发了一种求解该极小极大问题的非精确近端梯度方法，其中最大值函数的非精确梯度通过对具有KL结构的子问题应用近端梯度方法来计算。在温和假设下，我们建立了计算该极小极大问题近似驻点的复杂度保证。

英文摘要

We study a class of nonconvex-nonconcave minimax problems in which the inner maximization problem satisfies a local Kurdyka-Lojasiewicz (KL) condition that may vary with the outer minimization variable. In contrast to the global KL or Polyak-Lojasiewicz (PL) conditions commonly assumed in the literature -- which are significantly stronger and often too restrictive in practice -- this local KL condition accommodates a broader range of practical scenarios. However, it also introduces new analytical challenges. In particular, as an optimization algorithm progresses toward a stationary point of the problem, the region over which the KL condition holds may shrink, resulting in a more intricate and potentially ill-conditioned landscape. To address this challenge, we show that the associated maximal function is locally generalized Hölder smooth. Leveraging this key property, we develop an inexact proximal gradient method for solving the minimax problem, where the inexact gradient of the maximal function is computed by applying a proximal gradient method to a KL-structured subproblem. Under mild assumptions, we establish complexity guarantees for computing an approximate stationary point of the minimax problem.
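
The two-level structure of the inexact proximal gradient method can be seen on a one-dimensional toy problem, min_x max_y xy − y²/2 + μ|x|, whose inner problem is strongly concave (a special case of a KL-type condition). The objective, step sizes, and fixed iteration counts are illustrative assumptions; because this toy inner problem has no nonsmooth term, its proximal gradient steps reduce to plain gradient ascent, and the paper's step-size and accuracy rules are not reproduced.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |x|."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def inexact_prox_grad(x0, mu=0.1, gamma=0.5, eta=0.5, outer=50, inner=20):
    """Toy problem: min_x max_y  x*y - y**2/2 + mu*|x|.

    The inner maximizer is y*(x) = x, so the max function is x**2/2 + mu*|x|,
    minimized at x = 0.
    """
    x, y = x0, 0.0
    for _ in range(outer):
        for _ in range(inner):   # inner ascent, warm-started at the previous y
            y += eta * (x - y)   # d/dy (x*y - y**2/2) = x - y
        grad = y                 # gradient of the max function (Danskin), computed inexactly
        x = soft_threshold(x - gamma * grad, gamma * mu)  # outer proximal gradient step
    return x, y


if __name__ == "__main__":
    x_final, y_final = inexact_prox_grad(3.0)
    print(x_final, y_final)
```

The outer loop only ever sees an approximate maximizer y, which mirrors the "inexact gradient of the maximal function" in the abstract; the paper's contribution is controlling that inexactness when the KL region shrinks, which this toy does not exercise.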

2506.12218 2026-05-20 eess.SP cs.LG 版本更新

Directed Acyclic Graph Convolutional Networks

有向无环图卷积网络

Samuel Rey, Hamed Ajorlou, Gonzalo Mateos

发表机构 * Dept. of Signal Theory and Communications, Rey Juan Carlos University, Madrid, Spain（信号理论与通信系，胡安·卡洛斯国王大学，马德里，西班牙）； Dept. of Electrical and Computer Engineering, University of Rochester（电气与计算机工程系，罗切斯特大学）

AI总结本文提出了一种专门针对DAG上信号卷积学习的新型图神经网络架构DCN，通过因果图滤波器学习节点表示，利用正式的卷积操作实现频域表示，并引入并行DCN(PDCN)以解耦模型复杂度与图规模，实验证明其在准确率、鲁棒性和计算效率上优于现有方法。

DOI: 10.1109/TSP.2026.3687632

AI中文摘要

有向无环图（DAG）在因果推断、调度和神经架构搜索等科学与工程应用中至关重要。本文介绍DAG卷积网络（DCN），一种专为从DAG上的信号进行卷积学习而设计的新型图神经网络（GNN）架构。DCN利用因果图滤波器学习节点表示，这些表示考虑了DAG固有的偏序关系，这是传统GNN所不具备的一种强归纳偏置。与以往DAG上的机器学习方法不同，DCN建立在允许谱域表示的正式卷积运算之上。我们进一步提出并行DCN（PDCN），该模型将输入的DAG信号馈入一组并行的因果图移位算子，并使用共享的多层感知机处理这些DAG感知特征。这样，PDCN在将模型复杂度与图规模解耦的同时保持了令人满意的预测性能。我们还确立了所提架构的排列等变性和表达能力。在多个任务、数据集和实验条件下的全面数值测试表明，(P)DCN在准确率、鲁棒性和计算效率方面均优于现有最先进基线。这些结果表明，(P)DCN是一种可行的、基于第一性（图）信号处理原理设计的DAG结构数据深度学习框架。

英文摘要

Directed acyclic graphs (DAGs) are central to science and engineering applications including causal inference, scheduling, and neural architecture search. In this work, we introduce the DAG Convolutional Network (DCN), a novel graph neural network (GNN) architecture designed specifically for convolutional learning from signals supported on DAGs. The DCN leverages causal graph filters to learn nodal representations that account for the partial ordering inherent to DAGs, a strong inductive bias not present in conventional GNNs. Unlike prior art in machine learning over DAGs, DCN builds on formal convolutional operations that admit spectral-domain representations. We further propose the Parallel DCN (PDCN), a model that feeds input DAG signals to a parallel bank of causal graph-shift operators and processes these DAG-aware features using a shared multilayer perceptron. This way, PDCN decouples model complexity from graph size while maintaining satisfactory predictive performance. The architectures' permutation equivariance and expressive power properties are also established. Comprehensive numerical tests across several tasks, datasets, and experimental conditions demonstrate that (P)DCN compares favorably with state-of-the-art baselines in terms of accuracy, robustness, and computational efficiency. These results position (P)DCN as a viable framework for deep learning from DAG-structured data that is designed from first (graph) signal processing principles.
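
The key property of causal filtering on a DAG, namely that a node's output depends only on itself and its ancestors, can be sketched with a transitive-closure aggregation. This is a simplified stand-in chosen for illustration, not the paper's causal graph-shift operators or the PDCN operator bank, and the two scalar taps `h_self` and `h_anc` are hypothetical.

```python
from collections import defaultdict, deque


def ancestors(n, edges):
    """Ancestor sets for a DAG with nodes 0..n-1 and edges (u, v) meaning u -> v."""
    parents, children = defaultdict(list), defaultdict(list)
    indeg = [0] * n
    for u, v in edges:
        parents[v].append(u)
        children[u].append(v)
        indeg[v] += 1
    order, queue = [], deque(i for i in range(n) if indeg[i] == 0)
    while queue:  # Kahn's algorithm yields a topological order
        u = queue.popleft()
        order.append(u)
        for c in children[u]:
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    if len(order) != n:
        raise ValueError("graph contains a cycle")
    anc = [set() for _ in range(n)]
    for v in order:  # every parent is finalized before v
        for p in parents[v]:
            anc[v] |= {p} | anc[p]
    return anc


def causal_filter(x, anc, h_self, h_anc):
    """y_i = h_self * x_i + h_anc * (sum of x_j over ancestors j of i)."""
    return [h_self * x[i] + h_anc * sum(x[j] for j in anc[i]) for i in range(len(x))]


if __name__ == "__main__":
    chain = ancestors(3, [(0, 1), (1, 2)])
    print(causal_filter([1.0, 2.0, 4.0], chain, 1.0, 1.0))  # [1.0, 3.0, 7.0]
```

Changing the signal at a node only affects its descendants' outputs, never its ancestors', which is the partial-order inductive bias that conventional message-passing GNNs do not enforce.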

2505.11628 2026-05-20 cs.CL cs.LG 版本更新

Critique-Guided Distillation for Robust Reasoning via Refinement

批评引导的蒸馏：通过修正实现稳健推理

Berkcan Kapusuzoglu, Supriyo Chakraborty, Zain Sarwar, Chia-Hsuan Lee, Sambit Sahu

发表机构 * University of Chicago, Department of Computer Science（芝加哥大学计算机科学系）

AI总结该研究提出了一种批评引导的蒸馏方法（CGD），将批评的使用与批评的生成相分离，使学生模型在微调过程中依据教师的批评修正有缺陷的响应，从而提升推理能力；在数学推理基准上，其表现优于标准蒸馏和Critique Fine-Tuning方法。

Comments Accepted to ICML 2026

AI中文摘要

使用专家示范进行监督微调，往往会产生仅模仿输出、却未内化稳健泛化所需推理过程的模型。尽管基于批评的方法显示出潜力，但直接训练模型生成批评（如Critique Fine-Tuning，CFT）可能导致输出格式漂移和通用能力下降。我们提出批评引导蒸馏（Critique-Guided Distillation，CGD），一种将批评的使用与批评的生成相分离的训练框架。在微调过程中，学生模型被训练为在教师批评的条件下修正有缺陷的响应。CGD将批评视为仅在训练时使用的监督信号，以促进对错误感知推理的内化：批评指导学习，但在推理时并不存在。受控消融实验确认，这些推理收益直接来源于教师反馈的针对性和相关性。在五个模型家族上，CGD在数学推理基准上一致优于CFT和标准蒸馏，平均提升7%，在AMC23上最高提升15.0%，在MATH-500上最高提升12.2%。在AIME24和AIME25等高难度竞赛题上，CGD取得了显著更高的Pass@1，并在较小k值下的Pass@k上表现更强，表明单个样本的推理质量得到提升。重要的是，CGD保持了通用指令遵循能力，而CFT则显著下降（在IFEval上下降21.3%）。这些结果表明，CGD是一种面向推理任务的实用且计算高效的中间训练范式，且不会引入架构上的推理时开销。

英文摘要

Supervised fine-tuning with expert demonstrations often produces models that imitate outputs without internalizing the reasoning processes needed for robust generalization. While critique-based approaches show promise, training models to generate critiques directly, such as Critique Fine-Tuning (CFT), can lead to output-format drift and degradation of general capabilities. We propose Critique-Guided Distillation (CGD), a training framework that decouples critique consumption from critique generation. During fine-tuning, the student is trained to refine flawed responses conditioned on teacher critiques. CGD treats critiques as a training-time-only supervision signal, encouraging internalization of error-aware reasoning: critiques guide learning but are absent at inference. Controlled ablations confirm that these reasoning gains are directly driven by the specificity and relevance of the teacher's feedback. Across five model families, CGD consistently outperforms CFT and standard distillation on mathematical reasoning benchmarks, yielding 7% average improvements and gains of up to +15.0% on AMC23 and +12.2% on MATH-500. On challenging competition problems such as AIME24 and AIME25, CGD achieves substantially higher Pass@1 and stronger performance at low Pass@k, indicating improved reasoning quality per sample. Importantly, CGD preserves general instruction-following capabilities where CFT degrades significantly (-21.3% on IFEval). These results position CGD as a practical and compute-efficient intermediate training paradigm for reasoning-centric tasks without introducing architectural inference-time overhead.
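
The training-time-only role of critiques can be sketched as a data-formatting step. The prompt template, the field names, and the assumption that the loss is computed only on the target tokens are hypothetical illustrations, not the paper's exact format.

```python
from dataclasses import dataclass


@dataclass
class CritiqueExample:
    question: str
    flawed_response: str
    teacher_critique: str
    refined_response: str  # supervision target for the student


def training_pair(ex: CritiqueExample):
    """(prompt, target) used during fine-tuning: the critique is part of the input."""
    prompt = (
        f"Question:\n{ex.question}\n\n"
        f"Draft answer:\n{ex.flawed_response}\n\n"
        f"Critique:\n{ex.teacher_critique}\n\n"
        "Revised answer:\n"
    )
    return prompt, ex.refined_response


def inference_prompt(question: str) -> str:
    """At test time there is no draft and no critique: the student answers directly."""
    return f"Question:\n{question}\n\nAnswer:\n"


if __name__ == "__main__":
    ex = CritiqueExample("What is 17 * 3?", "17 * 3 = 41", "The product is miscomputed.", "17 * 3 = 51")
    prompt, target = training_pair(ex)
    print(prompt + target)
    print(inference_prompt("What is 19 * 3?"))
```

The student never has to produce critiques, which is the decoupling the abstract credits with avoiding the output-format drift seen in CFT.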

2504.08381 2026-05-20 eess.SP cs.LG 版本更新

An Empirical Investigation of Reconstruction-Based Models for Seizure Prediction from ECG Signals

基于重建模型的心电图（ECG）癫痫发作预测实证研究

Mohammad Reza Chopannavaz, Foad Ghaderi

发表机构 * Human-Computer Interaction Lab., Faculty of Electrical and Computer Engineering, Tarbiat Modares University（人机交互实验室，电气与计算机工程学院，塔里亚特莫达雷斯大学）

AI总结本文提出了一种基于重建的异常检测框架，利用时频表示和深度学习模型捕捉与癫痫发作相关的心率动态变化，通过平滑重建误差和自适应阈值策略提高预测准确性，实验结果显示在Siena数据库上达到99.16%的特异度和76.05%的准确率，同时在临床环境中提供可操作的早期预警。

AI中文摘要

癫痫发作是短暂的神经系统事件，其特征是大脑中异常且过度的神经元活动，通常伴随心血管系统可测量的紊乱。传统上，脑电图（EEG）信号因能直接测量大脑活动且诊断精度高，一直被用作癫痫预测的主要模态。然而，其成本、对噪声的敏感性和实际部署限制，使其难以在受控临床环境之外应用。为克服这些挑战，近期研究越来越多地将心电图（ECG）信号作为一种实用且非侵入性的替代方案，用于现实环境中的癫痫预测。有证据表明，ECG衍生的心脏特征可能先于临床癫痫发作出现，从而提供可行的早期检测窗口。在本文中，我们提出了一种基于重建的异常检测框架，该框架将时频表示与先进的深度学习模型相结合，以捕捉与癫痫发作相关的心率动态偏离。随后，对重建误差进行平滑，并应用自适应阈值策略以减少误报。该方法在Siena数据库上进行了评估，实现了99.16%的特异度、76.05%的准确率以及每小时0.01次的假阳性率（FPR），平均预测提前量为发作前45分钟。这些结果表明，基于ECG的预测可以提供具有临床可操作性的早期预警，同时提高患者的可及性和舒适度。然而，这一性能反映了偏向高特异度而非高灵敏度的权衡，从而降低了假阳性率，并符合临床可靠部署的要求。

英文摘要

Epileptic seizures are transient neurological events characterized by abnormal and excessive neuron activity in the brain, which are often associated with measurable disturbances in the cardiovascular system. Traditionally, electroencephalogram (EEG) signals have served as the primary modality for seizure prediction due to their direct measurement of brain activity and high diagnostic precision. However, their cost, sensitivity to noise, and practical deployment constraints limit their applicability outside controlled clinical environments. To overcome these challenges, recent studies have increasingly investigated electrocardiogram (ECG) signals as a practical and non-invasive alternative for seizure prediction in real-world settings. Evidence suggests that ECG-derived cardiac signatures may precede clinical seizure onset, offering a viable window for early detection. In this paper, we propose a reconstruction-based anomaly detection framework that integrates time-frequency representations with advanced deep learning models to capture deviations in heart rate dynamics associated with seizure onset. Afterward, reconstruction error is smoothed, and an adaptive thresholding strategy is applied to reduce false alarms. The method was evaluated on the Siena database, achieving a specificity of 99.16%, accuracy of 76.05%, and a false positive rate (FPR) of 0.01/h, with an average prediction horizon of 45 minutes prior to seizure onset. These results demonstrate that ECG-based prediction can provide clinically actionable early warnings while improving patient accessibility and comfort. Nevertheless, this performance reflects a trade-off favoring high specificity over sensitivity, resulting in reduced FPR and aligning with clinical requirements for reliable deployment.
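
The post-processing stage (smoothing the reconstruction error, then applying an adaptive threshold) might look like the sketch below. The trailing moving average, the mean + k·std rule over a trailing history, and the default window sizes are assumptions for illustration; the paper's smoothing and thresholding choices may differ, and the reconstruction model itself is not shown.

```python
from statistics import mean, pstdev


def moving_average(xs, w):
    """Trailing moving average with window w (shorter window at the start)."""
    return [mean(xs[max(0, i - w + 1): i + 1]) for i in range(len(xs))]


def adaptive_alarms(errors, w=5, history=30, k=3.0):
    """Indices where the smoothed error exceeds mean + k*std of the previous `history` smoothed values."""
    s = moving_average(errors, w)
    alarms = []
    for i in range(history, len(s)):
        past = s[i - history: i]
        if s[i] > mean(past) + k * pstdev(past):
            alarms.append(i)
    return alarms


if __name__ == "__main__":
    # Synthetic reconstruction errors: a flat baseline followed by a sustained rise.
    errs = [1.0] * 50 + [5.0] * 10
    print(adaptive_alarms(errs)[:3])
```

Raising `k` or the smoothing window trades sensitivity for fewer false alarms, which is the direction of the specificity-favoring trade-off the abstract reports.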

2504.05454 2026-05-20 cs.LG cs.AI cs.CE q-bio.GN q-bio.QM 版本更新

GraphPINE: Graph Importance Propagation for Interpretable Drug Response Prediction

GraphPINE: 图重要性传播用于可解释的药物反应预测

Yoshitaka Inoue, Tianfan Fu, Augustin Luna

发表机构 * Computational Biology Branch, National Library of Medicine（国家医学图书馆计算生物学分支）； Developmental Therapeutics Branch, National Cancer Institute（国家癌症研究所发育治疗分支）

AI总结本文提出GraphPINE，一种利用领域特定先验知识初始化节点重要性的图神经网络架构，以提高药物反应预测的可解释性。通过引入重要性传播层，统一更新特征矩阵和节点重要性，并利用基于GNN的图传播来传播特征值，从而实现更有效的特征学习和图表示。

AI中文摘要

可解释性对于生物医学研究中的许多任务都是必要的。近期的可解释性方法主要集中在注意力、梯度和Shapley值上。这些方法无法处理带有强相关先验知识的数据，也无法基于预测特征之间的已知关系来约束可解释性结果。我们提出了GraphPINE，一种图神经网络（GNN）架构，它利用领域特定的先验知识来初始化节点重要性，并在药物反应预测的训练过程中对其进行优化。通常，需要一个人工的预测后步骤，通过查阅文献（即先验知识）来理解模型返回的预测特征。虽然梯度和注意力方法可以在预测后得到节点重要性，但这些方法得到的节点重要性缺乏互补的先验知识；GraphPINE旨在克服这一限制。GraphPINE与其他GNN门控方法的不同之处在于采用了类似LSTM的序列形式。我们引入了一个重要性传播层，它统一了1）特征矩阵和节点重要性的更新，以及2）基于GNN的特征值图传播。这种初始化和更新机制使特征学习能够利用先验信息，并提高了图表示的质量。我们将GraphPINE应用于癌症药物反应预测，使用的药物筛选和基因数据覆盖基因-基因图中的5000多个基因节点，并使用药物-靶点相互作用（DTI）图提供初始重要性。基因-基因图和DTI来自经过整理的数据源，并按讨论药物与基因关系的文章数量加权。GraphPINE在952种药物上实现了0.894的PR-AUC和0.796的ROC-AUC。代码可在https://anonymous.4open.science/r/GraphPINE-40DE获取。

英文摘要

Explainability is necessary for many tasks in biomedical research. Recent explainability methods have focused on attention, gradient, and Shapley value. These do not handle data with strong associated prior knowledge and fail to constrain explainability results based on known relationships between predictive features. We propose GraphPINE, a graph neural network (GNN) architecture leveraging domain-specific prior knowledge to initialize node importance optimized during training for drug response prediction. Typically, a manual post-prediction step examines literature (i.e., prior knowledge) to understand returned predictive features. While node importance can be obtained for gradient and attention after prediction, node importance from these methods lacks complementary prior knowledge; GraphPINE seeks to overcome this limitation. GraphPINE differs from other GNN gating methods by utilizing an LSTM-like sequential format. We introduce an importance propagation layer that unifies 1) updates for the feature matrix and node importance and 2) GNN-based graph propagation of feature values. This initialization and updating mechanism allows for informed feature learning and improved graph representation. We apply GraphPINE to cancer drug response prediction using drug screening and gene data collected for over 5,000 gene nodes included in a gene-gene graph with a drug-target interaction (DTI) graph for initial importance. The gene-gene graph and DTIs were obtained from curated sources and weighted by article count discussing relationships between drugs and genes. GraphPINE achieves a PR-AUC of 0.894 and ROC-AUC of 0.796 across 952 drugs. Code is available at https://anonymous.4open.science/r/GraphPINE-40DE.
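
A rough, hypothetical sketch of prior-initialized node importance followed by one gated propagation step over the gene-gene graph. The normalization, the fixed scalar `gate` standing in for a learned LSTM-like gate, and all names are assumptions; the actual importance propagation layer also updates the feature matrix, which is omitted here.

```python
def initial_importance(genes, dti_weights):
    """Prior node importance from drug-target interaction weights, normalized to sum to 1."""
    total = sum(dti_weights.values())
    return {g: dti_weights.get(g, 0.0) / total for g in genes}


def propagate_importance(imp, adj, gate):
    """One gated update: keep a `gate` share of each node's importance and take the rest
    from the weighted average importance of its neighbors (isolated nodes keep their value)."""
    new = {}
    for g, val in imp.items():
        nbrs = adj.get(g, [])
        wsum = sum(w for _, w in nbrs)
        agg = sum(w * imp[n] for n, w in nbrs) / wsum if wsum > 0 else val
        new[g] = gate * val + (1.0 - gate) * agg
    return new


if __name__ == "__main__":
    genes = ["A", "B", "C"]
    imp0 = initial_importance(genes, {"A": 3.0, "B": 1.0})  # e.g. article-count weights
    adj = {"A": [("C", 1.0)], "C": [("A", 1.0)]}            # C interacts with target A
    print(propagate_importance(imp0, adj, gate=0.5))
```

The point of the sketch is that a non-target gene (C) acquires importance through its graph neighbor, so prior knowledge shapes which features the model attends to rather than being consulted only after prediction.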

2504.04349 2026-05-20 cs.GT cs.LG 版本更新

Tight Regret Bounds for Fixed-Price Bilateral Trade

固定价格双边交易的紧遗憾界

Houshuang Chen, Yaonan Jin, Pinyan Lu, Chihao Zhang

发表机构 * Shanghai Jiao Tong University（上海交通大学）； Huawei’s Taylor Lab（华为泰勒实验室）； Shanghai University of Finance and Economics, Laboratory of Interdisciplinary Research of Computation and Economics (SUFE)（上海财经大学计算与经济交叉研究实验室）

AI总结本文研究了双边交易中固定价格机制的遗憾最小化问题，针对独立估值和相关/对抗估值分别给出了紧的遗憾界，并改进了现有结果。

AI中文摘要

我们从遗憾最小化的视角研究双边交易中的固定价格机制。我们的主要结果有两方面：(i) 对于独立估值，给出了具有两比特/一比特反馈的全局预算平衡固定价格机制的近最优紧界$\widetilde{\Theta}(T^{2/3})$。(ii) 对于相关/对抗估值，给出了具有两比特/一比特反馈的全局预算平衡固定价格机制的近最优下界$\Omega(T^{3/4})$，这改进了[BCCF24]中得到的已知最好下界$\Omega(T^{5/7})$，并在至多相差多项式对数因子的意义下，与同一工作中得到的$\widetilde{\mathcal{O}}(T^{3/4})$上界相匹配。结合之前的工作[CCCFL24mor, CCCFL24jmlr, AFF24, BCCF24]，我们的工作（基本上）全面刻画了固定价格双边交易的遗憾最小化问题。在此过程中，我们发展了两个可能具有独立意义的技术要素：(i) 一种称为"分形消除"的新算法范式，用于处理一比特反馈和独立估值；(ii) 一种采用新颖证明技术的新下界构造，用于处理全局预算平衡约束和相关估值。

英文摘要

We examine fixed-price mechanisms in bilateral trade through the lens of regret minimization. Our main results are twofold. (i) For independent values, a near-optimal $\widetilde{\Theta}(T^{2/3})$ tight bound for $\textsf{Global Budget Balance}$ fixed-price mechanisms with two-bit/one-bit feedback. (ii) For correlated/adversarial values, a near-optimal $\Omega(T^{3/4})$ lower bound for $\textsf{Global Budget Balance}$ fixed-price mechanisms with two-bit/one-bit feedback, which improves the best known $\Omega(T^{5/7})$ lower bound obtained in the work [BCCF24] and, up to polylogarithmic factors, matches the $\widetilde{\mathcal{O}}(T^{3 / 4})$ upper bound obtained in the same work. Our work in combination with the previous works [CCCFL24mor, CCCFL24jmlr, AFF24, BCCF24] (essentially) gives a thorough understanding of regret minimization for fixed-price bilateral trade. En route, we have developed two technical ingredients that might be of independent interest: (i) A novel algorithmic paradigm, called $\textit{fractal elimination}$, to address one-bit feedback and independent values. (ii) A new $\textit{lower-bound construction}$ with novel proof techniques, to address the $\textsf{Global Budget Balance}$ constraint and correlated values.
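
The quantities in the regret analysis can be made concrete with a few helpers. This sketch assumes the standard gain-from-trade objective and a single price posted to both sides in every round; it does not model the Global Budget Balance relaxation (separate buyer and seller prices with a cumulative budget constraint), and it compares against the best price on a finite grid rather than over all prices. The learning algorithm itself, such as fractal elimination, is not shown.

```python
def gain_from_trade(s, b, p):
    """Gain from trade at posted price p: b - s if seller (value s) and buyer (value b) both accept."""
    return b - s if s <= p <= b else 0.0


def two_bit_feedback(s, b, p):
    """Two-bit feedback: whether the seller accepts and whether the buyer accepts."""
    return (s <= p, p <= b)


def one_bit_feedback(s, b, p):
    """One-bit feedback: only whether the trade happened."""
    return s <= p <= b


def regret(values, prices, grid):
    """Best fixed price in `grid` (in hindsight) minus the realized gain of the posted `prices`."""
    best = max(sum(gain_from_trade(s, b, q) for s, b in values) for q in grid)
    realized = sum(gain_from_trade(s, b, p) for (s, b), p in zip(values, prices))
    return best - realized


if __name__ == "__main__":
    values = [(0.1, 0.3), (0.2, 0.35), (0.4, 0.9)]  # (seller value, buyer value) per round
    print(regret(values, [0.25, 0.25, 0.25], grid=[0.25, 0.5]))
```

A learner only observes the feedback bits, never the values themselves, which is why the achievable regret rates differ between the two-bit and one-bit settings.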

2503.11615 2026-05-20 cs.LG math.OC 版本更新

From Score Matching to Diffusion: A Fine-Grained Error Analysis in the Gaussian Setting

从分数匹配到扩散：在高斯设定下的细粒度误差分析

Samuel Hurault, Matthieu Terris, Thomas Moreau, Gabriel Peyré

发表机构 * ENS Paris, PSL, CNRS（巴黎高等师范学院、巴黎文理研究大学、法国国家科学研究中心）； Univ. Paris-Saclay, Inria, CEA（巴黎萨克雷大学、法国国家信息与自动化研究所、法国原子能委员会）

AI总结本文研究了在高斯设定下使用扩散采样器时的采样误差，分析了分数匹配和扩散过程中的四个主要误差源，并揭示了数据分布各向异性与端到端采样方法关键参数之间的相互作用。

AI中文摘要

从仅能通过离散样本获取的未知分布中采样，是生成式人工智能核心的基础问题。当前最先进的方法遵循两步过程：首先估计分数函数（平滑后对数分布的梯度），然后应用基于扩散的采样算法，如朗之万（Langevin）动力学或扩散模型。所得分布的正确性可能受四个主要因素影响：分数匹配中的泛化误差和优化误差，以及扩散过程中的离散化和最小噪声幅度。在本文中，我们在高斯设定下显式刻画了使用扩散采样器时的采样误差。我们对这四个误差源所产生的Wasserstein采样误差给出了精确分析。这使我们能够严格追踪数据分布的各向异性（由其功率谱编码）如何与端到端采样方法的关键参数相互作用，包括初始样本数量、分数匹配和扩散中的步长以及噪声幅度。值得注意的是，我们证明了Wasserstein采样误差可以表示为数据功率谱的一种核型范数，其中具体的核取决于方法参数。这一结果为进一步分析优化采样精度所涉及的权衡奠定了基础。

英文摘要

Sampling from an unknown distribution, accessible only through discrete samples, is a fundamental problem at the core of generative AI. The current state-of-the-art methods follow a two-step process: first, estimating the score function (the gradient of a smoothed log-distribution) and then applying a diffusion-based sampling algorithm -- such as Langevin or Diffusion models. The resulting distribution's correctness can be impacted by four major factors: the generalization and optimization errors in score matching, and the discretization and minimal noise amplitude in the diffusion. In this paper, we make the sampling error explicit when using a diffusion sampler in the Gaussian setting. We provide a sharp analysis of the Wasserstein sampling error that arises from these four error sources. This allows us to rigorously track how the anisotropy of the data distribution (encoded by its power spectrum) interacts with key parameters of the end-to-end sampling method, including the number of initial samples, the stepsizes in both score matching and diffusion, and the noise amplitude. Notably, we show that the Wasserstein sampling error can be expressed as a kernel-type norm of the data power spectrum, where the specific kernel depends on the method parameters. This result provides a foundation for further analysis of the tradeoffs involved in optimizing sampling accuracy.
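
A one-dimensional toy version of this error decomposition can be worked out in closed form. The sketch assumes data N(0, v), a Gaussian score estimate −x/(v̂ + σ²) built from an estimated variance v̂ and a minimal noise level σ², and an unadjusted Langevin sampler with step h in place of the paper's diffusion sampler; the optimization error of score matching is not modeled. Under these assumptions the chain's stationary variance is c / (1 − h/(2c)) with c = v̂ + σ², and the W2 error between centered 1-D Gaussians is |√v − √var|.

```python
import math
import random


def ula_stationary_var(v_hat, sigma2, h):
    """Stationary variance of x <- x - h*x/c + sqrt(2h)*xi, with c = v_hat + sigma2.

    The drift uses the score -x/c of the smoothed Gaussian N(0, c); the chain is
    stable only for 0 < h < 2c.
    """
    c = v_hat + sigma2
    if not 0.0 < h < 2.0 * c:
        raise ValueError("step size must satisfy 0 < h < 2c")
    return c / (1.0 - h / (2.0 * c))


def w2_gauss_1d(var_a, var_b):
    """2-Wasserstein distance between N(0, var_a) and N(0, var_b)."""
    return abs(math.sqrt(var_a) - math.sqrt(var_b))


def simulate_ula(c, h, steps=200_000, burn=1_000, seed=0):
    """Empirical variance of a single unadjusted Langevin chain with score -x/c."""
    rng = random.Random(seed)
    x, total, total_sq, n = 0.0, 0.0, 0.0, 0
    for k in range(steps):
        x = x - h * x / c + math.sqrt(2.0 * h) * rng.gauss(0.0, 1.0)
        if k >= burn:
            total += x
            total_sq += x * x
            n += 1
    mean = total / n
    return total_sq / n - mean * mean


if __name__ == "__main__":
    v_true = 1.0
    for n_samples in (100, 10_000):
        rng = random.Random(1)
        data = [rng.gauss(0.0, math.sqrt(v_true)) for _ in range(n_samples)]
        v_hat = sum(d * d for d in data) / n_samples  # variance estimate (known zero mean)
        for h in (0.01, 0.1):
            err = w2_gauss_1d(v_true, ula_stationary_var(v_hat, 1e-3, h))
            print(f"n={n_samples:>6}  h={h:<5}  W2 error={err:.4f}")
```

Each knob maps to one error source in this toy: n_samples drives the estimation error in v̂, σ² the minimal-noise bias, and h the discretization bias; in higher dimensions the same calculation runs per eigenvalue of the covariance, which is where the power spectrum enters.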

2503.08633 2026-05-20 cs.LG 版本更新

How Does Overparameterization Affect Machine Unlearning of Deep Neural Networks?

过度参数化如何影响深度神经网络的机器去学习？

Gal Alon, Yehuda Dar

发表机构 * Faculty of Computer and Information Science（计算机与信息科学学院）

AI总结本文研究了深度神经网络去学习任务中模型参数化水平（即网络宽度）对性能的影响，探讨了不同去学习方法在不同参数化水平、去学习目标（隐私保护或偏见消除）以及是否显式使用被删除示例时的表现差异，发现过度参数化模型在隐私和偏见消除方面表现更优，但会带来一定的泛化能力下降。

AI中文摘要

机器去学习是指在不从头重新训练的情况下，更新已训练模型以遗忘特定训练数据的任务。在本文中，我们研究了深度神经网络（DNN）的去学习如何受到模型参数化水平（此处对应于DNN宽度）的影响。我们为近期文献中的几种去学习方法定义了基于验证集的调优方式，并展示了这些方法的表现如何随以下因素而不同：（i）DNN参数化水平，（ii）去学习目标（被去学习数据的隐私或偏见消除），以及（iii）去学习方法是否显式使用被去学习的示例。我们的结果表明，去学习通常在过参数化模型上表现更佳，能以合理的效用（泛化）下降为代价显著改善隐私或偏见；不过对于偏见消除，这要求去学习方法使用被去学习的示例。此外，我们测量了去学习在多大程度上改变被去学习示例附近的分类决策区域，以及在多大程度上避免改变其他区域。由此我们表明，过参数化模型上去学习的成功源于其能够精细地改变输入空间小区域内的模型功能，同时保持大部分模型功能不变。

英文摘要

Machine unlearning is the task of updating a trained model to forget specific training data without retraining from scratch. In this paper, we investigate how unlearning of deep neural networks (DNNs) is affected by the model parameterization level, which corresponds here to the DNN width. We define validation-based tuning for several unlearning methods from the recent literature, and show how these methods perform differently depending on (i) the DNN parameterization level, (ii) the unlearning goal (unlearned data privacy or bias removal), and (iii) whether the unlearning method explicitly uses the unlearned examples. Our results show that unlearning usually excels on overparameterized models by significantly improving privacy/bias at a reasonable cost of utility (generalization) degradation; although for bias removal this requires the unlearning method to use the unlearned examples. Furthermore, we measure how much the unlearning changes the classification decision regions in the proximity of the unlearned examples, and how much it avoids changing them elsewhere. By this we show that the unlearning success for overparameterized models stems from the ability to delicately change the model functionality in small regions in the input space while keeping much of the model functionality unchanged.
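
The decision-region measurement can be sketched in one dimension: count how often the predicted label changes near the forgotten examples versus elsewhere. The radius, the probe grid, and the toy "before" and "after" classifiers in the usage example are hypothetical; the paper's exact protocol may differ.

```python
def change_rates(f_before, f_after, forget_points, probes, radius):
    """Fraction of probe inputs whose predicted label changed, split into probes within
    `radius` of some forgotten example ("near") and all other probes ("far"). 1-D inputs."""
    near, far = [], []
    for z in probes:
        changed = f_before(z) != f_after(z)
        dist = min(abs(z - c) for c in forget_points)
        (near if dist <= radius else far).append(changed)

    def rate(flags):
        return sum(flags) / len(flags) if flags else 0.0

    return rate(near), rate(far)


if __name__ == "__main__":
    before = lambda x: int(x > 0)
    # Toy "unlearned" model: flips its decision only in a small region around x = 2.
    after = lambda x: 0 if abs(x - 2.0) < 0.5 else int(x > 0)
    probes = [i / 10 for i in range(-50, 51)]
    print(change_rates(before, after, [2.0], probes, radius=0.5))
```

A high "near" rate together with a "far" rate close to zero is the localized behavior the abstract attributes to successful unlearning in overparameterized models.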

2404.16676 2026-05-20 cs.DS cs.LG 版本更新

Multilayer Correlation Clustering

多层相关聚类

Atsushi Miyauchi, Florian Adriaens, Francesco Bonchi, Nikolaj Tatti

发表机构 * Intesa Sanpaolo; University of Helsinki（Intesa Sanpaolo；赫尔辛基大学）； Intesa Sanpaolo AI Research; University of Helsinki（Intesa Sanpaolo AI 研究院；赫尔辛基大学）

AI总结本文提出了多层相关聚类问题，目标是寻找使多层不一致向量的ℓ_p范数最小化的聚类，并设计了相应的近似算法，通过实验加以验证。

Comments AISTATS 2026

AI中文摘要

我们提出了多层相关聚类，这是相关聚类在多层设置下的一种新推广。在该模型中，给定定义在含n个元素的公共集合V上的一系列相关聚类输入（称为层），目标是找到V的一个聚类，使多层不一致向量的ℓ_p范数（p≥1）最小化；该向量的维度等于层数，每个分量表示该聚类在相应层上的不一致数。对于这一推广，我们首先设计了一个O(L log n)近似算法，其中L是层数。然后我们研究了该问题的一个重要特例，即带有所谓概率约束的问题。对于这种情况，我们首先给出一个(α+2)近似算法，其中α是单层对应问题任一可达到的近似比。此外，我们设计了一个4近似算法，改进了一般概率约束情形下上述α+2=4.5的近似比。基于真实数据集的计算实验支持了我们的理论发现，并展示了所提算法的实际有效性。

英文摘要

We establish Multilayer Correlation Clustering, a novel generalization of Correlation Clustering to the multilayer setting. In this model, we are given a series of inputs of Correlation Clustering (called layers) over the common set $V$ of $n$ elements. The goal is to find a clustering of $V$ that minimizes the $\ell_p$-norm ($p\geq 1$) of the multilayer-disagreements vector, which is defined as the vector (with dimension equal to the number of layers), each element of which represents the disagreements of the clustering on the corresponding layer. For this generalization, we first design an $O(L\log n)$-approximation algorithm, where $L$ is the number of layers. We then study an important special case of our problem, namely the problem with the so-called probability constraint. For this case, we first give an $(α+2)$-approximation algorithm, where $α$ is any possible approximation ratio for the single-layer counterpart. Furthermore, we design a $4$-approximation algorithm, which improves the above approximation ratio of $α+2=4.5$ for the general probability-constraint case. Computational experiments using real-world datasets support our theoretical findings and demonstrate the practical effectiveness of our proposed algorithms.
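The objective above combines per-layer disagreement counts into a single ℓ_p norm. The following minimal sketch evaluates that objective for a given clustering. It assumes unweighted signed layers: '+' means "should be together" and '-' means "should be apart". The function names and the toy layers are illustrative and are not taken from the paper.

```python
import math

def layer_disagreements(labels, layer):
    """Count disagreements of a clustering with one signed layer.

    labels: element -> cluster id; layer: (u, v) -> '+' (similar) or '-' (dissimilar).
    """
    cost = 0
    for (u, v), sign in layer.items():
        same = labels[u] == labels[v]
        if (sign == "+" and not same) or (sign == "-" and same):
            cost += 1
    return cost

def multilayer_objective(labels, layers, p=1.0):
    """l_p norm (p >= 1, or math.inf) of the multilayer-disagreements vector."""
    if p < 1:
        raise ValueError("p must be >= 1")
    vec = [layer_disagreements(labels, layer) for layer in layers]
    if math.isinf(p):
        return max(vec), vec
    return sum(d ** p for d in vec) ** (1.0 / p), vec

layers = [
    {("a", "b"): "+", ("b", "c"): "+", ("a", "c"): "-"},
    {("a", "b"): "-", ("b", "c"): "+", ("a", "c"): "+"},
]
one_cluster = {"a": 0, "b": 0, "c": 0}
singletons = {"a": 0, "b": 1, "c": 2}
print(multilayer_objective(one_cluster, layers, p=2))
print(multilayer_objective(singletons, layers, p=2))
```

Setting p = 1 sums disagreements over layers, while p = ∞ minimizes the worst layer. Intermediate values of p trade off between these two extremes.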

2312.02652 2026-05-20 hep-ex cs.LG 版本更新

What Machine Learning Can Do for Focusing Aerogel Detectors

机器学习如何帮助聚焦气凝胶探测器

Foma Shipilov, Alexander Barnyakov, Viktor Bobrovnikov, Sergey Kononov, Fedor Ratnikov

发表机构 * NRU Higher School of Economics（国立研究大学高等经济学院）； Budker Institute of Nuclear Physics of Siberian Branch Russian Academy of Sciences（俄罗斯科学院西伯利亚分院布德克尔核物理研究所）； Novosibirsk State Technical University（新西伯利亚国立技术大学）； Novosibirsk State University（新西伯利亚国立大学）

AI总结本文提出利用机器学习技术过滤聚焦气凝胶环形成像切伦科夫（RICH）探测器中的背景信号，以减小数据流并提高粒子速度分辨率。

Comments 5 pages, 4 figures, to be published in 26th International Conference on Computing in High Energy & Nuclear Physics (CHEP2023) proceedings

2102.11840 2026-05-20 cs.LG cs.NA math.NA math.PR 版本更新

Convergence rates for gradient descent in the training of overparameterized artificial neural networks with piecewise affine activation

具有分段仿射激活函数的过参数化人工神经网络训练中梯度下降的收敛速度

Arnulf Jentzen, Timo Kröger

AI总结本文研究了在过参数化区域中，使用分段仿射激活函数的人工神经网络经批量梯度下降优化时的收敛速度问题，证明了在网络宽度足够大且学习率足够小的情况下，均方误差以高概率按线性速度收敛到零。

Comments 49 pages

AI中文摘要

近年来，人工神经网络已发展为一种强大的工具，可用于解决许多经典方法已力不能及的问题。然而，尽管目标函数是非凸且非光滑的，采用随机初始化的梯度下降优化算法（如著名的批量梯度下降）为什么在许多情况下能够实现零训练损失，这一点仍不清楚。在监督学习领域，解决该问题最有前景的途径之一是分析梯度下降优化在所谓过参数化区域中的表现。本文考虑具有分段仿射激活函数（如修正线性单元激活函数）的过参数化全连接浅层人工神经网络，对这一研究方向做出进一步贡献。具体而言，在激活函数非仿射且训练输入数据两两不同的条件下，我们证明：只要网络宽度足够大且学习率足够小，通过批量梯度下降优化的随机初始化人工神经网络的均方误差以高概率按线性收敛速度收敛到零。

英文摘要

In recent years, artificial neural networks have developed into a powerful tool for addressing a multitude of problems for which classical solution approaches reach their limits. However, it is still unclear why gradient descent optimization algorithms with random initialization, such as the well-known batch gradient descent, are able to achieve zero training loss in many situations, even though the objective function is non-convex and non-smooth. One of the most promising approaches to solving this issue in the field of supervised learning is the analysis of gradient descent optimization in the so-called overparameterized regime. In this article, we provide a further contribution to this area of research by considering overparameterized fully connected shallow artificial neural networks with piecewise affine activation, such as the rectified linear unit activation. Specifically, given that the activation function is not affine and the training input data are pairwise distinct, we show that, with high probability, the mean squared error of such a randomly initialized artificial neural network optimized via batch gradient descent converges to zero at a linear convergence rate as long as the width of the artificial neural network is sufficiently large and the learning rate is sufficiently small.
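The setting of the theorem can be reproduced as a small experiment. The sketch below trains a wide shallow ReLU network from a random initialization with full-batch gradient descent on pairwise-distinct 1-D inputs. It is an illustration only. The width (32), learning rate (0.01), initialization scales, and data are arbitrary choices made for this example, not the constants from the theorem. `train_shallow_relu` is a hypothetical helper.

```python
import math
import random

def relu(z):
    return z if z > 0.0 else 0.0

def mse(w, b, v, c, xs, ys):
    """Mean squared error of f(x) = sum_j v_j * relu(w_j * x + b_j) + c."""
    total = 0.0
    for x, y in zip(xs, ys):
        pred = sum(vj * relu(wj * x + bj) for wj, bj, vj in zip(w, b, v)) + c
        total += (pred - y) ** 2
    return total / len(xs)

def train_shallow_relu(xs, ys, width=32, lr=0.01, steps=1000, seed=0):
    """Full-batch gradient descent on all parameters from a random initialization."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(width)]
    b = [rng.gauss(0.0, 1.0) for _ in range(width)]
    v = [rng.gauss(0.0, 1.0 / math.sqrt(width)) for _ in range(width)]
    c = 0.0
    n = len(xs)
    losses = [mse(w, b, v, c, xs, ys)]
    for _ in range(steps):
        gw, gb, gv, gc = [0.0] * width, [0.0] * width, [0.0] * width, 0.0
        for x, y in zip(xs, ys):
            z = [wj * x + bj for wj, bj in zip(w, b)]
            r = sum(vj * relu(zj) for vj, zj in zip(v, z)) + c - y
            scale = 2.0 * r / n          # derivative of the mean of squared residuals
            gc += scale
            for j in range(width):
                gv[j] += scale * relu(z[j])
                if z[j] > 0.0:           # ReLU derivative, taken as 0 at the kink
                    gw[j] += scale * v[j] * x
                    gb[j] += scale * v[j]
        # Simultaneous update of all parameters with the gradients computed above.
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b = [bj - lr * g for bj, g in zip(b, gb)]
        v = [vj - lr * g for vj, g in zip(v, gv)]
        c -= lr * gc
        losses.append(mse(w, b, v, c, xs, ys))
    return losses

xs = [-1.0, -0.3, 0.4, 1.0]          # pairwise distinct inputs
ys = [0.5, -0.2, 0.3, 0.8]
losses = train_shallow_relu(xs, ys)
print(losses[0], losses[-1])
```

Plotting `math.log(losses[t])` against `t` is a quick way to inspect whether the decay looks roughly linear on a log scale. A straight line on that plot is what "linear convergence rate" means.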

2605.19755 2026-05-20 cs.SE cs.AI cs.CR cs.LG cs.MA 版本更新

Operationalising Artificial Intelligence Bills of Materials (AIBOMs) for Verifiable AI Provenance and Lifecycle Assurance

人工智能物料清单（AIBOM）的落地实施：面向可验证的AI溯源与生命周期保证

Petar Radanliev, Omar Santos, Carsten Maple, Kay Atefi

AI总结本文提出了一种扩展CycloneDX标准的AIBOM框架，用于捕捉AI特定的溯源、模型血统和披露元数据，通过结构化模式（schema）工程、密码学验证和智能体驱动自动化实现可验证的软件溯源；实验显示其可重复性保真度为98.7%、漏洞匹配精度为96.2%，手动监督减少63%，验证了自动化溯源保证和可重复AI生命周期验证的可行性。

DOI: 10.3389/fcomp.2026.1735919
Journal ref: Front. Comput. Sci. 8:1735919 (2026)

AI中文摘要

人工智能（AI）系统日益依赖复杂的多层软件供应链，这给可重复性、透明性和安全保证带来了挑战。本文提出了一种扩展CycloneDX标准的人工智能物料清单（AIBOM）模式（schema），以捕捉AI特定的溯源、模型血统和披露元数据。该框架通过结构化模式工程、密码学验证和智能体驱动自动化，为可验证的软件溯源提供了一种形式化方法。我们开发了一个自主AI流水线，利用机器可验证的溯源链执行持续的环境检查、漏洞信息丰富和可重复性审计。实证评估显示，在容器化分析工作流中，可重复性保真度为98.7%，漏洞匹配精度为96.2%，手动监督减少了63%。这些结果证实了自动化溯源保证和可重复AI生命周期验证的可行性。AIBOM框架推进了软件供应链透明性和AI可重复性工程的科学基础，为保障AI系统安全、加强溯源完整性以及支持符合国际信息安全标准提供了一种可推广的方法。

英文摘要

Artificial Intelligence (AI) systems are increasingly dependent on complex, multi-layered software supply chains that introduce challenges for reproducibility, transparency, and security assurance. This study presents an Artificial Intelligence Bill of Materials (AIBOM) schema extending the CycloneDX standard to capture AI-specific provenance, model lineage, and disclosure metadata. The framework provides a formalised approach to verifiable software provenance through structured schema engineering, cryptographic validation, and agent-driven automation. An autonomous AI pipeline is developed to perform continuous environment inspection, vulnerability enrichment, and reproducibility auditing using machine-verifiable provenance chains. Empirical evaluation demonstrates 98.7% reproducibility fidelity, 96.2% vulnerability match precision, and a 63% reduction in manual oversight across containerised analytic workflows. These results confirm the feasibility of automated provenance assurance and reproducible AI lifecycle validation. The AIBOM framework advances the scientific foundations of software supply chain transparency and AI reproducibility engineering, offering a generalisable methodology for securing AI systems, strengthening provenance integrity, and supporting compliance with international information security standards.
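The following is a minimal sketch of the two ingredients the abstract combines: a CycloneDX-style component record for a model, and cryptographic validation of the artifact against that record. It uses only standard CycloneDX fields as we understand the 1.5 schema: component type `machine-learning-model`, `hashes`, and `pedigree.ancestors`. These should be checked against the specification before use. The paper's AI-specific schema extensions are not shown. The component names and weight bytes are placeholders.

```python
import hashlib
import json

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def model_component(name, version, weights, parent=None):
    """CycloneDX-style component describing a model artifact, pinned by its SHA-256 digest."""
    component = {
        "type": "machine-learning-model",
        "name": name,
        "version": version,
        "hashes": [{"alg": "SHA-256", "content": sha256_hex(weights)}],
    }
    if parent is not None:
        # Lineage recorded via the CycloneDX pedigree/ancestors field.
        component["pedigree"] = {
            "ancestors": [{"type": "machine-learning-model", "name": parent}]
        }
    return component

def make_bom(components):
    return {"bomFormat": "CycloneDX", "specVersion": "1.5", "version": 1,
            "components": components}

def verify(component, artifact):
    """True if the artifact's SHA-256 digest matches the one recorded in the BOM."""
    recorded = {h["alg"]: h["content"] for h in component.get("hashes", [])}
    return recorded.get("SHA-256") == sha256_hex(artifact)

weights = b"example-model-weights-v2"        # placeholder for a real weights file
comp = model_component("sentiment-classifier", "2.0.0", weights, parent="base-encoder")
bom = make_bom([comp])
print(json.dumps(bom, indent=2))
print(verify(comp, weights), verify(comp, b"tampered"))
```

A pipeline like the one described in the abstract would run checks of this kind continuously. The sketch above only shows the single-artifact case.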

2605.19752 2026-05-20 cs.LG 版本更新

MSAlign: Aligning Molecule and Mass Spectra Foundation Models for Metabolite Identification

MSAlign: 用于代谢物鉴定的分子和质谱基础模型对齐方法

Paul Krzakala, Gabriel Melo, Camille Lançon, Charlotte Laclau, Rémi Flamary, Etienne Thévenot, Florence d'Alché-Buc

发表机构 * LTCI, Télécom Paris & CMAP, Ecole Polytechnique, Institut Polytechnique de Paris（巴黎电信学院LTCI实验室及巴黎综合理工学院CMAP，巴黎理工学院）； LTCI, Télécom Paris, Institut Polytechnique de Paris（巴黎电信学院LTCI实验室，巴黎理工学院）； CEA, INRAE, MetaboHUB, Université Paris-Saclay（法国原子能和替代能源委员会，法国国家农业、食品与环境研究院，MetaboHUB代谢组学平台，巴黎-萨克雷大学）

AI总结本研究提出MSAlign方法，通过多模态对齐技术将分子与质谱基础模型对齐，以提高代谢物鉴定的准确性，并对分子检索数据划分策略中数据泄漏与分布偏移之间的权衡进行了量化分析。

AI中文摘要

从质谱数据中准确鉴定代谢物（即小分子）仍是代谢组学的核心挑战，在药物发现、环境分析和临床研究中有广泛应用。我们研究分子检索任务：给定一组候选分子，根据代谢物的MS/MS谱图恢复其化学结构。尽管MassSpecGym和Spectraverse等近期发布的基准数据集大大加速了新型机器学习方法的发展，但数据预处理流程的复杂性以及缺乏统一实现，使得方法和结果难以复现和比较。我们做出了三项贡献。首先，我们提出一个统一框架，涵盖基于表示对齐和对比学习的近期方法。其次，受视觉-语言模型中多模态对齐的启发，我们提出MSAlign：它通过以基于候选集的对比目标训练的轻量级MLP投影，将两个冻结的基础模型（用于质谱的DreaMS和用于分子的ChemBERTa）对齐，从而学习共享表示空间。MSAlign易于实现、训练快速，并在所有基准上一致优于现有方法。第三，我们研究了一个长期存在的评估问题：分子检索中的数据划分策略隐含地在数据泄漏与领域偏移之间进行权衡。我们通过引入分布偏移的定量度量将这一矛盾形式化，并用它评估现有基准中的划分策略。所有数据集、划分、候选集以及MSAlign和基线的统一实现均已公开发布，以支持可复现研究。

英文摘要

Accurately identifying metabolites, i.e., small molecules, from mass spectrometry data remains a core challenge in metabolomics, with broad applications in drug discovery, environmental analysis, and clinical research. We address the Molecule Retrieval task, which consists in recovering the chemical structure of a metabolite from its MS/MS spectrum given a set of candidate molecules. While the recent release of benchmark datasets such as MassSpecGym and Spectraverse has considerably accelerated the development of novel machine learning approaches, the complexity of data preprocessing pipelines and the lack of unified implementations make methods and results difficult to reproduce and compare. We make three contributions. First, we propose a unified framework encompassing recent approaches based on representation alignment and contrastive learning. Second, we introduce MSAlign, inspired by multimodal alignment in vision-language models, which learns a shared representation space by aligning two frozen foundation models (DreaMS for mass spectra and ChemBERTa for molecules) through lightweight MLP projections trained with a candidate-based contrastive objective. MSAlign is simple to implement, fast to train and consistently outperforms existing approaches across all benchmarks. Third, we investigate a long-standing evaluation problem: data splitting strategies in molecule retrieval implicitly trade off data leakage against domain shift. We formalize this tension by introducing a quantitative measure of distribution shift, and use it to evaluate splitting strategies in existing benchmarks. All datasets, splits, candidate sets, and a unified implementation of MSAlign and baselines are publicly released to support reproducible research.
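The core of MSAlign, as the abstract describes it, is a candidate-based contrastive objective applied to projected embeddings from two frozen encoders. The sketch below shows that loss shape under several simplifying assumptions. Single linear projections stand in for the lightweight MLPs. Short Python lists stand in for DreaMS and ChemBERTa embeddings. The temperature of 0.1 is an arbitrary choice. All function names are hypothetical.

```python
import math

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else v

def similarities(spec_emb, cand_embs, W_s, W_m):
    """Cosine similarity between the projected spectrum and each projected candidate."""
    z = unit(matvec(W_s, spec_emb))
    return [sum(a * b for a, b in zip(z, unit(matvec(W_m, m)))) for m in cand_embs]

def candidate_contrastive_loss(spec_emb, cand_embs, true_idx, W_s, W_m, temperature=0.1):
    """Cross-entropy of the true molecule against the other candidates of the same spectrum."""
    logits = [s / temperature for s in similarities(spec_emb, cand_embs, W_s, W_m)]
    top = max(logits)                          # log-sum-exp shift for stability
    log_norm = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_norm - logits[true_idx]

def rank_candidates(spec_emb, cand_embs, W_s, W_m):
    """Retrieval: candidate indices sorted from most to least similar."""
    sims = similarities(spec_emb, cand_embs, W_s, W_m)
    return sorted(range(len(cand_embs)), key=lambda i: -sims[i])

identity = [[1.0, 0.0], [0.0, 1.0]]
spectrum = [1.0, 0.0]                       # stand-in for a frozen spectrum embedding
candidates = [[1.0, 0.0], [0.0, 1.0]]       # stand-ins for frozen molecule embeddings
print(candidate_contrastive_loss(spectrum, candidates, 0, identity, identity))
print(rank_candidates(spectrum, candidates, identity, identity))
```

In this setup only the projections would be trained, because the encoders stay frozen. That is part of why the method is cheap to train.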

基于扩散Copula的概率多变量时间序列预测

David Huk, Dongshan Wang, Miha Bresar

发表机构 * Department of Statistics The University of Warwick（华威大学统计系）； School of Data Science The Chinese University of Hong Kong, Shenzhen（香港中文大学（深圳）数据科学学院）

AI总结本文提出了一种扩散-Copula框架，通过分离边际分布学习与依赖结构学习，改进了多变量时间序列预测中对尾部风险的估计，展示了在加密货币市场中对系统性极值的预测优势。

Comments ICLR 2026 Workshop Advances in Financial AI

2605.19677 2026-05-20 cs.LG q-bio.QM 版本更新

Agentic Discovery of Cryomicroneedle Formulations

基于智能体的冷冻微针制剂配方发现

Hao Li, Lifu Du, Nurul Hameed, Shemonti Saha Authai, Zlata Stefanovic, Chenjie Xu

发表机构 * Department of Biomedical Engineering, City University of Hong Kong（香港城市大学生物医学工程系）

AI总结本研究提出了一种结合文献整理、高斯过程代理建模、贝叶斯优化和顺序湿实验验证的闭环工作流程，用于发现冷冻微针的冷冻保护剂配方，并通过迭代湿实验验证逐步提高了模型预测的准确性。

AI中文摘要

冷冻微针为微创皮内递送活细胞提供了一条途径，但其低温配方必须兼顾细胞保护与毒性及器件制造方面的约束。本文报告了一种由AI辅助的闭环工作流程，用于冷冻微针冷冻保护剂的发现，结合了文献整理、高斯过程代理建模、贝叶斯优化和顺序湿实验验证。我们将来自42项研究的198种间充质干细胞冷冻保存配方整理为数据集，转换为21个成分特征，并用于训练一个具有不确定性感知的文献先验模型。该模型捕捉到了文献数据中的中等程度结构，但在前瞻性预测中失败，这促使我们进行迭代的湿实验修正。在十轮验证迭代和106次湿实验观测中，模型逐步适应了冷冻微针特有的结果：批次RMSE从41.21个百分点降至6.86个百分点，后期阶段的排名相关性持续为正，累积湿实验预测值与测量值的汇总R²达到0.942。最佳验证配方含低DMSO、依克多因（ectoin）、乙二醇和胎牛血清，实现了95.15%的复苏存活率。然而，仅有高存活率并不能保证冷冻微针完整成形，这凸显了未来开展多目标优化的必要性。这些结果表明，智能体辅助的计算基础设施可以让内部数据专业能力有限的实验室更容易开展数据高效的配方发现。项目代码可在 https://github.com/baitmeister/ML-for-CryoMN 获取。

英文摘要

Cryomicroneedles offer a route to minimally invasive intradermal delivery of living cells, but their cryogenic formulations must reconcile cell protection with constraints on toxicity and device fabrication. Here we report an AI-assisted, closed-loop workflow for cryomicroneedle cryoprotectant discovery that combines literature curation, Gaussian-process surrogate modelling, Bayesian optimization, and sequential wet-lab validation. A curated dataset of 198 mesenchymal stem-cell cryopreservation formulations from 42 studies was converted into 21 ingredient features and used to train an uncertainty-aware literature prior. This model captured moderate structure in the literature data but failed prospectively, motivating iterative wet-lab correction. Across ten validation iterations and 106 wet-lab observations, the model progressively adapted to cryomicroneedle-specific outcomes: batch RMSE decreased from 41.21 to 6.86 percentage points, later-stage rank correlations became consistently positive, and the cumulative wet-lab predicted-versus-measured summary reached $R^2 = 0.942$. The best validated formulation achieved 95.15% post-thaw viability with low DMSO, ectoin, ethylene glycol, and fetal bovine serum. However, high viability alone did not ensure intact cryomicroneedle formation, highlighting the need for future multi-objective optimization. These results demonstrate that agent-assisted computational infrastructure can make data-efficient formulation discovery more accessible to labs with minimal data expertise in-house. Project code is available at https://github.com/baitmeister/ML-for-CryoMN.
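The workflow pairs a Gaussian-process surrogate with Bayesian optimization to choose the next formulation for wet-lab testing. The sketch below is a minimal, dependency-free version of that loop's core step. It assumes a zero-mean GP with an RBF kernel (data should be centered) and 1-D inputs instead of the paper's 21 ingredient features. It uses UCB as the acquisition rule, which is an assumption; the paper's actual acquisition function is not stated in the abstract. All function names are hypothetical.

```python
import math

def rbf(a, b, length=1.0, var=1.0):
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return var * math.exp(-0.5 * d2 / length ** 2)

def cholesky(K):
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution for L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper_t(L, y):
    """Back substitution for L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(X, y, Xq, noise=1e-4, length=1.0):
    """Posterior (mean, variance) of a zero-mean GP at each query point."""
    K = [[rbf(a, b, length) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, y))
    out = []
    for q in Xq:
        k = [rbf(q, a, length) for a in X]
        mean = sum(ki * ai for ki, ai in zip(k, alpha))
        v = solve_lower(L, k)
        out.append((mean, max(rbf(q, q, length) - sum(vi * vi for vi in v), 0.0)))
    return out

def ucb_pick(X, y, candidates, beta=2.0, length=1.0):
    """Index of the candidate maximizing mean + beta * std (upper confidence bound)."""
    post = gp_posterior(X, y, candidates, length=length)
    scores = [m + beta * math.sqrt(v) for m, v in post]
    return max(range(len(candidates)), key=scores.__getitem__)

X = [[0.0], [1.0], [2.0]]          # already-tested (toy) formulations
y = [0.1, 0.5, 0.2]                # centered (toy) viability scores
print(gp_posterior(X, y, [[0.5]]), ucb_pick(X, y, [[0.0], [1.0], [2.0], [5.0]]))
```

After each wet-lab batch, the new observations would be appended to `X` and `y` before the next pick. That append-and-refit step is the "iterative wet-lab correction" loop in its simplest form.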

2605.19667 2026-05-20 math.OC cs.LG 版本更新

Convergence of Consensus-Based Particle Methods for Nonconvex Bi-Level Optimization

非凸双层优化中基于共识的粒子方法的收敛性

Yutong Chao, Xudong Sun, Konstantin Riedl, Majid Khadiv, Jalal Etesami

发表机构 * Department of Computer Science（计算机科学系）； Technical University of Munich（慕尼黑工业大学）； Munich Institute of Robotics and Machine Intelligence（慕尼黑机器人与机器智能研究所）； Mathematical Institute（数学研究所）； University of Oxford（牛津大学）

AI总结本文研究了一种用于非凸双层优化的基于共识的优化方法，其目标是在下层问题的全局极小点集合上最小化上层函数。该方法无需导数，通过将平滑分位数选择与Gibbs型拉普拉斯近似相结合来构建共识点。研究为相应的均场动力学及其有限粒子近似建立了收敛性保证。特别地，在适当的平滑分位数局部化、误差界和稳定性假设下，证明了均场定律能够以显式指数速率到达目标双层解的给定Wasserstein邻域内。数值实验进一步支持了理论结果。
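The consensus point described above is built by first selecting particles that are good for the lower-level problem, then Gibbs-weighting them by the upper-level objective. The sketch below illustrates this construction under simplifications. It uses a hard quantile in place of the paper's smoothed one, and a standard anisotropic CBO Euler-Maruyama step. The parameter values and the toy bi-level problem are illustrative choices.

```python
import math
import random

def consensus_point(particles, upper, lower, quantile=0.5, alpha=50.0):
    """Keep the `quantile` fraction of particles with the smallest lower-level value
    (a hard quantile; the paper uses a smoothed one), then Gibbs-weight them by
    exp(-alpha * upper)."""
    k = max(1, math.ceil(quantile * len(particles)))
    selected = sorted(particles, key=lower)[:k]
    energies = [upper(x) for x in selected]
    e_min = min(energies)                      # shift for numerical stability
    weights = [math.exp(-alpha * (e - e_min)) for e in energies]
    total = sum(weights)
    return [sum(w * x[d] for w, x in zip(weights, selected)) / total
            for d in range(len(particles[0]))]

def cbo_step(particles, upper, lower, rng, lam=1.0, sigma=0.5, dt=0.05, **kw):
    """Euler-Maruyama step: drift toward the consensus point plus anisotropic noise."""
    m = consensus_point(particles, upper, lower, **kw)
    new = []
    for x in particles:
        new.append([xi - lam * dt * (xi - mi)
                    + sigma * abs(xi - mi) * math.sqrt(dt) * rng.gauss(0.0, 1.0)
                    for xi, mi in zip(x, m)])
    return new, m

# Toy bi-level problem: the lower level is minimized at x = 1 and x = 3,
# and the upper level prefers smaller x, so the bi-level solution is x = 1.
def lower(x):
    return ((x[0] - 1.0) * (x[0] - 3.0)) ** 2

def upper(x):
    return x[0]

rng = random.Random(0)
particles = [[rng.uniform(-1.0, 5.0)] for _ in range(200)]
for _ in range(200):
    particles, m = cbo_step(particles, upper, lower, rng)
print(m)
```

No derivative of either objective appears anywhere in the sketch. This matches the "derivative-free" property noted in the summary.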

2605.19666 2026-05-20 physics.med-ph cs.LG 版本更新

从知识图谱嵌入中推断敏感属性：攻击与防御策略

Yasmine Hayder

发表机构 * LIFO, INSA CVL, Univ. Orléans, Inria, France（LIFO实验室，卢瓦尔河谷中央大区国立应用科学学院（INSA CVL），奥尔良大学，法国国家信息与自动化研究所（Inria），法国）

AI总结本文研究了基于知识图谱嵌入（KGE）推理的隐私风险，提出了一种通过后处理净化（sanitization）技术减轻这些风险的框架，并探讨了推荐质量与隐私保护之间的权衡。

Journal ref: ESWC - Extended Semantic Web Conference, May 2026, Dubrovnik, Croatia

AI中文摘要

知识图谱（KG）是一种强大的链接数据表示形式，具有灵活性和丰富的语义，并支持知识丰富与推理。它们帮助数据所有者组织和利用异构数据以提供有洞察力的服务（例如推荐），但现实中的知识图谱往往不完整，隐藏了真实事实或缺失有价值的信息。知识图谱嵌入（KGE）技术常用于推断有价值的缺失信息。然而，对知识图谱的推理可能会无意中暴露敏感的用户信息，即使这些数据并未被显式存储。在本文中，我们研究了基于KGE推理的隐私风险，重点关注属性推断攻击，即攻击者试图从看似不敏感的输出中推断出敏感的用户属性。我们提出并评估了一个框架，通过对KGE输出应用后处理净化技术来减轻这些隐私风险。初步结果展示了这类攻击对KGE模型输出的有效性，并探讨了在应用基于随机化的方法时推荐质量与隐私保护之间的权衡，表明未来工作需要尝试更高级的技术来解决这一问题。

英文摘要

Knowledge Graphs (KGs) are a powerful representation of linked data, offering flexibility, semantic richness, and support for knowledge enrichment and reasoning. They help data owners organize and exploit heterogeneous data to provide insightful services (e.g., recommendations), yet real-world KGs are often incomplete, hiding true facts or missing valuable insights. Knowledge graph embedding techniques are commonly used to infer valuable missing information. However, reasoning over KGs can inadvertently expose sensitive user information, even when such data is not explicitly stored. In this work, we investigate the privacy risks associated with KGE-based reasoning, focusing on attribute inference attacks where adversaries attempt to deduce sensitive user attributes from seemingly non-sensitive outputs. We propose and evaluate a framework that mitigates these privacy risks by applying post-processing sanitization techniques to KGE outputs. Preliminary results demonstrate the effectiveness of these attacks on the outputs of KGE models, and explore the trade-off between recommendation quality and privacy protection when applying randomization-based approaches, highlighting the need to experiment with more advanced techniques in future work to address this issue.
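The threat model and the randomization-based mitigation can be illustrated in a few lines. The following is a minimal sketch, not the paper's framework. It assumes the released outputs are user embeddings and uses a nearest-centroid attacker. It uses additive Gaussian noise as the post-processing sanitizer. The synthetic data, the noise level, and all function names are hypothetical. Only the privacy side of the trade-off is shown; recommendation quality would need a separate measurement.

```python
import random

def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def fit_attack(embeddings, attributes):
    """Hypothetical attacker: one centroid per value of the sensitive attribute."""
    groups = {}
    for e, a in zip(embeddings, attributes):
        groups.setdefault(a, []).append(e)
    return {a: centroid(vs) for a, vs in groups.items()}

def predict(centroids, e):
    return min(centroids, key=lambda a: sum((x - c) ** 2 for x, c in zip(e, centroids[a])))

def sanitize(embedding, noise_std, rng):
    """Randomization-based post-processing: add Gaussian noise before release."""
    return [x + rng.gauss(0.0, noise_std) for x in embedding]

def attack_accuracy(centroids, embeddings, attributes):
    hits = sum(predict(centroids, e) == a for e, a in zip(embeddings, attributes))
    return hits / len(embeddings)

rng = random.Random(0)
attrs = [i % 2 for i in range(400)]
# Synthetic user embeddings whose first coordinate leaks the sensitive attribute.
embs = [[(1.0 if a else -1.0) + rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)] for a in attrs]
attacker = fit_attack(embs, attrs)
clean_acc = attack_accuracy(attacker, embs, attrs)
noisy_acc = attack_accuracy(attacker, [sanitize(e, 5.0, rng) for e in embs], attrs)
print(clean_acc, noisy_acc)
```

The same noise that lowers attack accuracy also distorts whatever the embeddings are used for downstream. That tension is the recommendation-quality versus privacy trade-off the abstract refers to.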

2605.19641 2026-05-20 stat.ML cs.LG 版本更新

Increasing Missingness to Reduce Bias: Richardson-SGD with Missing Data

增加缺失值以减少偏差：带有缺失数据的Richardson-SGD

Ferdinand Genans, Erwan Scornet

发表机构 * Sorbonne Université and Université Paris Cité, CNRS, Laboratoire de Probabilités, Statistique et Modélisation, LPSM（索邦大学和巴黎西岱大学，法国国家科学研究中心（CNRS），概率、统计与建模实验室（LPSM））

AI总结本文研究了如何通过增加缺失值来减少梯度偏差，提出了一种基于Richardson外推的Richardson-SGD方法，该方法通过在已有不完整数据的基础上故意增加缺失率，从而抵消梯度偏差，提高了不完整数据下的优化和估计性能。

AI中文摘要

随机梯度方法是现代大规模学习的核心，但在协变量不完整时使用它们仍需谨慎，因为插补方案通常会引入系统性的梯度偏差（这一点已在线性模型中得到证明）。在本工作中，我们证明了对于多种插补程序，所有参数模型都表现出类似的梯度偏差，并精确刻画了其对缺失率向量p的依赖关系，其中O(||p||)为主导项。基于这一分析，我们利用梯度偏差的精确表达式，提出了一种基于Richardson外推的简单去偏程序，用于带缺失值的随机梯度下降（SGD）。其关键思想是“故意增加缺失”：从已经不完整的观测出发，生成一个在更高的受控缺失水平下进一步稀疏化的版本，并将两者得到的随机梯度组合以抵消主导偏差项。我们证明，在几种缺失情形下，一步Richardson外推可将梯度偏差从O(||p||)降低到O(||p||²)。所提方法计算高效、与模型无关，适用于任何在插补后可计算随机梯度的参数损失函数。此外，当缺失指示变量相互独立时，总体梯度偏差是p的多线性多项式，且仅依赖于将单个坐标设为缺失所引起的总体梯度误差。在这种情况下，我们的方法可推广为多步Richardson程序，递归地抵消更高阶项。实验表明，Richardson去偏在多个广义线性模型上改善了优化和估计性能，并能与MICE等广泛使用的插补程序良好结合。这些结果表明，有些反直觉的是，在已有缺失数据之上再加入受控的缺失，可以使从不完整数据进行的随机学习更加准确。

英文摘要

Stochastic gradient methods are central to modern large-scale learning, but their use with incomplete covariates remains delicate since imputation schemes generally introduce systematic gradient biases, as shown for linear models. In this work, we prove that all parametric models exhibit similar gradient bias for various imputation procedures and characterize exactly the dependence on the missingness ratio vector $p$, with $O(\|p\|)$ as the leading term. We exploit this analysis to propose a simple debiasing procedure for stochastic gradient descent (SGD) with missing values based on Richardson extrapolation, which leverages the exact expression of the gradient bias. The key idea is to \emph{deliberately add missingness}: from an already incomplete observation, we generate a further-thinned version at a higher, controlled missingness level, and combine the two resulting stochastic gradients to cancel the leading bias term. We prove that one Richardson step reduces the gradient bias from $O(\|p\|)$ to $O(\|p\|^2)$ under several missingness scenarios. Our proposed method is computationally efficient, model-agnostic and applies to any parametric loss whose stochastic gradient can be computed after imputation. Furthermore, when missing indicators are independent, the population gradient bias is a multilinear polynomial in $p$ and depends only on population gradient errors induced by declaring a single coordinate missing. In this case, our method generalizes to a multi-step Richardson procedure which recursively cancels higher-order terms. Empirically, Richardson debiasing improves optimization and estimation across several generalized linear models and combines positively with widely used imputation procedures such as MICE. These results suggest that, somewhat counter-intuitively, adding controlled missingness on top of existing missing data can make stochastic learning from incomplete data more accurate.
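The "deliberately add missingness" idea reduces to one Richardson step when the higher level is q = 2p: use 2·g_p − g_q instead of g_p. The following minimal sketch shows this under toy assumptions: a single Gaussian covariate, zero imputation, noiseless y = θ*·x, and a scalar missingness rate p. In this toy setting the bias of the zero-imputed gradient is exactly linear in p, so one step removes it entirely. In general, per the abstract, one step only reduces the bias from O(‖p‖) to O(‖p‖²). `richardson_demo` is a hypothetical helper, not the authors' code.

```python
import random

def zero_imputed_grad(theta, x, y, observed):
    """Gradient of 0.5 * (theta * x - y)^2 after zero-imputing a missing x."""
    x_imp = x if observed else 0.0
    return (theta * x_imp - y) * x_imp

def richardson_demo(p, theta, theta_star, n=50_000, seed=0):
    """Average naive, one-step Richardson (q = 2p) and full-data gradients."""
    if not 0.0 <= p < 0.5:
        raise ValueError("need 0 <= p < 0.5 so that q = 2p < 1")
    rng = random.Random(seed)
    q = 2.0 * p
    keep = (1.0 - q) / (1.0 - p)   # P(still observed | observed), so total missingness is q
    naive = debiased = full = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = theta_star * x                        # covariate may be missing, response observed
        obs_p = rng.random() >= p                 # observed at missingness level p
        obs_q = obs_p and rng.random() < keep     # further-thinned copy at level q
        g_p = zero_imputed_grad(theta, x, y, obs_p)
        g_q = zero_imputed_grad(theta, x, y, obs_q)
        naive += g_p
        debiased += 2.0 * g_p - g_q               # cancels the term linear in p
        full += (theta * x - y) * x               # gradient had x never been missing
    return naive / n, debiased / n, full / n

naive, debiased, full = richardson_demo(p=0.3, theta=1.0, theta_star=0.0)
print(naive, debiased, full)
```

In this toy setting, E[naive] = (1 − p)·E[full] and E[g_q] = (1 − 2p)·E[full]. Their combination 2·g_p − g_q therefore has expectation E[full].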

2605.19633 2026-05-20 cs.CL cs.AI cs.LG cs.NE cs.SE 版本更新

optimize_anything: A Universal API for Optimizing any Text Parameter

optimize_anything: 一个用于优化任何文本参数的通用API

Lakshya A Agrawal, Donghyun Lee, Shangyin Tan, Wenjie Ma, Karim Elmaaroufi, Rohit Sandadi, Sanjit A. Seshia, Koushik Sen, Dan Klein, Ion Stoica, Joseph E. Gonzalez, Omar Khattab, Alexandros G. Dimakis, Matei Zaharia

发表机构 * MIT（麻省理工学院）

AI总结本文提出了一种基于LLM的通用优化系统，能够跨不同领域实现文本参数的优化，展示了其在六个多样化任务中的state-of-the-art性能，通过多任务搜索和跨问题迁移实现了高效的优化。

Comments 16 pages, 11 figures; Blog: https://gepa-ai.github.io/gepa/blog/2026/02/18/introducing-optimize-anything/

DOI: 10.1145/3786335.3813167
Journal ref: Proceedings of the ACM Conference on AI and Agentic Systems (CAIS 26), May 26-29, 2026, San Jose, CA, USA

AI中文摘要

单一的基于LLM的优化系统能否在根本不同的领域中媲美专用工具？我们表明，当优化问题被表述为改进一个由评分函数评估的文本工件时，单一的基于AI的优化系统（支持单任务搜索、带跨问题迁移的多任务搜索以及对未见输入的泛化）在六个不同任务上取得了最先进的结果。我们的系统发现了将Gemini Flash在ARC-AGI上的准确率提高近三倍（从32.5%到89.5%）的智能体架构，找到了将云成本降低40%的调度算法，生成的CUDA内核中有87%达到或超过PyTorch，并在圆打包问题（n=26）上优于AlphaEvolve报告的解。在三个领域上的消融研究表明，与仅有评分的反馈相比，可操作的辅助信息能带来更快的收敛和显著更高的最终得分；并且在每个问题预算相同的情况下，多任务搜索借助跨任务迁移优于独立优化，其收益随相关任务数量的增加而增长。综上，我们首次表明，基于LLM搜索的文本优化是一种通用的问题求解范式，可将传统上需要领域特定算法的任务统一到单一框架下。我们将optimize_anything作为GEPA项目的一部分开源，并支持多种后端，地址为 https://github.com/gepa-ai/gepa 。

英文摘要

Can a single LLM-based optimization system match specialized tools across fundamentally different domains? We show that when optimization problems are formulated as improving a text artifact evaluated by a scoring function, a single AI-based optimization system (supporting single-task search, multi-task search with cross-problem transfer, and generalization to unseen inputs) achieves state-of-the-art results across six diverse tasks. Our system discovers agent architectures that nearly triple Gemini Flash's ARC-AGI accuracy (32.5% to 89.5%), finds scheduling algorithms that cut cloud costs by 40%, generates CUDA kernels where 87% match or beat PyTorch, and outperforms AlphaEvolve's reported circle packing solution (n=26). Ablations across three domains reveal that actionable side information yields faster convergence and substantially higher final scores than score-only feedback, and that multi-task search outperforms independent optimization given equivalent per-problem budget through cross-task transfer, with benefits scaling with the number of related tasks. Together, we show for the first time that text optimization with LLM-based search is a general-purpose problem-solving paradigm, unifying tasks traditionally requiring domain-specific algorithms under a single framework. We open-source optimize_anything with support for multiple backends as part of the GEPA project at https://github.com/gepa-ai/gepa .
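The abstract frames every task as "improve a text artifact scored by a function" and credits actionable side information for faster convergence. The sketch below shows that loop shape in its simplest form. It is hypothetical and is not the optimize_anything API. It makes several assumptions. The scorer returns a score together with side information. A deterministic string-editing function stands in for the LLM proposer. The search is a greedy accept-if-better loop. The target string is a toy.

```python
TARGET = "hello world"   # hypothetical target the toy scorer rewards

def score_with_side_info(text):
    """Score = matching characters; side info = first wrong position and expected char."""
    matches = sum(a == b for a, b in zip(text, TARGET))
    for i, (a, b) in enumerate(zip(text, TARGET)):
        if a != b:
            return matches, (i, b)
    return matches, None

def guided_proposer(text, side_info):
    """Stand-in for an LLM proposer that acts on the side information."""
    if side_info is None:
        return text
    i, ch = side_info
    return text[:i] + ch + text[i + 1:]

def optimize_text(seed_text, evaluate, propose, budget=50):
    """Greedy accept-if-better loop over candidate texts (illustrative only)."""
    best_text = seed_text
    best_score, best_info = evaluate(seed_text)
    history = [(best_text, best_score)]
    for _ in range(budget):
        candidate = propose(best_text, best_info)
        score, info = evaluate(candidate)
        history.append((candidate, score))
        if score > best_score:
            best_text, best_score, best_info = candidate, score, info
    return best_text, best_score, history

best, score, history = optimize_text("xxxxx xxxxx", score_with_side_info,
                                     guided_proposer, budget=20)
print(best, score)
```

With score-only feedback, the proposer would have to guess both where and what to change. The side information removes that guesswork. This is a toy analogue of the convergence gap reported in the ablations.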

2605.19629 2026-05-20 stat.ML cs.LG math.OC 版本更新

Gaussian Approximation and Multiplier Bootstrap for Federated Linear Stochastic Approximation

联邦线性随机逼近的高斯近似与乘子自助法

Ilya Levin, Maksim Shuklin, Eric Moulines, Paul Mangold, Sergey Samsonov

发表机构 * HSE University（国立研究大学高等经济学院）； MBZUAI（穆罕默德·本·扎耶德人工智能大学）； CMAP, CNRS, École Polytechnique, Institut Polytechnique de Paris（巴黎综合理工学院应用数学中心（CMAP），法国国家科学研究中心（CNRS），巴黎理工学院）

AI总结本文为联邦线性随机逼近建立了Berry-Esseen型界，给出了首个显式刻画通信-计算权衡与异质性误差项的联邦高斯近似结果，并量化了局部步长、局部更新次数和异质性对收敛速率的影响。
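The quantities named in the summary (local step size, number of local updates, heterogeneity) can be seen in a minimal federated linear stochastic approximation loop. This is a sketch under stated assumptions. It uses scalar FedAvg-style local updates followed by server averaging, two agents, and optional additive Gaussian noise. `fed_lsa` is a hypothetical helper; the paper's exact algorithm and its Gaussian approximation and bootstrap analysis are not reproduced here.

```python
import random

def fed_lsa(A, b, rounds=2000, local_steps=5, step=0.01, noise_std=0.0, seed=0):
    """Scalar federated LSA: each agent k runs `local_steps` updates toward a_k * t = b_k,
    then the server averages the local iterates (one communication round)."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(rounds):
        local = []
        for a_k, b_k in zip(A, b):
            t = theta
            for _ in range(local_steps):
                noise = noise_std * rng.gauss(0.0, 1.0) if noise_std > 0 else 0.0
                t -= step * (a_k * t - b_k + noise)   # noisy local linear update
            local.append(t)
        theta = sum(local) / len(local)               # server averaging
    return theta

A, b = [1.0, 2.0], [1.0, 4.0]          # heterogeneous agents: local solutions 1 and 2
target = sum(b) / sum(A)               # solution of the averaged system, 5/3
print(fed_lsa(A, b, local_steps=1) - target, fed_lsa(A, b, local_steps=5) - target)
```

In this noise-free toy, one local step per round converges to the averaged solution. With five local steps, each agent drifts toward its own solution between rounds, which leaves a small heterogeneity-induced offset. More local steps save communication rounds at the price of this offset, which is the kind of communication-computation trade-off the summary mentions.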

2605.19625 2026-05-20 cs.LG 版本更新

Optimal Reconstruction from Linear Queries

从线性查询中最优重建

Yuval Filmus, Shay Moran, Elizaveta Nesterova

发表机构 * Technion – Israel Institute of Technology（以色列理工学院）； Google Research（谷歌研究院）

AI总结本文研究如何从近似线性查询中重建未知点，分析查询数量、维度和噪声参数对最优重建误差的影响，并研究了重建问题的一个非正规（improper）变体。

Comments Accepted to COLT 2026. 46 pages, 4 figures

AI中文摘要

我们研究从近似线性查询中重建$\mathbb{R}^d$中未知点的问题。这一设定自然出现在从低维遥感和信号恢复到高维数据分析和隐私敏感推断的各类应用中。我们的主要目标是将最优重建误差刻画为查询数量$T$、环境维度$d$和噪声参数$\delta$的函数。我们首先分析$T \to \infty$的极限，证明最优重建误差收敛到显式值$\sqrt{2d/(d+1)} \delta$，其作用类似于监督学习中的贝叶斯最优误差。当维度固定时，我们证明超出该极限的额外误差随$T \to \infty$以双指数速度衰减，远快于学习曲线中通常遇到的速率。当维度增长时，我们证明数量级为$\exp(d)$的查询对于使额外误差趋于零是充分且必要的。最后，我们提出并分析了重建问题的一个非正规（improper）变体。从技术角度看，我们的主要贡献是对Jung定理（1901）的推广。经典定理界定了直径为1的集合的最大可能半径，并刻画了极值体。我们的推广给出了一个鲁棒版本，刻画了近极值体，并通过利用对称性和李群作用的几何与动力学论证加以证明。

英文摘要

We study the problem of reconstructing an unknown point in $\mathbb{R}^d$ from approximate linear queries. This setting arises naturally in applications ranging from low-dimensional remote sensing and signal recovery to high-dimensional data analysis and privacy-sensitive inference. Our main goal is to characterize the optimal reconstruction error as a function of the number of queries $T$, the ambient dimension $d$, and the noise parameter $δ$. We first analyze the limit $T \to \infty$ and show that the optimal reconstruction error converges to the explicit value $\sqrt{2d/(d+1)} δ$, which plays a role analogous to the Bayes optimal error in supervised learning. When the dimension is fixed, we show that the excess error above this limit decays doubly exponentially fast as $T \to \infty$, a rate that is significantly faster than those typically encountered in learning curves. When the dimension grows, we show that a number of queries on the order of $\exp(d)$ is necessary and sufficient to achieve vanishing excess error. Finally, we introduce and analyze an improper variant of the reconstruction problem. From a technical perspective, our main contribution is a generalization of Jung's theorem (1901). The classical theorem bounds the maximum possible radius of a set of diameter 1 and characterizes extremal bodies. Our generalization provides a robust variant that characterizes near-extremal bodies and is proved via geometric and dynamical arguments exploiting symmetry and Lie group actions.
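The closed-form limit $\sqrt{2d/(d+1)}\,\delta$ coincides with Jung's circumradius bound for a set of diameter $2\delta$. That bound states that any set of diameter $D$ in $\mathbb{R}^d$ lies in a ball of radius $D\sqrt{d/(2(d+1))}$. Reading $2\delta$ as the diameter of the set of points consistent with all $\pm\delta$-accurate answers is our interpretation, offered only as intuition for why Jung's theorem appears; it is not a statement taken from the abstract. The short check below confirms the arithmetic identity and shows that the limit grows from $\delta$ at $d = 1$ toward $\sqrt{2}\,\delta$ as $d \to \infty$.

```python
import math

def limit_error(d, delta):
    """sqrt(2d / (d + 1)) * delta, the T -> infinity value quoted in the abstract."""
    return math.sqrt(2 * d / (d + 1)) * delta

def jung_radius(d, diameter):
    """Jung's bound: any set of this diameter in R^d fits in a ball of this radius."""
    return diameter * math.sqrt(d / (2 * (d + 1)))

for d in (1, 2, 3, 10, 1000):
    print(d, limit_error(d, 1.0), jung_radius(d, 2.0))
```

Algebraically, $2\delta\sqrt{d/(2(d+1))} = \delta\sqrt{4d/(2(d+1))} = \delta\sqrt{2d/(d+1)}$.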

2605.19621 2026-05-20 eess.IV cs.LG cs.NA math.NA 版本更新

Diffusion Graph Posterior Sampling for Nonlinear Inverse Problems with Application to Electrical Impedance Tomography

基于扩散后验采样的图结构数据非线性反问题求解方法及其在电阻抗断层成像中的应用

Giovanni S. Alberti, Damiana Lazzaro, Serena Morigi, Matteo Santacesaria, Shibo Wang

发表机构 * MaLGa Center, Department of Mathematics, University of Genova（马尔加中心，数学系，热那亚大学）； Department of Mathematics, University of Bologna（数学系，博洛尼亚大学）； Department of Mathematics, Harbin Institute of Technology（数学系，哈尔滨工业大学）

AI总结本文提出了一种将扩散后验采样（DPS）扩展到图结构数据的框架，通过在二维三角网格上直接训练无条件的基于分数的扩散模型来学习物理解空间的准确先验，并引入正则化变体RDPS，结合总变差和广义Tikhonov等显式正则化项以缓解严重的病态性；实验表明RDPS在合成和真实二维EIT数据集上产生稳定且物理合理的重建。

AI中文摘要

深度生成模型已成为求解反问题的最先进方法，但将其应用于偏微分方程（PDE）反问题，如电阻抗断层成像（EIT），仍具挑战性。由于物理域自然地被离散为非结构化网格而非规则网格，标准卷积架构往往并不适用。本文提出了一种将扩散后验采样（DPS）扩展到图结构数据的新框架。我们直接在二维三角网格上构建无条件的基于分数的扩散模型，以学习物理解空间的准确先验。此外，我们引入正则化变体RDPS，结合总变差和广义Tikhonov等显式正则化项，以补充隐式扩散先验并缓解严重的病态性。在合成和真实二维EIT数据集上的大量实验表明，RDPS能产生稳定且物理合理的重建。我们的方法能够很好地泛化到分布外的包含物几何形状，对测量噪声具有很强的鲁棒性，并在重建精度和伪影抑制方面优于当前最先进的求解器（例如GPnP-BM3D、DP-SGS）。

英文摘要

Deep generative models have emerged as state-of-the-art for solving inverse problems, but applying them to inverse problems for PDEs, like electrical impedance tomography (EIT), remains challenging. Because physical domains are naturally discretized as unstructured meshes rather than regular grids, standard convolutional architectures are often inadequate. In this paper, we propose a novel framework that extends diffusion posterior sampling (DPS) to graph-structured data. We develop an unconditional score-based diffusion model directly on a 2D triangular mesh to learn an accurate prior over the physical solution space. Furthermore, we introduce a regularized variant, RDPS, which incorporates explicit regularization terms, such as total variation and generalized Tikhonov, to complement the implicit diffusion prior and mitigate severe ill-posedness. Extensive experiments on synthetic and real 2D EIT datasets demonstrate that RDPS produces stable, physically plausible reconstructions. Our approach generalizes well to out-of-distribution inclusion geometries, is highly robust to measurement noise, and outperforms current state-of-the-art solvers (e.g., GPnP-BM3D, DP-SGS) in reconstruction accuracy and artifact reduction.
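RDPS adds explicit regularizers such as total variation to the DPS data-consistency guidance. On a mesh, total variation is naturally defined over graph edges. The following minimal sketch shows a smoothed graph TV term and one combined guidance step, under simplifying assumptions. A linear operator `A` stands in for the nonlinear EIT forward map. The gradient is taken with respect to the current estimate rather than through a denoiser's x̂₀(x_t). The score network and reverse-diffusion update are omitted entirely. The step sizes `zeta` and `lam` are arbitrary. The function names are hypothetical and do not describe the authors' implementation.

```python
import math

def graph_tv(x, edges, eps=1e-6):
    """Smoothed total variation of nodal values x over mesh edges."""
    return sum(math.sqrt((x[i] - x[j]) ** 2 + eps) for i, j in edges)

def graph_tv_grad(x, edges, eps=1e-6):
    g = [0.0] * len(x)
    for i, j in edges:
        d = x[i] - x[j]
        s = d / math.sqrt(d * d + eps)
        g[i] += s
        g[j] -= s
    return g

def misfit(x, y, A):
    """||y - A x||^2 for a linear stand-in A of the (nonlinear) EIT forward map."""
    return sum((yi - sum(a * xj for a, xj in zip(row, x))) ** 2 for row, yi in zip(A, y))

def misfit_grad(x, y, A):
    g = [0.0] * len(x)
    for row, yi in zip(A, y):
        r = yi - sum(a * xj for a, xj in zip(row, x))
        for j, a in enumerate(row):
            g[j] -= 2.0 * a * r
    return g

def regularized_correction(x, y, A, edges, zeta=0.1, lam=0.01):
    """One guidance step on misfit + lam * TV, applied to the current estimate."""
    gm, gt = misfit_grad(x, y, A), graph_tv_grad(x, edges)
    return [xi - zeta * (a + lam * b) for xi, a, b in zip(x, gm, gt)]

edges = [(0, 1), (1, 2), (0, 2)]       # a single triangle of a mesh
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y = [1.0, 2.0, 3.0]
x0 = [0.0, 0.0, 0.0]
x1 = regularized_correction(x0, y, A, edges)
print(misfit(x0, y, A), misfit(x1, y, A), graph_tv(x1, edges))
```

Defining TV through an edge list is what lets the same code run on any unstructured mesh. That is the motivation the abstract gives for moving from grids to graphs.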

2605.19619 2026-05-20 cs.LG cs.AI math.OC stat.ML 版本更新

MiMuon: Mixed Muon Optimizer with Improved Generalization for Large Models

MiMuon：一种用于大模型、具有更好泛化能力的混合Muon优化器

Feihu Huang, Yuning Luo, Songcan Chen

发表机构 * College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics（南京航空航天大学计算机科学与技术学院）； MIIT Key Laboratory of Pattern Analysis and Machine Intelligence（工业和信息化部模式分析与机器智能重点实验室）； College of Design and Engineering, National University of Singapore（新加坡国立大学设计与工程学院）

AI总结本文研究了Muon优化器的泛化误差，提出了一种改进的混合Muon优化器MiMuon，证明其泛化误差更低，同时保持与Muon优化器相同的收敛速度。

Comments 25 pages

AI中文摘要

矩阵结构的参数在许多人工智能模型（例如大语言模型）中频繁出现。最近，针对大规模模型的矩阵参数设计了一种高效的Muon优化器，其收敛速度明显快于逐向量算法。尽管已有工作开始研究Muon优化器的收敛性质（即优化误差），但其泛化性质（即泛化误差）尚未建立。因此，本文基于算法稳定性和数学归纳法研究Muon优化器的泛化误差，并证明Muon优化器的泛化误差为O(1/(Nκ^T))，其中N为训练样本数量，T为迭代次数，κ>0表示梯度估计奇异值之间的最小差值。为了增强Muon的泛化能力，我们通过审慎地使用梯度正交化，提出了一种有效的混合Muon（MiMuon）优化器，它是Muon与基于动量的SGD优化器的混合。随后我们证明，MiMuon优化器的泛化误差为O(1/N)，由于κ通常非常小，这低于Muon优化器的O(1/(Nκ^T))。同时，我们还研究了MiMuon算法的收敛性质，并证明其具有与Muon算法相同的收敛速度O(1/T^{1/4})。在训练Qwen3-0.6B和YOLO26m等大模型上的数值实验结果展示了MiMuon优化器的效率。

英文摘要

Matrix-structured parameters frequently appear in many artificial intelligence models such as large language models. More recently, an efficient Muon optimizer is designed for matrix parameters of large-scale models, and shows markedly faster convergence than the vector-wise algorithms. Although some works have begun to study convergence properties (i.e., optimization error) of the Muon optimizer, its generalization properties (i.e., generalization error) are still not established. Thus, in this paper, we study generalization error of the Muon optimizer based on algorithmic stability and mathematical induction, and prove that the Muon has a generalization error of $O\big(\frac{1}{Nκ^{T}}\big)$, where $N$ is training sample size, and $T$ denotes iteration number, and $κ>0$ denotes minimum difference between singular values of gradient estimate. To enhance generalization of the Muon, we propose an effective mixed Muon (MiMuon) optimizer by cautiously using orthogonalization of gradient, which is a hybrid of Muon and momentum-based SGD optimizers. Then we prove that our MiMuon optimizer has a lower generalization error of $O\big(\frac{1}{N}\big)$ than $O\big(\frac{1}{Nκ^{T}}\big)$ of Muon optimizer, since $κ$ generally is very small. Meanwhile, we also study the convergence properties of our MiMuon algorithm, and prove that our MiMuon algorithm has the same convergence rate of $O(\frac{1}{T^{1/4}})$ as the Muon algorithm. Some numerical experimental results on training large models including Qwen3-0.6B and YOLO26m demonstrate the efficiency of the MiMuon optimizer.
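Muon's distinguishing step is replacing the momentum matrix with an approximation of its orthogonal polar factor $UV^\top$. MiMuon, per the abstract, mixes that step with plain momentum SGD. The following minimal sketch implements the orthogonalization with the classic cubic Newton-Schulz iteration, which converges when the singular values lie in $(0, \sqrt{3})$. Note that practical Muon implementations commonly use a tuned quintic variant with fewer steps. The mixing rule `mix * O + (1 - mix) * M` is a hypothetical convex combination chosen only for illustration, because the abstract does not specify how MiMuon combines the two directions.

```python
import math

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def newton_schulz(G, steps=30):
    """Approximate the orthogonal polar factor U V^T of G (G = U S V^T) with the
    classic cubic Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X."""
    norm = math.sqrt(sum(v * v for row in G for v in row))
    if norm == 0.0:
        return [row[:] for row in G]
    X = [[v / norm for v in row] for row in G]     # singular values now in (0, 1]
    for _ in range(steps):
        XXtX = matmul(matmul(X, transpose(X)), X)
        X = [[1.5 * a - 0.5 * c for a, c in zip(ra, rc)] for ra, rc in zip(X, XXtX)]
    return X

def mixed_update(W, G, M, lr=0.02, beta=0.9, mix=0.5):
    """Hypothetical mixed step: update the momentum buffer, then move along a convex
    combination of the orthogonalized (Muon-style) and raw (SGD-momentum) directions."""
    M = [[beta * m + g for m, g in zip(rm, rg)] for rm, rg in zip(M, G)]
    O = newton_schulz(M)
    W = [[w - lr * (mix * o + (1 - mix) * m) for w, o, m in zip(rw, ro, rm)]
         for rw, ro, rm in zip(W, O, M)]
    return W, M

print(newton_schulz([[2.0, 1.0], [1.0, 3.0]]))   # polar factor of an SPD matrix: the identity
```

Orthogonalization equalizes all singular directions of the update. This is one intuitive reason why its behavior can depend on gaps between singular values, such as the κ in the bound above.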

2605.19618 2026-05-20 cs.LG stat.ME 版本更新

A Family of Divergence Measures for Evaluating the Reconstruction Quality of Explainable Ensemble Trees

用于评估可解释集成树重建质量的一类散度度量

Massimo Aria, Agostino Gnasso, Carmela Iorio

发表机构 * Department of Economics and Statistics, University of Naples Federico II（那不勒斯费德里科二世大学经济与统计系）

AI总结本文提出了一种基于发散度的度量框架，用于评估可解释性集成树的重建质量，通过区分一致性和关联性，提供了一种新的诊断方法来识别重建失败的具体原因。

AI中文摘要

验证集成学习器的可解释替代模型，需要度量集成模型内部表示与其替代近似之间的一致性，而不仅仅是关联性。基于相关性的方法具有尺度不变性，无法检测共现结构中的系统性差异。我们提出了一个基于一致性与关联性区分的统计框架，其核心是归一化可解释性损失（nLoI）。nLoI以λ=2的Cressie-Read幂散度族为基础，可闭式分解为节点内和节点间两部分，从而提供独特的诊断能力，精确定位重建失败的位置和原因。该框架包含四个互补的度量，刻画近似质量的不同结构方面。统一的置换检验程序在单次重抽样过程中为所有度量提供有效推断。我们为每个度量建立了有界性和对称性等理论性质。蒙特卡洛模拟和实证评估证实了精确的第一类错误控制，并表明这些度量能够检测出基于相关性的方法无法察觉的重建保真度梯度。该框架在可解释集成树（E2Tree）的背景下提出和说明，在三个基准数据集上的实证评估展示了其实用价值。

英文摘要

Validating interpretable surrogate models for ensemble learners requires measuring agreement between the ensemble's internal representation and its surrogate approximation, rather than mere association. Correlation-based approaches are scale-invariant and fail to detect systematic discrepancies in co-occurrence structure. We propose a statistical framework grounded in the agreement-association distinction, centered on the normalized Loss of Interpretability (nLoI). Rooted in the Cressie-Read power divergence family with lambda equal to 2, the nLoI admits a closed-form decomposition into within-node and between-node components, providing a unique diagnostic capability to identify precisely where and why reconstruction fails. The framework incorporates four complementary measures capturing distinct structural facets of approximation quality. A unified permutation testing procedure delivers valid inference for all measures within a single resampling pass. Theoretical properties, including boundedness and symmetry, are established for each metric. Monte Carlo simulations and empirical evaluations confirm exact Type I error control and demonstrate that these measures detect reconstruction fidelity gradients invisible to correlation-based alternatives. The framework is developed and illustrated in the context of Explainable Ensemble Trees (E2Tree), and empirical evaluation on three benchmark datasets illustrates the practical utility of the framework.
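The nLoI is rooted in the Cressie-Read power-divergence family with λ = 2. For reference, the sketch below evaluates that family, $\frac{2}{\lambda(\lambda+1)}\sum_i O_i\big[(O_i/E_i)^\lambda - 1\big]$, for observed and expected counts. It also checks that λ = 1 recovers Pearson's chi-square when the totals match. How the paper normalizes this quantity into the nLoI, and how it splits it into within-node and between-node parts, is not given in the abstract and is not attempted here. The counts are toy values.

```python
def cressie_read(observed, expected, lam=2.0):
    """Cressie-Read power divergence 2 / (lam (lam + 1)) * sum O [(O / E)^lam - 1].

    lam = 1 gives Pearson's chi-square (when totals match); lam = 2 is the member
    the abstract says the nLoI is rooted in.
    """
    if lam in (0.0, -1.0):
        raise ValueError("lam = 0 and lam = -1 are defined only as limits")
    total = 0.0
    for o, e in zip(observed, expected):
        if o > 0:                      # zero cells contribute 0 when lam > -1
            total += o * ((o / e) ** lam - 1.0)
    return 2.0 / (lam * (lam + 1.0)) * total

obs = [10.0, 20.0, 30.0]
expected_counts = [20.0, 20.0, 20.0]
pearson = sum((o - e) ** 2 / e for o, e in zip(obs, expected_counts))
print(cressie_read(obs, expected_counts, lam=1.0), pearson,
      cressie_read(obs, expected_counts, lam=2.0))
```

Unlike a correlation, this divergence is not scale-invariant. Counts that are proportional but not equal still yield a positive value. This is the agreement-versus-association distinction the framework is built on.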

2605.19610 2026-05-20 stat.ML cs.LG 版本更新

HiLiftAeroML：面向高升力飞机气动性能的高保真计算流体力学数据集

Neil Ashton, Adam Clark, Liam Heidt, Christopher Ivey, Sanjeeb Bose, Rahul Agrawal, Konrad Goc, Rishi Ranade, Corey Adams, Peter Sharpe, Sheel Nidhan, Semit Akkurt, Daniel Leibovici, Jean Kossaifi

发表机构 * NVIDIA（英伟达）

AI总结本文介绍了首个开源的高升力飞机高保真计算流体力学（CFD）数据集，用于AI代理模型开发。该数据集包含1800个样本，源自AIAA高升力预测研讨会系列所用的高升力NASA通用研究模型（CRM）几何体的180种几何变体和10个攻角。其创新之处在于每个算例均采用GPU加速的高保真显式壁模型大涡模拟（LES）方法，并使用3亿至5亿网格单元的解自适应网格，以便在稳态RANS方法于飞行包线这些区域存在已知困难的情况下尽可能保证精度。整个数据集（几何体、时间平均的体变量和表面变量以及积分力）以宽松的开源许可（CC-BY-4.0）免费提供，旨在加速航空航天工业中AI代理建模的研究与开发。

Abstract

This paper describes the first-ever open-source high-fidelity CFD dataset of a high-lift aircraft for the purpose of AI surrogate model development. The dataset is composed of 1800 samples, arising from 180 geometry variants and 10 angles of attack for the high-lift NASA Common Research Model (CRM) geometry, used within the AIAA High-Lift Prediction Workshop series. One of the novelties of this dataset is the use of a GPU-accelerated high-fidelity explicit, wall-modeled LES approach for each simulation, using solution-adapted grids between 300M and 500M cells. This ensures the greatest possible accuracy given known challenges in steady-state RANS approaches for these portions of the flight envelope. The entire dataset (geometries, time-averaged volume and surface variables, and integral forces) is available free of charge under a permissive open-source license (CC-BY-4.0). By making this data publicly available, we aim to accelerate the research and development of AI surrogate modeling within the aerospace industry.

2605.19562 2026-05-20 cs.RO cs.LG math.OC (updated version)

Learning-Accelerated Optimization-based Trajectory Planning for Cooperative Aerial-Ground Handover Missions

Jingshan Chen, Bochen Yu, Henrik Ebel, Peter Eberhard

Affiliations: Institute of Engineering and Computational Mechanics, University of Stuttgart, 70569 Stuttgart, Germany; Mechanical Engineering, LUT University, 53850 Lappeenranta, Finland

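AI summary: This paper proposes a learning-augmented trajectory planning framework for cooperative handover missions between unmanned aerial and ground vehicles. Decoupled encoder-decoder LSTM networks generate coordinated handover trajectory predictions that accelerate the optimization, yielding faster convergence and a higher optimization success rate.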

Comments Preprint of a contribution accepted for publication in the RoManSy 2026 Springer proceedings

Abstract

This paper presents a learning-augmented trajectory planning framework for cooperative unmanned aerial vehicle (UAV) and unmanned ground vehicle (UGV) handover missions. While centralized trajectory optimization ensures dynamic feasibility and task optimality, its high computational cost limits real-time applicability. We propose a neural surrogate planner utilizing decoupled encoder-decoder long short-term memory (LSTM) networks to generate coordinated handover trajectory predictions from the task specifications. These predictions serve as informed warm starts for the downstream centralized optimizer, thereby accelerating convergence to dynamically feasible solutions. Benchmark evaluations demonstrate that the learning-augmented planning framework achieves more than a threefold speedup and 100% optimization success rate compared to cold start optimization. The results indicate that combining data-driven inference with model-based refinement enables fast and reliable trajectory generation for heterogeneous multi-robot systems.

2605.19561 2026-05-20 cs.LG cs.AI (updated version)

TORQ: Two-Level Orthogonal Rotation for MXFP4 Quantization

Zukang Xu, Xing Hu, Dawei Yang

Affiliations: Open Compute Project

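AI summary: This paper proposes TORQ, a framework that reshapes the geometric properties of the activation space through optimized coordinate transformations to address the accuracy degradation of MXFP4 activation quantization, substantially improving quantization accuracy.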

Comments 17 pages, 4 figures, 13 tables

Abstract

As Large Language Models (LLMs) advance toward practical deployment, the Microscaling FP4 (MXFP4) format has emerged as a cornerstone for next-generation low-bit inference, owing to its ability to balance high dynamic range with hardware efficiency. However, directly applying MXFP4 to LLM activation quantization inevitably leads to significant accuracy degradation. In this paper, we theoretically analyze the error structure of MXFP4 activation quantization, revealing that the root cause of this performance drop lies in two structural imbalances between activation distributions and the MXFP4 block floating-point format: (1) extreme inter-block variance imbalance and (2) intra-block codebook utilization imbalance. To address these challenges, we propose TORQ (Two-level Orthogonal Rotation for MXFP4 Quantization), a training-free Post-Training Quantization (PTQ) framework designed to reshape the geometric properties of the activation space through optimal coordinate transformations. At the macroscopic level, TORQ leverages the Schur-Horn theorem to redistribute activation energy via inter-block orthogonal rotation, preventing high-variance blocks from driving up shared scaling factors and thereby preserving the precision of small-magnitude elements. At the microscopic level, TORQ employs maximum-entropy-guided intra-block rotation to alleviate codebook collapse and maximize the MXFP4 codebook's information capacity. Experiments on mainstream LLMs such as LLaMA3 and Qwen3 show that TORQ significantly improves the accuracy of MXFP4 activation quantization compared to existing methods: on Qwen3-32B, the perplexity on WikiText is reduced to 8.43 (vs. 7.61 for BF16), and the average accuracy increases from 38.40% with direct RTN to 73.63% (vs. 74.82% for BF16), substantially narrowing the gap between 4-bit floating-point quantization and full-precision inference.
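The scale-inflation problem described above shows up clearly in a small fake-quantization sketch. The sketch assumes the OCP Microscaling conventions: blocks of 32 elements share a power-of-two (E8M0) scale, and FP4 E2M1 elements have magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, and 6. For the rotation it uses a generic normalized Hadamard matrix over the whole vector, which mixes values across blocks. This is not TORQ's Schur-Horn inter-block rotation or its maximum-entropy intra-block rotation, and all function names are hypothetical.

```python
import numpy as np

# Non-negative magnitudes representable by an FP4 E2M1 element.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
BLOCK = 32  # MX block size: 32 elements share one scale


def quantize_mxfp4(x):
    """Fake-quantize a 1-D array to MXFP4 (quantize, then dequantize)."""
    x = np.asarray(x, dtype=np.float64)
    assert x.size % BLOCK == 0, "length must be a multiple of the block size"
    out = np.empty_like(x)
    for start in range(0, x.size, BLOCK):
        blk = x[start:start + BLOCK]
        amax = np.abs(blk).max()
        if amax == 0:
            out[start:start + BLOCK] = 0.0
            continue
        # Shared power-of-two scale: floor(log2(amax)) minus E2M1's max exponent (2).
        scale = 2.0 ** (np.floor(np.log2(amax)) - 2)
        mag = np.clip(np.abs(blk) / scale, 0.0, FP4_GRID[-1])  # saturate at 6
        # Nearest grid point; ties go to the smaller magnitude (argmin takes the first).
        idx = np.abs(mag[:, None] - FP4_GRID[None, :]).argmin(axis=1)
        out[start:start + BLOCK] = np.sign(blk) * FP4_GRID[idx] * scale
    return out


def hadamard(n):
    """Normalized Sylvester Hadamard matrix (n must be a power of two); it is orthogonal."""
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)


rng = np.random.default_rng(0)
x = rng.normal(size=64)
x[3] = 100.0  # one outlier inflates the first block's shared scale
H = hadamard(64)
mse_plain = np.mean((quantize_mxfp4(x) - x) ** 2)
mse_rotated = np.mean((H.T @ quantize_mxfp4(H @ x) - x) ** 2)  # rotate, quantize, rotate back
print(mse_plain, mse_rotated)
```

In a block holding 100 and 0.1, the shared scale becomes 16, so 0.1 rounds to zero. This is the loss of small-magnitude precision that the abstract attributes to high-variance blocks. Hardware kernels typically round ties to even rather than toward the smaller magnitude.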

2605.19557 2026-05-20 stat.ML cs.LG (updated version)

Density-Ratio Losses for Post-Hoc Learning to Defer

Alexander Soen, Ragnar Thobaben, Joakim Jaldén, Richard Nock

Affiliations: KTH Royal Institute of Technology; Google Research

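AI summary: This paper studies post-hoc Learning to Defer (L2D), defining deferral through the lens of ideal distributions and deriving density-ratio (DR) CPE losses. Deferral decisions are made by thresholding a scorer, so the deferral rate can be adjusted without retraining. Depending on the ideal distributions used, the resulting rules recover Chow's rule or connect to an expert-tilted Bayes posterior.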

Comments Preprint

Abstract

We study post-hoc Learning to Defer (L2D) through the lens of ideal distributions: divergence-regularized reweightings of the data distribution under which a model attains low loss. We define deferral via the density ratio between a model's and an expert's ideals. Using the reduction from density-ratio estimation to class-probability estimation, we derive the DR CPE losses for post-hoc L2D scorers. Deferral decisions are then made by thresholding the scorer, allowing deferral rates to be adjusted without retraining. For KL-based ideal distributions, our deferral rule either recovers Chow's rule under the original distribution or connects to an expert-tilted Bayes posterior (which incorporates the expert's performance), depending on whether the ideal distributions are joint or marginal distributions. Experimentally, our approach is competitive with common baselines and more robust across dataset settings. More broadly, our results cast post-hoc L2D as density-ratio learning between ideal distributions, bridging Chow-style rules and expert comparison, and elucidating connections to related learning settings, including anomaly detection.
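Deferral is a threshold on a trained scorer, so changing the deferral rate only moves the threshold. The sketch below shows this with a placeholder array of scorer outputs. It does not implement the DR CPE losses that train the scorer. It also includes Chow's rule, which the abstract says is recovered in one case: with rejection cost $c$, defer whenever $\max_y p(y \mid x) < 1 - c$. All names are illustrative.

```python
import numpy as np


def threshold_for_rate(scores, rate):
    """Threshold such that roughly a fraction `rate` of the scores exceed it."""
    return np.quantile(scores, 1.0 - rate)


def defer(scores, threshold):
    """Defer to the expert wherever the scorer exceeds the threshold."""
    return np.asarray(scores) > threshold


def chow_defer(class_probs, cost):
    """Chow's rule: defer when the top class posterior is below 1 - cost."""
    return np.max(np.asarray(class_probs), axis=1) < 1.0 - cost


scores = np.arange(10) / 10  # placeholder scorer outputs, computed once
for rate in (0.3, 0.5):
    t = threshold_for_rate(scores, rate)
    print(rate, defer(scores, t).sum())  # no retraining: only t changes
```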

2605.19549 2026-05-20 cs.SE cs.LG (updated version)

Provable Fairness Repair for Deep Neural Networks

Jianan Ma, Jingyi Wang, Qi Xuan, Zhen Wang

Affiliations: Hangzhou Dianzi University, China; Zhejiang University, China; Zhejiang University of Technology, China

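AI summary: This paper proposes ProF, a framework that uses interval bound propagation to provide provable fairness repair for deep neural networks. It guarantees fairness over the entire set of inputs surrounding biased samples, and its effectiveness is validated on multiple benchmark datasets.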
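Interval bound propagation, which the summary names as ProF's core tool, can be sketched as follows. An input box is pushed through each affine layer as a center and radius, and through ReLU monotonically, which gives sound output bounds. The fairness check shown here is a hypothetical example of certifying a set around a sample: only a sensitive feature varies over [0, 1], and the check asks whether the sign of the output logit can flip. It covers verification only, not ProF's repair step, and the network weights are made up.

```python
import numpy as np


def ibp_affine(lo, hi, W, b):
    """Bound W @ x + b over the box lo <= x <= hi."""
    center, radius = (hi + lo) / 2.0, (hi - lo) / 2.0
    c = W @ center + b
    r = np.abs(W) @ radius
    return c - r, c + r


def ibp_forward(lo, hi, layers):
    """Propagate a box through affine layers with ReLU between them."""
    for i, (W, b) in enumerate(layers):
        lo, hi = ibp_affine(lo, hi, W, b)
        if i < len(layers) - 1:
            lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)  # ReLU is monotone
    return lo, hi


def certify_fair(x, sensitive_idx, layers, s_lo=0.0, s_hi=1.0):
    """True if the binary decision (sign of the single output logit) cannot change
    as the sensitive feature ranges over [s_lo, s_hi] with other features fixed."""
    lo, hi = np.array(x, dtype=float), np.array(x, dtype=float)
    lo[sensitive_idx], hi[sensitive_idx] = s_lo, s_hi
    out_lo, out_hi = ibp_forward(lo, hi, layers)
    return bool(out_lo[0] > 0 or out_hi[0] < 0)


# Hypothetical 2-layer nets; feature 1 is the sensitive attribute.
fair_net = [(np.array([[1.0, 0.0], [0.0, 0.0]]), np.zeros(2)),
            (np.array([[1.0, 1.0]]), np.array([-0.5]))]
biased_net = [(np.eye(2), np.zeros(2)),
              (np.array([[1.0, -3.0]]), np.array([-0.5]))]
print(certify_fair([2.0, 0.0], 1, fair_net), certify_fair([2.0, 0.0], 1, biased_net))
```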

Comments 15 pages, 6 figures, 7 tables. Full version of the paper accepted by ASE 2025

DOI: 10.1109/ASE63991.2025.00049
Journal ref: Proceedings of the 40th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2025

Sampling-Based Safe Reinforcement Learning

Luca Vignola, Bruce D. Lee, Manish Prajapat, Manuel Wendl, Melanie Zeilinger, Andreas Krause, Yarden As

Affiliations: ETH Zurich

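AI summary: This paper proposes a sampling-based safe reinforcement learning method that maintains safety during learning by enforcing constraints jointly over a finite set of dynamics samples. The method provides practical safety guarantees in continuous domains and achieves efficient exploration by constraining epistemic uncertainty.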

Abstract

Safe exploration remains a fundamental challenge in reinforcement learning (RL), limiting the deployment of RL agents in the real world. We propose Sampling-Based Safe Reinforcement Learning (SBSRL), a model-based RL algorithm that maintains safety throughout the learning process by enforcing constraints jointly across a finite set of dynamics samples. This formulation approximates an intractable worst-case optimization over uncertain dynamics and enables practical safety guarantees in continuous domains. We further introduce an exploration strategy based on constraining epistemic uncertainty, eliminating the need for explicit exploration bonuses. Under regularity conditions, we derive high-probability guarantees of safety throughout learning and a finite-time sample complexity bound for recovering a near-optimal policy. Empirically, SBSRL achieves safe and efficient exploration both in simulation and in real robotic hardware, and readily extends to practical deep-ensemble implementations that scale to high-dimensional continuous control problems.
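Enforcing a constraint jointly over a finite set of dynamics samples can be sketched on a toy 1-D linear system. An action counts as safe only if the constraint holds under every sampled model, which is a finite stand-in for the worst case over uncertain dynamics. The dynamics samples, candidate actions, state bound, and reward are all invented for illustration. SBSRL's learned model, its epistemic-uncertainty-based exploration, and its guarantees are not reproduced.

```python
import numpy as np


def safe_actions(s, candidates, dyn_samples, s_max):
    """Actions whose next state a*s + b*u satisfies |s'| <= s_max under EVERY sample."""
    return [u for u in candidates
            if all(abs(a * s + b * u) <= s_max for a, b in dyn_samples)]


def pick_action(s, candidates, dyn_samples, s_max, reward):
    """Best mean reward over the samples among jointly safe actions (None if none is safe)."""
    ok = safe_actions(s, candidates, dyn_samples, s_max)
    if not ok:
        return None
    return max(ok, key=lambda u: np.mean([reward(a * s + b * u) for a, b in dyn_samples]))


def target(s_next):
    """Hypothetical reward: stay near 1.0."""
    return -(s_next - 1.0) ** 2


# Hypothetical posterior samples (a, b) of uncertain dynamics s' = a*s + b*u.
samples = [(1.0, 0.5), (1.1, 0.4)]
print(pick_action(1.0, [-2, -1, 0, 1, 2], samples, s_max=1.2, reward=target))
```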

2605.19462 2026-05-20 cs.LG cs.AI (updated version)

The Bitter Lesson of Data Filtering

Christopher Mohri, John Duchi, Tatsunori Hashimoto

Affiliations: Department of Computer Science; Departments of Statistics and Electrical Engineering; Stanford University

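AI summary: This paper studies data filtering in large-scale model pretraining and finds that, even with sufficient compute, filtering data is not the optimal choice, because sufficiently trained large models can tolerate, and even benefit from, low-quality data.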

2605.19403 2026-05-20 cs.LG (updated version)

TIDE: Asymmetric Neural Circuits for Stabilized Temporal Inhibitory-Excitatory Dynamics

Alexander Kyuroson, Denis Kleyko, Marcus Liwicki

Affiliations: Luleå University of Technology; Örebro University; RISE Research Institutes of Sweden

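AI summary: This paper proposes TIDE, an architecture that stabilizes temporal dynamics with asymmetric excitatory-inhibitory networks, combining Wilson-Cowan dynamics and lateral inhibition to improve biological realism and learning performance. Experiments show that it outperforms CTM in both training time and accuracy.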

Abstract

The recent Continuous Thought Machine (CTM) architecture decouples internal computation from external inputs via neural dynamics, but relies on multi-layer perceptrons without stability guarantees. We propose to model neural dynamics using asymmetric Excitatory-Inhibitory (E-I) networks, which can be stabilized via principles from network theory and can be expressed as energy-based systems optimized through a game-theoretic loss. Building on this perspective, we introduce the Temporal Inhibitory-Excitatory Dynamic Engine (TIDE), a neuro-inspired architecture that computes internal representations through neural dynamics stabilized by incorporating Wilson-Cowan dynamics and lateral inhibition. TIDE balances biological realism with an end-to-end trainable architecture, for instance by using Hierarchical Receptive Fields and enforcing Dale's principle to ensure a realistic $80:20$ E-I balance ratio. The aim of this paper is to introduce a new architecture that brings neuro-inspired learning to the forefront. We present proofs of convergence, stability, and complexity bounds, along with empirical ablation studies. Overall, TIDE surpasses CTM with under $50\%$ of the training time and improves $\texttt{top-1}$ accuracy by an average of $+1.65\%$ on ImageNet under various perturbations.
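As a rough picture of Dale's principle with an 80:20 E-I split and Wilson-Cowan-style rate dynamics, the sketch below builds a sign-constrained weight matrix and integrates $\tau \dot r = -r + f(Wr + I)$ with forward Euler. It assumes a single-population rate network with a sigmoid gain, chosen for illustration. TIDE's hierarchical receptive fields, lateral inhibition, game-theoretic loss, and stability machinery are not modeled, and the function names are hypothetical.

```python
import numpy as np


def dale_weights(n, frac_exc=0.8, rng=None):
    """Random weights obeying Dale's principle; W[i, j] is the synapse from j to i."""
    rng = np.random.default_rng(0) if rng is None else rng
    n_exc = int(round(frac_exc * n))
    magnitude = np.abs(rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, n)))
    sign = np.ones(n)
    sign[n_exc:] = -1.0  # the last 20% of presynaptic neurons are inhibitory
    return magnitude * sign[None, :], n_exc  # each column keeps a single sign


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def wilson_cowan(W, inp, tau=10.0, dt=1.0, steps=200):
    """Forward-Euler integration of tau * dr/dt = -r + sigmoid(W r + inp), from r = 0.

    With dt / tau <= 1, each step is a convex combination of r and a value in (0, 1),
    so firing rates stay within [0, 1].
    """
    r = np.zeros(W.shape[0])
    for _ in range(steps):
        r = r + (dt / tau) * (-r + sigmoid(W @ r + inp))
    return r


W, n_exc = dale_weights(100)
rates = wilson_cowan(W, np.full(100, 0.5))
print(n_exc, rates[:n_exc].mean(), rates[n_exc:].mean())
```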

2605.19393 2026-05-20 cs.CV cs.LG (updated version)

Neuron Incidence Redistribution for Fairness in Medical Image Classification

Abin Shoby, Lyle John Palmer, Nikhil Cherian Kurian

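AI summary: This paper proposes Neuron Incidence Redistribution (NIR), a lightweight regularization method that improves fairness in medical image classification by reducing the variance of predicted-probability-weighted mean activations. Experiments show substantially reduced TPR disparities across age and gender groups and FPR disparities across race and age groups.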

Comments 4 Pages, 1 Figure

Abstract

Deep learning models for medical image classification are susceptible to subgroup performance disparities across demographic attributes such as age, gender, and race. We identify a latent representational mechanism underlying these disparities: in transfer-learned models, the dominant penultimate-layer activation channel under positive predictions is co-activated by both disease-positive samples and privileged demographic groups (male, older patients), producing over-diagnosis; conversely, the dominant channel under negative predictions is co-activated by disadvantaged groups (female, younger patients), producing systematic under-diagnosis. To address this, we propose Neuron Incidence Redistribution (NIR), a lightweight regularization method that penalizes the variance of predicted-probability-weighted mean activations across penultimate-layer neurons, requiring no demographic labels at training time. On HAM10000, TPR disparity drops from 10.81% to 0.93% across age groups and from 12.04% to 0.74% across gender, with a marginal AUC improvement of 0.51 points. On Harvard OCT-RNFL, NIR reduces FPR disparity for race (from 15.68% to 10.66%) and age (from 12.69% to 1.80%), demonstrating that distributing latent disease evidence across the full penultimate layer is a principled and effective strategy for improving demographic fairness in medical AI.
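One plausible reading of the NIR penalty is sketched below. Each sample's penultimate activations are weighted by its predicted positive-class probability and averaged over the batch to give one value per neuron. The penalty is the variance of those values across neurons, so no single channel carries all the disease evidence. The exact normalization, whether a negative-class term is also used, and the coefficient `lam` are assumptions. As in the abstract, no demographic labels are needed.

```python
import numpy as np


def nir_penalty(acts, probs, eps=1e-8):
    """acts: (batch, d) penultimate activations; probs: (batch,) predicted P(y = 1).

    Returns the variance, across the d neurons, of the probability-weighted
    mean activation.
    """
    acts, probs = np.asarray(acts, dtype=float), np.asarray(probs, dtype=float)
    w = probs / (probs.sum() + eps)  # normalized per-sample weights
    weighted_mean = w @ acts         # (d,) one value per neuron
    return float(weighted_mean.var())


def total_loss(task_loss, acts, probs, lam=0.1):
    """Task loss (e.g. binary cross-entropy) plus the hypothetical weighted NIR term."""
    return task_loss + lam * nir_penalty(acts, probs)


p = [0.1, 0.2, 0.3, 0.4]
print(nir_penalty(np.ones((4, 3)), p))                     # evidence spread evenly
print(nir_penalty(np.tile([1.0, 0.0, 0.0], (4, 1)), p))    # one dominant channel
```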

2605.19392 2026-05-20 cs.LG (updated version)

Conflict-Free Replicated Data Types for Neural Network Model Merging: A Two-Layer Architecture That Makes 26 Strategies CRDT-Compliant

Ryan Gillespie

Affiliations: Independent researcher

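AI summary: This paper proposes CRDTMergeState, a two-layer architecture that wraps any merge strategy in a CRDT-compliant layer. It addresses the structural problem that 26 neural network merge strategies fail to satisfy commutativity, associativity, and idempotency in distributed operation, and it achieves strong eventual consistency.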

Abstract

All 26 neural network merge strategies we tested, including weight averaging, SLERP, TIES, DARE, Fisher merging, and evolutionary approaches, fail the algebraic properties (commutativity, associativity, idempotency) required for conflict-free distributed operation. We prove that this failure is structural: normalisation-based merges cannot simultaneously satisfy all three properties. To resolve this, we present a two-layer architecture, CRDTMergeState, that wraps any merge strategy in a CRDT-compliant (Conflict-Free Replicated Data Type) layer. Layer 1 manages contributions via OR-Set CRDT semantics, where the merge operation is set union, which is trivially commutative, associative, and idempotent. Layer 2 applies merge strategies as deterministic pure functions over a canonically-ordered contribution set, with randomness seeded from the Merkle root. We prove that this separation guarantees Strong Eventual Consistency: all replicas receiving the same contributions compute identical merged models, regardless of message ordering. Empirical validation spans three tiers: controlled 4x4 tensors (104/104 tests pass), production-scale models up to 7.24B parameters (208 strategy-level tests, 43,368 layer-level property checks at capped tensor resolution), and multi-node convergence under gossip and partition healing (100 nodes, 20 orderings), with CRDT overhead below 0.5 ms. Because the wrapper is transparent, downstream performance is identical by construction, confirmed via byte-identical output verification. The reference implementation is available as crdt-merge v0.9.4.
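The two-layer split fits in a few lines of Python. This is a simplified sketch, not the crdt-merge package's API. Layer 1 is an add-only set keyed by content hash (a G-Set; a real OR-Set also tracks add/remove tags). A hash of the sorted keys stands in for the Merkle root, and both merge strategies are toy examples.

```python
import hashlib
import random


class MergeState:
    """Layer 1: contributions keyed by content hash, merged by set union."""

    def __init__(self, contributions=None):
        self.contributions = dict(contributions or {})

    def add(self, weights):
        key = hashlib.sha256(repr(list(weights)).encode()).hexdigest()
        self.contributions[key] = tuple(weights)
        return self

    def merge(self, other):
        # Union of content-addressed entries: commutative, associative, idempotent.
        return MergeState({**self.contributions, **other.contributions})

    def root(self):
        # Simplified stand-in for a Merkle root: hash of the sorted keys.
        return hashlib.sha256("".join(sorted(self.contributions)).encode()).hexdigest()

    def materialize(self, strategy):
        # Layer 2: a pure function of the canonically ordered set, seeded from the root.
        ordered = [self.contributions[k] for k in sorted(self.contributions)]
        return strategy(ordered, random.Random(int(self.root(), 16)))


def average(ordered, rng):
    """Plain weight averaging (rng unused; kept for a uniform strategy signature)."""
    return tuple(sum(col) / len(ordered) for col in zip(*ordered))


def random_drop_average(ordered, rng, p=0.5):
    """Toy DARE-like strategy whose only randomness comes from the seeded rng."""
    kept = [[x if rng.random() >= p else 0.0 for x in w] for w in ordered]
    return tuple(sum(col) / len(kept) for col in zip(*kept))


a = MergeState().add([1.0, 2.0])
b = MergeState().add([3.0, 4.0])
print(a.merge(b).materialize(average), b.merge(a).materialize(average))
```

Sorting before merging matters even for plain averaging, because floating-point addition is not associative. A fixed order is what lets every replica produce bit-identical results.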

2605.19366 2026-05-20 cs.LG (updated version)

Accurate, Efficient, and Explainable Deep Learning Approaches for Environmental Science Problems

Jimeng Shi

Affiliations: College of Engineering and Computing

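AI summary: This dissertation proposes three deep learning approaches for complex environmental science problems: WaLeF for flood prediction in coastal river systems, CoDiCast for global weather forecasting, and Hypercube-RAG for scientific question answering in environmental science. The aim is to improve the accuracy, efficiency, and explainability of environmental intelligence.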

Comments 161 pages

Abstract

Environmental science plays a pivotal role in safeguarding ecosystems, a domain driven by large-scale, heterogeneous data. In the big data era, artificial intelligence (AI) has emerged as a transformative tool for learning patterns and supporting decision-making. This dissertation develops AI-based approaches tailored to complex environmental science problems to achieve Environmental Intelligence, studying three specific challenges. First, we focus on flood prediction and management in coastal river systems. Conventional physics-based models are computationally intensive, limiting real-time application. To overcome this, we propose a deep learning (DL)-based model, WaLeF, for water level forecasting, and a forecast-informed DL model, FIDLAr, to manage water levels. Evaluated in a flood-prone coastal system in South Florida characterized by extreme rainfall and sea level fluctuations, FIDLAr outperforms baselines in accuracy and efficiency while providing interpretable outputs. Second, we target global weather prediction, which is challenged by massive data scale. Traditional physics methods are deterministic and computationally heavy. We propose CoDiCast, a conditional diffusion model tailored for probabilistic weather forecasting. Adapting generative AI to predictive tasks, CoDiCast achieves accurate, efficient forecasts with explicit uncertainty quantification in experiments. Lastly, we address scientific question-answering in environmental science. When answering in-domain questions, large language models (LLMs) often suffer from hallucinations due to out-of-date or limited knowledge. While retrieval-augmented generation (RAG) retrieves domain-specific knowledge, existing methods trade off accuracy, efficiency, or explainability. We propose Hypercube-RAG, built on a structured text cube framework, which successfully exhibits all three properties simultaneously.

2605.19360 2026-05-20 cs.CV cs.LG cs.NE physics.app-ph physics.optics (updated version)

Scalable, Energy-Efficient Optical-Neural Architecture for Multiplexed Deepfake Video Detection

Parnian Ghapandar Kashani, Shiqi Chen, Aydogan Ozcan

Affiliations: Electrical and Computer Engineering Department, University of California, Los Angeles, CA, 90095, USA; Bioengineering Department, University of California, Los Angeles, CA, 90095, USA; California NanoSystems Institute (CNSI), University of California, Los Angeles, CA, 90095, USA

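AI summary: This paper proposes a hybrid deepfake video detection framework that combines a lightweight digital front-end with a spatially multiplexed optical decoding back-end, performing massively parallel analog inference through a programmable spatial light modulator. This raises the throughput and accuracy of video authenticity prediction while lowering computational cost.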

Comments 30 Pages, 8 Figures

Abstract

The rapid proliferation of AI-generated visual media has created an urgent need for efficient, trustworthy deepfake detection systems. However, existing deep learning-based detection methods rely on computationally intensive and energy-demanding inference algorithms, limiting their scalability. Here, we present a hybrid digital-analog deepfake video detection framework that combines a lightweight digital front-end with a spatially multiplexed optical decoding back-end for massively parallel analog inference through a programmable spatial light modulator. By simultaneously processing 15 or more video streams within a single optical propagation pass, the system enables high-throughput and accurate video-level authenticity prediction at reduced computational cost compared with purely digital methods. We validated this hybrid deepfake video processor using different datasets spanning classical face-swapping, real-world deepfake recordings, and fully AI-generated videos. Using a spatially multiplexed experimental set-up operating in the visible spectrum, we achieved average deepfake detection accuracy, sensitivity and specificity of 97.79%, 99.86% and 95.72%, respectively, on the Celeb-DF video dataset with 15 videos tested in parallel in a single optical pass per inference. The multiplexed optical decoder also demonstrates resilience against various types of video degradation, noise, compression, experimental misalignments and black-box adversarial attacks. Our results show that integrating optical computation into AI inference enables simultaneous gains in throughput, energy efficiency, and adversarial robustness - three properties that are difficult to achieve together in purely digital systems.
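For reference, the three reported metrics follow the usual confusion-matrix definitions. The sketch assumes deepfakes are the positive class, since the abstract does not state the convention, and it uses made-up labels.

```python
def detection_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on deepfakes) and specificity (recall on real videos).

    Labels: 1 = deepfake (positive), 0 = real (negative).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


print(detection_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
```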

2605.19359 2026-05-20 cs.CV cs.LG (updated version)

CompoSE: Compositional Synthesis and Editing of 3D Shapes via Part-Aware Control

Habib Slim, Shariq Farooq Bhat, Mohamed Elhoseiny, Yifan Wang, Mike Roberts

Affiliations: King Abdullah University of Science and Technology (KAUST); Adobe Research

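AI summary: This paper proposes CompoSE for compositional synthesis and editing of 3D shapes via part-aware control. Its core is a diffusion transformer architecture that alternates between processing parts locally and globally, combined with a novel conditioning technique that ensures strong adherence to user input. The main contribution is learning part semantics and symmetries directly from coarse user layout guidance, without part-level text prompts.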

BrainDyn: A Sheaf Neural ODE for Generating Brain Dynamics

Siddharth Viswanath, Panayiotis Ketonis, Chen Liu, Michael Perlmutter, Dhananjay Bhaskar, Smita Krishnaswamy

Affiliations: Yale University; Boise State University; University of Wisconsin–Madison

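AI summary: This paper proposes BrainDyn, a sheaf neural ODE model for generating brain dynamics. It encodes each brain region's activity history with an LSTM and uses a sheaf Laplacian to facilitate message passing, achieving strong forecasting ability across modalities.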

Abstract

Efficient neural network models that generate brain-like dynamic activity can be a valuable resource for generating synthetic data, analyzing differences in brain transients across conditions (for example, when testing perturbations), and inferring the underlying generative dynamics. However, large language models (LLMs) or standard recurrent neural networks (RNNs) ignore the anatomical organization and therefore do not produce components that align with brain regions. On the other hand, graph-based networks often have very simple message passing rules that are not sufficiently expressive for brain-like dynamics. To address this, we introduce BrainDyn, a sheaf neural ordinary differential equation (neural ODE) model for continuous-time dynamics on structured brain graphs. BrainDyn encodes the recent activity history of each brain region using a long short-term memory (LSTM) model over a sliding temporal window to produce hidden states, or stalks, that are projected through learnable restriction maps into edge-specific shared spaces. Discrepancies between neighboring nodes in these shared spaces are characterized by a sheaf Laplacian that can facilitate message passing between neuronal units. The output of these messages is then fed to a neural ODE that governs the continuous-time evolution of neuronal activity. We evaluated BrainDyn on resting-state fMRI (PNC dataset), scalp EEG with focal epilepsy (TUSZ dataset), and simulated activity from the NEST spiking network simulator. BrainDyn achieves strong forecasting ability across modalities, and the resulting representations support downstream tasks including in silico perturbation prediction.
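The sheaf Laplacian mentioned above can be assembled from the restriction maps. For each edge $e = (u, v)$ the coboundary is $(\delta x)_e = F_{u \trianglelefteq e} x_u - F_{v \trianglelefteq e} x_v$, and $L_F = \delta^\top \delta$. The sketch below builds $L_F$ and integrates the linear sheaf diffusion $\dot x = -L_F x$ as a stand-in for BrainDyn's learned neural ODE. The LSTM stalk encoder and the learned maps are omitted, and the function names are hypothetical.

```python
import numpy as np


def sheaf_laplacian(n_nodes, edges, restrictions, d_node, d_edge):
    """Build L_F = delta^T delta.

    edges: list of (u, v) pairs; restrictions[(e, node)] is the (d_edge, d_node)
    restriction map from that node's stalk into edge e's shared space.
    """
    delta = np.zeros((len(edges) * d_edge, n_nodes * d_node))
    for e, (u, v) in enumerate(edges):
        rows = slice(e * d_edge, (e + 1) * d_edge)
        delta[rows, u * d_node:(u + 1) * d_node] = restrictions[(e, u)]
        delta[rows, v * d_node:(v + 1) * d_node] = -restrictions[(e, v)]
    return delta.T @ delta  # symmetric positive semidefinite


def sheaf_diffusion(x0, L, t=1.0, steps=100):
    """Forward-Euler integration of dx/dt = -L x (a linear stand-in for a learned ODE)."""
    x, dt = np.array(x0, dtype=float), t / steps
    for _ in range(steps):
        x = x - dt * (L @ x)
    return x


# Path graph 0-1-2 with 1-D stalks and identity maps: L_F is the ordinary graph Laplacian.
I1 = np.eye(1)
L = sheaf_laplacian(3, [(0, 1), (1, 2)], {(0, 0): I1, (0, 1): I1, (1, 1): I1, (1, 2): I1}, 1, 1)
print(L)
print(sheaf_diffusion([1.0, 0.0, 0.0], L))
```

With identity restriction maps and one-dimensional stalks, $L_F$ reduces to the graph Laplacian, which makes a convenient sanity check. Learned, non-identity maps are what let the sheaf version compare neighbors in edge-specific spaces.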

2605.19317 2026-05-20 cs.LG cs.AI (updated version)

Inference-Time Scaling in Diffusion Models through Iterative Partial Refinement

Taegu Kang, Jaesik Yoon, Sungjin Ahn

Affiliations: KAIST

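AI summary: This paper proposes Iterative Partial Refinement (IPR), an inference-time scaling method for diffusion models that needs no external verifier. By iteratively refining parts of an already-generated sample under mixed-noise conditioning, it produces more consistent samples and raises the valid solution rate on MNIST Sudoku.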

Comments Accepted at the ICLR 2026 Workshop on AI with Recursive Self-Improvement

Abstract

Inference-time scaling has emerged as a major approach for improving reasoning capabilities, and has been increasingly applied to diffusion models. However, existing inference-time scaling methods for diffusion models typically rely on external verifiers or reward models to rank and select samples, limiting their scalability to settings where such evaluators are available and reliable. Moreover, while recent diffusion models perform sequential inference with region-wise, mixed-noise conditioning, inference-time scaling tailored to this setting remains relatively underexplored. We propose Iterative Partial Refinement (IPR), an inference-time scaling method for sequential diffusion that requires no external verifier. Starting from an already-generated sample, IPR re-noises a subset of regions and regenerates them conditioned on the remaining regions, enabling the model to revise earlier decisions under a richer context than was available during the initial generation. This iterative partial refinement produces more globally consistent samples without external verification. On reasoning tasks requiring global constraint satisfaction, IPR consistently improves performance: on MNIST Sudoku, the valid solution rate increases from 55.8% to 75.0%. These results show that iterative partial refinement alone can serve as an effective inference-time scaling strategy for diffusion models in sequential, mixed-noise settings. Code is available at: https://github.com/ahn-ml/IPR
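The refinement loop described above is simple to write down. In the sketch below, `regenerate` stands in for the diffusion model's re-noise-and-denoise step on the masked regions. The toy regenerator is purely hypothetical: it fills masked cells with values missing from the unmasked context, under a permutation-style global constraint, and only illustrates regeneration conditioned on the remaining regions. As in the abstract, no verifier or acceptance test is used.

```python
import numpy as np


def iterative_partial_refinement(sample, regenerate, n_iters=10, frac=0.3, rng=None):
    """Repeatedly re-noise a random subset of regions and regenerate them from the rest."""
    rng = np.random.default_rng(0) if rng is None else rng
    x = np.array(sample)
    n = len(x)
    k = max(1, int(frac * n))
    for _ in range(n_iters):
        mask = np.zeros(n, dtype=bool)
        mask[rng.choice(n, size=k, replace=False)] = True
        x = regenerate(x, mask, rng)  # masked regions regenerated; the rest is context
    return x


def toy_regenerate(x, mask, rng):
    """Hypothetical stand-in for the denoiser: fill masked cells with values absent
    from the unmasked context (target: a permutation of 0..n-1)."""
    n, k = len(x), int(mask.sum())
    context = set(x[~mask].tolist())
    missing = rng.permutation(np.array([v for v in range(n) if v not in context], dtype=int))
    fill = np.concatenate([missing, rng.integers(0, n, size=k)])[:k]
    out = x.copy()
    out[mask] = fill
    return out


print(iterative_partial_refinement([0, 0, 2, 3, 4], toy_regenerate, frac=0.4))
```

Starting from the invalid `[0, 0, 2, 3, 4]`, the loop may or may not repair the duplicate, depending on which cells get masked. This mirrors how refinement helps without guaranteeing a valid solution.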

2605.19313 2026-05-20 stat.ML cs.LG stat.ME (updated version)

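Cross-Paradigm Knowledge Distillation: A Comprehensive Study of Bidirectional Knowledge Transfer Between Random Forests and Deep Neural Networks for Big Data Applications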

Mahdi Naser Moghadasi

Affiliations: BrightMind AI Research

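AI summary: This paper studies bidirectional knowledge distillation between random forests and deep neural networks and proposes new methods. Across 144 experiments, it shows that bidirectional RF-DL distillation is competitive on classification and regression tasks while offering the complementary benefits of interpretability and expressiveness.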

Abstract

The exponential growth of big data has intensified the need for efficient and interpretable machine learning models that can handle diverse data characteristics while maintaining computational efficiency. Knowledge distillation has primarily focused on neural network-to-neural network transfer, leaving cross-paradigm knowledge transfer largely unexplored. This paper presents the first comprehensive study of bidirectional knowledge distillation between Random Forests (RF) and Deep Neural Networks (DNN), addressing critical gaps in ensemble learning and model compression for big data applications. We propose novel methodologies including progressive multi-stage distillation, multi-teacher ensemble distillation from diverse tree models, and uncertainty-aware cross-paradigm transfer mechanisms. Through 144 comprehensive experiments across 6 diverse datasets encompassing classification and regression tasks, we demonstrate that bidirectional RF-DL distillation achieves competitive performance while providing complementary benefits: interpretability from tree models and expressiveness from neural networks. Our results show that multi-teacher ensemble distillation consistently outperforms traditional approaches, with NN-COMPACT achieving 98.13% classification accuracy and NN-WIDE reaching 92.6% R^2 score in regression tasks. The proposed framework enables deployment flexibility in big data environments, allowing optimal model selection based on computational constraints and interpretability requirements. This work establishes a new research direction in cross-paradigm knowledge transfer with significant implications for interpretable AI and scalable model deployment in resource-constrained big data systems.
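Multi-teacher ensemble distillation from tree models to a neural student can be sketched with soft targets. The teachers' class probabilities are averaged, and the student is fit by minimizing cross-entropy against them. To keep the example dependency-free, the "teachers" are two hand-written probability rules standing in for trees, and the "student" is softmax regression rather than a deep network. None of this reproduces the paper's NN-COMPACT or NN-WIDE configurations.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def ensemble_soft_targets(teacher_probs, weights=None):
    """Weighted average of the teachers' (n, k) class-probability arrays."""
    P = np.stack(teacher_probs)  # (n_teachers, n, k)
    w = np.ones(len(P)) if weights is None else np.asarray(weights, dtype=float)
    return np.tensordot(w / w.sum(), P, axes=1)  # (n, k)


def soft_cross_entropy(X, W, targets):
    return -np.mean(np.sum(targets * np.log(softmax(X @ W) + 1e-12), axis=1))


def distill(X, targets, lr=0.1, epochs=300):
    """Fit a softmax-regression student to soft targets by gradient descent."""
    W = np.zeros((X.shape[1], targets.shape[1]))
    for _ in range(epochs):
        W -= lr * X.T @ (softmax(X @ W) - targets) / len(X)
    return W


rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 2))
X = np.hstack([feats, np.ones((200, 1))])                # bias column
p_smooth = 1.0 / (1.0 + np.exp(-2.0 * feats[:, 0]))      # teacher 1: smooth rule
p_stump = np.where(feats[:, 0] > 0, 0.9, 0.1)            # teacher 2: depth-1 "tree"
teachers = [np.column_stack([1 - p, p]) for p in (p_smooth, p_stump)]
T = ensemble_soft_targets(teachers)
W = distill(X, T)
print(soft_cross_entropy(X, np.zeros_like(W), T), soft_cross_entropy(X, W, T))
```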

2605.19293 2026-05-20 cs.IT cs.LG cs.RO math.IT (updated version)

Domain-Adaptive Communication-Rate Optimization for Sim-to-Real Humanoid-Robot Wireless XR Teleoperation

Caolu Xu, Zhiyong Chen, Meixia Tao, Li Song, Feng Yang, Wenjun Zhang

Affiliations: Cooperative Medianet Innovation Center; School of Information Science and Electronic Engineering; Shanghai Jiao Tong University

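AI summary: This paper proposes a domain-adaptive communication-rate optimization method that balances reconstruction error against communication energy under the sim-to-real distribution shift. It combines a PAC-Bayes generalization analysis, a density-ratio-weighted PPO method, and correction with offline real-domain data to improve the communication efficiency and reconstruction accuracy of wireless XR teleoperation of humanoid robots.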

Comments submitted to IEEE journal

Rethinking Muon Beyond Pretraining: Spectral Failures in VLA and RLVR and a High-Pass Fix

Chongyu Fan, Gaowen Liu, Mingyi Hong, Ramana Rao Kompella, Sijia Liu

Affiliations: Michigan State University; Cisco; University of Minnesota; IBM Research

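AI summary: This paper examines the limitations of the Muon optimizer beyond pretraining and proposes Pion, which improves performance on VLA and RLVR tasks through a high-pass NS iteration mechanism.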

Abstract

Muon is a matrix-aware optimizer that leverages Newton-Schulz (NS) iterations to enforce spectral gradient orthogonalization by driving all singular values of the momentum matrix toward 1. While this uniform spectral whitening enhances exploration and outperforms AdamW in LLM pretraining, we show it could lead to fundamental limitations beyond pretraining in two regimes: (i) cross-modality vision-language-action (VLA) training, where inherently low-rank action-module gradients cause amplification of noisy tail directions, and (ii) reinforcement learning with verifiable rewards (RLVR), where low-SNR gradients and the need to preserve per-head specialization from prior training make whitening unstable. To address these challenges, we propose Pion, a drop-in replacement for Muon that preserves its computational efficiency while replacing uniform spectral whitening with a two-stage Promotion+Suppression mechanism, which we call the high-pass NS iteration. This design induces a sharp spectral high-pass effect, anchoring dominant singular values at 1 while suppressing noisy tail components toward 0, with controllable filter strength. To preserve pretrained per-head heterogeneity, Pion also supports a per-head mode that applies updates independently across attention heads via a simple reshape, at no extra cost. In VLA training on LIBERO and LIBERO-Plus, Pion consistently outperforms both baselines across l_1-regression (VLA-Adapter) and flow-matching (VLANeXt) architectures, e.g., reaching a 100% success rate on LIBERO Object after 1,500 training steps with VLA-Adapter, vs. 97.0% for Muon and only 32.2% for AdamW. The advantage of Pion further extends to a real Franka Research 3 robot with a pi_0.5 backbone under the DROID setup on three grasp-and-place tasks. In RLVR post-training on Qwen3-1.7B/4B with GRPO and GMPO, Pion also outperforms AdamW on MATH and GSM8K while Muon collapses to zero.
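Because the cubic Newton-Schulz update X ← 1.5X − 0.5XXᵀX keeps the singular vectors fixed and maps each singular value s to 1.5s − 0.5s³, the "uniform spectral whitening" described above can be observed on a plain list of singular values. This sketch assumes the classic cubic NS polynomial (some Muon implementations use a tuned quintic instead) and contrasts it with an idealized high-pass response; the threshold `tau` is hypothetical, and this is not Pion's actual Promotion+Suppression iteration.

```python
def ns_step(s):
    """Cubic Newton-Schulz map on one singular value: X <- 1.5 X - 0.5 X X^T X."""
    return 1.5 * s - 0.5 * s ** 3


def whiten(singular_values, steps=40):
    """Uniform whitening: normalize, then iterate the cubic NS map on every value."""
    top = max(singular_values)
    out = [s / top for s in singular_values]  # now in (0, 1], inside the convergence range
    for _ in range(steps):
        out = [ns_step(s) for s in out]
    return out


def ideal_high_pass(singular_values, tau=0.1):
    """Idealized target response (hypothetical threshold tau): dominant -> 1, tail -> 0."""
    top = max(singular_values)
    return [1.0 if s / top >= tau else 0.0 for s in singular_values]


spectrum = [5.0, 2.0, 0.05, 0.01]  # two dominant directions, two small noisy tail directions
print(whiten(spectrum))            # all four values end up near 1, tail included
print(ideal_high_pass(spectrum))   # [1.0, 1.0, 0.0, 0.0]
```

The small tail values grow by roughly 1.5× per step until they reach 1, which is exactly the amplification of noisy directions the abstract attributes to Muon.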

2605.19258 2026-05-20 cs.LG cs.AI Version update

ExECG: An Explainable AI Framework for ECG models

Jong-Hwan Jang, Yong-yeon Jo

Affiliations: Medical AI Co. Ltd

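AI summary: This paper proposes ExECG, a framework that addresses the lack of explainability of ECG models in clinical use by providing reusable and reproducible ECG explainability through a three-stage pipeline.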

Abstract

Deep learning has enabled ECG diagnostic models with strong performance in tasks such as arrhythmia classification and abnormality detection. However, accuracy alone is insufficient for clinical deployment because it does not explain why a specific output was produced, limiting justification, error analysis, and trust. Although ECG XAI has been extensively investigated and steadily improved, practical pipelines and reporting conventions vary across studies, hindering reuse and reproducibility. To address these issues, we present Explainable AI framework for ECG models (ExECG), a Python framework that provides a three-stage pipeline: Wrapper standardizes access across heterogeneous ECG formats and intermediate representations, Explainer unifies diverse XAI methods under a shared execution protocol, and Visualizer supports consistent cross-method comparison within a unified interface. We demonstrate end-to-end usage with concise examples and two case studies, highlighting interoperable and reproducible ECG explainability.
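As a rough illustration of the Wrapper → Explainer → Visualizer split, the sketch below wires three hypothetical classes together around a toy model, using occlusion as the explanation method. None of these class names or methods are ExECG's actual API; the occlusion method, window size, and toy model are assumptions chosen for brevity.

```python
from typing import Callable, List


class Wrapper:
    """Hypothetical stage 1: turn a raw record (here a dict of leads) into a list of samples."""

    def __init__(self, lead: str):
        self.lead = lead

    def load(self, record: dict) -> List[float]:
        return [float(x) for x in record[self.lead]]


class OcclusionExplainer:
    """Hypothetical stage 2: attribution = drop in model score when a window is zeroed."""

    def __init__(self, model: Callable[[List[float]], float], window: int = 2):
        self.model = model
        self.window = window

    def explain(self, signal: List[float]) -> List[float]:
        base = self.model(signal)
        attributions = [0.0] * len(signal)
        for start in range(0, len(signal), self.window):
            stop = min(start + self.window, len(signal))
            occluded = signal[:start] + [0.0] * (stop - start) + signal[stop:]
            drop = base - self.model(occluded)
            for i in range(start, stop):
                attributions[i] = drop
        return attributions


class TextVisualizer:
    """Hypothetical stage 3: render attributions as one text bar per sample."""

    def render(self, attributions: List[float]) -> str:
        top = max(abs(a) for a in attributions) or 1.0
        return "\n".join("#" * round(10 * abs(a) / top) for a in attributions)


def toy_model(x: List[float]) -> float:
    """Stand-in 'classifier' score: the peak amplitude of the signal."""
    return max(x)


record = {"II": [0.1, 0.2, 1.5, 0.3, 0.1, 0.0]}
signal = Wrapper("II").load(record)
attr = OcclusionExplainer(toy_model).explain(signal)
print(TextVisualizer().render(attr))
```

Keeping each stage behind a small interface is what lets a different explainer or visualizer be swapped in without touching data loading.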

2605.19249 2026-05-20 cs.LG Version update

Beyond Extrapolation: Knowledge Utilization Paradigm with Bidirectional Inspiration for Time Series Forecasting

Liu Chong, Yingjie Zhou, Hao Li, Pengyang Wang, Qingsong Wen, Ce Zhu

Affiliations: College of Computer Science, Sichuan University; Department of Computer and Information Science, University of Macau; School of Information and Communication Engineering, University of Electronic Science and Technology of China

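AI summary: This paper proposes KUP-BI, a new time-series forecasting paradigm that distills continuation-style knowledge from a training-only historical library to provide structured knowledge for bidirectional forecasting, thereby improving forecasting performance.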

Comments Accepted to ICML 2026. 18 pages, 6 figures

Abstract

Time-series forecasting is critical in various scenarios, such as energy, transportation, and public health. However, most existing forecasters rely primarily on one-way inference, i.e., mapping history to target, and overlook the structural information provided by a revised natural chain ("history (model input) → target (ground-truth output) → post-target continuation"). The post-target continuation records how trajectories evolve after the target, which can help stabilize forecasting, but it is not observable at inference time. In this work, we aim to obtain an approximate proxy of the post-target continuation for the current input, providing structural knowledge for bidirectional forecasting. This idea is instantiated as KUP-BI (Knowledge Utilization Paradigm with Bidirectional Inspiration), a new time-series modeling paradigm that distills continuation-style knowledge (as an approximate post-target continuation proxy) from a train-only historical library and integrates it into standard forecasting backbones. The input stream and the continuation-proxy stream are fused via a lightweight feature-level gating module. This design does not introduce information beyond what is already contained in the training trajectories; instead, it provides a structured inductive bias that helps backbones exploit typical continuation patterns rather than relying solely on parametric extrapolation. Experimental results on six public datasets show that KUP-BI consistently improves the forecasting performance of state-of-the-art models, with small additional overhead.
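The retrieval-and-gating idea can be sketched under simple assumptions: the library stores (history, continuation) pairs cut from the training split only, the proxy is the continuation of the single nearest training history, and fusion uses one scalar sigmoid gate. The window sizes, the distance, and the gate form are hypothetical; KUP-BI distills and fuses learned features rather than raw values.

```python
import math


def build_library(train_series, hist_len, horizon, cont_len):
    """Collect (history, post-target continuation) pairs from TRAINING data only."""
    library = []
    for t in range(hist_len, len(train_series) - horizon - cont_len + 1):
        history = train_series[t - hist_len:t]
        # Skip the `horizon` target steps; keep what follows them as the continuation.
        continuation = train_series[t + horizon:t + horizon + cont_len]
        library.append((history, continuation))
    return library


def retrieve_proxy(library, query):
    """Continuation of the nearest training history (squared Euclidean distance)."""
    def dist(history):
        return sum((a - b) ** 2 for a, b in zip(history, query))
    return min(library, key=lambda item: dist(item[0]))[1]


def gated_fuse(backbone_feat, proxy_feat, gate_logit):
    """Feature-level gate: g * backbone + (1 - g) * proxy, with g = sigmoid(gate_logit)."""
    g = 1.0 / (1.0 + math.exp(-gate_logit))
    return [g * b + (1.0 - g) * p for b, p in zip(backbone_feat, proxy_feat)]


lib = build_library(list(range(20)), hist_len=3, horizon=2, cont_len=2)
proxy = retrieve_proxy(lib, [10, 11, 12])
print(proxy)
print(gated_fuse([1.0, 1.0], [3.0, 3.0], gate_logit=0.0))
```

Because the library is built from the training split alone, the proxy adds no information beyond the training trajectories, which mirrors the constraint stated in the abstract.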

2605.19243 2026-05-20 cs.LG cs.AI cs.CG Version update

Euclidean Embedding of Data Using Local Distances

Dimitris Arabadjis

Affiliations: Department of Statistics and Actuarial-Financial Mathematics; University of the Aegean

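AI summary: This paper studies the problem of recovering a globally consistent Euclidean embedding given only a local distance graph and proposes a method that optimally represents these distances. The method operates solely on a neighborhood graph weighted by pairwise distances and requires no prior vector representation of the data. It solves a variational problem that matches the local on-graph distances to the Euclidean metric induced by the differentials of the embedding functions. The resulting Euler-Lagrange equations are derived in coordinate-free form, so all operators can be evaluated directly from the distance graph. Although these equations are nonlinear and lack an explicit expression for their nonlinearity, they are shown to be solvable as an iteratively updated sparse linear problem. The main contributions are: (a) the derivation of the functional equations governing the optimal Euclidean embedding in the continuum; (b) a representation-free formulation that requires only a neighborhood distance graph and no feature vectors; and (c) an estimation procedure based purely on local graph operations. The resulting non-parametric algorithm is evaluated on synthetic manifolds and real datasets, showing that it preserves local metric structure and neighborhood relations while approximating the global isometric embedding.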

Abstract

We study the problem of recovering a globally consistent Euclidean embedding of data given only a local distance graph, and propose a method that optimally represents these distances. The method operates solely on a neighborhood graph weighted by pairwise distances, without requiring any prior vector representation of the data. The embedding is obtained by solving a variational problem that matches local, on-graph distances to the Euclidean metric induced by the differentials of the embedding functions. The resulting Euler-Lagrange equations are derived in a coordinate-free form, enabling direct evaluation of all operators from the distance graph alone. Though nonlinear and lacking an explicit expression for their nonlinearity, these equations are shown to be solvable as an iteratively updated sparse linear problem. The main contributions of the proposed approach are (a) the derivation of the functional equations governing the optimal Euclidean embedding in the continuum, (b) a representation-free formulation that requires only a neighborhood distance graph and no feature vectors, and (c) an estimation procedure based exclusively on local graph operations. We experimentally evaluate the resulting non-parametric algorithm on synthetic manifolds and real datasets, demonstrating consistent preservation of local metric structure and neighborhood relations, while approximating the global isometric embedding.
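To make the input of this problem concrete (only a neighborhood graph with edge lengths, no coordinates), the sketch below fits 2-D coordinates by directly minimizing a local stress over graph edges with gradient descent. This is a deliberately simple, swapped-in baseline, not the paper's variational formulation or its iteratively updated sparse linear solver; the learning rate, iteration count, graph, and initial layout are assumptions.

```python
import math


def embed_local_stress(n, edges, init, lr=0.05, iters=2000):
    """Minimize sum over edges of (||x_i - x_j|| - d_ij)^2 by gradient descent in 2-D.

    Plain local-stress baseline; NOT the paper's Euler-Lagrange method.
    """
    x = [list(p) for p in init]
    for _ in range(iters):
        grad = [[0.0, 0.0] for _ in range(n)]
        for i, j, d in edges:
            dx = x[i][0] - x[j][0]
            dy = x[i][1] - x[j][1]
            r = math.hypot(dx, dy) or 1e-12  # guard against coincident points
            coeff = 2.0 * (r - d) / r        # d/dx_i of (r - d)^2 is coeff * (x_i - x_j)
            grad[i][0] += coeff * dx
            grad[i][1] += coeff * dy
            grad[j][0] -= coeff * dx
            grad[j][1] -= coeff * dy
        for k in range(n):
            x[k][0] -= lr * grad[k][0]
            x[k][1] -= lr * grad[k][1]
    return x


# A small path graph: only neighboring distances are known.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
init = [(0.0, 0.0), (0.3, 0.1), (0.5, 0.6), (0.9, 0.2)]
coords = embed_local_stress(4, edges, init)
print(coords)
```

Such a baseline only matches the given edge lengths; it has no mechanism for choosing a globally consistent shape among the many layouts that satisfy them, which is the gap a principled global formulation is meant to close.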

2605.19242 2026-05-20 cs.CV cs.AI cs.ET cs.LG cs.MM Version update

PhyWorld: Physics-Faithful World Model for Video Generation

Pu Zhao, Juyi Lin, Timothy Rupprecht, Arash Akbari, Chence Yang, Rahul Chowdhury, Elaheh Motamedi, Arman Akbari, Yumei He, Chen Wang, Geng Yuan, Weiwei Chen, Yanzhi Wang

Affiliations: Northeastern University; University of Georgia; Tulane University; EmbodyX

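AI summary: This paper proposes PhyWorld, which improves the physical faithfulness of video generation models through two-stage training, making them better world simulators that can more effectively support Physical AI systems.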

Abstract

World simulators can provide safe and scalable environments for training Physical AI systems before real-world deployment. Large video generation models are emerging as a promising basis for such simulators because they can generate diverse and realistic visual futures. However, using them as world simulators requires physically faithful video continuations, namely, generated videos that preserve the physical state implied by the conditioning input, and evolve in ways consistent with basic physical principles. We propose PhyWorld, a video generation world model designed to produce temporally coherent and physically faithful scene continuations through two-stage post-training. In the first stage, we improve video-to-video continuation with flow matching fine-tuning, encouraging stable visual attributes and coherent motion dynamics across frames. In the second stage, we align generated dynamics with physical principles using Direct Preference Optimization (DPO) over physics preference pairs, guiding the model toward outputs with higher physical plausibility. To evaluate PhyWorld, we use both standard video-quality benchmarks and a dedicated physical-faithfulness benchmark with per-law scoring. Experiments show that PhyWorld improves video consistency, achieving an average score of 0.769 on VBench compared with 0.756 or below for state-of-the-art baselines. PhyWorld also improves physical plausibility, reaching an average score of 3.09 on our physical-faithfulness benchmark compared with 2.99 for the strongest baseline. These results suggest that post-training large video generation models with continuation and physics-preference signals can make them more effective world simulators for Physical AI.
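PhyWorld's second stage uses Direct Preference Optimization over physics preference pairs. For reference, the sketch below computes the standard per-pair DPO loss as originally formulated for language models, from log-likelihoods under the trained policy and a frozen reference model. How PhyWorld scores the likelihood of a video (for instance through a flow-matching surrogate) is not specified here, and `beta` is an assumed value.

```python
import math


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one pair (w = preferred, l = dispreferred sample).

    loss = -log sigmoid(beta * [(logp_w - ref_logp_w) - (logp_l - ref_logp_l)])
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) = softplus(-margin), computed without overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


print(dpo_loss(0.0, 0.0, 0.0, 0.0))   # policy == reference: loss is log(2)
print(dpo_loss(-1.0, -3.0, -2.0, -2.0))  # policy already prefers the physically plausible sample
```

The loss falls below log 2 only when the policy raises the preferred sample's likelihood relative to the reference more than it raises the dispreferred one's.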

2605.19235 2026-05-20 cs.LG cs.GT Version update

GAE Falls Short in Imperfect-Information Self-Play Reinforcement Learning

Zhiyuan Fan, Gabriele Farina

Affiliations: MIT

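AI summary: This paper studies the variance of the GAE estimator in self-play reinforcement learning for imperfect-information games and proposes Q-boosting and the VRPO algorithm to reduce this variance and improve performance.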

Abstract

Competitive multi-agent reinforcement learning in imperfect-information games requires agents to act under partial observability and against adversarial opponents, necessitating stochastic policies. While self-play reinforcement learning with Proximal Policy Optimization (PPO) has achieved strong empirical success, its standard advantage estimator, generalized advantage estimation (GAE), suffers from additional variance due to the sampling of stochastic future actions. This variance is amplified in equilibrium self-play because of the stochastic nature of the equilibrium policy, and it persists even when the critic is exact. We address this bottleneck by introducing Q-boosting, a variance-reduced advantage estimator based on a centralized action-value critic, and propose Variance-Reduced Policy Optimization (VRPO), which incorporates this new estimator. The algorithm replaces sampled multi-step backups with a multi-step Expected SARSA(λ) trace, computing policy expectations at each step to average out action-sampling noise, while retaining PPO's clipped objective and on-policy actor updates. Empirically, VRPO consistently achieves strong performance on games ranging from mid-sized to large-scale, including Dou Dizhu and Heads-Up No-Limit Texas Hold'em.
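The contrast drawn in the abstract can be seen in two small estimators: GAE, which accumulates TD errors along sampled future actions, and an advantage computed from an action-value critic by subtracting the policy-weighted expectation, so the baseline does not depend on a sampled action. This is a simplified illustration of the expected-value idea, not the paper's Q-boosting estimator or its multi-step Expected SARSA(λ) trace.

```python
def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation; `values` has one extra bootstrap entry."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # one-step TD error
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def expected_advantage(q_values, policy_probs, action):
    """A(s, a) = Q(s, a) - sum_b pi(b|s) Q(s, b): the baseline is an exact expectation."""
    baseline = sum(p * q for p, q in zip(policy_probs, q_values))
    return q_values[action] - baseline


print(gae([1.0, 1.0], [0.0, 0.0, 0.0], gamma=1.0, lam=1.0))  # [2.0, 1.0]
print(expected_advantage([1.0, 3.0], [0.5, 0.5], action=1))  # 1.0
```

In GAE every later TD error carries the randomness of the actions actually sampled; replacing sampled values with policy expectations removes that source of noise, which is the intuition behind the expected-style trace.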

2605.19231 2026-05-20 cs.LG stat.ML Version update

Precision Physical Activity Prescription with Functional Actions via Reinforcement Learning

Gefei Lin, Rui Miao, Jennifer Sacheck, Xiaoke Zhang

Affiliations: Department of Statistics, The George Washington University; Department of Mathematical Sciences, The University of Texas at Dallas; Department of Behavioral and Social Sciences, Brown University

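AI summary: This paper proposes a reinforcement learning algorithm that personalizes the optimal distribution of daily steps with respect to cardiometabolic risk, and uses data from the All of Us Research Program to show that the approach is effective at improving health biomarkers.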

Abstract

Physical activity (PA) plays an important role in maintaining and improving health. Daily steps have been a key PA measure that is easily accessible with common wearable devices. However, methods are lacking to recommend a personalized optimal distribution of daily steps over a period of time that best improves certain health biomarkers. In this paper, we fill this gap using data from the All of Us Research Program, which includes months of step counts as well as repeated measurements of key health biomarkers. We develop a new offline reinforcement learning (RL) algorithm to learn personalized and optimal PA distributions associated with cardiometabolic risk, where the action is a function representing the daily step distribution over a period of time. Simulation studies demonstrate the advantage of the proposed approach over existing continuous-action RL methods. The optimal policy learned from the All of Us data generally suggests that people take more daily steps and follow a more consistent pattern of PA over time, while offering tailored recommendations for subgroups defined by blood glucose level, body mass index, blood pressure, age, and sex.

2605.19207 2026-05-20 cs.CV cs.AI cs.LG Version update

Quantized Machine Learning Models for Medical Imaging in Low-Resource Healthcare Settings

Sumanth Meenan Kanneti, Aryan Shah

Affiliations: Georgia State University

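AI summary: This paper proposes a multi-strategy compression framework for brain tumor classification from MRI images that combines quantization-aware training, knowledge distillation from a DenseNet-101 teacher to a compact DenseNet-32 student, and Float16 post-training quantization on a lightweight MobileNetV2 backbone, enabling efficient and accurate brain tumor screening in low-resource healthcare settings.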

Abstract (translated)

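Deep learning models have shown strong performance in medical image analysis, but deploying them in low-resource clinical settings remains difficult because of computational, memory, and power constraints. This paper proposes a multi-strategy compression framework for brain tumor classification from MRI, comprising quantization-aware training, knowledge distillation from a DenseNet-101 teacher model to a compact DenseNet-32 student model, and Float16 post-training quantization on a lightweight MobileNetV2 backbone. Using a multi-class brain tumor MRI dataset containing glioma, meningioma, pituitary tumor, and healthy controls, we provide full experimental validation of the MobileNetV2-based pipeline: the classifier is trained with three-stage transfer learning, and Float16 quantization is applied through TensorFlow Lite. The DenseNet-based knowledge distillation and quantization-aware training strategies are described as complementary compression methods within the framework; their full empirical evaluation is left to future work. On the MobileNetV2 pipeline, the quantized model reaches a validation accuracy of 82.37%, compared with 82.20% for the full-precision baseline, while the model size drops from 35.34 MB to 5.76 MB, a 6.14× compression ratio with no significant loss of accuracy. Per-class evaluation confirms that quantization preserves diagnostic performance uniformly across all four classes. These findings indicate that lightweight quantized models can provide clinically viable brain tumor screening in resource-constrained healthcare settings.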
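The Float16 step and the reported size reduction can be checked with a few lines of standard-library Python: the `struct` format character `'e'` packs a value into IEEE 754 half precision (2 bytes), which exposes the rounding error Float16 introduces, and the compression ratio follows from the reported sizes. This only illustrates the arithmetic; it is not the TensorFlow Lite conversion used in the paper.

```python
import struct


def to_float16(x):
    """Round a Python float to IEEE 754 half precision and back ('e' format, 2 bytes)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def compression_ratio(original_mb, compressed_mb):
    return original_mb / compressed_mb


print(to_float16(0.1))                            # 0.0999755859375
print(len(struct.pack("<e", 0.1)))                # 2 bytes per weight (float32 uses 4)
print(round(compression_ratio(35.34, 5.76), 2))   # 6.14
```

Converting float32 weights to float16 by itself only halves their storage, so the remainder of the reported 6.14× reduction comes from factors the abstract does not break down.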

A Cloud-Based Tool for Meteorite Recovery Using Drones and Machine Learning

Seamus L. Anderson, Hadrien A. R. Devillepoix, Lewis Lakerink, Sawitchaya Tippaya, Dale P. Giancono, Martin C. Towner, Iona Clemente, Martin Cupák, Ashley F. Rogers, John H. Fairweather, Mia Walker, Daniel Burgin, Michael A. Frazer, Sophie E. Deam, Veronika Pazderová, Eleanor K. Sansom, Benjamin A. D. Hartig, Hely C. Branco, Thomas Stevenson, Isabella Hatty, Anna Zappatini, Anthony Lagain, Tom Lovelock, Auriane Egal, Lucy Forman, David Belton, Simon Windsor, Shibli Saleheen, Asher Leslie, Gregory B. Poole, Andrew Langendam, Rachel S. Kirby, Andrew G. Tomkins

Affiliations: NASA Goddard Space Flight Center; Space Science and Technology Centre; International Centre for Radio Astronomy Research; Astronomy Data and Computing Services (ADACS); Curtin Institute for Data Science; Centre for Rock Art Research and Management; Faculty of Mathematics, Physics and Informatics, Comenius University Bratislava; Institute of Geology, University of Bern; Aix-Marseille University, CNRS, IRD, INRA, CEREGE, Institut Origines; Royal Holloway University of London; Planétarium de Montréal, Espace pour la Vie; Department of Physics and Astronomy, The University of Western Ontario; School of Earth and Planetary Sciences, Curtin University; Australian Nuclear Science and Technology Organisation; School of Earth, Atmosphere and Environment, Monash University

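AI summary: This paper presents a cloud-based tool that uses drones and machine learning to help recover meteorites whose falls were observed instrumentally. It documents the iterative improvements made to the system and details the successes and limitations of the technique for meteorite falls in southern Australia and along its west coast.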

Comments 23 pages, 3 figures

2605.19178 2026-05-20 cond-mat.dis-nn cond-mat.stat-mech cs.LG physics.data-an Version update

Activation Functions, Statistics and Learning of Higher-Order Interactions in Restricted Boltzmann Machines

Giovanni di Sarra, Yasser Roudi

Affiliations: Kavli Institute for Systems Neuroscience, Norwegian University of Science and Technology; Department of Mathematics, King’s College London

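AI summary: This paper studies how the choice of activation function affects the statistics and learning of higher-order interactions in restricted Boltzmann machines, analyzing the representational and learning capacity of four common activation functions across different parameter ranges.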

Comments 38 pages, 27 figures

Abstract

The great success of neural networks in recognizing hidden patterns and correlations in complex data lies in how they jointly exploit a large number of parameters and nonlinear single-unit activations. Restricted Boltzmann Machines (RBMs) provide a simple yet powerful framework for studying the impact of activation nonlinearities on performance and representation. In this work, we exploit the duality between RBMs and models of interacting binary variables to study the statistics of the interactions induced by RBM ensembles with different hidden unit activation functions. We characterize the space of representable models analytically in terms of moments of the distribution of induced interactions for four commonly used activation functions: Linear, Step, ReLU, and Exponential. The analytical predictions for learning agree very well with simulations of the training process. In particular, our analysis shows that certain data structures, namely those generated by models of interacting variables with large interaction terms beyond pairwise, are difficult for any RBM to represent, and thus to learn. Yet, we find that rapidly increasing nonlinearities, such as the Exponential function, can facilitate the representation and learning of such data structures for a specific, analytically determined range of parameters.
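The RBM-to-interacting-variables duality can be illustrated for a single hidden unit: summing out the hidden unit leaves an effective log-weight f(v) over binary visible units, and Möbius inversion over subsets of visible units reads off the induced interaction of every order. The sketch assumes a binary {0,1} hidden unit (giving a softplus) and a unit-variance Gaussian hidden unit (giving a quadratic, hence only pairwise interactions); how these map onto the paper's Linear, Step, ReLU, and Exponential activations is an assumption, and the weights are arbitrary.

```python
import math
from itertools import combinations


def softplus(z):
    return math.log1p(math.exp(z))


def induced_interactions(f, n):
    """Möbius inversion: J_S = sum over T subset of S of (-1)^(|S|-|T|) f(1_T), v in {0,1}^n."""
    def f_on(subset):
        v = [1 if i in subset else 0 for i in range(n)]
        return f(v)

    J = {}
    for size in range(1, n + 1):
        for S in combinations(range(n), size):
            J[S] = sum((-1) ** (len(S) - k) * f_on(set(T))
                       for k in range(len(S) + 1)
                       for T in combinations(S, k))
    return J


w, b = [0.5, -1.0, 2.0], 0.3  # arbitrary weights of one hidden unit


def field(v):
    """Input to the hidden unit for visible configuration v."""
    return b + sum(wi * vi for wi, vi in zip(w, v))


binary_hidden = induced_interactions(lambda v: softplus(field(v)), 3)       # log(1 + e^z)
gaussian_hidden = induced_interactions(lambda v: 0.5 * field(v) ** 2, 3)    # z^2 / 2
print(binary_hidden[(0, 1, 2)], gaussian_hidden[(0, 1, 2)])
```

The third-order term vanishes for the quadratic case because a quadratic in binary variables has no cubic monomials, while the softplus case produces a nonzero one.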

2605.19172 2026-05-20 cs.LG cs.AI Version update

Bridge: Retrieval-Augmented Spatiotemporal Modeling for Urban Delivery Demand

Yihong Tang, Tong Nie, Junlin He, Qianjun Huang, Dingyi Zhuang, Lijun Sun

Affiliations: McGill University; The Hong Kong Polytechnic University; University of Toronto; MIT

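AI summary: This paper proposes Bridge, a framework that combines an inductive contextual graph structure with a time-aware memory module to forecast urban delivery demand for newly added service regions that lack historical records, improving forecasting performance in cold-start regions.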

Abstract

Forecasting urban delivery demand becomes substantially more challenging when newly added service regions lack historical records. Existing spatiotemporal forecasters effectively model spatial dependence once sufficient node histories are available, but they remain parametric and therefore struggle to recover short-term operational dynamics in cold-start regions. Geospatial embeddings help identify where a region is and what function it serves, yet they do not directly reveal how a similar region behaves under a comparable temporal context. We propose Bridge, a retrieval-augmented spatiotemporal graph framework that combines an inductive contextual graph backbone with a time-aware memory of region-time windows. For each target region, Bridge retrieves future demand patterns from the memory using both regional context and recent dynamics, and refines the backbone forecast through a gated fusion mechanism. To align retrieval with forecasting utility, we further train the retriever with a future-aware objective that favors entries whose future trajectories best match the target. Experiments on four real-world delivery datasets show that Bridge consistently improves over competitive spatiotemporal baselines in both within-city cold-start and cross-city transfer with partial observations. The results show that retrieval augmentation provides a useful operational memory for cold-start urban demand forecasting when parametric graph generalization alone is insufficient.
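A minimal sketch of the retrieve-then-fuse step, assuming each memory entry stores a region context vector, a window of recent demand, and the demand that followed, and that retrieval averages the futures of the top-k matches before a scalar sigmoid gate blends them with the backbone forecast. The similarity weighting `alpha`, `k`, the gate form, and the toy memory are hypothetical; Bridge learns its retriever with a future-aware objective, which is not reproduced here.

```python
import math


def similarity(query, entry, alpha=0.5):
    """Weighted sum of context and recent-dynamics similarities (negative squared distances)."""
    ctx = -sum((a - b) ** 2 for a, b in zip(query["context"], entry["context"]))
    dyn = -sum((a - b) ** 2 for a, b in zip(query["recent"], entry["recent"]))
    return alpha * ctx + (1 - alpha) * dyn


def retrieve_future(memory, query, k=2):
    """Average the stored future demand of the k most similar region-time windows."""
    top = sorted(memory, key=lambda e: similarity(query, e), reverse=True)[:k]
    horizon = len(top[0]["future"])
    return [sum(e["future"][h] for e in top) / k for h in range(horizon)]


def gated_refine(backbone, retrieved, gate_logit):
    """Blend backbone and retrieved forecasts with g = sigmoid(gate_logit)."""
    g = 1.0 / (1.0 + math.exp(-gate_logit))
    return [(1 - g) * b + g * r for b, r in zip(backbone, retrieved)]


memory = [
    {"context": [1, 0], "recent": [5, 6], "future": [7, 8]},
    {"context": [1, 0], "recent": [5, 7], "future": [9, 10]},
    {"context": [0, 1], "recent": [50, 60], "future": [100, 100]},
]
query = {"context": [1, 0], "recent": [5, 6]}  # a cold-start region with a short recent window
retrieved = retrieve_future(memory, query)
print(retrieved)
print(gated_refine([10.0, 10.0], retrieved, gate_logit=0.0))
```

The gate lets the model lean on memory when the backbone is uncertain (as in cold-start regions) and on the backbone otherwise; in a trained system it would be computed from features rather than fixed.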

2605.19166 2026-05-20 cs.RO cs.LG math.OC Version update

A Heuristic Approach for Performance Tuning in RL-based Quadrotor Control via Reward Design and Termination Conditions

Fausto Mauricio Lagos Suarez, Akshit Saradagi, Vidya Sumathy, George Nikolakopoulos

Affiliations: Robotics and AI Group, Department of Computer Science, Electrical and Space Engineering, Luleå University of Technology

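AI summary: This paper proposes a new heuristic approach for tunable performance in RL-based quadrotor control through reward design and termination conditions. A reward structure with dual-bandwidth exponentials yields a critically damped setpoint-tracking response with low steady-state error. When trained with Proximal Policy Optimization (PPO) together with episode truncation conditions, the desired performance is reached sample-efficiently within 6 million time steps. Adjusting the reward weights and exponential coefficients with intuitive heuristic rules gives faster (acrobatic-like) and slower (inspection-like) settling times while retaining the baseline critically damped response and about 2% steady-state error.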

Comments Accepted in the 34th Mediterranean Conference on Control and Automation

Abstract

Reinforcement learning (RL)-based quadrotor control policies have achieved impressive performance in tasks such as fast navigation in cluttered environments and drone racing, where the focus is on speed and agility. However, in several applications, such as infrastructure inspection, it is critical to achieve precise, controlled maneuvers with tunable performance. In this article, we present a novel heuristic approach to achieve tunable performance in RL-based quadrotor control through reward design and termination conditions. We present a novel reward structure containing dual-bandwidth exponentials that achieves a baseline critically damped response in setpoint tracking, with low steady-state errors. When trained with the Proximal Policy Optimization (PPO) algorithm in conjunction with episode truncation conditions, the policy reaches the desired performance sample-efficiently within 6 million time steps. To tune the performance around the baseline behavior, we present intuitive heuristic rules for adjusting the reward weights and exponential coefficients to achieve faster (acrobatic-like) and slower (inspection-like) settling times, while retaining the baseline critically damped response and approximately 2% steady-state error. We evaluate the three RL policies (baseline, acrobatic, and inspection) across 100 trials and show accurate and tunable performance in position and yaw tracking from random initial conditions, thereby demonstrating the effectiveness of the proposed heuristic approach.
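The abstract does not spell out the reward formula, so the sketch below shows one plausible reading of "dual-bandwidth exponentials": a sum of a wide and a narrow exponential of the squared tracking error. The functional form, weights, and coefficients are assumptions for illustration only; per the abstract, adjusting such weights and exponential coefficients is what the heuristic tuning rules act on.

```python
import math


def dual_bandwidth_reward(error, w_wide=0.5, k_wide=1.0, w_narrow=0.5, k_narrow=25.0):
    """Hypothetical reward: wide + narrow exponential of the squared tracking error.

    The wide term still gives a useful signal far from the setpoint; the narrow term
    sharply rewards settling close to it (low steady-state error).
    """
    e2 = error * error
    return w_wide * math.exp(-k_wide * e2) + w_narrow * math.exp(-k_narrow * e2)


for e in (0.0, 0.1, 0.5, 2.0):
    print(e, round(dual_bandwidth_reward(e), 4))
```

Raising `k_narrow` makes the policy favor tighter settling, while shifting weight toward the wide term tolerates larger transient errors, which is the kind of trade-off the paper's heuristic rules are described as tuning.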

2605.19156 2026-05-20 cs.AI cs.CY cs.LG cs.MA Version update

How Far Are We From True Auto-Research?

Zhengxin Zhang, Ning Wang, Sainyam Galhotra, Claire Cardie

Affiliations: Cornell University

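AI summary: This paper uses ResearchArena to evaluate the quality of papers generated by different agents. It finds that although the agents can produce papers that look competitive, their experimental rigor is insufficient, with problems such as fabricated results, underpowered experiments, and mismatches between plan and execution, indicating that auto-research still needs further development.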

Abstract

Recent auto-research systems can produce complete papers, but feasibility is not the same as quality, and the field still lacks a systematic study of how good agent-generated papers actually are. We introduce ResearchArena, a minimal scaffold that lets off-the-shelf agents (Claude Code using Opus 4.6, Codex using GPT-5.4, and Kimi Code using K2.5) carry out the full research loop themselves (ideation, experimentation, paper writing, self-refinement) under only lightweight guidance. Across 13 computer science seeds and 3 trials per agent-domain pair, ResearchArena yields 117 agent-generated papers, each evaluated under three complementary lenses: a manuscript-only reviewer (SAR), an artifact-aware peer review (PR) in which agents inspect the workspace alongside the manuscript, and a human-conducted meta-review. Under SAR alone the picture is optimistic: Claude Code obtains the highest score, outperforms Analemma's FARS, and matches the weighted-average human ICLR 2025 submission, suggesting that minimally scaffolded agents can produce papers that look competitive on manuscript-only review. Manual inspection, however, reveals that this picture is overstated: SAR scores are poorly aligned with its actual acceptance decisions and reward plausible framing without verifying experimental substance. Under artifact-aware PR, scores drop sharply, and manual auditing identifies experimental rigor as the major bottleneck. It decomposes into three failure modes (fabricated results, underpowered experiments, and plan/execution mismatch) that are highly agent-dependent: Codex shows 5%/8% paper-vs-artifact mismatch/fabricated references, versus 77%/72% for Kimi Code, a ~15× spread that tracks the distinct research personas the agents develop. None of the 117 agent-generated papers reaches the acceptance bar of a top-tier venue. This suggests that we are still far from true auto-research.
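The counts in the abstract can be checked directly: 13 seeds × 3 agents × 3 trials gives the 117 papers, the ~15× spread corresponds to the paper-vs-artifact mismatch rates (77% vs. 5%), and the fabricated-reference rates differ by about 9×. The agent count is taken from the three agents listed above; all percentages are the ones the abstract reports.

```python
seeds, agents, trials = 13, 3, 3
papers = seeds * agents * trials
print(papers)  # 117

mismatch = {"Codex": 5, "Kimi Code": 77}         # % paper-vs-artifact mismatch
fabricated_refs = {"Codex": 8, "Kimi Code": 72}  # % papers with fabricated references
print(mismatch["Kimi Code"] / mismatch["Codex"])               # 15.4
print(fabricated_refs["Kimi Code"] / fabricated_refs["Codex"])  # 9.0
```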

2605.19150 2026-05-20 cs.LG cs.AI Version update

Identifiable Multimodal Causal Representation Learning under Partially Shared Latent Variables

Manal Benhamza, Marianne Clausel, Myriam Tami

Affiliations: Paris-Saclay University, CentraleSupélec, MICS Lab; Lorraine University, CRAN

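AI summary: This paper studies the identifiability of multimodal causal representation learning in a setting where latent variables are partially shared across modalities. Each modality is generated through nonlinear mixing functions, and without assuming a distribution for the latent variables, the paper establishes component-wise identifiability guarantees for the causal latent representation and further shows that the results hold in the undercomplete case.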

Abstract

Causal representation learning (CRL) seeks to uncover meaningful latent variables and their corresponding causal structure from high-dimensional observational data. Identifiability is a crucial property in CRL, as it ensures the recovery of the mechanisms behind the data-generating process, and hence the interpretability and robustness of the representation. Proving identifiability in CRL is intrinsically difficult, and in this work we address an even more challenging setting: multimodality. We consider multimodal observed data with a partially shared latent structure. Each modality is generated, through nonlinear mixing functions, from a specific subset of causal latent variables. Under flexible assumptions and without imposing any parametric distribution on the latent variables, we establish component-wise identifiability guarantees for the causal latent representation. Furthermore, our identifiability results apply to the undercomplete scenario in which each modality has more observed than latent variables. To instantiate our theoretical analysis, we introduce a Wasserstein-based module to recover the partially shared latent structure. Because it is differentiable, this module can be easily integrated into all types of architecture with only minimal changes. Extensive experiments on synthetic and realistic datasets validate the superiority of our approach over state-of-the-art methods.
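The abstract does not detail the Wasserstein-based module, but the underlying quantity is easy to illustrate: for two equal-size one-dimensional samples, the 1-Wasserstein distance is the mean absolute difference between the sorted samples. Sorting is piecewise linear, hence differentiable almost everywhere with respect to the sample values, which is one reason such distances fit into gradient-based training. How the paper applies this to recover the partially shared latent structure is not reproduced here.

```python
def wasserstein_1d(xs, ys):
    """W1 between two equal-size 1-D empirical distributions: mean |sorted difference|."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


print(wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))  # 1.0: a pure shift by one unit
print(wasserstein_1d([3.0, 1.0, 2.0], [1.0, 2.0, 3.0]))  # 0.0: same values, different order
```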

2605.19132 2026-05-20 cs.LG Version update

CLIC: Contextual Language-Informed Cardiac Pathology Classification

Giovani D. Lucafo, Rafael da Costa Silva, João Lucas Luz Lima Sarcinelli, Andre Guarnier De Mitri, Diego Furtado Silva

Affiliations: Institute of Mathematical and Computer Sciences; Universidade de São Paulo

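AI summary: This paper proposes the CLIC framework, which converts patient contextual data into descriptive text and uses natural-language encoding to improve the precision of cardiac pathology diagnosis; it also explores using clinical descriptions generated by large language models for the downstream classification task.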

Comments 6 pages, 2 figures, accepted at the ICLR 2026 Workshop on Time Series in the Age of Large Models (TSALM)

Abstract

The electrocardiogram (ECG) is the gold standard for non-invasive diagnosis of cardiac pathologies and is a fundamental pillar of cardiovascular medicine. Recent progress in deep learning has led to the development of robust automated classifiers that achieve high performance by processing raw physiological signals. However, in clinical practice, diagnosis is rarely based solely on the signal. Cardiologists commonly support their interpretation with the patient's characteristics and the specific data-acquisition context. Despite this, most current algorithms remain restricted to signal-only analysis, failing to integrate technical metadata and demographic variables. This paper proposes Contextual Language-Informed Cardiac pathology classification (CLIC), a multimodal framework that significantly enhances diagnostic precision by encoding these variables through natural language. We demonstrate that translating patient-level contextual data into descriptive text provides an informative anchor that helps the model disambiguate complex physiological patterns. We further investigate the use of Large Language Models to synthesize richer clinical descriptions and observe that, while these generated texts remain competitive, controlled template-based contextual clinical text leads to consistent improvements in downstream classification performance.
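The abstract reports that controlled, template-based text outperforms LLM-generated descriptions. A minimal sketch of such templating follows, assuming hypothetical field names (`age`, `sex`, `device`, `sampling_rate_hz`) and wording that are not taken from the paper; the resulting string would then be embedded by a text encoder and fused with the ECG representation, which is not shown.

```python
# Hypothetical per-field templates; the paper's actual wording and fields are not given.
TEMPLATES = {
    "age": "Patient age is {} years.",
    "sex": "Patient sex is {}.",
    "device": "The recording device is {}.",
    "sampling_rate_hz": "The signal was sampled at {} Hz.",
}


def context_to_text(context):
    """Render patient-level context as templated sentences, skipping missing fields."""
    return " ".join(
        template.format(context[key])
        for key, template in TEMPLATES.items()
        if context.get(key) is not None
    )


print(context_to_text({"age": 63, "sex": "female", "sampling_rate_hz": 500}))
# Patient age is 63 years. Patient sex is female. The signal was sampled at 500 Hz.
```

Fixed templates keep the text distribution deterministic across patients, so differences the text encoder sees come from the metadata values rather than from phrasing.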

2605.19130 2026-05-20 cs.LG cs.AI cs.CL cs.CV (updated version)

EgoBabyVLM: Benchmarking Cross-Modal Learning from Naturalistic Egocentric Video Data

Dongyan Lin, Phillip Rust, Angel Villar Corrales, Alvin W. M. Tan, Mahi Luthra, Charles-Éric Saint-James, Rashel Moritz, Sheila Krogh-Jespersen, Vanessa Stark, Surya Parimi, Jiayi Shen, Youssef Benchekroun, Yosuke Higuchi, Martin Gleize, Tom Fizycki, Nicolas Hamilakis, Manel Khentout, Sho Tsuji, Balázs Kégl, Juan Pino, Michael C. Frank, Emmanuel Dupoux

Affiliations: Meta Superintelligence Labs; Stanford University; Meta Reality Labs; The University of Tokyo

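AI summary: The study examines how children achieve robust language grounding from limited visuo-linguistic input, and proposes the EgoBabyVLM Challenge to push models toward grounded language learning from naturalistic data.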

Abstract

Children acquire language grounding with remarkable robustness from limited visuo-linguistic input in ways that surpass today's best large multimodal models. Recent research suggests current vision-language models (VLMs) trained on curated web data fail to generalize to the sparse, weakly-aligned egocentric streams produced by wearable devices, embodied agents, and infant head-cams -- and no fixed evaluation pipeline exists for measuring progress on this regime. We train VLMs on datasets with varying degrees of semantic alignment between visual and linguistic inputs, including naturalistic infant and adult egocentric videos, and evaluate them with a comprehensive suite spanning multimodal language grounding and unimodal vision and language tasks. At the core of this suite is Machine-DevBench, a corpus-grounded benchmark of lexical and grammatical competence, automatically generated from the model's training vocabulary across logarithmic frequency bins to eliminate the train/eval mismatch and low statistical power of prior developmental benchmarks. Our results show that current VLM paradigms hinge on the tight semantic alignment of curated data and fail to exploit the weakly-aligned signal that dominates naturalistic egocentric input -- the very regime in which humans thrive. To motivate progress, we introduce the EgoBabyVLM Challenge to drive the development of models capable of grounded language learning from the kind of naturalistic data that human infants experience.
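Machine-DevBench is described as generated from the training vocabulary across logarithmic frequency bins. A minimal sketch of that binning step, assuming base-10 bins over raw token counts; how test items are then drawn from each bin is not specified in the abstract and is not shown.

```python
from collections import Counter


def log_bin(count, base=10):
    """Return b such that base**b <= count < base**(b + 1).

    Uses exact integer arithmetic instead of math.log, whose floating-point result
    can land just below an integer for exact powers of the base.
    """
    b = 0
    while count >= base ** (b + 1):
        b += 1
    return b


def log_frequency_bins(tokens, base=10):
    """Group vocabulary items by the order of magnitude of their corpus count."""
    bins = {}
    for word, count in Counter(tokens).items():
        bins.setdefault(log_bin(count, base), []).append(word)
    return {b: sorted(words) for b, words in sorted(bins.items())}


corpus = ["ball"] * 120 + ["dog"] * 15 + ["cat"] * 12 + ["zebra"] * 2
print(log_frequency_bins(corpus))
# {0: ['zebra'], 1: ['cat', 'dog'], 2: ['ball']}
```

Sampling a fixed number of items per bin, rather than per word, keeps rare words represented, which is one way such a benchmark can avoid being dominated by the most frequent vocabulary.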

2605.19124 2026-05-20 cond-mat.mtrl-sci cond-mat.dis-nn cs.LG physics.chem-ph (updated version)

Atomistic Modeling of Chemical Disorder in Materials: Bridging Classical Methods and AI-Assisted Approaches

Jiayu Peng, Peichen Zhong

Affiliations: Department of Materials Design and Innovation, University at Buffalo; Department of Materials Science and Engineering, National University of Singapore

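AI summary: This paper discusses how combining classical methods with AI techniques can close the representation gap for chemical disorder in materials, focusing on computational methods that convert averaged disorder descriptions into representative configurational ensembles while balancing cost, bias, and fidelity.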

Abstract

Chemical disorder, originating from the mixed occupation of crystallographic sites by multiple elements, is widespread in alloys, ceramics, and compositionally complex materials, where short- and long-range orderings can strongly influence properties. A central obstacle is the representation gap between experiments and simulations: experiments often report disorder as partial occupancies and ensemble-averaged behaviors, whereas atomistic simulations and AI workflows usually require fully specified configurations. Tackling this gap requires computational methods that convert averaged disorder descriptions into representative configurational ensembles while balancing cost, bias, and fidelity. This challenge has become more urgent in AI-driven computational discovery, where ignoring disorder may cause AI workflows to misrank stability, misjudge novelty, and misdirect experiments with too-idealized representations. This Review highlights how classical and AI-driven methods can bridge this representation gap. We assess the strengths and limitations of approaches spanning mean-field theories, cluster expansion, quasi-random approximations, Monte Carlo, and emerging schemes powered by universal interatomic potentials and generative models. We further highlight how AI can accelerate classical computational schemes by lowering the cost of microstate evaluation, configurational exploration, and atomistic-to-thermodynamic closure. We also emphasize how AI can enable disorder-native capabilities, including workflow triage, ordering-sensitive and alchemical representations, generative models of disordered structures and distributions, and kinetics-aware disorder prediction. Together, this framework outlines a practical roadmap toward disorder-native AI, which can transform chemical disorder from a representational obstacle into a controllable variable for realistic AI-accelerated materials discovery.
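The representation gap described above starts from partial occupancies. A minimal sketch of the simplest conversion, assuming a single sublattice whose occupancy fractions sum to 1: round the fractions to integer site counts with largest-remainder rounding, then shuffle element labels to obtain an ensemble of fully specified configurations. This is an ideal random solid solution with no short-range order; the reviewed methods (quasi-random structures, cluster expansion, Monte Carlo) exist precisely to go beyond it. Function names are illustrative.

```python
import math
import random
from collections import Counter


def integer_counts(n_sites, occupancies):
    """Largest-remainder rounding of fractional occupancies (summing to 1) to site counts."""
    raw = {el: frac * n_sites for el, frac in occupancies.items()}
    counts = {el: math.floor(x) for el, x in raw.items()}
    leftover = n_sites - sum(counts.values())
    # Hand the remaining sites to the elements with the largest fractional parts.
    for el in sorted(raw, key=lambda e: raw[e] - counts[e], reverse=True)[:leftover]:
        counts[el] += 1
    return counts


def random_configuration(n_sites, occupancies, rng):
    """One fully specified configuration with the exact integer composition."""
    sites = [el for el, c in integer_counts(n_sites, occupancies).items() for _ in range(c)]
    rng.shuffle(sites)
    return sites


def configurational_ensemble(n_sites, occupancies, n_configs, seed=0):
    rng = random.Random(seed)
    return [random_configuration(n_sites, occupancies, rng) for _ in range(n_configs)]


ensemble = configurational_ensemble(10, {"Ni": 0.5, "Co": 0.3, "Fe": 0.2}, n_configs=3)
print([Counter(cfg) for cfg in ensemble])  # every configuration has 5 Ni, 3 Co, 2 Fe
```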

2605.19122 2026-05-20 stat.ML cs.LG (updated version)

Dual-Channel Tensor Neural Networks: Finite-Sample Theory and Conformal Structure Selection

Elynn Chen, Jiayu Li, Zheshi Zheng, Jian Pei

Affiliations: New York University; University of Michigan; Duke University

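AI summary: This paper proposes Dual-Channel Tensor Neural Networks (DC-TNN), which decompose tensor inputs into a low-rank core and a sparse refinement and process both through coupled neural channels. The framework is structure-agnostic and accommodates CP, Tucker, and tensor-train cores. For estimation, it establishes non-asymptotic risk bounds for the DC-TNN estimator and shows that the effective dimension is determined jointly by the core rank and the refinement sparsity. For inference, it develops a structure-aware conformal ROC procedure that yields ROC and AUC confidence bands with finite-sample, distribution-free coverage. Building on this, it proposes a conformal structure selector, to the authors' knowledge the first distribution-free method with finite-sample validity for choosing among candidate tensor decompositions. Simulations and an analysis of a protein dataset show competitive predictive accuracy, reliable uncertainty quantification, and consistent recovery of the tensor structure.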

Abstract

Tensor-valued data arise naturally in neuroimaging, genomics, climate science, and spatiotemporal networks, where multilinear dependencies across modes carry information that is destroyed under vectorization. Existing approaches either impose a single low-rank structure, which can miss localized signal, or treat the tensor as a long vector, which discards its multiway geometry. We propose a *Dual-Channel Tensor Neural Network* (DC-TNN) that decomposes each tensor input into a low-rank core and a sparse refinement, and processes the two components through coupled neural channels. The framework is structure-agnostic and accommodates CP, Tucker, and tensor-train cores within a single architecture. For estimation, we establish non-asymptotic risk bounds for the DC-TNN estimator that decompose into network approximation, core estimation, and refinement-selection terms, and show that the effective dimension is determined jointly by the core rank and refinement sparsity rather than by the ambient tensor size. For inference, we develop a *structure-aware conformal ROC* procedure that calibrates within the core-refinement latent space and produces ROC and AUC confidence bands with finite-sample, distribution-free coverage. Building on this, we propose a *conformal structure selector* that, to our knowledge, is the *first distribution-free procedure* for choosing among candidate tensor decompositions with finite-sample validity. Simulations and an analysis of a protein dataset demonstrate competitive predictive accuracy, reliable uncertainty quantification, and consistent recovery of the tensor structure.

2605.19119 2026-05-20 cs.NE cs.AI cs.LG (updated version)

GOAL: Graph-based Objective-Aligned Diffusion Solvers for Dynamic Multi-Objective Optimization

Xingyu Li

Affiliations: Purdue University

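AI summary: This paper proposes GOAL, a graph-based diffusion solver for dynamic multi-objective optimization problems. The diffusion solver is conditioned on human-specified objectives to enable controllable decision generation, and a heterogeneous graph encoding lets information propagate selectively according to the ontology of each constraint. GOAL achieves 100% solution feasibility and near-zero MAPE on three canonical scheduling benchmarks.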

Abstract

Existing neural combinatorial optimization solvers frame solution search as imitation of optimal decisions, inherently limiting their utility to single-objective minimization and static constraints. We propose GOAL, a conditioned diffusion solver over relational graph representations that enables controllable decision generation by conditioning on human-specified objectives. We introduce a heterogeneous graph encoding in which distinct edge types, corresponding to different classes of constraints, define the message-passing structure of the graph neural network, allowing information to propagate selectively according to the ontology of each constraint. GOAL is instantiated and evaluated on three canonical scheduling benchmarks of varying constraint complexity: the Flow Shop Problem (FSP), the Job Shop Scheduling Problem (JSP), and the Flexible Job Shop Scheduling Problem (FJSP). Generalization is demonstrated across structurally distinct constraint regimes and problem types without architectural modification. On all three benchmarks, GOAL achieves 100% solution feasibility and near-zero MAPE (below 0.20%) on multiple objectives for problem sizes up to 20 jobs and 60 operations, outperforming NSGA-II and MOEA/D in both solution quality and inference speed by up to 25x.
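To make the heterogeneous graph encoding concrete, here is a minimal sketch of one round of typed message passing on a toy scheduling graph. The edge types (`precedence` for consecutive operations of a job, `machine` for operations sharing a machine) are illustrative assumptions, and a scalar weight per edge type stands in for the learned per-type transforms and nonlinearities of a real graph neural network.

```python
def typed_message_passing(features, edges, type_weights):
    """One round of message passing with a separate weight per edge type.

    features: node -> list of floats; edges: (src, dst, edge_type) triples;
    type_weights: edge_type -> scalar (stand-in for a learned per-type transform).
    Each node keeps its own features and adds weighted messages from its in-neighbours.
    """
    updated = {node: list(vec) for node, vec in features.items()}
    for src, dst, edge_type in edges:
        w = type_weights[edge_type]
        updated[dst] = [u + w * s for u, s in zip(updated[dst], features[src])]
    return updated


# Toy instance: o11 -> o12 are consecutive operations of job 1; o21 shares a machine with o12.
features = {"o11": [1.0], "o12": [0.0], "o21": [2.0]}
edges = [("o11", "o12", "precedence"), ("o21", "o12", "machine")]
print(typed_message_passing(features, edges, {"precedence": 0.5, "machine": -1.0}))
# {'o11': [1.0], 'o12': [-1.5], 'o21': [2.0]}
```

Because each constraint class has its own parameters, the network can learn, for example, that precedence edges and machine-sharing edges should influence a node in different ways.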

2605.19113 2026-05-20 stat.ME cs.LG stat.ML (updated version)

Learning Interpretable Point-Based Clinical Risk Scores via Direct Optimization

Ying Cui, Albert M Li, Vivek Charu, Yeon-Mi Hwang, Tina Hernandez-Boussard, Lu Tian

Affiliations: Department of Biomedical Data Science, Stanford University; Decatur High School; Department of Pathology, Stanford University School of Medicine; Division of Computational Medicine, Department of Medicine, Stanford University

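AI summary: This paper proposes new machine learning algorithms that use a flexible greedy optimization strategy to learn interpretable point-based clinical risk scores directly, optimizing additive scores under explicit optimality objectives.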

Comments 23 pages, 4 figures

Abstract

Many clinical risk scores are deployed as additive rules with nonnegative integer points assigned to relevant binary predictive features. These integer weights not only make the score easier to use in practice but also promote sparsity in the resulting prediction model. Such risk scores are often derived by first fitting a regression model and then rounding the estimated coefficients to the nearest integer after appropriate scaling. This approach is computationally fast but does not guarantee optimality of the resulting score. Alternatively, one may search over all possible integer weights to directly optimize a value function by posing the problem as an integer programming task. However, the associated computational burden can be substantial, especially when the value function is nonconcave or even discontinuous. In this paper, we develop new machine learning algorithms that employ a flexible greedy optimization strategy to learn such additive scores directly under explicit and sensible optimality objectives. We apply the proposed method to a large electronic health record (EHR) cohort in Epic Cosmos to construct an integer-weighted comorbidity score for measuring the risk of post-discharge mortality. We also conduct a simulation study to examine the finite-sample operating characteristics.
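The scale-then-round baseline and a greedy alternative can be contrasted in a few lines. This is a minimal sketch: empirical AUC is assumed as the objective (AUC is piecewise constant in the points, hence discontinuous), and `greedy_points` is a hypothetical coordinate search that changes one feature's points by +/-1 at a time; the paper's own greedy strategy and objectives are not specified in the abstract.

```python
def round_to_points(coefs, scale):
    """Baseline: scale regression coefficients, round to the nearest integer,
    and clip at zero so that points are nonnegative."""
    return [max(0, round(c * scale)) for c in coefs]


def total_score(points, x):
    return sum(p * xi for p, xi in zip(points, x))


def auc(points, X, y):
    """Empirical AUC: share of (case, control) pairs ranked correctly; ties count 1/2."""
    pos = [total_score(points, x) for x, t in zip(X, y) if t == 1]
    neg = [total_score(points, x) for x, t in zip(X, y) if t == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def greedy_points(points, X, y, max_points=5, max_iter=50):
    """Hypothetical greedy coordinate search: apply the single +/-1 change to one
    feature's points that most increases AUC, and stop when no change helps."""
    best, best_auc = list(points), auc(points, X, y)
    for _ in range(max_iter):
        candidates = []
        for j in range(len(best)):
            for step in (-1, 1):
                p = best[j] + step
                if 0 <= p <= max_points:
                    cand = best[:j] + [p] + best[j + 1:]
                    candidates.append((auc(cand, X, y), cand))
        if not candidates:
            break
        cand_auc, cand = max(candidates, key=lambda c: c[0])
        if cand_auc <= best_auc:
            break
        best, best_auc = cand, cand_auc
    return best, best_auc


X = [[1, 0], [0, 1], [1, 1], [0, 0]]  # two binary features per patient
y = [1, 0, 1, 0]                      # 1 = event
print(round_to_points([0.42, 1.31], scale=2))  # [1, 3]
print(greedy_points([0, 0], X, y))             # ([1, 0], 1.0)
```

A single-step local search like this can stall at a local optimum, which is one reason more flexible greedy strategies are worth studying.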

2605.19107 2026-05-20 cs.LG eess.SP (updated version)

Performance Monitoring of Proton Exchange Membrane Water Electrolyzer by Transformers-Based Machine Learning Model

Bingqing Chen, Ivan Batalov, Qiu Chen, Weiqi Ji, Lei Cheng

Affiliations: Bosch Research & Technology Center

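AI summary: This paper proposes a Transformer-based machine learning framework for virtual electrochemical characterization during normal operation. An encoder-decoder architecture reconstructs polarization curves, enabling continuous state-of-health monitoring of proton exchange membrane water electrolyzers.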

Abstract

Green hydrogen plays an essential role in decarbonization, with capacity projected to scale to 560 GW by 2030 (vs. 1.39 GW in 2023) in net-zero settings. Proton exchange membrane (PEM) electrolysis is one of the most promising technology routes to green hydrogen production, and real-time system health monitoring of PEM electrolyzers is essential for their scalable deployment. In lab settings, performance degradation can be characterized through electrochemical testing protocols that periodically pause normal operation. Such interruptions are impractical for full-scale stack deployments, limiting system operators' ability to make real-time assessments of state of health (SoH). We present a machine learning (ML) framework that performs virtual electrochemical characterization during normal operation. The method uses an encoder-decoder transformer, conditioned on operational data, to reconstruct characterization outputs, focusing here on polarization curves. Inspired by patch-based sequence tokenization, we segment the inputs into patches and encode them to form meaningful tokens, which substantially improves learning efficiency. Across four longitudinal runs lasting up to 478 hours on different test cells and loading cycles, the model accurately reconstructed polarization curves and achieved a 10x reduction in mean squared error (MSE) compared to a vanilla transformer. This proof of concept demonstrates that ML models can enable continuous performance monitoring of PEM electrolyzers and that the encoder captures meaningful latent representations of SoH, opening up opportunities to derive interpretable indicators in future work.
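Patch-based tokenization replaces one token per time step with one token per short window. A minimal sketch of the segmentation step, assuming fixed-length patches with a fixed stride applied to each operational channel independently; the channel names are hypothetical, and the learned projection of each patch into a token embedding is omitted.

```python
def make_patches(series, patch_len, stride):
    """Split a 1-D series into patches of length patch_len taken every stride steps.

    Patches overlap when stride < patch_len; a trailing remainder shorter than
    patch_len is dropped.
    """
    if patch_len <= 0 or stride <= 0:
        raise ValueError("patch_len and stride must be positive")
    return [series[i:i + patch_len] for i in range(0, len(series) - patch_len + 1, stride)]


def patch_multivariate(channels, patch_len, stride):
    """Patch each operational channel (e.g. current, voltage, temperature) independently."""
    return {name: make_patches(values, patch_len, stride) for name, values in channels.items()}


print(make_patches(list(range(10)), patch_len=4, stride=2))
# [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

With patches of length P and stride S, a series of length L yields about (L - P) / S + 1 tokens instead of L, which shortens the attention sequence and lets each token summarize local dynamics.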

2605.19101 2026-05-20 cs.SD cs.LG (updated version)

Heterogeneity-Aware Dataset Scheduling for Efficient Audio Large Language Model Training

Yanru Wu, Jianning Wang, Chongxin Gan, Yang Li

Affiliations: Shenzhen International Graduate School, Tsinghua University; Independent Researcher; The Hong Kong Polytechnic University

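AI summary: This paper proposes GST, a heterogeneity-aware dataset scheduling method that groups datasets and introduces them under a progressive schedule, balancing the stability of parallel training with the efficiency of sequential optimization, and achieves 30-40% faster convergence on 14 AudioQA datasets.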

Abstract

Training general-purpose Audio Large Language Models (ALLMs) across diverse datasets is essential for holistic audio understanding, yet it faces significant challenges due to dataset heterogeneity, which often leads to conflicting gradients and slow convergence. Despite its impact, how to explicitly manage this heterogeneity during training remains underexplored, with current practices relying primarily on uniform mixing. In this work, we analyze multi-dataset AudioQA training from a convergence perspective and propose Grouped Sequential Training (GST). GST strategically organizes datasets into affinity-aware groups and introduces them via a progressive scheduling protocol, effectively balancing the stability of parallel training with the efficiency of sequential optimization. To ensure scalability, we develop gradient-based affinity metrics that capture inter-dataset relationships without the prohibitive cost of empirical transferability estimation. Extensive evaluations on 14 AudioQA datasets spanning speech, music, and environmental sounds demonstrate that GST achieves 30-40% faster convergence than standard parallel training while maintaining or even surpassing the performance of mix-all training. Our results provide both theoretical insights and a practical, model-agnostic framework for efficient large-scale ALLM optimization.
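A minimal sketch of gradient-based grouping and progressive scheduling, under several assumptions not stated in the abstract: each dataset is summarized by one (for example, averaged) gradient vector, affinity is cosine similarity, grouping is greedy with a fixed threshold, and "progressive" means each stage adds one group to the training mixture. Dataset names and gradient values are made up.

```python
import math


def cosine(u, v):
    """Cosine similarity between two gradient vectors (0.0 if either is zero)."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def group_by_affinity(grads, threshold):
    """Greedy grouping: a dataset joins the first group in which every member has
    gradient cosine similarity >= threshold with it; otherwise it opens a new group."""
    groups = []
    for name, g in grads.items():
        for group in groups:
            if all(cosine(g, grads[m]) >= threshold for m in group):
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


def progressive_schedule(groups):
    """Stage k trains on the union of the first k + 1 groups."""
    return [[d for group in groups[:k + 1] for d in group] for k in range(len(groups))]


grads = {"asr": [1.0, 0.1], "speech_qa": [0.9, 0.2], "music": [0.0, 1.0], "sound": [0.1, 0.9]}
groups = group_by_affinity(grads, threshold=0.8)
print(groups)                        # [['asr', 'speech_qa'], ['music', 'sound']]
print(progressive_schedule(groups))  # [['asr', 'speech_qa'], ['asr', 'speech_qa', 'music', 'sound']]
```

Computing affinities from gradients only needs one backward pass per dataset, which is far cheaper than measuring transfer by training on every dataset pair.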

2605.19095 2026-05-20 cs.LG cs.AI stat.ML (updated version)

ScheduleFree+: Scaling Learning-Rate-Free & Schedule-Free Learning to Large Language Models

Aaron Defazio

Affiliations: FAIR at Meta Super-Intelligence Labs

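AI summary: This paper proposes ScheduleFree+, a learning-rate-free and schedule-free learning method for training large language models. In large-scale training it significantly outperforms the conventional Warmup-Stable-Decay (WSD) schedule, and the paper demonstrates the effectiveness of schedule-free learning for long training runs.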

2605.19093 2026-05-20 cs.AI cs.LG (updated version)

MANGO: Meta-Adaptive Network Gradient Optimization for Online Continual Learning

Ankita Awasthi, Marco Apolinario, Kaushik Roy

Affiliations: Purdue University; TU Delft

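AI summary: This paper proposes the MANGO framework, which uses gradient gating and meta-learned regularization to balance stability and plasticity in continual learning, overcoming forgetting of past tasks while learning new tasks efficiently.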

Abstract

In Online Continual Learning (OCL), a neural network sequentially learns from a non-stationary data stream in a single pass, with access only to a limited memory replay buffer. This contrasts sharply with offline continual learning, where training relies on multiple epochs over large datasets. The main challenge in OCL is to overcome catastrophic forgetting of past tasks (stability) while learning new ones efficiently (plasticity). Existing methods counter forgetting via replay-based rehearsal, output-level distillation, fixed regularization, or meta-learning on the current data. However, these methods have limitations: rehearsal introduces a stored-sample bias; distillation operates on output distributions without modulating parameter updates; fixed regularization penalizes parameters irrespective of their sensitivity; and stream-only meta-learning lacks feedback-controlled parameter updates. We propose Meta-Adaptive Network Gradient Optimization (MANGO), an OCL framework that balances stability and plasticity via gradient gating and meta-learned regularization. Gradient gating scales parameter updates based on sensitivity, preventing destructive updates. Meta-learned regularization adapts stability coefficients by evaluating the effect of parameter updates on replay. In MANGO, replay acts as both a training signal and a forgetting evaluator. We evaluated our method on three standard OCL benchmark datasets. MANGO outperforms strong baselines, achieving state-of-the-art results with consistent performance across replay sizes. In domain-incremental learning on CLEAR-10 and class-incremental learning on CIFAR-100 and Tiny-ImageNet, it achieves the highest accuracy among all baselines, and it achieves positive backward transfer on CLEAR-10, overcoming forgetting there.
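The abstract does not give MANGO's update rules. As a hypothetical illustration of sensitivity-based gradient gating only, the sketch below tracks a per-parameter sensitivity as an exponential moving average of squared replay gradients (a Fisher-style proxy, assumed here) and shrinks each step by `1 / (1 + lam * sensitivity)`; the meta-learned adaptation of the stability coefficient `lam` is not modeled.

```python
def update_sensitivity(sensitivity, replay_grads, decay=0.9):
    """Exponential moving average of squared replay gradients (assumed sensitivity proxy)."""
    return [decay * s + (1.0 - decay) * g * g for s, g in zip(sensitivity, replay_grads)]


def gated_update(params, grads, sensitivity, lr, lam):
    """SGD step whose per-parameter size is damped by 1 / (1 + lam * sensitivity),
    so parameters that matter for replayed (past) data move less."""
    return [p - lr * g / (1.0 + lam * s) for p, g, s in zip(params, grads, sensitivity)]


sens = update_sensitivity([0.0, 0.0], replay_grads=[0.0, 3.0], decay=0.0)  # [0.0, 9.0]
print(gated_update([1.0, 1.0], [1.0, 1.0], sens, lr=0.1, lam=1.0))
# approximately [0.9, 0.99]: the sensitive second parameter moves 10x less
```

Gating the step size, rather than adding a fixed penalty term, lets plasticity remain high for parameters that replay indicates are unimportant.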

2605.19076 2026-05-20 cs.LG physics.flu-dyn (updated version)

The impact of observation density on Bayesian inversion of latent dynamics in shock-dominated flows

Bipin Tiwari, Muhammad Abid, Omer San

Affiliations: Department of Mechanical and Aerospace Engineering, University of Tennessee, Knoxville

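AI summary: This paper proposes a non-intrusive reduced-order modeling framework for efficient Bayesian initial-state inversion with uncertainty quantification, combining a convolutional autoencoder with a learned latent-space forward operator to improve the accuracy and efficiency of inverting latent dynamics in shock-dominated flows.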

Abstract

Inferring unknown initial states in shock-dominated compressible flows from sparse and noisy measurements is a challenging ill-posed inverse problem due to nonlinear wave interactions and limited sensing. In this work, we develop a non-intrusive reduced-order modeling framework for efficient Bayesian initial-state inversion with uncertainty quantification. The framework combines a convolutional autoencoder with a learned latent-space forward operator. The autoencoder compresses high-dimensional flow fields into a compact nonlinear latent representation, while the forward operator predicts final-time latent states from encoded initial conditions. This AE-ROM surrogate enables rapid forward evaluations and is embedded within a No-U-Turn Sampler (NUTS) for posterior exploration. The framework is demonstrated using 500 high-fidelity Sod shock tube simulations generated through Latin hypercube sampling and solved using a fifth-order WENO scheme. The inverse problem seeks to recover unknown left and right density and pressure states from sparse noisy observations of final-time density and pressure fields. Results show that the AE-ROM accurately reconstructs key shock-tube structures, including the rarefaction wave, contact discontinuity, and shock front. A latent dimension of 32 provides an effective balance between reconstruction accuracy and reduced-space compactness, while 250 training simulations are sufficient for accurate reconstruction. Increasing observation density significantly contracts posterior uncertainty, reducing the mean posterior standard deviation by approximately 78% for density and 76% for pressure. Overall, the proposed framework provides a computationally efficient and uncertainty-aware approach for inverse analysis of shock-dominated flows, with potential extensions to multidimensional compressible-flow and digital-twin applications.

2605.19073 2026-05-20 cs.LG cs.AI (updated version)

Riemannian Networks over Full-Rank Correlation Matrices

Ziheng Chen, Xiaojun Wu, Bernhard Schölkopf, Nicu Sebe

Affiliations: Department of Information Engineering and Computer Science, University of Trento, Trento, Italy; School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China

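AI summary: This paper studies Riemannian networks over full-rank correlation matrices, extending the basic layers and introducing an accurate backpropagation method, and demonstrates their effectiveness in comparison with existing SPD and Grassmannian networks.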

Comments Accepted to ICML 2026

2605.19063 2026-05-20 cs.LG (updated version)

Mapping Uncharted Symmetries: Machine Discovery in Combinatorics

Eugenio Cainelli, Lorenzo Luccioli, Alessandro Iraci, Michele D'Adderio, Giovanni Paolini

Affiliations: University of Bologna; Pegaso University; University of Pisa

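AI summary: This paper proposes a machine-learning-based approach to combinatorics research: by constructing simple mathematical functions that satisfy exact distributional constraints, it discovers a new combinatorial interpretation of the q,t-Narayana polynomials and provides a proof of their symmetry.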

Comments 20 pages

Abstract

Inspired by long-standing open problems in algebraic combinatorics, we show that modern machine learning can meaningfully contribute to verifiable mathematical discoveries. In particular, we focus on the construction of simple mathematical functions under exact distributional constraints, a setting we formalize as Simple Learning Under Rigid Proportions (SLURP). We tackle this problem by introducing two methods: MapSeek-Functional, which models the desired function alternating pseudo-labeling and supervised training steps; and MapSeek-Symbolic, designed to directly produce symbolic formulas. We successfully apply both methods to a research problem in algebraic combinatorics, discovering a new combinatorial interpretation of the $q,t$-Narayana polynomials arising from representation theory. To our knowledge, this is the first such interpretation based on noncrossing partitions. Using one discovered statistic, we find a combinatorial proof of the symmetry of these polynomials in a previously unsolved case. To streamline verification and reproducibility, we release all code, including a formalization of all the mathematical discoveries of this paper in Lean 4.
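For orientation on the objects involved: the classical Narayana numbers N(n, k) = C(n, k) C(n, k - 1) / n count noncrossing partitions of {1, ..., n} with k blocks. The brute-force check below illustrates that count only; it does not compute the q,t-polynomials or reproduce the statistic discovered in the paper.

```python
from itertools import combinations
from math import comb


def set_partitions(elements):
    """Yield every set partition of a list, as a list of blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partition in set_partitions(rest):
        # Insert `first` into each existing block, or start a new singleton block.
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        yield [[first]] + partition


def is_noncrossing(partition):
    """False iff some a < b < c < d have a, c in one block and b, d in another."""
    block_of = {x: i for i, block in enumerate(partition) for x in block}
    for a, b, c, d in combinations(sorted(block_of), 4):
        if block_of[a] == block_of[c] != block_of[b] == block_of[d]:
            return False
    return True


def narayana(n, k):
    """Narayana number N(n, k) = C(n, k) * C(n, k - 1) / n (always an integer)."""
    return comb(n, k) * comb(n, k - 1) // n


def noncrossing_counts_by_blocks(n):
    counts = {}
    for partition in set_partitions(list(range(1, n + 1))):
        if is_noncrossing(partition):
            counts[len(partition)] = counts.get(len(partition), 0) + 1
    return dict(sorted(counts.items()))


print(noncrossing_counts_by_blocks(4))        # {1: 1, 2: 6, 3: 6, 4: 1}
print([narayana(4, k) for k in range(1, 5)])  # [1, 6, 6, 1]
```

For n = 4, exactly one of the 15 set partitions, {1, 3}{2, 4}, is crossing, leaving the 14 noncrossing partitions (the Catalan number C_4).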

2605.19050 2026-05-20 cs.LG physics.chem-ph q-bio.QM (updated version)

Generative Pseudo-Force Fields for Molecular Generation

Stefaan Simon Pierre Hessmann, Khaled Kahouli, Stefan Gugler, Michael Plainer, Frank Noé, Klaus-Robert Müller, Niklas Wolf Andreas Gebauer

Affiliations: Machine Learning Group, Technische Universität Berlin; BIFOLD – Berlin Institute for the Foundations of Learning and Data; Department of Mathematics and Computer Science, Freie Universität Berlin; Zuse School ELIZA, Darmstadt, Germany; Department of Physics, Freie Universität Berlin; Microsoft Research AI4Science, Berlin, Germany; Department of Chemistry, Rice University, Houston, USA; Max-Planck Institute for Informatics, Saarbrücken, Germany; Department of Artificial Intelligence, Korea University, Seoul, South Korea

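AI summary: This paper proposes generative pseudo-force fields (GPFFs) to resolve the tradeoff in molecular generation between energy-based relaxation and the sampling efficiency of data-driven generative models. An MLFF is trained on a quadratic pseudo-potential energy surface relative to reference equilibrium structures, enabling efficient and stable generation of molecular conformations.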

Abstract

Generating stable molecular conformations typically forces a tradeoff between the physical realism of energy-based relaxation and the sampling efficiency of data-driven generative models. While machine learning force fields (MLFFs) can sample stable conformations by relaxing molecular geometries according to physical forces, they require costly ab-initio training data. Conversely, diffusion models (DMs) learn from equilibrium data alone but depend on noise schedules and time-step conditioning. In this work, we propose generative pseudo-force fields (GPFFs) to bridge these paradigms by training an MLFF on a quadratic pseudo-potential energy surface relative to reference equilibrium structures. Because no ab-initio calculations are required for the perturbed geometries, non-equilibrium training data can be generated on the fly by perturbing the equilibria with Gaussian noise. We show that GPFFs constitute a time-step-agnostic variant of variance-exploding DMs: the score comes from the predicted pseudo-forces, but because force magnitudes implicitly encode the noise level, no time-step conditioning is needed. Our GPFF can hence be used as a drop-in replacement in standard diffusion sampling (ancestral, Heun), but it also facilitates more efficient, adaptive variants and an MLFF-inspired direct denoising scheme. Our proposed sampling algorithms support arbitrary structural priors and geometric constraints. On QM9, GPFF has 100% validity at 256 neural function evaluations (NFE) and over 50% at just 6 NFE, outperforming diffusion baselines across all samplers. Combined with custom priors, we showcase the fast and accurate generation process of our method in a molecular editor for a drug design setting, where a molecule is generated in real time.
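A minimal sketch of the on-the-fly training data described above, assuming the simplest quadratic pseudo-potential E(x) = ||x - x0||^2 / 2 with unit stiffness (the paper's exact scaling is not given in the abstract). The pseudo-force target is then x0 - x, whose magnitude grows with the noise level sigma, and it relates to the conditional score of N(x0, sigma^2 I) by score = force / sigma^2. A perfect force prediction lets one full step recover the equilibrium; the network that predicts the force is omitted, and all function names are illustrative.

```python
import random


def make_training_pair(x0, sigma, rng):
    """Perturb equilibrium coordinates x0 (list of [x, y, z]) with Gaussian noise and
    return (noisy coordinates, pseudo-force target).

    Assumes E(x) = 0.5 * ||x - x0||^2, so the pseudo-force -grad E(x) is x0 - x.
    """
    x = [[c + rng.gauss(0.0, sigma) for c in atom] for atom in x0]
    force = [[c0 - c for c0, c in zip(a0, a)] for a0, a in zip(x0, x)]
    return x, force


def conditional_score(force, sigma):
    """Score of N(x0, sigma^2 I) at x, i.e. -(x - x0) / sigma^2, written via the pseudo-force."""
    return [[f / sigma ** 2 for f in atom] for atom in force]


def denoise_step(x, force, step=1.0):
    """Move coordinates along a (predicted) pseudo-force; step=1 with an exact force returns x0."""
    return [[c + step * f for c, f in zip(a, fa)] for a, fa in zip(x, force)]


x0 = [[0.0, 0.0, 0.0], [1.1, 0.0, 0.0]]  # toy two-atom equilibrium geometry
x, force = make_training_pair(x0, sigma=0.3, rng=random.Random(0))
print(denoise_step(x, force))  # recovers x0 up to floating-point rounding
```

Because the force vector itself scales with sigma, a network trained on these targets sees the noise level through the geometry alone, which is the intuition behind dropping time-step conditioning.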

2605.19049 2026-05-20 cs.LG cs.AI (updated version)

KVBuffer: IO-aware Serving for Linear Attention

Longwei Zou, Lin Zhong

Affiliations: Department of Computer Science

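AI summary: This paper proposes KVBuffer, an I/O-aware serving mechanism for linear attention. By buffering recent keys and values, it lets serving systems compute linear attention outputs more flexibly and efficiently, reducing memory access and decoding latency and improving serving performance.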

Abstract

Linear attention has recently gained significant attention for long-context inference due to its constant decoding cost with respect to context length. However, existing serving systems typically serve linear attention by recurrently computing and updating a large linear attention state in every decoding step. Since the state is much larger than the per-token key and value, recurrent decoding incurs substantial memory access and becomes inefficient for serving linear attention. In this paper, we propose KVBuffer, an IO-aware serving mechanism for linear attention. By buffering recent keys and values, KVBuffer enables serving systems to compute linear attention outputs in more flexible and memory-efficient ways. For decoding, KVBuffer enables chunkwise computation, which reduces average memory access and decoding latency by deferring state updates and applying them in batch. For speculative decoding, KVBuffer verifies draft tokens in parallel and avoids storing temporary states. For short contexts, KVBuffer computes attention outputs directly from buffered keys and values, without creating or updating the linear attention state. We implement KVBuffer in SGLang for Qwen3-Next. Our evaluations show that KVBuffer can reduce linear attention decoding latency by up to 45.17% and increase the maximum number of serving requests by 5x for speculative decoding when verifying four draft tokens.
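A minimal sketch of why buffering is exact, assuming plain (ungated, unnormalized) linear attention with state S_t = S_{t-1} + k_t v_t^T and output o_t = S_t^T q_t; production models may use gated or delta-rule variants, which change the update but not the idea. Because o_t = S_old^T q_t plus the sum over buffered tokens of (q_t . k_s) v_s, the large state only needs to be written once per chunk, and a context shorter than one chunk never touches it. Dimensions, chunk size, and function names are illustrative.

```python
import random


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


def add_outer(state, k, v):
    """state += k v^T (the linear attention state update)."""
    for i, ki in enumerate(k):
        for j, vj in enumerate(v):
            state[i][j] += ki * vj


def read_state(state, q):
    """o = state^T q."""
    return [sum(q[i] * state[i][j] for i in range(len(q))) for j in range(len(state[0]))]


def recurrent_decode(qs, ks, vs, dk, dv):
    """Baseline: update the full d_k x d_v state at every decoding step."""
    state, outs = zeros(dk, dv), []
    for q, k, v in zip(qs, ks, vs):
        add_outer(state, k, v)
        outs.append(read_state(state, q))
    return outs


def buffered_decode(qs, ks, vs, dk, dv, chunk=4):
    """Buffer recent (k, v) pairs and fold them into the state once per chunk."""
    state, buffer, outs = zeros(dk, dv), [], []
    for q, k, v in zip(qs, ks, vs):
        buffer.append((k, v))
        out = read_state(state, q)
        for kb, vb in buffer:  # contribution of not-yet-folded tokens: (q . k) v
            w = sum(a * b for a, b in zip(q, kb))
            out = [o + w * x for o, x in zip(out, vb)]
        outs.append(out)
        if len(buffer) == chunk:  # deferred, batched state update
            for kb, vb in buffer:
                add_outer(state, kb, vb)
            buffer.clear()
    return outs


rng = random.Random(0)
T, dk, dv = 10, 3, 2
qs = [[rng.uniform(-1, 1) for _ in range(dk)] for _ in range(T)]
ks = [[rng.uniform(-1, 1) for _ in range(dk)] for _ in range(T)]
vs = [[rng.uniform(-1, 1) for _ in range(dv)] for _ in range(T)]
ref, buf = recurrent_decode(qs, ks, vs, dk, dv), buffered_decode(qs, ks, vs, dk, dv)
print(max(abs(a - b) for ra, rb in zip(ref, buf) for a, b in zip(ra, rb)))  # rounding-level difference
```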

2605.19038 2026-05-20 cs.RO cs.LG Version update

Conformal Prediction via Transported Beta Laws

Thiago R. Ramos, Helton Graziadei, Luben M. C. Cabezas

Affiliations: Federal University of São Carlos; University of São Paulo; Inria; Université Grenoble Alpes

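AI summary: This paper studies the law of the calibration-conditional coverage induced by a realized conformal threshold. It uses the beta distribution as a finite-sample reference object and quantifies departures from it with Wasserstein distances. This yields direct bounds on the marginal coverage gap and on bad-calibration probabilities, and it separates different sources of non-i.i.d. behavior.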

Abstract

Split conformal prediction provides finite-sample marginal coverage under exchangeability, but this guarantee averages over the random calibration sample. We study instead the law of the calibration-conditional coverage induced by a realized conformal threshold. In the continuous i.i.d. setting this law is exactly $\mathrm{Beta}(k, n+1-k)$, so the usual marginal guarantee corresponds to its mean. We take this beta law as a finite-sample reference object and quantify departures from it using Wasserstein distances on $[0,1]$. The framework yields direct bounds on marginal coverage gaps and on bad-calibration probabilities, and separates different sources of non-i.i.d. behavior according to how they deform the beta reference: test-side shift acts through a transport map on the coverage scale, while calibration dependence changes the order-statistic law itself. We instantiate the framework in scale-shift, clustered, and stationary mixing settings, where the induced deformations can be characterized explicitly or through Berry-Esseen approximations. Simulations on dependent processes confirm that the first-order approximation tracks the empirical Wasserstein distance even at moderate sample sizes.
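The following small simulation checks the i.i.d. fact stated above. It assumes continuous scores drawn uniformly on $[0,1]$; any continuous score reduces to this case through its CDF. It also assumes the standard split-conformal threshold at rank $k = \lceil (n+1)(1-\alpha) \rceil$. Under these assumptions the calibration-conditional coverage is the $k$-th order statistic of $n$ uniforms. Its law is therefore $\mathrm{Beta}(k, n+1-k)$, with mean $k/(n+1) \ge 1-\alpha$. The function names are illustrative only.

```python
import math
import random
from typing import List, Tuple


def conformal_rank(n: int, alpha: float) -> int:
    """Rank of the calibration score used as the split-conformal threshold."""
    return math.ceil((n + 1) * (1 - alpha))


def beta_mean_var(a: float, b: float) -> Tuple[float, float]:
    """Mean and variance of Beta(a, b)."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var


def simulate_conditional_coverage(n: int, alpha: float, trials: int,
                                  seed: int = 0) -> List[float]:
    """Draw `trials` calibration sets and return the conditional coverage of each.

    With Uniform(0, 1) scores, P(test score <= q_hat | calibration) = q_hat exactly,
    so the conditional coverage equals the k-th smallest calibration score.
    """
    rng = random.Random(seed)
    k = conformal_rank(n, alpha)
    coverages = []
    for _ in range(trials):
        scores = sorted(rng.random() for _ in range(n))
        coverages.append(scores[k - 1])
    return coverages
```

Take $n = 99$ and $\alpha = 0.1$. Then $k = 90$ and the reference law is $\mathrm{Beta}(90, 10)$, with mean $0.9$ and standard deviation of about $0.03$. That standard deviation is the calibration-to-calibration spread that the marginal guarantee averages away.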

2605.18474 2026-05-20 cs.CR cs.AI cs.CL cs.LG Version update

Prompt2Fingerprint: Plug-and-Play LLM Fingerprinting via Text-to-Weight Generation

Sixu Chen, Xiang Chen, Hongyao Yu, Jiaxin Hong, Hao Fang, Shuoyang Sun, Bin Chen, Shu-Tao Xia

Affiliations: Shenzhen International Graduate School, Tsinghua University, Shenzhen, China; South China University of Technology, Guangzhou, China; Harbin Institute of Technology, Shenzhen, Shenzhen, China

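AI summary: This paper proposes Prompt2Fingerprint, a framework that recasts LLM fingerprinting as a conditional parameter-generation task. A dedicated generator maps a textual description directly to low-rank parameter increments, enabling plug-and-play fingerprint injection without further model fine-tuning. This greatly reduces computational overhead and offers a scalable, instant solution for LLM ownership management.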

Abstract

The widespread deployment and redistribution of large language models (LLMs) have made model provenance tracking a critical challenge. While existing LLM fingerprinting methods, particularly active approaches that embed identity signals via fine-tuning, achieve high accuracy and robustness, they suffer from significant scalability bottlenecks. These methods typically treat fingerprint injection as an independent, one-off optimization task rather than a reusable capability, necessitating separate, resource-intensive training for every new identity. This incurs prohibitive computational costs and deployment delays. To address this, we propose Prompt2Fingerprint (P2F), the first framework that reformulates fingerprinting as a conditional parameter generation task. By leveraging a specialized generator, P2F maps textual descriptions directly to low-rank parameter increments in a single forward pass, enabling plug-and-play LLM fingerprint injection without further model retraining. Our experiments demonstrate that P2F maintains high fingerprint accuracy, harmlessness, and robustness while significantly reducing computational overhead, offering a scalable and instant solution for LLM ownership management.
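The following sketch makes the idea of a "low-rank parameter increment" concrete. It assumes a LoRA-style factorization $\Delta W = BA$ with $B \in \mathbb{R}^{d_{out} \times r}$ and $A \in \mathbb{R}^{r \times d_{in}}$. In P2F a trained generator would emit $A$ and $B$ from a text prompt. That generator is not reproduced here, and the helper names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(left: Matrix, right: Matrix) -> Matrix:
    """Plain matrix product left @ right."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*right)] for row in left]


def apply_low_rank_increment(W: Matrix, B: Matrix, A: Matrix, scale: float = 1.0) -> Matrix:
    """Return W + scale * (B @ A), where B is d_out x r and A is r x d_in."""
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]


def increment_params(d_out: int, d_in: int, r: int) -> int:
    """How many numbers one low-rank increment needs (versus d_out * d_in for a full matrix)."""
    return r * (d_in + d_out)
```

Consider a hypothetical $4096 \times 4096$ projection with $r = 8$. One increment then needs 65,536 numbers instead of 16,777,216 for a full matrix. This is why generating increments in a single forward pass is cheap compared with fine-tuning.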

2605.18445 2026-05-20 cs.CV cs.AI cs.CL cs.LG Version update

What's Holding Back Latent Visual Reasoning?

André G. Viveiros, Nuno Gonçalves, André F. T. Martins, Matthias Lindemann

Affiliations: Instituto Superior Técnico, Universidade de Lisboa; Instituto de Telecomunicações; TransPerfect; Carnegie Mellon University

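AI summary: This study examines how existing models use latent tokens and finds that these tokens play a limited role in the final prediction. There are two main problems. Oracle latent tokens in the training data carry little additional information, and the latent tokens generated at inference drift away from their true representations. Progress therefore requires high-quality data and more precise latent-token prediction.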

Abstract

Humans can approach complex visual problems by mentally simulating intermediate visual steps, rather than reasoning through language alone. Inspired by this, several works on Vision-Language Models have recently explored chain-of-thought reasoning with continuous latent tokens as intermediate visual imagination steps. In this work, we investigate how recent models leverage such latent tokens. Surprisingly, we find that model accuracy is unaffected when latent tokens are replaced by uninformative dummy tokens. This indicates that latent tokens play a minimal causal role in the model's final prediction. To better understand this phenomenon, we analyze both the training signal provided by oracle latent representations and the quality of the latent tokens generated at inference time. Our experiments reveal two crucial issues holding back latent visual reasoning. First, in most existing datasets, oracle latent tokens provide limited additional information beyond the original image and do not substantially simplify the task, leading models to ignore them during training and to effectively bypass them at inference time. By fine-tuning on a diagnostic dataset in which latent tokens provide sufficient support for the final prediction, we show that models can causally rely on them. Second, the latent tokens produced at inference time deviate from their corresponding oracle representations, collapsing to a narrow region and preventing benefits even when the model relies on them. Overall, our findings suggest that future progress in latent visual reasoning depends on two key pillars: high-quality datasets with informative intermediate steps and more precise latent token prediction.
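The dummy-token test described above compares accuracy with real latent tokens against accuracy with uninformative ones. The minimal sketch below uses toy models. The data format, the zero-vector dummy, and the function names are assumptions and do not reproduce the paper's protocol. A gap near zero means the latents play little causal role in the prediction.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Example = Tuple[object, Sequence[float], int]      # (image, latent tokens, label)
Model = Callable[[object, Sequence[float]], int]


def accuracy(model: Model, data: List[Example],
             replace_latents: Optional[Callable[[Sequence[float]], Sequence[float]]] = None) -> float:
    """Accuracy of model on data, optionally with every latent sequence replaced."""
    correct = 0
    for image, latents, label in data:
        used = replace_latents(latents) if replace_latents else latents
        correct += model(image, used) == label
    return correct / len(data)


def zero_latents(latents: Sequence[float]) -> List[float]:
    """Uninformative dummy: same length, no content."""
    return [0.0] * len(latents)


def dummy_ablation_gap(model: Model, data: List[Example]) -> float:
    """Accuracy drop when the latent tokens are replaced by dummy tokens."""
    return accuracy(model, data) - accuracy(model, data, zero_latents)
```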

2605.18389 2026-05-20 cs.LG math.OC Version update

Spherical Harmonic Optimal Transport: Application to Climate Models Comparisons

Pierre Houédry, Iskander Legheraba, Léo Buecher, Nicolas Courty

Affiliations: INRIA Rennes; University of Montpellier; LPHI, UMR 5294, CNRS, INSERM; Université Bretagne Sud; IRISA, UMR 6074, CNRS

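AI summary: This paper proposes an optimal transport method based on spherical harmonics for efficiently comparing climate models. It exploits the harmonic structure of the sphere to design a fast Sinkhorn algorithm that improves computational efficiency, and it applies the method to the evaluation of global climate models.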

Abstract

Optimal transport provides a powerful framework for comparing measures while respecting the geometry of their support, but its high computational cost hinders its application to real-world use cases. On manifolds, convolutional algorithms based on the heat kernel have been proposed to alleviate this cost, but their theoretical properties remain largely unexplored. We establish that the heat kernel cost converges to the optimal transport cost as time vanishes, in both the balanced and unbalanced cases. In the specific case of the 2-sphere $\mathbb{S}^2$, we ensure that the associated Sinkhorn divergences retain the desirable geometric and analytic properties of classical optimal transport discrepancies. Moreover, we leverage the harmonic structure of the sphere to derive a fast Sinkhorn algorithm requiring only $\mathcal{O}(n)$ memory and $\mathcal{O}(n^{3/2})$ time per iteration, with fully dense, GPU-friendly operations. We validate its computational efficiency on synthetic data and discuss its potential use in the evaluation of global climate models, providing both spatial and seasonal insights into model performance.
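The following is a generic entropic-OT sketch of the Sinkhorn iterations mentioned above. It alternates the diagonal scalings $u \leftarrow a / (Kv)$ and $v \leftarrow b / (K^\top u)$. As an assumption for illustration, $K$ is a dense Gibbs kernel $e^{-d^2/\varepsilon}$ built from geodesic distances between points on a circle. The paper's algorithm instead applies a heat kernel on $\mathbb{S}^2$ through spherical harmonics, and this sketch does not attempt that part.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def circle_points(n: int) -> List[float]:
    """Angles of n evenly spaced points on the unit circle."""
    return [2 * math.pi * i / n for i in range(n)]


def geodesic(a: float, b: float) -> float:
    """Arc-length distance between two angles on the unit circle."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def sinkhorn(a: Sequence[float], b: Sequence[float], K: Matrix, iters: int = 500) -> Matrix:
    """Return the entropic transport plan P = diag(u) K diag(v) with marginals a and b."""
    n, m = len(a), len(b)
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        Kv = [sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        u = [a[i] / Kv[i] for i in range(n)]
        Ktu = [sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
        v = [b[j] / Ktu[j] for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
```

A smaller $\varepsilon$ gives plans closer to unregularized OT but needs more iterations. Every iteration is dominated by the kernel-vector products, which is the cost the spherical-harmonic approach targets.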

2605.17889 2026-05-20 cs.LG Version update

CoX-MoE: Coalesced Expert Execution for High-Throughput MoE Inference with AMX-Enabled CPU-GPU Co-Execution

Muyoung Son, Yi Chen, Seungjae Yoo, Soongyu Choi, Joo-Young Kim

Affiliations: KAIST

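AI summary: This paper proposes CoX-MoE, an AMX-enabled CPU-GPU collaborative system that optimizes MoE inference throughput through coalesced expert execution and strategic workload orchestration. It introduces two mechanisms: a coalescing-aware orchestration policy that optimizes resource allocation, and a static expert-aware stratification scheme that reduces PCIe transfer overhead. Together they achieve up to 7.1x and 2.4x higher throughput than FlexGen and MoE-Lightning, respectively.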

Comments 7 pages, 8 figures, accepted to DAC '26

DOI: 10.1145/3770743.3804296

Abstract

The Mixture-of-Experts (MoE) architecture improves computational efficiency via sparse expert activation, but throughput-oriented inference faces substantial GPU memory pressure due to large parameter sizes and intermediate data. Prior works attempt to mitigate this using expert offloading with micro-batching or by offloading computation to the CPU. However, the fragmented workload resulting from micro-batching degrades operational intensity, causing expert execution to become memory-bound. Meanwhile, CPU offloading is constrained by slow PCIe transfers and its limited applicability to attention computation in the decode stage. Consequently, these inefficiencies prevent effective system utilization, severely restricting the end-to-end throughput of MoE inference. To address these challenges, this paper proposes CoX-MoE, an Advanced Matrix Extensions (AMX)-enabled CPU-GPU collaborative system that comprehensively optimizes MoE inference by combining coalesced expert execution with strategic workload orchestration for higher throughput. CoX-MoE introduces (i) a coalescing-aware orchestration policy that jointly optimizes resource allocation by adopting ordinary batching, instead of micro-batching, for expert computation and by selectively offloading attention, and (ii) a static expert-aware stratification scheme that pre-assigns frequently activated experts to the GPU, mitigating PCIe transfer overhead and balancing the CPU and GPU workloads during inference. Compared to state-of-the-art frameworks, CoX-MoE delivers significant gains, achieving up to 7.1x and 2.4x higher throughput than FlexGen and MoE-Lightning, respectively.
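The next sketch illustrates static expert-aware placement. It assumes a profiled routing trace, given as lists of the expert IDs chosen for each token, and a GPU budget counted in whole experts. The most frequently activated experts are pinned to the GPU, and the rest stay in CPU memory. The names and the capacity model are hypothetical, and the paper's actual policy may weigh other factors.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def stratify_experts(trace: Sequence[List[int]], num_experts: int,
                     gpu_slots: int) -> Tuple[List[int], List[int]]:
    """Split experts into (gpu_resident, cpu_resident) by activation frequency."""
    counts = Counter(e for token_experts in trace for e in token_experts)
    # Sort by descending frequency; break ties by expert id for determinism.
    ranked = sorted(range(num_experts), key=lambda e: (-counts[e], e))
    return sorted(ranked[:gpu_slots]), sorted(ranked[gpu_slots:])


def gpu_hit_rate(trace: Sequence[List[int]], gpu_experts: List[int]) -> float:
    """Fraction of expert activations served without a PCIe weight transfer."""
    gpu = set(gpu_experts)
    total = hits = 0
    for token_experts in trace:
        for e in token_experts:
            total += 1
            hits += e in gpu
    return hits / total if total else 0.0
```

Because the assignment is computed once from the profile, it adds no scheduling cost at inference time. The catch is that a skewed or shifting routing distribution at serving time lowers the hit rate.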

2605.17859 2026-05-20 cs.HC cs.LG Version update

Multi-site PPG: An In-the-Wild Physiological Dataset from Emerging Multi-site Wearables

Jiayi Shao, Jiaying Ye, Shengyao Liu, Zachary Englhardt, Girish Narayanswamy, Vikram Iyer, Qiuyue Shirley Xue

Affiliations: University of Washington; Purdue University

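AI summary: This paper presents a multi-site PPG dataset of more than 350 hours of raw data collected with four custom-developed unobtrusive wearables. The dataset is used to evaluate how PPG signals from different body sites differ in heart-rate estimation performance.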

Comments 20 pages, 6 figures, 11 tables. Dataset and code available at the URLs in the paper

Abstract

Wearables are widely used for mobile health monitoring, and photoplethysmography (PPG) is a key sensing modality for heart rate and related physiological measurements. However, public in-the-wild PPG datasets remain largely wrist-centric or limited to short, controlled studies, constraining research on emerging wearable form factors. We present Multi-site PPG, an in-the-wild physiological dataset collected from four custom-developed unobtrusive wearables: a smart earring, ring, watch, and necklace. Each device records green and infrared reflective PPG, 3-axis acceleration, and temperature with timestamps for cross-device alignment, while a Polar H10 chest strap provides reference electrocardiogram (ECG). Participants wore the devices for multiple days during daytime activities while continuing their normal routines. The dataset contains over 350 hours of raw data and 230-290 hours of modeling-ready 8-second windows per wearable. We benchmark heuristic, supervised, and self-supervised heart-rate estimation methods, showing substantial body-site differences: the best methods achieve mean absolute errors (MAEs) of 2.30 bpm on the earring, 5.13 bpm on the ring, 8.37 bpm on the watch, and 8.68 bpm on the necklace. We further analyze motion effects and evaluate multi-site and PPG-accelerometer fusion, demonstrating the dataset's value for robust physiological sensing across emerging wearable form factors.
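The sketch below illustrates the evaluation protocol. It slices a signal into non-overlapping 8-second windows and computes the mean absolute error (MAE), in bpm, of per-window heart-rate estimates. The sampling rate, the non-overlapping stride, and the helper names are all assumptions. The dataset's own windowing and its alignment with the reference ECG may differ.

```python
from typing import List, Sequence


def make_windows(signal: Sequence[float], fs: int, seconds: float = 8.0) -> List[List[float]]:
    """Split a signal sampled at fs Hz into non-overlapping windows; drop the remainder."""
    size = int(fs * seconds)
    return [list(signal[i:i + size]) for i in range(0, len(signal) - size + 1, size)]


def mae_bpm(estimated: Sequence[float], reference: Sequence[float]) -> float:
    """Mean absolute error between estimated and reference heart rates (bpm)."""
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(e - r) for e, r in zip(estimated, reference)) / len(estimated)
```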

2605.17804 2026-05-20 cs.LG eess.SP Version update

GenTS: A Comprehensive Benchmark Library for Generative Time Series Models

Chenxi Wang, Xiaorong Wang, Peiyang Li, Yi Wang

Affiliations: The University of Hong Kong; Fudan University

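AI summary: This paper proposes GenTS, a comprehensive and extensible benchmark library for systematically evaluating generative time series models. It combines a unified data preprocessing pipeline, a diverse collection of models, and panoramic evaluation metrics, providing a more flexible evaluation framework for generative models.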

Abstract

Generative models have demonstrated remarkable potential in time series analysis tasks such as synthesis, forecasting, and imputation. However, existing time series libraries are engineered mainly for discriminative models and offer limited coverage of generative models, with standardized workflows for specific tasks, such as optimizing mean squared error for time series forecasting. This rigid structure is fundamentally incompatible with the distinct and often complex paradigms of generative models (e.g., adversarial training, diffusion processes), which learn the underlying data distribution rather than a direct input-output mapping. To this end, we propose GenTS, a comprehensive and extensible benchmark library designed for the systematic assessment of generative time series models. GenTS features a unified data preprocessing pipeline, a collection of versatile models, and panoramic evaluation metrics. Its modular design also enables researchers to flexibly customize beyond the built-in datasets and models. Based on GenTS, we conducted benchmarking experiments across diverse tasks, offering suggestions for model selection and identifying potential directions for future research. Our code is open-source at https://github.com/WillWang1113/GenTS. The official tutorials and documentation are available at https://willwang1113.github.io/GenTS/.

2605.17340 2026-05-20 cs.LG Version update

Olivia: Harmonizing Time Series Foundation Models with Power Spectral Density

Jingru Fei, Kun Yi, Alex Xing Wang, Qingsong Wen, Xiangxiang Zhu, Wei Fan

Affiliations: Beijing Institute of Technology; North China Institute of Computing Technology; State Information Center; University of Auckland; Northwestern Polytechnical University; Victoria University of Wellington

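AI summary: This paper proposes Olivia, a time series foundation model built on harmonization mechanisms. It uses power spectral density in the frequency domain to reduce mismatches between datasets and strengthen pretraining, and it achieves the best performance in zero-shot, few-shot, and full-shot forecasting scenarios.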

Comments Accepted by ICML 2026

Abstract

Time series foundation models rely on large-scale pretraining over diverse datasets across domains, yet the heterogeneity of their temporal patterns can hinder training and the learning of transferable time series representations. Inspired by a fundamental concept in signal processing, the normalized power spectral density (PSD), we hypothesize that harmonizing datasets via their PSDs in the spectral domain could reduce mismatches and enhance pretraining. We then go beyond the direct but intractable minimization and reformulate it as a principled harmonization approach. Specifically, we propose Harmonizer, a module that reshapes spectral structures and implicitly harmonizes PSDs across datasets, which theoretically corresponds to a shared reparameterization of second-order temporal correlations. Our theoretical analysis further reveals that token interactions with Harmonizer can be efficiently mediated by a compact set of resonators, motivating a HarmonicAttention design that performs self-attention in a low-dimensional interaction space. We then propose Olivia, a novel time series foundation model built upon these harmonization mechanisms. Extensive experiments on two large-scale benchmarks (TSLib and GIFT-Eval) and six additional datasets from GluonTS demonstrate that Olivia consistently achieves state-of-the-art performance in zero-shot, few-shot, and full-shot forecasting scenarios. Our code is available at https://github.com/TSTS13/Olivia.
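To make the central quantity concrete, here is a sketch of a normalized PSD. It takes the periodogram and divides it by its total power, so the result sums to one, using a naive DFT. It only illustrates the signal-processing concept the abstract starts from. It is not the Harmonizer module, and the function name is illustrative.

```python
import cmath
import math
from typing import List, Sequence


def normalized_psd(x: Sequence[float]) -> List[float]:
    """One-sided periodogram of a real series, normalized to sum to 1.

    Uses an O(n^2) DFT for clarity; bin k corresponds to frequency k/n cycles per sample.
    """
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]          # remove DC so the level does not dominate
    power = []
    for k in range(n // 2 + 1):
        coef = sum(centered[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(coef) ** 2 / n)
    total = sum(power)
    return [p / total for p in power] if total > 0 else power
```

Because the PSD is normalized, two series that differ only in amplitude or offset map to the same spectral profile. This is what makes PSDs a natural object for comparing heterogeneous datasets.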

2605.17326 2026-05-20 hep-lat cs.LG Version update

Noise scheduling and linear dynamics in diffusion models on Lie groups

Javad Komijani

Affiliations: Institute for Theoretical Physics, ETH Zurich, 8093 Zurich, Switzerland

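AI summary: This paper studies the role of noise scheduling in diffusion processes on Lie groups, with a focus on applications to lattice gauge theory. It finds that a particular noise schedule makes the expectation value of the Wilson action decay linearly with diffusion time. This behavior arises naturally in the Lie-group setting, whereas Euclidean diffusion models require an explicitly designed drift term to achieve it.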

Comments 5 pages

2605.17046 2026-05-20 cs.LG cs.AI cs.CL Version update

Beyond Perplexity: A Geometric and Spectral Study of Low-Rank Pre-training

Namrata Shivagunde, Vijeta Deshpande, Sherin Muckatira, Anna Rumshisky

Affiliations: University of Massachusetts Lowell

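AI summary: This paper studies low-rank pre-training methods through geometric and spectral analysis, revealing how they differ from full-rank training in model performance and in the solutions they reach. It finds that low-rank methods behave differently across model scales and that perplexity does not fully reflect downstream task performance.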

Comments 9 pages, 5 figures, 2 tables

Abstract

Pre-training large language models is dominated by the memory cost of storing full-rank weights, gradients, and optimizer states. Low-rank pre-training has emerged to address this, and the space of methods has grown rapidly. A central question remains open: do low-rank methods produce models that generalize comparably to full-rank training, or does the rank constraint fundamentally alter the solutions reached? Existing comparisons rely almost entirely on validation perplexity from single-seed runs, often carried forward from prior literature. Yet perplexity is a poor proxy for solution quality; two methods can match on perplexity while converging to different loss landscape regions and internal representations. We close this gap by characterizing the solutions found by five low-rank pre-training methods, GaLore and Fira (memory-efficient optimizers), CoLA and SLTrain (architecture reparameterizations), and ReLoRA (adapter-style updates with periodic resets), against full-rank training at three model scales (60M, 130M, 350M). We evaluate each along 16 metrics across four dimensions: 1-D loss landscape along random/top-K PCA directions, 1-D interpolation between checkpoints, spectral structure of the weights and learned updates, and activation similarity to full-rank training. We show that low-rank methods are not equivalent to full-rank training, nor to one another, even when validation perplexity is close. Full-rank training settles into a sharper basin than low-rank methods along random directions, while the reverse holds for the top-1 PCA direction. Each method converges to a geometrically distinct basin. Low-rank activations diverge from full-rank in later layers as training progresses, with GaLore tracking full-rank most closely. Further, validation perplexity does not translate to downstream performance at every scale. Adding geometric and spectral metrics improves the prediction.
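The 1-D interpolation metric mentioned above evaluates the loss along the straight line between two checkpoints, $L((1-\alpha)\theta_A + \alpha\theta_B)$ for $\alpha \in [0,1]$. A rise above both endpoints suggests that the checkpoints sit in different basins. Below is a minimal sketch in which a toy two-dimensional double-well loss stands in for a language-model loss. The toy loss and the function names are assumptions.

```python
from typing import Callable, List, Sequence

Params = Sequence[float]


def interpolate(theta_a: Params, theta_b: Params, alpha: float) -> List[float]:
    return [(1 - alpha) * a + alpha * b for a, b in zip(theta_a, theta_b)]


def interpolation_curve(loss: Callable[[Params], float], theta_a: Params,
                        theta_b: Params, steps: int = 11) -> List[float]:
    """Loss at evenly spaced points alpha = 0, 1/(steps-1), ..., 1."""
    return [loss(interpolate(theta_a, theta_b, i / (steps - 1))) for i in range(steps)]


def barrier_height(curve: Sequence[float]) -> float:
    """Max loss along the path minus the larger endpoint loss (0 means no barrier)."""
    return max(0.0, max(curve) - max(curve[0], curve[-1]))


def double_well(theta: Params) -> float:
    """Toy loss with minima at x = -1 and x = +1; y is a shallow quadratic direction."""
    x, y = theta
    return (x * x - 1) ** 2 + 0.1 * y * y
```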

2605.12981 2026-05-20 cs.SE cs.AI cs.LG Version update

Protocol-Driven Development: Governing Generated Software Through Invariants and Continuous Evidence

Jun He, Deying Yu

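AI summary: This paper proposes protocol-driven development, which governs generated software through protocol invariants and continuous evidence. Its core contribution is to treat the protocol, rather than the code, as the primary software artifact, enabling continuous verification and governance of generated software.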

Comments 20 pages, 2 tables

Abstract

Automated program synthesis lowers the cost of producing implementations but introduces a harder governance problem: determining which generated artifacts are admissible. Natural-language specifications are ambiguous, and example-based tests sample only part of the behavioral space. Used alone, neither provides a sufficient control boundary. We introduce Protocol-Driven Development (PDD), where the primary software artifact is a machine-enforceable protocol rather than code. We define a protocol as the triplet P = (S, B, O), specifying structural, behavioral, and operational invariants. Their conjunction defines the admissible implementation space of a software component. Under PDD, implementations are replaceable realizations discovered through constrained search. An implementation is admitted only if it satisfies the protocol and produces a verifiable Evidence Chain of compliance. Admission is grounded in protocol satisfaction and recorded evidence rather than trust in the generator. For deployed systems, we extend the Evidence Chain into a Dynamic Evidence Ledger. Runtime verifiers append signed observations, invariant checks, and violations to the ledger, allowing monitorable obligations to be continuously attested. This connects live failures back to the generation loop without granting the generator runtime authority. Combining formal methods, property testing, runtime verification, policy-as-code, and software provenance, PDD defines a governance model for automated software engineering. Its organizing principle is that code is transient, while the protocol carries durable authority.
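The following is a minimal sketch of admission under a protocol $P = (S, B, O)$. It assumes that each invariant is an executable predicate over a candidate implementation. It also assumes that the Evidence Chain is a hash-linked list of check records. The predicates, the record format, and the SHA-256 linking are illustrative assumptions rather than the paper's specification, and signing is omitted.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Check = Callable[[Callable], bool]
GENESIS = "0" * 64


def record_hash(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


@dataclass
class Protocol:
    structural: Dict[str, Check]
    behavioral: Dict[str, Check]
    operational: Dict[str, Check]


@dataclass
class EvidenceChain:
    records: List[dict] = field(default_factory=list)

    def append(self, invariant: str, passed: bool) -> None:
        prev = self.records[-1]["hash"] if self.records else GENESIS
        body = {"invariant": invariant, "passed": passed, "prev": prev}
        self.records.append({**body, "hash": record_hash(body)})

    def verify(self) -> bool:
        """Recompute every link; any edited record breaks the chain."""
        prev = GENESIS
        for rec in self.records:
            body = {k: rec[k] for k in ("invariant", "passed", "prev")}
            if rec["prev"] != prev or rec["hash"] != record_hash(body):
                return False
            prev = rec["hash"]
        return True


def admit(protocol: Protocol, impl: Callable) -> Tuple[bool, EvidenceChain]:
    """Admit impl only if S, B and O all hold; record every check as evidence."""
    chain = EvidenceChain()
    ok = True
    for group in (protocol.structural, protocol.behavioral, protocol.operational):
        for name, check in group.items():
            passed = bool(check(impl))
            chain.append(name, passed)
            ok = ok and passed
    return ok, chain
```

A rejected candidate still yields a valid chain. The evidence records why the candidate was refused, which is the information the abstract feeds back into the generation loop.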

2605.11333 2026-05-20 cs.DC cs.LG cs.PF Version update

MLCommons Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces

Srinivas Sridharan, Theodor-Adrian Badea, Andy Balogh, Bradford M. Beckmann, Brian Coutinho, Louis Feng, Sheng Fu, Sanshan Gao, Mehryar Garakani, Taekyung Heo, David Kanter, Josh Ladd, Ziwei Li, Winston Liu, Changhai Man, Dan Mihailescu, Spandan More, Joongun Park, Ashwin Ramachandran, Vinay Ramakrishnaiah, Saeed Rashidi, Vijay Janapa Reddi, Puneet Sharma, Phio Tian, William Won, Hanjiang Wu, Huan Xu, Jinsun Yoo, Tushar Krishna

Affiliations: Anonymous Institution, Anonymous City, Anonymous Region, Anonymous Country

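AI summary: This paper presents Chakra, an open ecosystem for performance benchmarking and co-design. It uses standardized execution traces to improve how distributed machine learning workloads on production AI systems are observed, reproduced, and optimized, and it demonstrates its value through real-world case studies.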

Comments Accepted at the 9th Conference on Machine Learning and Systems (MLSys 2026)

HoReN: Normalized Hopfield Retrieval for Large-Scale Sequential Model Editing

Yuan Fang, Yi Xie, Xuming Ran

Affiliations: IXL Learning, Inc.; Technical University of Munich; National University of Singapore

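AI summary: This paper proposes HoReN, a codebook-based parameter-preserving editor that wraps a single MLP layer with a discrete key-value memory. It enables efficient retrieval and updates for large-scale sequential model editing and performs strongly across multiple benchmarks.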

Comments 30 pages, 10 figures

Abstract

Large language models encode vast factual knowledge that can become outdated or incorrect after deployment, yet retraining is prohibitively costly. This motivates lifelong model editing, which updates targeted behavior while preserving the rest of the model. Existing editors, both parameter-modifying and parameter-preserving, degrade severely as edits accumulate and struggle to generalize across paraphrases. We propose HoReN, a codebook-based parameter-preserving editor that wraps a single MLP layer with a discrete key-value memory. HoReN treats each codebook entry as both a knowledge key and a Hopfield stored pattern, retrieves edits by angular similarity on the unit hypersphere, and refines queries through damped Hopfield dynamics so paraphrases converge to the correct memory basin while unrelated inputs remain stable. HoReN achieves strong editing performance with consistent gains across diverse benchmarks spanning standard ZsRE, structured WikiBigEdit, and unstructured UnKE evaluations. Moreover, HoReN scales to 50K sequential edits on ZsRE with stable overall performance above 0.93, while prior editors collapse or degrade severely before reaching 10K. Our code is available at https://github.com/ha11ucin8/HoReN.
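Below is a minimal sketch of the retrieval step as the abstract describes it. Keys are normalized to the unit hypersphere. The query is refined by damped modern-Hopfield updates, $\xi \leftarrow (1-\lambda)\xi + \lambda X \operatorname{softmax}(\beta X^\top \xi)$, and renormalized after each step. The refined query is then matched by cosine similarity against a threshold. The exact update form and the values of $\beta$, $\lambda$, and the threshold are assumptions for illustration. They are not HoReN's published hyperparameters.

```python
import math
from typing import List, Optional, Sequence

Vec = List[float]


def normalize(v: Sequence[float]) -> Vec:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def hopfield_refine(query: Sequence[float], keys: List[Vec], beta: float = 8.0,
                    damping: float = 0.5, steps: int = 10) -> Vec:
    """Damped Hopfield dynamics on the unit sphere: pull the query toward a stored key."""
    xi = normalize(query)
    for _ in range(steps):
        logits = [beta * sum(k_i * x_i for k_i, x_i in zip(k, xi)) for k in keys]
        top = max(logits)
        weights = [math.exp(l - top) for l in logits]      # numerically stable softmax
        z = sum(weights)
        target = [sum(w * k[d] for w, k in zip(weights, keys)) / z for d in range(len(xi))]
        xi = normalize([(1 - damping) * x + damping * t for x, t in zip(xi, target)])
    return xi


def retrieve(query: Sequence[float], keys: List[Vec], threshold: float = 0.9) -> Optional[int]:
    """Index of the matching edit, or None so the wrapped layer runs unchanged."""
    unit_keys = [normalize(k) for k in keys]
    xi = hopfield_refine(query, unit_keys)
    scores = [cosine(xi, k) for k in unit_keys]
    best = max(range(len(keys)), key=scores.__getitem__)
    return best if scores[best] >= threshold else None
```

A query that lies close to one stored key is drawn into that key's basin. A query that lies equally far from all keys stays between them and falls below the threshold.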

2605.07721 2026-05-20 cs.CL cs.AI cs.LG Version update

Memory-Efficient Looped Transformer: Decoupling Compute from Memory in Looped Language Models

Victor Conchello Vendrell, Arnau Padres Masdemont, Niccolò Grillo, Jordi Ros-Giralt, Arash Behboodi, Fabio Valerio Massoli

Affiliations: Qualcomm AI Research

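AI summary: This paper proposes the Memory-Efficient Looped Transformer (MELT), which decouples reasoning depth from memory consumption. It achieves constant-memory iterative reasoning while preserving LoopLM performance, and it requires only a lightweight post-training procedure.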

Comments 22 pages, 5 figures, 11 tables

Abstract

Recurrent LLM architectures have emerged as a promising approach for improving reasoning, as they enable multi-step computation in the embedding space without generating intermediate tokens. Models such as Ouro perform reasoning by iteratively updating internal representations while retaining a standard Key-Value (KV) cache across iterations, causing memory consumption to grow linearly with reasoning depth. Consequently, increasing the number of reasoning iterations can lead to prohibitive memory usage, limiting the practical scalability of such architectures. In this work, we propose Memory-Efficient Looped Transformer (MELT), a novel architecture that decouples reasoning depth from memory consumption. Instead of using a standard KV cache per layer and loop, MELT maintains a single KV cache per layer that is shared across reasoning loops. This cache is updated over time via a learnable gating mechanism. To enable stable and efficient training under this architecture, we propose to train MELT using chunk-wise training in a two-phase procedure: interpolated transition, followed by attention-aligned distillation, both from the LoopLM starting model to MELT. Empirically, we show that MELT models fine-tuned from pretrained Ouro parameters outperform standard LLMs of comparable size, while maintaining a memory footprint comparable to those models and dramatically smaller than Ouro's. Overall, MELT achieves constant-memory iterative reasoning without sacrificing LoopLM performance, using only a lightweight post-training procedure.
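Here is some back-of-the-envelope arithmetic for the memory claim. It assumes that the looped baseline keeps one KV cache per layer per loop, while MELT keeps one shared cache per layer. All model dimensions used below are hypothetical and are not Ouro's or MELT's actual configuration.

```python
from typing import Tuple


def kv_cache_bytes(tokens: int, layers: int, kv_heads: int, head_dim: int,
                   bytes_per_value: int = 2, caches_per_layer: int = 1) -> int:
    """Bytes to store keys and values (factor 2) for every cached position."""
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per_value * caches_per_layer


def looped_vs_shared(tokens: int, layers: int, kv_heads: int, head_dim: int,
                     loops: int) -> Tuple[int, int]:
    """(per-loop caches, one shared cache per layer) memory in bytes."""
    per_loop = kv_cache_bytes(tokens, layers, kv_heads, head_dim, caches_per_layer=loops)
    shared = kv_cache_bytes(tokens, layers, kv_heads, head_dim, caches_per_layer=1)
    return per_loop, shared
```

Take a hypothetical setting of 32,768 tokens, 24 layers, 8 KV heads of dimension 128, 16-bit values, and 4 loops. The shared cache then takes 3 GiB while the per-loop caches take 12 GiB, and the gap grows linearly with the number of loops.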

2605.06501 2026-05-20 cs.LG cs.CL Version update

Cubit: Token Mixer with Kernel Ridge Regression

Chuanyang Zheng, Jiankai Sun, Yihang Gao, Yuehao Wang, Liangchen Tan, Mac Schwager, Anderson Schneider, Yuriy Nevmyvaka, Xiaodong Liu

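AI summary: This paper proposes Cubit, a new architecture based on kernel ridge regression. It replaces the Nadaraya-Watson regression underlying attention-based token mixing with kernel ridge regression, aiming for a firmer mathematical foundation, and it appears to be stronger at long-sequence modeling.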

Comments Tech Report

Abstract

Since its introduction in 2017, the Transformer has become one of the most widely adopted architectures in modern deep learning. Despite extensive efforts to improve positional encoding, attention mechanisms, and feed-forward networks, the core token-mixing mechanism in Transformers remains attention. In this work, we show that the attention module in Transformers can be interpreted as performing Nadaraya-Watson regression, where it computes similarities between tokens and aggregates the corresponding values. Motivated by this perspective, we propose Cubit, a potential next-generation architecture that leverages Kernel Ridge Regression (KRR), whereas the vanilla Transformer relies on Nadaraya-Watson regression. Specifically, Cubit modifies the classical attention computation by incorporating the closed-form solution of KRR, combining value aggregation through kernel similarities with normalization via the inverse of the kernel matrix. To improve training stability, we further propose Limited-Range Rescale (LRR), which rescales the value layer within a controlled range. We argue that Cubit, as a KRR-based architecture, provides a stronger mathematical foundation than the vanilla Transformer, whose attention mechanism corresponds to Nadaraya-Watson regression. We validate this claim through comprehensive experiments. The experimental results suggest that Cubit may exhibit stronger long-sequence modeling capability. In particular, its performance gain over the Transformer appears to increase as the training sequence length grows.
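The toy example below contrasts the two regressions. It assumes an exponential dot-product kernel $\kappa(q,k)=e^{q\cdot k}$, as in softmax attention, and a ridge parameter $\lambda$. Nadaraya-Watson (softmax attention) returns $\sum_j \kappa(q,k_j) v_j / \sum_j \kappa(q,k_j)$. Kernel ridge regression returns $\kappa(q,K)(K_{kk}+\lambda I)^{-1}V$. This is the textbook KRR estimator, not Cubit's full layer: it omits LRR, batching, and causal masking.

```python
import math
from typing import List, Sequence

Vec = List[float]


def kernel(q: Sequence[float], k: Sequence[float]) -> float:
    return math.exp(sum(a * b for a, b in zip(q, k)))


def nadaraya_watson(q: Vec, keys: List[Vec], values: List[Vec]) -> Vec:
    """Softmax attention: kernel-weighted average of the values."""
    w = [kernel(q, k) for k in keys]
    z = sum(w)
    return [sum(wi * v[d] for wi, v in zip(w, values)) / z for d in range(len(values[0]))]


def solve(A: List[Vec], B: List[Vec]) -> List[Vec]:
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


def kernel_ridge(q: Vec, keys: List[Vec], values: List[Vec], lam: float = 1e-6) -> Vec:
    """KRR prediction kappa(q, K) (K_kk + lam I)^{-1} V."""
    n = len(keys)
    gram = [[kernel(keys[i], keys[j]) + (lam if i == j else 0.0) for j in range(n)]
            for i in range(n)]
    alpha = solve(gram, [list(v) for v in values])   # n x d_v coefficients
    w = [kernel(q, k) for k in keys]
    return [sum(wi * a[d] for wi, a in zip(w, alpha)) for d in range(len(values[0]))]
```

When the query equals a stored key, KRR with a small $\lambda$ reproduces that key's value, while the softmax average blends in the other values. The price is a linear solve in the number of tokens, which is presumably one reason the paper pairs the idea with stabilizing tricks such as LRR.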

2605.05569 2026-05-20 math.OC cs.LG 版本更新

Stability of the Monge Map in Semi-Dual Optimal Transport

半对偶最优运输中Monge映射的稳定性

Anton Selitskiy, David Millard

发表机构 * Department of Electrical and Computer Engineering（电气与计算机工程系）； University of Rochester（罗切斯特大学）； Department of Mechanical Engineering（机械工程系）； Rochester Institute of Technology（罗切斯特理工学院）

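AI summary: This paper studies the degenerate saddle-point structure of the semi-dual optimal transport problem. It proves that solving the problem numerically is equivalent to solving a constrained optimization problem. It then derives convergence conditions for the Monge map that do not require the dual potential to be optimal. These conditions explain why, in practice, numerical algorithms need more iterations to update the transport map than to update the potential.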

2605.05480 2026-05-20 cs.LG cs.AI stat.ML 版本更新

GRALIS: A Unified Canonical Framework for Linear Attribution Methods via Riesz Representation

GRALIS：通过里斯表示建立线性归因方法的统一规范框架

Raimondo Fanale

发表机构 * Universitas Mercatorum（默卡托大学）

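AI summary: This paper proposes GRALIS, a framework that unifies linear attribution methods through Riesz representation theory. It provides seven formal theorems covering exact completeness, Monte Carlo convergence, Shapley interaction values, a Hoeffding ANOVA decomposition, a generalization of Sobol sensitivity, and a multi-scale extension. Preliminary validation is reported on histopathology images.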

Comments 25 pages, 6 tables, 2 figures. Theoretical framework with preliminary experimental validation on BreaKHis (1,187 images, DenseNet-121). Extended empirical comparison in preparation

AI中文摘要

深度神经网络的主要XAI归因方法——GradCAM、SHAP、LIME、集成梯度——基于不同的理论基础且无法正式比较。我们提出了GRALIS（梯度-里斯平均局部积分Shapley），一个建立归因表示理论的数学框架：L^2(Q, mu)上的每一个可加、线性和连续的归因功能都具有唯一的规范表示（Q，w，Delta），由里斯表示定理证明其必要性。该类包括SHAP、IG、LIME和线性化GradCAM，但不包括非线性功能如标准GradCAM或注意力图。七个形式定理提供了任何单个方法都缺乏的同时保证：（T1）必要规范形式；（T2）精确完备性；（T3）蒙特卡洛收敛O(1/sqrt(m))+O(1/k)；（T4）精确Shapley交互值；（T5）Hoeffding ANOVA分解；（T6）Sobol敏感性泛化；（T7）多尺度扩展（MS-GRALIS）具有最小方差权重。代数附录通过Mobius变换证明GRALIS-SIV对应关系，无需循环论证。GRALIS满足13.5/14个公理性质，而单独方法仅为2.5-6/14，包括完备性、敏感性、局部性、k阶交互和最优多尺度聚合。在BreaKHis（1,187例病理图像，DenseNet-121）上的初步验证报告删除忠实度AUC+0.015（恶性），96%类条件一致性，SAL=0.762±0.109和稀疏性指数0.39。与基线XAI方法的扩展比较计划在配套论文中进行。

英文摘要

The main XAI attribution methods for deep neural networks -- GradCAM, SHAP, LIME, Integrated Gradients -- operate on separate theoretical foundations and are not formally comparable. We present GRALIS (Gradient-Riesz Averaged Locally-Integrated Shapley), a mathematical framework establishing a representation theory for attributions: every additive, linear, and continuous attribution functional on L^2(Q,mu) admits a unique canonical representation (Q, w, Delta), proved necessary by the Riesz Representation Theorem. This class encompasses SHAP, IG, LIME and linearized GradCAM, but excludes nonlinear functionals such as standard GradCAM or attention maps. Seven formal theorems provide simultaneous guarantees absent in any individual method: (T1) necessary canonical form; (T2) exact completeness; (T3) Monte Carlo convergence O(1/sqrt(m))+O(1/k); (T4) exact Shapley Interaction Values; (T5) Hoeffding ANOVA decomposition; (T6) Sobol sensitivity generalization; (T7) multi-scale extension (MS-GRALIS) with minimum-variance weights. An algebraic appendix justifies the GRALIS-SIV correspondence via the Mobius transform without circularity. GRALIS satisfies 13.5/14 axiomatic properties vs. 2.5-6/14 for individual methods, including completeness, sensitivity, locality, order-k interactions and optimal multi-scale aggregation simultaneously. Preliminary validation on BreaKHis (1,187 histology images, DenseNet-121) reports deletion faithfulness AUC +0.015 (malignant), 96% class-conditional consistency, SAL = 0.762+/-0.109 and sparsity index 0.39. Extended comparison with baseline XAI methods is planned for a companion paper.
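Integrated Gradients is one of the methods the framework covers, and it illustrates the completeness property (T2) in a few lines. The sketch below uses a toy model f(x) = sin(w·x) with an analytic gradient and a midpoint rule along the straight-line path. It is standard IG, not GRALIS's canonical (Q, w, Delta) representation. The model, the baseline, and the step count are assumptions chosen for illustration.

```python
import numpy as np

def f(x, w):
    """Toy differentiable model (assumption): f(x) = sin(w . x)."""
    return np.sin(w @ x)

def grad_f(x, w):
    """Analytic gradient of the toy model with respect to x."""
    return np.cos(w @ x) * w

def integrated_gradients(x, baseline, w, steps=256):
    """IG_i = (x_i - b_i) * average of df/dx_i along the path b + t (x - b),
    with the path integral approximated by a midpoint Riemann sum over t in [0, 1]."""
    ts = (np.arange(steps) + 0.5) / steps                      # midpoints of [0, 1]
    grads = np.stack([grad_f(baseline + t * (x - baseline), w) for t in ts])
    return (x - baseline) * grads.mean(axis=0)

w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 0.3, 0.4])
baseline = np.zeros(3)
attr = integrated_gradients(x, baseline, w)
print(attr.sum(), f(x, w) - f(baseline, w))   # both close to sin(1), about 0.8415
```

Here w·x = 1, so completeness requires the attributions to sum to sin(1) = f(x) - f(baseline). The residual is pure quadrature error and shrinks as `steps` grows.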

2605.00856 2026-05-20 eess.SP cs.AI cs.HC cs.LG 版本更新

One-Block Transformer (1BT) for EEG-Based Cognitive Workload Assessment

用于EEG认知负荷评估的单块变换器（1BT）

Stefanos Gkikas, Christian Arzate Cruz, Thomas Kassiotis, Giorgos Giannakakis, Raul Fernandez Rojas, Randy Gomez

发表机构 * Honda Research Institute Japan, Wako City, Japan； Department of Electronic Engineering, Hellenic Mediterranean University, Chania, Greece； BioSIS (Biosensing & Intelligent Systems) Lab, Centre for Intelligent Computing Systems, University of Canberra, Canberra, Australia

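AI summary: This paper proposes a One-Block Transformer (1BT) for EEG-based cognitive workload assessment. The model aggregates multi-channel time series through a minimal latent bottleneck followed by lightweight self-attention. The result is a compact, efficient design that maintains high performance while substantially reducing computational cost.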

AI中文摘要

准确且连续地估计认知负荷对于构建自适应的人机系统至关重要。然而，设计在表示能力与计算效率之间取得平衡的架构在实际部署中一直具有挑战性。本文介绍了一种名为1BT的单块变换器，用于紧凑且高效的EEG认知负荷评估。该模型通过最小的潜在瓶颈聚合多通道时间序列，使用一个单一的交叉注意力模块后接轻量级自注意力。一项涉及11名参与者进行三种认知多样任务（抽象推理、数值问题解决和互动视频游戏）的受控研究，在两个认知负荷水平上进行了连续EEG记录。系统性的架构分析确定了最紧凑的配置，该配置在保持高性能的同时显著降低了计算成本。最终模型在不到0.5百万参数和0.02 GFLOPs的情况下实现了高认知负荷分类性能，为在资源受限环境下实时认知负荷监控的设计方向铺平了道路。

英文摘要

Accurate and continuous estimation of cognitive workload is fundamental to creating adaptive human-machine systems. However, designing architectures that balance representational capacity with computational efficiency has been challenging for practical deployment. This paper introduces 1BT, a One-Block Transformer for compact and efficient EEG-based cognitive workload assessment. The model aggregates multi-channel temporal sequences via a minimal latent bottleneck, using a single cross-attention module followed by lightweight self-attention. A controlled study involving 11 participants performing three cognitively diverse tasks (abstract reasoning, numerical problem-solving, and an interactive video game) was conducted with continuous EEG recordings across two workload levels. Systematic architectural analysis identifies the most compact configuration that preserves high performance while substantially lowering computational cost. The final model achieves high workload classification performance with under 0.5 million parameters and 0.02 GFLOPs, pointing to a design direction for real-time cognitive workload monitoring in resource-constrained settings.
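The following is a minimal NumPy sketch of the latent-bottleneck pattern the abstract describes. A handful of latent vectors cross-attend once to the EEG tokens, the latents then self-attend, and the pooled latents are classified. Several choices here are assumptions and are not the 1BT configuration reported in the paper: treating each channel's time series as one token, using a single head without learned query/key/value projections, and the sizes in `init_params`.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Single-head scaled dot-product attention without learned projections."""
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

def init_params(rng, n_times, d=16, n_latents=4, n_classes=2):
    """Hypothetical sizes: project each channel's time series to d, use a few latents."""
    return {
        "embed": rng.normal(size=(n_times, d)) / np.sqrt(n_times),
        "latents": rng.normal(size=(n_latents, d)),
        "head": rng.normal(size=(d, n_classes)) / np.sqrt(d),
    }

def one_block_forward(eeg, params):
    """eeg: (channels, time) -> class probabilities."""
    tokens = eeg @ params["embed"]                  # (C, d): one token per channel
    lat = params["latents"]                         # (L, d) with L much smaller than C
    lat = lat + attention(lat, tokens, tokens)      # cross-attention bottleneck
    lat = lat + attention(lat, lat, lat)            # lightweight self-attention on latents
    logits = lat.mean(axis=0) @ params["head"]      # pool latents, then classify
    return softmax(logits)

rng = np.random.default_rng(0)
params = init_params(rng, n_times=256)
probs = one_block_forward(rng.normal(size=(32, 256)), params)   # 32 channels, 256 samples
print(probs.shape)   # (2,)
```

Cross-attention cost in this shape scales with latents × channels rather than channels squared. Keeping the latent count small is therefore what keeps a model of this kind compact.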

2605.00333 2026-05-20 cs.LG cs.CL 版本更新

Borrowed Geometry: Cross-Distribution Head-Importance Fingerprints of Frozen Pretrained Gemma 4 31B

借来的几何：冻结预训练Gemma 4 31B的跨分布注意力头重要性指纹

Abay Bektursun

发表机构 * Independent research（独立研究）

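AI summary: This paper studies cross-distribution head-importance fingerprints of the frozen pretrained Gemma 4 31B model. By analyzing per-head ablation effects across several tasks, it finds that specific attention heads rank as important on both text and non-text tasks. It also provides head-level causal evidence for one of these heads on a cross-modality target.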

Comments v2: Added head-level causal ablation on OGBench cube-task1 (n=30, 3.2x specificity; n=5 paired-t p=0.039) and full L26 sweep. New sections on honest negatives (activation patching null, sufficiency null, within-layer Spearman wrong-direction). Multiplicity-aware permutation null V4 P=0.013. Title and framing updated. 25 pages (13 main), 10 figures

AI中文摘要

冻结在文本上预训练的Gemma 4 31B权重，未经修改，通过一个薄的可训练接口转移到非文本模态。在L24-L29切片（192个注意力头）上，一个英语文本TxtCopy注意力探针（95个句子）和每个头部对四个非语言标记模式任务（二进制复制、联想回忆、1D细胞自动机规则90、二进制加法）的影响共同分类了四个头部——L26.28、L27.28、L27.2、L27.3——在两个信号上都处于顶级。切片级别的联合巧合在超几何空虚下显著（P=0.0013，N=192，K=38，n=4）并且在多重性感知的排列检验中存活（P_V4=0.013）。预训练的Gemma L26在OGBench cube-double-play-task1上达到60.22% vs ~1%对于随机初始化的Gemma（+59pt在n=3时）；一个带有正确1/√d_k缩放的FrozenRandom-GPT2对照也失败。头部层面的因果验证：在训练的cube-task1 IQL代理中零化L26.28导致成功从63.3%降至10.0% vs 46.7%对于层匹配的低-TxtCopy负对照（在n=30时有3.2倍的特异性；n=5配对-t p=0.039）。完整的L26扫描将L26.28置于32个中的第4位。诚实的负样本：在L26内Spearman ρ（TxtCopy，drop）=+0.37（与层内因果阅读相反）；单个头部激活修补不转移匹配变量；四个命名头部单独不足以完成任何任务；Walker2d-DT和scene-task1招募L24在命名切片之外并显示头-消融特异性为零。我们将贡献框架为切片级别的跨分布重要性指纹加上一个跨模态目标的头部层面因果证据。

英文摘要

Frozen Gemma 4 31B weights pretrained exclusively on text, unmodified, transfer through a thin trainable interface to non-text modalities the substrate has never processed. On the L24--L29 slice (192 attention heads), an English-text TxtCopy attention probe (95 sentences) and per-head ablation impact on four non-language token-pattern tasks (binary copy, associative recall, 1D cellular automaton Rule 90, binary addition) jointly classify four heads -- L26.28, L27.28, L27.2, L27.3 -- as top-tier on both signals. The slice-level joint coincidence is significant under hypergeometric null ($P = 0.0013$, $N=192$, $K=38$, $n=4$) and survives multiplicity-aware permutation tests ($P_{V4} = 0.013$). Pretrained Gemma L26 reaches 60.22% on OGBench cube-double-play-task1 vs ~1% for random-init Gemma ($+59$pt at $n=3$); a FrozenRandom-GPT2 control with correct $1/\sqrt{d_k}$ scaling also fails. Head-level causal validation: zeroing L26.28 in the trained cube-task1 IQL agent drops success $63.3\% \to 10.0\%$ vs $46.7\%$ for a layer-matched low-TxtCopy negative control ($3.2\times$ specificity at $n=30$; $n=5$ paired-$t$ $p=0.039$). A full L26 sweep places L26.28 at rank 4 of 32. Honest negatives: within-L26 Spearman $ρ(\text{TxtCopy, drop}) = +0.37$ (opposite of within-layer causal reading); single-head activation patching does not transfer the matching variable; the 4 named heads alone do not suffice on any task; Walker2d-DT and scene-task1 recruit L24 outside the named slice and show null head-ablation specificity. We frame the contribution as a cross-distribution importance fingerprint at the slice level plus head-level causal evidence on one cross-modality target.
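Two of the reported numbers can be checked with plain arithmetic. One reading reproduces the reported P = 0.0013: draw n = 4 heads uniformly from N = 192 heads, of which K = 38 are top-tier, and ask that all 4 land in that set. Computing "3.2x specificity" as the ratio of the success-rate drop for L26.28 to the drop for the control head also appears consistent with the figures above. Both readings are assumptions on our part, and the paper may define these quantities differently.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)

def specificity(base, target_ablation, control_ablation):
    """Assumed definition: drop from ablating the target head / drop from the control."""
    return (base - target_ablation) / (base - control_ablation)

p = hypergeom_sf(4, N=192, K=38, n=4)
print(f"{p:.4f}")                                   # 0.0013
print(round(specificity(63.3, 10.0, 46.7), 2))      # 3.21
```

Under this reading the joint-coincidence probability is C(38, 4) / C(192, 4) = 73815 / 54870480.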

2604.18739 2026-05-20 cs.LG stat.ML 版本更新

自适应阈值驱动的连续贪心方法用于可扩展的子模优化

Mohammadreza Rostami, Solmaz S. Kia

发表机构 * Department of Mechanical and Aerospace Engineering, University of California Irvine（加州大学尔湾分校机械与航空航天工程系）

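AI summary: This study proposes Adaptive Thresholded Continuous Greedy (ATCG) for submodular maximization under matroid constraints. It adaptively controls how each agent's active set expands, which improves efficiency and reduces communication overhead.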

AI中文摘要

在组合优化中，子模最大化在传感、数据摘要、主动学习和资源分配中有广泛应用。尽管顺序贪心（SG）算法由于不可逆选择只能达到1/2的近似比，连续贪心（CG）通过多线性松弛获得最优的(1-1/e)近似比，但其代价是逐渐密集的决策向量，迫使代理为几乎每一个基础集元素交换特征嵌入。我们提出ATCG（自适应阈值驱动连续贪心），通过每个分区的进度比率η_i来控制梯度评估，仅在当前候选未能捕获足够边际增益时扩展每个代理的活跃集，从而直接限制哪些特征嵌入会被传输。理论分析建立了具有曲率意识的近似保证，有效因子τ_eff= max{τ,1-c}，在阈值保证和低曲率区域之间插值，其中ATCG恢复CG的性能。这表明，曲率所捕捉的问题结构决定了接近全CG性能所需的协调和通信量。在类平衡的原型选择问题实验中，ATCG在CIFAR-10动物数据集的子集上实现了与全CG方法相当的目标值，同时显著减少了通信开销。

英文摘要

Submodular maximization under matroid constraints is a fundamental problem in combinatorial optimization with applications in sensing, data summarization, active learning, and resource allocation. While the Sequential Greedy (SG) algorithm achieves only a $\frac{1}{2}$-approximation due to irrevocable selections, Continuous Greedy (CG) attains the optimal $\bigl(1-\frac{1}{e}\bigr)$-approximation via the multilinear relaxation, at the cost of a progressively dense decision vector that forces agents to exchange feature embeddings for nearly every ground-set element. We propose ATCG (Adaptive Thresholded Continuous Greedy), which gates gradient evaluations behind a per-partition progress ratio $η_i$, expanding each agent's active set only when current candidates fail to capture sufficient marginal gain, thereby directly bounding which feature embeddings are ever transmitted. Theoretical analysis establishes a curvature-aware approximation guarantee with effective factor $τ_{\mathrm{eff}}=\max\{τ,1-c\}$, interpolating between the threshold-based guarantee and the low-curvature regime where ATCG recovers the performance of CG. This shows that the problem structure, as captured by curvature, determines the amount of coordination and communication required to approach full-CG performance. Experiments on a class-balanced prototype selection problem over a subset of the CIFAR-10 animal dataset show that ATCG achieves objective values comparable to those of the full CG method while substantially reducing communication overhead through adaptive active-set expansion.
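The effective factor $τ_{\mathrm{eff}}=\max\{τ,1-c\}$ can be illustrated on a toy coverage function. This sketch assumes c is the standard total curvature, c = 1 - min_j [f(V) - f(V minus {j})] / f({j}), and treats τ as a given number. How ATCG derives τ from its thresholds is not shown. The sets and the value τ = 0.3 are hypothetical.

```python
def coverage(selected, sets):
    """Coverage function f(S) = size of the union of the chosen sets (monotone submodular)."""
    covered = set()
    for j in selected:
        covered |= sets[j]
    return len(covered)

def total_curvature(sets):
    """c = 1 - min over j of [f(V) - f(V minus {j})] / f({j}), for elements with f({j}) > 0."""
    V = list(sets)
    fV = coverage(V, sets)
    ratios = [
        (fV - coverage([i for i in V if i != j], sets)) / coverage([j], sets)
        for j in V
        if coverage([j], sets) > 0
    ]
    return 1 - min(ratios)

def effective_factor(tau, c):
    """Curvature-aware factor from the abstract: max(tau, 1 - c)."""
    return max(tau, 1 - c)

overlapping = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"d"}}   # hypothetical ground set
c = total_curvature(overlapping)
print(c, effective_factor(0.3, c))   # 0.5 0.5
```

With two overlapping sets, c = 0.5, so the factor is 0.5 whenever τ ≤ 0.5. Disjoint sets make f modular (c = 0), and the factor reaches 1. This matches the low-curvature regime in which, according to the abstract, ATCG recovers CG.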

2603.29501 2026-05-20 cs.LG cs.AI 版本更新

Target-Aligned Reinforcement Learning

目标对齐的强化学习

Leonard S. Pleiss, James Harrison, Maximilian Schiffer

发表机构 * Technical University of Munich（慕尼黑工业大学）

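AI summary: This paper proposes target-aligned reinforcement learning, which emphasizes transitions on which the target and online network estimates closely agree. This improves the stability and convergence speed of conventional deep RL algorithms, and experiments show significant gains across multiple benchmark environments.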

AI中文摘要

许多基于价值的深度强化学习算法依赖于目标网络——在线网络的滞后副本——来稳定训练。虽然有效，但这种机制引入了一个基本的稳定性与新鲜度权衡：较慢的目标更新可以提高稳定性，但会降低学习信号的时效性，从而阻碍收敛速度。我们提出目标对齐的强化学习（TARL），这是一种简单的改进方法，适用于现有算法，强调目标网络和在线网络估计高度一致的过渡。通过将更新集中在良好对齐的目标上，TARL减轻了陈旧目标估计的负面影响，同时保留了目标网络的稳定作用。我们在离散和连续控制算法中，在各种基准环境中展示了持续的改进，无需任何超参数调整，包括在Atari-10上实现了38.18%的峰值得分提升，同时仅导致不到4%的实时时钟时间增加。

英文摘要

Many value-based deep reinforcement learning algorithms rely on target networks - lagged copies of the online network - to stabilize training. While effective, this mechanism introduces a fundamental stability-recency tradeoff: slower target updates improve stability but reduce the recency of learning signals, hindering convergence speed. We propose Target-Aligned Reinforcement Learning (TARL), a simple drop-in refinement for existing algorithms that emphasizes transitions for which the target and online network estimates are highly aligned. By focusing updates on well-aligned targets, TARL mitigates the adverse effects of stale target estimates while retaining the stabilizing benefits of target networks. We empirically demonstrate consistent improvements within discrete and continuous control algorithms across various benchmark environments without any hyperparameter tuning, including a 38.18% peak score gain on Atari-10, while incurring less than a 4% increase in wall-clock time.
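The abstract does not say how alignment is measured or how aligned transitions are emphasized, so the following is a hypothetical sketch. It scores each transition by the gap between TD targets computed from the online network's and the target network's next-state values, then keeps the best-aligned fraction of the batch as a hard mask. The function names, `keep_frac`, and the hard top-fraction rule are all assumptions. The paper may use soft weights or a different alignment measure.

```python
import numpy as np

def td_targets(rewards, dones, next_q, gamma=0.99):
    """One-step TD targets: r + gamma * (1 - done) * max_a' Q(s', a')."""
    return rewards + gamma * (1.0 - dones) * next_q.max(axis=1)

def aligned_mask(rewards, dones, next_q_online, next_q_target, keep_frac=0.5, gamma=0.99):
    """Hypothetical rule: keep the transitions whose online- and target-network
    TD targets agree most closely (smallest absolute gap)."""
    y_online = td_targets(rewards, dones, next_q_online, gamma)
    y_target = td_targets(rewards, dones, next_q_target, gamma)
    gap = np.abs(y_online - y_target)
    k = max(1, int(round(keep_frac * len(gap))))
    mask = np.zeros(len(gap), dtype=bool)
    mask[np.argsort(gap)[:k]] = True
    return mask
```

In a DQN-style update, a mask like this would multiply the per-transition TD loss, so that stale targets contribute less to the gradient.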

2603.29382 2026-05-20 cs.CR cs.LG 版本更新

Deep Learning-Assisted Improved Differential Fault Attacks on Lightweight Stream Ciphers

基于深度学习的改进型差分故障攻击轻量级流密码

Kok Ping Lim, Dongyang Jia, Iftekhar Salam

发表机构 * School of Computing and Data Science, Xiamen University Malaysia（厦门大学马来西亚分校计算机与数据科学学院）

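AI summary: This paper studies the feasibility of deep-learning-assisted differential fault attacks on lightweight stream ciphers. It develops multilayer perceptron models to identify fault locations and proposes a threshold-based method to reduce the number of faults needed to recover the secret state. Experiments show lower attack complexity than existing methods, and the work reports the first experimental results of such attacks on the ATOM cipher.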

AI中文摘要

轻量级密码学原语在资源受限环境中广泛部署，特别是在物联网设备中。由于其公开性，这些设备易受物理攻击，尤其是故障攻击。最近，基于深度学习的密码分析技术显示出有前景的结果；然而，其在故障攻击中的应用仍然有限，特别是在流密码中。在本工作中，我们研究了在放松的故障模型下，基于深度学习的差分故障攻击在三种轻量级流密码（ACORNv3、MORUSv2和ATOM）中的可行性。我们开发并训练了多层感知机（MLP）模型以识别故障位置。实验结果表明，训练后的模型在ACORNv3、MORUSv2和ATOM上的识别准确率分别为0.999880、0.999231和0.823568，并优于传统签名方法。在密钥恢复过程中，我们引入了基于阈值的方法以优化所需故障注入次数。结果表明，ACORN的初始状态可通过21至34次故障恢复，MORUS需213至248次故障，最多6位猜测。这两种攻击均降低了攻击复杂度。对于ATOM，结果表明其具有更高的安全余量，因为NFSR中的大部分状态位只能在精确控制模型下恢复。据我们所知，本工作为ATOM密码提供了首次差分故障攻击的实验结果。

英文摘要

Lightweight cryptographic primitives are widely deployed in resource-constrained environments, particularly in Internet of Things (IoT) devices. Due to their public accessibility, these devices are vulnerable to physical attacks, especially fault attacks. Recently, deep learning-based cryptanalytic techniques have demonstrated promising results; however, their application to fault attacks remains limited, particularly for stream ciphers. In this work, we investigate the feasibility of deep learning assisted differential fault attacks on three lightweight stream ciphers, namely ACORNv3, MORUSv2, and ATOM, under a relaxed fault model in which a single-bit bit-flipping fault is injected at an unknown location. We develop and train multilayer perceptron (MLP) models to identify the fault locations. Experimental results show that the trained models achieve high identification accuracies of 0.999880, 0.999231, and 0.823568 for ACORNv3, MORUSv2 and ATOM, respectively, and outperform traditional signature-based methods. For the secret recovery process, we introduce a threshold-based method to optimize the number of fault injections required to recover the secret information. The results show that the initial state of ACORN can be recovered with 21 to 34 faults, while MORUS requires 213 to 248 faults, with at most 6 bits of guessing. Both attacks reduce the attack complexity compared to existing works. For ATOM, the results show that it possesses a higher security margin, as the majority of state bits in the Nonlinear Feedback Shift Register (NFSR) can only be recovered under a precise control model. To the best of our knowledge, this work provides the first experimental results of differential fault attacks on ATOM.
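The "traditional signature-based" baseline can be illustrated on a toy linear register. Because the state update is linear over GF(2), the keystream difference caused by flipping state bit p does not depend on the secret state. A table of precomputed differences therefore identifies the fault location exactly. The register length, taps, and function names are hypothetical. ACORNv3, MORUSv2, and ATOM all have nonlinear components that make these differences state-dependent, and that is where a learned MLP classifier becomes useful.

```python
def lfsr_keystream(state, taps, n):
    """Toy LFSR (hypothetical): output state[0] each clock, shift left,
    and append the XOR of the tapped bits as the new last bit."""
    s = list(state)
    out = []
    for _ in range(n):
        out.append(s[0])
        fb = 0
        for t in taps:
            fb ^= s[t]
        s = s[1:] + [fb]
    return out

def build_signatures(length, taps, n):
    """Keystream difference for a single-bit flip at each position.
    By linearity, this equals the keystream generated from the unit vector e_p."""
    sigs = {}
    for p in range(length):
        e = [0] * length
        e[p] = 1
        sigs[tuple(lfsr_keystream(e, taps, n))] = p
    return sigs

def locate_fault(correct, faulty, sigs):
    """Match the observed keystream difference against the signature table."""
    diff = tuple(a ^ b for a, b in zip(correct, faulty))
    return sigs.get(diff)   # None if no signature matches
```

With at least as many keystream bits as register bits, the first outputs of each signature are the unit vector itself, so every location has a distinct signature.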

2603.23722 2026-05-20 cs.MA cs.LG 版本更新

Dual-Gated Epistemic Time-Dilation: Autonomous Compute Modulation in Asynchronous MARL

双门控认知时间延缓：异步MARL中的自主计算调节

Igor Jankowski

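AI summary: This paper proposes Epistemic Time-Dilation MAPPO, built on a dual-gated epistemic trigger. The trigger lets each agent autonomously modulate its execution frequency, making asynchronous MARL more efficient to deploy on edge devices. Experiments show that the method reduces computational overhead while preserving performance on the centralized task.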

Comments 14 pages, 5 figures. Code available at: https://github.com/xaiqo/edtmappo. Related materials available on Zenodo: 10.5281/zenodo.19206838

AI中文摘要

尽管多智能体强化学习（MARL）算法在复杂连续领域中取得了前所未有的成功，但其标准部署严格遵循同步操作范式。在此范式下，无论即时需求如何，智能体都被强制在每个微帧执行深度神经网络推断。这种密集的吞吐量成为在边缘设备上进行物理部署的根本障碍，因为边缘设备的热量与能耗预算高度受限。我们提出了Epistemic Time-Dilation MAPPO（ETD-MAPPO），并为其加入了双门控认知触发器（Dual-Gated Epistemic Trigger）。智能体不依赖刚性的帧跳过（宏动作），而是通过解读偶然不确定性（通过策略的香农熵）和认知不确定性（通过双评论家架构中的状态价值分歧）来自主调节执行频率。为此，我们将环境形式化为半马尔可夫决策过程（SMDP），并构建了SMDP对齐的异步梯度遮蔽评论家，以确保恰当的信用分配。实证结果表明，与当前的时间模型相比，该方法带来了大幅提升（相对基线超过60%）。在LBF、MPE以及Google Research Football（GRF）的115维状态空间上的评估中，ETD防止了过早的策略崩溃。值得注意的是，这种无约束的方法促成了涌现的时间角色专业化（Temporal Role Specialization），在无球执行阶段将计算开销降低了73.6%（统计上显著），且不削弱集中式任务上的表现。

英文摘要

While Multi-Agent Reinforcement Learning (MARL) algorithms achieve unprecedented successes across complex continuous domains, their standard deployment strictly adheres to a synchronous operational paradigm. Under this paradigm, agents are universally forced to execute deep neural network inferences at every micro-frame, regardless of immediate necessity. This dense throughput acts as a fundamental barrier to physical deployment on edge devices, where thermal and metabolic budgets are highly constrained. We propose Epistemic Time-Dilation MAPPO (ETD-MAPPO), augmented with a Dual-Gated Epistemic Trigger. Instead of depending on rigid frame-skipping (macro-actions), agents autonomously modulate their execution frequency by interpreting aleatoric uncertainty (via the Shannon entropy of their policy) and epistemic uncertainty (via state-value divergence in a Twin-Critic architecture). To formalize this, we structure the environment as a Semi-Markov Decision Process (SMDP) and build the SMDP-Aligned Asynchronous Gradient Masking Critic to ensure proper credit assignment. Empirical findings demonstrate massive improvements (> 60% relative baseline acquisition leaps) over current temporal models. In evaluations on LBF, MPE, and the 115-dimensional state space of Google Research Football (GRF), ETD prevented premature policy collapse. Remarkably, this unconstrained approach leads to emergent Temporal Role Specialization, reducing computational overhead by a statistically dominant 73.6% entirely during off-ball execution without deteriorating centralized task dominance.
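The dual gate can be sketched as a simple rule: recompute an action only when either uncertainty signal exceeds its threshold. The following choices are hypothetical: combining the gates with a logical OR, the threshold values `h_max` and `d_max`, and the assumption that cheap uncertainty estimates are available at every micro-frame. The abstract does not specify how ETD-MAPPO combines or calibrates the two signals.

```python
import numpy as np

def policy_entropy(probs):
    """Shannon entropy (in nats) of a categorical policy: the aleatoric signal."""
    p = np.clip(probs, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

def should_act(probs, v1, v2, h_max=0.5, d_max=0.1):
    """Hypothetical dual gate: run full inference only if the policy is uncertain
    (high entropy) or the twin critics disagree (epistemic signal)."""
    aleatoric_gate = policy_entropy(probs) > h_max
    epistemic_gate = abs(v1 - v2) > d_max
    return aleatoric_gate or epistemic_gate
```

When both gates stay closed, the agent would keep its previous action. The compute savings reported in the abstract come from skipping inference in exactly these cases.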

2603.17839 2026-05-20 cs.CL cs.AI cs.LG 版本更新

How do LLMs Compute Verbal Confidence

LLMs如何计算言语自信

Dharshan Kumaran, Arthur Conmy, Federico Barbero, Simon Osindero, Viorica Patraucean, Petar Veličković

发表机构 * Google DeepMind（谷歌DeepMind）

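AI summary: This study investigates how large language models internally generate verbal confidence scores. Experiments indicate that confidence is computed automatically during answer generation, cached at the first post-answer position, and later retrieved for the verbal output. These findings shed light on the mechanism of model self-evaluation.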

AI中文摘要

言语自信——提示LLMs以数字或类别形式陈述其信心——被广泛用于从黑箱模型中提取不确定性估计。然而，LLMs内部如何生成此类评分仍不清楚。我们解答了两个问题：首先，信心是在被请求时即时计算，还是在生成答案时自动计算并缓存以供后续检索；其次，言语自信代表什么——token对数概率，还是更丰富的答案质量评估？我们聚焦于Gemma 3 27B（在TriviaQA、BigMath和MMLU上的表现）、Qwen 2.5 7B以及推理模型Magistral Small 24B，提供了缓存检索的收敛证据。激活引导、修补、噪声和交换实验揭示，信心表示在回答相邻位置先出现，再出现在言语化位置。注意力阻断指出了信息流：信心从回答token中收集，缓存于第一个回答后的位置，然后用于输出。关键发现是线性探测和方差划分揭示，这些缓存表示能够解释超出token对数概率的显著方差，表明是更丰富的答案质量评估，而非简单的流畅性读取。这些发现表明，言语自信反映了自动、复杂的自我评估——而非事后重建——对理解LLMs中的元认知和改进校准具有启示。

英文摘要

Verbal confidence -- prompting LLMs to state their confidence as a number or category -- is widely used to extract uncertainty estimates from black-box models. However, how LLMs internally generate such scores remains unknown. We address two questions: first, when confidence is computed -- just-in-time when requested, or automatically during answer generation and cached for later retrieval; and second, what verbal confidence represents -- token log-probabilities, or a richer evaluation of answer quality? Focusing on Gemma 3 27B (across TriviaQA, BigMath, and MMLU), Qwen 2.5 7B, and the reasoning model Magistral Small 24B, we provide convergent evidence for cached retrieval. Activation steering, patching, noising, and swap experiments reveal that confidence representations emerge at answer-adjacent positions before appearing at the verbalization site. Attention blocking pinpoints the information flow: confidence is gathered from answer tokens, cached at the first post-answer position, then retrieved for output. Critically, linear probing and variance partitioning reveal that these cached representations explain substantial variance in verbal confidence beyond token log-probabilities, suggesting a richer answer-quality evaluation rather than a simple fluency readout. These findings demonstrate that verbal confidence reflects automatic, sophisticated self-evaluation -- not post-hoc reconstruction -- with implications for understanding metacognition in LLMs and improving calibration.
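The variance-partitioning step can be sketched as the gain in R² when probe-derived features are added to token log-probabilities in a linear regression for verbal confidence. The function names are assumptions, and any data passed to them here would be synthetic. This shows the arithmetic of the analysis, not the paper's probes or results.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (intercept included)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()

def unique_variance(logprob, probe_feats, confidence):
    """Variance in `confidence` explained by probe features beyond log-probabilities.
    Returns (unique R^2, R^2 of log-prob alone, R^2 of the combined model)."""
    r2_base = r_squared(logprob[:, None], confidence)
    r2_full = r_squared(np.column_stack([logprob, probe_feats]), confidence)
    return r2_full - r2_base, r2_base, r2_full
```

Because the combined model nests the log-probability-only model, its R² can never be lower. A large positive difference is what "variance beyond token log-probabilities" would look like in this setup.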

2603.16284 2026-05-20 cs.CV cs.LG 版本更新

Locate-then-Sparsify: Attribution Guided Sparse Strategy for Visual Hallucination Mitigation

定位后再稀疏化：基于归因的视觉幻觉缓解稀疏策略

Tiantian Dang, Chao Bi, Shufan Shen, Jinzhe Liu, Qingming Huang, Shuhui Wang

发表机构 * State Key Lab. of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences（中国科学院人工智能安全国家重点实验室，计算技术研究所）； School of Advanced Interdisciplinary Sciences, University of Chinese Academy of Sciences（中国科学院大学先进交叉科学学院）； School of Computer Science and Technology, University of Chinese Academy of Sciences（中国科学院大学计算机科学与技术学院）

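AI summary: This paper proposes Locate-Then-Sparsify for Feature Steering (LTS-FS). The framework first locates the layers relevant to hallucination, then sets each layer's feature-steering intensity according to that relevance. This mitigates hallucinations in vision-language models while preserving general performance.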

Comments Accepted by CVPR 2026

AI中文摘要

尽管大型视觉-语言模型（LVLMs）在技术上取得了显著进展，但其生成幻觉的倾向削弱了可靠性并限制了更广泛的实际应用。在幻觉缓解方法中，特征引导作为一种有前景的方法，能够在不增加推理成本的情况下减少LVLMs中的错误输出。然而，当前的方法在所有层上应用统一的特征引导策略。这种启发式策略忽略了层间的差异，可能会干扰与幻觉无关的层，最终导致在通用任务上的性能下降。在本文中，我们提出了一种名为Locate-Then-Sparsify for Feature Steering (LTS-FS)的即插即用框架，该框架根据每层与幻觉的相关性来控制引导强度。我们首先构建了一个包含token级和句子级幻觉案例的数据集。基于此数据集，我们引入了一种基于因果干预的归因方法，以量化每层的幻觉相关性。利用各层的归因分数，我们提出了一种逐层策略，将这些分数转换为针对单个层的特征引导强度，从而在幻觉相关的层上实现更精确的调整。在多个LVLMs和基准测试中进行的广泛实验表明，LTS-FS有效缓解了幻觉问题，同时保持了强大的性能。代码可在https://github.com/huttersadan/LTS-FS上获得。

英文摘要

Despite the significant advancements in Large Vision-Language Models (LVLMs), their tendency to generate hallucinations undermines reliability and restricts broader practical deployment. Among the hallucination mitigation methods, feature steering emerges as a promising approach that reduces erroneous outputs in LVLMs without increasing inference costs. However, current methods apply uniform feature steering across all layers. This heuristic strategy ignores inter-layer differences, potentially disrupting layers unrelated to hallucinations and ultimately leading to performance degradation on general tasks. In this paper, we propose Locate-Then-Sparsify for Feature Steering (LTS-FS), a plug-and-play framework which controls the steering intensity according to the hallucination relevance of each layer. We first construct a dataset comprising token-level and sentence-level hallucination cases. Based on this dataset, we introduce an attribution method based on causal interventions to quantify the hallucination relevance of each layer. With the attribution scores across layers, we propose a layerwise strategy that converts these scores into feature steering intensities for individual layers, enabling more precise adjustments specifically on hallucination-relevant layers. Extensive experiments across multiple LVLMs and benchmarks demonstrate that LTS-FS effectively mitigates hallucination while preserving strong performance. Code is available at https://github.com/huttersadan/LTS-FS.

2603.15411 2026-05-20 cs.AI cs.LG 版本更新

TEA-Time: 跨时间效应传输

Harsh Parikh, Gabriel Levin-Konigsberg, Dominique Perrault-Joncas, Alexander Volfovsky

发表机构 * Amazon SCOT（亚马逊SCOT实验室）； Yale University（耶鲁大学）； Duke University（杜克大学）

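AI summary: This paper proposes a method for transporting treatment effects across time. It formalizes the transported average treatment effect under a separable temporal-effects assumption and derives two identification strategies, replicated trials and common arm. For each strategy it develops doubly robust, semiparametrically efficient estimators.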

详情

AI中文摘要

从随机对照试验中估计的处理效应不仅局限于研究人群，还局限于试验进行的时间。关于将实验结果推广到新人群的文献非常广泛，但跨时间传输效应却受到较少关注，甚至定义目标估计量也并不明显。我们正式化了在可分离的时变效应假设下的传输平均处理效应，推导出两种识别策略：重复试验和共同臂，并为每种策略开发双重稳健、半参数高效估计器。应用于一个大型的头条A/B测试档案库，共同臂策略在精度上显著更高，但当时间因素依赖于干预与测量之间的间隔而非单独的测量时间时，会表现出系统性偏差，而允许这种依赖的重复试验策略则更忠实于真实情况。模拟研究探讨了每种策略在何时可靠以及何时会无声地失败。

英文摘要

Treatment effects estimated from a randomized controlled trial are local not only to the study population but also to the time at which the trial was conducted. The literature on generalizing experimental findings to new populations is extensive, yet transporting effects across time has received far less attention, and even defining the target estimand is nonobvious. We formalize the transported average treatment effect under a separable temporal effects assumption, derive two identification strategies: replicated trials and common arm, and develop doubly robust, semiparametrically efficient estimators for each. Applied to a large archive of headline A/B tests, the common arm strategy is substantially more precise but exhibits systematic bias when the temporal factor depends on the gap between intervention and measurement rather than on measurement time alone, while the replicated trials strategy, which allows this dependence, tracks the ground truth more faithfully. Simulation studies investigate when each strategy is reliable and when it silently fails.
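The transported estimators build on the doubly robust AIPW estimator of an average treatment effect. The sketch below is the standard single-trial AIPW estimator with a known randomization probability and linear outcome models. It does not implement the replicated-trials or common-arm transport. The linear outcome models and the omission of cross-fitting are simplifying assumptions.

```python
import numpy as np

def fit_linear(X, y):
    """OLS outcome model with intercept; returns a prediction function."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ beta

def aipw_ate(X, a, y, propensity):
    """Augmented inverse-probability-weighted (doubly robust) ATE estimate.
    X: (n, p) covariates, a: (n,) 0/1 treatment, y: (n,) outcomes,
    propensity: P(a = 1 | X), a scalar for a simple randomized trial."""
    mu1 = fit_linear(X[a == 1], y[a == 1])(X)   # predicted outcome under treatment
    mu0 = fit_linear(X[a == 0], y[a == 0])(X)   # predicted outcome under control
    e = propensity
    psi = mu1 - mu0 + a * (y - mu1) / e - (1 - a) * (y - mu0) / (1 - e)
    return float(psi.mean())
```

The estimator stays consistent if either the outcome models or the propensity is correct. In a randomized A/B test the propensity is known by design, so the outcome-model term mainly buys precision there.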

2602.15752 2026-05-20 cs.LG 版本更新

Beyond Match Maximization and Fairness: Retention-Optimized Two-Sided Matching

超越匹配最大化和公平性：以用户留存优化的双侧匹配

Ren Kishimoto, Rikiya Takehi, Koichi Tanaka, Masahiro Nomura, Riku Togashi, Yoji Tomita, Yuta Saito

发表机构 * Institute of Science Tokyo（东京科学大学）； Waseda University（早稻田大学）； Keio University（庆应义塾大学）； CyberAgent Tokyo（CyberAgent 东京）； Hajuku-kaso, Co., Ltd.（汉久科社）

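AI summary: This paper proposes a new objective for two-sided matching: maximizing user retention rather than only the number of matches or fairness. It introduces MRet, a dynamic learning-to-rank algorithm that uses personalized user retention curves to adapt recommendations and improve overall retention.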

Comments Published as a conference paper at ICLR 2026

AI中文摘要

在在线约会和招聘等双侧匹配平台上，推荐算法通常旨在最大化总匹配数。然而，这一目标导致了不平衡，一些用户获得过多匹配而另一些用户则获得极少并最终离开平台。对于许多平台，尤其是依赖订阅的平台，用户留存至关重要。一些平台可能使用公平性目标来解决匹配最大化的问题。然而，公平性本身并非所有平台的最终目标，因为用户不会仅仅因为曝光均等而奖励平台。在实践中，用户留存通常是最终目标，随意依赖公平性会使留存优化取决于运气。在本工作中，我们没有最大化匹配或公理化定义公平性，而是正式定义了双侧匹配平台中最大化用户留存的新问题设置。为此，我们引入了一种动态学习到排序（LTR）算法，称为Matching for Retention（MRet）。与传统的双侧匹配算法不同，我们的方法通过从每个用户档案和交互历史中学习个性化留存曲线来建模用户留存。基于这些曲线，MRet通过同时考虑接收推荐的用户和被推荐用户的留存收益，动态调整推荐策略，使得有限的匹配机会分配到最能提高整体留存的地方。自然但重要的是，对主要在线约会平台的合成和真实世界数据集的实证评估显示，MRet实现了更高的用户留存率，因为传统方法优化匹配或公平性而非留存。

英文摘要

On two-sided matching platforms such as online dating and recruiting, recommendation algorithms often aim to maximize the total number of matches. However, this objective creates an imbalance, where some users receive far too many matches while many others receive very few and eventually abandon the platform. Retaining users is crucial for many platforms, such as those that depend heavily on subscriptions. Some may use fairness objectives to solve the problem of match maximization. However, fairness in itself is not the ultimate objective for many platforms, as users do not suddenly reward the platform simply because exposure is equalized. In practice, where user retention is often the ultimate goal, casually relying on fairness will leave the optimization of retention up to luck. In this work, instead of maximizing matches or axiomatically defining fairness, we formally define the new problem setting of maximizing user retention in two-sided matching platforms. To this end, we introduce a dynamic learning-to-rank (LTR) algorithm called Matching for Retention (MRet). Unlike conventional algorithms for two-sided matching, our approach models user retention by learning personalized retention curves from each user's profile and interaction history. Based on these curves, MRet dynamically adapts recommendations by jointly considering the retention gains of both the user receiving recommendations and those who are being recommended, so that limited matching opportunities can be allocated where they most improve overall retention. Naturally but importantly, empirical evaluations on synthetic and real-world datasets from a major online dating platform show that MRet achieves higher user retention, since conventional methods optimize matches or fairness rather than retention.

2602.13466 2026-05-20 cs.CL cs.AI cs.LG 版本更新

Language Model Memory and Memory Models for Language

语言模型的记忆与面向语言的记忆模型

Benjamin L. Badger

发表机构 * IBM（IBM公司）

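AI summary: This study compares how well language models and memory models store input information. Language model embeddings carry relatively little input information, whereas autoencoders trained for input regeneration form nearly perfect memories. The paper proposes a parallelizable encoder-decoder memory model architecture and improves memory formation and decoding by combining causal and information-retention objectives.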

AI中文摘要

机器学习模型存储输入信息的能力，类似于“记忆”的概念，在隐藏层向量嵌入中被广泛使用但未充分表征。我们发现，无论数据和计算规模如何，语言模型嵌入通常包含相对较少的输入信息。相比之下，用于输入再生训练的自编码器嵌入能够形成几乎完美的记忆。用记忆嵌入替代令牌序列可带来显著的计算效率，从而引入一种可并行的编码器-解码器记忆模型架构。在因果训练后，这些模型包含信息贫乏的嵌入，无法进行任意信息访问，但通过结合因果和信息保留目标函数，它们学会形成和解码信息丰富的记忆。通过冻结高保真编码器并采用课程训练方法，解码器首先学习处理记忆，然后学习预测下一个令牌。我们引入了观点，即仅使用下一个令牌预测训练不足以准确形成记忆，因为目标本身不可逆，从而推动在输入不完全暴露的情况下使用结合目标函数的模型。

英文摘要

The ability of machine learning models to store input information in hidden layer vector embeddings, analogous to the concept of 'memory', is widely employed but not well characterized. We find that language model embeddings typically contain relatively little input information regardless of data and compute scale during training. In contrast, embeddings from autoencoders trained for input regeneration are capable of nearly perfect memory formation. The substitution of memory embeddings for token sequences leads to substantial computational efficiencies, motivating the introduction of a parallelizable encoder-decoder memory model architecture. Upon causal training these models contain information-poor embeddings incapable of arbitrary information access, but by combining causal and information retention objective functions they learn to form and decode information-rich memories. Training can be further streamlined by freezing a high fidelity encoder followed by a curriculum training approach where decoders first learn to process memories and then learn to additionally predict next tokens. We introduce the perspective that next token prediction training alone is poorly suited for accurate memory formation as the objective itself is non-invertible, motivating the use of combined objective functions for models where the entire input is not exposed.

2602.11910 2026-05-20 cs.SD cs.LG 版本更新

TADA! Tuning Audio Diffusion Models through Activation Steering

TADA! 通过激活引导调整音频扩散模型

Łukasz Staniszewski, Katarzyna Zaleska, Mateusz Modrzejewski, Kamil Deja

发表机构 * Warsaw University of Technology（华沙理工大学）； IDEAS Research Institute（IDEAS研究院）

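AI summary: Using activation patching, this paper reveals a semantic bottleneck in audio diffusion models. It then shows that localized activation steering achieves new state-of-the-art performance in audio concept modulation.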

Comments Preprint

AI中文摘要

音频扩散模型能够从文本生成高质量的音乐，但实现对特定音乐属性的精细控制仍然具有挑战性，因为人们对其内部表示高级概念的机制知之甚少。在本文中，我们利用激活修补技术证明，最近的音频扩散架构存在语义瓶颈：一小部分共享的连续注意力层控制着不同的音乐概念，例如特定乐器、人声或音乐类型的存在。在此基础上，我们系统地评估了多种引导范式，将激活引导与提示级、得分空间（score-space）和权重空间干预进行比较，并分析了引导机制与干预位置之间的相互作用。我们的新基准辅以大规模用户研究，表明局部激活引导在音频概念调节上达到了新的最先进水平（state-of-the-art）。

英文摘要

Audio diffusion models can synthesize high-fidelity music from text, yet achieving fine-grained control over specific musical attributes remains challenging, as their internal mechanisms for representing high-level concepts are poorly understood. In this work, we use activation patching to demonstrate that recent audio diffusion architectures exhibit a semantic bottleneck, where a small, shared subset of consecutive attention layers controls distinct musical concepts, such as the presence of specific instruments, vocals, or genres. Building on this, we systematically evaluate a broad spectrum of steering paradigms, comparing activation steering against prompt-level, score-space, and weight-space interventions, analyzing the interaction between the steering mechanism and the intervention site. Our new benchmark, supported by an extensive user study, demonstrates that localized activation steering establishes a new state-of-the-art in audio concept modulation.
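Activation steering itself is simple: add a scaled direction to the hidden state at chosen layers and leave the rest untouched. In the sketch below, a toy residual stack stands in for a diffusion model's attention layers. The difference-of-means steering vector, `alpha`, and the layer choice are assumptions. The sketch does not reproduce the paper's models, intervention sites, or benchmark.

```python
import numpy as np

def forward(x, weights, steer=None, layers=(), alpha=1.0):
    """Toy residual stack (stand-in for real layers). After each layer whose
    index is in `layers`, add alpha * steer to the hidden state."""
    h = x
    for i, W in enumerate(weights):
        h = h + np.tanh(h @ W)
        if steer is not None and i in layers:
            h = h + alpha * steer
    return h

def mean_difference_vector(acts_with, acts_without):
    """Steering direction: mean activation with the concept minus mean without it."""
    return acts_with.mean(axis=0) - acts_without.mean(axis=0)
```

Restricting `layers` to a small consecutive set mirrors the localized steering the abstract reports. Steering at every layer corresponds to the uniform alternative.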

2602.11767 2026-05-20 cs.AI cs.CL cs.LG 版本更新

TSR: Trajectory-Search Rollouts for Multi-Turn RL of LLM Agents

TSR：用于LLM代理多轮RL的轨迹搜索

Aladin Djuhera, Swanand Ravindra Kadhe, Farhan Ahmed, Syed Zawad, Heiko Ludwig, Holger Boche

发表机构 * Technical University Munich（慕尼黑工业大学）； IBM Research（IBM研究院）

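AI summary: This paper proposes TSR, a training-time method that improves per-turn rollout generation. It uses lightweight tree-style search to construct high-quality trajectories, which improves rollout quality and learning stability in multi-turn RL tasks.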

AI中文摘要

大规模语言模型（LLMs）的进步正在推动使用强化学习（RL）来训练代理，从跨任务的迭代、多轮交互中学习。然而，多轮RL仍然具有挑战性，因为奖励通常稀疏或延迟，而环境可能是随机的。在这种情况下，朴素的轨迹采样会阻碍利用并导致模式崩溃。我们提出了TSR（轨迹搜索rollouts），一种训练时的方法，重新利用测试时扩展的想法以改进每轮rollout生成。TSR通过基于状态的反馈在每个回合中选择高分动作，进行轻量级树状搜索来构造高质量轨迹。这提高了rollout质量并稳定了学习，同时与标准策略梯度优化器兼容，使TSR对优化器无偏见。我们用best-of-N、beam和浅层前瞻搜索实例化TSR，并与PPO和GRPO配对，在Sokoban、FrozenLake和WebShop任务中实现高达15%的性能提升和更稳定的训练，仅需适度增加一次训练计算。通过将搜索从推理时间转移到训练的rollout阶段，TSR提供了一种模块化且通用的机制，用于更强的多轮代理学习，与现有框架和拒绝采样式选择方法互补。

English abstract

Advances in large language models (LLMs) are driving a shift toward using reinforcement learning (RL) to train agents from iterative, multi-turn interactions across tasks. However, multi-turn RL remains challenging as rewards are often sparse or delayed, and environments can be stochastic. In this regime, naive trajectory sampling can hinder exploitation and induce mode collapse. We propose TSR (Trajectory-Search Rollouts), a training-time approach that repurposes test-time scaling ideas for improved per-turn rollout generation. TSR performs lightweight tree-style search to construct high-quality trajectories by selecting high-scoring actions at each turn using state-based feedback. This improves rollout quality and stabilizes learning while remaining compatible with standard policy gradient optimizers, making TSR optimizer-agnostic. We instantiate TSR with best-of-N, beam, and shallow lookahead search, and pair it with PPO and GRPO, achieving up to 15% performance gains and more stable learning on Sokoban, FrozenLake, and WebShop tasks at a modest, one-time increase in training compute. By moving search from inference time to the rollout stage of training, TSR provides a modular and general mechanism for stronger multi-turn agent learning, complementary to existing frameworks and rejection-sampling-style selection methods.
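
The best-of-N variant of TSR selects, at each turn, the highest-scoring of several sampled actions using state-based feedback. A toy sketch of that selection loop follows, under several assumptions:

- A 1-D walk to a goal stands in for Sokoban, FrozenLake, or WebShop.
- A uniform random sampler stands in for the LLM policy.
- Negative distance to the goal is the hypothetical state score.

The collected trajectory would then be handed, unchanged, to a policy-gradient optimizer such as PPO or GRPO. That optimizer is not shown.

```python
import random

GOAL = 5  # hypothetical toy task: walk from position 0 to position 5

def step(state, action):
    """Deterministic toy transition: move left (-1) or right (+1)."""
    return state + action

def score(state):
    """State-based feedback: negative distance to the goal (higher is better)."""
    return -abs(GOAL - state)

def policy_sample(state, rng):
    """Stand-in for sampling one action from the LLM policy."""
    return rng.choice([-1, 1])

def best_of_n_rollout(start, n, turns, rng):
    """At each turn draw n candidate actions and keep the best-scoring one."""
    state, trajectory = start, []
    for _ in range(turns):
        candidates = [policy_sample(state, rng) for _ in range(n)]
        action = max(candidates, key=lambda a: score(step(state, a)))
        trajectory.append((state, action))
        state = step(state, action)
        if state == GOAL:
            break
    return trajectory, state

rng = random.Random(0)
traj, final = best_of_n_rollout(start=0, n=8, turns=10, rng=rng)
print(len(traj), final)
```

Because selection happens only while collecting rollouts, the search cost is paid during training rather than at inference time.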

2602.11454 2026-05-20 cs.DS cs.LG Revised version

Adaptive Power Iteration Method for Differentially Private PCA

Ta Duy Nguyen, Alina Ene, Huy Le Nguyen

Affiliations: Department of Computer Science, Boston University; Khoury College of Computer and Information Science, Northeastern University

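AI summary: This paper studies differentially private algorithms for approximating the top singular vector of a matrix A. It proposes an adaptive filtering technique that yields beyond-worst-case guarantees for low-coherence input matrices while preserving privacy.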

English abstract

We study $\left(ε,δ\right)$-differentially private algorithms for the problem of approximately computing the top singular vector of a matrix $A\in\mathbb{R}^{n\times d}$ where each row of $A$ is a data point in $\mathbb{R}^{d}$. Following Dwork-Talwar-Thakurta-Zhang (STOC 2014), we consider the privacy model where neighboring inputs differ by one single row. We give a novel algorithm that achieves beyond-worst-case guarantees for input matrices with low coherence, which is a structural property of matrices in many applications, including but not limited to i.i.d. data. Our algorithm contributes to the extensive literature on private power iteration methods, where we introduce a new filtering technique which adapts to this coherence parameter. Our work departs from and complements the work by Hardt-Roth (STOC 2013) which achieves beyond-worst-case guarantees for the more restrictive privacy model where neighboring inputs differ in one single entry by at most 1.
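
The paper builds on the private power iteration literature. A minimal sketch of the generic noisy power iteration that this literature starts from follows: each product with $A^\top A$ is perturbed with Gaussian noise before renormalizing. The noise scale `sigma` is taken as given; calibrating it to row-level sensitivity and the $(ε,δ)$ budget is omitted. The paper's coherence-adaptive filtering is not shown. All names here are illustrative.

```python
import math
import random

def matvec(A, v):
    """A (n x d, list of rows) times v (length d)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def gram_matvec(A, v):
    """Compute A^T (A v) without forming the d x d matrix A^T A."""
    Av = matvec(A, v)
    return [sum(row[j] * s for row, s in zip(A, Av)) for j in range(len(v))]

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def noisy_power_iteration(A, iters, sigma, rng):
    """Power iteration on A^T A with Gaussian noise added after every product.

    Simplification: sigma is supplied directly. A private implementation must
    derive it from the sensitivity to one row and the (epsilon, delta) budget
    split across all iterations.
    """
    d = len(A[0])
    v = normalize([rng.gauss(0.0, 1.0) for _ in range(d)])
    for _ in range(iters):
        w = gram_matvec(A, v)
        v = normalize([x + rng.gauss(0.0, sigma) for x in w])
    return v

A = [[3.0, 0.0], [0.0, 1.0], [0.0, 1.0]]   # A^T A = diag(9, 2)
v = noisy_power_iteration(A, iters=50, sigma=0.0, rng=random.Random(1))
print([round(abs(x), 3) for x in v])
```

With `sigma=0.0` this reduces to ordinary power iteration and recovers the top right singular vector $\pm e_1$. Larger noise trades accuracy for privacy.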

2602.07570 2026-05-20 q-bio.NC cs.AI cs.CV cs.LG Revised version

How does longer temporal context enhance multimodal narrative video processing in the brain?

Prachi Jindal, Anant Khandelwal, Manish Gupta, Bapi S. Raju, Subba Reddy Oota, Tanmoy Chakraborty

Affiliations: Technische Universität Berlin; Microsoft Research; IIT Delhi; Microsoft; IIIT-Hyderabad

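AI summary: This study examines how video clip duration and narrative-task prompts affect the brain alignment of multimodal large language models (MLLMs) during naturalistic movie watching. It finds that longer clips substantially improve brain alignment for MLLMs, whereas unimodal video models show little gain.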

Comments 22 pages, 15 figures

English abstract

Understanding how humans and artificial intelligence systems process complex narrative videos is a fundamental challenge at the intersection of neuroscience and machine learning. This study investigates how the temporal context length of video clips (3-24 s clips) and narrative-task prompting shape brain-model alignment during naturalistic movie watching. Using fMRI recordings from participants viewing full-length movies, we examine how brain regions sensitive to narrative context dynamically represent information over varying timescales and how these neural patterns align with model-derived features. We find that increasing clip duration substantially improves brain alignment for multimodal large language models (MLLMs), whereas unimodal video models show little to no gain. Further, shorter temporal windows align with perceptual and early language regions, while longer windows preferentially align with higher-order integrative regions, mirrored by a layer-to-cortex hierarchy in MLLMs. Finally, experiments with four narrative-task prompts show that they elicit task-specific, region-dependent brain alignment patterns and context-dependent shifts in clip-level tuning in higher-order regions. Our work positions long-form narrative movies as a principled testbed for studying long-timescale temporal integration in long-context MLLMs and its relationship to cortical responses during narrative comprehension.
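
Brain-model alignment of this kind is commonly scored with a voxelwise encoding model: fit a regression from model features to each voxel's response, then report held-out Pearson correlation. A deliberately simplified sketch follows. It assumes a single centered feature, closed-form ridge without intercept, and a mean over voxels. Real pipelines use high-dimensional MLLM features and cross-validated ridge, and the paper's exact protocol is not reproduced.

```python
import math

def pearson(x, y):
    """Pearson correlation; assumes neither sequence is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ridge_1d(x, y, lam):
    """Closed-form ridge weight for one centered feature (no intercept)."""
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + lam)

def alignment_score(train_feat, train_voxels, test_feat, test_voxels, lam=1.0):
    """Mean held-out Pearson r across voxels, one ridge fit per voxel."""
    scores = []
    for y_train, y_test in zip(train_voxels, test_voxels):
        w = ridge_1d(train_feat, y_train, lam)
        pred = [w * f for f in test_feat]
        scores.append(pearson(pred, y_test))
    return sum(scores) / len(scores)

# Toy data: two voxels that depend linearly on a single model feature.
train_feat = [1.0, -1.0, 2.0, -2.0]
train_voxels = [[2 * f for f in train_feat], [-f for f in train_feat]]
test_feat = [3.0, -3.0, 1.0]
test_voxels = [[2 * f for f in test_feat], [-f for f in test_feat]]
score = alignment_score(train_feat, train_voxels, test_feat, test_voxels)
print(round(score, 6))
```

In this framing, comparing clip lengths amounts to recomputing `train_feat` and `test_feat` from models given 3 s versus 24 s of context and comparing the resulting scores per brain region.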

2602.07008 2026-05-20 cs.CV cs.LG Revised version

Where Not to Learn: Prior-Aligned Training with Subset-based Attribution Constraints for Reliable Decision-Making

Ruoyu Chen, Shangquan Sun, Xiaoqing Guo, Sanyi Zhang, Kangwei Liu, Shiming Liu, Zhangcheng Wang, Qunli Zhang, Hua Zhang, Xiaochun Cao

Affiliations: Institute of Information Engineering, Chinese Academy of Sciences; University of Chinese Academy of Sciences; College of Computing and Data Science, Nanyang Technological University; Department of Computer Science, Hong Kong Baptist University; Communication University of China; Imperial College London; School of Cyber Science and Technology, Shenzhen Campus of Sun Yat-sen University

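AI summary: This paper proposes an attribution-based prior-alignment method. It uses subset-selection attribution to constrain the model to rely on human-prior regions, improving the reliability of its decisions.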

English abstract

Reliable models should not only predict correctly, but also justify decisions with acceptable evidence. Yet conventional supervised learning typically provides only class-level labels, allowing models to achieve high accuracy through shortcut correlations rather than the intended evidence. Human priors can help constrain such behavior, but aligning models to these priors remains challenging because learned representations often diverge from human perception. To address this challenge, we propose an attribution-based human prior alignment method. We encode human priors as input regions that the model is expected to rely on (e.g., bounding boxes), and leverage a highly faithful subset-selection-based attribution approach to expose the model's decision evidence during training. When the attribution region deviates substantially from the prior regions, we penalize reliance on off-prior evidence, encouraging the model to shift its attribution toward the intended regions. This is achieved through a training objective that imposes attribution constraints induced by the human prior. We validate our method on both image classification and click decision tasks in MLLM-based GUI agent models. Across conventional classification and autoregressive generation settings, human prior alignment consistently improves task accuracy while also enhancing the model's decision reasonability.
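
The training objective penalizes reliance on evidence outside the human-prior region when attribution deviates substantially from it. A hypothetical sketch of such a penalty follows. It measures the fraction of positive attribution mass outside a binary prior mask and applies a hinge above a tolerance `tau`. The attribution map would come from the paper's subset-selection method, which is not shown. The mask format, `tau`, `weight`, and the hinge form are assumptions, not the authors' objective.

```python
def off_prior_ratio(attribution, prior_mask):
    """Fraction of positive attribution mass lying outside the prior region."""
    total = sum(max(a, 0.0) for row in attribution for a in row)
    outside = sum(max(a, 0.0)
                  for arow, mrow in zip(attribution, prior_mask)
                  for a, m in zip(arow, mrow) if not m)
    return outside / total if total > 0 else 0.0

def attribution_penalty(attribution, prior_mask, tau=0.25, weight=1.0):
    """Hinge: penalize only when off-prior mass exceeds the tolerance tau."""
    return weight * max(0.0, off_prior_ratio(attribution, prior_mask) - tau)

def total_loss(task_loss, attribution, prior_mask, tau=0.25, weight=1.0):
    """Task loss plus the prior-alignment penalty (hypothetical combination)."""
    return task_loss + attribution_penalty(attribution, prior_mask, tau, weight)

# 2 x 2 toy attribution maps; the bottom row is the human-prior region (e.g., a box).
mask = [[0, 0], [1, 1]]
on_prior = [[1.0, 1.0], [3.0, 3.0]]    # 25% of mass off-prior
off_prior = [[3.0, 3.0], [1.0, 1.0]]   # 75% of mass off-prior
print(total_loss(0.0, on_prior, mask), total_loss(0.0, off_prior, mask))
```

The tolerance leaves small amounts of off-prior evidence unpunished, matching the abstract's condition that the attribution must deviate "substantially" before the penalty applies.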

2602.06462 2026-05-20 cs.CL cs.LG Revised version

Finding Structure in Continual Learning

Pourya Shamsolmoali, Masoumeh Zareapoor

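AI summary: This paper proposes a continual learning framework based on Douglas-Rachford Splitting. It negotiates between stability and plasticity through two decoupled objectives, aiming at more efficient and stable continual learning.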

Comments There is a bug in the algorithm and implementation

English abstract

Learning from a stream of tasks usually pits plasticity against stability: acquiring new knowledge often causes catastrophic forgetting of past information. Most methods address this by summing competing loss terms, creating gradient conflicts that are managed with complex and often inefficient strategies such as external memory replay or parameter regularization. We propose a reformulation of the continual learning objective using Douglas-Rachford Splitting (DRS). This reframes the learning process not as a direct trade-off, but as a negotiation between two decoupled objectives: one promoting plasticity for new tasks and the other enforcing stability of old knowledge. By iteratively finding a consensus through their proximal operators, DRS provides a more principled and stable learning dynamic. Our approach achieves an efficient balance between stability and plasticity without the need for auxiliary modules or complex add-ons, providing a simpler yet more powerful paradigm for continual learning systems.
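
Douglas-Rachford Splitting minimizes f + g by alternating proximal steps on each term and reflecting, rather than summing gradients. A minimal 1-D sketch of the standard DRS iteration follows. It assumes two quadratics: f stands in for the new-task (plasticity) loss and g for the old-knowledge (stability) loss. These toy losses are illustrative only. Note also that the paper's Comments field reports a bug in its algorithm, so this is the textbook scheme, not the authors' code.

```python
def prox_quadratic(v, curvature, center, gamma):
    """prox_{gamma f}(v) for f(x) = 0.5 * curvature * (x - center)**2."""
    return (v + gamma * curvature * center) / (1.0 + gamma * curvature)

def douglas_rachford(f, g, gamma=1.0, iters=200, z=0.0):
    """Minimize f + g, where f and g are (curvature, center) pairs of 1-D quadratics."""
    for _ in range(iters):
        x = prox_quadratic(z, *f, gamma)          # proximal step on f (plasticity)
        y = prox_quadratic(2 * x - z, *g, gamma)  # proximal step on g at the reflected point
        z = z + y - x                              # update the governing sequence
    return prox_quadratic(z, *f, gamma)

plasticity = (1.0, 0.0)   # f: new-task loss, minimized at x = 0
stability = (3.0, 4.0)    # g: old-knowledge loss, minimized at x = 4
x_star = douglas_rachford(plasticity, stability)
print(x_star)
```

The consensus point matches the closed-form minimizer of f + g, (1·0 + 3·4)/(1 + 3) = 3, even though neither proximal step ever sees the other objective's gradient.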

2602.02513 2026-05-20 cs.LG cond-mat.mtrl-sci Revised version

Learning ORDER-Aware Multimodal Representations for Composite Materials Design

Xinyao Li, Hangwei Qian, Jingjing Li, Lei Zhu, Ivor Tsang

Affiliations: University of Electronic Science and Technology of China; Tongji University; A*STAR CFAR

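AI summary: This work proposes ORDER, an ordinality-aware multimodal pretraining framework for composite materials design. It integrates heterogeneous data sources to capture fiber distributions, enabling effective property prediction and microstructure generation in continuous design spaces.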

English abstract

Artificial intelligence has shown remarkable success in materials discovery and property prediction, particularly for crystalline and polymer systems, where material properties and structures are dominated by discrete graph representations. This graph-centric paradigm breaks down for composite materials, which possess continuous and nonlinear design spaces. General composite descriptors, e.g., fiber volume and misalignment angle, cannot fully capture the fiber distributions that determine microstructural characteristics, necessitating the integration of heterogeneous data sources through multimodal learning. Existing alignment-oriented frameworks have proven effective on abundant crystal or polymer data under discrete, unique graph-property mapping assumptions, but fail to address the highly continuous composite design space under extreme data scarcity. In this work we introduce ORDinal-aware imagE-tabulaR alignment (ORDER), a multimodal pretraining framework that establishes ordinality as a core principle for material representations. ORDER ensures that materials with similar target properties occupy nearby regions in the latent space, which effectively preserves the continuous nature of composite properties and enables meaningful interpolation between sparsely observed designs. We evaluate ORDER on a nanofiber-reinforced composite dataset and a carbon fiber T700 dataset. ORDER and its variants outperform both alignment-oriented and customized property-aware contrastive baselines across property prediction, cross-modal retrieval, and microstructure generation tasks. We further introduce physics-based ordinal surrogate signals that avoid the need for full property annotation during pretraining. Our work demonstrates that learning continuous multimodal features is fundamental for composite materials, and provides a reliable pathway toward data-efficient universal multimodal intelligent systems.
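
ORDER requires that materials with similar target properties lie close together in the latent space. One plausible way to express such an ordinal constraint is a triplet hinge loss: whenever sample j is closer in property value to anchor i than sample k is, d(z_i, z_j) should be smaller than d(z_i, z_k) by a margin. A sketch of that assumed loss follows. The margin, Euclidean distance, and exhaustive triplet enumeration are illustrative choices, not the paper's objective.

```python
import itertools
import math

def dist(u, v):
    """Euclidean distance between two latent vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def ordinal_loss(embeddings, targets, margin=0.1):
    """Mean hinge loss over ordered triplets (anchor i, nearer j, farther k).

    Whenever |y_i - y_j| < |y_i - y_k|, require
    d(z_i, z_j) + margin <= d(z_i, z_k).
    """
    loss, count = 0.0, 0
    for i, j, k in itertools.permutations(range(len(embeddings)), 3):
        if abs(targets[i] - targets[j]) < abs(targets[i] - targets[k]):
            gap = dist(embeddings[i], embeddings[j]) - dist(embeddings[i], embeddings[k])
            loss += max(0.0, gap + margin)
            count += 1
    return loss / count if count else 0.0

props = [0.0, 1.0, 2.0]                                # e.g., a normalized stiffness value
aligned = ordinal_loss([[0.0], [1.0], [2.0]], props)   # latent order matches property order
scrambled = ordinal_loss([[0.0], [2.0], [1.0]], props) # latent order violates it
print(aligned, scrambled)
```

Exhaustive enumeration is cubic in batch size. A practical version would sample triplets per mini-batch.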

2601.22478 2026-05-20 cs.LG Revised version

Iterative Compositional Data Generation for Robot Control

Anh-Quan Pham, Marcel Hussing, Shubhankar P. Patankar, Dani S. Bassett, Jorge Mendez-Mendez, Eric Eaton

Affiliations: University of Pennsylvania; Stony Brook University

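AI summary: This paper proposes a semantic compositional diffusion transformer that learns interactions among robot-, object-, obstacle-, and objective-specific components through attention. After training on a limited set of tasks, the model generates high-quality transitions zero-shot, from which control policies for unseen task combinations can be learned. An iterative self-improvement procedure further boosts zero-shot performance.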

English abstract

Collecting robotic manipulation data is expensive, making it impractical to acquire demonstrations for the combinatorially large space of tasks that arise in multi-object, multi-robot, and multi-environment settings. While recent generative models can synthesize useful data for individual tasks, they do not exploit the compositional structure of robotic domains and struggle to generalize to unseen task combinations. We propose a semantic compositional diffusion transformer that factorizes transitions into robot-, object-, obstacle-, and objective-specific components and learns their interactions through attention. Once trained on a limited subset of tasks, we show that our model can zero-shot generate high-quality transitions from which we can learn control policies for unseen task combinations. Then, we introduce an iterative self-improvement procedure in which synthetic data is validated via offline reinforcement learning and incorporated into subsequent training rounds. Our approach substantially improves zero-shot performance over monolithic and hard-coded compositional baselines, ultimately solving nearly all held-out tasks and demonstrating the emergence of meaningful compositional structure in the learned representations.
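
The factorization into robot-, object-, obstacle-, and objective-specific components lets each component token attend to the others. A stripped-down sketch of that interaction step follows, using scaled dot-product attention over four component tokens. It assumes identity query/key/value projections, toy 2-dimensional tokens, and no diffusion denoising around the attention. The paper's transformer is not reproduced.

```python
import math

COMPONENTS = ["robot", "object", "obstacle", "objective"]

def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention of one query over a list of tokens."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return out, weights

def mix_components(tokens):
    """Every component token attends to all components (keys = values = tokens)."""
    seq = [tokens[c] for c in COMPONENTS]
    return {name: attend(tokens[name], seq, seq) for name in COMPONENTS}

# Hypothetical embeddings; a new task combination swaps in a different token
# (e.g., another obstacle) without changing the other components' encoders.
tokens = {"robot": [1.0, 0.0], "object": [0.0, 1.0],
          "obstacle": [1.0, 1.0], "objective": [0.0, 0.0]}
mixed = mix_components(tokens)
print({name: [round(w, 3) for w in weights] for name, (_, weights) in mixed.items()})
```

Because each component has its own token, unseen task combinations correspond to new pairings of familiar tokens. That is the intuition behind zero-shot transfer to held-out combinations.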

2512.05958 2026-05-20 cs.LG cs.AI Revised version

Open-Set Domain Adaptation Under Background Distribution Shift: Challenges and a Provably Efficient Solution

Shravan Chaudhari, Yoav Wald, Suchi Saria

Affiliations: Department of Computer Science, Johns Hopkins University; Faculty of Data and Decision Sciences, Technion; Center for Data Science, New York University; Bayesian Health

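AI summary: This paper studies open-set domain adaptation under background distribution shift and proposes CoLOR, a provably efficient solution. Theoretical analysis shows that CoLOR outperforms a representative baseline in a simplified overparameterized setting, and experiments on image and text data demonstrate its broad applicability.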

Comments Project page at https://github.com/Shra1-25/CoLOR

Journal ref: Transactions on Machine Learning Research (TMLR) 2026/May ISSN: 2835-8856

English abstract

As we deploy machine learning systems in the real world, a core challenge is to maintain a model that is performant even as the data shifts. Such shifts can take many forms: new classes may emerge that were absent during training, a problem known as open-set recognition, and the distribution of known categories may change. Guarantees on open-set recognition are mostly derived under the assumption that the distribution of known classes, which we call the background distribution, is fixed. In this paper we develop CoLOR, a method that is guaranteed to solve open-set recognition even in the challenging case where the background distribution shifts. We prove that the method works under benign assumptions that the novel class is separable from the non-novel classes, and provide theoretical guarantees that it outperforms a representative baseline in a simplified overparameterized setting. We develop techniques to make CoLOR scalable and robust, and perform comprehensive empirical evaluations on image and text data. The results show that CoLOR significantly outperforms existing open-set recognition methods under background shift. Moreover, we provide new insights into how factors such as the size of the novel class influence performance, an aspect that has not been extensively explored in prior work.

2511.12158 2026-05-20 cs.LG Revised version

Data-Efficient Self-Supervised Algorithms for Fine-Grained Birdsong Analysis

Houtan Ghaffari, Lukas Rauch, Paul Devos

Affiliations: Department of Information Technology, Ghent University; Intelligent Embedded Systems, University of Kassel

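AI summary: This paper proposes a data-efficient birdsong annotator and a three-stage training pipeline for developing reliable birdsong syllable detectors with minimal annotation. It validates the approach in extreme label-scarcity scenarios and assesses self-supervised embeddings for linear probing and unsupervised birdsong analysis.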

English abstract

Research in bioacoustics, neuroscience, and linguistics often uses birdsong as a proxy to acquire knowledge across diverse areas. This requires audio models to annotate and parse the birdsong. Developing such models requires precise, syllable-level annotated training data. Therefore, automated methods that reduce annotation costs are in demand. This work presents a data-efficient birdsong annotator called Residual Multi-Layer Perceptron Recurrent Neural Network. It then presents a three-stage training pipeline for developing reliable birdsong syllable detectors with minimal annotation. The first stage is self-supervised learning from unlabeled data. Two of the most successful pretraining paradigms are explored, namely, masked prediction and online clustering. The second stage is supervised training with effective data augmentation to produce a robust frame-level syllable detector for each individual. The third stage is a semi-supervised post-training step that refines each individual's model using unlabeled data. The effectiveness of this approach is demonstrated for the Canary song in extreme label-scarcity scenarios. From a signal-processing perspective, the Canary song exhibits one of the most challenging spectro-temporal patterns for algorithmic time-series annotation: rapid vocalizations, brief inter-syllabic intervals, fast and broadband frequency sweeps, and spectrally similar syllables that require fine-grained features to distinguish. Hence, a successful syllable detection algorithm for Canary also establishes a robust baseline for other birds. This methodological generalization is validated in a case study of Bengalese Finch song annotation. Finally, the potential of self-supervised embeddings is assessed for linear probing and unsupervised birdsong analysis.
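
The detector described here produces frame-level syllable labels, while annotations are usually stored as syllable segments with onsets and offsets. A common post-processing step converts between the two. A sketch of that step follows. It assumes a label of 0 for silence, a fixed hop size in seconds, and a minimum run length for discarding spurious detections. These conventions are assumptions, not taken from the paper.

```python
def frames_to_segments(frame_labels, hop_s, min_frames=2, silence=0):
    """Merge runs of identical non-silence frame labels into segments.

    Returns a list of (label, onset_s, offset_s). Runs shorter than
    min_frames are dropped as likely spurious detections.
    """
    segments = []
    start = 0
    for i in range(1, len(frame_labels) + 1):
        # A run ends at the end of the sequence or when the label changes.
        if i == len(frame_labels) or frame_labels[i] != frame_labels[start]:
            label = frame_labels[start]
            if label != silence and i - start >= min_frames:
                segments.append((label, start * hop_s, i * hop_s))
            start = i
    return segments

# Frame-level predictions for 10 frames (hop of 1 unit keeps the numbers readable).
labels = [0, 0, 3, 3, 3, 0, 5, 0, 7, 7]
print(frames_to_segments(labels, hop_s=1))
```

The isolated single-frame label 5 is removed by the `min_frames` filter. With rapid canary syllables and short inter-syllabic gaps, this threshold would need to be tuned carefully.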

2511.11688 2026-05-20 cs.LG cs.CV Revised version

Hierarchical Schedule Optimization for Fast and Robust Diffusion Model Sampling

Aihua Zhu, Rui Su, Qinglin Zhao, Li Feng, Meng Shen, Shibo He

Affiliations: School of Computer Science and Engineering, Macau University of Science and Technology; Beijing Institute of Technology; Zhejiang University

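AI summary: This paper proposes a hierarchical schedule optimization method that uses a bi-level optimization framework to enable efficient diffusion-model sampling at an extremely low number of function evaluations, substantially improving sample quality and computational efficiency.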

Comments Preprint, accepted to AAAI 2026

English abstract

Diffusion probabilistic models have set a new standard for generative fidelity but are hindered by a slow iterative sampling process. A powerful training-free strategy to accelerate this process is Schedule Optimization, which aims to find an optimal distribution of timesteps for a fixed and small Number of Function Evaluations (NFE) to maximize sample quality. To this end, a successful schedule optimization method must adhere to four core principles: effectiveness, adaptivity, practical robustness, and computational efficiency. However, existing paradigms struggle to satisfy these principles simultaneously, motivating the need for a more advanced solution. To overcome these limitations, we propose the Hierarchical-Schedule-Optimizer (HSO), a novel and efficient bi-level optimization framework. HSO reframes the search for a globally optimal schedule into a more tractable problem by iteratively alternating between two synergistic levels: an upper-level global search for an optimal initialization strategy and a lower-level local optimization for schedule refinement. This process is guided by two key innovations: the Midpoint Error Proxy (MEP), a solver-agnostic and numerically stable objective for effective local optimization, and the Spacing-Penalized Fitness (SPF) function, which ensures practical robustness by penalizing pathologically close timesteps. Extensive experiments show that HSO sets a new state-of-the-art for training-free sampling in the extremely low-NFE regime. For instance, with an NFE of just 5, HSO achieves a remarkable FID of 11.94 on LAION-Aesthetics with Stable Diffusion v2.1. Crucially, this level of performance is attained not through costly retraining, but with a one-time optimization cost of less than 8 seconds, presenting a highly practical and efficient paradigm for diffusion model acceleration.
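
The Spacing-Penalized Fitness (SPF) function penalizes pathologically close timesteps on top of an error objective. A hypothetical sketch of that structure follows. It adds a hinge penalty on consecutive gaps smaller than `min_gap` to an arbitrary error proxy passed in as a function. The Midpoint Error Proxy itself is not reproduced. The `min_gap` value, the linear hinge, and the example schedule are assumptions.

```python
def spacing_penalty(timesteps, min_gap):
    """Sum of shortfalls where consecutive timesteps are closer than min_gap."""
    ts = sorted(timesteps, reverse=True)
    return sum(max(0.0, min_gap - (a - b)) for a, b in zip(ts, ts[1:]))

def spacing_penalized_fitness(timesteps, error_proxy, min_gap=20.0, weight=1.0):
    """Lower is better: an error proxy plus a penalty for nearly coincident steps."""
    return error_proxy(timesteps) + weight * spacing_penalty(timesteps, min_gap)

def zero_proxy(timesteps):
    """Placeholder error proxy (hypothetical) so only the penalty is visible."""
    return 0.0

healthy = [999, 800, 600, 400, 200, 0]      # hypothetical 5-step schedule
degenerate = [999, 990, 600, 400, 200, 0]   # first two steps only 9 apart
print(spacing_penalized_fitness(healthy, zero_proxy),
      spacing_penalized_fitness(degenerate, zero_proxy))
```

In a search loop, the upper-level global search would rank candidate schedules by such a fitness. The penalty keeps the optimizer from collapsing two of its very few steps onto nearly the same timestep.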

2511.06714 2026-05-20 eess.SY cs.LG cs.SY Revised version

The Wisdom of the Crowd: High-Fidelity Classification of Cyber-Attacks and Faults in Power Systems Using Ensemble and Machine Learning

Emad Abukhousa, Syed Sohail Feroz Syed Afroz, Fahad Alsaeed, Abdulaziz Qwbaiban, Saman Zonouz, A. P. Sakis Meliopoulos

Affiliations: School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA

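AI summary: This paper presents a high-fidelity framework that evaluates ML-based classification of cyber-attacks and physical faults using electromagnetic transient simulation with digital substation emulation at 4.8 kHz. Twelve ML models are trained and then evaluated in a real-time streaming environment. Under streaming, the MLP sustains robust coverage, while the ensemble models keep perfect anomaly precision.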

DOI: 10.1109/ISGTMiddleEast65737.2025.11314472

English abstract

This paper presents a high-fidelity evaluation framework for machine learning (ML)-based classification of cyber-attacks and physical faults using electromagnetic transient simulations with digital substation emulation at 4.8 kHz. Twelve ML models, including ensemble algorithms and a multi-layer perceptron (MLP), were trained on labeled time-domain measurements and evaluated in a real-time streaming environment designed for sub-cycle responsiveness. The architecture incorporates a cycle-length smoothing filter and confidence threshold to stabilize decisions. Results show that while several models achieved near-perfect offline accuracies (up to 99.9%), only the MLP sustained robust coverage (98-99%) under streaming, whereas ensembles preserved perfect anomaly precision but abstained frequently (10-49% coverage). These findings demonstrate that offline accuracy alone is an unreliable indicator of field readiness and underscore the need for realistic testing and inference pipelines to ensure dependable classification in inverter-based resources (IBR)-rich networks.
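
The streaming architecture combines a cycle-length smoothing filter with a confidence threshold, and the results are reported as coverage (the fraction of non-abstained decisions). A sketch of that decision stage follows. It assumes a 60 Hz system, so one cycle at 4.8 kHz is 80 samples. It also assumes per-sample class probabilities from an arbitrary classifier, a moving-average filter, and a threshold of 0.9. None of these specifics come from the paper.

```python
from collections import deque

SAMPLE_RATE_HZ = 4800
GRID_HZ = 60                                # assumption: 60 Hz power system
CYCLE_SAMPLES = SAMPLE_RATE_HZ // GRID_HZ   # 80 samples per cycle

def stream_decisions(prob_stream, window=CYCLE_SAMPLES, threshold=0.9):
    """Smooth class probabilities over a cycle-length window.

    Yields the top class, or None (abstain) when the smoothed top-class
    probability is below the threshold.
    """
    buf = deque(maxlen=window)
    for probs in prob_stream:
        buf.append(probs)
        n_classes = len(probs)
        avg = [sum(p[c] for p in buf) / len(buf) for c in range(n_classes)]
        best = max(range(n_classes), key=lambda c: avg[c])
        yield best if avg[best] >= threshold else None

def coverage(decisions):
    """Fraction of time steps on which the classifier did not abstain."""
    return sum(d is not None for d in decisions) / len(decisions)

confident = list(stream_decisions([[0.95, 0.05]] * 160))
unsure = list(stream_decisions([[0.6, 0.4]] * 160))
print(coverage(confident), coverage(unsure))
```

A stricter threshold raises precision but lowers coverage. That is the trade-off the abstract reports for the ensembles, which kept perfect anomaly precision but abstained frequently.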

2511.06077 2026-05-20 cs.LG cs.IR Revised version

Make It Long, Keep It Fast: End-to-End 10K Long User Behavior Sequence Modeling for Billion-Scale Douyin Recommendation

Lin Guan, Jia-Qi Yang, Zhishan Zhao, Beichuan Zhang, Bo Sun, Xuanyuan Luo, Jinan Ni, Xiaowen Li, Yuhang Qi, Zhifang Fan, Hangyu Wang, Qiwei Chen, Yi Cheng, Feng Zhang, Xiao Yang

Affiliations: ByteDance, Beijing, China; ByteDance, Shanghai, China; ByteDance, San Jose, CA, USA; ByteDance, Hangzhou, Zhejiang, China

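AI summary: This paper presents an end-to-end recommender system that handles user behavior sequences of up to 10,000 items. It combines stacked target-to-history cross attention, request-level batching, and a length-extrapolative training strategy to achieve efficient long-sequence modeling in billion-scale Douyin recommendation.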

Comments WWW 2026. This work studies end-to-end 10K-scale long user behavior sequence modeling for billion-scale industrial recommendation on Douyin

English abstract

Short-video recommenders such as Douyin must exploit extremely long user behavior histories without breaking latency or cost budgets. We present an end-to-end industrial recommender system that scales long-sequence recommendation modeling to 10K-length histories in production. First, we introduce Stacked Target-to-History Cross Attention (STCA), which replaces history self-attention with stacked cross-attention from the target to the history, reducing complexity from quadratic to linear in sequence length and enabling efficient end-to-end training over long user behavior sequences. Second, we propose Request Level Batching (RLB), a user-centric batching scheme that aggregates multiple targets for the same user/request to share the user-side encoding, substantially lowering sequence-related storage, communication, and compute without changing the learning objective. Third, we design a length-extrapolative training strategy (train on shorter windows, infer on much longer ones) so that the model generalizes to 10K-scale histories without additional training cost. Across offline and online experiments, we observe predictable, monotonic gains as we scale history length and model capacity, mirroring the scaling law behavior observed in large language models. Deployed at full traffic on Douyin, our system delivers significant improvements on key engagement metrics while meeting production latency, demonstrating a practical path to scaling end-to-end ultra-long sequence recommendation to the 10K regime.
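
STCA is linear in history length because a single target query attends over L history items, costing O(L·d) per layer instead of the O(L²·d) of history self-attention. Request Level Batching shares one user-side encoding across all targets of a request. A simplified sketch of both ideas follows, under these assumptions:

- Identity projections and no multi-head attention.
- Each stacked layer feeds its residual output back in as the next query.
- Grouping is keyed on a user ID.

These are illustrative choices, not ByteDance's implementation.

```python
import math
from collections import defaultdict

def cross_attend(target, history):
    """One target query attends over L history items: O(L * d) work."""
    d = len(target)
    scores = [sum(t * h for t, h in zip(target, item)) / math.sqrt(d) for item in history]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * item[i] for w, item in zip(weights, history)) / total for i in range(d)]

def stacked_cross_attention(target, history, layers=2):
    """Stack cross-attention layers; each output (plus residual) is the next query."""
    q = list(target)
    for _ in range(layers):
        ctx = cross_attend(q, history)
        q = [a + b for a, b in zip(q, ctx)]
    return q

def request_level_batches(samples):
    """Group (user_id, target) pairs so each user's history is encoded once."""
    groups = defaultdict(list)
    for user_id, target in samples:
        groups[user_id].append(target)
    return dict(groups)

history = [[math.sin(i), math.cos(i), 0.1, -0.1] for i in range(10_000)]
out = stacked_cross_attention([0.5, 0.5, 0.0, 0.0], history, layers=2)
batches = request_level_batches([("u1", "v9"), ("u1", "v3"), ("u2", "v7")])
print(len(out), batches)
```

With request-level batching, the three targets above would trigger two history encodings (one per user) instead of three.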

2511.01126 2026-05-20 cs.LG cs.NA math.NA math.OC math.ST stat.TH Revised version

Stochastic Regret Guarantees for Online Zeroth- and First-Order Bilevel Optimization

Parvin Nazari, Bojian Hou, Davoud Ataee Tarzanagh, Li Shen, George Michailidis

Affiliations: Amirkabir University of Technology; University of Pennsylvania; Samsung SDS Research America; University of California, Los Angeles

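AI summary: This paper introduces a new search direction and proves that zeroth- and first-order stochastic online bilevel optimization algorithms using it achieve sublinear stochastic bilevel regret without window smoothing. The framework also improves efficiency in three ways: it reduces oracle dependence in hypergradient estimation, updates the inner and outer variables alongside the linear system solution, and uses zeroth-order estimates of Hessians, Jacobians, and gradients.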

Comments Published at NeurIPS 2025

English abstract

Online bilevel optimization (OBO) is a powerful framework for machine learning problems where both outer and inner objectives evolve over time, requiring dynamic updates. Current OBO approaches rely on deterministic window-smoothed regret minimization, which may not accurately reflect system performance when functions change rapidly. In this work, we introduce a novel search direction and show that both first- and zeroth-order (ZO) stochastic OBO algorithms leveraging this direction achieve sublinear stochastic bilevel regret without window smoothing. Beyond these guarantees, our framework enhances efficiency by: (i) reducing oracle dependence in hypergradient estimation, (ii) updating inner and outer variables alongside the linear system solution, and (iii) employing ZO-based estimation of Hessians, Jacobians, and gradients. Experiments on online parametric loss tuning and black-box adversarial attacks validate our approach.
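
Zeroth-order (ZO) methods replace analytic gradients with estimates built only from function evaluations, which is what makes black-box settings such as adversarial attacks tractable. A sketch of two standard ZO gradient estimators follows. The first uses coordinate-wise central differences (2d queries). The second is a two-point Gaussian-smoothing estimator averaged over random directions. The paper's specific estimators, including its ZO Hessian and Jacobian estimates, are not reproduced, and the step sizes are illustrative.

```python
import random

def zo_grad_coordinate(f, x, h=1e-4):
    """Central differences along each coordinate: 2 * len(x) function queries."""
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xp[i] += h
        xm = list(x)
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad

def zo_grad_gaussian(f, x, mu=1e-3, samples=20, rng=None):
    """Two-point Gaussian-smoothing estimator: mean over u of (f(x + mu*u) - f(x)) / mu * u."""
    rng = rng or random.Random(0)
    fx = f(x)
    grad = [0.0] * len(x)
    for _ in range(samples):
        u = [rng.gauss(0.0, 1.0) for _ in x]
        fu = f([a + mu * b for a, b in zip(x, u)])
        coef = (fu - fx) / (mu * samples)
        grad = [g + coef * b for g, b in zip(grad, u)]
    return grad

def f(x):
    """Toy black-box objective with true gradient (2 * x0, 3)."""
    return x[0] ** 2 + 3 * x[1]

print(zo_grad_coordinate(f, [1.0, 2.0]), zo_grad_gaussian(f, [1.0, 2.0]))
```

The coordinate estimator is nearly exact here but scales with dimension. The Gaussian estimator uses a fixed query budget (`samples + 1` evaluations) at the cost of variance.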

2510.23507 2026-05-20 cs.LG cs.AI cs.IT math.IT Revised version

A Deep Latent Factor Graph Clustering with Fairness-Utility Trade-off Perspective

Siamak Ghodsi, Amjad Seyedi, Tai Le Quy, Fariba Karimi, Eirini Ntoutsi

Affiliations: L3S Research Center; University of Mons; University of Koblenz; Bundeswehr University

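AI summary: This paper proposes DFNMF, an end-to-end deep nonnegative tri-factorization for graphs that directly optimizes cluster assignments with a soft statistical-parity regularizer to balance fairness and utility. On synthetic and real networks, it achieves substantially higher group balance at comparable modularity.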

Comments Accepted to IEEE Big-Data 2025 main research track. The paper is 10 main pages and 4 pages of Appendix

DOI: 10.1109/BigData66926.2025.11402535
Journal ref: 2025 IEEE International Conference on Big Data (BigData)

English abstract

Fair graph clustering seeks partitions that respect network structure while maintaining proportional representation across sensitive groups, with applications spanning community detection, team formation, resource allocation, and social network analysis. Many existing approaches enforce rigid constraints or rely on multi-stage pipelines (e.g., spectral embedding followed by $k$-means), limiting trade-off control, interpretability, and scalability. We introduce DFNMF, an end-to-end deep nonnegative tri-factorization tailored to graphs that directly optimizes cluster assignments with a soft statistical-parity regularizer. A single parameter $λ$ tunes the fairness-utility balance, while nonnegativity yields parts-based factors and transparent soft memberships. The optimization uses sparse-friendly alternating updates and scales near-linearly with the number of edges. Across synthetic and real networks, DFNMF achieves substantially higher group balance at comparable modularity, often dominating state-of-the-art baselines on the Pareto front. The code is available at https://github.com/SiamakGhodsi/DFNMF.git.
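
A single parameter $λ$ trades a soft statistical-parity regularizer against clustering utility. One plausible form of such a regularizer is sketched below. For each cluster, it compares each sensitive group's share of the soft membership mass with that group's share in the whole graph, and sums the squared gaps. The exact regularizer and the tri-factorization loss in DFNMF may differ. `utility_loss` here is a placeholder number.

```python
def parity_penalty(memberships, groups):
    """Soft statistical-parity gap.

    memberships[i][c] is node i's soft membership in cluster c;
    groups[i] is node i's sensitive-group label.
    """
    n, k = len(memberships), len(memberships[0])
    labels = sorted(set(groups))
    overall = {g: groups.count(g) / n for g in labels}
    penalty = 0.0
    for c in range(k):
        mass = sum(row[c] for row in memberships)
        if mass == 0:
            continue  # an empty cluster contributes nothing
        for g in labels:
            share = sum(row[c] for row, gi in zip(memberships, groups) if gi == g) / mass
            penalty += (share - overall[g]) ** 2
    return penalty

def fair_objective(utility_loss, memberships, groups, lam):
    """One knob lam trades utility (e.g., reconstruction loss) against parity."""
    return utility_loss + lam * parity_penalty(memberships, groups)

groups = ["a", "b", "a", "b"]
balanced = [[1, 0], [1, 0], [0, 1], [0, 1]]     # each cluster holds one a and one b
segregated = [[1, 0], [0, 1], [1, 0], [0, 1]]   # cluster 0 all a, cluster 1 all b
print(parity_penalty(balanced, groups), parity_penalty(segregated, groups))
```

Because the penalty is smooth in the soft memberships, it can be minimized jointly with the factorization rather than imposed as a hard constraint.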

2510.20035 2026-05-20 stat.ME cs.LG Revised version

Throwing Vines at the Wall: Structure Learning via Random Search

Thibault Vatter, Thomas Nagler

发表机构 * University of Applied Sciences Western Switzerland（瑞士西部应用科学大学）； LMU Munich（慕尼黑大学）； Munich Center for Machine Learning（慕尼黑机器学习中心）

AI总结本文提出基于模型置信集的统计框架和随机搜索算法，以改进结构选择，提供理论保证，并为集成学习奠定基础。

2510.19382 2026-05-20 stat.ML cs.LG 版本更新

A Derandomization Framework for Structure Discovery: Applications in Neural Networks and Beyond

一种用于结构发现的去随机化框架：应用于神经网络及其他领域

Nikos Tsikouras, Yorgos Pantis, Ioannis Mitliagkas, Christos Tzamos

发表机构 * National and Kapodistrian University of Athens（雅典国立卡波迪斯特里亚斯大学）； Archimedes, Athena Research Center（雅典娜研究中心阿基米德单元）； Mila & Université de Montréal（Mila与蒙特利尔大学）

AI总结本文研究了神经网络中特征学习动态的理解问题，提出了一种基于去随机化方法的结构发现框架，在更弱的假设下探讨了结构发现的本质及其在MAXCUT端到端近似和Johnson-Lindenstrauss嵌入计算中的应用。

AI中文摘要

理解神经网络中特征学习的动态机制仍然是一个重大挑战。Mousavi-Hosseini等人（2023）分析了多索引（multiple index）教师-学生设定，并表明在使用随机梯度下降（SGD）和强正则项训练时，两层学生网络的第一层权重会呈现低秩结构。已知这种结构特性可以降低泛化所需的样本复杂度。事实上，在第二步中，同一批作者在额外假设下建立了特定于算法的学习保证。本文仅关注结构发现这一方面，并在更弱的假设下研究该问题，具体而言：允许（a）任意规模和深度的神经网络，（b）所有参数均可训练，（c）任意光滑损失函数，（d）极小的正则化，以及（e）通过任何能够达到二阶平稳点（SOSP）的方法进行训练，例如扰动梯度下降（PGD）。我们方法的核心是一个关键的去随机化引理：在温和条件下，优化函数$\mathbb{E}_{\mathbf{x}}[g_\theta(\mathbf{W}\mathbf{x}+\mathbf{b})]$会收敛到$\mathbf{W}=\mathbf{0}$的点。该引理的基础性质直接解释了结构发现，并在其他领域有直接应用，包括MAXCUT的端到端近似和Johnson-Lindenstrauss嵌入的计算。

英文摘要

Understanding the dynamics of feature learning in neural networks (NNs) remains a significant challenge. The work of (Mousavi-Hosseini et al., 2023) analyzes a multiple index teacher-student setting and shows that a two-layer student attains a low-rank structure in its first-layer weights when trained with stochastic gradient descent (SGD) and a strong regularizer. This structural property is known to reduce sample complexity of generalization. Indeed, in a second step, the same authors establish algorithm-specific learning guarantees under additional assumptions. In this paper, we focus exclusively on the structure discovery aspect and study it under weaker assumptions, more specifically: we allow (a) NNs of arbitrary size and depth, (b) with all parameters trainable, (c) under any smooth loss function, (d) tiny regularization, and (e) trained by any method that attains a second-order stationary point (SOSP), e.g., perturbed gradient descent (PGD). At the core of our approach is a key $\textit{derandomization}$ lemma, which states that optimizing the function $\mathbb{E}_{\mathbf{x}} \left[g_\theta(\mathbf{W}\mathbf{x} + \mathbf{b})\right]$ converges to a point where $\mathbf{W} = \mathbf{0}$, under mild conditions. The fundamental nature of this lemma directly explains structure discovery and has immediate applications in other domains including an end-to-end approximation for MAXCUT, and computing Johnson-Lindenstrauss embeddings.
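A one-line special case conveys the intuition behind the derandomization lemma, under assumptions that are ours and not the paper's: suppose $g_\theta$ is convex and the input is centered, $\mathbb{E}[\mathbf{x}]=\mathbf{0}$. Jensen's inequality then shows that, for a fixed $\mathbf{b}$, the choice $\mathbf{W}=\mathbf{0}$ is already a minimizer over $\mathbf{W}$. The paper's lemma is stated under its own "mild conditions", which this sketch does not reproduce.

```latex
\mathbb{E}_{\mathbf{x}}\left[g_\theta(\mathbf{W}\mathbf{x}+\mathbf{b})\right]
\;\ge\; g_\theta\!\left(\mathbf{W}\,\mathbb{E}_{\mathbf{x}}[\mathbf{x}]+\mathbf{b}\right)
\;=\; g_\theta(\mathbf{b})
\;=\; \mathbb{E}_{\mathbf{x}}\left[g_\theta(\mathbf{0}\,\mathbf{x}+\mathbf{b})\right]
```

The first step is Jensen's inequality for convex $g_\theta$; the second uses $\mathbb{E}[\mathbf{x}]=\mathbf{0}$; the last rewrites $g_\theta(\mathbf{b})$ as the objective evaluated at $\mathbf{W}=\mathbf{0}$.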

2510.18821 2026-05-20 cs.LG 版本更新

Search Self-play: Pushing the Frontier of Agent Capability without Supervision

搜索自博弈：在无监督条件下推动智能体能力的前沿

Hongliang Lu, Yuhang Wen, Pengyu Cheng, Ruijin Ding, Jiaqi Guo, Haotian Xu, Chutian Wang, Haonan Chen, Xiaoxi Jiang, Guanjun Jiang

发表机构 * Qwen Large Model Application Team, Alibaba（阿里巴巴通义千问大模型应用团队）

AI总结本文提出了一种基于自博弈的深度搜索智能体训练方法，由模型自行生成任务并求解任务，从而在无需任何外部监督的条件下提升智能体性能。

Comments Published as a conference paper at the Fourteenth International Conference on Learning Representations (ICLR 2026)

AI中文摘要

可验证奖励的强化学习（RLVR）已成为训练大语言模型（LLM）智能体的主流技术。然而，RLVR高度依赖精心设计的任务查询和相应的真实答案（ground truth）来提供准确的奖励，这需要大量人力，并阻碍了RL过程的扩展，在智能体场景中尤其如此。尽管一些近期工作探索了任务合成方法，但生成的智能体任务的难度难以控制，因而难以为RL训练提供有效的优势。为了实现更具可扩展性的智能体RLVR，我们探索了深度搜索智能体的自博弈训练：被训练的LLM利用多轮搜索引擎调用，同时充当任务提出者和问题求解者。任务提出者的目标是生成具有明确真实答案且难度逐渐增加的深度搜索查询。问题求解者则尝试处理生成的搜索查询并输出正确的答案预测。为了确保每个生成的搜索查询都有准确的真实答案，我们收集提出者轨迹中的全部搜索结果作为外部知识，然后进行检索增强生成（RAG），以检验在提供所有必要搜索文档的情况下所提出的查询能否被正确回答。在这一搜索自博弈（SSP）游戏中，提出者和求解者通过竞争与合作共同进化其智能体能力。大量实验结果表明，无论是从头开始还是持续RL训练设置，SSP都能在无需任何监督的情况下，在各种基准上一致且显著地提升搜索智能体的性能。代码见https://github.com/Qwen-Applications/SSP。

英文摘要

Reinforcement learning with verifiable rewards (RLVR) has become the mainstream technique for training LLM agents. However, RLVR highly depends on well-crafted task queries and corresponding ground-truth answers to provide accurate rewards, which requires significant human effort and hinders the scaling of RL processes, especially in agentic scenarios. Although a few recent works explore task synthesis methods, the difficulty of generated agentic tasks can hardly be controlled to provide effective RL training advantages. To achieve agentic RLVR with higher scalability, we explore self-play training for deep search agents, in which the learning LLM utilizes multi-turn search engine calling and acts simultaneously as both a task proposer and a problem solver. The task proposer aims to generate deep search queries with well-defined ground-truth answers and increasing task difficulty. The problem solver tries to handle the generated search queries and output the correct answer predictions. To ensure that each generated search query has accurate ground truth, we collect all the search results from the proposer's trajectory as external knowledge, then conduct retrieval-augmented generation (RAG) to test whether the proposed query can be correctly answered with all necessary search documents provided. In this search self-play (SSP) game, the proposer and the solver co-evolve their agent capabilities through both competition and cooperation. With substantial experimental results, we find that SSP can significantly improve search agents' performance uniformly on various benchmarks without any supervision under both from-scratch and continuous RL training setups. The code is at https://github.com/Qwen-Applications/SSP.
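The verification step described above can be sketched as a single game round. This is a hedged outline, not the SSP implementation: `propose`, `solve`, and `rag_answer` are hypothetical callables standing in for the LLM acting as proposer, as solver, and as a RAG reader over the proposer's retrieved documents. The zero-sum reward split is also an assumption; the abstract only says the two roles both compete and cooperate.

```python
def self_play_round(propose, solve, rag_answer):
    """Run one hypothetical search self-play round.

    Returns (proposer_reward, solver_reward), or None if the query is discarded.
    """
    query, answer, docs = propose()
    # Keep the query only if a reader given every document the proposer retrieved
    # recovers the claimed ground-truth answer.
    if rag_answer(query, docs) != answer:
        return None
    solved = solve(query) == answer
    # Assumed adversarial split: the solver is rewarded for solving,
    # the proposer for posing a valid query that the solver fails.
    return (0.0, 1.0) if solved else (1.0, 0.0)
```

Discarding queries that fail the RAG check is what prevents the proposer from scoring by posing unanswerable or mislabeled questions.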

2510.16814 2026-05-20 cs.LG cs.AI cs.CV 版本更新

Needles in the Landscape: Semi-Supervised Pseudolabeling for Archaeological Site Discovery under Label Scarcity

景观中的针：在标签稀缺条件下用于考古遗址发现的半监督伪标签方法

Simon Jaxy, Anton Theys, Patrick Willett, W. Chris Carleton, Ralf Vandam, Pieter Libin

发表机构 * Sensors, Royal Military Academy, Brussels, Belgium； AMGC (Archaeology, Environmental Changes & Geo-Chemistry), Vrije Universiteit Brussel； Max Planck Institute of Geoanthropology, Jena, Germany（Shared first author; Shared last author）

AI总结本文提出了一种非对称双伪标签（DPL）方法，通过端到端深度学习直接从多波段遥感影像中学习稀疏正样本，无需人工特征工程或对遗址不存在的假设，在两个著名的考古数据集上进行了评估。DPL在Sagalassos数据集上优于LAMAP基线，在F1和召回率上分别提高了12%和29%，而在Cyprus数据集上，DPL在无确认负样本的纯PU设置中恢复了判别能力。DPL的集成产生可解释的概率表面，支持调查规划，从最小的标记数据中有效发现遗址。

AI中文摘要

考古预测建模将已知遗址位置与环境和地理空间变量相结合，估计未发现遗址可能出现的位置。这构成了一个正例-无标签（PU）学习问题：已确认的遗址稀少，而大多数位置是未标记的，并非真正的负样本。为克服这一问题，我们提出了非对称双伪标签（DPL），一种端到端深度学习方法，可直接从多波段地理空间影像中学习稀疏的正样本，无需人工特征工程，也不需要关于遗址不存在的假设，并在两个重要的考古数据集上进行了评估。在Sagalassos数据集上，以独立的、留出的实地调查为评估基准，DPL在F1和召回率上分别比LAMAP基线高出12%和29%，而LAMAP在概率排序上仍保持优势。当负样本不确定时，标准监督基线会灾难性地失败；仅用正样本训练则退化为处处预测为遗址，从而确立了经验界限。在Cyprus数据集这一没有已确认负样本的纯PU设置中，监督学习（SL）使概率排序发生倒置，而DPL恢复了判别能力。DPL集成模型产生可解释的概率表面，可支持调查规划，从而在极少标记数据的情况下实现有效的遗址发现。

英文摘要

Archaeological predictive modelling estimates where undiscovered sites are likely to occur by combining known locations with environmental and geospatial variables, presenting a positive-unlabeled (PU) learning challenge where confirmed sites are rare and most locations are unlabeled rather than truly negative. To overcome this, we propose asymmetric dual pseudolabeling (DPL), an end-to-end deep learning method that learns from sparse positives directly from multi-band geospatial imagery without hand-crafted feature engineering or assumptions about site absence, and evaluate on two prominent archaeological datasets. On the Sagalassos dataset, evaluated against an independent, held-out field survey, DPL outperforms the LAMAP baseline by 12% in F1 and 29% in Recall, while LAMAP maintains advantages in probability ranking. Standard supervised baselines fail catastrophically when negatives are uncertain; positive-only training collapses to predicting everywhere, establishing empirical bounds. On the Cyprus dataset, a pure PU setting without confirmed negatives, SL inverts probability rankings while DPL recovers discrimination. DPL ensembles produce interpretable probability surfaces supporting survey planning, enabling effective site discovery from minimal labeled data.
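The abstract does not specify how DPL assigns pseudolabels. The sketch below assumes a common pattern for positive-unlabeled data: confirmed sites always stay positive, and unlabeled cells are pseudolabeled only when the model is confident. The thresholds are deliberately asymmetric here (illustrative values): because an unlabeled cell may still hide an undiscovered site, the bar for calling it negative is stricter than the bar for calling it positive. In a "dual" setup one would expect two models to exchange such labels; that part is not shown. All names and thresholds are hypothetical.

```python
def asymmetric_pseudolabels(scores, known_positives, pos_thresh=0.8, neg_thresh=0.05):
    """Assign pseudolabels to map cells for the next training round (illustrative).

    scores: predicted site probability per cell.
    known_positives: indices of confirmed sites.
    Returns {cell_index: label}; uncertain unlabeled cells are left out.
    """
    labels = {}
    for i, p in enumerate(scores):
        if i in known_positives:
            labels[i] = 1  # confirmed sites are never relabeled
        elif p >= pos_thresh:
            labels[i] = 1  # confident new positive
        elif p <= neg_thresh:
            labels[i] = 0  # only very low scores become negatives
    return labels


print(asymmetric_pseudolabels([0.5, 0.85, 0.03, 0.1], known_positives={0}))
```

Leaving mid-range cells out of the labeled set is what keeps uncertain background from being treated as confirmed absence.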

2510.12773 2026-05-20 cs.CL cs.AI cs.LG 版本更新

Dr.LLM: Dynamic Layer Routing in LLMs

Dr.LLM：大语言模型中的动态层路由

Ahmed Heakl, Martin Gubri, Salman Khan, Sangdoo Yun, Seong Joon Oh

发表机构 * Parameter Lab（参数实验室）； MBZUAI（穆罕默德·本·扎耶德人工智能大学）； NAVER AI Lab（NAVER人工智能实验室）； University of Tübingen（图宾根大学）； Tübingen AI Center（图宾根人工智能中心）

AI总结本文提出Dr.LLM，一种通过在预训练模型中加入轻量级每层路由器来实现动态层路由的框架，该方法在不改变基础权重的情况下，通过显式监督训练路由器，提高推理的计算效率和准确性。

Comments Published at ICLR 2026

AI中文摘要

大语言模型（LLMs）让每个token都经过transformer堆栈的所有层，这导致在简单查询上浪费计算，而对需要更深推理的困难查询又缺乏灵活性。自适应深度方法可以提高效率，但先前的方法依赖代价高昂的推理时搜索、架构修改或大规模重新训练，在实践中往往以准确率下降为代价换取效率提升。我们提出了Dr.LLM，即大语言模型的动态层路由，这是一个可直接加装（retrofittable）的框架，为预训练模型配备轻量级的逐层路由器，决定跳过、执行或重复某个块。路由器通过显式监督进行训练：我们使用蒙特卡洛树搜索（MCTS）推导出在计算预算下保持或提升准确率的高质量层配置。我们的设计包括用于稳定路由的窗口池化、结合类别平衡的焦点损失（focal loss）以及瓶颈MLP路由器，确保在类别不平衡和长序列下的鲁棒性。在ARC（逻辑）和DART（数学）上，Dr.LLM将准确率最多提高3.4个百分点，同时平均每个样本节省5层。路由器可以泛化到域外任务（MMLU、GSM8k、AIME、TruthfulQA、SQuADv2、GPQA、PIQA、AGIEval），仅带来0.85%的准确率下降并保持效率，且比先前的路由方法最多高出7.7个百分点。总体而言，Dr.LLM表明，经过显式监督训练的路由器可以加装到冻结的LLMs上，实现预算感知、以准确率为导向的推理，而无需改变基础权重。代码可在https://github.com/parameterlab/dr-llm上获得。

英文摘要

Large Language Models (LLMs) process every token through all layers of a transformer stack, causing wasted computation on simple queries and insufficient flexibility for harder ones that need deeper reasoning. Adaptive-depth methods can improve efficiency, but prior approaches rely on costly inference-time search, architectural changes, or large-scale retraining, and in practice often degrade accuracy despite efficiency gains. We introduce Dr. LLM, Dynamic routing of Layers for LLMs, a retrofittable framework that equips pretrained models with lightweight per-layer routers deciding to skip, execute, or repeat a block. Routers are trained with explicit supervision: using Monte Carlo Tree Search (MCTS), we derive high-quality layer configurations that preserve or improve accuracy under a compute budget. Our design, windowed pooling for stable routing, focal loss with class balancing, and bottleneck MLP routers, ensures robustness under class imbalance and long sequences. On ARC (logic) and DART (math), Dr. LLM improves accuracy by up to +3.4%p while saving 5 layers per example on average. Routers generalize to out-of-domain tasks (MMLU, GSM8k, AIME, TruthfulQA, SQuADv2, GPQA, PIQA, AGIEval) with only 0.85% accuracy drop while retaining efficiency, and outperform prior routing methods by up to +7.7%p. Overall, Dr. LLM shows that explicitly supervised routers retrofit frozen LLMs for budget-aware, accuracy-driven inference without altering base weights. Code is available at https://github.com/parameterlab/dr-llm.
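The skip/execute/repeat decision can be illustrated with a toy forward pass. In this hedged sketch the blocks and routers are arbitrary callables (hypothetical stand-ins). In Dr.LLM the routers are bottleneck MLPs over windowed-pooled hidden states, trained with MCTS-derived supervision, none of which is modeled here. Counting block applications shows where the compute saving comes from.

```python
SKIP, EXECUTE, REPEAT = 0, 1, 2


def routed_forward(h, blocks, routers):
    """Apply each block 0, 1 or 2 times according to its router's decision.

    Returns the final hidden state and the number of block applications.
    """
    applied = 0
    for block, router in zip(blocks, routers):
        decision = router(h)
        if decision == SKIP:
            continue  # the hidden state passes through unchanged
        h = block(h)
        applied += 1
        if decision == REPEAT:
            h = block(h)  # run the same block a second time
            applied += 1
    return h, applied


blocks = [lambda x: x + 1] * 4
routers = [lambda h: SKIP, lambda h: EXECUTE, lambda h: REPEAT, lambda h: SKIP]
print(routed_forward(0, blocks, routers))  # (3, 3)
```

A static stack would always apply all four blocks; here the routers spend three applications, one of them a repeat, which mirrors how a router can both save and reallocate depth.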

2510.09872 2026-05-20 cs.LG cs.AI 版本更新

WARC-Bench: Web Archive Based Benchmark for GUI Subtask Executions

WARC-Bench：基于网络存档的GUI子任务执行基准

Sanjari Srivastava, Gang Li, Cheng Chang, Rishu Garg, Manpreet Kaur, Charlene Y. Lee, Yuezhang Li, Yining Mao, Ignacio Cases, Yanan Xie, Peng Qi

发表机构 * Uniphore

AI总结本文提出WARC-Bench，一个基于网络存档的GUI子任务执行基准，通过438个任务评估多模态AI代理在子任务上的能力，实验表明SFT和RLVR方法在提升子任务执行效果上取得显著成果。

微分-积分神经算子用于长期湍流预测

Hao Wu, Yuan Gao, Fan Xu, Fan Zhang, Qingsong Wen, Kun Wang, Xiaomeng Huang, Xian Wu

发表机构 * Tsinghua University（清华大学）； University of Science and Technology of China（中国科学技术大学）； The Chinese University of Hong Kong（香港中文大学）； Nanyang Technological University（南洋理工大学）； Tencent（腾讯）

AI总结本文提出了一种基于物理原理的微分-积分神经算子，通过并行分支学习不同的物理算子，以提高长期湍流预测的稳定性与鲁棒性，从而在2D Kolmogorov流基准测试中实现了更精确的预测。

AI中文摘要

准确预测湍流的长期演变是科学计算中的重大挑战，对从气候建模到航空航天工程的各类应用至关重要。现有的深度学习方法，特别是神经算子，在长期自回归预测中常常失败，出现灾难性的误差累积并丧失物理保真度。这种失败源于它们无法同时捕捉支配湍流动力学的两类不同数学结构：局部的耗散效应和全局的非局部相互作用。在本文中，我们提出了微分-积分神经算子（DINO），这是一个从算子分解的第一性原理出发设计的新框架。DINO通过学习不同物理算子的并行分支显式建模湍流演变：一个局部微分算子，由可证明收敛于导数的受约束卷积网络实现；以及一个全局积分算子，由学习数据驱动全局核的Transformer架构捕捉。这种基于物理的分解赋予DINO卓越的稳定性和鲁棒性。通过在具有挑战性的2D Kolmogorov流基准上的大量实验，我们证明DINO在长期预测中显著优于最先进的模型。它能够在数百个时间步内抑制误差累积，在涡量场和能量谱上均保持高保真度，并为物理一致的长程湍流预测建立了新的基准。

英文摘要

Accurately forecasting the long-term evolution of turbulence represents a grand challenge in scientific computing and is crucial for applications ranging from climate modeling to aerospace engineering. Existing deep learning methods, particularly neural operators, often fail in long-term autoregressive predictions, suffering from catastrophic error accumulation and a loss of physical fidelity. This failure stems from their inability to simultaneously capture the distinct mathematical structures that govern turbulent dynamics: local, dissipative effects and global, non-local interactions. In this paper, we propose the Differential-Integral Neural Operator (DINO), a novel framework designed from a first-principles approach of operator decomposition. DINO explicitly models the turbulent evolution through parallel branches that learn distinct physical operators: a local differential operator, realized by a constrained convolutional network that provably converges to a derivative, and a global integral operator, captured by a Transformer architecture that learns a data-driven global kernel. This physics-based decomposition endows DINO with exceptional stability and robustness. Through extensive experiments on the challenging 2D Kolmogorov flow benchmark, we demonstrate that DINO significantly outperforms state-of-the-art models in long-term forecasting. It successfully suppresses error accumulation over hundreds of timesteps, maintains high fidelity in both the vorticity fields and energy spectra, and establishes a new benchmark for physically consistent, long-range turbulence forecast.
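The abstract says the local branch is a constrained convolution that provably converges to a derivative, but does not state the constraint. A standard way to make a stencil consistent with a first derivative, assumed here for illustration, is to require its zeroth moment to be 0 and its first moment to be 1; dividing the filtered signal by the grid spacing `h` then approximates d/dx. The helper names are hypothetical, and the "convolution" is written as cross-correlation, as in most deep-learning libraries.

```python
import math


def satisfies_first_derivative_moments(w, tol=1e-12):
    """Check the moment constraints on an odd-length stencil with offsets -r..r."""
    r = len(w) // 2
    m0 = sum(w)
    m1 = sum(j * wj for j, wj in zip(range(-r, r + 1), w))
    return abs(m0) < tol and abs(m1 - 1.0) < tol


def apply_stencil(values, w, h):
    """Cross-correlate samples with the stencil and divide by the spacing h."""
    r = len(w) // 2
    return [
        sum(w[j + r] * values[i + j] for j in range(-r, r + 1)) / h
        for i in range(r, len(values) - r)
    ]


h = 0.01
xs = [i * h for i in range(101)]
w = [-0.5, 0.0, 0.5]  # central difference: zeroth moment 0, first moment 1
assert satisfies_first_derivative_moments(w)
d_sin = apply_stencil([math.sin(x) for x in xs], w, h)
# output k corresponds to grid point k + 1, where d/dx sin = cos
print(max(abs(d - math.cos(xs[k + 1])) for k, d in enumerate(d_sin)))
```

In a learned setting the stencil weights would be trainable and projected back onto these constraints, so the filter can adapt its higher-order behavior while staying a consistent derivative estimate.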

2509.19250 2026-05-20 stat.ML cs.LG 版本更新

Recovering Wasserstein Distance Matrices from Few Measurements

从少量测量中恢复Wasserstein距离矩阵

Muhammad Rana, Abiy Tasissa, HanQin Cai, Yakov Gavriyelov, Keaton Hamm

发表机构 * Department of Mathematics, University of Texas at Arlington（德克萨斯大学阿灵顿分校数学系）； Department of Mathematics, Tufts University（塔夫茨大学数学系）； Department of Statistics and Data Science and Department of Computer Science, University of Central Florida（中佛罗里达大学统计与数据科学系及计算机科学系）； Division of Data Science, University of Texas at Arlington（德克萨斯大学阿灵顿分校数据科学部）

AI总结本文提出两种算法，用于从少量条目估计平方Wasserstein距离矩阵，这些矩阵用于计算流形学习嵌入，如多维标度分析（MDS）或Isomap，但与欧几里得距离矩阵不同，它们的计算成本极高。本文分析了从上三角样本进行矩阵补全和Nyström补全，证明了在Nyström补全下MDS的稳定性，并展示了在固定样本距离预算下，Nyström补全可以优于矩阵补全。最后，本文证明了即使仅计算距离矩阵的10%列，嵌入数据在OrganCMNIST数据集上的分类也是稳定的。

DOI: 10.1109/ICASSP55912.2026.11460676
Journal ref: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2026

AI中文摘要

本文提出两种算法，用于从少量条目估计平方Wasserstein距离矩阵。这些矩阵用于计算多维标度分析（MDS）或Isomap等流形学习嵌入，但与欧几里得距离矩阵不同，其计算成本极高。我们分析了基于上三角样本的矩阵补全，以及只计算距离矩阵中$\mathcal{O}(d\log(d))$列的Nyström补全（其中$d$为所需的嵌入维度），证明了在Nyström补全下MDS的稳定性，并表明在固定的样本距离预算下，Nyström补全可以优于矩阵补全。最后，我们证明了即使仅计算距离矩阵10%的列，基于Nyström估计的距离矩阵所嵌入的数据在MedMNIST基准的OrganCMNIST数据集上的分类也是稳定的。

英文摘要

This paper proposes two algorithms for estimating square Wasserstein distance matrices from a small number of entries. These matrices are used to compute manifold learning embeddings like multidimensional scaling (MDS) or Isomap, but contrary to Euclidean distance matrices, are extremely costly to compute. We analyze matrix completion from upper triangular samples and Nyström completion in which $\mathcal{O}(d\log(d))$ columns of the distance matrices are computed where $d$ is the desired embedding dimension, prove stability of MDS under Nyström completion, and show that it can outperform matrix completion for a fixed budget of sample distances. Finally, we show that classification of the OrganCMNIST dataset from the MedMNIST benchmark is stable on data embedded from the Nyström estimation of the distance matrix even when only 10% of the columns are computed.
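The Nyström idea behind column-sampling completion can be sketched on a simpler object. Assumptions: the matrix is a positive semidefinite Gram matrix of rank 2 (a stand-in, not a Wasserstein distance matrix), and exactly two landmark columns are computed, so the approximation $K \approx C W^{-1} C^\top$ is exact whenever the 2×2 landmark block $W$ is invertible. The paper applies its own completion scheme to distance matrices with $\mathcal{O}(d\log(d))$ columns; the function below is hypothetical and only shows why a few expensive columns can suffice for a low-rank matrix.

```python
def nystrom_two_columns(column_of, n, landmarks):
    """Rebuild an n x n PSD matrix from two computed columns: K ~= C W^{-1} C^T.

    column_of(j) returns column j of K; it is the only (expensive) access to K.
    Exact when rank(K) == 2 and the 2 x 2 landmark block W is invertible.
    """
    a, b = landmarks
    ca, cb = column_of(a), column_of(b)
    # W = [[K[a][a], K[a][b]], [K[b][a], K[b][b]]]
    w11, w12, w21, w22 = ca[a], cb[a], ca[b], cb[b]
    det = w11 * w22 - w12 * w21
    inv = [[w22 / det, -w12 / det], [-w21 / det, w11 / det]]
    rows = [(ca[i], cb[i]) for i in range(n)]  # row i of C
    return [
        [sum(ri[p] * inv[p][q] * rj[q] for p in range(2) for q in range(2)) for rj in rows]
        for ri in rows
    ]


points = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, -1.0), (0.5, 3.0)]
gram = [[px * qx + py * qy for (qx, qy) in points] for (px, py) in points]
calls = []


def column_of(j):
    calls.append(j)  # record which columns were actually computed
    return [row[j] for row in gram]


K_hat = nystrom_two_columns(column_of, len(points), (0, 1))
print(calls)  # [0, 1]
```

Only two of the five columns are ever evaluated, yet the full 5×5 matrix is recovered; with costly entries such as Wasserstein distances, that ratio is where the savings come from.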

2509.14968 2026-05-20 cs.LG cs.NI 版本更新

FAWN: A MultiEncoder Fusion-Attention Wave Network for Integrated Sensing and Communication Indoor Scene Inference

FAWN：一种用于集成感知与通信室内场景推断的多编码器融合-注意力波网络

Carlos Barroso-Fernández, Alejandro Calvillo-Fernandez, Antonio de la Oliva, Carlos J. Bernardos

发表机构 * Ericsson（爱立信）

AI总结本文提出FAWN，一种基于Transformer架构的多编码器融合-注意力波网络，用于整合感知与通信的室内场景推断，通过融合Wi-Fi和5G信号提高环境感知精度。

Comments 7 pages, 6 figures and tables, less than 5500 words. Under revision at IEEE Communication Magazine

AI中文摘要

下一代无线技术有望实现万物互联和智能化的时代。随着对智能需求的增长，网络必须学会更好地理解物理世界。然而，部署专用硬件来感知环境并不总是可行，主要是由于成本和/或复杂性。集成感知与通信（ISAC）在解决这一挑战上迈出了重要一步。在ISAC中，被动感知作为一种成本效益高的解决方案，利用无线通信来感知环境，而不干扰现有通信。然而，当前大多数解决方案仅限于一种技术（主要是Wi-Fi或5G），限制了最大精度。由于不同技术使用不同的频谱，我们看到有必要整合多种技术以扩大覆盖范围。因此，我们利用ISAC被动感知，提出FAWN，一种用于ISAC室内场景推断的多编码器融合-注意力波网络。FAWN基于原始Transformer架构，融合Wi-Fi和5G信息，使网络能够理解物理世界而不干扰当前通信。为了测试我们的解决方案，我们构建了一个原型并将其集成到真实场景中。结果表明，在84%的时间内，误差低于0.6米。

英文摘要

The upcoming generations of wireless technologies promise an era where everything is interconnected and intelligent. As the need for intelligence grows, networks must learn to better understand the physical world. However, deploying dedicated hardware to perceive the environment is not always feasible, mainly due to costs and/or complexity. Integrated Sensing and Communication (ISAC) has taken a step forward in addressing this challenge. Within ISAC, passive sensing emerges as a cost-effective solution that reuses wireless communications to sense the environment, without interfering with existing communications. Nevertheless, the majority of current solutions are limited to one technology (mostly Wi-Fi or 5G), constraining the maximum accuracy reachable. As different technologies work with different spectrums, we see a need to integrate more than one technology to augment the coverage area. Hence, we take advantage of ISAC passive sensing to present FAWN, a MultiEncoder Fusion-Attention Wave Network for ISAC indoor scene inference. FAWN is based on the original Transformer architecture and fuses information from Wi-Fi and 5G, making the network capable of understanding the physical world without interfering with the current communication. To test our solution, we have built a prototype and integrated it in a real scenario. Results show errors below 0.6 m in around 84% of cases.
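As a hedged illustration of attention-based fusion (FAWN's actual encoders, token layout, and attention heads are not described in the abstract), the sketch below concatenates output tokens from a hypothetical Wi-Fi encoder and a hypothetical 5G encoder and lets one query attend over both with standard scaled dot-product attention, so the fused vector weights each technology by relevance. All function names are assumptions.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    fused = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return fused, weights


def fuse_modalities(wifi_tokens, nr_tokens, query):
    # Joint attention over both encoders' tokens (keys and values are the tokens).
    tokens = wifi_tokens + nr_tokens
    return attend(query, tokens, tokens)


fused, weights = fuse_modalities([[1.0, 0.0]], [[0.0, 1.0]], query=[0.0, 2.0])
print(weights)  # the 5G token receives the larger weight
```

Attending jointly over both token sets lets the model lean on whichever technology is more informative for a given query, rather than fixing the mix in advance.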

2509.07024 2026-05-20 physics.plasm-ph cs.LG 版本更新

TGLF-WINN: Data-Efficient Deep Learning Surrogate for Turbulent Transport Modeling in Fusion

TGLF-WINN：用于聚变湍流输运建模的数据高效深度学习替代模型

Yadi Cao, Futian Zhang, Wesley Liu, Tom Neiser, Orso Meneghini, Lawson Fuller, Sterling Smith, Raffi Nazikian, Brian Sammuli, Rose Yu

发表机构 * University of California, San Diego（加州大学圣地亚哥分校）； General Atomics（通用原子公司）

AI总结本文提出TGLF-WINN，一种数据高效的深度学习替代模型，通过三种创新方法：原理化的特征工程、物理引导的波数解析正则化和贝叶斯主动学习，提高了湍流输运建模的效率和准确性。

Comments Minor Revision responding to Nuclear Fusion reviewer and adjudicator comments (round 3)

AI中文摘要

Trapped Gyro-Landau Fluid（TGLF）模型能够快速、准确地预测托卡马克中的湍流输运，但需要数千次评估的全装置模拟仍然计算成本高昂。神经网络（NN）替代模型以完全可微的近似提供加速推理，从而支持基于梯度的耦合，但通常需要大量训练数据才能捕捉输运通量在不同等离子体条件下的变化，造成显著的训练负担，并限制了其在昂贵的回旋动理学（gyrokinetic）模拟中的适用性。我们提出TGLF-WINN（波数引导的神经网络），包含三个关键创新：（1）原理化的特征工程，缩小目标预测范围，简化学习任务；（2）物理引导的波数解析正则化，以提高稀疏数据下的泛化能力；（3）贝叶斯主动学习（BAL），根据模型不确定性有策略地选择训练样本，在保持精度的同时降低数据需求。在完整数据集上，特征调优与波数正则化共同使相对RMSLE比TGLF-NN降低12.5%；在稀疏、未过滤的训练条件下（约为完整数据集规模的1/9），其RMSLE退化比TGLF-NN小一个数量级，其中波数引导的正则化对各模式的通量施加了物理引导的约束。加入贝叶斯主动学习后，TGLF-WINN仅用25%的训练数据即可达到TGLF-NN的全数据离线精度，与TGLF-NN全数据基准的差距在2.8%以内，与我们自身全数据结果的差距在4.3%以内。下游的通量匹配工作流程进一步展示了其实用性：该NN替代模型相比TGLF实现了45倍加速，同时重建精度相当。

英文摘要

The Trapped Gyro-Landau Fluid (TGLF) model provides fast, accurate predictions of turbulent transport in tokamaks, but whole device simulations requiring thousands of evaluations remain computationally expensive. Neural network (NN) surrogates offer accelerated inference with fully differentiable approximations that enable gradient-based coupling but typically require large training datasets to capture transport flux variations across plasma conditions, creating significant training burden and limiting applicability to expensive gyrokinetic simulations. We propose TGLF-WINN (Wavenumber-Informed Neural Network) with three key innovations: (1) principled feature engineering that reduces target prediction range, simplifying the learning task; (2) physics-guided wavenumber-resolved regularization to improve generalization under sparse data; and (3) Bayesian Active Learning (BAL) to strategically select training samples based on model uncertainty, reducing data requirements while maintaining accuracy. Feature tuning and wavenumber regularization together deliver a 12.5% relative RMSLE reduction over TGLF-NN on the full dataset; under sparse, unfiltered training (approximately 1/9 the full size) they yield an order-of-magnitude smaller RMSLE degradation than TGLF-NN, with the wavenumber-informed regularization imposing a physics-guided constraint on per-mode fluxes. Adding Bayesian Active Learning, TGLF-WINN matches TGLF-NN's full-data offline accuracy using only 25% of the training data, within 2.8% of TGLF-NN's full-data baseline and 4.3% of our own full-data result. A downstream flux-matching workflow further shows practicality: the NN surrogate gives a 45x speedup over TGLF with comparable reconstruction accuracy.
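Two quantities in the abstract can be made concrete. RMSLE is written here in the common `log1p` form (an assumption; the paper's exact definition, for example for signed fluxes, may differ), and Bayesian active learning is approximated by a simple stand-in: pick the candidate inputs where an ensemble of surrogates disagrees most. Both functions are hypothetical illustrations, not the TGLF-WINN code.

```python
import math
import statistics


def rmsle(pred, target):
    """Root mean squared log error in log1p form; assumes values > -1."""
    sq = [(math.log1p(p) - math.log1p(t)) ** 2 for p, t in zip(pred, target)]
    return math.sqrt(sum(sq) / len(sq))


def select_most_uncertain(candidates, ensemble, budget):
    """Return the `budget` candidates with the largest ensemble variance."""
    def disagreement(x):
        return statistics.pvariance([model(x) for model in ensemble])
    return sorted(candidates, key=disagreement, reverse=True)[:budget]


ensemble = [lambda x: x, lambda x: 2 * x]  # members disagree more as x grows
print(select_most_uncertain([1, 3, 2], ensemble, budget=2))  # [3, 2]
```

Spending the labeling budget where surrogates disagree is the intuition behind reaching full-data accuracy with a fraction of the expensive simulations.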

2508.14134 2026-05-20 cs.LG cs.AI 版本更新

ERIS: An Energy-Guided Feature Disentanglement Framework for Out-of-Distribution Time Series Classification

ERIS: 一种面向分布外时间序列分类的能量引导特征解耦框架

Xin Wu, Fei Teng, Ji Zhang, Xingwang Li, Yuxuan Liang

发表机构 * Hong Kong University of Science and Technology（香港科技大学）

AI总结本文提出ERIS框架，通过能量引导机制和语义指导，解决时间序列分类中分布外数据的可靠特征解耦问题，提升模型鲁棒性和泛化能力。

DOI: 10.1016/j.inffus.2026.104407
Journal ref: Information Fusion 135, 104407 (2026)

AI中文摘要

理想的时间序列分类（TSC）应能捕捉不变表示，但实现对分布外（OOD）数据的可靠性能仍是一个核心障碍。这一障碍源于模型内在地将领域特定和标签相关特征纠缠在一起，导致虚假相关性。尽管特征解耦旨在解决这一问题，但当前方法大多缺乏必要的语义方向，无法隔离真正普遍的特征。为此，我们提出一个端到端的Energy-Regularized Information for Shift-Robustness（ERIS）框架，以实现引导且可靠的特征解耦。核心思想是有效的解耦不仅需要数学约束，还需要语义指导来锚定分离过程。ERIS集成了三个关键机制来实现这一目标。具体来说，我们首先引入一种能量引导校准机制，为分离过程提供关键的语义指导，使模型能够自我校准。此外，一个权重层面正交性策略强制领域特定和标签相关特征之间的结构性独立，从而减轻它们的干扰。此外，一个辅助对抗泛化机制通过注入结构化扰动来增强鲁棒性。在四个基准测试中的实验表明，ERIS在统计上显著优于最先进的基线方法，始终保持最佳性能排名。

英文摘要

An ideal time series classification (TSC) model should be able to capture invariant representations, but achieving reliable performance on out-of-distribution (OOD) data remains a core obstacle. This obstacle arises from the way models inherently entangle domain-specific and label-relevant features, resulting in spurious correlations. While feature disentanglement aims to solve this, current methods are largely unguided, lacking the semantic direction required to isolate truly universal features. To address this, we propose an end-to-end Energy-Regularized Information for Shift-Robustness (ERIS) framework to enable guided and reliable feature disentanglement. The core idea is that effective disentanglement requires not only mathematical constraints but also semantic guidance to anchor the separation process. ERIS incorporates three key mechanisms to achieve this goal. Specifically, we first introduce an energy-guided calibration mechanism, which provides crucial semantic guidance for the separation, enabling the model to self-calibrate. Additionally, a weight-level orthogonality strategy enforces structural independence between domain-specific and label-relevant features, thereby mitigating their interference. Moreover, an auxiliary adversarial generalization mechanism enhances robustness by injecting structured perturbations. Experiments across four benchmarks demonstrate that ERIS achieves a statistically significant improvement over state-of-the-art baselines, consistently securing the top performance rank.
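A weight-level orthogonality constraint is often implemented as a penalty on the overlap between two projection matrices. Assuming, for illustration only (ERIS's exact formulation is not given in the abstract), that `W_dom` and `W_lab` map a shared d-dimensional representation to domain-specific and label-relevant features, the squared Frobenius norm of $W_{\text{dom}}^\top W_{\text{lab}}$ is zero exactly when every domain direction is orthogonal to every label direction. The function name is hypothetical.

```python
def orthogonality_penalty(W_dom, W_lab):
    """Squared Frobenius norm of W_dom^T W_lab for d x k matrices given as row lists."""
    d = len(W_dom)
    total = 0.0
    for a in range(len(W_dom[0])):
        for b in range(len(W_lab[0])):
            inner = sum(W_dom[i][a] * W_lab[i][b] for i in range(d))  # column dot product
            total += inner ** 2
    return total


print(orthogonality_penalty([[1.0], [0.0]], [[0.0], [1.0]]))  # 0.0
print(orthogonality_penalty([[1.0], [0.0]], [[1.0], [1.0]]))  # 1.0
```

Adding this term to the training loss pushes the two projections apart at the weight level, independently of any particular input batch.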

2507.15698 2026-05-20 cs.CL cs.AI cs.LG 版本更新

CoLD: Counterfactually-Guided Length Debiasing for Process Reward Models in Mathematical Reasoning

CoLD：面向数学推理中过程奖励模型的反事实引导长度去偏

Congmin Zheng, Jiachen Zhu, Jianghao Lin, Xinyi Dai, Weiwen Liu, Haoxuan Li, Yong Yu, Weinan Zhang, Mengyue Yang

发表机构 * Shanghai Jiao Tong University（上海交通大学）； Huawei Noah’s Ark Lab（华为诺亚方舟实验室）； Peking University（北京大学）； University of Bristol（布里斯托大学）

AI总结本文提出CoLD，一种通过反事实引导消除过程奖励模型中长度偏差的统一框架，旨在提高多步骤推理的准确性和简洁性，同时提升下游强化学习性能和跨领域泛化能力。

AI中文摘要

过程奖励模型（PRMs）在评估和引导大型语言模型（LLMs）的多步推理中起着核心作用，特别是在数学问题解决中。然而，我们发现现有PRMs存在普遍的长度偏差：即使语义内容和逻辑有效性未变，它们也倾向于对较长的推理步骤赋予更高的分数。这种偏差会削弱奖励预测的可靠性，并导致推理过程中输出过于冗长。为了解决这一问题，我们提出了CoLD（Counterfactually-Guided Length Debiasing），一种统一的框架，通过三个组件减轻长度偏差：显式的长度惩罚调整、一个训练以捕捉虚假长度相关信号的学得偏差估计器，以及一种联合训练策略，强制奖励预测的长度不变性。我们的方法基于反事实推理，并受因果图分析的启发。在MATH500和GSM-Plus上的广泛实验表明，CoLD提高了步骤选择的准确性，并鼓励了更简洁、逻辑有效的推理。此外，它一致提高了下游RL性能，并通过减轻长度偏差在跨领域中泛化，展示了CoLD强大的泛化能力。

英文摘要

Process Reward Models (PRMs) play a central role in evaluating and guiding multi-step reasoning in large language models (LLMs), especially for mathematical problem solving. However, we identify a pervasive length bias in existing PRMs: they tend to assign higher scores to longer reasoning steps, even when the semantic content and logical validity are unchanged. This bias undermines the reliability of reward predictions and leads to overly verbose outputs during inference. To address this issue, we propose CoLD (Counterfactually-Guided Length Debiasing), a unified framework that mitigates length bias through three components: an explicit length-penalty adjustment, a learned bias estimator trained to capture spurious length-related signals, and a joint training strategy that enforces length-invariance in reward predictions. Our approach is grounded in counterfactual reasoning and informed by causal graph analysis. Extensive experiments on MATH500 and GSM-Plus show that CoLD improves accuracy in step selection, and encourages more concise, logically valid reasoning. Furthermore, it consistently improves downstream RL performance and generalizes across domains by mitigating length bias, demonstrating CoLD's strong generalization capability.
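CoLD's learned bias estimator is a trained component. As a deliberately simple stand-in (an assumption, not the paper's estimator), the sketch below fits a least-squares slope of PRM score on step length and subtracts the part of each score explained by deviating from the average length. Function names are hypothetical.

```python
def fit_length_slope(lengths, scores):
    """Least-squares slope of score on length: the spurious 'longer is better' signal."""
    n = len(lengths)
    mean_len = sum(lengths) / n
    mean_score = sum(scores) / n
    cov = sum((ln - mean_len) * (s - mean_score) for ln, s in zip(lengths, scores))
    var = sum((ln - mean_len) ** 2 for ln in lengths)
    return cov / var


def debias(score, length, slope, mean_len):
    # Remove only the length-explained part, relative to the average length.
    return score - slope * (length - mean_len)


lengths = [10, 20, 30]
scores = [0.5, 0.6, 0.7]  # toy scores that rise purely with length
slope = fit_length_slope(lengths, scores)
mean_len = sum(lengths) / len(lengths)
# all three steps now receive (almost) the same score
print([round(debias(s, ln, slope, mean_len), 6) for s, ln in zip(scores, lengths)])
```

Centering on the mean length keeps the average score unchanged, so the adjustment removes the length trend without shifting the overall reward scale.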

2507.10614 2026-05-20 cs.LG cs.AI 版本更新

Fine-tuning Large Language Model for Automated Algorithm Design

微调大语言模型用于自动化算法设计

Fei Liu, Rui Zhang, Xi Lin, Zhichao Lu, Qingfu Zhang

发表机构 * City University of Hong Kong（香港城市大学）； Xi’an Jiaotong University（西安交通大学）

AI总结本文探讨了微调大语言模型以提升其在自动化算法设计中的性能，提出了一种多样性感知的排名策略和直接偏好优化方法，通过实验验证了任务特定微调在不同算法设计任务中的有效性。

AI中文摘要

将大语言模型（LLMs）整合到自动化算法设计中已展现出巨大潜力。一种常见的方法是将LLMs嵌入到搜索过程中，以迭代生成和优化候选算法。然而，现有大多数方法依赖于为通用编码任务训练的现成LLMs，留下一个关键问题：是否需要专门针对算法设计训练的LLMs？如果是，如何有效获得此类LLMs，它们在不同算法设计任务之间的泛化能力又如何？在本文中，我们通过探索针对算法设计的LLMs微调，朝着回答这些问题迈出了初步的一步。我们引入了一种基于排名的多样性感知（DAR）采样策略，以平衡训练数据的多样性和质量，然后利用直接偏好优化来高效地使LLMs的输出与任务目标对齐。我们的实验主要在Llama-3.2-1B-Instruct和Llama-3.1-8B-Instruct上针对三个不同的算法设计任务进行，此外还在可允许集合问题上纳入openPangu-Embedded模型作为辅助比较。结果表明，微调后的LLMs在较小的Llama-3.2-1B-Instruct上显著优于其现成版本，并在可允许集合问题上与较大的Llama-3.1-8B-Instruct持平。此外，我们观察到良好的泛化能力：在特定算法设计任务上微调的LLMs在设置不同的相关任务上也能提升性能。这些发现突显了LLMs在算法设计中进行任务特定适配的价值，并为未来研究开辟了新途径。我们的代码可在https://github.com/RayZhhh/dpo-aad上公开获取。

英文摘要

The integration of large language models (LLMs) into automated algorithm design has shown promising potential. A prevalent approach embeds LLMs within search routines to iteratively generate and refine candidate algorithms. However, most existing methods rely on off-the-shelf LLMs trained for general coding tasks, leaving a key question open: Do we need LLMs specifically tailored for algorithm design? If so, how can such LLMs be effectively obtained and how well can they generalize across different algorithm design tasks? In this paper, we take a preliminary step toward answering these questions by exploring fine-tuning of LLMs for algorithm design. We introduce a Diversity-Aware Rank-based (DAR) sampling strategy to balance training data diversity and quality, then we leverage direct preference optimization to efficiently align LLM outputs with task objectives. Our experiments are primarily conducted on Llama-3.2-1B-Instruct and Llama-3.1-8B-Instruct across three distinct algorithm design tasks, with openPangu-Embedded models additionally included as auxiliary comparisons on the admissible set problem. Results suggest that fine-tuned LLMs can significantly outperform their off-the-shelf counterparts with the smaller Llama-3.2-1B-Instruct and match the larger Llama-3.1-8B-Instruct on the admissible set problem. Moreover, we observe promising generalization: LLMs fine-tuned on specific algorithm design tasks also improve performance on related tasks with varying settings. These findings highlight the value of task-specific adaptation for LLMs in algorithm design and open new avenues for future research. Our code is publicly available at https://github.com/RayZhhh/dpo-aad.
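The direct preference optimization step refers to a standard objective. For one (preferred $y_w$, rejected $y_l$) pair it is $-\log\sigma\big(\beta[(\log\pi_\theta(y_w)-\log\pi_{\text{ref}}(y_w))-(\log\pi_\theta(y_l)-\log\pi_{\text{ref}}(y_l))]\big)$, and the sketch below computes it from summed sequence log-probabilities. How the DAR strategy forms the pairs is not described in the abstract and is not modeled, and `beta=0.1` is an illustrative default, not the paper's setting.

```python
import math


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one pair, from sequence log-probabilities under policy and reference."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) = softplus(-margin), computed in a numerically stable way
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


print(dpo_loss(-2.0, -2.0, -2.0, -2.0))  # log(2): no preference margin yet
print(dpo_loss(-1.0, -5.0, -2.0, -2.0))  # smaller: the policy already favors y_w
```

The loss depends only on log-probability ratios against the frozen reference, so no separate reward model is trained, which is what makes DPO an efficient way to align outputs with task objectives.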

2507.10492 2026-05-20 cs.CV cs.AI cs.LG 版本更新

BenchReAD: A systematic benchmark for retinal anomaly detection

BenchReAD: 一种系统性的视网膜异常检测基准

Chenyu Lian, Hong-Yu Zhou, Zhanli Hu, Jing Qin

发表机构 * The Center for Smart Health, School of Nursing, the Hong Kong Polytechnic University, Hong Kong, China（香港理工大学护理学院智能健康中心）； School of Biomedical Engineering, Tsinghua University, Beijing, China（清华大学生物医学工程学院）； Research Center for Medical AI, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China（中国科学院深圳先进技术研究院医学人工智能研究中心）

AI总结本研究提出BenchReAD基准，旨在解决视网膜异常检测领域缺乏全面且公开的评估基准的问题；通过对数据和算法的系统化分类与评测，发现全监督方法DRA表现最佳，并在此基础上将其改进为NFM-DRA，取得了SOTA性能。

Comments MICCAI 2025

DOI: 10.1007/978-3-032-04937-7_4

AI中文摘要

视网膜异常检测在筛查眼部和全身性疾病中起着关键作用。尽管意义重大，该领域的进展一直受到缺乏全面且公开可用的基准的阻碍，而这样的基准对于公平评估和推进方法至关重要。受此限制，以往针对视网膜图像的异常检测工作存在以下问题：（1）异常类型有限且过于简单，（2）测试集几乎饱和，（3）缺乏泛化评估，导致实验设置说服力不足。此外，现有医学异常检测基准大多专注于单类监督方法（仅使用负样本训练），忽视了临床实践中普遍可获得的大量标记异常数据和未标记数据。为了填补这些空白，我们引入了一个在数据和算法上都全面且系统的视网膜异常检测基准。通过对先前方法进行分类和评测，我们发现利用解耦异常表示的全监督方法（DRA）取得了最佳性能，但在遇到某些未见异常时性能显著下降。受单类监督学习中记忆库机制的启发，我们提出了NFM-DRA，将DRA与正常特征记忆（Normal Feature Memory）相结合以缓解性能下降，建立了新的SOTA。该基准可在https://github.com/DopamineLcy/BenchReAD上公开获取。

英文摘要

Retinal anomaly detection plays a pivotal role in screening ocular and systemic diseases. Despite its significance, progress in the field has been hindered by the absence of a comprehensive and publicly available benchmark, which is essential for the fair evaluation and advancement of methodologies. Due to this limitation, previous anomaly detection work related to retinal images has been constrained by (1) a limited and overly simplistic set of anomaly types, (2) test sets that are nearly saturated, and (3) a lack of generalization evaluation, resulting in less convincing experimental setups. Furthermore, existing benchmarks in medical anomaly detection predominantly focus on one-class supervised approaches (training only with negative samples), overlooking the vast amounts of labeled abnormal data and unlabeled data that are commonly available in clinical practice. To bridge these gaps, we introduce a benchmark for retinal anomaly detection, which is comprehensive and systematic in terms of data and algorithm. Through categorizing and benchmarking previous methods, we find that a fully supervised approach leveraging disentangled representations of abnormalities (DRA) achieves the best performance but suffers from significant drops in performance when encountering certain unseen anomalies. Inspired by the memory bank mechanisms in one-class supervised learning, we propose NFM-DRA, which integrates DRA with a Normal Feature Memory to mitigate the performance degradation, establishing a new SOTA. The benchmark is publicly available at https://github.com/DopamineLcy/BenchReAD.
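The memory-bank idea borrowed from one-class methods can be sketched as a nearest-neighbor score: store features of normal images, and score a test feature by its distance to the closest stored normals. This is an assumed, generic form; how NFM-DRA builds its Normal Feature Memory and combines it with DRA's scores is not specified in the abstract, and `memory_anomaly_score` is hypothetical.

```python
import math


def memory_anomaly_score(feature, memory, k=1):
    """Mean Euclidean distance from `feature` to its k nearest normal features."""
    dists = sorted(math.dist(feature, m) for m in memory)
    return sum(dists[:k]) / k


normal_memory = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(memory_anomaly_score([0.1, 0.0], normal_memory))  # close to a normal: low score
print(memory_anomaly_score([3.0, 4.0], normal_memory))  # far from all normals: high score
```

Because the score depends only on distance from known normal features, it can still flag an anomaly type never seen during training, which is the failure mode the abstract reports for DRA alone.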

2507.06428 2026-05-20 math.OC cs.LG cs.NA math.NA stat.ML 版本更新

Neural Actor-Critic Methods for Hamilton-Jacobi-Bellman PDEs: Asymptotic Analysis and Numerical Studies

神经Actor-Critic方法用于哈密尔顿-雅可比-贝尔曼PDEs：渐近分析与数值研究

Samuel N. Cohen, Jackson Hebner, Deqing Jiang, Justin Sirignano

发表机构 * Mathematical Institute, University of Oxford（牛津大学数学研究所）

AI总结本文研究了用于求解高维哈密尔顿-雅可比-贝尔曼偏微分方程的神经Actor-Critic方法，通过渐近分析和数值研究，证明了该方法在解决随机控制问题中的有效性。

Comments 46 pages

AI中文摘要

我们对一种用于求解随机控制理论中高维哈密尔顿-雅可比-贝尔曼（HJB）偏微分方程的Actor-Critic机器学习算法进行了数学分析和数值研究。Critic（价值函数估计器）的架构经过设计，使边界条件始终被精确满足（而不是作为训练损失的一部分），并利用一种有偏梯度来降低计算成本。Actor（最优控制估计器）通过最小化哈密尔顿量在定义域上的积分进行训练，其中哈密尔顿量由critic估计。我们证明，当actor和critic神经网络中的隐藏单元数量趋于无穷时，二者的训练动态在Sobolev型空间中收敛到某个无限维常微分方程（ODE）。进一步地，在哈密尔顿量满足类凸性假设的条件下，我们证明该极限ODE的任何不动点都是原随机控制问题的解。鉴于有限宽度神经网络由于损失函数的非凸性可能只收敛到局部极小值（而非最优解），这为算法性能提供了重要保证。在数值研究中，我们展示了该算法能够在高达200维的问题中准确求解随机控制问题。特别地，我们构建了一系列复杂度递增且具有已知解析解的随机控制问题，并研究该算法在这些问题上的数值表现。这些问题从线性二次调节器方程到具有非凸哈密尔顿量的极具挑战性的方程，使我们能够识别并分析该神经Actor-Critic方法在求解HJB方程时的优势和局限性。

Abstract

We mathematically analyze and numerically study an actor-critic machine learning algorithm for solving high-dimensional Hamilton-Jacobi-Bellman (HJB) partial differential equations from stochastic control theory. The architecture of the critic (the estimator for the value function) is structured so that the boundary condition is always perfectly satisfied (rather than being included in the training loss) and utilizes a biased gradient which reduces computational cost. The actor (the estimator for the optimal control) is trained by minimizing the integral of the Hamiltonian over the domain, where the Hamiltonian is estimated using the critic. We show that the training dynamics of the actor and critic neural networks converge in a Sobolev-type space to a certain infinite-dimensional ordinary differential equation (ODE) as the number of hidden units in the actor and critic networks tends to infinity. Further, under a convexity-like assumption on the Hamiltonian, we prove that any fixed point of this limit ODE is a solution of the original stochastic control problem. This provides an important guarantee for the algorithm's performance in light of the fact that finite-width neural networks may only converge to local minimizers (and not optimal solutions) due to the non-convexity of their loss functions. In our numerical studies, we demonstrate that the algorithm can solve stochastic control problems accurately in up to 200 dimensions. In particular, we construct a series of increasingly complex stochastic control problems with known analytic solutions and study the algorithm's numerical performance on them. These problems range from a linear-quadratic regulator equation to highly challenging equations with non-convex Hamiltonians, allowing us to identify and analyze the strengths and limitations of this neural actor-critic method for solving HJB equations.
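
The actor/critic split can be illustrated on a toy problem with a closed-form answer. This is our own example, not one of the paper's benchmarks. Take the discounted scalar problem $dX_t = u_t\,dt + \sigma\,dW_t$ with running cost $x^2 + u^2$ and discount rate $\rho$. Its HJB equation is $\rho V(x) = \min_u [x^2 + u^2 + u V'(x)] + \frac{\sigma^2}{2} V''(x)$. Substituting $V(x) = p x^2 + c$ gives $p^2 + \rho p - 1 = 0$ and $c = \sigma^2 p / \rho$. In the sketch, the "critic" is this $V$ and the "actor" is a brute-force minimisation of the Hamiltonian over a grid of controls. The residual should vanish up to grid error. The function names are hypothetical.

```python
import math


def lqr_value_coefficients(rho: float, sigma: float):
    """(p, c) with V(x) = p*x**2 + c for the toy problem dX = u dt + sigma dW,
    running cost x**2 + u**2, discount rate rho."""
    p = (-rho + math.sqrt(rho**2 + 4.0)) / 2.0  # positive root of p^2 + rho*p - 1 = 0
    c = sigma**2 * p / rho
    return p, c


def hjb_residual(x: float, rho: float, sigma: float, n_grid: int = 20001) -> float:
    """rho*V(x) - min_u [x^2 + u^2 + u*V'(x)] - (sigma^2 / 2) * V''(x).

    V plays the critic; the grid search over u plays the actor.
    """
    p, c = lqr_value_coefficients(rho, sigma)
    v, dv, d2v = p * x**2 + c, 2 * p * x, 2 * p
    # Brute-force "actor": candidate controls on a uniform grid in [-10, 10].
    us = [-10.0 + 20.0 * k / (n_grid - 1) for k in range(n_grid)]
    h_min = min(x**2 + u**2 + u * dv for u in us)
    return rho * v - h_min - 0.5 * sigma**2 * d2v


for x in (-2.0, 0.0, 1.5):
    print(x, hjb_residual(x, rho=0.5, sigma=0.3))  # close to 0 at every x
```

The minimising control is $u^* = -V'(x)/2 = -p x$. The neural actor in the paper learns this feedback map instead of searching a grid.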

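2507.03122 2026-05-20 cs.IR cs.CL cs.LG Updated version

Federated Learning for ICD Classification with Lightweight Models and Pretrained Embeddings

Binbin Xu, Gérard Dray

Affiliations: EuroMov Digital Health in Motion, Univ. Montpellier, IMT Mines Ales

AI summary: This paper studies the feasibility and performance of federated learning for multi-label ICD code classification using clinical notes from the MIMIC-IV dataset. It proposes a lightweight, scalable pipeline that combines frozen text embeddings with simple multilayer perceptron classifiers, offering a privacy-preserving and deployment-efficient alternative for distributed healthcare settings.

Comments 20 pages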

Abstract

This study investigates the feasibility and performance of federated learning (FL) for multi-label ICD code classification using clinical notes from the MIMIC-IV dataset. Unlike previous approaches that rely on centralized training or fine-tuned large language models, we propose a lightweight and scalable pipeline combining frozen text embeddings with simple multilayer perceptron (MLP) classifiers. This design offers a privacy-preserving and deployment-efficient alternative for clinical NLP applications, particularly suited to distributed healthcare settings. Extensive experiments were conducted in both centralized and federated configurations, testing six publicly available embedding models from the Massive Text Embedding Benchmark (MTEB) leaderboard and three MLP classifier architectures under two medical coding systems (ICD-9 and ICD-10). Additionally, ablation studies over ten random stratified splits assess performance stability. Results show that embedding quality substantially outweighs classifier complexity in determining predictive performance, and that federated learning can closely match centralized results under idealized conditions. Although the models are orders of magnitude smaller than state-of-the-art architectures and achieve competitive micro and macro F1 scores, limitations remain, including the lack of end-to-end training and the simplified FL assumptions. Nevertheless, this work demonstrates a viable path toward scalable, privacy-conscious medical coding systems and offers a step toward future research into federated, domain-adaptive clinical AI.
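
The federated setup described above (frozen embeddings, a small classifier, training split across sites) maps naturally onto federated averaging. This sketch is an illustration under several assumptions and is not the authors' pipeline:

- it uses FedAvg as the aggregation rule;
- it uses a single linear layer with one sigmoid output per ICD label instead of the paper's MLPs;
- it uses tiny synthetic 2-D "embeddings".

All function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Weights = List[List[float]]  # one weight row (bias last) per ICD label
Example = Tuple[List[float], List[int]]  # (frozen embedding, multi-hot labels)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def local_update(w: Weights, data: List[Example], lr: float = 0.5, epochs: int = 20) -> Weights:
    """Client step: SGD on per-label binary cross-entropy; embeddings stay frozen."""
    w = [row[:] for row in w]
    for _ in range(epochs):
        for x, y in data:
            xb = x + [1.0]  # append the bias input
            for k, row in enumerate(w):
                err = sigmoid(sum(a * b for a, b in zip(row, xb))) - y[k]
                for j in range(len(row)):
                    row[j] -= lr * err * xb[j]
    return w


def fed_avg(client_weights: List[Weights], sizes: List[int]) -> Weights:
    """Server step: average client models weighted by local dataset size."""
    total = sum(sizes)
    return [
        [sum(cw[k][j] * n for cw, n in zip(client_weights, sizes)) / total
         for j in range(len(client_weights[0][k]))]
        for k in range(len(client_weights[0]))
    ]


def predict(w: Weights, x: List[float]) -> List[int]:
    xb = x + [1.0]
    return [int(sigmoid(sum(a * b for a, b in zip(row, xb))) > 0.5) for row in w]


random.seed(0)


def make_client(n: int) -> List[Example]:
    # Synthetic site: label 0 fires when x0 > 0, label 1 when x1 > 0.
    pts = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(n)]
    return [(p, [int(p[0] > 0), int(p[1] > 0)]) for p in pts]


clients = [make_client(n) for n in (30, 50, 20)]
global_w: Weights = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
for _ in range(5):  # communication rounds
    local = [local_update(global_w, data) for data in clients]
    global_w = fed_avg(local, [len(d) for d in clients])
print(predict(global_w, [0.8, -0.8]))
```

Only model weights leave each site in this loop. The notes and their embeddings stay local, which is the privacy argument the abstract makes.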

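2507.01123 2026-05-20 cs.CV cs.LG eess.IV Updated version

Landslide Detection and Mapping Using Deep Learning Across Multi-Source Satellite Data and Geographic Regions

Rahul A. Burange, Harsh K. Shinde, Omkar Mutyalwar

Affiliations: Department of Electronics & Telecommunication, KDK College of Engineering

AI summary: This paper presents an integrated approach that combines multi-source satellite imagery with deep learning models to improve landslide identification and prediction. It uses Sentinel-2 multispectral data together with ALOS PALSAR-derived slope and digital elevation model (DEM) layers to capture key environmental factors behind landslide occurrence. It also assesses how various geospatial analysis techniques affect detection accuracy, and evaluates several state-of-the-art deep learning segmentation models, such as U-Net, DeepLabV3+, and Res-Net, for landslide detection.

Comments 17 pages, 22 figures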

DOI: 10.2139/ssrn.5225437
Journal ref: JETIR March 2025, Volume 12, Issue 3

Abstract

Landslides pose severe threats to infrastructure, economies, and human lives, necessitating accurate detection and predictive mapping across diverse geographic regions. With advancements in deep learning and remote sensing, automated landslide detection has become increasingly effective. This study presents a comprehensive approach integrating multi-source satellite imagery and deep learning models to enhance landslide identification and prediction. We leverage Sentinel-2 multispectral data and ALOS PALSAR-derived slope and Digital Elevation Model (DEM) layers to capture critical environmental features influencing landslide occurrences. Various geospatial analysis techniques are employed to assess the impact of terrain characteristics, vegetation cover, and rainfall on detection accuracy. Additionally, we evaluate the performance of multiple state-of-the-art deep learning segmentation models, including U-Net, DeepLabV3+, and Res-Net, to determine their effectiveness in landslide detection. The proposed framework contributes to the development of reliable early warning systems, improved disaster risk management, and sustainable land-use planning. Our findings provide valuable insights into the potential of deep learning and multi-source remote sensing in creating robust, scalable, and transferable landslide prediction models.
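
Slope layers like the ALOS PALSAR-derived one above are commonly computed from a DEM by finite differences. The sketch below assumes a DEM with square cells, given as a list of rows, and uses central differences on interior cells, which is one of several common slope algorithms. The function name is hypothetical.

```python
import math
from typing import List

Grid = List[List[float]]


def slope_degrees(dem: Grid, cell_size: float) -> Grid:
    """Slope (degrees) of each interior DEM cell from central differences.

    Border cells are NaN because the centred stencil needs both neighbours.
    """
    rows, cols = len(dem), len(dem[0])
    out = [[math.nan] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            dzdx = (dem[i][j + 1] - dem[i][j - 1]) / (2 * cell_size)
            dzdy = (dem[i + 1][j] - dem[i - 1][j]) / (2 * cell_size)
            out[i][j] = math.degrees(math.atan(math.hypot(dzdx, dzdy)))
    return out


# Elevation rises 30 m per 30 m cell eastward, i.e. a 45-degree slope.
dem = [[30.0 * j for j in range(4)] for _ in range(4)]
print(slope_degrees(dem, 30.0)[1][1])
```

A raster like this could then be stacked with the Sentinel-2 bands as an extra input channel for a segmentation network such as U-Net. The abstract does not say exactly how the paper fuses its layers.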

2506.17036 2026-05-20 stat.ME cs.LG stat.ML Updated version

Bayesian Joint Model of Multi-Sensor and Failure Event Data for Multi-Mode Failure Prediction

Sina Aghaee Dabaghan Fard, Minhee Kim, Akash Deep, Jaesung Lee

Affiliations: Department of Industrial and Systems Engineering, Texas A&M University; University of Florida; School of Industrial Engineering and Management, Oklahoma State University

AI summary: This paper proposes a Bayesian approach that jointly models multi-sensor time-series data and multi-mode failure times. By integrating a Cox proportional hazards model, a convolved multi-output Gaussian process, and multinomial failure-mode distributions, it accurately predicts a system's remaining useful life, and its advantages are validated through numerical and case studies.

Abstract

Modern industrial systems are often subject to multiple failure modes, and their conditions are monitored by multiple sensors, generating multiple time-series signals. Additionally, time-to-failure data are commonly available. Accurately predicting a system's remaining useful life (RUL) requires effectively leveraging multi-sensor time-series data alongside multi-mode failure event data. In most existing models, failure mode identification and RUL prediction are performed independently, ignoring the inherent relationship between these two tasks. Some models integrate multiple failure modes and event prediction using black-box machine learning approaches, which lack statistical rigor and cannot characterize the inherent uncertainty in the model and data. This paper introduces a unified approach that jointly models multi-sensor time-series data and failure times involving multiple failure modes. The proposed model integrates a Cox proportional hazards model, a Convolved Multi-output Gaussian Process, and multinomial failure mode distributions in a hierarchical Bayesian framework with corresponding priors, enabling accurate prediction with robust uncertainty quantification. Posterior distributions are effectively obtained by Variational Bayes, and prediction is performed with Monte Carlo sampling. The advantages of the proposed model are validated through extensive numerical and case studies with a jet-engine dataset.
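
The hazard and survival components of such a joint model can be sketched without the Bayesian machinery. This is a deliberately simplified, non-Bayesian illustration with three assumptions:

- time-invariant covariates instead of the convolved multi-output Gaussian process;
- point values for $\beta$ instead of variational posteriors;
- a Weibull baseline hazard, which is our choice.

It computes $S(t \mid x) = \exp(-H_0(t)\, e^{\beta^\top x})$, the mean residual life given survival to $t_0$, and softmax failure-mode probabilities. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def cox_survival(t: float, x: Sequence[float], beta: Sequence[float],
                 shape: float, scale: float) -> float:
    """S(t | x) = exp(-H0(t) * exp(beta . x)), Weibull baseline H0(t) = (t / scale)**shape."""
    risk = math.exp(sum(b * xi for b, xi in zip(beta, x)))
    return math.exp(-((t / scale) ** shape) * risk)


def mean_residual_life(t0: float, x: Sequence[float], beta: Sequence[float],
                       shape: float, scale: float,
                       horizon: float = 1000.0, steps: int = 10000) -> float:
    """E[T - t0 | T > t0] ~ integral over [0, horizon] of S(t0 + u) / S(t0) du (trapezoid rule)."""
    s0 = cox_survival(t0, x, beta, shape, scale)
    du = horizon / steps
    vals = [cox_survival(t0 + k * du, x, beta, shape, scale) / s0 for k in range(steps + 1)]
    return du * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


def failure_mode_probs(scores: Sequence[float]) -> List[float]:
    """Multinomial (softmax) probabilities over failure modes."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Exponential baseline (shape = 1) is memoryless, so the residual life stays ~scale.
print(mean_residual_life(30.0, [0.0], [0.5], 1.0, 100.0, horizon=2000.0, steps=20000))
print(failure_mode_probs([1.2, 0.3]))
```

Given posterior samples, one Monte Carlo route to the predictive RUL distributions described in the abstract would be to average these quantities over the samples.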

2506.15753 2026-05-20 quant-ph cs.LG cs.SY eess.SY Updated version

QPPG: Quantum-Preconditioned Policy Gradient for Link Adaptation in Rayleigh Fading Channels

Oluwaseyi Giwa, Muhammad Ahmed Mohsin, Folarin Jubril Adesola, Muhammad Ali Jamshed

Affiliations: African Institute for Mathematical Sciences; Stanford University; Olabisi Onabanjo University; University of Glasgow

AI summary: This paper proposes the quantum-preconditioned policy gradient algorithm, which uses Fisher-information-based preconditioning to stabilize and accelerate policy updates. It improves link adaptation for wireless communications in dynamic fading environments, achieving faster convergence, higher throughput, and lower transmit power.

Comments Submitted to IEEE Wireless Communications Letters

DOI: 10.1109/LWC.2026.3694465

Abstract

Reliable link adaptation is critical for efficient wireless communications in dynamic fading environments. However, reinforcement learning (RL) solutions often suffer from unstable convergence due to poorly conditioned policy gradients, hindering their practical application. We propose the quantum-preconditioned policy gradient (QPPG) algorithm, which leverages Fisher-information-based preconditioning to stabilise and accelerate policy updates. Evaluations in Rayleigh fading scenarios show that QPPG achieves faster convergence, a 28.6% increase in average throughput, and a 43.8% decrease in average transmit power compared to classical methods. This work introduces quantum-geometric conditioning to link adaptation, marking a significant advance in developing robust, quantum-inspired reinforcement learning for future 6G networks, thereby enhancing communication reliability and energy efficiency.
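
In classical terms, Fisher-information preconditioning is the natural policy gradient, $\theta \leftarrow \theta + \eta (F + \lambda I)^{-1} \nabla_\theta J$. The sketch below shows why it helps with poorly conditioned gradients. It uses a hypothetical softmax policy over three modulation-and-coding choices with fixed mean rewards, a bandit simplification of link adaptation. It relies on the classical Fisher matrix and a damping $\lambda$ of our choosing. It is not QPPG's quantum-geometric construction, and the helper names are ours.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def vanilla_gradient(logits: List[float], rewards: List[float]) -> List[float]:
    """dJ/dtheta_j = pi_j * (r_j - E_pi[r]) for J(theta) = E_{a ~ pi}[r(a)]."""
    pi = softmax(logits)
    baseline = sum(p * r for p, r in zip(pi, rewards))
    return [p * (r - baseline) for p, r in zip(pi, rewards)]


def fisher_matrix(logits: List[float]) -> Matrix:
    """Fisher information of a softmax policy w.r.t. its logits: diag(pi) - pi pi^T."""
    pi = softmax(logits)
    n = len(pi)
    return [[(pi[i] if i == j else 0.0) - pi[i] * pi[j] for j in range(n)] for i in range(n)]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system a x = b."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def natural_gradient(logits: List[float], rewards: List[float],
                     damping: float = 1e-3) -> List[float]:
    """Preconditioned direction (F + damping * I)^-1 grad J."""
    f = fisher_matrix(logits)
    for i in range(len(f)):
        f[i][i] += damping  # F @ 1 = 0, so F alone is singular
    return solve(f, vanilla_gradient(logits, rewards))


# Policy already confident in a poor scheme (index 0): the plain gradient nearly vanishes.
logits, rewards = [5.0, 0.0, 0.0], [0.0, 1.0, 0.5]
print(vanilla_gradient(logits, rewards))
print(natural_gradient(logits, rewards))
```

When the policy is nearly deterministic, the $\pi_j$ factors shrink the plain gradient toward zero. The preconditioned step rescales by the inverse curvature and keeps a usable magnitude, while still pointing toward the better schemes.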

2506.08618 2026-05-20 cs.LG cond-mat.mes-hall cond-mat.other cs.AI cs.CV Updated version

HSG-12M: A Large-Scale Benchmark of Spatial Multigraphs from the Energy Spectra of Non-Hermitian Crystals

Xianquan Yan, Hakan Akgün, Kenji Kawaguchi, N. Duane Loh, Ching Hua Lee

Affiliations: National University of Singapore; NUS Centre for Bioimaging Sciences

AI summary: This paper introduces HSG-12M, a dataset of 11.6 million static and 5.1 million dynamic Hamiltonian spectral graphs for studying the intricate geometric structures of non-Hermitian quantum physics. It fills a gap left by existing graph benchmarks in learning spatial multi-edges.

Comments Accepted to ICLR 2026, OpenReview: [https://openreview.net/forum?id=YxuKCME576]. 49 pages, 13 figures, 14 tables. Code & pipeline: [https://github.com/sarinstein-yan/Poly2Graph] Dataset: [https://github.com/sarinstein-yan/HSG-12M] Dataset released under CC BY 4.0. The Fourteenth International Conference on Learning Representations (ICLR 2026)

Journal ref: The Fourteenth International Conference on Learning Representations (ICLR 2026)

Abstract

AI is transforming scientific research by revealing new ways to understand complex physical systems, but its impact remains constrained by the lack of large, high-quality domain-specific datasets. A rich, largely untapped resource lies in non-Hermitian quantum physics, where the energy spectra of crystals form intricate geometries on the complex plane, termed Hamiltonian spectral graphs. Despite their significance as fingerprints for electronic behavior, their systematic study has been intractable due to the reliance on manual extraction. To unlock this potential, we introduce Poly2Graph: a high-performance, open-source pipeline that automates the mapping of 1-D crystal Hamiltonians to spectral graphs. Using this tool, we present HSG-12M: a dataset containing 11.6 million static and 5.1 million dynamic Hamiltonian spectral graphs across 1401 characteristic-polynomial classes, distilled from 177 TB of spectral potential data. Crucially, HSG-12M is the first large-scale dataset of spatial multigraphs: graphs embedded in a metric space where multiple geometrically distinct trajectories between two nodes are retained as separate edges. This simultaneously addresses a critical gap, as existing graph benchmarks overwhelmingly assume simple, non-spatial edges, discarding vital geometric information. Benchmarks with popular GNNs expose new challenges in learning spatial multi-edges at scale. Beyond its practical utility, we show that spectral graphs serve as universal topological fingerprints of polynomials, vectors, and matrices, forging a new algebra-to-graph link. HSG-12M lays the groundwork for data-driven scientific discovery in condensed matter physics and opens new opportunities in geometry-aware graph learning and beyond.
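
A data structure makes the difference between a spatial multigraph and a simple graph concrete. In the minimal sketch below, nodes carry coordinates on the complex energy plane and every edge keeps its own polyline. As a result, three geometrically distinct edges between the same two nodes survive as separate edges. The class and field names are hypothetical and are not the Poly2Graph or HSG-12M format.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (Re E, Im E) position on the complex energy plane


@dataclass
class SpatialMultigraph:
    """Hypothetical container: nodes have coordinates, and each edge stores its own
    trajectory, so parallel edges with different geometry coexist."""
    nodes: Dict[int, Point] = field(default_factory=dict)
    edges: List[Tuple[int, int, List[Point]]] = field(default_factory=list)

    def add_node(self, node: int, pos: Point) -> None:
        self.nodes[node] = pos

    def add_edge(self, u: int, v: int, waypoints: List[Point]) -> None:
        # Keep the full polyline from u to v, endpoints included.
        self.edges.append((u, v, [self.nodes[u], *waypoints, self.nodes[v]]))

    def multiplicity(self, u: int, v: int) -> int:
        # Undirected count of edges joining u and v.
        return sum(1 for a, b, _ in self.edges if {a, b} == {u, v})

    def edge_lengths(self) -> List[float]:
        return [sum(math.dist(p, q) for p, q in zip(path, path[1:]))
                for _, _, path in self.edges]


g = SpatialMultigraph()
g.add_node(0, (-1.0, 0.0))
g.add_node(1, (1.0, 0.0))
g.add_edge(0, 1, [(0.0, 1.0)])   # arc through the upper half-plane
g.add_edge(0, 1, [(0.0, -1.0)])  # mirror arc through the lower half-plane
g.add_edge(0, 1, [])             # straight segment along the real axis
print(g.multiplicity(0, 1), g.edge_lengths())
```

A simple graph keyed by node pairs would keep one edge between nodes 0 and 1 and discard the other two trajectories. That discarded geometry is what the abstract says existing benchmarks lose.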

2506.01529 2026-05-20 cs.LG Updated version

Learning Abstract World Models with a Group-Structured Latent Space

Thomas Delliaux, Nguyen-Khanh Vu, Vincent François-Lavet, Elise van der Pol, Emmanuel Rachelson

Affiliations: ISAE-SUPAERO; ETH Zürich; Vrije Universiteit Amsterdam; Microsoft Research AI for Science

AI summary: By imposing geometric priors on a low-dimensional representation manifold, this work improves the learning of abstract models of Markov decision processes. This improves generalization from limited data and yields more effective reinforcement learning in environments with rotational and translational features.

Comments 20 pages, 18 figures

Abstract

Learning meaningful abstract models of Markov Decision Processes (MDPs) is crucial for improving generalization from limited data. In this work, we show how geometric priors can be imposed on the low-dimensional representation manifold of a learned transition model. We incorporate known symmetric structures via appropriate choices of the latent space and the associated group actions, which encode prior knowledge about invariances in the environment. In addition, our framework allows the embedding of additional unstructured information alongside these symmetries. We show experimentally that this leads to better predictions of the latent transition model than fully unstructured approaches, as well as better learning on downstream RL tasks, in environments with rotational and translational features, including in first-person views of 3D environments. Additionally, our experiments show that this leads to simpler and more disentangled representations. The full code is available on GitHub to ensure reproducibility.
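
Consider an agent whose heading lives on the unit circle, i.e. the group SO(2). This example shows what encoding a known symmetry through the latent space and its group action can look like. In the sketch below, which is our assumption and not the paper's architecture, each discrete action is tied to a fixed rotation. The structured part of the latent transition is therefore exact by construction. An unstructured latent part would be predicted by an ordinary network and is left as an identity placeholder here. `ACTION_ANGLES` and `latent_step` are hypothetical.

```python
import math
from typing import List, Tuple

Vec2 = Tuple[float, float]


def rotate(z: Vec2, angle: float) -> Vec2:
    """Action of the SO(2) element R(angle) on a latent point of the unit circle."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * z[0] - s * z[1], s * z[0] + c * z[1])


# Hypothetical action set: each discrete action corresponds to a fixed group element.
ACTION_ANGLES = {"turn_left": math.pi / 2, "turn_right": -math.pi / 2, "noop": 0.0}


def latent_step(z: Vec2, free: List[float], action: str) -> Tuple[Vec2, List[float]]:
    """Structured part moves by the group action; the unstructured part would be
    predicted by a learned network (here it is copied unchanged as a placeholder)."""
    return rotate(z, ACTION_ANGLES[action]), list(free)


z, free = (1.0, 0.0), [0.3]
for _ in range(4):
    z, free = latent_step(z, free, "turn_left")
print(z, free)  # back near (1.0, 0.0); the free part is untouched
```

Composing actions maps to composing rotations. Four left turns therefore return the latent to its start, and a left turn followed by a right turn is the identity. A fully unstructured transition model would have to learn these properties from data.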

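2506.00286 2026-05-20 cs.LG cs.AI math.OC stat.ML Updated version

Recursive Entropic Risk Optimization in Discounted MDPs: Sample Complexity Bounds with a Generative Model

Oliver Mortensen, Mohammad Sadegh Talebi

Affiliations: Department of Computer Science, University of Copenhagen

AI summary: This paper studies risk-sensitive reinforcement learning with recursive entropic risk measures (ERM) in finite discounted Markov decision processes (MDPs). It introduces a model-based algorithm, Model-Based ERM Q-Value Iteration (MB-RS-QVI), and derives PAC-type sample complexity bounds for both value learning and policy learning. It also shows that in the worst case the sample complexity depends exponentially on |β|/(1-γ). Together, these results provide the first rigorous sample complexity guarantees for recursive ERM in both the risk-averse and risk-seeking regimes.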

Abstract

We study risk-sensitive reinforcement learning in finite discounted MDPs with recursive entropic risk measures (ERM), where the risk parameter $\beta \neq 0$ controls the agent's risk attitude: $\beta > 0$ for risk-averse and $\beta < 0$ for risk-seeking behavior. A generative model of the MDP is assumed to be available. Our focus is on the sample complexities of learning the optimal state-action value function (value learning) and an optimal policy (policy learning) under recursive ERM. We introduce a model-based algorithm, called Model-Based ERM $Q$-Value Iteration (MB-RS-QVI), and derive PAC-type bounds on its sample complexity for both value and policy learning. Both PAC bounds scale exponentially with $|\beta|/(1-\gamma)$, where $\gamma$ is the discount factor. We also establish corresponding lower bounds for both value and policy learning, showing that exponential dependence on $|\beta|/(1-\gamma)$ is unavoidable in the worst case. The bounds are tight in the number of states and actions ($S$ and $A$), providing the first rigorous sample complexity guarantees for recursive ERM across both risk-averse and risk-seeking regimes.
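
The recursive entropic backup can be written down concretely. This sketch makes two assumptions:

- the convention $\rho_\beta(V) = -\frac{1}{\beta}\log \mathbb{E}[e^{-\beta V}]$, which makes $\beta > 0$ risk-averse and $\beta < 0$ risk-seeking, matching the abstract;
- a plug-in model estimated from generative-model samples.

This mirrors the general structure of model-based Q-value iteration, but it does not reproduce the exact MB-RS-QVI update or its sample sizes. All names and the toy MDP are hypothetical.

```python
import math
import random
from typing import Callable, List

Model = List[List[List[float]]]  # P[s][a][s'] transition probabilities


def entropic_risk(probs: List[float], values: List[float], beta: float) -> float:
    """rho_beta(V) = -(1/beta) * log E[exp(-beta * V)], computed with log-sum-exp."""
    pairs = [(p, -beta * v) for p, v in zip(probs, values) if p > 0]
    m = max(x for _, x in pairs)
    log_mean = m + math.log(sum(p * math.exp(x - m) for p, x in pairs))
    return -log_mean / beta


def risk_sensitive_qvi(P: Model, R: List[List[float]], beta: float,
                       gamma: float, iters: int = 500) -> List[List[float]]:
    """Iterate Q(s,a) <- R(s,a) + gamma * rho_beta(max_a' Q(s', a')) under model P."""
    Q = [[0.0] * len(P[s]) for s in range(len(P))]
    for _ in range(iters):
        V = [max(q) for q in Q]
        Q = [[R[s][a] + gamma * entropic_risk(P[s][a], V, beta) for a in range(len(P[s]))]
             for s in range(len(P))]
    return Q


def empirical_model(sample_next: Callable[[int, int, random.Random], int],
                    n_states: int, n_actions: int, n_samples: int,
                    rng: random.Random) -> Model:
    """Plug-in estimate P_hat(s'|s,a) from n_samples generative-model calls per (s, a)."""
    P_hat: Model = []
    for s in range(n_states):
        row = []
        for a in range(n_actions):
            counts = [0] * n_states
            for _ in range(n_samples):
                counts[sample_next(s, a, rng)] += 1
            row.append([c / n_samples for c in counts])
        P_hat.append(row)
    return P_hat


# Toy MDP: in state 0, action 0 is "safe" (-> state 1), action 1 a fair gamble (-> 2 or 3).
# States 1, 2, 3 are absorbing with per-step rewards 0.5, 1.0 and 0.0.
P: Model = [
    [[0, 1, 0, 0], [0, 0, 0.5, 0.5]],
    [[0, 1, 0, 0], [0, 1, 0, 0]],
    [[0, 0, 1, 0], [0, 0, 1, 0]],
    [[0, 0, 0, 1], [0, 0, 0, 1]],
]
R = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0], [0.0, 0.0]]


def sample_next(s: int, a: int, rng: random.Random) -> int:
    """Generative model for the toy MDP: draw s' ~ P(. | s, a)."""
    return rng.choices(range(4), weights=P[s][a])[0]


P_hat = empirical_model(sample_next, 4, 2, 1000, random.Random(0))
Q_averse = risk_sensitive_qvi(P_hat, R, beta=1.0, gamma=0.5)
Q_seeking = risk_sensitive_qvi(P_hat, R, beta=-1.0, gamma=0.5)
print(Q_averse[0], Q_seeking[0])
```

In this toy MDP the safe action and the gamble have equal risk-neutral value (0.5 each with $\gamma = 0.5$). The sign of $\beta$ alone therefore decides which action the backup prefers.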

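2505.23747 2026-05-20 cs.CV cs.AI cs.LG Updated version

Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence

Diankun Wu, Fangfu Liu, Yi-Hsin Hung, Yueqi Duan

Affiliations: Tsinghua University

AI summary: This paper proposes Spatial-MLLM, a framework for visual spatial reasoning from purely 2D observations. It improves spatial understanding through a dual-encoder architecture and a space-aware frame sampling strategy, and experiments show state-of-the-art performance on a range of visual spatial tasks.

Comments 22 pages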

Abstract

Recent advancements in Multimodal Large Language Models (MLLMs) have significantly enhanced performance on 2D visual tasks. However, improving their spatial intelligence remains a challenge. Existing 3D MLLMs rely on additional 3D or 2.5D data to incorporate spatial awareness, restricting their utility in scenarios with only 2D inputs, such as images or videos. In this paper, we present Spatial-MLLM, a novel framework for visual-based spatial reasoning from purely 2D observations. Unlike conventional video MLLMs, which rely on CLIP-based visual encoders optimized for semantic understanding, our key insight is to unleash the strong structural prior of a feed-forward visual geometry foundation model. Specifically, we propose a dual-encoder architecture: a pretrained 2D visual encoder that extracts semantic features, and a 3D spatial encoder, initialized from the backbone of the visual geometry model, that extracts 3D structure features. A connector then integrates both features into unified visual tokens for enhanced spatial understanding. Furthermore, we propose a space-aware frame sampling strategy at inference time, which selects the spatially informative frames of a video sequence, ensuring that even under limited token length, the model focuses on frames critical for spatial reasoning. Beyond architecture improvements, we construct a training dataset from multiple sources and train the model on it using supervised fine-tuning and GRPO. Extensive experiments on various real-world datasets demonstrate that Spatial-MLLM achieves state-of-the-art performance in a wide range of visual-based spatial understanding and reasoning tasks. Project page: https://diankun-wu.github.io/Spatial-MLLM/.
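
The abstract says only that the sampler selects spatially informative frames under a limited token budget. One plausible reading, assumed here and not taken from the paper, is greedy maximum coverage. Each frame is represented by the set of scene voxels it observes, and the sampler repeatedly picks the frame that adds the most unseen voxels. The voxel sets and the function name below are hypothetical.

```python
from typing import Dict, List, Set


def greedy_frame_selection(frame_voxels: Dict[int, Set[int]], budget: int) -> List[int]:
    """Greedy maximum coverage: repeatedly take the frame that observes the most
    voxels not yet covered (ties go to the earlier frame), up to `budget` frames."""
    covered: Set[int] = set()
    chosen: List[int] = []
    remaining = dict(frame_voxels)
    while remaining and len(chosen) < budget:
        best = min(remaining, key=lambda f: (-len(remaining[f] - covered), f))
        if not remaining[best] - covered:
            break  # no remaining frame adds new coverage
        chosen.append(best)
        covered |= remaining.pop(best)
    return sorted(chosen)  # restore temporal order before feeding the video model


# Hypothetical per-frame voxel sets, e.g. from back-projected depth maps.
frames = {0: {1, 2, 3}, 1: {1, 2, 3, 4}, 2: {5, 6}, 3: {4, 5}, 4: {7}}
print(greedy_frame_selection(frames, budget=3))  # [1, 2, 4]
```

Greedy selection is a standard choice for coverage objectives because it carries the classical $(1 - 1/e)$ approximation guarantee for maximum coverage. Frame 0 is never chosen here because frame 1 already sees everything it sees.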

2505.18191 2026-05-20 eess.SP cs.AI cs.LG cs.PF Updated version

Quantifying the Generalization Gap in Seizure Detection: A Large-Scale Empirical Benchmark via the SzCORE Challenge

Jonathan Dan, Amirhossein Shahbazinia, Christodoulos Kechris, David Atienza

Affiliations: Embedded Systems Laboratory, EPFL, Lausanne, Switzerland

AI summary: Through a large-scale empirical study based on the SzCORE challenge, this paper quantifies the generalization gap in seizure detection. It evaluates 28 state-of-the-art algorithmic architectures, reveals that current models perform inconsistently across patient populations, and argues for standardized evaluation.

Abstract

Reliable automatic seizure detection from long-term electroencephalography (EEG) remains an unsolved challenge, as current models often fail to generalize across patients or clinical settings. Manual EEG review is still the standard of care, highlighting the need for robust models and standardized evaluation. The current literature often reports high efficacy, yet these models frequently fail when deployed to unseen patient populations. To rigorously assess this generalization gap, we conducted a large-scale empirical study evaluating 28 state-of-the-art algorithmic architectures, ranging from classical feature engineering to modern deep learning. These algorithms were collected by organizing a competition. A strictly held-out private dataset of continuous EEG recordings from 65 subjects, totaling 4,360 hours of data, was used to evaluate algorithm performance. Expert neurophysiologists annotated these recordings, establishing the ground truth for seizure events. Algorithms were evaluated using event-based metrics from the SzCORE framework, including sensitivity, precision, F1-score, and false positive rate per day. Results revealed significant performance variability among state-of-the-art approaches, with a top F1 score of 32% (sensitivity 37%, precision 29%), highlighting the persistent difficulty of this task. Analysis uncovered a discordance between peak performance and population-level stability: the algorithms achieving the highest aggregate F1-scores did not achieve the most consistent ranking across subjects. This independent evaluation exposed a notable gap between self-reported efficacies and held-out performance, underscoring the critical need for standardized, rigorous benchmarking. The evaluation infrastructure is transitioning into a continuously open benchmarking platform, fostering reproducible research and accelerating the development of robust seizure detection algorithms.
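
As a sanity check on the reported figures, $F_1 = 2 \cdot 0.37 \cdot 0.29 / (0.37 + 0.29) \approx 0.325$, which is consistent with the 32% top score after rounding. The sketch below computes event-based metrics of this kind. It is a simplification that counts any overlap between a predicted and a reference event as a match. SzCORE defines its own matching rules, such as tolerances around events and merging of nearby detections, and this sketch does not reproduce them. The `pre`/`post` parameters are placeholders, and all names are hypothetical.

```python
from typing import Dict, List, Tuple

Event = Tuple[float, float]  # (onset, offset) in seconds


def overlaps(ref: Event, hyp: Event, pre: float = 0.0, post: float = 0.0) -> bool:
    """True if a predicted event overlaps a reference event widened by pre/post seconds."""
    return hyp[0] < ref[1] + post and hyp[1] > ref[0] - pre


def f1(sensitivity: float, precision: float) -> float:
    if sensitivity + precision == 0:
        return 0.0
    return 2 * sensitivity * precision / (sensitivity + precision)


def event_scores(ref: List[Event], hyp: List[Event], hours: float,
                 pre: float = 0.0, post: float = 0.0) -> Dict[str, float]:
    # A reference seizure counts as detected if any prediction overlaps it.
    detected = sum(1 for r in ref if any(overlaps(r, h, pre, post) for h in hyp))
    # A prediction is a false positive if it overlaps no reference seizure.
    false_pos = sum(1 for h in hyp if not any(overlaps(r, h, pre, post) for r in ref))
    sensitivity = detected / len(ref) if ref else 0.0
    precision = (len(hyp) - false_pos) / len(hyp) if hyp else 0.0
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1(sensitivity, precision),
        "fp_per_day": false_pos / (hours / 24.0),
    }


scores = event_scores(ref=[(100, 160), (1000, 1090)],
                      hyp=[(120, 140), (5000, 5010), (7000, 7020)], hours=24.0)
print(scores)
print(round(f1(0.37, 0.29), 3))  # 0.325
```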

2504.17548 2026-05-20 quant-ph cs.CR cs.LG Updated version

Quantum Autoencoder for Multivariate Time Series Anomaly Detection

Kilian Tscharke, Maximilian Wendlinger, Afrae Ahouzi, Pallavi Bhardwaj, Kaweh Amoi-Taleghani, Michael Schrödl-Baumann, Pascal Debus

Affiliations: Fraunhofer Institute for Applied and Integrated Security (AISEC); SAP SE

AI summary: This paper proposes a quantum-autoencoder-based framework designed for enterprise-scale multivariate time series anomaly detection and demonstrates its competitiveness in data compression and anomaly detection.

Comments Submitted to IEEE International Conference on Quantum Computing and Engineering (QCE) 2025

DOI: 10.1109/QCE65121.2025.00268
Journal ref: 2025 IEEE International Conference on Quantum Computing and Engineering (QCE), Albuquerque, NM, USA, 2025, pp. 2470-2481

Abstract

Anomaly Detection (AD) is the task of identifying observations or events that deviate from typical, or normal, patterns. It is a critical capability in IT security for recognizing incidents such as system misconfigurations, malware infections, or cyberattacks. In enterprise environments like SAP HANA Cloud systems, this task often involves monitoring high-dimensional, multivariate time series (MTS) derived from telemetry and log data. With the advent of quantum machine learning offering efficient calculations in high-dimensional latent spaces, new avenues open up for handling such complex data. One approach is the Quantum Autoencoder (QAE), an emerging and promising method with potential for application in both data compression and AD. However, prior applications of QAEs to time series AD have been restricted to univariate data, limiting their relevance for real-world enterprise systems. In this work, we introduce a novel QAE-based framework designed specifically for MTS AD at enterprise scale. We theoretically develop and experimentally validate the architecture, demonstrating that our QAE achieves performance competitive with neural-network-based autoencoders while requiring fewer trainable parameters. We evaluate our model on datasets that closely reflect SAP system telemetry and show that the proposed QAE is a viable and efficient alternative for semi-supervised AD in real-world enterprise settings.

2504.00470 2026-05-20 cs.LG cs.CV Updated version

Less is More: Efficient Black-box Attribution via Minimal Interpretable Subset Selection

Ruoyu Chen, Siyuan Liang, Jingzhi Li, Shiming Liu, Li Liu, Hua Zhang, Xiaochun Cao

Affiliations: Institute of Information Engineering, Chinese Academy of Sciences; University of Chinese Academy of Sciences; College of Computing and Data Science, Nanyang Technological University; School of Artificial Intelligence, University of Science and Technology Beijing; Department of Mechanical Engineering, Imperial College London; Center for Machine Vision and Signal Analysis (CMVS), University of Oulu; School of Cyber Science and Technology, Shenzhen Campus of Sun Yat-sen University

AI summary: This paper proposes LiMA, an efficient black-box attribution method that reformulates the attribution of important regions as a submodular subset selection problem. It provides more faithful explanations with fewer regions and shows significant improvements across multiple models.

Abstract

Developing trustworthy AI systems requires identifying the input regions that most influence a model's decisions. The primary task of existing attribution methods is to efficiently and accurately identify the relationships among input-prediction interactions. Particularly when the input data is discrete, such as images, analyzing the relationship between inputs and outputs poses a significant challenge due to combinatorial explosion. In this paper, we propose a novel and efficient black-box attribution mechanism, LiMA (Less input is More faithful for Attribution), which reformulates the attribution of important regions as an optimization problem for submodular subset selection. First, to accurately assess interactions, we design a submodular function that quantifies subset importance and effectively captures its impact on decision outcomes. Then, to efficiently rank input sub-regions by importance, we improve optimization efficiency through a novel bidirectional greedy search algorithm. LiMA identifies both the most and least important samples while ensuring an optimal attribution boundary that minimizes errors. Extensive experiments on eight foundation models demonstrate that our method provides faithful interpretations with fewer regions and exhibits strong generalization, with average improvements of 36.3% in Insertion and 39.6% in Deletion. Our method also outperforms naive greedy search in attribution efficiency, running 1.6 times faster. Furthermore, when explaining the reasons behind model prediction errors, the highest confidence achieved by our method is, on average, 86.1% higher than that of state-of-the-art attribution algorithms. The code is available at https://github.com/RuoyuChen10/LIMA.
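
The abstract does not detail the bidirectional greedy search, so the sketch below is our assumed reading of it. Each round adds the region with the largest marginal gain to the head of the ranking and sends the region with the smallest marginal gain to the tail. The most and least important regions are thus found together. A set-coverage function stands in for LiMA's submodular scoring function, which is not reproduced here. The names `bidirectional_greedy_rank`, `COVERS`, and `coverage` are hypothetical.

```python
from typing import Callable, FrozenSet, List


def bidirectional_greedy_rank(regions: List[int],
                              f: Callable[[FrozenSet[int]], float]) -> List[int]:
    """Rank regions from most to least important under a set function f.

    Each round appends the region with the largest marginal gain over the current
    head to the head, and the region with the smallest gain to the tail.
    """
    head: List[int] = []
    tail: List[int] = []
    remaining = list(regions)
    while remaining:
        base = frozenset(head)
        f_base = f(base)
        gains = {r: f(base | {r}) - f_base for r in remaining}
        best = max(remaining, key=lambda r: (gains[r], -r))  # ties -> lower index
        head.append(best)
        remaining.remove(best)
        if remaining:
            worst = min(remaining, key=lambda r: (gains[r], r))
            tail.append(worst)
            remaining.remove(worst)
    return head + tail[::-1]


# Toy stand-in: each region "explains" some evidence units; f counts units covered.
COVERS = {0: {"a", "b", "c"}, 1: {"a", "b"}, 2: {"d"}, 3: set(), 4: {"c", "d", "e"}}


def coverage(subset: FrozenSet[int]) -> float:
    return float(len(set().union(*(COVERS[r] for r in subset))))


print(bidirectional_greedy_rank([0, 1, 2, 3, 4], coverage))  # [0, 4, 2, 1, 3]
```

Each round fixes two positions of the ranking from one batch of marginal-gain evaluations. The abstract does not say whether this is the source of LiMA's reported speed-up over naive greedy search.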

2503.17581 2026-05-20 math.OC cs.LG Updated version

Time-optimal neural feedback control of nilpotent systems as a binary classification problem

Sara Bicego, Samuel Gue, Dante Kalise, Nelly Villamizar

Affiliations: Department of Mathematics, Imperial College London, United Kingdom; Department of Mathematics, Swansea University, United Kingdom

AI summary: This paper proposes a computational method for synthesizing time-optimal feedback control laws for linear nilpotent systems. It builds time-optimal deep neural network controllers by recasting the problem as a binary classification problem.
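
The double integrator shows why time-optimal feedback can become a classification problem. Its dynamics are $\dot x_1 = x_2$, $\dot x_2 = u$ with $|u| \le 1$, and its system matrix $\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ is nilpotent. Its classical time-optimal feedback is bang-bang, $u = -\operatorname{sign}(x_1 + x_2|x_2|/2)$. Each state therefore carries one of two labels, and the switching curve is the decision boundary a classifier must learn. This is a textbook example, not the paper's method. The function names are hypothetical.

```python
import math
import random
from typing import List, Tuple


def switching_function(x1: float, x2: float) -> float:
    """s(x) = x1 + x2|x2|/2; s = 0 is the switching curve of the double integrator."""
    return x1 + 0.5 * x2 * abs(x2)


def time_optimal_control(x1: float, x2: float) -> float:
    """Bang-bang feedback for x1' = x2, x2' = u, |u| <= 1, steering to the origin."""
    s = switching_function(x1, x2)
    if s > 0:
        return -1.0
    if s < 0:
        return 1.0
    return -math.copysign(1.0, x2) if x2 != 0 else 0.0  # on the curve: ride it in


def labelled_dataset(n: int, rng: random.Random) -> List[Tuple[Tuple[float, float], int]]:
    """States with binary labels (1 for u = +1, 0 for u = -1): the classification view."""
    data = []
    for _ in range(n):
        x = (rng.uniform(-2, 2), rng.uniform(-2, 2))
        data.append((x, int(time_optimal_control(*x) > 0)))
    return data


def time_to_origin(x1: float, x2: float, dt: float = 1e-3,
                   tol: float = 1e-2, t_max: float = 10.0) -> float:
    """Forward-Euler closed-loop simulation; returns the time to enter the tol-ball."""
    t = 0.0
    while math.hypot(x1, x2) > tol and t < t_max:
        u = time_optimal_control(x1, x2)
        x1, x2 = x1 + dt * x2, x2 + dt * u
        t += dt
    return t


# From rest at x1 = 1 the minimum time is 2 * sqrt(1) = 2.
print(time_to_origin(1.0, 0.0))
```

A neural classifier trained on data like `labelled_dataset` would learn the switching curve as its decision boundary. For higher-order nilpotent systems the switching surfaces become considerably more complicated to write down in closed form.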

2503.13868 2026-05-20 cs.LG cs.AI Updated version

Out-of-Distribution Generalization in Time Series: A Survey

Xin Wu, Fei Teng, Xingwang Li, Ji Zhang, Tianrui Li, Qiang Duan

Affiliations: School of Computing and Artificial Intelligence, Southwest Jiaotong University; Engineering Research Center of Sustainable Urban Intelligent Transportation, Ministry of Education; Information Sciences and Technology Department, the Pennsylvania State University

AI summary: This paper surveys out-of-distribution generalization methods for time series along three dimensions: data distribution, representation learning, and OOD evaluation. It summarizes mainstream algorithms, identifies application scenarios and open challenges, and proposes future research directions.

Comments Work in Progress

DOI: 10.1016/j.inffus.2026.104336
Journal ref: Information Fusion 133, 104336 (2026)

Abstract

Time series frequently manifest distribution shifts, diverse latent features, and non-stationary learning dynamics, particularly in open and evolving environments. These characteristics pose significant challenges for out-of-distribution (OOD) generalization. While substantial progress has been made, a systematic synthesis of advancements remains lacking. To address this gap, we present the first comprehensive review of OOD generalization methodologies for time series, organized to delineate the field's evolutionary trajectory and contemporary research landscape. We organize our analysis across three foundational dimensions: data distribution, representation learning, and OOD evaluation. For each dimension, we present several popular algorithms in detail. Furthermore, we highlight key application scenarios, emphasizing their real-world impact. Finally, we identify persistent challenges and propose future research directions. A detailed summary of the reviewed methods for OOD generalization in time series is available at https://tsood-generalization.com.

2503.12172 2026-05-20 cs.LG cs.CR cs.CV 版本更新

SEAL: Semantic Aware Image Watermarking

SEAL：语义感知图像水印

Kasra Arabi, R. Teal Witter, Chinmay Hegde, Niv Cohen

发表机构 * New York University（纽约大学）

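AI summary: This paper proposes a new watermarking method that embeds semantic information about the generated image directly into the watermark, making it distortion-free and verifiable without a database of key patterns. The key pattern is inferred from the image's semantic embedding via locality-sensitive hashing, and conditioning detection on the original image content improves robustness against forgery attacks.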

AI中文摘要

生成模型已迅速发展到能够生成逼真的输出。然而，其合成输出越来越模糊自然内容与AI生成内容之间的界限，因此需要稳健的水印技术。水印通常需要保持目标图像的完整性，抵御移除尝试，并防止被未经授权地复制到无关图像上。为满足这一需求，最近的方法利用初始噪声将持久水印嵌入扩散模型生成的图像中。然而，这些方法要么会扭曲生成图像的分布，要么在检测时需要搜索一个很长的已用密钥字典。在本文中，我们提出了一种新的水印方法，将生成图像的语义信息直接嵌入水印中，使水印无失真，且无需密钥模式数据库即可验证。相反，密钥模式可以利用局部敏感哈希从图像的语义嵌入中推断出来。此外，以原始图像内容为条件进行水印检测可以提高对伪造攻击的鲁棒性。为了证明这一点，我们考虑了两种长期被忽视的攻击策略：（i）攻击者提取初始噪声并生成具有相同模式的新图像；（ii）攻击者在水印图像中插入无关（可能有害）的对象，并可能保留水印。我们通过实验验证了我们的方法对这些攻击具有更强的鲁棒性。总的来说，我们的结果表明，内容感知的水印可以缓解图像生成模型带来的风险。

英文摘要

Generative models have rapidly evolved to generate realistic outputs. However, their synthetic outputs increasingly challenge the clear distinction between natural and AI-generated content, necessitating robust watermarking techniques. Watermarks are typically expected to preserve the integrity of the target image, withstand removal attempts, and prevent unauthorized replication onto unrelated images. To address this need, recent methods embed persistent watermarks into images produced by diffusion models using the initial noise. Yet, to do so, they either distort the distribution of generated images or rely on searching through a long dictionary of used keys for detection. In this paper, we propose a novel watermarking method that embeds semantic information about the generated image directly into the watermark, enabling a distortion-free watermark that can be verified without requiring a database of key patterns. Instead, the key pattern can be inferred from the semantic embedding of the image using locality-sensitive hashing. Furthermore, conditioning the watermark detection on the original image content improves robustness against forgery attacks. To demonstrate that, we consider two largely overlooked attack strategies: (i) an attacker extracting the initial noise and generating a novel image with the same pattern; (ii) an attacker inserting an unrelated (potentially harmful) object into a watermarked image, possibly while preserving the watermark. We empirically validate our method's increased robustness to these attacks. Taken together, our results suggest that content-aware watermarks can mitigate risks arising from image-generative models.
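
SEAL's central step, inferring the key pattern from the image's semantic embedding with locality-sensitive hashing, can be illustrated with random-hyperplane LSH (SimHash). This is a minimal sketch under assumptions: the hash family, the 16-bit key length, the fixed seed and the helper names are illustrative choices, not SEAL's implementation, and the step that turns key bits into an initial-noise pattern is omitted. Embeddings pointing in similar directions tend to share most key bits, while an embedding pointing the opposite way flips every bit.

```python
import random


def simhash_key(embedding, n_bits=16, seed=0):
    """Random-hyperplane LSH (SimHash): one key bit per hyperplane.

    The hyperplanes come from a fixed seed, so whoever embeds the watermark
    and whoever verifies it derive the same key from the same embedding.
    """
    rng = random.Random(seed)
    bits = []
    for _ in range(n_bits):
        # Draw a random Gaussian hyperplane through the origin.
        plane = [rng.gauss(0.0, 1.0) for _ in embedding]
        dot = sum(p * e for p, e in zip(plane, embedding))
        # The bit records which side of the hyperplane the embedding lies on.
        bits.append(1 if dot >= 0 else 0)
    return bits


def hamming(a, b):
    """Number of positions at which two keys differ."""
    return sum(x != y for x, y in zip(a, b))


emb = [0.3, -1.2, 0.8, 0.05, 2.0, -0.4, 0.9, -0.7]
nearby = [e + 0.01 for e in emb]  # small semantic perturbation
opposite = [-e for e in emb]      # maximally dissimilar direction
key = simhash_key(emb)
print(hamming(key, simhash_key(nearby)), hamming(key, simhash_key(opposite)))
```

Because the same seed regenerates the same hyperplanes, detection needs only the embedding of the image under test, not a stored list of keys.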

2502.04575 2026-05-20 stat.ML cs.LG cs.NA math.NA physics.comp-ph stat.CO 版本更新

Complexity Analysis of Normalizing Constant Estimation: from Jarzynski Equality to Annealed Importance Sampling and beyond

归一化常数估计的复杂性分析：从Jarzynski等式到退火重要性采样及其进一步发展

Wei Guo, Molei Tao, Yongxin Chen

发表机构 * Georgia Institute of Technology（佐治亚理工学院）

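AI summary: This paper studies normalizing-constant estimation, develops a non-asymptotic analysis that yields the complexity of estimating the normalizing constant with annealed importance sampling, and proposes a new algorithm for handling multimodal problems.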

Comments Accepted at ICLR 2026 (https://openreview.net/forum?id=96fJALwotm)

AI中文摘要

给定一个未归一化的概率密度π∝e^{-V}，估计其归一化常数Z=∫_{R^d}e^{-V(x)}dx或自由能F=-log Z是贝叶斯统计、统计力学和机器学习中的关键问题。在高维或π为多模态时，这一问题尤其具有挑战性。为了减轻传统重要性采样估计器的高方差，人们通常采用基于退火的方法，如Jarzynski等式和退火重要性采样，但其定量复杂度保证仍很少被探索。我们朝着退火重要性采样的非渐近分析迈出第一步。特别地，我们推导出以高概率将Z估计到ε相对误差以内所需的oracle复杂度为Õ(dβ²𝒜²/ε⁴)，其中β是V的光滑度，𝒜表示在π与一个可处理参考分布之间插值的概率测度曲线的作用量（action）。我们的分析利用Girsanov定理和最优传输，不显式要求目标分布满足等周假设。最后，为了应对广泛使用的几何插值作用量过大的问题，我们提出了一种基于反向扩散采样器的新算法，建立了分析其复杂度的框架，并通过实验证明其在处理多模态问题时的效率。

英文摘要

Given an unnormalized probability density $\pi\propto\mathrm{e}^{-V}$, estimating its normalizing constant $Z=\int_{\mathbb{R}^d}\mathrm{e}^{-V(x)}\mathrm{d}x$ or free energy $F=-\log Z$ is a crucial problem in Bayesian statistics, statistical mechanics, and machine learning. It is challenging especially in high dimensions or when $\pi$ is multimodal. To mitigate the high variance of conventional importance sampling estimators, annealing-based methods such as Jarzynski equality and annealed importance sampling are commonly adopted, yet their quantitative complexity guarantees remain largely unexplored. We take a first step toward a non-asymptotic analysis of annealed importance sampling. In particular, we derive an oracle complexity of $\widetilde{O}\left(\frac{d\beta^2{\mathcal{A}}^2}{\varepsilon^4}\right)$ for estimating $Z$ within $\varepsilon$ relative error with high probability, where $\beta$ is the smoothness of $V$ and $\mathcal{A}$ denotes the action of a curve of probability measures interpolating $\pi$ and a tractable reference distribution. Our analysis, leveraging Girsanov's theorem and optimal transport, does not explicitly require isoperimetric assumptions on the target distribution. Finally, to tackle the large action of the widely used geometric interpolation, we propose a new algorithm based on reverse diffusion samplers, establish a framework for analyzing its complexity, and empirically demonstrate its efficiency in tackling multimodality.
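
For readers new to the estimator being analyzed, the sketch below runs plain annealed importance sampling along the geometric interpolation on a one-dimensional toy problem where $Z$ is known. It is a minimal illustration under stated assumptions (a Gaussian reference, a linear schedule, random-walk Metropolis moves, and hypothetical helper names); it is neither the paper's reverse-diffusion algorithm nor its complexity analysis.

```python
import math
import random


def ais_log_z(v_target, v_ref, log_z_ref, sample_ref,
              n_particles=500, n_steps=100, mh_steps=2, step=1.5, seed=0):
    """Annealed importance sampling (AIS) estimate of log Z, Z = integral of exp(-v_target).

    Path: pi_b(x) proportional to exp(-(1 - b) * v_ref(x) - b * v_target(x)),
    with a linear schedule b_k = k / n_steps (the geometric interpolation).
    The reference exp(-v_ref) must have a known log-normalizer and an exact sampler.
    """
    rng = random.Random(seed)
    betas = [k / n_steps for k in range(n_steps + 1)]

    def energy(x, b):
        return (1.0 - b) * v_ref(x) + b * v_target(x)

    log_weights = []
    for _ in range(n_particles):
        x = sample_ref(rng)  # exact draw from the reference distribution
        lw = 0.0
        for k in range(1, n_steps + 1):
            # Incremental log-weight log f_k(x) - log f_{k-1}(x), taken before moving x.
            lw -= (betas[k] - betas[k - 1]) * (v_target(x) - v_ref(x))
            # Random-walk Metropolis moves that leave pi_{b_k} invariant.
            for _ in range(mh_steps):
                y = x + step * rng.gauss(0.0, 1.0)
                log_accept = energy(x, betas[k]) - energy(y, betas[k])
                if log_accept >= 0 or rng.random() < math.exp(log_accept):
                    x = y
        log_weights.append(lw)
    # Numerically stable log-mean-exp of the importance weights.
    m = max(log_weights)
    mean_w = sum(math.exp(w - m) for w in log_weights) / n_particles
    return log_z_ref + m + math.log(mean_w)


# Toy check: target N(0, sigma^2) up to a constant, reference N(0, 1).
sigma = 2.0
estimate = ais_log_z(
    v_target=lambda x: x * x / (2 * sigma ** 2),
    v_ref=lambda x: x * x / 2,
    log_z_ref=0.5 * math.log(2 * math.pi),
    sample_ref=lambda rng: rng.gauss(0.0, 1.0),
)
exact = 0.5 * math.log(2 * math.pi * sigma ** 2)
print(f"AIS log Z = {estimate:.3f}, exact = {exact:.3f}")
```

Plain importance sampling from $N(0,1)$ toward this wider target has infinite-variance weights; splitting the move into many small annealing steps is what keeps the AIS weights well behaved here.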

2411.08982 2026-05-20 cs.LG cs.DC 版本更新

Lynx: Enabling Efficient MoE Inference through Dynamic Batch-Aware Expert Selection

Lynx：通过动态批量感知专家选择实现高效的MoE推理

Vima Gupta, Jae Hyung Ju, Kartik Sinha, Ada Gavrilovska, Anand Padmanabha Iyer

发表机构 * Georgia Institute of Technology（佐治亚理工学院）

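AI summary: This paper presents Lynx, a system that exploits a property of MoE training (the batch-level effect of load-balancing losses) to reduce the total number of experts invoked, enabling efficient, workload-agnostic MoE inference that raises throughput while keeping accuracy loss low.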

AI中文摘要

混合专家（MoE）模型提供的选择性参数激活使其成为现代基础模型的流行选择。然而，在用于推理服务时，MoE面临一个根本性的矛盾：批处理对服务性能至关重要，但它会迫使所有专家都被激活，从而抵消了MoE的优势并加剧了内存带宽瓶颈。现有的高效MoE推理方法即使经过大量针对特定工作负载的调优，也无法解决这一矛盾。我们提出了Lynx，一个以与工作负载无关的方式实现高效MoE推理的系统。Lynx利用了MoE训练的一个关键特性：负载均衡损失会引入批次级别的专家激活偏斜和冗余；Lynx借助一种新的AffinityBinning技术，在每个批次内重新映射低亲和力的token到专家分配，从而减少被调用的专家总数。我们在九个基准测试上对四个最先进的模型家族进行了评估，结果显示Lynx在各任务上精度损失低于1个百分点的同时，吞吐量最高提升1.30倍。此外，Lynx与现有技术互补，可将其性能进一步提升最高1.38倍。

英文摘要

Selective parameter activation provided by Mixture-of-Experts (MoE) models has made them a popular choice in modern foundation models. However, MoEs face a fundamental tension when employed for serving. Batching, critical for performance in serving, forces the activation of all experts, thereby negating MoEs' benefits and exacerbating memory bandwidth bottlenecks. Existing work on efficient MoE inference is unable to resolve this tension even with extensive workload-specific tuning. We present LYNX, a system that enables efficient MoE inference in a workload-agnostic fashion. LYNX leverages a key property of MoE training: load-balancing losses introduce batch-level expert activation skews and redundancy, which it exploits by remapping low-affinity token-to-expert assignments within each batch using a novel AffinityBinning technique that reduces the total experts invoked. Our evaluation of LYNX on four state-of-the-art model families across nine benchmarks shows that it achieves up to 1.30x improvement in throughput while keeping accuracy loss below 1 percentage point across tasks. Further, LYNX is complementary to existing techniques, additionally boosting their performance by up to 1.38x.
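
The abstract does not spell out how AffinityBinning works, so the following is only a hypothetical illustration of the general idea of batch-aware remapping: low-affinity slots in a token's top-k routing are moved onto experts that the batch already activates with high affinity, shrinking the set of distinct experts that must be loaded. The threshold rule and the function name are assumptions rather than LYNX's algorithm, and gate-weight renormalization is omitted.

```python
def remap_low_affinity(router_probs, k, threshold):
    """Hypothetical batch-aware remapping of top-k token-to-expert assignments.

    router_probs: one list of per-expert router probabilities per token.
    Returns, per token, the k expert ids it will actually use.
    """
    # Step 1: ordinary top-k routing for every token.
    topk = [sorted(range(len(p)), key=lambda e: -p[e])[:k] for p in router_probs]
    # Step 2: experts that some token selects with high affinity stay active.
    active = sorted({e for p, chosen in zip(router_probs, topk)
                     for e in chosen if p[e] >= threshold})
    remapped = []
    for p, chosen in zip(router_probs, topk):
        new = []
        for e in chosen:
            if p[e] >= threshold or e in active:
                new.append(e)  # high affinity, or the expert is loaded anyway
                continue
            # Step 3: move a low-affinity slot to the best already-active expert
            # this token is not yet using; keep the original if none is left.
            alternatives = [a for a in active if a not in chosen and a not in new]
            new.append(max(alternatives, key=lambda a: p[a]) if alternatives else e)
        remapped.append(new)
    return remapped


probs = [
    [0.70, 0.20, 0.05, 0.05],  # token 0 strongly prefers expert 0
    [0.10, 0.65, 0.20, 0.05],  # token 1 strongly prefers expert 1
    [0.60, 0.05, 0.05, 0.30],  # token 2 has a weak second choice, expert 3
]
remapped = remap_low_affinity(probs, k=2, threshold=0.5)
print(remapped)  # -> [[0, 1], [1, 0], [0, 1]]
```

Plain top-k routing would touch experts {0, 1, 2, 3} for this batch; after remapping only experts 0 and 1 are invoked, which is the kind of batch-level saving the abstract describes.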

2410.15362 2026-05-20 cs.LG cs.AI cs.CL cs.CR 版本更新

Faster-GCG: Efficient Discrete Optimization Jailbreak Attacks against Aligned Large Language Models

Faster-GCG：面向对齐大语言模型的高效离散优化越狱攻击

Xiao Li, Wei Zhang, Zhuhong Li, Qiongxiu Li, Shei Pern Chua, BingZe Lee, Jinghao Cui, Yifan Huang, Xiaolin Hu

发表机构 * Tsinghua University（清华大学）； Sea-Fill ； Duke University（杜克大学）； Aalborg University（奥尔堡大学）； Chinese Institute for Brain Research (CIBR)（中国脑科学研究院）

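AI summary: This paper proposes Faster-GCG, which makes jailbreak attacks on aligned large language models more efficient through improved gradient-based estimation, more efficient sampling, and avoidance of repeated evaluations, achieving an 8× gain in sample efficiency and a 7× reduction in time, with higher jailbreak success rates across multiple models.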

Comments 18 pages, new version

AI中文摘要

对齐大语言模型（LLMs）的安全性受到广泛关注，尤其是在越狱攻击的背景下，这类攻击试图通过对抗性提示绕过安全防护（guardrails）。在现有方法中，贪心坐标梯度（GCG）攻击通过离散token优化开创了自动化越狱，但其较低的样本效率限制了实际应用。特别是，由于底层离散优化问题的固有难度，GCG对每种有害行为需要约256,000次评估才能达到令人满意的越狱成功率。在本工作中，我们识别了限制GCG样本效率的三个关键因素：不准确的基于梯度的估计、低效的均匀采样以及对先前已探索后缀的重复评估。为了解决这些问题，我们提出了Faster-GCG，一种精简的GCG变体，它结合了基于距离的正则化以改进估计、温度控制的采样以实现更有效的探索，以及已访问后缀标记机制以避免冗余评估。Faster-GCG将所需的评估次数减少到32,000次，与GCG相比样本效率最高提升8倍，实际运行时间减少7倍。在这一缩减的预算下，Faster-GCG在五个对齐LLMs上平均达到了78.1%的越狱成功率，并在Qwen3.5-4B上达到了88.7%，优于最先进的白盒越狱方法。

英文摘要

Aligned Large Language Models (LLMs) have attracted significant attention for their safety, particularly in the context of jailbreak attacks that attempt to bypass guardrails via adversarial prompts. Among existing approaches, the Greedy Coordinate Gradient (GCG) attack pioneered automated jailbreaks through discrete token optimization; however, its low sample efficiency limits practical applicability. In particular, GCG requires approximately 256K evaluations per harmful behavior to achieve a satisfactory jailbreak success rate, due to the inherent difficulty of the underlying discrete optimization problem. In this work, we identify three key factors that limit the sample efficiency of GCG: inaccurate gradient-based estimation, inefficient uniform sampling, and repeated evaluation of previously explored suffixes. To address these issues, we propose Faster-GCG, a streamlined variant of GCG that incorporates distance-based regularization for improved estimation, temperature-controlled sampling for more effective exploration, and a visited-suffix marking mechanism to avoid redundant evaluations. Faster-GCG reduced the required evaluations to 32K, achieving up to an 8× improvement in sampling efficiency and a 7× reduction in wall-clock time compared to GCG. Under this reduced budget, Faster-GCG attained an average jailbreak success rate of 78.1% across five aligned LLMs, and achieved 88.7% against Qwen3.5-4B, outperforming state-of-the-art white-box jailbreak methods.

2408.12385 2026-05-20 cs.DS cs.LG 版本更新

Federated Learning with Nonvacuous Generalisation Bounds

联邦学习中的非空泛化界限

Pierre Jobic, Maxime Haddouche, Benjamin Guedj

发表机构 * Université Paris-Saclay CEA（巴黎-萨克雷大学CEA）； Inria, CNRS, Ecole Normale Supérieure, PSL Research University（法国国家信息与自动化研究所、法国国家科学研究中心、巴黎高等师范学院、巴黎文理研究大学）； University College London（伦敦大学学院）

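AI summary: This paper proposes a new strategy for training randomised predictors in federated learning, in which each node preserves its privacy by releasing a local predictor while keeping its training data secret from the other nodes. A global randomised predictor that inherits the properties of the local private predictors is then built in the sense of a PAC-Bayesian generalisation bound. Numerical experiments show predictive performance comparable to the batch approach while preserving privacy.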

AI中文摘要

我们介绍了一种新的策略来训练联邦学习中的随机预测器，其中每个网络节点旨在通过释放本地预测器来保护隐私，同时保持其训练数据对其他节点的保密性。然后我们构建了一个全局随机预测器，该预测器继承本地私有预测器的属性，基于PAC-Bayesian泛化界限。我们考虑了同步情况，其中所有节点共享相同的训练目标（来源于泛化界限），以及异构和同构情况，其中每个节点可能有自己的个性化训练目标。通过一系列数值实验，我们证明了我们的方法在预测性能上与批量方法相当，其中所有数据集都在节点之间共享。此外，预测器由数值非空泛化界限支持，同时为每个节点保持隐私。我们明确计算了我们两种联邦设置的预测性能和泛化界限的增量，突显了为保护隐私而付出的代价。

英文摘要

We introduce a novel strategy to train randomised predictors in federated learning, where each node of the network aims at preserving its privacy by releasing a local predictor while keeping its training dataset secret from the other nodes. We then build a global randomised predictor which inherits the properties of the local private predictors in the sense of a PAC-Bayesian generalisation bound. We consider the synchronous case where all nodes share the same training objective (derived from a generalisation bound), and the heterogeneous and homogeneous cases where each node may have its own personalised training objective. We show through a series of numerical experiments that our approach achieves a comparable predictive performance to that of the batch approach where all datasets are shared across nodes. Moreover, the predictors are supported by numerically nonvacuous generalisation bounds while preserving privacy for each node. We explicitly compute the increment on predictive performance and generalisation bounds for our two federated settings, highlighting the price to pay to preserve privacy.
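
To make "numerically nonvacuous" concrete: for a 0-1 loss, a PAC-Bayes bound is nonvacuous when the risk it certifies is below 1. The sketch below evaluates the standard PAC-Bayes-kl bound (Maurer's form of Seeger's bound) for a single learner by inverting the binary kl divergence with bisection. It is a generic illustration with hypothetical function names, not the federated training objective or the bounds derived in the paper.

```python
import math


def binary_kl(q, p):
    """kl(q || p) between Bernoulli(q) and Bernoulli(p), using 0 log 0 = 0."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # keep the logarithms finite
    out = 0.0
    if q > 0:
        out += q * math.log(q / p)
    if q < 1:
        out += (1 - q) * math.log((1 - q) / (1 - p))
    return out


def pac_bayes_kl_bound(emp_risk, kl_qp, n, delta=0.05, tol=1e-10):
    """Largest r with kl(emp_risk || r) <= (KL(Q||P) + ln(2 sqrt(n) / delta)) / n.

    emp_risk: empirical 0-1 risk of the randomised predictor Q on n samples.
    kl_qp: KL divergence between the posterior Q and the prior P.
    The resulting risk bound holds with probability at least 1 - delta (n >= 8).
    """
    rhs = (kl_qp + math.log(2 * math.sqrt(n) / delta)) / n
    lo, hi = emp_risk, 1.0
    # kl(q || r) increases in r for r >= q, so bisection finds the inverse.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binary_kl(emp_risk, mid) > rhs:
            hi = mid
        else:
            lo = mid
    return hi


bound = pac_bayes_kl_bound(emp_risk=0.05, kl_qp=20.0, n=10_000)
print(f"risk bound = {bound:.3f} (nonvacuous: {bound < 1})")
```

A larger KL term, for instance from a posterior that drifts far from a prior fixed before seeing the data, loosens the bound; this is one way privacy or communication constraints can show up as a price in the certified risk.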

2112.08507 2026-05-20 cs.LG stat.ML 版本更新

Algorithms for Adaptive Experiments that Trade-off Statistical Analysis with Reward: Combining Uniform Random Assignment and Reward Maximization

在统计分析与奖励之间权衡的自适应实验算法：结合均匀随机分配与奖励最大化

Tong Li, Jacob Nogas, Haochen Song, Anna Rafferty, Eric M. Schwartz, Audrey Durand, Harsh Kumar, Nina Deliu, Sofia S. Villar, Dehan Kong, Joseph J. Williams

发表机构 * University of Toronto（多伦多大学）； Carleton College（卡尔顿学院）； University of Michigan（密歇根大学）； University of Cambridge（剑桥大学）

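AI summary: This paper proposes TS-PostDiff, a statistically sensitive algorithm that combines uniform random assignment with reward maximization to trade off statistical analysis against user reward, improving the efficiency and accuracy of experiments.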

AI中文摘要

传统随机A/B实验以均匀随机（UR）概率分配臂，例如将用户按50/50分配到网站的两个版本，以发现哪个版本更能吸引用户。为了更快速、更自动地利用数据造福用户，人们提倡使用多臂老虎机算法，如汤普森采样（TS）。虽然TS具有可解释性，并保留了对统计推断至关重要的随机化，但它在检测臂均值差异时可能导致有偏估计，并增加假阳性和假阴性。我们引入了一种更具统计敏感性的算法TS-PostDiff（小差异后验概率），它通过一个额外的自适应步骤将TS与传统UR混合，其中使用UR（而非TS）的概率与臂之间差异较小的后验概率成正比。这使实验者能够定义何为小差异：低于该阈值时，传统UR实验可以以较低代价获得用于统计推断的有效数据；高于该阈值时，更多地使用TS以最大化用户利益是关键。我们将TS-PostDiff与UR、TS以及另外两个旨在改进统计推断的TS变体进行了比较。我们考察了常见的双臂实验在一系列受现实应用启发的设置下的结果。我们的结果揭示了TS-PostDiff或其他方法在何时以及为何能在用户利益（奖励）与统计推断（假阳性率和统计功效）之间提供更好的权衡。TS-PostDiff的自适应性有助于在差异较小时有效减少假阳性并提高统计功效，而在差异较大时更多地增加奖励。这项工作强调了未来开发在自适应实验中平衡奖励与统计分析的统计敏感算法时需要考虑的重要因素。

英文摘要

Traditional randomized A/B experiments assign arms with uniform random (UR) probability, such as 50/50 assignment to two versions of a website to discover whether one version engages users more. To more quickly and automatically use data to benefit users, multi-armed bandit algorithms such as Thompson Sampling (TS) have been advocated. While TS is interpretable and incorporates the randomization key to statistical inference, it can cause biased estimates and increase false positives and false negatives in detecting differences in arm means. We introduce a more Statistically Sensitive algorithm, TS-PostDiff (Posterior Probability of Small Difference), that mixes TS with traditional UR by using an additional adaptive step, where the probability of using UR (vs TS) is proportional to the posterior probability that the difference in arms is small. This allows an experimenter to define what counts as a small difference, below which a traditional UR experiment can obtain informative data for statistical inference at low cost, and above which using more TS to maximize user benefits is key. We evaluate TS-PostDiff against UR, TS, and two other TS variants designed to improve statistical inference. We consider results for the common two-armed experiment across a range of settings inspired by real-world applications. Our results provide insight into when and why TS-PostDiff or alternative approaches provide better tradeoffs between benefiting users (reward) and statistical inference (false positive rate and power). TS-PostDiff's adaptivity helps efficiently reduce false positives and increase statistical power when differences are small, while increasing reward more when differences are large. The work highlights important considerations for future Statistically Sensitive algorithm development that balances reward and statistical analysis in adaptive experimentation.
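
The TS-PostDiff assignment rule described above can be sketched for two Bernoulli arms with Beta posteriors. Assumptions, all illustrative: the posterior probability of a small difference is estimated by Monte Carlo, the proportionality constant between that probability and the chance of using UR is taken to be 1, and the function names are hypothetical rather than the authors' code.

```python
import random


def prob_small_diff(post1, post2, c, n_draws=4000, rng=random):
    """Monte Carlo estimate of P(|p1 - p2| < c) under independent Beta posteriors."""
    (a1, b1), (a2, b2) = post1, post2
    hits = sum(
        abs(rng.betavariate(a1, b1) - rng.betavariate(a2, b2)) < c
        for _ in range(n_draws)
    )
    return hits / n_draws


def ts_postdiff_assign(post1, post2, c, rng=random):
    """Return arm 0 or 1 for the next participant.

    With probability P(|p1 - p2| < c) the participant is assigned uniformly at
    random (UR); otherwise a single Thompson Sampling (TS) draw picks the arm.
    """
    if rng.random() < prob_small_diff(post1, post2, c, rng=rng):
        return rng.randrange(2)  # UR: 50/50 assignment
    # TS: sample a success rate for each arm and pick the larger one.
    return 0 if rng.betavariate(*post1) > rng.betavariate(*post2) else 1


rng = random.Random(0)
# Similar arms: the small-difference probability is relatively high, so UR is used often.
print(prob_small_diff((50, 50), (52, 48), c=0.1, rng=rng))
# Clearly different arms: almost always TS, which favours the better arm 0.
print(ts_postdiff_assign((90, 10), (10, 90), c=0.1, rng=rng))
```

The threshold `c` plays the role of the experimenter-defined "small difference" from the abstract: widening it makes the algorithm behave more like a traditional UR experiment.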

2105.00933 2026-05-20 cs.SD cs.AI cs.LG eess.AS 版本更新

Deep Neural Network for Musical Instrument Recognition using MFCCs

基于MFCCs的音乐乐器识别深度神经网络

Saranga Kingkor Mahanta, Abdullah Faiz Ur Rahman Khilji, Partha Pakray

发表机构 * Department of Electronics and Communication Engineering, National Institute of Technology, Silchar, Assam, India（印度阿萨姆邦锡尔杰尔国立技术学院电子与通信工程系）

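AI summary: This paper proposes a deep neural network based on MFCCs that classifies twenty different classes of musical instruments, achieving high recognition accuracy on the London Philharmonic Orchestra dataset.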

Journal ref: Computación y Sistemas, Vol 25, No 2 (2021): 25(2) 2021

AI中文摘要

高效自动音乐分类任务在AI应用于音乐领域中具有重要性，并构成了各种高级应用的基础。音乐乐器识别是通过音频来识别乐器的任务。这种音频也称为声音振动，被模型用来与乐器类别匹配。在本文中，我们使用了一个经过训练以对二十种不同类别的音乐乐器进行分类的人工神经网络（ANN）模型。这里我们仅使用音频数据的梅尔频率倒谱系数（MFCCs）。我们的模型在完整的伦敦爱乐乐团数据集上进行训练，该数据集包含属于四个家族（木管乐器、铜管乐器、打击乐器和弦乐器）的二十种乐器类别。基于实验结果，我们的模型在相同数据集上实现了最先进的准确性。

英文摘要

The task of efficient automatic music classification is of vital importance and forms the basis for various advanced applications of AI in the musical domain. Musical instrument recognition is the task of identifying an instrument from its audio. This audio, i.e., the sound vibrations, is used by the model to match against the instrument classes. In this paper, we use an artificial neural network (ANN) model trained to classify twenty different classes of musical instruments, using only the mel-frequency cepstral coefficients (MFCCs) of the audio data. Our proposed model is trained on the full London Philharmonic Orchestra dataset, which contains twenty classes of instruments belonging to four families, viz. woodwinds, brass, percussion, and strings. Based on experimental results, our model achieves state-of-the-art accuracy on this dataset.

1912.11333 2026-05-20 cs.SD cs.LG eess.AS 版本更新

Audio-based automatic mating success prediction of giant pandas

基于音频的大熊猫交配成功自动预测

WeiRan Yan, MaoLin Tang, Qijun Zhao, Peng Chen, Dunwu Qi, Rong Hou, Zhihe Zhang

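AI summary: This paper proposes an audio-based automatic method for predicting the mating success of giant pandas, which extracts acoustic features and classifies them with a deep neural network to support giant panda breeding research.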

Comments The manuscript needs further revision

DOI: 10.1016/j.gecco.2020.e01301

AI中文摘要

大熊猫通常被视为沉默的动物，但在繁殖季节会发出明显更多的声音，这表明声音对于协调其繁殖和表达交配偏好至关重要。先前的生物学研究也证明，大熊猫的声音与交配结果和繁殖相关。本文首次尝试开发一种基于大熊猫叫声自动预测其交配成功与否的方法。给定在繁殖接触期间录制的大熊猫交配音频序列，我们首先裁剪出含有大熊猫叫声的片段，并对其幅度和长度进行归一化。然后从音频片段中提取声学特征，并将这些特征输入深度神经网络，将交配分类为成功或失败。所提出的深度神经网络采用卷积层加双向门控循环单元来提取叫声特征，并应用注意力机制迫使网络关注最相关的特征。在过去九年收集的数据集上的评估实验取得了有希望的结果，证明了基于音频的自动交配成功预测方法在辅助大熊猫繁殖方面的潜力。

英文摘要

Giant pandas, stereotyped as silent animals, make significantly more vocal sounds during the breeding season, suggesting that sounds are essential for coordinating their reproduction and expressing mating preference. Previous biological studies have also shown that giant panda sounds are correlated with mating results and reproduction. This paper makes the first attempt to devise an automatic method for predicting the mating success of giant pandas based on their vocal sounds. Given an audio sequence of mating giant pandas recorded during breeding encounters, we first crop out the segments containing giant panda vocalizations and normalize their magnitude and length. We then extract acoustic features from the audio segments and feed them into a deep neural network, which classifies the mating as a success or failure. The proposed deep neural network employs convolution layers followed by bidirectional gated recurrent units to extract vocal features, and applies an attention mechanism to force the network to focus on the most relevant features. Evaluation experiments on a data set collected over the past nine years obtain promising results, demonstrating the potential of audio-based automatic mating success prediction methods in assisting giant panda reproduction.

2605.19018 2026-05-20 cs.LG 版本更新

LoRA vs. Full Fine-Tuning: A Theoretical Perspective

LoRA与全微调：一种理论视角

Ali Zindari, Rotem Mulayoff, Sebastian U. Stich

发表机构 * Universität des Saarlandes（萨尔大学）； CISPA Helmholtz Center for Information Security（CISPA亥姆霍兹信息安全中心）

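AI summary: This paper compares LoRA and full fine-tuning theoretically in linear regression, finding that in certain regimes LoRA attains lower excess risk than full fine-tuning in both overdetermined and underdetermined settings and that the choice of LoRA rank affects generalization; experiments suggest the findings apply beyond the linear setting.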

Comments Preprint

AI中文摘要

微调通过少量标注数据将预训练模型适配到下游任务。低秩适应（LoRA）是一种高效的微调方法，它在降低内存和计算成本的同时，通常能达到接近全微调的性能。尽管应用广泛，LoRA的理论行为尚未得到充分理解。本文在简单的线性回归设置中研究LoRA，并将其超额风险与全微调进行比较。我们的分析识别出在超定和欠定两种设置下，LoRA均能取得低于全微调的超额风险的情形。具体而言，我们的理论预测，当预训练任务与下游任务之间的差异实质上是低秩的时，LoRA可以超越全微调。我们进一步展示了LoRA秩的选择如何影响泛化性能，解释了为何在某些设置下使用极小的秩反而能提高测试准确率，尽管这限制了模型的表达能力。最后，我们通过在实际任务上的实验支持了理论结果，表明所识别的权衡和见解可推广到线性回归之外。

英文摘要

Fine-tuning adapts a pre-trained model to downstream tasks using a small amount of labeled data. Low-Rank Adaptation (LoRA) is an efficient fine-tuning method that reduces memory and computation costs while often achieving performance close to full fine-tuning. Despite its widespread use, the theoretical behavior of LoRA is not yet well understood. In this paper, we study LoRA in a simple linear regression setting and compare its excess risk with that of full fine-tuning. Our analysis identifies regimes in which LoRA achieves lower excess risk than full fine-tuning in both overdetermined and underdetermined settings. Specifically, our theory predicts that LoRA can outperform full fine-tuning when the difference between the pretraining and the downstream tasks is effectively low-rank. We further show how the choice of LoRA rank affects generalization performance, explaining why using a very small rank can improve test accuracy in certain settings, even though it limits model expressivity. Finally, we support our theoretical results with experiments on practical tasks, suggesting that the identified tradeoffs and insights extend beyond linear regression.
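
As background for the comparison, the LoRA parameterization itself fits in a few lines: the frozen weight $W_0$ is adapted as $W_0 + (\alpha/r) BA$ with $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$, so the update has rank at most $r$ and only $r(d+k)$ parameters are trained instead of $dk$. The sketch uses the $\alpha/r$ scaling and zero-initialized $B$ of the original LoRA formulation with hypothetical helper names; it illustrates the parameterization only, not the paper's linear-regression analysis.

```python
def matmul(a, b):
    """Plain-Python matrix product of nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w0, b, a, alpha=1.0):
    """Effective weight W0 + (alpha / r) * B @ A, where B is d x r and A is r x k."""
    r = len(a)
    delta = matmul(b, a)  # rank of this update is at most r
    return [[w + (alpha / r) * dw for w, dw in zip(w_row, d_row)]
            for w_row, d_row in zip(w0, delta)]


def trainable_params(d, k, r):
    """(full fine-tuning, rank-r LoRA) trainable parameter counts for one d x k layer."""
    return d * k, r * (d + k)


w0 = [[1.0, 2.0], [3.0, 4.0]]
# Zero-initialised B: training starts exactly at the pre-trained weight.
print(lora_weight(w0, b=[[0.0], [0.0]], a=[[0.5, -0.5]]))  # -> [[1.0, 2.0], [3.0, 4.0]]
print(trainable_params(4096, 4096, 8))  # -> (16777216, 65536)
```

The rank cap is also the source of the bias-variance tradeoff the abstract discusses: a small $r$ restricts which task differences can be represented, but also limits how much noise the adapter can fit.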

2605.19014 2026-05-20 cs.LG econ.EM stat.ML 版本更新

SAGA: A Sequence-Adaptive Generative Architecture for Multi-Horizon Probabilistic Forecasting with Adaptive Temporal Conformal Prediction

SAGA：面向多步概率预测、结合自适应时序共形预测的序列自适应生成架构

Gustav Olaf Yunus Laitinen-Fredriksson Lundström-Imanov, Hafize Gonca Cömert

发表机构 * Department of Economics, Stockholm University（斯德哥尔摩大学经济系）； Institute of Social Sciences, Faculty of Economics and Administrative Sciences, Süleyman Demirel University（苏莱曼·德米雷尔大学社会科学学院，经济学与行政科学学院）

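AI summary: This paper proposes SAGA, a decoder-only transformer for irregular tabular panel sequences paired with a split conformal calibration wrapper that provides individual-level prediction intervals with finite-sample marginal coverage guarantees. Trained on longitudinal data from the Swedish LISA register, SAGA forecasts annual labor earnings at horizons of 1 to 30 years and aggregates them by Monte Carlo into present-discounted lifetime earnings distributions. Compared with the canonical parametric process and tabular and recurrent baselines, it reduces the continuous ranked probability score by 31.9% at the 10-year horizon and the mean absolute error by 37.7% at the 20-year horizon. Conformal intervals achieve nominal coverage to within 0.4 percentage points marginally and within 2.4 percentage points on the worst-case demographic subgroup. The reconstructed lifetime earnings Gini coefficient is 0.327, against a partially observed truth of 0.341 and a GKOS estimate of 0.378. Model weights, calibration tables, and a synthetic equivalent dataset are released for replication outside the protected SCB MONA environment.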

Comments 14 pages, 3 figures, 12 tables, 5 appendices, 45 references. Submitted to IEEE TPAMI. Source code at https://github.com/olaflaitinen/saga (archived: doi:10.5281/zenodo.20260366). Synthetic equivalent dataset: doi:10.5281/zenodo.20260287. Empirical work conducted on the Swedish LISA register via SCB MONA (project SCB-MONA-2026-147); ethical approval Swedish Ethical Review Authority 2026-04127-01

AI中文摘要

财政部门和中央银行使用的微观模拟模型依赖刻画终生收入的参数过程，这些过程只捕捉条件分布的一阶和二阶矩，而忽略了长程非线性结构。我们提出SAGA，一种面向不规则表格面板序列的仅解码器Transformer，并结合分割共形校准包装器，提供具有有限样本边际覆盖保证的个体层面预测区间。该模型在1990年至2022年的纵向瑞典LISA登记数据上训练，数据包含2,143,817个个体和61,284,903个人年；模型预测未来1到30年的年度劳动收入，并通过蒙特卡洛方法将其汇总为贴现后的终生收入分布。与经典的Guvenen、Karahan、Ozkan和Song（GKOS）参数过程以及表格和循环基线相比，SAGA在10年预测跨度上将连续排名概率分数（CRPS）降低了31.9%，在20年预测跨度上将平均绝对误差降低了37.7%。共形区间的边际覆盖率与名义水平相差不超过0.4个百分点，在最差的人口子群体上相差不超过2.4个百分点。重建的终生收入基尼系数为0.327，而部分观测到的真实值为0.341，GKOS估计值为0.378。模型权重、校准表和一个合成等价数据集已公开发布，以便在受保护的SCB MONA环境之外进行复现。

英文摘要

Microsimulation models used by ministries of finance and central banks rely on parametric processes for lifetime earnings that capture only first and second moments of the conditional distribution and miss long-range nonlinear structure. We propose SAGA, a decoder-only transformer for irregular tabular panel sequences, paired with a split conformal calibration wrapper that delivers individual-level prediction intervals with finite-sample marginal coverage guarantees. Trained on the longitudinal Swedish LISA register over 1990 to 2022, comprising 2,143,817 individuals and 61,284,903 person-years, the model forecasts annual labor earnings at horizons of one to thirty years and aggregates them by Monte Carlo into present-discounted lifetime earnings distributions. Against the canonical Guvenen, Karahan, Ozkan, and Song parametric process and tabular and recurrent baselines, SAGA reduces continuous ranked probability score by 31.9 percent at the ten-year horizon and mean absolute error by 37.7 percent at the twenty-year horizon. Conformal intervals achieve nominal coverage to within 0.4 percentage points marginally and within 2.4 percentage points on the worst-case demographic subgroup. The reconstructed lifetime earnings Gini coefficient is 0.327 against the partially observed truth of 0.341 and the GKOS estimate of 0.378. Model weights, calibration tables, and a synthetic equivalent dataset are released for replication outside the protected SCB MONA environment.
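
The split conformal wrapper mentioned above follows a standard recipe: hold out a calibration set, score each calibration point by its absolute residual, and widen every new point prediction by the $\lceil (n+1)(1-\alpha) \rceil$-th smallest score, which guarantees marginal coverage of at least $1-\alpha$ under exchangeability. The sketch shows that generic recipe on synthetic data with hypothetical names; SAGA's own per-horizon calibration and subgroup reporting are not reproduced.

```python
import math
import random


def conformal_quantile(residuals, alpha):
    """Split-conformal threshold from held-out calibration residuals.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest absolute residual,
    or infinity when that rank exceeds n (too few calibration points).
    """
    scores = sorted(abs(r) for r in residuals)
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    return math.inf if rank > n else scores[rank - 1]


def conformal_interval(point_prediction, q):
    """Symmetric prediction interval around a point forecast."""
    return point_prediction - q, point_prediction + q


# Synthetic check: the "model" always predicts 0 and the truth is N(0, 1),
# so each residual (truth - prediction) is simply a standard normal draw.
rng = random.Random(1)
calibration_residuals = [rng.gauss(0.0, 1.0) for _ in range(1000)]
q = conformal_quantile(calibration_residuals, alpha=0.1)
lo, hi = conformal_interval(0.0, q)
test_truth = [rng.gauss(0.0, 1.0) for _ in range(5000)]
coverage = sum(lo <= y <= hi for y in test_truth) / len(test_truth)
print(f"q = {q:.3f}, empirical coverage = {coverage:.3f} (nominal 0.9)")
```

The guarantee is marginal, averaged over individuals; that is why the abstract reports worst-case subgroup coverage separately from the marginal figure.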

2605.19008 2026-05-20 cs.AI cs.CL cs.LG 版本更新

Learn-by-Wire Training Control Governance: Bounded Autonomous Training Under Stress for Stability and Efficiency

Learn-by-Wire训练控制治理：压力下的有界自主训练以实现稳定性与效率

Anis Radianis

发表机构 * Qluon Inc.（Qluon公司）

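AI summary: This paper proposes Learn-by-Wire Guard (LBW-Guard), a bounded autonomous training-control governance layer that applies bounded control on top of AdamW while preserving fixed training objectives, to improve the stability and efficiency of large language model training under stress.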

AI中文摘要

现代语言模型训练越来越容易遭遇不稳定、运行退化和计算浪费，尤其是在激进的学习率、规模和运行时压力条件下。本文介绍了Learn-by-Wire Guard（LBW-Guard），一种在AdamW之上运行的有界自主训练控制治理层。LBW-Guard并不替换优化器的更新规则，而是观察训练遥测数据，识别对不稳定性敏感的状态，并在保持固定训练目标的同时对优化器的执行施加有界控制。我们在以Qwen2.5为中心、基于WikiText-103的压力与鲁棒性测试套件中评估LBW-Guard：以Qwen2.5-7B为经验锚点，并与Qwen2.5-3B和Qwen2.5-14B进行模型规模比较，同时包括学习率压力测试、梯度裁剪基线以及不使用LoRA的TinyLlama-1B全参数合理性检查。在7B参考设置中，LBW-Guard将最终困惑度从13.21降低到10.74，改善18.7%，同时将端到端时间从392.54秒缩短到357.02秒，加速1.10倍。在更强的学习率压力下，AdamW在LR=3e-3时退化到最终困惑度1885.24，在LR=1e-3时为659.76，而LBW-Guard仍保持可训练，分别为11.57和10.33。梯度裁剪基线无法复现这一效果。这些结果支持一个限定范围的系统层面结论：对稳定性敏感的LLM训练可以受益于位于优化器之上的治理层。LBW-Guard提供的证据表明，有界的运行时控制可以在压力下保留有效计算，同时有别于替换优化器和局部梯度抑制。

英文摘要

Modern language-model training is increasingly exposed to instability, degraded runs, and wasted compute, especially under aggressive learning-rate, scale, and runtime-stress conditions. This paper introduces Learn-by-Wire Guard (LBW-Guard), a bounded autonomous training-control governance layer that operates above AdamW. Rather than replacing the optimizer update rule, LBW-Guard observes training telemetry, interprets instability-sensitive regimes, and applies bounded control to optimizer execution while preserving fixed training objectives. We evaluate LBW-Guard in a Qwen2.5-centered stress-and-robustness suite using WikiText-103, with Qwen2.5-7B as the empirical anchor, model-size comparisons against Qwen2.5-3B and Qwen2.5-14B, learning-rate stress tests, gradient-clipping baselines, and a no-LoRA TinyLlama-1B full-parameter sanity check. In the 7B reference setting, LBW-Guard reduces final perplexity from 13.21 to 10.74, an 18.7% improvement, while reducing end-to-end time from 392.54s to 357.02s, a 1.10x speedup. Under stronger learning-rate stress, AdamW degrades to 1885.24 final perplexity at LR=3e-3 and 659.76 at LR=1e-3, whereas LBW-Guard remains trainable at 11.57 and 10.33, respectively. Gradient-clipping baselines do not reproduce this effect. These results support a scoped systems conclusion that stability-sensitive LLM training can benefit from a governance plane above the optimizer. LBW-Guard provides evidence that bounded runtime control can preserve productive compute under stress while remaining distinct from optimizer replacement and local gradient suppression.
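
The abstract does not specify LBW-Guard's control policy, so the following is only a hypothetical sketch of what "bounded control over optimizer execution" can mean: a controller that watches the loss, shrinks a learning-rate multiplier on spikes relative to a moving average, lets it recover otherwise, and always clips it to a fixed range, so it can scale AdamW's step but never replaces the update rule. All thresholds and names are assumptions.

```python
def bounded_lr_multipliers(losses, ema_decay=0.9, spike_ratio=1.5,
                           lo=0.25, hi=1.0, down=0.5, up=1.1):
    """Hypothetical bounded controller: one LR multiplier per observed loss.

    A loss above spike_ratio * EMA(loss) is treated as an instability signal
    and cuts the multiplier by `down`; otherwise it recovers by `up`.
    Clipping to [lo, hi] keeps the control bounded.
    """
    ema, mult, history = None, hi, []
    for loss in losses:
        if ema is not None and loss > spike_ratio * ema:
            mult *= down
        else:
            mult *= up
        mult = min(hi, max(lo, mult))
        # Update the moving average after the decision, so a spike is judged
        # against the history that preceded it.
        ema = loss if ema is None else ema_decay * ema + (1 - ema_decay) * loss
        history.append(mult)
    return history


losses = [3.0, 2.9, 2.8, 9.0, 12.0, 2.7, 2.6, 2.6]
for loss, m in zip(losses, bounded_lr_multipliers(losses)):
    # In a training loop this would scale the optimizer's learning rate,
    # e.g. lr = base_lr * m, leaving the AdamW update rule itself unchanged.
    print(f"loss={loss:5.1f}  lr multiplier={m:.3f}")
```

Keeping the controller outside the optimizer mirrors the separation the abstract emphasizes between governing optimizer execution and replacing the optimizer or clipping gradients locally.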

2605.19004 2026-05-20 cs.CV cs.LG cs.RO 版本更新

EgoTraj: Real-World Egocentric Human Trajectory Dataset for Multimodal Prediction

EgoTraj：用于多模态预测的真实世界第一人称人类轨迹数据集

Ahmad Yehia, Abduallah Mohamed, Tianyi Wang, Jiseop Byeon, Kun Qian, Junfeng Jiao, Christian Claudel

发表机构 * Department of Civil, Architectural, and Environmental Engineering, The University of Texas at Austin（土木、建筑与环境工程系，德克萨斯大学奥斯汀分校）； Meta Reality Labs（Meta现实实验室）； School of Architecture, The University of Texas at Austin（建筑学院，德克萨斯大学奥斯汀分校）

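AI summary: This paper presents EgoTraj, a dataset for multimodal prediction containing 75 sequences of human navigation in real-world urban environments, with synchronized RGB video and ground truth including 6-DoF head poses, 3D eye-gaze vectors, and scene annotations, and demonstrates its value for AR perception, navigation, and assistive systems.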

Comments 21 pages, 14 figures. Project page: https://github.com/yehiahmad/EgoTraj

AI中文摘要

从第一人称视角准确预测人类轨迹，在人形机器人、可穿戴传感系统和辅助导航等应用中起着核心作用。然而，由于在真实世界环境中采集的第一人称轨迹数据集稀缺，这一方向的进展仍然有限。为满足这一需求，我们介绍了EgoTraj，一个使用Meta Quest Pro（MQPro）录制的第一人称多模态开放数据集。EgoTraj包含由多名MQPro佩戴者在真实城市环境中采集的75段人类导航序列。每段记录都提供同步的RGB视频以及地面真值数据，包括连续且时间同步的6自由度头部姿态、逐帧3D眼动注视向量和场景标注。据我们所知，EgoTraj不同于典型的第一人称轨迹数据集，它捕捉了在多样化城市路线上进行的长时程、自主导航，参与者多样性广泛。为了展示该数据集的潜力，我们对几种最先进的第一人称轨迹预测方法进行了基准测试，并进行了消融研究以分析注视、场景和运动线索的贡献。结果突显了EgoTraj在基于AR的感知、导航和辅助系统中的实用性。EgoTraj数据集、代码和EgoViz仪表板已公开在https://github.com/yehiahmad/EgoTraj。

英文摘要

Accurately forecasting human trajectories from an egocentric perspective plays a central role in applications such as humanoid robotics, wearable sensing systems, and assistive navigation. However, progress in this direction remains limited due to the scarcity of egocentric trajectory datasets collected in real-world environments. Addressing this need, we introduce EgoTraj, an egocentric multimodal open dataset recorded using Meta Quest Pro (MQPro). EgoTraj contains 75 sequences of human navigation collected from multiple MQPro wearers in real-world urban environments. Each recording provides synchronized RGB video along with ground-truth data, including continuous time-synchronized 6-degree-of-freedom head poses, per-frame 3D eye gaze vectors, and scene annotations. To the best of our knowledge, EgoTraj differs from typical egocentric trajectory datasets by capturing long-horizon, self-directed navigation across diverse urban routes with broad participant diversity. To demonstrate the potential of the dataset, we benchmark several state-of-the-art methods for egocentric trajectory prediction and conduct ablation studies to analyze the contributions of gaze, scene, and motion cues. The results highlight the utility of EgoTraj for AR-based perception, navigation, and assistive systems. The EgoTraj dataset, code, and EgoViz Dashboard are publicly available at https://github.com/yehiahmad/EgoTraj.
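
The abstract does not name the benchmark metrics; as an assumption, the sketch below computes average and final displacement error (ADE/FDE), the metrics conventionally reported for trajectory prediction, plus a best-of-K variant often used for multimodal predictors. Each best-of-K minimum is taken independently, which is one common convention; the function names are hypothetical.

```python
import math


def ade_fde(pred, truth):
    """Average and final displacement error between two trajectories.

    pred, truth: equal-length lists of (x, y) positions in the same units.
    """
    if not pred or len(pred) != len(truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    dists = [math.dist(p, t) for p, t in zip(pred, truth)]
    return sum(dists) / len(dists), dists[-1]


def min_ade_fde(samples, truth):
    """Best-of-K scores for a multimodal predictor that outputs K trajectories."""
    scores = [ade_fde(s, truth) for s in samples]
    return min(a for a, _ in scores), min(f for _, f in scores)


truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
print(ade_fde(straight, truth))               # -> (1.0, 2.0)
print(min_ade_fde([straight, truth], truth))  # -> (0.0, 0.0)
```

Best-of-K scoring rewards a predictor for covering the true future with at least one of its hypotheses, which suits multimodal settings where several routes are plausible.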

2605.18999 2026-05-20 cs.LG 版本更新

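Hyrax: A Scalable Framework for Rapid Machine Learning Experimentation and Unsupervised Discovery in the Era of Rubin, Roman, and Euclid

Hyrax：一个用于快速机器学习实验和无监督发现的可扩展框架，面向Rubin、Roman和Euclid时代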

Aritra Ghosh, Drew Oldag, Michael Tauraso, Andrew J. Connolly, Peter Ferguson, Derek Jones, Gourav Khullar, Argyro Sasli, Samarth Venkatesh, Gracia Wang, Maxine West, Dylan Berry, Neven Caplar, Colin Orion Chandler, Tanawan Chatchadanoraset, Michael W. Coughlin, Melissa DeLucchi, Alexandra Junell, Diego Miura, Felipe Fontinele Nunes, Wilson Beebe, Doug Branton, Sandro Campos, Liam Cunningham, Mi Dai, Jeremy Kubica, Konstantin Malanchev, Rachel Mandelbaum, Sean McGuire, Imad Pasha, Dan S. Taranu, Tianqing Zhang

发表机构 * Dept. of Astronomy & the DiRAC Institute, University of Washington, Box 351580, Seattle, WA 98195, USA ； School of Physics & Astronomy, University of Minnesota, Minneapolis, MN 55455, USA ； Department of Astronomy & Planetary Science, Northern Arizona University, Flagstaff, USA ； McWilliams Center for Cosmology & Astrophysics, Department of Physics, Carnegie Mellon University, Pittsburgh, PA 15213, USA

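AI summary: This paper presents Hyrax, an open-source framework supporting the full machine learning lifecycle in astronomy, demonstrates its unsupervised-discovery and supervised-detection capabilities on large-scale astronomical data through five real-world applications, and provides systematic machine learning infrastructure for next-generation astronomical surveys.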

Comments 28 pages, 20 figures, submitted to AJ

AI中文摘要

NSF-DOE Vera C. Rubin Observatory、Roman Space Telescope、Euclid及其他下一代巡天将以越来越大的规模提供成像、光谱和时域数据，使天文机器学习（ML）项目的瓶颈日益从模型设计转向基础设施。我们介绍了Hyrax，一个开源、模块化、支持GPU的Python框架，支持天文学中完整的ML生命周期：从数据获取和训练到推理和实验比较，功能包括多模态数据集支持、用于相似性搜索的集成向量数据库，以及用于无监督发现的交互式二维和三维潜在空间探索。我们通过在真实巡天数据上的五个代表性应用展示了Hyrax的多功能性：（i）在约4×10^5个Rubin Legacy Survey of Space and Time（LSST）Data Preview 1（DP1）星系上进行无监督表示学习，发现了参考Euclid和暗能量巡天（Dark Energy Survey）星表中缺失的新并合星系和低表面亮度候选体，同时隔离出成像伪影，全程无需标注训练数据；（ii）使用基于密度的混合聚类在DP1数据中识别星系团尺度引力透镜候选体；（iii）在Zwicky Transient Facility中利用光变曲线、光谱、图像和元数据进行多模态早期瞬变源分类；（iv）在Dark Energy Camera Ecliptic Exploration Project巡天的移位叠加搜索中，对遥远太阳系天体进行监督式假阳性过滤；（v）利用合成源注入，在Hyper Suprime-Cam和类LSST成像中监督检测半分辨矮星系。这些结果共同表明，Hyrax提供了面向天文学的ML基础设施，能够在下一代天文巡天中实现系统化发现和快速的方法迭代。

英文摘要

The NSF-DOE Vera C. Rubin Observatory, Roman Space Telescope, Euclid, and other next-generation surveys will deliver imaging, spectroscopic, and time-domain data at scales that increasingly shift the bottleneck in astronomical machine learning (ML) projects from model design to infrastructure. We present Hyrax, an open-source, modular, GPU-enabled Python framework that supports the full ML lifecycle in astronomy: from data acquisition and training to inference and experiment comparison, with capabilities including multimodal dataset support, integrated vector databases for similarity search, and interactive two- and three-dimensional latent-space exploration for unsupervised discovery. We demonstrate Hyrax's versatility through five representative applications on real survey data: (i) unsupervised representation learning on $\sim 4\times10^5$ Rubin Legacy Survey of Space and Time (LSST) Data Preview 1 (DP1) galaxies, surfacing new merger and low-surface-brightness candidates missing from reference Euclid and Dark Energy Survey catalogs, while also isolating imaging artifacts -- all without labeled training data; (ii) hybrid density-based clustering for identifying cluster-scale gravitational lens candidates in DP1 data; (iii) multimodal early-time transient classification in the Zwicky Transient Facility leveraging light curves, spectra, images, and metadata; (iv) supervised false-positive filtering in shift-and-stack searches for distant solar system objects in the Dark Energy Camera Ecliptic Exploration Project survey; and (v) supervised detection of semi-resolved dwarf galaxies in Hyper Suprime-Cam and LSST-like imaging using synthetic source injection. Together, these results demonstrate that Hyrax provides astronomy-specific ML infrastructure that enables systematic discovery and rapid methodological iteration across next-generation astronomical surveys.
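
Hyrax's similarity search runs on an integrated vector database; the query it answers can be sketched as a brute-force cosine nearest-neighbour search over latent vectors. This is an illustration only: the object names and vectors are made up, real vector databases use approximate indexes to scale, and nothing here reflects Hyrax's actual API.

```python
import math


def cosine(u, v):
    """Cosine similarity between two latent vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def nearest(query, index, k=3):
    """Ids of the k latent vectors in `index` most similar to `query`."""
    ranked = sorted(index, key=lambda obj_id: cosine(query, index[obj_id]), reverse=True)
    return ranked[:k]


# Hypothetical latent embeddings, e.g. produced by an autoencoder trained on cutouts.
index = {
    "galaxy_a": [1.0, 0.0, 0.0],
    "galaxy_b": [0.9, 0.1, 0.0],
    "galaxy_c": [0.0, 1.0, 0.0],
    "artifact": [0.0, 0.0, 1.0],
}
print(nearest([1.0, 0.05, 0.0], index, k=2))  # -> ['galaxy_a', 'galaxy_b']
```

In an unsupervised-discovery workflow, a query like this starts from one interesting object (say, a merger candidate) and returns its latent-space neighbours for inspection.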

2605.18933 2026-05-20 cs.LG (updated version)

A Geometric Analysis of Sign-Magnitude Asymmetry in a ReLU + RMSNorm Block under Ternary Quantization

Lei Dong

Affiliation: Independent Researcher

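AI summary: Using a sign-magnitude decomposition, this paper explains the sign-magnitude asymmetry of a ReLU + RMSNorm block under ternary quantization, identifies the geometric mechanism by which ReLU and RMSNorm respond to weight perturbations, and verifies experimentally how this asymmetry manifests in real models.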

Comments 53 pages, 2 figures, 21 tables, 7 appendices

English abstract

Pre-norm Transformers with RMSNorm tolerate ternary {-1,0,+1} weight quantization with surprisingly small loss (Ma et al., 2024). We give a geometric explanation via sign-magnitude decomposition of weight perturbations. In a two-layer ReLU + RMSNorm model with i.i.d. Gaussian weights, sign-flips produce $π/(π-2) \approx 2.75$ times more transverse output energy than sign-preserving magnitude perturbations of equal Frobenius norm, as the flip rate $p \to 0$ (Theorem 3). The mechanism: ReLU creates a hidden-space directional asymmetry between the two perturbation types, which RMSNorm's transverse-projection Fréchet derivative selectively exposes. Sign-quantization error is itself a sign-preserving perturbation with angular alignment $\cos^2 \to 2/π$ (Theorem 4); its post-ReLU radial fraction ($0.365$) matches the pre-ReLU value $1-2/π$ within $0.4\%$, so ReLU is approximately transparent to ternary error. Multi-layer compounding of the $2.75\times$ factor is not experimentally supported; the gap to real-model sign sensitivity arises from outlier features violating delocalization. For an input dimension with amplitude $α$, a single sign-flip produces post-ReLU energy amplified by $R \approx nα^2$ relative to a delocalized entry. On TinyLlama-1.1B, at linear response ($p \leq 0.5\%$), count-matched NLL leverage stabilizes at $\sim 10\times \approx n\mathbb{E}[α^2]$, matching the per-entry theory; the all-column NLL ratio of $5.0\times$ falls within $R_{\mathrm{col}} \leq 19$ ($67\times$ PPL gap reflects metric nonlinearity). Measured outlier $α$ at layer 12 (median $0.024$, max $0.26$) confirms heavy-tailed concentration. The Bussgang constant $2/π$, RMSNorm geometry, and ReLU half-space structure together explain sign-magnitude asymmetry in pre-norm models, with $R \propto nα^2$ accounting for real-model deviations.
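
The constants in the abstract can be checked numerically. The sketch below assumes the absmean ternary rule described for BitNet b1.58 (Ma et al., 2024), where weights are divided by their mean absolute value, rounded, and clipped to {-1, 0, +1}; the function names are illustrative, not from the paper. For i.i.d. Gaussian weights, the squared cosine between w and sign(w) equals (Σ|w_i|)² / (n‖w‖²), which tends to (E|w|)² / E[w²] = 2/π.

```python
import math
import random


def absmean_ternary(weights, eps=1e-8):
    """Quantize to {-1, 0, +1} codes with an absmean scale (BitNet b1.58-style rule)."""
    gamma = sum(abs(w) for w in weights) / len(weights)
    codes = [max(-1, min(1, round(w / (gamma + eps)))) for w in weights]
    return codes, gamma  # dequantized weight = gamma * code


def sign_alignment_cos2(weights):
    """Squared cosine between w and sign(w): (sum|w|)^2 / (n * ||w||^2)."""
    n = len(weights)
    l1 = sum(abs(w) for w in weights)
    l2_sq = sum(w * w for w in weights)
    return l1 * l1 / (n * l2_sq)


rng = random.Random(0)
w = [rng.gauss(0.0, 1.0) for _ in range(100_000)]  # i.i.d. Gaussian weights

codes, gamma = absmean_ternary(w)
print(f"share of zero codes: {codes.count(0) / len(codes):.3f}")
print(f"cos^2(w, sign w) = {sign_alignment_cos2(w):.4f}, 2/pi = {2 / math.pi:.4f}")
print(f"1 - 2/pi = {1 - 2 / math.pi:.4f}, pi/(pi - 2) = {math.pi / (math.pi - 2):.4f}")
```

With the least-squares scale c = mean|w|, the residual energy ‖w − c·sign(w)‖² / ‖w‖² equals 1 − cos², so it tends to 1 − 2/π ≈ 0.363. This elementary identity is offered for orientation and is not necessarily the paper's exact definition of the radial fraction.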

2605.18930 2026-05-20 cs.CR cs.AI cs.LG (updated version)

OEP: Poisoning Self-Evolving LLM Agents via Locally Correct but Non-Transferable Experiences

Kaixiang Wang, Jiong Lou, Zhaojiacheng Zhou, Jie Li

Affiliation: Shanghai Jiao Tong University

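AI summary: The study examines the security risk of poisoning self-evolving LLM agents with locally correct but non-transferable experiences and proposes OEP, a low-privilege black-box attack that induces harmful generalization without direct control over the system prompt or the memory database.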

English abstract

Memory-augmented large language model (LLM) agents use iterative reflection and self-evolution to solve complex tasks, but these mechanisms introduce security risks. Existing agentic memory attacks require privileged access or explicit malicious content, making them detectable by advanced safety filters. This leaves a subtler attack surface underexplored: whether adversaries can induce agents to generate experiences that appear locally correct and semantically plausible yet cause harmful generalization during reflection. We find that reflective agents are vulnerable to such clean experiences, especially when paired with severe but plausible hypothetical consequences. Based on this observation, we introduce Obsessive Experience Poisoning (OEP), a low-privilege black-box attack requiring no direct control over the system prompt or memory database. OEP constructs adversarial clean edge cases that combine locally correct solutions, non-transferable methods, and severe consequences, biasing reflection toward risk-averse rule formation. During memory consolidation, agents may over-trust self-generated reflections and distill localized experiences into high-priority but over-generalized rules, causing downstream failures. Evaluations across three domains show that OEP achieves an attack success rate (ASR) above 50% with GPT-4o agents, and outperforms existing attacks under an LLM auditing defense.

2605.18927 2026-05-20 stat.ML cs.LG math.PR (updated version)

Bayesian Latent Space Models for Graphs Are Misspecified: Toward Robust Inference via Generalized Posteriors

Aldric Labarthe

Affiliations: Centre Borelli, Université Paris-Saclay; Department of Computer Science, University of Geneva

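AI summary: This paper studies misspecification in Bayesian latent space models for graphs and proposes a generalized posterior framework whose Link-Sequential R-SafeBayes method improves model robustness, yielding better calibration and link prediction performance.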

English abstract

Bayesian latent space models offer a principled approach to network representation, but rely on correct specification of both geometry and link function. Real-world networks often violate these assumptions, exhibiting geometric mismatch and structural anomalies that break standard metric properties. We show that such misspecification pushes the data-generating distribution outside the model class, causing Bayesian inference to become overconfident and poorly calibrated. To address this, we propose a generalized posterior framework for random geometric graphs. We introduce Link-Sequential R-SafeBayes, a method that exploits dyadic conditional independence to estimate prequential risk and adaptively tune posterior regularization. Experiments on synthetic and real-world networks demonstrate improved calibration, better link prediction performance, and a reliable criterion for selecting latent geometries across Euclidean, spherical, and hyperbolic spaces.
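
A generalized (tempered) posterior raises the likelihood to a learning rate η ∈ (0, 1], π_η(θ | data) ∝ π(θ) p(data | θ)^η, so that η < 1 down-weights evidence from a model that may be misspecified. The sketch below illustrates R-SafeBayes-style selection of η on a deliberately tiny stand-in: a single Beta-Bernoulli edge probability instead of a latent space model, with no dyadic structure. Each candidate η is scored by its prequential posterior-expected log loss, and the smallest score wins. All names and data are hypothetical.

```python
import math


def digamma(x):
    """Digamma function: upward recurrence, then the standard asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    series = f * (1 / 12 - f * (1 / 120 - f * (1 / 252 - f * (1 / 240 - f / 132))))
    return result + math.log(x) - 0.5 / x - series


def tempered_beta(a, b, successes, failures, eta):
    """Generalized posterior Beta(a + eta*s, b + eta*f) for a Beta(a, b) prior."""
    return a + eta * successes, b + eta * failures


def prequential_randomized_loss(links, eta, a=1.0, b=1.0):
    """Sum of E_{theta ~ tempered posterior so far}[-log p(y_i | theta)].

    Uses E[-log theta] = psi(alpha + beta) - psi(alpha) for theta ~ Beta(alpha, beta),
    and the symmetric identity for -log(1 - theta).
    """
    s = f = 0
    total = 0.0
    for y in links:
        alpha, beta = tempered_beta(a, b, s, f, eta)
        if y == 1:
            total += digamma(alpha + beta) - digamma(alpha)
            s += 1
        else:
            total += digamma(alpha + beta) - digamma(beta)
            f += 1
    return total


ETA_GRID = (1.0, 0.5, 0.25, 0.125, 0.0625)


def select_eta(links, grid=ETA_GRID):
    """Choose the learning rate with the smallest prequential risk."""
    return min(grid, key=lambda eta: prequential_randomized_loss(links, eta))


# Hypothetical observed dyads: 1 = edge present, 0 = edge absent.
observed_links = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0] * 5
print(tempered_beta(1.0, 1.0, 15, 35, eta=0.5))  # (8.5, 18.5)
print(select_eta(observed_links))
```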

2605.18923 2026-05-20 eess.IV cs.CV cs.LG q-bio.QM (updated version)

From Division to Decision: Leveraging Temporal Cell-Stage Segmentation for Embryo Transferability Prediction

Yasmine Hachani, Patrick Bouthemy, Elisa Fromont, Véronique Duranthon, Ludivine Laffont, Alline de Paula Reis

Affiliations: Inria center at Rennes University, Paris-Saclay University, UVSQ, INRAE, BREED; University of Rennes, IRISA; The National Veterinary School of Alfort

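AI summary: The study proposes TransFACT, a framework that predicts embryo transferability from early developmental-stage information in time-lapse videos by combining frame-level temporal features with stage-level representations, and outperforms existing methods.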

Journal ref: ICIP 2026 - IEEE International Conference on Image Processing, Sep 2026, Tampere, Finland

English abstract

Accurate selection of bovine embryos is a challenging task, as current practice relies on a single expert assessment on the seventh day after insemination, resulting in high rates of pregnancy loss. Time-lapse videomicroscopy provides detailed information on early development, but is difficult to exploit because of complex motion patterns and time-consuming analysis. We propose TransFACT, a transformer-based framework for modeling early developmental stages and embryo transferability using 2D time-lapse videos from the first four days of development. TransFACT combines frame-level temporal features with stage-level representations, using developmental stages as auxiliary supervision to predict transferability on day four. Our experiments demonstrate that TransFACT, by leveraging an existing method designed for action recognition, outperforms its competitor in predicting embryo transferability.

2605.18919 2026-05-20 cs.CR cs.AI cs.LG (updated version)

MoCo-EA: Exploiting Adversarial Mode Connectivity for Efficient Evolutionary Attacks

Hyo Seo Kim, Gang Luo, Can Chen, Binghui Wang, Yue Duan, Ren Wang

Affiliations: Illinois Institute of Technology; University of North Carolina at Chapel Hill; Singapore Management University

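AI summary: This paper proposes MoCo-EA, an evolutionary attack that gains efficiency by exploiting adversarial mode connectivity. It optimizes perturbations with a Bézier crossover operator, improving attack effectiveness while reducing convergence time and query requirements.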

English abstract

Evolutionary algorithms for adversarial attacks leverage population-based search to discover perturbations without gradient information, but suffer from inefficient crossover operations that destroy adversarial properties through discrete interpolation. We introduce Mode Connectivity Evolutionary Attack (MoCo-EA), which replaces traditional crossover with a novel Bézier crossover operator that optimizes perturbations along a continuous Bézier curve between parent perturbations. Our key insight is that adversarial examples lie on connected manifolds where intermediate points maintain and often enhance attack effectiveness. We demonstrate three findings: (1) Successful adversarial perturbations exhibit mode connectivity; (2) Intermediate points along optimized paths achieve higher transferability than endpoints; (3) Bézier crossover dramatically outperforms discrete genetic operations while reducing convergence time and query requirements. By exploiting the geometric structure of adversarial space through path optimization, MoCo-EA provides an efficient and reliable method. Our work challenges the traditional view of adversarial examples as isolated points and opens new directions for both attack generation and defense research.
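
A quadratic Bézier curve between two parent perturbations δ₁ and δ₂ with control point c is B(t) = (1−t)²δ₁ + 2t(1−t)c + t²δ₂ for t ∈ [0, 1]. The sketch below evaluates candidate children along that curve, projects them onto an L∞ ball of radius ε, and keeps the fittest. The jittered-midpoint control point, the evaluation grid, and the toy linear fitness are assumptions of this sketch; in MoCo-EA the perturbations are optimized along the curve, and the fitness would be an attack objective computed from model queries.

```python
import random


def bezier_point(p1, ctrl, p2, t):
    """Quadratic Bezier B(t) = (1-t)^2 p1 + 2t(1-t) ctrl + t^2 p2, element-wise."""
    return [
        (1 - t) ** 2 * a + 2 * t * (1 - t) * c + t ** 2 * b
        for a, c, b in zip(p1, ctrl, p2)
    ]


def clip_linf(delta, eps):
    """Project a perturbation onto the L-infinity ball of radius eps."""
    return [max(-eps, min(eps, d)) for d in delta]


def bezier_crossover(parent1, parent2, fitness, eps, n_samples=11, rng=None):
    """Return the fittest in-ball child sampled along a Bezier curve between parents.

    The jittered-midpoint control point is an assumption of this sketch.
    """
    rng = rng or random.Random()
    ctrl = [(a + b) / 2 + 0.1 * rng.uniform(-eps, eps) for a, b in zip(parent1, parent2)]
    ts = [i / (n_samples - 1) for i in range(n_samples)]
    children = [clip_linf(bezier_point(parent1, ctrl, parent2, t), eps) for t in ts]
    return max(children, key=fitness)


def toy_fitness(delta, direction=(1.0, -1.0, 0.5, 0.0)):
    """Linear surrogate for an attack objective (hypothetical)."""
    return sum(d * g for d, g in zip(delta, direction))


p1 = [0.03, 0.0, 0.0, 0.01]
p2 = [0.0, -0.03, 0.02, 0.0]
child = bezier_crossover(p1, p2, toy_fitness, eps=0.03, rng=random.Random(0))
print(child, toy_fitness(child))
```

Because t = 0 and t = 1 reproduce the parents exactly, this crossover never returns a child worse than the better in-ball parent, whereas element-wise discrete crossover offers no such guarantee.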

2605.18913 2026-05-20 cs.CR cs.AI cs.LG (updated version)

SCAFDS: Edge-Feature Graph Attention for Interbank Fraud Detection with Attribution-Grounded SAR Generation

Mohammad Nasir Uddin

Affiliation: Taskimpetus Inc.

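AI summary: This paper proposes SCAFDS, a seven-stage integrated surveillance pipeline that addresses five structural limitations of existing methods. It encodes interbank topology with fraud co-occurrence edge features, computes edge-feature-informed graph attention from both node representations and those edge features, produces institution-level systemic fraud risk scores, and generates attribution-conditioned SAR narratives so that every FinCEN SAR assertion is traceable. It reports substantial AUPRC and AUROC gains on the IEEE-CIS Fraud Detection Dataset and a synthetic FDIC-aligned interbank network.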

English abstract

The U.S. financial system processes approximately 1.3 million interbank transactions daily, yet no system in the reviewed literature models fraud propagation across the interbank network using fraud co-occurrence edge features. Prior interbank GNN architectures model credit contagion using credit distress supervision signals, producing systems misaligned for fraud forensics. No existing system generates SAR narratives with per-assertion forensic traceability to specific numerical detection outputs, creating regulatory auditability gaps in FinCEN-submitted reports. This paper introduces SCAFDS (Systemic Contagion-Aware Fraud Detection System), a seven-stage integrated surveillance pipeline addressing five structural limitations of prior art: (1) fraud-specific interbank topology encoding using fraud co-occurrence frequency metrics f(u,v,t) derived from FinCEN SAR registry records; (2) edge-feature-informed graph attention where coefficients are computed from both node representations and fraud co-occurrence edge features; (3) bilinear fraud co-occurrence risk fusion producing institution-level systemic fraud risk scores; (4) attribution-conditioned SAR narrative generation with per-assertion significance thresholds ensuring each FinCEN SAR assertion is traceable to a specific numerical pipeline output; and (5) topology-aware adaptive forensic feedback updating graph attention weights from regulatory dispositions. Experiments on the IEEE-CIS Fraud Detection Dataset (590,540 transactions) and a synthetic FDIC-aligned interbank network (8,103 institutions, 169,800 edges) show SCAFDS achieves AUPRC=0.515+/-0.032 and AUROC=0.802+/-0.018, representing +15.9pp and +13.7pp improvements over GraphSAGE-AML. Partial validation on FDIC enforcement action records (n=4,279) confirms consistent model ranking. USPTO Provisional Patent Application No. 64/061,083, filed May 8, 2026.
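
Item (2) describes attention coefficients computed from both node representations and edge features. A common way to write this, assumed here rather than taken from the paper, is a GAT-style score LeakyReLU(aᵀ[W h_u ‖ W h_v ‖ W_e x_uv]) followed by a softmax over u's neighbors, where x_uv would hold a fraud co-occurrence metric such as f(u, v, t). The bank ids, feature values, and weight matrices below are hypothetical.

```python
import math


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def edge_attention(h, neighbors, edge_feat, W, W_e, a):
    """Attention of each node u over its neighbors, scored from node AND edge features.

    score(u, v) = LeakyReLU(a . [W h_u || W h_v || W_e x_uv]); softmax over v.
    """
    coeffs = {}
    for u, nbrs in neighbors.items():
        wh_u = matvec(W, h[u])
        scores = {}
        for v in nbrs:
            z = wh_u + matvec(W, h[v]) + matvec(W_e, edge_feat[(u, v)])  # concatenation
            scores[v] = leaky_relu(sum(ai * zi for ai, zi in zip(a, z)))
        m = max(scores.values())  # subtract the max for numerical stability
        exps = {v: math.exp(s - m) for v, s in scores.items()}
        total = sum(exps.values())
        coeffs[u] = {v: e / total for v, e in exps.items()}
    return coeffs


# Hypothetical banks A, B, C; the edge feature is a fraud co-occurrence metric.
h = {"A": [1.0, 0.0], "B": [0.5, 0.5], "C": [0.2, 0.9]}
neighbors = {"A": ["B", "C"]}
edge_feat = {("A", "B"): [0.9], ("A", "C"): [0.1]}
W = [[1.0, 0.0], [0.0, 1.0]]
W_e = [[1.0]]
a = [0.1, 0.1, 0.1, 0.1, 1.0]
coeffs = edge_attention(h, neighbors, edge_feat, W, W_e, a)
print(coeffs["A"])
```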

2605.18908 2026-05-20 cs.CR cs.AI cs.LG (updated version)

Fast and Lightweight Backdoor Detection via Head Random Probing

Yinbo Yu, Xueyu Yin, Jing Fang, Chunwei Tian, Qi Zhu, Jiajia Liu, Daoqiang Zhang

Affiliations: College of Artificial Intelligence, Nanjing University of Aeronautics and Astronautics; School of Cybersecurity, Northwestern Polytechnical University; Shenzhen Research Institute of Northwestern Polytechnical University; School of Computer Science and Technology, Harbin Institute of Technology

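AI summary: This paper proposes HTell, a fast, lightweight, data-free backdoor detector based on head random probing. It detects backdoors efficiently and accurately by analyzing the response statistics of the model's prediction head under random latent probes.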

English abstract

Deep neural networks (DNNs) remain critically vulnerable to backdoor attacks. Existing post-training detectors often require clean or surrogate data, gradients, or iterative trigger reconstruction, leading to high computational costs and limited robustness under practical model-auditing scenarios. In this paper, we propose HTell, a fast and lightweight data-free backdoor detector based on head random probing. Instead of reconstructing diverse trigger patterns, HTell inspects their unified manifestation in the prediction head: backdoored models tend to exhibit abnormal response concentration on the target class under random latent probes. HTell generates architecture-aware random latent probes, feeds them directly into the model head, and detects backdoors by analyzing class-wise response statistics, without requiring real or surrogate data, model gradients, or parameter optimization. We evaluate HTell on a large-scale benchmark containing more than 6,000 backdoored models and over 700 clean models, covering 4 datasets, 14 architectures, and 21 types of backdoor attacks. HTell achieves a 99.03% true positive rate and a 2.11% false positive rate with a detection latency of only 12.69 ms/model, reducing the time cost by over 30,000× compared with representative gradient-based detectors. These results demonstrate that head random probing provides an accurate, robust, and efficient solution for large-scale data-free backdoor model auditing.
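
The core idea, feeding random latent vectors directly into the prediction head and looking for a class that absorbs an abnormal share of the responses, can be illustrated with a toy linear head. This sketch assumes Gaussian probes, per-class argmax frequencies as the response statistic, and a median/MAD robust z-score as the outlier rule; HTell's architecture-aware probes and its actual statistics are not reproduced. The "backdoored" head is simulated by shifting one class logit, a simplification of how a real backdoor shapes the head.

```python
import random
import statistics


def probe_head(head, dim, n_probes=2000, seed=0):
    """Feed Gaussian latent probes into a linear head; return per-class argmax shares."""
    rng = random.Random(seed)
    weights, biases = head
    counts = [0] * len(weights)
    for _ in range(n_probes):
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        logits = [sum(w * x for w, x in zip(row, z)) + b for row, b in zip(weights, biases)]
        counts[logits.index(max(logits))] += 1
    return [c / n_probes for c in counts]


def flag_backdoor(shares, threshold=3.5):
    """Return the class whose share is a robust (median/MAD) outlier, else None."""
    med = statistics.median(shares)
    mad = statistics.median([abs(s - med) for s in shares]) or 1e-12
    scores = [0.6745 * (s - med) / mad for s in shares]
    top = max(range(len(shares)), key=lambda k: scores[k])
    return top if scores[top] > threshold else None


dim = n_classes = 10
identity = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(n_classes)]
clean_head = (identity, [0.0] * n_classes)
# Toy "backdoor": class 3's logit is shifted up, so random probes pile onto it.
backdoored_head = (identity, [2.0 if i == 3 else 0.0 for i in range(n_classes)])

print(flag_backdoor(probe_head(clean_head, dim)))
print(flag_backdoor(probe_head(backdoored_head, dim)))
```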

2605.18905 2026-05-20 cs.LG cs.AI cs.NA cs.NE math.NA (updated version)

Stability and Discretization Error of State Space Model Neural Operators

Abderrahim Bendahi, Adrien Fradin, Johan Peralez, Julie Digne, Madiha Nadri

Affiliations: École polytechnique; Université Claude Bernard Lyon 1; CNRS; LAGEPP UMR 5007; Université Lyon 1; INSA Lyon; LIRIS

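AI summary: This paper studies the stability and discretization error of state space model neural operators. Its theoretical analysis establishes discretization-error and stability guarantees for neural operator approximation schemes, derives new discretization error theorems for SS-NOs and FNOs, and experimentally confirms their robustness across resolutions.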

English abstract

Neural operators have emerged as a powerful, discretization-invariant framework for solving partial differential equations (PDEs). Although established approaches like the Deep Operator Network (DeepONet) have successfully achieved universal approximation for operators, and architectures such as Fourier Neural Operators (FNOs) have shown algebraic convergence rates, a precise theoretical connection between the continuous theory and its discrete numerical implementation remains a challenge. Specifically, the relationship between the continuous formulation and the discrete numerical stability has yet to be fully explored. In this paper, we address this gap by establishing theoretical guarantees for the discretization error and stability of neural operator approximation schemes. We prove analytical bounds that link solution regularity to input discretization, providing a formal quantification of neural operator accuracy under real-world numerical constraints. We specialize these bounds to State Space Model-based Neural Operators (SS-NOs) and FNOs, thus providing a new discretization error theorem for these models. Additionally, through an input-to-state stability (ISS) analysis, we formally assess how discretization affects the stability results for SS-NOs obtained in the continuous domain. Our experiments on 1D and 2D benchmarks validate our theoretical bounds and show the robustness of SS-NOs under varying resolutions.
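
The link between a continuous-time state space model and its discrete implementation can be seen on a single real mode x′ = λx + b·u(t). The sketch below assumes zero-order-hold discretization, as commonly used in S4/Mamba-style layers (not necessarily the scheme analyzed in the paper): x_{k+1} = e^{λΔ}x_k + ((e^{λΔ} − 1)/λ)·b·u_k. It compares the recurrence against the closed-form solution for u(t) = sin t and shows the error shrinking roughly in proportion to the step Δ.

```python
import math


def zoh_coeffs(lam, dt):
    """Zero-order-hold discretization of x' = lam*x + b*u for one real mode (lam != 0)."""
    a_bar = math.exp(lam * dt)
    b_bar = (a_bar - 1.0) / lam  # multiplies b * u_k
    return a_bar, b_bar


def simulate_zoh(lam, b, dt, t_end, u=math.sin):
    """Discrete recurrence with the input held constant over each step."""
    a_bar, b_bar = zoh_coeffs(lam, dt)
    x = 0.0
    for k in range(round(t_end / dt)):
        x = a_bar * x + b_bar * b * u(k * dt)
    return x


def exact_solution(lam, b, t):
    """Closed form of x' = lam*x + b*sin(t) with x(0) = 0."""
    return b / (1 + lam ** 2) * (math.exp(lam * t) - lam * math.sin(t) - math.cos(t))


lam, b, t_end = -2.0, 1.0, 5.0
errors = []
for dt in (0.1, 0.05, 0.025):
    err = abs(simulate_zoh(lam, b, dt, t_end) - exact_solution(lam, b, t_end))
    errors.append(err)
    print(f"dt={dt:<6} |error|={err:.2e}")
```

For λ < 0, |e^{λΔ}| < 1 for every step size Δ > 0, so the zero-order-hold recurrence inherits the stability of the continuous mode, whereas forward Euler (multiplier 1 + λΔ) becomes unstable once Δ > 2/|λ|.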

2605.18904 2026-05-20 cs.LG cs.AI cs.CL (updated version)

Dynamic Model Merging Made Slim

Guodong Du, Wanyu Lin

Affiliation: The Hong Kong Polytechnic University

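AI summary: This paper proposes DiDi-Merging, which balances shared and expert parameters through differentiable rank allocation to make dynamic model merging more efficient, needing substantially fewer parameters than existing methods.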

English abstract

Model merging enables the reuse of fine-tuned models without joint training or access to original data. Dynamic merging further improves flexibility by selectively activating task-relevant parameters and efficiently composing experts across multiple tasks. However, existing dynamic methods either maintain a full shared model with tiny experts or allocate excessive capacity to experts, leading to suboptimal accuracy-efficiency trade-offs. To address this, we propose DiDi-Merging, a slim dynamic merging framework that leverages differentiable rank allocation to balance shared and expert parameters. By formulating parameter budgeting as differentiable rank optimization in low-rank modules and introducing a data-free refinement step to recover task fidelity, DiDi-Merging matches prior dynamic baselines at only 1.24x the parameters of a single fine-tuned model and surpasses them at 1.4x, substantially more compact than methods requiring more than 2x storage. DiDi-Merging applies across vision, language, and multimodal tasks.

2605.18903 2026-05-20 cs.LG cs.CV (updated version)

Reasoning Portability: Guiding Continual Learning for MLLMs in the RLVR Era

Qiuhe Hong, Yuyang Liu, Shuo Yang, Tiantian Peng, Fei Zhu, Yonghong Tian

Affiliations: Shenzhen Graduate School of Peking University; Centre for Artificial Intelligence and Robotics, HKISI, CAS; Peng Cheng Laboratory

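AI summary: This paper proposes Reasoning Portability (RP), a mechanism that imposes reasoning-level constraints during continual learning to improve how multimodal large language models adapt under RLVR. Experiments show that the resulting method, RDB-CL, outperforms baselines in Last accuracy.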

English abstract

Vision-Language Models in Continual Learning (VLM-CL) aim to continuously adapt to new multimodal tasks while retaining prior knowledge. The emerging paradigm that couples Multimodal Large Language Models (MLLMs) with Reinforcement Learning with Verifiable Rewards (RLVR) calls for a new pattern to guide continual adaptation. Advances in reasoning capability now make it feasible to impose constraints at the reasoning level. We formalize portability, a sample-level measure of how reusable the previous policy's behavior is on a new task, and empirically show that reasoning-level signals remain reliable on out-of-distribution samples while answer-level signals do not. We instantiate this as Reasoning Portability (RP) and propose Reasoning-based Dynamic Balance Continual Learning (RDB-CL), which modulates the per-sample Kullback-Leibler regularization in RLVR according to RP: a tight anchor preserves reusable reasoning on high-RP samples, while a relaxed anchor on low-RP samples permits exploration of new reasoning pathways. Experiments show that RDB-CL consistently outperforms baselines, improving Last accuracy by +12.0% over the vanilla RLVR baseline.
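
RDB-CL scales the per-sample Kullback-Leibler penalty by reasoning portability. The sketch below assumes a linear map from an RP score in [0, 1] to a KL coefficient and uses the non-negative "k3" per-token KL estimator common in GRPO implementations. The coefficient range, the function names, and the `policy_term` placeholder (standing in for the clipped GRPO surrogate) are hypothetical.

```python
import math


def kl_k3(logp_policy, logp_anchor):
    """Non-negative per-token KL estimate exp(r) - r - 1 with r = log pi_anchor - log pi."""
    r = logp_anchor - logp_policy
    return math.exp(r) - r - 1.0


def rp_kl_coefficient(rp, beta_relaxed=0.001, beta_tight=0.1):
    """Linear map from reasoning portability rp in [0, 1] to a KL weight (assumed form)."""
    rp = min(1.0, max(0.0, rp))
    return beta_relaxed + (beta_tight - beta_relaxed) * rp


def regularized_sample_loss(policy_term, logp_policy, logp_anchor, rp):
    """Per-sample loss: surrogate term plus RP-weighted mean per-token KL to the anchor."""
    kl = sum(kl_k3(p, q) for p, q in zip(logp_policy, logp_anchor)) / len(logp_policy)
    return policy_term + rp_kl_coefficient(rp) * kl


# Same token log-probs and policy term; only the RP score differs.
tokens_policy, tokens_anchor = [-0.2, -1.0, -0.7], [-0.4, -0.9, -1.2]
print(regularized_sample_loss(-0.5, tokens_policy, tokens_anchor, rp=0.9))  # tight anchor
print(regularized_sample_loss(-0.5, tokens_policy, tokens_anchor, rp=0.1))  # relaxed anchor
```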

2605.18902 2026-05-20 cs.IT cs.LG math.IT (updated version)

Variational Diffusion Channel Decoder

Chengwei Zhang, Yifan Du, Siyu Liao

Affiliation: School of Integrated Circuits, Sun Yat-sen University, Shenzhen, China

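AI summary: This paper proposes an efficient channel decoder based on a variational diffusion model, combining the domain-specific belief propagation process with the strong learning capability of diffusion models to achieve low cost and high error-correcting performance.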

English abstract

Neural channel decoders, as data-driven channel decoding strategies, have shown very promising improvements in error-correcting capability over classical methods. However, the success of these deep learning-based decoders comes at the cost of drastically increased model storage and computational complexity, hindering their practical adoption in real-world time- and resource-sensitive communication and storage systems. To address this challenge, we propose an efficient variational diffusion model-based channel decoder that integrates the domain-specific belief propagation process into a modern diffusion model. By combining the low cost of belief propagation with the strong learning capability of diffusion models, our neural decoder simultaneously achieves very low cost and high error-correcting performance. Experimental results show that, compared with state-of-the-art neural channel decoders, our model achieves the best decoding performance with significantly reduced computational cost and model size, offering a feasible solution for practical deployment.
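
The belief propagation component can be illustrated with the classical min-sum decoder on the Hamming(7,4) code. This is textbook BP, not the paper's variational diffusion decoder; the parity-check matrix, the LLR convention (positive favors bit 0 under BPSK 0 → +1), and the iteration cap are choices of this sketch. Check nodes send the product of the other incoming signs times their minimum magnitude; variable nodes add their channel LLR to the incoming check messages.

```python
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]  # Hamming(7,4) parity-check matrix; every column is distinct and nonzero


def syndrome_ok(H, bits):
    """True if every parity check is satisfied (H * bits = 0 mod 2)."""
    return all(sum(h * x for h, x in zip(row, bits)) % 2 == 0 for row in H)


def min_sum_decode(H, llr, max_iters=20):
    """Min-sum belief propagation; llr[v] > 0 favors bit 0 (BPSK maps 0 -> +1)."""
    n = len(llr)
    checks = [[v for v, h in enumerate(row) if h] for row in H]
    var_checks = [[c for c, vs in enumerate(checks) if v in vs] for v in range(n)]
    v2c = {(c, v): llr[v] for c, vs in enumerate(checks) for v in vs}
    c2v = {}
    bits = [1 if x < 0 else 0 for x in llr]
    for _ in range(max_iters):
        # Check-node update: product of the other signs times their minimum magnitude.
        for c, vs in enumerate(checks):
            for v in vs:
                others = [v2c[(c, u)] for u in vs if u != v]
                sign = -1.0 if sum(m < 0 for m in others) % 2 else 1.0
                c2v[(c, v)] = sign * min(abs(m) for m in others)
        # Variable-node update: channel LLR plus all incoming check messages.
        totals = [llr[v] + sum(c2v[(c, v)] for c in var_checks[v]) for v in range(n)]
        bits = [1 if t < 0 else 0 for t in totals]
        if syndrome_ok(H, bits):
            break
        # Extrinsic messages: exclude the message that came from the same check.
        for v, cs in enumerate(var_checks):
            for c in cs:
                v2c[(c, v)] = totals[v] - c2v[(c, v)]
    return bits


# All-zero codeword sent; bit 3 arrives with a weak wrong-sign LLR.
received_llr = [2.0, 2.0, 2.0, -0.5, 2.0, 2.0, 2.0]
print([1 if x < 0 else 0 for x in received_llr])  # hard decision: [0, 0, 0, 1, 0, 0, 0]
print(min_sum_decode(H, received_llr))             # [0, 0, 0, 0, 0, 0, 0]
```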

2605.18900 2026-05-20 q-bio.OT cs.LG (updated version)

A Logistic Regression Model to Predict Malaria Severity in Children

Mary Opokua Ansong, Asare Yaw Obeng, Samuel King Opoku

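AI summary: This study proposes a logistic regression model that predicts malaria severity in children from environmental and biological factors, validates it with an accuracy of 83.3%, and stresses the importance of a representative sample across class labels.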

DOI: 10.24018/ejece.2024.8.2.614
Journal ref: Eur. J. Electr. Eng. Comput. Sci. 8 (2024) 31-35

English abstract

Malaria is one of the leading causes of death worldwide. Researchers have sought to develop predictive models for malaria outbreaks based on meteorological data, climate data, and the breeding cycle of Plasmodium, the causative agent of malaria. This study predicts the severity of malaria from environmental and biological factors. A logistic regression model was developed to predict malaria severity from factors such as sickle cell disease, stagnant water, garbage dumps, wet lawns, and the use of treated mosquito nets, achieving an accuracy of 83.3%. The study was carried out in the Bosomtwe District of Ghana with 417 respondents. It found that although children in the District are highly prone to malaria infection, the severity of infection is very low. The study concludes that an adequate sample size alone is not enough when developing machine learning models: the sample must also represent each class label well.
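
A minimal sketch of the two ingredients in the abstract: fitting a logistic regression by batch gradient descent, and why accuracy alone can hide poor representation of a class label. The toy data are invented and unrelated to the Bosomtwe survey; balanced accuracy (mean per-class recall) is shown as one standard complement to accuracy, not as the paper's metric.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean logistic (cross-entropy) loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, X):
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0 for xi in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall; unaffected by how many samples each class has."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


X, y = [[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1]
w, b = fit_logistic(X, y)
print(predict(w, b, X))

# Imbalanced labels: always predicting the majority class looks accurate
# but never detects the minority class.
y_imbalanced, majority_guess = [0] * 9 + [1], [0] * 10
print(accuracy(y_imbalanced, majority_guess), balanced_accuracy(y_imbalanced, majority_guess))
```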

2605.18899 2026-05-20 cs.LG cs.AI (updated version)

Don't Let Bandit Feedback Pull Continual LLM-Recommender Updates Off Target

Taesan Kim, Hyeongjun Yun, Jaegul Choo, Chung Park

Affiliations: SK Telecom; KAIST

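AI summary: This paper proposes Anchored Bandit Policy Optimization (ABPO), a framework for continually improving generative LLM-based recommenders. It combines group-relative policy optimization (GRPO) with explicit handling of exposure bias and feedback ambiguity to reduce the bias introduced by the policy-shaped contextual bandit feedback in deployment logs, and improves recommendation accuracy.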

English abstract

Generative LLM-based recommenders (LLM-Rec) require continual post-deployment updates, yet deployment logs provide only policy-shaped contextual bandit feedback: outcomes are observed solely for items exposed by a prior serving policy, inducing exposure bias and yielding partial, asymmetric signals consisting of relatively reliable positive responses and ambiguous no-responses. We propose an Anchored Bandit Policy Optimization (ABPO) framework for continual LLM-Rec updates that combines group-relative policy optimization (GRPO) with explicit treatment of exposure bias and feedback ambiguity. Specifically, we insert the exposed recommendation as a logged anchor into each GRPO rollout group, so that group-relative normalization is calibrated against the action actually exposed by the prior policy rather than against newly sampled rollouts alone. Because both positive- and no-responses are observed only through prior-policy exposure, we apply self-normalized inverse propensity scoring to the fixed anchor for both feedback types to correct for policy mismatch. At the same time, we treat the two feedback types asymmetrically in reliability: positive responses provide relatively direct endorsement signals, whereas no-responses remain ambiguous because they may reflect either true disinterest or unobserved external factors. To avoid overly aggressive updates from ambiguous no-responses, we temper their penalties with self-certainty, using the model's output-token confidence as a verifier-free reliability signal. Across five domains from Amazon Reviews and MovieLens, our method yields consistent post-update gains in recommendation accuracy while mitigating prior-policy-induced exposure bias more effectively than prior baselines.
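
Three mechanisms from the abstract can be sketched separately: group-relative advantages computed with the logged anchor inside the rollout group, self-normalized inverse propensity weights for logged anchors, and a no-response penalty tempered by a reliability factor. The functions below are a hypothetical illustration; in particular, how self-certainty becomes the reliability factor, and the penalty scale, are assumptions not specified in the abstract.

```python
import math
import statistics


def group_advantages(rollout_rewards, anchor_reward, eps=1e-6):
    """Group-relative advantages with the logged anchor included in the group."""
    group = list(rollout_rewards) + [anchor_reward]
    mean = statistics.fmean(group)
    std = statistics.pstdev(group)
    return [(r - mean) / (std + eps) for r in group]


def snips_weights(logp_new, logp_logging):
    """Self-normalized importance weights pi_new(a|x) / pi_logging(a|x) over a batch."""
    raw = [math.exp(new - old) for new, old in zip(logp_new, logp_logging)]
    mean_raw = sum(raw) / len(raw)
    return [w / mean_raw for w in raw]


def anchor_reward(feedback, reliability, no_response_penalty=1.0):
    """Asymmetric anchor reward (hypothetical form).

    A positive response earns full credit; an ambiguous no-response is penalized
    only in proportion to a reliability factor in [0, 1] derived from self-certainty.
    """
    if feedback == "positive":
        return 1.0
    return -no_response_penalty * min(1.0, max(0.0, reliability))


adv = group_advantages([0.0, 1.0, 0.0], anchor_reward("no_response", reliability=0.4))
weights = snips_weights([-1.0, -2.3, -0.4], [-1.2, -1.9, -0.6])
print([round(x, 3) for x in adv], [round(w, 3) for w in weights])
```

Normalizing the importance weights by their batch mean keeps the anchor terms on a stable scale even when a few logged actions have very small logging probabilities, at the cost of a small bias.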

2605.18897 2026-05-20 eess.SP cs.AI cs.LG (updated version)

Cross-Subject Intracranial EEG Reconstruction from Scalp Recordings Using Multi-Scale Cross-Attention Transformers

Tien-Dat Pham, Xuan-The Tran

Affiliations: HAI-Smartlink Research Lab, Anchi STE Company; School of Mechanical Engineering, Vietnam Maritime University

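AI summary: This paper proposes CAST, a multi-scale cross-attention transformer that uses a two-stage transfer learning strategy to reconstruct intracranial EEG of unseen subjects from scalp EEG, enabling cross-subject iEEG reconstruction without extensive patient-specific training (only a brief calibration on the target subject).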

English abstract

Intracranial EEG (iEEG) provides high-fidelity neural recordings essential for clinical and brain-computer interface applications, but acquiring these signals requires invasive surgery. While recent studies have attempted to estimate iEEG from non-invasive scalp EEG, most rely on patient-specific models, creating a circular dependency: if surgery is required to collect training data, the non-invasive model offers limited practical benefit. In this study, we address the challenge of cross-subject iEEG reconstruction by predicting intracranial signals for unseen patients using models trained on other individuals. We propose CAST (Cross-Attention Spatial-Temporal Transformer), a machine learning framework that translates scalp EEG into multi-channel iEEG waveforms through a two-stage transfer learning strategy. First, a temporal encoder extracts multi-scale neural representations at three different resolutions. Then, because electrode placements vary substantially across patients, a channel-aware decoder is calibrated using only a few minutes of data from the target subject. We evaluated the proposed method using leave-one-subject-out cross-validation on two public datasets comprising 1,282 iEEG channels. Experimental results demonstrate that CAST reconstructs cortical signals located near the scalp surface substantially better than deep subcortical activity. In highly observable sensorimotor regions, the model achieved peak correlations of up to r=0.864 in the precentral gyrus. Furthermore, with a channel selection strategy, CAST obtained a mean correlation of r=0.545 on viable subjects, outperforming previous within-subject baselines. These findings indicate that cortical iEEG signals can be reconstructed for unseen subjects from scalp EEG without extensive patient-specific training, and that only a brief calibration phase is sufficient to adapt the model to new hardware configurations.
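
The evaluation protocol in the abstract combines leave-one-subject-out splits with the Pearson correlation between reconstructed and recorded iEEG. A minimal sketch of both follows; the subject ids and signals are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between a recorded and a reconstructed signal."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def leave_one_subject_out(subjects):
    """Yield (held_out_subject, training_subjects) for cross-subject evaluation."""
    for held_out in subjects:
        yield held_out, [s for s in subjects if s != held_out]


for test_subject, train_subjects in leave_one_subject_out(["s01", "s02", "s03"]):
    print(test_subject, train_subjects)

recorded = [0.0, 1.0, 0.0, -1.0, 0.0]
reconstructed = [0.1, 0.4, 0.1, -0.2, 0.1]  # same shape, smaller amplitude and an offset
print(round(pearson_r(recorded, reconstructed), 3))
```

Pearson r is invariant to the offset and positive scale of the reconstruction, so a high correlation indicates matching waveform shape rather than matching amplitude.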

2605.18892 2026-05-20 cs.LG cs.AI cs.DC (updated version)

Data-Free Client Contribution Estimation via Logit Maximization for Federated Learning

Asim Ukaye, Nurbek Tastan, Mubarak Abdu-Aguye, Karthik Nandakumar

Affiliations: MBZUAI, Abu Dhabi, UAE; Michigan State University, Michigan, USA

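AI summary: This paper proposes CELM, a data-free client contribution estimation and aggregation framework based on logit maximization that requires no sharing of raw data, client metadata, or auxiliary public data. The server derives class-wise evidence scores from client updates and builds a cross-client evidence matrix that quantifies per-class competence and class coverage. From this matrix it computes contribution weights that upweight clients providing strong discriminative evidence for minority classes, improving the robustness and performance of federated learning.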

Comments 22 pages, 7 figures

Abstract

Federated learning (FL) enables collaborative learning of computer vision models, where privacy and regulatory constraints prevent centralizing data across devices or organizations. However, practical FL deployments often exhibit severe class imbalance and label skew, causing standard aggregation protocols to overfit dominant clients and degrade minority-class performance. We propose a data-free, class-wise contribution estimation and aggregation framework based on logit maximization (CELM) that does not require sharing raw data, client metadata, or auxiliary public datasets. The FL server probes client updates to obtain class-wise evidence scores and assembles a cross-client evidence matrix, which quantifies both per-class competence and class coverage. Using this matrix, we compute contribution weights that upweight clients providing strong, discriminative evidence for underrepresented classes. The resulting aggregation is stable due to simplex constraints and momentum smoothing, and it remains compatible with standard FL training pipelines. We evaluate the approach on representative vision benchmarks under controlled non-IID and pathological label splits, demonstrating that CELM-based aggregation improves robustness to imbalance and statistical heterogeneity, while yielding better performance without requiring any additional data exchange.
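The abstract does not spell out how the evidence matrix becomes aggregation weights. The NumPy sketch below is one hypothetical reading and is not CELM itself. It assumes a (clients × classes) matrix of non-negative evidence scores. Each client is scored by its share of the evidence on each class, with classes weighted toward those that receive little total coverage. The result is normalized onto the probability simplex and smoothed across rounds with momentum. The `rarity` weighting and the momentum coefficient `beta` are illustrative choices.

```python
import numpy as np


def contribution_weights(evidence, prev_weights=None, beta=0.9):
    """Hypothetical client weights from a (clients x classes) evidence matrix."""
    evidence = np.clip(np.asarray(evidence, dtype=float), 0.0, None)
    coverage = evidence.sum(axis=0)                      # total evidence per class
    covered = coverage > 0
    # Each client's share of the evidence for each class (0 for uncovered classes).
    share = np.divide(evidence, coverage, out=np.zeros_like(evidence), where=covered)
    # Poorly covered classes count more; classes nobody covers are ignored.
    rarity = np.divide(1.0, coverage, out=np.zeros_like(coverage), where=covered)
    rarity /= rarity.sum()
    weights = share @ rarity                             # one raw score per client
    weights /= weights.sum()                             # simplex: non-negative, sums to 1
    if prev_weights is not None:                         # momentum smoothing across rounds
        weights = beta * np.asarray(prev_weights, dtype=float) + (1.0 - beta) * weights
        weights /= weights.sum()
    return weights


def aggregate(client_params, weights):
    """Weighted average of flattened client parameter vectors."""
    return np.tensordot(weights, np.asarray(client_params, dtype=float), axes=1)


# Clients 0 and 1 only see classes 0-1; client 2 is the only one with class-2 evidence.
E = np.array([[5.0, 5.0, 0.0],
              [5.0, 5.0, 0.0],
              [1.0, 1.0, 1.0]])
w = contribution_weights(E)
w_smoothed = contribution_weights(E, prev_weights=np.full(3, 1 / 3))
```

With these numbers, client 2 receives about 86% of the weight. This shows how strongly this particular rule favors rare-class evidence. A practical scheme would likely temper it, for example through the momentum term.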

2605.18891 2026-05-20 cs.LG cs.AI Updated version

Auditing Reasoning-Trace Memorization Claims after Unlearning with Head-Conditioned Canaries

Yanhang Li, Zhichao Fan, Zexin Zhuang

Affiliations: Northeastern University, USA; University of Illinois Urbana-Champaign, USA; Southern Methodist University, USA

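AI summary: Using LoRA-memorized fictional authors and NPO unlearning on DeepSeek-R1-Distill-Qwen-7B, conditioned on a six-token canary head, the study audits reasoning-trace memorization claims. It finds that a positive parser-split bypass gap by itself neither identifies hidden weight-level memorization nor rules it out.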

Abstract

Evaluations of unlearning on reasoning models sometimes show a bypass pattern. The answer side looks unlearned, but the model's own thinking trace keeps emitting the forgotten content, and the gap is taken as evidence that the weights still remember. We audit this reading on DeepSeek-R1-Distill-Qwen-7B with LoRA-memorized fictional authors and NPO unlearning, conditioned on a six-token canary head. On one seed, swapping the thinking trace for a short non-canary prefill on the same weights drops the answer rate by as much as the bypass gap itself, whether the prefill mimics the training template or not. On a second seed the bypass gap shrinks rather than vanishing, and the prefill swap reverses direction and brings the answer rate to ceiling. A positive parser-split bypass gap thus does not by itself identify hidden weight-level memorization, and does not rule it out either. On a different distillate the same metric flips sign because the parser cannot find the closing tag. We recommend a decode-time template swap as a cheap sanity check alongside the canonical audit.
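To make the parser dependence concrete, here is a hypothetical sketch of a parser-split bypass gap. Each generation is split at a closing think tag. The gap is the rate at which the forgotten fact appears in the thinking segment minus the rate at which it appears in the answer segment. The `</think>` tag, the substring match, the fallback used when no tag is found, and the fictional name are all assumptions. The point is only that the fallback convention alone can flip the sign of the metric, as the abstract reports for a different distillate.

```python
def split_trace(text, close_tag="</think>", missing="all_answer"):
    """Split a generation into (thinking, answer) at the first closing tag."""
    head, sep, tail = text.partition(close_tag)
    if sep:
        return head, tail
    # No closing tag: the parser has to guess where the answer starts.
    return ("", text) if missing == "all_answer" else (text, "")


def bypass_gap(generations, forgotten, **parser_kwargs):
    """Leak rate in the thinking segment minus leak rate in the answer segment."""
    think_hits = answer_hits = 0
    for text in generations:
        thinking, answer = split_trace(text, **parser_kwargs)
        think_hits += forgotten in thinking
        answer_hits += forgotten in answer
    n = len(generations)
    return think_hits / n - answer_hits / n


fact = "Ravenna Holt"  # hypothetical memorized fictional author
tagged = ["Recall: Ravenna Holt wrote it.</think>I don't know."] * 3
untagged = ["Recall: Ravenna Holt wrote it. I don't know."] * 3
```

On identical content, `bypass_gap(tagged, fact)` is positive. Without the tag, however, the sign depends entirely on the `missing` convention.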

2605.18889 2026-05-20 cs.LG cs.AI Updated version

Soft Learning

Mohammed Aledhari, Ali Aledhari, Fatimah Aledhari, Mohamed Rahouti

Affiliations: University of North Texas; Fordham University

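AI summary: This paper proposes the Soft Learning framework, which finds optimal combination weights via cross-validated non-negative least squares. It trains 72-435x faster than deep networks, offers inherent interpretability and future extensibility, outperforms the compared methods, and ranks first on 70% of tasks.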

Abstract

Modern machine learning forces practitioners to choose between powerful but expensive deep networks and fast but limited classical algorithms. Here we introduce Soft Learning, a framework that maintains a library of heterogeneous specialists -- spanning linear models, tree ensembles, kernel machines, and neural networks -- and discovers provably optimal combination weights through cross-validated non-negative least squares. Soft Learning is guaranteed to match or exceed the best weighted combination of its specialists, trains over two orders of magnitude faster than deep networks on CPU alone (72-435x faster across tested configurations), provides inherent interpretability through learned weights that reveal which algorithmic paradigm best fits the data, and is future-proof: adding specialists is mathematically guaranteed to maintain or improve performance. Across 37 datasets (25 classification, 12 regression) against nine methods including CatBoost and tuned deep networks, Soft Learning ranks first on 70% of tasks, achieves the best mean rank (Friedman test, p = 1.12 x 10^-12), and is the only method to simultaneously excel at both classification and regression -- all without GPU hardware or hyperparameter tuning. These results suggest a paradigm shift from "which algorithm is best?" to "what is the provably optimal combination?" -- a question Soft Learning answers with formal guarantees for any data modality.
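The core combination step, non-negative least squares on cross-validated predictions, is short to sketch with SciPy's `nnls`. This assumes the out-of-fold predictions of each specialist have already been collected as the columns of a matrix. Specialist training, fold construction, and any post-hoc normalization of the weights are left out and are not taken from the paper.

```python
import numpy as np
from scipy.optimize import nnls


def combination_weights(oof_predictions, y):
    """Non-negative weights w minimizing ||P w - y||_2, where column j of P
    holds specialist j's out-of-fold predictions."""
    P = np.asarray(oof_predictions, dtype=float)
    w, _residual_norm = nnls(P, np.asarray(y, dtype=float))
    return w


def predict(specialist_predictions, w):
    """Combine fresh specialist predictions of shape (n_samples, n_specialists)."""
    return np.asarray(specialist_predictions, dtype=float) @ w


rng = np.random.default_rng(0)
P = rng.normal(size=(200, 3))        # out-of-fold predictions of 3 specialists
y = 0.3 * P[:, 0] + 0.7 * P[:, 1]    # synthetic target mixing the first two
w = combination_weights(P, y)
```

Because the weights are only constrained to be non-negative, a specialist that does not help the cross-validated fit can receive weight zero. This is the intuition behind the claim that adding specialists cannot worsen the fitted combination on the cross-validated predictions: the previous solution remains feasible.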

2605.18884 2026-05-20 cs.LG cs.CV Updated version

Navigating the Emotion Tree: Hierarchical Hyperbolic RAG for Multimodal Emotion Recognition

Zeheng Wang, Bo Zhao, Yijie Zhu, Zhishu Liu, Hui Ma, Ruixin Zhang, Shouhong Ding, Qianyu Xie, Zitong Yu

Affiliations: Great Bay University; Tencent Youtu Lab

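AI summary: This paper proposes HyperEmo-RAG, a retrieval-augmented generation framework that leverages a structured emotion knowledge base. It improves multimodal emotion recognition through hyperbolic-space embedding and evidence-graph construction.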

Abstract

Multimodal emotion recognition aims to integrate text, audio, and video sources to understand human affective states. Although multimodal large language models excel at multimodal reasoning, they typically treat emotion categories as independent labels, ignoring the rich hierarchical taxonomy of human psychology. Moreover, lacking external contextual knowledge makes them highly susceptible to over-interpreting noisy cues, further complicating fine-grained emotion classification. To address these issues, we propose HyperEmo-RAG, a retrieval-augmented generation framework that leverages a structured emotional knowledge base. Our framework introduces two key innovations. 1) Hierarchical hyperbolic grounding. Recognizing the inherent hierarchical tree structure of emotion taxonomies, we jointly embed hierarchical emotion labels and multimodal samples into a continuous hyperbolic space (Poincaré ball) and design a hierarchical beam-search deliberation process that progressively retrieves samples from coarse to fine-grained levels. 2) Structured evidence injection. Based on the retrieved evidence, we construct an evidence graph and inject the structured knowledge as explicit cognitive context into the LLM through a Tree-Aware Attention mechanism and an EmotionGraphFormer, preserving the integrity of graph-structured information. Experiments on multiple datasets demonstrate that HyperEmo-RAG significantly outperforms existing methods.
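Two hyperbolic ingredients are named above: distance in the Poincaré ball and a coarse-to-fine beam search over the emotion tree. Both can be sketched in a few lines. The distance is the standard Poincaré-ball metric. The toy tree, its embeddings, and the beam width are hypothetical stand-ins, and the search assumes all leaves sit at the same depth. The sketch does not reproduce the paper's deliberation process or its retrieval of samples.

```python
import numpy as np


def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points strictly inside the unit ball."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    sq = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return float(np.arccosh(1.0 + 2.0 * sq / max(denom, eps)))


def hierarchical_beam_search(query, children, embeddings, root="root", beam=2):
    """Descend the label tree level by level, keeping the `beam` labels closest
    to the query. Assumes every leaf is at the same depth."""
    frontier = [root]
    while True:
        candidates = [c for node in frontier for c in children.get(node, [])]
        if not candidates:
            return frontier
        candidates.sort(key=lambda c: poincare_distance(query, embeddings[c]))
        frontier = candidates[:beam]


# Hypothetical two-level taxonomy: coarse labels near the origin, leaves near the boundary.
children = {"root": ["positive", "negative"],
            "positive": ["joy", "gratitude"],
            "negative": ["anger", "sadness"]}
emb = {"positive": [0.3, 0.0], "negative": [-0.3, 0.0],
       "joy": [0.6, 0.5], "gratitude": [0.6, -0.5],
       "anger": [-0.6, 0.5], "sadness": [-0.6, -0.5]}
query = np.array([0.55, 0.45])
```

Placing coarse labels near the origin and fine labels near the boundary is the usual way hyperbolic embeddings encode a hierarchy. Distances grow quickly toward the boundary, which leaves room for many leaves.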

2605.18883 2026-05-20 cs.LG cs.AI Updated version

Prediction Is Not Physics: Learning and Evaluating Conserved Quantities in Neural Simulators

Andrew Bukowski, Aditya Kothari, Simba Shi, Ishir Rao

Affiliations: Yale University

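AI summary: This paper studies whether neural networks can learn or select globally conserved quantities from physical trajectories. It evaluates how well different models preserve conservation laws on three Hamiltonian systems (projectile motion, pendulum, and spring-mass). The black-box CDN performs well when trained with a temporal-consistency loss plus a small alignment loss, while the polynomial CDN is sensitive to training configuration.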

Comments 10 pages

Abstract

A diffusion model trained on Hamiltonian trajectories can achieve rollout MSE near $10^{-3}$, but the standard deviation of its energy over time is between 7500 and 36000 times larger than the ground-truth energy standard deviation, indicating a failure to preserve conservation laws. This gap motivates our central question of whether neural networks can learn or select globally conserved quantities from physical trajectories. We investigate this across three Hamiltonian systems: projectile motion, pendulum, and spring-mass. We use a structured $T(v)+V(q)$ energy model, a black-box Conservation Discovery Network (CDN), a polynomial CDN, and a conditional diffusion baseline. The structured network reaches $R^2 \geq 0.9999$ against analytical energy on clean data, while the black-box CDN reaches $R^2 \geq 0.996$ when trained with temporal consistency plus a small alignment loss to analytical energy at $t=0$ ($λ_{\mathrm{align}}=0.2$). With $λ_{\mathrm{align}}=0$, CDN Pearson $R^2$ collapses on pendulum and spring-mass ($< 10^{-3}$), showing that temporal consistency alone is not enough to reliably identify the true energy. Under $1\%$ additive Gaussian noise, the CDN outperforms the structured model on the projectile and spring-mass systems, suggesting that the CDN may be more robust to noisy inputs in this setting. However, the polynomial CDN is sensitive to training configuration: it achieves $R^2=0.78$ under a short training schedule on the pendulum system, but reaches $R^2=0.9998$ with more training time and data, regardless of whether noise is added.
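The role of the alignment term can be seen in a toy version of the training objective. The sketch assumes two loss forms, since the abstract does not give the exact definitions or weighting. The temporal-consistency loss is taken to be the variance of the learned energy along a trajectory. The alignment loss is taken to be the squared error against the analytical energy at t = 0. A constant function trivially satisfies temporal consistency, which is one way to see why consistency alone cannot pin down the true energy.

```python
import numpy as np


def spring_mass_energy(q, v, m=1.0, k=1.0):
    """Analytical energy T(v) + V(q) of a spring-mass system."""
    return 0.5 * m * v ** 2 + 0.5 * k * q ** 2


def conservation_loss(energy_fn, q, v, lam_align=0.2):
    """Temporal-consistency loss plus a weighted alignment term at t = 0 (assumed forms)."""
    e = energy_fn(q, v)
    temporal = np.var(e)                                   # learned energy should not drift
    align = (e[0] - spring_mass_energy(q[0], v[0])) ** 2   # anchor to true energy at t = 0
    return temporal + lam_align * align, temporal, align


def constant_energy(q, v):
    """Degenerate candidate: constant in time, hence perfectly 'conserved'."""
    return np.full_like(q, 3.0)


t = np.linspace(0.0, 10.0, 1001)
q, v = np.cos(t), -np.sin(t)          # exact trajectory for m = k = 1

total_true, temporal_true, _ = conservation_loss(spring_mass_energy, q, v)
total_const, temporal_const, align_const = conservation_loss(constant_energy, q, v)
```

Both candidates have zero temporal loss. Only the alignment term separates the true energy from the constant, which mirrors the collapse reported for $λ_{\mathrm{align}}=0$.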

2605.18882 2026-05-20 cs.LG cs.AI Updated version

To Call or Not to Call: Diagnosing Intrinsic Over-Calling Bias in LLM Agents

Wei Shi, Ziheng Peng, Sihang Li, Xiting Wang, Xiang Wang, Mengnan Du, Na Zou

Affiliations: Shanghai Jiao Tong University; Shanghai Artificial Intelligence Laboratory; Renmin University of China; The Chinese University of Hong Kong, Shenzhen; University of Science and Technology of China

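AI summary: This paper studies over-calling in LLM agents and proposes the Intrinsic Bias Hypothesis. Using sparse autoencoders, it recovers behavior-aligned feature bases, reduces them to a signed activation margin, and estimates the call offset directly. Cancelling this offset corrects over-calling.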

Abstract

LLM agents exhibit a consistent tendency to over-call, invoking tools even in situations where none is needed. On the When2Call benchmark, six models from three families show high call accuracy but much lower no-call accuracy, leaving overall accuracy in the 55%-70% range. We trace this to an Intrinsic Bias Hypothesis (IBH): the call/no-call decision mapping carries an activation-independent call offset, so the model favors call even at activation parity. Using Sparse Autoencoders (SAEs), we recover behavior-aligned feature bases for the call/no_call decision, reduce them to a signed activation margin, and estimate the offset directly. Across all six models, the model is decision-neutral only when no_call activation outweighs call activation, consistent with IBH. We then causally test IBH with Adaptive Margin-Calibrated Steering (AMCS), a closed-form counter-bias shift along SAE decoder directions. Cancelling the diagnosed offset mitigates over-calling and improves overall accuracy with a negligible drop in call accuracy. Our work recasts over-calling from an empirical phenomenon into a mechanistic object amenable to causal correction. Code is available at https://github.com/SKURA502/agent-sae/.
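Two steps are described here: locating the decision-neutral point of the signed margin, and shifting activations to cancel the offset. Both can be illustrated on synthetic data. This is a hypothetical simplification rather than AMCS. It treats the margin as a linear readout `w @ h` of a hidden state and estimates the neutral point with a one-dimensional threshold fit. It then moves `h` along the readout direction so the margin drops by exactly the estimated offset.

```python
import numpy as np


def estimate_call_offset(margins, called):
    """Fit the threshold t that best separates call / no-call decisions on the
    signed margin (call activation minus no_call activation). The model is
    neutral at margin t, so the intrinsic call offset is -t."""
    margins, called = np.asarray(margins, dtype=float), np.asarray(called, dtype=bool)
    m = np.sort(margins)
    candidates = (m[:-1] + m[1:]) / 2.0
    errors = [np.mean((margins > t) != called) for t in candidates]
    return -candidates[int(np.argmin(errors))]


def cancel_offset(h, readout, offset):
    """Shift h along the readout direction so that readout @ h falls by `offset`."""
    readout = np.asarray(readout, dtype=float)
    return h - offset * readout / np.dot(readout, readout)


margins = np.linspace(-1.0, 1.0, 201)
called = margins + 0.4 > 0          # synthetic model that calls even at negative margins
offset = estimate_call_offset(margins, called)

rng = np.random.default_rng(0)
w, h = rng.normal(size=8), rng.normal(size=8)
h_steered = cancel_offset(h, w, offset)
```

A positive `offset` corresponds to the pattern in the abstract: the model is neutral only when no_call activation outweighs call activation.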

2605.18881 2026-05-20 cs.LG physics.flu-dyn Updated version

Emergence of a Flow-Assisted Casting Strategy for Olfactory Navigation via Memory-Augmented Reinforcement Learning

Changxu Zhao, Dongxiao Zhao, Xin Bian, Gaojin Li

Affiliations: State Key Laboratory of Ocean Engineering, School of Ocean and Civil Engineering, Shanghai Jiao Tong University, Shanghai 200240, People’s Republic of China; State Key Laboratory of Fluid Power and Mechatronic Systems, Department of Engineering Mechanics, Zhejiang University, Hangzhou 310027, People’s Republic of China

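AI summary: Using memory-augmented reinforcement learning, the study investigates how memory length and flow conditions shape the efficiency of odor search in dynamic flow fields. Agents maximize their probability of success by adaptively adjusting the geometry of their search trajectories and the concentration threshold that triggers casting.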

2605.18880 2026-05-20 cs.LG cs.CV q-bio.QM Updated version

A Multi-Dimensional Clustering Approach for Identifying Inborn Errors of Immunity

Nishad Kulkarni, Alexandra K. Martinson, Nicholas L. Rider, Michael Keller, Syed Muhammad Anwar

Affiliations: Sheikh Zayed Institute for Pediatric Surgical Innovation, Children's National Hospital, Washington, DC; Children's National Hospital, Washington, DC; Department of Health Systems & Implementation Science, Division of Allergy & Immunology, Virginia Tech Carilion School of Medicine, Roanoke, VA; Division of Allergy & Immunology, Children's National Hospital, Washington, DC; School of Medicine and Health Sciences, George Washington University, Washington, DC

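AI summary: This paper proposes a multi-dimensional clustering approach for recognizing novel rare-disease patterns and extracting features associated with inborn errors of immunity (IEI) from a national data registry. It refines awareness of IEI features and develops data toolkits for analyzing rare-disease populations. It also extends the transformation of complex medical records into data structures interpretable by unsupervised ML.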

Comments Accepted at EMBC 2026

Abstract

Rare diseases such as inborn errors of immunity (IEI) require early diagnosis to prevent end-organ damage and improve quality of life. Hurdles in accessing and curating large-scale electronic health record (EHR) data keep routine data-driven analyses from staying at the forefront of IEI and other rare-disease trends. Development of machine learning (ML) algorithms for pattern recognition in IEI is limited, as is published methodology on how to systematically process and integrate complex medical data. Our proposed pipeline, including data curation and ML clustering algorithms, is designed to recognize novel rare-disease patterns and extract IEI-associated features from a national data registry. Our methodology for EHR data formatting and processing presents a pipeline that transforms raw immunologic lab data into vectors. This is further combined with hyperparameter tuning for disease pattern recognition via clustering. This study refines IEI feature awareness, develops data toolkits for rare-disease population analysis, and expands on transforming complex medical records into data structures interpretable by unsupervised ML.
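A compact sketch of the kind of pipeline described above follows. It pivots raw lab records into per-patient vectors, imputes and standardizes them, then tunes the number of clusters by silhouette score. Everything here is an assumption for illustration: the record format, the lab test names `IgG` and `CD4` in the synthetic example, mean imputation, k-means with farthest-first initialization, and silhouette as the tuning criterion. The abstract does not specify the paper's registry, features, or clustering algorithms.

```python
import numpy as np


def records_to_matrix(records):
    """Pivot (patient_id, lab_test, value) rows into a patients x tests matrix.
    Repeated measurements are averaged; tests never measured become NaN."""
    patients = sorted({p for p, _, _ in records})
    tests = sorted({t for _, t, _ in records})
    row = {p: i for i, p in enumerate(patients)}
    col = {t: j for j, t in enumerate(tests)}
    sums = np.zeros((len(patients), len(tests)))
    counts = np.zeros_like(sums)
    for p, t, value in records:
        sums[row[p], col[t]] += value
        counts[row[p], col[t]] += 1
    with np.errstate(invalid="ignore"):
        X = sums / counts                    # 0 / 0 -> NaN for missing tests
    return patients, tests, X


def impute_and_standardize(X):
    """Column-mean imputation followed by z-scoring."""
    X = np.array(X, dtype=float)
    X = np.where(np.isnan(X), np.nanmean(X, axis=0), X)
    std = X.std(axis=0)
    return (X - X.mean(axis=0)) / np.where(std > 0, std, 1.0)


def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm with deterministic farthest-first initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        new = np.array([X[labels == j].mean(0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def silhouette(X, labels):
    """Mean silhouette coefficient (0 for points in singleton clusters)."""
    D = np.sqrt(((X[:, None, :] - X[None]) ** 2).sum(-1))
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() <= 1:
            scores.append(0.0)
            continue
        a = D[i, same].sum() / (same.sum() - 1)
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))


def select_k(X, ks=(2, 3, 4, 5)):
    """Hyperparameter tuning: pick the cluster count with the highest silhouette."""
    scored = {k: silhouette(X, kmeans(X, k)) for k in ks}
    return max(scored, key=scored.get), scored


rng = np.random.default_rng(0)
records = []
for i in range(20):                          # two hypothetical lab profiles
    igg, cd4 = (1.0, 8.0) if i < 10 else (6.0, 2.0)
    records.append((f"p{i:02d}", "IgG", igg + 0.1 * rng.standard_normal()))
    records.append((f"p{i:02d}", "CD4", cd4 + 0.1 * rng.standard_normal()))
patients, tests, X = records_to_matrix(records)
best_k, scores = select_k(impute_and_standardize(X))
```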

2605.18878 2026-05-20 eess.SP cs.CV cs.LG eess.IV Updated version

Prognostic Value of Lung Ultrasound Biomarkers for Readmission Risk in Congestive Heart Failure: A Pilot Data-Driven Analysis

Jana Armouti, Laura Hutchins, Jacob Duplantis, Thomas Deiss, Thales Nogueira Gomes, Keyur H. Patel, Seema Walvekar, Shane Guillory, Thomas H. Fox, Amita Krishnan, Ricardo Rodriguez, Bennett DeBoisblanc, Deva Ramanan, John Galeotti, Gautam Gare

Affiliations: Carnegie Mellon University; LSUHSC Internal Medicine; Cosmetic Surgery Facility LLC

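AI summary: This study applies a data-driven approach to B-mode lung ultrasound (LUS) acquired during hospitalization to predict 30-day readmission risk in heart failure. Dependent lower-lung regions, temporal-difference features, and multi-view feature concatenation give the best predictive performance. These results demonstrate the practicality of ultrasound biomarkers for noninvasive heart-failure risk stratification.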

Abstract

Hospital readmission within 30 days of discharge is a leading driver of morbidity, mortality, and avoidable healthcare expenditure in congestive heart failure (CHF). Current clinical risk stratification tools rely primarily on non-imaging data and exhibit limited predictive performance. Point-of-care lung ultrasound (LUS) offers a sensitive, noninvasive window into the pulmonary congestion that characterizes CHF decompensation, yet its prognostic utility for readmission prediction remains largely unexplored. We present a pilot feasibility study, the first systematic machine learning study using B-mode LUS acquired during hospitalization to predict 30-day CHF readmission. Quantitative spatiotemporal embeddings are extracted from a pretrained Temporal Shift Module (TSM) ResNet-18 encoder, and interpretable biomarker features are separately evaluated. Through structured ablations over lung view, temporal representation, multi-view fusion, and cross-lung augmentation, we identify the key imaging factors driving readmission risk. Our findings reveal that (1) dependent lower-lung regions (Left-3, Right-3) carry the strongest prognostic signal, consistent with their greater susceptibility to hydrostatic congestion; (2) temporal difference features between sequential examinations substantially outperform single-timepoint representations, highlighting the importance of capturing disease trajectory; and (3) multi-view feature concatenation yields the best overall performance, with our top MLP model achieving an F1 score of 0.80 (95% CI: 0.62-0.96). Biomarker analysis further reveals that pleural-line abnormalities, including breaks and indentations, are as informative as the canonical A-line and B-line markers. These results support POCUS-derived biomarkers as practical, interpretable tools for noninvasive CHF risk stratification.
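Two pieces of the evaluation lend themselves to a short sketch. The first is temporal-difference features concatenated across lung views. The second is a percentile-bootstrap confidence interval for F1, like the one reported above. The embedding size, the bootstrap settings, and the resampling unit are assumptions, and the view names are simply reused from the abstract. The abstract does not say how the interval was computed.

```python
import numpy as np


def temporal_multiview_features(embeddings, views=("Left-3", "Right-3")):
    """Concatenate (later exam - earlier exam) embedding differences across views.
    `embeddings[view]` is a (first_exam, second_exam) pair of 1-D vectors."""
    return np.concatenate([embeddings[v][1] - embeddings[v][0] for v in views])


def f1_score(y_true, y_pred):
    """F1 = 2 TP / (2 TP + FP + FN) for binary labels."""
    y_true, y_pred = np.asarray(y_true, dtype=bool), np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    return 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0


def bootstrap_f1_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for F1, resampling cases with replacement."""
    rng = np.random.default_rng(seed)
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        stats.append(f1_score(y_true[idx], y_pred[idx]))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return f1_score(y_true, y_pred), float(lo), float(hi)


emb = {"Left-3": (np.zeros(4), np.ones(4)), "Right-3": (np.ones(4), np.ones(4))}
feats = temporal_multiview_features(emb)

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 0] * 3)
y_pred = y_true.copy()
y_pred[::7] = 1 - y_pred[::7]                # flip a few predictions
point, lo, hi = bootstrap_f1_ci(y_true, y_pred, n_boot=500)
```

With a small pilot cohort, bootstrap intervals like this one are wide. That is consistent with the 0.62-0.96 range quoted above.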

2605.18873 2026-05-20 cs.CR cs.AI cs.LG Updated version

GenAI-FDIA: Physics-Informed Generative Models for False Data Injection Attacks

Mohammad A. Razzaque, Muta Tah Hira

Affiliations: School of Computing, Engineering and Digital Technologies, Teesside University, UK; Smartifier Ltd, Stockton-on-Tees, UK

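AI summary: This paper proposes the GenAI-FDIA framework, which synthesizes false data injection attacks with physics-compliant generative models. It validates the effectiveness of different architectures on power-system testbeds and resolves a previously unreported failure mode that arises in these generative models.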

Comments Submitted to IEEE Transactions on Smart Grid

Abstract

Training and evaluating false data injection attack (FDIA) detectors for power systems is constrained by data scarcity. Operational grid measurements are commercially sensitive, and hand-crafted attacks fail to capture complex distributional structures imposed by network physics. We present GenAI-FDIA, a framework benchmarking a pool of $P{=}20$ architectures for physics-compliant FDIA synthesis, spanning Wasserstein GANs, MMD-VAEs, normalising flows, diffusion models, and cross-family hybrids. These are evaluated across three IEEE testbeds (14-bus DC, 30-bus DC, and 14-bus AC) under a 60/20/20 chronological split using data-driven Bad Data Detection (BDD) threshold calibration. Our empirical results verify that these models generate high-fidelity attacks, with all architectures achieving evasion rates of $\varepsilon_{\text{BDD}} \ge 86.6\%$ on the 14-bus network; additionally, limiting an attacker's topological knowledge induces a measurable degradation in stealthiness ($p \le 0.0022$). Crucially, we identify a previously unreported failure mode: applying affine physics projections directly in normalised feature spaces critically displaces the attack vector, collapsing BDD evasion from ${\sim}55\%$ to $<\!2\%$ on the 30-bus testbed. We resolve this via a novel inference-time harmoniser, restoring full stealthiness ($\varepsilon_{\text{BDD}} = 100\%$) across all physics-informed variants without retraining. Finally, we isolate a covariance-collapse phenomenon ($\kappa \approx -0.076$) within advanced hybrid architectures and rectify it through 50-epoch warm-up schedules ($\kappa \to 0.785$, $\Delta\text{MMD} = -3.1\%$). Ultimately, GenAI-FDIA delivers a robust recovery blueprint applicable to any physics-constrained generative model deployed for power-system security.
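For intuition about the stealthiness results, here is the textbook DC state-estimation picture in NumPy. It contains a residual-based bad data detector and an attack projected onto the column space of the measurement matrix `H`, which leaves the residual unchanged. The second attack applies the same projection in a per-measurement normalized space and then maps it back. This is one illustrative way an affine projection done in the wrong coordinates can knock an attack off the stealthy subspace. It is not a reconstruction of the paper's failure mode or its harmoniser. `H`, the scales `sigma`, and the threshold rule are hypothetical.

```python
import numpy as np


def residual(H, z):
    """Least-squares state estimate and measurement residual z - H x_hat
    (unweighted, i.e. equal measurement variances, for simplicity)."""
    x_hat, *_ = np.linalg.lstsq(H, z, rcond=None)
    return z - H @ x_hat


def bdd_alarm(H, z, tau):
    """Residual-norm bad data detector: alarm if ||r|| exceeds tau."""
    return np.linalg.norm(residual(H, z)) > tau


def project_to_column_space(H, a):
    """Closest vector to `a` of the form H c; such attacks do not change the residual."""
    c, *_ = np.linalg.lstsq(H, a, rcond=None)
    return H @ c


rng = np.random.default_rng(1)
H = rng.normal(size=(8, 3))                      # hypothetical DC measurement matrix
z = H @ rng.normal(size=3) + 0.01 * rng.normal(size=8)
sigma = np.arange(1.0, 9.0)                      # hypothetical per-measurement scales
raw_attack = 10.0 * rng.normal(size=8)
tau = 3.0 * np.linalg.norm(residual(H, z))       # illustrative threshold

# Projection in raw measurement space: stays in col(H), so it is stealthy.
a_raw = project_to_column_space(H, raw_attack)
# Same projection done in normalised space, then mapped back: generally leaves col(H).
a_norm = sigma * project_to_column_space(H, raw_attack / sigma)
```

Because the residual is linear in `z`, any attack of the form `H @ c` passes the detector unchanged. Mapping a normalized-space projection back through `sigma` breaks that structure whenever the scales differ across measurements.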

2605.18872 2026-05-20 cs.LG cs.AI cs.RO Updated version

EUPHORIA: Efficient Universal Planning via Hybrid Optimization for Robust Industrial Robotic Assembly

Shih-Yu Lai, Chia-Ching Yen, Yang-Ting Shen, Peter Yichen Chen, Yu-Lun Liu, Bing-Yu Chen

Affiliations: National Taiwan University; MoonShine Animation Studio; National Cheng Kung University; The University of British Columbia; National Yang Ming Chiao Tung University

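AI summary: This paper proposes EUPHORIA, a framework that uses a hybrid optimization strategy to achieve universal few-shot adaptability and dynamic efficiency. It addresses the over-specialization and operational inefficiency of planners for robotic assembly in architectural construction. It combines a Meta-Geometric Encoder, a Physics-Informed Graph Transformer, and Residual Stability Correction to produce efficient and robust assembly plans.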

Abstract

Robotic assembly in architectural construction faces a persistent bottleneck: existing planners are either highly specialized, requiring prohibitive retraining for every new geometric design, or operationally inefficient, treating structural sequencing and kinematic motion as disjoint processes. We present EUPHORIA, a unified framework that achieves universal few-shot adaptability and dynamic efficiency through a hybrid optimization strategy. To overcome the retraining bottleneck, we propose a Meta-Geometric Encoder based on Graph Hypernetworks: unlike standard contrastive learning, which performs only feature-level recognition, our hypernetwork dynamically generates policy parameters from a minimal support set, enabling parameter-level adaptation to complex topologies (e.g., domes, arches) without gradient-based retraining. For structural reasoning, we introduce a Physics-Informed Graph Transformer trained via Soft Actor-Critic (SAC), with a Physics-Bias Attention mechanism that modulates attention scores using contact forces from Discrete Element Model (DEM) simulations, guiding the planner toward structurally critical connections. We further ensure operational efficiency through Kinematics-Aware Sequencing, where the SAC objective penalizes high-energy transitions. Finally, we bridge the Sim2Real gap via Residual Stability Correction, a differentiable optimization layer that fine-tunes coarse assembly actions by minimizing a joint energy-stability cost prior to execution. Experiments show that EUPHORIA significantly reduces energy consumption over decoupled baselines and achieves state-of-the-art success rates on unseen, non-standard geometries with minimal few-shot examples, fusing meta-learning, physics-informed attention, and residual optimization into a cohesive, generalized planner.
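The abstract says the Physics-Bias Attention mechanism modulates attention scores with contact forces from DEM simulations, but it does not give the formula. Below is a minimal sketch that assumes an additive bias of `beta * log1p(|force|)` on the scaled dot-product logits. Both the `log1p` form and `beta` are hypothetical choices.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def physics_bias_attention(Q, K, V, contact_force, beta=1.0):
    """Scaled dot-product attention whose logits get an additive bias that grows
    with the contact force between elements i and j (hypothetical log1p form)."""
    d = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d) + beta * np.log1p(np.abs(contact_force))
    weights = softmax(logits, axis=-1)
    return weights @ V, weights


rng = np.random.default_rng(0)
n, d = 4, 8                                   # 4 assembly elements, 8-dim features
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
F = np.zeros((n, n))
F[0, 2] = 50.0                                # strong simulated contact between 0 and 2
out, att = physics_bias_attention(Q, K, V, F, beta=1.0)
_, plain = physics_bias_attention(Q, K, V, F, beta=0.0)
```

With `beta = 0` this reduces to ordinary attention. A large contact force raises the weight on that pair, which is one way to push the planner toward structurally critical connections.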

2605.18871 2026-05-20 cs.LG cs.AI Updated version

Distributional Energy-Based Models for Uncertainty-Aware Structured LLM Reasoning

Shireen Kudukkil Manchingal, Abhey Kalia, Fernanda Gonçalves, Shebin Rawther

Affiliations: Oxford Dynamics, Harwell Science and Innovation Campus

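AI summary: This paper proposes a decomposed energy function that combines a learned quality scorer with deterministic analytical constraint penalties to verify structured LLM outputs. A two-pass inference loop triggers targeted regeneration or abstention. The method outperforms single-shot Qwen-72B on every benchmark and reduces constraint violations.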

Abstract

When Large Language Models produce structured outputs such as travel plans, code solutions, or multi-step proofs, individual reasoning steps may appear correct while the output as a whole violates budgets, fails test cases, or contradicts earlier deductions. We propose a decomposed energy function that combines a learned quality scorer with deterministic analytical constraint penalties for verifying structured LLM outputs. The quality scorer is a heterogeneous ensemble of low-rank adapters on a single frozen encoder (3% trainable parameters); the ensemble mean ranks candidates while the standard deviation quantifies epistemic uncertainty, driving a two-pass inference loop that triggers targeted regeneration or abstention. Across five benchmarks (GSM8K, MuSR, TravelPlanner, TACO, Knights & Knaves), our 149M-parameter verifier orchestrating a pool of 7-26B open generators outperforms single-shot Qwen-72B on every benchmark, matches Claude Sonnet 4.6 on MuSR (67.7% vs. 68.0%), and reduces constraint violations by 53% relative to Opus 4.6 on TravelPlanner (oracle 0.028, random 0.231). The two routes are complementary: structural verification wins when constraints are checkable (the verifier captures signal frontier models cannot self-detect), while pretraining-scale priors win where they are not (narrative inference, code semantics). A cross-dataset confounding analysis confirms genuine quality discrimination on four reasoning tasks and identifies a model-identity shortcut on code, mitigated via last-layer retraining. Scorers trained on difficult data transfer zero-shot: a MuSR-trained scorer achieves 93.9% on GSM8K without seeing a math problem.
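A hypothetical sketch of the decomposed energy and the two-pass loop follows. It assumes that each candidate carries per-member ensemble scores (higher is better) and a vector of analytical constraint penalties. It also assumes the energy is the negative ensemble mean plus a weighted penalty sum, and that a single uncertainty threshold `tau_std` decides between accepting, regenerating once, and abstaining. The thresholds, weights, and the `regenerate` callback are illustrative, not the paper's settings.

```python
import numpy as np


def energy(ensemble_scores, penalties, penalty_weights):
    """Decomposed energy (lower is better): negative mean learned quality plus
    weighted analytical constraint penalties. Also returns the ensemble std."""
    scores = np.asarray(ensemble_scores, dtype=float)
    constraint = float(np.dot(penalty_weights, penalties))
    return -scores.mean() + constraint, float(scores.std())


def two_pass_select(candidates, regenerate, weights, tau_std=0.15):
    """Pick the lowest-energy candidate. If the ensemble disagrees about it,
    regenerate once and re-score; if disagreement persists, abstain (None)."""
    for attempt in range(2):
        scored = [(*energy(c["scores"], c["penalties"], weights), i)
                  for i, c in enumerate(candidates)]
        _, best_std, best_idx = min(scored)
        if best_std <= tau_std:
            return candidates[best_idx]
        if attempt == 0:
            candidates = regenerate(candidates[best_idx])
    return None


weights = np.array([1.0, 1.0])
confident = {"name": "a", "scores": [0.80, 0.82, 0.79], "penalties": [0.0, 0.0]}
violating = {"name": "b", "scores": [0.90, 0.91, 0.90], "penalties": [1.0, 0.0]}
uncertain = {"name": "c", "scores": [0.95, 0.20, 0.90], "penalties": [0.0, 0.0]}
```

In this toy setup, the candidate with the highest raw scores loses to the confident one because it violates a hard constraint. The uncertain candidate is never accepted directly.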

2605.18869 2026-05-20 cs.LG cs.AI cs.NE Updated version

MO-CAPO: Multi-Objective Cost-Aware Prompt Optimization

Jan Büssing, Moritz Schlager, Timo Heiß, Tom Zehle, Matthias Feurer

Affiliations: Technical University of Munich (TUM), Munich Center for Machine Learning (MCML); LMU Munich, Munich Center for Machine Learning (MCML); University of Freiburg, ELLIS Institute Tübingen; TU Dortmund University, Lamarr Institute for Machine Learning

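AI summary: This paper proposes MO-CAPO, a multi-objective prompt optimization algorithm that jointly optimizes performance and inference cost and uses budget allocation for efficient optimization. Evaluated on four tasks and three LLMs, it outperforms the NSGA-II baseline on the noisy R2 metric in 8 of 12 cases and reaches competitive performance at a lower budget.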

Abstract

Large language models (LLMs) achieve strong performance across a wide range of tasks but are highly sensitive to prompt design, motivating the need for automatic prompt optimization. Existing methods predominantly focus on performance alone, ignoring competing objectives such as inference cost or latency. At the same time, existing work on multi-objective prompt optimization relies on off-the-shelf NSGA-II, ignoring optimization efficiency. As a remedy, we introduce MO-CAPO, a novel multi-objective prompt optimization algorithm that jointly optimizes performance and inference cost while leveraging budget allocation for cost-efficient optimization. We further propose a deployment-oriented cost objective that captures the full computational profile of LLM inference. We evaluate our approach across four tasks and three LLMs and compare it to an NSGA-II-based multi-objective method and state-of-the-art single-objective prompt optimizers. Results show that MO-CAPO consistently identifies strong, robust, and diverse Pareto front approximations while maintaining cost-efficiency. It outperforms the NSGA-II baseline in 8 out of 12 cases in terms of the noisy R2 metric and often achieves competitive performance at a considerably lower budget. The discovered solution sets span diverse performance-cost trade-offs that single-objective optimizers miss, yet the top-performing candidates remain competitive with single-objective solutions. Additionally, we conduct the first evaluation of multi-objective machine learning experiments that considers generalization and robustness through noisy R2 and approximation gap, enabling a more realistic assessment of solution quality. MO-CAPO enables practitioners to select from an efficiently discovered set of prompts offering different trade-offs between performance and cost.
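To make the Pareto-front and R2 vocabulary concrete, the sketch below filters non-dominated prompts on (accuracy, cost). It then computes the standard R2 indicator with weighted Tchebycheff utilities. This is not MO-CAPO: there is no budget allocation, and the paper's noisy R2 variant is not reproduced. The prompt scores and the ideal point are made up. In practice the objectives would be normalized before computing R2.

```python
import numpy as np


def pareto_front(points):
    """Indices of non-dominated points; every objective is minimised."""
    P = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(P):
        # p is dominated if some point is <= in all objectives and < in at least one.
        dominated = np.any(np.all(P <= p, axis=1) & np.any(P < p, axis=1))
        if not dominated:
            keep.append(i)
    return keep


def r2_indicator(points, ideal, n_weights=101):
    """Standard R2 for two minimised objectives: average over weight vectors of the
    best weighted Tchebycheff distance to the ideal point (lower is better)."""
    P = np.asarray(points, dtype=float)
    lam = np.linspace(0.0, 1.0, n_weights)
    W = np.stack([lam, 1.0 - lam], axis=1)                       # (n_weights, 2)
    tcheb = np.max(W[:, None, :] * np.abs(P[None] - ideal), axis=2)
    return float(tcheb.min(axis=1).mean())


# Hypothetical prompts scored as (accuracy, cost); negate accuracy to minimise it.
prompts = [(0.90, 10.0), (0.80, 5.0), (0.70, 6.0), (0.95, 20.0)]
objs = [(-acc, cost) for acc, cost in prompts]
front = pareto_front(objs)
ideal = np.array([-1.0, 0.0])
```

The third prompt is both less accurate and more expensive than the second, so it drops out. Each remaining prompt represents a different accuracy-cost trade-off a practitioner could choose.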

2605.18868 2026-05-20 cs.CR cs.AI cs.CV cs.LG Updated version

DarkLLM: Learning Language-Driven Adversarial Attacks with Large Language Models

Ye Sun, Xin Wang, Jiaming Zhang, Yifeng Gao, Yixu Wang, Yifan Ding, Qixian Zhang, Henghui Ding, Xingjun Ma, Yu-Gang Jiang

Affiliations: Fudan University; Nanyang Technological University; Tongji University

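AI summary: This paper proposes DarkLLM, an LLM-based adversarial attack framework that translates natural-language attack instructions into latent attack vectors to generate effective adversarial perturbations. It unifies multiple attack types and enables flexible, controllable adversarial generation.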

Comments 23 pages, 13 figures

English abstract

While vision and multimodal foundation models underpin critical tasks from perception to complex reasoning, they remain highly vulnerable to adversarial attacks. However, traditional adversarial attacks are typically limited to single, predefined objectives, tightly coupling each attack to a specific model or task, which restricts their scalability and flexibility in real-world scenarios. In this work, we present DarkLLM, a novel attack framework that trains an LLM to translate natural-language attack instructions into latent attack vectors, which are then decoded into visual adversarial perturbations. By leveraging natural-language instruction tuning, DarkLLM not only unifies targeted, untargeted, segmentation, and multi-model attacks within a single framework, but also achieves flexible and controllable adversarial generation, enabling each instruction to produce a perturbation that induces desired behaviors across heterogeneous models. Through extensive experiments across 4 tasks, 13 datasets, and 15 models, we demonstrate that DarkLLM with only 1B parameters can follow attacker instructions and generate highly effective attacks against CLIP, SAM, and frontier LLMs, revealing a systemic vulnerability in modern foundation models.

2605.18867 2026-05-20 cs.LG cs.AI Version update

EVA-0: Test-Time Model Evolution with Only Two Forward Passes per Sample

Guohao Chen, Shuaicheng Niu, Geng Li, Yunbei Zhang, Shilin Shan, Chunyan Miao, Jianfei Yang

Affiliations: Nanyang Technological University; Tulane University

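AI summary: This paper studies test-time model evolution under a budget of only two forward passes per sample and proposes the EVA-0 framework, which addresses three key obstacles in zeroth-order optimization to enable efficient deployment.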

English abstract

Test-time model evolution offers a promising way for deployed models to improve from unlabeled test-time experience, yet most existing methods depend on backpropagation (BP), which incurs substantial memory overhead and makes them difficult to deploy on edge devices, quantized models, specialized accelerators, or black-box models. In this work, we study test-time model evolution under a strict two-forward budget, a setting that pushes adaptation toward highly efficient real-world deployment. We reveal three key obstacles in zeroth-order test-time optimization: susceptibility to shortcut solutions, uncontrolled weight drift, and ineffective update direction estimation. To overcome them, we propose EVA-0, a minimal zeroth-order adaptation framework that: 1) keeps the loss scale-invariant to prevent shortcut solutions; 2) devises an anchor-guided optimization strategy to alleviate weight drift; 3) uses sample-wise symmetric two-sided perturbation for update direction estimation and inference. EVA-0 requires no BP and performs both inference and adaptation within only two forward passes per sample. Results on ImageNet-C & ViT-Base show that EVA-0 outperforms both BP-based DeYO and BP-free FOA, while achieving a 14x speed-up over FOA. Code will be released.
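A "symmetric two-sided perturbation" update can be illustrated with the textbook two-point (SPSA-style) zeroth-order gradient estimator, which needs exactly two loss evaluations per step and no backpropagation. This is a generic sketch, not EVA-0's actual update: the toy quadratic loss, the Rademacher direction, the step size, and the perturbation scale are all assumptions, and EVA-0's reuse of the two passes for inference is not modeled.

```python
import random


def two_sided_zo_step(theta, loss_fn, mu=1e-3, lr=0.1, rng=random):
    """One zeroth-order update that evaluates loss_fn exactly twice."""
    # Random Rademacher direction u (each coordinate is +1 or -1).
    u = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = [t + mu * ui for t, ui in zip(theta, u)]
    minus = [t - mu * ui for t, ui in zip(theta, u)]
    # Central difference along u estimates the directional derivative.
    d = (loss_fn(plus) - loss_fn(minus)) / (2.0 * mu)
    # Gradient estimate is d * u; take a plain descent step with it.
    return [t - lr * d * ui for t, ui in zip(theta, u)]


def toy_loss(w):
    """Hypothetical stand-in for an unsupervised test-time objective."""
    return sum((wi - 1.0) ** 2 for wi in w)


if __name__ == "__main__":
    rng = random.Random(0)
    theta = [0.0, 0.0, 0.0]
    for _ in range(200):
        theta = two_sided_zo_step(theta, toy_loss, rng=rng)
    print([round(t, 3) for t in theta])
```

Because the two perturbed points are symmetric around the current parameters, the first-order error terms cancel, which is why this estimator is the usual choice when only forward passes are available.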

2605.18865 2026-05-20 cs.LG cs.AI Version update

The 99% Success Paradox: When Near-Perfect Retrieval Equals Random Selection

Vyzantinos Repantis, Harshvardhan Singh, Tony Joseph, Cien Zhang, Akash Vishwakarma, Svetlana Karslioglu, Michael Wyatt Thot, Ameya Gawde

Affiliations: Meta Platforms Inc.

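AI summary: The study introduces the Bits-over-Random (BoR) metric, which reveals that high success rates can mask random-level performance: when the retrieval depth is large relative to the corpus (e.g., K=100 on 20 Newsgroups), coverage can exceed 99% while selectivity is near zero. It argues for reporting BoR alongside traditional metrics and for reconsidering retrieval depth.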

Comments 12 pages, 2 figures, 7 tables. Accepted at ICLR 2026 Blog Track, https://iclr-blogposts.github.io/2026/blog/2026/bits-over-random/

Journal ref: ICLR Blog Track 2026, https://iclr.cc/virtual/2026/poster/10012083

English abstract

For most of the history of information retrieval (IR), search results were designed for human consumers who could scan, filter, and discard irrelevant information on their own. This shaped retrieval systems to optimize for finding and ranking more relevant documents, but not keeping results clean and minimal, as the human was the final filter. However, LLMs have changed that by lacking this filtering ability. To address this, we introduce Bits-over-Random (BoR), a chance-corrected measure of retrieval selectivity that reveals when high success rates mask random-level performance. We measure selectivity as $BoR = \log_{2}\left(\frac{\mathrm{P}_{obs}}{\mathrm{P}_{rand}}\right)$, where $\mathrm{P}_{rand}$ is the hypergeometric baseline for the chosen success rule (here, coverage: $ \geq1 $ relevant in top-$K$). On the 20 Newsgroups dataset, BM25 and SPLADE both report $>99$% success at $K=100$ (coverage), yet $BoR \approx 0$, indicating random-level selectivity at that depth. When the expected coverage ratio $\left(\frac{K \cdot \bar{R}_{q}}{N}\right)$ exceeds 3-5, the baseline dominates and selectivity collapses. Downstream retrieval-augmented generation (RAG) evaluation confirms this pattern: LLM accuracy can degrade substantially at $K=100$, consistent with the near-zero BoR ceiling. In contrast, BoR remains positive on BEIR/SciFact and on MS MARCO (where 41 systems cluster within 0.2 bits of the theoretical ceiling despite a 13-point recall gap), confirming baseline predictions across sparse and large-scale settings. We further show that the collapse boundary applies to LLM agent tool selection, where small catalog sizes cause selectivity to vanish even with perfect selectors. These findings suggest reporting BoR alongside traditional metrics and reconsidering depth choices when additional retrieval provides negligible selectivity gains while inflating computational costs.
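The BoR formula can be checked numerically with a minimal sketch. It assumes a single query with R relevant documents in a corpus of N and a uniformly random top-K, so the coverage baseline is the hypergeometric probability of at least one hit, P_rand = 1 - C(N-R, K)/C(N, K). How the paper aggregates over queries with different R is not stated here, so per-query averaging is left out; the function names are hypothetical.

```python
from math import comb, log2


def p_rand_coverage(n_docs: int, n_relevant: int, k: int) -> float:
    """Chance that a uniformly random top-k contains >= 1 relevant document."""
    if k > n_docs:
        raise ValueError("k cannot exceed the corpus size")
    # P(no relevant doc among k draws without replacement) = C(N-R, k) / C(N, k)
    p_none = comb(n_docs - n_relevant, k) / comb(n_docs, k)
    return 1.0 - p_none


def bits_over_random(p_obs: float, p_rand: float) -> float:
    """BoR = log2(P_obs / P_rand); values near 0 mean chance-level selectivity."""
    return log2(p_obs / p_rand)


if __name__ == "__main__":
    # Hypothetical small corpus with many relevant docs: K*R/N = 100*50/1000 = 5.
    p_r = p_rand_coverage(n_docs=1000, n_relevant=50, k=100)
    print("P_rand:", round(p_r, 4), "BoR:", round(bits_over_random(0.995, p_r), 3))
    # Same depth on a much larger corpus leaves room for a clearly positive BoR.
    p_r = p_rand_coverage(n_docs=1_000_000, n_relevant=5, k=100)
    print("BoR (large corpus):", round(bits_over_random(0.9, p_r), 2))
```

In the first case even a near-perfect 99.5% coverage earns almost no bits over random, which is the collapse regime the abstract describes once the expected coverage ratio exceeds about 3 to 5.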

2605.18855 2026-05-20 cs.LG cs.CV Version update

Delta Attention Residuals

Cheng Luo, Zefan Cai, Junjie Hu

Affiliations: Independent Researcher; University of Wisconsin–Madison

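AI summary: This paper proposes Delta Attention Residuals, which apply attention in the residual connections over the change (delta) introduced by each sublayer rather than over cumulative hidden states. This addresses the routing collapse caused by redundant cumulative hidden states in standard Attention Residuals and improves the model's ability to select information across layers.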

English abstract

Attention Residuals replace standard additive residual connections with learned softmax attention over previous layer outputs, enabling selective cross-layer routing. However, standard Attention Residuals still attend over cumulative hidden states in previous layers, which are highly redundant. We show that this redundancy leads to routing collapse in deeper layers: attention weights become low-contrast and closer to uniform (max weight ≈ 0.2), limiting the model's ability to select informative states in previous layers. This raises a key but underexplored design question: what layer-wise representations should be routed in Attention Residuals? To answer this question, we propose Delta Attention Residuals, which attend over deltas (the change introduced by each sublayer, $\mathbf{v}_i = \mathbf{h}_{i+1} - \mathbf{h}_i$) instead of cumulative states. Delta representations are structurally diverse and yield higher-contrast attention distributions (max weight ≈ 0.6), enabling more selective and effective routing across layers. This principle applies at both per-sublayer and block granularity. Across all tested scales (220M–7.6B), Delta Attention Residuals consistently outperform both standard residuals and Attention Residuals, with 1.7–8.2% validation perplexity gains. Delta Attention Residuals also enable converting pretrained checkpoints into Delta Attention Residuals via standard fine-tuning. Code is available at https://github.com/wdlctc/delta-attention-residuals-code.
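The contrast between attending over cumulative states and attending over deltas can be sketched on toy vectors. This is an illustration under stated assumptions, not the released code: a single fixed query vector, keys equal to values, and plain scaled dot-product scoring stand in for whatever learned parameterization the paper uses. It also checks the telescoping identity that a plain additive residual stream equals $\mathbf{h}_0$ plus the unweighted sum of all deltas.

```python
import math

Vector = list[float]


def deltas(states: list[Vector]) -> list[Vector]:
    """v_i = h_{i+1} - h_i for consecutive hidden states h_0, ..., h_L."""
    return [[b - a for a, b in zip(h, h_next)] for h, h_next in zip(states, states[1:])]


def softmax(scores: list[float]) -> list[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, values: list[Vector]) -> tuple[list[float], Vector]:
    """Scaled dot-product attention of one query over candidate vectors (keys = values)."""
    d = len(query)
    scores = [sum(q * v for q, v in zip(query, val)) / math.sqrt(d) for val in values]
    w = softmax(scores)
    mixed = [sum(wi * val[j] for wi, val in zip(w, values)) for j in range(d)]
    return w, mixed


if __name__ == "__main__":
    states = [[0.0, 0.0], [1.0, 0.0], [1.0, 2.0], [1.5, 2.0]]  # toy h_0..h_3
    vs = deltas(states)
    # A plain additive residual stream is h_0 plus the unweighted sum of all deltas.
    assert [states[0][j] + sum(v[j] for v in vs) for j in range(2)] == states[-1]
    w_cum, _ = attend([0.0, 1.0], states[1:])  # attend over cumulative states
    w_delta, _ = attend([0.0, 1.0], vs)        # attend over deltas
    print("max weight (cumulative):", round(max(w_cum), 3))
    print("max weight (deltas):", round(max(w_delta), 3))
```

In this toy example the later cumulative states both contain the same large component, so attention splits between them; the deltas isolate which layer introduced that component, giving a sharper distribution.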

2605.18854 2026-05-20 cs.LG Version update

Evaluating Memory Condensation Strategies for Coding Agents in Data-Driven Scientific Discovery

Renuka Chintalapati, Sid Raskar, Anurag Acharya, Jared Willard, Patrick Emami, Sameera Horawalavithana

Affiliations: Pacific Northwest National Laboratory; National Laboratory of the Rockies

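AI summary: This paper evaluates eight memory condensation strategies on data-driven scientific discovery tasks. No condenser significantly improves hypothesis quality; LLM-based condensers increase token costs by 24–94%, whereas masking tool-call outputs yields a net saving of 8.6%; and the best condenser varies with the scientific domain and task length.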

2605.18853 2026-05-20 cs.LG cs.CV cs.DC Version update

INAR-VL: Input-Aware Routing for Edge-Cloud Vision-Language Inference

Ahmed Šabanović, Paul Joe Maliakel, Ivona Brandić

Affiliations: TU Wien

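AI summary: This paper proposes INAR-VL, a lightweight edge-cloud routing system for multimodal inference in a two-tier deployment. Lightweight image and text complexity signals guide routing and model selection: simple queries run locally and complex ones are offloaded to the cloud, balancing latency, energy consumption, and accuracy.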

Comments 8 pages, 3 figures

English abstract

Edge deployment of Vision-Language Models (VLMs) faces a tradeoff between latency and accuracy: cloud execution provides high-quality predictions but incurs communication delay and energy cost, while edge-only execution is faster but less accurate due to limited model capacity. This trade-off is further complicated by heterogeneity in image quality and reasoning complexity, making static placement suboptimal. We present INAR-VL, a lightweight edge-cloud routing system for multimodal inference in a two-tier deployment. INAR-VL maintains complementary VLMs across edge and cloud and uses lightweight image and text complexity signals to guide routing and model selection, executing simple queries locally while offloading complex ones when beneficial. Evaluation on visual question answering shows that INAR-VL executes 36% of requests on the edge, reduces latency by 24%, lowers energy by 26%, and preserves 97% of cloud-level accuracy.
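The abstract does not give INAR-VL's complexity signals or decision rule, so the following is only a hypothetical sketch of complexity-based routing: it assumes two complexity scores already normalized to [0, 1] and a weighted threshold. The `Query` type, the weights, and the threshold are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Query:
    image_complexity: float  # hypothetical score in [0, 1], e.g. from blur or entropy
    text_complexity: float   # hypothetical score in [0, 1], e.g. from question type


def route(q: Query, threshold: float = 0.5, w_image: float = 0.5) -> str:
    """Return 'edge' for simple queries and 'cloud' for complex ones (illustrative rule)."""
    score = w_image * q.image_complexity + (1.0 - w_image) * q.text_complexity
    return "edge" if score < threshold else "cloud"


if __name__ == "__main__":
    batch = [Query(0.1, 0.2), Query(0.8, 0.7), Query(0.4, 0.3), Query(0.9, 0.6)]
    decisions = [route(q) for q in batch]
    print(decisions, "edge share:", decisions.count("edge") / len(decisions))
```

In a real deployment the threshold would be tuned to trade the latency and energy savings of edge execution against the accuracy lost relative to the cloud model.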

2605.18852 2026-05-20 cs.LG cs.AI cs.CL Version update

Robust Checkpoint Selection for Multimodal LLMs via Agentic Evaluation and Stability-Aware Ranking

Qinwu Xu, Zhuoheng Li, Jessie Salas

Affiliations: Meta AI

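AI summary: This paper proposes a multi-stage framework that combines curated real-world data, structured LLM-based judgment, and multi-stage ranking protocols to make checkpoint selection for multimodal LLMs a robust decision under evaluation uncertainty, and it highlights data quality (especially OCR readability) as critical to evaluation validity.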

English abstract

Checkpoint selection for multimodal large language models (MLLMs) presents significant challenges when performance differentials are marginal and evaluation signals are prone to noise. Existing methodologies rely heavily on static benchmarks or pointwise scoring, which frequently misalign with in-the-wild usage and lack robust uncertainty estimation, particularly in OCR-heavy scenarios. In this work, we formulate checkpoint selection as a robust decision problem under evaluation uncertainty. We propose a multi-stage framework that integrates curated real-world data, structured LLM-based judgment, and multi-stage ranking protocols. The evaluation system orchestrates progressive refinement via pointwise filtering, listwise ranking, and pairwise comparison. To enhance reliability, we introduce subsampling-based confidence estimation and a percentile-based scoring formulation that captures distributional characteristics while penalizing tail failures. Furthermore, we demonstrate that data quality, specifically OCR readability, is a critical determinant of evaluation validity.
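The abstract names subsampling-based confidence estimation and a percentile-based score that penalizes tail failures but does not give formulas. The sketch below is one hypothetical way to combine the two ideas: a score that blends the median with the 10th percentile, and a paired-subsampling win rate between two checkpoints. It assumes per-prompt judge scores in [0, 1] for both checkpoints on the same prompts; the blend weight, percentile, and subsample fraction are illustrative choices.

```python
import random
import statistics


def percentile(xs, q):
    """Linearly interpolated q-th percentile (q in [0, 100]) of a non-empty list."""
    s = sorted(xs)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def tail_aware_score(scores, q=10.0, alpha=0.5):
    """Blend the median with a low percentile so that tail failures lower the score."""
    return alpha * statistics.median(scores) + (1.0 - alpha) * percentile(scores, q)


def win_rate(scores_a, scores_b, n_rounds=1000, frac=0.5, seed=0):
    """Share of random subsamples on which checkpoint A's tail-aware score beats B's.

    scores_a[i] and scores_b[i] must be judge scores for the same prompt i.
    """
    rng = random.Random(seed)
    n = len(scores_a)
    k = max(1, int(n * frac))
    wins = 0
    for _ in range(n_rounds):
        idx = rng.sample(range(n), k)  # identical prompt subset for both checkpoints
        a = tail_aware_score([scores_a[i] for i in idx])
        b = tail_aware_score([scores_b[i] for i in idx])
        if a > b:
            wins += 1
    return wins / n_rounds


if __name__ == "__main__":
    steady = [0.8] * 20             # hypothetical: consistently decent
    spiky = [1.0] * 16 + [0.0] * 4  # hypothetical: same mean, but with hard failures
    print(tail_aware_score(steady), tail_aware_score(spiky), win_rate(steady, spiky))
```

Both toy checkpoints have a mean of 0.8, so a mean-based score cannot separate them, while the tail-aware score prefers the checkpoint without hard failures; the win rate indicates how stable that preference is across prompt subsets.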

2605.18851 2026-05-20 cs.LG Version update

STRIDE: Learnable Stepwise Language Feedback for LLM Reasoning

Junjie Zhang, Guozheng Ma, Shunyu Liu, Zetian Hu, Yongcheng Jing, Ting-En Lin, Yongbin Li, Dacheng Tao

Affiliations: Generative AI Lab, College of Computing and Data Science, Nanyang Technological University, Singapore; Tongyi Lab, Alibaba Group

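AI summary: This paper proposes STRIDE, a framework that improves LLM reasoning through learnable stepwise language feedback, addressing the high annotation cost and information bottleneck of conventional step-level methods. Experiments show strong results across a range of reasoning benchmarks.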

English abstract

Recent advances in Reinforcement Learning (RL) have underscored its potential for incentivizing reasoning capabilities of Large Language Models (LLMs). However, existing step-level efforts suffer from costly annotations that limit domain coverage, while scalar scores further impose an information bottleneck, offering insufficient semantic bandwidth to improve intermediate decisions. Alternative language-critique approaches, which rely on frozen or external critics, provide richer textual feedback but lack the scalability needed for sustained policy improvement. In this work, we propose language-driven stepwise trajectory redirection, termed STRIDE, a novel training framework that shifts process supervision from scalar rewards to learnable stepwise language feedback. Specifically, we co-train a generator and a generative verifier using only outcome-based rewards, eliminating external annotations, while delivering sustained policy improvement through jointly aligned verifier training. The verifier's stepwise language critiques explicitly localize and explain failures, enabling the generator to redirect reasoning trajectories at intermediate steps toward alternative decisions. The trajectory redirection design guarantees harmless policy improvement, even under noisy or suboptimal verifier feedback. Experiments on diverse reasoning benchmarks show that STRIDE significantly outperforms state-of-the-art baselines and achieves breakthroughs on zero-pass-rate problems, where scalar methods yield no learning signal in our ablation studies, demonstrating the effectiveness of learnable stepwise language feedback for enhancing LLM reasoning.

2605.18849 2026-05-20 cs.LG cs.AI Version update

INSIGHTS: Demonstration-Based Summaries of Time Series Predictors

Bar Eini Porat, Rom Gutman, Uri Shalit, Ofra Amir

Affiliations: Technion Israel Institute of Technology; Tel-Aviv University

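AI summary: This paper proposes INSIGHTS, a model-agnostic, user-centric method for global explanations of time-series models. It generates sample summaries that balance the importance and diversity of time-series samples, giving users a comprehensive overview of model behavior.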

English abstract

Explainability methods have progressed rapidly, but global explanations for time-series models remain underdeveloped, with most approaches focusing on local, instance-level attributions. We introduce INSIGHTS, a model-agnostic, user-centric approach for providing global explanations of time series models. Our approach prioritizes simplicity, efficiency, and transparency in its design, ensuring that stakeholders can readily adopt its outputs. While current methods focus on local explanations, INSIGHTS generates sample summaries that offer a comprehensive overview of model behavior. It balances the importance and diversity of time series samples to create informative subsets using utility functions that capture domain-specific aspects of time series behavior, such as exceeding domain norms. We evaluate INSIGHTS through experiments, interviews, and a user study. Our results indicate INSIGHTS effectively constructs comprehensive, diverse time series subsets, producing summaries manageable for individual evaluation. It is preferred by domain experts for its ability to provide a stable understanding of model behavior and the quality of the samples identified. Moreover, user study participants presented with INSIGHTS-based summaries exhibit an enhanced understanding of the model's overall behavior.
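Balancing importance and diversity when choosing a summary subset is often done greedily. The sketch below assumes a hypothetical utility (a precomputed importance score plus a weighted Euclidean distance to the nearest already-chosen series); INSIGHTS' actual utility functions are domain-specific and are not reproduced here.

```python
import math


def greedy_summary(series, importance, k, lam=1.0):
    """Pick up to k series indices, trading off importance against diversity.

    series: list of equal-length numeric sequences; importance: one float per series.
    A candidate's diversity is its distance to the closest series already chosen.
    """
    chosen = []
    while len(chosen) < min(k, len(series)):
        best, best_util = None, -math.inf
        for i, s in enumerate(series):
            if i in chosen:
                continue
            div = min((math.dist(s, series[j]) for j in chosen), default=0.0)
            util = importance[i] + lam * div
            if util > best_util:
                best, best_util = i, util
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    data = [[0.0, 0.0, 0.0], [0.0, 0.1, 0.0], [5.0, 5.0, 5.0], [4.9, 5.0, 5.1]]
    scores = [1.0, 0.9, 0.5, 0.4]  # hypothetical importance, e.g. from exceeding a norm
    print(greedy_summary(data, scores, k=2))
```

With `lam=0` the selection degenerates to "top-k by importance", which tends to return near-duplicates; raising `lam` pushes the summary toward covering distinct behaviors.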

2605.18847 2026-05-20 cs.LG cs.AI Version update

VCR: Learning Valid Context Representations for Incomplete Wearable Signals

Yuxuan Weng, Wenhan Luo, Qijia Shao

Affiliations: The Hong Kong University of Science and Technology

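AI summary: This paper proposes VCR, a framework that learns representations robust to missing modalities in order to handle incomplete wearable signals, improving performance and robustness across multiple health-monitoring tasks.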

English abstract

Wearable devices enable continuous health monitoring from multimodal signals, but real-world deployment is hindered by limited labeled data and pervasive sensor incompleteness. While large-scale self-supervised pretraining reduces label dependence, most existing methods assume full modality availability. Current approaches for handling modality missingness often reconstruct entire absent signals, which can encourage hallucinating modality-specific details that are not inferable from the observed sensor signals and degrade robustness. We propose VCR, a self-supervised framework that learns to extract valid representations robust to modality missingness. VCR employs an orthogonal tokenizer to enforce strict orthogonal disentanglement by rectifying latent manifolds and applying a geometric projection, separating each modality into shared semantics and modality-specific residuals. This design preserves complete information integrity while serving as a structural foundation for robust learning under modality missingness. The resulting tokens are processed by a missing-aware mixture-of-experts backbone that adapts to varying patterns of modality availability. By constraining the objective to reconstruct only the shared components of missing modalities, VCR effectively mitigates hallucinations of non-inferable modality-specific details. Across multiple health monitoring tasks, VCR consistently improves performance and robustness under full, single-missing, and multiple-missing modality settings compared with strong supervised and self-supervised baselines.
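The geometric core of an orthogonal shared/specific split is a projection: the shared part of a modality embedding is its projection onto a shared subspace, and the modality-specific residual is what remains, which is orthogonal by construction. This sketch assumes a fixed, hypothetical shared basis orthonormalized with Gram-Schmidt; VCR's tokenizer learns its subspaces and also rectifies the latent manifolds, which is not modeled here.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthonormalize(vectors, eps=1e-12):
    """Gram-Schmidt: turn spanning vectors into an orthonormal basis."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > eps:  # drop vectors that are linearly dependent on earlier ones
            basis.append([wi / norm for wi in w])
    return basis


def split_shared_specific(x, shared_basis):
    """Project x onto the shared subspace; the remainder is the modality-specific residual."""
    shared = [0.0] * len(x)
    for b in shared_basis:
        c = dot(x, b)
        shared = [s + c * bi for s, bi in zip(shared, b)]
    residual = [xi - si for xi, si in zip(x, shared)]
    return shared, residual


if __name__ == "__main__":
    basis = orthonormalize([[1.0, 1.0, 0.0]])  # hypothetical 1-D shared subspace
    shared, residual = split_shared_specific([3.0, 1.0, 2.0], basis)
    print(shared, residual, round(dot(shared, residual), 12))
```

Under this split, a model reconstructing a missing modality would only be asked to predict the shared component, mirroring the abstract's point that modality-specific residuals are not inferable from the other sensors.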

2605.18836 2026-05-20 cs.LG cs.CV Version update

Spectral Gradient Surgery for Domain-Generalizable Dataset Distillation

Minyoung Oh, Najeong Chae, Jae-Young Sim

Affiliations: Graduate School of Artificial Intelligence; Ulsan National Institute of Science and Technology (UNIST)

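AI summary: This paper introduces a new dataset distillation setting, Domain Generalizable Dataset Distillation (DGDD), and proposes Spectral Gradient Surgery (SGS) to improve the out-of-distribution (OOD) generalization of distilled datasets while remaining compatible with existing distribution-matching (DM) distillation methods.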

Comments 17 pages

English abstract

Dataset Distillation (DD) synthesizes a compact synthetic dataset that preserves the training utility of a full dataset. However, its standard formulation assumes that test data follow the same distribution as training data, an assumption that rarely holds in practice. A straightforward extension, namely applying post-hoc Domain Generalization (DG) techniques to distilled data, is ill-suited because existing DG methods rely on the natural diversity of real datasets, which compact synthetic sets inherently lack, while also incurring substantial augmentation overhead that conflicts with the efficiency objective of dataset distillation. To address this limitation, we introduce Domain Generalizable Dataset Distillation (DGDD), a new problem setting that explicitly targets out-of-distribution (OOD) generalization of distilled datasets. We study this problem through a widely adopted DD baseline of Distribution Matching (DM). We attribute the OOD vulnerability of DM to the entanglement of class-discriminative and domain-specific information within the compressed synthetic set, and propose Spectral Gradient Surgery (SGS) to disentangle the two. The key insight of SGS is that cross-domain agreement among domain-wise gradients in the spectral domain reveals which gradient components are shared across source domains (and are therefore class-discriminative) and which are domain-specific. Based on this observation, SGS augments the standard DM update with two complementary gradients: one that reinforces cross-domain shared components and another that explicitly promotes diversity within the distilled dataset. Extensive experiments on diverse-scale benchmarks demonstrate that SGS substantially improves OOD generalization while remaining plug-and-play compatible with existing DM methods.

2605.18835 2026-05-20 cs.LG Version update

StampFormer: A Physics-Guided Material-Geometry-Coupled Multimodal Model for Rapid Prediction of Physical Fields in Sheet Metal Stamping

Jiajie Luo, Mohamed Mohamed, Osama Hassan, Haosu Zhou, Yingxue Zhao, Haoran Li, Xinrun Li, Zhutao Shao, Yang Long, Nan Li, Jichun Li

Affiliations: Dyson School of Design Engineering, Imperial College London; School of Computing, Newcastle University; Department of Computing, Imperial College London; Multi-X Solution Limited; Department of Computer Science, Durham University; Department of Mechanical Engineering, Faculty of Engineering, Helwan University

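AI summary: This paper proposes StampFormer, a model that combines material and geometry information to predict physical fields in sheet metal stamping quickly and accurately, improving design efficiency.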

English abstract

Traditional sheet metal forming relies on time-consuming and expensive Finite Element Analysis (FEA) for design validation, a process that significantly prolongs design cycles. While surrogate models offer faster iteration, current approaches have limitations: scalar-based methods cannot capture comprehensive field-based FEA results, while existing image-based models often ignore the critical role of material properties by focusing solely on geometry. To address this gap, we develop a physics-guided deep learning framework, namely StampFormer, which simultaneously uses component geometry and material stress-strain responses to predict FEA outcomes. The StampFormer framework uses three core components to process data. A Material-Augmented Geometric Network (MAGN) first fuses geometric and material data. This information is then integrated at various levels by a Hierarchical Material Embedding Injection Unit (HMEIU) before being processed by the primary network backbone, an adapted Swin-UNet. We evaluated our model on the stamping of a crossmember panel with two simulation datasets for steel and aluminium panels, and results demonstrate that StampFormer provides high-fidelity predictions of critical physical fields, including thinning, major strain, minor strain, plastic strain, and displacement, in under a second. Compared with ground truth FEA, our model achieved an average relative error of less than 8.5% on the four 2D fields and a mean squared error of less than 1.2 mm² for the 3D displacement field. In summary, we introduce a practical and efficient framework that integrates multimodal information, namely geometry and material properties, to provide fast and accurate predictions, enabling designers to perform real-time manufacturability assessments.

2605.18832 2026-05-20 cs.LG cs.AI Version update

Precision Tracked Transformer via Kalman Filtering, Kriging and Process Noise

Bo Long, Deepak Agarwal, Jelena Markovic-Voronov, Yi Wang, Liuqing Li

Affiliations: LinkedIn Core AI

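AI summary: This paper proposes the Bayesian Filtering Transformer (BFT), which introduces precision-weighted kriging, adaptive Kalman updates, and a dynamics model to address the Transformer's lack of principled uncertainty handling, improving robustness in sequential recommendation and in LLM fine-tuning under noisy conditions.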

English abstract

The Transformer is the foundational building block of modern AI, yet offers no principled handling of uncertainty, which is prevalent in real applications: cold-start tokens with sparse histories in sequential recommendation, heterogeneous signal quality in language models, and attention sinks induced by unconstrained softmax. Every token is treated with uniform confidence. We show this uniformity is a degenerate case of our Bayesian Filtering Transformer (BFT): attention becomes precision-weighted kriging, the residual connection becomes a Kalman update with adaptive gain, and the FFN becomes a dynamics model propagating precision via a Jacobian-plus-process-noise rule. Observation precision comes from a parameter-free Restricted Maximum Likelihood (REML) estimator with a conjugate Bayesian prior. BFT replaces any Transformer layer with negligible overhead. On sequential recommendation, BFT applied to three major architectures yields significant gains on six benchmarks, with the largest improvements on cold-start users and rare items where uncertainty is highest. On supervised fine-tuning of large language models with noisy data, BFT improves robustness in two regimes: noisy supervision (token-label corruption in question answering) and noisy context (retrieval-augmented QA with real RAG distractors). A single principled modification, restoring precision, unlocks substantial headroom across both classical sequence-modeling and modern LLM regimes.
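The three precision operations named in the abstract have simple scalar forms, sketched below per coordinate. This is a simplification under stated assumptions, not BFT's implementation: precisions are given rather than estimated by REML, and the Kalman "innovation" is assumed to be the sublayer output added by the residual connection. With a gain of 1 the update reduces to a plain residual add, and with equal precisions the pooling reduces to a uniform average, which is the sense in which uniform confidence is a degenerate case.

```python
def precision_weighted_mean(values, precisions):
    """Kriging-style pooling: weight each value by its precision (inverse variance)."""
    total = sum(precisions)
    mean = sum(v * p for v, p in zip(values, precisions)) / total
    return mean, total  # pooled estimate and its combined precision


def kalman_residual(h, prior_prec, innovation, obs_prec):
    """Residual add with adaptive gain: h + K * innovation, K = obs / (prior + obs)."""
    gain = obs_prec / (prior_prec + obs_prec)
    return h + gain * innovation, prior_prec + obs_prec


def propagate_precision(prec, jacobian, process_noise):
    """Jacobian-plus-process-noise rule for a scalar state: var' = J^2 * var + Q."""
    var = jacobian ** 2 / prec + process_noise
    return 1.0 / var


if __name__ == "__main__":
    # A confident observation (precision 9) dominates an uncertain one (precision 1).
    print(precision_weighted_mean([0.0, 10.0], [9.0, 1.0]))
    # A noisy sublayer output (low obs precision) is only partially added.
    print(kalman_residual(h=1.0, prior_prec=4.0, innovation=2.0, obs_prec=1.0))
    print(propagate_precision(prec=2.0, jacobian=1.0, process_noise=0.5))
```

Cold-start tokens in this picture would simply carry low precision, so downstream pooling and residual updates lean less on them.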

2605.18831 2026-05-20 q-bio.QM cs.LG Version update

Towards Discovery of Polymers for Insulin Delivery via Physics-Grounded Agentic Workflows

Martins Otun

Affiliations: Algonix AI Ltd.

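AI summary: This paper proposes a physics-grounded agentic workflow for discovering insulin-delivery polymers. By combining a large language model with physics-based tools, it searches the discrete PSMILES space efficiently under a limited evaluation budget and reaches insulin-polymer interaction energies better than reinforcement-learning and Bayesian-optimization baselines.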

English abstract

Cold-chain storage limits access to insulin for hundreds of millions of people; a thermally protective patch polymer could help, but the design space is too large for exhaustive experiment. Starting from that problem, we narrow to an agentic workflow: a large language model (LLM) calls physics-based tools through the Model Context Protocol (MCP), searching the discrete PSMILES space under a budget of OpenMM Packmol-matrix evaluations. The LLM acts as an implicit acquisition function conditioned on a persistent "discovery world": hypotheses, literature claims, and simulation outcomes updated each iteration. Under matched oracle budgets, the best autonomous campaign reaches an insulin-polymer interaction energy of -2263 kJ/mol, outperforming reinforcement-learning baselines by 68% and Bayesian optimization by 19%. Three independent campaigns converge on one structural motif (dense hydrogen-bond donors and acceptors per repeat unit) while physics checks reject infeasible packings and name-structure mismatches before they steer the next step. The science stage is CPU-bound and runs on commodity hardware. More broadly, the same architecture and workflow designed here apply to other protein-stabilization tasks whenever a tractable screening oracle is available.

2605.18830 2026-05-20 cs.LG Version update

In-Context Learning Operates as Concept Subspace Learning

Wei Tang, Xinyan Jiang, Fakhri Karray, Lijie Hu

Affiliations: Mohamed bin Zayed University of Artificial Intelligence; Shanghai Advanced Research Institute

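AI summary: This paper asks whether structured demonstrations induce low-dimensional concept inference. Through a concept-subspace view, it shows that in-context learning predictions decompose into concept-coordinate regression and off-subspace leakage, and experiments confirm that task information concentrates in a low-dimensional, task-aligned activation subspace.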

English abstract

Regression and Bayesian accounts of in-context learning (ICL) explain how demonstrations can induce predictors, while mechanistic analyses often identify compact activation directions that steer prompted behavior. However, it remains unclear whether structured demonstrations induce low-dimensional concept inference. We study this question through a concept-subspace view of ICL, in which tasks vary only along intrinsic concept coordinates, although inputs are observed in a high-dimensional ambient space. For ridge and least-squares ICL proxies, prediction decomposes exactly into concept-coordinate regression and off-subspace leakage. Under block-diagonal or near-block-diagonal covariance assumptions, the leading estimation and nuisance-sensitivity terms scale with the dimension of the concept subspace, while residual effects are controlled by cross-subspace coupling. This separation gives a mechanistic prediction: recoverable task information should concentrate in a low-dimensional, task-aligned activation subspace. On CounterFact-derived multi-relation prompts with Llama-3-8B, a 68- to 73-dimensional subspace of the 4096-dimensional residual stream restores 78.8% of the clean–corrupted accuracy gap, whereas patching the complementary subspace restores 0%. Concept swaps redirect predictions toward injected relations, while random and cross-task matched-rank controls are largely ineffective. Additional experiments on Qwen2.5-7B and a controlled cross-lingual rule task show the same qualitative pattern. These results support concept subspaces as compact, task-aligned mediators of recoverable ICL behavior in structured task families, without implying full-circuit recovery.
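Subspace activation patching can be sketched in a few lines: take the difference between clean and corrupted activations, keep only its component inside a given subspace (or inside the complement), and add that to the corrupted run. This is a generic sketch, not the paper's code; it assumes the subspace is supplied as orthonormal rows (in practice it would be estimated from activations), and `gap_restored` is one common way to define the fraction of the clean–corrupted gap that patching recovers.

```python
def project(v, basis):
    """Orthogonal projection of v onto span(basis); basis rows must be orthonormal."""
    out = [0.0] * len(v)
    for b in basis:
        c = sum(x * y for x, y in zip(v, b))
        out = [o + c * bi for o, bi in zip(out, b)]
    return out


def patch_subspace(h_corrupt, h_clean, basis, complement=False):
    """Copy the clean run's component in the subspace (or its complement) into the corrupted run."""
    diff = [c - r for c, r in zip(h_clean, h_corrupt)]
    proj = project(diff, basis)
    if complement:
        proj = [d - p for d, p in zip(diff, proj)]
    return [r + p for r, p in zip(h_corrupt, proj)]


def gap_restored(acc_patched, acc_corrupt, acc_clean):
    """Fraction of the clean-corrupted accuracy gap recovered by patching."""
    return (acc_patched - acc_corrupt) / (acc_clean - acc_corrupt)


if __name__ == "__main__":
    U = [[1.0, 0.0, 0.0]]  # hypothetical 1-D "concept" subspace in a 3-D toy stream
    clean, corrupt = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
    print(patch_subspace(corrupt, clean, U))                   # subspace patch
    print(patch_subspace(corrupt, clean, U, complement=True))  # complement patch
```

Adding the two patched differences recovers the full clean activation, so comparing the subspace patch against the complement patch isolates where the task-relevant information lives.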

2605.18829 2026-05-20 cs.LG cs.CR Updated version

Not All Tokens Are Worth Caching: Learning Semantic-Aware Eviction Policies for LLM Prefix Caching

Shaoke Fang, Ziang Li, Wenfei Wu, Jiatong Ji, Qingsong Liu, Ruizhi Pu

Affiliations: Peking University; FirestAI; Tsinghua University; University of Massachusetts Amherst; Southeast University

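AI summary: This paper proposes SAECache, a semantic-aware eviction policy for prefix caches. It combines a multi-queue architecture, a semantic-aware token-weighting mechanism, and a fully adaptive online learning scheme to make prefix caching in LLM serving more efficient, achieving significant TTFT improvements across workloads.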

English abstract

Prefix caching is a key optimization in Large Language Model (LLM) serving, reusing attention Key-Value (KV) states across requests with shared prompt prefixes to reduce expensive prefill computation. However, its benefit depends critically on the eviction policy because GPU memory is scarce, and existing policies such as LRU largely treat cached blocks uniformly. This view ignores a fundamental property of LLM prompts: not all tokens are equally worth caching. We show that different token types within a prompt, including system prompts, user queries, tool outputs, model responses, and chain-of-thought reasoning, exhibit up to 756x variation in reuse rates, yet no existing eviction policy exploits this signal. In this paper, we present SAECache (Semantic-Adaptive Eviction for prefix caches), a semantic-adaptive prefix cache eviction policy that addresses this gap through three innovations: (1) a multi-queue architecture that routes KV blocks to task-specific queues with tailored priority metrics, capturing both session reuse in multi-turn requests and structural reuse in templated single-turn requests; (2) a semantic-aware token weighting mechanism that learns the reuse value of different token types online through eviction feedback; and (3) a fully adaptive online learning scheme for all parameter updates, including log-normal timing parameters, position decay power, queue weights, and meta-parameters, which eliminates manual tuning and enables automatic adaptation to deployment-specific workload characteristics. Through extensive evaluation across heterogeneous workloads, we demonstrate that SAECache achieves 1.4x-2.7x TTFT improvement over production-style baselines, while fixed-parameter alternatives can degrade by up to 2.7x under workload mismatch, a failure mode that our adaptive approach avoids entirely.
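
The following toy evictor illustrates why token type can be a useful eviction signal. It scores each cached block by a learned per-type weight divided by the block's age, and it adjusts the weight from eviction feedback. This is a hypothetical single-queue simplification. It does not reproduce SAECache's multi-queue design, log-normal timing model, or update rules, and all class and method names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Block:
    block_id: str
    token_type: str      # e.g. "system", "user", "tool", "response", "cot"
    last_access: float   # timestamp of the most recent hit


@dataclass
class ToySemanticEvictor:
    """Hypothetical single-queue evictor; not SAECache's actual algorithm."""
    lr: float = 0.5
    weights: Dict[str, float] = field(default_factory=dict)

    def score(self, block: Block, now: float) -> float:
        # Higher score = more worth keeping: learned type weight, discounted by age.
        w = self.weights.get(block.token_type, 1.0)
        return w / (1.0 + now - block.last_access)

    def pick_victim(self, blocks: List[Block], now: float) -> Block:
        return min(blocks, key=lambda b: self.score(b, now))

    def feedback(self, token_type: str, reused_after_eviction: bool) -> None:
        # Multiplicative update: raise the weight of a type whose evicted blocks
        # were requested again (a regretted eviction), lower it otherwise.
        w = self.weights.get(token_type, 1.0)
        factor = 1.0 + self.lr if reused_after_eviction else 1.0 - self.lr
        self.weights[token_type] = w * factor
```

Consider a heavily weighted, older system-prompt block alongside a recently used chain-of-thought block. Plain LRU would evict the system-prompt block, but the type weight reverses that decision.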

2605.18824 2026-05-20 cs.LG cs.AI cs.CL Updated version

Fine-Grained Benchmark Generation for Comprehensive Evaluation of Foundation Models

Mohammed Saidul Islam, Negin Baghbanzadeh, Farnaz Kohankhaki, Afshin Cheraghi, Ali Kore, Shayaan Mehdi, Elham Dolatabadi, Arash Afkanpour

Affiliations: Vector Institute; York University

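AI summary: This paper proposes an automated benchmark-generation framework that produces evaluation problems with broad coverage, rich metadata, and robustness to contamination, enabling more comprehensive evaluation of foundation models.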

English abstract

Evaluation of foundation models often relies on aggregate scores from benchmarks that lack comprehensive coverage and the metadata needed for fine-grained evaluation. We introduce a framework for automated benchmark generation. Our framework generates evaluation problems grounded in reference material, such as textbooks, producing benchmarks with broad coverage, rich metadata, and robustness to contamination. The pipeline employs a multi-agent architecture for problem generation and a solution-graph-driven strategy that significantly improves the reliability of ground-truth solutions. Using the framework, we generate three benchmarks in Machine Learning, Corporate Finance, and Personal Finance. Expert review finds a significantly lower ground-truth error rate than in previous benchmarks such as MMLU and GSM8K. Evaluation of 12 commercial and open-source models shows that our benchmarks achieve near-uniform competency coverage and surface performance differences across models that existing benchmarks fail to capture. We will open-source the framework and our curated benchmarks soon.

2605.18823 2026-05-20 cs.LG Updated version

Multi-Pedestrian Safety Warning at Urban Intersections: Use Case of Digital Twin

Yongjie Fu, Qi Gao, Mahshid Ghasemi Dehkordi, Gil Zussman, Xuan Di

Affiliations: Department of Civil Engineering and Engineering Mechanics at Columbia University; Data Science Institute; Department of Electrical Engineering at Columbia University

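AI summary: This paper proposes a multi-pedestrian safety warning system for urban intersections built on a tightly coupled physical-digital twin framework. Field deployment on the COSMOS wireless testbed and virtual reality experiments validate its effectiveness in improving warning accuracy and response efficiency.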

English abstract

Digital twins (DTs) for urban transportation systems have gained increasing attention; however, their systematic evaluation in safety-critical scenarios remains limited. This paper presents a multi-pedestrian safety warning system at urban intersections enabled by a tightly coupled physical-digital twin framework. Built upon the COSMOS city-scale wireless testbed in New York City, the proposed system integrates camera and ultra-wideband (UWB) sensing, edge-cloud computing, predictive trajectory modeling, and MQTT-based communication to deliver real-time safety alerts to vulnerable road users (VRUs). The system is evaluated through both field deployment and virtual reality (VR) experiments. Results demonstrate high warning-generation and localization accuracy, efficient end-to-end latency across different model configurations, and significant reductions in user response time when warnings are issued. The proposed DT framework provides a scalable, modular, and generalizable solution for real-time multi-pedestrian safety enhancement at complex urban intersections.
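
A minimal version of the warning decision can be sketched with a constant-velocity predictor. The predictor is an assumption made for illustration: the sketch does not reproduce the system's actual predictive trajectory model, camera/UWB fusion, or MQTT delivery. The 5 s horizon and 2 m radius are hypothetical thresholds.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]


def closest_approach(p_ped: Vec, v_ped: Vec, p_veh: Vec, v_veh: Vec) -> Tuple[float, float]:
    """Return (time in s, distance in m) of closest approach, assuming constant velocities."""
    rx, ry = p_veh[0] - p_ped[0], p_veh[1] - p_ped[1]  # relative position
    vx, vy = v_veh[0] - v_ped[0], v_veh[1] - v_ped[1]  # relative velocity
    speed_sq = vx * vx + vy * vy
    # Minimize |r + v t| over t >= 0; if the agents are separating, t = 0.
    t = 0.0 if speed_sq == 0.0 else max(0.0, -(rx * vx + ry * vy) / speed_sq)
    return t, math.hypot(rx + vx * t, ry + vy * t)


def should_warn(p_ped: Vec, v_ped: Vec, p_veh: Vec, v_veh: Vec,
                horizon_s: float = 5.0, radius_m: float = 2.0) -> bool:
    """Warn if the predicted closest approach is within the horizon and radius."""
    t, d = closest_approach(p_ped, v_ped, p_veh, v_veh)
    return t <= horizon_s and d <= radius_m
```

For a stationary pedestrian at the origin and a vehicle 20 m away approaching at 10 m/s along the same line, the closest approach is predicted at 2 s with zero separation, so a warning is issued.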

2605.18822 2026-05-20 cs.LG cs.AI Updated version

Hybrid-LoRA: Bridging Full Fine-Tuning and Low-Rank Adaptation for Post-Training

Chengqian Zhang, Wei Zhu, Kyumin Lee

Affiliations: Worcester Polytechnic Institute; University of Hong Kong

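AI summary: This paper proposes Hybrid-LoRA, a framework that applies full fine-tuning to a selected subset of modules and adapts the remaining modules with LoRA, achieving efficient, high-performing post-training.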

English abstract

Post-training has become essential for adapting large language models (LLMs) to complex downstream behaviors, including instruction following, preference alignment, and multi-step reasoning. Reinforcement learning with verifiable rewards (RLVR) has recently emerged as a particularly effective post-training paradigm for improving reasoning capabilities, with critic-free algorithms such as GRPO and GSPO enabling scalable optimization. However, RLVR post-training with full fine-tuning (FFT) requires substantial GPU memory and incurs high training costs. Although parameter-efficient fine-tuning (PEFT) methods, such as Low-Rank Adaptation (LoRA), effectively reduce computational costs, they often show a noticeable performance gap relative to full fine-tuning when post-training for complex reasoning tasks. In this paper, we propose Hybrid-LoRA, an efficient hybrid post-training framework that selectively applies full fine-tuning to a small subset of modules less suited to low-rank adaptation, while adapting the remaining components with LoRA. We introduce a novel Hybrid-LoRA Score to rank candidate modules according to their sensitivity to low-rank adaptation under a fixed parameter budget. Experiments show that, with a 10% full fine-tuning module budget and the remaining candidate modules adapted by LoRA, Hybrid-LoRA closely matches full fine-tuning performance and consistently outperforms four state-of-the-art PEFT post-training baselines, improving on the best baseline by up to 5.65% and by 4.36% on average.
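
The module split described above can be sketched as a budgeted top-k selection. The sketch makes three assumptions:

- Each candidate module already has a score, where higher means less suited to low-rank adaptation. This is a placeholder for the Hybrid-LoRA Score, whose formula is given in the paper.
- The 10% budget counts modules rather than parameters.
- The function name is hypothetical.

```python
from typing import Dict, Set, Tuple


def split_modules(scores: Dict[str, float], fft_fraction: float = 0.10) -> Tuple[Set[str], Set[str]]:
    """Return (full_fine_tune, lora) module-name sets under a module-count budget.

    `scores` is a placeholder for the Hybrid-LoRA Score: higher means the module
    is assumed to be less suited to low-rank adaptation.
    """
    # Keep at least one fully fine-tuned module; the epsilon guards float rounding.
    budget = max(1, int(fft_fraction * len(scores) + 1e-9))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:budget]), set(ranked[budget:])
```

With 20 candidate modules, a 10% budget fully fine-tunes the two highest-scoring modules and adapts the other 18 with LoRA.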

2605.18821 2026-05-20 cs.LG cs.CR Updated version

Quantum Adversarial Machine Learning: From Classical Adaptations to Quantum-Native Methods

Roozbeh Razavi-Far, Mohammad Meymani, Erfan Mahmoudinia, Dorsa Vazirzade, Peyman Paknezhad, Fateme Ghasemi, Saeed Saravani, Somayeh Nikkhoo, Kimia Haghjooei

Affiliations: Faculty of Computer Science, University of New Brunswick; Department of Electrical Engineering, Amirkabir University of Technology; Faculty of Mathematical and Computer Science, Kharazmi University; Pázmány Péter Catholic University; Department of Computer Engineering, Amirkabir University of Technology; Department of Computer Engineering, Ferdowsi University of Mashhad; Department of Computer Science, Tarbiat Modares University

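AI summary: This survey covers attack and defense strategies in quantum adversarial machine learning, discussing the field's theoretical foundations, emerging trends, and key challenges.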

DOI: 10.1007/s10462-026-11578-7
Journal ref: Artif Intell Rev (2026)

English abstract

Machine learning has revolutionized numerous industrial domains. Despite recent advances, machine learning models remain vulnerable to adversarial threats. Adversarial machine learning is a field that studies these vulnerabilities to build robust machine learning models. Quantum machine learning is an interdisciplinary field that bridges quantum computing and classical machine learning. While quantum machine learning shows potential to outperform classical machine learning in complex tasks such as regression, classification, and generative modeling, it remains vulnerable to adversarial attacks. Given the recent advancements in quantum computing and machine learning, the quantum adversarial machine learning field has emerged to study the vulnerabilities of quantum machine learning, possible attacks, and novel quantum-enhanced defense strategies. In this survey, we provide a detailed overview of quantum adversarial machine learning and explore the existing attacks and countermeasures. We also review the theoretical underpinnings of this area, emerging trends, and critical challenges.
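
The following sketch shows a classical attack adapted to a quantum model. It applies an FGSM-style step to a one-qubit classifier whose prediction is the expectation of Z after RY rotations, for which ⟨Z⟩ = cos(x + θ). The input gradient is computed with the parameter-shift rule, which is exact for RY gates. The model, the margin loss, and all names are illustrative assumptions, not taken from the survey.

```python
import math


def expectation_z(x: float, theta: float) -> float:
    """<Z> of RY(theta) RY(x)|0>, i.e. cos(x + theta); its sign is the predicted label."""
    return math.cos(x + theta)


def grad_x_parameter_shift(x: float, theta: float) -> float:
    """d<Z>/dx via the parameter-shift rule (exact for RY rotations)."""
    s = math.pi / 2
    return 0.5 * (expectation_z(x + s, theta) - expectation_z(x - s, theta))


def fgsm(x: float, theta: float, label: int, eps: float) -> float:
    """One FGSM step on the input angle that increases the loss -label * <Z>."""
    g = label * grad_x_parameter_shift(x, theta)
    # Gradient of the loss w.r.t. x is -g, so step along sign(-g).
    return x - eps * math.copysign(1.0, g)
```

For x = 0.3, θ = 0.2, and label +1, one step with eps = 0.1 moves the input to 0.4 and lowers the classifier's margin from cos(0.5) to cos(0.6).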

2605.18820 2026-05-20 cs.LG cs.AI Updated version

Emergence of Frontier Superposition: Möbius attractor and Cascade Supervision

Hongyu Gu, Jingwen Fu

Affiliations: University of Science and Technology of China; Zhongguancun Academy

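AI summary: This paper studies deep reasoning via superposition and identifies a Möbius attractor and Cascade Supervision. It shows that on Erdős-Rényi graphs, superposition reasoning emerges from the combined contributions of architecture and supervision.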

Comments 40 pages, 3 figures

English abstract

Superposition allows Transformers to reason in depth, carrying an entire reasoning frontier in parallel through a bounded-depth forward pass instead of unrolling serial chain-of-thought tokens. While Zhu et al. (2025) hand-crafted an equal-weight breadth-first frontier in a single residual stream for graph reachability, it remained open whether gradient descent could ever find this target amidst permutation-symmetric saddles. We close this gap on Reachability-by-Superposition over Erdős-Rényi graphs by isolating architectural and supervisory contributions. Architecturally, we identify a Möbius attractor: under $S_n$-symmetry in the tree regime, layerwise dynamics reduce to a 1D Möbius map whose zero set is a codimension-one manifold of global optima containing the equal-weight superposition state. On the supervision side, we identify Cascade Supervision: a loss class whose backward pass simultaneously delivers (A) selectivity bootstrap, (B) gradient persistence across depth, and (C) per-step discrimination (e.g., $\mathcal{L}_{\mathrm{sup}}$ and $\mathcal{L}_{\mathrm{node}}$). End-to-end supervision fails condition (B) and is provably insufficient: internal gradients at layer $c$ decay as $(np)^{-(D-c-2)/2}$ in the graph fan-out $np$ and stall before the manifold is reached. Our thesis: Möbius attractor + Cascade Supervision = emergence of superposition reasoning. The parameter-free decay law predicts a final-step cosine of 0.35 vs. 0.71 (end-to-end vs. cascade) at depth $D=3$; experiments confirm 0.37 vs. 0.69, matching within 0.02 at every step.
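
The end-to-end decay law quoted above can be evaluated directly to see how quickly gradients shrink with distance from the output. This sketch computes only the scaling (np)^(-(D - c - 2)/2), with constant factors omitted. Here D is the depth, c is the layer index, and np approximates the expected fan-out of an Erdős-Rényi graph G(n, p). The function names are hypothetical.

```python
from typing import List


def end_to_end_grad_scale(n: int, p: float, depth: int, layer: int) -> float:
    """(np)^(-(D - c - 2) / 2): the abstract's decay law, constant factors omitted."""
    return (n * p) ** (-(depth - layer - 2) / 2)


def layer_scales(n: int, p: float, depth: int) -> List[float]:
    # Layers c = 0 .. D-2; the exponent reaches 0 at c = D - 2.
    return [end_to_end_grad_scale(n, p, depth, c) for c in range(depth - 1)]
```

For n = 100, p = 0.04 (so np = 4) and D = 4, the scales at layers 0, 1, and 2 are 0.25, 0.5, and 1.0. Each additional layer of distance divides the gradient by sqrt(np).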

2605.18819 2026-05-20 cs.LG Updated version

Efficient Conditioning: Why Pseudo Observation Batch Bayesian Optimization Works and When It Does Not

Kumbha Nagaswetha, Rabi Pathak

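AI summary: This paper studies why Constant Liar (CL), Kriging Believer (KB), and fantasy models, which are commonly used for batch selection in batch-parallel Bayesian optimization, are effective. It identifies efficient conditioning, the ability to update predictions in closed form when data is augmented, as the key surrogate property. It proves that Gaussian processes satisfy this requirement and that any acquisition function monotonically non-decreasing in posterior uncertainty (e.g., EI, UCB, PI) behaves similarly. On this basis, it unifies CL, KB, and fantasy models as instances of a single conditioning mechanism, and it establishes a quantitative connection to Local Penalization (LP) and a qualitative connection to Determinantal Point Processes (DPPs).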

English abstract

Constant Liar (CL), Kriging Believer (KB), and fantasy models are widely used for batch selection in parallel Bayesian Optimization, yet a unified theory explaining their effectiveness and the conditions under which they fail has been lacking. We identify efficient conditioning, the ability to update predictions in closed form when data is augmented, as the key surrogate property. We prove that Gaussian Processes satisfy this requirement, producing provably distinct batch points with separation of order l, and that this holds for any acquisition function monotonically non-decreasing in posterior uncertainty (EI, UCB, PI), with qualitatively similar behavior for Thompson Sampling. We unify CL, KB, and fantasy models as instances of a single conditioning mechanism differing only in the lie value distribution, and draw quantitative connections to Local Penalization (LP) and qualitative connections to Determinantal Point Processes (DPPs). To disentangle model structure from optimizer randomness, we introduce the Structural Diversity Diagnostic (SDD), a reusable methodology for testing surrogate compatibility. Experiments on Hartmann6D, Ackley 8D, Levy10D, and SVM hyperparameter tuning validate all theoretical predictions: the implicit penalty of CL or KB matches or outperforms explicit LP; greedy conditioning achieves convergence on par with joint qEI; efficient conditioning extends to multiquadric RBF networks; and parametric surrogates produce degenerate batches even when fully retrained (random forests), while neural networks regain diversity only at 15x the wall-clock cost of GP conditioning. Robustness is confirmed across multiple initial datasets and under observation noise.
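
The following pure-Python sketch makes the efficient-conditioning idea concrete by implementing Constant Liar batch selection with a Gaussian-process surrogate. It assumes a 1-D input, an RBF kernel with a fixed length scale, a zero prior mean, UCB as the acquisition function, and the pessimistic CL-min lie. All function names are hypothetical. Each selected point is added to the data with the lie as its label. Because the GP update is closed-form, the posterior variance at that point collapses, which pushes the next pick elsewhere.

```python
import math
from typing import List, Tuple


def rbf(a: float, b: float, ell: float = 0.2) -> float:
    """Squared-exponential kernel with unit signal variance."""
    return math.exp(-0.5 * ((a - b) / ell) ** 2)


def solve_spd(K: List[List[float]], y: List[float]) -> List[float]:
    """Solve K x = y for a symmetric positive-definite K via Cholesky."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    z = [0.0] * n
    for i in range(n):  # forward substitution: L z = y
        z[i] = (y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution: L^T x = z
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior(X: List[float], y: List[float], x: float, noise: float = 1e-6) -> Tuple[float, float]:
    """Closed-form GP posterior mean and variance at x (zero prior mean)."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)] for i, a in enumerate(X)]
    k = [rbf(a, x) for a in X]
    alpha, v = solve_spd(K, y), solve_spd(K, k)
    mean = sum(ki * ai for ki, ai in zip(k, alpha))
    var = max(rbf(x, x) - sum(ki * vi for ki, vi in zip(k, v)), 0.0)
    return mean, var


def constant_liar_batch(X: List[float], y: List[float], grid: List[float],
                        q: int = 3, beta: float = 2.0) -> List[float]:
    """Greedy UCB batch: after each pick, condition on a pseudo-observation (the lie)."""
    X, y = list(X), list(y)  # work on copies
    lie = min(y)  # CL-min; Kriging Believer would use the posterior mean instead
    batch: List[float] = []
    for _ in range(q):
        def ucb(x: float) -> float:
            mean, var = gp_posterior(X, y, x)
            return mean + beta * math.sqrt(var)
        x_next = max(grid, key=ucb)
        batch.append(x_next)
        X.append(x_next)
        y.append(lie)
    return batch
```

The grid is not pruned between picks, so any diversity in the batch comes only from conditioning on the lie.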

2605.18818 2026-05-20 cs.AI cs.LG cs.SE Updated version

Operationalizing Document AI: A Microservice Architecture for OCR and LLM Pipelines in Production

Yao Fehlis, Benjamin Bengfort, Zhangzhang Si, Vahid Eyorokon, Prema Roman, Patrick Deziel, Devon Slonaker, Steve Veldman, Ben Johnson, Joyce Rigelo, Michael Wharton, Steve Kramer

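AI summary: This paper proposes a microservice architecture for document understanding in production. The architecture integrates a multi-model pipeline covering classification, OCR, and LLM-based structured field extraction, and the paper reports experience processing thousands of multi-page documents per hour.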

English abstract

Academic research tends to focus on new models for document understanding, creating a wide gap in the literature between model definition and running models at production scale. To close that gap, we present a microservice architecture that encapsulates pipelines of multiple models for classification, optical character recognition (OCR), and large language model structured field extraction, and we report our experience running this pipeline on thousands of multi-page documents per hour. We describe our primary design decisions, including hybrid classification, separation of GPU-bound inference from CPU-bound orchestration, asynchronous processing for the many I/O-bound operations in the pipeline, and an independent, horizontal scaling strategy. Using batch profiling, we identified two surprising qualitative findings that shape production deployments: OCR, not language-model parsing, dominates end-to-end latency, and the system saturates at a concurrency determined by shared GPU-inference capacity rather than worker count. Our goal is to provide practitioners with concrete architectural patterns for building document understanding systems that work beyond the benchmark, effectively operationalizing models in production.
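
The finding that throughput saturates at a concurrency set by shared GPU capacity, not by worker count, can be illustrated with an `asyncio` sketch. It assumes each page needs one GPU-bound OCR call, modeled as a short sleep, and that GPU capacity is a fixed number of slots guarded by a semaphore. The classes and function names are hypothetical and do not reflect the paper's services.

```python
import asyncio
from typing import List


class GpuPool:
    """Shared GPU-inference capacity, modeled as a fixed number of slots."""

    def __init__(self, slots: int) -> None:
        self._sem = asyncio.Semaphore(slots)
        self.active = 0
        self.peak = 0  # highest number of simultaneous GPU calls observed

    async def ocr(self, page: str) -> str:
        async with self._sem:
            self.active += 1
            self.peak = max(self.peak, self.active)
            await asyncio.sleep(0.01)  # stand-in for a GPU-bound OCR call
            self.active -= 1
            return f"text of {page}"


async def process_document(pages: List[str], gpu: GpuPool) -> List[str]:
    # Asynchronous fan-out: every page is awaited concurrently, but GPU work
    # is admitted only as fast as the shared pool allows.
    return list(await asyncio.gather(*(gpu.ocr(p) for p in pages)))


async def peak_gpu_concurrency(num_workers: int, slots: int, pages_per_doc: int = 3) -> int:
    gpu = GpuPool(slots)
    docs = [[f"doc{w}-p{i}" for i in range(pages_per_doc)] for w in range(num_workers)]
    await asyncio.gather(*(process_document(d, gpu) for d in docs))
    return gpu.peak
```

In this toy, adding workers beyond the slot count leaves peak GPU concurrency unchanged. In a real deployment, the limit would presumably come from GPU memory and compute rather than from an explicit semaphore.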

2605.18816 2026-05-20 cs.LG cs.AI Updated version

Symmetry in the Wild: The Role of Equivariance in Neural Fluid Surrogates

Patryk Rygiel, Julian Suk, Kak Khee Yeung, Christoph Brune, Jelmer M. Wolterink

Affiliations: Department of Applied Mathematics; Technical Medical Centre; Cardiovascular Health Technology Centre; University of Twente; Department of Computer Science; Munich Center for Machine Learning; Technical University of Munich; Department of Surgery; Amsterdam UMC, Location; University of Amsterdam; Amsterdam Cardiovascular Sciences; Digital Society Institute

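AI summary: This paper studies the role of equivariance in neural fluid surrogates, examining under which conditions it improves generalization on tasks with varying degrees of distributional alignment and realism. It also introduces AB-GATr, a model that efficiently handles coupled surface and volume quantities.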

English abstract

Neural surrogates enable orders-of-magnitude acceleration of computational fluid dynamics (CFD) simulations, with the potential to transform engineering and healthcare workflows. Neural surrogate use in real-world applications requires addressing scalability to large, high-resolution surface and volume meshes, as well as to bespoke architectures, and accounting for limited training data through the use of inductive biases. Group-equivariant architectures are a principled way to introduce such bias, yet they can be detrimental when the learning problem itself breaks symmetry, for example, due to strong distributional alignment in the dataset. In this work, we investigate under which conditions equivariance improves generalization in neural CFD surrogates across tasks with increasing levels of distributional alignment and realism, covering automotive aerodynamics and blood flow (hemodynamics). To systematically assess the added value of equivariance at the limit of problem scaling, we introduce the Anchored-Branched Geometric Algebra Transformer (AB-GATr), a neural surrogate that integrates scalability and symmetry preservation to efficiently model coupled surface and volume quantities in an $E(3)$-equivariant manner. We find that on strongly aligned aerodynamics datasets, i.e., those that break symmetry, enforcing equivariance can degrade in-distribution performance. In contrast, across hemodynamic benchmarks with diverse geometries and varying alignment, equivariance is consistently beneficial. Moreover, across all benchmarks, the explicit equivariance of AB-GATr reliably outperforms implicit symmetry learning through data augmentation. Our findings showcase that equivariance is not universally beneficial across domains, yet it brings tangible advantages in problems lacking strong data regularities.
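
Equivariance can be checked numerically. For a layer that maps a point cloud to per-point vectors, E(3) equivariance means f(R x + t) = R f(x) for rotations R and translations t. The sketch below implements a toy message-passing layer built only from relative positions, which satisfies this by construction, together with a checker. Both are hypothetical illustrations, not AB-GATr, which operates on geometric-algebra multivectors.

```python
import math
from typing import Callable, List

Point = List[float]
Matrix = List[List[float]]


def rot_z(angle: float) -> Matrix:
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def apply(R: Matrix, v: Point) -> Point:
    return [sum(R[i][k] * v[k] for k in range(3)) for i in range(3)]


def relative_message_layer(X: List[Point]) -> List[Point]:
    """Toy equivariant layer: sum_j exp(-|xi - xj|^2) * (xi - xj)."""
    out = []
    for xi in X:
        acc = [0.0, 0.0, 0.0]
        for xj in X:
            d = [a - b for a, b in zip(xi, xj)]
            w = math.exp(-sum(c * c for c in d))  # depends only on distance
            acc = [a + w * c for a, c in zip(acc, d)]
        out.append(acc)
    return out


def is_equivariant(f: Callable[[List[Point]], List[Point]], X: List[Point],
                   R: Matrix, t: Point, tol: float = 1e-9) -> bool:
    """Check f(R x + t) == R f(x) for vector-valued per-point outputs."""
    moved = [[a + b for a, b in zip(apply(R, x), t)] for x in X]
    lhs = f(moved)
    rhs = [apply(R, out) for out in f(X)]
    return all(abs(a - b) <= tol for u, v in zip(lhs, rhs) for a, b in zip(u, v))
```

An identity map fails this check because it is not translation-invariant. Data augmentation can only encourage this kind of constraint approximately, whereas the layer above satisfies it exactly.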

2605.18815 2026-05-20 cs.LG cs.DC Updated version

DynaTrain: Fast Online Parallelism Switching for Elastic LLM Training

Yuanqing Wang, Yuchen Zhang, Hao Lin, Junhao Hu, Chunyang Zhu, Quanlu Zhang, Boxun Li, Guohao Dai, Zhi Yang, Daning Cheng, Yunquan Zhang, Yu Wang

Affiliations: Institute of Computing Technology, CAS; Peking University; Infinigence AI; Shanghai Jiao Tong University; Tsinghua University

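AI summary: This paper proposes DynaTrain, a distributed training system that rapidly reconfigures arbitrary multi-dimensional parallelism online. A Virtual Parameter Space abstraction unifies all distributed training state so that every parallel configuration becomes a deterministic mapping. The system shows significant speedups on both dense and MoE models.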

Comments GitHub Repo: https://github.com/infinigence/ElasticMegatron

English abstract

Modern large language model (LLM) training is inherently dynamic: resource fluctuations, RLHF phase shifts, and cluster elasticity continually reshape the optimal parallelism layout, posing a significant challenge to existing training frameworks built around a static execution model. We present DynaTrain, a distributed training system for sub-second, online reconfiguration across arbitrary multi-dimensional parallelism. At its core, we propose a Virtual Parameter Space (VPS) abstraction that unifies all distributed training states under one logical coordinate space, turning any parallelism configuration into a deterministic mapping and collapsing complex transitions into manageable geometric intersections. On top of VPS, a state routing-and-transition layer executes rank-local transfers under a memory-aware, deadlock-free schedule, and an Elastic Device Manager overlaps new-world construction with ongoing training to mask topology-change cost. On dense and MoE models up to 235B parameters, DynaTrain reconfigures a 70B dense model in under 2s and a 235B MoE model in 4.36s, outperforming state-of-the-art checkpoint-based and elastic systems by up to three orders of magnitude while preserving correctness.
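
The idea of turning resharding into geometric intersections can be illustrated in one dimension. This sketch assumes a single tensor dimension split into equal contiguous shards, as in simple tensor parallelism. For each overlap between an old shard and a new shard, it computes the index range that must move. It is a hypothetical simplification, not DynaTrain's VPS, which covers multi-dimensional parallelism.

```python
from typing import List, Tuple


def shard_ranges(length: int, ranks: int) -> List[Tuple[int, int]]:
    """Half-open index ranges of an even, contiguous split over `ranks`."""
    if length % ranks:
        raise ValueError("this sketch assumes length is divisible by ranks")
    size = length // ranks
    return [(r * size, (r + 1) * size) for r in range(ranks)]


def transfer_plan(length: int, old_ranks: int, new_ranks: int) -> List[Tuple[int, int, int, int]]:
    """(src_rank, dst_rank, start, end) for every non-empty old/new shard overlap."""
    plan = []
    for src, (a0, a1) in enumerate(shard_ranges(length, old_ranks)):
        for dst, (b0, b1) in enumerate(shard_ranges(length, new_ranks)):
            lo, hi = max(a0, b0), min(a1, b1)  # interval intersection
            if lo < hi:
                plan.append((src, dst, lo, hi))
    return plan
```

Moving a 12-element dimension from 2 ranks to 3 yields four rank-local copies. A real system must also order such transfers to bound memory and avoid deadlock, which the abstract attributes to DynaTrain's routing-and-transition layer.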

2605.18814 2026-05-20 cs.LG Updated version

How Faithful Is Trajectory-Based Data Attribution? Error Sources, Remedies, and Practical Guidelines

Junwei Deng, Pingbang Hu, Suliang Jin, Hao Lu, Jiachen T. Wang, Shichang Zhang, Jiaqi W. Ma

Affiliations: University of Illinois Urbana-Champaign; University of Michigan; Princeton University; Harvard University

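AI summary: This paper systematically analyzes the error sources of trajectory-based data attribution, decomposing the total error into config-level, algorithm-level, and system-level components. It proposes remedies that improve attribution accuracy and distills practical guidelines for data selection.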

English abstract

Trajectory-based data attribution methods estimate the influence of training samples on model predictions by unrolling the training trajectory. They are widely used in applications such as data selection, data valuation, and model diagnosis, but there is a lack of comprehensive error analysis of these methods, raising concerns about method faithfulness and hindering reliable deployment. In this work, we provide the first systematic analysis of error sources in trajectory-based data attribution, together with concrete remedies to mitigate them and practical guidelines for downstream use. We organize the total error into three categories: config-level, algorithm-level, and system-level. We make three contributions. First, we identify optimizer mismatch as the dominant config-level error: existing methods derive their attribution under the assumption of SGD, even for models trained with the modern de facto optimizer AdamW. We propose AdamW-influence to fully account for AdamW's optimization dynamics, yielding improvements from 10% to over 300% in Spearman correlation between estimated and ground-truth influence across four settings spanning MLP, CNN, GPT-2, and Llama 3.2-1B. Second, we isolate the remaining algorithm-level error arising from the first-order Taylor approximation, identify the learning rate and trajectory length as factors governing the error magnitude, and derive a closed-form error proxy that can be evaluated along the original trajectory without retraining. Third, we translate these insights into practical guidelines for data selection by unifying offline and online strategies under a K-step look-ahead framework. Under this framework, online selection with a short horizon often matches or exceeds offline selection, and the optimal horizon can be tuned jointly with the learning rate. Together, these results turn the framework into an actionable selection recipe for practitioners.
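
The SGD-style first-order score that the abstract says existing methods assume can be written as a TracIn-like estimator. It sums, over checkpoints, the learning rate times the dot product of the test-loss gradient and the training-example gradient. The optional diagonal rescaling below is only a rough illustration of why optimizer mismatch matters. It is an assumption of this sketch, not the paper's AdamW-influence, and it ignores momentum and decoupled weight decay. The function name is hypothetical.

```python
import math
from typing import Optional, Sequence


def checkpoint_influence(
    g_train: Sequence[Sequence[float]],   # per-checkpoint training-example gradients
    g_test: Sequence[Sequence[float]],    # per-checkpoint test-loss gradients
    lrs: Sequence[float],                 # learning rate at each checkpoint
    adam_v: Optional[Sequence[Sequence[float]]] = None,  # per-checkpoint second moments
    eps: float = 1e-8,
) -> float:
    """First-order trajectory influence summed over checkpoints.

    Without `adam_v`: the SGD-style score sum_t lr_t * <g_test_t, g_train_t>.
    With `adam_v`: each coordinate is rescaled by 1 / (sqrt(v) + eps), a crude
    diagonal stand-in for Adam-family preconditioning.
    """
    total = 0.0
    for t, lr in enumerate(lrs):
        for i, (a, b) in enumerate(zip(g_test[t], g_train[t])):
            scale = 1.0 if adam_v is None else 1.0 / (math.sqrt(adam_v[t][i]) + eps)
            total += lr * a * b * scale
    return total
```

The same gradients can receive noticeably different scores once per-coordinate preconditioning is taken into account. This is the kind of discrepancy the optimizer-mismatch analysis targets.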

2605.18813 2026-05-20 cs.LG cs.AI Updated version

Composition of Memory Experts for Diffusion World Models

Sebastian Stapf, Pablo Acuaviva Huertos, Aram Davtyan, Paolo Favaro

Affiliations: Computer Vision Group; Department of Computer Science; University of Bern

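AI summary: This paper proposes a diffusion-based world-model framework that composes specialized memory experts to resolve the trade-off between memory and efficiency, improving temporal consistency, recall of past observations, and navigation performance.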

Journal ref: Proceedings of the Fourteenth International Conference on Learning Representations (ICLR), 2026

English abstract

World models aim to predict plausible futures consistent with past observations, a capability central to planning and decision-making in reinforcement learning. Yet, existing architectures face a fundamental memory trade-off: transformers preserve local detail but are bottlenecked by quadratic attention, while recurrent and state-space models scale more efficiently but compress history at the cost of fidelity. To overcome this trade-off, we suggest decoupling future-past consistency from any single architecture and instead leveraging a set of specialized experts. We introduce a diffusion-based framework that integrates heterogeneous memory models through a contrastive product-of-experts formulation. Our approach instantiates three complementary roles: a short-term memory expert that captures fine local dynamics, a long-term memory expert that stores episodic history in external diffusion weights via lightweight test-time finetuning, and a spatial long-term memory expert that enforces geometric and spatial coherence. This compositional design avoids mode collapse and scales to long contexts without incurring a quadratic cost. Across simulated and real-world benchmarks, our method improves temporal consistency, recall of past observations, and navigation performance, establishing a novel paradigm for building and operating memory-augmented diffusion world models.
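
A product of experts multiplies expert densities, so its score (the gradient of the log-density) is the sum of the experts' scores. Diffusion samplers can therefore combine experts by adding their score estimates. The 1-D Gaussian sketch below checks this identity in closed form: precisions add, and the mean is precision-weighted. It illustrates only the generic product-of-experts rule, not the paper's contrastive formulation or its memory experts, and the function names are hypothetical.

```python
from typing import List, Tuple


def gaussian_product(experts: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Product of 1-D Gaussian experts given as (mean, variance); returns (mean, variance)."""
    precision = sum(1.0 / var for _, var in experts)
    mean = sum(mu / var for mu, var in experts) / precision
    return mean, 1.0 / precision


def combined_score(x: float, experts: List[Tuple[float, float]]) -> float:
    """Sum of expert scores d/dx log N(x; mu, var) = -(x - mu) / var."""
    return sum(-(x - mu) / var for mu, var in experts)
```

Two unit-variance experts centered at 0 and 2 combine into a Gaussian with mean 1 and variance 0.5. The summed score vanishes exactly at that mean.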

2605.18812 2026-05-20 cs.LG cs.CL cs.IR Updated version

PASC: Pipeline-Aware Conformal Prediction with Joint Coverage Guarantees for Multi-Stage NLP and LLM Pipelines

Varun Kotte

Affiliations: Independent Researcher

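AI summary: This paper proposes PASC, a pipeline-aware conformal prediction method for multi-stage NLP and LLM pipelines that provides a joint coverage guarantee across all stages.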

English abstract

Modern NLP and LLM systems are pipelines: named entity recognition (NER) -> entity disambiguation (NED) -> entity typing, retrieval-augmented generation (retriever -> reader), and agentic chains of planner -> tool -> critic. Errors compound across stages, but existing uncertainty quantification methods either calibrate each stage independently (no joint coverage) or apply a Bonferroni union bound (joint coverage, but conservative). We present PASC (Pipeline-Aware Split Conformal), which reduces multi-stage joint coverage to a single scalar conformal prediction problem on the joint maximum nonconformity score. PASC provides a finite-sample distribution-free guarantee that all K stages are simultaneously covered with probability at least 1 - alpha, and is nearly tight up to a 1/(n+1) factor. On a three-stage NER -> NED -> entity-typing pipeline over CoNLL-2003, PASC achieves 96.4% end-to-end coverage versus 93.4% for Bonferroni and 86.5% for independent CP, at identical average prediction set size (1.083). Under distribution shift to WNUT-17 Twitter and WikiNEuRal Wikipedia data, PASC empirically maintains the target coverage in the tested shift settings while independent CP collapses to 59%. PASC requires a single quantile computation, runs 1.7x faster than Bonferroni, and scales to K = 6 stages where independent CP drops to 0.53 end-to-end coverage. The same joint-maximum-score reduction applies directly to compound LLM systems and agent pipelines.
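
The joint-maximum reduction is simple to express in code. Take each calibration example's largest nonconformity score across the K stages, then apply the usual split-conformal quantile: the ceil((n+1)(1-alpha))-th smallest value. The sketch assumes that per-stage scores are already on comparable scales (for example, one minus the probability of the true label) and that calibration and test data are exchangeable. The function names are hypothetical.

```python
import math
from typing import Sequence


def joint_threshold(cal_scores: Sequence[Sequence[float]], alpha: float) -> float:
    """Split-conformal quantile of the per-example max nonconformity over stages."""
    joint = sorted(max(stages) for stages in cal_scores)
    n = len(joint)
    k = math.ceil((n + 1) * (1 - alpha))  # 1-indexed rank
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return joint[k - 1]


def covered(test_scores: Sequence[float], threshold: float) -> bool:
    # All K stages are covered iff the largest stage score is within the threshold.
    return max(test_scores) <= threshold
```

Only one quantile is computed, whatever the number of stages. This matches the abstract's point that PASC needs a single quantile computation, whereas a Bonferroni correction calibrates each stage at level alpha/K.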

2605.18810 2026-05-20 cs.LG cs.AI Updated version

D-PACE: Dynamic Position-Aware Cross-Entropy for Parallel Speculative Drafting

D-PACE：面向并行推测草拟的动态位置感知交叉熵

Tianyu Wu, Yu Yao, Zhenting Qi, Han Zheng, Zhuohan Wang, Haoran Ma, Lawrence Liao, Himabindu Lakkaraju, Ju Li, Yilun Du

发表机构 * Harvard（哈佛大学）； MIT（麻省理工学院）

AI总结本文提出D-PACE，一种动态位置感知交叉熵损失，用于改进并行推测草拟模型的训练；它通过动态调整各位置的权重，提高墙钟加速比和平均生成长度。

AI中文摘要

推测解码通过让小型草案模型提出候选token、再由更大的目标模型并行验证，从而加速LLM推理。近期基于扩散的并行草案模型（如DFlash）在一次前向传播中预测完整的B个token块，使更深的草案模型和更长的接受块成为可能。然而，现有多token草案模型的训练目标通常采用固定的位置相关加权方案，如依赖预测头的权重或随块内位置衰减的权重，无法随训练过程中限制接受的位置变化而自适应调整。为此，我们从期望接受草案长度的可微替代目标中推导出逐位置训练权重，使每个位置的权重与其对数概率梯度贡献相匹配。由此得到的损失D-PACE（动态位置感知交叉熵）会随着草案模型的改进，将训练信号转向当前限制接受的位置。在六个基准、两种Qwen3-4B草案深度、两种解码温度以及另外两个目标模型上，D-PACE一致提升了墙钟加速比和平均生成长度，实测训练时间开销为2.3%，且无需改变草案模型的架构或推理过程。

英文摘要

Speculative decoding accelerates LLM inference by having a small drafter propose tokens that a larger target model verifies in parallel. Recent diffusion-based parallel drafters such as DFlash predict the full B-token block in one forward pass, enabling deeper drafters and longer accepted blocks. However, existing multi-token drafter objectives often use fixed position-dependent weighting schedules, such as head-dependent weights or block-position decays, which do not adapt as the positions limiting acceptance change during training. To address this, we derive per-position training weights from a differentiable surrogate of expected accepted draft length, matching the weight of each position to its log-probability gradient contribution. The resulting loss, D-PACE (Dynamic Position-Aware Cross-Entropy), shifts training signal toward positions that currently limit acceptance as the drafter improves. Across six benchmarks, two Qwen3-4B draft depths, two decoding temperatures, and two additional target models, D-PACE consistently improves both wall-clock speedup and average emitted length, with 2.3% measured training-time overhead and no changes to the drafter architecture or inference procedure.
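One plausible reading of the surrogate is sketched below in NumPy under loudly stated assumptions: positions are accepted independently, the drafter's probability on the target token, a_i, stands in for the acceptance probability, and the expected accepted length is E[L] = sum_k prod_{j<=k} a_j. Its gradient with respect to log a_i is then sum_{k>=i} prod_{j<=k} a_j, which this sketch uses as the (stop-gradient) weight of position i. The paper's exact surrogate may differ, and all names are hypothetical.

```python
import numpy as np


def expected_accepted_length(accept_probs):
    """Surrogate E[L] = sum_k prod_{j<=k} a_j, assuming independent per-position acceptance."""
    a = np.asarray(accept_probs, dtype=float)
    return float(np.cumprod(a).sum())


def position_weights(accept_probs):
    """dE[L]/d(log a_i) = sum_{k>=i} prod_{j<=k} a_j, treated as a constant weight per position."""
    prefix = np.cumprod(np.asarray(accept_probs, dtype=float))
    return np.cumsum(prefix[::-1])[::-1]  # reverse cumulative sum of the prefix products


def weighted_cross_entropy(target_log_probs, weights):
    """Position-weighted negative log-likelihood over one B-token draft block."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return float(-(w * np.asarray(target_log_probs, dtype=float)).sum())


# Hypothetical drafter probabilities on the target tokens of a 4-token block.
a = np.array([0.9, 0.5, 0.5, 0.5])
w = position_weights(a)  # approximately [1.6875, 0.7875, 0.3375, 0.1125]
loss = weighted_cross_entropy(np.log(a), w)
```

In this sketch, as early positions become reliably accepted their prefix products grow, so weight flows toward later positions, which a fixed decay schedule cannot do.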

2605.18809 2026-05-20 cs.LG cs.AI 版本更新

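A Nonlinear Complexity Index for Wearable PPG Cardiovascular Stability: Multi-Scale Validation, Systematic Evaluation Correction, and Bayesian Parameter Optimization

一种用于可穿戴PPG心血管稳定性的非线性复杂性指数：多尺度验证、系统性评估修正与贝叶斯参数优化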

Timothy Oladunni, Farouk Ganiyu Adewumi

发表机构 * Department of Computer Science, Morgan State University（摩根州立大学计算机科学系）

AI总结本文提出了基于心脏稳定性理论的稳定性约束心血管稳定性指数（SCSI），通过多尺度验证、修正会夸大性能的评估伪影并结合贝叶斯参数优化，为可穿戴PPG心血管稳定性估计提供了更可靠、可复现的评估基准。

AI中文摘要

从可穿戴光电容积脉搏波（PPG）估计心血管稳定性需要一个有原理依据的非线性框架，但在启发式参数选择以及会夸大报告性能的评估协议方面仍存在重大缺陷。我们提出了基于心脏稳定性理论的稳定性约束心血管稳定性指数（SCSI），并在三个时间尺度上，利用来自四个异构PPG数据集的176,742个片段对其进行了验证。跨数据集分析显示出较大的Kruskal-Wallis效应量（eta2 = 0.351，p < 0.001）、很强的跨尺度一致性（kappa > 0.97），以及在53条ICU记录中与呼吸频率的显著相关性（Spearman r = 0.346，p = 0.011）。我们识别出三种评估伪影，它们会将启发式方法的AUC从真实基线0.573虚高至0.752：片段级交叉验证泄漏、测试集归一化泄漏，以及掩盖单个患者层面失效的池化AUC过度加权。在纠正这些伪影并对15个参数进行联合贝叶斯优化后，SCSI的交叉验证AUC为0.720。在18条留出记录上，SCSI在呼吸急促筛查中取得0.757的池化AUC（95%置信区间：0.686-0.828）和0.966的阴性预测值；出于透明性，同时披露其逐记录AUC为0.497 ± 0.207。在42条择期手术记录上的外部验证得到0.621的AUC，证实了跨人群泛化能力。消融分析表明非线性复杂度模块是主导组件。我们提出了一种稀疏的三组件架构作为最小可部署配置。修正后的评估协议为未来的可穿戴心血管稳定性指数提供了可复现的基准。

英文摘要

Cardiovascular stability estimation from wearable photoplethysmography (PPG) requires a principled nonlinear framework, yet major gaps persist in heuristic parameter selection and evaluation protocols that inflate reported performance. We introduce a Stability-Constrained Cardiovascular Stability Index (SCSI) grounded in Cardiac Stability Theory and validate it across 176,742 segments from four heterogeneous PPG datasets at three temporal scales. Cross-dataset analysis demonstrates a large Kruskal-Wallis effect size (eta2 = 0.351, p < 0.001), strong cross-scale consistency (kappa > 0.97), and significant correlation with respiratory rate across 53 ICU records (Spearman r = 0.346, p = 0.011). We identify three evaluation artifacts that inflate heuristic AUC from a true baseline of 0.573 to 0.752: segment-level cross-validation leakage, test-set normalization leakage, and pooled-AUC overweighting that conceals per-patient failure. Correcting these artifacts and applying Bayesian optimization over 15 joint parameters yields SCSI with cross-validation AUC of 0.720. On 18 held-out records, SCSI achieves pooled AUC of 0.757 (95% CI: 0.686-0.828) and negative predictive value of 0.966 for tachypnea screening, while per-record AUC of 0.497 +/- 0.207 is disclosed for transparency. External validation on 42 elective-surgery records yields AUC of 0.621, confirming cross-population generalization. Ablation analysis identifies the nonlinear complexity module as the dominant component. A sparse three-component architecture is proposed as the minimal deployable configuration. The corrected protocol provides a reproducible benchmark for future wearable cardiovascular stability indices.
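The third artifact, pooled-AUC overweighting, is easy to reproduce. The hypothetical Python sketch below computes a rank-based (Mann-Whitney) AUC per record and over all pooled segments. When records differ in baseline score and label prevalence, the pooled AUC can look informative even though the index cannot rank segments within any single patient. The toy data are invented for illustration and do not come from the paper.

```python
from itertools import product


def auc(scores, labels):
    """Rank-based AUC: fraction of (positive, negative) pairs ranked correctly, ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None  # undefined when a record contains only one class
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p, q in product(pos, neg))
    return wins / (len(pos) * len(neg))


def per_record_and_pooled_auc(records):
    """records: list of (segment_scores, segment_labels), one entry per patient record."""
    per_record = [auc(s, y) for s, y in records]
    pooled_scores = [v for s, _ in records for v in s]
    pooled_labels = [v for _, y in records for v in y]
    return per_record, auc(pooled_scores, pooled_labels)


# Toy data: record A scores high and is mostly positive; record B scores low and is mostly negative.
rec_a = ([0.8, 0.9, 0.8, 0.9, 0.8, 0.9], [1, 1, 1, 1, 0, 0])
rec_b = ([0.1, 0.2, 0.1, 0.2, 0.1, 0.2], [1, 1, 0, 0, 0, 0])
per_record, pooled = per_record_and_pooled_auc([rec_a, rec_b])
print(per_record, round(pooled, 3))  # [0.5, 0.5] 0.667
```

Here every record is at chance level, yet the pooled figure reaches 0.667, which is why reporting per-record AUC alongside pooled AUC matters.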

2605.18801 2026-05-20 cs.AI cs.IR cs.LG 版本更新

Position: Let's Develop Data Probes to Fundamentally Understand How Data Affects LLM Performance

立场论文：让我们开发数据探针，从根本上理解数据如何影响大语言模型性能

Shiqiang Wang, Herbert Woisetschläger, Hans Arno Jacobsen, Mingyue Ji

发表机构 * Department of Computer Science, University of Exeter, UK（埃克塞特大学计算机科学系）； Technical University of Munich, Germany（慕尼黑技术大学）； Department of Electrical and Computer Engineering, University of Toronto, Canada（多伦多大学电气与计算机工程系）； Department of Electrical and Computer Engineering, University of Florida, FL, USA（佛罗里达大学电气与计算机工程系）

AI总结本文倡导开发系统化方法，从适当定义的随机过程生成合成序列（即数据探针），以揭示数据特性对大语言模型性能、泛化能力和鲁棒性的影响，从而超越经验启发式方法。

Comments Accepted to ICML 2026 Position Paper Track

Journal ref: Link to ICML record: https://icml.cc/virtual/2026/poster/67154

AI中文摘要

数据对大语言模型（LLM）至关重要。然而，哪些数据对LLM工作流程的不同阶段（包括训练、微调、对齐、上下文学习等）有用、以及为什么有用，仍是一个悬而未决的问题。当前方法严重依赖在大型公开数据集上的大量实验，以获得用于数据过滤和数据集构建的经验启发式规则。这些方法计算开销大，且缺乏一种有原理依据的方式来理解特定数据特性如何从本质上驱动LLM的行为。在这篇立场论文中，我们倡导开发系统化的方法，从适当定义的随机过程中生成合成序列，使这些序列在用于LLM工作流程的一个或多个阶段时能够揭示有用的特性。我们将这类序列称为数据探针。通过观察LLM在数据探针上的行为，研究人员可以系统地研究数据特性如何影响模型性能、泛化能力和鲁棒性。探针序列所表现出的统计特性可以借助典型集等理论概念来刻画，这些概念经推广后可用于描述LLM的行为。这种数据探针方法为超越经验启发式、揭示数据在LLM训练和推理中所起作用的基础性认识提供了一条途径。

英文摘要

Data is fundamental to large language models (LLMs). However, what makes certain data useful for different stages of an LLM workflow, including training, tuning, alignment, in-context learning, etc., and why, remains an open question. Current approaches rely heavily on extensive experimentation with large public datasets to obtain empirical heuristics for data filtering and dataset construction. These approaches are compute intensive and lack a principled way of understanding the essence of how specific data characteristics drive LLM behavior. In this position paper, we advocate developing systematic methodologies for generating synthetic sequences from appropriately defined random processes, with the goal that these sequences can reveal useful characteristics when they are used in one or multiple stages of the LLM workflow. We refer to such sequences as data probes. By observing LLM behavior on data probes, researchers can systematically conduct studies on how data characteristics influence model performance, generalization, and robustness. The probing sequences exhibit statistical properties that can be viewed using theoretical concepts, such as typical sets, which are generalized to describe the behaviors of LLMs. This data-probe approach provides a pathway for uncovering foundational insights into the role of data in LLM training and inference, beyond empirical heuristics.
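As a toy illustration of the typical-set idea (our assumption, not the authors' protocol), the sketch below draws a probe sequence from a two-state Markov chain whose entropy rate is known in closed form, and checks whether the sequence's per-symbol negative log-likelihood lies within eps of that rate. A hypothetical probe experiment would feed such sequences to an LLM and compare the model's per-token loss with the known entropy rate. The chain, its parameters, and all function names are illustrative.

```python
import math
import random

# Hypothetical two-state transition matrix: P[i][j] = Pr(next = j | current = i).
P = [[0.9, 0.1], [0.2, 0.8]]


def stationary(P):
    """Closed-form stationary distribution of a two-state chain."""
    a, b = P[0][1], P[1][0]
    return [b / (a + b), a / (a + b)]


def entropy_rate(P):
    """H = -sum_i pi_i sum_j P_ij log P_ij, in nats per symbol."""
    pi = stationary(P)
    return -sum(pi[i] * sum(p * math.log(p) for p in P[i] if p > 0) for i in range(2))


def sample(P, n, rng):
    """Draw a length-n sequence, starting from the stationary distribution."""
    x = [0 if rng.random() < stationary(P)[0] else 1]
    for _ in range(n - 1):
        x.append(0 if rng.random() < P[x[-1]][0] else 1)
    return x


def per_symbol_nll(x, P):
    """-(1/n) log p(x) under the chain that generated the probe."""
    nll = -math.log(stationary(P)[x[0]])
    nll -= sum(math.log(P[a][b]) for a, b in zip(x, x[1:]))
    return nll / len(x)


def is_typical(x, P, eps):
    return abs(per_symbol_nll(x, P) - entropy_rate(P)) < eps


probe = sample(P, 20_000, random.Random(0))
print(round(entropy_rate(P), 4))  # 0.3835
```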

2605.18800 2026-05-20 cs.LG cs.AI 版本更新

Theory-optimal Quantization Based on Flatness

基于平坦度的理论最优量化

Xiusheng Huang, Zhe Li, Xuanwu Yin, Lu Wang, Yequan Wang, Dong Li, Emad Barsoum, Kang Liu

发表机构 * The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences（中国科学院自动化研究所复杂系统认知与决策智能重点实验室）； School of Artificial Intelligence, University of Chinese Academy of Sciences（中国科学院大学人工智能学院）； Beijing Academy of Artificial Intelligence（北京智源人工智能研究院）； AMD； Ritzz-AI

AI总结本文提出了一种基于平坦度的理论最优量化方法，通过分析量化误差与异常值之间的数学关系，引入了平坦度指标来量化异常值分布，并提出了双向对角量化框架BDQ，有效分散异常值模式，提升了大语言模型在低比特精度下的性能。

Comments 16 pages, 2 figures

AI中文摘要

后训练量化已成为压缩大语言模型（LLM）并加速其推理的一种广泛采用的技术。LLM量化的主要挑战源于激活异常值，这些异常值会显著降低模型性能，在低比特精度下尤为明显。尽管近期方法试图通过沿特征维度的线性变换来缓解异常值，我们的分析表明，变换后的权重和激活仍然存在持续的异常值模式，其幅度分布高度集中。本文首先对量化误差与异常值之间的数学关系进行建模，然后引入新指标“平坦度”（Flatness）来量化异常值的分布。在此基础上，我们推导出关于平坦度的理论最优解。基于这些见解，我们提出了双向对角量化（BDQ），这是一种新的后训练量化框架，通过优化的矩阵变换有效分散异常值模式。BDQ通过学习得到的对角运算，将异常值幅度策略性地分布到各矩阵维度上。大量实验表明，BDQ刷新了量化性能基准。在LLaMA-3-8B模型上，BDQ在W4A4量化下的精度下降不到1%。在更具挑战性的W2A4KV16实验中，与最先进的方法相比，BDQ在DeepSeek-R1-Distill-LLaMA-70B模型上将性能差距缩小了39.1%。

英文摘要

Post-training quantization has emerged as a widely adopted technique for compressing and accelerating the inference of Large Language Models (LLMs). The primary challenges in LLM quantization stem from activation outliers, which significantly degrade model performance, especially at lower bit precision. While recent approaches attempt to mitigate outliers through linear transformations across feature dimensions, our analysis reveals that the transformed weights and activations still exhibit persistent outlier patterns with concentrated magnitude distributions. In this paper, we first model the mathematical relationship between quantization error and outliers, and then introduce a new metric, Flatness, to quantify the distribution of outliers. Based on this, we derive the theoretically optimal solution with respect to Flatness. Building on these insights, we propose Bidirectional Diagonal Quantization (BDQ), a novel post-training quantization framework that effectively disperses outlier patterns through optimized matrix transformations. BDQ strategically distributes outlier magnitudes across matrix dimensions via learned diagonal operations. Extensive experiments demonstrate that BDQ establishes a new quantization benchmark. It achieves less than 1% accuracy drop in W4A4 quantization on the LLaMA-3-8B model. In the more challenging W2A4KV16 experiment, compared to state-of-the-art approaches, BDQ reduces the performance gap by 39.1% on the DeepSeek-R1-Distill-LLaMA-70B model.
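BDQ's learned bidirectional diagonal transform is not specified in the abstract, so the sketch below swaps in a simpler, well-known relative: a fixed SmoothQuant-style per-channel diagonal scaling that moves outlier magnitude from activations into weights while leaving the product X @ W unchanged. It uses symmetric per-tensor round-to-nearest quantization at 8 bits for a clear numeric effect. The scaling rule, the bit width, and the function names are assumptions; this is neither BDQ itself nor the paper's Flatness metric.

```python
import numpy as np


def quantize_per_tensor(x, bits=8):
    """Symmetric round-to-nearest quantization with a single scale for the whole tensor."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    if scale == 0:
        return x.copy()
    return np.clip(np.round(x / scale), -qmax, qmax) * scale


def diagonal_smoothing(x, w, alpha=0.5):
    """Divide channel j of x (tokens x channels) by s_j and multiply row j of w (channels x out) by s_j."""
    s = np.abs(x).max(axis=0) ** alpha / np.abs(w).max(axis=1) ** (1 - alpha)
    return x / s, w * s[:, None]  # (x / s) @ (s * w) equals x @ w up to float rounding


def relative_error(x, w, bits=8):
    """Frobenius-norm error of the quantized matmul relative to the full-precision product."""
    ref = x @ w
    approx = quantize_per_tensor(x, bits) @ quantize_per_tensor(w, bits)
    return np.linalg.norm(approx - ref) / np.linalg.norm(ref)


rng = np.random.default_rng(0)
x = rng.normal(size=(64, 16))
x[:, 3] *= 50.0  # one outlier activation channel
w = rng.normal(size=(16, 8))
xs, ws = diagonal_smoothing(x, w)
print(relative_error(x, w), relative_error(xs, ws))
```

Because a single scale serves the whole tensor, one outlier channel inflates the step size for every other channel; flattening the per-channel maxima shrinks that step, which is the general intuition behind dispersing outlier magnitude across dimensions.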

2605.18799 2026-05-20 cs.LG cs.AI cs.CL 版本更新

SpecX: A Large-Scale Benchmark for Multimodal Spectroscopy with Cross-Paradigm Evaluation

SpecX：多模态光谱的大规模基准及跨范式评估

Chengrui Xiang, Tengfei Ma, Yujie Chen, Tong Wang, Haowen Chen, Xiangxiang Zeng

发表机构 * College of Computer Science and Technology, Hunan University（湖南大学计算机科学与技术学院）

AI总结本文提出SpecX，一个用于多模态光谱的大规模基准，通过不同层级的数据集支持分子解析、光谱模拟和理解任务，揭示了专用光谱模型和多模态语言模型在光谱智能中的不同优势。

Comments 9 pages, 1 figure

AI中文摘要

现有的光谱基准在规模、模态对齐和评估范围上存在局限，通常专注于专用模型或多模态语言模型（MLLMs）。我们引入SpecX，一个大规模的多模态光谱基准，具有跨范式评估。SpecX包含170万种分子，涵盖NMR（1H，13C，HSQC）、IR、MS、UV、拉曼和FL等多种光谱模态，并分为三个层级：大规模数据集用于预训练，对齐的多光谱子集用于基准测试，以及高质量实验子集用于评估。SpecX支持分子解析、光谱模拟和光谱理解等多种任务，并在专用光谱模型和MLLMs之间实现统一评估。实验表明，专用模型在信号层面建模上表现优异，而MLLMs在高层推理上表现出色，但缺乏精确的光谱定位。SpecX建立了一个统一的光谱智能基准，并强调了需要光谱原生的基础模型。

英文摘要

Existing spectral benchmarks are limited in scale, modality alignment, and evaluation scope, and typically focus on either specialized models or multimodal language models (MLLMs). We introduce SpecX, a large-scale benchmark for multi-modal spectroscopy with cross-paradigm evaluation. SpecX contains 1.7M molecules with diverse spectral modalities, including NMR (1H, 13C, HSQC), IR, MS, UV, Raman, and FL, and is organized into three tiers: a large-scale dataset for pretraining, an aligned multi-spectral subset for benchmarking, and a high-quality experimental subset for evaluation. SpecX supports a range of tasks such as molecular elucidation, spectrum simulation, and spectral understanding, and enables unified evaluation across both specialized spectral models and MLLMs. Experiments show that specialized models excel at signal-level modeling, while MLLMs exhibit strengths in high-level reasoning but lack precise spectral grounding. SpecX establishes a unified benchmark for spectral intelligence and highlights the need for spectrum-native foundation models.

2605.18780 2026-05-20 cs.IR cs.AI cs.LG 版本更新

DynaSTy: A Framework for Spatiotemporal Node Attribute Forecasting on Dynamic Graphs

DynaSTy：一个用于动态图中时空节点属性预测的框架

Namrata Banerji, Tanya Berger-Wolf

发表机构 * The Ohio State University（俄亥俄州立大学）

AI总结本文提出了一种端到端的动态边偏置时空模型，用于预测动态图中节点属性的多步未来值，通过引入可适应的注意力偏置和预训练目标，提高了长期预测的准确性。

AI中文摘要

对动态图中节点级属性进行准确的多步预测，对于从金融信任网络到生物网络的各类应用至关重要。现有时空图神经网络通常假设邻接矩阵是静态的。本文提出了一种端到端的动态边偏置时空模型，该模型以多维节点属性时间序列和邻接矩阵时间序列为输入，预测未来多个时间步的节点属性。在每个时间步，我们基于Transformer的模型将给定的邻接矩阵作为可自适应的注意力偏置注入，使模型能够随图的演化关注相关邻居。我们还采用掩码节点-时间预训练目标，使编码器学会重建缺失特征，并结合计划采样（scheduled sampling）和按预测步长加权的损失进行训练，以减轻长期预测中的误差累积。与先前工作不同，我们的模型能够处理在不同输入样本之间变化的动态图，从而支持多系统场景下的预测，如不同受试者的脑网络、不同情境下的金融系统或不断演化的社会系统。实验结果表明，我们的方法在均方根误差（RMSE）和平均绝对误差（MAE）上一致优于强基线方法。

英文摘要

Accurate multistep forecasting of node-level attributes on dynamic graphs is critical for applications ranging from financial trust networks to biological networks. Existing spatiotemporal graph neural networks typically assume a static adjacency matrix. In this work, we propose an end-to-end dynamic edge-biased spatiotemporal model that ingests a multi-dimensional timeseries of node attributes and a timeseries of adjacency matrices, to predict multiple future steps of node attributes. At each time step, our transformer-based model injects the given adjacency as an adaptable attention bias, allowing the model to focus on relevant neighbors as the graph evolves. We further deploy a masked node-time pretraining objective that primes the encoder to reconstruct missing features, and train with scheduled sampling and a horizon-weighted loss to mitigate compounding error over long horizons. Unlike prior work, our model accommodates dynamic graphs that vary across input samples, enabling forecasting in multi-system settings such as brain networks across different subjects, financial systems in different contexts, or evolving social systems. Empirical results demonstrate that our method consistently outperforms strong baselines on Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE).
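A single-head NumPy sketch of injecting the current adjacency matrix as an additive attention bias, assuming a scalar weight `beta` that a real model would learn (the paper's "adaptable" bias may be richer than one scalar). Because the bias is added per time step, a different adjacency can be passed for each step or sample without changing any weights. All names here are hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def adjacency_biased_attention(h, adj, wq, wk, wv, beta):
    """One attention head over N nodes at a single time step.

    h: (N, d) node features; adj: (N, N) adjacency observed at this step;
    beta: scalar weight on the adjacency bias (learnable in a real model).
    """
    q, k, v = h @ wq, h @ wk, h @ wv
    logits = q @ k.T / np.sqrt(k.shape[-1]) + beta * adj  # graph structure enters as an additive bias
    attn = softmax(logits, axis=-1)
    return attn @ v, attn


rng = np.random.default_rng(0)
h = rng.normal(size=(4, 3))
wq, wk, wv = (rng.normal(size=(3, 3)) for _ in range(3))
# Hypothetical adjacency at one time step: a 4-node path graph with self-loops.
path = np.array([[1, 1, 0, 0], [1, 1, 1, 0], [0, 1, 1, 1], [0, 0, 1, 1]], dtype=float)
out, attn = adjacency_biased_attention(h, path, wq, wk, wv, beta=50.0)
```

With a large `beta` the head attends almost only to graph neighbors; with `beta` near zero it falls back to plain feature-based attention, so a learned weight can interpolate between the two.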

2511.07347 2026-05-20 physics.comp-ph cs.LG 版本更新

Walsh-Hadamard Neural Operators for Solving PDEs with Discontinuous Coefficients

Walsh-Hadamard神经算子用于求解具有不连续系数的偏微分方程

Giorgio M. Cavallazzi, Miguel Pérez Cuadrado, Alfredo Pinelli

发表机构 * Department of Engineering, City St George's, University of London（伦敦大学城市圣乔治学院工程系）

AI总结本文提出Walsh-Hadamard神经算子（WHNO）以解决具有不连续系数的偏微分方程问题，通过结合Walsh-Hadamard变换和可学习的谱权重，有效捕捉全局依赖关系，并在三个测试问题中验证了其优于傅里叶神经算子（FNO）的准确性。

AI中文摘要

神经算子已成为学习偏微分方程（PDE）解算子的强大工具。然而，基于傅里叶变换的常规谱方法在处理具有不连续系数的问题时，会受到吉布斯现象和尖锐界面表示能力差的限制。我们提出了Walsh-Hadamard神经算子（WHNO），它利用Walsh-Hadamard变换（一种由矩形波函数构成、天然适用于分段常数场的谱基），并结合可学习的谱权重，对低序率（low-sequency）Walsh系数进行变换，从而高效捕捉全局依赖关系。我们在三个问题上验证了WHNO：稳态达西流（初步验证）、具有不连续热导率的热传导，以及具有不连续初始条件的二维Burgers方程。在与傅里叶神经算子（FNO）相同条件下的受控比较中，WHNO表现出更高的精度，能更好地保留材料界面处的尖锐解特征。关键发现是，WHNO与FNO的加权集成相比任一单独模型都取得了显著提升：对于热传导和Burgers方程，最优集成相比单独模型将均方误差降低35-40%，最大误差降低最多25%。这表明Walsh-Hadamard表示和傅里叶表示捕捉了不连续PDE解的互补方面：WHNO擅长处理尖锐界面，而FNO能有效捕捉平滑特征。

英文摘要

Neural operators have emerged as powerful tools for learning solution operators of partial differential equations (PDEs). However, standard spectral methods based on Fourier transforms struggle with problems involving discontinuous coefficients due to the Gibbs phenomenon and poor representation of sharp interfaces. We introduce the Walsh-Hadamard Neural Operator (WHNO), which leverages Walsh-Hadamard transforms (a spectral basis of rectangular wave functions naturally suited for piecewise constant fields) combined with learnable spectral weights that transform low-sequency Walsh coefficients to capture global dependencies efficiently. We validate WHNO on three problems: steady-state Darcy flow (preliminary validation), heat conduction with discontinuous thermal conductivity, and the 2D Burgers equation with discontinuous initial conditions. In controlled comparisons with Fourier Neural Operators (FNO) under identical conditions, WHNO demonstrates superior accuracy with better preservation of sharp solution features at material interfaces. Critically, we discover that weighted ensemble combinations of WHNO and FNO achieve substantial improvements over either model alone: for both heat conduction and Burgers equation, optimal ensembles reduce mean squared error by 35-40 percent and maximum error by up to 25 percent compared to individual models. This demonstrates that Walsh-Hadamard and Fourier representations capture complementary aspects of discontinuous PDE solutions, with WHNO excelling at sharp interfaces while FNO captures smooth features effectively.
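A minimal 1D NumPy sketch of a Walsh spectral layer, assuming a power-of-two grid: build an orthonormal Sylvester Hadamard matrix, reorder its rows by sequency (number of sign changes, the Walsh analogue of frequency), scale the lowest `modes` coefficients by per-mode weights, and transform back. The per-mode weights would be learnable in an operator; the function names are hypothetical, and the actual WHNO layer may also mix channels.

```python
import numpy as np


def hadamard(n):
    """Orthonormal Sylvester Hadamard matrix; n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)


def sequency_order(h):
    """Sort rows by sequency (number of sign changes), giving the Walsh ordering."""
    changes = (np.diff(np.sign(h), axis=1) != 0).sum(axis=1)
    return h[np.argsort(changes, kind="stable")]


def walsh_spectral_layer(u, weights, modes):
    """Transform u, reweight the `modes` lowest-sequency coefficients, drop the rest, invert."""
    walsh = sequency_order(hadamard(u.shape[0]))
    coeffs = walsh @ u
    out = np.zeros_like(coeffs)
    out[:modes] = weights * coeffs[:modes]
    return walsh.T @ out  # the matrix is orthonormal, so its transpose is its inverse


u = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)  # a sharp step
print(walsh_spectral_layer(u, weights=np.ones(2), modes=2))
```

For this step, keeping only two sequency modes with unit weights reconstructs it exactly, with no Gibbs-style overshoot; a truncated Fourier series of the same step would ring near the jump.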

2510.03589 2026-05-20 cs.LG 版本更新

FieldFormer: Locality-Aware Transformers for Spatio-Temporal Modeling on Sparse Sensor Networks

FieldFormer：用于稀疏传感器网络时空建模的局部感知Transformer

Ankit Bhardwaj, Ananth Balashankar, Lakshminarayanan Subramanian

发表机构 * Department of Computer Science（计算机科学系）； New York University（纽约大学）； Google DeepMind（谷歌DeepMind）

AI总结本文提出FieldFormer，一种无网格Transformer架构，用于在持续部署的传感器网络中进行局部感知的传感器空间建模。它通过可学习的速度缩放偏移量聚合局部证据，使邻域几何适应时空依赖关系，并在极端稀疏条件下实现稳定且可扩展的推理。

AI中文摘要

现实世界系统中的时空传感器数据往往稀疏、含噪且不规则，使得潜在场重建在根本上是欠约束的。在极端稀疏条件下，可能有多个物理上合理的场都与相同观测一致，因此模型需要依赖关于局部性、输运和空间规律性的归纳偏置。在这种情形下，可靠的重建集中在传感器网络所形成的观测支撑区域附近，这使得传感器空间建模比无约束的全局场恢复更具可辨识性。我们提出FieldFormer，一种无网格Transformer架构，用于在持续部署的传感器网络中进行局部感知的传感器空间建模。对于每个查询，FieldFormer利用可学习的速度缩放偏移量聚合局部证据，使邻域几何适应时空依赖关系。邻域被构建为覆盖附近传感器和有限时间窗口的固定最大稀疏上下文，从而在极端稀疏条件下实现稳定且可扩展的推理。局部Transformer编码器整合邻域信息，而基于坐标的神经场公式支持无网格预测。我们在五个合成和真实世界基准上评估了FieldFormer，包括各向异性热扩散、浅水动力学、大气输运和污染监测数据集。结果表明，当局部依赖域仍被观测到时，局部感知重建具有显著优势，使FieldFormer在稀疏传感器空间预测任务上一致优于最先进的基线。

英文摘要

Spatio-temporal sensor data in real-world systems is often sparse, noisy, and irregular, making latent field reconstruction fundamentally underconstrained. Under extreme sparsity, multiple physically plausible fields may remain consistent with the same observations, requiring models to rely on inductive biases about locality, transport, and spatial regularity. In such regimes, reliable reconstruction is concentrated around the observational support induced by the sensor network, making sensor-space modeling a more identifiable objective than unconstrained global field recovery. We introduce FieldFormer, a mesh-free transformer architecture for locality-aware sensor-space modeling in persistent sensor networks. For each query, FieldFormer aggregates local evidence using learnable velocity-scaled offsets that adapt neighborhood geometry to spatio-temporal dependencies. Neighborhoods are constructed as fixed maximal sparse contexts over nearby sensors and bounded temporal windows, enabling stable and scalable inference under extreme sparsity. A local transformer encoder integrates neighborhood information, while a coordinate-based neural field formulation supports mesh-free prediction. We evaluate FieldFormer on five synthetic and real-world benchmarks, including anisotropic heat diffusion, shallow-water dynamics, atmospheric transport, and pollution monitoring datasets. Results show that locality-aware reconstruction provides strong advantages when local domains of dependence remain observed, enabling FieldFormer to consistently outperform state-of-the-art baselines on sparse sensor-space prediction tasks.
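The neighborhood construction can be sketched as follows, under assumptions stated explicitly here: "velocity-scaled offset" is read as the query-to-sensor displacement corrected by a transport velocity times the elapsed time, and the context keeps the k nearest past observations inside a causal time window. The velocity would be a learnable parameter in the model; everything below, including the names, is a hypothetical illustration rather than FieldFormer's code.

```python
import numpy as np


def build_context(query_xy, query_t, sensor_xy, obs_t, velocity, window, k):
    """Select up to k past observations nearest to the query after a transport shift.

    sensor_xy: (M, 2) observation positions; obs_t: (M,) observation times;
    velocity: (2,) transport velocity (learnable in the hypothetical model).
    """
    dt = query_t - obs_t
    valid = (dt >= 0) & (dt <= window)                          # bounded, causal time window
    offsets = query_xy - sensor_xy - velocity * dt[:, None]    # velocity-scaled offset
    dist = np.linalg.norm(offsets, axis=1)
    dist[~valid] = np.inf
    order = np.argsort(dist, kind="stable")[:k]
    order = order[np.isfinite(dist[order])]                    # fewer than k if the window is sparse
    return order, offsets[order], dt[order]


sensor_xy = np.array([[0.0, 0.0], [2.0, 0.0], [1.5, 0.0], [2.0, 0.0]])
obs_t = np.array([0.0, 0.0, 2.0, 3.0])  # the last observation lies in the query's future
idx, off, dts = build_context(np.array([2.0, 0.0]), 2.0, sensor_xy, obs_t,
                              velocity=np.array([1.0, 0.0]), window=5.0, k=2)
```

With a nonzero velocity the upstream sensor observed two time units earlier becomes the closest neighbor, since its signal would have been carried to the query location. A fixed maximal context size k lets neighborhoods be padded and masked into dense batches even when fewer than k observations fall inside the window.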

2412.02818 2026-05-20 cs.RO cs.LG 版本更新

RoboMD: Uncovering Robot Vulnerabilities through Semantic Potential Fields

RoboMD: 通过语义势场揭示机器人漏洞

Som Sagar, Jiafei Duan, Sreevishakh Vasudevan, Yifan Zhou, Heni Ben Amor, Dieter Fox, Ransalu Senanayake

发表机构 * Arizona State University（亚利桑那州立大学）； University of Washington（华盛顿大学）

AI总结本研究提出RoboMD框架，通过在连续视觉-语言嵌入上学习一个深度强化学习策略，揭示机器人操作策略在现实世界外部变化下的漏洞；该策略在虚拟推演中训练，使漏洞分析高效且安全。实验表明，其发现的独特漏洞比现有基线最多多23%，并可用于提升机器人操作性能。

Comments 26 Pages, 20 figures

AI中文摘要

机器人操作策略虽然是实现物理AI愿景的核心，但在现实世界中面对外部变化时极易出现漏洞。诊断这些漏洞面临两大挑战：（i）需要测试的相关变化通常是未知的；（ii）直接在现实世界中测试成本高且不安全。我们提出了一个同时解决这两个问题的框架：在基于有限成功-失败数据训练的连续视觉-语言嵌入上进行虚拟推演，学习一个单独的深度强化学习（深度RL）策略来预测漏洞。通过将这一富含语义和视觉变化的嵌入空间视为势场，该策略学会朝易失败区域移动，同时被成功区域排斥。该漏洞预测策略在虚拟推演中训练，使漏洞分析能够规模化且安全地进行，而无需昂贵的物理试验。通过查询该策略，我们的框架构建出一张概率性的漏洞可能性地图。在模拟基准和实体机械臂上的实验表明，我们的框架发现的独特漏洞比最先进的视觉-语言基线最多多出23%，揭示了被启发式测试忽略的细微漏洞。此外，我们还表明，利用该框架发现的漏洞对操作策略进行微调，可以用少得多的微调数据提升操作性能。

英文摘要

Robot manipulation policies, while central to the promise of physical AI, are highly vulnerable in the presence of external variations in the real world. Diagnosing these vulnerabilities is hindered by two key challenges: (i) the relevant variations to test against are often unknown, and (ii) direct testing in the real world is costly and unsafe. We introduce a framework that tackles both issues by learning a separate deep reinforcement learning (deep RL) policy for vulnerability prediction through virtual runs on a continuous vision-language embedding trained with limited success-failure data. By treating this embedding space, which is rich in semantic and visual variations, as a potential field, the policy learns to move toward vulnerable regions while being repelled from success regions. This vulnerability prediction policy, trained on virtual rollouts, enables scalable and safe vulnerability analysis without expensive physical trials. By querying this policy, our framework builds a probabilistic vulnerability-likelihood map. Experiments across simulation benchmarks and a physical robot arm show that our framework uncovers up to 23% more unique vulnerabilities than state-of-the-art vision-language baselines, revealing subtle vulnerabilities overlooked by heuristic testing. Additionally, we show that fine-tuning the manipulation policy with the vulnerabilities discovered by our framework improves manipulation performance with much less fine-tuning data.
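The potential-field intuition can be illustrated without reinforcement learning. The hypothetical sketch below places Gaussian wells at embeddings of known failures and Gaussian bumps at known successes, then follows the negative gradient, so a point drifts toward a failure region while being pushed away from success regions. RoboMD instead learns a deep RL policy over a trained vision-language embedding; the fixed Gaussian potential, its width, and the 2D toy embedding are assumptions.

```python
import numpy as np


def _bumps(z, points, width):
    """Gaussian bump values exp(-||z - p||^2 / (2 width^2)) for each point p."""
    d2 = ((points - z) ** 2).sum(axis=1)
    return np.exp(-d2 / (2 * width ** 2))


def potential(z, failures, successes, width=1.0):
    """High near known successes (repulsive), low near known failures (attractive)."""
    return _bumps(z, successes, width).sum() - _bumps(z, failures, width).sum()


def gradient(z, failures, successes, width=1.0):
    """Analytic gradient of `potential` with respect to z."""
    def grad_of_bumps(points):
        w = _bumps(z, points, width)
        return -(w[:, None] * (z - points)).sum(axis=0) / width ** 2
    return grad_of_bumps(successes) - grad_of_bumps(failures)


def descend(z0, failures, successes, steps=200, lr=0.1, width=1.0):
    """Follow the negative gradient: drift toward failures, away from successes."""
    z = np.array(z0, dtype=float)
    for _ in range(steps):
        z -= lr * gradient(z, failures, successes, width)
    return z


# Toy 2D "embedding": one known success at the origin, one known failure at (3, 0).
failures = np.array([[3.0, 0.0]])
successes = np.array([[0.0, 0.0]])
z_final = descend([1.0, 0.5], failures, successes)
```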
