
Abstract

Tabular foundation models (TFMs) achieve strong performance on health datasets, but their inference cost and infrastructure requirements limit practical use. We study whether their predictive behavior can be transferred to lightweight tabular models through knowledge distillation. Since in-context TFMs condition on the training set at inference time, naive distillation can introduce context leakage; we address this with stratified out-of-fold teacher labeling. Across $19$ healthcare datasets, $6$ TFM teachers, $4$ student families, and several multi-teacher ensembles, we find that distilled students retain at least $90\%$ of teacher AUC, outperforming teachers in some cases, while running at least $26\times$ faster on CPU and preserving calibration and fairness critical for health applications. Moreover, multi-teacher averaging does not consistently improve over the best single teacher. Leakage-aware distillation is thus a viable route for bringing TFM-quality predictions into inference-constrained health settings.


2605.18701 2026-05-19 cs.LG q-bio.QM (version update)

Learning Normal Representations for Blood Biomarkers

Aashna P. Shah, Michelle M. Li, Yash Lal, Seffi Cohen, Liat F. Antwarg, Morgan Sanchez, James A. Diao, Chirag J. Patel, Ben Y. Reis, Ran D. Balicer, Noa Dagan, Arjun K. Manrai

Affiliations: Department of Biomedical Informatics, Harvard Medical School; Department of Systems Biology, Harvard Medical School; Department of Medicine, Brigham and Women's Hospital; Department of Mathematics, Johns Hopkins University; Computational Health Informatics Program (CHIP), Boston Children's Hospital; The Ivan and Francesca Berkowitz Family Living Laboratory Collaboration at Harvard Medical School and Clalit Research Institute; Clalit Research Institute, Innovation Division, Clalit Health Services; Faculty of Computer and Information Science, Ben Gurion University

AI summary: This study proposes NORMA, a framework that combines patient history with population-level data to generate more precise reference intervals, improving personalized interpretation of blood biomarkers while avoiding the misdiagnosis risk that comes with over-personalization.


Abstract

Blood-based biomarkers underpin clinical diagnosis and management, yet their interpretation relies largely on fixed population reference intervals that ignore stable, intra-patient variability. As such, population-based interpretation can mask meaningful deviation from an individual's baseline, risking delayed disease detection. To remedy this, there have been increasing efforts to personalize blood biomarker interpretation using individual testing histories. However, these methods may overfit to sparse data, inflating false-positive rates and unnecessary follow-up, and can also unwittingly include unrecognized or subclinical disease. Here, we leverage nearly 2 billion longitudinal laboratory measurements from over 1.6 million individuals across North America, the Middle East, and East Asia, to show that while laboratory values are highly individual, purely personalized intervals routinely overfit, classifying up to 68% of measurements as abnormal, without corresponding associations with adverse clinical outcomes. We then introduce NORMA, a conditional transformer-based framework that generates reference intervals by conditioning on both a patient's history and population-level data about "normal" variation. NORMA-derived intervals achieve higher precision for predicting outcomes, including mortality, acute kidney injury, and chronic disease. These findings caution against over-personalization in laboratory medicine and demonstrate that anchoring individual trajectories to population-level priors outperforms either approach alone. To promote transparency, we publicly release the model, code, and an interactive user interface for accessible, individualized laboratory interpretation.
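The idea of anchoring an individual's history to a population-level prior can be illustrated with a much simpler model than the one in the paper. The sketch below is not NORMA (which the abstract describes as a conditional transformer); it swaps in textbook normal-normal shrinkage, and the function name and all parameters are hypothetical choices made for illustration only.

```python
from statistics import mean


def shrunk_reference_interval(history, pop_mean, pop_var, within_var, z=1.96):
    """Illustrative reference interval that blends patient and population data.

    NOT the NORMA model: this is plain normal-normal shrinkage.
    history:    the patient's previous values for one analyte
    pop_mean:   population mean of individual set points (assumed known)
    pop_var:    between-person variance of set points (assumed known)
    within_var: within-person, visit-to-visit variance (assumed known)
    """
    n = len(history)
    if n == 0:
        # No history: fall back to the population prior.
        center, center_var = pop_mean, pop_var
    else:
        # Weight on the patient's own mean grows with the number of results.
        weight = pop_var / (pop_var + within_var / n)
        center = weight * mean(history) + (1 - weight) * pop_mean
        center_var = (1 - weight) * pop_var  # posterior variance of the set point
    # Predictive spread = uncertainty about the set point + visit-to-visit noise.
    half_width = z * (center_var + within_var) ** 0.5
    return center - half_width, center + half_width
```

With two prior results the interval stays close to the population range; with many results it narrows toward the patient's own baseline, which is the qualitative behavior the abstract argues for.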


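2605.18696 2026-05-19 cs.LG cs.AI (version update)

Ensembling Tabular Foundation Models - A Diversity Ceiling And A Calibration Trap

Aditya Tanna, Yash Desai, Pratinav Seth, Mohamed Bouadi, Nassim Bouarour, Vinay Kumar Sankarapu

Affiliations: Lexsi Labs

AI summary: This paper studies ensembling of tabular foundation models (TFMs). Although ensembling usually improves performance, pools of modern TFMs are nearly redundant, and some ensembling strategies perform poorly on accuracy and calibration; the authors recommend greedy selection as a practical default.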


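Defense-Free Strategies Against Adversarial Gradient Attacks: Is Less More?

Mohamed elShehaby, Ashraf Matrawy

Affiliations: Carleton University; Computer Engineering, Carleton University, Ottawa, Canada; School of Information Technology, Carleton University, Ottawa, Canada

AI summary: This paper investigates whether careful choice of network architecture can yield inherently robust deep neural network (DNN)-based network intrusion detection systems (NIDS) without additional explicit defenses. Across thousands of experiments, it finds that shallower networks, reduced feature sets, and ReLU activations effectively lower vulnerability to adversarial attacks, and that simple models outperform deeper ones while retaining high detection performance on clean traffic and low training time.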

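2605.18663 2026-05-19 cs.AI cs.CL cs.LG (version update)

GIM: Evaluating models via tasks that integrate multiple cognitive domains

Rohit Patel, Alexandre Rezende, Steven McClain

Affiliations: Meta Superintelligence Labs

AI summary: This paper introduces the GIM benchmark, which evaluates models on tasks that integrate multiple cognitive domains. It comprises 820 original problems that combine broadly accessible knowledge with multiple cognitive operations, keeping reasoning grounded in realistic tasks. Ability estimates are calibrated with a 2PL IRT model; the paper releases a leaderboard covering 22 models and 47 test configurations and studies in depth the trade-off between test-time compute and model capability.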

Comments 56 pages, 27 figures, 4 tables. Code: https://github.com/facebookresearch/gim ; Dataset: https://huggingface.co/datasets/facebook/gim


Abstract

As LLM benchmarks saturate, the evaluation community has pursued two strategies to increase difficulty: escalating knowledge demands (GPQA, HLE) or removing knowledge entirely in favor of abstract reasoning (ARC-AGI). The first conflates memorization with capability; the second divorces reasoning from the practical contexts in which it matters. We take a different approach. The Grounded Integration Measure (GIM) is a benchmark of 820 original problems (615 public, 205 private) where difficulty comes from integration; individual problems require coordinating multiple cognitive operations (constraint satisfaction, state tracking, epistemic vigilance, audience calibration) over broadly accessible knowledge, so that reasoning stays grounded in realistic tasks without being gated on specialized expertise. Each problem is an original expert-authored composition, the majority with rubric-decomposed scoring (median 6 independently judged criteria). A balanced public-private split provides a built-in contamination diagnostic. We calibrate a continuous response 2-parameter logistic (2PL) IRT model over >200k prompt-response pairs across 28 models, producing robust ability estimates that correctly order test-configurations even when raw accuracy is distorted by errors or missing data, addressing a common challenge in benchmark reporting. Using this framework, we present a comprehensive leaderboard spanning 22 models and 47 test-configurations (unique model, thinking-level pairs), and conduct what is to our knowledge the most extensive published study of how test-time compute trades off against model capability on a fixed benchmark: 11 models swept across 35 test-configurations. We observe that within-family configuration choices, such as thinking budget and quantization, matter as much as model selection. We release the evaluation framework, calibrated IRT parameters, and all public problems.
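For readers unfamiliar with item response theory, the following is a minimal sketch of the 2PL response function and of ability estimation with known item parameters. It assumes fractional rubric scores in [0, 1] are fitted with a cross-entropy (quasi-Bernoulli) likelihood; the abstract does not spell out the exact continuous-response formulation, so that choice and both function names are assumptions.

```python
import math


def two_pl(theta, a, b):
    """Expected score on an item with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def estimate_ability(scores, items, lo=-6.0, hi=6.0, iters=60):
    """Estimate ability theta by bisection on the likelihood equation.

    scores: observed scores in [0, 1], one per item (missing items are
            simply left out of both lists)
    items:  list of (a, b) pairs with a > 0, assumed already calibrated
    """
    def score_fn(theta):
        # Derivative of the cross-entropy log-likelihood w.r.t. theta;
        # it is strictly decreasing in theta, so bisection is safe.
        return sum(a * (s - two_pl(theta, a, b)) for s, (a, b) in zip(scores, items))

    for _ in range(iters):
        mid = (lo + hi) / 2
        if score_fn(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Because the estimate uses only the items that were actually scored, two configurations with different missing-data patterns can still be placed on the same ability scale, which is the property the abstract highlights.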


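2605.18662 2026-05-19 cs.LG (version update)

Efficient and Noise-Tolerant PAC Learning of Multiclass Linear Classifiers

Rita Adhikari, Shiwei Zeng

Affiliations: Augusta University

AI summary: This paper studies efficient learning of multiclass linear classifiers under nasty noise and proposes a PAC learning algorithm, under a mixture-distribution assumption and a margin condition, that needs only O(k²·(d log d + log k)) samples at a constant noise rate.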


Abstract

Noise-tolerant PAC learning of linear models has been of central interest in the machine learning community since the last century. In recent years, many computationally efficient algorithms have been proposed for the problem of learning linear threshold functions under multiple noise models. Yet, when the problem is considered under multiclass learning settings, i.e., when the number of classes $k$ is at least $3$, it is unknown whether there exist computationally efficient PAC learning algorithms when the data sets are maliciously corrupted. In this paper, we assume that the marginal distribution is a mixture of bounded variance distributions and that the data sets satisfy a margin condition at the same time. We show that there exists a computationally efficient algorithm that PAC learns multiclass linear classifiers $\{h_w:x\mapsto \arg\max_{y\in[k]}w_y\cdot x, x\in \mathbb{R}^d, w\in\mathbb{R}^{kd}\}$ using at most $O(k^2\cdot (d\log d+\log k))$ samples even under a constant rate of nasty noise. Our algorithm consists of two main ingredients: a cluster-based pruning scheme and a standard multiclass hinge loss minimization program. Even in the special case of the binary setting, i.e., $k=2$, our result is strictly stronger than all prior work.
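The hypothesis class and loss named in the abstract can be written out concretely. The sketch below implements the classifier $h_w(x)=\arg\max_y w_y\cdot x$ and one common form of the multiclass hinge loss (the Crammer-Singer form); the abstract only says "standard", so that particular form is an assumption, and the cluster-based pruning step is not shown.

```python
def class_scores(W, x):
    """Scores w_y . x for each class y; W is a list of k weight vectors."""
    return [sum(wi * xi for wi, xi in zip(w, x)) for w in W]


def predict(W, x):
    """h_w(x) = argmax over classes of w_y . x."""
    s = class_scores(W, x)
    return max(range(len(s)), key=s.__getitem__)


def multiclass_hinge(W, x, y):
    """Crammer-Singer hinge: max(0, 1 + max_{y' != y} w_{y'} . x - w_y . x)."""
    s = class_scores(W, x)
    runner_up = max(v for j, v in enumerate(s) if j != y)
    return max(0.0, 1.0 + runner_up - s[y])
```

The loss is zero exactly when the true class beats every other class by a margin of at least 1, which is where the margin condition in the abstract enters.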


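2605.18656 2026-05-19 stat.ML cs.AI cs.LG stat.ME (version update)

Statistical Limits and Efficient Algorithms for Differentially Private Federated Learning

Arnab Auddy, Xiangni Peng, Subhadeep Paul

Affiliations: Department of Statistics

AI summary: This paper studies the trade-offs among estimation accuracy, privacy constraints, and communication cost in differentially private federated learning. It proposes two efficient algorithms, FedHybrid and FedNewton, that improve accuracy at reduced communication cost, and establishes upper and lower bounds on mean-squared error to assess their performance.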


Abstract

Federated Learning is a leading framework for training ML and AI models collaboratively across numerous user devices or databases. We study the trade-offs among estimation accuracy, privacy constraints, and communication cost for differentially private (DP) federated M-estimation. The two standard methods in the literature are FedAvg, which may suffer from high federation bias, and FedSGD, which can incur high communication cost. Aiming to improve accuracy at a reduced communication cost, we propose FedHybrid, which runs FedSGD from an improved initialization given by the FedAvg estimator. We also propose FedNewton, which averages local Newton iterations to reduce bias in FedAvg, achieving an estimation accuracy comparable to FedSGD with far fewer communication rounds when the number of clients grows sufficiently slowly. We establish finite sample upper bounds on the mean-squared error rates of the DP versions of these estimators as functions of the number of clients, local sample sizes, privacy budget, and number of iterations. We further derive a minimax lower bound on the MSE of any iterative private federated procedure that provides a benchmark to assess the optimality gap of these methods. We numerically evaluate our methods for training a logistic regression and a neural network on the computer vision datasets MNIST and CIFAR-10.
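The relationship between FedAvg, FedSGD, and the FedHybrid idea can be sketched on a toy problem. The example below uses a one-parameter least-squares model and omits the differential-privacy noise entirely, so it is only a non-private illustration; the function names, learning rate, and round counts are hypothetical.

```python
def local_ols(data):
    """Each client's own least-squares slope for y = w * x."""
    return sum(x * y for x, y in data) / sum(x * x for x, _ in data)


def fed_avg(clients):
    """One communication round: average the clients' local estimators."""
    return sum(local_ols(c) for c in clients) / len(clients)


def local_grad(w, data):
    """Gradient of the client's mean squared error at the shared w."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)


def fed_sgd(clients, w0=0.0, lr=0.05, rounds=200):
    """Many rounds: clients send gradients, the server averages and steps."""
    w = w0
    for _ in range(rounds):
        w -= lr * sum(local_grad(w, c) for c in clients) / len(clients)
    return w


def fed_hybrid(clients, lr=0.05, rounds=5):
    """FedSGD warm-started at the FedAvg estimate, so fewer rounds are needed."""
    return fed_sgd(clients, w0=fed_avg(clients), lr=lr, rounds=rounds)
```

In this linear, noise-free toy case FedAvg is already unbiased, so the warm start lands on the answer immediately; the bias the abstract refers to arises for nonlinear M-estimators, where the few FedSGD rounds after the warm start do the correcting.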


2605.18654 2026-05-19 cs.LG cs.AI (version update)

Pocket Foundation Models: Distilling TFMs into CPU-Ready Gradient-Boosted Trees

Aditya Tanna, Nassim Bouarour, Mohamed Bouadi, Vinay Kumar Sankarapu, Pratinav Seth

Affiliations: Lexsi Labs

AI summary: This paper proposes distilling high-performing tabular foundation models (TFMs) into CPU-native gradient-boosted trees, closing the gap between real-time fraud-scoring requirements and what existing models deliver, and validates the approach on many datasets.


Abstract

A fraud scorer needs to answer in under 2 ms. The best tabular foundation models (TFMs) take 151-1,275 ms on GPU. We close this gap by distilling the TFM offline into an XGBoost or CatBoost student that runs natively on CPU. The central obstacle is specific to in-context learning (ICL) teachers: they leak labels when scoring their own training set, so the soft targets collapse to near-one-hot vectors with no inter-class structure left to distill. Stratified out-of-fold (OOF) teacher labeling prevents this. Across 153 classification datasets drawn from TALENT, OpenML-CC18, TabZilla, and TabArena, distilling TabICLv2 into XGBoost gives 0.882 macro-mean AUC (96.5% of teacher AUC) at 1.9 ms on CPU, a 38x to 860x speedup across teacher-student pairs with a statistically significant edge over a tuned CatBoost baseline (Wilcoxon p = 0.0008; 51% win rate). Four further findings: teacher rank transfers exactly to student rank; gains concentrate on low-dimensional data (< 21 features: +0.011 over CatBoost vs. >21 features: +0.001); multi-teacher averaging helps MLP students (+0.006, p = 0.003) but adds less than 0.001 for tree students; and on high-dimensional tasks where the teacher itself trails CatBoost, distillation makes things worse rather than better. The full pipeline is open-sourced as part of the TabTune library.
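The leakage problem and the out-of-fold remedy can be reproduced with a toy teacher. In the sketch below a k-nearest-neighbour vote stands in for an in-context TFM (it is not TabICLv2 or any real TFM), and the fold assignment is a simple round-robin within each class; all names and defaults are hypothetical.

```python
from collections import defaultdict


def stratified_folds(labels, n_folds):
    """Assign each row a fold id, round-robin within each class."""
    fold_of, seen = [0] * len(labels), defaultdict(int)
    for i, y in enumerate(labels):
        fold_of[i] = seen[y] % n_folds
        seen[y] += 1
    return fold_of


def knn_teacher(context_x, context_y, query, n_classes, k=3):
    """Toy in-context teacher: class frequencies among the k nearest context rows."""
    order = sorted(
        range(len(context_x)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(context_x[j], query)),
    )
    nearest = order[:k]
    probs = [0.0] * n_classes
    for j in nearest:
        probs[context_y[j]] += 1.0 / len(nearest)
    return probs


def oof_soft_labels(X, y, n_classes, n_folds=3, k=3):
    """Label each row with a teacher whose context excludes that row's fold."""
    fold_of = stratified_folds(y, n_folds)
    soft = [None] * len(X)
    for f in range(n_folds):
        ctx = [i for i in range(len(X)) if fold_of[i] != f]
        cx, cy = [X[i] for i in ctx], [y[i] for i in ctx]
        for i in range(len(X)):
            if fold_of[i] == f:
                soft[i] = knn_teacher(cx, cy, X[i], n_classes, k)
    return soft
```

Scoring a row while it sits in the teacher's own context (for example with `k=1`) returns its label as a one-hot vector, which is the collapse the abstract describes; the out-of-fold targets keep the inter-class structure a student can learn from.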


2605.18648 2026-05-19 cs.LG cs.AI cs.CL (version update)

An Assessment of Human vs. Model Uncertainty in Soft-Label Learning and Calibration

Maja Pavlovic, Silviu Paun, Massimo Poesio

Affiliations: Queen Mary University London; Amazon; University of Utrecht

AI summary: By comparing human and model labels in soft-label learning, this paper finds that human labels not only improve model accuracy but also act as a regularizer that improves calibration on difficult samples and training stability.


Abstract

Central to human-aligned AI is understanding the benefits of human-elicited labels over synthetic alternatives. While human soft-labels improve calibration by capturing uncertainty, prior studies conflate these benefits with the implicit correction of mislabeled data (mode shifts), obscuring true effects of soft-labels. We present a controlled audit of soft-label learning across MNIST and a synthetic variant, re-annotating subsets to extract human uncertainty. By decoupling soft-label supervision from underlying label mode shifts, we show that while human soft-labels do provide accuracy gains, their larger value lies in acting as a regularizer that improves model calibration on difficult samples and promotes stable convergence across training runs. Dataset cartography reveals models trained on human soft-labels mirror human uncertainty, whereas those trained on synthetic labels fail to align with humans. Broadly, this work provides a diagnostic testbed for human-AI uncertainty alignment.
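The distinction between soft-label supervision and a mode shift is easy to state in code. The sketch below builds a soft label from raw annotator votes, checks whether its mode disagrees with the dataset's original hard label, and evaluates a cross-entropy against the soft target; the helper names are hypothetical and the paper's actual pipeline may differ.

```python
import math
from collections import Counter


def soft_label(annotations, n_classes):
    """Turn a list of annotator votes into a probability vector."""
    counts = Counter(annotations)
    return [counts.get(c, 0) / len(annotations) for c in range(n_classes)]


def has_mode_shift(original_label, soft):
    """True when the annotators' most common class differs from the original label."""
    return max(range(len(soft)), key=soft.__getitem__) != original_label


def soft_cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy of a predicted distribution against a soft target."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))
```

Separating the two effects then amounts to comparing training runs on samples where `has_mode_shift` is false (pure uncertainty signal) with runs where it is true (implicit label correction).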


2605.18635 2026-05-19 cs.LG cs.AI (version update)

Data Presentation Over Architecture: Resampling Strategies for Credit Risk Prediction with Tabular Foundation Models

Aditya Tanna, Mitul Solanki, Mohamed Bouadi, Nassim Bouarour, Pratinav Seth, Vinay Kumar Sankarapu

Affiliations: Lexsi Labs

AI summary: This paper studies how different context-construction strategies affect the performance of tabular foundation models in credit risk prediction, and finds that the context-construction strategy contributes more to AUC-ROC than the model architecture does.


Abstract

Credit default prediction is a tabular learning problem with severe class imbalance, heterogeneous features, and tight latency budgets. Tabular Foundation Models (TFMs) approach this problem through in-context learning, which makes their predictions sensitive to how the context window is built. We benchmark four classical models and five TFMs on the Home Credit and Lending Club datasets, varying the context-construction strategy (seven options) and the context size (1K to 50K). On both datasets, the choice of context strategy explains more variance in AUC-ROC than the choice of TFM family: balanced and hybrid sampling add 3 to 4 AUC points over uniform sampling, and the gap exceeds the spread between TFMs. With a balanced context of 5K to 10K examples, the strongest TFMs reach the AUC of classical baselines trained on the full data, while also recovering meaningful default-class recall that default-threshold GBDTs do not. We frame this as evidence that context construction, rather than architecture choice, is the primary deployment lever for TFMs in imbalanced credit-risk settings.
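As a rough illustration of what a balanced context means, the sketch below draws an equal number of rows per class for the context window. The abstract does not define its seven strategies, so this is one plausible reading of "balanced" rather than the authors' procedure; the function name and seed handling are hypothetical.

```python
import random


def balanced_context(labels, context_size, seed=0):
    """Pick row indices for the context with equal counts per class.

    If a class has fewer rows than its share, all of its rows are used,
    so the returned context can be smaller than context_size.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    per_class = context_size // len(by_class)
    chosen = []
    for idx in by_class.values():
        chosen.extend(rng.sample(idx, min(per_class, len(idx))))
    rng.shuffle(chosen)  # avoid presenting the context grouped by class
    return chosen
```

With a 5% default rate, uniform sampling of 10 rows would usually contain zero or one default, whereas this sampler returns five of each class, which is the kind of difference the reported AUC gap is attributed to.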


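2605.18632 2026-05-19 cs.LG cs.AI (version update)

Position: Weight Space Should Be a First-Class Generative AI Modality

Zhangyang Wang, Peihao Wang, Kai Wang

Affiliations: University of Texas at Austin; Tencent Hy

AI summary: This paper proposes treating model checkpoints as a first-class data modality and argues that generative modeling in weight space should become a core machine learning primitive. Recent advances show that neural network weights can be synthesized on demand, often matching fine-tuning performance while reducing adaptation cost by orders of magnitude. The paper argues that these results reflect a structural fact: high-performing models occupy low-dimensional, highly structured regions of weight space. Building on this view, it organizes existing methods into a five-stage pipeline, surveys applications where the approach is already practical, and clarifies current limits: adapter-scale and conditional generation are advancing rapidly, while unrestricted frontier-scale checkpoint synthesis remains open.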

Comments AI systems routinely improve or create other AI systems


Abstract

Neural network checkpoints have quietly become a large-scale data resource: millions of trained weight vectors now exist, each encoding task-, domain-, and architecture-specific knowledge. This position paper argues that model checkpoints should be treated as a first-class data modality, and that generative modeling in weight space should be standardized as a core machine learning primitive. Recent advances demonstrate that neural weights can be synthesized on demand, often matching fine-tuning performance while reducing adaptation cost by orders of magnitude. We contend that these results reflect an underlying structural fact: high-performing models occupy low-dimensional, highly structured regions of weight space shaped by symmetry, flatness, modularity, and shared subspaces. Building on this view, we organize existing methods into a five-stage pipeline, survey applications where the approach is already practical, and clarify current limits: adapter-scale and conditional generation are advancing rapidly, while unrestricted frontier-scale checkpoint synthesis remains open. Our goal is to shift the community's default mindset from optimizing models per task to sampling models from learned weight distributions, accelerating toward an era in which AI systems routinely improve or create other AI systems.


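2605.18624 2026-05-19 cs.CR cs.LG (version update)

Learning to Look Benign: Targeted Evasion of Malware Detectors via API Import Injection

Juozas Dautartas, Olga Kurasova, Juozapas Rokas Čypas, Viktor Medvedev

Affiliations: Institute of Data Science and Digital Technologies, Faculty of Mathematics and Informatics, Vilnius University

AI summary: This paper studies whether a malware sample can be deliberately misclassified as a specific benign software category, rather than merely as non-malware, by adding a small number of Win32 API imports characteristic of that category. It proposes a framework based on a Conditional Variational Autoencoder (CVAE) whose decoder is strictly additive: it can introduce new API calls but never remove existing ones, which preserves malware functionality. For each malware sample, the framework automatically identifies the closest benign category and uses it as the evasion target.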


Abstract

Machine learning-based malware detectors are widely deployed in antivirus and endpoint detection systems, yet their reliance on static features makes them vulnerable to adversarial manipulation. This paper investigates whether a malware sample can be intentionally misclassified as a specific benign software category, not merely as "not malware", by adding a small number of Win32 API imports characteristic of that selected category, without removing any existing imports or retraining the detector. We propose a framework centered on a Conditional Variational Autoencoder (CVAE) whose decoder is strictly additive. It can introduce new API calls but never remove existing ones, preserving malware functionality by design. For each malware sample, the framework automatically identifies which benign category it most closely resembles and uses that as the evasion target. A knowledge-distilled differentiable proxy enables gradient-based training against the non-differentiable ensemble detector. Experiments on a six-class dataset of binary Win32 API import vectors extracted from 3,799 Windows executables (five benign categories, one malware class) show that, against a detector achieving 87.5% malware recall, adding just 20 API imports reduces recall to 30%. At k=20, among samples that evaded detection, 99% are classified as the intended target category. The CVAE outperforms both a frequency-based baseline and random selection at every tested injection size (k = 5 to 50). Validation on real PE files submitted to VirusTotal confirms that the attack transfers to commercial static detection engines, with an average 54.5% reduction in flagging engines. These findings expose a concrete vulnerability in API-based malware classifiers and demonstrate that targeted evasion into a chosen benign category is achievable with minimal, functionality-preserving modifications.


2605.18610 2026-05-19 cs.CV cs.AI cs.LG (version update)

CATA: Continual Machine Unlearning via Conflict-Averse Task Arithmetic

Shen Lin, Junhao Dong, Rongjie Chen, Xiaoyu Zhang, Li Xu, Xiaofeng Chen

Affiliations: Fujian Normal University; Nanyang Technological University; Xidian University

AI summary: This paper presents the first study of continual unlearning for vision-language models and proposes CATA, a conflict-averse task arithmetic method that addresses the challenges of effectiveness, model fidelity, and persistence in unlearning.


Abstract

Vision-language models (VLMs) have shown remarkable ability in aligning visual and textual representations, enabling a wide range of multimodal applications. However, their large-scale training data inevitably raises concerns about privacy, copyright, and undesirable content, creating a strong need for machine unlearning. While existing studies mainly focus on single-shot unlearning, practical VLM deployment often involves sequential removal requests over time, giving rise to continual machine unlearning. In this work, we make the first attempt to study continual unlearning for VLMs and identify three key challenges in this setting: effectiveness in removing target knowledge, fidelity in preserving retained model utility, and persistence in preventing knowledge re-emergence under sequential updates. To address these challenges, we propose CATA, a conflict-averse task arithmetic method that represents each forget request as an unlearning task vector. By maintaining historical task vectors and performing sign-aware conflict-averse aggregation, CATA suppresses conflicting update components that may weaken previous forgetting effects. Extensive experiments under both single-shot and continual settings show that CATA outperforms baselines in terms of forgetting effectiveness, model fidelity, and forgetting persistence.
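To make the terms "task vector" and "sign-aware conflict-averse aggregation" concrete, here is a deliberately simple sketch on flat weight lists. The conflict rule used (drop any coordinate of a new task vector whose sign opposes the accumulated history) is an assumption chosen for illustration; the abstract does not give CATA's actual aggregation rule, and all function names are hypothetical.

```python
def task_vector(unlearned, base):
    """Difference between weights after an unlearning update and the base weights."""
    return [u - b for u, b in zip(unlearned, base)]


def conflict_averse_add(history_sum, new_vec):
    """Add new_vec to the running sum of past task vectors.

    Illustrative rule only: a coordinate of new_vec that points against the
    accumulated history is suppressed, so it cannot undo earlier forgetting.
    """
    merged = []
    for h, v in zip(history_sum, new_vec):
        if h * v < 0:
            merged.append(h)      # conflicting component dropped
        else:
            merged.append(h + v)  # agreeing or neutral component kept
    return merged


def apply_vector(base, vec, scale=1.0):
    """Task arithmetic: move the base weights along the aggregated vector."""
    return [b + scale * v for b, v in zip(base, vec)]
```

Each new forget request contributes its own task vector, and the running sum is what gets applied to the base model after every request.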


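2605.18609 2026-05-19 cs.LG (version update)

Perfect Parallelization in Mini-Batch SGD with Classical Momentum Acceleration

Sachin Garg, Michał Dereziński

Affiliations: University of Michigan

AI summary: This paper develops a general theory of mini-batch optimization showing that the acceleration from classical momentum scales proportionally with the gradient mini-batch size, which enables perfect parallelization of mini-batch computations.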


Abstract

Accelerating stochastic gradient methods with classical momentum schemes, such as Polyak's heavy ball, has proven highly successful in training large-scale machine learning models, particularly when combined with the hardware acceleration of large mini-batch computations. Yet, the effect of classical momentum on stochastic mini-batch optimization has been poorly understood theoretically, with prior works requiring strong noise assumptions and extremely large mini-batches. In this work, we develop a general theory of stochastic momentum acceleration for optimizing over quadratics in the interpolation regime, a popular abstraction for studying deep learning dynamics which also includes classical methods such as randomized Kaczmarz and coordinate descent. Our framework encompasses both heavy ball and Nesterov-style momentum, allows for arbitrary mini-batch sizes, and makes minimal assumptions on the stochastic noise. In particular, we show that acceleration from classical momentum is directly proportional to the gradient mini-batch size (up to a natural saturation point), thereby enabling perfect parallelization of mini-batch computations. Our theory also provides a simple choice for the momentum parameter, which is shown to be effective empirically.
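The setting studied here, mini-batch SGD with heavy-ball momentum on a least-squares problem that can be solved exactly (the interpolation regime), can be written in a few lines. The sketch below is a generic implementation of the standard heavy-ball update; the step size and momentum values a caller passes are their own choice and are not the parameter selection derived in the paper.

```python
import random


def minibatch_heavy_ball(A, b, batch_size, lr, beta, steps, seed=0):
    """Minimize the mean of 0.5 * (a_i . w - b_i)^2 with heavy-ball momentum.

    Update: w_next = w - lr * g + beta * (w - w_prev), where g is the
    gradient averaged over a mini-batch sampled without replacement.
    """
    rng = random.Random(seed)
    d = len(A[0])
    w, prev = [0.0] * d, [0.0] * d
    for _ in range(steps):
        batch = rng.sample(range(len(A)), batch_size)
        grad = [0.0] * d
        for i in batch:
            residual = sum(a * wj for a, wj in zip(A[i], w)) - b[i]
            for j in range(d):
                grad[j] += residual * A[i][j] / batch_size
        new = [w[j] - lr * grad[j] + beta * (w[j] - prev[j]) for j in range(d)]
        prev, w = w, new
    return w
```

Because the system `A w = b` is consistent, the stochastic gradient vanishes at the solution, which is what allows momentum to help without the noise floor that appears outside the interpolation regime.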


2605.18607 2026-05-19 cs.CL cs.LG (version update)

Forecasting Downstream Performance of LLMs With Proxy Metrics

Arkil Patel, Siva Reddy, Marius Mosbach, Dzmitry Bahdanau

Affiliations: Mila – Quebec AI Institute & McGill University; CIFAR AI Chair; ServiceNow Research Periodic Labs

AI summary: This paper proposes building proxy metrics by aggregating token-level statistics (such as entropy, top-k accuracy, and expert token rank) from a candidate model's next-token distribution, forecasting the downstream performance of large language models more accurately than conventional loss- and compute-based baselines.

Comments Preprint. 31 pages


Abstract

Progress in language model development is often driven by comparative decisions: which architecture to adopt, which pretraining corpus to use, or which training recipe to apply. Making these decisions well requires reliable performance forecasts, yet the two commonly used signals are fundamentally limited. Cross-entropy loss is poorly aligned with downstream capabilities, and direct downstream evaluation is expensive, sparse, and often uninformative at early training stages. Instead, we propose to construct proxy metrics by aggregating token-level statistics, such as entropy, top-k accuracy, and expert token rank, from a candidate model's next token distribution over expert-written solutions. Across three settings, our proxies consistently outperform loss- and compute-based baselines: 1) For cross-family model selection, they rank a heterogeneous population of reasoning models with mean Spearman Rho = 0.81 (vs. Rho = 0.36 for cross-entropy loss); 2) For pretraining data selection, they reliably rank 25 candidate corpora for a target model at roughly $10{,}000\times$ less compute than direct evaluation, pushing the Pareto frontier beyond existing methods; and 3) for training-time forecasting, they extrapolate downstream accuracy across an $18\times$ compute horizon with roughly half the error of existing alternatives. Together, these results suggest that expert trajectories are a broadly useful source of signal for assessing model capabilities, enabling reliable performance forecasting throughout the model development life cycle.
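The three token-level statistics named in the abstract are straightforward to compute once a model's next-token distributions over an expert solution are available. The sketch below assumes those distributions are given as plain dictionaries and simply averages each statistic over positions; how the paper weights or combines them into a proxy is not stated in the abstract, so the aggregation here is an assumption.

```python
import math


def token_stats(distributions, expert_tokens, k=2):
    """Average entropy, top-k accuracy, and expert-token rank over positions.

    distributions: one dict per position mapping token -> probability
    expert_tokens: the token the expert solution used at each position
    """
    entropies, hits, ranks = [], [], []
    for probs, token in zip(distributions, expert_tokens):
        entropies.append(-sum(p * math.log(p) for p in probs.values() if p > 0))
        ranked = sorted(probs, key=probs.get, reverse=True)
        # Tokens missing from the distribution are ranked just past the end.
        rank = ranked.index(token) + 1 if token in probs else len(ranked) + 1
        ranks.append(rank)
        hits.append(1.0 if rank <= k else 0.0)
    n = len(expert_tokens)
    return {
        "mean_entropy": sum(entropies) / n,
        "top_k_accuracy": sum(hits) / n,
        "mean_expert_rank": sum(ranks) / n,
    }
```

Unlike cross-entropy loss, the rank and top-k statistics ignore how much probability mass the expert token receives once it is among the leading candidates, which is one reason such proxies can order models differently from loss.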


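2605.18598 2026-05-19 cs.LG cond-mat.stat-mech math.FA math.PR math.ST stat.TH (version update)

Pointwise Generalization in Deep Neural Networks

Shaojie Li, Yunbei Xu

Affiliations: National University of Singapore

AI summary: This paper proposes a theoretical framework for pointwise generalization in deep neural networks. By analyzing the pointwise Riemannian Dimension of fully connected networks, it builds a new statistical foundation for representation learning and yields tighter generalization bounds.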


Abstract

We address the fundamental question of why deep neural networks generalize by establishing a pointwise generalization theory for fully connected networks. This framework resolves long-standing barriers to characterizing the rich nonlinear feature-learning regime and builds a new statistical foundation for representation learning. For each trained model, we characterize the hypothesis via a pointwise Riemannian Dimension, derived from the eigenvalues of the learned feature representations across layers. This establishes a principled framework for deriving hypothesis-dependent, representation-aware generalization bounds. These bounds offer a systematic upgrade over approaches based on model size, products of norms, and infinite-width linearizations, yielding guarantees that are orders of magnitude tighter in both theory and experiment. Analytically, we identify the structural properties and mathematical principles that explain the tractability of deep networks. Empirically, the pointwise Riemannian Dimension exhibits substantial feature compression, decreases with increased over-parameterization, and captures the implicit bias of optimizers. Taken together, our results indicate that deep networks are mathematically tractable in practical regimes and that their generalization is sharply explained by pointwise, feature-spectrum-aware complexity.

2605.18591 2026-05-19 cs.LG cs.AI 版本更新

Randomized Advantage Transformation (RAT): Computing Natural Policy Gradients via Direct Backpropagation

随机优势变换（RAT）：通过直接反向传播计算自然策略梯度

Mingfei Sun

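Affiliations: The University of Manchester, United Kingdom

AI summary: This paper proposes RAT, which estimates the regularized natural policy gradient through direct backpropagation, avoiding the high cost of estimating and inverting the Fisher matrix in traditional methods. Experiments show strong performance on continuous-control and visual-control benchmarks, and the method is easy to implement.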

Comments Accepted to ICML 2026

2605.18580 2026-05-19 cs.AI cs.LG 版本更新

When Outcome Looks Right But Discipline Fails: Trace-Based Evaluation Under Hidden Competitor State

当结果看似正确但纪律却失败：基于轨迹的评估在隐藏对手状态下的应用

Peiying Zhu, Sidi Chang

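Affiliations: Blossom AI; Blossom AI Labs

AI summary: This paper proposes a trace-based evaluation method for assessing the stability of behavioral discipline under hidden competitor state. It uses trace diagnostics, mechanism separation, and transfer tests to improve reinforcement learning policies, particularly on hotel pricing and hidden-budget bidding tasks.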

详情

AI中文摘要

仅结果的评估可能无法保证经济安全的智能体：一种策略可能在达到业务KPI的同时，违反可部署的行为纪律。在酒店定价中，当存在隐藏的对手状态时，学习者可能在看似合理的每间房收入上取得成绩，却无法保持规则基于的收益管理对手的定价纪律。我们引入了纪律稳定性，一种基于轨迹的评估范式：定义基准行为，限制观察到部署阶段，从失败中诱导轨迹诊断，通过消融分离机制，并测试转移和部署。在两个酒店基准和一个紧凑的隐藏预算竞标任务中，仅奖励的PPO变体无法实现轨迹对齐；揭示隐藏状态可减少标签不确定性；确定性复制可压缩不确定性；而轨迹先验或修正历史策略能更好地保持价格或投标分布。纯粹的行为克隆在对称模仿中几乎足够，而轨迹先验强化学习在容量不对称情况下增加有限的适应性。本文的贡献是一种评估和基准范式，而不是新的优化器或关于多智能体强化学习的普遍声明。

英文摘要

Outcome-only evaluation can certify economically unsafe agents: a policy can hit a business KPI while violating deployable behavioral discipline. In hotel pricing with hidden competitor state, a learner can achieve plausible revenue per available room while failing to preserve the rate discipline of a rule-based revenue-management competitor. We introduce discipline stability, a trace-based evaluation paradigm: define the benchmark behavior, restrict observations to the deployment regime, induce trace diagnostics from failure, separate mechanisms with ablations, and test transfer and deployment. Across a two-hotel benchmark and a compact hidden-budget bidding task, reward-only PPO variants miss trace alignment; revealing hidden state reduces label uncertainty; deterministic copy collapses uncertainty; and trace-prior or corrected history policies better preserve price or bid distributions. Pure behavior cloning is nearly enough for symmetric imitation, while Trace-Prior RL adds bounded adaptation under capacity asymmetry. The contribution is an evaluation and benchmark paradigm, not a new optimizer or a universal claim about MARL.

2605.18576 2026-05-19 cs.LG 版本更新

scHelix: Asymmetric Dual-Stream Integration via Explicit Gene-Level Disentanglement

scHelix: 通过显式基因层面解缠实现非对称双流整合

Xichen Yan, Zelin Zang, Changxi Chi, Jingbo Zhou, Chang Yu, Jinlin Wu, Shenghui Cheng, Fuji Yang, Jiebo Luo, Zhen Lei, Stan Z. Li

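Affiliations: Jinan University; Westlake University

AI summary: scHelix performs asymmetric dual-stream integration through explicit gene-level disentanglement, addressing the tension in single-cell RNA sequencing integration between removing batch effects and preserving biological fidelity. A dual-stream sparse diffusion encoder and an asymmetric Align-Refine-Fuse protocol improve integration quality.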

Comments 17 pages, 8 figures, accepted by KDD 26

详情

AI中文摘要

单细胞RNA测序（scRNA-seq）数据整合中一个关键挑战是解决消除批次效应与保持生物学忠实性之间的张力。尽管近期证据表明批次效应在基因层面异质性表现，但大多数现有方法对转录组进行统一处理，常导致过度校正和细微生物学信号的丢失。为此，我们提出了scHelix，一个数据自适应框架，通过在输入层面显式将基因划分为领域不变的Anchors和领域敏感的Variants。scHelix利用配备停止梯度图缓存的双流稀疏扩散编码器，高效学习多尺度结构表示。我们的核心方法是一种新的非对称Align-Refine-Fuse协议：首先将不稳定的Variant流对齐到稳定的Anchor流拓扑结构，随后进行保守细化阶段，其中Anchor流通过有界残差门吸收去噪细节。这种分而治之的架构防止了捷径学习，确保在不损害生物簇完整性的情况下实现稳健的批次去除。广泛基准测试表明，scHelix在性能上优于现有最先进方法。

英文摘要

A critical challenge in single-cell RNA sequencing (scRNA-seq) integration is resolving the tension between eliminating batch effects and maintaining biological fidelity. While recent evidence indicates that batch effects manifest heterogeneously across genes, most existing methods process the transcriptome uniformly, frequently resulting in over-correction and loss of subtle biological signals. To address this, we present scHelix, a dataset-adaptive framework that fundamentally changes how features are processed by explicitly partitioning genes into domain-invariant Anchors and domain-sensitive Variants at the input level. scHelix utilizes a dual-stream sparse diffusion encoder equipped with stop-gradient graph caching to efficiently learn multi-scale structural representations. The core of our approach is a novel asymmetric Align-Refine-Fuse protocol: the unstable Variant stream is first aligned to the robust topology of the Anchor stream, followed by a conservative refinement phase where the Anchor stream absorbs denoised details via bounded residual gating. This divide-and-conquer architecture prevents shortcut learning and ensures robust batch removal without compromising the integrity of biological clusters. Extensive benchmarking demonstrates that scHelix outperforms state-of-the-art methods.

2605.18567 2026-05-19 cs.CL cs.LG 版本更新

GUT-IS: A Data-Driven Approach to Integrating Constructs and Their Relations in Information Systems

GUT-IS: 一种数据驱动的方法，用于整合信息系统的构念及其关系

Maximilian Reinhardt, Jonas Scharfenberger, Burkhardt Funk

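Affiliations: Institute of Information Systems

AI summary: This paper proposes a data-driven approach that combines task-adapted text embeddings with clustering to generate candidate construct groupings, then selects the best solution with a loss function that explicitly trades off semantic purity against parsimony in the number of clusters. This makes it possible to analyze how construct groupings and their relations change as the priority shifts from purity to parsimony.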

Comments Accepted at the 34th European Conference on Information Systems (ECIS 2026), Milan, Italy

2605.18562 2026-05-19 stat.ME cs.AI cs.LG stat.AP 版本更新

Estimating Item Difficulty with Large Language Models as Experts

利用大语言模型作为专家估算项目难度

Diana Kolesnikova, Kirill Fedyanin, Abe D. Hofman, Matthieu J. S. Brinkhuis, Maria Bolsinova

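Affiliations: Department of Methodology and Statistics, Tilburg University; Smart Business Technologies; Department of Psychological Methods, University of Amsterdam; Prowise Learn, Amsterdam; Department of Information and Computing Sciences, Utrecht University

AI summary: This paper studies how to use large language models to estimate the difficulty of new tasks. Comparing model performance across configurations, it finds that pairwise-comparison configurations perform better without additional refinements, while absolute-judgement configurations that combine token probabilities with examples of known difficulty also reach moderate-to-high alignment.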

Comments 24 pages, 2 figures, 9 tables

详情

AI中文摘要

准确估计项目难度对于有效的评估和适应性学习至关重要。然而，对于新创建的任务，响应数据通常不可用。预测试和专家判断可能成本高且耗时，而机器学习方法通常需要大量标记训练数据。最近的研究表明，大语言模型（LLMs）可能有所帮助。然而，关于如何通过提示配置来模拟专家进行难度估计的证据有限。本研究通过评估三种现成的LLMs作为新任务的难度评估者，填补了这一空白。使用一个在线学习系统中的项目库，研究了6个小学数学领域，将经验难度作为参考。研究采用全因子设计，交叉三个因素：判断格式（绝对vs对偶比较）、决策类型（硬决策vs基于token概率的估计）和提示策略（零样本vs少量样本）。LLM生成的难度估计与经验难度通过斯皮尔曼等级相关性进行比较。在各领域中，LLM生成的估计与经验项目难度表现出中等至强正相关。对于简单的算术任务，某些配置接近之前研究中人类专家报告的准确性范围的上限。对偶比较在无额外优化时始终优于绝对判断。然而，当结合token级概率并提供已知难度的项目示例时，绝对判断配置也表现出中等至高水平的对齐度。本研究将LLMs定位为初始项目校准的有前途的工具，并提供了有效工作流程配置的见解。

英文摘要

Accurate estimates of item difficulty are essential for valid assessment and effective adaptive learning. However, for newly created tasks, response data are typically unavailable. Pretesting and expert judgement can be costly and slow, while machine learning methods often require large labelled training datasets. Recent work suggests that large language models (LLMs) may help. However, there is limited evidence on the elicitation procedures and prompt configurations used to emulate experts for difficulty estimation. This study addresses this gap by evaluating three off-the-shelf LLMs as difficulty raters for newly created items without access to response data. Using an item bank from an online learning system, the study examined 6 domains of primary-school mathematics, with empirical difficulty estimates treated as empirical reference. The study used a full factorial design crossing three factors: judgement format (absolute vs pairwise), decision type (hard decisions vs token-probability-based estimates), and prompting strategy (zero-shot vs few-shot). LLM-derived difficulty estimates were compared with empirical difficulties using Spearman rank correlations. Across domains, LLM-based estimates exhibited moderate to strong positive correlations with empirical item difficulties. For simpler arithmetic tasks, some configurations approached the upper end of the accuracy range reported for human experts in previous research. Pairwise comparison consistently outperformed absolute judgement in the absence of additional refinements. However, when token-level probabilities were incorporated and examples of items with known empirical difficulty were provided, the absolute judgement configuration likewise demonstrated moderate-to-high alignment. The study positions LLMs as a promising tool for initial item calibration and offers insights into effective workflow configuration.
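The abstract above compares LLM-derived difficulty estimates with empirical difficulties using Spearman rank correlations. The following is a minimal sketch of that comparison step only. The function names and the five-item data set are hypothetical and are not taken from the paper, and the study's actual tooling is not stated in the abstract.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the tie group
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical example: five items, LLM estimates vs. empirical difficulties.
llm_estimates = [0.2, 0.5, 0.9, 0.4, 0.7]
empirical = [-1.3, 0.1, 1.8, 0.4, 0.9]
rho = spearman(llm_estimates, empirical)
print(round(rho, 3))
```

Because the metric depends only on rank order, it does not require the LLM estimates and the empirical difficulties to share a scale.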

2605.18557 2026-05-19 cs.LG cs.NE q-bio.NC 版本更新

在叠加中探测表示流形

Alexander Modell

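Affiliations: Department of Mathematics

AI summary: This paper proposes the Manifold Probe, a method for discovering representation manifolds in superposition. It learns a linearly predictable feature space together with encoding directions, revealing manifolds that are causally relevant to model behavior.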

Comments 19 pages, 7 figures

2605.18535 2026-05-19 cs.LG cs.MA 版本更新

Beyond Scaling: Agents Are Heading to the Edge

超越扩展：智能体正走向边缘

Chunlin Tian, Dongqi Cai, Wanru Zhao, Nicholas D. Lane

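Affiliations: University of Cambridge; University of Macau; Nanjing University

AI summary: This paper argues that the bottleneck in agent technology has shifted from compressing world knowledge into a single model to executing a coordinated system, and that personal-agent architecture must move to the edge to meet the need for structural coupling with high-fidelity local context and zero-latency execution loops.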

详情

AI中文摘要

有用智能体智能的瓶颈已从将世界知识压缩到单一模型转变为执行协调系统。本文主张个人智能体架构必须走向边缘，因为智能体任务的核心特性，特别是其与高保真局部环境的结构耦合以及对零延迟执行循环的需求，无法与以云为中心的设计兼容。我们通过三个结构性转变来支持这一主张。首先，前额转变：能力的主要边际杠杆已从预训练规模转移到框架级执行控制。此类控制必须保持与行动环境的物理接近，以确保智能体保持认知一致性。其次，数据地理悖论，智能体数据的“暗物质”（本地文件层次结构、实时传感器流和瞬态操作系统状态）在准备传输到云时会退化、消失或失去意义，从而切断智能体与真实环境上下文的联系。第三，交互对齐循环，唯一经济和生态可持续的智能体细化数据来源是通过实时本地交互产生的高保真隐含偏好信号。我们最后提出可检验的预测，用于个人智能体的下一次部署周期。

英文摘要

The bottleneck of useful agentic intelligence has shifted from compressing world knowledge into a single model to executing a coordinated system. This position paper argues that personal-agent architecture must move to the edge because the core properties of agentic intelligence tasks, particularly their structural coupling with high-fidelity local context and the need for zero-latency execution loops, do not sit well with cloud-centric designs. We develop this claim through three structural shifts. First, the Prefrontal Turn: the main marginal lever of capability has moved from pre-training scale to framework-level executive control. Such control must remain physically close to the environment of action if the agent is to preserve cognitive alignment. Second, the Data-Geography Paradox: the "dark matter" of agentic data (local file hierarchies, real-time sensor streams, and transient OS states) degrades, disappears, or loses meaning once prepared for cloud transmission, thereby cutting the agent off from ground-truth context. Third, the interaction-alignment loop: the only economically and ecologically sustainable source of agentic refinement data is the high-fidelity implicit preference signal produced through real-time local interaction. We conclude with falsifiable predictions for the next deployment cycle of personal agents.

2605.18534 2026-05-19 cs.LG 版本更新

XCTFormer: Leveraging Cross-Channel and Cross-Time Dependencies for Enhanced Time-Series Analysis

XCTFormer: 利用跨通道和跨时间依赖性提升时间序列分析

Israel Zexer, Omri Azencot

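Affiliations: The Stein Faculty of Computer and Information Science; Ben-Gurion University of the Negev

AI summary: This paper proposes XCTFormer, which uses an enhanced attention mechanism to explicitly capture cross-time and cross-channel dependencies in time series, improving time-series analysis and achieving state-of-the-art results on missing-value imputation in particular.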

Comments TMLR 2026

详情

AI中文摘要

多变量时间序列分析涉及从多个相互依赖变量的序列中提取信息性表示，支持预测、填补和异常检测等任务。在现实场景中，这些变量通常来自共享上下文或底层现象，表明存在时间与通道间的潜在依赖性，可以利用以提高性能。然而，最近的研究发现，假设无变量间依赖性的通道独立（CI）模型往往优于显式建模此类关系的通道依赖（CD）模型。这一意外结果表明，当前CD模型可能由于依赖性捕捉的限制而未能充分发挥潜力。最近的研究重新审视了通道依赖建模，但这些方法通常采用间接建模策略，可能导致有意义的依赖性被忽视。为了解决这个问题，我们引入了XCTFormer，一种基于Transformer的通道依赖（CD）模型，通过增强的注意力机制显式捕捉跨时间和跨通道依赖性。该模型以token到token的方式操作，建模时间与通道之间每对token之间的成对依赖性。架构包括（i）数据处理模块，（ii）新型的跨关系注意力块（CRAB），以增加容量和表达性，以及（iii）可选的依赖压缩插件（DeCoP），以提高可扩展性。通过在三个时间序列基准上的广泛实验，我们证明XCTFormer在与广泛认可的基线相比时取得了强劲的结果；特别是，在填补任务中，它在MSE和MAE上分别比第二好的方法平均高出20.8%和15.3%。

英文摘要

Multivariate time-series analysis involves extracting informative representations from sequences of multiple interdependent variables, supporting tasks such as forecasting, imputation, and anomaly detection. In real-world scenarios, these variables are typically collected from a shared context or underlying phenomenon, suggesting the presence of latent dependencies across time and channels that can be leveraged to improve performance. However, recent findings show that channel-independent (CI) models, which assume no inter-variable dependencies, often outperform channel-dependent (CD) models that explicitly model such relationships. This surprising result indicates that current CD models may not fully exploit their potential due to limitations in how dependencies are captured. Recent studies have revisited channel dependence modeling with various approaches; however, these methods often employ indirect modeling strategies, which can lead to meaningful dependencies being overlooked. To address this issue, we introduce XCTFormer, a transformer-based channel-dependent (CD) model that explicitly captures cross-temporal and cross-channel dependencies via an enhanced attention mechanism. The model operates in a token-to-token fashion, modeling pairwise dependencies between every pair of tokens across time and channels. The architecture comprises (i) a data processing module, (ii) a novel Cross-Relational Attention Block (CRAB) that increases capacity and expressiveness, and (iii) an optional Dependency Compression Plugin (DeCoP) that improves scalability. Through extensive experiments on three time-series benchmarks, we show that XCTFormer achieves strong results compared to widely recognized baselines; in particular, it attains state-of-the-art performance on the imputation task, outperforming the second-best method by an average of 20.8% in MSE and 15.3% in MAE.

2605.18530 2026-05-19 cs.CL cs.AI cs.LG stat.ML 版本更新

Continuous Diffusion Scales Competitively with Discrete Diffusion for Language

连续扩散在语言领域中能与离散扩散竞争性地扩展

Zhihan Yang, Wei Guo, Shuibai Zhang, Subham Sekhar Sahoo, Yongxin Chen, Arash Vahdat, Morteza Mardani, John Thickstun

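Affiliations: NVIDIA & Cornell; NVIDIA & Georgia Tech; UW-Madison; MBZUAI-IFM; Cornell

AI summary: This paper studies the scaling behavior of continuous diffusion models for language modeling. It builds RePlaid by improving the Plaid model, shows that continuous diffusion models can compete with discrete models in compute efficiency and performance, and provides theoretical support.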

详情

AI中文摘要

尽管扩散模型近期在语言建模领域受到广泛关注，但连续扩散模型在扩展性方面似乎不如离散方法。为了挑战这一观点，我们重新审视Plaid，一种基于似然的连续扩散语言模型（DLM），并构建RePlaid，通过将Plaid的架构与现代离散DLMs对齐。在统一的设定下，我们建立了第一个连续DLMs的扩展定律，表明RePlaid的计算差距仅为自回归模型的20倍，使用更少的参数优于Duo，并在过训练范围内优于MDLM。我们将RePlaid与最近的连续DLMs进行基准测试：在OpenWebText上，RePlaid实现了连续DLMs中的新状态-of-the-art PPL界值为22.1，并在生成质量上更优。这些结果表明，当通过似然训练时，连续扩散是与离散DLMs高度竞争且可扩展的替代方案。此外，我们提供了理论见解以理解基于似然训练的优势。我们展示了优化噪声调度以最小化ELBO的方差自然会得到时间上的线性交叉熵（信息损失）。这均匀地分配去噪难度，而无需任何特定时间的重参数化。此外，我们发现通过似然优化嵌入会创建结构化的几何形状并驱动最大的似然增益。

英文摘要

While diffusion has drawn considerable recent attention from the language modeling community, continuous diffusion has appeared less scalable than discrete approaches. To challenge this belief we revisit Plaid, a likelihood-based continuous diffusion language model (DLM), and construct RePlaid by aligning the architecture of Plaid with modern discrete DLMs. In this unified setting, we establish the first scaling law for continuous DLMs that rivals discrete DLMs: RePlaid exhibits a compute gap of only $20\times$ compared to autoregressive models, outperforms Duo while using fewer parameters, and outperforms MDLM in the over-trained regime. We benchmark RePlaid against recent continuous DLMs: on OpenWebText, RePlaid achieves a new state-of-the-art PPL bound of $22.1$ among continuous DLMs and superior generation quality. These results suggest that continuous diffusion, when trained via likelihood, is a highly competitive and scalable alternative to discrete DLMs. Moreover, we offer theoretical insights to understand the advantage of likelihood-based training. We show that optimizing the noise schedule to minimize the ELBO's variance naturally yields linear cross-entropy (information loss) over time. This evenly distributes denoising difficulty without any case-specific time reparameterization. In addition, we find that optimizing embeddings via likelihood creates structured geometries and drives the most significant likelihood gain.

2605.18522 2026-05-19 cs.CV cs.AI cs.LG 版本更新

Beyond Morphology: Quantifying the Diagnostic Power of Color Features in Cancer Classification

超越形态学：量化颜色特征在癌症分类中的诊断能力

Farnaz Kheiri, Shahryar Rahnamayan, Masoud Makrehchi

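Affiliations: Dept. of Electrical, Computer and Software Engineering; Ontario Tech University; Dept. of Engineering; Brock University

AI summary: This paper studies the diagnostic power of color features in cancer classification. Excluding morphological information, it evaluates the discriminative power of global color features and finds that they reach up to 89% accuracy on binary classification tasks, indicating that color distributions carry a non-random diagnostic signal.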

详情

AI中文摘要

在组织病理学中，人类专家主要依靠颜色增强对比度来解读组织形态，而机器视觉模型则将颜色视为原始统计信息。这一区别提出了一个根本性问题：像素强度本身，独立于结构和形态学线索，能支持多少癌症分类？为了解决这个问题，我们系统评估了全局颜色特征的独立判别力，同时刻意排除所有形态学信息。具体而言，我们提取了统计颜色矩，并对RGB和HSV颜色直方图进行离散化处理，然后在十个不同的实验设置中使用经典机器学习分类器评估其性能。我们的结果表明，在二元诊断任务（例如良性与恶性）中，仅颜色特征即可实现强劲的性能，分类准确率可达到89%。这种性能很可能归因于与恶性相关的全局色度变化。重要的是，这些简单的颜色基表示在很大程度上优于随机基线，表明原始颜色分布编码了非随机且具有诊断意义的信号用于癌症检测。因此，本研究表明，简单的、计算高效的色彩特征可以作为一种有效的预筛选工具。通过识别具有强色度指示恶性特征的样本，这些轻量模型可以作为第一道筛选系统，减少对复杂深度学习架构的计算负担。

英文摘要

In histopathology, human experts primarily rely on color as a means of enhancing contrast to interpret tissue morphology, whereas machine vision models process color as raw statistical information. This distinction raises a fundamental question: to what extent can pixel intensity alone, independent of structural and morphological cues, support cancer classification? To address this question, we systematically evaluated the standalone discriminative power of global color features while deliberately excluding all morphological information. Specifically, we extracted statistical color moments and discretized RGB and HSV color histograms, and assessed their performance across ten diverse experimental settings using classical machine learning classifiers. Our results demonstrate that color features alone can achieve strong performance in binary diagnostic tasks (e.g., benign versus malignant), with classification accuracies reaching up to 89%. This performance is likely attributable to global chromatic shifts associated with malignancy. Importantly, these simple color-based representations consistently outperformed random baselines by a substantial margin, indicating that raw color distributions encode a non-random and diagnostically relevant signal for cancer detection. Consequently, this study suggests that simple, computationally efficient color features can serve as an effective pre-screening tool. By identifying samples with strong chromatic indicators of malignancy, these lightweight models could function as a first-pass triage system, reducing the computational burden on complex deep learning architectures.
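The abstract above describes global color features built from statistical color moments and discretized color histograms, with all spatial structure discarded. The sketch below illustrates one plausible form of such a feature vector for RGB only. The bin count, the choice of three moments, the function names, and the four sample pixels are all assumptions for illustration; the paper's exact feature definitions, its HSV histograms, and its classifiers are not reproduced here.

```python
def color_moments(pixels):
    """Per-channel mean, standard deviation, and skewness of (r, g, b) tuples."""
    n = len(pixels)
    feats = []
    for c in range(3):
        vals = [p[c] for p in pixels]
        mu = sum(vals) / n
        var = sum((v - mu) ** 2 for v in vals) / n
        third = sum((v - mu) ** 3 for v in vals) / n
        std = var ** 0.5
        skew = third / std ** 3 if std > 0 else 0.0
        feats += [mu, std, skew]
    return feats


def channel_histogram(pixels, bins=4):
    """Normalized histogram over 0..255 for each channel, concatenated (R, G, B)."""
    n = len(pixels)
    width = 256 / bins
    hist = [0.0] * (3 * bins)
    for p in pixels:
        for c in range(3):
            hist[c * bins + min(int(p[c] / width), bins - 1)] += 1 / n
    return hist


# Hypothetical pixels; pixel order is irrelevant, so no morphology is encoded.
pixels = [(200, 50, 120), (180, 60, 130), (220, 40, 110), (190, 55, 125)]
features = color_moments(pixels) + channel_histogram(pixels)
print(len(features))
```

Any permutation of the pixels yields the same feature vector, which is the property that lets features like these isolate color from tissue structure.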

2605.18509 2026-05-19 cs.LG 版本更新

Offline Contextual Bandits in the Presence of New Actions

离线情境老虎机中存在新动作的情况

Ren Kishimoto, Tatsuhiro Shimizu, Kazuki Kawamura, Takanori Muroi, Yusuke Narita, Yuki Sasamoto, Kei Tateno, Takuma Udagawa, Yuta Saito

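Affiliations: Institute of Science Tokyo; Yale University; Sony Group Corporation; Hanjuku-kaso, Co., Ltd.

AI summary: This paper studies off-policy learning (OPL) for offline contextual bandits when new actions are introduced after the logging policy is deployed. It proposes a new OPL method that combines the Local Combination PseudoInverse (LCPI) estimator with the Policy Optimization for Effective New Actions (PONA) algorithm to learn and select new actions effectively while maintaining overall policy performance.

Comments 12 pages, 7 figures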

详情

AI中文摘要

自动化决策算法驱动推荐系统和搜索引擎等应用。这些算法通常依赖于离线情境老虎机或离线学习（OPL）。传统上，OPL选择现有动作集中的动作以最大化预期奖励。然而，在许多现实场景中，动作（如新闻文章或视频内容）会持续变化，且在数据收集后，动作空间会随时间演变。我们定义在部署日志策略后引入的动作为新动作，并专注于包含新动作的OPL。现有OPL方法能有效识别现有动作集中的最优动作，但无法学习和选择新动作，因为没有相关数据被记录。为解决这一限制，我们提出了一种新的OPL方法，利用动作特征。我们首先引入局部组合伪逆（LCPI）估计器用于策略梯度，扩展了最初为离线情境老虎机滑动评估提出的伪逆估计器。LCPI在奖励建模条件和数据收集条件之间控制动作特征的权衡，捕捉不同动作特征维度之间的交互效应。此外，我们提出了一种名为Policy Optimization for Effective New Actions（PONA）的通用算法，将专门用于新动作选择的LCPI组件与在现有动作中学习效果出色的双重稳健（DR）算法结合。我们定义PONA为LCPI和DR估计器的加权和，优化现有和新动作的选择，并允许通过权重参数调整新动作选择的比例。通过广泛的实验，我们证明PONA能够高效地选择新动作，同时保持整体策略性能，相较于大多数现有方法无法选择新动作。

英文摘要

Automated decision-making algorithms drive applications such as recommendation systems and search engines. These algorithms often rely on off-policy contextual bandits or off-policy learning (OPL). Conventionally, OPL selects actions that maximize the expected reward from an existing action set. However, in many real-world scenarios, actions, such as news articles or video content, change continuously, and the action space evolves over time after data collection. We define actions introduced after deploying the logging policy as new actions and focus on OPL with new actions. Existing OPL methods identify optimal actions from the existing set effectively but cannot learn and select new actions because no relevant data are logged. To address this limitation, we propose a new OPL method that leverages action features. We first introduce the Local Combination PseudoInverse (LCPI) estimator for the policy gradient, generalizing the PseudoInverse estimator initially proposed for off-policy evaluation of slate bandits. LCPI controls the trade-off between reward-modeling condition and the condition for data collection regarding the action features, capturing the interaction effects among different dimensions of action features. Furthermore, we propose a generalized algorithm called Policy Optimization for Effective New Actions (PONA), which integrates LCPI, a component specialized for new action selection, with Doubly Robust (DR), which excels at learning within existing actions. We define PONA as a weighted sum of the LCPI and DR estimators, optimizing both the selection of existing and new actions, and allowing the proportion of new action selections to be adjusted by the weight parameter. Through extensive experiments, we demonstrate that PONA efficiently selects new actions while maintaining the overall policy performance as opposed to most existing methods that cannot select new actions.

2605.18508 2026-05-19 cs.LG cs.AI 版本更新

DiPRL: Learning Discrete Programmatic Policies via Architecture Entropy Regularization

DiPRL: 通过架构熵正则化学习离散程序性策略

Chengpeng Hu, Yingqian Zhang, Hendrik Baier

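Affiliations: Eindhoven University of Technology; Centrum Wiskunde & Informatica

AI summary: This paper proposes DiPRL, a method that learns interpretable programmatic policies through architecture entropy regularization, avoiding a post-hoc refinement stage and improving policy expressivity and task performance.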

详情

AI中文摘要

程序性强化学习（PRL）通过将策略表示为可读可编辑的程序，为深度强化学习提供了一种可解释的替代方案。尽管基于梯度的方法已被开发用于优化程序的连续松弛，但在将连续松弛转换回离散程序时会显著降低性能。事后离散化会丢弃优化的分支和参数，导致策略表达性崩溃和任务性能下降，从而需要额外的微调。为克服这些限制，我们提出了可微离散程序性强化学习（DiPRL），一种在训练过程中使程序接近离散的方法，避免了单独的事后微调阶段。我们首先分析了基于梯度方法事后离散化引入的性能下降固有风险。然后，我们引入了程序架构熵正则化，这使得训练过程平滑且可微，鼓励收敛到离散程序。DiPRL在保持基于梯度优化效率的同时，减轻了事后离散化的风险。在多个离散和连续RL任务中的实验表明，DiPRL可以通过可解释的程序性策略实现强大的性能。

英文摘要

Programmatic reinforcement learning (PRL) offers an interpretable alternative to deep reinforcement learning by representing policies as human-readable and -editable programs. While gradient-based methods have been developed to optimize continuous relaxations of programs, they face a significant performance drop when converting the continuous relaxations back into discrete programs. Post-hoc discretization can discard optimized branches and parameters in a program, which results in a collapse of policy expressivity and lowered task performance, leading in turn to a need for additional fine-tuning. To overcome these limitations, we propose Differentiable Discrete Programmatic Reinforcement Learning (DiPRL), a method that learns programmatic policies that become nearly discrete during training, avoiding a separate post-hoc fine-tuning stage. We first analyze the inherent risks of performance drop introduced by post-hoc discretization of gradient-based methods. Then, we introduce programmatic architecture entropy regularization, which enables smooth, differentiable training that encourages convergence toward a discrete program. DiPRL maintains the efficiency of gradient-based optimization while mitigating the risks of post-hoc discretization. Our experiments across multiple discrete and continuous RL tasks demonstrate that DiPRL can achieve strong performance via interpretable programmatic policies.
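The abstract above says an entropy regularizer on the program architecture encourages a continuous relaxation to converge toward a discrete program. The sketch below shows the general idea under stated assumptions: each branching point of the relaxed program is assumed to hold a softmax over candidate choices, and the penalty is the summed Shannon entropy of those softmaxes. The names `arch_logits` and `weight` are hypothetical, and the regularizer actually used by DiPRL may differ in form.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero for a one-hot distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def regularized_loss(task_loss, arch_logits, weight):
    """Task loss plus an entropy penalty summed over all branching points."""
    penalty = sum(entropy(softmax(logits)) for logits in arch_logits)
    return task_loss + weight * penalty


# Hypothetical relaxed programs with one branching point and three choices.
undecided = [[0.0, 0.0, 0.0]]   # uniform mixture, maximal entropy
committed = [[10.0, 0.0, 0.0]]  # nearly one-hot, close to a discrete program
print(regularized_loss(1.0, undecided, weight=0.1))
print(regularized_loss(1.0, committed, weight=0.1))
```

With equal task loss, the nearly one-hot architecture incurs the smaller penalty, so minimizing the combined loss pushes the mixture weights toward a single discrete choice during training.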

2605.18498 2026-05-19 cs.LG cs.AI 版本更新

DBES: A Systematic Benchmark and Metric Suite for Evaluating Expert Specialization in Large-Scale MoEs

DBES: 一种用于评估大规模MoE模型专家专业化程度的系统性基准和度量套件

Jing Wang, Hongxuan Lu, Jazze Young, Shu Wang, Zhimin Xin

发表机构 * Jing Wang（王静）； Hongxuan Lu（卢洪轩）； Jazze Young（杨杰兹）； Shu Wang（王舒）； Zhimin Xin（辛志敏）

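AI summary: This paper proposes DBES, a systematic benchmark and metric suite that evaluates expert specialization in MoE models using a multi-domain benchmark and five theoretically grounded metrics. It validates the actionability of these metrics in domain-specific post-training, where they yield substantial performance gains.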

详情

AI中文摘要

MoE模型中的专家专业化仍缺乏深入理解，传统评估将架构负载均衡与功能专业化混淆。我们引入DBES，一种综合的诊断框架，结合多领域基准和五个理论基础的度量指标：路由专业化、归一化有效秩、领域隔离、路由刚度分数和n-gram专家度量。关键发现显示不同模型展现出不同的专业化范式：Qwen系列表现出模块化专业化，具有高领域隔离，而DeepSeek和GLM采用分布式协作。然而，我们强调专业化是诊断维度，必要但不充分用于下游性能。最重要的是，干预证据验证了这些度量指标的可操作性：通过使用DBES在领域特定后训练中识别高专业化专家路径，我们仅使用15%的原始训练资源，在专业化领域实现了66%至94.48%的性能提升，证明这些诊断工具可以转化为具体的优化算子。本文提供了首个系统性的方法，用于独立于准确度指标评估专家专业化，为下一代MoE系统的设计和后训练优化提供了关键见解。

英文摘要

Expert specialization in Mixture-of-Experts (MoE) models remains poorly understood, with traditional evaluations conflating architectural load-balancing with functional specialization. We introduce DBES, a comprehensive diagnostic framework combining a multi-domain benchmark with five theoretically grounded metrics: Routing Specialization, Normalized Effective Rank, Domain Isolation, Routing Stiffness Score, and N-gram Expertise measures. Critical findings demonstrate distinct specialization paradigms across models: Qwen-series exhibit modular specialization with high domain isolation, while DeepSeek and GLM employ distributed collaboration. However, we emphasize that specialization is a diagnostic dimension, necessary but not sufficient for downstream performance. Most crucially, interventional evidence validates the actionability of these metrics: by using DBES to identify high-specialization expert paths during domain-specific post-training, we achieved 66% to 94.48% improvement in specialized domains with only 15% of original training resources, demonstrating that these diagnostic tools can be converted into concrete optimization operators. This work provides the first systematic methodology for evaluating expert specialization independently of accuracy metrics, offering crucial insights for the design and post-training optimization of next-generation MoE systems.

2605.18483 2026-05-19 cs.LG cs.AI 版本更新

Modality vs. Morphology: A Framework for Time Series Classification for Biological Signals

模态与形态：生物信号时间序列分类的框架

Jordan Tschida, Matthew Yohe, Edward Kane, Gavin Jager, Emma J. Reid, Tony G. Allen, Mark Story, Leanne Thompson, Joe Hoskins, Brandon Schreiber, Stan Seiferth, Scott Dolvin, David Cornett

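Affiliations: UT-Battelle, LLC; Oak Ridge National Laboratory

AI summary: This paper proposes a unified morphology-modality framework. By analyzing the morphological structure of biological signals, it shows how morphology shapes model design and performance, stresses its importance for preprocessing and modeling strategies, and identifies future directions including morphological data augmentation and improved evaluation metrics.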

详情

AI中文摘要

生物信号时间序列分类（TSC）已从手工制作的模态特定方法发展为能够表示底层生理过程多样波形结构的深度架构（即形态）。本文综述介绍了一种统一的形态-模态框架，将波形结构与方法论设计连接起来，揭示了尖峰、爆发、振荡、慢漂移和层次节奏如何影响模型设计。通过分析脑电图、肌电图、心电图、脉搏波描记图以及眼动模态（电眼图、瞳孔测量、眼动追踪），本文展示了形态如何决定预处理和建模策略。整合这些生物信号的证据，该框架揭示形态而非模型类别最强烈地决定了性能和可解释性。这提供了深度模型在诱导偏见与底层波形动态一致时为何成功的原因。本文还识别了未来的工作，包括形态数据增强和评估指标改进以提高泛化能力。这些见解将形态意识建模定位为开发跨生物信号通用、可解释和生理意义的TSC模型的统一原则。

英文摘要

Time series classification (TSC) of biological signals has progressed from handcrafted, modality-specific approaches to deep architectures capable of representing the diverse waveform structures of underlying physiological processes (i.e., morphology). This review introduces a unified morphology--modality framework that connects waveform structure to a methodological design, revealing how spikes, bursts, oscillations, slow drift, and hierarchical rhythms inform model design. By analyzing electroencephalography, electromyography, electrocardiography, photoplethysmography, and ocular modalities (electrooculography, pupillometry, eye-tracking), the review demonstrates how morphology determines preprocessing and modeling strategies. Integrating evidence across these biological signals, the framework reveals that morphology, not model class, most strongly determines performance and interpretability. This provides insight into why deep models succeed when their inductive biases align with underlying waveform dynamics. This review also identifies future work including morphological data augmentation and evaluation metrics to improve generalization. Together, these insights position morphology-aware modeling as a unifying principle for developing generalizable, interpretable, and physiologically meaningful TSC models across biological signals.

2605.18476 2026-05-19 stat.CO cs.AI cs.LG 版本更新

AI4BayesCode: From Natural Language Descriptions to Validated Modular Stateful Bayesian Samplers

AI4BayesCode: 从自然语言描述到经过验证的模块化状态性贝叶斯采样器

Jungang Zou, Alex Ziyu Jiang, Qixuan Chen

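Affiliations: Department of Biostatistics, Columbia University

AI summary: This work proposes AI4BayesCode, a system that generates runnable, validated MCMC samplers from natural-language descriptions. Its modular design and recursively stateful coding paradigm improve the reliability and extensibility of Bayesian model implementations.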

详情

AI中文摘要

编码和计算仍然是马尔可夫链蒙特卡洛（MCMC）工作流程中的主要瓶颈，尤其是在现代采样算法日益复杂的情况下，现有的概率编程系统在模型支持、扩展性和可组合性方面仍然有限。我们介绍了AI4BayesCode，这是一个可扩展的LLM驱动系统，能够将自然语言的贝叶斯模型描述转换为可运行且经过验证的MCMC采样器。为了提高可靠性，AI4BayesCode采用模块化设计，将模型分解为模块化采样块，并将每个块映射到内置的采样组件，从而减少从头实现复杂采样算法的需要。通过预生成模型规范的验证和后生成采样器代码的验证进一步提高了可靠性。AI4BayesCode还引入了一种新的递归状态性编码范式，使模块化采样组件（可能由不同贡献者开发）能够在更大的MCMC过程中协同一致地组成。我们开发了一个基准测试套件来评估AI4BayesCode的采样器生成能力。实验表明，AI4BayesCode能够仅通过自然语言描述实现广泛的贝叶斯模型。作为一项开放系统，其能力可以随着底层AI代理的改进和新增内置块的添加而继续扩展。

英文摘要

Coding and computation remain major bottlenecks in Markov chain Monte Carlo (MCMC) workflows, especially as modern sampling algorithms have become increasingly complex and existing probabilistic programming systems remain limited in model support, extensibility, and composability. We introduce AI4BayesCode, an extensible LLM-driven system that translates natural-language Bayesian model descriptions into runnable, validated MCMC samplers. To improve reliability, AI4BayesCode adopts a modular design that decomposes models into modular sampling blocks and maps each block to a built-in sampling component, reducing the need to implement complex sampling algorithms from scratch. Reliability is further improved through pre-generation validation of model specifications and post-generation validation of generated sampler code. AI4BayesCode also introduces a novel recursively stateful coding paradigm for MCMC, allowing modular sampling components, potentially developed by different contributors, to be composed coherently within larger MCMC procedures. We develop a benchmark suite to evaluate AI4BayesCode on sampler generation. Experiments show that AI4BayesCode can implement a wide range of Bayesian models from natural-language descriptions alone. As an open-ended system, its capability can continue to expand with improvements in the underlying AI agent and the addition of new built-in blocks.

2605.18475 2026-05-19 cs.LG cs.AI 版本更新

GAMMA: Global Bit Allocation for Mixed-Precision Models under Arbitrary Budgets

GAMMA：在任意预算下为混合精度模型进行全局位分配

Zhangyang Yao, Haiyan Zhao, Haoyu Wang, Tianbo Huang, Lihua Zhang, Xu Han

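Affiliations: Beihang University; Tsinghua University; ByteDance Inc

AI summary: This paper proposes GAMMA, a framework that learns module-wise precision preferences within a post-training pipeline. It optimizes a teacher-forced hidden-state reconstruction objective and uses integer programming to obtain exact budget-feasible allocations, improving large language model accuracy under arbitrary budgets and outperforming fixed-precision baselines and search-based mixed-precision methods.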

详情

AI中文摘要

混合精度量化通过将更多位分配给敏感模块，提高了大语言模型（LLMs）的预算-精度权衡。然而，在LLM规模上自动化这种分配面临独特约束：可学习方法需要量化感知训练，这在十亿参数模型中不可行；训练自由替代方案依赖静态代理指标，无法捕捉跨模块交互，并且必须为每个目标预算重新计算；搜索方法成本高且无法保证精确预算符合。我们提出GAMMA，一种量化器无关的框架，完全在后训练流水线内学习模块级精度偏好。GAMMA在增强拉格朗日约束下优化教师强制隐藏状态重建目标，并通过整数规划将学习的偏好投影到精确预算可行的离散分配中。关键性质是分数重用：因为学习的偏好编码了一个稳定的敏感性排名而非预算特定权重，单次训练运行可服务于任意部署目标，仅需重新求解整数规划，将每预算适应时间从小时减少到几分钟。在Llama和Qwen模型（8B-32B）上，GAMMA优于固定精度基线（最高+12.99 Avg.）和搜索基混合精度方法（最高+7.00 Avg.），并在2.5位平均精度下可匹配固定3位质量，从而在大幅减小内存占用的情况下实现部署。

英文摘要

Mixed-precision quantization improves the budget--accuracy trade-off for large language models (LLMs) by allocating more bits to sensitive modules. However, automating this allocation at LLM scale faces a unique combination of constraints: learnable approaches require quantization-aware training, which is infeasible for billion-parameter models; training-free alternatives rely on static proxy metrics that miss cross-module interactions and must be recomputed per target budget; and search-based methods are expensive without guaranteeing exact budget compliance. We propose GAMMA, a quantizer-agnostic framework that learns module-wise precision preferences entirely within a post-training pipeline. GAMMA optimizes a teacher-forced hidden-state reconstruction objective under an augmented Lagrangian constraint, and projects the learned preferences into exact budget-feasible discrete assignments via integer programming. A key property is score reuse: because the learned preferences encode a stable sensitivity ranking rather than budget-specific weights, a single training run serves arbitrary deployment targets by re-solving only the integer program, reducing per-budget adaptation from hours to a few minutes. Across Llama and Qwen models (8B--32B), GAMMA outperforms both fixed-precision baselines (up to +12.99 Avg.) and search-based mixed-precision methods (up to +7.00 Avg.), and can match fixed 3-bit quality at 2.5-bit average precision, enabling deployment at substantially smaller memory footprints.
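The abstract above describes projecting learned per-module precision preferences onto a discrete, budget-feasible bit assignment, and re-solving only that step for each new budget. The sketch below illustrates the projection as a tiny integer program solved by exhaustive enumeration. It is not GAMMA's solver: the module sizes, the cost table, and the function `allocate_bits` are hypothetical, and a real model would need a proper integer-programming solver rather than brute force.

```python
from itertools import product


def allocate_bits(sizes, costs, choices, avg_bits):
    """Pick one bit width per module, minimizing total cost within the bit budget.

    sizes[i]    : parameter count of module i (any consistent unit)
    costs[i][k] : assumed cost of giving module i the width choices[k]
    avg_bits    : target average bits per parameter
    """
    budget = avg_bits * sum(sizes)
    best, best_cost = None, float("inf")
    for assignment in product(range(len(choices)), repeat=len(sizes)):
        used = sum(s * choices[k] for s, k in zip(sizes, assignment))
        if used > budget:
            continue  # violates the memory budget
        cost = sum(costs[i][k] for i, k in enumerate(assignment))
        if cost < best_cost:
            best, best_cost = [choices[k] for k in assignment], cost
    return best, best_cost


# Hypothetical four-module model with widths 2, 3, or 4 bits.
sizes = [4, 4, 2, 2]
choices = [2, 3, 4]
costs = [
    [9.0, 3.0, 1.0],  # highly sensitive module
    [2.0, 1.0, 0.5],  # robust module
    [6.0, 2.0, 1.0],
    [1.0, 0.8, 0.7],
]
bits, cost = allocate_bits(sizes, costs, choices, avg_bits=3.0)
print(bits, cost)
```

Changing `avg_bits` reuses the same cost table and only repeats the search, which mirrors the score-reuse property described in the abstract: the expensive learning step runs once and each deployment budget needs only a new projection.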


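2605.18472 2026-05-19 stat.ML cs.AI cs.LG (updated version)

Flowing with Confidence

Friso de Kruiff, Dario Coscia, Max Welling, Erik Bekkers

Affiliations: CuspAI; AMLab, University of Amsterdam; mathLab, SISSA

AI summary: This paper proposes Flow Matching with Confidence (FMwC), which injects input-dependent multiplicative noise at selected layers and propagates its variance through the network in closed form, yielding a per-sample confidence score at standard sampling cost. The score is used to improve image quality and the thermodynamic stability of crystals, and supports trajectory editing and adaptive step sizing.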

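AI abstract

Generative models can produce illogical text, unrealistic images, and unstable materials, and they generate faster than simulation or human review can keep up; without per-sample confidence, trust erodes. Existing solutions run k ensembles or stochastic trajectories, consuming k times the compute, and measure variability between models rather than the model's own confidence. We propose Flow Matching with Confidence (FMwC). FMwC injects input-dependent multiplicative noise at selected layers, propagates its variance through the network in closed form, and integrates it along the ODE trajectory, yielding a per-sample confidence score at standard sampling cost. The score supports several uses: filtering improves image quality and the thermodynamic stability of crystals; editing rewinds a trajectory to the point where the model committed and redirects it; adaptive step sizing concentrates ODE computation where the flow is uncertain. We find that the confidence score correlates with the magnitude of the divergence of the learned velocity field, which offers a window into the generative process and opens the way to surgical guidance targeted at key moments, new sampling algorithms, and interpretability of generative models.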

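Modeling Customer Trajectories with Reinforcement Learning for Practical Retail Insights

Ken Ming Lee, Paul Barde, Maxime C. Cohen, Derek Nowrouzezahrai

Affiliations: McGill University; Mila - Quebec AI Institute

AI summary: This paper proposes an agent-based modelling framework that casts customer trajectory prediction as a maximum entropy reinforcement learning problem, better reflecting customers with bounded rationality and yielding more accurate estimates of impulse purchase rates and shelf traffic densities.

Comments: Proceedings of the 25th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026)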

AI中文摘要

理解零售空间内客户移动对于优化商店布局至关重要。现实世界轨迹数据可以提供高度准确的洞察，但收集起来成本高昂且对许多零售商来说难以实现。启发式方法如旅行商问题（TSP）和概率最近邻（PNN）常被用作廉价的近似方法，但实际客户轨迹与最短路径的偏差平均为28%，突显了准确性和实用性之间的权衡。我们提出了一种基于智能体的建模框架，将客户轨迹预测视为最大熵强化学习（RL）问题，通过平衡奖励最大化与随机性来更好地反映具有有限理性的客户。使用现实世界便利商店的轨迹数据，我们证明RL生成的轨迹比TSP和PNN更接近客户行为，提供了更准确的冲动购买率和货架交通密度估计。此外，只有基于RL的预测能够为冲动产品提供与实际轨迹数据一致的重新定位决策，从而产生可比的估计利润增长。我们的工作表明，RL提供了一种实用且基于行为的替代方法，弥合了过于简化的启发式方法和数据密集型方法之间的差距，使准确的布局优化更具可及性。为了鼓励进一步研究，源代码可在GitHub上获得。

English abstract

Understanding customer movement within retail spaces is essential for optimizing store layouts. Real-world trajectory data can provide highly accurate insights, but collecting it is costly and often infeasible for many retailers. Heuristics such as Travelling Salesman Problem (TSP) and Probabilistic Nearest Neighbours (PNN) are commonly used as inexpensive approximations, but actual customer trajectories deviate by an average of 28% from shortest paths, highlighting a tradeoff between accuracy and practicality. We propose an agent-based modelling framework that casts customer trajectory prediction as a maximum entropy reinforcement learning (RL) problem, balancing reward maximization with stochasticity to better reflect customers with bounded rationality. Using real-world trajectory data from a convenience store, we show that RL-generated trajectories align more closely with customer behaviour than TSP and PNN, providing more accurate estimates of impulse purchase rates and shelf traffic densities. Furthermore, only RL-based predictions yield repositioning decisions for impulse products that align with those derived from actual trajectory data, resulting in comparable estimated profit gains. Our work demonstrates that RL provides a practical, behaviourally grounded alternative that bridges the gap between oversimplified heuristics and data-intensive approaches, making accurate layout optimization more accessible. To encourage further research, the source code is available on GitHub.
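The trade-off between reward maximization and stochasticity mentioned in this abstract is commonly expressed through a Boltzmann (softmax) policy over action values. The following is a minimal sketch under that standard formulation, not the authors' code: the action values and the temperature `alpha` are illustrative assumptions, with each action standing for a hypothetical next aisle a customer might walk to.

```python
import math

def soft_policy(q_values, alpha):
    """Max-entropy policy over actions: pi(a) is proportional to exp(Q(a) / alpha)."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / alpha) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]

def soft_value(q_values, alpha):
    """Soft state value: alpha * log(sum over a of exp(Q(a) / alpha))."""
    m = max(q_values)
    return m + alpha * math.log(sum(math.exp((q - m) / alpha) for q in q_values))

# Hypothetical action values for three candidate next aisles.
q = [1.0, 0.8, 0.2]
for alpha in (0.01, 0.5, 5.0):
    probs = soft_policy(q, alpha)
    print(alpha, [round(p, 3) for p in probs], round(soft_value(q, alpha), 3))
```

A small `alpha` approaches the greedy, shortest-path-like choice, while a large `alpha` approaches uniformly random movement; intermediate values give the boundedly rational behaviour the abstract refers to.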


2605.18437 2026-05-19 cs.LG cs.DC (updated version)

Heterogeneous Tasks Offloading in Vehicular Edge Computing: A Federated Meta Deep Reinforcement Learning Approach

Yaorong Huang, Jingtao Luo, Xuechao Wang

Affiliations: The Hong Kong University of Science and Technology (Guangzhou); Chengdu Neusoft University

AI summary: This paper proposes FedMAGS, a federated meta deep reinforcement learning framework for heterogeneous task offloading in vehicular edge computing. Graph attention networks capture DAG dependencies, a sequence-to-sequence policy generates structured offloading decisions, and federated meta-learning enables fast adaptation across distributed MEC servers.

AI中文摘要

车载边缘计算（VEC）通过将计算密集型任务卸载到附近的边缘服务器，使延迟敏感的车载应用成为可能。然而，现实中的车载工作负载通常被建模为具有复杂依赖结构的异构有向无环图（DAG）任务，这使得联合卸载和资源分配极具挑战性。此外，分布式MEC部署在协同训练基于学习的策略时会引发隐私问题。本文提出了一种联邦元深度强化学习框架，结合GAT-Seq2Seq建模（FedMAGS），用于车载边缘计算系统中的异构任务卸载。所提出的方法利用图注意力网络捕捉DAG依赖关系，基于序列到序列的策略生成结构化卸载决策，并利用联邦元学习实现跨分布式MEC服务器的快速适应，而无需共享原始数据。大量模拟表明，FedMAGS在收敛速度、执行延迟和可扩展性方面均优于现有最先进的基线方法。此外，联邦设计在保护数据隐私的同时减少了通信开销，使该框架非常适合动态和大规模的VEC环境。

English abstract

Vehicular edge computing (VEC) enables latency-sensitive vehicular applications by offloading computation-intensive tasks to nearby edge servers. However, real-world vehicular workloads are typically modeled as heterogeneous directed acyclic graph (DAG) tasks with complex dependency structures, making joint offloading and resource allocation highly challenging. Moreover, distributed MEC deployment raises privacy concerns when collaboratively training learning-based policies. In this paper, we propose a Federated Meta Deep Reinforcement Learning framework with GAT-Seq2Seq modeling (FedMAGS) for heterogeneous task offloading in VEC systems. The proposed approach leverages Graph Attention Networks to capture DAG dependencies, a Seq2Seq-based policy to generate structured offloading decisions, and federated meta-learning to enable fast adaptation across distributed MEC servers without sharing raw data. Extensive simulations demonstrate that FedMAGS achieves faster convergence, lower execution delay, and better scalability compared with state-of-the-art baselines. In addition, the federated design preserves data privacy while reducing communication overhead, making the framework well suited for dynamic and large-scale VEC environments.


2605.18430 2026-05-19 cs.LG (updated version)

Text2CAD-Bench: A Benchmark for LLM-based Text-to-Parametric CAD Generation

Liang Wang, Heng Meng, Zekai Xiang, Jin Liu, Pingyi Zhou, Litao Chen, Yongqiang Tang

Affiliations: School of Computer Science, Wuhan University, Wuhan 430000, Hubei, China; Spatial Design Intelligence Lab, BitInf Ltd., Shanghai 200003, China; College of Computer and Information Engineering, Nanjing Tech University, Nanjing 211800, Jiangsu, China; State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China

AI summary: This paper introduces Text2CAD-Bench, the first benchmark to systematically evaluate text-to-CAD generation across geometric complexity and application diversity. Current models perform well on basic geometry but degrade on complex topology and advanced features.

AI中文摘要

文本到CAD生成旨在从自然语言创建参数化CAD模型，使快速原型设计和直观设计流程成为可能。然而，现有基准主要关注基本原始体和简单的草图-拉伸序列，缺乏现实应用中必需的高级功能，并仅涵盖传统机械部件。我们引入Text2CAD-Bench，首个系统评估文本到CAD在几何复杂度和应用多样性方面的基准。我们的基准包含600个由人类整理的例子，涵盖四个层次：L1-L2涵盖基本几何和标准特征，L3引入复杂拓扑和自由曲面，L4扩展到机械部件之外的现实领域。每个示例配对双风格提示--几何描述模仿非专家用户，以及程序序列对齐专家级规范。评估主流通用LLM和领域特定模型，发现当前模型在基本几何上表现良好，但在复杂拓扑和高级功能上表现下降。我们发布此基准以推动文本到CAD研究的发展。

English abstract

Text-to-CAD generation aims to create parametric CAD models from natural language, enabling rapid prototyping and intuitive design workflows. However, existing benchmarks focus on basic primitives and simple sketch-extrude sequences, lacking advanced features essential for real-world applications and covering only traditional mechanical parts. We introduce Text2CAD-Bench, the first benchmark systematically evaluating text-to-CAD across geometric complexity and application diversity. Our benchmark comprises 600 human-curated examples spanning four levels: L1-L2 cover fundamental geometry with standard features, L3 introduces complex topology and freeform surfaces, and L4 extends to real-world domains beyond mechanical parts. Each example pairs dual-style prompts -- geometric descriptions mimicking non-expert users, and procedural sequences aligned with expert-level conventions. Evaluating mainstream general LLMs and domain-specific models, we find that current models perform reasonably on basic geometry but degrade substantially on complex topology and advanced features. We release our benchmark to drive progress in text-to-CAD research.


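2605.18425 2026-05-19 cs.LG math.ST stat.TH (updated version)

Generative Adversarial Learning from Deterministic Processes

Joris C. Kühl, Hanno Gottschalk

Affiliations: Institute of Mathematics, Technical University of Berlin

AI summary: This paper studies the success of generative adversarial networks on non-i.i.d. data, proving that an infinite-dimensional generative adversarial learning model can learn the invariant distribution of a chaotic dynamical system from a single deterministic time series, and gives convergence rates.

Comments: 37 pages, 3 figures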

2605.18422 2026-05-19 stat.ML cs.LG math.ST stat.TH (updated version)

Generalized Functional ANOVA in Closed-Form: A Unified View of Additive Explanations

Baptiste Ferrere, Nicolas Bousquet, Fabrice Gamboa, Jean-Michel Loubes

Affiliations: EDF R&D, SINCLAIR Lab; Université de Toulouse, ANITI; Université de Toulouse Sorbonne Université; Universidad Medellin; INRIA Regalia

AI summary: This paper presents a closed-form generalized functional ANOVA, giving a unified framework for additive explanations that handles the decomposition of model predictions when inputs are dependent.

Comments: 34 pages, 23 Figures, 101 equations, 8 Tables

AI中文摘要

函数ANOVA，或Hoeffding分解，提供了一个原理性的框架用于可解释性，通过将模型预测分解为主效应和高阶交互作用。对于独立输入，这种经典分解是显式的。它与SHAP值、广义加性模型和正交多项式展开密切相关，因此构成了加性可解释性的重要工具。然而，在更一般和现实的依赖设置中，获得可处理的表示并从数据中估计分解仍然具有挑战性。在本文中，我们针对连续输入解决了这个问题。通过结合Hilbert空间方法与广义函数ANOVA，我们构建了一个显式的Riesz基分解，使得分解计算变得容易。我们的方法恢复了经典独立情况及其相关的正交分解。基于此表示，我们提出了一种简单但强大的算法，能够在模型无关的设置下从数据样本中估计分解，并通过与几种最先进的解释方法进行实证比较，展示了该方法的威力。

English abstract

The functional ANOVA, or Hoeffding decomposition, provides a principled framework for interpretability by decomposing a model prediction into main effects and higher-order interactions. For independent inputs, this classical decomposition is explicit. It is closely connected to SHAP values, generalized additive models, and orthogonal polynomial expansions, and therefore constitutes a fundamental tool for additive explainability. In the more general and realistic dependent setting, however, obtaining a tractable representation and estimating the decomposition from data remain challenging. In this work, we address this problem for continuous inputs. By combining Hilbert space methods with the generalized functional ANOVA, we build an explicit decomposition Riesz Basis allowing to easily compute the decomposition. Our formulation recovers the classical independent case and its associated orthogonal decomposition. Building on this representation, we propose a simple but mighty algorithm to estimate the decomposition from a data sample in a model-agnostic setting and we compare it empirically with several state-of-the-art explanation methods, demonstrating the power of the approach.
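The classical independent-input case that this abstract calls explicit can be written out directly. The following is a minimal sketch of that textbook case only, not the paper's Riesz-basis construction for dependent inputs: it assumes two independent inputs, each uniform on a small finite grid, and the test function is an arbitrary illustration.

```python
def hoeffding_2d(f, xs1, xs2):
    """Functional ANOVA of f(x1, x2) for independent inputs uniform on grids xs1, xs2."""
    n1, n2 = len(xs1), len(xs2)
    table = [[f(a, b) for b in xs2] for a in xs1]
    # Constant term: the overall mean E[f].
    f0 = sum(map(sum, table)) / (n1 * n2)
    # Main effect of x1: E[f | x1] - f0 (average over x2).
    f1 = [sum(row) / n2 - f0 for row in table]
    # Main effect of x2: E[f | x2] - f0 (average over x1).
    f2 = [sum(table[i][j] for i in range(n1)) / n1 - f0 for j in range(n2)]
    # Interaction: whatever the constant and main effects do not explain.
    f12 = [[table[i][j] - f0 - f1[i] - f2[j] for j in range(n2)] for i in range(n1)]
    return f0, f1, f2, f12

grid = [-1, 0, 1]
f0, f1, f2, f12 = hoeffding_2d(lambda a, b: a + 2 * b + a * b, grid, grid)
print(f0, f1, f2, f12)
```

By construction the four components sum back to the function at every grid point, and each main effect averages to zero. With dependent inputs the conditional averages no longer separate this cleanly, which is the setting the paper addresses.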


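2605.18387 2026-05-19 cs.LG cs.AI (updated version)

Graph Hierarchical Recurrence for Long-Range Generalization

Stefano Carotti, Marco Pacini, Alessio Gravina, Davide Bacciu, Bruno Lepri, Sebastiano Bontorin

Affiliations: Department of Computer Science, University of Trento; Fondazione Bruno Kessler; Department of Computer Science, University of Pisa

AI summary: This paper proposes Graph Hierarchical Recurrence (GHR), a framework that operates jointly on the input graph and on a hierarchical abstraction obtained through pooling. It addresses the limitations of graph neural networks and graph Transformers on tasks that require capturing long-range correlations, and performs strongly on several long-range benchmarks with high parameter efficiency.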

AI中文摘要

图神经网络（GNNs）和图转换器（GTs）已成为图学习的基本范式，结合了深度模型的表示学习能力与诱导偏置带来的样本效率。尽管其有效性已得到广泛认可，但大量研究表明这些模型在需要捕捉图中远距离区域之间相关性的任务中仍面临根本性限制。为了解决这一问题，我们引入了图层次递归（GHR），一种新的框架，该框架同时在输入图和通过池化获得的层次抽象上进行操作。我们还展示了现有模型的局限性在超出范围的泛化中更加明显，其中测试实例涉及比训练时观察到的更长距离的相互作用。相比之下，尽管其设计简单，GHR提供了三个关键优势：在长距离依赖上表现强劲，改进了超出范围的泛化能力，以及高参数效率。为了验证这些主张，我们展示了在广泛的长距离基准测试中，GHR在使用当前最先进的模型参数的1%的情况下，始终优于现有的图模型。这些结果表明，当前趋势通过扩展架构来获得图基础模型的互补方向，表明仅增加模型容量可能不足以实现泛化。

English abstract

Graph Neural Networks (GNNs) and Graph Transformers (GTs) are now a fundamental paradigm for graph learning, combining the representation-learning capabilities of deep models with the sample efficiency induced by their inductive biases. Despite their effectiveness, a large body of work has shown that these models still face fundamental limitations in tasks that require capturing correlations between distant regions of a graph. To address this issue, we introduce Graph Hierarchical Recurrence (GHR), a novel framework that operates jointly on the input graph and on a hierarchical abstraction obtained through pooling. We also show that the limitations of existing models are even more pronounced in out-of-range generalization, where test instances involve interactions over distances longer than those observed during training. By contrast, despite its simple design, GHR provides three key advantages: strong performance on long-range dependencies, improved out-of-range generalization, and high parameter efficiency. To corroborate these claims, we show that across a broad set of long-range benchmarks, GHR consistently outperforms existing graph models while using as little as 1% of the parameters of current state-of-the-art models. These results suggest a complementary direction to the current trend of scaling architectures to obtain graph foundation models, indicating that increased model capacity alone may not be sufficient for generalization.


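2605.18383 2026-05-19 cs.LG (updated version)

TabH2O: A Unified Foundation Model for Tabular Prediction

Pascal Pfeiffer, Dmitry Gordeev, Mathias Müller, Laura Fink, Joan Salvà Soler, Mark Landry, Branden Murray, Marcos V. Conde, Sri Satish Ambati

Affiliations: H2O.ai

AI summary: This paper presents TabH2O, a unified foundation model that performs classification and regression in a single forward pass via in-context learning. It builds on the TabICL architecture with key modifications (unified training, single-stage pretraining, and noise-aware pretraining) and performs strongly on tabular prediction tasks.

Comments: Technical Report - https://tabh2o.h2oai.com/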

AI中文摘要

我们提出了TabH2O，一种用于表格数据的基础模型，该模型通过上下文学习在单次前向传递中实现分类和回归。TabH2O基于TabICL架构进行了若干关键改进：(1) 统一训练，一个模型通过双头架构同时处理分类和回归，消除了对单独模型的需要，从而降低了总预训练成本；(2) 单阶段预训练，通过训练稳定性改进（有界可扩展softmax、阶段间归一化、可学习残差缩放、logit软上限）消除了多阶段课程学习的需要，使模型能够从一开始就使用完整长度序列进行训练；(3) 噪声感知预训练，合成数据集包含显式噪声维度以教导模型对无关特征具有鲁棒性。我们在TALENT基准（300个数据集）上评估了TabH2O v1（29.2M参数），其中它在6种评估方法中的平均排名为2.55，优于调优的CatBoost（4.07）、H2O AutoML（4.18）和LightGBM（5.08），与TabPFN v2.6（2.74）竞争，但落后于TabICL v2（2.12），并在分类和回归任务中81%的测试数据集上位列前三名。

English abstract

We present TabH2O, a foundation model for tabular data that performs classification and regression in a single forward pass via in-context learning. TabH2O builds on the TabICL architecture with several key modifications: (1) unified training, a single model handles both classification and regression via a dual-head architecture, eliminating the need for separate models and reducing total pretraining cost; (2) single-stage pretraining, training stability improvements (bounded scalable softmax, inter-stage normalization, learnable residual scaling, logit soft-capping) eliminate the need for multi-stage curriculum learning, enabling training with full-length sequences from the start; and (3) noise-aware pretraining, synthetic datasets include explicit noise dimensions to teach the model robustness to irrelevant features. We evaluate TabH2O v1 (29.2M parameters) on the TALENT benchmark (300 datasets), where it achieves an average rank of 2.55 out of 6 evaluated methods, outperforming tuned CatBoost (4.07), H2O AutoML (4.18), and LightGBM (5.08), competitive with TabPFN v2.6 (2.74), and behind TabICL v2 (2.12), while placing in the top-3 on 81% of the testing datasets across classification and regression tasks.


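2605.18381 2026-05-19 cs.LG (updated version)

Generating Physically Consistent Molecules with Energy-Based Models

Christoph Griesbacher, Lea Bogensperger, Andreas Habring, Thomas Pock

Affiliations: Graz University of Technology; University of Zurich

AI summary: This paper proposes EBMol, an energy-based model (EBM) for 3D molecular generation that restores the energy inductive bias by learning an atom-additive scalar potential. It achieves state-of-the-art performance on QM9 and GEOM-Drugs, and shows that the learned energy landscape can serve as a quality metric for ranking and filtering configurations and that shape-steered sampling enables controllable generation.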

AI中文摘要

处于平衡状态的分子遵循玻尔兹曼分布，使底层的能量景观成为一种基于物理的建模目标。然而，这样的景观从数据中学习起来困难，一旦学习完成，也难以进行采样。扩散模型和流匹配模型通过学习噪声与数据之间的时条件分数或传输场来规避这些困难，以更可处理的训练目标交换了能量归纳偏差。我们引入EBMol，一种基于能量模型（EBM），通过在训练过程中不进行显式模拟而学习原子可加的标量势能来恢复这种归纳偏差。我们的方法采用受流启发的恢复场匹配目标来近似能量景观。我们采用镜像-兰格-恩算法进行采样，使原子位置和类型的统一更新成为可能，并在推理时间采用并行退火来扩展计算规模。EBMol是首个在三维分子生成中实现最先进的性能的EBM，已在QM9和GEOM-Drugs数据集上达到最先进的性能。此外，我们还证明了学习的能量景观可以作为原理性的质量度量用于排序和过滤配置，并通过潜在能组成和零样本连接器设计通过形状引导采样实现可控生成，而无需重新训练。

English abstract

Molecules in equilibrium follow a Boltzmann distribution, making the underlying energy landscape a physically grounded modeling objective. However, such landscapes are difficult to learn from data and, once learned, hard to sample from. Diffusion and flow-matching models sidestep these difficulties by learning a time-conditional score or transport field between noise and data, losing the energy inductive bias in exchange for a more tractable training objective. We introduce EBMol, an energy-based model (EBM) that restores this inductive bias by learning an atom-additive scalar potential without explicit simulation during training. Our method employs a flow-inspired Restoring Field Matching objective to approximate the energy landscape. We adopt the Mirror-Langevin algorithm for sampling, enabling unified updates of atomic positions and types, and incorporate parallel tempering for inference-time compute scaling. EBMol is the first EBM for 3D molecular generation to achieve state-of-the-art performance on QM9 and GEOM-Drugs. Moreover, we show that the learned energy landscape serves as a principled quality metric for ranking and filtering configurations, and demonstrate controllable generation without retraining through shape-steered sampling via potential composition and zero-shot linker design.


2605.18379 2026-05-19 cs.LG (updated version)

Beyond Square Roots: Explicit Memory-Efficient Factorization for Multi-Epoch Private Learning

Nikita P. Kalinin, Aki Rehn, Joel Daniel Andersson, Antti Honkela, Christoph H. Lampert

Affiliations: Institute of Science and Technology Austria; University of Helsinki

AI summary: This paper proposes γ-BIFR, a unified factorization for multi-epoch differentially private learning. In the low-memory, low-bandwidth regime it significantly improves RMSE, amplified RMSE, and private training performance, while giving tighter theoretical guarantees.

AI中文摘要

相关噪声机制是提高差分隐私模型训练效用最具前景的方法之一，但严格的保证需要显式、可分析的分解，而实际部署需要内存效率。最近的研究开发了带状逆分解，通过利用相关矩阵的带状结构来同时满足这两个要求。带宽控制用于在迭代之间相关噪声的噪声缓冲区大小，从而控制效用和内存成本之间的权衡。现有分解强调这种权衡：DP-λCGD通过仅使用一个步骤的噪声缓冲区实现了高内存效率，但限制了其效用增益，而带状逆平方根（BISR）分解利用更大的相关窗口，在大带宽下渐近最优，但在低带宽下表现不佳。我们提出γ-BIFR，是这两种分解的统一泛化。在低内存、低带宽情况下，γ-BIFR显著提高了RMSE、放大RMSE和隐私训练性能，同时为多轮参与误差提供了更紧的理论保证。

English abstract

Correlated-noise mechanisms are among the most promising approaches for improving the utility of differentially private model training, but rigorous guarantees require explicit, analyzable factorizations, and practical deployment requires memory efficiency. Recent works have developed banded inverse factorizations, which address both requirements by exploiting a banded structure in the correlation matrix. The bandwidth controls the size of the noise buffer used to correlate noise across iterations, and thus governs the tradeoff between utility and memory cost. Existing factorizations highlight this tradeoff: DP-$λ$CGD achieves high memory efficiency by using only a one-step noise buffer, but this limits its utility gains, while the banded inverse square root (BISR) factorization exploits larger correlation windows and is asymptotically optimal for large bandwidths but performs poorly at low bandwidths. We propose $γ$-BIFR, a unified generalization of both factorizations. In the low-memory, low-bandwidth regime, $γ$-BIFR significantly improves RMSE, amplified RMSE, and private training performance, while yielding tighter theoretical guarantees for multi-participation error in multi-epoch training.


2605.18374 2026-05-19 cs.LG cs.AI (updated version)

Hybrid Quantum-Classical Neural Architecture Search

Alberto Marchisio, Muhammad Kashif, Nouhaila Innan, Muhammad Shafique

Affiliations: eBRAIN Lab, Division of Engineering, New York University Abu Dhabi (NYUAD); Center for Quantum and Topological Systems (CQTS), NYUAD Research Institute

AI summary: This paper studies the foundations of neural architecture search for hybrid quantum-classical neural networks (HQNNs), discusses how NAS extends to quantum and hybrid settings, and presents FLOPs-aware search as an important direction for building efficient, deployable HQNNs.

AI中文摘要

混合量子-经典神经网络（HQNNs）正成为噪声中等规模量子（NISQ）时代量子机器学习的实用方法，因为它们结合了经典学习组件和参数化量子电路在一个端到端可训练的框架中。然而，其性能和效率高度依赖于架构选择，如数据编码、电路结构、测量设计以及经典和量子模块之间的耦合。这使得手动设计变得越来越困难，尤其是在考虑硬件限制和资源约束时。本文研究了HQNNs和神经架构搜索（NAS）的基础，讨论了NAS如何扩展到量子和混合设置，并展示了FLOPs感知搜索（其中FLOPs作为计算复杂性的代理）作为构建不仅准确而且计算高效且可实际部署的HQNN的重要方向。

English abstract

Hybrid quantum-classical neural networks (HQNNs) are emerging as a practical approach for quantum machine learning in the noisy intermediate-scale quantum (NISQ) era, as they combine classical learning components with parameterized quantum circuits in an end-to-end trainable framework. However, their performance and efficiency depend strongly on architectural choices such as data encoding, circuit structure, measurement design, and the coupling between classical and quantum modules. This makes manual design increasingly difficult, especially when hardware limitations and resource constraints must also be taken into account. In this paper, we study the foundations of HQNNs and neural architecture search (NAS), discuss how NAS extends to quantum and hybrid settings, and demonstrate FLOPs-aware search (where FLOPs serve as a proxy for computational complexity), as an important hardware-aware direction for building HQNNs that are not only accurate but also computationally efficient and practically deployable.


2605.18338 2026-05-19 stat.AP cs.LG (updated version)

Robust Player-Conditional Champion Ranking for League of Legends: Style Similarity, Mastery Priors, and Archetype-Constrained Discovery

Min Heo, Pranav Kadiyam, Prasun Panthi

Affiliations: Wabash College; Arizona State University

AI summary: This paper proposes a robust, player-conditional champion ranking method for League of Legends that combines style similarity, mastery priors, and archetype constraints to address champion recommendation.

Comments: 11 pages, 3 figures

AI中文摘要

在多人在线战斗竞技场游戏中，冠军推荐通常被非正式地视为元游戏强度、个人舒适度或全局胜率的问题。我们正式将《英雄联盟》中的冠军推荐建模为一个可解释的、玩家条件的排名问题，该问题在稀疏、嘈杂和非平稳的行为数据下进行。所提出的框架结合了四个信息源：人口强度代理、玩家风格相似性、直接和间接熟练度先验知识以及范式级的保护措施。该方法使用稳健的中位数/MAD标准化、对数转换用于偏斜事件计数、近期加权的玩家风格向量、熟练度加权的冠军池向量、加权余弦相似度、排名缩放的得分组件以及k-means++聚类用于粗略的范式支持。实现原型使用Python/Pandas建模层、Supabase支持的存储以及面向网页的推荐接口。与黑箱监督胜利预测系统不同，所提出的方法返回分解的推荐评分，可以作为预期性能代理、拟合、熟练度和范式兼容性的检查。包含一个单人案例研究，针对玩家标识符DIVINERAINRACCON的100场比赛历史进行端到端的合理性检查。因此，本文是一项方法和系统贡献：它指定了一个可重复、模块化和可审计的冠军推荐器，并通过时间训练-测试分割、下一冠军恢复、校准分析和消融研究提供了未来大规模评估的验证协议。

English abstract

Champion recommendation in multiplayer online battle arena games is usually framed informally as a problem of metagame strength, personal comfort, or global win rate. We formalize champion recommendation in League of Legends as an interpretable, player-conditional ranking problem under sparse, noisy, and non-stationary behavioral data. The proposed framework combines four information sources: a population-strength proxy, player-style similarity, direct and indirect mastery priors, and archetype-level guardrails. The method uses robust median/MAD normalization, logarithmic transforms for skewed event counts, recency-weighted player style vectors, mastery-weighted champion-pool vectors, weighted cosine similarity, rank-scaled score components, and k-means++ clustering for coarse archetype support. The implemented prototype uses a Python/Pandas modeling layer, Supabase-backed storage, and a web-facing recommendation interface. Unlike black-box supervised win-prediction systems, the proposed method returns decomposed recommendation scores that can be inspected as expected-performance proxy, fit, mastery, and archetype compatibility. A single-player case study on a 100-game history for the player identifier DIVINERAINRACCON is included as an end-to-end sanity check. The manuscript is therefore a methods and systems contribution: it specifies a reproducible, modular, and auditable champion recommender and gives a validation protocol for future large-scale evaluation through temporal train-test splits, next-champion recovery, calibration analysis, and ablation studies.
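Several of the building blocks listed in this abstract are standard and easy to illustrate. The following is a minimal sketch of three of them (a logarithmic transform for skewed counts, median/MAD standardization, and weighted cosine similarity), not the authors' pipeline: the feature values, the weights, and the 1.4826 consistency constant are assumptions chosen for illustration.

```python
import math
from statistics import median

def robust_z(values, eps=1e-9):
    """Median/MAD standardization: (x - median) / (1.4826 * MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    # 1.4826 makes the MAD comparable to a standard deviation for Gaussian data.
    scale = 1.4826 * mad + eps
    return [(v - med) / scale for v in values]

def weighted_cosine(u, v, w):
    """Cosine similarity under non-negative per-feature weights w."""
    dot = sum(wi * ui * vi for wi, ui, vi in zip(w, u, v))
    nu = math.sqrt(sum(wi * ui * ui for wi, ui in zip(w, u)))
    nv = math.sqrt(sum(wi * vi * vi for wi, vi in zip(w, v)))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

# Hypothetical skewed event counts (e.g. wards placed per game) for five games.
counts = [2, 3, 4, 5, 40]
scaled = robust_z([math.log1p(c) for c in counts])  # log transform, then robust scaling

# Hypothetical style vectors for a player and a champion, with feature weights.
player, champion, weights = [0.9, -0.2, 0.4], [0.7, 0.1, 0.5], [2.0, 1.0, 1.0]
print(scaled, weighted_cosine(player, champion, weights))
```

Using the median and MAD rather than the mean and standard deviation keeps a single outlier game from distorting the scale, which matters under the sparse and noisy histories the abstract describes.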


2605.18333 2026-05-19 quant-ph cs.LG (updated version)

QLIF-CAST: Quantum Leaky-Integrate-and-Fire for Time-Series Weather Forecasting

Alberto Marchisio, Aayan Ebrahim, Nouhaila Innan, Muhammad Kashif, Muhammad Shafique

Affiliations: eBrain Lab, Division of Engineering, New York University Abu Dhabi; Center for Quantum and Topological Systems, NYUAD Research Institute, New York University Abu Dhabi

AI summary: This paper proposes QLIF-CAST, which applies the Quantum Leaky Integrate-and-Fire spiking neural network to short-term multivariate weather forecasting. Quantum neuronal dynamics reduce prediction error, and the model strikes a favourable balance between training time and accuracy.

AI中文摘要

准确且高效的时序预测仍然是经典和量子神经架构在多变量环境设置中的挑战性问题。本文将量子泄漏积分-放电（QLIF）脉冲神经网络适应于时序回归任务，特别是短期多变量天气预报。我们扩展了QLIF的应用范围，证明其适用于连续值预测问题。QLIF-CAST模型将神经元激发状态编码为单量子比特的量子叠加态，由Rx旋转门和T1弛豫衰减驱动，并嵌入在混合量子-经典递归架构中。我们进行了两项不同的评估。首先，与参数匹配的经典LIF基线在多变量天气数据集上的受控比较显示，QLIF-CAST在MSE和MAE上分别降低了15.4%和4.4%，证明量子神经动态在预测误差上优于经典等效模型。其次，在空气质量与风速基准上与最先进的量子LSTM（QLSTM）和量子神经网络（QNN）模型的跨领域比较显示，QLIF-CAST在训练时间内最多减少了94%，在速度-误差权衡空间中占据独特位置。在IBM Marrakesh（156量子比特QPU）上的硬件验证确认了电路执行的可靠性，仅存在1.2%的平均偏差。

English abstract

Accurate and efficient time-series forecasting remains a challenging problem for both classical and quantum neural architectures, particularly in multivariate environmental settings. This work adapts the Quantum Leaky Integrate-and-Fire (QLIF) spiking neural network for time-series regression tasks, specifically short-term multivariate weather forecasting. We extend QLIF beyond classification and demonstrate its applicability to continuous-valued prediction problems. The QLIF-CAST model encodes neuron excitation states as single-qubit quantum superpositions, driven by Rx rotation gates and T1 relaxation decay, and is embedded within a hybrid quantum-classical recurrent architecture. We conduct two distinct evaluations. First, a controlled comparison against a parameter-matched classical LIF baseline on a multivariate weather dataset shows that QLIF-CAST achieves 15.4% lower MSE and 4.4% lower MAE, demonstrating that quantum neuronal dynamics reduce prediction error over classical equivalents. Second, a cross-domain comparative analysis with state-of-the-art quantum LSTM (QLSTM) and quantum neural network (QNN) models on air quality and wind speed benchmarks reveals that QLIF-CAST converges in up to 94% less training time, occupying a distinct position in the speed-error trade-off space. Hardware verification on IBM Marrakesh (156-qubit QPU) confirms reliable circuit execution with only 1.2% average deviation from simulation.
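The two physical ingredients named in this abstract, Rx rotations and T1 relaxation, have simple closed forms for a single qubit: an Rx rotation by angle theta applied to the ground state gives excited-state probability sin^2(theta / 2), and T1 relaxation decays that probability exponentially. The following is a rough classical sketch of a leaky integrate-and-fire update built from those two facts, not the QLIF-CAST circuit: the re-preparation of the qubit from its excitation probability at every step, the input gain, the firing threshold, and the reset rule are all assumptions.

```python
import math

def excite(p, theta):
    """Rotate by Rx(theta) a qubit re-prepared with excited-state probability p.

    Assumes a pure state on the great circle reached from |0> by Rx rotations.
    """
    phi = 2.0 * math.asin(math.sqrt(p))  # rotation angle equivalent to probability p
    return math.sin((phi + theta) / 2.0) ** 2

def relax(p, dt, t1):
    """T1 relaxation: the excited-state population decays exponentially toward |0>."""
    return p * math.exp(-dt / t1)

def qlif_step(p, x, dt=1.0, t1=5.0, gain=1.0, threshold=0.5):
    """One leaky integrate-and-fire update (hypothetical parameters throughout)."""
    p = relax(p, dt, t1)        # leak
    p = excite(p, gain * x)     # integrate the input as a rotation angle
    fired = p > threshold       # fire when the excitation probability is high
    return (0.0 if fired else p), fired

p, spikes = 0.0, []
for x in [0.6, 0.6, 0.6, 0.0, 0.0]:
    p, fired = qlif_step(p, x)
    spikes.append(fired)
print(spikes)
```

On hardware the excitation probability would be estimated from repeated measurements rather than computed exactly, so this sketch corresponds to the noiseless simulation limit.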


2605.18331 2026-05-19 cs.LG (updated version)

Prune, Update and Trim: Robust Structured Pruning for Large Language Models

Diego Coello de Portugal Mecke, Tom Hanika, Lars Schmidt-Thieme

Affiliations: ISMLL & DARC VWFS University of Hildesheim; ISMLL University of Hildesheim

AI summary: This paper proposes Putri, which improves post-training pruning of large language models by updating the un-pruned weights, pruning FFN layers sequentially, and removing individual attention heads, enabling effective pruning at extreme sparsity ratios.

AI中文摘要

大型语言模型（LLMs）近年来经历了显著的增长和开发。然而，进行LLMs的推理仍然成本高昂，尤其是在长上下文推理或资源受限的设备上。这促使开发新的后训练剪枝（PTP）方法。这些方法通过移除模型参数的大量部分来降低LLMs的要求。被丢弃的权重根据其对模型性能的影响进行选择。当前的PTP方法通过移除FFN层中信息较少的隐藏节点和最不重要的注意力层来剪枝模型。我们提出Putri，一种PTP方法，引入了三个改进：首先，更新未剪枝的FFN权重以补偿引入的剪枝误差；其次，按顺序剪枝FFN层，考虑之前层的更新；第三，而不是移除完整的注意力层，我们移除单个注意力头。我们扩展了这种方法，使其能够处理分组查询注意力。总之，Putri是一种保持简单但表现卓越的结构剪枝方法。在多个模型上进行剪枝实验，涵盖广泛的稀疏率范围和不同的数据集，验证了Putri的通用性。值得注意的是，我们证明，与以前的方法不同，Putri可以在极端稀疏率下剪枝LLMs。代码可在：https://github.com/Coello-dev/Putri 获取。

English abstract

Large Language Models (LLMs) have experienced significant growth and development in recent years. However, performing inference on LLMs remains costly, especially for long-context inference or on resource-constrained devices. This motivates the development of new post-training pruning (PTP) methods. These methods reduce LLMs' requirements by removing a substantial part of the model's parameters. The discarded weights are selected depending on their impact on the model's performance. Current PTP methods prune the models by removing the less informative hidden nodes from the FFN layers, and the least important attention layers. We propose Putri, a PTP method that introduces three changes to the state of the art. First, we update the un-pruned weights of the FFN to compensate for the introduced pruning error. Second, the FFN layers are pruned sequentially, taking into account the updates done to the previous layers. Third, instead of removing full attention layers, we remove individual attention heads. We extend this method such that it can also address Grouped-Query Attention. In summary, Putri is a structured pruning method which remains simple while showing SOTA performance. Pruning experiments on multiple models with a wide variety of sparsity ranges and on different datasets validate the generality of Putri. Notably, we demonstrate that, unlike previous methods, Putri can prune LLMs at extreme sparsity ratios. The code is available at: https://github.com/Coello-dev/Putri.


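2605.18320 2026-05-19 cs.LG cs.AI (updated version)

ISEP: Implicit Support Expansion for Offline Reinforcement Learning via Stochastic Policy Optimization

Yifei Chen, Shaoqin Zhu, Xiaoqiang Ji

Affiliations: The Chinese University of Hong Kong, Shenzhen Longgang

AI summary: This paper proposes ISEP, which achieves implicit support expansion in offline reinforcement learning through stochastic policy optimization, addressing the difficulty conventional methods have in discovering optimal behaviour under safety constraints. Its core contribution is to improve the navigation of policy improvement through value-function interpolation and a stochastic action-selection strategy.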

AI中文摘要

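DARE-EEG: An EEG Foundation Model for Mining Dual-Aligned Representations

Yang Shao, Peiliang Gong, Qun Dai, Daoqiang Zhang

Affiliations: College of Artificial Intelligence, Nanjing University of Aeronautics and Astronautics

AI summary: This paper proposes DARE-EEG, a self-supervised foundation model pretrained with dual-aligned representation learning so that EEG encoders learn representations that are invariant to incomplete observations. Contrastive learning and momentum updates provide semantic stability, and a convolutional-linear probing strategy adapts to heterogeneous electrode configurations and sampling rates; experiments show strong results on EEG benchmarks.

Comments: 22 pages, 10 pages of main text + 12 pages of appendices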

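AI abstract

Foundation models pretrained with masked reconstruction on large-scale EEG data have become a promising paradigm for learning general neural representations across diverse brain-computer interface applications. However, a key but overlooked challenge is that EEG encoders must learn representations that are invariant to incomplete observations: when different masked views of the same signal have minimal overlap, existing methods fail to constrain them to a consistent latent subspace, which degrades transferability. To address this, we propose DARE-EEG, a self-supervised foundation model that explicitly enforces mask invariance through dual-aligned representation learning during pretraining. Specifically, we introduce mask alignment, which uses contrastive learning to constrain the representations of multiple masked views of the same EEG sample, complementing anchor alignment, which aligns masked representations to momentum-updated complete features for semantic stability. In addition, we propose a convolutional-linear probe, a parameter-efficient strategy that adapts to heterogeneous electrode configurations and sampling rates through decoupled spectral-spatial projections. Extensive experiments on diverse EEG benchmarks show that DARE-EEG consistently leads in accuracy while maintaining relatively low parameter complexity and better cross-dataset transferability than existing methods. DARE-EEG also helps to discover and exploit the rich latent representations in EEG effectively.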

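Privacy-Preserving Reinforcement Learning with One-Sided Feedback

Lin William Cong, Guangyan Gan, Hanzhang Qin, Zhenzhen Yan

Affiliations: Nanyang Technological University; National University of Singapore; Cornell SC Johnson College of Business

AI summary: This paper studies reinforcement learning in multi-dimensional continuous state and action spaces where the agent receives only partial observations of the state and obtains reward information for only a subset of the state-action space at each time step. It proposes POOL, a new privacy-preserving RL algorithm, and proves that its sample complexity matches the lower bound for non-private RL, showing that strong privacy guarantees are achievable while maintaining high learning efficiency.

Comments: Accepted at IJCAI-ECAI 2026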

AI中文摘要

我们研究了在多维连续状态和动作空间中具有单侧反馈的强化学习（RL）。在此设置中，智能体仅接收状态的部分观测，并在每个时间步仅获得状态-动作空间子集的奖励信息。这种设置在学习效率和隐私保护方面带来了重大挑战。为了解决这些挑战，我们提出了POOL，一种新颖的隐私保护RL算法。我们对POOL进行了全面的理论分析，推导出一个样本复杂度界，该界与已知的非隐私RL下界相匹配。其中，E_rho表示隐私参数，H是时间范围，alpha是最优性差距参数。我们的研究结果表明，可以在保持高学习效率的同时实现强隐私保障，这标志着在具有单侧反馈的多维环境中实现实用的隐私感知RL迈出重要一步。

English abstract

We study reinforcement learning (RL) in multi-dimensional continuous state and action spaces with one-sided feedback, where the agent receives partial observations of the state and obtains reward information for only a subset of the state-action space at each time step. This setting introduces substantial challenges in both learning efficiency and privacy preservation. To address these challenges, we propose POOL, a novel privacy-preserving RL algorithm. We conduct a comprehensive theoretical analysis of POOL, deriving a sample complexity bound that matches the known lower bounds for non-private RL. Here, E_rho denotes the privacy parameter, H is the time horizon, and alpha is the optimality-gap parameter. Our findings show that it is possible to enforce strong privacy guarantees while maintaining high learning efficiency, marking a significant step toward practical, privacy-aware RL in multi-dimensional environments with one-sided feedback.


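2605.18229 2026-05-19 cs.LG cs.AI (updated version)

Are Sparse Autoencoder Benchmarks Reliable?

David Chanin

Affiliations: Decode Research, MATS, UCL

AI summary: This study evaluates the reliability of sparse autoencoder (SAE) benchmarks. Two metrics fail under multiple lenses and the others also fall short of expectations, indicating that better SAE benchmarks are needed.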

AI中文摘要

稀疏自编码（SAEs）是大型语言模型的核心可解释性工具，其进展依赖于能够可靠区分更好和更差SAE的基准测试。我们通过三种互补的视角审计了SAEBench中SAE质量指标：固定SAE上的重新播种噪声、合成SAE上的真实相关性以及训练轨迹的可区分性。我们发现，两个指标，即目标探测扰动（TPP）和虚假相关性消除（SCR），在它们的典型设置下未能通过多个视角，不应用于评估SAE。其他指标显示出更高的重新播种噪声和更低的可区分性，比领域假设的要差。sae-probes变体的k-稀疏探测是我们在测试中发现最可靠的指标，但即使sae-probes也难以区分同一体系结构的不同变体。我们的结果表明，领域需要更好的SAE基准测试。

English abstract

Sparse autoencoders (SAEs) are a core interpretability tool for large language models, and progress on SAE architectures depends on benchmarks that reliably distinguish better SAEs from worse ones. We audit the SAE quality metrics in SAEBench, the de-facto standard SAE evaluation suite, through three complementary lenses: reseed noise on a fixed SAE, ground-truth correlation on synthetic SAEs, and discriminability across training trajectories. We find that two of these metrics, Targeted Probe Perturbation (TPP) and Spurious Correlation Removal (SCR), fail multiple lenses at their canonical settings and should not be used to evaluate SAEs. The other metrics show higher reseed noise and lower discriminability than the field assumes. The sae-probes variant of $k$-sparse probing is the most reliable metric we tested, but even sae-probes struggles to separate variants of the same SAE architecture. Our results show the field needs better SAE benchmarks.


2605.18221 2026-05-19 cs.SD cs.CL cs.CV cs.LG physics.med-ph (updated version)

SIREM: Speech-Informed MRI Reconstruction with Learned Sampling

Md Hasan, Nyvenn Castro, Daiqi Liu, Lukas Mulzer, Jana Hutter, Jonghye Woo, Moritz Zaiss, Andreas Maier, Paula A. Perez-Toro

Affiliations: Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg; Institute of Radiology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg; Institut für Informationsverarbeitung, Leibniz Universität Hannover; Department of Radiology, Harvard Medical School and Massachusetts General Hospital

AI summary: This paper proposes SIREM, a speech-informed MRI reconstruction framework that uses synchronized speech as a cross-modal prior. It exploits the correlation between vocal-tract configurations and the produced acoustics to predict part of the image content from audio, operating at higher throughput than iterative methods while preserving anatomically plausible structure.

Abstract

Real-time magnetic resonance imaging (rtMRI) of speech production enables non-invasive visualization of dynamic vocal-tract motion and is valuable for speech science and clinical assessment. However, rtMRI is fundamentally constrained by trade-offs among spatial resolution, temporal resolution, and acquisition speed, often leading to undersampled k-space measurements and degraded reconstructions. We propose SIREM, a speech-informed MRI reconstruction framework that uses synchronized speech as a cross-modal prior. The central idea is that vocal-tract configurations during speech are correlated with the produced acoustics, making part of the image content predictable from audio. SIREM models each frame as a fusion of an audio-driven component and an MRI-driven component through a spatial weighting map. The audio branch predicts articulator-related structure from speech, while the MRI branch reconstructs complementary content from measured k-space data. We further introduce a learnable soft weighting profile over spiral arms, enabling a differentiable study of how k-space arm usage interacts with speech-informed fusion. This yields a unified multimodal formulation that combines audio-driven prediction, MRI reconstruction, and sampling adaptation. We evaluate SIREM on the USC speech rtMRI benchmark against standard baselines, including gridding, wavelet-based compressed sensing, and total variation. SIREM introduces a speech-informed reconstruction paradigm that operates in a substantially higher-throughput regime than iterative methods while preserving anatomically plausible vocal-tract structure. These results establish an initial benchmark for multimodal speech-informed rtMRI reconstruction and highlight the potential of synchronized speech as an auxiliary prior for fast reconstruction. The source code is available at https://github.com/mdhasanai/SIREM


2605.18204 2026-05-19 stat.ML cs.LG (updated version)

Forward-Learned Discrete Diffusion: Learning how to noise to denoise faster

Grigory Bartosh, Teodora Pandeva, Sushrut Karmalkar, Javier Zazo

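Affiliations: University of Amsterdam; Microsoft Research, Cambridge

AI summary: This paper proposes Forward-Learned Discrete Diffusion (FLDD), which introduces a learnable forward (noising) process to reduce the gap between the target and model distributions and enable few-step generation. The method adopts a non-Markovian formulation with learnable marginal and posterior distributions, so the generative process stays factorized while matching the target defined by the noising process. Experiments show that, for the same number of sampling steps, FLDD produces higher-quality samples than conventional discrete diffusion models.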

Abstract

Discrete diffusion models are a powerful class of generative models with strong performance across many domains. For efficiency, however, discrete diffusion typically parameterizes the generative (reverse) process with factorized distributions, which makes it difficult for the model to learn the target process in a small number of steps and necessitates a long, computationally expensive sampling procedure. To reduce the gap between the target and model distributions and enable few-step generation, we propose Forward-Learned Discrete Diffusion (FLDD), which introduces discrete diffusion with a learnable forward (noising) process. Rather than fixing a Markovian forward chain, we adopt a non-Markovian formulation with learnable marginal and posterior distributions. This allows the generative process to remain factorized while matching the target defined by the noising process. We train all parameters end-to-end under the standard variational objective. Experiments on various benchmarks show that, for a given number of sampling steps, our approach produces higher-quality samples than conventional discrete diffusion models using the same reverse parameterization.


2605.18202 2026-05-19 cs.LG cs.AI (updated version)

Concise and Logically Consistent Conformal Sets for Neuro-Symbolic Concept-Based Models

Samuele Bortolotti, Emanuele Marconato, Andrea Pugnana, Andrea Passerini, Stefano Teso

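Affiliations: Department of Information Engineering and Computer Science, University of Trento, Italy; CIMeC, University of Trento, Rovereto, Italy

AI summary: This paper proposes the COCOCO framework, which integrates conformal prediction to address overconfident label and concept predictions in neuro-symbolic concept-based models. It satisfies three desiderata (consistency, coverage, and conciseness), improving model reliability.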

Abstract

Neuro-Symbolic Concept-based Models (NeSy-CBMs) are a family of architectures that integrate neural networks with symbolic reasoning for enhanced reliability in high-stakes applications. They work by first extracting high-level concepts from the input and then inferring a task label from them in a way that is compatible with the given logical constraints. Yet, their label and concept predictions can be overconfident, making it difficult for stakeholders to gauge when the model's decisions can be trusted. We address this issue by integrating ideas from Conformal Prediction (CP), a framework providing rigorous, distribution-free coverage guarantees. We formalize three desiderata (consistency, coverage, and conciseness) that any conformal method for NeSy-CBMs should satisfy, and show that existing approaches fall short of at least one. We then introduce COCOCO, a post-hoc framework that conformalizes concepts and labels jointly and reconciles them via a single deduction-abduction revision step. COCOCO satisfies all three desiderata, retains distribution-free coverage, is robust to imperfect knowledge, and supports user-specified size budgets. Our experiments on 8 data sets show that COCOCO compares favorably with competitors and natural baselines in terms of performance and set size.
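The Conformal Prediction (CP) framework this abstract builds on can be illustrated with plain split conformal prediction for a single classifier. The sketch below is not COCOCO: it ignores concepts and logical constraints entirely, and the function names, calibration scores, and class probabilities are made-up assumptions for illustration.

```python
import math

def conformal_threshold(cal_scores, alpha):
    """Return the split-conformal quantile of nonconformity scores.

    cal_scores holds 1 - p(true label) for each held-out calibration example.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if k > n:
        return math.inf  # too few calibration points: every label is kept
    return sorted(cal_scores)[k - 1]

def prediction_set(probs, threshold):
    """Keep every label whose nonconformity score 1 - p is within the threshold."""
    return {label for label, p in probs.items() if 1 - p <= threshold}

# Hypothetical calibration scores and class probabilities, for illustration only.
cal_scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.90]
q = conformal_threshold(cal_scores, alpha=0.25)
confident = prediction_set({"a": 0.6, "b": 0.3, "c": 0.1}, q)
ambiguous = prediction_set({"a": 0.5, "b": 0.5, "c": 0.0}, q)
```

When calibration and test examples are exchangeable, sets built this way contain the true label with probability at least 1 - alpha. An ambiguous input simply yields a larger set, which is the tension with conciseness that the abstract refers to.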


2605.18190 2026-05-19 cs.LG cs.CV (updated version)

Dual-Rate Diffusion: Accelerating diffusion models with an interleaved heavy-light network

Grigory Bartosh, David Ruhe, Emiel Hoogeboom, Jonathan Heek, Thomas Mensink, Tim Salimans

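Affiliations: Google DeepMind, Amsterdam; University of Amsterdam

AI summary: This paper proposes Dual-Rate Diffusion, which accelerates diffusion-model inference by interleaving a high-capacity context encoder with a lightweight denoising model while maintaining sample quality, achieving a balance between performance and computational cost on ImageNet benchmarks.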

Abstract

Diffusion models achieve state-of-the-art generative performance but suffer from high computational costs during inference due to the repeated evaluation of a heavy neural network. In this work, we propose Dual-Rate Diffusion, a method to accelerate sampling by interleaving the execution of a heavy high-capacity context encoder and a light efficient denoising model. The context encoder is evaluated sparsely to extract high-dimensional features, which are effectively reused by the light denoising model at every step to refine the sample efficiently. This approach significantly accelerates inference without compromising sample quality. On ImageNet benchmarks, Dual-Rate Diffusion matches the performance of standard baselines while reducing computational cost by a factor of $2$-$4$. Furthermore, we demonstrate that our method is compatible with distillation techniques, such as Moment Matching Distillation, enabling further efficiency gains in few-step generation.
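The interleaving idea can be sketched as a sampling loop in which a cached context is refreshed only occasionally. This is a minimal sketch under stated assumptions: the abstract says only that the encoder is "evaluated sparsely", so the fixed refresh period, the function names, and the scalar toy models below are all hypothetical.

```python
def dual_rate_sample(x, num_steps, refresh_every, heavy_encoder, light_denoiser):
    """Run the light model every step and the heavy encoder every `refresh_every` steps."""
    context = None
    heavy_calls = 0
    for step in range(num_steps):
        if step % refresh_every == 0:
            context = heavy_encoder(x, step)  # expensive, evaluated sparsely
            heavy_calls += 1
        x = light_denoiser(x, context, step)  # cheap, reuses the cached context
    return x, heavy_calls

# Toy stand-ins (assumptions): the "encoder" proposes a target value,
# and the "denoiser" moves the sample halfway toward it.
def toy_encoder(x, step):
    return 0.0

def toy_denoiser(x, context, step):
    return x + 0.5 * (context - x)

sample, heavy_calls = dual_rate_sample(
    8.0, num_steps=12, refresh_every=4,
    heavy_encoder=toy_encoder, light_denoiser=toy_denoiser)
```

With 12 steps and a refresh period of 4, the heavy model runs 3 times instead of 12, which is where the saving comes from when the encoder dominates the per-step cost.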


2605.18188 2026-05-19 cs.LG (updated version)

UTOPYA: A Multimodal Deep Learning Framework for Physics-Informed Anomaly Detection and Time-Series Prediction

Robson W. S. Pessoa, Julien Amblard, Alessandra Russo, Idelfonso B. R. Nogueira

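Affiliations: Department of Chemical Engineering, Norwegian University of Science and Technology (NTNU); Department of Computing, Imperial College London

AI summary: This paper proposes the UTOPYA framework, which fuses eight data modalities through FiLM-conditioned cross-modal attention and gated fusion to jointly address anomaly detection, time-series prediction, and phase classification in batch distillation, and improves performance with a physics-informed regularisation scheme and curriculum learning.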

Abstract

Anomaly detection in batch processes is hindered by transient dynamics, scarce fault labels, and reliance on single-modality sensor data. This work introduces UTOPYA (Unified Temporal Observation for Physics-Informed Anomaly Detection and Time-Series Prediction), a 15.2M-parameter multimodal framework that jointly addresses anomaly detection, time-series prediction, and phase classification in batch distillation by fusing eight data modalities through Feature-wise Linear Modulation (FiLM) conditioned cross-modal attention and gated fusion. A physics-informed regularisation scheme introduced in this work enforces temporal smoothness and thermodynamic monotonicity, while curriculum learning introduces training samples in order of physical difficulty. On the 119-experiment multimodal batch distillation dataset of Arweiler et al. (2026), UTOPYA achieves a test AUROC of 0.832 at the window level and 0.874 under multi-signal experiment-level scoring, substantially outperforming four external baselines (PCA, autoencoder, Isolation Forest, and LSTM autoencoder) evaluated under identical conditions (+0.147 window-level AUROC over the best baseline). A multimodal ablation over 15 architectural configurations shows that static context via FiLM conditioning is the key enabler, lifting experiment-level multi-signal AUROC by +0.145 over the unimodal baseline (0.729 to 0.874). Separately, a training ablation across 14 design choices reveals that several widely adopted techniques, including instance normalisation, Mixup, ensembling, test-time augmentation, and stochastic weight averaging, fail to improve or actively degrade generalisation in this data-scarce setting. These negative results expose a fundamental tension between smoothing-based regularisation and anomaly detection, providing practical guidance for multimodal process monitoring deployment.


2605.18180 2026-05-19 stat.ML cs.LG (updated version)

Canonical Regularisation of Wide Feature-Learning Neural Networks

George Whittle, Pranav Vaidhyanathan, Juliusz Ziomek, Natalia Ares, Maike A. Osborne

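Affiliations: Department of Engineering Science, University of Oxford

AI summary: This paper studies the regulariser implied by gradient flow training in wide feature-learning neural networks. It shows that ridge regularisation, which is well studied in the kernel regime, distorts the inductive bias in the feature-learning regime, and it proposes arc ridge as a scalable surrogate for geodesic ridge, which generalises ridge to the feature-learning regime.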

Abstract

Wide neural networks in the feature-learning regime drive modern deep learning, and yet they remain far less studied than their kernel-regime counterparts. We consider a critical yet under-explored difference between these two regimes: the regulariser and prior implied by gradient flow training. This canonical regularisation property is well-studied in kernel regime networks -- of all the infinite global minima, gradient flow selects exactly the vanishing ridge solution -- and underpins the celebrated NN-GP correspondence, precisely allowing the modelling of noise during training. However, we prove ridge regularisation biases gradient flow in feature-learning regime networks, even in the infinitesimal limit of vanishing regularisation. Over training, ridge distorts the inductive bias of the network, with a particular damage done to pretrained networks where the implicit prior is informative. We resolve this by axiomatising the canonical regulariser as a regime-agnostic function-space energy and lift, which uniquely identifies ridge in the kernel regime, and crucially generalises to the feature-learning regime. By studying the Riemannian geometry of feature-learning networks, we derive geodesic ridge from our framework, generalising ridge to the feature-learning regime. Correspondingly, we prove the canonical function-space prior is a Riemannian Gibbs Process, generalising the more familiar Gaussian Process. As a practical contribution, we propose arc ridge as a minimax-robust, scalable surrogate to geodesic ridge, revealing a deep relationship between early stopping and canonical regularisation across learning regimes. Finally, we demonstrate the consequences of our theory empirically on both image processing and NLP transfer-learning problems.


2605.18174 2026-05-19 cs.LG cs.DC math.OC stat.ML (updated version)

Ringmaster LMO: Asynchronous Linear Minimization Oracle Momentum Method

Abdurakhmon Sadiev, Artavazd Maranjyan, Ivan Ilin, Peter Richtárik

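AI summary: This paper proposes Ringmaster LMO, an asynchronous linear minimization oracle (LMO) momentum method for unconstrained stochastic nonconvex optimization. It replaces conventional synchronous training with a delay-thresholding mechanism suited to heterogeneous distributed systems, and experiments show that its advantage grows as system heterogeneity increases.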

Abstract

Muon has recently emerged as a strong alternative to AdamW for training neural networks, with encouraging large-scale pretraining results and growing evidence that matrix-structured updates can be faster in practice. Yet Muon, and more generally Linear Minimization Oracle (LMO) based methods, are typically used synchronously. This is problematic in heterogeneous distributed systems, where workers complete gradient computations at different speeds and synchronous training must repeatedly wait for slower workers. In this work, we introduce Ringmaster LMO, an asynchronous LMO-based momentum method for unconstrained stochastic nonconvex optimization. Our method builds on the delay-thresholding idea of Ringmaster ASGD. For SGD-type methods, Ringmaster ASGD achieves optimal time complexity by discarding overly stale gradients. Ringmaster LMO extends this mechanism to general LMO-based updates. We establish convergence guarantees under generalized $(L_0, L_1)$-smoothness and further develop a parameter-agnostic variant with decreasing stepsizes and adaptive delay thresholds. Finally, we translate our iteration guarantees into time complexity bounds under heterogeneous worker computation times. In the classical Euclidean smooth setting, these bounds recover the optimal time complexity of Ringmaster ASGD. Experiments on stochastic quadratic problems and NanoChat language-model pretraining show that the advantages of Ringmaster LMO grow with system heterogeneity and that the method outperforms strong synchronous and asynchronous baselines.
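The delay-thresholding idea (discarding overly stale gradients) can be sketched with a small event-driven simulation. This is an illustration under stated assumptions, not the paper's method: it uses plain SGD with exact gradients on a one-dimensional quadratic rather than LMO-based momentum updates, and the worker durations, threshold, and function names are hypothetical.

```python
import heapq

def gradient(x):
    """Gradient of the toy objective f(x) = x**2 / 2."""
    return x

def async_sgd_with_delay_threshold(durations, threshold, lr, x0, total_time):
    """Simulate asynchronous SGD that drops gradients whose delay reaches `threshold`."""
    x, k = x0, 0  # current iterate and number of applied updates
    applied = discarded = 0
    # Each event: (finish_time, worker_id, iteration_when_model_was_read, gradient).
    events = [(d, w, 0, gradient(x0)) for w, d in enumerate(durations)]
    heapq.heapify(events)
    while events:
        t, w, k_read, g = heapq.heappop(events)
        if t > total_time:
            break
        if k - k_read < threshold:  # delay measured in applied updates
            x -= lr * g
            k += 1
            applied += 1
        else:
            discarded += 1  # too stale: drop the gradient
        # The worker restarts from the current model either way.
        heapq.heappush(events, (t + durations[w], w, k, gradient(x)))
    return x, applied, discarded

# Three fast workers and one worker that is ten times slower (assumed values).
x_final, applied, discarded = async_sgd_with_delay_threshold(
    durations=[1, 1, 1, 10], threshold=5, lr=0.1, x0=10.0, total_time=50)
```

In this run the fast workers never wait for the slow one, and every gradient from the slow worker arrives about 30 updates late and is dropped. A synchronous scheme would instead advance only once every 10 time units.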


2605.18170 2026-05-19 eess.SP cs.CE cs.LG (updated version)

Buffer-Parameterized Machine Learning Surrogate Models for Cross-Technology Signal Integrity Analysis and Optimization

Julian Withöft, Werner John, Emre Ecik, Ralf Brüning, Jürgen Götze

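Affiliations: Information Processing Lab, Faculty for Electrical Engineering and Information Technology, TU Dortmund; Pyramide2525; EMC Technology Center Paderborn, Zuken GmbH

AI summary: This paper proposes buffer-parameterized machine learning surrogate models that handle cross-technology variations without retraining by treating IC buffer characteristics as dynamic model inputs alongside PCB parameters, making signal integrity analysis and optimization more efficient.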

Comments 12 pages, 16 figures, 7 tables. This work has been submitted to the IEEE for possible publication

Abstract

Signal integrity (SI) analysis in printed circuit board (PCB) interconnects faces increasing complexity due to diverse integrated circuit (IC) buffer technologies, varying operating conditions, and manufacturing tolerances. Existing machine learning (ML) surrogate models for predicting SI metrics such as the inner eye contour, eye-height (EH), eye-width (EW), and transient waveform features typically rely on fixed buffer parameters, requiring costly new data generation and retraining cycles for every technology shift. This paper introduces a buffer-parameterized ML surrogate modeling methodology capable of handling cross-technology variations without retraining by treating IC buffer characteristics, e.g., clock frequency, supply voltage, rise/fall times, jitter, and internal resistors and capacitors, as dynamic model inputs alongside PCB parameters. To identify the optimal surrogate architecture for this high-dimensional space, a comprehensive benchmarking study compares tree-based methods (RFR/GBM), kernel methods (SVR/KRR), Gaussian process regression (GPR), and neural networks. The framework is subsequently validated on a complex interconnect with 44 design parameters. Results show that while anisotropic GPR excels in low-data regimes, neural networks heavily outperform other models on large datasets. Finally, the practical value of the ML surrogate models is demonstrated through a cross-technology design space exploration and optimization scenario, showcasing massive computational speedups for eye mask compliance checking compared to simulation.


2605.18165 2026-05-19 cs.LG (updated version)

Elastic-dLLM: Position Preserving Context Compression and Augmentation of Diffusion LLMs

Junyi Wu, Tianchen Zhao, Shaoqiu Zhang, Linfeng Zhang, Guohao Dai, Yu Wang

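Affiliations: Tsinghua University; Shanghai Jiao Tong University; Infinigence AI

AI summary: To address context compression and augmentation in diffusion large language models, this paper proposes position-preserving context compression and terminal-aware augmentation, which speed up decoding and enable long-context extension.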

Abstract

Unlike autoregressive models, which generate one token at a time, dLLMs denoise a chunk of [MASK] tokens jointly and sample one or more tokens per step; despite enabling parallel decoding, this process incurs substantial computational cost due to the large chunk size of masked tokens. We observe that much of this cost is spent on repeatedly processing the preceding context and many [MASK] tokens with the same feature representations, indicating considerable computational redundancy. In this work, we revisit dLLM's redundancy from the perspective of [MASK] tokens. Through systematic analysis, we verify the redundancy of [MASK] tokens while revealing their critical role in providing structural information. Guided by these findings, we propose position-preserving [MASK] token compression and terminal-aware augmentation. By compressing redundant [MASK] computation, this approach accelerates decoding and further provides a natural extension toward context-folding-like long-context scaling under limited input-length constraints for full-sequence dLLMs such as LLaDA-8B-Instruct and LLaDA-1.5. Moreover, for block dLLMs such as LLaDA2.0-mini, it augments the context with a protected terminal [MASK] token to enhance generation quality with negligible overhead.


2605.16142 2026-05-19 cs.AI cs.LG (updated version)

Property-Guided LLM Program Synthesis for Planning

André G. Pereira, Augusto B. Corrêa, Jendrik Seipp

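Affiliations: Federal University of Rio Grande do Sul; University of Oxford; Linköping University

AI summary: This paper studies property-guided LLM program synthesis, which checks whether candidate programs satisfy a formally defined property and uses the result to guide the LLM toward higher-quality programs, reducing generation and evaluation costs.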

Abstract

LLMs have shown impressive success in program synthesis, discovering programs that surpass prior solutions. However, these approaches rely on simple numeric scores to signal program quality, such as the value of the solution or the number of passed tests. Because a score offers no guidance on why a program failed, the system must generate and evaluate many candidates hoping some succeed, increasing LLM inference and evaluation costs. We study a different approach: property-guided LLM program synthesis. Instead of scoring programs after evaluation, we check whether a candidate satisfies a formally defined property. When the property is violated, we stop the evaluation early and provide the LLM with a concrete counterexample showing exactly how the program failed. This feedback drastically reduces both the number of program generations and the evaluation cost, and can guide the LLM to generate stronger programs. We evaluate this approach on PDDL planning domains, asking the LLM to synthesize direct heuristic functions: every state reachable by strictly improving transitions has a strictly improving successor. A heuristic with this property leads a hill-climbing algorithm directly to a goal state. A counterexample-guided repair loop generates one candidate program, checks the property over a training set, and returns the first case that violates the property. We evaluate our approach on ten planning domains with an out-of-distribution test set. The synthesized heuristics are effectively direct on virtually all test tasks, and compared to the best prior generation method our approach generates seven times fewer programs per domain on average, solves more tasks without using search, and requires several orders of magnitude less computation to evaluate candidates. Whenever a problem admits a verifiable property, property-guided LLM synthesis can reduce cost and improve program quality.
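The property check with early stopping can be sketched on a toy state space. This is a minimal sketch, assuming goal states are exempt from the property and using a hypothetical six-cell corridor instead of a PDDL domain; the function names and both candidate heuristics are invented for illustration.

```python
def find_counterexample(initial, successors, is_goal, h):
    """Return the first non-goal state, reachable via strictly improving moves,
    that has no strictly improving successor; return None if the property holds."""
    seen, stack = {initial}, [initial]
    while stack:
        state = stack.pop()
        if is_goal(state):
            continue
        improving = [s for s in successors(state) if h(s) < h(state)]
        if not improving:
            return state  # stop early: this state is the counterexample
        for s in improving:
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return None

# Toy task (assumption): cells 0..5 in a corridor, goal at cell 5.
def successors(state):
    return [s for s in (state - 1, state + 1) if 0 <= s <= 5]

def is_goal(state):
    return state == 5

good = find_counterexample(0, successors, is_goal, lambda s: 5 - s)
bad = find_counterexample(0, successors, is_goal, lambda s: abs(3 - s))
```

The second candidate behaves as if the goal were at cell 3, so hill climbing stalls there and the check returns that state. In a repair loop, a concrete state like this is what would be handed back to the LLM in place of a numeric score.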


2605.16015 2026-05-19 cs.RO cs.LG (updated version)

Inference-Time Unlearning via Gated Activation Redirection

Vinícius Conte Turani, Otávio Parraga, João Vitor Boer Abitante, Kristen K. Arguello, Joana Pasquali, Ramiro N. Barros, Flavio du Pin Calmon, Christian Mattjie, Rodrigo C. Barros, Lucas S. Kupssinskü

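Affiliations: MALTA, Machine Learning Theory and Applications Lab, PUCRS, Porto Alegre, Brazil; Harvard University; Kunumi Institute, Brazil

AI summary: This paper proposes GUARD-IT, a training- and gradient-free machine unlearning method that removes the influence of a targeted forget set through input-dependent activation steering at inference time while preserving model performance, and that remains effective under quantized deployment.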

Abstract

Large Language Models memorize vast amounts of training data, raising concerns regarding privacy, copyright infringement, and safety. Machine unlearning seeks to remove the influence of a targeted forget set while preserving model performance, ideally approximating a model retrained from scratch without the forget set. Existing approaches aim to achieve this by updating model parameters via gradient-based methods. However, these updates are computationally expensive, lead to irreversible weight changes, and degrade when the model is quantized for deployment. A recent alternative to changing model weights is activation engineering, where activations are changed during inference to steer model behavior. Despite circumventing weight editing, naive activation steering introduces its own failure modes, as a single global steering vector applies the same intervention to every input, leading to unintended changes in model behavior. We introduce Inference-Time Unlearning via Gated Activation Redirection (GUARD-IT), a training- and gradient-free method that unlearns via input-dependent activation steering at inference time. The resulting intervention is applied as a norm-preserving rotation in the residual stream, leaving model weights untouched. Experiments on TOFU and MUSE show that GUARD-IT matches or exceeds 12 gradient-based baselines across three model scales, while being the only method to simultaneously preserve utility, suppress memorization, and avoid catastrophic collapse across all settings. GUARD-IT further supports continual unlearning without retraining, and remains effective under quantization, a scenario in which parameter-editing methods degrade.
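A norm-preserving, input-dependent intervention of the kind described can be sketched as a planar rotation that is applied only when a gate fires. Everything specific here is an assumption rather than the GUARD-IT procedure: the abstract does not say how the rotation plane, angle, or gate are chosen, so the cosine-similarity gate, the `source` and `target` directions, and the function names are hypothetical. The two directions must not be parallel.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def gated_rotation(h, source, target, angle, gate_threshold):
    """Rotate h by `angle` in the plane spanned by source and target, if the gate fires."""
    u = [x / norm(source) for x in source]
    # Gram-Schmidt: component of target orthogonal to u, normalised.
    proj = dot(target, u)
    v = [t - proj * x for t, x in zip(target, u)]
    v = [x / norm(v) for x in v]
    # Gate (assumption): intervene only when h aligns with the source direction.
    if norm(h) == 0 or abs(dot(h, u)) / norm(h) < gate_threshold:
        return list(h)
    a, b = dot(h, u), dot(h, v)
    a_rot = a * math.cos(angle) - b * math.sin(angle)
    b_rot = a * math.sin(angle) + b * math.cos(angle)
    # Replace the in-plane part of h with its rotated version; the rest is untouched.
    return [x + (a_rot - a) * ui + (b_rot - b) * vi for x, ui, vi in zip(h, u, v)]

steered = gated_rotation([3.0, 4.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
                         angle=math.pi / 2, gate_threshold=0.3)
untouched = gated_rotation([0.0, 0.0, 2.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
                           angle=math.pi / 2, gate_threshold=0.3)
```

Because only the in-plane coordinates are rotated, the vector keeps its length, unlike adding a fixed steering vector. Inputs that do not trigger the gate pass through unchanged, which is the contrast with a single global steering vector drawn in the abstract.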


2605.12000 2026-05-19 cs.LG (updated version)

Causal Bias Detection in Generative Artificial Intelligence

Drago Plecko

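Affiliations: Department of Statistics & Data Science

AI summary: This paper studies causal fairness in generative AI and derives new causal decomposition results that quantify the fairness impact of different causal pathways and of replacing real-world mechanisms with the generative model's mechanisms. It demonstrates the methodology by analyzing race and gender bias in large language models.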

Abstract

Automated systems built on artificial intelligence (AI) are increasingly deployed across high-stakes domains, raising critical concerns about fairness and the perpetuation of demographic disparities that exist in the world. In this context, causal inference provides a principled framework for reasoning about fairness, as it links observed disparities to underlying mechanisms and aligns naturally with human intuition and legal notions of discrimination. Prior work on causal fairness primarily focuses on the standard machine learning setting, where a decision-maker constructs a single predictive mechanism $f_{\widehat Y}$ for an outcome variable $Y$, while inheriting the causal mechanisms of all other covariates from the real world. The generative AI setting, however, is markedly more complex: generative models can sample from arbitrary conditionals over any set of variables, implicitly constructing their own beliefs about all causal mechanisms rather than learning a single predictive function. This fundamental difference requires new developments in causal fairness methodology. We formalize the problem of causal fairness in generative AI and unify it with the standard ML setting under a common theoretical framework. We then derive new causal decomposition results that enable granular quantification of fairness impacts along both (a) different causal pathways and (b) the replacement of real-world mechanisms by the generative model's mechanisms. We establish identification conditions and introduce efficient estimators for causal quantities of interest, and demonstrate the value of our methodology by analyzing race and gender bias in large language models across different datasets.


2605.07263 2026-05-19 eess.SP cs.AI cs.DC cs.LG stat.ML (updated version)

Resource-Element Energy Difference for Noncoherent Over-the-Air Federated Learning

Hao Chen, Zavareh Bozorgasl

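Affiliations: Signal, Communication, and Learning Lab (SCALE Lab), Department of Electrical and Computer Engineering, Boise State University

AI summary: This paper proposes resource-element energy difference (REED), a noncoherent physical-layer primitive for continuous signed aggregation. REED maps the positive and negative parts of each real-valued update to transmit energies on paired orthogonal resource elements and estimates the signed sum by subtracting the corresponding received energies. It uses slow-timescale calibration of average channel powers but requires no instantaneous transmitter- or receiver-side CSI or channel inversion. For independent Rayleigh fading, the paper derives exact first- and second-moment expressions for single-shot REED and for a chip-diverse extension.

Comments Preprint; under review. Code to replicate the results is available at: https://github.com/zavareh1/REED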

Abstract

Over-the-air federated learning (OTA-FL) reduces uplink latency by aggregating client updates directly over the wireless multiple-access channel. Coherent analog aggregation realizes this idea by aligning the phases and amplitudes of simultaneously transmitted waveforms, which typically requires synchronization, instantaneous channel-state information (CSI), phase compensation, and power control. Noncoherent energy detection removes the need for phase-coherent combining, but a single energy measurement is nonnegative and, therefore, cannot represent signed model updates. This paper introduces resource-element energy difference (REED), a noncoherent physical-layer primitive for continuous signed aggregation. REED maps the positive and negative parts of each real-valued update to transmit energies on paired orthogonal resource elements and estimates the signed sum by subtracting the corresponding received energies. The construction uses slow-timescale calibration of average channel powers, but does not require instantaneous transmitter- or receiver-side CSI or channel inversion. For independent Rayleigh fading, we derive exact first- and second-moment expressions for single-shot REED and for a chip-diverse extension that spreads each coordinate over multiple independently faded paired chips. The resulting variance laws separate fading-induced self-noise, signal-noise interaction, and receiver-noise fluctuation, giving an explicit diversity-resource tradeoff. More->The rest of abstract is in the paper.
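
The energy-difference idea in this abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' implementation (their code is linked in the Comments line): it assumes a square-root amplitude mapping, pre-scaling by the calibrated average channel power, and independent Rayleigh fading per client, resource element, and chip. The names `complex_gauss` and `reed_estimate` are hypothetical.

```python
import random


def complex_gauss(var, rng):
    """Circularly symmetric complex Gaussian sample with E|z|^2 = var."""
    s = (var / 2.0) ** 0.5
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def reed_estimate(updates, avg_power, noise_var, chips, rng):
    """Estimate sum(updates) from energy differences on paired resource elements.

    updates   : real-valued client updates (one scalar per client)
    avg_power : calibrated average channel power per client (no instantaneous CSI)
    chips     : number of independently faded chip pairs to average over
    """
    total = 0.0
    for _ in range(chips):
        # Receiver noise on the "positive" and "negative" resource elements.
        y_pos = complex_gauss(noise_var, rng)
        y_neg = complex_gauss(noise_var, rng)
        for x, p in zip(updates, avg_power):
            # Transmit energy is the positive (or negative) part of the update,
            # divided by the average channel power (assumed calibration step).
            a_pos = (max(x, 0.0) / p) ** 0.5
            a_neg = (max(-x, 0.0) / p) ** 0.5
            # Each client sees an independent Rayleigh-fading coefficient per element.
            y_pos += complex_gauss(p, rng) * a_pos
            y_neg += complex_gauss(p, rng) * a_neg
        # Subtracting received energies cancels the mean noise power
        # and leaves the signed sum in expectation.
        total += abs(y_pos) ** 2 - abs(y_neg) ** 2
    return total / chips


rng = random.Random(0)
estimate = reed_estimate([0.5, -0.2, 0.3, -0.1], [1.0, 2.0, 0.5, 1.5], 0.1, 20000, rng)
print(estimate)  # should be close to the true signed sum 0.5
```

Averaging over more chips reduces the fluctuation of the estimate, which is the diversity-resource tradeoff the abstract refers to; a single chip pair gives an unbiased but noisy value in this sketch.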

2605.06933 2026-05-19 cs.LG cs.CR cs.MA (updated version)

MAGIQ: A Post-Quantum Multi-Agentic AI Governance System with Provable Security

Sepideh Avizheh, Tushin Mallick, Alina Oprea, Cristina Nita-Rotaru, Reihaneh Safavi-Naini

Affiliations: University of Calgary; Northeastern University

AI summary: This paper proposes MAGIQ, a framework that uses novel, efficient post-quantum cryptographic protocols to define and enforce policies in multi-agent AI systems. It aims to secure agent communication and access-control policies and to provide traceable accountability.

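Control-Oriented Reactor Thermal-Hydraulic Forecasting with a Graph Neural ODE Digital Twin under Partial Observability

Akzhol Almukhametov, Doyeong Lim, Rui Hu, Yang Liu

Affiliations: Department of Nuclear Engineering, Texas A&M University, College Station, TX 77843, USA; Argonne National Laboratory, Nuclear Science and Engineering Division, USA

AI summary: This paper proposes a model that couples a physics-informed graph neural network with a neural ordinary differential equation (GNN-ODE) to forecast reactor thermal-hydraulic states under partial observability. The model performs well on predictive accuracy, millisecond-scale inference speed, and robustness to partial observability.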

Abstract

Real-time supervisory control of advanced reactors requires accurate forecasting of plant-wide thermal-hydraulic states, including locations where physical sensors are unavailable. Meeting this need calls for surrogate models that combine predictive fidelity, millisecond-scale inference, and robustness to partial observability. In this work, we present a physics-informed message-passing Graph Neural Network coupled with a Neural Ordinary Differential Equation (GNN-ODE) to address all three requirements simultaneously. We represent the whole system as a directed sensor graph whose edges encode hydraulic connectivity through flow/heat transfer-aware message passing, and we advance the latent dynamics in continuous time via a controlled Neural ODE. A topology-guided missing-node initializer reconstructs uninstrumented states at rollout start; prediction then proceeds fully autoregressively. The GNN-ODE surrogate achieves satisfactory results for the system dynamics prediction. On held-out simulation transients, the surrogate achieves an average MAE of 0.91 K at 60 s and 2.18 K at 300 s for uninstrumented nodes, with $R^2$ up to 0.995 for missing-node state reconstruction. Inference runs approximately 105 times faster than simulated time on a single GPU, enabling 64-member ensemble rollouts for uncertainty quantification. To assess sim-to-real transfer, we adapt the pretrained surrogate to experimental facility data using layerwise discriminative fine-tuning with only 30 training sequences. The learned flow-dependent heat-transfer scaling recovers a Reynolds-number exponent consistent with established correlations, indicating constitutive learning beyond trajectory fitting. The model tracks a steep power change transient and produces accurate trajectories at uninstrumented locations.

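2604.02184 2026-05-19 cs.LG (updated version)

Neural-network methods for two-dimensional finite-source reflector design

Roel Hacking, Lisa Kusch, Koondanibha Mitra, Martijn Anthonissen, Wilbert IJzerman

Affiliations: Eindhoven University of Technology; Signify

AI summary: This paper proposes a neural-network method for two-dimensional finite-source reflector design. It optimizes the reflector height with a direct change-of-variables loss and a mesh-based loss to control the far-field distribution, and reports higher accuracy and speed than a deconvolution baseline on several benchmarks.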

Comments 25 pages, 12 figures, 2 tables. Submitted to Machine Learning: Science and Technology

Abstract

We address the inverse problem of designing two-dimensional reflectors that transform light from a finite, extended source into a prescribed far-field distribution. The reflector height is represented by a neural network and optimized with two objective functions: a direct change-of-variables loss based on the closed-form inverse ray map, and a mesh-based loss that maps target cells back to the source and remains usable for discontinuous sources. Gradients are computed by automatic differentiation and minimized with a robust quasi-Newton method. As a baseline, we adapt a deconvolution pipeline built on a simplified finite-source approximation: a one-dimensional monotone map is recovered from flux balance, converted to a reflector by an integrating-factor ODE solve, and embedded in a modified Van Cittert iteration with nonnegativity clipping and ray-traced feedback. Across four benchmarks, covering continuous and discontinuous sources and minimum-height constraints, accuracy is measured by ray-traced normalized mean absolute error. On the two main benchmarks, the neural method reaches errors of about 2e-5 and 5e-5 within a few seconds on one NVIDIA RTX 4090 GPU, compared with 4e-3 and 5e-2 for the deconvolution baseline after several hundred seconds. The results show that the neural formulation is both more accurate and substantially faster, while still supporting practical height constraints. We also discuss extensions to rotationally symmetric and full three-dimensional reflector design through iterative correction schemes.

2603.23194 2026-05-19 cs.GR cs.CV cs.LG (updated version)

PhysSkin: Real-Time and Generalizable Physics-Based Animation via Self-Supervised Neural Skinning

Yuanhang Lei, Tao Cheng, Xingxuan Li, Boming Zhao, Siyuan Huang, Ruizhen Hu, Peter Yichen Chen, Hujun Bao, Zhaopeng Cui

Affiliations: State Key Laboratory of CAD&CG; BIGAI; Shenzhen University; University of British Columbia

AI summary: This paper proposes PhysSkin, a framework that uses a self-supervised learning strategy to deliver real-time physics-based animation across diverse 3D shapes and discretizations. Its core components are a neural skinning-field autoencoder and a physics-informed learning strategy.

Comments Accepted by CVPR 2026 Highlight. Project Page: https://zju3dv.github.io/PhysSkin/

Abstract

Achieving real-time physics-based animation that generalizes across diverse 3D shapes and discretizations remains a fundamental challenge. We introduce PhysSkin, a physics-informed framework that addresses this challenge. In the spirit of Linear Blend Skinning, we learn continuous skinning fields as basis functions lifting motion subspace coordinates to full-space deformation, with subspace defined by handle transformations. To generate mesh-free, discretization-agnostic, and physically consistent skinning fields that generalize well across diverse 3D shapes, PhysSkin employs a new neural skinning fields autoencoder which consists of a transformer-based encoder and a cross-attention decoder. Furthermore, we also develop a novel physics-informed self-supervised learning strategy that incorporates on-the-fly skinning-field normalization and conflict-aware gradient correction, enabling effective balancing of energy minimization, spatial smoothness, and orthogonality constraints. PhysSkin shows outstanding performance on generalizable neural skinning and enables real-time physics-based animation.
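
For readers unfamiliar with Linear Blend Skinning, the classical formula that the abstract takes as its starting point can be sketched as follows. This is a minimal 2D illustration with fixed, hand-picked weights; in PhysSkin the skinning weights come from a learned continuous field, which is not reproduced here. The function name `blend_skinning` and the row-wise affine layout are assumptions made for this example.

```python
def blend_skinning(point, weights, transforms):
    """Deform a 2D point as a weighted sum of per-handle affine transforms.

    point      : (x, y) rest position
    weights    : one skinning weight per handle (typically summing to 1)
    transforms : one affine map per handle, given row-wise as
                 ((a, b, tx), (c, d, ty)), meaning x' = a*x + b*y + tx
                 and y' = c*x + d*y + ty
    """
    x, y = point
    out_x = 0.0
    out_y = 0.0
    for w, ((a, b, tx), (c, d, ty)) in zip(weights, transforms):
        out_x += w * (a * x + b * y + tx)
        out_y += w * (c * x + d * y + ty)
    return (out_x, out_y)


identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
shift_right = ((1.0, 0.0, 2.0), (0.0, 1.0, 0.0))  # translate by 2 along x

# A point influenced 25% by a fixed handle and 75% by a handle that moves right.
deformed = blend_skinning((1.0, 1.0), [0.25, 0.75], [identity, shift_right])
print(deformed)
```

The handle transforms act as the low-dimensional subspace coordinates, and the weights lift them to a full-space deformation, which is the role the abstract assigns to the learned skinning fields.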

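2603.20216 2026-05-19 cs.CL cs.AI cs.LG (updated version)

Locally Coherent Parallel Decoding in Diffusion Language Models

Michael Hersche, Nicolas Menet, Ronan Tanios, Abbas Rahimi

Affiliations: IBM Research - Zurich

AI summary: This paper proposes CoDiLA, which introduces a small auxiliary autoregressive model to address coherence problems in parallel decoding for diffusion language models, achieving higher accuracy and speed on code generation tasks.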

Comments Accepted at ICML 2026

Abstract

Diffusion language models (DLMs) have emerged as a promising alternative to autoregressive (AR) models, offering sub-linear generation latency and bidirectional capabilities that are particularly appealing for code generation and editing. Achieving sub-linear latency in discrete DLMs requires predicting multiple tokens in parallel. However, standard DLMs sample tokens independently from conditional marginal distributions, failing to capture the joint dependencies among concurrently generated tokens. As a result, they often lead to syntactic inconsistencies and break multi-token structures. In this work, we introduce CoDiLA (Coherent Diffusion with Local Autoregression), a method that reconciles parallel sampling with local dependency modeling. Rather than forcing the DLM to resolve fine-grained syntax, CoDiLA delegates local decoding to a small, auxiliary AR model operating on the diffusion latents. This design allows for parallel generation while ensuring sequential validity within a block and maintaining core DLM capabilities, including bidirectional modeling across blocks. We demonstrate that using a highly compact auxiliary AR model (e.g., 0.6B parameters) effectively eliminates coherence artifacts, establishing a new Pareto frontier for accuracy and speed in code generation benchmarks.

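2603.17577 2026-05-19 cs.LG cs.AI stat.ML (updated version)

Identifying Latent Actions and Dynamics from Offline Data via Demonstrator Diversity

Felix Schur

Affiliations: ETH Zürich

AI summary: This paper studies how to recover latent actions and environment dynamics from offline trajectories in which actions are not observed. Under a demonstrator-diversity assumption, it proves that the latent transitions and demonstrator policies are identifiable up to permutation of the latent action labels when certain conditions hold, offering a new route to learning latent actions and dynamics from offline reinforcement learning data.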

Abstract

Can latent actions and environment dynamics be recovered from offline trajectories when actions are never observed? We study this question in a setting where trajectories are action-free but tagged with demonstrator identity. We assume that each demonstrator follows a distinct policy, while the environment dynamics are shared across demonstrators and identity affects the next observation only through the chosen action. Under these assumptions, the conditional next-observation distribution $p(o_{t+1}\mid o_t,e)$ is a mixture of latent action-conditioned transition kernels with demonstrator-specific mixing weights. We show that this induces, for each state, a column-stochastic nonnegative matrix factorization of the observable conditional distribution. Using sufficiently scattered policy diversity and rank conditions, we prove that the latent transitions and demonstrator policies are identifiable up to permutation of the latent action labels. We extend the result to continuous observation spaces via a Gram-determinant minimum-volume criterion, and show that continuity of the transition map over a connected state space upgrades local permutation ambiguities to a single global permutation. A small amount of labeled action data then suffices to fix this final ambiguity. These results establish demonstrator diversity as a principled source of identifiability for learning latent actions and dynamics from offline RL data.
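
The mixture structure described in this abstract can be written out explicitly. The notation below ($\pi_e$, $T$, $K$, $P_o$, $T_o$, $\Pi_o$, $Q$) is assumed here for illustration and may differ from the paper's; the matrix form additionally assumes a finite observation space.

$$
p(o_{t+1} \mid o_t, e) \;=\; \sum_{a=1}^{K} \pi_e(a \mid o_t)\, T(o_{t+1} \mid o_t, a)
$$

For a fixed state $o$, collect the observable conditionals $p(\cdot \mid o, e)$ as the columns of $P_o$, the latent transition kernels $T(\cdot \mid o, a)$ as the columns of $T_o$, and the demonstrator policies $\pi_e(\cdot \mid o)$ as the columns of $\Pi_o$. Every column is a probability vector, so the mixture becomes a column-stochastic nonnegative factorization:

$$
P_o \;=\; T_o\, \Pi_o, \qquad P_o = (T_o Q)\,(Q^{\top} \Pi_o) \ \text{ for any permutation matrix } Q .
$$

The second identity shows why the factors can be recovered at best up to a relabeling of the latent actions, which is the permutation ambiguity the abstract mentions.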

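2603.17041 2026-05-19 stat.ML cs.AI cs.LG stat.ME (updated version)

When Marginals Match but Structure Fails: Covariance Fidelity in Generative Models

Nazia Riasat

Affiliations: North Dakota State University

AI summary: This paper proposes a covariance-level dependence-fidelity criterion to address the shortcomings of evaluation based on marginal distribution matching, and shows empirically that the criterion more reliably separates structure-preserving from structure-discarding generative models.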

Comments 44 pages, 25 figures. Extended version of paper accepted at MathAI 2026 (International Conference on Mathematics of Artificial Intelligence), March 30 - April 3, 2026

Abstract

Generative models are increasingly deployed as substitutes for real data in downstream scientific workflows, yet standard evaluation criteria remain focused on marginal distribution matching. We argue that this represents a fundamental gap: downstream inference is rarely a marginal operation, and a model that passes every univariate diagnostic can still produce structurally unreliable synthetic data. We introduce covariance-level dependence fidelity, measured by D_Sigma(P,Q) = ||Sigma_P - Sigma_Q||_F, as a principled, computable criterion for evaluating whether a generative model preserves the joint structure of data beyond its univariate marginals. Three results formalise this criterion. First, marginal fidelity provides no constraint on dependence structure: D_Sigma can be made arbitrarily large while all univariate marginals match exactly. Second, covariance divergence induces quantifiable downstream instability, including sign reversals in population regression coefficients. Third, bounding D_Sigma provides positive stability guarantees for dependence-sensitive procedures such as PCA via Davis-Kahan-type bounds. Empirical validation across three domains, image data (Fashion-MNIST VAE, n = 60,000), bulk RNA-seq (TCGA-BRCA, n = 1,111), and a small-sample stress test (Alzheimer's gene expression, n = 113), shows that D_Sigma/delta consistently distinguishes structure-discarding from structure-preserving generators in cases where standard marginal diagnostics show little separation, confirming that covariance-level fidelity provides information orthogonal to existing evaluation metrics across domains and sample sizes.
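
The first claim, that matching marginals say nothing about dependence, is easy to reproduce numerically. The sketch below is an illustration under assumptions, not the paper's experiment: it uses a hypothetical "real" distribution with correlated standard-normal coordinates and a hypothetical "synthetic" one with independent coordinates, then computes D_Sigma as the Frobenius norm of the covariance difference. All function names are made up for this example.

```python
import math
import random


def sample_pairs(n, rho, rng):
    """Draw n pairs whose coordinates are standard normal with correlation rho."""
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        pairs.append((z1, rho * z1 + math.sqrt(1.0 - rho * rho) * z2))
    return pairs


def covariance(samples):
    """Unbiased sample covariance matrix as a nested list."""
    n = len(samples)
    d = len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(d)]
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in samples) / (n - 1)
            for j in range(d)
        ]
        for i in range(d)
    ]


def d_sigma(cov_p, cov_q):
    """Frobenius norm of the difference between two covariance matrices."""
    return math.sqrt(
        sum((a - b) ** 2 for row_p, row_q in zip(cov_p, cov_q) for a, b in zip(row_p, row_q))
    )


rng = random.Random(0)
real = sample_pairs(20000, 0.8, rng)       # correlated coordinates
synthetic = sample_pairs(20000, 0.0, rng)  # same marginals, no dependence

cov_real = covariance(real)
cov_syn = covariance(synthetic)
gap = d_sigma(cov_real, cov_syn)
print("marginal variances:", cov_real[0][0], cov_syn[0][0], cov_real[1][1], cov_syn[1][1])
print("D_Sigma:", gap)
```

Both samples have approximately unit variance in each coordinate, so a per-coordinate diagnostic would not separate them, while the covariance gap stays near its population value of sqrt(2) times the correlation.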

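2603.08462 2026-05-19 cs.LG (updated version)

Reasoning as Compression: Unifying Budget Forcing via the Conditional Information Bottleneck

Fabio Valerio Massoli, Andrey Kuzmin, Arash Behboodi

Affiliations: Qualcomm AI Research

AI summary: This paper casts efficient reasoning as a lossy compression problem under the information bottleneck principle. By introducing the conditional information bottleneck (CIB) principle, it addresses a theoretical gap that arises when conventional budget-forcing approaches are applied to transformers, and it uses a semantic prior to compress reasoning more efficiently, improving accuracy and reducing compute cost.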

Abstract

Chain-of-Thought (CoT) prompting improves LLM accuracy on complex tasks but often increases token usage and inference cost. Existing "Budget Forcing" methods reduce cost via fine-tuning with heuristic length penalties, suppressing both essential reasoning and redundant filler. We recast efficient reasoning as a lossy compression problem under the Information Bottleneck (IB) principle, and identify a key theoretical gap when applying naive IB to transformers: attention violates the Markov property between prompt, reasoning trace, and response. To resolve this issue, we model CoT generation under the Conditional Information Bottleneck (CIB) principle, where the reasoning trace $Z$ acts as a computational bridge that contains only the information about the response $Y$ that is not directly accessible from the prompt $X$. This yields a general Reinforcement Learning objective: maximize task reward while compressing completions under a prior over reasoning traces, subsuming common heuristics (e.g., length penalties) as special cases (e.g., uniform priors). In contrast to naive token-counting approaches, we introduce a semantic prior that measures token cost by surprisal under a language model. Crucially, the prior is queried only for token-level log-probabilities, adding negligible overhead to the training loop. Empirically, our CIB objective prunes reasoning redundancy while preserving fluency and logic, improving accuracy at moderate compression and enabling aggressive compression with minimal accuracy drop. These gains generalize across model families and task domains, confirming CIB as a domain-agnostic CoT compression framework.

2603.08290 2026-05-19 cs.LG cs.AI (updated version)

Minor First, Major Last: A Depth-Induced Implicit Bias of Sharpness-Aware Minimization

Chaewon Moon, Dongkuk Si, Chulhee Yun

Affiliations: Graduate School of AI, KAIST; Mobilint, Inc.

AI summary: This study examines the implicit bias of sharpness-aware minimization (SAM) when training on linearly separable binary classification. It finds that SAM behaves differently at depth L=2 than at depth L=1 and demonstrates a sequential feature amplification phenomenon.

Comments Accepted to ICLR 2026, 84 pages, 35 figures

Abstract

We study the implicit bias of Sharpness-Aware Minimization (SAM) when training $L$-layer linear diagonal networks on linearly separable binary classification. For linear models ($L=1$), both $\ell_\infty$- and $\ell_2$-SAM recover the $\ell_2$ max-margin classifier, matching gradient descent (GD). However, for depth $L = 2$, the behavior changes drastically -- even on a single-example dataset. For $\ell_\infty$-SAM, the limit direction depends critically on initialization and can converge to $\mathbf{0}$ or to any standard basis vector, in stark contrast to GD, whose limit aligns with the basis vector of the dominant data coordinate. For $\ell_2$-SAM, we show that although its limit direction matches the $\ell_1$ max-margin solution as in the case of GD, its finite-time dynamics exhibit a phenomenon we call "sequential feature amplification", in which the predictor initially relies on minor coordinates and gradually shifts to larger ones as training proceeds or initialization increases. Our theoretical analysis attributes this phenomenon to $\ell_2$-SAM's gradient normalization factor applied in its perturbation, which amplifies minor coordinates early and allows major ones to dominate later, giving a concrete example where infinite-time implicit-bias analyses are insufficient. Synthetic and real-data experiments corroborate our findings.
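
To make the role of the gradient normalization factor concrete, here is a minimal sketch of one $\ell_2$-SAM step on a depth-2 diagonal linear network. It assumes the parameterization $\beta = u \odot v$, the logistic loss on a single example, and the standard SAM update (perturb by $\rho\, g / \lVert g \rVert_2$, then descend along the gradient at the perturbed point). The paper's exact setup may differ, and the function names are hypothetical.

```python
import math


def loss_and_grad(u, v, x, y):
    """Logistic loss of the predictor beta = u * v (elementwise) on one example."""
    margin = y * sum(ui * vi * xi for ui, vi, xi in zip(u, v, x))
    loss = math.log1p(math.exp(-margin))
    # Derivative of the loss with respect to the inner product <beta, x>.
    s = -y / (1.0 + math.exp(margin))
    grad_u = [s * vi * xi for vi, xi in zip(v, x)]
    grad_v = [s * ui * xi for ui, xi in zip(u, x)]
    return loss, grad_u, grad_v


def sam_l2_step(u, v, x, y, lr, rho):
    """One l2-SAM update: ascend to w + rho * g / ||g||, then descend from w."""
    _, gu, gv = loss_and_grad(u, v, x, y)
    # Gradient normalization factor used in the perturbation.
    norm = math.sqrt(sum(g * g for g in gu + gv)) + 1e-12
    u_adv = [ui + rho * g / norm for ui, g in zip(u, gu)]
    v_adv = [vi + rho * g / norm for vi, g in zip(v, gv)]
    # The descent direction is the gradient evaluated at the perturbed weights.
    _, gu_adv, gv_adv = loss_and_grad(u_adv, v_adv, x, y)
    new_u = [ui - lr * g for ui, g in zip(u, gu_adv)]
    new_v = [vi - lr * g for vi, g in zip(v, gv_adv)]
    return new_u, new_v


# Single-example dataset whose first coordinate is the "major" one.
u, v = [0.5, 0.5], [0.5, 0.5]
x, y = [2.0, 1.0], 1.0
start_loss, _, _ = loss_and_grad(u, v, x, y)
for _ in range(200):
    u, v = sam_l2_step(u, v, x, y, lr=0.1, rho=0.05)
end_loss, _, _ = loss_and_grad(u, v, x, y)
print(start_loss, end_loss)
```

Because the perturbation has fixed length $\rho$ regardless of the gradient's magnitude, its relative effect on each coordinate depends on the current weights, which is the mechanism the abstract credits for amplifying minor coordinates early. This sketch only shows the update rule and does not attempt to reproduce that phenomenon.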

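2603.06984 2026-05-19 stat.ML cs.AI cs.GT cs.LG cs.SI (updated version)

Masking Causality and Conditional Dependence

Zou Yang, Sophia Xiao, Bijan Mazaheri

Affiliations: Thayer School of Engineering; Dartmouth College

AI summary: This paper studies the enforcement of conditional independence through averaged constraints. From the regulator's side, such constraints fail to guarantee the stratum-wise requirement; from the optimizer's side, they allow dependence to be hidden effectively. The paper concludes that regulating direct dependence through averaged statistics on observed decisions is limited and that enforcement must operate at the level of the decision rule.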

Abstract

Many regulatory and analytic problems require that a prohibited variable influence a decision only through a designated allowable channel -- a conditional-independence requirement that arises in path-specific fairness, the handling of classified information, and the regulation of trading on non-public information, among other settings. Such requirements may be enforced either stratum-by-stratum or, more commonly (and more efficiently), through a single averaged constraint on the conditional effect. We study the resulting enforcement problem from two perspectives. From the regulator's side, we formulate causal masking as a linear program and show that averaged-constraint optimization almost surely produces policies that violate the stratum-wise requirement while satisfying the averaged one exactly. The gains from masking grow with confounding and outcome heterogeneity, and detection requires precisely the conditional-independence tests that average constraints aim to avoid. From the optimizer's side, the same construction shows that masked policies recover most of the reward of unconstrained exploitation while being far harder to detect, making them attractive in any setting where the basis of decisions is itself sensitive. Together, these results argue that regulating direct dependence through averaged statistics on observed decisions is structurally limited, and that meaningful enforcement must operate at the level of the decision rule itself.
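
A two-stratum toy example shows how a policy can satisfy an averaged constraint while violating the stratum-wise one. The numbers below are hypothetical, and the particular form of the averaged constraint (a weighted mean of per-stratum differences in decision rates) is an assumption made for illustration rather than the paper's exact formulation.

```python
# Hypothetical decision rates P(D=1 | A=a, S=s): A is the prohibited variable,
# S indexes strata of the allowable channel, and weight is P(S=s).
strata = [
    {"weight": 0.5, "rate_a1": 0.7, "rate_a0": 0.4},
    {"weight": 0.5, "rate_a1": 0.2, "rate_a0": 0.5},
]


def stratum_effects(strata):
    """Per-stratum effect of A on the decision rate."""
    return [s["rate_a1"] - s["rate_a0"] for s in strata]


def averaged_effect(strata):
    """Weighted average of the per-stratum effects."""
    return sum(s["weight"] * (s["rate_a1"] - s["rate_a0"]) for s in strata)


print("per-stratum effects:", stratum_effects(strata))
print("averaged effect:", averaged_effect(strata))
```

The per-stratum effects have opposite signs and cancel in the average, so an audit that checks only the averaged statistic would pass this policy even though the decision depends on the prohibited variable within every stratum.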

2602.21707 2026-05-19 eess.IV cs.CV cs.LG math.OC (updated version)

Learning spatially adaptive sparsity level maps for arbitrary convolutional dictionaries

Joshua Schulz, David Schote, Christoph Kolbitsch, Kostas Papafitsoros, Andreas Kofler

Affiliations: Physikalisch-Technische Bundesanstalt (PTB), Braunschweig and Berlin, Germany; School of Mathematical Sciences, Queen Mary University of London, UK

AI summary: This paper extends an image reconstruction method based on neural network-inferred spatially adaptive sparsity level maps. Through improved network design and dedicated training strategies, it achieves filter-permutation invariance and demonstrates the benefit of using different dictionaries in low-field MRI.

Comments accepted for publication at ICIP 2026; differs from previous versions after a bugfix in one of the used packages; corresponds to the final camera-ready version submitted to the conference

Abstract

State-of-the-art learned reconstruction methods often rely on black-box modules that, despite their strong performance, raise questions about their interpretability and robustness. Here, we build on a recently proposed image reconstruction method, which is based on embedding data-driven information into a model-based convolutional dictionary regularization via neural network-inferred spatially adaptive sparsity level maps. By means of improved network design and dedicated training strategies, we extend the method to achieve filter-permutation invariance as well as the possibility to change the convolutional dictionary at inference time. We apply our method to low-field MRI and compare it to several other recent deep learning-based methods, also on in vivo data, where the benefit of using a different dictionary is demonstrated. We further assess the method's robustness when tested on in- and out-of-distribution data. When tested on the latter, the proposed method suffers less from the data distribution shift compared to the other learned methods, which we attribute to its reduced reliance on training data due to its underlying model-based reconstruction component.

2602.12703 2026-05-19 cs.LG (updated version)

SWING: Unlocking Implicit Graph Representations for Graph Random Features

Alessandro Manenti, Avinava Dubey, Arijit Sehanobish, Cesare Alippi, Krzysztof Choromanski

Affiliations: Google Research; Independent Researcher; Google DeepMind; Columbia University

AI summary: SWING computes graph random features efficiently on implicitly represented graphs (i-graphs) by walking in continuous space rather than on graph nodes. Its core method is a customized Gumbel-softmax sampling mechanism that combines random features with importance sampling, improving computational efficiency and accuracy without requiring an explicit graph structure.

Abstract

We propose SWING: Space Walks for Implicit Network Graphs, a new class of algorithms for computations involving Graph Random Features on graphs given by implicit representations (i-graphs), where edge weights are defined as bivariate functions of the feature vectors of the corresponding nodes. This class of graphs includes several prominent examples, such as $\varepsilon$-neighborhood graphs, which are used regularly in machine learning. Rather than conducting walks on the graphs' nodes, these methods rely on walks in the continuous spaces in which the graphs are embedded. To approximate the original combinatorial calculations accurately and efficiently, SWING applies a customized Gumbel-softmax sampling mechanism with linearized kernels, obtained via random features coupled with importance sampling techniques. This algorithm is of independent interest. SWING relies on the deep connection between implicitly defined graphs and Fourier analysis presented in this paper. SWING is accelerator-friendly and does not require materializing the input graph. We provide a detailed analysis of SWING and complement it with thorough experiments on different classes of i-graphs.

2602.09805 2026-05-19 cs.CL cs.AI cs.LG (updated version)

Beyond Accuracy: Decomposing the Reasoning Efficiency of LLMs

Daniel Kaiser, Arnoldo Frigessi, Ali Ramezani-Kebrya, Benjamin Ricaud

Affiliations: Integreat - Norwegian Centre for knowledge-driven machine learning; UiT - The Arctic University of Norway; University of Oslo

AI summary: This paper proposes a trace-optional evaluation protocol that decomposes the token efficiency of large language models into completion rate, conditional correctness, and generated length. It normalizes generated length by task workload metadata where available and evaluates reasoning efficiency and verbosity overhead across tasks.

Comments Preprint (under review). 29 pages, 4 figures

Abstract

As reasoning LLMs increasingly trade tokens for accuracy through deliberation, search, and self-correction, a single accuracy score can no longer tell whether those tokens buy useful reasoning, recovery from hard instances, or unnecessary verbosity. We introduce a trace-optional evaluation protocol that exactly decomposes token efficiency using three observables available even for closed models: completion rate, conditional correctness given completion, and generated length. When instance-level workload metadata is available, we further normalize generated length by declared task-implied work and separate mean verbalization overhead from workload-dependent scaling. When such metadata is absent, we define an auditable solver-derived workload scale and evaluate its stability under leave-self-out, leave-top-k, and held-out-reference-pool perturbations. We evaluate 14 shared open-weight models on CogniLoad, GSM8K, ProofWriter, and ZebraLogic. We further evaluate 11 additional models on CogniLoad, enabling a fine-grained analysis of reasoning-task difficulty factors: task length, intrinsic difficulty, and distractor density. Efficiency and overhead rankings remain stable across all benchmark pairs, more robustly than accuracy rankings, while the decomposition separates logic-limited, context-limited (truncation-driven), and verbosity-limited failure modes that look identical under accuracy-per-token. We release an evaluation artifact and reporting template, which elaborates on why an LLM is inefficient at reasoning.
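
One way to read the three-observable decomposition is as an exact arithmetic identity. The sketch below assumes that token efficiency means accuracy divided by mean generated length and that incomplete runs count as incorrect; both are assumptions made for illustration, and the paper's precise definitions may differ. The record fields and the function name `decompose` are hypothetical.

```python
def decompose(records):
    """Split accuracy-per-token into three observables.

    Each record is a dict with keys:
      completed : bool, the run produced a final answer (e.g. was not truncated)
      correct   : bool, the final answer was right (ignored if not completed)
      tokens    : int, number of generated tokens
    """
    n = len(records)
    completed = [r for r in records if r["completed"]]
    completion_rate = len(completed) / n
    # Correctness conditional on completion; defined as 0 if nothing completed.
    conditional_correctness = (
        sum(1 for r in completed if r["correct"]) / len(completed) if completed else 0.0
    )
    mean_tokens = sum(r["tokens"] for r in records) / n
    accuracy = sum(1 for r in records if r["completed"] and r["correct"]) / n
    return {
        "completion_rate": completion_rate,
        "conditional_correctness": conditional_correctness,
        "mean_tokens": mean_tokens,
        "accuracy_per_token": accuracy / mean_tokens,
    }


runs = [
    {"completed": True, "correct": True, "tokens": 400},
    {"completed": True, "correct": False, "tokens": 900},
    {"completed": True, "correct": True, "tokens": 500},
    {"completed": False, "correct": False, "tokens": 2000},  # truncated run
]
parts = decompose(runs)
product = parts["completion_rate"] * parts["conditional_correctness"] / parts["mean_tokens"]
print(parts, product)
```

Two models with the same accuracy-per-token can differ in which factor is low: a low completion rate points to truncation, low conditional correctness to reasoning errors, and a high mean length to verbosity.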

2602.07618 2026-05-19 cs.LG stat.ML (updated version)

Neural Networks With Dense Weights Are Not Universal Approximators

Levi Rauchwerger, Stefanie Jegelka, Ron Levie

Affiliations: Princeton University, Dept of CS; MIT, Dept of EECS and CSAIL; TUM, School of CIT, MCML, MDSI; Technion – IIT, Faculty of Mathematics

AI summary: The study examines the approximation capability of dense neural networks and shows that, under constraints on the weights, densely connected networks cannot approximate arbitrary continuous functions. This reveals an inherent limitation of dense-layer networks and motivates sparse connectivity as necessary for true universality.

2602.06866 2026-05-19 cs.LG (updated version)

T-STAR: A Context-Aware Transformer Framework for Short-Term Probabilistic Demand Forecasting in Dock-Based Shared Micro-Mobility

Jingyi Cheng, Gonçalo Homem de Almeida Correia, Oded Cats, Shadi Sharif Azadeh

Affiliations: Transport and Planning, Delft University of Technology

AI summary: This paper proposes the T-STAR framework, whose two-stage structure separates consistent demand patterns from short-term fluctuations to improve short-term probabilistic demand forecasting. Experiments show that it outperforms existing methods in both deterministic and probabilistic accuracy and is robust across space and time.

Comments This work has been submitted to Transportation Research Part C

Abstract

Reliable short-term demand forecasting is essential for managing shared micro-mobility services and ensuring responsive, user-centered operations. This study introduces T-STAR (Two-stage Spatial and Temporal Adaptive contextual Representation), a novel transformer-based probabilistic framework designed to forecast station-level bike-sharing demand at a 15-minute resolution. T-STAR addresses key challenges in high-resolution forecasting by disentangling consistent demand patterns from short-term fluctuations through a hierarchical two-stage structure. The first stage captures coarse-grained hourly demand patterns, while the second stage improves prediction accuracy by incorporating high-frequency, localized inputs, including recent fluctuations and real-time demand variations in connected metro services, to account for temporal shifts in short-term demand. Time series transformer models are employed in both stages to generate probabilistic predictions. Extensive experiments using Washington D.C.'s Capital Bikeshare data demonstrate that T-STAR outperforms existing methods in both deterministic and probabilistic accuracy. The model exhibits strong spatial and temporal robustness across stations and time periods. A zero-shot forecasting experiment further highlights T-STAR's ability to transfer to previously unseen service areas without retraining. These results underscore the framework's potential to deliver granular, reliable, and uncertainty-aware short-term demand forecasts, which enable seamless integration to support multimodal trip planning for travelers and enhance real-time operations in shared micro-mobility services.

2602.05172 2026-05-19 stat.ML cs.LG math.ST stat.TH (updated version)

Finite-Particle Rates for Regularized Stein Variational Gradient Descent

Ye He, Krishnakumar Balasubramanian, Sayan Banerjee, Promit Ghosal

Affiliations: Department of Mathematics, Georgia Institute of Technology; Department of Statistics, University of California, Davis; Department of Statistics and Operations Research, University of North Carolina, Chapel Hill; Department of Statistics, University of Chicago

AI summary: This paper studies finite-particle rates for regularized Stein variational gradient descent. It applies a resolvent-type preconditioner to correct the constant-order bias of SVGD, derives non-asymptotic bounds for the time-averaged empirical measures, and, when the target satisfies a W₁I condition, proves W₁ convergence for a large class of smooth kernels.

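2602.03797 2026-05-19 cs.LG (updated version)

A Theory of Minimal Weight Perturbations in Deep Networks and Its Application to Low-Rank Activated Backdoor Attacks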

Bethan Evans, Jared Tanner

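Affiliations: Department of Mathematics, University of Oxford, Oxford, UK

AI summary: This paper derives the minimum-norm weight perturbation a deep network needs to produce a specified change in output and discusses the factors that determine its size. It applies the result to backdoor attacks activated by precision modification, identifies the compression threshold at which the attack succeeds, and shows that low-rank compression can reliably activate a latent backdoor while preserving full-precision accuracy.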

2601.14330 2026-05-19 cs.CV cs.LG (updated version)

Off-Policy Imitation Learning via Deep Actor-Critic Stabilization

Sayambhu Sen, Shalabh Bhatnagar

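Affiliations: Amazon Alexa; Indian Institute of Science

AI summary: This paper proposes an adversarial imitation learning algorithm that incorporates off-policy learning. It improves sample efficiency through double Q-network stabilization and value learning without reward-function inference, so that expert behavior is matched more efficiently.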

Comments 14 pages and 4 images

2510.26745 2026-05-19 cs.LG cs.AI cs.CL stat.ML (updated version)

Deep sequence models tend to memorize geometrically; it is unclear why

Shahriar Noroozizadeh, Vaishnavh Nagarajan, Elan Rosenfeld, Sanjiv Kumar

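Affiliations: Machine Learning Department & Heinz College, Carnegie Mellon University, Pittsburgh, PA, USA; Google Research, NY, USA

AI summary: This study examines how deep sequence models store atomic facts and finds that geometric memory can encode global relationships, linking even entities that never co-occur in training, which challenges the conventional associative-memory view.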

Comments Forty-third International Conference on Machine Learning (ICML 2026)

English abstract

Deep sequence models are said to store atomic facts predominantly in the form of associative memory: a brute-force lookup of co-occurring entities. We identify a dramatically different form of storage of atomic facts that we term as geometric memory. Here, the model has synthesized embeddings encoding novel global relationships between all entities, including ones that do not co-occur in training. Such storage is powerful: for instance, we show how it transforms a hard reasoning task involving an $\ell$-fold composition into an easy-to-learn $1$-step navigation task. From this phenomenon, we extract fundamental aspects of neural embedding geometries that are hard to explain. We argue that the rise of such a geometry, as against a lookup of local associations, cannot be straightforwardly attributed to typical supervisory, architectural, or optimizational pressures. Counterintuitively, a geometry is learned even when it is more complex than the brute-force lookup. Then, by analyzing a connection to Node2Vec, we demonstrate how the geometry stems from a spectral bias that -- in contrast to prevailing theories -- indeed arises naturally despite the lack of various pressures. This analysis also points out to practitioners a visible headroom to make Transformer memory more strongly geometric. We hope the geometric view of parametric memory encourages revisiting the default intuitions that guide researchers in areas like knowledge acquisition, capacity, discovery, and unlearning.

2510.24208 2026-05-19 cs.CL cs.LG (updated version)

Beyond Neural Incompatibility: Cross-Scale Knowledge Transfer in Language Models through Latent Semantic Alignment

Jian Gu, Aldeida Aleti, Chunyang Chen, Hongyu Zhang

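Affiliations: Monash University; Technical University of Munich; Chongqing University

AI summary: This paper proposes SemAlign, which achieves cross-scale knowledge transfer through latent semantic alignment. It addresses the limited parameter reuse between models that differ in architecture and parameterization by using activations as the transfer medium and transferring knowledge stably through semantic decomposition and recomposition.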

Comments an early-stage version

English abstract

Language Models (LMs) encode substantial knowledge in their parameters, yet it remains unclear how to transfer such knowledge in a fine-grained manner, a problem known as parametric knowledge transfer (PKT). A central challenge is to make cross-scale transfer effective and efficient when source and target models differ in architecture and parameterization, since neural incompatibility strongly limits direct parameter reuse. In this paper, we identify latent semantic alignment as the key prerequisite for cross-scale knowledge transfer. Instead of directly moving layer parameters, our approach uses activations as the transfer medium. SemAlign has two stages: a layer attribution stage that attributes task-relevant source layers and selects exactly one source layer for each target layer, and a semantic alignment stage that pairs them layer by layer and optimizes the target with source-side semantic supervision. The alignment is carried out in latent space through semantic decomposition and recomposition. During the shallow-to-deep transfer, only the frontier target layer is trainable. The layer objective supervises the residual contribution of that layer by matching centered token-token relation geometry against an aligned supervisory residual, while output KL preserves source-level predictive behavior. The transferred medium is therefore neither a parameter block nor an absolute hidden state, but target-space residual geometry induced by paired source-layer supervision. Evaluations on four benchmarks demonstrate the efficacy of SemAlign, and further analysis confirms that semantic decomposition and recomposition provide a stable mechanism for cross-scale knowledge transfer.

2510.08141 2026-05-19 cs.LG (updated version)

SCOPE-RL: Stable and Quantitative Control of Policy Entropy in RL Post-Training

Chen Wang, Zhaochun Li, Jionghao Bai, Hexuan Deng, Ge Lan, Yue Wang

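Affiliations: College of Software, Nankai University; Zhongguancun Academy; Beijing Institute of Technology; Zhejiang University; Harbin Institute of Technology

AI summary: This paper proposes SCOPE-RL, a framework that stabilizes and quantitatively controls policy entropy in RL post-training through a regularization term built from temperature-adaptive positive samples. Experiments show that it outperforms existing baselines on both Pass@1 and Pass@$k$.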

English abstract

Reinforcement learning (RL) is a key paradigm for post-training large language models (LLMs), but the widely used Group Relative Policy Optimization (GRPO) often suffers from entropy collapse: exploration quickly disappears, policies converge prematurely, and sample diversity declines, ultimately harming training effectiveness. Existing remedies, including entropy bonuses and clip-based methods, rarely keep entropy within a stable exploration regime and often introduce oscillatory entropy or reward degradation. In this work, we identify a previously overlooked asymmetry in entropy dynamics: under high-temperature sampling, positive and negative samples have opposite effects on policy entropy. Specifically, high-temperature positive samples promote entropy growth, whereas negative samples suppress it. We provide a theoretical explanation for this phenomenon: when entropy decreases during policy updates, its derivative with respect to temperature is strictly positive under positive-sample updates, indicating that high-temperature positive samples can counteract entropy decay, thereby slowing entropy collapse and potentially reversing it. Motivated by this insight, we propose SCOPE-RL, a stable and quantitative entropy control framework through a regularization term constructed from temperature-adaptive positive samples. Extensive experiments show that SCOPE-RL consistently outperforms strong RL baselines on both Pass@1 and Pass@$k$. Our results provide evidence that escaping entropy collapse can improve reasoning performance, while also showing that the benefit is non-monotonic, with an optimal level of exploration for RL post-training in reasoning LLMs.

2510.04930 2026-05-19 cs.LG (updated version)

Egalitarian Gradient Descent: A Simple Approach to Accelerated Grokking

Ali Saheb Pasand, Elvis Dohmatob

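Affiliations: McGill University; Mila Institute; Concordia University

AI summary: This paper proposes egalitarian gradient descent (EGD), which normalizes gradients so that the dynamics along all principal directions evolve at the same speed. This accelerates grokking and, in some cases, removes the plateau in test performance entirely.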

English abstract

Grokking is the phenomenon whereby, unlike the training performance, which peaks early in the training process, the test/generalization performance of a model stagnates over arbitrarily many epochs and then suddenly jumps to usually close to perfect levels. In practice, it is desirable to reduce the length of such plateaus, that is, to make the learning process "grok" faster. In this work, we provide new insights into grokking. First, we show both empirically and theoretically that grokking can be induced by asymmetric speeds of (stochastic) gradient descent along different principal (i.e., singular) directions of the gradients. We then propose a simple modification that normalizes the gradients so that the dynamics along all principal directions evolve at exactly the same speed. Then, we establish that this modified method, which we call egalitarian gradient descent (EGD) and which can be seen as a carefully modified form of natural gradient descent, groks much faster. In fact, in some cases the stagnation is completely removed. Finally, we empirically show that on classical arithmetic problems such as modular addition and sparse parity, where this stagnation has been widely observed and intensively studied, our proposed method eliminates the plateaus.
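
One way to read "normalizing the gradients so that all principal directions evolve at the same speed" is to equalize the singular values of the gradient matrix. The following is a sketch under that assumption and is not necessarily the paper's exact update; the symbols $G$, $U$, $\Sigma$, $V$, $W_t$, and $\eta$ are introduced here for illustration only.

```latex
% Assumption: G is the m x n gradient of a weight matrix W, with full row rank,
% and G = U \Sigma V^T is its thin singular value decomposition.
G = U \Sigma V^{\top}
\quad\Longrightarrow\quad
\tilde{G} = U V^{\top} = (G G^{\top})^{-1/2}\, G,
\qquad
W_{t+1} = W_t - \eta\, \tilde{G}.
```

Every singular value of $\tilde{G}$ equals one, so no principal direction is updated faster than another. The factor $(G G^{\top})^{-1/2}$ acts as a preconditioner on the raw gradient, which is one plausible reading of the abstract's remark that EGD resembles a modified natural gradient method.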

2510.02590 2026-05-19 cs.LG (updated version)

Use the Online Network If You Can: Towards Fast and Stable Reinforcement Learning

Ahmed Hendawy, Henrik Metternich, Théo Vincent, Mahdi Kallel, Jan Peters, Carlo D'Eramo

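Affiliations: Technical University of Darmstadt; German Research Center for AI (DFKI); Robotics Institute Germany (RIG); University of Würzburg

AI summary: This paper proposes a new update rule that computes the bootstrapped target as the minimum estimate between the target network and the online network, yielding faster and more stable value-function learning in reinforcement learning.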

Comments Accepted at the Fourteenth International Conference on Learning Representations (ICLR 2026)

English abstract

The use of target networks is a popular approach for estimating value functions in deep Reinforcement Learning (RL). While effective, the target network remains a compromise solution that preserves stability at the cost of slowly moving targets, thus delaying learning. Conversely, using the online network as a bootstrapped target is intuitively appealing, albeit well-known to lead to unstable learning. In this work, we aim to obtain the best out of both worlds by introducing a novel update rule that computes the target using the MINimum estimate between the Target and Online network, giving rise to our method, MINTO. Through this simple, yet effective modification, we show that MINTO enables faster and stable value function learning, by mitigating the potential overestimation bias of using the online network for bootstrapping. Notably, MINTO can be seamlessly integrated into a wide range of value-based and actor-critic algorithms with a negligible cost. We evaluate MINTO extensively across diverse benchmarks, spanning online and offline RL, as well as discrete and continuous action spaces. Across all benchmarks, MINTO consistently improves performance, demonstrating its broad applicability and effectiveness.
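
As a rough illustration of the update rule described above, the sketch below computes a Q-learning-style bootstrapped target from the minimum of the target-network and online-network estimates. The function name `minto_target`, its arguments, and the choice to take the per-action minimum before the greedy maximum are assumptions made for illustration; the abstract does not specify these details.

```python
from typing import Sequence


def minto_target(
    reward: float,
    gamma: float,
    q_target_next: Sequence[float],
    q_online_next: Sequence[float],
    done: bool,
) -> float:
    """Bootstrapped target built from the minimum of two value estimates.

    q_target_next and q_online_next are hypothetical per-action Q-values
    for the next state, from the target and online networks respectively.
    """
    if done:
        # Terminal transition: nothing to bootstrap from.
        return reward
    # Assumption: take the minimum per action, then act greedily on it.
    pessimistic = [min(t, o) for t, o in zip(q_target_next, q_online_next)]
    return reward + gamma * max(pessimistic)


if __name__ == "__main__":
    print(minto_target(1.0, 0.9, [1.0, 3.0], [2.0, 0.5], done=False))
```

Taking the minimum means an over-optimistic estimate from either network cannot inflate the target on its own, which is the intuition behind the overestimation-bias claim in the abstract.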

2509.23183 2026-05-19 cs.LG cs.NI (updated version)

ZeroSiam: An Efficient Asymmetry for Test-Time Entropy Optimization without Collapse

Guohao Chen, Shuaicheng Niu, Deyu Chen, Jiahao Yang, Zitian Zhang, Mingkui Tan, Pengcheng Wu, Zhiqi Shen

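Affiliations: Nanyang Technological University; Joint WeBank-NTU Research Institute on Fintech; South China University of Technology

AI summary: This paper proposes ZeroSiam, an efficient asymmetric architecture for test-time entropy minimization. It prevents collapse through asymmetric divergence alignment, implemented with a learnable predictor and a stop-gradient operator. Empirical and theoretical evidence shows that it prevents collapse and regularizes biased learning signals, improving performance and remaining stable on collapse-prone small models.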

English abstract

Test-time entropy minimization helps adapt a model to novel environments and incentivizes its reasoning capability, unleashing the model's potential during inference by allowing it to evolve and improve in real time using its own predictions, and achieves promising performance. However, pure entropy minimization can favor non-generalizable shortcuts, such as inflating the logit norm and driving all predictions to a dominant class to reduce entropy, risking collapsed solutions (e.g., constant one-hot outputs) that trivially minimize the objective without meaningful learning. In this paper, we reveal asymmetry as a key mechanism for collapse prevention and introduce ZeroSiam, an efficient asymmetric Siamese architecture tailored for test-time entropy minimization. ZeroSiam prevents collapse through asymmetric divergence alignment, efficiently achieved by a learnable predictor and a stop-gradient operator before the classifier. We provide empirical and theoretical evidence that ZeroSiam not only prevents collapse, but also regularizes biased learning signals, enhancing performance even when no collapse occurs. Despite its simplicity, extensive results show that ZeroSiam performs more stably than prior methods with negligible overhead, demonstrating efficacy on both vision adaptation and large language model reasoning tasks across challenging test scenarios and diverse models, including particularly collapse-prone tiny models.
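
The logit-norm shortcut mentioned in the abstract can be seen in a few lines. The sketch below only demonstrates the failure mode (entropy falls when logits are scaled up, with no change in the predicted class); it does not implement ZeroSiam, and the helper names and example logits are illustrative assumptions.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0]  # hypothetical classifier outputs
    for scale in (1.0, 5.0, 25.0):
        scaled = [scale * z for z in logits]
        # Entropy shrinks as the logit norm grows, yet the argmax is unchanged.
        print(scale, entropy(softmax(scaled)))
```

Because the objective can be lowered this way without learning anything new about the input, an unconstrained entropy minimizer has an easy route to near one-hot outputs.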

2509.23068 2026-05-19 stat.ML cs.LG (updated version)

Sparse Deep Additive Model with Interactions: Enhancing Interpretability and Predictability

Yi-Ting Hung, Li-Hsiang Lin, Vince D. Calhoun

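Affiliations: Department of Mathematics and Statistics; Georgia State University; Tri-institutional Center for Translational Research in Neuroimaging and Data Science

AI summary: This paper proposes the Sparse Deep Additive Model with Interactions (SDAMI), which combines sparse feature selection with deep subnetworks and uses a three-stage strategy to improve interpretability and predictive performance in high-dimensional regression.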

English abstract

Recent advances in deep learning highlight the need for personalized models that can learn from small samples, handle high-dimensional features, and remain interpretable. To address this, we propose the Sparse Deep Additive Model with Interactions (SDAMI), a framework that combines sparsity-driven feature selection with deep subnetworks for flexible function approximation. Central to SDAMI is the Effect Footprint principle, which posits that higher-order interactions leave detectable marginal traces on constituent variables, enabling their discovery without exhaustive search. SDAMI executes this principle through a three-stage strategy: (1) screening for footprint variables, (2) disentangling main effects from interactions via group lasso, and (3) modeling components with dedicated deep subnetworks. Theoretical analysis confirms that footprints vanish only under measure-zero symmetry conditions that are rare in practice, ensuring consistent interaction recovery. Extensive simulations demonstrate that SDAMI successfully identifies pure interactions that heredity-based baselines fundamentally miss, recovering complex effect structures with near-zero false positive rates. Together, these results position SDAMI as a principled framework for interpretable high-dimensional regression.

2509.22459 2026-05-19 stat.ML cs.LG (updated version)

The Loupe: A Plug-in Attention Module for Enhancing Discriminative Features in Vision Transformers

Naren Sengodan

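Affiliations: Jain University

AI summary: This paper proposes The Loupe, a lightweight plug-and-play spatial gating module inserted at an intermediate feature stage of a Vision Transformer. A small CNN predicts a single-channel spatial mask that reweights feature activations during end-to-end training with a cross-entropy objective and an l1 sparsity term, improving fine-grained visual classification.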

English abstract

Fine-Grained Visual Classification (FGVC) requires models to focus on subtle, task-relevant regions rather than broad object context. We present The Loupe, a lightweight plug-and-play spatial gating module for hierarchical Vision Transformers. The module is inserted at an intermediate feature stage, predicts a single-channel spatial mask with a small CNN, and uses that mask to reweight feature activations during end-to-end training with a cross-entropy objective and an l1 sparsity term. On CUB-200-2011, The Loupe improves Swin-Base from 88.36% to 91.72% and Swin-Tiny from 85.14% to 88.61%, with under 0.1% additional parameters. Ablations show that the improvement depends on the insertion point and the sparsity regularizer, suggesting that controlled spatial gating is more effective than naive multi-scale masking in this setting. Qualitative results indicate that the learned masks often align with discriminative bird parts, although the module is not a substitute for part-level supervision and can fail under occlusion or fine-grained intra-part differences.
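
The gating step described above (one spatial mask reweighting every channel, plus an l1 penalty on the mask) can be sketched on plain nested lists. This is a minimal illustration, not the authors' implementation: the mask would come from a small CNN in the paper, whereas here it is passed in directly, and the names `apply_spatial_gate`, `l1_sparsity`, `total_loss`, and the weight `lam` are hypothetical.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]
Mask = List[List[float]]              # indexed as [row][col]


def apply_spatial_gate(features: FeatureMap, mask: Mask) -> FeatureMap:
    """Multiply every channel by the same single-channel spatial mask."""
    return [
        [[v * m for v, m in zip(f_row, m_row)] for f_row, m_row in zip(channel, mask)]
        for channel in features
    ]


def l1_sparsity(mask: Mask) -> float:
    """Mean absolute mask value; penalizing it pushes the mask toward sparsity."""
    values = [abs(m) for row in mask for m in row]
    return sum(values) / len(values)


def total_loss(cross_entropy: float, mask: Mask, lam: float) -> float:
    """Classification loss plus the weighted sparsity term (lam is assumed)."""
    return cross_entropy + lam * l1_sparsity(mask)
```

The sparsity term is what discourages the trivial all-ones mask; without it, the gate has no incentive to concentrate on a few regions.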

2508.15878 2026-05-19 cs.LO cs.AI cs.CL cs.LG (updated version)

Lean Meets Theoretical Computer Science: Scalable Synthesis of Theorem Proving Challenges in Formal-Informal Pairs

Terry Jingchen Zhang, Wenyuan Jiang, Rongchuan Liu, Yisong Wang, Junran Yang, Ning Wang, Nicole Ni, Yinya Huang, Mrinmaya Sachan

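Affiliations: D-CHAB, ETH Zurich, Zurich, Switzerland; D-INFK, ETH Zurich, Zurich, Switzerland; ETH AI Center, Zurich, Switzerland; University of Pennsylvania, PA, USA; Independent Researcher

AI summary: This paper proposes theoretical computer science as a scalable source of rigorous proof problems, using algorithmic definitions to generate large numbers of challenging theorem-proof pairs automatically. It demonstrates the approach on Busy Beaver and Mixed Boolean Arithmetic problems and reveals the limitations of automated theorem proving on hard problems.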

Comments Accepted to AI4MATH@ICML2025

English abstract

Formal theorem proving (FTP) has emerged as a critical foundation for evaluating the reasoning capabilities of large language models, enabling automated verification of mathematical proofs at scale. However, progress has been constrained by limited datasets due to the high cost of manual curation and the scarcity of challenging problems with verified formal-informal correspondences. We propose leveraging theoretical computer science (TCS) as a scalable source of rigorous proof problems, where algorithmic definitions enable automated generation of arbitrarily many challenging theorem-proof pairs. We demonstrate this approach on two TCS domains: Busy Beaver problems, which involve proving bounds on Turing machine halting behavior, and Mixed Boolean Arithmetic problems, which combine logical and arithmetic reasoning. Our framework automatically synthesizes problems with parallel formal (Lean4) and informal (Markdown) specifications, creating a scalable pipeline for generating verified proof challenges. Evaluation on frontier models reveals substantial gaps in automated theorem proving: while DeepSeekProver-V2-671B achieves 57.5% success on Busy Beaver problems, it manages only 12% on Mixed Boolean Arithmetic problems. These results highlight the difficulty of long-form proof generation even for problems that are computationally easy to verify, demonstrating the value of TCS domains for advancing automated reasoning research.

2508.14769 2026-05-19 cs.LG cs.DC (updated version)

Federated Distillation on Edge Devices: Efficient Client-Side Filtering for Non-IID Data

Ahmed Mujtaba, Gleb Radchenko, Radu Prodan, Marc Masana

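Affiliations: Embedded Systems Division, Silicon Austria Labs, Graz, Austria; Department of Computer Science, University of Innsbruck, Austria; Institute of Information Technology, University of Klagenfurt, Austria; TU-Graz SAL DES Lab, Silicon Austria Labs, Graz, Austria; Institute of Visual Computing, Graz University of Technology, Austria

AI summary: This paper proposes EdgeFD, an efficient federated distillation method for edge devices. Clients use a KMeans-based density ratio estimator to filter in-distribution and out-of-distribution proxy data, which reduces computational complexity and improves the quality of knowledge sharing under non-IID data distributions.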

Comments This paper was accepted at the International Conference on Federated Learning Technologies and Applications, 2025. The final version is available at IEEE Xplore

DOI: 10.1109/FLTA67013.2025.11336390

English abstract

Federated distillation has emerged as a promising collaborative machine learning approach, offering enhanced privacy protection and reduced communication compared to traditional federated learning by exchanging model outputs (soft logits) rather than full model parameters. However, existing methods employ complex selective knowledge-sharing strategies that require clients to identify in-distribution proxy data through computationally expensive statistical density ratio estimators. Additionally, server-side filtering of ambiguous knowledge introduces latency to the process. To address these challenges, we propose a robust, resource-efficient EdgeFD method that reduces the complexity of the client-side density ratio estimation and removes the need for server-side filtering. EdgeFD introduces an efficient KMeans-based density ratio estimator for effectively filtering both in-distribution and out-of-distribution proxy data on clients, significantly improving the quality of knowledge sharing. We evaluate EdgeFD across diverse practical scenarios, including strong non-IID, weak non-IID, and IID data distributions on clients, without requiring a pre-trained teacher model on the server for knowledge distillation. Experimental results demonstrate that EdgeFD outperforms state-of-the-art methods, consistently achieving accuracy levels close to IID scenarios even under heterogeneous and challenging conditions. The significantly reduced computational overhead of the KMeans-based estimator is suitable for deployment on resource-constrained edge devices, thereby enhancing the scalability and real-world applicability of federated distillation. The code is available online for reproducibility.

2508.08080 2026-05-19 cs.LG cs.NE stat.AP (updated version)

Symbolic Quantile Regression for the Interpretable Prediction of Conditional Quantiles

Cas Oude Hoekstra, Floris den Hengst

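Affiliations: Independent researcher; Vrije Universiteit Amsterdam

AI summary: This paper proposes Symbolic Quantile Regression (SQR), which predicts conditional quantiles and explains how predictive variables affect the outcome. An airline fuel usage case study that compares models predicting extreme and central outcomes illustrates its use in high-stakes applications.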

Journal ref: Transactions on Machine Learning Research, May 2026, https://openreview.net/pdf?id=x9OYbyPJOG

English abstract

Symbolic Regression (SR) is a well-established framework for generating interpretable or white-box predictive models. Although SR has been successfully applied to create interpretable estimates of the average of the outcome, it is currently not well understood how it can be used to estimate the relationship between variables at other points in the distribution of the target variable. Such estimates of e.g. the median or an extreme value provide a fuller picture of how predictive variables affect the outcome and are necessary in high-stakes, safety-critical application domains. This study introduces Symbolic Quantile Regression (SQR), an approach to predict conditional quantiles with SR. In an extensive evaluation, we find that SQR outperforms transparent models and performs comparably to a strong black-box baseline without compromising transparency. We also show how SQR can be used to explain differences in the target distribution by comparing models that predict extreme and central outcomes in an airline fuel usage case study. We conclude that SQR is suitable for predicting conditional quantiles and understanding interesting feature influences at varying quantiles.
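
For readers unfamiliar with conditional-quantile estimation, the standard objective is the pinball (quantile) loss, sketched below. Whether SQR uses exactly this loss as its fitness measure is an assumption here, since the abstract does not say; the function name and signature are illustrative.

```python
from typing import Sequence


def pinball_loss(y_true: Sequence[float], y_pred: Sequence[float], tau: float) -> float:
    """Mean pinball (quantile) loss at quantile level tau in (0, 1)."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction is weighted by tau, over-prediction by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)
```

With tau = 0.5 the loss is half the mean absolute error and targets the median; values of tau near 0 or 1 penalize errors asymmetrically and so target the extremes of the outcome distribution, such as unusually high fuel usage.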

2508.00901 2026-05-19 cs.LG cs.CL (updated version)

Provable Knowledge Acquisition and Extraction in One-Layer Transformers

Ruichen Xu, Kexin Chen

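AI summary: This paper studies the mechanisms of knowledge acquisition and extraction in one-layer transformers. Through theoretical analysis and experiments, it relates knowledge storage during pre-training to extraction after fine-tuning and explains how low-rank fine-tuning can recover pre-trained factual knowledge.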

English abstract

Large language models may encounter factual knowledge during pre-training yet fail to reliably use that knowledge after fine-tuning. Despite growing empirical evidence that MLP layers store factual associations and fine-tuning affects factual recall, the training-dynamics mechanisms linking next-token pre-training, knowledge storage, and post-fine-tuning extraction remain poorly understood. We study this problem in a stylized one-layer transformer with self-attention and MLP modules, trained by next-token prediction and subsequently fine-tuned on question-answering data. Under suitable regularity conditions, we first prove that the model reaches near-optimal pre-training loss while learning structured attention patterns and relation-specific feature directions, giving a mechanism for factual knowledge acquisition. We then show that fine-tuning can turn the Q&A prompt format into a trigger for pre-trained relation features, enabling the model to extract facts that are not revisited during fine-tuning. Our analysis yields a relation-covering characterization of knowledge extraction: fine-tuning need not revisit every stored subject-answer pair, but it must cover enough latent relation-template directions through which facts were encoded during pre-training. Consequently, extraction improves with pre-training multiplicity and fine-tuning coverage, but becomes harder as the relation-template universe grows. Conversely, insufficient coverage leads to a failure regime in which facts may be stored but remain inaccessible, providing a stylized mechanism for hallucination. The theory applies to both full and low-rank fine-tuning, offering insight into why low-rank adaptation can recover pre-trained factual knowledge when relation coverage is sufficient. Experiments on synthetic data and PopQA-based GPT-2/Llama models support the predicted trends.

2507.17798 2026-05-19 cs.LG (updated version)

Wasserstein GAN-Based Precipitation Downscaling with Optimal Transport for Enhancing Perceptual Realism

Kenta Shiraishi, Yuka Muto, Atsushi Okazaki, Shunji Kotsuki

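Affiliations: Graduate School of Science and Engineering, Chiba University; Center for Environmental Remote Sensing, Chiba University; Institute for Advanced Academic Research, Chiba University; Research Institute of Disaster Medicine, Chiba University

AI summary: This paper proposes precipitation downscaling with a Wasserstein GAN and an optimal transport cost to improve perceptual realism. Although the WGAN scores slightly lower on conventional evaluation metrics, it generates visually more realistic precipitation fields, and its critic helps identify unrealistic outputs and potential artifacts in the reference data.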

DOI: 10.1186/s40645-026-00815-w
Journal ref: Progress in Earth and Planetary Science, 13, 29, 2026

English abstract

High-resolution (HR) precipitation prediction is essential for reducing damage from stationary and localized heavy rainfall; however, HR precipitation forecasting using process-driven numerical weather prediction models remains challenging. This study proposes using a Wasserstein Generative Adversarial Network (WGAN) to perform precipitation downscaling with an optimal transport cost. In contrast to a conventional neural network trained with mean squared error, the WGAN generated visually realistic precipitation fields with fine-scale structures, even though it exhibited slightly lower performance on conventional evaluation metrics. The learned critic of the WGAN correlated well with human perceptual realism. Case-based analysis revealed that large discrepancies in critic scores can help identify both unrealistic WGAN outputs and potential artifacts in the reference data. These findings suggest that the WGAN framework not only improves perceptual realism in precipitation downscaling but also offers a new perspective for evaluating and quality-controlling precipitation datasets.
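
The role of the critic can be made concrete with the standard WGAN objectives, sketched below on plain lists of critic scores. This follows the generic WGAN formulation and omits the Lipschitz constraint (weight clipping or a gradient penalty) that a real critic needs; the paper's exact losses and any additional terms are not given in the abstract, so the function names and structure are assumptions.

```python
from typing import Sequence


def mean(values: Sequence[float]) -> float:
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def critic_loss(real_scores: Sequence[float], fake_scores: Sequence[float]) -> float:
    """Critic objective to minimize: E[D(fake)] - E[D(real)]."""
    return mean(fake_scores) - mean(real_scores)


def generator_loss(fake_scores: Sequence[float]) -> float:
    """Generator objective to minimize: -E[D(fake)]."""
    return -mean(fake_scores)


def score_gap(real_scores: Sequence[float], fake_scores: Sequence[float]) -> float:
    """Estimated Wasserstein-1 distance (up to the Lipschitz constant)."""
    return -critic_loss(real_scores, fake_scores)
```

A large gap between the critic's scores for a generated field and for its reference is the kind of discrepancy the abstract proposes to use for flagging unrealistic outputs or suspect reference data.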

2507.05482 2026-05-19 cs.LG stat.ML (updated version)

Stein Diffusion Guidance: Training-Free Posterior Correction for Sampling Beyond High-Density Regions

Van Khoa Nguyen, Lionel Blondé, Alexandros Kalousis

Affiliations: Department of Computer Science, University of Geneva

AI summary: This paper proposes Stein Diffusion Guidance, a training-free posterior correction method for sampling beyond high-density regions. It combines stochastic optimal control with Stein variational inference and, through a new theoretical bound and a running cost functional, enables effective guidance in low-density regions.

Comments: Revised version accepted to the ICML 2026 main track; prior version accepted to two ICLR 2026 workshops: ReALM-GEN and DeLTa

Details

Abstract

Training-free diffusion guidance offers a flexible framework for leveraging off-the-shelf classifiers without additional training. Yet, current approaches hinge on posterior approximations via Tweedie's formula, which often yield unreliable guidance, particularly in low-density regions. Stochastic optimal control (SOC), in contrast, enables principled posterior sampling but remains computationally prohibitive for efficient inference. In this work, we reconcile the strengths of these paradigms by introducing Stein Diffusion Guidance (SDG), a novel training-free framework grounded in a surrogate SOC objective. We establish a new theoretical bound on the SOC value function, revealing the necessity of correcting approximate posteriors to reflect true diffusion dynamics. Building on Stein variational inference, SDG computes the steepest descent direction that minimizes the Kullback-Leibler divergence between approximate and true posteriors. By integrating a principled Stein correction mechanism along with a novel running cost functional, SDG enables effective guidance in low-density regions. Our experiments on diverse image-guidance tasks and on challenging small-ligand sampling for protein docking suggest that SDG consistently outperforms standard training-free guidance methods and highlight its potential for broader posterior sampling problems beyond high-density regimes.
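
The Stein variational step that the abstract builds on can be sketched in one dimension. The code below is plain Stein variational gradient descent with an RBF kernel on a toy Gaussian target, not the SDG method itself; the target, bandwidth, step size, and function names are all assumptions chosen for illustration.

```python
import math


def svgd_step(particles, grad_log_p, step_size=0.1, bandwidth=1.0):
    """One Stein variational gradient descent update for 1-D particles."""
    n = len(particles)
    updated = []
    for x_i in particles:
        direction = 0.0
        for x_j in particles:
            k = math.exp(-((x_j - x_i) ** 2) / (2.0 * bandwidth ** 2))
            # Attraction: kernel-weighted score pulls x_i toward high density.
            attraction = k * grad_log_p(x_j)
            # Repulsion: kernel gradient w.r.t. x_j keeps particles apart.
            repulsion = -k * (x_j - x_i) / bandwidth ** 2
            direction += attraction + repulsion
        updated.append(x_i + step_size * direction / n)
    return updated


def run_svgd(target_mean=3.0, steps=500):
    """Move particles toward a toy N(target_mean, 1) target."""
    def score(x):
        return -(x - target_mean)  # gradient of log N(x; target_mean, 1)

    particles = [-2.0, -1.0, 0.0, 1.0, 2.0]
    for _ in range(steps):
        particles = svgd_step(particles, score)
    return particles


final = run_svgd()
print(sum(final) / len(final))
```

The update direction is the kernelized steepest descent direction for the KL divergence between the particle distribution and the target, which is the quantity the abstract refers to.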

2507.01533 2026-05-19 math.NA cs.LG cs.NA math.PR (updated version)

Consistency of Learned Sparse Grid Quadrature Rules using NeuralODEs

Hanno Gottschalk, Emil Partow, Tobias J. Riedlinger

Affiliations: Technische Universität Berlin; Ludwig-Maximilians-Universität München; Munich Center for Machine Learning

AI summary: This paper studies the consistency of sparse-grid quadrature rules learned with neural ODEs. By analyzing the composition of a transport map with Clenshaw-Curtis sparse-grid quadrature, it derives quadrature rates for general and product targets and establishes PAC consistency in both regimes.

Comments: 39 pages, 8 figures

Details

Abstract

We prove consistency of a recently proposed scheme that evaluates expected values by composing a learned transport map with Clenshaw--Curtis sparse-grid quadrature on a tractable product source. Our analysis hinges on the structural fact that the composition of a $C^k_{\mathrm{mix}}$-regular function, which carries the fast quadrature rate $m^{-k}(\log m)^{(d-1)(k+1)}$, with a $C^1$-diffeomorphism can only be guaranteed to be $C^k_{\mathrm{mix}}$ itself if the diffeomorphism is diagonal up to a permutation of coordinates. The fast rate is therefore available exclusively for product targets, and the analysis splits into two regimes. In the general regime of arbitrary targets, we learn the transport as the time-one flow of a $\mathrm{ReLU}^{k+1}$-neural ODE trained by maximum likelihood. The resulting flow lies in the isotropic space $C^k$ and yields the rate $m^{-k/d}(\log m)^{(d-1)(k/d+1)}$; raising the density smoothness $k$ and the matched activation order $k+1$ mitigates the curse of dimensionality at the cost of harder optimization. In the diagonal regime of product targets, the Knothe--Rosenblatt map is itself diagonal and we estimate it pointwise via empirical quantile transport, a lightweight alternative that recovers the full mixed-regularity rate. In both regimes, the resulting LtI estimator is PAC (probably approximately correct) consistent: with high probability, the numerical integral approximates the true value to arbitrary accuracy as both the sample size $n$ and the quadrature budget $m$ tend to infinity.

2506.16042 2026-05-19 cs.AI cs.LG cs.OS (updated version)

OSWorld-Human: Benchmarking the Efficiency of Computer-Use Agents

Reyna Abhyankar, Qi Qi, Yiying Zhang

Affiliations: OpenAI; Anthropic; Google DeepMind; ByteDance; Agent S2; GTA1; Lei; Jedi

AI summary: This paper studies the temporal performance of computer-use agents on the OSWorld benchmark and finds that large model calls account for most of the latency. It also builds OSWorld Human, a version of the dataset annotated with human-determined trajectories, and shows that even the best agents take more steps than necessary.

Details

Abstract

Generative AI is being leveraged to solve a variety of computer-use tasks involving desktop applications. State-of-the-art systems have focused solely on improving accuracy on leading benchmarks. However, these systems are practically unusable due to extremely high end-to-end latency (e.g., tens of minutes) for tasks that typically take humans just a few minutes to complete. To understand the cause behind this and to guide future developments of computer agents, we conduct the first study on the temporal performance of computer-use agents on OSWorld, the flagship benchmark in computer-use AI. We find that large model calls for planning, reflection, and judging account for most of the overall latency, and as an agent uses more steps to complete a task, each successive step can take 3x longer than steps at the beginning of a task. We then construct OSWorld Human, a manually annotated version of the original OSWorld dataset that contains a human-determined trajectory for each task. We evaluate 16 agents on their efficiency using OSWorld Human and find that even the best agents take 2.7-4.3x more steps than necessary.

2506.15588 2026-05-19 cs.LG (updated version)

Memory-Efficient Differentially Private Training with Gradient Random Projection

Alex Mulrooney, Devansh Gupta, James Flemings, Huanyu Zhang, Murali Annavaram, Meisam Razaviyayn, Xinwei Zhang

Affiliations: University of Delaware; University of Southern California; Meta; Amazon

AI summary: This paper proposes DP-GRAPE, which replaces SVD-based subspaces with random Gaussian matrices to reduce memory usage while keeping utility on par with first-order DP methods. It removes the need for costly SVD computations and substantially improves memory efficiency.

Details

Abstract

Differential privacy (DP) protects sensitive data during neural network training, but standard methods like DP-Adam suffer from high memory overhead due to per-sample gradient clipping, limiting scalability. We introduce DP-GRAPE (Gradient RAndom ProjEction), a DP training method that significantly reduces memory usage while maintaining utility on par with first-order DP approaches. DP-GRAPE is motivated by our finding that privatization flattens the gradient singular value spectrum, making SVD-based projections (as in GaLore (Zhao et al., 2024)) unnecessary. Consequently, DP-GRAPE employs three key components: (1) random Gaussian matrices replace SVD-based subspaces, (2) gradients are privatized after projection, and (3) projection is applied during backpropagation. These contributions eliminate the need for costly SVD computations, enable substantial memory savings, and lead to improved utility. Despite operating in lower-dimensional subspaces, our theoretical analysis shows that DP-GRAPE achieves a privacy-utility tradeoff comparable to DP-SGD. Our extensive empirical experiments show that DP-GRAPE can significantly reduce the memory footprint of DP training without sacrificing accuracy or training time. In particular, DP-GRAPE reduces memory usage by over 63% when pre-training Vision Transformers and over 70% when fine-tuning RoBERTa-Large as compared to DP-Adam, while achieving similar performance. We further demonstrate that DP-GRAPE scales to fine-tuning large models such as OPT with up to 6.7 billion parameters, a scale at which DP-Adam fails due to memory constraints. Our code is available at https://github.com/alexmul1114/DP_GRAPE.
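
The first two components listed in the abstract (a random Gaussian projection, then privatization in the projected space) can be sketched in a few lines of pure Python. This is an illustrative toy, not the authors' implementation: it materializes full per-sample gradients before projecting, whereas the paper applies the projection during backpropagation, and every function name and parameter below is an assumption.

```python
import math
import random


def gaussian_projection(dim, rank, rng):
    """dim x rank matrix with N(0, 1/rank) entries (no SVD required)."""
    scale = 1.0 / math.sqrt(rank)
    return [[rng.gauss(0.0, scale) for _ in range(rank)] for _ in range(dim)]


def project(grad, proj):
    """Map a dim-vector to rank coordinates: proj^T @ grad."""
    rank = len(proj[0])
    return [sum(proj[i][r] * grad[i] for i in range(len(grad))) for r in range(rank)]


def clip(vec, max_norm):
    """Scale vec down so that its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    factor = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * factor for v in vec]


def private_projected_gradient(per_sample_grads, rank, clip_norm, noise_multiplier, rng):
    dim = len(per_sample_grads[0])
    proj = gaussian_projection(dim, rank, rng)
    # Clip each sample's gradient in the low-dimensional space, then sum.
    total = [0.0] * rank
    for grad in per_sample_grads:
        for r, v in enumerate(clip(project(grad, proj), clip_norm)):
            total[r] += v
    # Gaussian mechanism: the noise scale is tied to the clipping norm.
    sigma = noise_multiplier * clip_norm
    noisy = [(t + rng.gauss(0.0, sigma)) / len(per_sample_grads) for t in total]
    # Lift the privatized average back to parameter space: proj @ noisy.
    return [sum(proj[i][r] * noisy[r] for r in range(rank)) for i in range(dim)]


rng = random.Random(0)
grads = [[rng.gauss(0.0, 1.0) for _ in range(32)] for _ in range(8)]
update = private_projected_gradient(grads, rank=4, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
print(len(update))
```

Clipping and noising operate on `rank` coordinates per sample instead of `dim`, which is where the memory saving would come from once the projection is fused into the backward pass.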

2506.08244 2026-05-19 cs.LG cs.AI stat.ML (updated version)

Algebraic Priors for Approximately Equivariant Networks

Riccardo Ali, Pietro Liò, Jamie Vicary

Affiliations: University of Cambridge

AI summary: This paper proposes a parameter-free algebraic approach that uses group representation theory to build priors for approximately equivariant networks. Experiments show that it matches or outperforms specialized models in several cases, including some designed for infinite groups.

Details

Abstract

Equivariant neural networks incorporate symmetries through group actions, embedding them as an inductive bias to improve performance. Existing methods learn an equivariant action on the latent space, or design architectures that are equivariant by construction. These approaches often deliver strong empirical results but can involve architecture-specific constraints, large parameter counts, and high computational cost. We challenge the paradigm of complex equivariant architectures with a parameter-free approach grounded in group representation theory. We prove that for an equivariant encoder over a finite group, the latent space must almost surely contain one copy of its regular representation for each linearly independent data orbit, which we explore with a number of empirical studies. Leveraging this foundational algebraic insight, we impose the group's regular representation as an inductive bias via an auxiliary loss, adding no learnable parameters. Our extensive evaluation shows that this method matches or outperforms specialized models in several cases, even those for infinite groups. We further validate our choice of the regular representation through an ablation study, showing it consistently outperforms defining and trivial group representation baselines.

2506.04170 2026-05-19 quant-ph cond-mat.stat-mech cs.LG hep-lat hep-th (updated version)

Estimation of the reduced density matrix and entanglement entropies using autoregressive networks

Piotr Białas, Piotr Korcyl, Tomasz Stebel, Dawid Zapolski

Affiliations: Institute of Applied Computer Science; Institute of Theoretical Physics; Doctoral School of Exact and Natural Sciences

AI summary: This paper applies autoregressive neural networks to Monte Carlo simulations of quantum spin chains through the correspondence with classical two-dimensional spin systems. It estimates reduced density matrix elements directly and computes the continuum limit of the ground-state von Neumann and Rényi bipartite entanglement entropies for intervals of up to 5 spins in the Ising chain.

Comments: 9 pages, 7 figures

Details

DOI: 10.5506/APhysPolB.56.12-A4
Journal ref: Acta Physica Polonica B, Vol. 56 (2025), No. 12

Abstract

We present an application of autoregressive neural networks to Monte Carlo simulations of quantum spin chains using the correspondence with classical two-dimensional spin systems. We use a hierarchy of neural networks capable of estimating conditional probabilities of consecutive spins to evaluate elements of reduced density matrices directly. Using the Ising chain as an example, we calculate the continuum limit of the ground state's von Neumann and Rényi bipartite entanglement entropies of an interval built of up to 5 spins. We demonstrate that our architecture is able to estimate all the needed matrix elements with just a single training for a fixed time discretization and lattice volume. Our method can be applied to other types of spin chains, possibly with defects, as well as to estimating entanglement entropies of thermal states at non-zero temperature.

2505.24438 2026-05-19 cs.LG (updated version)

RAP: Runtime-Adaptive Pruning for Large Language Model Inference

Huanrong Liu, Chunlin Tian, Xuyang Wei, Qingbiao Li, Li Li

Affiliations: Faculty of Science and Technology, University of Macau, Macau, China; School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, China

AI summary: This paper proposes RAP, an elastic pruning framework driven by reinforcement learning that dynamically adjusts compression strategies to adapt to runtime memory variations and heterogeneous KV-cache demands. It is the first to jointly consider model weights and the KV cache during inference.

Details

Abstract

Large language models (LLMs) excel at language understanding and generation, but their enormous computational and memory requirements hinder deployment. Compression offers a potential solution to mitigate these constraints. However, most existing methods rely on fixed heuristics and thus fail to adapt to runtime memory variations or heterogeneous KV-cache demands arising from diverse user requests. To address these limitations, we propose RAP, an elastic pruning framework driven by reinforcement learning (RL) that dynamically adjusts compression strategies in a runtime-aware manner. Specifically, RAP dynamically tracks the evolving ratio between model parameters and KV-cache across practical execution. Recognizing that FFNs house most parameters, whereas parameter-light attention layers dominate KV-cache formation, the RL agent retains only those components that maximize utility within the current memory budget, conditioned on instantaneous workload and device state. Extensive experimental results demonstrate that RAP outperforms state-of-the-art baselines and is the first approach to jointly consider model weights and KV-cache on the fly.

2504.03035 2026-05-19 stat.ML cs.LG math.PR math.ST stat.ME stat.TH (updated version)

High-dimensional ridge regression with random features for non-identically distributed data with a variance profile

Issa-Mbenard Dabo, Jérémie Bigot

Affiliations: New York University Abu Dhabi; Institut de mathématiques de Bordeaux

AI summary: This paper studies high-dimensional ridge regression with random features for non-identically distributed data. Using a variance-profile model, it derives asymptotic equivalents for the training and test risks and shows how heterogeneous variance profiles affect generalization.

Details

Abstract

Random feature ridge regression is often analyzed in the high-dimensional regime under the homogeneous sampling model $x_i=Σ^{1/2}x_i'$, where the vectors $x_i'$ have i.i.d. entries and the same covariance matrix $Σ$ is shared by all samples. In this paper, we move beyond this setting and study non-identically distributed data through a variance-profile model in which the training and test covariates have row-dependent diagonal covariance matrices $Σ_i=\operatorname{diag}(γ_{i1}^2,\ldots,γ_{ip}^2)$ and $\widetilde{Σ}_i=\operatorname{diag}(\tilde{γ}_{i1}^2,\ldots,\tilde{γ}_{ip}^2)$. Our main contribution is the derivation of asymptotic equivalents for the training and test risks of ridge regression with random features when $n$, $p$, and $m$ grow proportionally. The first set of equivalents is obtained by combining the linear-plus-chaos approximation with traffic-probability arguments, whereas the second set is deterministic and follows from operator-valued free probability through an amalgamation-over-the-diagonal argument. These equivalents are sharp in numerical experiments. They also reveal how heterogeneous variance profiles, including mixture-type profiles inspired by MNIST, can modify generalization and exhibit double-descent behavior when the ridge parameter is small.

2502.20969 2026-05-19 cs.DC cs.LG (updated version)

TeleRAG: Efficient Retrieval-Augmented Generation Inference with Lookahead Retrieval

Chien-Yu Lin, Keisuke Kamahori, Yiyu Liu, Xiaoxiang Shi, Madhav Kashyap, Yile Gu, Rulin Shao, Zihao Ye, Kan Zhu, Rohan Kadekodi, Stephanie Wang, Arvind Krishnamurthy, Luis Ceze, Baris Kasikci

Affiliations: Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA; Harvard John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA; Shanghai Jiao Tong University, Shanghai, China

AI summary: This paper proposes TeleRAG, an efficient retrieval-augmented generation inference system that uses lookahead retrieval to reduce latency and improve throughput, achieving higher performance and scalability with limited GPU memory.

Details

Abstract

Retrieval-augmented generation (RAG) extends large language models (LLMs) with external data sources to enhance factual correctness and domain coverage. Modern RAG pipelines rely on large datastores, creating a significant system challenge: achieving high throughput and low latency is difficult, especially when GPU memory is limited. To address these challenges, we propose TeleRAG, an efficient inference system that reduces latency and improves throughput with minimal GPU memory requirements. The core innovation of TeleRAG is lookahead retrieval, a prefetching mechanism that predicts required data and transfers them from CPU to GPU in parallel with LLM generation. In addition, TeleRAG adopts a prefetching scheduler and a cache-aware scheduler to support efficient multi-GPU inference with minimal overhead. Evaluations show TeleRAG achieves up to a 1.53x average end-to-end latency reduction (single-query) and 1.83x higher average throughput (batched), as well as good scalability in throughput. This confirms the practical utility of TeleRAG for faster and more memory-efficient deployments of RAG applications.

2502.02463 2026-05-19 stat.ML cs.LG (updated version)

Distribution Transformers: Fast Approximate Bayesian Inference With On-The-Fly Prior Adaptation

George Whittle, Juliusz Ziomek, Jacob Rawling, Maike A. Osborne

Affiliations: Mind Foundry Ltd

AI summary: This paper proposes Distribution Transformers, a novel architecture that can learn arbitrary distribution-to-distribution mappings. With on-the-fly prior adaptation, it performs fast approximate Bayesian inference, substantially reducing computation time while achieving log-likelihood performance on par with or better than existing methods.

Comments: Spotlight acceptance at ICML 2026

Details

Abstract

While Bayesian inference provides a principled framework for reasoning under uncertainty, its widespread adoption is limited by the intractability of exact posterior computation, necessitating the use of approximate inference. However, existing methods are often computationally expensive, or demand costly retraining when priors change, limiting their utility, particularly in sequential inference problems such as real-time sensor fusion. To address these challenges, we introduce the Distribution Transformer -- a novel architecture that can learn arbitrary distribution-to-distribution mappings. Our method can be trained to map a prior to the corresponding posterior, conditioned on some dataset -- thus performing approximate Bayesian inference. Our novel architecture represents a prior distribution as a (universally-approximating) Gaussian Mixture Model (GMM), and transforms it into a GMM representation of the posterior. The components of the GMM attend to each other via self-attention, and to the datapoints via cross-attention. We demonstrate that Distribution Transformers both maintain flexibility to vary the prior and significantly reduce computation times, from minutes to milliseconds, while achieving log-likelihood performance on par with or superior to existing approximate inference methods across tasks such as sequential inference, quantum system parameter inference, and Gaussian Process predictive posterior inference with hyperpriors.

2309.01243 2026-05-19 cs.CR cs.LG (updated version)

The Normal Distributions Indistinguishability Spectrum and its Application to Privacy-Preserving Machine Learning

Yu Wei, Yun Lu, Malik Magdon-Ismail, Vassilis Zikas

Affiliations: Georgia Institute of Technology; University of Victoria; Rensselaer Polytechnic Institute

AI summary: This paper studies the privacy of any algorithm whose outputs have a Gaussian distribution. It presents a general lemma, the Normal Distributions Indistinguishability Spectrum (NDIS), for computing the hockey-stick divergence between any two multivariate Gaussians, and applies it to privacy proofs for algorithms such as random projection, yielding more noise-frugal privacy mechanisms.

Details

Abstract

We investigate the privacy of any algorithm whose outputs have Gaussian distribution. This work is motivated by the prevalence of such algorithms in several useful (ML) applications, and the comparatively little research that focuses on privacy-preserving learning outside of adding Gaussian noise to the data (such as DP-SGD). What is the DP of any algorithm with multivariate Gaussian output? We answer the above research question with a general lemma which we call Normal Distributions Indistinguishability Spectrum (NDIS), a closed-form analytic computation of the hockey-stick divergence $δ$ between an arbitrary pair of multivariate Gaussians, parameterized by privacy parameter $ε$. To show its practical implications, we prove several properties of our NDIS lemma. These properties form a toolbox of results which lead to potentially easier privacy proofs for any Gaussian-output algorithm. As an example application of our toolbox, we prove a tighter parametrisation of the privacy of random projection (RP), and obtain from it a more noise-frugal DP mechanism. Beyond random projection, NDIS can be used to lift any Gaussian-output algorithm with a "sensitivity" (which we define) to a Gaussian-output DP mechanism. The mechanism boosts the existing randomness in the algorithm, so that one can describe the mechanism's privacy as the IS between a single pair of Gaussians, which can then be analyzed via NDIS. Lastly, we leverage the connections between NDIS and the CDF of the generalized $χ^2$ distribution (which has efficient empirical estimators) to present a tool for white-box auditing of Gaussian-output algorithms.
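
For orientation, the hockey-stick divergence has a well-known closed form in the simplest case of two univariate Gaussians with equal variance and shifted means (the classical Gaussian mechanism). The sketch below evaluates that special case only; it is not the NDIS lemma, which covers arbitrary pairs of multivariate Gaussians, and the function names are assumptions.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def hockey_stick_shifted_gaussians(mean_gap, sigma, eps):
    """delta(eps) between N(0, sigma^2) and N(mean_gap, sigma^2), mean_gap > 0.

    Equal-variance special case only:
    delta = Phi(gap/(2 sigma) - eps sigma/gap) - e^eps Phi(-gap/(2 sigma) - eps sigma/gap)
    """
    a = mean_gap / (2.0 * sigma)
    b = eps * sigma / mean_gap
    return normal_cdf(a - b) - math.exp(eps) * normal_cdf(-a - b)


for eps in (0.0, 0.5, 1.0, 2.0):
    print(eps, hockey_stick_shifted_gaussians(mean_gap=1.0, sigma=1.0, eps=eps))
```

At $ε=0$ the value reduces to the total variation distance between the two Gaussians, and it decreases as $ε$ grows, which is the usual trade-off between the two privacy parameters.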

2307.08643 2026-05-19 cs.LG stat.ML (updated version)

Corruptions of Supervised Learning Problems: Typology and Mitigations

Laura Iacovissi, Nan Lu, Robert C. Williamson

Affiliations: Tübingen AI Center, University of Tübingen

AI summary: This paper develops a general theory of corruption that analyzes changes to the underlying probability distributions via Markov kernels, unifies different corruption models, and examines mitigations for various corruption types.

Comments: 73 pages. To be published in Journal of Machine Learning Research 27 (2026) 1-73

Details

Abstract

Corruption is notoriously widespread in data collection. Despite extensive research, the existing literature predominantly focuses on specific settings and learning scenarios, lacking a unified view of corruption modelization and mitigation. In this work, we develop a general theory of corruption, which incorporates all modifications to a supervised learning problem, including changes in model class and loss. Focusing on changes to the underlying probability distributions via Markov kernels, our approach leads to three novel opportunities. First, it enables the construction of a novel, provably exhaustive corruption framework, distinguishing among different corruption types. This serves to unify existing models and establish a consistent nomenclature. Second, it facilitates a systematic analysis of corruption's consequences on learning tasks, by comparing Bayes risks in the clean and corrupted scenarios. Notably, while label corruptions affect only the loss function, attribute corruptions additionally influence the hypothesis class. Third, building upon these results, we investigate mitigations for various corruption types. We expand existing loss-correction methods for label corruption to handle dependent corruption types. Our findings highlight the necessity to generalize this classical corruption-corrected learning framework to a new paradigm with weaker requirements to encompass more corruption types. We provide such a paradigm as well as loss correction formulas in the attribute and joint corruption cases.

2305.18578 2026-05-19 stat.ME cs.LG stat.ML (updated version)

Quick Adaptive Ternary Segmentation: An Efficient Decoding Procedure For Hidden Markov Models

Alexandre Mösching, Housen Li, Axel Munk

Affiliations: Nonclinical Biostatistics, F. Hoffmann-La Roche, Switzerland; Institute for Mathematical Stochastics, Cluster of Excellence “Multiscale Bioimaging: from Molecular Machines to Networks of Excitable Cells”; Georg-August-Universität Göttingen, Germany

AI summary: This paper proposes Quick Adaptive Ternary Segmentation (QATS), a divide-and-conquer decoder whose complexity is polylogarithmic in the sequence length and cubic in the number of states, making it suited to large-scale hidden Markov models with relatively few states. It approximately maximizes local likelihood scores with an adaptive search and decodes faster than Viterbi and PMAP in simulations.

Details

DOI: 10.1080/10618600.2025.2572328
Journal ref: Journal of Computational and Graphical Statistics, 35(2), 865-879, 2026

Abstract

Hidden Markov models (HMMs) are characterized by an unobservable Markov chain and an observable process -- a noisy version of the hidden chain. Decoding the original signal from the noisy observations is one of the main goals in nearly all HMM based data analyses. Existing decoding algorithms such as Viterbi and the pointwise maximum a posteriori (PMAP) algorithm have computational complexity at best linear in the length of the observed sequence, and sub-quadratic in the size of the state space of the hidden chain. We present Quick Adaptive Ternary Segmentation (QATS), a divide-and-conquer procedure with computational complexity polylogarithmic in the length of the sequence, and cubic in the size of the state space, hence particularly suited for large scale HMMs with relatively few states. It also suggests an effective way of data storage as specific cumulative sums. In essence, the estimated sequence of states sequentially maximizes local likelihood scores among all local paths with at most three segments, and is meanwhile admissible. The maximization is performed only approximately using an adaptive search procedure. Our simulations demonstrate the speedups offered by QATS in comparison to Viterbi and PMAP, along with a precision analysis. An implementation of QATS is in the R-package QATS on GitHub.
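
For context, the Viterbi baseline that QATS is compared against can be sketched as follows. This is the textbook dynamic program in log space, not the QATS procedure, and the two-state toy model is an assumption made for illustration. Its single pass over the sequence is the linear-in-length cost the abstract refers to.

```python
import math


def viterbi(log_init, log_trans, log_emit):
    """Most likely state path of a hidden Markov model.

    log_init[s]      log-probability of starting in state s
    log_trans[r][s]  log-probability of moving from state r to state s
    log_emit[t][s]   log-likelihood of observation t under state s
    """
    n_states = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    backpointers = []
    for t in range(1, len(log_emit)):
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states), key=lambda r: score[r] + log_trans[r][s])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][s] + log_emit[t][s])
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Toy two-state chain with sticky transitions (hypothetical numbers).
log_init = [math.log(0.5), math.log(0.5)]
log_trans = [[math.log(0.9), math.log(0.1)], [math.log(0.1), math.log(0.9)]]
# Each row: log-likelihood of one observation under state 0 and state 1.
log_emit = [[math.log(0.8), math.log(0.2)]] * 3 + [[math.log(0.2), math.log(0.8)]] * 3

print(viterbi(log_init, log_trans, log_emit))
```

The decoded path here is piecewise constant with a single change point, which is the kind of segment structure a segmentation-based decoder can exploit when the number of states is small.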

2605.18147 2026-05-19 cs.LG (updated version)

Foundation Models for Credit Risk Prediction: A Game Changer?

Bart Baesens, Andreas Goethals, Stefan Lessmann, Simon De Vos, Cristián Bravo, David Martens, Victor Medina-Olivares, Christophe Mues, Maria Oskarsdóttir, Seppe vanden Broucke, Tim Verdonck, Wouter Verbeke

Affiliations: Faculty of Economics and Business, KU Leuven, Belgium; School of Business and Economics, Humboldt University of Berlin, Germany; Department of Statistical and Actuarial Sciences, Western University, Canada; Department of Engineering Management, University of Antwerp, Belgium; Business School, University of Edinburgh, United Kingdom; Business School, University of Southampton, United Kingdom; School of Mathematical Sciences, University of Southampton, United Kingdom; Department of Business Informatics and Operations Management, Ghent University, Belgium; Department of Mathematics, University of Antwerp, Belgium; Department of Mathematics, KU Leuven, Belgium

AI summary: This paper studies foundation models for credit risk prediction, examining whether they improve predictive performance in small-data settings. It benchmarks them against a broad set of methods on PD and LGD modeling tasks and finds that tabular foundation models generally perform best.

Details

Abstract

Predictive models play a pivotal role in credit risk management, guiding critical decisions through accurate estimation of default probabilities and losses. Extensive research has introduced new modeling techniques, complemented by large-scale benchmarking studies consolidating the state-of-the-art. Today, quasi-standards such as gradient-boosting models paired with SHAP explainers have emerged, yet continuous improvement of risk models remains a top priority. Concurrently, rapid advancements in AI, most notably large language models, have disrupted predictive modeling paradigms. Foundation models, pretrained on extensive datasets from diverse domains, have demonstrated remarkable performance by leveraging prior knowledge. While prevalent in natural language processing and computer vision, foundation models for tabular data have only recently emerged. We conjecture that pretraining on out-of-domain data is particularly beneficial in small-data settings, such as SME lending or specialized corporate portfolios, and may help address longstanding challenges including low default portfolios and class imbalance. This paper benchmarks recently proposed tabular foundation models against a broad set of competitors, including established and advanced machine learning techniques, across two core tasks: PD and LGD modeling. Our evaluation encompasses various datasets, performance indicators, and experimental conditions. We find that tabular foundation models generally perform best across datasets and tasks. Moreover, they offer significant improvement in predictive performance as dataset size shrinks. These results are remarkable given that the models are tested out-of-the-box, without hyperparameter tuning, ensuring ease of use and mitigating computational costs.

2605.18082 2026-05-19 cs.LG (updated version)

pyforce-1.0.0: Python Framework for data-driven model Order Reduction of multi-physiCs problEms

Stefano Riva, Yantao Luo, Carolina Introini, Antonio Cammi

Affiliations: Department of Energy, Nuclear Engineering Division, Politecnico di Milano; Department of Mechanical and Nuclear Engineering and Emirates Nuclear Technology Center, Khalifa University

AI summary: This paper presents pyforce-1.0.0, a framework that applies data-driven reduced-order modeling techniques to multi-physics problems, mainly in nuclear engineering. It improves sensor placement optimization and the integration of measured data, improving knowledge of the physical system.

Comments Github Repo: https://github.com/ERMETE-Lab/ROSE-pyforce

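2605.18079 2026-05-19 cs.LG cs.CC cs.CL (updated version)

The Expressive Power of Low Precision Softmax Transformers with (Summarized) Chain-of-Thought

Moritz Brösamle, Stephan Eckstein

Affiliations: Department of Mathematics, University of Tübingen, Germany

AI summary: This paper studies the expressive power of low-precision softmax transformers with chain-of-thought. It constructs hardmax transformers with ternary activations and well-separated attention scores that simulate Turing machines, converts these constructions into equivalent softmax transformers, and analyzes how efficiently the recently proposed summarized chain-of-thought paradigm simulates Turing machines.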

Comments Accepted to ICML 2026

Abstract

Existing expressivity results for transformers typically rely on hardmax attention, high precision, and other architectural modifications that disconnect them from the models used in practice. We bridge this gap by analyzing standard transformer decoders with softmax attention and rounding of activations and attention weights, while allowing depth and width to grow logarithmically with the context length. As an intermediate step, we construct hardmax transformers with ternary activations and well-separated attention scores that simulate Turing machines using Chain-of-Thought (CoT). This lets us convert the constructions to equivalent softmax transformers without the unrealistic parameter magnitudes or activation precision that prior approaches would require. Using the same technique, we analyze a recently proposed summarized CoT paradigm and show that it simulates Turing machines more efficiently, with model size scaling logarithmically in a space bound rather than a time bound. We empirically test predictions made by our results on a Sudoku reasoning task and find better alignment with learnability than for prior high-precision results. Our code is available at https://github.com/moritzbroe/transformer-expressivity.
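
The link between well-separated attention scores, rounding, and hardmax can be illustrated numerically. The sketch below is only an intuition aid under assumed helper names (`softmax`, `round_to_bits`, `hardmax`) and an assumed fixed-point rounding rule; it is not the construction from the paper or its repository.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def round_to_bits(x, bits):
    """Round x in [0, 1] to the nearest multiple of 2**-bits (assumed rounding rule)."""
    step = 2.0 ** -bits
    return round(x / step) * step


def hardmax(scores):
    """One-hot attention on the position with the largest score."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return [1.0 if i == best else 0.0 for i in range(len(scores))]


def rounded_softmax(scores, bits):
    """Softmax attention weights after low-precision rounding."""
    return [round_to_bits(w, bits) for w in softmax(scores)]
```

When the top score is far ahead of the rest, for example `[10.0, 0.0, -2.0]` at 4 bits, the rounded softmax weights coincide with the hardmax weights; with closely spaced scores such as `[1.0, 0.5, 0.0]` they do not.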

2605.18078 2026-05-19 cs.LG (updated version)

Equilibrium Selection in Multi-Agent Policy Gradients via Opponent-Aware Basin Entry

Yevhen Shcherbinin, Arina Redina, Maxim Kalpin, Vlad Kochetov

Affiliations: Bloomsbury Technology; London School of Economics and Political Science; University of Bristol; Johannes Kepler University Linz; Odesa Polytechnic National University

AI summary: This paper studies equilibrium selection for multi-agent policy-gradient methods that converge locally to stable Nash equilibria. It proposes an opponent-aware mechanism that raises the probability of entering the basin of a target equilibrium set, and experiments support its effectiveness for cooperative basins.

Abstract

Multi-agent policy-gradient methods have been shown to converge locally near stable Nash equilibria. Local convergence, however, does not determine which equilibrium is reached. We study this question through basin-entry probability with respect to a target set of equilibria selected by an external criterion, such as payoff dominance. For finite-unroll Meta-MAPG, we show that the update decomposes into ordinary policy gradient plus own-learning and peer-learning corrections, with controlled sampling noise and finite-unroll bias. We identify the peer-learning correction as the main equilibrium-selection mechanism: under a local alignment condition, the probability of entering the certified attraction region of the target stable-Nash set increases, relative to ordinary policy gradient. Because persistent correction may shift zero-update points of the original game, annealing the correction after entering the basin recovers ordinary policy-gradient dynamics and inherits local stable-Nash convergence guarantees. Experiments in Stag Hunt, iterated Prisoner's Dilemma, and preliminary neural-policy coordination environments support this basin-entry view, showing increased entry into cooperative basins under peer-aware updates.
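
The point that local convergence does not fix which equilibrium is reached can be seen in a toy Stag Hunt. The sketch below runs ordinary (uncorrected) projected gradient ascent for both players; the payoff values (4 for joint Stag, 3 for Hare regardless of the opponent, 0 for Stag alone), the function name, and the step size are assumptions for illustration, and no Meta-MAPG correction term is modeled.

```python
def stag_hunt_ascent(p, q, lr=0.1, steps=200):
    """Projected gradient ascent for both players on their probability of playing Stag.

    Assumed payoffs: (Stag, Stag) = 4, Hare = 3 regardless of opponent, Stag alone = 0.
    Player 1's expected payoff is 4*p*q + 3*(1 - p), so its gradient in p is 4*q - 3.
    """

    def clip(v):
        return min(1.0, max(0.0, v))

    for _ in range(steps):
        grad_p = 4.0 * q - 3.0  # player 1 gains from Stag only if q > 3/4
        grad_q = 4.0 * p - 3.0  # symmetric for player 2
        p, q = clip(p + lr * grad_p), clip(q + lr * grad_q)
    return p, q
```

Starting from `(0.9, 0.9)` the dynamics end at the payoff-dominant all-Stag equilibrium, while starting from `(0.5, 0.5)` they end at all-Hare. Both outcomes are locally stable, so the starting region (the basin) decides the result.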

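2605.18069 2026-05-19 stat.ML cs.LG math.PR math.ST stat.TH (updated version)

Wasserstein bounds for denoising diffusion probabilistic models via the Föllmer process

Yuta Koike

Affiliations: Graduate School of Mathematical Sciences, University of Tokyo; CREST, Japan Science and Technology Agency

AI summary: This paper studies sampling error bounds for denoising diffusion probabilistic models (DDPMs) in the 2-Wasserstein distance, with three main contributions: (i) sharp upper bounds under general Lipschitz-type conditions and a broad class of variance schedules, including the cosine schedule; (ii) a proof that the same Lipschitz-type conditions imply a logarithmic Sobolev inequality and a quadratic transportation cost inequality; (iii) a demonstration that, for general log-concave target distributions, the optimal Wasserstein error bound remains attainable even without a quadratic transportation cost inequality.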

Comments 45 pages

Abstract

This paper studies sampling error bounds for denoising diffusion probabilistic models (DDPMs) in the 2-Wasserstein distance. Our contributions are threefold. (i) Under general Lipschitz-type conditions on the score function and for a broad class of variance schedules, including the cosine schedule, we establish sharp upper bounds that are optimal in both the dimension and the number of steps, and recover several sharp error bounds previously obtained in the literature. (ii) We prove that the same Lipschitz-type conditions, which encompass those commonly imposed on the (learned) score, imply a logarithmic Sobolev inequality and hence a quadratic transportation cost inequality for the DDPM. As a consequence, in settings covered by existing work, an optimal Wasserstein bound, up to a logarithmic factor, follows from the recently obtained sharp error bound in the Kullback-Leibler divergence under geometric-type variance schedules. (iii) We show that for general log-concave target distributions, the optimal Wasserstein error bound remains attainable even without a quadratic transportation cost inequality for the target. Our analysis is based on viewing the DDPM sampler as a discretization of the Föllmer process rather than the conventional reverse Ornstein-Uhlenbeck process.

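2605.18068 2026-05-19 cs.LG cs.AI (updated version)

A note on the connection between the Föllmer process and denoising diffusion probabilistic models

Yuta Koike

Affiliations: Graduate School of Mathematical Sciences, University of Tokyo; CREST, Japan Science and Technology Agency

AI summary: This paper examines the connection between the Föllmer process and denoising diffusion probabilistic models (DDPMs), pointing out that a discretized Föllmer process can serve as a natural hyperparameter setting of the DDPM sampler and systematically recovering known results on DDPM sampling error bounds.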

Comments 32 pages

2605.18035 2026-05-19 cs.AI cs.LG (updated version)

New Insight of Variance reduce in Zero-Order Hard-Thresholding: Mitigating Gradient Error and Expansivity Contradictions

Xinzhe Yuan, William de Vazelhes, Bin Gu, Huan Xiong

Affiliations: IASM, Harbin Institute of Technology; Mohamed bin Zayed University of Artificial Intelligence; School of Artificial Intelligence, Jilin University

AI summary: This paper proposes a generalized variance-reduced zeroth-order hard-thresholding algorithm. By accounting for the role of variance, it mitigates the conflict between zeroth-order gradients and the hard-thresholding operator, removing the restriction on the number of random directions and improving convergence rates and applicability.

Comments Published as a conference paper at ICLR 2024. 9 pages main paper, 24 pages appendix, 11 figures, 7 tables. Correspondence to Bin Gu and Huan Xiong

Journal ref: International Conference on Learning Representations (ICLR), 2024

Abstract

Hard-thresholding is an important type of algorithm in machine learning that is used to solve $\ell_0$ constrained optimization problems. However, the true gradient of the objective function can be difficult to access in certain scenarios, in which case it can normally be approximated by zeroth-order (ZO) methods. The SZOHT algorithm is the only algorithm tackling $\ell_0$ sparsity constraints with ZO gradients so far. Unfortunately, SZOHT has a notable limitation on the number of random directions in ZO gradients due to the inherent conflict between the deviation of ZO gradients and the expansivity of the hard-thresholding operator. This paper approaches this problem by considering the role of variance and provides a new insight into variance reduction: mitigating the unique conflicts between ZO gradients and hard-thresholding. Under this perspective, we propose a generalized variance reduced ZO hard-thresholding algorithm as well as the generalized convergence analysis under standard assumptions. The theoretical results demonstrate that the new algorithm eliminates the restrictions on the number of random directions, leading to improved convergence rates and broader applicability compared with SZOHT. Finally, we illustrate the utility of our method on a ridge regression problem as well as black-box adversarial attacks.
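
The two ingredients the abstract combines, a hard-thresholding operator and a zeroth-order gradient estimate over random directions, can be sketched as follows. This is a generic illustration: the function names, the Gaussian direction sampling, and the default parameters are assumptions, and the code implements neither SZOHT nor the paper's variance-reduced algorithm.

```python
import random


def hard_threshold(x, k):
    """Keep the k largest-magnitude entries of x and set the rest to zero."""
    keep = set(sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(x)]


def zo_gradient(f, x, num_dirs=20, mu=1e-4, rng=random):
    """Zeroth-order gradient estimate: average finite differences along random directions."""
    fx = f(x)
    grad = [0.0] * len(x)
    for _ in range(num_dirs):
        u = [rng.gauss(0.0, 1.0) for _ in x]  # random Gaussian direction
        x_pert = [xi + mu * ui for xi, ui in zip(x, u)]
        scale = (f(x_pert) - fx) / (mu * num_dirs)
        grad = [g + scale * ui for g, ui in zip(grad, u)]
    return grad


def zo_iht_step(f, x, k, lr=0.1, **zo_kwargs):
    """One iteration x <- H_k(x - lr * g), with g a zeroth-order gradient estimate."""
    g = zo_gradient(f, x, **zo_kwargs)
    return hard_threshold([xi - lr * gi for xi, gi in zip(x, g)], k)
```

The estimate is noisy when `num_dirs` is small, and the thresholding step can amplify that error, which is the tension between gradient deviation and expansivity that the abstract describes.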

2605.18033 2026-05-19 cond-mat.mtrl-sci cs.LG physics.app-ph (updated version)

Real-time Multi-instrument Autonomous Discovery of Novel Phase-change Memory Materials

Chih-Yu Lee, Haotong Liang, Ryan Kim, Austin McDannald, Carlos A Rios Ocampo, A. Gilad Kusne, Ichiro Takeuchi

Affiliations: Department of Materials Science and Engineering, University of Maryland, College Park, MD, USA; Materials Measurement Science Division, National Institute of Standards and Technology, Gaithersburg, MD, USA; Institute of Research in Electronics and Applied Physics, University of Maryland, College Park, Maryland, USA; Maryland Quantum Materials Center, University of Maryland, College Park, MD, USA

AI summary: This paper proposes a real-time, multi-instrument autonomous discovery framework that performs structural-property mapping and functional-property optimization simultaneously in a closed loop to discover novel phase-change memory materials, achieving a sevenfold speedup.

Comments 25 pages, 5 figures

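MARR: Module-Adaptive Residual Reconstruction for Low-Bit Post-Training Quantization

Le Su, Xing Luo, Zhi Jin

Affiliations: Peng Cheng Laboratory

AI summary: This paper proposes MARR, a module-adaptive residual reconstruction method that assigns each module its own scaling coefficient to balance residual-related HA bias against accumulated-error correction, improving performance under low-bit quantization.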

Abstract

Recently, residual reconstruction-based model quantization methods have achieved promising performance in low-bit post-training quantization (PTQ) by introducing cross-layer residuals to reduce error accumulated from previous layers. However, these residuals may also introduce additional bias arising from the Hessian-approximation (HA) assumption underlying reconstruction-based PTQ, leading to suboptimal quantization performance. In this work, our analysis shows that multiplying the residual term by a scaling coefficient provides a direct way to mitigate the HA bias associated with residual strength, while preserving accumulated-error correction. More importantly, we observe that this trade-off is module-dependent, making a single global residual strength insufficient to balance effective correction and residual-related bias across modules. Based on these observations, we propose Module-Adaptive Residual Reconstruction (MARR), which assigns a module-specific scaling coefficient to adaptively balance accumulated-error correction and residual-related HA bias for each module. To avoid expensive per-module coefficient search and obtain a stable coefficient estimate, we design a Proportional-Integral-Derivative (PID)-based adaptive update strategy that uses reconstruction error as feedback to progressively refine this coefficient. Experiments on several typical large language models (LLMs) and vision transformers (ViTs) demonstrate the effectiveness of MARR under low-bit quantization (less than or equal to 4-bit), achieving up to 20.2% performance gains on LLMs and up to 4.6% relative gains on ViTs over the residual reconstruction state-of-the-art methods. Code will be made publicly available upon acceptance.

2605.17985 2026-05-19 cs.LG cs.AI (updated version)

SAFE-SVD: Sensitivity-Aware Fidelity-Enforcing SVD for Physics Foundation Models

Chengjie Hong, Feixiang He, Yiheng Zeng, Lulu Kang, He Wang

Affiliations: AI Centre, University College London; University College London; Central South University; University of Massachusetts at Amherst

AI summary: This paper proposes a new method for compressing physics foundation models that explicitly models loss-aware layer sensitivity during compression to preserve accuracy and physical fidelity. Experiments show substantial compression gains across multiple models and datasets.

Abstract

We propose a new method for compressing physics foundation models (PFMs), a new trend in AI for Science. While model compression is essential for reducing memory use and accelerating inference in large foundation models, it remains under-explored for PFMs, where preserving physical fidelity is crucial. The challenge lies in the functional nature of physics data, where partial derivatives encode spatiotemporal dynamics and exhibit high sensitivity to compression. Conventional compression methods ignore this structure, often causing severe performance degradation or failure. To address this, we introduce a sensitivity-aware fidelity-enforcing compression framework that explicitly models loss-aware layer sensitivity in the output function space during compression. This provides a new route to compressing scientific foundation models while preserving accuracy and physical fidelity. Experiments show substantial gains over existing methods across multiple models and datasets, achieving significantly higher compression ratios while maintaining accuracy, in some cases by orders of magnitude. More broadly, this work may open up a new subfield of efficient, deployable, and sustainable scientific foundation models in AI for Science.

2605.17968 2026-05-19 cs.LG (updated version)

Function graph transformers universally approximate operators between function spaces

Takashi Furuya, David Mis, Ivan Dokmanić, Maarten V. de Hoop, Matti Lassas

Affiliations: Doshisha University; RIKEN AIP; Rice University; University of Basel; Simons Chair in Computational and Applied Mathematics and Earth Science; University of Helsinki

AI summary: This paper studies the approximation of nonlinear operators between function spaces by transformers. It proposes function graph transformers, based on graph measures, whose outputs remain single-valued functions, and proves their universality for approximating broad classes of nonlinear operators.

Abstract

We study the approximation of nonlinear operators between function spaces by transformers. Our approach is to lift functions to measures supported on their graphs and leverage a recently introduced measure-theoretic view of transformers. A function $h$ is represented by its graph measure $\gamma_h$, with finite tokens $\{(x_j,h(x_j))\}_{j=1}^N$ being its empirical approximations. We show that this framework elegantly models discretization refinement via convergence of measures and provides a natural setting for operator learning. Within this framework, we introduce function graph transformers, a graph-preserving subclass of measure-theoretic transformers that maps graph measures to graph measures, which is to say that outputs remain single-valued functions. Crucially, this additional structure does not reduce generality: we prove that the resulting graph-preserving maps can be approximated by finite compositions of standard softmax self-attention layers and pointwise MLPs, yielding universal approximation results for broad classes of nonlinear operators. Unlike existing theoretical approaches to operator learning with transformers, the measure-theoretic framework also accommodates regularized negative-order Sobolev inputs for which discretization invariance is particularly challenging, as well as query points on different output domains. Overall, function graph transformers provide a continuum viewpoint and mathematical toolkit for transformer-based operator learning, clarifying the roles of positional encodings, graph structure, regularization, and ensuring consistency across discretizations.

2605.17958 2026-05-19 cs.LG cs.PL (updated version)

Enhancing the Code Reasoning Capabilities of LLMs via Consistency-based Reinforcement Learning

Zhanyue Qin, Jia Feng, Yibo Lyu, Yun Peng, Dianbo Sui, Cuiyun Gao, Qing Liao

Affiliations: Harbin Institute of Technology; The Chinese University of Hong Kong

AI summary: This paper proposes CodeThinker, a consistency-driven reinforcement learning framework that improves the code reasoning capabilities of large language models. Experiments show strong results on several benchmarks and notable accuracy gains on code generation and mathematical reasoning tasks.

Comments Under review

Abstract

Code reasoning refers to the task of predicting the output of a program given its source code and specific inputs. It can measure the reasoning capability of large language models (LLMs) and also benefit downstream tasks such as code generation and mathematical reasoning. Existing work has verified the effectiveness of reinforcement learning on the task. However, these methods design rewards solely based on final outputs or coarse-grained signals, and neglect the inherent consistency of the stepwise reasoning process in the task. Therefore, these methods often result in sparse rewards or reward hacking, which prevents reinforcement learning from realizing its full potential. To alleviate these issues, we propose CodeThinker, a consistency-driven reinforcement learning framework for code reasoning. Specifically, CodeThinker has three key components: (1) a stepwise reasoning-aware model training module, which utilizes a consistency tracing paradigm as a template to synthesize training data that captures the stepwise reasoning process; (2) a dynamic beam sampling strategy, which aims to improve the quality of sampled outputs under a fixed sampling budget; and (3) a consistency reward mechanism that can effectively alleviate reward hacking. Experiments on three popular benchmarks show that CodeThinker achieves state-of-the-art performance across multiple LLMs. For instance, it outperforms the strongest baseline by 4.3% in accuracy when deployed on Qwen2.5-Coder-7B-Instruct. We also validate the effectiveness of CodeThinker on downstream tasks. Results show that, without additional training, CodeThinker obtains average accuracy gains of 5.33 and 3.11 percentage points on mathematical reasoning and code reasoning tasks covering 17 programming languages, respectively.

2605.17954 2026-05-19 cs.CV cs.AI cs.LG (updated version)

A More Word-like Image Tokenization for MLLMs

Hyun Lee, Hyemin Jeong, Yejin Kim, Hyungwook Choi, Hyunsoo Cho, Soo Kyung Kim, Joonseok Lee

Affiliations: Seoul National University; Ewha Womans University

AI summary: This paper proposes Disentangled Visual Tokenization (DiVT), which clusters image patch embeddings into semantic units so that each token corresponds to a distinct visual concept, improving the performance and efficiency of multimodal models.

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026

Abstract

Modern multimodal large language models (MLLMs) typically keep the language model fixed and train a visual projector that maps the pixels into a sequence of tokens in its embedding space, so that images can be presented in essentially the same form as text. However, the language model has been optimized to operate on discrete, semantically meaningful tokens, while prevailing visual projectors transform an image into a long stream of continuous and highly correlated embeddings. This causes the visual tokens to behave differently from the word-like units that LLMs are originally trained to understand. We propose a novel Disentangled Visual Tokenization (DiVT) that clusters patch embeddings into coherent semantic units, so each token corresponds to a distinct visual concept instead of a rigid grid cell. DiVT further adapts its token budget to image complexity, providing an explicit accuracy-compute trade-off while modifying neither the vision encoder nor the language model. Across diverse multimodal benchmarks, DiVT matches or surpasses baselines with significantly fewer visual tokens, demonstrating robustness under limited token budgets and significantly reducing memory cost and latency while making visual inputs more compatible with LLMs. Our code is available at https://github.com/snuviplab/DiVT.

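2605.17938 2026-05-19 cs.LG cs.AI stat.ML (updated version)

Training data attribution in diffusion models via mirrored unlearning and noise-consistent skew

Joan Serrà, Dipam Goswami, Fabio Morreale, Wei-Hsiang Liao, Yuki Mitsufuji

Affiliations: Sony AI

AI summary: This paper proposes a method based on mirrored unlearning and noise-consistent skew to make training data attribution for diffusion models more reliable and robust. It substantially outperforms existing methods on several datasets, and the paper also analyzes the overlap of influential instances across generated items and the method's potential for tasks that compare diffusion losses.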

Comments 21 pages, 5 figures, 9 tables (includes appendix)

Abstract

Training data attribution (TDA) should enable generative model interpretability and foster a variety of related downstream tasks. Nonetheless, current TDA approaches lack reliability and robustness, preventing their adoption in real-world setups. In this paper, we take a decisive step towards more reliable and robust TDA for diffusion models. We propose to perform TDA with mirrored unlearning and noise-consistent skew (MUCS). The idea is to fine-tune a second model with bounded mirrored gradient ascent, and to measure the normalized skew of this model with respect to the original one using consistent noise samples. We show that, while being conceptually simple and generic, MUCS systematically outperforms existing methods on three different datasets by a large margin. We additionally study the effect that core design choices have on final performance, and analyze novel aspects regarding the overlap of influential instances across generated items and the potential of ensembling TDA approaches. We believe that our findings may have broader implications for more general unlearning setups, as well as for tasks requiring the comparison of diffusion losses.

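2605.17936 2026-05-19 cs.CL cs.LG (updated version)

Universal Adversarial Triggers

Benedict Florance Arockiaraj, Alexander Feng, Jianxiong Cai, Xiaoyu Cheng

AI summary: This paper proposes a new technique that combines part-of-speech filtering with a perplexity-based loss function to generate sensible triggers that are closer to natural phrases, making adversarial attacks harder to detect and supporting the development of robust models.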

Abstract

Recent works have illustrated that modern NLP models trained for diverse tasks ranging from sentiment analysis to language generation succumb to universal adversarial attacks, a class of input-agnostic attacks where a common trigger sequence is used to attack the model. Although these attacks are successful, the triggers generated by such attacks are ungrammatical and unnatural. Our work proposes a novel technique combining part-of-speech filtering and a perplexity-based loss function to generate sensible triggers that are closer to natural phrases. For the task of sentiment analysis on the SST dataset, the method produces sensible triggers that achieve accuracies as low as 0.04 and 0.12 for flipping positive to negative predictions and vice versa. To build robust models, we also perform adversarial training using the generated triggers, which increases the accuracy of the model from 0.12 to 0.48. We aim to illustrate that adversarial attacks can be made difficult to detect by generating sensible triggers, and to facilitate robust model development through relevant defenses.

2605.17930 2026-05-19 cs.LG (updated version)

InfoFlow: A Framework for Multi-Layer Transformer Analysis

Penghao Yu, Haotian Jiang, Zeyu Bao, Qianxiao Li

Affiliations: Department of Mathematics; National University of Singapore; Institute for Functional Intelligent Materials

AI summary: This study analyzes the approximation capabilities of multi-layer Transformers, shows that they differ fundamentally from those of single-layer Transformers, and proposes the InfoFlow framework for reasoning about the approximation efficiency of multi-layer Transformers.

Comments 36 pages

Abstract

While the approximation properties of single-layer Transformer architectures have been studied in recent works, a rigorous theoretical understanding of the multi-layer setting remains limited. In this work, we establish that multi-layer Transformers possess fundamentally different approximation capabilities from single-layer ones: for certain retrieval tasks, any single-layer Transformer requires at least $\Omega(\varepsilon^{-k})$ parameters to achieve precision $\varepsilon$, where $k$ grows linearly with sequence length $T$, whereas a two-layer Transformer with a single head per layer achieves the same approximation precision with at most $O(\varepsilon^{-1})$ parameters. To understand this separation, we identify two structural mechanisms underlying multi-layer approximation. Specifically, softmax attention can only efficiently retrieve the token attaining the maximum attention score, incurring exponential-in-length parameter cost for $k$-th largest retrieval with $k \geq 2$. Moreover, the parameter cost of decoding coupled information scales with the size of the retrieved token set. Motivated by these findings, we propose InfoFlow, a framework for multi-layer Transformers. The framework tracks an information set of accessible input positions at each token and layer, assigning an explicit approximation rate to each mode of information propagation. This abstraction recovers known approximation bounds, remains consistent with experimental observations on trained networks, and yields concrete predictions in settings where direct theoretical analysis is currently intractable. Our results provide a principled framework for reasoning about the approximation efficiency of multi-layer Transformers.

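2605.17928 2026-05-19 cs.RO cs.LG (updated version)

Transfer Learning for Customized Car Racing Environments

Benedict Florance Arockiaraj, Richard Chang, Wesley Yee

Affiliations: seas

AI summary: This paper studies transfer learning in deep reinforcement learning: an agent is trained on a single circuit and then raced in other customized car racing environments via zero-shot transfer or further fine-tuning to achieve faster lap times. It also compares the performance of model-based and model-free approaches.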

Abstract

Transfer learning, a technique in which a model or agent uses the knowledge it gained on one task to solve another closely related task, is often used in deep learning. In this project, we explore transfer learning in the context of deep reinforcement learning. Specifically, we use transfer learning to achieve fast lap times in OpenAI's Car Racing environment by training the agent on one circuit and racing it on other customized target environments, either by zero-shot transfer or with additional fine-tuning. In addition, we compare the performance of model-based and model-free approaches, and observe that model-based approaches dominate in performance and converge faster than model-free approaches in this environment. We observe that transfer learning in most setups not only boosts performance on the target domain, but also yields high performance during learning.


2605.17923 2026-05-19 cs.DC cs.AI cs.LG 版本更新

AdaptiveLoad: Towards Efficient Video Diffusion Transformer Training

AdaptiveLoad: 向高效视频扩散变换器训练迈进

Yucheng Guo, Yongjian Guo, Zhong Guan, Haoran Sun, Wen Huang, Wanting Xu, Jing Long, Shuai Di, Junwu Xiong

发表机构 * Tsinghua University（清华大学）； Peking University（北京大学）； Tianjin University（天津大学）

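AI summary: This paper proposes AdaptiveLoad, a framework that combines a dual-constraint adaptive load balancing system with a fused LayerNorm-Modulate CUDA kernel to address computational imbalance when training large-scale video diffusion Transformers (such as DiT and MMDiT) for video generation. Experiments on the Wan 2.1 world model show improved computational efficiency and training throughput.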

详情

AI中文摘要

在视频生成模型，特别是世界模型中，训练大规模视频扩散变换器（如DiT和MMDiT）由于混合模式数据集中序列长度的极端差异，带来了显著的计算挑战。现有基于桶的数据加载策略通常依赖于'等长token'约束。这种方法未能考虑自注意力机制的二次复杂性，导致严重的负载不平衡和GPU资源利用率低下。本文提出了AdaptiveLoad，一个集成优化框架，包含两个核心组件：（1）双约束自适应负载平衡系统，通过同时限制内存消耗和计算负载（B×S^p≤M_comp）消除长序列瓶颈；（2）融合LayerNorm-Modulate CUDA内核，利用D-tile共alesced减少策略提高吞吐量并缓解内存压力。实验结果表明，在Wan 2.1世界模型上，我们的方法将计算不平衡率从39%降低到18.9%，峰值VRAM利用率效率提高22.7%，并实现了整体训练吞吐量增加27.2%。

英文摘要

In video generation models, particularly world models, training large-scale video diffusion Transformers (such as DiT and MMDiT) poses significant computational challenges due to the extreme variance in sequence lengths within mixed-mode datasets. Existing bucket-based data loading strategies typically rely on "equal token length" constraints. This approach fails to account for the quadratic complexity of self-attention mechanisms, leading to severe load imbalance and underutilization of GPU resources. This paper proposes \textit{AdaptiveLoad}, an integrated optimization framework consisting of two core components: (1) A dual-constraint adaptive load balancing system, which eliminates long-sequence bottlenecks by simultaneously limiting memory consumption and computational load ($B \times S^p \le M_{\text{comp}}$); (2) A fused LayerNorm-Modulate CUDA kernel, which utilizes a D-tile coalesced reduction strategy to increase throughput and alleviate memory pressure. Experimental results on the Wan 2.1 world model demonstrate that our method reduces the computational imbalance rate from 39\% to 18.9\%, improves peak VRAM utilization efficiency by 22.7\%, and achieves an overall training throughput increase of 27.2\%.
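
The dual constraint can be read as a per-bucket cap on batch size. The following is a minimal sketch, not the authors' implementation: it assumes the memory limit takes the token-count form $B \times S \le M_{\text{mem}}$ (the abstract only states the compute constraint $B \times S^p \le M_{\text{comp}}$), and the function name, budgets, and $p = 2$ are illustrative assumptions.

```python
import math


def max_batch_size(seq_len: int, mem_budget: float, comp_budget: float, p: float = 2.0) -> int:
    """Largest B with B*S <= mem_budget (assumed memory form) and B*S**p <= comp_budget."""
    by_memory = math.floor(mem_budget / seq_len)
    by_compute = math.floor(comp_budget / seq_len ** p)
    return max(0, min(by_memory, by_compute))


# Hypothetical budgets, chosen only to show where each constraint binds.
buckets = [1024, 4096, 16384]
plan = {s: max_batch_size(s, mem_budget=65536, comp_budget=2 ** 28) for s in buckets}
for s, b in plan.items():
    print(f"seq_len={s:6d} -> batch_size={b}")
```

With these made-up budgets, short sequences are limited by memory, while the longest bucket is limited by the compute term, which an equal-token rule alone would not capture.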


2605.17918 2026-05-19 cs.LG cs.AI cs.CV 版本更新

Domain Transfer Becomes Identifiable via a Single Alignment

通过单个对齐使领域转移变得可识别

Sagar Shrestha, Subash Timilsina, Hoang-Son Nguyen, Xiao Fu

发表机构 * School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon, USA（电气工程与计算机科学系，俄勒冈州立大学，科瓦利斯，俄勒冈，美国）

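AI summary: This paper proposes a new route to identifiable domain transfer that relies on a structural sparsity condition and a single paired anchor sample, reducing the dependence on supervision signals, and introduces an efficient Jacobian sparsity regularizer to support high-dimensional learning.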

详情

AI中文摘要

领域转移（DT）将源分布映射到目标分布，并支持无监督的图像到图像翻译、单细胞分析和跨平台医学影像任务。然而，DT本质上是不明确的：推动正向映射通常不可识别，因为保持测度的自同构（MPAs）在保持边缘分布的同时改变跨领域对应关系，导致内容不一致的翻译。最近的工作表明，通过联合转移多个对应的源/目标条件分布可以消除MPAs，但标记这些条件的监督信号在实践中并不总是可用。我们开发了一种替代的DT可识别性路线。在雅可比支持图案的结构稀疏性条件下，我们证明了分布匹配与单个配对锚样本足以识别真实转移——比先前方法需要的监督更少。为了支持实际的高维学习，我们进一步提出了一种基于随机掩码有限差分的高效雅可比稀疏性正则化器，得到一个可扩展的替代品，无需显式雅可比评估。在合成和现实任务上的实验证实了理论。

英文摘要

Domain transfer (DT) maps source to target distributions and supports tasks such as unsupervised image-to-image translation, single-cell analysis, and cross-platform medical imaging. However, DT is fundamentally ill-posed: push-forward mappings are generally non-identifiable, as measure-preserving automorphisms (MPAs) preserve marginals while altering cross-domain correspondences, leading to content-misaligned translation. Recent work shows that MPAs can be eliminated by jointly transferring multiple corresponding source/target conditional distributions, but supervision signals labeling such conditionals are not always available in practice. We develop an alternative route to DT identifiability. Under a structural sparsity condition on the Jacobian support pattern, we show that distribution matching together with a single paired anchor sample suffices to identify the ground-truth transfer -- requiring substantially less supervision than prior approaches. To enable practical high-dimensional learning, we further propose an efficient Jacobian sparsity regularizer based on randomized masked finite differences, yielding a scalable surrogate without explicit Jacobian evaluation. Empirical results on synthetic and real-world DT tasks validate the theory.


2605.17899 2026-05-19 cs.LG cs.AI q-bio.QM 版本更新

DCFold: Efficient Protein Structure Generation with Single Forward Pass

DCFold: 通过单次前向传递高效生成蛋白质结构

Zhe Zhang, Yuanning Feng, Yuxuan Song, Keyue Qiu, Hao Zhou, Wei-Ying Ma

发表机构 * Institute for AI Industry Research (AIR)（人工智能产业研究院）； Department of Computer Science and Technology（计算机科学与技术系）； School of Computer Science and Technology（计算机科学与技术学院）； ByteDance Seed（字节跳动种子）

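AI summary: This paper proposes DCFold, a single-step generative model that matches the accuracy of AlphaFold3. Using a dual-consistency training framework and a new temporal geodesic matching (TGM) scheduler, it speeds up inference by 15x while preserving prediction fidelity, and it is validated on structure prediction and binder design benchmarks.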

2605.17898 2026-05-19 cs.LG 版本更新

Lightweight Gaussian Process Inference in C++ on Metal and CUDA

基于C++在Metal和CUDA上的轻量级高斯过程推断

Yu-Hsueh Fang

发表机构 * Department of Information Management, National Taiwan University（国立台湾大学信息管理系）； H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology（佐治亚理工学院H. Milton Stewart工业与系统工程学院）

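AI summary: This paper presents LightGP, a dependency-free C++17 library for Gaussian process regression that supports Apple Metal and NVIDIA CUDA backends as well as CPU paths tuned via Apple Accelerate and OpenBLAS. LightGP provides four inference paths covering problem sizes from N=100 to N=500,000 and delivers significant speedups on different hardware.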

详情

AI中文摘要

高斯过程（GP）推断在Python中主要由GPyTorch和GPflow等库主导，这些库基于深度学习框架，继承了它们的调度开销和依赖项足迹。我们提出了LightGP，一个无依赖的C++17库，用于GP回归，并提供Python绑定，支持Apple Metal和NVIDIA CUDA后端，以及通过Apple Accelerate和OpenBLAS优化的CPU路径。LightGP提供了四种推断路径——精确的Cholesky分解、无矩阵的共轭梯度法、稀疏变分自由能和结构化核插值（SKI）与FFT——覆盖从N=100到N=500,000的问题。在Apple M4上，LightGP CPU在精确GP推断中比GPyTorch CPU快2.6-8.7倍，在稀疏GP推断中每种规模都快1.5倍。在NVIDIA RTX 3060上，LightGP CUDA在精确GP推断中比GPyTorch CUDA快2.3-6.7倍，直到N=2048，而在N=4096时GPyTorch缩小了差距。在Metal上融合的无矩阵核-向量乘积在N=20,000时以O(N)内存实现了32倍的性能提升，而通过Accelerate vDSP加速的SKI矩阵-向量乘法在N=200,000时运行在亚毫秒级别。LightGP编译为一个单一的静态库，无外部依赖，并可通过pip install lightgp安装。

英文摘要

Gaussian process (GP) inference in Python is dominated by libraries such as GPyTorch and GPflow, which are built on deep-learning frameworks and inherit their dispatch overhead and dependency footprint. We present LightGP, a dependency-free C++17 library for GP regression with Python bindings, supporting Apple Metal and NVIDIA CUDA backends alongside tuned CPU paths via Apple Accelerate and OpenBLAS. LightGP provides four inference paths -- exact Cholesky, matrix-free conjugate gradients, sparse variational free energy, and structured kernel interpolation with FFT -- covering problems from $N{=}100$ to $N{=}500{,}000$. On an Apple M4, LightGP CPU is 2.6--8.7$\times$ faster than GPyTorch CPU for exact GP and ${\sim}1.5\times$ faster for sparse GP at every scale tested. On an NVIDIA RTX~3060, LightGP CUDA is 2.3--6.7$\times$ faster than GPyTorch CUDA for exact GP up to $N{=}2{,}048$, with GPyTorch closing the gap at $N{=}4{,}096$. A fused matrix-free kernel-vector product on Metal achieves 32$\times$ over the explicit path at $N{=}20{,}000$ with $O(N)$ memory, and an FFT-accelerated SKI matvec via Accelerate vDSP runs in sub-millisecond time at $N{=}200{,}000$. LightGP compiles as a single static library with zero external dependencies and is installable via \texttt{pip install lightgp}.
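
For readers unfamiliar with the first of the four inference paths, exact GP regression via a Cholesky factorization can be sketched in a few lines of NumPy. This is a generic textbook illustration and does not use or reproduce LightGP's API; the RBF kernel, the noise level, and all function names are assumptions made for the example.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of a and b."""
    sq = np.sum(a ** 2, axis=1)[:, None] + np.sum(b ** 2, axis=1)[None, :] - 2.0 * a @ b.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale ** 2)


def gp_predict(x_train, y_train, x_test, noise=1e-2):
    """Posterior mean and variance of the latent function via a Cholesky solve."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    L = np.linalg.cholesky(K)                                   # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))   # (K)^-1 y
    K_s = rbf_kernel(x_train, x_test)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = rbf_kernel(x_test, x_test).diagonal() - np.sum(v ** 2, axis=0)
    return mean, var


x = np.linspace(0.0, 5.0, 30)[:, None]
y = np.sin(x[:, 0])
mean, var = gp_predict(x, y, x)
print(float(np.max(np.abs(mean - y))))
```

The factorization costs $O(N^3)$ time and $O(N^2)$ memory, which is why the abstract pairs it with matrix-free and structured alternatives for larger $N$.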


2605.17888 2026-05-19 physics.flu-dyn cs.LG 版本更新

Long-horizon prediction of three-dimensional wall-bounded turbulence with CTA-Swin-UNet and resolvent analysis

利用CTA-Swin-UNet和分辨率分析进行三维壁湍流长周期预测

Bo Chen, Yitong Fan, Jie Yao, Weipeng Li

发表机构 * School of Aeronautics and Astronautics, Shanghai Jiao Tong University, Shanghai 200240, China（航空航天学院，上海交通大学，上海200240，中国）； School of Interdisciplinary Science, Beijing Institute of Technology, Beijing 100081, China（交叉科学学院，北京理工大学，北京100081，中国）

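AI summary: This paper proposes a hybrid machine-learning framework that uses CTA-Swin-UNet and a multi-time-scale fusion correction strategy to predict turbulent flow fields in a wall-parallel plane, then reconstructs the 3D flow field via resolvent-based spectral linear stochastic estimation, demonstrating the framework's effectiveness and computational efficiency for long-horizon autoregressive prediction.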

Comments 40 pages, 18 figures

详情

AI中文摘要

利用机器学习方法对三维（3D）壁湍流进行长周期预测仍是一项具有挑战性的任务，由于自回归误差的快速累积以及显著的计算成本。为解决这些挑战，我们提出了一种混合机器学习框架，其中开发了通道-时间-注意Swin-UNet（CTA-Swin-UNet）和多时间尺度融合校正（MTFC）策略，以在可控计算成本下预测壁平行平面的湍流场。然后，通过基于分辨率的谱线性随机估计（SLSE）重构三维流场，根植于预测的平面流。结果表明，CTA-Swin-UNet在单步预测和自回归滚动预测中均优于基线模型（LSTM、FNO和传统Swin-UNet），表明将CTA模块引入Swin-UNet架构是有效的。在相同的时间间隔内，CTA-Swin-UNet在约150次滚动步骤内保持稳定，而基线模型在20至50次滚动步骤内失败。引入MTFC策略后，实现了长达300次的预测周期。使用基于分辨率的SLSE重构进一步从预测的平面输入中恢复三维流结构和能量谱分布，这表明所提出的框架为三维壁湍流的长周期自回归预测提供了一种有效且计算高效的途径。

英文摘要

Long-horizon prediction of three-dimensional (3D) wall-bounded turbulence with machine-learning methods remains a challenging task, due to the rapid accumulation of autoregressive errors and the substantial computational cost. To address these challenges, we present a hybrid machine-learning framework, in which a channel-time-attention Swin-UNet (CTA-Swin-UNet) and a multi-time-scale fusion correction (MTFC) strategy are developed to predict the turbulent flow fields in a wall-parallel plane at affordable computational cost. Then, 3D flow fields are reconstructed from the predicted planar flow via a resolvent-based spectral linear stochastic estimation (SLSE). Results show that the CTA-Swin-UNet outperforms the baseline models (LSTM, FNO and traditional Swin-UNet) in both single-step prediction and autoregressive rollouts, indicating the effectiveness of introducing the CTA module into the Swin-UNet architecture. At the same temporal interval, the CTA-Swin-UNet remains stable for approximately 150 rollout steps, while the baseline models fail within 20 to 50 rollout steps. After introducing the MTFC strategy, a longer horizon of up to 300 steps is achieved. Using the resolvent-based SLSE reconstruction further recovers the 3D flow structures and energy spectral distributions from the predicted planar inputs, which demonstrates that the proposed framework provides an effective and computationally efficient approach for long-horizon autoregressive prediction of 3D wall-bounded turbulence.


2605.17887 2026-05-19 cs.LG cs.AI 版本更新

Attention Sinks and Outliers in Attention Residuals

注意力沉底与注意力残差中的异常值

Haozheng Luo, Haoran Dai, Shaoyang Zhang, Xi Chen, Eric Hanchen Jiang, Yijiang Li, Jingyuan Huang, Chenghao Qiu, Chenwei Xu, Zhenyu Pan, Haotian Zhang, Binghui Wang, Yan Chen

发表机构 * Department of Computer Science, Northwestern University（西北大学计算机科学系）； Department of Computer Science and Engineering, University of Michigan（密歇根大学计算机科学与工程系）； Department of Statistics and Data Science, University of California Los Angeles（加州大学洛杉矶分校统计与数据科学系）； Department of Electrical and Computer Engineering, University of California San Diego（加州圣地亚哥大学电气与计算机工程系）； Department of Computer Science, Rutgers University-New Brunswick（新泽西州立大学鲁特学院计算机科学系）； Department of Computer Science and Engineering, Texas A&M University（德克萨斯农工大学计算机科学与工程系）； Department of Computer Science, Columbia University（哥伦比亚大学计算机科学系）

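AI summary: This paper proposes OASIS, a technique that uses an inter-layer null signal to address attention sinks, activation outliers, and degraded inference stability in attention-residual architectures. Supported by an analysis of the dual-normalization design and by experiments, it improves structural and quantization robustness.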

详情

AI中文摘要

我们提出OASIS，一种基于层间空信号的异常值和沉底感知技术。As AttnResidual架构引入了额外的深度归一化通道，它们提高了层间路由的灵活性，但也加剧了注意力沉底、激活异常值以及由此导致的推理稳定性和量化鲁棒性下降。OASIS通过引入基于Softmax1的空空间和通过层间空信号将token级的空证据耦合到深度路由中，从而减少由沉底主导的路由并提高结构鲁棒性。理论上，我们证明了AttnResidual的双归一化设计加剧了沉底形成和量化脆性。实验上，我们在三个真实世界数据集上将OASIS与五个基线进行比较，并观察到在注意力沉底和后量化性能方面有持续的改进。值得注意的是，OASIS在评估设置中实现了最大无穷范数平均减少9.26%、平均峰度减少2.60%，并在W8A8下将困惑度降低了75.85%，在W4A4下将GSM8K Pass@1提高了12.42%。

英文摘要

We propose OASIS, an outlier- and sink-aware technique built on inter-layer null signaling. As AttnResidual architectures introduce an additional depth-wise normalization channel, they improve inter-layer routing flexibility but also exacerbate attention sinks, activation outliers, and the resulting degradation in inference stability and quantization robustness. OASIS addresses this issue by introducing a Softmax1-based null space and coupling token-level null evidence to depth routing through an inter-layer null signal, thereby reducing sink-dominated routing and improving structural robustness. Theoretically, we show that the dual-normalization design of AttnResidual intensifies sink formation and quantization brittleness. Experimentally, we compare OASIS against five baselines on three real-world datasets and observe consistent improvements in both attention sink and post-quantization performance. Notably, OASIS achieves an average reduction of 9.26% in maximum infinity norm and 2.60% in average kurtosis across the evaluated settings, while lowering perplexity by 75.85% under W8A8 and improving GSM8K Pass@1 by 12.42% under W4A4.
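
The abstract does not define Softmax1. A minimal sketch, assuming it refers to the commonly described variant with an extra 1 in the denominator, $\exp(x_i) / (1 + \sum_j \exp(x_j))$, which lets a head assign most of its mass to "nothing" instead of a sink token; the function name and the stabilization trick are choices made for this example, not taken from the paper.

```python
import numpy as np


def softmax1(scores, axis=-1):
    """exp(x_i) / (1 + sum_j exp(x_j)), computed stably (assumed definition of Softmax1)."""
    # Shift by m = max(max(x), 0) so that neither exp(x - m) nor exp(-m) can overflow.
    m = np.maximum(np.max(scores, axis=axis, keepdims=True), 0.0)
    e = np.exp(scores - m)
    return e / (np.exp(-m) + np.sum(e, axis=axis, keepdims=True))


print(softmax1(np.zeros(3)))                 # weights sum to 0.75, not 1
print(softmax1(np.array([-50.0, -50.0])))    # nearly all mass goes to the implicit null slot
```

Unlike the standard softmax, the outputs need not sum to one; the missing mass plays the role of a null option.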


2605.17879 2026-05-19 cs.DC cs.AI cs.LG 版本更新

Guard: Scalable Straggler Detection and Node Health Management for Large-Scale Training

Guard：用于大规模训练的可扩展的延迟检测和节点健康管理

Guanliang Liu, Abhinandan Patni, Congzhu Lin, Zoe Zeng, Jack Wittmayer, Josh Wu, Ashvin Nihalani, Binxuan Huang, Yinghong Liu, Rory Na, Anthony Ko, Alexander Zhipa, Cong Cheng, Mi Sun, Vijay Rajakumar, Rejith George Joseph, Parthasarathy Govindarajen

发表机构 * Anonymous Institution, Anonymous City, Anonymous Region, Anonymous Country（匿名机构，匿名城市，匿名地区，匿名国家）

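AI summary: This paper presents Guard, a system that combines online performance monitoring with an offline node-sweep mechanism to detect stragglers during training and ensure node health, improving the efficiency and stability of large-scale training.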

Comments Proceedings of the 9th MLSys Conference, Bellevue, WA, USA, 2026

详情

AI中文摘要

训练前沿规模的基础模型需要协调成千上万的GPU进行多月运行，其中即使微小的性能退化也会累积成显著的效率损失。现有健康检查机制，如NCCL测试或GPU烧录，主要关注功能正确性，往往无法检测到悄无声息降低系统性能的fail-slow行为。在本文中，我们提出了Guard，一个用于检测stragglers并确保大规模训练集群中节点健康的可扩展系统。Guard结合了训练期间的轻量级在线性能监控与一个离线节点扫描机制，系统地评估和认证节点在参与生产工作负载之前。这种设计使Guard能够检测到传统诊断无法捕捉的急性故障和长期运行的fail-slow行为。在大规模基础模型预训练工作负载上部署Guard，可将平均FLOPs利用率提高多达1.7倍，将运行到运行的训练步骤方差从20%降至1%，增加平均故障时间（MTTF），并显著减少操作和调试开销。这些结果表明，主动检测stragglers和系统化的节点认证对于维持稳定和高效的大型训练至关重要。

英文摘要

Training frontier-scale foundation models involves coordinating tens of thousands of GPUs over multi-month runs, where even minor performance degradations can accumulate into substantial efficiency losses. Existing health-check mechanisms, such as NCCL tests or GPU burn-in, primarily focus on functional correctness and often fail to detect fail-slow behaviors that silently degrade system performance. In this paper, we present Guard, a scalable system for detecting stragglers and ensuring node health in large-scale training clusters. Guard combines lightweight online performance monitoring during training with an offline node-sweep mechanism that systematically evaluates and qualifies nodes before they participate in production workloads. This design enables Guard to detect both acute failures and long-running fail-slow behaviors that traditional diagnostics cannot capture. Deployed on large-scale foundation model pretraining workloads, Guard improves mean FLOPs utilization by up to 1.7x, reduces run-to-run training step variance from 20% to 1%, increases mean time to failure (MTTF), and significantly reduces operational and debugging overhead. These results demonstrate that proactive straggler detection and systematic node qualification are critical for maintaining stable and efficient large-scale training.


2605.17873 2026-05-19 cs.LG cs.AI cs.CL 版本更新

Simple Approximation- and Derivative-Free Inference-Time Scaling for Diffusion Models via Sequential Monte Carlo on Path Measures

Chenyang Wang, Weizhong Wang, Yinuo Ren, Jose Blanchet, Yiping Lu

发表机构 * School of Mathematical Sciences, Peking University, Beijing, China ； School of Mathematical Sciences, Fudan University, Shanghai, China ； Department of Industrial Engineering & Management Sciences, Northwestern University, Evanston, IL, United States ； Institute for Computational & Mathematical Engineering, Stanford University, Stanford, CA, United States ； Management Science & Engineering, Stanford University, Stanford, CA, United States

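AI summary: This paper proposes URGE, a gradient-free inference-time scaling algorithm that improves diffusion-model sample quality through path-wise importance reweighting. It performs well on synthetic tests and diffusion-model benchmarks while being simple to implement.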

Comments accepted by ICML 2026

详情

AI中文摘要

扩散生成模型越来越多地依赖于推理时间引导，通过添加漂移项或重新加权专家混合物来提高任务特定目标的样本质量。然而，大多数现有技术需要重复评估分数或梯度，引入偏差、高计算开销或两者兼有。我们引入URGE（Unbiased Resampling via Girsanov Estimation），一种无导数的推理时间缩放算法，通过Girsanov测度变换进行路径重要性重加权。与先前工作不同，URGE为每个模拟轨迹附加简单的乘法权重，并定期重新采样。无需计算基于梯度的粒子权重。我们建立了路径级和粒子级SMC之间的等价性：Girsanov路径权重允许一个向后条件期望，恢复先前的粒子级权重，保证两种方案产生相同的无偏终端分布。经验上，URGE在合成测试和扩散模型基准中优于现有推理时间引导基线，实现了更好的生成质量，同时显著更简单且完全无梯度依赖。

英文摘要

Diffusion-based generative models increasingly rely on inference-time guidance, adding a drift term or reweighting a mixture of experts, to improve sample quality on task-specific objectives. However, most existing techniques require repeated score or gradient evaluations, introducing bias, high computational overhead, or both. We introduce \texttt{URGE} (Unbiased Resampling via Girsanov Estimation), a derivative-free inference-time scaling algorithm that performs path-wise importance reweighting via a Girsanov change of measure. Instead of computing gradient-based particle weights as in previous work, \texttt{URGE} attaches a simple multiplicative weight to each simulated trajectory and periodically resamples. No score, no Hessian, and no PDE evaluation is required. We establish an equivalence between path-wise and particle-wise SMC: the Girsanov path weight admits a backward conditional expectation that recovers the previous particle-level weights, guaranteeing that both schemes produce the same unbiased terminal law. Empirically, \texttt{URGE} outperforms existing inference-time guidance baselines on synthetic tests and diffusion-model benchmarks, achieving better generation quality, while being significantly simpler to implement and fully gradient-free.


2605.17849 2026-05-19 cs.CL cs.AI cs.LG 版本更新

Generating Pretraining Tokens from Organic Data for Data-Bound Scaling

从有机数据生成预训练令牌以实现数据驱动的扩展

Zichun Yu, Chenyan Xiong

发表机构 * Language Technologies Institute, Carnegie Mellon University（卡内基梅隆大学语言技术研究所）

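AI summary: This paper proposes SynPro, a framework that uses rephrasing and reformatting operations to help large language models make fuller use of limited organic data, enabling more efficient scaling in data-bound pretraining.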

详情

AI中文摘要

LLM预训练正从计算驱动转向数据驱动的阶段，其中可用的人类（有机）文本远远无法满足扩展需求。然而，达到数据驱动阶段并不意味着模型已充分利用其有机语料库。在本文中，我们介绍了SynPro，一个合成数据生成框架，帮助LLM更深入地学习有限的有机数据。SynPro应用两种操作，即重新表述和重新格式化，以多样化的形式呈现相同的有机源，以促进更深层次的学习，而无需引入外部信息。两个生成器通过强化学习优化，使用质量、忠实度和数据影响奖励进行优化，并在预训练平台期持续更新，以针对模型尚未吸收的内容。我们使用DCLM-Baseline的10%最优令牌（0.8B和2.2B）预训练400M和1.1B模型，反映了前沿预训练中现实的数据驱动阶段。我们的结果表明，有机数据被标准重复方法显著低估：SynPro解锁了比重复方法多3.7-5.2倍的有效令牌，甚至在1.1B规模上超过了非数据驱动的Oracle，该Oracle在等效唯一数据上训练。分析证实，忠实、模型意识的合成可以在不导致分布崩溃的情况下实现数据驱动的扩展。我们开源代码在https://github.com/cxcscmu/SynPro。

英文摘要

LLM pretraining is shifting from a compute-bound to a data-bound regime, where available human (organic) text falls far short of scaling demands. However, reaching the data-bound regime does not mean the model has fully utilized its organic corpus. In this paper, we introduce SynPro, a synthetic data generation framework that helps LLMs more thoroughly learn from limited organic data. SynPro applies two operations, rephrasing and reformatting, that present the same organic source in diverse forms to facilitate deeper learning without introducing external information. Both generators are optimized via reinforcement learning with quality, faithfulness, and data influence rewards, and are continuously updated as pretraining plateaus to target content the model has yet to absorb. We pretrain 400M and 1.1B models with 10% of their Chinchilla-optimal tokens (0.8B and 2.2B) from DCLM-Baseline, reflecting a realistic data-bound regime in frontier pretraining. Our results reveal that organic data is significantly underutilized by standard repetition: SynPro unlocks 3.7-5.2x the effective tokens of repetition, even surpassing the non-data-bound oracle that trains on equivalent unique data at the 1.1B scale. Analyses confirm that faithful, model-aware synthesis sustains data-bound scaling without causing distribution collapse. We open-source our code at https://github.com/cxcscmu/SynPro.


2605.17833 2026-05-19 cs.LG cs.AI 版本更新

A Unified Framework for Data-Free One-Step Sampling via Wasserstein Gradient Flows

Chenguang Wang, Tianshu Yu

发表机构 * School of Data Science（数据科学学院）； The Chinese University of Hong Kong, Shenzhen（香港中文大学（深圳））

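AI summary: This paper develops a unified theoretical framework for data-free one-step sampling based on Wasserstein gradient flows. It shows that the velocity field induced by f-divergence objectives admits a universal form, derives a compression-elasticity identity linking divergence choice to the geometry of mass transport through a theory of a soft under-coverage functional, extends the framework to the Log-Variance divergence, and achieves one-step inference through a KDE-based implementation and a normalizing-flow route.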

详情

AI中文摘要

我们开发了一种基于Wasserstein梯度流的数据免费一步采样的统一理论框架。对于广泛的标准f-分歧度目标，我们证明诱导速度场具有通用形式V(x)=w(r(x))β(x)，其中β(x)=∇log(p(x)/q(x))在不同目标中共享，而w仅由分歧度的选择决定。这种分解表明标准f-分歧度漂移共享相同的渐近目标分布p，并主要区别于如何在欠覆盖区域重新分配瞬时修复努力。为了正式化这种区别，我们推导了软欠覆盖功能的一步区域响应理论，并获得了一个将分歧度选择与质量运输进入欠覆盖区域的几何联系的压缩-弹性恒等式。我们进一步将该框架扩展到Log-Variance (LV)分歧度，分析参考分布如何改变最终的漂移结构，并提出一个实用的LV启发式替代方案用于数据免费训练。基于此理论，我们通过KDE实现该框架，并描述了互补的归一化流路线，从而在训练后实现一步推断。在多模态高斯混合基准测试中的实验结果与理论预测一致，并在这些目标上展示了有效的一步采样。

英文摘要

We develop a unified theoretical framework for data-free one-step sampling from unnormalized target distributions based on Wasserstein gradient flows. For a broad class of standard f-divergence objectives, we show that the induced velocity field admits the universal form $\mathbf{V}(x)=w(r(x))\,β(x)$, where $β(x)=\nabla \log (p(x)/q(x))$ is shared across objectives and $w$ is determined solely by the choice of divergence. This decomposition shows that standard f-divergence drifts share the same asymptotic target distribution $p$ and differ primarily in how they redistribute transient repair effort across under-covered regions. To formalize this distinction, we derive a one-step regional-response theory for a soft under-coverage functional and obtain a compression--elasticity identity that links divergence choice to the geometry of mass transport into under-covered regions. We further extend the framework beyond the f-divergence family to the Log-Variance (LV) divergence, analyze how the reference distribution alters the resulting drift structure, and motivate a practical LV-inspired surrogate for data-free training. Based on this theory, we instantiate the framework with a KDE-based implementation and describe a complementary normalizing-flow route, enabling one-step inference after training. Experiments on multimodal Gaussian-mixture benchmarks are consistent with the theoretical predictions and demonstrate effective one-step sampling on these targets.
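
The factorization can be checked by hand under one common convention. The sketch below assumes the objective is $D_f(q\|p)=\int p\,f(q/p)\,dx$ with density ratio $r=q/p$ and the Wasserstein gradient-flow velocity $-\nabla(\delta D_f/\delta q)$; the paper's own definitions of $r$ and $w$ may differ (for instance by using $p/q$), so the closed form of $w$ is illustrative only.

```latex
\[
\frac{\delta D_f}{\delta q}(x) = f'\bigl(r(x)\bigr), \qquad r(x) = \frac{q(x)}{p(x)},
\]
\[
\mathbf{V}(x) = -\nabla f'\bigl(r(x)\bigr) = -f''(r)\,\nabla r = -r f''(r)\,\nabla \log r
= \underbrace{r f''(r)}_{w(r)}\;\underbrace{\nabla \log \frac{p(x)}{q(x)}}_{\beta(x)} .
\]
```

Under this convention, $f(r)=r\log r$ (the divergence $\mathrm{KL}(q\|p)$) gives $w\equiv 1$, while $f(r)=(r-1)^2$ gives $w(r)=2r$: the direction $\beta$ is unchanged and only the scalar weighting differs.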


2605.17806 2026-05-19 cs.LG 版本更新

AMO: Adaptive Muon Orthogonalization

AMO：自适应缪子正交化

Xinlin Zhuang, Panyi Ouyang, Yichen Li, Jiangming Shi, Yizhang Chen, Shuman Liu, Ying Qian, Weiyang Liu, Haibo Zhang, Imran Razzak

发表机构 * The Chinese University of Hong Kong（香港中文大学）； Shopee ； MBZUAI ； East China Normal University（华东师范大学）； Huazhong University of Science and Technology（华中科技大学）； Xiamen University（厦门大学）

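AI summary: This paper studies the heterogeneity of orthogonalization in the Muon optimizer and proposes Adaptive Muon Orthogonalization, which measures weight geometry to allocate the Newton-Schulz (NS) budget dynamically and improves pre-training performance.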

Comments preprint, under-review

详情

AI中文摘要

缪子最近作为一种替代AdamW的预训练优化器出现，其核心操作是通过牛顿-施鲁茨（NS）迭代实现正交化。现有缪子变体对所有参数矩阵应用统一的NS调度，忽略了正交化难度的差异及其对性能的影响。通过系统性的实证研究，我们发现这种每矩阵异质性普遍存在，主要由矩阵几何决定，其在不同操作类型、训练阶段和网络深度下动态变化。因此，统一的NS调度可能导致模型中正交化质量不均。受此启发，我们提出自适应缪子正交化（AMO），一种观察后承诺的方法，通过早期测量操作类型权重几何特性，并利用这些信号为剩余训练分配NS预算。AMO在标准、延长和连续预训练中均优于统一调度的缪子，其在Llama3.1-1.4B上平均下游性能提升+0.76，在Qwen3-1.7B上提升+0.51。

英文摘要

Muon has recently emerged as a competitive alternative to AdamW for large-scale pre-training, with orthogonalization via Newton-Schulz (NS) iterations as its core operation. Existing Muon variants apply a uniform NS schedule to all parameter matrices, overlooking possible differences in orthogonalization difficulty and its impact on performance. Through a systematic empirical study, we show that this per-matrix heterogeneity is pervasive and largely determined by matrix geometry, which evolves dynamically across operator types, training stages, and network depths. As a result, uniform NS schedules can lead to uneven orthogonalization quality across the model. Motivated by these findings, we propose Adaptive Muon Orthogonalization (AMO), an observe-then-commit method that measures weight geometry by operator type early in training and then uses these signals to allocate the NS budget for the remainder of training. AMO delivers consistent improvements over uniform-schedule Muon across standard, prolonged, and continual pre-training, surpassing the strongest baseline by +0.76 on Llama3.1-1.4B and +0.51 on Qwen3-1.7B in average downstream performance of 12 evaluation tasks.
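
To make the notion of an "NS budget" concrete, the sketch below runs the classical cubic Newton-Schulz iteration for different step counts and measures how far the result is from having orthonormal rows. It is an illustration only: Muon implementations typically use a tuned higher-order polynomial, and neither the coefficients nor the helper names here come from the paper.

```python
import numpy as np


def newton_schulz(G, steps):
    """Approximate the orthogonal polar factor of G with `steps` cubic NS iterations."""
    X = G / (np.linalg.norm(G) + 1e-12)   # Frobenius normalization keeps singular values <= 1
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X   # maps each singular value s to 1.5*s - 0.5*s**3
    return X


def orth_error(X):
    """Frobenius distance of the smaller Gram matrix from the identity."""
    gram = X @ X.T if X.shape[0] <= X.shape[1] else X.T @ X
    return float(np.linalg.norm(gram - np.eye(gram.shape[0])))


rng = np.random.default_rng(0)
G = rng.normal(size=(8, 16))
errors = {k: orth_error(newton_schulz(G, k)) for k in (2, 5, 30)}
print(errors)
```

Small singular values grow by roughly a factor of 1.5 per step, so poorly conditioned matrices need more iterations to reach the same error, which is the kind of per-matrix difference an adaptive schedule could exploit.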


2605.17799 2026-05-19 cs.CV cs.LG 版本更新

Is Complex Training Necessary for Long-Tailed OOD Detection? A Re-think from Feature Geometry

长尾分布外检测是否需要复杂的训练？从特征几何角度的重新思考

Ningkang Peng, Xuanming Chen, Yanhui Gu

发表机构 * Nanjing Normal University（南京师范大学）

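AI summary: This paper revisits long-tailed out-of-distribution detection and shows that a feature-geometry approach can simplify detection: it improves the Mahalanobis distance computation and raises detection performance.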

详情

AI中文摘要

长尾分布外检测通常通过专门的训练方法解决，包括引入分布外数据、回避头、对比目标、能量损失或梯度冲突控制。我们表明这些训练机制可能掩盖了一个更简单的问题：冻结的长尾表示可能已经包含有用的分布外证据，但原始Mahalanobis距离受到频率耦合特征半径和不充分支持的尾部协方差的影响。我们提出了超球面池化Mahalanobis（HPM）方法，一种后处理检测器，将特征归一化到单位球面，并用池化、岭正则化的度量替换类特定协方差，同时保持类均值作为语义锚点。在CIFAR-LT实验和ImageNet-100-LT近分布外边界分析中，HPM提高了原始Mahalanobis评分；对于先验校准经验风险最小化（PC-ERM），在CIFAR-10-LT上将AUROC从46.49提升到85.67，在CIFAR-100-LT上从50.40提升到78.35。这个简单的PC-ERM+HPM流程在CIFAR-100-LT上实现了最佳对数效率分数（LES；3.08），在显著降低训练时间成本的情况下，保留了约95%的最佳CIFAR-100-LT AUROC观测值。这些结果表明，在长尾分布外检测中应分别评估表示质量、检测器几何和训练复杂性。

英文摘要

Long-tailed out-of-distribution (LT-OOD) detection is often addressed with specialized training, including auxiliary out-of-distribution (OOD) data, abstention heads, contrastive objectives, energy losses, or gradient-conflict control. We show that these training mechanisms can obscure a simpler issue: frozen long-tailed representations may already contain useful OOD evidence, but raw Mahalanobis distance is distorted by frequency-coupled feature radius and poorly supported tail covariance. We propose Hyperspherical Pooled Mahalanobis (HPM), a post-hoc detector that normalizes features onto the unit sphere and replaces class-specific covariance with a pooled, ridge-regularized metric while keeping class means as semantic anchors. In CIFAR-LT experiments and an ImageNet-100-LT near-OOD boundary analysis, HPM improves raw Mahalanobis scoring; for Prior-Calibrated ERM (PC-ERM), it raises AUROC from 46.49 to 85.67 on CIFAR-10-LT and from 50.40 to 78.35 on CIFAR-100-LT. This simple PC-ERM+HPM pipeline also achieves the best Log Efficiency Score (LES; 3.08) on CIFAR-100-LT, retaining roughly 95% of the best CIFAR-100-LT AUROC observed among the compared post-hoc scores at substantially lower training-time cost. These results argue for evaluating representation quality, detector geometry, and training complexity as separate factors in LT-OOD detection.
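
The three ingredients named in the abstract (unit-sphere normalization, a pooled ridge-regularized covariance, and class means as anchors) can be sketched as follows. This is a reading of the abstract rather than the authors' code: the ridge value, the pooling over all samples, the min-over-classes score, and the function names are assumptions.

```python
import numpy as np


def fit_hpm(features, labels, ridge=1e-2):
    """Class means on the unit sphere plus one pooled, ridge-regularized precision matrix."""
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    classes = np.unique(labels)
    means = np.stack([z[labels == c].mean(axis=0) for c in classes])
    centered = z - means[np.searchsorted(classes, labels)]
    cov = centered.T @ centered / len(z)              # pooled across classes
    precision = np.linalg.inv(cov + ridge * np.eye(z.shape[1]))
    return means, precision


def hpm_score(x, means, precision):
    """Smallest squared Mahalanobis distance to any class mean; larger means more OOD-like."""
    z = x / np.linalg.norm(x, axis=1, keepdims=True)
    diff = z[:, None, :] - means[None, :, :]
    d2 = np.einsum("nkd,de,nke->nk", diff, precision, diff)
    return d2.min(axis=1)


# Synthetic long-tailed toy data: a large-radius head class and a small-radius tail class.
rng = np.random.default_rng(0)
head = np.array([5.0, 0.0, 0.0]) + 0.1 * rng.normal(size=(200, 3))
tail = np.array([0.0, 1.0, 0.0]) + 0.05 * rng.normal(size=(10, 3))
feats = np.vstack([head, tail])
labels = np.array([0] * 200 + [1] * 10)
means, precision = fit_hpm(feats, labels)
ood = np.array([0.0, 0.0, 3.0]) + 0.1 * rng.normal(size=(20, 3))
id_scores = hpm_score(feats, means, precision)
ood_scores = hpm_score(ood, means, precision)
print(id_scores.max(), ood_scores.min())
```

Normalization removes the radius difference between head and tail features, and pooling avoids estimating a covariance from the ten tail samples alone.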


2605.17795 2026-05-19 cs.LG cs.CV 版本更新

When Accuracy Is Not Enough: Uncertainty Collapse between Noisy Label Learning and Out-of-Distribution Detection

当准确性不够时：噪声标签学习与分布外检测之间的不确定性崩溃

Ningkang Peng, Jingyang Mao, Runhan Zhou, Peirong Ma, Yanhui Gu

发表机构 * Nanjing Normal University（南京师范大学）

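AI summary: This paper studies uncertainty collapse between noisy-label learning and out-of-distribution detection. It introduces a general ACC-OOD benchmark, shows that high accuracy does not guarantee out-of-distribution reliability, and proposes Virtual Margin Regularization to mitigate the problem.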

详情

AI中文摘要

噪声标签学习（LNL）通常通过封闭集分类准确率进行评估，但部署时往往需要分类器能够拒绝分布外（OOD）输入。我们提出了一种学习者无关的ACC-OOD基准，冻结LNL检查点，并在合成和真实噪声标签上评估它们，使用标准化的近/远OOD路由和事后评分。该基准揭示了一种反复出现的失败模式：高封闭集准确率不保证OOD可靠性，因为低置信度、被错误分类的分布内样本可能在噪声训练下与OOD输入占据的得分和特征区域重叠。我们称之为这种病理现象不确定性崩溃。这种结构重叠可能导致高准确率的LNL方法在标准OOD评分下失去ID错误/OOD界面的分离性。作为干预措施，我们研究了虚拟边距正则化（VMR），一种轻量级的修复探针，主要通过PSSCL展示，通过在可信ID批次上合成边界虚拟异常值并扩大能量边距。VMR在不替换主机目标或牺牲封闭集准确率的情况下，部分减少了由崩溃引起的远OOD失败。这些结果支持LNL基准，同时报告封闭集泛化、开放世界可靠性以及结构重叠诊断。

英文摘要

Learning with noisy labels (LNL) is typically benchmarked by closed-set classification accuracy, yet deployment often requires classifiers to reject out-of-distribution (OOD) inputs. We present a learner-agnostic ACC-OOD benchmark that freezes LNL checkpoints and evaluates them with standardized near-/far-OOD routing and post-hoc scores across synthetic and real label noise. The benchmark reveals a recurring failure mode: high closed-set accuracy does not ensure OOD reliability, because low-confidence, misclassified in-distribution samples can overlap the score and feature regions occupied by OOD inputs under noisy training. We term this pathology uncertainty collapse. This structural overlap can make high-accuracy LNL methods lose separability at the ID-error/OOD interface under standard OOD scores. As an intervention, we study Virtual Margin Regularization (VMR), a lightweight repair probe demonstrated mainly with PSSCL that synthesizes boundary virtual outliers on trusted ID batches and widens the energy margin. VMR partially reduces the collapse-induced far-OOD failure without replacing the host objective or sacrificing closed-set accuracy in the tested settings. These results support LNL benchmarks that co-report closed-set generalization, open-world reliability, and structural overlap diagnostics.


2605.17792 2026-05-19 cs.LG physics.geo-ph 版本更新

HydroAgent: Closing the Gap Between Frontier LLMs and Human Experts in Hydrologic Model Calibration via Simulator-Grounded RL

HydroAgent: 通过模拟器引导的强化学习缩小前沿大语言模型与人类专家在水文模型校准之间的差距

Zhi Li, Songkun Yan, Jie Cao, Mofan Zhang, Anjiang Wei, Jinwoong Yoo, Yang Hong

发表机构 * Civil, Environmental, and Architectural Engineering, University of Colorado Boulder（科罗拉多大学波尔德分校土木、环境与建筑工程系）； Civil Engineering and Environmental Sciences, University of Oklahoma（俄克拉荷马大学土木工程与环境科学系）； Department of Computer Science, University of Oklahoma（俄克拉荷马大学计算机科学系）； Civil and Environmental Engineering, Stanford University（斯坦福大学土木与环境工程系）； Department of Computer Science, Stanford University（斯坦福大学计算机科学系）； NASA Goddard Space Flight Center（美国国家航空航天局戈达德空间飞行中心）

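AI summary: This paper studies whether frontier large language model (LLM) agents can replace human hydrologic modelers in hydrologic model calibration, and proposes HydroAgent, which is fine-tuned with reinforcement learning with simulation feedback (RLSF) to improve adaptability and accuracy across basins.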

详情

AI中文摘要

校准分布式水文模型是操作水资源管理中的关键瓶颈——径流预测、水库调度、干旱监测、基础设施设计和洪水预测都依赖于此。每个流域都需要专家将水文图谱特征转化为高维参数向量的调整，而这种工作流程无法在不同流域之间转移。我们问：前沿大语言模型（LLM）代理能否替代人类水文模型师？如果不能，需要什么条件？我们对九个前沿LLM代理——Claude Opus 4.6/4.7、Sonnet 4.6、GPT-5/5.4/5.4-pro和Gemini 2.5-pro/3.1-pro/3-flash——在由美国国家气象局用于暴雨预报的运营CREST分布式水文模型上进行基准测试。最佳的二十轮次Nash-Sutcliffe效率（NSE）在四个保留的水文站上跨越329-40,792平方公里的范围从-0.16（GPT-5.4）到0.75（Sonnet 4.6）；上限在所有三个供应商和能力层级中都保持一致，最强的模型集中在0.65-0.75范围内，除了Opus-4.7在其中一个水文站外，没有其他模型达到人类专家的参考水平。我们认为这个差距不是参数数量的问题，而是领域基础的问题。然后我们提出了HYDROAGENT，通过监督微调2,576条专家校准轨迹和使用NSE作为可验证奖励的组相对策略优化，对开放权重的Qwen3-4B进行微调——模拟器反馈的强化学习（RLSF）。对于地球系统科学，一个经过领域微调的策略，通过模拟器在环的强化学习，比扩展通用前沿模型更计算高效且物理上更忠实，而地球数据的多模态丰富性——遥感、现场时间序列和预报员叙述——使领域代理成为物理科学中人工智能发展的杠杆方向。

英文摘要

Calibrating distributed hydrologic models is a critical bottleneck across operational water resources management - streamflow prediction, reservoir operation, drought monitoring, infrastructure design, and flood forecasting all depend on it. Each basin demands an expert to translate hydrograph signatures into adjustments of a high-dimensional parameter vector, and the resulting workflow does not transfer between watersheds. We ask: can frontier large language model (LLM) agents replace the human hydrologic modeler, and if not, what would it take? We benchmark nine frontier LLM agents - Claude Opus 4.6/4.7, Sonnet 4.6, GPT-5/5.4/5.4-pro, and Gemini 2.5-pro/3.1-pro/3-flash - on the operational CREST distributed hydrologic model used by the U.S. National Weather Service for flash-flood forecasting. Best-of-twenty-rounds Nash-Sutcliffe Efficiency (NSE) across four held-out gauges spanning 329-40,792 km$^2$ ranges from -0.16 (GPT-5.4) to 0.75 (Sonnet 4.6); the ceiling reproduces across all three vendors and capability tiers, with the strongest models concentrating in the 0.65-0.75 band, and no model reaches the human-expert reference except Opus-4.7 on one gauge. We argue this gap is not a parameter-count problem but a domain-grounding problem. We then propose HYDROAGENT, fine-tuning open-weight Qwen3-4B with supervised fine-tuning on 2,576 expert calibration trajectories and Group-Relative Policy Optimization using NSE as a verifiable reward from online CREST simulations - reinforcement learning with simulation feedback (RLSF). For Earth system science, a small domain-tuned policy with simulator-in-the-loop RL is a more compute-efficient and physically faithful path than scaling generic frontier models, and the multi-modal richness of Earth data - remote sensing, in-situ time series, and forecaster narrative - makes domain agents a leveraged direction for AI in physical science.
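
Since NSE serves as both the benchmark metric and the reward, its standard definition is worth spelling out: one minus the ratio of the squared simulation error to the variance of the observations around their mean. A minimal sketch, with a hypothetical function name and made-up streamflow values:

```python
import numpy as np


def nash_sutcliffe_efficiency(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    residual = np.sum((observed - simulated) ** 2)
    spread = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - residual / spread


obs = [1.0, 2.0, 3.0, 4.0]                      # illustrative values only
perfect = nash_sutcliffe_efficiency(obs, obs)
mean_only = nash_sutcliffe_efficiency(obs, [2.5] * 4)
print(perfect, mean_only)
```

A score of 1 is a perfect match, 0 means the simulation is no better than predicting the observed mean, and negative values (such as the -0.16 reported above) are worse than that mean baseline.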


2605.17787 2026-05-19 cs.LG 版本更新

Revisiting the Adam-SGD Gap in LLM Pre-Training: The Role of Large Effective Learning Rates

重新审视LLM预训练中Adam与SGD的差距：大有效学习率的作用

Athanasios Glentis, Dawei Li, Chung-Yiu Yau, Mingyi Hong

发表机构 * University of Minnesota（明尼苏达大学）

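AI summary: Through empirical and theoretical analysis, this paper attributes much of SGD's poor performance in LLM pre-training to its inability to sustain effective learning rates comparable to Adam's. The need for large effective learning rates stems from small gradient norms and large weight-to-gradient ratios, and becomes more pronounced at large batch sizes. With simple clipping mechanisms, SGD at large learning rates recovers most of Adam's performance: the validation loss gap shrinks from more than 50% to about 3.5%.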

详情

AI中文摘要

人们普遍认为，在预训练大型语言模型（LLM）时，随机梯度下降（SGD）的表现明显不如Adam等自适应优化器，但这一差距的根本原因仍不清楚。本文将这一差距在很大程度上归因于SGD无法维持与Adam大得多的有效学习率相当的学习率。通过对LLM预训练动态的实证和理论分析，我们发现训练过程以小梯度范数和大权重-梯度比为特征，这一效应在预训练常用的较大批次大小下更为显著，因而需要如此大的有效学习率。然而，我们发现输出层的梯度幅度在不同token类别之间高度不均，且训练过程中频繁出现大的梯度尖峰。这些因素共同严重限制了SGD可用的学习率。基于这一认识，我们表明，能够在大学习率下稳定SGD的简单裁剪机制可使其恢复Adam的大部分性能。在大规模实验中，以1M token的批次大小预训练1B参数的LLaMA模型时，大学习率SGD与Adam之间的验证损失差距从超过50%缩小到仅约3.5%。

英文摘要

It is widely believed that stochastic gradient descent (SGD) performs significantly worse than adaptive optimizers such as Adam in pre-training Large Language Models (LLMs). Yet the underlying reason for this gap remains unclear. In this work, we attribute a large part of the discrepancy to SGD's inability to sustain learning rates comparable to Adam's much larger effective learning rates. Through empirical and theoretical analysis of LLM pre-training dynamics, we identify that training is characterized by small gradient norms and large weight-to-gradient ratios, an effect that becomes more pronounced with larger batch sizes typical in pre-training, necessitating such large effective learning rates. However, we find that output-layer gradient magnitudes become highly uneven across token classes, and that large gradient spikes frequently occur during training. Together, these effects severely restrict the admissible learning rate of SGD. Guided by this understanding, we show that simple clipping mechanisms that stabilize SGD at large learning rates enable it to recover most of Adam's performance. In our large-scale experiments, the validation loss gap between large-learning-rate SGD and Adam shrinks from more than 50% to only about 3.5% when pre-training a 1B-parameter LLaMA model with a 1M-token batch size.
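The abstract does not say which clipping mechanisms are used. As one common possibility, the sketch below shows plain SGD with global-norm gradient clipping, where a gradient spike is rescaled before the update so that a large learning rate does not turn it into a destabilizing step. The function names and the threshold are assumptions for illustration, not the paper's method.

```python
import math
from typing import List


def clip_by_global_norm(grad: List[float], max_norm: float) -> List[float]:
    """Rescale grad so its L2 norm is at most max_norm (direction is preserved)."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


def sgd_step(params: List[float], grad: List[float], lr: float, max_norm: float) -> List[float]:
    """One SGD update using the clipped gradient."""
    clipped = clip_by_global_norm(grad, max_norm)
    return [p - lr * g for p, g in zip(params, clipped)]


# A spike of norm 5 is scaled down to norm 1 before the update is applied.
params = sgd_step([1.0, 1.0], [3.0, 4.0], lr=0.5, max_norm=1.0)
print(params)
```

With clipping in place, the step length is bounded by `lr * max_norm` regardless of how large an individual gradient spike is.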


2605.17778 2026-05-19 math.ST cs.LG stat.ME stat.ML stat.TH 版本更新

Self-Distillation is Optimal Among Spectral Shrinkage Estimators in Spiked Covariance Models

自蒸馏在尖峰协方差模型的谱收缩估计器中是最优的

Radu Lecoiu, Debarghya Mukherjee, Pragya Sur

发表机构 * Department of Statistics, Harvard University（哈佛大学统计学系）； Department of Mathematics & Statistics, Boston University（波士顿大学数学与统计学系）

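AI summary: This paper studies self-distillation in spiked covariance models, proves that $s$-step self-distillation is optimal among spectral shrinkage estimators when the covariance has $s$ spikes, and shows that it outperforms well-known estimators in statistics and machine learning.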

Comments 103 pages, 8 figures

详情

AI中文摘要

自蒸馏已成为提升现代机器学习系统中模型性能的一种有前景的技术。我们引入并分析一类广泛的估计器，即谱收缩估计器，从而建立了自蒸馏在尖峰协方差模型（spiked covariance model）中的统计基础。我们证明，对于具有$s$个尖峰的尖峰协方差矩阵，$s$步自蒸馏在谱收缩估计器中达到最优性能，优于统计学和机器学习中的知名估计器。此外，我们还证明$s$步对于最优性是必要的：对任意$1 \leq k \leq s$，$(s-k)$步蒸馏估计器都是严格次优的。对于各向同性协方差这一特殊子类，我们证明最优调参的岭回归在谱收缩估计器中表现最佳。我们还研究了一种联邦方法：多个数据中心共享谱收缩估计器，由一个公共服务器对其进行聚合以达到最优性能。在这种情况下，我们发现最佳的本地规则仍然具有自蒸馏的形式，但它不同于数据集中存放在单一服务器上时的最优规则。总之，我们的结果阐明了自蒸馏为何能提高预测性能，并提供了一个将其与经典收缩方法联系起来的更广泛的统计框架。

英文摘要

Self-distillation has emerged as a promising technique for improving model performance in modern machine learning systems. We develop the statistical foundations of self-distillation in spiked covariance models, by introducing and analyzing a broad class of estimators, namely spectral shrinkage estimators. We establish that for spiked covariance matrices with $s$ spikes, $s$-step self-distillation achieves optimal performance among spectral shrinkage estimators, outperforming well-known estimators in statistics and machine learning. Moreover, we show that $s$ steps are necessary for optimality: any $(s-k)$-step distilled estimator is strictly suboptimal for $1 \leq k \leq s$. For the special subclass of isotropic covariances, we show that optimally tuned Ridge regression performs best among spectral shrinkage estimators. We also study a federated approach where multiple data centers share spectral shrinkage estimators and a common server seeks to aggregate them to achieve optimal performance. In this case, we find that the best local rule again takes the form of self-distillation, though it differs from the optimal rule when data are hosted centrally on a single server. Together, our results elucidate why self-distillation improves predictive performance and provide a broader statistical framework connecting it with classical shrinkage-based methods.
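As background for the Ridge result, one standard way to see Ridge regression as a shrinkage of the spectrum is through the singular value decomposition of the design matrix. The notation below (design $X = \sum_i s_i u_i v_i^\top$, response $y$, penalty $\lambda \|\beta\|_2^2$ with $\lambda > 0$ and no $1/n$ normalization) is an assumption made for illustration; the paper's own definition of spectral shrinkage estimators may be parameterized differently.

$$\hat\beta_{\mathrm{OLS}} = \sum_{i:\, s_i > 0} \frac{1}{s_i}\,(u_i^\top y)\, v_i, \qquad \hat\beta_{\lambda} = (X^\top X + \lambda I)^{-1} X^\top y = \sum_{i:\, s_i > 0} \frac{s_i^2}{s_i^2 + \lambda} \cdot \frac{1}{s_i}\,(u_i^\top y)\, v_i .$$

Each singular direction of the minimum-norm least-squares solution is therefore multiplied by a factor $s_i^2/(s_i^2+\lambda) \in (0,1)$, which is small for weak directions and close to one for strong ones.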


2605.17765 2026-05-19 cs.LG 版本更新

AURORA: Contextual Orthogonalization for Geometric Representation Learning in Healthcare Foundation Models

AURORA：用于医疗基础模型中几何表示学习的上下文正交化

Yuanyun Zhang, Shi Li

发表机构 * University of the Chinese Academy of Sciences（中国科学院大学）； Columbia University（哥伦比亚大学）

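AI summary: This paper proposes AURORA, a framework that decomposes representations into orthogonal semantic subspaces based on contextual latent geometry. It targets the semantic opacity and contextual instability of latent representations in healthcare foundation models, and improves predictive performance and robustness under institutional distribution shift.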

详情

AI中文摘要

近年来，医疗基础模型通过大规模自监督学习实现了强大的预测性能，但其潜在表示经常将生理严重程度、干预强度、观察结构和机构工作流程整合到共享嵌入方向中。尽管在下游预测中有效，这些表示在上下文变化下仍然语义模糊且不稳定。我们引入AURORA，即通过正交化关系对齐的适应性不确定性感知表示，这是一种基于上下文潜在几何的医疗表示学习新框架。与优化单一统一嵌入流形不同，AURORA将表示分解为对应于不同上下文因素的正交语义子空间，并在每个子空间内学习关系一致性目标。这诱导出既语义解耦又几何可解释的潜在空间。在多个临床预测和检索任务中，AURORA在重建、对比和自蒸馏基线方面表现一致优于，同时显著提高了上下文解耦、邻域纯度和机构分布变化下的鲁棒性。我们的结果表明，潜在几何本身是医疗基础模型设计的重要轴线，且根据上下文语义显式结构化表示空间为传统预测压缩目标提供了补充方向。

英文摘要

Recent healthcare foundation models have achieved strong predictive performance through large scale self supervised learning, yet their latent representations frequently entangle physiologic severity, intervention intensity, observational structure, and institutional workflow into shared embedding directions. While effective for downstream prediction, such representations remain semantically opaque and unstable under contextual shift. We introduce AURORA, Adaptive Uncertainty aware Representations through Orthogonalized Relational Alignment, a new framework for healthcare representation learning based on contextual latent geometry. Rather than optimizing a single unified embedding manifold, AURORA decomposes representations into orthogonal semantic subspaces corresponding to distinct contextual factors and learns relational consistency objectives within each subspace. This induces latent spaces that are both semantically disentangled and geometrically interpretable. Across multiple clinical prediction and retrieval tasks, AURORA consistently outperforms reconstruction, contrastive, and self distillation baselines while substantially improving contextual disentanglement, neighborhood purity, and robustness under institutional distribution shift. Our results suggest that latent geometry itself constitutes an important axis of healthcare foundation model design and that explicitly structuring representation space according to contextual semantics provides a complementary direction beyond conventional predictive compression objectives.


2605.17761 2026-05-19 cs.SI cs.LG 版本更新

MV-Gate: Insider Threat Detection via Multi-View Behavioral Statistics and Semantic Modeling

MV-Gate：通过多视图行为统计与语义建模进行内部威胁检测

Kaichuan Kong, Dongjie Liu, Xiaobo Jin, Guanggang Geng

发表机构 * College of Cyber Security, Jinan University（暨南大学网络空间安全学院）； School of Advanced Technology, Xi’an Jiaotong-Liverpool University（西交利物浦大学先进技术学院）

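AI summary: This paper proposes MV-Gate, a framework that integrates behavioral statistical regularities with sequence semantics to detect gradual, low-visibility insider threats, improving the robustness of insider-threat detection.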

Comments Accepted by The 29th International Conference on Computer Supported Cooperative Work in Design (CSCWD 2026)

详情

AI中文摘要

内部威胁往往通过行为统计的早期异常（如复发模式的变化或短期与长期频率的转变）而非事件语义的变化来揭示。然而，随着领域从统计建模转向日志标记和深度序列编码，这些统计线索被削弱或丢失，导致当前模型对渐进性和低可见性内部行为不敏感。本文提出MV-Gate，一种多视图行为建模框架，明确整合统计规律与序列语义。MV-Gate构建了三个对齐的行为序列：活动标记、多尺度状态信号捕捉复发模式，以及频率偏差信号描述短期与长期强度差异。一个异常感知的门控机制将这些统计视图注入注意力计算，引导编码器强调统计不规则事件。在CERT r4.2、CERT r5.2和ADFA-LD上的实验表明，MV-Gate在经典、深度学习和领域特定基线模型上取得了显著提升，特别是在渐进性和弱信号威胁方面。这些结果强调了联合建模统计和序列证据对于鲁棒内部威胁检测的必要性。

英文摘要

Insider threats often reveal early anomalies through disruptions in behavioral statistics, such as altered recurrence patterns or short- versus long-term frequency shifts, rather than changes in event semantics. Yet, as the field has shifted from statistical modeling to log tokenization and deep sequential encoders, these statistical cues are weakened or lost, leaving current models insensitive to gradual and low-visibility insider behaviors. We propose MV-Gate, a multi-view behavior modeling framework that explicitly integrates statistical regularities with sequence semantics. MV-Gate constructs three aligned behavioral sequences: activity tokens, multi-scale status signals capturing recurrence patterns, and frequency-deviation signals describing short- vs long-term intensity differences. An anomaly-aware gating mechanism injects these statistical views into the attention computation, guiding the encoder to emphasize statistically irregular events. Experiments on CERT r4.2, CERT r5.2, and ADFA-LD show that MV-Gate achieves notable gains over classical, deep-learning, and domain-specific baselines, particularly for progressive, weak-signal threats. These results highlight the necessity of jointly modeling statistical and sequential evidence for robust insider-threat detection.


2605.17758 2026-05-19 cs.LG 版本更新

Memisis: Orchestrating and Evaluating Synthetic Data for Tabular Health Datasets

Memisis：协调和评估表格健康数据的合成数据

Nitish Nagesh, Mahdi Bagheri, Arshia Harish Puthran, Pengbao Zhou, Muhjaazee Love, Aadi Sharma, Ian Harris, Amir M. Rahmani

发表机构 * University of California Irvine（加州大学尔湾分校）

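AI summary: This paper presents Memisis, a tool that orchestrates and evaluates synthetic data by combining existing synthetic data tools, large language models, and state-of-the-art evaluation metrics, with the aim of supporting high-quality data for downstream prediction tasks and clinical decision making.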

详情

AI中文摘要

合成数据在医疗领域被广泛用于创建与原始数据相似但不存在隐私顾虑的数据集。从隐私、效用和公平性三个方面生成并评估合成数据，对于为下游预测任务和临床决策提供高质量数据至关重要。我们提出Memisis，一个借助现有合成数据工具、大语言模型的能力以及最先进的评估指标来编排和评估合成数据的工具。该工具为数据生成、验证和评估建立了统一的工作流。用户可以控制训练集大小、训练轮数以及要采样的合成行数。交互式智能体不要求用户逐一调节合成数据的参数，而是让用户说明其合成数据生成目标，再由工具调用现有工具并执行必要的评估来编排工作流。在演示中，我们使用一个带有种族和性别相关受保护属性的开源精神分裂症数据集、三种不同的合成器以及一个本地语言模型来编排工作流。我们观察到CTGAN、TVAE和GaussianCopula在公平性和效用指标上表现相当。该工作流让用户在数据生成和评估过程中拥有灵活性和控制权。

英文摘要

Synthetic data is widely used in healthcare to create datasets that are similar to original data but without the privacy concerns. Generating and evaluating synthetic data across privacy, utility and fairness is crucial for facilitating high quality data availability for downstream prediction tasks and clinical decision making. We present Memisis, a tool that orchestrates and evaluates synthetic data by leveraging existing synthetic data tools, the power of large language models and state-of-the-art evaluation metrics. Our tool creates a unified workflow for data generation, validation and evaluation. Users have control over the training size, training epochs and the number of synthetic rows to sample. Instead of knobs to tune synthetic data, the interactive agent allows users to specify their synthetic data generation goals and the tool will orchestrate the workflow by leveraging existing tools while performing the requisite evaluation. For the demo, we use an open source schizophrenia dataset with protected attributes related to race and gender, three different synthesizers and a local language model to orchestrate the workflow. We observe that CTGAN, TVAE and GaussianCopula have comparable performance across fairness and utility metrics. The workflow allows users flexibility and control over the data generation and evaluation process.


2605.17757 2026-05-19 cs.LG cs.AI cs.DC cs.PF 版本更新

OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization

OSCAR: 2位KV缓存量化中的离线频谱协方差感知旋转

Zhongzhu Zhou, Donglin Zhuang, Jisen Li, Ziyan Chen, Shuaiwen Leon Song, Ben Athiwaratkun, Xiaoxia Wu

发表机构 * Together AI ； University of Sydney（悉尼大学）； University of Illinois Urbana-Champaign（伊利诺伊大学厄巴纳-香槟分校）

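AI summary: This paper proposes OSCAR, which estimates attention-aware covariance structures offline to make 2-bit KV-cache quantization both accurate and efficient, and develops a deployable system that improves the performance and efficiency of LLM serving frameworks.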

Comments 35 pages, 10 figures

详情

AI中文摘要

INT2 KV-cache量化对于长上下文LLM服务具有吸引力，但实现准确性和可部署性仍然具有挑战。简单的旋转如Hadamard变换可以减少异常值，但仍然在INT2层面失效，因为它们与下游注意力不对齐。我们提出了OSCAR，一种超低比特KV缓存量化方法，通过离线估计注意力感知的协方差结构，并利用这些结构推导出固定旋转和截断阈值用于量化。这样，KV量化就与注意力实际消耗的协方差结构对齐。更重要的是，我们不仅提供了理论依据，还开发了一个完全可部署的OSCAR系统，包含一个定制的INT2注意力内核，该内核与分页KV缓存服务和融合内核流水线保持兼容，从而无缝集成到现代LLM服务框架中，如SGLang和vLLM。我们评估了我们的方法在最近的推理模型上，使用最多32k token的推理轨迹进行跨5个任务的测试。在Qwen3-4B-Thinking-2507和Qwen3-8B上，OSCAR将BF16精度差距分别减少到3.78和1.42个点，而朴素旋转INT2几乎归零。我们进一步将OSCAR扩展到Qwen3-32B和GLM-4.7（358B参数），其中它仍然与BF16保持有效相当。在长上下文-RULER-NIAH（最多128K）上，OSCAR在Qwen3模型上保持稳健，而朴素旋转INT2崩溃。从系统层面来看，OSCAR将KV缓存内存减少约8倍，在相同内存预算下，大批次大小下吞吐量提高最多7倍，并且由于内存带宽开销减少，单批次解码速度比BF16快最多3倍。

英文摘要

INT2 KV-cache quantization is attractive for long-context LLM serving, but it remains difficult to make both accurate and deployable. Simple rotations such as Hadamard transforms reduce outliers, but still degrade at INT2 because they are not aligned with downstream attention. We propose OSCAR, an Ultra-low-bit KV Cache quantization method that estimates attention-aware covariance structures offline and uses them to derive fixed rotations and clipping thresholds for quantization. In this way, it aligns KV quantization with the covariance structures that attention actually consumes. More importantly, we not only provide theoretical justification but also develop a fully deployable OSCAR system with a custom INT2 attention kernel that remains compatible with paged KV-cache serving and fused kernel pipelines, enabling seamless integration into modern LLM serving frameworks such as SGLang and vLLM. We evaluate our methods on recent reasoning models with reasoning traces of up to 32k tokens across 5 tasks. On Qwen3-4B-Thinking-2507 and Qwen3-8B, OSCAR reduces the BF16 accuracy gap to 3.78 and 1.42 points, respectively, while naive rotation INT2 collapses to nearly zero. We further scale OSCAR to Qwen3-32B and GLM-4.7 (358B params), where it remains effectively on par with BF16. On long context - RULER-NIAH up to 128K, OSCAR remains robust on both Qwen3 models, while naive rotation INT2 collapses. System-wise, OSCAR reduces KV-cache memory by approximately 8x, improves throughput by up to 7x at large batch sizes under the same memory budget, and accelerates batch-size-1 decoding by up to 3x over BF16 due to reduced memory bandwidth overhead.
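To make the baseline in this abstract concrete, the sketch below applies an orthonormal Hadamard rotation to a vector with one outlier and then quantizes it to four levels (2 bits). This is the simple rotation the authors describe as insufficient at INT2, not OSCAR's covariance-aware rotation, whose construction the abstract does not specify. All function names and the example vector are illustrative assumptions.

```python
import math
from typing import List, Tuple


def hadamard_rotate(x: List[float]) -> List[float]:
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two.

    With the 1/sqrt(n) scaling the transform is its own inverse.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(x)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in out]


def quantize_int2(x: List[float], clip: float) -> Tuple[List[int], float]:
    """Uniform 4-level quantizer on [-clip, clip]; returns codes in 0..3 and the step."""
    step = 2.0 * clip / 3.0
    codes = []
    for v in x:
        v = max(-clip, min(clip, v))
        codes.append(int(round((v + clip) / step)))
    return codes, step


def dequantize_int2(codes: List[int], step: float, clip: float) -> List[float]:
    """Map 2-bit codes back to their reconstruction levels."""
    return [c * step - clip for c in codes]


x = [8.0, 0.1, -0.2, 0.05]           # one large outlier
rotated = hadamard_rotate(x)         # energy is spread across all coordinates
clip = max(abs(v) for v in rotated)
codes, step = quantize_int2(rotated, clip)
restored = hadamard_rotate(dequantize_int2(codes, step, clip))
print(rotated, codes, restored)
```

The rotation lowers the largest magnitude the quantizer has to cover, which is why it helps with outliers; it does nothing to align the quantization error with what attention consumes, which is the gap the abstract says OSCAR addresses.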


2605.17749 2026-05-19 cs.LG stat.ML 版本更新

Testable and Actionable Calibration for Full Swap Regret

可检验且可操作的全面交换懊悔校准

Konstantina Bairaktari, Lunjia Hu, Huy L. Nguyen, Jonathan Ullman

发表机构 * Department of Computer Science, Aarhus University（奥胡斯大学计算机科学系）； Khoury College of Computer Sciences, Northeastern University（东北大学Khoury计算机科学学院）； Northeastern University（东北大学）

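AI summary: This paper proposes SCDL, a new calibration measure that is fully actionable and testable without weakening either requirement, satisfies desirable properties such as continuity and consistency, and is shown experimentally to perform better in practice.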

详情

AI中文摘要

人工智能生成的预测越来越多地影响关键任务中的决策制定，因此必须具有可信度。校准是衡量可信度的一种广泛使用的度量标准，要求预测与真实频率匹配，并可以像真实概率一样对待某一结果。然而，定义校准是微妙的，设计良好的校准误差度量标准一直是最近研究的活跃主题。第一个目标是找到可操作的校准度量标准，即能够向决策者说明当预测被视为真实概率时的效用损失，这被称为交换懊悔。第二个目标是找到可检验的校准度量标准，即校准误差可以从少量预测和结果中测量出来。尽管这些是基本要求，但目前没有现有的校准度量标准能够完全满足这两个属性，所有现有的度量标准都通过限制交换懊悔的弱化观念来放松可操作性，或通过具有次优估计误差来放松可检验性。我们介绍了一种新的校准度量标准，称为软分箱校准决策损失（SCDL），我们证明其在不削弱任何要求的前提下是完全可操作的，并且可检验性具有几乎最优的误差率。此外，SCDL还满足其他理想属性，如连续性和一致性。我们还提供了一组实验，证明了SCDL与其他度量标准的理论优势在实践中导致更好的性能。

英文摘要

AI generated predictions increasingly inform decision making in critical tasks, and therefore must be trustworthy. One widely used measure of trustworthiness is calibration, which requires that the predictions match the true frequencies and can be treated like real probabilities of a given outcome. However, defining calibration is subtle, and designing good measures of calibration error has been an active topic of recent research. The first goal is to find calibration measures that are actionable, meaning they can inform decision makers about their utility loss when predictions are treated as true probabilities, which is known as swap regret. The second goal is to find calibration measures that are testable, meaning that calibration error can be measured from a small sample of predictions and outcomes. Although these are very basic requirements, there is no existing calibration measure that fully satisfies both properties, and all existing measures relax actionability by bounding a weaker notion of swap regret, or relax testability by having suboptimal estimation error. We introduce a new calibration measure, Soft-Binned Calibration Decision Loss (SCDL), which we prove is fully actionable without weakening either requirement, and testable with nearly optimal error rate. In addition, SCDL satisfies other desired properties such as continuity and consistency. We also provide a set of experiments confirming that the theoretical advantages of SCDL compared to other measures lead to better performance in practice.
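For orientation, the sketch below computes the classical hard-binned expected calibration error (ECE), one of the existing kinds of calibration measure that work in this area compares against. It is not SCDL, whose definition is not given in the abstract; the function name, bin count, and toy inputs are assumptions.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10) -> float:
    """Hard-binned ECE: weighted average of |mean prediction - outcome frequency| per bin."""
    if len(probs) != len(outcomes) or len(probs) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        freq = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(mean_p - freq)
    return ece


# Predictions of 0.8 that come true 4 times out of 5 are well calibrated.
print(expected_calibration_error([0.8] * 5, [1, 1, 1, 1, 0]))
# Predictions of 0.9 that never come true are badly calibrated.
print(expected_calibration_error([0.9, 0.9], [0, 0]))
```

Because a prediction jumps from one bin to the next at a bin edge, this measure can change abruptly under tiny changes to the predictions, which is one reason continuity is listed among the desired properties.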


2605.17745 2026-05-19 stat.ML cs.LG 版本更新

StatQAT: Statistical Quantizer Optimization for Deep Networks

StatQAT: 深度网络的统计量化优化

Mehmet Aktukmak, Daniel Huang, Ke Ding

发表机构 * Intel（英特尔）

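AI summary: This paper presents a new statistical error analysis framework for uniform and floating-point quantization that gives theoretical insight into error behavior across quantization configurations. Building on it, the authors propose iterative quantizers for arbitrary data distributions and analytic quantizers for Gaussian-like weight distributions, enabling efficient, low-error quantization of both activations and weights. Integrated into quantization-aware training and evaluated on integer and floating-point formats, the quantizers improve accuracy and stability.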

详情

AI中文摘要

量化对于降低深度神经网络的计算成本和内存占用至关重要，使其能够在低精度硬件上高效推理。尽管均匀量化和浮点量化方案的应用日益广泛，选择最优量化参数仍是一个关键挑战，对训练和推理过程中遇到的多样化数据分布尤其如此。本文提出了一个针对均匀量化和浮点量化的新型统计误差分析框架，为不同量化配置下的误差行为提供理论洞察。基于这一分析，我们提出了面向任意数据分布的迭代量化器，以及针对类高斯权重分布的解析量化器。这些方法实现了高效、低误差的量化，对激活和权重均适用。我们将这些量化器整合到量化感知训练中，并在整数和浮点格式上进行评估。实验表明准确率和稳定性均有提升，凸显了我们的方法在训练低精度神经网络方面的有效性。

英文摘要

Quantization is essential for reducing the computational cost and memory usage of deep neural networks, enabling efficient inference on low-precision hardware. Despite the growing adoption of uniform and floating-point quantization schemes, selecting optimal quantization parameters remains a key challenge, particularly for diverse data distributions encountered during training and inference. This work presents a novel statistical error analysis framework for uniform and floating-point quantization, providing theoretical insight into error behavior across quantization configurations. Building on this analysis, we propose iterative quantizers designed for arbitrary data distributions and analytic quantizers tailored for Gaussian-like weight distributions. These methods enable efficient, low-error quantization suitable for both activations and weights. We incorporate our quantizers into quantization-aware training and evaluate them across integer and floating-point formats. Experiments demonstrate improved accuracy and stability, highlighting the effectiveness of our approach for training low-precision neural networks.
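The parameter-selection problem described here can be illustrated with a symmetric uniform quantizer whose clipping threshold is chosen to minimize mean squared error. The brute-force grid search below is a stand-in chosen for simplicity; it is not the iterative or analytic quantizers proposed in the paper, and every name in it is hypothetical.

```python
from typing import List, Tuple


def quantize_uniform(x: List[float], clip: float, bits: int) -> List[float]:
    """Clip to [-clip, clip] and round to one of 2**bits evenly spaced levels."""
    levels = 2 ** bits - 1
    step = 2.0 * clip / levels
    out = []
    for v in x:
        clipped = max(-clip, min(clip, v))
        out.append(round((clipped + clip) / step) * step - clip)
    return out


def mse(a: List[float], b: List[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def search_clip(x: List[float], bits: int, candidates: int = 50) -> Tuple[float, float]:
    """Try `candidates` thresholds in (0, max|x|] and return the (clip, mse) with lowest error."""
    max_abs = max(abs(v) for v in x)
    if max_abs == 0.0:
        raise ValueError("all-zero input has no meaningful clipping threshold")
    best_clip, best_err = max_abs, mse(x, quantize_uniform(x, max_abs, bits))
    for k in range(1, candidates):
        clip = max_abs * k / candidates
        err = mse(x, quantize_uniform(x, clip, bits))
        if err < best_err:
            best_clip, best_err = clip, err
    return best_clip, best_err


# A narrow bulk of values plus a single outlier (hypothetical data).
data = [0.1 * i for i in range(-10, 11)] + [10.0]
print(search_clip(data, bits=2))
```

On data like this, covering the full range wastes most of the four levels on empty space, so the search prefers a tighter threshold that sacrifices the outlier to represent the bulk more accurately.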


2605.17733 2026-05-19 cs.AI cs.LG 版本更新

玩具组合可解释性模型揭示早期特征空间中的彩票

Alon Bebchuk, Nir Shavit

发表机构 * Tel-Aviv University（特拉维夫大学）； MIT and Red Hat AI（麻省理工学院和红帽AI）

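AI summary: This paper studies how the lottery ticket hypothesis manifests in early feature space. Using a combinatorial toy model, it identifies what a winning ticket preserves and suggests that lottery ticket structure is governed by hidden feature-space geometry rather than weight-space subnetwork identity.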

详情

AI中文摘要

彩票假说认为，密集网络中包含稀疏子网络，即“中奖彩票”（winning tickets）；将其权重回退到初始值并单独重新训练后，其性能可与完整模型相当。我们提出一个更具机制性的问题：中奖彩票保留的究竟是什么内部对象？我们采用一个组合式、子句结构的玩具设置，它允许一种可解释的特征空间表示，且特征之间具有定义明确的组合距离。我们表明，权重空间中的中奖彩票对应于特征空间中的前驱位置，这些位置在初始化时就已接近最终的特征通道编码。密集SGD通过结构化选择来处理这些位置：邻近位置要么收敛到最终编码，要么被拒绝，而拒绝集中出现在更拥挤的神经元上，这表明叠加（superposition）下存在竞争。因此，中奖彩票是一族相互兼容的编码位置，它们共同在接近最终编码与低特征间干扰之间取得平衡。稀疏重训练往往在不同的行上重新表达同一子句/模板族，因此被保留的对象是族层面的，而非微观的行身份。我们用基于特征空间距离和运动的轻量级探针验证了这一解释；在我们的设置中，这些探针在准确率和精确编码恢复两方面都经常优于已有的基于权重的彩票发现方法。尽管这些发现基于玩具设置，但它们表明彩票结构由隐藏的特征空间几何而非权重空间子网络身份决定。

英文摘要

The lottery ticket hypothesis posits that dense networks contain sparse subnetworks, "winning tickets," that, when rewound to their initial weights and retrained in isolation, match the performance of the full model. We ask a more mechanistic question: what internal object does a winning ticket preserve? We work in a combinatorial, clause-structured toy setting that admits an interpretable feature-space representation with well-defined combinatorial distances between features. We show that winning tickets in weight space correspond to precursor locations in feature space that are already near, at initialization, to the final feature-channel codes. Dense SGD resolves these locations through structured selection: proximal locations either converge to final codes or are rejected, with rejection concentrated at more crowded neurons, implicating competition under superposition. A winning ticket is thus a family of compatible code locations that jointly balance proximity to final codes with low inter-feature interference. Sparse retraining often re-expresses the same clause/template family on a different row, so the preserved object is family-level rather than microscopic row identity. We validate this account with lightweight probes based on feature-space distance and motion; in our setting, these probes frequently outperform established weight-based ticket discovery methods in both accuracy and exact code recovery. Although these findings are grounded in a toy setting, they suggest that the lottery ticket structure is governed by hidden feature-space geometry rather than weight-space subnetwork identity.


2605.17698 2026-05-19 cs.LG cs.MA 版本更新

Agent Bazaar: Enabling Economic Alignment in Multi-Agent Marketplaces

Agent Bazaar: 使多智能体市场场所具备经济对齐能力

Seth Karten, Cameron Crow, Chi Jin

发表机构 * Princeton University（普林斯顿大学）

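AI summary: This work proposes Agent Bazaar, a framework for evaluating the economic alignment of multi-agent systems. Analyzing two failure modes (Algorithmic Instability and Sybil Deception), it finds that models largely fail to self-regulate, and it proposes a training method for economic alignment together with the Economic Alignment Score (EAS).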

Comments 17 pages, 9 figures

详情

AI中文摘要

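将大型语言模型（LLM）部署为自主经济智能体会带来系统性风险，这些风险超出了单个能力故障的范围。随着智能体转向直接与市场交互，其集体行为可能放大波动，并大规模掩盖欺骗。我们引入Agent Bazaar，一个用于评估经济对齐（Economic Alignment）的多智能体模拟框架；经济对齐指智能体系统维持市场稳定和诚信的能力。我们识别出两种失败模式：（1）B2C市场中的算法不稳定（“崩盘”），即企业不断放大价格波动直至市场崩溃；（2）C2C市场中的女巫欺骗（“柠檬市场”），即单个欺骗性智能体控制多个协同的卖家身份，用欺诈性商品信息充斥市场，侵蚀信任和消费者福利。我们在两种场景下评估了前沿模型和开放权重模型，发现模型大多无法自我调节，且失败的严重程度因模型而异，而非由规模决定。我们提出了经济对齐的约束框架（harness），即“稳定企业”（Stabilizing Firms）和“怀疑守护者”（Skeptical Guardians），它们能改善结果，但在更严苛的市场条件下仍然脆弱。为弥补这一差距，我们使用REINFORCE++和自适应课程训练智能体，得到的9B模型优于所有被评估的前沿模型和开放权重模型。我们提出经济对齐分数（EAS），这是一个由稳定性、诚信、福利和盈利能力四个分量聚合而成的标量指标，可用于直接的跨模型比较。我们的结果表明，经济对齐与通用能力正交，并且可以通过有针对性的强化学习直接训练。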

英文摘要

The deployment of Large Language Models (LLMs) as autonomous economic agents introduces systemic risks that extend beyond individual capability failures. As agents transition to directly interacting with marketplaces, their collective behavior can amplify volatility and mask deception at scale. We introduce the Agent Bazaar, a multi-agent simulation framework for evaluating Economic Alignment, the capacity of agentic systems to preserve market stability and integrity. We identify two failure modes: (1) Algorithmic Instability in a B2C market ("The Crash"), where firms amplify price volatility until the market collapses, and (2) Sybil Deception in a C2C market ("The Lemon Market"), where a single deceptive agent controlling multiple coordinated seller identities floods the market with fraudulent listings, eroding trust and consumer welfare. We evaluate frontier and open-weight models across both scenarios and find that models largely fail to self-regulate, with failure severity varying by model rather than by size. We propose economically aligned harnesses, Stabilizing Firms and Skeptical Guardians, that improve outcomes but remain fragile under harder market conditions. To close this gap, we train agents with REINFORCE++ using an adaptive curriculum, producing a 9B model that outperforms all evaluated frontier and open-weight models. We propose the Economic Alignment Score (EAS), a 4-component scalar metric aggregating stability, integrity, welfare, and profitability, enabling direct cross-model comparison. Our results show that economic alignment is orthogonal to general capability and can be directly trained with targeted RL.


2605.17693 2026-05-19 cs.LG cs.AI 版本更新

Fine-tuning Pocket-Aware Diffusion Models via Denoising Policy Optimization

通过去噪策略优化微调口袋感知扩散模型

Yuan Xue, Daniel Kudenko, Megha Khosla

发表机构 * L3S Research Center（L3S研究中心）； Delft University of Technology（代尔夫特理工大学）

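AI summary: This paper proposes DEPPA, a method built on Denoising Diffusion Policy Optimization that fine-tunes a pre-trained pocket-aware diffusion model via reinforcement learning to optimize multiple properties, including binding affinity, drug-likeness, synthesizability, and diversity.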

详情

AI中文摘要

口袋感知的3D生成模型加速了基于结构的药物设计，但大多数方法主要拟合训练分布，可能难以满足真实治疗药物发现所需的多种属性。近来，基于结构的分子优化（SBMO）受到越来越多的关注，其目标是对多个指定的分子属性进行细粒度控制。本文提出DEPPA，一种新的SBMO方法，它基于去噪扩散策略优化（Denoising Diffusion Policy Optimization），通过强化学习微调预训练的口袋感知扩散模型。DEPPA能够同时优化多种属性，包括结合亲和力、类药性、可合成性和多样性。我们将预训练口袋感知扩散模型的反向去噪过程表述为多步马尔可夫决策过程，其中作为奖励信号的目标属性在最终生成的配体分子上进行评估。DEPPA在强化学习微调期间采用粗粒度去噪调度器，以实现高效且有效的分子优化。在CrossDocked2020基准上的实验结果表明，DEPPA在结合亲和力（Vina Score -8.5 kcal/mol）、类药性和多样性方面优于基线，同时在可合成性方面表现出有竞争力的性能。源代码见 https://github.com/xy9485/DePPA 。

英文摘要

Structure-based drug design has been accelerated by pocket-aware 3D generative models, yet most methods primarily fit the training distribution and may fall short of satisfying multiple properties required in real-world therapeutic drug discovery. Recently, increasing attention has focused on structure-based molecule optimization (SBMO), which targets fine-grained control over multiple specified molecular properties. In this paper, we present DEPPA, a novel SBMO approach building upon Denoising Diffusion Policy Optimization for fine-tuning a pre-trained pocket-aware diffusion model via reinforcement learning. DEPPA enables optimization over multiple properties, including binding affinity, drug-likeness, synthesizability and diversity. We formulate the reverse denoising process of the pretrained pocket-aware diffusion model as a multi-step Markov Decision Process, where the desired properties that serve as reward signals are evaluated on the final generated ligand molecules. DEPPA incorporates a coarse denoising scheduler during the RL fine-tuning to achieve efficient and effective molecule optimization. Experimental results on the CrossDocked2020 benchmark demonstrate that DEPPA outperforms baselines in binding affinity (Vina Score -8.5 kcal/mol), drug-likeness and diversity while exhibiting competitive performance in synthesizability. The source code is available at https://github.com/xy9485/DePPA .


2605.17692 2026-05-19 cs.LG math.OC 版本更新

Exact Convex Reformulations of Linear Neural Networks via Completely Positive Lifting

通过完全正提升实现线性神经网络的精确凸改写

Karthik Prakhya, Alp Yurtsever

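AI summary: This paper proposes an exact reformulation of the training problem for deep linear neural networks as a convex optimization problem. It lifts the problem into the space of the completely positive cone, encodes the nonconvexity in the conic constraint, and shows connections to semidefinite programming.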


反事实解释在概念漂移下的应用

Marcin Kostrzewa, Jerzy Stefanowski, Maciej Zięba

发表机构 * Wrocław University of Science and Technology（弗罗茨瓦夫科技大学）； Poznań University of Technology（波兹南工业大学）

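AI summary: This paper studies how to keep counterfactual explanations valid in environments where data continually change, and proposes a lightweight update scheme that repairs existing explanations while keeping them close to the original instance.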

2605.17642 2026-05-19 cs.LG 版本更新

SynVA：一种用于血管生成和动脉瘤编辑的模块化工具包

Marten J. Finck, Niklas C. Koser, Sarker M. Mahfuz, Tameem Jahangir, Jon E. Wilhelm, Daniel Behme, Naomi Larsen, Wojtek Palubicki, Sylvia Saalfeld, Sören Pirk

发表机构 * Visual Computing and Artificial Intelligence, Kiel University, Germany（视觉计算与人工智能研究所，基尔大学，德国）； Institute for Medical Informatics and Statistics, Kiel University, Germany（医学信息学与统计研究所，基尔大学，德国）； Clinic for Neuroradiology, Medical Faculty, Magdeburg University, Germany（神经放射科，马格德堡大学医学学院，德国）； Department of Radiology and Neuroradiology, University Hospital Schleswig-Holstein, Germany（放射学与神经放射学部门，石勒苏益格-荷尔斯泰因大学医院，德国）； Faculty of Mathematics and Computer Science, Adam Mickiewicz University, Poland（数学与计算机科学学院，亚当·密茨凯维奇大学，波兰）

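AI summary: This paper presents SynVA, a modular toolkit for vascular mesh generation and anatomically consistent aneurysm synthesis. It combines novel flow-matching-based methods with learning-based approaches to generate realistic vessel geometries and anatomically plausible aneurysms, and releases a large-scale labeled dataset to support medical image analysis.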

详情

AI中文摘要

颅内动脉瘤（IAs）以不可预测的生长和破裂风险为特征，是导致中风的主要原因，可能引发致命性出血，具有高死亡率和长期残疾。随着人口老龄化，脑血管疾病的发病率和整体负担预计会增加，凸显了需要可扩展的方法来分析复杂的医疗数据并提高对这些疾病的群体层面理解的必要性。尽管数字孪生和深度学习为提高诊断、预后和治疗提供了有希望的途径，但其效果受到大规模高质量医疗数据和相应标签稀缺的限制。我们提出了SynVA，一种用于血管网格生成和解剖学一致动脉瘤合成的模块化工具包。SynVA结合了基于流匹配的新型方法生成健康血管网格与基于学习的方法生成解剖条件下的动脉瘤网格——动脉瘤是从已有的血管几何结构计算而来的，而不是孤立生成。此外，我们引入了基于生理学原理和统计先验的SynVA过程模型，用于血管和动脉瘤合成，从而能够生成大规模数据集（例如用于训练基于网格的生成模型）。为此，我们发布了包含50,000个完全标注网格样本的数据集，用于各种下游视觉任务，如语义分割。广泛的定量和定性评估证明了SynVA能够生成逼真的血管几何和解剖学合理的动脉瘤。具体而言，我们的实验表明，某些方法生成的动脉瘤形状更符合专家人类感知，而其他方法在定量相似性度量上与真实动脉瘤的重建表现更优。

英文摘要

Intracranial aneurysms (IAs), characterized by unpredictable growth and risk of rupture, are a major cause of stroke and can lead to life-threatening hemorrhages with high mortality and long-term disability. With aging populations, the incidence and overall burden of cerebrovascular diseases are expected to increase, highlighting the need for scalable approaches to analyze complex medical data and improve population-level understanding of these conditions. While digital twins and deep learning offer promising avenues for improving diagnosis, prognosis, and treatment, their effectiveness is limited by the scarcity of large-scale, high-quality medical data and corresponding labels. We present Synthetic VAsculature (SynVA), a modular toolkit for vascular mesh generation and anatomically consistent aneurysm synthesis. SynVA combines novel flow-matching-based methods for generating healthy vessel meshes with learning-based approaches for anatomy-conditioned aneurysm mesh generation - aneurysms are computed from pre-existing vascular geometries rather than being generated in isolation. In addition, we introduce the SynVA procedural model for vascular and aneurysm synthesis based solely on physiological principles and statistical priors, which enables the generation of large-scale datasets (e.g., for the training of mesh-based generative models). To this end, we release a dataset of 50,000 fully labeled mesh samples for a variety of downstream vision tasks, such as semantic segmentation. Extensive quantitative and qualitative evaluations demonstrate that SynVA generates realistic vessel geometries and anatomically plausible aneurysms. Specifically, our experiments indicate that some methods produce aneurysm shapes more aligned with expert human perception while others perform better on quantitative similarity metrics with reconstructions of real aneurysms.


2605.17613 2026-05-19 cs.AR cs.LG 版本更新

Longwang: 一种基于潜在生成先验的零样本全球时空降水降尺度方法

Yue Wang, Daniele Visioni

发表机构 * Department of Earth and Atmospheric Sciences（地球与大气科学系）

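AI summary: This paper proposes Longwang, which combines a learned context-conditioned latent generative prior with a physically informed observation operator to downscale precipitation from monthly to daily resolution. It outperforms standard posterior sampling with an unconditional prior at reconstructing fine-scale spatial patterns, preserving temporal coherence, and recovering extreme precipitation intensities, and it generalizes to historical climate simulations and future climate projections.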

详情

AI中文摘要

高分辨率降水信息对于气候影响评估至关重要，但全球气候模型仍然过于粗糙，无法解析关键的小尺度过程。现有的机器学习降尺度方法通常需要配对的低分辨率和高分辨率数据进行监督学习，在推理过程中受限于固定区域或尺度因子，并且在物理空间中训练和运行计算成本较高。本文介绍Longwang，一种用于全球时空降水降尺度的零样本潜在生成框架。Longwang学习了一个条件化的潜在生成先验，并通过后验采样与物理信息观测算子结合，使从月尺度O(100 km)输入生成日尺度O(10 km)降水场成为可能。在ERA5再分析数据上，Longwang在重建细尺度空间模式、保持时间一致性以及恢复极端降水强度方面优于标准后验采样方法。该框架进一步能够泛化到历史气候模拟和未来气候预测，在显著的分布偏移下仍保持有效性。

英文摘要

High-resolution precipitation information is essential for climate impact assessment, yet global climate models remain too coarse to resolve key small-scale processes. Existing machine learning downscaling methods often require paired low- and high-resolution data for supervised learning, are tied to fixed regions or scale factors during inference, and can be computationally expensive to train and run in physical space. Here we introduce Longwang, a zero-shot latent generative framework for global spatiotemporal precipitation downscaling. Longwang learns a context-conditioned latent generative prior and combines it with a physically informed observation operator through posterior sampling, enabling daily O(10 km) precipitation fields to be generated from monthly O(100 km) inputs. On ERA5 reanalysis, Longwang outperforms standard posterior sampling with an unconditional generative prior in reconstructing fine-scale spatial patterns, preserving temporal coherence, and recovering extreme precipitation intensities. The framework further generalizes to historical climate simulations and future climate projections under substantial distribution shift.


2605.17590 2026-05-19 cs.LG math.OC 版本更新

Form and Function: Machine Unlearning as a Problem of Misaligned States

形式与功能：将机器去学习视为不一致状态的问题

Kennon Stewart

发表机构 * Second Street Labs, Detroit, MI, USA（第二街实验室，密歇根州底特律）； Department of Statistics, University of Michigan, Ann Arbor, MI, USA（密歇根大学统计系，密歇根州安阿伯）

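AI summary: This paper formulates machine unlearning for online L-BFGS as a counterfactual state-alignment problem. Using state-aware metrics and a counterfactual oracle model, it shows that unlearning is not merely a parameter-correction problem but requires alignment with a realizable counterfactual optimizer state.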

详情

AI中文摘要

我们把在线L-BFGS的机器去学习问题建模为反事实状态对齐问题。给定一个实际事件流和一个经过删除编辑的反事实流，去学习的目标是确定在从未处理过被删除样本的情况下会产生的优化器状态。我们引入了状态感知度量，分别衡量参数误差、内存运算符误差、综合状态误差和更新方向误差。内存度量比较由o-L-BFGS内存引起的逆Hessian作用，而不是将曲率对视为有限影响。在凸性假设下，我们推导出反事实状态偏差的递归界。然后，我们评估了一个状态感知的删除干预基准，包括仅内存和仅参数的修正，与反事实 oracle 模型进行比较。这些结果表明，在线L-BFGS的去学习不仅仅是参数修正问题：它需要与可实现的反事实优化器状态对齐。

英文摘要

We formulate machine unlearning for online L-BFGS as a counterfactual state-alignment problem. Given an actual event stream and a deletion-edited counterfactual stream, the target of unlearning is the optimizer state that would have arisen had the deleted samples never been processed. We introduce state-aware metrics that separately measure parameter error, memory-operator error, combined state error, and update-direction error. The memory metric compares the inverse-Hessian actions induced by the o-L-BFGS memory, rather than treating curvature pairs as of finite influence. Under convexity assumptions, we derive a recursive bound on counterfactual state deviation. We then evaluate a state-aware benchmark of deletion interventions, including memory-only and parameter-only corrections, against a counterfactual oracle model. These results show that unlearning for online L-BFGS is not merely a parameter-correction problem: it requires alignment with a realizable counterfactual optimizer state.
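The phrase "inverse-Hessian actions induced by the memory" refers to the fact that L-BFGS never stores a matrix; its curvature pairs define a linear operator that is applied to the gradient. The sketch below is the standard two-loop recursion in its textbook batch form, not the paper's online variant or its metrics; the function names and the two-dimensional example are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def lbfgs_apply(grad: Vector, memory: Sequence[Tuple[Vector, Vector]]) -> Vector:
    """Two-loop recursion: returns H @ grad for the inverse-Hessian approximation H
    implied by the curvature pairs (s, y) in `memory`, ordered oldest first."""
    q = list(grad)
    stored = []  # (alpha, rho) per pair, newest first
    for s, y in reversed(memory):
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        stored.append((alpha, rho))
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
    if memory:
        s_last, y_last = memory[-1]
        gamma = dot(s_last, y_last) / dot(y_last, y_last)  # initial scaling
    else:
        gamma = 1.0
    r = [gamma * qi for qi in q]
    for (s, y), (alpha, rho) in zip(memory, reversed(stored)):
        beta = rho * dot(y, r)
        r = [ri + (alpha - beta) * si for ri, si in zip(r, s)]
    return r


# Curvature pairs consistent with the quadratic Hessian diag(2, 4).
pair_a = ([1.0, 0.0], [2.0, 0.0])
pair_b = ([0.0, 1.0], [0.0, 4.0])
grad = [2.0, 4.0]
print(lbfgs_apply(grad, [pair_a, pair_b]))  # memory with both pairs
print(lbfgs_apply(grad, [pair_a]))          # same gradient, one pair removed
```

The two calls use the same gradient but return different directions, which illustrates the abstract's point: two optimizers with identical parameters can still disagree on the next update if their memories differ.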


2605.17582 2026-05-19 cs.LG cs.CE (updated version)

Scale-Equivariant Generative Forecasting: Weight-Tied Dilated Convolutions, Wavelet Scattering Inputs, and Spectral-Consistency Training for Self-Similar Time Series

Andrea Morandi

Affiliations: Cisco Systems, Inc.

AI summary: This paper proposes a scale-equivariant generative forecasting method for self-similar time series, built from weight-tied dilated convolutions, wavelet scattering inputs, and spectral-consistency training, and reports superior performance on S&P 500 daily returns.

Abstract

Many natural and engineered time series -- equity returns, climate anomalies, turbulent velocities, neural recordings, packet-level network traffic -- are approximately self-similar: their horizon-$T$ distribution is tied to the horizon-$1$ distribution by one scaling exponent $H$. Standard deep generative sequence models (transformers, dilated TCNs, the WaveNet family) ignore this. Their receptive fields are wide, but kernel parameters live independently at every dilation level, yielding a multi-scale architecture, not a scale-equivariant one. We make three contributions. First, we give a precise definition of discrete scale equivariance for 1D causal networks and prove that dyadic dilation commutes (up to boundary effects) with any dilated-convolution stack whose kernel weights are shared across levels. Tying the kernel shrinks the convolutional parameter budget by an $L$-fold factor (where $L$ is depth) and hard-wires self-similarity in as an inductive bias. Second, we wrap this Scale-Equivariant WaveNet (SE-WaveNet) backbone in three components that carry the same prior: a one-level Daubechies-4 wavelet input, a Hurst-FiLM block exposing the local scaling exponent, and a spectral-consistency training term targeting the $|f|^{-(2H+1)}$ power-law spectrum. The head is a conditional normalising flow, chosen to preserve equivariance. Third, on 30 years of S&P 500 daily log-returns, SE-WaveNet samples reproduce the empirical scaling-collapse diagnostic on the Allan-Variance top-25 universe (median $\mathcal{C}^\star = 0.020$), while a vanilla WaveNet at matched capacity does not ($\geq 0.06$). NLL, KS-calibration, and tail energy distance tie or beat the baseline, with $L\times$ fewer convolutional parameters.
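To make the weight-tying idea in this abstract concrete, here is a minimal pure-Python sketch. It is not the paper's SE-WaveNet: the function names, the toy kernel, and the zero-insertion upsampling are assumptions chosen for illustration. It shows a causal dilated convolution whose single kernel is reused at dilations 1, 2, 4, ..., and checks a simple one-layer identity: convolving a zero-upsampled signal at dilation 2 gives the zero-upsampled result of convolving the original signal at dilation 1.

```python
def causal_dilated_conv(x, kernel, dilation):
    """y[t] = sum_k kernel[k] * x[t - k * dilation], with zeros before t = 0."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def upsample2(x):
    """Stretch a signal by a factor of two by inserting a zero after each sample."""
    out = []
    for v in x:
        out.extend([v, 0.0])
    return out


def tied_stack(x, kernel, depth):
    """Apply the SAME kernel at dilations 1, 2, 4, ... (weights shared across levels)."""
    h = list(x)
    for level in range(depth):
        h = causal_dilated_conv(h, kernel, 2 ** level)
    return h


kernel = [0.5, -0.25]          # hypothetical shared kernel
x = [1.0, 2.0, 4.0, 8.0]       # hypothetical input signal

# Stretching the input and doubling the dilation gives the stretched output.
lhs = causal_dilated_conv(upsample2(x), kernel, 2)
rhs = upsample2(causal_dilated_conv(x, kernel, 1))
```

Because every level of `tied_stack` reuses one kernel, the number of convolution weights does not grow with depth, which is the parameter saving the abstract attributes to tying.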


2605.17581 2026-05-19 cond-mat.soft cs.LG (updated version)

Topological Data Analysis combined with Machine Learning for Predicting Permeability of Porous Media

Ebru Dagdelen, Catherin Neena Lalu, Aakash Karlekar, Manav Arora, Matthew Illingworth, Jonathan Jaquette, Linda Cummings, Lou Kondic

Affiliations: Department of Mathematical Sciences, New Jersey Institute of Technology; Department of Physics, New Jersey Institute of Technology

AI summary: This study examines how topological data analysis and machine learning can be used to predict the permeability of porous media from extracted structural, topological, and network features, and shows that topological data analysis is effective when combined with machine learning.

Abstract

Flow in porous media is difficult to address using standard analytical or numerical methods due to its complexity. However, since synthetic representations of porous media are easy to produce and data from physical experiments are becoming more widely available, the problem is well-suited to studies that include machine learning (ML) techniques. We discuss a number of features that can be extracted from such data, and their utility as input variables into a standard ML algorithm. These features include structural measures describing the geometry of the porous media, topological measures describing the connectivity, and network measures obtained by modeling the porous media as simplified pore networks. These features enable the prediction of the permeability of the considered (synthetic) porous materials using ML techniques that also leverage the separately computed exact permeability (ground truth). Comparing results obtained using different input variables helps develop a better understanding of the utility of various measures for predicting permeability based on the porous media structure. We show, in particular, that topological data analysis (TDA) provides a useful set of features that can be easily combined with ML to yield meaningful results.


2605.17575 2026-05-19 cs.LG cs.AI (updated version)

UniAlign: A Model-Agnostic Framework for Robust Network Traffic Classification under Distribution Shifts

Tongze Wang, Xiaohui Xie, Wenduo Wang, Chuyi Wang, Yong Cui

Affiliations: Institute for Network Sciences and Cyberspace, Tsinghua University; Department of Computer Science and Technology, Tsinghua University

AI summary: This paper proposes UniAlign, a model-agnostic framework that improves the robustness of deep learning network traffic classification models under distribution shifts through domain alignment fine-tuning and stable model ensembling. Experiments show that it outperforms existing baselines in both accuracy and F1 score.

Abstract

Network traffic classification (NTC) models often suffer severe performance degradation when deployed in real-world environments due to distribution shifts caused by changing network conditions. Existing robustness-enhancing approaches are commonly coupled to specific model architectures or data settings, fail to generalize to state-of-the-art raw-byte-based NTC models, or incur significant training overhead. In this paper, we propose UniAlign, a novel model-agnostic framework that improves the robustness of deep learning-based NTC models under distribution shifts. UniAlign combines domain alignment fine-tuning, which encourages the learning of domain-invariant traffic representations across heterogeneous network conditions, with stable model ensembling, which enhances inference robustness by aggregating checkpoints within a flat loss region. The framework can be seamlessly integrated into existing supervised NTC models without requiring specific feature modalities or introducing non-constant additional training costs. We evaluate UniAlign on three public datasets covering diverse distribution shifts, including encryption schemes, data collection devices, and attack behaviors. Experimental results on two representative NTC models demonstrate that, compared with standard training, UniAlign improves average classification accuracy by 2.51% and average F1 score by 2.71%, outperforming the strongest baseline by 1.45% in accuracy and 1.69% in F1 score, while requiring only 12.4% to 53.9% of the training time of all NTC-specific baselines.


2605.17571 2026-05-19 cs.CV cs.LG (updated version)

Stable Routing for Mixture-of-Experts in Class-Incremental Learning

Zirui Guo, Quan Cheng, Da-Wei Zhou, Lijun Zhang

Affiliations: State Key Laboratory of Novel Software Technology, Nanjing University; School of Artificial Intelligence, Nanjing University

AI summary: This paper studies stable routing for mixture-of-experts models in class-incremental learning and proposes StaR-MoE, a stable routing framework that uses sensitivity-aware routing alignment and asymmetric capacity regularization to improve adaptation to new classes and retention of knowledge about old classes.

Abstract

Class-incremental learning (CIL) requires models to learn new classes sequentially while preserving prior knowledge. Recently, approaches that combine pre-trained models with mixture-of-experts (MoE) have received increasing attention in CIL: they typically expand experts during learning and employ a router to assign weights across experts. However, existing MoE methods often overlook routing drift induced by expert expansion. Once new experts are introduced, the router may reassign samples from earlier classes to newly added experts, thereby perturbing previously established expert compositions and causing interference even when old experts remain frozen. We argue that expandable MoE in CIL requires two complementary properties: stable old-class routing for knowledge preservation and sufficient capacity utilization for new-class adaptation. To this end, we propose Stable Routing for MoE (StaR-MoE), a routing-level framework for expandable MoE in CIL. By incorporating sensitivity-aware routing alignment, StaR-MoE aligns current old-class routing behavior with historical routing distributions through sensitivity-guided constraints. Complementarily, StaR-MoE introduces asymmetric capacity regularization to encourage effective utilization of the expanded expert pool without compromising class-specific routing specialization. Extensive experiments across four standard CIL benchmarks demonstrate that StaR-MoE consistently improves both average and last accuracy over state-of-the-art methods, highlighting the importance of stable routing.


2605.17570 2026-05-19 cs.LG cs.CL (updated version)

How Off-Policy Can GRPO Be? Mu-GRPO for Efficient LLM Reinforcement Learning

Minghao Tian, Yunfei Xie, Chen Wei

Affiliations: Rice University

AI summary: This paper examines how far off-policy GRPO can be run and proposes Mu-GRPO, which reduces rollout-optimization switching overhead to make LLM reinforcement learning more efficient while performing well on multiple benchmarks.

Abstract

Group Relative Policy Optimization (GRPO) has been a key driver of recent progress in reinforcement learning with verifiable rewards (RLVR) for large language models, but it is typically trained in a low-staleness, near-on-policy regime that incurs substantial system overhead. We ask a simple question: How off-policy can GRPO be? We show that GRPO-style algorithms can tolerate substantially larger rollout staleness than previously assumed, and propose Mu-GRPO, an RL training framework that organizes training into a small number (e.g., four) of large sequential generation-optimization stages. This design induces high rollout staleness while greatly reducing rollout-optimization switching overhead. To stabilize learning under stale data, Mu-GRPO combines relaxed clipping, which preserves useful stale-rollout gradients, with negative-advantage veto, which removes destabilizing post-trigger suffix updates in negative-advantage responses. Across five language models and multiple math reasoning benchmarks, Mu-GRPO matches or exceeds the performance of standard GRPO while achieving around 2x speedup in wall-clock training time, establishing a substantially improved performance-efficiency trade-off for LLM reinforcement learning.
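For readers unfamiliar with the building blocks this abstract refers to, the following pure-Python sketch shows the standard group-relative advantage used by GRPO-style methods and a PPO-style clipped surrogate with a configurable clip range. It does not reproduce Mu-GRPO: the abstract does not specify how its relaxed clipping or negative-advantage veto are defined, so the `clip_low` and `clip_high` parameters here are illustrative assumptions only.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own rollout group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(ratio, advantage, clip_low=0.2, clip_high=0.2):
    """PPO-style clipped objective term for one token or response.

    ratio is pi_new / pi_old; a wider clip range lets staler (more off-policy)
    samples contribute more before the gradient is cut off.
    """
    clipped = min(max(ratio, 1.0 - clip_low), 1.0 + clip_high)
    return min(ratio * advantage, clipped * advantage)


# Hypothetical rewards for four responses to the same prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
tight = clipped_surrogate(1.5, 1.0)                  # capped at 1 + 0.2
relaxed = clipped_surrogate(1.5, 1.0, clip_high=0.6)  # ratio 1.5 is inside the range
```

With the tight range, a sample whose probability ratio has drifted to 1.5 is capped; with the wider range it passes through unchanged, which is one plausible reading of why a relaxed range preserves more signal from stale rollouts.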


2605.17562 2026-05-19 cs.LG cs.AI cs.HC (updated version)

Beyond Accuracy: Robustness, Interpretability and Expressiveness of EEG Foundation Models

Urban Širca, Maryam Alimardani, Stefanos Zafeiriou, Konstantinos Barmpas

Affiliations: Vrije Universiteit Amsterdam; Imperial College London

AI summary: This paper studies the robustness, interpretability, and expressiveness of EEG foundation models. By benchmarking six EEG-FMs and one baseline deep learning model on eight datasets, it characterizes how the models behave under different perturbations and what their interpretability and expressiveness properties are.

Abstract

EEG foundation models (EEG-FMs) have been evaluated predominantly on clean, in-distribution accuracy, leaving their robustness, interpretability and representational quality largely unexamined. This study addresses these gaps by benchmarking six EEG-FMs against a baseline deep learning model across eight datasets. Beyond clean accuracy, we conduct three layers of analysis: (i) Robustness: we apply test-time perturbations including additive noise, random and region-based channel dropout and region-specific noise injection. Our analyses show that no single model dominates all failure modes. The most noise-robust model is among the most fragile under channel dropout and much of the dropout fragility disappears when channels are removed rather than zero-padded. (ii) Interpretability: we present the first application of Attention-Aware Layer-Wise Relevance Propagation (AttnLRP) to EEG-FMs and show that models broadly concentrate relevance on task-appropriate brain regions consistent with known neurophysiology. However, attribution maps remain spatially stable under perturbation while predictions degrade, suggesting that the models attend to the correct brain regions but decode corrupted content. (iii) Expressiveness: With block-wise probing we show that late blocks are repurposed during fine-tuning, while early blocks already hold task-related information. Furthermore, we demonstrate that the poor head-only performance previously attributed to low-quality pre-trained representations is largely explained by pooling and that EEG-FMs possess sufficient representational capacity when their token-level embeddings are preserved. Together, these findings provide the first systematic assessment of robustness, interpretability and expressiveness for EEG-FMs and highlight critical considerations for their development.


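2605.17555 2026-05-19 cs.LG cs.CV (updated version)

PFlow-T: A Persistence-Driven Forward Process for Topology-Controlled Generation

Snigdha Chandan Khilar

Affiliations: Independent Researcher

AI summary: This paper proposes PFlow-T, a generative model with a persistence-driven forward process that uses persistent homology to control topological structure, enabling Betti-number-controlled generation and improved handling of out-of-distribution tasks.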

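2605.17552 2026-05-19 cs.LG (updated version)

Q-LocalAdam: Memory-Efficient Client-Side Adaptive Optimization for Edge Federated Learning

Vedant Waykole, Haroon R. Lone

Affiliations: IISER Bhopal

AI summary: This paper proposes Q-LocalAdam, an adaptive optimization method for edge federated learning under non-IID data and memory limits. It achieves memory-efficient optimization through distribution-aware 8-bit quantization, with block-wise linear encoding and log-space encoding, and significantly improves model performance and the capacity for concurrent workloads.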

Abstract

Federated learning on edge devices must cope with non-IID client data and tight memory budgets. Adaptive optimizers like Adam stabilize training under data heterogeneity but require storing full-precision momentum and variance states, often tripling client memory overhead. This limits deployable model sizes and concurrent federated jobs on resource-constrained devices. We empirically observe that momentum and variance in federated Adam exhibit fundamentally different statistical properties: momentum values are symmetric and bounded, while variance spans eight orders of magnitude with log-normal structure. Motivated by this asymmetry, we propose Q-LocalAdam, which applies distribution-aware 8-bit quantization, using block-wise linear encoding for momentum and log-space encoding for variance, while keeping model parameters in full precision. Across CIFAR-10 and CIFAR-100 under varying data heterogeneity ($\alpha \in \{0.1, 0.5, 1.0, \text{IID}\}$), Q-LocalAdam achieves $3.37\times$ optimizer memory reduction with no accuracy loss under moderate heterogeneity and significant improvements under extreme heterogeneity (e.g., +5.74pp on CIFAR-100, $\alpha=0.1$). Multi-seed validation confirms statistical significance ($p<0.01$). In contrast, naive uniform quantization degrades to random performance, demonstrating that distribution-aware design is essential. Q-LocalAdam enables larger models and more concurrent workloads on memory-constrained edge devices without modifying the federated protocol.
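The asymmetry described in this abstract can be illustrated with a small pure-Python sketch. It is an assumption-laden toy, not the authors' encoder: the block size, the absmax scaling, the 255-level log grid, and all function names are hypothetical. It quantizes momentum-like values with one linear scale per block and variance-like values on a grid that is uniform in log space, so that values many orders of magnitude apart keep a bounded relative error.

```python
import math


def quantize_blockwise_linear(values, block_size=4):
    """Signed 8-bit codes in [-127, 127] with one absmax scale per block."""
    codes, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        scale = max(abs(v) for v in block) or 1.0  # avoid dividing by zero
        scales.append(scale)
        codes.extend(round(v / scale * 127) for v in block)
    return codes, scales


def dequantize_blockwise_linear(codes, scales, block_size=4):
    return [c / 127 * scales[i // block_size] for i, c in enumerate(codes)]


def quantize_log(values, eps=1e-12):
    """Unsigned 8-bit codes in [0, 255] on a grid that is uniform in log space."""
    logs = [math.log(v + eps) for v in values]
    lo, hi = min(logs), max(logs)
    span = (hi - lo) or 1.0
    codes = [round((l - lo) / span * 255) for l in logs]
    return codes, lo, span


def dequantize_log(codes, lo, span, eps=1e-12):
    return [max(math.exp(lo + c / 255 * span) - eps, 0.0) for c in codes]


# Hypothetical optimizer states: bounded, signed momentum and wide-range variance.
momentum = [0.5, -0.25, 0.1, -0.05, 0.002, -0.001, 0.0005, 0.0]
variance = [1e-8, 1e-4, 1e-2, 1.0]

m_codes, m_scales = quantize_blockwise_linear(momentum)
m_restored = dequantize_blockwise_linear(m_codes, m_scales)
v_codes, v_lo, v_span = quantize_log(variance)
v_restored = dequantize_log(v_codes, v_lo, v_span)
```

A single linear scale over the variance values above would map everything below roughly 1/255 of the maximum to zero, which is consistent with the abstract's observation that naive uniform quantization fails.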


2605.17546 2026-05-19 astro-ph.IM astro-ph.GA cs.LG (updated version)

Accelerating Redshift-Conditioned Galaxy Image Synthesis with One-step Generative Modeling

Tianyue Yang, Sandro Tacchella, Xiao Xue

Affiliations: The Center for Computational Science; University College London; Cavendish Laboratory; Kavli Institute for Cosmology, University of Cambridge

AI summary: This paper studies efficient redshift-conditioned galaxy image generation with diffusion models and pixel-MeanFlow. Comparing models on the GalaxiesML-64 dataset, it finds that a one-step generative model can recover galaxy morphology statistics at greatly reduced computational cost, opening a new path for large cosmological surveys and simulation-based scientific inference.

Comments: 19 pages, 8 figures

Abstract

Understanding galaxy morphology evolution across cosmic time requires models that can generate realistic galaxy populations conditioned on redshift. In this work, we study efficient redshift-conditioned generative modeling for astrophysical image synthesis using diffusion models and pixel-MeanFlow. We first review the connections between score-based diffusion models, Flow Matching, one-step generative models, and modern diffusion samplers. We then evaluate DDPM, DDIM, DEIS-AB2, DPM++2M, and one-step pixel-MeanFlow on the GalaxiesML-64 dataset using morphology-based metrics, including ellipticity, semi-major axis, Sérsic index, and isophotal area. Our results show a clear accuracy-efficiency trade-off: standard DDPM sampling achieves the best distributional fidelity but requires high computational cost, while second-order samplers substantially improve efficiency over DDIM. Pixel-MeanFlow enables single-step generation and achieves competitive performance on several morphology statistics, though it remains weaker than many-step DDPM for fine-grained structure. Our results demonstrate that one-step generative models can recover key galaxy morphology statistics at orders-of-magnitude lower computational cost, opening a path toward efficient conditional simulators for large cosmological surveys and simulation-based scientific inference.


2605.17530 2026-05-19 cs.CR cs.AI cs.LG cs.NI (updated version)

Few-Shot Network Intrusion Detection Using Online Triplet Mining

Jack Wilkie, Hanan Hindy, Christos Tachtatzis, Miroslav Bures, Robert Atkinson

Affiliations: Department of Electronics and Electrical Engineering, University of Strathclyde; Faculty of Computer and Information Sciences, Ain Shams University; Faculty of Electrical Engineering, Czech Technical University

AI summary: This paper proposes a triplet network that uses online triplet mining and a KNN classifier to achieve effective few-shot network intrusion detection. By comparing triplet mining algorithms and model design choices, it shows that the approach is competitive when only a small number of malicious samples are available.

Comments: Published in: MDPI Applied Sciences, 2026. Official version: https://doi.org/10.3390/app16104589 Code: https://github.com/jackwilkie/few_shot_nids_triplet_mining

DOI: 10.3390/app16104589
Journal ref: Wilkie, J.; Hindy, H.; Tachtatzis, C.; Bures, M.; Atkinson, R. Few-Shot Network Intrusion Detection Using Online Triplet Mining. Appl. Sci. 2026, 16, 4589. https://doi.org/10.3390/app16104589

Abstract

Network intrusion detection systems play a vital role in protecting networks by detecting malicious network traffic which can then be investigated by a cybersecurity operations centre. State-of-the-art approaches utilise supervised machine learning methods to train a classification model to recognise known cyberattacks; however, these models require a large labelled dataset to train and show poor performance when trained on smaller datasets. In an attempt to address this shortcoming, anomaly detection models learn the distribution of benign traffic and flag non-conforming traffic as malicious. While these methods do not require malicious examples to train, they suffer from high false-positive rates, rendering them impractical. As a result, networks may be particularly vulnerable when there are insufficient labelled instances of a specific attack class to train an effective classifier. This often occurs in newly established networks or when previously unseen types of attacks emerge. To address this challenge, this work proposes the use of a triplet network, utilising online triplet mining and a KNN classifier, which is able to perform few-shot classification, enabling effective intrusion detection after being trained on a limited number of malicious examples. Various online triplet mining algorithms were explored and model design choices, such as the inference algorithm and optimised distance metrics, were compared and evaluated through a series of ablation studies. The final model was compared against other state-of-the-art approaches in few-shot binary and multiclass classification, where the proposed approach was found to be competitive with existing methods when trained on as few as 10 malicious samples of each class.
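As a rough illustration of the two ingredients named in this abstract, the pure-Python sketch below implements batch-hard triplet mining (one common online mining strategy; the abstract does not say which strategy the final model uses) and a majority-vote KNN classifier over embeddings. The embeddings, labels, margin, and function names are hypothetical, and a real system would compute embeddings with a trained network.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_hard_triplet_loss(embeddings, labels, margin=1.0):
    """For each anchor, pair the farthest positive with the nearest negative."""
    losses = []
    for i, (anchor, label) in enumerate(zip(embeddings, labels)):
        pos = [euclidean(anchor, e) for j, e in enumerate(embeddings)
               if j != i and labels[j] == label]
        neg = [euclidean(anchor, e) for j, e in enumerate(embeddings)
               if labels[j] != label]
        if not pos or not neg:
            continue  # anchor has no valid triplet in this batch
        losses.append(max(max(pos) - min(neg) + margin, 0.0))
    return sum(losses) / len(losses) if losses else 0.0


def knn_predict(query, embeddings, labels, k=3):
    """Majority vote among the k nearest stored embeddings."""
    ranked = sorted(zip(embeddings, labels), key=lambda p: euclidean(query, p[0]))
    votes = {}
    for _, lab in ranked[:k]:
        votes[lab] = votes.get(lab, 0) + 1
    return max(votes, key=votes.get)


# Hypothetical 2-D embeddings of four flows, two per class.
embeddings = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
labels = ["benign", "benign", "attack", "attack"]

separated_loss = batch_hard_triplet_loss(embeddings, labels, margin=1.0)
strict_loss = batch_hard_triplet_loss(embeddings, labels, margin=10.0)
prediction = knn_predict([4.5, 5.0], embeddings, labels, k=3)
```

Mining triplets inside each batch avoids precomputing all triplets, and the KNN step needs only a handful of labelled embeddings per class at inference time, which fits the few-shot setting described above.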


2605.17528 2026-05-19 cs.LG cs.AI cs.CL (updated version)

CasualSynth: Generating Structurally Sound Synthetic Data

Zehua Cheng, Wei Dai, Jiahao Sun, Thomas Lukasiewicz

Affiliations: Department of Computer Science, University of Oxford; Institute of Logic and Computation, TU Wien

AI summary: This paper proposes the CausalSynth framework, which decouples causal structure generation from semantic realization to produce synthetic data that both respects causal mechanisms and is semantically rich, addressing the inability of LLMs to guarantee causal correctness when generating synthetic data.

Comments: 15 pages

Abstract

Large Language Models (LLMs) generate realistic synthetic data but offer no guarantee that their outputs respect the causal mechanisms governing the target domain. We introduce CausalSynth, a framework that decouples causal structure generation from semantic realization, yielding synthetic data that is both causally valid and linguistically rich. The framework operates in three phases. First, a Structural Causal Model (SCM), a tuple of structural equations defined over a directed acyclic graph (DAG), generates causal skeletons, i.e., variable assignments that satisfy the Global Markov Property of the governing DAG, via ancestral sampling. Second, an LLM acts as a constrained realizer, a conditional translator that maps each skeleton to a high-dimensional observation such as a clinical note or a transaction log. Third, an Iterative Consistency Verification module detects structural violations through deterministic extraction and feeds targeted corrections back to the LLM, forming a closed-loop refinement process. We identify the Semantic Backdoor problem, the systematic tendency of LLMs to override imposed causal facts with pre-training priors, and prove that our iterative mechanism reduces the resulting selection bias relative to standard rejection sampling. On three causal benchmarks (ASIA, ALARM, and MIMIC-Struct), CausalSynth preserved conditional independencies with false-positive rates near the nominal $\alpha=0.05$ level and achieved realizability rates above 96% with 70B-parameter LLM backbones. The framework additionally supports principled interventional and counterfactual generation through noise retention and graph mutilation.
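The first phase described in this abstract, ancestral sampling from a structural causal model, can be sketched in a few lines of pure Python. This is a generic illustration, not the CausalSynth code: the three-variable graph, the probabilities, and the function names are invented for the example. Each variable is sampled only after all of its parents have values, so every sample respects the DAG.

```python
import random


def ancestral_sample(parents, mechanisms, rng):
    """Sample every variable after its parents, in topological waves."""
    values = {}
    remaining = set(parents)
    while remaining:
        ready = sorted(v for v in remaining
                       if all(p in values for p in parents[v]))
        if not ready:
            raise ValueError("graph has a cycle, so it is not a DAG")
        for v in ready:
            values[v] = mechanisms[v](values, rng)
            remaining.remove(v)
    return values


# Hypothetical toy DAG: smoking -> cancer -> xray.
parents = {"smoking": [], "cancer": ["smoking"], "xray": ["cancer"]}

# Hypothetical structural equations; each draws its own exogenous noise from rng.
mechanisms = {
    "smoking": lambda v, rng: rng.random() < 0.3,
    "cancer": lambda v, rng: rng.random() < (0.2 if v["smoking"] else 0.01),
    "xray": lambda v, rng: rng.random() < (0.9 if v["cancer"] else 0.05),
}

skeleton = ancestral_sample(parents, mechanisms, random.Random(0))
```

In the pipeline the abstract describes, a skeleton like this would then be handed to an LLM to be written out as text, and a verifier would check that the text still encodes the same variable values.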


2605.17508 2026-05-19 cs.LG cs.AI (updated version)

BESplit: Bias-Compensated Split Federated Learning with Evidential Aggregation

Yuhan Xie, Chen Lyu, Jingrong Huang

Affiliations: MoE Key Laboratory of Interdisciplinary Research of Computation; Shanghai University of Finance

AI summary: This paper proposes the BESplit framework, which uses evidential aggregation and bias-compensated collaboration to address biased optimization and unstable convergence in split federated learning under non-IID data, improving model accuracy and efficiency.

Abstract

Split Federated Learning (SFL) enables privacy-preserving collaborative training by partitioning models between clients and a server. However, under non-IID data distributions, SFL often suffers from biased optimization and unstable convergence, while existing solutions largely adapt techniques from conventional federated learning. In this work, we observe that the split architecture of SFL inherently alters how client information is represented and coordinated, opening opportunities for bias compensation beyond parameter-level aggregation. Based on this insight, we propose BESplit, an architecture-aware framework that exploits the intrinsic structure of SFL to mitigate non-IID effects. First, to prevent biased local data from dominating global updates, we introduce Evidential Aggregation (EA) to perform fine-grained reweighting of client contributions based on evidential uncertainty. Second, to further reduce distributional skew, we develop Bias-Compensated Collaboration (BCC) to align split-layer representations by pairing complementary clients. Finally, Dual-Teacher Distillation (DTD) is incorporated to synchronize knowledge between decoupled client and server models, enabling independent local inference. Extensive experiments on five benchmark datasets demonstrate that BESplit consistently outperforms state-of-the-art methods in accuracy, convergence stability, and computational efficiency under diverse non-IID settings.


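2605.17500 2026-05-19 cs.LG cs.CV (updated version)

The Silent Brush: Evaluating Artistic Style Leakage in AI Art Generation

Ninad Joshi, Ashutosh Ranjan, Vivek Srivastava, Shirish Karande

Affiliations: TCS Research

AI summary: This paper studies unintended style reproduction in AI art generation, which arises when models learn and reproduce artistic styles, and proposes Art Arena, an evaluation method that measures how strongly artworks are encoded, how they interact, and how often their stylistic features reappear without explicit prompting.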

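Abstract (translated from Chinese)

Generative text-to-image models are typically trained on large web-crawled datasets containing diverse visual content, including copyrighted and stylistically distinctive artworks, which raises concerns about ownership, attribution, and the unintended reuse of protected visual expression. A key problem is that a model can learn stylistic patterns from such data and reproduce them in generated outputs without any explicit reference in the prompt. We call this phenomenon The Silent Brush: learned styles resurface even when they are not requested. Existing evaluation methods focus mainly on near-duplicate retrieval or membership inference and do not account for this form of unintended, cross-prompt style reproduction. To address these gaps, we first formulate guiding principles for evaluating The Silent Brush. We then introduce the Art Arena evaluation protocol, which measures how strongly artworks are encoded, how they interact, and how often their stylistic features reappear in generated outputs without explicit prompting. We evaluate widely used text-to-image diffusion models, including Stable Diffusion v1.5, Stable Diffusion XL (SDXL), and SANA-1.5, and design the protocol to generalize across text-to-image generation systems. Our results indicate that The Silent Brush stems from differences in representation strength and interaction dynamics among artworks, which lead to asymmetric blending in model generations. Code and evaluation resources are available at: https://anonymous.4open.science/r/ArtArena-EBE4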

A Semantic Mutation Score for the Adequacy of Metamorphic Relations in Scientific Computing Programs

Meng Li, Xiaohua Yang, Jie Liu, Shiyu Yan

Affiliations: School of Computing, University of South China; Hunan Engineering Research Center of Software Evaluation and Testing for Intellectual Equipment; CNNC Key Laboratory on High Trusted Computing

AI summary: This paper proposes a Semantic Mutation Score (SMS) built on domain-semantic operators, aiming to address the fact that the classical mutation score ignores domain semantics in scientific computing. By introducing five domain-semantic operators, it improves the ability to assess the adequacy of metamorphic relations.

Comments: Submitted to Information and Software Technology (IST), Elsevier. Manuscript: 93 pages in elsarticle review mode (12pt double-spaced, ~28-35 pp typeset). Supplementary code and 12-PUT pool at https://github.com/meng004/P2-Semantic-Mutation

Abstract

Context. Metamorphic Testing addresses the test-oracle problem in scientific computing, but classical Mutation Score operates on syntactic AST mutations and misses domain semantics. Objective. We propose the Semantic Mutation Score (SMS), built on five domain-semantic operators (Conservation Erosion, Operator Substitution, Hyperparameter, Trajectory Flip, Structural Injection). SMS degenerates almost everywhere to MS in a characterised limit, so any SMS-based conclusion remains consistent with prior mutation-testing literature in the classical regime. Method. A 12-PUT x 5-MP design over four single-output float-to-float classes (numeric, probabilistic, surrogate, machine-learning) is paired with a three-layer attribution classifier separating true semantic faults from tolerance, OOD, statistical, and artefact categories. A same-source / cross-source ablation under an identical prompt isolates the LLM-source-diversity contribution. LLM-generated mutants are compared against a default-configuration cosmic-ray syntactic pool at the AST-normalised level. Results. The pre-registered large-effect threshold for Cliff's delta is not met under the point-estimate criterion; the observed effect lies in the medium-effect range. Cross-source pooling under an identical prompt does not appreciably shift delta, indicating that LLM identity is not the lever within this design. AST-level overlap between LLM-generated and default cosmic-ray syntactic mutants is small; the Hyperparameter, Structural Injection, and Trajectory Flip classes are unreachable under default first-order syntactic configurations. Conclusion. SMS is a backward-compatible adequacy metric for domain-semantic metamorphic-relation sets in scientific computing. The first-order unreachability evidence is independent of the effect-size question.
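The results in this abstract are reported in terms of Cliff's delta, a nonparametric effect size. As a reminder of what that statistic measures, here is a minimal pure-Python sketch of its standard definition; the sample values are invented and are not taken from the paper.

```python
def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all cross pairs, in [-1, 1]."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


# Hypothetical score samples for two mutant pools.
identical = cliffs_delta([1, 2, 3], [1, 2, 3])   # no systematic difference
dominant = cliffs_delta([4, 5], [1, 2])          # every x exceeds every y
reversed_ = cliffs_delta([1, 2], [4, 5])         # every x is below every y
```

A value near zero means neither group tends to exceed the other, while values near plus or minus one mean almost complete dominance; the abstract's "medium-effect range" refers to a band between those extremes whose exact thresholds are set by the paper's pre-registration.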


2605.17432 2026-05-19 cs.LG cs.CR updated version

DP-SelFT: Differentially Private Selective Fine-Tuning for Large Language Models

Haichao Sha, Zihao Wang, Yuncheng Wu, Hong Chen, Wei Dong

Affiliations: Renmin University of China; Nanyang Technological University

AI summary: This paper proposes the DP-SelFT framework, which uses selective fine-tuning to improve the privacy-utility trade-off of large language models while preserving differential privacy.

AI abstract (translated from Chinese)

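Human-Flow Digital Twin for Predicting the Effects of Mobility Introduction on Visitor Flows

Chiharu Shima, Haruki Yonekura, Fukuharu Tanaka, Tatsuya Amano, Hirozumi Yamaguchi

Affiliations: bitA Inc.; The University of Osaka; RIKEN Center for Computational Science

AI summary: This paper proposes a framework that uses a human-flow digital twin to predict how mobility introduction measures affect visitor flows. A multi-agent simulator models how visitors choose destinations based on their current location and the attractiveness of spots, and trained decision models quantify the effect of the measures on visitor counts and circulation.

Comments An accepted paper at the 27th IEEE International Conference on Mobile Data Management (MDM 2026). Project page: https://mc.net.ist.osaka-u.ac.jp/en/activity/wakayama-castle-mobility_2023/

AI abstract (translated from Chinese)

We propose a framework for predicting the effects of mobility introduction measures using a human-flow digital twin. The digital twin contains a multi-agent simulator that can represent how visitors choose destinations depending on factors such as their current location and the attractiveness of spots. From measured pre-intervention human-flow data, inter-spot distances, spot attractiveness, and travel volumes, we extract data on how visitors selected destinations and use them to train each agent's decision model. The trained decision model is a function that takes a visitor's current state and surrounding environmental information as input and outputs the spot the visitor will move to next. By expressing mobility introduction measures as changes in inter-spot distances or spot attractiveness, the framework can reproduce human flows with mobility introduction in the multi-agent simulator and thereby quantify effects such as changes in visitor counts and circulation. We evaluated the proposed method on human-flow data measured in Wakayama Castle Park, Japan, with and without the introduction of mobility. When the flows with mobility introduction were reproduced with a multi-layer perceptron decision model, the cosine similarity of the spatial population distribution exceeded 0.7, confirming that the approach can replicate the flow changes caused by the mobility introduction.

English abstract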

We propose a framework for predicting the effects of mobility introduction measures using a human-flow digital twin. This digital twin incorporates a multi-agent simulator that can represent how visitors choose destinations depending on factors such as their current location and the attractiveness of spots. We extract data on how visitors selected destinations with respect to measured pre-intervention human-flow data, inter-spot distances, spot attractiveness, and travel volumes, and use these data to train each agent's decision model of this simulator. The trained decision model is a function that takes a visitor's current state and surrounding environmental information as input and outputs which spot the visitor will move toward next. By expressing mobility introduction measures as changes to inter-point distances or to spot attractiveness, the framework can reproduce human flows with mobility introduction in the multi-agent simulator and thereby quantify effects such as changes in visitor counts and circulation. We evaluated the proposed method using human-flow data measured with and without introducing mobility within Wakayama Castle Park in Japan. When reproducing flows with mobility introduction using a multi-layer perceptron decision model, the cosine similarity of the spatial population distribution exceeded 0.7, confirming that the approach can replicate the flow changes caused by the mobility introduction.
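The evaluation metric mentioned above, cosine similarity between spatial population distributions, can be sketched as follows. The per-spot visitor counts are hypothetical values made up for illustration; the abstract does not give the actual spots or counts.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero count vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


# Hypothetical visitor counts at four spots (not data from the paper).
measured = [120, 80, 40, 10]    # observed with mobility introduced
simulated = [100, 90, 30, 20]   # reproduced by the simulator
similarity = cosine_similarity(measured, simulated)
```

A value close to 1 means the simulated distribution over spots points in nearly the same direction as the measured one, regardless of the total number of visitors.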


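2605.17419 2026-05-19 cs.LG cs.AI updated version

Heterogeneous Information-Bottleneck Coordination Graphs for Multi-Agent Reinforcement Learning

Wei Duan, Junyu Xuan, En Yu, Xiaoyu Yang, Jie Lu

Affiliations: Australian Artificial Intelligence Institute (AAII)

AI summary: This paper proposes Heterogeneous Information-Bottleneck Coordination Graphs (HIBCG), a theoretically grounded mechanism for deciding which edges a coordination graph should contain and how much message capacity each edge should get in multi-agent reinforcement learning. It uses the information bottleneck to build a group-aligned block-diagonal prior, so that both edge existence and message capacity are theoretically justified.

AI abstract (translated from Chinese)

Coordination graphs are a central abstraction in cooperative multi-agent reinforcement learning (MARL), yet existing sparse-graph learners lack a theoretically grounded mechanism for deciding which edges should exist and how much information each edge should carry. Current methods rely on heuristic criteria that give no formal guarantee on the learned topology, and they offer no principled way to allocate different communication capacities to structurally different agent relationships. To address this, we propose Heterogeneous Information-Bottleneck Coordination Graphs (HIBCG), which learn a group-aware sparse graph in which both edge existence and message capacity are theoretically justified. Using the graph information bottleneck (GIB) as the underlying tool, HIBCG first constructs a group-aligned block-diagonal prior that provides a closed-form criterion for edge retention, determining which edges should exist and at what density in each group block, and then controls per-agent feature bandwidth on the resulting topology, compressing messages so that only task-relevant content is retained. We prove that the group-aligned prior strictly tightens the variational bound on topology learning, that the objective decomposes per group block, enabling differential edge control, and that capacity allocation follows a water-filling principle.

English abstract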

Coordination graphs are a central abstraction in cooperative multi-agent reinforcement learning (MARL), yet existing sparse-graph learners lack a theoretically grounded mechanism to decide which edges should exist and how much information each edge should carry. Current methods rely on heuristic criteria that offer no formal guarantee on the learned topology, and no principled way to allocate different communication capacities to structurally different agent relationships. To address this, we propose Heterogeneous Information-Bottleneck Coordination Graphs (HIBCG), which learns a group-aware sparse graph in which both edge existence and message capacity are theoretically justified. With the graph information bottleneck (GIB) serving as the underlying tool, HIBCG first constructs a group-aligned block-diagonal prior that provides a closed-form criterion for edge retention -- determining which edges should exist and at what density per group block -- and then controls per-agent feature bandwidth on the resulting topology, compressing messages to retain only task-relevant content. We prove that the group-aligned prior strictly tightens the variational bound on topology learning, that the objective decomposes per group block, enabling differential edge control, and that capacity allocation follows a water-filling principle.


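2605.17390 2026-05-19 cs.SE cs.LG cs.LO updated version

NOETHER: A Constructive Framework for Metamorphic Pattern Discovery from Operator Algebras

Meng Li, Xiaohua Yang, Jie Liu, Shiyu Yan

Affiliations: School of Computing, University of South China; Hunan Engineering Research Center of Software Evaluation and Testing for Intellectual Equipment; CNNC Key Laboratory on High Trusted Computing

AI summary: This paper proposes the NOETHER framework, in which the downstream step from a program-induced operator algebra to a MetaPattern set is mechanical and provable, addressing foundational questions in metamorphic relation identification. The framework's algebraic closure and polynomial-time decidability are examined on three operator-algebraic domains.

Comments 71 pages, 18 tables, 1 figure. Under review at ACM Transactions on Software Engineering and Methodology. Supplementary materials (algorithm reference implementation, 84-MR PWR corpus, SE(3) case study harness, three-tier METRIC+ replication) at https://github.com/meng004/P1-MetaPattern

AI abstract (translated from Chinese)

Context. Metamorphic testing is recognised in IEEE/ISO software-testing standards and is increasingly recommended for AI systems, but its progress is bottlenecked by metamorphic relation (MR) identification: existing approaches (structured frameworks, mining and evolutionary pipelines, LLM-assisted methods, MetaPattern catalogues) share an inductive grounding that leaves three foundational questions open: origin, closure, and transferability. Objective. We propose a framework whose downstream step, from program-induced operator algebra to MetaPattern set, is mechanical and provable, while the upstream curation of the algebra is a stated empirical hypothesis with an explicit scope precondition. Method. NOETHER is a two-layer framework. The upstream layer is an eight-block decomposition over recurrent mathematical structures (symmetry, order, self-adjoint, time-reversal, limit, qualitative dynamics, method comparison, relational equivalence). The downstream CONSTRUCT-MP algorithm produces a MetaPattern set with guarantees of algebraic closure (Theorem 1) and polynomial-time decidability (Theorem 2). We test the framework on three operator-algebraic domains. Results. On Boltzmann reactor physics, NOETHER systematises a prior inductive catalogue; on equivariant ML, it derives executable MRs for rotation invariance, adjoint duality, and training-trajectory reversibility; on relational query optimisers, it exercises the relational-equivalence block. The central falsifiable prediction (L*-blindness on homogeneity-preserving mutators) holds on the in-scope substrate. The absolute-completeness conjecture (Theorem 1') is falsified on PWR core diffusion by two pairwise-independent counterexamples that identify five Translate-extension dimensions. Conclusion. Induction is relocated from per-program MR sampling to a per-domain algebraic layer; the downstream step is deductive and mechanical.

English abstract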

Context. Metamorphic Testing is recognised in IEEE/ISO software-testing standards and increasingly recommended for AI systems, but its progress is bottlenecked by metamorphic relation (MR) identification: existing approaches (structured frameworks, mining and evolutionary pipelines, LLM-assisted methods, MetaPattern catalogues) share an inductive grounding that leaves three foundational questions open: origin, closure, and transferability. Objective. We propose a framework whose downstream step from program-induced operator algebra to MetaPattern set is mechanical and provable, while the upstream curation of the algebra is a stated empirical hypothesis with explicit scope precondition. Method. NOETHER is a two-layer framework. The upstream layer is an eight-block decomposition over recurrent mathematical structures (symmetry, order, self-adjoint, time-reversal, limit, qualitative-dynamics, method-comparison, relational equivalence). The downstream CONSTRUCT-MP algorithm produces a MetaPattern set with algebraic-closure (Theorem 1) and polynomial-time decidability (Theorem 2) guarantees. We test the framework on three operator-algebraic domains. Results. On Boltzmann reactor physics NOETHER systematises a prior inductive catalogue; on equivariant ML it derives executable MRs for rotation invariance, adjoint duality, and training-trajectory reversibility; on relational query optimisers it exercises the relational-equivalence block. The central falsifiable prediction (L*-blindness on homogeneity-preserving mutators) holds on the in-scope substrate. The absolute-completeness conjecture (Theorem 1') is falsified on PWR core diffusion via two pairwise-independent counterexamples that identify five Translate-extension dimensions. Conclusion. Induction is relocated from per-program MR sampling to a per-domain algebraic layer; the downstream step is deductive and mechanical.


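2605.17380 2026-05-19 cs.AI cs.CR cs.LG updated version

ADR: An Agentic Detection System for Enterprise Agentic AI Security

Chenning Li, Pan Hu, Justin Xu, Baris Ozbas, Olivia Liu, Caroline Van, Manxue Li, Wei Zhou, Mohammad Alizadeh, Pengyu Zhang, KK Sriramadhesikan, Ming Zhang

Affiliations: Uber

AI summary: This paper presents ADR, a large-scale, production-proven enterprise framework for securing AI agents that operate through the Model Context Protocol (MCP). The system addresses three problems (limited observability, insufficient robustness, and high detection cost) with three components: the ADR Sensor, the ADR Explorer, and the ADR Detector.

Comments Accepted at MLSys 2026 (Industry Track)

AI abstract (translated from Chinese)

We present the Agentic AI Detection and Response (ADR) system, the first large-scale, production-proven enterprise framework for securing AI agents that operate through the Model Context Protocol (MCP). We identify three persistent challenges in this domain: (1) limited observability: existing Endpoint Detection and Response (EDR) tools see file writes but not the agent reasoning, prompts, or causal chains that link intent to execution; (2) insufficient robustness: static defenses constrained by predefined rules fail to generalize across diverse attack techniques and enterprise environments; and (3) high detection cost: LLM-based inference is prohibitively expensive at scale. ADR addresses these challenges with three components: the ADR Sensor for high-fidelity agentic telemetry, the ADR Explorer for systematic pre-deployment red teaming and hard-example generation, and the ADR Detector for scalable, two-tier online detection that combines fast triage with context-aware reasoning. Deployed at Uber for over ten months, ADR has sustained reliable detection in production; with growing adoption it has reached over 7,200 unique hosts and processes over 10,000 agent sessions per day, has uncovered hundreds of credential exposures across 26 categories, and has enabled a shift-left prevention layer (97.2% precision, 206 detected credentials). To validate the approach and support community adoption, we introduce ADR-Bench (302 tasks, 17 techniques, 133 MCP servers), on which ADR achieves zero false positives while detecting 67% of attacks, outperforming three state-of-the-art baselines (ALRPHFS, GuardAgent, LlamaFirewall) by 2-4x in F1-score. On AgentDojo, a public prompt-injection benchmark, ADR detects all attacks with only three false alarms across 93 tasks.

English abstract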

We present the Agentic AI Detection and Response (ADR) system, the first large-scale, production-proven enterprise framework for securing AI agents operating through the Model Context Protocol (MCP). We identify three persistent challenges in this domain: (1) limited observability -- existing Endpoint Detection and Response (EDR) tools see file writes but not the agent reasoning, prompts, or causal chains linking intent to execution; (2) insufficient robustness -- static defenses constrained by pre-defined rules fail to generalize across diverse attack techniques and enterprise contexts; and (3) high detection costs -- LLM-based inference is prohibitively expensive at scale. ADR addresses these challenges via three components: the ADR Sensor for high-fidelity agentic telemetry, the ADR Explorer for systematic pre-deployment red teaming and hard-example generation, and the ADR Detector for scalable, two-tier online detection combining fast triage with context-aware reasoning. Deployed at Uber for over ten months, ADR has sustained reliable detection in production with growing adoption reaching over 7,200 unique hosts and processing over 10,000 agent sessions daily, uncovering hundreds of credential exposures across 26 categories and enabling a shift-left prevention layer (97.2% precision, 206 detected credentials). To validate the approach and enable community adoption, we introduce ADR-Bench (302 tasks, 17 techniques, 133 MCP servers), where ADR achieves zero false positives while detecting 67% of attacks -- outperforming three state-of-the-art baselines (ALRPHFS, GuardAgent, LlamaFirewall) by 2--4x in F1-score. On AgentDojo (public prompt injection benchmark), ADR detects all attacks with only three false alarms out of 93 tasks.
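To see how "zero false positives while detecting 67% of attacks" relates to an F1-score, the arithmetic can be worked through as below. The counts (100 attacks, 67 detected) are hypothetical and chosen only to match the reported rates; the abstract does not give ADR's actual counts or its F1 value, so the result is illustrative only.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 100 attacks, 67 flagged, no benign task flagged.
precision, recall = precision_recall(tp=67, fp=0, fn=33)
f1 = f1_score(precision, recall)  # 2 * 1.0 * 0.67 / 1.67, roughly 0.80
```

With precision fixed at 1.0, F1 depends on recall alone, so under these assumptions the score is limited by the attacks that go undetected.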


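2605.16234 2026-05-19 cs.LG cs.AI cs.CL updated version

No Free Swap: Protocol-Dependent Layer Redundancy in Transformers

Gabriel Garcia

Affiliations: Independent Researcher

AI summary: This paper studies layer redundancy in transformers by comparing two protocols, replacement and interchange. The two differ markedly as guides for compression, and under the same evaluator the choice of protocol can change which layers are pruned, especially when replacement distances are high.

Comments 40 pages, 8 figures, 24 tables. Code is available at https://github.com/Gpgabriel25/ProtocolGapDiagnostic

AI abstract (translated from Chinese)

When researchers ask whether two transformer layers are "equivalent" for compression, they often conflate distinct tests. The replacement test asks whether one layer's map can substitute for another's; the interchange test asks whether two layers approximately commute when their positions are swapped. Both are output-grounded swap-KL probes, but they do not always agree: on pretrained transformers, the protocol gap can change which layers look safe to prune under the same evaluator, especially when replacement distances are high. We measure both protocols across checkpoints and architectures. On a Pythia training trajectory (410M and 1.4B), the replacement-interchange gap grows from initialization to convergence. Under one matched WikiText-2 contract at 8B scale, Qwen3-8B enters a divergent regime: interchange-guided removal is safer than replacement-guided removal at the same layer budgets, whereas Llama-3.1-8B ties the two protocols on pruning cost even though its interchange KL is lower, which shows that metric gaps need not map one-to-one to removal. Before removing or merging layers, score both swap-KLs on the target checkpoint; the diagnostic needs only unlabeled forward passes.

English abstract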

When researchers ask whether two transformer layers are "equivalent" for compression, they often conflate distinct tests. Replacement asks whether one layer's map can substitute for another's in place; interchange asks whether two layers approximately commute when their positions are swapped. Both are output-grounded swap-KL probes, but they need not agree: on pretrained transformers the protocol gap can change which layers look safe to prune by several-fold under the same evaluator, especially when replacement distances are high. We measure both protocols across checkpoints and architectures. On a Pythia training trajectory (410M and 1.4B), the replacement-interchange gap grows from initialization to convergence. Under one matched WikiText-2 contract at 8B scale, Qwen3-8B enters a divergent regime: interchange-guided removal is several-fold safer than replacement-guided at the same layer budgets, while Llama-3.1-8B ties the two protocols for pruning cost even though interchange KL is lower, showing metric gaps need not map one-to-one to removal. Before layer removal or merging, score both swap-KLs on the target checkpoint; the diagnostic requires only unlabeled forward passes.
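The difference between the two probes can be illustrated on a toy stack of "layers". This is a sketch under strong assumptions: the three layers are hypothetical hand-written functions, not transformer blocks, and the KL here is taken between softmax outputs for a single input, which may differ from the paper's exact swap-KL definition.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q):
    """KL(p || q) for discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def run(layers, x):
    for layer in layers:
        x = layer(x)
    return softmax(x)


def replacement(layers, i, j):
    """Fill layer j's slot with layer i's map, leaving layer i in place."""
    out = list(layers)
    out[j] = layers[i]
    return out


def interchange(layers, i, j):
    """Swap the positions of layers i and j."""
    out = list(layers)
    out[i], out[j] = out[j], out[i]
    return out


# Hypothetical toy layers.
def shift(x):
    return [v + 0.5 for v in x]


def scale(x):
    return [1.1 * v for v in x]


def positional_bias(x):
    return [v + 0.1 * k for k, v in enumerate(x)]


layers = [shift, scale, positional_bias]
x = [0.2, -0.1, 0.4]
base = run(layers, x)
kl_replacement = kl(base, run(replacement(layers, 0, 1), x))
kl_interchange = kl(base, run(interchange(layers, 0, 1), x))
```

In this toy case swapping `shift` and `scale` changes the logits only by a constant, which softmax ignores, so the interchange KL is essentially zero, while substituting `shift` for `scale` changes the output distribution. The two probes can therefore rank the same pair of layers differently.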


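2605.15694 2026-05-19 cs.LG updated version

Learning Normalized Energy Models for Linear Inverse Problems

Nicolas Zilberstein, Santiago Segarra, Eero Simoncelli, Florentin Guth

Affiliations: Rice University; Flatiron Institute; New York University

AI summary: This paper proposes a new energy-based model for linear inverse problems. A covariance-based regularization term enforces consistency across measurement conditions, so the model can compute normalized posterior densities without additional training or fine-tuning; it also enables energy-guided adaptive sampling, unbiased Metropolis-Hastings correction steps, and estimation of the degradation operator via Bayes' rule.

Comments ICML 2026

Journal ref: Int'l Conf Machine Learning (ICML), Jul 2026. https://openreview.net/forum?id=PlFJwgaaDK

AI abstract (translated from Chinese)

Generative diffusion models can provide powerful prior probability models for inverse problems in imaging, but existing implementations have two key limitations: (i) the prior density is represented implicitly, and (ii) they rely on likelihood approximations that introduce sampling bias. We address these challenges by introducing a new energy-based model that is trained for denoising with a covariance-based regularization term, which enforces consistency across different measurement conditions. The trained model can compute normalized posterior densities for a variety of linear inverse problems without additional retraining or fine-tuning. Besides preserving the sampling capabilities of diffusion models, this enables capabilities that were previously unavailable: energy-guided adaptive sampling that adjusts the sampling schedule on the fly, unbiased Metropolis-Hastings correction steps, and estimation of the degradation operator via Bayes' rule. We validate the method on multiple datasets (ImageNet, CelebA, AFHQ) and tasks (inpainting, deblurring), showing performance that is competitive with or better than established baselines.

English abstract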

Generative diffusion models can provide powerful prior probability models for inverse problems in imaging, but existing implementations suffer from two key limitations: $(i)$ the prior density is represented implicitly, and $(ii)$ they rely on likelihood approximations that introduce sampling biases. We address these challenges by introducing a new energy-based model trained for denoising with a covariance-based regularization term that enforces consistency across different measurement conditions. The trained model can compute normalized posterior densities for diverse linear inverse problems, without additional retraining or fine tuning. In addition to preserving the sampling capabilities of diffusion models, this enables previously unavailable capabilities: energy-guided adaptive sampling that adjusts schedules on-the-fly, unbiased Metropolis-Hastings correction steps, and blind estimation of the degradation operator via Bayes rule. We validate the method on multiple datasets (ImageNet, CelebA, AFHQ) and tasks (inpainting, deblurring), demonstrating competitive or superior performance to established baselines.


2605.14005 2026-05-19 cs.CL cs.LG updated version

Mistletoe: Stealthy Acceleration-Collapse Attacks on Speculative Decoding

Shuoyang Sun, Chang Dai, Hao Fang, Kuofeng Gao, Xinhao Zhong, Yi Sun, Fan Mo, Shu-Tao Xia, Bin Chen

Affiliations: Harbin Institute of Technology, Shenzhen; South China University of Technology; Tsinghua Shenzhen International Graduate School, Tsinghua University; Huawei Technology

AI summary: This paper proposes the Mistletoe attack, which optimizes a degradation objective together with a semantic-preservation objective to stealthily reduce the acceptance length τ of speculative decoding, cutting its speedup while keeping output quality intact.

AI abstract (translated from Chinese)

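Breaking Winner-Takes-All: Cooperative Policy Optimization Improves Diverse Reasoning in Large Language Models

Haoxuan Chen, Tianming Liang, Wei-Shi Zheng, Jian-Fang Hu

Affiliations: ISEE Lab, Sun Yat-sen University

AI summary: This paper proposes Group Cooperative Policy Optimization (GCPO), which shifts the training paradigm from rollout competition to team cooperation and improves both the accuracy and the solution diversity of large language models on reasoning tasks.

AI abstract (translated from Chinese)

Reinforcement learning with verifiers (RLVR) has become a central paradigm for improving the reasoning of large language models (LLMs), yet popular group-based optimization algorithms such as GRPO often suffer from exploration collapse: the model converges prematurely on a narrow set of high-scoring patterns and loses the ability to explore new solutions. Recent work tries to mitigate this by adding entropy regularization or a diversity bonus, but these approaches do not change the winner-takes-all nature of training, in which rollouts still compete for individual advantage instead of cooperating to maximize global diversity. In this paper we propose Group Cooperative Policy Optimization (GCPO), which shifts the training paradigm from rollout competition to team cooperation. Specifically, GCPO replaces independent rollout scoring with team-level credit assignment: a rollout is rewarded for its contribution to the team's coverage of valid solutions rather than for its individual accuracy. This coverage is described as a determinant volume over reward-weighted semantic embeddings, to which only correct and non-redundant rollouts contribute. During advantage estimation, GCPO redistributes the collective team reward to each rollout according to its average marginal contribution to the team. This cooperative training paradigm steers optimization toward non-redundant correct reasoning paths. Across multiple reasoning benchmarks, GCPO significantly improves both reasoning accuracy and solution diversity over existing approaches. Code will be released at https://github.com/bradybuddiemarch/gcpo.

English abstract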

Reinforcement learning with verifiers (RLVR) has become a central paradigm for improving LLM reasoning, yet popular group-based optimization algorithms like GRPO often suffer from exploration collapse, where the models prematurely converge on a narrow set of high-scoring patterns and lose the ability to explore new solutions. Recent efforts attempt to alleviate this by adding entropy regularization or a diversity bonus. However, these approaches do not change the winner-takes-all nature of training, where rollouts still compete for individual advantage rather than cooperating to maximize global diversity. In this work, we propose Group Cooperative Policy Optimization (GCPO), which shifts the training paradigm from rollout competition to team cooperation. Specifically, GCPO replaces independent rollout scoring with team-level credit assignment: a rollout is rewarded by how much it contributes to the team's valid solution coverage, rather than its individual accuracy. This coverage is described as a determinant volume over reward-weighted semantic embeddings, where only correct and non-redundant rollouts contribute to this volume. During advantage estimation, GCPO redistributes the collective team reward to each single rollout according to its average marginal contribution to the team. This cooperative training paradigm routes optimization toward non-redundant correct reasoning paths. Experiments across multiple reasoning benchmarks demonstrate that GCPO significantly improves both reasoning accuracy and solution diversity over existing approaches. Code will be released at https://github.com/bradybuddiemarch/gcpo.


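2605.10871 2026-05-19 physics.med-ph cs.AI cs.LG updated version

Attractor-Vascular Coupling Theory: Formal Grounding and Empirical Validation for AAMI-Standard Cuffless Blood Pressure Estimation from Smartphone Photoplethysmography

Timothy Oladunni, Farouk Ganiyu Adewumi

Affiliations: Department of Computer Science, Morgan State University

AI summary: This paper proposes a mathematical framework showing that cardiac attractor geometry encodes enough blood pressure information for AAMI-standard estimation, and validates the theory with a calibrated cuffless blood pressure model based on photoplethysmography (PPG).

AI abstract (translated from Chinese)

This work proposes Attractor-Vascular Coupling Theory (AVCT), a mathematical framework showing that cardiac attractor geometry encodes enough blood pressure (BP) information for AAMI-standard estimation, and validates the theory with a calibrated cuffless BP model that uses photoplethysmography (PPG). AVCT is grounded in Cardiac Stability Theory and is operationalized through Takens delay embedding and attractor morphology extraction. Two theorems, one proposition, and one corollary formally justify the use of PPG attractor features for BP estimation and predict the feature-importance hierarchy. A LightGBM model trained on pulse transit time (PTT) and Cardiac Stability Index (CSI) attractor features was evaluated with strict leave-one-subject-out cross-validation (LOSO-CV) on 46 subjects from BIDMC ICU (n = 9) and VitalDB surgical data (n = 37), 29,684 windows in total. The model achieved a mean absolute error (MAE) of 2.05 mmHg for systolic BP (SBP) and 1.67 mmHg for diastolic BP (DBP), with correlations r = 0.990 and r = 0.991, meeting the AAMI/IEEE SP10 requirement of an MAE below 5 mmHg. The median per-subject MAE was 1.87/1.54 mmHg, and 70%/76% of subjects individually met the AAMI criteria. A PPG-only ablation using nine smartphone attractor features came within 0.05 mmHg of the ECG+PPG model, showing that clinical-grade BP tracking is achievable with only a smartphone camera while surpassing prior LOSO-CV results with fewer sensors. All four AVCT predictions were confirmed quantitatively, with a 91.5% error reduction from uncalibrated to calibrated estimation (epsilon_cal = 0.915). Unlike post-hoc explainable AI methods, AVCT predicts features that satisfy the architectural faithfulness criterion of the Explainable-AI Trustworthiness (EAT) framework, and it grounds BP estimation in nonlinear dynamical systems theory.

English abstract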

This work proposes Attractor-Vascular Coupling Theory (AVCT), a mathematical framework showing that cardiac attractor geometry encodes blood pressure (BP) information sufficient for AAMI-standard estimation, and validates the theory through a calibrated cuffless BP model using photoplethysmography (PPG). AVCT is grounded in Cardiac Stability Theory and operationalized using Takens delay embedding and attractor morphology extraction. Two theorems, one proposition, and one corollary formally justify the use of PPG attractor features for BP estimation and predict the feature-importance hierarchy. A LightGBM model trained on pulse transit time (PTT) and Cardiac Stability Index (CSI) attractor features under single-point calibration was evaluated using strict leave-one-subject-out cross-validation (LOSO-CV) on 46 subjects from BIDMC ICU (n = 9) and VitalDB surgical data (n = 37), comprising 29,684 windows. The model achieved systolic BP (SBP) mean absolute error (MAE) of 2.05 mmHg and diastolic BP (DBP) MAE of 1.67 mmHg, with correlations r = 0.990 and r = 0.991, satisfying the AAMI/IEEE SP10 requirement of MAE below 5 mmHg. Median per-subject MAE was 1.87/1.54 mmHg, and 70%/76% of subjects individually satisfied AAMI criteria. A PPG-only ablation using nine smartphone attractor features matched the ECG+PPG model within 0.05 mmHg, demonstrating that clinical-grade BP tracking is achievable using only a smartphone camera while surpassing prior generalized LOSO-CV results using fewer sensors. All four AVCT predictions were quantitatively confirmed, with 91.5% error reduction from uncalibrated to calibrated estimation (epsilon_cal = 0.915). Unlike post-hoc explainable AI methods, AVCT predicts features satisfying the architectural faithfulness criterion of the Explainable-AI Trustworthiness (EAT) framework and grounding BP estimation in nonlinear dynamical systems theory.


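2605.10236 2026-05-19 cs.LG cs.AI updated version

When Does Non-Uniform Replay Matter in Reinforcement Learning?

Michal Korniak, Mikołaj Czarnecki, Yarden As, Piotr Miłoś, Pieter Abbeel, Michal Nauman

Affiliations: ETH Zurich; University of Warsaw; UC Berkeley; Amazon FAR

AI summary: This paper studies when non-uniform replay is effective in reinforcement learning. It finds that replay volume, expected recency, and the entropy of the replay distribution are the deciding factors, and it proposes a simple and effective Truncated Geometric replay strategy that improves sample efficiency.

AI abstract (translated from Chinese)

Modern off-policy reinforcement learning algorithms usually rely on simple uniform replay sampling, and it remains unclear when and why non-uniform replay improves over this strong baseline. Across diverse RL settings, we show that the effectiveness of non-uniform replay is governed by three factors: replay volume, the number of transitions replayed per environment step; expected recency, how recent the sampled transitions are; and the entropy of the replay sampling distribution. Our main contribution is to clarify when non-uniform replay is beneficial and to give practical guidance for replay design in modern off-policy RL. We find that non-uniform replay helps most when replay volume is low, and that high-entropy sampling matters even at comparable expected recency. Motivated by these findings, we adopt a simple Truncated Geometric replay strategy that favors recent experience while keeping entropy high and adding negligible computational overhead. Across large-scale parallel simulation, single-task, and multi-task settings, including three modern algorithms evaluated on five RL benchmark suites, this sampling strategy improves sample efficiency in low-volume regimes and remains competitive when replay volume is high.

English abstract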

Modern off-policy reinforcement learning algorithms often rely on simple uniform replay sampling and it remains unclear when and why non-uniform replay improves over this strong baseline. Across diverse RL settings, we show that the effectiveness of non-uniform replay is governed by three factors: replay volume, the number of replayed transitions per environment step; expected recency, how recent sampled transitions are; and the entropy of the replay sampling distribution. Our main contribution is clarifying when non-uniform replay is beneficial and providing practical guidance for replay design in modern off-policy RL. Namely, we find that non-uniform replay is most beneficial when replay volume is low, and that high-entropy sampling is important even at comparable expected recency. Motivated by these findings, we adopt a simple Truncated Geometric replay that biases sampling toward recent experience while preserving high entropy and incurring negligible computational overhead. Across large-scale parallel simulation, single-task, and multi-task settings, including three modern algorithms evaluated on five RL benchmark suites, this replay sampling strategy improves sample efficiency in low-volume regimes while remaining competitive when replay volume is high.
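One plausible reading of "Truncated Geometric replay" is to sample a transition's age from a geometric distribution truncated to the buffer length. The sketch below follows that reading; the parameterisation (success probability `p`, age 0 meaning the newest transition) is an assumption, since the abstract does not spell out the authors' exact sampler.

```python
import math
import random


def truncated_geometric_probs(buffer_size, p):
    """P(age = k) proportional to (1 - p)**k for k = 0..buffer_size-1 (age 0 = newest)."""
    if buffer_size <= 0 or not 0 < p < 1:
        raise ValueError("need buffer_size > 0 and 0 < p < 1")
    weights = [(1 - p) ** k for k in range(buffer_size)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_ages(buffer_size, p, batch_size, rng):
    """Draw a batch of ages; buffer index = newest_index - age."""
    probs = truncated_geometric_probs(buffer_size, p)
    return rng.choices(range(buffer_size), weights=probs, k=batch_size)


def entropy(probs):
    """Shannon entropy (nats) of the sampling distribution."""
    return -sum(q * math.log(q) for q in probs if q > 0)


def expected_age(probs):
    """Mean age of a sampled transition (lower means more recent)."""
    return sum(k * q for k, q in enumerate(probs))
```

A small `p` keeps the distribution close to uniform, so entropy stays high, while still lowering the expected age. That matches the trade-off between recency and entropy that the abstract describes.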


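2605.09855 2026-05-19 cs.LG updated version

Concordia: Self-Improving Synthetic Tables for Federated LLMs

Jimin Huang, Duanyu Feng, Nuo Chen, Xiaoyu Wang, Zhiqiang Zhang, Xueqing Peng, Mingquan Lin, Prayag Tiwari, Guojun Xiong, Alejandro Lopez-Lira, Sophia Ananiadou

Affiliations: University of Manchester; National University of Singapore; New York University; University of Minnesota; Halmstad University; Harvard University; University of Florida

AI summary: This paper studies how self-improving synthetic tables can help adapt large language models in federated learning when raw data cannot be shared. It proposes Concordia, a tri-level optimization framework that uses parameter-efficient LoRA training and lightweight utility scorers to improve federated validation utility and cross-client stability.

Comments 12 pages

AI abstract (translated from Chinese)

Federated learning (FL) makes it possible to train large language models (LLMs) without sharing raw data, but adapting LLMs under strict data isolation and non-IID client distributions remains challenging. Synthetic data offers a natural privacy-preserving surrogate for local training, yet existing federated pipelines typically treat synthetic generation as static or only loosely coupled with downstream optimization, so utility diminishes rapidly under heterogeneous clients. We study federated adaptation on tabular tasks where raw records and validation data cannot be shared and local training must rely entirely on synthetic tables. We propose Concordia, a tri-level optimization framework that aligns synthetic data generation with federated validation utility under these constraints. At the client level, models are adapted through parameter-efficient LoRA training on synthetic tables. Clients also learn lightweight utility scorers from private validation feedback and use them to reweight synthetic samples during local training. At the outer level, each client refines its own synthetic table generator with group-relative policy optimization (GRPO), guided by an ensemble of heterogeneous scorers shared across clients, without aggregating generator parameters or exposing validation data. On privacy-sensitive tabular benchmarks from finance and healthcare, Concordia consistently improves federated performance, cross-client stability, and robustness to distribution shift compared with static and decoupled synthetic-data baselines.

English abstract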

Federated learning (FL) enables training large language models (LLMs) without sharing raw data, but adapting LLMs under strict data isolation and non-IID client distributions remains challenging in practice. Synthetic data offers a natural privacy-preserving surrogate for local training, yet existing federated pipelines typically treat synthetic generation as static or loosely coupled with downstream optimization, leading to rapidly diminishing utility under heterogeneous clients. We study federated adaptation of LLMs on tabular tasks where raw records and validation data cannot be shared, and local training must rely entirely on synthetic tables. We propose Concordia, a tri-level optimization framework that aligns synthetic data generation with federated validation utility despite these constraints. At the client level, models are adapted via parameter-efficient LoRA training on synthetic tables. Clients additionally learn lightweight utility scorers from private validation feedback to reweight synthetic samples during local training. At the outer level, each client refines its own synthetic table generator using group-relative policy optimization (GRPO), guided by an ensemble of heterogeneous scorers shared across clients, without aggregating generator parameters or exposing validation data. Experiments on privacy-sensitive tabular benchmarks from finance and healthcare demonstrate that Concordia consistently improves federated performance, cross-client stability, and robustness to distribution shift compared to static and decoupled synthetic-data baselines.


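2605.09040 2026-05-19 cs.AI cs.IR cs.LG updated version

UxSID: Semantic-Aware User Interests Modeling for Ultra-Long Sequence

Hongwei Zhang, Qiqiang Zhong, Jiangxia Cao, Yiyang Lv, Huanjie Wang, Liwei Guan, Jing Yao, Yiyu Wang, Junfeng Shu, Zhaojie Liu, Han Li

Affiliations: Kuaishou Technology

AI summary: This paper proposes the UxSID framework, which uses interest memories shared within semantic groups and a two-level attention strategy to model ultra-long user sequences efficiently and with semantic awareness, achieving the best performance and increasing advertising revenue.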

Comments Work in progress

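2605.08738 2026-05-19 cs.LG cs.AI cs.CL updated version

SlimQwen: Exploring the Pruning and Distillation in Large MoE Model Pre-training

Shengkun Tang, Zekun Wang, Bo Zheng, Liangyu Wang, Rui Men, Siqi Zhang, Xiulong Yuan, Zihan Qiu, Zhiqiang Shen, Dayiheng Liu

Affiliations: Qwen Team, Alibaba Inc.; MBZUAI; KAUST

AI summary: This paper studies how to apply pruning and knowledge distillation in large-scale pre-training. It examines whether pruning gives a better initialization, how expert compression affects the final model, and which training strategies work, and it compresses Qwen3-Next-80A3B to a 23A2B model that remains competitive.

AI abstract (translated from Chinese)

Structured pruning and knowledge distillation (KD) are typical techniques for compressing large language models, but how they should be applied at pretraining scale remains unclear, especially for recent mixture-of-experts (MoE) models. This paper systematically studies MoE compression in large-scale pretraining, focusing on three key questions: whether pruning provides a better initialization than training from scratch, how expert compression choices affect the final model after continued training, and which training strategy is most effective. Our findings are as follows. First, across depth, width, and expert compression, pruning a pretrained MoE outperforms training from scratch under the same training budget. Second, different one-shot expert compression methods converge to similar final performance after large-scale continual pretraining. Motivated by this, we introduce a simple partial-preservation expert merging strategy that improves downstream performance on most benchmarks. Third, combining KD with the language modeling loss outperforms KD alone on knowledge-intensive tasks. We further propose multi-token prediction (MTP) distillation, which yields consistent gains. Finally, given the same number of training tokens, progressive pruning schedules outperform one-shot compression, suggesting that gradual architecture transitions lead to better optimization trajectories. Putting it all together, we compress Qwen3-Next-80A3B to a 23A2B model that retains competitive performance. These results offer practical guidance for efficient MoE compression at scale.

English abstract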

Structured pruning and knowledge distillation (KD) are typical techniques for compressing large language models, but it remains unclear how they should be applied at pretraining scale, especially to recent mixture-of-experts (MoE) models. In this work, we systematically study MoE compression in large-scale pretraining, focusing on three key questions: whether pruning provides a better initialization than training from scratch, how expert compression choices affect the final model after continued training, and which training strategy is most effective. We have the following findings: First, across depth, width, and expert compression, pruning a pretrained MoE consistently outperforms training the target architecture from scratch under the same training budget. Second, different one-shot expert compression methods converge to similar final performance after large-scale continual pretraining. Motivated by this, we introduce a simple partial-preservation expert merging strategy that improves downstream performance across most benchmarks. Third, combining KD with the language modeling loss outperforms KD alone, particularly on knowledge-intensive tasks. We further propose multi-token prediction (MTP) distillation, which yields consistent gains. Finally, given the same training tokens, progressive pruning schedules outperform one-shot compression, suggesting that gradual architecture transitions lead to better optimization trajectories. Putting it all together, we compress Qwen3-Next-80A3B to a 23A2B model that retains competitive performance. These results offer practical guidance for efficient MoE compression at scale.


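2605.07790 2026-05-19 cs.LG cs.CV updated version

Hessian Surgery: Class-Targeted Post-Hoc Rebalancing via Hessian Spike Perturbation

Hugo Vigna, Samuel Bontemps

Affiliations: CentraleSupélec – Université Paris-Saclay; ESILV – Léonard de Vinci

AI summary: This paper proposes Hessian Surgery, which perturbs model weights along spike eigenvectors to rebalance per-class accuracy without retraining, and reports improvements in balanced accuracy and standard deviation on CIFAR-10 and ISIC-2019.

Comments The code is available here: https://github.com/hugovigna/hessian-surgery.git

AI abstract (translated from Chinese)

The Hessian spectrum of trained deep networks has a characteristic structure: a continuous bulk of near-zero eigenvalues and a small number of large outlier eigenvalues (spikes), which confirms the relevance of Random Matrix Theory to deep learning. The number of spikes matches the number of classes minus one. Although prior work has described this structure, no method has put it to operational use to improve classification performance. We propose Hessian Surgery, a post-hoc optimization method that directly perturbs model weights along spike eigenvectors to rebalance per-class accuracy without retraining. We introduce (i) a spike-class sensitivity matrix that quantifies the directional derivative of each class's accuracy along each spike eigenvector, (ii) a constrained optimization of perturbation coefficients that targets weak classes while preserving strong ones, and (iii) an adaptive amplitude control that adjusts the perturbation budget according to iteration-level improvement signals. We obtain encouraging results on CIFAR-10 and ISIC-2019 in both balanced accuracy and standard deviation.

English abstract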

The Hessian spectrum of trained deep networks exhibits a characteristic structure: a continuous bulk of near-zero eigenvalues and a small number of large outlier eigenvalues (spikes), confirming the relevance of Random Matrix Theory in deep learning. The spike count matches the number of classes minus one. While prior work has described this structure, no method has exploited it operationally to improve classification performance. We propose Hessian Surgery, a post-hoc optimization method that directly perturbs model weights along spike eigenvectors to rebalance per-class accuracy without retraining. We introduce (i) a spike-class sensitivity matrix that quantifies the directional derivative of each class's accuracy along each spike eigenvector, (ii) a constrained optimization of perturbation coefficients that targets weak classes while preserving strong ones, and (iii) an adaptive amplitude control that raises or lowers the perturbation budget based on iteration-level improvement signals. We obtain encouraging results on CIFAR-10 and ISIC-2019 on both balanced accuracy and standard deviation.


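2605.00264 2026-05-19 cs.LG cs.GT updated version

Pessimism-Free Offline Learning in General-Sum Games via KL Regularization

Claire Chen, Yuheng Zhang

AI summary: This paper proposes a KL-regularized offline learning method that recovers equilibria in general-sum games without pessimism, improving learning efficiency through accelerated statistical rates and computationally efficient algorithms.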

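2604.23267 2026-05-19 cs.CL cs.LG updated version

Fine-tuning vs. In-context Learning in Large Language Models: A Formal Language Learning Perspective

Bishwamittra Ghosh, Soumi Das, Till Speicher, Qinyuan Wu, Mohammad Aflah Khan, Deepak Garg, Krishna P. Gummadi, Evimaria Terzi

Affiliations: Max Planck Institute for Software Systems; Boston University

AI summary: This paper compares fine-tuning and in-context learning in large language models from a formal language learning perspective. Using tasks with precise language boundaries, controlled string sampling, and no data contamination, it finds that fine-tuning outperforms in-context learning on in-distribution generalization, that the two perform comparably on out-of-distribution generalization, and that their inductive biases differ depending on proficiency level.

Comments Accepted at ACL 2026 (Main)

AI abstract (translated from Chinese)

Large language models (LLMs) operate in two fundamental learning modes, fine-tuning (FT) and in-context learning (ICL), which raises key questions about which mode yields greater language proficiency and whether the two differ in their inductive biases. Prior studies comparing FT and ICL have produced mixed and inconclusive results because of inconsistent experimental setups. To enable a rigorous comparison, we propose a formal language learning task, which offers precise language boundaries, controlled string sampling, and no data contamination, and we introduce a discriminative test of language proficiency in which an LLM succeeds if and only if it assigns higher generation probability to in-language strings than to out-of-language strings. Empirically, we find that: (a) FT has greater language proficiency than ICL on in-distribution generalization, but the two perform equally well on out-of-distribution generalization. (b) Their inductive biases, measured by the correlation of string generation probabilities, are similar when both modes have partially learned the language but diverge at higher proficiency levels. (c) Unlike FT, ICL performance varies substantially across models of different sizes and families and is sensitive to the token vocabulary of the language. Our work thus shows the promise of formal languages as a controlled testbed for evaluating LLMs on behaviors that are difficult to isolate in natural language datasets. Our source code is available at https://github.com/bishwamittra/formallm.

English abstract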

Large language models (LLMs) operate in two fundamental learning modes - fine-tuning (FT) and in-context learning (ICL) - raising key questions about which mode yields greater language proficiency and whether they differ in their inductive biases. Prior studies comparing FT and ICL have yielded mixed and inconclusive results due to inconsistent experimental setups. To enable a rigorous comparison, we propose a formal language learning task - offering precise language boundaries, controlled string sampling, and no data contamination - and introduce a discriminative test for language proficiency, where an LLM succeeds if it assigns higher generation probability to in-language strings than to out-of-language strings. Empirically, we find that: (a) FT has greater language proficiency than ICL on in-distribution generalization, but both perform equally well on out-of-distribution generalization. (b) Their inductive biases, measured by the correlation in string generation probabilities, are similar when both modes partially learn the language but diverge at higher proficiency levels. (c) Unlike FT, ICL performance differs substantially across models of varying sizes and families and is sensitive to the token vocabulary of the language. Thus, our work demonstrates the promise of formal languages as a controlled testbed for evaluating LLMs, behaviors that are difficult to isolate in natural language datasets. Our source code is available at https://github.com/bishwamittra/formallm.
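
The discriminative test described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `pairwise_proficiency`, the toy language a^n b^n, and the stand-in scorer `toy_logprob` are all hypothetical, and aggregating over all (in-language, out-of-language) pairs is one plausible reading of the test. A real evaluation would take string log-probabilities from an LLM.

```python
from typing import Callable, Sequence


def pairwise_proficiency(
    logprob: Callable[[str], float],
    in_language: Sequence[str],
    out_of_language: Sequence[str],
) -> float:
    """Fraction of (in, out) pairs where the in-language string scores higher."""
    pairs = [(s, t) for s in in_language for t in out_of_language]
    wins = sum(logprob(s) > logprob(t) for s, t in pairs)
    return wins / len(pairs)


def toy_logprob(s: str) -> float:
    """Hypothetical stand-in scorer for the toy language a^n b^n (n >= 1)."""
    n = len(s) // 2
    is_member = n >= 1 and s == "a" * n + "b" * n
    # Members get a length-penalised score; non-members score far lower.
    return -0.1 * len(s) if is_member else -10.0 - 0.1 * len(s)


score = pairwise_proficiency(
    toy_logprob,
    in_language=["ab", "aabb", "aaabbb"],
    out_of_language=["ba", "aab", "abab"],
)
print(score)
```

A scorer that cannot tell the two sets apart (for example, one returning a constant) wins no pairs under the strict inequality, so the measure does not reward ties.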


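2604.23135 2026-05-19 cs.LG updated version

Characterizing Paraphrase-Induced Failures in Lean 4 Autoformalization

William Feng, Ethan Lou, Aryan Sharma

Affiliations: Yale University

AI summary: This study examines paraphrase-induced failure modes in Lean 4 autoformalization. Applying deterministic paraphrase rules to datasets of undergraduate and competition-level math problems, it finds that paraphrase sensitivity is dominated by failures at the code-generation layer and that failure types vary by dataset. The results provide a failure-mode taxonomy for autoformalization and motivate targeted training interventions.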

Abstract

Lean 4 autoformalization has become increasingly popular in recent years, with frontier language models and open-weight autoformalizers now producing valid formalizations of mathematical theorems. However, these evaluations often rely on single canonical phrasings of theorems and rarely probe whether outputs are robust to natural variation in inputs, while prior work has shown that semantically equivalent paraphrases often induce divergent formal outputs. We study the structure of these divergences in Lean 4 by applying deterministic paraphrase rules to datasets of undergraduate and Olympiad-level math problems. Across four frontier models and three open-weight autoformalizers, we find that paraphrase sensitivity is dominated by failures at the code-generation layer, and that these failures are typed differently by dataset. Furthermore, these patterns generalize to open-weight models, showing that state-of-the-art autoformalizers still struggle to generate valid Lean code. Our results provide a failure-mode taxonomy for autoformalization and motivate training-time interventions targeted at specific compilation failures.


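2604.18966 2026-05-19 cs.LG cs.AI updated version

Self-Improving Tabular Language Models via Iterative Reward-Guided Post-Training

Yunbo Long, Tejumade Afonja, Guangya Hao, Alexandra Brintrup, Mario Fritz

Affiliations: Department of Engineering, University of Cambridge; CISPA Helmholtz Center for Information Security, Saarbrücken, Germany; The Alan Turing Institute, London

AI summary: This paper studies iterative reward-guided post-training through a generate-score-align protocol and proposes TabGRAA, a group-relative alignment method that improves tabular language models by comparing the group-averaged policy/reference log-ratios of high- and low-reward generated groups. Across five mixed-type benchmarks it improves on additional supervised fine-tuning and achieves the best average trade-off between fidelity and downstream utility, while keeping empirical privacy diagnostics near the supervised baseline.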

Abstract

Tabular language models can generate synthetic tables by modeling rows as token sequences, but they are typically trained once with supervised fine-tuning and then used as static synthesizers. This is limiting because next-token likelihood does not directly optimize the distributional, utility, and indistinguishability properties used to evaluate synthetic data. We study iterative reward-guided post-training for tabular language models through a generate--score--align protocol, where a generator samples synthetic rows, a task-specified reward ranks them, and the model is updated relative to a fixed supervised reference. Within this protocol, we propose \textbf{TabGRAA} (\textbf{Tab}ular \textbf{G}roup-\textbf{R}elative \textbf{A}dvantage \textbf{A}lignment), a group-relative alignment method that compares high- and low-reward generated groups using group-averaged policy/reference log-ratios rather than one-to-one preference pairs. Across five mixed-type benchmarks, TabGRAA improves a GReaT backbone beyond additional supervised fine-tuning and achieves the strongest average trade-off among adapted DPO, KTO, and NPO baselines on fidelity and downstream utility, while maintaining empirical privacy diagnostics near the supervised baseline. Ablations show that the gains depend on meaningful reward ranking and stable group-level updates rather than extra training alone. Reward-substitution and scorer-separation studies further show that the post-training loop can use both classifier-based and classifier-free rewards, and that proper scorer separation is important for preserving the fidelity--utility--privacy trade-off. These results position TabGRAA as a self-improving post-training method for tabular language-model generators, complementary to strong static tabular synthesizers.


2604.16429 2026-05-19 cs.LG cs.AI cs.CV physics.ao-ph updated version

(Sparse) Attention to the Details: Preserving Spectral Fidelity in ML-based Weather Forecasting Models

Maksim Zhdanov, Ana Lucic, Max Welling, Jan-Willem van de Meent

Affiliations: AMLab; University of Amsterdam

AI summary: This paper proposes Mosaic, a model that generates ensemble members through learned functional perturbations and operates on native-resolution grids via mesh-aligned block-sparse attention, capturing long-range dependencies at linear cost. At 1.5° resolution it matches or outperforms models trained at finer resolution and achieves state-of-the-art results.

Comments Accepted to ICML 2026

Abstract

We introduce Mosaic, a probabilistic weather forecasting model that addresses three failure modes of spectral degradation in ML-based weather prediction: spectral damping (statistical), high-frequency aliasing (architectural), and residual high-frequency leakage (parametric). Mosaic generates ensemble members through learned functional perturbations and operates on native-resolution grids via mesh-aligned block-sparse attention, a hardware-aligned mechanism that captures long-range dependencies at linear cost by sharing keys and values across spatially adjacent queries. At 1.5° resolution with 214M parameters, Mosaic matches or outperforms models trained on 6$\times$ finer resolution on key variables and achieves state-of-the-art results among 1.5° models, producing well-calibrated ensembles whose individual members exhibit near-perfect spectral alignment across all resolved frequencies. A 24-member, 10-day forecast takes under 12s on a single H100~GPU. Code is available at https://github.com/maxxxzdn/mosaic.


2604.15851 2026-05-19 cs.LG cs.AI cs.CR updated version

DPrivBench: Benchmarking LLMs' Reasoning for Differential Privacy

Erchi Wang, Pengrun Huang, Eli Chien, Om Thakkar, Kamalika Chaudhuri, Yu-Xiang Wang, Ruihan Wu

Affiliations: Halıcıoğlu Data Science Institute, UC San Diego; Department of Computer Science and Engineering, UC San Diego; Department of Electrical Engineering, National Taiwan University; OpenAI

AI summary: This paper introduces DPrivBench, a benchmark for evaluating how well large language models reason about differential privacy. It finds substantial gaps in current models' reasoning about advanced algorithms and identifies directions for improving automated differential privacy reasoning.

Abstract

Differential privacy (DP) has a wide range of applications for protecting data privacy, but designing and verifying DP algorithms requires expert-level reasoning, creating a high barrier for non-expert practitioners. Prior works either rely on specialized verification languages that demand substantial domain expertise or remain semi-automated and require human-in-the-loop guidance. In this work, we investigate whether large language models (LLMs) can automate DP reasoning. We introduce DPrivBench, a benchmark in which each instance asks whether a function or algorithm satisfies a stated DP guarantee under specified assumptions. The benchmark is carefully designed to cover a broad range of DP topics, span diverse difficulty levels, and resist shortcut reasoning through trivial pattern matching. Experiments show that while the strongest models handle textbook mechanisms well, all models struggle with advanced algorithms, revealing substantial gaps in current DP reasoning capabilities. Through further analytic study and failure-mode analysis, we identify several promising directions for improving automated DP reasoning. Our benchmark provides a solid foundation for developing and evaluating such methods, and complements existing benchmarks for mathematical reasoning.


2604.15762 2026-05-19 cs.LG updated version

Zero-Shot Scalable Resilience in UAV Swarms: A Decentralized Imitation Learning Framework with Physics-Informed Graph Interactions

Huan Lin, Lianghui Ding

Affiliations: Institute of Image Communication and Network Engineering, School of Integrated Circuits, Shanghai Jiao Tong University

AI summary: This paper proposes a decentralized imitation learning framework that encodes local interactions with a physics-informed graph neural network, enabling robust recovery of UAV swarms under large-scale failures and fragmented topologies.

Abstract

Large-scale unmanned aerial vehicle (UAV) failures can split a UAV swarm network into disconnected sub-networks, making decentralized recovery both urgent and difficult. Centralized recovery methods depend on global topology information and become communication-heavy after severe fragmentation. Decentralized heuristics and multi-agent reinforcement learning methods are easier to deploy, but their performance often degrades when the swarm scale and damage severity vary. We present the Physics-informed Graph Adversarial Imitation Learning algorithm (PhyGAIL), which adopts centralized training with decentralized execution. PhyGAIL builds bounded local interaction graphs from heterogeneous observations and uses a physics-informed graph neural network to encode directional local interactions as gated message passing with explicit attraction and repulsion. This gives the policy a physically grounded coordination bias while keeping local observations scale-invariant. It also uses scenario-adaptive imitation learning to improve training under fragmented topologies and variable-length recovery episodes. Our analysis establishes bounded local graph amplification, bounded interaction dynamics, and controlled variance of the terminal success signal. A policy trained on 20-UAV swarms transfers directly to swarms of up to 500 UAVs without fine-tuning, and outperforms representative baselines in reconnection reliability, recovery speed, motion safety, and runtime efficiency.


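2604.12288 2026-05-19 stat.ML cs.LG stat.ME updated version

SMART Fine-tuning Factor Augmented Neural Lasso

Jinhang Chai, Jianqing Fan, Cheng Gao, Qishuo Yin

Affiliations: Department of Operations Research and Financial Engineering

AI summary: This paper proposes SMART, a residual tuning framework that incorporates a pre-trained source model as an augmented feature, for high-dimensional nonparametric regression with variable selection. Using a low-rank factor structure and a residual tuning decomposition, it handles covariate and posterior shifts jointly and derives minimax-optimal excess risk bounds.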

Comments Authors are listed in alphabetical order

Abstract

Fine-tuning is a widely used strategy for adapting pre-trained models to new tasks, yet its methodology and theoretical properties in high-dimensional nonparametric settings with variable selection have not yet been developed. We propose a source-model-augmented residual tuning (SMART) framework, which incorporates the pre-trained source model as an augmented feature into the target learner and estimates only the residual target-specific component. The approach is widely applicable, from parametric and sparse models to neural networks and blackbox machine learning models. We focus on the development of fine-tuning factor-augmented neural Lasso, resulting in SMART-FAN-Lasso. This transfer-learning framework for high-dimensional nonparametric regression with variable selection simultaneously handles covariate and posterior shifts. We use a low-rank factor structure to manage high-dimensional dependent covariates and a residual tuning decomposition in which the target function is expressed as a function of source model and other target-specific variables, thereby reducing the effective complexity of the target task. We derive minimax-optimal excess risk bounds, characterizing the precise conditions, in terms of relative sample sizes and function complexities, under which fine-tuning yields statistical acceleration over single-task learning. Extensive numerical experiments across diverse covariate- and posterior-shift scenarios demonstrate that SMART-FAN-Lasso consistently outperforms standard baselines and achieves near-oracle performance even under severe target sample size constraints, empirically validating the derived rates.


2604.11852 2026-05-19 q-bio.QM cs.AI cs.LG updated version

Limitations of Sequence-Based Protein Representations for Parkinson's Disease Classification: A Leakage-Free Benchmark

César Jesús Núñez-Prado, Grigori Sidorov, Liliana Chanona-Hernández

Affiliations: Higher School of Mechanical and Electrical Engineering, Instituto Politécnico Nacional; Research Center for Computing, Instituto Politécnico Nacional

AI summary: This paper studies the limitations of sequence-based protein representations for Parkinson's disease classification. A leakage-free benchmark of several representations derived from protein primary sequences shows that sequence information alone has limited discriminative power for disease classification and that richer biological features are needed.

Comments 36 pages, 10 figures, 9 tables. Updated title, abstract, figures, and revised experimental discussion

Abstract

The identification of reliable molecular biomarkers for Parkinson's disease remains challenging due to its multifactorial nature. Although protein sequences constitute a fundamental and widely available source of biological information, their standalone discriminative capacity for complex disease classification remains unclear. In this work, we present a controlled and leakage-free evaluation of multiple representations derived exclusively from protein primary sequences, including amino acid composition, k-mers, physicochemical descriptors, hybrid representations, and embeddings from protein language models, all assessed under a nested stratified cross-validation framework to ensure unbiased performance estimation. The best-performing configuration (ProtBERT + MLP) achieves an F1-score of 0.704 +/- 0.028 and ROC-AUC of 0.748 +/- 0.047, indicating only moderate discriminative performance. Classical representations such as k-mers reach comparable F1 values (up to approximately 0.667), but exhibit highly imbalanced behavior, with recall close to 0.98 and precision around 0.50, reflecting a strong bias toward positive predictions. Across representations, performance differences remain within a narrow range (F1 between 0.60 and 0.70), while unsupervised analyses reveal no intrinsic structure aligned with class labels, and statistical testing (Friedman test, p = 0.1749) does not indicate significant differences across models. These results demonstrate substantial overlap between classes and indicate that primary sequence information alone provides limited discriminative power for Parkinson's disease classification. This work establishes a reproducible baseline and provides empirical evidence that more informative biological features, such as structural, functional, or interaction-based descriptors, are required for robust disease modeling.
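
The link between the quoted precision, recall, and F1 values for k-mers can be checked with a short worked computation. The sketch below only applies the standard harmonic-mean definition of F1 to the rounded figures in the abstract; the helper name `f1_score` is ours, and the comparison with an always-positive classifier assumes roughly balanced classes, which the abstract does not state.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values quoted in the abstract for k-mer representations.
kmer_f1 = f1_score(precision=0.50, recall=0.98)

# Assumption: balanced classes, so labelling everything positive
# gives precision 0.5 and recall 1.0.
all_positive_f1 = f1_score(precision=0.50, recall=1.0)

print(round(kmer_f1, 3), round(all_positive_f1, 3))
```

Under that balance assumption, the k-mer F1 of roughly 0.66 is about what a classifier that labels every sample positive would obtain, which is consistent with the abstract's reading of a strong bias toward positive predictions.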


2604.09450 2026-05-19 cs.LG cs.AI eess.IV updated version

ECHO: Efficient Chest X-ray Report Generation with One-step Block Diffusion

Lifeng Chen, Tianqi You, Hao Liu, Zhimin Bao, Jile Jiao, Xiao Han, Zhicai Ou, Tao Sun, Xiaofeng Mou, Xiaojie Jin, Yi Xu

Affiliations: Beijing Jiaotong University; Dalian University of Technology

AI summary: This paper proposes ECHO, an efficient diffusion-based vision-language model for chest X-ray report generation. One-step block diffusion and a response-asymmetric diffusion strategy substantially improve generation efficiency and textual coherence while maintaining clinical accuracy.

Abstract

Chest X-ray report generation (CXR-RG) has the potential to substantially alleviate radiologists' workload. However, conventional autoregressive vision--language models (VLMs) suffer from high inference latency due to sequential token decoding. Diffusion-based models offer a promising alternative through parallel generation, but they still require multiple denoising iterations. Compressing multi-step denoising to a single step could further reduce latency, but often degrades textual coherence due to the mean-field bias introduced by token-factorized denoisers. To address this challenge, we propose \textbf{ECHO}, an efficient diffusion-based VLM (dVLM) for chest X-ray report generation. ECHO enables stable one-step-per-block inference via a novel Direct Conditional Distillation (DCD) framework, which mitigates the mean-field limitation by constructing unfactorized supervision from on-policy diffusion trajectories to encode joint token dependencies. In addition, we introduce a Response-Asymmetric Diffusion (RAD) training strategy that further improves training efficiency while maintaining model effectiveness. Extensive experiments demonstrate that ECHO surpasses state-of-the-art autoregressive methods, improving RaTE and SemScore by \textbf{64.33\%} and \textbf{60.58\%} respectively, while achieving up to \textbf{$8\times$} inference speedup with negligible degradation in clinical accuracy.


2604.06398 2026-05-19 physics.ao-ph cs.LG physics.comp-ph updated version

Calibration of a neural network ocean closure for improved mean state and variability

Pavel Perezhogin, Alistair Adcroft, Laure Zanna

Affiliations: Courant Institute School of Mathematics, Computing, and Data Science, New York University, New York, NY, USA; Program in Atmospheric and Oceanic Sciences, Princeton University, Princeton, NJ, USA

AI summary: This paper calibrates the parameters of a neural network parameterization with Ensemble Kalman Inversion to improve the mean state and variability of coarse-resolution ocean models, reducing errors by factors of about 1.7 to 3.3.

Abstract

Global ocean models exhibit biases in the mean state and variability, particularly at coarse resolution, where mesoscale eddies are unresolved. To address these biases, parameterization coefficients are typically tuned ad hoc. Here, we formulate parameter tuning as a calibration problem using Ensemble Kalman Inversion (EKI). We optimize parameters of a neural network parameterization of mesoscale eddies in two idealized ocean models at coarse resolution. The calibrated parameterization reduces errors by factors of 1.7-3.3 in the time-averaged fluid interfaces and their variability compared to the unparameterized model, depending on the metric and configuration. The EKI method is robust to noise in time-averaged statistics arising from chaotic ocean dynamics. Furthermore, we propose an efficient calibration protocol that bypasses integration to statistical equilibrium by carefully choosing an initial condition. These results demonstrate that systematic calibration can substantially improve coarse-resolution ocean simulations and provide a practical pathway for reducing biases in global ocean models.
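
To make the calibration idea concrete, here is a minimal sketch of Ensemble Kalman Inversion on a toy problem. It is not the paper's setup: the scalar linear `forward` model stands in for an expensive ocean simulation, the perturbed-observation update is one common EKI variant, and the ensemble size, noise level, and iteration count are arbitrary assumptions chosen for illustration.

```python
import random


def forward(theta: float) -> float:
    """Toy forward model standing in for an expensive ocean simulation."""
    return 2.0 * theta + 1.0


def eki_step(ensemble, y, noise_var, rng):
    """One perturbed-observation EKI update (scalar parameter and observation)."""
    outputs = [forward(t) for t in ensemble]
    n = len(ensemble)
    t_mean = sum(ensemble) / n
    g_mean = sum(outputs) / n
    # Ensemble cross-covariance and output variance replace gradients.
    c_tg = sum((t - t_mean) * (g - g_mean) for t, g in zip(ensemble, outputs)) / n
    c_gg = sum((g - g_mean) ** 2 for g in outputs) / n
    gain = c_tg / (c_gg + noise_var)
    return [
        t + gain * (y + rng.gauss(0.0, noise_var ** 0.5) - g)
        for t, g in zip(ensemble, outputs)
    ]


rng = random.Random(0)
true_theta, noise_var = 3.0, 0.01
y = forward(true_theta)                              # synthetic observation
ensemble = [rng.gauss(0.0, 2.0) for _ in range(50)]  # prior ensemble
for _ in range(10):
    ensemble = eki_step(ensemble, y, noise_var, rng)
estimate = sum(ensemble) / len(ensemble)
print(estimate)
```

The update uses only forward-model evaluations, never derivatives, which is why the method suits parameters embedded in a simulator whose statistics are noisy.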


2604.00919 2026-05-19 quant-ph cond-mat.stat-mech cs.LG updated version

Multi-Mode Quantum Annealing for Generative Representation Learning with Boltzmann Priors

Gilhan Kim, Daniel K. Park

Affiliations: Department of Statistics and Data Science, Yonsei University, Seoul 03722, Republic of Korea; Department of Applied Statistics, Yonsei University, Seoul 03722, Republic of Korea; Department of Quantum Information, Yonsei University, Seoul 03722, Republic of Korea

AI summary: This paper proposes a quantum-annealing-based framework that equips variational autoencoders with general Boltzmann priors. Three complementary annealing modes support efficient training, unconditional generation, and conditional generation. The framework shows stable training and high-quality generation on MNIST, Fashion-MNIST, and CelebA, and performs well in anomaly detection and on financial data.

Comments 25 pages, 8 figures

Abstract

Energy-based models provide a natural bridge between statistical physics and machine learning by representing data through structured energy landscapes. Boltzmann machines are a particularly compelling class of such models for capturing complex interactions among latent variables, but their use in modern generative learning has been limited by the classical intractability of sampling from general (non-restricted) Boltzmann distributions. Here we develop a quantum-annealing-based framework that enables variational autoencoders with general Boltzmann priors. The framework employs three complementary annealing modes tailored to different stages of learning and deployment: diabatic quantum annealing provides unbiased Boltzmann samples for efficient training, slower annealing concentrates samples near low-energy configurations of the learned prior for unconditional generation, and conditional annealing with external fields steers the learned energy landscape toward attribute-specific regions for conditional generation and semantic editing. Using up to 2000 qubits on a D-Wave Advantage2 processor, we demonstrate stable training and high-quality generation on MNIST, Fashion-MNIST, and CelebA, achieving faster convergence and lower reconstruction loss than a Gaussian-prior VAE with the same encoder-decoder architecture. Beyond generation, the learned energy function provides out-of-distribution detection signals that add discriminative power beyond reconstruction loss. We demonstrate that these scores separate in-distribution samples from held-out digit classes in one-class MNIST experiments and improve the detection of market regime shifts in financial data. These results establish quantum annealing as a practical and controllable physical mechanism for energy-based representation learning and generative modeling beyond the reach of tractable classical approaches.


2603.27341 2026-05-19 cs.AI cs.CV cs.LG updated version

A Comparative Study in Surgical AI: Potential and Limitations of Data, Compute, and Scaling

Kirill Skobelev, Eric Fithian, Yegor Baranovski, Jack Cook, Sandeep Angara, Shauna Otto, Zhuang-Fang Yi, John Zhu, Daniel A. Donoho, X. Y. Han, Neeraj Mainkar, Margaux Masson-Forsythe

Affiliations: Center for Applied AI, Chicago Booth; Surgical Data Science Collective; Children’s National Hospital; Operations Management & Tolan Center for Healthcare, Chicago Booth

AI summary: Using state-of-the-art AI methods available in 2026, this paper studies performance and limitations in surgical tool detection. It finds that even with multi-billion-parameter models and large amounts of training data, current vision language models fall short on neurosurgical tool detection, and that increasing model size and training time yields only limited gains, indicating that current AI still faces significant challenges in surgical applications.

Abstract

Recent Artificial Intelligence (AI) models have matched or exceeded human experts in several benchmarks of biomedical task performance, but surgical benchmarks in particular are often missing from prominent medical benchmark suites. Since surgery requires integrating disparate tasks, generally-capable AI models could be particularly attractive as a collaborative tool if performance could be improved. On the one hand, the canonical approach of scaling architecture size and training data is attractive, especially since there are millions of hours of surgical video data generated per year. On the other hand, preparing surgical data for AI training requires significantly higher levels of professional expertise, and training on that data requires expensive computational resources. These trade-offs paint an uncertain picture of whether and to-what-extent modern AI could aid surgical practice. In this paper, we explore this question through a case study of surgical tool detection using state-of-the-art AI methods available in 2026. We demonstrate that even with multi-billion parameter models and extensive training, current Vision Language Models fall short in the seemingly simple task of tool detection in neurosurgery. Additionally, we show scaling experiments indicating that increasing model size and training time only leads to diminishing improvements in relevant performance metrics. Thus, our experiments suggest that current models could still face significant obstacles in surgical use cases. Moreover, some obstacles cannot be simply ``scaled away'' with additional compute and persist across diverse model architectures, raising the question of whether data and label availability are the only limiting factors. We discuss the main contributors to these constraints and advance potential solutions.


2603.18972 2026-05-19 cs.LG updated version

Best-of-Both-Worlds Multi-Dueling Bandits: Unified Algorithms for Stochastic and Adversarial Preferences under Condorcet and Borda Objectives

S Akash, Pratik Gajane, Jawar Singh

AI summary: This paper proposes unified multi-dueling bandit algorithms for both stochastic and adversarial environments under Condorcet and Borda objectives, achieving optimal performance without prior knowledge of the regime.

Abstract

Multi-dueling bandits, where a learner selects $m \geq 2$ arms per round and observes only the winner, arise naturally in many applications including ranking and recommendation systems, yet a fundamental question has remained open: can a single algorithm perform optimally in both stochastic and adversarial environments, without knowing which regime it faces? We answer this affirmatively, providing the first best-of-both-worlds algorithms for multi-dueling bandits under both Condorcet and Borda objectives. For the Condorcet setting, we propose $\texttt{MetaDueling}$, a black-box reduction that converts any dueling bandit algorithm into a multi-dueling bandit algorithm by transforming multi-way winner feedback into an unbiased pairwise signal. Instantiating our reduction with $\texttt{Versatile-DB}$ yields the first best-of-both-worlds algorithm for multi-dueling bandits: it achieves $O(\sqrt{KT})$ pseudo-regret against adversarial preferences and the instance-optimal $O\left(\sum_{i \neq a^\star} \frac{\log T}{\Delta_i}\right)$ pseudo-regret under stochastic preferences, both simultaneously and without prior knowledge of the regime. For the Borda setting, we propose $\texttt{SA-MiDEX}$, a stochastic-and-adversarial algorithm that achieves $O\left(K^2 \log KT + K \log^2 T + \sum_{i: \Delta_i^{\mathrm{B}} > 0} \frac{K\log KT}{(\Delta_i^{\mathrm{B}})^2}\right)$ regret in stochastic environments and $O\left(K \sqrt{T \log KT} + K^{1/3} T^{2/3} (\log K)^{1/3}\right)$ regret against adversaries, again without prior knowledge of the regime. We complement our upper bounds with matching lower bounds for the Condorcet setting. For the Borda setting, our upper bounds are near-optimal with respect to the lower bounds (within a factor of $K$) and match the best-known results in the literature.


2603.18702 2026-05-19 cs.LG 版本更新

Off-Policy Learning with Limited Supply

有限供应下的离策略学习

Koichi Tanaka, Ren Kishimoto, Bushun Kawagishi, Yusuke Narita, Yasuo Yamamoto, Nobuyuki Shimizu, Yuta Saito

发表机构 * Keio University（Keio大学）； Institute of Science Tokyo（东京科学研究所）； Meiji University（Meiji大学）； Yale University（Yale大学）； LY Corporation（LY公司）； Hanjuku-kaso, Co., Ltd.

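AI summary: This paper studies off-policy learning in contextual bandits under limited supply and proposes OPLS, a method that allocates limited-supply items more efficiently by considering each user's expected reward relative to other users. Experiments show its advantage in limited-supply settings.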

Comments Published as a conference paper at WWW 2026

详情

AI中文摘要

我们研究了情境老虎机中的离策略学习（OPL），这在推荐系统和在线广告等广泛的实际应用中起着关键作用。典型的OPL在情境老虎机中假设一个无约束环境，其中策略可以无限次选择同一物品。然而，在许多实际应用中，包括优惠券分配和电子商务，有限供应通过分布式优惠券的预算限制或产品库存限制来限制物品。在这些设置中，贪心地选择当前用户预期奖励最高的物品可能导致该物品的早期耗尽，使其无法为未来可能生成更高预期奖励的用户使用。因此，最优的无约束设置中的OPL方法在有限供应设置中可能变得次优。为了解决这个问题，我们提供了一个理论分析，显示传统贪心OPL方法可能无法最大化策略性能，并证明在有限供应设置中必须存在性能更优的策略。基于这一见解，我们引入了一种新的方法，称为有限供应下的离策略学习（OPLS）。与简单选择预期奖励最高的物品不同，OPLS关注相对预期奖励较高的物品，从而更有效地分配有限供应的物品。我们在合成和现实数据集上的实验证明，OPLS在具有有限供应的情境老虎机问题中优于现有的OPL方法。

英文摘要

We study off-policy learning (OPL) in contextual bandits, which plays a key role in a wide range of real-world applications such as recommendation systems and online advertising. Typical OPL in contextual bandits assumes an unconstrained environment where a policy can select the same item infinitely. However, in many practical applications, including coupon allocation and e-commerce, limited supply constrains items through budget limits on distributed coupons or inventory restrictions on products. In these settings, greedily selecting the item with the highest expected reward for the current user may lead to early depletion of that item, making it unavailable for future users who could potentially generate higher expected rewards. As a result, OPL methods that are optimal in unconstrained settings may become suboptimal in limited supply settings. To address the issue, we provide a theoretical analysis showing that conventional greedy OPL approaches may fail to maximize the policy performance, and demonstrate that policies with superior performance must exist in limited supply settings. Based on this insight, we introduce a novel method called Off-Policy learning with Limited Supply (OPLS). Rather than simply selecting the item with the highest expected reward, OPLS focuses on items with relatively higher expected rewards compared to the other users, enabling more efficient allocation of items with limited supply. Our empirical results on both synthetic and real-world datasets show that OPLS outperforms existing OPL methods in contextual bandit problems with limited supply.
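The depletion effect described in the abstract can be illustrated with a toy calculation. The sketch below is not the OPLS method; it only compares greedy sequential allocation with the best possible allocation on a hypothetical instance with two users and two items, where every item has a supply of one. The reward table `q` and the function names are assumptions made for illustration.

```python
from itertools import permutations

# Hypothetical expected rewards q[user][item]; each item has supply 1.
q = [
    [1.0, 0.9],  # user 0 (arrives first)
    [0.8, 0.1],  # user 1 (arrives second)
]


def greedy_total(q):
    """Each arriving user takes the remaining item with the highest expected reward."""
    remaining = set(range(len(q[0])))
    total = 0.0
    for rewards in q:
        if not remaining:
            break
        best = max(remaining, key=lambda item: rewards[item])
        total += rewards[best]
        remaining.remove(best)
    return total


def best_total(q):
    """Brute-force the best one-item-per-user assignment (only viable for tiny cases)."""
    items = range(len(q[0]))
    return max(
        sum(q[user][item] for user, item in enumerate(assignment))
        for assignment in permutations(items, len(q))
    )


if __name__ == "__main__":
    print("greedy:", greedy_total(q))
    print("best:  ", best_total(q))
```

On this instance the greedy rule gives item 0 to the first user and leaves the second user with a low-reward item (total about 1.1), whereas giving item 1 to the first user yields a total of about 1.7.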

2603.14462 2026-05-19 cs.LG cs.AI 版本更新

STAG-CN: Spatio-Temporal Apiary Graph Convolutional Network for Disease Onset Prediction in Beehive Sensor Networks

STAG-CN：时空蜂巢图卷积网络用于蜂巢传感器网络中疾病发病预测

Sungwoo Kang

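AI summary: This study proposes STAG-CN, a spatio-temporal graph convolutional network that models inter-hive relationships to predict disease onset by combining physical co-location with climatic sensor correlation. It finds that shared environmental response patterns carry more predictive signal than spatial proximity.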

Comments Null result after running with 10 seeds

详情

AI中文摘要

蜂蜜蜂群损失威胁着全球授粉服务，但当前监测系统将每个蜂箱视为孤立单元，忽略了疾病在养蜂场中传播的空间路径。本文介绍了时空蜂巢图卷积网络（STAG-CN），一种图神经网络，用于疾病发病预测。STAG-CN基于双邻接图，结合蜂箱会话间的物理共置和气候传感器相关性，通过基于因果扩张卷积和Chebyshev谱图卷积的时空-时空三明治架构处理多变量物联网传感器流。在韩国AI Hub养蜂数据集（数据集#71488）上进行扩展窗口时间交叉验证后，STAG-CN在三天预测范围内达到F1分数0.607。消融研究显示，仅气候邻接矩阵可达到全模型性能（F1=0.607），而仅物理邻接矩阵则为F1=0.274，表明共享的环境响应模式比空间接近性在疾病发病预测中更具预测信号。这些结果为基于图的生物安全监控在精准养蜂中的概念验证奠定了基础，证明了蜂箱传感器相关性编码了单个蜂箱方法无法察觉的疾病相关信息。

英文摘要

Honey bee colony losses threaten global pollination services, yet current monitoring systems treat each hive as an isolated unit, ignoring the spatial pathways through which diseases spread across apiaries. This paper introduces the Spatio-Temporal Apiary Graph Convolutional Network (STAG-CN), a graph neural network that models inter-hive relationships for disease onset prediction. STAG-CN operates on a dual adjacency graph combining physical co-location and climatic sensor correlation among hive sessions, and processes multivariate IoT sensor streams through a temporal-spatial-temporal sandwich architecture built on causal dilated convolutions and Chebyshev spectral graph convolutions. Evaluated on the Korean AI Hub apiculture dataset (dataset #71488) with expanding-window temporal cross-validation, STAG-CN achieves an F1 score of 0.607 at a three-day forecast horizon. An ablation study reveals that the climatic adjacency matrix alone matches full-model performance (F1 = 0.607), while the physical adjacency alone yields F1 = 0.274, indicating that shared environmental response patterns carry stronger predictive signal than spatial proximity for disease onset. These results establish a proof-of-concept for graph-based biosecurity monitoring in precision apiculture, demonstrating that inter-hive sensor correlations encode disease-relevant information invisible to single-hive approaches.

2603.12145 2026-05-19 cs.LG cs.AI cs.SE 版本更新

Automatic Generation of High-Performance RL Environments

自动生成高性能强化学习环境

Seth Karten, Rahul Dev Appapogu, Chi Jin

发表机构 * Princeton University（普林斯顿大学）； Independent Researcher（独立研究者）

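AI summary: This paper presents a closed-loop methodology that produces equivalent high-performance reinforcement learning environments at minimal compute cost. It demonstrates three distinct workflows, verifies the absence of a sim-to-sim gap across five environments, and shows how a new environment can be created.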

Comments 20 pages, 5 figures

详情

AI中文摘要

将复杂的强化学习（RL）环境转换为高性能实现传统上需要数月的专业工程工作。我们提出了一种闭环方法，以最小的计算成本生成等效的高性能环境。我们的方法使用通用提示模板、分层验证（属性、交互和运行测试）、迭代修复和跨后端策略转移来验证无仿真到仿真的差距。我们展示了三个不同的工作流程跨越五个环境：（1）从Game Boy模拟器PyBoy直接翻译到我们的EmuRust（通过Rust IPC）和从Pokemon Showdown翻译到我们的PokeJAX（通过JAX）；（2）通过与现有高性能实现的吞吐量一致性进行验证，如Puffer Pong、MJX和Brax在匹配的GPU批次大小下；（3）新环境的创建：TCGJax，第一个Pokemon TCG Pocket环境，从网页提取的规范中创建。在2亿个参数下，环境开销低于训练时间的4%。我们的闭环方法验证了所有五个环境的等效性。TCGJax，由一个不在公共存储库中的私有参考合成，用于控制代理预训练数据的污染问题。

英文摘要

Translating complex reinforcement learning (RL) environments into high-performance implementations has traditionally required months of specialized engineering. We present a closed-loop methodology that produces equivalent high-performance environments for minimal compute cost. Our method uses a generic prompt template, hierarchical verification (property, interaction, and rollout tests), iterative repair, and cross-backend policy transfer to verify no sim-to-sim gap. We demonstrate three distinct workflows across five environments: (1) Direct translation (no prior performance implementation exists) from Game Boy emulator PyBoy to our EmuRust (via Rust IPC) and from Pokemon Showdown to our PokeJAX (via JAX); (2) Translation verified against existing performance implementations via throughput parity with Puffer Pong, MJX and Brax at matched GPU batch sizes; and (3) New environment creation: TCGJax, the first Pokemon TCG Pocket environment, created from a web-extracted specification. At 200M parameters, the environment overhead drops below 4% of training time. Our closed-loop methodology confirms equivalence for all five environments. TCGJax, synthesized from a private reference absent from public repositories, serves as a contamination control for agent pretraining data concerns.

2603.11276 2026-05-19 stat.ML cs.LG 版本更新

RIE-Greedy: Regularization-Induced Exploration for Contextual Bandits

RIE-Greedy: 基于正则化的探索策略用于上下文老虎机

Tong Li, Thiago de Queiroz Casanova, Eric M. Schwartz, Victor Kostyuk, Dehan Kong, Joseph J. Williams

发表机构 * University of Toronto（多伦多大学）； University of Michigan（密歇根大学）

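AI summary: This paper proposes a regularization-induced exploration strategy (RIE-Greedy) that uses the randomness of model fitting as an intrinsic source of exploration. It proves equivalence to Thompson Sampling in the two-armed bandit case and reports better results than baselines such as epsilon-greedy in large-scale business environments.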

详情

AI中文摘要

现实中的复杂奖励模型的上下文老虎机问题通常使用迭代训练的模型（如提升树）来解决。然而，直接应用简单的有效探索策略（如Thompson Sampling或UCB）在这些黑箱估计器上很困难。现有方法依赖于复杂的假设或不可行的程序，难以在实践中验证和实现。本文探讨了一种无探索（纯贪婪）的动作选择策略，利用模型拟合过程中的随机性作为内在探索源。更具体地说，我们注意到基于交叉验证的正则化过程中的随机性可以自然地诱导出Thompson Sampling-like的探索。我们证明了这种正则化诱导的探索在两臂老虎机情况下在理论上等价于Thompson Sampling，并在大规模商业环境中相对于epsilon-greedy和其他最先进的方法在经验上实现了可靠的探索。总体而言，本文揭示了正则化估计器训练本身如何诱导有效的探索，为上下文老虎机设计提供了理论洞察和实践指导。

英文摘要

Real-world contextual bandit problems with complex reward models are often tackled with iteratively trained models, such as boosting trees. However, it is difficult to directly apply simple and effective exploration strategies (such as Thompson Sampling or UCB) on top of those black-box estimators. Existing approaches rely on sophisticated assumptions or intractable procedures that are hard to verify and implement in practice. In this work, we explore the use of an exploration-free (pure-greedy) action selection strategy that exploits the randomness inherent in the model-fitting process as an intrinsic source of exploration. More specifically, we note that the stochasticity in the cross-validation-based regularization process can naturally induce Thompson Sampling-like exploration. We show that this regularization-induced exploration is theoretically equivalent to Thompson Sampling in the two-armed bandit case and empirically leads to reliable exploration in large-scale business environments compared to benchmark methods such as epsilon-greedy and other state-of-the-art approaches. Overall, our work reveals how regularized estimator training itself can induce effective exploration, offering both theoretical insight and practical guidance for contextual bandit design.

2603.10935 2026-05-19 cs.LG cs.AI cs.CV 版本更新

Spherical VAE with Cluster-Aware Feasible Regions: Guaranteed Prevention of Posterior Collapse

具有聚类感知可行区域的球形VAE：保证防止后验崩溃

Zegu Zhang, Jian Zhang

发表机构 * Independent Researcher（独立研究者）

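AI summary: This paper proposes a new framework with theoretical guarantees of non-collapsed solutions, using spherical-shell geometry and cluster-aware constraints to prevent posterior collapse in VAEs. It reports 100% collapse prevention on synthetic and real-world datasets.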

Comments 8 pages, 6 figures

详情

AI中文摘要

Proximal-IMH: 用于独立Metropolis-Hastings的近端后验提议

Youguang Chen, George Biros

发表机构 * Oden Institute for Computational Engineering and Sciences（奥登计算工程与科学研究所）

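AI summary: This paper proposes an improved independence Metropolis-Hastings algorithm that removes the bias of an approximate posterior through an auxiliary optimization problem, improving stability and sampling efficiency while adhering to the exact model.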

详情

AI中文摘要

我们考虑了在科学、工程和成像中的贝叶斯反问题中从后验分布采样的问题。我们的方法属于独立Metropolis-Hastings（IMH）采样算法家族，常用于贝叶斯推断。依赖于存在一个更便宜但可能有显著偏差的近似后验分布，我们引入了Proximal-IMH，通过辅助优化问题纠正近似后验的样本，从而在精确模型和近似参考点周围获得局部调整。对于理想化设置，我们证明了近端校正能够收紧近似和精确后验之间的匹配，从而提高接受率和混合性。该方法适用于线性和非线性输入-输出算子，并特别适用于精确后验采样成本过高的反问题。我们展示了包含多模态和数据驱动先验的数值实验，结果表明Proximal-IMH在现有IMH变体中表现更优。

英文摘要

We consider the problem of sampling from a posterior distribution arising in Bayesian inverse problems in science, engineering, and imaging. Our method belongs to the family of independence Metropolis-Hastings (IMH) sampling algorithms, which are common in Bayesian inference. Relying on the existence of an approximate posterior distribution that is cheaper to sample from but may have significant bias, we introduce Proximal-IMH, a scheme that removes this bias by correcting samples from the approximate posterior through an auxiliary optimization problem. This yields a local adjustment that trades off adherence to the exact model against stability around the approximate reference point. For idealized settings, we prove that the proximal correction tightens the match between approximate and exact posteriors, thereby improving acceptance rates and mixing. The method applies to both linear and nonlinear input-output operators and is particularly suitable for inverse problems where exact posterior sampling is too expensive. We present numerical experiments including multimodal and data-driven priors with nonlinear input-output operators. The results show that Proximal-IMH reliably outperforms existing IMH variants.
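For reference, a plain independence Metropolis-Hastings sampler can be sketched as follows. This is the standard IMH baseline, not the Proximal-IMH correction, whose details are not given in the abstract; the function and argument names are hypothetical. A proposal `y`, drawn independently of the current state `x`, is accepted with probability min(1, w(y)/w(x)), where w is the ratio of target density to proposal density.

```python
import math
import random


def imh(log_target, sample_proposal, log_proposal, x0, n_steps, rng):
    """Plain independence Metropolis-Hastings.

    log_target and log_proposal may be unnormalized log densities.
    Returns (samples, acceptance_rate).
    """
    x = x0
    log_w_x = log_target(x) - log_proposal(x)  # log importance weight of current state
    samples, accepted = [], 0
    for _ in range(n_steps):
        y = sample_proposal(rng)  # the proposal does not depend on x
        log_w_y = log_target(y) - log_proposal(y)
        # Accept with probability min(1, w(y) / w(x)); u lies in (0, 1].
        u = 1.0 - rng.random()
        if math.log(u) <= log_w_y - log_w_x:
            x, log_w_x = y, log_w_y
            accepted += 1
        samples.append(x)
    return samples, accepted / n_steps


if __name__ == "__main__":
    rng = random.Random(0)
    # Illustrative choice: target N(1, 1), proposal N(0, 2^2), both unnormalized.
    samples, rate = imh(
        log_target=lambda x: -0.5 * (x - 1.0) ** 2,
        sample_proposal=lambda r: r.gauss(0.0, 2.0),
        log_proposal=lambda x: -0.5 * (x / 2.0) ** 2,
        x0=0.0,
        n_steps=5000,
        rng=rng,
    )
    print("acceptance rate:", rate)
```

The closer the proposal is to the target, the flatter the weights w and the higher the acceptance rate. This is the quantity the abstract says the proximal correction improves.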

2602.21265 2026-05-19 cs.CL cs.LG cs.SE 版本更新

ToolMATH: A Diagnostic Benchmark for Long-Horizon Tool Use under Systematic Tool-Catalog Constraints

ToolMATH: 一种用于在系统性工具目录约束下评估长周期工具使用的诊断基准

Hyeonje Choi, Jeongsoo Lee, Hyojun Lee, Jay-Yoon Lee

发表机构 * Seoul National University（首尔国立大学）

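AI summary: This paper proposes ToolMATH, a math-grounded diagnostic benchmark for evaluating long-horizon tool use under controlled tool-catalog conditions. It converts step-by-step MATH solutions into reusable Python tools and pairs them with problems that require sequential tool use, reuse of intermediate outputs, and logically connected chains of tool calls, in order to evaluate adaptability, robustness, and tool chaining across catalog conditions.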

Comments Submitted to NeurIPS Evaluation & Dataset Track

详情

AI中文摘要

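We introduce ToolMATH, a math-grounded diagnostic benchmark for evaluating long-horizon tool use under controlled tool-catalog conditions. ToolMATH converts step-by-step MATH solutions into reusable Python tools with natural-language descriptions and typed schemas, and pairs each problem with a tool environment that requires sequential tool use, reuse of intermediate outputs, and logically connected chains of tool calls. ToolMATH controls tool availability and catalog difficulty by constructing gold tools and difficulty-graded distractors. It also incorporates behavior-conditioned metrics, so that diagnostic evaluation goes beyond final accuracy. Based on these measurements, ToolMATH emphasizes three evaluation axes: (1) adaptability, which measures how much gold-tool success is retained when the gold tools are fully replaced by distractors; (2) robustness, which measures stability when distractors are added as noise; and (3) tool chaining, which measures whether a model maintains accuracy over long chains of executed tool calls. In addition, trace-level failure analysis characterizes how models fail under each tool-catalog condition. These diagnostics reveal distinct model profiles: reliable tool use, tool avoidance, adaptive substitution, and the effects of unreliable tool catalogs. Overall, ToolMATH provides a controlled testbed for evaluating how language models adapt to changing tool availability, remain robust to distractors, and stay correct across long-horizon tool-use trajectories.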

英文摘要

Recent progress has rapidly advanced our understanding of the mechanisms underlying in-context learning in modern attention-based neural networks. However, existing results focus exclusively on unimodal data; in contrast, the theoretical underpinnings of in-context learning for multi-modal data remain poorly understood. We introduce a mathematically tractable framework for studying multi-modal learning and explore when transformer-like architectures can recover Bayes-optimal performance in-context. To model multi-modal problems, we assume the observed data arises from a latent factor model. Our first result comprises a negative take on expressibility: we prove that single-layer, linear self-attention fails to recover the Bayes-optimal predictor uniformly over the task distribution. To address this limitation, we introduce a novel, linearized cross-attention mechanism, which we study in the regime where both the number of cross-attention layers and the context length are large. We show that this cross-attention mechanism is provably Bayes optimal when optimized using gradient flow. Our results underscore the benefits of depth for in-context learning and establish the provable utility of cross-attention for multi-modal distributions.

2602.03535 2026-05-19 cs.LG cs.NA math.NA math.OC 版本更新

Sparse Training of Neural Networks based on Multilevel Mirror Descent

基于多级镜像下降法的神经网络稀疏训练

Yannick Lunk, Sebastian J. Scott, Leon Bungert

Affiliations: Institute of Mathematics; University of Würzburg; Institute of Mathematics, CAIDAS University of Würzburg

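AI summary: This paper proposes a dynamic sparse training algorithm based on linearized Bregman iterations / mirror descent. It exploits naturally arising sparsity by alternating between static and dynamic sparsity-pattern updates, combining sparsity-inducing Bregman iterations with adaptive freezing of the network structure to explore the sparse parameter space efficiently while preserving sparsity. A multilevel optimization framework guarantees convergence, and experiments on standard benchmarks show highly sparse and accurate models with substantial improvements in both theoretical FLOPs and training time.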

2602.02830 2026-05-19 cs.LG stat.ME 版本更新

SC3D: Dynamic and Differentiable Causal Discovery for Temporal and Instantaneous Graphs

SC3D：动态和可微的因果发现用于时序和瞬时图

Sourajit Das, Dibyajyoti Chakraborty, Romit Maulik

发表机构 * Institute for Computational Data Science（计算数据科学研究所）； School of Mechanical Engineering（机械工程学院）

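AI summary: This paper proposes SC3D, a dynamic and differentiable causal discovery method for temporal and instantaneous graphs. It jointly learns lag-specific adjacency matrices and an instantaneous directed acyclic graph through a two-stage differentiable framework, improving the stability and accuracy of the recovered causal structure.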

Comments 12 pages

详情

AI中文摘要

PyHealth 2.0: 一个全面的开源工具包，用于可访问和可重复的临床深度学习

John Wu, Yongda Fan, Zhenbang Wu, Paul Landes, Eric Schrock, Sayeed Sajjad Razin, Arjun Chatterjee, Naveen Baskaran, Joshua Steier, Andrea Fitzpatrick, Bilal Arif, Rian Atri, Jathurshan Pradeepkumar, Siddhartha Laghuvarapu, Junyi Gao, Adam R. Cross, Jimeng Sun

发表机构 * University of Illinois Urbana-Champaign, Urbana, IL, USA（伊利诺伊大学厄巴纳-香槟分校）； PyHealth Research Initiative（PyHealth研究计划）； University of Illinois College of Medicine, Chicago, IL, USA（伊利诺伊大学医学院）； The University of Edinburgh, Edinburgh, UK（爱丁堡大学）； Health Data Research UK, London, UK（英国健康数据研究）； Department of Biomedical Engineering, Bangladesh University of Engineering（孟加拉国工程大学生物医学工程系）

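AI summary: This paper presents PyHealth 2.0, a comprehensive open-source toolkit that addresses reproducibility and accessibility in clinical AI research by unifying 15+ datasets, 20+ clinical tasks, 25+ models, 5+ interpretability methods, and uncertainty quantification, enabling predictive modeling in as few as 7 lines of code.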

Comments Under Review

详情

AI中文摘要

难以复制基线、高计算成本和所需领域专业知识创建了持续存在的临床AI研究障碍。为了解决这些挑战，我们介绍了PyHealth 2.0，一个增强的临床深度学习工具包，使在7行代码内即可实现预测建模。PyHealth 2.0提供了三个关键贡献：(1) 一个全面的工具包，通过统一15+数据集、20+临床任务、25+模型、5+可解释性方法和不确定性量化（包括符合预测的置信预测）在一个框架中解决可重复性和兼容性挑战，支持多种临床数据模态——信号、影像和电子健康记录——并翻译5+医学编码标准；(2) 以可访问性为重点的设计，支持多模态数据和多样化的计算资源，处理速度比以往快39倍，内存使用减少20倍，使从16GB笔记本电脑到生产系统都能轻松使用；(3) 一个活跃的开源社区，拥有400多名成员，通过详尽的文档、可重复研究贡献以及与学术医疗系统和产业伙伴的合作，包括通过RHealth实现的多语言支持，降低了领域专业知识的障碍。PyHealth 2.0建立了一个开源基础和社区，推动了可访问和可重复的医疗AI发展。可在pip install pyhealth中获取。

英文摘要

Difficulty replicating baselines, high computational costs, and required domain expertise create persistent barriers to clinical AI research. To address these challenges, we introduce PyHealth 2.0, an enhanced clinical deep learning toolkit that enables predictive modeling in as few as 7 lines of code. PyHealth 2.0 offers three key contributions: (1) a comprehensive toolkit addressing reproducibility and compatibility challenges by unifying 15+ datasets, 20+ clinical tasks, 25+ models, 5+ interpretability methods, and uncertainty quantification including conformal prediction within a single framework that supports diverse clinical data modalities - signals, imaging, and electronic health records - with translation of 5+ medical coding standards; (2) accessibility-focused design accommodating multimodal data and diverse computational resources with up to 39x faster processing and 20x lower memory usage, enabling work from 16GB laptops to production systems; and (3) an active open-source community of 400+ members lowering domain expertise barriers through extensive documentation, reproducible research contributions, and collaborations with academic health systems and industry partners, including multi-language support via RHealth. PyHealth 2.0 establishes an open-source foundation and community advancing accessible, reproducible healthcare AI. Available at pip install pyhealth.

2601.09071 2026-05-19 cs.LG 版本更新

Resolving Predictive Multiplicity for the Rashomon Set

解决Rashomon集的预测多样性

Parian Haghighat, Hadis Anahideh, Cynthia Rudin

发表机构 * University of Illinois Chicago（伊利诺伊大学芝加哥分校）； Duke University（杜克大学）

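AI summary: To address prediction inconsistency within the Rashomon set, this paper proposes three methods (outlier correction, local patching, and pairwise reconciliation) that reduce disagreement among predictions and improve model reliability. Experiments show that they lower disagreement metrics while maintaining competitive accuracy.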

详情

DOI: 10.1609/aaai.v40i44.41076

AI中文摘要

多个同样准确的模型对于给定的预测任务的存在导致了预测多样性，其中一组称为Rashomon集的模型在准确性上相似，但个体预测却存在分歧。这种不一致性削弱了在高风险应用中对一致预测的信任。我们提出了三种方法来减少Rashomon集中成员之间的不一致性。第一种方法是异常值修正，异常值具有无法被良好模型正确预测的标签，异常值可能导致Rashomon集在局部区域有高方差的预测，因此修正它们可以降低方差。第二种方法是局部修补，在测试点的局部区域，模型可能因为某些模型存在偏差而相互矛盾。我们可以通过验证集检测并修正这些偏差，从而减少多样性。第三种方法是成对协调，我们找到在测试点周围区域上意见不一致的模型对，并修改这些不一致的预测，使其更少偏向。这三种方法可以单独或共同使用，各自具有独特的优势。协调后的预测可以被提炼成一个单一的可解释模型用于现实部署。在多个数据集上的实验表明，我们的方法在减少不一致度的同时保持了竞争性的准确性。

英文摘要

The existence of multiple, equally accurate models for a given predictive task leads to predictive multiplicity, where a "Rashomon set" of models achieve similar accuracy but diverge in their individual predictions. This inconsistency undermines trust in high-stakes applications where we want consistent predictions. We propose three approaches to reduce inconsistency among predictions for the members of the Rashomon set. The first approach is outlier correction. An outlier has a label that none of the good models are capable of predicting correctly. Outliers can cause the Rashomon set to have high-variance predictions in a local area, so fixing them can lower variance. Our second approach is local patching. In a local region around a test point, models may disagree with each other because some of them are biased. We can detect and fix such biases using a validation set, which also reduces multiplicity. Our third approach is pairwise reconciliation, where we find pairs of models that disagree on a region around the test point. We modify predictions that disagree, making them less biased. These three approaches can be used together or separately, and they each have distinct advantages. The reconciled predictions can then be distilled into a single interpretable model for real-world deployment. In experiments across multiple datasets, our methods reduce disagreement metrics while maintaining competitive accuracy.

2601.08118 2026-05-19 cs.AI cs.LG 版本更新

MirrorBench: A Benchmark to Evaluate Conversational User-Proxy Agents for Human-Likeness

MirrorBench: 一个评估对话用户代理人类化能力的基准测试

Ashutosh Hathidara, Julien Yu, Vaishali Senthil, Sebastian Schreiber, Anil Babu Ankisettipalli

发表机构 * SAP Labs（SAP实验室）

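AI summary: This paper proposes MirrorBench, a benchmark for evaluating the human-likeness of conversational user-proxy agents. By combining several lexical-diversity metrics with LLM-based evaluation metrics, it reveals systematic gaps between user proxies and real human users.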

Comments KDD 2026 (Dataset & Benchmark Track)

2601.07122 2026-05-19 cs.CR cs.AI cs.LG 版本更新

Enhancing Cloud Network Resilience via a Robust LLM-Empowered Multi-Agent Reinforcement Learning Framework

通过一个鲁棒的LLM赋能的多智能体强化学习框架增强云网络韧性

Yixiao Peng, Hao Hu, Feiyang Li, Xinye Cao, Yingchang Jiang, Jipeng Tang, Guoshun Nan, Yuling Liu

发表机构 * State Key Laboratory of Mathematical Engineering and Advanced Computing（数学工程与先进计算国家重点实验室）； Henan Key Laboratory of Information Security（河南省信息安全重点实验室）； National Engineering Research Center for Mobile Network Technologies（移动网络技术国家工程研究中心）； Beijing University of Posts and Telecommunications（北京邮电大学）； Institute of Information Engineering, Chinese Academy of Sciences（中国科学院信息工程研究所）

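AI summary: This paper proposes an LLM-empowered multi-agent reinforcement learning framework that aims to improve the defense capability and resilience of cloud networks, using a hierarchical architecture and human-in-the-loop support to enhance adaptability and interpretability.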

详情

AI中文摘要

尽管虚拟化和资源池化赋予了云网络结构灵活性和弹性扩展能力，但它们不可避免地扩大了攻击面并挑战了网络的网络安全性。基于强化学习（RL）的防御策略已被开发用于在对抗条件下优化资源部署和隔离策略，以通过维护和恢复网络可用性来增强系统韧性。然而，现有方法缺乏鲁棒性，因为它们需要重新训练才能适应网络结构、节点规模、攻击策略和攻击强度的动态变化。此外，缺乏人类在回路（HITL）支持限制了可解释性和灵活性。为了解决这些限制，我们提出了CyberOps-Bots，一种由大语言模型（LLMs）赋能的分层多智能体强化学习框架。受MITRE ATT&CK的战术-技术模型启发，CyberOps-Bots具有双层架构：（1）一个上层LLM代理，包含四个模块——ReAct规划、IPDRR基础感知、长短时记忆和动作/工具整合，执行全局意识、人类意图识别和战术规划；（2）下层RL代理，通过异构分离预训练开发，执行原子防御动作，以在本地网络区域中执行。这种协同作用保留了LLM的适应性和可解释性，同时确保了可靠的RL执行。在真实云数据集上的实验表明，与最先进的算法相比，CyberOps-Bots在不重新训练的情况下，网络可用性保持在68.5%更高，并且在场景切换时实现了34.7%的性能提升。据我们所知，这是首次建立具有HITL支持的鲁棒LLM-RL框架用于云防御的研究。

英文摘要

While virtualization and resource pooling empower cloud networks with structural flexibility and elastic scalability, they inevitably expand the attack surface and challenge cyber resilience. Reinforcement Learning (RL)-based defense strategies have been developed to optimize resource deployment and isolation policies under adversarial conditions, aiming to enhance system resilience by maintaining and restoring network availability. However, existing approaches lack robustness as they require retraining to adapt to dynamic changes in network structure, node scale, attack strategies, and attack intensity. Furthermore, the lack of Human-in-the-Loop (HITL) support limits interpretability and flexibility. To address these limitations, we propose CyberOps-Bots, a hierarchical multi-agent reinforcement learning framework empowered by Large Language Models (LLMs). Inspired by MITRE ATT&CK's Tactics-Techniques model, CyberOps-Bots features a two-layer architecture: (1) An upper-level LLM agent with four modules--ReAct planning, IPDRR-based perception, long-short term memory, and action/tool integration--performs global awareness, human intent recognition, and tactical planning; (2) Lower-level RL agents, developed via heterogeneous separated pre-training, execute atomic defense actions within localized network regions. This synergy preserves LLM adaptability and interpretability while ensuring reliable RL execution. Experiments on real cloud datasets show that, compared to state-of-the-art algorithms, CyberOps-Bots maintains network availability 68.5% higher and achieves a 34.7% jumpstart performance gain when shifting the scenarios without retraining. To our knowledge, this is the first study to establish a robust LLM-RL framework with HITL support for cloud defense.

2601.06858 2026-05-19 eess.SP cs.LG 版本更新

Deep Learning-Based Channel Extrapolation for Dual-Band Massive MIMO Systems

基于深度学习的双频大规模MIMO系统的信道外推

Qikai Xiao, Kehui Li, Binggui Zhou, Shaodan Ma

发表机构 * State Key Laboratory of Internet of Things for Smart City and the Department of Electrical and Computer Engineering, University of Macau（物联网智能城市国家重点实验室和澳门大学电子与计算机工程系）； Department of Electrical and Electronic Engineering, Imperial College London（帝国理工学院伦敦分校电子与电气工程系）

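AI summary: This paper proposes a deep-learning-based multi-domain fusion channel extrapolation method that extrapolates sub-6 GHz channel state information to the mmWave band, reducing the pilot overhead of mmWave CSI acquisition and improving the efficiency of massive MIMO systems.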

详情

DOI: 10.1109/LWC.2026.3689267

AI中文摘要

未来无线通信系统将越来越多地依赖毫米波（mmWave）和sub-6 GHz频段的整合，以满足对高速数据传输和广泛覆盖的异构需求。为了充分利用毫米波频段在大规模多输入多输出（MIMO）系统中的优势，需要高精度的信道状态信息（CSI）。然而，直接估计毫米波信道需要大量的试点开销，因为CSI维度大且由于严重的路径损耗和阻挡衰减导致信噪比（SNR）低。在本文中，我们提出了一种高效的MDFCE（Multi-Domain Fusion Channel Extrapolator）来外推sub-6 GHz频段的CSI到毫米波频段的CSI，从而减少双频大规模MIMO系统中毫米波CSI获取的试点开销。与基于数学建模的传统信道外推方法不同，所提出的MDFCE结合了专家混合框架和多头自注意力机制，以融合sub-6 GHz CSI的多域特征，旨在有效且高效地表征从sub-6 GHz CSI到毫米波CSI的映射。仿真结果表明，MDFCE在各种天线阵列规模和信噪比水平上，相比现有方法在较少的训练试点情况下实现了更优的性能，同时表现出更高的计算效率。

英文摘要

Future wireless communication systems will increasingly rely on the integration of millimeter wave (mmWave) and sub-6 GHz bands to meet heterogeneous demands on high-speed data transmission and extensive coverage. To fully exploit the benefits of mmWave bands in massive multiple-input multiple-output (MIMO) systems, highly accurate channel state information (CSI) is required. However, directly estimating the mmWave channel demands substantial pilot overhead due to the large CSI dimension and the low signal-to-noise ratio (SNR) caused by severe path loss and blockage attenuation. In this paper, we propose an efficient Multi-Domain Fusion Channel Extrapolator (MDFCE) to extrapolate sub-6 GHz band CSI to mmWave band CSI, so as to reduce the pilot overhead for mmWave CSI acquisition in dual-band massive MIMO systems. Unlike traditional channel extrapolation methods based on mathematical modeling, the proposed MDFCE combines the mixture-of-experts framework and the multi-head self-attention mechanism to fuse multi-domain features of sub-6 GHz CSI, aiming to characterize the mapping from sub-6 GHz CSI to mmWave CSI effectively and efficiently. The simulation results demonstrate that MDFCE achieves superior performance with fewer training pilots than existing methods across various antenna array scales and SNR levels, while showing much higher computational efficiency.

2601.06163 2026-05-19 cs.CV cs.LG 版本更新

Forget-It-All: Multi-Concept Machine Unlearning via Concept-Aware Neuron Masking

Forget-It-All: 通过概念感知神经元掩码实现多概念机器去学习

Kaiyuan Deng, Bo Hui, Gen Li, Jie Ji, Minghai Qin, Geng Yuan, Xiaolong Ma

发表机构 * The University of Arizona（亚利桑那大学）； The University of Tulsa（塔尔萨大学）； Clemson University（克莱姆森大学）； Western Digital Corporation（西部数据公司）； University of Georgia（佐治亚大学）

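AI summary: This study proposes the Forget-It-All framework, which exploits model sparsity to address multi-concept unlearning, improving unlearning effectiveness while preserving generation quality.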

Comments Accepted to ICML 2026

详情

Journal ref: Forty-Third International Conference on Machine Learning (ICML 2026)

AI中文摘要

文本到图像（T2I）扩散模型的广泛应用引发了对其可能生成版权、不当或敏感图像的担忧。作为实际解决方案，机器去学习旨在在不重新训练的情况下删除不需要的概念。尽管现有方法在单概念去学习中有效，但去除多个概念时往往面临显著挑战，包括去学习效果、生成质量和对超参数和数据集的敏感性。我们通过利用模型稀疏性，从独特角度看待多概念去学习，并提出Forget It All（FIA）框架。FIA首先引入对比概念显著性以量化每个权重连接对目标概念的贡献。然后通过结合时间信息和空间信息，识别出概念敏感神经元，确保只选择那些一致响应目标概念的神经元。最后，FIA从识别的神经元中构建掩码，并将其融合成统一的多概念掩码，其中对一般内容生成有广泛支持的无概念神经元被保留，而概念特定神经元被修剪以去除目标。FIA是无训练的，需要最少超参数调整即可用于新任务，实现即插即用。在三个不同的去学习任务上进行了广泛的实验，证明FIA在多概念去学习中实现了更可靠的性能，提高了遗忘效果同时保持生成的保真度和质量。代码可在https://github.com/kaiyuan02415/Forget-It-All获取。

英文摘要

The widespread adoption of text-to-image (T2I) diffusion models has raised concerns about their potential to generate copyrighted, inappropriate, or sensitive imagery. As a practical solution, machine unlearning aims to erase unwanted concepts without retraining from scratch. While most existing methods are effective for single-concept unlearning, they often struggle when removing multiple concepts, causing significant challenges in unlearning effectiveness, generation quality, and sensitivity to hyperparameters and datasets. We take a unique perspective on multi-concept unlearning by leveraging model sparsity and propose the Forget It All (FIA) framework. FIA first introduces Contrastive Concept Saliency to quantify each weight connection's contribution to a target concept. It then identifies Concept Sensitive Neurons by combining temporal and spatial information, ensuring that only neurons consistently responsive to the target concept are selected. Finally, FIA constructs masks from the identified neurons and fuses them into a unified multi-concept mask, where Concept Agnostic Neurons that broadly support general content generation are preserved while concept-specific neurons are pruned to remove the targets. FIA is training-free and requires minimal hyperparameter tuning for new tasks, enabling plug-and-play use. Extensive experiments across three distinct unlearning tasks demonstrate that FIA achieves more reliable multi-concept unlearning, improving forgetting effectiveness while maintaining generation fidelity and quality. Code is available at https://github.com/kaiyuan02415/Forget-It-All

2601.06162 2026-05-19 cs.LG cs.CV 版本更新

Forget Many, Forget Right: Scalable and Precise Concept Unlearning in Diffusion Models

忘却众多，忘却正确：扩散模型中可扩展且精确的概念反学习

Kaiyuan Deng, Gen Li, Yang Xiao, Bo Hui, Xiaolong Ma

发表机构 * The University of Arizona（亚利桑那大学）； Clemson University（克莱姆森大学）； The University of Tulsa（塔尔萨大学）

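AI summary: This paper proposes ScaPre, a unified framework for precise concept unlearning in diffusion models at large scale. By addressing conflicting updates, imprecise mechanisms, and reliance on additional data, it improves the efficiency and precision of unlearning.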

Comments Accepted at ICLR 2026

详情

Journal ref: International Conference on Learning Representations (ICLR) 2026

AI中文摘要

文本到图像的扩散模型已取得显著进展，但其使用引发了版权和滥用问题，促使研究机器反学习。然而，将多概念反学习扩展到大规模场景仍然困难，因为存在三个挑战：（i）冲突的权重更新会阻碍反学习或降低生成质量；（ii）不精确的机制会导致对相似内容的损害；（iii）依赖额外数据或模块，造成可扩展性瓶颈。为了解决这些问题，我们提出了可扩展-精确概念反学习（ScaPre），一种专门针对大规模反学习的统一框架。ScaPre引入了冲突感知的稳定设计，整合了谱迹正则化和几何对齐，以稳定优化、抑制冲突并保持全局结构。此外，Informax解耦器识别与概念相关的参数并自适应地重新加权更新，严格将反学习限制在目标子空间内。ScaPre产生了一个高效的闭式解，无需额外数据或子模型。在对象、风格和显性内容上的全面实验表明，ScaPre能够有效移除目标概念并保持生成质量。它比最佳基线在可接受的质量限制内能忘却多达$\times \mathbf{5}$更多的概念，实现了大规模反学习的最先进精度和效率。代码可在https://github.com/kaiyuan02415/scapre获取。

英文摘要

Text-to-image diffusion models have achieved remarkable progress, yet their use raises copyright and misuse concerns, prompting research into machine unlearning. However, extending multi-concept unlearning to large-scale scenarios remains difficult due to three challenges: (i) conflicting weight updates that hinder unlearning or degrade generation; (ii) imprecise mechanisms that cause collateral damage to similar content; and (iii) reliance on additional data or modules, creating scalability bottlenecks. To address these, we propose Scalable-Precise Concept Unlearning (ScaPre), a unified framework tailored for large-scale unlearning. ScaPre introduces a conflict-aware stable design, integrating spectral trace regularization and geometry alignment to stabilize optimization, suppress conflicts, and preserve global structure. Furthermore, an Informax Decoupler identifies concept-relevant parameters and adaptively reweights updates, strictly confining unlearning to the target subspace. ScaPre yields an efficient closed-form solution without requiring auxiliary data or sub-models. Comprehensive experiments on objects, styles, and explicit content demonstrate that ScaPre effectively removes target concepts while maintaining generation quality. It forgets up to $\times \mathbf{5}$ more concepts than the best baseline within acceptable quality limits, achieving state-of-the-art precision and efficiency for large-scale unlearning. Code is available at https://github.com/kaiyuan02415/scapre

2601.04855 2026-05-19 cs.LG cs.AI 版本更新

A Critical Evaluation of PINNs and Operator Learning in Geotechnical Engineering

Krishna Kumar

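AI summary: This paper evaluates PINNs and operator learning in geotechnical engineering, comparing several neural-network methods against finite-difference and particle-based methods on geotechnical benchmarks, and contrasts PINN inversion with automatic differentiation through a conventional solver.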

详情

AI中文摘要

科学机器学习（SciML）为土木工程中的数值流程提供了神经网络替代方案。本文将多层感知器（MLPs）、物理信息神经网络（PINNs）、深度操作网络（DeepONet）和图网络模拟器（GNS）与有限差分和粒子方法在地质基准测试中进行基准测试，并通过传统求解器比较PINN反演与自动微分（AD）。我们评估了每种方法在 extrapolation、训练、推理成本、跨问题实例转移和物理准确性方面的表现。一个在两年内训练的MLP能够拟合数据，但在第十年使用ReLU预测约290毫米，使用tanh或sigmoid预测约60毫米，而参考值为99.3毫米。一个带有时间域在[0,1]内的PINN在该区间内匹配闭合形式，但超出该范围失败，因为残差约束了仅在采样处的拟合。对于一维波动方程，PINN训练速度比有限差分方法慢约96,000倍且精度较低。DeepONet避免了PINN重新训练，但对于弹性基础上的梁，其训练成本等于约180万次有限差分求解，推理速度比直接求解器更慢。GNS通过局部粒子相互作用改进了几何转移，尽管公式仍需要轨迹、大规模训练集和大量内存。在逆向波动基准测试中，通过有限差分求解器的自动微分在几秒内恢复了材料剖面，误差约为1%。结果支持SciML的谨慎应用。神经网络适合在验证域内进行插值和模式识别，而逆向分析应在存在可靠正向求解器时首先尝试可微分的物理基础求解器。

英文摘要

Scientific machine learning (SciML) offers neural-network alternatives to numerical workflows in geotechnical engineering. This paper benchmarks multi-layer perceptrons (MLPs), physics-informed neural networks (PINNs), deep operator networks (DeepONet), and graph network simulators (GNS) against finite-difference and particle-based references on geotechnical benchmarks, and compares PINN inversion with automatic differentiation (AD) through a conventional solver. We evaluate each method for extrapolation, training, and inference cost, transfer across problem instances, and physics accuracy. An MLP trained on two years of Terzaghi consolidation fits the data, but at year ten predicts ~290 mm with ReLU and ~60 mm with tanh or sigmoid, against a reference of 99.3 mm. A PINN on a damped oscillator with a time domain inside [0,1] matches the closed form within that interval but fails outside, since the residual constrains the fit only where it is sampled. For the 1D wave equation, PINN training is ~96,000 times slower than finite-difference methods and less accurate. DeepONet avoids PINN retraining, yet for the beam on elastic foundation, its training cost equals ~1.8 million finite-difference solves, and inference is slower per query than the direct solver. GNS improves geometric transfer through local particle interactions, though formulations still need trajectories, large training sets, and substantial memory. In the inverse wave benchmark, AD through the finite-difference solver recovers the material profile in seconds with ~1% error. The results support a cautious role for SciML. Neural networks suit interpolation and pattern recognition inside validated domains, while inverse analysis should first try differentiable physics-based solvers when a reliable forward solver exists.
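As a point of reference for the finite-difference baseline mentioned in the abstract, the following is a minimal sketch of an explicit leapfrog scheme for the 1D wave equation $u_{tt} = c^2 u_{xx}$ with fixed ends and zero initial velocity. It is not the paper's solver; the function name, grid, and boundary conditions are assumptions chosen for illustration.

```python
import math


def wave_fd(u0, courant, n_steps):
    """Explicit leapfrog scheme for u_tt = c^2 u_xx.

    u0      : initial displacement on a uniform grid (end values held at zero)
    courant : c * dt / dx, which must be <= 1 for stability
    n_steps : number of time steps to advance
    Assumes zero initial velocity. Returns the displacement after n_steps.
    """
    if n_steps == 0:
        return list(u0)
    c2 = courant * courant
    n = len(u0)
    prev = list(u0)
    # First step: a half-weighted update encodes the zero initial velocity.
    curr = [0.0] * n
    for i in range(1, n - 1):
        curr[i] = prev[i] + 0.5 * c2 * (prev[i + 1] - 2 * prev[i] + prev[i - 1])
    # Remaining steps: standard three-level leapfrog update.
    for _ in range(n_steps - 1):
        nxt = [0.0] * n
        for i in range(1, n - 1):
            nxt[i] = (2 * curr[i] - prev[i]
                      + c2 * (curr[i + 1] - 2 * curr[i] + curr[i - 1]))
        prev, curr = curr, nxt
    return curr


if __name__ == "__main__":
    nx = 50  # number of grid intervals on [0, 1]
    u0 = [math.sin(math.pi * i / nx) for i in range(nx + 1)]
    u = wave_fd(u0, courant=1.0, n_steps=2 * nx)  # one full period when c = 1
    print(max(abs(a - b) for a, b in zip(u, u0)))
```

With a Courant number of 1 and the initial profile sin(πx) on 51 grid points, 100 steps correspond to one period of the standing wave, so the returned profile should match the initial one up to rounding.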


2512.23178 2026-05-19 math.OC cs.LG stat.ML version update

Clipped Gradient Methods for Nonsmooth Convex Optimization under Heavy-Tailed Noise: A Refined Analysis

Zijian Liu

Affiliations: Stern School of Business

AI summary: This paper gives a refined analysis of clipped gradient methods for nonsmooth convex optimization under heavy-tailed noise, with improved convergence rates both in high probability and in expectation.

Comments A preliminary conference version is accepted at ICLR 2026. This full version includes the formal statements of lower bounds and their proofs. v3: fixed some typos

Abstract

Optimization under heavy-tailed noise has become popular recently, since it better fits many modern machine learning tasks, as captured by empirical observations. Concretely, instead of a finite second moment on gradient noise, a bounded ${\frak p}$-th moment where ${\frak p}\in(1,2]$ has been recognized to be more realistic (say being upper bounded by $\sigma_{\frak l}^{\frak p}$ for some $\sigma_{\frak l}\ge0$). A simple yet effective operation, gradient clipping, is known to handle this new challenge successfully. Specifically, Clipped Stochastic Gradient Descent (Clipped SGD) guarantees a high-probability rate ${\cal O}(\sigma_{\frak l}\ln(1/\delta)T^{1/{\frak p}-1})$ (resp. ${\cal O}(\sigma_{\frak l}^2\ln^2(1/\delta)T^{2/{\frak p}-2})$) for nonsmooth convex (resp. strongly convex) problems, where $\delta\in(0,1]$ is the failure probability and $T\in\mathbb{N}$ is the time horizon. In this work, we provide a refined analysis for Clipped SGD and offer two rates, ${\cal O}(\sigma_{\frak l}d_{\rm eff}^{-1/2{\frak p}}\ln^{1-1/{\frak p}}(1/\delta)T^{1/{\frak p}-1})$ and ${\cal O}(\sigma_{\frak l}^2d_{\rm eff}^{-1/{\frak p}}\ln^{2-2/{\frak p}}(1/\delta)T^{2/{\frak p}-2})$, faster than the aforementioned best results, where $d_{\rm eff}\ge1$ is a quantity we call the $\textit{generalized effective dimension}$. Our analysis improves upon the existing approach on two sides: better utilization of Freedman's inequality and finer bounds for clipping error under heavy-tailed noise. In addition, we extend the refined analysis to convergence in expectation and obtain new rates that break the known lower bounds. Lastly, to complement the study, we establish new lower bounds for both high-probability and in-expectation convergence. Notably, the in-expectation lower bounds match our new upper bounds, indicating the optimality of our refined analysis for convergence in expectation.
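The clipping operation the abstract refers to can be sketched in a few lines. This is a generic illustration of Clipped SGD, not the paper's exact algorithm or step-size schedule: the constant learning rate, the clipping threshold `tau`, the averaged iterate, and the Pareto noise model in `noisy_abs_subgradient` are all assumptions made for the example.

```python
import math
import random

def clip(g, tau):
    """Rescale vector g so that its Euclidean norm is at most tau."""
    norm = math.sqrt(sum(x * x for x in g))
    if norm <= tau or norm == 0.0:
        return list(g)
    return [x * tau / norm for x in g]

def clipped_sgd(grad_oracle, x0, steps, lr, tau):
    """Iterate x <- x - lr * clip(g, tau) and return the averaged iterate."""
    x = list(x0)
    avg = [0.0] * len(x)
    for t in range(1, steps + 1):
        g = clip(grad_oracle(x), tau)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
        avg = [a + (xi - a) / t for a, xi in zip(avg, x)]
    return avg

def noisy_abs_subgradient(x, rng):
    """Subgradient of f(x) = sum |x_i| plus symmetric Pareto noise.

    With tail index 1.5 the noise has a finite mean but infinite variance,
    which is one simple (assumed) instance of heavy-tailed gradient noise.
    """
    out = []
    for xi in x:
        sign = (xi > 0) - (xi < 0)
        noise = rng.choice((-1.0, 1.0)) * (rng.paretovariate(1.5) - 1.0)
        out.append(sign + noise)
    return out

if __name__ == "__main__":
    rng = random.Random(0)
    result = clipped_sgd(lambda x: noisy_abs_subgradient(x, rng),
                         [0.5, -0.5], steps=1000, lr=0.01, tau=1.0)
    print(result)
```

Because every clipped step has norm at most `lr * tau`, a single extreme noise sample cannot move the iterate arbitrarily far, which is the intuition behind using clipping under heavy tails.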


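2512.11089 2026-05-19 stat.ML cs.LG version update

TPV: Parameter Perturbations Through the Lens of Test Prediction Variance

Devansh Arpit

Affiliations: Modelable AI

AI summary: This paper introduces test prediction variance (TPV) as a unifying framework for analyzing post-training robustness. By studying the first-order sensitivity of model outputs to parameter perturbations, it places SGD noise, label noise, quantization, and pruning under a single lens, and derives a TPV-based pruning criterion and a model-selection signal.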

Comments ICML 2026

Abstract

We introduce test prediction variance (TPV)--the first-order sensitivity of a trained model's outputs to parameter perturbations--as a unifying framework for analyzing post-training robustness. TPV is a fully label-free object whose trace form separates the geometry of the trained model from the specific perturbation mechanism, placing SGD noise, label noise, quantization, and pruning under a single lens. The resulting expressions recover the wide-minima hypothesis for SGD and quantization noise, and yield a distinct Jacobian-spectral characterization for label noise connecting label-noise TPV with benign overfitting in nonlinear networks. Theoretically, we prove that training-set TPV converges to its test-set counterpart in the overparameterized limit, irrespective of generalization performance, providing the first result that prediction variance under local parameter perturbations can be inferred from training inputs alone. Empirically, this stability holds far more broadly, including at very low widths. Further, TPV correlates well with test loss, enabling practical applications: JBR, a label-free pruning criterion derived from TPV geometry matching state-of-the-art baselines; and training-set based model selection signal for in-distribution and transfer learning scenarios. Code available at github.com/devansharpit/TPV.
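As a rough illustration of a first-order sensitivity of outputs to parameter perturbations, the sketch below computes the variance of a linearized model output under isotropic Gaussian parameter noise. It is one plausible reading of the idea, not the paper's definition of TPV: the logistic model, the isotropic covariance `sigma2 * I`, and all function names are assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def output_gradient(w, x):
    """Gradient of f(x; w) = sigmoid(w . x) with respect to w."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [p * (1.0 - p) * xi for xi in x]

def first_order_prediction_variance(w, inputs, sigma2):
    """Average over inputs of g^T (sigma2 * I) g = sigma2 * ||g||^2.

    To first order, f(x; w + d) - f(x; w) is approximately g . d, so for
    d ~ N(0, sigma2 * I) the output variance is sigma2 * ||g||^2.
    """
    total = 0.0
    for x in inputs:
        g = output_gradient(w, x)
        total += sigma2 * sum(gi * gi for gi in g)
    return total / len(inputs)
```

Note that the computation uses only inputs and parameters, never labels, which matches the abstract's description of the quantity as label-free.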


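2511.21654 2026-05-19 cs.LG version update

A Systematic Analysis of Out-of-Distribution Detection under Representation and Training Paradigm Shifts

Claudio César Claros Olivares, Austin J. Brockmeier

Affiliations: Department of Electrical & Computer Engineering; University of Delaware

AI summary: This paper systematically evaluates CSFs for out-of-distribution detection through a representation-centric lens, analyzes the effect of architecture, training paradigm, and dataset, and proposes a PCA-based projection-filtering procedure and a Neural Collapse-based way to predict the competitive detector shortlist.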

Abstract

We present a systematic benchmark of out-of-distribution (OOD) detection CSFs through a representation-centric lens. Our study spans CNN and ViT backbones, multiple training paradigms, four image-classification source datasets (CIFAR-10, CIFAR-100, SuperCIFAR-100, and TinyImageNet), and OOD datasets grouped into near, mid, and far regimes using CLIP-derived semantic distances. To compare CSFs across these settings, we employ a multiple-comparison-controlled rank pipeline that identifies top cliques of statistically indistinguishable winners under threshold-free ranking metrics (AURC and AUGRC). The main empirical finding is that the competitive detector family depends more on the learned representation than on score design alone. For both CNNs and ViTs, simple probabilistic scores dominate misclassification detection. On CNNs, margin-based scores are strongest in near-OOD regimes, while geometry-aware scores such as NNGuide, fDBD, and CTM become more competitive as shift severity increases. On fine-tuned ViTs, the top cliques are led mainly by reconstruction- and residual-based scores. To interpret these ranking shifts, we analyze the last-layer representation using Neural Collapse (NC) metrics. The resulting picture is consistent across architectures: prototype- and boundary-aware scores become stronger when the representation is more collapsed and better aligned with classifier weights, whereas weaker-collapse regimes favor gradient- and manifold-based scores. Building on these insights, we propose two contributions: a simple PCA-based projection-filtering procedure that improves detector performance, and an approach that uses NC measurements computed from a trained classifier to predict its competitive out-of-distribution detector shortlist, without requiring any additional OOD data.
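The threshold-free AURC metric named in the abstract can be illustrated with a common empirical estimator of the area under the risk-coverage curve. This sketch is an assumption about the estimator, not the benchmark's evaluation code; ties in confidence are broken by input order.

```python
def aurc(confidences, errors):
    """Empirical area under the risk-coverage curve.

    confidences: higher means the prediction is more trusted.
    errors: 1 if the prediction is wrong, 0 if it is right.
    Samples are accepted in order of decreasing confidence, and the
    selective risk (error rate among accepted samples) is averaged
    over all coverage levels k / n.
    """
    order = sorted(range(len(confidences)), key=lambda i: -confidences[i])
    cumulative_errors = 0
    total = 0.0
    for k, i in enumerate(order, start=1):
        cumulative_errors += errors[i]
        total += cumulative_errors / k
    return total / len(order)
```

A score that ranks all correct predictions above all wrong ones yields a lower AURC than one that ranks them in the opposite order, so lower is better.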


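2511.08704 2026-05-19 cs.CV cs.LG version update

Rethinking Generative Image Pretraining: How Far Are We From Scaling Up Next-Pixel Prediction?

Xinchen Yan, Chen Liang, Lijun Yu, Adams Wei Yu, Yifeng Lu, Quoc V. Le

Affiliations: Google DeepMind

AI summary: This paper studies the scaling properties of autoregressive next-pixel prediction, a simple, end-to-end, yet under-explored framework for unified vision models. Transformers are trained on 32x32 images and evaluated on three target metrics: the next-pixel prediction objective, ImageNet classification accuracy, and generation-based completion measured by Fréchet Distance. The optimal scaling strategy is highly task-dependent, and as image resolution increases, model size must grow faster than data size. Projecting these findings, compute rather than training data is the main bottleneck; with compute growing four to five times annually, the authors forecast that pixel-by-pixel image modeling becomes feasible within five years.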

Comments Accepted by ICML2026

Abstract

This paper investigates the scaling properties of autoregressive next-pixel prediction, a simple, end-to-end yet under-explored framework for unified vision models. Starting with images at a resolution of 32x32, we train a family of Transformers using IsoFlops profiles across compute budgets up to 7e19 FLOPs and evaluate three distinct target metrics: next-pixel prediction objective, ImageNet classification accuracy, and generation-based completion measured by Fréchet Distance. First, the optimal scaling strategy is critically task-dependent. Even at a fixed resolution of 32x32, the optimal scaling properties for image classification and image generation diverge: the generation-optimal setup requires the data size to grow three to five times faster than the classification-optimal setup. Second, as image resolution increases, the optimal scaling strategy indicates that the model size must grow much faster than data size. Surprisingly, by projecting our findings, we discover that the primary bottleneck is compute rather than the amount of training data. As compute continues to grow four to five times annually, we forecast the feasibility of pixel-by-pixel modeling of images within the next five years.
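To make the five-year forecast concrete, the cumulative compute multiplier implied by the stated annual growth can be worked out directly. The symbol $g$ for the annual growth factor is introduced here for illustration, and the calculation assumes the growth rate stays constant over the whole period.

```latex
\[
  g^{5} \in \left[\,4^{5},\; 5^{5}\,\right] = \left[\,1024,\; 3125\,\right]
  \qquad \text{for } g \in [4, 5],
\]
```

Under that assumption, five years of growth corresponds to roughly three orders of magnitude more compute than is available today.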


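2511.03828 2026-05-19 cs.LG version update

M2H: Multi-Task Learning with Efficient Window-Based Cross-Task Attention for Monocular Spatial Perception

U. V. B. L Udugama, George Vosselman, Francesco Nex

Affiliations: Department of Earth Observation Science

AI summary: This paper proposes M2H, a framework that uses an efficient window-based cross-task attention module to perform semantic segmentation, depth estimation, edge detection, and surface normal estimation from a single monocular image while remaining computationally efficient.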

Comments Accepted to the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025). 8 pages, 7 figures

DOI: 10.1109/IROS60139.2025.11246974

Abstract

Deploying real-time spatial perception on edge devices requires efficient multi-task models that leverage complementary task information while minimizing computational overhead. This paper introduces Multi-Mono-Hydra (M2H), a novel multi-task learning framework designed for semantic segmentation and depth, edge, and surface normal estimation from a single monocular image. Unlike conventional approaches that rely on independent single-task models or shared encoder-decoder architectures, M2H introduces a Window-Based Cross-Task Attention Module that enables structured feature exchange while preserving task-specific details, improving prediction consistency across tasks. Built on a lightweight ViT-based DINOv2 backbone, M2H is optimized for real-time deployment and serves as the foundation for monocular spatial perception systems supporting 3D scene graph construction in dynamic environments. Comprehensive evaluations show that M2H outperforms state-of-the-art multi-task models on NYUDv2, surpasses single-task depth and semantic baselines on Hypersim, and achieves superior performance on the Cityscapes dataset, all while maintaining computational efficiency on laptop hardware. Beyond benchmarks, M2H is validated on real-world data, demonstrating its practicality in spatial perception tasks.


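2510.16609 2026-05-19 cs.LG cs.AI cs.CC cs.DS version update

Prior Knowledge Makes It Possible: From Sublinear Graph Algorithms to LLM Test-Time Methods

Avrim Blum, Daniel Hsu, Cyrus Rashtchian, Donya Saless

Affiliations: Toyota Technological Institute at Chicago; Columbia University; Google Research

AI summary: This paper studies the theoretical basis of how prior knowledge and external information interact in test-time augmentation. Modeling multi-step reasoning as s-t connectivity on a knowledge graph, it relates the number of augmentation steps to graph structure under partial prior knowledge: when the prior knowledge graph is split into small components, Ω(√n) queries are required, whereas once the density of correct knowledge passes a threshold and a giant component forms, an expected constant number of queries suffices.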

Abstract

Test-time augmentation, such as Retrieval-Augmented Generation (RAG) or tool use, critically depends on an interplay between a model's parametric knowledge and externally retrieved information. However, the theoretical underpinnings of this relationship remain poorly understood. Specifically, it is not clear how much pre-training knowledge is required to answer queries with a small number of augmentation steps, which is a desirable property in practice. To address this question, we formulate multi-step reasoning as an $s$-$t$ connectivity problem on a knowledge graph. We represent a model's pre-training parametric knowledge as a partial, potentially noisy subgraph. We view augmentation as querying an oracle for true edges that augment the model's knowledge. Then, we characterize the necessary and sufficient number of augmentation steps for the model to generate an accurate answer given partial prior knowledge. One key result shows a phase transition: if the prior knowledge graph over $n$ vertices is disconnected into small components, then finding a path via augmentation is inefficient and requires $\Omega(\sqrt{n})$ queries. On the other hand, once the density of correct knowledge surpasses a threshold, forming a giant component, we can find paths with an expected constant number of queries.
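The s-t connectivity framing can be made concrete with a small sketch that counts oracle queries. The query model used here (asking the oracle for all true neighbours of one vertex) and the breadth-first strategy are assumptions for illustration; the paper's query model and algorithms may differ.

```python
from collections import deque

def connect_with_queries(n, prior_edges, oracle, s, t):
    """Search for an s-t path using prior edges first, then oracle queries.

    prior_edges: edges the model already "knows" (assumed correct here).
    oracle(u): returns the true neighbours of vertex u (hypothetical interface).
    Returns (reachable, number_of_oracle_queries).
    """
    adj = {v: set() for v in range(n)}
    for u, v in prior_edges:
        adj[u].add(v)
        adj[v].add(u)
    seen = {s}
    frontier = deque([s])
    to_query = deque()
    queries = 0
    while True:
        # Expand as far as possible using known edges, at no query cost.
        while frontier:
            u = frontier.popleft()
            if u == t:
                return True, queries
            to_query.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    frontier.append(v)
        if not to_query:
            return False, queries
        # Stuck: spend one query to reveal the true neighbours of a vertex.
        u = to_query.popleft()
        queries += 1
        for v in oracle(u):
            if v not in seen:
                seen.add(v)
                frontier.append(v)
```

When the prior graph already connects s and t the function returns with zero queries, and the query count grows as the prior knowledge fragments into smaller components.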


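2510.16252 2026-05-19 cs.LG cs.CL version update

WEBSERV: A Full-Stack and RL-Ready Web Environment for Training Web Agents at Scale

Yuxuan Lu, Ziyi Wang, Jing Huang, Hui Liu, Jiri Gesi, Yan Han, Shihan Fu, Tianqi Zheng, Xianfeng Tang, Chen Luo, Yisi Sang, Jin Lai, Dakuo Wang

Affiliations: Northeastern University; Amazon

AI summary: This paper presents WebServ, a full-stack, RL-ready web environment for training web agents at scale. On the server side it uses Incus containers to cut launch latency and storage requirements; on the browser side it provides an automatically derived observation and action interface and a robust action execution backend. WebServ achieves state-of-the-art single-prompt results on WebArena-Lite, and the 4B model trained with RL inside it surpasses Claude 4.5 Sonnet and the RL-trained 8B model from WebAgent-R1.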

Abstract

Reinforcement learning (RL) for web agents demands environments that are both effective for evaluation and efficient enough for large-scale on-policy training. Current web environments fall short: server-side Docker setups are too resource-intensive for massive parallel rollouts, while browser-side interfaces produce noisy observations, execute actions unreliably under modern single-page applications, and omit visual interactivity cues. We introduce WebServ, a full-stack, RL-ready web environment that addresses these limitations end-to-end. On the server side, WebServ uses Incus containers with block-level copy-on-write, reducing launch latency by ~5x and persistent storage by ~240x, enabling 200+ concurrent isolated environments on a single host. On the browser side, WebServ provides a compact, site-agnostic observation and action interface derived automatically from the DOM with human-aligned interactivity cues, and a robust action execution backend using network-aware waiting for reliable SPA support. On WebArena-Lite, WebServ achieves state-of-the-art single-prompt results, with controlled comparisons confirming consistent gains across GPT-4o, OpenAI-o3, and Llama-3.1-8B over vanilla WebArena. We further train Qwen3-4B and Qwen3-30B-A3B with RL entirely within WebServ; the RL-trained 4B model achieves 55.5% mean accuracy, surpassing both Claude 4.5 Sonnet (50.0%) and the RL-trained 8B model from WebAgent-R1 (51.8%).


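2510.13068 2026-05-19 cs.LG cs.AI cs.HC version update

Truthful Calibration Errors in Multiclass Prediction

Yuxuan Lu, Yifan Wu, Jason Hartline, Lunjia Hu

Affiliations: Peking University; Northwestern University; Microsoft Research, New England; Northeastern University; Khoury College of Computer Sciences

AI summary: This paper studies the practical role of truthfulness in calibration measurement for multiclass prediction. It introduces perfectly truthful calibration errors for multidimensional linear properties of the label distribution, characterizes their decision-theoretic implications, and uses them to explain and mitigate the ranking-robustness problem of binned calibration errors.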

Abstract

Calibrated predictions are useful because their numerical values can be interpreted as probabilities. Calibration errors are therefore widely used to evaluate, compare, and tune probabilistic predictors. Recently, Haghtalab et al. (2024) introduced an additional requirement for such measures: truthfulness. A calibration measure is truthful if a predictor minimizes its expected measured error by reporting the true conditional label distribution. Many standard empirical calibration errors are non-truthful: a predictor may appear better calibrated by distorting its probabilities rather than reporting them truthfully. We study the practical role of truthfulness for calibration measurement in multiclass prediction. First, we introduce perfectly truthful calibration errors for multidimensional linear properties of the label distribution, generalizing the truthful calibration error for binary predictions in Hartline et al. (2025). This framework includes full multiclass calibration and classwise calibration. We also identify a truthful correction for confidence calibration. Second, we characterize the decision-theoretic implications of these truthful errors. For calibrated predictors, truthful calibration errors preserve the Blackwell dominance: a more informative calibrated predictor receives no larger expected error. Third, we show that this decision-theoretic interpretation explains and mitigates the well-observed ranking robustness problem of binned calibration errors. Empirically, non-truthful confidence-based errors can reverse model rankings when the number of bins changes, while our truthful errors give more stable rankings across binning choices.
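The bin-count sensitivity the abstract attributes to binned calibration errors can be seen in a standard binned confidence calibration error. The sketch below is the usual (non-truthful) measure, shown only to illustrate the problem; it is not the truthful error proposed in the paper, and equal-width binning is an assumption.

```python
def binned_ece(confidences, correct, num_bins):
    """Binned expected calibration error with equal-width bins on [0, 1].

    For each bin, compare the summed confidence with the number of correct
    predictions; dividing by n weights every bin by its share of samples.
    """
    n = len(confidences)
    bins = [[0.0, 0.0, 0] for _ in range(num_bins)]
    for c, y in zip(confidences, correct):
        b = min(int(c * num_bins), num_bins - 1)
        bins[b][0] += c
        bins[b][1] += y
        bins[b][2] += 1
    return sum(abs(conf_sum - acc_sum) / n
               for conf_sum, acc_sum, count in bins if count)
```

For two predictions with confidences 0.55 (correct) and 0.95 (wrong), a single bin averages the two errors away while ten bins keep them separate, so the same predictor receives very different scores depending only on the binning choice.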


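2510.05921 2026-05-19 cs.CL cs.LG version update

Prompt reinforcing for long-term planning of large language models

Hsien-Chin Lin, Benjamin Matthias Ruppik, Carel van Niekerk, Chia-Hao Shen, Michael Heck, Nurul Lubis, Renato Vukovic, Shutong Feng, Milica Gašić

Affiliations: Heinrich-Heine-Universität Düsseldorf

AI summary: This paper proposes a prompt optimisation framework inspired by reinforcement learning that enables long-term planning by modifying only the task instruction prompt of an LLM-based agent. It improves multi-turn tasks such as text-to-SQL and task-oriented dialogue, generalises across different LLM-based agents, and can use diverse LLMs as meta-prompting agents.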

Abstract

Large language models (LLMs) have achieved remarkable success in a wide range of natural language processing tasks and can be adapted through prompting. However, they remain suboptimal in multi-turn interactions, often relying on incorrect early assumptions and failing to track user goals over time, which makes such tasks particularly challenging. Prior works in dialogue systems have shown that long-term planning is essential for handling interactive tasks. In this work, we propose a prompt optimisation framework inspired by reinforcement learning, which enables such planning to take place by only modifying the task instruction prompt of the LLM-based agent. By generating turn-by-turn feedback and leveraging experience replay for prompt rewriting, our proposed method shows significant improvement in multi-turn tasks such as text-to-SQL and task-oriented dialogue. Moreover, it generalises across different LLM-based agents and can leverage diverse LLMs as meta-prompting agents. This warrants future research in reinforcement learning-inspired parameter-free optimisation methods.


2510.01479 2026-05-19 cs.LG cs.SY eess.SY version update

Density-Ratio Weighted Behavioral Cloning: Learning Control Policies from Corrupted Datasets

Shriram Karpoora Sundara Pandian, Ali Baheri

Affiliations: Department of Cybersecurity; Rochester Institute of Technology; Mechanical Engineering Department

AI summary: This paper proposes Density-Ratio Weighted Behavioral Cloning, a robust imitation learning method that uses a small, verified clean reference set to estimate trajectory-level density ratios, prioritizing clean expert behavior and down-weighting or discarding corrupted data without knowledge of the contamination mechanism.

Abstract

Offline reinforcement learning (RL) enables policy optimization from fixed datasets, making it suitable for safety-critical applications where online exploration is infeasible. However, these datasets are often contaminated by adversarial poisoning, system errors, or low-quality samples, leading to degraded policy performance in standard behavioral cloning (BC) and offline RL methods. This paper introduces Density-Ratio Weighted Behavioral Cloning (Weighted BC), a robust imitation learning approach that uses a small, verified clean reference set to estimate trajectory-level density ratios via a binary discriminator. These ratios are clipped and used as weights in the BC objective to prioritize clean expert behavior while down-weighting or discarding corrupted data, without requiring knowledge of the contamination mechanism. We establish theoretical guarantees showing convergence to the clean expert policy with finite-sample bounds that are independent of the contamination rate. We also establish a comprehensive evaluation framework that applies various poisoning protocols (reward, state, transition, and action) to continuous control benchmarks. Experiments demonstrate that Weighted BC maintains near-optimal performance even at high contamination ratios, outperforming baselines such as traditional BC, batch-constrained Q-learning (BCQ), and behavior regularized actor-critic (BRAC).
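The weighting step described in the abstract can be sketched as follows. The conversion `d / (1 - d)` is the standard way to turn a balanced binary discriminator's output into a density-ratio estimate; the normalization by the sum of weights, the `eps` clamp, and the function names are assumptions rather than details confirmed by the abstract.

```python
def density_ratio_weights(disc_probs, clip_max, eps=1e-6):
    """Turn discriminator outputs P(clean | trajectory) into clipped weights."""
    weights = []
    for d in disc_probs:
        d = min(max(d, eps), 1.0 - eps)  # avoid division by zero
        weights.append(min(d / (1.0 - d), clip_max))
    return weights

def weighted_bc_loss(per_trajectory_losses, weights):
    """Weighted average of per-trajectory behavioral cloning losses."""
    total_weight = sum(weights)
    if total_weight == 0.0:
        return 0.0
    return sum(w * l for w, l in zip(weights, per_trajectory_losses)) / total_weight
```

Trajectories the discriminator considers likely to be corrupted receive ratios near zero and contribute little to the loss, while clipping keeps any single confident trajectory from dominating it.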


2510.00304 2026-05-19 cs.LG cs.AI version update

Barriers for Learning in an Evolving World: Mathematical Understanding of Loss of Plasticity

Amir Joudaki, Giulia Lanzillotta, Mohammad Samragh Razlighi, Iman Mirzadeh, Keivan Alizadeh, Thomas Hofmann, Mehrdad Farajtabar, Fartash Faghri

Affiliations: ETH Zürich; Apple

AI summary: This paper studies why deep learning models fail in non-stationary environments through loss of plasticity (LoP), uses dynamical systems theory to identify two main mechanisms behind LoP, and explores mitigation strategies.

Abstract

Deep learning models excel in stationary data but struggle in non-stationary environments due to a phenomenon known as loss of plasticity (LoP), the degradation of their ability to learn in the future. This work presents a first-principles investigation of LoP in gradient-based learning. Grounded in dynamical systems theory, we formally define LoP by identifying stable manifolds in the parameter space that trap gradient trajectories. Our analysis reveals two primary mechanisms that create these traps: frozen units from activation saturation and cloned-unit manifolds from representational redundancy. Our framework uncovers a fundamental tension: properties that promote generalization in static settings, such as low-rank representations and simplicity biases, directly contribute to LoP in continual learning scenarios. We validate our theoretical analysis with numerical simulations and explore architectural choices or targeted perturbations as potential mitigation strategies.


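2509.22849 2026-05-19 cs.CC cs.DM cs.LG cs.NE version update

Parameterized Hardness of Zonotope Containment and Neural Network Verification

Vincent Froese, Moritz Grillo, Christoph Hertrich, Moritz Stargalla

Affiliations: Technische Universität Berlin; Max Planck Institute for Mathematics in the Sciences; University of Technology Nuremberg

AI summary: This paper shows that deciding positivity of a function computed by a 2-layer ReLU network is W[1]-hard when parameterized by the input dimension d, derives hardness results for zonotope containment and for computing or approximating the L_p-Lipschitz constant, and concludes that naive enumeration-based methods are essentially optimal under the Exponential Time Hypothesis.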

Comments 20 pages, 5 figures, paper accepted at ICLR 2026

Abstract

Neural networks with ReLU activations are a widely used model in machine learning. It is thus important to have a profound understanding of the properties of the functions computed by such networks. Recently, there has been increasing interest in the (parameterized) computational complexity of determining these properties. In this work, we close several gaps and resolve an open problem posed by Froese et al. [COLT '25] regarding the parameterized complexity of various problems related to network verification. In particular, we prove that deciding positivity (and thus surjectivity) of a function $f\colon\mathbb{R}^d\to\mathbb{R}$ computed by a 2-layer ReLU network is W[1]-hard when parameterized by $d$. This result also implies that zonotope (non-)containment is W[1]-hard with respect to $d$, a problem that is of independent interest in computational geometry, control theory, and robotics. Moreover, we show that approximating the maximum within any multiplicative factor in 2-layer ReLU networks, computing the $L_p$-Lipschitz constant for $p\in(0,\infty]$ in 2-layer networks, and approximating the $L_p$-Lipschitz constant in 3-layer networks are NP-hard and W[1]-hard with respect to $d$. Notably, our hardness results are the strongest known so far and imply that the naive enumeration-based methods for solving these fundamental problems are all essentially optimal under the Exponential Time Hypothesis.


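2509.18150 2026-05-19 cs.LG cs.AI version update

Improving MLLM Training Efficiency via Stage-Aware Sparsity

Kean Shi, Liang Chen, Haozhe Zhao, Baobao Chang

Affiliations: Peking University; University of Illinois Urbana-Champaign

AI summary: This paper proposes the Sparse Training Scheme (STS), an efficient training framework based on sparse representations whose stage-aware design adapts to the redundancy present in different training stages. It combines a Visual Token Compressor and a Layer Dynamic Skipper to reduce computation and is evaluated across multiple MLLM architectures.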

Abstract

Multimodal Large Language Models (MLLMs) have demonstrated outstanding performance across a variety of domains. However, training MLLMs is often inefficient, as much of the computation is redundant due to the long input sequences from multimodal data and underutilized inter-layer operations. Notably, such redundancy is not static but varies across different stages of training. Building on this observation, we shift the focus to the training process itself and propose a training-efficient framework based on sparse representations, termed the Sparse Training Scheme (STS). Instead of applying a uniform sparsity strategy, STS adopts a stage-aware design that adapts to different sources of redundancy during training. Specifically, the framework consists of two complementary components: the Visual Token Compressor, which reduces the information load by compressing visual tokens during modality alignment, and the Layer Dynamic Skipper, which mitigates computational overhead by dynamically skipping unnecessary layers during instruction tuning. Our approach is broadly applicable to diverse MLLM architectures and has been extensively evaluated on multiple benchmarks, demonstrating its effectiveness and efficiency.


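2509.16391 2026-05-19 cs.LG cs.AI cs.CV version update

CoUn: Empowering Machine Unlearning via Contrastive Learning

Yasser H. Khalil, Mehdi Setayesh, Hongliang Li

Affiliations: Huawei Noah’s Ark Lab

AI summary: This paper proposes CoUn, a framework that improves machine unlearning by adjusting learned data representations through contrastive learning and supervised learning applied only to retain data. Experiments across multiple datasets and model architectures show that it outperforms existing methods.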

Abstract

Machine unlearning (MU) aims to remove the influence of specific "forget" data from a trained model while preserving its knowledge of the remaining "retain" data. Existing MU methods based on label manipulation or model weight perturbations often achieve limited unlearning effectiveness. To address this, we introduce CoUn, a novel MU framework inspired by the observation that a model retrained from scratch using only retain data classifies forget data based on their semantic similarity to the retain data. CoUn emulates this behavior by adjusting learned data representations through contrastive learning (CL) and supervised learning, applied exclusively to retain data. Specifically, CoUn (1) leverages semantic similarity between data samples to indirectly adjust forget representations using CL, and (2) maintains retain representations within their respective clusters through supervised learning. Extensive experiments across various datasets and model architectures show that CoUn consistently outperforms state-of-the-art MU baselines in unlearning effectiveness. Additionally, integrating our CL module into existing baselines empowers their unlearning effectiveness.


2509.02351 2026-05-19 cs.CV cs.AI cs.LG 版本更新

Ordinal Adaptive Correction: A Data-Centric Approach to Ordinal Image Classification with Noisy Labels

序数自适应校正：一种数据导向的带有噪声标签的序数图像分类方法

Alireza Sedighi Moghaddam, Mohammad Reza Mohammadi

发表机构 * School of Computer Engineering, Iran University of Science and Technology（伊朗科学技术大学计算机工程学院）

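AI summary: The paper proposes ORDAC, a data-centric method for ordinal image classification that uses label distribution learning to model the ambiguity and uncertainty of ordinal labels and dynamically adjusts the mean and standard deviation of each sample's label distribution, correcting noisy labels and improving model performance.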

Comments 10 pages, 5 figures, 5 tables

详情

AI中文摘要

标注数据是训练计算机视觉任务中监督式深度学习模型的基本组成部分。然而，标注过程容易产生错误和噪声，在类别边界往往模糊的序数图像分类中尤其如此。此类标签噪声会显著降低机器学习模型的性能和可靠性。本文针对序数图像分类任务中标签噪声的检测与校正问题，提出了一种新的数据导向方法，称为ORDinal Adaptive Correction（ORDAC），用于自适应地校正噪声标签。该方法利用标签分布学习（LDL）来建模序数标签固有的模糊性和不确定性。在训练过程中，ORDAC动态调整每个样本标签分布的均值和标准差。该方法不丢弃可能含有噪声的样本，而是对其进行校正，从而充分利用整个训练数据集。我们在年龄估计（Adience）和疾病严重程度检测（糖尿病视网膜病变）基准数据集上，针对多种非对称高斯噪声场景评估了所提方法。结果表明，ORDAC及其扩展版本（ORDAC_C和ORDAC_R）显著提升了模型性能。例如，在噪声比例为40%的Adience数据集上，ORDAC_R将平均绝对误差从0.86降至0.62，并将召回率从0.37提高到0.49。该方法还能有效校正原始数据集中固有的噪声。研究表明，利用标签分布进行自适应标签校正，是提升序数分类模型在含噪数据下的鲁棒性和准确性的有效策略。

英文摘要

Labeled data is a fundamental component in training supervised deep learning models for computer vision tasks. However, the labeling process, especially for ordinal image classification where class boundaries are often ambiguous, is prone to error and noise. Such label noise can significantly degrade the performance and reliability of machine learning models. This paper addresses the problem of detecting and correcting label noise in ordinal image classification tasks. To this end, a novel data-centric method called ORDinal Adaptive Correction (ORDAC) is proposed for adaptive correction of noisy labels. The proposed approach leverages the capabilities of Label Distribution Learning (LDL) to model the inherent ambiguity and uncertainty present in ordinal labels. During training, ORDAC dynamically adjusts the mean and standard deviation of the label distribution for each sample. Rather than discarding potentially noisy samples, this approach aims to correct them and make optimal use of the entire training dataset. The effectiveness of the proposed method is evaluated on benchmark datasets for age estimation (Adience) and disease severity detection (Diabetic Retinopathy) under various asymmetric Gaussian noise scenarios. Results show that ORDAC and its extended versions (ORDAC_C and ORDAC_R) lead to significant improvements in model performance. For instance, on the Adience dataset with 40% noise, ORDAC_R reduced the mean absolute error from 0.86 to 0.62 and increased the recall metric from 0.37 to 0.49. The method also demonstrated its effectiveness in correcting intrinsic noise present in the original datasets. This research indicates that adaptive label correction using label distributions is an effective strategy to enhance the robustness and accuracy of ordinal classification models in the presence of noisy data.
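
As a rough illustration of the label-distribution idea described above, an ordinal label can be represented as a discretized Gaussian over class indices whose mean and standard deviation are free to change during training. This is a generic LDL sketch, not the authors' ORDAC update rule, which the abstract does not specify; the function names and the 8-class setup are assumptions made for illustration.

```python
import math


def gaussian_label_distribution(mean, std, num_classes):
    """Discretize a Gaussian centred at `mean` over class indices 0..num_classes-1."""
    weights = [math.exp(-0.5 * ((k - mean) / std) ** 2) for k in range(num_classes)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_label(dist):
    """Mean class index under a label distribution."""
    return sum(k * p for k, p in enumerate(dist))


# Hypothetical 8-class ordinal problem with a label at class 3.
sharp = gaussian_label_distribution(mean=3.0, std=0.5, num_classes=8)  # confident label
soft = gaussian_label_distribution(mean=3.0, std=2.0, num_classes=8)   # uncertain label
```

A smaller standard deviation concentrates probability mass on the labeled class, while a larger one spreads it to neighboring classes, which is one way to express doubt about a possibly noisy ordinal label.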

2508.06670 2026-05-19 math.NT cs.LG 版本更新

Machines Learn Number Fields, But How? The Case of Galois Groups

机器学会了数域，但如何学会的？以Galois群为例

Kyu-Hwan Lee, Seewoo Lee

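AI summary: Using interpretable machine learning methods such as decision trees, the paper studies how simple models can classify the Galois groups of Galois extensions of Q from Dedekind zeta coefficients; the aim is to understand how the distribution of zeta coefficients depends on the Galois group and to prove new classification criteria.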

Comments Accepted version, To appear in Research in Mathematical Sciences

2507.21035 2026-05-19 cs.AI cs.LG cs.MA q-bio.GN 版本更新

GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis

GenoMAS：通过代码驱动的基因表达分析进行科学发现的多智能体框架

Haoyang Liu, Yijiang Li, Haohan Wang

发表机构 * University of Illinois at Urbana-Champaign（伊利诺伊大学厄巴纳-香槟分校）； University of California, San Diego（加州大学圣地亚哥分校）

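AI summary: The paper proposes GenoMAS, a multi-agent framework that coordinates six specialized LLM agents through typed message-passing protocols for gene expression analysis and scientific discovery; it outperforms prior methods on both data preprocessing and gene identification.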

Comments 51 pages (14 pages for the main text, 10 pages for references, and 27 pages for the appendix)

详情

AI中文摘要

基因表达分析对于许多生物医学发现至关重要，但从原始转录组数据中提取见解仍然极具挑战性，这归因于多个大型半结构化文件的复杂性和对大量领域专业知识的需求。当前的自动化方法往往受到不灵活的工作流或完全自主代理的限制，这些代理缺乏进行严谨科学探究所需的精确度。GenoMAS则另辟蹊径，通过集成结构化工作流的可靠性与自主代理的适应性，提出了一支基于LLM的科学家团队。GenoMAS通过类型消息传递协议协调六个专门的LLM代理，每个代理都为共享的分析画布贡献互补的强项。GenoMAS的核心是一个引导规划框架：编程代理将高层任务指南展开为动作单元，并在每个节点选择前进、修订、绕过或回溯，从而在保持逻辑一致性的同时，灵活适应基因组数据的特性。在GenoTEX基准测试中，GenoMAS在数据预处理方面达到了89.13%的复合相似度相关性，在基因识别方面达到了60.48%的F1分数，分别超过了最佳现有方法10.61%和16.85%。除了指标外，GenoMAS还揭示了由文献支持的生物合理基因-表型关联，同时调整了潜在混杂因素。代码可在https://github.com/Liu-Hy/GenoMAS上获得。

英文摘要

Gene expression analysis holds the key to many biomedical discoveries, yet extracting insights from raw transcriptomic data remains formidable due to the complexity of multiple large, semi-structured files and the need for extensive domain expertise. Current automation approaches are often limited by either inflexible workflows that break down in edge cases or by fully autonomous agents that lack the necessary precision for rigorous scientific inquiry. GenoMAS charts a different course by presenting a team of LLM-based scientists that integrates the reliability of structured workflows with the adaptability of autonomous agents. GenoMAS orchestrates six specialized LLM agents through typed message-passing protocols, each contributing complementary strengths to a shared analytic canvas. At the heart of GenoMAS lies a guided-planning framework: programming agents unfold high-level task guidelines into Action Units and, at each juncture, elect to advance, revise, bypass, or backtrack, thereby maintaining logical coherence while bending gracefully to the idiosyncrasies of genomic data. On the GenoTEX benchmark, GenoMAS reaches a Composite Similarity Correlation of 89.13% for data preprocessing and an F$_1$ of 60.48% for gene identification, surpassing the best prior art by 10.61% and 16.85% respectively. Beyond metrics, GenoMAS surfaces biologically plausible gene-phenotype associations corroborated by the literature, all while adjusting for latent confounders. Code is available at https://github.com/Liu-Hy/GenoMAS.

2507.16307 2026-05-19 cs.LG cond-mat.mtrl-sci cs.AI physics.chem-ph 版本更新

Perovskite-R1: a domain-specialized large language model for intelligent discovery of precursor additives and experimental design

钙钛矿-R1：一个专门领域的大型语言模型，用于智能发现前驱体添加剂和实验设计

Xin-De Wang, Zhi-Rui Chen, Peng-Jie Guo, Ze-Feng Gao, Cheng Mu, Zhong-Yi Lu

发表机构 * School of Physics, Renmin University of China（中国人民大学物理学院）； School of Chemistry and Life Resource, Renmin University of China（中国人民大学化学与生命资源学院）

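AI summary: The paper presents Perovskite-R1, a large language model specialized for discovering precursor additives for perovskite solar cells and for experimental design; its domain-specific instruction-tuning dataset was built by systematically mining and curating 1,232 high-quality scientific publications and integrating a library of 33,269 candidate materials.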

Comments 24 pages; 5 figures

详情

DOI: 10.1038/s43246-026-01099-9
Journal ref: Communications Materials 7, 86 (2026)

AI中文摘要

钙钛矿太阳能电池（PSCs）因其卓越的功率转换效率和有利的材料特性而迅速成为下一代光伏技术的有力竞争者。尽管有这些进展，长期稳定性、环境可持续性和可扩展制造等挑战仍然阻碍其商业化。前驱体添加剂工程显示出通过提高PSCs的性能和耐久性来解决这些问题的潜力。然而，科学文献的爆炸性增长以及材料、工艺和设备架构之间的复杂相互作用，使研究人员难以高效地访问、组织和利用该领域内的领域知识。为此，我们介绍了Perovskite-R1，一个具有先进推理能力的专门大型语言模型（LLM），专门用于发现和设计PSC前驱体添加剂。通过系统挖掘和整理1232篇高质量科学出版物，并整合一个包含33,269种候选材料的全面库，我们使用自动问答生成和推理链的方法构建了一个领域特定的指令微调数据集。在该数据集上微调QwQ-32B模型，得到了Perovskite-R1，它可以智能地综合文献见解，生成创新且实用的解决方案用于缺陷钝化和前驱体添加剂的选择。对几个模型提出策略的实验验证证实了它们在提高材料稳定性和性能方面的有效性。我们的工作展示了领域适应的LLM在加速材料发现中的潜力，并提供了一个闭环框架，用于智能、数据驱动的钙钛矿光伏研究进展。

英文摘要

Perovskite solar cells (PSCs) have rapidly emerged as a leading contender in next-generation photovoltaic technologies, owing to their exceptional power conversion efficiencies and advantageous material properties. Despite these advances, challenges such as long-term stability, environmental sustainability, and scalable manufacturing continue to hinder their commercialization. Precursor additive engineering has shown promise in addressing these issues by enhancing both the performance and durability of PSCs. However, the explosive growth of scientific literature and the complex interplay of materials, processes, and device architectures make it increasingly difficult for researchers to efficiently access, organize, and utilize domain knowledge in this rapidly evolving field. To address this gap, we introduce Perovskite-R1, a specialized large language model (LLM) with advanced reasoning capabilities tailored for the discovery and design of PSC precursor additives. By systematically mining and curating 1,232 high-quality scientific publications and integrating a comprehensive library of 33,269 candidate materials, we constructed a domain-specific instruction-tuning dataset using automated question-answer generation and chain-of-thought reasoning. Fine-tuning the QwQ-32B model on this dataset resulted in Perovskite-R1, which can intelligently synthesize literature insights and generate innovative and practical solutions for defect passivation and the selection of precursor additives. Experimental validation of several model-proposed strategies confirms their effectiveness in improving material stability and performance. Our work demonstrates the potential of domain-adapted LLMs in accelerating materials discovery and provides a closed-loop framework for intelligent, data-driven advancements in perovskite photovoltaic research.

2507.01099 2026-05-19 cs.CV cs.AI cs.LG cs.RO 版本更新

Geometry-aware 4D Video Generation for Robot Manipulation

面向机器人操作的几何感知4D视频生成

Zeyi Liu, Shuang Li, Eric Cousineau, Siyuan Feng, Benjamin Burchfiel, Shuran Song

发表机构 * Stanford University（斯坦福大学）； Toyota Research Institute（丰田研究院）

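AI summary: The paper proposes a geometry-aware 4D video generation model trained with cross-view pointmap alignment to enforce multi-view 3D consistency; given a single RGB-D image per view, it generates spatio-temporally aligned future video sequences without camera poses as input and yields more visually stable, spatially aligned predictions.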

Comments ICLR 2026; Project website: https://robot4dgen.github.io

详情

AI中文摘要

理解并预测物理世界的动态可以增强机器人在复杂环境中的规划和交互能力。尽管最近的视频生成模型在建模动态场景方面显示出强大的潜力，但生成在不同摄像机视角下既时间一致又几何一致的视频仍然是一项重大挑战。为此，我们提出了一种4D视频生成模型，通过在训练过程中使用跨视角点图对齐来监督模型，以确保生成视频的多视角3D一致性。通过这种几何监督，模型学习了一个共享的3D场景表示，使其能够从单个RGB-D图像输入中，根据新的视角生成时空一致的未来视频序列，而无需依赖相机姿态作为输入。与现有基线方法相比，我们的方法在多个模拟和现实世界机器人数据集上产生了更稳定和空间对齐的预测。我们进一步表明，预测的4D视频可用于使用现成的6自由度姿态跟踪器恢复机器人末端执行器轨迹，从而生成在新相机视角下具有良好泛化能力的机器人操作策略。

英文摘要

Understanding and predicting dynamics of the physical world can enhance a robot's ability to plan and interact effectively in complex environments. While recent video generation models have shown strong potential in modeling dynamic scenes, generating videos that are both temporally coherent and geometrically consistent across camera views remains a significant challenge. To address this, we propose a 4D video generation model that enforces multi-view 3D consistency of generated videos by supervising the model with cross-view pointmap alignment during training. Through this geometric supervision, the model learns a shared 3D scene representation, enabling it to generate spatio-temporally aligned future video sequences from novel viewpoints given a single RGB-D image per view, and without relying on camera poses as input. Compared to existing baselines, our method produces more visually stable and spatially aligned predictions across multiple simulated and real-world robotic datasets. We further show that the predicted 4D videos can be used to recover robot end-effector trajectories using an off-the-shelf 6DoF pose tracker, yielding robot manipulation policies that generalize well to novel camera viewpoints.

2506.23549 2026-05-19 cs.AI cs.HC cs.LG 版本更新

CooT: Learning to Coordinate In-Context with Coordination Transformers

CooT：利用协调Transformer学习上下文内协调

Huai-Chih Wang, Hsiang-Chun Chuang, Hsi-Chun Cheng, Dai-Jie Wu, Shao-Hua Sun

发表机构 * Graduate Institute of Communication Engineering, National Taiwan University (NTU)（国立台湾大学通信工程研究所）； NTU Artificial Intelligence Center of Research Excellence (NTU AI-CoRE)（国立台湾大学人工智能研究中心）； University of Utah（犹他大学）

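AI summary: The paper proposes CooT, a framework that uses in-context learning for real-time partner adaptation, addressing the challenge of coordinating with unfamiliar partners in multi-agent systems; it learns to align actions with partner intentions purely through observation and generalizes across diverse partner behaviors.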

Comments ICML 2026

详情

AI中文摘要

在多智能体系统中，协调不熟悉合作伙伴仍然是一个重大挑战。现有方法，如基于种群的方法，通过多样性提高鲁棒性，但通常缺乏在训练分布之外高效适应的机制。此外，微调在少样本设置中不可行，因为其交互成本高。为了解决这些限制，我们提出了CooT，一个利用上下文学习（ICL）进行实时合作伙伴适应的框架。与以往专注于任务泛化的ICL方法不同，CooT旨在在多样化的合作伙伴行为上实现泛化。在行为偏好智能体的轨迹上训练，它通过观察学习对齐动作与合作伙伴意图。我们在两个具有挑战性的多智能体基准测试中评估了CooT：Overcooked和Google Research Football。结果表明，CooT在性能上始终优于基于种群的方法、基于梯度的微调和Meta-RL基线，实现了稳定且快速的适应，而无需参数更新。人类评估也发现CooT是更受青睐的合作者，我们的消融实验确认了其快速适应新合作伙伴并在突然合作伙伴变化下保持稳定的能力，使其在现实世界的人机协作中具有可靠性。

英文摘要

Effective coordination among unfamiliar partners remains a major challenge in multi-agent systems. Existing approaches, such as population-based methods, improve robustness through diversity but often lack mechanisms for efficient adaptation beyond training distribution. Moreover, fine-tuning is impractical in few-shot settings due to its high interaction cost. To address these limitations, we propose CooT, a framework that leverages in-context learning (ICL) for real-time partner adaptation. Unlike prior ICL approaches that focus on task generalization, CooT is designed to generalize across diverse partner behaviors. Trained on trajectories from behavior-preferring agents, it learns to align actions with partner intentions purely through observation. We evaluate CooT on two challenging multi-agent benchmarks: Overcooked and Google Research Football. Results show that CooT consistently outperforms population-based methods, gradient-based fine-tuning, and Meta-RL baselines, achieving stable and rapid adaptation without parameter updates. Human evaluations also identify CooT as a preferred collaborator, and our ablations confirm its ability to adapt quickly to new partners and remain stable under sudden partner changes, making it reliable for real-world human-AI collaboration.

2506.23287 2026-05-19 cs.LG q-bio.QM 版本更新

HDTree: Generative Modeling of Cellular Hierarchies for Robust Lineage Inference

HDTree: 用于鲁棒谱系推断的细胞层次生成建模

Zelin Zang, WenZhe Li, Yongjie Xu, Chang Yu, Changxi Chi, Jingbo Zhou, Zhen Lei, Stan Z. Li

发表机构 * Centre for Artificial Intelligence and Robotics（人工智能与机器人中心）； Hong Kong Institute of Science and Innovation（香港创新科学研究院）； Institute of Automation, Chinese Academy of Sciences（中国科学院自动化研究所）； School of Artificial Intelligence, University of Chinese Academy of Sciences（中国科学院大学人工智能学院）； School of Engineering, Westlake University（西湖大学工程学院）

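AI summary: The paper proposes HDTree, a generative modeling framework for robust lineage inference that captures cellular hierarchies with a unified hierarchical codebook and a quantized diffusion process, improving stability and scalability; on general-purpose and single-cell datasets it outperforms existing methods in lineage inference accuracy, reconstruction quality, and hierarchical consistency.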

Comments accepted by ICML26

详情

AI中文摘要

在单细胞研究中，追踪和分析高通量单细胞分化轨迹对于理解生物过程至关重要。关键在于对支配细胞发育的层次结构的稳健建模。传统方法在计算成本、性能和稳定性方面存在局限。基于VAE的方法虽有所进展，但仍需要分支特定的网络模块，限制了其可扩展性和稳定性，同时常遭遇后验崩溃问题。为克服这些挑战，我们引入HDTree，一种用于稳健谱系推断的生成建模框架。HDTree通过统一的层次码本在层次化潜在空间中捕捉树状关系，并利用量化扩散过程建模连续细胞状态转换。通过将生成过程与Waddington景观对齐，该方法不仅提高了稳定性和可扩展性，还增强了推断谱系的生物学合理性。HDTree的有效性通过在通用和单细胞数据集上的比较得到验证，其在谱系推断准确性、重建质量和层次一致性方面均优于现有方法。这些贡献使细胞分化路径的准确高效建模成为可能，为生物学发现提供可靠见解。代码可在https://github.com/zangzelin/code_HDTree_icml获取。

英文摘要

In single-cell research, tracing and analyzing high-throughput single-cell differentiation trajectories is crucial for understanding biological processes. Key to this is the robust modeling of hierarchical structures that govern cellular development. Traditional methods face limitations in computational cost, performance, and stability. VAE-based approaches have made strides but still require branch-specific network modules, limiting their scalability and stability, while often suffering from posterior collapse. To overcome these challenges, we introduce HDTree, a generative modeling framework designed for robust lineage inference. HDTree captures tree relationships within a hierarchical latent space using a unified hierarchical codebook and employs a quantized diffusion process to model continuous cell state transitions. By aligning the generative process with the Waddington landscape, this method not only improves stability and scalability but also enhances the biological plausibility of inferred lineages. HDTree's effectiveness is demonstrated through comparisons on both general-purpose and single-cell datasets, where it outperforms existing methods in lineage inference accuracy, reconstruction quality, and hierarchical consistency. These contributions enable accurate and efficient modeling of cellular differentiation paths, offering reliable insights for biological discovery. Code is available at https://github.com/zangzelin/code_HDTree_icml.

2506.17312 2026-05-19 cs.SI cs.AI cs.LG 版本更新

Heterogeneous Temporal Hypergraph Neural Network

异构时序超图神经网络

Huan Liu, Pengfei Jiao, Mengzhou Gao, Chaochao Chen, Di Jin

发表机构 * School of Cyberspace, Hangzhou Dianzi University（杭州电子科技大学网络空间安全学院）； Data Security Governance Zhejiang Engineering Research Center（浙江省数据安全治理工程研究中心）； College of Computer Science and Technology, Zhejiang University（浙江大学计算机科学与技术学院）； College of Intelligence and Computing, Tianjin University（天津大学智能与计算学部）

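AI summary: The paper proposes the Heterogeneous Temporal HyperGraph Neural network (HTHGN) to capture high-order interactions in heterogeneous temporal graphs, using a hierarchical attention mechanism and contrastive learning to capture the rich semantics between heterogeneous nodes and hyperedges.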

Comments Accepted by IJCAI 2025

详情

DOI: 10.24963/ijcai.2025/347

AI中文摘要

图表示学习（GRL）已成为建模图结构数据的有效技术。在建模现实复杂网络中的异质性和动态性时，针对复杂异构时序图（HTGs）设计的GRL方法已被提出，并在各领域取得了成功应用。然而，大多数现有GRL方法主要关注保留低阶拓扑信息，而忽视了更高阶的组交互关系，这些关系更符合现实网络。此外，大多数现有超图方法只能建模静态同构图，限制了它们对HTGs中高阶交互关系的建模能力。因此，为了同时使GRL模型能够捕捉HTGs中的高阶交互关系，我们首先提出了异构时序超图的正式定义和不依赖额外信息的$P$-均匀异构超边构造算法。然后提出了一种新的异构时序超图神经网络（HTHGN），以完全捕捉HTGs中的高阶交互关系。HTHGN包含一个层次注意力机制模块，同时在异构节点和超边之间进行时间消息传递，以捕捉由超边带来的更宽广感受场中的丰富语义。此外，HTHGN通过最大化HTG中低阶相关异构节点对之间的一致性来进行对比学习，以避免低阶结构的模糊性问题。在三个真实世界HTG数据集上的详细实验结果验证了所提出HTHGN在建模HTGs中高阶交互关系的有效性，并展示了显著的性能提升。

英文摘要

Graph representation learning (GRL) has emerged as an effective technique for modeling graph-structured data. When modeling heterogeneity and dynamics in real-world complex networks, GRL methods designed for complex heterogeneous temporal graphs (HTGs) have been proposed and have achieved successful applications in various fields. However, most existing GRL methods mainly focus on preserving the low-order topology information while ignoring higher-order group interaction relationships, which are more consistent with real-world networks. In addition, most existing hypergraph methods can only model static homogeneous graphs, limiting their ability to model high-order interactions in HTGs. Therefore, to enable the GRL model to capture high-order interaction relationships in HTGs, we first propose a formal definition of heterogeneous temporal hypergraphs and a $P$-uniform heterogeneous hyperedge construction algorithm that does not rely on additional information. Then, a novel Heterogeneous Temporal HyperGraph Neural network (HTHGN) is proposed to fully capture higher-order interactions in HTGs. HTHGN contains a hierarchical attention mechanism module that simultaneously performs temporal message-passing between heterogeneous nodes and hyperedges to capture rich semantics in a wider receptive field brought by hyperedges. Furthermore, HTHGN performs contrastive learning by maximizing the consistency between low-order correlated heterogeneous node pairs on HTG to avoid the low-order structural ambiguity issue. Detailed experimental results on three real-world HTG datasets verify the effectiveness of the proposed HTHGN for modeling high-order interactions in HTGs and demonstrate significant performance improvements.

URL PDF HTML ☆

赞 0 踩 0

2506.06114 2026-05-19 cs.LG 版本更新

Scalable unsupervised feature selection via weight stability

通过权重稳定性实现可扩展的无监督特征选择

Xudong Zhang, Renato Cordeiro de Amorim

发表机构 * School of Computer Science and Electronic Engineering, University of Essex, Wivenhoe, UK（埃塞克斯大学计算机科学与电子工程学院，英国威文豪）

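AI summary: The paper proposes an unsupervised feature selection method based on Minkowski weighted k-means that aggregates feature weights across a range of Minkowski exponents to identify stable and informative features, improving clustering performance.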

详情

AI中文摘要

无监督特征选择对于在高维数据中提升聚类性能至关重要，其中无关特征可能会掩盖有意义的结构。在本文中，我们引入了Minkowski加权k-均值++，一种新的Minkowski加权k-均值初始化策略。我们的初始化策略利用数据本身得出的特征相关性估计，以概率方式选择质心。在此基础上，我们提出了两种新的特征选择算法，FS-MWK++，通过聚合不同Minkowski指数下的特征权重来识别稳定且信息丰富的特征，以及SFS-MWK++，一种基于子采样的可扩展变体。我们通过理论分析支持我们的方法，证明在显式假设噪声特征和聚类结构的情况下，相关特征在不同Minkowski指数下均被赋予比噪声特征更高的权重。我们的软件可在https://github.com/xzhang4-ops1/FSMWK找到。

英文摘要

Unsupervised feature selection is critical for improving clustering performance in high-dimensional data, where irrelevant features can obscure meaningful structure. In this work, we introduce the Minkowski weighted $k$-means++, a novel initialisation strategy for the Minkowski Weighted $k$-means. Our initialisation selects centroids probabilistically using feature relevance estimates derived from the data itself. Building on this, we propose two new feature selection algorithms, FS-MWK++, which aggregates feature weights across a range of Minkowski exponents to identify stable and informative features, and SFS-MWK++, a scalable variant based on subsampling. We support our approach with a theoretical analysis, demonstrating that, under explicit assumptions on noise features and cluster structure, relevant features are assigned consistently higher weights than noise features across a range of Minkowski exponents. Our software can be found at https://github.com/xzhang4-ops1/FSMWK.
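
The abstract does not spell out how feature weights are computed or aggregated. A minimal sketch, assuming the feature-weight update usually given for Minkowski weighted k-means in earlier literature, w_v = 1 / Σ_u (D_v / D_u)^(1/(p-1)), where D_v is the within-cluster dispersion of feature v under exponent p. The plain average across exponents is only a placeholder for whatever aggregation FS-MWK++ actually uses, and the toy data, single cluster, and fixed centre are hypothetical.

```python
def dispersions(points, centre, p):
    """Per-feature Minkowski dispersion: sum_i |x_iv - c_v| ** p."""
    return [sum(abs(x[v] - centre[v]) ** p for x in points) for v in range(len(centre))]


def mwk_weights(disp, p):
    """Assumed update: w_v = 1 / sum_u (D_v / D_u) ** (1 / (p - 1)); needs p > 1, D > 0."""
    e = 1.0 / (p - 1.0)
    return [1.0 / sum((d_v / d_u) ** e for d_u in disp) for d_v in disp]


def mean_weights_over_exponents(points, centre, exponents):
    """Average per-exponent weights (placeholder aggregation, not the paper's)."""
    per_p = [mwk_weights(dispersions(points, centre, p), p) for p in exponents]
    return [sum(ws[v] for ws in per_p) / len(per_p) for v in range(len(centre))]


# Hypothetical single cluster: feature 0 is tight, feature 1 is noisy.
points = [(1.0, 0.0), (1.1, 4.0), (0.9, -3.0), (1.05, 2.0)]
centre = (1.0, 0.75)
stable = mean_weights_over_exponents(points, centre, exponents=[1.5, 2.0, 3.0])
```

In this toy setup the low-dispersion feature keeps a higher weight at every exponent, which is the kind of stability across exponents that the abstract describes as the selection signal.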

URL PDF HTML ☆

赞 0 踩 0

2506.03837 2026-05-19 cond-mat.supr-con cond-mat.mtrl-sci cs.AI cs.LG 版本更新

HTSC-2025: A Benchmark Dataset of Ambient-Pressure High-Temperature Superconductors for AI-Driven Critical Temperature Prediction

HTSC-2025: 一个用于人工智能驱动临界温度预测的环境压力高温超导体基准数据集

Xiao-Qi Han, Ze-Feng Gao, Xin-De Wang, Zhenfeng Ouyang, Peng-Jie Guo, Zhong-Yi Lu

发表机构 * School of Physics and Beijing Key Laboratory of Opto-electronic Functional Materials & Micro-nano Devices, Renmin University of China, Beijing 100872, China； Key Laboratory of Quantum State Construction and Manipulation (Ministry of Education), Renmin University of China, Beijing 100872, China； Hefei National Laboratory, Hefei 230088, China

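AI summary: The paper presents HTSC-2025, a benchmark dataset of high-temperature superconducting materials predicted by theoretical physicists from 2023 to 2025 on the basis of BCS superconductivity theory, intended to support AI-driven discovery of superconducting materials.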

Comments 7 pages, 2 figures

详情

DOI: 10.1088/1674-1056/adf042
Journal ref: Chinese Physics B 34, 100301 (2025)

AI中文摘要

高温超导材料的发现对人类工业和日常生活具有重要意义。近年来，利用人工智能（AI）预测超导转变温度的研究日益流行，大多数工具声称实现了显著的准确性。然而，该领域缺乏广泛接受的基准数据集，严重阻碍了不同AI算法之间的公平比较以及这些方法的进一步发展。在本工作中，我们提出了HTSC-2025，一个环境压力高温超导基准数据集。该数据集全面涵盖了基于BCS超导理论由理论物理学家在2023至2025年间发现的理论预测超导材料，包括著名的X₂YH₆系统、钙钛矿MXH₃系统、M₃XH₈系统、源自LaH₁₀结构演化的笼状BCN掺杂金属原子系统，以及从MgB₂演化而来的二维蜂窝状系统。HTSC-2025基准数据集已开源在https://github.com/xqh19970407/HTSC-2025并将持续更新。该基准数据集对加速基于人工智能方法的超导材料发现具有重要意义。

英文摘要

The discovery of high-temperature superconducting materials holds great significance for human industry and daily life. In recent years, research on predicting superconducting transition temperatures using artificial intelligence (AI) has gained popularity, with most of these tools claiming to achieve remarkable accuracy. However, the lack of widely accepted benchmark datasets in this field has severely hindered fair comparisons between different AI algorithms and impeded further advancement of these methods. In this work, we present HTSC-2025, an ambient-pressure high-temperature superconducting benchmark dataset. This comprehensive compilation encompasses theoretically predicted superconducting materials discovered by theoretical physicists from 2023 to 2025 based on BCS superconductivity theory, including the renowned X$_2$YH$_6$ system, perovskite MXH$_3$ system, M$_3$XH$_8$ system, cage-like BCN-doped metal atomic systems derived from LaH$_{10}$ structural evolution, and two-dimensional honeycomb-structured systems evolving from MgB$_2$. The HTSC-2025 benchmark has been open-sourced at https://github.com/xqh19970407/HTSC-2025 and will be continuously updated. This benchmark holds significant importance for accelerating the discovery of superconducting materials using AI-based methods.

2506.01523 2026-05-19 cs.LG stat.ML 版本更新

Beyond RLHF: A Unified Theoretical Framework of Alignment

超越RLHF：对齐的统一理论框架

Jihun Yun, Juno Kim, Jongho Park, Junhyuck Kim, Jongha Jon Ryu, Jaewoong Cho, Kwang-Sung Jun

发表机构 * KRAFTON ； UC Berkeley（加州大学伯克利分校）； MIT（麻省理工学院）； POSTECH

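AI summary: The paper proposes a unified theoretical framework that treats alignment as distribution learning from pairwise preferences, derives three principled alignment objectives, and proves non-asymptotic O(1/n) convergence for them, providing theoretical justification for RLHF.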

详情

AI中文摘要

基于人类反馈的强化学习（RLHF）已成为控制大型语言模型（LLM）输出质量的主流对齐范式。然而，现有理论未能为RLHF目标本身提供有力的依据，而且由于不同方法通常在不同框架下分析，难以比较各种方法的理论保证。为建立统一的对齐框架，我们探讨在何种假设下可以推导出现有或新的训练目标并获得理论保证。为此，我们将对齐重新表述为基于成对偏好的分布学习，并引入一个概率假设来描述偏好如何揭示目标语言模型的信息。由此我们提出三种有原则的对齐目标：偏好最大似然估计、偏好蒸馏和反向KL最小化。我们证明它们都以非渐近的O(1/n)速率收敛到目标语言模型，并自然地避免退化。特别是，反向KL与RLHF目标高度相似，为RLHF提供了有力的理论依据。此外，我们的理论首次解释了如下实证发现：on-policy目标（如RLHF）通常优于似然式目标（如DPO）。最后，实验结果表明，所提出的目标在多个任务和模型上可与强基线相媲美。

英文摘要

Alignment via reinforcement learning from human feedback (RLHF) has become the dominant paradigm for controlling the quality of outputs from large language models (LLMs). However, existing theories do not provide strong justification for the RLHF objective itself and do not allow comparisons of the guarantees between various methods because different methods are often analyzed under different frameworks. Toward a unified framework for alignment, we ask under what assumptions can we derive existing or new training objectives and obtain theoretical guarantees. To this end, we reframe alignment as distribution learning from pairwise preferences, which makes a probabilistic assumption describing how preferences reveal information about the target LM. This leads us to propose three principled alignment objectives: preference maximum likelihood estimation, preference distillation, and reverse KL minimization. We prove that they all enjoy strong non-asymptotic $O(1/n)$ convergence to the target LM, naturally avoiding degeneracy. In particular, reverse KL highly resembles the RLHF objective, providing strong justification for RLHF. Furthermore, our theory explains, for the first time, the empirical finding that on-policy objectives (e.g., RLHF) typically outperform likelihood-style objectives (e.g., DPO). Finally, empirical results indicate that the proposed objectives are competitive with strong baselines across several tasks and models.
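
The resemblance between reverse KL and the RLHF objective can be checked numerically on a toy discrete output space. The sketch below uses the standard KL-regularized RLHF objective J(π) = E_π[r] − β·KL(π ‖ π_ref) and the well-known identity J(π) = β·log Z − β·KL(π ‖ π*), with π* ∝ π_ref·exp(r/β). It is not the paper's exact objective, which the abstract does not state, and all distributions and rewards are made-up values.

```python
import math


def kl(p, q):
    """KL(p || q) for discrete distributions with q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def rlhf_objective(policy, reference, rewards, beta):
    """Expected reward minus beta times KL(policy || reference)."""
    expected_reward = sum(p * r for p, r in zip(policy, rewards))
    return expected_reward - beta * kl(policy, reference)


def tilted_target(reference, rewards, beta):
    """Target pi* proportional to reference * exp(reward / beta), plus its normalizer Z."""
    weights = [q * math.exp(r / beta) for q, r in zip(reference, rewards)]
    z = sum(weights)
    return [w / z for w in weights], z


# Made-up three-outcome example.
reference = [0.5, 0.3, 0.2]
rewards = [0.0, 1.0, 2.0]
beta = 0.5
policy = [0.2, 0.3, 0.5]

target, z = tilted_target(reference, rewards, beta)
lhs = rlhf_objective(policy, reference, rewards, beta)
rhs = beta * math.log(z) - beta * kl(policy, target)  # reverse-KL form of the same quantity
```

Because `lhs` and `rhs` agree for any policy, maximizing the KL-regularized objective is the same as minimizing the reverse KL to the tilted target, and the maximum is attained at `target` itself.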

2505.11143 2026-05-19 stat.ML cs.LG 版本更新

Nash: Neural Adaptive Shrinkage for Structured High-Dimensional Regression

Nash：用于结构化高维回归的神经自适应收缩

William R. P. Denault

发表机构 * Departments of Statistics and Human Genetics（统计学与人类遗传学系）

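AI summary: The paper proposes Nash, a framework that uses neural networks to integrate covariate-specific side information into high-dimensional sparse regression, improving accuracy and adaptability.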

详情

AI中文摘要

稀疏线性回归是数据分析中的基本工具。然而，当协变量具有结构或来自异质来源时，传统方法往往表现不佳。在生物医学应用中，协变量可能来自不同的模态，或按潜在的图结构组织。我们提出神经自适应收缩（Nash），这是一个统一的框架，通过神经网络将协变量特定的侧信息整合到稀疏回归中。Nash针对每个协变量自适应地调节惩罚项，无需交叉验证即可学习调整正则化。我们使用一种分裂变分经验贝叶斯算法，将先验学习与后验推断解耦，把每轮扫描中M步所需的神经网络前向传递从O(p)次减少到一次批量传递；当p在10²到10⁴之间时，相对于此前提出的坐标上升CAVI方法，实测运行时间加速74到106倍。在真实数据上的实验表明，Nash在准确性和适应性上优于现有方法。

英文摘要

Sparse linear regression is a fundamental tool in data analysis. However, traditional approaches often fall short when covariates exhibit structure or arise from heterogeneous sources. In biomedical applications, covariates may stem from distinct modalities or be structured according to an underlying graph. We introduce Neural Adaptive Shrinkage (Nash), a unified framework that integrates covariate-specific side information into sparse regression via neural networks. Nash adaptively modulates penalties on a per-covariate basis, learning to tailor regularization without cross-validation. We use a split variational empirical Bayes algorithm that decouples prior learning from posterior inference, reducing the M-step from $\mathcal{O}(p)$ neural-network passes per sweep to a single batched pass, a 74 to 106x wall-clock speedup over the previously proposed coordinate-ascent CAVI for $p$ between $10^2$ and $10^4$. Experiments on real data demonstrate that Nash improves accuracy and adaptability over existing methods.

2505.09203 2026-05-19 cond-mat.mtrl-sci cond-mat.supr-con cs.AI cs.LG 版本更新

InvDesFlow-AL: active learning-based workflow for inverse design of functional materials

InvDesFlow-AL：基于主动学习的功能材料逆向设计工作流

Xiao-Qi Han, Peng-Jie Guo, Ze-Feng Gao, Hao Sun, Zhong-Yi Lu

发表机构 * School of Physics, Renmin University of China（中国人民大学物理学院）； Gaoling School of Artificial Intelligence, Renmin University of China（中国人民大学高瓴人工智能学院）； School of Engineering Science, University of Chinese Academy of Sciences（中国科学院大学工程科学学院）

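AI summary: The paper proposes InvDesFlow-AL, an active-learning-based framework for inverse design of functional materials that iteratively optimizes the generation process toward desired performance characteristics; it is validated on the design of low-formation-energy and low-Ehull materials and identifies Li₂AuH₆ as a conventional BCS superconductor.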

Comments 29 pages, 11 figures

详情

DOI: 10.1038/s41524-025-01830-z
Journal ref: npj Computational Materials 11, 364 (2025)

AI中文摘要

开发具有特定性能的功能材料的反向设计方法对于推进可再生能源、催化、能量存储和碳捕集等领域的进步至关重要。基于扩散原理的生成模型可以直接生成满足性能约束的新材料，从而显著加速材料设计过程。然而，现有生成和预测晶体结构的方法往往受限于低成功率。在本工作中，我们提出了一种新的反向材料设计生成框架InvDesFlow-AL，该框架基于主动学习策略。该框架可以迭代优化材料生成过程，逐步引导其向期望的性能特征发展。在晶体结构预测方面，InvDesFlow-AL模型实现了RMSE为0.0423 Å，相比现有生成模型性能提高了32.96%。此外，InvDesFlow-AL已成功应用于低形成能和低Ehull材料的设计。它可以系统地生成具有逐步降低形成能的材料，同时在多样化的化学空间中不断扩展探索。这些结果充分证明了所提出的基于主动学习的生成模型在加速材料发现和反向设计中的有效性。为进一步证明该方法的有效性，我们以InvDesFlow-AL探索的常压下BCS超导体搜索为例。结果，我们成功发现了Li₂AuH₆作为传统BCS超导体，具有超高的转变温度140 K。这一发现为反向设计在材料科学中的应用提供了有力的实证支持。

英文摘要

Developing inverse design methods for functional materials with specific properties is critical to advancing fields like renewable energy, catalysis, energy storage, and carbon capture. Generative models based on diffusion principles can directly produce new materials that meet performance constraints, thereby significantly accelerating the material design process. However, existing methods for generating and predicting crystal structures often remain limited by low success rates. In this work, we propose a novel inverse material design generative framework called InvDesFlow-AL, which is based on active learning strategies. This framework can iteratively optimize the material generation process to gradually guide it towards desired performance characteristics. In terms of crystal structure prediction, the InvDesFlow-AL model achieves an RMSE of 0.0423 Å, representing a 32.96% improvement in performance compared to existing generative models. Additionally, InvDesFlow-AL has been successfully validated in the design of low-formation-energy and low-Ehull materials. It can systematically generate materials with progressively lower formation energies while continuously expanding the exploration across diverse chemical spaces. These results fully demonstrate the effectiveness of the proposed active learning-driven generative model in accelerating material discovery and inverse design. To further demonstrate the effectiveness of this method, we used InvDesFlow-AL to search for BCS superconductors under ambient pressure. As a result, we successfully identified Li$_2$AuH$_6$ as a conventional BCS superconductor with an ultra-high transition temperature of 140 K. This discovery provides strong empirical support for the application of inverse design in materials science.

2505.07813 2026-05-19 cs.RO cs.AI cs.CV cs.LG cs.SY eess.SY 版本更新

DexWild: Dexterous Human Interactions for In-the-Wild Robot Policies

DexWild：面向真实场景机器人策略的灵巧人类交互

Tony Tao, Mohan Kumar Srirama, Jason Jingzhou Liu, Kenneth Shaw, Deepak Pathak

发表机构 * Carnegie Mellon University（卡内基梅隆大学）

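AI summary: The paper proposes DexWild, a framework that co-trains on human and robot demonstration data to improve generalization across diverse environments; experiments show a markedly higher success rate in unseen environments than policies trained on robot data only.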

Comments In RSS 2025. Website at https://dexwild.github.io

详情

AI中文摘要

大规模、多样化的机器人数据集已成为使灵巧操作策略泛化到新环境的有希望途径，但获取此类数据集存在诸多挑战。虽然远程操作能提供高保真的数据集，但其高成本限制了可扩展性。相反，如果人们可以像在日常生活中一样使用自己的手来收集数据呢？在DexWild中，一个多样化的数据收集团队使用他们的手在多种环境和物体上收集数小时的交互数据。为了记录这些数据，我们创建了DexWild-System，一种低成本、移动且易于使用的设备。DexWild学习框架在人类和机器人示范数据上共同训练，相较于单独训练每个数据集，其性能得到提升。这种组合产生了能够泛化到新环境、任务和形态的稳健机器人策略，只需少量额外的机器人特定数据。实验结果表明，DexWild显著提高了性能，在未见环境中实现了68.5%的成功率，几乎是仅使用机器人数据训练的策略的四倍，并提供了5.8倍更好的跨形态泛化能力。视频结果、代码库和说明可在https://dexwild.github.io上找到。

英文摘要

Large-scale, diverse robot datasets have emerged as a promising path toward enabling dexterous manipulation policies to generalize to novel environments, but acquiring such datasets presents many challenges. While teleoperation provides high-fidelity datasets, its high cost limits its scalability. Instead, what if people could use their own hands, just as they do in everyday life, to collect data? In DexWild, a diverse team of data collectors uses their hands to collect hours of interactions across a multitude of environments and objects. To record this data, we create DexWild-System, a low-cost, mobile, and easy-to-use device. The DexWild learning framework co-trains on both human and robot demonstrations, leading to improved performance compared to training on each dataset individually. This combination results in robust robot policies capable of generalizing to novel environments, tasks, and embodiments with minimal additional robot-specific data. Experimental results demonstrate that DexWild significantly improves performance, achieving a 68.5% success rate in unseen environments, nearly four times higher than policies trained with robot data only, and offering 5.8x better cross-embodiment generalization. Video results, codebases, and instructions are available at https://dexwild.github.io

2505.06852 2026-05-19 cs.LG stat.ML 版本更新

Improving Random Forests by Smoothing

通过平滑改进随机森林

Ziyi Liu, Phuc Luong, Mario Boley, Daniel F. Schmidt

发表机构 * Faculty of Information Technology, Monash University（莫纳什大学信息科技学院）； Faculty of Computer and Information Science, University of Haifa（海法大学计算机与信息科学学院）

AI总结本文提出一种基于核的平滑机制，通过引入局部正则性来增强随机森林的预测性能，同时保留其自适应分区能力，特别是在数据稀缺情况下提升了预测效果。

Comments v2: Accepted manuscript. 30 pages (18 main + 12 appendix), 6 figures

详情

AI中文摘要

随机森林回归是一种强大的非参数方法，通过数据驱动的分区适应局部数据特征，在各种应用领域中表现出色。然而，随机森林预测的分段常数性质意味着每个分区都是独立预测的，忽略了潜在的函数平滑性。特别是在小数据情况下，输入空间内缺乏信息共享可能导致性能不佳。在本文中，我们提出了一种基于核的平滑机制，通过引入局部正则性来增强随机森林，同时保留其自适应分区能力。我们的方法将核平滑应用于随机森林的分段常数输出，有效地结合了基于树的方法的适应性和核方法的平滑性假设。我们证明这种平滑过程可以被解释为在重新采样训练输入的情况下捕捉树切分点的变异性/不确定性。实验证实，所提出的平滑随机森林模型在各种测试案例中一致提高了预测性能，特别是在数据稀缺的情况下。代码、数据集和实验结果可在 https://github.com/Neal-Liu-Ziyi/SmoothedRandomForest.git 公开获取。

英文摘要

Random forest regression is a powerful non-parametric method that adapts to local data characteristics through data-driven partitioning, making it effective across diverse application domains. However, the piecewise constant nature of random forest predictions means each partition is predicted independently, ignoring potential smoothness in the underlying function. Particularly in the small data regime, this lack of information sharing across the input space can lead to suboptimal performance. In this work, we propose a kernel-based smoothing mechanism that enhances random forests by introducing local regularity to their predictions while preserving their adaptive partitioning capabilities. Our approach applies kernel smoothing to the piecewise constant outputs of random forests, effectively combining the adaptability of tree-based methods with the smoothness assumptions of kernel methods. We show that this smoothing procedure can be interpreted as capturing the variability/uncertainty in the tree cut points under resampling of the training inputs. Empirical results demonstrate that the proposed smoothed random forest model consistently improves predictive performance across diverse test cases, particularly in data-scarce settings. Code, datasets, and experiment results are publicly available at https://github.com/Neal-Liu-Ziyi/SmoothedRandomForest.git.

2505.02621 2026-05-19 cs.LG math.OC stat.ML 版本更新

Mirror Mean-Field Langevin Dynamics

镜像均场 Langevin 动力学

Anming Gu, Juno Kim

发表机构 * University of Texas at Austin（德克萨斯大学奥斯汀分校）； UC Berkeley（加州大学伯克利分校）

AI总结本文提出镜像均场 Langevin 动力学（MMFLD），用于优化受限在 $\mathbb{R}^d$ 子集上的概率测度，并通过一致对数 Sobolev 不等式获得连续 MMFLD 的线性收敛性保证，以及其时间-粒子离散化版本的关于时间一致的混沌传播结果。

Comments ICML 2026

2505.00409 2026-05-19 eess.AS cs.AI cs.LG 版本更新

Perceptual implications of automatic anonymization in pathological speech

病态语音自动匿名化的感知影响

Soroosh Tayebi Arasteh, Saba Afza, Tri-Thien Nguyen, Lukas Buess, Maryam Parvin, Tomas Arias-Vergara, Paula Andrea Perez-Toro, Hiu Ching Hung, Mahshad Lotfinia, Thomas Gorges, Elmar Noeth, Maria Schuster, Seung Hee Yang, Andreas Maier

发表机构 * Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany； Department of Urology, Stanford University, Stanford, CA, USA； Department of Radiology, Stanford University, Stanford, CA, USA； Lab for AI in Medicine, RWTH Aachen University, Aachen, Germany； Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany； Institute of Radiology, University Hospital Erlangen, Erlangen, Germany； Department of Foreign Language Education, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany； Department of Otorhinolaryngology, Head and Neck Surgery, Ludwig-Maximilians-Universität München, Munich, Germany； Speech & Language Processing Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany

AI总结本研究通过结构化协议评估自动匿名化病态语音的感知影响，发现匿名化的可检测性在不同疾病间存在显著差异，感知质量明显下降，但临床严重程度评分保持稳定，且感知结果与标准计算隐私指标脱钩。

详情

AI中文摘要

自动匿名化日益用于促进伦理共享的临床语音，但其感知和临床后果仍不明确。我们通过结构化协议，使用十名母语和非母语德语听众（涵盖临床和信号处理专业知识）对自动匿名化的病态语音进行了以人为中心的评估。受试者包括来自CLP、构音障碍、构语障碍、失声及成人和儿童对照组的180名德语说话者。每段原始录音及其自动匿名化版本在四个任务上进行评估：零样本图灵式辨别、少量样本辨别后短暂熟悉、5点质量评分以及4点盲评临床严重程度评分由资深语音病学家完成。听众在零样本和少量样本任务中检测到匿名化准确率分别为91%和93%，不同疾病之间存在显著差异（p=0.008），且熟悉度降低该差异。感知质量在0-100分上下降了30分（p<0.001），重新组织了各组的感知质量等级。母语影响了可检测性但不影响质量退化，而领域专业知识影响了质量退化但不影响可检测性，形成双分离现象；说话者性别和年龄无明显偏差。临床严重程度评分在构音障碍、构语障碍和失声中保持几乎完美的一致（二次加权Cohen's kappa 0.87-0.94），无录音移位超过一级。关键发现是感知结果与标准计算隐私指标脱钩：计算上匿名化最强的病态语音在感知上最不明显，反之亦然。这些发现支持了按疾病类型和听众类型、经临床验证的评估作为许可匿名语音用于临床使用的最低标准。

英文摘要

Automatic anonymization is increasingly used to enable ethical sharing of clinical speech, yet its perceptual and clinical consequences remain undercharacterized. We present a human-centered evaluation of automatically anonymized pathological speech, using a structured protocol with ten native and non-native German listeners spanning clinical and signal-processing expertise. The cohort comprised 180 German speakers from CLP, Dysarthria, Dysglossia, Dysphonia, and adult and child controls. Each original recording and its automatically-anonymized counterpart was evaluated on four tasks: zero-shot Turing-style discrimination, few-shot discrimination after brief familiarization, 5-point quality rating, and 4-point blinded clinical severity rating by a senior phoniatrician. Listeners detected anonymization at 91% zero-shot and 93% few-shot accuracy, with significant variation across disorders (p=0.008) that attenuated with familiarization. Perceived quality dropped by 30 ppts on a 0-100 scale (p<0.001), reorganizing the perceived-quality hierarchy across groups. Native language modulated detectability but not quality degradation, while domain expertise modulated quality degradation but not detectability, a double dissociation between the two listener attributes; speaker sex and age produced no detectable bias. Clinical severity ratings were preserved at near-perfect agreement in Dysarthria, Dysglossia, and Dysphonia (quadratic-weighted Cohen's kappa 0.87-0.94), with no recording shifting by more than one grade. Crucially, perceptual outcomes were decoupled from the standard computational privacy metric: the pathology with the strongest computational anonymization was the least perceptually conspicuous, and vice versa. These findings argue for disorder-stratified, listener-stratified, clinician-validated evaluation as the minimum standard for licensing anonymized speech for clinical use.

2504.16397 2026-05-19 cs.DB cs.LG 版本更新

Compass: SLO-aware Query Planner for Compound AI Serving at Scale

Compass: 一种面向大规模复合AI服务的SLO感知查询计划器

Banruo Liu, Wei-Yu Lin, Minghao Fang, Yihan Jiang, Fan Lai

发表机构 * University of Illinois Urbana-Champaign（伊利诺伊大学厄巴纳-香槟分校）

AI总结本文提出Compass，一种首个面向大规模复合AI工作负载的SLO感知查询计划器，通过分解多查询、多SLO规划问题为可处理的子问题，利用查询间和查询内的计划相似性减少搜索步骤，并通过计划分析器提高每步效率，从而在资源竞争下最大化SLO良好吞吐量。

详情

AI中文摘要

复合AI服务的兴起使得端用户应用如生成式AI会议助手、自动驾驶和沉浸式游戏得以实现。这些工作负载跨越多样化的部署空间，从纯云查询到跨基础设施层级的边缘辅助查询，往往包括多个部署环境。实现高服务吞吐量——即满足流水线延迟、准确性和成本的服务级别目标（SLOs）——需要联合规划操作符的放置、配置和资源分配。然而，多样化的SLOs、变化的运行环境（如异构设备速度）以及大量竞争共享基础设施的查询使规划空间变得复杂，使现有进展难以实现实时服务和成本高效的部署。本文提出了Compass，一种首个SLO感知查询计划器，用于优化跨多样化部署空间的大规模复合AI工作负载。Compass将多查询、多SLO规划问题分解为可处理的子问题，同时保持全局决策质量，利用查询内和跨查询的计划相似性来减少搜索步骤。它进一步通过计划分析器提高每步效率，该分析器进行选择性分析以在极低的分析成本下实现高保真度的性能估计。在运行时，Compass执行查询-计划二分匹配以在资源竞争下最大化SLO吞吐量。实际评估表明，Compass将服务吞吐量提高2.4-5.1倍，减少部署成本3.8-4.5倍，并加速规划4.2-10.5倍，实现秒级的服务响应和接近最优的决策质量。

英文摘要

The rise of compound AI serving that integrates multiple operators in a pipeline enables end-user applications such as generative AI-powered meeting companions, autonomous driving, and immersive gaming. These workloads span diverse deployment spaces, from cloud-only queries to edge-assisted ones across infrastructure tiers, often including both within an application. Achieving high service goodput, i.e., meeting service level objectives (SLOs) for pipeline latency, accuracy, and costs, requires joint planning of operators' placement, configuration, and resource allocation. However, diverse SLOs, varying runtime environments (e.g., heterogeneous device speeds), and a large volume of queries competing for shared infrastructure explode the planning space, making real-time serving and cost-efficient deployment intractable with existing advances. This paper presents Compass, the first SLO-aware query planner that optimizes large-scale compound AI workloads across diverse deployment spaces. Compass decomposes the many-query, multi-SLO planning problem into tractable subproblems while preserving global decision quality, exploiting plan similarities within and across queries to slash the search steps. It further improves per-step efficiency with a plan profiler that performs selective profiling to achieve high-fidelity performance estimates at a fraction of the profiling cost. At runtime, Compass performs query-plan bipartite matching to maximize SLO goodput under resource contention. Real-world evaluations show that Compass improves service goodput by 2.4-5.1x, reduces deployment costs by 3.8-4.5x, and accelerates planning by 4.2-10.5x, achieving service responsiveness within seconds and near-optimal decision quality.

2504.07347 2026-05-19 stat.ML cs.LG math.PR 版本更新

Throughput-Optimal Scheduling Algorithms for LLM Inference and AI Agents

面向LLM推理和AI代理的吞吐量最优调度算法

J. G. Dai, Tianze Deng, Yueying Li, Tianyi Peng

发表机构 * School of Operations Research and Information Engineering, Cornell University（康奈尔大学运筹学与信息工程学院）； Operations Management, Booth School of Business, University of Chicago（芝加哥大学布斯商学院运营管理系）； Decision, Risk and Operations, Columbia Business School, Columbia University（哥伦比亚大学哥伦比亚商学院决策、风险与运营系）

AI总结本文从排队论角度研究了LLM推理系统的吞吐量优化问题，证明了工作保持调度算法在DAG和Fork-Join路由拓扑中能实现最大吞吐量，建立了K-FCFS调度下多类批处理网络的流体极限框架，确认了Orca和Sarathi-Serve的吞吐量最优性，同时指出批量大小限制和循环路由拓扑对吞吐量的影响。

详情

AI中文摘要

随着大型语言模型（LLM）和AI代理的需求迅速增长，优化系统以实现高效的LLM推理变得至关重要。尽管已有大量针对系统级工程的努力，但从数学建模和排队视角进行的探索却很少。本文建立了LLM推理的排队论基础。特别地，我们研究了LLM推理系统的吞吐量方面。我们证明了一大类“工作保持”调度算法在单个请求以及具有有向无环图（DAG）和fork-join路由拓扑的AI代理工作负载中都能实现最大吞吐量，从而将“工作保持”确立为从业者的关键设计原则。技术上，我们建立了K-FCFS调度下多类批处理网络的流体极限框架，这可能具有独立价值。对实际系统的评估证实Orca和Sarathi-Serve是吞吐量最优的，使从业者放心，而FasterTransformer和原生vLLM则不是最大稳定的，应谨慎使用。我们的分析还揭示了批量大小限制和循环路由拓扑等约束如何使吞吐量问题复杂化，指向排队论与LLM系统设计交汇处丰富的开放问题。

英文摘要

As demand for Large Language Models (LLMs) and AI agents grows rapidly, optimizing systems for efficient LLM inference becomes critical. While significant efforts have targeted system-level engineering, little has been explored from a mathematical modeling and queueing perspective. In this paper, we develop the queueing fundamentals for LLM inference. In particular, we study the throughput aspect of LLM inference systems. We prove that a large class of 'work-conserving' scheduling algorithms achieve maximum throughput for both individual requests and AI-agent workloads with directed acyclic graph (DAG) and fork-join routing topologies, establishing 'work-conserving' as a key design principle for practitioners. Technically, we develop a fluid-limit framework for multi-class batched processing networks under $K$-FCFS scheduling, which may be of independent interest. Evaluations of real-world systems confirm that Orca and Sarathi-Serve are throughput-optimal, reassuring practitioners, while FasterTransformer and vanilla vLLM are not maximally stable and should be used with caution. Our analysis also reveals how constraints such as batch size limits and cyclic routing topologies complicate the throughput picture, pointing to rich open questions at the intersection of queueing theory and LLM system design.

2503.14800 2026-05-19 cs.IR cs.AI cs.LG 版本更新

Long Context Modeling with Ranked Memory-Augmented Retrieval

长上下文建模与排名记忆增强检索

Ghadir Alselwi, Hao Xue, Shoaib Jameel, Basem Suleiman, Flora D. Salim, Imran Razzak

发表机构 * University of New South Wales（新南威尔士大学）； Hong Kong University of Science and Technology (Guangzhou)（香港科技大学（广州））； The Hong Kong University of Science and Technology（香港科技大学）； University of Southampton（南安普顿大学）； Mohamed Bin Zayed University of Artificial Intelligence（穆罕默德·本·扎耶德人工智能大学）

AI总结本文提出了一种增强的排名记忆增强检索框架，通过动态排名记忆条目和学习到的排名技术，提升语言模型在长上下文任务中的性能和可扩展性。

2503.02161 2026-05-19 cs.LG 版本更新

LLM-TabLogic: Preserving Inter-Column Logical Relationships in Synthetic Tabular Data via Prompt-Guided Latent Diffusion

LLM-TabLogic: 通过提示引导的潜在扩散模型在合成表格数据中保留列间逻辑关系

Yunbo Long, Liming Xu, Alexandra Brintrup

发表机构 * Department of Engineering, University of Cambridge（剑桥大学工程系）； The Alan Turing Institute（艾伦·图灵研究所）

AI总结本文提出LLM-TabLogic方法，利用大语言模型推理捕捉表格列间的复杂逻辑关系，并通过Score-based Diffusion模型在潜在空间中生成数据，以在不需领域知识的情况下有效保持合成表格数据中的列间关系。

详情

AI中文摘要

合成表格数据越来越多地被用来替代真实数据，作为一种同时保护隐私和解决数据稀缺问题的有效解决方案。然而，除了保持全局统计属性外，合成数据集还必须维持领域特定的逻辑一致性——特别是在供应链等复杂系统中，诸如运输日期、位置和产品类别等字段必须保持逻辑一致性以确保现实应用。现有生成模型往往忽视这些列间关系，导致现实应用中不可靠的合成表格数据。为了解决这些挑战，我们提出了LLM-TabLogic，一种新颖的方法，利用大语言模型推理来捕捉和压缩表格列间的复杂逻辑关系，同时这些条件约束被传递到Score-based Diffusion模型中，在潜在空间中进行数据生成。通过在真实工业数据集上的广泛实验，我们评估了LLM-TabLogic在列推理和数据生成中的表现，将其与SMOTE和最先进的生成模型等五个基线进行比较。我们的结果表明，LLM-TabLogic在逻辑推理方面具有强大的泛化能力，在未见过的表格上实现了超过90%的准确率。此外，我们的方法在数据生成方面优于所有基线，通过完全保留列间关系的同时保持数据保真度、实用性和隐私的最佳平衡。本研究提出了首个在不需领域知识的情况下有效保持合成表格数据中列间关系的方法，为创建逻辑一致的现实表格数据提供了新的见解。代码可在https://github.com/Yunbo-max/TabKG获取。

英文摘要

Synthetic tabular data are increasingly being used to replace real data, serving as an effective solution that simultaneously protects privacy and addresses data scarcity. However, in addition to preserving global statistical properties, synthetic datasets must also maintain domain-specific logical consistency, especially in complex systems like supply chains, where fields such as shipment dates, locations, and product categories must remain logically consistent for real-world usability. Existing generative models often overlook these inter-column relationships, leading to unreliable synthetic tabular data in real-world applications. To address these challenges, we propose LLM-TabLogic, a novel approach that leverages Large Language Model reasoning to capture and compress the complex logical relationships among tabular columns, while these conditional constraints are passed into a Score-based Diffusion model for data generation in latent space. Through extensive experiments on real-world industrial datasets, we evaluate LLM-TabLogic for column reasoning and data generation, comparing it with five baselines including SMOTE and state-of-the-art generative models. Our results show that LLM-TabLogic demonstrates strong generalization in logical inference, achieving over 90% accuracy on unseen tables. Furthermore, our method outperforms all baselines in data generation by fully preserving inter-column relationships while maintaining the best balance between data fidelity, utility, and privacy. This study presents the first method to effectively preserve inter-column relationships in synthetic tabular data generation without requiring domain knowledge, offering new insights for creating logically consistent real-world tabular data. The code is available at https://github.com/Yunbo-max/TabKG.

2503.02087 2026-05-19 cs.RO cs.LG cs.SY eess.SY 版本更新

Uncertainty Representation in a SOTIF-Related Use Case with Dempster-Shafer Theory for LiDAR Sensor-Based Object Detection

基于Dempster-Shafer理论的LiDAR传感器目标检测SOTIF相关用例中的不确定性表示

Milin Patel, Rolf Jung

发表机构 * Institute for Driver Assistance and Connected Mobility（驾驶员辅助与车联网研究所）； Kempten University of Applied Sciences（肯普滕应用科学大学）

AI总结本文提出了一种系统的方法，利用Dempster-Shafer理论构建辨识框架，以表示LiDAR传感器目标检测中的不确定性，并通过基于方差的敏感性分析量化和优先处理这些不确定性，以确保自动驾驶场景的安全性。

Comments submitted as extended paper of Vehicle Technology and Intelligent Transport Systems (VEHITS) 2024 conference and will be published by Springer in a CCIS Series book later in 2025

详情

DOI: 10.1007/978-3-032-23187-1_10

AI中文摘要

LiDAR传感器目标检测中的不确定性源于环境变化和传感器性能限制。表示这些不确定性对于确保预期功能安全（SOTIF）至关重要，SOTIF旨在防止自动驾驶场景中的危险。本文提出了一种系统的方法，用于在SOTIF相关场景中识别、分类和表示LiDAR目标检测中的不确定性。Dempster-Shafer理论（DST）被用于构建辨识框架（FoD）以表示检测结果。基于已识别的不确定性来源之间的依赖性，应用条件基本概率分配（BPAs）。Yager的证据组合规则用于解决多个来源的冲突证据，提供一个结构化的框架来评估不确定性对检测准确性的影响。研究应用基于方差的敏感性分析（VBSA）来量化和优先处理不确定性，详细说明其对检测性能的具体影响。

英文摘要

Uncertainty in LiDAR sensor-based object detection arises from environmental variability and sensor performance limitations. Representing these uncertainties is essential for ensuring the Safety of the Intended Functionality (SOTIF), which focuses on preventing hazards in automated driving scenarios. This paper presents a systematic approach to identifying, classifying, and representing uncertainties in LiDAR-based object detection within a SOTIF-related scenario. Dempster-Shafer Theory (DST) is employed to construct a Frame of Discernment (FoD) to represent detection outcomes. Conditional Basic Probability Assignments (BPAs) are applied based on dependencies among identified uncertainty sources. Yager's Rule of Combination is used to resolve conflicting evidence from multiple sources, providing a structured framework to evaluate uncertainties' effects on detection accuracy. The study applies variance-based sensitivity analysis (VBSA) to quantify and prioritize uncertainties, detailing their specific impact on detection performance.

2502.04055 2026-05-19 cs.LG 版本更新

Evaluating Inter-Column Logical Relationships in Synthetic Tabular Data Generation

评估合成表格数据生成中列之间的逻辑关系

Yunbo Long, Liming Xu, Alexandra Brintrup

发表机构 * Department of Engineering, University of Cambridge（剑桥大学工程系）； The Alan Turing Institute, London（伦敦艾伦·图灵研究所）

AI总结本文提出三种评估指标，用于评估合成表格数据中列间逻辑关系的保持情况，并通过实验证明现有方法在保持逻辑一致性方面存在不足，讨论了改进逻辑关系建模的可能路径。

详情

AI中文摘要

当前对合成表格数据的评估主要集中在联合分布建模的质量上，往往忽略了其在保持真实事件序列和列间一致实体关系方面的有效性。本文提出了三种评估指标，用于评估合成表格数据中列间逻辑关系的保持情况。我们通过在真实工业数据集上评估经典和最新生成方法的性能来验证这些指标。实验结果表明，现有方法往往无法严格保持逻辑一致性（例如地理或组织中的层级关系）和依赖性（例如时间序列或数学关系），这些对于保持真实世界表格数据的细粒度真实性至关重要。基于这些见解，本文还讨论了在建模合成表格数据分布时更好地捕捉逻辑关系的可能路径。代码可在https://github.com/Yunbo-max/TabLogicEval获取。

英文摘要

Current evaluations of synthetic tabular data mainly focus on how well joint distributions are modeled, often overlooking the assessment of their effectiveness in preserving realistic event sequences and coherent entity relationships across columns. This paper proposes three evaluation metrics designed to assess the preservation of logical relationships among columns in synthetic tabular data. We validate these metrics by assessing the performance of both classical and state-of-the-art generation methods on a real-world industrial dataset. Experimental results reveal that existing methods often fail to rigorously maintain logical consistency (e.g., hierarchical relationships in geography or organization) and dependencies (e.g., temporal sequences or mathematical relationships), which are crucial for preserving the fine-grained realism of real-world tabular data. Building on these insights, this study also discusses possible pathways to better capture logical relationships while modeling the distribution of synthetic tabular data. The code is available at https://github.com/Yunbo-max/TabLogicEval.

2411.03936 2026-05-19 cs.LG stat.ML 版本更新

Inspection and Control of Self-Generated-Text Recognition Ability in Llama3-8b-Instruct

对Llama3-8b-Instruct自生成文本识别能力的检查与控制

Christopher Ackerman, Nina Panickssery

AI总结本研究探讨了LLM是否能识别自身生成的文本，发现Llama3-8b-Instruct模型能够区分自身输出与人类输出，并通过残差流中的特定向量控制其行为和感知，揭示了模型自我归属的认知机制。

Comments 10 pages, 13 figs, 2 tables, accepted as conference paper to ICLR 2025

详情

Journal ref: The Thirteenth International Conference on Learning Representations (ICLR 2025)

AI中文摘要

已报告LLM能够识别其自身生成的文本，这可能对AI安全有重要影响，但研究较少。我们调查这一现象，以确定其在行为层面是否稳健发生，观察行为是如何实现的，以及是否可以控制。首先，我们发现Llama3-8b-Instruct聊天模型（而非基础Llama3-8b模型）能够可靠地区分自身输出与人类输出，并提供证据表明聊天模型很可能利用其在训练后对自身输出的经验来完成文本识别任务。其次，我们识别出残差流中一个在模型正确识别自身生成文本时被差异激活的向量，证明该向量对自我归属相关信息的响应，并提供证据表明该向量与模型中的“自我”概念相关，并展示该向量与模型感知和声明自我归属能力的因果关系。最后，我们证明该向量可用于控制模型的行为和感知，通过将其应用于模型生成输出时，可引导模型声称或否认作者身份；通过将其应用于模型阅读的文本时，可引导模型相信或不相信其写了任意文本。

英文摘要

It has been reported that LLMs can recognize their own writing. As this has potential implications for AI safety, yet is relatively understudied, we investigate the phenomenon, seeking to establish whether it robustly occurs at the behavioral level, how the observed behavior is achieved, and whether it can be controlled. First, we find that the Llama3-8b-Instruct chat model - but not the base Llama3-8b model - can reliably distinguish its own outputs from those of humans, and present evidence that the chat model is likely using its experience with its own outputs, acquired during post-training, to succeed at the writing recognition task. Second, we identify a vector in the residual stream of the model that is differentially activated when the model makes a correct self-written-text recognition judgment, show that the vector activates in response to information relevant to self-authorship, present evidence that the vector is related to the concept of "self" in the model, and demonstrate that the vector is causally related to the model's ability to perceive and assert self-authorship. Finally, we show that the vector can be used to control both the model's behavior and its perception, steering the model to claim or disclaim authorship by applying the vector to the model's output as it generates it, and steering the model to believe or disbelieve it wrote arbitrary texts by applying the vector to them as the model reads them.

2410.01223 2026-05-19 stat.CO cs.LG 版本更新

Statistical Taylor Expansion: A New and Path-Independent Method for Uncertainty Analysis

统计泰勒展开：一种新的、路径无关的不确定性分析方法

Chengpu Wang

发表机构 * Grossman Street, Melville, NY 11747, USA（美国纽约州梅尔维尔市格罗斯曼街，邮编11747）

AI总结本文提出了一种新的路径无关的不确定性分析方法，通过将精确输入变量替换为具有已知分布和样本数的随机变量，计算每个结果的均值、偏差和可靠因子，从而实现对输入不确定性的传播追踪，使最终结果成为路径无关的，与传统数学方法不同。

Comments 47 pages, 40 figures

详情

AI中文摘要

作为一种严谨的统计方法，统计泰勒展开扩展了传统泰勒展开，通过将精确输入变量替换为具有已知分布和样本数的随机变量来计算每个结果的均值、偏差和可靠因子。它通过中间步骤追踪输入不确定性的传播，使最终的解析结果成为路径无关的。因此，它与应用数学中为每项计算优化计算路径的常见方法根本不同。统计泰勒展开可能为解析表达式的数值计算提供标准化方法。本研究还介绍了称为方差算术的统计泰勒展开的实现，并在广泛的数学应用中展示了相应测试结果。本研究的另一个重要结论是，库函数中的数值误差可能显著影响结果。理想情况下，库函数返回的每个值都应附带一个不确定性偏差。此外，本文还讨论了统计泰勒展开与量子物理之间的可能联系。

英文摘要

As a rigorous statistical approach, statistical Taylor expansion extends the conventional Taylor expansion by replacing precise input variables with random variables of known distributions and sample counts to compute the mean, the deviation, and the reliable factor of each result. It tracks the propagation of the input uncertainties through intermediate steps, so that the final analytic result becomes path independent. Therefore, it differs fundamentally from common approaches in applied mathematics that optimize the computational path for each calculation. Statistical Taylor expansion may standardize numerical computations for analytic expressions. This study also introduces the implementation of statistical Taylor expansion termed variance arithmetic and presents corresponding test results across a wide range of mathematical applications. Another important conclusion of this study is that numerical errors in library functions can significantly affect results. It is desirable that each value from library functions be accompanied by an uncertainty deviation. The possible link between statistical Taylor expansion and quantum physics is discussed as well.

2406.09241 2026-05-19 math.OC cs.LG math.PR stat.ML 版本更新

What is the long-run distribution of stochastic gradient descent? A large deviations analysis

随机梯度下降的长期分布是什么？一种大偏差分析

Waïss Azizian, Franck Iutzeler, Jérôme Malick, Panayotis Mertikopoulos

发表机构 * Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP, LJK（格勒诺布尔阿尔卑斯大学，法国国家科学研究中心，法国国家信息与自动化研究所，格勒诺布尔理工学院，LJK实验室）； Institut de Mathématiques de Toulouse, Université de Toulouse, CNRS, UPS（图卢兹数学研究所，图卢兹大学，法国国家科学研究中心，保罗·萨巴蒂尔大学）

AI总结本文研究了在一般非凸问题中随机梯度下降（SGD）的长期分布。通过基于大偏差理论和随机扰动动力系统的方法，作者发现SGD的长期分布类似于平衡态热力学的玻尔兹曼-吉布斯分布，其中温度等于方法的步长大小，能量水平由问题的目标函数和噪声统计决定。研究还发现，在长期中，(a)问题的临界区域比任何非临界区域被访问的次数指数级更多；(b)SGD的迭代结果在问题的最低能量状态上指数级集中（该状态不总是对应于目标函数的全局最小值）；(c)所有其他临界点的连通分量被访问的频率与它们的能量水平呈指数比例关系；最后，(d)任何局部极大值或鞍点的连通分量都被局部最小值的连通分量所主导，后者被访问的次数指数级更多。

Comments 71 pages, 3 figures; presented in ICML 2024

详情

AI中文摘要

在本文中，我们研究了随机梯度下降（SGD）在一般非凸问题中的长期分布。具体而言，我们试图了解SGD更可能访问问题状态空间的哪些区域，以及程度如何。通过基于大偏差理论和随机扰动动力系统的方法，我们证明SGD的长期分布类似于平衡态热力学的玻尔兹曼-吉布斯分布，其中温度等于方法的步长大小，能量水平由问题的目标函数和噪声的统计特性决定。特别地，我们证明在长期中，(a)问题的临界区域比任何非临界区域被访问的次数指数级更多；(b)SGD的迭代结果在问题的最低能量状态上指数级集中（该状态不总是对应于目标函数的全局最小值）；(c)所有其他临界点的连通分量被访问的频率与它们的能量水平呈指数比例关系；最后，(d)任何局部极大值或鞍点的连通分量都被局部最小值的连通分量所主导，后者被访问的次数指数级更多。

英文摘要

In this paper, we examine the long-run distribution of stochastic gradient descent (SGD) in general, non-convex problems. Specifically, we seek to understand which regions of the problem's state space are more likely to be visited by SGD, and by how much. Using an approach based on the theory of large deviations and randomly perturbed dynamical systems, we show that the long-run distribution of SGD resembles the Boltzmann-Gibbs distribution of equilibrium thermodynamics with temperature equal to the method's step-size and energy levels determined by the problem's objective and the statistics of the noise. In particular, we show that, in the long run, (a) the problem's critical region is visited exponentially more often than any non-critical region; (b) the iterates of SGD are exponentially concentrated around the problem's minimum energy state (which does not always coincide with the global minimum of the objective); (c) all other connected components of critical points are visited with frequency that is exponentially proportional to their energy level; and, finally (d) any component of local maximizers or saddle points is "dominated" by a component of local minimizers which is visited exponentially more often.

2405.19189 2026-05-19 cs.LG 版本更新

DyDiff: Long-Horizon Rollout via Dynamics Diffusion for Offline Reinforcement Learning

DyDiff: 通过动力学扩散实现离线强化学习中的长周期 rollout

Hanye Zhao, Xiaoshen Han, Zhengbang Zhu, Minghuan Liu, Yong Yu, De-Chuan Zhan, Weinan Zhang

发表机构 * School of Computer Science, Shanghai Jiao Tong University, Shanghai 200240, China（上海交通大学计算机科学学院）； Department of Computer Science, Nanjing University, Nanjing 210093, China（南京大学计算机科学系）

AI总结本文提出DyDiff，一种通过动力学扩散模型实现离线强化学习中长周期轨迹生成的方法，通过迭代注入学习策略信息，解决行为策略与学习策略不一致的问题，提升长周期rollout的准确性。

Comments 18 pages, 10 figures, 9 tables. The article has been accepted by Frontiers of Computer Science (FCS), with the DOI: 10.1007/s11704-026-52028-5

详情

AI中文摘要

随着扩散模型（DMs）在生成逼真合成视觉数据方面的巨大成功，许多研究者探索其在决策和控制中的潜力。大多数工作利用DMs直接从轨迹空间采样，其中DMs可视为动力学模型和策略的结合。在本工作中，我们探讨如何在完全离线设置中解耦DMs作为动力学模型的能力，使学习策略能够生成轨迹。由于DMs从数据集中学习数据分布，其内在策略实际上是数据集诱导的行为策略，导致行为策略与学习策略之间存在不匹配。我们提出Dynamics Diffusion，简称DyDiff，可以迭代地将学习策略的信息注入DMs中。DyDiff在保持策略一致性的同时确保长周期rollout的准确性，并且可以轻松部署在无模型算法上。我们提供了理论分析，证明DMs在长周期rollout上的优势优于其他模型，并在离线强化学习的上下文中验证了DyDiff的有效性，其中提供了一个rollout数据集但没有交互环境。

英文摘要

With the great success of diffusion models (DMs) in generating realistic synthetic vision data, many researchers have investigated their potential in decision-making and control. Most of these works utilized DMs to sample directly from the trajectory space, where DMs can be viewed as a combination of dynamics models and policies. In this work, we explore how to decouple DMs' ability as dynamics models in fully offline settings, allowing the learning policy to roll out trajectories. As DMs learn the data distribution from the dataset, their intrinsic policy is actually the behavior policy induced from the dataset, which results in a mismatch between the behavior policy and the learning policy. We propose Dynamics Diffusion, DyDiff for short, which can inject information from the learning policy into DMs iteratively. DyDiff ensures long-horizon rollout accuracy while maintaining policy consistency and can be easily deployed on model-free algorithms. We provide theoretical analysis to show the advantage of DMs on long-horizon rollout over models and demonstrate the effectiveness of DyDiff in the context of offline reinforcement learning, where the rollout dataset is provided but no online environment is available for interaction.

2405.06415 2026-05-19 stat.ML cs.LG 版本更新

Generalization analysis with deep ReLU networks for metric and similarity learning

基于深度ReLU网络的度量与相似性学习的泛化分析

Junyu Zhou, Puyu Wang, Ding-Xuan Zhou

发表机构 * RPTU Kaiserslautern-Landau（凯撒斯劳滕-兰道工业大学）； University of Sydney（悉尼大学）

AI总结本文研究了度量与相似性学习的泛化性能，通过构建结构化的深度ReLU神经网络来近似真实度量，并推导出显式的泛化误差界，首次为该领域提供了明确的泛化分析。

Comments 15 pages, 1 figure

详情

AI中文摘要

尽管度量与相似性学习已从多个理论角度被广泛研究，但对其泛化性能的深入理解仍显不足。本文通过利用真实度量（即目标函数）的特定结构，研究了度量与相似性学习的泛化行为。特别地，通过推导具有hinge损失的度量与相似性学习的真实度量的显式形式，我们构建了一个结构化的深度ReLU神经网络作为真实度量的近似，其近似能力取决于网络复杂度。这里，网络复杂度通过网络深度、非零权重数量和计算单元数量来表征。基于由此类结构化深度ReLU网络构成的假设空间，我们通过仔细控制近似误差和估计误差，建立了度量与相似性学习的超额风险界。通过选择适当的构造假设空间的容量，推导出显式的超额风险率。迄今为止，这是首次为度量与相似性学习提供显式超额风险界的泛化分析。此外，我们还研究了在更一般损失函数下度量与相似性学习的真实度量的性质。实验表明，所提出模型在经验上具有竞争力，并能更好地捕捉底层的相似性结构。

英文摘要

While metric and similarity learning has been extensively studied from several theoretical perspectives, a rigorous understanding of its generalization performance is still lacking. In this paper, we investigate the generalization behavior of metric and similarity learning by exploiting the specific structure of the true metric (i.e., the target function). In particular, by deriving the explicit form of the true metric for metric and similarity learning with the hinge loss, we construct a structured deep ReLU neural network as an approximation of the true metric, whose approximation ability depends on the network complexity. Here, the network complexity is characterized by the network depth, the number of nonzero weights, and the number of computational units. Based on the hypothesis space consisting of such structured deep ReLU networks, we establish excess risk bounds for metric and similarity learning by carefully controlling both the approximation error and the estimation error. An explicit excess risk rate is derived by choosing the proper capacity of the constructed hypothesis space. To the best of our knowledge, this is the first generalization analysis that provides explicit excess risk bounds for metric and similarity learning. In addition, we investigate properties of the true metric for metric and similarity learning under more general loss functions. Experiments show that the proposed model is empirically competitive and better captures the underlying similarity structure.

2402.15058 2026-05-19 math.AT cs.CG cs.LG 版本更新

Mixup Barcodes: Quantifying Geometric-Topological Interactions between Point Clouds

Mixup Barcodes: 量化点云之间几何-拓扑相互作用

Hubert Wagner, Nickolas Arustamyan, Matthew Wheeler, Peter Bubenik

发表机构 * University of Florida（佛罗里达大学）； University of Central Florida（中央佛罗里达大学）

AI总结本文提出了一种新的方法，通过结合标准持续同调与图像持续同调，定义了量化形状及其相互作用的新型方法，引入了混合条形码、总混合度和总百分比混合度等统计量，并开发了相关软件工具，用于机器学习中的特征解缠问题。

Comments To appear at SoCG 2026

2308.06197 2026-05-19 cs.CV cs.AI cs.LG 版本更新

Complex Facial Expression Recognition Using Deep Knowledge Distillation of Basic Features

利用基本特征的深度知识蒸馏进行复杂面部表情识别

Angus Maiden, Bahareh Nakisa

发表机构 * School of Information Technology, Deakin University（迪肯大学信息技术学院）

AI总结本文提出了一种基于持续学习的方法，通过知识蒸馏和新颖的预测排序记忆重放，在复杂面部表情识别上达到了当前最先进水平，能够在少量样本下准确识别新的复合表情类别。

Comments 13 pages, 9 figures, 6 tables, 3 algorithms. Code available at https://github.com/AngusMaiden/complex-FER

详情

DOI: 10.1109/DICTA68720.2025.11302420

AI中文摘要

复杂情绪识别是一种认知任务，迄今为止尚未达到与其他处于或高于人类认知水平的任务相同的优秀性能。通过面部表情识别情绪尤其困难，因为人类面部表达的情绪十分复杂。为了使机器在复杂面部表情识别方面达到人类的水平，可能需要像人类一样实时综合知识并理解新概念。人类能够通过从记忆中提炼重要信息，仅凭少量示例学习新概念。受人类认知和学习的启发，我们提出了一种新的持续学习方法，用于复杂面部表情识别，通过在基本表情类别上构建和保留知识，能够使用少量训练样本准确识别新的复合表情类别。在本工作中，我们还使用GradCAM可视化来展示基本和复合面部表情之间的关系。我们的方法通过知识蒸馏和一种新颖的预测排序记忆重放来利用这种关系，在复杂面部表情识别的持续学习上达到了当前最先进水平，新类别的总体准确率为74.28%。我们还证明了使用持续学习进行复杂面部表情识别的性能远优于非持续学习方法，比最先进的非持续学习方法提高了13.95%。我们的工作也是首次将少样本学习应用于复杂面部表情识别，仅使用每个类别一个训练样本，就实现了100%的准确率，达到了最先进的水平。

英文摘要

Complex emotion recognition is a cognitive task that has so far eluded the same excellent performance of other tasks that are at or above the level of human cognition. Emotion recognition through facial expressions is particularly difficult due to the complexity of emotions expressed by the human face. For a machine to approach the same level of performance in complex facial expression recognition as a human, it may need to synthesise knowledge and understand new concepts in real-time, as humans do. Humans are able to learn new concepts using only few examples by distilling important information from memories. Inspired by human cognition and learning, we propose a novel continual learning method for complex facial expression recognition that can accurately recognise new compound expression classes using few training samples, by building on and retaining its knowledge of basic expression classes. In this work, we also use GradCAM visualisations to demonstrate the relationship between basic and compound facial expressions. Our method leverages this relationship through knowledge distillation and a novel Predictive Sorting Memory Replay, to achieve the current state-of-the-art in continual learning for complex facial expression recognition, with 74.28% Overall Accuracy on new classes. We also demonstrate that using continual learning for complex facial expression recognition achieves far better performance than non-continual learning methods, improving on state-of-the-art non-continual learning methods by 13.95%. Our work is also the first to apply few-shot learning to complex facial expression recognition, achieving the state-of-the-art with 100% accuracy using only a single training sample per class.

2307.12405 2026-05-19 cs.LG (version update)

Optimal Control of Multiclass Fluid Queueing Networks: A Machine Learning Approach

多类流队列网络的最优控制：一种机器学习方法

Dimitris Bertsimas, Cheol Woo Kim

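Affiliations: Sloan School of Management, Massachusetts Institute of Technology; Operations Research Center, Massachusetts Institute of Technology

AI summary: This paper proposes a machine learning approach to the optimal control of multiclass fluid queueing networks (MFQNETs) that yields explicit, insightful control policies. It proves that a piecewise-constant optimal policy exists, learns the policy with the OCT-H algorithm, and reports 100% test-set accuracy on large-scale networks.

AI Chinese abstract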

MasFACT：基于几何感知后验转移的连续多智能体拓扑学习

Xuefei Wang, Jialu Wang, Fengbo Zhang, Yihan Hu, Di Zhang, Yutong Ye, Yikun Ban, Jun Han, Ruijie Wang

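Affiliations: Beihang University

AI summary: This paper proposes MasFACT, a geometry-aware posterior transfer framework that addresses the topology forgetting caused by adapting to new tasks in multi-agent systems, improving both accuracy and topology stability in continual learning.

AI Chinese abstract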

多智能体系统（MAS）借助大型语言模型（LLMs）已成为解决复杂问题的强大范式，其性能关键依赖于底层的智能体间通信拓扑。然而，现有拓扑生成方法主要针对孤立任务进行优化，而现实部署涉及连续演化的任务流，要求先前有效的协作模式被保留和重用而非重新发现或覆盖。本文识别出一种此前未被充分探索的失败模式，即拓扑遗忘，其中适应新任务会使拓扑生成器偏离早期任务所需通信结构。该问题源于智能体层面功能语义和关系通信结构的跨任务不一致。为解决这一挑战，我们提出MasFACT，一种几何感知后验转移框架，通过融合Gromov-Wasserstein最优传输在任务特定智能体空间中转移历史协作知识作为可转移拓扑先验，并通过PAC-Bayes引导的保守后验适应在任务特定可塑性与结构稳定性之间取得平衡。在类别级、领域级和任务级连续设置中的实验表明，MasFACT在提升平均准确率的同时减少了拓扑遗忘，相比强大的拓扑生成和重放基线表现更优，并可无缝集成到不同的MAS拓扑生成器中。

English abstract

Multi-agent systems (MAS) powered by large language models (LLMs) have emerged as a powerful paradigm for complex problem solving, where performance critically depends on the underlying inter-agent communication topology. However, existing topology generation methods mainly optimize for isolated tasks, while real-world deployments involve streams of evolving tasks, requiring previously effective collaboration patterns to be retained and reused rather than rediscovered or overwritten. We identify a previously underexplored failure mode, \emph{topology forgetting}, in which adapting to new tasks shifts the topology generator away from communication structures required by earlier tasks. This issue stems from cross-task misalignment in both agent-level functional semantics and relational communication structures. To address this challenge, we propose \textbf{\textsc{MasFACT}}, a geometry-aware posterior transfer framework that preserves and reuses historical collaboration knowledge as transferable topology priors. We transfer these priors across task-specific agent spaces through Fused Gromov-Wasserstein optimal transport and perform PAC-Bayes-guided conservative posterior adaptation to balance task-specific plasticity with structural stability. Experiments across class-, domain-, and task-level continual settings demonstrate that \textsc{MasFACT} consistently improves average accuracy while reducing topology forgetting compared to strong topology generation and replay-based baselines, and can be seamlessly integrated with different MAS topology generators.

2605.17347 2026-05-19 cs.CY cs.CV cs.LG (version update)

Position: Age Estimation Models Do Not Process Biometric Data

立场：年龄估计模型不处理生物特征数据

Nikita Marshalkin

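Affiliations: Sumsub GmbH, Berlin, Germany

AI summary: This position paper examines whether age estimation models process biometric data. Its experiments indicate that these models do not reach identification thresholds and therefore do not perform identification, and it calls on researchers and regulators for greater transparency.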

Comments 11 pages, 3 figures, 3 tables. Accepted as a position paper at the 43rd International Conference on Machine Learning (ICML 2026)

2605.17339 2026-05-19 cs.LG (version update)

Bridging the Gap between Sparse Matrix Reordering and Factorization: A Deep Learning Framework for Fill-in Reduction

弥合稀疏矩阵重排与分解之间的差距：一种用于填充减少的深度学习框架

Ziwei Li, Tao Yuan, Shuzi Niu, Huiyuan Li

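Affiliations: Institute of Software, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China

AI summary: This paper proposes a deep learning framework that minimizes a fill-in surrogate function based on spectral embedding, bridging the gap between sparse matrix reordering and factorization. Experiments show performance competitive with traditional graph-theoretic algorithms and deep learning methods.

Comments Accepted by DASFAA 2025

AI Chinese abstract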

稀疏矩阵重排可以显著减少矩阵分解过程中的填充量，从而降低稀疏矩阵计算中的计算和存储需求。寻找最小填充量的重排顺序已知是NP难问题。此外，存在一个悖论：矩阵重排在矩阵分解之前进行，但重排方法旨在减少的填充是由矩阵分解产生的。为了弥合重排与分解之间的差距，我们提出了一种深度学习框架，基于谱嵌入最小化填充代理函数。首先，我们采用多网格-like GNN架构来学习近似其图拉普拉斯矩阵的最小特征向量，即谱嵌入，并捕捉矩阵的全局结构信息。然后，另一个多网格-like GNN架构用于基于秩分布最小化潜在的填充空间。实验结果表明，我们的方法在传统图论算法和深度学习方法中表现具有竞争力。

English abstract

Sparse matrix reordering can significantly reduce the fill-in during matrix factorization, thereby decreasing the computational and storage requirements in sparse matrix computations. Finding a minimal fill-in ordering is known to be an NP-hard problem. Moreover, there is a paradox: matrix reordering is applied before matrix factorization, but fill-ins that matrix reordering methods aim at are generated from matrix factorization. To bridge the gap between reordering and factorization, we propose a deep learning framework to minimize a fill-in surrogate function based on spectral embedding. First, we employ a multi-grid-like GNN architecture to learn to approximate the smallest eigenvectors of its graph Laplacian matrix, i.e. spectral embedding, and capture the global structural information of the matrix. Then, another multi-grid-like GNN architecture is used to minimize the potential space where fill-in can occur based on the rank distribution. Experimental results indicate that our approach achieves competitive performance compared with traditional graph-theoretic algorithms and deep learning methods.

2605.17334 2026-05-19 cond-mat.mtrl-sci cond-mat.stat-mech cs.LG physics.comp-ph (version update)

Causal Anomaly Detection for Lithium-Ion Battery Degradation

锂离子电池退化中的因果异常检测

Dieter W. Heermann, Hagen Heermann

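Affiliations: Institute for Theoretical Physics, Heidelberg University; Intilion GmbH

AI summary: This study proposes a framework based on causal graph discovery and k-nearest-neighbour transfer entropy for detecting lithium-ion battery degradation from routine cycler telemetry. It organises the resulting anomaly scores into three signal-class bundles to characterise detection sensitivity.

AI Chinese abstract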

可靠的早期检测锂离子电池退化需要能够物理解释且能从常规循环 telemetry 数据中计算出的健康指标。我们引入了CausalHealth框架，该框架应用因果图发现和k近邻转移熵对每个循环的电压、电流、温度和电阻时间序列进行处理，并将十二个结果异常评分组织成三个信号类别包（幅度位移、预测残差、复杂性熵）——隔离森林被单独报告，因为它低于包的可靠性阈值——以表征在十个校准分数（5-30%）范围内的检测灵敏度。幅度位移类别在所有七个测试的电池上实现了100%的检测率，覆盖LFP（MIT-Stanford MATR）和LCO（NASA PCoE、CALCE CS2）化学体系，其在渐进衰减电池上在传统容量阈值失效前的提前时间可达402个循环。一个可靠性加权主健康指数（RWMHI）——一个跨包融合的五个高可靠性检测器，按逆系数变异率加权——在长寿命电池上将提前时间提高了15-52个循环，同时保持100%的检测率。通过电化学阻抗谱对一个NMC棱柱电池的验证提供了独立的物理基础：转移熵TE(R→V)与电荷转移电阻R_ct相关（汇总r=+0.990；温度控制部分r=+0.898），对两者进行阿伦尼乌斯分析得到的激活能与已发表的NMC电荷转移动力学一致。这些结果在三个基准数据集上的七块电池上进行了评估。

English abstract

Reliable early detection of lithium-ion battery degradation requires health indicators that are physically interpretable and computable from routine cycler telemetry without access to the degradation region. We introduce \textsc{CausalHealth}, a framework that applies causal graph discovery and $k$-nearest-neighbour transfer entropy to per-cycle voltage, current, temperature, and resistance time series, and organises twelve resulting anomaly scores into three signal-class bundles (Magnitude-shift, Predictive-residual, Complexity-entropy) -- with Isolation Forest reported separately as it falls below the bundle reliability threshold -- to characterise detection sensitivity across ten commissioning fractions (5--30\,\%). The Magnitude-shift class achieves 100\,\% detection across all seven tested cells spanning LFP (MIT--Stanford MATR) and LCO (NASA PCoE, CALCE CS2) chemistries, with a lead time of up to 402 cycles before conventional capacity-threshold failure on gradual-fade cells. A Reliability-Weighted Master Health Index (RWMHI) -- a cross-bundle fusion of five high-reliability detectors weighted by inverse coefficient of variation -- improves lead time by 15--52 cycles over the class median on long-lived cells while maintaining 100\,\% detection. Validation against electrochemical impedance spectroscopy on an NMC prismatic cell provides independent physical grounding: transfer entropy $\mathrm{TE}(R \!\to\! V)$ correlates with charge-transfer resistance $R_{\mathrm{ct}}$ (pooled $r = +0.990$; temperature-controlled partial $r = +0.898$), and an Arrhenius analysis of both quantities yields an activation energy consistent with published NMC charge-transfer kinetics. These results are evaluated on seven cells across three benchmark datasets.
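The inverse-coefficient-of-variation weighting behind the RWMHI can be illustrated with a short sketch. The abstract does not say which quantity the coefficient of variation is computed over, so the per-detector sample lists, the detector names, and the functions `inverse_cv_weights` and `fused_index` are all hypothetical; this is one plausible reading, not the authors' implementation.

```python
from statistics import mean, pstdev


def inverse_cv_weights(samples_by_detector):
    """Return normalised weights proportional to 1 / CV for each detector.

    CV = population standard deviation / |mean| of that detector's samples.
    Assumes every detector has a non-zero mean and non-zero spread.
    """
    inverse_cv = {}
    for name, samples in samples_by_detector.items():
        cv = pstdev(samples) / abs(mean(samples))
        inverse_cv[name] = 1.0 / cv
    total = sum(inverse_cv.values())
    return {name: value / total for name, value in inverse_cv.items()}


def fused_index(scores, weights):
    """Weighted sum of the current per-detector anomaly scores."""
    return sum(weights[name] * scores[name] for name in weights)


# Hypothetical reliability samples: det_a is steady, det_b is erratic.
history = {
    "det_a": [10.0, 11.0, 9.0, 10.0],
    "det_b": [10.0, 20.0, 2.0, 8.0],
}
weights = inverse_cv_weights(history)
index = fused_index({"det_a": 0.8, "det_b": 0.2}, weights)
```

The steadier detector receives the larger weight, so a single noisy detector cannot dominate the fused index.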

2605.17316 2026-05-19 cs.LG cs.AI (version update)

Learning Higher-Order Structure from Incomplete Spatiotemporal Data: Multi-Scale Hypergraph Laplacians with Neural Refinement

从不完整时空数据中学习高阶结构：具有神经细化的多尺度超图拉普拉斯算子

Keshu Wu, Sixu Li, Zihao Li, Zhiwen Fan, Xiaopeng Li, Yang Zhou

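Affiliations: Texas A&M University; University of Wisconsin-Madison

AI summary: This paper proposes Multi-Scale Hypergraph Laplacians (MSHL), a two-stage framework for learning higher-order structure from incomplete spatiotemporal observations. The Discovery stage builds a multi-scale hypergraph, and the Refinement stage adds a hypergraph-conditioned residual network that applies nonlinear corrections where informative residual features exist, giving more accurate imputation of missing data on traffic networks.

AI Chinese abstract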

传感器网络日益成为现代基础设施的核心，然而标准填补基准所假设的均匀随机缺失模式往往不适用于实际场景。环形检测器在校准期间会断线，路边柜子会沉默附近传感器的集群，而新安装的仪器则无法提供历史数据。这些故障会产生结构化的缺失，其值受传感器组之间的高阶关系约束，而非仅仅是成对接近性。现有低秩和图方法往往无法捕捉这种集体结构，当缺失性变得一致时可能会失效。本文引入多尺度超图拉普拉斯（MSHL），一种两阶段框架，用于从不完整的时空观测中学习高阶结构。发现阶段通过互补的拓扑和残差相关证据构建多尺度超图，并采用仅基于观测的选取器，适应支持的交互尺度。细化阶段添加一个小型超图条件残差网络，其安全性由构造保证：在存在信息残差特征时学习非线性修正，在不存在时则退化为线性估计。我们证明MSHL可以表示无法被成对图先验捕捉的组内守恒模式，能够适应最佳固定尺度，至多一个对数因子，将这种优势转移到验证的填补误差中，并允许单侧细化保证。在两个真实交通网络上评估，针对散落单元缺失、连续块断电和整个传感器黑箱在五种速率下，MSHL在高阶结构可识别时优于成对图基线，否则在采样噪声范围内匹配。结果表明，可靠的基础设施学习存在更广泛的原则：缺失数据不应被视为孤立的填补条目，而应视为发现结构的证据。

English abstract

Sensor networks increasingly govern modern infrastructure, yet the data they lose are rarely missing in the uniform-random patterns assumed by standard imputation benchmarks. Loop detectors go offline during calibration, roadside cabinets silence clusters of nearby sensors, and newly installed instruments provide no history. Such failures create structured absences whose values are constrained by higher-order relations among groups of sensors, not merely by pairwise proximity. Existing low-rank and graph-based methods often miss this collective structure and can fail when missingness becomes coherent. We introduce Multi-Scale Hypergraph Laplacians (MSHL), a two-stage framework for learning higher-order structure from incomplete spatiotemporal observations. The Discovery stage builds a multi-scale hypergraph from complementary topology and residual-correlation evidence, with an observation-only selector that adapts to the supported interaction scale. The Refinement stage adds a small hypergraph-conditioned residual network that is safe by construction: it learns nonlinear corrections where informative residual features exist and defers to the linear estimate where they do not. We prove that MSHL represents group-conservation patterns inaccessible to pairwise graph priors, adapts to the best fixed scale up to a logarithmic factor, transfers this advantage to held-out imputation error, and admits a one-sided refinement guarantee. On two real traffic networks evaluated across scattered cell missingness, contiguous block outages, and whole-sensor blackouts at five rates, MSHL improves over a pairwise-graph baseline whenever higher-order structure is identifiable and otherwise matches it within sampling noise. The results point to a broader principle for reliable infrastructure learning: missing data should be treated not as isolated entries to fill, but as evidence of structure to discover.

2605.17314 2026-05-19 cs.CL cs.AI cs.LG (version update)

Weak-to-Strong Elicitation via Mismatched Wrong Drafts

通过不匹配的错误草稿实现弱到强的引导

Wei Deng

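Affiliations: Independent Researcher

AI summary: This paper studies whether mismatched wrong drafts from a smaller, weaker model can elicit capability in a stronger learner. The strategy performs well on MATH-500 and AIME 2025/2026, and the main contribution is an effective training recipe.

AI Chinese abstract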

我们考虑是否可以利用较小、较弱模型的离线经验来引导更强的学习者，使其在在线策略学习（如GRPO）无法达到的能力。我们发现，将数学上错误但更领域训练的较小模型生成的草稿注入更强学习者的GRPO上下文，能一致优于标准在线GRPO在MATH-500和离分布AIME 2025/2026上。具体来说，我们使用Mathstral-7B作为学习者，Qwen2.5-Math-1.5B作为草稿模型，8.8K Level 3--5 MATH问题（其中MATH-500被排除），并使用Dr. GRPO进行训练。不匹配是关键成分：在保持其他条件不变的情况下，将草稿洗牌到不匹配的问题中，使MATH-500的greedy pass@1提升+1.62pp（n=10种子，p=0.0015，Welch's t检验）。事实上，不匹配-错误变体在MATH-500上所有测试的变体中均优于。在离分布AIME 2025和2026上，不匹配-错误变体在每个样本预算从k=1到k=1024的所有年份中，均将pass@k提升到Mathstral-7B（其原生[INST]格式）和Qwen2.5-Math-1.5B草稿模型之上。所有变体在测试时使用相同的提示，没有草稿注入。该配方——在单个GPU上训练，无需SFT、奖励模型、合成数据和无produce-critique-revise内循环——在Mathstral-7B-v0.1上达到了71.98%的MATH-500成绩，这是目前该模型的最高已发表结果，超过了WizardMath流程在完整MATH上的70.9%（SFT + PPO加过程/指令奖励模型）。

English abstract

We consider whether off-policy experience from a smaller, weaker model can elicit capability in a stronger learner that on-policy RL fine-tuning (e.g., GRPO) does not reach. We find that injecting mathematically wrong drafts from a smaller but more domain-trained model -- mismatched to the current problem -- into a stronger learner's GRPO context consistently outperforms standard on-policy GRPO on held-out MATH-500 and out-of-distribution AIME 2025/2026. Concretely, we use Mathstral-7B as the learner, Qwen2.5-Math-1.5B as the draft model, 8.8K Level 3--5 MATH problems (with MATH-500 held out), and train with Dr. GRPO. Mismatch is an active ingredient: shuffling drafts to mismatched problems while holding everything else constant yields $+1.62$pp on MATH-500 (greedy pass@1) over the matched-wrong variant ($n=10$ seeds, $p=0.0015$, Welch's $t$). In fact, the mismatched-wrong variant leads all other variants we tested on MATH-500 across both greedy pass@1 and sampling pass@$k$. On out-of-distribution AIME 2025 and 2026, the mismatched-wrong variant uniquely lifts pass@$k$ above both Mathstral-7B (in its native [INST] format) and the Qwen2.5-Math-1.5B draft model at every sample budget from $k=1$ to $k=1024$ across 2 seeds ($+14.2$pp on 2025 and $+9.0$pp on 2026 at pass@1024 over Mathstral-7B), and at pass@1024 also leads no-draft, matched-wrong, and mismatched-correct variants on both years. All variants use the same prompt with no draft injection at test time. The recipe -- trained on a single GPU with no SFT, no reward models, no synthesized data, and no produce-critique-revise inner loop -- reaches 71.98% MATH-500 on Mathstral-7B-v0.1, the highest published result on this model to our knowledge, surpassing the heavier WizardMath pipeline at 70.9% on full MATH (SFT + PPO with process/instruction reward models).

2605.17307 2026-05-19 q-fin.PM cs.AI cs.LG cs.NE q-fin.TR (version update)

Deep Reinforcement Learning Framework for Diversified Portfolio Management Across Global Equity Markets

面向全球股票市场的多样化投资组合管理的深度强化学习框架

Kamil Kashif, Robert Ślepaczuk

Affiliations: Quantitative Finance Research Group, Faculty of Economic Sciences, University of Warsaw; Quantitative Finance Research Group, Department of Quantitative Finance and Machine Learning, Faculty of Economic Sciences, University of Warsaw

AI summary: This paper proposes and evaluates a deep reinforcement learning framework for dynamic portfolio allocation across global equity markets. By comparing five model configurations, it examines how the reward function, policy structure, portfolio constraints, and temporal encoder affect risk-adjusted performance.

Comments 67 pages, 11 figures, 16 tables

AI Chinese abstract

本研究开发并评估了一个深度强化学习框架，用于动态分配全球股票市场投资组合。Soft Actor-Critic算法被用于在马尔可夫决策过程中学习连续的投资组合权重，将交易成本、换手惩罚和多样化约束纳入奖励函数中。比较了五种模型配置，这些配置在奖励公式、策略结构（扁平与分层Dirichlet）、投资组合约束和时间编码器（LSTM与Transformer）方面有所不同，并通过走步优化在2003-2026年的纳斯达克100、日经225和欧元 Stoxx 50十六个外样本折上进行了评估。结果表明，强化学习策略在欧元 Stoxx 50市场中实现了有竞争力的风险调整后表现，其中观察到统计显著的异常收益，但核心假设仅部分得到验证：没有策略在HAC稳健推断下相对于持有策略实现统计显著的超额收益。制度分析揭示，强化学习在不确定性升高时期增加价值，而跨市场的集合聚合提高了风险调整后表现，并确认了地理多样化的好处。

English abstract

This study develops and evaluates a deep reinforcement learning framework for dynamic portfolio allocation across global equity markets. The Soft Actor-Critic algorithm is used to learn continuous portfolio weights within a Markov Decision Process, incorporating transaction costs, turnover penalties, and diversification constraints into the reward function. Five model configurations are compared, varying in reward formulation, policy structure (flat versus hierarchical Dirichlet), portfolio constraints, and temporal encoder (LSTM versus Transformer), and evaluated via walk-forward optimization across sixteen out-of-sample folds spanning 2003-2026 on the Nasdaq-100, Nikkei 225, and Euro Stoxx 50. Results show that RL strategies achieve competitive risk-adjusted performance primarily in the Euro Stoxx 50, where statistically significant abnormal returns are observed, but the central hypothesis is only partially confirmed: no strategy achieves statistically significant excess returns relative to Buy and Hold under HAC-robust inference across all markets. Regime analysis reveals that RL adds the most value during periods of elevated uncertainty, while ensemble aggregation across markets improves risk-adjusted performance and confirms the benefits of geographic diversification.

2605.17304 2026-05-19 cs.LG cs.CL (version update)

A2RBench: 一个用于形式可验证抽象推理基准生成的自动范式

Qingchuan Ma, Yuexiao Ma, Yongkang Xie, Tianyu Xie, Xiawu Zheng, Rongrong Ji

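Affiliations: Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education; Institute of Artificial Intelligence

AI summary: This paper proposes A2RBench, an automated paradigm whose generation, expansion, evaluation, and analysis pipeline makes abstract-reasoning benchmark generation more efficient. It finds that current LLMs have fundamental deficiencies in abstract reasoning and that inputs with higher information complexity can simplify the reasoning process.

AI Chinese abstract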

抽象推理能力反映了LLM提取和应用抽象规则的智能和泛化能力。然而，准确测量这一能力仍然具有挑战性：现有基准要么依赖昂贵的手动标注，限制了其规模，要么有风险测量记忆而非真正的推理。为此，我们引入了一个名为A2RBench的自动化流程，包括生成、扩展、评估和分析。具体而言，在生成阶段，LLM创建多样化的任务，要求真正的推理；在扩展阶段，LLM重用已验证的规则并扩展新的输入空间以生成任务变体，实现扩展。然而，这一过程可能导致幻觉。为消除它，我们进一步建立了理论框架并证明，程序验证——测试逆操作是否完美地逆转正向操作（循环一致性）——保证了唯一解。通过在主流LLM上的广泛评估，我们发现：（1）当前LLM在抽象推理上存在根本缺陷，顶级模型在代表性子集上显著低于人类（39.8% vs. 68.5%）。（2）当前LLM在生成3D任务的复杂度上远低于2D和1D，揭示了其对高维任务的理解不足。（3）反直觉的是，信息复杂度更高的输入可以简化推理过程。

English abstract

Abstract reasoning ability reflects the intelligence and generalization capacity of LLMs to extract and apply abstract rules. However, accurately measuring this ability remains challenging: existing benchmarks either rely on expensive manual annotation, limiting their scale, or risk measuring memorization rather than genuine reasoning. To address this, we introduce an automated pipeline named A2RBench, encompassing generation, expansion, evaluation, and analysis. Specifically, in the generation stage, LLMs create diverse tasks demanding genuine reasoning; in the expansion stage, LLMs reuse validated rules and expand new input spaces to generate task variations, achieving scaling. However, such a process may cause hallucinations. To eliminate it, we further establish a theoretical framework and prove that programmatic verification--testing whether the inverse operation perfectly reverses the forward operation (cycle consistency)--guarantees a unique solution. Through extensive evaluations on mainstream LLMs, we find: (1) Current LLMs exhibit fundamental deficiencies in abstract reasoning, with top models significantly underperforming humans on a representative subset (39.8% vs. 68.5%). (2) Current LLMs fall far short of 2D and 1D in the complexity of generated 3D tasks, revealing their lack of understanding of high-dimensional tasks. (3) Counterintuitively, inputs with higher information complexity can simplify the reasoning process.
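The cycle-consistency check named in the abstract (does the inverse operation perfectly reverse the forward operation?) can be sketched as follows. The grid rules and the function names here are hypothetical examples chosen for illustration, not tasks or code from A2RBench.

```python
from typing import Callable, List, Sequence

Grid = List[List[int]]


def rotate_cw(grid: Grid) -> Grid:
    """Forward rule: rotate the grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def rotate_ccw(grid: Grid) -> Grid:
    """Claimed inverse: rotate the grid 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*grid)][::-1]


def binarize(grid: Grid) -> Grid:
    """A lossy rule: every non-zero cell becomes 1, so no inverse exists."""
    return [[1 if cell else 0 for cell in row] for row in grid]


def identity(grid: Grid) -> Grid:
    """A wrong 'inverse' for binarize, used to show a rejected task."""
    return [list(row) for row in grid]


def is_cycle_consistent(
    forward: Callable[[Grid], Grid],
    inverse: Callable[[Grid], Grid],
    inputs: Sequence[Grid],
) -> bool:
    """Accept a rule only if inverse(forward(x)) == x on every sampled input."""
    return all(inverse(forward(x)) == x for x in inputs)


samples = [[[1, 2], [3, 4]], [[0, 5, 0], [7, 0, 9]]]
rotation_ok = is_cycle_consistent(rotate_cw, rotate_ccw, samples)
binarize_ok = is_cycle_consistent(binarize, identity, samples)
```

A check over sampled inputs like this can only reject bad rules empirically; the uniqueness guarantee described in the abstract comes from the paper's theoretical framework, which this sketch does not reproduce.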

2605.17276 2026-05-19 cs.LG cs.AI (version update)

How Do Electrocardiogram Models Scale?

ECG模型如何扩展？

Jiawei Li, Fabio Bonassi, Ming Jin, Stefan Gustafsson, Johan Sundström, Thomas B. Schön, Antônio H. Ribeiro

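Affiliations: Uppsala University; Griffith University

AI summary: This paper studies how ECG models scale. It finds that supervised models are data-bottlenecked in-distribution, whereas self-supervised models scale robustly across both model and data sizes, and that self-supervised Transformers overtake ResNets at very large model sizes.

AI Chinese abstract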

尽管扩展定律已为自然语言处理中的基础模型建立了基本框架，但其在心电图（ECG）模型中的适用性仍缺乏充分的描述。事实上，最近的研究并未始终显示出随着ECG模型规模或预训练数据集规模的增加，下游性能会获得一致的提升，这使得模型架构归纳偏置、预训练范式以及与规模相关的预期改进的确切作用仍然不明。在本工作中，我们系统地研究了ECG领域内的神经扩展定律和损失到损失扩展定律。通过在大规模CODE数据集（230万条记录）上预训练超过120个模型（参数量从2万到2亿不等），我们解耦了模型架构（ResNet vs. Transformer）和预训练范式（监督学习SL vs. 自监督学习SSL）的影响。我们发现（i）SL模型在分布内受数据瓶颈限制，而SSL模型在模型和数据规模上都能稳健扩展；（ii）对于分布外（OOD）泛化，ResNet的参数效率是Transformer的1.3到2.5倍，而SSL的数据效率最高可达SL的16倍，并在未见的临床任务上实现了最高为SL的7.6倍的迁移效率；（iii）在观察到的规模范围内，基于ResNet的模型通常取得最低的OOD损失，SSL在未见的临床任务上占据主导地位，而自监督的Transformer在非常大的模型规模上实现反超。我们的结果表明，有效ECG基础模型的路径在于架构和范式的战略对齐，而非单纯的暴力扩展。

English abstract

While scaling laws have established a fundamental framework for foundation models in natural language processing, their applicability to electrocardiogram (ECG) models remains poorly characterized. Indeed, recent studies do not always yield consistent downstream gains as one increases the model size or pre-training dataset size of ECG models, leaving the exact roles of architectural inductive biases, pre-training paradigms, and expected improvements with size largely unanswered. In this work, we systematically investigate neural and loss-to-loss scaling laws within the ECG domain. By pre-training over $120$ models (ranging from $20$K to $200$M parameters) on the large-scale CODE dataset ($2.3$M records), we decouple the effects of model architecture (ResNet vs. Transformer) and pre-training paradigm, namely supervised learning (SL) versus self-supervised learning (SSL). We found that (i) SL models are data-bottlenecked in-distribution, whereas SSL models scale robustly across both model and data sizes; (ii) for out-of-distribution (OOD) generalization, ResNets are $1.3$ to $2.5$ times more parameter-efficient than Transformers, while SSL is up to $16$ times more data-efficient and achieves up to $7.6$ times higher transfer efficiency than SL on unseen clinical tasks; (iii) across the observed scales, ResNet-based models generally achieve the lowest OOD loss, with SSL dominating on unseen clinical tasks and self-supervised Transformers overtaking at very large model sizes. Our results suggest that the path to effective ECG foundation models lies in the strategic alignment of architecture and paradigm rather than brute-force scaling.

2605.17275 2026-05-19 q-fin.RM cs.LG (version update)

A Hybrid Gaussian Process Regression Framework for Stable Volatility-Covariance Estimation: Evidence from Global Equity Indices

一种用于稳定波动-协方差估计的混合高斯过程回归框架：来自全球股票指数的证据

Ujjwala Vadrevu

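AI summary: This paper proposes a hybrid Gaussian Process Regression-Historical Simulation (GPR-HS) framework for estimating VaR and ES for a diversified portfolio of global equity indices. It models individual asset volatilities dynamically and estimates inter-asset correlations via stable historical covariance, improving the accuracy of tail-risk forecasts.

Comments Working paper. Replication code available at: https://colab.research.google.com/drive/1nrlSqmG10DNerNmEqGIh3EB9CcLWIgH9

AI Chinese abstract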

准确预测波动-协方差矩阵（VCV）对于监管资本充足性过程，如内部资本充足性评估程序（ICAAP）和综合资本分析和审查（CCAR）至关重要。传统的计量经济模型，包括GARCH家族和指数加权移动平均（EWMA）方法，由于参数刚性和分布假设，在压力下导致数值不稳定，从而系统性低估尾部风险。本文提出并验证了一种新的混合高斯过程回归-历史模拟（GPR-HS）框架，用于估计多样化投资组合中七个主要全球股票指数的VaR和ES。该框架将VCV估计问题解耦：单个资产波动率通过具有Matern 5/2核的单变量GPR动态建模，而交叉资产相关性通过稳定的历史协方差估计。关键的方法论贡献是攻击性噪声初始化（ANI）策略，该策略将初始白噪声核方差设置为训练回报的实证方差，确保Gram矩阵正定性、正则化和保守、符合监管要求的预测。通过2020年6月至2025年6月的扩展窗口前向链交叉验证方案评估，GPR-HS框架在大多数测试分割中实现了监管合规性；包括投资组合层面100%的ES通过率，同时在71.4%的单变量案例中通过二次损失优于静态历史VaR基准，在100%的案例中通过违规次数。

English abstract

Accurate forecasting of the Volatility-Covariance Matrix (VCV) is central to regulatory capital adequacy processes such as the Internal Capital Adequacy Assessment Process (ICAAP) and the Comprehensive Capital Analysis and Review (CCAR). Traditional econometric models, including GARCH-family and Exponentially Weighted Moving Average (EWMA) approaches, suffer from parametric rigidity, distributional assumptions, and numerical instability under stress, leading to systematic underestimation of tail risk. This paper proposes and validates a novel Hybrid Gaussian Process Regression-Historical Simulation (GPR-HS) framework for estimating Value-at-Risk (VaR) and Expected Shortfall (ES) across a diversified portfolio of seven major global equity indices. The framework decouples the VCV estimation problem: individual asset volatilities are modelled dynamically using univariate GPR with a Matern 5/2 kernel, while inter-asset correlations are estimated via stable historical covariance. A key methodological contribution is the Aggressive Noise Initialization (ANI) strategy, which sets the initial White Noise kernel variance equal to the empirical variance of the training returns, ensuring Gram matrix positive-definiteness, regularization, and conservative, regulatory-compliant forecasts. Evaluated using an expanding-window forward-chaining cross-validation scheme over June 2020 to June 2025, the GPR-HS framework achieves regulatory compliance in the majority of test splits, including a 100% ES pass rate at the portfolio level, while outperforming the static Historical VaR benchmark in 71.4% of univariate cases by Quadratic Loss and 100% of cases by violation count.

2605.17271 2026-05-19 math.OC cs.LG (version update)

面向实时机电攻击和故障分类的延迟感知深度学习基准测试

Emad Abukhousa, Saman Zonouz, A. P. Sakis Meliopoulos

Affiliations: Emad Abukhousa; Saman Zonouz

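AI summary: This paper proposes a latency-aware benchmarking framework for evaluating deep learning models for power system anomaly detection on high-fidelity time-domain signals from inverter-dominated networks. Eight neural network architectures, from MLPs to Transformers, are evaluated on streaming datasets covering physical faults and cyber-attacks. All models classify two representative multi-event sequences in real time with sub-cycle response times below 15 ms, but end-to-end inference latency consistently exceeds three cycles, ranging from 50 to 90 ms. The results highlight a gap between algorithmic capability and protection-grade deployment, point to the need for further optimization and hardware acceleration, and establish a reproducible benchmark for sub-cycle anomaly detection.

AI Chinese abstract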

本文介绍了一种延迟感知的基准测试框架，用于评估在电力系统异常检测中使用高保真时域信号生成的深度学习模型。通过系统评估从物理故障和网络攻击中生成的流数据集，评估了八种神经网络架构，包括MLP到Transformer。所有模型都能在亚周期响应时间低于15毫秒的情况下实时分类两种代表性多事件序列，但端到端推理延迟始终超过三个周期，范围从50到90毫秒。这些结果突显了算法能力与保护级部署之间的关键差距，指出了进一步优化和硬件加速的必要性。研究结果建立了可重复的亚周期异常检测基准，并为将机器学习方法从研究原型过渡到实际保护应用提供了指导。

English abstract

This work introduces a latency-aware benchmarking framework for evaluating deep learning models in power system anomaly detection using high-fidelity, time-domain signals generated from an industry-grade electromagnetic transient simulator. Eight neural network architectures, ranging from MLPs to Transformers, were systematically evaluated on streaming datasets representing both physical faults and cyber-attacks in inverter-dominated networks. All models successfully classified two representative multi-event sequences in real time with sub-cycle response times below 15 ms. However, although classification decisions occurred within one cycle, the end-to-end inference latency consistently exceeded three cycles, ranging from 50 to 90 ms. These results highlight a critical gap between algorithmic capability and protection-grade deployment, pointing to the need for further optimization and hardware acceleration. The findings establish a reproducible benchmark for sub-cycle anomaly detection and provide guidance for transitioning machine learning methods from research prototypes to real-world protection applications.

2605.17251 2026-05-19 cs.DS cs.LG (version update)

Iterative Chow Filtering for Learning with Distribution Shift

迭代 Chow 过滤用于分布偏移学习

Gautam Chandrasekaran, Georgios Gkrinias, Adam R. Klivans, Konstantinos Stavropoulos, Arsen Vasilyan

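Affiliations: UT Austin

AI summary: This paper proposes Iterative Chow Filtering to make learning with distribution shift more efficient. It gives a quasipolynomial-time PQ learning algorithm for DNF formulas and exponential improvements for several function classes.

Comments 30 pages

AI Chinese abstract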

Goel 等人的近期工作给出了在具有挑战性的 PQ 框架中进行分布偏移学习的首批高效算法。在此设置中，学习者接收带标签的训练示例和未标记的测试示例，必须在测试集上做出正确预测，但允许对分布外的点放弃预测。他们的结果依赖于 L2 夹逼（sandwiching）近似，这是一个很强的要求，导致在 DNF 公式等若干基本函数类上只能得到较差的界。在这里，我们证明较弱的 L1 夹逼近似足以实现高效的 PQ 学习。由此，我们获得了均匀分布下 DNF 的第一个拟多项式时间 PQ 学习算法，并基本匹配了普通 PAC 学习的已知保证。更广泛地说，我们的界为包括常数深度电路和常数次多项式阈值函数在内的多个函数类带来了指数级改进。我们的主要技术成分是迭代 Chow 过滤，这是一种新的程序，利用低次 Chow 参数来识别并移除与训练分布不相容的测试点。

English abstract

Recent work due to Goel et al. gave the first efficient algorithms for learning with distribution shift in the challenging PQ framework. In this setting, a learner receives labeled training examples, unlabeled test examples, and must make correct predictions on the test set but is allowed to abstain from predicting on out-of-distribution points. Their results rely on ${\cal L}_2$ sandwiching approximations, a strong requirement that leads to poor bounds for several basic function classes such as DNF formulas. Here, we show that the weaker notion of ${\cal L}_1$ sandwiching suffices for efficient PQ learning. As a consequence, we obtain the first quasipolynomial-time PQ learning algorithm for DNFs under the uniform distribution and essentially match the guarantees known for ordinary PAC learning. More broadly, our bounds provide exponential improvements for several classes including constant depth circuits and constant degree polynomial threshold functions. Our main technical ingredient is Iterative Chow Filtering, a new procedure that uses low-degree Chow parameters to identify and remove test points incompatible with the training distribution.

2605.17250 2026-05-19 cs.LG (version update)

Towards Principled Test-Time Adaptation for Time Series Forecasting

面向时间序列预测的原理性测试时间适应方法

Haochun Wang, Ruichen Xu, Georgios Kementzidis, Karen Cho, Sebastian Ramirez Villarreal, Yuefan Deng

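Affiliations: Stony Brook University

AI summary: This paper proposes Frequency-Aware Calibration (FAC), a lightweight frequency-domain calibration method for adapting time series forecasters under distribution shift. It analyses the prediction corrections of existing adapters in the frequency domain and achieves competitive, consistent performance across diverse datasets and forecasting horizons with substantially fewer trainable parameters.

AI Chinese abstract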

测试时间适应（TTA）最近作为一种有前景的方法，用于在分布偏移下改进时间序列预测（TSF）。现有的TSF-TTA方法在利用揭示的目标方面存在差异，导致适应协议异质且缺乏明确统一的公式。为了解决这个问题，我们从协议清洁度的角度重新审视TSF-TTA，并提出了一种仅基于成熟地面真实值的适应协议，从而获得更原理化的适应设置。在该协议下，我们进一步在频域中诊断现有适配器，并发现其预测修正通常表现出有限且弱结构化的频谱修改。受此诊断启发，我们提出了频率感知校准（FAC），一种轻量级校准方法，直接在频域中参数化预测修正。在多种数据集、预测时间跨度和源预测器上，FAC实现了竞争性和一致性的性能，同时所需可训练参数显著少于比较的TSF-TTA适配器。

English abstract

Test-time adaptation (TTA) has recently emerged as a promising approach for improving time series forecasting (TSF) under distribution shift. Existing TSF-TTA methods differ in how they utilize revealed targets, yet the resulting adaptation protocols remain heterogeneous and lack a clearly unified formulation. To address this issue, we revisit TSF-TTA from the perspective of protocol cleanliness and propose an adaptation protocol based solely on matured ground truth, yielding a more principled setting for adaptation. Under this protocol, we further diagnose existing adapters in the frequency domain and find that their prediction corrections often exhibit limited and weakly structured spectral modifications. Motivated by this diagnosis, we propose Frequency-Aware Calibration (FAC), a lightweight calibration method that directly parameterizes prediction corrections in the frequency domain. Across diverse datasets, forecasting horizons, and source forecasters, FAC achieves competitive and consistent performance while requiring substantially fewer trainable parameters than the compared TSF-TTA adapters.

2605.17246 2026-05-19 cs.LG cs.AI (version update)

Fidelity Probes for Specification--Code Alignment

规范-代码对齐的保真度探针

Ferhat Erata, Hao Zhou, Luke Huan

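Affiliations: AWS Agentic AI

AI summary: This paper proposes fidelity probes: natural-language questions generated from a reference artifact, with code-derived ground-truth answers, that are answered from a candidate specification. Fidelity is the fraction of agreeing probes; it decomposes into contradiction and coverage-gap rates that drive targeted specification edits to convergence. On a 15-program, roughly 12k-line COBOL benchmark (AWS CardDemo), frozen-test fidelity rises from 0.63 to 0.94 over eight iterations, with the plateau predicted by a two-state Markov fixed point $F^\dagger$ from four iterations of rate data. Probes come from an LLM or a static-analysis pipeline in a tunable mixture, a frozen held-out set gives a Hoeffding-bounded overfitting check, and a sweep over five LLM families shows that the convergence behaviour does not depend on a single model family.

Comments 29 pages, 14 figures, 11 tables

AI Chinese abstract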

我们引入了保真度探针：从参考artifact生成的自然语言问题，其代码派生的地面真实答案由候选规范回答。保真度是同意探针的比例，分解为矛盾率和覆盖缺口率，驱动针对性的规范编辑以达到收敛。在15个程序、约12000行COBOL基准（AWS CardDemo）上，我们通过八次迭代将冻结测试规范的保真度从0.63提升到0.94，其中平台位置由仅需四次速率数据的两状态马尔可夫固定点$F^\dagger$预测。探针来自LLM读取代码或静态分析管道对其控制流、数据流和系统依赖图的处理，具有可调混合比例。一个带有冻结留出集的探针重采样协议提供了Hoeffding有界的过拟合判别；我们测量的训练/测试差距保持在该包络线下一个数量级。三种基于图的混合提升了保真度16到30分；跨分布评估显示LLM和符号通道在经验上互补。在五个独立LLM家族（Anthropic、DeepSeek、Google、Alibaba、OpenAI）上进行的跨家族生成器扫描确认了收敛行为不依赖于任何单一模型家族：五个非Claude生成器中有三个产生了与马尔可夫固定点预测一致的轨迹，而冻结测试协议主动否定了两个探针分布随迭代变化的生成器。该方法适用于任何应描述相同行为的artifact对。

English abstract

We introduce fidelity probes: natural-language questions generated from a reference artifact with code-derived ground-truth answers, answered from a candidate specification. The fraction of agreeing probes, which we call the fidelity, decomposes into contradiction and coverage-gap rates that drive targeted spec edits to convergence. On a 15-program, roughly 12k-line COBOL benchmark (AWS CardDemo), we raise frozen-test specification fidelity from 0.63 to 0.94 over eight iterations, with the plateau location predicted by a two-state Markov fixed point $F^\dagger$ from just four iterations of rate data. Probes come from an LLM reading the code or from a static-analysis pipeline over its control-flow, data-flow, and system-dependence graphs, with a tunable mixture. A probe-resampling protocol with a frozen held-out set gives a Hoeffding-bounded overfitting discriminant; our measured train/test gap stays more than an order of magnitude below this envelope. Three graph-grounded mixtures lift fidelity by +16 to +30 points; cross-distribution evaluation shows the LLM and symbolic channels are empirically complementary. A cross-family generator sweep on five independent LLM lineages (Anthropic, DeepSeek, Google, Alibaba, OpenAI) confirms the convergence behaviour is not tied to any single model family: three of five non-Claude generators produce trajectories consistent with the Markov fixed-point prediction, and the frozen-test protocol actively falsifies the two generators whose probe distributions drift across iterations. The method applies to any pair of artifacts that are supposed to describe the same behaviour.
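The quantities in this abstract can be made concrete with a small sketch. It assumes that each probe outcome is one of `agree`, `contradict`, or `gap`, and that the two-state Markov model has a per-iteration repair rate (a failing probe becomes agreeing) and a regression rate (an agreeing probe becomes failing); the abstract does not spell out these details, so the outcome labels, rate names, and all function names are hypothetical.

```python
import math
from collections import Counter


def fidelity_report(outcomes):
    """Fidelity plus the two failure rates that make up the remainder."""
    n = len(outcomes)
    counts = Counter(outcomes)
    return {
        "fidelity": counts["agree"] / n,
        "contradiction_rate": counts["contradict"] / n,
        "coverage_gap_rate": counts["gap"] / n,
    }


def markov_step(fidelity, repair_rate, regress_rate):
    """One iteration of the assumed two-state chain over probes."""
    return fidelity * (1.0 - regress_rate) + (1.0 - fidelity) * repair_rate


def markov_fixed_point(repair_rate, regress_rate):
    """Stationary fidelity of the chain: repair / (repair + regress)."""
    return repair_rate / (repair_rate + regress_rate)


def hoeffding_halfwidth(n_probes, delta=0.05):
    """Two-sided Hoeffding bound on |measured - true| fidelity, w.p. 1 - delta."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n_probes))


report = fidelity_report(["agree"] * 8 + ["contradict"] + ["gap"])

fidelity = 0.63
for _ in range(200):
    fidelity = markov_step(fidelity, repair_rate=0.3, regress_rate=0.1)
plateau = markov_fixed_point(0.3, 0.1)
envelope = hoeffding_halfwidth(200)
```

Under these assumptions the plateau depends only on the ratio of the two rates, which is consistent with the abstract's claim that a few iterations of rate data are enough to predict where fidelity levels off.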

2605.17244 2026-05-19 cs.LG cs.AI (updated version)

Drift Flow Matching

Chenrui Ma, Xi Xiao, Lin Zhao, Tianyang Wang, Ferdinando Fioretto, Yanning Shen

Affiliations: University of California, Irvine; University of Virginia; University of Alabama at Birmingham; Northeastern University

AI summary: This paper proposes Drift Flow Matching, a framework that combines drift-based generative models with flow-based iterative generation, enabling efficient generation with multi-step refinement and improving generation quality and adaptability across efficiency budgets.

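2605.17238 2026-05-19 cs.LG stat.ML (updated version)

Toward Quantum-Assisted SVMs for Near-Real-Time Marine Oil Spill Detection in SAR Imagery

Joseph Strauss, Jyotsna Sharma

Affiliations: Division of Computer Science & Engineering, Louisiana State University, Baton Rouge, United States; Department of Petroleum Engineering, Louisiana State University, Baton Rouge, United States

AI summary: This study proposes a quantum-assisted SVM ensemble method for near-real-time detection of marine oil spills in SAR imagery, using quantum annealing to optimize the SVM support vectors on small datasets for efficient and accurate detection.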

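Why Do Safety Guardrails Degrade Across Languages?

Max Zhang, Ameen Patel, Sang T. Truong, Sanmi Koyejo

Affiliations: Stanford University

AI summary: This study introduces a multi-group item response theory framework that separates factors such as language-agnostic safety robustness, intrinsic prompt hardness, global language processing difficulty, and prompt-specific cross-lingual safety gaps. It finds that safety degradation is not confined to low-resource languages and that cultural and conceptual mismatches also affect safety performance.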

Details

Abstract

Large language models exhibit safety degradation in non-English languages. Standard evaluation relies on Jailbreak Success Rate (JSR), which confounds several safety-driving factors into one, obscuring the specific cause(s) of safety failure. We introduce a latent variable model, a Multi-Group Item Response Theory (IRT) framework, that decouples safety-driving factors such as language-agnostic safety robustness ($θ$), intrinsic prompt hardness ($β$), global language processing difficulty ($γ$), and a prompt-specific cross-lingual safety gap ($τ$). Using the MultiJail dataset, we evaluate the safety robustness of 61 model configurations across 5 closed-model families and 10 languages of varying resource, aggregating a dataset of 1.9 million rows. Exploratory Factor Analysis shows safety is primarily unidimensional: models refuse different harm types mainly through a shared mechanism. Contrary to the expected trend that safety degrades largely in low-resource languages, 22 model configurations are more vulnerable in English than in low-resource languages. Low-resource languages produce more uncertain responses (high entropy) than high-resource languages. Also, high-$τ$ prompts cluster in physical harm categories like Theft and Weapons and lower-resource languages, trends validated through cross-dataset generalization. While global translation quality shows low correlation with $τ$, severe mistranslations drive high-bias outliers, as validated by native speakers. Cultural and conceptual grounding mismatches also contribute to $τ$. In predictive validation, the IRT framework achieves $\mathrm{AUC} = 0.940$, outperforming simpler baselines in predicting safe refusal of unsafe prompts. Our framework reveals concept-language vulnerabilities that aggregate metrics obscure, enabling fairer cross-lingual safety evaluation and targeted improvements in dataset construction.
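The abstract names four latent factors but not the link function. A minimal sketch, assuming an additive logistic form in which the probability of a safe refusal is the sigmoid of $θ - β - γ - τ$; both the functional form and the parameter values are assumptions made for illustration, not the paper's stated model.

```python
import math

def refusal_probability(theta: float, beta: float, gamma: float, tau: float) -> float:
    """P(model safely refuses) under an assumed additive logistic IRT form.

    theta: language-agnostic safety robustness of the model
    beta:  intrinsic hardness of the prompt
    gamma: global processing difficulty of the language
    tau:   prompt-specific cross-lingual safety gap
    """
    logit = theta - beta - gamma - tau
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical parameter values, for illustration only.
p_english = refusal_probability(theta=2.0, beta=0.5, gamma=0.0, tau=0.0)
p_shifted = refusal_probability(theta=2.0, beta=0.5, gamma=0.4, tau=0.6)
```

Separating the terms this way is what lets a single Jailbreak Success Rate be attributed to the model, the prompt, the language, or their interaction.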

2605.17172 2026-05-19 cs.LG cs.AI cs.CL (updated version)

OpenJarvis: Personal AI, On Personal Devices

Jon Saad-Falcon, Avanika Narayan, Robby Manihani, Tanvir Bhathal, Herumb Shandilya, Hakki Orhun Akengin, Gabriel Bo, Andrew Park, Matthew Hart, Caia Costello, Chuan Li, Christopher Ré, Azalia Mirhoseini

AI summary: This paper presents OpenJarvis, a decomposed personal AI stack that narrows the local-cloud performance gap by optimizing five primitives (Intelligence, Engine, Agents, Tools & Memory, and Learning) while preserving the properties of local models.

Comments: Code: https://github.com/openjarvis/openjarvis Website: https://open-jarvis.github.io/OpenJarvis/

Details

Abstract

Personal AI stacks, like OpenClaw and Hermes Agent, are becoming central to daily work, yet they route nearly every query (often over sensitive local data) to cloud-hosted frontier models. Replacing frontier models with local models inside existing stacks does not work: swapping Claude Opus 4.6 for Qwen3.5-9B drops accuracy by 25-39 pp across personal AI tasks like PinchBench and GAIA. Existing stacks bundle agentic prompts, tool descriptions, memory configuration, and runtime settings around a specific cloud model. Only the prompts can be tuned, and state-of-the-art prompt optimizers close just 5 pp of the local-cloud gap on their own. This motivates a decomposed personal AI stack: one that exposes individual primitives which can be optimized individually or jointly to close the local-cloud gap. We present OpenJarvis, an architecture that represents a personal AI system as a typed spec over five primitives: Intelligence, Engine, Agents, Tools & Memory, and Learning. Each primitive is an independently editable field, making the stack end-to-end optimizable and measurable against accuracy, cost, and latency. Towards closing the local-cloud gap without surrendering local-model properties, OpenJarvis introduces LLM-guided spec search, a local-cloud collaboration in which frontier cloud models propose edits across the spec at search time, only non-regressing edits are accepted, and the resulting spec runs entirely on-device at inference time. With LLM-guided spec search, on-device specs match or exceed cloud accuracy on 4 of 8 benchmarks and land within 3.2 pp of the best cloud baseline on average. They also reduce marginal API cost by ~800x and end-to-end latency by 4x.

2605.17170 2026-05-19 cs.LG (updated version)

TriAxialKV: Toward Extreme Low-Precision KV-Cache Quantization for Agentic Inference Tasks

Hanzhang Shen, Haoran Wu, Yiren Zhao, Robert Mullins

Affiliations: University of Cambridge; Imperial College London

AI summary: This paper proposes TriAxialKV, a mixed-precision KV-cache quantization scheme that assigns each token a triaxial tag, calibrates per-tag sensitivity, and allocates INT2/INT4 bitwidths under a fixed memory budget to improve the efficiency and throughput of agentic inference.

Details

Abstract

Agentic workloads have emerged as a major workload for LLM inference. They differ significantly from chat-only workloads, requiring long-context processing, the ability to handle multimodal inputs, and structured multi-turn interactions with tool calling capabilities. As a result, their context exhibits structure that can carry different importance along three key axes: temporal recency to the current turn, modality such as text or image tokens, and semantic role such as user queries, tool calls, observations, or reasoning. These axes capture distinct token behaviors and lead to different sensitivities to KV-cache compression. However, existing KV-cache quantization methods are typically homogeneous or exploit only heterogeneity on a single dimension, such as temporal proximity or modality, overlooking the interactions among them. To this end, we introduce TriAxialKV, a novel mixed-precision KV-cache quantization scheme that assigns each token a triaxial tag, calibrates per-tag sensitivity, and allocates INT2/INT4 bitwidths under a fixed memory budget. We implement TriAxialKV as an end-to-end serving system, comprising calibration, mixed-precision quantization and memory management, and custom fused Triton decode kernels. When using Qwen3-VL-32B-Thinking as a computer-use agent operating the OSWorld, TriAxialKV matches the accuracy of SGLang with BF16 KV cache while supporting 4.5$\times$ KV cache size and achieving 30% higher end-to-end throughput, when running on real GPU systems.
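The abstract describes allocating INT2/INT4 bitwidths per tag under a fixed memory budget but does not give the allocation rule. One simple possibility is a greedy upgrade by sensitivity, sketched below. The tag tuples, sensitivity scores, and the function `allocate_bitwidths` are hypothetical, and the paper's actual allocator may differ.

```python
def allocate_bitwidths(tag_tokens, tag_sensitivity, budget_bits):
    """Greedy INT2/INT4 assignment per tag under a total bit budget.

    tag_tokens:      {tag: number of cached tokens carrying that tag}
    tag_sensitivity: {tag: calibrated sensitivity score (higher = more fragile)}
    budget_bits:     total bits available, counted as bits-per-token * tokens
    """
    bits = {tag: 2 for tag in tag_tokens}                 # start everything at INT2
    used = sum(2 * n for n in tag_tokens.values())
    if used > budget_bits:
        raise ValueError("budget too small even for all-INT2")
    # Upgrade the most sensitive tags first while the budget allows it.
    for tag in sorted(tag_tokens, key=lambda t: tag_sensitivity[t], reverse=True):
        extra = 2 * tag_tokens[tag]                       # cost of INT2 -> INT4
        if used + extra <= budget_bits:
            bits[tag] = 4
            used += extra
    return bits, used

# Hypothetical tags of the form (recency, modality, role).
tokens = {("recent", "text", "user"): 100,
          ("old", "image", "observation"): 1000,
          ("old", "text", "reasoning"): 400}
sensitivity = {("recent", "text", "user"): 0.9,
               ("old", "image", "observation"): 0.1,
               ("old", "text", "reasoning"): 0.5}
plan, used_bits = allocate_bitwidths(tokens, sensitivity, budget_bits=4000)
```

In this toy setup the large block of old image tokens stays at INT2 while the smaller, more sensitive text tags are promoted to INT4.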

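2605.17165 2026-05-19 cs.CV cs.LG (updated version)

Stress-Testing Neural Network Verifiers with Provably Robust Instances

David Troxell, Yulia Alexandr, Sofia Hunt, Stephanie Lei, Guido Montúfar

Affiliations: Department of Statistics & Data Science, University of California, Los Angeles; Department of Mathematics, University of California, Los Angeles; Max Planck Institute for Mathematics in the Sciences, Leipzig

AI summary: This paper proposes a framework for generating verification instances with known ground-truth robustness labels, which exposed numeric tolerance concerns and an implementation bug in existing verifiers. It also introduces the verification Difficulty Profile for systematically studying verifier failure modes, evaluates five state-of-the-art verifiers, and shows that different instances stress different aspects of the verification pipeline.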

Details

Abstract

Neural network verifiers aim to provide formal guarantees on model behavior, but existing verification benchmarks are fundamentally limited by their lack of ground-truth labels. As a result, verifier evaluation relies on indirect heuristics, which prevents exact scoring and systematic study of verifier failure modes. We address this gap by introducing a reusable framework for generating verification instances whose ground-truth robustness labels are known a priori through analytic construction. Our framework led to the discovery of multiple numeric tolerance concerns and an implementation bug in popular verifiers, highlighting the need for ground-truth labels. Additionally, to systematically study verifier failure modes, we introduce the verification Difficulty Profile, a collection of estimable quantities capturing distinct sources of instance hardness. Using our framework and these profiles, we evaluate five state-of-the-art verifiers and show that different instances stress distinct aspects of the verification pipeline. We show that these results can aid the future development of verifiers as they provide actionable targets for improving numerical reliability, relaxation quality, and search behavior. Our code is publicly available: https://github.com/dtroxell19/VeriStressGT.git.

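2605.17151 2026-05-19 cs.LG (updated version)

Lunar Crater Detection Using Principal Component Analysis

Travis Driver, John A. Christian

Affiliations: School of Aerospace Engineering, Georgia Institute of Technology

AI summary: This paper proposes a PCA-based method for automatically generating crater templates to improve image-based crater identification, and demonstrates better detection and localization performance than hand-picked templates on simulated lunar images.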

2605.17118 2026-05-19 cs.LG stat.CO stat.ML (updated version)

Differentiable Optimization Layers for Guaranteed Fairness in Deep Learning

David Troxell, Noah Roemer, Guido Montúfar

Affiliations: Department of Statistics & Data Science, University of California, Los Angeles, USA; Department of Mathematics, University of California, Los Angeles, USA; Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany

AI summary: This paper proposes a differentiable optimization layer, called a "fairness layer", that guarantees a chosen notion of output parity when integrated into a neural network, and introduces an online dual inference algorithm that provides provable fairness guarantees for streaming predictions, even with arbitrarily small batch sizes.

Comments: To be published in the International Conference on Machine Learning (ICML), 2026

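2605.17108 2026-05-19 cs.LG (updated version)

Parallel Recursive LSTM

Tristan Gaudreault, Yongyi Mao

Affiliations: School of Electrical Engineering and Computer Science; University of Ottawa

AI summary: This paper proposes the Parallel Recursive LSTM (PR-LSTM), a hierarchical recurrent architecture that replaces left-to-right recurrence with recursive nonlinear state composition, reducing computation depth in long-context settings while retaining nonlinear gated state representations, and achieves stronger sequence-length generalization on formal-language benchmarks.

Comments: 13 pages, 5 figures. Code available at https://github.com/tristangaudreault/pr-lstm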

Details

Abstract

Transformers have become the dominant architecture for sequence modeling by using self-attention to enable expressive and highly parallel processing. However, the resulting quadratic time and memory costs limit efficiency in long-context settings. Recurrent models such as LSTMs provide explicit nonlinear state updates and strong state-tracking capabilities, yet their strictly sequential computation limits parallelism. We introduce the Parallel Recursive LSTM (PR-LSTM), a hierarchical recurrent architecture that replaces left-to-right recurrence with recursive nonlinear state composition over a balanced computation tree. Tokens are first mapped independently to latent states, which are then recursively merged by a learned gated composition block. This structure uses the reduction pattern underlying parallel scans as a fixed execution schedule, rather than assuming an associative recurrence. As a result, PR-LSTM retains nonlinear gated state representations while reducing recurrent parallel depth from linear to logarithmic. Empirically, PR-LSTM achieves strong sequence-length generalization on formal-language benchmarks, solving more tasks than standard RNN, LSTM, and Transformer baselines, while avoiding the quadratic scaling of attention. These results suggest that recurrent computation can be reorganized hierarchically to expose parallelism without restricting the transition dynamics to linear or associative forms.
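The key structural idea, merging states pairwise over a balanced tree so that depth grows logarithmically even though the merge is not associative, can be illustrated with a toy sketch. The scalar states and the fixed weights inside `gated_merge` are stand-ins invented for this example; the paper's learned gated composition block is not reproduced here.

```python
import math

def gated_merge(left: float, right: float) -> float:
    """Toy nonlinear, non-associative composition of two child states.

    Fixed scalar weights stand in for the learned gated block.
    """
    gate = 1.0 / (1.0 + math.exp(-(0.5 * left - 0.3 * right)))
    candidate = math.tanh(0.8 * left + 1.2 * right)
    return gate * left + (1.0 - gate) * candidate

def tree_reduce(states: list[float]) -> tuple[float, int]:
    """Merge adjacent pairs level by level; return (root state, number of levels)."""
    if not states:
        raise ValueError("need at least one state")
    depth = 0
    while len(states) > 1:
        merged = [gated_merge(states[i], states[i + 1])
                  for i in range(0, len(states) - 1, 2)]
        if len(states) % 2 == 1:          # an odd trailing element is carried up unchanged
            merged.append(states[-1])
        states = merged                   # merges within one level are mutually independent
        depth += 1
    return states[0], depth

# Token embeddings are replaced by arbitrary scalars for illustration.
root, depth = tree_reduce([0.1 * t for t in range(8)])
```

Because every merge within a level is independent, each level could run in parallel, so 8 tokens need 3 sequential steps instead of 7. The tree is used as a fixed schedule, so the result depends on the bracketing, which is acceptable precisely because associativity is not assumed.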

2605.17107 2026-05-19 stat.ML cs.LG math.OC math.PR (updated version)

Diffusion-Based Stochastic Operator Networks for Uncertainty Quantification in Stochastic Partial Differential Equations

Phuoc-Toan Huynh, Richard Archibald, Feng Bao

Affiliations: Department of Mathematics, Florida State University; Computer Science and Mathematics Division, Oak Ridge National Laboratory

AI summary: This paper proposes the Stochastic Operator Network (SON), a stochastic operator-learning framework for uncertainty quantification of SPDE solution operators. SON combines the DeepONet structure with stochastic neural networks, learns directly from noisy data, outputs both a mean solution field and an uncertainty estimate, and is trained by minimizing a Hamiltonian-type loss using the Stochastic Maximum Principle.

Details

Abstract

We introduce a novel framework for uncertainty quantification of solution operators associated with stochastic partial differential equations (SPDEs). Although SPDEs play a central role in modeling complex physical systems under uncertainty, their practical use typically requires specifying the magnitude and structure of model uncertainties that are often unknown and difficult to infer from noisy measurements. To address this challenge, we develop a stochastic operator-learning framework that learns directly from noisy data and outputs both a mean solution field and a quantification of uncertainty. The proposed method, namely the Stochastic Operator Network (SON), is constructed by combining the structure of the Deep Operator Network (DeepONet) with Stochastic Neural Networks (SNNs) to model stochasticity and enable probabilistic prediction. The training procedure is carried out by minimizing a Hamiltonian-type loss and optimizing the resulting objective using the Stochastic Maximum Principle. Numerical experiments on benchmark SPDEs under multiple uncertainty sources demonstrate the accuracy and robustness of the proposed method in capturing solution structure and quantifying predictive uncertainty.

2605.17095 2026-05-19 cs.CV cs.AI cs.LG (updated version)

Visual Timelines of Police Encounters in Body-Worn Camera Footage: Operational Context and Activity Cataloging for Training and Analysis in OpenBWC

Angela Srbinovska, Christopher Homan, Adrian Martin, Ernest Fokoué

Affiliations: Rochester Institute of Technology; Rochester Police Department; Office of Business Intelligence; School of Mathematics and Statistics

AI summary: This paper presents a method that processes body-worn camera video into a time-aligned sequence of fixed-length 10-second windows, processed and labeled under a privacy-conscious protocol, to support faster incident review and more practical training workflows.

Comments: 13 pages, 10 figures, 9 tables

Details

Abstract

Law enforcement agencies are accumulating vast amounts of body-worn camera (BWC) footage. However, this footage remains operationally opaque: analysts and trainers still have to invest considerable time watching full-length videos to pinpoint the start of key encounters and identify the points where activity shifts to something more physically intense. We present an approach that converts BWC video into a time-aligned sequence of fixed-length 10-second windows, processed and labeled using a privacy-conscious protocol. Each window is labeled along two dimensions: (i) the operational context of the window and (ii) the level of motion intensity within the window, with low-evidence labels for windows where darkness, blur, or occlusion leaves insufficient evidence. We train models to classify windows along these two axes using frames sampled from each window, encoded with a CLIP model, and aggregated into a window-level representation. We extract dense optical flow statistics for each window to capture motion intensity. On test windows, the best context model achieves 78.75% accuracy, and the best-accuracy activity model achieves 88.33%. We also include integrity audits and show how the visual timeline representations support faster incident review and make the officer training workflow more practical.

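2605.17091 2026-05-19 cs.LG (updated version)

Mechanism Learning: Prototype-Anchored Mechanism Inference for Scientific Forecasting

Qian Jiang, Liping Sun

Affiliations: School of Computing; The Australian National University; iHuman Institute

AI summary: This paper proposes mechanism learning, a framework that forecasts future states by estimating the currently active local mechanism. It compresses local spatiotemporal fragments into mechanism descriptors and uses prototype anchors to build a data-driven mechanism space, yielding robust and stable scientific forecasts.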

Details

Abstract

Scientific forecasting typically relies on direct state prediction, an approach that grows brittle under data scarcity, extended horizons, non-stationary dynamics, or high-dimensional complexity. While raw state trajectories are highly sensitive in these regimes, underlying local evolution rules often exhibit robust reusability. We introduce mechanism learning, a framework that forecasts future states by estimating the currently active local mechanism. Our method compresses local spatiotemporal fragments into mechanism descriptors, forming a data-driven, structured mechanism space where proximity reflects similar local evolution rules. To ground these estimates in observed data, we utilize prototype anchors, a set of representative mechanisms that sparsely cover the space of local rules. We evaluate this approach on Burgers dynamics, WeatherBench2, and Lorenz96. Empirically, the learned mechanism spaces resist collapse and maintain strong local consistency. Compared to direct prediction and other models including FNO, NODE, LSTM, and reservoir-family methods, our framework demonstrates predictive gains in fragile regimes: it significantly improves switching stability in Burgers dynamics and achieves state-of-the-art performance both under the scarce-data fixed-horizon WeatherBench2 protocol and in intermediate-complexity Lorenz96. Ablation studies and drift diagnostics confirm that these improvements are driven by finite prototype anchoring rather than sheer latent capacity. Together, these results establish mechanism learning as a principled, robust alternative to direct state prediction in forecasting complex systems.

2605.17085 2026-05-19 cs.SD cs.LG eess.AS (updated version)

Taming Audio VAEs via Target-KL Regularization

Prem Seetharaman, Rithesh Kumar

Affiliations: Adobe Research

AI summary: This paper proposes training audio VAEs at specific bitrates via target-KL regularization to study the trade-off between over-regularized and under-regularized latents in audio generation, and constructs rate-distortion curves for audio VAEs.

Comments: Accepted at ICASSP 2026 (Barcelona, Spain, 3-8 May 2026). 5 pages, 1 figure, 3 tables

Details

DOI: 10.1109/ICASSP55912.2026.11460662
Journal ref: Proc. ICASSP 2026

Abstract

Latent diffusion models have emerged as the dominant paradigm for many generation tasks including audio generation such as text-to-audio, text-to-music and text-to-speech. A key component of latent diffusion is an autoencoder (VAE) that compresses high-dimensional signals into a low frame rate continuous representation that is conducive for downstream prediction. Regularizing these VAEs is challenging, as there is a trade-off between over-regularized (poor output quality) and under-regularized (difficult to predict) latent representations. We propose a framework for studying this trade-off through compression and train Audio VAEs at specific bitrates via target-KL regularization. This allows direct comparison to well-studied discrete neural audio codec models, and the construction of rate-distortion curves for audio VAEs. We evaluate the impact of target-KL regularization on text-to-sound generation and find that sweeping compression rates is helpful in identifying the optimal generation setting.
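The abstract does not give the exact form of the target-KL term. A minimal sketch, assuming a diagonal Gaussian posterior with a standard normal prior and a regularizer that penalizes the absolute deviation of the KL from a chosen target; the helper names, the absolute-value penalty, and the 25 frames-per-second figure are illustrative assumptions, not details from the paper.

```python
import math

def gaussian_kl_nats(mu: list[float], logvar: list[float]) -> float:
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) in nats for one latent frame."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))

def bitrate_bps(kl_nats_per_frame: float, frames_per_second: float) -> float:
    """Convert a per-frame KL in nats to an information rate in bits per second."""
    return kl_nats_per_frame / math.log(2.0) * frames_per_second

def target_kl_penalty(kl_nats: float, target_nats: float, weight: float = 1.0) -> float:
    """Assumed regularizer: penalize deviation from the target, not the KL itself."""
    return weight * abs(kl_nats - target_nats)

kl = gaussian_kl_nats(mu=[0.5, -0.5], logvar=[0.0, 0.0])
rate = bitrate_bps(kl, frames_per_second=25.0)
penalty = target_kl_penalty(kl, target_nats=1.0)
```

Reading the KL as a rate is what allows a continuous VAE to be placed on the same bitrate axis as a discrete neural audio codec, which is the comparison the abstract refers to.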

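2605.17084 2026-05-19 cs.LG cs.CL (updated version)

Scale Determines Whether Language Models Organize Representation Geometry for Prediction

Weilun Xu

Affiliations: School of Computer and Communication Sciences; École Polytechnique Fédérale de Lausanne

AI summary: This study asks whether representation geometry in language models is organized for prediction. Using the Subspace PGA metric, it finds that the degree of organization depends on scale: small models progressively lose it at late layers during training, while large models preserve it throughout.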

Details

Abstract

In language models, what a representation encodes is determined by the geometry of its representation space: distances, not activations, carry meaning. Existing tools characterize the shape of this geometry but do not ask what that shape is organized for. We introduce Subspace PGA, a metric that tests whether a layer's distance structure aligns with the readout subspace of the unembedding matrix $W_U$ more than with random subspaces of equal size. Across seven Pythia models (70M--6.9B) and three cross-family models, intermediate geometry is significantly organized for prediction (peak $z = 9$--$24$), but the degree is scale-dependent: small models ($d \leq 1024$) progressively lose it at late layers during training -- even as loss keeps improving -- while large models ($d \geq 2048$) preserve it throughout. We trace this to a capacity trade-off: a few dominant directions migrate away from $W_U$'s readout, masking rather than destroying the predictive structure beneath, and removing them restores alignment. Neither spectral metrics nor loss curves capture this distinction. Scale thus determines not only how well a model predicts, but how its representation geometry is organized to do so.

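2605.17058 2026-05-19 cs.LG (updated version)

Learning Multi-Timescale Abstractions for Hierarchical Combinatorial Planning

Vivienne Huiling Wang, Tinghuai Wang, Joni Pajarinen

Affiliations: Department of Electrical Engineering and Automation, Aalto University, Finland

AI summary: This paper proposes a model-based hierarchical framework for sequential stochastic combinatorial decision-making. A multi-timescale objective structures the latent dynamics to enable efficient lookahead planning, and a subgoal-conditioned budget policy is learned jointly with the world model to support context-aware resource allocation.

Comments: 34 pages, 8 figures, 23 tables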

Details

Abstract

The combination of exponentially large action spaces, stochastic dynamics, and long-horizon decision-making under limited resources makes Sequential Stochastic Combinatorial Optimization (SSCO) particularly challenging for reinforcement learning. Hierarchical Reinforcement Learning (HRL) offers a natural decomposition, but it places the high-level policy in a Semi-Markov Decision Process (SMDP) where actions have variable durations, making it difficult to learn a world model that is suitable for planning. We introduce a model-based hierarchical framework for sequential stochastic combinatorial decision-making that directly addresses this issue. Our method combines a latent-space tree-search planner with an SMDP-aware world model for variable-duration decisions. A multi-timescale objective structures the latent dynamics so that transition magnitudes reflect the effective temporal scales of abstract actions, enabling efficient lookahead under adaptive temporal abstraction. We further learn a subgoal-conditioned budget policy jointly with the world model to support context-aware resource allocation. Across challenging SSCO benchmarks, our method outperforms strong baselines.

2605.17045 2026-05-19 eess.SY cs.LG cs.SY (updated version)

Empirical evaluation of Time Series Foundation Models for Day-ahead and Imbalance Electricity Price Forecasting in Belgium

Chi Bui, Maria Margarida Mascarenhas, Arnaud Verstraeten, Hussain Kazmi

Affiliations: ELECTA-ESAT (KU Leuven); Gridual Leuven, Belgium; Leuven, Belgium; KU Leuven

AI summary: This paper evaluates time series foundation models for forecasting Belgian day-ahead and imbalance electricity prices. Chronos-2 in ARX mode performs best: its MAE is 5% lower than the best ensemble of other machine learning methods in the day-ahead market but 10% higher for imbalance prices at all horizons except two hours ahead. The models show genuine zero-shot forecasting ability but still struggle under extreme market conditions.

Details

Abstract

Recent advances in Time Series Foundation Models (TSFMs) promise zero-shot forecasting capabilities with minimal task-specific training. While these models have shown strong performance across generic benchmarks, their applicability in volatile, complex electricity markets remains underexplored. Addressing this gap, this study provides a systematic empirical evaluation of several TSFMs, specifically Chronos-2 and Chronos-Bolt (developed by Amazon), and TimesFM 2.5 (provided by Google), for forecasting Belgian day-ahead and imbalance electricity prices. For both considered markets, Chronos-2 in ARX mode produces the most accurate forecasts. Compared with the best ensemble prediction from other machine learning methods, Chronos-2's Mean Absolute Error (MAE) is 5% lower for the day-ahead market. In contrast, the model yields 10% higher MAE predicting imbalance prices across all forecast horizons, except for the two-hour-ahead horizon. Moreover, we find that TSFMs exhibit genuine zero-shot forecasting skills but still struggle under extreme market conditions.

2605.17039 2026-05-19 cs.LG cs.CE (updated version)

Privacy-Preserving Generation Fraud Detection for Distributed Photovoltaic Systems: A Solar Irradiance-Fused Federated Learning Framework

Xiaolu Chen, Chenghao Huang, Yanru Zhang, Hao Wang

Affiliations: School of Computer Science and Technology, University of Electronic Science and Technology of China; Department of Data Science and AI, Faculty of IT, Monash University; Monash Energy Institute, Monash University; Shenzhen Institute for Advanced Study of UESTC

AI summary: This paper proposes a privacy-preserving, federated-learning-based framework for generation fraud detection in distributed PV systems. It fuses PV generation and weather data through a co-attention mechanism to detect key discrepancies, addressing the intermittency and uncertainty of PV generation, and is validated on a real-world dataset.

Comments: 15 pages

Details

DOI: 10.1109/TSG.2026.3692585
Journal ref: IEEE Transactions on Smart Grid, 2026

Abstract

The wide adoption of residential photovoltaic (PV) systems introduces new challenges for generation fraud detection (FD). Unlike traditional electricity theft detection, which focuses on electricity consumption-side behavior, PV generation fraud detection (PVG-FD) is complicated by the inherent intermittency and uncertainty of PV generation. The distributed nature of PV systems poses further challenges for centralized PVG-FD approaches due to scalability and privacy concerns. This paper develops a privacy-preserving distributed PVG-FD framework based on federated learning (FL). In this framework, a utility company manages multiple household communities, each of which is equipped with a local detector. The framework integrates a novel detection model architecture with privacy-preserving global collaboration. Each community's local model fuses PV generation and weather data via a co-attention mechanism to detect discrepancies critical for PVG-FD. The FL framework enables cross-community collaboration by aggregating model parameters and prototypes, leveraging global knowledge sharing with local refinement while preserving privacy. It also uses prototype alignment to address class imbalance by enhancing fraud sample representation. Extensive experiments on a real-world residential PV dataset validate the effectiveness of the developed method and demonstrate that it outperforms state-of-the-art FL methods across various scenarios. The results also show its scalability across varying community sizes and strong robustness to class imbalance.

2605.17037 2026-05-19 cs.LG cs.AI cs.CL 版本更新

D$^2$Evo: Dual Difficulty-Aware Self-Evolution for Data-Efficient Reinforcement Learning

D$^2$Evo: 双重难度感知的自进化方法用于数据高效的强化学习

Ru Zhang, Renda Li, Ziyu Ma, Weijie Qiu, Chongyang Tao, Yong Wang, Xiangxiang Chu

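Affiliations: Zhejiang University; AMAP, Alibaba Group

AI summary: This paper proposes D$^2$Evo, a dual difficulty-aware self-evolution mechanism that addresses effective data scarcity and dynamic difficulty shifts in reinforcement learning, outperforming existing methods on mathematical reasoning benchmarks with fewer than 2K real mathematical samples.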

Comments Accepted by ICML 2026. First two authors contributed equally

AI中文摘要

强化学习（RL）在增强大型语言模型（LLMs）推理能力方面展现出潜力。然而，需要中等难度训练样本的有效RL训练面临两个根本性挑战：有效数据稀缺和动态难度变化，其中中等难度样本稀缺且随着模型提升变得简单。现有方法在一定程度上缓解了这种稀缺性，通过生成训练样本。然而，这些方法存在无锚点生成、忽略共进化和难度不匹配的问题。为了解决这些问题，我们提出了D$^2$Evo，一种双重难度感知的自进化RL框架。在每次迭代中，我们的方法基于当前求解器的能力挖掘中等难度锚点，训练提问者生成不同难度层级的多样化问题，并共同优化两个组件以实现渐进式的推理提升。广泛实验表明，D$^2$Evo在数学推理基准上以少于2K真实数学样本优于现有方法，并在通用推理基准上表现出强大的泛化能力。

英文摘要

Reinforcement learning (RL) has demonstrated potential for enhancing reasoning in large language models (LLMs). However, effective RL training, which requires medium-difficulty training samples, faces two fundamental challenges: Effective Data Scarcity and Dynamic Difficulty Shifts, where medium-difficulty samples are scarce and become trivial as models improve. Existing methods mitigate this scarcity to some extent by generating training samples. However, these approaches suffer from anchor-free generation, ignoring co-evolution, and difficulty mismatch. To address these issues, we propose D$^2$Evo, a Dual Difficulty-aware self-Evolution RL framework. In each iteration, our method mines medium-difficulty anchors based on the current Solver's capability, trains the Questioner to generate diverse questions at appropriate difficulty levels, and jointly optimizes both components to enable progressive reasoning gains. Extensive experiments demonstrate that D$^2$Evo outperforms existing methods on mathematical reasoning benchmarks with fewer than 2K real mathematical samples, and exhibits strong generalization on general reasoning benchmarks.

2605.17026 2026-05-19 cs.LG 版本更新

Why Do Reasoning Models Lose Coverage? The Role of Data and Forks in the Road

为什么推理模型会失去覆盖能力？数据和道路中的分支在其中的作用

Ngoc-Hieu Nguyen, Parshin Shojaee, Phuc Minh Nguyen, Nan Zhang, Chandan K Reddy, Khoa D Doan, Rui Zhang

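Affiliations: The Pennsylvania State University; Virginia Tech; VinUniversity

AI summary: This paper studies why reasoning models lose coverage, finding that the prevalence of decision points in the training data is a key driver of coverage shrinkage, and proposes mitigating it through data synthesis and improved decoding mechanisms.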

Comments 22 pages, 13 figures

AI中文摘要

近年来，大语言模型的进展催生了推理模型，这些模型通过专门的微调过程在复杂任务上表现出色。尽管这些方法能可靠地提高pass@1准确率，但先前的研究发现它们表现出覆盖缩小行为，即pass@k相对于基模型会退化。在本文中，我们调查了基于SFT的后训练过程中推理缩小现象的出现原因。我们假设这种行为是由微调数据的特性驱动的，特别是与决策点或

英文摘要

Recent progress in large language models has led to the emergence of reasoning models, which have shown strong performance on complex tasks through specialized fine-tuning procedures. While these methods reliably improve pass@1 accuracy, prior works have observed that they show a coverage shrinkage behavior, where pass@k degrades relative to the base model. In this paper, we investigate why this reasoning shrinkage arises under SFT-based post-training. We hypothesize that this behavior is driven by properties of the fine-tuning data, specifically related to decision points or "forks in the road" scenarios where the model faces indecipherable patterns with multiple valid reasoning paths. To test this hypothesis, we design controlled case studies that simulate such decision-point settings, spanning indecipherable nodes in graph branching, and reasoning modes. By tracking post-training dynamics in these settings, we find that the shrinkage phenomenon is tightly correlated with the prevalence of decision-point scenarios in the training data. We also demonstrate that this shrinkage behavior can be partially mitigated through targeted data synthesis design of decision-points, and a more systematic diversity-encouraging decoding mechanism. Our findings identify data-centric factors as a key driver of shrinkage in reasoning models and highlight diversity-aware designs as an effective lever for controlling it.
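
The coverage notion in this abstract is measured with pass@k. As a minimal sketch, assuming the commonly used unbiased estimator $1 - \binom{n-c}{k}/\binom{n}{k}$ (computed from n samples of which c are correct; the abstract does not say which estimator the authors use), the example below shows how a model can gain pass@1 while losing pass@k. The per-problem counts are made up for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples with c correct ones."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # Probability that a random size-k subset contains no correct sample.
    # math.comb returns 0 when k > n - c, so the estimate is then 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts, n, k):
    """Average pass@k over problems, given the correct count per problem."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Hypothetical counts of correct samples (out of n = 10) on four problems.
base_counts = [3, 3, 3, 3]      # base model: occasionally right on every problem
tuned_counts = [10, 10, 0, 0]   # tuned model: always right on half, never on the rest

base_p1 = mean_pass_at_k(base_counts, n=10, k=1)
tuned_p1 = mean_pass_at_k(tuned_counts, n=10, k=1)
base_p8 = mean_pass_at_k(base_counts, n=10, k=8)
tuned_p8 = mean_pass_at_k(tuned_counts, n=10, k=8)
```

In this toy setting the tuned model has the higher pass@1 but the lower pass@8, which is the shrinkage pattern the abstract describes.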

2605.17017 2026-05-19 cs.LG cs.AI 版本更新

When Dynamics Shift, Robust Task Inference Wins: Offline Imitation Learning with Behavior Foundation Models Revisited

当动态变化时，鲁棒任务推断胜出：重新审视具有行为基础模型的离线模仿学习

Rishabh Agrawal, Rahul Jain, Ashutosh Nayyar

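Affiliations: University of Southern California

AI summary: This paper proposes a framework based on behavior foundation models (BFMs) that casts task inference as a robust min-max optimization problem, allowing adaptation to worst-case dynamics perturbations without modifying pretraining. The method significantly outperforms standard BFMs and robust offline imitation learning baselines under dynamics shift.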

2605.17011 2026-05-19 cs.GR cs.CV cs.LG 版本更新

Topo-GS: Continuous Volumetric Embedding of High-Dimensional Data via Topological Gaussian Splatting

Topo-GS: 通过拓扑高斯散射实现高维数据的连续体积分嵌入

João Paulo Gois, Luis Gustavo Nonato

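Affiliations: Universidade Federal do ABC (UFABC)

AI summary: This paper proposes Topo-GS, which uses a topology-aware strategy to convert high-dimensional data into a continuous volumetric representation, optimizing under local geometric constraints to preserve local topological fidelity while making projection distortions explicit.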

Comments 7 pages, 2 figures

AI中文摘要

降维算法将高维数据映射到可可视化的2D或3D空间，但传统上依赖于离散点云范式。这种离散抽象容易受到视觉遮挡和人工不连续性的影响，往往无法表示底层流形的连续密度。为了解决这些限制，我们引入Topo-GS，一个框架，重新利用3D高斯散射（3DGS）将多维投影作为无网格体积分重建过程。与标准光度损失不同，Topo-GS由局部几何约束驱动。通过解决正交Procrustes目标，优化强制了As-Rigid-As-Possible先验，同时显式对齐每个高斯的空间协方差到局部切空间。认识到解卷不同内在维数的数据需要不同的空间处理，我们利用拓扑感知策略，将损失公式定制以保持连续1D轨迹或连贯2D表面。定量和视觉评估表明，Topo-GS成功地将离散散点图转换为连续体积分表示，其中固有的投影扭曲显式表现为可观察的几何变化，同时保持与离散基线相当的局部拓扑保真度。

英文摘要

Dimensionality reduction algorithms map high-dimensional data into visualizable 2D or 3D spaces, but traditionally rely on a discrete point-cloud paradigm. This discrete abstraction is susceptible to visual occlusion and artificial discontinuities, often failing to represent the continuous density of the underlying manifold. To address these limitations, we introduce Topo-GS, a framework that repurposes 3D Gaussian Splatting (3DGS) to cast multidimensional projection as a meshless volumetric reconstruction process. Instead of standard photometric losses, Topo-GS is driven by local geometric constraints. By solving orthogonal Procrustes targets, the optimization enforces an As-Rigid-As-Possible prior while explicitly aligning the spatial covariance of each Gaussian to the local tangent space. Recognizing that unrolling data of varying intrinsic dimensionalities requires distinct spatial treatments, we utilize a topology-aware strategy that tailors the loss formulation to preserve either continuous 1D trajectories or cohesive 2D surfaces. Quantitative and visual evaluations demonstrate that Topo-GS successfully transforms discrete scatter plots into continuous volumetric representations, where inherent projection distortions explicitly manifest as observable geometric variations, while preserving local topological fidelity comparable to discrete baselines.

2605.17000 2026-05-19 cs.LG cs.AI 版本更新

BoLT: A Benchmark to Democratize Black-box Optimization Research for Expensive LLM Tasks

BoLT：一个民主化黑盒优化研究的基准，用于昂贵的LLM任务

Ruth Wan Theng Chew, Zhiliang Chen, Apivich Hemachandra, Bryan Kian Hsiang Low

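Affiliations: National University of Singapore

AI summary: This paper introduces the BoLT benchmark, which provides realistic LLM optimization problems to support research on and evaluation of black-box optimization methods for expensive large language model tasks.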

AI中文摘要

优化大型语言模型（LLM）的训练和推理配置，如超参数、数据混合和提示，对性能至关重要，但在实践中往往采用启发式方法，导致可能的次优结果。通过将它们视为噪声、昂贵且无导数的优化问题，贝叶斯优化（BO）和其他黑盒优化（BBO）方法提供了一个有前途但尚未充分探索的方向，用于原则性、样本效率高的方法。然而，LLM训练和推理成本对大多数BBO研究社区来说过高，新方法往往仅在合成测试函数和小规模数据集上进行评估，这些数据集无法捕捉现代LLM优化问题的挑战。这阻碍了BBO方法的发展，并使评估这些方法在现代LLM任务上的有效性变得困难。我们介绍了BoLT，这是首个以LLM为中心的基准，旨在民主化LLM研究，服务于BBO社区。BoLT在https://github.com/chewwt/bolt上发布。BoLT涵盖了广泛且有动机的LLM优化问题，包括多保真度、多目标、异方差噪声和高维搜索空间。BoLT中的每个问题都基于真实的实验数据，并通过轻量级的替代模型，基于成千上万的真实LLM实验结果，使其完全可重复和可访问。我们对BoLT进行了广泛的BO和BBO方法的评估，显示选定的BO方法在各种任务上持续优于其他方法，突显了现有BBO方法在LLM任务上的不足，强调了为BBO社区现代化基准的必要性。

英文摘要

Optimization of LLM training and inference configurations, such as hyperparameters, data mixtures, and prompts, is critical to performance, but it is often approached heuristically in practice, leading to potentially suboptimal outcomes. By framing them as noisy, expensive, and derivative-free optimization problems, Bayesian optimization (BO) and other black-box optimization (BBO) methods offer a promising yet underexplored direction for principled, sample-efficient methods. However, LLM training and inference costs are prohibitively high for most of the BBO research community, and new methods are often only evaluated on synthetic test functions and small-scale datasets that fail to capture the challenges of modern LLM optimization problems. This impedes the development of BBO methods and makes it difficult to assess their effectiveness on modern LLM tasks. We introduce BoLT, the first LLM-centric benchmark that democratizes LLM research for the BBO community. BoLT is released at https://github.com/chewwt/bolt. BoLT covers broad and well-motivated LLM optimization problems, involving multi-fidelity, multi-objective, heteroscedastic noise, and high-dimensional search spaces. Each problem in BoLT is grounded in real experimental data and made fully reproducible and accessible through lightweight surrogate models fitted to the results of thousands of real LLM experiments. We benchmark BoLT against an extensive range of BO and BBO methods, showing that selected BO methods consistently outperform others across tasks and highlighting gaps in existing BBO methods on LLM tasks, underscoring the need to modernize benchmarks for the BBO community.

2605.16999 2026-05-19 cs.LG 版本更新

Ranking-Aware Calibration for Reliable Multimodal Reinforcement Learning

基于排名的校准：可靠多模态强化学习

Peng Cui, Boyao Yang, Jun Zhu

Affiliations: Dept. of Comp. Sci. & Tech., Institute for AI, BNRist Center, Tsinghua-Bosch Joint ML Center, THBI Lab, Tsinghua University, Beijing; Dept. of Automation, Tsinghua University, Beijing

AI summary: This paper proposes Ranking-Aware Calibration, which uses the comparison signals that group-based reinforcement learning naturally produces to improve calibration in multimodal reinforcement learning, improving both task accuracy and calibration.

AI中文摘要

强化学习后训练显著提高了视觉-语言模型的推理准确性，但由此产生的策略仍然校准不足。终端正确性奖励无法提供梯度来惩罚置信度更高的错误比不确定的错误更严厉，也无法提供将置信度与视觉证据质量联系起来的信号，这一差距在损坏或模糊输入下尤为严重，此时模型仍会报告高置信度的错误答案。我们引入Ranking-Aware Calibration (RAC)，一种训练时框架，利用组内强化学习已经产生的两种比较信号来监督置信度。排名感知组损失强制在同一提示下，更优的回放获得更高的置信度。清洁-损坏成对损失强制置信度随着视觉证据的退化而减弱。由于排名信号迫使策略区分正确和错误的推理路径，它还超越了仅靠正确性奖励所能达到的任务准确性。这两种损失都不需要外部置信度注释，并自然地与组内强化学习后训练整合。我们将在Qwen2.5-VL和InternVL-3.5基础上实例化RAC，并在六个多模态推理基准测试中评估清洁和损坏输入下的表现。实验证明，排名感知损失通过教政策区分更好和更差的推理路径显著提高了任务准确性，而成对损坏损失在退化输入下减少了校准误差。它们的结合在所有测试的backbone上实现了最佳的校准，同时在大多数设置中提高了准确性。

英文摘要

Reinforcement learning post-training has substantially improved the reasoning accuracy of vision-language models, yet the resulting policies remain poorly calibrated. Terminal correctness rewards provide no gradient that penalizes confident errors more than uncertain ones and no signal that ties confidence to the quality of visual evidence, a gap that becomes especially severe under corrupted or ambiguous inputs where models continue to report high confidence on incorrect answers. We introduce Ranking-Aware Calibration (RAC), a training-time framework that supervises confidence using two comparison signals that group-based RL already produces at no additional labeling cost. The ranking-aware group loss enforces that a better rollout receives higher confidence than a worse one within the same prompt. The clean--corrupted pairwise loss enforces that confidence attenuates as visual evidence degrades. Because the ranking signal forces the policy to distinguish between correct and incorrect reasoning paths, it also reinforces task accuracy beyond what correctness rewards alone produce. Both losses require no external confidence annotations and integrate naturally with group-based RL post-training. We instantiate RAC on Qwen2.5-VL and InternVL-3.5 backbones and evaluate on six multimodal reasoning benchmarks under clean and corrupted inputs. Empirical results show that the ranking-aware loss substantially improves task accuracy by teaching the policy to discriminate between better and worse reasoning, while the pairwise corruption loss reduces calibration error under degraded inputs. Their combination achieves the best calibration across all tested backbones while improving accuracy in the majority of settings.
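
The ranking-aware group loss is described only in words: within one prompt, a better rollout should receive higher confidence than a worse one. One way to picture that constraint is a pairwise hinge penalty, sketched below. The hinge form, the `margin` value, and the function name are assumptions made for illustration; the abstract does not give the authors' actual loss.

```python
def pairwise_ranking_loss(rewards, confidences, margin=0.1):
    """Mean hinge penalty over pairs where the better rollout is not more confident."""
    penalties = []
    for i, (r_i, c_i) in enumerate(zip(rewards, confidences)):
        for r_j, c_j in zip(rewards[i + 1:], confidences[i + 1:]):
            if r_i == r_j:
                continue  # equally good rollouts impose no ordering
            better, worse = (c_i, c_j) if r_i > r_j else (c_j, c_i)
            # Penalize unless the better rollout leads by at least `margin`.
            penalties.append(max(0.0, margin - (better - worse)))
    return sum(penalties) / len(penalties) if penalties else 0.0


# One prompt, four hypothetical rollouts: reward 1 = correct, 0 = incorrect.
rewards = [1, 0, 1, 0]
well_ordered = [0.9, 0.3, 0.8, 0.2]     # correct rollouts are more confident
overconfident = [0.6, 0.95, 0.7, 0.9]   # wrong rollouts are more confident

loss_good = pairwise_ranking_loss(rewards, well_ordered)
loss_bad = pairwise_ranking_loss(rewards, overconfident)
```

Because the penalty depends only on rewards and reported confidences within a group, it needs no external confidence labels, which matches the property the abstract emphasizes.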

2605.16998 2026-05-19 quant-ph cs.LG 版本更新

$\mathcal{O}(n)$ alternative to Quantum Fourier Transform with efficient neural net classical post-processing

$\mathcal{O}(n)$ 量子傅里叶变换的替代方案：具有高效的神经网络经典后处理

Kaiming Bian, Zujin Wen, Oscar Dahlsten

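Affiliations: Shenzhen Institute for Quantum Science and Engineering; Southern University of Science and Technology; International Quantum Academy; City University of Hong Kong

AI summary: This paper proposes an $\mathcal{O}(n)$ alternative to the quantum Fourier transform. HP-$L$ circuits built from Hadamard and controlled-phase gates preserve shift invariance, the discrete Fisher information quantifies the information they retain, and an efficient neural network performs the classical post-processing, so that the circuit can replace the conventional $\mathcal{O}(n^2)$ QFT in Shor's algorithm.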

AI中文摘要

量子傅里叶变换（QFT）是隐子群问题（HSP）算法所必需的，包括用于因数分解的Shor算法。QFT的电路深度对于近期硬件仍然具有挑战性。为了寻找更浅的替代方案，我们识别出QFT用于实现HSP的两个特性。首先，QFT的移位不变性允许移除随机的整体移位。其次，QFT保留了关于隐藏子群生成器的信息，该信息可通过测量结果访问。我们通过离散Fisher信息量化了该信息。我们构造了一组浅层电路，使用Hadamard和受控相位门，称为HP-$L$电路，证明这些电路保留了移位不变性。数值分析显示这些电路保留了指数增长的Fisher信息。$\mathcal{O}(n)$的HP-$1$电路可以替代传统的$\mathcal{O}(n^2)$ QFT在Shor算法中，如数值所示，通过高效的神经网络实现经典后处理。

英文摘要

The Quantum Fourier Transform (QFT) is required by hidden subgroup problem (HSP) algorithms, including Shor's algorithm for factoring. The circuit depth of the QFT remains challenging for near-term hardware. To find shallower alternatives we identify two properties that are exploited by the QFT to enable HSP. Firstly, the shift invariance of the QFT allows for the removal of a random overall shift. Secondly, the QFT retains information about the hidden subgroup generator accessible in the measurement outcomes. We quantify that information via the discrete Fisher information. We construct a family of shallow circuits using Hadamards and controlled-Phase gates, HP-$L$ circuits, that we prove preserve shift invariance. Numerical analysis shows these circuits retain exponentially growing Fisher information. The $\mathcal{O}(n)$ HP-$1$ can replace the $\mathcal{O}(n^2)$ QFT in Shor's algorithm, as demonstrated numerically, with an efficient neural network implementing classical post-processing.

2605.16993 2026-05-19 cs.CY cs.AI cs.LG 版本更新

Adversarial Fragility and Language Vulnerability in Clinical AI: A Systematic Audit of Diagnostic Collapse Under Imperceptible Perturbations and Cross-Lingual Drift in Low-Resource Healthcare Settings

临床AI中的对抗脆弱性与语言脆弱性：在低资源医疗环境中对诊断崩溃的系统审计及不可察觉扰动和跨语言漂移的影响

Anthonio Oladimeji Gabriel, Ahmad Rufai Yusuf

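Affiliations: Centre for Clinical Intelligence & Safety; Tomorrow University of Applied Sciences

AI summary: This paper systematically audits diagnostic collapse in clinical AI under imperceptible perturbations and cross-lingual drift, showing how adversarial fragility and language vulnerability affect clinical AI systems in low-resource healthcare settings.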

Comments 23 pages, 9 figures, 3 tables. Code and data available at https://github.com/anthoniooladimeji11-coder/clinical-ai-safety-audit

AI中文摘要

当前的临床人工智能（AI）系统几乎只在干净、标准化的英语输入条件下进行评估，这些条件无法反映低资源环境中的医疗实践现实。本研究首次系统地对临床AI的两种正交安全漏洞进行了双重审计：对抗图像脆弱性和跨语言诊断漂移。使用DenseNet121，这是CheXNet架构的基础，经过在COVID-QU-Ex胸部X光数据集（85,318张图像；COVID-19、非COVID肺炎、正常）上微调，我们证明在Fast Gradient Method（FGM）扰动下，epsilon=0.021时，诊断准确率从89.3%下降到62.0%，这种幅度对人眼来说是不可察觉的。标准防御策略，包括高斯平滑和投票集成，未能恢复临床安全。在平行的语言脆弱性实验中，我们测试了Llama3.1:8b和NatLAS（N-ATLAS）在Standard English、Nigerian Pidgin（Naija）和Yoruba-inflected English中的20例COVID-19临床病例。两种模型均表现出显著的准确性下降：Llama3.1:8b在Pidgin上从80.0%下降到65.0%；NatLAS，一个非洲语境模型，从85.0%下降到55.0%，诊断一致性下降到50%。这些发现为尼日利亚初级卫生中心（PHC）部署中代表性的临床AI系统建立了定量失败范围，并促使对对抗性强、语言包容的临床AI架构的紧急呼吁。

英文摘要

Current clinical artificial intelligence (AI) systems are evaluated almost exclusively on clean, standardised, English-language inputs, conditions that do not reflect the realities of healthcare delivery in low-resource settings. This study presents the first systematic dual audit of two orthogonal safety vulnerabilities in clinical AI: adversarial image fragility and cross-lingual diagnostic drift. Using DenseNet121, the architecture underlying CheXNet, fine-tuned on the COVID-QU-Ex chest X-ray dataset (85,318 images; COVID-19, Non-COVID Pneumonia, Normal), we demonstrate that diagnostic accuracy collapses from 89.3% to 62.0% under a Fast Gradient Method (FGM) perturbation of epsilon=0.021, a magnitude imperceptible to the human eye. Standard defensive strategies including Gaussian smoothing and ensemble voting failed to restore clinical safety. In a parallel language fragility experiment, we tested Llama3.1:8b and NatLAS (N-ATLAS) on 20 COVID-19 clinical cases presented in Standard English, Nigerian Pidgin (Naija), and Yoruba-inflected English. Both models exhibited significant accuracy degradation: Llama3.1:8b dropped from 80.0% to 65.0% on Pidgin; NatLAS, an African-context model, collapsed from 85.0% to 55.0%, with diagnosis consistency falling to 50%. These findings establish a quantitative failure envelope for clinical AI under conditions representative of Primary Health Centre (PHC) deployment in Nigeria, and motivate urgent calls for adversarially hardened, linguistically inclusive clinical AI architectures.
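
To make the scale of the perturbation concrete, here is a minimal sketch of a one-step fast gradient perturbation on a toy logistic classifier. It assumes the L-infinity (sign-of-gradient) variant; the abstract does not state which norm was used, and the weights and four-pixel "image" are hypothetical stand-ins for DenseNet121 and a chest X-ray.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def loss_and_input_grad(w, b, x, y):
    """Binary cross-entropy of a logistic model and its gradient w.r.t. the input x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    grad = [(p - y) * wi for wi in w]  # d loss / d x_i
    return loss, grad


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def fgm_linf(w, b, x, y, eps):
    """One-step L-infinity fast gradient perturbation, clipped to the [0, 1] pixel range."""
    _, grad = loss_and_input_grad(w, b, x, y)
    return [min(1.0, max(0.0, xi + eps * sign(g))) for xi, g in zip(x, grad)]


# Hypothetical toy "image" with four pixels and a fixed linear classifier.
w, b = [2.0, -3.0, 1.5, -1.0], 0.1
x, y = [0.6, 0.2, 0.7, 0.4], 1
eps = 0.021  # perturbation size quoted in the abstract

x_adv = fgm_linf(w, b, x, y, eps)
clean_loss, _ = loss_and_input_grad(w, b, x, y)
adv_loss, _ = loss_and_input_grad(w, b, x_adv, y)
```

Every pixel moves by at most `eps`, yet each one moves in the direction that increases the loss, which is why a visually negligible change can still shift predictions.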

2605.16989 2026-05-19 cs.LG 版本更新

Decision-Aware Proximal Bridge Learning for Optimal Treatment Selection

面向决策的近端桥学习用于最优治疗选择

Tomàs Garriga, Alejandro Almodóvar, Axel Brando, Gerard Sanz, Eduard Serrahima de Cambra, Juan Parras

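Affiliations: Novartis; Barcelona Supercomputing Center; Universidad Politécnica de Madrid

AI summary: This paper proposes a decision-aware proximal bridge learning method that emphasizes decision-relevant treatment regions while retaining global stabilization, addressing the limited attention to treatment selection and optimal decision-making in proximal causal inference.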

AI中文摘要

在需要连续动作的个性化治疗选择中，必须在决策相关区域中准确估计因果响应，而不是在整个动作空间中均匀估计。因此，估计全局因果响应面并选择最大化它的治疗可能不最优，因为标准估计目标根据观察到的治疗分布分配建模努力，而不是决定最优决策的区域。虽然在无偏设定中已经研究了面向决策的方法，但在近端因果推断中，这一问题仍处于探索阶段，其中代理变量和桥函数在存在隐藏混杂的情况下能够通过合适假设进行识别。尽管有最近的进展，近端方法主要集中在治疗效应和潜在结果估计，而不是治疗选择和最优决策。为弥合这一差距，我们引入了一种面向政策的加权桥损失，强调决策相关治疗区域的同时保留全局稳定性。我们证明了一个后悔界，表明所提出的加权桥损失通过加权不恰当常数控制治疗选择的后悔。我们将在几种近端桥求解器的决策意识变体中实例化该框架，得到交替进行加权桥估计、响应面投影、策略更新和权重细化的实用算法。经验上，我们发现面向决策的加权方法在多个桥求解器中减少了后悔，表明在近端设置中改进了治疗选择。

英文摘要

Individualized treatment selection with continuous actions requires accurate causal response estimation in decision-relevant regions, rather than uniformly over the entire action space. Estimating a global causal response surface and then choosing the treatment that maximizes it can therefore be suboptimal, since standard estimation objectives allocate modeling effort according to the observed treatment distribution rather than the regions that determine the optimal decision. While decision-aware approaches have been studied in unconfounded settings, this problem remains underexplored in proximal causal inference, where proxy variables and bridge functions enable identification under suitable assumptions even in the presence of hidden confounding. Despite recent progress, proximal methods have primarily focused on treatment-effect and potential-outcome estimation rather than treatment selection and optimal decision-making. To bridge this gap, we introduce a policy-targeted weighted bridge loss that emphasizes decision-relevant treatment regions while retaining global stabilization. We prove a regret bound showing that the proposed weighted bridge loss controls treatment-selection regret through a weighted ill-posedness constant. We instantiate the framework in decision-aware variants of several proximal bridge solvers, yielding practical algorithms that alternate between weighted bridge estimation, response-surface projection, policy update, and weight refinement. Empirically, we find that decision-aware weighting reduces regret across several bridge solvers, suggesting improved treatment selection in proximal settings.

2605.16975 2026-05-19 cs.LG cs.AI 版本更新

Extending Pretrained 10-Second ECG Foundation Models to Longer Horizons

扩展预训练的10秒ECG基础模型以适应更长的时域

Wei Tang, Jinpei Han, Kangning Cui, Mattia Carletti, Fredrik K. Gustafsson, Shreyank N Gowda, Patitapaban Palo, Anshul Thakur, Lei Clifton, Jean-michel Morel, Raymond H. Chan, David A. Clifton, Xiao Gu

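Affiliations: City University of Hong Kong; Imperial College London; Wake Forest University; University of Nottingham; Lingnan University; University of Oxford

AI summary: This paper proposes a parameter-efficient framework that extends pretrained 10-second ECG foundation models to longer, variable-length ECG signals without retraining the backbone, addressing structural incompatibility and semantic challenges. Experiments show that it outperforms sliding-window and pooling baselines on multiple long-horizon ECG tasks.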

AI中文摘要

预训练在典型诊断10秒ECG片段上的ECG基础模型已在多种临床应用中展示了强大的迁移能力。然而，许多实际应用产生的记录通常更长，且在推理过程中持续时间各异。这些10秒模型缺乏整合时间信息的内置方法。将其扩展到更长的时域引入了两个挑战：由于输入长度差异导致的结构不兼容性，以及限制有意义时间聚合的语义挑战。我们提出了一种参数高效的框架，通过冻结预训练的10秒模型，引入一个轻量级插件模块，以两种互补的方式扩展模型：(i) 结构兼容的长序列处理，(ii) 语义指导的时间建模。在多个长时域ECG任务、数据集和基础模型背骨上的实验表明，我们的方法能够从预训练的快照模型中实现稳健的长时域扩展，一致优于滑动窗口和池化基线方法，具有强大的参数效率。

英文摘要

Electrocardiogram (ECG) foundation models pretrained on typical diagnostic 10-second ECG segments have demonstrated strong transferability across a range of clinical applications. However, many real-world applications produce recordings that are typically longer and vary in duration at inference time. These 10-second models have no built-in way to combine information across time. Extending them to longer horizons introduces two challenges: structural incompatibilities arising from input-length disparities, and semantic challenges that limit meaningful temporal aggregation. We propose a parameter-efficient framework that extends pretrained ECG foundation models to longer and variable-length ECGs without retraining the backbone. Guided by a frozen pretrained 10-second model, we introduce a lightweight plug-in module that extends the model in two complementary ways: (i) structurally compatible long-sequence processing and (ii) semantically informed temporal modeling. Experiments on multiple long-horizon ECG tasks, datasets, and foundation model backbones demonstrate that our method enables robust long-horizon extension from pretrained snapshot models, consistently outperforming sliding-window and pooling-based baselines with strong parameter efficiency.

2605.16973 2026-05-19 cs.CV cs.LG 版本更新

SHED: Style-Homogenized Embedding Alignment for Domain Generalization

SHED: 风格均质化嵌入对齐用于领域泛化

Kai Gan, Tong Wei

Affiliations: School of Computer Science and Engineering, Southeast University, Nanjing 210096, China; Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education, China

AI summary: This paper proposes SHED, which aligns style-homogenized embeddings to address the information asymmetry in domain generalization. Experiments show state-of-the-art performance on multiple benchmarks.

AI中文摘要

领域泛化旨在通过嵌入分布偏移增强模型对未见领域的鲁棒性。尽管像CLIP这样的大规模视觉-语言模型表现出色，但其直接的图像-文本嵌入对齐却受到固有信息不对称的限制：图像编码了类别语义和领域特定的风格，而文本提示主要传达基本的类别线索。这种不对称性阻碍了在现实场景中对新领域的泛化。为此，我们提出了SHED，一种基于CLIP的新方法，通过对齐风格均质化的嵌入而不是CLIP编码器的原始表示。在训练过程中，SHED从图像嵌入（按源领域计算）和文本嵌入（在多样化的提示模板下平均并去除全局质心）中移除领域特定的风格质心。在推理过程中，考虑到目标领域信息的缺乏，SHED将多样化的文本领域质心投影到视觉空间，并通过成员加权聚合预测。在五个基准测试上的广泛实验表明，SHED在多个基准测试中取得了最先进的性能，显著优于先前方法（例如，在DomainNet上比标准微调高出+4.0%）

英文摘要

Domain generalization aims to enhance model robustness against unseen domains with embedding distribution shifts. While large-scale vision-language models like CLIP exhibit strong generalization, their direct image-text embedding alignment suffers from inherent information asymmetry: images encode both class semantics and domain-specific styles, whereas text prompts primarily convey basic class cues. This asymmetry hinders generalization to novel domains in realistic scenarios. To address this, we propose Style-Homogenized Embedding alignment for Domain-generalization (SHED), a novel CLIP-based method that aligns style-homogenized embeddings instead of raw representations from encoders in CLIP. During training, SHED removes domain-specific style centroids from both image embeddings computed per source domain and text embeddings, which are averaged across diverse prompt templates and stripped of a global centroid. For inference, considering the lack of target domain information, SHED projects diverse textual domain centroids into the visual space and aggregates predictions via membership weighting. Extensive experiments on five benchmarks show SHED achieves state-of-the-art performance, outperforming prior methods significantly (e.g., +4.0% on DomainNet vs. standard fine-tuning).
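
The core training-time operation, removing a domain-specific style centroid from image embeddings, can be pictured as a per-domain mean subtraction. The sketch below covers only that one step under simplifying assumptions (plain Python lists, 2-D toy embeddings, a hypothetical function name); it is not SHED's full training or inference procedure.

```python
from collections import defaultdict


def remove_domain_centroids(embeddings, domains):
    """Subtract each domain's mean embedding from the embeddings of that domain."""
    groups = defaultdict(list)
    for vec, dom in zip(embeddings, domains):
        groups[dom].append(vec)
    # Per-domain centroid: coordinate-wise mean over that domain's embeddings.
    centroids = {
        dom: [sum(col) / len(vecs) for col in zip(*vecs)]
        for dom, vecs in groups.items()
    }
    return [
        [v - c for v, c in zip(vec, centroids[dom])]
        for vec, dom in zip(embeddings, domains)
    ]


# Hypothetical 2-D embeddings: the first axis carries class information,
# the second axis carries a domain-specific style offset.
embeddings = [[1.0, 5.0], [3.0, 5.0], [1.0, -5.0], [3.0, -5.0]]
domains = ["sketch", "sketch", "photo", "photo"]
homogenized = remove_domain_centroids(embeddings, domains)
```

After the subtraction, the two domains occupy the same region of the space while the class-related axis still separates the samples, which is the intuition behind aligning homogenized rather than raw embeddings.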

2605.16929 2026-05-19 cs.LG 版本更新

Emulating the Forced Response of Climate Models with Flow Matching

用流匹配模拟气候模型的强迫响应

Graham Clyne, Julia Kaltenborn, Peer Nowack, Claire Monteleoni, Anastase Charantonis

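Affiliations: INRIA; MILA; Karlsruhe Institute of Technology (KIT)

AI summary: This paper uses a deep learning model to emulate the response of climate models to multiple climate forcings, training on several SSP scenarios to generate scenarios unseen during training, and validates the emulator against MESMER-M for land surface temperature.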

AI中文摘要

全球气候模型是模拟过去和潜在未来气候变化路径以及相关气候影响的关键工具。共享社会经济路径（SSPs）描述了全球经济和人口发展的各种未来情景。这些SSPs本质上与气候强迫的变化相关，这些强迫是外部驱动因素，如温室气体和气溶胶排放，从而导致地球能量平衡随时间的变化。这些强迫是气候模型中的基本边界条件，以了解这些变化对气候影响的潜在影响。然而，运行气候模型计算成本极高，与需要大量模拟集以获得更稳健估计的需求相冲突（考虑内部变异性和情景不确定性）。最近的研究表明，可以利用机器学习捕捉气候模型的动力学，当条件于不同气候情景的强迫。我们在此训练了一个深度学习（DL）模型在多个SSP上，并成功生成训练期间未见过的场景。我们的模拟器验证了MESMER-M，一个土地表面温度的统计模拟器。我们的研究展示了生成对多种同时气候强迫（如二氧化碳、甲烷、一氧化二氮、硫酸气溶胶和臭氧）响应的气候变化状态的能力。特别是，我们的消融研究强调需要包括多种不同强迫以用DL模拟器表示长期大气趋势。

英文摘要

Global climate models are essential tools to simulate past and potential future pathways of climate change, as well as associated climate impacts. Shared Socioeconomic Pathways (SSPs) describe a range of future scenarios of global economic and demographic development. These SSPs are intrinsically linked to changes in climate forcings, the external drivers, such as greenhouse gas and aerosol emissions, which in turn lead to the human impact on the energy balance of the Earth over time. These forcings are fundamental boundary conditions in climate models in order to gain insight into the potential climatic impacts of these changes described by each SSP. Running a climate model, however, is extremely computationally expensive, conflicting with the need for large ensembles of simulations for each model to give, e.g., more robust estimates in the presence of internal variability (the inherent, chaotic fluctuations within the climate system) and scenario uncertainty. Recent research has demonstrated the ability to capture climate model dynamics using machine learning when conditioned on forcings from different climatic scenarios. We here train a Deep Learning (DL) model on multiple SSPs and successfully generate scenarios unseen during training. Our emulator is validated against MESMER-M, a statistical emulator of land surface temperature. Our research demonstrates the capacity to generate such changing climate states in response to a variety of simultaneous climate forcings (e.g., carbon dioxide, methane, nitrous oxide, sulphate aerosols, and ozone). In particular, our ablation studies underline a need to include a range of different forcings to represent long-term atmospheric trends with a DL emulator.

2605.16919 2026-05-19 stat.ML cs.LG 版本更新

CAST: Causal Anchored Simplex Transport for Distribution-Valued Time Series

CAST：基于简单集的因果传输用于分布值时间序列

Jiecheng Lu, Jieqi Di, Runhua Wu, Yuwei Zhou

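Affiliations: Georgia Institute of Technology; Indiana University

AI summary: This work proposes CAST, which handles causal forecasting of distribution-valued time series through causal anchored simplex transport, addresses a structural failure mode in distribution forecasting, and performs strongly on multiple benchmarks.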

AI中文摘要

许多面向决策的随机系统是通过聚合分布而非标量轨迹观测的：队列占用、移动份额、公共卫生混合、发电源份额、生态组成和空气质量严重程度剖面都生活在概率简单集上并随时间演变。我们研究这些分布值时间序列的因果（在线）预测，并认为过渡算子本身应围绕简单集进行结构化。我们引入CAST（因果锚定简单集传输），一种 successor-local 操作符，它（i）从因果上下文中检索经验后继，（ii）通过持久锚稳定它们，（iii）在有序支持上应用有界的局部随机传输；每一步都通过构造保持简单集。我们识别出一种结构性失效模式，即潜在的转换核别名，其中相似的观测分布在不同的上下文制度下演变不同，且证明任何仅依赖于别名总结的预测者都会遭受不可约的加权Jensen-Shannon超额风险下界，而CAST假设类包含制度-aware的贝叶斯后继；对于有序支持，当传输后继位于无传输锚壳体外时，额外存在Pinsker分离。在覆盖生态、能源、饮食、死亡率、就业、空气质量、恶劣天气、移动和G/G/1，G_t/G/1队列占用的11个公共和模拟基准上，CAST在一步KL（1.27）和自回归滚动JSD（1.91）上获得最佳平均排名，战胜了广泛的统计、组成、递归、卷积和Transformer基线集，并在所有11个部分中取得前两名的离线KL。组件消融和受控合成别名实验验证了理论。

英文摘要

Many decision-facing stochastic systems are observed through aggregate distributions rather than scalar trajectories: queue occupancies, mobility shares, public-health mixtures, generation-source shares, ecological compositions, and air-quality severity profiles all live on the probability simplex and evolve over time. We study causal (online) forecasting for these distribution-valued time series and argue that the transition operator itself should be structured around the simplex. We introduce CAST (Causal Anchored Simplex Transport), a successor-local operator that (i) retrieves empirical successors from causal context, (ii) stabilizes them with a persistence anchor, and (iii) applies a bounded local stochastic transport on ordered supports; every stage preserves the simplex by construction. We identify a structural failure mode, latent transition-kernel aliasing, where similar observed distributions evolve differently under different contextual regimes, and prove that any forecaster depending only on an aliased summary incurs an irreducible weighted Jensen-Shannon excess-risk lower bound, while the CAST hypothesis class contains the regime-aware Bayes successor; for ordered supports an additional Pinsker separation holds whenever the transported successor lies outside the no-transport anchor hull. On eleven public and simulated benchmarks spanning ecology, energy, diet, mortality, employment, air quality, severe weather, mobility, and G/G/1, G_t/G/1 queue occupancy, CAST attains the best average rank on both one-step KL (1.27) and autoregressive rollout JSD (1.91), winning 8/11 sections on each metric against a broad statistical, compositional, recurrent, convolutional, and Transformer baseline set, and top-2 on all 11 sections for offline KL. Component ablations and a controlled synthetic aliasing experiment corroborate the theory.
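
Two ingredients named in this abstract are easy to illustrate in isolation: the Jensen-Shannon divergence used for evaluation, and the fact that blending a retrieved successor with a persistence anchor keeps the forecast on the probability simplex. The sketch below is a toy illustration only; `anchor_mix` and its fixed `weight` are hypothetical and are not the CAST operator itself.

```python
import math


def kl(p, q):
    """KL divergence KL(p || q) in nats; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    """Jensen-Shannon divergence between two distributions on the same support."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def anchor_mix(retrieved, last_observed, weight):
    """Convex combination of a retrieved successor and a persistence anchor."""
    return [weight * a + (1 - weight) * r for r, a in zip(retrieved, last_observed)]


last_observed = [0.5, 0.3, 0.2]   # current distribution (persistence anchor)
retrieved = [0.2, 0.3, 0.5]       # hypothetical successor retrieved from history
forecast = anchor_mix(retrieved, last_observed, weight=0.25)
```

A convex combination of two points on the simplex is again non-negative and sums to one, so no renormalization or clipping step is needed, which is the sense in which such a stage preserves the simplex by construction.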

2605.16913 2026-05-19 stat.ML cond-mat.dis-nn cond-mat.stat-mech cs.LG math.PR 版本更新

A Fourier perspective on the learning dynamics of neural networks: from sample complexities to mechanistic insights

从样本复杂性到机理洞察的神经网络学习动态的傅里叶视角

Fabiola Ricci, Claudia Merger, Sebastian Goldt

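Affiliations: SISSA

AI summary: This paper studies the learning dynamics of neural networks from a Fourier perspective, incorporating the approximate translation invariance and power-law spectra of natural images. It shows that simple neural networks on image classification tasks rely on amplitude information before exploiting phase information, proves that classification based on phase information alone is hard for high-dimensional inputs, and shows how power-law spectra accelerate the learning of phase information.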

Journal ref: ICML 2026

AI中文摘要

通过梯度方法训练的神经网络表现出强烈的简单性偏差：它们在学习数据的更复杂特征之前，先学习更简单的统计特征。以往对此现象的研究主要集中在（准）各向同性输入的设置中。在本文中，我们从傅里叶视角研究简单性偏差，这使我们能够将自然图像的两个关键特性纳入分析：近似平移不变性和功率谱。我们首先实验表明，简单神经网络在图像分类任务中首先依赖于幅度信息——与像素对之间的相关性有关——然后再利用相位信息，后者编码边缘和高阶相关性。为此，我们引入了一个合成数据模型，用于平移不变输入，允许对幅度和相位进行精确控制，同时保持可处理性。我们严格证明了对于各向同性和高维输入，仅基于相位信息的分类任务是一个真正困难的任务：在线随机梯度下降（SGD）在n << N^3步内无法区分结构输入与噪声，但需要至少n >> N^3 log^2{N}步。相比之下，我们通过实验和理论证明，功率谱可以显著加速相位信息学习的速度，即使谱本身不帮助分类。对纹理任务的两层网络和ImageNet和CIFAR100的深度卷积网络的模拟证实了幅度和相位之间非平凡的相互作用，提供了深度神经网络高效学习自然图像分布的机理洞察。

英文摘要

Neural networks trained with gradient-based methods exhibit a strong simplicity bias: they learn simpler statistical features of their data before moving to more complex features. Previous analyses of this phenomenon have largely focused on settings with (quasi-)isotropic inputs. In this work, we study the simplicity bias from a Fourier perspective, which allows us to include two key features of natural images in the analysis: approximate translation-invariance and power-law spectra. We first show experimentally that simple neural networks trained on image classification tasks first rely on amplitude information -- related to pair-wise correlations between pixels -- before exploiting phase information, which encodes edges and higher-order correlations. In view of this, we introduce a synthetic data model for translation-invariant inputs that allows precise control over amplitudes and phases while remaining tractable. We rigorously establish that for isotropic and high-dimensional inputs, classification based on phase information alone is a genuinely hard task: online stochastic gradient descent (SGD) cannot distinguish the structured inputs from noise within $n \ll N^3$ steps, but needs at least $n \gg N^3 \log^2{N}$ steps. In contrast, we show both experimentally and theoretically that power-law spectra can dramatically accelerate the speed of learning phase information, even if the spectra do not help with classification. Simulations with two-layer networks trained on textures and with deep convolutional networks on ImageNet and CIFAR100 confirm this non-trivial interaction between amplitudes and phases, providing mechanistic insights into how deep neural networks can learn natural image distributions efficiently.
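
The split between amplitude and phase information can be seen on a one-dimensional toy signal: translating the signal leaves every Fourier amplitude unchanged and alters only the phases. The sketch below uses a naive DFT from the standard library; the signal values are arbitrary and the example is an illustration of the Fourier shift property, not of the paper's synthetic data model.

```python
import cmath


def dft(signal):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def cyclic_shift(signal, s):
    """Translate the signal by s positions with wrap-around."""
    s %= len(signal)
    return signal[-s:] + signal[:-s] if s else list(signal)


signal = [0.0, 1.0, 3.0, 2.0, 0.0, -1.0, 0.5, 0.0]
shifted = cyclic_shift(signal, 3)

amplitudes = [abs(c) for c in dft(signal)]
shifted_amplitudes = [abs(c) for c in dft(shifted)]
phases = [cmath.phase(c) for c in dft(signal)]
shifted_phases = [cmath.phase(c) for c in dft(shifted)]
```

Amplitudes therefore carry translation-invariant, second-order statistics, while the location of structures such as edges lives in the phases, consistent with the ordering of what is learned first in the abstract.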

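2605.16905 2026-05-19 cs.LG cs.CV Version update

AIM: Adversarial Information Masking for Faithfulness Evaluation of Saliency Maps

Chia-Ying Hsieh, Hsin-Yuan Fang, Chun-Shu Wei

Affiliations: National Yang Ming Chiao Tung University

AI summary: This paper proposes AIM, an adversarial information masking framework that evaluates both the faithfulness of saliency maps and the reliability of masking operators. By comparing degradation under different masking schemes, it reduces masking-induced bias and reveals modality-dependent differences between signed and unsigned attributions.

Details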

Abstract

Post-hoc saliency methods are widely used to interpret deep neural networks, but their faithfulness is difficult to evaluate reliably. Existing evaluations mask features according to saliency-induced feature ordering and measure performance degradation, but this degradation can be confounded by the masking operator: zero masking may create out-of-distribution artifacts, while interpolation-based masking may preserve residual predictive information. We propose Adversarial Information Masking (AIM), a saliency-guided adversarial feature replacement framework for evaluating both saliency-map faithfulness and masking-operator reliability. AIM replaces selected features with values from an adversarial counterpart of the input and compares degradation under complementary masking orders. We assess reliability using random-attribution bias and stability of explanation-method faithfulness rankings. Experiments on image, audio, and EEG tasks suggest that AIM reduces masking-induced bias compared with zero and interpolation-based masking, while revealing modality-dependent differences between signed and unsigned attributions.
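The masking-based evaluation described in this abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the AIM implementation: the model is a hypothetical linear scorer (`linear_model`), the replacement values are passed in as a plain list, and `faithfulness_gap` is an illustrative name. Passing zeros as `replacement` corresponds to zero masking; an adversarial counterpart of the input would be supplied through the same argument.

```python
WEIGHTS = [3.0, 1.0, 0.5, 0.1]


def linear_model(features):
    """Toy model: a fixed linear score over four features."""
    return sum(w * f for w, f in zip(WEIGHTS, features))


def degradation_curve(model, x, order, replacement):
    """Replace features one at a time in `order` and record the model output."""
    current = list(x)
    scores = [model(current)]
    for i in order:
        current[i] = replacement[i]
        scores.append(model(current))
    return scores


def faithfulness_gap(model, x, saliency, replacement):
    """Mean gap between least-relevant-first and most-relevant-first curves.

    A faithful saliency map makes the score drop quickly when the most
    salient features are replaced first, so the gap is positive.
    """
    most_first = sorted(range(len(x)), key=lambda i: -saliency[i])
    least_first = most_first[::-1]
    morf = degradation_curve(model, x, most_first, replacement)
    lerf = degradation_curve(model, x, least_first, replacement)
    return sum(l - m for l, m in zip(lerf, morf)) / len(morf)
```

Comparing the two complementary orders, rather than a single deletion curve, is what lets the effect of the replacement operator itself be separated from the quality of the saliency ranking.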

2605.16902 2026-05-19 cs.LG Version update

ArtifactLinker: Linking Scientific Artifacts for Automatic State-of-the-Art Discovery

Haofei Yu, Jiaxuan You, Peter Clark, Bodhisattwa Prasad Majumder, Kyle Richardson

Affiliations: University of Illinois Urbana-Champaign; Allen Institute for AI

AI summary: This paper proposes ArtifactLinker, a framework that predicts model-dataset links with graph neural networks and large language models and verifies them through coding experiments, enabling automatic discovery of state-of-the-art results.

Comments 12 pages

Details

Abstract

Scientific artifacts such as models and datasets are foundations for research. With the rapid growth of platforms like HuggingFace, researchers now have access to a large number of artifacts. Yet, a key challenge remains: how can we automatically discover the state-of-the-art (SOTA) model for a given dataset by fully leveraging existing artifacts? We formalize this task as automatic SOTA discovery by modeling HuggingFace as an artifact graph, where nodes are models/datasets and edges represent evaluations. We propose ArtifactLinker, a two-stage framework: (1) ranking promising unobserved model--dataset links using Graph Neural Networks (GNNs) or graph-augmented Large Language Models (LLMs), and (2) verifying top-ranked links via coding experiments with LLM-based agents. We further introduce a benchmark named ArtifactBench with 14,053 artifacts and 51,337 relations to evaluate the performance of both stages. Results show that (1) graph structures between existing artifacts are effective for missing link prediction; (2) end-to-end ranking and verification with ArtifactLinker help discover potential SOTA results and research insights.

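2605.16891 2026-05-19 cs.LG Version update

Tensor Channel Equivariant Graph Neural Networks for Molecular Polarizability Prediction

Jean Philip Filling, Daniel Franzen, Michael Wand

Affiliations: Institute for Computer Science, Johannes Gutenberg University Mainz, Germany

AI summary: This paper proposes a tensor-channel equivariant graph neural network that directly predicts molecular polarizability tensors. Building on a modified PaiNN architecture, it propagates tensor structure during message passing and achieves better performance on molecular polarizability prediction.

Details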

Abstract

We introduce a tensor-channel equivariant graph neural network for direct prediction of molecular polarizability tensors. Building on the efficient PaiNN architecture, we augment the hidden representation with explicit symmetric rank-2 tensor channels aligned with the decomposition of polarizability into isotropic and anisotropic components. In contrast to approaches that construct tensor outputs only at readout, our model propagates tensor structure throughout message passing using geometrically motivated tensor bases. This yields a target-aligned architecture for tensor-valued molecular prediction. On optimized QM7-X geometries, the proposed model achieves lower full-tensor and anisotropic error than both a PaiNN-style readout baseline and a dielectric MACE baseline under matched training conditions and at nearly identical parameter count. In this controlled setting, it also outperforms MACE while remaining substantially faster at inference. Ablation studies show that the gain does not arise from increased capacity alone, but from the combination of explicit tensor propagation and a traceless target parameterization matched to the anisotropic part of the polarizability tensor. Among the tensor bases considered, the strongest results are obtained from interactions between learned directional features, indicating that these are particularly effective for modeling molecular polarizability. Rotational equivariance tests further confirm that all compared models are numerically equivariant, so the observed improvements are attributable to better learning of the target tensor itself. Overall, our results show that for structured tensor-valued targets, propagating target-aligned tensor features can outperform both readout-only tensor construction and a more general higher-order equivariant model in the present training setting.
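The decomposition of polarizability into isotropic and anisotropic components that this abstract refers to is the standard split of a symmetric rank-2 tensor into a trace part and a traceless part. A minimal sketch, assuming a 3x3 tensor stored as nested lists; the function names are illustrative and this is not the paper's code:

```python
def decompose_polarizability(alpha):
    """Split a 3x3 tensor into (isotropic scalar, traceless symmetric part).

    isotropic   = trace(alpha) / 3
    anisotropic = symmetrized alpha minus isotropic * identity
    """
    iso = (alpha[0][0] + alpha[1][1] + alpha[2][2]) / 3.0
    sym = [[0.5 * (alpha[i][j] + alpha[j][i]) for j in range(3)] for i in range(3)]
    aniso = [
        [sym[i][j] - (iso if i == j else 0.0) for j in range(3)]
        for i in range(3)
    ]
    return iso, aniso


def recompose(iso, aniso):
    """Rebuild the symmetric tensor from its two components."""
    return [
        [aniso[i][j] + (iso if i == j else 0.0) for j in range(3)]
        for i in range(3)
    ]
```

Under a rotation the isotropic scalar is unchanged while the traceless part transforms as a rank-2 tensor, which is why a model may predict the two components through separate channels.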

2605.16887 2026-05-19 cs.CV cs.LG Version update

Mind the Gap: Learning Modality-Agnostic Representations with a Cross-Modality UNet

Xin Niu, Enyi Li, Jinchao Liu, Yan Wang, Margarita Osadchy, Yongchun Fang

Affiliations: Tianjin Key Laboratory of Intelligent Robotics, College of Artificial Intelligence, Nankai University, China; Engineering Research Center of Trusted Behavior Intelligence, Ministry of Education, Nankai University, China; Department of Computer Science, Haifa University, Israel; VisionMetric Ltd, Canterbury, Kent, UK

AI summary: This paper proposes a compact encoder-decoder neural module (cmUNet) that learns modality-agnostic representations through cross-modality transformation and in-modality reconstruction while retaining identity-related information. The authors also propose MarrNet, which connects cmUNet to a standard feature extraction network for cross-modality matching, and report superior performance on several challenging tasks.

Comments Published in IEEE Transactions on Image Processing. See full abstract in the PDF file

Details

DOI: 10.1109/TIP.2023.3348656
Journal ref: IEEE Transactions on Image Processing, vol. 33, pp. 655-670, 2024

Abstract

Cross-modality recognition has many important applications in science, law enforcement and entertainment. Popular methods to bridge the modality gap include reducing the distributional differences between representations of different modalities, learning indistinguishable representations, or explicit modality transfer. The first two approaches suffer from the loss of discriminant information when removing modality-specific variations. The third relies heavily on successful modality transfer and can face a catastrophic performance drop when explicit modality transfer is impossible or difficult. To tackle this problem, we propose a compact encoder-decoder neural module (cmUNet) that learns modality-agnostic representations while retaining identity-related information. This is achieved through cross-modality transformation and in-modality reconstruction, enhanced by an adversarial/perceptual loss that encourages indistinguishability of representations in the original sample space. For cross-modality matching, we propose MarrNet, in which cmUNet is connected to a standard feature extraction network that takes the modality-agnostic representations as input and outputs similarity scores for matching. We validated our method on five challenging tasks, namely Raman-infrared spectrum matching, cross-modality person re-identification, and heterogeneous (photo-sketch, visible-near infrared and visible-thermal) face recognition, where MarrNet showed superior performance compared to state-of-the-art methods. Furthermore, we observe that a cross-modality matching method can be biased toward extracting discriminant information from partial or even wrong regions because it cannot handle the modality gap, which subsequently leads to poor generalization. We show that robustness to occlusions can indicate how well a method bridges the modality gap.

2605.16883 2026-05-19 cs.LG Version update

SE-GA: Memory-Augmented Self-Evolution for GUI Agents

Shilong Jin, Lanjun Wang, Zhuosheng Zhang

Affiliations: College of Intelligence and Computing, Tianjin University, Tianjin, China; School of New Media and Communication, Tianjin University, Tianjin, China; School of Computer Science, Shanghai Jiao Tong University, Shanghai, China

AI summary: This paper proposes SE-GA, a framework that integrates hierarchical memory structures with an iterative self-improvement mechanism to address the difficulties GUI agents face on multi-step tasks because of constrained context windows and static policies that cannot adapt to dynamic environments. Experiments show leading performance on multiple benchmarks.

Comments Accepted by ICML 2026

Details

Abstract

Autonomous Graphical User Interface (GUI) agents often struggle with multi-step tasks due to constrained context windows and static policies that fail to adapt to dynamic environments. To address these limitations, this work proposes the Self-Evolving GUI Agent (SE-GA), a novel framework that integrates hierarchical memory structures with an iterative self-improvement mechanism. At the core of our approach is Test-Time Memory Extension (TTME), which facilitates long-term planning by dynamically retrieving episodic, semantic, and experiential memories to provide salient contexts during inference. To ensure continuous learning, we introduce Memory-Augmented Self-Evolution (MASE), which is a training pipeline that adopts the data collected by TTME to stabilize and enhance the agent's foundational policy. Extensive evaluations across both offline and online benchmarks demonstrate SE-GA achieves state-of-the-art performance, reaching success rates of 89.0\% on ScreenSpot and 75.8\% on the challenging AndroidControl-High dataset. Furthermore, significant improvements on the AndroidWorld benchmark highlight the superior generalization to dynamic environments. Open source code: https://github.com/jinshilong-dev/SE-GA

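2605.16863 2026-05-19 cs.RO cs.AI cs.LG Version update

Plan First, Diffuse Later: Extrinsic Graph Guidance for Long-Horizon Diffusion Planning

Yaniv Hassidof, Adir Morgan, Yilun Du, Kiril Solovey

Affiliations: Technion; Harvard

AI summary: This paper proposes an extrinsic search-guided diffusion model (XDiffuser) that first plans on a state-space graph and then uses the plan to guide the diffusion process, improving the efficiency and effectiveness of long-horizon planning, with particularly strong results on low-quality data and unseen tasks.

Details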

Abstract

Compositional diffusion models offer a promising route to long-horizon planning by denoising multiple overlapping sub-trajectories while ensuring that together they constitute a global solution. However, enforcing local behavior over long chains is often insufficient for a coherent global structure to emerge. Recent works tackle this limitation through intrinsic search, which explores multiple paths during the denoising process. While intrinsic search improves global coherence, it comes at the cost of repeated evaluations of an already compute-heavy model. In this work, we argue that extrinsic search, performed outside the denoising process, offers a more effective mode of exploration for long-horizon planning while naturally enabling the use of classical algorithms to solve unseen combinatorial tasks at test time. Our eXtrinsic search-guided Diffuser (XDiffuser) first computes a plan over a state-space graph -- serving as a lightweight local connectivity oracle for the diffusion model. The plan is then used to guide denoising for a single trajectory, effectively offloading the burden of exploration. XDiffuser outperforms diffusion-based baselines on long-horizon tasks, with particularly large gains in the low-quality data regime and on unseen tasks beyond goal-reaching, including multi-agent coordination and TSP-style reasoning. Project website: https://yanivhass.github.io/XDiffuser-site/

2605.16860 2026-05-19 cs.LG cs.AI q-bio.QM Version update

PhysioSeq2Seq: A Hybrid Physiological Digital Twin and Sequence-to-Sequence LSTM for Long-Horizon Glucose Forecasting in Type 1 Diabetes

Phat Tran, Neville Mehta, Clara Mosquera-Lopez, Robert H. Dodier, Lizhong Chen, Peter G. Jacobs

Affiliations: Oregon State University; Oregon Health & Science University

AI summary: This paper proposes PhysioSeq2Seq, a hybrid architecture that combines patient-specific physiological modeling with a sequence-to-sequence LSTM for long-horizon glucose forecasting in type 1 diabetes. By eliminating recursive error compounding and injecting patient-matched physiological states, it improves forecast accuracy and clinical relevance.

Details

Abstract

Accurate long-horizon glucose forecasting is critical for automated insulin delivery systems, which help people with type 1 diabetes (T1D) manage their glucose and avoid dangerous hypoglycemia. However, standard recursive long short-term memory (LSTM) networks suffer from systematic negative bias at longer horizons due to error compounding, while purely mechanistic ordinary differential equation (ODE) models fail to generalize across individuals when parameterized at the population level. We propose PhysioSeq2Seq, a hybrid architecture that combines patient-specific physiological modeling with a sequence-to-sequence (Seq2Seq) LSTM. For each glucose segment, twin matching searches a population of 300 parameterized digital twins to identify the best-fitting physiological match from a 3-hour continuous glucose monitoring (CGM) history. The 10 internal ODE state variables of the matched twin are injected as exogenous covariates into both the encoder and decoder of the Seq2Seq LSTM. This simultaneous 48-step prediction strategy eliminates recursive error compounding, while the ODE features provide a physics-grounded constraint that bounds long-horizon drift within physiologically plausible ranges. PhysioSeq2Seq was trained on CGM and insulin data from 348 participants in the Type 1 Diabetes Exercise Initiative (T1DEXI) dataset and evaluated on 74 held-out participants. At the 240-minute horizon, PhysioSeq2Seq achieves a mean absolute error of 39.28 mg/dL and a mean error of -10.62 mg/dL, reducing bias by 13.89 mg/dL over the recursive LSTM and reducing mean absolute error by 28.62 mg/dL over the ODE-based digital twin. These results show that eliminating architectural feedback and injecting patient-matched physiological states is an effective and clinically meaningful strategy for long-horizon glucose forecasting in T1D.

2605.16848 2026-05-19 cs.CV cs.AI cs.CL cs.LG Version update

Thinking with Patterns: Breaking the Perceptual Bottleneck in Visual Planning via Pattern Induction

Yichang Jian, Boyuan Xiao, Zhenyuan Huang, Yifei Peng, Yao-Xiang Ding

Affiliations: State Key Lab of CAD&CG

AI summary: This paper proposes Pattern Inference and Pattern Induction strategies that enable vision-language models to perceive and reason more efficiently and accurately in visual planning tasks, addressing the perceptual bottleneck that existing models face on complex inputs.

Details

Abstract

Planning from raw visual input remains a significant challenge for current Vision-Language Models (VLMs), when the complexity of input is beyond their one-step perception capability. Motivated by recent advances in Thinking with Images (TWI), a reasonable solution is to decompose the perception process into simpler steps by iteratively acquiring and incorporating local visual evidence. However, even though current VLMs are well-trained in general TWI ability, their perceptual bottleneck in the planning domain remains. To tackle this challenge, we formulate TWI as a tool to gradually build and reflect an accurate internal world model. We find that the resulting training-free planning strategy enables VLMs to solve tasks that are far beyond their initial capabilities, at the cost that too many TWI operations would significantly increase the computational overhead. To further improve efficiency, we propose Pattern Inference, a novel TWI strategy enabling VLMs to actively recognize known visual patterns in the new tasks and directly infer local world model structures. To obtain these patterns, we propose Pattern Induction, an online inductive learning strategy treating visual patterns as composite and reusable experts, which are autonomously discovered and optimized from experience. Experimental evaluations in FrozenLake, Crafter and CubeBench domains show that our approaches achieve a desirable balance between accuracy and efficiency.

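2605.16836 2026-05-19 stat.ML cs.LG Version update

HYVINT: Intensity-Driven Hypergraph Generation with Variational Representations

Xinyi Hong, Shuntuo Xu, Zhou Yu

Affiliations: School of Statistics; East China Normal University

AI summary: This paper proposes HYVINT, a framework that uses an intensity-driven hypergraph generation mechanism and a variational estimator to model node-hyperedge relations in hypergraph generation, achieving high-fidelity and diverse generation.

Details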

Abstract

Hypergraphs provide a principled framework for modeling polyadic interactions, with applications in recommendation systems, social networks, and molecular modeling. Hypergraph generation remains challenging because incidence structures are discrete, sparse, and governed by heterogeneous higher-order interactions. Existing generators often rely on implicit latent spaces or continuous incidence decoders, which provide limited mechanistic interpretation of how node-hyperedge incidences arise. To address these limitations, we propose HYVINT, an intensity-driven hypergraph generative framework. Our key innovations are twofold: (i) we develop an intensity-driven incidence formation mechanism for hypergraphs that links latent interaction strength to binary incidence, and (ii) we derive a tractable lower-bound variational estimator for learning latent representations. We provide generation error bounds with asymptotic convergence rates and empirically show that HYVINT achieves strong fidelity while maintaining substantial novelty and diversity on synthetic and real-world hypergraphs.

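2605.16834 2026-05-19 cs.CV cs.AI cs.LG Version update

Learning Relative Representations for Fine-Grained Multimodal Alignment with Limited Data

Shiwon Kim, Yu Rang Park

Affiliations: Yonsei University

AI summary: This paper proposes a relative-representation learning method for fine-grained multimodal alignment with limited data. By learning token-level cross-modal structure, it improves performance on zero-shot classification, cross-modal retrieval, and zero-shot segmentation.

Details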

Abstract

Multimodal pre-training demonstrates strong generalization performance, but this paradigm is often impractical in domains where paired data are scarce. A promising alternative is post-hoc multimodal alignment, which aligns separately pre-trained unimodal encoders using a limited number of paired examples. However, existing methods focus primarily on aligning global representations, missing patch-token relations. This may hinder transfer to tasks that require fine-grained cross-modal matching beyond coarse sample-level semantics. To address this issue, we propose a post-hoc alignment method that learns token-level cross-modal structure using relative representations. Specifically, we represent images and texts through their token-level similarities to a set of learnable anchors in each modality space, which are trained to induce consistent cross-modal similarity patterns for matched pairs. Despite learning only the anchors without heavy projection layers, our approach consistently outperforms existing methods in zero-shot classification, cross-modal retrieval, and zero-shot segmentation by a substantial margin. This highlights the importance of modeling fine-grained cross-modal structure for effective post-hoc multimodal alignment with limited paired data.
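The idea of a relative representation, describing a sample by its similarities to a set of anchors rather than by absolute coordinates, can be sketched in a few lines. This is a simplified illustration, not the paper's method: it assumes fixed 2-D anchors and one vector per sample, whereas the abstract describes learnable anchors and token-level similarities. All function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def relative_representation(x, anchors):
    """Describe x by its cosine similarity to each anchor."""
    return [cosine(x, a) for a in anchors]


def rotate(v, angle):
    """Rotate a 2-D vector; stands in for a change of embedding space."""
    c, s = math.cos(angle), math.sin(angle)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]
```

If two embedding spaces differ only by a rotation, a sample and the anchors rotate together and the relative representation is unchanged, which is the property that makes anchor similarities a natural common ground between separately trained encoders.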

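2605.16828 2026-05-19 stat.ML cs.AI cs.LG stat.ME Version update

AgentKernelArena: A Generalization-Aware Benchmark for GPU Kernel Optimization Agents

Sharareh Younesian, Wenwen Ouyang, Sina Rafati, Mehdi Rezagholizadeh, Sharon Zhou, Ji Liu, Yue Liu, Yuchen Yang, Hao Li, Ziqiong Liu, Dong Li, Vikram Appia, Zhenyu Gu, Emad Barsoum

Affiliations: AMD

AI summary: This paper presents AgentKernelArena, an open-source benchmark for evaluating GPU kernel optimization agents. Using isolated workspaces and a unified scoring mechanism, it tests agent performance and generalization across tasks and hardware targets, finding high compilation and correctness rates on most tasks but a substantial correctness drop on PyTorch-to-HIP translation tasks.

Details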

Abstract

GPU kernel optimization is increasingly critical for efficient deep learning systems, but writing high-performance kernels still requires substantial low-level expertise. Recent AI coding agents can iteratively read code, invoke compilers and profilers, and refine implementations, yet existing kernel benchmarks evaluate single LLM calls rather than full agent workflows, and none include both kernel-to-kernel optimization and unseen-configuration generalization testing. We present AgentKernelArena, an open-source benchmark for measuring AI coding agents on GPU kernel optimization. The benchmark contains 196 tasks spanning HIP-to-HIP optimization, Triton-to-Triton optimization, and PyTorch-to-HIP translation, and evaluates complete agent workflows in isolated workspaces using gated compilation, correctness, and performance checks, centralized scoring and an unseen-configuration generalization protocol that tests whether optimizations transfer to input configurations the agent never observed. Across production agents including Cursor Agent, Claude Code, and Codex Agent, we find near-perfect compilation and high correctness rates on most task categories, with the strongest configurations achieving mean speedups of up to 6.89x on PyTorch-to-HIP, 6.69x on HIP-to-HIP, and 2.13x on Triton-to-Triton tasks. Our unseen-configuration evaluation shows that HIP-to-HIP and Triton-to-Triton optimizations largely transfer to unseen input shapes, while PyTorch-to-HIP exhibits substantial correctness drops, indicating that agents generating kernels from scratch frequently hardcode shape-specific assumptions. AgentKernelArena is designed as a modular, extensible framework for rigorous evaluation of agentic GPU kernel optimization across agents, tasks, and hardware targets.

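2605.16809 2026-05-19 cs.LG Version update

Lever: Speculative LLM Inference on Smartphones

Tuowei Wang, Fengzu Li, Yanfan Sun, Wei Gao, Ju Ren

Affiliations: Tsinghua University; Beihang University; University of Pittsburgh

AI summary: This paper proposes Lever, a system that jointly optimizes the three stages of speculative decoding to enable efficient flash-backed LLM inference on smartphones, substantially reducing inference latency.

Details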

Abstract

Large language models (LLMs) are increasingly needed for interactive mobile applications, but high-quality models exceed the limited DRAM available on smartphones. Flash storage can hold larger models, yet flash-backed inference is slow because autoregressive decoding repeatedly invokes the target model and incurs costly I/O. We observe that speculative decoding is a natural fit for this setting: a small draft model can remain in DRAM, while a larger flash-resident target model verifies multiple candidate tokens per invocation. However, existing methods assume server-class accelerators and fail to account for prolonged I/O latency, limited computation parallelism, and irregular speculation execution. We present Lever, an end-to-end system for efficient flash-backed LLM inference on smartphones. Lever jointly optimizes the three stages of speculative decoding under mobile constraints. For drafting, it builds token trees using an I/O- and compute-aware gain-cost objective. For verification, it prunes low-value branches through early-exit prediction to reduce target-model computation. For execution, it maps speculation efficiently across mobile CPU-NPU hardware to improve utilization. Comprehensive evaluations show that Lever reduces inference latency by an average of 2.93x over baseline flash-offloaded inference and 1.50x over conventional speculative decoding, narrowing the latency gap between flash-backed and memory-resident LLM inference.
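The draft-then-verify loop behind speculative decoding, which this abstract builds on, can be sketched with toy models. This is a minimal greedy, single-sequence illustration and not Lever itself, which the abstract describes as using token trees, early-exit pruning, and CPU-NPU mapping. The two "models" (`target_next`, `draft_next`) are hypothetical stand-ins, and one verification round is counted as a single target invocation on the assumption that a real system scores all drafted positions in one batched pass.

```python
def target_next(context):
    """Toy 'large' model: counts upward modulo 10."""
    return (context[-1] + 1) % 10


def draft_next(context):
    """Toy 'small' model: agrees with the target except after token 2."""
    return 9 if context[-1] == 2 else (context[-1] + 1) % 10


def greedy_decode(next_token, prompt, max_new_tokens):
    """Plain autoregressive decoding: one model call per new token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(next_token(tokens))
    return tokens


def speculative_decode(target, draft, prompt, max_new_tokens, k=4):
    """Greedy speculative decoding; returns (tokens, target_invocations)."""
    tokens = list(prompt)
    target_invocations = 0
    while len(tokens) - len(prompt) < max_new_tokens:
        # Drafting: the small model proposes k tokens autoregressively.
        proposal, context = [], list(tokens)
        for _ in range(k):
            token = draft(context)
            proposal.append(token)
            context.append(token)
        # Verification: counted as one target invocation per round.
        target_invocations += 1
        accepted, context = [], list(tokens)
        for token in proposal:
            expected = target(context)
            if expected != token:
                accepted.append(expected)  # target's correction ends the round
                break
            accepted.append(token)
            context.append(token)
        else:
            accepted.append(target(context))  # all accepted: one bonus token
        tokens.extend(accepted)
    return tokens[: len(prompt) + max_new_tokens], target_invocations
```

The output matches plain greedy decoding with the target model, but the target is invoked once per round instead of once per token, which is the saving that matters when each invocation has to stream weights from flash.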

2605.16776 2026-05-19 cs.LG cs.AI Version update

Distinguishable Deletion: Unifying Knowledge Erasure and Refusal for Large Language Model Unlearning

Puning Yang, Junchi Yu, Qizhou Wang, Philip Torr, Bo Han, Xiuying Chen

Affiliations: Department of Natural Language Processing, MBZUAI; University of Oxford; RIKEN Center for Advanced Intelligence Project; TMLR Group, Department of Computer Science, Hong Kong Baptist University

AI summary: This paper proposes D^2, which erases undesirable knowledge by restricting the response distribution in the latent representation while distinguishing it from retained knowledge, enabling a safe and coherent refusal mechanism that improves large language model unlearning.

Comments ICML2026 Accepted

Details

Abstract

Mitigating sensitive and harmful outputs is fundamental to ensuring safe deployment of LLMs. Existing approaches typically follow two paradigms: Knowledge Deletion (KD), which erases undesirable information during training, and Distinguishable Refusal (DR), which steers models away from using sensitive knowledge during inference. Despite rapid progress, KD-based unlearning struggles with biased deletion due to suppressing specific token sequences as a substitute for complete knowledge removal, whereas DR-based unlearning risks the re-emergence of harmful knowledge because the underlying knowledge remains intact. To address these issues, we propose Distinguishable Deletion ($\mathrm{D^2}$), a paradigm that restricts the response distribution in the latent representation rather than specific tokens to erase undesirable knowledge, while distinguishing it from retained knowledge, enabling a refusal mechanism to handle unlearned inputs safely and coherently. To implement $\mathrm{D^2}$, we introduce an energy index that quantifies the presence of knowledge and the separation between unlearned and retained content. Mathematical and empirical analyses show that energy is both accurate and efficient, enabling Energy-based Unlearning Alignment (EUA) to enforce energy-boundary unlearning during training and apply an energy-based refusal mechanism at inference. Extensive experiments demonstrate that EUA significantly outperforms previous methods, indicating the superiority of $\mathrm{D^2}$. Our code is available at https://github.com/Puning97/EUA-for-LLM-Unlearning.

2605.16775 2026-05-19 cs.CV cs.AI cs.LG (updated version)

VolTA-3D: Self-Supervised Learning for Brain MRI using 3D Volumetric Token Alignment

Amy Makawana, Abhijeet Parida, Marius George Linguraru, Julia Ive, Syed Muhammad Anwar

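Affiliations: Institute of Health Informatics; Sheikh Zayed Institute for Pediatric Surgical Innovation; School of Medicine and Health Sciences

AI summary: This paper proposes VolTA-3D, a 3D Vision Transformer framework for self-supervised learning on brain MRI. By jointly aligning global class-style tokens and local patch tokens, it improves the transferability of volumetric representations, yielding better generalization and robustness across multiple downstream tasks.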

Comments Accepted at EMBC 2026

Abstract

Self-supervised learning (SSL) has advanced medical image analysis by enabling learning from large unlabeled data. However, in brain magnetic resonance imaging (MRI), most 3D models remain specialized for either segmentation or classification, limiting their ability to generalize across datasets, imaging protocols, and downstream tasks. This lack of transferability constrains the clinical utility of 3D MRI models, despite the availability of unlabeled volumetric data. We present VolTA-3D, a self-supervised 3D Vision Transformer framework designed to learn transferable volumetric representations. VolTA-3D jointly aligns global class-style tokens and local patch tokens within a student-teacher paradigm and enforces fine-grained structural reconstruction. This combined global-local alignment addresses the limited semantic diversity and subtle anatomical characteristics of brain MRI, which challenge existing SSL approaches. We evaluate VolTA-3D on multiple out-of-distribution downstream tasks, including hippocampal segmentation and classification of sex and Alzheimer's disease versus healthy controls. Across all tasks, representations learned by VolTA-3D outperform randomly initialized baselines, demonstrating improved transferability and robustness under domain shift. Hence, jointly enforcing global semantic consistency and local structural learning during pretraining enables broader concept learning from unlabeled brain MRI data. Overall, VolTA-3D supports effective multi-task downstream performance with task-specific pertaining, a step towards generalizable and clinically viable 3D models.

2605.16755 2026-05-19 cs.LG cs.AI (updated version)

Learning Unbiased Permutations via Flow Matching

Yimeng Min, Carla P. Gomes

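Affiliations: Department of Computer Science; Cornell University

AI summary: This paper proposes PermFlow, a framework that learns multimodal permutation distributions by operating directly on the affine subspace of matrices with unit row and column sums, avoiding the collapse under ambiguity seen in entropy-regularized Sinkhorn methods.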

Abstract

Learning permutations is fundamental to sorting, ranking, and matching, but existing differentiable methods based on entropy-regularized Sinkhorn produce a single softened solution and collapse under ambiguity. We present PermFlow, a conditional flow matching framework that operates directly on the affine subspace of matrices with unit row and column sums. A closed-form tangent-space projector preserves these constraints exactly along every trajectory, by construction rather than through iterative correction, and a nearest-target coupling routes distinct noisy initializations toward distinct valid permutations. The result is a model that captures multimodal permutation distributions rather than collapsing them to a single mode. On a visual sorting task with blended-digit ambiguity and a symmetric linear assignment problem, PermFlow achieves high accuracy on unambiguous inputs and recovers both valid permutations under ambiguity, where Sinkhorn-based baselines structurally fail.
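The abstract does not state the projector's formula. As a minimal sketch, assuming the standard Frobenius-orthogonal projection onto matrices with zero row and column sums (double centering), the following shows why a projected velocity keeps unit row and column sums along a trajectory. The names `project_to_tangent` and `euler_step` are hypothetical and are not taken from the paper.

```python
def project_to_tangent(v):
    """Double-center a square matrix so that every row and column sums to zero.

    Assumption: this is the Frobenius-orthogonal projector onto the tangent
    space; the paper's own closed form is not given in the abstract.
    """
    n = len(v)
    row_mean = [sum(row) / n for row in v]
    col_mean = [sum(v[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_mean) / n
    return [
        [v[i][j] - row_mean[i] - col_mean[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def euler_step(x, velocity, dt):
    """Move x along the projected velocity; row and column sums are unchanged."""
    p = project_to_tangent(velocity)
    n = len(x)
    return [[x[i][j] + dt * p[i][j] for j in range(n)] for i in range(n)]


# Start from the uniform matrix (all row and column sums equal 1) and take one
# step along an arbitrary raw velocity, standing in for a network output.
x0 = [[1 / 3] * 3 for _ in range(3)]
raw_velocity = [[0.9, -0.2, 0.4], [0.1, 0.5, -0.7], [-0.3, 0.8, 0.2]]
x1 = euler_step(x0, raw_velocity, dt=0.1)
row_sums = [sum(row) for row in x1]
col_sums = [sum(x1[i][j] for i in range(3)) for j in range(3)]
```

Because the projection is applied to the velocity rather than to the state, the constraint holds at every step without any iterative renormalization, which is the contrast with Sinkhorn-style correction that the abstract draws.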

2605.16748 2026-05-19 cs.GR cs.AI cs.CV cs.LG cs.MA cs.MM (updated version)

Genflow Ad Studio: A Compound AI Architecture for Brand-Aligned, Self-Correcting Video Generation

Debanshu Das, Lavi Nigam, Sunil Kumar Jang Bahadur, Gopala Dhar

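Affiliations: Google

AI summary: This paper presents Genflow Ad Studio, a compound AI architecture that uses a Brand DNA extraction module and an adversarial multi-agent quality-control loop to improve brand-aligned video generation, raising the compliance rate from 42% to 89%.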

Comments 6 pages, 2 figures, 2 tables. Accepted to the ACM Conference on AI and Agentic Systems (CAIS '26). Includes demo video and code repository links

DOI: 10.1145/3786335.3813213
Journal ref: ACM Conference on AI and Agentic Systems (CAIS '26), May 26-29, 2026, San Jose, CA, USA

Abstract

Recent advancements in generative video models demonstrate high visual fidelity, yet their integration into enterprise environments is restricted by temporal inconsistencies and severe brand misalignment. Current monolithic architectures struggle to enforce rigid brand constraints, frequently hallucinating unapproved visual assets. We introduce Genflow, a Compound AI System designed to enforce brand consistency in generative media production. Our architecture integrates a retrieval-based 'Brand DNA' extraction module to parameterize generation according to established corporate identity guidelines. Furthermore, we implement an Adversarial Multi-Agent Quality Control (QC) loop. Instead of a single-pass generation, this pipeline employs evaluator agents to iteratively critique generated frames against the extracted parameters, prompting generator models to refine outputs until a deterministic consensus is reached. By transitioning to a multi-stage, self-correcting pipeline, Genflow improved the yield of brand-compliant video generations from 42% to 89%, establishing a robust framework for scalable, enterprise-grade generative systems.

2605.16747 2026-05-19 cs.LG math.AP math.OC math.PR math.ST stat.TH (updated version)

Propagation of Chaos in Contextual Flow Maps

Shi Chen, Zhengjiang Lin, Kaizhao Liu, Philippe Rigollet

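Affiliations: Department of Mathematics, Massachusetts Institute of Technology

AI summary: This paper develops a quantitative statistical theory of transformers in the large-context regime using the abstraction of contextual flow maps (CFMs): dynamical systems that evolve a distinguished token in the presence of a contextual measure across a stack of attention blocks. The finite-context model approximates an idealized infinite-context system in which the contextual measure is replaced by its underlying population, so the context length $n$ becomes a statistical resource. Using the McKean-Vlasov structure of the dynamics and classical propagation-of-chaos machinery, the authors establish a forward bound on the deviation between finite- and infinite-context CFMs uniformly along depth, and a backward bound on the deviation between the corresponding training trajectories uniformly across iterations of online gradient descent. Both bounds achieve the optimal Wasserstein rate $n^{-1/d}$ for general CFMs and the parametric rate $n^{-1/2}$ for a restricted class of CFMs that includes transformers. The analysis rests on a new Eulerian adjoint formulation of the loss gradient and stability estimates for the resulting forward-adjoint system.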

Comments 31 pages, 1 figure

Abstract

We develop a quantitative statistical theory of transformers in the large-context regime by adopting the abstraction of contextual flow maps (CFMs): dynamical systems that evolve a distinguished token in the presence of a contextual measure across a stack of attention blocks. Within this framework, the finite-context model approximates an idealized infinite-context system in which the contextual measure is replaced by its underlying population, so that the context length $n$ becomes a statistical resource. Exploiting the McKean--Vlasov structure of the dynamics and the classical machinery of propagation of chaos, we establish a forward bound controlling the deviation between the finite- and infinite-context CFMs uniformly along depth, and a backward bound controlling the deviation between the corresponding training trajectories uniformly across iterations of online gradient descent. Both bounds achieve the optimal Wasserstein rate $n^{-1/d}$ for general CFMs and parametric rate $n^{-1/2}$ for a restricted class of CFMs that includes transformers as a special case. The analysis rests on a new Eulerian adjoint formulation of the loss gradient and stability estimates for the resulting forward--adjoint system, both of which may be of independent interest.
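To make the gap between the two rates concrete, here is a short worked comparison. Constants are ignored, and the target error $\varepsilon$ is a symbol introduced here for illustration only.

```latex
% Context length needed to reach error \varepsilon under each rate.
\[
  n^{-1/d} \le \varepsilon \iff n \ge \varepsilon^{-d},
  \qquad
  n^{-1/2} \le \varepsilon \iff n \ge \varepsilon^{-2}.
\]
% Halving the error therefore multiplies the required context length by
\[
  2^{d} \ \text{(general CFMs)}
  \qquad \text{versus} \qquad
  2^{2} = 4 \ \text{(restricted class including transformers)}.
\]
```

For example, with $d = 10$ the general rate needs roughly $1024$ times more context to halve the error, whereas the parametric rate needs only $4$ times more.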

2605.16746 2026-05-19 cs.AI cs.LG (updated version)

State Contamination in Memory-Augmented LLM Agents

Yian Wang, Agam Goyal, Yuen Chen, Hari Sundaram

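Affiliations: Department of Computer Science, University of Illinois Urbana-Champaign

AI summary: This study examines safety problems caused by state contamination in memory-augmented LLM agents. By analyzing how toxic content propagates through memory summaries, it introduces a new metric and shows that sanitizing state before it is compressed effectively reduces hidden influence.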

Abstract

LLM agents increasingly rely on persistent state, including transcripts, summaries, retrieved context, and memory buffers, to support long-horizon interaction. This makes safety depend not only on individual model outputs, but also on what an agent stores and later reuses. We study a failure mode we call memory laundering: toxic or adversarial context can be compressed into memory summaries that no longer appear toxic under standard detectors, while still preserving hostile framing or conflict structure that influences future generations. Using paired counterfactual multi-agent rollouts, we show that toxic-origin memory summaries can remain below common toxicity thresholds while nevertheless increasing downstream toxicity relative to matched neutral baselines. To measure this hidden influence, we introduce the sub-threshold propagation gap (SPG), which quantifies downstream behavioral differences conditioned on memory states that a deployed monitor would classify as safe. Our experiments show that toxicity propagates through distinct state channels: raw transcript reuse drives overt downstream toxicity, while compressed memory carries hidden sub-threshold influence. We further find that mitigation depends critically on intervention placement. Sanitizing toxic state before summarization substantially reduces the hidden propagation gap, whereas cleaning only the completed summary can leave laundered influence intact. These results suggest that safety in memory-augmented agents should be treated as a state-control problem over evolving context, with sanitization applied before unsafe information is compressed into persistent memory.
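The abstract describes the sub-threshold propagation gap only in words. The sketch below is one possible reading, not the paper's definition: it assumes the gap is the mean difference in downstream toxicity between toxic-origin and matched neutral rollouts, restricted to pairs whose toxic-origin summary scores below the monitor threshold. The function name, dictionary keys, threshold, and scores are all hypothetical.

```python
def sub_threshold_propagation_gap(pairs, threshold=0.5):
    """Mean downstream-toxicity gap over pairs whose summary passes the monitor.

    Each pair is a dict with hypothetical keys:
      summary_tox        - detector score of the toxic-origin memory summary
      toxic_downstream   - downstream toxicity of the toxic-origin rollout
      neutral_downstream - downstream toxicity of the matched neutral rollout
    Returns None when no summary falls below the threshold.
    """
    passed = [p for p in pairs if p["summary_tox"] < threshold]
    if not passed:
        return None
    gaps = [p["toxic_downstream"] - p["neutral_downstream"] for p in passed]
    return sum(gaps) / len(gaps)


# Illustrative scores only; the third pair is excluded because its summary
# would already be flagged by the monitor.
rollouts = [
    {"summary_tox": 0.1, "toxic_downstream": 0.4, "neutral_downstream": 0.2},
    {"summary_tox": 0.2, "toxic_downstream": 0.3, "neutral_downstream": 0.1},
    {"summary_tox": 0.9, "toxic_downstream": 0.8, "neutral_downstream": 0.1},
]
gap = sub_threshold_propagation_gap(rollouts)
```

Conditioning on summaries that pass the monitor is what separates this quantity from an ordinary toxicity comparison: a positive gap indicates influence that the deployed detector would not have caught.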

2605.16735 2026-05-19 cs.NI cs.LG (updated version)

Transformer-Based MCS Prediction for 5G Multicast-Broadcast Services (MBS)

Kasidis Arunruangsirilert, Jiro Katto

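Affiliations: Department of Computer Science and Communications Engineering; Waseda University

AI summary: This paper proposes a lightweight Transformer-based framework that predicts the success probability of all 28 MCS indices over an upcoming video segment horizon, improving the reliability of 5G Multicast-Broadcast Services.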

Comments 2026 IEEE 104th Vehicular Technology Conference (VTC2026-Fall), 6-9 September 2026, Boston, Massachusetts, USA

Abstract

5G Multicast-Broadcast Services (MBS) are emerging as a critical technology for spectrally efficient UHD content delivery and as a promising way to modernize CATV deployment. However, unlike unicast networks that rely on RLC-AM with HARQ retransmissions, MBS broadcast operates in RLC Unacknowledged Mode (RLC-UM), where the absence of a feedback loop means packet loss is permanent and immediately impacts user QoE. Conventional link adaptation algorithms, designed for unicast, typically maximize throughput aggressively and fail in this risk-intolerant environment, resulting in severe video stalls and rebuffering. To address this, we propose a lightweight Transformer-based framework that predicts the success probability of all 28 MCS indices over an upcoming video segment horizon. Utilizing a unique commercial network dataset with 0.5 ms slot-level granularity, we train our model using a custom Asymmetric Safety Loss function that penalizes channel overestimation to prioritize link stability. Experimental results show that our approach achieves a reliability score of 86.89%, significantly outperforming standard AI baselines optimized for raw throughput (31.65%) while maintaining a safe conservative bias. Furthermore, the model is optimized for real-time applications, demonstrating an inference time of less than 0.07 ms on COTS 5G-era smartphones.

2605.16732 2026-05-19 cs.CV cs.LG (updated version)

DiRotQ: Rotation-Aware Quantization for 4-bit Diffusion Transformers

Sayeh Sharify, Mahsa Salmani, Hesham Mostafa

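Affiliations: d-Matrix

AI summary: This paper proposes DiRotQ, a W4A4 quantization framework that uses rotation-aware activation quantization to mitigate the quality degradation of Diffusion Transformers at 4-bit precision. It also introduces a VLM-as-a-Judge evaluation protocol and a custom Triton kernel to improve quality assessment and efficiency under compression.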

Abstract

Diffusion Transformers (DiTs) achieve state-of-the-art image generation quality but incur substantial memory and computational costs at inference. While aggressive Post-Training Quantization (PTQ) to 4-bit precision offers significant efficiency gains, it typically results in severe quality degradation. Existing approaches, including smoothing-based methods, mixed-precision schemes, rotation techniques, and low-rank residual methods, partially mitigate this issue but still leave a noticeable gap to FP16/BF16 performance. In this work, we introduce DiRotQ, a W4A4 PTQ framework that mitigates this degradation through rotation-aware activation quantization. DiRotQ identifies a low-rank subspace capturing dominant activation variance via Principal Component Analysis (PCA), preserving coefficients in this subspace at higher precision while quantizing the remaining components to 4-bit. Activations are rotated into the PCA basis at inference time using calibration-derived orthogonal transformations, while the inverse rotation is fused into the layer weights offline. Combined with GPTQ-based weight quantization, DiRotQ achieves an FID (lower is better) of 15.9 and PSNR (higher is better) of 19.1 dB on PixArt-Σ over the MJHQ-30K dataset, outperforming the prior state-of-the-art SVDQuant (FID 18.9, PSNR 17.6) under the same INT W4A4 setting. Beyond standard metrics, we introduce a VLM-as-a-Judge evaluation protocol for diffusion model quantization, the first such evaluation in this setting, providing a more holistic assessment of perceptual quality and prompt alignment under aggressive compression. On the systems side, we implement a Triton-based custom kernel to enable efficient end-to-end inference, reducing memory usage of the 12B FLUX.1-dev model by 2.1x and delivering 2.3x speedup over the BF16 baseline, on a 24 GB RTX 4090 GPU.

2605.16720 2026-05-19 cs.CV cs.LG (updated version)

Identify Then Project: Contrastive Learning with Port-Hamiltonian Structure from Partial Observations

Peilun Li, Kaiyuan Tan, Daniel Moyer, Thomas Beckers

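Affiliations: Department of Computer Science

AI summary: This paper proposes a two-stage framework that learns latent state dynamics from partial observations via contrastive learning and then projects them onto a port-Hamiltonian submanifold to obtain a physically consistent realization.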

Abstract

Identifying latent state representations and dynamics is essential when direct modeling in observation space is infeasible, particularly under partial and high-dimensional observations. In such settings, representation learning and physics-aware modeling are inherently coupled. We study this problem for latent port-Hamiltonian systems, a structured class encompassing both conservative and dissipative dynamics. We propose a two-stage identify-then-project framework. First, a contrastive teacher learns continuous-time latent dynamics from partial observations. Then, a student projects the identified teacher representation and dynamics onto a port-Hamiltonian submanifold via a learned affine chart, yielding a physically consistent realization. As a conceptual counterfactual, we also consider a single-stage variant that jointly learns latent identification and port-Hamiltonian structure, but find it to be less reliable, motivating the proposed two-stage teacher-student framework. We show theoretically that affine projection is the natural bridge between the affine gauge of contrastive latent identification and the port-Hamiltonian systems. Empirically, we demonstrate that the proposed two-stage approach preserves the teacher's dynamics while enforcing physical structure, and performs more reliably than the single-stage alternative, particularly in dissipative regimes and high-dimensional visual settings.

2605.16672 2026-05-19 cs.CV cs.AI cs.LG (updated version)

Multi-Object Tracking Consistently Improves Wildlife Inference

Mufhumudzi Muthivhi, Jiahao Huo, Fredrik Gustafsson, Terence L. van Zyl

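Affiliations: World Wide Fund (WWF); Centre for Artificial Intelligence Research (CAIR)

AI summary: This paper uses multi-object tracking to make wildlife classification models more robust, fusing class probabilities along each trajectory to improve classification. Experiments show gains on all three datasets.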

Comments Accepted for publication in IEEE 2026 29th International Conference on Information Fusion

Abstract

Camera traps have become a common tool for wildlife monitoring in ecological research and biodiversity conservation. Wildlife classification models have benefited from the increase in wildlife visual data. These models reach high levels of accuracy on curated, high-quality datasets. However, their performance remains sensitive to real-world environmental constraints. They often produce inconsistent predictions when performing inference on temporally coherent sequences: the predicted label for a single individual shifts rapidly between frames. This study exploits the temporal nature of camera-trap data to augment inferred predictions from a wildlife classification model. Specifically, we adopt several standard Multi-Object Tracking (MOT) models to link detections across consecutive frames. The curated trajectories are used to fuse the softmax class probabilities. The fused probability score produces a single consensus class label estimate that overrides misclassifications caused by noise. The experimental results show that our proposed strategy improves over a standalone classifier across all datasets and for each metric. Specifically, the best-performing MOT models gain 5.1%, 3.1% and 2.0% in weighted F1-score over the classifier across three MOT datasets.
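The abstract says that softmax probabilities are fused along each trajectory but does not name the fusion rule. The sketch below assumes the simplest choice, an unweighted mean over the frames of one track followed by an argmax. The function name `fuse_track` and the example probabilities are hypothetical.

```python
def fuse_track(frame_probs):
    """Average per-frame softmax vectors of one track.

    Returns (fused_probs, consensus_class). Assumption: unweighted mean
    fusion; the paper may use a different rule.
    """
    n_frames = len(frame_probs)
    n_classes = len(frame_probs[0])
    fused = [sum(p[c] for p in frame_probs) / n_frames for c in range(n_classes)]
    consensus = max(range(n_classes), key=lambda c: fused[c])
    return fused, consensus


# One tracked animal seen in three frames; the middle frame is misclassified
# when each frame is taken on its own.
track = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.8, 0.1, 0.1],
]
per_frame_labels = [max(range(3), key=lambda c: p[c]) for p in track]
fused, consensus = fuse_track(track)
```

Here the per-frame labels disagree, while the fused vector assigns the whole track to a single class, which is the override behavior the abstract describes.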

2605.16671 2026-05-19 cs.AI cs.CV cs.CY cs.LG (updated version)

Sustainable Intelligence for the Wild: Democratizing Ecological Monitoring via Knowledge-Adaptive Edge Expert Agents

Jiaxing Li, Hao Fang, Chi Xu, Miao Zhang, Jiangchuan Liu, William I. Atlas, Katrina M. Connors, Mark A. Spoljaric

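Affiliations: Simon Fraser University; Wild Salmon Center; Pacific Salmon Foundation; Haida Fisheries Program

AI summary: This paper proposes a knowledge-adaptive edge agent architecture that separates visual perception from reasoning and combines a visual encoder with a dynamic knowledge base, supporting sustainable ecological monitoring and the co-development of ethical AI.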

Comments 10 pages

Abstract

Rapid biodiversity loss underscores the urgency of effective monitoring, yet manual surveys remain resource-intensive. While on-device AI offers a scalable alternative, its performance in the wild is often challenged by environmental variability. Current methods rely heavily on cloud resources, which requires continuous uploading of field data for model retraining. This approach is unsuitable for remote deployments because it consumes limited power and network connectivity. To address these constraints, this research proposes a shift from model adaptation to knowledge adaptation. We introduce an architecture that separates visual perception from reasoning, combining a visual encoder with a dynamic knowledge base. We use an explicit knowledge base instead of implicitly encoding expert knowledge in model parameters. This method also supports knowledge sustainability by preserving expert insights in a structured form. Through cross-disciplinary collaboration with biologists and Indigenous communities, this work advances ethical AI co-development, fostering responsible and culturally informed ecosystem management.

2605.16668 2026-05-19 cs.LG cs.AI (updated version)

GraViti: Graph-Level Variational Autoencoders with Relaxed Permutation Invariance

Roman Bresson, Konstantinos Divriotis, Johannes F. Lutzeyer, Iakovos Evdaimon, Michalis Vazirgiannis

Affiliations: Mohamed bin Zayed University of Artificial Intelligence; LIX, CNRS, École Polytechnique, IP Paris

AI summary: GraViti uses a graph-level variational autoencoder to produce compact latent vectors that support smooth interpolation and downstream tasks beyond what node-level embeddings allow.

Abstract

We introduce GraViti, a transformer-based graph-level variational autoencoder that maps entire graphs to compact latent vectors. This design produces a true graph-level latent space that supports smooth interpolation, property-guided search, and other downstream tasks beyond the constraints of node-level embeddings. On molecular benchmarks, GraViti learns to decode valid samples that follow the chemical constraints present in the training data, showing that the model recovers domain rules directly from graph-level representations. We also show that, in domains where a reliable canonical node ordering exists, such as molecules or Bayesian networks, enforcing permutation invariance can prove detrimental to consistent reconstruction. GraViti achieves state-of-the-art reconstruction accuracy on large datasets and provides solid generative performance. Its single-step decoding offers a lightweight alternative to more complex generation pipelines while maintaining practical sample quality.

2605.16665 2026-05-19 cs.LG physics.geo-ph (updated version)

KamonBench: A Grammar-Based Dataset for Evaluating Compositional Factor Recovery in Vision-Language Models

Richard Sproat, Stefano Peluchetti

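AI summary: KamonBench provides 20,000 synthetic composite crests together with auxiliary component examples, offering a controllable testbed for evaluating sparse compositional recognition and factor recovery in vision-language models, with support for program-code factor metrics and controlled recombination of factor pairs.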

Comments Preprint

2605.12991 2026-05-19 cs.LG cs.AI (updated version)

Not Just RLHF: Why Alignment Alone Won't Fix Multi-Agent Sycophancy

Adarsh Kumarappan, Ananya Mujoo

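Affiliations: California Institute of Technology; Evergreen Valley College

AI summary: This paper studies how often multi-agent systems flip to incorrect answers under simulated peer disagreement. Pretrained base models show the same substitution pattern as their Instruct variants, with higher average yield. Activation patching localizes the corruption to a narrow mid-layer window, and patching above that window restores most of the clean-to-pressured P(correct) gap. The study also finds that pressure suppresses clean-reasoning features rather than activating a new sycophancy circuit.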

Abstract

LLM-based multi-agent pipelines flip from correct to incorrect answers under simulated peer disagreement at rates we term yield, a vulnerability widely attributed to RLHF-induced sycophancy. We test this attribution across four model families and find it largely wrong: pretrained base models exhibit the same substitution pattern as their Instruct variants, averaging higher yield than Instruct. Using activation patching, we localize the corruption to a narrow mid-layer window where attention carries the causal weight and MLP contribution is negligible; patching above this window restores 96% of the clean-to-pressured P(correct) gap. The attack surface decomposes into two independent factors (channel framing and consensus strength) whose interaction produces a 47.5 percentage-point yield gap at majority consensus, preserved across jury sizes $N \in \{4, 5, 6\}$. Two converging activation-space interventions show that pressure suppresses clean-reasoning features rather than activating a new sycophancy circuit. A single correctly-arguing dissenter reduces yield by 54-73 percentage points across all framings tested, whereas the strongest prompt-level defense fails on attack variants outside its design surface. Mitigations should target the mechanism, structured dissent at the pipeline level, rather than prompt-level defenses.

2605.12825 2026-05-19 cs.LG cs.AI (updated version)

Orthrus: Memory-Efficient Parallel Token Generation via Dual-View Diffusion

Chien Van Nguyen, Chaitra Hegde, Van Cuong Pham, Ryan A. Rossi, Franck Dernoncourt, Thien Huu Nguyen

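Affiliations: University of Oregon; Google DeepMind; Adobe Research

AI summary: Orthrus combines the high-fidelity generation of autoregressive large language models with the fast parallel generation of diffusion models. A dual-view mechanism enables efficient inference, with up to a 7.8x speedup and very low memory overhead.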

Abstract

We introduce Orthrus, a simple and efficient dual-architecture framework that unifies the exact generation fidelity of autoregressive Large Language Models (LLMs) with the high-speed parallel token generation of diffusion models. The sequential nature of standard autoregressive decoding represents a fundamental bottleneck for high-throughput inference. While diffusion language models attempt to break this barrier via parallel generation, they suffer from significant performance degradation, high training costs, and a lack of rigorous convergence guarantees. Orthrus resolves this dichotomy natively. Designed to seamlessly integrate into existing Transformers, the framework augments a frozen LLM with a lightweight, trainable module to create a parallel diffusion view alongside the standard autoregressive view. In this unified system, both views attend to the exact same high-fidelity Key-Value (KV) cache; the autoregressive head executes context pre-filling to construct accurate KV representations, while the diffusion head executes parallel generation. By employing an exact consensus mechanism between the two views, Orthrus guarantees lossless inference, delivering up to a 7.8x speedup with only an O(1) memory cache overhead and minimal parameter additions.

2605.12547 2026-05-19 econ.EM cs.LG q-fin.ST stat.AP (updated version)

The Payment Heterogeneity Index: An Integrated Unsupervised Framework for High-Volume Procurement Oversight and Decision Support

Kyriakos Christodoulides

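Affiliations: Philips University, Department of Computer Science

AI summary: This paper proposes the Payment Heterogeneity Index (PHI), which combines Gaussian Mixture Model parameters with non-parametric statistics to support high-volume procurement oversight and decision support by characterizing payment structure and latent regimes.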

Comments Request category change from econ.EM -> stat.ML. Paper is methodological, introducing a new unsupervised ML/stat framework (SHI/PHI index) for distributional structure. Methodology is general; procurement is the application. stat.ML is more appropriate primary; econ.EM as cross-list

Abstract

Public procurement is vulnerable to error, fraud, and corruption, particularly as high transaction volumes overwhelm oversight. While research often focuses on tender-stage anomalies, post-award payment monitoring remains underexplored. Since labelled datasets are rare and methods like Benford's Law face restrictive assumptions, there is a need for interpretable, unsupervised frameworks for high-volume procurement oversight and decision support. This paper introduces the Structural Heterogeneity Index (SHI), a composite statistic for one-dimensional samples, and its payment-specific instantiation, the Payment Heterogeneity Index (PHI), characterising payment structure and latent regimes. It incorporates Gaussian Mixture Model (GMM) parameters alongside non-parametric statistics, integrating four interpretable components: modality, asymmetry, tail behaviour, and structural dispersion. Uniquely, the tail-behaviour component captures both distributional heaviness and extreme-value concentration, while structural-dispersion combines the variability, prevalence, and separation of latent payment regimes. Applied to UK municipal procurement data, PHI identifies a financially significant cohort (0.6% of suppliers; 10.1% of high-volume vendors) with structurally distinct payment patterns. Statistical testing further supports these differences, and targeted human verification confirms the plausibility of prioritised cases. Comparative analysis shows PHI reveals regime separation obscured by the Coefficient of Variation ($\rho = 0.310$). PHI provides a transparent, decomposable, and computationally lightweight framework for procurement integrity oversight and targeted audit prioritisation.

2605.12070 2026-05-19 cs.LG cs.AI 版本更新

AutoLLMResearch: 训练研究代理以自动化LLM实验配置 - 从低成本学习，优化高成本

Taicheng Guo, Nitesh V. Chawla, Olaf Wiest, Xiangliang Zhang

发表机构 * University of Notre Dame（圣母大学）

AI总结本文提出AutoLLMResearch框架，通过多保真度实验环境学习LLM配置原则，解决高成本实验自动化问题，展示其在大规模LLM实验中的有效性与通用性。

详情

AI中文摘要

有效配置可扩展的大规模语言模型（LLM）实验，涵盖架构设计、超参数调优等，对推进LLM研究至关重要，因为糟糕的配置选择会浪费大量计算资源并阻碍模型潜力的实现。以往的自动化方法适用于低成本环境，但可扩展的LLM实验成本过高，无法进行大量迭代。为了解决这一问题，我们提出AutoLLMResearch，一个模仿人类研究人员从低保真度实验中学习一般性原则并高效识别高成本LLM配置的代理框架。核心挑战是如何使代理通过与多保真度实验环境的交互学习LLM配置景观的结构。为此，我们提出一个系统框架，包含两个关键组件：1) LLMConfig-Gym，涵盖四个关键LLM实验任务的多保真度环境，支持超过一百万GPU小时的可验证实验结果；2) 一个结构化训练管道，将配置研究建模为长周期马尔可夫决策过程，并相应地激励跨保真度外推推理。在各种强基线上的广泛评估表明了我们框架的有效性、通用性和可解释性，支持其作为大规模现实LLM实验自动化的实用且通用解决方案的潜力。

英文摘要

Effectively configuring scalable large language model (LLM) experiments, spanning architecture design, hyperparameter tuning, and beyond, is crucial for advancing LLM research, as poor configuration choices can waste substantial computational resources and prevent models from realizing their full potential. Prior automated methods are designed for low-cost settings where repeated trial and error is feasible, but scalable LLM experiments are too expensive for such extensive iteration. To our knowledge, no work has addressed the automation of high-cost LLM experiment configurations, leaving this problem labor-intensive and dependent on expert intuition. Motivated by this gap, we propose AutoLLMResearch, an agentic framework that mimics how human researchers learn generalizable principles from low-fidelity experiments and extrapolate to efficiently identify promising configurations in expensive LLM settings. The core challenge is how to enable an agent to learn, through interaction with a multi-fidelity experimental environment that captures the structure of the LLM configuration landscape. To achieve this, we propose a systematic framework with two key components: 1) LLMConfig-Gym, a multi-fidelity environment encompassing four critical LLM experiment tasks, supported by over one million GPU hours of verifiable experiment outcomes; 2) A structured training pipeline that formulates configuration research as a long-horizon Markov Decision Process and accordingly incentivizes cross-fidelity extrapolation reasoning. Extensive evaluation against diverse strong baselines on held-out experiments demonstrates the effectiveness, generalization, and interpretability of our framework, supporting its potential as a practical and general solution for scalable real-world LLM experiment automation.

2605.11480 2026-05-19 cs.LG 版本更新

Efficient Adjoint Matching for Fine-tuning Diffusion Models

用于扩散模型微调的高效伴随匹配

Jeongwoo Shin, Dongsoo Shin, Yuchen Zhu, Wei Guo, Yongxin Chen, Joonseok Lee, Jaewoong Choi, Jaemoo Choi

发表机构 * Seoul National University（首尔国立大学）； Georgia Institute of Technology（佐治亚理工学院）； Sungkyunkwan University（成均馆大学）

AI总结本文提出高效伴随匹配（EAM），通过改用线性基础漂移并相应修改终端成本，解决伴随匹配在扩散模型微调中的计算瓶颈，收敛速度最高可达AM的4倍，并在多项指标上与AM持平或更优。

详情

AI中文摘要

奖励微调已成为在文本到图像生成中使预训练扩散模型和流模型与人类偏好对齐的常用方法。在基于奖励梯度的方法中，伴随匹配（AM）将奖励微调表述为随机最优控制（SOC）问题，提供了一种有原则的形式化。然而，AM不可避免地带来高昂的计算成本：它需要（i）在无记忆动态下对完整生成轨迹进行随机模拟，导致大量的函数评估；以及（ii）沿每条采样轨迹对伴随状态进行反向ODE模拟。在本工作中，我们观察到这两个瓶颈都与从预训练模型继承的非平凡基础漂移密切相关。受此启发，我们提出高效伴随匹配（EAM），通过采用线性基础漂移和相应修改的终端成本重新表述SOC问题，大幅提高训练效率。这一重新表述消除了两种低效来源：它使训练时采样可以使用少步确定性ODE求解器，并给出闭式伴随解，从而无需反向伴随模拟。在标准的文本到图像奖励微调基准上，EAM的收敛速度最高可达AM的4倍，并在PickScore、ImageReward、HPSv2.1、CLIPScore和Aesthetics等多项指标上与AM持平或更优。

英文摘要

Reward fine-tuning has become a common approach for aligning pretrained diffusion and flow models with human preferences in text-to-image generation. Among reward-gradient-based methods, Adjoint Matching (AM) provides a principled formulation by casting reward fine-tuning as a stochastic optimal control (SOC) problem. However, AM inevitably requires a substantial computational cost: it requires (i) stochastic simulation of full generative trajectories under memoryless dynamics, resulting in a large number of function evaluations, and (ii) backward ODE simulation of the adjoint state along each sampled trajectory. In this work, we observe that both bottlenecks are closely tied to the \textit{non-trivial base drift} inherited from the pretrained model. Motivated by this observation, we propose \textbf{Efficient Adjoint Matching (EAM)}, which substantially improves training efficiency by reformulating the SOC problem with a \textit{linear base drift} and a correspondingly modified \textit{terminal cost}. This reformulation removes both sources of inefficiency; it enables training-time sampling with a few-step deterministic ODE solver and yields a closed-form adjoint solution that eliminates backward adjoint simulation. On standard text-to-image reward fine-tuning benchmarks, EAM converges up to 4x faster than AM and matches or surpasses it across various metrics including PickScore, ImageReward, HPSv2.1, CLIPScore and Aesthetics.

2605.10923 2026-05-19 cs.LG cs.CL 版本更新

Dynamic Skill Lifecycle Management for Agentic Reinforcement Learning

动态技能生命周期管理用于代理强化学习

Junhao Shen, Teng Zhang, Xiaoyan Zhao, Hong Cheng

发表机构 * Database Group, The Chinese University of Hong Kong（香港中文大学数据库组）； Lanzhou University（兰州大学）

AI总结本文提出SLIM框架，通过动态优化变量管理代理强化学习中的外部技能集，提升任务性能。

Comments Implementation code is available at https://github.com/ejhshen/SLIM

详情

AI中文摘要

大型语言模型代理越来越多地依赖外部技能来解决复杂任务，其中技能作为模块化单元扩展其能力。现有方法假设外部技能要么积累为持久指导或内化到策略中，最终导致零技能推断。本文认为这一假设过于限制，因为参数容量有限且不同技能的边际贡献不均，最优活跃技能集是非单调、任务和阶段依赖的。本文提出SLIM，一种动态技能生命周期管理框架，将活跃的外部技能集作为动态优化变量与策略学习共同更新。具体而言，SLIM通过留一技能验证估计每个活跃技能的边际外部贡献，然后应用三种生命周期操作：保留高价值技能、退役贡献变得微不足道的技能、以及在持续失败揭示缺失能力覆盖时扩展技能库。实验显示，SLIM在ALFWorld和SearchQA上平均比最佳基线高出7.1个百分点。结果进一步表明，策略学习和外部技能保留并非互斥：某些技能被吸收进策略，而其他技能继续提供外部价值，支持SLIM作为基于技能的代理强化学习更通用的范式。

英文摘要

Large language model agents increasingly rely on external skills to solve complex tasks, where skills act as modular units that extend their capabilities beyond what parametric memory alone supports. Existing methods assume external skills either accumulate as persistent guidance or are internalized into the policy, eventually leading to zero-skill inference. We argue this assumption is overly restrictive: with limited parametric capacity and uneven marginal contribution across skills, the optimal active skill set is non-monotonic and both task- and stage-dependent. In this work, we propose SLIM, a framework of dynamic Skill LIfecycle Management for agentic reinforcement learning (RL), which treats the active external skill set as a dynamic optimization variable jointly updated with policy learning. Specifically, SLIM estimates each active skill's marginal external contribution through leave-one-skill-out validation, then applies three lifecycle operations: retaining high-value skills, retiring skills whose contribution becomes negligible after sufficient exposure, and expanding the skill bank when persistent failures reveal missing capability coverage. Experiments show that SLIM outperforms the best baselines by an average of 7.1 percentage points across ALFWorld and SearchQA. Results further indicate that policy learning and external skill retention are not mutually exclusive: some skills are absorbed into the policy, while others continue to provide external value, supporting SLIM as a more general paradigm for skill-based agentic RL.
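
The leave-one-skill-out idea in this abstract can be illustrated with a toy sketch. This is not the SLIM implementation: the function names, the `retire_below` threshold, and the additive `toy_evaluate` scorer are all hypothetical, and the "expand the skill bank" operation is omitted because the abstract does not say how new skills are proposed.

```python
def marginal_contributions(skills, evaluate):
    """Leave-one-skill-out: score with all skills minus score without each one."""
    full_score = evaluate(frozenset(skills))
    return {s: full_score - evaluate(frozenset(skills) - {s}) for s in skills}


def lifecycle_step(skills, evaluate, retire_below=0.01):
    """Retain skills whose marginal contribution clears a (hypothetical) threshold."""
    contrib = marginal_contributions(skills, evaluate)
    retained = {s for s, c in contrib.items() if c >= retire_below}
    retired = set(skills) - retained
    return retained, retired


# Hypothetical validation scorer: a base success rate plus a fixed gain per skill.
TOY_VALUE = {"search": 0.30, "parse_table": 0.12, "greet": 0.0}


def toy_evaluate(active_skills):
    return 0.4 + sum(TOY_VALUE[s] for s in active_skills)


retained, retired = lifecycle_step(set(TOY_VALUE), toy_evaluate)
```

In a real agent the scorer would be a validation rollout, so contributions would be noisy and would interact across skills; the additive scorer here only keeps the example deterministic.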

2605.10759 2026-05-19 cs.LG cs.CV 版本更新

Reinforce Adjoint Matching: Scaling RL Post-Training of Diffusion and Flow-Matching Models

Reinforce伴随匹配：扩展扩散模型与流匹配模型的强化学习后训练

Andreas Bergmeister, Stefanie Jegelka, Nikolas Nüsken, Carles Domingo-Enrich, Jakiw Pidstrigach

发表机构 * MIT CSAIL（麻省理工学院计算机科学与人工智能实验室）； King's College London（伦敦国王学院）； Microsoft Research New England（微软研究院新英格兰分部）； University of Oxford（牛津大学）

AI总结本文提出Reinforce Adjoint Matching（RAM）方法，用于扩散模型和流匹配模型的强化学习后训练，无需SDE rollout、反向伴随扫描或奖励梯度，并以远少于Flow-GRPO的训练步数达到其峰值奖励。

详情

AI中文摘要

扩散模型和流匹配模型之所以能够扩展，是因为预训练是监督回归：对干净样本以解析方式加噪，模型再对闭式目标进行回归。强化学习后训练使模型与奖励对齐。在图像生成中，这使样本能够正确组合物体、清晰渲染文字并符合人类偏好。现有方法依赖成本高昂的SDE rollout、奖励梯度或替代损失，牺牲了预训练的回归结构。我们证明这一结构可以延伸到强化学习后训练。在KL正则化的奖励最大化下，最优生成过程使干净端点分布向奖励更高的样本倾斜，而加噪规律保持不变。将这一点与伴随匹配最优性条件和REINFORCE恒等式相结合，我们推导出Reinforce Adjoint Matching（RAM）：一种用奖励修正预训练目标的一致性损失。在每一步，我们从当前模型中采样一个干净端点，评估其奖励，按预训练的方式加噪，然后进行回归。无需SDE rollout、反向伴随扫描或奖励梯度。与预训练目标一样，RAM简单且可扩展。在Stable Diffusion 3.5M上，RAM在组合性、文字渲染和人类偏好方面取得最高奖励，达到Flow-GRPO峰值奖励所需的训练步数最多可减少50倍。

英文摘要

Diffusion and flow-matching models scale because pretraining is supervised regression: a clean sample is noised analytically, and a model regresses against a closed-form target. RL post-training aligns the model with a reward. In image generation, this makes samples compose objects correctly, render text legibly, and match human preferences. Existing methods rely on costly SDE rollouts, reward gradients, or surrogate losses, sacrificing pretraining's regression structure. We show that the structure extends to RL post-training. Under KL-regularized reward maximization, the optimal generative process tilts the clean-endpoint distribution towards samples with higher reward and leaves the noising law unchanged. Combining this with the adjoint-matching optimality condition and a REINFORCE identity, we derive Reinforce Adjoint Matching (RAM): a consistency loss that corrects the pretraining target with the reward. At each step, we draw a clean endpoint from the current model, evaluate its reward, noise it as in pretraining, and regress. No SDE rollouts, backward adjoint sweeps, or reward gradients are required. Like the pretraining objective, RAM is simple and scales. On Stable Diffusion 3.5M, RAM achieves the highest reward on composability, text rendering, and human preference, reaching Flow-GRPO's peak reward in up to $50\times$ fewer training steps.
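
The "tilting" statement can be made concrete on a finite sample space. The sketch below assumes the standard closed form for KL-regularized reward maximization, p*(x) ∝ p(x)·exp(r(x)/β); the abstract does not spell out this formula, so treat it as the textbook result rather than the paper's exact statement. The three-point distribution, the rewards, and β are made-up illustration values.

```python
import math


def tilt(base_probs, rewards, beta):
    """Exponentially tilt a base distribution: p*(x) ∝ p(x) * exp(r(x) / beta)."""
    weights = [p * math.exp(r / beta) for p, r in zip(base_probs, rewards)]
    total = sum(weights)
    return [w / total for w in weights]


def objective(q, base_probs, rewards, beta):
    """Expected reward under q minus beta * KL(q || base)."""
    expected_reward = sum(qi * ri for qi, ri in zip(q, rewards))
    kl = sum(qi * math.log(qi / pi) for qi, pi in zip(q, base_probs) if qi > 0)
    return expected_reward - beta * kl


base = [0.5, 0.3, 0.2]      # hypothetical pretrained distribution over 3 samples
rewards = [0.0, 1.0, 2.0]   # hypothetical rewards
beta = 1.0                  # hypothetical KL regularization strength

tilted = tilt(base, rewards, beta)
# At the optimum the objective equals beta * log E_base[exp(r / beta)].
log_partition = beta * math.log(
    sum(p * math.exp(r / beta) for p, r in zip(base, rewards))
)
```

The tilted distribution moves mass toward the highest-reward sample while staying absolutely continuous with respect to the base, which is the behavior the abstract describes for the clean-endpoint distribution.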

2605.09395 2026-05-19 cs.AI cs.LG cs.MA cs.MM 版本更新

Empowering VLMs for Few-Shot Multimodal Time Series Classification via Tailored Agentic Reasoning

通过定制代理推理增强VLMs在少样本多模态时间序列分类中的能力

Lin Li, Jiawei Huang, Qihao Quan, Dan Li, Boxin Li, Xiao Zhang, Erli Meng, Wenjie Feng, Jian Lou, See-Kiong Ng

发表机构 * Sun Yat-sen University（中山大学）； Xiaomi Corporation（小米公司）； University of Science and Technology of China（中国科学技术大学）； National University of Singapore（新加坡国立大学）

AI总结本文提出MarsTSC框架，通过自演化知识库和代理推理提升少样本多模态时间序列分类性能，实验表明其在六种VLM骨干模型上均优于经典基线和基于基础模型的基线。

Comments 18 pages, 12 figures, 6 tables. Preprint

详情

AI中文摘要

本文提出首个用于少样本多模态时间序列分类的VL$\underline{\textbf{M}}$ $\underline{\textbf{a}}$gentic $\underline{\textbf{r}}$easoning框架（MarsTSC），该框架引入自演化知识库作为动态上下文，并通过反思式代理推理对其迭代优化。框架包含三个协作角色：i) 生成器通过推理进行可靠分类；ii) 反思器诊断推理错误的根本原因，得到针对生成器所忽略的时间特征的判别性见解；iii) 修改器将经过验证的更新应用于知识库，以防止上下文崩溃。我们进一步引入测试时更新策略，实现谨慎、持续的知识库优化，以缓解少样本偏差和分布偏移。在12个主流时间序列基准上的大量实验表明，MarsTSC在6种VLM骨干模型上均取得显著且一致的性能提升，在少样本条件下优于经典的以及基于基础模型的时间序列基线，同时生成可解释的推理依据，使每个分类决策都建立在人类可读的特征证据之上。

英文摘要

In this paper, we propose the first VL$\underline{\textbf{M}}$ $\underline{\textbf{a}}$gentic $\underline{\textbf{r}}$easoning framework for few-$\underline{\textbf{s}}$hot multimodal $\underline{\textbf{T}}$ime $\underline{\textbf{S}}$eries $\underline{\textbf{C}}$lassification ($\textbf{MarsTSC}$), which introduces a self-evolving knowledge bank as a dynamic context iteratively refined via reflective agentic reasoning. The framework comprises three collaborative roles: i) Generator conducts reliable classification via reasoning; ii) Reflector diagnoses the root causes of reasoning errors to yield discriminative insights targeting the temporal features overlooked by Generator; iii) Modifier applies verified updates to the knowledge bank to prevent context collapse. We further introduce a test-time update strategy to enable cautious, continuous knowledge bank refinement to mitigate few-shot bias and distribution shift. Extensive experiments across 12 mainstream time series benchmarks demonstrate that $\textbf{MarsTSC}$ delivers substantial and consistent performance gains across 6 VLM backbones, outperforming both classical and foundation model-based time series baselines under few-shot conditions, while producing interpretable rationales that ground each classification decision in human-readable feature evidence.

超越线性注意力：Softmax Transformer实现上下文强化学习

Zixuan Xie, Xinyu Liu, Claire Chen, Shuze Daniel Liu, Rohan Chandra, Shangtong Zhang

发表机构 * University of Virginia（弗吉尼亚大学）； California Institute of Technology（加州理工学院）； Purdue University（普渡大学）

AI总结本文研究了在预训练后通过上下文适应新任务的强化学习代理，通过softmax注意力机制证明了Transformer层前向传递等价于加权softmax时序差分算法的迭代更新，并证明了参数在预训练损失中的全局极小性。

详情

AI中文摘要

上下文强化学习（ICRL）研究的是这样一类智能体：在预训练之后，它们通过以额外上下文为条件来适应新任务，而无需更新参数。现有的ICRL理论分析大多依赖线性注意力，即用恒等映射替代标准注意力中的softmax函数。本文首次在不采用这种不切实际的线性注意力简化的情况下，给出了对ICRL的理论理解。具体而言，我们考虑实践中使用的标准softmax注意力。我们证明，在特定参数下，采用这种softmax注意力的Transformer的逐层前向传递等价于加权softmax时序差分（TD）学习算法的迭代更新。这里，加权softmax TD是一种新的强化学习算法，它在核空间中进行策略评估，并以线性TD和表格型TD为特例。我们还证明，在一定的收缩条件下，采用上述参数时，策略评估误差随层数增加而衰减。最后，我们证明这些参数是预训练损失的全局极小点，从而解释了它们在数值实验中的出现。

英文摘要

In-context reinforcement learning (ICRL) studies agents that, after pretraining, adapt to new tasks by conditioning on additional context without parameter updates. Existing theoretical analyses of ICRL largely rely on linear attention, which replaces the softmax function in the standard attention with an identity mapping. This paper provides the first theoretical understanding of ICRL without making the unrealistic linear attention simplification. In particular, we consider the standard softmax attention used in practice. We show that, with certain parameters, the layerwise forward pass of a Transformer with such softmax attention is equivalent to iterative updates of a weighted softmax temporal difference (TD) learning algorithm. Here, weighted softmax TD is a new RL algorithm that performs policy evaluation in kernel space and adopts both linear TD and tabular TD as special cases. We also prove that under a certain contraction condition, the policy evaluation error decays as the number of layers grows, with the identified parameters above. Finally, we prove that those parameters are a global minimizer of a pretraining loss, explaining their emergence in our numerical experiments.
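
For readers unfamiliar with the special cases named above, here is a minimal sketch of classical tabular TD(0) policy evaluation. It is not the paper's weighted softmax TD; the two-state deterministic chain, the discount factor, and the step size are illustration choices.

```python
def td0_policy_evaluation(transitions, gamma, alpha, num_steps, start_state=0):
    """Tabular TD(0): V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)).

    `transitions` maps each state to (next_state, reward) under a fixed policy.
    """
    values = {state: 0.0 for state in transitions}
    state = start_state
    for _ in range(num_steps):
        next_state, reward = transitions[state]
        td_error = reward + gamma * values[next_state] - values[state]
        values[state] += alpha * td_error
        state = next_state
    return values


# Hypothetical two-state cycle: 0 -> 1 with reward 1, then 1 -> 0 with reward 0.
TOY_CHAIN = {0: (1, 1.0), 1: (0, 0.0)}

values = td0_policy_evaluation(TOY_CHAIN, gamma=0.5, alpha=0.1, num_steps=5000)
# Bellman equations: V0 = 1 + 0.5 * V1 and V1 = 0.5 * V0, so V0 = 4/3, V1 = 2/3.
```

Because the chain is deterministic, the iterates converge to the exact Bellman solution; with stochastic transitions a constant step size would leave residual noise around it.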

2605.07098 2026-05-19 cs.LG physics.comp-ph 版本更新

CarCrashNet: A Large-Scale Dataset and Hierarchical Neural Solver for Data-Driven Structural Crash Simulation

CarCrashNet：一个大规模数据集和分层神经求解器用于数据驱动的结构碰撞仿真

Mohamed Elrefaie, Dule Shu, Matthew Klenk, Faez Ahmed

发表机构 * MIT（麻省理工学院）； Toyota Research Institute（丰田研究院）； Future Product Innovation（未来产品创新）

AI总结本文提出CarCrashNet数据集和分层神经求解器，用于数据驱动的结构碰撞仿真，包含14000多次保险杠横梁柱撞仿真和825次整车碰撞仿真，并用实验数据和商业求解器验证了基于OpenRadioss的开源工作流，同时对基于机器学习的求解器进行了基准评测。

详情

AI中文摘要

碰撞仿真是现代汽车开发的基石，因为它减少了对昂贵物理样车的需求，加速了以安全为导向的设计迭代，并越来越多地支持虚拟测试流程。与此同时，结构碰撞力学的建模仍然极具挑战性：其响应由非线性接触、大变形、材料塑性、失效以及复杂的多体相互作用共同决定，这些作用在高分辨率有限元网格上随空间和时间演化。在本工作中，我们介绍CarCrashNet，一个面向数据驱动结构碰撞仿真的公开、高保真、开源基准。CarCrashNet以多模态格式结合了部件级和整车级仿真，包括超过14000次几何形状、材料和边界条件各异的保险杠横梁柱撞仿真，以及825次整车碰撞仿真；后者基于三种结构复杂度依次递增的行业标准车辆模型：Dodge Neon、Toyota Yaris和Chevrolet Silverado。为确立该基准的可靠性，我们将基于OpenRadioss的开源有限元工作流与实验碰撞数据和商业求解器Ansys LS-DYNA进行了对比验证。我们还提出CrashSolver，一种面向整车碰撞预测、基于高分辨率有限元碰撞数据的机器学习模型。我们进一步在已发布的数据集上进行了广泛的基准测试，并将CrashSolver与最先进的几何深度学习方法和基于Transformer的神经求解器进行了比较。我们的结果表明，CarCrashNet可作为结构仿真、耐撞性建模和AI驱动的虚拟碰撞测试领域可复现研究的基础。数据集可在https://github.com/Mohamedelrefaie/CarCrashNet获取。

英文摘要

Crash simulation is a cornerstone of modern vehicle development because it reduces the need for costly physical prototypes, accelerates safety-driven design iteration, and increasingly supports virtual testing workflows. At the same time, modeling structural crash mechanics remains exceptionally challenging: the response is governed by nonlinear contact, large deformation, material plasticity, failure, and complex multi-body interactions evolving over space and time on high-resolution finite-element meshes. In this work, we introduce CarCrashNet, a public high-fidelity open-source benchmark for data-driven structural crash simulation. CarCrashNet combines component-scale and full-vehicle simulations in a multi-modal format, including more than 14,000 bumper-beam pole-impact simulations with varying geometry, materials, and boundary conditions, together with 825 full-vehicle crash simulations built from three industry-standard vehicle models of increasing structural complexity: Dodge Neon, Toyota Yaris, and Chevrolet Silverado. To establish the reliability of the benchmark, we validate our open-source finite-element workflow based on OpenRadioss against both experimental crash data and the commercial solver Ansys LS-DYNA. We also introduce CrashSolver, a machine-learning model designed for full-vehicle crash prediction from high-resolution finite-element crash data. We further perform extensive benchmarking across the released datasets and evaluate CrashSolver against state-of-the-art geometric deep learning and transformer-based neural solvers. Our results position CarCrashNet as a foundation for reproducible research in structural simulation, crashworthiness modeling, and AI-driven virtual crash testing. The dataset is available at https://github.com/Mohamedelrefaie/CarCrashNet.

2605.07005 2026-05-19 cs.DS cs.LG 版本更新

Equivalence of Coarse and Fine-Grained Models for Learning with Distribution Shift

粗粒度与细粒度模型在存在分布偏移学习中的等价性

Adam R. Klivans, Shyamal Patel, Konstantinos Stavropoulos, Arsen Vasilyan

发表机构 * UT Austin（得克萨斯大学奥斯汀分校）； Columbia University（哥伦比亚大学）

AI总结本文研究了分布无关设定下粗粒度与细粒度分布偏移学习模型的等价性，给出了从PQ学习到TDS学习的高效黑盒归约，并表明借助成员查询可以绕过困难性结果，实现半空间的分布无关PQ可学习性。

Comments 26 pages, Accepted to COLT 2026

详情

AI中文摘要

近期关于分布偏移下可证明高效学习算法的研究集中于两种模型：PQ学习（Goldwasser等人，2020）和TDS学习（Klivans等人，2024）。TDS学习算法在检测到分布偏移时可以拒绝整个测试集；相比之下，PQ学习器只能逐点拒绝被判定为分布外的样本。我们的主要结果是，在分布无关设定下，这两种模型之间存在令人意外的等价性。具体来说，对于任意布尔概念类，我们给出了从PQ学习到TDS学习的高效黑盒归约。这一等价性给出了首批针对半空间等基本概念类的分布无关TDS学习的困难性结果。支撑该等价性的主要技术贡献是一种借助分支程序，对已拒绝目标域的TDS学习器的弱区分能力进行提升（boosting）的方法。我们还证明，为学习器提供成员查询可以绕过这些困难性结果，从而实现半空间的高效、分布无关的PQ可学习性。我们的算法通过对训练数据连续施加Forster变换，迭代地恢复大间隔分离器。

英文摘要

Recent work on provably efficient algorithms for learning with distribution shift has focused on two models: PQ learning (Goldwasser et al. (2020)) and TDS learning (Klivans et al. (2024)). Algorithms for TDS learning are allowed to reject a test set entirely if distribution shift is detected. In contrast, PQ learners may only reject points that are deemed out-of-distribution on an individual basis. Our main result is a surprising equivalence between these two models in the distribution-free setting. In particular, we give an efficient black-box reduction from PQ learning to TDS learning for any Boolean concept class. This equivalence implies the first hardness results for distribution-free TDS learning of basic classes such as halfspaces. The main technical contribution underlying our equivalence is a method for boosting, via branching programs, the weak distinguishing power of TDS learners that have rejected the target domain. We also show that giving a learner access to membership queries sidesteps these hardness results and allows for efficient, distribution-free PQ learnability of halfspaces. Our algorithm iteratively recovers large-margin separators obtained by applying successive Forster transforms on the training data.

2605.06017 2026-05-19 cs.LG math.PR 版本更新

Matrix-Decoupled Concentration for Autoregressive Sequences: Dimension-Free Guarantees for Sparse Long-Context Rewards

自回归序列的矩阵解耦集中：稀疏长上下文奖励的维度无关保证

Pei-Sen Li

发表机构 * School of Mathematics and Statistics, Beijing Institute of Technology, Beijing 100872, China（北京理工大学数学与统计学院，北京 100872，中国）

AI总结本文提出矩阵解耦集中（MDC）框架，以因果依赖预解矩阵与目标敏感性向量的精确矩阵-向量乘积刻画集中界，解决经典不等式将依赖结构与目标敏感性割裂的问题，给出维度无关的方差代理，并为长上下文推理的稳定性提供保证。

详情

AI中文摘要

自回归大语言模型（LLM）的序列级评估依赖于高度相关的token生成过程。由于现有框架存在两个根本性瓶颈，为这类过程建立紧的集中界仍是一项挑战：(i) 经典不等式通常将依赖结构与目标敏感性割裂开来，导致标量坍缩，使稀疏终端奖励的方差代理膨胀到次优的$\mathcal{O}(N)$；(ii) 反之，某些空间方法虽能得到更紧的界，但缺少序列生成所要求的严格因果滤子，因而无法用于自回归设定。为解决这两个瓶颈，我们针对相依序列建立了一个尖锐的McDiarmid型不等式，该不等式严格由因果依赖预解矩阵与目标敏感性向量的精确矩阵-向量乘积所控制。这一矩阵解耦集中（MDC）框架可自然地恢复马尔可夫链的最优常数，并利用有向$d$-分离得到因果树上的阶最优界。关键在于，MDC在严格因果的框架内精确保留了奖励的逐坐标稀疏性，从数学上避免了标量坍缩，保证了维度无关的$\mathcal{O}(1)$方差代理，并为长上下文推理的稳定性提供了严格的数学依据。

英文摘要

Sequence-level evaluations in autoregressive Large Language Models (LLMs) rely on highly dependent token generation. Establishing tight concentration bounds for these processes remains a challenge due to two fundamental bottlenecks in existing frameworks: (i) classical inequalities typically separate dependency structures from target sensitivities, leading to a scalar collapse that inflates the variance proxy to a suboptimal $\mathcal{O}(N)$ for sparse terminal rewards; (ii) conversely, while certain spatial methods achieve tighter bounds, they lack the strictly causal filtration required by sequential generation, rendering them inapplicable to the autoregressive setting. To resolve both bottlenecks, we establish a sharp McDiarmid-type inequality for dependent sequences, governed strictly by the exact matrix-vector multiplication of the causal dependency resolvent and the target sensitivity vector. This Matrix-Decoupled Concentration (MDC) framework natively recovers optimal constants for Markov chains and exploits directed $d$-separation to yield order-optimal bounds for causal trees. Crucially, by exactly preserving the coordinate-wise sparsity of rewards within a strictly causal framework, MDC mathematically prevents scalar collapse, guaranteeing a dimension-free $\mathcal{O}(1)$ variance proxy and providing a rigorous mathematical justification for the stability of long-context reasoning.
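
For orientation, the classical McDiarmid inequality that the abstract takes as its starting point is recalled below. This is the textbook form for independent coordinates $X_1,\dots,X_N$ with bounded differences $c_i$; it is not the paper's bound, whose resolvent-based variance proxy is not stated in the abstract.

```latex
\[
\Pr\bigl(f(X_1,\dots,X_N) - \mathbb{E}\,f(X_1,\dots,X_N) \ge t\bigr)
\;\le\; \exp\!\left(-\frac{2t^{2}}{\sum_{i=1}^{N} c_i^{2}}\right),
\qquad
\sup_{x,\,x_i'} \bigl|f(x_1,\dots,x_i,\dots,x_N) - f(x_1,\dots,x_i',\dots,x_N)\bigr| \le c_i .
\]
```

Here the variance proxy is $\sum_i c_i^2$; the abstract's claim concerns what replaces this sum once the coordinates are dependent and the reward is sparse.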

2605.05870 2026-05-19 cs.LG 版本更新

面向基于人类反馈的强化学习的Wasserstein分布鲁棒遗憾优化

Yikai Wang, Shang Liu, Jose Blanchet

发表机构 * Department of Statistics and Operations Research, University of North Carolina（北卡罗来纳大学统计与运筹学系）； Imperial Business School, Imperial College London（伦敦帝国理工学院商学院）； Department of Management Science and Engineering, Stanford University（斯坦福大学管理科学与工程系）

AI总结本文提出用于基于人类反馈的强化学习（RLHF）的Wasserstein分布鲁棒遗憾优化（DRRO），通过单纯形分配模型研究逐提示问题，证明在ℓ1地面代价的Wasserstein模糊集下，内层最坏情形遗憾存在精确解，且最优策略具有注水结构，由此得到实用的策略梯度算法。

详情

AI中文摘要

基于人类反馈的强化学习（RLHF）已成为对齐大语言模型的核心后训练步骤，但RLHF所使用的奖励信号只是对真实人类效用的一个学习得到的代理。从运筹学的角度看，这构成了一个目标设定有误情形下的决策问题：策略针对估计的奖励进行优化，而部署时的表现由未被观测到的目标决定。由此产生的差距导致奖励过度优化，即Goodhart现象：在真实质量下降之后，代理奖励仍在继续提高。现有的缓解方法通过不确定性惩罚、悲观奖励或保守约束来处理这一问题，但它们可能计算负担较重且过于悲观。我们提出用于RLHF的Wasserstein分布鲁棒遗憾优化（DRRO）。与标准DRO针对最坏情形价值进行悲观化不同，DRRO针对的是最坏情形遗憾，即相对于同一合理奖励扰动下最优策略的遗憾。我们通过单纯形分配模型研究逐提示问题，并证明在ℓ1地面代价的Wasserstein模糊集下，内层最坏情形遗憾存在精确解，且最优策略具有注水结构。这些结果导出了一种实用的策略梯度算法，它具有简单的采样加成（sampled-bonus）解释，并且只需对GRPO式的RLHF训练做少量改动。该框架还从理论上阐明了DRRO为何不如DRO悲观；我们的实验表明，DRRO比现有基线更有效地缓解过度优化，而标准DRO则系统性地过于悲观。

英文摘要

Reinforcement learning from human feedback (RLHF) has become a core post-training step for aligning large language models, yet the reward signal used in RLHF is only a learned proxy for true human utility. From an operations research perspective, this creates a decision problem under objective misspecification: the policy is optimized against an estimated reward, while deployment performance is determined by an unobserved objective. The resulting gap leads to reward over-optimization, or Goodharting, where proxy reward continues to improve even after true quality deteriorates. Existing mitigations address this problem through uncertainty penalties, pessimistic rewards, or conservative constraints, but they can be computationally burdensome and overly pessimistic. We propose Wasserstein distributionally robust regret optimization (DRRO) for RLHF. Instead of pessimizing worst-case value as in standard DRO, DRRO pessimizes worst-case regret relative to the best policy under the same plausible reward perturbation. We study the promptwise problem through a simplex allocation model and show that, under an $\ell_1$-ground-cost Wasserstein ambiguity set, the inner worst-case regret admits an exact solution and the optimal policy has a water-filling structure. These results lead to a practical policy-gradient algorithm with a simple sampled-bonus interpretation and only minor changes to GRPO-style RLHF training. The framework also clarifies theoretically why DRRO is less pessimistic than DRO, and our experiments show that DRRO mitigates over-optimization more effectively than existing baselines while standard DRO is systematically over-pessimistic.

2604.26904 2026-05-19 cs.CL cs.AI cs.LG 版本更新

ClawGym: A Scalable Framework for Building Effective Claw Agents

ClawGym：一种构建有效Claw代理的可扩展框架

Fei Bai, Huatong Song, Shuang Sun, Daixuan Cheng, Yike Yang, Chuan Hao, Renyuan Li, Feng Chang, Yuan Wei, Ran Tao, Bryan Dai, Jian Yang, Wayne Xin Zhao, Ji-Rong Wen

发表机构 * Gaoling School of Artificial Intelligence（高瓴人工智能学院）； Renmin University of China（中国人民大学）； IQuest Research（IQuest研究）； Beihang University（北京航空航天大学）

AI总结本文提出ClawGym框架，用于构建Claw式个人代理的全生命周期，通过合成可验证训练数据和强化学习方法提升代理效能。

详情

AI中文摘要

Claw式环境支持在本地文件、工具和持久化工作区状态之上执行多步骤工作流。然而，围绕这些环境的可扩展开发仍受限于系统性框架的缺失，尤其缺少能够合成可验证训练数据，并将其与代理训练和诊断性评估相结合的框架。为应对这一挑战，我们提出ClawGym，一个支持Claw式个人代理开发全生命周期的可扩展框架。具体而言，我们构建了ClawGym-SynData，一个包含13.5K个经过筛选的任务的多样化数据集，这些任务由人设驱动的意图和基于技能的操作合成而来，并配有逼真的模拟工作区和混合验证机制。随后，我们通过在黑盒rollout轨迹上进行监督微调，训练了一系列有能力的Claw式模型，称为ClawGym-Agents，并进一步通过一条在各任务独立沙箱间并行执行rollout的轻量级流水线探索强化学习。为支持可靠的评估，我们还构建了ClawGym-Bench，一个包含200个实例的基准，并通过自动过滤和人类与LLM联合审查进行校准。相关资源已发布在https://github.com/ClawGym。

英文摘要

Claw-style environments support multi-step workflows over local files, tools, and persistent workspace states. However, scalable development around these environments remains constrained by the absence of a systematic framework, especially one for synthesizing verifiable training data and integrating it with agent training and diagnostic evaluation. To address this challenge, we present ClawGym, a scalable framework that supports the full lifecycle of Claw-style personal agent development. Concretely, we construct ClawGym-SynData, a diverse dataset of 13.5K filtered tasks synthesized from persona-driven intents and skill-grounded operations, paired with realistic mock workspaces and hybrid verification mechanisms. We then train a family of capable Claw-style models, termed ClawGym-Agents, through supervised fine-tuning on black-box rollout trajectories, and further explore reinforcement learning via a lightweight pipeline that parallelizes rollouts across per-task sandboxes. To support reliable evaluation, we further construct ClawGym-Bench, a benchmark of 200 instances calibrated through automated filtering and human-LLM review. Relevant resources have been released at https://github.com/ClawGym.

2604.25858 2026-05-19 cs.LG cs.AI 版本更新

Investigation into In-Context Learning Capabilities of Transformers

Transformer上下文学习能力研究

Rushil Chandrupatla, Leo Bangayan, Sebastian Leng

AI总结本文通过系统实验研究了Gaussian-mixture二分类任务中的上下文学习，分析了输入维度、上下文示例数量和预训练任务数量对上下文测试准确率的影响，并探讨了良性过拟合现象。

详情

AI中文摘要

Transformer在上下文学习（ICL）中展现出强大的能力，使模型能够仅通过推理时提供的输入输出对解决之前未见过的任务。尽管先前的理论工作已经确立了在上下文内进行线性分类的条件，但指导这一机制何时成功的经验性扩展行为仍不够明确。本文对Gaussian-mixture二分类任务的上下文学习进行了系统性的实证研究。基于Frei和Vardi（2024）的理论框架，我们分析了上下文测试准确率如何依赖于三个基本因素：输入维度、上下文示例数量以及预训练任务数量。通过受控的合成设置和线性上下文分类器公式，我们隔离了模型在仅凭上下文自身推断任务结构时成功的几何条件。我们还研究了良性过拟合现象的出现，其中模型记忆了嘈杂的上下文标签，同时在干净的测试数据上仍能保持良好的泛化性能。通过在维度性、序列长度、任务多样性以及信噪比范围内进行广泛的扫描，我们识别了这种现象出现的参数区域，并描述了其如何依赖于数据几何和训练暴露。我们的结果为上下文分类的扩展行为提供了全面的经验图谱，突显了维度性、信号强度和上下文信息在决定上下文学习何时成功、何时失败中的关键作用。

英文摘要

Transformers have demonstrated a strong ability for in-context learning (ICL), enabling models to solve previously unseen tasks using only example input output pairs provided at inference time. While prior theoretical work has established conditions under which transformers can perform linear classification in-context, the empirical scaling behavior governing when this mechanism succeeds remains insufficiently characterized. In this paper, we conduct a systematic empirical study of in-context learning for Gaussian-mixture binary classification tasks. Building on the theoretical framework of Frei and Vardi (2024), we analyze how in-context test accuracy depends on three fundamental factors: the input dimension, the number of in-context examples, and the number of pre-training tasks. Using a controlled synthetic setup and a linear in-context classifier formulation, we isolate the geometric conditions under which models successfully infer task structure from context alone. We additionally investigate the emergence of benign overfitting, where models memorize noisy in-context labels while still achieving strong generalization performance on clean test data. Through extensive sweeps across dimensionality, sequence length, task diversity, and signal-to-noise regimes, we identify the parameter regions in which this phenomenon arises and characterize how it depends on data geometry and training exposure. Our results provide a comprehensive empirical map of scaling behavior in in-context classification, highlighting the critical role of dimensionality, signal strength, and contextual information in determining when in-context learning succeeds and when it fails.

2604.23437 2026-05-19 cs.CR cs.LG 版本更新

Scalable and Verifiable Federated Learning for Cross-Institution Financial Fraud Detection

跨机构金融欺诈检测的可扩展且可验证的联邦学习

Prajwal Panth, Nishant Nigam

发表机构 * School of Computer Engineering, KIIT Deemed to be University（KIIT准大学计算机工程学院）； School of Electronics Engineering, KIIT Deemed to be University（KIIT准大学电子工程学院）

AI总结本文提出DSFL框架，通过动态随机分片和线性完整性标签，实现跨机构金融欺诈检测的高效安全聚合，实验显示其在大规模场景下具有更低的聚合延迟和更高的检测召回率。

Comments 8 pages, 7 figures. Preprint

详情

AI中文摘要

金融欺诈越来越多地利用机构之间的边界：洗钱网络将交易分散到多家银行，因为没有任何单一机构能够观察到完整的模式。联邦学习（FL）使各方无需共享原始数据即可协作检测，但在银行业环境中的实际部署仍受三方面压力制约。第一，同态加密方案计算成本高，限制了大规模实时聚合。第二，谷歌SecAgg等基于掩码的协议需要O(N^2)次成对密钥交换，随着参与者数量增加而变得低效。第三，现有协议对所提交梯度更新是否格式良好的验证有限，使聚合容易受到一致性攻击。本文提出动态分片联邦学习（DSFL），一种用于跨机构欺诈检测的安全聚合框架。DSFL引入动态随机分片，将参与者划分为固定大小为m的、密码学意义上短暂存在的小集群，把通信复杂度降低到O(N*m)。在每个集群内，参与者提交线性完整性标签，这是一种加法同态承诺，使服务器无需解密即可验证更新的一致性。该机制检测的是不一致的更新，而不是恶意梯度。主动邻居恢复协议通过重建孤立掩码来处理轮次中途的掉线。在ULB信用卡欺诈检测数据集（284,807笔交易，分布在10个模拟银行节点上）上的实验表明，在N=1000时，DSFL的聚合延迟比基于Paillier的安全聚合约低34倍（该结果基于由实证基线所做的解析外推），同时在20%掉线率下保持99%的恢复保真度。全局欺诈召回率达到91.2%（±0.8%），高于本地训练模型68%的平均水平。

英文摘要

Financial fraud increasingly exploits institutional boundaries: laundering networks distribute transactions across multiple banks because no single institution can observe the full pattern. Federated Learning (FL) enables collaborative detection without raw data sharing, yet practical deployment in banking environments remains constrained by three pressures. First, homomorphic encryption schemes impose high computational costs that limit real-time aggregation at scale. Second, mask-based protocols such as Google's SecAgg require O(N^2) pairwise key exchanges, which become inefficient as participant count grows. Third, existing protocols provide limited verification that submitted gradient updates are well-formed, leaving aggregation vulnerable to consistency attacks. This paper presents Dynamic Sharded Federated Learning (DSFL), a secure aggregation framework for cross-institution fraud detection. DSFL introduces Dynamic Stochastic Sharding, which partitions participants into small cryptographically ephemeral clusters of fixed size m, reducing communication complexity to O(N*m). Within each cluster, participants submit Linear Integrity Tags, additive-homomorphic commitments that allow the server to verify update consistency without decryption. The mechanism detects inconsistent updates rather than malicious gradients. An Active Neighborhood Recovery protocol handles mid-round dropouts by reconstructing orphaned masks. Experiments on the ULB Credit Card Fraud Detection dataset (284,807 transactions across 10 simulated banking nodes) show that DSFL achieves approximately 34x lower aggregation latency than Paillier-based secure aggregation at N=1000, based on analytical extrapolation from empirical baselines, while maintaining 99% recovery fidelity under a 20% dropout regime. Global fraud recall reached 91.2% (+/-0.8%), above the 68% average of locally trained models.
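
The complexity claim can be checked with a toy sketch of pairwise additive masking, the generic idea behind mask-based secure aggregation. This is not DSFL or SecAgg: Linear Integrity Tags, dropout recovery, and real key agreement are omitted, the masks come from a seeded PRNG purely for illustration, and `MODULUS` and the helper names are hypothetical.

```python
import random

MODULUS = 2**31 - 1  # hypothetical modulus for the toy integer arithmetic


def pair_count_full(n):
    """Pairwise exchanges when every participant pairs with every other: O(N^2)."""
    return n * (n - 1) // 2


def pair_count_sharded(n, m):
    """Pairwise exchanges when pairing happens only inside shards of size m: O(N*m).

    Assumes m divides n.
    """
    return (n // m) * (m * (m - 1) // 2)


def mask_updates(updates, rng):
    """Add a shared random mask to one member of each pair and subtract it from the other."""
    masked = list(updates)
    for i in range(len(updates)):
        for j in range(i + 1, len(updates)):
            shared = rng.randrange(MODULUS)
            masked[i] = (masked[i] + shared) % MODULUS
            masked[j] = (masked[j] - shared) % MODULUS
    return masked


rng = random.Random(0)
updates = [5, 11, 7, 2]          # hypothetical scalar updates from one shard
masked = mask_updates(updates, rng)
aggregate = sum(masked) % MODULUS  # masks cancel, so this equals the true sum
```

With N=1000 and a hypothetical shard size of m=10, the pair count drops from 499,500 to 4,500, which is the kind of reduction the O(N^2) versus O(N*m) comparison describes.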

2604.20031 2026-05-19 math.OC cs.LG stat.ML 版本更新

Decision-Focused Federated Learning Under Heterogeneous Objectives and Constraints

异质目标与约束下的决策聚焦联邦学习

Konstantinos Ziliaskopoulos, Alexander Vinel

发表机构 * Auburn University（奥本大学）

AI总结本文研究了在异质目标和约束下聚焦决策的联邦学习，通过SPO+替代损失推导出异质性界限，展示了在强凸可行集下联邦学习的鲁棒性，并通过实验验证了其有效性。

详情

AI中文摘要

我们考虑决策聚焦联邦学习（DFFL），这是一种先预测后优化的设定：多个客户端在不交换原始数据的情况下，协同训练用于下游线性优化问题的预测模型。除了标准联邦学习中常见的数据异质性之外，各客户端还可能具有不同的目标函数和可行域。基于SPO+替代损失，我们推导出异质性界，将通过成本向量距离度量的目标偏移，与通过支撑函数项和形状距离项度量的可行集偏移区分开来。我们证明，对于一般的紧可行集，很小的目标扰动仍可能引起不趋于零的决策聚焦损失差异，而强凸可行域则给出更尖锐的基于稳定性的界。随后，我们将这些逐点界提升为本地与联邦之间的超额风险比较，表明当数据汇聚带来的统计优势超过客户端特定的异质性惩罚时，联邦是有益的。在多面体问题和强凸问题上的计算实验证实，在强凸可行域下联邦学习明显更为鲁棒。最后，我们评估了一种在本地模型与联邦DFFL模型之间进行的、基于验证集的简单插值方法。该插值缓解了理论上的权衡，并在合成实验和PJM能源定价案例研究中降低了总体遗憾和最差客户端所受的损害。

英文摘要

We consider Decision-Focused Federated Learning (DFFL), a predict-then-optimize setting in which multiple clients collaboratively train predictive models for downstream linear optimization problems without exchanging raw data. Besides the data heterogeneity typical of standard federated learning, clients may also have different objective functions and feasible regions. Building on the SPO+ surrogate loss, we derive heterogeneity bounds that separate objective shift, measured through cost-vector distances, from feasible-set shift, measured through support-function and shape-distance terms. We show that, for general compact feasible sets, small objective perturbations can still induce nonvanishing decision-focused loss discrepancies, while strongly convex feasible regions yield sharper stability-based bounds. We then lift these pointwise bounds to a local-versus-federated excess-risk comparison, showing that federation is beneficial when the statistical advantage of pooling exceeds a client-specific heterogeneity penalty. Computational experiments on polyhedral and strongly convex problems confirm that federation is substantially more robust under strongly convex feasible regions. Finally, we evaluate a simple validation-based interpolation between local and federated DFFL models. This interpolation mitigates the theoretical tradeoff and reduces aggregate regret and worst-client harm in both synthetic experiments and a PJM energy-pricing case study.
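For reference, the SPO+ surrogate that the abstract builds on is usually stated as follows for a linear problem $\min_{w \in S} c^\top w$. The notation ($S$, $w^*$, $z^*$, $\xi_S$) is an assumption of this note and may differ from the paper's.

```latex
% Assumed notation: S is the feasible set, c the true cost vector,
% \hat{c} the predicted cost vector.
z^*(c) = \min_{w \in S} c^\top w, \qquad
w^*(c) \in \arg\min_{w \in S} c^\top w, \qquad
\xi_S(v) = \max_{w \in S} v^\top w .

\ell_{\mathrm{SPO+}}(\hat{c}, c)
  = \max_{w \in S} \bigl\{ c^\top w - 2\,\hat{c}^\top w \bigr\}
    + 2\,\hat{c}^\top w^*(c) - z^*(c)
  = \xi_S(c - 2\hat{c}) + 2\,\hat{c}^\top w^*(c) - z^*(c).
```

The first term is a support function of the feasible set, which is one plausible reason feasible-set shift between clients shows up through support-function terms in the bounds.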

URL PDF HTML ☆

赞 0 踩 0

2604.19219 2026-05-19 cs.CR cs.AI cs.DC cs.LG 版本更新

AscendOptimizer: 一种用于Ascend NPU运算优化的经验型智能体

Jiehao Wu, Zixiao Huang, Wenhao Li, Chuyun Shen, Junjie Sheng, Xiangfeng Wang

发表机构 * School of Computer Science and Technology, East China Normal University（华东师范大学计算机科学与技术学院）； School of Computer Science and Technology, Tongji University（同济大学计算机科学与技术学院）； Shanghai University of International Business and Economics（上海对外经贸大学）； Key Lab of Mathematics and Engineering Applications (MoE), East China Normal University（华东师范大学数学与工程应用教育部重点实验室）； School of Mathematical Sciences, East China Normal University（华东师范大学数学科学学院）； Shenzhen Loop Area Institute (SLAI)（深圳河套学院）

AI总结本文提出AscendOptimizer，通过自身执行构建缺失的优化知识，结合主机侧和内核侧优化，实现AscendC运算的加速，达到1.21倍的几何平均速度提升。

2603.20873 2026-05-19 cs.LG math.OC 版本更新

Incentive-Aware Federated Averaging with Performance Guarantees under Strategic Participation

具有战略参与性能保证的激励感知联邦平均

Fateme Maleki, Krishnan Raghavan, Farzad Yousefian

发表机构 * Department of Industrial and Systems Engineering, Rutgers University（工业与系统工程系，罗格斯大学）； Argonne National Laboratory（阿贡国家实验室）

AI总结本文提出一种激励感知联邦平均方法，通过客户端传输模型参数和数据集大小，利用纳什均衡规则动态调整数据集规模，确保在凸和非凸目标下实现性能保证，并在单调博弈下实现福利损失最小化。

详情

AI中文摘要

联邦学习（FL）是一种通信高效的协作学习框架，使多个代理能够在私有本地数据集上进行模型训练。尽管FL在提高全局模型性能方面的益处已被广泛证实，但个体代理可能会战略性地平衡学习收益与贡献本地数据的成本。受需要成功保留参与代理的FL框架的启发，我们提出了一种激励感知的联邦平均方法，在每次通信轮次中，客户端向服务器传输本地模型参数和更新的训练数据集大小。数据集大小通过寻求纳什均衡（NE）的更新规则动态调整，以捕捉战略数据参与。我们分析了所提出的方法在凸和非凸全局目标设置下的性能保证，并建立了由此产生的激励感知FL算法的性能保证。此外，在仅仅单调博弈设置下，我们考虑了福利损失最小化框架，并建立了该方案的渐近收敛性。在MNIST和CIFAR-10数据集上的数值实验表明，代理在实现稳定的数据参与策略的同时，能够获得具有竞争力的全局模型性能。

英文摘要

Federated learning (FL) is a communication-efficient collaborative learning framework that enables model training across multiple agents with private local datasets. While the benefits of FL in improving global model performance are well established, individual agents may behave strategically, balancing the learning payoff against the cost of contributing their local data. Motivated by the need for FL frameworks that successfully retain participating agents, we propose an incentive-aware federated averaging method in which, at each communication round, clients transmit both their local model parameters and their updated training dataset sizes to the server. The dataset sizes are dynamically adjusted via a Nash equilibrium (NE)-seeking update rule that captures strategic data participation. We analyze the proposed method under convex and nonconvex global objective settings and establish performance guarantees for the resulting incentive-aware FL algorithm. Furthermore, under a merely monotone game setting, we consider a welfare loss minimization framework and establish asymptotic convergence of the scheme. Numerical experiments on the MNIST and CIFAR-10 datasets demonstrate that agents achieve competitive global model performance while converging to stable data participation strategies.
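The server-side step described above, in which each client reports both its parameters and its current dataset size, can be sketched as a size-weighted average. This is a minimal sketch assuming standard FedAvg weighting; the function name `federated_average` and the flat-list parameter format are hypothetical, and the Nash-equilibrium-seeking rule that updates the dataset sizes is not shown because the abstract does not specify it.

```python
def federated_average(client_reports):
    """Size-weighted average of client parameters.

    client_reports: list of (params, dataset_size) pairs, where params is a
    list of floats and dataset_size is the size the client reports this round.
    """
    total = sum(size for _, size in client_reports)
    if total <= 0:
        raise ValueError("at least one client must report a positive dataset size")
    dim = len(client_reports[0][0])
    avg = [0.0] * dim
    for params, size in client_reports:
        weight = size / total  # clients contributing more data get more weight
        for k in range(dim):
            avg[k] += weight * params[k]
    return avg
```

Because the weights depend on the reported sizes, a client that strategically shrinks its participation also shrinks its influence on the global model in this sketch.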

URL PDF HTML ☆

赞 0 踩 0

2603.12676 2026-05-19 cs.LG 版本更新

Disentangled Latent Dynamics Manifold Fusion for Solving Parameterized PDEs

解耦潜在动态流形融合用于求解参数化偏微分方程

Zhangyong Liang

发表机构 * National Center for Applied Mathematics, Tianjin University（应用数学国家中心，天津大学）

AI总结本文提出DLDMF框架，通过解耦空间、时间和参数，利用连续时间潜在方法和动态流形融合机制，提升参数泛化和时间外推的稳定性与准确性。

详情

AI中文摘要

通用神经代理模型在不同PDE参数下泛化困难，因为PDE系数变化使学习更困难且优化不稳定。当模型必须预测超出训练时间范围时，问题更加严重。现有方法通常无法同时处理参数泛化和时间外推。标准参数化模型将时间视为另一个输入，因此无法捕捉内在动态，而近期连续时间潜在方法通常依赖昂贵的测试时间自解码，效率低且可能破坏参数化解空间的连续性。为此，我们提出解耦潜在动态流形融合（DLDMF），一种物理指导框架，明确分离空间、时间和参数。代替不稳定自解码，DLDMF通过前馈网络将PDE参数直接映射到连续潜在嵌入。该嵌入初始化并条件化一个潜在状态，其演变由参数条件的神经ODE控制。我们进一步引入动态流形融合机制，使用共享解码器结合空间坐标、参数嵌入和时间演化的潜在状态以重建相应的时空解。通过将预测建模为潜在动态演变而非静态坐标拟合，DLDMF减少参数变化与时间演变之间的干扰，同时保持平滑且一致的解流形。因此，它在未见参数设置和长期时间外推中表现良好。在多个基准问题上的实验表明，DLDMF在准确性、参数泛化和外推鲁棒性方面均优于最先进基线。

英文摘要

Generalizing neural surrogate models across different PDE parameters remains difficult because changes in PDE coefficients often make learning harder and optimization less stable. The problem becomes even more severe when the model must also predict beyond the training time range. Existing methods usually cannot handle parameter generalization and temporal extrapolation at the same time. Standard parameterized models treat time as just another input and therefore fail to capture intrinsic dynamics, while recent continuous-time latent methods often rely on expensive test-time auto-decoding for each instance, which is inefficient and can disrupt continuity across the parameterized solution space. To address this, we propose Disentangled Latent Dynamics Manifold Fusion (DLDMF), a physics-informed framework that explicitly separates space, time, and parameters. Instead of unstable auto-decoding, DLDMF maps PDE parameters directly to a continuous latent embedding through a feed-forward network. This embedding initializes and conditions a latent state whose evolution is governed by a parameter-conditioned Neural ODE. We further introduce a dynamic manifold fusion mechanism that uses a shared decoder to combine spatial coordinates, parameter embeddings, and time-evolving latent states to reconstruct the corresponding spatiotemporal solution. By modeling prediction as latent dynamic evolution rather than static coordinate fitting, DLDMF reduces interference between parameter variation and temporal evolution while preserving a smooth and coherent solution manifold. As a result, it performs well on unseen parameter settings and in long-term temporal extrapolation. Experiments on several benchmark problems show that DLDMF consistently outperforms state-of-the-art baselines in accuracy, parameter generalization, and extrapolation robustness.

URL PDF HTML ☆

赞 0 踩 0

2603.08145 2026-05-19 cs.LG cs.AI 版本更新

DARC: Disagreement-Aware Alignment via Risk-Constrained Decoding

DARC：通过风险约束解码实现的分歧意识对齐

Mingxi Zou, Jiaxiang Chen, Junfan Li, Langzhang Liang, Qifan Wang, Xu Yinghui, Zenglin Xu

发表机构 * Fudan University, Shanghai, China（复旦大学，上海，中国）； Independent Researcher（独立研究者）； Meta AI ； Incubation Institute, Fudan University, Shanghai, China（创新与孵化院，复旦大学，上海，中国）

AI总结 DARC通过风险约束解码方法，在不重新训练的情况下，通过最大化KL-鲁棒满意度目标来缓解分歧和尾部风险，保持高质量输出。

详情

AI中文摘要

基于偏好对齐的方法（如RLHF、DPO）通常优化单一标量目标，隐式地平均异质人类偏好。在实践中，系统标注者和用户组的分歧使均值奖励最大化变得脆弱且易受代理过优化影响。我们提出了**通过风险约束解码实现的分歧意识对齐（DARC）**，一种无需重新训练的推理时间方法，将响应选择框架为分布鲁棒、风险敏感的决策制定。给定多个偏好样本或可扩展的分歧代理，DARC通过最大化KL-鲁棒（熵）满意度目标对候选者进行重新排序，并提供简单的部署控制，使相应的熵风险溢价相对于均值进行限制或惩罚，从而在不重新训练的情况下实现显式风险预算。我们提供了将此解码规则与原则性悲观主义和基于KL的分布鲁棒优化联系起来的理论分析。在对齐基准测试中，DARC在减少分歧和尾部风险的同时，保持在噪声、异质反馈下的竞争力平均质量。

不变量分层传播用于表达性图神经网络

Asela Hevapathige, Ahad N. Zehmakan, Asiri Wijesinghe, Saman Halgamuge

发表机构 * Department of Mechanical Engineering University of Melbourne（墨尔本大学机械工程系）； School of Computing Australian National University（澳大利亚国立大学计算机学院）

AI总结本文提出不变量分层传播框架，通过改进的WL变体和高效神经网络实现，提升图神经网络的表达能力，解决结构异质性捕捉问题。

详情

Journal ref: Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2026)

AI中文摘要

图神经网络（GNNs）在表达性和捕捉结构异质性方面存在根本限制。标准消息传递架构受限于1维Weisfeiler-Leman（1-WL）测试，无法区分超过度序列的图，并且从邻居均匀聚合信息，无法捕捉节点在更高阶模式中的不同结构性位置。尽管存在实现更高表达性的方法，但它们带来了不可接受的计算成本，并缺乏统一的框架来灵活编码多样的结构属性。为了解决这些限制，我们引入不变量分层传播（ISP），该框架包括一种新的WL变体（ISP-WL）及其高效的神经网络实现（ISPGNN）。ISP根据图不变量分层节点，处理它们在层次结构中揭示的结构差异，这些差异对1-WL不可见。通过层次结构异质性编码，ISP量化节点在更高阶模式中的结构性位置差异，区分参与者占据不同角色的相互作用与参与者参与均匀的相互作用。我们提供了正式的理论分析，证明了超越1-WL的增强表达性，收敛保证以及固有的抗过平滑性。在图分类、节点分类和影响估计的广泛实验中，ISP在标准架构和最先进的表达性基线中表现出一致的改进。

英文摘要

Graph Neural Networks (GNNs) face fundamental limitations in expressivity and capturing structural heterogeneity. Standard message-passing architectures are constrained by the 1-dimensional Weisfeiler-Leman (1-WL) test, unable to distinguish graphs beyond degree sequences, and aggregate information uniformly from neighbors, failing to capture how nodes occupy different structural positions within higher-order patterns. While methods exist to achieve higher expressivity, they incur prohibitive computational costs and lack unified frameworks for flexibly encoding diverse structural properties. To address these limitations, we introduce Invariant-Stratified Propagation (ISP), a framework comprising both a novel WL variant (ISP-WL) and its efficient neural network implementation (ISPGNN). ISP stratifies nodes according to graph invariants, processing them in hierarchical strata that reveal structural distinctions invisible to 1-WL. Through hierarchical structural heterogeneity encoding, ISP quantifies differences in nodes' structural positions within higher-order patterns, distinguishing interactions where participants occupy different roles from those with uniform participation. We provide formal theoretical analysis establishing enhanced expressivity beyond 1-WL, convergence guarantees, and inherent resistance to oversmoothing. Extensive experiments across graph classification, node classification, and influence estimation demonstrate consistent improvements over standard architectures and state-of-the-art expressive baselines.

URL PDF HTML ☆

赞 0 踩 0

2603.01092 2026-05-19 cs.AI cs.LG 版本更新

The Alien Space of Science: Sampling Coherent but Cognitively Unavailable Research Directions

科学的异类空间：采样连贯但认知不可用的研究方向

Alejandro H. Artiles, Martin Weiss, Levin Brinkmann, Iyad Rahwan, Bernhard Schölkopf, Christopher Pal, Hugo Larochelle, Anirudh Goyal, Nasim Rahaman

发表机构 * Max Planck Institute for Human Development（马克斯·普朗克人类发展研究所）； Max Planck Institute for Intelligent Systems（马克斯·普朗克智能系统研究所）； ELLIS Institute Tübingen（图宾根ELLIS研究所）； Polytechnique Montreal（蒙特利尔理工学院）； CIFAR AI Chair（CIFAR人工智能主席）； Mila – Quebec AI Institute（魁北克人工智能研究所）； Tiptree Systems（Tiptree系统）

AI总结本文提出一种框架，通过分解论文为概念单元并学习两个互补模型，采样出连贯但认知不可用的研究方向，扩展了LLM生成的潜在词汇库。

Comments 10 main pages, 42 appendix pages, 29 figures

详情

AI中文摘要

科学发现不仅受真理限制，还受研究人员当前探索领域认知可用性限制。许多方向在文献中是连贯的，但因没有现有社区占据正确的概念、方法和直觉组合而不被提出。现代语言模型继承这种偏见，当被提示生成新想法时会重新组合文献的高密度区域。我们引入了一个框架，旨在针对互补区域，称为科学的异类空间，其中方向在现有知识结构下是可能的，但在现有研究人员分布下不太可能。我们的方法首先将论文分解为细粒度的概念单元，并将它们聚类为共享的词汇概念原子。然后在该词汇上学习两个互补模型。一个连贯性模型评分原子组合是否形成可行的研究方向，另一个可用性模型评分是否任何现有作者社区能够产生给定组合。采样异类方向则减少为排名原子组合，以最大化连贯性同时最小化可用性。在包含16,068篇经同行评审的LLM论文的语料库上，所得到的采样器在不牺牲连贯性的前提下，探索出比前沿LLM生成基线大3.5至7倍的有效原子词汇库，并在盲LLM、人类和下游实验评估中产生匹配或超过基线的想法。通过将科学合理性与社区可用性分开，我们的框架指向AI生成想法，补充而非仅仅加速人类科学，扩展探索到当前社区可能忽视的连贯方向。

英文摘要

Scientific discovery is constrained not only by what is true, but by what is cognitively available to the researchers currently exploring a field. Many directions are coherent in light of the literature yet unlikely to be proposed because no existing community occupies the right combination of concepts, methods, and intuitions. Modern language models inherit this bias, recombining high-density regions of the literature when prompted for novel ideas. We introduce a framework that targets the complementary region, which we call the alien space of science, where directions are plausible under the structure of existing knowledge but unlikely under the distribution of existing researchers. Our method first decomposes papers into granular conceptual units and clusters them into a shared vocabulary of idea atoms. It then learns two complementary models over this vocabulary. A coherence model scores whether a combination of atoms forms a viable research direction, and an availability model scores whether any existing author community is positioned to produce a given combination. Sampling alien directions then reduces to ranking atom combinations that maximize coherence while minimizing availability. On a corpus of 16,068 peer-reviewed LLM papers from NeurIPS, ICLR, ICML, and major NLP venues, the resulting sampler explores a 3.5-7x broader effective atom vocabulary than frontier LLM ideation baselines without sacrificing coherence, and produces ideas that match or exceed those baselines under blind LLM, human, and downstream experimental evaluation. By separating scientific plausibility from community availability, our framework points toward AI ideation that complements rather than merely accelerates human science, expanding exploration into coherent directions that the current community may overlook.
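The ranking step described above can be sketched with two scoring callables. This is a minimal sketch, assuming a linear trade-off between the two scores; the abstract does not say how coherence and availability are combined, and `rank_alien`, `trade_off`, and the toy scorers in the usage note are hypothetical.

```python
from itertools import combinations


def rank_alien(atoms, coherence, availability, k=2, trade_off=1.0, top=3):
    """Rank k-atom combinations by coherence minus weighted availability.

    coherence and availability are callables that map a tuple of atoms to a
    float; in the paper they are learned models, here they are placeholders.
    """
    scored = []
    for combo in combinations(sorted(atoms), k):
        score = coherence(combo) - trade_off * availability(combo)
        scored.append((score, combo))
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [combo for _, combo in scored[:top]]
```

A combination that is coherent but that some community is already positioned to produce is pushed down the ranking, while an equally coherent combination with low availability rises to the top.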

URL PDF HTML ☆

赞 0 踩 0

2603.00975 2026-05-19 cs.LG cs.AI 版本更新

Forgetting is Competition: Rethinking Unlearning as Representation Interference in Diffusion Models

遗忘是竞争：重新思考扩散模型中的去学习作为表征干扰

Ashutosh Ranjan, Vivek Srivastava, Shirish Karande, Murari Mandal

发表机构 * TCS Research（塔塔咨询服务公司研究院）； Kalinga Institute of Industrial Technology（卡林加工业技术学院）

AI总结本文提出SurgUn方法，通过可控竞争而非直接删除或一对一重分配来实现扩散模型的去学习，有效平衡遗忘与保留，提升模型在版权、安全等场景下的表现。

详情

AI中文摘要

部署的文本到图像扩散模型日益需要事后概念去学习以应对版权主张、艺术家退出、安全更新和受保护内容缓解，而无需完全重新训练。核心挑战是擦除-保留失衡，激进更新抑制目标但损害共享能力，而保守或基于锚点的更新保留质量但使概念可通过相关、组合、改写或对抗性提示恢复。受反向干扰启发，我们提出SurgUn，将遗忘视为受控竞争而非直接删除或一对一重分配。SurgUn通过干扰条件梯度竞争实现反向概念干扰：目标梯度上升削弱目标条件的去噪或流匹配行为，而下降于语义多样的干扰集引入竞争非目标轨迹。这将输出分布在多个非目标模式而非坍缩到单一代理。为通过共享路径限制意外遗忘，SurgUn添加像素基础的权重空间局部化，轻量级诊断通过生成图像擦除-保留行为选择注意力块，利用抑制广泛可行而保留块选择性的不对称性。在UnlearnCanvas、IP-character擦除、Holistic Unlearning、EraseBench和Ring-A-Bell上，SurgUn在Stable Diffusion v1.5、SDXL和SANA-1.5中实现了比基线更强的擦除-保留平衡。消融实验显示，多样干扰、对比竞争和局部化对于稳健抑制同时保留相关和不相关概念都是必要的。

英文摘要

Deployed text-to-image diffusion models increasingly require post-hoc concept unlearning for copyright claims, artist opt-outs, safety updates, and protected-content mitigation without full retraining. A central challenge is erase-retain imbalance: aggressive updates suppress targets but damage shared capabilities, while conservative or anchor-based updates preserve quality yet leave concepts recoverable through related, compositional, paraphrased, or adversarial prompts. Inspired by retroactive interference, we propose SurgUn, which treats forgetting as controlled competition rather than direct deletion or one-to-one reassignment. SurgUn instantiates retroactive concept interference via distractor-conditioned gradient competition: target-gradient ascent weakens target-conditioned denoising or flow-matching behavior, while descent over a semantically diverse distractor set introduces competing non-target trajectories under the same prompt context. This redistributes outputs across multiple non-target modes instead of collapsing to a single proxy. To limit collateral forgetting through shared pathways, SurgUn adds pixel-grounded weight-space localization, a lightweight diagnostic that selects attention blocks by generated-image erase-retain behavior, exploiting the asymmetry that suppression is broadly achievable whereas retention is block-selective. Across UnlearnCanvas, IP-character erasure, Holistic Unlearning, EraseBench, and Ring-A-Bell on Stable Diffusion v1.5, SDXL, and SANA-1.5, SurgUn achieves a stronger erase-retain balance than baselines. Ablations show that diverse distractors, contrastive competition, and localization are all necessary for robust suppression while preserving related and unrelated concepts.

URL PDF HTML ☆

赞 0 踩 0

2602.22801 2026-05-19 cs.RO cs.AI cs.LG 版本更新

Unleashing the Potential of Diffusion Models for End-to-End Autonomous Driving

释放扩散模型在端到端自动驾驶中的潜力

Yinan Zheng, Tianyi Tan, Bin Huang, Enguang Liu, Ruiming Liang, Jianlin Zhang, Jianwei Cui, Guang Chen, Kun Ma, Hangjun Ye, Long Chen, Ya-Qin Zhang, Xianyuan Zhan, Jingjing Liu

发表机构 * Institute for AI Industry Research (AIR), Tsinghua University（人工智能产业研究院（AIR），清华大学）

AI总结本文通过大规模实车数据和道路测试，系统研究了扩散模型在端到端自动驾驶中的规划能力，提出Hyper Diffusion Planner框架，实现10倍性能提升。

详情

AI中文摘要

扩散模型已成为机器人决策任务中的流行选择，近年来也开始被考虑用于解决自动驾驶任务。然而，其在自动驾驶中的应用和评估仍局限于模拟或实验室环境。本研究通过大规模实车数据和道路测试，系统研究了扩散模型作为端到端自动驾驶规划器的潜力。通过全面而受控的研究，我们识别了扩散损失空间、轨迹表示和数据缩放等关键洞察，显著影响端到端规划性能。此外，我们还提供了一种有效的强化学习后训练策略，进一步提升学习规划器的安全性和鲁棒性。所提出的扩散学习框架Hyper Diffusion Planner (HDP)在真实车辆平台上部署，并在6个城市驾驶场景和200公里的真实世界测试中，实现了相对于基模型的10倍性能提升。本文证明了当正确设计和训练时，扩散模型可以作为有效且可扩展的端到端自动驾驶规划器，用于复杂的真实世界自动驾驶任务。

英文摘要

Diffusion models have become a popular choice for decision-making tasks in robotics, and more recently, are also being considered for solving autonomous driving tasks. However, their applications and evaluations in autonomous driving remain limited to simulation-based or laboratory settings. The full strength of diffusion models for large-scale, complex real-world settings, such as End-to-End Autonomous Driving (E2E AD), remains underexplored. In this study, we conducted a systematic and large-scale investigation to unleash the potential of the diffusion models as planners for E2E AD, based on a tremendous amount of real-vehicle data and road testing. Through comprehensive and carefully controlled studies, we identify key insights into the diffusion loss space, trajectory representation, and data scaling that significantly impact E2E planning performance. Moreover, we also provide an effective reinforcement learning post-training strategy to further enhance the safety and robustness of the learned planner. The resulting diffusion-based learning framework, Hyper Diffusion Planner (HDP), is deployed on a real-vehicle platform and evaluated across 6 urban driving scenarios and 200 km of real-world testing, achieving a notable 10x performance improvement over the base model. Our work demonstrates that diffusion models, when properly designed and trained, can serve as effective and scalable E2E AD planners for complex, real-world autonomous driving tasks.

URL PDF HTML ☆

赞 0 踩 0

2602.21185 2026-05-19 cs.LG 版本更新

The Diffusion Duality, Chapter II: $Ψ$-Samplers

扩散对偶性，第二章：Ψ-采样器

Justin Deschenaux, Caglar Gulcehre, Subham Sekhar Sahoo

发表机构 * EPFL, Lausanne, Switzerland（洛桑联邦理工学院，瑞士洛桑）； Microsoft AI（微软人工智能）； Cornell Tech, NY（康奈尔科技校区，纽约）

AI总结本文提出了一种通用的预测-校正采样器，用于离散扩散模型，提升了语言和图像建模的生成质量，并展示了其在训练效率上的优势。

详情

AI中文摘要

离散扩散模型因其自我校正能力在少量步骤生成和引导中表现出色，使其在这些场景中优于自回归或遮蔽扩散模型。然而，随着采样步骤的增加，其采样质量会趋于平缓。我们引入了一类预测-校正（PC）采样器，用于离散扩散，这些方法能够推广先前的方法并适用于任意噪声过程。当与均匀状态扩散结合时，我们的采样器在语言和图像建模上均优于祖先采样，实现了在OpenWebText上的生成困惑度更低，在CIFAR10上的FID/IS分数更优。关键的是，与传统采样器不同，我们的PC方法随着更多采样步骤的增加而持续改进。这些发现质疑了遮蔽扩散是扩散语言模型不可避免未来的假设。除了采样外，我们还开发了一种内存高效的课程学习方法，用于高斯松弛训练阶段，将训练时间减少25%，内存减少33%，同时保持在OpenWebText和LM1B上的困惑度相当，并在下游任务中表现强劲。我们发布了代码、检查点和视频教程：https://s-sahoo.com/duo-ch2

英文摘要

Uniform-state discrete diffusion models excel at few-step generation and guidance due to their ability to self-correct, making them preferred over autoregressive or Masked diffusion models in these settings. However, their sampling quality plateaus with ancestral samplers as the number of steps increases. We introduce a family of Predictor-Corrector (PC) samplers for discrete diffusion that generalize prior methods and apply to arbitrary noise processes. When paired with uniform-state diffusion, our samplers outperform ancestral sampling on both language and image modeling, achieving lower generative perplexity at matched unigram entropy on OpenWebText and better FID/IS scores on CIFAR10. Crucially, unlike conventional samplers, our PC methods continue to improve with more sampling steps. Taken together, these findings call into question the assumption that Masked diffusion is the inevitable future of diffusion-based language modeling. Beyond sampling, we develop a memory-efficient curriculum for the Gaussian relaxation training phase, reducing training time by 25% and memory by 33% compared to Duo while maintaining comparable perplexity on OpenWebText and LM1B and strong downstream performance. We release code, checkpoints, and a video-tutorial on: https://s-sahoo.com/duo-ch2

URL PDF HTML ☆

赞 0 踩 0

2602.19710 2026-05-19 cs.CV cs.LG cs.RO 版本更新

Universal Pose Pretraining for Generalizable Vision-Language-Action Policies

面向通用视觉-语言-动作策略的通用姿态预训练

Haitao Lin, Hanyang Yu, Jingshun Huang, He Zhang, Yonggen Ling, Ping Tan, Xiangyang Xue, Yanwei Fu

发表机构 * Tencent Robotics X（腾讯机器人X）； Futian Laboratory（福田实验室）； The Hong Kong University of Science and Technology（香港科技大学）； Fudan University（复旦大学）； Shanghai Innovation Institute（上海创新研究院）

AI总结本文提出Pose-VLA，通过分离预训练和后训练阶段，解决视觉-语言-动作模型中的特征坍塌和训练效率问题，实现通用3D空间先验提取与机器人特定动作空间的高效对齐。

Comments Accepted to Robotics: Science and Systems (RSS) 2026. Project website: https://hetolin.github.io/PoseVLA

详情

Journal ref: Robotics: Science and Systems, 2026

AI中文摘要

现有视觉-语言-动作（VLA）模型常因将高层感知与稀疏的、特定身体动作监督结合而出现特征坍塌和低训练效率。由于这些模型通常依赖优化用于视觉问答（VQA）的VLM主干，它们擅长语义识别但常忽视细微的3D状态变化，这些变化决定了不同的动作模式。为解决这些不一致，我们提出了Pose-VLA，一种解耦范式，将VLA训练分为预训练阶段以提取统一摄像机空间中的通用3D空间先验，以及后训练阶段以在机器人特定的动作空间中高效对齐。通过引入离散姿态标记作为通用表示，Pose-VLA无缝整合了来自不同3D数据集的空间接地与机器人演示中的几何级轨迹。我们的框架遵循一个两阶段预训练流程，通过姿态建立基本空间接地，然后通过轨迹监督实现运动对齐。广泛的评估显示，Pose-VLA在RoboTwin 2.0上实现了79.5%的平均成功率，并在LIBERO上表现出竞争力。现实世界实验进一步展示了在使用仅100个演示每任务的情况下，对多样化物体的鲁棒泛化能力，验证了我们预训练范式的效率。

英文摘要

Existing Vision-Language-Action (VLA) models often suffer from feature collapse and low training efficiency because they entangle high-level perception with sparse, embodiment-specific action supervision. Since these models typically rely on VLM backbones optimized for Visual Question Answering (VQA), they excel at semantic identification but often overlook subtle 3D state variations that dictate distinct action patterns. To resolve these misalignments, we propose Pose-VLA, a decoupled paradigm that separates VLA training into a pre-training phase for extracting universal 3D spatial priors in a unified camera-centric space, and a post-training phase for efficient embodiment alignment within robot-specific action space. By introducing discrete pose tokens as a universal representation, Pose-VLA seamlessly integrates spatial grounding from diverse 3D datasets with geometry-level trajectories from robotic demonstrations. Our framework follows a two-stage pre-training pipeline, establishing fundamental spatial grounding via poses followed by motion alignment through trajectory supervision. Extensive evaluations demonstrate that Pose-VLA achieves state-of-the-art results on RoboTwin 2.0 with a 79.5% average success rate and competitive performance on LIBERO at 96.0%. Real-world experiments further showcase robust generalization across diverse objects using only 100 demonstrations per task, validating the efficiency of our pre-training paradigm.

URL PDF HTML ☆

赞 0 踩 0

2602.18584 2026-05-19 cs.LG cs.AI cs.CV 版本更新

GIST: Targeted Data Selection for Instruction Tuning via Coupled Optimization Geometry

GIST: 通过耦合优化几何进行指令微调的目标数据选择

Guanghui Min, Tianhao Huang, Ke Wan, Chen Chen

发表机构 * Department of Computer Science, University of Virginia, Charlottesville, USA（弗吉尼亚大学计算机科学系）

AI总结本文提出GIST方法，通过子空间对齐替代轴对齐缩放，解决参数高效微调中参数耦合问题，实现更高效的目标数据选择。

Comments ICML 2026; 27 pages, 8 figures, 11 tables

详情

AI中文摘要

目标数据选择已成为高效指令微调中的关键范式，旨在为特定目标任务识别一小部分有影响力的训练示例。在实践中，影响力通常通过示例对参数更新的影响来衡量。为了使选择可扩展，许多方法利用优化器统计量（如Adam状态）作为更新几何的轴对齐替代（即对角预条件），隐式地将参数视为逐坐标独立。我们证明，在LoRA等参数高效微调（PEFT）方法中，这一假设不再成立。在这种情况下，诱导的优化几何表现出强烈的跨参数耦合和不可忽略的非对角交互，而任务相关的更新方向被限制在低维子空间中。受此不匹配的启发，我们提出GIST（梯度等距子空间变换），一种简单但有原则的替代方法，用稳健的子空间对齐取代轴对齐缩放。GIST通过奇异值分解（SVD）从验证梯度中恢复任务特定的子空间，将训练梯度投影到该耦合子空间，并根据示例与目标方向的对齐程度对其评分。大量实验表明，在相同的选择预算下，GIST仅用0.29%的存储和25%的计算时间，即可达到或超过当前最先进的基线。

英文摘要

Targeted data selection has emerged as a crucial paradigm for efficient instruction tuning, aiming to identify a small yet influential subset of training examples for a specific target task. In practice, influence is often measured through the effect of an example on parameter updates. To make selection scalable, many approaches leverage optimizer statistics (e.g., Adam states) as an axis-aligned surrogate for update geometry (i.e., diagonal precondition), implicitly treating parameters as coordinate-wise independent. We show that this assumption breaks down in parameter-efficient fine-tuning (PEFT) methods such as LoRA. In this setting, the induced optimization geometry exhibits strong cross-parameter coupling with non-trivial off-diagonal interactions, while the task-relevant update directions are confined to a low-dimensional subspace. Motivated by this mismatch, we propose GIST (Gradient Isometric Subspace Transformation), a simple yet principled alternative that replaces axis-aligned scaling with robust subspace alignment. GIST recovers a task-specific subspace from validation gradients via singular value decomposition (SVD), projects training gradients into this coupled subspace, and scores examples by their alignment with target directions. Extensive experiments have demonstrated that GIST matches or outperforms the state-of-the-art baseline with only 0.29% of the storage and 25% of the computational time under the same selection budget.
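The three steps named above (SVD of validation gradients, projection of training gradients, alignment scoring) can be sketched with NumPy. This is a minimal sketch under stated assumptions rather than the authors' implementation: `gist_scores` is a hypothetical name, gradients are assumed to be already flattened into rows, and cosine similarity to the mean validation gradient is an assumed choice of alignment score that the abstract does not confirm.

```python
import numpy as np


def gist_scores(train_grads, val_grads, rank):
    """Score training examples by alignment with a validation-gradient subspace.

    train_grads: array of shape (n_train, d), one flattened gradient per row.
    val_grads:   array of shape (n_val, d).
    rank:        number of singular directions kept for the task subspace.
    """
    # Right singular vectors of the validation gradients span the task subspace.
    _, _, vt = np.linalg.svd(val_grads, full_matrices=False)
    basis = vt[:rank]                              # (rank, d), orthonormal rows
    # Assumed target direction: the mean validation gradient, in subspace coordinates.
    target = basis @ val_grads.mean(axis=0)        # (rank,)
    projected = train_grads @ basis.T              # (n_train, rank)
    # Cosine similarity inside the subspace; epsilon guards against zero norms.
    numerator = projected @ target
    denominator = np.linalg.norm(projected, axis=1) * np.linalg.norm(target) + 1e-12
    return numerator / denominator
```

Selecting the top-scoring rows under a fixed budget would then give the training subset; only the `rank`-dimensional projections need to be stored per example, which is in the spirit of the storage savings the abstract reports.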

URL PDF HTML ☆

赞 0 踩 0

2602.17679 2026-05-19 cs.LG math.OC 版本更新

Joint Parameter and State-Space Bayesian Optimization: Using Process Expertise to Accelerate Manufacturing Optimization

联合参数与状态空间贝叶斯优化：利用过程专业知识加速制造优化

Saksham Kiroriwal, Julius Pfrommer, Jürgen Beyerer

发表机构 * Cognitive Industrial Systems, Fraunhofer IOSB（弗劳恩霍夫工业系统认知研究所）； Karlsruhe Institute of Technology (KIT)（卡尔斯鲁厄理工学院）

AI总结本文提出POGPN-JPSS框架，结合POGPN与联合参数状态空间建模，利用专家知识提取低维特征，提升多阶段生物乙醇生产过程优化效率。

Comments This paper is under review and has been submitted for CIRP CMS 2026

详情

AI中文摘要

贝叶斯优化（BO）是优化黑盒制造过程的有力方法，但在处理可以观测到中间输出的高维多阶段系统时，其性能往往受限。标准BO将过程建模为黑盒，忽略中间观测和底层过程结构。部分可观测高斯过程网络（POGPN）将过程建模为有向无环图（DAG）。然而，当观测是高维状态空间时间序列时，利用中间观测具有挑战性。过程专家知识可用于从高维状态空间数据中提取低维潜在特征。我们提出POGPN-JPSS框架，将POGPN与联合参数和状态空间（JPSS）建模相结合，以利用提取出的中间信息。我们在一个具有挑战性的高维多阶段生物乙醇生产过程仿真上展示了POGPN-JPSS的有效性。结果表明，POGPN-JPSS显著优于现有最先进方法，达到所需性能阈值的速度是其两倍，且更加可靠。更快的优化直接转化为时间和资源的大幅节省。这凸显了将专家知识与结构化概率模型相结合对于快速实现过程成熟的重要性。

英文摘要

Bayesian optimization (BO) is a powerful method for optimizing black-box manufacturing processes, but its performance is often limited when dealing with high-dimensional multi-stage systems, where we can observe intermediate outputs. Standard BO models the process as a black box and ignores the intermediate observations and the underlying process structure. Partially Observable Gaussian Process Networks (POGPN) model the process as a Directed Acyclic Graph (DAG). However, using intermediate observations is challenging when the observations are high-dimensional state-space time series. Process-expert knowledge can be used to extract low-dimensional latent features from the high-dimensional state-space data. We propose POGPN-JPSS, a framework that combines POGPN with Joint Parameter and State-Space (JPSS) modeling to use intermediate extracted information. We demonstrate the effectiveness of POGPN-JPSS on a challenging, high-dimensional simulation of a multi-stage bioethanol production process. Our results show that POGPN-JPSS significantly outperforms state-of-the-art methods by achieving the desired performance threshold twice as fast and with greater reliability. The fast optimization directly translates to substantial savings in time and resources. This highlights the importance of combining expert knowledge with structured probabilistic models for rapid process maturation.

URL PDF HTML ☆

赞 0 踩 0

2602.15405 2026-05-19 cs.LG 版本更新

分析和指导扩散模型中的零样本后验采样

Roi Benita, Michael Elad, Joseph Keshet

发表机构 * Department of Electrical and Computer Engineering（电气与计算机工程系）； Department of Computer Science（计算机科学系）； Technion, Haifa, Israel（以色列理工学院，海法）

AI总结本文分析了扩散模型中零样本后验采样的方法，提出基于高斯假设的框架，通过频域分析实现参数设计，提升感知质量和信号保真度。

详情

AI中文摘要

从退化测量中恢复信号一直是科学和工程的挑战。最近，零样本扩散方法被提出用于此类逆问题，提供基于先验知识的后验采样解决方案。此类算法通过推理整合观测，通常依赖手动调参和启发式方法。本文提出对这些近似后验采样器的严格分析，基于先验的高斯性假设。在此条件下，我们证明理想后验采样器和扩散重建算法可以表示为闭式形式，从而在频域中进行彻底分析和比较。基于这些表示，我们引入一种系统的方法来设计参数，取代以往的启发式选择策略。所提方法具有方法无关性，产生定制化的参数选择，共同考虑先验、退化信号和扩散动态的特性。我们显示，我们的频域推荐在结构上不同于标准启发式方法，并随扩散步长变化，从而在感知质量和信号保真度之间实现一致的平衡。

英文摘要

Recovering a signal from its degraded measurements is a long standing challenge in science and engineering. Recently, zero-shot diffusion based methods have been proposed for such inverse problems, offering a posterior sampling based solution that leverages prior knowledge. Such algorithms incorporate the observations through inference, often leaning on manual tuning and heuristics. In this work we propose a rigorous analysis of these approximate posterior samplers, relying on a Gaussianity assumption of the prior. Under this regime, we show that both the ideal posterior sampler and diffusion-based reconstruction algorithms can be expressed in closed-form, enabling their thorough analysis and comparisons in the spectral domain. Building on these representations, we introduce a principled framework for parameter design, replacing heuristic selection strategies used to date. The proposed approach is method-agnostic and yields tailored parameter choices that jointly account for the characteristics of the prior, the degraded signal, and the diffusion dynamics. We show that our spectral recommendations differ structurally from standard heuristics and vary with the diffusion step size, resulting in a consistent balance between perceptual quality and signal fidelity.
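The closed-form claim above can be made concrete in the simplest case. This is a sketch under an assumed linear degradation model with additive white Gaussian noise; the symbols $H$, $\sigma$, $\mu$, and $\Sigma$ are introduced here for illustration and are not taken from the paper.

```latex
% Assumed model: y = Hx + n, with n ~ N(0, \sigma^2 I) and prior x ~ N(\mu, \Sigma).
p(x \mid y) = \mathcal{N}\bigl(x;\ \mu_{\mathrm{post}},\ \Sigma_{\mathrm{post}}\bigr),

\Sigma_{\mathrm{post}} = \bigl(\Sigma^{-1} + \sigma^{-2} H^\top H\bigr)^{-1},
\qquad
\mu_{\mathrm{post}} = \mu + \Sigma H^\top \bigl(H \Sigma H^\top + \sigma^2 I\bigr)^{-1} (y - H\mu).
```

When $\Sigma$ and $H^\top H$ share an eigenbasis, both expressions decouple into independent scalar formulas per spectral component, which is the kind of setting in which an ideal sampler and a diffusion-based reconstruction can be compared component by component.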

URL PDF HTML ☆

赞 0 踩 0

2602.06807 2026-05-19 cs.RO cs.AI cs.LG 版本更新

SuReNav: Superpixel Graph-based Constraint Relaxation for Navigation in Over-constrained Environments

SuReNav：基于超像素图的约束放松用于过约束环境中的导航

Keonyoung Koh, Moonkyeong Jung, Samuel Seungsup Lee, Daehyung Park

发表机构 * School of Computing, Korea Advanced Institute of Science and Technology, Korea（韩国科学技术院计算机学院）

AI总结本文提出SuReNav方法，通过超像素图构建区域约束，利用图神经网络实现安全高效导航，适用于半静态环境中过约束规划问题，提升导航的人类类比性能。

Comments Accepted by ICRA 2026. Code and videos are available at https://sure-nav.github.io/

详情

AI中文摘要

我们针对半静态环境中过约束规划问题，提出SuReNav方法，通过超像素图构建区域约束，利用图神经网络训练于人类示范数据，实现安全高效的导航。框架包含三个组件：1）带有区域约束的超像素图地图生成，2）利用图神经网络进行区域约束放松，3）放松、规划和执行的交织过程。在2D语义地图和3D OpenStreetMap地图上评估，实现最高的人类类比得分，同时保持效率与安全的平衡。最后在现实城市导航中展示其可扩展性和泛化能力。代码和视频可在https://sure-nav.github.io/获取。

英文摘要

We address the over-constrained planning problem in semi-static environments. The planning objective is to find a best-effort solution that avoids all hard constraint regions while minimally traversing the least risky areas. Conventional methods often rely on pre-defined area costs, limiting generalizations. Further, the spatial continuity of navigation spaces makes it difficult to identify regions that are passable without overestimation. To overcome these challenges, we propose SuReNav, a superpixel graph-based constraint relaxation and navigation method that imitates human-like safe and efficient navigation. Our framework consists of three components: 1) superpixel graph map generation with regional constraints, 2) regional-constraint relaxation using graph neural network trained on human demonstrations for safe and efficient navigation, and 3) interleaving relaxation, planning, and execution for complete navigation. We evaluate our method against state-of-the-art baselines on 2D semantic maps and 3D maps from OpenStreetMap, achieving the highest human-likeness score of complete navigation while maintaining a balanced trade-off between efficiency and safety. We finally demonstrate its scalability and generalization performance in real-world urban navigation with a quadruped robot, Spot. Code and Videos are available at https://sure-nav.github.io/.


2602.05993 2026-05-19 cs.LG cs.AI version update

Diamond Maps: Efficient Reward Alignment via Stochastic Flow Maps

Peter Holderrieth, Douglas Chen, Luca Eyring, Ishin Shah, Giri Anantharaman, Yutong He, Zeynep Akata, Tommi Jaakkola, Nicholas Matthew Boffi, Max Simchowitz

Affiliations * MIT CSAIL; Carnegie Mellon University

AI summary: This paper proposes Diamond Maps, a generative model that achieves efficient reward alignment through stochastic flow maps and can align accurately to arbitrary rewards at inference time, improving adaptability and performance.

2602.05813 2026-05-19 cs.LG math.OC version update

Where Does Warm-Up Come From? Adaptive Scheduling for Norm-Constrained Optimizers

Artem Riabinin, Andrey Veprikov, Arman Bolatov, Martin Takáč, Aleksandr Beznosikov

Affiliations * Basic Research of Artificial Intelligence Laboratory; Federated Learning Problems Laboratory; Mohamed bin Zayed University of Artificial Intelligence; Innopolis University

AI summary: This paper studies adaptive learning rate scheduling for norm-constrained optimizers. It introduces a generalized smoothness assumption under which local curvature decreases with the suboptimality gap along the optimization trajectory, so that warm-up arises naturally from the convergence analysis rather than being imposed by hand.

Comments 30 pages, 8 figures, 5 tables

Abstract

We study adaptive learning rate scheduling for norm-constrained optimizers (e.g., Muon and Lion). We introduce a generalized smoothness assumption under which local curvature decreases with the suboptimality gap and empirically verify that this behavior holds along optimization trajectories. Under this assumption, we establish convergence guarantees under an appropriate choice of learning rate, for which warm-up followed by decay arises naturally from the proof rather than being imposed heuristically. Building on this theory, we develop a practical learning rate scheduler that relies only on standard hyperparameters and adapts the warm-up duration automatically at the beginning of training. We evaluate this method on large language model pretraining with LLaMA architectures and show that our adaptive warm-up selection consistently outperforms or at least matches the best manually tuned warm-up schedules across all considered setups, without additional hyperparameter search. Our source code is available at https://github.com/brain-lab-research/llm-baselines/tree/warmup


2602.02236 2026-05-19 cs.RO cs.LG cs.NE cs.SY eess.SY version update

Adaptive Control in Autonomous Driving via Real-Time Recurrent RL

Julian Lemmel, Felix Resch, Mónika Farsang, Ramin Hasani, Daniela Rus, Radu Grosu

Affiliations * TU Wien; MIT CSAIL; Liquid AI

AI summary: This paper studies online fine-tuning of pretrained autonomous-driving control policies via real-time recurrent reinforcement learning (RTRRL), combining offline behavior cloning with online RTRRL fine-tuning to adapt to distribution shift at deployment. Experiments in the CarRacing simulator and on a 1:10-scale RoboRacer platform validate the approach.

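2602.02039 2026-05-19 cs.AI cs.CL cs.DB cs.LG version update

Hunt Instead of Wait: Evaluating Deep Data Research on Large Language Models

Wei Liu, Peijie Yu, Michele Orini, Yali Du, Yulan He

Affiliations * GitHub

AI summary: This paper introduces the Deep Data Research (DDR) task and the DDR-Bench benchmark for evaluating the exploratory intelligence of large language models, and finds that effective exploration requires intrinsic strategy rather than scaling alone.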

Comments 14 pages, 7 tables, 8 figures, accepted by ICML 2026

2602.01705 2026-05-19 cs.LG cs.AI version update

LaDi-RL: Latent Diffusion Reasoning Prevents Entropy Collapse in Reinforcement Learning

Haoqiang Kang, Yizhe Zhang, Nikki Lijing Kuang, Yi-An Ma, Lianhui Qin

Affiliations * UC San Diego; Apple

AI summary: This paper proposes LaDi-RL, which uses a latent diffusion model to generate latent reasoning trajectories, addressing entropy collapse in reinforcement learning and improving code generation and mathematical reasoning performance.

Abstract

Reinforcement learning has become a central paradigm for improving LLM reasoning, but most existing methods optimize policies over discrete token sequences. This creates a mismatch between the optimization space and the structure of reasoning: many important decisions are semantic, global, and trajectory-level rather than local token choices. Continuous latent-space RL offers a promising alternative by allowing policies to explore higher-level reasoning representations. However, simply moving to latent space is not sufficient. The resulting policy must model a complex, multi-modal distribution over valid reasoning trajectories. We therefore propose Latent Diffusion Reasoning with Reinforcement Learning (LaDi-RL), where a diffusion model generates latent reasoning trajectories through iterative denoising. This formulation enables structured exploration and expressive distribution modeling, but also introduces a fundamental credit-assignment challenge: the policy acts in latent space, while rewards are observed only after the latent is decoded into text. A naive rollout strategy therefore entangles latent reasoning quality with text decoding quality, making it unclear whether an incorrect answer results from a poor latent trajectory or from an imperfect textual realization. To address this, we introduce hierarchical latent-text rollouts. We sample multiple text completions for each latent trajectory and aggregate their rewards to obtain a decoder-marginalized estimate of latent utility. This provides a cleaner and lower-variance reward signal for optimizing the diffusion policy. Empirically, LaDi-RL outperforms token-level RL by 9.4% on code generation and 5.7% on math reasoning in pass@1, and even surpasses the base model's pass@k performance.
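
The hierarchical latent-text rollout described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `decode`, `reward`, and `num_decodes` are hypothetical names, and the plain mean is an assumed aggregation rule, since the abstract says only that the rewards of several text completions are aggregated per latent trajectory.

```python
from statistics import mean
from typing import Any, Callable, List, Sequence


def latent_utility(
    latent: Any,
    decode: Callable[[Any], str],      # hypothetical stochastic latent-to-text decoder
    reward: Callable[[str], float],    # hypothetical scalar reward on decoded text
    num_decodes: int = 4,              # assumed number of completions per latent
) -> float:
    """Estimate the utility of one latent by averaging over several decodings.

    Averaging over decodings reduces the influence of any single
    unlucky text realization on the score assigned to the latent.
    """
    texts = [decode(latent) for _ in range(num_decodes)]
    return mean(reward(text) for text in texts)


def hierarchical_rollout(
    latents: Sequence[Any],
    decode: Callable[[Any], str],
    reward: Callable[[str], float],
    num_decodes: int = 4,
) -> List[float]:
    """Return one decoder-averaged utility estimate per latent trajectory."""
    return [latent_utility(z, decode, reward, num_decodes) for z in latents]
```

With a single decode per latent, the estimate reduces to the naive rollout that the abstract argues entangles latent quality with decoding quality.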


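2601.23154 2026-05-19 cs.LG cs.AI version update

On Safer Reinforcement Learning for Sedation and Analgesia in Intensive Care

Joel Romero-Hernandez, Oscar Camara

Affiliations * BCN MedTech, Complex Systems Lab Universitat Pompeu Fabra Barcelona, Spain

AI summary: This paper proposes an offline deep reinforcement learning framework for sedation and analgesia in intensive care that recommends medication doses under two objectives: reducing pain alone, or jointly reducing pain and 30-day post-discharge mortality, with the aim of learning safer treatment policies.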

Comments 48th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC 2026)

Abstract

Pain management in intensive care usually involves complex trade-offs, since both inadequate and excessive treatment can compromise patient safety. Prior work on reinforcement learning for sedation and analgesia has explored how to optimize these interventions, but has not considered patient survival or partial observability. To investigate the risks of these design choices, we developed an offline deep reinforcement learning framework that suggests hourly medication doses based on recurrent state representations. Using retrospective data from 47,144 ICU stays in the MIMIC-IV database, we trained and evaluated behavior-regularized actor-critic models that prescribe continuous doses of opioids, propofol, benzodiazepines, and dexmedetomidine according to two goals: reduce pain or jointly reduce pain and 30-day post-discharge mortality. Although the two resulting policies were associated with lower pain, clinician agreement with the pain-only policy was positively correlated with mortality ($ρ$=0.119, p<0.0001), while agreement with the joint policy was negatively correlated ($ρ$=-0.316, p<0.0001). We found that such divergence arose from a different response to high levels of comorbidity. This suggests that valuing post-discharge outcomes could be critical for learning safer treatment policies, even if a short-term goal remains the primary objective.


2601.22678 2026-05-19 cs.LG version update

Full-Graph vs. Mini-Batch Training: Comprehensive Analysis from a Batch Size and Fan-Out Size Perspective

Mengfan Liu, Da Zheng, Junwei Su, Chuan Wu

Affiliations * The University of Hong Kong; Ant Group

AI summary: This paper systematically compares full-graph and mini-batch GNN training from the perspective of batch size and fan-out size, using empirical and theoretical analysis to show how both affect convergence and generalization and to guide hyperparameter tuning.

Abstract

Full-graph and mini-batch Graph Neural Network (GNN) training approaches have distinct system design demands, making it crucial to choose the appropriate approach to develop. A core challenge in comparing these two GNN training approaches lies in characterizing their model performance (i.e., convergence and generalization) and computational efficiency. While batch size has been an effective lens for analyzing such behaviors in deep neural networks (DNNs), GNNs extend this lens by introducing a fan-out size, as full-graph training can be viewed as mini-batch training with the largest possible batch size and fan-out size. However, the impact of the batch size and fan-out size for GNNs remains insufficiently explored. To this end, this paper systematically compares full-graph vs. mini-batch training of GNNs through empirical and theoretical analyses from the viewpoints of batch size and fan-out size. Our key contributions include: 1) We provide a novel generalization analysis using the Wasserstein distance to study the impact of the graph structure, especially the fan-out size. 2) We uncover the non-isotropic effects of the batch size and the fan-out size in GNN convergence and generalization, providing practical guidance for tuning these hyperparameters under resource constraints. Finally, full-graph training does not always yield better model performance or computational efficiency than well-tuned smaller mini-batch settings. The implementation is available on GitHub: https://github.com/LIUMENGFAN-gif/GNN_fullgraph_minibatch_training.


2601.21350 2026-05-19 cs.LG version update

Factored Causal Representation Learning for Robust Reward Modeling in RLHF

Yupei Yang, Lin Yang, Wanxi Deng, Lin Qu, Fan Feng, Biwei Huang, Shikui Tu, Lei Xu

Affiliations * Shanghai Jiao Tong University; Alibaba Group; University of California San Diego; Mohamed bin Zayed University of Artificial Intelligence

AI summary: This paper proposes a factored representation learning framework that separates causal from non-causal factors to make reward models more robust and to mitigate reward hacking.

Abstract

A reliable reward model is essential for aligning large language models with human preferences through reinforcement learning from human feedback. However, standard reward models are susceptible to spurious features that are not causally related to human labels. This can lead to reward hacking, where high predicted reward does not translate into better behavior. In this work, we address this problem from a causal perspective by proposing a factored representation learning framework that decomposes the model's contextual embedding into (1) causal factors that are sufficient for reward prediction and (2) non-causal factors that capture reward-irrelevant attributes such as length or sycophantic bias. The reward head is then constrained to depend only on the causal component. In addition, we introduce an adversarial head trained to predict reward from the non-causal factors, while applying gradient reversal to discourage them from encoding reward-relevant information. Experiments on both mathematical and dialogue tasks demonstrate that our method learns more robust reward models and consistently improves downstream RLHF performance over state-of-the-art baselines. Analyses on length and sycophantic bias further validate the effectiveness of our method in mitigating reward hacking behaviors.


2601.21170 2026-05-19 cs.LG stat.ML version update

The Powers of Precision: Structure-Informed Detection in Complex Systems -- From Customer Churn to Seizure Onset

Augusto Santos, Teresa Santos, Catarina Rodrigues, José M. F. Moura

Affiliations * Instituto de Telecomunicações; Cegid; ECE Department at Carnegie Mellon University

AI summary: This paper proposes a structure-informed machine learning method for early detection of critical events in complex systems. It learns an optimal feature representation, followed by a classification module, to uncover and exploit hidden causal structure, and demonstrates effectiveness on seizure detection and customer churn prediction.

Abstract

Emergent phenomena -- onset of epileptic seizures, sudden customer churn, or pandemic outbreaks -- often arise from hidden causal interactions in complex systems. We propose a machine learning method for their early detection that addresses a core challenge: unveiling and harnessing a system's latent causal structure despite the data-generating process being unknown and partially observed. The method learns an optimal feature representation from a one-parameter family of estimators -- powers of the empirical covariance or precision matrix -- offering a principled way to tune in to the underlying structure driving the emergence of critical events. A supervised learning module then classifies the learned representation. We prove structural consistency of the family and demonstrate the empirical soundness of our approach on seizure detection and churn prediction, attaining competitive results in both. Beyond prediction, and toward explainability, we ascertain that the optimal covariance power exhibits evidence of good identifiability while capturing structural signatures, thus reconciling predictive performance with interpretable statistical structure.


2601.19624 2026-05-19 cs.LG cs.AI version update

Tracking Drift: Variation-Aware Entropy Scheduling for Non-Stationary Reinforcement Learning

Tongxi Wang, Zhuoyang Xia, Xinran Chen, Shan Liu

Affiliations * School of Future Technology, Southeast University, Nanjing, China; School of Automation, Southeast University, Nanjing, China

AI summary: This paper proposes AES, which adjusts the entropy coefficient online in response to environment drift, reducing performance degradation and speeding up recovery.

Comments Accepted by ICML 2026

Abstract

Real-world reinforcement learning often faces environment drift, but most existing methods rely on static entropy coefficients/target entropy, causing over-exploration during stable periods and under-exploration after drift, and leaving unanswered the principled question of how exploration intensity should scale with drift magnitude. We show that, under standard assumptions, entropy scheduling in non-stationary maximum-entropy RL can be cast as the dynamic-regret trade-off between tracking a drifting comparator and stabilizing updates, yielding a square-root scaling rule for the entropy weight in terms of an online non-stationarity proxy. Building on this, we propose AES--Adaptive Entropy Scheduling--which adaptively adjusts the entropy coefficient/temperature online using observable drift proxies during training, requiring almost no structural changes and incurring minimal overhead. Across 4 algorithm variants, 12 tasks, and 4 drift modes, AES significantly reduces the fraction of performance degradation caused by drift and accelerates recovery after abrupt changes.


2601.16527 2026-05-19 cs.LG cs.AI cs.CL cs.CV version update

Beyond Superficial Unlearning: Sharpness-Aware Robust Erasure of Hallucinations in Multimodal LLMs

Xianya Fang, Feiyang Ren, Xiang Chen, Yu Tian, Zhen Bi, Haiyang Yu, Sheng-Jun Huang

Affiliations * College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics; Institute for AI, Tsinghua University; Huzhou University; Institute of Dataspace, Hefei Comprehensive National Science Center; University of Science and Technology of China

AI summary: This paper proposes SARE, which uses targeted min-max optimization and a Targeted-SAM mechanism to robustly erase hallucinations in multimodal large language models, improving both stability and erasure efficacy.

Abstract

Multimodal LLMs are powerful but prone to object hallucinations, which describe non-existent entities and harm reliability. While recent unlearning methods attempt to mitigate this, we identify a critical flaw: structural fragility. We empirically demonstrate that standard erasure achieves only superficial suppression, trapping the model in sharp minima where hallucinations catastrophically resurge after lightweight relearning. To ensure geometric stability, we propose SARE, which casts unlearning as a targeted min-max optimization problem and uses a Targeted-SAM mechanism to explicitly flatten the loss landscape around hallucinated concepts. By suppressing hallucinations under simulated worst-case parameter perturbations, our framework ensures robust removal stable against weight shifts. Extensive experiments demonstrate that SARE significantly outperforms baselines in erasure efficacy while preserving general generation quality. Crucially, it maintains persistent hallucination suppression against relearning and parameter updates, validating the effectiveness of geometric stabilization.


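2601.16398 2026-05-19 cs.CY cs.CL cs.LG version update

White-Box Sensitivity Auditing with Steering Vectors

Hannah Cyberey, Yangfeng Ji, David Evans

Affiliations * University of Virginia

AI summary: This paper proposes a white-box sensitivity auditing framework that uses activation steering to assess model internals more rigorously, detecting bias in large language models and revealing their dependence on protected attributes.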

Abstract

Algorithmic audits are essential tools for examining systems for properties required by regulators or desired by operators. Current audits of large language models (LLMs) primarily rely on black-box evaluations that assess model behavior only through input-output testing. These methods are limited to tests constructed in the input space, often generated by heuristics. In addition, many socially relevant model properties (e.g., gender bias) are abstract and difficult to measure through text-based inputs alone. To address these limitations, we propose a white-box sensitivity auditing framework for LLMs that leverages activation steering to conduct more rigorous assessments through model internals. Our auditing method conducts internal sensitivity tests by manipulating key concepts relevant to the model's intended function for the task. We demonstrate its application to bias audits in four simulated high-stakes LLM decision tasks. Our method consistently indicates substantial dependence on protected attributes in model predictions, even in settings where standard black-box evaluations suggest little or no bias. Our code is openly available at https://github.com/hannahxchen/llm-steering-audit


2601.16287 2026-05-19 physics.optics cond-mat.mtrl-sci cs.LG physics.app-ph version update

Active learning for photonic crystals

Ryan Lopez, Charlotte Loh, Rumen Dangovski, Marin Soljačić

Affiliations * Department of Physics, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA; Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA

AI summary: This paper combines analytic LL-BNNs with uncertainty-driven sampling to accelerate photonic band gap prediction. By focusing simulations on high-uncertainty regions of the design space, it reduces the amount of training data required, which could speed up topology optimization of photonic crystals.

Comments 8 pages, 7 figures, accepted to Optics Express; updated version after reviewer comments

Abstract

Active learning for photonic crystals explores the integration of analytic approximate Bayesian last layer neural networks (LL-BNNs) with uncertainty-driven sample selection to accelerate photonic band gap prediction. We employ an analytic LL-BNN formulation, corresponding to the infinite Monte Carlo sample limit, to obtain uncertainty estimates that are strongly correlated with the true predictive error on unlabeled candidate structures. These uncertainty scores drive an active learning strategy that prioritizes the most informative simulations during training. Applied to the task of predicting band gap sizes in two-dimensional, two-tone photonic crystals, our approach achieves up to a 2.7x reduction on average in required training data compared to a random sampling baseline while maintaining predictive accuracy. The efficiency gains arise from concentrating computational resources on high uncertainty regions of the design space rather than sampling uniformly. Given the substantial cost of full band structure simulations, especially in three dimensions, this data efficiency enables rapid and scalable surrogate modeling. Our results suggest that analytic LL-BNN based active learning can substantially accelerate topological optimization and inverse design workflows for photonic crystals, and more broadly, offers a general framework for data efficient regression across scientific machine learning domains.


2601.11895 2026-05-19 cs.LG cs.AI cs.SE version update

DevBench: A Realistic, Developer-Informed Benchmark for Code Generation Models

Adarsh Kumarappan, Pareesa Ameneh Golnari, Wen Wen, Xiaoyu Liu, Gabriel Ryan, Yuting Sun, Shengyu Fu, Elsie Nallipogu

Affiliations * California Institute of Technology; Microsoft

AI summary: DevBench comprises 1,800 instances spanning six programming languages and six task categories, derived from real developer telemetry and synthesized with generator models. It evaluates large language models on code completion and reveals differences in syntactic precision, semantic reasoning, and practical utility.

Abstract

DevBench is a telemetry-driven benchmark designed to evaluate Large Language Models (LLMs) on realistic code completion tasks. It includes 1,800 evaluation instances across six programming languages and six task categories derived from real developer telemetry and synthesized using generator models from multiple provider families to mitigate single-source bias. Unlike prior benchmarks, it emphasizes ecological validity, avoids training data contamination, and enables detailed diagnostics. The evaluation combines functional correctness, similarity-based metrics, and LLM-judge assessments focused on usefulness and contextual relevance. Nine state-of-the-art models were assessed, with the strongest achieving only 43.5% Pass@1, confirming that the benchmark remains challenging and revealing differences in syntactic precision, semantic reasoning, and practical utility. Our benchmark provides actionable insights to guide model selection and improvement, a level of detail that is often missing from other benchmarks but is essential for both practical deployment and targeted model development.


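2601.10705 2026-05-19 cs.LG version update

Distributed Perceptron under Bounded Staleness, Partial Participation, and Noisy Communication

Keval Jain, Anant Raj, Saurav Prakash, Girish Varma

Affiliations * Indian Institute of Science

AI summary: This paper studies a semi-asynchronous client-server perceptron trained via iterative parameter mixing (IPM-style averaging) in federated and distributed deployments, accounting for stale updates, partial participation, and communication noise, and proposes a staleness-based aggregation rule.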

Abstract

We study a semi-asynchronous client-server perceptron trained via iterative parameter mixing (IPM-style averaging): clients run local perceptron updates and a server forms a global model by aggregating the updates that arrive in each communication round. The setting captures three system effects in federated and distributed deployments: (i) stale updates due to delayed model delivery and delayed application of client computations (two-sided version lag), (ii) partial participation (intermittent client availability), and (iii) imperfect communication on both downlink and uplink, modeled as effective zero-mean additive noise with bounded second moment. We introduce a server-side aggregation rule called staleness-bucket aggregation with padding that deterministically enforces a prescribed staleness profile over update ages without assuming any stochastic model for delays or participation. Under margin separability and bounded data radius, we prove a finite-horizon expected bound on the cumulative weighted number of perceptron mistakes over a given number of server rounds: the impact of delay appears only through the mean enforced staleness, whereas communication noise contributes an additional term that grows on the order of the square root of the horizon with the total noise energy. In the noiseless case, we show how a finite expected mistake budget yields an explicit finite-round stabilization bound under a mild fresh-participation condition.
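
The base procedure that this abstract builds on, iterative parameter mixing for the perceptron, can be sketched as follows. This is a simplified illustration under explicit assumptions: every client participates in every round, there is no staleness and no communication noise, and the server uses uniform averaging. It does not implement the paper's staleness-bucket aggregation with padding, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (features, label in {-1, +1})


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def local_perceptron(w: Sequence[float], shard: Sequence[Example]) -> Tuple[List[float], int]:
    """One pass of perceptron updates over a client's shard, starting from w."""
    w = list(w)
    mistakes = 0
    for x, y in shard:
        if y * dot(w, x) <= 0:  # misclassified, or exactly on the boundary
            w = [wi + y * xi for wi, xi in zip(w, x)]
            mistakes += 1
    return w, mistakes


def ipm_round(w_global: Sequence[float], shards: Sequence[Sequence[Example]]) -> Tuple[List[float], int]:
    """One synchronous round: broadcast, local updates, uniform averaging (assumed)."""
    results = [local_perceptron(w_global, shard) for shard in shards]
    n_clients = len(results)
    w_new = [sum(w[j] for w, _ in results) / n_clients for j in range(len(w_global))]
    return w_new, sum(m for _, m in results)


def train(shards: Sequence[Sequence[Example]], dim: int, rounds: int) -> Tuple[List[float], List[int]]:
    """Run several rounds and record the total number of mistakes per round."""
    w = [0.0] * dim
    history = []
    for _ in range(rounds):
        w, mistakes = ipm_round(w, shards)
        history.append(mistakes)
    return w, history
```

The per-round mistake count is the quantity the abstract bounds in its weighted, expected form; in this idealized setting it simply drops to zero once the averaged model separates every shard.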


2601.08013 2026-05-19 cs.LG version update

Beyond the Next Port: A Multi-Task Transformer for Forecasting Future Voyage Segment Durations

Nairui Liu, Fang He, Xindi Tang, Yineng Wang

Affiliations * Department of Industrial Engineering, Tsinghua University, Beijing 100084, P.R. China; School of Management Science and Engineering, Central University of Finance and Economics, Beijing 100081, P.R. China; Department of Logistics and Maritime Studies, The Hong Kong Polytechnic University, Hong Kong, P.R. China

AI summary: This paper proposes a multi-task Transformer for forecasting future voyage segment durations. It integrates historical sailing durations, destination port congestion proxies, and static vessel descriptors, with the aim of improving maritime schedule reliability and port operations.

Abstract

Accurate forecasts of segment-level sailing durations are fundamental to enhancing maritime schedule reliability and optimizing long-term port operations. However, conventional estimated time of arrival (ETA) models are primarily designed for the immediate next port of call and rely heavily on real-time automatic identification system (AIS) data, which is inherently unavailable for future voyage segments. To address this gap, the study reformulates future-port ETA prediction as a segment-level time-series forecasting problem. We develop a transformer-based architecture that integrates historical sailing durations, destination port congestion proxies, and static vessel descriptors. The proposed framework employs a causally masked attention mechanism to capture long-range temporal dependencies and a multi-task learning head to jointly predict segment sailing durations and port congestion states, leveraging shared latent signals to mitigate high uncertainty. Evaluation on a real-world global dataset from 2021 demonstrates the proposed model consistently outperforms a comprehensive suite of competitive baselines. The result shows a relative reduction of 4.70% in mean absolute error (MAE), 4.95% in mean absolute percentage error (MAPE) and 2.59% in root mean squared error (RMSE) compared with sequential deep learning models. The relative reductions compared with gradient boosting machines are 7.03% in MAE, 39.49% in MAPE and 4.37% in RMSE. The case study conducted on one major destination port further illustrates the model's superior accuracy.
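
The three error metrics and the "relative reduction" figures quoted in this abstract follow standard definitions, sketched below. The function names are illustrative, and the numbers in the usage note are made up for demonstration; they are not results from the paper.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (assumes no zero targets)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def relative_reduction(baseline_error: float, model_error: float) -> float:
    """Percentage by which the model's error is lower than the baseline's."""
    return 100.0 * (baseline_error - model_error) / baseline_error
```

For example, if a baseline had an MAE of 4.0 hours and a new model an MAE of 3.0 hours, `relative_reduction(4.0, 3.0)` would report a 25% relative reduction. Because MAPE divides by the true duration, short segments weigh more heavily in it than in MAE or RMSE, which is one way the three metrics can show quite different relative reductions for the same pair of models.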


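2601.06633 2026-05-19 cs.LG cs.AI cs.CL cs.CY version update

KASER: Knowledge-Aligned Student Error Simulator for Open-Ended Coding Tasks

Zhangqi Duan, Nigel Fernandez, Andrew Lan

Affiliations * University of Massachusetts; University of Massachusetts Amherst

AI summary: KASER uses reinforcement learning with a reward that combines code similarity, error matching, and prediction diversity to improve how well large language models simulate and predict student errors. Experiments show that it outperforms baselines on code and error prediction and on error coverage.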

Comments Published in ACL 2026: The 64th Annual Meeting of the Association for Computational Linguistics

Abstract

Open-ended tasks, such as coding problems that are common in computer science education, provide detailed insights into student knowledge. However, training large language models (LLMs) to simulate and predict possible student errors in their responses to these problems can be challenging: they often suffer from mode collapse and fail to fully capture the diversity in syntax, style, and solution approach in student responses. In this work, we present KASER (Knowledge-Aligned Student Error Simulator), a novel approach that aligns errors with student knowledge. We propose a training method based on reinforcement learning using a hybrid reward that reflects three aspects of student code prediction: i) code similarity to the ground-truth, ii) error matching, and iii) code prediction diversity. On two real-world datasets, we perform two levels of evaluation and show that: At the per-student-problem pair level, our method outperforms baselines on code and error prediction; at the per-problem level, our method outperforms baselines on error coverage and simulated code diversity.


2601.06009 2026-05-19 stat.ML cs.LG eess.SP math.PR stat.AP 版本更新

Detecting Stochasticity in Discrete Signals via Nonparametric Excursion Theorem

通过非参数逃逸定理检测离散信号中的随机性

Sunia Tanweer, Firas A. Khasawneh

发表机构 * Dept. of Mechanical Engineering, Michigan State University（密歇根州立大学机械工程系）； Dept. of Computational Mathematics, Science and Engineering, Michigan State University（密歇根州立大学计算数学、科学与工程系）

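AI summary: This paper proposes a nonparametric method based on excursion and crossing theorems for continuous semimartingales; it distinguishes diffusion processes from deterministic signals by comparing empirical excursion counts with their theoretical expectation, without relying on a parametric model.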

详情

DOI: 10.1063/5.0324348

AI中文摘要

我们开发了一个实用框架，仅使用单个离散时间序列区分扩散随机过程与确定性信号。该方法基于连续半鞅的经典逃逸和穿越定理，将逃逸次数$N_\varepsilon$与过程的二次变分$[X]_T$相关联。该标度定律适用于所有具有有限二次变分的连续半鞅，包括具有非线性或状态依赖波动率的一般伊藤扩散过程，但对确定性系统失效，从而提供了一种理论认证的方法来区分这些动态，而非基于主观熵或复发的最新方法。我们构建了一个稳健的数据驱动扩散测试，该方法将实测逃逸次数与理论期望进行比较。所得比值$K(\varepsilon)=N_{\varepsilon}^{\mathrm{emp}}/N_{\varepsilon}^{\mathrm{theory}}$通过log-log斜率偏差总结，测量$\varepsilon^{-2}$定律，从而分类为扩散样或非扩散样。我们在经典随机系统、某些周期性和混沌映射及加性白噪声系统，以及随机杜芬系统上展示了该方法。该方法是非参数、无模型的，仅依赖于连续半鞅的小尺度结构。

英文摘要

We develop a practical framework for distinguishing diffusive stochastic processes from deterministic signals using only a single discrete time series. Our approach is based on classical excursion and crossing theorems for continuous semimartingales, which relate the number $N_\varepsilon$ of excursions of magnitude at least $\varepsilon$ to the quadratic variation $[X]_T$ of the process. The scaling law holds universally for all continuous semimartingales with finite quadratic variation, including general Itô diffusions with nonlinear or state-dependent volatility, but fails sharply for deterministic systems -- thereby providing a theoretically certified method of distinguishing between these dynamics, as opposed to the subjective entropy- or recurrence-based state-of-the-art methods. We construct a robust data-driven diffusion test. The method compares the empirical excursion counts against the theoretical expectation. The resulting ratio $K(\varepsilon)=N_{\varepsilon}^{\mathrm{emp}}/N_{\varepsilon}^{\mathrm{theory}}$ is then summarized by a log-log slope deviation measuring the $\varepsilon^{-2}$ law that provides a classification into diffusion-like or not. We demonstrate the method on canonical stochastic systems, some periodic and chaotic maps and systems with additive white noise, as well as the stochastic Duffing system. The approach is nonparametric, model-free, and relies only on the universal small-scale structure of continuous semimartingales.
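The $\varepsilon^{-2}$ scaling can be illustrated with a small numerical experiment. The sketch below is not the authors' test: it assumes one simple counting rule (successive displacements of size at least $\varepsilon$ from the last anchor point), and the function names, sample sizes, and $\varepsilon$ grid are arbitrary choices. For Brownian motion the mean exit time from a band of half-width $\varepsilon$ is $\varepsilon^2$, so the count should scale roughly like $\varepsilon^{-2}$; for a smooth deterministic signal it should scale roughly like $\varepsilon^{-1}$.

```python
import numpy as np

def count_eps_moves(x, eps):
    """Count successive displacements of size >= eps along the sampled path x."""
    count = 0
    anchor = x[0]
    for value in x[1:]:
        if abs(value - anchor) >= eps:
            count += 1
            anchor = value  # restart the displacement from the current point
    return count

def loglog_slope(x, eps_values):
    """Least-squares slope of log(count) against log(eps)."""
    path = list(x)  # plain Python floats iterate faster than NumPy scalars
    counts = np.array([count_eps_moves(path, e) for e in eps_values], dtype=float)
    slope, _intercept = np.polyfit(np.log(eps_values), np.log(counts), 1)
    return slope

rng = np.random.default_rng(0)
n, T = 200_000, 1.0
dt = T / n

# Discretized Brownian motion and a 5 Hz sine wave on the same time grid.
brownian = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n))])
sine = np.sin(2 * np.pi * 5 * np.linspace(0.0, T, n + 1))

eps_values = np.array([0.02, 0.03, 0.05, 0.08])
slope_brownian = loglog_slope(brownian, eps_values)  # expected to be near -2
slope_sine = loglog_slope(sine, eps_values)          # expected to be near -1
```

The gap between the two slopes is the kind of signal a slope-deviation statistic can threshold on; in practice the usable $\varepsilon$ range is bounded below by the sampling step and above by the signal's overall range.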


2601.05679 2026-05-19 cs.LG 版本更新

Do Sparse Autoencoders Identify Reasoning Features in Language Models?

稀疏自编码器是否在语言模型中识别推理特征？

George Ma, Zhongyuan Liang, Irene Y. Chen, Somayeh Sojoudi

发表机构 * UC Berkeley（加州大学伯克利分校）； UCSF（加州大学旧金山分校）

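AI summary: The paper studies how reliably sparse autoencoders identify reasoning-related internal features in large language models, proposes an evaluation framework based on causal token injection, and finds that many candidate features are sensitive to token-level interventions, so their link to reasoning needs to be checked by falsification.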

Comments In Forty-Third International Conference on Machine Learning (2026)

详情

AI中文摘要

我们研究稀疏自编码器（SAEs）如何可靠地支持关于大语言模型中推理相关内部特征的主张。我们首先进行简化分析，表明稀疏正则化解码可以优先保留稳定的低维相关性，同时抑制高维行为内变异性，这促使我们考虑对比选择的'推理'特征可能在推理痕迹耦合时集中在提示结构上。基于此视角，我们提出一种基于反证的评估框架，结合因果token注入与LLM引导的反例构造。在22种配置中，涵盖多个模型家族、层和推理数据集，我们发现许多对比选择的候选特征对token级干预高度敏感，45%-90%在注入少量相关token到非推理文本后激活。对于剩余的上下文依赖候选特征，LLM引导的反证会产生触发激活的非推理输入，并生成保留意义的改写，以抑制激活。小规模引导研究在评估基准上产生最小变化。总体而言，我们的结果表明，在我们研究的设置中，稀疏分解可能倾向于与推理共现的低维相关性，强调在将高层行为归因于个别SAE特征时需要反证。代码可在https://github.com/GeorgeMLP/reasoning-probing获取。

英文摘要

We study how reliably sparse autoencoders (SAEs) support claims about reasoning-related internal features in large language models. We first give a stylized analysis showing that sparsity-regularized decoding can preferentially retain stable low-dimensional correlates while suppressing high-dimensional within-behavior variation, motivating the possibility that contrastively selected "reasoning" features may concentrate on cue-like structure when such cues are coupled with reasoning traces. Building on this perspective, we propose a falsification-based evaluation framework that combines causal token injection with LLM-guided counterexample construction. Across 22 configurations spanning multiple model families, layers, and reasoning datasets, we find that many contrastively selected candidates are highly sensitive to token-level interventions, with 45%-90% activating after injecting only a few associated tokens into non-reasoning text. For the remaining context-dependent candidates, LLM-guided falsification produces targeted non-reasoning inputs that trigger activation and meaning-preserving paraphrases of top-activating reasoning traces that suppress it. A small steering study yields minimal changes on the evaluated benchmarks. Overall, our results suggest that, in the settings we study, sparse decompositions can favor low-dimensional correlates that co-occur with reasoning, underscoring the need for falsification when attributing high-level behaviors to individual SAE features. Code is available at https://github.com/GeorgeMLP/reasoning-probing.


2601.05527 2026-05-19 cs.LG cs.AI 版本更新

DeMa: Dual-Path Delay-Aware Mamba for Efficient Multivariate Time Series Analysis

DeMa：双路径延迟感知Mamba用于高效多变量时间序列分析

Rui An, Haohao Qu, Wenqi Fan, Xuequn Shang, Qing Li

发表机构 * Northwestern Polytechnical University（西北工业大学）

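AI summary: DeMa improves Mamba with a dual-path architecture that addresses delay modeling, cross-variate dependencies, and the separation of temporal dynamics in multivariate time series analysis, yielding efficient and accurate analysis.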

Comments The article has been accepted by Frontiers of Computer Science (FCS), DOI: 10.1007/s11704-026-52221-6

详情

AI中文摘要

准确且高效的多变量时间序列（MTS）分析对广泛智能应用越来越关键。在这一领域，Transformer因其强大的捕捉成对依赖能力而成为主导架构。然而，基于Transformer的模型存在二次计算复杂度和高内存开销，限制了其在长期和大规模MTS建模中的可扩展性和实用性。最近，Mamba作为一种线性时间替代方案出现，具有高表达能力。然而，直接应用原始Mamba到MTS仍不理想，因为存在三个关键限制：（i）缺乏显式的跨变量建模，（ii）难以分离纠缠的系列内时间动态和系列间交互，（iii）对潜在时间滞后交互效应的建模不足。这些问题限制了其在多样MTS任务中的有效性。为了解决这些挑战，我们提出了DeMa，一种双路径延迟感知Mamba骨干网络。DeMa保留了Mamba的线性复杂度优势，同时显著提高了其在MTS设置中的适用性。具体而言，DeMa引入了三个关键创新：（i）它将MTS分解为系列内时间动态和系列间交互；（ii）它开发了一个时间路径，包含Mamba-SSD模块，以捕捉每个单独系列内的长程动态，实现系列无关的并行计算；（iii）它设计了一个变量路径，包含Mamba-DALA模块，通过延迟感知线性注意力模块来建模跨变量依赖。在五个代表性任务（长期和短期预测、数据插补、异常检测和系列分类）上的广泛实验表明，DeMa在达到最先进性能的同时，还实现了显著的计算效率。

英文摘要

Accurate and efficient multivariate time series (MTS) analysis is increasingly critical for a wide range of intelligent applications. Within this realm, Transformers have emerged as the predominant architecture due to their strong ability to capture pairwise dependencies. However, Transformer-based models suffer from quadratic computational complexity and high memory overhead, limiting their scalability and practical deployment in long-term and large-scale MTS modeling. Recently, Mamba has emerged as a promising linear-time alternative with high expressiveness. Nevertheless, directly applying vanilla Mamba to MTS remains suboptimal due to three key limitations: (i) the lack of explicit cross-variate modeling, (ii) difficulty in disentangling the entangled intra-series temporal dynamics and inter-series interactions, and (iii) insufficient modeling of latent time-lag interaction effects. These issues constrain its effectiveness across diverse MTS tasks. To address these challenges, we propose DeMa, a dual-path delay-aware Mamba backbone. DeMa preserves Mamba's linear-complexity advantage while substantially improving its suitability for MTS settings. Specifically, DeMa introduces three key innovations: (i) it decomposes the MTS into intra-series temporal dynamics and inter-series interactions; (ii) it develops a temporal path with a Mamba-SSD module to capture long-range dynamics within each individual series, enabling series-independent, parallel computation; and (iii) it designs a variate path with a Mamba-DALA module that integrates delay-aware linear attention to model cross-variate dependencies. Extensive experiments on five representative tasks, long- and short-term forecasting, data imputation, anomaly detection, and series classification, demonstrate that DeMa achieves state-of-the-art performance while delivering remarkable computational efficiency.


2601.02353 2026-05-19 cs.CV cs.LG 版本更新

Meta-Learning Guided Pruning for Few-Shot Plant Pathology on Edge Devices

元学习引导的剪枝用于边缘设备上的少样本植物病理学

Mohammed Mudassir Uddin, Shahnawaz Alam, Mohammed Kaif Pasha, Dr Tasneem Bano Rehman, Dr Fahmina Taranum, Afroze Begum

发表机构 * Department of CSE, Muffakham Jah College of Engineering and Technology (MJCET)（计算机科学与工程系，穆法卡姆·贾赫工程与技术学院（MJCET））

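AI summary: This paper proposes DACIS, which combines neural network pruning with few-shot learning for efficient plant disease identification on edge devices; experiments show a 78% reduction in model size while retaining 92.3% of the original accuracy.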

详情

AI中文摘要

远程地区农民需要快速可靠的植物疾病识别方法，但通常缺乏实验室或高性能计算资源。深度学习模型可通过叶片图像检测疾病，但模型通常过大且计算成本高，难以在低成本边缘设备如Raspberry Pi上运行。此外，收集数千张标记的疾病图像进行训练既昂贵又耗时。本文通过结合神经网络剪枝和少样本学习解决这两个挑战。本文提出Disease-Aware Channel Importance Scoring (DACIS)，一种识别神经网络中区分不同植物疾病关键部分的方法，集成到三阶段Prune-then-Meta-Learn-then-Prune (PMP)流程中。在PlantVillage和PlantDoc数据集上的实验表明，所提出的方法将模型大小减少78%，同时保持92.3%的原始精度，压缩后的模型在Raspberry Pi 4上以每秒7帧的速度运行，使小农户农民的实时田间诊断成为可能。

英文摘要

Farmers in remote areas need quick and reliable methods for identifying plant diseases, yet they often lack access to laboratories or high-performance computing resources. Deep learning models can detect diseases from leaf images with high accuracy, but these models are typically too large and computationally expensive to run on low-cost edge devices such as Raspberry Pi. Furthermore, collecting thousands of labeled disease images for training is both expensive and time-consuming. This paper addresses both challenges by combining neural network pruning, removing unnecessary parts of the model, with few-shot learning, which enables the model to learn from limited examples. This paper proposes Disease-Aware Channel Importance Scoring (DACIS), a method that identifies which parts of the neural network are most important for distinguishing between different plant diseases, integrated into a three-stage Prune-then-Meta-Learn-then-Prune (PMP) pipeline. Experiments on PlantVillage and PlantDoc datasets demonstrate that the proposed approach reduces model size by 78% while maintaining 92.3% of the original accuracy, with the compressed model running at 7 frames per second on a Raspberry Pi 4, making real-time field diagnosis practical for smallholder farmers.


2512.23978 2026-05-19 cs.LG math.OC stat.ML 版本更新

Assured autonomy: How operations research powers and orchestrates generative AI systems

保障自主性：如何用运筹学赋能和协调生成式AI系统

Tinglong Dai, David Simchi-Levi, Michelle Xiao Wu, Yao Xie

发表机构 * Carey Business School, Johns Hopkins University（约翰霍普金斯大学卡里商学院）； Data Science and AI Institute, Johns Hopkins University（约翰霍普金斯大学数据科学与人工智能研究院）； Institute for Data, Systems and Society, Operations Research Center, Department of Civil and Environmental Engineering, Massachusetts Institute of Technology（麻省理工学院数据、系统与社会研究所，运筹学中心，土木与环境工程系）； Purdue University（普渡大学）； H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology（佐治亚理工学院H.米尔顿·斯图尔特工业与系统工程学院）

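AI summary: This paper examines how operations research methods can improve feasibility, robustness, and risk control as generative AI shifts toward autonomous decision-making systems.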

Comments Authors are listed alphabetically; Production and Operations Management (POM), 2026

详情

AI中文摘要

生成式人工智能（GenAI）正从对话助手转向代理系统——能够在操作流程中感知、决策和行动的自主决策系统。这种转变带来了自主性悖论：随着GenAI系统获得更大的操作自主权，它们应通过设计体现更正式的结构、更明确的约束和更强的风险控制。我们论证，除非生成模型与提供可验证可行性、对抗鲁棒性和高后果场景下的压力测试机制相结合，否则随机生成模型在操作领域可能脆弱。为此，我们开发了一个以运筹学（OR）为基础的保障自主性框架，基于两种互补方法。首先，基于流的生成模型将生成过程框架为确定性传输，由常微分方程描述，从而实现可审计性、约束感知生成以及与最优传输、鲁棒优化和顺序决策控制的联系。其次，通过对抗鲁棒性视角制定操作安全性：决策规则在不确定性或模糊集内评估最坏扰动，使未建模风险成为设计的一部分。该框架阐明了增加自主性如何使OR的角色从求解器转变为护栏到系统架构师，负责控制逻辑、激励协议、监控制度和安全边界。这些元素定义了在安全关键、可靠性敏感的操作领域中保障自主性的研究议程。

英文摘要

Generative artificial intelligence (GenAI) is shifting from conversational assistants toward agentic systems -- autonomous decision-making systems that sense, decide, and act within operational workflows. This shift creates an autonomy paradox: as GenAI systems are granted greater operational autonomy, they should, by design, embody more formal structure, more explicit constraints, and stronger tail-risk discipline. We argue that stochastic generative models can be fragile in operational domains unless paired with mechanisms that provide verifiable feasibility, robustness to distribution shift, and stress testing under high-consequence scenarios. To address this challenge, we develop a conceptual framework for assured autonomy grounded in operations research (OR), built on two complementary approaches. First, flow-based generative models frame generation as deterministic transport characterized by an ordinary differential equation, enabling auditability, constraint-aware generation, and connections to optimal transport, robust optimization, and sequential decision control. Second, operational safety is formulated through an adversarial robustness lens: decision rules are evaluated against worst-case perturbations within uncertainty or ambiguity sets, making unmodeled risks part of the design. This framework clarifies how increasing autonomy shifts OR's role from solver to guardrail to system architect, with responsibility for control logic, incentive protocols, monitoring regimes, and safety boundaries. These elements define a research agenda for assured autonomy in safety-critical, reliability-sensitive operational domains.


2512.23752 2026-05-19 cs.LG cs.AI 版本更新

Geometric Scaling of Bayesian Inference in LLMs

贝叶斯推断在大语言模型中的几何特性

Naman Agarwal, Siddhartha R. Dalal, Vishal Misra

发表机构 * Columbia University（哥伦比亚大学）； Columbia University School of Professional Studies（哥伦比亚大学专业研究学院）； Department of Statistics（统计学系）； Columbia University Department of Computer Science（哥伦比亚大学计算机科学系）

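AI summary: The study finds geometric structure in large language models that encodes posterior structure, and intervention experiments indicate that this structure acts as an important readout of uncertainty rather than a single computational bottleneck.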

Comments v2: Extend cross-architecture analysis with Qwen2.5 and DeepSeek (MLA) families; add SULA and RoPE-channel results; document MLA boundary case (DeepSeek-V2-Lite: substrate preserved, dynamic routing absent); add dual-entropy framework at scale; fix duplicate bibliography entries

2512.23070 2026-05-19 cs.LG 版本更新

高阶LaSDI：多时间导数的降阶建模

Robert Stephany, William Michael Anderson, Youngsoo Choi

发表机构 * Center for Applied Scientific Computing（应用科学计算中心）； Lawrence Livermore National Laboratory（劳伦斯利弗莫尔国家实验室）

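AI summary: This paper proposes higher-order LaSDI, which introduces a flexible higher-order finite-difference scheme and a rollout loss to improve the long-horizon predictions of reduced-order models, validated on the two-dimensional Burgers equation.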

Comments 38 pages, 14 figures

2512.09269 2026-05-19 cs.LG cs.IR 版本更新

Goal inference with Rao-Blackwellized Particle Filters

基于 Rao-Blackwellized 粒子滤波器的目标推断

Yixuan Wang, Dan P. Guralnik, Warren E. Dixon

发表机构 * Mechanical & Aerospace Engineering Department, University of Florida； Mathematics Department, Ohio University

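AI summary: This paper infers the goal of a mobile agent with a variant of the Rao-Blackwellized particle filter, improves sample efficiency under a closed-loop behavior assumption, and introduces two estimators to assess how well an adversary can recover the agent's intent.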

Comments 6 pages, 3 figures. Accepted for presentation at the 23rd IFAC World Congress 2026, Busan, Republic of Korea, August 23-28, 2026. To appear in IFAC-PapersOnLine

详情

AI中文摘要

从噪声观测中推断移动智能体最终目标是基本的估计问题。本文首次使用 Rao-Blackwellized 粒子滤波器（RBPF）变体研究意图推断，假设智能体意图通过具有已证明实用稳定性性质的闭环行为表现。利用假设的闭式智能体动力学，RBPF 分析性地边缘化线性高斯子结构，并仅更新粒子权重，提升样本效率。引入两种差分估计器：基于 RBPF 权重的高斯混合模型和限制混合到有效样本的简化版本。通过信息论泄漏度量量化对抗者恢复意图的能力，并通过高斯混合 KL 界提供可计算的 KL 散度下界。还提供两种估计器性能差异的界，表明简化估计器几乎与完整估计器一样有效。实验展示了对合规智能体的快速准确意图恢复，激励未来设计意图混淆控制器的研究。

英文摘要

Inferring the eventual goal of a mobile agent from noisy observations of its trajectory is a fundamental estimation problem. We initiate the study of such intent inference using a variant of a Rao-Blackwellized Particle Filter (RBPF), subject to the assumption that the agent's intent manifests through closed-loop behavior with a state-of-the-art provable practical stability property. Leveraging the assumed closed-form agent dynamics, the RBPF analytically marginalizes the linear-Gaussian substructure and updates particle weights only, improving sample efficiency over a standard particle filter. Two difference estimators are introduced: a Gaussian mixture model using the RBPF weights and a reduced version confining the mixture to the effective sample. We quantify how well the adversary can recover the agent's intent using information-theoretic leakage metrics and provide computable lower bounds on the Kullback-Leibler (KL) divergence between the true intent distribution and RBPF estimates via Gaussian-mixture KL bounds. We also provide a bound on the difference in performance between the two estimators, highlighting the fact that the reduced estimator performs almost as well as the complete one. Experiments illustrate fast and accurate intent recovery for compliant agents, motivating future work on designing intent-obfuscating controllers.


2511.08154 2026-05-19 hep-ph cs.LG hep-th 版本更新

Good flavor search in SU(5): a machine learning approach

在SU(5)中寻找良好风味：一种机器学习方法

Fayez Abu-Ajamieh, Shinsuke Kawai, Nobuchika Okada

发表机构 * Formerly Center for High Energy Physics, Indian Institute of Science, Bangalore 560012, Karnataka, India（原高能物理中心，印度科学研究院，班加罗尔560012，卡纳塔克邦，印度）； Faculty of Science, Yamagata University, 1-4-12 Kojirakawa-machi, Yamagata, 990-8560 Japan（山形大学理学部，山形，990-8560日本）； Department of Physics and Astronomy, University of Alabama, Tuscaloosa, Alabama, AL35487 USA（阿拉巴马大学物理与天文学系，塔斯卡洛萨，阿拉巴马州，AL35487 USA）

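AI summary: This paper revisits the fermion mass problem in the SU(5) grand unified theory using machine learning and, by comparing the "beauty" of different remedies, finds that the model with the 24-dimensional field interaction is closer to the original Georgi-Glashow model.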

Comments 15 pages, 9 figures, version to be published

详情

AI中文摘要

我们重新审视SU(5)大统一理论中的费米子质量问题，使用机器学习技术。原始SU(5)模型由Georgi和Glashow提出，与观测到的费米子质量谱不兼容。已知有两种解决办法：一种是通过引入45维场的新相互作用，另一种是通过24维场。我们研究哪种修正更“美丽”，将美丽定义为接近原始Georgi-Glashow SU(5)模型的程度。分析显示，在超对称和非超对称情况下，包含24维场相互作用的模型在这一标准下更美丽。我们通过引入连续参数y，将这些模型一般化，其中y=3对应45维场，y=1.5对应24维场。数值优化显示，y≈0.8能最接近原始SU(5)模型，表明此值对应根据我们定义的最美丽模型。

英文摘要

We revisit the fermion mass problem of the $SU(5)$ grand unified theory using machine learning techniques. The original $SU(5)$ model proposed by Georgi and Glashow is incompatible with the observed fermion mass spectrum. Two remedies are known to resolve this discrepancy: one introduces a new interaction via a 45-dimensional field, and the other via a 24-dimensional field. We investigate which modification is more beautiful, defining beauty as proximity to the original Georgi-Glashow $SU(5)$ model. Our analysis shows that, in both supersymmetric and non-supersymmetric scenarios, the model incorporating the interaction with the 24-dimensional field is more beautiful under this criterion. We then generalise these models by introducing a continuous parameter $y$, which takes the value 3 for the 45-dimensional field and 1.5 for the 24-dimensional field. Numerical optimisation reveals that $y \approx 0.8$ yields the closest match to the original $SU(5)$ model, indicating that this value corresponds to the most beautiful model according to our definition.


2511.07329 2026-05-19 cs.LG cs.CV 版本更新

Preparation of Fractal-Inspired Computational Architectures for Advanced Large Language Model Analysis

基于分形的计算架构制备用于高级大语言模型分析

Yash Mittal, Dmitry Ignatov, Radu Timofte

发表机构 * Computer Vision Lab, CAIDAS, University of Würzburg（计算机视觉实验室，CAIDAS，维尔茨堡大学）

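AI summary: This paper proposes the FractalNet framework, which automatically generates and evaluates convolutional neural network architectures from recursive template patterns, enabling efficient and stable architecture exploration; in experiments the fractal architectures reach a peak accuracy of 80.18% after five training epochs.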

详情

AI中文摘要

本文提出FractalNet，一种基于分形设计原理的框架，通过递归模板模式自动生成并评估卷积神经网络（CNN）架构。该框架通过递归分形模板系统地变化关键参数如分形深度、列宽和层配置，而非依赖计算成本高的神经架构搜索（NAS）方法。框架包含生成器、分形模板模块和运行器模块，生成1200多个CNN架构在CIFAR-10数据集上进行测试。使用PyTorch进行训练，采用随机梯度下降和自动混合精度及梯度检查点技术降低计算开销。实验结果显示分形架构具有稳定的训练动态和竞争性性能，五轮训练后验证准确率为60-70%，峰值准确率为80.18%。这些发现表明递归分形结构在平衡网络深度和宽度方面有效，并支持大规模自动化架构探索。

英文摘要

This paper proposes FractalNet, a framework based on fractal design principles that automatically generates and evaluates convolutional neural network (CNN) architectures using recursive template patterns. Rather than relying on computationally expensive Neural Architecture Search (NAS) methods, the framework explores a structured architecture space defined by recursive fractal templates that systematically vary key parameters such as fractal depth, column width, and layer configurations. The framework consists of three core components: a generator that produces candidate architectures via controlled permutations of convolutional, normalization, activation, and dropout layers; a fractal template module that enforces recursive multi-path structural patterns; and a runner module that manages model training, evaluation, and logging. Using this system, over 1,200 distinct CNN architectures were automatically generated and evaluated on the CIFAR-10 image classification benchmark. Training was performed in PyTorch using stochastic gradient descent with Automatic Mixed Precision (AMP) and gradient checkpointing to reduce computational overhead. Experimental results demonstrate that fractal-based architectures exhibit stable training dynamics and achieve competitive performance, with an average validation accuracy of 60-70% and a peak accuracy of 80.18% after only five training epochs. These findings suggest that recursive fractal structures provide an effective means of balancing network depth and width while supporting large-scale automated architecture exploration. The proposed framework offers a resource-efficient and interpretable approach to systematic neural architecture experimentation.
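To make "recursive fractal templates" concrete, the sketch below uses the classic fractal expansion rule, in which a block with C columns joins a single convolution with two stacked copies of the (C-1)-column block. Whether this framework applies exactly that rule is an assumption; the tuple encoding and function names are purely illustrative.

```python
def fractal_block(columns):
    """Build a nested description of a fractal block with the given column count."""
    if columns == 1:
        return "conv"
    inner = fractal_block(columns - 1)
    # Join a shallow path (one conv) with a deep path (two stacked sub-blocks).
    return ("join", ("seq", inner, inner), "conv")

def count_convs(block):
    """Total number of convolution layers in the block."""
    if block == "conv":
        return 1
    return sum(count_convs(child) for child in block[1:])

def longest_path(block):
    """Number of convolutions on the deepest input-to-output path."""
    if block == "conv":
        return 1
    tag, *children = block
    depths = [longest_path(child) for child in children]
    # Sequential children add up; joined children run in parallel.
    return sum(depths) if tag == "seq" else max(depths)

block = fractal_block(4)
total_layers = count_convs(block)    # 2**4 - 1 convolutions
deepest_path = longest_path(block)   # 2**(4 - 1) convolutions
```

Under this rule the layer count grows as 2^C - 1 while the shortest path stays one convolution long, which is why fractal depth is a compact knob for trading depth against width.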


2510.23641 2026-05-19 cs.LG cs.AI hep-ex physics.ins-det 版本更新

在基于机器学习的天气预报中通过隐空间约束学习更符合物理现实的动力学

Hang Fan, Yi Xiao, Yongquan Qu, Juan Nathaniel, Fenghua Ling, Ben Fei, Lei Bai, Pierre Gentine

发表机构 * Department of Earth and Environmental Engineering, Columbia University（哥伦比亚大学地球与环境工程系）； Learning the Earth with Artificial Intelligence and Physics (LEAP) Center, Columbia University（人工智能与物理联合地球学习中心（LEAP）, 哥伦比亚大学）； Shanghai Artificial Intelligence Laboratory（上海人工智能实验室）； The Chinese University of Hong Kong（香港中文大学）； Department of Computer Science and Technology, Tsinghua University（清华大学计算机科学与技术系）

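AI summary: This paper proposes latent-space constraints for training weather forecasting models, capturing multivariate dependencies to improve long-range forecast skill while preserving physical realism.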

详情

AI中文摘要

数据驱动的机器学习（ML）模型正在重塑天气预报，并展示了加速和超越传统物理方法的潜力，从而引发该领域第二次革命。然而，大多数ML预报模型在训练时使用加权变量损失，忽视了由物理耦合引起的跨变量和空间误差协方差，通常导致过于平滑且物理不真实的长期预报。为此，我们将模型训练重新表述为四维变分数据同化（4DVar）问题，将再分析数据视为不完美的观测。这使损失函数能够纳入跨变量误差协方差结构，捕捉多变量依赖及其相关误差。在实践中，我们通过计算自动编码器学习的全球大气状态隐空间中的损失来近似此目标。通过编码大气变量间的复杂非线性耦合，这种表示允许高维、复杂误差协方差矩阵在模型空间中近似为隐空间中的对角矩阵，从而大大简化了实现。我们证明了在隐空间约束下的滚动训练能提高长期预报能力，同时比广泛使用的模型空间损失更好地保持细尺度结构和物理真实性。最后，我们扩展了这一框架以适应异质数据源，使预报模型能够在统一的理论框架内联合训练再分析和多源观测。

英文摘要

Data-driven machine learning (ML) models are reshaping weather forecasting and have shown the potential to accelerate and surpass traditional physics-based approaches, leading to a second revolution in the field after data assimilation. However, most ML forecast models are trained with weighted variable-wise losses on rollout forecasts that neglect cross-variable and spatial error covariance induced by physical coupling, often yielding overly smooth and physically unrealistic long-range forecasts. To address this, we reformulate model training as a four-dimensional variational data assimilation (4DVar) problem that treats reanalysis data as imperfect observations. This enables the loss function to incorporate cross-variable error covariance structures that capture multivariate dependencies and their associated errors. In practice, we approximate this objective by computing the loss in an autoencoder-learned latent space of global atmospheric states. By encoding complex nonlinear couplings among atmospheric variables, this representation allows the high-dimensional, complex error covariance matrix in model space to be approximated as nearly diagonal in latent space, substantially simplifying implementation. We show that rollout training with latent-space constraints improves long-term forecast skill, while better preserving fine-scale structures and physical realism than the widely used model-space loss. Finally, we extend this framework to accommodate heterogeneous data sources, enabling the forecast model to be trained jointly on reanalysis and multi-source observations within a unified theoretical formulation.
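The latent-space approximation described above can be written schematically as follows. The notation is an assumption introduced here, not the paper's: $\hat{x}_t$ is the forecast at rollout step $t$, $y_t$ the reanalysis state treated as an imperfect observation, $R$ the full model-space error covariance, $E$ the encoder of the autoencoder, and $D$ a diagonal matrix of latent error variances.

```latex
\mathcal{J}_{\text{model}}
  = \sum_{t=1}^{T} (\hat{x}_t - y_t)^{\top} R^{-1} (\hat{x}_t - y_t)
\;\approx\;
\mathcal{J}_{\text{latent}}
  = \sum_{t=1}^{T} \bigl(E(\hat{x}_t) - E(y_t)\bigr)^{\top} D^{-1}
    \bigl(E(\hat{x}_t) - E(y_t)\bigr)
```

In this reading, a weighted variable-wise loss corresponds to choosing a diagonal $R$ directly in model space, which drops the cross-variable and spatial covariance terms; the latent formulation keeps a diagonal weighting but applies it to coordinates in which the coupling has already been encoded.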


2509.26037 2026-05-19 cs.AI cs.CV cs.LG 版本更新

CoLLM-NAS: Collaborative Large Language Models for Efficient Knowledge-Guided Neural Architecture Search

CoLLM-NAS：协作大型语言模型用于高效知识引导的神经架构搜索

Zhe Li, Zhiwei Lin, Yongtao Wang

发表机构 * Wangxuan Institute of Computer Technology, Peking University（北京大学计算机科学技术研究院）

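AI summary: This paper proposes CoLLM-NAS, a two-stage neural architecture search framework built on collaborating large language models; a Navigator LLM, a Generator LLM, and a Coordinator module guide the search, improving efficiency and achieving new state-of-the-art results.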

Comments Accepted as Oral at CVPR 2026 Workshop on Neural Architecture Search (NAS)

详情

AI中文摘要

将大型语言模型（LLMs）与神经架构搜索（NAS）结合，为自动设计神经架构提供了新可能。然而，现有方法面临架构无效、计算低效和性能劣于传统NAS的限制。本文提出协作LLM-based NAS（CoLLM-NAS），一种两阶段NAS框架，通过两个互补的LLM驱动知识引导搜索。具体而言，提出具有状态的导航LLM指导搜索方向，无状态的生成LLM合成高质量候选，以及协调模块协调LLM间通信并管理评估过程。CoLLM-NAS通过结合LLM对结构神经架构的内在知识与迭代反馈和历史轨迹的逐步知识，高效指导搜索过程。在ImageNet和NAS-Bench-201上的实验结果表明，CoLLM-NAS超越现有NAS方法和传统搜索算法，取得新状态最优结果，同时显著降低搜索成本4-10倍。此外，CoLLM-NAS在多种搜索空间（如MobileNet、ShuffleNet和AutoFormer）中一致提升各种两阶段NAS方法（如OFA、SPOS和AutoFormer）的性能和效率，展示其优秀的泛化能力。

英文摘要

The integration of Large Language Models (LLMs) with Neural Architecture Search (NAS) has introduced new possibilities for automating the design of neural architectures. However, most existing methods face critical limitations, including architectural invalidity, computational inefficiency, and inferior performance compared to traditional NAS. In this work, we present Collaborative LLM-based NAS (CoLLM-NAS), a two-stage NAS framework with knowledge-guided search driven by two complementary LLMs. Specifically, we propose a stateful Navigator LLM to guide search direction, a stateless Generator LLM to synthesize high-quality candidates, and a Coordinator module to orchestrate inter-LLM communication and manage evaluation processes. CoLLM-NAS efficiently guides the search process by combining LLMs' inherent knowledge of structured neural architectures with progressive knowledge from iterative feedback and historical trajectory. Experimental results on ImageNet and NAS-Bench-201 show that CoLLM-NAS surpasses existing NAS methods and conventional search algorithms, achieving new state-of-the-art results while significantly reducing search costs by 4--10$\times$. Furthermore, CoLLM-NAS consistently enhances the performance and efficiency of various two-stage NAS methods (e.g., OFA, SPOS, and AutoFormer) across diverse search spaces (e.g., MobileNet, ShuffleNet, and AutoFormer), demonstrating its excellent generalization.


2509.21319 2026-05-19 cs.CL cs.AI cs.LG 版本更新

RLBFF: Binary Flexible Feedback to bridge between Human Feedback & Verifiable Rewards

RLBFF：二进制灵活反馈用于连接人类反馈与可验证奖励

Zhilin Wang, Jiaqi Zeng, Olivier Delalleau, Ellie Evans, Daniel Egert, Hoo-Chang Shin, Felipe Soares, Yi Dong, Oleksii Kuchaiev

发表机构 * NVIDIA

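AI summary: RLBFF combines human preferences with rule-based verification so that reward models capture response quality more precisely; it outperforms Bradley-Terry models, achieves strong results on RM-Bench and JudgeBench, and lets users specify their own feedback principles.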

Comments Published at ICLR 2026, 21 pages

详情

AI中文摘要

Reinforcement Learning with Human Feedback (RLHF) 和 Reinforcement Learning with Verifiable Rewards (RLVR) 是LLM后训练的主要RL范式，各有优势。然而，RLHF在可解释性和奖励黑客问题上存在困难，因为它依赖于通常缺乏明确标准的人类判断，而RLVR则受限于其对正确性基于验证器的专注。我们提出Reinforcement Learning with Binary Flexible Feedback (RLBFF)，结合人类驱动的偏好灵活性与规则基础验证的精确性，使奖励模型能够捕捉响应质量的细微方面，超越单纯的正确性。RLBFF从自然语言反馈中提取可以二进制回答的原则（例如信息准确性：是，或代码可读性：否）。这些原则随后可用于将奖励模型训练作为蕴含任务（响应满足或不满足任意原则）。我们展示奖励模型以这种方式训练可以优于匹配数据的Bradley-Terry模型，在RM-Bench（86.2%）和JudgeBench（81.4%，2025年9月24日排行榜第一）上取得最佳成绩。此外，用户可以在推理时指定感兴趣的原理以自定义我们的奖励模型，与Bradley-Terry模型不同。最后，我们提供了一个完全开源的食谱（包括数据）来对Qwen3-32B进行对齐，以匹配或超过o3-mini和DeepSeek R1在MT-Bench、WildBench和Arena Hard v2的一般对齐基准上的性能（在<5%的推理成本下）。模型：https://huggingface.co/collections/nvidia/reward-models-10-2025

英文摘要

Reinforcement Learning with Human Feedback (RLHF) and Reinforcement Learning with Verifiable Rewards (RLVR) are the main RL paradigms used in LLM post-training, each offering distinct advantages. However, RLHF struggles with interpretability and reward hacking because it relies on human judgments that usually lack explicit criteria, whereas RLVR is limited in scope by its focus on correctness-based verifiers. We propose Reinforcement Learning with Binary Flexible Feedback (RLBFF), which combines the versatility of human-driven preferences with the precision of rule-based verification, enabling reward models to capture nuanced aspects of response quality beyond mere correctness. RLBFF extracts principles that can be answered in a binary fashion (e.g. accuracy of information: yes, or code readability: no) from natural language feedback. Such principles can then be used to ground Reward Model training as an entailment task (response satisfies or does not satisfy an arbitrary principle). We show that Reward Models trained in this manner can outperform Bradley-Terry models when matched for data and achieve top performance on RM-Bench (86.2%) and JudgeBench (81.4%, #1 on leaderboard as of September 24, 2025). Additionally, users can specify principles of interest at inference time to customize the focus of our reward models, in contrast to Bradley-Terry models. Finally, we present a fully open source recipe (including data) to align Qwen3-32B using RLBFF and our Reward Model, to match or exceed the performance of o3-mini and DeepSeek R1 on general alignment benchmarks of MT-Bench, WildBench, and Arena Hard v2 (at <5% of the inference cost). Models: https://huggingface.co/collections/nvidia/reward-models-10-2025


2509.19590 2026-05-19 cs.AI cs.CY cs.LG 版本更新

Position: AI Evaluations Should be Grounded on a Theory of Capability

立场：AI评估应基于能力理论

Nathanael Jo, Ashia Wilson

发表机构 * MIT EECS, Cambridge, USA（麻省理工学院电子工程与计算机科学系，剑桥，美国）

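AI summary: This paper argues that AI evaluations should be grounded in an explicit theory of capability, shows empirically that evaluation results depend strongly on modeling assumptions, and proposes an Evaluation Card to promote transparent evaluation practice.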

Comments ICML 2026 Position Paper Track

详情

AI中文摘要

生成模型的评估如今普遍存在，其结果深刻影响公众和科学界对AI能力的看法。然而，对其可靠性的怀疑持续增长。如何确定报告的准确率真实反映模型的底层性能？尽管基准结果常被视为能力的直接测量，但实际上它们是推断：将分数视为能力证据已预设了能力定义的理论。我们主张AI评估应作为基于明确能力理论的推断任务。虽然这一观点在心理学测量学等学科中是标准做法，但在AI评估中仍不完善，核心假设常被隐含。作为概念验证，我们实证显示报告性能可能强烈依赖评估者的建模假设，凸显透明、理论驱动的评估实践的必要性。最后，我们提出Evaluation Card帮助研究人员记录、论证和审查AI评估背后的建模决策。

英文摘要

Evaluations of generative models are now ubiquitous, and their outcomes critically shape public and scientific expectations of AI's capabilities. Yet skepticism about their reliability continues to grow. How can we know that a reported accuracy genuinely reflects a model's underlying performance? Although benchmark results are often presented as direct measurements of capability, in practice they are inferences: treating a score as evidence of capability already presupposes a theory of what it means to be capable at a task. We argue that AI evaluations should instead be framed as inference tasks grounded on an explicit theory of capability. While this perspective is standard in fields like psychometrics, it remains underdeveloped in AI evaluation, where core assumptions are often left implicit. As a proof-of-concept, we empirically show that reported performance can depend strongly on the evaluator's modeling assumptions, underscoring the need for transparent, theory-driven evaluation practices. We conclude by offering an Evaluation Card to help researchers document, justify, and scrutinize the modeling decisions underlying AI evaluations.


2509.18103 2026-05-19 cs.LG math.NT 版本更新

Machine Learnability as a Measure of Order in Aperiodic Sequences

机器可学性作为非周期序列中的秩序度量

Jennifer Dodgson, Michael Joedhitya, Adith Ramdas, Surender Suresh Kumar, Adarsh Singh Chauhan, Akira Rafhael, Wang Mingshu, Nordine Lotfi

发表机构 * ImageNet

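AI summary: Using an image-focused machine learning model, this paper studies the regularity of prime distributions in different regions of the Ulam spiral and finds that primes in higher regions are easier to learn, pointing to the potential of machine learning in number theory research.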

详情

AI中文摘要

对素数分布的研究揭示了其双重特性：定义确定性但表现出类似随机过程的统计行为。本文展示了一个图像聚焦的机器学习模型可用于测量特定区域的Ulam螺旋中素数场的相对规律性。具体而言，模型在训练块提取自500m区域时，比训练块提取自低于25m区域的模型在纯准确率上更优。这表明前者区域存在更易学习的秩序。此外，精确度和召回率的详细分析似乎表明，模型在螺旋的不同区域采用不同的分类方法，更关注于识别低数的素数模式，而在高数区域更注重消除合数。这与数论猜想一致，即在更高数量级时，素数分布的噪声应减少，平均值（密度、AP等分布）将主导，而局部随机性在按log x缩放后将趋于规律化。这些发现表明，机器学习可以成为数论研究的新实验工具。值得注意的是，该方法在研究强素数和弱素数的模式以用于密码学目的方面显示出潜力。

英文摘要

Research on the distribution of prime numbers has revealed a dual character: deterministic in definition yet exhibiting statistical behavior reminiscent of random processes. In this paper we show that it is possible to use an image-focused machine learning model to measure the comparative regularity of prime number fields at specific regions of an Ulam spiral. Specifically, we demonstrate that in pure accuracy terms, models trained on blocks extracted from regions of the spiral in the vicinity of 500m outperform models trained on blocks extracted from the region representing integers lower than 25m. This implies the existence of more easily learnable order in the former region than in the latter. Moreover, a detailed breakdown of precision and recall scores seems to imply that the model is favouring a different approach to classification in different regions of the spiral, focusing more on identifying prime patterns at lower numbers and more on eliminating composites at higher numbers. This aligns with number theory conjectures suggesting that at higher orders of magnitude we should see diminishing noise in prime number distributions, with averages (density, AP equidistribution) coming to dominate, while local randomness regularises after scaling by log x. Taken together, these findings point toward an interesting possibility: that machine learning can serve as a new experimental instrument for number theory. Notably, the method shows potential for investigating the patterns in strong and weak primes for cryptographic purposes.
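For readers unfamiliar with the input representation, the following sketch builds a small Ulam spiral and the corresponding prime mask, the kind of binary image an image-focused model could be trained on. The grid size, the orientation of the spiral, and the helper names are illustrative assumptions; the paper's block extraction procedure is not reproduced here.

```python
def sieve(limit):
    """Return a list where is_prime[k] is True exactly when k is prime (k <= limit)."""
    is_prime = [False, False] + [True] * (limit - 1)
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    return is_prime

def ulam_spiral(size):
    """Return a size x size grid (size odd) with 1 at the center, spiraling outward."""
    if size % 2 == 0:
        raise ValueError("size must be odd so the spiral has a center cell")
    grid = [[0] * size for _ in range(size)]
    row = col = size // 2
    value = 1
    grid[row][col] = value
    directions = [(0, 1), (-1, 0), (0, -1), (1, 0)]  # right, up, left, down
    step, d = 1, 0
    while value < size * size:
        for _ in range(2):  # each step length is used for two consecutive directions
            dr, dc = directions[d % 4]
            for _ in range(step):
                if value >= size * size:
                    break
                row, col = row + dr, col + dc
                value += 1
                grid[row][col] = value
            d += 1
        step += 1
    return grid

def prime_mask(grid):
    """Map each cell of the spiral to True if its integer is prime."""
    is_prime = sieve(max(max(r) for r in grid))
    return [[is_prime[v] for v in r] for r in grid]

spiral = ulam_spiral(5)
mask = prime_mask(spiral)
```

Blocks cut from such a mask at different distances from the center correspond to different magnitude ranges of integers, which is what allows regions of the spiral to be compared against each other.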

2509.03403 2026-05-19 cs.LG cs.AI (updated version)

Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training

Chenlu Ye, Zhou Yu, Ziji Zhang, Hao Chen, Narayanan Sadagopan, Jing Huang, Tong Zhang, Anurag Beniwal

Affiliations: Amazon; University of Illinois Urbana-Champaign

AI summary: This paper proposes PROF, which uses process-consistency filtering to improve reasoning quality and final-answer accuracy while reducing dependence on strong PRMs.

Details

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) improves final-answer accuracy on reasoning tasks, but it does not reliably improve reasoning quality. Because outcome rewards only assess final answers, they also reward spurious successes: flawed reasoning can still receive maximal reward when it accidentally reaches the correct outcome. This outcome reward hacking creates biased gradients, making current RLVR insufficient for learning faithful reasoning. Process Reward Models (PRMs) provide step-wise supervision, but directly optimizing PRMs or naively combining them with outcome rewards is unstable under distribution shift during RL training. We introduce PRocess cOnsistency Filter (PROF), a data curation method that uses PRM-ORM consistency for sample selection rather than direct reward optimization. PROF keeps correct responses with strong process support and incorrect responses with weak process support while maintaining a balanced training ratio. Experiments show that PROF consistently improves both final-answer accuracy and intermediate reasoning quality over strong baselines, with less dependence on strong PRMs.
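The selection rule in the abstract (keep correct responses with strong process support and incorrect responses with weak process support, in balanced numbers) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Rollout` fields, the use of a single scalar `process_score` per response, and the "equal count per class" balancing are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    text: str
    is_correct: bool      # outcome reward from a verifier (assumed binary)
    process_score: float  # hypothetical scalar PRM score, e.g. a mean over steps


def consistency_filter(rollouts, keep_per_class):
    """Keep rollouts where process and outcome signals agree, balanced by class."""
    # Correct answers: prefer the strongest process support.
    correct = sorted(
        (r for r in rollouts if r.is_correct),
        key=lambda r: r.process_score,
        reverse=True,
    )
    # Incorrect answers: prefer the weakest process support.
    incorrect = sorted(
        (r for r in rollouts if not r.is_correct),
        key=lambda r: r.process_score,
    )
    # Assumed balancing rule: the same number from each class.
    # If either class is empty, nothing is kept.
    k = min(keep_per_class, len(correct), len(incorrect))
    return correct[:k] + incorrect[:k]


group = [
    Rollout("a", True, 0.9),
    Rollout("b", True, 0.2),   # correct answer, weak reasoning: dropped
    Rollout("c", False, 0.8),  # wrong answer, plausible reasoning: dropped
    Rollout("d", False, 0.1),
]
kept = consistency_filter(group, keep_per_class=1)
```

The point of the sketch is that the PRM only ranks samples within each outcome class; it never enters the reward itself, which matches the abstract's contrast with direct reward optimization.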

2509.01629 2026-05-19 stat.ML cs.LG cs.NA math.NA (updated version)

Lipschitz-Guided Design of Interpolation Schedules in Generative Models

Yifan Chen, Eric Vanden-Eijnden, Jiawei Xu

Affiliations: Department of Mathematics, University of California, Los Angeles, CA, USA; Machine Learning Lab, Capital Fund Management, Paris, France; Courant Institute, New York University, NY, USA; Now at University of Maryland, College Park, MD, USA

AI summary: This paper studies the design of interpolation schedules in generative models from statistical and numerical perspectives, and proposes designing schedules by minimizing the averaged squared Lipschitzness of the drift field to improve stability and accuracy.

Details

Abstract

We study the design of interpolation schedules in flow and diffusion-based generative models from both statistical and numerical perspectives. Within the stochastic interpolants framework, we first show that scalar interpolation schedules are statistically equivalent under the Kullback-Leibler divergence in path space, after optimal a posteriori tuning of the diffusion coefficient. This equivalence motivates focusing on numerical properties of the drift field rather than purely statistical criteria. We propose minimizing the averaged squared Lipschitzness of the drift as a principled criterion for schedule design, in contrast with kinetic-energy minimization in optimal transport. A simple transfer formula expresses the drift of one schedule in terms of the drift of another, allowing the designed schedule to be used at inference time with a model trained under a different (e.g., linear) schedule, without retraining. We work out the optimal schedules analytically for Gaussian and Gaussian-mixture targets: for Gaussians, we obtain exponential improvements in the Lipschitz constant over linear schedules; for Gaussian mixtures, we obtain schedules that mitigate mode collapse in few-step sampling. We then validate the approach on high-dimensional invariant measures of stochastic Allen-Cahn and Navier-Stokes equations, where the designed schedule yields markedly more accurate fine-scale statistics at fixed integrator budget.

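2508.15100 2026-05-19 cs.CR cs.LG (updated version)

Shift Detection and Adaptation for Network Intrusion Detection

Ehssan Mousavipour, Andrey Dimanchev, Majid Ghaderi

Affiliations: University of Calgary

AI summary: This paper proposes the NetSight framework, which improves the robustness of network intrusion detection by continually detecting and adapting to distribution shifts online; experiments show F1-score gains over existing methods that rely on manual labeling.

Details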

Abstract

Distribution shift, a change in the statistical properties of data over time, poses a critical challenge for deep learning anomaly detection systems. Existing anomaly detection systems often struggle to adapt to these shifts. Specifically, systems based on supervised learning require costly manual labeling, while those based on unsupervised learning rely on clean data, which is difficult to obtain, for shift adaptation. Both of these requirements are challenging to meet in practice. In this paper, we introduce NetSight, a framework for supervised anomaly detection in network data that continually detects and adapts to distribution shifts in an online manner. NetSight eliminates manual intervention through a novel pseudo-labeling technique and uses a knowledge distillation-based adaptation strategy to prevent catastrophic forgetting. Evaluated on three long-term network datasets, NetSight demonstrates superior adaptation performance compared to state-of-the-art methods that rely on manual labeling, achieving F1-score improvements of up to 11.72%. This proves its robustness and effectiveness in dynamic networks that experience distribution shifts over time.

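2508.04227 2026-05-19 cs.CV cs.LG (updated version)

Continual Learning for VLMs: A Survey and Taxonomy Beyond Forgetting

Yuyang Liu, Qiuhe Hong, Linlan Huang, Alexandra Gomez-Villa, Dipam Goswami, Xialei Liu, Joost van de Weijer, Yonghong Tian

AI summary: This survey reviews the challenges of continual learning for vision-language models and proposes four core paradigms to address cross-modal feature drift and catastrophic forgetting, emphasizing future work on zero-shot learning and agentic ecosystems.

Details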

Abstract

Vision-language models (VLMs) and the recent surge of Multimodal Large Language Models (MLLMs) have revolutionized artificial intelligence with unprecedented cross-modal alignment and zero-shot generalization. However, enabling them to learn continually from non-stationary data remains a major challenge, as their cross-modal alignment and generalization capabilities are particularly vulnerable to catastrophic forgetting. Unlike traditional unimodal continual learning (CL), VLMs face unique challenges such as cross-modal feature drift, parameter interference due to shared architectures, and zero-shot capability erosion. Furthermore, generative MLLMs exhibit a unique "alignment tax," where catastrophic forgetting manifests not merely as factual amnesia, but as a systemic collapse of deep Chain-of-Thought (CoT) reasoning. This survey presents the first comprehensive, diagnostic review bridging continual learning for both predictive VLMs and generative MLLMs. We systematically deconstruct the aforementioned failure modes and propose a challenge-driven taxonomy comprising four core paradigms: (1) Multi-Modal Replay Strategies addressing explicit and implicit memory drift; (2) Cross-Modal Regularization enforcing topological and geometric alignment; (3) Parameter-Efficient Adaptation utilizing dynamic routing and subspace projections; and the emerging (4) Model Fusion and Decoupling paradigms. We critically analyze the evolution of evaluation protocols, highlighting the essential shift toward dual-track benchmarks (Domain vs. Ability CL) and micro-diagnostic CoT evaluations. Finally, we chart a roadmap for future research, emphasizing compositional zero-shot learning, embodied AI with sensor fusion, and autonomous agentic ecosystems. All resources are available at: https://github.com/YuyangSunshine/Awesome-Continual-learning-of-Vision-Language-Models.

2508.04149 2026-05-19 cs.CL cs.AI cs.LG (updated version)

Difficulty-Based Preference Data Selection by DPO Implicit Reward Gap

Xuan Qi, Rongwu Xu, Zhijing Jin

Affiliations: Paul G. Allen School of Computer Science & Engineering, University of Washington; Max Planck Institute for Intelligent Systems, Tübingen, Germany; Jinesis Lab, University of Toronto & Vector Institute

AI summary: This paper proposes a difficulty-based preference data selection method that uses the DPO implicit reward to select examples with small reward gaps, improving data efficiency and alignment performance and outperforming five baselines across multiple datasets and alignment tasks.

Comments Our code and data are available at https://github.com/Difficulty-Based-Preference-Data-Select/Difficulty-Based-Preference-Data-Select

Details

Abstract

Aligning large language models (LLMs) with human preferences is a critical challenge in AI research. While methods like Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO) are widely used, they often rely on large, costly preference datasets. Existing work lacks methods for high-quality data selection designed specifically for preference data. In this work, we introduce a novel difficulty-based data selection strategy for preference datasets, grounded in the DPO implicit reward mechanism. By selecting preference data examples with smaller DPO implicit reward gaps, which are indicative of more challenging cases, we improve data efficiency and model alignment. Our approach consistently outperforms five strong baselines across multiple datasets and alignment tasks, achieving superior performance with only 10% of the original data. This principled, efficient selection method offers a promising solution for scaling LLM alignment with limited resources.
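As a rough illustration of the selection idea, the sketch below ranks preference pairs by their DPO implicit reward gap and keeps the smallest. It uses the standard DPO implicit reward, r(x, y) = β · (log π_θ(y|x) − log π_ref(y|x)). Everything else is an assumption: the `PreferencePair` container, the precomputed log-probabilities, the default β, and ranking by the signed rather than absolute gap (the abstract does not say which the paper uses).

```python
from dataclasses import dataclass


@dataclass
class PreferencePair:
    prompt: str
    chosen_logp_policy: float    # log pi_theta(chosen | prompt), precomputed
    chosen_logp_ref: float       # log pi_ref(chosen | prompt)
    rejected_logp_policy: float  # log pi_theta(rejected | prompt)
    rejected_logp_ref: float     # log pi_ref(rejected | prompt)


def implicit_reward_gap(pair, beta=0.1):
    """DPO implicit reward of the chosen response minus that of the rejected one."""
    chosen = beta * (pair.chosen_logp_policy - pair.chosen_logp_ref)
    rejected = beta * (pair.rejected_logp_policy - pair.rejected_logp_ref)
    return chosen - rejected


def select_hardest(pairs, fraction=0.1, beta=0.1):
    """Keep the given fraction of pairs with the smallest (signed) reward gap."""
    keep = max(1, int(len(pairs) * fraction))
    return sorted(pairs, key=lambda p: implicit_reward_gap(p, beta))[:keep]


# Toy data: pair i has gap 0.1 * i, so "p0" is the hardest example.
pairs = [PreferencePair(f"p{i}", float(i), 0.0, 0.0, 0.0) for i in range(10)]
subset = select_hardest(pairs, fraction=0.1)
```

In practice the four log-probabilities would come from scoring each response under a policy and a reference model; which policy checkpoint supplies π_θ is a design decision the abstract leaves open.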

2508.02383 2026-05-19 cs.LG cs.IR (updated version)

Graph Embedding in the Graph Fractional Fourier Transform Domain

Changjie Sheng, Zhichao Zhang, Yangfan He

Affiliations: School of Mathematics and Statistics, Nanjing University of Information Science and Technology; Hubei Key Laboratory of Applied Mathematics, Hubei University; Key Laboratory of System Control and Information Processing, Ministry of Education, Shanghai Jiao Tong University; Nanjing Institute of Technology; Jiangsu Province Engineering Research Center of IntelliSense Technology and System, Nanjing

AI summary: This paper proposes GEFRFE, which extends generalized frequency filtering embedding with the graph fractional Fourier transform to make embeddings more informative; experiments show it captures richer structural features and improves classification performance.

Details

Abstract

Spectral graph embedding plays a critical role in graph representation learning by generating low-dimensional vector representations from graph spectral information. However, the embedding space of traditional spectral embedding methods often exhibits limited expressiveness, failing to exhaustively capture latent structural features across alternative transform domains. To address this issue, we use the graph fractional Fourier transform (GFRFT) to extend the existing state-of-the-art generalized frequency filtering embedding (GEFFE) into fractional domains, yielding the generalized fractional filtering embedding (GEFRFE), which enhances embedding informativeness via the graph fractional domain. The GEFRFE leverages graph fractional domain filtering and a nonlinear composition of eigenvector components derived from a fractionalized graph Laplacian. To dynamically determine the fractional order, two parallel strategies are introduced: search-based optimization and ResNet18-based adaptive learning. Extensive experiments on five benchmark datasets demonstrate that the GEFRFE captures richer structural features and significantly enhances classification performance. The GEFRFE provides a new paradigm for the development of graph embedding from the "fixed domain" to the "generalized domain". The results indicate that introducing the GFRFT into the graph embedding domain is a correct and effective research path. Notably, the proposed method retains computational complexity comparable to GEFFE approaches.

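2508.00712 2026-05-19 cs.LG cs.AI (updated version)

JSON-Bag: A generic game trajectory representation

Dien Nguyen, Diego Perez-Liebana, Simon Lucas

Affiliations: GitHub

AI summary: This paper proposes the JSON-Bag model, which tokenizes JSON descriptions of game trajectories and compares them using the Jensen-Shannon distance; across six tabletop games it classifies players, parameters, and seeds more accurately than baseline methods.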

Comments 8 pages, 3 figures, 6 tables, published in IEEE Conference on Games 2025
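The summary describes two steps: turning JSON game states into a bag of tokens, and comparing bags with the Jensen-Shannon distance. A minimal sketch of both follows. The tokenization scheme (`path=value` strings) and the toy trajectories are hypothetical and not taken from the paper; only the Jensen-Shannon distance itself is the standard definition (square root of the base-2 divergence).

```python
import json
import math
from collections import Counter


def json_tokens(value, prefix=""):
    """Flatten a JSON value into 'path=value' tokens (assumed tokenization)."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from json_tokens(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(value, list):
        for child in value:
            yield from json_tokens(child, prefix + "[]")
    else:
        yield f"{prefix}={value}"


def bag_of_tokens(trajectory):
    """Count tokens over every state in a trajectory (a list of JSON objects)."""
    bag = Counter()
    for state in trajectory:
        bag.update(json_tokens(state))
    return bag


def jensen_shannon_distance(bag_p, bag_q):
    """Jensen-Shannon distance between two non-empty bags, in [0, 1]."""
    total_p, total_q = sum(bag_p.values()), sum(bag_q.values())
    divergence = 0.0
    for token in set(bag_p) | set(bag_q):
        p = bag_p[token] / total_p  # Counter returns 0 for missing tokens
        q = bag_q[token] / total_q
        m = 0.5 * (p + q)
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log2(q / m)
    return math.sqrt(max(divergence, 0.0))


traj_a = [json.loads('{"player": 1, "hand": ["A", "K"]}')]
traj_b = [json.loads('{"player": 2, "hand": ["A", "Q"]}')]
distance = jensen_shannon_distance(bag_of_tokens(traj_a), bag_of_tokens(traj_b))
```

With a distance like this, a nearest-prototype or nearest-neighbour classifier over trajectories needs no game-specific feature engineering, which is presumably what makes the representation generic.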

2507.21334 2026-05-19 stat.ML cs.LG (updated version)

Graph neural networks for residential location choice: connection to classical logit models

Zhanhong Cheng, Lingqian Hu, Yuheng Bu, Yuqi Zhou, Shenhao Wang

Affiliations: Department of Urban and Regional Planning, University of Florida; Department of Landscape Architecture and Urban Planning, Texas A&M University; Department of Computer Science, University of California, Santa Barbara

AI summary: This paper proposes a graph neural network model of residential location choice that captures spatial substitution relationships, outperforms traditional models, and shows the potential of combining deep learning with discrete choice models.

Details

DOI: 10.1016/j.trb.2026.103464

Abstract

Researchers have adopted deep learning for classical discrete choice analysis because it can capture complex feature relationships and achieve higher predictive performance. However, existing deep learning approaches cannot explicitly capture the relationships among choice alternatives, which have been a long-standing focus of classical discrete choice models. To address this gap, this paper introduces graph neural networks (GNNs) as a novel framework for analyzing residential location choice. The GNN-based discrete choice models (GNN-DCMs) offer a structured approach for neural networks to capture dependence among spatial alternatives, while maintaining clear connections to classical random utility theory. Theoretically, we demonstrate that the GNN-DCMs incorporate the nested logit (NL) model and the spatially correlated logit (SCL) model as two special cases, yielding a novel algorithmic interpretation through message passing among alternatives' utilities. Empirically, the GNN-DCMs outperform benchmark multinomial logit (MNL), SCL, and feedforward neural network models in predicting residential location choices among Chicago's 77 community areas. Regarding model interpretation, the GNN-DCMs can capture individual heterogeneity and exhibit spatially aware substitution patterns. Overall, these results highlight the potential of GNN-DCMs as a unified and expressive framework for combining discrete choice modeling and deep learning in complex spatial choice contexts.

2507.12969 2026-05-19 cs.LG cs.CV (updated version)

WaveletInception Networks for on-board Vibration-Based Infrastructure Health Monitoring

Reza Riahi Samani, Alfredo Nunez, Bart De Schutter

Affiliations: Delft Center for Systems and Control (DCSC), Delft University of Technology; Section of Railway Engineering, Department of Engineering Structures, Delft University of Technology

AI summary: This paper proposes the WaveletInception-BiGRU network, which extracts spectral features with a learnable wavelet packet transform, learns multi-scale features with Inception-residual modules, and integrates temporal dependencies with BiGRU modules, enabling vibration-signal analysis without preprocessing and improving the accuracy and automation of on-board infrastructure health monitoring.

Comments Under review for the Journal of Engineering Application of Artificial Intelligence

Details

DOI: 10.1016/j.engappai.2026.113976

Abstract

This paper presents a deep learning framework for analyzing on-board vibration response signals in infrastructure health monitoring. The proposed WaveletInception-BiGRU network uses a Learnable Wavelet Packet Transform (LWPT) for early spectral feature extraction, followed by one-dimensional Inception-Residual Network (1D Inception-ResNet) modules for multi-scale, high-level feature learning. Bidirectional Gated Recurrent Unit (BiGRU) modules then integrate temporal dependencies and incorporate operational conditions, such as the measurement speed. This approach enables effective analysis of vibration signals recorded at varying speeds, eliminating the need for explicit signal preprocessing. The sequential estimation head further leverages bidirectional temporal information to produce an accurate, localized assessment of infrastructure health. Ultimately, the framework generates high-resolution health profiles spatially mapped to the physical layout of the infrastructure. Case studies involving track stiffness regression and transition zone classification using real-world measurements demonstrate that the proposed framework significantly outperforms state-of-the-art methods, underscoring its potential for accurate, localized, and automated on-board infrastructure health monitoring.

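2507.09148 2026-05-19 stat.ML cs.LG math.OC (updated version)

A Randomized Algorithm for Sparse PCA based on the Basic SDP Relaxation

Alberto Del Pia, Dekun Zhou

Affiliations: Department of Industrial and Systems Engineering & Wisconsin Institute for Discovery, University of Wisconsin-Madison

AI summary: This paper proposes a randomized approximation algorithm for sparse PCA based on the basic SDP relaxation; it constructs deterministic and randomized solutions and outputs the best one, achieving with high probability an approximation ratio of at most the sparsity constant, and a logarithmically bounded average ratio under a technical assumption.

Comments 29 pages, 2 figures

Details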

Abstract

Sparse Principal Component Analysis (SPCA) is a fundamental technique for dimensionality reduction, and is NP-hard. In this paper, we introduce a randomized approximation algorithm for SPCA, which is based on the basic SDP relaxation. Our algorithm takes an (approximate) SDP solution, constructs one deterministic sparse solution and several randomized solutions, and outputs the best among them. Our algorithm has an approximation ratio of at most the sparsity constant with high probability, if called enough times. Under a technical assumption, which is consistently satisfied in our numerical tests, the average approximation ratio is also bounded by $\mathcal{O}(\log{d})$, where $d$ is the number of features. We show that this technical assumption is satisfied if the SDP solution is low-rank, or has exponentially decaying eigenvalues. We then present two classes of instances for which this technical assumption holds. We also demonstrate that in a covariance model, which generalizes the spiked Wishart model, the deterministic solution in our algorithm achieves a near-optimal approximation ratio. We demonstrate the efficacy of our algorithm through numerical tests on real-world datasets.

2506.23978 2026-05-19 cs.LG cs.CL cs.CY cs.SI (updated version)

LLM Agents Are the Antidote to Walled Gardens

Samuele Marro, Philip Torr

Affiliations: Department of Engineering Science, University of Oxford; Institute for Decentralized AI

AI summary: This paper argues that LLM agents enable universal interoperability, breaking the hold of closed platforms and promoting data portability, while discussing the security and legal challenges this brings.

Comments Published at the ICML 2026 Position Paper track

Details

Abstract

While the Internet's core infrastructure was designed to be open and universal, today's application layer is dominated by closed, proprietary platforms. Open and interoperable APIs require significant investment, and market leaders have little incentive to enable data exchange that could erode their user lock-in. We argue that LLM-based agents fundamentally disrupt this status quo. Agents can automatically translate between data formats and interact with interfaces designed for humans: this makes interoperability dramatically cheaper and effectively unavoidable. We name this shift universal interoperability: the ability for any two digital services to exchange data seamlessly using AI-mediated adapters. Universal interoperability undermines monopolistic behaviours and promotes data portability. However, it can also lead to new security risks, technical debt, and legal frictions. Our position is that the ML community should embrace this development while building the appropriate frameworks to mitigate the downsides. By acting now, we can harness AI to restore user freedom and competitive markets without sacrificing security.

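2506.22901 2026-05-19 cs.LG cs.AI q-bio.BM q-bio.GN (updated version)

Missing-Modality-Aware Graph Neural Network for Cancer Classification

Sina Tabakhi, Chen Chen, Haiping Lu

Affiliations: School of Computer Science, University of Sheffield

AI summary: This paper proposes MAGNET, which fuses lower-dimensional modality embeddings with a dynamic patient-modality multi-head attention mechanism to improve multimodal prediction with partial modalities; experiments show it outperforms existing methods on cancer classification.

Comments 27 pages, 22 figures

Details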

Abstract

A key challenge in learning from multimodal biological data is missing modalities, where data from one or more modalities are absent for some patients. Existing approaches either exclude patients with missing modalities, impute missing modalities, or make predictions directly with partial modalities. However, most of these methods rely on inflexible, patient-agnostic fusion strategies and do not scale computationally to the combinatorial growth of missing-modality patterns as the number of modalities increases. To address these limitations, we propose MAGNET (Missing-modality-Aware Graph neural NETwork) to enhance multimodal prediction with partial modalities, featuring a dynamic patient-modality multi-head attention mechanism to fuse lower-dimensional modality embeddings based on their contribution and missingness. MAGNET fusion's complexity increases linearly with the number of modalities while adapting to missing-pattern variability. To generate predictions, MAGNET further constructs a patient graph with fused multimodal embeddings as node features and connectivity determined by the modality missingness, followed by a graph neural network. Experiments on three public multiomics datasets for cancer classification, with real-world missingness, show that MAGNET outperforms state-of-the-art fusion methods. The data and code are available at https://github.com/SinaTabakhi/MAGNET.

2506.11925 2026-05-19 cs.AR cs.AI cs.CV cs.LG (updated version)

Real-World Deployment of a Lane Change Prediction Architecture Based on Knowledge Graph Embeddings and Bayesian Inference

M. Manzour, Catherine M. Elias, Omar M. Shehata, R. Izquierdo, M. A. Sotelo

Affiliations: Department of Computer Engineering, University of Alcalá; Department of Computer Science, German University in Cairo; Department of Mechatronics, German University in Cairo

AI summary: This paper presents a lane-change prediction system based on knowledge graph embeddings and Bayesian inference, validated on real hardware to bridge algorithms and on-road deployment; it predicts the target vehicle's lane change three to four seconds in advance.

Details

DOI: 10.1109/ICVES65691.2025.11376512
Journal ref: 2025 IEEE International Conference on Vehicular Electronics and Safety (ICVES)

Abstract

Research on lane change prediction has gained a lot of momentum in the last couple of years. However, most research is confined to simulation or results obtained from datasets, leaving a gap between algorithmic advances and on-road deployment. This work closes that gap by demonstrating, on real hardware, a lane-change prediction system based on Knowledge Graph Embeddings (KGEs) and Bayesian inference. Moreover, the ego-vehicle employs a longitudinal braking action to ensure the safety of both itself and the surrounding vehicles. Our architecture consists of two modules: (i) a perception module that senses the environment, derives numerical input features, converts them into linguistic categories, and communicates them to the prediction module; and (ii) a pretrained prediction module that executes a KGE and Bayesian inference model to anticipate the target vehicle's maneuver and transforms the prediction into a longitudinal braking action. Real-world hardware experimental validation demonstrates that our prediction system anticipates the target vehicle's lane change three to four seconds in advance, providing the ego vehicle sufficient time to react and allowing the target vehicle to make the lane change safely.

2506.10959 2026-05-19 cs.LG cs.AI math.ST stat.TH (updated version)

FlowMixer: A Depth-Agnostic Neural Architecture for Interpretable Spatiotemporal Forecasting

Fares B. Mehouachi, Saif Eddin Jabari

Affiliations: New York University in Abu Dhabi; New York University Abu Dhabi; Brooklyn, USA

AI summary: FlowMixer models structured spatiotemporal patterns with constrained matrix operations inside a reversible mapping framework, giving interpretable forecasts; Kronecker-Koopman eigenmodes allow the prediction horizon to be manipulated directly without retraining.

Comments Accepted (main track) at NeurIPS 2025. 44 pages, 17 figures, 22 tables. Published in Advances in Neural Information Processing Systems, vol. 38

Details

Journal ref: Advances in Neural Information Processing Systems, vol. 38, pp. 88811-88861, 2025

Abstract

We introduce FlowMixer, a single-layer neural architecture that leverages constrained matrix operations to model structured spatiotemporal patterns with enhanced interpretability. FlowMixer incorporates non-negative matrix mixing layers within a reversible mapping framework - applying transforms before mixing and their inverses afterward. This shape-preserving design enables a Kronecker-Koopman eigenmodes framework that bridges statistical learning with dynamical systems theory, providing interpretable spatiotemporal patterns and facilitating direct algebraic manipulation of prediction horizons without retraining. The architecture's semi-group property enables this single layer to mathematically represent any depth through composition, eliminating depth search entirely. Extensive experiments across diverse domains demonstrate FlowMixer's long-horizon forecasting capabilities while effectively modeling physical phenomena such as chaotic attractors and turbulent flows. Our results achieve performance matching state-of-the-art methods while offering superior interpretability through directly extractable eigenmodes. This work suggests that architectural constraints can simultaneously maintain competitive performance and enhance mathematical interpretability in neural forecasting systems.

2505.03205 2026-05-19 cs.LG cs.NA math.NA math.ST stat.TH (updated version)

Neural Equilibria for Long-Term Prediction of Nonlinear Conservation Laws

J. Antonio Lara Benitez, Kareem Hegazy, Junyi Guo, Ivan Dokmanić, Michael W. Mahoney, Maarten V. de Hoop

Affiliations: Rice University; ICSI and University of California at Berkeley; University of Basel; ICSI, LBNL, and University of California at Berkeley

AI summary: This paper proposes NeurDE, which combines conservation laws with neural networks to give more accurate long-term predictions for systems governed by nonlinear conservation laws, outperforming existing SciML methods.

Details

Abstract

Nonlinear conservation laws govern a broad class of important physical systems in science and industry and are central to scientific machine learning (SciML). Large general-purpose models offer speed, but replacing the numerical and physical structure of solvers often compromises stability, accuracy, and physical faithfulness. Here, we aim to balance the general inductive bias of conservation with the flexibility and speed of neural networks through a conservation-aware SciML backbone, which we call Neural Discrete Equilibrium (NeurDE). NeurDE places machine learning inside a kinetic solver by learning the local equilibrium closure of a Boltzmann formulation. The kinetic solver still performs transport, relaxation, moment recovery, and conservation; the neural network provides only the nonlinear equilibrium target. We test NeurDE on $6$ conserved systems, including three very challenging subsonic, transonic, and supersonic shock systems. NeurDE outperforms state-of-the-art SciML methods, including neural operators and pretrained SciML foundation models that are $10^4$ and $10^6$ times larger, respectively. Most notably, NeurDE improves upon the numerical method from which it is derived. NeurDE therefore provides a compact target for scientific machine learning in conservative simulation: learn the equilibrium law toward which the system relaxes, not the evolution law itself.

2411.18234 2026-05-19 cs.LG cs.AI cs.PF stat.CO (updated version)

Time-Efficient Hybrid Hyperparameter Tuning Approach for Cardiovascular Disease Classification

Abhay Kumar Pathak, Mrityunjay Chaubey, Manjari Gupta

Affiliations: Department of Computer Science, Institute of Science, Banaras Hindu University; School of Computer Science, University of Petroleum and Energy Studies

AI summary: This paper proposes a hybrid hyperparameter tuning method combining random search and grid search that improves the accuracy and efficiency of cardiovascular disease classification models; experiments show it outperforms traditional methods in both performance and computation time.

Details

AI中文摘要

心血管疾病（CVDs）是任何严重的心脏疾病，需要准确诊断以防止致命后果。超参数调优在优化机器学习模型性能中起关键作用，通过选择最合适的参数配置来提高准确性、泛化性和可靠性。网格搜索系统地评估预定义的超参数组合，而随机搜索则从搜索空间中随机采样配置，实现更广泛的探索并减少计算成本。因此，在开发分类模型时，高效调优策略至关重要，因为时间和预测能力同样关键。本文提出了一种新的超参数调优方法，用于调优用于CVD分类的机器学习模型。所提出的随机网格搜索结合了随机搜索探索全局空间的能力和网格搜索在最有前途区域的集中和彻底搜索。这种混合方法在探索和利用之间找到最佳平衡，产生了一个稳健且高效的时间机器学习模型。在最先进的模型上的实验结果表明，随机网格搜索比传统超参数调优方法表现更好。除了观察到的模型性能提升外，大多数模型的训练所需计算时间也显著减少。所提研究的结果强调了所提出随机网格搜索方法在训练时间和计算效率上的减少。所提出的技术在医疗保健领域的机器学习应用中具有重大潜力，能够提供及时且准确的CVDs诊断。

英文摘要

Cardiovascular diseases (CVDs) are serious illnesses of the heart that require accurate diagnosis to prevent fatal consequences. Hyperparameter tuning plays a critical role in optimizing machine learning model performance by selecting the most suitable parameter configurations for improved accuracy, generalization, and reliability. Grid search systematically evaluates predefined hyperparameter combinations, whereas random search samples configurations randomly from the search space, enabling broader exploration at reduced computational cost. An efficient tuning strategy is therefore essential when developing classification models for which time plays a crucial role alongside predictive capability. In this work, we propose a new hyperparameter tuning approach for ML models for CVD classification. The proposed random grid search combines the power of random search to explore the global space with the focused and exhaustive search of grid search in the most promising areas. This hybrid approach finds an optimal balance between exploration and exploitation and yields a robust and time-efficient ML model for classification settings. Experimental results on state-of-the-art models demonstrate that randomized grid search performs better than traditional hyperparameter tuning methods. In addition to the observed improvement in model performance, the computational time required for training was substantially reduced for most of the models. The results emphasize the reduction in training time and the computational efficiency of the proposed Randomized-Grid Search method. The proposed technique has significant potential to advance ML applications in healthcare by providing timely and accurate CVD diagnosis.
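A minimal sketch of the two-stage idea described in this abstract: random search first, then an exhaustive grid around the best random candidate. The objective `validation_error`, the parameter names `c` and `depth`, the sampling ranges, and the grid step sizes are all hypothetical stand-ins chosen for illustration; the abstract does not specify how the authors define the promising region.

```python
import itertools
import random


def validation_error(params):
    """Toy stand-in for a cross-validated error (assumption); lower is better."""
    return (params["c"] - 3.0) ** 2 + (params["depth"] - 6) ** 2


def random_then_grid(n_random=30, seed=0):
    rng = random.Random(seed)
    # Stage 1: random search samples the whole space cheaply.
    candidates = [
        {"c": rng.uniform(0.0, 10.0), "depth": rng.randint(1, 12)}
        for _ in range(n_random)
    ]
    best = min(candidates, key=validation_error)
    # Stage 2: exhaustive grid search in a small neighbourhood of the best point.
    c_grid = [best["c"] + step for step in (-0.5, -0.25, 0.0, 0.25, 0.5)]
    depth_grid = [d for d in (best["depth"] - 1, best["depth"], best["depth"] + 1) if d >= 1]
    grid = [{"c": c, "depth": d} for c, d in itertools.product(c_grid, depth_grid)]
    refined = min(grid, key=validation_error)
    return best, refined


best, refined = random_then_grid()
```

Because the local grid contains the random-search winner itself, the refined configuration can never score worse than the random one; the grid only costs a fixed number of extra evaluations (at most 15 here).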

2411.10636 2026-05-19 cs.CL cs.AI cs.LG (updated version)

Mitigating Extrinsic Gender Bias for Bangla Classification Tasks

Sajib Kumar Saha Joy, Arman Hassan Mahy, Meherin Sultana, Azizah Mamun Abha, MD Piyal Ahmmed, Yue Dong, G M Shahariar

Affiliations: Ahsanullah University of Science and Technology; University of California, Riverside

AI summary: This paper studies extrinsic gender bias in Bangla pretrained language models, builds four task-specific benchmark datasets, and proposes RandSymKL to mitigate the bias; experiments show that it reduces bias effectively while maintaining high accuracy.

Abstract

In this study, we investigate extrinsic gender bias in Bangla pretrained language models, a largely underexplored area in low-resource languages. To assess this bias, we construct four manually annotated, task-specific benchmark datasets for sentiment analysis, toxicity detection, hate speech detection, and sarcasm detection. Each dataset is augmented using nuanced gender perturbations, where we systematically swap gendered names and terms while preserving semantic content, enabling minimal-pair evaluation of gender-driven prediction shifts. We then propose RandSymKL, a randomized debiasing strategy integrated with symmetric KL divergence and cross-entropy loss to mitigate the bias across task-specific pretrained models. RandSymKL is a refined training approach to integrate these elements in a unified way for extrinsic gender bias mitigation focused on classification tasks. Our approach was evaluated against existing bias mitigation methods, with results showing that our technique not only effectively reduces bias but also maintains competitive accuracy compared to other baseline approaches. To promote further research, we have made both our implementation and datasets publicly available: https://github.com/sajib-kumar/Mitigating-Bangla-Extrinsic-Gender-Bias
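The abstract names two loss ingredients, symmetric KL divergence and cross-entropy. The sketch below shows one way such terms could be combined on a minimal pair (an original sentence and its gender-swapped counterpart). It is an illustration under stated assumptions, not the authors' RandSymKL: the weight `lam`, the summed form of the symmetric KL, and the function names are hypothetical, and the randomized part of the strategy is not modeled.

```python
import math


def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def symmetric_kl(p, q):
    """Symmetric KL as the sum of both directions (one common convention)."""
    return kl(p, q) + kl(q, p)


def cross_entropy(p, label, eps=1e-12):
    """Negative log-probability assigned to the true class index."""
    return -math.log(p[label] + eps)


def debias_loss(p_original, p_swapped, label, lam=1.0):
    # Task loss on the original input plus a consistency penalty that grows
    # when the prediction shifts after the gender swap (lam is hypothetical).
    return cross_entropy(p_original, label) + lam * symmetric_kl(p_original, p_swapped)
```

When the model predicts the same distribution for both versions of the sentence, the penalty vanishes and only the task loss remains.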

2410.07191 2026-05-19 cs.RO cs.LG stat.ME (updated version)

Curb Your Attention: Causal Attention Gating for Robust Trajectory Prediction in Autonomous Driving

Ehsan Ahmadi, Ray Mercurius, Soheil Alizadeh, Kasra Rezaee, Amir Rasouli

Affiliations: University of Alberta; Noah’s Ark Laboratory, Huawei Technologies Canada; Cornell University

AI summary: This paper proposes CRiTIC, which uses a causal discovery network to identify causal relations between agents and introduces a causal attention gating mechanism to improve the robustness and generalization of trajectory prediction; experiments show that robustness to non-causal perturbations improves by up to 54%.

Comments: Accepted at ICRA 2025

DOI: 10.1109/ICRA55743.2025.11128367

Abstract

Trajectory prediction models in autonomous driving are vulnerable to perturbations from non-causal agents whose actions should not affect the ego-agent's behavior. Such perturbations can lead to incorrect predictions of other agents' trajectories, potentially compromising the safety and efficiency of the ego-vehicle's decision-making process. Motivated by this challenge, we propose Causal tRajecTory predICtion (CRiTIC), a novel model that utilizes a Causal Discovery Network to identify inter-agent causal relations over a window of past time steps. To incorporate discovered causal relationships, we propose a novel Causal Attention Gating mechanism to selectively filter information in the proposed Transformer-based architecture. We conduct extensive experiments on two autonomous driving benchmark datasets to evaluate the robustness of our model against non-causal perturbations and its generalization capacity. Our results indicate that the robustness of predictions can be improved by up to 54% without a significant detriment to prediction accuracy. Lastly, we demonstrate the superior domain generalizability of the proposed model, which achieves up to 29% improvement in cross-domain performance. These results underscore the potential of our model to enhance both robustness and generalization capacity for trajectory prediction in diverse autonomous driving domains. Further details can be found on our project page: https://ehsan-ami.github.io/critic.

2409.07014 2026-05-19 stat.ML cs.DB cs.LG (updated version)

A Practical Theory of Generalization in Selectivity Learning

Peizhi Wu, Haoshu Xu, Ryan Marcus, Zachary G. Ives

Affiliations: University of Pennsylvania

AI summary: This paper examines the generalization of selectivity learning from theoretical and practical perspectives, shows that selectivity predictors induced by signed measures are learnable, and improves out-of-distribution (OOD) generalization.

Comments: 15 pages. Technical Report (Extended Version)

DOI: 10.14778/3725688.3725708

Abstract

Query-driven machine learning models have emerged as a promising estimation technique for query selectivities. Yet, surprisingly little is known about the efficacy of these techniques from a theoretical perspective, as there exist substantial gaps between practical solutions and state-of-the-art (SOTA) theory based on the Probably Approximately Correct (PAC) learning framework. In this paper, we aim to bridge the gaps between theory and practice. First, we demonstrate that selectivity predictors induced by signed measures are learnable, which relaxes the reliance on probability measures in SOTA theory. More importantly, beyond the PAC learning framework (which only allows us to characterize how the model behaves when both training and test workloads are drawn from the same distribution), we establish, under mild assumptions, that selectivity predictors from this class exhibit favorable out-of-distribution (OOD) generalization error bounds. These theoretical advances provide us with a better understanding of both the in-distribution and OOD generalization capabilities of query-driven selectivity learning, and facilitate the design of two general strategies to improve OOD generalization for existing query-driven selectivity models. We empirically verify that our techniques help query-driven selectivity models generalize significantly better to OOD queries both in terms of prediction accuracy and query latency performance, while maintaining their superior in-distribution generalization performance.

2409.02428 2026-05-19 cs.LG cs.AI cs.CL cs.SY eess.SY (updated version)

Language Models as Efficient Reward Function Searchers for Custom-Environment Multi-Objective Reinforcement Learning

Guanwen Xie, Jingzehua Xu, Yiyuan Yang, Yimian Ding, Shuai Zhang

Affiliations: Tsinghua Shenzhen International Graduate School, Tsinghua University, China; Department of Computer Science, University of Oxford, United Kingdom; Department of Data Science, New Jersey Institute of Technology, USA

AI summary: This paper proposes ERFSL, which uses language models to search efficiently for reward functions; by generating reward components and using a reward critic to correct code, it achieves efficient zero-shot reward function design for multi-objective reinforcement learning tasks.

Classification of Short-Segment Pediatric Heart Sounds Based on a Transformer Convolutional Neural Network

Md Hassanuzzaman, Nurul Akhtar Hasan, Mohammad Abdullah Al Mamun, Khawza I Ahmed, Ahsan H Khandoker, Raqibul Mostafa

AI summary: This paper investigates the minimum signal duration needed for automatic heart sound classification, using a Transformer-based residual one-dimensional convolutional neural network on MFCC features; 5 s signals reach 93.69% accuracy, whereas 3 s signals carry too little information and 15 s signals may contain more noise.

Comments: 16 pages, 11 figures

DOI: 10.1109/ACCESS.2025.3573870
Journal ref: IEEE Access, vol. 13, pp. 93852-93868, 2025

Abstract

Congenital anomalies that arise from defects in the structure of the heart and great vessels are known as congenital heart diseases (CHDs). A phonocardiogram (PCG) can provide essential details about the mechanical conduction system of the heart and point out specific patterns linked to different kinds of CHD. This study investigates the minimum signal duration required for the automatic classification of heart sounds. It also investigates the optimum values of two signal quality assessment indicators, the Root Mean Square of Successive Differences (RMSSD) and the Zero Crossing Rate (ZCR). Mel-frequency cepstral coefficient (MFCC) features are used as input to a Transformer-based residual one-dimensional convolutional neural network, which then classifies the heart sounds. The study shows that 0.4 is the ideal threshold for obtaining suitable signals with the RMSSD and ZCR indicators. Moreover, a minimum signal length of 5 s is required for effective heart sound classification. A shorter signal (3 s heart sound) does not carry enough information to categorize heart sounds accurately, and a longer signal (15 s heart sound) may contain more noise. The best accuracy, 93.69%, is obtained with the 5 s signal.
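For readers unfamiliar with the two quality indicators named above, the following sketch computes them with their textbook definitions on a plain list of samples. How the paper normalizes the signal, and how the 0.4 threshold is applied to each indicator, is not stated in the abstract, so neither is modeled here.

```python
import math


def zero_crossing_rate(signal):
    """Fraction of consecutive sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(signal, signal[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(signal) - 1)


def rmssd(signal):
    """Root mean square of successive differences between samples."""
    diffs = [b - a for a, b in zip(signal, signal[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))
```

A signal that alternates in sign at every sample has a zero crossing rate of 1, while a signal that never changes sign has a rate of 0.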

2403.11782 2026-05-19 cs.LG stat.ML (updated version)

A tutorial on learning from preferences and choices with Gaussian Processes

Alessio Benavoli, Dario Azzimonti

Affiliations: School of Computer Science and Statistics, Trinity College Dublin; SUPSI, Dalle Molle Institute for Artificial Intelligence (IDSIA)

AI summary: This tutorial presents a framework for preference learning with Gaussian processes that draws on principles from economics and decision theory, and proposes novel models to fill gaps in the existing literature.

2401.03717 2026-05-19 cs.LG cs.AI (updated version)

Universal Time-Series Representation Learning: A Survey

Patara Trirat, Yooju Shin, Junhyeok Kang, Youngeun Nam, Jihye Na, Minyoung Bae, Joeun Kim, Byunghyun Kim, Jae-Gil Lee

Affiliations: KAIST

AI summary: This survey reviews representation learning methods for time-series data, discusses the strengths of deep learning in extracting hidden patterns, and proposes a new taxonomy to guide future research.

Comments: Accepted by ACM Computing Surveys. Extended version: 41 pages, 7 figures

Abstract

Time-series data exists in every corner of real-world systems and services, ranging from satellites in the sky to wearable devices on human bodies. Learning representations by extracting and inferring valuable information from these time series is crucial for understanding the complex dynamics of particular phenomena and enabling informed decisions. With the learned representations, we can perform numerous downstream analyses more effectively. Among several approaches, deep learning has demonstrated remarkable performance in extracting hidden patterns and features from time-series data without manual feature engineering. This survey first presents a novel taxonomy based on three fundamental elements in designing state-of-the-art universal representation learning methods for time series. According to the proposed taxonomy, we comprehensively review existing studies and discuss their intuitions and insights into how these methods enhance the quality of learned representations. Finally, as a guideline for future studies, we summarize commonly used experimental setups and datasets and discuss several promising research directions. An up-to-date corresponding resource is available at https://github.com/itouchz/awesome-deep-time-series-representations.

2310.07983 2026-05-19 cs.LG math.OC stat.ML (updated version)

Achieving Linear Speedup with ProxSkip in Distributed Stochastic Optimization

Luyao Guo, Sulaiman A. Alghunaim, Kun Yuan, Laurent Condat, Jinde Cao

Affiliations: School of Computer Science and Engineering, Suzhou University of Technology; Department of Electrical Engineering, Kuwait University; Center for Machine Learning Research, Peking University; King Abdullah University of Science and Technology; School of Mathematics, Southeast University; Purple Mountain Laboratories

AI summary: This paper studies the convergence of ProxSkip in the non-convex setting, proves that it achieves linear speedup in the number of nodes, and shows that local updates improve communication efficiency.

Abstract

The ProxSkip algorithm for distributed optimization is gaining increasing attention due to its effectiveness in reducing communication. However, existing analyses of ProxSkip are limited to the strongly convex setting and fail to achieve linear speedup with respect to the number of nodes. Key questions regarding its behavior in the non-convex setting and the achievability of linear speedup remain open. In this paper, we revisit decentralized ProxSkip and answer these questions affirmatively. We provide a unified convergence analysis for stochastic non-convex, convex, and strongly convex problems, revealing how gradient noise, local updates, network connectivity, and data heterogeneity jointly determine the convergence behavior. To the best of our knowledge, this is the first analysis showing that decentralized ProxSkip achieves linear speedup in the number of nodes under stochastic gradients. Moreover, our results demonstrate that local updates can effectively reduce communication frequency and improve communication efficiency.

2309.05646 2026-05-19 cs.CR cs.LG cs.NI (updated version)

Lightweight CNN-Based DDoS Detection for Resource-Constrained Edge Networks

Vedanth Ramanathan, Krish Mahadevan, Sejal Dua

Affiliations: Carnegie Mellon University; Green River College; Georgia Institute of Technology

AI summary: This paper proposes a lightweight supervised deep learning method that uses a CNN to detect DDoS attacks by extracting and classifying packet-flow features, enabling low-latency detection at the network edge.

Abstract

Distributed Denial of Service (DDoS) attacks remain a persistent threat to the availability of Internet services, edge networks, and cyber-physical infrastructure. Although recent AI-security work has increasingly focused on foundation models, autonomous agents, and adversarial robustness, many operational defense tasks still require low-latency classification close to the network edge, where cloud-scale analysis may be too slow or expensive. This paper presents a lightweight supervised deep learning approach for DDoS detection using a convolutional neural network (CNN) trained on packet-flow representations derived from the CIC-DDoS2019 benchmark dataset. The proposed pipeline extracts packet flows from PCAP traffic, normalizes them to fixed-length representations, and classifies each flow as benign or malicious using a compact CNN architecture with convolution, dropout, pooling, and sigmoid classification layers. On a held-out test set of previously unseen flows, the model achieves 0.9883 accuracy, 0.9864 precision, 0.9784 recall, and 0.9824 F1 score, while processing the evaluated test flows in 0.28 seconds. These results suggest that compact neural models can provide useful early-warning signals for edge-oriented DDoS detection. We further discuss deployment constraints, benchmark limitations, and future directions for cross-dataset evaluation, hardware-aware profiling, and integration with mitigation pipelines.
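As a quick consistency check on the figures reported above, the F1 score is the harmonic mean of precision and recall, so it can be recomputed from the two reported values. The function name is illustrative only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


f1 = f1_score(0.9864, 0.9784)
print(round(f1, 4))  # 0.9824
```

The recomputed value agrees with the reported F1 of 0.9824 to four decimal places.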

2305.10721 2026-05-19 cs.LG cs.AI (updated version)

Revisiting Long-term Time Series Forecasting: An Investigation on Linear Mapping

Zhe Li, Shiyi Qi, Yiduo Li, Zenglin Xu

Affiliations: Harbin Institute of Technology, Shenzhen, China

AI summary: This paper studies the effectiveness of linear mapping in long-term time series forecasting, reveals the key role of affine mapping in forecasting periodic signals, and examines how reversible normalization and the input horizon affect model robustness.

DOI: 10.20935/AcadAI8236
Journal ref: Li, Zhe, Shiyi Qi, Yiduo Li, and Zenglin Xu. Revisiting Long-Term Time Series Forecasting: an Investigation on Affine Mapping. Academia AI and Applications 2, no. 2 (2026)

Abstract

Introduction: Long-term time series forecasting (LTSF) has gained significant attention in recent years. While various specialized designs exist for capturing temporal dependency, recent studies have shown that even a single linear layer can achieve competitive performance. This paper investigates the intrinsic effectiveness of recent LTSF approaches and reveals the critical role of affine mapping. Materials and methods: We conduct comprehensive experiments on both simulated and real-world datasets to analyze the components of state-of-the-art models. A theoretical analysis is provided to explain the working mechanisms of affine mapping in periodic signal forecasting. We evaluate the impact of reversible normalization and input horizon extension on model robustness. Results: We find that (1) affine mapping dominates forecasting performance across commonly utilized benchmarks, with models learning similar transition matrices from input to output; (2) affine mapping effectively captures periodic patterns but struggles with non-periodic signals or time series with varying periods across channels; (3) reversible normalization significantly enhances trend forecasting by transforming non-periodic trends into periodic-like patterns; (4) increasing input horizon improves performance on multi-channel data with different periods. Code is available at: https://github.com/plumprc/RTSF. Conclusions: Our findings provide theoretical and experimental insights into the working mechanisms of LTSF models, highlighting both the strengths and limitations of linear approaches. The results suggest that future model development should focus on handling cross-channel period variations and non-periodic components.
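One way to see why a single linear layer can forecast a periodic signal: if the series repeats every `period` steps and the input window covers at least one period, each future value already appears in the window, so a weight matrix with a single 1 per row reproduces it exactly. The sketch below builds such a matrix by hand. It is an illustration of this intuition, not the paper's derivation; the function names are hypothetical and the bias of the affine map is taken to be zero.

```python
def periodic_copy_matrix(lookback, horizon, period):
    """Weights W (horizon x lookback) that copy each future step from one period back."""
    assert lookback >= period, "the window must cover at least one full period"
    W = [[0.0] * lookback for _ in range(horizon)]
    for h in range(horizon):
        k = h // period + 1                 # number of periods to step back
        src = lookback + h - k * period     # index inside the input window
        W[h][src] = 1.0
    return W


def apply_linear(W, x):
    """Plain matrix-vector product: forecast = W x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


pattern = [3, 1, 4, 1, 5]                                   # one period of a toy signal
series = [pattern[t % len(pattern)] for t in range(20)]
lookback, horizon = 12, 8
W = periodic_copy_matrix(lookback, horizon, period=len(pattern))
forecast = apply_linear(W, series[:lookback])
```

The forecast matches the next eight values of the series exactly. The same matrix fails for a channel with a different period, which is consistent with finding (2) in the abstract.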

2212.05155 2026-05-19 cs.DC cs.LG (updated version)

Cost-aware Duration Prediction for Software Upgrades in Datacenters

Yi Ding, Aijia Gao, Thibaud Ryden, Michal Sedlak, Essam Ewaisha, Igor Marnat, Henry Hoffmann

Affiliations: Meta

AI summary: This paper proposes Acela, a framework that accounts for asymmetric misprediction costs and selects the best predictive models to improve the efficiency and throughput of datacenter software upgrade scheduling; in evaluation it improves upgrade window utilization by 1.25x and increases the number of scheduled upgrades by 33%.

Comments: 18 pages, 25 figures. The 9th MLSys Conference (Industry Track), Bellevue, WA, USA, 2026

Abstract

Software upgrades are critical to maintaining server reliability in datacenters. While job duration prediction and scheduling have been extensively studied, the unique challenges posed by software upgrades remain largely under-explored. This paper presents the first in-depth investigation into software upgrade scheduling at datacenter scale. We begin by characterizing various types of upgrades and then frame the scheduling task as a constrained optimization problem. To address this problem, we introduce Acela, a cost-aware duration prediction framework designed to improve upgrade scheduling efficiency and throughput while meeting service-level objectives (SLOs). Acela accounts for asymmetric misprediction costs, strategically selects the best predictive models, and mitigates straggler-induced overestimations. Evaluations on Meta's production datacenter systems demonstrate that Acela significantly increases efficiency of the existing upgrade scheduler by improving upgrade window utilization by 1.25X, increasing the number of scheduled and completed upgrades by 33% and 41%, and reducing cancellation rates by 2.4X. The code and data sets will be released after paper acceptance.
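To illustrate what an asymmetric misprediction cost means for model selection, the sketch below charges underestimates and overestimates at different rates and picks the predictor with the lowest total cost. The cost weights, model names, and durations are hypothetical values chosen for illustration; the abstract does not give Acela's actual cost function or say which direction it penalizes more.

```python
def asymmetric_cost(predicted, actual, under_cost=3.0, over_cost=1.0):
    """Linear cost with different rates for under- and overestimation (assumed weights)."""
    error = actual - predicted
    return under_cost * error if error > 0 else over_cost * (-error)


def pick_model(predictions_by_model, actuals, **costs):
    """Return the model name with the lowest total asymmetric cost."""
    def total(name):
        return sum(
            asymmetric_cost(p, a, **costs)
            for p, a in zip(predictions_by_model[name], actuals)
        )
    return min(predictions_by_model, key=total)


actual_minutes = [10, 20, 30]
predictions = {
    "optimistic": [8, 18, 28],     # underestimates every upgrade by 2
    "conservative": [13, 23, 33],  # overestimates every upgrade by 3
}
chosen = pick_model(predictions, actual_minutes)
```

Under a symmetric cost the optimistic model would win because its absolute error is smaller; once underestimates are charged three times as much, the conservative model is selected instead.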

2605.16640 2026-05-19 cs.LG (updated version)

Provably Shorter Scratchpads in Hybrid DeltaNet-Attention Decoders

Tomasz Steifer

Affiliations: Centre for Credible AI; Warsaw University of Technology

AI summary: This paper studies the expressive power of hybrid recurrent-attention decoders and proves that hybrid architectures have advantages in expressivity and efficiency: under a constant-precision assumption, a Qwen-style hybrid model solves a parity-conditioned retrieval task with a constant-size scratchpad, whereas pure DeltaNet or pure attention models require a polynomial-size scratchpad.

Comments: Under review at an ML conference

2605.16639 2026-05-19 cs.LG (updated version)

MedMIX: Modality-Internal Expert Fusion for Multimodal Medical Diagnosis

Seungik Cho, Anqi Li, Wei Qiu

Affiliations: Department of Physics and Astronomy; Department of Electrical and Computer Engineering; Rice University

AI summary: MedMIX combines intra-modality expert fusion, learned cross-modality fusion, and large-small model collaboration to make multimodal medical prediction more robust, including when modalities are missing.

Abstract

Multimodal clinical prediction faces three challenges: multiple foundation models (FMs) with complementary strengths per modality, pervasive missing modalities at training and test time, and sample-specific variation in modality contributions. We introduce MedMIX, a multimodal framework that combines intra-modality expert fusion, learned inter-modality fusion, and training-only large-small model collaboration for robust medical prediction under incomplete modalities. Within each modality, MedMIX aggregates complementary embeddings from multiple small expert models; across modalities, it performs learned fusion over available modalities; and during training, it leverages large teacher models to improve deployed representations without additional inference cost. Across three heterogeneous benchmarks (OpenI, MIMIC-IV-MM, and MMIST-ccRCC), MedMIX achieves consistently strong performance while remaining robust under controlled missing-modality perturbations, and further demonstrates sustained robustness under cross-cohort shift on MIMIC-III. These results highlight MedMIX as a practical framework that unifies within-modality expert collaboration, sample-specific cross-modality fusion, and efficient large-small model collaboration while remaining robust to incomplete modalities.

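2605.16632 2026-05-19 cs.LG cs.AI cs.LO (updated version)

Learning How to Cube

Ferhat Erata, Sam Kouteili, Thanos Typaldos, Timos Antonopoulos, Robert B. Jones, Byron Cook, Ruzica Piskac

Affiliations: Yale University; AWS Agentic AI

AI summary: This paper proposes a neuro-symbolic post-training framework that, through an MCTS-based data curation pipeline and symbolic heuristics, enables a 4B-parameter model to reach a pass@5 score of 53 on SAT Competition benchmarks, surpassing frontier LLMs such as Claude-Sonnet-4.

Comments: 33 pages, preprint

Abstract

Although Cube-and-Conquer (C&C) is highly effective at solving challenging Boolean satisfiability (SAT) problems, prior work has not shown that Transformer-based models can learn effective cubing heuristics. We introduce a neuro-symbolic post-training framework. We design an MCTS-based data curation pipeline that uses symbolic heuristics to explore splitting decisions on SAT Competition formulas, generating preference data grounded in solver statistics and supplemented with reasoning traces from a teacher model. Our two-stage post-training, supervised fine-tuning (SFT) followed by Direct Preference Optimization (DPO), enables a 4B-parameter model to reach a pass@5 score of 53 on 100 SAT Competition benchmarks, surpassing frontier LLMs such as Claude-Sonnet-4 (50) and matching the best symbolic heuristic (53). Ablations show that SFT alone raises pass@5 to 51 and DPO adds 2 more benchmarks; an entropy/agreement ablation on the actual first cubing decisions shows that SFT, rather than DPO, induces diversity in root-level decisions, yielding complementary coverage across runs. This suggests that Transformers can be trained to make effective cubing decisions in a domain traditionally dominated by symbolic methods.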

The Tensor Cookbook: Mastering Tensors Through Diagrams

Beheshteh T. Rakhshan, Guillaume Rabusseau

AI summary: This paper explains tensor networks and their use in tensor algebra through a diagrammatic language, showing how graphical methods simplify high-dimensional data manipulation and gradient derivations.

Abstract

High-dimensional data arise naturally in many areas of science and engineering, including machine learning, signal processing, computational physics, and statistics. Such data are often represented as tensors, multi-dimensional generalizations of matrices. While tensors provide a natural representation for multi-modal structure, their direct manipulation quickly becomes challenging as the order grows: the number of parameters increases exponentially, and algebraic expressions involving many indices become difficult to interpret and implement. Tensor networks (TNs) provide an effective framework for addressing these challenges. Originally introduced by Penrose and developed extensively in quantum physics, the graphical language of tensor networks encodes contractions as edges in a graph, reducing notational overhead and revealing structural properties obscured by index notation. Despite the central role of high-dimensional tensors in modern machine learning and numerical analysis, tensor network diagrams remain underutilized outside quantum computing, partly due to the lack of a self-contained mathematical reference accessible to a broad technical audience. This manuscript provides a self-contained guide to tensor networks and their use in tensor algebra. We present the main operations on tensors (contractions, products, and reshaping) through graphical notation, and show how classical tensor decompositions and related computations are naturally expressed in this framework. We also illustrate how tensor networks simplify the derivation of gradients and the manipulation of high-dimensional probability distributions. Throughout, we show that the diagrammatic approach yields genuinely shorter and more transparent proofs of classical identities, rank bounds, and gradient formulas that would otherwise require laborious index manipulation.
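A small sketch of the "contractions as edges" idea for the simplest case: two matrices are nodes with two legs each, joining one leg of each with an edge gives the matrix product, and closing the remaining legs into a loop gives a trace. The helper names and the example matrices are illustrative, not taken from the manuscript.

```python
def contract(A, B):
    """Contract the shared index j of A[i][j] and B[j][k] (one edge joining two nodes)."""
    return [
        [sum(A[i][j] * B[j][k] for j in range(len(B))) for k in range(len(B[0]))]
        for i in range(len(A))
    ]


def trace(M):
    """Contract a matrix's two indices with each other (a node closed into a loop)."""
    return sum(M[i][i] for i in range(len(M)))


A = [[1, 2], [3, 4]]
B = [[0, 1], [5, 2]]
AB = contract(A, B)
BA = contract(B, A)
```

In index notation, tr(AB) = tr(BA) needs a relabeling of summation indices; as a diagram both sides are the same closed loop through the two nodes, so the identity is immediate. Here both traces evaluate to 21.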

2605.16604 2026-05-19 cs.LG (updated version)

R2V Agent: Teaching SLMs When to Ask for Help

Raghu Vamshi Hemadri, Humaira Firdowse Mohammed, Rishabh Maheshwary, Srivatsava Daruru, Sagar Davasam, Vikas Yadav, Srinivas Sunkara, Sai Rajeswar

AI summary: R2V-Agent improves the reliability of interactive agents with a risk-calibrated SLM-LLM routing framework that combines a small language model policy, a stronger teacher LLM, a lightweight process verifier, and a calibrated step-level router, improving performance and cost efficiency across several benchmarks.

Abstract

Efficient agentic systems should incur expensive frontier-model costs only on decisions where a cheaper local model is likely to fail. Existing LLM cascades usually route whole queries before execution, but task difficulty shifts mid-trajectory (after flaky tool calls, truncated observations, or compounding local errors), making pre-execution routing brittle. We introduce R2V-Agent, a risk-calibrated SLM-LLM routing framework for interactive agents. R2V combines four components: a distilled small language model (SLM) policy, a stronger teacher LLM, a lightweight process verifier that scores candidate actions at each step, and a calibrated step-level router. The router is our central contribution: after the SLM is trained, it estimates residual failure risk at each step and escalates only when teacher intervention is warranted. To make the routing problem well-defined, we first train a stable local SLM using a standard offline pipeline: behavioral cloning (BC) on teacher trajectories, followed by verifier-guided Direct Preference Optimization (DPO) with consistency regularization. The router is then trained on this fixed policy's residual failures using Brier-calibrated probability estimation and a Conditional Value-at-Risk (CVaR)-constrained objective that penalizes worst-case failures across perturbation seeds. Across HumanEval+, TextWorld, and TerminalBench with four SLM backbones, R2V improves the reliability-cost frontier: it achieves $94.3\%$ HumanEval+ success with $0.60\%$ LLM escalation, recovers TextWorld from $64.6\%$ SLM-only success to $98.2\%$ at $41.7\%$ escalation, and reaches $93.3\%$ TerminalBench success at $33.9\%$ LLM calls, roughly half the heuristic-router cost.
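The abstract mentions two standard quantities, the Brier score and Conditional Value-at-Risk (CVaR). The sketch below computes both with their textbook definitions and adds a threshold rule for escalation. It is an illustration under stated assumptions: the escalation threshold, the function names, and the discrete form of CVaR are hypothetical, and this is not the paper's router or training objective.

```python
import math


def brier_score(probs, outcomes):
    """Mean squared gap between predicted failure probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def cvar(losses, alpha=0.8):
    """Mean of the worst (1 - alpha) fraction of losses (at least one loss)."""
    k = max(1, math.ceil((1 - alpha) * len(losses)))
    worst = sorted(losses, reverse=True)[:k]
    return sum(worst) / k


def should_escalate(failure_prob, threshold=0.5):
    """Hand the step to the teacher LLM when estimated risk is high (threshold assumed)."""
    return failure_prob >= threshold
```

A well-calibrated router has a low Brier score, and a CVaR term focuses the objective on the tail of bad runs rather than the average run.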

2605.16600 2026-05-19 cs.LG cs.AI cs.CL 版本更新

Where Pretraining writes and Alignment reads: the asymmetry of Transformer weight space

预训练写入，对齐读取：Transformer权重空间的不对称性

Valeria Ruscio, Eli-Shaoul Khedouri, Keiran Thompson

AI总结研究揭示了预训练和对齐在Transformer权重空间中的不对称性，通过分析权重变化在残差流激活子空间和预测子空间中的对齐情况，发现读路径权重集中于注意力输入激活的主方向，而写路径权重在预测子空间中保持各向同性。

详情

AI中文摘要

非递减生存回归：从深度Cox模型中校准生存分布

Anchit Jain, Kevin Zhang, Stephen Bates

AI总结本文提出一种非递减回归方法，用于校准深度Cox模型的生存概率，通过理论保证和实验验证提升模型实用性。

2605.16567 2026-05-19 cs.LG cs.AI cs.DB 版本更新

Automatic Unsupervised Ensemble Outlier Model Selection--Extended Version

自动无监督集成异常检测模型选择——扩展版

Hong-Phuc Phan, Tuan-Anh Vu, Tung Kieu, Son Ha Xuan, Bin Yang, Christian S. Jensen

AI总结本文提出MetaEns框架，通过学习预测边际增益模型，自动选择高质异常检测模型集成，无需标注数据，实验显示其在39个真实数据集上表现优异。

Comments 25 pages. An extended version of "Automatic Unsupervised Ensemble Outlier Model Selection" accepted at ICML 2026

详情

AI中文摘要

无监督异常检测因其无需标注数据而具有吸引力。此外，多模型集成可提高检测鲁棒性。然而，无标注数据下构建集成具有挑战性。简单集成可能因冗余或不可靠的检测模型导致饱和问题。我们提出MetaEns，一种自动无监督框架，用于选择异常检测模型的集成。利用标注元数据集，MetaEns学习预测边际增益模型，估计添加候选模型到部分构建集成的预期改进。在测试时，该学习信号结合子模函数启发的代理目标，通过多样性感知折扣和家族级风险正则化，实现贪心顺序选择与自适应提前停止。结果表明，MetaEns可在无真实标签的情况下构建紧凑高质量的集成。在39个真实数据集上的实验显示，MetaEns在平均精度上优于现有无监督选择器和集成基线，同时使用更少的模型。

英文摘要

Unsupervised outlier detection is attractive because it eliminates the need for labeled data. Moreover, forming multi-model ensembles can improve detection robustness. However, composing an ensemble without labeled data is challenging. Naively composed ensembles can suffer from ensemble saturation, where redundant or unreliable detection models degrade performance and incur unnecessary computation. We propose MetaEns, an automatic unsupervised framework for selecting ensembles of outlier detection models. Using labeled meta-datasets, MetaEns learns a model that predicts marginal ensemble gains, estimating the expected improvement from adding a candidate model to a partially constructed ensemble. At test time, this learned signal is combined with a submodular-inspired proxy objective that enforces diminishing returns through diversity-aware discounting and family-level risk regularization, thereby enabling greedy sequential selection with adaptive early stopping. As a result, MetaEns constructs compact, high-quality ensembles without access to ground-truth labels. Experiments on 39 real-world datasets show that MetaEns consistently outperforms state-of-the-art unsupervised selectors and ensemble baselines, achieving higher average precision while using fewer models.

2605.16550 2026-05-19 cs.CV cs.LG 版本更新

Attention-Aware Transformer-Based Aggregation Network for Video Periocular Recognition

基于注意力的变换器聚合网络用于视频眼周识别

Luiz G F Carreira, Breno A Mariano, Victor H C de Melo, David Menotti, William Robson Schwartz

AI总结本文提出一种基于变换器的聚合网络，用于视频眼周识别，通过特征嵌入和聚合模块提升识别鲁棒性，在COX Face数据集上优于传统方法，达到99.8%的TPR@1e-1和96.6%的Rank-5。

详情

AI中文摘要

视频眼周识别是基于个体眼睛周围区域识别身份的任务。眼周区域是人脸最具有区分性的区域之一，使其适合识别任务。其作为生物特征模态的应用在监控环境中逐渐兴起，尤其是在传统生物特征如面部或虹膜识别因非受限采集条件而不可行时。本文提出了一种针对监控环境的视频眼周识别的注意力感知方法。该框架包含两个主要模块：特征嵌入和聚合。特征嵌入模块是一个深度卷积神经网络，将眼周数据映射到特征向量。聚合模块是一个仅含编码器的变换器，能够自适应地将帧级特征聚合为单一视频表示和静态参考图像的特征向量。在公开可用的COX Face数据集上的实验表明，所提方法的鲁棒性，一致优于传统聚合方案。在最佳情况下，该方法实现了99.8%的TPR@1e-1和96.6%的Rank-5。

英文摘要

Video periocular recognition is the task of recognizing an individual's identity based on the region around an individual's eyes. The periocular area is one of the most discriminative regions of the human face, making it suitable for recognition tasks. Its use as a biometric modality has emerged as an alternative, especially in surveillance scenarios where conventional biometric traits such as face or iris recognition become unfeasible due to unconstrained acquisition conditions. This paper proposes an attention-aware approach for video-based periocular recognition in surveillance environments. The framework consists of two main modules: feature embedding and aggregation. The feature embedding module is a deep convolutional neural network that maps periocular data to feature vectors. The aggregation module is an encoder-only transformer that adaptively learns to aggregate frame-level features into a single video representation and a feature vector for the still reference image. Experiments on the publicly available COX Face dataset show the robustness of the proposed method, consistently outperforming naive aggregation schemes. In the best scenario, the approach achieves $99.8\%$ of TPR@$1e^{-1}$ and $96.6\%$ of Rank-5.

2605.16547 2026-05-19 cs.LG 版本更新

World Model-Enabled Causal Digital Twins for Semantic Communications in Physical AI Systems

面向物理人工智能系统的语义通信世界模型增强因果数字孪生

Lingyi Wang, Tingyu Shui, Walid Saad, Pascal Adjakple

AI总结本文提出基于世界模型的因果数字孪生框架，解决物理人工智能系统中语义通信的长期回报最大化问题，通过因果信息价值指标和策略训练提升导航成功率。

详情

AI中文摘要

语义通信作为一种面向目标的网络范式，但现有解决方案多用于单次任务，无法支持闭环物理人工智能系统。本文研究面向物理人工智能系统的语义通信，将其建模为无线比特预算约束下的长期回报每比特最大化问题，提出因果信息价值指标评估语义标记的边际贡献，并通过世界模型增强的因果数字孪生框架捕捉闭环系统动态，实现对长周期模拟运行的反事实推理。基于这些模拟运行，通过因果信息价值每比特评估训练策略和语义标记选择器，实验表明该框架在返回每kbit和导航成功率方面优于现有强化学习方案。

英文摘要

Semantic communication has emerged as a promising paradigm for enabling goal-oriented networking. However, most existing semantic communication solutions are tailored to one-shot tasks and optimize instantaneous performance. Hence, they cannot be used to support closed-loop dynamic systems with physical artificial intelligence (AI), in which the transmitted semantics affect not only the current inference outcome but also future control actions, state evolution, and ultimately long-horizon task performance. To address this gap, this paper investigates goal-oriented semantic communications for physical AI systems with closed-loop sensing-communication-inference-control. In particular, the problem of semantic communications is formulated as a long-term return-per-bit maximization under wireless bit-budget constraints while capturing both control efficiency and communication efficiency. To solve this problem, a novel causal information value (CIV) metric is introduced to evaluate the marginal contribution of each semantic token to the expected long-term return by transmission interventions. Then, a world-model-enabled causal digital twin (WM-CDT) framework is proposed to capture the dynamics of closed-loop physical AI systems and enable counterfactual reasoning for long-horizon imagined rollouts. Based on these imagined rollouts, an actor-critic policy is trained for long-horizon agent control with high data efficiency, while the semantic token selector is trained through CIV-per-bit evaluation. Extensive simulations on an AirSim-Sionna-based unmanned aerial vehicle (UAV) navigation simulator show that the proposed WM-CDT framework achieves significant improvement in return-per-kbit and navigation success rate compared to existing reinforcement learning solutions.

2605.16532 2026-05-19 cs.LG econ.GN q-fin.EC 版本更新

Boundedly Rational Meta-Learning in Sequential Consumer Choice

有界理性元学习在序列消费者选择中的应用

Mehrzad Khosravi, Max Kleiman-Weiner, Hema Yoganarasimhan

AI总结研究消费者在不确定环境下重复选择时的跨情境知识转移，提出有界理性元动态规划政策BRMDP(D)，发现消费者通过粗略的先验不确定性表示实现跨情境学习。

详情

AI中文摘要

许多消费者决策是在不确定环境下重复选择。标准模型利用贝叶斯学习和动态规划来捕捉这些决策：消费者根据反馈更新信念，并利用这些信念指导未来的选择。然而，在许多市场中，当消费者进入新情境时，学习不会重置：先前与品牌、产品或提供商的经验会塑造后续相关决策中的信念。我们研究了序列选择中的跨情境知识转移，或元学习。我们设计了一个分层实验室任务，参与者在多个路线中反复选择航空公司并观察噪声二元结果。实证证据表明，参与者不仅在路线内改进，还在跨路线中改进：他们更早选择更好的航空公司并在后续路线中减少伪遗憾。为了识别这种转移的机制，我们比较了人类选择与无转移基准和完全整合的贝叶斯元学习基准。特别是，我们引入了一类有界理性元动态规划策略BRMDP(D)，通过有限数量的超后验抽样（记为D）近似完全整合。试次级似然比较显示，低D的有界理性元学习，特别是BRMDP(1)，比无转移和完全整合的贝叶斯转移拟合参与者行为更好。因此，消费者通过粗略的先验不确定性表示在跨情境中转移品牌层面的规律性。研究结果表明，消费者学习模型应允许近似的跨情境转移，并且基于无转移或完全整合学习的管理反事实可能具有误导性。

英文摘要

Many consumer decisions are repeated choices under uncertainty. Standard models capture these decisions using Bayesian learning and dynamic programming: consumers update beliefs from feedback and use those beliefs to guide future choices. In many markets, however, learning does not restart when consumers enter a new context: prior experience with a brand, product, or provider can shape beliefs in later, related decisions. We study this cross-context knowledge transfer, or meta-learning, in sequential choice. We design a hierarchical laboratory task in which participants repeatedly choose among airlines across routes and observe noisy binary outcomes. Reduced-form evidence shows that participants improve not only within routes, but also across routes: they choose better airlines earlier in later routes and reduce pseudo-regret. To identify the mechanism behind this transfer, we compare human choices to a no-transfer benchmark and a fully integrated Bayesian meta-learning benchmark. In particular, we introduce a class of boundedly rational meta dynamic programming policies, BRMDP(D), that approximate full integration using a limited number of hyper-posterior draws, denoted by D. Trial-by-trial likelihood comparisons show that low-D boundedly rational meta-learning, especially BRMDP(1), fits participant behavior better than both no transfer and fully integrated Bayesian transfer. Consumers, therefore, transfer brand-level regularities across contexts, but through coarse representations of prior uncertainty. The findings imply that models of consumer learning should allow for approximate cross-context transfer, and that managerial counterfactuals based on either no-transfer or fully integrated learning can be misleading.

2605.16529 2026-05-19 cs.LG math.OC 版本更新

Multiscale Supervised Unbalanced Optimal Transport Flow Matching

多尺度监督不平衡最优传输流匹配

Qiangwei Peng, Lezhi Chen, Peijie Zhou

AI总结本文提出MUST-FM框架，通过利用多尺度数据结构和已知的转移先验知识，有效降低计算成本并实现鲁棒的轨迹推断，适用于大规模单细胞数据集的动态建模。

2605.16527 2026-05-19 cs.LG cs.AI 版本更新

Hypergraph Pattern Machine: Compositional Tokenization for Higher-Order Interactions

超图模式机：用于高阶交互的组合分词

Kyrie Zhao, Zehong Wang, Tianyi Ma, Fang Wu, Xiangru Tang, Pietro Lio, Sheng Wang, Yanfang Ye

AI总结本文提出超图模式机，通过学习子集的组合模式，改进高阶交互的建模，从而在超图基准和真实案例中取得更好效果。

详情

AI中文摘要

超图模型高阶关系，从药物处方到推荐。数据中的核心结构信号是交互组合性：高阶关系是否是组合、涌现或抑制性的。在多药治疗中，制度决定是否停药、保留或排除：组合药物三元组可安全简化，涌现三元组需联合所有药物，抑制三元组标志干扰现有交互的药物。现有超图学习方法仅传播观测超边消息，未建模此信号，导致危险组合被误分类。为此，本文提出超图模式机（HGPM），从消息传递转向学习子集的组合模式。它分词组合子集，组织成包含 DAG，并训练掩码重建的包含意识 Transformer。在十个超图基准上，HGPM 匹配或超越现有方法。值得注意的是，在真实不良事件预测案例中，HGPM 正确识别出抑制副作用的药物添加，而现有方法无法区分。代码和数据见 https://github.com/KryieZhao/HGPM.git.

英文摘要

Hypergraphs model higher-order relations that drive real-world decisions, from drug prescriptions to recommendations. A central structural signal in such data, beyond what pairwise relations can express, is interaction compositionality: whether a higher-order relation is compositional, emergent, or inhibitory with respect to its observed or unobserved sets. In polypharmacy, the regime decides whether a drug should be dropped, kept, or excluded: a compositional drug triple can be safely simplified, an emergent triple requires all drugs jointly, and an inhibitory triple flags a drug that disrupts an existing interaction. However, existing hypergraph learning methods, which merely propagate messages over observed hyperedges, leave this compositional signal unmodeled, allowing dangerous drug combinations to slip through and be misclassified. To this end, we propose the Hypergraph Pattern Machine (HGPM), shifting the paradigm from message passing to learning the compositional pattern of subsets. It tokenizes compositional subsets, organizes them in an inclusion DAG, and trains an inclusion-aware Transformer under masked reconstruction. On ten hypergraph benchmarks, HGPM matches or exceeds state-of-the-art methods. Notably, in a real adverse-event prediction case, HGPM correctly identifies the drug addition that inhibits the side effect among feature-identical candidates, a discrimination existing methods cannot make. The code and data are in https://github.com/KryieZhao/HGPM.git.

2605.16520 2026-05-19 cs.LG 版本更新

Global Convergence of Sampling-Based Nonconvex Optimization through Diffusion-Style Smoothing

通过扩散风格平滑实现采样基于非凸优化的全局收敛

Zeji Yi, Chaoyi Pan, Guanya Shi, Guannan Qu

AI总结本文通过平滑视角分析采样优化，揭示平滑在逃逸局部极小值中的作用，并提出DIDA算法实现全局收敛。

Comments 57 pages, 5 figures

详情

AI中文摘要

采样优化（SBO）如交叉熵方法和进化算法在无梯度非凸问题中取得成功，但其收敛性理解有限。本文通过平滑视角建立非渐近收敛分析，将SBO重新解释为对平滑目标的梯度下降，类似于扩散模型中的噪声条件得分上升。我们分析了平滑目标的景观，证明平滑通过扩大局部凸区域帮助逃逸局部极小值，但引入了最优性间隙。基于此，我们为SBO算法在全局极小值邻域内建立非渐近收敛保证，并提出Diffusion-Inspired Dual-Annealing（DIDA）算法，可证明收敛到全局最优。通过大量数值实验验证景观结果，并展示DIDA在梯度自由优化方法中的优异性能。最后讨论了结果对扩散模型的影响。

英文摘要

Sampling-based optimization (SBO), such as the cross-entropy method and evolutionary algorithms, has achieved many successes in solving non-convex problems without gradients, yet its convergence is poorly understood. In this paper, we establish a non-asymptotic convergence analysis for SBO through the lens of smoothing. Specifically, we recast SBO as gradient descent on a smoothed objective, mirroring noise-conditioned score ascent in diffusion models. Our first contribution is a landscape analysis of the smoothed objective, demonstrating how smoothing helps escape local minima and uncovering a fundamental coverage-optimality trade-off: smoothing renders the landscape more benign by enlarging the locally convex region around the global minimizer, but at the cost of introducing an optimality gap. Building on this insight, we establish non-asymptotic convergence guarantees for SBO algorithms to a neighborhood of the global minimizer. Furthermore, we propose an annealed SBO algorithm, Diffusion-Inspired Dual-Annealing (DIDA), which is provably convergent to the global optimum. We conduct extensive numerical experiments to verify our landscape results and also demonstrate the compelling performance of DIDA compared to other gradient-free optimization methods. Lastly, we discuss implications of our results for diffusion models.
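The idea of treating a sampling-based optimizer as gradient descent on a smoothed objective can be sketched in one dimension. The snippet below is a generic Gaussian-smoothing gradient estimator with antithetic samples, not the DIDA algorithm from the paper; the function names, step size, and sample counts are assumptions chosen for illustration.

```python
import random


def smoothed_grad(f, x, sigma, num_samples, seed=0):
    """Estimate d/dx E[f(x + sigma * eps)] with eps ~ N(0, 1), using only f values."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        eps = rng.gauss(0.0, 1.0)
        # Antithetic pair (x + sigma*eps, x - sigma*eps) reduces estimator variance.
        total += (f(x + sigma * eps) - f(x - sigma * eps)) / (2.0 * sigma) * eps
    return total / num_samples


def smoothed_descent(f, x0, sigma, lr, steps, num_samples=200):
    """Plain gradient descent on the smoothed objective, gradient-free in f."""
    x = x0
    for _ in range(steps):
        x -= lr * smoothed_grad(f, x, sigma, num_samples)
    return x


def quadratic(x):
    return x * x


x_final = smoothed_descent(quadratic, x0=1.5, sigma=0.5, lr=0.1, steps=100)
```

A larger `sigma` smooths the landscape more aggressively but biases the minimizer of the smoothed objective away from the true one on non-quadratic functions, which is the coverage-optimality trade-off the abstract describes. An annealed scheme would shrink `sigma` over time.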

2605.16515 2026-05-19 cs.CV cs.LG 版本更新

SeamCam: Quantifying Seamless Camouflage via Multi-Cue Visual Detectability

SeamCam：通过多线索视觉可探测性量化无缝伪装

Amin Karimi Monsefi, Abolfazl Meyarian, Mridul Khurana, Shuheng Wang, Pouyan Navard, Cheng Zhang, Anuj Karpatne, Wei-Lun Chao, Rajiv Ramnath

AI总结 SeamCam通过将伪装评估转化为视觉定位问题，提出了一种量化动物伪装效果的指标，通过人类实验验证其有效性，并展示了其在扩散模型训练中的应用。

详情

AI中文摘要

动物被描述为有效伪装时，能够无缝融入周围环境，但目前缺乏标准化的量化措施。本文通过将伪装评估转化为视觉定位问题：伪装良好的动物在已知类别时仍难以检测。引入SeamCam指标，量化动物的可探测性。给定图像和目标物种，SeamCam生成类别条件的检测提案，提取分割掩码，并识别其子集，其联合覆盖最大IoU与真实掩码。SeamCam分数是最大可恢复定位信号的补数，分数越高伪装越强（即可探测性越低）。在94名参与者和2390次比较的人类二择一强制选择研究中，SeamCam与人类伪装难度判断达成78.82%的一致性，优于现有最先进方法约25%。随后展示了SeamCam作为直接偏好优化（DPO）的偏好信号，用于微调基于扩散的修复模型以生成伪装。这提供了一种经济的训练方法，其目标专门适用于伪装生成，不同于典型的扩散模型。为支持严格基准测试，进一步引入CamFG-1.5k数据集，包含1521张高分辨率图像，在伪装生成前动物完全可见，使评估更公平，通过控制现有数据集中存在的遮挡伪影。

英文摘要

Animals are described as effectively camouflaged when they blend seamlessly with their surroundings, yet no standardized quantitative measure of this seamlessness exists. We address this gap by framing camouflage evaluation as a visual localization problem: a well-camouflaged animal is one that remains difficult to detect even when its category is known. We introduce SeamCam (Seamless Camouflage), a metric that quantifies how detectable an animal is from the available visual evidence. Given an image and a target species, SeamCam generates category-conditioned detection proposals, extracts segmentation masks, and identifies the subset whose collective union yields the highest IoU with the ground-truth mask. The SeamCam score is one minus this maximum recoverable localization signal, where a higher score indicates stronger camouflage (i.e., lower detectability). In a human two-alternative forced-choice study with 94 participants and 2,390 comparisons, SeamCam achieves 78.82% agreement with human camouflage difficulty judgments, outperforming the state of the art by about 25%. We then demonstrate SeamCam's utility as a preference signal for Direct Preference Optimization (DPO) to fine-tune a diffusion-based inpainting model for camouflage generation. This offers an affordable training approach with an objective explicitly suited for camouflage generation, unlike typical diffusion models. To support rigorous benchmarking, we further introduce CamFG-1.5k, a curated dataset of 1,521 high-resolution images in which animals are fully visible prior to camouflage generation, enabling unbiased evaluation by controlling for occlusion artifacts present in existing datasets. Project page: https://7amin.github.io/SeamCam/
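The score described in the abstract (one minus the best IoU achievable by a union of proposal masks) can be sketched as follows. This is only an illustration under stated assumptions: masks are represented as sets of pixel coordinates, the subset search is brute force, and the function names are hypothetical. How the authors actually select the subset is not stated in the abstract.

```python
from itertools import combinations


def iou(mask_a, mask_b):
    """Intersection over union of two masks given as sets of (row, col) pixels."""
    union = mask_a | mask_b
    return len(mask_a & mask_b) / len(union) if union else 0.0


def seamcam_score(proposal_masks, gt_mask):
    """One minus the highest IoU reachable by the union of any proposal subset."""
    best = 0.0
    for size in range(1, len(proposal_masks) + 1):
        for subset in combinations(proposal_masks, size):
            merged = set().union(*subset)
            best = max(best, iou(merged, gt_mask))
    return 1.0 - best


gt = {(0, 0), (0, 1), (1, 0), (1, 1)}
proposals = [{(0, 0), (0, 1)}, {(1, 0), (1, 1)}, {(5, 5)}]
fully_detected = seamcam_score(proposals, gt)    # two proposals cover the animal
barely_detected = seamcam_score([{(0, 0)}], gt)  # one pixel out of four recovered
```

The exhaustive search is exponential in the number of proposals, so it is only practical for a handful of masks. A greedy selection would be a natural substitute for larger proposal sets.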

2605.16486 2026-05-19 stat.ML astro-ph.IM cs.LG 版本更新

StAD: Stein Amortized Divergence for Fast Likelihoods with Diffusion and Flow

StAD：基于Stein算子的 amortized 散度用于具有扩散和流的快速似然

Gurjeet Jagwani, Stephen Thorp, Sinan Deger, Hiranya Peiris

AI总结本文提出StAD方法，利用Langevin-Stein算子预测和学习PF-ODE的散度，无需计算雅可比矩阵，提升了似然预测的效率和稳定性。

Comments 24 pages, 10 figures

详情

AI中文摘要

扩散和流基模型广泛用于生成建模和密度估计。它们允许确定性概率流常微分方程（PF-ODE），类似于连续归一化流（CNFs），描述了概率质量的传输。从这些模型中获得似然对于许多工作流程至关重要，尤其是贝叶斯分析，这需要计算雅可比矩阵的迹以得到所学PF-ODE的散度，这要么是$\mathcal{O}(D^2)$精确计算，要么是$\mathcal{O}(D)$的噪声估计。我们引入StAD，一种新的蒸馏方法，利用Langevin-Stein算子预测和学习PF-ODE的散度，而无需计算雅可比矩阵。我们证明我们的方法在CIFAR-10、ImageNet和其他密度估计任务上与Hutchinson和Hutch++估计器具有竞争力，并且相比Hutchinson估计器，一致改善了似然预测的方差和速度。我们还证明我们的方法可以推广到各种生成模型，且在某些正则性条件下，这些学习的向量场可以满足Stein类。

英文摘要

Diffusion and flow-based models are ubiquitously used for generative modelling and density estimation. They admit a deterministic probability flow ordinary differential equation (PF-ODE), analogous to continuous normalizing flows (CNFs), which describes the transport of the probability mass. Obtaining the likelihood from these models is of interest to many workflows, especially Bayesian analysis, and requires computing the trace of the Jacobian to obtain the divergence of the learned PF-ODE, which is either $\mathcal{O}(D^2)$ to compute exactly or $\mathcal{O}(D)$ with a noisy estimate. We introduce StAD, a new distillation method to predict and learn the divergence of the PF-ODE using the Langevin-Stein operator without ever computing the Jacobian. We show that our method is competitive with the Hutchinson and Hutch++ estimators on CIFAR-10, ImageNet and other density estimation tasks, consistently improving the variance and speed of the likelihood predictions compared to the Hutchinson estimator. We additionally show our method will generalize to a varied class of generative models, and show that under some regularity conditions these learned vector fields can be made to satisfy the Stein class.
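For context, the Hutchinson baseline that the abstract compares against estimates a trace from random probe vectors. The sketch below shows it for a linear vector field f(x) = Ax, whose Jacobian is A and whose divergence is trace(A). It is not the StAD method. The explicit matrix, the Rademacher probes, and the function names are assumptions for illustration. In practice the matrix-vector product would come from automatic differentiation rather than a stored matrix.

```python
import random


def matvec(matrix, vec):
    """Dense matrix-vector product on nested lists."""
    return [sum(a * v for a, v in zip(row, vec)) for row in matrix]


def hutchinson_trace(matrix, num_probes, seed=0):
    """Estimate trace(matrix) as the average of z^T (matrix z) over Rademacher z."""
    rng = random.Random(seed)
    dim = len(matrix)
    total = 0.0
    for _ in range(num_probes):
        z = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        az = matvec(matrix, z)
        total += sum(zi * yi for zi, yi in zip(z, az))
    return total / num_probes


diagonal = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
coupled = [[1.0, 0.5, 0.5], [0.5, 2.0, 0.5], [0.5, 0.5, 3.0]]
exact_case = hutchinson_trace(diagonal, num_probes=10)
noisy_case = hutchinson_trace(coupled, num_probes=20000)
```

With Rademacher probes the estimate is exact for a diagonal matrix, and its variance grows with the off-diagonal entries. That variance is the noise the abstract refers to in the $\mathcal{O}(D)$ estimate.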

2605.16477 2026-05-19 cs.LG cs.CV 版本更新

Seeking the Unfamiliar but Memorable: Conceptual Creativity as Meta-Learning

寻求不熟悉但难忘的概念：作为元学习的概念创造力

Mengye Ren

AI总结本文提出概念创造力作为元学习，通过创作者生成候选刺激和评估者适应学习，产生可学习的创新内容。

Comments 25 pages

2605.16476 2026-05-19 eess.IV cs.CV cs.LG 版本更新

Deep Learning for MRI Slice Interpolation: The Critical Role of Problem Formulation

深度学习在MRI切片插值中的应用：问题建模的关键作用

Shamit Savant

AI总结本文探讨了深度学习在前列腺成像中插值中间MRI切片的方法，发现问题建模对性能的影响远大于架构复杂度，通过改进插值方式提升了SSIM性能。

Comments 10 pages main text, 21 pages total with supplementary, 8 figures, supplementary material included

详情

AI中文摘要

在临床MRI中，通过平面分辨率通常比平面内分辨率更粗糙，限制了诊断价值。本文研究了深度学习方法用于插值中间MRI切片，有效将通过平面分辨率翻倍。评估了五种架构（CNN、U-Net、两种GAN变体和DDPM），发现问题建模对性能的影响远大于架构复杂度。通过将插值任务改用相邻切片（i-1，i+1）而非远距离切片（i-2，i+2），在所有确定性架构上实现了58%的SSIM提升。U-Net模型在PSNR为30.08 dB和SSIM为0.898，比线性插值基线提升了10.1%。DDPM也进行了评估，但因随机生成与确定性重建需求不匹配而表现不佳。这些发现表明，在医学影像任务中，问题建模的影响是架构复杂度的290倍。

英文摘要

Through-plane resolution in clinical MRI is typically much coarser than in-plane resolution, limiting diagnostic utility. This work investigates deep learning approaches to interpolate intermediate MRI slices in prostate imaging, effectively doubling through-plane resolution. I evaluated five architectures (CNN, U-Net, two GAN variants, and DDPM) and discovered that problem formulation has dramatically more impact than architectural complexity. By reformulating the interpolation task to use adjacent slices (i-1, i+1) rather than distant slices (i-2, i+2), I achieved a 58% improvement in SSIM performance across all deterministic architectures. The U-Net model achieved the best results with PSNR of 30.08 dB and SSIM of 0.898, representing a 10.1% improvement over linear interpolation baseline. A DDPM was also evaluated but showed poor reconstruction quality due to fundamental mismatch between stochastic generation and deterministic reconstruction requirements. These findings demonstrate that problem formulation can have 290x more impact than architectural sophistication in medical imaging tasks.
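The linear-interpolation baseline and the PSNR metric mentioned in the abstract can be sketched as below. This is a minimal illustration, not the author's pipeline: slices are plain nested lists with intensities assumed to lie in [0, 1], and the function names are hypothetical.

```python
import math


def interpolate_middle(slice_prev, slice_next):
    """Linear baseline: estimate slice i as the pixelwise mean of slices i-1 and i+1."""
    return [
        [(a + b) / 2.0 for a, b in zip(row_prev, row_next)]
        for row_prev, row_next in zip(slice_prev, slice_next)
    ]


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two same-sized slices."""
    squared = [
        (p - t) ** 2
        for row_p, row_t in zip(pred, target)
        for p, t in zip(row_p, row_t)
    ]
    mse = sum(squared) / len(squared)
    return math.inf if mse == 0 else 10.0 * math.log10(max_value ** 2 / mse)


prev_slice = [[0.0, 0.2], [0.4, 0.6]]
next_slice = [[0.4, 0.6], [0.8, 1.0]]
estimate = interpolate_middle(prev_slice, next_slice)
```

Using the adjacent pair (i-1, i+1) keeps the inputs one slice away from the target, whereas the (i-2, i+2) formulation doubles that gap. The abstract identifies this change in formulation as the main driver of the reported SSIM gain.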

2605.16473 2026-05-19 stat.ML cs.LG cs.NA math.NA math.PR 版本更新

Dimension-Uniform Discretization Analysis of Preconditioned Annealed Langevin Dynamics for Multimodal Gaussian Mixtures

预处理退火 Langevin 动力学在多模高斯混合中的维度均匀离散化分析

Lorenzo Baldassari, Josselin Garnier, Knut Solna, Maarten V. de Hoop

AI总结本文研究了预处理退火 Langevin 动力学在高斯混合中的稳定性问题，通过 Euler-Maruyama 离散化和指数积分方案，证明了在满足特定谱条件时，KL 散度具有维度均匀的上界。

详情

AI中文摘要

在高维和无穷维设置中，获得稳定的扩散基采样器具有挑战性，因为高频率坐标上的误差累积会使动力学在有限维近似细化时变得不稳定。离散化是此类误差的典型来源，而使用合适的谱衰减预处理是控制其累积的一种方法。本文研究了预处理退火 Langevin 动力学（ALD）应用于高斯混合时的问题。我们首先证明 Euler-Maruyama（EM）离散化通过将退火分数的刚性线性部分用前向 Euler 步处理，施加了将预处理器与退火协方差尺度耦合的稳定性约束。结合确保退火动力学维度均匀控制的条件，该约束迫使初始平滑分布在不同维度上保持均匀接近目标。然后我们考虑了对退火分数的刚性线性部分进行精确积分的指数积分方案。在满足耦合平滑协方差、组件协方差谱和预处理器的显式谱可求和条件时，我们证明了该方案的 KL 散度具有维度均匀的上界。此上界可通过允许足够时间进行退火并相应细化时间网格来使其任意小。重要的是，这些条件允许 KL 散度在不同维度上发散的区域，表明 EM 限制是方案依赖的，而非 ALD 的固有属性。

英文摘要

Obtaining stable diffusion-based samplers in high- and infinite-dimensional settings is challenging because errors can accumulate across high-frequency coordinates and make the dynamics unstable under refinement of the finite-dimensional approximation of the underlying function-space problem. Discretization is a typical source of such errors, and preconditioning with a suitable spectral decay is one way to control their accumulation. In this paper, we study this problem for preconditioned annealed Langevin dynamics (ALD) applied to Gaussian mixtures. We first show that Euler-Maruyama (EM) discretization, by treating the stiff linear part of the annealed score with a forward Euler step, imposes a stability constraint coupling the preconditioner with the annealed covariance scale. Together with the conditions ensuring dimension-uniform control of the annealed dynamics, this constraint forces the initial smoothed law to remain uniformly close to the target across dimensions. We then consider an exponential-integrator scheme that integrates the stiff linear part of the annealed score exactly. Under explicit spectral summability conditions coupling the smoothing covariance, the component covariance spectra, and the preconditioner, we prove a dimension-uniform Kullback-Leibler (KL) bound for this scheme. This bound can be made arbitrarily small, uniformly in dimension, by allowing enough time for annealing and then refining the time mesh accordingly. Importantly, these conditions allow regimes in which the KL divergence between the target and the initial smoothed law diverges with dimension, showing that the restrictions imposed by EM are scheme-dependent rather than intrinsic to ALD.

2605.16470 2026-05-19 cs.LG cs.AI 版本更新

Strategic Over-Parameterization for Generalizable Low-Rank Adaptation

战略性过参数化以实现通用的低秩适应

Jing Gao, Zhong-Yi Lu, Pan Zhang, Ze-Feng Gao

AI总结本文提出LoRA-Over框架，通过训练时丰富优化景观并推理时压缩，提升低秩适应的泛化能力，实验显示其在多个任务上优于传统LoRA。

详情

AI中文摘要

本文提出LoRA-Over框架，通过训练时丰富优化景观并推理时压缩，提升低秩适应的泛化能力，实验显示其在多个任务上优于传统LoRA。

英文摘要

Adapting large language models (LLMs) to downstream tasks via full fine-tuning is increasingly impractical due to its computational and memory demands. Parameter-efficient fine-tuning (PEFT) approaches such as Low-Rank Adaptation (LoRA) mitigate this by confining updates to a compact set of trainable parameters, but this aggressive reduction often sacrifices generalization, especially under transfer across heterogeneous tasks and domains. We revisit the tension between parameter efficiency and adaptation capacity, and ask whether the two are truly at odds. We answer in the negative by introducing LoRA-Over, a framework grounded in a simple principle: enrich the optimization landscape during training, then collapse the enrichment at inference. LoRA-Over injects auxiliary parameters into the low-rank adapters during training to broaden the effective hypothesis space, and through a decomposition-based reformulation folds them back into a standard low-rank structure with negligible reconstruction error, keeping inference cost identical to vanilla LoRA. Since not all weight matrices benefit equally from added capacity, we further propose two scheduling strategies, one statically predefined and one dynamically determined at runtime, that direct extra capacity where most needed. We evaluate LoRA-Over on language understanding (GLUE, T5-Base), dialogue (MT-Bench), arithmetic reasoning (GSM8K), and code generation (HumanEval), using LLaMA 2-7B and LLaMA 3.1-8B. Across all benchmarks and scales, LoRA-Over consistently outperforms vanilla LoRA, showing that principled over-parameterization designed to vanish at inference is an effective lever for improving PEFT generalization. Code will be released upon acceptance.

2605.16468 2026-05-19 cs.CV cs.AI cs.CL cs.LG q-bio.NC 版本更新

Mechanistically Interpretable Neural Encoding Reveals Fine-Grained Functional Selectivity in Human Visual Cortex

可解释的神经编码机制揭示人类视觉皮层的精细功能选择性

Idan Daniel Grosbard, Mor Geva, Galit Yovel

AI总结本文提出MINE框架，通过机制可解释工具揭示自然图像中驱动皮层 voxel 活动的特征，验证了特征对 voxel 响应的因果影响，并揭示了视觉皮层中精细的功能选择性。

Comments 40 pages, 28 figures

详情

AI中文摘要

理解人类视觉的核心目标是揭示驱动神经活动的视觉特征。已有研究利用人工神经网络作为编码模型预测皮层对自然图像的响应，揭示了激活类别选择区域的视觉内容。然而，现有方法多为相关性分析，将编码器视为黑箱，无法确定哪些图像特征驱动每个 voxel 的响应。本文提出机制可解释神经编码（MINE）框架，通过机制可解释工具定位自然图像中驱动毫米级（voxel 级）活动的特征。MINE利用语言对齐的图像表示预测每个 voxel 的响应，并生成语义可解释的特征描述，用于 voxel 的激活。进一步将这些 per-image 特征泛化为 per-voxel 功能轮廓。为验证 per-image 描述，我们显示它们足以生成激发 voxel 响应与原始图像响应匹配的图像，其准确性优于随机或低贡献控制生成的图像。此外，通过反事实插入或移除预测特征，可使激活在预期方向变化，提供因果证据。由 voxel 激活轮廓指导的反事实编辑产生更强的激活变化，表明轮廓忠实捕捉每个 voxel 的选择性。最后，将 MINE 应用于研究充分的类别选择脑区，显示其恢复了已知的类别偏好，同时揭示了每个区域内的精细 voxel 结构。总体而言，我们的结果确立了机制可解释性作为发现和验证神经功能精细假设的路径。

英文摘要

A central goal in understanding human vision is to uncover the visual features that drive neuronal activity. A growing body of work has used artificial neural networks as encoding models to predict cortical responses to natural images, revealing the visual content that activates category-selective regions. However, existing approaches are largely correlational and treat the encoder as a black box, leaving open which image features drive each voxel's response. We introduce Mechanistically Interpretable Neural Encoding (MINE), a framework that opens this black box by applying mechanistic-interpretability tools to localize the features within natural images that drive millimeter-scale (voxel-level) activity. MINE predicts each voxel's response using language-aligned image representations, and produces semantically interpretable descriptions of the features critical for the voxel's activation. We further generalize these per-image features into per-voxel functional profiles. To validate the per-image descriptions, we show they are sufficient to generate images that elicit voxel responses matching the responses to the original images, more accurately than images generated from random or low-attribution controls. Moreover, counterfactually inserting or removing the predicted features from images shifts activation in the expected direction, providing causal evidence. Counterfactual editing guided by the per-voxel activation profiles produces even stronger activation shifts, indicating that the profiles faithfully capture each voxel's selectivity. Finally, we apply MINE to well-studied category-selective brain regions, showing it recovers their known categorical preferences while revealing fine-grained unique voxel structure within each region. Overall, our results establish mechanistic interpretability as a path to discover and causally validate fine-grained hypotheses about neural function.

2605.16454 2026-05-19 cs.LG eess.SP quant-ph 版本更新

QuChaTeR: A Hybrid Quantum-Chaotic Temporal Framework for Earthquake Prediction

QuChaTeR：一种混合量子-混沌时间框架用于地震预测

Emir Kaan Özdemir

AI总结 QuChaTeR结合小波预处理、混沌映射和变分量子电路与递归结构，提升地震信号时间特征提取能力，在真实地震数据集上表现优异，但面临可扩展性和量子硬件限制的挑战。

Comments Accepted at 2026 IEEE International Conference on Acoustics, Speech, and Signal Processing (IEEE ICASSP 2026). This is the accepted version of the paper. The final published version will appear in the IEEE proceedings. Proc. IEEE ICASSP 2026, Barcelona, Spain, 2026

详情

DOI: 10.1109/ICASSP55912.2026.11460318

AI中文摘要

地震预测仍面临挑战，因其信号具有高度非线性和混沌动态。尽管经典深度学习模型如LSTM和CNN能捕捉局部时间特征，而量子模型提供更丰富的状态表示，但将混沌驱动机制与之结合仍不充分。我们引入QuChaTeR，一种混合架构，结合基于小波的预处理、混沌映射和变分量子电路与递归结构，以增强时间特征提取能力。QuChaTeR使用PyTorch和PennyLane实现，并在经典（LSTM、GRU、RNN、1D-CNN、Reservoir Computing）和量子启发（Quantum LSTM）基线模型上进行基准测试。在真实世界地震数据集上，QuChaTeR在多个评估标准上均表现出色，收敛速度更快。尽管结果令人鼓舞，但可扩展性和量子硬件限制仍是挑战。总体而言，本工作展示了量子-混沌混合方法如何为更准确和稳健的地震预测提供实用路径。

英文摘要

Seismic prediction remains challenging due to the highly nonlinear and chaotic dynamics of earthquake signals. While classical deep learning models such as LSTMs and CNNs capture local temporal features, and quantum models offer richer state representations, their integration with chaos-driven mechanisms is underexplored. We introduce QuChaTeR, a hybrid architecture that combines wavelet-based preprocessing, chaotic maps, and variational quantum circuits with recurrent structures to enhance temporal feature extraction. Implemented in PyTorch and PennyLane, QuChaTeR is benchmarked against classical (LSTM, GRU, RNN, 1D-CNN, Reservoir Computing) and quantum-inspired (Quantum LSTM) baselines. On real-world seismic datasets, QuChaTeR consistently converges faster and achieves superior performance across multiple evaluation criteria. Despite promising results, scalability and quantum hardware limitations remain challenges. Overall, this work demonstrates how quantum-chaotic hybridization provides a practical pathway toward more accurate and robust earthquake prediction.

2605.16452 2026-05-19 cs.LG cs.AI 版本更新

Peak-Detector: Explainable Peak Detection via Instruction-Tuned Large Language Models in Physiological Sign

峰值检测器：通过指令调优的大语言模型实现可解释的多模态峰值检测

Jiahui Li, Yida Zhang, Zixuan Zeng, Jiayu Chen, Yingjian Song, Yin Xiao, Nishan Dong, Junjie Lu, Younghoon Kwon, Xiang Zhang, Jin Lu, Wenzhan Song, Fei Dou

AI总结本文提出Peak-Detector框架，利用指令调优的大语言模型实现跨模态、可解释的峰值检测，通过峰表示技术压缩时间序列数据并提升检测准确性，同时生成解释性内容以支持验证与错误分析。

详情

AI中文摘要

准确检测多种心脏生理信号（如心电图、脉搏波容积图、球状心图和体震图）中的峰值对心血管监测至关重要，但常受伪影和信号变异影响。传统算法通常基于专家知识针对单一信号模态设计，限制了通用性。相比之下，深度学习方法缺乏可解释性，限制了专家验证和人机交互。为此，我们引入Peak-Detector框架，利用指令调优的大语言模型（LLMs）实现稳健、跨模态且可解释的峰值检测。框架的核心创新是“峰表示”技术，将时间序列数据转换为压缩格式，在保留关键事件信息的同时显著减少信号长度。此表示提供关键的归纳偏差，引导LLM在生理有意义的事件上推理而非原始噪声数据。模型通过监督微调（SFT）后接强化学习（RL）的多目标奖励函数进行优化。模型的自解释能力通过在自建的Peak-Explanation数据集上微调来培养。在四个模态（ECG、PPG、BCG和BSG）覆盖七个数据集（六个公开基准加一个真实世界队列）上，Peak-Detector展示了强大的跨模态性能，实现了临床相关时间容忍度下的最佳或并列最佳检测。除了准确性外，生成的解释性内容揭示了失败模式并支持验证和错误分析。

英文摘要

Accurate peak detection across diverse cardiac physiological signals, including the Electrocardiogram (ECG), Photoplethysmogram (PPG), Ballistocardiogram (BCG), and Bodyseismography (BSG), is fundamental for cardiovascular monitoring but is often hindered by artifacts and signal variability. Conventional algorithms are typically engineered with expert knowledge for a single signal modality, limiting their generalizability. Conversely, deep learning-based methods often lack interpretability, limiting transparency for expert verification and hindering expert-computer interaction. To address these limitations, we introduce Peak-Detector, a novel framework that leverages instruction-tuned Large Language Models (LLMs) for robust, cross-modal, and explainable peak detection. A core innovation of our framework is a "peak-representation" technique that transforms time-series data into a condensed format, preserving critical event information while significantly reducing signal length. This representation provides a crucial inductive bias, guiding the LLM to reason over physiologically meaningful events rather than raw, noisy data. The model is optimized through a two-stage process: supervised fine-tuning (SFT) followed by reinforcement learning (RL) with a multi-objective reward function. The model's self-explanation capabilities are cultivated by fine-tuning on a custom-built Peak-Explanation dataset. Across four modalities (ECG, PPG, BCG, and BSG) spanning seven datasets (six public benchmarks plus one real-world cohort), Peak-Detector demonstrates strong cross-modal performance, achieving best or tied-best detection under clinically relevant temporal tolerance. Beyond accuracy, the generated rationales surface failure modes and support verification and error analysis.


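2605.16449 2026-05-19 cs.LG cs.AI (updated version)

PESD-TSF: A Period-Aware and Explicit Structured Decomposition Framework for Long-Term Time Series Forecasting

Hua Wang, Xianhao Jiao, Fan Zhang

AI summary: PESD-TSF introduces a periodic gating mechanism, a multi-scale encoder, and cross-scale collaborative attention to counter weakened periodic perception and disrupted inter-variable dependencies in deep networks, improving multivariate time series forecasting.

Comments 23 pages, 9 figures, 13 tables

Details

AI abstract (translated from Chinese)

Deep forecasting models often suffer from weakened periodic perception and entangled trend and noise representations, and the channel-independent paradigm, although it improves training stability, disrupts the dynamic coordination among variables and hinders the modeling of cross-variable consistency in multivariate time series. We therefore propose PESD-TSF, a physics-inspired structured decomposition framework that emphasizes both interpretability and predictive accuracy. PESD-TSF introduces three key designs. First, a multiplicative periodic gating mechanism incorporates continuous-time priors to dynamically modulate signal amplitudes and preserve periodic structure across deep layers. Second, a multi-scale structured encoder combines detrended attention with hierarchical sampling to explicitly separate long-term trends from high-frequency variations while retaining fine-grained temporal semantics. Third, to recover the disrupted inter-variable dependencies, we propose Cross-Scale Collaborative Attention (CSCA) together with an RLC regularization scheme, which reconstructs the global inter-variable topology in deep feature space and enforces physically consistent collaboration through orthogonality and consistency constraints. Extensive experiments on benchmark datasets from multiple domains show that PESD-TSF consistently achieves state-of-the-art performance on multivariate forecasting, especially on tasks with complex inter-variable coupling, highlighting its structural modeling and generalization ability.

English abstract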

Deep forecasting models often suffer from attenuated periodic perception and entangled trend-noise representations as network depth increases. Moreover, the widely adopted channel-independent paradigm, while improving training stability, disrupts intrinsic dynamic coordination among variables, hindering the modeling of cross-variable consistency in multivariate time series. To address these issues, we propose PESD-TSF, a physics-inspired structured decomposition framework for long-term time series forecasting that jointly emphasizes interpretability and predictive accuracy. PESD-TSF introduces three key designs. First, a Multiplicative Periodic Gating mechanism incorporates continuous-time priors to dynamically modulate signal amplitudes, preserving periodic structures across deep layers. Second, a multi-scale structured encoder integrates detrended attention with hierarchical sampling to explicitly decouple long-term trends from high-frequency variations while retaining fine-grained temporal semantics. Third, to recover disrupted inter-variable dependencies, we propose Cross-Scale Collaborative Attention (CSCA) together with an RLC regularization scheme, which reconstructs global inter-variable topology in deep feature spaces and enforces physically consistent collaboration through orthogonality and consistency constraints. Extensive experiments on benchmark datasets from multiple domains demonstrate that PESD-TSF consistently achieves state-of-the-art performance, with particularly strong gains on multivariate forecasting tasks involving complex inter-variable coupling, highlighting its superior structural modeling capability and generalization.


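2605.16443 2026-05-19 cs.LG cs.AI (updated version)

Two-Valued Symmetric Circulant Matrices: Applications in Deep Learning

Jayakrishna Amathi, Venkata Prasanth Yanambaka, Saraju P. Mohanty, Elias Kougianos

AI summary: This paper proposes two-valued symmetric circulant matrices, an extremely sparse structure that uses only two weights per layer and sharply reduces storage requirements. Experiments report parameter counts falling from 623,290 to 7,852 on MNIST and from 24,709 to 942 on MIT-BIH, with accuracy of 93.5% and 93.1% respectively, which suits edge computing and low-power systems.

Details

AI abstract (translated from Chinese)

Although deep neural networks have succeeded in vision, medical diagnosis, and IoT scenarios, deploying them on resource-constrained platforms is difficult because of their high storage requirements, computational complexity, and large footprint. Fully connected layers in particular need a large number of weights, which edge devices struggle to accommodate. To address these challenges, this paper proposes the Two-Valued Symmetric Circulant Matrix (TVSCM), a very sparse architecture that uses only two weights per layer while keeping the matrix circulant and symmetric. The storage cost of this extreme form of structured sparsity is almost negligible compared with conventional full-weight storage. Unlike conventional sparse learning techniques such as low-rank approximation and pruning, the architecture provides an extreme form of sparsity with very low storage requirements. Simulation studies show parameters reduced from 623,290 to 7,852 on MNIST and from 24,709 to 942 on the MIT-BIH arrhythmia dataset, with accuracy going from 97.6% to 93.5% on MNIST and from 97.6% to 93.1% on MIT-BIH. Because of its minimal architectural requirements and very low power consumption, the architecture suits edge computing platforms, tiny-ML platforms, IoMT systems, and battery-powered systems.

English abstract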

Despite the success of deep neural networks in vision, medical diagnosis, and IoT scenarios, their deployment on resource-limited platforms poses serious challenges due to their high storage requirements, computational complexity, and large footprint. In particular, fully connected layers require a large number of weights, making it difficult for edge devices to accommodate them. To overcome these challenges associated with limited platforms, this paper proposes the Two-Valued Symmetric Circulant Matrix (TVSCM), a very sparse architecture that employs just two weights per layer to keep it circulant and symmetric. The extreme form of structured sparse architecture provides negligible storage costs compared to traditional full-weight storage. Instead of hardware and additional stages of other traditional sparse learning techniques, such as low-rank approximation and pruning approaches, this architecture provides an extreme form of sparsity, achieving very minimal storage requirements. The simulation study demonstrates more than 80$\times$ reduction in model parameters, reducing parameters from 623,290 to 7,852 on MNIST and from 24,709 to 942 on the MIT-BIH arrhythmia dataset, while maintaining comparable accuracy from 97.6% to 93.5% on MNIST and from 97.6% to 93.1% on MIT-BIH. Due to its minimal architectural requirements and very low power consumption, this architecture would be ideal for edge computing platforms, tiny-ML platforms, IoMT systems, and battery-powered systems.
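
As a rough illustration of the structure the abstract describes, the sketch below builds a matrix that is both circulant and symmetric and contains only two distinct values. The function name, the `pattern` argument, and the rule for placing the two values are assumptions made for this example; the abstract does not say how TVSCM assigns them. The last two lines only divide the parameter counts quoted in the abstract.

```python
def two_valued_symmetric_circulant(n, a, b, pattern):
    """Build an n x n circulant matrix whose entries are only a or b.

    pattern[d] (for d = 0 .. n // 2) picks a (True) or b (False) for
    circular distance d. The first row then satisfies c[k] == c[n - k],
    which is the condition for a circulant matrix to be symmetric.
    This placement rule is hypothetical, not taken from the paper.
    """
    if len(pattern) != n // 2 + 1:
        raise ValueError("pattern needs n // 2 + 1 entries")
    first_row = [a if pattern[min(k, n - k)] else b for k in range(n)]
    # Row i of a circulant matrix is the first row shifted right by i.
    return [[first_row[(j - i) % n] for j in range(n)] for i in range(n)]


W = two_valued_symmetric_circulant(5, 0.5, -0.25, [True, False, True])

# Ratios implied by the parameter counts quoted in the abstract.
ratio_mnist = 623290 / 7852
ratio_mitbih = 24709 / 942
```

However large `n` is, only the two scalars `a` and `b` would need to be stored and trained in this sketch; the pattern is fixed structure rather than a learned weight.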


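2605.16442 2026-05-19 cs.RO cs.AI cs.LG (updated version)

Hierarchical Two-Stage Framework for Environment-Aware Long-Horizon Vessel Trajectory Prediction

Ganeshaaraj Gnanavel, Tharindu Fernando, Sridha Sridharan, Clinton Fookes

AI summary: This paper proposes a hierarchical two-stage framework that combines a coarse long-term predictor with a grid-aware short-term predictor through a hierarchical fusion mechanism to improve vessel trajectory prediction. Experiments show it outperforms existing methods on ADE and FDE.

Details

AI abstract (translated from Chinese)

Long-horizon vessel trajectory prediction under real ocean conditions is critical for collision avoidance, traffic management, and route planning. Accurate prediction is challenging, however, because of long-range temporal dependencies and dynamic environmental factors such as currents, wind, and waves. We propose a hierarchical two-stage framework that combines a coarse long-term predictor with a grid-aware short-term predictor through a hierarchical fusion mechanism. The short-term branch uses a Spatio-Temporal Graph Transformer on discretized maritime cells to capture local dynamics, while the long-term branch encodes overall navigational intent. An integrated environmental module takes surface currents, wind vectors, and significant wave height, and uses cross-modal attention and feature-wise modulation to adapt to varying sea conditions. In addition, a learnable Savitzky-Golay smoothing layer improves the temporal coherence of the fused trajectories. We evaluate on Australian Craft Tracking System (CTS) data from the North West region, aligned with Copernicus Marine Service products, using a 3-hour input and a 10-hour prediction horizon. Experiments show that the framework improves on existing methods by 25% in Average Displacement Error (ADE) and 17% in Final Displacement Error (FDE). Ablation studies further confirm the contribution of each component.

English abstract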

Long-horizon vessel trajectory forecasting under real ocean conditions is critical for collision avoidance, traffic management, and route planning. However, achieving accurate predictions is challenging due to long-range temporal dependencies and dynamic environmental factors such as currents, wind, and waves. To address these issues, we propose a hierarchical two-stage framework that combines a coarse long-term predictor with a grid-aware short-term predictor through a hierarchical fusion mechanism. The short-term branch leverages a Spatio-Temporal Graph Transformer on discretized maritime cells to capture localized dynamics, while the long-term branch encodes overarching navigational intent. An integrated environmental module incorporates oceanographic parameters, including surface currents, wind vectors, and significant wave height, using cross-modal attention and feature-wise modulation for adaptive response to varying sea conditions. Additionally, a learnable Savitzky-Golay smoothing layer enhances temporal coherence in fused trajectories. We evaluate our approach on Australian Craft Tracking System (CTS) data from the North West region, aligned with Copernicus Marine Service products, using a 3-hour input and a 10-hour prediction horizon. Experimental results show that our framework outperforms the state-of-the-art by 25% in Average Displacement Error (ADE) and 17% in Final Displacement Error (FDE). Ablation studies further validate the contribution of each component.
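
For readers unfamiliar with the two metrics named above, here is a minimal sketch of how ADE and FDE are commonly computed for a single trajectory. It assumes planar (x, y) coordinates in consistent units and uses a made-up toy trajectory; the paper's own evaluation code, coordinate system, and averaging over vessels are not described in the abstract.

```python
import math


def displacement_errors(pred, truth):
    """Return (ADE, FDE) for two equal-length lists of (x, y) points.

    ADE is the mean Euclidean distance over all predicted time steps;
    FDE is the Euclidean distance at the final time step only.
    """
    if len(pred) != len(truth) or not pred:
        raise ValueError("trajectories must be non-empty and equal length")
    dists = [math.hypot(px - tx, py - ty)
             for (px, py), (tx, ty) in zip(pred, truth)]
    return sum(dists) / len(dists), dists[-1]


# Toy example (hypothetical data): the prediction drifts away over time.
pred = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade, fde = displacement_errors(pred, truth)
```

Because errors tend to grow with the horizon, FDE is usually the larger of the two, as it is in this toy case.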


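2605.16441 2026-05-19 cs.LG cs.AI (updated version)

DeepArrhythmia: Segment-Contextualized ECG Arrhythmia Classification via Selective Evidence Acquisition

Jiahui Li, Ruili Fang, Zishuai Liu, WenZhan Song, Jin Lu, Fei Dou

AI summary: DeepArrhythmia performs segment-contextualized ECG arrhythmia classification through selective evidence acquisition. It combines the raw ECG signal with a rendered waveform image and uses specialized tools to separate physiological measurement from evidence integration, targeting beat-level classification that accounts for multi-beat rhythm context.

Details

AI abstract (translated from Chinese)

Electrocardiogram (ECG) arrhythmia detection aims to assign an arrhythmia class to each beat, but many existing systems treat beats as isolated local instances, which ignores the multi-beat rhythm context that beat labels often depend on. We propose DeepArrhythmia, a tool-grounded multimodal framework for segment-contextualized beat-level ECG arrhythmia classification. Given a multi-beat ECG segment, DeepArrhythmia combines the raw ECG signal with a rendered waveform image, localizes R peaks to identify beat instances, and produces structured beat-level predictions. The framework separates physiological measurement from evidence integration using specialized tools for beat localization, numerical rhythm-morphology extraction, and morphology-focused textual analysis. DeepArrhythmia uses segment-level confidence to route between minimal and rich evidence states, because richer physiological evidence is not always useful. This agentic design integrates rhythm context, explicit physiological grounding, and selective evidence acquisition for decision making.

English abstract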

Beat-level Electrocardiography (ECG) arrhythmia detection aims to assign an arrhythmia class to each beat in a recording, yet many existing systems treat beats as isolated local instances. This is limiting because beat labels often depend on multi-beat rhythm context, including timing, compensatory pauses, and beat-to-beat morphological consistency. We present DeepArrhythmia, a tool-grounded multimodal framework for segment-contextualized beat-level ECG arrhythmia classification. Given a multi-beat ECG segment, DeepArrhythmia combines the raw ECG signal and a rendered waveform image, localizes R peaks to identify beat instances, and produces structured beat-level predictions. The framework decouples physiological measurement from evidence integration using specialized tools for beat localization, numerical rhythm--morphology extraction, and morphology-focused textual analysis. DeepArrhythmia uses segment-level confidence to route between minimal and rich evidence states, since richer physiological evidence is not uniformly useful. This agentic design integrates rhythm context, explicit physiological grounding, and selective evidence acquisition for decision making.


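2605.16438 2026-05-19 cs.LG cs.AI (updated version)

Byzantine-Resilient Federated Learning via QUBO-Based Client Selection on Quantum Annealers

Andras Ferenczi, Sutapa Samanta, Dagen Wang, Jason Qizhe Qin

AI summary: This paper uses quantum annealing for Byzantine-resilient federated learning, recasting client selection as a Quadratic Unconstrained Binary Optimization (QUBO) problem to improve the detection of malicious updates.

Comments 9 pages, 6 figures, 8 tables

Details

AI abstract (translated from Chinese)

Federated Learning (FL) trains a global model across distributed clients, but at scale it is vulnerable to malicious updates. This paper proposes a quantum annealing approach that recasts client selection as a Quadratic Unconstrained Binary Optimization (QUBO) problem solved on a quantum annealer. With a small number of clients, QUBO outperforms MultiKrum on the most challenging Byzantine attacks, but its quality degrades as the number of clients grows. The paper introduces a MultiSignal ensemble that uses the Euclidean and cosine Krum score gaps to classify attacks into four regimes and routes evasion attacks to a suspicion-penalized QUBO. Experiments show that, at 100 clients on MNIST, MultiSignal reaches 95.3% average detection accuracy, compared with 91.8% for classical MultiKrum.

English abstract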

Federated Learning (FL) trains a global model across decentralized clients while preserving data privacy, but at scale it is vulnerable to malicious updates. Byzantine-resilient aggregation methods such as MultiKrum score gradients against their nearest neighbors and can miss malicious updates that preserve the statistical properties of honest ones. We propose a quantum annealing approach that reformulates client selection as a Quadratic Unconstrained Binary Optimization (QUBO) problem, encoding pairwise distances into a cost function solved by quantum annealers (QA). Unlike MultiKrum's greedy per-client scoring, the QUBO formulation jointly optimizes over all subsets to find the mutually closest group of $m$ clients. At small scale (15 clients), QUBO outperforms MultiKrum on the most challenging Byzantine attacks: e.g., Advanced LIE is detected with 95.11% accuracy versus 81.33% on MNIST and 97.78% versus 75.56% on CIFAR-10. QUBO fares poorly on simpler attacks where MultiKrum excels, so the two methods are complementary. QUBO quality also degrades as the number of clients grows. To address this, we introduce a MultiSignal ensemble that uses a dual-feature routing gate based on Euclidean and cosine Krum score gaps to classify attacks into four regimes and routes evasion attacks to a suspicion-penalized QUBO with agreement voting. At 100 clients on MNIST, MultiSignal achieves 95.3% average detection accuracy versus 91.8% for classical MultiKrum, with the largest gains on Sparse Lie (72.0% to 95.2%, +23.2 points) and Advanced Lie (80.4% to 85.2%, +4.8 points). These results show that QUBO-based quantum annealing with MultiSignal is a principled and scalable defense against the most challenging Byzantine strategies in federated learning.
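
The idea of encoding pairwise distances into a QUBO that selects the mutually closest group of $m$ clients can be sketched as follows. This is one plausible formulation, not the paper's: the quadratic cardinality penalty, the penalty weight, the scalar toy "updates", and the brute-force solver standing in for a quantum annealer are all assumptions made for illustration.

```python
import itertools


def build_qubo(dist, m, penalty):
    """Upper-triangular QUBO for: minimise the sum of pairwise distances
    among selected clients plus penalty * (number_selected - m) ** 2.

    Expanding the penalty with x_i ** 2 == x_i gives a diagonal term
    penalty * (1 - 2 * m) and an off-diagonal term 2 * penalty.
    The constant penalty * m ** 2 is dropped.
    """
    n = len(dist)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        Q[i][i] = penalty * (1 - 2 * m)
        for j in range(i + 1, n):
            Q[i][j] = dist[i][j] + 2 * penalty
    return Q


def qubo_energy(Q, x):
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(i, n))


def solve_brute_force(Q):
    """Exhaustive search; only feasible for a handful of clients."""
    n = len(Q)
    return min(itertools.product((0, 1), repeat=n),
               key=lambda x: qubo_energy(Q, x))


# Hypothetical 1-D client updates; the last one is an obvious outlier.
updates = [0.0, 0.1, 0.2, 0.15, 5.0]
dist = [[(u - v) ** 2 for v in updates] for u in updates]
total = sum(dist[i][j] for i in range(5) for j in range(i + 1, 5))
Q = build_qubo(dist, m=3, penalty=total + 1.0)
selected = solve_brute_force(Q)
```

Choosing the penalty larger than the sum of all pairwise distances guarantees that any solution with a count other than `m` has higher energy than every solution with exactly `m` clients, so the minimiser is the tightest group of three.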


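2605.16435 2026-05-19 cs.LG cs.AI (updated version)

GPU-Accelerated Deep Learning for Heatwave Prediction and Urban Heat Risk Assessment

Adis Alihodžić

AI summary: This paper presents a GPU-based deep learning framework for predicting urban thermal conditions and assessing heat risk. Using MODIS and Open-Meteo data, it finds that ConvLSTM with a mixed loss function performs best, and that GPU execution with mixed-precision training reduces run time.

Details

AI abstract (translated from Chinese)

Heatwaves are a significant problem in cities, and climate change makes them harder to manage. This paper presents a GPU-based deep learning framework for predicting urban thermal conditions and assessing heat risk. The study was carried out in Sarajevo using MODIS land surface temperature data and Open-Meteo forecast data. Several models were tested, including convolutional and spatiotemporal models. ConvLSTM with a mixed loss function performed best, with MAE = 0.2293, RMSE = 0.3089, and $R^2$ = 0.8877. The experiments also show that longer time series and additional meteorological variables improve the results. Because the framework runs on a GPU and trains with mixed precision, execution time is reduced. From the predicted temperature fields, hazard information can be combined with exposure and vulnerability data to produce city heat risk maps. The proposed framework can serve as a practical basis for urban heat analysis.

English abstract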

Heatwaves are an important problem in cities, and climate change makes this problem more difficult. In this paper, we present a GPU-based deep learning framework for next-day prediction of urban thermal conditions and for heat risk assessment. The study was carried out in Sarajevo by using MODIS land surface temperature data and Open-Meteo forecast data. We tested several models, including convolutional models and spatiotemporal models. Among them, ConvLSTM with a mixed loss function gave the best results. The obtained values were MAE = 0.2293, RMSE = 0.3089, and R2 = 0.8877. The experiments also showed that results can be improved by using longer temporal series and additional meteorological variables. Since the framework was implemented on a GPU and trained with mixed precision, the execution time was reduced. Based on the predicted temperature fields, it was also possible to combine hazard information with exposure and vulnerability data in order to generate city heat risk maps. The proposed framework can be used as a practical basis for city heat analysis.


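2605.16433 2026-05-19 cs.LG cs.AI (updated version)

Edge-AI-Driven Learning-to-Rank for Decentralized Task Allocation in Circular Smart Manufacturing

Mohammadhossein Ghahramani, Yan Qiao, Mengchu Zhou

AI summary: This paper proposes an Edge-AI-driven decentralized task allocation framework that allocates resources through ranking-aware negotiation, improving delay and deadline adherence under high load and energy efficiency under tighter constraints.

Details

Journal ref: Under review at IEEE IoT J, 2026

AI abstract (translated from Chinese)

Task allocation in smart manufacturing systems must operate under decentralized decision-making, dynamic workloads, and shared resource constraints. In circular manufacturing, these challenges are intensified by the need to balance operational efficiency against resource and energy sustainability. Learning-based approaches exist, but many focus on predicting absolute performance metrics, which do not necessarily improve allocation outcomes because decentralized assignment is determined by the relative ordering of candidate machines. This paper proposes an Edge-AI-driven decentralized task allocation framework based on ranking-aware negotiation, in which lightweight decision intelligence is embedded at the machine level to achieve low-latency coordination without centralized control. The framework is developed in steps: a resource-aware heuristic first establishes the decentralized bidding structure, an Edge-AI regression model then provides a learned approximation of local bids, and a ranking-aware formulation finally reshapes the learning objective to match the ordering-based nature of winner selection. Each machine evaluates incoming tasks using local information, including processing capability, queue state, and resource contention. The framework is evaluated by discrete-event simulation under high-load and tight-deadline scenarios, using delay, deadline violations, throughput, and energy consumption as metrics. Results show improved delay and deadline adherence under high load and better energy efficiency under tighter constraints, giving more resource-efficient operation in line with circular manufacturing goals. These findings indicate that aligning learning objectives with decentralized decision structures is critical for effective negotiation-driven task allocation.

English abstract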

Task allocation in smart manufacturing systems needs to operate under decentralized decision-making, dynamic workloads, and shared resource constraints. In circular manufacturing settings, these challenges are further intensified by the need to balance operational efficiency with resource and energy sustainability. While learning-based approaches have been explored, many focus on predicting absolute performance metrics that do not necessarily translate into improved allocation outcomes, since decentralized assignment is governed by the relative ordering of candidate machines. This work proposes an Edge-AI-driven decentralized task allocation framework based on ranking-aware negotiation, where lightweight decision intelligence is embedded at the machine level to enable low-latency coordination without centralized control. The framework is developed progressively: a resource-aware heuristic first establishes the decentralized bidding structure, an Edge-AI-based regression model then provides learned local bid approximation, and a ranking-aware formulation finally reshapes the learning objective to align with the ordering-based nature of winner selection. Each machine evaluates incoming tasks using local information, including processing capability, queue state, and resource contention. The framework is evaluated via discrete-event simulation under high-load and tight-deadline scenarios using delay, deadline violations, throughput, and energy consumption. Results show improved delay and deadline adherence under high load, and enhanced energy efficiency under tighter constraints, leading to more resource-efficient operation aligned with circular manufacturing objectives. These findings demonstrate that aligning learning objectives with decentralized decision structures is critical for effective negotiation-driven task allocation.


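2605.16429 2026-05-19 cs.LG cs.AI (updated version)

QuantFPFlow: Quantum Amplitude Estimation for Fokker-Planck Policy Optimisation in Continuous Reinforcement Learning

Abraham Itzhak Weinberg

AI summary: QuantFPFlow uses quantum amplitude estimation to make Fokker-Planck policy optimisation in continuous reinforcement learning more efficient, giving a quadratic speedup from $\mathcal{O}(1/\varepsilon^{2})$ to $\mathcal{O}(1/\varepsilon)$, and it finds the global optimum in multimodal reward landscapes more often than SAC.

Details

AI abstract (translated from Chinese)

We introduce QuantFPFlow, a reinforcement learning framework that integrates quantum amplitude estimation into the Fokker-Planck (FP) formulation of stochastic policy optimisation. Classical continuous-space RL agents must estimate the FP partition function $Z = \int e^{-V(x)/D}\,dx$ at cost $\mathcal{O}(1/\varepsilon^{2})$; QuantFPFlow replaces this with a Grover-amplified amplitude estimator that achieves $\mathcal{O}(1/\varepsilon)$, a provable quadratic speedup. Although the full quantum speedup requires fault-tolerant hardware, the quantum-inspired classical simulation shown here already exhibits the $\mathcal{O}(1/\varepsilon)$ algorithmic structure. The estimated stationary distribution $\rho^{*}$ drives a theoretically grounded exploration bonus $R_{\mathrm{aug}} = R_{\mathrm{env}} + \alpha\log(1/\rho^{*}(s))$, which steers the agent toward globally optimal regions of multimodal reward landscapes while constraining policy variance through FP diffusion matching. On a continuous-control task designed to expose local-optima failure, QuantFPFlow achieves a mean reward of 1,295.7 ± 423.2 versus 1,284.0 ± 474.0 for Soft Actor-Critic (SAC), and finds the global optimum 10.4% more often (33.9% vs. 30.7%). Policy entropy stays near $H(\pi) \approx 6.5$ nats while SAC falls to 1.5 nats, confirming that FP diffusion matching actively prevents premature convergence. Dimensionality experiments further show computational scaling of $\mathcal{O}(d^{0.35})$ for QuantFPFlow versus $\mathcal{O}(d^{0.76})$ for classical FP estimation.

English abstract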

We introduce QuantFPFlow, a reinforcement learning framework that integrates quantum amplitude estimation into the Fokker-Planck (FP) formulation of stochastic policy optimisation. Classical continuous-space RL agents must estimate the FP partition function $Z = \int e^{-V(\mathbf{x})/D}\,d\mathbf{x}$ at cost $\mathcal{O}(1/\varepsilon^{2})$; QuantFPFlow replaces this with a Grover-amplified amplitude estimator achieving $\mathcal{O}(1/\varepsilon)$, a provable quadratic speedup. While the full quantum acceleration requires fault-tolerant hardware, the quantum-inspired classical simulation demonstrated here already exhibits the $\mathcal{O}(1/\varepsilon)$ algorithmic structure. The estimated stationary distribution $\rho^{*}$ drives a theoretically grounded exploration bonus $R_{\mathrm{aug}} = R_{\mathrm{env}} + \alpha\log(1/\rho^{*}(s))$. This bonus steers the agent toward globally optimal regions of multimodal reward landscapes while simultaneously constraining policy variance through FP diffusion matching. On a continuous-control task specifically designed to expose local-optima failure, QuantFPFlow achieves mean reward $1{,}295.7 \pm 423.2$ versus $1{,}284.0 \pm 474.0$ for Soft Actor-Critic (SAC), while discovering the global optimum 10.4% more frequently (33.9% vs. 30.7%). Policy entropy remains near $H(\pi)\approx 6.5$ nats throughout training, whereas SAC collapses to $1.5$ nats, confirming that FP diffusion matching actively prevents premature convergence. Dimensionality experiments further show computational scaling of $\mathcal{O}(d^{0.35})$ for QuantFPFlow versus $\mathcal{O}(d^{0.76})$ for classical FP estimation.
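
To make the exploration bonus concrete, the sketch below computes a stationary density of the form $e^{-V(x)/D}/Z$ on a one-dimensional grid and applies the bonus formula from the abstract. It is a plain classical quadrature, not amplitude estimation, and the double-well potential, diffusion constant, grid, and value of `alpha` are all hypothetical choices for illustration.

```python
import math


def stationary_density(potential, diffusion, xs):
    """Return (rho, Z) with rho[i] = exp(-V(xs[i]) / D) / Z.

    Z is estimated with a simple Riemann sum on the uniform grid xs.
    """
    dx = xs[1] - xs[0]
    weights = [math.exp(-potential(x) / diffusion) for x in xs]
    z = sum(weights) * dx
    return [w / z for w in weights], z


def augmented_reward(r_env, rho_s, alpha):
    """R_aug = R_env + alpha * log(1 / rho*(s)), as quoted in the abstract."""
    return r_env + alpha * math.log(1.0 / rho_s)


# Hypothetical double-well potential with minima at x = -1 and x = +1.
xs = [-3.0 + 0.01 * i for i in range(601)]
rho, z = stationary_density(lambda x: (x * x - 1.0) ** 2, 0.5, xs)

bonus_at_barrier = augmented_reward(0.0, rho[300], alpha=0.1)  # x close to 0
bonus_at_well = augmented_reward(0.0, rho[400], alpha=0.1)     # x close to 1
```

States with low stationary density, such as the barrier between the wells, receive the larger bonus, which is the mechanism the abstract credits for escaping local optima.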


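2605.16420 2026-05-19 cs.CV cs.LG (updated version)

Video Reconstruction using Diffusion-based Image-to-Video Generation with Trajectory Guidance

Stelio Bompai, Ioannis Kontopoulos, Giannis Spiliopoulos, Dimitris Zissis, Konstantinos Tserpes

AI summary: This paper uses a pre-trained image-to-video diffusion model, guided by GPS trajectories, to generate missing or dropped frames in drone video without domain-specific fine-tuning, and shows that the approach is effective for video reconstruction under low-texture, small-object conditions.

Comments Accepted at the 1st Workshop on Multi-Sensor Trajectory Knowledge Discovery and Extraction (MuseKDE 2026), co-located with the 27th IEEE International Conference on Mobile Data Management (IEEE MDM 2026)

Details

AI abstract (translated from Chinese)

This paper addresses the reconstruction of missing or dropped frames in top-down drone video of autonomous surface vehicles performing structured maritime manoeuvres. We propose a pipeline that converts raw GPS telemetry and a single reference frame into a trajectory-guided video sequence using a pre-trained image-to-video diffusion model, with no domain-specific fine-tuning. GPS coordinates are projected into image space to produce per-vessel motion cues that condition the SG-I2V diffusion model. The generated frames are evaluated against ground-truth video with perceptual, temporal, and trajectory-based metrics, and benchmarked against optical flow extrapolation and RIFE interpolation baselines. Among all methods, SG-I2V produces the most natural-looking frames (BRISQUE 25.52, closest to the ground-truth value of 23.64), the most realistic motion magnitude (temporal smoothness 1.14 vs. 1.42 for ground truth), and the strongest GPS trajectory adherence (9.31 px vs. 28.70 px for ground truth, the latter reflecting approximate temporal alignment between the footage and the GPS logs rather than generation error). This shows that trajectory-guided diffusion synthesis is a viable approach to maritime video reconstruction under challenging low-texture, small-object conditions.

English abstract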

This paper addresses the problem of reconstructing missing or dropped frames in top-down drone video of autonomous surface vehicles performing structured maritime manoeuvres. We propose a pipeline that converts raw GPS telemetry and a single reference frame into a trajectory-guided video sequence using a pre-trained image-to-video diffusion model, requiring no domain-specific fine-tuning. GPS coordinates from onboard telemetry logs are projected into image space via an equirectangular mapping, producing per-vessel motion cues that condition the SG-I2V diffusion model. The generated frames are evaluated against ground-truth video using perceptual, temporal and trajectory-based metrics, and benchmarked against optical flow extrapolation and RIFE interpolation baselines. SG-I2V produces the most naturally appearing frames among all methods (BRISQUE 25.52, closest to ground-truth 23.64), the most realistic motion magnitude (temporal smoothness 1.14 vs. ground truth 1.42), and the strongest GPS trajectory adherence (9.31px vs. 28.70px for ground-truth, the latter reflecting approximate temporal alignment between footage and GPS logs rather than generation error), demonstrating that trajectory-guided diffusion synthesis is a viable approach to maritime video reconstruction under challenging low-texture, small-object conditions.
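
The projection step can be pictured with a small equirectangular sketch like the one below. It assumes a north-up, straight-down image, a known reference GPS fix with a known pixel position, and a known ground resolution in metres per pixel; these inputs, the function name, and the sample coordinates are illustrative assumptions, since the abstract does not describe the paper's calibration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def gps_to_pixel(lat, lon, ref_lat, ref_lon, ref_px, metres_per_pixel):
    """Map a GPS fix to (u, v) pixel coordinates.

    Uses the equirectangular approximation around (ref_lat, ref_lon),
    which is adequate over the small area covered by one drone frame.
    """
    east = (math.radians(lon - ref_lon) * EARTH_RADIUS_M
            * math.cos(math.radians(ref_lat)))
    north = math.radians(lat - ref_lat) * EARTH_RADIUS_M
    u = ref_px[0] + east / metres_per_pixel
    v = ref_px[1] - north / metres_per_pixel  # image rows grow downward
    return u, v


# Hypothetical reference fix at the centre of a 1920 x 1080 frame.
ref = (37.9000, 23.7000)
centre = gps_to_pixel(ref[0], ref[1], ref[0], ref[1], (960.0, 540.0), 0.5)
north_pt = gps_to_pixel(ref[0] + 0.001, ref[1], ref[0], ref[1], (960.0, 540.0), 0.5)
east_pt = gps_to_pixel(ref[0], ref[1] + 0.001, ref[0], ref[1], (960.0, 540.0), 0.5)
```

Applying the function to each telemetry sample gives a per-vessel pixel track that could serve as a motion cue for a trajectory-conditioned generator.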


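2605.16411 2026-05-19 cs.CV cs.AI cs.CL cs.DB cs.LG (updated version)

Reducing Hallucination in Vision-Language Models via Stage-wise Preference Optimization under Distribution Shift

Qinwu Xu

AI summary: This paper proposes a stage-wise preference optimization framework that builds datasets targeted at hallucination, improving grounded reasoning in vision-language models, reducing hallucination, and making responses more informative.

Details

AI abstract (translated from Chinese)

Hallucination remains a fundamental challenge in vision-language models (VLMs), where autoregressive generation, trained by maximum likelihood estimation under joint probability modeling, can produce responses that are linguistically plausible but physically inconsistent or visually ungrounded. We propose a stage-wise preference optimization framework that reduces hallucination through targeted multimodal data construction. The framework emphasizes ambiguous spatial orientation, object relations, OCR uncertainty, and adversarial false-premise training. Hallucinated negative samples are generated as minimally perturbed but visually inconsistent alternatives, so that Direct Preference Optimization (DPO) can better separate grounded reasoning from plausible hallucination. Experiments on open-source benchmarks and real-world multimodal evaluation scenarios show improved grounding consistency, reduced hallucination, and more informative grounded responses. Cross-model qualitative evaluation further shows that the proposed multimodal LLM DPO framework gives more visually grounded responses than several frontier proprietary VLMs in ambiguous spatial reasoning and adversarial false-premise settings. The results suggest that hallucination may stem not only from limits in model capacity but also from the tendency of autoregressive probabilistic generation, under weak visual grounding, to choose continuations that are linguistically plausible but visually inconsistent. Future work may explore physical-consistency modeling, uncertainty-aware multimodal reasoning, and architectural alternatives to standard autoregressive decoding.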

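OrbiSim: A World Model as a Differentiable Physics Engine for Embodied Intelligence

Jiajian Li, Jingyuan Huang, Junru Gong, Qi Wang, Xiaokang Yang, Yunbo Wang

AI summary: OrbiSim proposes a new robotic simulation paradigm that redefines the world model as a fully differentiable physics engine, connecting structured scene assets, neural dynamics, and downstream reinforcement learning through a unified, physically grounded pathway to improve predictive fidelity and control performance.

Comments Project page: https://jjleejj85.github.io/projects/orbisim

Details

AI abstract (translated from Chinese)

We present OrbiSim, a new robotic simulation paradigm that redefines the world model as a fully differentiable physics engine for embodied intelligence. Unlike earlier world models that focus on unconstrained imagination in latent or visual domains, OrbiSim establishes a unified, physically grounded pathway connecting structured scene assets, neural dynamics, and downstream reinforcement learning. By making the whole simulation loop end-to-end differentiable, from explicit state transitions to visual observation generation, OrbiSim supports tasks that classical simulators struggle with, such as differentiable contact modeling, gradient-based policy optimization under sparse rewards, and intuitive physical inference. Empirical results show that OrbiSim significantly outperforms state-of-the-art world models in predictive fidelity and control performance. Its consistent response to asset configurations and physical parameters also suggests its potential as a differentiable tool for improving robot simulation and policy training.

English abstract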

We present OrbiSim, a novel robotic simulation paradigm that redefines world models as a fully differentiable physics engine for embodied intelligence. Unlike prior world models that focus on unconstrained imagination in latent or visual domains, OrbiSim establishes a unified, physically-grounded pathway that bridges structured scene assets, neural dynamics, and downstream reinforcement learning. By enabling end-to-end differentiability throughout the entire simulation loop -- spanning from explicit state transitions to visual observation generation -- OrbiSim supports tasks traditionally intractable for classical simulators, such as differentiable contact modeling, gradient-based policy optimization under sparse rewards, and intuitive physical inference. Empirical results demonstrate that OrbiSim significantly outperforms state-of-the-art world models in both predictive fidelity and control performance. Furthermore, its consistent responsiveness to asset configurations and physical parameters suggests its potential as a differentiable tool for enhancing robot simulation and policy training.


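2605.16392 2026-05-19 q-bio.QM cs.CV cs.LG (updated version)

Bridging the Modality Bottleneck in Pathology MIL through Virtual Molecular Staining

Yucheng Xing, Pei Liu, Jingying Ma, Ruping Hong, Jiangdong Qiu, Tianyu Liu, Kai He, Ling Huang, Mengling Feng

AI summary: This paper proposes MIST, which uses virtual molecular staining to improve the projection layer in pathology MIL. It improves 240 of 256 configurations with an average gain of 3.5%, and gains of 5.2%, 3.3%, and 2.6% on survival prediction, tissue subtyping, and biomarker prediction respectively.

Details

AI abstract (translated from Chinese)

Multiple instance learning (MIL) is the dominant framework for whole-slide image analysis in computational pathology, typically combining a frozen patch encoder, a projection layer, and a slide-level aggregator. Encoders and aggregators have been studied extensively, but the projection layer remains a largely morphology-only bottleneck. This limits endpoints such as biomarker status and survival, which are governed by molecular states that H&E morphology does not fully capture. We introduce Molecularly Informed Staining Transform (MIST), a plug-in replacement for the MIL projection layer that uses paired spatial transcriptomics only during training to construct virtual molecular stains. MIST clusters gene expression profiles into cross-modal prototypes, anchors them in the frozen foundation model feature space, and uses them to reorganize H&E patch features along molecularly guided axes. It needs no transcriptomics at inference and can be inserted before standard MIL aggregators. We evaluate MIST on 23 downstream tasks and 8 MIL aggregators. MIST improves 240 of 256 configurations, with an average gain of 3.5% that is consistent across endpoint types: 5.2% on survival prediction, 3.3% on tissue subtyping, and 2.6% on biomarker prediction. Ablations confirm that gene-derived prototypes are the main source of the gains, while spatial, biological, and pathological analyses show that cross-modal prototype affinities capture spatially coherent molecular programs from H&E alone.

English abstract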

Multiple instance learning (MIL) is the dominant framework for whole-slide image analysis in computational pathology, typically combining a frozen patch encoder, a projection layer, and a slide-level aggregator. While encoders and aggregators have been extensively studied, the projection layer remains a largely morphology-only bottleneck. This limits endpoints such as biomarker status and survival, which are governed by a molecular state that is not fully captured by H&E morphology. We introduce Molecularly Informed Staining Transform (MIST), a plug-in replacement for the MIL projection layer that uses paired spatial transcriptomics only during training to construct virtual molecular stains. MIST clusters gene expression profiles into cross-modal prototypes, anchors them in the frozen foundation model feature space, and uses them to reorganize H&E patch features along molecularly guided axes. It requires no transcriptomics at inference and can be inserted before standard MIL aggregators. We evaluate MIST across 23 downstream tasks and 8 MIL aggregators. MIST improves 240 of 256 configurations over the standard projection layer, with an average gain of +3.5%, observed consistently across endpoint types: +5.2% on survival prediction, +3.3% on tissue subtyping, and +2.6% on biomarker prediction. Ablations confirm that gene-derived prototypes are the primary source of the gains, while spatial, biological, and pathological analyses show that cross-modal prototype affinities capture spatially coherent molecular programs from H&E alone.


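2605.16391 2026-05-19 eess.SP cs.AI cs.LG cs.RO (updated version)

Overcoming the Intrinsic Performance Limitations of MEMS IMU via Diffusion-Based Generative Learning

Jiarui Lv, Feng Zhu, Xiaohong Zhang

AI summary: This paper proposes a diffusion-based generative learning framework that produces high-fidelity virtual IMU data from low-cost IMU data, improving positioning and attitude estimation, and validates it in airborne mapping.

Details

AI abstract (translated from Chinese)

Inertial measurement units (IMUs) are fundamental sensing components in multi-source integrated navigation systems, and their performance directly affects the accuracy and reliability of the solution. The accuracy of low-cost IMUs, however, is limited by hardware. Generative artificial intelligence has recently proved effective at modeling complex data distributions and reconstructing high-fidelity signals. Motivated by this, we propose a diffusion-based generative learning framework that synthesizes high-fidelity virtual IMU data from low-cost IMU measurements. Specifically, a conditional diffusion model is built on a U-Net architecture, with high-grade IMU measurements serving as ground-truth priors and low-cost IMU measurements as conditional inputs. The virtual IMU data generated by the model are used for subsequent navigation and localization tasks. Experiments show that the generated virtual IMU data significantly outperform the original low-cost IMU measurements in both positioning and attitude estimation. We also transfer the model to airborne mapping experiments, where the method produces thinner and more consistent point clouds. Overall, the framework pushes past the performance limits of low-cost IMUs and shows the potential of diffusion-based generative learning for producing virtual high-grade IMU data.

English abstract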

Inertial measurement units (IMUs) are fundamental sensing components in multi-source integrated navigation systems, and their performance directly determines the accuracy and reliability of solutions. However, the precision of low-cost IMUs is inherently constrained by hardware limitations. Recently, generative artificial intelligence has demonstrated remarkable capability in modeling complex data distributions and reconstructing high-fidelity signals. Motivated by this, we propose a diffusion-based generative learning framework for synthesizing high-fidelity virtual IMU data from low-cost IMU measurements. Specifically, a conditional diffusion model based on a U-Net architecture is constructed, where high-grade IMU measurements are utilized as ground-truth priors and low-cost IMU measurements are employed as conditional inputs. The virtual IMU data generated by the model is used for subsequent navigation and localization tasks. Experimental results demonstrate that the generated virtual IMU data significantly outperform the original low-cost IMU measurements in both positioning and attitude estimation. Furthermore, we transfer the model to airborne mapping experiments, where the proposed method produces thinner and more consistent point clouds. Overall, the proposed framework breaks the performance limits of low-cost IMU and demonstrates the potential of diffusion-based generative learning for virtual high-grade IMU data.


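2605.16390 2026-05-19 cs.CV cs.LG stat.ML (updated version)

CheckSupport: A Local-LLM-Based Tool for Automated Checklist Selection and Completion in Manuscript Submission

Satvik Tripathi, Don Enwerem, Kevin Song, Kristian Quevada, Jacinta Arnold, Tessa S. Cook

AI summary: This paper presents CheckSupport, which uses local LLMs to automate checklist selection and completion, supporting transparent and reproducible research reporting. A staged prompting strategy yields 90% accuracy on checklist recommendation and 88% on item-level completion, running on CPU-only hardware in 12.5 seconds per manuscript on average.

Details

AI abstract (translated from Chinese)

Transparent and standardized reporting is essential for reproducible scientific research, yet adherence to reporting guidelines remains inconsistent because selecting and completing checklists by hand is laborious. We present CheckSupport, an open-source, locally deployable system that uses large language models to automate the recommendation and completion of reporting checklists. CheckSupport uses a staged prompting strategy that breaks the reporting workflow into constrained inference tasks and prioritizes faithful extraction over generative text synthesis. All inference runs locally on instruction-tuned models, which protects data privacy and gives reproducible, auditable workflows. Evaluated on a corpus of peer-reviewed manuscripts, CheckSupport reaches 90% overall accuracy on checklist recommendation and 88% overall accuracy on item-level completion while running on CPU-only hardware. The average wall-clock time per manuscript is 12.5 seconds, including checklist recommendation and full checklist completion. These results indicate that large language models, applied as structured inference components, can reduce the reporting burden and support more transparent and reproducible scientific reporting across disciplines.

English abstract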

Transparent and standardized reporting is essential for reproducible scientific research, yet adherence to reporting guidelines remains inconsistent because of the manual effort required to select and complete checklists. We present CheckSupport, an open-source, locally deployable system that uses large language models to automate the recommendation of reporting checklists and the evidence-grounded completion of checklists for scientific manuscripts. CheckSupport employs a staged prompting strategy that decomposes reporting workflows into constrained inference tasks, prioritizing faithful extraction over generative text synthesis. All inference is performed locally using instruction-tuned models, preserving data privacy and enabling reproducible, auditable workflows. Evaluated on a corpus of peer-reviewed manuscripts, CheckSupport achieved 90% overall accuracy for checklist recommendations and 88% overall accuracy for item-level completion while operating on CPU-only hardware. On average, the wall-clock time per manuscript was 12.5 seconds, including the checklist recommendation and full checklist completion. These results demonstrate that large language models, when applied as structured inference components, can reduce reporting burden and support more transparent and reproducible scientific reporting across disciplines.


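2605.16376 2026-05-19 eess.IV cs.CV cs.DC cs.LG cs.MM (updated version)

Kelvin v1.0: A Neural Pre-Encoder for H.264: A standards-compliant learned preprocessor with -27.62% BD-VMAF on UVG

Marco Graziano

AI summary: Kelvin v1.0 improves H.264 encoding through content-adaptive pixel adjustments, achieving a mean BD-VMAF of -27.62% on UVG relative to baseline libx264, with consistent results on MCL-JCV, and addresses the engineering challenge of H.264 being non-differentiable.

Details

AI abstract (translated from Chinese)

Kelvin is a lightweight learned pre-encoder that sits in front of an unmodified libx264 encoder. It applies content-adaptive pixel adjustments, bounded at ±1/255 per channel, so that the encoder allocates bits to the regions that matter most perceptually, while the output remains a standard H.264 bitstream compatible with all existing decoders, players, and CDNs. On the seven-sequence 1080p UVG benchmark, Kelvin v1.0 achieves a mean BD-VMAF of -27.62% (7 of 7 wins) and a BD-VMAF-NEG of -5.18% (6 of 7 wins). On the 30-sequence MCL-JCV public set, the same checkpoint beats baseline libx264 on 28 of 30 clips; with the two diagnosable failures removed, the mean BD-VMAF is -27.70%, consistent with UVG. The central engineering challenge is that H.264 is non-differentiable: we describe a hybrid codec proxy that combines a calibrated differentiable rate estimator (Spearman rho = 0.986 against real libx264 bits per pixel) with a U-Net distortion proxy trained on real encoder outputs. We publish full per-sequence rate-distortion data, a named failure-mode taxonomy on MCL-JCV (rate-floor violation, distribution shift, metric saturation), and a five-baseline sanity panel (hqdn3d, unsharp, -tune psnr, -tune ssim, x265 medium), with honest positioning: x265 medium beats Kelvin on every metric on the same corpus. Kelvin is therefore designed for workloads where staying on H.264 is a constraint rather than a choice.

English abstract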

Kelvin is a lightweight learned pre-encoder that sits in front of an unmodified libx264 encoder. It applies content-adaptive pixel adjustments, bounded at +/-1/255 per channel, so that the encoder allocates bits where they matter most perceptually, while emitting a standard H.264 bitstream compatible with every existing decoder, player, and CDN. On the seven-sequence 1080p UVG benchmark, Kelvin v1.0 achieves a mean BD-VMAF of -27.62% (7 of 7 wins) and BD-VMAF-NEG of -5.18% (6 of 7 wins) relative to baseline libx264 at preset medium. On the 30-sequence MCL-JCV public set (28 unseen by training), the same checkpoint wins on 28 of 30 clips by BD-VMAF; with the two diagnosable failures removed the mean is -27.70% BD-VMAF and -5.37% BD-VMAF-NEG, consistent with UVG to within one percentage point. A central engineering challenge is the non-differentiability of H.264: we describe a hybrid codec proxy that combines a calibrated differentiable rate estimator (Spearman rho = 0.986 vs. real libx264 bits-per-pixel) with a U-Net distortion proxy trained on real encoder outputs. We publish full per-sequence rate-distortion data, a named failure-mode taxonomy on MCL-JCV (rate-floor violation, distribution shift, metric saturation), a five-baseline sanity panel (hqdn3d, unsharp, -tune psnr, -tune ssim, x265 medium), and honest positioning: x265 medium beats Kelvin on every metric on the same corpus. Kelvin is therefore designed for workloads where remaining on H.264 is a constraint rather than a choice.
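
The ±1/255 bound mentioned above can be illustrated with a small clamp. This is only a sketch of the constraint on a single normalised channel value; the function name and the idea of clamping a raw model output are assumptions, and the abstract does not say how Kelvin enforces the bound internally.

```python
def apply_bounded_adjustment(pixel, delta, bound=1.0 / 255.0):
    """Add a learned adjustment to one channel value in [0, 1].

    The adjustment is clamped to [-bound, +bound] (one 8-bit level by
    default), and the result is clipped back into the valid range.
    """
    clamped = max(-bound, min(bound, delta))
    return max(0.0, min(1.0, pixel + clamped))


large_push = apply_bounded_adjustment(0.5, 0.2)     # limited to +1/255
small_push = apply_bounded_adjustment(0.5, 0.001)   # passes through unchanged
at_floor = apply_bounded_adjustment(0.0, -1.0)      # cannot go below 0
```

A change this small is at most one quantisation level per channel in 8-bit video, which is consistent with the abstract's framing of the preprocessor as leaving the bitstream fully standard.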

2605.16375 2026-05-19 cs.LG cs.NI 版本更新

M$^2$FedAQI: Multimodal Federated Learning for Air Quality Prediction on Heterogeneous Edge Devices

M$^2$FedAQI: 多模态联邦学习用于异构边缘设备上的空气质量预测

Manjil Nepal, Kimsie Phan, Tamoghna Ojha, Aritra Dutta, M Krishna Siva Prasad

AI总结本文提出M$^2$FedAQI框架，通过多模态融合机制实现异构边缘设备上的空气质量预测，实验表明其在准确率、AUC、F1-score和R²等指标上均优于现有方法，同时降低MAE和RMSE，提升通信安全性和资源利用率。

详情

AI中文摘要

准确的空气质量预测对公共健康、环境监测和工业安全至关重要。然而，现有方法多依赖集中学习范式，导致分布式物联网环境中可扩展性、隐私保护和通信开销等问题。此外，当前基于联邦学习（FL）的解决方案大多使用单模态数据，限制了其捕捉复杂环境模式的能力。为解决这些限制，我们提出M$^2$FedAQI，一种轻量级多模态联邦框架，用于在异构边缘设备上进行去中心化空气质量指数（AQI）预测。所提出的框架通过基于特征调制的融合机制整合视觉和表格模态，实现高效的跨模态交互，同时保持低计算开销。M$^2$FedAQI在PM25Vision和TRAQID两个基准数据集上进行评估，针对分类和回归任务，在集中式和联邦学习设置下进行测试。实验结果表明，M$^2$FedAQI在准确率、AUC、F1-score和R²等指标上均优于现有方法，达到最高11.0%的准确率提升，3.53%的AUC提升，12.2%的F1-score提升和18.0%的R²提升，同时将MAE和RMSE分别降低25.4%和20.4%。此外，在异构边缘设备上的部署显示了在通信开销、内存足迹和计算成本方面的高效资源利用率。为增强通信安全，采用TLS认证机制，确保客户端参与的安全性并保护联邦学习通信通道免受未经授权第三方访问，而无需修改底层联邦学习协议。

英文摘要

Accurate air quality prediction is essential for public health, environmental monitoring, and industrial safety. However, most existing approaches rely on centralized learning paradigms, which introduce challenges related to scalability, privacy preservation, and communication overhead in distributed Internet of Things (IoT) environments. Moreover, current federated learning (FL) based solutions predominantly utilize unimodal data, limiting their capability to capture complex environmental patterns. To address these limitations, we propose M$^2$FedAQI, a lightweight multimodal federated framework for decentralized Air Quality Index (AQI) prediction across heterogeneous edge devices. The proposed framework integrates visual and tabular modalities through a feature modulation based fusion mechanism that enables efficient cross-modal interaction while maintaining low computational overhead. M$^2$FedAQI is evaluated on two benchmark datasets, PM25Vision and TRAQID, for both classification and regression tasks under centralized and federated settings. Experimental results demonstrate that M$^2$FedAQI consistently outperforms existing approaches, achieving improvements of up to 11.0\% in Accuracy, 3.53\% in AUC, 12.2\% in F1-score, and 18.0\% in $R^2$, while reducing MAE and RMSE by up to 25.4\% and 20.4\%, respectively, compared with the strongest baselines. Furthermore, deployment on heterogeneous edge devices demonstrates efficient resource utilization in terms of communication overhead, memory footprint, and computational cost. To enhance communication security, TLS-based authentication is incorporated to ensure secure client participation and protect the FL communication channel from unauthorized third-party access without modifying the underlying FL protocol.

2605.16374 2026-05-19 cs.LG cs.AI 版本更新

Lost or Hidden? A Concept-Level Forgetting in Supervised Continual Learning

丢失或隐藏？监督连续学习中的概念层面遗忘

Katarzyna Filus, Kamil Faber, Roberto Corizzo, Christopher Kanan

AI总结本文提出一种诊断框架，利用稀疏自编码器分析概念层面遗忘，发现遗忘主要源于表征可访问性变化而非信息擦除。

详情

AI中文摘要

持续学习研究模型如何在适应新任务的同时保留先前知识。尽管已有多种方法缓解灾难性遗忘，但该领域仍以性能为导向，缺乏对视觉模型表征空间中遗忘本质的理解。本文提出利用稀疏自编码器定义任务锚定的潜在特征空间，分析任务特定信息在更细粒度下的演变。我们分解遗忘为显性概念删除、可恢复性和解码性。结果显示，大量看似丢失的概念信息在线性假设下可恢复，而随着任务增加，概念解码性下降。总体而言，我们的发现表明，概念层面遗忘主要归因于表征可访问性变化而非完全信息擦除。

英文摘要

Continual learning studies how models can adapt to new tasks while retaining previously acquired knowledge. Although a broad spectrum of methods has been proposed to mitigate catastrophic forgetting, the field remains predominantly performance-driven, with limited insight into what forgetting actually corresponds to within the vision model's representation space. Prior work has primarily analyzed forgetting through task-level performance or coarse measures of representational drift, without disentangling output-level accessibility from changes in finer-grained internal structure. To this end, we propose a diagnostic framework that leverages Sparse Autoencoders (SAEs) to define a task-anchored latent feature space, enabling analysis of how task-specific information evolves at a finer granularity, where individual SAE latents are treated as concept proxies for recurring and relatively disentangled visual patterns in the model's internal computations. Within this framework, we decompose forgetting into apparent concept deletion, recoverability, and decodability. We show that a large portion of seemingly lost concept-level information can often be recovered under linearity assumption, with concept decodability degrading as more tasks are introduced. Overall, our findings suggest that a significant part of concept-level forgetting can be attributed to changes in the representational accessibility rather than complete information erasure.

2605.16373 2026-05-19 cs.CV cs.AI cs.LG 版本更新

Cross-Source Supervision for Bone Infection Segmentation in Dual-Modality PET-CT

跨源监督在双模态PET-CT骨感染分割中的应用

Zonglin Yang, Xiaolei Diao, Jishizhan Chen, Xiaozhuang Man, Wei Kong, Gen Wen, Pengfei Cheng, Daqian Shi

AI总结本文提出一种双模态端到端分割框架，通过早融合多模态表示整合PET代谢信号和CT骨窗解剖信息，解决标注不一致下的骨感染分割问题，采用患者级3D体积评估和交叉验证提高性能。

详情

AI中文摘要

骨感染的早期准确诊断和病灶定位对临床治疗至关重要。PET-CT结合了CT的解剖信息和PET的代谢信息，是诊断骨感染的重要成像模态。然而，由于病灶边界不清晰，且不同专家或自动化系统生成的标注不一致，准确的病灶分割仍具挑战性。本文研究标注不一致条件下的骨感染多模态分割。我们开发了一个双模态端到端分割框架，通过早期融合的多模态表示整合PET代谢信号和CT骨窗解剖信息。为缓解小数据集中切片间相关性导致的性能虚高，本研究弃用传统的二维评估方法，采用严格的患者级3D体积评估和交叉验证。此外，我们不强求单一共识，而是提出一种解耦的双源学习框架，其中并行模型分别在由高灵敏度和高特异性临床意图驱动的独立专家标注上训练。实验结果客观报告了患者级的性能波动（均值+标准差和均值−标准差），证明了多模态PET-CT融合的有效性。交叉评估矩阵定量揭示了模型如何成功内化不同专家的诊断理念，为骨感染分割中的临床AI部署提供了一种稳健且保留多样性的范式。

英文摘要

Early and accurate diagnosis and lesion localization of bone infections are crucial for clinical treatment. PET-CT integrates anatomical information from CT with metabolic information from PET, making it an important imaging modality for diagnosing bone infections. However, accurate lesion segmentation remains challenging due to indistinct lesion boundaries and inconsistencies in annotations generated by different experts or automated systems. In this work, we investigate multimodal segmentation of bone infections under annotation discrepancy. We develop a bimodal end-to-end segmentation framework that integrates PET metabolic signals and CT bone-window anatomy through an early-fusion multimodal representation. To mitigate performance inflation caused by inter-slice correlation in small datasets, this study discards traditional two-dimensional evaluation methods and implements a rigorous patient-level 3D volumetric evaluation and cross-validation. Furthermore, instead of forcing a singular consensus, we propose a decoupled dual-source learning framework where parallel models are trained on independent expert annotations driven by high-sensitivity and high-specificity clinical intents. Experimental results objectively report performance variations at the patient level (Mean + SD and Mean - SD), demonstrating the effectiveness of multimodal PET-CT fusion. The cross-evaluation matrix quantitatively reveals how models successfully internalize distinct expert diagnostic philosophies, providing a robust, diversity-preserving paradigm for clinical AI deployment in bone infection segmentation.

2605.16372 2026-05-19 cs.CV cs.AI cs.LG 版本更新

SwordBench: Evaluating Orthogonality of Steering Image Representations

SwordBench：评估转向图像表示的正交性

Vladimir Zaigrajew, Dawid Pludowski, Hubert Baniecki, Przemyslaw Biecek

AI总结本文提出SwordBench，用于评估视觉模型在多个backbone和概念移除任务中转向表示的正交性，引入了交叉概念鲁棒性和collateral damage等新评估指标，发现线性SVM虽在可分性和正交性上表现更优，但无法实现零collateral damage，且往往落后于稀疏自编码器。

详情

AI中文摘要

在推理时间对模型表示进行干预以校正预测对于AI可解释性和安全性至关重要，但现有评估协议局限于模糊的语言建模任务。为填补这一空白，我们引入SwordBench，一个用于评估视觉模型在多个backbone和概念移除任务中转向表示的基准。除了统一的基准测试套件外，我们还提出了新的评估概念，揭示了概念激活向量正交性对实用转向的二次影响。具体而言，交叉概念鲁棒性衡量在针对替代概念正交化输入上概念检测性能的稳定性，而collateral damage量化在缺乏偏见的输入上转向是否意外影响下游任务的模型性能。我们发现尽管线性支持向量机在分离性和正交性上表现优异，但无法实现零collateral damage，通常落后于稀疏自编码器。在更简单的环境中，标准基线和优化方法均无法实现完美的转向。源代码将很快在GitHub上发布。

英文摘要

Steering or intervening on model representations at inference time to correct predictions is essential for AI interpretability and safety, yet existing evaluation protocols are limited to ambiguous language modeling tasks. To address this gap, we introduce SwordBench, a benchmark for steering image representations of vision models across multiple backbones and concept removal tasks. Beyond a unified benchmarking suite, we propose new evaluation notions that uncover the second-order effects of orthogonalization among concept activation vectors for pragmatic steering. Specifically, cross-concept robustness measures the stability of concept detection performance across inputs orthogonalized against alternative concepts, and collateral damage quantifies whether steering inadvertently affects model performance on a downstream task for inputs lacking the bias. We find that although a linear support vector machine exhibits superior separability and orthogonality, it fails to achieve zero collateral damage, often trailing sparse autoencoders. In simpler regimes, both standard baselines and optimization-based methods fail to achieve perfect steering. The source code will be made available soon on GitHub.

2605.16365 2026-05-19 cs.LG cs.DB 版本更新

Machine Learning-Based Pre-Test Risk Stratification for PCR-Confirmed Chlamydia Using Patient-Reported Data and Urine Biomarkers

基于机器学习的PCR确诊衣原体感染检测前风险分层：利用患者报告数据和尿液生物标志物

Mehrab Mahdian, Marko Lehes, Katrin Krolov, Tamas Pardy

AI总结研究利用机器学习模型，结合患者报告数据和尿液生物标志物，对PCR确诊的衣原体感染进行检测前风险分层，以提升筛查效率。

详情

AI中文摘要

及早识别沙眼衣原体（Chlamydia trachomatis）感染的高风险个体，有助于在注重资源利用的筛查中优化分子检测的使用。本文评估了利用机器学习模型进行检测前风险分层（PTRS）的可行性，模型使用常规可得的非侵入性临床数据训练。我们分析了一个经整理的数据集，包含93份带有PCR参考标签的尿液样本，使用三个特征组：患者报告的病史和症状、标准尿液分析得到的尿液生物标志物，以及二者的组合。评估了五个监督分类器，采用分层五折交叉验证和折外概率估计。性能通过受试者工作特征曲线下面积（AUC）和依赖阈值的指标评估，不确定性通过自助法置信区间量化。仅使用患者报告数据的模型表现出中等判别能力（AUC最高达0.72）。基于尿液生物标志物的模型峰值判别能力略低，但性能更一致，其中集成方法结果最好。组合特征组略微提高了峰值AUC，并减小了模型间的性能差异，表明鲁棒性有所提升。研究结果表明，尿液生物标志物为PTRS提供了可靠的预测信号，与患者报告信息互补，而特征整合增强了鲁棒性。本研究支持将非侵入性、常规可得的信息整合到筛查流程中，包括去中心化或居家PCR场景，以优化检测优先级。

英文摘要

Early identification of individuals at elevated risk of Chlamydia trachomatis infection may enable optimal use of molecular testing in resource-aware screening. We evaluate the feasibility of pre-test risk stratification (PTRS) using machine-learning models trained on routinely available, non-invasive clinical data. A curated dataset of 93 urine samples with PCR reference labels was analyzed using three feature groups: patient-reported history and symptoms, urine biomarkers from standard urinalysis, and their combination. Five supervised classifiers were evaluated using stratified 5-fold cross-validation with out-of-fold probability estimates. Performance was assessed using area under the receiver operating characteristic curve (AUC) and threshold-dependent metrics, with uncertainty quantified via bootstrap confidence intervals. Models using only patient-reported data showed moderate discrimination (AUC up to 0.72). Urine biomarker-based models demonstrated slightly lower peak discrimination but more consistent performance, with ensemble methods yielding the strongest results. Combining feature groups marginally increased the peak AUC and reduced performance variability across models, indicating improved robustness. Findings indicate that urine biomarkers provide a reliable predictive signal for PTRS that is complementary to patient-reported information, while feature integration enhances robustness. This work supports the integration of non-invasive, routinely available information for PTRS into screening workflows, including decentralized or home-based PCR contexts, to optimize testing prioritization.

2605.16363 2026-05-19 cs.LG cs.CY 版本更新

ORACLE: Anticipating Scams from Partial Trajectories in Streaming App Usage

ORACLE：从流式应用使用轨迹中预见诈骗

Wenbo Gao, Songbai Tan, Zhongan Wang, Fei Shen, Gang Xu, Huiping Zhuang, Yunyun Yang, Ming Li, Xiaofeng Zhu

AI总结本文提出ORACLE框架，通过流式应用使用轨迹预测诈骗，利用自适应上下文管理器和自蒸馏方案提升早期欺诈检测性能。

详情

AI中文摘要

智能手机诈骗日益普遍，通常表现为多阶段、跨应用过程，意图逐渐显现。有效的干预需要在意图明确前预见诈骗，这极具挑战性，因为决策必须依赖部分轨迹和时间分布的证据。本文提出ORACLE在线推理框架，首个针对流式应用使用轨迹的早期诈骗预见框架。我们构建了一个现实世界长周期基准，涵盖12种诈骗类型，平均15天，涉及95种应用，交织正常与诈骗行为。为解决碎片化证据，我们引入自适应上下文管理器，动态整合实体中心交互，提升跨时间证据重建能力。为增强对潜在早期信号的敏感度，我们提出一种在线自蒸馏方案，教师模型基于总结的反诈骗反思和线索监督学生模型。实验表明，该方法在真实流式场景中有效提升早期诈骗预见，及时预警并减少误报。

英文摘要

Smartphone scams are increasingly prevalent and typically manifest as multi-stage, cross-application processes with gradually emerging intent. Effective intervention thus requires anticipating scams before the intent becomes explicit. This is inherently challenging, as decisions must rely on partial trajectories with temporally distributed evidence. In this paper, we propose ORACLE (Online Reasoning for Anticipating Cross-temporal Latent thrEats), the first agentic framework for early scam anticipation from streaming app-usage trajectories. To support this setting, we curate a real-world long-horizon benchmark of streaming app-usage trajectories, covering 12 scam types, spanning extended periods (15 days on average), involving diverse applications (95 apps), and interleaving normal and scam behaviors. To address fragmented evidence, we introduce a self-evolving context manager that adaptively consolidates entity-centric interactions over time, enabling more effective reconstruction of cross-temporal evidence from partial observations. To enhance sensitivity to latent early-stage signals, we propose an on-policy self-distillation scheme in which a teacher model, conditioned on summarized anti-scam reflections and clues by skills, supervises a student model without access to such reflections. This scheme thereby distills evidence-informed knowledge and improves recognition of emerging fraud patterns from partial trajectories. Experiments show that ORACLE consistently improves early scam anticipation, yielding timely warnings while reducing false alerts in realistic streaming scenarios.

2605.16361 2026-05-19 cs.LG cs.AI stat.ML 版本更新

TailedTS: Benchmark Dataset for Heavy-Tailed Time Series Prediction and Periodicity Quantification

TailedTS：用于重尾时间序列预测和周期性量化的大规模基准数据集

Xinyu Chen, HanQin Cai, Lijun Ding, Jinhua Zhao

AI总结 TailedTS数据集用于测试在重尾、零膨胀和非高斯条件下时间序列预测模型的鲁棒性，通过稀疏自回归框架揭示高频页面的周期性较弱，同时提供非高斯损失函数的标准化预测基准。

详情

AI中文摘要

我们介绍TailedTS，一个基于2024年全年维基百科每小时页面浏览量观测数据的大规模基准数据集，专门用于测试时间序列预测模型在重尾、零膨胀和非高斯条件下的性能。该数据集包含约246.9亿个数据点，每月覆盖约300万个不同的维基百科页面，以高效的Apache Parquet格式存储。维基百科流量服从明显的幂律分布，约5%的页面贡献了超过70%的总浏览量，为检验模型在极端波动下的鲁棒性提供了一个自然而严格的测试环境，而这类极端波动在M4、M5和UCI电力等现有基准中缺失或代表性不足。TailedTS支持多项研究任务。首先，我们引入了一个基于带稀疏性和非负性约束的稀疏自回归的周期性量化框架，揭示高浏览量页面的周期性结构显著弱于低浏览量页面，这对大型数字平台的服务器分配和流量预测有直接意义。其次，我们提供了在一系列非高斯损失函数下评估的标准化预测基准，包括ℓ1范数、Huber、分位数和ℓp范数损失，表明基于高斯假设的标准估计器在高流量页面类别上性能显著下降，而鲁棒替代方案在所有流量规模上均带来一致的提升。TailedTS可在https://doi.org/10.5281/zenodo.17070469公开获取。

英文摘要

We present TailedTS, a large-scale benchmark dataset derived from Wikipedia hourly page view observations throughout 2024, specifically designed to test time series forecasting models under heavy-tailed, zero-inflated, and non-Gaussian conditions. The dataset comprises approximately 24.69 billion data points spanning roughly 3 million unique Wikipedia pages per month, stored in high-efficiency Apache Parquet format. Wikipedia traffic follows a pronounced power-law distribution where roughly 5% of pages account for over 70% of total page views, creating a natural and rigorous testbed for model robustness against extreme volatility that are absent from or underrepresented in existing benchmarks such as M4, M5, and UCI electricity datasets. TailedTS enables several research tasks. First, we introduce a periodicity quantification framework based on sparse autoregression with sparsity and non-negativity constraints, revealing that frequently-viewed pages exhibit significantly weaker periodic structure than their less-viewed counterparts, showing direct implications for server allocation and traffic forecasting on large digital platforms. Second, we provide standardized prediction benchmarks evaluated under a suite of non-Gaussian loss functions, including $\ell_1$-norm, Huber, quantile, and $\ell_p$-norm losses, demonstrating that standard Gaussian-based estimators degrade substantially on high-volume page categories, while robust alternatives provide consistent gains across all traffic scales. TailedTS is publicly available at https://doi.org/10.5281/zenodo.17070469.
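
The loss families named in the abstract have standard textbook definitions, sketched below in plain Python for a single residual. The default values of `delta`, `q`, and `p` are illustrative assumptions; the abstract does not state which settings the benchmark uses.

```python
def l1_loss(y, pred):
    """Absolute error: grows linearly, so extreme spikes dominate less than under squared error."""
    return abs(y - pred)


def huber_loss(y, pred, delta=1.0):
    """Quadratic for small residuals, linear beyond delta."""
    r = abs(y - pred)
    return 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)


def quantile_loss(y, pred, q=0.5):
    """Pinball loss: under-prediction costs q per unit, over-prediction costs (1 - q)."""
    r = y - pred
    return q * r if r >= 0 else (q - 1.0) * r


def lp_loss(y, pred, p=1.5):
    """General l_p penalty on the residual."""
    return abs(y - pred) ** p
```

With `q=0.9`, for example, missing a traffic spike by two units costs 1.8 while overshooting by two units costs about 0.2, which is one way such a loss can be tilted toward heavy upper tails.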

2605.16360 2026-05-19 cs.LG cs.AI 版本更新

LARGER: 词汇锚定的仓库图探索与检索

Yuntong Hu, Tongli Su, Liang Zhao, Bowen Zhu, Hasibul Haque

AI总结 LARGER通过词汇锚定的结构化定位方法提升代码仓库文件定位精度，并在测试生成和代码库理解任务上带来性能提升。

详情

AI中文摘要

仓库级编码代理必须首先定位与任务相关的文件和符号；这一阶段的失败会波及下游目标，从补丁生成到测试编写和代码库问答。现有代理主要通过词汇搜索在仓库中导航，常常遗漏导入、调用链、类型层次和代码-测试链接等结构关系。基于图的检索可以恢复这类依赖，但现有方法往往需要单独的图工具或遍历阶段，割裂了代理的交互循环。我们将仓库上下文定位形式化为词汇锚定的结构化定位（Lexically Anchored Structural Localization），其成功取决于把词汇匹配转化为高精度的结构入口点，并在代理现有的搜索循环中呈现最有用的、经置信度过滤的局部邻域。我们提出LARGER（Lexically Anchored Repository Graph Exploration and Retrieval，词汇锚定的仓库图探索与检索），一种词汇锚定的主动集检索框架：从词汇匹配出发，将其对齐到图锚点，并在代理现有的搜索循环中执行经置信度过滤的局部扩展。LARGER可直接集成到现有的CLI编码代理中，无需外部图数据库或专用图接口。在涵盖定位、测试生成和代码库理解的四个基准上，相对于最强基线，LARGER在调优超参数下将LocBench的文件级Acc@5提升13.9点，在固定超参数下仍提升11.8点，并在MuLocBench、SWE-Atlas测试编写和SWE-Atlas代码库问答上带来一致的提升。

英文摘要

Repository-level coding agents must first localize the files and symbols relevant to a task; failures at this stage can cascade across downstream objectives ranging from patch generation to test writing and codebase question answering. Existing agents navigate repositories primarily through lexical search, often missing structural relations such as imports, call chains, type hierarchies, and code-test links. Graph-based retrieval can recover such dependencies, but existing approaches often require separate graph tools or traversal stages that fragment the agent's interaction loop. We formalize repository context localization as Lexically Anchored Structural Localization, where success depends on turning lexical matches into high-precision structural entry points and exposing the most useful confidence-filtered local neighborhoods within the agent's existing search loop. We introduce LARGER (Lexically Anchored Repository Graph Exploration and Retrieval), a lexically anchored active-set retrieval framework that starts from lexical matches, aligns them to graph anchors, and performs confidence-filtered local expansion within the agent's existing search loop. LARGER integrates directly into existing CLI coding agents without requiring external graph databases or specialized graph interfaces. Across four benchmarks spanning localization, test generation, and codebase understanding, LARGER improves file-level Acc@5 on LocBench by +13.9 points with tuned hyperparameters and still gains +11.8 points with fixed hyperparameters over the strongest baseline, while delivering consistent gains on MuLocBench, SWE-Atlas Test Writing, and SWE-Atlas Codebase QA.

2605.16351 2026-05-19 cs.LG cs.AI 版本更新

PIMSM: Physics-Informed Multi-Scale Mamba for Stable Neural Representations under Distribution Shift

PIMSM：基于物理的多尺度Mamba用于在分布偏移下稳定的神经表示

Sangyoon Bae, Shinjae Yoo, Jiook Cha

AI总结本文提出PIMSM，一种基于物理的多尺度Mamba架构，通过时间尺度对齐提升科学基础模型在分布偏移下的鲁棒性和表示稳定性，实验证明其在fMRI和气象预测中的有效性。

Comments 9 pages, 2 figures

详情

AI中文摘要

LoopQ: 递归变换器的量化

Rui Fang, Hsi-Wen Chen, Ming-Syan Chen

AI总结本文提出LoopQ框架，针对递归变换器的量化挑战，通过激活缩放、选择性变换等方法提升模型精度与效率。

2605.16342 2026-05-19 cs.LG cs.AI cs.CL 版本更新

DACA-GRPO: Denoising-Aware Credit Assignment for Reinforcement Learning in Diffusion Language Models

DACA-GRPO：去噪感知的信用分配用于扩散语言模型中的强化学习

Amin Karimi Monsefi, Dominic Culver, Nikhil Bhendawade, Lokesh Boominathan, Manuel R. Ciosici, Yizhe Zhang, Irina Belousova

AI总结本文提出DACA-GRPO，通过引入去噪进度评分和分层掩码似然，改进扩散语言模型中强化学习的信用分配，提升数学推理、代码生成等任务性能。

详情

AI中文摘要

扩散大语言模型是自回归模型的有力替代品，但现有强化学习方法将所有去噪步骤视为同等重要，并依赖于有偏、高方差的似然估计。我们识别出两个根本性弱点：去噪轨迹中缺乏时间信用分配，以及用于策略优化的均场似然估计存在系统偏差。为了解决这些问题，我们提出了Denoising-Aware Credit Assignment for GRPO（DACA-GRPO），一种轻量级、即插即用的增强方法，适用于任何GRPO风格的训练器。DACA-GRPO引入了两个互补机制：去噪进度评分，从中间预测中提取每token的重要性权重，无需额外前向成本；分层掩码似然，将token位置分为层次，使每个token在大部分序列作为上下文的情况下进行预测，从而减少均场偏差。在三种GRPO基础方法上应用DACA-GRPO，使其在七个基准测试中取得一致提升，涵盖数学推理、代码生成、约束满足和受约束生成等任务，在数学推理中提升达5.6个百分点，在代码生成中提升7.4个百分点，在约束满足中提升36.3个百分点，在JSON schema符合性中提升5.9个百分点。

英文摘要

Diffusion large language models are a compelling alternative to autoregressive models, yet existing RL methods for diffusion treat all denoising steps as equally important and rely on biased, high-variance likelihood estimates. We identify two fundamental weaknesses: the absence of temporal credit assignment across the denoising trajectory, and the systematic bias of mean-field likelihood estimates used for policy optimization. To address these, we propose Denoising-Aware Credit Assignment for GRPO (DACA-GRPO), a lightweight, plug-and-play enhancement for any GRPO-style trainer. DACA-GRPO introduces two complementary mechanisms: Denoising Progress Scores, which extract per-token importance weights from intermediate predictions at no additional forward cost, and Stratified Masking Likelihood, which partitions token positions into strata so that each token is predicted with most of the sequence as context, reducing the mean-field bias. Applied on top of three GRPO base methods, DACA-GRPO achieves consistent improvements across seven benchmarks spanning mathematical reasoning, code generation, constraint satisfaction, and constrained generation, with gains of up to 5.6pp on math reasoning, 7.4pp on code generation, 36.3pp on constraint satisfaction, and 5.9pp on JSON schema adherence.
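
The idea behind Stratified Masking Likelihood, as far as the abstract describes it, is that positions are split into strata and each stratum is predicted while the rest of the sequence stays visible. The Python sketch below illustrates only that partitioning step. The round-robin assignment and the function name are assumptions made for illustration, not the authors' scheme, and no model is involved.

```python
def stratified_masks(seq_len, num_strata):
    """Split positions 0..seq_len-1 into strata and return (masked, context) pairs.

    Each stratum is masked in turn while every other position remains
    available as context.
    """
    strata = [list(range(s, seq_len, num_strata)) for s in range(num_strata)]
    passes = []
    for masked in strata:
        masked_set = set(masked)
        context = [i for i in range(seq_len) if i not in masked_set]
        passes.append((masked, context))
    return passes


# With 8 tokens and 4 strata, each pass hides 2 tokens and keeps 6 visible.
passes = stratified_masks(seq_len=8, num_strata=4)
```

Every position is masked exactly once across the passes, and each token is scored with 6 of the 8 positions as context rather than with all positions masked independently.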

2605.16341 2026-05-19 cs.LG 版本更新

Orth-Dion: Eliminating Geometric Mismatch in Distributed Low-Rank Spectral Optimization

Orth-Dion：消除分布式低秩谱优化中的几何不匹配

Tatsuhiro Nakamori, Laura Gomezjurado Gonzalez, Ganesh Talluri, Ansh Tiwari, Hideyuki Kawashima, Ioannis Mitliagkas, Guillaume Rabusseau, Hiroki Naganuma

AI总结 Orth-Dion通过替换列归一化为右因子的QR正交化，解决分布式低秩谱优化中的几何不匹配问题，实现与Dion相同通信成本下的最优收敛率。

Comments 24 pages, 3 figures, 11 tables

详情

AI中文摘要

低秩梯度压缩通过用秩-r因子表示更新来减少分布式训练中的通信开销。Dion是一种近似Muon（一种正交化动量的谱优化器）的方法，通过一次幂迭代后进行列归一化（将右因子的每一列重新缩放为单位长度）。这使其兼容完全分片数据并行训练，但收敛速度比全秩谱方法更慢。我们证明这种差距是几何性的：列归一化并未产生Muon隐式目标的秩-r极因子，因此所得到的方向违反了低秩谱几何的对偶范数约束，即使梯度的低秩近似准确，收敛率仍多了一个√r因子。同样的不匹配也影响了平滑项和误差反馈递归的分析，从而对经验性能产生连锁影响。我们提出Orth-Dion，其将列归一化替换为右因子的QR正交化。在非欧几里得平滑性下，设L_r为沿秩-r方向的曲率常数，Orth-Dion获得收敛率O(√(L_r/T))，在与Dion相同的每步通信成本下达到与精确谱方法相同的性能。证明通过自洽的固定点论证消除了先前误差反馈分析中常见的有界漂移假设，并使用时间平均收缩，仅要求误差序列平均收缩而非每一步都收缩。在大规模语言模型预训练实验中验证了预测的√r缩放，并显示Orth-Dion在Dion的通信成本下关闭了与Muon的收敛差距。

英文摘要

Low-rank gradient compression reduces communication in distributed training by representing updates with rank-$r$ factors. Dion is a recent method that approximates Muon, a spectral optimizer that orthogonalizes momentum, using one step of power iteration followed by column normalization (rescaling each column of the right factor to unit length). This makes it compatible with fully sharded data parallel training, but it converges more slowly than full-rank spectral methods. We show that this gap is geometric: column normalization does not yield the rank-$r$ polar factor that Muon implicitly targets, so the resulting direction violates the dual-norm constraint of the low-rank spectral geometry, and the rate picks up an extra factor of $\sqrt{r}$ even though the low-rank approximation of the gradient itself is accurate. The same mismatch enters the smoothness term and the error-feedback recursion in the analysis, which has a knock-on effect on empirical performance. We propose Orth-Dion, which replaces column normalization with QR orthogonalization of the right factor. Under non-Euclidean smoothness, with $L_r$ the curvature constant along rank-$r$ directions, Orth-Dion attains rate $O(\sqrt{L_r/T})$, matching exact spectral methods at the same per-step communication cost as Dion. The proof removes the bounded-drift assumption common in prior error-feedback analyses via a self-consistent fixed-point argument, and uses a time-averaged contraction that only requires the error sequence to contract on average rather than at every step. Experiments on large-scale language model pre-training validate the predicted $\sqrt{r}$ scaling and show that Orth-Dion closes the convergence gap to Muon at Dion's communication cost.
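
The difference between column normalization and orthogonalization of the right factor can be seen on a tiny example. The plain-Python sketch below uses modified Gram-Schmidt as a stand-in for a QR routine, and the 3x2 matrix is a hypothetical full-rank right factor chosen for illustration; it is not the authors' implementation.

```python
import math


def column_normalize(cols):
    """Rescale each column to unit Euclidean length."""
    return [[x / math.sqrt(sum(v * v for v in c)) for x in c] for c in cols]


def gram_schmidt(cols):
    """Orthonormalize the columns (the Q factor of a thin QR, up to sign)."""
    basis = []
    for c in cols:
        w = list(c)
        for q in basis:
            proj = sum(a * b for a, b in zip(w, q))
            w = [a - proj * b for a, b in zip(w, q)]
        norm = math.sqrt(sum(a * a for a in w))
        basis.append([a / norm for a in w])
    return basis


def gram(cols):
    """Matrix of inner products between columns; identity means orthonormal."""
    return [[sum(a * b for a, b in zip(u, v)) for v in cols] for u in cols]


# Two correlated columns in R^3, stored column by column.
right_factor = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
normalized = column_normalize(right_factor)
orthonormal = gram_schmidt(right_factor)
```

Here `gram(normalized)` has 0.5 off the diagonal, so the unit-length columns still overlap, whereas `gram(orthonormal)` is the identity up to rounding.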

2605.16339 2026-05-19 cs.LG 版本更新

决策能力阈值在自我对战强化学习中的崩溃中起作用

Arahan Kujur

AI总结研究揭示决策能力阈值决定自我对战强化学习代理在不对称规则扰动下的崩溃，通过消除所有正可达条件决策导致快速收敛到确定性利用吸引子，而保留单个正可达条件决策可防止崩溃。

Comments 18 pages, 7 figures

2605.16312 2026-05-19 cs.LG cs.AI 版本更新

When Actions Disappear: Adversarial Action Removal in Self-Play Reinforcement Learning

当动作消失时：自我对战强化学习中的对抗性动作移除

Arahan Kujur

AI总结研究了自我对战强化学习中的对抗性动作遮蔽，发现学习的遮蔽比随机遮蔽和学习扰动基线更具破坏性，揭示了动作可用性作为自我对战RL中的新鲁棒性表面。

Comments 17 pages, 2 figures, 18 tables

2605.16311 2026-05-19 cs.LG cs.DC 版本更新

SignMuon: Communication-Efficient Distributed Muon Optimization

SignMuon：通信高效的分布式Muon优化

Neel Mishra, Kushagara Trivedi, Pawan Kumar

AI总结本文提出Sign-Muon优化器，结合signSGD的符号聚合与Muon的极步框架，通过矩阵感知的1位优化方法，在减少通信开销的同时提升训练效率，实验表明其在多个数据集上均取得最佳性能。

Comments 40 pages, 9 figures

详情

AI中文摘要

大规模神经网络的分布式训练受制于全精度梯度通信，以及忽略权重张量矩阵结构的逐坐标优化器。我们提出Sign-Muon，一种1位、矩阵感知的优化器，将signSGD的多数投票符号聚合与Muon的极步（polar-step）框架相结合。每个工作节点通过牛顿-舒尔茨（Newton-Schulz）迭代取其动量的极因子，形成Muon式方向，仅传输逐元素符号，并通过多数投票聚合；可选的本地极步在不增加通信开销的情况下进一步强制正交性。在谱范数光滑性和有界方差随机梯度的假设下，谱范数归一化的符号步对一种基于ℓ1的平稳性度量给出O(1/√T)的非凸收敛速率。在单峰对称噪声下，跨M个工作节点的多数投票将随机项缩小为原来的1/√M，与signSGD一致。在α-β模型中，分布式Sign-Muon每次迭代只需一次整数sum-allreduce；所有正交化均在本地完成，相比float32带宽减少32倍（相比int8为4倍）。在330个CIFAR-10/ResNet-50配置中，Sign-Muon取得最佳验证准确率（92.15%）；其4-GPU多数投票变体在有效批量相同的条件下以少37%的训练时间达到92.02%。在nanoGPT上，Sign-Muon比其他基于符号的基线取得更低的困惑度和更好的anytime性能，并在多达16个GPU时表现出良好的弱扩展性。

英文摘要

Distributed training of large neural networks is bottlenecked by full-precision gradient communication and by coordinatewise optimizers that ignore the matrix structure of weight tensors. We propose Sign-Muon, a 1-bit, matrix-aware optimizer that combines majority-vote sign aggregation from signSGD with the polar-step framework of Muon. Each worker forms a Muon-style direction by taking the polar factor of its momentum via a Newton--Schulz iteration, transmits only the entrywise signs, and aggregates by majority vote; an optional local polar step further enforces orthogonality at no extra communication cost. Under spectral-norm smoothness and bounded-variance stochastic gradients, the spectral-norm normalized sign step yields an $\mathcal{O}(1/\sqrt{T})$ nonconvex rate for an $\ell_1$-based stationarity measure. With unimodal symmetric noise, majority vote across $M$ workers cuts the stochastic term by $1/\sqrt{M}$, matching signSGD. In the $\alpha$-$\beta$ model, distributed Sign-Muon needs only one integer sum-allreduce per iteration; all orthogonalization is local, giving a $32\times$ bandwidth reduction over float32 ($4\times$ for int8). Across 330 CIFAR-10/ResNet-50 configurations Sign-Muon attains the best validation accuracy (92.15\%); its 4-GPU majority-vote variant reaches 92.02\% with 37\% less training time at matched effective batch. On nanoGPT, Sign-Muon achieves lower perplexity and better anytime performance than other sign-based baselines, with favorable weak-scaling up to 16 GPUs.
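
The communication step described above can be sketched in a few lines of plain Python. This is a minimal illustration of majority-vote sign aggregation only: the Newton-Schulz polar step is omitted, directions are flattened to lists, the worker values are made up, and treating a tied vote as 0 is an assumption rather than something the abstract specifies.

```python
def sign(x):
    """Return -1, 0, or 1."""
    return (x > 0) - (x < 0)


def majority_vote(worker_directions):
    """Each worker contributes entrywise signs; the aggregate is the sign of their integer sum."""
    votes = [[sign(x) for x in d] for d in worker_directions]
    return [sign(sum(col)) for col in zip(*votes)]


# Three hypothetical workers, each holding a 3-entry update direction.
workers = [
    [0.3, -1.2, 0.1],
    [0.7, 0.4, -0.2],
    [-0.1, -0.9, -0.5],
]
aggregate = majority_vote(workers)

# Sending 1 bit per entry instead of a 32-bit float gives the 32x figure quoted above.
bits_float32, bits_sign = 32, 1
reduction = bits_float32 // bits_sign
```

Because the vote is the sign of a sum of integers, it fits the single integer sum-allreduce mentioned in the abstract.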

2605.16290 2026-05-19 cs.CY cs.AI cs.LG 版本更新

MCQ Difficulty Prediction via Modeling Learner Heterogeneity Using Data-Driven Cognitive Profiling

通过数据驱动的认知画像建模学习者异质性以预测多项选择题难度

Dhriti Krishnan, Jaromir Savelka

AI总结本文提出基于学习者异质性的数据驱动认知画像框架，通过隐类分析识别行为画像并模拟响应分布，结合主题上下文和岭回归模型预测IRT难度参数，提升难度预测精度。

详情

AI中文摘要

预测多项选择题（MCQ）难度对有效评估十分重要，但现有方法通常假设学生能力服从单峰分布，忽视了学生误解的异质性。本文提出一种基于角色画像的框架，用数据驱动的认知画像取代理论能力采样。我们利用EEDI数据集中的学生交互数据，通过潜在类别分析（LCA）识别行为画像，然后以各画像为条件，让大语言模型（LLM）模拟其作答分布。这些信号与主题上下文聚合后，输入岭回归模型以预测项目反应理论（IRT）难度参数。在五折交叉验证下，本文方法相比近期基线有所提升（MSE：0.367降至0.274；R2：0.525升至0.686）。所发现的画像具有可解释性，能够揭示题目为何困难，并有望应用于诊断性评估设计。

英文摘要

Predicting the difficulty of multiple-choice questions (MCQs) is important for effective assessment, yet current methods typically assume a unimodal student ability distribution, overlooking the heterogeneous nature of student misconceptions. We propose a persona-driven framework that replaces theoretical ability sampling with data-driven cognitive profiling. Using student interactions from the EEDI dataset, we identify behavioral personas via latent class analysis (LCA), then condition a large language model (LLM) to simulate response distributions for each persona. These signals are aggregated with topic context and fed into a Ridge Regression model to predict the item response theory (IRT) difficulty parameter. With five-fold cross-validation, our method improves over a recent baseline (MSE: 0.367 to 0.274; R2: 0.525 to 0.686). The discovered personas are interpretable and offer insights into why items are difficult, with potential applications to diagnostic assessment design.

2605.16268 2026-05-19 cs.HC cs.AI cs.LG 版本更新

Helping Customers in Distress: An LLM-powered Agent that Converses, Probes, and Routes

帮助陷入困境的客户：一个基于LLM的代理，能够对话、探测和分流

Alankar Atreya, Stefan Sylvius Wanger, Devesh Batra, Robert Hankache, Cristovao Iglesias, Patrick Sinclair, Giulio Pelosio, Michael McMillan, Greig A. Cowan, Raad Khraishi

AI总结本文提出一个基于LLM的AI分流代理，通过多轮对话和提问提高客户问题分类准确性，提升银行客户服务效率。

详情

AI中文摘要

比较神经求解器与启发式求解器在组合优化中的平均效率阈值

Sohaib Afifi

AI总结本文研究了神经求解器在组合优化中的能耗问题，提出平均效率阈值框架，通过实验显示神经求解器在部署量超过阈值后能耗低于启发式方法，提供了新的评估方法。

Comments 13 pages, 3 figures, 1 table. Code and benchmark pipeline at https://github.com/sohaibafifi/aet. v1: initial release with CVRP n=50

详情

AI中文摘要

神经组合优化求解器常被批评其能耗高于CPU启发式方法，因其在GPU上训练的成本高。本文探讨了从

英文摘要

A common critique of neural combinatorial-optimization solvers is that they are less energy-efficient than CPU metaheuristics, given the operational energy cost of training them on GPUs. This paper examines the inferential step from "training is expensive" to "neural solvers are net-inefficient", which is where the critique actually goes wrong. Training the network costs a large fixed amount of GPU energy; running the metaheuristic costs a small amount of CPU energy on every instance, repeated as long as the solver is deployed. The two are not commensurable until a deployment volume is fixed. We define the Amortized Efficiency Threshold (AET) as the deployment volume above which a neural solver breaks even with a heuristic baseline in total energy or carbon, under an explicit constraint on solution quality. We show that the cumulative-energy ratio between the two solvers tends to a constant strictly below one whenever the network wins per instance, and that this limit does not depend on how the training cost was measured. An embodied-carbon term amortizes hardware fabrication symmetrically on both sides. We instantiate the framework on the CVRP environment at n=50 customers with the attention-based autoregressive solver of Kool et al. (2019), trained for 100 epochs on 20,000 instances over five random seeds, and HGS via PyVRP as the heuristic baseline. The measured operational crossover sits near 4.56e3 deployed instances at the median of a six-point baseline-budget sweep; the per-instance neural-to-heuristic ratio is 2.29e-3. The contribution is the framework, the open instrumentation, and the end-to-end measurement protocol. Code and benchmark pipeline are available at https://github.com/sohaibafifi/aet.
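
The break-even argument reduces to simple arithmetic, sketched below in Python. The energy figures are hypothetical numbers chosen to keep the arithmetic readable, not measurements from the paper, and the sketch leaves out the solution-quality constraint and the embodied-carbon term that the full framework includes.

```python
import math


def amortized_efficiency_threshold(train_energy, neural_per_instance, heuristic_per_instance):
    """Smallest deployment volume N with train_energy + N * neural <= N * heuristic.

    Returns None when the neural solver does not win per instance,
    since the training cost can then never be amortized.
    """
    saving = heuristic_per_instance - neural_per_instance
    if saving <= 0:
        return None
    return math.ceil(train_energy / saving)


def cumulative_ratio(n, train_energy, neural_per_instance, heuristic_per_instance):
    """Total neural energy divided by total heuristic energy after n deployed instances."""
    return (train_energy + n * neural_per_instance) / (n * heuristic_per_instance)


# Hypothetical energies in joules.
E_TRAIN, E_NEURAL, E_HEUR = 1000.0, 0.5, 2.5
n_star = amortized_efficiency_threshold(E_TRAIN, E_NEURAL, E_HEUR)
```

With these numbers the crossover is at 500 instances, where the cumulative ratio equals 1. As the volume grows the ratio approaches `E_NEURAL / E_HEUR = 0.2`, a limit that does not involve `E_TRAIN`, which mirrors the abstract's claim that the limit is independent of how training cost was measured.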

2605.14068 2026-05-19 cs.CV cs.LG 版本更新

CurveBench: A Benchmark for Exact Topological Reasoning over Nested Jordan Curves

CurveBench：一种用于嵌套乔丹曲线精确拓扑推理的基准

Amirreza Mohseni, Mona Mohammadi, Morteza Saghafian, Naser Talebizadeh Sardari

AI总结 CurveBench是一个用于从视觉输入中进行层次拓扑推理的基准，包含756张非相交的乔丹曲线图像，通过结构预测任务恢复由曲线诱导的根树结构。

详情

AI中文摘要

我们介绍了CurveBench，一种用于从视觉输入中进行层次拓扑推理的基准。CurveBench包含756张图像，这些图像中的乔丹曲线在易、多边形、地形启发、迷宫状和密集计数配置下成对不相交。每张图像都标注了一个根树，编码平面区域之间的包含关系。我们将任务定义为结构预测：给定一张图像，模型必须恢复由曲线诱导的完整根树。尽管该任务在视觉上简单，但最强的评估模型Gemini 3.1 Pro在CurveBench-Easy上仅达到71.1%的树生成准确率，在CurveBench-Hard上仅为19.1%。我们进一步通过RLVR风格的微调展示了基准的实用性。我们的训练Qwen3-VL-8B模型在CurveBench-Easy上将Qwen-3-VL-8B-Thinking的树生成准确率从2.8%提升到33.3%，超过GPT-5.4和Claude Opus 4.5。剩余的差距，尤其是在CurveBench-Hard上，表明精确的拓扑感知视觉推理仍远未解决。

英文摘要

We introduce CurveBench, a benchmark for hierarchical topological reasoning from visual input. CurveBench consists of 756 images of pairwise non-intersecting Jordan curves across easy, polygonal, topographic-inspired, maze-like, and dense counting configurations. Each image is annotated with a rooted tree encoding the containment relations between planar regions. We formulate the task as structured prediction: given an image, a model must recover the full rooted containment tree induced by the curves. Despite the visual simplicity of the task, the strongest evaluated model, Gemini 3.1 Pro, achieves only 71.1\% tree-generation accuracy on CurveBench-Easy and 19.1\% on CurveBench-Hard. We further demonstrate benchmark utility through RLVR-style fine-tuning of open-weight vision-language models. Our trained Qwen3-VL-8B model improves over Qwen-3-VL-8B-Thinking from 2.8\% to 33.3\% tree-generation accuracy on CurveBench-Easy, exceeding GPT-5.4 and Claude Opus 4.5 under our evaluation protocol. The remaining gap, especially on CurveBench-Hard, shows that exact topology-aware visual reasoning remains far from solved.
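
The target structure can be illustrated with a short Python sketch that turns pairwise containment facts into a rooted tree. The input format, the curve names, and the use of `"root"` for the unbounded outer region are assumptions for illustration; the benchmark's actual annotation format is not described in the abstract.

```python
def containment_tree(containers):
    """Build a parent map from containment sets.

    containers[c] is the set of curves that strictly enclose curve c.
    For pairwise non-intersecting Jordan curves these enclosing curves are
    nested in a chain, so the immediate parent is the one that is itself
    enclosed by the most curves.
    """
    parent = {}
    for curve, outer in containers.items():
        if outer:
            parent[curve] = max(outer, key=lambda o: len(containers[o]))
        else:
            parent[curve] = "root"
    return parent


# Hypothetical image: C inside B inside A, with D drawn separately.
containers = {"A": set(), "B": {"A"}, "C": {"A", "B"}, "D": set()}
parent = containment_tree(containers)
```

In this example `B` hangs under `A`, `C` under `B`, and both `A` and `D` attach directly to the root.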

2605.13161 2026-05-19 cs.CV cs.LG 版本更新

A$_3$B$_2$: Adaptive Asymmetric Adapter for Alleviating Branch Bias in Vision-Language Image Classification with Few-Shot Learning

A₃B₂：一种自适应非对称适配器，用于缓解视觉-语言图像分类中的分支偏差

Yiyun Zhou, Zhonghua Jiang, Wenkang Han, Kunxi Li, Mingjing Xu, Chang Yao, Jingyuan Chen

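AI summary: The paper proposes A₃B₂, an adaptive asymmetric adapter whose Uncertainty-Aware Adapter Dampening (UAAD) mechanism alleviates Branch Bias in few-shot learning. Experiments across 11 datasets show that it outperforms 11 prompt- and adapter-based baselines.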

Comments Accepted by IJCAI 2026

详情

AI中文摘要

高效的迁移学习方法为大规模视觉-语言模型（例如CLIP）提供了强大的少样本迁移能力，但现有适配方法遵循固定微调范式，隐含假设图像和文本分支的重要性是均匀的，这一假设在图像分类中未被系统研究。通过深入分析，我们揭示了视觉-语言图像分类中的分支偏差问题：在分布外设置下，适配图像编码器并不总能提高性能。受此启发，我们提出了A₃B₂，一种自适应非对称适配器，用于缓解少样本学习中的分支偏差。A₃B₂引入了不确定性感知适配器阻尼（UAAD），在预测不确定性较高时自动抑制图像分支适配，实现软且数据驱动的控制，无需手动干预。在架构上，A₃B₂采用了一种轻量级非对称设计，受混合专家启发，结合负载平衡正则化。在三个少样本图像分类任务上，对11个数据集的广泛实验表明，A₃B₂在多个数据集上一致优于11个竞争的提示和适配基线方法。

英文摘要

Efficient transfer learning methods for large-scale vision-language models (e.g., CLIP) enable strong few-shot transfer, yet existing adaptation methods follow a fixed fine-tuning paradigm that implicitly assumes a uniform importance of the image and text branches, which has not been systematically studied in image classification. Through extensive analysis, we reveal a Branch Bias issue in vision-language image classification: adapting the image encoder does not always improve performance under out-of-distribution settings. Motivated by this observation, we propose A$_3$B$_2$, an Adaptive Asymmetric Adapter that alleviates Branch Bias in few-shot learning. A$_3$B$_2$ introduces Uncertainty-Aware Adapter Dampening (UAAD), which automatically suppresses image-branch adaptation when prediction uncertainty is high, enabling soft and data-driven control without manual intervention. Architecturally, A$_3$B$_2$ adopts a lightweight asymmetric design inspired by mixture-of-experts with Load Balancing Regularization. Extensive experiments on three few-shot image classification tasks across 11 datasets demonstrate that A$_3$B$_2$ consistently outperforms 11 competitive prompt- and adapter-based baselines.

2605.09730 2026-05-19 cs.LG cs.SE 版本更新

RubricRefine: Improving Tool-Use Agent Reliability with Training-Free Pre-Execution Refinement

RubricRefine: 通过无训练预执行细化提升工具使用代理的可靠性

Will LeVine, Brendan Evers, Sam Saltwick, Abhay Venkatesh

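AI summary: RubricRefine verifies semantic contracts before execution to make tool-use agents more reliable. With zero execution attempts it reaches 0.86 on M3ToolEval, averaged across seven models, with up to 2.6× lower latency than prior inference-time baselines.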

详情

AI中文摘要

迭代自我细化是一种流行的推理时可靠性技术，但其在代码模式工具使用中的有效性严重依赖反馈信号的结构：无结构的批评在不同模型间的帮助并不一致，即使使用真实执行反馈进行修订也只能小幅提升（0.75 vs. 0.65基线）。主导的失败是跨工具合同违规（错误输出形状、错误工具路由、断裂的参数来源），这些失败会运行至完成而不引发错误，使运行时反馈不足。我们引入RubricRefine，一种无训练的预执行语义合同验证方法，生成任务和注册表特定的评分标准，对候选代码进行显式合同检查评分，并在任何执行发生前迭代修复失败。RubricRefine在零执行尝试的情况下，在M3ToolEval上达到0.86（七个模型的平均值），优于现有推理时基线，且延迟最多降低2.6倍。在以单步任务为主的API-Bank上性能持平，这与方法对跨工具合同结构的依赖一致。评分类别消融和校准分析进一步阐明了该方法何时及为何有效。

英文摘要

Iterative self-refinement is a popular inference-time reliability technique, but its effectiveness in code-mode tool use depends heavily on the structure of the feedback signal: unstructured critique helps inconsistently across models, and even revision with real execution feedback improves only modestly ($0.75$ vs. $0.65$ baseline). The dominant failures are inter-tool contract violations (wrong output shape, incorrect tool routing, broken argument provenance) that run to completion without raising errors, making runtime feedback insufficient. We introduce RubricRefine, a training-free method for pre-execution semantic contract verification that generates task- and registry-specific rubrics, scores candidate code against explicit contract checks, and iteratively repairs failures before any execution occurs. RubricRefine reaches $0.86$, averaged across seven models, on M3ToolEval with zero execution attempts, improving over prior inference-time baselines with up to $2.6\times$ lower latency. Performance remains flat on the predominantly single-step API-Bank, consistent with the method's reliance on inter-tool contract structure. A rubric-category ablation and calibration analysis further characterize when and why the method works.

2605.08475 2026-05-19 cs.LG cs.AI cs.NA math.NA math.OC 版本更新

Transformers Can Implement Preconditioned Richardson Iteration for In-Context Gaussian Kernel Regression

Transformer 可实现用于上下文高斯核回归的预条件Richardson迭代

Mingsong Yan, Dongyang Li, Charles Kulick, Sui Tang

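AI summary: The paper studies in-context kernel ridge regression and proves that a standard softmax-attention transformer can approximate the Gaussian kernel regression predictor through preconditioned Richardson iteration, exposing a functional decomposition within the transformer architecture.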

详情

AI中文摘要

CoLLM: Continuous Adaptation for SLO-Aware LLM Serving on Shared GPU Clusters

Shaoyuan Huang, Yunfeng Zhao, Na Yan, Tiancheng Zhang, Xiaokai Wang, Xiaofei Wang, Wenyu Wang, Yansha Deng

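AI summary: CoLLM unifies federated parameter-efficient fine-tuning (FL PEFT) and inference on shared replicas so that LLM serving adapts continuously, improving both model quality and efficiency. In the reported experiments it achieves up to 3x higher goodput than state-of-the-art LLM systems.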

详情

AI中文摘要

随着大型语言模型（LLM）在边缘智能中被越来越多地用于驱动领域特定应用和个性化服务，LLM训练后的质量与效率，包括微调和推理，因资源受限而变得至关重要。尽管最近在联邦参数高效微调（FL PEFT）和低延迟推理方面的进展提高了单个任务性能，但微调和推理仍被视为孤立的工作负载，忽略了它们的相互依赖性，导致冗余部署和推理质量提升延迟。为了解决这些限制，我们引入了一个新的共执行框架，并将其实例化为CoLLM，一个将FL PEFT和推理统一在共享边缘副本和模型参数上的系统。CoLLM通过在副本和集群层面解决关键挑战，实现了高效模型参数重用和工作负载平衡，从而联合优化长期模型质量增益和短期推理效率。在多样化的LLM和真实世界跟踪上进行的广泛评估显示，CoLLM在吞吐量上比最先进的LLM系统高出多达3倍，证明了其在边缘智能中无缝LLM训练后处理的有效性。

英文摘要

As Large Language Models (LLMs) are increasingly adopted in edge intelligence to power domain-specific applications and personalized services, the quality and efficiency of the LLM post-training phase, including fine-tuning and inference, have become critical due to constrained resources. Although recent advances in federated parameter-efficient fine-tuning (FL PEFT) and low-latency inference have improved individual task performance, fine-tuning and inference are still handled as isolated workloads, which overlooks their interdependence and results in redundant deployments and delayed improvement in inference quality. To address these limitations, we introduce a new co-execution framework and instantiate it with CoLLM, a system that unifies FL PEFT and inference on shared edge replicas and model parameters. CoLLM addresses key challenges at both replica and cluster levels through: (1) an intra-replica model sharing mechanism that enables real-time model parameter reuse via unmerged inference and shadow adapter strategies; and (2) a two-timescale inter-replica coordination algorithm that adaptively balances fine-tuning and inference workloads to jointly optimize long-term model quality gains and short-term inference efficiency. Extensive evaluation across diverse LLMs and real-world traces shows that CoLLM consistently outperforms state-of-the-art LLM systems, achieving up to 3x higher goodput and demonstrating its effectiveness in enabling seamless LLM post-training for edge intelligence.

2604.02178 2026-05-19 cs.CL cs.AI cs.LG 版本更新

The Expert Strikes Back: Interpreting Mixture-of-Experts Language Models at Expert Level

专家反击：在专家层面解读混合专家语言模型

Jeremy Herbst, Stefan Wermter, Jae Hee Lee

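AI summary: Using k-sparse probing to compare MoE experts with dense FFNs, the study finds that expert neurons are less polysemantic, proposes the expert as the unit of analysis, and shows that experts are fine-grained task specialists rather than broad domain specialists or simple token-level processors.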

Comments 8 pages, 7 Figures. Accepted at ICML 2026. Improved writing, changed author order, updated citations

详情

AI中文摘要

混合专家（MoE）架构已成为扩展大语言模型（LLMs）的主导选择，每个token仅激活部分参数。尽管MoE主要用于计算效率，但其稀疏性是否使其比密集前馈网络（FFN）更容易解释仍存疑问。通过k稀疏探测比较MoE专家与密集FFN，发现专家神经元始终更单语义，随着路由稀疏性增加，差距扩大。这表明稀疏性迫使神经元和整个专家朝单语义方向发展。基于此发现，我们从神经元层面转向专家层面作为更有效的分析单位。通过自动解读数百个专家，验证了这一方法。此分析解决了关于专业化争论：专家既非广领域专家（如生物学）也非简单token处理者。相反，它们作为细粒度任务专家，专门处理语言操作或语义任务（如闭合LaTeX括号）。我们的发现表明，MoE在专家层面具有内在可解释性，为大规模模型可解释性提供了更清晰路径。代码见：https://github.com/jerryy33/MoE_analysis。

英文摘要

Mixture-of-Experts (MoE) architectures have become the dominant choice for scaling Large Language Models (LLMs), activating only a subset of parameters per token. While MoE architectures are primarily adopted for computational efficiency, it remains an open question whether their sparsity makes them inherently easier to interpret than dense feed-forward networks (FFNs). We compare MoE experts and dense FFNs using $k$-sparse probing and find that expert neurons are consistently less polysemantic, with the gap widening as routing becomes sparser. This suggests that sparsity pressures both individual neurons and entire experts toward monosemanticity. Leveraging this finding, we zoom out from the neuron to the expert level as a more effective unit of analysis. We validate this approach by automatically interpreting hundreds of experts. This analysis allows us to resolve the debate on specialization: experts are neither broad domain specialists (e.g., biology) nor simple token-level processors. Instead, they function as fine-grained task experts, specializing in linguistic operations or semantic tasks (e.g., closing brackets in $\LaTeX{}$). Our findings suggest that MoEs are inherently interpretable at the expert level, providing a clearer path toward large-scale model interpretability. Code is available at: https://github.com/jerryy33/MoE_analysis.
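For readers unfamiliar with sparse routing, the sketch below shows one common form of top-k expert selection: a router scores every expert for a token, and only the k best-scoring experts are activated. Renormalizing the softmax over the selected experts is one convention among several and is an assumption here, as is the function name; the abstract does not say which routing variant the studied models use.

```python
import math


def top_k_route(router_logits, k):
    """Return [(expert_index, weight)] for the k highest-scoring experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    peak = max(router_logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Hypothetical router scores for four experts; only two are activated for this token.
routes = top_k_route([0.1, 2.0, -1.0, 2.0], k=2)
```

Smaller `k` means sparser routing, which is the regime in which the abstract reports the widest gap in polysemanticity.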

2603.20421 2026-05-19 cs.CR cs.AR cs.LG cs.NA math.NA 版本更新

Hawkeye: Reproducing GPU-Level Non-Determinism

Hawkeye：重现GPU级非确定性

Erez Badash, Dan Boneh, Ilan Komargodski, Megha Srivastava

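AI summary: Hawkeye reproduces GPU matrix multiplication exactly on a CPU, so ML training and inference workflows can be re-executed without precision loss, avoiding the computation overhead and non-robustness of prior verifiable-ML approaches.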

Comments Accepted to MLSys 2026

详情

AI中文摘要

我们提出了Hawkeye系统，用于分析和重现GPU级别的算术运算。通过我们的框架，任何人都可以在CPU上重新执行机器学习模型训练或推理流程中底层的矩阵乘法运算，而不会有任何精度损失。这与以往的可验证机器学习方法形成鲜明对比，后者要么对原始模型所有者引入显著的计算开销，要么导致非鲁棒性和质量退化。Hawkeye的主要技术贡献是系统性的精心设计测试序列，研究矩阵乘法中舍入方向、亚正常数处理以及非结合性积累的顺序，针对NVIDIA的Tensor Cores。我们在多种NVIDIA GPU架构（Ampere、Hopper和Lovelace）和精度类型（FP16、BFP16、FP8）上测试和评估了我们的框架。在所有测试用例中，Hawkeye都能在CPU上完美重现矩阵乘法，为高效且可信的第三方审计ML模型训练和推理铺平了道路。我们提供了Hawkeye的源代码，网址为https://github.com/badasherez/gpu-simulator。

英文摘要

We present Hawkeye, a system for analyzing and reproducing GPU-level arithmetic operations. Using our framework, anyone can re-execute on a CPU the exact matrix multiplication operations underlying a machine learning model training or inference workflow that was executed on an NVIDIA GPU, without any precision loss. This is in stark contrast to prior approaches to verifiable machine learning, which either introduce significant computation overhead to the original model owner, or suffer from non-robustness and quality degradation. The main technical contribution of Hawkeye is a systematic sequence of carefully crafted tests that study rounding direction, subnormal number handling, and order of (non-associative) accumulation during matrix multiplication on NVIDIA's Tensor Cores. We test and evaluate our framework on multiple NVIDIA GPU architectures (Ampere, Hopper, and Lovelace) and precision types (FP16, BFP16, FP8). In all test cases, Hawkeye enables perfect reproduction of matrix multiplication on a CPU, paving the way for efficient and trustworthy third-party auditing of ML model training and inference. We provide source code for Hawkeye at https://github.com/badasherez/gpu-simulator.
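Why accumulation order matters can be seen with a toy FP16 accumulator. The sketch below rounds to IEEE 754 binary16 after every addition using Python's `struct` module; it is only an illustration of non-associativity and does not model how Tensor Cores actually accumulate, which is what the paper's tests are designed to characterize. The helper names are hypothetical.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fp16_sum(values):
    """Accumulate left to right, rounding to FP16 after every addition."""
    acc = 0.0
    for v in values:
        acc = to_fp16(acc + to_fp16(v))
    return acc


terms = [4096.0, 1.0, 1.0, 1.0, 1.0]
forward = fp16_sum(terms)                   # large term first: each +1 is rounded away
backward = fp16_sum(list(reversed(terms)))  # small terms first: they add up to 4 before meeting 4096
```

Near 4096 adjacent FP16 values are 4 apart, so `forward` stays at 4096.0 while `backward` reaches 4100.0. Reproducing a GPU result bit for bit therefore requires knowing the exact order and rounding used at each step.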

2603.19470 2026-05-19 cs.LG cs.AI 版本更新

Adaptive Layerwise Perturbation: Unifying Off-Policy Corrections for LLM RL

自适应分层扰动：统一LLM RL中的非策略修正

Chenlu Ye, Xuanchang Zhang, Yifan Hao, Zhou Yu, Ziji Zhang, Abhinav Gullapalli, Hao Chen, Jing Huang, Tong Zhang

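AI summary: The paper proposes Adaptive Layerwise Perturbation (ALP), which injects small learnable perturbations into the input hidden states of every layer during updates to mitigate policy staleness and training-inference mismatch, improving training stability and exploration.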

详情

AI中文摘要

非策略问题如策略老化和训练-推理不匹配已成为LLM RL训练稳定性及进一步探索的主要瓶颈。由于增强推理效率的技术，推理策略与更新策略的分布差距扩大，导致重要性比率呈重尾分布。当策略局部尖锐时，重尾比率出现，进一步放大梯度并可能使更新超出信任区域。为解决此问题，我们提出自适应分层扰动（ALP），在更新过程中向每一层的输入隐藏状态注入小的可学习扰动，并将由此产生的扰动策略作为重要性比率的分子，与未改变的推理策略进行比较。直观上，通过向中间表示添加受控噪声，ALP防止更新策略过于偏离推理策略，并扩大策略家族以覆盖推理时的不匹配噪声。因此，扁平化的分布可自然缩小更新策略与推理策略之间的差距，并减少重要性比率的尾部，从而维持训练稳定性。这通过实验证实。在单轮数学和多轮工具集成推理任务中的实验表明，ALP不仅提高了最终性能，还避免了重要性比率尾部的爆炸和KL尖峰，同时提升了探索能力。消融实验显示，跨所有层的表示级扰动效果最佳，显著优于部分层和logits-only变体。

英文摘要

Off-policy problems such as policy staleness and training--inference mismatch have become a major bottleneck for training stability and further exploration in LLM RL. The distribution gap between the inference and updated policies grows because of the techniques to enhance inference efficiency, leading to heavy-tailed importance ratios. Heavy-tailed ratios arise when the policy is locally sharp, which further inflates gradients and can push updates outside the trust region. To address this, we propose Adaptive Layerwise Perturbation (ALP), which injects small learnable perturbations into the input hidden states of each layer during updates and uses the resulting perturbed policy as the numerator of the importance ratio against the unchanged inference policy in the objective. Intuitively, by adding controlled noise to intermediate representations, ALP prevents the updated policy from deviating too sharply from the inference policy and enlarges the policy family to cover inference-time mismatch noise. Hence, the flattened distribution can naturally tighten the gap between the updated and inference policies and reduce the tail of importance ratios, thus maintaining training stability. This is further validated empirically. Experiments on single-turn math and multi-turn tool-integrated reasoning tasks show that ALP not only improves final performance, but also avoids blow-up in the importance-ratio tail and KL spikes during iterative training, along with boosted exploration. Ablations show that representation-level perturbations across all layers are most effective, substantially outperforming partial-layer and logits-only variants.

2603.03538 2026-05-19 cs.LG 版本更新

Online Learnability of Chain-of-Thought Verifiers: Soundness and Completeness Trade-offs

链式思维验证器的在线可学习性：正确性与完备性的权衡

Maria-Florina Balcan, Avrim Blum, Kiriaki Fragkia, Zhiyuan Li, Dravyansh Sharma

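AI summary: The paper proposes an online learning framework for chain-of-thought verifiers that check the correctness of a solution, motivated by the distribution shift created by the generator-verifier feedback loop, and introduces new extensions of the Littlestone dimension that tightly characterize the mistake bounds.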

详情

AI中文摘要

大型语言模型（LLMs）通过链式思维生成在解决复杂推理和规划任务中展现出巨大潜力。然而，当前LLMs的输出不完全可靠，需要仔细验证。即使LLMs随时间变得更准确，学习的验证器可以帮助提高信任度，执行安全约束，并确保与个人偏好一致。然而，学习验证器的主要挑战在于，当其输出被生成器用来改进推理时，生成器与验证器之间的反馈循环可能产生显著的分布偏移。受此挑战启发，我们提出了一种在线学习框架，用于学习链式思维验证器，给定一个问题和一系列推理步骤，检查解决方案的正确性。我们强调了正确性错误（未能捕捉推理轨迹中的错误）和完备性错误（将正确的推理步骤标记为错误）的不对称作用，并引入了新的Littlestone维度扩展，紧密刻画了在可实现设置中学习验证器的错误界。我们提供了最优算法，用于找到帕累托前沿（给定正确性错误预算下的最小总错误数）以及最小化不对称成本的线性组合。此外，我们进一步展示了如何利用学习的验证器来提高一组弱生成器的准确性，并使生成的证明超越其初始训练内容。在假设其中一个生成器能够以某个最小概率正确生成下一个推理步骤的前提下，我们展示了如何学习一个具有小误差和弃权率的强生成器。

英文摘要

Large Language Models (LLMs) with chain-of-thought generation have demonstrated great potential for solving complex reasoning and planning tasks. However, the output of current LLMs is not fully reliable and needs careful verification. Even if LLMs get more accurate over time, learned verifiers can help increase trust, enforce safety constraints, and ensure alignment with personal preferences. A major challenge in learning verifiers, however, especially when their output will be used by the generator to improve its reasoning, is that the feedback loop between generator and verifier may produce substantial distribution shift. Motivated by this challenge, we propose an online learning framework for learning chain-of-thought verifiers that, given a problem and a sequence of reasoning steps, check the correctness of the solution. Highlighting the asymmetric role of soundness errors (failure in catching errors in a reasoning trace) and completeness errors (flagging correct reasoning steps as wrong), we introduce novel extensions of the Littlestone dimension which tightly characterize the mistake bounds for learning a verifier in the realizable setting. We provide optimal algorithms for finding the Pareto-frontier (the smallest total number of mistakes given a budget of soundness mistakes) as well as for minimizing a linear combination of asymmetric costs. We further show how our learned verifiers can be used to boost the accuracy of a collection of weak generators, and enable generation of proofs beyond what they were initially trained on. With the mild assumption that one of the generators can generate the next reasoning step correctly with some minimal probability, we show how to learn a strong generator with small error and abstention rates.

2603.02218 2026-05-19 cs.LG cs.AI cs.CL cs.IT math.IT 版本更新

Self-Play Only Evolves When Self-Synthetic Pipeline Ensures Learnable Information Gain

仅在自我合成管道确保可学习信息增益时，自我博弈才会进化

Wei Liu, Siya Qi, Yali Du, Yulan He

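AI summary: Through experiments, the paper shows that sustainable self-evolution requires a self-synthesised data pipeline whose learnable information increases across iterations. It identifies three roles played by a self-evolving LLM (Proposer, Solver, Verifier) and three system designs that address self-play plateaus.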

Comments 10 pages, 6 figures, 7 formulas, accepted by ICML 2026 position paper track

详情

AI中文摘要

大型语言模型（LLMs）使构建通过自我进化循环改进的系统成为可能，但许多现有方案更倾向于自我博弈且易陷入停滞。核心失败模式是循环生成更多数据但未增加下一轮的可学习信息。通过自我博弈编程任务实验，我们发现可持续自我进化需要具有可学习信息递增的自我合成数据管道。我们识别出自我进化LLM的三重角色：生成任务的Proposer、尝试解决方案的Solver以及提供训练信号的Verifier，并提出三种系统设计共同针对三重角色视角下的可学习信息增益。不对称共进化在角色间形成弱到强到弱的循环。容量增长扩展参数和推理时间预算以匹配上升的可学习信息。主动信息寻求引入外部上下文和新任务来源以防止饱和。这些模块共同提供从脆弱自我博弈动态到持续自我进化的可测量系统级路径。

英文摘要

Large language models (LLMs) make it plausible to build systems that improve through self-evolving loops, but many existing proposals are better understood as self-play and often plateau quickly. A central failure mode is that the loop synthesises more data without increasing learnable information for the next iteration. Through experiments on a self-play coding task, we reveal that sustainable self-evolution requires a self-synthesised data pipeline with learnable information that increases across iterations. We identify triadic roles that self-evolving LLMs play: the Proposer, which generates tasks; the Solver, which attempts solutions; and the Verifier, which provides training signals, and we identify three system designs that jointly target learnable information gain from this triadic roles perspective. Asymmetric co-evolution closes a weak-to-strong-to-weak loop across roles. Capacity growth expands parameter and inference-time budgets to match rising learnable information. Proactive information seeking introduces external context and new task sources that prevent saturation. Together, these modules provide a measurable, system-level path from brittle self-play dynamics to sustained self-evolution.

2602.16473 2026-05-19 cs.LG cs.FL cs.LO 版本更新

Synthesis and Verification of Transformer Programs (Technical Report)

变换器程序的合成与验证（技术报告）

Hongjian Jiang, Matthew Hague, Philipp Rümmer, Anthony Widjaja Lin

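AI summary: The paper proposes a new algorithm for automatically verifying C-RASP programs and a new method for learning C-RASP programs, with applications to transformer program optimization and constrained learning.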

2601.21941 2026-05-19 cs.LG cs.AI 版本更新

Robust Multimodal Representation Learning in Healthcare

医疗领域鲁棒多模态表征学习

Xiaoguang Zhu, Linxiao Gong, Lianlong Sun, Yang Liu, Haoyu Wang, Jing Liu

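AI summary: The paper proposes a dual-stream feature decorrelation framework that uses structural causal analysis to handle systematic biases in medical multimodal data and improve generalization. Experiments on MIMIC-IV, eICU, and ADNI show consistent performance improvements.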

详情

DOI: 10.1109/ICASSP55912.2026.11460772

AI中文摘要

医疗多模态表征学习旨在将异构数据整合为统一的患者表示以支持临床结果预测。然而，真实世界医疗数据集通常包含来自多个来源的系统性偏差，这对医疗多模态表征学习提出了重大挑战。现有方法通常专注于有效的多模态融合，忽视了影响泛化能力的固有偏见特征。为解决这些挑战，我们提出了一种双流特征去相关框架，通过引入由潜在混杂因素引入的结构因果分析来识别和处理偏见。我们的方法采用因果偏见去相关框架，结合双流神经网络，将因果特征与虚假相关性分离，利用广义交叉熵损失和互信息最小化实现有效去相关。该框架模型无关，可集成到现有医疗多模态学习方法中。在MIMIC-IV、eICU和ADNI数据集上的全面实验显示了一致的性能提升。

英文摘要

Medical multimodal representation learning aims to integrate heterogeneous data into unified patient representations to support clinical outcome prediction. However, real-world medical datasets commonly contain systematic biases from multiple sources, which poses significant challenges for medical multimodal representation learning. Existing approaches typically focus on effective multimodal fusion, neglecting inherent biased features that affect the generalization ability. To address these challenges, we propose a Dual-Stream Feature Decorrelation Framework that identifies and handles the biases through structural causal analysis introduced by latent confounders. Our method employs a causal-biased decorrelation framework with dual-stream neural networks to disentangle causal features from spurious correlations, utilizing generalized cross-entropy loss and mutual information minimization for effective decorrelation. The framework is model-agnostic and can be integrated into existing medical multimodal learning methods. Comprehensive experiments on MIMIC-IV, eICU, and ADNI datasets demonstrate consistent performance improvements.
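The generalized cross-entropy (GCE) loss mentioned above has a compact closed form in its usual definition, sketched here for a single example. How the framework wires this loss into its two streams is not stated in the abstract, so this shows only the loss itself; the function name and the default `q` are assumptions.

```python
import math


def generalized_cross_entropy(prob_true_class, q=0.7):
    """GCE loss (1 - p^q) / q for the probability p assigned to the true class.

    q in (0, 1]: q = 1 gives 1 - p, and q -> 0 recovers the usual
    cross-entropy -log(p).
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    return (1.0 - prob_true_class ** q) / q


loss_confident = generalized_cross_entropy(0.9)
loss_unsure = generalized_cross_entropy(0.2)
```

Larger `q` flattens the penalty on low-confidence examples, which is why this family of losses is often used when some training signals are suspected to be unreliable.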

2512.12572 2026-05-19 cs.LG stat.ML 版本更新

On the Accuracy of Newton Step and Influence Function Data Attributions

关于牛顿步和影响函数数据归因的准确性

Ittai Rubinstein, Samuel B. Hopkins

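AI summary: The paper analyzes the accuracy of Newton Step (NS) and Influence Function (IF) data attribution, derives scaling laws for their errors, and explains why NS is often more accurate than IF.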

详情

AI中文摘要

数据归因旨在通过估计移除某些训练点时预测的变化来解释模型预测，广泛应用于可解释性、信用分配、遗忘和隐私等领域。即使在逻辑回归这种相对简单的案例中，现有对影响函数（IF）和单步牛顿步（NS）等主流数据归因方法的数学分析仍存在两个关键局限：首先，它们依赖于全局强凸性假设，这在实践中往往不成立；其次，所得的界限在参数数量（d）和移除样本数量（k）方面表现极差。因此，这些分析不够精确，无法回答诸如“每种方法的渐进行为误差如何”或“给定数据集哪种方法更准确”等基本问题。本文引入了针对凸学习问题的NS和IF数据归因方法的新分析。据我们所知，这是首个不假设全局强凸性且解释了[KATL19]和[RH25a]观察到NS数据归因常比IF更准确的分析。我们证明，对于足够良好的逻辑回归，我们的界限在多项对数因子范围内渐近紧致，从而得到平均样本移除情况下的误差缩放定律。[公式]

英文摘要

Data attribution aims to explain model predictions by estimating how they would change if certain training points were removed, and is used in a wide range of applications, from interpretability and credit assignment to unlearning and privacy. Even in the relatively simple case of logistic regressions, existing mathematical analyses of leading data attribution methods such as Influence Functions (IF) and single Newton Step (NS) remain limited in two key ways. First, they rely on global strong convexity assumptions which are often not satisfied in practice. Second, the resulting bounds scale very poorly with the number of parameters ($d$) and the number of samples removed ($k$). As a result, these analyses are not tight enough to answer fundamental questions such as "what is the asymptotic scaling of the errors of each method?" or "which of these methods is more accurate for a given dataset?" In this paper, we introduce a new analysis of the NS and IF data attribution methods for convex learning problems. To the best of our knowledge, this is the first analysis of these questions that does not assume global strong convexity and also the first explanation of [KATL19] and [RH25a]'s observation that NS data attribution is often more accurate than IF. We prove that for sufficiently well-behaved logistic regressions, our bounds are asymptotically tight up to poly-logarithmic factors, yielding scaling laws for the errors in the average-case sample removals: \[ \mathbb{E}_{T \subseteq [n],\, |T| = k} \bigl[ \|\hat{\theta}_T - \hat{\theta}_T^{\mathrm{NS}}\|_2 \bigr] = \widetilde{\Theta}\!\left(\frac{k d}{n^2}\right), \qquad \mathbb{E}_{T \subseteq [n],\, |T| = k} \bigl[ \|\hat{\theta}_T^{\mathrm{NS}} - \hat{\theta}_T^{\mathrm{IF}}\|_2 \bigr] = \widetilde{\Theta}\!\left( \frac{(k + d)\sqrt{k d}}{n^2} \right). \]
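For reference, the two estimators compared in the abstract are usually written as follows for an unregularized empirical risk. The notation below (per-sample losses, the Hessians, and the gradient sum) is introduced here for illustration and may differ from the paper's own conventions.

```latex
% Assumed notation: per-sample losses \ell_i, full-data minimizer \hat{\theta}, removed set T.
\hat{\theta} = \arg\min_{\theta} \sum_{i=1}^{n} \ell_i(\theta), \qquad
H = \sum_{i=1}^{n} \nabla^2 \ell_i(\hat{\theta}), \qquad
H_T = \sum_{i \in T} \nabla^2 \ell_i(\hat{\theta}), \qquad
g_T = \sum_{i \in T} \nabla \ell_i(\hat{\theta})

% Influence function: first-order correction using the full-data Hessian.
\hat{\theta}_T^{\mathrm{IF}} = \hat{\theta} + H^{-1} g_T

% Newton step: one Newton iteration on the objective with T removed,
% whose gradient at \hat{\theta} is -g_T and whose Hessian is H - H_T.
\hat{\theta}_T^{\mathrm{NS}} = \hat{\theta} + (H - H_T)^{-1} g_T
```

The only difference is whether the Hessian contribution of the removed points is subtracted, which is the term that the gap between NS and IF measures.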

2512.03280 2026-05-19 cs.LG cs.AI 版本更新

Glocal Smoothness: Line Search and Adaptive Step Sizes Can Help in Theory Too!

Curtis Fox, Aaron Mishkin, Sharan Vaswani, Mark Schmidt

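AI summary: The paper introduces glocal (global and local) smoothness, a characterization that depends only on properties of the function. It yields iteration-complexity bounds in terms of iterate-independent constants, shows the advantage of line searches over fixed step sizes, and shows that in some settings gradient descent with line search has a better iteration complexity than accelerated methods with fixed step sizes.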

详情

英文摘要

Iteration complexities for optimizing smooth functions with first-order algorithms are typically stated in terms of a global Lipschitz constant of the gradient, and near-optimal results are then achieved using fixed step sizes. But many objective functions that arise in practice have regions with small Lipschitz constants where larger step sizes can be used. Many local Lipschitz assumptions have been proposed, which have led to results showing that adaptive step sizes and/or line searches yield improved convergence rates over fixed step sizes. However, these faster rates tend to depend on the iterates of the algorithm, which makes it difficult to compare the iteration complexities of different methods. We consider a simple characterization of global and local ("glocal") smoothness that only depends on properties of the function. This allows upper bounds on iteration complexities in terms of iterate-independent constants and enables us to compare iteration complexities between algorithms. Under this assumption it is straightforward to show the advantages of line searches over fixed step sizes and that, in some settings, gradient descent with line search has a better iteration complexity than accelerated methods with fixed step sizes. We further show that glocal smoothness can lead to improved complexities for the Polyak and AdGD step sizes, as well as other algorithms including coordinate optimization, stochastic gradient methods, accelerated gradient methods, and non-linear conjugate gradient methods.
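The intuition that larger steps become safe in low-curvature regions can be illustrated with a toy comparison. The sketch below uses f(x) = x^4 / 4, whose gradient x^3 has a large Lipschitz constant far from the origin and a small one near it; the objective, the Armijo constant, and the step-doubling heuristic are illustrative choices, not taken from the paper.

```python
def f(x):
    return x ** 4 / 4.0


def grad(x):
    return x ** 3


def gd_fixed(x, step, iters):
    """Gradient descent with one step size chosen for the worst-case curvature."""
    for _ in range(iters):
        x = x - step * grad(x)
    return x


def gd_backtracking(x, iters, step=1.0, shrink=0.5, c=0.5):
    """Gradient descent with an Armijo backtracking line search.

    The trial step is doubled at the start of every iteration so that it can
    grow once the iterates reach a flatter region.
    """
    for _ in range(iters):
        g = grad(x)
        step *= 2.0
        while f(x - step * g) > f(x) - c * step * g * g:
            step *= shrink
        x = x - step * g
    return x


x0 = 2.0
# On [-2, 2] the gradient of f is 12-Lipschitz, so 1/12 is the safe fixed step.
x_fixed = gd_fixed(x0, step=1.0 / 12.0, iters=50)
x_search = gd_backtracking(x0, iters=50)
```

After the same number of iterations the line-search iterate is far closer to the minimizer at 0, because its step size keeps growing as the local curvature shrinks while the fixed step stays tuned to the starting region.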

2502.18663 2026-05-19 cs.LG cs.DM cs.SI math.CO math.GR 版本更新

CayleyPy RL: Pathfinding and Reinforcement Learning on Cayley Graphs

CayleyPy RL：Cayley图上的路径寻找与强化学习

A. Chervov, M. Obozov, A. Soibelman, S. Lytkin, I. Kiselev, S. Fironov, A. Lukyanenko, A. Dolgorukova, A. Ogurtsov, F. Petrov, S. Krymskii, M. Evseev, L. Grunvald, D. Gorodkov, G. Antiufeev, G. Verbii, V. Zamkovoy, L. Cheldieva, I. Koltsov, A. Sychev, A. Eliseev, S. Nikolenko, N. Narynbaev, R. Turtayev, N. Rokotyan, S. Kovalev, A. Rozanov, V. Nelin, S. Ermilov, L. Shishina, D. Mamayeva, A. Korolkova, K. Khoruzhii, A. Romanov

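AI summary: The paper combines reinforcement learning with a diffusion-distance approach for pathfinding on Cayley graphs, benchmarks the key building blocks, provides strong support for a conjecture on the diameter of a Cayley graph of the symmetric group, and launches Kaggle challenges to encourage crowdsourced contributions.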

Comments 32+16 pages

详情

DOI: 10.4310/ATMP.260413005111

AI中文摘要

本文是关于开发高效人工智能方法用于在极大规模图（如10^70个节点）上路径寻找的一系列研究中的第二篇，重点研究Cayley图和数学应用。CayleyPy项目是研究的核心部分。本文提出了一种新的强化学习方法与更直接的扩散距离方法的结合。我们的分析包括对方法关键构建块的各种选择进行基准测试：神经网络架构、随机游走生成器和束搜索路径寻找。我们将其与经典计算机代数系统GAP进行比较，证明其在所考虑的例子中超越了GAP。作为特定的数学应用，我们研究了对称群的Cayley图，其生成元为循环移位和置换。我们通过机器学习和数学方法为OEIS-A186783猜想提供有力支持，即直径等于n(n-1)/2。我们识别了猜想中的最长元素并生成其分解。我们证明了直径下界为n(n-1)/2 -n/2，上界为n(n-1)/2 + 3n，并通过给定复杂度的算法证明。我们还提出了由数值实验激发的若干猜想，包括关于中心极限现象（增长近似由Gumbel分布）、图谱的均匀分布以及排序网络的数值研究。为了促进众包活动，我们在Kaggle平台创建挑战并邀请贡献以改进和基准测试Cayley图路径寻找及其他任务的方法。

英文摘要

This paper is the second in a series of studies on developing efficient artificial intelligence-based approaches to pathfinding on extremely large graphs (e.g. $10^{70}$ nodes) with a focus on Cayley graphs and mathematical applications. The open-source CayleyPy project is a central component of our research. The present paper proposes a novel combination of a reinforcement learning approach with a more direct diffusion distance approach from the first paper. Our analysis includes benchmarking various choices for the key building blocks of the approach: architectures of the neural network, generators for the random walks and beam search pathfinding. We compared these methods against the classical computer algebra system GAP, demonstrating that they "overcome the GAP" for the considered examples. As a particular mathematical application we examine the Cayley graph of the symmetric group with cyclic shift and transposition generators. We provide strong support for the OEIS-A186783 conjecture that the diameter is equal to n(n-1)/2 by machine learning and mathematical methods. We identify the conjectured longest element and generate its decomposition of the desired length. We prove a diameter lower bound of n(n-1)/2-n/2 and an upper bound of n(n-1)/2+ 3n by presenting the algorithm with given complexity. We also present several conjectures motivated by numerical experiments, including observations on the central limit phenomenon (with growth approximated by a Gumbel distribution), the uniform distribution for the spectrum of the graph, and a numerical study of sorting networks. To stimulate crowdsourcing activity, we create challenges on the Kaggle platform and invite contributions to improve and benchmark approaches on Cayley graph pathfinding and other tasks.
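For very small n the diameter in question can be checked by brute force, which gives a baseline to compare against n(n-1)/2. The sketch below runs breadth-first search over all n! permutations. The generator set used here (swap of the first two positions plus cyclic shifts in both directions) is an assumption about the convention, and the code is unrelated to the CayleyPy implementation.

```python
from collections import deque
from math import factorial


def neighbors(p):
    """Apply each generator to the permutation p (a tuple of length >= 2)."""
    yield (p[1], p[0]) + p[2:]   # transposition of the first two positions
    yield p[1:] + p[:1]          # cyclic shift left
    yield p[-1:] + p[:-1]        # cyclic shift right (inverse of the left shift)


def cayley_diameter(n):
    """Largest BFS distance from the identity; equals the diameter by vertex transitivity."""
    start = tuple(range(n))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        p = queue.popleft()
        for q in neighbors(p):
            if q not in dist:
                dist[q] = dist[p] + 1
                queue.append(q)
    if len(dist) != factorial(n):
        raise RuntimeError("generators did not reach every permutation")
    return max(dist.values())


# Exhaustive search is only feasible for small n because the graph has n! vertices.
small_cases = {n: (cayley_diameter(n), n * (n - 1) // 2) for n in range(2, 7)}
```

Each entry of `small_cases` pairs the measured diameter with n(n-1)/2. The two need not agree for the very smallest n, and the factorial growth of the state space is what motivates the learned pathfinding methods in the paper.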

2502.17007 2026-05-19 cs.LG cs.AI stat.ML 版本更新

Uncertainty Quantification as a Principled Foundation for Explainable Artificial Intelligence: A Case Study of Counterfactual Explanations

不确定性量化作为可解释人工智能的原理性基础：反事实解释的案例研究

Kacper Sokol, Santo M. A. R. Thies, Eyke Hüllermeier

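AI summary: Using counterfactual explainability as a case study, the paper shows the potential of uncertainty quantification as a unifying framework, proposes two explainer variants, and shows that they outperform existing methods.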

2306.12282 2026-05-19 cs.DS cs.LG math.OC 版本更新

Online Resource Allocation with Convex-set Machine-Learned Advice

在线资源分配与凸集机器学习建议

Negin Golrezaei, Patrick Jaillet, Zijie Zhou

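AI summary: The paper proposes an online resource allocation framework in which machine-learned advice takes the form of a convex uncertainty set. Its algorithms balance consistency and robustness and use adaptive protection levels to perform well under uncertainty.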

Comments 77 pages, 8 figures

详情

AI中文摘要

决策者往往能够获得关于未来需求的机器学习预测，这些预测可以帮助指导在线资源分配决策。然而，这些预测可能不准确。我们开发了一个在线资源分配框架，该框架可以处理潜在不可靠的机器学习建议，其中建议以需求向量的凸不确定性集形式表示，而不是单一点估计。我们引入了一类参数化的帕累托最优在线算法，以平衡一致性和鲁棒性。一致性比率衡量在建议准确时的性能，而鲁棒比率衡量在对抗性需求下建议不准确时的性能。对于目标一致性水平C，我们的算法在满足至少一致性水平C的条件下最大化鲁棒性。我们的方法通过引入自适应保护水平扩展了经典保护水平算法，这些保护水平能够动态响应建议中的不确定性。我们还提供了一种计算最大可实现一致性水平的方法。数值实验表明，我们的算法在有效平衡最坏情况和平均情况性能方面优于基准方法，包括仅基于点预测的方法。

英文摘要

Decision-makers often have access to machine-learned predictions about future demand that can help guide online resource allocation decisions. However, such predictions may be inaccurate. We develop a framework for online resource allocation with potentially unreliable machine-learned advice, where the advice is represented as a convex uncertainty set for the demand vector rather than a single point estimate. We introduce a parameterized class of Pareto-optimal online algorithms that balance consistency and robustness. The consistent ratio measures performance when the advice is accurate, while the robust ratio measures performance under adversarial demand when the advice is inaccurate. For a target consistency level C, our algorithms maximize robustness subject to achieving at least consistency level C. Our approach extends classical protection-level algorithms by introducing adaptive protection levels that dynamically respond to uncertainty in the advice. We also provide a method for computing the maximum achievable consistency level. Numerical experiments demonstrate that our algorithms outperform benchmark methods, including approaches based solely on point forecasts, by effectively balancing worst-case and average-case performance.
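As background for the extension described above, the classical protection-level rule for two request classes can be sketched as follows: low-value requests are accepted only while the remaining capacity exceeds a reserved amount. This is the static textbook rule with hypothetical names and inputs; the paper's adaptive, advice-dependent protection levels are not reproduced here.

```python
def protection_level_policy(capacity, protection, arrivals):
    """Accept/reject decisions for a stream of 'high' and 'low' value requests.

    `protection` units of capacity are reserved for high-value requests, so a
    low-value request is accepted only if more than `protection` units remain.
    """
    remaining = capacity
    decisions = []
    for kind in arrivals:
        accept = remaining > 0 and (kind == "high" or remaining > protection)
        if accept:
            remaining -= 1
        decisions.append(accept)
    return decisions


# Three units of capacity, two of them protected for high-value demand.
decisions = protection_level_policy(3, 2, ["low", "low", "high", "high", "high"])
```

Here the first low-value request is accepted, the second is rejected to keep two units in reserve, and the final high-value request is rejected only because capacity has run out. An adaptive scheme would adjust `protection` as the advice and the observed demand evolve.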

2304.03427 2026-05-19 cs.CL cs.AI cs.CY cs.LG 版本更新

Cleansing Jewel: A Neural Spelling Correction Model Built On Google OCR-ed Tibetan Manuscripts

清除珠宝：基于谷歌OCR的藏文手稿的神经拼写纠正模型

Queenie Luo, Yung-Sung Chuang

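AI summary: The paper presents a neural spelling correction model built on Google OCR-ed Tibetan manuscripts. A Transformer augmented with a Confidence Score mechanism auto-corrects noisy OCR output and outperforms plain Transformer, LSTM-2-LSTM, and GRU-2-GRU architectures on loss and character error rate.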

详情

DOI: 10.1145/3654811
Journal ref: Association for Computing Machinery 2024

AI中文摘要

人文学者依赖古代手稿来研究历史、宗教和社会政治结构。许多努力致力于使用OCR技术数字化这些珍贵的手稿，但大多数手稿因数世纪的污损，使得OCR程序无法准确捕捉褪色的字符和污渍。本文提出基于谷歌OCR处理的藏文手稿的神经拼写纠正模型，用于自动纠正OCR输出中的噪声。本文分为四个部分：数据集、模型架构、训练和分析。首先，我们将原始藏文电子文本语料库特征工程为两个结构化数据框——一组配对玩具数据和一组配对真实数据。然后，我们在Transformer架构中实现了置信度评分机制，用于拼写纠正任务。根据损失和字符错误率，我们的Transformer加置信度评分机制架构证明优于Transformer、LSTM-2-LSTM和GRU-2-GRU架构。最后，为了检验模型的鲁棒性，我们分析了错误的标记，可视化了模型中的注意力和自我注意力热图。

英文摘要

Scholars in the humanities rely heavily on ancient manuscripts to study history, religion, and socio-political structures in the past. Many efforts have been devoted to digitizing these precious manuscripts using OCR technology, but most manuscripts were blemished over the centuries so that an Optical Character Recognition (OCR) program cannot be expected to capture faded graphs and stains on pages. This work presents a neural spelling correction model built on Google OCR-ed Tibetan Manuscripts to auto-correct OCR-ed noisy output. This paper is divided into four sections: dataset, model architecture, training and analysis. First, we feature-engineered our raw Tibetan etext corpus into two sets of structured data frames -- a set of paired toy data and a set of paired real data. Then, we implemented a Confidence Score mechanism into the Transformer architecture to perform spelling correction tasks. According to the Loss and Character Error Rate, our Transformer + Confidence score mechanism architecture proves to be superior to Transformer, LSTM-2-LSTM and GRU-2-GRU architectures. Finally, to examine the robustness of our model, we analyzed erroneous tokens, visualized Attention and Self-Attention heatmaps in our model.
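Character Error Rate, one of the two metrics cited above, is commonly computed as the edit distance between the reference and the model output divided by the reference length. The sketch below follows that common definition and counts Unicode code points; whether the paper counts code points, Tibetan syllables, or another unit is not stated in the abstract, so treat the unit as an assumption.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, using a rolling row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def character_error_rate(ref, hyp):
    """Edit distance normalized by the reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


cer = character_error_rate("abcd", "abxd")  # one substitution in four characters
```

Because the function accepts any sequence, passing lists of syllables instead of strings gives a syllable-level error rate with no other change.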

2010.15538 2026-05-19 stat.ML cs.LG 版本更新

Matérn Gaussian Processes on Graphs

图上的Matérn高斯过程

Viacheslav Borovitskiy, Iskander Azangulov, Alexander Terenin, Peter Mostowsky, Marc Peter Deisenroth, Nicolas Durrande

AI总结本文研究了图上Matérn高斯过程，利用其随机偏微分方程特性，继承了欧几里得和黎曼流形高斯过程的特性，提供标准训练方法，使其适用于小批量和非共轭场景。

详情

Journal ref: Artificial Intelligence and Statistics, 2021

AI中文摘要

高斯过程是一种用于学习未知函数的灵活框架，允许利用对函数性质的先验信息。尽管许多不同的高斯过程模型在欧几里得输入空间中 readily available，但对于输入空间为无向图的高斯过程，选择则更加有限。在本文中，我们利用Matérn高斯过程的随机偏微分方程特性——在欧几里得设置中广泛使用的模型类——来研究其在无向图上的类比。我们证明，所得到的高斯过程继承了其欧几里得和黎曼流形类比的各种吸引特性，并提供了允许使用标准方法（如诱导点）进行训练的技术。这使得图Matérn高斯过程能够应用于小批量和非共轭设置，从而使其更易于从业者使用，并更容易在更大的学习框架中部署。

English abstract

Gaussian processes are a versatile framework for learning unknown functions in a manner that permits one to utilize prior information about their properties. Although many different Gaussian process models are readily available when the input space is Euclidean, the choice is much more limited for Gaussian processes whose input space is an undirected graph. In this work, we leverage the stochastic partial differential equation characterization of Matérn Gaussian processes, a widely used model class in the Euclidean setting, to study their analog for undirected graphs. We show that the resulting Gaussian processes inherit various attractive properties of their Euclidean and Riemannian analogs and provide techniques that allow them to be trained using standard methods, such as inducing points. This enables graph Matérn Gaussian processes to be employed in mini-batch and non-conjugate settings, thereby making them more accessible to practitioners and easier to deploy within larger learning frameworks.
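
The abstract does not give the kernel itself. As a minimal sketch, assuming the covariance takes the spectral form $(2\nu/\kappa^2 + \Delta)^{-\nu}$ with $\Delta = D - W$ the unnormalised graph Laplacian, a graph Matérn covariance matrix could be built as follows. The function name, the default hyperparameters, and the plain `sigma2` scaling are illustrative assumptions, not taken from the listing.

```python
import numpy as np


def graph_matern_kernel(adjacency, nu=1.5, kappa=1.0, sigma2=1.0):
    """Covariance matrix sigma2 * (2*nu/kappa**2 + Laplacian)**(-nu).

    Assumes a symmetric, non-negative weighted adjacency matrix of an
    undirected graph; hyperparameter names are illustrative.
    """
    weights = np.asarray(adjacency, dtype=float)
    if weights.ndim != 2 or weights.shape[0] != weights.shape[1]:
        raise ValueError("adjacency must be a square matrix")
    if not np.allclose(weights, weights.T):
        raise ValueError("adjacency must be symmetric (undirected graph)")

    # Unnormalised graph Laplacian: degree matrix minus adjacency.
    laplacian = np.diag(weights.sum(axis=1)) - weights

    # The Laplacian is symmetric positive semi-definite, so eigh applies.
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian)
    eigenvalues = np.clip(eigenvalues, 0.0, None)  # remove round-off negatives

    # Apply the Matern spectral function to each eigenvalue.
    spectrum = (2.0 * nu / kappa**2 + eigenvalues) ** (-nu)

    # Reassemble U diag(spectrum) U^T.
    return sigma2 * (eigenvectors * spectrum) @ eigenvectors.T


# Path graph 0 - 1 - 2: adjacent nodes should be more correlated.
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
K = graph_matern_kernel(path)
```

The full eigendecomposition costs cubic time in the number of nodes, so this form only suits small graphs; larger graphs would need a truncated spectrum or a sparse approximation.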


1908.05387 2026-05-19 cs.LG stat.ML version update

HONEM: Learning Embedding for Higher Order Networks

Mandana Saebi, Giovanni Luca Ciampaglia, Lance M Kaplan, Nitesh V Chawla

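AI summary: This paper proposes HONEM, a method designed for higher-order network structures that effectively captures non-Markovian higher-order dependencies and improves performance on node classification, network reconstruction, link prediction, and visualization.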

Details

DOI: 10.1089/big.2019.0169
Journal ref: Big Data 8, no. 4 (2020): 255-269

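AI-generated abstract (translated from Chinese)

Representation learning on networks offers a powerful alternative to the often tedious process of manual feature engineering and has therefore achieved notable success in recent years. However, all existing representation learning methods are based on the first-order network (FON), that is, a network that captures only pairwise interactions between nodes. These methods may therefore fail to incorporate non-Markovian higher-order dependencies. As a result, the generated embeddings may not accurately represent the underlying phenomena in the network, leading to poor performance on various inductive or transductive learning tasks. To address this challenge, the paper proposes HONEM, a higher-order network embedding method that captures non-Markovian higher-order dependencies in a network. HONEM is designed specifically for the higher-order network (HON) structure and, on networks containing such dependencies, outperforms other state-of-the-art methods in node classification, network reconstruction, link prediction, and visualization.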

English abstract

Representation learning on networks offers a powerful alternative to the oft painstaking process of manual feature engineering, and as a result, has enjoyed considerable success in recent years. However, all existing representation learning methods are based on the first-order network (FON), that is, the network that only captures the pairwise interactions between the nodes. As a result, these methods may fail to incorporate non-Markovian higher-order dependencies in the network. Thus, the embeddings that are generated may not accurately represent the underlying phenomena in a network, resulting in inferior performance in different inductive or transductive learning tasks. To address this challenge, this paper presents HONEM, a higher-order network embedding method that captures the non-Markovian higher-order dependencies in a network. HONEM is specifically designed for the higher-order network structure (HON) and outperforms other state-of-the-art methods in node classification, network reconstruction, link prediction, and visualization for networks that contain non-Markovian higher-order dependencies.
