arXivDaily: daily arXiv digest, updated Monday to Friday

1. Deep Learning Architectures and Training Methods (12 papers)

2209.01378 2026-06-18 cs.LG eess.SP q-fin.ST (revised version)

RNN(p) for Power Consumption Forecasting

Roberto Baviera, Pietro Manzoni

Affiliations: Politecnico di Milano, Department of Mathematics; University of Edinburgh, Business School

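AI summary: Proposes RNN(p) as a generalisation of ARX(p) for forecasting seasonal patterns across multiple time scales, designs efficient training strategies that exploit its structured feedback, and achieves high accuracy and interpretability in power consumption forecasting.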

Abstract

An elementary Recurrent Neural Network that operates on p time lags, called an RNN(p), is the natural generalisation of a linear autoregressive model ARX(p). It is a powerful forecasting tool for variables displaying inherent seasonal patterns across multiple time scales, as is often observed in energy, economic, and financial time series. The architecture of RNN(p) models, characterised by structured feedbacks across time lags, enables the design of efficient training strategies. We conduct a comparative study of learning algorithms for these models, providing a rigorous analysis of their computational complexity and training performance. We present two applications of RNN(p) models in power consumption forecasting, a key domain within the energy sector where accurate forecasts inform both operational and financial decisions. Experimental results show that RNN(p) models achieve excellent forecasting accuracy while maintaining a high degree of interpretability. These features make them well-suited for decision-making in energy markets and other fintech applications where reliable predictions play a significant economic role.
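The abstract frames RNN(p) as the generalisation of the linear ARX(p) model. For reference, a minimal sketch of a one-step ARX(p) forecast follows, assuming the common form y_t = c + Σ_{i=1..p} a_i·y_{t−i} + Σ_j b_j·x_{t,j}. The function and argument names are illustrative assumptions, and the RNN(p) recursion itself is not reproduced because the abstract does not specify it.

```python
from typing import Sequence


def arx_forecast(
    history: Sequence[float],
    ar_coefs: Sequence[float],
    exog: Sequence[float],
    exog_coefs: Sequence[float],
    intercept: float = 0.0,
) -> float:
    """One-step ARX(p) forecast (illustrative sketch, hypothetical helper).

    history    -- past observations, oldest first; at least p values
    ar_coefs   -- a_1..a_p, where a_i multiplies y_{t-i}
    exog       -- exogenous regressors x_t at the forecast time
    exog_coefs -- b_j, one per exogenous regressor
    """
    p = len(ar_coefs)
    if len(history) < p:
        raise ValueError("need at least p past observations")
    if len(exog) != len(exog_coefs):
        raise ValueError("exog and exog_coefs must have the same length")
    # Autoregressive part: a_i * y_{t-i}; history[-i] is the value i steps back.
    ar_part = sum(a * history[-i] for i, a in enumerate(ar_coefs, start=1))
    # Exogenous part, e.g. calendar or weather features (assumed inputs).
    exog_part = sum(b * x for b, x in zip(exog_coefs, exog))
    return intercept + ar_part + exog_part
```

Fitting a_i and b_j by least squares on lagged history gives the interpretable linear baseline that, per the abstract, RNN(p) generalises.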

2503.01805 2026-06-18 cs.LG cs.AI cs.CL (revised version)

Depth-Width tradeoffs in Algorithmic Reasoning of Graph Tasks with Transformers

Gilad Yehudai, Clayton Sanford, Maya Bechler-Speicher, Orr Fischer, Ran Gilad-Bachrach, Amir Globerson

Affiliations: Courant Institute of Mathematical Sciences, New York University; Google Research; Meta AI; Bar-Ilan University; Department of Bio-Medical Engineering, Edmond J. Safra Center for Bioinformatics, Tel-Aviv University; Tel Aviv University

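AI summary: Studies the depth-width tradeoff of transformers on graph algorithmic tasks, finding that constant depth suffices for many graph problems at linear width while some problems require quadratic width; experiments confirm that wide models keep their accuracy while training and running inference faster.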

Comments: Updated ISF grant number

Abstract

Transformers have revolutionized the field of machine learning. In particular, they can be used to solve complex algorithmic problems, including graph-based tasks. In such algorithmic tasks a key question is what is the minimal size of a transformer that can implement the task. Recent work has begun to explore this problem for graph-based tasks, showing that for sub-linear embedding dimension (i.e., model width) logarithmic depth suffices. However, an open question, which we address here, is what happens if width is allowed to grow linearly, while depth is kept fixed. Here we analyze this setting, and provide the surprising result that with linear width, constant depth suffices for solving a host of graph-based problems. This suggests that a moderate increase in width can allow much shallower models, which are advantageous in terms of inference and train time. For other problems, we show that quadratic width is required. Our results demonstrate the complex and intriguing landscape of transformer implementations of graph-based algorithms. We empirically investigate these trade-offs between the relative powers of depth and width and find tasks where wider models have the same accuracy as deep models, while having much faster train and inference time due to parallelizable hardware.

2503.08038 2026-06-18 cs.LG cs.AI cs.CV (revised version)

Generalized Kullback-Leibler Divergence Loss

Jiequan Cui, Beier Zhu, Qingshan Xu, Zhuotao Tian, Xiaojuan Qi, Bei Yu, Hanwang Zhang, Richang Hong

Affiliations: Hefei University of Technology; University of Science and Technology of China; Nanyang Technological University; The Chinese University of Hong Kong; The University of Hong Kong; Harbin Institute of Technology, Shenzhen

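AI summary: Proposes the Generalized KL divergence loss, which decouples the KL loss into a weighted MSE loss and a cross-entropy loss, corrects its asymmetric optimization, and adds class-wise global information, achieving state-of-the-art results in adversarial training and knowledge distillation.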

Comments: TPAMI 2026, extension of our NeurIPS paper "Decoupled Kullback-Leibler Divergence Loss". arXiv admin note: substantial text overlap with arXiv:2305.13948

Abstract

In this paper, we delve deeper into the Kullback-Leibler (KL) Divergence loss and mathematically prove that it is equivalent to the Decoupled Kullback-Leibler (DKL) Divergence loss that consists of (1) a weighted Mean Square Error (wMSE) loss and (2) a Cross-Entropy loss incorporating soft labels. Thanks to the decoupled structure of DKL loss, we have identified two areas for improvement. Firstly, we address the limitation of KL loss in scenarios like knowledge distillation by breaking its asymmetric optimization property along with a smoother weight function. This modification effectively alleviates convergence challenges in optimization, particularly for classes with high predicted scores in soft labels. Secondly, we introduce class-wise global information into KL/DKL to reduce bias arising from individual samples. With these two enhancements, we derive the Generalized Kullback-Leibler (GKL) Divergence loss and evaluate its effectiveness by conducting experiments on CIFAR-10/100, ImageNet, and vision-language datasets, focusing on adversarial training, and knowledge distillation tasks. Specifically, we achieve new state-of-the-art adversarial robustness on the public leaderboard -- RobustBench and competitive knowledge distillation performance across CIFAR/ImageNet models and CLIP models, demonstrating the substantial practical merits. Our code is available at https://github.com/jiequancui/DKL.
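As background only, the sketch below computes the KL distillation loss between temperature-softened teacher and student distributions and checks the textbook identity KL(p‖q) = CE(p, q) − H(p), which is why the KL loss already contains a soft-label cross-entropy term. It assumes plain softmax outputs and does not implement the paper's wMSE/CE decomposition (DKL) or the GKL loss; the authors' repository linked above is the reference for those.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with an optional distillation temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) = sum_i p_i * log(p_i / q_i); terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def soft_cross_entropy(p: Sequence[float], q: Sequence[float]) -> float:
    """Cross-entropy with soft labels p: -sum_i p_i * log(q_i)."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy H(p) in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


# Illustrative logits (made up for the example).
teacher = softmax([2.0, 1.0, 0.1], temperature=2.0)
student = softmax([1.5, 0.5, 0.3], temperature=2.0)
kl = kl_divergence(teacher, student)
# Textbook identity, not the paper's DKL decomposition: H(teacher) does not
# depend on the student, so KL and soft-label CE share the same student gradient.
assert abs(kl - (soft_cross_entropy(teacher, student) - entropy(teacher))) < 1e-12
```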

2506.09046 2026-06-18 cs.LG cs.AI cs.MA (revised version)

Self-Evolving Multi-Agent Systems via Textual Backpropagation

Xiaowen Ma, Yunpu Ma, Chenyang Lin, Sikuan Yan, Jinhe Bi, Zixuan Cao, Yijun Tian, Volker Tresp, Hinrich Schuetze

Affiliations: Ludwig Maximilian University of Munich; Technical University of Munich; Munich Center for Machine Learning; University of Notre Dame

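AI summary: Proposes the Agentic Neural Network framework, which models multi-agent collaboration as a layered neural network: a forward phase decomposes tasks, and a backward phase propagates feedback so that agents' roles, prompts, and collaboration self-evolve, outperforming existing methods on seven benchmark datasets.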

Abstract

Leveraging multiple Large Language Models (LLMs) has proven effective for addressing complex, high-dimensional tasks, but current approaches often rely on static, manually engineered multi-agent configurations. To overcome these constraints, we present the Agentic Neural Network (ANN), a framework that conceptualizes multi-agent collaboration as a layered neural network architecture. In this design, each agent operates as a node, and each layer forms a cooperative team focused on a specific subtask. Our framework follows a two-phase optimization strategy: (1) Forward Phase - Drawing inspiration from neural network forward passes, tasks are dynamically decomposed into subtasks, and cooperative agent teams with suitable aggregation methods are constructed layer by layer. (2) Backward Phase - Mirroring backpropagation, we refine both global and local collaboration through iterative feedback, allowing agents to self-evolve their roles, prompts, and coordination. This neuro-symbolic approach enables our framework to create new or specialized agent teams post-training, delivering notable gains in accuracy and adaptability. Across seven benchmark datasets, our work surpasses leading multi-agent baselines under the same configurations, showing consistent performance improvements.

2507.01414 2026-06-18 cs.LG (revised version)

Decomposing Prediction Mechanisms for In-Context Recall

Sultan Daniels, Dylan Davis, Dhruv Gautam, Wentinn Liao, Gireeja Ranade, Anant Sahai

Affiliations: University of California, Berkeley; University of Pennsylvania

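AI summary: Using a new toy problem that combines continuous in-context learning with discrete associative recall, finds that transformers solve in-context recall with two separate mechanisms that have different learning dynamics: one uses discrete symbolic labels for associative recall, and the other makes Bayesian-style predictions from the previous token and the context.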

Comments: 45 pages, 47 figures, 2 tables

Abstract

We introduce a new family of toy problems that combine features of linear-regression-style continuous in-context learning (ICL) with discrete associative recall. We pretrain transformer models on sample traces from this toy, specifically symbolically-labeled interleaved state observations from randomly drawn linear deterministic dynamical systems. We study whether the transformer models can recall the state of a sequence previously seen in their context when prompted to do so with the corresponding in-context label. Taking a closer look at this task, it becomes clear that the model must perform two functions: (1) identify which system's state should be recalled and apply that system to its last seen state, and (2) continue applying the correct system to predict the subsequent states. Training dynamics reveal that the first capability emerges well into a model's training. Surprisingly, the second capability, of continuing the prediction of a resumed sequence, develops much earlier. Via out-of-distribution experiments, and a mechanistic analysis on model weights via edge pruning, we find that next-token prediction for this toy problem involves at least two separate mechanisms. One mechanism uses the discrete symbolic labels to do the associative recall required to predict the start of a resumption of a previously seen sequence. The second mechanism, which is largely agnostic to the discrete symbolic labels, performs a "Bayesian-style" prediction based on the previous token and the context. These two mechanisms have different learning dynamics. To confirm that this multi-mechanism (manifesting as separate phase transitions) phenomenon is not just an artifact of our toy setting, we used OLMo training checkpoints on an ICL translation task to see a similar phenomenon: a decisive gap in the emergence of first-task-token performance vs second-task-token performance.
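A minimal sketch of how such interleaved, symbolically labeled traces might be generated. The details are assumptions for illustration, not the authors' setup: small dense states, random system matrices rescaled to keep trajectories bounded, and every observation tagged with its system index as the "label". The key property it reproduces is that a system that reappears resumes from where its previous segment stopped, which is what makes in-context recall necessary.

```python
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def mat_vec(a: Matrix, x: Vector) -> Vector:
    """Matrix-vector product for small dense matrices stored as lists."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def random_system(dim: int, rng: random.Random) -> Matrix:
    """Random deterministic linear system x_{t+1} = A x_t.

    Assumption: entries are uniform in [-1, 1], then rescaled so the maximum
    absolute row sum is 0.95, which keeps trajectories bounded.
    """
    a = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(dim)]
    norm = max(sum(abs(v) for v in row) for row in a)
    return [[v * 0.95 / norm for v in row] for row in a]


def make_trace(
    num_systems: int, num_segments: int, seg_len: int, dim: int, seed: int = 0
) -> Tuple[List[Matrix], List[Tuple[int, Vector]]]:
    """Hypothetical generator: interleave segments; each item is (label, state)."""
    rng = random.Random(seed)
    systems = [random_system(dim, rng) for _ in range(num_systems)]
    states = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_systems)]
    trace: List[Tuple[int, Vector]] = []
    for _ in range(num_segments):
        k = rng.randrange(num_systems)  # which system this segment shows
        for _ in range(seg_len):
            trace.append((k, states[k]))
            # Advance only system k; the others stay paused until they reappear.
            states[k] = mat_vec(systems[k], states[k])
    return systems, trace
```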

2601.14968 2026-06-18 cs.LG cs.AI (revised version)

InstructTime++: Time Series Classification with Multimodal Language Modeling via Implicit Feature Enhancement

Mingyue Cheng, Xiaoyu Tao, Huajian Zhang, Qi Liu, Zhiding Liu, Yucong Luo, Yiheng Chen, Enhong Chen

Affiliations: State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China

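AI summary: Recasts time series classification as a multimodal generative task, bridges the modality gap with a discretization module and an alignment projection layer, and uses implicit feature modeling to improve language model performance.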

Abstract

Most existing time series classification methods adopt a discriminative paradigm that maps input sequences directly to one-hot encoded class labels. While effective, this paradigm struggles to incorporate contextual features and fails to capture semantic relationships among classes. To address these limitations, we propose InstructTime, a novel framework that reformulates time series classification as a multimodal generative task. Specifically, continuous numerical sequences, contextual textual features, and task instructions are treated as multimodal inputs, while class labels are generated as textual outputs by tuned language models. To bridge the modality gap, InstructTime introduces a time series discretization module that converts continuous sequences into discrete temporal tokens, together with an alignment projection layer and a generative self-supervised pre-training strategy to enhance cross-modal representation alignment. Building upon this framework, we further propose InstructTime++, which extends InstructTime by incorporating implicit feature modeling to compensate for the limited inductive bias of language models. InstructTime++ leverages specialized toolkits to mine informative implicit patterns from raw time series and contextual inputs, including statistical feature extraction and vision-language-based image captioning, and translates them into textual descriptions for seamless integration. Extensive experiments on multiple benchmark datasets demonstrate the superior performance of InstructTime++.
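The abstract describes the discretization module only at a high level. As a simplified stand-in, the sketch below maps a continuous series to discrete token ids by z-normalizing it and bucketing values into equal-width bins. This uniform quantizer and the `<ts_k>` token names are assumptions for illustration, not the tokenizer used in InstructTime.

```python
import statistics
from typing import List, Sequence


def discretize(series: Sequence[float], num_bins: int = 16, clip: float = 3.0) -> List[int]:
    """Map each value to a bin id in [0, num_bins) after z-normalization.

    Values are clipped to [-clip, clip] standard deviations so that outliers
    share the edge bins instead of stretching the grid.
    """
    mean = statistics.fmean(series)
    std = statistics.pstdev(series) or 1.0  # constant series: avoid division by zero
    width = 2 * clip / num_bins
    tokens = []
    for v in series:
        z = max(-clip, min(clip, (v - mean) / std))
        tokens.append(min(int((z + clip) / width), num_bins - 1))
    return tokens


def to_text(tokens: Sequence[int]) -> str:
    """Render token ids as hypothetical pseudo-words for a language model vocabulary."""
    return " ".join(f"<ts_{t}>" for t in tokens)
```

A fixed grid like this is easy to inspect; a learned codebook would instead adapt the bins to the data, at the cost of an extra training stage.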

2601.20361 2026-06-18 cs.LG cs.NA math.NA (revised version)

TINNs: Time-Induced Neural Networks for Solving Time-Dependent PDEs

Chen-Yang Dai, Che-Chia Chang, Te-Sheng Lin, Ming-Chih Lai, Chieh-Hsin Lai

Affiliations: Department of Applied Mathematics, National Yang Ming Chiao Tung University, Hsinchu 30010, Taiwan; Institute of Artificial Intelligence Innovation, National Yang Ming Chiao Tung University, Hsinchu 30010, Taiwan; National Center for Theoretical Sciences, National Taiwan University, Taipei 10617, Taiwan

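AI summary: Proposes Time-Induced Neural Networks (TINNs), which parameterize network weights as a function of time so that the spatial representation evolves over time; combined with Levenberg-Marquardt optimization, they achieve up to 4 times lower relative error and 10 times faster convergence on time-dependent PDEs.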

Comments: Accepted at ICML 2026. Camera-ready version. Includes appendix

Abstract

Physics-informed neural networks (PINNs) solve time-dependent partial differential equations (PDEs) by learning a mesh-free, differentiable solution that can be evaluated anywhere in space and time. However, standard space-time PINNs take time as an input but reuse a single network with shared weights across all times, forcing the same features to represent markedly different dynamics. This coupling degrades error performance and can destabilize training when enforcing PDE, boundary, and initial constraints jointly. We propose Time-Induced Neural Networks (TINNs), a novel architecture that parameterizes the network weights as a learned function of time, allowing the effective spatial representation to evolve over time while maintaining shared structure. The resulting formulation naturally yields a nonlinear least-squares problem, which we optimize efficiently using a Levenberg-Marquardt method. Experiments on various time-dependent PDEs show up to 4 times improved relative error and 10 times faster convergence compared to PINNs and strong baselines.
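The abstract states that TINN training yields a nonlinear least-squares problem solved with a Levenberg-Marquardt method. As a generic illustration of that optimizer (not of the TINN architecture), the sketch below fits a two-parameter model y = a·exp(b·x) with the damped Gauss-Newton step (JᵀJ + λI)δ = −Jᵀr, shrinking λ after an accepted step and growing it after a rejected one. The model, data, and damping schedule are assumptions chosen for brevity.

```python
import math
from typing import List, Sequence, Tuple


def residuals(theta: Tuple[float, float], xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """r_i = a * exp(b * x_i) - y_i for the toy model."""
    a, b = theta
    return [a * math.exp(b * x) - y for x, y in zip(xs, ys)]


def jacobian(theta: Tuple[float, float], xs: Sequence[float]) -> List[Tuple[float, float]]:
    """Rows (dr_i/da, dr_i/db) = (exp(b x_i), a x_i exp(b x_i))."""
    a, b = theta
    return [(math.exp(b * x), a * x * math.exp(b * x)) for x in xs]


def levenberg_marquardt(xs, ys, theta=(1.0, 0.0), lam=1e-2, iters=200, tol=1e-15):
    """Minimize sum(r_i^2) with an adaptive damping factor lam."""
    cost = sum(r * r for r in residuals(theta, xs, ys))
    for _ in range(iters):
        r = residuals(theta, xs, ys)
        jac = jacobian(theta, xs)
        # Damped normal equations (J^T J + lam * I) delta = -J^T r, a 2x2 system.
        g0 = sum(j[0] * ri for j, ri in zip(jac, r))
        g1 = sum(j[1] * ri for j, ri in zip(jac, r))
        h00 = sum(j[0] * j[0] for j in jac) + lam
        h01 = sum(j[0] * j[1] for j in jac)
        h11 = sum(j[1] * j[1] for j in jac) + lam
        det = h00 * h11 - h01 * h01  # > 0 whenever lam > 0
        d0 = (-g0 * h11 + g1 * h01) / det  # Cramer's rule
        d1 = (-g1 * h00 + g0 * h01) / det
        candidate = (theta[0] + d0, theta[1] + d1)
        new_cost = sum(rr * rr for rr in residuals(candidate, xs, ys))
        if new_cost < cost:
            # Accept: behave more like Gauss-Newton next time.
            theta, cost, lam = candidate, new_cost, lam * 0.3
        else:
            # Reject: behave more like small-step gradient descent.
            lam *= 10.0
        if cost < tol:
            break
    return theta, cost
```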

2604.13082 2026-06-18 cs.LG cs.AI (revised version)

The Long Delay to Arithmetic Generalization: When Learned Representations Outrun Behavior

Laura Gomezjurado Gonzalez

Affiliations: Stanford University

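AI summary: Investigates why arithmetic generalization is delayed in transformers, finding that the encoder learns the relevant structure early while a decoder bottleneck causes the delay; transplanting or freezing a trained encoder speeds up generalization, and the choice of numeral base affects how hard the task is to learn.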

Comments: 19 pages, 10 figures

Abstract

Grokking in transformers trained on algorithmic tasks is characterized by a long delay between training-set fit and abrupt generalization, but the source of that delay remains poorly understood. In encoder-decoder arithmetic models, we argue that this delay reflects limited access to already learned structure rather than failure to acquire that structure in the first place. We study one-step Collatz prediction and find that the encoder organizes parity and residue structure within the first few thousand training steps, while output accuracy remains near chance for tens of thousands more. Causal interventions support the decoder bottleneck hypothesis. Transplanting a trained encoder into a fresh model accelerates grokking by 2.75 times, while transplanting a trained decoder actively hurts. Freezing a converged encoder and retraining only the decoder eliminates the plateau entirely and yields 97.6% accuracy, compared to 86.1% for joint training. What makes the decoder's job harder or easier depends on numeral representation. Across 15 bases, those whose factorization aligns with the Collatz map's arithmetic (e.g., base 24) reach 99.8% accuracy, while binary fails completely because its representations collapse and never recover. The choice of base acts as an inductive bias that controls how much local digit structure the decoder can exploit, producing large differences in learnability from the same underlying task.
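To make the task concrete, the sketch below builds (input, target) digit sequences for one-step Collatz prediction in an arbitrary base. The tokenization (one token per digit, most significant first) is an assumption; the paper's exact encoding may differ. It also exposes one piece of local digit structure: in any even base, including 24 and binary, the parity of n is readable from the last digit alone, whereas in an odd base it depends on every digit. Since the abstract attributes binary's failure to representational collapse, parity visibility alone does not explain the reported results.

```python
from typing import List, Tuple


def collatz_step(n: int) -> int:
    """One step of the Collatz map: n / 2 if n is even, 3n + 1 if n is odd."""
    if n < 1:
        raise ValueError("defined here for positive integers only")
    return n // 2 if n % 2 == 0 else 3 * n + 1


def to_digits(n: int, base: int) -> List[int]:
    """Most-significant-first digits of a non-negative integer in the given base."""
    if base < 2:
        raise ValueError("base must be at least 2")
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        n, d = divmod(n, base)
        digits.append(d)
    return digits[::-1]


def make_example(n: int, base: int) -> Tuple[List[int], List[int]]:
    """Hypothetical (input digits, target digits) pair for one training example."""
    return to_digits(n, base), to_digits(collatz_step(n), base)
```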

2605.11287 2026-06-18 cs.LG cs.AI (revised version)

Beyond Similarity: Temporal Operator Attention for Time Series Analysis

Jevon Twitty, Vinh Pham, Nitiwith Rotchanarak, Viresh Pati, Yubin Kim, Shihao Yang, Jiecheng Lu

Affiliations: Georgia Institute of Technology

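AI summary: Proposes Temporal Operator Attention (TOA), which augments attention with learnable operators so that it can represent the signed and oscillatory transformations found in time series data, improving forecasting, anomaly detection, and classification.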

Abstract

A persistent paradox in time-series forecasting is that structurally simple MLP and linear models often outperform high-capacity Transformers. We argue that this gap arises from a mismatch in the sequence-modeling primitive: while many time-series dynamics are governed by global temporal operators (e.g., filtering and harmonic structure), standard attention forms each output as a convex combination of inputs. This restricts its ability to represent signed and oscillatory transformations that are fundamental to temporal signal processing. We formalize this limitation as a simplex-constrained mixing bottleneck in softmax attention, which becomes especially restrictive for operator-driven time-series tasks. To address this, we propose Temporal Operator Attention (TOA), a framework that augments attention with explicit, learnable sequence-space operators, enabling direct signed mixing across time while preserving input-dependent adaptivity. To make dense N×N operators practical, we introduce Stochastic Operator Regularization, a high-variance dropout mechanism that stabilizes training and prevents trivial memorization. Across forecasting, anomaly detection, and classification benchmarks, TOA consistently improves performance when integrated into standard backbones such as PatchTST and iTransformer, with particularly strong gains in reconstruction-heavy tasks. These results suggest that explicit operator learning is a key ingredient for effective time-series modeling.
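The simplex constraint is easy to see numerically. In the sketch below, softmax mixing can only produce outputs inside [min(x), max(x)], while an explicit N×N operator, here a fixed first-difference matrix standing in for a learned one, produces signed outputs outside that range. This illustrates the limitation described above; it is not an implementation of TOA or of Stochastic Operator Regularization.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: non-negative weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_mix(scores: Sequence[Sequence[float]], values: Sequence[float]) -> List[float]:
    """Attention-style mixing: each output is a convex combination of the values."""
    return [sum(w * v for w, v in zip(softmax(row), values)) for row in scores]


def operator_mix(operator: Sequence[Sequence[float]], values: Sequence[float]) -> List[float]:
    """Operator mixing: each output is an arbitrary signed linear combination."""
    return [sum(w * v for w, v in zip(row, values)) for row in operator]


def first_difference(n: int) -> List[List[float]]:
    """N x N operator computing x_t - x_{t-1} (and x_0 at t = 0)."""
    return [
        [1.0 if j == i else (-1.0 if j == i - 1 else 0.0) for j in range(n)]
        for i in range(n)
    ]
```

For x = [3, 1, 4, 1], any softmax mixing stays within [1, 4], whereas the first-difference operator returns [3, −2, 3, −3].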

2606.01249 2026-06-18 cs.LG cs.CL (revised version)

Trust Region On-Policy Distillation

Xingrun Xing, Haoqing Wang, Boyan Gao, Ziheng Li, Yehui Tang

Affiliations: Samsung Research; University of Oxford; Peking University

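AI summary: Proposes Trust Region On-Policy Distillation (TrOPD), which uses credit assignment strategies and trust-region learning to address the training instability caused by teacher-student distribution mismatch, outperforming existing methods on mathematical reasoning, code generation, and general benchmarks.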

Abstract

On-Policy Distillation (OPD) is a fundamental technique for efficient post-training of large language models (LLMs), with broad applications in agent learning, multi-task enhancement, and model compression. However, OPD training becomes unstable when the teacher and student distributions differ substantially, as teacher supervision on student-generated tokens may yield unreliable policy gradients and even cause optimization failure. This work addresses reliable on-policy token-level supervision through credit assignment strategies, and proposes Trust Region On-Policy Distillation, TrOPD. It features the following characteristics: 1) Trust-Region On-Policy Learning: TrOPD performs OPD only in regions where the teacher provides reliable supervision, mitigating the optimization difficulty of the K1 reverse-KL estimator under distribution mismatch. 2) Outlier Estimation: For outlier regions, we explore gradient clipping, masking, and forward-KL estimation to reduce the adverse effects of unreliable supervision. 3) Off-Policy Guidance: The student continues generation from teacher prefixes and uses forward KL to imitate off-policy guidance, encouraging on-policy exploration toward reliable regions. Experiments show that TrOPD consistently outperforms SoTA OPD baselines, including OPD, EOPD, and REOPOLD, across mathematical reasoning, code generation, and general-domain benchmarks.
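The K1 reverse-KL estimator mentioned above is, for each token y_t sampled from the student, log π_s(y_t) − log π_t(y_t). The sketch below computes that per-token estimate and applies a hypothetical trust-region mask that keeps only tokens to which the teacher assigns at least a threshold probability, one simple way to express "regions where the teacher provides reliable supervision". The threshold rule, its default, and all function names are assumptions; the abstract does not specify TrOPD's actual criterion.

```python
import math
from typing import List, Sequence, Tuple


def k1_reverse_kl(student_logp: Sequence[float], teacher_logp: Sequence[float]) -> List[float]:
    """Per-token K1 estimate of KL(student || teacher) on student-sampled tokens."""
    return [s - t for s, t in zip(student_logp, teacher_logp)]


def trust_region_mask(
    teacher_logp: Sequence[float], min_teacher_logp: float = math.log(0.05)
) -> List[bool]:
    """Hypothetical rule: trust the teacher only where it finds the token plausible."""
    return [t >= min_teacher_logp for t in teacher_logp]


def masked_k1_summary(
    student_logp: Sequence[float],
    teacher_logp: Sequence[float],
    min_teacher_logp: float = math.log(0.05),
) -> Tuple[float, int]:
    """Mean K1 estimate over trusted tokens and the number of trusted tokens.

    In an actual OPD update the per-token values act as (negative) rewards in a
    policy-gradient step; averaging them here only summarizes the masked signal.
    """
    k1 = k1_reverse_kl(student_logp, teacher_logp)
    mask = trust_region_mask(teacher_logp, min_teacher_logp)
    kept = [v for v, m in zip(k1, mask) if m]
    return (sum(kept) / len(kept) if kept else 0.0), len(kept)
```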

2606.06564 2026-06-18 cs.LG cs.AI (revised version)

HAARES: Half-Split Residual Basis Routing for Deep Transformers

WAV: Multi-Resolution Block Residual Routing for Deep Decoder-Only Transformers

Kehan Wang

Affiliations: Chongqing University

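AI summary: Proposes WAV v1, which enhances residual routing by adding directional detail bases (a phase basis and a split basis) to each block; it outperforms existing methods in deep Transformers and reaches lower validation loss on TinyStories and Text8 at 48 layers.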

Comments: 6 pages, 4 figures, 3 tables

AI-translated abstract

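Residual connections are essential for training deep Transformers, but the standard PreNorm residual stream aggregates sublayer updates with fixed unit weights. Recent Attention Residuals replace this fixed accumulation with content-dependent depth routing, and Block Attention Residuals make the mechanism efficient by routing over block-level residual summaries. However, a single block summary stores only the low-frequency total residual displacement within the block, discarding directional structure such as the imbalance between attention and MLP updates and early-versus-late dynamics within the block. We propose WAV v1, a lightweight multi-resolution residual routing method for decoder-only Transformers. Instead of representing each block only by its cumulative residual sum, WAV v1 adds two directional detail bases to each block: a phase basis that contrasts attention and MLP updates, and a split basis that contrasts early and late sublayer updates. These bases are routed together with the standard block summary through the same depth softmax mixer, while negative detail-source initialization and detached RMS matching stabilize training. In character-level TinyStories and Text8 language modeling, WAV v1 shows a clear depth-dependent advantage. Although it is not consistently beneficial at 12 layers, it becomes competitive at 24 layers and outperforms all baselines at 48 layers. At 48 layers, WAV v1 reduces validation loss from 0.4960 to 0.4738 on TinyStories and from 0.9363 to 0.9305 on Text8, with negligible additional parameters. These results suggest that directional residual details, not just block-level sums, matter for scaling residual routing to deeper Transformers.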

Abstract

Block-level residual routing makes learned residual aggregation practical by routing over block summaries, but each summary compresses an ordered sequence of attention and MLP updates into one cumulative vector. We propose HAARES, a lightweight residual basis router that keeps the cumulative block source and adds one half-split detail basis, computed as the difference between first-half and second-half residual updates. The detail basis is RMS-matched and updated online, exposing coarse intra-block trajectory information without dense sublayer-level routing. Across OpenWebText, cross-domain character-level benchmarks, and BPE-tokenized OpenWebText, the empirical pattern is depth-dependent: gains are small or mixed at shallow depth and most reliable in 48-layer models. In the 201M 48-layer setting, HAARES improves over Block AttnRes across all three seeds, while a 453M two-seed probe shows the same direction. Ablations rule out source duplication, random signed details, fixed detail-source biases, or block-count changes alone as explanations for the gains. Cost analysis shows that the method is FLOP-light but not wall-clock-free: it adds memory and routing overhead, yet its relative arithmetic cost is amortized as width grows and earlier convergence can reduce time-to-target.
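A rough sketch of the half-split detail basis as described above: the cumulative block source is the sum of a block's ordered sublayer updates, the detail is the sum of the first half of those updates minus the sum of the second half, and the detail is rescaled to the RMS of the cumulative source. Plain Python lists stand in for hidden-state vectors. The exact RMS-matching formula is an assumption, and the online update and the depth router are omitted.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def vec_sum(vectors: Sequence[Sequence[float]], dim: int) -> Vector:
    """Element-wise sum of equal-length vectors."""
    return [sum(v[i] for v in vectors) for i in range(dim)]


def rms(v: Sequence[float]) -> float:
    """Root mean square of a vector."""
    return math.sqrt(sum(x * x for x in v) / len(v))


def block_bases(updates: Sequence[Sequence[float]], eps: float = 1e-8) -> Tuple[Vector, Vector]:
    """Return (cumulative source, RMS-matched half-split detail) for one block.

    updates -- ordered sublayer residual updates in the block, e.g.
               [attn_1, mlp_1, attn_2, mlp_2]; assumed to have even length.
    """
    if len(updates) % 2 != 0:
        raise ValueError("half-split assumes an even number of sublayer updates")
    dim = len(updates[0])
    half = len(updates) // 2
    cumulative = vec_sum(updates, dim)
    first, second = vec_sum(updates[:half], dim), vec_sum(updates[half:], dim)
    detail = [a - b for a, b in zip(first, second)]
    # Assumed RMS matching: rescale the detail to the cumulative source's RMS.
    scale = rms(cumulative) / (rms(detail) + eps)
    return cumulative, [scale * d for d in detail]
```

Both vectors would then be offered to the depth router as candidate sources; when the two halves of a block push in the same direction, the detail vanishes and only the cumulative source carries information.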

2606.02800 2026-06-18 cs.CV cs.AI cs.LG cs.MM cs.RO (revised version)

Cosmos 3: Omnimodal World Models for Physical AI

NVIDIA: Aditi, Niket Agarwal, Arslan Ali, Jon Allen, Martin Antolini, Adeline Aubame, Alisson Azzolini, Junjie Bai, Maciej Bala, Yogesh Balaji, Josh Bapst, Aarti Basant, Mukesh Beladiya, Mohammad Qazim Bhat, Zaid Pervaiz Bhat, Dan Blick, Vanni Brighella, Han Cai, Tiffany Cai, Eric Cameracci, Jiaxin Cao, Yulong Cao, Mark Carlson, Carlos Casanova, Ting-Yun Chang, Yan Chang, Yu-Wei Chao, Prithvijit Chattopadhyay, Roshan Chaudhari, Chieh-Yun Chen, Junyu Chen, Ke Chen, Qizhi Chen, Wenkai Chen, Xiaotong Chen, Yu Chen, An-Chieh Cheng, Click Cheng, Xiu Chia, Jeana Choi, Chaeyeon Chung, Wenyan Cong, Yin Cui, Magdalena Dadela, Nalin Dadhich, Wenliang Dai, Joyjit Daw, Alperen Degirmenci, Rodrigo Vieira Del Monte, Robert Denomme, Sameer Dharur, Marco Di Lucca, Ke Ding, Wenhao Ding, Yifan Ding, Yuzhu Dong, Nicole Drumheller, Yilun Du, Aigul Dzhumamuratova, Aleksandr Efitorov, Hamid Eghbalzadeh, Naomi Eigbe, Imad El Hanafi, Hassan Eslami, Benedikt Falk, Jiaojiao Fan, Jim Fan, Amol Fasale, Sergiy Fefilatyev, Liang Feng, Francesco Ferroni, Sanja Fidler, Xiao Fu, Vikram Fugro, Prashant Gaikwad, TJ Galda, Katelyn Gao, Yihuai Gao, Wenhang Ge, Sreyan Ghosh, Arushi Goel, Vivek Goel, Akash Gokul, Rama Govindaraju, Jinwei Gu, Miguel Guerrero, Elfie Guo, Aryaman Gupta, Siddharth Gururani, Hugo Hadfield, Song Han, Ankur Handa, Zekun Hao, Mohammad Harrim, Ali Hassani, Nathan Hayes-Roth, Yufan He, Chris Helvig, Cyrus Hogg, Madison Huang, Michael Huang, Sophia Huang, Yufan Huang, Jacob Huffman, DeLesley Hutchins, Suneel Indupuru, Boris Ivanovic, Arihant Jain, Joel Jang, Ryan Ji, Yanan Jian, Dongfu Jiang, Jingyi Jin, Atharva Joshi, Nikhilesh Joshi, Pranjali Joshi, Andy Ju, Jaehun Jung, Weiwei Kang, Scott Kassekert, Jan Kautz, Ashna Khetan, Julia Kiczka, Slawek Kierat, Gwanghyun Kim, Kuno Kim, Sunny Kim, Kezhi Kong, Xin Kong, Zhifeng Kong, Tomasz Kornuta, Egor Krivov, Hui Kuang, Saurav Kumar, Chia-Wen Kuo, George Kurian, Wojciech Kutak, JF Lafleche, Himangshu Lahkar, Omar Laymoun, Jayjun Lee, Sanggil Lee, Gabriele Leone, Boyi Li, Freya Li, Jiajun Li, Jinfeng Li, Ling Li, Pengcheng Li, Shangru Li, Tingle Li, Xiaolong Li, Xuan Li, Zhaoshuo Li, Zhiqi Li, Hao Liang, Maosheng Liao, Chen-Hsuan Lin, Tsung-Yi Lin, Ming-Yu Liu, Sifei Liu, Zihan Liu, Hai Loc Lu, Xiangyu Lu, Alice Luo, Ruipu Luo, Wenjie Luo, Jiangran Lyu, Martin Ding Ma, Nic Ma, Qianli Ma, Dawid Majchrowski, Louis Marcoux, Miguel Martin, Qing Miao, Ashkan Mirzaei, Shreyas Misra, Kaichun Mo, Durra Mohsin, Hyejin Moon, Pawel Morkisz, Saeid Motiian, Kirill Motkov, Seungjun Nah, Yashraj Narang, Deepak Narayanan, Thabang Ngazimbi, Julian Ouyang, Shubham Pachori, David Page, Yatian Pang, Sehwi Park, Mahesh Patekar, Mostofa Patwary, Marco Pavone, Trung Pham, Wei Ping, Soha Pouya, Shrimai Prabhumoye, Varun Praveen, Delin Qu, Hesam Rabeti, Morteza Ramezanali, Marilyn Reeb, Xuanchi Ren, Kristen Rumley, Wojciech Rymer, Jun Saito, Yeongho Seol, John Shao, Piyush Shekdar, Tianwei Shen, Humphrey Shi, Min Shi, Stella Shi, Kevin Shih, Mohammad Shoeybi, Mateusz Sieniawski, Shuran Song, Alexander Sotelo, Amir Sotoodeh, Sunil Srinivasa, Vignesh Srinivasakumar, Bartosz Stefaniak, Rahul Heinrich Steiger, Shangkun Sun, Jiaxiang Tang, Shitao Tang, Yangyang Tang, Yue Tang, Tolou Tavakkoli, Kayley Ting, Krzysztof Tomala, Wei-Cheng Tseng, Jibin Varghese, Sergei Vasilev, Thomas Volk, Raju Wagwani, Roger Waleffe, Andrew Z. Wang, Boxiang Wang, Haoxiang Wang, Qiao Wang, Shihao Wang, Shijie Wang, Ting-Chun Wang, Yan Wang, Yu Wang, Rohit Watve, David Wehr, Fangyin Wei, Xinshuo Weng, Jay Zhangjie Wu, Kedi Wu, Hongchi Xia, Summer Xiao, Tianjun Xiao, Kevin Xie, Daguang Xu, Jiashu Xu, Mengyao Xu, Ruqing Xu, Xingqian Xu, Yao Xu, Dinghao Yang, Dong Yang, Hans Yang, Xiaodong Yang, Xuning Yang, Yichu Yang, Yurong You, Zhiding Yu, Hao Yuan, Simon Yuen, Xiaohui Zeng, Pengcuo Zeren, Cindy Zha, Haotian Zhang, Jenny Zhang, Jing Zhang, Liangkai Zhang, Paris Zhang, Shun Zhang, Xuanmeng Zhang, Zhizheng Zhang, Ann Zhao, Yilin Zhao, Yuliya Zhautouskaya, Charles Zhou, Fengzhe Zhou, Shilin Zhu, Yuke Zhu, Dima Zhylko, Artur Zolkowski

Affiliations: NVIDIA

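AI summary: Proposes Cosmos 3, an omnimodal world model built on a unified mixture-of-transformers architecture that jointly processes language, image, video, audio, and action sequences; it reaches a new state of the art on understanding and generation tasks and provides a scalable, general-purpose backbone for embodied agents.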

Abstract

We introduce Cosmos 3, a family of omnimodal world models designed to jointly process and generate language, image, video, audio, and action sequences within a unified mixture-of-transformers architecture. By supporting highly flexible input-output configurations, Cosmos 3 seamlessly unifies critical modalities for Physical AI -- effectively subsuming vision-language models, video generators, world simulators, and world-action models into a single framework. Our evaluation demonstrates that Cosmos 3 establishes a new state-of-the-art across a diverse suite of understanding and generation tasks, demonstrating omnimodal world models as scalable, general-purpose backbones for embodied agents. Our post-trained Cosmos 3 models were ranked as the best open-source Text-to-Image and Image-to-Video models by Artificial Analysis, and the best policy model by RoboArena at the time the technical report was written. To accelerate open research and deployment in Physical AI, we make our code, model checkpoints, curated synthetic datasets, and evaluation benchmark available under the Linux Foundation's OpenMDW-1.1 License at https://github.com/nvidia/cosmos and https://huggingface.co/collections/nvidia/cosmos3. The project website is available at https://research.nvidia.com/labs/cosmos-lab/cosmos3.

2. Representation Learning, Self-Supervised and Contrastive Learning (4 papers)

2406.07775 2026-06-18 cs.LG (revised version)

Self-attention-based non-linear basis transformations for compact latent space modelling of dynamic optical fibre transmission matrices

Yijie Zheng, Robert J. Kilpatrick, David B. Phillips, George S. D. Gordon

Affiliations: Optics and Photonics research group, University of Nottingham, UK; University of Exeter, UK; State Key Laboratory of Extreme Photonics and Instrumentation, College of Optical Science and Engineering, International Research Center for Advanced Photonics, Zhejiang University, Hangzhou, China; Research Center for Humanoid Sensing, Zhejiang Lab, Hangzhou, China

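AI summary: Proposes using self-attention layers to dynamically transform the coordinate representation of fibre matrices into a compact basis that admits low-dimensional representations; basis sparsity (participation ratio 0.01-0.11) and low reconstruction error (<10%) are validated on several datasets.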

Abstract

Multimode optical fibres are hair-thin strands of glass that efficiently transport light. They promise next-generation medical endoscopes that provide unprecedented sub-cellular image resolution deep inside the body. However, confining light to such fibres means that images are inherently scrambled in transit. Conventionally, this scrambling has been compensated by pre-calibrating how a specific fibre scrambles light and solving a stationary linear matrix equation that represents a physical model of the fibre. However, as the technology develops towards real-world deployment, the unscrambling process must account for dynamic changes in the matrix representing the fibre's effect on light, due to factors such as movement and temperature shifts, and non-linearities resulting from the inaccessibility of the fibre tip when inside the body. Such complex, dynamic and nonlinear behaviour is well-suited to approximation by neural networks, but most leading image reconstruction networks rely on convolutional layers, which assume strong correlations between adjacent pixels, a strong inductive bias that is inappropriate for fibre matrices which may be expressed in a range of arbitrary coordinate representations with long-range correlations. We introduce a new concept that uses self-attention layers to dynamically transform the coordinate representations of varying fibre matrices to a basis that admits compact, low-dimensional representations suitable for further processing. We demonstrate the effectiveness of this approach on diverse fibre matrix datasets. We show our models significantly improve the sparsity of fibre bases in their transformed bases with a participation ratio, p, as a measure of sparsity, of between 0.01 and 0.11. Further, we show that these transformed representations admit reconstruction of the original matrices with < 10% reconstruction error, demonstrating the invertibility.
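The abstract measures sparsity with a participation ratio p and reports reconstruction error as a percentage. A minimal NumPy sketch of both quantities follows, assuming the common normalized definition p = (Σ|c_i|²)² / (N Σ|c_i|⁴) and a Frobenius-norm relative error; the paper's exact definitions may differ, and the function names are hypothetical.

```python
import numpy as np


def participation_ratio(coeffs):
    """Normalized participation ratio (assumed definition), in [1/N, 1].

    Small values mean the energy is concentrated in few basis elements
    (a sparse representation); 1 means it is spread evenly.
    """
    energy = np.abs(np.asarray(coeffs).ravel()) ** 2  # also works for complex fields
    n = energy.size
    return float(energy.sum() ** 2 / (n * np.sum(energy ** 2)))


def relative_reconstruction_error(original, reconstructed):
    """Frobenius-norm error of a reconstructed matrix, relative to the original."""
    original = np.asarray(original)
    diff = original - np.asarray(reconstructed)
    return float(np.linalg.norm(diff) / np.linalg.norm(original))


one_hot = np.zeros(100)
one_hot[0] = 1.0
print(participation_ratio(one_hot))        # 0.01
print(participation_ratio(np.ones(100)))   # 1.0
```

Under this definition, p runs from 1/N (all energy in one basis element) to 1 (energy spread evenly), so a 100-element vector with a single nonzero coefficient gives p = 0.01.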

2605.10840 2026-06-18 cs.LG cs.AI q-bio.QM Revised

Clin-JEPA: A Multi-Phase Co-Training Framework for Joint-Embedding Predictive Pretraining on EHR Patient Trajectories

Yixuan Yang, Mehak Arora, Ryan Zhang, Baraa Abed, Junseob Kim, Tilendra Choudhary, Md Hassanuzzaman, Kevin Zhu, Ayman Ali, Chengkun Yang, Alasdair Edward Gent, Victor Moas, Rishikesan Kamaleswaran

Affiliations: Duke University

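AI summary: Proposes Clin-JEPA, whose five-phase pretraining curriculum stably co-trains an encoder and a latent trajectory predictor, addressing the instability of joint-embedding predictive pretraining on EHR data; a single backbone performs strongly across multiple downstream tasks.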

Comments 16 pages, 4 figures, 8 tables. Code: https://github.com/YeungYathin/Clin-JEPA

Abstract

We present Clin-JEPA, a multi-phase co-training framework for joint-embedding predictive (JEPA) pretraining on EHR patient trajectories. JEPA architectures have enabled latent-space planning in robotics and high-quality representation learning in vision, but extending the paradigm to EHR data -- to obtain a single backbone that simultaneously forecasts patient trajectories and serves diverse downstream risk-prediction tasks without per-task fine-tuning -- remains an open challenge. Existing JEPA frameworks either discard the predictor after pretraining (I-JEPA, V-JEPA) or train it on a frozen pretrained encoder (V-JEPA 2-AC), leaving the encoder unaware of the rollout signal that the retained predictor must use at inference; co-training the encoder and predictor under a shared JEPA prediction objective would supply this grounding, but naïve co-training is unstable, with representation collapse and online/target drift causing autoregressive rollout to diverge. Clin-JEPA's five-phase pretraining curriculum -- predictor warmup, joint refinement, EMA target alignment, hard sync, and predictor finalization -- addresses each failure mode by phase, stably co-training a Qwen3-8B-based encoder and a 92M-parameter latent trajectory predictor. On MIMIC-IV ICU data, three independent evaluations support the framework: (1) latent $\ell_1$ rollout drift uniquely converges ($-$15.7%) over 48-hour horizons while baselines and ablations diverge (+3% to +4951%); (2) the encoder learns a clinically discriminative latent geometry (deteriorating-patient cohorts displace 4.83$\times$ further than stable patients in latent space, vs $\leq$2.62$\times$ for baseline encoders); (3) a single backbone outperforms strong tabular and sequence baselines on multi-task downstream evaluation. Clin-JEPA achieves mean AUROC 0.851 on ICareFM EEP and 0.883 on 8 binary risk tasks (+0.038 and +0.041 vs baseline average).
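Two of the curriculum phases named above, EMA target alignment and hard sync, concern how the target encoder tracks the online encoder. A generic JEPA-style sketch of those two updates follows, assuming parameters stored as a dict of NumPy arrays and a momentum tau = 0.996; neither the representation nor the value comes from the paper, and this is not Clin-JEPA's code.

```python
import numpy as np


def ema_update(target, online, tau=0.996):
    """EMA step: move each target parameter a fraction (1 - tau) toward the online one."""
    for name, weights in online.items():
        target[name] = tau * target[name] + (1.0 - tau) * weights
    return target


def hard_sync(target, online):
    """Hard sync: overwrite the target with a copy of the online parameters."""
    for name, weights in online.items():
        target[name] = weights.copy()
    return target


online = {"encoder.w": np.ones(4)}
target = {"encoder.w": np.zeros(4)}
ema_update(target, online, tau=0.9)   # target moves 10% of the way: about 0.1 each
hard_sync(target, online)             # target now equals online exactly
```

A tau closer to 1 makes the target drift more slowly; a hard sync is the tau = 0 limit of the same update.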

2606.12629 2026-06-18 cs.LG cs.AI Revised

Bag of Dims: Training-Free Mechanistic Interpretability via Dimension-Level Sign Patterns

Varun Reddy Nalagatla

Affiliations: Amazon Web Services

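AI summary: Proposes the Bag of Dims framework, showing that the standard basis of transformer hidden states already serves as a training-free feature basis in which semantics are encoded by per-dimension sign patterns; the framework is validated across seven language, vision, and audio models.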

Comments 22 pages, 5 figures, 27 tables

Abstract

We show the standard basis of transformer hidden states already provides a training-free, architecture-general feature basis. Individual dimensions encode semantic content via their signs (+/-1) and confidence via their magnitudes, acting as independent binary registers; a feature is a subset of dimensions with a consistent sign pattern, read by counting sign agreements with no learned rotation. We validate this Bag of Dims framework across seven models spanning language (Qwen 3.5-4B, Gemma 3-4B, Mistral 7B, Qwen3-32B), vision (DINOv2, ViT-Base), and audio (AST). Signs alone carry predictive content: unit-magnitude sign patterns preserve 60-93% top-5 next-token accuracy through the LM head, and decoder-free Hamming scoring reaches 80-90% top-4096. From a single-token cache (one forward pass per token, no context, no labels), we detect 175 categories at AUC 0.97-0.99 by sign agreement; a trained probe adds only +0.018 AUC and converges to axis-aligned weights. These features are causally operative: they survive the K/V attention projections, trace to the FFN neuron coalitions that write them (random-weight controls never reproduce this), and flipping a feature's signs during the live forward pass suppresses its concept across four language models, magnitude-matched and concept-specific. Dimensions stay independent throughout (pairwise mutual information below 0.006 bits). The structure is not specific to language: the same per-dimension signs appear in self-supervised vision (DINOv2, 9/12 ImageNet superclasses), supervised vision (ViT-Base, 11/12), and audio (AST, 50/50 ESC-50 categories), so it reflects transformer training in general, not the language-modeling objective. The standard basis already suffices for feature reading at one forward pass, no optimization, no GPU-days. The open problem shifts from finding the right rotation to cataloging what each dimension encodes.
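The abstract reads a feature by counting sign agreements over a subset of dimensions, with no learned rotation. A minimal NumPy sketch of that reading step follows; the consistency threshold, the function names, and the use of a plain mean of signs to select consistent dimensions are assumptions for illustration, not the author's procedure.

```python
import numpy as np


def find_feature(anchor_states, min_consistency=0.9):
    """Pick dimensions whose sign is (nearly) the same across anchor hidden states.

    anchor_states: (n_anchors, d) array of hidden states for one concept.
    A mean sign of 0.9 means 95% of anchors agree on that dimension's sign.
    Returns the selected dimension indices and their majority signs (+1/-1).
    """
    mean_sign = np.sign(anchor_states).mean(axis=0)
    dims = np.flatnonzero(np.abs(mean_sign) >= min_consistency)
    return dims, np.sign(mean_sign[dims])


def sign_agreement(hidden, dims, signs):
    """Fraction of the feature's dimensions whose sign in `hidden` matches."""
    return float(np.mean(np.sign(hidden[dims]) == signs))


def flip_feature(hidden, dims):
    """Negate the feature's dimensions, keeping every magnitude unchanged."""
    out = hidden.copy()
    out[dims] *= -1
    return out


anchors = np.array([[1.0, -2.0, 0.5], [3.0, -1.0, -0.5]])
dims, signs = find_feature(anchors)        # dims [0, 1], signs [+1, -1]
h = np.array([2.0, -0.1, 9.0])
print(sign_agreement(h, dims, signs))                       # 1.0
print(sign_agreement(flip_feature(h, dims), dims, signs))   # 0.0
```

Flipping signs while keeping magnitudes mirrors, in miniature, the magnitude-matched sign-flip intervention the abstract describes.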

2603.11417 2026-06-18 cs.CV cs.LG Revised

Zero-Shot Cross-City Generalization in End-to-End Autonomous Driving: Self-Supervised versus Supervised Representations

Fatemeh Naeinian, Ali Hamza, Haoran Zhu, Anna Choromanska

Affiliations: Department of Electrical and Computer Engineering, NYU Tandon School of Engineering

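AI summary: Studies how end-to-end autonomous driving models generalize under zero-shot cross-city transfer. In open-loop evaluation, some self-supervised backbones (among I-JEPA, DINOv2, and MAE) substantially reduce displacement and collision degradation relative to a supervised backbone; in closed-loop evaluation, self-supervised pretraining improves average out-of-distribution PDMS in several single-city training settings.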

Abstract

End-to-end autonomous driving models are typically trained on multi-city datasets using supervised ImageNet-pretrained backbones, yet their ability to generalize to unseen cities remains largely unexamined. When training and evaluation data are geographically mixed, models may implicitly rely on city-specific cues, masking failure modes that would occur under real-world domain shifts when generalizing to new locations. In this work, we formulate zero-shot cross-city transfer as a controlled representation-level stress test for end-to-end autonomous driving and ask how visual pretraining affects transfer behavior under geographic domain shift. We conduct a comprehensive study by integrating self-supervised backbones I-JEPA, DINOv2, and MAE into planning frameworks. We evaluate performance under strict geographic splits on nuScenes in the open-loop setting and on NAVSIM in the closed-loop evaluation protocol. Our experiments reveal a substantial generalization gap when transferring models across cities with different road topologies, traffic conventions, and visual environments. In open-loop evaluation, a supervised backbone exhibits severe degradation when transferring between cities, yet some domain-specific self-supervised methods can substantially reduce both displacement and collision degradation. In closed-loop evaluation, self-supervised pretraining improves average out-of-distribution PDMS in several single-city training settings. Our results provide empirical evidence that representation learning influences the robustness of cross-city planning and motivate zero-shot geographic transfer as an important stress test for evaluating end-to-end autonomous driving systems.

3. Reinforcement Learning and Sequential Decision-Making (5 papers)

2507.17786 2026-06-18 cs.LG Revised

Reinforcement Learning for Accelerated Aerodynamic Shape Optimisation

Florian Sobieczky, Alfredo Lopez, Erika Dudkin, Christopher Lackner, Matthias Hochsteger, Bernhard Scheichl, Helmut Sobieczky

Affiliations: Software Competence Center Hagenberg (SCCH); Institut für Strömungsmechanik und Wärmeübertragung, TU Wien; CERBSim GmbH

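AI summary: Proposes a reinforcement-learning-based adaptive optimization algorithm, a surrogate-based, actor-critic policy-evaluation MCMC approach that temporarily freezes some parameters to reduce dimensionality and speed up aerodynamic shape optimization; on a simple fluid-dynamics problem, the method supports interpretation via feature-importance scoring.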

Abstract

We introduce a reinforcement learning (RL) based adaptive optimization algorithm for aerodynamic shape optimization focused on dimensionality reduction. The form in which RL is applied here is that of a surrogate-based, actor-critic policy evaluation MCMC approach allowing for temporal 'freezing' of some of the parameters to be optimized. The goals are to minimize computational effort, and to use the observed optimization results for interpretation of the discovered extrema in terms of their role in achieving the desired flow-field. By a sequence of local optimized parameter changes around intermediate CFD simulations acting as ground truth, it is possible to speed up the global optimization if (a) the local neighbourhoods of the parameters in which the changed parameters must reside are sufficiently large to compete with the grid-sized steps and its large number of simulations, and (b) the estimates of the rewards and costs on these neighbourhoods necessary for a good step-wise parameter adaption are sufficiently accurate. We give an example of a simple fluid-dynamical problem on which the method allows interpretation in the sense of a feature importance scoring.
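The central mechanism here is an MCMC search that temporarily "freezes" some parameters, so each step explores only a lower-dimensional subspace. A minimal Metropolis-Hastings sketch of that idea follows, with a cheap analytic objective standing in for the CFD-based surrogate; the actor-critic policy evaluation described in the paper is not reproduced, and the step size, temperature, and function names are assumptions.

```python
import numpy as np


def mh_with_frozen(objective, x0, free_mask, n_steps=2000, step=0.1, temp=0.05, seed=0):
    """Metropolis-Hastings search that perturbs only the unfrozen parameters.

    Targets the density exp(-objective(x) / temp); frozen coordinates keep
    their value from x0. Returns the best point visited and its objective.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    free = np.asarray(free_mask, dtype=bool)
    fx = objective(x)
    best_x, best_f = x.copy(), fx
    for _ in range(n_steps):
        proposal = x.copy()
        proposal[free] += step * rng.normal(size=int(free.sum()))
        fp = objective(proposal)
        # Always accept improvements; accept worse points with prob exp(-df / temp).
        if fp <= fx or rng.random() < np.exp(-(fp - fx) / temp):
            x, fx = proposal, fp
            if fx < best_f:
                best_x, best_f = x.copy(), fx
    return best_x, best_f


def toy_objective(x):
    """Hypothetical stand-in for a CFD-based surrogate cost."""
    return float(np.sum((x - np.array([1.0, -2.0, 0.5])) ** 2))


best_x, best_f = mh_with_frozen(toy_objective, np.zeros(3), free_mask=[True, True, False])
```

Freezing the third coordinate leaves it at its starting value, so the search runs in two dimensions; a fuller version would also decide when to freeze or release each parameter, and how the paper makes that choice is not reproduced here.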

2604.03208 2026-06-18 cs.LG Revised

Hierarchical Planning with Latent World Models

Wancong Zhang, Basile Terver, Artem Zholus, Soham Chitnis, Harsh Sutaria, Mido Assran, Randall Balestriero, Amir Bar, Adrien Bardes, Yann LeCun, Nicolas Ballas

Affiliations: FAIR at Meta; New York University; Mila - Québec AI Institute; Brown University

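AI summary: Proposes HWM, which performs hierarchical model predictive control with latent world models at multiple temporal scales linked by latent matching, addressing the failure of single-level planning and the exponential cost of naive search on long-horizon tasks.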

Abstract

World models are a promising path to zero-shot embodied control through planning. However, existing world model planners struggle on long-horizon, multi-stage tasks: prediction errors compound and naive search is exponential in the planning horizon. Hierarchy mitigates both by decomposing tasks into shorter, tractable subproblems; yet prior hierarchical approaches either amortize control into task-specific policies (hierarchical RL) or assume low-dimensional states and known dynamics (classical hierarchical MPC). We present Hierarchical Planning with Latent World Models (HWM), an architecture and planning paradigm for hierarchical model predictive control (MPC) directly on visual world models trained solely via next-latent prediction. HWM learns world models at multiple temporal scales within a shared latent space, so predictions from the long-horizon model serve as subgoals for the short-horizon model via latent matching, without task-specific rewards, skill learning, or hierarchical policies. To keep long-horizon search tractable, HWM learns an action encoder that compresses primitive action chunks into latent macro-actions. On real-world Franka manipulation, HWM solves pick-and-place from a single goal image at 70% success vs. 0% for single-level planning. Across simulated push manipulation and maze navigation, HWM consistently improves performance on long-horizon tasks while requiring up to 3x less planning compute.
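HWM plans at two temporal scales: long-horizon predictions act as subgoal latents, and a short-horizon planner searches for actions whose predicted latent matches the next subgoal. The sketch below illustrates only that low-level latent-matching step, chained over a fixed list of subgoals, using random-shooting MPC on a toy linear latent model; the dynamics, the choice of random shooting (the paper may use a different optimizer), and all names are assumptions.

```python
import numpy as np


def plan_to_subgoal(z0, subgoal, step_model, horizon=5, n_samples=256, rng=None):
    """Random-shooting MPC: return the sampled action chunk whose predicted
    rollout ends closest (in latent L2 distance) to the subgoal latent."""
    if rng is None:
        rng = np.random.default_rng(0)
    actions = rng.uniform(-1.0, 1.0, size=(n_samples, horizon, z0.size))
    z = np.repeat(z0[None, :], n_samples, axis=0)
    for k in range(horizon):
        z = step_model(z, actions[:, k])
    cost = np.linalg.norm(z - subgoal, axis=1)  # latent-matching cost
    return actions[np.argmin(cost)]


def hierarchical_plan(z0, subgoals, step_model, **kwargs):
    """Chain low-level plans through a sequence of subgoal latents."""
    z, chunks = z0.copy(), []
    for goal in subgoals:
        chunk = plan_to_subgoal(z, goal, step_model, **kwargs)
        for a in chunk:                      # execute the chosen chunk in the model
            z = step_model(z[None, :], a[None, :])[0]
        chunks.append(chunk)
    return z, chunks


def toy_step(z, a):
    """Hypothetical latent dynamics: each action nudges the latent."""
    return z + 0.1 * a


subgoals = [np.array([0.15, 0.15]), np.array([0.3, 0.3])]
z_final, chunks = hierarchical_plan(np.zeros(2), subgoals, toy_step)
```

Splitting the route into subgoals keeps each search short, which is the motivation the abstract gives for hierarchy.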

2605.22142 2026-06-18 cs.LG cs.AI Revised

Short-Term-to-Long-Term Memory Transfer for Knowledge Graphs under Partial Observability

Taewoon Kim, Vincent François-Lavet, Michael Cochez

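AI summary: Studies short-term-to-long-term memory transfer for knowledge-graph memory under partial observability, casting it as a neuro-symbolic value-based decision problem in which the agent decides whether to keep or drop each observed triple before long-term insertion; learned transfer decisions outperform symbolic and neural baselines on the RoomKG benchmark.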

Abstract

Reinforcement learning under partial observability requires deciding what information to retain, yet most memory-based approaches do not explicitly model short-term-to-long-term transfer of symbolic observations. We study this transfer process in a temporal knowledge-graph memory setting and cast it as a neuro-symbolic value-based decision problem: for each observed triple, the agent chooses whether to keep or drop it before long-term insertion. To handle variable-sized short-term buffers, we use a per-item Q-learning design with shared parameters and a practical temporal-difference update over matched items across consecutive steps. On the RoomKG benchmark at long-term memory capacity 128, learned transfer decisions outperform symbolic and neural baselines, including symbolic baselines with temporal annotations and history-based LSTM/Transformer baselines. Across transfer-policy ablations, a lightweight local short-term-only variant performs best, and step-level behavior shows that the policy keeps navigation- and query-relevant facts while discarding lower-value candidate facts, supporting explicit and interpretable memory decisions under memory constraints.
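The per-item design scores each short-term triple independently with one shared Q-function, which is what lets the buffer vary in size. A minimal NumPy sketch follows, assuming a linear Q-function over hypothetical item feature vectors, a reward shared by all items in a step, and items already matched across consecutive steps (row i of `feats` and `next_feats` describe the same triple); the paper's network and matching procedure are not reproduced.

```python
import numpy as np

KEEP, DROP = 0, 1


class PerItemQ:
    """Linear Q(item, action) shared by every item in the short-term buffer (sketch)."""

    def __init__(self, n_features, lr=0.1, gamma=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(scale=0.01, size=(2, n_features))  # one row per action
        self.lr, self.gamma = lr, gamma

    def q(self, feats):
        """feats: (n_items, n_features) -> Q-values of shape (n_items, 2)."""
        return feats @ self.w.T

    def act(self, feats):
        """Greedy keep/drop decision for each item; works for any buffer size."""
        return self.q(feats).argmax(axis=1)

    def td_update(self, feats, actions, reward, next_feats, done=False):
        """One TD(0) step; row i of feats and next_feats is the same matched item."""
        q_sa = self.q(feats)[np.arange(len(actions)), actions]
        bootstrap = 0.0 if done else self.gamma * self.q(next_feats).max(axis=1)
        td_error = reward + bootstrap - q_sa
        for i, a in enumerate(actions):
            self.w[a] += self.lr * td_error[i] * feats[i]
        return td_error


agent = PerItemQ(n_features=3)
feats = np.eye(3)[:2]                      # two hypothetical item feature vectors
actions = np.array([KEEP, DROP])
for _ in range(200):
    agent.td_update(feats, actions, reward=1.0, next_feats=feats, done=True)
```

Because the weights are shared, the same agent can score a buffer of two items or two hundred without any change to the model.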

2606.12808 2026-06-18 cs.LG cs.AI Revised

SymQNet: Amortized Acquisition for Low-Latency Adaptive Hamiltonian Learning

Yash Vardhan Tomar, Dheeraj Peddireddy

Affiliations: University of California, Berkeley

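AI summary: Proposes SymQNet, an amortized reinforcement-learning approach that learns a posterior-conditioned acquisition policy offline and uses a fast policy forward pass online, substantially reducing acquisition latency in adaptive Hamiltonian learning.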

Abstract

Adaptive Hamiltonian learning is central to calibrating and characterizing quantum devices. In an adaptive controller, choosing the next experiment is itself a computation. Bayesian design rules are recomputed after every posterior update, and that step can take seconds. Across hundreds of shots, those seconds become a significant wall-clock cost for adaptivity. We introduce SymQNet, an amortized reinforcement-learning approach for low-latency adaptive Hamiltonian learning. SymQNet learns a posterior-conditioned acquisition policy offline, then uses a fast policy forward pass online while retaining Bayesian posterior feedback. On transverse-field Ising benchmarks, SymQNet substantially reduces acquisition latency relative to bounded Fisher-information search and bounded two-step Bayesian active learning by disagreement (BALD). At five qubits, it reduces acquisition-only decision latency by $47.1\times$ and $72.6\times$ relative to these online baselines; at twelve qubits, full simulated steps take $1.02$ s for SymQNet versus $13.27$ s for bounded two-step BALD. Overall, we show that learned acquisition can make adaptive Hamiltonian learning practical for repeated low-latency workloads.
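The latency SymQNet targets comes from recomputing a Bayesian design rule after every posterior update. The sketch below shows one such rule, a one-step BALD score (the mutual information between the next outcome and the unknown parameter), on a deliberately simple single-qubit toy model with outcome probability sin²(ωt/2) over a grid of candidate frequencies ω. The toy model, grid, and function names are assumptions; the paper's baselines use bounded two-step BALD and Fisher-information search on transverse-field Ising models, and SymQNet amortizes this per-step scoring into a policy forward pass.

```python
import numpy as np


def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli(p) outcome, elementwise."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))


def outcome_prob(omegas, t):
    """Toy model: probability of measuring 1 after evolving for time t."""
    return np.sin(np.multiply.outer(t, omegas) / 2) ** 2


def bald_scores(weights, omegas, times):
    """One-step BALD score for every candidate time (higher = more informative)."""
    p1 = outcome_prob(omegas, np.asarray(times))        # (n_times, n_omegas)
    predictive = p1 @ weights                           # P(y = 1 | t)
    expected_entropy = binary_entropy(p1) @ weights     # E_omega H[y | omega, t]
    return binary_entropy(predictive) - expected_entropy


def update_posterior(weights, omegas, t, outcome):
    """Bayes rule on the frequency grid after observing `outcome` at time t."""
    p1 = outcome_prob(omegas, t)
    likelihood = p1 if outcome == 1 else 1.0 - p1
    posterior = weights * likelihood
    return posterior / posterior.sum()


omegas = np.linspace(0.5, 3.0, 51)
weights = np.full(omegas.size, 1.0 / omegas.size)
times = np.linspace(0.0, 5.0, 101)
t_next = times[np.argmax(bald_scores(weights, omegas, times))]  # next experiment
```

Every call to `bald_scores` touches the whole time-by-parameter grid, and that cost is paid again after each `update_posterior`; this repeated work is what an amortized policy avoids.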

2511.00802 2026-06-18 cs.SE cs.CL cs.LG Revised

GrowthHacker: Automated Off-Policy Evaluation Optimization Using Code-Modifying LLM Agents

Jie JW Wu, Ayanda Patrick Herlihy, Ahmad Saleem Mirza, Ali Afoud, Fatemeh Fard

Affiliations: Michigan Technological University, Houghton; Birmingham City University; University of British Columbia, Kelowna

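AI summary: Proposes GrowthHacker, a benchmark in which LLM agents iteratively modify code to optimize off-policy evaluation (OPE) implementations; several frameworks are evaluated on Open Bandit Pipeline and Scope-RL, establishing the feasibility of LLM-based agents as automated growth hackers that continuously improve OPE systems.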

Comments Accepted for publication in ACM Transactions on Software Engineering and Methodology (TOSEM), 2026

Abstract

With data-driven development now widely adopted, online A/B testing is an established method for measuring the effects of new technologies. However, deploying online experiments demands resources for design, implementation, and deployment, and may negatively impact users (e.g., unsafe or unethical outcomes) while requiring weeks of data collection. To address this, the growing research area of off-policy evaluation (OPE), or offline A/B testing, assesses new technologies offline using previously collected logged data. OPE is also a fundamental problem in reinforcement learning and is important where online testing is expensive or risky, such as healthcare, recommender systems, education, and robotics. Despite advances in code-generation large language models (LLMs) and agentic workflows, little is known about whether and how LLMs and LLM-based agents can automatically optimize OPE implementations. We propose GrowthHacker, a benchmark that evaluates baseline LLMs and LLM-based agents on large-scale public datasets. GrowthHacker autonomously and iteratively modifies code, runs OPE, and uses the metrics to guide subsequent optimization. We evaluate methods on Open Bandit Pipeline (OBP) and Scope-RL, and develop a two_agent framework that addresses limitations of existing frameworks while reducing complexity. Across both libraries, two_agent shows the highest reliability (98.1%-100% success rate) and positive-outcome rate (78%), with a median improvement of 4.4% among positive outcomes; CrewAI achieves the highest average improvement (37.9%) and is the only framework with zero extreme-value failures. AutoGen and Default each reach 65% positive-outcome rates. These results establish the feasibility of using LLM-based agents as automated "growth hackers" to continuously improve OPE systems, with implications for scaling data-driven decision-making where manual optimization is expensive.
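For readers new to OPE, the simplest estimator such agents might tune is inverse propensity weighting (IPW), which reweights each logged reward by how much more (or less) likely the evaluated policy is to choose the logged action. A minimal NumPy sketch of IPW and its self-normalized variant follows; it is a textbook illustration, not code from GrowthHacker, Open Bandit Pipeline, or Scope-RL, and the function names and toy numbers are hypothetical.

```python
import numpy as np


def ipw(rewards, logging_probs, target_probs):
    """IPW estimate of the evaluated policy's value from logged bandit data.

    logging_probs[i]: probability the logging policy gave logged action i.
    target_probs[i]: probability the evaluated policy gives that same action.
    """
    weights = np.asarray(target_probs) / np.asarray(logging_probs)
    return float(np.mean(weights * np.asarray(rewards)))


def snipw(rewards, logging_probs, target_probs):
    """Self-normalized IPW: divide by the summed weights, typically lowering variance."""
    weights = np.asarray(target_probs) / np.asarray(logging_probs)
    return float(np.sum(weights * np.asarray(rewards)) / np.sum(weights))


rewards = np.array([1.0, 0.0, 1.0, 0.0])
logging = np.full(4, 0.5)
target = np.array([1.0, 0.0, 1.0, 0.0])  # new policy picks only the rewarded actions
print(ipw(rewards, logging, target))     # 1.0
print(snipw(rewards, logging, target))   # 1.0
```

Choices like self-normalization, weight clipping, or adding a reward model are the kind of implementation changes whose effect on the metrics an automated agent could measure.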

4. Generative Models and Probabilistic Modeling (12 papers)

2602.11467 2026-06-18 cs.LG Revised

PRISM: A 3D Probabilistic Neural Representation for Interpretable Shape Modeling

Yining Jiao, Sreekalyani Bhamidi, Carlton Jude Zdanski, Julia S Kimbell, Andrew Prince, Cameron P Worden, Samuel Kirse, Christopher Rutter, Benjamin H Shields, Jisan Mahmud, Marc Niethammer

Affiliations: Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, USA; Department of Computer Science, University of California San Diego, La Jolla, USA; School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, USA

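AI summary: Proposes PRISM, which combines implicit neural representations with uncertainty-aware statistical shape analysis and uses a closed-form Fisher Information metric for efficient local temporal uncertainty quantification; it performs strongly on shape-evolution modeling, personalized shape prediction, and anomaly detection.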

Comments ICML 2026, camera-ready version, 24 pages

Abstract

Understanding how anatomical shapes evolve in response to developmental covariates - and quantifying their spatially varying uncertainties - is critical in healthcare research. Existing approaches typically rely on global time-warping formulations that ignore spatially heterogeneous dynamics. We introduce PRISM, a novel framework that bridges implicit neural representations with uncertainty-aware statistical shape analysis. PRISM models the conditional distribution of shapes given covariates, providing spatially continuous estimates of both the population mean and covariate-dependent uncertainty at arbitrary locations. A key theoretical contribution is a closed-form Fisher Information metric that enables efficient, analytically tractable local temporal uncertainty quantification via automatic differentiation. Experiments on three synthetic datasets and one clinical dataset demonstrate PRISM's strong performance across diverse tasks - from modeling shape evolution to personalized shape prediction and anomaly detection - within a unified framework, while providing interpretable and clinically meaningful uncertainty estimates.

2603.10718 2026-06-18 cs.LG Revised

Riemannian MeanFlow for One-Step Generation on Manifolds

Zichen Zhong, Haoliang Sun, Yukun Zhao, Yongshun Gong, Yilong Yin

Affiliations: School of Software, Shandong University, Jinan, China

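AI summary: Proposes Riemannian MeanFlow (RMF), which defines an average-velocity field via parallel transport and derives a Riemannian MeanFlow identity linking average and instantaneous velocities, enabling one-step generation on manifolds whose velocities lie in location-dependent tangent spaces; it improves the quality-efficiency trade-off and reduces sampling cost.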

Comments ICML 2026

Abstract

Flow Matching enables simulation-free training of generative models on Riemannian manifolds, yet sampling typically still relies on numerically integrating a probability-flow ODE. We propose Riemannian MeanFlow (RMF), extending MeanFlow to manifold-valued generation where velocities lie in location-dependent tangent spaces. RMF defines an average-velocity field via parallel transport and derives a Riemannian MeanFlow identity that links average and instantaneous velocities for intrinsic supervision. We make this identity practical in a log-map tangent representation, avoiding trajectory simulation and heavy geometric computations. For stable optimization, we decompose the RMF objective into two terms and apply conflict-aware multi-task learning to mitigate gradient interference. RMF also supports conditional generation via classifier-free guidance. Experiments on spheres, tori, SO(3), and SE(3) demonstrate competitive one-step sampling with improved quality-efficiency trade-offs and substantially reduced sampling cost.
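RMF works in a log-map tangent representation, where a one-step sample jumps from x_t along an averaged velocity instead of integrating an ODE. The sketch below shows the two geometric primitives this needs on the unit sphere (the exponential and log maps) and a MeanFlow-style jump x_r = exp_{x_t}(-(t - r) u). The average velocity u would come from a trained network and is supplied by hand here, and the time and sign conventions are assumptions rather than the paper's.

```python
import numpy as np


def sphere_exp(x, v):
    """Exponential map on the unit sphere: follow the geodesic from x along tangent v."""
    n = np.linalg.norm(v)
    if n < 1e-12:
        return x.copy()
    return np.cos(n) * x + np.sin(n) * v / n


def sphere_log(x, y):
    """Log map: tangent vector at x pointing to y, with length equal to the geodesic distance.

    Undefined for antipodal points; this sketch returns zero there for simplicity.
    """
    cos_t = np.clip(np.dot(x, y), -1.0, 1.0)
    theta = np.arccos(cos_t)
    u = y - cos_t * x                      # component of y orthogonal to x
    n = np.linalg.norm(u)
    if n < 1e-12:
        return np.zeros_like(x)
    return theta * u / n


def one_step_sample(x_t, avg_velocity, r, t):
    """MeanFlow-style jump from time t to time r along the manifold (sketch)."""
    return sphere_exp(x_t, -(t - r) * avg_velocity)


x = np.array([1.0, 0.0, 0.0])
y = np.array([0.0, 1.0, 0.0])
u = -sphere_log(x, y)                      # hand-picked average velocity for t=1, r=0
x0 = one_step_sample(x, u, r=0.0, t=1.0)   # lands on y, still on the sphere
```

Because the jump goes through the exponential map, the sample stays on the manifold by construction, with no projection step.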

2604.04342 2026-06-18 cs.LG stat.ML Revised

Generative models for decision-making under distributional shift

Xiuyuan Cheng, Yunqin Zhu, Yao Xie

Affiliations: Department of Mathematics, Duke University; H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology

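AI summary: A tutorial presenting a unified framework for flow- and score-based generative models that uses transport maps, velocity fields, and related tools to address decision-making under distributional shift, covering robustness, conditional distribution generation, and uncertainty quantification.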

Comments INFORMS TutORials in Operations Research, 2026

Abstract

Many data-driven decision problems are formulated using a nominal distribution estimated from historical data, while performance is ultimately determined by a deployment distribution that may be shifted, context-dependent, partially observed, or stress-induced. This tutorial presents modern generative models, particularly flow- and score-based methods, as mathematical tools for constructing decision-relevant distributions. From an operations research perspective, their primary value lies not in unconstrained sample synthesis but in representing and transforming distributions through transport maps, velocity fields, score fields, and guided stochastic dynamics. We present a unified framework based on pushforward maps, continuity, Fokker-Planck equations, Wasserstein geometry, and optimization in probability space. Within this framework, generative models can be used to learn nominal uncertainty, construct stressed or least-favorable distributions for robustness, and produce conditional or posterior distributions under side information and partial observation. We also highlight representative theoretical guarantees, including forward-reverse convergence for iterative flow models, first-order minimax analysis in transport-map space, and error-transfer bounds for posterior sampling with generative priors. The tutorial provides a principled introduction to using generative models for scenario generation, robust decision-making, uncertainty quantification, and related problems under distributional shift.
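The tutorial's central object is a transport map T whose pushforward turns a nominal distribution into a decision-relevant one. The smallest closed-form instance is the monotone (W2-optimal) map between two 1-D Gaussians, T(x) = m1 + (s1/s0)(x - m0), for which the 2-Wasserstein distance is also closed-form: W2 = sqrt((m0 - m1)² + (s0 - s1)²). A minimal NumPy sketch that pushes nominal samples to a "stressed" distribution follows; the distributions and numbers are hypothetical.

```python
import numpy as np


def gaussian_transport_map(m_src, s_src, m_tgt, s_tgt):
    """Monotone (W2-optimal) map pushing N(m_src, s_src^2) onto N(m_tgt, s_tgt^2) in 1-D."""
    return lambda x: m_tgt + (s_tgt / s_src) * (np.asarray(x) - m_src)


def w2_gaussian_1d(m_src, s_src, m_tgt, s_tgt):
    """Closed-form 2-Wasserstein distance between two 1-D Gaussians."""
    return float(np.hypot(m_src - m_tgt, s_src - s_tgt))


rng = np.random.default_rng(0)
nominal = rng.normal(loc=100.0, scale=10.0, size=100_000)  # hypothetical nominal demand
stress = gaussian_transport_map(100.0, 10.0, 120.0, 15.0)  # hypothetical stressed scenario
stressed = stress(nominal)                                  # pushforward samples
print(round(w2_gaussian_1d(100.0, 10.0, 120.0, 15.0), 3))   # 20.616
```

The W2 value quantifies how far the stressed scenario sits from the nominal one, which is the kind of budget a Wasserstein-robust formulation constrains.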

2605.17232 2026-06-18 cs.LG math.ST stat.ML stat.TH Revised

Dimension-Free Convergence of Discrete Diffusion Models: Adjoint Equations Induce the Right Space

Kelvin Kan, Xingjian Li, Benjamin J. Zhang, Tuhin Sahai, Stanley Osher, Markos A. Katsoulakis

Affiliations: Department of Mathematics; Oden Institute; School of Data Science and Society; UCLA; University of Texas at Austin; UNC Chapel Hill; Computational and Applied Sciences Group; Department of Mathematics and Statistics; SRI International; University of Massachusetts Amherst

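AI summary: Develops a unified adjoint-equation-based framework that gives dimension-free convergence guarantees for discrete diffusion models in any integral probability metric (IPM), overcoming the limitations of KL- and TV-based analyses for large state spaces.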

Abstract

Discrete diffusion has become a leading framework for generative modeling in various applications including language, vision, and biology. Existing convergence theory, however, exhibits fundamental limitations. KL-based analyses diverge under singular priors such as the masked distribution, while bounds in total variation (TV) depend on the state space size $S$ and become vacuous for modern language tasks, where vocabularies contain hundreds of thousands of tokens. We develop a unified adjoint-equation-based framework that establishes dimension-free convergence guarantees in any integral probability metric (IPM). To the best of our knowledge, our bounds are the first to be entirely free of $S$ and applicable to both masked and uniform priors. Importantly, our theory relies only on a single standard rate-matrix regularity assumption and applies to general priors. Five novel techniques drive our improvements: working in the space of observables via adjoint equations rather than directly with probability measures, a regularity analysis that yields bounds on any IPM, a coupling argument that removes $S$-dependence under uniform transitions, and score-marginal cancellation and exit-routing techniques that remove $S$-dependence under masked transitions. Our framework thus sharply departs from prior analyses and avoids the shortcomings of pathspace-KL and existing TV-based approaches. Beyond convergence bounds, our framework provides a versatile toolkit for further theoretical study of discrete diffusion models, including principled choices of loss functions and dimension-free step complexity.

2605.30920 2026-06-18 cs.LG Updated version

Unsupervised Diffusion Solver for Combinatorial Optimization via Combinatorial Adjoint Matching

Shengyu Feng, Tarun Suresh, Yiming Yang

Affiliations: Language Technologies Institute, Carnegie Mellon University; University of Illinois Urbana-Champaign

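AI summary: Proposes the Combinatorial Adjoint Matching (CAM) framework, which uses discrete adjoint dynamics and a stochastic-control formulation to train discrete diffusion solvers without supervision, achieving performance competitive with supervised methods across a range of combinatorial optimization problems.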

Comments ICML26

Abstract

Diffusion-based neural solvers have shown strong promise for combinatorial optimization (CO), but existing methods typically rely on supervised training with large collections of near-optimal solutions. In this work, we extend adjoint-based trajectory optimization methods to discrete combinatorial domains. We formulate diffusion-based CO as a stochastic control problem over Continuous-Time Markov Chains and introduce discrete adjoint dynamics for propagating optimization signals through discrete generative trajectories. Building on this formulation, we propose Combinatorial Adjoint Matching (CAM), an unsupervised training framework for discrete diffusion solvers with structured and low-variance trajectory-level optimization signals. Empirically, CAM consistently outperforms existing unsupervised diffusion baselines and achieves performance competitive with strong supervised diffusion solvers and even traditional solvers across diverse combinatorial optimization problems. Our code is available at https://github.com/Shengyu-Feng/CAM.

2606.10466 2026-06-18 cs.LG cs.AI Updated version

UPLOTS: A Unified Pretrained Language Model for Constrained Time-series Generation

Du Yin, Hao Xue, Jinliang Deng, Yang Yang, Shuang Ao, Arian Prabowo, Flora Salim

Affiliations: University of New South Wales; HKUST(GZ); BUAA

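AI summary: Proposes UPLOTS, a prompt-guided framework built on a unified pretrained language model that uses dynamic multi-dataset loss re-weighting and prompt-to-pattern mapping to generate constrained time series across domains; its generalization and data-augmentation benefits are validated on four benchmarks.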

Abstract

In time-series generation, existing approaches typically handcraft or train a separate model for each dataset, which hinders their scalability and fails to leverage shared temporal structures across domains. To address this fragmentation, we propose UPLOTS, a Unified, Prompt-guided Language model framework fOr constrained Time-Series Generation across diverse domains. Instead of building task-specific models, UPLOTS leverages a single pre-trained transformer backbone guided by learned constraint prompts, enabling on-demand generation with precise pattern control. One key innovation is our dynamic multi-dataset loss re-weighting and prompt-to-pattern mapping, which allows UPLOTS to internalize diverse temporal structures during training and conditionally generate them at inference. We evaluate UPLOTS on four real-world benchmarks and multiple constraint settings, including peak-period, calendar, load-level, and volatility patterns. Additional held-out constraint-combination and downstream forecasting experiments further demonstrate that UPLOTS generalizes beyond the original peak-pattern setting and improves data augmentation under scarce real-data regimes. Our code and baselines are available in an anonymous GitHub repository: https://anonymous.4open.science/r/UPLOTS-6C36.

2606.13795 2026-06-18 cs.LG Updated version

DiPOD: Diffusion Policy Optimization without Drifting Apart

Haozhe Jiang, Haiwen Feng, Pieter Abbeel, Jiantao Jiao, Angjoo Kanazawa, Nika Haghtalab

Affiliations: University of California, Berkeley; Simons Institute for the Theory of Computing; Department of Electrical Engineering and Computer Sciences, University of California, Berkeley

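AI summary: Addressing the instability of diffusion policy-gradient methods, proposes the DiPOD framework, which interleaves self-distillation with policy-improving gradient updates to maintain tight-bound behavior, yielding stable and efficient policy optimization.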

Comments Project page: astro-eric.github.io/blogs/dipod/ Code: https://github.com/Astro-Eric/DiPOD-release

Abstract

RL post-training has become increasingly pivotal for improving diffusion policies, but existing diffusion policy-gradient methods are often unstable and cannot achieve reliable policy improvement. We identify the cause as the double-drift phenomenon: optimizing a variational surrogate can let the ELBO separate from the true log-likelihood, which then makes the resulting proxy policy gradient misaligned with the true policy gradient of expected return. We propose \textbf{DiPOD}, a diffusion policy optimization framework that maintains tight-bound behavior throughout training by interleaving self-distillation with policy-improving gradient updates. This leads to a simple and practical algorithm: augmenting each diffusion policy-gradient update with an on-policy ELBO regularizer. Across diffusion language model post-training and continuous-control diffusion policies, DiPOD substantially stabilizes training and reaches higher rewards than previous methods.

2502.07531 2026-06-18 cs.CV cs.AI cs.LG cs.MM Updated version

VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation

Sixiao Zheng, Zimian Peng, Yanpeng Zhou, Yi Zhu, Hang Xu, Xiangru Huang, Yanwei Fu

Affiliations: School of Data Science, Fudan University; Shanghai Innovation Institute; Zhejiang University; Huawei Noah’s Ark Lab; Westlake University; School of Data Science and MOE Frontiers Center for Brain Science, Fudan University; Fudan ISTBI–ZJNU Algorithm Centre for Brain-inspired Intelligence, Zhejiang Normal University

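AI summary: Proposes the VidCRAFT3 framework, which explicitly models cross-factor interactions among geometry, motion, and lighting to enable independent or joint control over camera motion, object motion, and lighting direction, achieving state-of-the-art control precision and visual consistency.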

Comments Accepted to TVCG 2026

Abstract

Controllable image-to-video (I2V) generation transforms a reference image into a coherent video guided by user-specified control signals. While precise control over camera motion, object motion, and lighting is essential for high-fidelity creation, existing methods often treat these factors independently. This overlooks the physical coupling among viewpoint, geometry, and illumination in dynamic scenes, leading to visual inconsistencies such as mismatched shadows and perspective drift under simultaneous changes. We present VidCRAFT3, a unified and flexible I2V framework that explicitly models cross-factor interactions among geometry, motion, and illumination, enabling both independent and joint control over camera motion, object motion, and lighting direction. Image2Cloud provides explicit 3D geometric priors for accurate camera motion control. ObjMotionNet encodes sparse object trajectories into multi-scale motion features to guide realistic object motion. A Spatial Triple-Attention Transformer integrates lighting direction through lighting cross-attention for consistent relighting. To address the scarcity of jointly annotated data, we construct the VideoLightingDirection (VLD) dataset with accurate per-frame lighting direction annotations, and introduce a three-stage progressive training strategy that enables robust learning without fully joint annotations. Extensive experiments demonstrate that VidCRAFT3 achieves state-of-the-art performance in control precision and visual coherence across diverse scenarios.

2602.23006 2026-06-18 stat.ML cs.LG Updated version

Regular Fourier Features for Nonstationary Gaussian Processes

Arsalan Jawaid, Abdullah Karatas, Jörg Seewig

Affiliations: Institute of Measurement and Sensor Technology, University of Kaiserslautern-Landau; Independent Researcher

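AI summary: Proposes regular Fourier features, which discretize the spectral representation directly and avoid probabilistic assumptions, yielding low-rank approximations of nonstationary Gaussian processes; the method also extends to kernel learning.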

Comments 11 pages (9 main + 2 suppl.), 5 figures, 2 tables

Abstract

Simulating a Gaussian process requires sampling from a high-dimensional Gaussian distribution, which scales cubically with the number of sample locations. Spectral methods address this challenge by exploiting the Fourier representation and treating the spectral density as a probability distribution suitable for Monte Carlo approximation. Although this probabilistic interpretation is valid for stationary processes, it is overly restrictive for the nonstationary case, where spectral densities are generally not probability measures. We propose regular Fourier features for harmonizable processes to avoid this limitation. Our method discretizes the spectral representation directly, preserving the correlation structure among spectral weights without requiring probability assumptions. Under a finite-spectral-support assumption, this yields an efficient low-rank approximation that is consistent and positive semi-definite by construction. When the spectral density is unknown, the framework extends naturally to kernel learning from data. We demonstrate the method on locally stationary and harmonizable mixture kernels, the latter with a complex-valued spectral density, and apply the kernel-learning extension to real and synthetic data.
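A minimal sketch of the discretization idea, assuming the simplest possible setting: a stationary 1-D squared-exponential kernel with lengthscale $\ell$, whose spectral density in angular frequency is $S(\omega) = \ell/\sqrt{2\pi}\,e^{-\ell^2\omega^2/2}$. The integral $k(\tau) = \int S(\omega)\cos(\omega\tau)\,d\omega$ is replaced by a midpoint rule on a regular frequency grid, so each kernel value becomes an inner product of finite feature vectors and any Gram matrix $\Phi\Phi^\top$ is positive semi-definite by construction. The grid size, the cutoff `omega_max`, and all function names are illustrative choices; the harmonizable (nonstationary) case with correlated spectral weights is not reproduced here.

```python
import math


def se_spectral_density(omega, lengthscale):
    """Spectral density (angular frequency) of k(tau) = exp(-tau^2 / (2 * lengthscale^2))."""
    return lengthscale / math.sqrt(2.0 * math.pi) * math.exp(-0.5 * (lengthscale * omega) ** 2)


def regular_fourier_features(x, lengthscale=1.0, omega_max=6.0, num_freqs=60):
    """Deterministic features from a midpoint rule on a regular grid over [0, omega_max]."""
    d_omega = omega_max / num_freqs
    feats = []
    for j in range(num_freqs):
        w = (j + 0.5) * d_omega
        # Factor 2 folds the (symmetric) negative frequencies onto the positive half-line.
        weight = 2.0 * se_spectral_density(w, lengthscale) * d_omega
        feats.append(math.sqrt(weight) * math.cos(w * x))
        feats.append(math.sqrt(weight) * math.sin(w * x))
    return feats


def approx_kernel(x1, x2, **kwargs):
    """Low-rank kernel approximation: cos(w x1)cos(w x2) + sin(w x1)sin(w x2) = cos(w (x1 - x2))."""
    f1 = regular_fourier_features(x1, **kwargs)
    f2 = regular_fourier_features(x2, **kwargs)
    return sum(a * b for a, b in zip(f1, f2))


def exact_kernel(x1, x2, lengthscale=1.0):
    return math.exp(-0.5 * ((x1 - x2) / lengthscale) ** 2)
```

Shrinking `omega_max` or `num_freqs` lowers the rank (the feature dimension is `2 * num_freqs`) at the cost of accuracy; no sampling is involved, so the approximation is deterministic.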

2605.27478 2026-06-18 stat.ML cs.LG math.PR Updated version

Triangular-Reference Schrödinger Bridges for Time Series Generation

Gabriele Bocchi

Affiliations: Arakne S.r.l.

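AI summary: Proposes the triangular-reference Schrödinger bridge framework, which uses an intervalwise frozen, possibly degenerate diffusion reference with a hierarchical latent-volatility structure to give a conservative extension of SBTS for time-series generation while preserving its entropy-minimizing variational core.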

Abstract

Schrödinger bridges for time series (SBTS) generate synthetic paths by projecting, in relative entropy, a Brownian reference onto the path laws that match the joint distribution of the data on the observation grid. The Brownian reference, however, fixes the quadratic variation of the generated paths, which is restrictive when stochastic volatility, correlated noise, or rank-deficient covariance structures must be reproduced. We introduce "Triangular-Reference Schrödinger Bridges for Time Series" (TR-SBTS), which keeps the entropy-projection backbone of SBTS but replaces the Brownian reference by a triangular, volatility-informed, intervalwise frozen reference on a state augmented with latent covariance descriptors. The construction remains a single entropy projection on the augmented state: the minimiser is the \(h\)-transform of the reference, and on each frozen interval the optimal drift has the logarithmic-gradient form \(b^\star(t,x)=A\,\nabla\log H(t,x)\), intrinsic to the active covariance directions when the frozen covariance \(A\) is degenerate. We prove stability of the frozen approximation and consistency of the associated regularised kernel estimators, describe a reference-aware Nadaraya--Watson implementation of the conditional next-increment law, and evaluate the construction on numerical experiments.
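The conditional next-increment law can be approximated with a Nadaraya–Watson estimator. The sketch below is a generic version, assuming a diagonal covariance $A$ so that the Mahalanobis kernel reduces to per-coordinate precisions `inv_diag_cov`; the paper's reference-aware construction (run-time frozen covariance cumulants, PCR summaries, WLS drift regressors) is not reproduced, and all function names are hypothetical.

```python
import math
import random


def kernel_weights(x, past_states, inv_diag_cov):
    """Normalised Gaussian weights exp(-0.5 (x - s)^T A^{-1} (x - s)) with diagonal A."""
    quad = [sum(p * (xi - si) ** 2 for xi, si, p in zip(x, s, inv_diag_cov))
            for s in past_states]
    q_min = min(quad)  # shift before exponentiating to avoid underflow
    raw = [math.exp(-0.5 * (q - q_min)) for q in quad]
    total = sum(raw)
    return [r / total for r in raw]


def nw_conditional_mean(x, past_states, next_increments, inv_diag_cov):
    """Kernel-weighted average of observed next increments (Nadaraya-Watson regression)."""
    w = kernel_weights(x, past_states, inv_diag_cov)
    dim = len(next_increments[0])
    return [sum(wi * inc[d] for wi, inc in zip(w, next_increments)) for d in range(dim)]


def nw_sample_increment(x, past_states, next_increments, inv_diag_cov, rng):
    """Draw one observed increment with probability equal to its kernel weight."""
    w = kernel_weights(x, past_states, inv_diag_cov)
    return rng.choices(next_increments, weights=w, k=1)[0]
```

Sampling from the weighted empirical increments (rather than only returning the mean) is what turns the regression estimate into a generator of synthetic next steps; larger precisions give more local, higher-variance estimates.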

2605.28690 2026-06-18 quant-ph cs.LG Updated version

Latent-Conditioned Parameterized Quantum Circuits as Universal Approximators for Distributions over Quantum States

Quoc Hoan Tran, Koki Chinzei, Yasuhiro Endo, Hirotaka Oshima

Affiliations: Quantum Laboratory, Fujitsu Research, Fujitsu Limited

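AI summary: Proposes latent-conditioned parameterized quantum circuits (LPQC), in which a classical neural network maps latent variables to quantum-circuit parameters; proves that they are universal approximators of probability measures over density operators in the 1-Wasserstein distance, and introduces a multimodal latent prior and a mixture-of-experts circuit architecture to mitigate barren plateaus.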

Comments 21 pages, 11 figures (fix the proof and update appendix for barren plateaus analysis)

Abstract

Many applications in quantum simulation, quantum chemistry, and quantum machine learning require not a single quantum state but an ensemble of states characterizing the heterogeneity of a target system. Preparing such ensembles state-by-state is prohibitive in both variational and fault-tolerant settings, thereby motivating a generative modeling approach. We introduce latent-conditioned parameterized quantum circuits (LPQCs), a hybrid quantum-classical framework in which classical neural networks map a latent variable sampled from a prior distribution to the parameters of a parameterized quantum circuit. We prove that LPQCs are universal approximators for probability measures over density operators in the 1-Wasserstein distance, extending classical universal approximation theorems to the quantum-distribution setting. We additionally introduce a multimodal latent prior and a mixture-of-experts circuit architecture, and show empirically that the latent-conditioned parameterization alleviates the barren plateau problem during optimization, a behavior for which we provide rigorous partial guarantees. Numerical experiments validate the framework on a synthetic multi-cluster ensemble of mixed quantum states and on a QM9-derived ensemble of 3-D molecular structures. In these tasks, LPQC outperforms recent quantum generative baselines and matches the generation quality of a classical neural-network baseline, while requiring an output dimension that grows only linearly with the number of qubits rather than exponentially. By leveraging classical expressivity in the latent space, LPQCs offer a tractable route to quantum generative modeling.
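To make the latent, parameters, state pipeline concrete, here is a toy single-qubit sketch simulated classically. The "network" is a hypothetical three-parameter map from a scalar latent $z \sim \mathcal{N}(0,1)$ to the angles of the circuit $R_Z(\phi)R_Y(\theta)\lvert 0\rangle$; it is not the paper's architecture, has no training loop, and uses no quantum SDK. Sampling many latents produces an ensemble of pure states whose average density matrix is mixed.

```python
import cmath
import math
import random


def tiny_network(z, params):
    """Hypothetical classical map from a scalar latent z to two circuit angles."""
    a, b, c = params
    theta = math.pi / (1.0 + math.exp(-(a * z + b)))  # squashed into (0, pi)
    phi = c * z
    return theta, phi


def circuit_state(theta, phi):
    """Amplitudes of RZ(phi) RY(theta) |0>, with RZ(phi) = diag(e^{-i phi/2}, e^{i phi/2})."""
    return [cmath.exp(-0.5j * phi) * math.cos(theta / 2),
            cmath.exp(0.5j * phi) * math.sin(theta / 2)]


def density_matrix(psi):
    """rho = |psi><psi| as a 2x2 nested list."""
    return [[psi[i] * psi[j].conjugate() for j in range(2)] for i in range(2)]


def sample_ensemble(num_samples, params, seed=0):
    rng = random.Random(seed)
    states = []
    for _ in range(num_samples):
        z = rng.gauss(0.0, 1.0)  # latent drawn from a standard normal prior
        states.append(density_matrix(circuit_state(*tiny_network(z, params))))
    return states


def average_density(states):
    n = len(states)
    return [[sum(r[i][j] for r in states) / n for j in range(2)] for i in range(2)]


def purity(rho):
    # Tr(rho^2) equals the sum of |rho_ij|^2 for a Hermitian rho
    return sum(abs(rho[i][j]) ** 2 for i in range(2) for j in range(2))
```

Each sampled state is pure (purity 1), while the ensemble average has purity below 1; in LPQC the classical map would be trained so that this ensemble matches a target distribution over states.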

2606.17491 2026-06-18 stat.ML cs.LG stat.ME Updated version

A Bayesian Boolean Matrix Factorization with Application to Copy Number Analysis in Cancer

Adolphus Wagala, Mehmet Samur, Giovanni Parmigiani

Affiliations: Department of Data Science, Dana-Farber Cancer Institute; Department of Biostatistics, Harvard T.H. Chan School of Public Health

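AI summary: Proposes the Bayesian Boolean Matrix Factorization (BBMF) model, a fully conjugate generative model with sparsity priors that yields interpretable factorizations under Boolean constraints; applied to chromosome-arm copy-number alterations in multiple myeloma, it reveals the discrete latent structure of tumor heterogeneity.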

Abstract

Binary data factorization is common, but real-valued methods ignore discreteness and yield hard-to-interpret factors. Boolean Matrix Factorization (BooMF) instead decomposes a binary matrix into two lower-rank binary matrices via logical AND and OR, expressing the data as a Boolean disjunction of interpretable patterns. In cancer genomics, BooMF can reveal coordinated feature changes that may drive tumor evolution, unlike rotational or additive decompositions. Most existing BooMF methods are heuristic, greedy, sensitive to initialization, prone to local optima, and do not support principled model selection or uncertainty quantification. We introduce Bayesian Boolean Matrix Factorization (BBMF), a fully conjugate generative model with sparsity-inducing priors. It enforces Boolean constraints, yields interpretable latent factors with coherent uncertainty quantification, and admits Gibbs sampling with closed-form full conditionals. Because cancer evolution often involves widespread, near-simultaneous chromosome-number changes (e.g., whole-genome duplication followed by instability and selection), Boolean factorizations capture these patterns more naturally than additive models. Applied to arm-level copy-number alteration data in multiple myeloma, where entries indicate presence/absence of chromosomal-arm amplifications, BBMF finds a small set of interpretable bicliques linking patient subsets to recurrently co-altered chromosomal arms, providing a compact, biologically meaningful summary of tumor heterogeneity and demonstrating BBMF's utility for uncovering discrete latent structure in complex binary data.
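The Boolean product underlying BooMF and BBMF fits in a few lines. In this sketch, rows of `U` stand for hypothetical patients, columns of `V` for hypothetical chromosome arms, and each of the latent patterns pairs a patient subset with a set of co-altered arms (a biclique). Overlapping patterns combine by logical OR rather than by addition, which is the contrast with additive decompositions drawn above. The priors and the Gibbs sampler are not shown.

```python
def boolean_product(U, V):
    """X[i][j] = OR_k (U[i][k] AND V[k][j]) for 0/1 matrices U (n x r) and V (r x m)."""
    n, r, m = len(U), len(V), len(V[0])
    return [[int(any(U[i][k] and V[k][j] for k in range(r))) for j in range(m)]
            for i in range(n)]


def reconstruction_errors(X, U, V):
    """Number of entries where the Boolean reconstruction disagrees with X."""
    Xhat = boolean_product(U, V)
    return sum(X[i][j] != Xhat[i][j] for i in range(len(X)) for j in range(len(X[0])))
```

A patient carrying both patterns gets 1 (not 2) on every covered arm, so the reconstruction always stays binary.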

5. Optimization, Generalization, and Theoretical Analysis (10 papers)

2602.11557 2026-06-18 cs.LG stat.ML Updated version

The Implicit Bias of Steepest Descent with Mini-batch Stochastic Gradient

Jichu Li, Xuan Tang, Difan Zou

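AI summary: Studies the implicit bias of mini-batch stochastic steepest descent in multi-class classification, showing how batch size, momentum, and variance reduction affect max-margin behavior and convergence rates; proves that momentum enables small-batch convergence and that variance reduction recovers the full-batch implicit bias.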

Abstract

A variety of widely used optimization methods like SignSGD and Muon can be interpreted as instances of steepest descent under different norm-induced geometries. In this work, we study the implicit bias of mini-batch stochastic steepest descent in multi-class classification, characterizing how batch size, momentum, and variance reduction shape the limiting max-margin behavior and convergence rates under general entry-wise and Schatten-$p$ norms. We show that, without momentum, worst-case convergence and successful classification can only be guaranteed with full-batch gradient. In contrast, momentum enables small-batch convergence to an approximate max-margin solution through a batch-momentum trade-off, though it slows convergence. This approach provides fully explicit, dimension-free rates that improve upon prior results. Moreover, we prove that variance reduction can recover the exact full-batch implicit bias for any batch size, albeit at a slower convergence rate. Finally, we further investigate the batch-size-one steepest descent without momentum, and reveal its convergence to a fundamentally different bias via a concrete data example, exposing a key limitation of purely stochastic updates. Overall, our unified analysis clarifies when stochastic optimization aligns with full-batch behavior, and paves the way for deeper exploration of the training behavior of stochastic gradient steepest descent algorithms.
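The steepest-descent view can be made concrete for entry-wise $\ell_p$ norms: the direction minimising $\langle g, d\rangle$ subject to $\|d\|_p \le 1$ is $d_i = -\operatorname{sign}(g_i)\,|g_i|^{q-1}/\|g\|_q^{q-1}$ with $1/p + 1/q = 1$. It reduces to sign descent (the SignSGD-type update) for $p=\infty$ and to normalised gradient descent for $p=2$. The sketch below additionally assumes an exponential-moving-average form of momentum applied to mini-batch gradients; Schatten-$p$ geometries such as Muon's and the variance-reduced variant are not covered, and the function names are hypothetical.

```python
import math


def steepest_direction(g, p):
    """Unit l_p-norm steepest descent direction for an entry-wise l_p norm (p >= 1 or inf)."""
    if all(gi == 0 for gi in g):
        return [0.0] * len(g)
    if p == math.inf:  # dual exponent q = 1: sign descent
        return [-math.copysign(1.0, gi) if gi != 0 else 0.0 for gi in g]
    if p == 1:  # dual exponent q = inf: move along one largest-magnitude coordinate
        k = max(range(len(g)), key=lambda i: abs(g[i]))
        return [-math.copysign(1.0, g[k]) if i == k else 0.0 for i in range(len(g))]
    q = p / (p - 1.0)
    norm_q = sum(abs(gi) ** q for gi in g) ** (1.0 / q)
    return [-math.copysign(abs(gi) ** (q - 1) / norm_q ** (q - 1), gi) for gi in g]


def momentum_steepest_step(w, grad, buf, lr, beta, p):
    """One mini-batch step: update the EMA momentum buffer, then move along its steepest direction."""
    buf = [beta * b + (1 - beta) * gi for b, gi in zip(buf, grad)]
    d = steepest_direction(buf, p)
    return [wi + lr * di for wi, di in zip(w, d)], buf
```

Setting `beta=0` recovers the momentum-free mini-batch update, which is the regime where, per the abstract, guarantees require full-batch gradients.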

2411.16206 2026-06-18 cs.LG cs.AI cs.NE Updated version

Scalable Batch Bayesian Optimization Via Subspace Acquisition Functions

Dawei Zhan, Zhaoxi Zeng, Shuoxiao Wei, Ping Wu

Affiliations: School of Computing and Artificial Intelligence

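AI summary: Proposes scaling Bayesian optimization to large-batch evaluation by selecting one point from each of several axis-aligned subspaces of the original problem; this significantly accelerates convergence and is highly competitive against ten batch algorithms.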

Journal ref ACM Transactions on Evolutionary Learning and Optimization, 2026

Abstract

Extending Bayesian optimization to batch evaluation can enable the designer to make the most of parallel computing technology. However, most current batch approaches do not scale well with the batch size: their optimization efficiency often deteriorates as the batch size increases. To address this issue, we propose in this work a simple and efficient approach to extend Bayesian optimization to large-scale batch evaluation. Unlike existing batch approaches, the new approach draws a batch of axis-aligned subspaces of the original problem and selects one point from each subspace using existing acquisition functions. Numerical experiments show that our proposed approach speeds up convergence significantly compared with the sequential Bayesian optimization algorithm, and performs very competitively against ten batch Bayesian optimization algorithms. The implementation of our proposed approach is available at https://github.com/zhandawei/SubSpace_Acquisition_Functions.
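A minimal sketch of the batch-construction step, under assumptions that are not taken from the paper. Coordinates outside each sampled subspace are fixed at the current best point (`incumbent`). The acquisition function is any user-supplied callable to be maximised, here a toy stand-in for, e.g., expected improvement on a Gaussian-process model. Each subspace is searched by plain random sampling. Names and defaults are hypothetical; the authors' implementation is in the linked repository.

```python
import random


def subspace_batch(incumbent, acquisition, bounds, batch_size, subspace_dim,
                   num_candidates=256, seed=0):
    """Return batch_size (point, active_dims) pairs, one per random axis-aligned subspace."""
    rng = random.Random(seed)
    dim = len(incumbent)
    batch = []
    for _ in range(batch_size):
        active = rng.sample(range(dim), subspace_dim)  # random axis-aligned subspace
        best_x, best_val = None, float("-inf")
        for _ in range(num_candidates):
            x = list(incumbent)  # coordinates outside the subspace stay at the incumbent
            for d in active:
                lo, hi = bounds[d]
                x[d] = rng.uniform(lo, hi)
            val = acquisition(x)
            if val > best_val:
                best_x, best_val = x, val
        batch.append((best_x, sorted(active)))
    return batch
```

Because each inner search is low-dimensional and independent of the others, the batch members can be found in parallel and tend to be diverse without any explicit penalty between them.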

2506.08764 2026-06-18 cs.LG Updated version

On the Stability of the Jacobian Matrix in Deep Neural Networks

Benjamin Dadoun, Soufiane Hayou, Hanan Salam, Mohamed El Amine Seddik, Pierre Youssef

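AI summary: Uses random matrix theory to establish a general theorem on the spectral stability of the Jacobian matrix in deep neural networks that applies to sparse and non-i.i.d. weights, extending the theoretical foundation of initialization schemes.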

Comments 21 pages, 28 figures; the main theorem was wrong (again) and is now corrected

Abstract

Deep neural networks are known to suffer from exploding or vanishing gradients as depth increases, a phenomenon closely tied to the spectral behavior of the input-output Jacobian. Prior work has identified critical initialization schemes that ensure Jacobian stability, but these analyses are typically restricted to fully connected networks with i.i.d. weights. In this work, we go significantly beyond these limitations: we establish a general stability theorem for deep neural networks that accommodates sparsity (such as that introduced by pruning) and non-i.i.d., weakly correlated weights (e.g. induced by training). Our results rely on recent advances in random matrix theory, and provide rigorous guarantees for spectral stability in a much broader class of network models. This extends the theoretical foundation for initialization schemes in modern neural networks with structured and dependent randomness.

2509.14969 2026-06-18 cs.LG math.OC stat.ML Updated version

Stochastic Adaptive Gradient Descent Without Descent

Jean-François Aujol, Jérémie Bigot, Camille Castera

Affiliations: Univ. Bordeaux, CNRS, Bordeaux INP, IMB, UMR 5251

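AI summary: Proposes a tuning-free adaptive step-size strategy for stochastic gradient methods that exploits local geometric information through a first-order stochastic oracle; convergence is proven theoretically, and experiments show that it competes with tuned baselines.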

Abstract

We introduce a new adaptive step-size strategy for convex optimization with stochastic gradient that exploits the local geometry of the objective function only by means of a first-order stochastic oracle and without any hyper-parameter tuning. The method comes from a theoretically-grounded adaptation of the Adaptive Gradient Descent Without Descent method to the stochastic setting. We prove the convergence of stochastic gradient descent with our step-size under various assumptions, and we show that it empirically competes against tuned baselines.
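For reference, the deterministic AdGD rule that this method adapts sets $\lambda_k = \min\{\sqrt{1+\theta_{k-1}}\,\lambda_{k-1},\ \|x_k - x_{k-1}\| / (2\|\nabla f(x_k) - \nabla f(x_{k-1})\|)\}$, takes the step $x_{k+1} = x_k - \lambda_k \nabla f(x_k)$, and updates $\theta_k = \lambda_k/\lambda_{k-1}$, starting from $\theta_0 = +\infty$. The sketch below implements only this full-gradient version; the paper's stochastic adaptation is not reproduced, and the tiny initial step `lam0` is an arbitrary choice.

```python
import math


def adgd(grad, x0, lam0=1e-6, iters=100):
    """Full-gradient AdGD: step sizes from local inverse-curvature estimates, no tuning."""
    x_prev = list(x0)
    g_prev = grad(x_prev)
    x = [xi - lam0 * gi for xi, gi in zip(x_prev, g_prev)]
    lam_prev, theta = lam0, math.inf
    for _ in range(iters):
        g = grad(x)
        if all(gi == 0.0 for gi in g):
            break  # exact stationary point reached
        dx = math.dist(x, x_prev)
        dg = math.dist(g, g_prev)
        local = dx / (2.0 * dg) if dg > 0 else math.inf  # local 1 / (2 L) estimate
        lam = min(math.sqrt(1.0 + theta) * lam_prev, local)
        if math.isinf(lam):
            lam = lam_prev  # no curvature information yet: keep the previous step
        theta = lam / lam_prev
        x_prev, g_prev, lam_prev = x, g, lam
        x = [xi - lam * gi for xi, gi in zip(x, g)]
    return x
```

On a 1-D quadratic $f(x) = \tfrac{a}{2}x^2$ the local estimate is exactly $1/(2a)$, so every step halves the iterate; the $\sqrt{1+\theta}$ factor only matters when curvature varies along the path.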

2602.14789 2026-06-18 cs.LG stat.ML Updated version

On the Stability of Nonlinear Dynamics in GD and SGD: Beyond Quadratic Potentials

Rotem Mulayoff, Sebastian U. Stich

Affiliations: CISPA Helmholtz Center for Information Security

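AI summary: Studies how nonlinear terms affect the dynamical stability of gradient descent and stochastic gradient descent, derives an exact condition for stable oscillations in the multivariate setting, and finds that the stability of SGD can be dictated by a single unstable batch.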

Comments Accepted to COLT 2026

Abstract

The dynamical stability of the iterates during training plays a key role in determining the minima obtained by optimization algorithms. For example, stable solutions of gradient descent (GD) correspond to flat minima, which have been associated with favorable features. While prior work often relies on linearization to determine stability, it remains unclear whether linearized dynamics faithfully capture the full nonlinear behavior. Recent work has shown that GD may stably oscillate near a linearly unstable minimum and still converge once the step size decays, indicating that linear analysis can be misleading. In this work, we explicitly study the effect of nonlinear terms. Specifically, we derive an exact criterion for stable oscillations of GD near minima in the multivariate setting. Our condition depends on high-order derivatives, generalizing existing results. Extending the analysis to stochastic gradient descent (SGD), we show that nonlinear dynamics can diverge in expectation even if a single batch is unstable. This implies that stability can be dictated by a single batch that oscillates unstably, rather than an average effect, as linear analysis suggests. Finally, we prove that if all batches are linearly stable, the nonlinear dynamics of SGD are stable in expectation.
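For contrast, the linearized criterion that this work goes beyond is easy to state: near a minimum with Hessian eigenvalues $\lambda_i$, the quadratic model of GD with step size $\eta$ does not diverge when $|1 - \eta\lambda_i| \le 1$ for all $i$, i.e. $\lambda_{\max} \le 2/\eta$, which is why linearly stable GD solutions are flat. The sketch below checks this condition and runs GD on a 1-D quadratic; it deliberately ignores the higher-order terms whose effect the paper characterizes, and the function names are hypothetical.

```python
def gd_linearly_stable(eta, hessian_eigenvalues):
    """Quadratic-model stability test for GD at a minimum: |1 - eta * lam| <= 1 for every eigenvalue."""
    return all(abs(1.0 - eta * lam) <= 1.0 for lam in hessian_eigenvalues)


def run_gd_quadratic(eta, curvature, x0, steps):
    """Plain GD on f(x) = 0.5 * curvature * x^2, whose gradient is curvature * x."""
    x = x0
    for _ in range(steps):
        x = x - eta * curvature * x
    return x
```

With `eta = 0.3` and curvature 10, each step multiplies the iterate by $-2$, so the linearized model predicts divergence; the paper's point is that higher-order terms can still allow bounded oscillation in such regimes.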

2605.04267 2026-06-18 cs.LG cs.NE math.OC Updated version

QUIVER: Cost-Aware Adaptive Preference Querying in Surrogate-Assisted Evolutionary Multi-Objective Optimization

Florian A. D. Burnat

Affiliations: University of Warwick; Warwick Business School

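AI summary: Proposes QUIVER, which adaptively chooses between objective evaluations and heterogeneous preference queries (pairwise preference statements and indifference adjustments) to minimize decision regret in surrogate-assisted multi-objective optimization; experiments show a 25% reduction in utility regret on hard WFG problems.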

Comments Accepted at Genetic and Evolutionary Computation Conference (GECCO '26)

Abstract

Interactive multi-objective optimization systems face a budget allocation dilemma: one can spend resources on expensive objective evaluations or on eliciting decision-maker preferences that identify the relevant region of the Pareto set. Moreover, preference elicitation itself spans modalities with different information content and cognitive burden, ranging from cheap, noisy pairwise preference statements (PS) to richer but costlier indifference adjustments (IA). We study cost-aware optimization under an unknown scalarization and introduce QUIVER (Query-Informed Value Estimation for Regret), a surrogate-assisted evolutionary multi-objective optimizer that adaptively chooses between objective evaluations and heterogeneous preference queries. At each step, QUIVER selects the next action by maximizing the expected decision-quality improvement per unit total cost. Across DTLZ and WFG benchmarks under synthetic decision-maker models, QUIVER achieves the lowest final utility regret on challenging WFG problems (utility regret of 2.14 on WFG4, 2.82 on WFG9: a 25% improvement over baselines), outperforming all single-modality baselines. We analyze how the optimal mix of PS and IA adapts to problem difficulty: on easy problems (DTLZ2), QUIVER selects 80% PS queries; on hard problems (WFG9), it shifts to 35% IA queries. This adaptive modality selection demonstrates cost-aware preference learning in action.
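The selection rule described above (maximize expected decision-quality improvement per unit total cost) reduces to a ratio test once the gains have been estimated. In the sketch below the gain estimates are placeholders supplied by the caller; how QUIVER actually estimates them from its surrogate and preference model is not reproduced, and the numbers in the test are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Action:
    name: str             # e.g. "evaluate", "pairwise_preference", "indifference_adjustment"
    expected_gain: float  # estimated improvement in decision quality
    cost: float           # evaluation or elicitation cost; assumed strictly positive


def select_action(actions):
    """Pick the action with the largest expected gain per unit cost."""
    return max(actions, key=lambda a: a.expected_gain / a.cost)
```

Under this rule a costly but informative query (such as an indifference adjustment) is chosen only when its extra gain outweighs its extra cost, which is consistent with the shift toward IA queries on harder problems reported above.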

2505.15215 2026-06-18 stat.ML cs.LG stat.ME Updated version

Clustering and Pruning in Causal Data Fusion

Otto Tabell, Santtu Tikka, Juha Karvanen

Affiliations: Department of Mathematics and Statistics

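AI summary: Addressing the computational burden that arises as the number of variables grows in causal fusion of multiple data sources, proposes pruning and clustering as preprocessing operations, infers the identifiability of causal effects in a larger graph from a smaller graph, and provides the corresponding identifying functional.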

Abstract

Data fusion, the process of combining observational and experimental data, can enable the identification of causal effects that would otherwise remain non-identifiable. Although identification algorithms have been developed for specific scenarios, do-calculus remains the only general-purpose tool for causal data fusion, particularly when variables are present in some data sources but not others. However, approaches based on do-calculus may encounter computational challenges as the number of variables increases and the causal graph grows in complexity. Consequently, there exists a need to reduce the size of such models while preserving the essential features. For this purpose, we propose pruning (removing unnecessary variables) and clustering (combining variables) as preprocessing operations for causal data fusion. We generalize earlier results on a single data source and derive conditions for applying pruning and clustering in the case of multiple data sources. We give sufficient conditions for inferring the identifiability or non-identifiability of a causal effect in a larger graph based on a smaller graph and show how to obtain the corresponding identifying functional for identifiable causal effects. Examples from epidemiology and social science demonstrate the use of the results.
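As a rough illustration of pruning, the sketch below applies a simple single-source step: when the target is $P(y \mid do(x))$, variables that are not ancestors of $X \cup Y$ can be dropped before running an identification procedure. It assumes a DAG given as a hypothetical `node -> parents` dictionary, ignores latent confounders (bidirected edges), and does not implement the paper's conditions for multiple data sources or its clustering operation.

```python
def ancestors(parents, targets):
    """Return the targets together with all of their ancestors in the DAG."""
    seen, stack = set(), list(targets)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, []))
    return seen


def prune_non_ancestors(parents, treatment, outcome):
    """Keep only ancestors of treatment ∪ outcome and restrict each parent list to them."""
    keep = ancestors(parents, set(treatment) | set(outcome))
    return {v: [p for p in parents.get(v, []) if p in keep] for v in keep}
```

Descendants of the outcome and unrelated branches disappear, so any subsequent do-calculus search runs on a smaller graph.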

2509.03734 2026-06-18 cs.DS cs.LG Updated version

How fast can you find a good hypothesis?

Anders Aamand, Maryam Aliakbarpour, Justin Y. Chen, Sandeep Silwal

Affiliations: BARC, University of Copenhagen; Rice University; MIT; University of Wisconsin-Madison

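AI summary: Studies the hypothesis selection problem, proposing a poly(n)-time algorithm that outputs a mixture and achieves the approximation guarantee C=3-2/n, and improving the runtime of proper algorithms to Õ(n/(δε²)).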

Comments Abstract abridged to meet arxiv requirements. This is the full version of a paper appearing at COLT 2026

英文摘要

In the hypothesis selection problem, we are given sample and query access to a finite set of candidate distributions (hypotheses), $\mathcal{H} = \{H_1, \ldots, H_n\}$, and samples from an unknown distribution $P$, both over a domain $\mathcal{X}$. The goal is to output a distribution $Q$ whose distance to $P$ is comparable to that of the nearest hypothesis in $\mathcal{H}$. Specifically, if the minimum distance is $\mathsf{OPT}$, we aim to output $Q$ such that, with probability at least $1-\delta$, its total variation distance to $P$ is at most $C \cdot \mathsf{OPT} + \varepsilon$. The optimal approximation for proper algorithms (where $Q \in \mathcal{H}$) is $C=3$ using $\Theta(\log(n/\delta)/\varepsilon^2)$ samples from $P$, and for improper algorithms (where $Q$ is not necessarily in $\mathcal{H}$) it is $C=2$ using $\tilde{\Theta}(\log(n/\delta)/\varepsilon^2)$ samples from $P$. In the improper setting, the algorithm achieving $C=2$ [Bousquet, Braverman, Kol, Efremenko, Moran, FOCS 2021] runs in time that grows polynomially with $|\mathcal{X}|$; it does not run in finite time for real-valued distributions. A promising path towards improved runtime is to consider improper algorithms that output a mixture $Q$ of the hypotheses, as such a distribution can be represented in $n$ words of memory. We show (1) a lower bound: no algorithm that outputs a mixture can achieve approximation better than $C = 3-2/n$ unless the number of samples is polynomial in $|\mathcal{X}|$, and (2) an algorithm that runs in time $\text{poly}(n)$ and achieves the same approximation guarantee. In the proper setting, [Aliakbarpour, Bun, Smith, NeurIPS 2024] provided an algorithm with $C=3$ running in $\tilde{O}(n/(\delta^3\varepsilon^3))$ time. We improve this time complexity to $\tilde{O}(n/(\delta\varepsilon^2))$, significantly reducing the dependence on the confidence and error parameters.
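For orientation, a classical proper baseline is the minimum distance estimate over Scheffé sets (Devroye and Lugosi), which lands within a factor 3 of OPT plus a sampling-error term, but whose set computations scale with the domain size |X|, the dependence the abstract discusses. The sketch below is that baseline, not the algorithm proposed in this paper; it assumes a small finite domain and hypotheses given as Python dicts mapping points to probabilities.

```python
from collections import Counter
from itertools import permutations


def mass(dist, subset):
    """Probability that `dist` (a dict x -> p) assigns to `subset`."""
    return sum(dist.get(x, 0.0) for x in subset)


def min_distance_estimate(hypotheses, samples):
    """Index of the hypothesis chosen by the minimum distance estimate.

    hypotheses: list of dicts mapping domain points to probabilities.
    samples: draws from the unknown distribution P.
    """
    counts = Counter(samples)
    empirical = {x: c / len(samples) for x, c in counts.items()}
    domain = set(counts).union(*hypotheses)

    # Scheffé sets A_ij = {x : H_i(x) > H_j(x)} for every ordered pair (i, j).
    scheffe_sets = [
        {x for x in domain if hi.get(x, 0.0) > hj.get(x, 0.0)}
        for hi, hj in permutations(hypotheses, 2)
    ]

    def discrepancy(h):
        # Worst gap between h and the empirical measure over all Scheffé sets.
        return max((abs(mass(h, a) - mass(empirical, a)) for a in scheffe_sets), default=0.0)

    return min(range(len(hypotheses)), key=lambda i: discrepancy(hypotheses[i]))


h0 = {0: 0.9, 1: 0.1}
h1 = {0: 0.1, 1: 0.9}
h2 = {0: 0.5, 1: 0.5}
samples = [0] * 9 + [1]
print(min_distance_estimate([h0, h1, h2], samples))  # → 0
```

Building all n(n-1) Scheffé sets and summing over them is what makes this approach cost time proportional to |X| per set.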

2603.04895 2026-06-18 stat.ML cs.LG math.OC 版本更新

How Does the ReLU Activation Affect the Implicit Bias of Gradient Descent on High-dimensional Neural Network Regression?

ReLU激活函数如何影响高维神经网络回归中梯度下降的隐式偏差?

Kuo-Wei Lai, Guanghui Wang, Molei Tao, Vidya Muthukumar

发表机构 * Georgia Institute of Technology(佐治亚理工学院)

AI总结 本文通过原始-对偶分析,研究了高维随机数据下浅层ReLU模型平方损失梯度下降的隐式偏差,证明其以高概率近似最小ℓ2范数解,差距为Θ(√(n/||λ||₁))。

Comments 66 pages

AI中文摘要

过度参数化的机器学习模型(包括神经网络)通常会导致欠定的训练目标,具有多个全局最小值。隐式偏差指的是通过常见优化算法(如梯度下降)达到的极限全局最小值。在本文中,我们刻画了在高维随机特征上使用平方损失训练浅层ReLU模型时梯度下降的隐式偏差。先前的工作(Vardi和Shamir,2021)表明,在最坏情况下隐式偏差不存在,或者在完全正交数据下恰好对应于最小ℓ2范数插值解(Boursier等人,2022)。我们的工作介于这两个极端之间,并表明,对于足够高维的随机数据,隐式偏差以高概率近似最小ℓ2范数解,差距为Θ(√(n/||λ||₁)),其中n是训练样本数,λ表示数据协方差矩阵的谱。我们的结果通过一种新颖的原始-对偶分析获得,该分析仔细跟踪了预测、数据跨度系数及其相互作用的演变,并表明ReLU激活模式在随机数据上以高概率迅速稳定。

英文摘要

Overparameterized ML models, including neural networks, typically induce underdetermined training objectives with multiple global minima. The implicit bias refers to the limiting global minimum that is attained by a common optimization algorithm, such as gradient descent (GD). In this paper, we characterize the implicit bias of GD for training a shallow ReLU model with the squared loss on high-dimensional random features. Prior work showed that the implicit bias does not exist in the worst case (Vardi and Shamir, 2021), and that it corresponds exactly to the minimum-$\ell_2$-norm interpolating solution under exactly orthogonal data (Boursier et al., 2022). Our work interpolates between these two extremes and shows that, for sufficiently high-dimensional random data, the implicit bias approximates the minimum-$\ell_2$-norm solution with high probability, with a gap on the order of $\Theta(\sqrt{n/\|\lambda\|_1})$, where $n$ is the number of training examples and $\lambda$ denotes the spectrum of the data covariance matrix. Our results are obtained through a novel primal-dual analysis that carefully tracks the evolution of predictions, data-span coefficients, as well as their interactions, and show that the ReLU activation pattern quickly stabilizes with high probability over random data.
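The purely linear analogue of this result is a classical fact: gradient descent on least squares started from zero keeps its iterates in the row space of the data, so it converges to the minimum-ℓ2-norm interpolant X^T(XX^T)^{-1}y. The sketch below checks this on a tiny underdetermined system; it covers only the linear case, whereas the paper's point is that a ReLU model retains this bias only approximately, with the gap quoted above. The learning rate and step count are arbitrary choices that satisfy the usual stability condition for this data.

```python
def gd_least_squares(X, y, lr=0.05, steps=5000):
    """Plain gradient descent on 0.5 * ||Xw - y||^2, started from w = 0."""
    d = len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        residuals = [sum(wk * xk for wk, xk in zip(w, row)) - yi for row, yi in zip(X, y)]
        # The gradient X^T r is a combination of the rows of X, so w stays in their span.
        grad = [sum(r * row[k] for r, row in zip(residuals, X)) for k in range(d)]
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w


# Two equations, three unknowns: infinitely many interpolating solutions.
X = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
y = [1.0, 1.0]
w = gd_least_squares(X, y)
print([round(v, 6) for v in w])  # → [0.333333, 0.333333, 0.666667]
```

Among all solutions of w_0 + w_2 = 1 and w_1 + w_2 = 1, the vector [1/3, 1/3, 2/3] is the one with the smallest Euclidean norm, and GD finds it without any explicit regularizer.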

2605.20726 2026-06-18 stat.ME cs.LG stat.ML 版本更新

Everywhere Valid Bounds on False Discovery Proportions in Conformal Inference

在共形推断中对虚假发现比例的处处有效界

Ziang Song, Ying Jin, Emmanuel J. Candès

发表机构 * Department of Statistics, Stanford University(斯坦福大学统计学系) Department of Statistics and Data Science, University of Pennsylvania(宾夕法尼亚大学统计学与数据科学系) Department of Mathematics, Stanford University(斯坦福大学数学系)

AI总结 本文提出了一种在多重检验问题中对虚假发现比例(FDP)的处处有效界,通过构造高概率包络来保证在任意事后阈值选择下的统计保证,同时展示了该方法在异常检测和共形选择中的应用。

Comments 34 pages, 12 figures. Code available at https://github.com/sza919/everywhere-valid-fdp-bounds-in-conformal-inference

AI中文摘要

现代将共形推断应用于多重检验问题(如异常检测和候选选择)时,通常涉及选择共形p值低于某一阈值的测试样本。此类方法的质量通常通过虚假发现比例(FDP)来衡量,定义为错误选择所占的比例。现有方法通常借助Benjamini-Hochberg过程等方法控制FDP的期望值。这种做法无法为实际实现的FDP提供高概率界,并且若在查看数据后再选择拒绝阈值,统计保证将会失效。本文建立了对所有可能拒绝阈值同时成立的有限样本、分布无关的FDP上界,从而允许在事后任意选择阈值。同时有效性通过从零假设共形p值的联合分布中采样、为其经验分布函数构造高概率包络来实现。此外,我们的框架允许从业者调节包络的形状,从而在主要关注的拒绝区域中得到更紧的界。我们利用这一灵活方法,推导出异常检测和共形选择的同时FDP上界。通过合成数据和真实数据实验,我们表明所得的界既有效,且保守性明显低于现有方法得到的界。

英文摘要

Modern applications of conformal inference to multiple testing problems, such as outlier detection and candidate selection, often involve selecting test samples whose conformal p-values fall below a threshold. The quality of such methods is often measured by the false discovery proportion (FDP), defined as the fraction of incorrect selections. Existing approaches typically control the expected value of the FDP, using methods such as the Benjamini-Hochberg procedure. This approach fails to provide high-probability bounds on the realized false discovery proportion and invalidates statistical guarantees if the rejection threshold is selected after inspecting the data. This paper establishes finite-sample, distribution-free upper bounds on the FDP that hold simultaneously over all possible rejection thresholds, enabling arbitrary post hoc selection of the threshold. Simultaneous validity is achieved by constructing a high-probability envelope for the empirical distribution function of null conformal p-values by sampling from their joint distribution. Furthermore, our framework allows practitioners to modulate the envelope's shape, thereby producing tight bounds in rejection regions of primary interest. We use this flexible approach to derive simultaneous FDP upper bounds for both outlier detection and conformal selection. We demonstrate through synthetic and real-data experiments that the resulting bounds are both valid and substantially less conservative than those derived from existing approaches.
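To make the baseline concrete: a conformal p-value ranks a test score against n calibration scores from inliers, and the Benjamini-Hochberg (BH) step-up procedure thresholds the resulting p-values to control the expected FDP. The minimal sketch below assumes that larger scores indicate outliers and implements only this baseline; the paper's simultaneous envelope bounds are not reproduced here.

```python
def conformal_pvalues(calib_scores, test_scores):
    """Conformal p-values for outlier detection (larger score = more anomalous)."""
    n = len(calib_scores)
    return [(1 + sum(s >= t for s in calib_scores)) / (n + 1) for t in test_scores]


def benjamini_hochberg(pvalues, alpha):
    """Indices rejected by BH at level alpha (controls the expected FDP, i.e. the FDR)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda j: pvalues[j])
    k = 0
    for rank, j in enumerate(order, start=1):
        if pvalues[j] <= alpha * rank / m:
            k = rank  # largest rank satisfying the step-up condition
    return sorted(order[:k])


calib = [float(s) for s in range(1, 20)]  # 19 inlier calibration scores
test = [100.0, 0.5, 19.5, 10.0]
p = conformal_pvalues(calib, test)
print(p)                                   # → [0.05, 1.0, 0.05, 0.55]
print(benjamini_hochberg(p, alpha=0.25))   # → [0, 2]
```

BH guarantees only that the FDP is controlled on average at the fixed level alpha; the paper's contribution is a bound on the realized FDP that stays valid even if alpha (the threshold) is chosen after looking at these p-values.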

6. 高效学习、压缩与部署 6 篇

2509.22020 2026-06-18 cs.LG 版本更新

Task-Adaptive Parameter-Efficient Fine-Tuning for Weather Foundation Models

面向天气基础模型的任务自适应参数高效微调

Shilei Cao, Hehai Lin, Jiashun Cheng, Yang Liu, Guowen Li, Xuehe Wang, Juepeng Zheng, Haoyuan Liang, Meng Jin, Chengwei Qin, Hong Cheng, Haohuan Fu

发表机构 * Sun Yat-sen University(中山大学) The Hong Kong University of Science and Technology (Guangzhou)(香港科技大学(广州)) The Hong Kong University of Science and Technology(香港科技大学) The Chinese University of Hong Kong(香港中文大学) National Supercomputing Center in Shenzhen(深圳国家超算中心) Huawei Technologies Co., Ltd(华为技术有限公司) Tsinghua University(清华大学)

AI总结 提出WeatherPEFT框架,通过任务自适应动态提示和随机Fisher引导自适应选择,在天气下游任务上以更少参数达到全微调性能。

AI中文摘要

尽管机器学习的最新进展使天气基础模型(WFM)在多种下游任务中具备了强大的泛化能力,但随着模型规模扩大,计算需求不断攀升,实际部署愈发困难。当前为视觉或语言任务设计的参数高效微调(PEFT)方法无法应对天气下游任务的独特挑战,如变量异质性、分辨率多样性和时空覆盖变化,导致在WFM上性能欠佳。为弥补这一差距,我们提出WeatherPEFT,一种新颖的PEFT框架,包含两项协同创新。首先,在前向传播中,任务自适应动态提示(TADP)通过内部和外部模式提取,将编码器中的嵌入权重动态注入预训练骨干网络的输入令牌,实现针对特定下游任务的上下文感知特征重校准。其次,在反向传播中,随机Fisher引导自适应选择(SFAS)不仅利用Fisher信息识别并更新最关键的任务参数,从而保留不变的预训练知识,还引入随机性以稳定选择过程。我们在三个下游任务上验证了WeatherPEFT的有效性和效率:现有PEFT方法与全微调相比存在显著差距,而WeatherPEFT使用更少的可训练参数达到了与全微调相当的性能。本工作代码见 https://github.com/ShileiCao/WeatherPEFT。

英文摘要

While recent advances in machine learning have equipped Weather Foundation Models (WFMs) with substantial generalization capabilities across diverse downstream tasks, the escalating computational requirements associated with their expanding scale increasingly hinder practical deployment. Current Parameter-Efficient Fine-Tuning (PEFT) methods, designed for vision or language tasks, fail to address the unique challenges of weather downstream tasks, such as variable heterogeneity, resolution diversity, and spatiotemporal coverage variations, leading to suboptimal performance when applied to WFMs. To bridge this gap, we introduce WeatherPEFT, a novel PEFT framework for WFMs incorporating two synergistic innovations. First, during the forward pass, Task-Adaptive Dynamic Prompting (TADP) dynamically injects the embedding weights within the encoder into the input tokens of the pre-trained backbone via internal and external pattern extraction, enabling context-aware feature recalibration for specific downstream tasks. Furthermore, during backpropagation, Stochastic Fisher-Guided Adaptive Selection (SFAS) not only leverages Fisher information to identify and update the most task-critical parameters, thereby preserving invariant pre-trained knowledge, but also introduces randomness to stabilize the selection. We demonstrate the effectiveness and efficiency of WeatherPEFT on three downstream tasks, where existing PEFT methods show significant gaps versus Full-Tuning, and WeatherPEFT achieves performance parity with Full-Tuning using fewer trainable parameters. The code of this work is available at https://github.com/ShileiCao/WeatherPEFT.
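The abstract describes SFAS only at a high level. As a hedged illustration of the two ingredients it names, the sketch below scores parameters by the empirical diagonal Fisher information (mean squared per-sample gradient) and then picks which ones to update, either deterministically (top-k) or stochastically. The stochastic rule, weighted sampling without replacement via Efraimidis-Spirakis keys, is an assumption chosen for illustration; the abstract does not say how SFAS injects randomness.

```python
import random


def diagonal_fisher(per_sample_grads):
    """Empirical diagonal Fisher: mean squared gradient of each parameter."""
    n = len(per_sample_grads)
    d = len(per_sample_grads[0])
    return [sum(g[i] ** 2 for g in per_sample_grads) / n for i in range(d)]


def select_parameters(fisher, k, rng=None):
    """Return sorted indices of k parameters to unfreeze.

    rng is None: deterministic top-k by Fisher score.
    rng given (hypothetical stochastic variant): weighted sampling without
    replacement using keys u ** (1 / w), so high-Fisher parameters are
    favoured but not always chosen.
    """
    if rng is None:
        ranked = sorted(range(len(fisher)), key=lambda i: fisher[i], reverse=True)
        return sorted(ranked[:k])
    candidates = [i for i, f in enumerate(fisher) if f > 0]  # zero-Fisher params are never picked
    keys = {i: rng.random() ** (1.0 / fisher[i]) for i in candidates}
    return sorted(sorted(candidates, key=keys.get, reverse=True)[:k])


grads = [[1.0, 0.0, 2.0, 0.1], [1.0, 0.0, 2.0, 0.1]]  # toy per-sample gradients
fisher = diagonal_fisher(grads)
print(select_parameters(fisher, 2))                    # → [0, 2]
print(select_parameters(fisher, 2, random.Random(0)))  # seed-dependent subset
```

The gradients here are made-up numbers; in a real model they would come from backpropagation on the downstream task's data.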

2601.21626 2026-06-18 cs.LG cs.AI 版本更新

HeRo-Q: A General Framework for Stable Low Bit Quantization via Hessian Conditioning

HeRo-Q: 通过Hessian条件化实现稳定低比特量化的通用框架

Jinhao Zhang, Yunquan Zhang, Zicheng Yan, Boyang Zhang, Jun Sun, Daning Cheng

发表机构 * Beijing University of Posts and Telecommunications(北京邮电大学) Institute of Computing Technology, Chinese Academy of Sciences(中国科学院计算技术研究所) University of Science and Technology of China(中国科学技术大学) Zhejiang Lab(之江实验室) Peng Cheng Laboratory(鹏城实验室)

AI总结 针对后训练量化中“低误差、高损失”的矛盾,提出HeRo-Q算法,通过轻量可学习的旋转压缩矩阵重塑损失景观,降低最大Hessian特征值,增强对量化噪声的鲁棒性,在Llama和Qwen模型上优于现有方法。

AI中文摘要

后训练量化(PTQ)是一种主流的模型压缩技术,但由于其仅专注于最小化量化误差,常常导致矛盾的“低误差、高损失”现象。根本原因在于LLM损失景观的Hessian矩阵:少数高曲率方向对扰动极其敏感。为了解决这个问题,我们提出了Hessian鲁棒量化(HeRo-Q)算法,该算法在量化前对权重空间应用一个轻量级、可学习的旋转压缩矩阵。这个联合框架通过降低Hessian矩阵的最大特征值来重塑损失景观,从而显著增强对量化噪声的鲁棒性。HeRo-Q不需要修改架构,计算开销可忽略不计,并且可以无缝集成到现有的PTQ流程中。在Llama和Qwen模型上的实验表明,HeRo-Q不仅在标准W4A8设置下持续优于包括GPTQ、AWQ和SpinQuant在内的最先进方法,而且在极具挑战性的W3A16超低比特场景中表现出色,将Llama3 8B在GSM8K上的准确率提升至70.15%,并有效避免了激进量化中常见的逻辑崩溃。

英文摘要

Post-Training Quantization (PTQ), a mainstream model compression technique, often leads to the paradoxical 'low error, high loss' phenomenon because it focuses solely on minimizing quantization error. The root cause lies in the Hessian matrix of the LLM loss landscape: a few high-curvature directions are extremely sensitive to perturbations. To address this, we propose the Hessian Robust Quantization (HeRo-Q) algorithm, which applies a lightweight, learnable rotation-compression matrix to the weight space prior to quantization. This joint framework reshapes the loss landscape by reducing the largest Hessian eigenvalue, thereby significantly enhancing robustness to quantization noise. HeRo-Q requires no architectural modifications, incurs negligible computational overhead, and integrates seamlessly into existing PTQ pipelines. Experiments on Llama and Qwen models show that HeRo-Q consistently outperforms state-of-the-art methods, including GPTQ, AWQ, and SpinQuant, not only achieving superior performance under standard W4A8 settings but also excelling in the highly challenging W3A16 ultra-low-bit regime, where it boosts GSM8K accuracy on Llama3 8B to 70.15% and effectively avoids the logical collapse commonly seen in aggressive quantization.
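The "low error, high loss" effect follows from the second-order proxy commonly used in the PTQ literature: near a minimum, the loss increase caused by a quantization error δ is approximately ½ δᵀHδ, so errors of equal norm cost very different amounts depending on whether they align with high-curvature directions. The toy below assumes a hypothetical diagonal 2×2 Hessian and round-to-nearest quantization; it illustrates the motivation only and does not implement HeRo-Q's learned rotation-compression matrix.

```python
def quantize_rtn(weights, step):
    """Round-to-nearest uniform quantization with a fixed step size."""
    return [step * round(w / step) for w in weights]


def quadratic_loss_increase(delta, hessian):
    """Second-order proxy 0.5 * delta^T H delta for the loss change of a weight perturbation."""
    d = len(delta)
    return 0.5 * sum(delta[i] * hessian[i][j] * delta[j] for i in range(d) for j in range(d))


# Hypothetical Hessian with one sharp (high-curvature) direction.
H = [[100.0, 0.0], [0.0, 1.0]]

# Two errors with the same squared norm (0.01) ...
print(quadratic_loss_increase([0.1, 0.0], H))  # ≈ 0.5
print(quadratic_loss_increase([0.0, 0.1], H))  # ≈ 0.005
# ... differ 100x in predicted loss. An actual RTN error on toy weights:
w = [0.26, -0.74]
delta = [q - x for q, x in zip(quantize_rtn(w, 0.5), w)]
print(quadratic_loss_increase(delta, H))  # ≈ 2.9088
```

Lowering the largest eigenvalue of H, as the abstract describes, shrinks exactly the term that dominates this proxy.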

2602.00161 2026-06-18 cs.LG cs.AI cs.CL quant-ph 版本更新

LLM Compression by Block Removal with Constrained Binary Optimization

通过带约束二进制优化的块移除进行LLM压缩

David Jansen, Roman Rausch, Ali Hashemi, David Montero, Román Orús

发表机构 * Multiverse Computing(多维计算公司) Donostia International Physics Center(多斯蒂亚国际物理中心) Ikerbasque Foundation for Science(伊克尔巴斯克科学基金会)

AI总结 提出将大语言模型块移除压缩问题建模为约束二进制优化,映射到Ising玻璃系统,实现高效排序和高质量非连续块移除,在50%压缩时MMLU提升近23个百分点,且计算高效、通用性强。

Comments 16 pages, 3 figures

AI中文摘要

在本文中,我们将通过最优删除Transformer块(“块移除”)来压缩大语言模型(LLM)的问题,表述为一个约束二进制优化(CBO)问题,该问题可以映射到物理系统(Ising玻璃),其能量是下游模型性能的强代理。这种表述使得能够高效地对大量候选块移除配置进行排序,产生许多高质量、非平凡的解决方案,而不仅仅是移除连续区域。我们的方法在深度压缩场景中表现强劲,例如在Llama-3.3-70B-Instruct的50%压缩中,与其他最先进的块移除方法相比,我们在MMLU基准上取得了近23个百分点的提升。对于较轻的压缩,它在多个基准上与这些方法表现相当,适用于Llama-3.1-8B-Instruct、Qwen3-14B(重训练前后)以及Llama-3.3-70B-Instruct。该方法计算效率高,仅需在校准数据集上对少数活跃参数进行前向和反向传播。此外,我们证明,当无法精确求解CBO问题时,使用良好的启发式求解器可以在可忽略的运行时间内提供在下游任务上表现良好的解决方案。该方法可以轻松应用于任何架构。我们在最近的NVIDIA-Nemotron-3-Nano-30B-A3B-FP8模型上展示了这种通用性,该模型具有高度不均匀且具有挑战性的块结构,并且在移除2个注意力层或3个混合专家层时,我们在AIME25和GPQA上超越了最先进水平。

英文摘要

In this paper, we formulate the compression of large language models (LLMs) by optimally deleting transformer blocks (``block removal'') as a constrained binary optimization (CBO) problem that can be mapped to a physical system (Ising glass), whose energies are a strong proxy for downstream model performance. This formulation enables an efficient ranking of a large number of candidate block-removal configurations, yielding many high-quality, non-trivial solutions beyond those that only remove consecutive regions. Our method performs strongly in the deep compression regime: for 50% compression of Llama-3.3-70B-Instruct, we achieve an increase of almost 23 percentage points on the MMLU benchmark compared to other state-of-the-art (SOTA) block-removal methods. For lighter compression, it performs on par with those methods across several benchmarks for Llama-3.1-8B-Instruct, Qwen3-14B (both before and after retraining), as well as Llama-3.3-70B-Instruct. The approach is computationally efficient and requires only forward and backward passes on a calibration dataset for a few active parameters. Additionally, we demonstrate that when it is infeasible to solve the CBO problem exactly, good heuristic solvers provide solutions that perform well on downstream tasks in negligible runtime. The method can be readily applied to any architecture. We illustrate this generality on the recent NVIDIA-Nemotron-3-Nano-30B-A3B-FP8 model, which exhibits a highly inhomogeneous and challenging block structure, and where we outperform SOTA for AIME25 and GPQA when removing either 2 attention layers or 3 mixture-of-experts layers.
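To make the formulation concrete: with one binary variable per block (1 = remove), the objective is an Ising/QUBO-style energy with per-block linear terms and pairwise couplings, subject to the constraint that exactly k blocks are removed. The sketch below uses made-up coefficients and brute-force enumeration, which is only feasible for a handful of blocks; the paper derives its energy from forward and backward passes on calibration data and relies on heuristic solvers at scale.

```python
from itertools import combinations


def energy(removed, h, J):
    """Energy of a removal set: linear terms plus pairwise couplings J[(i, j)] with i < j."""
    e = sum(h[i] for i in removed)
    e += sum(J.get((i, j), 0.0) for i, j in combinations(sorted(removed), 2))
    return e


def rank_removals(h, J, k, top=3):
    """Score every way to remove exactly k blocks (the constraint) and return the lowest energies."""
    scored = [(energy(c, h, J), c) for c in combinations(range(len(h)), k)]
    return sorted(scored)[:top]


# Hypothetical coefficients for a 5-block model: h[i] ~ cost of dropping block i alone,
# J[(i, j)] ~ extra cost of dropping blocks i and j together.
h = [5.0, 1.0, 1.0, 4.0, 0.5]
J = {(1, 2): 3.0, (2, 4): 0.2}
for e, blocks in rank_removals(h, J, k=2):
    print(blocks, e)  # best: (1, 4), then (2, 4), then (3, 4)
```

The best configuration here removes non-adjacent blocks 1 and 4, while the individually cheap pair (1, 2) is ruled out by its coupling term, a small instance of the "beyond consecutive regions" point made in the abstract.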

2512.12850 2026-06-18 cs.AR cs.LG cs.SY eess.SY hep-ex 版本更新

KANELÉ: Kolmogorov-Arnold Networks for Efficient LUT-based Evaluation

KANELÉ:基于Kolmogorov-Arnold网络的高效LUT评估

Duc Hoang, Aarush Gupta, Philip Harris

发表机构 * Massachusetts Institute of Technology(麻省理工学院)

AI总结 提出KANELÉ框架,利用Kolmogorov-Arnold网络(KAN)的独特性质,通过量化与剪枝协同优化,首次系统实现FPGA上的高效LUT映射,相比先前方法加速高达2700倍并节省大量资源。

Comments International Symposium on Field-Programmable Gate Arrays 2026 (ISFPGA'2026)

AI中文摘要

低延迟、资源高效的FPGA神经网络推理对于需要实时能力和低功耗的应用至关重要。基于查找表(LUT)的神经网络是一种常见解决方案,结合了强大的表示能力和高效的FPGA实现。在这项工作中,我们介绍了KANELÉ,一个利用Kolmogorov-Arnold网络(KAN)独特性质进行FPGA部署的框架。与传统的多层感知器(MLP)不同,KAN使用可学习的一维样条作为边缘激活函数,其域固定,这种结构天然适合离散化和高效的LUT映射。我们提出了第一个在FPGA上实现KAN的系统设计流程,通过量化与剪枝协同优化训练,以实现紧凑、高吞吐量和低延迟的KAN架构。我们的结果表明,与先前的KAN-on-FPGA方法相比,加速高达2700倍,并节省了数量级的资源。此外,KANELÉ在广泛使用的基准测试中匹配或超越了其他基于LUT的架构,特别是在涉及符号或物理公式的任务中,同时平衡了FPGA硬件上的资源使用。最后,我们通过将框架扩展到实时、高能效的控制系统,展示了其多功能性。

英文摘要

Low-latency, resource-efficient neural network inference on FPGAs is essential for applications demanding real-time capability and low power. Lookup table (LUT)-based neural networks are a common solution, combining strong representational power with efficient FPGA implementation. In this work, we introduce KANELÉ, a framework that exploits the unique properties of Kolmogorov-Arnold Networks (KANs) for FPGA deployment. Unlike traditional multilayer perceptrons (MLPs), KANs employ learnable one-dimensional splines with fixed domains as edge activations, a structure naturally suited to discretization and efficient LUT mapping. We present the first systematic design flow for implementing KANs on FPGAs, co-optimizing training with quantization and pruning to enable compact, high-throughput, and low-latency KAN architectures. Our results demonstrate up to a 2700x speedup and orders of magnitude resource savings compared to prior KAN-on-FPGA approaches. Moreover, KANELÉ matches or surpasses other LUT-based architectures on widely used benchmarks, particularly for tasks involving symbolic or physical formulas, while balancing resource usage across FPGA hardware. Finally, we showcase the versatility of the framework by extending it to real-time, power-efficient control systems.
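The property the abstract relies on is that each KAN edge function is one-dimensional with a fixed domain, so after quantizing its input to b bits it can be replaced by a table with 2^b entries. The sketch below tabulates an arbitrary 1D function standing in for a trained spline; the uniform grid and nearest-index input quantization are assumptions for illustration, not KANELÉ's exact quantization scheme.

```python
def build_lut(f, lo, hi, bits):
    """Tabulate f at 2**bits evenly spaced points covering [lo, hi]."""
    levels = 2 ** bits
    return [f(lo + i * (hi - lo) / (levels - 1)) for i in range(levels)]


def lut_eval(table, x, lo, hi):
    """Quantize x to the nearest grid index (clamped to the domain) and read the table."""
    levels = len(table)
    idx = round((x - lo) / (hi - lo) * (levels - 1))
    idx = min(max(idx, 0), levels - 1)
    return table[idx]


# Stand-in for a trained edge activation on the fixed domain [-1, 1].
table = build_lut(lambda x: x * x, -1.0, 1.0, bits=3)
print(len(table))                          # → 8
print(lut_eval(table, 0.3, -1.0, 1.0))     # value at the nearest grid point 3/7, ≈ 0.1837
print(lut_eval(table, 5.0, -1.0, 1.0))     # → 1.0 (input clamped to the domain edge)
```

On an FPGA the table becomes LUT contents addressed directly by the b-bit input, so evaluating an edge needs no arithmetic at all.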

2602.02056 2026-06-18 cs.AR cs.LG cs.SY eess.SY stat.ML 版本更新

Ultrafast On-chip Online Learning via Spline Locality in Kolmogorov-Arnold Networks

基于Kolmogorov-Arnold网络中样条局部性的超快片上在线学习

Duc Hoang, Aarush Gupta, Philip Harris

发表机构 * MIT(麻省理工学院)

AI总结 针对量子计算和核聚变控制等高频系统对亚微秒级在线学习的需求,提出利用Kolmogorov-Arnold网络的B样条局部性实现稀疏更新和固定点量化鲁棒性,在FPGA上实现比MLP更高效、更具表达力的超快在线学习。

Comments Forty-Third International Conference on Machine Learning (ICML'26)

AI中文摘要

超快在线学习对于高频系统(如量子计算和核聚变控制)至关重要,这些系统中的自适应必须在亚微秒时间尺度内发生。满足这些需求需要在严格的内存约束下进行低延迟、固定精度的计算,而传统的多层感知器(MLP)在这种条件下既低效又不稳定。我们识别了Kolmogorov-Arnold网络(KAN)与这些约束相符的关键特性。具体来说,我们表明:(i)利用B样条局部性的KAN更新是稀疏的,从而实现优越的片上资源缩放;(ii)KAN对固定点量化具有固有的鲁棒性。通过在现场可编程门阵列(FPGA)上实现固定点在线训练(一种代表性的片上计算平台),我们证明基于KAN的在线学习器在一系列低延迟和资源受限的任务中比MLP显著更高效且更具表达力。据我们所知,这项工作首次展示了在亚微秒延迟下的无模型在线学习。

英文摘要

Ultrafast online learning is essential for high-frequency systems, such as controls for quantum computing and nuclear fusion, where adaptation must occur on sub-microsecond timescales. Meeting these requirements demands low-latency, fixed-precision computation under strict memory constraints, a regime in which conventional Multi-Layer Perceptrons (MLPs) are both inefficient and numerically unstable. We identify key properties of Kolmogorov-Arnold Networks (KANs) that align with these constraints. Specifically, we show that: (i) KAN updates exploiting B-spline locality are sparse, enabling superior on-chip resource scaling, and (ii) KANs are inherently robust to fixed-point quantization. By implementing fixed-point online training on Field-Programmable Gate Arrays (FPGAs), a representative platform for on-chip computation, we demonstrate that KAN-based online learners are significantly more efficient and expressive than MLPs across a range of low-latency and resource-constrained tasks. To our knowledge, this work is the first to demonstrate model-free online learning at sub-microsecond latencies.
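The locality argument can be checked directly: a degree-k B-spline has at most k+1 non-zero basis functions at any input, so the gradient of a spline edge with respect to its coefficients touches at most k+1 entries. The sketch below uses the Cox-de Boor recursion on a uniform knot grid (an assumption; the paper's grids are not specified here) and floating-point rather than the paper's fixed-point updates.

```python
def bspline_basis(i, k, knots, x):
    """Value at x of the i-th B-spline basis function of degree k (Cox-de Boor recursion)."""
    if k == 0:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left = right = 0.0
    if knots[i + k] != knots[i]:
        left = (x - knots[i]) / (knots[i + k] - knots[i]) * bspline_basis(i, k - 1, knots, x)
    if knots[i + k + 1] != knots[i + 1]:
        right = ((knots[i + k + 1] - x) / (knots[i + k + 1] - knots[i + 1])
                 * bspline_basis(i + 1, k - 1, knots, x))
    return left + right


def sgd_step(coeffs, knots, k, x, grad_out, lr):
    """One SGD step on y(x) = sum_i c_i * B_i(x), using dy/dc_i = B_i(x).

    Returns the indices that were actually updated. For clarity this scans every
    basis function; a hardware design would compute the active span from x directly.
    """
    touched = []
    for i in range(len(coeffs)):
        b = bspline_basis(i, k, knots, x)
        if b != 0.0:
            coeffs[i] -= lr * grad_out * b
            touched.append(i)
    return touched


degree = 3
knots = [float(t) for t in range(-3, 14)]   # 17 uniform knots
coeffs = [0.0] * (len(knots) - degree - 1)  # 13 cubic basis functions
print(sgd_step(coeffs, knots, degree, 4.5, grad_out=1.0, lr=0.1))  # → [4, 5, 6, 7]
```

Only 4 of the 13 coefficients change, regardless of how many basis functions the spline has; an MLP layer, by contrast, updates every weight on every step.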

2606.04404 2026-06-18 stat.ML cs.LG 版本更新

Knockoffs-based False Discovery Rate Control and Simplification for Deep Neural Networks

基于Knockoffs的深度神经网络错误发现率控制与简化

Wenyu Liao, Yiqing Shi, Fang Xie

发表机构 * bnbu.edu.cn(北京师范大学-香港浸会大学)

AI总结 本文基于knockoff方法和正则化神经网络,提出了三种在控制错误发现率条件下的变量筛选方法(单层过滤、多层过滤、变量权重聚合过滤),以简化深度神经网络并降低计算复杂度。

AI中文摘要

深度神经网络是机器学习中广泛使用的框架,已广泛应用于各个领域。然而,深度神经网络通常涉及大量参数和输入,其中许多可能与目标或真实输出无关。这些参数和输入变量不仅增加了计算复杂度,还导致了额外的计算成本。解决这一问题的一种方法是knockoff方法,该方法在高维回归中已被证明能有效控制错误发现率。基于knockoff方法和正则化神经网络,本文提出了三种在控制错误发现率条件下的变量筛选方法:单层过滤、多层过滤、变量权重聚合过滤。与现有算法相比,我们发现我们的算法表现出令人满意的性能。

英文摘要

Deep neural networks are a widely used machine learning framework that has been applied in many fields. However, deep neural networks often involve a large number of parameters and inputs, many of which may be irrelevant to the target or true output. These parameters and input variables increase both computational complexity and computational cost. One solution to this problem is knockoff methods, which have proven successful in controlling false discovery rates in high-dimensional regression. Building on knockoff methods and using regularised neural networks, this paper proposes three variable screening methods that control the false discovery rate: a one-layer filter, a multiple-layer filter, and a variable weight aggregation filter. In comparison with existing algorithms, we find that our algorithms show satisfactory performance.
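In knockoff methods the final selection step is typically the knockoff+ filter of Barber and Candès: given one statistic W_j per variable (large positive values mean the real variable beats its knockoff copy), it picks the smallest threshold whose estimated FDP is at most q. The sketch below implements that filter on hypothetical statistics; how the paper computes W_j from regularised network weights is not reproduced.

```python
def knockoff_plus_select(W, q):
    """Knockoff+ filter.

    Returns indices j with W_j >= T, where T is the smallest t in {|W_j| : W_j != 0}
    whose estimated FDP (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) is at most q.
    """
    for t in sorted({abs(w) for w in W if w != 0}):
        fdp_hat = (1 + sum(w <= -t for w in W)) / max(1, sum(w >= t for w in W))
        if fdp_hat <= q:
            return [j for j, w in enumerate(W) if w >= t]
    return []  # no threshold reaches the target level


# Hypothetical statistics for 10 input variables.
W = [5.0, 4.0, 3.5, 3.0, -0.5, 0.2, 2.5, -0.1, 2.0, 1.5]
print(knockoff_plus_select(W, q=0.2))  # → [0, 1, 2, 3, 6, 8, 9]
```

Negative statistics act as a built-in count of likely false discoveries, which is what lets the filter estimate the FDP without knowing which variables are null.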

7. 联邦学习、隐私与安全 4 篇

2502.10239 2026-06-18 cs.LG cs.AI 版本更新

Efficient Zeroth-Order Federated Finetuning of Language Models on Resource-Constrained Devices

资源受限设备上语言模型的高效零阶联邦微调

Mohamed Aboelenien Ahmed, Kilian Pfeiffer, Ramin Khalili, Heba Khdr, Jörg Henkel

发表机构 * Karlsruhe Institute of Technology(卡尔斯鲁厄理工学院) Huawei(华为) Heisenberg Research Center (Munich), Germany(海森堡研究中心(慕尼黑),德国)

AI总结 提出一种基于零阶优化的联邦微调方法,通过分块模型并为后一块分配更多扰动,复用中间激活以减少前向评估次数,在保持内存和通信优势的同时将计算量最多降低至其他零阶方法的1/3。

Comments Published at TMLR

AI中文摘要

联邦学习是一种有前景的范式,可以在分布式数据源上微调大型语言模型,同时保护数据隐私。然而,由于资源需求高,在边缘设备上微调如此大的模型具有挑战性。零阶优化通过有限差分近似估计梯度,依赖于模型参数随机扰动下的函数评估。因此,与任务对齐的零阶优化提供了一种潜在解决方案,允许仅使用前向传播(推理级内存需求和低通信开销)进行微调,但存在收敛慢和计算需求高的问题。在本文中,我们提出了一种新的基于零阶优化的方法,采用更高效的技术来降低使用大量扰动带来的计算需求,同时保留其收敛优势。具体做法是将模型分成连续的块,并为第二块分配更多扰动,从而能够高效复用中间激活,以更少的前向评估更新整个网络。我们在RoBERTa-large、OPT1.3B、LLaMa-3-3.2B模型上的评估显示,与其他基于零阶优化的技术相比,计算量最多减少至1/3,同时保留了相对于一阶联邦学习技术的内存和通信优势。

英文摘要

Federated Learning (FL) is a promising paradigm for finetuning Large Language Models (LLMs) across distributed data sources while preserving data privacy. However, finetuning such large models on edge devices is challenging due to their high resource demands. Zeroth-order Optimization (ZO) estimates gradients through finite-difference approximations, which rely on function evaluations under random perturbations of the model parameters. Consequently, ZO with task alignment provides a potential solution, allowing finetuning using only forward passes with inference-level memory requirements and low communication overhead, but it suffers from slow convergence and higher computational demand. In this paper, we propose a new ZO-based method that applies a more efficient technique to reduce the computational demand associated with using a large number of perturbations while preserving their convergence benefits. This is achieved by splitting the model into consecutive blocks and allocating a higher number of perturbations to the second block, enabling efficient reuse of intermediate activations to update the full network with fewer forward evaluations. Our evaluation on RoBERTa-large, OPT1.3B, LLaMa-3-3.2B models shows up to $3\times$ reduction in computation compared to the other ZO-based techniques, while retaining the memory and communication benefits over first-order federated learning techniques.
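For readers new to ZO methods, the basic estimator behind this line of work fits in a few lines: perturb the parameters along a random Gaussian direction z, take a central finite difference of the loss along z, and scale z by that slope. The sketch below is this generic two-point estimator on a toy quadratic; the paper's block splitting and activation reuse are not reproduced, and the step size eps and perturbation count are arbitrary choices.

```python
import random


def zo_gradient(f, theta, num_perturbations, eps=1e-3, rng=None):
    """Two-point (central-difference) zeroth-order gradient estimate.

    Each perturbation z ~ N(0, I) costs two forward evaluations of f;
    no backward pass is needed.
    """
    if rng is None:
        rng = random.Random()
    d = len(theta)
    grad = [0.0] * d
    for _ in range(num_perturbations):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        plus = [t + eps * zi for t, zi in zip(theta, z)]
        minus = [t - eps * zi for t, zi in zip(theta, z)]
        slope = (f(plus) - f(minus)) / (2 * eps)  # directional derivative along z
        for i in range(d):
            grad[i] += slope * z[i] / num_perturbations
    return grad


def loss(theta):
    return sum(t * t for t in theta)  # true gradient is 2 * theta


est = zo_gradient(loss, [1.0, -2.0], num_perturbations=20000, rng=random.Random(0))
print([round(g, 1) for g in est])  # close to [2.0, -4.0]; exact digits depend on the seed
```

The estimate is unbiased for this quadratic but noisy, so many perturbations (each costing two forward passes) are needed; reducing the cost of those extra passes is what the paper targets.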

2502.17748 2026-06-18 cs.LG cs.CR 版本更新

FinP: Fairness-in-Privacy in Federated Learning by Addressing Disparities in Privacy Risk

FinP:联邦学习中通过解决隐私风险差异实现隐私公平性

Tianyu Zhao, Mahmoud Srewa, Salma Elmalaki

发表机构 * University of California, Irvine(加州大学尔湾分校)

AI总结 针对联邦学习中隐私风险分布不均的问题,提出FinP框架,通过服务器端自适应聚合和客户端正则化技术,减轻源推理攻击风险,将隐私暴露差异降低57.14%,同时保持模型效用与基线相当。

Comments To appear in PoPETS 2026 Issue 4. Privacy Enhancing Technology Symposium (PETS) 2026

AI中文摘要

联邦学习(FL)固有地缓解了大规模数据集中化风险;然而,其隐私保护并非均匀分布,使得脆弱个体不成比例地暴露于复杂的隐私攻击之下。关键的是,以人为中心的FL环境中的统计异质性常常导致隐私风险的不公平分布,尤其影响那些敏感属性或行为使其成为异常值的个体。为解决这一关键差距,我们引入了FinP,这是一个新颖的框架,旨在通过减轻客户端对源推理攻击(SIA)的过度脆弱性来形式化和实施隐私公平性。FinP实施了一种双管齐下的防御策略,同时解决隐私差异的症状和根本原因,确保没有一组客户端承担过度的隐私负担。它结合了服务器端自适应聚合机制(根据客户端的估计隐私风险动态加权其贡献)和客户端正则化技术(抑制导致独特数据记忆的局部过拟合)。在FEMNIST、人类活动识别(HAR)和CIFAR-10数据集上的广泛实证评估表明,FinP有效地将隐私公平性与主要任务效用对齐。值得注意的是,FinP成功减轻了SIA风险并减少了隐私暴露差异,证明了强大的隐私公平性保证无需牺牲模型效用。最终,FinP通过将脆弱性差异降低高达57.14%,同时将全局模型效用保持在标准联邦基线±1.75%的微小范围内,建立了公平的隐私保护。

英文摘要

Federated Learning (FL) inherently mitigates mass data centralization risks; however, its privacy protections are not equally distributed, leaving vulnerable individuals disproportionately exposed to sophisticated privacy attacks. Crucially, statistical heterogeneity in human-centric FL environments often results in an inequitable distribution of privacy risks, particularly affecting those whose sensitive attributes or behaviors make them outliers. To address this critical gap, we introduce FinP, a novel framework designed to formalize and enforce fairness-in-privacy by mitigating disproportionate client vulnerability to Source Inference Attacks (SIA). FinP operationalizes a two-pronged defense strategy that tackles both the symptoms and root causes of privacy disparity, ensuring that no group of clients bears an excessive privacy burden. It combines a server-side adaptive aggregation mechanism, which dynamically weights client contributions based on their estimated privacy risk, with a client-side regularization technique to curb localized overfitting that drives unique data memorization. Extensive empirical evaluations on FEMNIST, Human Activity Recognition (HAR), and CIFAR-10 datasets demonstrate that FinP effectively aligns privacy fairness with primary task utility. Notably, FinP successfully mitigates SIA risks and reduces disparities in privacy exposure, establishing that strong fairness-in-privacy guarantees need not compromise model utility. Ultimately, FinP establishes equitable privacy protections by reducing vulnerability disparities by up to 57.14%, while preserving global model utility within a marginal ±1.75% of standard federated baselines.

2507.04219 2026-06-18 cs.LG cs.AI 版本更新

Model Collapse Is Not a Bug but a Feature in Machine Unlearning for LLMs

模型崩溃不是错误,而是大语言模型机器遗忘中的一种特性

Yan Scholten, Sophie Xhonneux, Leo Schwinn, Stephan Günnemann

发表机构 * Dept. of Computer Science & Munich Data Science Institute, Technical University of Munich(慕尼黑工业大学计算机科学系及慕尼黑数据科学研究所) Mila, Université de Montréal(蒙特利尔大学Mila)

AI总结 提出部分模型崩溃(PMC)方法,通过故意触发模型在目标数据上的分布崩溃实现遗忘,无需在遗忘目标上优化,有效移除私有信息并保持模型效用。

Comments Accepted at ICLR 2026

AI中文摘要

当前大语言模型的遗忘方法通过将待移除的私有信息纳入微调数据来进行优化。我们认为这不仅可能强化对敏感数据的暴露,而且从根本上违背了最小化其使用的原则。作为补救,我们提出了一种新颖的遗忘方法,即部分模型崩溃(PMC),其遗忘目标函数中无需包含待遗忘的数据。我们的方法受到最近一项观察的启发:在生成模型自身的生成结果上训练会导致分布崩溃,从而有效移除模型输出中的信息。我们的核心见解是,可以通过在希望移除的数据上故意触发模型崩溃,将其用于机器遗忘。我们的理论分析表明,该方法收敛到期望结果,即模型遗忘了目标移除的数据。我们通过实验证明,PMC克服了现有显式在遗忘目标上优化的遗忘方法的四个关键局限,并在保持通用模型效用的同时,更有效地从模型输出中移除私有信息。总体而言,我们的贡献是朝着更全面、更符合现实隐私约束的遗忘迈出的重要一步。代码见 https://www.cs.cit.tum.de/daml/partial-model-collapse/。

英文摘要

Current unlearning methods for LLMs optimize on the private information they seek to remove by incorporating it into their fine-tuning data. We argue this not only risks reinforcing exposure to sensitive data, but also fundamentally contradicts the principle of minimizing its use. As a remedy, we propose a novel unlearning method, Partial Model Collapse (PMC), which does not require unlearning targets in the unlearning objective. Our approach is inspired by recent observations that training generative models on their own generations leads to distribution collapse, effectively removing information from model outputs. Our central insight is that model collapse can be leveraged for machine unlearning by deliberately triggering it for data we aim to remove. Our theoretical analysis shows that our approach converges to the desired outcome, i.e., the model unlearns the data targeted for removal. We empirically demonstrate that PMC overcomes four key limitations of existing unlearning methods that explicitly optimize on unlearning targets, and more effectively removes private information from model outputs while preserving general model utility. Overall, our contributions represent an important step toward more comprehensive unlearning that better aligns with real-world privacy constraints. Code available at https://www.cs.cit.tum.de/daml/partial-model-collapse/.

2605.21115 2026-06-18 cs.DC cs.LG 版本更新

Automated Byzantine-Resilient Clustered Decentralized Federated Learning for Battery Intelligence in Connected EVs

自动化抗拜占庭攻击的集群化去中心化联邦学习用于网联电动汽车的电池智能

Mouhamed Amine Bouchiha, Abdelaziz Amara Korba, Yacine Ghamri-Doudane

发表机构 * SAMOVAR, Télécom SudParis(SAMOVAR,法国电信南巴黎学院) Department of Computer Science, German University of Technology in Oman (GUtech)(阿曼德国科技大学计算机科学系) L3i, La Rochelle University(拉罗谢尔大学L3i实验室)

AI总结 本文提出了一种自动化抗拜占庭攻击的集群化去中心化联邦学习框架ABC-DFL,用于网联电动汽车的电池智能,通过引入动态Quorum拜占庭容错(QBFT)协议和基于预言机(oracle)的聚合层,提升信任、安全和自动化水平;其中FLECA协议通过自适应阈值过滤恶意更新,有效缓解拜占庭攻击。

Comments 16 pages, 8 figures

AI中文摘要

联邦学习(FL)已成为在智能交通系统(ITS)中管理电动汽车(EV)电池数据的一种有前景的范式,可执行异常检测和容量估计等隐私保护任务。然而,大多数现有框架依赖集中式聚合方案,在安全性和信任方面存在关键局限。为应对这些挑战,我们提出了ABC-DFL,一种面向网联电动汽车的自动化抗拜占庭攻击集群化去中心化联邦学习(C-DFL)框架。所提出的激励驱动C-DFL系统用开放许可区块链取代中央服务器,引入新的动态Quorum拜占庭容错(QBFT)协议和基于预言机(oracle)的聚合层,以增强信任、安全和自动化。ABC-DFL的核心是FLECA(过滤分层增强聚类聚合),一种稳健的分层聚合协议:每辆EV基于相对其参考模型更新的偏差,使用自适应阈值过滤恶意更新,从而缓解拜占庭攻击。负责跨组聚合的Oracle节点利用稳健聚类来隔离并聚合来自可信EV组的模型更新。全面的实验评估表明,FLECA在良性条件下的收敛性与FedProx相当,并在自适应对抗场景中显著优于现有防御措施,攻击影响评分低于0.10。此外,多项多任务模型学习实验验证了激励机制的有效性和公平性。最后,链上和链下基准测试验证了ABC-DFL的实用性。

英文摘要

Federated learning (FL) has emerged as a promising paradigm for managing electric vehicle (EV) battery data in intelligent transportation systems (ITS), enabling privacy-preserving tasks such as anomaly detection and capacity estimation. However, most existing frameworks rely on centralized aggregation schemes, which pose critical limitations in terms of security and trust. To address these challenges, we propose ABC-DFL, an automated Byzantine-resilient clustered decentralized federated learning (C-DFL) framework for connected EVs. The proposed incentive-driven C-DFL system replaces the central server with an open-permissioned blockchain, featuring a new dynamic Quorum Byzantine Fault Tolerance (QBFT) protocol and an oracle-based aggregation layer, to enhance trust, security, and automation. At the core of ABC-DFL lies FLECA (Filtered Layered Enhanced Clustering Aggregation), a robust hierarchical aggregation protocol that mitigates Byzantine attacks by having each EV filter malicious updates using an adaptive threshold based on deviations from its reference model update. Oracle nodes, responsible for inter-group aggregation, employ robust clustering to isolate and aggregate model updates from trustworthy EV groups. Comprehensive experimental evaluations demonstrate that FLECA matches FedProx convergence under benign conditions and significantly outperforms existing defenses with attack impact scores below 0.10 in adaptive adversarial scenarios. Furthermore, several learning experiments with multitask models confirm the effectiveness and fairness of the incentive mechanism. Finally, on-chain and off-chain benchmarks validate the practicality of ABC-DFL.
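The abstract specifies only that each EV compares received updates with its own reference update and discards those whose deviation exceeds an adaptive threshold. The sketch below fills in that step with a hypothetical median-plus-MAD threshold on Euclidean distance, followed by a plain coordinate-wise mean of the survivors; both choices are assumptions for illustration, not FLECA's actual rule.

```python
import math
import statistics


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def filter_and_aggregate(reference, updates, c=3.0):
    """Drop updates far from `reference`, then average the rest coordinate-wise.

    Hypothetical rule: keep u if ||u - reference|| <= median + c * MAD of all distances.
    The median-distance update always passes, so `kept` is never empty.
    """
    dists = [l2_distance(u, reference) for u in updates]
    med = statistics.median(dists)
    mad = statistics.median([abs(d - med) for d in dists])
    threshold = med + c * mad  # adapts to the spread of updates in the current round
    kept = [u for u, d in zip(updates, dists) if d <= threshold]
    aggregate = [sum(coord) / len(kept) for coord in zip(*kept)]
    return kept, aggregate


reference = [0.0, 0.0]
updates = [[0.1, 0.0], [0.0, 0.2], [-0.15, 0.0], [0.0, -0.12], [5.0, 5.0]]  # last one is malicious
kept, aggregate = filter_and_aggregate(reference, updates)
print(len(kept), aggregate)  # 4 honest updates kept; aggregate ≈ [-0.0125, 0.02]
```

Using the median and MAD rather than the mean and standard deviation keeps the threshold itself from being dragged upward by the outlier it is meant to reject.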

8. 鲁棒性、不确定性与可信学习 5 篇

2504.14798 2026-06-18 cs.LG cs.CV 版本更新

RUB: Evaluating Residual Knowledge in Unlearned Models

RUB: 评估经遗忘处理的模型中的残留知识

Hao Xuan, Xingyu Li

发表机构 * Electrical and Computer Engineering, University of Alberta(阿尔伯塔大学电气与计算机工程系)

AI总结 提出鲁棒遗忘原则及统一基准RUB,通过遗忘映射攻击(UMA)检测残留信息,揭示现有方法在对抗评估下的脆弱性。

Journal ref
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2026, pages 8550-8559
AI中文摘要

机器遗忘(MUL)已成为隐私保护和内容监管的关键机制,然而当前技术往往无法保证完全移除敏感信息。虽然现有工作大多关注验证遗忘是否已执行,但它们忽略了一个关键问题:模型在面对试图恢复已遗忘知识的对抗性尝试时是否仍保持鲁棒。在这项工作中,我们倡导鲁棒遗忘原则,要求模型既与重新训练的模型不可区分,又能抵御多样化的对抗威胁。为实例化这一原则,我们提出了统一基准RUB(鲁棒遗忘基准),系统评估遗忘算法在分类、图像到图像重建和文本到图像合成中的鲁棒性。在此框架内,我们引入遗忘映射攻击(UMA)作为检测残留信息的通用方法,并展示现有攻击策略只要符合通用UMA框架,即可被纳入该框架。我们在判别式和生成式任务上的实验表明,最先进的遗忘方法即使通过了标准验证指标,在这些评估下仍然脆弱。通过将鲁棒性确立为核心标准并提供对抗评估基准,我们希望RUB能为更可靠、更安全的遗忘实践铺平道路。RUB的代码库和模型检查点将公开发布。

英文摘要

Machine Unlearning (MUL) has emerged as a key mechanism for privacy protection and content regulation, yet current techniques often fail to guarantee the complete removal of sensitive information. While most existing works focus on verifying the execution of unlearning, they overlook the critical question of whether models remain robust against adversarial attempts to recover forgotten knowledge. In this work, we advocate for the principle of Robust Unlearning, which requires models to be both indistinguishable from retrained counterparts and resilient against diverse adversarial threats. To instantiate this principle, we propose a unified benchmark, RUB (Robust Unlearning Benchmark), that systematically evaluates the robustness of unlearning algorithms across classification, image-to-image reconstruction, and text-to-image synthesis. Within this framework, we introduce the Unlearning Mapping Attack (UMA) as a generalizable method to detect residual information, and demonstrate how existing attack strategies can be adapted into this framework as long as they conform to the generic UMA framework. Our experiments across discriminative and generative tasks reveal that state-of-the-art unlearning methods remain vulnerable under these evaluations, even when passing standard verification metrics. By positioning robustness as the central criterion and providing a benchmark for adversarial evaluation, we hope RUB paves the way toward more reliable and secure unlearning practices. The codebase and model checkpoints in RUB will be published.

2505.03646 2026-06-18 cs.LG cs.AI cs.CV 版本更新

Revealing Hidden Vulnerabilities in Autoencoders through Gradient Signal Restoration

通过梯度信号恢复揭示自编码器中的隐藏漏洞

Chethan Krishnamurthy Ramanaik, Arjun Roy, Tobias Callies, Eirini Ntoutsi

发表机构 * University of the Bundeswehr Munich(慕尼黑联邦国防军大学)

AI总结 针对自编码器对抗攻击中梯度消失导致鲁棒性被高估的问题,提出GRILL框架恢复梯度信号,显著提升攻击效果,暴露隐藏漏洞。

AI中文摘要

深度自编码器(AE)的对抗鲁棒性受到的关注远少于判别模型,尽管其压缩的潜在表示会导致病态映射,从而放大小的输入扰动并破坏重建稳定性。现有的AE白盒攻击通过优化范数有界的对抗扰动以最大化重建损失,往往收敛到次优扰动,从而可能高估AE的鲁棒性。我们表明,这种限制与通过病态层反向传播时对抗损失梯度消失有关,这些病态层的中间权重矩阵具有接近零的奇异值。为了解决这个问题,我们提出了GRILL(病态层中的梯度信号恢复)框架,旨在减轻梯度退化并提高编码器-解码器架构中对抗鲁棒性评估的可靠性。GRILL旨在缓解优化过程中的对抗梯度退化,使攻击能够在固定范数约束下更好地逼近高失真扰动。通过在多种AE架构上的广泛实验,包括样本特定和通用攻击,以及标准和自适应攻击设置,我们表明GRILL显著提高了攻击有效性,从而暴露了现有攻击限制所隐藏的漏洞。除了AE之外,我们提供了初步证据表明现代多模态编码器-解码器架构也存在类似的漏洞。

英文摘要

Adversarial robustness of deep autoencoders (AEs) has received less attention than that of discriminative models, although their compressed latent representations induce ill-conditioned mappings that can amplify small input perturbations and destabilize reconstructions. Existing white-box attacks for AEs, which optimize norm-bounded adversarial perturbations to maximize reconstruction damage, often converge to suboptimal perturbations, thereby potentially overstating AE robustness. We show that this limitation is linked to vanishing adversarial loss gradients during backpropagation through ill-conditioned layers, associated with near-zero singular values in their intermediate weight matrices. To address this, we propose GRILL (Gradient Signal Restoration in Ill-Conditioned Layers), a framework designed to mitigate gradient degradation and improve the reliability of adversarial robustness evaluation in encoder-decoder architectures. GRILL is designed to mitigate adversarial gradient degradation during optimization, enabling attacks to better approximate high-distortion perturbations under fixed norm constraints. Through extensive experiments across multiple AE architectures, under both sample-specific and universal attacks, as well as standard and adaptive attack settings, we show that GRILL significantly increases attack effectiveness, thereby exposing vulnerabilities hidden by existing attack limitations. Beyond AEs, we provide preliminary evidence that modern multimodal encoder-decoder architectures exhibit similar vulnerabilities.
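The mechanism named in the abstract, adversarial gradients fading when backpropagated through layers with near-zero singular values, can already be seen in a single linear layer y = Wx, where the input gradient is W^T times the output gradient. The following is a minimal pure-Python sketch under an assumed diagonal 2x2 weight matrix; the function names are illustrative, and it only demonstrates the degradation, not GRILL's restoration step.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def backprop_linear(W: Matrix, grad_out: Vector) -> Vector:
    """Input gradient for y = W x: given dL/dy, return dL/dx = W^T dL/dy."""
    n_out, n_in = len(W), len(W[0])
    return [sum(W[i][j] * grad_out[i] for i in range(n_out)) for j in range(n_in)]


def l2_norm(v: Vector) -> float:
    return sum(x * x for x in v) ** 0.5


# Illustrative ill-conditioned layer (assumption): a diagonal weight matrix whose
# singular values are 1.0 and 1e-3.
W = [[1.0, 0.0],
     [0.0, 1e-3]]

# A loss gradient aligned with the well-conditioned direction passes through unchanged...
g_strong = backprop_linear(W, [1.0, 0.0])
# ...while one aligned with the near-zero singular direction is scaled down by 1e-3,
# so an attack following this gradient barely moves the input in that direction.
g_weak = backprop_linear(W, [0.0, 1.0])

print(l2_norm(g_strong), l2_norm(g_weak))
```

Stacking several such layers multiplies these factors, which is how the signal reaching the input of a deep encoder-decoder can become very small.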

2606.16214 2026-06-18 cs.LG cs.AI 版本更新

Calibrated Sampling-Free Uncertainty Estimation in Bayesian Deep Learning

贝叶斯深度学习中的校准无采样不确定性估计

Tobias Jan Wieczorek, Leon de Andrade, Thomas Möllenhoff, Marcus Rohrbach

发表机构 * TU Darmstadt & hessian.AI, Darmstadt, Germany(达姆施塔特工业大学 & hessian.AI,德国达姆施塔特) RIKEN Center for Advanced Intelligence Project, Tokyo, Japan(日本理化学研究所革新智能研究中心,日本东京)

AI总结 提出校准方差传播(CVP),通过新型归一化层传播方法、激活函数处理技术及轻量校准步骤,在单次前向传播中高效估计不确定性,在Transformer和CNN上达到与MC采样相当的精度,成本显著降低。

AI中文摘要

现代深度学习模型仍然以过度自信著称,这限制了它们在高风险应用中的可靠性。贝叶斯方法通过学习模型参数的分布来应对这一问题,最近的进展使得在大规模架构上以与AdamW相当的成本实现这一目标成为可能。然而,测试时仍存在一个挑战:预测必须对从后验中采样的权重进行多次前向传播并取平均,代价极其高昂。方差传播提供了一种高效的替代方案,在单次前向传播中计算每层不确定性的解析近似。虽然此类技术对MLP有效,但由于现代架构的深度增加和层类型多样,将其扩展到现代架构仍然具有挑战性。为填补这一空白,我们提出了校准方差传播(CVP),它引入了一种新的归一化层传播方法,结合了处理激活函数的近期技术,并通过轻量校准步骤吸收残差误差。CVP在Transformer和CNN上产生与MC采样相当准确的不确定性估计,而成本仅为其极小部分。与先前的方差传播工作相比,CVP将BEiT-3在视觉推理(NLVR2)上$0.5\%$风险下的覆盖率从$8.2\%$提高到$14.6\%$,将ViLT在VQAv2上的对应覆盖率从$2.6\%$提高到$10.8\%$,且增益同样适用于卷积架构。

英文摘要

Modern deep learning models remain notoriously prone to overconfidence, limiting their reliability in high-stakes applications. Bayesian methods aim to counter this by learning a distribution over model parameters, and recent advances now make this feasible for large-scale architectures at costs comparable to AdamW. However, a challenge remains at test time: predictions must be averaged across many forward passes with weights sampled from the posterior, which is prohibitively expensive. Variance propagation offers an efficient alternative, computing layer-wise analytical approximations of uncertainty in a single forward pass. While such techniques are effective for MLPs, their extension to modern architectures remains challenging, due to increased depth and diversity of layer types. To fill this gap, we propose Calibrated Variance Propagation (CVP), which introduces a new propagation method for normalization layers, combines it with recent techniques for handling activation functions, and absorbs residual error through a light calibration step. CVP yields comparably accurate uncertainty estimates to MC sampling across transformers and CNNs, at a fraction of the cost. Against prior variance propagation work, CVP improves coverage at $0.5\%$ risk from $8.2\%$ to $14.6\%$ with BEiT-3 on Visual Reasoning (NLVR2) and from $2.6\%$ to $10.8\%$ with ViLT on VQAv2, with gains extending to convolutional architectures.
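Variance propagation replaces posterior sampling with closed-form moment updates. As a minimal sketch, assume a mean-field Gaussian posterior over the weights of one linear unit and independent inputs; with weight mean m and variance s², and input mean a and variance v, each product contributes Var(w·x) = m²v + s²a² + s²v. The Monte Carlo function is included only as a check. Normalization layers, activations and CVP's calibration step are not shown, and all names are illustrative.

```python
import random
from typing import Sequence, Tuple


def propagate_linear(m: Sequence[float], s2: Sequence[float],
                     a: Sequence[float], v: Sequence[float]) -> Tuple[float, float]:
    """Closed-form mean and variance of y = sum_j w_j * x_j for one output unit.

    Assumes independent weights w_j with mean m_j and variance s2_j, and independent
    inputs x_j with mean a_j and variance v_j that are also independent of the weights.
    """
    mean = sum(mj * aj for mj, aj in zip(m, a))
    # Var(w x) = E[w^2] E[x^2] - (E[w] E[x])^2 = m^2 v + s2 a^2 + s2 v
    var = sum(mj * mj * vj + sj * aj * aj + sj * vj
              for mj, sj, aj, vj in zip(m, s2, a, v))
    return mean, var


def mc_linear(m, s2, a, v, n_samples: int = 50_000, seed: int = 0) -> Tuple[float, float]:
    """Monte Carlo reference: sample Gaussian weights and inputs, then average."""
    rng = random.Random(seed)
    ys = []
    for _ in range(n_samples):
        y = 0.0
        for mj, sj, aj, vj in zip(m, s2, a, v):
            y += rng.gauss(mj, sj ** 0.5) * rng.gauss(aj, vj ** 0.5)
        ys.append(y)
    mean = sum(ys) / n_samples
    var = sum((y - mean) ** 2 for y in ys) / n_samples
    return mean, var


params = ([1.0, 2.0], [0.1, 0.2], [1.0, 0.5], [0.05, 0.0])
print(propagate_linear(*params))
print(mc_linear(*params))
```

For these hypothetical parameters the closed form gives mean 2.0 and variance of approximately 0.205 in one pass, which the sampled estimate approaches only after many draws; that gap in cost is the motivation the abstract gives for single-pass propagation.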

2508.02158 2026-06-18 cs.IT cs.CR cs.DS cs.LG math.IT math.ST stat.TH 版本更新

Robust Detection of Planted Subgraphs in Semi-Random Models

半随机模型中植入子图的鲁棒检测

Dor Elimelech, Wasim Huleihel

AI总结 研究半随机模型下植入子图检测问题,证明存在对抗者时强次对数密度子图检测在信息论上不可能,而对数以上密度子图统计极限不变,并设计了高效鲁棒检测算法。

Comments 38 pages, 2 figures

AI中文摘要

在Erdős-Rényi随机图中检测植入子图已被广泛研究,产生了丰富的刻画统计和计算阈值的结果。然而,大多数先前的工作假设纯随机生成模型,使得所得算法在面对现实扰动时可能脆弱。本文开创性地研究了植入子图检测问题的半随机模型,其中允许对抗者在图被揭示给统计学家之前移除植入子图外的边。关键在于,统计学家仍然不知道哪些边被移除,这给推理任务带来了根本性挑战。我们建立了该半随机模型下检测的基本统计极限,揭示了尖锐的二分性。具体而言,对于具有强次对数最大密度的植入子图,在存在对抗者的情况下,检测在信息论上变得不可能,尽管在经典随机模型中对其中某些植入子图的检测是可能的。与此形成鲜明对比的是,对于具有超对数密度的子图,统计极限基本保持不变;我们证明最优(尽管计算上不可行)的似然比检验仍然是鲁棒的。除了这些统计极限之外,我们设计了一种新的计算高效且鲁棒的检测算法,并为其性能提供了严格的统计保证。我们的结果为植入子图检测建立了第一个鲁棒框架,并为半随机模型、计算-统计权衡和图推理问题中的鲁棒性研究开辟了新方向。

英文摘要

Detection of planted subgraphs in Erdős-Rényi random graphs has been extensively studied, leading to a rich body of results characterizing both statistical and computational thresholds. However, most prior work assumes a purely random generative model, making the resulting algorithms potentially fragile in the face of real-world perturbations. In this work, we initiate the study of semi-random models for the planted subgraph detection problem, wherein an adversary is allowed to remove edges outside the planted subgraph before the graph is revealed to the statistician. Crucially, the statistician remains unaware of which edges have been removed, introducing fundamental challenges to the inference task. We establish fundamental statistical limits for detection under this semi-random model, revealing a sharp dichotomy. Specifically, for planted subgraphs with strongly sub-logarithmic maximum density, detection becomes information-theoretically impossible in the presence of an adversary, despite being possible for some planted subgraphs in the classical random model. In stark contrast, for subgraphs with super-logarithmic density, the statistical limits remain essentially unchanged; we prove that the optimal (albeit computationally intractable) likelihood ratio test remains robust. Beyond these statistical boundaries, we design a new computationally efficient and robust detection algorithm, and provide rigorous statistical guarantees for its performance. Our results establish the first robust framework for planted subgraph detection and open new directions in the study of semi-random models, computational-statistical trade-offs, and robustness in graph inference problems.
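A small simulation makes the semi-random model concrete. The sketch below assumes the planted subgraph is a clique and replaces the worst-case adversary with one that deletes a random fraction of non-planted edges; both choices, and the naive edge-count detector, are illustrative assumptions rather than the paper's constructions. It shows why robustness is non-trivial: deleting edges outside the planted part can push a simple statistic below its null mean while the planted clique stays fully intact.

```python
import itertools
import random
from typing import Iterable, Set, Tuple

Edge = Tuple[int, int]


def semi_random_graph(n: int, p: float, planted: Iterable[int],
                      remove_frac: float, seed: int = 0) -> Tuple[Set[Edge], Set[Edge]]:
    """Sample G(n, p), plant a clique on `planted`, then let an adversary delete edges.

    The adversary may only delete edges outside the planted subgraph. Here it deletes a
    random fraction of them, a stand-in for an arbitrary (worst-case) adversary.
    """
    rng = random.Random(seed)
    planted_edges = set(itertools.combinations(sorted(planted), 2))
    edges = set()
    for u, v in itertools.combinations(range(n), 2):
        if (u, v) in planted_edges or rng.random() < p:
            edges.add((u, v))
    removable = sorted(edges - planted_edges)
    for e in rng.sample(removable, int(remove_frac * len(removable))):
        edges.discard(e)
    return edges, planted_edges


def edge_count_test(edges: Set[Edge], n: int, p: float) -> bool:
    """Naive detector: declare 'planted' if the edge count exceeds its G(n, p) mean."""
    return len(edges) > p * n * (n - 1) / 2


# An adversary that removes every non-planted edge fools the naive test,
# even though the planted clique on vertices 0..5 is untouched.
edges, planted_edges = semi_random_graph(30, 0.5, range(6), remove_frac=1.0)
print(planted_edges <= edges, edge_count_test(edges, 30, 0.5))
```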

2602.21160 2026-06-18 stat.ML cs.LG stat.AP stat.ME 版本更新

Not Just How Much, But Where: Decomposing Epistemic Uncertainty into Per-Class Contributions

不仅多少,而且何处:将认知不确定性分解为每类贡献

Mame Diarra Toure, David A. Stephens

发表机构 * Department of Mathematics and Statistics(数学与统计学系)

AI总结 针对安全关键分类中认知不确定性度量无法区分类别的问题,提出将互信息分解为每类向量$C_k$,通过二阶泰勒展开和$1/\mu_k$加权校正边界抑制,在糖尿病视网膜病变选择性预测、分布外检测和标签噪声研究中验证其有效性。

Comments 8 pages, 17 figures. Accepted at UAI 2026

Journal ref
Forty-Second Annual Conference on Uncertainty in Artificial Intelligence (UAI), 2026. https://openreview.net/forum?id=cxuWscJmAr
AI中文摘要

在安全关键分类中,失败的代价往往是不对称的,然而贝叶斯深度学习用单个标量——互信息(MI)来总结认知不确定性,这无法区分模型的无知涉及良性类别还是安全关键类别。我们将MI分解为每类向量$C_k(x)=\sigma_k^{2}/(2\mu_k)$,其中$\mu_k{=}\mathbb{E}[p_k]$,$\sigma_k^2{=}\mathrm{Var}[p_k]$,计算基于后验样本。该分解来自熵的二阶泰勒展开;$1/\mu_k$加权校正了边界抑制,使$C_k$在稀有类别和常见类别之间具有可比性。根据构造,$\sum_k C_k \approx \mathrm{MI}$,并且伴随的偏度诊断标志可识别近似退化的输入。在刻画$C_k$的公理性质后,我们在三个任务上验证了它:(i)糖尿病视网膜病变的选择性预测,其中关键类别的$C_k$相比MI降低了34.7%的选择性风险,相比方差基线降低了56.2%;(ii)临床和图像基准上的分布外检测,其中$\sum_k C_k$取得了最高的AUROC,并且每类视角暴露了MI无法察觉的不对称偏移;(iii)受控的标签噪声研究,其中在端到端贝叶斯训练下,$\sum_k C_k$对注入的偶然噪声的敏感性低于MI,而在迁移学习下两种度量均退化。在所有任务中,后验近似的质量对不确定性的影响至少与度量选择本身一样强,这表明不确定性如何通过网络传播与其如何被度量同等重要。

英文摘要

In safety-critical classification, the cost of failure is often asymmetric, yet Bayesian deep learning summarises epistemic uncertainty with a single scalar, mutual information (MI), that cannot distinguish whether a model's ignorance involves a benign or safety-critical class. We decompose MI into a per-class vector $C_k(x)=\sigma_k^{2}/(2\mu_k)$, with $\mu_k{=}\mathbb{E}[p_k]$ and $\sigma_k^2{=}\mathrm{Var}[p_k]$ across posterior samples. The decomposition follows from a second-order Taylor expansion of the entropy; the $1/\mu_k$ weighting corrects boundary suppression and makes $C_k$ comparable across rare and common classes. By construction $\sum_k C_k \approx \mathrm{MI}$, and a companion skewness diagnostic flags inputs where the approximation degrades. After characterising the axiomatic properties of $C_k$, we validate it on three tasks: (i) selective prediction for diabetic retinopathy, where critical-class $C_k$ reduces selective risk by 34.7\% over MI and 56.2\% over variance baselines; (ii) out-of-distribution detection on clinical and image benchmarks, where $\sum_k C_k$ achieves the highest AUROC and the per-class view exposes asymmetric shifts invisible to MI; and (iii) a controlled label-noise study in which $\sum_k C_k$ shows less sensitivity to injected aleatoric noise than MI under end-to-end Bayesian training, while both metrics degrade under transfer learning. Across all tasks, the quality of the posterior approximation shapes uncertainty at least as strongly as the choice of metric, suggesting that how uncertainty is propagated through the network matters as much as how it is measured.
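The decomposition can be checked numerically. A second-order expansion of $H(p)=-\sum_k p_k\log p_k$ around $\mu$ has Hessian $\mathrm{diag}(-1/\mu_k)$, so $\mathbb{E}[H(p)] \approx H(\mu) - \sum_k \sigma_k^2/(2\mu_k)$ and therefore $\mathrm{MI} = H(\mu) - \mathbb{E}[H(p)] \approx \sum_k C_k$. The sketch below uses three hypothetical posterior samples and population variances (dividing by the number of samples, an assumption); the function names are illustrative.

```python
import math
from typing import List, Sequence


def per_class_contributions(samples: Sequence[Sequence[float]]) -> List[float]:
    """C_k = sigma_k^2 / (2 mu_k), with mean and variance of p_k taken over posterior samples."""
    S = len(samples)
    contribs = []
    for k in range(len(samples[0])):
        col = [s[k] for s in samples]
        mu = sum(col) / S
        var = sum((p - mu) ** 2 for p in col) / S
        contribs.append(var / (2 * mu) if mu > 0 else 0.0)
    return contribs


def entropy(p: Sequence[float]) -> float:
    return -sum(x * math.log(x) for x in p if x > 0)


def mutual_information(samples: Sequence[Sequence[float]]) -> float:
    """MI = H(E[p]) - E[H(p)] over posterior samples."""
    S = len(samples)
    mean_p = [sum(s[k] for s in samples) / S for k in range(len(samples[0]))]
    return entropy(mean_p) - sum(entropy(s) for s in samples) / S


# Three hypothetical posterior samples of a 3-class softmax for one input.
samples = [[0.70, 0.20, 0.10],
           [0.60, 0.30, 0.10],
           [0.65, 0.25, 0.10]]
C = per_class_contributions(samples)
print(C, sum(C), mutual_information(samples))
```

Classes 0 and 1 have the same variance across samples, but class 1 has the smaller mean, so the $1/\mu_k$ weighting gives it the larger $C_k$; the sum of the contributions stays close to the exact MI because the samples vary only slightly.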

9. 图学习与结构化数据 3 篇

2504.04739 2026-06-18 cs.LG cs.CY 版本更新

UST-GNN: A Unified Spatial-Topological Graph Neural Network Framework for Urban Analytics, Demonstrated through a Case Study on Urban Health Prediction

UST-GNN:面向城市分析的空间-拓扑统一图神经网络框架——以城市健康预测为例

Minwei Zhao, Sanja Scepanovic, Stephen Law, Ivica Obadic, Cai Wu, Daniele Quercia

发表机构 * University College London(伦敦大学学院) The Hong Kong University of Science and Technology(香港科技大学) Nokia Bell Labs(诺基亚贝尔实验室) Technical University of Munich(慕尼黑工业大学) University of Oxford(牛津大学)

AI总结 提出UST-GNN框架,整合邻域连通性、异质城市特征和位置嵌入,在大伦敦4835个邻域的健康预测中,严格空间交叉验证下R²提升8.4-13.2%,并引入主成分模块解释嵌入。

AI中文摘要

理解社会、人口、环境与空间因素如何共同塑造城市结果,对于可持续城市发展和循证政策至关重要。传统统计方法往往难以捕捉复杂的非线性关系,而许多机器学习方法忽视了城市系统中空间自相关和网络拓扑的共同作用。近期GeoAI的进展仅部分解决了这些挑战,通常将空间效应、图结构、评估和可解释性分开处理。我们提出UST-GNN,一个统一的空间-拓扑图神经网络框架,将邻域连通性、异质城市特征和位置/区位嵌入整合到单一表示中。使用MedSAT数据集(包含大伦敦4835个邻域的150多个环境和社会人口变量及六种处方结果),UST-GNN在严格空间交叉验证下,比强统计基线、地理增强基线和图机器学习基线表现更优,样本外$R^2$提升8.4-13.2%。我们进一步引入轻量级主成分模块,从地理角度解释学习到的节点嵌入,并将其与政策相关的协变量联系起来。结果分析恢复了已知模式,为有争议的关联提供了新视角,并揭示了值得进一步因果研究的新预测因子。这些发现共同证明了基于图的空间机器学习在城市健康分析、环境不平等评估和循证城市政策中的价值。除预测增益外,UST-GNN提供了一个统一的GeoAI分析流程,可嵌入城市数字孪生工作流,用于情景测试、监测和数据驱动的决策,以建设更健康、更可持续的城市。

英文摘要

Understanding how social, demographic, environmental, and spatial factors jointly shape urban outcomes is essential for sustainable urban development and evidence-based policy. Traditional statistical approaches often struggle to capture complex non-linear relationships, while many machine learning methods overlook the joint roles of spatial autocorrelation and network topology in urban systems. Recent advances in GeoAI have addressed these challenges only partially, often treating spatial effects, graph structure, evaluation, and interpretability separately. We present UST-GNN, a unified spatial-topological graph neural network framework that integrates neighbourhood connectivity, heterogeneous urban features, and positional/locational embeddings into a single representation. Using the MedSAT dataset, which contains over 150 environmental and socio-demographic variables and six prescription outcomes across 4,835 neighbourhoods in Greater London, UST-GNN outperforms strong statistical, geographically enhanced, and graph machine learning baselines, improving out-of-sample $R^2$ by 8.4-13.2% under strict spatial cross-validation. We further introduce a lightweight principal-component module to interpret learned node embeddings geographically and relate them to policy-relevant covariates. The resulting analyses recover established patterns, offer new perspectives on debated associations, and reveal novel predictors warranting further causal investigation. Together, these findings demonstrate the value of graph-based spatial machine learning for urban health analytics, environmental inequality assessment, and evidence-based urban policy. Beyond predictive gains, UST-GNN provides a unified GeoAI analytical pipeline that can be embedded into urban digital twin workflows for scenario testing, monitoring, and data-informed decision-making for healthier, more sustainable cities.
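The abstract reports gains under strict spatial cross-validation but does not spell out the blocking scheme here. One common way to build such folds, sketched below as an assumption, is to grid the study area into square cells and keep every cell entirely inside one fold, so neighbouring areas never straddle the train/test boundary. The coordinates, cell size and function names are hypothetical; real use would take projected coordinates in metres.

```python
import math
from typing import Dict, List, Sequence, Tuple


def spatial_block_folds(coords: Sequence[Tuple[float, float]],
                        cell_size: float, n_folds: int) -> List[int]:
    """Assign each point to a CV fold via the square grid cell that contains it.

    All points in a cell share a fold, so nearby neighbourhoods are never split between
    training and test; cells are dealt to folds round-robin in order of first appearance.
    """
    cell_to_fold: Dict[Tuple[int, int], int] = {}
    folds = []
    for x, y in coords:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        if cell not in cell_to_fold:
            cell_to_fold[cell] = len(cell_to_fold) % n_folds
        folds.append(cell_to_fold[cell])
    return folds


def train_test_indices(folds: Sequence[int], test_fold: int) -> Tuple[List[int], List[int]]:
    """Split point indices into those used for training and those held out."""
    train = [i for i, f in enumerate(folds) if f != test_fold]
    test = [i for i, f in enumerate(folds) if f == test_fold]
    return train, test


# Hypothetical neighbourhood centroids; the first two share a grid cell.
coords = [(0.1, 0.1), (0.2, 0.3), (1.5, 0.2), (2.5, 2.5)]
folds = spatial_block_folds(coords, cell_size=1.0, n_folds=2)
print(folds, train_test_indices(folds, test_fold=1))
```

Round-robin assignment keeps the code short but can leave folds unbalanced; grouping cells by size or using larger blocks are common alternatives.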

2606.15633 2026-06-18 cs.LG 版本更新

Formalizing and Mitigating Structural Distortion in LLM Attention for Graph Reasoning

形式化并缓解图推理中大语言模型注意力的结构失真

Donald Loveland, Puja Trivedi, Ari Weinstein, Edward W Huang, Danai Koutra

发表机构 * University of Michigan(密歇根大学) Amazon(亚马逊)

AI总结 本文形式化了大语言模型处理文本属性图时因图线性化导致的结构失真机制,并提出轻量级推理时修改方法GaLA,通过校正注意力偏差提升零样本图推理性能。

Comments Accepted to KDD 2026

AI中文摘要

大语言模型(LLM)在文本属性图(TAG)推理中展现出潜力。然而,将LLM应用于图需要将其结构线性化为序列,这引入了根源于图带宽问题的失真。虽然这种失真已被证明会降低性能,但通常归因于提示设计或模型规模,其潜在机制尚不清楚。在这项工作中,我们展示了旋转位置嵌入如何将图线性化为带宽相关的注意力衰减,抑制了序列化序列中被强制分隔开的图相邻节点之间的注意力。这将基于LLM的图推理的焦点从提示工程和规模缩放转向纠正注意力错位。受此分析启发,我们提出了图对齐语言注意力(GaLA),一种轻量级的、推理时修改LLM的方法。GaLA将注意力偏向图相邻节点,同时保留LLM的序列归纳偏差。在TAG基准测试中,GaLA以可忽略的开销提升了性能,表明失真是基于LLM的图推理中可纠正的瓶颈。

英文摘要

Large Language Models (LLMs) have shown promise for reasoning over Text-Attributed Graphs (TAGs). However, applying LLMs to graphs requires linearizing their structure into sequences, introducing distortion rooted in the graph bandwidth problem. While this distortion has been shown to degrade performance, it is often attributed to prompt design or model scale, leaving the underlying mechanism unclear. In this work, we show how rotary positional embeddings turn graph linearization into bandwidth-dependent attention decay, suppressing attention between graph-adjacent nodes that are forced far apart in the serialized sequence. This shifts the focus of LLM-based graph reasoning from prompt engineering and scaling toward correcting attention misalignment. Motivated by this analysis, we propose Graph-aligned Language Attention (GaLA), a lightweight, inference-time modification for LLMs. GaLA biases attention toward graph-adjacent nodes while preserving the LLM's sequential inductive biases. Across TAG benchmarks, GaLA improves performance with negligible overhead, demonstrating that distortion is a correctable bottleneck in LLM-based graph reasoning.
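Two ingredients of this argument can be made concrete. The bandwidth of a linearization is the largest sequence gap between the endpoints of any edge, the quantity that, per the abstract, rotary embeddings turn into attention decay. The attention function then adds a hypothetical constant bias `beta` to keys whose node is graph-adjacent to the query's node. This is a minimal sketch of the general idea of biasing attention toward graph neighbours, not GaLA's actual formulation; all names and values are illustrative.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def linearization_bandwidth(order: Sequence[int], edges: Sequence[Tuple[int, int]]) -> int:
    """Largest sequence distance between the endpoints of any edge under a node ordering."""
    pos = {node: i for i, node in enumerate(order)}
    return max(abs(pos[u] - pos[v]) for u, v in edges)


def biased_attention(scores: Sequence[float], key_nodes: Sequence[int], query_node: int,
                     adjacency: Dict[int, Set[int]], beta: float) -> List[float]:
    """Softmax over one query's attention logits after adding `beta` to keys whose node
    is graph-adjacent to the query's node (a hypothetical bias, not GaLA's exact rule)."""
    neighbours = adjacency.get(query_node, set())
    biased = [s + (beta if node in neighbours else 0.0) for s, node in zip(scores, key_nodes)]
    top = max(biased)  # subtract the max for numerical stability
    exps = [math.exp(b - top) for b in biased]
    z = sum(exps)
    return [e / z for e in exps]


# A star graph: node 0 is adjacent to nodes 1..4.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
print(linearization_bandwidth([0, 1, 2, 3, 4], star),   # hub first
      linearization_bandwidth([1, 2, 0, 3, 4], star))   # hub in the middle
print(biased_attention([0.0, 0.0, 0.0], [1, 2, 3], 0, {0: {2}}, beta=1.0))
```

Placing the hub in the middle halves the bandwidth of the star, while the bias shifts attention mass toward the adjacent key regardless of how far apart the serialization placed it.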

2505.12369 2026-06-18 cs.AI cs.LG cs.LO 版本更新

Fully Geometric Multi-Hop Reasoning on Knowledge Graphs with Transitive Relations

知识图谱上具有传递关系的全几何多跳推理

Fernando Zhapa-Camacho, Robert Hoehndorf

发表机构 * KAUST Center of Excellence for Smart Health (KCSH)(阿卜杜拉国王科技大学智能健康卓越中心) KAUST Center of Excellence for Generative AI(阿卜杜拉国王科技大学生成式人工智能卓越中心)

AI总结 提出GeometrE方法,将逻辑操作映射为纯几何变换,并引入传递损失函数,在保持可解释性的同时提升多跳推理性能。

Comments Accepted at ESWC 2026

Journal ref
The Semantic Web. ESWC 2026. Lecture Notes in Computer Science, vol 16549. Springer, Cham (2026)
AI中文摘要

知识图谱上的多跳逻辑推理需要将逻辑语义忠实地映射到潜在空间。当前的几何嵌入方法通过将实体映射到几何区域、逻辑操作映射到潜在变换,在此任务上表现出有效性。虽然几何嵌入可以为查询回答提供直接的可解释性框架,但当前方法仅利用了实体的几何构造,未能将逻辑操作映射为纯几何变换,而是使用神经组件来学习这些操作。另一方面,纯神经方法优于几何方法,但在潜在空间中缺乏可解释性。我们提出了GeometrE,一种用于多跳推理的几何嵌入方法,它将每个逻辑操作映射为潜在空间中的纯几何操作。此外,我们引入了一个传递损失函数,并表明与现有方法不同,它可以保留逻辑规则$\forall a,b,c:\ r(a,b) \land r(b,c) \rightarrow r(a,c)$。我们的实验表明,GeometrE优于当前最先进的几何方法,并在标准基准数据集上与现有的神经方法保持竞争力。

英文摘要

Multi-hop logical reasoning on knowledge graphs requires faithfully mapping the logical semantics to latent space. Current geometric embedding methods have proven useful for this task by mapping entities to geometric regions and logical operations to latent transformations. While a geometric embedding can provide a direct interpretability framework for query answering, current methods have only leveraged the geometric construction of entities, failing to map logical operations to pure geometric transformations and, instead, using neural components to learn these operations. On the other hand, purely neural-based methods outperform geometric methods, but they lack interpretability in the latent space. We introduce GeometrE, a geometric embedding method for multi-hop reasoning, that maps every logical operation to a purely geometric operation in the latent space. Additionally, we introduce a transitive loss function and show that, unlike existing methods, it can preserve the logical rule $\forall a,b,c:\ r(a,b) \land r(b,c) \rightarrow r(a,c)$. Our experiments show that GeometrE outperforms current state-of-the-art geometric methods and remains competitive with existing neural-based methods on standard benchmark datasets.
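Why a purely geometric relation can respect transitivity is easiest to see in one dimension. In the sketch below, an illustrative assumption and not GeometrE's construction or loss, each entity is an interval and $r(a,b)$ is read as "the interval of $a$ lies inside the interval of $b$". Because containment is itself transitive, every such embedding satisfies $r(a,b) \land r(b,c) \rightarrow r(a,c)$ by construction. The entities and intervals are hypothetical.

```python
import itertools
from typing import Dict, List, Tuple

Interval = Tuple[float, float]


def contained_in(a: Interval, b: Interval) -> bool:
    """Model r(a, b) as 'the interval of a lies inside the interval of b'."""
    return b[0] <= a[0] and a[1] <= b[1]


def transitivity_violations(emb: Dict[str, Interval]) -> List[Tuple[str, str, str]]:
    """Triples with r(a,b) and r(b,c) true but r(a,c) false under the containment reading."""
    return [(a, b, c) for a, b, c in itertools.permutations(emb, 3)
            if contained_in(emb[a], emb[b]) and contained_in(emb[b], emb[c])
            and not contained_in(emb[a], emb[c])]


# Hypothetical 1-D embeddings for a part-of hierarchy.
emb = {"finger": (2.0, 2.5), "hand": (1.5, 3.0), "arm": (1.0, 4.0), "leg": (5.0, 8.0)}
print(contained_in(emb["finger"], emb["arm"]), transitivity_violations(emb))
```

The violation list is empty for any choice of intervals; a learned model only has to place boxes correctly for observed pairs, and the inferred pair finger-arm follows without a separate rule.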

10. 迁移、元学习与持续学习 5 篇

2506.14126 2026-06-18 cs.LG cs.AI 版本更新

From Memorization to Parameter Interference: How Overtraining Experts Harms Model Merging

从记忆到参数干扰:过度训练专家如何损害模型合并

Stefan Horoi, Guy Wolf, Eugene Belilovsky, Gintare Karolina Dziugaite

发表机构 * Concordia University(康科迪亚大学) Mila - Québec AI Institute(魁北克人工智能研究所) Google DeepMind(谷歌DeepMind)

AI总结 本文研究专家模型微调过度对模型合并的影响,发现长时间微调导致记忆困难样本,造成参数干扰,降低合并性能,并提出任务相关的早停策略改善合并效果。

Comments Proceedings of the 43rd International Conference on Machine Learning, Seoul, South Korea. PMLR 306, 2026

AI中文摘要

现代深度学习日益以使用开放权重基础模型为特征,这些模型可以在专门数据集上进行微调。这导致了专家模型和适配器的激增,通常通过HuggingFace和AdapterHub等平台共享。模型合并最近成为一种有效利用这些现有资源的方法,使得能够组合不同模型检查点的能力。因此,形成了一种自然的流程来利用迁移学习的好处并分摊沉没训练成本:模型在通用数据上预训练,在特定任务上微调,然后合并多个检查点以获得更强大的模型。一个普遍假设是,该流程中某一阶段的改进会向下游传播,从而在后续步骤中带来收益。在这项工作中,我们通过研究专家微调如何影响模型合并来挑战这一假设。我们表明,针对个体性能优化的专家长时间微调会导致跨视觉和语言模态、多种模型规模以及完全微调和LoRA适配模型的合并性能下降。我们将这种退化追溯到对一小部分困难样本的记忆,这些样本主导了微调后期步骤。这会导致负参数干扰,并编码在合并过程中被遗忘的知识。最后,我们证明任务相关的激进早停策略可以显著改善模型合并性能。

英文摘要

Modern deep learning is increasingly characterized by the use of open-weight foundation models that can be fine-tuned on specialized datasets. This has led to a proliferation of expert models and adapters, often shared via platforms like HuggingFace and AdapterHub. Model merging has recently emerged as an effective way to leverage these existing resources, enabling the composition of capabilities from different model checkpoints. A natural pipeline has thus formed to harness the benefits of transfer learning and amortize sunk training costs: models are pre-trained on general data, fine-tuned on specific tasks, and then multiple checkpoints are merged to obtain a more capable model. A prevailing assumption is that improvements at one stage of this pipeline propagate downstream, leading to gains at subsequent steps. In this work, we challenge that assumption by examining how expert fine-tuning affects model merging. We show that long fine-tuning of experts that optimizes for their individual performance leads to degraded merging performance across vision and language modalities, multiple model scales, and both fully fine-tuned and LoRA-adapted models. We trace this degradation to the memorization of a small set of difficult examples that dominate late fine-tuning steps. This causes negative parameter interference and encodes knowledge that is forgotten during merging. Finally, we demonstrate that task-dependent aggressive early stopping strategies can significantly improve model merging performance.
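The abstract does not tie its findings to a single merging method. As one common example, assumed here, task arithmetic merges experts by adding scaled task vectors (expert weights minus base weights) back onto the pre-trained weights. The sketch operates on flattened parameter lists with hypothetical values; the abstract attributes degraded merges to late fine-tuning steps that memorize hard examples, which would surface as interference when such task vectors are summed.

```python
from typing import List, Sequence

Params = List[float]


def task_arithmetic_merge(base: Sequence[float], experts: Sequence[Sequence[float]],
                          lam: float) -> Params:
    """theta = theta_0 + lam * sum_i (theta_i - theta_0), on flattened parameter vectors."""
    task_vectors = [[e - b for e, b in zip(expert, base)] for expert in experts]
    return [b + lam * sum(tv[i] for tv in task_vectors) for i, b in enumerate(base)]


base = [1.0, 1.0]                      # hypothetical pre-trained weights
experts = [[2.0, 3.0], [4.0, 5.0]]     # hypothetical fine-tuned experts
print(task_arithmetic_merge(base, experts, lam=0.5))   # lam = 1/n gives the plain average
```

With `lam` equal to one over the number of experts the merge reduces to weight averaging; in practice `lam` is a tuned hyperparameter, and per-tensor merging would apply the same rule to every layer.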

2602.09234 2026-06-18 cs.LG cs.AI 版本更新

Do Neural Networks Lose Plasticity in a Gradually Changing World?

神经网络在渐变世界中会失去可塑性吗?

Tianhui Liu, Lili Mou

发表机构 * Dept. Computing Science & Alberta Machine Intelligence Institute (Amii), University of Alberta(阿尔伯塔大学计算科学系 & 阿尔伯塔机器智能研究所) Canada CIFAR AI Chair(加拿大CIFAR人工智能讲席)

AI总结 研究任务转换的突然性对神经网络可塑性损失的影响,通过输入/输出插值和任务采样模拟渐变环境,理论和实验表明可塑性损失严重程度与任务转换突然性密切相关,渐变环境下可显著减轻。

AI中文摘要

持续学习已成为机器学习的热门话题。最近的研究发现了一个有趣的现象,称为可塑性丧失,指的是神经网络逐渐失去学习新任务的能力。然而,现有的可塑性研究很大程度上依赖于具有突然任务转换的基准测试,而没有检验突然性本身是否导致了观察到的可塑性损失。在本文中,我们通过输入/输出插值和任务采样模拟逐渐变化的环境,研究了转换突然性的作用。我们进行了理论和实证分析,表明可塑性损失的严重程度与任务转换的突然性密切相关,并且在环境逐渐变化时可以显著降低。

英文摘要

Continual learning has become a trending topic in machine learning. Recent studies have discovered an interesting phenomenon called loss of plasticity, referring to neural networks gradually losing the ability to learn new tasks. However, existing plasticity research largely relies on benchmarks with abrupt task transitions, without examining whether the abruptness itself contributes to the observed plasticity loss. In this paper, we investigate the role of transition abruptness by simulating gradually changing environments through input/output interpolation and task sampling. We perform theoretical and empirical analysis, showing that the severity of plasticity loss is closely tied to the abruptness of task transitions, and can be substantially reduced when the environment changes gradually.
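The two ways of softening a task switch named in the abstract, input/output interpolation and task sampling, can be sketched as schedules over a mixing weight alpha(t) that moves from task A to task B over T steps. The linear schedule, scalar regression targets and function names below are assumptions; an abrupt benchmark corresponds to T = 1, where alpha jumps from 0 to 1 in a single step.

```python
import random
from typing import Callable


def mixing_weight(t: int, T: int) -> float:
    """Fraction of the way from task A to task B at step t (linear schedule, assumed; T > 0)."""
    return min(max(t / T, 0.0), 1.0)


def interpolated_target(x: float, t: int, T: int,
                        f_a: Callable[[float], float], f_b: Callable[[float], float]) -> float:
    """Output interpolation: the regression target drifts smoothly from f_a to f_b."""
    alpha = mixing_weight(t, T)
    return (1 - alpha) * f_a(x) + alpha * f_b(x)


def sampled_task(t: int, T: int, rng: random.Random) -> str:
    """Task sampling: draw each example from task B with probability alpha, else task A."""
    return "B" if rng.random() < mixing_weight(t, T) else "A"


# Hypothetical pair of tasks whose targets have opposite signs.
f_a = lambda x: x
f_b = lambda x: -x
print([interpolated_target(2.0, t, 10, f_a, f_b) for t in (0, 5, 10)])
```

Both schedules visit the same endpoints; what changes is how much of the transition the network sees, which is the variable the paper links to the severity of plasticity loss.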

2303.18031 2026-06-18 cs.CV cs.AI cs.LG 版本更新

Simple Domain Generalization Methods are Strong Baselines for Open Domain Generalization

简单域泛化方法是开放域泛化的强基线

Masashi Noguchi, Shinichi Shirakawa

发表机构 * Graduate School of Environment and Information Sciences(环境与信息科学研究生院) Yokohama National University(横滨国立大学) Faculty of Environment(环境学系)

AI总结 本文评估现有域泛化方法在开放域泛化中的表现,发现简单方法CORAL和MMD与复杂方法DAML竞争力相当,并通过集成学习和Dirichlet混合数据增强简单扩展后性能接近DAML且计算成本更低。

Comments Accepted at IJCNN 2024. The code used in the experiments is available at https://github.com/shiralab/OpenDG-Eval

AI中文摘要

在现实应用中,机器学习模型需要处理开放集识别(OSR),即在推理过程中出现未知类别,同时还要处理域偏移,即训练和推理阶段数据分布不同。域泛化(DG)旨在处理推理阶段目标域在模型训练期间不可访问的域偏移情况。开放域泛化(ODG)同时考虑DG和OSR。域增强元学习(DAML)是一种针对ODG的方法,但其学习过程复杂。相比之下,尽管已提出多种DG方法,但它们尚未在ODG场景下进行评估。在本研究中,我们全面评估了现有DG方法在ODG中的表现,并表明两种简单的DG方法——相关对齐(CORAL)和最大均值差异(MMD)——在多种情况下与DAML具有竞争力。此外,我们通过引入DAML中使用的技术(如集成学习和Dirichlet混合数据增强)提出了CORAL和MMD的简单扩展。实验评估表明,扩展后的CORAL和MMD可以以较低的计算成本达到与DAML相当的性能。这表明简单的DG方法及其简单扩展是ODG的强基线。

英文摘要

In real-world applications, a machine learning model is required to handle open-set recognition (OSR), where unknown classes appear during inference, in addition to a domain shift, where the data distribution differs between the training and inference phases. Domain generalization (DG) aims to handle the domain shift situation where the target domain of the inference phase is inaccessible during the model training. Open domain generalization (ODG) considers both DG and OSR. Domain-augmented meta-learning (DAML) is a method targeting ODG; however, it has a complicated learning process. By contrast, although various DG methods have been proposed, they have not been evaluated in ODG situations. In this study, we comprehensively evaluate the existing DG methods in ODG and show that the two simple DG methods, CORrelation ALignment (CORAL) and maximum mean discrepancy (MMD), are competitive with DAML in several cases. In addition, we propose simple extensions of CORAL and MMD by introducing the techniques used in DAML, such as ensemble learning and Dirichlet mixup data augmentation. The experimental evaluation demonstrates that the extended CORAL and MMD can perform comparably to DAML with lower computational costs. This suggests that the simple DG methods and their simple extensions are strong baselines for ODG.
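Both baselines reduce to simple feature-alignment penalties. The sketch follows the usual Deep CORAL form, ||C_S - C_T||_F^2 / (4 d^2) over feature covariances, and uses a linear-kernel MMD (squared distance between feature means) for brevity, whereas MMD is more often computed with a Gaussian kernel. In DG training such penalties are typically added to the classification loss between pairs of source domains; that wiring, and the ensemble and Dirichlet mixup extensions, are not shown. Feature values are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def covariance(X: Sequence[Sequence[float]]) -> Matrix:
    """Unbiased (n - 1) sample covariance of the rows of X."""
    n, d = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(d)]
    return [[sum((row[i] - mu[i]) * (row[j] - mu[j]) for row in X) / (n - 1)
             for j in range(d)] for i in range(d)]


def coral_loss(Xs: Sequence[Sequence[float]], Xt: Sequence[Sequence[float]]) -> float:
    """CORAL: squared Frobenius distance of source/target covariances, scaled by 1/(4 d^2)."""
    d = len(Xs[0])
    Cs, Ct = covariance(Xs), covariance(Xt)
    return sum((Cs[i][j] - Ct[i][j]) ** 2 for i in range(d) for j in range(d)) / (4 * d * d)


def linear_mmd(Xs: Sequence[Sequence[float]], Xt: Sequence[Sequence[float]]) -> float:
    """MMD^2 with a linear kernel: squared distance between feature means."""
    ms = [sum(col) / len(Xs) for col in zip(*Xs)]
    mt = [sum(col) / len(Xt) for col in zip(*Xt)]
    return sum((a - b) ** 2 for a, b in zip(ms, mt))


X = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]]
X_shift = [[a + 1.0, b] for a, b in X]       # same covariance, shifted mean
X_scaled = [[2 * a, 2 * b] for a, b in X]    # same structure, larger spread
print(coral_loss(X, X_shift), linear_mmd(X, X_shift), coral_loss(X, X_scaled))
```

The two penalties are complementary: a pure mean shift is invisible to CORAL but caught by the linear MMD, while a change of spread is caught by CORAL.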

2510.15551 2026-06-18 cs.CL cs.AI cs.LG 版本更新

Rethinking Cross-lingual Gaps from a Statistical Viewpoint

从统计视角重新思考跨语言差距

Vihari Piratla, Purvam Jain, Darshan Singh, Trevor Cohn, Preethi Jyothi, Partha Talukdar

发表机构 * Google DeepMind(谷歌DeepMind)

AI总结 提出跨语言差距源于目标语言中响应的方差,将其形式化为有偏误差和无偏误差,并采用推理时集成方法降低方差,使跨语言迁移得分相对提升8%至50%以上。

Comments 30 pages

AI中文摘要

任何知识片段通常以一种或少数几种自然语言表达在网页或大型语料库中。大型语言模型(LLMs)通过从源语言获取知识,并在使用目标语言查询时使其可访问,从而充当桥梁。跨语言差距是指使用目标语言而非源语言查询知识时准确率的下降。现有研究侧重于导致跨语言差距的建模或训练失败。在这项工作中,我们采取另一种视角来表征跨语言错误的性质,并假设目标语言中响应的方差是造成这一差距的关键原因。我们首次将跨语言差距形式化为有偏误差和无偏误差。通过多种控制方差并减少跨语言差距的推理时干预,我们实证验证了我们的假设。我们展示了几种测试时集成方法,这些方法降低了响应方差,从而将源-目标迁移得分提高了多达12个绝对百分点,在各种LLMs上实现了8%到超过50%的相对提升。

英文摘要

Any piece of knowledge is usually expressed in one or a handful of natural languages on the web or in any large corpus. Large Language Models (LLMs) act as a bridge by acquiring knowledge from a source language and making it accessible when queried using target languages. A cross-lingual gap is a drop in accuracy incurred when querying knowledge in a target language rather than the source language. Existing research has focused on modeling or training failures leading to cross-lingual gaps. In this work, we take an alternative view to characterize the nature of cross-lingual error, and hypothesize that the variance of responses in the target language is a key cause of this gap. For the first time, we formalize the cross-lingual gap in terms of biased and unbiased errors. We empirically validate our hypothesis through multiple inference-time interventions that control variance and reduce the cross-lingual gap. We demonstrate a few test-time ensemble methods that reduce response variance, and thereby improve source-target transfer scores by up to 12 absolute points, yielding relative gains of 8% to over 50% across various LLMs.
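The abstract does not detail its ensembles. One simple test-time ensemble, assumed here for illustration, samples several answers and takes a majority vote. Under an idealized binary model in which each sample is independently correct with probability p > 0.5, the probability that a strict majority is correct rises with the number of samples, which is the variance-reduction effect the hypothesis appeals to. The answers and probabilities are hypothetical.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def majority_vote(responses: Sequence[Hashable]):
    """Return the most frequent response (ties go to the one seen first)."""
    return Counter(responses).most_common(1)[0][0]


def majority_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n independent votes is correct,
    when each vote is correct with probability p (binary right/wrong model)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1))


# Hypothetical answers sampled from an LLM queried in a target language.
print(majority_vote(["Paris", "Lyon", "Paris"]))
print(majority_accuracy(0.6, 1), majority_accuracy(0.6, 3), majority_accuracy(0.6, 9))
```

Real LLM errors are correlated and answers are not binary, so these numbers are an upper-level intuition rather than a prediction of the reported gains.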

2602.17187 2026-06-18 stat.ML cs.LG 版本更新

Anti-causal domain generalization: Leveraging unlabeled data

反因果域泛化:利用无标签数据

Sorawit Saengkyongam, Juan L. Gamella, Andrew C. Miller, Jonas Peters, Nicolai Meinshausen, Christina Heinze-Deml

发表机构 * Apple(苹果公司) ETH Zürich(苏黎世联邦理工学院)

AI总结 针对反因果设置下的域泛化问题,提出利用无标签数据估计环境扰动方向,通过惩罚模型对协变量均值和协方差变化的敏感性实现鲁棒性,并提供最坏情况最优性保证。

Comments Accepted at the International Conference on Machine Learning (ICML) 2026

AI中文摘要

域泛化问题关注的是学习在部署到新的、未见过的环境时对分布变化具有鲁棒性的预测模型。现有方法通常需要来自多个训练环境的标记数据,这在标记数据稀缺时限制了它们的适用性。在这项工作中,我们研究了反因果设置下的域泛化,其中结果导致观察到的协变量。在这种结构下,影响协变量的环境扰动不会传播到结果,这促使我们对模型对这些扰动的敏感性进行正则化。关键在于,估计这些扰动方向不需要标签,使我们能够利用来自多个环境的无标签数据。我们提出了两种方法,分别惩罚模型对跨环境协变量均值和协方差变化的敏感性,并证明这些方法在特定环境类别下具有最坏情况最优性保证。最后,我们在一个受控物理系统和一个生理信号数据集上展示了我们方法的实证性能。

英文摘要

The problem of domain generalization concerns learning predictive models that are robust to distribution shifts when deployed in new, previously unseen environments. Existing methods typically require labeled data from multiple training environments, limiting their applicability when labeled data are scarce. In this work, we study domain generalization in an anti-causal setting, where the outcome causes the observed covariates. Under this structure, environment perturbations that affect the covariates do not propagate to the outcome, which motivates regularizing the model's sensitivity to these perturbations. Crucially, estimating these perturbation directions does not require labels, enabling us to leverage unlabeled data from multiple environments. We propose two methods that penalize the model's sensitivity to variations in the mean and covariance of the covariates across environments, respectively, and prove that these methods have worst-case optimality guarantees under certain classes of environments. Finally, we demonstrate the empirical performance of our approach on a controlled physical system and a physiological signal dataset.
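Because the outcome causes the covariates, shifts in covariate statistics across environments should not move a robust predictor's output. A minimal sketch for a linear predictor f(x) = w·x, assumed here, penalizes the squared response to each environment's mean shift, estimated from unlabeled covariates only. This illustrates the mean-based idea; the paper's exact estimator, its covariance-based counterpart and its optimality conditions are not reproduced, and all names and data are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def mean(rows: Sequence[Sequence[float]]) -> Vector:
    """Column-wise mean of a list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def mean_shift_penalty(w: Sequence[float],
                       env_covariates: Sequence[Sequence[Sequence[float]]]) -> float:
    """Average squared response of f(x) = w.x to each environment's mean shift.

    Only unlabeled covariates are needed: the shift of environment e is its covariate
    mean minus the average of all environment means.
    """
    env_means = [mean(X) for X in env_covariates]
    center = mean(env_means)
    total = 0.0
    for mu in env_means:
        shift = [m - c for m, c in zip(mu, center)]
        total += sum(wi * si for wi, si in zip(w, shift)) ** 2
    return total / len(env_means)


def regularized_loss(w: Sequence[float], X: Sequence[Sequence[float]], y: Sequence[float],
                     env_covariates: Sequence[Sequence[Sequence[float]]], lam: float) -> float:
    """Labeled MSE plus lam times the unlabeled mean-shift penalty."""
    preds = [sum(wi * xi for wi, xi in zip(w, x)) for x in X]
    mse = sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)
    return mse + lam * mean_shift_penalty(w, env_covariates)


# Two hypothetical unlabeled environments that differ only along the first feature.
envs = [[[0.0, 0.0], [0.0, 2.0]], [[2.0, 0.0], [2.0, 2.0]]]
print(mean_shift_penalty([0.0, 1.0], envs), mean_shift_penalty([1.0, 0.0], envs))
```

A weight vector orthogonal to the estimated shift direction incurs no penalty, so the regularizer steers the predictor away from feature directions that environments perturb.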

11. 数据集、基准与评测 14 篇

2406.14399 2026-06-18 cs.LG cs.CV physics.ao-ph stat.ML 版本更新

Benchmarking Physics-Informed Time-Series Models for Operational Global Station Weather Forecasting

面向全球站点业务天气预报的物理信息时间序列模型基准测试

Tao Han, Zhibin Wen, Zhenghao Chen, Dazhao Du, Song Guo, Lei Bai

发表机构 * Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong SAR China(香港科技大学计算机科学与工程系) Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, China(南方科技大学计算机科学与工程系) School of Computer and Information Sciences, University of Newcastle, Newcastle, Australia(纽卡斯尔大学计算机与信息科学学院) Hangzhou Innovation Institute of Beihang University, Hangzhou, China(北京航空航天大学杭州创新研究院) Shanghai Artificial Intelligence Laboratory, Shanghai, China(上海人工智能实验室)

AI总结 提出大规模观测数据集WEATHER-5K和物理信息模型PhysicsFormer,通过压力-风对齐和能量感知平滑损失增强物理一致性,在多个天气变量和极端事件预测上评估学术模型与业务系统的差距。

Comments Accepted by ICML 2026

AI中文摘要

时间序列预测(TSF)模型的发展常受限于缺乏全面的数据集,尤其是在全球站点天气预报(GSWF)中,现有数据集规模小、时间短且空间稀疏。为解决这一问题,我们引入了WEATHER-5K,一个大规模观测天气数据集,能更好地反映真实世界条件,支持改进模型训练和评估。尽管最近的TSF方法在基准测试上表现良好,但在捕捉复杂天气动态和极端事件方面落后于业务数值天气预报系统。我们提出了PhysicsFormer,一种物理信息预测模型,结合动态核心与Transformer残差来预测未来天气状态。通过压力-风对齐和能量感知平滑损失强制物理一致性,确保在捕捉复杂时间模式的同时保持合理的动力学。我们将PhysicsFormer及其他TSF模型与业务系统在多个天气变量、极端事件预测和模型复杂度上进行基准测试,全面评估学术TSF模型与业务预报之间的差距。数据集和基准测试实现可在 https://github.com/taohan10200/WEATHER-5K 获取。

英文摘要

The development of Time-Series Forecasting (TSF) models is often constrained by the lack of comprehensive datasets, especially in Global Station Weather Forecasting (GSWF), where existing datasets are small, temporally short, and spatially sparse. To address this, we introduce WEATHER-5K, a large-scale observational weather dataset that better reflects real-world conditions, supporting improved model training and evaluation. While recent TSF methods perform well on benchmarks, they lag behind operational Numerical Weather Prediction systems in capturing complex weather dynamics and extreme events. We propose PhysicsFormer, a physics-informed forecasting model combining a dynamic core with a Transformer residual to predict future weather states. Physical consistency is enforced via pressure-wind alignment and energy-aware smoothness losses, ensuring plausible dynamics while capturing complex temporal patterns. We benchmark PhysicsFormer and other TSF models against operational systems across several weather variables, extreme event prediction, and model complexity, providing a comprehensive assessment of the gap between academic TSF models and operational forecasting. The dataset and benchmark implementation are available at: https://github.com/taohan10200/WEATHER-5K.

2508.20330 2026-06-18 cs.LG 版本更新

FORGE: Foundational Optimization Representations from Graph Embeddings

FORGE:基于图嵌入的基础优化表示

Zohair Shafi, Serdar Kadioglu

发表机构 * Khoury College of Computer Science, Northeastern University(东北大学Khoury计算机科学学院) AI Center of Excellence, Fidelity Investments(富达投资人工智能卓越中心) Department of Computer Science, Brown University(布朗大学计算机科学系)

AI总结 提出FORGE框架,通过无监督预训练向量量化图自编码器学习混合整数规划实例的通用表示,无需求解器或最优解,在下游任务中提升求解器性能并超越现有方法。

Comments Published in TMLR

AI中文摘要

组合优化问题在科学和工程中无处不在。然而,基于学习的加速组合优化方法通常需要求解大量困难实例来收集训练数据,导致显著的计算成本。现有的学习方法需要为每个问题分布和每个下游任务训练专用模型,严重限制了其可扩展性和泛化能力。我们提出Forge:基于图嵌入的基础优化表示,这是一个框架,它在大规模、多样化的混合整数规划(MIP)实例集合上以无监督方式预训练向量量化图自编码器,不依赖优化求解器或最优解。向量量化产生离散的代码分配,作为表示优化实例的词汇表。我们在无监督和有监督设置下评估Forge。在无监督设置中,Forge嵌入有效聚类跨问题领域和规模的未见实例。在有监督设置中,我们微调Forge嵌入,并展示单个预训练模型有助于预测割生成的完整性差距和搜索指导的变量提示,跨越多个问题和规模分布。在这两个任务中,我们提升了商业优化求解器的性能,并超越了最先进的基于学习的方法。最后,我们开源训练代码、预训练Forge权重和多个MIP分布的嵌入,以促进优化问题表示学习的进一步研究。

英文摘要

Combinatorial optimization problems are ubiquitous in science and engineering. Still, learning-based approaches to accelerate combinatorial optimization often require solving a large number of difficult instances to collect training data, incurring significant computational cost. Existing learning-based methods require training dedicated models for each problem distribution, for each downstream task, severely limiting their scalability and generalization. We introduce Forge: Foundational Optimization Representations from Graph Embeddings, a framework that pre-trains a vector-quantized graph autoencoder on a large, diverse collection of mixed-integer programming (MIP) instances in an unsupervised manner, without relying on optimization solvers or optimal solutions. Vector quantization produces discrete code assignments that serve as a vocabulary for representing optimization instances. We evaluate Forge in both unsupervised and supervised settings. In the unsupervised setting, Forge embeddings effectively cluster unseen instances across problem domains and sizes. In the supervised setting, we fine-tune Forge embeddings and show that a single pre-trained model helps predict both the integrality gap for cut generation and variable hints for search guidance across multiple problem and size distributions. In both tasks, we improve the performance of a commercial optimization solver and outperform state-of-the-art learning-based methods. Finally, we open-source our training code, pre-trained Forge weights, and embeddings for multiple MIP distributions to foster further research in representation learning for optimization problems: https://skadio.github.io/forge/
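Vector quantization assigns each continuous embedding to its nearest codebook entry, which yields the discrete vocabulary the abstract describes. In the sketch below the codebook and node embeddings are hypothetical, and pooling a graph's node codes into a normalized usage histogram is an illustrative assumption, not necessarily how Forge builds instance-level embeddings.

```python
from typing import List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_codes(vectors: Sequence[Sequence[float]],
                 codebook: Sequence[Sequence[float]]) -> List[int]:
    """Vector quantization: map each embedding to the index of its nearest codebook entry."""
    return [min(range(len(codebook)), key=lambda c: sq_dist(v, codebook[c])) for v in vectors]


def code_histogram(codes: Sequence[int], codebook_size: int) -> List[float]:
    """Normalized code-usage histogram, one possible fixed-size instance descriptor."""
    counts = [0] * codebook_size
    for c in codes:
        counts[c] += 1
    return [n / len(codes) for n in counts]


# Hypothetical 2-D node embeddings of one MIP instance and a 2-entry codebook.
codebook = [[0.0, 0.0], [10.0, 10.0]]
vectors = [[1.0, 1.0], [9.0, 8.0], [0.0, 2.0]]
codes = assign_codes(vectors, codebook)
print(codes, code_histogram(codes, len(codebook)))
```

A histogram over a fixed codebook has the same length for instances of any size, which is one reason discrete codes are convenient for comparing or clustering instances.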

2509.02555 2026-06-18 cs.LG cs.AI cs.NE Updated version

Surrogate Benchmarks for Model Merging Optimization

模型合并优化的替代基准

Rio Akizuki, Yuya Kudo, Nozomu Yoshinari, Yoichi Hirose, Toshiyuki Nishimoto, Kento Uchida, Shinichi Shirakawa

Affiliations: Yokohama National University(横滨国立大学)

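AI summary: To address the high computational cost of hyperparameter optimization for model merging, builds surrogate benchmarks that predict merged-model performance at low cost and simulate the behavior of optimization algorithms.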

Comments AutoML 2025 Non-Archival Content Track. The code of the surrogate benchmark is available at https://github.com/shiralab/SMM-Bench

AI Chinese abstract

模型合并技术旨在将多个模型的能力整合到一个模型中。大多数模型合并技术都有超参数,其设置会影响合并模型的性能。由于现有几项工作表明,调整模型合并中的超参数可以增强合并结果,因此为模型合并开发超参数优化算法是一个有前景的方向。然而,其优化过程计算成本高昂,特别是在合并大型语言模型时。在这项工作中,我们为合并超参数的优化开发了替代基准,以实现低成本的算法开发和性能比较。我们定义了两个搜索空间并收集数据样本,以构建替代模型来预测合并模型在给定超参数下的性能。我们证明了我们的基准能够很好地预测合并模型的性能,并模拟优化算法的行为。

English abstract

Model merging techniques aim to integrate the abilities of multiple models into a single model. Most model merging techniques have hyperparameters, and their settings affect the performance of the merged model. Because several existing works show that tuning hyperparameters in model merging can enhance the merging outcome, developing hyperparameter optimization algorithms for model merging is a promising direction. However, this optimization process is computationally expensive, particularly when merging LLMs. In this work, we develop surrogate benchmarks for optimizing merging hyperparameters, enabling algorithm development and performance comparison at low cost. We define two search spaces and collect data samples to construct surrogate models that predict the performance of a merged model from its hyperparameters. We demonstrate that our benchmarks can predict the performance of merged models well and simulate the behavior of optimization algorithms.
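
The abstract does not say which merging methods or hyperparameters the benchmarks cover. As one common example of a merging hyperparameter, task arithmetic scales each fine-tuned model's parameter delta by a coefficient lambda_i. The sketch below merges toy parameter vectors this way and uses a 1-nearest-neighbour lookup over previously evaluated configurations as a stand-in surrogate. Both the merging rule and the surrogate are assumptions for illustration, not the benchmark's actual surrogate models.

```python
def task_arithmetic_merge(base, finetuned, coeffs):
    """theta = theta_base + sum_i lambda_i * (theta_i - theta_base), element-wise."""
    merged = list(base)
    for theta, lam in zip(finetuned, coeffs):
        for j, (t, b) in enumerate(zip(theta, base)):
            merged[j] += lam * (t - b)
    return merged


def nn_surrogate(samples, query):
    """Predict the score of `query` coefficients from the closest evaluated configuration.

    samples: list of (coeffs, observed_score) pairs collected beforehand.
    """
    def sq_dist(coeffs):
        return sum((a - b) ** 2 for a, b in zip(coeffs, query))

    _, best_score = min(samples, key=lambda s: sq_dist(s[0]))
    return best_score


base = [0.0, 0.0]
finetuned = [[1.0, 0.0], [0.0, 2.0]]
print(task_arithmetic_merge(base, finetuned, [0.5, 0.25]))  # [0.5, 0.5]

# Hypothetical (coefficients, accuracy) pairs; a real benchmark would collect these
# by actually merging and evaluating models.
samples = [([0.0, 0.0], 0.60), ([0.5, 0.5], 0.72), ([1.0, 1.0], 0.65)]
print(nn_surrogate(samples, [0.4, 0.3]))  # 0.72
```

An optimizer can query such a surrogate many times at negligible cost instead of merging and evaluating an LLM for every candidate configuration.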

2509.22363 2026-06-18 cs.LG eess.AS Updated version

Investigating Faithfulness in Large Audio Language Models

大型音频语言模型中的忠实性研究

Pooneh Mousavi, Lovenya Jain, Mirco Ravanelli, Cem Subakan

Affiliations: Concordia University(康科迪亚大学) Mila - Quebec AI Institute(魁北克人工智能研究院) Université Laval(拉瓦尔大学) Birla Institute of Technology and Science, Pilani(比尔拉理工学院,皮拉尼)

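AI summary: Proposes a systematic framework for evaluating the faithfulness of reasoning chains in large audio language models, defines three audio-faithfulness criteria, and, through benchmarking, finds signs of a disconnect between model reasoning and the audio input.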

Comments Accepted to Interspeech 2026

AI Chinese abstract

大型音频语言模型(LALMs)将音频编码器与预训练的大型语言模型集成,以执行复杂的多模态推理任务。虽然这些模型可以生成思维链(CoT)解释,但这些推理链的忠实性仍不清楚。在这项工作中,我们提出了一个系统框架,从输入音频和最终模型预测两个方面评估LALMs中CoT的忠实性。我们定义了音频忠实性的三个标准:无幻觉、整体性和专注聆听。我们还引入了一个基于音频干预和CoT干预的基准来评估忠实性(基准测试界面和评估结果可在 https://poonehmousavi.github.io/faithfulness/ 获取)。在Audio Flamingo 3和Qwen2.5-Omni上的实验表明可能存在多模态脱节:推理通常与最终预测一致,但并不总是牢固地基于音频,并且可能容易受到幻觉或对抗性扰动的影响。

English abstract

Large Audio Language Models (LALMs) integrate audio encoders with pretrained Large Language Models to perform complex multimodal reasoning tasks. While these models can generate Chain-of-Thought (CoT) explanations, the faithfulness of these reasoning chains remains unclear. In this work, we propose a systematic framework to evaluate CoT faithfulness in LALMs with respect to both the input audio and the final model prediction. We define three criteria for audio faithfulness: hallucination-free, holistic, and attentive listening. We also introduce a benchmark based on both audio and CoT interventions to assess faithfulness (the benchmarking interface and evaluation results are available at https://poonehmousavi.github.io/faithfulness/). Experiments on Audio Flamingo 3 and Qwen2.5-Omni suggest a potential multimodal disconnect: reasoning often aligns with the final prediction but is not always strongly grounded in the audio and can be vulnerable to hallucinations or adversarial perturbations.

2605.07022 2026-06-18 cs.LG Updated version

Self-Driving Datasets: From 20 Million Papers to Nuanced Biomedical Knowledge at Scale

自驱动数据集:从2000万篇论文到大规模精细化生物医学知识

Haydn Jones, Yimeng Zeng, Alden Rose, Li S. Yifei, Yining Huang, Kaiwen Wu, Jiaming Liang, Maggie Ziyu Huan, Yoseph Barash, Cesar de la Fuente-Nunez, Osbert Bastani, Zachary Ives, Mark Yatskar, Jacob R. Gardner

Affiliations: Department of Computer and Information Science, University of Pennsylvania(宾夕法尼亚大学计算机与信息科学系) Department of Genetics, University of Pennsylvania(宾夕法尼亚大学遗传学系) Departments of Bioengineering and Chemical and Biomolecular Engineering, University of Pennsylvania(宾夕法尼亚大学生物工程系与化学和生物分子工程系)

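AI summary: Proposes automatically turning PubMed into structured datasets to obtain larger, more nuanced, and more accurate biomedical knowledge, and shows the Starling system producing large-scale datasets with improved accuracy across multiple tasks.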

AI Chinese abstract

人工编纂的生物医学数据库(涵盖生物活性、基因组学和化学领域)维护成本高昂、滞后于原始文献,并且丢弃了实验背景,掩盖了评估数据正确性和覆盖范围所需的细微差别。我们证明PubMed本身可以被自主且经济地转化为结构化数据集,这些数据集比它们所取代的编纂数据库更大、更细致、更准确。我们提出了三个相互耦合的贡献:(1)基于九个生物医学本体的LLM实体标注流水线,在包含2250万篇论文、2.5万亿个token的PubMed语料库中标注了19个类别的45亿个实体;(2)混合稀疏-稠密检索,支持在已标注语料库上进行按实体过滤的语义查询;(3)Starling,一个多代理深度研究系统,仅需自然语言任务描述,即可设计面向精确率和召回率的检索过滤器,归纳提取模式,并输出带有丰富细节字段和支持段落的结构化记录。在六个任务中(血脑屏障渗透性、口服生物利用度、急性毒性(LD50)、基因-疾病关联、蛋白质亚细胞定位和化学反应),Starling生成约630万条记录(每个任务9.1万至300万条);据我们所知,其中数个是对应属性上最大的公开数据集。前沿模型对我们提取结果的拒绝率在各任务中为0.6%-7.7%,远低于我们在广泛使用的编纂数据集上测得的错误率(例如,BBB_Martins为16.5%,Bioavailability_Ma为7.3%)。除规模和准确性外,支持段落还保留了表格数据库所丢弃的细微差别,例如口服生物利用度可能取决于进食或空腹状态。语料库、检索和代理共同为AI驱动的治疗设计奠定了基础。代码和数据集:https://github.com/starling-labs/starling。

English abstract

Manually curated biomedical repositories -- spanning bioactivity, genomics, and chemistry -- are expensive to maintain, lag behind primary literature, and discard experimental context, obscuring nuances needed to assess data correctness and coverage. We show that PubMed itself can be autonomously and cost-effectively turned into structured datasets that are larger, more nuanced, and more accurate than the curated databases they replace. We present three coupled contributions: (1) an LLM-based entity-tagging pipeline, grounded in nine biomedical ontologies, that tags 4.5B entities across 19 categories in a 22.5M-paper, 2.5T-token PubMed corpus; (2) hybrid sparse-dense retrieval supporting entity-filtered semantic queries over the tagged corpus; and (3) Starling, a multi-agent deep research system that, given only a natural-language task description, designs precision- and recall-targeted retrieval filters, induces an extraction schema, and emits structured records with nuance-rich fields and supporting passages. Across six tasks -- blood-brain barrier permeability, oral bioavailability, acute toxicity (LD50), gene-disease associations, protein subcellular localization, and chemical reactions -- Starling produces ~6.3M records (91K-3M per task); several are, to our knowledge, the largest public datasets for their property. Frontier-model rejection of our extractions is 0.6-7.7% across tasks, far below the error rates we measure on widely used curated counterparts (e.g., 16.5% on BBB_Martins, 7.3% on Bioavailability_Ma). Beyond scale and accuracy, the supporting passages carry nuance that tabular databases discard; for example, oral bioavailability may depend on fed vs. fasted state. Together, the corpus, retrieval, and agent establish a foundation for AI-driven therapeutic design. Code and datasets: https://github.com/starling-labs/starling.

2606.07591 2026-06-18 cs.LG cs.AI cs.CL Updated version

ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research

ResearchClawBench: 端到端自主科学研究基准

Wanghan Xu, Shuo Li, Tianlin Ye, Qinglong Cao, Yixin Chen, Hengjian Gao, Yiheng Wang, Qi Li, Kun Li, Sheng Xu, Shengdu Chai, Fangchen Yu, Xiangyu Zhao, Zhangrui Zhao, Weijie Ma, Zijie Guo, Koutian Wu, Haoyu Zhou, Haoxiang Yin, Lixue Cheng, Chaofan Hu, Haoxuan Li, Lu Mi, Xuxuan Xie, Yifan Zhou, Ruizhe Chen, Zhiwang Zhou, Xingjian Guo, Yuhao Zhou, Xuming He, Shengyuan Xu, Xinyu Gu, Jiamin Wu, Mianxin Liu, Chunfeng Song, Fenghua Ling, Dongzhan Zhou, Shixiang Tang, Yuqiang Li, Mao Su, Peng Ye, Siqi Sun, Bin Wang, Xue Yang, Zhenfei Yin, Tianfan Fu, Guangtao Zhai, Wanli Ouyang, Bo Zhang, Lei Bai, Wenlong Zhang

Affiliations: Shanghai Artificial Intelligence Laboratory(上海人工智能实验室)

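AI summary: Proposes ResearchClawBench, a benchmark of 40 tasks across 10 domains that evaluates autonomous research ability with multimodal rubrics; the strongest agent scores only 21.5, revealing shortcomings of current systems in following experimental protocols, matching evidence, and capturing the scientific core.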

AI Chinese abstract

AI编码智能体越来越多地用于科学工作,但其端到端自主研究能力仍然难以验证。我们提出了ResearchClawBench,一个用于评估自主科学研究的基准,涵盖来自10个科学领域的40个任务。每个任务基于一篇真实发表论文,提供相关文献和原始数据,并在评估期间隐藏目标论文。专家策划的多模态评分标准将目标科学制品分解为加权标准,从而能够评估目标论文级别的重新发现,同时为新发现留出空间。我们在统一协议下评估了七个自主研究(auto-research)智能体,并通过轻量级ResearchHarness评估了十七个原生LLM。当前系统远未达到可靠的重新发现:最强的自主智能体Claude Code平均得分为21.5,最强的ResearchHarness LLM Claude-Opus-4.7平均得分为20.7,LLM前沿均值仅为26.5。错误分析表明,失败集中在实验协议不匹配、证据不匹配和缺失科学核心。ResearchClawBench为衡量自主科学研究进展提供了一个可复现的评估前沿。

English abstract

AI coding agents are increasingly used for scientific work, but their end-to-end autonomous research capability remains difficult to verify. We present ResearchClawBench, a benchmark for evaluating autonomous scientific research across 40 tasks from 10 scientific domains. Each task is grounded in a real published paper, provides related literature and raw data, and hides the target paper during evaluation. Expert-curated multimodal rubrics decompose the target scientific artifacts into weighted criteria, enabling evaluation of target-paper-level re-discovery while leaving room for new discovery. We evaluate seven autonomous research (auto-research) agents under a unified protocol and seventeen native LLMs through the lightweight ResearchHarness. Current systems remain far from reliable re-discovery: the strongest autonomous agent, Claude Code, averages 21.5, and the strongest ResearchHarness LLM, Claude-Opus-4.7, averages 20.7, with an LLM frontier mean of only 26.5. Error analysis shows that failures concentrate in experimental protocol mismatch, evidence mismatch, and missing scientific core. ResearchClawBench provides a reproducible evaluation frontier for measuring progress toward autonomous scientific research.
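
The abstract describes rubrics that decompose target artifacts into weighted criteria. A minimal sketch of such weighted scoring follows, assuming each criterion receives a satisfaction value in [0, 1] and the total is reported on a 0-100 scale. The criterion names, weights, and scale are hypothetical, not taken from ResearchClawBench.

```python
def rubric_score(criteria):
    """Weighted rubric score on a 0-100 scale.

    criteria: list of (name, weight, satisfaction) with satisfaction in [0, 1].
    """
    total_weight = sum(weight for _, weight, _ in criteria)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    earned = sum(weight * sat for _, weight, sat in criteria)
    return 100.0 * earned / total_weight


# Hypothetical rubric for one task.
criteria = [
    ("reproduces main figure", 3.0, 1.0),
    ("matches experimental protocol", 2.0, 0.5),
    ("recovers key quantitative result", 5.0, 0.0),
]
print(rubric_score(criteria))  # 40.0
```

Weighting lets a rubric count a missed core result more heavily than a missed secondary artifact, which is one way partial re-discovery can still earn a nonzero score.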

2407.18245 2026-06-18 cs.CV cs.LG Updated version

VGGHeads: 3D Multi Head Alignment with a Large-Scale Synthetic Dataset

VGGHeads: 基于大规模合成数据集的3D多头部对齐

Orest Kupyn, Eugene Khvedchenia, Christian Rupprecht

Affiliations: University of Oxford(牛津大学) Piñata Farms Ukrainian Catholic University(乌克兰天主教大学)

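AI summary: Proposes VGGHeads, a large-scale synthetic dataset generated with diffusion models, for simultaneous single-step head detection and 3D mesh reconstruction; models trained on it perform strongly on real images.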

AI Chinese abstract

人类头部检测、关键点估计和3D头部模型拟合是许多应用中的基本任务。然而,传统的真实世界数据集常常存在偏差、隐私和伦理问题,并且通常是在实验室环境中录制的,这使得训练出的模型难以泛化。在这里,我们介绍VGGHeads,一个使用扩散模型生成的大规模合成数据集,用于人类头部检测和3D网格估计。我们的数据集包含超过100万张高分辨率图像,每张图像都标注了详细的3D头部网格、面部标志和边界框。利用这个数据集,我们引入了一种新的模型架构,能够从单张图像中单步同时进行头部检测和头部网格重建。通过广泛的实验评估,我们证明了在我们的合成数据上训练的模型在真实图像上取得了强劲的性能。此外,我们数据集的多样性使其适用于广泛的任务,提供了人类头部的通用和全面表示。

English abstract

Human head detection, keypoint estimation, and 3D head model fitting are essential tasks with many applications. However, traditional real-world datasets often suffer from bias, privacy, and ethical concerns, and they are often recorded in laboratory environments, which makes it difficult for trained models to generalize. Here, we introduce VGGHeads, a large-scale synthetic dataset generated with diffusion models for human head detection and 3D mesh estimation. Our dataset comprises over 1 million high-resolution images, each annotated with detailed 3D head meshes, facial landmarks, and bounding boxes. Using this dataset, we introduce a new model architecture capable of simultaneous head detection and head mesh reconstruction from a single image in a single step. Through extensive experimental evaluations, we demonstrate that models trained on our synthetic data achieve strong performance on real images. Furthermore, the versatility of our dataset makes it applicable across a broad spectrum of tasks, offering a general and comprehensive representation of human heads.

2507.07156 2026-06-18 stat.ML cs.CG cs.LG math.AT Updated version

Unreduced Persistence Diagrams for Topological Machine Learning

未约简持久图在拓扑机器学习中的应用

Nicole Abreu, Parker B. Edwards, Francis Motta

Affiliations: Department of Mathematics and Statistics, Florida Atlantic University, Boca Raton, FL(佛罗里达大西洋大学数学与统计学系,Boca Raton, FL)

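AI summary: Studies how topological feature vectors generated from unreduced boundary matrices perform in machine learning, finding that on some tasks they match or even exceed fully reduced persistence diagrams while requiring about an order of magnitude less memory to compute.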

Comments Substantially expanded to include additional ML and software benchmark experiments. 11 figures, 4 tables, 20 pages (without appendix and references)

AI Chinese abstract

基于持久同源性特征训练的监督机器学习流程在实验中被观察到忽略了持久图中包含的大量信息。然而,计算持久图通常是此类流程中计算最密集的步骤。为了探索这一动态,我们引入了几种从未约简边界矩阵生成拓扑特征向量的方法,并研究了它们的理论和计算性质。我们比较了基于未约简持久图向量化的流程与基于完全约简持久图向量化的流程在多种数据和任务类型上的性能。结果表明,基于未约简图构建的持久图训练的模型在某些任务上可以与基于完全约简图训练的模型表现相当,甚至更优。我们还对一个计算未约简图的算法进行了计算性能基准测试,该算法是Ripser的一个大幅修改版本。这些计算是可并行的,并且平均所需内存比计算完全持久图少一个数量级。我们的结果表明,包含拓扑特征的机器学习流程通过利用未约简边界矩阵中的信息,可能在计算成本和性能方面受益。

English abstract

Supervised machine learning pipelines trained on features derived from persistent homology have been experimentally observed to ignore much of the information contained in a persistence diagram. Computing persistence diagrams is often the most computationally demanding step in such a pipeline, however. To explore this dynamic, we introduce several methods to generate topological feature vectors from unreduced boundary matrices and investigate their theoretical and computational properties. We compared the performance of pipelines trained on vectorizations of unreduced PDs to that of pipelines trained on vectorizations of fully-reduced PDs across several data and task types. Our results indicate that models trained on PDs built from unreduced diagrams can perform on par with, and on some tasks even outperform, those trained on fully-reduced diagrams. We also benchmarked the computational performance of an algorithm for computing unreduced diagrams, which was implemented as a heavily modified version of Ripser. These computations are parallelizable and required an order of magnitude less memory on average compared to computing full persistence diagrams. Our results suggest that machine learning pipelines which incorporate topology-based features may benefit in terms of computational cost and performance by utilizing information contained in unreduced boundary matrices.
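
For readers new to the terminology: a persistence diagram is obtained by column-reducing the boundary matrix of a filtered complex, and the "unreduced" matrix is the input before that reduction. The sketch below implements the standard column-reduction algorithm over Z/2 (without Ripser's optimizations) on a toy filtration and prints the pivot ("low") of each raw column next to the reduced pairs. It only illustrates the distinction; how the paper derives feature vectors from unreduced matrices is not reproduced here.

```python
def low(column):
    """Largest row index with a nonzero entry, or None for an empty column."""
    return max(column) if column else None


def reduce_boundary(columns):
    """Standard left-to-right column reduction over Z/2.

    columns[j] is the set of row indices (faces) in the boundary of simplex j,
    with simplices listed in filtration order. Returns (pairs, reduced_columns),
    where each pair (i, j) means the class born at simplex i dies at simplex j.
    """
    cols = [set(c) for c in columns]  # copy so the input is not modified
    owner = {}  # pivot row -> index of the column that owns it
    pairs = []
    for j, col in enumerate(cols):
        # Adding two columns mod 2 is the symmetric difference of their supports.
        while col and low(col) in owner:
            col ^= cols[owner[low(col)]]
        if col:
            owner[low(col)] = j
            pairs.append((low(col), j))
    return pairs, cols


# Filtration of a hollow triangle: vertices 0, 1, 2, then edges 01, 12, 02.
boundary = [set(), set(), set(), {0, 1}, {1, 2}, {0, 2}]

print([low(c) for c in boundary])  # raw pivots: [None, None, None, 1, 2, 2]
pairs, reduced = reduce_boundary(boundary)
print(pairs)                       # [(1, 3), (2, 4)]
print(reduced[5])                  # set(): edge 02 closes a 1-cycle
```

In the raw matrix, columns 4 and 5 share the pivot 2; reduction resolves this clash, which is exactly the costly step that working with unreduced matrices skips.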

2604.06367 2026-06-18 cs.CR cs.AI cs.LG Updated version

WebSP-Eval: Evaluating Web Agents on Website Security and Privacy Tasks

WebSP-Eval:在网站安全与隐私任务上评估网络代理

Guruprasad Viswanathan Ramesh, Asmit Nayak, Basieem Siddique, Kassem Fawaz

Affiliations: University of Wisconsin-Madison(威斯康星大学麦迪逊分校)

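AI summary: Proposes the WebSP-Eval framework, which uses 200 task instances and an automated evaluator to test multimodal large language models on website security and privacy tasks, and identifies stateful UI elements as a primary cause of failure, with toggles causing more than 45% task failure in many models.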

Comments Accepted at PETS 2026. Project Page: https://wiscprivacy.com/webspeval/

AI Chinese abstract

网络代理自动化浏览器任务,从简单的表单填写到复杂的工作流程(如订购杂货)。虽然当前的基准测试评估通用性能(如WebArena)或针对恶意行为的安全性(如SafeArena),但没有现有框架评估代理成功执行面向用户的网站安全和隐私任务的能力,例如管理cookie偏好、配置隐私敏感账户设置或撤销非活动会话。为填补这一空白,我们引入了WebSP-Eval,一个用于衡量网络代理在网站安全和隐私任务上性能的评估框架。WebSP-Eval包括:1)一个手动制作的任务数据集,涵盖28个网站的200个任务实例;2)一个强大的代理系统,支持使用自定义Google Chrome扩展在多次运行中进行账户和初始状态管理;以及3)一个自动化评估器。我们使用最先进的多模态大语言模型评估了总共8个网络代理实例,对网站、任务类别和UI元素进行了细粒度分析。我们的评估显示,当前模型在可靠解决网站安全和隐私任务方面自主探索能力有限,并且在特定任务类别和网站上表现不佳。关键的是,我们发现有状态UI元素是代理失败的主要原因,其中开关导致许多模型超过45%的任务失败。

English abstract

Web agents automate browser tasks, ranging from simple form completion to complex workflows like ordering groceries. While current benchmarks evaluate general-purpose performance (e.g., WebArena) or safety against malicious actions (e.g., SafeArena), no existing framework assesses an agent's ability to successfully execute user-facing website security and privacy tasks, such as managing cookie preferences, configuring privacy-sensitive account settings, or revoking inactive sessions. To address this gap, we introduce WebSP-Eval, an evaluation framework for measuring web agent performance on website security and privacy tasks. WebSP-Eval comprises 1) a manually crafted task dataset of 200 task instances across 28 websites; 2) a robust agentic system supporting account and initial state management across runs using a custom Google Chrome extension; and 3) an automated evaluator. We evaluate a total of 8 web agent instantiations using state-of-the-art multimodal large language models, conducting a fine-grained analysis across websites, task categories, and UI elements. Our evaluation reveals that current models suffer from limited autonomous exploration capabilities to reliably solve website security and privacy tasks, and struggle with specific task categories and websites. Crucially, we identify stateful UI elements as a primary reason for agent failure, with toggles causing more than 45% task failure across many models.

2604.20822 2026-06-18 cs.CV cs.LG Updated version

Global Offshore Wind Infrastructure: Deployment and Operational Dynamics from Dense Sentinel-1 Time Series

全球海上风电基础设施:基于密集Sentinel-1时间序列的部署与运行动态

Thorsten Hoeser, Felix Bachofer, Claudia Kuenzer

Affiliations: Earth Observation Center (EOC), German Aerospace Center (DLR)(德国航空航天中心(DLR)地球观测中心(EOC)) Institute for Geography and Geology, University of Wuerzburg(维尔茨堡大学地理与地质研究所)

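AI summary: Presents a global Sentinel-1 SAR time series dataset that uses object detection and a rule-based classifier to identify the deployment and operational phases of offshore wind infrastructure, supporting global-scale analysis of their dynamics.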

Comments 29 pages, 18 figures

AI Chinese abstract

海上风电行业正在快速扩张,增加了对全球范围内基础设施部署和运行进行独立、高时间分辨率监测的需求。虽然基于地球观测的海上风电基础设施测绘在空间定位方面已经成熟,但现有的开放数据集缺乏关于建设和运行动态的时间密集且语义精细的信息。我们引入了一个全球Sentinel-1合成孔径雷达(SAR)时间序列数据语料库,该语料库解析了2016年第一季度至2025年第一季度海上风电基础设施的部署和运行阶段。基于更新的目标检测工作流程,我们在检测到的基础设施位置编译了15,606条时间序列,共有14,840,637个事件作为分析就绪的一维SAR后向散射剖面,每个剖面对应一次Sentinel-1采集和一个位置。为了便于直接使用和基准测试,我们发布了(i)分析就绪的一维SAR剖面,(ii)由基于规则的分类器生成的事件级基线语义标签,以及(iii)包含553条时间序列和328,657个事件标签的专家标注基准数据集。基线分类器在事件评估中实现了0.84的宏F1分数,在折叠编辑相似性-质量阈值曲线下面积(AUC)为0.785,表明时间一致性。我们证明,由此产生的语料库支持全球尺度的部署动态分析、区域部署模式差异的识别、船只交互和运行事件,并为开发和比较海上风电基础设施监测的时间序列分类方法提供了参考。

English abstract

The offshore wind energy sector is expanding rapidly, increasing the need for independent, high-temporal-resolution monitoring of infrastructure deployment and operation at global scale. While Earth-Observation-based offshore wind infrastructure mapping has matured for spatial localization, existing open datasets lack temporally dense and semantically fine-grained information on construction and operational dynamics. We introduce a global Sentinel-1 synthetic aperture radar (SAR) time series data corpus that resolves deployment and operational phases of offshore wind infrastructure from 2016Q1 to 2025Q1. Building on an updated object detection workflow, we compile 15,606 time series at detected infrastructure locations, with 14,840,637 events in total stored as analysis-ready 1D SAR backscatter profiles, one profile per Sentinel-1 acquisition and location. To enable direct use and benchmarking, we release (i) the analysis-ready 1D SAR profiles, (ii) event-level baseline semantic labels generated by a rule-based classifier, and (iii) an expert-annotated benchmark dataset of 553 time series with 328,657 event labels. The baseline classifier achieves a macro F1 score of 0.84 in event-wise evaluation and an area under the collapsed edit similarity-quality threshold curve (AUC) of 0.785, indicating temporal coherence. We demonstrate that the resulting corpus supports global-scale analyses of deployment dynamics and the identification of differences in regional deployment patterns, vessel interactions, and operational events, and that it provides a reference for developing and comparing time series classification methods for offshore wind infrastructure monitoring.
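
The baseline is scored with a macro F1 in event-wise evaluation, i.e. the unweighted mean of per-class F1 scores, so rare phases count as much as common ones. A minimal sketch follows; the phase names are hypothetical placeholders, not the dataset's actual label set.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical per-acquisition phase labels for one location.
y_true = ["construction", "construction", "operation", "operation", "no_structure"]
y_pred = ["construction", "operation", "operation", "operation", "no_structure"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.8222
```

Here the per-class F1 scores are 2/3 (construction), 0.8 (operation), and 1.0 (no_structure); a micro-averaged score would instead be dominated by whichever phase has the most acquisitions.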

2604.28076 2026-06-18 cs.CL cs.AI cs.LG Updated version

TopBench: A Benchmark for Implicit Predictive Reasoning in Tabular Question Answering

TopBench:表格问答中隐式预测推理的基准

An-Yang Ji, Jun-Peng Jiang, De-Chuan Zhan, Han-Jia Ye

Affiliations: School of Artificial Intelligence, Nanjing University, China(人工智能学院,南京大学,中国) National Key Laboratory for Novel Software Technology, Nanjing University, China(新型软件技术国家重点实验室,南京大学,中国)

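AI summary: Proposes TopBench, a benchmark of 779 samples across four sub-tasks that evaluates whether large language models can recognize implicit predictive intent in table question answering and reason reliably, and finds that current models struggle with intent recognition.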

AI Chinese abstract

大型语言模型(LLM)推动了表格问答的发展,其中大多数查询可以通过提取信息或简单聚合来回答。然而,一类常见的现实世界查询是隐式预测性的,需要从历史模式中推断未观察到的答案,而不仅仅是检索。这些查询带来了两个挑战:识别潜在意图和对大规模表格进行可靠的预测推理。为了评估LLM在带有隐式预测任务的表格问答中的表现,我们引入了TopBench,一个包含779个样本的基准,涵盖四个子任务,从单点预测到决策制定、处理效应分析和复杂过滤,要求模型生成涵盖推理文本和结构化表格的输出。我们在基于文本和代理工作流下评估了多种模型。实验表明,当前模型通常在意图识别上存在困难,默认进行查找。更深入的分析发现,准确的意图消歧是引导这些预测行为的前提。此外,提升预测精度的上限需要整合更复杂的建模或推理能力。

English abstract

Large Language Models (LLMs) have advanced Table Question Answering, where most queries can be answered by extracting information or simple aggregation. However, a common class of real-world queries is implicitly predictive, requiring the inference of unobserved answers from historical patterns rather than mere retrieval. These queries introduce two challenges: recognizing latent intent and reliable predictive reasoning over massive tables. To assess LLMs in such Tabular questiOn answering with implicit Prediction tasks, we introduce TopBench, a benchmark consisting of 779 samples across four sub-tasks, ranging from single-point prediction to decision making, treatment effect analysis, and complex filtering, requiring models to generate outputs spanning reasoning text and structured tables. We evaluate diverse models under both text-based and agentic workflows. Experiments reveal that current models often struggle with intent recognition, defaulting to simple lookups. Deeper analysis identifies accurate intent disambiguation as the prerequisite for eliciting these predictive behaviors. Furthermore, raising the upper bound of prediction precision requires the integration of more sophisticated modeling or reasoning capabilities.

2605.03460 2026-06-18 cs.AI cs.LG Updated version

FinSTaR: Towards Financial Reasoning with Time Series Reasoning Models

FinSTaR:基于时间序列推理模型的金融推理

Seunghan Lee, Jun Seo, Jaehoon Lee, Sungdong Yoo, Minjae Kim, Tae Yoon Lim, Dongwan Kang, Hwanil Choi, Soonyoung Lee, Wonbin Ahn

Affiliations: LG AI Research(LG人工智能研究院)

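AI summary: Addressing the failure of time series reasoning models in the financial domain, proposes FinSTaR, built on a 2x2 capability taxonomy, which uses Compute-in-CoT and Scenario-Aware CoT strategies to reach 78.9% average accuracy on the FinTSR-Bench benchmark.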

Comments KDD Workshop on SciSoc Agents & LLMs 2026 (Oral Presentation)

AI Chinese abstract

时间序列推理模型在通用领域表现出色,但在具有独特特征的金融领域却持续失败。我们提出一个通用的2x2能力分类法,通过交叉1)单实体与多实体分析,以及2)当前状态评估与未来行为预测来划分TSRM能力。我们在金融领域实例化该分类法——其中确定性评估与随机性预测的区分尤为关键——形成十个金融推理任务,并基于标普股票构建FinTSR-Bench基准。为此,我们提出FinSTaR(金融时间序列思考与推理),在FinTSR-Bench上训练,并针对每个类别采用不同的思维链策略。对于评估(确定性,即可从可观测数据计算得出),我们采用Compute-in-CoT,一种程序化思维链,使模型能够直接从原始价格推导答案。对于预测(本质上是随机的,即受不可观测因素影响),我们采用场景感知思维链,在做出判断前生成多种场景,模拟金融分析师在不确定性下的推理方式。所提方法在FinTSR-Bench上达到78.9%的平均准确率,显著优于LLM和TSRM基线。此外,我们展示了四个能力类别通过联合训练具有互补性和相互增强性,并且场景感知思维链相比标准思维链持续提升预测准确率。代码已公开:https://github.com/seunghan96/FinSTaR。

English abstract

Time series (TS) reasoning models (TSRMs) have shown promising capabilities in general domains, yet they consistently fail in the financial domain, which exhibits unique characteristics. We propose a general 2 x 2 capability taxonomy for TSRMs by crossing 1) single-entity vs. multi-entity analysis with 2) assessment of the current state vs. prediction of future behavior. We instantiate this taxonomy in the financial domain, where the distinction between deterministic assessment and stochastic prediction is particularly critical, as ten financial reasoning tasks that form the FinTSR-Bench benchmark based on S&P stocks. To this end, we propose FinSTaR (Financial Time Series Thinking and Reasoning), trained on FinTSR-Bench with distinct chain-of-thought (CoT) strategies tailored to each category. For assessment, which is deterministic (i.e., computable from observable data), we employ Compute-in-CoT, a programmatic CoT that enables models to derive answers directly from raw prices. For prediction, which is inherently stochastic (i.e., subject to unobservable factors), we adopt Scenario-Aware CoT, which generates diverse scenarios before making a judgment, mirroring how financial analysts reason under uncertainty. The proposed method achieves 78.9% average accuracy on FinTSR-Bench, substantially outperforming LLM and TSRM baselines. Furthermore, we show that the four capability categories are complementary and mutually reinforcing through joint training, and that Scenario-Aware CoT consistently improves prediction accuracy over standard CoT. Code is available at https://github.com/seunghan96/FinSTaR.
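
Compute-in-CoT is described as a programmatic chain of thought that derives assessment answers directly from raw prices. The snippet below shows the kind of deterministic computation such a program might contain, assuming two hypothetical assessment quantities (cumulative return and maximum drawdown); it is not one of the paper's ten tasks or its actual CoT format.

```python
def cumulative_return(prices):
    """Total simple return from the first to the last price."""
    return prices[-1] / prices[0] - 1.0


def max_drawdown(prices):
    """Most negative decline from a running peak, as a fraction (e.g. -0.1 = -10%)."""
    peak, worst = prices[0], 0.0
    for p in prices:
        peak = max(peak, p)
        worst = min(worst, p / peak - 1.0)
    return worst


prices = [100.0, 110.0, 99.0, 105.0]  # toy daily closes
print(f"{cumulative_return(prices):.3f}")  # 0.050
print(f"{max_drawdown(prices):.3f}")       # -0.100
```

Because such answers are fully determined by the observed series, computing them exactly avoids the arithmetic slips of free-form estimation; the prediction category has no such closed-form answer, which is why the abstract pairs it with a different CoT strategy.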

2606.16000 2026-06-18 cs.CL cs.LG Updated version

GRACE-DS: a Guarded Reward-guided Agent Correction Environment in Data Science

GRACE-DS:数据科学中的受保护奖励引导智能体修正环境

Aleksandr Tsymbalov, Danis Zaripov, Artem Epifanov, Anastasiya Palienko

Affiliations: ITMO University(ITMO大学) HSE University(高等经济学院)

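AI summary: Proposes GRACE-DS, an isolated environment for pre-deployment evaluation of LLM-powered AutoML agents that uses hidden executable validators to measure predictive performance, leakage avoidance, reproducibility, and related properties; experiments show that its flexible iterative interaction regime outperforms the baseline regimes.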

AI Chinese abstract

我们介绍了GRACE-DS,一个数据科学中的受保护奖励引导智能体修正环境,用于对LLM驱动的AutoML智能体进行部署前评估。GRACE-DS是一组在隔离环境中的评估指标,可应用于特定组织的表格ML任务。它将智能体暴露于现实的工作流阶段,从规划和数据检查到特征工程、模型开发、验证、代码修复直至最终提交,同时隐藏的可执行验证器不仅衡量最终预测性能,还衡量泄漏避免、可重复性、协议有效性、修正行为和奖励对齐。最强的结构化机制——灵活迭代交互(我们的方法)——实现了比单次生成、非结构化交互和基于重启的基线更高的端到端归一化隐藏测试质量,同时提高了协议有效完成率。经过7000多个回合的验证,这些结果确立了GRACE-DS作为评估基于LLM的AutoML智能体在生产类条件下按照组织特定要求执行机器学习工作流能力的稳健平台。

English abstract

We introduce GRACE-DS, a Guarded Reward-guided Agent Correction Environment in Data Science for pre-deployment evaluation of LLM-powered AutoML agents. GRACE-DS is a set of evaluation metrics in an isolated environment that can be applied to tabular ML tasks specific to a particular organization. It exposes agents to realistic workflow stages, from planning and data inspection through feature engineering, model development, validation, and code repair to final submission, while hidden executable validators measure not only final predictive performance but also leakage avoidance, reproducibility, protocol validity, correction behavior, and reward alignment. The strongest structured regime, flexible iterative interaction (our approach), achieves higher end-to-end normalized hidden-test quality than single-shot generation, unstructured interaction, and restart-based baselines, while also improving protocol-valid completion. Validated across more than 7,000 episodes, these results establish GRACE-DS as a robust platform for assessing the capacity of LLM-based AutoML agents to execute machine learning workflows under production-like conditions and in accordance with organization-specific requirements.

2410.15595 2026-06-18 cs.AI cs.CL cs.LG Updated version

A Comprehensive Survey of Direct Preference Optimization: Datasets, Theories, Variants, and Applications

直接偏好优化综述:数据集、理论、变体及应用

Wenyi Xiao, Zechuan Wang, Leilei Gan, Shuai Zhao, Zongrui Li, Ruirui Lei, Wanggui He, Luu Anh Tuan, Long Chen, Hao Jiang, Zhou Zhao, Fei Wu

Affiliations: Zhejiang University(浙江大学) Nanyang Technological University(南洋理工大学) Alibaba Group(阿里巴巴集团)

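AI summary: Surveys progress on Direct Preference Optimization (DPO) across theory, variants, datasets, and applications, discusses its potential and limitations as an RL-free alternative to RLHF, and proposes future research directions.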

Comments Accepted by TPAMI 2026. Project page: https://github.com/Mr-Loevan/DPO-Survey

AI Chinese abstract

随着大语言模型(LLMs)的快速发展,将策略模型与人类偏好对齐变得日益关键。直接偏好优化(DPO)作为一种有前景的对齐方法,作为从人类反馈中强化学习(RLHF)的无RL替代方案而出现。尽管DPO取得了各种进展并存在固有局限性,但文献中目前缺乏对这些方面的深入综述。在这项工作中,我们对DPO中的挑战和机遇进行了全面回顾,涵盖理论分析、变体、相关偏好数据集和应用。具体而言,我们基于关键研究问题对近期DPO研究进行分类,以提供对DPO当前格局的透彻理解。此外,我们提出了几个未来研究方向,为研究社区提供模型对齐的见解。相关论文的更新合集可在 https://github.com/Mr-Loevan/DPO-Survey 找到。

English abstract

With the rapid advancement of large language models (LLMs), aligning policy models with human preferences has become increasingly critical. Direct Preference Optimization (DPO) has emerged as a promising approach for alignment, acting as an RL-free alternative to Reinforcement Learning from Human Feedback (RLHF). Despite DPO's various advancements and inherent limitations, an in-depth review of these aspects is currently lacking in the literature. In this work, we present a comprehensive review of the challenges and opportunities in DPO, covering theoretical analyses, variants, relevant preference datasets, and applications. Specifically, we categorize recent studies on DPO based on key research questions to provide a thorough understanding of DPO's current landscape. Additionally, we propose several future research directions to offer insights on model alignment for the research community. An updated collection of relevant papers can be found at https://github.com/Mr-Loevan/DPO-Survey.
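
For reference, the standard DPO objective for one preference pair is -log σ(β[(log π_θ(y_w|x) - log π_ref(y_w|x)) - (log π_θ(y_l|x) - log π_ref(y_l|x))]), where y_w and y_l are the preferred and dispreferred responses and β sets the strength of the implicit KL regularization toward the reference model. The sketch computes this loss from hypothetical sequence log-probabilities using a numerically stable softplus; the variants covered by the survey modify this expression in various ways.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one pair: -log sigmoid(beta * margin) = softplus(-beta * margin).

    logp_*     : policy log-probabilities of the chosen (w) / rejected (l) response
    ref_logp_* : log-probabilities of the same responses under the frozen reference model
    """
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return softplus(-beta * margin)


# When the policy equals the reference, the loss is log 2.
print(round(dpo_loss(-12.0, -15.0, -12.0, -15.0), 4))  # 0.6931
# Raising the chosen response's log-probability lowers the loss.
print(dpo_loss(-10.0, -15.0, -12.0, -15.0) < math.log(2))  # True
```

Writing the loss as softplus(-z) rather than -log(sigmoid(z)) avoids overflow and loss of precision when the margin is large in magnitude.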

12. Machine learning applications: 29 papers

2509.24725 2026-06-18 cs.LG cs.AI Updated version

Q-Net: Queue Length Estimation via Kalman-based Neural Networks

Q-Net:基于卡尔曼神经网络的队列长度估计

Ting Gao, Elvin Isufi, Winnie Daamen, Erik-Sander Smits, Serge Hoogendoorn

Affiliations: University of Amsterdam(阿姆斯特丹大学) Delft University of Technology(代尔夫特理工大学)

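AI summary: Proposes the Q-Net framework, which combines Kalman filtering with neural networks to fuse heterogeneous data for queue length estimation at signalized intersections, improving spatial transferability and real-time applicability and enabling accurate queue estimation without costly sensing equipment.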

AI Chinese abstract

估计信号交叉口的队列长度一直是交通管理中的长期挑战。尽管有两类隐私保护的数据源:(i) 接近停止线的环形检测器提供的车辆计数汇总数据,以及 (ii) 提供路段平均速度测量的汇总浮动汽车数据 (aFCD),但如何将这些具有不同空间和时间分辨率的数据源整合用于队列长度估计仍不清楚。为此,本文提出Q-Net:一种基于状态空间形式的队列估计框架。该设计解决了队列建模中的关键挑战,如违反交通守恒假设。Q-Net遵循卡尔曼预测-更新结构,并在状态演变和测量模型中保持物理可解释性。Q-Net使用AI增强的卡尔曼滤波器从数据中学习时变增益动态。该框架支持实时实现,并通过将aFCD测量分组为固定大小的局部组来提高空间转移性,使可学习参数的数量与路段长度无关。在荷兰鹿特丹城市主干道上的评估显示,Q-Net优于基线方法,能够准确追踪队列的形成和消散,并缓解aFCD引起的延迟。通过结合数据效率、可解释性、实时适用性和空间转移性,Q-Net在无需昂贵的传感基础设施(如摄像头或雷达)的情况下实现了准确的队列长度估计。

English abstract

Estimating queue lengths at signalized intersections is a long-standing challenge in traffic management. Partial observability of vehicle flows complicates this task despite the availability of two privacy-preserving data sources: (i) aggregated vehicle counts from loop detectors near stop lines, and (ii) aggregated floating car data (aFCD) that provide segment-wise average speed measurements. However, how to integrate these sources, which differ in spatial and temporal resolution, for queue length estimation remains unclear. Addressing this question, we present Q-Net: a queue estimation framework built upon a state-space formulation. This design addresses key challenges in queue modeling, such as violations of traffic conservation assumptions. Q-Net follows the Kalman predict-update structure and maintains physical interpretability in both the state evolution and measurement models. Q-Net uses an AI-augmented Kalman filter to learn time-varying gain dynamics from data. The framework supports real-time implementation and improves spatial transferability by grouping aFCD measurements into fixed-size local groups, making the number of learnable parameters independent of section length. Evaluations on urban main roads in Rotterdam, the Netherlands, show that Q-Net outperforms baseline methods, tracks queue formation and dissipation accurately, and mitigates aFCD-induced delays. By combining data efficiency, interpretability, real-time applicability, and spatial transferability, Q-Net makes accurate queue length estimation possible without costly sensing infrastructure like cameras or radar.
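
The Kalman predict-update structure that Q-Net follows can be illustrated with a scalar filter for a single approach. In the sketch below, the prediction step applies vehicle conservation (queue plus inflow minus outflow, clipped at zero) and the update step corrects it with a noisy queue measurement, for example one derived from aFCD speeds. The fixed noise variances Q and R, the clipping, and the inflow and measurement inputs are assumptions for illustration; according to the abstract, Q-Net instead learns time-varying gain dynamics from data.

```python
def kalman_step(q, P, inflow, outflow, z, Q=1.0, R=4.0):
    """One predict-update cycle for a scalar queue-length state (vehicles).

    q, P    : previous queue estimate and its variance
    inflow  : vehicles arriving during the step (hypothetical input)
    outflow : vehicles discharged during the step (e.g. stop-line detector counts)
    z       : noisy queue measurement (hypothetical, e.g. derived from segment speeds)
    Q, R    : process / measurement noise variances, fixed here; the gain K below
              follows from them, whereas Q-Net learns its gain from data
    """
    # Predict: conservation of vehicles; a queue cannot be negative.
    q_pred = max(q + inflow - outflow, 0.0)
    P_pred = P + Q
    # Update: the Kalman gain weighs the prediction against the measurement.
    K = P_pred / (P_pred + R)
    q_new = q_pred + K * (z - q_pred)
    P_new = (1.0 - K) * P_pred
    return q_new, P_new, K


q, P, K = kalman_step(q=10.0, P=4.0, inflow=6.0, outflow=4.0, z=14.0)
print(round(q, 3), round(P, 3), round(K, 3))  # 13.111 2.222 0.556
```

The predicted queue is 12 vehicles, the measurement says 14, and with K = 5/9 the estimate moves a little more than halfway toward the measurement while its variance shrinks.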

2307.05623 2026-06-18 cs.LG cs.AI Updated version

A DeepLearning Framework for Dynamic Estimation of Origin-Destination Sequence

一种用于动态估计起点-终点序列的深度学习框架

Zheli Xiong, Defu Lian, Enhong Chen, Gang Chen, Xiaomin Cheng

Affiliations: School of Data Science University of Science(数据科学学院 中国科学技术大学) Yangtze River Delta Information Intelligence Innovation Research Institute, China(长江三角洲信息智能创新研究院)

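AI summary: Addressing the underdetermination and lag problems in OD matrix estimation, proposes an integrated deep learning approach in which a neural network infers the structure of the OD sequence and guides numerical optimization; experiments show that it provides effective spatio-temporal constraints.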

Comments 11 pages, 25 figures

AI Chinese abstract

OD矩阵估计是交通领域的一个关键问题。主要方法利用交通传感器测量信息(如交通计数)来估计由OD矩阵表示的交通需求。该问题分为两类:静态OD矩阵估计和动态OD矩阵序列(简称OD序列)估计。上述两类都面临由大量待估参数和不足的约束信息引起的欠定性问题。此外,OD序列估计还面临滞后挑战:由于拥堵等不同交通状况,同一车辆在相同观测时段内会出现在不同路段,导致相同的OD需求对应不同的行程。为此,本文提出一种集成方法,利用深度学习方法推断OD序列的结构,并利用结构约束指导传统数值优化。实验表明,神经网络能有效推断OD序列的结构,并为数值优化提供实用的约束以获得更好的结果。此外,实验表明,所提供的结构信息不仅包含对OD矩阵空间结构的约束,还提供了对OD序列时间结构的约束,很好地解决了滞后问题的影响。

English abstract

OD matrix estimation is a critical problem in the transportation domain. The principal method uses information measured by traffic sensors, such as traffic counts, to estimate the traffic demand represented by the OD matrix. The problem is divided into two categories: static OD matrix estimation and dynamic OD matrix sequence (OD sequence for short) estimation. Both face an underdetermination problem caused by the large number of parameters to estimate and insufficient constraint information. In addition, OD sequence estimation faces a lag challenge: under different traffic conditions such as congestion, the same vehicle may appear on different road sections during the same observation period, so identical OD demands correspond to different trips. To this end, this paper proposes an integrated method that uses deep learning to infer the structure of the OD sequence and uses structural constraints to guide traditional numerical optimization. Our experiments show that the neural network (NN) can effectively infer the structure of the OD sequence and provide practical constraints for numerical optimization to obtain better results. Moreover, the experiments show that the provided structural information constrains not only the spatial structure of OD matrices but also the temporal structure of the OD sequence, which effectively mitigates the lag problem.
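
The underdetermination mentioned in the abstract can be seen in a toy network: link counts are a linear function of OD demand through an assignment matrix, and when there are more OD pairs than counted links, different demand vectors can produce identical counts. The matrix and demand values below are hypothetical.

```python
def link_counts(assignment, od_demand):
    """Link flows = assignment matrix (links x OD pairs) times the OD demand vector."""
    return [sum(a * x for a, x in zip(row, od_demand)) for row in assignment]


# 2 counted links, 3 OD pairs: entry (l, k) = 1 if OD pair k's route uses link l.
A = [
    [1, 1, 0],
    [0, 1, 1],
]
od_a = [100, 50, 100]
od_b = [150, 0, 150]

print(link_counts(A, od_a))  # [150, 150]
print(link_counts(A, od_b))  # [150, 150]: same counts, different demand
```

Additional constraints, such as the structural information the paper infers with a neural network, are what allow a numerical optimizer to choose among such equally consistent solutions.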

2506.13196 2026-06-18 cs.LG 版本更新

KEPLA: A Knowledge-Enhanced Deep Learning Framework for Accurate Protein-Ligand Binding Affinity Prediction

KEPLA:一种用于精确预测蛋白质-配体结合亲和力的知识增强深度学习框架

Han Liu, Keyan Ding, Peilin Chen, Yinwei Wei, Liqiang Nie, Dapeng Wu, Shiqi Wang

发表机构 * Department of Computer Science, City University of Hong Kong(香港城市大学计算机科学系) ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University(浙江大学杭州国际科技创新中心) School of Software, Shandong University(山东大学软件学院) College of Informatics, Harbin Institute of Technology (Shenzhen)(哈尔滨工业大学(深圳)计算机学院)

AI总结 提出KEPLA框架,通过整合基因本体和配体属性的先验知识,利用全局表示对齐与局部交叉注意力,提升蛋白质-配体结合亲和力预测的准确性,在多个基准数据集上超越现有方法。

详情
AI中文摘要

准确预测蛋白质-配体结合亲和力对药物发现至关重要。尽管最近的深度学习方法已展现出有希望的结果,但它们通常仅依赖蛋白质和配体的结构特征,忽略了与结合亲和力相关的宝贵生化知识。为解决这一局限,我们提出KEPLA,一种新颖的深度学习框架,明确整合来自基因本体和配体属性的先验知识以增强预测性能。KEPLA以蛋白质序列和配体分子图作为输入,并优化两个互补目标:(1)将全局表示与知识图谱关系对齐,以捕获领域特定的生化见解;(2)利用局部表示之间的交叉注意力构建细粒度联合嵌入用于预测。在两个基准数据集上的域内和跨域场景实验表明,KEPLA始终优于最先进的基线方法。此外,基于知识图谱关系和交叉注意力图的可解释性分析为潜在的预测机制提供了有价值的见解。

英文摘要

Accurate prediction of protein-ligand binding affinity is critical for drug discovery. While recent deep learning approaches have demonstrated promising results, they often rely solely on structural features of proteins and ligands, overlooking valuable biochemical knowledge associated with binding affinity. To address this limitation, we propose KEPLA, a novel deep learning framework that explicitly integrates prior knowledge from Gene Ontology and ligand properties to enhance prediction performance. KEPLA takes protein sequences and ligand molecular graphs as input and optimizes two complementary objectives: (1) aligning global representations with knowledge graph relations to capture domain-specific biochemical insights, and (2) leveraging cross-attention between local representations to construct fine-grained joint embeddings for prediction. Experiments on two benchmark datasets across both in-domain and cross-domain scenarios demonstrate that KEPLA consistently outperforms state-of-the-art baselines. Furthermore, interpretability analyses based on knowledge graph relations and cross-attention maps provide valuable insights into the underlying predictive mechanisms.
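KEPLA's second objective builds a joint embedding from cross-attention between local protein and ligand representations. The sketch below shows generic scaled dot-product cross-attention in NumPy; the random projection matrices, the weights shared between both directions, and the mean pooling are assumptions for illustration, not KEPLA's actual architecture.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query row attends over all key rows."""
    d = queries.shape[-1]
    weights = softmax(queries @ keys.T / np.sqrt(d), axis=-1)
    return weights @ values, weights

def joint_embedding(protein_local, ligand_local, rng):
    """protein_local: (n_residues, d); ligand_local: (n_atoms, d).
    Hypothetical random projections stand in for learned weights."""
    d = protein_local.shape[1]
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    # Residues attend to atoms, and atoms attend to residues (weights shared for brevity).
    p2l, attn_p2l = cross_attention(protein_local @ Wq, ligand_local @ Wk, ligand_local @ Wv)
    l2p, attn_l2p = cross_attention(ligand_local @ Wq, protein_local @ Wk, protein_local @ Wv)
    joint = np.concatenate([p2l.mean(axis=0), l2p.mean(axis=0)])
    return joint, attn_p2l, attn_l2p
```

The returned attention maps (residues by atoms) are the kind of object that cross-attention interpretability analyses inspect.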

2508.09191 2026-06-18 cs.LG cs.AI 版本更新

From Values to Tokens: An LLM-Driven Framework for Context-aware Time Series Forecasting via Symbolic Discretization

从数值到标记:一种基于符号离散化的LLM驱动上下文感知时间序列预测框架

Xiaoyu Tao, Shilong Zhang, Mingyue Cheng, Daoyu Wang, Tingyue Pan, Bokai Pan, Changqing Zhang, Shijin Wang

发表机构 * State Key Laboratory of Cognitive Intelligence(认知智能国家重点实验室) University of Science and Technology of China(中国科学技术大学) College of Intelligence and Computing(智能科学与计算学院) iFLYTEK Research(科大讯飞研究院)

AI总结 提出TokenCast框架,利用大语言模型通过符号离散化将连续时间序列转化为标记,与上下文文本对齐,实现上下文感知的预测,实验证明有效。

详情
AI中文摘要

时间序列预测在能源、医疗和金融等关键应用领域支持决策中起着重要作用。尽管近期取得了进展,但由于将历史数值序列与通常包含非结构化文本数据的上下文特征整合的挑战,预测精度仍然有限。为了解决这一挑战,我们提出了TokenCast,一个由大语言模型(LLM)驱动的框架,利用基于语言的符号表示作为上下文感知时间序列预测的统一中介。具体来说,TokenCast采用离散分词器将连续数值序列转化为时间标记,实现与基于语言输入的结构对齐。为了有效弥合模态之间的语义差距,时间和上下文标记通过预训练的LLM嵌入到共享表示空间中,并通过生成目标进一步优化。基于这一统一语义空间,对齐的LLM随后以监督方式进行微调,以预测未来的时间标记,然后解码回原始数值空间。在真实世界数据集上的大量实验证明了我们框架的有效性,并突显了其作为上下文感知时间序列预测生成框架的潜力。代码可从 https://github.com/Xiaoyu-Tao/TokenCast 获取。

英文摘要

Time series forecasting plays a vital role in supporting decision-making across a wide range of critical applications, including energy, healthcare, and finance. Despite recent advances, forecasting accuracy remains limited due to the challenge of integrating historical numerical sequences with contextual features, which often comprise unstructured textual data. To address this challenge, we propose TokenCast, a large language model (LLM) driven framework that leverages language-based symbolic representations as a unified intermediary for context-aware time series forecasting. Specifically, TokenCast employs a discrete tokenizer to transform continuous numerical sequences into temporal tokens, enabling structural alignment with language-based inputs. To effectively bridge the semantic gap between modalities, both temporal and contextual tokens are embedded into a shared representation space via a pre-trained LLM, further optimized with generative objectives. Building upon this unified semantic space, the aligned LLM is subsequently fine-tuned in a supervised manner to predict future temporal tokens, which are then decoded back into the original numerical space. Extensive experiments on real-world datasets demonstrate the effectiveness of our framework and highlight its potential as a generative framework for context-aware time series forecasting. The code is available at https://github.com/Xiaoyu-Tao/TokenCast.
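To make the value-to-token step concrete, here is a minimal uniform-bin tokenizer that maps a continuous series to integer token ids and decodes them back. This is an assumed stand-in: the abstract only says TokenCast uses a "discrete tokenizer", which is likely learned; `BinTokenizer`, `n_bins`, and `clip` are hypothetical.

```python
import numpy as np

class BinTokenizer:
    """Hypothetical uniform-bin tokenizer: z-normalise, clip to [-clip, clip],
    then map each value to one of n_bins integer ids (decoded to bin centres)."""

    def __init__(self, n_bins=256, clip=4.0):
        self.n_bins = n_bins
        self.clip = clip
        self.edges = np.linspace(-clip, clip, n_bins + 1)
        self.centers = (self.edges[:-1] + self.edges[1:]) / 2

    def encode(self, series):
        x = np.asarray(series, float)
        # Normalisation statistics are stored so decode() can invert them.
        self.mean, self.std = x.mean(), x.std() + 1e-8
        z = np.clip((x - self.mean) / self.std, -self.clip, self.clip)
        # Digitizing against the interior edges yields ids in [0, n_bins - 1].
        return np.digitize(z, self.edges[1:-1])

    def decode(self, tokens):
        """Map ids back to bin centres in the original units (call after encode)."""
        return self.centers[np.asarray(tokens)] * self.std + self.mean

tok = BinTokenizer(n_bins=256)
ids = tok.encode(20 + 5 * np.sin(np.linspace(0, 6 * np.pi, 500)))
reconstructed = tok.decode(ids)
```

For unclipped values the reconstruction error is at most half a bin width times the series' standard deviation, which is the quantization trade-off any value-to-token scheme has to manage.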

2511.05221 2026-06-18 cs.LG q-bio.NC 版本更新

ActiTect: A Generalizable Machine Learning Pipeline for REM Sleep Behavior Disorder Screening through Standardized Actigraphy

ActiTect:通过标准化体动记录进行REM睡眠行为障碍筛查的通用机器学习流程

David Bertram, Anja Ophey, Sinah Röttgen, Konstantin Kufer, Gereon R. Fink, Elke Kalbe, Clint Hansen, Walter Maetzler, Maximilian Kapsecker, Lara M. Reimer, Stephan Jonas, Andreas T. Damgaard, Natasha B. Bertelsen, Casper Skjaerbaek, Per Borghammer, Karolien Groenewald, Pietro-Luca Ratti, Michele T. Hu, Noémie Moreau, Michael Sommerauer, Katarzyna Bozek

发表机构 * Faculty of Mathematics and Natural Sciences, University of Cologne, Germany(科隆大学数学与自然科学学院,德国) Institute for Biomedical Informatics, Faculty of Medicine and University Hospital Cologne, University of Cologne, Germany(科隆大学医学院与科隆大学医院生物医学信息学研究所,德国) Center for Molecular Medicine Cologne (CMMC), Faculty of Medicine and University Hospital Cologne, University of Cologne, Germany(科隆分子医学中心(CMMC),科隆大学医学院与科隆大学医院,德国) Medical Psychology | Neuropsychology and Gender Studies, Faculty of Medicine and University Hospital Cologne, University of Cologne, Germany(科隆大学医学院与科隆大学医院医学心理学 | 神经心理学与性别研究,德国) Cognitive Neuroscience, Institute for Neuroscience and Medicine, INM-3, Research Center Juelich, Germany(认知神经科学,神经科学与医学研究所(INM-3),于利希研究中心,德国) Department of Neurology, Faculty of Medicine and University Hospital Cologne, University of Cologne, Germany(科隆大学医学院与科隆大学医院神经科,德国) Center of Neurology, Department of Parkinson, Sleep and Movement Disorders, University Hospital Bonn, University of Bonn, Germany(神经科中心,帕金森、睡眠与运动障碍部门,波恩大学医院,德国) German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany(德国神经退行性疾病研究中心(DZNE),波恩,德国) Cluster of Excellence for Aging and Aging-Associated Diseases (CECAD), University of Cologne, Germany(老龄化与相关疾病卓越中心(CECAD),科隆大学,德国) Department of Neurology, University Medical Center Schleswig-Holstein, Campus Kiel and Kiel University, Germany(神经科,石勒苏益格-荷尔斯泰因大学医学中心基尔校区与基尔大学,德国) Department of Informatics, Technical University of Munich, Germany(信息学系,慕尼黑工业大学,德国) Institute for Digital Medicine, University Hospital Bonn, Germany(数字医学研究所,波恩大学医院,德国) Lundbeck Foundation Parkinson’s Disease Research Center (PACE), Aarhus University, Denmark(灵北基金会帕金森病研究中心(PACE),奥胡斯大学,丹麦) Department of Nuclear Medicine, Aarhus University Hospital, Denmark(核医学部,奥胡斯大学医院,丹麦) Department of Electrical and Computer Engineering, Aarhus University, Denmark(电气与计算机工程系,奥胡斯大学,丹麦) Oxford Parkinson’s Disease Centre and Division of Neurology, Nuffield Department of Clinical Neurosciences, University of Oxford, UK(牛津帕金森病中心与神经科,牛津大学纳菲尔德临床神经科学系,英国)

AI总结 提出ActiTect,一个全自动开源机器学习工具,通过标准化预处理和睡眠-觉醒检测,从体动记录中识别RBD,在多个独立队列中验证了泛化能力(AUROC 0.84-0.94)。

Comments 37 pages including Supplementary Information, 4 core figures, 1 supplementary figure. (v2: fixed a typo in Table 3 and made minor text edits; v3: post review)

详情
Journal ref
npj Digital Medicine (2026)
AI中文摘要

孤立性快速眼动睡眠行为障碍(iRBD)是α-突触核蛋白病的主要前驱标志,通常先于帕金森病、路易体痴呆或多系统萎缩的临床发作。腕戴式体动记录仪能够捕捉异常夜间运动,在大规模RBD筛查中具有巨大潜力,但若缺乏可靠高效的分析流程,则难以投入使用。本研究提出了ActiTect,一个全自动开源机器学习工具,用于从体动记录中识别RBD。为确保跨异构采集设置的泛化能力,我们的流程包括稳健的预处理和自动睡眠-觉醒检测,以协调多设备数据并提取表征活动模式的生理可解释运动特征。模型开发基于78名个体的队列,在嵌套交叉验证下表现出强大的区分能力(AUROC = 0.95)。在盲法本地测试集(n = 31,AUROC = 0.86)和两个独立外部队列(n = 113,AUROC = 0.84;n = 57,AUROC = 0.94)上验证了泛化性。为评估现实世界鲁棒性,跨内部和外部队列的留一数据集交叉验证显示出一致的性能(AUROC范围 = 0.84-0.89)。补充稳定性分析表明,关键预测特征在数据集中保持可重复性,支持最终合并的多中心模型作为更广泛部署的稳健预训练资源。通过开源且易于使用,我们的工具促进了广泛采用,并促进了独立验证和协作改进,从而推动该领域向使用可穿戴设备的统一且可泛化的RBD检测模型发展。

英文摘要

Isolated rapid eye movement sleep behavior disorder (iRBD) is a major prodromal marker of α-synucleinopathies, often preceding the clinical onset of Parkinson's disease, dementia with Lewy bodies, or multiple system atrophy. While wrist-worn actimeters hold significant potential for detecting RBD in large-scale screening efforts by capturing abnormal nocturnal movements, they cannot be used effectively without a reliable and efficient analysis pipeline. This study presents ActiTect, a fully automated, open-source machine learning tool to identify RBD from actigraphy recordings. To ensure generalizability across heterogeneous acquisition settings, our pipeline includes robust preprocessing and automated sleep-wake detection to harmonize multi-device data and extract physiologically interpretable motion features characterizing activity patterns. Model development was conducted on a cohort of 78 individuals, yielding strong discrimination under nested cross-validation (AUROC = 0.95). Generalization was confirmed on a blinded local test set (n = 31, AUROC = 0.86) and on two independent external cohorts (n = 113, AUROC = 0.84; n = 57, AUROC = 0.94). To assess real-world robustness, leave-one-dataset-out cross-validation across the internal and external cohorts demonstrated consistent performance (AUROC range = 0.84-0.89). A complementary stability analysis showed that key predictive features remained reproducible across datasets, supporting the final pooled multi-center model as a robust pre-trained resource for broader deployment. By being open-source and easy to use, our tool promotes widespread adoption and facilitates independent validation and collaborative improvements, thereby advancing the field toward a unified and generalizable RBD detection model using wearable devices.
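The evaluation above rests on two standard pieces: AUROC over per-subject scores and leave-one-dataset-out splits. A minimal NumPy sketch of both follows; it is generic evaluation code, not ActiTect's implementation, and the dataset labels in the usage line are hypothetical.

```python
import numpy as np

def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random
    negative (Mann-Whitney form); ties count as one half."""
    labels = np.asarray(labels, bool)
    scores = np.asarray(scores, float)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def leave_one_dataset_out(dataset_ids):
    """Yield (held_out_dataset, train_indices, test_indices), one split per cohort."""
    dataset_ids = np.asarray(dataset_ids)
    for held_out in np.unique(dataset_ids):
        test = np.flatnonzero(dataset_ids == held_out)
        train = np.flatnonzero(dataset_ids != held_out)
        yield held_out, train, test

# Hypothetical cohort labels for five recordings.
for cohort, train_idx, test_idx in leave_one_dataset_out(["local", "local", "ext1", "ext2", "ext2"]):
    pass  # fit on train_idx, then compute auroc() on test_idx
```

The pairwise form is quadratic in cohort size, which is fine for cohorts of tens to hundreds of subjects.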

2602.19591 2026-06-18 cs.LG cs.AI 版本更新

Detecting High-Potential SMEs with Heterogeneous Graph Neural Networks

使用异构图神经网络检测高潜力中小企业

Yijiashun Qi, Hanzhe Guo, Yijiazhen Qi

发表机构 * University of Michigan(密歇根大学) The University of Hong Kong(香港大学)

AI总结 提出SME-HGT异构图Transformer框架,利用公开数据构建包含公司、研究主题和政府机构的异构图,预测SBIR第一阶段获奖者能否进入第二阶段,AUPRC达0.621,优于基线模型。

Comments Accepted at ICIIS 2026

详情
AI中文摘要

中小企业占美国企业的99.9%,贡献44%的经济活动,但系统性地识别高潜力中小企业仍是一个开放挑战。我们提出了SME-HGT,一个异构图Transformer框架,仅使用公开数据预测哪些SBIR第一阶段获奖者将进入第二阶段资助。我们构建了一个异构图,包含32,268个公司节点、124个研究主题节点和13个政府机构节点,通过约99,000条边连接三种语义关系类型。SME-HGT在时间分割测试集上达到0.621±0.003的AUPRC,在五个随机种子上优于MLP基线(0.590±0.002)和R-GCN(0.608±0.013)。在筛选深度为100家公司时,SME-HGT达到89.6%的精确率,比随机选择提升2.14倍。我们的时间评估协议防止信息泄露,对公开数据的依赖确保了可重复性。这些结果表明,公司、研究主题和资助机构之间的关系结构为中小企业潜力评估提供了有意义的信号,对政策制定者和早期投资者具有启示意义。

英文摘要

Small and Medium Enterprises (SMEs) constitute 99.9% of U.S. businesses and generate 44% of economic activity, yet systematically identifying high-potential SMEs remains an open challenge. We introduce SME-HGT, a Heterogeneous Graph Transformer framework that predicts which SBIR Phase I awardees will advance to Phase II funding using exclusively public data. We construct a heterogeneous graph with 32,268 company nodes, 124 research topic nodes, and 13 government agency nodes connected by approximately 99,000 edges across three semantic relation types. SME-HGT achieves an AUPRC of 0.621 ± 0.003 on a temporally-split test set, outperforming an MLP baseline (0.590 ± 0.002) and R-GCN (0.608 ± 0.013) across five random seeds. At a screening depth of 100 companies, SME-HGT attains 89.6% precision with a 2.14× lift over random selection. Our temporal evaluation protocol prevents information leakage, and our reliance on public data ensures reproducibility. These results demonstrate that relational structure among firms, research topics, and funding agencies provides meaningful signal for SME potential assessment, with implications for policymakers and early-stage investors.
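The screening metrics above can be sketched as follows. The definitions are the common ones and are assumptions about how SME-HGT computes them: AUPRC as average precision, and lift as precision@k divided by the positive base rate. Under that definition, 89.6% precision with a 2.14× lift would imply a base rate of roughly 0.42.

```python
import numpy as np

def precision_at_k(labels, scores, k):
    """Fraction of positives among the k highest-scored items."""
    order = np.argsort(-np.asarray(scores, float), kind="stable")
    return float(np.asarray(labels, float)[order[:k]].mean())

def lift_at_k(labels, scores, k):
    """Precision@k relative to random selection (the positive base rate)."""
    return precision_at_k(labels, scores, k) / float(np.asarray(labels, float).mean())

def average_precision(labels, scores):
    """Average of precision@rank over the ranks of the positives
    (one common AUPRC estimator; ties are broken by input order here)."""
    order = np.argsort(-np.asarray(scores, float), kind="stable")
    y = np.asarray(labels, float)[order]
    precision = np.cumsum(y) / np.arange(1, len(y) + 1)
    return float((precision * y).sum() / y.sum())
```

Average precision weights the top of the ranking most heavily, which matches a screening use case where only the first few hundred firms are reviewed.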

2605.10083 2026-06-18 cs.LG 版本更新

Unlocking air traffic flow prediction through microscopic aircraft-state modeling

通过微观飞机状态建模解锁空中交通流量预测

Bin Wang, Anqi Liu, Jiangtao Zhao, Hina Birahmani, Yanyong Huang, Peilan He, Guiyuan Jiang, Feng Hong, Yanwei Yu, Yuanyuan Hou, Tianrui Li

发表机构 * Faculty of Information Science and Engineering(信息科学与工程学院) Ocean University of China(中国海洋大学) Sanya Oceanographic Institution(三亚海洋研究所) Joint Laboratory of Data Science and Business Intelligence(数据科学与商务智能联合实验室) Southwestern University of Finance and Economics(西南财经大学) The Affiliated Hospital of Qingdao University(青岛大学附属医院) School of Computing and Artificial Intelligence(计算机与人工智能学院)

AI总结 本文提出AeroSense模型,通过微观飞机状态直接预测未来区域交通流量,提升高密度交通下的预测精度,替代传统时间序列方法。

详情
AI中文摘要

终端空域的短期空中交通流量预测对主动空中交通管理至关重要。现有方法主要将交通流量建模为聚合时间序列,然而交通动态实际上由飞机状态及其在连续空域中的相互作用决定。这种聚合掩盖了飞机运动学、边界相互作用和管制意图等细粒度信息。本文提出AeroSense,一种状态到流量的建模范式,直接从即时空域态势(表示为由ADS-B轨迹导出的动态飞机状态集)预测未来交通流量。通过建立从微观飞机状态到未来区域交通流量的端到端映射,AeroSense在保留飞机级动态的同时,无需依赖历史回溯窗口即可自然适应变化的交通密度。在大规模真实数据集上的实验表明,与基于聚合的预测方法相比,AeroSense具有更好的预测精度和鲁棒性,在高密度交通时段尤为明显。这些发现表明,基于飞机状态的态势建模为空中交通流量管理中传统的时间序列预测提供了一种有前景的替代方案。

英文摘要

Short-term air traffic flow prediction in terminal airspace is essential for proactive air traffic management. Existing approaches predominantly model traffic flow as aggregated time series. However, traffic dynamics are governed by aircraft states and their interactions in continuous airspace. Such aggregation obscures fine-grained information, including aircraft kinematics, boundary interactions, and control intent. Here we present AeroSense, a state-to-flow modeling paradigm that predicts future traffic flow directly from instantaneous airspace situations represented as dynamic sets of aircraft states derived from ADS-B trajectories. By establishing an end-to-end mapping from microscopic aircraft states to future regional traffic flow, AeroSense preserves aircraft-level dynamics while naturally accommodating varying traffic density without relying on historical look-back windows. Experiments on a large-scale real-world dataset show that AeroSense achieves better predictive accuracy and robustness than aggregation-based forecasting approaches, particularly during high-density traffic periods. These findings suggest that aircraft-state situation modeling provides a promising alternative to conventional time-series forecasting in air traffic flow management.
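One way to map a variable-size set of aircraft states to regional flow without a look-back window is a permutation-invariant set encoder. The sketch below uses a DeepSets-style design (per-aircraft MLP, sum pooling, output head) with random weights; this is an assumed stand-in, since the abstract does not describe AeroSense's architecture, and the six-feature state layout is hypothetical.

```python
import numpy as np

class SetToFlow:
    """DeepSets-style sketch: phi (per-aircraft encoder) -> sum pool -> rho (flow head).
    Random weights stand in for trained parameters."""

    def __init__(self, n_features, hidden, n_regions, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.standard_normal((n_features, hidden)) / np.sqrt(n_features)
        self.b1 = np.zeros(hidden)
        self.W2 = rng.standard_normal((hidden, n_regions)) / np.sqrt(hidden)
        self.b2 = np.zeros(n_regions)

    def __call__(self, states):
        # states: (n_aircraft, n_features); n_aircraft may vary, including zero.
        states = np.asarray(states, float).reshape(-1, self.W1.shape[0])
        h = np.maximum(states @ self.W1 + self.b1, 0.0)  # phi: ReLU encoder per aircraft
        pooled = h.sum(axis=0)                           # order-independent pooling
        # rho: softplus keeps predicted regional flows non-negative.
        return np.logaddexp(0.0, pooled @ self.W2 + self.b2)

# Hypothetical state vector per aircraft: x, y, altitude, vx, vy, vz.
model = SetToFlow(n_features=6, hidden=32, n_regions=4)
flows = model(np.random.default_rng(1).standard_normal((12, 6)))
```

Sum pooling makes the output independent of aircraft ordering and lets the input grow or shrink with traffic density, which is the property the abstract highlights.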

2605.13566 2026-06-18 cs.LG 版本更新

Spatiotemporal downscaling and nowcasting of urban land surface temperatures with deep neural networks

基于深度神经网络的城市地表温度时空降尺度与临近预报

Solomiia Kurchaba, Angela Meyer

发表机构 * Department of Geoscience and Remote Sensing(地球科学与遥感系) Delft University of Technology(代尔夫特理工大学) School of Engineering and Computer Science(工程与计算机科学学院) Bern University of Applied Sciences(伯恩应用科学大学)

AI总结 本文提出利用深度神经网络结合静止和极轨卫星数据,实现高时空分辨率的城市地表温度场估计与临近预报,提升城市气候与生态研究的精度与时效性。

Comments Paper after publication in IEEE Access

详情
Journal ref
IEEE Access, vol. 14, pp. 85134-85151, 2026
AI中文摘要

地表温度(LST)是城市气候和生态研究等多种应用中的关键变量。然而,现有的卫星LST产品要么具有高空间分辨率,要么具有高时间分辨率,二者之间存在根本性权衡。为解决这一权衡,我们结合静止卫星和极轨卫星的观测数据,提供高空间和高时间分辨率(1公里,15分钟间隔)的LST场,并展示了其在日内LST预报中的应用。为了估计高时空分辨率的LST场,我们训练了一个U-Net模型,将SEVIRI/MSG(3公里,15分钟分辨率)的LST场映射到在空间和时间上匹配的Terra/Aqua MODIS(1公里,每天4次过境)LST场。该模型在人口超过100万的欧洲大城市的LST上进行训练,在留出测试集上达到RMSE = 1.92°C,平均偏差接近零(MBE = 0.01°C)。第二步,我们提出基于ConvLSTM架构的LST临近预报模型,在降尺度后的LST场上训练,预报时效为15至75分钟。该临近预报模型优于持续性基准和气候滚动中位数基准,在所考虑的预报时效内RMSE为0.57至1.15°C,偏差范围为-0.1至0.14°C。此外,基于独立MODIS过境数据的额外验证证实了其稳健性能。我们的高时空分辨率LST预报模型可直接应用于基于卫星的业务化LST监测。

英文摘要

Land Surface Temperature (LST) is a key variable for various applications, such as urban climate and ecology studies. Yet, existing satellite-derived LST products provide either high spatial or high temporal resolution, resulting in a fundamental trade-off between the two. To address this trade-off, we combine observations from a geostationary and a polar orbiting satellite and provide LST fields at high spatial and high temporal resolution (1 km at 15-min intervals). We demonstrate their application for intraday forecasting of LSTs. To estimate LST fields at high spatiotemporal resolution, a U-Net model is trained to map LST fields from SEVIRI/MSG (3 km and 15 min resolution) to LST fields from Terra/Aqua MODIS (1 km, 4 overpasses per day) that are collocated in space and time. The presented model has been trained on LSTs across large European cities with populations exceeding 1 million inhabitants, and achieves an RMSE = $1.92$°C and near-zero bias MBE = $0.01$°C on the hold-out test set. As a second step, we present an LST nowcasting model based on the ConvLSTM architecture, trained on downscaled LST fields with forecast lead times of 15 to 75 minutes. The nowcasting model outperforms persistence and Climatological Rolling Median benchmarks, with RMSEs of $0.57$ to $1.15$°C for the considered lead times and biases ranging from $-0.1$ to $0.14$°C. An additional validation conducted against independent MODIS overpasses confirms robust performance. Our LST forecast model at high spatiotemporal resolution is directly applicable to operational satellite-based LST monitoring.
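The verification quantities above (RMSE, MBE, and the persistence benchmark) are simple to state in code. A minimal sketch, assuming LST fields are stacked as a `(time, height, width)` array at 15-minute steps, so lead times of 15 to 75 minutes correspond to 1 to 5 steps, and that cloud-masked pixels are stored as NaN (an assumption, hence the `nan`-aware means):

```python
import numpy as np

def rmse(pred, obs):
    """Root-mean-square error, ignoring NaN (e.g. cloud-masked) pixels."""
    return float(np.sqrt(np.nanmean((np.asarray(pred) - np.asarray(obs)) ** 2)))

def mbe(pred, obs):
    """Mean bias error: positive means the forecast is too warm on average."""
    return float(np.nanmean(np.asarray(pred) - np.asarray(obs)))

def persistence_forecast(fields, lead_steps):
    """Persistence benchmark: the forecast for t + lead is the field observed at t.
    Returns aligned (forecast, verifying observation) stacks."""
    if lead_steps < 1:
        raise ValueError("lead_steps must be >= 1")
    return fields[:-lead_steps], fields[lead_steps:]

# Hypothetical stack of 15-min LST fields; leads 1..5 steps = 15..75 min.
fields = 290 + np.random.default_rng(0).normal(0, 1, (96, 8, 8))
scores = {15 * lead: rmse(*persistence_forecast(fields, lead)) for lead in range(1, 6)}
```

A learned nowcast is only useful if its RMSE at each lead time beats what `persistence_forecast` achieves on the same fields.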

2605.21528 2026-06-18 cs.LG cs.AI 版本更新

A Reproducible Log-Driven AutoML Framework for Interpretable Pipeline Optimization in Healthcare Risk Prediction

可重复的基于日志的自动机器学习框架用于医疗风险预测中的可解释流水线优化

Rui Huang, Lican Huang

发表机构 * School of Basic Medicine, Hangzhou Normal University(杭州师范大学基础医学院) Research Department, Hangzhou Domain Zones Technology Co.Ltd.(杭州域区技术有限公司)

AI总结 本文提出了一种可重复的基于日志的自动机器学习框架,用于医疗风险预测中的可解释流水线优化,通过分析组件属性、交互和冗余性,提高了模型性能和稳定性。

详情
AI中文摘要

由于异质特征、有限样本和严重的类别不平衡,准确且可重复的疾病风险预测仍然具有挑战性。本研究引入了yvsoucom-iterkit,一种确定性的、基于日志的自动化机器学习框架,将流水线优化作为配置级系统进行建模,并保证完全可重复。每个流水线被编码为可追溯的日志实体,从而能够分析组件归因、交互、相似性和跨种子鲁棒性。在Pima Indians糖尿病和中风数据集上对超过18,000个流水线配置的实验揭示了一个结构化且部分冗余的搜索空间,其中性能由一小部分相互作用的组件决定。随机森林重要性分析显示,增强(0.454)、模型选择(0.198)和不平衡处理(0.101)是Pima数据集的关键驱动因素,而中风数据集则由不平衡处理主导(0.406)。组件相似性分析显示出强冗余性:特征选择变体(biMax-biMean)的RMS距离较低(0.0252),混合匹配与无增强接近(0.0279),TomekLinks与无不平衡处理接近(0.0325),而高斯噪声与无增强的差异更大(0.10)。该框架使用集成模型实现了强且稳定的性能(Pima上加权F1为0.89、宏F1为0.88;中风上加权F1为0.94),但由于类别不平衡,中风上的宏F1较低(0.67)。跨种子分析揭示了性能与鲁棒性之间的权衡,集成模型的变异性低于SVM。这些结果表明,有效的AutoML优化可以聚焦于一小组高影响组件。

英文摘要

Accurate disease risk prediction is challenged by heterogeneous features, limited data, and class imbalance. This study presents yvsoucom-iterkit, a deterministic AutoML framework that models pipeline optimization as a configuration-level system with full reproducibility and traceable execution logs, enabling systematic analysis of component attribution, interactions, similarity, and cross-seed robustness. Experiments on the Pima Indians Diabetes and Stroke datasets across more than 18,000 pipeline configurations reveal a structured yet partially redundant search space, where performance is dominated by a small subset of interacting components. Ensemble models achieve stable performance, reaching a Weighted-F1 of 0.89 on Pima and 0.94 on Stroke. Macro-F1 reaches approximately 0.88 on Pima but drops to 0.6560 on Stroke due to severe imbalance. Cross-seed experiments show that ensembles reduce variance compared to single models. Friedman testing ($p < 0.05$) confirms significant ranking differences across configurations. Based on analysis of component attribution, interaction, and similarity, optimal configuration design reveals dataset-dependent behavior. For the Pima dataset, computational efficiency benefits from simplified search spaces where redundant components can be removed, with split ratio playing a key role. In contrast, the Stroke dataset requires enhanced imbalance-aware strategies, where RandomOverSampler improves Macro-F1 from 0.6560 to 0.6766. These findings demonstrate that effective AutoML optimization is achieved through optimal configuration design, where carefully constraining the search space to high-impact components can improve performance, stability, and interpretability while reducing unnecessary search complexity.
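The gap between Weighted-F1 (0.94) and Macro-F1 (about 0.66) on Stroke is what severe imbalance does to these two averages. The toy confusion below, with hypothetical numbers rather than the paper's data, reproduces the effect: weighting by support lets the large negative class dominate, while the macro average exposes the weak minority-class F1.

```python
import numpy as np

def f1_scores(y_true, y_pred):
    """Return (macro_f1, weighted_f1) over all classes seen in y_true or y_pred."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes = np.unique(np.concatenate([y_true, y_pred]))
    f1s, support = [], []
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)  # F1 = 2TP / (2TP + FP + FN)
        support.append(np.sum(y_true == c))
    f1s, support = np.array(f1s), np.array(support)
    return f1s.mean(), (f1s * support).sum() / support.sum()

# Hypothetical: 95 negatives, 5 positives; the model catches 1 positive, raises 2 false alarms.
y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.array([0] * 93 + [1] * 2 + [1] * 1 + [0] * 4)
macro, weighted = f1_scores(y_true, y_pred)
```

Here the negative class scores F1 = 0.96875 and the positive class only 0.25, giving a Macro-F1 of about 0.61 against a Weighted-F1 of about 0.93.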

2606.07622 2026-06-18 cs.LG stat.AP 版本更新

Airport Terminal Passenger Queue Forecasting for Departure Gates and Security Checkpoints

机场航站楼登机口与安检点旅客排队预测

Juhwan Lee, Seokbin Yoon, Keumjin Lee, Hojong Baik, Seyeon Jung

发表机构 * Korea Aerospace University(韩国航空大学) Korea Airports Corporation(韩国机场公社)

AI总结 提出基于Transformer的框架,利用历史队列长度、等待时间和旅客吞吐量数据,预测登机口和安检点未来两小时的队列长度与等待时间,支持主动排队管理。

Comments 10 pages, 6 figures, accepted at DASC 2026

详情
AI中文摘要

准确的机场航站楼旅客排队预测对于高效的离港运营至关重要,因为它能够实现主动的拥堵管理。然而,时变的旅客需求以及多个离港设施中异构的设施使用情况使得预测具有挑战性。在这项工作中,我们提出了一种旅客排队预测框架,该框架从运营数据中学习历史旅客流量模式。所提出的模型采用基于Transformer的架构,利用过去登机口和安检点的队列长度和等待时间,以及值机岛的旅客吞吐量,来捕捉时间依赖性和设施间相关性。学习到的表示被映射到两个设施特定的MLP头部,以预测登机口和安检点的队列长度和等待时间。实验结果表明,该模型能够准确预测未来两小时内的排队情况。所提出的方法为机场航站楼运营中的主动排队管理和人员重新分配提供了实用的实时决策支持。

英文摘要

Accurate passenger queue forecasting in airport terminals is essential for efficient departure operations, as it enables proactive congestion management. However, time-varying passenger demand and heterogeneous facility usage across multiple departure facilities make forecasting challenging. In this work, we propose a passenger queue forecasting framework that learns historical passenger flow patterns from operational data. The proposed model employs a Transformer-based architecture to capture temporal dependencies and inter-facility correlations using past queue length and waiting time at departure gates and security checkpoints, together with passenger throughput at check-in islands. The learned representations are mapped to two facility-specific prediction heads to predict queue length and waiting time at departure gates and security checkpoints. Experimental results demonstrate accurate forecasts up to two hours ahead. The proposed approach offers practical real-time decision support for proactive queue management and staff reallocation in airport terminal operations.
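Before any Transformer is involved, the forecasting task has to be cast as supervised windows with facility-specific targets. A minimal sketch, assuming 10-minute bins (so a two-hour horizon is 12 steps) and a hypothetical column layout: 0 gate queue length, 1 gate waiting time, 2 security queue length, 3 security waiting time, 4 check-in throughput. Neither the bin size nor the layout comes from the paper.

```python
import numpy as np

# Hypothetical column layout of the multivariate operational series.
GATE_COLS = [0, 1]       # gate queue length, gate waiting time
SECURITY_COLS = [2, 3]   # security queue length, security waiting time

def make_windows(series, lookback, horizon, gate_cols=GATE_COLS, security_cols=SECURITY_COLS):
    """Slice a (T, n_features) series into lookback inputs and multi-step targets,
    split into one target tensor per facility-specific prediction head."""
    series = np.asarray(series, float)
    n = len(series) - lookback - horizon + 1
    if n <= 0:
        raise ValueError("series too short for the requested lookback + horizon")
    X = np.stack([series[i:i + lookback] for i in range(n)])                          # (n, lookback, F)
    future = np.stack([series[i + lookback:i + lookback + horizon] for i in range(n)])  # (n, horizon, F)
    return X, {"gate": future[..., gate_cols], "security": future[..., security_cols]}

# 4 hours of history (24 x 10 min) -> 2 hours ahead (12 x 10 min).
X, Y = make_windows(np.random.default_rng(0).random((200, 5)), lookback=24, horizon=12)
```

Keeping the two target groups separate mirrors the abstract's two facility-specific heads: a shared encoder consumes `X`, and each head is trained against its own entry of `Y`.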

2204.14224 2026-06-18 cs.CV cs.LG eess.IV 版本更新

Investigation of Neural Network Methods for Reconstruction and Classification of Texture Images Under Conditions of Incomplete Information

不完全信息条件下纹理图像重建与分类的神经网络方法研究

Galymzhan Abdimanap, Kairat Bostanbekov, Abdelrahman Abdallah, Anel Alimova, Darkhan Kurmangaliyev, Daniyar Nurseitov, Tatyana Dedova, Larissa Balakay, Serik Nurakynov

发表机构 * Satbayev University(萨特巴耶夫大学) Institute of Ionosphere LLP(电离层研究所) Information Technology Department(信息技术部门) Assiut University(阿西乌特大学)

AI总结 提出结合目标检测、GAN(CRA)修复和Transformer/CNN分类的端到端框架,发现重建质量高(PSNR 28.7dB)但分类准确率仅53%,通过置信度混合集成将MCA从48%提升至58%,揭示生成模型产生语义模糊特征的问题。

Comments IEEE Access

详情
AI中文摘要

异质自然纹理的自动化分析常因物理损伤和数据丢失而受阻,这对计算机视觉构成了重大挑战。虽然深度学习在受控环境中已显示出成功,但其在信息不完全条件下对复杂地质材料的应用仍未被充分探索。本研究提出了一个用于高分辨率岩心样本图像修复和分类的集成框架。我们设计了一个端到端流水线,利用目标检测进行样本分割,随后使用具有上下文残差聚合(CRA)的生成对抗网络(GAN)进行图像修复,以重建缺失的高频细节。接着,我们在重建数据上评估了现代基于Transformer(Swin、ViT)和CNN架构的性能。实验揭示了重建质量与下游效用之间的关键分歧:尽管结构保真度高(PSNR 28.7 dB,FID 74.01),分类准确率却停滞在53%。为了改善少数类检测,我们提出了一种基于置信度的混合集成方法,将MCA从48%提升至58%。这些结果凸显了当前最先进生成模型的局限性,它们可能产生视觉上合理但语义模糊的特征(“幻觉”),从而混淆分类器。本工作深入探讨了图像重建质量与分类性能之间的依赖关系,为无损检测和材料科学领域的未来研究提供了可复现的基线。鉴于井间准确率仍处于49-53%范围,我们将所得到的系统定位为岩相解释的决策支持和筛选工具,而非完全自主的分类器。代码可在以下网址获取:https://github.com/GalymzhanAbdimanap/Lithology_recognition

英文摘要

The automated analysis of heterogeneous natural textures is frequently hindered by physical damage and data loss, presenting a significant challenge to computer vision. While deep learning has shown success in controlled environments, its application to complex geological materials under conditions of incomplete information remains underexplored. This study presents an integrated framework for the inpainting and classification of high-resolution core sample images. We propose an end-to-end pipeline that utilizes object detection for sample segmentation, followed by image inpainting using Generative Adversarial Networks (GANs) with Contextual Residual Aggregation (CRA) to reconstruct missing high-frequency details. Subsequently, we evaluate the performance of modern Transformer-based (Swin, ViT) and CNN architectures on the reconstructed data. Our experiments revealed a critical divergence between reconstruction quality and downstream utility: despite high structural fidelity (PSNR 28.7 dB, FID 74.01), classification accuracy plateaued at 53%. To improve minority-class detection, we propose a confidence-based hybrid ensemble that raises MCA from 48% to 58%. These results highlight the limitations of current state-of-the-art generative models, which may produce visually plausible but semantically ambiguous features ("hallucinations") that confound classifiers. This work provides insights into the dependencies between image reconstruction quality and classification performance, offering a reproducible baseline for future research in non-destructive testing and material science. Given that cross-well accuracy remains in the 49-53% range, we position the resulting system as a decision-support and screening tool for lithofacies interpretation rather than as a fully autonomous classifier. The code is available at https://github.com/GalymzhanAbdimanap/Lithology_recognition
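The abstract does not spell out its "confidence-based hybrid ensemble" or define MCA. The sketch below assumes one plausible reading: per sample, keep the prediction of whichever model is more confident (higher top-class probability), and measure MCA as mean per-class accuracy (macro-averaged recall). Both readings are assumptions, not the authors' code.

```python
import numpy as np

def confidence_hybrid(prob_a, prob_b):
    """Per sample, use the class predicted by the model whose top-class
    probability is higher (ties go to model A). Inputs: (n_samples, n_classes)."""
    use_a = prob_a.max(axis=1) >= prob_b.max(axis=1)
    return np.where(use_a, prob_a.argmax(axis=1), prob_b.argmax(axis=1))

def mean_class_accuracy(y_true, y_pred, n_classes):
    """Average of per-class accuracies, so minority classes count as much as majority ones."""
    accs = [np.mean(y_pred[y_true == c] == c) for c in range(n_classes) if np.any(y_true == c)]
    return float(np.mean(accs))

# Hypothetical softmax outputs of, e.g., a Transformer (A) and a CNN (B).
prob_a = np.array([[0.9, 0.1], [0.4, 0.6]])
prob_b = np.array([[0.3, 0.7], [0.1, 0.9]])
y_hat = confidence_hybrid(prob_a, prob_b)
```

Because MCA averages over classes, an ensemble that rescues a few minority-class samples can move it noticeably even when overall accuracy barely changes.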

2508.10178 2026-06-18 q-bio.QM cs.LG 版本更新

Estimating carbon pools in the European Shelf sea environment: replacing reanalysis by model-informed machine learning?

估算欧洲陆架海环境中的碳库:用模型指导的机器学习替代再分析?

Jozef Skakala

发表机构 * Plymouth Marine Laboratory(普利茅斯海洋实验室) National Centre for Earth Observation(国家地球观测中心)

AI总结 提出用深度集成神经网络学习可观测变量与海洋碳库的关系,以低成本替代昂贵再分析,在西北欧陆架海实现高效碳库预测并提供不确定性。

Comments 37 pages, 9 figures (+ 3 in the appendix), v3 - published version

详情
Journal ref
JGR - Machine Learning and Computation 3 (2026)
AI中文摘要

陆架海对经济和碳循环至关重要,但碳库观测往往稀疏或高度不确定。碳再分析(无论是同化叶绿素a等代理变量还是直接同化碳)可提供替代方案,但运行成本高昂。我们提出使用计算成本低的神经网络集成(即深度集成)来学习直接可观测(大气、河流和海洋)变量与海洋碳库之间的关系,该关系来自一个物理-生物地球化学耦合模型。深度集成在西北欧陆架海(NWES)物理-生物地球化学模型自由运行模拟上训练。训练后,使用来自NWES再分析的输入而非自由运行来运行深度集成,证明它能高效预测多个NWES碳库(如碎屑、浮游动物、异养细菌),且与再分析的一致性远优于自由运行,同时提供不确定性信息。我们进一步表明,当深度集成直接由同化到再分析中的观测驱动时,其表现同样良好,但此时碳库只能在有观测的位置和时间进行预测。我们关注结果的可解释性,并展示了深度集成在未来气候假设情景中的潜在应用。我们认为,模型指导的机器学习为昂贵的再分析提供了可行的替代方案,并可在观测缺失和/或高度不确定的地方补充观测。

英文摘要

Shelf seas are important for the economy and the carbon cycle, but shelf sea observations for carbon pools are often sparse, or highly uncertain. An alternative can be provided by carbon reanalyses (whether assimilating proxy variables, such as chlorophyll-$a$, or carbon directly), but these are often expensive to run. We propose to use a computationally cheap ensemble of neural networks (i.e. deep ensemble) to learn the relationship between the directly observable (atmospheric, riverine and ocean) variables and marine carbon pools from a coupled physics-biogeochemistry model. The deep ensemble was trained on a North-West European Shelf (NWES) physical-biogeochemistry model free run simulation. After training, the deep ensemble was run using inputs from the NWES reanalysis instead of the free run, demonstrating that it can efficiently predict several NWES carbon pools (e.g., detritus, zooplankton, heterotrophic bacteria) in much better agreement with the reanalysis than the free run, while also providing uncertainty information. We further show that the deep ensemble performs similarly well when it is driven directly by the observations assimilated into the reanalysis, with the limitation that carbon pools can then be predicted only at the observed locations and times. We focus on explainability of the results and demonstrate potential use of the deep ensembles for future climate what-if scenarios. We suggest that model-informed machine learning presents a viable alternative to expensive reanalyses and could complement observations, wherever they are missing and/or highly uncertain.
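The uncertainty information comes from the spread across ensemble members. The sketch below shows the mechanics of that step only: it uses bootstrap-resampled linear regressions as a cheap, clearly swapped-in stand-in for the paper's independently initialised neural networks, and `fit_ensemble` / `predict_ensemble` are hypothetical names.

```python
import numpy as np

def fit_ensemble(X, y, n_members=10, seed=0):
    """Fit n_members linear models (with intercept) on bootstrap resamples.
    Stand-in for a deep ensemble of neural networks."""
    rng = np.random.default_rng(seed)
    Xb = np.column_stack([X, np.ones(len(X))])
    members = []
    for _ in range(n_members):
        idx = rng.integers(0, len(X), len(X))  # bootstrap resample of training rows
        w, *_ = np.linalg.lstsq(Xb[idx], y[idx], rcond=None)
        members.append(w)
    return np.array(members)  # (n_members, n_features + 1)

def predict_ensemble(members, X):
    """Ensemble mean as the prediction and member spread as the uncertainty."""
    Xb = np.column_stack([X, np.ones(len(X))])
    preds = Xb @ members.T  # (n_samples, n_members)
    return preds.mean(axis=1), preds.std(axis=1)

# Train on "free run" inputs, then predict from different (e.g. reanalysis) inputs.
rng = np.random.default_rng(42)
X_free = rng.uniform(0, 1, (200, 1))
y_free = 2 * X_free[:, 0] + 1 + rng.normal(0, 0.1, 200)
mean, std = predict_ensemble(fit_ensemble(X_free, y_free), rng.uniform(0, 1, (50, 1)))
```

The workflow matches the abstract's: fit on model free-run pairs, then swap in reanalysis or observed inputs at prediction time while keeping the learned mapping fixed.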

2511.00366 2026-06-18 stat.ML cs.CE cs.LG 版本更新

A Streaming Sparse Cholesky Method for Derivative-Informed Gaussian Process Surrogates Within Digital Twin Applications

面向数字孪生应用中导数信息高斯过程代理的流式稀疏Cholesky方法

Shridhar Vashishtha, Krishna Prasath Logakannan, Jacob Hochhalter, Shandian Zhe, Robert M. Kirby

发表机构 * Department of Mechanical Engineering, University of Utah, Salt Lake City, UT 84112, USA(犹他大学机械工程系) Kahlert School of Computing, University of Utah, Salt Lake City, UT 84112, USA(犹他大学Kahlert计算学院) Scientific Computing & Imaging Institute, University of Utah, Salt Lake City, UT 84112, USA(犹他大学科学计算与成像研究所)

AI总结 提出一种流式稀疏Cholesky方法,通过动态更新和导数信息增强高斯过程代理,降低协方差矩阵维度,实现数字孪生中飞机结构性能的实时预测。

详情
AI中文摘要

数字孪生被开发用于模拟特定物理资产(或孪生体)的行为,它们可以由高保真基于物理的模型或代理组成。高精度代理通常优于多物理场模型,因为它们能够实时预测物理孪生体的未来状态。为了适应特定的物理孪生体,必须使用来自该物理孪生体的在役数据更新数字孪生模型。在本文中,我们结合并扩展了几项先前与代理相关的进展,旨在展示一个端到端的数字孪生(DT)解决方案,用于预测飞机结构(物理资产)的性能。为此,我们将高斯过程(GP)模型扩展到包含导数数据,以提高精度,并通过动态更新来吸收在役期间的物理孪生体数据。然而,包含导数数据会带来协方差矩阵维度增加的过高成本。我们通过改进的动态稀疏Cholesky线性系统求解器规避了这个问题。数值实验表明,导数增强的稀疏Cholesky GP方法在动态数据添加时产生了改进的模型预测精度。最后,我们在一个数字孪生框架内演示了所开发的算法,用于模拟航空航天飞行器中的疲劳裂纹扩展,从而通过我们组装的工程系统展示了数字孪生技术如何在实践中结合。

英文摘要

Digital twins are developed to model the behavior of a specific physical asset (or twin), and they can consist of high-fidelity physics-based models or surrogates. A highly accurate surrogate is often preferred over multi-physics models because it enables forecasting the physical twin's future state in real time. To adapt to a specific physical twin, the digital twin model must be updated using in-service data from that physical twin. In this paper, we combine and extend several previous surrogate-related advancements with the goal of demonstrating an end-to-end digital twin (DT) solution for predicting the performance of an aircraft structure (the physical asset). To this end, we extend Gaussian process (GP) models to include derivative data for improved accuracy, with dynamic updating to ingest physical twin data during service. Including derivative data, however, comes at the prohibitive cost of an increased covariance matrix dimension. We circumvent this issue through our modified dynamic sparse Cholesky linear system solver. Numerical experiments demonstrate that the derivative-enhanced sparse Cholesky GP method yields improved prediction accuracy as data are added dynamically. Lastly, we demonstrate the developed algorithm within a DT framework to model fatigue crack growth in an aerospace vehicle, thereby showing through our assembled engineered system how digital twin technologies can be combined in practice.
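The streaming aspect, ingesting new data without refactorising the whole covariance matrix, can be illustrated with a block Cholesky extension. This is a dense NumPy sketch of the general bordering update, not the paper's sparse solver. The block form matters for derivative-informed GPs because each new point contributes one function value plus one row per derivative component.

```python
import numpy as np

def cholesky_append(L, B, C):
    """Given lower-triangular L with K = L @ L.T, return the Cholesky factor of
        [[K,   B],
         [B.T, C]]
    where B holds cross-covariances to the new rows and C their own covariance."""
    if L.size == 0:
        return np.linalg.cholesky(C)     # first batch: plain factorisation
    L21 = np.linalg.solve(L, B).T        # solves L @ X = B (a triangular solve in practice)
    S = C - L21 @ L21.T                  # Schur complement of K
    L22 = np.linalg.cholesky(S)
    n, m = L.shape[0], C.shape[0]
    out = np.zeros((n + m, n + m))
    out[:n, :n] = L
    out[n:, :n] = L21
    out[n:, n:] = L22
    return out

# RBF covariance with jitter; stream in 5 points, then 3 more.
x = np.linspace(0, 5, 8)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2) + 1e-3 * np.eye(8)
L = cholesky_append(np.zeros((0, 0)), None, K[:5, :5])
L = cholesky_append(L, K[:5, 5:], K[5:, 5:])
```

Each update costs roughly O(n^2 m) for the solve instead of O((n + m)^3) for a full refactorisation; exploiting sparsity, as the paper's solver does, reduces this further.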

2511.19468 2026-06-18 cs.DC cs.ET cs.LG physics.space-ph 版本更新

Towards a future space-based, highly scalable AI infrastructure system design

面向未来天基、高度可扩展的AI基础设施系统设计

Blaise Agüera y Arcas, Travis Beals, Maria Biggs, Jessica V. Bloom, Thomas Fischbacher, Konstantin Gromov, Urs Köster, Rishiraj Pravahan, James Manyika

发表机构 * Google(谷歌)

AI总结 本文探索利用卫星集群、太阳能板、自由空间光通信和TPU芯片构建天基机器学习计算系统,并分析辐射测试、发射成本等可行性。

Comments 18 pages, 4 figures. v2: Cleaned up references. Improved rough estimates. Fixed typos. Re-ran radiation test with improved methods

详情
AI中文摘要

如果AI是一种基础通用技术,我们应该预期对AI计算和能源的需求将持续增长。太阳是太阳系中最大的能源来源,因此值得考虑未来的AI基础设施如何最有效地利用这种能量。本文探索了一种用于太空机器学习的可扩展计算系统,该系统使用配备太阳能板的卫星群、基于自由空间光通信的星间链路以及谷歌张量处理单元(TPU)加速芯片。为了实现高带宽、低延迟的星间通信,卫星将近距离编队飞行。我们通过一个半径为1公里的81颗卫星集群说明了编队飞行的基本方法,并描述了一种利用基于ML的高精度模型控制大规模星座的方法。Trillium TPU经过了辐射测试:在相当于5年任务寿命的总电离剂量下未出现永久性故障,并对位翻转错误进行了表征。发射成本是整体系统成本的关键部分;学习曲线分析表明,到2030年代中期,发射到近地轨道(LEO)的成本可能达到$\lesssim$200美元/公斤。

英文摘要

If AI is a foundational general-purpose technology, we should anticipate that demand for AI compute -- and energy -- will continue to grow. The Sun is by far the largest energy source in our solar system, and thus it warrants consideration how future AI infrastructure could most efficiently tap into that power. This work explores a scalable compute system for machine learning in space, using fleets of satellites equipped with solar arrays, inter-satellite links using free-space optics, and Google tensor processing unit (TPU) accelerator chips. To facilitate high-bandwidth, low-latency inter-satellite communication, the satellites would be flown in close proximity. We illustrate the basic approach to formation flight via an 81-satellite cluster of 1 km radius, and describe an approach for using high-precision ML-based models to control large-scale constellations. Trillium TPUs are radiation tested. They survive a total ionizing dose equivalent to a 5 year mission life without permanent failures, and are characterized for bit-flip errors. Launch costs are a critical part of overall system cost; a learning curve analysis suggests launch to low-Earth orbit (LEO) may reach $\lesssim$\$200/kg by the mid-2030s.
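A learning-curve analysis of launch cost typically uses Wright's law: unit cost falls by a fixed fraction each time cumulative production doubles. The sketch below shows the arithmetic only. It assumes the standard Wright's-law form, which the abstract does not confirm, and every number in it is hypothetical rather than taken from the paper.

```python
import math

def wright_cost(cum_mass, ref_mass, ref_cost, progress_ratio):
    """Unit cost (e.g. $/kg) once cumulative launched mass reaches cum_mass.
    progress_ratio: cost multiplier per doubling (0.8 means a 20% learning rate)."""
    b = math.log2(progress_ratio)  # negative exponent when progress_ratio < 1
    return ref_cost * (cum_mass / ref_mass) ** b

def mass_for_target_cost(target_cost, ref_mass, ref_cost, progress_ratio):
    """Invert Wright's law: cumulative mass needed to reach target_cost."""
    b = math.log2(progress_ratio)
    return ref_mass * (target_cost / ref_cost) ** (1.0 / b)

# Hypothetical inputs: $1000/kg today at a reference cumulative mass of 1 unit, 20% learning rate.
print(wright_cost(4.0, ref_mass=1.0, ref_cost=1000.0, progress_ratio=0.8))
```

Two doublings at a 0.8 progress ratio cut unit cost to 0.8 × 0.8 = 64% of the reference, so the call above prints a value of about 640. Turning a required cumulative mass into a calendar date (such as "mid-2030s") additionally requires a launch-cadence projection, which this sketch omits.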

2603.15988 2026-06-18 eess.AS cs.AI cs.LG Version update

Something from Nothing: Data Augmentation for Robust Severity Level Estimation of Dysarthric Speech

Jaesung Bae, Xiuwen Zheng, Minje Kim, Chang D. Yoo, Mark Hasegawa-Johnson

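Affiliations: (1) University of Illinois Urbana-Champaign, IL, USA; (2) Korea Advanced Institute of Science & Technology, KR

AI summary: Proposes a three-stage framework that uses unlabeled dysarthric speech and typical speech datasets: a teacher model generates pseudo-labels, followed by label-aware contrastive pretraining and fine-tuning. It reaches an average SRCC of 0.761 on five unseen datasets, significantly outperforming existing methods.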

Comments Accepted to Interspeech 2026 Long Paper Track

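AI Chinese abstract (English translation)

Dysarthric speech quality assessment (DSQA) is crucial for clinical diagnosis and inclusive speech technology. However, subjective evaluation is costly and hard to scale, and the scarcity of labeled data limits robust objective modeling. To address this, we propose a three-stage framework that scales training by using unlabeled dysarthric speech and large-scale typical speech datasets. A teacher model first generates pseudo-labels for unlabeled samples; weakly supervised pretraining with a label-aware contrastive learning strategy then exposes the model to diverse speakers and acoustic conditions. The pretrained model is subsequently fine-tuned for the downstream DSQA task. Experiments on five unseen datasets spanning multiple etiologies and languages demonstrate the robustness of our method. Our Whisper-based baseline significantly outperforms SOTA DSQA predictors such as SpICE, and the full framework achieves an average SRCC of 0.761 on the unseen test datasets.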

English abstract

Dysarthric speech quality assessment (DSQA) is critical for clinical diagnostics and inclusive speech technologies. However, subjective evaluation is costly and difficult to scale, and the scarcity of labeled data limits robust objective modeling. To address this, we propose a three-stage framework that leverages unlabeled dysarthric speech and large-scale typical speech datasets to scale training. A teacher model first generates pseudo-labels for unlabeled samples, followed by weakly supervised pretraining using a label-aware contrastive learning strategy that exposes the model to diverse speakers and acoustic conditions. The pretrained model is then fine-tuned for the downstream DSQA task. Experiments on five unseen datasets spanning multiple etiologies and languages demonstrate the robustness of our approach. Our Whisper-based baseline significantly outperforms SOTA DSQA predictors such as SpICE, and the full framework achieves an average SRCC of 0.761 across unseen test datasets.
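The headline number is a Spearman rank correlation coefficient (SRCC) between predicted and reference severity ratings, averaged over the unseen test sets. A minimal pure-Python sketch follows: SRCC is computed as the Pearson correlation of average ranks, so tied ratings are handled. Pooling by an unweighted mean over datasets is an assumption here, not something the abstract states.

```python
import math
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def srcc(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Spearman rank correlation = Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(predicted), average_ranks(reference))


def mean_srcc(per_dataset: Dict[str, Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Unweighted mean of per-dataset SRCC (assumed pooling rule)."""
    scores = [srcc(pred, ref) for pred, ref in per_dataset.values()]
    return sum(scores) / len(scores)
```

Because SRCC depends only on ranks, a predictor can score well even if its outputs are on a different scale from the clinical ratings, which suits severity estimation across datasets with different rating conventions.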

2603.29247 2026-06-18 cs.CL cs.AI cs.LG Version update

MemRerank: Preference Memory for Personalized Product Reranking

Zhiyuan Peng, Xuyang Wu, Huaixiao Tou, Yi Fang, Yu Gong

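Affiliations: Santa Clara University; Independent Researcher

AI summary: Proposes MemRerank, which uses reinforcement learning to distill user purchase history into query-independent preference memory for personalized reranking by LLM shopping agents, improving accuracy on a 1-in-5 selection task by up to 10.61 percentage points.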

Comments correct author name in metadata

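AI Chinese abstract (English translation)

LLM-based shopping agents increasingly rely on long purchase histories and multi-turn interactions for personalization; however, naively appending raw history to the prompt often works poorly because of noise, length, and relevance mismatch. We propose MemRerank, a preference memory framework that distills user purchase history into concise, query-independent signals for personalized product reranking. To study this problem, we build an end-to-end benchmark and evaluation framework around an LLM-based 1-in-5 selection task that measures both memory quality and downstream reranking utility. We further train the memory extractor with reinforcement learning (RL), using downstream reranking performance as supervision. Experiments with two LLM-based rerankers show that MemRerank consistently outperforms no-memory, raw-history, and off-the-shelf memory baselines, improving 1-in-5 accuracy by up to +10.61 absolute percentage points. These results suggest that explicit preference memory is a practical and effective building block for personalization in agentic e-commerce systems.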

English abstract

LLM-based shopping agents increasingly rely on long purchase histories and multi-turn interactions for personalization, yet naively appending raw history to prompts is often ineffective due to noise, length, and relevance mismatch. We propose MemRerank, a preference memory framework that distills user purchase history into concise, query-independent signals for personalized product reranking. To study this problem, we build an end-to-end benchmark and evaluation framework centered on an LLM-based 1-in-5 selection task, which measures both memory quality and downstream reranking utility. We further train the memory extractor with reinforcement learning (RL), using downstream reranking performance as supervision. Experiments with two LLM-based rerankers show that MemRerank consistently outperforms no-memory, raw-history, and off-the-shelf memory baselines, yielding up to +10.61 absolute points in 1-in-5 accuracy. These results suggest that explicit preference memory is a practical and effective building block for personalization in agentic e-commerce systems.
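A 1-in-5 selection task scores a reranker by whether, given five candidate products, it picks the one the user actually chose; a random picker therefore scores about 20%. The sketch below computes that accuracy in percentage points, assuming exactly one ground-truth candidate per episode. The keyword-overlap picker is a hypothetical stand-in for an LLM reranker that reads a preference memory; it is not MemRerank's method.

```python
from typing import Callable, List, Sequence, Set, Tuple

# (five candidate products, index of the product the user actually chose)
Episode = Tuple[Sequence[str], int]
Picker = Callable[[Sequence[str]], int]


def one_in_five_accuracy(episodes: List[Episode], pick: Picker) -> float:
    """Percentage of episodes in which the picker selects the ground-truth candidate."""
    correct = 0
    for candidates, gold in episodes:
        if len(candidates) != 5:
            raise ValueError("each episode must have exactly five candidates")
        if pick(candidates) == gold:
            correct += 1
    return 100.0 * correct / len(episodes)


def memory_overlap_picker(memory: Set[str]) -> Picker:
    """Hypothetical picker: choose the candidate sharing the most words with a
    distilled preference memory (ties go to the earliest candidate)."""
    def pick(candidates: Sequence[str]) -> int:
        scores = [len(memory & set(c.lower().split())) for c in candidates]
        return max(range(len(candidates)), key=lambda i: scores[i])
    return pick


episodes: List[Episode] = [
    (["red running shoes", "blue jeans", "leather wallet", "coffee mug", "yoga mat"], 0),
    (["desk lamp", "trail running socks", "phone case", "notebook", "water bottle"], 1),
    (["green tea", "black tea", "herbal tea", "oolong tea", "chai tea"], 3),
]
memory = {"running", "trail"}
accuracy = one_in_five_accuracy(episodes, memory_overlap_picker(memory))
# Two of the three episodes are answered correctly, so accuracy is about 66.67.
```

On this scale, an improvement of "+10.61 absolute points" is a difference between two such percentages, not a relative gain.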

2604.00730 2026-06-18 cs.CY cs.AI cs.LG cs.SE Version update

A CEFR-Inspired Classification Framework with Fuzzy C-Means To Automate Assessment of Programming Skills in Scratch

Ricardo Hidalgo-Aragón, Jesús M. González-Barahona, Gregorio Robles

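Affiliations: Universidad Rey Juan Carlos

AI summary: Proposes a CEFR-based assessment framework for Scratch projects that uses fuzzy C-means clustering to grade more than 2 million projects, identifies a B2 bottleneck, and introduces a classification-certainty metric to balance automated feedback with human review.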

Comments Best Paper Award CSEDU 2026 -Minor change FPC fix-

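AI Chinese abstract (English translation)

Context: Schools, training platforms, and technology companies increasingly need transparent, reproducible methods for assessing programming ability at scale in support of personalized learning paths. Objective: This study introduces a pedagogical framework for assessing Scratch projects that is aligned with the Common European Framework of Reference (CEFR), providing students and teachers with universal competency levels and offering actionable insights for curriculum design. Method: We apply fuzzy C-means clustering to 2,008,246 Scratch projects evaluated via Dr.Scratch, implement an ordinal criterion that maps clusters to CEFR levels (A1-C2), and introduce enhanced classification metrics that identify transitional learners, enable continuous progress tracking, and quantify classification certainty to balance automated feedback with teacher review. Impact: The framework can diagnose systemic curriculum gaps, in particular a "B2 bottleneck" in which only 13.3% of learners sit at that level because of the cognitive load of logic, synchronization, and data representation, while providing certainty-based triggers for human intervention.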

English abstract

Context: Schools, training platforms, and technology firms increasingly need to assess programming proficiency at scale with transparent, reproducible methods that support personalized learning pathways. Objective: This study introduces a pedagogical framework for Scratch project assessment, aligned with the Common European Framework of Reference (CEFR), providing universal competency levels for students and teachers alongside actionable insights for curriculum design. Method: We apply Fuzzy C-Means clustering to 2,008,246 Scratch projects evaluated via Dr.Scratch, implementing an ordinal criterion to map clusters to CEFR levels (A1-C2), and introducing enhanced classification metrics that identify transitional learners, enable continuous progress tracking, and quantify classification certainty to balance automated feedback with instructor review. Impact: The framework enables diagnosis of systemic curriculum gaps, notably a "B2 bottleneck" where only 13.3% of learners reside due to the cognitive load of integrating Logic, Synchronization, and Data Representation, while providing certainty-based triggers for human intervention.
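The Method paragraph combines three ingredients: fuzzy C-means memberships, an ordinal mapping from clusters to CEFR levels, and a certainty score. The NumPy sketch below is one plausible reading under stated assumptions. Clusters are ordered by the sum of their centre's dimension scores (the paper's actual ordinal criterion is not specified here), certainty is taken as the largest membership, and the 0.6 cut-off for flagging transitional learners is hypothetical.

```python
import numpy as np

CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def fuzzy_c_means(X, c, m=2.0, iters=200, tol=1e-6, seed=0):
    """Standard FCM: alternate centre and membership updates until memberships settle."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)          # each row of memberships sums to 1
    for _ in range(iters):
        W = U ** m
        centres = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)            # avoid division by zero at a centre
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    W = U ** m
    centres = (W.T @ X) / W.sum(axis=0)[:, None]
    return centres, U


def assign_cefr(X, transitional_below=0.6, **kwargs):
    """Map each project to a CEFR level, a certainty in [1/6, 1], and a transitional flag."""
    centres, U = fuzzy_c_means(X, len(CEFR_LEVELS), **kwargs)
    order = np.argsort(centres.sum(axis=1))    # weakest cluster first (assumed ordinal rule)
    level_of_cluster = {int(k): CEFR_LEVELS[r] for r, k in enumerate(order)}
    levels = [level_of_cluster[int(k)] for k in U.argmax(axis=1)]
    certainty = U.max(axis=1)
    transitional = certainty < transitional_below
    return levels, certainty, transitional
```

Using the maximum membership as certainty means a project sitting between two level centres gets a low score automatically, which is the kind of case the abstract suggests routing to instructor review.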

2604.03275 2026-06-18 physics.ao-ph cs.AI cs.LG Version update

IPSL-AID: Generative Diffusion Models for Climate Downscaling from Global to Regional Scales

Kishanthan Kingston, Olivier Boucher, Freddy Bouchet, Pierre Chapel, Rosemary Eade, Jean-Francois Lamarque, Redouane Lguensat, Kazem Ardaneh

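Affiliations: Climate Modeling Center; Sorbonne University; CNRS; IPSL, Paris, France

AI summary: Proposes IPSL-AID, a tool based on denoising diffusion probabilistic models that uses ERA5 reanalysis data to generate 0.25° temperature, wind, and precipitation fields from coarse-resolution inputs, and models the probability distribution of fine-scale features to quantify uncertainty, accurately reconstructing statistical distributions, extreme events, and spatial structure.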

Comments 17 pages, 12 figures, submitted to Climate Informatique 2026, to appear in Environmental Data Science

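AI Chinese abstract (English translation)

Effective climate change adaptation and mitigation strategies require high-resolution projections to guide strategic decisions. Conventional global climate models typically run at resolutions of 150 to 200 km and cannot represent key regional processes. IPSL-AID is a global-to-regional downscaling tool based on a denoising diffusion probabilistic model, designed to address this limitation. Trained on ERA5 reanalysis data, the tool uses coarse-resolution inputs and their spatiotemporal context to generate temperature, wind, and precipitation fields at 0.25° resolution. It also models the probability distribution of fine-scale features to produce plausible scenarios for uncertainty quantification. The model accurately reconstructs statistical distributions, including extreme events, power spectra, and spatial structure. This work highlights the potential of generative diffusion models for efficient climate downscaling with uncertainty quantification.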

English abstract

Effective adaptation and mitigation strategies for climate change require high-resolution projections to inform strategic decision-making. Conventional global climate models, which typically operate at resolutions of 150 to 200 kilometers, lack the capacity to represent essential regional processes. IPSL-AID is a global-to-regional downscaling tool based on a denoising diffusion probabilistic model designed to address this limitation. Trained on ERA5 reanalysis data, it generates 0.25-degree-resolution fields for temperature, wind, and precipitation using coarse inputs and their spatiotemporal context. It also models probability distributions of fine-scale features to produce plausible scenarios for uncertainty quantification. The model accurately reconstructs statistical distributions, including extreme events, power spectra, and spatial structures. This work highlights the potential of generative diffusion models for efficient climate downscaling with uncertainty quantification.
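IPSL-AID is described as a denoising diffusion probabilistic model (DDPM) conditioned on coarse inputs. The sketch below shows only the generic DDPM training step: closed-form forward noising, $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$, followed by the noise-prediction loss. The linear beta schedule, the way the coarse field is handed to the denoiser, and the toy `zero_model` are assumptions for illustration, not the paper's configuration.

```python
import numpy as np


def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_t and cumulative products alpha_bar_t = prod(1 - beta_s)."""
    betas = np.linspace(beta_start, beta_end, T)
    alpha_bar = np.cumprod(1.0 - betas)
    return betas, alpha_bar


def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t ~ q(x_t | x_0) in closed form; also return the noise that was added."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps


def diffusion_loss(model, fine_field, coarse_field, alpha_bar, rng):
    """One Monte Carlo sample of the epsilon-prediction loss.

    `model(x_t, coarse_field, t)` is a hypothetical denoiser that sees the noisy
    high-resolution field plus the coarse input as conditioning.
    """
    t = int(rng.integers(0, len(alpha_bar)))
    x_t, eps = q_sample(fine_field, t, alpha_bar, rng)
    eps_hat = model(x_t, coarse_field, t)
    return float(np.mean((eps_hat - eps) ** 2))


def zero_model(x_t, coarse, t):
    """Trivial baseline that always predicts zero noise (expected loss about 1)."""
    return np.zeros_like(x_t)


rng = np.random.default_rng(0)
betas, alpha_bar = linear_beta_schedule()
fine = rng.standard_normal((64, 64))                     # stand-in for a high-resolution tile
coarse = fine.reshape(16, 4, 16, 4).mean(axis=(1, 3))    # crude 4x coarsening as conditioning
loss = diffusion_loss(zero_model, fine, coarse, alpha_bar, rng)
```

Sampling several plausible fine-scale fields for the same coarse input, by running the reverse process from different noise draws, is what gives a diffusion downscaler its ensemble for uncertainty quantification.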

2604.14906 2026-06-18 physics.bio-ph cs.LG Version update

Unraveling the Mechanism of Drug Binding to SARS-CoV-2 RNA Pseudoknot with Thermodynamics-Driven Machine Learning

Mariia Ivonina, Jakub Rydzewski

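Affiliations: Platform of Inter/Transdisciplinary Energy Research, Kyushu University; Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University

AI summary: Uses a thermodynamics-driven machine learning method (spectral map) to learn collective variables from all-atom molecular dynamics trajectories, revealing that ligand binding destabilizes the SARS-CoV-2 RNA pseudoknot in a topology-selective way and identifying protonation state as a key factor in modeling the action of RNA-targeted drugs.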

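AI Chinese abstract (English translation)

The pseudoknot secondary structure in SARS-CoV-2 RNA regulates protein synthesis through $-1$ programmed ribosomal frameshifting ($-1$ PRF), a mechanism that allows the virus to produce both structural and non-structural proteins from overlapping reading frames. The pseudoknot exhibits two long-lived topologies, threaded and unthreaded. The effect of ligand binding on its folding is a key process for developing small-molecule $-1$ PRF inhibitors. Understanding this process through unbiased molecular dynamics (MD) simulations can be facilitated by introducing collective variables (CVs) that capture the corresponding slowest dynamical modes. Here, we use spectral map (SM), a thermodynamics-driven machine learning technique, to learn such CVs directly from all-atom MD trajectories of the SARS-CoV-2 RNA pseudoknot in complex with the $-1$ PRF inhibitor merafloxacin and two of its structural analogs (in neutral and ionized forms). Free-energy landscapes (FELs) derived from the learned CVs show that ligand-induced destabilization is topology-selective. In the threaded pseudoknot, the inhibitors destabilize the S2 stem, whereas in the unthreaded pseudoknot, destabilization occurs in the S1 and S3 stems. Moreover, the extent to which each ligand reshapes the FEL matches experimentally reported antiviral potency, while the protonation state qualitatively changes the dynamics within the same RNA topology. Overall, our results show how pseudoknot topology, ligand type, and protonation state jointly influence the slow conformational dynamics of viral RNA, and establish physiological protonation as a key factor in modeling RNA-targeted drug action.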

English abstract

The pseudoknot secondary structure in SARS-CoV-2 RNA is essential for regulating protein synthesis through $-1$ programmed ribosomal frameshifting ($-1$ PRF), a mechanism that allows the virus to generate both structural and non-structural proteins from overlapping reading frames. This pseudoknot exhibits both threaded and unthreaded long-lived topologies. The influence of ligand binding on its folding is a process critical for the development of $-1$ PRF small-molecule inhibitors. Understanding this process through unbiased molecular dynamics (MD) simulations can be facilitated by introducing collective variables (CVs) that capture the corresponding slowest dynamical modes. Here, we use spectral map (SM), a thermodynamics-driven machine learning technique, to learn such CVs directly from all-atom MD trajectories of the SARS-CoV-2 RNA pseudoknot in complex with the $-1$ PRF inhibitor merafloxacin and its two structural analogs in neutral and ionized forms. Free-energy landscapes (FELs) derived from the learned CVs indicate that ligand-induced destabilization is topology-selective. In the threaded pseudoknot, the inhibitors destabilize the S2 stem, while in the unthreaded pseudoknot, destabilization occurs in the S1 and S3 stems. Furthermore, the extent to which each ligand reshapes the FEL matches experimentally reported antiviral potency, whereas the protonation state qualitatively alters dynamics within the same RNA topology. Overall, our results show how pseudoknot topology, ligand type, and protonation state collectively influence the slow conformational dynamics of viral RNA and establish physiological protonation as a critical factor for modeling RNA-targeted drug action.
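Free-energy landscapes of this kind are obtained from a learned CV by Boltzmann inversion of its sampled distribution, $F(s) = -k_B T \ln p(s)$, shifted so that the minimum is zero. The sketch below shows only that final step for a one-dimensional CV; learning the CV with spectral map is not reproduced. The 300 K temperature, kcal/mol units, and the toy two-basin trajectory are assumptions.

```python
import numpy as np

K_B_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def free_energy_profile(cv_values, bins=50, temperature=300.0):
    """Boltzmann-invert a histogram of CV samples into F(s) in kcal/mol.

    Empty bins get +inf (never visited); the lowest finite value is set to 0.
    """
    density, edges = np.histogram(cv_values, bins=bins, density=True)
    centres = 0.5 * (edges[:-1] + edges[1:])
    with np.errstate(divide="ignore"):
        F = -K_B_KCAL * temperature * np.log(density)
    F -= F[np.isfinite(F)].min()
    return centres, F


# Toy two-state trajectory: a deeper basin near s = -1 and a shallower one near s = +1.
rng = np.random.default_rng(1)
cv = np.concatenate([rng.normal(-1.0, 0.2, 8000), rng.normal(1.0, 0.2, 2000)])
centres, F = free_energy_profile(cv, bins=40)
```

Comparing such profiles between the apo and ligand-bound simulations is how a statement like "the inhibitor destabilizes a stem" becomes quantitative: the basin for the folded stem becomes shallower relative to the unfolded one.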

2604.22476 2026-06-18 cs.CV cs.LG Version update

All Eyes on the Workflow: Automated and Efficient Event Discovery from Video Streams

Marco Pegoraro, Jonas Seng, Dustin Heller, Wil M. P. van der Aalst, Kristian Kersting

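Affiliations: Chair of Process and Data Science, RWTH Aachen University; Artificial Intelligence & Machine Learning Lab, Technical University of Darmstadt

AI summary: Proposes SnapLog, which uses image embeddings and frame-wise similarity matrices for temporal segmentation, combined with generalized few-shot classification, to extract event data from video as interpretable, labeled, timestamped sequences of frames.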

Comments 18 pages, 6 figures, 1 table, 27 references

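AI Chinese abstract (English translation)

Disciplines such as business process management and process mining help organizations by discovering insights about processes from recorded event data. However, one obstacle to process analysis is data multimodality: for example, data in video form cannot be directly interpreted as events. Existing methods require a dictionary of activity labels as input, cannot provide frame-by-frame labeling explanations, or rely on outdated computer vision techniques. In this work, we present SnapLog, a method that extracts event data from videos by converting frames into feature vectors with image embeddings and performing temporal segmentation via frame-wise similarity matrices. Generalized few-shot classification then assigns labels to video segments, producing labeled, timestamped sub-sequences of frames that can be interpreted as events. Conventional process mining techniques can then be used to analyze the resulting data. We show that the logs generated by our method accurately reflect the processes in the videos.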

English abstract

Disciplines such as business process management and process mining aid organizations by discovering insights about processes on the basis of recorded event data. However, an obstacle to process analysis is data multi-modality: for instance, data in video form are not directly interpretable as events. Existing approaches rely on a dictionary of activity labels as input, cannot provide frame-by-frame labeling explanations, or rely on superseded computer vision techniques. In this work, we present SnapLog, an approach to extract event data from videos by converting frames to feature vectors using image embeddings and performing temporal segmentation through frame-wise similarity matrices. A generalized few-shot classification is then used to assign labels to the video segments, yielding labeled, timestamped sub-sequences of frames that are interpretable as events. Conventional process mining techniques can be used to analyze the resulting data. We show that our approach produces logs that accurately reflect the process in the videos.
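The segmentation step can be pictured with a frame-by-frame cosine-similarity matrix, where a new segment starts once consecutive frames stop resembling each other. The NumPy sketch below assumes the frame embeddings are already computed and uses a hypothetical fixed threshold on adjacent-frame similarity (SnapLog's actual segmentation rule may differ). It also substitutes simple nearest-prototype labeling for the paper's generalized few-shot classifier.

```python
import numpy as np


def cosine_similarity_matrix(emb):
    """S[i, j] = cosine similarity between frame embeddings i and j."""
    unit = emb / np.clip(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12, None)
    return unit @ unit.T


def segment_boundaries(emb, threshold=0.8):
    """Start a new segment wherever adjacent frames fall below `threshold` similarity."""
    adjacent = np.diag(cosine_similarity_matrix(emb), k=1)  # sim(frame i, frame i + 1)
    cuts = (np.where(adjacent < threshold)[0] + 1).tolist()
    return [0, *cuts, len(emb)]


def to_event_log(emb, prototypes, fps=1.0, threshold=0.8):
    """Label each segment by its nearest prototype and emit (activity, start_s, end_s)."""
    names = list(prototypes)
    protos = np.stack([prototypes[n] for n in names])
    protos = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    bounds = segment_boundaries(emb, threshold)
    events = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        centroid = emb[start:end].mean(axis=0)
        centroid = centroid / np.linalg.norm(centroid)
        label = names[int(np.argmax(protos @ centroid))]
        events.append((label, start / fps, end / fps))
    return events


frames = np.array([[1.0, 0.0], [1.0, 0.05], [0.99, 0.0], [0.0, 1.0], [0.05, 1.0], [0.0, 1.0]])
prototypes = {"pick item": np.array([1.0, 0.0]), "place item": np.array([0.0, 1.0])}
log = to_event_log(frames, prototypes, fps=2.0)
# log == [("pick item", 0.0, 1.5), ("place item", 1.5, 3.0)]
```

Each tuple has the shape of a process-mining event (activity, start timestamp, end timestamp), so the output can be fed to conventional discovery algorithms once a case identifier is attached.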

2605.22845 2026-06-18 cs.CE cs.LG Version update

A finite-element-inspired bipartite graph learned simulator for manufacturability assessment in large-deformation sheet forming

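Cross-Attention-Based Bipartite Graph Neural Network for Coupled Prediction of Nodal and Element Fields in Large-Deformation Sheet Forming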

Yingxue Zhao, Haoran Li, Haosu Zhou, Tobias Pfaff, Nan Li

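Affiliations: Dyson School of Design Engineering; Imperial College London; NVIDIA

AI summary: Proposes a cross-attention bipartite graph neural network (CAtt-BiGNN) that uses a node-element bipartite graph structure and an edge-aware cross-attention mechanism to jointly predict nodal displacement increments and element thinning in large-deformation sheet forming.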

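AI Chinese abstract (English translation)

Finite element simulation of large-deformation sheet forming involves node-element coupling between nodal kinematics and element-level deformation measures. Machine learning surrogates can accelerate such simulations, but most graph-based models use node-centred representations. Such representations are indirect for element-level quantities, which are usually recovered from nodal predictions through interpolation or post-processing, and they can obscure the node-element coupling structure underlying finite element updates. This paper proposes a cross-attention-based bipartite graph neural network (CAtt-BiGNN) for coupled prediction of nodal displacement increments and element thinning. The graph represents mesh nodes and elements as distinct but connected entities linked by directed node-element edges, so that nodal and element fields are predicted on their native discrete domains. An edge-aware cross-attention processor adaptively modulates node-element coupling weights based on geometric edge features, enabling bidirectional message passing between nodal motion states and element deformation states. A hierarchical extension, CAtt-BiUGNN, combines CAtt-BiGNN with graph downsampling and upsampling to improve information propagation on larger meshes. Adaptive Gaussian noise is further evaluated as an optional rollout-stabilization strategy. The models are tested on two representative forming cases with different graph sizes. Compared with node-centred baselines and bipartite ablation variants, CAtt-BiGNN improves the balance between displacement and thinning prediction, while CAtt-BiUGNN gives the strongest overall performance in the larger-graph setting. The results show that the proposed models provide an effective surrogate framework for large-deformation sheet forming.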

English abstract

Explicit dynamic finite element (FE) simulations are widely used for large-deformation engineering analysis, but repeated simulations remain costly during design space exploration and optimisation. In explicit FE analysis, nodal kinematics and element-level deformation measures evolve through coupled node-element updates. This motivates graph learned simulators that approximate one-step FE state transitions and roll them out autoregressively. However, many mesh-based graph surrogates are node-centred, which makes element-level variables and native nodal-elemental exchange less direct to represent. This work proposes CAtt-BiGNN, a cross-attention-based bipartite graph neural network for coupled nodal-elemental learning. The graph represents FE mesh nodes and elements as distinct entities linked by directed node-element edges, enabling nodal displacement increments and element-level deformation states to be predicted on their native discretisation domains. An edge-aware cross-attention processor uses geometric edge embeddings to modulate directional node-element message passing. For larger graphs, CAtt-BiUGNN combines the bipartite processor with graph downsampling and upsampling to improve long-range information propagation. The method is evaluated on dome-shaped cold forming and corner-shaped hot forming benchmarks. Comparisons with node-centred baselines and with bipartite and attention ablations show improved accuracy and balance in nodal displacement and elemental thinning prediction during autoregressive rollout. The results indicate that the proposed finite-element-inspired learned simulator can support manufacturability-oriented field prediction and efficient design space exploration in large-deformation sheet material forming.
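The node-to-element half of an edge-aware cross-attention step can be sketched as follows. Each element forms a query, each incident node forms a key and value shifted by an embedding of the geometric edge feature, and a softmax over the element's incident edges weights the messages. This is a single-head NumPy illustration with random weights; how CAtt-BiGNN actually injects edge embeddings, its multi-head layout, and the reverse element-to-node direction are assumptions or omitted.

```python
import numpy as np


def node_to_element_attention(h_nodes, h_elems, src, dst, edge_feat, Wq, Wk, Wv, We):
    """Single-head, edge-aware cross-attention from mesh nodes to elements.

    src[e], dst[e]: node index and element index of directed edge e.
    edge_feat[e]:   geometric edge feature (here: node position relative to element centroid).
    Returns updated element features and the attention weight of every edge.
    """
    q = h_elems @ Wq                                  # one query per element
    kv_in = h_nodes[src] + edge_feat @ We             # edge-aware key/value input per edge
    k, v = kv_in @ Wk, kv_in @ Wv
    scores = np.sum(q[dst] * k, axis=1) / np.sqrt(q.shape[1])
    weights = np.zeros_like(scores)
    for elem in np.unique(dst):                       # softmax over each element's incident nodes
        mask = dst == elem
        s = np.exp(scores[mask] - scores[mask].max())
        weights[mask] = s / s.sum()
    messages = np.zeros_like(h_elems)
    np.add.at(messages, dst, weights[:, None] * v)    # sum weighted node messages per element
    return h_elems + messages, weights                # residual update


rng = np.random.default_rng(0)
d, d_edge = 8, 2
positions = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
src = np.array([0, 1, 2, 1, 2, 3])   # two triangles sharing the edge between nodes 1 and 2
dst = np.array([0, 0, 0, 1, 1, 1])
centroids = np.stack([positions[[0, 1, 2]].mean(axis=0), positions[[1, 2, 3]].mean(axis=0)])
edge_feat = positions[src] - centroids[dst]
h_nodes, h_elems = rng.standard_normal((4, d)), rng.standard_normal((2, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
We = rng.standard_normal((d_edge, d))
new_elems, attn = node_to_element_attention(h_nodes, h_elems, src, dst, edge_feat, Wq, Wk, Wv, We)
```

Keeping element states as first-class graph entities is what lets a quantity like thinning be predicted directly per element instead of being interpolated from nodal outputs.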

2605.26631 2026-06-18 stat.AP cs.LG Version update

Data-driven sparse identification of governing PDEs via knockoff filters and multi-criteria trade-offs

Pongpisit Thanasutives, Naichang Ke, Yoshinobu Kawahara

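Affiliations: RIKEN Center for Advanced Intelligence Project (AIP); The University of Osaka

AI summary: Proposes the KO-PDE-IDENT framework, which controls the false discovery rate with model-X knockoff filters and combines recursive feature elimination with multi-criteria decision-making to sparsely identify partial differential equations from noisy data.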

Comments 44 pages, 5 figures, 11 tables

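AI Chinese abstract (English translation)

We propose KO-PDE-IDENT, a data-driven framework for identifying parsimonious partial differential equations (PDEs) with false discovery rate (FDR) control. PDE discovery from noisy observations is often hampered by extreme multicollinearity among candidate terms, which leads typical sparse-regression methods to select spurious terms. To address this, KO-PDE-IDENT first mines a support set of potential candidate terms via model-X knockoff filters with finite-sample FDR control, then refines and ranks the surviving PDE alternatives. The framework integrates three components. First, knockoff feature statistics are constructed by combining $\ell_{0}$-constrained adaptive best-subset selection with SHapley Additive exPlanations (SHAP), yielding an effective and computationally efficient difference statistic. Second, a recursive feature elimination (RFE) procedure removes terms whose marginal contributions are dispensable and assesses statistical necessity through knockoff-perturbed hypothesis testing. Third, final model selection is formulated as a multi-criteria decision-making (MCDM) problem, in which the optimal governing equation is the alternative that best balances a broad range of criteria such as predictive accuracy, model complexity, and coefficient uncertainty. We validate KO-PDE-IDENT on five canonical PDEs under severe noise contamination. Experimental results show that our framework can exactly recover the true PDE structure, eliminating false discoveries while retaining all true underlying terms, with low coefficient estimation error.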

English abstract

We propose KO-PDE-IDENT, a data-driven framework for identifying parsimonious partial differential equations (PDEs) with false discovery rate (FDR) control. PDE discovery from noisy observations is often hindered by extreme multicollinearity among candidate terms, which causes typical sparse-regression methods to select spurious terms. To address this problem, KO-PDE-IDENT initially mines a support set of potential candidate terms via model-X knockoff filters with finite-sample FDR control, then refines and ranks the surviving PDE alternatives. The framework integrates three components. First, knockoff feature statistics are constructed by coupling $\ell_{0}$-constrained adaptive best-subset selection with SHapley Additive exPlanations (SHAP), yielding an effective and computationally efficient difference statistic. Second, a recursive feature elimination (RFE) procedure removes terms whose marginal contributions are dispensable and assesses statistical necessity through knockoff-perturbed hypothesis testing. Third, the final model selection is formulated as a multi-criteria decision-making (MCDM) problem, where the optimal governing equation is the alternative that best balances a wide range of criteria such as predictive accuracy, model complexity and coefficient uncertainty. We evaluate KO-PDE-IDENT on five canonical PDEs under severe noise corruption. Empirical results show that our framework can exactly recover the true PDE structure, eliminating false discoveries while retaining all true underlying terms, with low coefficient estimation error.
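Whatever feature statistic is used, the last step of a knockoff filter is the same data-dependent threshold. Given statistics $W_j$ (large positive values favour keeping term $j$), the knockoff+ rule of Barber and Candès picks the smallest $t$ among the nonzero $|W_j|$ with $(1 + \#\{j : W_j \le -t\}) / \max(1, \#\{j : W_j \ge t\}) \le q$, and keeps the terms with $W_j \ge t$. The sketch below implements only that rule. Building the knockoff copies and the $\ell_0$/SHAP-based statistics is not shown, and the example $W$ values are invented.

```python
import numpy as np


def knockoff_plus_threshold(W, q=0.1):
    """Smallest t in {|W_j| > 0} whose estimated FDP is at most q; +inf if none."""
    W = np.asarray(W, dtype=float)
    for t in np.sort(np.abs(W[W != 0])):
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return float(t)
    return float("inf")


def knockoff_select(W, q=0.1):
    """Indices of candidate terms kept by the knockoff+ filter at target FDR q."""
    t = knockoff_plus_threshold(W, q)
    return np.flatnonzero(np.asarray(W) >= t)


# Hypothetical statistics for six candidate PDE terms.
W = np.array([5.0, 4.0, 3.0, 2.0, 1.0, -0.5])
selected = knockoff_select(W, q=0.2)   # terms 0-4 are kept; the term with W < 0 is dropped
```

The "+1" in the numerator is what makes the guarantee hold in finite samples; it also means the filter can legitimately select nothing when only a few terms have strong statistics, which is one reason a downstream refinement stage is useful.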

2606.06133 2026-06-18 cs.SE cs.AI cs.LG cs.LO Version update

TLA-Prover: Verifiable TLA+ Specification Synthesis via Preference-Optimized Low-Rank Adaptation

Eric Spencer, Arslan Bisharat, Brian Ortiz, Khushboo Bhadauria, TaiNing Wang, George K. Thiruvathukal, Konstantin Laufer, Mohammed Abuhamad

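Affiliations: Department of Computer Science, Loyola University Chicago

AI summary: Proposes TLA-Prover, which combines supervised fine-tuning with repair-based group relative policy optimization, using the TLC model checker, for TLA+ specification synthesis; it reaches a 30% pass rate at the Gold/Diamond tiers, about 3.5 times the untuned baseline.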

Comments 12 pages, 5 tables, 3 figures. Accepted at the 21st International Conference on Software Technologies (ICSOFT 2026)

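AI Chinese abstract (English translation)

TLA+ is a formal specification language for verifying distributed systems and safety-critical protocols. TLA+ specifications generated by large language models (LLMs) often fail the TLC model checker for semantic reasons. Across 25 LLMs, the best public baseline achieves a 26.6% syntactic parse success rate and an 8.6% semantic model-check pass rate. We present TLA-Prover, a 20-billion-parameter model for TLA+ specification synthesis. Training combines supervised fine-tuning (SFT) on verified examples with repair-based group relative policy optimization (GRPO). In the GRPO stage, the model learns to repair its own rejected specifications. We also train a direct preference optimization (DPO) variant from the same SFT checkpoint as an ablation. TLC provides the reward signal directly, with no learned reward model. Each output is assigned one of four tiers: Bronze (parses), Silver (no warnings), Gold (passes TLC), and Diamond. To reach Diamond, the model's correctness property is automatically altered in a small way, and TLC must then detect a violation. If TLC still passes, the property was always true and contributes nothing, so the output cannot reach Diamond. On a held-out 30-problem benchmark, TLA-Prover reaches 9/30 (i.e., pass@1 = 30%) at both Gold and Diamond. This is about 3.5 times the 8.6% of the untuned baseline. The DPO variant reaches 20% at Diamond. Gold and Diamond coincide at every checkpoint, which rules out the trivial-property failure mode.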

English abstract

TLA+ is a formal specification language for verifying distributed systems and safety-critical protocols. Large language models (LLMs) frequently produce TLA+ specifications that fail the TLC model checker for semantic reasons. Across 25 LLMs, the best public baseline reaches 26.6% on syntactic parsing and 8.6% on semantic model checking. We present TLA-Prover, a 20-billion-parameter model for TLA+ specification synthesis. Training combines supervised fine-tuning (SFT) on verified examples with repair-based group-relative policy optimization (GRPO). In the GRPO stage, the model learns to fix its own rejected specifications. We also train a direct preference optimization (DPO) variant from the same SFT checkpoint as an ablation. TLC provides the reward signal directly, with no learned reward model. Four tiers grade each output: Bronze (parses), Silver (no warnings), Gold (passes TLC), and Diamond. To reach Diamond, the model's correctness property is automatically altered in a small way; TLC must then detect a violation. If TLC still passes, the property was always-true and contributes nothing; the output fails Diamond. TLA-Prover reaches 9/30 (i.e., pass@1 = 30%) at both Gold and Diamond on a held-out 30-problem benchmark. This is roughly 3.5x the 8.6% untuned baseline. The DPO variant reaches 20% at Diamond. Gold and Diamond coincide at every checkpoint, which rules out the trivial-property failure mode.
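The four-tier grading and the pass@1 figure can be made concrete with a small scoring function. The sketch below assumes the tiers are cumulative (each requires the previous one) and that the checker results for each output are already available; it does not call TLC, and `CheckResult` and its fields are hypothetical names.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Tier(IntEnum):
    NONE = 0
    BRONZE = 1    # specification parses
    SILVER = 2    # parses with no warnings
    GOLD = 3      # TLC model check passes
    DIAMOND = 4   # TLC also flags a violation once the property is slightly altered


@dataclass
class CheckResult:
    parses: bool
    warnings: int
    tlc_passes: bool
    mutant_violation_found: bool  # did TLC reject the perturbed correctness property?


def grade(r: CheckResult) -> Tier:
    if not r.parses:
        return Tier.NONE
    if r.warnings > 0:
        return Tier.BRONZE
    if not r.tlc_passes:
        return Tier.SILVER
    if not r.mutant_violation_found:
        return Tier.GOLD          # property survived mutation: likely always true
    return Tier.DIAMOND


def pass_at_1(results: List[CheckResult], tier: Tier) -> float:
    """Fraction of problems whose single sample reaches at least `tier`."""
    return sum(grade(r) >= tier for r in results) / len(results)
```

With 30 problems, 9 outputs at Diamond give `pass_at_1(..., Tier.DIAMOND) == 0.3`; when every Gold output is also Diamond, the Gold and Diamond rates coincide, which is the pattern the abstract reports.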

2606.08206 2026-06-18 cs.CV cs.LG Version update

SegmentAnyTreeV2: Scaling Transformer-Based Tree Instance Segmentation Across Sensors, Platforms, and Forests

Maciej Wielgosz, Stefano Puliti, Rasmus Astrup

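Affiliations: Norwegian Institute of Bioeconomy Research (NIBIO)

AI summary: Proposes SegmentAnyTreeV2, a sensor- and platform-agnostic framework for semantic and instance segmentation of forest point clouds that combines a Point Transformer v3 backbone, a lightweight semantic head, and a tree-focused cross-attention mask decoder. It reaches 90.5% precision and 80.2% recall on the FOR-instanceV2 test split and shows strong cross-domain generalization.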

Comments 25 pages, 6 figures, 10 tables, Corrected bibliography metadata and minor typographical issues; results unchanged

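AI Chinese abstract (English translation)

We present SegmentAnyTreeV2, a sensor- and platform-agnostic framework for semantic and instance segmentation of forest point clouds. The model combines a serialization-based Point Transformer v3 backbone, a lightweight semantic head, and a tree-focused cross-attention mask decoder. Semantic predictions restrict instance decoding to tree-class voxels, while instance-aware query initialization, one-to-many seed supervision, and asymmetric mask scoring improve separation in dense and structurally complex stands. We further introduce FOR-instance v3, an expanded benchmark dataset with 427 scenes and 26,496 annotated trees spanning different biomes, forest structures, and LiDAR platforms. On the FOR-instanceV2 test set, SegmentAnyTreeV2 achieves 90.5% precision, 80.2% recall, an 85.0% F1 score, 90.7% coverage, and 87.6% semantic mIoU, outperforming previous learning-based methods in both instance detection and mask completeness. Zero-shot evaluation on independent sites further demonstrates strong cross-domain generalization.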

English abstract

We present SegmentAnyTreeV2, a sensor- and platform-agnostic framework for semantic and instance segmentation of forest point clouds. The model combines a serialization-based Point Transformer v3 backbone with a lightweight semantic head and a tree-focused cross-attention mask decoder. Semantic predictions restrict instance decoding to tree-class voxels, while instance-aware query initialization, one-to-many seed supervision, and asymmetric mask scoring improve separation in dense and structurally complex stands. We further introduce FOR-instance v3, an expanded benchmark comprising 427 scenes and 26,496 annotated trees across diverse biomes, forest structures, and LiDAR platforms. On the FOR-instanceV2 test split, SegmentAnyTreeV2 achieves 90.5% precision, 80.2% recall, 85.0% F1, 90.7% coverage, and 87.6% semantic mIoU, outperforming previous learning-based methods in both instance detection and mask completeness. Zero-shot evaluation on independent sites further demonstrates strong cross-domain generalization.
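As a quick consistency check, the reported F1 is the harmonic mean of the reported precision and recall: $F_1 = 2PR/(P+R) = 2(0.905)(0.802)/(0.905+0.802) \approx 0.850$, which matches the 85.0% in the abstract. The same arithmetic in Python:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


print(round(100 * f1_score(0.905, 0.802), 1))  # 85.0
```

Because the harmonic mean is pulled toward the smaller value, the gap between 90.5% precision and 80.2% recall, rather than the precision itself, is what caps F1 near 85%.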

2606.11615 2026-06-18 cs.CV cs.CR cs.LG Version update

Adv-TGD: Adversarial Text-Guided Diffusion for Face Recognition Impersonation Attacks

Omid Ahmadieh, Nima Karimian

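Affiliations: University of South Florida, Bellini College of Artificial Intelligence, Cybersecurity and Computing

AI summary: Proposes the Adv-TGD framework, which uses Stable Diffusion with LoRA fine-tuning to generate realistic adversarial faces, achieving highly successful identity impersonation attacks while preserving visual quality, with an average ASR of 85.90%.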

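AI Chinese abstract (English translation)

The widespread adoption of face recognition (FR) technology raises serious privacy concerns, because facial data can be exploited without consent. To address this challenge, we propose Adv-TGD, a generative adversarial attack framework that synthesizes photorealistic faces that impersonate target identities and deceive face recognition systems. Built on Stable Diffusion, Adv-TGD performs per-sample LoRA fine-tuning conditioned on concise text prompts to generate natural yet adversarially manipulated identities. Unlike conventional identity attack methods, our approach optimizes a lightweight cross-attention adapter for each source-target pair within a fixed-timestep denoising process. Latent blending is constrained by a face-local heatmap mask to ensure spatially precise identity manipulation while preserving non-sensitive regions. We introduce a composite objective that combines masked epsilon-MSE reconstruction, thresholded identity divergence in the FR embedding space, directional feature alignment, and source-similarity suppression to balance adversarial attack strength and visual realism. Optionally, LLaVA-generated attribute prompts enhance fine-grained semantic details without reintroducing identity cues. Under a black-box evaluation protocol, Adv-TGD achieves an average attack success rate (ASR) of 85.90% across IR152, IRSE50, MobileFace, and FaceNet, surpassing the semantic SOTA baseline Adv-CPG by 6.25 percentage points, the diffusion-based makeup method DiffAIM by 3 points, and the noise-based P3-Mask by 16 points. Despite its strong attack efficacy, Adv-TGD maintains high visual fidelity (PSNR = 27.15 dB, SSIM = 0.981). In addition, we demonstrate the flexibility of our framework by successfully extending it to an in-the-wild dataset (LADN), general object classification (ImageNet), and transformer-based diffusion models (FLUX.1).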

English abstract

The widespread adoption of face recognition (FR) technologies raises serious privacy concerns, as facial data can be exploited without consent. To address this challenge, we propose Adv-TGD, a generative adversarial attack framework that synthesizes photorealistic faces capable of impersonating target identities and deceiving face recognition systems. Built upon Stable Diffusion v2.1, Adv-TGD performs per-sample LoRA fine-tuning conditioned on concise textual prompts to generate natural yet adversarially manipulated identities. Unlike conventional identity attack approaches, our method optimizes lightweight cross-attention adapters for each source-target pair within a fixed-timestep denoising process. Latent blending is constrained by a face-local heatmap mask to ensure spatially precise identity manipulation while preserving non-sensitive regions. We introduce a composite objective that integrates masked epsilon-MSE reconstruction, thresholded identity divergence in FR embedding space, directional feature alignment, and source-similarity suppression to balance adversarial attack and visual realism. Optionally, LLaVA-generated attribute prompts enhance fine-grained semantic details without reintroducing identity cues. Under the black-box evaluation protocol, Adv-TGD attains an average attack success rate (ASR) of 85.90% across IR152, IRSE50, MobileFace, and FaceNet, surpassing the semantic SOTA baseline Adv-CPG by 6.25 points, the diffusion-based makeup method DiffAIM by 3 points, and the noise-based P3-Mask by 16 points. Despite its strong attack efficacy, Adv-TGD preserves high visual fidelity (PSNR = 28.18 dB, SSIM = 0.981). Furthermore, we demonstrate the flexibility of our framework by successfully extending it to in-the-wild datasets (LADN), general object classification (ImageNet), and transformer-based diffusion models (FLUX.1).
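The fidelity and attack figures rest on two standard metrics. PSNR is $10 \log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$, and attack success rate is the fraction of generated faces whose similarity to the target identity clears the FR model's verification threshold. The sketch below assumes 8-bit images (MAX = 255) and cosine similarity between embeddings. The per-model thresholds used in the paper are not given here, so the threshold is left as a parameter.

```python
import numpy as np


def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images of the same shape."""
    mse = np.mean((reference.astype(np.float64) - distorted.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def attack_success_rate(adv_embeddings, target_embeddings, threshold):
    """Percentage of adversarial faces the FR model would verify as the target identity."""
    hits = [cosine(a, t) >= threshold for a, t in zip(adv_embeddings, target_embeddings)]
    return 100.0 * sum(hits) / len(hits)


img = np.zeros((4, 4), dtype=np.uint8)
noisy = img + 1                                   # every pixel off by one grey level
print(round(psnr(img, noisy), 2))                 # 48.13
```

Because PSNR is logarithmic in MSE, a difference of about 1 dB, as between 27.15 dB and 28.18 dB, corresponds to roughly a 21% change in mean squared error.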

2606.12816 2026-06-18 quant-ph cs.ET cs.LG Version update

Graph Reinforcement Learning for Calibration-Aware Quantum Circuit Routing

Yash Vardhan Tomar, Dheeraj Peddireddy

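Affiliations: University of California, Berkeley; National Institute of Standards and Technology

AI summary: Proposes a calibration-aware quantum circuit routing method based on graph reinforcement learning that uses IBM Heron r2 calibration data to select SWAP operations, reaching a mean fidelity of 0.727 on MQT Bench circuits versus 0.440 for SABRE-best20.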

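AI Chinese abstract (English translation)

Quantum circuit routing is a key step when compiling programs for noisy intermediate-scale quantum processors. Routes that look efficient by standard overhead metrics can still lose fidelity when they pass through poorly calibrated couplers. We study a calibration-aware graph reinforcement learning router that uses same-day IBM Heron r2 calibration data to select hardware-edge SWAPs. We train the policy with proximal policy optimization and evaluate it by exact simulated fidelity across nine Munich Quantum Toolkit (MQT) Bench circuits and three calibration snapshots. Across these evaluations, the pooled mean exact fidelity is $0.727$, compared with $0.440$ for SABRE-best20 and $0.481$ for target-aware SABRE. The fidelity gains come with higher routed two-qubit gate counts and are concentrated in the 5-qubit and 8-qubit circuit families; under the fixed tree action graph, all 10-qubit families favor SABRE-best20. Overall, our results show that calibration-aware learned routing can improve fidelity beyond gate-count-driven compilation.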

English abstract

Quantum circuit routing is a key step in compiling programs for noisy intermediate-scale quantum processors. Routes that appear efficient by standard overhead metrics can still lose fidelity when they pass through poorly calibrated couplers. We study a calibration-aware graph reinforcement-learning router that uses same-day IBM Heron r2 calibration data to choose hardware-edge SWAPs. We train the policy with proximal policy optimization and evaluate it with exact simulated fidelity across nine Munich Quantum Toolkit (MQT) Bench circuits and three calibration snapshots. Across these evaluations, pooled mean exact fidelity is $0.727$, compared with $0.440$ for SABRE-best20 and $0.481$ for target-aware SABRE. We observed that fidelity gains came with higher routed two-qubit gate counts and were concentrated in the 5-qubit and 8-qubit circuit families; under the fixed tree action graph, all 10-qubit families favored SABRE-best20. Overall, our results show that calibration-aware learned routing can improve fidelity beyond gate-count-driven compilation.
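The core intuition, that the fewest-SWAP route is not always the highest-fidelity one, can be illustrated without any learning. Weight each coupler by $-3\ln f$ (a SWAP is commonly decomposed into three two-qubit gates of fidelity $f$) and run Dijkstra, so the cheapest path maximises the product of SWAP fidelities. This is a classical baseline sketch with invented calibration numbers, not the paper's PPO router.

```python
import heapq
import math
from typing import Dict, List, Optional, Tuple

Coupler = Tuple[int, int]


def most_reliable_swap_path(fidelity: Dict[Coupler, float], src: int, dst: int
                            ) -> Tuple[Optional[List[int]], float]:
    """Path from src to dst maximising the product of per-coupler SWAP fidelities.

    Each SWAP is modelled as three two-qubit gates, so its fidelity is f**3 and its
    additive cost is -3*ln(f). Returns (path, estimated SWAP-chain fidelity).
    """
    adj: Dict[int, List[Tuple[int, float]]] = {}
    for (u, v), f in fidelity.items():
        w = -3.0 * math.log(f)
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    dist = {src: 0.0}
    prev: Dict[int, int] = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue                     # stale heap entry
        for v, w in adj.get(u, []):
            if d + w < dist.get(v, math.inf):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    if dst not in dist:
        return None, 0.0
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], math.exp(-dist[dst])


# Hypothetical calibration: the 2-hop route 0-1-3 crosses a poor coupler (0, 1).
calibration = {(0, 1): 0.90, (1, 3): 0.99, (0, 2): 0.995, (2, 4): 0.995, (4, 3): 0.995}
path, fid = most_reliable_swap_path(calibration, 0, 3)
# path == [0, 2, 4, 3]; fid is about 0.995**9, above 0.90**3 * 0.99**3 for the 2-hop route
```

A hop-count router would prefer 0-1-3 here, which mirrors the abstract's observation that fidelity gains can come with more routed two-qubit gates.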

2606.17276 2026-06-18 cs.IR cs.LG Version update

On the Memorization Behavior of LLMs in Generative Recommendation: Observations, Implications, and Training Strategies

Sunwoo Kim, Sunkyung Lee, Clark Mingxuan Ju, Donald Loveland, Bhuvesh Kumar, Kijung Shin, Neil Shah, Liam Collins

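Affiliations: KAIST; Sungkyunkwan University; Snap Inc.

AI summary: Studies the memorization tendency of LLMs in generative recommendation, finds that they rely heavily on one-hop memorization, and proposes the IIRG training strategy to learn multi-hop collaborative and semantic relations, substantially improving recommendations for users whose targets cannot be reached through one-hop memorization.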

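AI Chinese abstract (English translation)

Generative recommendation (GR) has become a promising direction for recommender systems. Recently, large language models (LLMs) have been increasingly used for GR, since their rich pretrained knowledge is expected to help them generalize beyond the common user behavior patterns that traditional memorization-oriented baselines can capture. However, existing LLM-based GR work largely ignores LLMs' well-known tendency to memorize, which, if present in LLMs fine-tuned for GR, would limit their use of pretrained knowledge. In this work, we investigate this concern by examining one-hop memorization, i.e., a model recommending items that are direct successors of items in the training data. We show that LLMs do this more often than non-LLM GR models; in fact, most of their gains over GR baselines come from users whose target items can be predicted through one-hop memorization. We intuit that improving performance on the remaining users requires LLMs to learn richer item-item relations beyond one-hop transitions. To this end, we propose IIRG, a novel training strategy that teaches LLMs to capture (1) collaborative relations derived from item co-occurrences across multiple hops in user sequences and (2) semantic relations among items with similar themes, both of which can serve as useful recommendation signals. We show that IIRG significantly outperforms LLMs trained only with standard next-item prediction, with especially large gains for users whose test items are not covered by train-time one-hop transitions.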

English abstract

Generative recommendation (GR) has emerged as a promising direction for recommender systems. Recently, large language models (LLMs) have been increasingly adopted for GR, as their rich pretrained knowledge is expected to help them generalize beyond common user behavior patterns that traditional memorization-oriented baselines can capture. However, existing LLM-based GR works largely ignore LLMs' well-known tendency to memorize, which, if present in LLMs fine-tuned for GR, would restrict their utilization of pretrained knowledge. In this work, we investigate this concern by examining one-hop memorization, where a model recommends items that are direct successors of items in the training data. We show that LLMs do this more than non-LLM-based GR models; in fact, the vast majority of their gains over GR baselines are actually on users whose target items can be predicted through one-hop memorization. We intuit that improving performance on the remaining users requires LLMs to learn richer item-item relations beyond one-hop transitions. To achieve this, we propose IIRG, a novel training strategy that teaches LLMs to capture: (1) collaborative relations derived from item co-occurrences across multiple hops in user sequences, and (2) semantic relations among items with similar themes, both of which can serve as useful recommendation signals. We show that IIRG significantly improves over LLMs trained solely with standard next-item prediction, with especially large gains for users whose test items are not covered by train-time one-hop transitions.
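One-hop memorization can be operationalized by collecting every direct item-to-item transition seen in training and asking whether a test user's target is a recorded successor of their most recent item. The sketch below takes that reading and splits test users accordingly. Conditioning only on the last history item is an assumption; the paper's exact criterion may consider other history items.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Set, Tuple


def one_hop_successors(train_sequences: Sequence[Sequence[Hashable]]) -> Dict[Hashable, Set[Hashable]]:
    """Map each item to the set of items that directly followed it in training."""
    successors: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for seq in train_sequences:
        for current, nxt in zip(seq, seq[1:]):
            successors[current].add(nxt)
    return successors


def split_by_one_hop(test_users: List[Tuple[Sequence[Hashable], Hashable]],
                     successors: Dict[Hashable, Set[Hashable]]):
    """Partition (history, target) pairs into one-hop-memorizable and the rest."""
    memorizable, rest = [], []
    for history, target in test_users:
        if history and target in successors.get(history[-1], set()):
            memorizable.append((history, target))
        else:
            rest.append((history, target))
    return memorizable, rest


train = [["tent", "sleeping bag", "stove"], ["sleeping bag", "headlamp"]]
succ = one_hop_successors(train)
users = [(["boots", "sleeping bag"], "headlamp"), (["tent"], "stove")]
memo, rest = split_by_one_hop(users, succ)
# memo holds the first user (sleeping bag -> headlamp was seen in training);
# rest holds the second (tent was only ever followed by sleeping bag).
```

Reporting metrics separately on the two groups is what exposes whether a model's gains come mostly from recalling seen transitions or from genuinely new item-item relations.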

2606.17846 2026-06-18 cs.RO cs.CV cs.LG Version update

Qwen-RobotManip Technical Report: Alignment Unlocks Scale for Robotic Manipulation Foundation Models

Haoqi Yuan, Zhixuan Liang, Anzhe Chen, Ye Wang, Haoyang Li, Pei Lin, Yiyang Huang, Zixing Lei, Tong Zhang, Jiazhao Zhang, Jie Zhang, Jingyang Fan, Gengze Zhou, Qihang Peng, Chenxu Lv, Xiaoyue Chen, An Yang, Fei Huang, Junyang Lin, Dayiheng Liu, Jingren Zhou, Chenfei Wu, Xiong-Hui Chen

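Affiliations: Qwen Team

AI summary: Proposes Qwen-RobotManip, which enables large-scale co-training on multi-source heterogeneous manipulation data through a unified alignment framework (representation, motion, and behavior dimensions), builds a pretraining corpus of about 38,100 hours, and surpasses prior models in generalization, including zero-shot instruction following and cross-embodiment transfer.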

Comments 44 pages

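AI Chinese abstract (English translation)

Language and multimodal foundation models achieve strong generalization by aligning heterogeneous data under a unified formulation and training at scale. In this report, we investigate whether this scaling recipe can be applied to robotic manipulation to achieve genuine generalization. This is challenging because, unlike text, manipulation data is inherently heterogeneous, expensive to collect, and narrow in diversity, which makes alignment and scale difficult at the same time. We present Qwen-RobotManip, a generalizable vision-language-action foundation model built on Qwen-VL. Qwen-RobotManip introduces a unified alignment framework spanning the representation, motion, and behavioral dimensions of manipulation, which makes large-scale multi-source training coherent rather than conflicting. This alignment capability in turn enables Qwen-RobotManip to absorb manipulation data at a scale that previous training regimes could not sustain. A human-to-robot synthesis pipeline converts first-person hand demonstrations into robot trajectories across 15 platforms, and a rigorous curation pipeline harmonizes heterogeneous datasets. Using only open-source datasets and human videos, without proprietary data collection, Qwen-RobotManip builds a pretraining corpus of about 38,100 hours and exhibits emergent generalization capabilities, including zero-shot instruction following, robustness to perturbations, reactive error recovery, and cross-embodiment transfer. We find that standard benchmarks fail to capture pretraining quality, and therefore adopt OOD settings including RoboCasa365, LIBERO-Plus, EBench, RoboTwin-Clean2Rand, RoboTwin-IF, and RoboTwin-XE. Qwen-RobotManip significantly outperforms prior state-of-the-art models (including π0.5) in all OOD settings, ranks first on RoboChallenge with a 20% relative improvement, and is validated on real-robot platforms including AgileX ALOHA, Franka, UR, and ARX.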

English abstract

Foundation models in language and multimodality achieve strong generalization by aligning heterogeneous data under a unified formulation and training at scale. In this report, we investigate whether this scaling recipe can be applied to robotic manipulation to achieve genuine generalization. This is challenging because, unlike text, manipulation data is heterogeneous by nature, expensive to collect, and narrow in diversity, making alignment and scale simultaneously difficult. We present Qwen-RobotManip, a generalizable Vision-Language-Action foundation model built on Qwen-VL. Qwen-RobotManip introduces a unified alignment framework across the representation, motion, and behavioral dimensions of manipulation, making large-scale multi-source training coherent rather than conflicting. This alignment capability in turn enables Qwen-RobotManip to absorb manipulation data at a scale that prior training regimes could not sustain. A human-to-robot synthesis pipeline converts egocentric hand demonstrations into robot trajectories across 15 platforms, and a rigorous curation pipeline harmonizes heterogeneous datasets. Using only open-source datasets and human videos without proprietary data collection, Qwen-RobotManip constructs a ~38,100-hour pretraining corpus and exhibits emergent generalization capabilities, including zero-shot instruction following, robustness to perturbations, reactive error recovery, and cross-embodiment transfer. We find that standard benchmarks fail to capture pretraining quality and instead adopt OOD settings including RoboCasa365, LIBERO-Plus, EBench, RoboTwin-Clean2Rand, RoboTwin-IF, and RoboTwin-XE. Qwen-RobotManip substantially outperforms prior state-of-the-art models, including π0.5, across all OOD settings, ranks 1st in RoboChallenge with a 20% relative improvement, and is validated on real-robot platforms including AgileX ALOHA, Franka, UR, and ARX.

2606.18105 2026-06-18 cs.NI cs.LG 版本更新

OmniPlan: An Adaptive Framework for Timely and Near-Optimal Network Planning Optimization

OmniPlan:一种用于及时且近乎最优的网络规划优化的自适应框架

Longlong Zhu, Jiashuo Yu, Zedi Chen, Yuhan Wu, Zhifan Jiang, Yuchen Xian, Yimeng Liu, Jiajie Su, Shaopeng Zhou, Xingyuan Li, Hongyan Liu, Xuan Liu, Dong Zhang, Chunming Wu, Xiang Chen

发表机构 * Zhejiang University(浙江大学) Fuzhou University(福州大学) Yangzhou University(扬州大学) The State Key Laboratory of Blockchain and Data Security(区块链与数据安全国家重点实验室) College of Computer Science and Technology(计算机科学与技术学院)

AI总结 提出OmniPlan自适应框架,利用大语言模型解析用户意图,通过混合专家架构动态选择MIP求解器、启发式算法或深度强化学习模型,实现网络规划优化的及时性与近乎最优性,在分布式机器学习推理卸载任务中延迟降低97.8%,资源消耗降低11.5%。

Comments Accepted by ACM KDD 2026

AI中文摘要

网络规划优化是跨多个领域(包括交通系统、通信网络和电网)的基本问题。它需要在复杂约束下同时优化多个相互竞争的目标。现有的网络规划优化框架依赖混合整数规划(MIP)求解器、启发式算法和深度强化学习(DRL)模型来计算规划决策。然而,它们缺乏对多样化和动态用户意图的有效适应性,从而导致执行时间与最优性之间的权衡。在本文中,我们提出OmniPlan,一种自适应框架,在网络规划优化中同时实现及时性和近乎最优性。为了实现现有解决方案所缺乏的适应性,OmniPlan采用基于大语言模型(LLM)的解释器,将异构的自然语言意图转换为统一且可量化的用户偏好向量。然后,它采用混合专家架构,集成MIP求解器、启发式算法和DRL模型作为专门专家,OmniPlan通过动态选择及时且近乎最优的专家来适应多样化的意图。最后,它包含一个基于DRL的专家配置模块,该模块微调优化目标权重,使规划决策与用户特定偏好对齐。我们使用代表性的真实工作负载(即分布式机器学习(ML))评估OmniPlan,其中我们利用OmniPlan将广泛的ML推理任务(例如决策树、SVM、朴素贝叶斯、XGBoost和随机森林)卸载到硬件设备网络。我们在真实测试平台上的实验表明,OmniPlan为真实ML推理任务实现了近乎最优且低执行时间的卸载,延迟降低高达97.8%,网络设备资源消耗降低高达11.5%。

英文摘要

Network planning optimization is a fundamental problem across diverse domains, including transportation systems, communication networks, and power grids. It requires simultaneous optimization of multiple competing objectives under complex constraints. Existing network planning optimization frameworks rely on mixed integer programming (MIP) solvers, heuristics, and deep reinforcement learning (DRL) models to compute planning decisions. However, they lack effective adaptability to diverse and dynamic user intents, thus leading to the trade-off between execution time and optimality. In this paper, we propose OmniPlan, an adaptive framework that achieves both timeliness and near-optimality in network planning optimization. To achieve the adaptability lacking in existing solutions, OmniPlan employs a large language model (LLM)-based interpreter to convert heterogeneous natural-language intents into a unified and quantifiable user-preference vector. Then it employs a mixture-of-experts architecture that integrates MIP solvers, heuristics, and DRL models as specialized experts, where OmniPlan adapts to diverse intents by dynamically selecting timely and near-optimal experts. Finally, it incorporates a DRL-based expert configuration module that fine-tunes optimization objective weights to align planning decisions with user-specific preferences. We evaluate OmniPlan with a representative real-world workload, i.e., distributed machine learning (ML), where we leverage OmniPlan to offload a wide spectrum of ML inference tasks, e.g., decision trees, SVM, naive Bayes, XGBoost, and random forests, onto a network of hardware devices. Our experiments on a real-world testbed indicate that OmniPlan achieves near-optimal and low-execution-time offloading for real-world ML inference tasks, reducing latency by up to 97.8% and network device resource consumption by up to 11.5%.
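
To make the expert-selection step concrete, here is a minimal Python sketch under stated assumptions. The preference vector is reduced to two hypothetical weights, time versus optimality. Each expert carries a hypothetical profile of estimated runtime and optimality gap. The names `Expert` and `select_expert`, and all the numbers, are illustrative only and are not OmniPlan's interface. The paper's LLM interpreter, learned routing and DRL-based weight tuning are not reproduced.

```python
from dataclasses import dataclass

@dataclass
class Expert:
    name: str
    est_time_s: float  # hypothetical estimated execution time in seconds
    est_gap: float     # hypothetical estimated optimality gap, 0.0 = optimal

def select_expert(experts, preference, time_budget_s):
    """Pick the expert with the lowest preference-weighted cost.

    preference: (w_time, w_gap), assumed to come from an intent interpreter.
    Experts whose estimated time exceeds the budget are skipped.
    """
    w_time, w_gap = preference
    max_time = max(e.est_time_s for e in experts)
    feasible = [e for e in experts if e.est_time_s <= time_budget_s]
    if not feasible:
        # nothing fits the budget: fall back to the fastest expert
        return min(experts, key=lambda e: e.est_time_s)

    def cost(e):
        # normalise runtime to [0, 1] so both terms are on comparable scales
        return w_time * (e.est_time_s / max_time) + w_gap * e.est_gap

    return min(feasible, key=cost)

experts = [
    Expert("mip_solver", est_time_s=120.0, est_gap=0.0),
    Expert("heuristic", est_time_s=0.5, est_gap=0.15),
    Expert("drl_policy", est_time_s=2.0, est_gap=0.05),
]
print(select_expert(experts, (0.8, 0.2), time_budget_s=10.0).name)  # drl_policy
```

With a latency-leaning preference `(0.8, 0.2)` and a 10 s budget, this picks `drl_policy`. With `(0.02, 0.98)` and a generous budget, it picks `mip_solver`. A learned router could replace the fixed linear cost used here.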

13. 其他/综合机器学习 19 篇

2412.16468 2026-06-18 cs.LG 版本更新

The Road to Artificial SuperIntelligence: A Comprehensive Survey of Superalignment

通往人工超级智能之路:超级对齐的全面综述

HyunJin Kim, DongHyun Ryu, Xiaoyuan Yi, Jing Yao, Jianxun Lian, Muhua Huang, Shitong Duan, JinYeong Bak, Xing Xie

发表机构 * Microsoft Research Asia(微软亚洲研究院) Sungkyunkwan University(成均馆大学) Stanford University(斯坦福大学) Fudan University(复旦大学)

AI总结 本文综述了超级对齐问题,通过分析可扩展监督范式(夹层、自我增强和弱到强泛化)及其局限性,探讨了监督、控制和管理人工超级智能的挑战与路径。

Comments 24 pages

AI中文摘要

大型语言模型(LLMs)的出现引发了关于人工超级智能(ASI)的讨论,这是一种假设性的、超越人类智能的AI系统。尽管ASI仍处于假设阶段且远超出当前AI能力,但讨论其潜力、探索其可行性和潜在风险对于未来AI系统的发展至关重要。超级对齐的概念源于可扩展监督,后者研究当直接人类监督不足时如何监督日益强大的AI系统。本文聚焦于超级对齐问题:“监督、控制和管理人工超级智能的过程”。我们首先回顾可扩展监督范式——夹层、自我增强和弱到强泛化,然后通过可能性和不可能性的视角分析当前范式的局限性,讨论关键挑战,并提出未来AI系统安全持续改进的路径。

英文摘要

The emergence of large language models (LLMs) has sparked discussion on Artificial Superintelligence (ASI), a hypothetical AI system that surpasses human intelligence. Although ASI remains hypothetical and far beyond current AI capabilities, discussing its potential and exploring its feasibility and potential risks is critical for the development of future AI systems. The idea of superalignment originates from scalable oversight, which studies how to supervise increasingly capable AI systems when direct human supervision becomes insufficient. In this paper, we focus on the superalignment problem: "The process of supervising, controlling, and governing artificial superintelligence." We first review scalable oversight paradigms (Sandwiching, Self-Enhancement, and Weak-to-Strong Generalization), then analyze the limitations of current paradigms through the lens of possibility and impossibility, discuss key challenges, and propose pathways for the safe and continual improvement of future AI systems.

2605.08934 2026-06-18 cs.LG 版本更新

From Mechanistic to Compositional Interpretability

从机制到组合可解释性

Ward Gauderis, Thomas Dooms, Steven T. Homer, Kola Ayonrinde, Geraint A. Wiggins

发表机构 * UK AI Security Institute(英国人工智能安全研究所)

AI总结 本文提出组合可解释性框架,通过范畴论原理解决机制可解释性无法客观验证的问题,将解释质量分解为忠实度和复杂度,引入压缩细化方法实现模型简化,理论证明简洁性准则保障人类对齐的解释。

AI中文摘要

机制可解释性旨在通过逆向工程神经模型习得的计算结构、将其分解为人类可理解的组件来解释模型行为,但由于缺乏正式框架,机制解释无法被客观地验证、比较或组合。本文引入组合可解释性,这是一个基于组合性和最小描述长度原则的范畴论框架。组合解释是一对语法映射和语义映射,二者必须满足交换性,以保证模型分解与其观测行为之间的一致性。本文将解释质量分解为忠实度和复杂度两项度量,从而将可解释性表述为约束优化问题,并引入压缩细化方法,在不改变模型功能的前提下系统地将模型重构为更简单的部分。最后,本文推导出一个简洁性准则,在该准则下语法压缩在理论上能保证更简洁、与人类对齐的解释。该框架将主流机制方法归为细化的子类,并澄清了为何其可压缩性启发式方法往往与人类可解释性一致。本文为自动化发现和评估机制解释提供了可测量、可优化的蓝图。

英文摘要

Mechanistic interpretability aims to explain neural model behaviour by reverse-engineering learned computational structure into human-understandable components. Without a formal framework, however, mechanistic explanations cannot be objectively verified, compared, or composed. We introduce compositional interpretability, a category-theoretic framework grounded in the principles of compositionality and minimum description length. Compositional interpretations are pairs of syntactic and semantic mappings that must commute to enforce consistency between a model's decomposition and its observed behaviour. We deconstruct explanation quality into measures of faithfulness and complexity to cast interpretability as a constrained optimisation problem, and introduce compressive refinement to systematically restructure models into simpler parts without altering their function. Finally, we derive a parsimony criterion under which syntactic compression theoretically guarantees more concise, human-aligned explanations. Our framework situates prominent mechanistic methods as subclasses of refinement, and clarifies why their compressibility heuristics tend to align with human interpretability. Our work provides a measurable, optimisable blueprint for automating the discovery and evaluation of mechanistic explanations.
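
The constrained-optimisation view (keep faithfulness above a threshold, minimise complexity) can be shown with a deliberately simple Python sketch. It assumes that faithfulness is agreement with the model on a finite input set. It also assumes that each candidate explanation carries a hand-assigned description length in bits. The functions `faithfulness` and `select_explanation` are hypothetical. The paper's category-theoretic machinery and compressive refinement are not modelled.

```python
def faithfulness(model, explanation, inputs):
    """Fraction of inputs on which the explanation reproduces the model's output."""
    return sum(model(x) == explanation(x) for x in inputs) / len(inputs)

def select_explanation(model, candidates, inputs, tau):
    """Lowest-complexity candidate whose faithfulness is at least tau.

    candidates: (name, fn, complexity_bits) triples; complexity_bits is a
    hand-assigned description length used here as a stand-in for complexity.
    Returns None when no candidate meets the constraint.
    """
    admissible = [c for c in candidates if faithfulness(model, c[1], inputs) >= tau]
    return min(admissible, key=lambda c: c[2]) if admissible else None

def model(x):
    # a "black box" that, unknown to us, returns 1 for even x and 0 for odd x
    return (3 * x + 1) % 2

inputs = range(16)
table = {x: model(x) for x in inputs}
candidates = [
    ("lookup_table", lambda x: table[x], 64),
    ("is_even", lambda x: int(x % 2 == 0), 4),
    ("always_one", lambda x: 1, 1),
]
print(select_explanation(model, candidates, inputs, tau=0.95)[0])  # is_even
```

At `tau=0.95`, `is_even` is chosen because it is perfectly faithful and shorter than the lookup table. At `tau=0.5`, the one-bit constant wins. This mirrors, in toy form, the faithfulness/complexity trade-off described above.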

2410.21258 2026-06-18 quant-ph cs.CC cs.LG 版本更新

Provable quantum speedups for computing persistence in topological data analysis

可证明的量子加速用于拓扑数据分析中的持久性计算

Casper Gyurik, Alexander Schmidhuber, Robbie King, Vedran Dunjko, Ryu Hayakawa

发表机构 * applied Quantum algorithms (aQa), Leiden University, 2300 RA Leiden, The Netherlands Center for Theoretical Physics, Massachusetts Institute of Technology, Cambridge, USA Department of Computing Yukawa Institute for Theoretical Physics & The Hakubi Center, Kyoto University, Japan

AI总结 提出一种高效量子算法,用于判断拓扑数据分析中洞的持久性,并证明该问题为BQP_1-hard,暗示在标准复杂性假设下存在指数级量子加速。

Comments 17 pages

Journal ref
PRX Quantum 7, 020361 (2026)
AI中文摘要

拓扑数据分析(TDA)旨在通过检查数据拓扑中空洞的数量和持久性,从数据集中提取对噪声鲁棒的特征。我们为与TDA核心任务密切相关的一个计算问题提供了高效的量子算法——判断给定空洞是否在不同长度尺度上持续存在。此外,我们证明该问题本身是$\mathsf{BQP}_1$-hard的,意味着经典解决方案极不可能;这与所有先前的TDA量子方法形成对比,在这些方法中,问题对于量子计算机也是难解的,或者严格的经典困难性证明仍然悬而未决。这一结果表明,在标准复杂性理论假设下,该问题存在指数级的量子加速。我们的方法依赖于将空洞的持久性编码到引导稀疏哈密顿量问题的一个变体中,其中引导态由空洞的调和代表元构造而成。

英文摘要

Topological data analysis (TDA) aims to extract noise-robust features from a data set by examining the number and persistence of holes in its topology. We provide an efficient quantum algorithm for a computational problem closely related to a core task in TDA -- determining whether a given hole persists across different length scales. Further, we prove the problem itself is $\mathsf{BQP}_1$-hard, implying that a classical solution is extremely unlikely; this stands in contrast to all previous quantum approaches to TDA, where the problems were also intractable for quantum computers, or where a rigorous proof of classical hardness still remains open. This result implies an exponential quantum speedup for this problem under standard complexity-theoretic assumptions. Our approach relies on encoding the persistence of a hole in a variant of the guided sparse Hamiltonian problem, where the guiding state is constructed from a harmonic representative of the hole.
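
For readers new to "holes": the first Betti number counts independent 1-dimensional holes. Persistence asks whether a hole survives as the scale grows, for example whether it is later filled in. The classical Python sketch below computes Betti-1 over GF(2) from boundary matrices for tiny simplicial complexes. It only illustrates the object being studied and is not the paper's quantum algorithm. Edges and triangles are assumed to be given as sorted vertex tuples.

```python
from itertools import combinations

def gf2_rank(vectors):
    """Rank over GF(2) of vectors encoded as int bitmasks (Gaussian elimination)."""
    pivots = {}  # leading bit -> basis vector with that leading bit
    for v in vectors:
        while v:
            h = v.bit_length() - 1
            if h not in pivots:
                pivots[h] = v
                break
            v ^= pivots[h]  # clear the leading bit and keep reducing
    return len(pivots)

def betti_1(vertices, edges, triangles):
    """First Betti number over GF(2); edges/triangles are sorted vertex tuples."""
    vert_index = {v: i for i, v in enumerate(vertices)}
    edge_index = {e: i for i, e in enumerate(edges)}
    # boundary of edge (a, b) = a + b, as a bitmask over vertices
    d1 = [(1 << vert_index[a]) | (1 << vert_index[b]) for a, b in edges]
    # boundary of triangle (a, b, c) = ab + ac + bc, as a bitmask over edges
    d2 = [sum(1 << edge_index[f] for f in combinations(t, 2)) for t in triangles]
    # b1 = dim ker(d1) - rank(d2) = (#edges - rank d1) - rank d2
    return len(edges) - gf2_rank(d1) - gf2_rank(d2)

edges = [(0, 1), (0, 2), (1, 2)]
print(betti_1([0, 1, 2], edges, []))           # 1: hollow triangle
print(betti_1([0, 1, 2], edges, [(0, 1, 2)]))  # 0: hole filled in
```

In this example the hole is "born" when the third edge appears and "dies" once the 2-simplex is added at a larger scale.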

2604.23716 2026-06-18 cs.AI cs.IT cs.LG cs.MA math.IT 版本更新

Information-Theoretic Measures in AI: A Practical Decision Guide

人工智能中的信息论度量:实用决策指南

Nikolaos Al. Papadopoulos, Konstantinos E. Psannis

发表机构 * Department of Applied Informatics, University of Macedonia(马其顿大学应用信息系)

AI总结 本文为七种信息论度量提供实用决策框架,围绕每个度量的三个关键问题:回答的问题与AI场景、适合的估计器、最危险的误用,并附有流程图和决策表。

Comments 25 pages, 2 tables, 1 figure. Submitted to Entropy (MDPI)

AI中文摘要

信息论(IT)度量在人工智能中无处不在:熵驱动决策树分裂和不确定性量化,交叉熵是默认的分类损失,互信息支撑表示学习和特征选择,转移熵揭示动态系统中的有向影响。第二类较不成熟的度量——整合信息(Phi)、有效信息(EI)和自主性——已出现用于表征智能体复杂性。尽管被广泛采用,度量选择常常与估计器假设、失败模式和安全的推断主张脱节。本文为所有七种度量提供了一个实用决策框架,围绕每个度量的三个指导性问题组织:(i)该度量回答什么问题,在何种AI背景下;(ii)哪种估计器适合数据类型和维度;(iii)最危险的误用是什么。该框架通过两个互补的人工制品实现:度量选择流程图和主决策表。我们涵盖每个度量的AI/ML和决策智能体应用领域,并使用标准化桥接框将IT量与认知构造联系起来。三个工作示例展示了该框架在具体从业者场景中的应用,涵盖表示学习、时间影响分析和进化智能体复杂性。

英文摘要

Information-theoretic (IT) measures are ubiquitous in artificial intelligence: entropy drives decision-tree splits and uncertainty quantification, cross-entropy is the default classification loss, mutual information underpins representation learning and feature selection, and transfer entropy reveals directed influence in dynamical systems. A second, less consolidated family of measures has emerged for characterizing agent complexity: integrated information (Phi), effective information (EI), and autonomy. Despite wide adoption, measure selection is often decoupled from estimator assumptions, failure modes, and safe inferential claims. This paper provides a practical decision framework for all seven measures, organized around three prescriptive questions for each: (i) what question does the measure answer and in which AI context; (ii) which estimator is appropriate for the data type and dimensionality; and (iii) what is the most dangerous misuse. The framework is operationalized in two complementary artifacts: a measure-selection flowchart and a master decision table. We cover both AI/ML and decision-making agent application domains per measure, with standardized Bridge Boxes linking IT quantities to cognitive constructs. Three worked examples illustrate the framework on concrete practitioner scenarios spanning representation learning, temporal influence analysis, and evolved agent complexity.
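
As a concrete baseline for the estimator question (ii), here is a minimal sketch of the plug-in (maximum-likelihood) estimators for entropy and mutual information on discrete samples. This is one simple choice among many. The guide's recommended estimators for continuous or high-dimensional data are not shown.

```python
import math
from collections import Counter

def entropy(samples, base=2.0):
    """Plug-in Shannon entropy of a sequence of discrete (hashable) samples."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log(c / n, base) for c in counts.values())

def mutual_information(xs, ys, base=2.0):
    """Plug-in estimate of I(X;Y) = H(X) + H(Y) - H(X,Y) from paired samples."""
    joint = list(zip(xs, ys))
    return entropy(xs, base) + entropy(ys, base) - entropy(joint, base)

print(entropy([0, 1, 0, 1]))                          # about 1 bit
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))  # about 0: independent
```

Plug-in estimates can be noticeably biased on small samples. Estimator caveats of this kind are what question (ii) targets.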

2605.17131 2026-06-18 cs.CV cs.AI cs.LG 版本更新

A Survey on Deep Learning Architectures for Point Cloud Classification and Segmentation

针对点云分类和分割的深度学习架构系统性调研

Minhas Kamal, Hiranya Garbha Kumar, Balakrishnan Prabhakaran

发表机构 * State University of New York at Albany(纽约州立大学阿尔巴尼分校)

AI总结 本文系统性地探讨了点云分类和分割中的深度学习架构,分析了点云数据的结构特性,分类了不同架构的工作,并评估了其在主流基准上的性能,同时指出了开放挑战和未来方向。

Comments We review a decade of advances in point cloud processing: we trace the evolution of the field from its foundational roots to the modern SOTA, analyze how diverse architectures overcome the inherent geometric challenges of 3D data, and map out critical research gaps alongside promising future directions. GitHub: https://github.com/MinhasKamal/DeepLearningForPointCloud

Journal ref
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 2026
AI中文摘要

点云因其简洁性和几何保真度而成为表示3D形状和场景最广泛采用的格式。然而,其固有的无序和不规则性质,加剧了传感器噪声和遮挡的影响,给基于机器学习的方法带来了独特的挑战。为应对这些问题,已开发出多种策略,包括转换为有序格式、提取局部几何特征以及基于排列不变或自注意力的处理方法。在本文中,我们的重点是深度学习模型在3D视觉三个基本任务中的应用:点云分类、部分分割和语义分割。我们首先正式定义点云数据,然后深入讨论其结构特性。接着,我们根据其骨干结构对重要工作进行分类,并评估其在流行基准上的性能。除了经验比较外,我们还提供了架构创新和局限性的见解。我们还概述了3D点云理解中的开放挑战和有前途的未来方向。

英文摘要

Point cloud stands as the most widely adopted format for representing 3D shapes and scenes due to its simplicity and geometric fidelity. However, its inherent unordered and irregular nature, exacerbated by sensor noise and occlusions, introduces unique challenges for machine learning based methodologies. To combat these issues, diverse strategies have been developed, including converting to a format that has orderliness, extracting local geometry, and permutation-invariant or self-attention-based processing. In this paper, our focus is directed towards deep learning models for three fundamental tasks in 3D vision: point cloud classification, part segmentation, and semantic segmentation. We begin by formally defining point cloud data, followed by an in-depth discussion on its structural characteristics. Then, we categorize notable works based on their backbone structure and evaluate their performance on popular benchmarks. Beyond empirical comparison, we offer insights into architectural innovations and limitations. We also outline open challenges and promising future directions for 3D point cloud understanding.
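
One well-known way to obtain the permutation invariance mentioned above is a PointNet-style design. The same function is applied to every point, and the results are aggregated with a symmetric operation such as max pooling. Below is a minimal untrained Python sketch. It assumes a single linear + ReLU layer with hypothetical hand-set weights.

```python
import random

def shared_point_map(point, weights):
    """Same tiny linear + ReLU map applied to every (x, y, z) point."""
    # weights: one 3-tuple per output feature (hypothetical, untrained values)
    return [max(0.0, sum(w * c for w, c in zip(row, point))) for row in weights]

def global_feature(points, weights):
    """PointNet-style encoder: shared per-point map, then max-pool over points."""
    per_point = [shared_point_map(p, weights) for p in points]
    # max over points, feature by feature; the result ignores point order
    return [max(column) for column in zip(*per_point)]

points = [(0.1, 0.2, 0.3), (1.0, -0.5, 0.2), (-0.3, 0.8, 0.9)]
weights = [(1.0, 0.0, 0.0), (0.0, 1.0, -1.0), (0.5, 0.5, 0.5)]
shuffled = points[:]
random.Random(0).shuffle(shuffled)
print(global_feature(points, weights) == global_feature(shuffled, weights))  # True
```

Because `max` ignores order, shuffling the input points leaves the global feature unchanged.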

2605.25929 2026-06-18 cs.MA cs.LG 版本更新

Multi-Agent Systems are Mixtures of Experts: Who Becomes an Influencer?

多智能体系统是专家混合:谁成为影响者?

Franka Bause, Jonas Niederle, Martin Pawelczyk, Rebekka Burkholz

发表机构 * CISPA Helmholtz Center for Information Security(CISPA亥姆霍兹信息安全中心) Faculty of Computer Science, University of Vienna(维也纳大学计算机科学学院)

AI总结 本文通过Friedkin-Johnsen意见动力学模型分析多智能体LLM协商机制,揭示输入依赖的FJ参数使系统成为专家混合,并探讨基于自信度、感知自信度和初始观点对齐的影响者形成机制。

Comments Accepted at the 2nd Workshop on Compositional Learning at ICML 2026

AI中文摘要

多智能体LLM协商的有效性不仅取决于智能体的个体预测,还取决于它们如何沟通和协作。我们通过Friedkin-Johnsen (FJ)意见动力学的视角研究这一机制,这是一个可处理的模型,用于分析多智能体系统中的固执、影响力和意见变化,并捕捉经验观察到的协商模式。我们表明FJ参数是输入依赖的,将多智能体协商转变为专家混合。这一视角意味着,当路由反映智能体能力时,多智能体系统可以胜过单个智能体和静态集成。由于能力在实践中是潜在的,我们分析了影响力如何通过可观察的代理建立:智能体的自我评估自信度、感知自信度以及与其他智能体观点的初始对齐。

英文摘要

The effectiveness of multi-agent LLM deliberation depends not only on the agents' individual predictions, but also on how they communicate and collaborate. We study this mechanism through the lens of Friedkin-Johnsen (FJ) opinion dynamics, a tractable model for analyzing stubbornness, influence, and opinion change in multi-agent systems that captures empirically observed deliberation patterns. We show that the FJ parameters are input-dependent, turning multi-agent deliberation into a mixture of experts. This perspective implies that multi-agent systems can outperform single agents and static ensembles when routing reflects agent competence. Since competence is latent in practice, we analyze how influence is established through observable proxies: agents' self-assessed confidence, their perceived confidence, and initial alignment with other agents' views.
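
For reference, the classical Friedkin-Johnsen update is x(t+1) = ΛWx(t) + (I - Λ)x(0). Here W is a row-stochastic influence matrix and Λ is a diagonal matrix of per-agent susceptibilities. Below is a minimal Python sketch with scalar opinions. Treating each opinion as a confidence score is an assumption for illustration. The paper's input-dependent estimation of these parameters is not reproduced.

```python
def friedkin_johnsen(W, lam, x0, steps=200):
    """Iterate x(t+1) = diag(lam) W x(t) + (I - diag(lam)) x0.

    W: row-stochastic influence matrix (list of lists).
    lam: susceptibility of each agent in [0, 1]; 1 - lam[i] is its stubbornness.
    x0: initial opinions (here assumed to be scalar confidence scores).
    """
    n = len(x0)
    x = list(x0)
    for _ in range(steps):
        x = [
            lam[i] * sum(W[i][j] * x[j] for j in range(n)) + (1 - lam[i]) * x0[i]
            for i in range(n)
        ]
    return x

# Agent 0 is fully stubborn; agent 1 is fully susceptible and listens only to agent 0.
print(friedkin_johnsen([[1.0, 0.0], [1.0, 0.0]], [0.0, 1.0], [0.9, 0.1]))  # [0.9, 0.9]
```

At equilibrium, each final opinion is a weighted combination of the initial opinions. If W and the susceptibilities depend on the input, those weights act like a router over agents, which is the mixture-of-experts reading described above.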

2606.17454 2026-06-18 cs.AI cs.LG 版本更新

Dissecting model behavior through agent trajectories

通过智能体轨迹剖析模型行为

Gaurav Gupta, Vatshank Chaturvedi, Jun Huan, Anoop Deoras

发表机构 * AWS AI Labs(AWS人工智能实验室)

AI总结 本文提出“意图-执行差距”概念,并设计Simple Strands Agent(SSA)框架,通过分析138k条轨迹揭示模型在自主问题解决中的行为差异。

Comments 106 pages, 50 Figures, 16 Tables

AI中文摘要

AI智能体性能不仅仅是一个建模问题,它本质上是一个系统问题。模型的高级能力通过智能体框架(harness)实现。因此,模型假设与框架行为之间的差距很容易阻止模型的全部能力转化为智能体性能。我们将此形式化为“意图-执行差距”:模型意图与框架执行之间的不匹配,反之亦然。我们认为,最小化这种意图-执行差距与框架设计的其他方面(如工具和执行循环)同样重要。为了说明这种框架-模型对齐的影响,我们开发了一个简单且可定制的框架,称为“Simple Strands Agent”(SSA)。SSA旨在找到跨不同模型家族(如Claude、Gemini、GPT、Grok、Qwen)通用的常见模式,以及少量模型特定的偏好。我们做出两个贡献:(i)我们在流行的智能体基准测试(SWE-Pro、SWE-Verified和Terminal-Bench-2)上复现或改进了不同模型提供商家族报告的pass@1性能;(ii)基于对SSA生成的138k条轨迹的分析,我们超越了前沿模型之间通常相对均匀的pass@1数字。通过在代码状态空间中表示智能体轨迹,我们观察到问题解决行为中的模型级差异。更细粒度的指标,如编辑频率、测试活动和阶段转换,揭示了单个模型如何在自主问题解决的不同阶段分配努力。

英文摘要

AI agent performance is not just a modeling problem; it is fundamentally a systems problem. The advanced capabilities of models are realized through agent harnesses. Therefore, a gap between model assumptions and harness behavior can easily prevent the model's full capabilities from translating into agent performance. We formalize this as the 'intent-execution' gap: the mismatch between what the model intends and what the harness executes, and vice versa. We argue that minimizing this intent-execution gap is as important as other aspects of harness design such as tools and execution loops. To illustrate the impact of this harness-model alignment, we develop a simple and customizable harness called 'Simple Strands Agent' (SSA). SSA aims to find the bulk of common patterns which generalize across different model families (such as Claude, Gemini, GPT, Grok, Qwen), as well as a small number of model-specific preferences. We make two contributions: (i) we reproduce or improve on the pass@1 performance reported by diverse model-provider families on popular agentic benchmarks (SWE-Pro, SWE-Verified and Terminal-Bench-2), and (ii) building on an analysis of 138k trajectories generated by SSA, we look beyond the pass@1 numbers, which tend to be relatively even across frontier models. By representing agent trajectories in code state-spaces, we observe model-level differences in problem-solving behavior. Finer-grained metrics such as edit frequency, testing activity, and phase-transitions reveal how individual models allocate effort across different stages of autonomous problem solving.
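
The finer-grained metrics can be sketched over a simplified trajectory format. The sketch assumes each trajectory is reduced to a list of action labels. The action-to-phase mapping `PHASE_OF` and the helper names are hypothetical. SSA's actual trajectory schema and its code state-space representation are not shown.

```python
from collections import Counter

# Hypothetical mapping from harness actions to coarse problem-solving phases.
PHASE_OF = {"read": "explore", "search": "explore",
            "edit": "implement", "test": "verify", "run": "verify"}

def trajectory_metrics(actions):
    """Summarise one trajectory given as a list of action labels."""
    n = len(actions)
    counts = Counter(actions)
    phases = [PHASE_OF.get(a, "other") for a in actions]
    # a phase transition is any step whose phase differs from the previous step's
    transitions = sum(1 for prev, cur in zip(phases, phases[1:]) if prev != cur)
    return {
        "edit_frequency": counts["edit"] / n,
        "testing_activity": (counts["test"] + counts["run"]) / n,
        "phase_transitions": transitions,
    }

def pass_at_1(resolved):
    """pass@1 with a single attempt per task: the fraction of tasks resolved."""
    return sum(resolved) / len(resolved)

metrics = trajectory_metrics(["read", "search", "edit", "test", "edit", "test"])
print(metrics)
```

In the example, a third of the steps are edits. The agent changes phase four times: explore to implement to verify, then back to implement and verify.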

2510.15300 2026-06-18 cs.LG 版本更新

DFCA: Decentralized Federated Clustering Algorithm

Jonas Kirch, Sebastian Becker, Tiago Koketsu Rodrigues, Stefan Harmeling

发表机构 * Fraunhofer Institute for Software and Systems Engineering(弗劳恩霍夫软件与系统工程研究所) Lamarr Institute for Machine Learning and AI(拉马尔人工智能与机器学习研究所)

英文摘要

Clustered Federated Learning has emerged as an effective approach for handling heterogeneous data across clients by partitioning them into clusters with similar or identical data distributions. However, most existing methods, including the Iterative Federated Clustering Algorithm (IFCA), rely on a central server to coordinate model updates, which creates a bottleneck and a single point of failure, limiting their applicability in more realistic decentralized learning settings. In this work, we introduce DFCA, a fully decentralized clustered FL algorithm that enables clients to collaboratively train cluster-specific models without central coordination. DFCA uses a sequential running average to aggregate models from neighbors as updates arrive, providing a communication-efficient alternative to batch aggregation while maintaining clustering performance. Our experiments on various datasets demonstrate that DFCA outperforms other decentralized algorithms and performs comparably to centralized IFCA, even under sparse connectivity, highlighting its robustness and practicality for dynamic real-world decentralized networks.
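
A sequential running average lets a client fold in each neighbour's model as it arrives, instead of waiting for a full batch. The update is avg_k = avg_{k-1} + (m_k - avg_{k-1}) / k. The Python sketch below pairs that update with IFCA-style cluster assignment, which picks the cluster model with the lowest local loss. Models are plain lists of floats. How DFCA weights the client's own model and handles several clusters at once is not modelled here. The function names are hypothetical.

```python
def running_average_update(avg, count, incoming):
    """Fold one neighbour's parameter vector into a running mean.

    avg_k = avg_{k-1} + (incoming - avg_{k-1}) / k
    """
    count += 1
    avg = [a + (m - a) / count for a, m in zip(avg, incoming)]
    return avg, count

def pick_cluster(cluster_models, loss_fn):
    """IFCA-style cluster identity: index of the model with the lowest local loss."""
    losses = [loss_fn(m) for m in cluster_models]
    return losses.index(min(losses))

avg, count = [0.0, 0.0], 0
for neighbour_model in ([1.0, 2.0], [3.0, 4.0], [5.0, 6.0]):  # arriving one by one
    avg, count = running_average_update(avg, count, neighbour_model)
print(avg, count)  # [3.0, 4.0] 3
```

The result matches batch averaging of the three models, but each model can be discarded as soon as it is folded in.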

2601.18637 2026-06-18 quant-ph cs.LG stat.ML 版本更新

Universality of Many-body Projected Ensemble for Learning Quantum Data Distribution

Quoc Hoan Tran, Koki Chinzei, Yasuhiro Endo, Hirotaka Oshima

发表机构 * Quantum Laboratory, Fujitsu Research, Fujitsu Limited, Kawasaki, Kanagawa 211-8588, Japan(富士通量子实验室,富士通研究,富士通株式会社,川崎,神奈川县211-8588,日本)

Comments 21 pages, 6 figures (added Github repository)

Journal ref
IJCNN 2026
英文摘要

Generating quantum data by learning the underlying quantum distribution poses challenges in both theoretical and practical scenarios, yet it is a critical task for understanding quantum systems. A fundamental question in quantum machine learning (QML) is the universality of approximation: whether a parameterized QML model can approximate any quantum distribution. We address this question by proving a universality theorem for the Many-body Projected Ensemble (MPE) framework, a method for quantum state design that uses a single many-body wave function to prepare random states. This demonstrates that MPE can approximate any distribution of pure states within a 1-Wasserstein distance error. This theorem provides a rigorous guarantee of universal expressivity, addressing key theoretical gaps in QML. For practicality, we propose an Incremental MPE variant with layer-wise training to improve the trainability. Numerical experiments on clustered quantum states and quantum chemistry datasets validate MPE's efficacy in learning complex quantum data distributions.
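
The projected-ensemble construction itself is simple to state. Prepare one pure state on A ⊗ B, measure B in the computational basis, and keep the conditional states of A together with their outcome probabilities. A stdlib-only Python sketch for small state vectors follows. The basis ordering (index = a * dim_b + b) is an assumption. The paper's trainable MPE and Incremental MPE are not implemented.

```python
import math

def projected_ensemble(psi, dim_a, dim_b):
    """Projected ensemble of subsystem A after measuring B in the computational basis.

    psi: amplitudes of a pure state on A (x) B, ordered as index = a * dim_b + b.
    Returns a list of (probability, normalised state of A) pairs.
    """
    ensemble = []
    for b in range(dim_b):
        # unnormalised conditional state (I (x) <b|)|psi>
        amps = [psi[a * dim_b + b] for a in range(dim_a)]
        p = sum(abs(z) ** 2 for z in amps)
        if p > 1e-12:  # skip outcomes that never occur
            norm = math.sqrt(p)
            ensemble.append((p, [z / norm for z in amps]))
    return ensemble

s = 1 / math.sqrt(2)
bell = [s, 0, 0, s]  # (|00> + |11>) / sqrt(2)
for p, state in projected_ensemble(bell, 2, 2):
    print(round(p, 6), [round(abs(z), 6) for z in state])
```

For the Bell state, measuring B leaves A in |0> or |1>, each with probability 1/2.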

2405.14273 2026-06-18 cs.LG cs.AI math.OC 版本更新

Exact Solution to Data-Driven Inverse Optimization of MILPs in Finite Time via Gradient-Based Methods

通过基于梯度的方法在有限时间内精确求解混合整数线性规划的数据驱动逆优化问题

Akira Kitaoka

发表机构 * NEC Corporation(日本电气株式会社)

AI总结 本文研究了混合整数线性规划的数据驱动逆优化问题,揭示了次优性损失的几何结构,并证明了基于梯度的优化方法可以在有限次迭代内达到与观测数据的一致性,同时给出了投影次梯度下降法的迭代次数上界。

Comments 66 pages; comments are welcome

AI中文摘要

数据驱动逆优化问题(DDIOP)是估计能够解释观测最优解数据的目标函数参数(权重)的问题,出现在包括混合整数线性规划(MILP)在内的许多应用中。在MILP的逆优化中,特征的预测误差关于权重不连续,因此难以直接应用基于梯度的优化方法。本文聚焦于次优性损失,该损失当且仅当权重与观测数据完全一致时达到最小值零。我们揭示了该损失的几何结构:它是凸的且分段线性,并且与观测数据完全一致的权重集合具有正的“厚度”,而非单一点或薄边界。利用这一结构,我们证明了:首先,一类广泛的基于梯度的优化方法,包括投影次梯度下降法,可以在有限次迭代中达到与观测数据的一致性(即在有限时间内获得精确解)。其次,对于投影次梯度下降法,我们给出了达到精确一致性所需迭代次数的显式上界。第三,当正向问题是整数线性规划(ILP)时,我们将该上界表示为仅由样本数、特征维度和约束系数矩阵结构(例如,若系数矩阵是全幺模矩阵,则迭代次数被显式地限制为样本数平方和维度的多项式)决定的完全显式迭代次数。通过数值实验,我们验证了这种有限步达到行为。

英文摘要

A data-driven inverse optimization problem (DDIOP) is the problem of estimating the objective-function parameters (weights) that explain observed optimal-solution data, and it arises in many applications, including mixed integer linear programming (MILP). In inverse optimization for MILPs, the prediction error of the features is discontinuous with respect to the weights, so applying gradient-based optimization directly is difficult. In this paper we focus on the suboptimality loss. This loss attains its minimum value, zero, if and only if the weights are exactly consistent with the observed data. We reveal a geometric structure of this loss -- it is convex and piecewise linear, and moreover the set of weights that are exactly consistent with the observed data has a positive "thickness" rather than being a single point or a thin boundary -- and use it to show the following. First, a broad class of gradient-based optimization methods, including projected subgradient descent, reaches exact consistency with the observed data in finitely many iterations (an exact solution is obtained in finite time). Second, for projected subgradient descent we give an explicit upper bound on the number of iterations needed to reach exact consistency. Third, when the forward problem is an integer linear program (ILP), we give this upper bound as a fully explicit iteration count determined solely by the number of samples, the dimension of the features, and the structure of the constraint coefficient matrix. Through numerical experiments, we confirm this finite-step attainment behavior.
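
Below is a minimal Python sketch of projected subgradient descent on the suboptimality loss. It assumes a minimisation forward problem whose finite feasible set can be enumerated, which is realistic only for tiny instances. Weights are constrained to the probability simplex to exclude the trivial w = 0, which would make any data "consistent". The projection uses the standard sort-based simplex projection. The step size, iteration cap and function names are arbitrary illustrative choices, not the paper's.

```python
def dot(w, x):
    return sum(a * b for a, b in zip(w, x))

def project_to_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) = 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - 1.0) / j
        if uj - t > 0:
            theta = t  # the last j satisfying this condition defines the threshold
    return [max(x - theta, 0.0) for x in v]

def suboptimality_loss(w, data):
    """Mean over observations of w.x_obs - min_{x in X} w.x, plus a subgradient.

    data: list of (x_obs, feasible) pairs, where feasible enumerates the finite
    feasible set of a minimisation forward problem.
    """
    loss, grad = 0.0, [0.0] * len(w)
    for x_obs, feasible in data:
        x_best = min(feasible, key=lambda x: dot(w, x))
        loss += dot(w, x_obs) - dot(w, x_best)
        grad = [g + a - b for g, a, b in zip(grad, x_obs, x_best)]
    n = len(data)
    return loss / n, [g / n for g in grad]

def projected_subgradient(w, data, step=0.5, max_iter=1000, tol=1e-12):
    """Iterate until the loss is (numerically) zero; returns (weights, updates made)."""
    for it in range(max_iter):
        loss, g = suboptimality_loss(w, data)
        if loss <= tol:
            return w, it
        w = project_to_simplex([wi - step * gi for wi, gi in zip(w, g)])
    return w, max_iter

# One observation: two items, and the observed solution picks item 0.
data = [((1, 0), [(1, 0), (0, 1)])]
w, updates = projected_subgradient([0.9, 0.1], data)
print(w, updates)
```

Starting from w = (0.9, 0.1), a single step moves the weights to (0.4, 0.6). There item 0 is optimal and the loss is exactly zero. The iterate lands inside the consistent region {w_0 ≤ w_1}, which has positive thickness, and the loop stops after finitely many steps.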

2407.00449 2026-06-18 cs.LG cs.AI cs.NE 版本更新

Fully tensorial approach to hypercomplex-valued neural networks

Agnieszka Niemczynowicz, Radosław Antoni Kycia

发表机构 * Faculty of Computer Science and Mathematics, Cracow University of Technology(克拉科夫技术大学计算机科学与数学系)

Comments 23 pages, 3 figures

Journal ref
Information Sciences, 2026, 123796
英文摘要

A fully tensorial theoretical framework for hypercomplex-valued neural networks is presented. The proposed approach enables neural network architectures to operate on data defined over arbitrary finite-dimensional algebras. The central observation is that algebra multiplication can be represented by a rank-three tensor, which allows all algebraic operations in neural network layers to be formulated in terms of standard tensor contractions, permutations, and reshaping operations. This tensor-based formulation provides a unified and dimension-independent description of hypercomplex-valued dense and convolutional layers and is directly compatible with modern deep learning libraries supporting optimized tensor operations. The proposed framework recovers existing constructions for four-dimensional algebras as a special case. Within this setting, a tensor-based version of the universal approximation theorem for single-layer hypercomplex-valued perceptrons is established under mild non-degeneracy assumptions on the underlying algebra, thereby providing a rigorous theoretical foundation for the considered class of neural networks.
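
The rank-three-tensor view of algebra multiplication can be checked directly. With structure tensor A, (a·b)_k = Σ_{i,j} A[i][j][k] a_i b_j. A minimal Python sketch for complex numbers and quaternions follows. The list-of-lists encoding is an illustrative choice. A real layer would express the same contraction with a tensor library.

```python
def algebra_multiply(A, a, b):
    """(a * b)_k = sum_{i,j} A[i][j][k] * a[i] * b[j] for structure tensor A."""
    n = len(a)
    return [sum(A[i][j][k] * a[i] * b[j] for i in range(n) for j in range(n))
            for k in range(n)]

# Complex numbers, basis (1, i): A[i][j] holds the coefficients of e_i * e_j.
COMPLEX = [[[1, 0], [0, 1]],
           [[0, 1], [-1, 0]]]

def quaternion_tensor():
    """Structure tensor of the quaternions in the basis (1, i, j, k)."""
    # table[i][j] = (sign, index) meaning e_i * e_j = sign * e_index
    table = [[(1, 0), (1, 1), (1, 2), (1, 3)],
             [(1, 1), (-1, 0), (1, 3), (-1, 2)],
             [(1, 2), (-1, 3), (-1, 0), (1, 1)],
             [(1, 3), (1, 2), (-1, 1), (-1, 0)]]
    A = [[[0] * 4 for _ in range(4)] for _ in range(4)]
    for i in range(4):
        for j in range(4):
            sign, k = table[i][j]
            A[i][j][k] = sign
    return A

print(algebra_multiply(COMPLEX, [1, 2], [3, 4]))  # [-5, 10]
```

The complex example reproduces (1 + 2i)(3 + 4i) = -5 + 10i. The quaternion tensor gives ij = k but ji = -k, so non-commutative algebras fit the same formula.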

2512.17696 2026-06-18 cs.LG stat.ME stat.ML 版本更新

Spatially-informed transformers: Injecting geostatistical covariance biases into self-attention for spatio-temporal forecasting

Yuri Calleo

发表机构 * Unimercatorum(乌尼默卡图姆大学)

英文摘要

The modeling of high-dimensional spatio-temporal processes presents a fundamental dichotomy between the probabilistic rigor of classical geostatistics and the flexible, high-capacity representations of deep learning. While Gaussian processes offer theoretical consistency and exact uncertainty quantification, their prohibitive computational scaling renders them impractical for massive sensor networks. Conversely, modern transformer architectures excel at sequence modeling but inherently lack a geometric inductive bias, treating spatial sensors as permutation-invariant tokens without a native understanding of distance. In this work, we propose a spatially-informed transformer, a hybrid architecture that injects a geostatistical inductive bias directly into the self-attention mechanism via a learnable covariance kernel. By formally decomposing the attention structure into a stationary physical prior and a non-stationary data-driven residual, we impose a soft topological constraint that favors spatially proximal interactions while retaining the capacity to model complex dynamics. We demonstrate the phenomenon of "Deep Variography", where the network successfully recovers the true spatial decay parameters of the underlying process end-to-end via backpropagation. Extensive experiments on synthetic Gaussian random fields and real-world traffic benchmarks confirm that our method outperforms state-of-the-art graph neural networks. Furthermore, rigorous statistical validation confirms that the proposed method delivers not only superior predictive accuracy but also well-calibrated probabilistic forecasts, effectively bridging the gap between physics-aware modeling and data-driven learning.
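
Below is a minimal single-head Python sketch of a geostatistical attention bias. The content score q·k/√d is shifted by log C(h) for the pairwise sensor distance h. The sketch assumes an exponential covariance C(h) = exp(-h/ℓ) with a fixed length scale ℓ, which stands in for the paper's learnable kernel. The non-stationary residual term is omitted, and the function name is hypothetical.

```python
import math

def spatial_attention(queries, keys, values, coords, length_scale):
    """Single-head attention with an additive geostatistical bias.

    logits[i][j] = q_i . k_j / sqrt(d) + log C(h_ij), with the assumed exponential
    covariance C(h) = exp(-h / length_scale), so log C(h) = -h / length_scale.
    """
    d = len(queries[0])
    out = []
    for i, q in enumerate(queries):
        logits = []
        for j, k in enumerate(keys):
            h = math.dist(coords[i], coords[j])  # Euclidean sensor distance
            content = sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            logits.append(content - h / length_scale)
        m = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(l - m) for l in logits]
        z = sum(weights)
        weights = [w / z for w in weights]
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out

Q = K = [[0.0, 0.0]] * 3           # equal content scores isolate the spatial bias
coords = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0)]
V = [[0.0], [1.0], [2.0]]
print(spatial_attention(Q, K, V, coords, length_scale=1.0))
```

With a small length scale, each sensor attends mostly to its spatial neighbours. As ℓ grows, the bias vanishes and attention becomes uniform when content scores are equal. Learning ℓ by backpropagation is what would let a model recover such a decay parameter from data.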

2508.06406 2026-06-18 cs.DC cs.LG 版本更新

Blockchain-Enabled Federated Learning

Murtaza Rangwala, KR Venugopal, Rajkumar Buyya

发表机构 * Quantum Cloud and Distributed Systems (qCLOUDS) Lab, School of Computing and Information Systems, The University of Melbourne, Australia(量子云与分布式系统实验室,计算机与信息系统学院,墨尔本大学,澳大利亚) Department of Computer Science and Engineering, University of Visvesvaraya College of Engineering, Bangalore University, India(计算机科学与工程系,维萨瓦拉亚工程学院,班加罗尔大学,印度)

Comments 32 pages, 6 figures, chapter for edited book (Federated Learning: Foundations and Applications)

英文摘要

Blockchain-enabled federated learning (BCFL) addresses fundamental challenges of trust, privacy, and coordination in collaborative AI systems. This chapter provides a comprehensive architectural analysis of BCFL systems through a systematic four-dimensional taxonomy examining coordination structures, consensus mechanisms, storage architectures, and trust models. We analyze design patterns from blockchain-verified centralized coordination to fully decentralized peer-to-peer networks, evaluating trade-offs in scalability, security, and performance. Through a detailed examination of consensus mechanisms designed for federated learning contexts, including Proof of Quality and Proof of Federated Learning, we demonstrate how computational work can be repurposed from arbitrary cryptographic puzzles to productive machine learning tasks. The chapter addresses critical storage challenges by examining multi-tier architectures that balance blockchain's transaction constraints with neural networks' large parameter requirements while maintaining cryptographic integrity. A technical case study of the TrustMesh framework illustrates practical implementation considerations in BCFL systems through distributed image classification training, demonstrating effective collaborative learning across IoT devices with highly non-IID data distributions while maintaining complete transparency and fault tolerance. Analysis of real-world deployments across healthcare consortiums, financial services, and IoT security applications validates the practical viability of BCFL systems, achieving performance comparable to centralized approaches while providing enhanced security guarantees and enabling new models of trustless collaborative intelligence.
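
One common way to reconcile large model parameters with blockchain storage limits is to keep the weights off-chain. Only their digests are recorded, in hash-linked blocks. Below is a minimal stdlib Python sketch of that idea. The block fields and function names are hypothetical and do not reflect TrustMesh or any specific BCFL system. Consensus is not modelled.

```python
import hashlib
import json

def block_hash(block):
    """SHA-256 over a canonical JSON encoding of a block."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def append_update(chain, client_id, round_no, update_digest):
    """Append a block recording a model-update digest, linked to the previous block."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"client": client_id, "round": round_no,
                  "update_sha256": update_digest, "prev": prev})

def verify(chain):
    """True if every block still points at the hash of its predecessor."""
    return all(chain[i]["prev"] == block_hash(chain[i - 1]) for i in range(1, len(chain)))

chain = []
# Serialized weights would live off-chain; only their digests go into blocks.
append_update(chain, "client-a", 1, hashlib.sha256(b"weights-a-r1").hexdigest())
append_update(chain, "client-b", 1, hashlib.sha256(b"weights-b-r1").hexdigest())
append_update(chain, "client-a", 2, hashlib.sha256(b"weights-a-r2").hexdigest())
print(verify(chain))  # True
```

Editing any recorded digest breaks the `prev` link of the following block, so `verify` then returns `False`.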

2508.20275 2026-06-18 cs.LG cs.CL q-bio.QM 版本更新

A Systematic Review on the Generative AI Applications in Human Medical Genomics

Anton Changalidis, Yury Barbitoff, Yulia Nasykhova, Andrey Glotov

发表机构 * Dpt. of Genomic Medicine(基因组医学系) D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology(D.O. Ott妇产科与生殖医学研究所)

Comments 31 pages, 5 figures

Journal ref
Frontiers in Genetics 16 (2026) 1694070
英文摘要

Although traditional statistical techniques and machine learning methods have contributed significantly to genetics and, in particular, inherited disease diagnosis, they often struggle with complex, high-dimensional data, a challenge now addressed by state-of-the-art deep learning models. Large language models (LLMs), based on transformer architectures, have excelled in tasks requiring contextual comprehension of unstructured medical data. This systematic review examines the role of LLMs in the genetic research and diagnostics of both rare and common diseases. Automated keyword-based search in PubMed, bioRxiv, medRxiv, and arXiv was conducted, targeting studies on LLM applications in diagnostics and education within genetics and removing irrelevant or outdated models. A total of 172 studies were analyzed, highlighting applications in genomic variant identification, annotation, and interpretation, as well as medical imaging advancements through vision transformers. Key findings indicate that while transformer-based models significantly advance disease and risk stratification, variant interpretation, medical imaging analysis, and report generation, major challenges persist in integrating multimodal data (genomic sequences, imaging, and clinical records) into unified and clinically robust pipelines, facing limitations in generalizability and practical implementation in clinical settings. This review provides a comprehensive classification and assessment of the current capabilities and limitations of LLMs in transforming hereditary disease diagnostics and supporting genetic education, serving as a guide to navigate this rapidly evolving field.

2503.01163 2026-06-18 cs.AI cs.CL cs.HC cs.LG cs.NE 版本更新

Bandit-Based Prompt Design Strategy Selection Improves Prompt Optimizers

Rin Ashizawa, Yoichi Hirose, Nozomu Yoshinari, Kento Uchida, Shinichi Shirakawa

发表机构 * Yokohama National University(横滨国立大学)

Comments Accepted to ACL 2025 Findings

英文摘要

Prompt optimization aims to search for effective prompts that enhance the performance of large language models (LLMs). Although existing prompt optimization methods have discovered effective prompts, they often differ from sophisticated prompts carefully designed by human experts. Prompt design strategies, representing best practices for improving prompt performance, can be key to improving prompt optimization. Recently, a method termed the Autonomous Prompt Engineering Toolbox (APET) has incorporated various prompt design strategies into the prompt optimization process. In APET, the LLM must implicitly select and apply the appropriate strategies, because prompt design strategies can also have negative effects. This implicit selection may be suboptimal due to the limited optimization capabilities of LLMs. This paper introduces Optimizing Prompts with sTrategy Selection (OPTS), which implements explicit selection mechanisms for prompt design. We propose three mechanisms, including a Thompson sampling-based approach, and integrate them into EvoPrompt, a well-known prompt optimizer. Experiments optimizing prompts for two LLMs, Llama-3-8B-Instruct and GPT-4o mini, were conducted using BIG-Bench Hard. Our results show that the selection of prompt design strategies improves the performance of EvoPrompt, and the Thompson sampling-based mechanism achieves the best overall results. Our experimental code is provided at https://github.com/shiralab/OPTS .
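
Explicit strategy selection can be framed as a multi-armed bandit. Below is a minimal Beta-Bernoulli Thompson sampling sketch in Python. It assumes a binary reward per trial, for instance whether applying a strategy improved the prompt's score. The reward definition, the strategy names and the simulated success rates are assumptions. They do not describe OPTS's actual setup or its EvoPrompt integration.

```python
import random

def thompson_select(successes, failures, rng):
    """Draw from each strategy's Beta(s+1, f+1) posterior and return the best index."""
    draws = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return draws.index(max(draws))

def run_bandit(true_rates, rounds, seed=0):
    """Simulate Bernoulli rewards for each strategy; returns pulls per strategy."""
    rng = random.Random(seed)
    k = len(true_rates)
    successes, failures = [0] * k, [0] * k
    for _ in range(rounds):
        arm = thompson_select(successes, failures, rng)
        if rng.random() < true_rates[arm]:  # simulated "strategy helped" outcome
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]

strategies = ["strategy_a", "strategy_b", "strategy_c"]  # hypothetical names
pulls = run_bandit([0.2, 0.5, 0.8], rounds=2000, seed=0)
print(dict(zip(strategies, pulls)))
```

With simulated success rates of 0.2, 0.5 and 0.8, most pulls go to the third strategy as its posterior concentrates on higher values.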

2502.15376 2026-06-18 cs.LG cond-mat.mes-hall Updated version

Learning Chern Numbers of Topological Insulators with Gauge Equivariant Neural Networks

Longde Huang, Oleksandr Balabanov, Hampus Linander, Mats Granath, Daniel Persson, Jan E. Gerken

Affiliations: Department of Mathematical Sciences, Chalmers University of Technology and University of Gothenburg; Department of Physics, Stockholm University, AlbaNova University Center; VERSES AI Research Lab, Los Angeles, USA; Department of Physics, University of Gothenburg

Journal ref
Advances in Neural Information Processing Systems 38 (NeurIPS 2025)
Abstract

Equivariant network architectures are a well-established tool for predicting invariant or equivariant quantities. However, almost all learning problems considered in this context feature a global symmetry, i.e., each point of the underlying space is transformed with the same group element, as opposed to a local "gauge" symmetry, where each point is transformed with a different group element, exponentially enlarging the size of the symmetry group. So far, gauge equivariant networks have mainly been applied to problems in quantum chromodynamics. Here, we introduce a novel application domain for gauge equivariant networks: the theory of topological condensed matter physics. We use gauge equivariant networks to predict topological invariants (Chern numbers) of multiband topological insulators. The gauge symmetry of the network guarantees that the predicted quantity is a topological invariant. We introduce a novel gauge equivariant normalization layer to stabilize training and prove a universal approximation theorem for our setup. We train only on samples with trivial Chern number but show that our models generalize to samples with non-trivial Chern number. We provide various ablations of our setup. Our code is available at https://github.com/sitronsea/GENet/tree/main.
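For context on the target quantity, Chern numbers can be computed numerically on a discretized Brillouin zone. The sketch below uses the standard Fukui-Hatsugai-Suzuki lattice method on the lower band of the Qi-Wu-Zhang two-band model; both choices are ours for illustration, and this is not the authors' network or data pipeline (the paper studies multiband insulators). The optional `gauge_seed` applies a local U(1) gauge transformation, multiplying the eigenvector at each momentum by an independent random phase, and the computed invariant is unchanged.

```python
import numpy as np

# Pauli matrices
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)


def qwz_hamiltonian(kx, ky, m):
    """Bloch Hamiltonian of the Qi-Wu-Zhang two-band model (illustrative choice)."""
    return np.sin(kx) * SX + np.sin(ky) * SY + (m + np.cos(kx) + np.cos(ky)) * SZ


def chern_number_fhs(m, n=40, gauge_seed=None):
    """Chern number of the lower band via the Fukui-Hatsugai-Suzuki lattice method.

    If gauge_seed is given, each eigenvector is multiplied by an independent random
    phase exp(i * theta(k)), i.e. a local U(1) gauge transformation.
    """
    ks = 2 * np.pi * np.arange(n) / n
    u = np.empty((n, n, 2), dtype=complex)
    for i, kx in enumerate(ks):
        for j, ky in enumerate(ks):
            _, vecs = np.linalg.eigh(qwz_hamiltonian(kx, ky, m))
            u[i, j] = vecs[:, 0]  # eigh sorts eigenvalues ascending: column 0 is the lower band
    if gauge_seed is not None:
        theta = np.random.default_rng(gauge_seed).uniform(0, 2 * np.pi, size=(n, n))
        u *= np.exp(1j * theta)[:, :, None]
    total_flux = 0.0
    for i in range(n):
        for j in range(n):
            ip, jp = (i + 1) % n, (j + 1) % n  # periodic Brillouin zone
            # Product of overlaps around one plaquette: each vector appears once as bra
            # and once as ket, so the phase of the product is gauge invariant.
            loop = (np.vdot(u[i, j], u[ip, j]) * np.vdot(u[ip, j], u[ip, jp])
                    * np.vdot(u[ip, jp], u[i, jp]) * np.vdot(u[i, jp], u[i, j]))
            total_flux += np.angle(loop)
    # The plaquette phases sum to 2*pi times an integer: the Chern number
    return int(np.rint(total_flux / (2 * np.pi)))


print(chern_number_fhs(1.0), chern_number_fhs(3.0))
```

The model is topological for 0 < |m| < 2 (Chern number of magnitude 1, with opposite signs for m and -m) and trivial for |m| > 2, which the sketch reproduces; the sign depends on the orientation convention.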

2410.23503 2026-06-18 cs.LG Updated version

Development and Comparative Analysis of Machine Learning Models for Hypoxemia Severity Triage in CBRNE Emergency Scenarios Using Physiological and Demographic Data from Medical-Grade Devices

Santino Nanini, Mariem Abid, Yassir Mamouni, Arnaud Wiedemann, Philippe Jouvet, Stephane Bourassa

Affiliations: SADC-CDSS IA PEDIATRICS, CHU Sainte-Justine, Montreal, Canada; Solutions Applicare AI Inc., Montreal, Canada; Université de Montréal, Canada; MEDINT CBRNE Group, Montreal, Canada

Comments 12 figures, 12 tables and 39 pages

Journal ref
Diagnostics 14 (2024) 2763
Abstract

This paper presents the development of machine learning (ML) models that predict hypoxemia severity during emergency triage, especially in Chemical, Biological, Radiological, Nuclear, and Explosive (CBRNE) events, using physiological data from medical-grade sensors. Gradient Boosting Models (XGBoost, LightGBM, CatBoost) and sequential models (LSTM, GRU) were trained on physiological and demographic data from the MIMIC-III and MIMIC-IV datasets. A robust preprocessing pipeline addressed missing data and class imbalance and incorporated synthetic data flagged with masks. Gradient Boosting Models (GBMs) outperformed the sequential models in training speed, interpretability, and reliability, making them well suited for real-time decision-making. Their predictive performance was comparable to that of the sequential models; the GBMs used score features for six physiological variables derived from an enhanced version of the National Early Warning Score 2 (NEWS2), which we term NEWS2+. This approach significantly improved prediction accuracy. While the sequential models handled temporal data well, their performance gains did not justify their higher computational cost. A 5-minute prediction window was chosen to allow timely intervention, and minute-level interpolation was used to standardize the data. Feature importance analysis highlighted the significant role of mask and score features in enhancing both transparency and performance. Temporal dependencies proved less critical, as the GBMs captured key patterns effectively without relying on them. This study highlights the potential of ML to improve triage and reduce alarm fatigue. Future work will integrate data from multiple hospitals to improve model generalizability across clinical settings.
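The abstract mentions minute-level interpolation, mask-flagged synthetic values, and NEWS2-derived score features. The sketch below illustrates these three ingredients under our own assumptions: linear interpolation onto a one-minute grid, a mask equal to 1 for interpolated (synthetic) minutes, and the standard NEWS2 bands for SpO2 (Scale 1) and respiration rate. The paper's NEWS2+ modifications and its remaining variables are not described in the abstract and are not reproduced; all function names are hypothetical.

```python
import numpy as np


def to_minute_grid(t_minutes, values, start, end):
    """Linearly interpolate irregular measurements onto a 1-minute grid.

    Returns the grid, the interpolated values, and a mask that is 1 where the value
    was synthesized by interpolation and 0 where it was actually measured.
    """
    t_minutes = np.asarray(t_minutes, dtype=float)  # must be increasing for np.interp
    grid = np.arange(start, end + 1, dtype=float)
    filled = np.interp(grid, t_minutes, np.asarray(values, dtype=float))
    mask = (~np.isin(grid, t_minutes)).astype(int)
    return grid, filled, mask


def news2_spo2_scale1(spo2):
    """Standard NEWS2 SpO2 Scale 1 sub-score."""
    if spo2 <= 91:
        return 3
    if spo2 <= 93:
        return 2
    if spo2 <= 95:
        return 1
    return 0


def news2_resp_rate(rr):
    """Standard NEWS2 respiration-rate sub-score (breaths per minute)."""
    if rr <= 8:
        return 3
    if rr <= 11:
        return 1
    if rr <= 20:
        return 0
    if rr <= 24:
        return 2
    return 3


grid, spo2, mask = to_minute_grid([0, 4], [96, 88], start=0, end=4)
print(spo2)   # [96. 94. 92. 90. 88.]
print(mask)   # [0 1 1 1 0]
print([news2_spo2_scale1(s) for s in spo2])  # [0, 1, 2, 3, 3]
```

Keeping the mask as a separate feature lets a downstream model distinguish measured from synthesized minutes, which is one plausible reading of why the abstract reports mask features as important.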

2211.01960 2026-06-18 q-bio.NC cs.HC cs.LG Updated version

FingerFlex: Inferring Finger Trajectories from ECoG signals

Vladislav Lomtev, Alexander Kovalev, Alexey Timchenko

Affiliations: Bauman Moscow State Technical University; ALVI Labs; Brain Dynamics Group, Higher School of Economics; University of Tuebingen

Comments 6 pages, 3 figures, 4 tables. Preprint. Under review

Journal ref
10.1109/IEEECONF58974.2023.10405112
Abstract

Motor brain-computer interface (BCI) development relies critically on algorithms for decoding neural time series. Recent advances in deep learning architectures allow automatic feature selection to approximate higher-order dependencies in the data. This article presents the FingerFlex model, a convolutional encoder-decoder architecture adapted for finger movement regression on electrocorticographic (ECoG) brain data. State-of-the-art performance was achieved on the publicly available BCI Competition IV dataset 4, with a correlation coefficient between true and predicted trajectories of up to 0.74. The presented method opens the way to developing fully functional, high-precision cortical motor brain-computer interfaces.
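The reported accuracy is a correlation coefficient between true and predicted trajectories. Assuming this is the Pearson correlation computed separately for each finger (the abstract does not say how values are aggregated across fingers), it can be computed as in the following sketch; the synthetic trajectories are illustrative only.

```python
import numpy as np


def per_finger_correlation(y_true, y_pred):
    """Pearson correlation between true and predicted trajectories, one value per finger.

    y_true, y_pred: arrays of shape (n_samples, n_fingers).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    t = y_true - y_true.mean(axis=0)  # center each finger's trajectory
    p = y_pred - y_pred.mean(axis=0)
    return (t * p).sum(axis=0) / np.sqrt((t ** 2).sum(axis=0) * (p ** 2).sum(axis=0))


# Synthetic flexion-like trajectories for 5 fingers (illustrative only)
t = np.linspace(0, 4 * np.pi, 200)
y_true = np.stack([np.sin(t + k) for k in range(5)], axis=1)
noisy = y_true + 0.5 * np.random.default_rng(0).normal(size=y_true.shape)
r = per_finger_correlation(y_true, noisy)
print(r, r.mean())
```

Correlation ignores offset and scale, so a prediction that tracks the shape of the movement scores highly even if its amplitude is off; this is worth keeping in mind when comparing it with error-based metrics.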

1909.13203 2026-06-18 cs.LG stat.ML Updated version

Learning transport cost from subset correspondence

Ruishan Liu, Akshay Balsubramani, James Zou

Affiliations: Department of Electrical Engineering; Department of Genetics; Stanford University; Department of Biomedical Data Science

Journal ref
International Conference on Learning Representations (ICLR 2020)
Abstract

Learning to align multiple datasets is an important problem with many applications, and it is especially useful when we need to integrate multiple experiments or correct for confounding. Optimal transport (OT) is a principled approach to aligning datasets, but a key challenge in applying OT is specifying a transport cost function that accurately captures how the two datasets are related. Reliable cost functions are typically unavailable, and practitioners often resort to hand-crafted or Euclidean costs even when these may be inappropriate. In this work, we investigate how to learn the cost function from a small amount of side information, which is often available. The side information we consider captures subset correspondence, i.e., certain subsets of points in the two datasets are known to be related. For example, we may have some images labeled as cars in both datasets, or a common annotated cell type in single-cell data from two batches. We develop an end-to-end optimizer (OT-SI) that differentiates through the Sinkhorn algorithm and effectively learns a suitable cost function from side information. In systematic experiments on images, marriage matching, and single-cell RNA-seq, our method substantially outperforms state-of-the-art benchmarks.
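OT-SI learns the cost by differentiating through Sinkhorn iterations. The sketch below shows only the forward pass: entropy-regularized Sinkhorn matrix scaling in NumPy for a fixed cost. In OT-SI the cost would be parameterized and the loss backpropagated through these iterations with an automatic differentiation framework, which is not shown. The `matched_mass` score is our hypothetical illustration of how subset correspondence can be read off a transport plan, not necessarily the paper's loss; the data and labels are synthetic.

```python
import numpy as np


def sinkhorn(a, b, cost, eps=0.1, n_iter=500):
    """Entropy-regularized OT plan via Sinkhorn matrix scaling."""
    K = np.exp(-cost / eps)  # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)      # rescale rows toward marginal a
        v = b / (K.T @ u)    # rescale columns to match marginal b
    return u[:, None] * K * v[None, :]


def matched_mass(plan, labels_x, labels_y):
    """Fraction of transported mass between points whose subset labels agree."""
    same = np.asarray(labels_x)[:, None] == np.asarray(labels_y)[None, :]
    return float(plan[same].sum() / plan.sum())


rng = np.random.default_rng(0)
x = rng.normal(size=(6, 2))
y = rng.normal(size=(8, 2)) + 0.5      # second "batch" with a shift
labels_x = [0, 0, 0, 1, 1, 1]          # hypothetical subset labels
labels_y = [0, 0, 0, 0, 1, 1, 1, 1]
a = np.full(6, 1 / 6)
b = np.full(8, 1 / 8)
cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)  # squared Euclidean baseline
plan = sinkhorn(a, b, cost / cost.max(), eps=0.5)
print(plan.sum(axis=1), plan.sum(axis=0))
print(matched_mass(plan, labels_x, labels_y))
```

Normalizing the cost and using a moderate `eps` keeps the kernel well conditioned so the iterations converge quickly; smaller `eps` approaches unregularized OT but needs more iterations.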