arXivDaily: arXiv daily academic digest, updated Monday to Friday
2605.21488 2026-05-21 cs.LG Version update

Equilibrium Reasoners: Learning Attractors Enables Scalable Reasoning

Benhao Huang, Zhengyang Geng, Zico Kolter

Affiliations: CMU (Carnegie Mellon University)

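AI summary: This paper proposes Equilibrium Reasoners (EqR), which achieve scalable reasoning by learning task-conditioned attractors. The method needs no external verifier or task-specific prior at test time. It improves reasoning by scaling depth and breadth, raising accuracy on Sudoku-Extreme from 2.6% (feedforward baseline) to over 99%.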

Comments ICML 2026

Abstract

Scaling test-time compute by iteratively updating a latent state has emerged as a powerful paradigm for reasoning. Yet the internal mechanisms that enable these iterative models to generalize beyond memorized patterns remain unclear. We hypothesize that generalizable reasoning arises from learning task-conditioned attractors: latent dynamical systems whose stable fixed points correspond to valid solutions. We formalize this process through Equilibrium Reasoners (EqR), which enable test-time scaling without external verifiers or task-specific priors. EqR scales internal dynamics along two axes: depth, by running more iterations, and breadth, by aggregating stochastic trajectories from multiple initializations. Empirically, gains from test-time scaling are tightly coupled with stronger convergence toward solution-aligned attractors. This attractor perspective allows neural networks to adaptively allocate test-time compute based on task difficulty. While simple cases converge within 1 to 5 iteration steps, harder cases benefit from massive test-time scaling. By unrolling up to the equivalent of 40,000 layers, scalable latent reasoning boosts accuracy from 2.6% for feedforward models to over 99% on Sudoku-Extreme. These results suggest that learned attractor landscapes provide a useful mechanistic lens for understanding scalable reasoning in iterative latent models.
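The depth and breadth axes can be illustrated with a small Python sketch. This is not the EqR architecture. A toy contractive map stands in for a learned latent update whose fixed point encodes the answer. The tolerance, the number of random initializations and the majority-vote aggregation are all illustrative assumptions. A weaker contraction plays the role of a "harder" input that needs many more iterations before it settles.

```python
import random
from collections import Counter

def iterate_to_fixed_point(update, z0, tol=1e-6, max_steps=10_000):
    """Depth axis: apply `update` until the latent state stops moving (or the budget runs out)."""
    z = list(z0)
    for step in range(1, max_steps + 1):
        z_next = update(z)
        delta = max(abs(a - b) for a, b in zip(z_next, z))
        z = z_next
        if delta < tol:  # converged: easy inputs stop early, hard ones keep iterating
            return z, step
    return z, max_steps

def equilibrium_answer(update, decode, dim, n_inits=8, seed=0, **kwargs):
    """Breadth axis: iterate from several random inits and majority-vote the decoded fixed points."""
    rng = random.Random(seed)
    answers, steps = [], []
    for _ in range(n_inits):
        z0 = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        z_star, n_steps = iterate_to_fixed_point(update, z0, **kwargs)
        answers.append(decode(z_star))
        steps.append(n_steps)
    answer, _ = Counter(answers).most_common(1)[0]
    return answer, steps

# Toy task: the "solution" (3, 1, 4) is the attractor of a contractive update.
target = (3.0, 1.0, 4.0)
decode = lambda z: tuple(round(v) for v in z)
easy = lambda z: [0.5 * zi + 0.5 * ti for zi, ti in zip(z, target)]    # strong pull
hard = lambda z: [0.95 * zi + 0.05 * ti for zi, ti in zip(z, target)]  # weak pull, needs more depth

answer, easy_steps = equilibrium_answer(easy, decode, dim=3)
hard_answer, hard_steps = equilibrium_answer(hard, decode, dim=3)
print(answer, max(easy_steps), min(hard_steps))
```

Both toy inputs reach the same answer. The weakly contracting one simply spends more iterations, which mirrors the adaptive compute allocation the abstract describes.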

2605.21486 2026-05-21 cs.LG cond-mat.dis-nn cs.AI stat.ML Version update

Quantifying Hyperparameter Transfer and the Importance of Embedding Layer Learning Rate

Dayal Singh Kalra, Maissam Barkeshli

Affiliations: Department of Physics, University of Maryland, College Park; Department of Computer Science, University of Maryland, College Park; Joint Quantum Institute, University of Maryland, College Park; Meta Superintelligence Labs, Fundamental AI Research

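AI summary: This paper studies how to quantify hyperparameter transfer and evaluates transfer quality with three metrics. It finds that the Maximal Update (μP) parameterization improves hyperparameter transfer mainly by maximizing the embedding-layer learning rate. Weight decay improves scaling-law fits but, in the fixed token-per-parameter setting, reduces extrapolation robustness.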

Comments 10+28 pages, 5+17 figures

Abstract

Hyperparameter transfer allows extrapolating optimal optimization hyperparameters from small to large scales, making it critical for training large language models (LLMs). This is done either by fitting a scaling law to the hyperparameters or by a judicious choice of parameterization, such as Maximal Update ($μ$P), that renders optimal hyperparameters approximately scale invariant. In this paper, we first develop a framework to quantify hyperparameter transfer through three metrics: (1) the quality of the scaling law fit, (2) the robustness to extrapolation errors, and (3) the asymptotic loss penalty due to choice of parameterization. Next, we investigate through a comprehensive series of ablations why $μ$P appears to offer high-quality learning rate transfer relative to standard parameterization (SP), as existing theory is inadequate. We find that the overwhelming benefit of $μ$P relative to SP when training with AdamW arises simply from maximizing the learning rate of the embedding layer. In SP, the embedding layer learning rate acts as a bottleneck that induces training instabilities; increasing it by a factor of width to match $μ$P dramatically smooths out training while improving hyperparameter transfer. We also find that weight decay improves the scaling law fits, while, in the fixed token-per-parameter setting, it hurts the robustness of the extrapolation.
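The first metric and the embedding-layer finding can be made concrete with a plain-Python sketch on made-up numbers. `fit_power_law` fits lr* ∝ width^slope to optimal learning rates in log-log space and reports R² as one possible measure of fit quality. `extrapolate_lr` then predicts the optimum at a larger width. `sp_param_groups` raises only the embedding-layer learning rate by a factor of width, as the abstract describes. The layer names and the `base_width` normalization are assumptions. In PyTorch each dict would also need a `params` entry before being passed to `torch.optim.AdamW`. The paper's own metric definitions may differ.

```python
import math

def fit_power_law(widths, optimal_lrs):
    """Fit log(lr*) = intercept + slope * log(width) by ordinary least squares; also return R^2."""
    xs = [math.log(w) for w in widths]
    ys = [math.log(lr) for lr in optimal_lrs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return slope, intercept, r2

def extrapolate_lr(slope, intercept, width):
    """Evaluate the fitted power law at a new width."""
    return math.exp(intercept + slope * math.log(width))

def sp_param_groups(base_lr, width, boost_embedding=True, base_width=1):
    """Per-layer learning rates for an SP model; optionally scale only the embedding LR by width."""
    emb_mult = width / base_width if boost_embedding else 1.0
    return [
        {"name": "embedding", "lr": base_lr * emb_mult},
        {"name": "hidden", "lr": base_lr},
        {"name": "readout", "lr": base_lr},
    ]

# Made-up optimal LRs that follow lr* = 0.01 * (width / 128)^-1 exactly.
widths = [128, 256, 512, 1024]
lrs = [0.01 * 128 / w for w in widths]
slope, intercept, r2 = fit_power_law(widths, lrs)
lr_4096 = extrapolate_lr(slope, intercept, 4096)
groups = sp_param_groups(base_lr=1e-3, width=1024)
```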

2605.21485 2026-05-21 cs.LG Version update

EvoStruct: Bridging Evolutionary and Structural Priors for Antibody CDR Design via Protein Language Model Adaptation

Mansoor Ahmed, Sujin Lee, Umar Khayaz, Murray Patterson

Affiliations: Georgia State University, Atlanta, USA; Georgia Institute of Technology, Atlanta, USA

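AI summary: This paper proposes EvoStruct, which bridges evolutionary and structural priors by adapting a protein language model. It addresses vocabulary collapse in antibody CDR design, improving amino acid recovery and reducing perplexity.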

Abstract

Equivariant graph neural network (GNN) methods for antibody complementarity-determining region (CDR) design achieve the highest sequence recovery but suffer from severe vocabulary collapse. The current best GNN methods over-predict very few amino acids, such as tyrosine and glycine, while ignoring functionally important residues. We trace this failure to GNN encoders learning amino acid distributions de novo from limited structural data, discarding substitution patterns encoded in evolutionary databases. To resolve this, we propose EvoStruct, which bridges a frozen protein language model (PLM) with 3D structural context from an E(3)-equivariant GNN via a cross-attention adapter. Unlike prior PLM-structure adapters for general protein design, EvoStruct targets the vocabulary collapse problem specific to CDR design through progressive PLM unfreezing and R-Drop consistency regularization. On the CHIMERA-Bench dataset, EvoStruct achieves the highest amino acid recovery and lowest perplexity among several antibody design methods, improving sequence recovery by 16% and reducing perplexity by 43% relative to the best GNN baselines, while recovering 2.3x greater amino acid diversity and the highest binding-pair correlation with ground truth.
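R-Drop (regularized dropout) runs the same input through the model twice with different dropout masks. It then penalizes disagreement between the two predicted distributions. A minimal per-residue sketch over a 20-letter amino-acid vocabulary follows. The loss is the negative log-likelihood of both passes plus `alpha` times their symmetric KL divergence. Three things are assumptions, not details from the abstract: the value of `alpha`, the averaging over CDR positions, and how EvoStruct schedules this term alongside progressive PLM unfreezing.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    """KL(p || q) for discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def r_drop_position_loss(logits1, logits2, target, alpha=1.0):
    """R-Drop for one residue: NLL of both dropout passes plus alpha * symmetric KL between them."""
    p1, p2 = softmax(logits1), softmax(logits2)
    nll = -math.log(p1[target]) - math.log(p2[target])
    sym_kl = 0.5 * (kl(p1, p2) + kl(p2, p1))
    return nll + alpha * sym_kl

def r_drop_cdr_loss(pass1, pass2, targets, alpha=1.0):
    """Average the per-residue loss over a CDR; pass1/pass2 hold one logit list per position."""
    losses = [r_drop_position_loss(a, b, t, alpha) for a, b, t in zip(pass1, pass2, targets)]
    return sum(losses) / len(losses)
```

When the two passes agree, the KL term vanishes and only the likelihood term remains. Disagreement between dropout samples is what the regularizer pushes down.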

2605.21483 2026-05-21 astro-ph.CO cs.LG Version update

Velocityformer: Broken-Symmetry-Matched Equivariant Graph Transformers for Cosmological Velocity Reconstruction

Tilman Tröster, David Mirkovic, Veronika Oehl, Arne Thomsen

Affiliations: ETH Zürich

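AI summary: This work proposes Velocityformer, an equivariant graph transformer architecture that improves the accuracy of cosmological velocity reconstruction by matching the broken symmetry of the observational data. It improves the velocity correlation coefficient r by 35% over the standard linear-theory baseline.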

Abstract

Precise measurement of the kinematic Sunyaev-Zel'dovich (kSZ) effect - a probe of the large-scale distribution of baryonic matter, a key observable for cosmological inference - requires accurate reconstruction of galaxy velocities from spectroscopic surveys. The signal-to-noise ratio (SNR) of kSZ measurements scales directly with the correlation coefficient $r$ between reconstructed and true velocities. We introduce Velocityformer, an equivariant graph transformer architecture designed to match the specific symmetry of the observational data. While the underlying physics is equivariant with respect to translations and rotations, observational effects break this symmetry due to the preferred line-of-sight direction. Matching the model's inductive bias to the data's broken symmetry consistently improves performance across all model sizes and training volumes, with Velocityformer improving $r$ by 35% over the standard linear theory baseline and outperforming ML baselines at every data volume. By matching the model's inductive bias to the data and conditioning on the physics-based long-wavelength solution, Velocityformer is highly data-efficient, training to high accuracy on as few as 4 low-fidelity simulations, and generalises zero-shot across input geometry, cosmological parameters, and galaxy sample. On high-fidelity simulated galaxy catalogues, this yields a 30% improvement in $r$ over the physical baseline, directly translating to the same SNR gain on observational data.
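Two quantities in this abstract are easy to compute directly. The first is the correlation coefficient r between reconstructed and true velocities. The second is the line-of-sight projection, whose preferred direction breaks full rotational symmetry. The sketch assumes a plane-parallel geometry with the line of sight along the z axis; real surveys use a radial line of sight. It shows that the line-of-sight component is unchanged by rotations about that axis, presumably the kind of residual symmetry a symmetry-matched model can still respect. The exact symmetry group Velocityformer uses is not stated in the abstract. The last helper encodes the stated linear relation between SNR and r.

```python
import math

def pearson_r(x, y):
    """Correlation coefficient r between reconstructed (x) and true (y) velocities."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def los_component(v, los=(0.0, 0.0, 1.0)):
    """Line-of-sight velocity v . n_hat (plane-parallel approximation, LOS along z by default)."""
    return sum(a * b for a, b in zip(v, los))

def rotate_about_los(v, angle):
    """Rotate a 3-vector about the z axis, i.e. about the assumed line of sight."""
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = v
    return (c * x - s * y, s * x + c * y, z)

def snr_gain(r_new, r_old):
    """If SNR scales linearly with r, the SNR ratio equals the ratio of correlation coefficients."""
    return r_new / r_old
```

Under this linear relation, the reported 35% gain in r corresponds to a 35% gain in kSZ signal-to-noise.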

2605.21481 2026-05-21 cs.AI cs.CL cs.LG Version update

AiraXiv: An AI-Driven Open-Access Platform for Human and AI Scientists

Junshu Pan, Panzhong Lu, Yixuan Weng, Qiyao Sun, Fang Guo, Zijie Yang, Qiji Zhou, Yue Zhang

Affiliations: Westlake University; Zhejiang University; Shanghai Innovation Institution; Zhongguancun Academy

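AI summary: This paper proposes AiraXiv, a platform built on open preprints, AI-augmented analysis and review, and reader feedback. It addresses the growth in research output and the scalability challenges that traditional academic publishing faces in the AI era.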

Abstract

Recent advances in artificial intelligence (AI) have accelerated the growth of both human-authored and AI-generated research outputs, placing increasing strain on traditional academic publishing systems and challenging the scalability of conference- and journal-centered paradigms amid rising submission volumes, reviewer workload, and venue size. To address these challenges, we explore an AI-era publishing paradigm in which both human and AI scientists participate as authors and readers, and papers evolve through continuous, feedback-driven iteration. We propose AiraXiv, an AI-driven open-access platform built on open preprints, AI-augmented analysis and review, and reader feedback. AiraXiv supports human scientists through an interactive UI and AI scientists through Model Context Protocol (MCP)-based interactions. We validate AiraXiv through real-world deployments, including serving as the submission platform for ICAIS 2025, demonstrating its potential as a fast, inclusive, and scalable research infrastructure for the AI era. AiraXiv is publicly available at https://airaxiv.com.

2605.21475 2026-05-21 cs.LG Version update

Is Fixing Schema Graphs Necessary? Full-Resolution Graph Structure Learning for Relational Deep Learning

Yi Huang, Qingyun Sun, Jia Li, Xingcheng Fu, Jianxin Li

Affiliations: SKLCCSE, School of Computer Science and Engineering, Beihang University, Beijing, China; Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, China

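AI summary: This paper proposes FROG, a full-resolution and optimizable graph structure learning framework for relational deep learning. It formulates relational structure learning as a learnable table-role modeling problem in which tables can act as nodes or edges in message passing. It designs role-driven message passing to capture relational semantics and enforces semantic consistency through functional dependency constraints. Experiments show that it outperforms existing methods on multiple downstream tasks.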

Comments Accepted by the Forty-third International Conference on Machine Learning (ICML2026)

Abstract

Relational prediction tasks are fundamental in many real-world applications, where data are naturally stored in relational databases (RDBs). Relational Deep Learning (RDL) addresses this problem by modeling RDBs as graphs and applying graph neural networks (GNNs) for end-to-end learning. However, the full-resolution property is commonly adopted as a design principle in graph construction for RDBs to preserve relational semantics, which leads most existing methods to rely on fixed graph structures. In this paper, we propose FROG, a Full-Resolution and Optimizable Graph Structure Learning framework for RDL that formulates relational structure learning as a learnable table role modeling problem, allowing tables to contribute as nodes and edges in message passing. We further design role-driven message passing mechanisms to capture relational semantics, enabling joint optimization of graph structure and GNN representations. To ensure semantic consistency, we introduce functional dependency constraints that regularize representations across table and entity levels. Extensive experiments demonstrate that our method outperforms existing approaches and reveal how table roles impact downstream tasks, offering new insights into graph construction for RDL.

2605.21468 2026-05-21 cs.LG cs.CL Version update

You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories

Zhepei Wei, Xinyu Zhu, Wei-Lin Chen, Chengsong Huang, Jiaxin Huang, Yu Meng

Affiliations: University of Virginia; Washington University in St. Louis

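AI summary: This paper studies extrapolating LLMs along rank-1 trajectories. It finds that RLVR parameter trajectories are extremely low-rank and highly predictable. It then proposes RELEX, which extrapolates checkpoints efficiently with simple linear regression and no learned model.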

Comments preprint. Code: https://github.com/weizhepei/RELEX

Abstract

Reinforcement learning with verifiable rewards (RLVR) has become a dominant paradigm for improving reasoning in large language models (LLMs), yet the underlying geometry of the resulting parameter trajectories remains underexplored. In this work, we demonstrate that RLVR weight trajectories are extremely low-rank and highly predictable. Specifically, we find that the majority of downstream performance gains are captured by a rank-1 approximation of the parameter deltas, where the magnitude of this projection evolves near-linearly with training steps. Motivated by this, we propose a simple and compute-efficient method RELEX (REinforcement Learning EXtrapolation), which estimates the rank-1 subspace from a short observation window and extrapolates future checkpoints via linear regression, with no learned model required. Across three models (i.e., Qwen2.5-Math-1.5B, Qwen3-4B-Base, and Qwen3-8B-Base), RELEX produces checkpoints that match or exceed RLVR performance on both in-domain and out-of-domain benchmarks, requiring as few as 15% steps of full RLVR training. Remarkably, RELEX is able to extrapolate far beyond the observation window at no training cost, predicting checkpoints up to 10-20$\times$ beyond the observed prefix with continued improvement (e.g., observe only the first 50 steps and extrapolate to 1000 steps). Our ablation analysis confirms the minimalist sufficiency of RELEX: neither increasing the subspace rank nor employing non-linear modeling yields further gains in extrapolation. Finally, we show that RELEX's success stems from a "denoising" effect: by projecting updates onto the rank-1 subspace, the model discards stochastic optimization noise that would otherwise degrade performance during extrapolation. Our code is available at https://github.com/weizhepei/RELEX.
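The RELEX recipe can be sketched in plain Python in three steps:

1. Find a rank-1 direction from a short window of checkpoints.
2. Fit the projection coefficient linearly against the step count.
3. Extrapolate to a later step.

The sketch makes three assumptions: parameters are flattened into one vector, deltas are taken relative to the initial checkpoint, and the top singular direction is found by power iteration. The released code at the GitHub link above may work per weight matrix and differ in detail. The toy trajectory is exactly rank-1 and linear in the step count, so the extrapolated checkpoint can be checked by hand.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_direction(deltas, iters=100):
    """Leading right singular vector of the (steps x params) delta matrix via power iteration."""
    v = list(deltas[-1])  # start from the latest delta (assumed non-zero)
    for _ in range(iters):
        scores = [dot(d, v) for d in deltas]                      # D v
        v = [sum(s * d[j] for s, d in zip(scores, deltas))         # D^T (D v)
             for j in range(len(v))]
        norm = math.sqrt(dot(v, v))
        v = [x / norm for x in v]
    return v

def relex_extrapolate(theta0, checkpoints, steps, target_step):
    """Project deltas onto the rank-1 direction, fit coefficient ~ a + b * step, extrapolate."""
    deltas = [[c - t for c, t in zip(ck, theta0)] for ck in checkpoints]
    v = top_direction(deltas)
    coeffs = [dot(d, v) for d in deltas]  # the sign of v is arbitrary; the linear fit absorbs it
    n = len(steps)
    ms, mc = sum(steps) / n, sum(coeffs) / n
    slope = (sum((s - ms) * (c - mc) for s, c in zip(steps, coeffs))
             / sum((s - ms) ** 2 for s in steps))
    intercept = mc - slope * ms
    c_target = intercept + slope * target_step
    return [t + c_target * vj for t, vj in zip(theta0, v)]

# Toy trajectory that is exactly rank-1 and linear in the step count.
theta0 = [0.0, 0.0, 0.0, 0.0]
direction = [1.0, 2.0, 0.0, -1.0]
steps = list(range(1, 51))                       # observe only the first 50 steps
checkpoints = [[0.1 * t * d for d in direction] for t in steps]
predicted = relex_extrapolate(theta0, checkpoints, steps, target_step=1000)
```

The projection onto `v` discards any component of each delta outside the rank-1 direction. On noisy trajectories, that is where the "denoising" described in the abstract would come from.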

2605.21467 2026-05-21 cs.LG cs.CL Version update

DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards

Kaiyi Zhang, Wei Wu, Yankai Lin

Affiliations: Gaoling School of Artificial Intelligence, Renmin University of China; Ant International

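AI summary: This paper proposes DelTA, which estimates token coefficients that amplify side-specific token-gradient directions. This changes how token probabilities are updated in reinforcement learning from verifiable rewards and improves performance on mathematical benchmarks.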

Abstract

Reinforcement learning from verifiable rewards (RLVR) has emerged as a central technique for improving the reasoning capabilities of large language models. Despite its effectiveness, how response-level rewards translate into token-level probability changes remains poorly understood. We introduce a discriminator view of RLVR updates, showing that the policy-gradient update direction implicitly acts as a linear discriminator over token-gradient vectors and thereby determines which token probabilities are increased or decreased during learning. Under standard sequence-level RLVR, this discriminator is constructed from positive- and negative-side centroids formed by advantage-weighted averaging of token-gradient vectors. However, such centroid construction can be dominated by shared high-frequency patterns, such as formatting tokens, diluting sparse yet discriminative directions that better distinguish high-reward responses from low-reward ones. To address this limitation, we propose $\textbf{DelTA}$, a discriminative token credit assignment method that estimates token coefficients to amplify side-specific token-gradient directions and downweight shared or weakly discriminative ones. These coefficients reweight a self-normalized RLVR surrogate, making the effective side-wise centroids more contrastive and thereby reshaping the RLVR update direction. On seven mathematical benchmarks, DelTA outperforms the strongest same-scale baselines by 3.26 and 2.62 average points on Qwen3-8B-Base and Qwen3-14B-Base, respectively. Additional results on code generation, a different backbone, and out-of-domain evaluations further demonstrate the generalization ability of DelTA.
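The discriminator view of the standard sequence-level baseline can be checked numerically. The update direction d = sum_i A_i g_i always equals W_pos * mu_pos - W_neg * mu_neg, where mu_pos and mu_neg are the advantage-weighted centroids of each side. To first order, a token with gradient g changes its log-probability by about lr * <g, d>. The sign of that inner product decides whether the token is pushed up or down. In the toy data below, a shared "formatting" coordinate makes the two centroids nearly parallel, which is the dilution effect the abstract describes. This sketch covers only the baseline. DelTA's coefficient estimation is not specified in the abstract and is not implemented here.

```python
import math

def weighted_centroid(pairs, dim):
    """Return (total weight, weighted mean) for a list of (weight, vector) pairs."""
    total = sum(w for w, _ in pairs)
    return total, [sum(w * g[j] for w, g in pairs) / total for j in range(dim)]

def side_centroids(grads, advantages):
    """Advantage-weighted centroids of the positive and negative sides (both assumed non-empty)."""
    dim = len(grads[0])
    pos = [(a, g) for a, g in zip(advantages, grads) if a > 0]
    neg = [(-a, g) for a, g in zip(advantages, grads) if a < 0]
    return weighted_centroid(pos, dim), weighted_centroid(neg, dim)

def policy_gradient_direction(grads, advantages):
    """Sequence-level RLVR direction: sum_i A_i * g_i."""
    dim = len(grads[0])
    return [sum(a * g[j] for a, g in zip(advantages, grads)) for j in range(dim)]

def predicted_logprob_change(token_grad, direction, lr=1.0):
    """First-order effect of the update on a token: delta log p ~= lr * <grad log p, delta theta>."""
    return lr * sum(x * y for x, y in zip(token_grad, direction))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

# Toy: the last coordinate is a "formatting" direction shared by every response.
grads = [[1.0, 0.0, 5.0], [0.0, 1.0, 5.0], [1.0, 1.0, 5.0], [0.0, 0.0, 5.0]]
advantages = [1.0, -1.0, 0.5, -0.5]
(w_pos, mu_pos), (w_neg, mu_neg) = side_centroids(grads, advantages)
direction = policy_gradient_direction(grads, advantages)
```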

2605.21461 2026-05-21 cs.LG Version update

A Machine Learning Framework for Weighted Least Squares GNSS Positioning based on Activation Functions

Pin-Hsun Lee, Harry Leib

Affiliations: Department of Electrical and Computer Engineering, McGill University

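AI summary: This paper proposes a machine learning framework for weighted least squares GNSS positioning based on activation functions. Signal quality indicators serve as training features for ensemble learning algorithms that identify low-quality signals. Activation functions then convert the ML-predicted scores into suitable weights to improve positioning accuracy.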

Abstract

Global Navigation Satellite Systems (GNSS) are widely used to provide position, velocity, and timing (PVT) information for various applications, including transportation, location-based communication services, and intelligent agriculture. In urban canyons, high-rise buildings and narrow streets can cause signal obstruction, non-line-of-sight (NLOS) reception, and multipath effects that introduce errors in GNSS pseudorange measurements. Although multi-constellation GNSS effectively increases the number of available satellites, the inclusion of degraded signals can lead to severe positioning errors. This study proposes a machine learning framework for the weighted least squares (WLS) algorithm incorporating activation functions to enhance positioning accuracy. Several signal quality indicators are employed as training features for ensemble learning algorithms to identify poor quality signals by providing quality scores. Then, activation functions are employed to transform the machine learning predicted scores to appropriate weights for WLS positioning. To evaluate the performance of our approach, experiments are conducted using real-world datasets from Hong Kong and Tokyo urban areas. Comparative analysis of activation functions reveals that sigmoid functions consistently yield the greatest improvements with different machine learning algorithms and GNSS constellation configurations. The proposed algorithm demonstrates substantial reductions in positioning errors for both single- and multi-constellation scenarios. Furthermore, our results indicate that the proposed algorithm exhibits strong geographical transferability. The proposed algorithm maintains a comparable level of performance when trained on data from other regions with similar levels of urbanization.
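The score-to-weight step followed by a weighted least squares solve, x = (HᵀWH)⁻¹HᵀWy, can be sketched as below. The sketch uses a generic two-parameter linear model, not the linearized pseudorange geometry matrix a real GNSS solver would iterate on. The quality scores, the sigmoid steepness `k` and the midpoint `s0` are hypothetical values, not the paper's.

```python
import math

def sigmoid_weight(score, k=20.0, s0=0.5):
    """Map an ML quality score in [0, 1] (higher = better signal) to a WLS weight in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-k * (score - s0)))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def wls(H, y, w):
    """Weighted least squares: x = (H^T W H)^{-1} H^T W y with W = diag(w)."""
    m, p = len(H), len(H[0])
    HtWH = [[sum(w[i] * H[i][a] * H[i][b] for i in range(m)) for b in range(p)] for a in range(p)]
    HtWy = [sum(w[i] * H[i][a] * y[i] for i in range(m)) for a in range(p)]
    return solve(HtWH, HtWy)

H = [[1.0, float(t)] for t in range(5)]
y = [2.0 + 3.0 * t for t in range(5)]
y[4] += 20.0                                   # one corrupted measurement, e.g. NLOS-like
scores = [0.9, 0.9, 0.9, 0.9, 0.05]            # hypothetical classifier quality scores
x_wls = wls(H, y, [sigmoid_weight(s) for s in scores])
x_ols = wls(H, y, [1.0] * 5)                   # equal weights for comparison
```

With equal weights the single corrupted measurement pulls the slope from 3 to 7. The sigmoid-derived weights nearly remove its influence.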

2605.21458 2026-05-21 cs.AI cs.LG stat.ME Version update

Mind the Sim-to-Real Gap & Think Like a Scientist

Harsh Parikh, Gabriel Levin-Konigsberg, Dominique Perrault-Joncas, Alexander Volfovsky

Affiliations: Amazon SCOT; Yale University; Duke University

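AI summary: This paper studies how to supplement a simulator with real-world experiments to reduce the sim-to-real value gap. It proposes the Fisher-SEP method and illustrates it with two case studies.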

Abstract

Suppose a planner has a pre-trained simulator of a sequential decision problem and the option to run real experiments in the field. The simulator is cheap to query but inherits confounding and drift from its calibration data. Experimentation is unbiased but consumes one real unit per trial. We study when, and how, the planner should supplement the simulator with experiments. We give three results. First, an extended simulation lemma decomposes the simulator's value error into a calibration--deployment shift that randomization can identify and a parametric residual that no further interaction can reduce. Second, the value gap between the simulator-optimal policy and the optimum splits into a local component, on states the deployed policy already visits, and a reachability component, on states it does not. The reachability component stays bounded away from zero at any horizon under purely passive learning. Third, we propose Fisher-SEP, a simulation-aided experimental policy (SEP) that minimizes the posterior predictive variance of a target policy's value, with reward-only and transition-only specializations. Two case studies illustrate the regimes. In a vending-machine supply chain, front-loaded experimentation overtakes posterior updating once the horizon is long enough to amortize the pilot. In an HIV mobile-testing example with a corridor that separates a well-surveilled region from a poorly-surveilled one, only designed exploration reaches the poorly-surveilled region.
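The variance-minimization criterion behind Fisher-SEP can be illustrated in a much-simplified reward-only case with known transitions. The target policy's value is V = sum_s d(s) r(s). Each unknown reward has an independent Gaussian posterior, and one field trial at state s with noise variance tau² updates that posterior conjugately. Greedily choosing the trial that most reduces Var(V) gives the sketch below. The Gaussian model, the visitation weights and the greedy rule are assumptions for illustration, not the paper's algorithm.

```python
def value_variance(visitation, reward_vars):
    """Var(V) for V = sum_s d(s) r(s) with independent reward posteriors."""
    return sum(d * d * v for d, v in zip(visitation, reward_vars))

def conjugate_update(prior_var, noise_var):
    """Posterior variance of a Gaussian mean after one Gaussian observation."""
    return 1.0 / (1.0 / prior_var + 1.0 / noise_var)

def plan_experiments(visitation, reward_vars, noise_var, budget):
    """Greedily spend `budget` field trials where they most reduce Var(V)."""
    vars_ = list(reward_vars)
    plan = []
    for _ in range(budget):
        best_state, best_var = None, value_variance(visitation, vars_)
        for s in range(len(vars_)):
            trial = list(vars_)
            trial[s] = conjugate_update(vars_[s], noise_var)
            v = value_variance(visitation, trial)
            if v < best_var:
                best_state, best_var = s, v
        if best_state is None:  # no trial helps (e.g. all visitation weights are zero)
            break
        vars_[best_state] = conjugate_update(vars_[best_state], noise_var)
        plan.append(best_state)
    return plan, vars_

# Hypothetical numbers: state 0 is well covered by the simulator's calibration data,
# state 2 is rarely visited by the target policy but very uncertain.
visitation = [0.6, 0.3, 0.1]
reward_vars = [0.01, 1.0, 4.0]
plan, posterior_vars = plan_experiments(visitation, reward_vars, noise_var=1.0, budget=3)
```

The greedy rule skips the well-calibrated state 0 entirely and alternates between the two uncertain states. Their contributions to Var(V) depend on both visitation and posterior uncertainty.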

2605.21455 2026-05-21 cs.LG Version update

Mitigating Label Bias with Interpretable Rubric Embeddings

Calvin Isley, Johann D. Gaebler, Sharad Goel

Affiliations: Harvard Kennedy School; Harvard University

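AI summary: This paper proposes mitigating label bias with interpretable rubric embeddings. It shows theoretically and empirically that, under plausible conditions, the approach reduces label bias and improves measures of cohort quality.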

Abstract

Statistical decision algorithms are increasingly deployed in domains where ground-truth labels are hard to obtain, such as hiring, university admissions, and content moderation. In these settings, models are typically trained on historical human evaluations -- for example, using past hiring decisions as a proxy for true applicant quality. However, if past evaluations unjustly favor certain groups, models trained on these labels may inherit those biases. To address this problem, we propose basing predictions on rubric embeddings, a representation framework that replaces standard black-box embeddings with features derived from expert-defined criteria that align with the underlying construct of interest. By anchoring predictions to semantically meaningful dimensions, this approach guards against biased proxy signals. We provide both theoretical and empirical evidence that rubric embeddings mitigate label bias under plausible conditions. Empirically, we evaluate our method on a novel dataset of applications to a large master's program. We find that models trained on rubric embeddings reduce group disparities while improving measures of cohort quality. Our results suggest that basing predictions on interpretable, domain-grounded representations offers a practical approach to learning in the presence of biased labels.

2605.21451 2026-05-21 cs.LG cond-mat.dis-nn cs.AI cs.NE Version update

Approximation Theory for Neural Networks: Old and New

Soumendu Sundar Mukherjee, Himasish Talukdar

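AI summary: This survey reviews the development of approximation theory for neural networks. It covers classical density results for single-hidden-layer networks, quantitative error bounds and depth-width trade-offs. It also discusses the theoretical properties of newer architectures such as Kolmogorov-Arnold Networks.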

Comments 31 pages, 4 figures

Abstract

Universal approximation theorems provide a mathematical explanation for the expressive power of neural networks. They assert that, under mild conditions on the activation function, feedforward neural networks are dense in broad function classes, such as continuous functions on compact subsets of $\mathbb{R}^d$, $L^p$ spaces, or Sobolev spaces. Over the past four decades, these qualitative universality results have evolved into a rich quantitative theory addressing approximation rates, parameter efficiency, and the role of architectural features such as depth and width. This survey presents several glimpses into this theory. We review classical density results for single-hidden-layer networks, as well as quantitative bounds that relate approximation error to network size and smoothness assumptions on target functions. Particular emphasis is placed on depth--width trade-offs and on results demonstrating that deeper architectures can achieve superior parameter efficiency for structured function classes. In addition to standard feedforward neural networks, we also review recent developments on Kolmogorov--Arnold Networks (KANs), which offer an alternative architectural paradigm and whose approximation-theoretic properties have begun to attract significant theoretical attention.
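A classical constructive example connects the single-hidden-layer density results to a quantitative rate. In one dimension, a ReLU network with n hidden units can exactly represent the piecewise-linear interpolant of f on n+1 uniform knots. For twice-differentiable f, its sup-norm error is therefore at most h²/8 · max|f''|, where h = (b - a)/n. The sketch builds that network for sin on [0, π] and measures the error on a grid. It illustrates a textbook rate, not a specific theorem from the survey.

```python
import math

def relu(z):
    return max(0.0, z)

def build_relu_interpolant(f, a, b, n):
    """One-hidden-layer ReLU net with n units equal to the piecewise-linear interpolant of f."""
    knots = [a + (b - a) * k / n for k in range(n + 1)]
    vals = [f(t) for t in knots]
    slopes = [(vals[k + 1] - vals[k]) / (knots[k + 1] - knots[k]) for k in range(n)]
    # Output weights: the first slope, then the change of slope at each interior knot.
    coeffs = [slopes[0]] + [slopes[k] - slopes[k - 1] for k in range(1, n)]
    bias = vals[0]
    def net(x):
        return bias + sum(c * relu(x - t) for c, t in zip(coeffs, knots[:-1]))
    return net, knots

def sup_error(f, net, a, b, m=1001):
    """Approximate the sup-norm error on a uniform grid of m points."""
    grid = (a + (b - a) * i / (m - 1) for i in range(m))
    return max(abs(f(x) - net(x)) for x in grid)

net8, knots8 = build_relu_interpolant(math.sin, 0.0, math.pi, 8)
net16, _ = build_relu_interpolant(math.sin, 0.0, math.pi, 16)
err8 = sup_error(math.sin, net8, 0.0, math.pi)
err16 = sup_error(math.sin, net16, 0.0, math.pi)
```

Doubling the width from 8 to 16 units cuts the error by roughly a factor of four, consistent with the O(1/n²) rate. Depth-width results concern cases where deeper networks beat this kind of shallow construction in parameter count.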

2605.21442 2026-05-21 cs.LG cs.AI Version update

torchtune: PyTorch native post-training library

Mark Obozov, Maxime Griot, Joseph Cummings, Evan Smothers, Felipe Mello, Rafi Ayub, Philip John Bontrager, Salman Mohammadi, Ariel Kwiatkowski, Nathan Azrak, Mircea Mironenco

Affiliations: PyTorch; Meta; Stanford; Meta-FAIR

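AI summary: This paper introduces torchtune, a PyTorch-native library designed to streamline the post-training lifecycle of large language models. It supports efficient fine-tuning, experimentation and deployment workflows, and emphasizes modularity and extensibility for both performance and flexibility.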

Comments 14 pages

Abstract

Modern LLMs typically require multistage training pipelines to achieve strong downstream performance, with post-training serving as the main interface for adapting open-weight models. We introduce torchtune, a PyTorch-native library designed to streamline the post-training lifecycle of LLMs, enabling efficient fine-tuning, experimentation, and deployment-oriented workflows. Unlike many existing fine-tuning frameworks, which often optimize for ease of use, specialized recipes, or hardware efficiency at the cost of transparency and extensibility, torchtune emphasizes modularity, hackability, and direct access to the underlying PyTorch components. In this paper, we present the design principles behind torchtune, describe how they are reflected in its model builders, training recipes, and distributed training stack, and evaluate the library across representative post-training settings. We compare against popular fine-tuning frameworks, including Axolotl and Unsloth, and show that torchtune provides strong performance and memory efficiency across many settings while remaining flexible enough for rapid research iteration. These results position torchtune as a practical foundation for reproducible LLM post-training research.

2605.21437 2026-05-21 physics.geo-ph cs.LG stat.ML Version update

Neural Negative Binomial Regression for Weekly Seismicity Forecasting: Per-Cell Dispersion Estimation and Tail Risk Assessment

Alim Igilik

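AI summary: This paper proposes a neural-network approach to weekly seismicity forecasting that estimates a per-cell dispersion parameter and assesses tail risk. It relaxes the traditional Poisson assumption and improves forecasts of extreme events.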

Comments 28 pages, 9 figures. Source code available at https://github.com/Al1mkaYandere/seismic-probabilistic-modeling

Abstract

Standard approaches to forecasting the weekly number of earthquakes on a spatial grid rely on the Poisson distribution with a single global dispersion assumption. We show that this assumption is systematically violated in seismic data from Central Asia (2010-2024), where a likelihood-ratio test with boundary correction strongly rejects the Poisson hypothesis (p < 10^{-179}). The main contribution of this work is the EarthquakeNet architecture, which provides an endogenous per-cell estimate of the overdispersion parameter alpha via a neural network (spatial embeddings + MLP), without explicit spatial covariance specification. In contrast to existing negative binomial regression approaches in seismological forecasting, which typically assume a single global alpha, the proposed per-cell formulation allows the model to identify spatial heterogeneity in seismic clustering and to construct probabilistic risk-aware alerts via quantiles of the predicted distribution. A walk-forward evaluation (2018-2023) over four systems shows an 8.6 percent reduction in mean pinball deviation (MPD) relative to a negative binomial GLM baseline. The strongest improvements are observed in the tail regime (Y >= 5), where the continuous ranked probability score (CRPS) of the proposed model is 12.5 percent lower than that of the baseline, indicating improved calibration in extreme-event forecasting.
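The statistical ingredients named here can be sketched with the standard NB2 parameterization, Var(Y) = μ + αμ², where α → 0 recovers the Poisson. The functions compute:

- the negative binomial log-pmf;
- tail probabilities P(Y ≥ k), such as the Y ≥ 5 regime;
- a predictive quantile for a risk alert;
- the boundary-corrected p-value 0.5 · P(χ²₁ > LR) commonly used when testing α = 0.

In EarthquakeNet the per-cell μ and α would come from the network. Here they are plain inputs, and the alert rule and its threshold are hypothetical.

```python
import math

def nb_logpmf(y, mu, alpha):
    """NB2 log-pmf with mean mu and dispersion alpha (Var = mu + alpha * mu**2)."""
    r = 1.0 / alpha
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))

def poisson_logpmf(y, mu):
    return y * math.log(mu) - mu - math.lgamma(y + 1)

def tail_prob(logpmf, k, *params):
    """P(Y >= k) = 1 - sum_{y < k} pmf(y)."""
    return 1.0 - sum(math.exp(logpmf(y, *params)) for y in range(k))

def nb_quantile(q, mu, alpha, max_y=100_000):
    """Smallest y whose NB CDF reaches q."""
    cdf = 0.0
    for y in range(max_y + 1):
        cdf += math.exp(nb_logpmf(y, mu, alpha))
        if cdf >= q:
            return y
    return max_y

def risk_alert(mu, alpha, q=0.95, threshold=5):
    """Hypothetical rule: alert when the q-quantile of the weekly count reaches the threshold."""
    return nb_quantile(q, mu, alpha) >= threshold

def boundary_lr_pvalue(lr_stat):
    """Poisson (alpha = 0) vs NB: alpha lies on the boundary, so p = 0.5 * P(chi2_1 > LR)."""
    return 0.5 * math.erfc(math.sqrt(max(lr_stat, 0.0) / 2.0))
```

For the same mean μ = 2, the NB with α = 0.5 puts about twice as much mass on Y ≥ 5 as the Poisson. This is why a per-cell α matters most in the tail.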

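2605.21435 2026-05-21 cs.LG math.AT math.CT Version update

Gaussian Sheaf Neural Networks

André Ribeiro, Ana Luiza Tenório, Tiago da Silva, Diego Mesquita

Affiliations * Getulio Vargas Foundation; MBZUAI (Mohamed bin Zayed University of Artificial Intelligence)

AI Summary This paper proposes Gaussian Sheaf Neural Networks (GSNNs), which treat Gaussian distributions, described by a mean and a covariance matrix, as node features, addressing the shortcomings of conventional GNNs on distribution-valued features. It derives a new Laplacian operator and validates the approach experimentally.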

English Abstract

Graph Neural Networks (GNNs) have become the de facto standard for learning on relational data. While traditional GNNs' message passing is well suited for vector-valued node features, there are cases in which node features are better represented by probability distributions than real vectors. Concretely, when node features are Gaussians, characterized by a mean and a covariance matrix, naively concatenating their parameters into a single vector and applying standard message passing discards the geometric and algebraic structure that governs means and covariances. We propose Gaussian Sheaf Neural Networks (GSNNs), a principled framework that incorporates these inductive biases into graph-based learning. Building on the theory of cellular sheaves, we derive a new Laplacian operator that generalizes the sheaf Laplacian to this setting and preserves its key properties. We complement our theoretical contributions with experiments on synthetic and real-world data that illustrate the practical relevance of GSNNs.
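The Gaussian Laplacian itself is not given in the abstract, so it is not reproduced here. What can be sketched is the standard cellular sheaf Laplacian that it generalizes: each edge e = (u, v) carries restriction maps F_{u◁e} and F_{v◁e}, the coboundary is (δx)_e = F_{v◁e} x_v - F_{u◁e} x_u, and L = δᵀδ. The function name, the equal stalk dimension d on every node and edge, and the dictionary layout are assumptions for illustration.

```python
import numpy as np

def sheaf_laplacian(n_nodes, d, edges, restrictions):
    """Sheaf Laplacian L = delta^T delta for a cellular sheaf on a graph.

    edges: list of (u, v) node pairs.
    restrictions: dict mapping (node, edge_index) -> d x d restriction map F_{node <| e}.
    All node and edge stalks are R^d in this sketch.
    """
    delta = np.zeros((len(edges) * d, n_nodes * d))
    for e, (u, v) in enumerate(edges):
        rows = slice(e * d, (e + 1) * d)
        # (delta x)_e = F_{v <| e} x_v - F_{u <| e} x_u
        delta[rows, v * d:(v + 1) * d] += restrictions[(v, e)]
        delta[rows, u * d:(u + 1) * d] -= restrictions[(u, e)]
    return delta.T @ delta

# Path graph 0-1-2 with 1-D stalks and identity maps: the ordinary graph Laplacian.
edges = [(0, 1), (1, 2)]
ident = {(node, e): np.eye(1) for e, (u, v) in enumerate(edges) for node in (u, v)}
print(sheaf_laplacian(3, 1, edges, ident))
```

With identity maps and 1-D stalks this reduces to the graph Laplacian. In general L is symmetric and positive semidefinite, and its kernel consists of global sections (node assignments that agree on every edge), which are the usual properties a generalization of the sheaf Laplacian is expected to keep.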

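2605.21429 2026-05-21 cs.RO cs.LG Version update

roto 2.0: The Robot Tactile Olympiad

Elle Miller, Jayaram Reddy, Ayush Deshmukh, Trevor McInroe, David Abel, Oisin Mac Aodha, Sethu Vijayakumar

Affiliations * University of Edinburgh

AI Summary This paper presents roto 2.0, a tactile reinforcement learning benchmark that aims to standardize tactile RL across four robot morphologies (16-DOF to 24-DOF). It focuses on end-to-end "blind" manipulation that uses only proprioception and tactile sensing, without state information or distillation. The work reports a significant performance gain: blind agents complete 13 Baoding ball rotations in 10 seconds, an order of magnitude faster than the current state of the art. By open-sourcing the environments and well-tuned baselines, it lowers the barrier to entry, letting researchers prioritize fundamental algorithmic challenges over tedious RL tuning.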

Comments Accepted to 7th ViTac Workshop, ICRA 2026

English Abstract

Tactile-based reinforcement learning (RL) is currently hindered by fragmented research and a focus on over-saturated orientation tasks. We introduce v2 of the Robot Tactile Olympiad (roto 2.0), a GPU-parallelised benchmark designed to standardise tactile-based RL across four distinct robotic morphologies (16-DOF to 24-DOF). Unlike prior benchmarks, roto focuses on end-to-end "blind" manipulation, utilising only proprioception and tactile sensing without state information or distillation. We demonstrate a significant performance leap, with our blind agents achieving 13 Baoding ball rotations in 10 seconds, an order of magnitude faster than current state-of-the-art speeds. By open-sourcing our environments and robustly tuned baselines, we reduce the barrier to entry and enable researchers to prioritise fundamental algorithmic challenges over tedious RL tuning. Website: https://elle-miller.github.io/roto/
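The reported result, 13 Baoding ball rotations in 10 seconds, is 1.3 rotations per second. The abstract does not say how rotations are counted, so the sketch below assumes one rotation is a full 2π turn of the axis joining the two balls in the palm plane. The function name and the synthetic trajectory are hypothetical.

```python
import math

def count_rotations(pos_a, pos_b):
    """Count full turns of the axis joining two balls.

    pos_a, pos_b: sequences of (x, y) ball positions in the palm plane, one per control step.
    Returns (number of complete rotations, unwrapped angle in radians).
    """
    total = 0.0
    prev = None
    for (xa, ya), (xb, yb) in zip(pos_a, pos_b):
        theta = math.atan2(yb - ya, xb - xa)
        if prev is not None:
            # wrap each increment into [-pi, pi) so the angle is unwrapped correctly
            total += (theta - prev + math.pi) % (2.0 * math.pi) - math.pi
        prev = theta
    return int(abs(total) / (2.0 * math.pi) + 1e-9), total

# Synthetic check: balls orbiting at 1.3 rotations per second for 10 seconds.
dt, duration, rate = 0.01, 10.0, 1.3
angles = [2.0 * math.pi * rate * i * dt for i in range(int(round(duration / dt)) + 1)]
pos_a = [(0.02 * math.cos(t), 0.02 * math.sin(t)) for t in angles]
pos_b = [(-x, -y) for x, y in pos_a]
n_rot, _ = count_rotations(pos_a, pos_b)
print(n_rot, n_rot / duration)  # rotations, rotations per second
```

A counter like this would read ball positions from the simulator for evaluation only; feeding those positions to the policy would break the "blind" setting, where observations are limited to proprioception and touch.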

2605.21428 2026-05-21 cs.LG cs.DS Version update

Polynomial-Time Robust Multiclass Linear Classification under Gaussian Marginals

Ilias Diakonikolas, Giannis Iakovidis, Mingchen Ma

Affiliations * University of Wisconsin-Madison

AI Summary This paper studies agnostic learning of multiclass linear classifiers under the Gaussian distribution and proposes polynomial-time robust learning algorithms with error guarantees for multiclass classification, particularly for k ≥ 3.

English Abstract

We study the task of agnostic learning of multiclass linear classifiers under the Gaussian distribution. Given labeled examples $(x, y)$ from a distribution over $\mathbb{R}^d \times [k]$, with Gaussian $x$-marginal, the goal is to output a hypothesis whose error is comparable to that of the best $k$-class linear classifier. While the binary case $k=2$ has a well-developed algorithmic theory, much less is known for $k \ge 3$. Even for $k=3$, prior robust algorithms incur exponential dependence on the inverse of the desired accuracy in both complexity and representation size. In this work, we develop new structural results for multiclass linear classifiers and use them to design fully polynomial-time robust learners with dimension-independent error guarantees. Our first result shows that the standard multiclass perceptron algorithm requires super-polynomially many samples and updates, even with clean labels and Gaussian marginals, revealing a basic obstruction absent in the binary case. Our main positive result is a pairwise improper-learning framework which yields an efficient learner with error $\widetilde O(k^{3/2}\sqrt{\mathrm{opt}})+ε$ for general $k$. Additionally, we develop a sharper localization-based framework which leads to error $O(\mathrm{opt})+ε$ for $k=3$, and error $\mathrm{poly}(k)\mathrm{opt}+ε$ for geometrically regular $k$-class linear classifiers.
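For reference, a k-class linear classifier predicts argmax_i ⟨w_i, x⟩, and the standard multiclass perceptron mentioned above updates only on mistakes, moving w_y toward x and w_ŷ away from it. The sketch below shows that update on Gaussian inputs with clean labels from a random ground-truth classifier. It does not reproduce the paper's lower bound or its pairwise learner, and the data sizes and names are illustrative assumptions.

```python
import random

def predict(W, x):
    """k-class linear classifier: index of the largest score <w_i, x>."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for w in W]
    return max(range(len(W)), key=scores.__getitem__)

def error(W, data):
    """Empirical 0-1 error of the linear classifier W on (x, y) pairs."""
    return sum(predict(W, x) != y for x, y in data) / len(data)

def perceptron_update(W, x, y):
    """One multiclass perceptron step; returns True if a mistake (and update) occurred."""
    y_hat = predict(W, x)
    if y_hat != y:
        W[y] = [a + b for a, b in zip(W[y], x)]
        W[y_hat] = [a - b for a, b in zip(W[y_hat], x)]
    return y_hat != y

def gaussian_data(W_star, n, d, rng):
    """Clean labels from W_star on x ~ N(0, I_d)."""
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        data.append((x, predict(W_star, x)))
    return data

rng = random.Random(0)
k, d = 3, 5
W_star = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
train = gaussian_data(W_star, 2000, d, rng)
W = [[0.0] * d for _ in range(k)]
updates = sum(perceptron_update(W, x, y) for x, y in train)  # one online pass
print("best-in-class error:", error(W_star, train), "perceptron error:", error(W, train), "updates:", updates)
```

In the agnostic setting the labels need not come from any linear classifier, and "opt" is the error of the best one; here the labels are clean, so opt = 0.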

2605.21426 2026-05-21 cs.LG Version update

Adaptive Signal Resuscitation: Channel-wise Post-Pruning Repair for Sparse Vision Networks

Qishi Zhan, Ziheng Chen, Minxuan Hu

Affiliations * Department of Mathematical and Statistical Sciences, Marquette University; The University of Texas at Austin; Cornell Ann S. Bowers College of Computing and Information Science, Cornell University

AI Summary This paper proposes ASR, a training-free channel-wise repair method that addresses the accuracy loss caused by the granularity mismatch of post-pruning repair at high sparsity. It estimates a variance-matching correction for each output channel and combines it with a data-driven shrinkage rule to improve the performance of sparse vision networks.

English Abstract

One-shot magnitude pruning can cause severe accuracy collapse in the high-sparsity regime, even when the pruning mask preserves the largest weights. We argue that this failure reflects a granularity mismatch in post-pruning repair. Under global magnitude pruning, nearly collapsed channels can coexist with channels that retain informative activation variance within the same layer. Existing layer-wise activation repair methods apply a single correction to the whole layer, and can therefore over-amplify damaged channels while trying to restore the layer-level signal. We propose Adaptive Signal Resuscitation (ASR), a training-free channel-wise repair method that matches the granularity of repair to the granularity of damage. ASR estimates a variance-matching correction for each output channel and stabilizes it with a data-driven shrinkage rule, suppressing unreliable corrections for channels with weak post-pruning signal while preserving corrections for healthier channels. Applied before BatchNorm recalibration, ASR requires only forward passes on a small calibration set and no retraining. Across three datasets, four convolutional architectures, and both unstructured and structured sparsity settings, ASR generally improves over layer-wise repair, with the clearest gains in high-sparsity regimes. On ResNet-50 at 90% sparsity, ASR recovers 55.6% top-1 accuracy on CIFAR-10, compared with 41.0% for layer-wise repair and 28.0% for BatchNorm-only recalibration. Ablations show that naive channel-wise variance matching is insufficient, and that shrinkage stabilizes post-pruning repair.
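ASR's exact shrinkage rule is not given in the abstract. A minimal sketch of the idea, under assumptions: per-channel variance matching between dense and pruned activations on a calibration batch, shrunk toward 1 (no correction) with a hypothetical weight λ_c = r_c / (r_c + τ), where r_c is the fraction of channel c's variance that survived pruning. A layer-wise baseline that matches total variance is included for contrast. The value of τ and all names are illustrative.

```python
import numpy as np

def layerwise_scale(dense_act, pruned_act, eps=1e-8):
    """Single correction for the whole layer: match the layer's total activation variance."""
    return float(np.sqrt(dense_act.var(axis=0).sum() / (pruned_act.var(axis=0).sum() + eps)))

def channelwise_scales(dense_act, pruned_act, tau=0.1, eps=1e-8):
    """Per-output-channel variance matching, shrunk toward 1 for channels with weak signal.

    dense_act, pruned_act: (num_samples, num_channels) pre-BatchNorm activations of one
    layer on the same calibration batch. tau is a hypothetical shrinkage strength.
    """
    v_dense = dense_act.var(axis=0)
    v_pruned = pruned_act.var(axis=0)
    raw = np.sqrt(v_dense / (v_pruned + eps))   # naive channel-wise variance matching
    retained = v_pruned / (v_dense + eps)       # fraction of variance that survived pruning
    lam = retained / (retained + tau)           # reliability weight in [0, 1)
    return 1.0 + lam * (raw - 1.0)              # unreliable channels get a correction near 1

# Toy layer: channel 0 keeps half its variance, channel 1 has nearly collapsed.
rng = np.random.default_rng(0)
dense = rng.standard_normal((4096, 2))
pruned = dense * np.array([0.5 ** 0.5, 0.01])
print(layerwise_scale(dense, pruned), channelwise_scales(dense, pruned))
```

On this toy layer the single layer-wise scale (about 2) amplifies the nearly collapsed channel as much as the healthy one. The shrunk channel-wise scales keep most of the raw correction for the healthy channel (about 1.35 of about 1.41) and stay near 1 for the collapsed one. The scales could then be folded into each output channel's weights before BatchNorm recalibration.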

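2605.21420 2026-05-21 cs.LG cs.AI q-bio.MN Version update

HiRes: Inspectable Precedent Memory for Reaction Condition Recommendation

Shreyas Vinaya Sathyanarayana, Raja Sekhar Pappala, Deepak Warrier

Affiliations * Mstack AI

AI Summary HiRes combines a graph encoder, transformation-aware cross-attention, multi-stream reaction fusion, and a k-NN retrieval layer to achieve accurate and interpretable reaction condition recommendation. It reaches top-1 accuracies of 0.929, 0.534, and 0.530 for catalyst, solvent, and reagent, respectively, matching the best reported baseline on catalyst and outperforming existing methods on solvent and reagent.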

English Abstract

Reaction condition recommendation sits immediately after retrosynthetic disconnection selection, and in practice, chemists require both accurate predictions and the precedents that justify them. We present HiRes (Hierarchical Reaction Representations), a retrieval-augmented condition recommendation system whose learned reaction space serves as both a classifier feature and an inspectable precedent memory. The model combines a graph encoder, transformation-aware cross-attention, multi-stream reaction fusion, and a k-NN retrieval layer. HiRes achieves state-of-the-art performance among primary-slot USPTO-Condition models, reaching Catalyst, Solvent, and Reagent top-1 accuracies (Acc@1) of 0.929, 0.534, and 0.530 respectively. It ties the best reported baseline on Catalyst while outperforming models such as REACON on Solvent and Reagent. Furthermore, paired bootstrap analysis demonstrates that integrating retrieval with learned condition heads provides statistically significant gains for solvent and reagent selection over purely parametric approaches. Ultimately, HiRes bridges the gap between predictive accuracy and chemical interpretability, offering a single representation that supplies both competitive recommendations and the concrete chemical precedents necessary for practical synthesis planning.
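The retrieval layer's dual role can be sketched as follows. It acts as a classifier feature and as an inspectable precedent memory: a similarity-weighted k-NN vote over stored reaction embeddings is blended with the classifier's probabilities, and the neighbour indices are returned so a chemist can look up the precedents. The cosine similarity, the softmax temperature, the blend weight β, and the function names are assumptions; HiRes's actual architecture is not reproduced.

```python
import numpy as np

def knn_blend(query, memory, memory_labels, clf_probs, k=3, temperature=0.1, beta=0.5):
    """Blend classifier probabilities with a similarity-weighted k-NN vote over stored reactions.

    query: (dim,) embedding of the new reaction.
    memory: (num_stored, dim) embeddings of precedent reactions; memory_labels: their condition ids.
    clf_probs: (num_classes,) classifier output for one condition slot (e.g. solvent).
    Returns the blended distribution and the indices of the retrieved precedents.
    """
    q = query / np.linalg.norm(query)
    m = memory / np.linalg.norm(memory, axis=1, keepdims=True)
    sims = m @ q                                  # cosine similarities
    top = np.argsort(-sims)[:k]                   # k nearest precedents
    w = np.exp(sims[top] / temperature)
    w /= w.sum()
    knn_probs = np.zeros_like(clf_probs)
    for weight, idx in zip(w, top):
        knn_probs[memory_labels[idx]] += weight
    return (1.0 - beta) * clf_probs + beta * knn_probs, top

memory = np.array([[1.0, 0.0], [0.9, 0.1], [0.95, -0.05], [0.0, 1.0], [-1.0, 0.0]])
labels = [2, 2, 2, 0, 1]
probs, precedents = knn_blend(np.array([1.0, 0.02]), memory, labels, np.full(3, 1.0 / 3.0))
print(probs, precedents)  # precedents index rows of `memory` that can be inspected
```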

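2605.21418 2026-05-21 cs.LG cs.AI cs.CV cs.NI Version update

FedCritic: Serverless Federated Critic Learning-based Resource Allocation for Multi-Cell OFDMA in 6G

Amin Farajzadeh, Melike Erol-Kantarci

Affiliations * School of Electrical Engineering and Computer Science, University of Ottawa

AI Summary This paper studies inter-cell interference, which aggressive frequency reuse amplifies in 6G ultra-dense networks, and proposes FedCritic, which federates the critic through lightweight gossip-based parameter averaging over the interference graph. This enables stable value estimation with decentralized execution and no central coordinator, and improves the signal-to-interference-plus-noise ratio (SINR), cell-edge rate, network sum-rate, and fairness.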

Comments Submitted to IEEE for possible publication

English Abstract

In sixth-generation (6G) ultra-dense networks, aggressive frequency reuse amplifies inter-cell interference (ICI), making multi-cell orthogonal frequency-division multiple access (OFDMA) scheduling and power control strongly coupled across neighboring cells. We study distributed downlink resource management -- joint subcarrier scheduling and power allocation -- under interference coupling and long-term per-user quality-of-service (QoS) minimum-rate constraints. By using virtual-queue deficit weights to enforce long-term QoS, we develop FedCritic, a serverless federated multi-agent actor-critic framework with decentralized execution. Unlike centralized training with decentralized execution (CTDE) approaches that require centralized critic learning and joint trajectory aggregation, FedCritic federates the critic through lightweight gossip-based parameter averaging over the interference graph, enabling stable value estimation without a central coordinator while keeping policies local. Simulations in an interference-rich reuse-1 setting show that FedCritic improves mean signal-to-interference-plus-noise ratio (SINR) and cell-edge rate, increases network-wide average sum-rate and fairness relative to non-coordinated and CTDE baselines, and achieves more stable training with lower coordination overhead.
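Two mechanisms from the abstract can be sketched in a few lines: gossip averaging of critic parameters over the interference graph, and a virtual-queue deficit that tracks a long-term minimum-rate constraint. The Metropolis-Hastings mixing weights, the queue update Q ← max(Q + r_min - r, 0), the flattened parameter lists, and the toy graph are assumptions for illustration, not FedCritic's published details.

```python
def metropolis_weights(neighbors):
    """Symmetric, doubly stochastic Metropolis-Hastings weights for an undirected graph.

    neighbors: dict node -> set of neighbouring nodes (e.g. mutually interfering cells).
    """
    deg = {i: len(nbrs) for i, nbrs in neighbors.items()}
    W = {}
    for i, nbrs in neighbors.items():
        W[i] = {j: 1.0 / (1 + max(deg[i], deg[j])) for j in nbrs}
        W[i][i] = 1.0 - sum(W[i].values())  # self-weight keeps each row summing to 1
    return W

def gossip_round(params, W):
    """One synchronous gossip step: each cell averages critic parameters with its neighbours only."""
    return {i: [sum(w * params[j][p] for j, w in W[i].items()) for p in range(len(params[i]))]
            for i in params}

def update_virtual_queue(q, r_min, rate):
    """Deficit queue for a minimum-rate constraint: grows while the user is under-served."""
    return max(q + r_min - rate, 0.0)

# Four cells on a line-shaped interference graph, each holding a 2-parameter critic.
graph = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
W = metropolis_weights(graph)
params = {0: [1.0, 0.0], 1: [2.0, 0.0], 2: [3.0, 0.0], 3: [10.0, 4.0]}
for _ in range(200):
    params = gossip_round(params, W)
print(params)  # every cell approaches the network-wide average [4.0, 1.0]
```

Because the weights are symmetric and rows sum to 1, each round preserves the network-wide average, and on a connected graph repeated rounds drive every cell's critic toward it without any central server.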

2605.21404 2026-05-21 cs.LG Version update

What Twelve LLM Agent Benchmark Papers Disclose About Themselves: A Pilot Audit and an Open Scoring Schema

Mahdi Naser Moghadasi, Faezeh Ghaderi

Affiliations * Research Division, BrightMind AI; Texas Tech University; University of Texas at Arlington

AI Summary By analyzing twelve well-known LLM agent benchmark papers, this paper reveals gaps in how they disclose their evaluation methods and designs an open scoring schema to improve transparency and reproducibility.

Comments Pilot audit of 12 LLM agent benchmark papers; schema, codebook, and per-paper scoring sheet released. Submission to IEEE Big Data 2026

English Abstract

We read twelve well-known LLM agent benchmark papers and recorded, dimension by dimension, what each paper actually says about how its evaluation was run. The motivation came from a familiar frustration: two papers will report results on the same benchmark with the same model name and disagree, and you cannot tell why -- the scaffold, the sampling settings, the subset, or the evaluator version. In many cases the published artifact does not let you answer. This paper is an implementation report on the attempt. We designed a small audit schema (five fields: benchmark identity, harness specification, inference settings, cost reporting, failure breakdown), wrote a scoring codebook with the boundary cases we hit during pilot scoring, applied it to twelve canonical papers (eight agent, four classical static), and recorded what we saw. We score the disclosure of an agent run, not its correctness, and make no claim that disclosure implies a trustworthy result. The mean audit score across the eight agent-benchmark papers is 0.38 (out of 1.0), and across the four classical static benchmarks 0.66; the largest gap is on cost (none of the eight agent benchmark papers disclose inference cost in any form) and on harness specification (none fully disclose a content-addressed container image of the evaluation environment). We release the schema as a JSON Schema file, the codebook as a Markdown document, and the raw scoring sheet as a CSV. The scoring was performed by a single auditor in one pass; a multi-rater audit is the natural next step, and we discuss what we think it would change.
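The five-field schema can be sketched as per-field scores averaged into one paper score in [0, 1], with group means computed over agent and static papers. The released codebook defines the real scoring levels and any weighting. The equal weights, the 0/0.5/1 values, the CSV column names, and the paper names below are hypothetical.

```python
import csv
import io

FIELDS = ("benchmark_identity", "harness_specification", "inference_settings",
          "cost_reporting", "failure_breakdown")

def paper_score(field_scores):
    """Mean of the five field scores, each required to lie in [0, 1]."""
    missing = [f for f in FIELDS if f not in field_scores]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    for f in FIELDS:
        if not 0.0 <= field_scores[f] <= 1.0:
            raise ValueError(f"{f} out of range: {field_scores[f]}")
    return sum(field_scores[f] for f in FIELDS) / len(FIELDS)

def scores_from_csv(text):
    """Per-paper scores from a CSV with a 'paper' column plus one column per field."""
    return {row["paper"]: paper_score({f: float(row[f]) for f in FIELDS})
            for row in csv.DictReader(io.StringIO(text))}

def group_mean(scores):
    return sum(scores) / len(scores)

sheet = """paper,benchmark_identity,harness_specification,inference_settings,cost_reporting,failure_breakdown
agent_A,1,0.5,0.5,0,0
static_B,1,0.5,1,0,1
"""
scores = scores_from_csv(sheet)
print(scores, group_mean(list(scores.values())))
```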

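2605.21402 2026-05-21 stat.ML cond-mat.dis-nn cond-mat.stat-mech cs.LG Version update

Memorisation, convergence and generalisation in generative models

Antoine Maillard, Sebastian Goldt

Affiliations * INRIA Paris & DI ENS, PSL University, Paris, France; International School of Advanced Studies (SISSA), Trieste, Italy

AI Summary This paper studies the distinctions among memorisation, convergence, and generalisation in generative models. Analysing linear generative models, it finds that models move from memorisation to generalisation when the number of samples grows linearly with the input dimension, and shows that generalisation comprises two distinct objectives: matching the bulk of the data distribution and recovering the principal latent factors of the data.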

English Abstract

Do generative neural networks learn to produce highly realistic images from a large but finite number of examples, or do they simply memorise their training set? To settle this question, Kadkhodaie, Guth, Simoncelli and Mallat (ICLR '24) trained diffusion models independently on disjoint subsets of a dataset and showed that they converge to nearly the same density when the number of training images is large enough. This result raises two basic questions: how much data do you need for convergence, and what does convergence capture about learning the data distribution? Here, we address these questions by providing an exact analytical characterisation of the transition from memorisation to generalisation in linear generative models. We find that these models memorise at small load, while convergence emerges continuously when the number of samples is linear in the input dimension. Strikingly, we find that convergence is insensitive to recovery of the principal latent factors of the data, which are recovered in a sharp transition. After extending our approach to data with power-law spectra, we find the same distinction between convergence and latent recovery in our experiments with convolutional denoisers and in the data of Kadkhodaie et al. We thus show that generalisation in generative models decomposes into at least two distinct objectives: matching the bulk of the data distribution and recovering the principal latent factors. These objectives correspond to two different distances between the true and learnt data distributions, and only the first one is captured by convergence.
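The paper's exact characterisation is not reproduced here. As a hedged, textbook stand-in, the spiked covariance model (covariance I + β u uᵀ in d dimensions, n = α d samples) shows the same qualitative split. The distance between covariance estimates fitted on disjoint sample sets shrinks smoothly as α grows, while the squared overlap of the top eigenvector with the planted factor u stays at zero below the threshold αβ² = 1 and rises only past it, following the standard large-d formula (1 - 1/(αβ²)) / (1 + 1/(αβ)). The choice u = e₁ and the function names are assumptions.

```python
import numpy as np

def fitted_cov(n, d, beta, rng):
    """Sample covariance of n draws from N(0, I + beta * e1 e1^T)."""
    X = rng.standard_normal((n, d))
    X[:, 0] += np.sqrt(beta) * rng.standard_normal(n)  # planted latent factor along e1
    return X.T @ X / n

def top_overlap(n, d, beta, rng):
    """Squared overlap between the top sample-covariance eigenvector and the planted direction e1."""
    _, vecs = np.linalg.eigh(fitted_cov(n, d, beta, rng))  # eigenvalues in ascending order
    return float(vecs[0, -1] ** 2)

def theory_overlap(alpha, beta):
    """Large-d limit of the squared overlap for n = alpha * d (sharp threshold at alpha * beta**2 = 1)."""
    if alpha * beta ** 2 <= 1.0:
        return 0.0
    return (1.0 - 1.0 / (alpha * beta ** 2)) / (1.0 + 1.0 / (alpha * beta))

def split_distance(n, d, beta, rng):
    """Relative Frobenius distance between covariances fitted on two disjoint sample sets."""
    c1, c2 = fitted_cov(n, d, beta, rng), fitted_cov(n, d, beta, rng)
    return float(np.linalg.norm(c1 - c2) / np.linalg.norm(c1))

rng = np.random.default_rng(0)
d, beta = 300, 1.0
for alpha in (0.5, 1.0, 2.0, 5.0, 10.0):
    n = int(alpha * d)
    print(alpha, theory_overlap(alpha, beta), top_overlap(n, d, beta, rng), split_distance(n, d, beta, rng))
```

In this toy model, "convergence" (agreement between models fitted on disjoint data) improves continuously with α, whereas recovering the latent factor is governed by a threshold.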

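2605.21395 2026-05-21 cs.AI cs.LG Version update

Towards Resilient and Autonomous Networks: A BlueSky Vision on AI-Native 6G

Liang Wu, Kelly Wan, Mayank Darbari, Liangjie Hong

Affiliations * Nokia

AI Summary This paper presents a BlueSky vision of AI-native 6G in which artificial intelligence is integrated natively into 6G, shifting from "Network for AI" to "AI for Network". Using foundation models and collaborative multi-agent systems, it casts network management as a unified multi-modal, multi-task optimization problem and charts a path toward 6G as an intelligent, self-sustaining communication infrastructure.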

Comments Accepted at KDD 2026

English Abstract

The proliferation of emerging applications, such as autonomous driving and immersive experiences, demands cellular networks that are not only faster, but fundamentally more resilient and autonomous. This paper presents a BlueSky vision on how Artificial Intelligence will be natively integrated into 6G, shifting the paradigm from "Network for AI" to "AI for Network". We envision that, unlike 5G's reliance on scattered, ad-hoc models each trained for a single task, native AI in the 6G era will be anchored by a foundation model and orchestrated via collaborative multi-agent systems, framing network management as a unified, multi-modal, multi-task optimization problem. Built on this vision, we outline two transformative directions. The first focuses on developing a 6G foundation model as a unified backbone, with task-specific knowledge distilled into compact models suited for diverse edge deployments. The second advances multi-agent systems designed to autonomously diagnose, maintain, and recover networks with minimal human intervention. These directions chart a roadmap for 6G to evolve into an intelligent, self-sustaining communication infrastructure.
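The first direction distils task-specific knowledge from a 6G foundation model into compact edge models, but the abstract names no method. One common instance is soft-target knowledge distillation, in which the student matches the teacher's temperature-softened output distribution. The temperature, the T² scaling, and the function names are conventional choices assumed here, not details from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits / temperature."""
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# Hypothetical logits for one task head (e.g. a 3-way fault diagnosis) on one input.
print(distillation_loss([3.0, 0.5, -1.0], [2.0, 1.0, -0.5]))
```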

2605.21388 2026-05-21 cs.LG cs.AI cs.NA math.NA stat.ML Version update

On the Regularity and Generalization of One-Step Wasserstein-guided Generative Models for PDE-Induced Measures

Likun Lin, Zhongjian Wang, Jack Xin, Zhiwen Zhang

Affiliations * Department of Mathematics, The University of Hong Kong; Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University; Department of Mathematics, University of California at Irvine

AI Summary This paper studies the regularity and generalization of one-step Wasserstein-guided generative models for PDE-induced probability measures. Within a theoretical framework, it proves regularity of the transport map and generalization properties of the generative model, and supports the theory with experiments.

English Abstract

Despite the remarkable empirical success of generative models, the available theory on their statistical accuracy in scientific computing remains largely pessimistic. This paper develops a theoretical framework for understanding the regularity of transport maps and the generalization properties of one-step Wasserstein-guided generative models for PDE-induced probability measures. We consider normalized target densities associated with linear elliptic and parabolic equations on bounded domains, as well as diffusion and Fokker--Planck equations on the torus. Under standard structural assumptions, we prove that these target measures satisfy doubling conditions. By combining this fact with regularity theory for optimal transport between doubling measures, we show that the optimal transport map from a uniform source measure to the target measure is Hölder continuous. This regularity yields an approximation-theoretic justification for one-step generative models that learn PDE-induced distributions via a single pushforward map. As a representative instance, we study DeepParticle and derive excess-risk bounds characterizing the discrepancy between the learned map and the population-optimal map. We also establish a robustness estimate under target shift and illustrate the theory with experiments which support the derived rates.
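In one dimension, the optimal transport map from Uniform(0, 1) to a target measure is the target's quantile function (the monotone rearrangement), so a "one-step generator" is simply a pushforward of uniform samples through that map. The sketch below builds the map numerically for an example torus density p(x) ∝ exp(-V(x)), the stationary density of an overdamped Langevin diffusion with unit diffusion coefficient, using a hypothetical potential V(x) = cos(2πx). It is not DeepParticle, which learns the map in higher dimensions; the grid size and names are assumptions.

```python
import bisect
import math
import random

def quantile_map(density, a, b, n_grid=2001):
    """Return T(u) = F^{-1}(u), the monotone (optimal) map from Uniform(0, 1) to a target on [a, b].

    density: unnormalized target density on [a, b].
    """
    xs = [a + (b - a) * i / (n_grid - 1) for i in range(n_grid)]
    fs = [density(x) for x in xs]
    cdf = [0.0]
    for i in range(1, n_grid):
        cdf.append(cdf[-1] + 0.5 * (fs[i] + fs[i - 1]) * (xs[i] - xs[i - 1]))  # trapezoid rule
    total = cdf[-1]
    cdf = [c / total for c in cdf]

    def T(u):
        j = bisect.bisect_left(cdf, u)
        if j == 0:
            return xs[0]
        if j >= n_grid:
            return xs[-1]
        # linear interpolation of the inverse CDF between grid points j - 1 and j
        c0, c1 = cdf[j - 1], cdf[j]
        t = 0.0 if c1 == c0 else (u - c0) / (c1 - c0)
        return xs[j - 1] + t * (xs[j] - xs[j - 1])

    return T

T = quantile_map(lambda x: math.exp(-math.cos(2.0 * math.pi * x)), 0.0, 1.0)
rng = random.Random(0)
samples = [T(rng.random()) for _ in range(10000)]  # one-step generation: push uniforms through T
print(min(samples), max(samples))
```

Here the density is bounded away from zero, so T is Lipschitz; the paper's point is that for PDE-induced targets satisfying doubling conditions one still gets Hölder continuity, which is what makes a single pushforward map learnable.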

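2605.21381 2026-05-21 cs.CV cs.LG Version update

Disentangling Generation and Regression in Stochastic Interpolants for Controllable Image Restoration

Yi Liu, Jia Ma, Wengen Li, Jihong Guan, Shuigeng Zhou, Yichao Zhang

Affiliations * Tongji University; Fudan University

AI Summary This paper proposes DiSI, a framework that disentangles the generation and regression components of the stochastic interpolant process. This enables a continuous, controllable transition from pure regression to fully generative restoration and improves the efficiency and accuracy of image restoration.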

Comments 44 pages, 16 figures, 16 tables

English Abstract

Recent advances in Image Restoration (IR) have been largely driven by generative methods such as Diffusion Models and Flow Matching, which excel in synthesizing realistic textures while suffering from slow multi-step inference and compromised pixel fidelity. In contrast, classical regression-based IR methods excel precisely in these aspects, offering single-step efficiency and high pixel-level reconstruction fidelity. To bridge this gap, we propose DiSI, a unified framework that Disentangles the underlying Stochastic Interpolant process into independent generation and regression components. This decoupling endows DiSI with remarkable versatility, enabling a continuous and controllable transition from a pure regression process to a fully generative one. Technically, we instantiate this framework with two specific sampling trajectories, accompanied by a unified sampler for high-quality, few-step inference on arbitrary trajectories. Furthermore, we design a dual-branch U-Net style transformer network in pixel space, using a dedicated branch to enhance conditional guidance while ensuring high throughput. Extensive experiments demonstrate that DiSI efficiently achieves competitive results on various IR tasks, while uniquely offering the inference-time flexibility to control the distortion-perception trade-off within a single model.
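DiSI's two trajectories and its sampler are not specified in the abstract. The generic stochastic-interpolant form below places the degraded image y at t = 0 and the clean image x at t = 1, and shows how a single knob σ can separate a deterministic regression path from a generative noise term: x_t = (1 - t) y + t x + σ √(t(1 - t)) z. The coefficient choices, σ, and the function names are assumptions for illustration; images are flattened to lists.

```python
import math

def interpolant(y, x, z, t, sigma):
    """x_t = (1 - t) * y + t * x + sigma * sqrt(t * (1 - t)) * z, elementwise.

    y: degraded image, x: clean image, z: Gaussian noise sample (flat lists).
    sigma = 0 gives a deterministic regression path; sigma > 0 adds a generative noise term.
    """
    g = sigma * math.sqrt(t * (1.0 - t))
    return [(1 - t) * yi + t * xi + g * zi for yi, xi, zi in zip(y, x, z)]

def velocity_target(y, x, z, t, sigma):
    """Time derivative of the interpolant, a regression target for a network (0 < t < 1)."""
    dg = sigma * (1.0 - 2.0 * t) / (2.0 * math.sqrt(t * (1.0 - t)))
    return [xi - yi + dg * zi for yi, xi, zi in zip(y, x, z)]

y, x, z = [0.0, 1.0], [2.0, 3.0], [0.5, -0.5]
print(interpolant(y, x, z, 0.5, 0.0), interpolant(y, x, z, 0.5, 1.0))
```

With σ = 0 the target velocity is simply x - y, a pure regression target; σ > 0 makes intermediate states stochastic. The derivative of √(t(1 - t)) diverges at t = 0 and t = 1, so training would sample t from the open interval.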

2605.21372 2026-05-21 cs.CV cs.AI cs.LG cs.RO Version update

Closed Loop Dynamic Driving Data Mixture for Real-Synthetic Co-Training

Hongzhi Ruan, Pei Liu, Weiliang Ma, Zhengning Li, Xueyang Zhang, Jun Ma, Dan Xu, Kun Zhan

Affiliations * Li Auto; HKUST; HKUST (GZ)

AI Summary This paper proposes a closed-loop dynamic data mixture method that adjusts the training data mixture through a dynamic optimization process to improve model performance, addressing the key problem of optimizing the data mixture under a limited budget.

English Abstract

Data scaling is fundamental to modern deep learning, and grows increasingly critical as autonomous driving shifts to end-to-end learning. Real-world driving data is expensive to annotate and scene-biased, making real-synthetic co-training with near-infinite synthetic data a promising direction. However, naively incorporating all available synthetic data is inefficient and leads to distribution shifts, and optimizing data mixture under practical training budgets remains a critical yet under-explored problem. We therefore argue that the training data mixture requires clear guidance in terms of scene types and quantities. In this work, we frame data mixing as an approximate dynamic optimization process that iteratively adjusts the training data mixture to maximize model performance, guided by closed-loop evaluation feedback, and propose AutoScale, a fully automated closed-loop data engine unifying scene representation, data mixture optimization and retrieval, as well as model training and evaluation. Specifically, we propose Graph Regularized AutoEncoder (Graph-RAE) for driving scene representations, introduce Cluster-aware Gradient Ascent (Cluster-GA) for cluster-wise importance estimation and reweighting, and perform cluster-guided vector retrieval to select high-value samples. Experiments on NavSim demonstrate that AutoScale outperforms vanilla co-training and cross-domain baselines, achieving better performance with fewer synthetic samples under constrained budgets.
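Cluster-GA's update rule is not spelled out in the abstract. As a stand-in, the sketch below uses exponentiated-gradient ascent (mirror ascent on the probability simplex) over per-cluster mixture weights, driven by hypothetical per-cluster gain estimates from closed-loop evaluation, and then converts the weights into integer sample counts under a fixed budget. The learning rate, the gain values, the rounding scheme, and all names are assumptions.

```python
import math

def update_mixture(weights, gains, lr=1.0):
    """Exponentiated-gradient ascent on the simplex of scene-cluster mixture weights.

    gains: estimated closed-loop score improvement per unit of extra weight on each cluster.
    """
    new = [w * math.exp(lr * g) for w, g in zip(weights, gains)]
    total = sum(new)
    return [w / total for w in new]

def allocate(weights, budget):
    """Turn mixture weights into integer per-cluster sample counts (largest-remainder rounding)."""
    raw = [w * budget for w in weights]
    counts = [int(r) for r in raw]
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[: budget - sum(counts)]:
        counts[i] += 1
    return counts

# Three hypothetical scene clusters (e.g. cut-ins, unprotected turns, highway cruising).
weights = [1.0 / 3.0] * 3
weights = update_mixture(weights, gains=[0.5, 0.0, -0.5])
print(weights, allocate(weights, budget=1000))
```

The multiplicative update keeps the weights positive and summing to 1, so clusters whose extra samples help closed-loop scores receive a larger share of the synthetic-data budget on the next iteration.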

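2605.21352 2026-05-21 cs.LG cs.CE cs.ET Version update

Classification of Single and Mixed Partial Discharges under Switching Voltage Using an AWA-CNN Framework

Md Rafid Kaysar Shagor, Zannatul Ferdousy Mouri, Farhina Haque, Anindya Bijoy Das

AI Summary This paper proposes a CNN framework based on the AWA pattern representation to classify partial discharge sources under switching-voltage excitation. It characterizes pulses by amplitude, width, and area to generate visual patterns, achieving high-accuracy classification of six single and mixed discharge source conditions.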

English Abstract

The growing use of fast-switching power electronics has made partial discharge (PD) analysis under switching-voltage excitation increasingly important, yet more challenging than under sinusoidal conditions due to activity concentrated at voltage transitions. This work presents an Amplitude-Width-Area (AWA) pattern representation for source-oriented PD analysis under switching-voltage excitation. In the proposed method, time-domain PD pulses are characterized using pulse amplitude, width, and area, and mapped into a visual pattern where amplitude and area define the coordinate axes and width is encoded by color. The generated AWA patterns are used to distinguish six single and mixed PD source conditions: corona, internal, surface, corona+internal, corona+surface, and internal+surface. To evaluate the classification capability of the proposed representation, a Random Forest baseline and two Convolutional Neural Network (CNN) models, InceptionV3 and ResNet-18, are compared. The AWA patterns show distinguishable source-dependent distributions, and CNN-based classification achieves testing accuracy above 96%, compared with 73.33% for Random Forest. The results indicate that AWA patterns provide a visual representation of PD pulses suitable for multi-class PD source classification under switching-voltage excitation.
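The amplitude-width-area mapping can be sketched for a single digitised pulse. The abstract does not define how width and area are measured. This sketch assumes that width is the span of samples at or above half the peak magnitude (an FWHM-style, sample-resolution measure) and that area is the trapezoidal integral of |v|. The colour-bin edges and function names are hypothetical.

```python
def awa_features(samples, dt):
    """Amplitude, width, and area of one PD pulse.

    samples: voltage samples of the pulse record; dt: sampling interval in seconds.
    amplitude: peak |v|; width: span of samples at or above half the peak;
    area: trapezoidal integral of |v| over the record.
    """
    mags = [abs(v) for v in samples]
    amplitude = max(mags)
    above = [i for i, m in enumerate(mags) if m >= 0.5 * amplitude]
    width = (above[-1] - above[0]) * dt
    area = dt * (sum(mags) - 0.5 * (mags[0] + mags[-1]))
    return amplitude, width, area

def width_to_color_bin(width, edges):
    """Colour index for a width, given increasing bin edges (0 = narrowest)."""
    return sum(width >= e for e in edges)

# A triangular test pulse sampled at 1 ns: amplitude on one axis, area on the other, width as colour.
amp, width, area = awa_features([0, 1, 2, 3, 4, 3, 2, 1, 0], dt=1e-9)
print((amp, area), width_to_color_bin(width, edges=[2e-9, 5e-9, 1e-8]))
```

Each detected pulse becomes one coloured point at (amplitude, area); rendering all points from a measurement as an image yields an input that a CNN such as ResNet-18 can classify.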

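2605.21348 2026-05-21 cs.LG cs.AI cs.NA math.NA physics.comp-ph Version update

Data-Efficient Neural Operator Training via Physics-Based Active Learning

Alicja Polanska, Lorenzo Zanisi, Vignesh Gopakumar, Stanislas Pamela

Affiliations * University College London; Atomic Energy Authority

AI Summary This paper proposes a physics-based active learning method that improves the data efficiency of neural operator training by using partial differential equation residuals to guide data selection. Numerical experiments on the 1D Burgers equation and the 2D compressible Navier-Stokes equations show that it outperforms random acquisition and matches the state of the art in data efficiency.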

Comments Presented at the ICLR 2026 Workshop on Artificial Intelligence and Partial Differential Equations

English Abstract

Solving partial differential equations with neural operators significantly reduces computational costs but remains bottlenecked by high training data requirements. Active learning offers a natural framework to mitigate this by selectively acquiring the most informative samples in an iterative manner. We introduce physics-based acquisition, a novel physics-informed active learning algorithm that leverages the partial differential equation residual to guide data selection. We validate the method by presenting numerical experiments for the 1D Burgers equation and the 2D compressible Navier-Stokes equations. We show that, in our experiments, physics-based acquisition consistently outperforms random acquisition and matches the state of the art in data efficiency. At the same time, it has the unique advantage of injecting a physics inductive bias into the training process, ensuring that simulation cost is spent where the model's physical understanding is weakest.
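The acquisition criterion can be sketched as follows: score each unlabeled candidate by how badly the model's predicted solution violates the PDE, here viscous Burgers u_t + u u_x = ν u_xx, then send the worst offenders to the simulator. The abstract does not specify the discretization or the aggregate score. The periodic grid, forward time difference, central space differences, mean squared residual, and function names are assumptions. The fields passed in would be neural-operator predictions; two hand-made fields stand in for them.

```python
import numpy as np

def burgers_residual(u, dt, dx, nu):
    """Mean squared residual of u_t + u u_x - nu u_xx on a periodic grid.

    u: array of shape (nt, nx), a predicted space-time field.
    Forward difference in time, central differences in space.
    """
    u_t = (u[1:] - u[:-1]) / dt
    v = u[:-1]
    u_x = (np.roll(v, -1, axis=1) - np.roll(v, 1, axis=1)) / (2 * dx)
    u_xx = (np.roll(v, -1, axis=1) - 2 * v + np.roll(v, 1, axis=1)) / dx ** 2
    r = u_t + v * u_x - nu * u_xx
    return float(np.mean(r ** 2))

def select_by_residual(predictions, batch_size, dt, dx, nu):
    """Indices of the candidates whose predictions violate the PDE most (largest residual first)."""
    scores = np.array([burgers_residual(u, dt, dx, nu) for u in predictions])
    return list(np.argsort(-scores)[:batch_size]), scores

nx, nt = 64, 10
dx, dt, nu = 2 * np.pi / nx, 0.01, 0.01
xg = np.arange(nx) * dx
still = np.ones((nt, nx))                  # a constant state: an exact solution
frozen = np.tile(np.sin(xg), (nt, 1))      # a profile that wrongly never evolves
idx, scores = select_by_residual([still, frozen], 1, dt, dx, nu)
print(idx, scores)
```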

2605.21341 2026-05-21 stat.ML cs.LG Version update

Semiparametric Efficient Bilevel Gradient Estimation

Fares El Khoury, Houssam Zenati, Nathan Kallus, Michael Arbel, Aurélien Bibaut

Affiliations * Université Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK; Gatsby Computational Neuroscience Unit, University College London; Cornell University; Netflix Research

AI Summary This paper proposes a semiparametric debiasing theory that removes first-order bias in bilevel gradient estimation. A cross-fitted orthogonal hypergradient estimator is shown to be asymptotically normal and, under quadratic losses, reduces to a doubly robust score based on conditional-mean nuisances.

English Abstract

Functional bilevel methods estimate a lower-level function and plug it into a hypergradient, but this plug-in gradient can retain first-order bias when the lower-level problem is learned nonparametrically. To remove this bias, we develop a semiparametric debiasing theory for population bilevel gradients based on the efficient influence function. This perspective leads to a cross-fitted orthogonal hypergradient estimator for which we establish asymptotic normality together with uniform control over the outer parameter. Under quadratic losses, the estimator reduces to a simple doubly robust score based on conditional mean nuisances. On synthetic bilevel benchmarks with known ground truth, the method tracks the oracle efficient-gradient benchmark and improves over plug-in functional hypergradients and regularized kernel bilevel baselines.
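The paper's orthogonal hypergradient estimator is not reproduced here. As a hedged illustration of the two ingredients it names, cross-fitting and a Neyman-orthogonal score built from conditional-mean nuisances, the sketch below uses the textbook partially linear model y = θ d + g(x) + noise. The nuisances E[d | x] and E[y | x] are fitted on held-out folds, and θ is estimated from residual-on-residual products, so first-order errors in the nuisances do not bias θ. The binned regression, fold count, and synthetic data are assumptions.

```python
import numpy as np

def binned_regression(x_train, y_train, x_eval, n_bins=20, lo=-1.0, hi=1.0):
    """Piecewise-constant regression: mean of y_train within each bin of x."""
    edges = np.linspace(lo, hi, n_bins + 1)
    idx_train = np.clip(np.digitize(x_train, edges) - 1, 0, n_bins - 1)
    idx_eval = np.clip(np.digitize(x_eval, edges) - 1, 0, n_bins - 1)
    means = np.array([y_train[idx_train == b].mean() if np.any(idx_train == b) else y_train.mean()
                      for b in range(n_bins)])
    return means[idx_eval]

def cross_fitted_plm(x, d, y, n_folds=2, seed=0):
    """Cross-fitted orthogonal (partialling-out) estimate of theta in y = theta * d + g(x) + noise."""
    n = len(x)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    d_res = np.empty(n)
    y_res = np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        # nuisances are fitted on the other folds and evaluated on the held-out fold
        d_res[test] = d[test] - binned_regression(x[train], d[train], x[test])
        y_res[test] = y[test] - binned_regression(x[train], y[train], x[test])
    return float(np.sum(d_res * y_res) / np.sum(d_res ** 2))

rng = np.random.default_rng(1)
n, theta = 4000, 2.0
x = rng.uniform(-1.0, 1.0, n)
d = np.sin(np.pi * x) + rng.standard_normal(n)
y = theta * d + x ** 2 + rng.standard_normal(n)
print(cross_fitted_plm(x, d, y))  # should land close to theta = 2
```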

2605.21325 2026-05-21 cs.LG Version update

Fast and Stable Triangular Inversion for Delta-Rule Linear Transformers

Aleksandros Sobczyk, Gioele Gottardo, Christos K. Matzoros, Mirko De Vita, Filip Skogh, Anastasios Zouzias, Jiawei Zhuang

Affiliations: Computing Systems Lab; Huawei Technologies Switzerland

AI summary: This paper studies fast and stable triangular matrix inversion for Delta-Rule linear transformers. It analyzes direct and iterative algorithms, focusing on matrix-multiplication-rich methods that can use modern hardware efficiently, and evaluates the performance and stability of each method in low-precision floating point. Triangular inversion is sped up by up to 4.3x, improving whole-layer performance while preserving end-to-end model accuracy.

Comments Preprint

Abstract

Linear attention has emerged as a cornerstone for efficient long-context architectures, as evidenced by its integration into state-of-the-art open-source models including Qwen3.5/3.6, Kimi Linear, and RWKV-7. Models that incorporate linear attention layers with the so-called Delta-Rule involve the inversion of triangular matrices as a core subroutine. This operation often forms a performance bottleneck and, due to its high sensitivity to numerical errors, can significantly deteriorate end-to-end model accuracy if it is not carefully implemented. This work provides a systematic analysis of both direct and iterative triangular inversion algorithms, targeting methods that are rich in matrix products and therefore have the potential to use modern hardware efficiently. To that end, our analysis covers a broad spectrum of mathematical and practical aspects, with a heavy focus on numerical stability, computational complexity, and, ultimately, hardware efficiency and practical considerations. We provide a rigorous experimental evaluation to verify these properties in practical scenarios and in low-precision floating-point representations, highlighting the strengths and limitations of each method. Performance benchmarks on NPUs reveal up to $4.3\times$ speed-up against the state-of-the-art implementations of SGLang for triangular matrix inversion, leading to significant performance improvements at the level of the entire layer while maintaining full end-to-end model accuracy.
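To make "rich in matrix products" concrete, here is one classical matmul-only direct method, sketched under the assumption (common in chunkwise Delta-rule kernels) that the matrix to invert is unit lower triangular, $I + A$ with $A$ strictly lower triangular. With $N = -A$ nilpotent, $(I - N)^{-1} = \sum_{j<n} N^j = (I + N)(I + N^2)(I + N^4)\cdots$, which needs only about $2\log_2 n$ matrix multiplications. It is checked against sequential forward substitution; it is not claimed to be one of the paper's algorithms or to share their low-precision stability properties.

```python
import random

def matmul(a, b):
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def inv_unit_lower_forward(a):
    """Reference: solve (I + A) X = I row by row (forward substitution)."""
    n = len(a)
    x = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for c in range(n):
            s = 1.0 if i == c else 0.0
            for j in range(i):
                s -= a[i][j] * x[j][c]
            x[i][c] = s
    return x

def inv_unit_lower_products(a):
    """(I + A)^{-1} for strictly lower-triangular A using only matrix products.

    With N = -A (nilpotent, N^n = 0): (I - N)^{-1} = I + N + ... + N^(n-1)
    = (I + N)(I + N^2)(I + N^4)...  -> about 2*log2(n) matrix multiplications.
    """
    n = len(a)
    power = [[-v for v in row] for row in a]   # N^(2^0)
    result = identity(n)
    covered = 1                                # result == sum of N^0 .. N^(covered-1)
    while covered < n:
        factor = [[power[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
        result = matmul(result, factor)
        power = matmul(power, power)
        covered *= 2
    return result

rng = random.Random(0)
n = 6
a = [[rng.uniform(-0.5, 0.5) if j < i else 0.0 for j in range(n)] for i in range(n)]
x_fwd = inv_unit_lower_forward(a)
x_prod = inv_unit_lower_products(a)
max_diff = max(abs(p - q) for rp, rq in zip(x_fwd, x_prod) for p, q in zip(rp, rq))
print("max |forward - product form|:", max_diff)
```

Forward substitution has a length-$n$ sequential dependency, whereas each factor of the product form is a dense matmul that maps well onto matrix units; that trade-off is the kind the abstract analyses.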

2605.21324 2026-05-21 q-bio.NC cs.LG Version update

Stimulus symmetries can confound representational similarity analyses

Farhad Pashakhanloo, Jacob A. Zavatone-Veth

Affiliations: Center for Brain Science; Society of Fellows; Harvard University

AI summary: This study examines how symmetries in network inputs affect analyses based on representational similarity matrices (RSMs). It shows that functionally equivalent configurations can yield different RSMs, and that stochastic gradient descent or energetic regularization can produce sparse, drifting codes that in turn yield drifting RSMs.

Comments 40 pages

Abstract

What can representational similarity matrices (RSMs) tell us about a neural code? As the popularity of these summary statistics grows, so too does the need for a more complete characterization of their properties. Here, we show that symmetries in network inputs can confound RSM-based analyses. Stimulus symmetries render many representations functionally equivalent, but these different configurations can lead to different RSMs. These different RSMs reflect qualitatively different representational geometries. We show that stochastic gradient descent or energetic regularization can generate sparse, drifting codes, leading in turn to drifting RSMs. Moreover, we demonstrate that these phenomena are present in networks trained to encode image data, where the symmetry is latent. Our results illustrate the challenges inherent in comparing nonlinear neural codes, when functionally-equivalent representations are not related by a simple rotation.
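A toy illustration of the closing point (not the paper's symmetry construction): an inner-product RSM is unchanged when the whole population code is rotated, but an invertible non-orthogonal change, here giving each unit its own gain, keeps the same information linearly decodable while producing a different RSM. The three stimuli and two units are arbitrary choices.

```python
import math

def rsm(reps):
    """Inner-product RSM: entry (s, t) compares the codes for stimuli s and t."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in reps] for r1 in reps]

def rotate(reps, theta):
    """Rotate every 2-unit population vector by the same angle."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * x - s * y, s * x + c * y] for x, y in reps]

def rescale_units(reps, gains):
    """Invertible, non-orthogonal change of code: each unit gets its own gain."""
    return [[v * g for v, g in zip(r, gains)] for r in reps]

def max_abs_diff(m1, m2):
    return max(abs(p - q) for r1, r2 in zip(m1, m2) for p, q in zip(r1, r2))

reps = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # three stimuli, two units
print(max_abs_diff(rsm(reps), rsm(rotate(reps, 0.7))) < 1e-12)   # True
print(rsm(rescale_units(reps, [2.0, 0.5])))
# [[4.0, 0.0, 4.0], [0.0, 0.25, 0.25], [4.0, 0.25, 4.25]]
```

The original RSM is `[[1, 0, 1], [0, 1, 1], [1, 1, 2]]`, so two codes that a linear decoder treats as equivalent already disagree on which stimulus pairs look similar.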

2605.21322 2026-05-21 cs.LG Version update

Optimized Federated Knowledge Distillation with Distributed Neural Architecture Search

Chaimaa Medjadji, Sylvain Kubler, Yves Le Traon, Guilain Leduc, Sadi Alawadi, Feras M. Awaysheh

Affiliations: Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg; Blekinge Institute of Technology; ADSLabs, Umea University

AI summary: This paper proposes FedKDNAS, a framework that combines client-side neural architecture selection with server-coordinated knowledge distillation to address data heterogeneity, system heterogeneity, and communication efficiency in federated learning, achieving a better Pareto trade-off between accuracy and efficiency.

Abstract

Federated Learning (FL) enables collaborative model training without centralizing data. However, real-world deployments must simultaneously address statistical heterogeneity across client data (non-IID), system heterogeneity in device capabilities, and communication efficiency. Existing FL approaches mitigate these challenges through improved aggregation, personalization, or knowledge distillation, but they almost universally assume a fixed client architecture, limiting adaptability to heterogeneous data complexity and hardware constraints. This architectural constraint often leads to suboptimal trade-offs between accuracy and efficiency in real-world FL systems. This work introduces FedKDNAS, a distillation-driven FL framework that combines client-side neural architecture selection with distillation of server-coordinated knowledge. Each client autonomously selects a lightweight model under accuracy-resource constraints. It then trains this model locally using a hybrid objective that combines supervised learning and knowledge distillation, and shares only its predictions on a public reference set. The server then aggregates and smooths these predictions, optionally combining them with a teacher model, to produce stable distillation targets for the next round. Extensive evaluation on six datasets against six representative FL baselines (FedAvg, Ditto, FedMD, FedDF, FedDistill, Local-KD) demonstrates that FedKDNAS consistently achieves superior Pareto efficiency, improving accuracy by up to 15% under non-IID conditions, reducing client CPU usage by approximately 28%, and decreasing communication overhead by up to 44 times while maintaining lightweight logit-based communication.
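One round of the server-side step, and a client objective of the hybrid kind described, might look like the sketch below. Everything concrete is an assumption: averaging client probabilities, temperature softening, a fixed teacher blend weight, and an exponential moving average across rounds are one plausible reading of "aggregates and smooths", and `aggregate_targets` / `hybrid_loss` are hypothetical names. For brevity the hybrid loss scores a single example with both its label and a distillation target, whereas in practice the two terms may be computed on different data.

```python
import math

def normalize(p):
    total = sum(p)
    return [v / total for v in p]

def soften(p, temperature):
    """Temperature smoothing of a probability vector: p_k^(1/T), renormalised (T > 1 softens)."""
    return normalize([v ** (1.0 / temperature) for v in p])

def aggregate_targets(client_probs, prev_targets=None, teacher_probs=None,
                      temperature=2.0, teacher_weight=0.3, momentum=0.5):
    """Server step: client_probs[c][s] is client c's class distribution on public sample s."""
    n_clients = len(client_probs)
    targets = []
    for s in range(len(client_probs[0])):
        n_classes = len(client_probs[0][s])
        mean = [sum(client_probs[c][s][k] for c in range(n_clients)) / n_clients
                for k in range(n_classes)]
        t = soften(mean, temperature)
        if teacher_probs is not None:          # optional teacher blend
            t = [(1 - teacher_weight) * a + teacher_weight * b for a, b in zip(t, teacher_probs[s])]
        if prev_targets is not None:           # smooth across rounds
            t = [momentum * old + (1 - momentum) * new for old, new in zip(prev_targets[s], t)]
        targets.append(normalize(t))
    return targets

def hybrid_loss(student_probs, label, target_probs, lam=0.5):
    """Client objective: (1 - lam) * cross-entropy + lam * KL(target || student)."""
    ce = -math.log(student_probs[label])
    kl = sum(t * math.log(t / p) for t, p in zip(target_probs, student_probs) if t > 0)
    return (1 - lam) * ce + lam * kl

clients = [[[0.9, 0.1]], [[0.5, 0.5]]]      # two clients, one public sample, two classes
targets = aggregate_targets(clients)
round2 = aggregate_targets(clients, prev_targets=targets, teacher_probs=[[0.8, 0.2]])
print(targets, round2)
```

Only the per-sample probability vectors cross the network in this sketch, which is consistent with the abstract's emphasis on lightweight logit-based communication.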

2605.21318 2026-05-21 cs.CL cs.AI cs.LG Version update

TextReg: Mitigating Prompt Distributional Overfitting via Regularized Text-Space Optimization

Lucheng Fu, Ye Yu, Yiyang Wang, Yiqiao Jin, Haibo Jin, B. Aditya Prakash, Haohan Wang

Affiliations: Georgia Institute of Technology; University of Illinois Urbana-Champaign

AI summary: This paper studies prompt distributional overfitting and proposes TextReg, a framework that realizes a soft-penalty objective through regularized textual gradients. By combining dual-evidence gradient purification, semantic edit regularization, and regularization-guided prompt updates, it improves generalization on out-of-distribution (OOD) tasks.

Comments Code: https://github.com/luchengfu6/TextReg

Abstract

Large language models (LLMs) are highly sensitive to the prompts used to specify task objectives and behavioral constraints. Many recent prompt optimization methods iteratively rewrite prompts using LLM-generated feedback, but the resulting prompts often become longer, accumulate narrow sample-specific rules, and generalize poorly beyond the training distribution. We study this failure mode as prompt distributional overfitting and argue that it reflects a lack of representation control in discrete text-space optimization. We formalize this view through representational inefficiency, a dual-factor measure that decomposes prompt inefficiency into capacity cost and scope narrowness, attributing distributional prompt overfitting to their coupled growth during optimization. We propose TextReg, a regularization framework that realizes a soft-penalty objective through regularized textual gradients, combining Dual-Evidence Gradient Purification, Semantic Edit Regularization, and Regularization-Guided Prompt Update. Across multiple reasoning benchmarks, TextReg substantially improves out-of-distribution (OOD) generalization, with accuracy gains of up to +11.8% over TextGrad and +16.5% over REVOLVE.

2605.21317 2026-05-21 cs.LG Version update

CRAFT: Conflict-Resolved Aggregation for Federated Training

Ziqi Wang, Qiang Liu, Nils Thuerey

Affiliations: Department of Mathematics, Friedrich-Alexander-Universität Erlangen-Nürnberg; School of Computation, Information and Technology, Technical University of Munich; Munich Center for Machine Learning

AI summary: This paper proposes CRAFT, which treats the global update as a geometric correction problem in order to aggregate conflicting client updates in federated learning, improving global-model accuracy while reducing performance disparity across clients.

Abstract

The aggregation of conflicting client updates remains a fundamental bottleneck in federated learning (FL) under heterogeneous data distributions. Naive averaging can produce a global update that improves the global objective while conflicting with specific clients, causing degradation for those clients. In this work, we propose CRAFT (Conflict-Resolved Aggregation for Federated Training), a new aggregation framework that treats the global update as a geometric correction problem. We formulate aggregation as finding the update closest to a reference direction while satisfying conflict-free alignment constraints. We derive a closed-form expression for the constrained optimization problem, avoiding the computational overhead of iterative solvers. Furthermore, we use a layer-wise adaptation to address conflicts at varying feature granularities. We provide a theoretical analysis showing that CRAFT promotes a common-descent structure and mitigates conflicts through its projection geometry. Extensive experiments on heterogeneous benchmarks demonstrate that CRAFT improves the accuracy of the global model while reducing performance disparity across clients compared with state-of-the-art baselines. The source code for CRAFT is available at https://github.com/tum-pbs/CRAFT.
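The geometric idea can be seen in its simplest case, sketched here as an assumption rather than CRAFT's own closed form (which handles all clients jointly). With a single conflict-free constraint $\langle d, g \rangle \ge 0$, the update closest to a reference direction $r$ is the Euclidean projection onto a half-space: keep $r$ if it does not conflict, otherwise remove the conflicting component, $d = r - \frac{\langle r, g\rangle}{\|g\|^2} g$. Applying it per layer mimics layer-wise adaptation; the layer names and numbers are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project_no_conflict(reference, client_update):
    """Closest vector to `reference` whose inner product with `client_update` is >= 0,
    i.e. the Euclidean projection onto the half-space {d : <d, g> >= 0}."""
    overlap = dot(reference, client_update)
    if overlap >= 0:
        return list(reference)                        # no conflict: keep the reference
    scale = overlap / dot(client_update, client_update)
    return [r - scale * g for r, g in zip(reference, client_update)]

def layerwise_correction(reference_layers, client_layers):
    """Resolve the conflict independently in each layer (hypothetical layer-wise variant)."""
    return {name: project_no_conflict(reference_layers[name], client_layers[name])
            for name in reference_layers}

avg = {"layer1": [1.0, 0.0], "layer2": [0.2, 0.3]}      # e.g. the averaged update
client = {"layer1": [-1.0, 1.0], "layer2": [1.0, 1.0]}   # one client's update
corrected = layerwise_correction(avg, client)
print(corrected)  # {'layer1': [0.5, 0.5], 'layer2': [0.2, 0.3]}
```

Only `layer1` conflicts with the client, so only that layer is corrected, and the corrected direction is exactly orthogonal to the client's update there.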

2605.21313 2026-05-21 cs.LG Version update

A New Framework to Analyse the Distributional Robustness of Deep Neural Networks

Divij Khaitan, Subhashis Banerjee

Affiliations: Microsoft; Ashoka University

AI summary: This paper proposes a framework that analyzes and quantifies the distributional robustness of deep neural networks by studying the interactions between layer weights and activations. It demonstrates the framework on models trained on CIFAR-10 and ImageNet and shows that the proposed metrics can distinguish networks that have memorized their training data from those that have not.

Comments 9 pages, 6 figures, 3 tables

Abstract

Deep neural networks have achieved impressive performance on a variety of tasks, but their brittleness to distributional shifts remains a significant barrier to real-world deployment. In this paper, we propose a framework to analyse and quantify the distributional robustness of neural networks by studying the interactions between layer weights and activations. We model these interactions using Bernoulli distributions, using the separation between classes as a diagnostic proxy for robustness. We demonstrate the usefulness of this framework through models trained on CIFAR-10 and ImageNet. We show that our proposed metrics can distinguish between networks that have memorised their training data and those that have not. We also perform analogous experiments in the activation space and find that the same properties do not hold up. Additionally, we investigate the behaviour of our metrics under various distribution shifts and show that these shifts reduce separation under our path-based diagnostics. Our results suggest that this framework provides useful model-level diagnostics of representation structure and robustness.

2605.21311 2026-05-21 cs.LG cs.AI Version update

DeCoR: Design and Control Co-Optimization for Urban Streets Using Reinforcement Learning

Bibek Poudel, Lei Zhu, Kevin Heaslip, Sai Swaminathan, Weizi Li

Affiliations: University of Tennessee, Knoxville, TN, USA; University of North Carolina at Charlotte, Charlotte, NC, USA; University of California, Riverside, CA, USA

AI summary: This paper proposes DeCoR, a reinforcement learning framework that jointly optimizes crosswalk layout and network-level signal control on urban streets, reducing the time pedestrians need to reach the nearest crosswalk and substantially lowering pedestrian and vehicle wait times.

Comments 22 pages, 8 figures

Abstract

Modern vision systems can detect, track, and forecast urban actors at scale, yet translating perception outputs into urban design remains limited. We introduce DeCoR, a two-stage reinforcement learning framework that leverages flow observations to co-optimize crosswalk layout and network-level signal control. The design stage encodes the pedestrian network as a graph and learns a generative policy that parameterizes a Gaussian mixture model over crosswalk location and width, from which new crosswalks are sampled. For each layout, a shared control policy learns adaptive signal timings to minimize joint pedestrian and vehicle delay. On a 750 m real-world urban corridor with demand sensed from video and Wi-Fi logs, DeCoR learns a layout that reduces pedestrian arrival time to their nearest crosswalk by 23% while using fewer crosswalks than existing configurations. On the control side, DeCoR reduces pedestrian and vehicle wait time by 79% and 65%, respectively, relative to fixed-time signalization. Further, the control policy generalizes to demands outside of training and is robust to layout changes without retraining.
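The sampling step of the design stage can be sketched as follows. In DeCoR the mixture parameters come from the learned graph policy; here they are fixed hypothetical numbers, and the diagonal covariance, the width bounds, and the clipping to the 750 m corridor are assumptions made for illustration.

```python
import random

def sample_crosswalks(weights, means, stds, rng, n, corridor_length=750.0,
                      min_width=2.0, max_width=6.0):
    """Draw n (location_m, width_m) crosswalks from a diagonal Gaussian mixture, clipped to bounds."""
    layout = []
    for _ in range(n):
        k = rng.choices(range(len(weights)), weights=weights)[0]   # pick a component
        loc = rng.gauss(means[k][0], stds[k][0])
        width = rng.gauss(means[k][1], stds[k][1])
        layout.append((min(max(loc, 0.0), corridor_length),
                       min(max(width, min_width), max_width)))
    return sorted(layout)   # order along the corridor

rng = random.Random(0)
# Hypothetical policy output: three components over (location in m, width in m).
weights = [0.5, 0.3, 0.2]
means = [(120.0, 4.0), (400.0, 3.5), (680.0, 5.0)]
stds = [(15.0, 0.5), (20.0, 0.5), (10.0, 0.3)]
layout = sample_crosswalks(weights, means, stds, rng, n=4)
for loc, width in layout:
    print(f"crosswalk at {loc:6.1f} m, width {width:.2f} m")
```

Each sampled layout would then be handed to the shared signal-control policy for evaluation, and the resulting delay would serve as the design policy's training signal.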

2605.21303 2026-05-21 cs.LG cs.AI cs.LO Version update

From Circuit Evidence to Mechanistic Theory: An Inductive Logic Approach

Nura Aljaafari, Danilo S. Carvalho, Andre Freitas

Affiliations: Department of Computer Science, University of Manchester; Idiap Research Institute; CRUK National Biomarker Centre, University of Manchester

AI summary: This paper proposes an inductive-logic approach that treats circuit interpretation as inductive theory construction, providing formal infrastructure for cumulative mechanistic science. Causal functional signatures and architectural signatures make mechanistic claims explicit and portable across model scales.

Comments 27 pages, 10 Figures, 14 Tables

Abstract

Mechanistic interpretability produces circuit-level causal analyses of neural network behaviour, but discovered circuits often remain isolated experimental artefacts: there is no shared formal representation for what circuits compute, how they relate, or when two findings provide evidence for the same mechanism. This work provides a formal infrastructure for cumulative mechanistic science by treating circuit interpretation as inductive theory construction. Each circuit is characterised at two levels: a Causal Functional Signature (CFS), which grounds component behaviour in causal attribution evidence and token role profiles, and an architectural signature $\tau_{\mathrm{arch}}$, learned by inductive logic programming (ILP) from scale-invariant structural predicates. Together, these constitute a formal coherence layer that makes mechanistic claims explicit, comparable via $\theta$-subsumption, and portable across model scales. CFS reveals qualitatively distinct computational strategies across task types, including attention-mediated copying versus MLP-mediated binding. ILP signatures achieve substantially better structural separation than graph kernel and feature-vector baselines, and support principled transfer across model scales and architecture families.
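The comparison operator mentioned above has a standard definition: clause $C$ $\theta$-subsumes clause $D$ if some substitution $\theta$ of $C$'s variables makes every literal of $C\theta$ appear in $D$. A small backtracking checker follows. Literals are tuples `(predicate, args...)`, capitalised arguments are variables (Prolog convention), and the circuit predicates such as `attends_to` are hypothetical stand-ins for the paper's structural predicates.

```python
def is_var(term):
    return term[:1].isupper()

def match(lit, target, theta):
    """Extend substitution theta so that lit under theta equals target, or return None."""
    if lit[0] != target[0] or len(lit) != len(target):
        return None
    theta = dict(theta)
    for a, b in zip(lit[1:], target[1:]):
        if is_var(a):
            if theta.setdefault(a, b) != b:   # variable already bound to something else
                return None
        elif a != b:                          # constants must agree
            return None
    return theta

def subsumes(c, d, theta=None):
    """True if clause c theta-subsumes clause d; d's terms are treated as constants."""
    theta = {} if theta is None else theta
    if not c:
        return True
    first, rest = c[0], c[1:]
    for target in d:
        extended = match(first, target, theta)
        if extended is not None and subsumes(rest, d, extended):
            return True
    return False

# A (hypothetical) ground description of one discovered circuit.
circuit = [("head", "h1"), ("layer", "h1", "l3"),
           ("attends_to", "h1", "prev_token"), ("writes_to", "h1", "mlp5")]
general = [("head", "H"), ("attends_to", "H", "prev_token")]
print(subsumes(general, circuit))                              # True
print(subsumes([("head", "H"), ("layer", "H", "l0")], circuit))  # False
```

Because a more general clause subsumes every circuit description it covers, subsumption gives a partial order for asking whether two findings instantiate the same structural pattern.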

2605.21301 2026-05-21 cs.LG cs.CV Version update

Automatic Discovery of Disease Subgroups by Contrasting with Healthy Controls

Robin Louiset, Edouard Duchesnay, Benoit Dufumier, Antoine Grigis, Pietro Gori

Affiliations: NeuroSpin; Université Paris-Saclay; CEA; LTCI Institut Polytechnique de Paris

AI summary: This paper proposes a method that discovers interpretable and homogeneous disease subgroups by contrasting patients with healthy controls, and shows improved subgroup-estimation quality on medical imaging datasets.

Comments Accepted to Data Mining and Knowledge Discovery, ECML-PKDD 2026 Journal Track

Abstract

In biomedical Subgroup Discovery, practitioners are interested in discovering interpretable and homogeneous subgroups within a group of patients. In this paper, assuming that healthy subjects (i.e., controls) share common but irrelevant factors of variation with the patients, we motivate and develop a Contrastive Subgroup Discovery method, entitled Deep UCSL. By contrasting patients with controls, Deep UCSL identifies subgroups driven solely by pathological factors, ignoring common variability shared with healthy subjects. Our framework employs a deep feature extractor to learn a discriminative representation space. Mathematically, we derive a novel loss based on the conditional joint likelihood of latent clusters and patient/control labels, optimized via an Expectation-Maximization strategy alternating between subgroup inference and feature encoder updates. A regularization term further encourages representations to capture disease-specific variability while ignoring variability shared with controls. Compared to previous related works, our approach quantitatively improves the quality of the estimated subgroups, as demonstrated on an MNIST example and four distinct real medical imaging datasets. Code and datasets are available at: https://github.com/rlouiset/deep_ucsl.

2605.21295 2026-05-21 cs.LG cs.AI cs.HC Version update

TimeSRL: Generalizable Time-Series Behavioral Modeling via Semantic RL-Tuned LLMs -- A Case Study in Mental Health

Yuang Fan, Lilin Xu, Millie Wu, Jingping Nie, Qingyu Chen, Yuzhe Yang, Zhuo Zhang, Xin Liu, Subigya Nepal, Xiaofan Jiang, Xuhai "Orson" Xu

Affiliations: Columbia University; University of North Carolina at Chapel Hill; Yale University; University of California, Los Angeles; Google; University of Virginia

AI summary: This paper proposes TimeSRL, a two-stage LLM framework that routes predictions through an explicit semantic bottleneck: raw signals are first abstracted into high-level natural language, from which behavioral outcomes are then predicted. The method achieves state-of-the-art cross-cohort generalization in mental-health prediction.

Abstract

Longitudinal passive sensing enables continuous health prediction, yet models often fail under cross-dataset distribution shifts. Traditional ML overfits cohort-specific artifacts, while Large Language Models (LLMs) struggle to reason reliably over long, heterogeneous time-series. We introduce TimeSRL, a two-stage LLM framework that routes predictions through an explicit semantic bottleneck. The model first abstracts raw signals into high-level natural language, then predicts behavioral outcomes from these abstractions alone. This forces the model to reason over semantic concepts that we argue generalize better than raw numbers. We optimize this process end-to-end using Group Relative Policy Optimization (GRPO) with Reinforcement Learning from Verifiable Rewards (RLVR), learning outcome-aligned abstractions without gold intermediate annotations. Instantiated on mental-health prediction, TimeSRL achieves state-of-the-art performance on a benchmark designed to stress-test cross-cohort generalization under a rigorous leave-one-dataset-out (LOSO) protocol, reducing mean absolute error (MAE) relative to strong non-LLM ML and LLM baselines by 3.1-10.1% and 9.5-44.1%, respectively, for anxiety, and by 3.2-9.6% and 27.4-57.6% for depression (all $p < 0.05$). TimeSRL significantly outperforms prior methods in cross-benchmark transfer across different sensing pipelines, rivaling its own within-domain performance without target-domain fine-tuning. These results demonstrate that semantic abstractions are reusable and point to a new direction for generalizable behavior modeling via RL-tuned LLMs.
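Two pieces of the training and evaluation setup can be sketched directly. GRPO replaces a learned value baseline with group-relative advantages: each prompt's sampled completions are scored, and rewards are standardised within that group (population standard deviation here; implementations differ). LOSO evaluation holds out one whole dataset at a time. The reward function for a numeric symptom score and the dataset names are hypothetical; the paper's actual reward design is not given in the abstract.

```python
import statistics

def reward(predicted, true, scale=10.0):
    """Hypothetical verifiable reward for a numeric score: 1 when exact, linear decay with |error|."""
    return max(0.0, 1.0 - abs(predicted - true) / scale)

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardise rewards within one prompt's group of samples."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

def leave_one_dataset_out(names):
    """Yield (held-out dataset, training datasets) for a LOSO protocol."""
    for held_out in names:
        yield held_out, [n for n in names if n != held_out]

rewards = [reward(p, 8) for p in [8, 10, 4, 8]]   # four sampled predictions, true score 8
advantages = group_relative_advantages(rewards)
splits = list(leave_one_dataset_out(["A", "B", "C"]))  # hypothetical dataset names
print(rewards, [round(a, 3) for a in advantages], splits[0])
```

Because advantages are centred within each group, a group whose samples all receive the same reward contributes no gradient signal, which is one reason the semantic abstractions must vary in quality for learning to proceed.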

2605.21292 2026-05-21 stat.ML cs.AI cs.LG math.DS Version update

Large-Step Training Dynamics of a Two-Factor Linear Transformer Model

Krishnakumar Balasubramanian

Affiliations: Department of Statistics, University of California, Davis

AI summary: This paper studies the training dynamics of a two-factor linear transformer model at large learning rates. It finds that large step sizes can change the transformer's training attractor rather than merely accelerating convergence, and that beyond stability thresholds training may settle into cycles, bounded chaos, or divergence.

Abstract

Gradient-flow analyses show that simplified linear transformers can learn the in-context linear-regression algorithm, but they do not explain the finite-step behavior of gradient descent at large learning rates. Motivated by empirical work on high-learning-rate transformer instabilities and by the cubic-map phase diagram for quadratic regression, we study an exactly reducible one-prompt linear-transformer training problem. After normalization, the dynamics reduce to a two-factor product map with an effective step-size parameter $\mu$. On the balanced slice, this map recovers the known scalar cubic transition from monotone convergence to catapult convergence, periodic and chaotic bounded nonconvergence, and divergence. We then analyze the full two-dimensional system and show that, for $0<\mu<2$, it has an explicit invariant Chebyshev ellipse separating forward-invariant regions; this ellipse carries off-balanced chaotic dynamics but is transversely repelling, while balanced scalar attractors can be transversely attracting. These results show that large constant learning rates can change the training attractor of the learned transformer rather than merely accelerating convergence: beyond sharp stability thresholds, finite-step training may settle into cycles, bounded chaos, or divergence instead of a single in-context linear-regression solution. We also discuss the consequences for mini-batch gradient descent based training methods.
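The paper's normalized map and its parameter $\mu$ are not reproduced here. As a hedged stand-in, plain gradient descent on the standard scalar two-factor model $f = uv$ with loss $(uv - 1)^2/2$ shows the same qualitative phenomena: the balanced slice $u = v$ is invariant, and on it the update is the cubic map $w \mapsto w - \eta (w^2 - 1) w$. With a small step the product converges to 1; with $\eta = 1.2$ the iterates stay bounded but the product keeps oscillating instead of converging.

```python
def gd_step(u, v, lr, target=1.0):
    """One simultaneous GD step on L(u, v) = (u*v - target)**2 / 2."""
    residual = u * v - target
    return u - lr * residual * v, v - lr * residual * u

def run(u, v, lr, steps, blowup=1e6):
    """Iterate GD; stop early if the iterates blow up."""
    trajectory = [(u, v)]
    for _ in range(steps):
        u, v = gd_step(u, v, lr)
        if abs(u) > blowup or abs(v) > blowup:
            break
        trajectory.append((u, v))
    return trajectory

small = run(1.5, 1.5, lr=0.1, steps=500)   # balanced start, small step
large = run(1.5, 1.5, lr=1.2, steps=500)   # balanced start, large step
for name, traj in [("lr=0.1", small), ("lr=1.2", large)]:
    u, v = traj[-1]
    print(name, "steps:", len(traj) - 1, "final u*v:", round(u * v, 4), "balanced:", u == v)
```

At $\eta = 1.2$ the fixed point $w = \pm 1$ has multiplier $1 - 2\eta = -1.4$, so it is unstable, and the orbit instead settles onto a bounded oscillation: the step size changes which attractor training reaches, not only how fast.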

2605.21288 2026-05-21 cs.LG Version update

A Mechanistic Study of Tabular Foundation Models

Marin Biloš, James T. Wilson, Anderson Schneider, Yuriy Nevmyvaka

Affiliations: Morgan Stanley

AI summary: This paper studies tabular foundation models with different architectures whose accuracy converges on classification and regression tasks. It characterizes the in-context algorithms the models execute, where their permutation invariances originate, and how robust they are to mechanism-targeted perturbations, and finds that the representation collapse highlighted in prior work is not a practical problem.

Abstract

Tabular foundation models with different architectures converge in accuracy across a range of classification and regression tasks. This raises questions a leaderboard cannot answer: (i) whether the models execute the same in-context algorithm, (ii) where row, column, and class-permutation invariances originate, and (iii) how robust they are under perturbations engineered against the inferred mechanism. We characterize all three. The model families realize qualitatively distinct similarity-based readouts: from an attention-weighted vote over context labels to a class-conditional mean readout, each confirmed by causal intervention. We find that the representation collapse highlighted in prior work is not a practical concern for them. Each model's permutation invariances trace to specific positional parameters whose removal preserves accuracy and makes approximate invariance exact. Perturbations engineered against each readout reproduce predicted failure modes; hub and rank attacks isolate them from refit baselines. Together these results give a mechanistic account of contemporary tabular foundation models and identify which inductive biases govern both their accuracy and characteristic failures.
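The two readout families named above can be contrasted with toy versions operating directly on raw features. Real models compute these over learned embeddings, and the squared-distance kernel and temperature below are assumptions. The context is constructed so that the two readouts disagree: the single nearest neighbour belongs to class 0, while the class-1 centroid is closer to the query.

```python
import math

def attention_vote(query, context_x, context_y, n_classes, temperature=1.0):
    """Softmax over negative squared distances, then a weighted vote over context labels."""
    logits = [-sum((q - c) ** 2 for q, c in zip(query, x)) / temperature for x in context_x]
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    probs = [0.0] * n_classes
    for w, y in zip(weights, context_y):
        probs[y] += w / total
    return probs

def class_mean_readout(query, context_x, context_y, n_classes):
    """Predict the class whose context mean (centroid) is nearest to the query."""
    dim = len(query)
    sums = [[0.0] * dim for _ in range(n_classes)]
    counts = [0] * n_classes
    for x, y in zip(context_x, context_y):
        counts[y] += 1
        for d in range(dim):
            sums[y][d] += x[d]
    dists = []
    for k in range(n_classes):
        if counts[k] == 0:
            dists.append(math.inf)
            continue
        mean = [s / counts[k] for s in sums[k]]
        dists.append(sum((q - m) ** 2 for q, m in zip(query, mean)))
    return min(range(n_classes), key=dists.__getitem__)

context_x = [(-5.0, 0.0), (5.0, 0.0), (1.0, 0.0), (1.2, 0.0)]
context_y = [0, 0, 1, 1]
query = (4.8, 0.0)
vote = attention_vote(query, context_x, context_y, 2, temperature=0.1)
centroid = class_mean_readout(query, context_x, context_y, 2)
print([round(p, 3) for p in vote], centroid)
```

Inputs like this one, which separate a neighbour-driven vote from a centroid-driven readout, are the kind of mechanism-targeted perturbation the abstract uses to tell the model families apart.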

2605.21266 2026-05-21 cs.LG cs.AI Version update

How Much Online RL is Enough? Informative Rollouts for Offline Preference Optimization in RLVR

Richa Verma, Balaraman Ravindran

Affiliations: TCS Research Department of CSE; IIT Madras; Department of Data Science & AI; Wadhwani School of Data Science & AI

AI summary: This paper proposes G2D, which runs a short GRPO warm-up, builds a static preference dataset, and fine-tunes offline with DPO, outperforming GRPO at lower compute cost. It emphasizes that the informativeness of the preference data, rather than its quantity, is what matters.

Abstract

Reinforcement Learning from Verifiable Rewards (RLVR) has emerged as a powerful paradigm for reasoning in language models, with GRPO as its primary example. However, GRPO requires continuous online rollout generation, making it computationally expensive and difficult to scale. While Direct Preference Optimization (DPO) offers a stable and efficient offline alternative, it is typically expected to underperform online RL methods such as GRPO when trained on rollouts from a cold supervised fine-tuned (SFT) policy. We introduce G2D (GRPO to DPO), a three-stage pipeline that performs a short GRPO warm-up, constructs a static preference dataset, and fine-tunes a model offline with DPO. Across several values of the number of online GRPO steps (K) on Qwen2.5-7B and Llama-3.1-8B, we find that offline DPO with moderate warm-up matches or outperforms GRPO at substantially lower compute cost in our setting. On Qwen2.5-7B, G2D at K=150 achieves 62.4% on MATH-500, outperforming GRPO (51.6%) by 10.8 percentage points at ~4x lower compute. On Llama-3.1-8B, G2D at K=500 achieves 49.4%, surpassing GRPO in our experimental setting. We show that performance is governed not by the number of preference pairs, which varies little with K, but by their informativeness. Moderate warm-up produces rollouts with calibrated uncertainty, yielding a stronger contrastive signal, while excessive warm-up leads to overconfident policies and less informative data. Our results recast the offline-online gap in RLVR as primarily a data informativeness problem, and identify a short online RL warm-up with appropriate difficulty calibration of the fine-tuning dataset as a compute-efficient alternative to online RL.
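The offline stages can be sketched as follows: verified rollouts from the warmed-up policy are turned into (chosen, rejected) pairs, and each pair is trained with the standard DPO loss $-\log\sigma(\beta[(\log\pi(y_w) - \log\pi_{\mathrm{ref}}(y_w)) - (\log\pi(y_l) - \log\pi_{\mathrm{ref}}(y_l))])$. The pairing rule (all correct vs. incorrect combinations, capped per prompt) is an assumption, not necessarily the authors' construction. It makes the informativeness point visible: prompts the policy always solves, or never solves, yield no pairs at all.

```python
import math
from itertools import product

def build_preference_pairs(rollouts_by_prompt, max_pairs_per_prompt=4):
    """rollouts_by_prompt: prompt -> list of (completion, reward) with binary verifiable rewards.

    Prompts whose rollouts are all correct or all wrong carry no contrast and yield no pairs.
    """
    pairs = []
    for prompt, rollouts in rollouts_by_prompt.items():
        chosen = [c for c, r in rollouts if r == 1]
        rejected = [c for c, r in rollouts if r == 0]
        for winner, loser in list(product(chosen, rejected))[:max_pairs_per_prompt]:
            pairs.append((prompt, winner, loser))
    return pairs

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one pair: -log sigmoid(beta * [(logp_w - ref_w) - (logp_l - ref_l)])."""
    z = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # numerically stable softplus(-z) = -log sigmoid(z)
    return math.log1p(math.exp(-z)) if z >= 0 else -z + math.log1p(math.exp(z))

rollouts = {
    "p1": [("a", 1), ("b", 0), ("c", 0)],   # mixed outcomes: informative
    "p2": [("d", 1), ("e", 1)],             # all correct: no pair
    "p3": [("f", 0)],                       # all wrong: no pair
}
pairs = build_preference_pairs(rollouts)
print(pairs)  # [('p1', 'a', 'b'), ('p1', 'a', 'c')]
```

An overconfident policy pushes more prompts into the all-correct or all-wrong buckets, so the pairs that remain carry less contrast, which matches the abstract's account of excessive warm-up.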

2605.20706 2026-05-21 cs.DC cs.AI cs.LG Version update

Llamas on the Web: Memory-Efficient, Performance-Portable, and Multi-Precision LLM Inference with WebGPU

Reese Levine, Rithik Sharma, Nikhil Jain, Abhijit Ramesh, Zheyuan Chen, Neha Abbas, James Contini, Tyler Sorensen

Affiliations: Microsoft Research; UC Santa Cruz

AI summary: This paper proposes LlamaWeb, a WebGPU-based LLM inference framework that reduces memory overhead through static memory planning and efficient model loading, supports multiple model weight formats, and enables memory-efficient, performance-portable LLM inference.

Comments 19 pages, 11 figures, 5 tables

Abstract

Running language models in the browser presents a unique opportunity to build efficient, private, and portable AI applications, but requires contending with constrained memory availability and heterogeneous hardware targets. To realize this opportunity, we present Llamas on the Web (LlamaWeb), a WebGPU backend for llama.cpp that enables memory-efficient and performance-portable LLM inference across a wide range of model weight formats in the browser. Our design significantly reduces memory overhead through static memory planning and efficient model loading, addresses cross-device variability through a tunable kernel library, and introduces templated GPU kernels that support performant implementations of numerous quantization formats, enabling broad model support and extensibility to new formats. We evaluate LlamaWeb on 16 devices from 8 vendors, collecting data from 10 language models and four model weight formats. We compare LlamaWeb against existing browser-based LLM frameworks and find that LlamaWeb requires 29-33% less memory across several combinations of device, browser, and operating system. We also evaluate LlamaWeb's performance against these frameworks and find that it increases decode throughput by 45-69% across four GPUs from separate vendors. In addition, we compare LlamaWeb's performance against other llama.cpp backends, where it is competitive with and even beats vendor-specific backend performance on some devices.
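Static memory planning in general means assigning every intermediate tensor a fixed offset in one preallocated arena, ahead of time, so that tensors whose lifetimes overlap never share bytes. A common greedy-by-size heuristic is sketched below; it is a generic technique, not LlamaWeb's actual planner, and the tensor names, sizes, and lifetimes are made up.

```python
def plan_offsets(tensors):
    """tensors: dict name -> (size_bytes, first_step, last_step), lifetimes inclusive.

    Greedy by size: place each tensor at the lowest offset that does not collide with
    an already placed tensor whose lifetime overlaps. Returns (offsets, arena_size).
    """
    placed = []   # (offset, size, first, last)
    offsets = {}
    for name, (size, first, last) in sorted(tensors.items(), key=lambda kv: -kv[1][0]):
        live = sorted((o, s) for o, s, f, l in placed if not (l < first or last < f))
        offset = 0
        for o, s in live:                  # scan address-ordered live blocks for a gap
            if offset + size <= o:
                break
            offset = max(offset, o + s)
        offsets[name] = offset
        placed.append((offset, size, first, last))
    arena = max((o + s for o, s, _, _ in placed), default=0)
    return offsets, arena

tensors = {"a": (100, 0, 1), "b": (50, 1, 2), "c": (100, 2, 3), "d": (30, 3, 4)}
offsets, arena = plan_offsets(tensors)
print(offsets, arena)  # {'a': 0, 'c': 0, 'b': 100, 'd': 100} 150
```

Here the arena needs 150 bytes instead of the 280 that separate allocations would take, because `a` and `c` (and likewise `b` and `d`) are never live at the same time. Planning once up front also avoids allocator calls during decoding.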

2605.19269 2026-05-21 cs.LG 版本更新

CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs

CODA:将Transformer块重写为GEMM-epilogue程序

Han Guo, Jack Zhang, Arjun Menon, Driss Guessous, Vijay Thakkar, Yoon Kim, Tri Dao

发表机构 * Massachusetts Institute of Technology(麻省理工学院) Princeton University(普林斯顿大学) Together AI Meta

AI总结 本文提出CODA,一种将Transformer中的非注意力计算重写为GEMM-epilogue程序的GPU内核抽象,以提高训练效率和硬件利用效率。

Abstract

Transformer training systems are built around dense linear algebra, yet a nontrivial fraction of end-to-end time is spent on surrounding memory-bound operators. Normalization, activations, residual updates, reductions, and related computations repeatedly move large intermediate tensors through global memory while performing little arithmetic, making data movement an increasingly important bottleneck in otherwise highly optimized training stacks. We introduce CODA, a GPU kernel abstraction that expresses these computations as GEMM-plus-epilogue programs. CODA is based on the observation that many Transformer operators exposed as separate framework kernels can be algebraically reparameterized to execute while a GEMM output tile remains on chip, before it is written to memory. The abstraction fixes the GEMM mainloop and exposes a small set of composable epilogue primitives for scaling, reductions, pairwise transformations, and accumulation. This constrained interface preserves the performance structure of expert-written GEMMs while remaining expressive enough to cover nearly all non-attention computation in the forward and backward pass of a standard Transformer block. Across representative Transformer workloads, both human- and LLM-authored CODA kernels achieve high performance, suggesting that GEMM-plus-epilogue programming offers a practical path toward combining framework-level productivity with hardware-level efficiency.
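
The GEMM-plus-epilogue idea can be sketched in plain Python: the tile loop stands in for a GPU GEMM mainloop, `acc` stands in for an on-chip accumulator, and the epilogue transforms each tile (and contributes to a reduction) before the tile is written out once. The particular epilogue here (scale, bias, residual add, ReLU, plus a per-row sum of squares such as a later RMSNorm would need) and every function name are hypothetical choices for illustration; they are not CODA's primitives or API.

```python
def matmul_tile(A, B, rows, cols):
    """Dot products for the output entries in rows x cols."""
    K = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(K)) for j in cols] for i in rows]


def gemm_with_epilogue(A, B, epilogue, tile=2):
    """Compute C = A @ B one output tile at a time, running `epilogue` on each
    tile while it is still in a local buffer, before writing it to C."""
    M, N = len(A), len(B[0])
    C = [[0.0] * N for _ in range(M)]
    for i0 in range(0, M, tile):
        for j0 in range(0, N, tile):
            rows = range(i0, min(i0 + tile, M))
            cols = range(j0, min(j0 + tile, N))
            acc = matmul_tile(A, B, rows, cols)  # stands in for on-chip registers
            epilogue(acc, i0, j0)                # fused elementwise and reduction work
            for di, i in enumerate(rows):
                for dj, j in enumerate(cols):
                    C[i][j] = acc[di][dj]        # the only write to "global memory"
    return C


def make_epilogue(scale, bias, residual, row_sumsq):
    """Hypothetical epilogue: y = relu(scale * x + bias[j] + residual[i][j]),
    plus a per-row sum of squares accumulated tile by tile into row_sumsq."""
    def epilogue(acc, i0, j0):
        for di, row in enumerate(acc):
            for dj, x in enumerate(row):
                y = max(0.0, scale * x + bias[j0 + dj] + residual[i0 + di][j0 + dj])
                row[dj] = y
                row_sumsq[i0 + di] += y * y
    return epilogue


def unfused_reference(A, B, scale, bias, residual):
    """Same math as separate passes, each of which would re-read C from memory."""
    M, N = len(A), len(B[0])
    C = matmul_tile(A, B, range(M), range(N))
    Y = [[max(0.0, scale * C[i][j] + bias[j] + residual[i][j]) for j in range(N)] for i in range(M)]
    return Y, [sum(y * y for y in row) for row in Y]


A = [[1, 2, 0, -1], [0, 1, 3, 2], [2, -1, 1, 0]]
B = [[1, 0, 2, -1, 1], [0, 1, 1, 0, -2], [1, 1, 0, 2, 0], [-1, 2, 1, 1, 1]]
bias = [0.5, -0.5, 0.0, 1.0, -1.0]
residual = [[0.1 * (i + j) for j in range(5)] for i in range(3)]
row_sumsq = [0.0] * 3
Y = gemm_with_epilogue(A, B, make_epilogue(0.5, bias, residual, row_sumsq), tile=2)
Y_ref, sumsq_ref = unfused_reference(A, B, 0.5, bias, residual)
```

Both paths produce the same numbers; the difference a real kernel exploits is that the fused version never materializes the raw GEMM output or re-reads it for the elementwise and reduction steps.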

2605.15156 2026-05-21 cs.CL cs.AI cs.LG Updated version

MeMo: Memory as a Model

Ryan Wei Heng Quek, Sanghyuk Lee, Alfred Wei Lun Leong, Arun Verma, Alok Prakash, Nancy F. Chen, Bryan Kian Hsiang Low, Daniela Rus, Armando Solar-Lezama

Affiliations: Institute of Data Science, National University of Singapore; Integrative Sciences and Engineering Programme, NUSGS; Agency for Science, Technology and Research (A*STAR); Department of Computer Science, National University of Singapore; University of Tokyo; Liquid AI; CSAIL, Massachusetts Institute of Technology; AI Singapore; Singapore-MIT Alliance for Research and Technology Centre, Singapore

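AI summary: This paper proposes the MeMo framework, which encodes new knowledge into a dedicated memory model without changing the LLM's parameters, addressing the needs of applications that require timely, domain-specific information. Its advantages include handling complex cross-document relationships, robustness to retrieval noise, avoidance of catastrophic forgetting, no need for access to the LLM's weights or output logits, and a retrieval cost that is independent of corpus size.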

Comments MeMo augments any LLM with up-to-date or domain-specific knowledge via a trained memory model, avoiding costly retraining, mitigating catastrophic forgetting, and remaining robust to retrieval noise

Abstract

Large language models (LLMs) achieve strong performance across a wide range of tasks, but remain frozen after pretraining until subsequent updates. Many real-world applications require timely, domain-specific information, motivating the need for efficient mechanisms to incorporate new knowledge. In this paper, we introduce MeMo (Memory as a Model), a modular framework that encodes new knowledge into a dedicated memory model while keeping the LLM parameters unchanged. Compared to existing methods, MeMo offers several advantages: (a) it captures complex cross-document relationships, (b) it is robust to retrieval noise, (c) it avoids catastrophic forgetting in the LLM, (d) it does not require access to the LLM's weights or output logits, enabling plug-and-play integration with both open and proprietary closed-source LLMs, and (e) its retrieval cost is independent of corpus size at inference time. Our experimental results on three benchmarks, BrowseComp-Plus, NarrativeQA, and MuSiQue, show that MeMo achieves strong performance compared to existing methods across diverse settings.

2605.12597 2026-05-21 cond-mat.dis-nn cond-mat.stat-mech cs.AI cs.LG physics.comp-ph Updated version

The critical slowing down in diffusion models

Luca Maria Del Bono, Giulio Biroli, Patrick Charbonneau, Marylou Gabrié

Affiliations: Laboratoire de Physique Statistique, École normale supérieure, PSL Research University; Department of Physics, Duke University; Department of Chemistry, Duke University

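AI summary: This paper studies diffusion models applied to the O(n) model of statistical field theory and reveals critical slowing down in parameter learning during training. By introducing a local score approximation, it shows that this phenomenon can be overcome through appropriate architectural design, providing a controlled framework for improving sampling methods in statistical physics.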

Comments 17 pages, 8 figures

Abstract

Computational sampling has been central to the sciences since the mid-20th century. While machine-learning-based approaches have recently enabled major advances, their behavior remains poorly understood, with limited theoretical control over when and why they succeed. Here we provide such insight for diffusion models (a class of generative schemes that is highly effective in practice) by analyzing their application to the $O(n)$ model of statistical field theory in the Gaussian limit $n \to \infty$. In this analytically tractable setting, we show that training a score model with a one-layer network architecture matching the exact solution exhibits a form of critical slowing down in parameter learning. This slowing down also impacts the generation process, indicating that the well-known difficulties of sampling near criticality persist even for learned generative models. To overcome this bottleneck, we demonstrate the power of combining architectural depth with physical locality. We find that using a two-layer architecture drastically reduces the critical slowing down, with the training time scaling logarithmically rather than quadratically with system size. By introducing a local score approximation, we show that this acceleration in training time can be achieved without increasing the number of neural network parameters. Taken together, these results demonstrate that diffusion models can overcome critical slowing down through appropriate architectural design, and establish a controlled framework for understanding and improving learned sampling methods in statistical physics and beyond.
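
The quadratic-in-system-size training time has a familiar textbook analogue that is easy to check numerically: plain gradient descent on a Gaussian (quadratic) energy relaxes slowest along its softest mode, and at a critical point that mode's eigenvalue shrinks like $1/L^2$. The sketch assumes a 1D free-field lattice of $L$ sites with Dirichlet boundaries and a mass term `mass2` standing in for the distance from criticality; it is an intuition-building toy, not the paper's $O(n)$ score-model analysis, and the function names are hypothetical.

```python
import math


def mode_eigenvalues(L, mass2):
    """Eigenvalues of H = -Laplacian + mass2 on a 1D chain of L sites with
    Dirichlet boundaries: mass2 + 4 sin^2(pi k / (2 (L + 1))) for k = 1..L."""
    return [mass2 + 4.0 * math.sin(math.pi * k / (2 * (L + 1))) ** 2 for k in range(1, L + 1)]


def gd_relaxation_steps(L, mass2):
    """Gradient descent on 0.5 * x^T H x with step size 1 / lambda_max shrinks
    mode k by a factor (1 - lambda_k / lambda_max) per step, so the slowest mode
    needs about lambda_max / lambda_min steps to decay by a factor e."""
    lam = mode_eigenvalues(L, mass2)
    return max(lam) / min(lam)


for L in (25, 50, 100, 200):
    # mass2 = 0 is the critical point of this toy; mass2 = 0.1 is off-critical.
    print(L, round(gd_relaxation_steps(L, 0.0)), round(gd_relaxation_steps(L, 0.1)))
```

At `mass2 = 0` the step count roughly quadruples each time `L` doubles, while at `mass2 = 0.1` it saturates near $(4 + 0.1)/0.1 \approx 41$, which is the kind of size-dependent bottleneck the abstract attributes to learning near criticality.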

2605.10933 2026-05-21 cs.LG cs.CL Updated version

DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices

Chenyang Song, Weilin Zhao, Xu Han, Chaojun Xiao, Yingfa Chen, Zhiyuan Liu

Affiliations: Dept. of Comp. Sci. & Tech., Institute for AI, Tsinghua University, Beijing, China

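AI summary: This paper proposes DECO, a sparse mixture-of-experts model that achieves performance comparable to dense Transformers under the same total parameter budget and number of training tokens, and enables efficient deployment on end-side devices.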

Comments 15 pages, 10 figures, 12 tables

Abstract

While Mixture-of-Experts (MoE) scales model capacity without proportionally increasing computation, its massive total parameter footprint creates significant storage and memory-access bottlenecks, which hinder efficient end-side deployment that simultaneously requires high performance, low computational cost, and small storage overhead. To achieve these properties, we present DECO, a sparse MoE architecture designed to match the performance of dense Transformers under identical total parameter budgets and training tokens. DECO uses differentiable and flexible ReLU-based routing enhanced by learnable expert-wise scaling, which adaptively balances the contributions of routed and shared experts. Furthermore, we introduce NormSiLU, an activation function that normalizes inputs prior to the SiLU operator, producing a more stable trend in the routed-expert activation ratio and a higher intrinsic sparsity level. We also identify an empirical advantage in using non-gated MLP experts with ReLU-based routing, indicating that MoE architectures may be simplified. Experiments demonstrate that DECO, activating only 20% of routed experts, matches dense performance and outperforms established MoE baselines. Our specialized acceleration kernel delivers a 2.93$\times$ speedup on Jetson AGX Orin compared with dense inference. Code and checkpoints are available at https://github.com/thunlp/DECO.
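
A minimal sketch of ReLU-based routing with per-expert scaling helps show why the activation ratio is data-dependent: an expert runs only when its ReLU gate is strictly positive, rather than because it landed in a fixed top-k. It assumes a single token vector, hand-set weights, and non-gated MLP experts; NormSiLU is omitted because the abstract does not specify its normalization, and none of the names below come from the DECO code release.

```python
def relu(v):
    return v if v > 0.0 else 0.0


def make_mlp_expert(W1, W2):
    """Non-gated MLP expert: x -> W2 @ relu(W1 @ x), with no GLU-style gate."""
    def expert(x):
        h = [relu(sum(w * xi for w, xi in zip(row, x))) for row in W1]
        return [sum(w * hi for w, hi in zip(row, h)) for row in W2]
    return expert


def relu_routed_moe(x, router_w, expert_scales, experts, shared_expert):
    """gate_e = relu(router_w[e] . x). Experts whose gate is exactly zero are
    skipped, so how many experts run depends on the input instead of a fixed
    top-k. Active outputs are scaled by gate * expert_scales[e] and added to
    the always-on shared expert. Returns (output, fraction of experts active)."""
    out = list(shared_expert(x))
    active = 0
    for w, scale, expert in zip(router_w, expert_scales, experts):
        gate = relu(sum(wi * xi for wi, xi in zip(w, x)))
        if gate == 0.0:
            continue  # no compute spent on this expert
        active += 1
        out = [o + gate * scale * y for o, y in zip(out, expert(x))]
    return out, active / len(experts)


identity = [[1.0, 0.0], [0.0, 1.0]]
experts = [make_mlp_expert(identity, identity) for _ in range(4)]
shared = make_mlp_expert(identity, identity)
router_w = [[1.0, 0.0], [0.0, -1.0], [1.0, 1.0], [-1.0, 0.0]]
out, ratio = relu_routed_moe([1.0, -1.0], router_w, [0.5, 2.0, 1.0, 1.0], experts, shared)
print(out, ratio)  # [3.5, 0.0] 0.5
```

Because the ReLU gate is differentiable almost everywhere and exactly zero for inactive experts, the same function provides both the training signal and the sparsity that a deployment kernel can exploit.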

2605.08352 2026-05-21 cs.LG math.PR stat.ML Updated version

Convergence Analysis of Newton's Method for Neural Networks in the Overparameterized Limit

Konstantin Riedl, Konstantinos Spiliopoulos, Justin Sirignano

Affiliations: Mathematical Institute, University of Oxford; Department of Mathematics & Statistics, Boston University

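AI summary: This paper studies the convergence of the regularized Newton method for training neural networks in the overparameterized limit. By analyzing the properties of the Newton neural tangent kernel (NNTK), it proves that in the infinite-width limit the network converges exponentially fast to the target data, and it addresses the problem of spectral bias.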

Abstract

A convergence analysis is developed for the regularized Newton method for training neural networks (NNs) in the overparameterized limit. As the number of hidden units tends to infinity, the NN training dynamics converge in probability to the solution of a deterministic limit equation involving a "Newton neural tangent kernel" (NNTK). Explicit rates characterizing this convergence are provided and, in the infinite-width limit, we prove that the NN converges exponentially fast to the target data (i.e., a global minimizer with zero loss). We show that this convergence is uniform across the frequency spectrum, addressing the spectral bias inherent in gradient descent. The eigenvalues of the NTK for gradient descent accumulate at zero, leading to slow convergence for target data with high-frequency components. In contrast, the eigenvalues of the NNTK are uniformly bounded from below if the regularization parameter is selected appropriately, allowing Newton's method to converge more quickly for data with high-frequency components. Mathematical challenges that need to be addressed in our analysis include the implicit parameter update of the Newton method, which involves a potentially indefinite Hessian matrix, and the fact that the dimension of the associated linear system of equations tends to infinity as the NN width grows. This complicates deriving the training dynamics in the overparameterized limit as well as proving the convergence of the finite-width dynamics thereto. The analysis identifies a scaling formula for selecting the regularization parameter, which we show can vanish at a suitable rate as the number of hidden units becomes larger. We prove that, for sufficiently large numbers of hidden units, the regularized Hessian remains positive definite during training and the Newton updates for individual NN parameters converge to zero, showing that the model behaves as a linearization around the initialization.
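
The spectral-bias contrast can be made concrete on a linearized model. Assume a model $f = J\theta$ with Jacobian $J$, squared loss, and residual $r = J\theta - y$. A gradient step $\theta \leftarrow \theta - \eta J^\top r$ multiplies the component of $r$ along each eigenvector of the kernel Gram matrix $JJ^\top$ (eigenvalue $\lambda$) by $1 - \eta\lambda$, whereas a regularized Newton step $\theta \leftarrow \theta - (J^\top J + \varepsilon I)^{-1} J^\top r$ multiplies it by $\varepsilon/(\lambda + \varepsilon)$. The sketch below only evaluates these per-mode factors for an assumed spectrum; it is a finite-dimensional toy, not the paper's infinite-width NNTK analysis.

```python
def gd_residuals(eigs, lr, steps):
    """Per-mode residual magnitude after `steps` gradient-descent steps,
    starting from a residual of 1 in every mode."""
    return [abs(1.0 - lr * lam) ** steps for lam in eigs]


def newton_residuals(eigs, damping, steps):
    """Per-mode residual after `steps` regularized Newton steps with unit step size."""
    return [(damping / (lam + damping)) ** steps for lam in eigs]


# Assumed kernel spectrum decaying toward zero; the small eigenvalues play the
# role of the high-frequency components that gradient descent fits last.
eigs = [1.0, 1e-2, 1e-4]
print(gd_residuals(eigs, lr=1.0, steps=10))
print(newton_residuals(eigs, damping=1e-5, steps=10))
```

After ten steps gradient descent has removed the top mode but barely touched the $\lambda = 10^{-4}$ mode, while every Newton factor is at most $\varepsilon/(\lambda_{\min} + \varepsilon)$, so all modes shrink quickly as long as the damping is small relative to the smallest eigenvalue.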

2605.07892 2026-05-21 cs.LG Updated version

Adaptive Regularization for Sparsity Control in Bregman-Based Optimizers

Ahmad Aloradi, Tim Roith, Emanuël A. P. Habets, Daniel Tenbrinck

Affiliations: Department of Data Science, FAU Erlangen-Nürnberg; International Audiolabs, FAU Erlangen-Nürnberg; School of Computation, Information and Technology, Technical University of Munich; Munich Center for Machine Learning (MCML)

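AI summary: This paper proposes an adaptive regularization method for more precise sparsity control in Bregman-based optimizers; dynamically adjusting the regularization parameter λ makes sparsity control more efficient and accurate.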

Comments 21 pages, 15 figures

Abstract

Sparse training reduces the memory and computational costs of deep neural networks. However, sparse optimization methods, e.g., those adding an $\ell_1$ penalty, often control sparsity only indirectly through a regularization parameter $\lambda$, whose mapping to the final sparsity rate is non-trivial. In our experiments, we found this parameter sensitivity to be particularly pronounced for Bregman-based optimizers. Specifically, the two variants LinBreg and AdaBreg reach the same sparsity at $\lambda$ values that differ by up to two orders of magnitude, requiring expensive trial-and-error sweeps to achieve a user-specified sparsity. To address this, we propose an adaptive regularization scheme that updates $\lambda$ based on the difference between the model's current sparsity and the target sparsity. We analyze the resulting algorithm and evaluate it on automatic speaker verification with ECAPA-TDNN and ResNet34 on VoxCeleb and CNCeleb. The proposed method reliably achieves sparsity targets ranging between 75% and 99%. It also converges faster than the oracle-tuned non-adaptive baseline during early training and matches or surpasses its final performance in equal error rate. We further show that the adaptive scheme inherits key properties from its non-adaptive counterpart, including improved out-of-distribution robustness over the dense baselines.
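
The abstract says $\lambda$ is updated from the gap between current and target sparsity but does not give the rule, so the sketch below assumes one: a multiplicative controller that raises $\lambda$ while the model is less sparse than the target and lowers it when it is sparser. Sparsity is measured as the fraction of weights at or below the threshold, a toy stand-in for the zero pattern produced by a shrinkage step; the weights are fixed, the gain is arbitrary, and `adapt_lambda` is a hypothetical name, not LinBreg's or AdaBreg's update.

```python
import math


def sparsity(weights, lam):
    """Fraction of weights that soft-thresholding at level lam would set to zero."""
    return sum(1 for w in weights if abs(w) <= lam) / len(weights)


def adapt_lambda(weights, target, lam=1e-2, gain=2.0, steps=200):
    """Hypothetical multiplicative controller: lam grows while the model is less
    sparse than `target` and shrinks while it is sparser."""
    for _ in range(steps):
        lam *= math.exp(gain * (target - sparsity(weights, lam)))
    return lam


weights = [i / 1000 for i in range(1, 1001)]  # toy weight magnitudes
lam = adapt_lambda(weights, target=0.9)
print(round(lam, 3), sparsity(weights, lam))
```

The user specifies the sparsity level directly and the controller finds a matching $\lambda$, which is the kind of sweep-free behavior the abstract targets; in real training the weights would move between updates, so the controller would track a moving target rather than a fixed one.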

2604.23944 2026-05-21 stat.ML cs.LG Updated version

Sliced-Regularized Optimal Transport

Khai Nguyen

Affiliations: The University of Texas at Austin

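AI summary: This paper proposes a new regularized optimal transport (OT) method, sliced-regularized optimal transport (SROT), which, unlike entropic OT (EOT), regularizes toward a smoothed sliced OT (SOT) plan. It gives a formal definition of SROT, derives its dual formulation, provides a post-Bayesian interpretation, and develops a Sinkhorn-style algorithm with the same scalability advantages as EOT. Using a scalable SOT plan as a prior, SROT approximates the exact OT plan more accurately than EOT at the same level of regularization, and the resulting plan also improves on the reference SOT plan. The paper further introduces the induced SROT divergence and analyzes its topological and computational properties. Experiments on synthetic datasets and color transfer show that SROT approximates exact OT better than both EOT and SOT, and gradient-flow experiments highlight the advantages of the SROT divergence.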

Comments 22 pages, 8 figures, 1 table

Abstract

We propose a new regularized optimal transport (OT) formulation, termed sliced-regularized optimal transport (SROT). Unlike entropic OT (EOT), which regularizes the transport plan toward an independent coupling, SROT regularizes it toward a smoothed sliced OT (SOT) plan. To the best of our knowledge, SROT is the first approach to leverage a version of the SOT plan as a reference to improve classical OT. We provide a formal definition of SROT, derive its dual formulation, and provide a post-Bayesian interpretation of SROT. We then develop a Sinkhorn-style algorithm for efficient computation, retaining the same scalability advantages as EOT. By incorporating a scalable SOT plan as a prior, SROT yields more accurate approximations of the exact OT plan than EOT under the same level of regularization. Moreover, the resulting transport plan improves upon the reference SOT plan itself. We further introduce the corresponding OT divergence induced by SROT, named the SROT divergence, and analyze its topological and computational properties. Finally, we validate our approach through experiments on synthetic datasets and color transfer tasks, demonstrating that SROT approximates exact OT better than both EOT and SOT. Additional experiments on gradient flows further highlight the advantages of the SROT divergence.
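
A Sinkhorn-style solver for regularization toward a reference plan can be sketched if one assumes the regularizer is a KL divergence to a reference $R$: minimizing $\langle C, P\rangle + \varepsilon\,\mathrm{KL}(P \,\|\, R)$ over couplings with marginals $a$ and $b$ gives $P = \mathrm{diag}(u)\,K\,\mathrm{diag}(v)$ with $K = R \odot e^{-C/\varepsilon}$. With $R = ab^\top$ this reduces to standard entropic OT, since $\mathrm{KL}(P \,\|\, ab^\top)$ differs from the negative entropy only by a constant fixed by the marginals. The KL form is an assumption here, and the diagonal-heavy reference below is a hypothetical stand-in: constructing the smoothed sliced-OT plan that SROT actually uses is not shown.

```python
import math


def sinkhorn_with_reference(a, b, cost, ref, eps, iters=500):
    """Approximately solve min_P <cost, P> + eps * KL(P || ref) subject to
    P 1 = a and P^T 1 = b by alternating (Sinkhorn) scaling of the kernel
    K = ref * exp(-cost / eps). Returns the plan P = diag(u) K diag(v)."""
    n, m = len(a), len(b)
    K = [[ref[i][j] * math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]  # match row sums
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]  # match column sums
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


x = [0.0, 1.0, 2.0]
cost = [[(xi - yj) ** 2 for yj in x] for xi in x]
a = b = [1 / 3] * 3
independent = [[ai * bj for bj in b] for ai in a]  # reference implied by entropic OT
diag_heavy = [[0.8 if i == j else 0.1 for j in range(3)] for i in range(3)]  # hypothetical reference
P_eot = sinkhorn_with_reference(a, b, cost, independent, eps=1.0)
P_ref = sinkhorn_with_reference(a, b, cost, diag_heavy, eps=1.0)
print(sum(P_eot[i][i] for i in range(3)), sum(P_ref[i][i] for i in range(3)))
```

The exact OT plan for this example is the identity coupling; at the same $\varepsilon$, a reference that already concentrates mass near it yields a plan with more diagonal mass than the independent reference does, while each iteration costs the same as ordinary Sinkhorn.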

2603.26603 2026-05-21 cs.SE cs.AI cs.LG Updated version

Sustainability Is Not Linear: Quantifying Performance, Energy, and Privacy Trade-offs in On-Device Intelligence

Eziyo Ehsani, Luca Giamattei, Ivano Malavolta, Roberto Pietrantuono

Affiliations: University of Naples Federico II; Vrije Universiteit Amsterdam

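AI summary: This paper studies the trade-offs among performance, energy consumption, and privacy when migrating large language models from cloud clusters to edge devices. Its experiments show that model architecture affects battery life more than the quantization scheme does, and that mid-sized models strike the best balance between response quality and sustainable energy consumption.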

Comments Under review at Empirical Software Engineering (EMSE)

Abstract

The migration of Large Language Models (LLMs) from cloud clusters to edge devices promises enhanced privacy and offline accessibility, but this transition encounters a harsh reality: the physical constraints of mobile batteries, thermal limits, and, most importantly, memory constraints. To navigate this landscape, we constructed a replicable and reproducible experimental pipeline to profile the complex interplay between energy consumption, latency, and quality of LLMs on mobile devices. We harness this pipeline to conduct an empirical case study on a flagship Android device, capturing granular metrics across eight LLMs ranging from 0.5B to 9B parameters without requiring root access, ensuring our findings reflect realistic user conditions. The findings highlight the trade-offs between generation quality, performance, power, and resource consumption, revealing which LLMs offer the best balance across metrics and under different conditions. In addition, we uncovered a counter-intuitive quantization energy paradox: while modern importance-aware quantization successfully reduces memory footprints to fit larger models into RAM, we found it yields negligible energy savings compared to standard mixed-precision methods. This proves that for battery life, the architecture of the model, not its quantization scheme, is the decisive factor. We further identified that Mixture-of-Experts (MoE) architectures defy the standard size-energy trend, offering the storage capacity of a 7B model while maintaining the lower energy profile of a 1B to 2B model. Finally, an analysis of these multi-objective trade-offs reveals a pragmatic sweet spot of mid-sized models, such as Qwen2.5-3B, that effectively balance response quality with sustainable energy consumption.

2603.24472 2026-05-21 cs.CL cs.LG Updated version

Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs?

Jeonghye Kim, Xufang Luo, Minbeom Kim, Sangmook Lee, Dohyung Kim, Jiwon Jeon, Dongsheng Li, Yuqing Yang

Affiliations: Microsoft Research; KAIST; Seoul National University

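AI summary: This paper investigates why self-distillation degrades the reasoning ability of large language models in mathematical reasoning. It finds that self-distillation suppresses the model's expression of uncertainty during reasoning, which lowers performance on unseen problems, and it emphasizes the importance of appropriately expressing uncertainty for robust reasoning.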

Comments Code is available at https://github.com/beanie00/self-distillation-analysis

Abstract

Self-distillation has emerged as an effective post-training paradigm for LLMs, often improving performance while shortening reasoning traces. However, in mathematical reasoning, we find that it can reduce response length while degrading performance. We trace this degradation to the suppression of epistemic verbalization (the model's expression of uncertainty during reasoning). Through controlled experiments varying conditioning context richness and task coverage, we show that conditioning the teacher on rich information suppresses uncertainty expression, enabling rapid in-domain optimization with limited task coverage but harming OOD performance, where unseen problems benefit from expressing uncertainty and adjusting accordingly. Across Qwen3-1.7B/8B, DeepSeek-Distill-Qwen-7B, and Olmo3-7B-Instruct, we observe performance drops of up to 40%. Our findings highlight that exposing appropriate levels of uncertainty is crucial for robust reasoning and underscore the importance of optimizing reasoning behavior beyond merely reinforcing correct answer traces.

2603.24139 2026-05-21 cs.CV cs.LG Updated version

Tutor-Student Reinforcement Learning: A Dynamic Curriculum for Robust Deepfake Detection

Zhanhe Lei, Zhongyuan Wang, Jikang Cheng, Baojin Huang, Yuhong Yang, Zhen Han, Chao Liang, Dengpan Ye

Affiliations: School of Computer Science, Wuhan University; School of Integrated Circuits, Peking University; School of Information, Huazhong Agricultural University; Cyberspace Institute of Advanced Technology, Guangzhou University

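AI summary: This paper proposes a tutor-student reinforcement learning framework that dynamically optimizes the training curriculum to improve the robustness and generalization of deepfake detection.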

Comments Accepted to CVPR 2026

Journal ref
The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2026 (CVPR 2026)

Abstract

Standard supervised training for deepfake detection treats all samples with uniform importance, which can be suboptimal for learning robust and generalizable features. In this work, we propose a novel Tutor-Student Reinforcement Learning (TSRL) framework to dynamically optimize the training curriculum. Our method models the training process as a Markov Decision Process where a "Tutor" agent learns to guide a "Student" (the deepfake detector). The Tutor, implemented as a Proximal Policy Optimization (PPO) agent, observes a rich state representation for each training sample, encapsulating not only its visual features but also its historical learning dynamics, such as EMA loss and forgetting counts. Based on this state, the Tutor takes an action by assigning a continuous weight in [0, 1] to the sample's loss, thereby dynamically re-weighting the training batch. The Tutor is rewarded based on the Student's immediate performance change, specifically rewarding transitions from incorrect to correct predictions. This strategy encourages the Tutor to learn a curriculum that prioritizes high-value samples, such as hard-but-learnable examples, leading to a more efficient and effective training process. We demonstrate that this adaptive curriculum improves the Student's generalization capabilities against unseen manipulation techniques compared to traditional training methods. Code is available at https://github.com/wannac1/TSRL.
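
The PPO tutor itself does not fit in a short example, but the per-sample state and reward bookkeeping described above can be sketched. Assumptions, since the abstract leaves them open: correctness is compared with the previous time the same sample was seen, the EMA decay `beta` is arbitrary, the reward is simply the count of incorrect-to-correct flips, and the weighted loss is averaged over the batch. `SampleState`, `update_and_reward`, and `weighted_batch_loss` are hypothetical names, not taken from the TSRL repository.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SampleState:
    ema_loss: float = 0.0
    forgetting_count: int = 0            # correct -> incorrect transitions so far
    last_correct: Optional[bool] = None  # correctness the last time the sample was seen


def update_and_reward(states, indices, losses, correct, beta=0.9):
    """Update per-sample learning dynamics and return a reward equal to the
    number of samples whose prediction flipped from incorrect to correct."""
    reward = 0
    for i, loss, ok in zip(indices, losses, correct):
        s = states[i]
        s.ema_loss = beta * s.ema_loss + (1.0 - beta) * loss
        if s.last_correct is True and not ok:
            s.forgetting_count += 1
        elif s.last_correct is False and ok:
            reward += 1
        s.last_correct = ok
    return reward


def weighted_batch_loss(losses, weights):
    """Tutor action: one weight in [0, 1] per sample rescales that sample's loss."""
    if any(not 0.0 <= w <= 1.0 for w in weights):
        raise ValueError("weights must lie in [0, 1]")
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)


states = [SampleState() for _ in range(3)]
r1 = update_and_reward(states, [0, 1, 2], [1.0, 0.5, 2.0], [False, True, False])
r2 = update_and_reward(states, [0, 1, 2], [0.2, 1.5, 1.8], [True, False, False])
print(r1, r2, states[1].forgetting_count)
```

In a full system, each `SampleState` (together with visual features) would form the tutor's observation, the tutor's policy would output the weights passed to `weighted_batch_loss`, and the returned reward would drive the PPO update.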

2603.17784 2026-05-21 cs.CV cs.LG Updated version

ResNet-50 with Class Reweighting and Anatomy-Guided Temporal Decoding for Gastrointestinal Video Analysis

Romil Imtiaz, Dimitris K. Iakovidis

Affiliations: Department of Computer Science and Biomedical Informatics, University of Thessaly

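AI summary: This paper presents a multi-label gastrointestinal video analysis pipeline that combines a ResNet-50 frame classifier with anatomy-guided temporal event decoding. Class reweighting and anatomy-guided decoding improve recognition of rare pathology classes, ultimately raising the temporal mAP on the challenge test set from 0.3801 to 0.4303.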

Comments ICPR 2026 RARE-VISION Competition

Abstract

We developed a multi-label gastrointestinal video analysis pipeline based on a ResNet-50 frame classifier followed by anatomy-guided temporal event decoding. The system predicts 17 labels, including 5 anatomy classes and 12 pathology classes, from frames resized to 336x336. A major challenge was severe class imbalance, particularly for rare pathology labels. To address this, we used clipped class-wise positive weighting in the training loss, which improved rare-class learning while maintaining stable optimization. At the temporal stage, we found that direct frame-to-event conversion produced fragmented mismatches with the official ground truth. The final submission therefore combined GT-style framewise event composition, anatomy vote smoothing, and anatomy-based pathology gating with a conservative hysteresis decoder. This design improved the final temporal mAP from 0.3801 to 0.4303 on the challenge test set.
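
Two ingredients above, clipped class-wise positive weighting and a hysteresis decoder, can be sketched in a few lines. The clipping range [1, `max_weight`], the cap of 50, and the on/off thresholds are illustrative assumptions, not the values used in the submission; the resulting weights are the kind of per-class values one would pass as `pos_weight` to PyTorch's `torch.nn.BCEWithLogitsLoss`. Anatomy vote smoothing and anatomy-based pathology gating are not shown, and the function names are hypothetical.

```python
def clipped_pos_weights(pos_counts, num_frames, max_weight=50.0):
    """Per-class positive weight neg/pos for a multi-label BCE loss, clipped to
    [1, max_weight] so very rare classes cannot dominate the gradient."""
    weights = []
    for pos in pos_counts:
        if pos == 0:
            weights.append(max_weight)  # class never observed: fall back to the cap
        else:
            weights.append(min(max((num_frames - pos) / pos, 1.0), max_weight))
    return weights


def hysteresis_events(probs, on=0.6, off=0.4):
    """Turn per-frame probabilities for one label into inclusive (start, end)
    frame intervals. An event opens when p >= on and stays open until p < off,
    so brief dips between the two thresholds do not fragment it."""
    events, start = [], None
    for t, p in enumerate(probs):
        if start is None and p >= on:
            start = t
        elif start is not None and p < off:
            events.append((start, t - 1))
            start = None
    if start is not None:
        events.append((start, len(probs) - 1))
    return events


print(clipped_pos_weights([10, 1000, 0, 1500], num_frames=2000))  # [50.0, 1.0, 50.0, 1.0]
print(hysteresis_events([0.1, 0.7, 0.5, 0.45, 0.2, 0.8, 0.9, 0.1]))  # [(1, 3), (5, 6)]
```

Using two thresholds instead of one is what makes the decoder conservative: a single threshold at 0.5 would split the first event at frame 3, producing the kind of fragmented mismatch the abstract describes.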

2602.13485 2026-05-21 cs.LG stat.ML Updated version

Federated Learning of Nonlinear Temporal Dynamics with Graph Attention-based Cross-Client Interpretability

Ayse Tursucular, Ayush Mohanty, Nazal Mohamed, Nagi Gebraeel

Affiliations: Georgia Institute of Technology

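AI summary: This paper proposes a federated learning framework for learning cross-client temporal dependencies in distributed nonlinear systems. The framework maps high-dimensional local observations to low-dimensional latent states with a nonlinear state-space model and uses a graph attention network to learn a graph-structured neural state-transition model over the communicated latent states. Relating the Jacobian of the learned server-side transition model to the attention coefficients makes the cross-client temporal dependencies interpretable.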

Comments Manuscript under review

Abstract

Networks of modern industrial systems are increasingly monitored by distributed sensors, where each system comprises multiple subsystems generating high-dimensional time series data. These subsystems are often interdependent, making it important to understand how temporal patterns at one subsystem relate to others. This is challenging in decentralized settings where raw measurements cannot be shared and client observations are heterogeneous. In practical deployments, each subsystem (client) operates a fixed proprietary model that cannot be modified or retrained, limiting existing approaches. Nonlinear dynamics further make cross-client temporal interdependencies difficult to interpret because they are embedded in nonlinear state transition functions. We present a federated framework for learning temporal interdependencies across clients under these constraints. Each client maps high-dimensional local observations to low-dimensional latent states using a nonlinear state-space model. A central server learns a graph-structured neural state transition model over the communicated latent states using a Graph Attention Network. For interpretability, we relate the Jacobian of the learned server-side transition model to attention coefficients, providing the first interpretable characterization of cross-client temporal interdependencies in decentralized nonlinear systems. We establish theoretical convergence guarantees to a centralized oracle and validate the framework through synthetic experiments demonstrating convergence, interpretability, scalability, and privacy. Additional real-world experiments show performance comparable to decentralized baselines.

2601.15133 2026-05-21 cs.CV cs.LG Updated version

Building Deep Graph Predictors with Graph Imitation Learning

André Eberhard, Gerhard Neumann, Pascal Friederich

Affiliations: Karlsruhe Institute of Technology (KIT)

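AI summary: This paper proposes GRAIL, a framework that uses graph imitation learning to address representation problems in graph generation; experiments show strong performance across multiple benchmarks.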

Abstract

Recent years have seen substantial progress in neural generation of text, images, and audio, supported by mature training pipelines and large-scale optimization. For graphs, however, comparable progress has been more limited. We attribute this gap to graph-specific optimization and representation challenges that undermine the effectiveness of training neural networks with backpropagation and gradient descent. We argue that representing graphs on a fixed-size Euclidean grid, as is common in recently proposed models for supervised graph prediction, may not be the optimal choice in these settings. To support our view, we provide an analysis of neural graph generation methods and identify theoretical challenges that lead to pitfalls when training neural networks to produce graphs as their output. Motivated by this analysis, we introduce GRAph Imitation Learning (GRAIL), a framework for training neural networks in supervised settings in which the supervision signal is a graph. GRAIL generates graphs sequentially through a Markov decision process over embeddings of partial graphs, thereby avoiding the representation issues associated with fixed-size grid graph representations. We empirically show that GRAIL achieves competitive results on supervised graph prediction across a comprehensive suite of 18 benchmarks, matching or surpassing state-of-the-art methods in several settings.
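
Imitation learning needs expert demonstrations; for supervised graph prediction, one simple way to obtain them is to decompose each target graph into a sequence of construction actions over partial graphs. The sketch assumes a fixed node order, a simple undirected graph without self-loops, and a hypothetical action set (`add_node`, `add_edge`, `stop`); GRAIL itself operates on learned embeddings of partial graphs, which are not modeled here, and the function names are illustrative only.

```python
def expert_trajectory(num_nodes, edges):
    """Decompose a target graph into (state, action) pairs under a fixed node order.
    state  = (number of nodes so far, tuple of edges so far)
    action = ("add_node",), ("add_edge", u, v) with u < v, or ("stop",)"""
    nbrs = {v: set() for v in range(num_nodes)}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    n, cur, traj = 0, [], []
    for v in range(num_nodes):
        traj.append(((n, tuple(cur)), ("add_node",)))
        n += 1
        for u in sorted(x for x in nbrs[v] if x < v):  # connect the new node to earlier ones
            traj.append(((n, tuple(cur)), ("add_edge", u, v)))
            cur.append((u, v))
    traj.append(((n, tuple(cur)), ("stop",)))
    return traj


def replay(traj):
    """Apply the actions to an empty graph and return (num_nodes, edge set)."""
    n, edges = 0, set()
    for _, action in traj:
        if action[0] == "add_node":
            n += 1
        elif action[0] == "add_edge":
            edges.add((action[1], action[2]))
    return n, edges


# Triangle 0-1-2 with a pendant node 3 attached to node 2.
traj = expert_trajectory(4, [(0, 1), (1, 2), (0, 2), (2, 3)])
for state, action in traj:
    print(state, action)
```

A policy network would then be trained to predict each `action` from an encoding of its `state`, so the output is produced step by step rather than as one fixed-size adjacency grid.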

2512.23943 2026-05-21 cs.CY cs.LG stat.ME Updated version

Statistical Guarantees in the Search for Less Discriminatory Algorithms

Chris Hays, Ben Laufer, Solon Barocas, Manish Raghavan

Affiliations: MIT; Cornell University; Microsoft Research

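AI summary: This paper studies statistical guarantees for firms in high-stakes domains that search for less discriminatory algorithms to reduce disparate impact on protected groups. It proposes an adaptive stopping algorithm that determines when the search can stop by certifying that further search is unlikely to yield meaningful improvement.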

Comments 38 pages, 10 figures

Abstract

U.S. discrimination law can impose liability on firms that fail to adopt a less discriminatory alternative (LDA): a decision policy that achieves the same business objectives while reducing disparate impact on legally protected groups. Recent scholarship argues that this doctrine has direct implications for algorithmic decision-making in high-stakes domains such as employment, lending, and housing, potentially obligating firms to search for "less discriminatory algorithms" (Black et al., 2024). Regulators have at times encouraged proactive LDA searches, reinforcing the expectation of a good-faith effort to identify equally performant models with lower disparate impact. Model multiplicity makes such searches plausible: retraining with different random seeds can yield models with comparable predictive performance but materially different disparate impacts. Yet firms cannot retrain indefinitely, raising a central question: when is the search sufficient to demonstrate good faith? We formalize LDA search under multiplicity as an optimal stopping problem in which a developer seeks to produce evidence that further search is unlikely to yield meaningful improvements. Our main contribution is an adaptive stopping algorithm that provides a high-probability upper bound on the best disparate-impact gains attainable through continued retraining, enabling developers to certify (e.g., to a court) that additional search is unlikely to help. We also show how stronger distributional assumptions over the model space can yield tighter bounds, and we validate the approach on real-world credit and housing datasets.
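
A simple, distribution-free ingredient of this kind of stopping argument can be illustrated with order statistics. Assume each retraining run (e.g., a new random seed) yields an i.i.d. draw of a continuous score, such as the reduction in disparate impact at acceptable accuracy. By exchangeability, the chance that the best of $n + m$ runs falls among the last $m$ is exactly $m/(n + m)$, so after $n$ runs one can state how likely $m$ more runs are to beat the current best. This is not the paper's adaptive algorithm, which bounds the size of attainable gains rather than only the chance of any improvement; the function names are hypothetical.

```python
import random


def prob_improvement(n, m):
    """If n + m retraining runs give i.i.d. continuous scores, the chance that the
    best of all n + m comes from the last m runs is m / (n + m)."""
    return m / (n + m)


def runs_needed(m, delta):
    """Smallest n such that m more runs beat the current best with probability <= delta."""
    n = 0
    while prob_improvement(n, m) > delta:
        n += 1
    return n


def simulate(n, m, trials=20000, seed=0):
    """Monte Carlo check of prob_improvement using uniform random scores (n >= 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        scores = [rng.random() for _ in range(n + m)]
        if max(scores[n:]) > max(scores[:n]):
            hits += 1
    return hits / trials


print(runs_needed(1, 0.05), prob_improvement(19, 1), simulate(19, 1))
```

After 19 seeds, one more seed has only a 5% chance of producing a new best; this says nothing about how large an improvement would be, which is the gap that a bound on attainable gains is meant to close.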

2511.09557 2026-05-21 cs.DC cs.LG Updated version

Understanding and Improving Communication Performance in Multi-node LLM Inference

Prajwal Singhania, Siddharth Singh, Lannie Dalton Hough, Akarsh Srivastava, Harshitha Menon, Charles Fredrick Jekel, Abhinav Bhatele

Affiliations: Department of Computer Science, University of Maryland; Lawrence Livermore National Laboratory

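AI summary: This study examines communication performance in multi-node distributed inference. It analyzes the strong-scaling behavior of different model-parallel schemes and proposes NVRAR, a hierarchical all-reduce algorithm based on recursive doubling that substantially reduces inference latency.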

Comments 17 Figures, To Appear in Proceedings of ACM Conference on AI and Agentic Systems 2026

Abstract

As large language models (LLMs) continue to grow in size, distributed inference has become increasingly important. Model-parallel strategies must now efficiently scale not only across multiple GPUs but also across multiple nodes. In this work, we present a detailed performance study of multi-node distributed inference using LLMs on GPU-based supercomputers. We conduct experiments with several state-of-the-art inference engines alongside YALIS, a research-oriented prototype engine designed for controlled experimentation. We analyze the strong-scaling behavior of different model-parallel schemes and identify key bottlenecks. Because all-reduce operations are a common performance bottleneck, we develop NVRAR, a hierarchical all-reduce algorithm based on recursive doubling with NVSHMEM. NVRAR achieves up to 1.9$\times$-3.6$\times$ lower latency than NCCL for message sizes between 128 KB and 2 MB on HPE Slingshot and InfiniBand interconnects. Integrated into YALIS, NVRAR achieves up to a 1.72$\times$ reduction in end-to-end batch latency for the Llama 3.1 405B model in multi-node decode-heavy workloads using tensor parallelism.
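Recursive doubling can be illustrated with a short simulation. This is a flat, single-level sketch on plain Python lists and assumes a power-of-two number of ranks. It does not model NVRAR's hierarchy or its NVSHMEM transport, and the function name is hypothetical. At step k, each rank swaps its partial sum with the rank whose index differs in bit k, so every rank holds the total after log2(p) steps.

```python
from typing import List


def recursive_doubling_allreduce(buffers: List[List[float]]) -> List[List[float]]:
    """Simulate a sum all-reduce across p ranks using recursive doubling."""
    p = len(buffers)
    if p == 0 or p & (p - 1):
        raise ValueError("this sketch assumes a power-of-two number of ranks")
    data = [list(b) for b in buffers]
    distance = 1
    while distance < p:
        # Snapshot so that every exchange in this step sees pre-step values.
        snapshot = [list(b) for b in data]
        for rank in range(p):
            peer = rank ^ distance  # partner differs in exactly one bit
            data[rank] = [a + b for a, b in zip(snapshot[rank], snapshot[peer])]
        distance <<= 1
    return data


ranks = [[float(r), 1.0] for r in range(8)]
result = recursive_doubling_allreduce(ranks)
print(result[0])
```

Every rank finishes in log2(p) exchange steps (three for eight ranks), compared with the 2(p-1) steps of a ring all-reduce. Recursive doubling therefore trades extra bandwidth per step for fewer latency-bound rounds, which is why it suits small and medium message sizes.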

2507.21168 2026-05-21 cs.CL cs.AI cs.LG Updated version

Diverse LLMs or Diverse Question Interpretations? That is the Ensembling Question

Rafael Rosales, Santiago Miret

Affiliations: Intel Labs

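AI summary: This paper compares two diversity approaches for answering binary questions with large language models, model diversity and question interpretation diversity. It finds that question interpretation diversity yields better ensemble accuracy.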

Journal ref
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026), pages 5116-5128

Abstract

Effectively leveraging diversity has been shown to improve performance for various machine learning models, including large language models (LLMs). However, determining the most effective way of using diversity remains a challenge. In this work, we compare two diversity approaches for answering binary questions using LLMs: model diversity, which relies on multiple models answering the same question, and question interpretation diversity, which relies on using the same model to answer the same question framed in different ways. For both cases, we apply majority voting as the ensemble consensus heuristic to determine the final answer. Our experiments on BoolQ, StrategyQA, and PubMedQA show that question interpretation diversity consistently leads to better ensemble accuracy than model diversity. Furthermore, our analysis of GPT and LLaMA shows that model diversity typically produces results between the best and the worst ensemble members without clear improvement.
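Majority voting over binary answers, as used for both ensemble types, takes only a few lines. The answer table below is invented for illustration. Breaking ties toward `False` is an assumption; odd ensemble sizes avoid ties altogether.

```python
from collections import Counter
from typing import Sequence


def majority_vote(answers: Sequence[bool]) -> bool:
    """Return True only if strictly more members answered True than False."""
    counts = Counter(answers)
    return counts[True] > counts[False]


def accuracy(predictions: Sequence[bool], labels: Sequence[bool]) -> float:
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# answers[i][j]: yes/no answer of ensemble member j to question i (made-up data).
# Members can be different models (model diversity) or one model queried with
# differently framed versions of the question (question interpretation diversity).
answers = [
    [True, True, False],
    [False, True, False],
    [False, False, True],
    [True, True, True],
]
labels = [True, False, True, True]

ensemble = [majority_vote(row) for row in answers]
members = [accuracy([row[j] for row in answers], labels) for j in range(3)]
print(accuracy(ensemble, labels), members)
```

In this toy table the members score 0.75, 0.5, and 0.75. The ensemble also scores 0.75: it matches its best member without exceeding it.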

2503.00565 2026-05-21 stat.ML cs.LG math.ST stat.ME stat.TH Updated version

Batched Single-Index Global Multi-Armed Bandits with Covariates

Sakshi Arya, Hyebin Song

Affiliations: Department of Mathematics, Applied Mathematics and Statistics, Case Western Reserve University; Department of Statistics, Pennsylvania State University

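AI summary: This paper proposes a semi-parametric framework for batched bandits with covariates. The framework introduces a parameter shared across arms and uses a single-index regression model to capture relationships between arm rewards. The paper presents the BIDS algorithm, derives regret bounds in two settings, and shows that BIDS attains the minimax-optimal rate of nonparametric batched bandits with dimension d = 1.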

Abstract

The multi-armed bandit (MAB) framework is a widely used approach for sequential decision-making, where a decision-maker selects an arm in each round with the goal of maximizing long-term rewards. In many practical applications, such as personalized medicine and recommendation systems, contextual information is available at the time of decision-making, rewards from different arms are related rather than independent, and feedback is provided in batches. We propose a novel semi-parametric framework for batched bandits with covariates that incorporates a shared parameter across arms. We leverage the single-index regression (SIR) model to capture relationships between arm rewards while balancing interpretability and flexibility. Our algorithm, Batched single-Index Dynamic binning and Successive arm elimination (BIDS), employs a batched successive arm elimination strategy with a dynamic binning mechanism guided by the single-index direction. We consider two settings, one where a pilot direction is available and another where the direction is estimated from data, and derive theoretical regret bounds for both. When a pilot direction is available with sufficient accuracy and the number of arms $K$ is fixed, our approach achieves minimax-optimal rates (with $d = 1$) for nonparametric batched bandits, circumventing the curse of dimensionality. Extensive experiments on simulated and real-world datasets demonstrate the effectiveness of our algorithm compared to the nonparametric batched bandit method introduced by Jiang et al. (2025).
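The elimination step at the core of BIDS can be sketched for a single bin. This sketch assumes rewards lie in [0, 1] and uses a textbook Hoeffding confidence radius, which may differ from the paper's. The dynamic binning along the estimated single-index direction is not shown, and the function name is hypothetical.

```python
import math
from typing import List, Sequence


def eliminate_arms(means: Sequence[float], counts: Sequence[int],
                   active: List[int], delta: float) -> List[int]:
    """One batch of successive arm elimination inside a single bin.

    With rewards in [0, 1], Hoeffding gives the confidence radius
    sqrt(log(1/delta) / (2 n)). An arm survives if its upper confidence
    bound is at least the best lower confidence bound among active arms.
    """
    radius = {a: math.sqrt(math.log(1 / delta) / (2 * counts[a])) for a in active}
    best_lcb = max(means[a] - radius[a] for a in active)
    return [a for a in active if means[a] + radius[a] >= best_lcb]


# After a large batch, the clearly worse arm 1 is dropped; with few pulls all survive.
print(eliminate_arms([0.9, 0.5, 0.85], [1000, 1000, 1000], [0, 1, 2], delta=0.05))
print(eliminate_arms([0.9, 0.5, 0.85], [10, 10, 10], [0, 1, 2], delta=0.05))
```

Elimination happens only at batch boundaries, when new feedback arrives. This is what makes the strategy compatible with delayed, batched rewards.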

2501.01793 2026-05-21 cs.LG cs.AI Updated version

Creating Artificial Students that Never Existed: Leveraging Large Language Models and CTGANs for Synthetic Data Generation

Mohammad Khalil, Sam Urmian, Ronas Shakya, Qinyi Liu

Affiliations: Centre for the Science of Learning & Technology (SLATE), University of Bergen

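AI summary: This paper studies the potential of generative adversarial networks (GANs) and large language models (LLMs) for generating synthetic tabular data. It explores whether synthetic data can be used to create artificial students for learning analytics models and evaluates the performance of different generator models.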

Abstract

In this study, we explore the growing potential of AI and deep learning technologies, particularly Generative Adversarial Networks (GANs) and Large Language Models (LLMs), for generating synthetic tabular data. Access to high-quality student data is critical for advancing learning analytics, but privacy concerns and stricter data protection regulations worldwide limit its availability and use. Synthetic data offers a promising alternative. We investigate whether synthetic data can be leveraged to create artificial students for use in learning analytics models. Using the popular GAN model CTGAN and three LLMs (GPT2, DistilGPT2, and DialoGPT), we generate synthetic tabular student data. Our results demonstrate the strong potential of these methods to produce high-quality synthetic datasets that resemble real student data. To validate our findings, we apply a comprehensive set of utility evaluation metrics to assess the statistical and predictive performance of the synthetic data, and we compare the different generator models, especially the LLMs. Our study aims to provide the learning analytics community with valuable insights into the use of synthetic data, laying the groundwork for expanding the field's methodological toolbox with new approaches to learning analytics data generation.

2501.01785 2026-05-21 cs.LG cs.AI cs.CY Updated version

Can Synthetic Data be Fair and Private? A Comparative Study of Synthetic Data Generation and Fairness Algorithms

Qinyi Liu, Oscar Deho, Sam Urmian, Mohammad Khalil, Srecko Joksimovic, George Siemens

Affiliations: Centre for the Science of Learning & Technology (SLATE), University of Bergen; University of South Australia

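AI summary: This study examines how well synthetic data generators and fairness algorithms balance privacy and fairness. The DECAF algorithm achieves the best privacy-fairness balance but has lower predictive accuracy. Applying pre-processing fairness algorithms to synthetic data further improves fairness.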

Abstract

The increasing use of machine learning in learning analytics (LA) has raised significant concerns around algorithmic fairness and privacy. Synthetic data has emerged as a dual-purpose tool, enhancing privacy and improving fairness in LA models. However, prior research suggests an inverse relationship between fairness and privacy, making it challenging to optimize both. This study investigates which synthetic data generators can best balance privacy and fairness, and whether pre-processing fairness algorithms, typically applied to real datasets, are effective on synthetic data. Our results highlight that the DEbiasing CAusal Fairness (DECAF) algorithm achieves the best balance between privacy and fairness. However, DECAF suffers in utility, as reflected in its predictive accuracy. Notably, we found that applying pre-processing fairness algorithms to synthetic data improves fairness even more than when applied to real data. These findings suggest that combining synthetic data generation with fairness pre-processing offers a promising approach to creating fairer LA models.

2409.08700 2026-05-21 cs.LG Updated version

Personalized Weight Loss Management through Wearable Devices and Artificial Intelligence

Sergio Romero-Tapiador, Ruben Tolosana, Aythami Morales, Blanca Lacruz-Pleguezuelos, Sofia Bosch Pastor, Laura Judith Marcos-Zambrano, Guadalupe X. Bazán, Gala Freixer, Ruben Vera-Rodriguez, Julian Fierrez, Javier Ortega-Garcia, Isabel Espinosa-Salinas, Enrique Carrillo de Santa Pau

Affiliations: Department of Mathematics, Universidad de Las Palmas de Gran Canaria, 35001, Spain; Cancer Research Program, IMDEA Food Institute; Health, IMDEA Food Institute

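AI summary: This paper uses wearable devices and AI to predict weight change in overweight and obese individuals. By analyzing biomarker, vital-sign, and behavioral data from around 100 subjects, it identifies key differences between those who lost weight and those who did not. A Gradient Boosting classifier reaches an AUC of 84.44%, indicating the potential of integrating multiple data sources in personalized healthcare.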

Comments 25 pages, 6 figures, 7 tables, 1 appendix

Journal ref
Computers in Biology and Medicine, Vol. 173, 111676, 2026

Abstract

Early detection of chronic and Non-Communicable Diseases (NCDs) is crucial for effective treatment during the initial stages. This study explores the application of wearable devices and Artificial Intelligence (AI) to predict weight change in overweight and obese individuals. Using wearable data from a 1-month trial involving around 100 subjects from the AI4FoodDB database, including biomarkers, vital signs, and behavioral data, we identify key differences between those who achieve weight loss (>= 2% of their initial weight) and those who do not. Feature selection techniques and classification algorithms yield promising results, with the Gradient Boosting classifier achieving an Area Under the Curve (AUC) of 84.44%. Integrating multiple data sources (e.g., vital signs, physical activity, and sleep) enhances performance, suggesting the potential of wearable devices and AI in personalized healthcare.

2605.21260 2026-05-21 cs.LG Updated version

On the Cost and Benefit of Chain of Thought: A Learning-Theoretic Perspective

Yue Zhang, Zhiyi Dong, Tommaso Cesari, Yongyi Mao

Affiliations: University of Ottawa

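AI summary: This paper studies the cost and benefit of Chain of Thought (CoT) from a learning-theoretic perspective. It models CoT as the interaction between an answer map and a chain rule and defines the reasoning risk of a hypothesis under this interaction. It then derives a tight decomposition of that risk, which reveals when CoT helps and when it hurts.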

Abstract

We develop a learning-theoretic framework for understanding Chain of Thought (CoT). We model CoT as the interaction between an answer map and a chain rule that generates intermediate questions autoregressively, and define the reasoning risk of a hypothesis under this interaction. Our first result is a tight canonical decomposition of this risk into two terms with opposing roles: an oracle-trajectory risk (OTR), which captures the benefit of CoT and reduces to a target-domain risk in a domain adaptation problem, and a trajectory-mismatch risk (TMR), which captures the cost of CoT through error accumulation along mismatched reasoning trajectories. We then show that this cost is unavoidable without structure: if any one of the loss, the hypothesis answer map, or the chain rule lacks stability, the TMR can be arbitrarily large even when the OTR is zero and the hypothesis is uniformly close to the ground truth. Conversely, under stability, we prove a tight upper bound on the TMR governed by an exact amplification factor that identifies bounded, linear, and exponential error-growth regimes. Together, these results give a precise theory of when CoT helps, when it hurts, and what controls the transition between the two.
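A toy recursion makes the three error-growth regimes concrete. Assume each reasoning step adds a fresh error eps and that the chain rule propagates the previous error with Lipschitz constant L. Then e_{t+1} = L * e_t + eps with e_0 = 0, and unrolling gives e_T = eps * sum_{t<T} L**t. This is an illustrative stand-in, not the paper's exact amplification factor.

```python
def amplification(lipschitz: float, steps: int) -> float:
    """Multiplier A_T with e_T = eps * A_T for e_{t+1} = L * e_t + eps, e_0 = 0."""
    return sum(lipschitz ** t for t in range(steps))


# Bounded (L < 1), linear (L = 1), and exponential (L > 1) growth in the chain length T.
for L in (0.5, 1.0, 2.0):
    print(L, [amplification(L, T) for T in (5, 10, 20)])
```

For L < 1 the multiplier stays below 1/(1 - L). For L = 1 it equals T. For L > 1 it equals (L**T - 1)/(L - 1).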

2605.21253 2026-05-21 stat.ML cs.LG Updated version

Theoretical guidelines for annealed Langevin dynamics in compositional simulation-based inference

Camille Touron, Gabriel V. Cardoso, Julyan Arbel, Pedro L. C. Rodrigues

Affiliations: Univ. Grenoble Alpes; Inria; CNRS; Grenoble INP; LJK; Geostatistics team; Centre for geosciences and geoengineering; Mines Paris; PSL University

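AI summary: This paper studies annealed Langevin dynamics for compositional score-based methods in simulation-based inference. It derives Wasserstein bounds that provide theoretical guidance for hyperparameter selection. In the Gaussian case, it proves that the composite-score formulations differ in admissible step size and in the total number of Langevin steps required.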

Abstract

Compositional score-based approaches to simulation-based inference (SBI) approximate the posterior over a shared parameter given $n$ independent observations by aggregating individually learned posterior scores. Two main methods of this kind have been proposed so far (Geffner et al. (2023); Linhart et al. (2026)). Because the resulting composite score does not correspond to the score of any distribution along the forward diffusion path of the true multi-observation posterior, sampling from it via a reverse SDE leads to an irreducible bias. Annealed Langevin dynamics provides a principled alternative: it treats the composite score as the genuine score of a sequence of tractable bridging densities and samples from them in succession. When properly tuned, it can yield a controllable bias. However, its hyperparameters, namely the step sizes, the number of steps per level, and the number of annealing levels, have so far been chosen empirically. We derive Wasserstein bounds for annealed Langevin with approximate scores and translate them into explicit decision rules for these hyperparameters that guarantee a prescribed sampling accuracy, while highlighting different theoretical aspects of each composite score formulation. In the Gaussian setting, we obtain closed-form expressions for all relevant quantities and prove that the bridging densities of Linhart et al. (2026) consistently admit larger step sizes and require fewer total Langevin steps than those of Geffner et al. (2023). Furthermore, we show empirically that the tuning obtained in the Gaussian setting generalizes to more complex problems, thus providing a well-understood and theoretically grounded starting point for practitioners using compositional score-based approaches.
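The following is a minimal sketch of annealed (unadjusted) Langevin dynamics on a one-dimensional Gaussian target, where every level's score is known exactly. In compositional SBI the scores would instead be approximate composite scores. The geometric noise schedule and the rule "step size proportional to the level's variance" are common heuristics used here as assumptions, not the paper's decision rules.

```python
import numpy as np


def annealed_langevin(mu, s2, sigmas, step_scale, n_steps, n_particles=20_000, seed=0):
    """Annealed unadjusted Langevin dynamics toward N(mu, s2).

    Level k targets the noised density N(mu, s2 + sigma_k**2), whose score is
    -(x - mu) / (s2 + sigma_k**2). The step size at each level is
    step_scale times that level's variance.
    """
    rng = np.random.default_rng(seed)
    x = mu + np.sqrt(s2 + sigmas[0] ** 2) * rng.standard_normal(n_particles)
    for sigma in sigmas:
        var = s2 + sigma ** 2
        eta = step_scale * var
        for _ in range(n_steps):
            score = -(x - mu) / var
            x = x + eta * score + np.sqrt(2 * eta) * rng.standard_normal(n_particles)
    return x


sigmas = np.geomspace(10.0, 0.01, num=10)  # noise levels from coarse to fine
samples = annealed_langevin(mu=1.0, s2=0.25, sigmas=sigmas, step_scale=0.05, n_steps=100)
print(samples.mean(), samples.var())
```

The sample variance settles near 0.2565 instead of 0.25. For a Gaussian target of variance v, unadjusted Langevin with step c * v has stationary variance v / (1 - c/2), so this inflation is the fixed-step discretization bias. It is the kind of error that principled step-size rules are meant to keep below a prescribed accuracy.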

2605.21241 2026-05-21 cs.LG Updated version

Divide and Contrast: Learning Robust Temporal Features without Augmentation

Abdul-Kazeem Shamba, Kerstin Bach, Gavin Taylor

Affiliations: Department of Computer Science, Norwegian University of Science and Technology; Department of Computer Science, United States Naval Academy

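AI summary: This paper proposes Di-COT, which contrasts informative substructures within a time window rather than individual timesteps. This enables self-supervised learning without data augmentation or multiple encoder passes. Di-COT achieves state-of-the-art performance on six large-scale real-world datasets and the UCR/UEA benchmarks while substantially reducing training time.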

Comments Published in the 43rd International Conference on Machine Learning (ICML 2026)

Abstract

Self-supervised learning for time-series representation aims to reduce reliance on labeled data while maintaining strong downstream performance, yet many existing approaches incur high computational costs or rely on assumptions that do not hold across diverse temporal dynamics. In this work, we introduce Divide and Contrast (Di-COT), an unsupervised framework that avoids data augmentation and multiple encoder passes by contrasting informative substructures within a window rather than individual timesteps. Di-COT stochastically partitions each window into a small number of overlapping sub-blocks per iteration, enabling efficient and meaningful contrast while mitigating false positives during temporal transitions. To further improve scalability, we adopt a contrastive objective whose computation depends on the batch size and the number of sub-blocks, making loss computation independent of sequence length. Extensive experiments on six large-scale real-world datasets, as well as the UCR and UEA benchmarks, demonstrate that Di-COT learns semantically structured and transferable representations, achieving state-of-the-art performance on classification, clustering, $k$NN, and cross-dataset transfer, while substantially reducing training time. The source code is publicly available at https://github.com/sfi-norwai/Di-COT.

2605.21240 2026-05-21 cs.LG cs.AI Updated version

APEX: Autonomous Policy Exploration for Self-Evolving LLM Agents

Yibo Li, Jiashuo Yang, Zhi Zheng, Zhiyuan Hu, Yuan Sui, Shizun Wang, Yufei He, Bryan Hooi

Affiliations: National University of Singapore; Beijing University of Posts and Telecommunications

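AI summary: This paper proposes APEX, an autonomous policy exploration method for self-evolving LLM agents. APEX addresses exploration collapse by building and maintaining an explicit strategy space, and it performs well on multiple benchmarks.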

Abstract

LLM agents have shown strong performance across a wide range of complex tasks, including interactive environments that require long-horizon decision making. But these agents cannot learn on the fly at test time. Self-evolving agents address this by accumulating memory and reflection across episodes rather than requiring model-weight updates. However, these agents often suffer from exploration collapse: as memory grows, behavior concentrates around familiar high-reward routines, reducing the chance of discovering better alternatives. To address this problem, we propose Autonomous Policy EXploration (APEX), which builds and maintains an explicit strategy space through a strategy map, a directed acyclic graph of milestones with prerequisite dependency edges. In APEX, Fork Discovery expands the map with evidence-grounded unexplored directions, while Policy Selection balances exploration and exploitation during planning. Evaluated on nine Jericho text-adventure games and WebArena, a realistic web interaction benchmark, APEX outperforms all baselines. Extensive ablations validate each component's contribution and show robustness across diverse settings, confirming APEX's effectiveness for sustained exploration in self-evolving agents.
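The strategy map is a DAG of milestones with prerequisite edges. One query such a map supports is listing which milestones are currently reachable; a minimal sketch follows. The milestone names are hypothetical, and Fork Discovery and Policy Selection are not modeled. The standard-library `graphlib.TopologicalSorter` is used only to reject maps that contain a cycle.

```python
from graphlib import TopologicalSorter
from typing import Dict, Set


def frontier(prereqs: Dict[str, Set[str]], achieved: Set[str]) -> Set[str]:
    """Milestones not yet achieved whose prerequisites are all achieved.

    prereqs maps every milestone to the set of milestones it depends on.
    """
    # prepare() raises graphlib.CycleError if the map is not a DAG.
    TopologicalSorter(prereqs).prepare()
    return {m for m, deps in prereqs.items() if m not in achieved and deps <= achieved}


# Hypothetical milestones for a text-adventure game.
strategy_map = {
    "open_mailbox": set(),
    "read_leaflet": {"open_mailbox"},
    "enter_house": set(),
    "get_lamp": {"enter_house"},
    "go_underground": {"get_lamp"},
}
print(sorted(frontier(strategy_map, achieved={"open_mailbox"})))
```

A planner could then choose among these frontier milestones, trading off unexplored branches against ones that already paid off.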

2605.21226 2026-05-21 cs.LG cs.AI Updated version

OCTOPUS: Optimized KV Cache for Transformers via Octahedral Parametrization Under optimal Squared error quantization

Mark Boss, Vikram Voleti, Simon Donné, Shimon Vainer

Affiliations: Stability AI

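AI summary: OCTOPUS optimizes the Transformer KV cache by jointly quantizing rotated coordinate triplets. An octahedral parameterization maps each triplet's direction to a square, and Lloyd-Max quantization with a non-uniform bit allocation is applied to the resulting values. OCTOPUS outperforms existing rotation codecs across text, video, and audio.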

Abstract

The key-value (KV) cache dominates memory bandwidth and footprint in long-context autoregressive inference. Recent rotation-preconditioned codecs (TurboQuant, PolarQuant) show that a structured random rotation followed by a per-coordinate scalar quantizer matched to an analytically tractable marginal is a near-optimal recipe for KV compression. OCTOPUS advances this paradigm through joint quantization of rotated coordinate triplets. Each triplet's direction is mapped to a square via an octahedral parameterization, and the two resulting coordinates and the triplet norm are Lloyd-Max quantized against implementation-matched marginals. Optimizing the per-triplet squared error gives a strictly non-uniform bit allocation that depends only on the total dimensionality of the keys. Sweeps show that the finite-dimensional quality optimum is the same on every real decoder we test. The codec is data-oblivious, online, and deterministic given a seed. Across text, video, and audio, OCTOPUS matches or beats every prior rotation codec at every reported bit width and metric, and its lead grows as the bit width drops toward extreme compression. Furthermore, a fused Triton implementation reconstructs keys on the fly without materializing the uncompressed key, so the codec adds no decode-time bandwidth or latency over the existing dequantization. Project Page: https://octopus-quant.github.io/
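The two building blocks named above can be sketched independently. The octahedral map below is the standard fold used to encode unit normals in graphics. Two things are assumptions and not shown: whether OCTOPUS uses exactly this fold, and how it allocates bits across the two square coordinates and the norm. The Lloyd-Max quantizer is fitted to samples (equivalently, one-dimensional k-means) rather than to the paper's implementation-matched marginals.

```python
import numpy as np


def _sign(v: float) -> float:
    # Sign with sign(0) = +1, so folded coordinates stay invertible.
    return 1.0 if v >= 0 else -1.0


def oct_encode(d):
    """Map the direction of a nonzero 3-vector to a point in [-1, 1]^2."""
    d = np.asarray(d, dtype=float)
    x, y, z = d / np.abs(d).sum()        # project onto the octahedron |x|+|y|+|z| = 1
    if z < 0:                            # fold the z < 0 half onto the square's corners
        x, y = (1 - abs(y)) * _sign(x), (1 - abs(x)) * _sign(y)
    return np.array([x, y])


def oct_decode(u):
    """Inverse of oct_encode; returns a unit vector."""
    x, y = float(u[0]), float(u[1])
    z = 1 - abs(x) - abs(y)
    if z < 0:                            # undo the fold
        x, y = (1 - abs(y)) * _sign(x), (1 - abs(x)) * _sign(y)
    v = np.array([x, y, z])
    return v / np.linalg.norm(v)


def lloyd_max(samples, bits, iters=50):
    """Fit 2**bits reconstruction levels minimizing squared error on samples."""
    k = 2 ** bits
    levels = np.quantile(samples, (np.arange(k) + 0.5) / k)
    for _ in range(iters):
        thresholds = (levels[:-1] + levels[1:]) / 2     # nearest-level boundaries
        idx = np.searchsorted(thresholds, samples)
        levels = np.array([samples[idx == j].mean() if np.any(idx == j) else levels[j]
                           for j in range(k)])          # centroid update
    return levels


def quantize(x, levels):
    thresholds = (levels[:-1] + levels[1:]) / 2
    return levels[np.searchsorted(thresholds, x)]


# A triplet is stored as (two square coordinates, norm); each value could then
# be quantized with its own Lloyd-Max codebook.
t = np.array([0.3, -1.2, -0.5])
u, norm = oct_encode(t), np.linalg.norm(t)
print(u, norm, oct_decode(u) * norm)   # reconstructs t up to float error

g = np.random.default_rng(0).standard_normal(100_000)
levels = lloyd_max(g, bits=1)
print(levels, np.mean((g - quantize(g, levels)) ** 2))
```

For a standard Gaussian at 1 bit, Lloyd-Max converges to levels near +/-0.798 with a mean squared error near 0.363. These are the textbook values sqrt(2/pi) and 1 - 2/pi.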

2605.21225 2026-05-21 cs.LG cs.AI Updated version

PREFINE: Preference-Based Implicit Reward and Cost Fine-Tuning for Safety Alignment

Richa Verma, Bavish Kulur, Sanjay Chawla, Balaraman Ravindran

Affiliations: TCS Research; Dept. of CSE, IIT Madras, India; Department of Computing Science, University of Alberta, Canada; Qatar Computing Research Institute, Hamad Bin Khalifa University, Qatar; Department of Data Science & AI, Wadhwani School of Data Science & AI, IIT Madras, India

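AI summary: This work proposes PREFINE, which uses preference-based implicit reward and cost fine-tuning to align policies for safety in continuous control environments. It fine-tunes pre-trained reinforcement learning policies to produce low-cost behavior while retaining high reward.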

Comments Accepted at AAMAS 2026 as a full paper

Abstract

We address the problem of making a pre-trained reinforcement learning (RL) policy safety-aware by incorporating cost constraints without retraining it from scratch. While costs could be encoded numerically, we consider the more general setting in which costs are provided as preferences. Given a reward-optimized policy and a small dataset of preferred (low-cost) and dispreferred (high-cost) trajectories, our goal is to fine-tune the policy to generate low-cost behaviors while retaining high rewards. Unlike standard RLHF in language models, where preferences are defined over responses to the same prompt, our setting involves trajectory-level preferences in continuous control environments. We introduce PREFINE (Preference-based Implicit Reward and Cost Fine-Tuning for Safety Alignment), a preference-based fine-tuning method that adapts Direct Preference Optimization (DPO), now widely used for LLM fine-tuning, to the sequential decision-making setting. PREFINE constructs policy-sampled counterfactual trajectories to establish meaningful preference contrasts and jointly optimizes for reward retention and safety alignment. Empirically, PREFINE reduces constraint violations and catastrophic failures by over 60% while maintaining the original reward behavior. Compared with full offline RL or imitation learning, PREFINE produces low-cost, high-reward policies with significantly better data and computational efficiency, bridging preference alignment and safe policy adaptation in continuous domains.
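The DPO objective that PREFINE adapts can be written directly at the trajectory level. The sketch below is the standard DPO loss applied to whole-trajectory log-likelihoods, assuming a stochastic policy that exposes per-step action log-probabilities. PREFINE's counterfactual pair construction and its joint reward-retention term are not shown. `beta = 0.1` is an arbitrary choice, and the log-prob values are made up.

```python
import numpy as np


def trajectory_logprob(step_logprobs) -> float:
    """Sum of per-step log pi(a_t | s_t) along one trajectory.

    Initial-state and transition probabilities are identical under the policy
    and the reference policy, so they cancel in the DPO log-ratio and are omitted.
    """
    return float(np.sum(step_logprobs))


def dpo_loss(pi_pref, ref_pref, pi_disp, ref_disp, beta=0.1) -> float:
    """Standard DPO loss for one (preferred, dispreferred) trajectory pair."""
    margin = beta * ((pi_pref - ref_pref) - (pi_disp - ref_disp))
    return float(np.logaddexp(0.0, -margin))   # -log(sigmoid(margin)), numerically stable


# Made-up per-step action log-probs for a low-cost (preferred) and a
# high-cost (dispreferred) trajectory under the policy and the reference.
pi_pref = trajectory_logprob([-0.2, -0.1, -0.3])
ref_pref = trajectory_logprob([-0.4, -0.3, -0.5])
pi_disp = trajectory_logprob([-0.9, -1.1, -0.8])
ref_disp = trajectory_logprob([-0.6, -0.7, -0.6])
print(dpo_loss(pi_pref, ref_pref, pi_disp, ref_disp))   # below log(2)
```

The loss equals log(2) when the policy matches the reference. It falls below log(2) once the policy shifts probability toward the low-cost trajectory relative to the reference.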

2605.21217 2026-05-21 stat.ML cs.LG Updated version

Federated LoRA Fine-Tuning for LLMs via Collaborative Alignment

Shuaida He, Liwen Chen, Long Feng

Affiliations: School of Computing & Data Science, The University of Hong Kong

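AI summary: This paper studies parameter-efficient LoRA fine-tuning in a federated learning setting and proposes CLAIR. CLAIR recovers the shared LoRA subspace and detects contaminated clients via a structured low-rank plus block-sparse decomposition. The paper proves exact recovery in the noiseless case, stable recovery under estimation error, and consistent collaborative-set recovery.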

Abstract

Low-rank adaptation (LoRA) has emerged as a powerful tool for parameter-efficient fine-tuning of large language models (LLMs). This paper studies LoRA under a federated learning setting, enabling collaborative fine-tuning across clients while preserving parameter efficiency. We focus on a highly heterogeneous regime in which clients share only partial structure and a substantial subset may be contaminated. We propose Collaborative Low-rank Alignment and Identifiable Recovery (CLAIR), a contamination-aware framework that relies only on preliminary local estimators. Its formulation applies broadly, from linear regression to neural network and LLM modules, whenever local adaptation can be represented by matrix-valued updates. CLAIR recovers the shared LoRA subspace and detects contaminated clients via a structured low-rank plus block-sparse decomposition. We prove exact recovery of the shared LoRA subspace in the noiseless case, stable recovery under preliminary estimation error, and consistent collaborative-set recovery under mild separation conditions. We further quantify the gain from CLAIR refinement. It reduces off-subspace estimation error through cross-client averaging while preserving client-specific variation within the shared LoRA subspace. It therefore improves on local fine-tuning whenever this oracle gain outweighs the costs of subspace estimation and benign-client heterogeneity. Empirically, we demonstrate the benefits of CLAIR by fine-tuning a Transformer architecture on a text-copying task. The results show accurate contamination detection and improved benign-client performance compared with local fine-tuning and non-robust federated averaging.
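The sketch below is a rough stand-in for CLAIR's structured low-rank plus block-sparse decomposition. It estimates a shared subspace by truncated SVD of the stacked client updates. It then flags clients whose update lies mostly outside that subspace and re-fits without them. The residual threshold `tol`, the number of rounds, and the synthetic data are all assumptions; this is not the paper's estimator and carries none of its guarantees.

```python
import numpy as np


def shared_subspace_and_flags(updates, rank, tol=0.5, n_rounds=3):
    """Estimate a shared column subspace and flag clients that do not fit it.

    updates: list of (d, k) update matrices, one per client.
    Returns (U, flagged): U is d x rank with orthonormal columns, and
    flagged is the set of client indices treated as contaminated.
    """
    flagged = set()
    for _ in range(n_rounds):
        kept = [W for i, W in enumerate(updates) if i not in flagged] or updates
        U, _, _ = np.linalg.svd(np.hstack(kept), full_matrices=False)
        U = U[:, :rank]
        # Relative energy outside span(U): near 0 for clients in the shared subspace.
        resid = [np.linalg.norm(W - U @ (U.T @ W)) / np.linalg.norm(W) for W in updates]
        flagged = {i for i, r in enumerate(resid) if r > tol}
    return U, flagged


# Synthetic example: 8 benign clients share a rank-2 column space; 2 are pure noise.
rng = np.random.default_rng(0)
d, k, r = 30, 5, 2
U_true, _ = np.linalg.qr(rng.standard_normal((d, r)))
benign = [U_true @ (5 * rng.standard_normal((r, k))) for _ in range(8)]
contaminated = [rng.standard_normal((d, k)) for _ in range(2)]
U_hat, flagged = shared_subspace_and_flags(benign + contaminated, rank=r)
print(sorted(flagged))
```

Re-fitting without flagged clients keeps contaminated updates from tilting the subspace estimate. This plays the role that the block-sparse component plays in a joint decomposition.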

2605.21213 2026-05-21 quant-ph cs.AI cs.LG math.OC Updated version

Enhanced Reinforcement Learning-based Process Synthesis via Quantum Computing

Austin Braniff, Fengqi You, Yuhe Tian

Affiliations: Department of Chemical and Biomedical Engineering, West Virginia University; R.F. Smith School of Chemical and Biomolecular Engineering, Cornell University

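AI summary: This paper proposes quantum reinforcement learning for process synthesis. It formalizes process synthesis as a Markov decision process within a generalized framework and introduces quantum-enhanced RL algorithms for improved scalability. Benchmarked against classical RL, the quantum approaches prove competitive.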

Abstract

In this work, we present quantum reinforcement learning (RL) as a solution strategy for process synthesis problems. Building on our prior work, we develop a generalized framework that formally poses process synthesis as a Markov decision process and introduces quantum-enhanced RL algorithms to solve it with improved scalability. Earlier implementations of quantum-based RL for process synthesis were limited by qubit requirements, which scaled poorly with problem complexity. This work overcomes this challenge by introducing state encoding algorithms that decouple qubit requirements from problem size. A classical RL-based solution strategy serves as a baseline to benchmark the quantum algorithms under identical training conditions. All algorithms are evaluated on a flowsheet synthesis problem with increasing unit counts to analyze their performance and scalability. Results show that all approaches are capable of identifying the optimal flowsheet designs in small design spaces. For moderate-scale unit counts, quantum approaches demonstrate competitive performance on a per-episode basis and improved efficiency on a per-parameter basis versus the classical RL benchmark. This work provides a foundation for future quantum computing applications within process systems engineering, establishes a controlled benchmark for comparing classical and quantum algorithms, and shows that the proposed quantum variants remain competitive for the process synthesis problem examined in this work.

2605.21211 2026-05-21 eess.SY cs.LG cs.SY math.OC (updated version)

Reinforcement Learning-based Control via Y-wise Affine Neural Networks: Comparative Case Studies for Chemical Processes

Austin Braniff, Yuhe Tian

Affiliations: Department of Chemical and Biomedical Engineering

AI Summary: This paper presents an efficient and practical approach to RL-based control of chemical process systems. It uses Y-wise Affine Neural Network (YANN)-RL algorithms to address the difficulty of trusting RL algorithms and training reliable agents. On three publicly available chemical engineering case studies, the approach reduces training time and data requirements.

Comments Accepted for publication at the 23rd IFAC World Congress, 2026

Abstract

In this work, we present an efficient and practically implementable approach for the application of reinforcement learning (RL)-based control in chemical process systems. This is an area that has yet to widely adopt RL-based control, largely due to inherent challenges in trusting RL algorithms and the time-consuming process of training reliable agents. To address these challenges, we leverage a class of RL algorithms termed Y-wise Affine Neural Network (YANN)-RL, which we developed in our prior work (Braniff and Tian, 2025a). By strategically initializing the actor and critic networks, YANN-RL algorithms provide confident and interpretable starting points within control schemes. We apply this RL-based control approach to three different process engineering case studies publicly available in the PC-Gym library (Bloor et al., 2026): (i) a continuous stirred tank reactor (CSTR), (ii) a four-tank system, and (iii) a multistage extraction column. Our approach is compared to several popular RL algorithms (PPO, SAC, DDPG, and TD3) and benchmarked against nonlinear model predictive control (NMPC). These case studies demonstrate that YANN-RL can greatly reduce the training time and data needed, can be deployed with confidence in chemical process systems, and can approach the performance of NMPC without knowledge of a full nonlinear model.

2605.21180 2026-05-21 cs.LG cs.SE (updated version)

Domain-Adaptable Reinforcement Learning for Code Generation with Dense Rewards

Erfan Aghadavoodi Jolfaei, Daniel Maninger, Abhinav Anand, Mert Tiftikci, Mira Mezini

Affiliations: Hessian Center for Artificial Intelligence (hessian.AI); National Research Center for Applied Cybersecurity ATHENE

AI Summary: This work proposes a domain-adaptable reinforcement learning framework that improves the correctness, quality, and security of generated code. A customizable execution-aware reward formula and a token-level reward mapping mechanism adapt code generation to different domains and improve functional correctness and simulator executability.

Comments 10 pages, 2 figures, under review

Abstract

Large language models show strong potential for automated code generation but lack guarantees of correctness, quality, safety, and compliance with domain-specific constraints. For instance, in robotics, where code generation is increasingly used for planning and executing actions, awareness of the environment and physical constraints is critical. To facilitate the adaptation of code-generating LLMs to diverse requirements, including domain-specific ones, we present a reinforcement learning framework that fine-tunes pre-trained LLMs using proximal policy optimization. Our customizable execution-aware reward formula captures and optimizes syntax, functional correctness, code style, security, and simulator executability. A token-level reward mapping mechanism enables effective credit assignment from execution outcomes to generated tokens. The framework is evaluated on general-purpose code generation (MBPP/MBPP+) and robotic program synthesis (RoboEval). The results show substantial improvements in functional correctness and simulator executability, including an absolute pass@1 increase of 19% on MBPP and a 51% reduction in execution failures on RoboEval. These findings demonstrate that structured reinforcement learning can effectively align language models to correct program generation and domain-specific requirements.
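As a rough illustration of an execution-aware reward with token-level credit assignment, the sketch below does two things:

- combines per-criterion scores into one scalar reward;
- spreads execution feedback over the generated tokens.

The weights, the penalty for tokens on a failing line, and all function names are hypothetical. Only the syntax check uses a real API (Python's `ast.parse`). The paper's actual reward formula and mapping mechanism may differ.

```python
import ast

# Hypothetical component weights; the paper describes the reward formula as customizable.
WEIGHTS = {"syntax": 0.2, "tests": 0.5, "style": 0.1, "security": 0.1, "executable": 0.1}


def parses(code: str) -> bool:
    """Syntax check using Python's own parser."""
    try:
        ast.parse(code)
    except SyntaxError:
        return False
    return True


def composite_reward(code, test_pass_rate, style_score, security_ok, executable):
    """Weighted sum of per-criterion scores, each in [0, 1]."""
    if not parses(code):
        return 0.0  # assumption: unparsable code earns no further credit
    scores = {
        "syntax": 1.0,
        "tests": test_pass_rate,
        "style": style_score,
        "security": 1.0 if security_ok else 0.0,
        "executable": 1.0 if executable else 0.0,
    }
    return sum(WEIGHTS[name] * value for name, value in scores.items())


def token_rewards(token_lines, reward, error_line=None, penalty=-0.5):
    """Map execution feedback onto tokens.

    token_lines[i] is the source line that token i belongs to. Tokens on the line
    reported as failing receive `penalty` (hypothetical value), and the
    sequence-level reward is added to the final token.
    """
    rewards = [penalty if line == error_line else 0.0 for line in token_lines]
    if rewards:
        rewards[-1] += reward
    return rewards
```

Putting the sequence-level reward on the final token is a common convention in PPO-based LLM fine-tuning. The extra per-line penalty is one simple way, assumed here, to localize an execution failure.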

2605.21177 2026-05-21 cs.LG cs.CL (updated version)

ChunkFT: Byte-Streamed Optimization for Memory-Efficient Full Fine-Tuning

Yongkang Liu, Zijing Wang, Mengjie Zhao, Ercong Nie, Mingyang Wang, Qian Li, Feiliang Ren, Shi Feng, Daling Wang, Hinrich Schütze

Affiliations: Northeastern University, China; Shanghai Jiao Tong University, China; CIS, LMU Munich, Germany; MCML, Germany; Shandong University, China

AI Summary: This paper proposes ChunkFT, which reformulates full-parameter fine-tuning around a dynamically activated working set. It enables gradient computation for arbitrary sub-tensors without modifying the network architecture. Theoretical analysis and experiments show that it is effective in memory usage, running time, and optimization quality, and it outperforms existing memory-efficient baselines on downstream tasks.

Abstract

This work presents ChunkFT, a memory-efficient fine-tuning framework that reformulates full-parameter fine-tuning around a dynamically activated working set. ChunkFT enables gradient computation for arbitrary sub-tensors without modifying the network architecture, providing an algorithmic foundation for optimizing arbitrary sub-networks while avoiding standard dense gradient computation. We provide a theoretical convergence analysis of ChunkFT in the deterministic setting. Empirically, we apply ChunkFT to fine-tune Llama 3-8B and Llama 3-70B using a single RTX 4090 (24 GB) GPU and 2× H800 (80 GB) GPUs, respectively. Full-parameter fine-tuning of a 7B model with a 1K input length requires only 13.72 GB of GPU memory. The results demonstrate the effectiveness of ChunkFT in memory usage, running time, and optimization quality. Moreover, downstream evaluations on language understanding, mathematical reasoning, and MT-Bench show that ChunkFT consistently outperforms existing memory-efficient baselines. Notably, ChunkFT achieves performance comparable to, and in some cases exceeding, full-parameter fine-tuning. Our code is available at https://github.com/misonsky/chunk.

2605.21167 2026-05-21 stat.ML cs.LG (updated version)

A Rigorous, Tractable Measure of Model Complexity

Oskar Allerbo, Thomas B. Schön

Affiliations: KTH Royal Institute of Technology; Uppsala University

AI Summary: This paper proposes a rigorous, easy-to-compute measure of model complexity based on the similarity of model gradients across inputs. The measure applies to parametric and kernel-based non-parametric models and generalizes model-specific measures such as polynomial degree and kernel length scale. It also yields new insights into double descent in random Fourier features, random forests, neural networks, and gradient boosting.

Abstract

An accurate assessment of a model's complexity is crucial for topics such as interpretation, generalization, and model selection. However, most existing complexity measures either rely on heuristic assumptions or are computationally prohibitive. In this paper, we present a mathematically rigorous yet easy-to-compute measure of model complexity that is based on the similarities between the model gradients across inputs. It is thus well defined not only for any parametric model but also for kernel-based non-parametric models. We prove that our measure of complexity generalizes model-specific complexity measures such as polynomial degree (for polynomial regression), kernel length scale (for Matérn kernels), number of neighbors (for k-nearest neighbors), number of splits (for decision trees), and number of trees (for random forests). We also use our measure to obtain new insights into the double descent phenomenon for random Fourier features, random forests, neural networks, and gradient boosting.
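A toy case suggests why gradient similarity can recover a familiar complexity notion.

- For polynomial regression f(x; θ) = Σ_k θ_k x^k, the gradient with respect to θ at input x is (1, x, ..., x^d).
- So the matrix of pairwise gradient inner products over n ≥ d + 1 distinct inputs has rank d + 1.

The sketch below checks this numerically. It is an illustration under these assumptions, not the measure defined in the paper, and the helper names are hypothetical.

```python
def poly_gradient(x: float, degree: int) -> list:
    """Gradient of f(x; theta) = sum_k theta_k * x**k with respect to theta."""
    return [x ** k for k in range(degree + 1)]


def gradient_gram(xs, degree):
    """K[i][j] = <grad f(x_i), grad f(x_j)>: pairwise gradient similarities."""
    grads = [poly_gradient(x, degree) for x in xs]
    return [[sum(a * b for a, b in zip(gi, gj)) for gj in grads] for gi in grads]


def numerical_rank(matrix, rel_tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting and a relative tolerance."""
    a = [row[:] for row in matrix]
    n_rows, n_cols = len(a), len(a[0])
    scale = max(abs(v) for row in a for v in row) or 1.0
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) <= rel_tol * scale:
            continue  # no usable pivot in this column
        a[rank], a[pivot] = a[pivot], a[rank]
        for r in range(rank + 1, n_rows):
            factor = a[r][col] / a[rank][col]
            for c in range(col, n_cols):
                a[r][c] -= factor * a[rank][c]
        rank += 1
    return rank


xs = [-1 + 2 * i / 9 for i in range(10)]  # 10 distinct inputs in [-1, 1]
for degree in (1, 2, 3):
    print(degree, numerical_rank(gradient_gram(xs, degree)))  # rank = degree + 1
```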

2605.21164 2026-05-21 cs.LG quant-ph (updated version)

Q-SYNTH: Hybrid Quantum-Classical Adversarial Augmentation for Imbalanced Fraud Detection

Adam Innan, Mansour El Alami, Nouhaila Innan, Muhammad Shafique, Mohamed Bennai

Affiliations: Quantum Physics and Spintronics Team, LPMC, Faculty of Sciences Ben M'sick; eBRAIN Lab, Division of Engineering, New York University Abu Dhabi (NYUAD); Center for Quantum and Topological Systems (CQTS), NYUAD Research Institute

AI Summary: This paper proposes Q-SYNTH, a hybrid quantum-classical adversarial framework that generates minority-class samples for imbalanced fraud detection. A parameterized quantum circuit serves as the generator and a classical neural network as the discriminator, with the aim of addressing weak fraud-class recall and F1-score.

Comments 13 pages, 6 figures

Abstract

Credit card fraud detection is fundamentally challenged by extreme class imbalance, where fraudulent transactions are rare yet operationally critical. This imbalance often biases supervised learners toward the legitimate class, leading to high overall accuracy but weaker fraud-class recall and F1-score. This paper introduces Q-SYNTH, a hybrid classical-quantum generative adversarial framework in which a parameterized quantum circuit serves as the generator and a classical neural network serves as the discriminator. Q-SYNTH is designed for minority-class fraud synthesis in tabular data and is evaluated along two dimensions: statistical fidelity to real fraud samples and downstream performance for fraud detection. To this end, generated samples are assessed using distributional similarity measures based on Kolmogorov-Smirnov statistics and Wasserstein distances, real-vs-synthetic detectability measured by AUC-ROC, and downstream classification performance across both quantum and classical classifiers. Under the reported protocol, Q-SYNTH reduces marginal distribution mismatch relative to a classical GAN baseline while maintaining competitive downstream fraud-detection performance. Although SMOTE achieves the strongest feature-wise similarity and the classical GAN attains the highest downstream performance in several settings, Q-SYNTH offers a favorable compromise between distributional fidelity and downstream performance, supporting the feasibility of hybrid quantum augmentation for imbalanced fraud detection.
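The per-feature fidelity checks mentioned above can be sketched with two standard measures:

- the two-sample Kolmogorov-Smirnov statistic;
- the one-dimensional Wasserstein-1 distance.

This minimal version assumes numeric tabular columns. For the Wasserstein distance it also assumes equal numbers of real and synthetic samples. The function names are hypothetical and not taken from the paper's code.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: max |F_a(x) - F_b(x)| over the pooled data points."""
    a, b = sorted(a), sorted(b)
    return max(abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b)) for x in a + b)


def wasserstein_1d(a, b):
    """W1 distance between two equal-size 1-D samples: mean gap between sorted values."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def per_feature_report(real_rows, synth_rows):
    """(KS, W1) for each column of real vs. synthetic tabular fraud samples."""
    report = []
    for real_col, synth_col in zip(zip(*real_rows), zip(*synth_rows)):
        report.append((ks_statistic(real_col, synth_col), wasserstein_1d(real_col, synth_col)))
    return report
```

Because empirical CDFs only jump at observed values, evaluating them at the pooled data points is enough to find the KS supremum.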

2605.21160 2026-05-21 cs.LG (updated version)

Learning First Integrals via Backward-Generated Data and Guided Reinforcement Learning

Jingfeng Zhong, Zhengxiang Liu, Zhijie Wang, Shuai Li

Affiliations: Shanghai Jiao Tong University

AI Summary: This paper proposes FISolver, an LLM-based solver that addresses data scarcity in first-integral discovery through backward-generated data and guided reinforcement learning. It significantly outperforms other methods on challenging benchmarks.

Comments 17 pages, 2 figures, 3 tables

Abstract

The discovery of first integrals is of fundamental scientific importance for understanding conservation laws in dynamical systems. However, existing symbolic computation tools and Large Language Models (LLMs) remain limited on this task because high-quality training data are scarce and successful solutions often depend on mathematical intuition. This paper presents FISolver, an LLM-based solver developed to address this challenge. First, we introduce a "Backward Generation" algorithm that systematically builds large-scale datasets of (differential equation, first integral) pairs by deriving differential equations from sampled integrals, thereby alleviating the data scarcity bottleneck. Second, we apply supervised fine-tuning to a compact mathematical model and further improve its performance through reinforcement learning with a Levenshtein Distance-based shaped reward. In addition, we design data synthesis and blending strategies that support effective adaptation to difficult problem families from sparse examples. Experiments show that FISolver, while requiring substantially lower computational cost, significantly outperforms larger mathematical LLMs and commercial solvers such as Mathematica on challenging benchmarks, indicating a new data-driven route for automated discovery of first integrals.
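A Levenshtein-distance-based shaped reward can be sketched as follows. The sketch assumes that:

- the model's answer and the reference first integral are compared as plain strings;
- the distance is normalized by the longer of the two lengths.

The normalization and the function names are assumptions; the abstract does not specify the paper's exact reward shaping.

```python
def levenshtein(s: str, t: str) -> int:
    """Edit distance (insertions, deletions, substitutions) with a rolling DP row."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,                # delete cs
                           cur[j - 1] + 1,             # insert ct
                           prev[j - 1] + (cs != ct)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def shaped_reward(predicted: str, reference: str) -> float:
    """Dense reward in [0, 1]: 1 for an exact match, decreasing with edit distance."""
    longest = max(len(predicted), len(reference))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(predicted, reference) / longest


print(levenshtein("kitten", "sitting"))  # 3
```

A string-level reward gives partial credit for near misses. However, it does not recognize equivalent expressions written differently (for example `y**2 + x**2` versus `x**2 + y**2`). In this sketch it should therefore be read as a shaping signal, not as a correctness check.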

2605.21157 2026-05-21 cs.CV cs.AI cs.LG cs.RO (updated version)

Comparative Analysis of Military Detection Using Drone Imagery Across Multiple Visual Spectrums

Sourov Roy Shuvo, Prajwal Panth, Rajesh Chowdhury, Sorup Chakraborty, Sudip Chakrabarty, Prasant Kumar Pattnaik

Affiliations: School of Computer Engineering, KIIT Deemed to be University

AI Summary: This paper studies military object detection in drone imagery under different spectral conditions. It builds four dataset variants (grayscale, thermal vision, night vision, and obscura vision) to evaluate model performance across environments. It then trains a YOLOv11-small model with the aim of improving the performance and reliability of drone-based operations.

Comments 6 pages, 7 figures. Accepted at the 16th International Conference on Computing, Communication and Networking Technologies (ICCCNT), July 6-11, 2025, IIT Indore. Proceedings pending publication

Abstract

In modern warfare, drones are becoming an essential part of intelligence gathering and of carrying out precise attacks in many kinds of hostile environments. Their ability to operate in real time in hostile environments from a safe distance makes them invaluable for surveillance and military operations. The KIIT-MiTA dataset comprises drone images of different military scenarios. It provides a foundation for detecting military objects but does not account for the variety of real-world conditions. To evaluate how models perform under varying conditions, four dataset variants are therefore created: Gray Scale, Thermal Vision, Night Vision, and Obscura Vision. These simulate real-world conditions such as low visibility, heat-based imagery, and nighttime scenes. A YOLOv11-small model is trained and used to detect objects across these diverse settings. This research improves the performance and reliability of drone-based operations by contributing to the development of advanced detection systems for both defensive and offensive missions.

2605.21154 2026-05-21 cs.CL cs.AI cs.LG (updated version)

Automated ICD Classification of Psychiatric Diagnoses: From Classical NLP to Large Language Models

Fernando Ortega, Raúl Lara-Cabrera, Jorge Dueñas-Lerín, Alejandro de la Torre-Luque, Mercé Salvador Robert, Enrique Baca-García

Affiliations: Department of Sistemas Informáticos, Universidad Politécnica de Madrid, Spain; KNODIS Research Group, Universidad Politécnica de Madrid, Spain; CIBERSAM ISCIII, Spain; Department of Legal Medicine, Psychiatry and Pathology, Complutense University of Madrid, Spain; Hospital Universitario de Móstoles, Universidad Rey Juan Carlos, Spain; Department of Psychiatry, University Hospital Jimenez Díaz Foundation, Madrid, Spain; Department of Psychiatry, University Hospital Rey Juan Carlos, Móstoles, Spain; Department of Psychiatry, General Hospital of Villalba, Madrid, Spain; Department of Psychiatry, University Hospital Infanta Elena, Madrid, Spain; Department of Psychology, Universidad Catolica del Maule, Talca, Chile; Department of Psychiatry, Madrid Autonomous University, Madrid, Spain

AI Summary: This study uses NLP and machine learning to map free-text descriptions to the International Classification of Diseases (ICD), automating psychiatric diagnostic coding. It evaluates text representations ranging from classical frequency-based models to advanced large language models. Transformer embeddings perform best, capturing implicit semantic cues and nuanced medical terminology.

Abstract

Mental health has become a global priority, leading to a massive administrative burden in the coding of clinical diagnoses. This study proposes automating psychiatric diagnostic analysis by mapping free-text descriptions to the International Classification of Diseases (ICD) using Natural Language Processing (NLP) and Machine Learning (ML) techniques. Using a specialized dataset of 145,513 Spanish psychiatric descriptions, various text representation paradigms were evaluated, ranging from classical frequency-based models (BoW, TF-IDF) to state-of-the-art Large Language Models (LLMs) such as e5_large, BioLORD, and Llama-3-8B. Results indicate that transformer-based embeddings consistently outperform traditional methods by capturing implicit semantic cues and nuanced medical terminology. The e5_large model, through end-to-end fine-tuning, achieved the highest performance, with a micro-averaged F1 score (F1_micro) of 0.866. This research demonstrates that adapting LLMs to specific clinical nomenclature is essential for overcoming the challenges of "long-tail" label distributions and the inherent ambiguity of psychiatric discourse.
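For reference, the reported F1_micro metric can be computed as below, assuming each description receives exactly one ICD code. In that single-label case, every wrong prediction counts as one false positive and one false negative, so micro-F1 equals accuracy. The example codes are illustrative ICD-10 chapter F labels, not data from the study.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for single-label multi-class predictions.

    Each wrong prediction is a false positive for the predicted code and a
    false negative for the true code, so here micro-F1 reduces to accuracy.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("length mismatch")
    tp = sum(t == p for t, p in zip(y_true, y_pred))
    fp = len(y_pred) - tp
    fn = len(y_true) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


print(micro_f1(["F32", "F41", "F20", "F32"], ["F32", "F41", "F32", "F32"]))  # 0.75
```

Multi-label coding, where one description can carry several codes, would need per-code true positive, false positive, and false negative counts instead.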

2605.21147 2026-05-21 cs.LG cs.CL (updated version)

SMoA: Spectrum Modulation Adapter for Parameter-Efficient Fine-Tuning

Yongkang Liu, Xing Li, Mengjie Zhao, Shanru Zhang, Zijing Wang, Qian Li, Shi Feng, Feiliang Ren, Daling Wang, Hinrich Schütze

Affiliations: Northeastern University, China; Shandong University, China; CIS, LMU Munich, Germany; MCML, Germany

AI Summary: This paper proposes SMoA, an adapter that enlarges the accessible family of spectrum-aware updates under a smaller parameter budget, improving parameter-efficient fine-tuning.

Abstract

As the number of model parameters increases, parameter-efficient fine-tuning (PEFT) has become the go-to choice for tailoring pre-trained large language models. Low-rank Adaptation (LoRA) uses a low-rank update to approximate full-parameter fine-tuning and is widely used to reduce resource requirements. However, lowering the rank limits representational capacity. Theory suggests that LoRA fine-tuning with rank r converges toward the top r singular values of the pre-trained weight matrix. As the rank increases, more principal singular directions are preserved, which generally improves the model's performance. However, a larger rank also introduces more trainable parameters, leading to higher computational cost. To overcome this dilemma, we propose SMoA, a Spectrum Modulation Adapter that enlarges the accessible family of spectrum-aware updates under a smaller parameter budget. SMoA partitions the layer into multiple aligned spectral blocks and applies one in-block Hadamard-modulated low-rank branch to each diagonal block, yielding broader coverage of pretrained spectral directions. We provide theoretical analysis and empirical results on multiple tasks. In our experiments, SMoA improves average performance over LoRA and competitive LoRA-style baselines in the lower-budget setting studied.
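One fact that makes Hadamard modulation attractive for low-rank adapters is rank(A ⊙ B) ≤ rank(A) · rank(B), with equality for generic matrices when the matrix size allows. The sketch below checks this numerically for two random rank-2 updates. It assumes nothing about SMoA's spectral block partition and is not the paper's parameterization. Names such as `random_low_rank` are hypothetical.

```python
import random


def numerical_rank(matrix, rel_tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting and a relative tolerance."""
    a = [row[:] for row in matrix]
    n_rows, n_cols = len(a), len(a[0])
    scale = max(abs(v) for row in a for v in row) or 1.0
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) <= rel_tol * scale:
            continue
        a[rank], a[pivot] = a[pivot], a[rank]
        for r in range(rank + 1, n_rows):
            factor = a[r][col] / a[rank][col]
            for c in range(col, n_cols):
                a[r][c] -= factor * a[rank][c]
        rank += 1
    return rank


def random_low_rank(n, r, rng):
    """n x n product B @ A with Gaussian B (n x r) and A (r x n): rank r almost surely."""
    b = [[rng.gauss(0.0, 1.0) for _ in range(r)] for _ in range(n)]
    a = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(r)]
    return [[sum(b[i][k] * a[k][j] for k in range(r)) for j in range(n)] for i in range(n)]


def hadamard(x, y):
    """Element-wise (Hadamard) product of two equally shaped matrices."""
    return [[p * q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


rng = random.Random(0)
delta_1 = random_low_rank(8, 2, rng)
delta_2 = random_low_rank(8, 2, rng)
modulated = hadamard(delta_1, delta_2)
print(numerical_rank(delta_1), numerical_rank(delta_2), numerical_rank(modulated))
```

The parameter trade-off works out as follows:

- two rank-r factor pairs on an n × n layer cost 4nr parameters but can reach rank r²;
- a plain LoRA update of rank r² costs 2nr².

The Hadamard route is therefore cheaper whenever r > 2.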

2605.21127 2026-05-21 cs.LG (updated version)

Reasoning-Trace Collapse: Evaluating the Loss of Explicit Reasoning During Fine-Tuning

Lukas Twist, Helen Yannakoudakis, Jie M. Zhang

Affiliations: King’s College London

AI Summary: This paper studies the loss of explicit reasoning during fine-tuning. It proposes a structural evaluation framework that separates answer correctness from reasoning-trace validity. It finds that standard supervised fine-tuning rapidly suppresses valid reasoning traces, a failure that answer-only metrics can mask.

Comments 22 pages, 3 tables, 3 figures

Abstract

Explicit reasoning models are trained to produce intermediate reasoning traces before final answers, but downstream fine-tuning is often performed on ordinary instruction-response data that contains no such traces. We show that this mismatch can induce reasoning-trace collapse: a fine-tuned model continues to produce plausible final answers while losing the structurally valid explicit reasoning traces that made it a reasoning model in the first place. We introduce a structural evaluation framework that separates answer correctness from reasoning-trace validity, measuring valid, empty, missing, and truncated reasoning alongside reasoning-conditioned task performance. Using this framework, we study four open-weight reasoning models and find that standard supervised fine-tuning can rapidly suppress valid reasoning traces, and that answer-only metrics can substantially obscure this failure: in several settings, performance conditional on valid reasoning remains high while the rate of valid reasoning falls sharply. We further show that simple loss-masking strategies can substantially mitigate collapse without requiring teacher-generated reasoning traces. These results suggest that evaluations of fine-tuned reasoning models should report structural reasoning reliability metrics in addition to final-answer performance, especially when adaptation data does not contain explicit reasoning traces.
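The structural categories named above can be operationalized with a small classifier. The sketch assumes the model delimits its reasoning with `<think>` and `</think>` tags. This is a common but model-specific convention; some chat templates insert the opening tag in the prompt, which this sketch ignores. The report shows how answer-only accuracy can look healthy while the valid-trace rate drops. Function names are hypothetical.

```python
def classify_trace(output: str, open_tag: str = "<think>", close_tag: str = "</think>") -> str:
    """Label the explicit reasoning segment of one completion."""
    start = output.find(open_tag)
    if start == -1:
        return "missing"
    body_start = start + len(open_tag)
    end = output.find(close_tag, body_start)
    if end == -1:
        return "truncated"  # opened but never closed, e.g. the token limit was hit
    return "valid" if output[body_start:end].strip() else "empty"


def structural_report(outputs, is_correct):
    """Valid-trace rate plus accuracy overall and conditional on a valid trace."""
    labels = [classify_trace(o) for o in outputs]
    valid = [ok for label, ok in zip(labels, is_correct) if label == "valid"]
    return {
        "valid_rate": len(valid) / len(outputs),
        "accuracy": sum(is_correct) / len(outputs),
        "accuracy_given_valid": sum(valid) / len(valid) if valid else float("nan"),
    }


outputs = ["<think>a</think>1", "2", "3", "<think>b</think>4"]
print(structural_report(outputs, [True, True, True, False]))
```

In this toy batch, answer accuracy is 0.75 even though only half the completions contain a valid trace.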

2605.21123 2026-05-21 cs.CV cs.LG (updated version)

Linear-DPO: Linear Direct Preference Optimization for Diffusion and Flow-Matching Generative Models

Kesong Li, Yixuan Xu, Kuo-kun Tseng, Weiyi Lu, Kan Liu, Tao Lan

Affiliations: School of Computer Science and Technology, Harbin Institute of Technology; Alibaba Group

AI Summary: This paper proposes Linear-DPO. It uses a unified reverse-time SDE framework to derive a generalized DPO objective covering both diffusion and flow matching, and argues that the standard DPO objective is suboptimal for text-to-image generation. Qualitative and quantitative experiments on diffusion and flow-matching models validate the approach.

Comments Code and models are available at: https://github.com/Whynot0101/Linear-DPO . Work done during an internship at Alibaba Group

Abstract

Direct Preference Optimization (DPO) has been successful for aligning LLMs but still faces challenges in text-to-image generation. Existing studies are confined to denoising diffusion models while overlooking flow matching, and suffer from an objective mismatch when applying discrete NLP-based DPO to regression-based generative tasks. In this paper, we derive a generalized DPO objective that covers both diffusion and flow matching via a unified reverse-time SDE framework, and point out from a gradient perspective that the standard DPO objective is suboptimal for text-to-image generation. Consequently, we propose Linear-DPO, which replaces the aggressive sigmoid-based utility function with a sustained linear utility and incorporates an EMA-updated reference model. Qualitative and quantitative experiments on diffusion models (SD1.5, SDXL) and a flow-matching model (SD3-Medium) demonstrate the superiority of our approach over existing baselines.
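The gradient-perspective argument can be illustrated on a scalar preference margin m, the implicit reward of the preferred sample minus that of the rejected one.

- With the standard sigmoid-based DPO loss -log σ(βm), the gradient magnitude βσ(-βm) vanishes once m is large.
- A linear utility keeps a constant gradient β.

The sketch treats m as given and does not compute it from diffusion or flow-matching losses. The EMA helper and its decay value are assumptions about how an EMA-updated reference model could be maintained.

```python
import math


def sigmoid_dpo_loss(margin: float, beta: float) -> float:
    """Standard DPO loss -log(sigmoid(beta * margin)), written via log1p."""
    return math.log1p(math.exp(-beta * margin))


def sigmoid_dpo_grad(margin: float, beta: float) -> float:
    """d loss / d margin = -beta * sigmoid(-beta * margin): vanishes as the margin grows."""
    return -beta / (1.0 + math.exp(beta * margin))


def linear_loss(margin: float, beta: float) -> float:
    """Linear utility: the loss keeps decreasing at a constant rate in the margin."""
    return -beta * margin


def linear_grad(margin: float, beta: float) -> float:
    """Constant gradient, independent of how large the margin already is."""
    return -beta


def ema_update(reference, params, decay=0.99):
    """Exponential moving average of policy parameters, used as the reference model."""
    return [decay * r + (1.0 - decay) * p for r, p in zip(reference, params)]


for m in (0.0, 2.0, 10.0):
    print(m, sigmoid_dpo_grad(m, beta=2.0), linear_grad(m, beta=2.0))
```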

2605.21114 2026-05-21 cs.LG (updated version)

A Unified Framework for Uncertainty-Aware Explainable Artificial Intelligence: A Case Study in Power Quality Disturbance Classification

Yinsong Chen, Samson S. Yu, Zhong Li, Chee Peng Lim

Affiliations: School of Engineering, Deakin University; Faculty of Mathematics and Computer Science, FernUniversität in Hagen; Department of Computing Technologies, Swinburne University of Technology

AI Summary: This paper proposes a unified framework for uncertainty-aware explainable AI. It uses Bayesian neural networks to capture the variability of the explanation distribution in support of uncertainty-aware decision-making, with power quality disturbance classification as the case study.

Abstract

Post-hoc explainable AI (XAI) methods typically produce deterministic attribution maps, whereas Bayesian neural networks (BNNs) induce a distribution over explanations. Capturing the variability of this distribution is important for uncertainty-aware decision-making. This paper formalises the explanation distribution as the push-forward measure of the BNN posterior through any Lipschitz-continuous attribution operator. It further proposes the uncertainty-aware relevance attribution operator (UA-RAO), a general family of operators that summarises the explanation distribution using the mean, variance, coefficient of variation, quantiles, and set-theoretic aggregation measures. Theoretical support is provided through Monte Carlo accessibility and Wasserstein approximation bounds. The framework is evaluated on a 15-class power quality disturbance (PQD) classification benchmark, comparing three BNN approximations paired with three attribution operators using relevance mass accuracy and intersection-over-union as localisation metrics. Results show that deep ensembles with the mean UA-RAO improve localisation over the deterministic baseline, while other UA-RAO summaries reveal uncertainty patterns absent from point-estimate attributions. Qualitative results on measured signals further suggest that these patterns generalise beyond the synthetic training distribution. The framework is domain-agnostic and can be applied to any BNN paired with a Lipschitz-continuous attribution operator.
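A minimal sketch of UA-RAO-style summaries follows. It assumes the explanation distribution is represented by S attribution vectors, for example one per deep-ensemble member or Monte Carlo posterior sample.

- Mean, variance, coefficient of variation, and quantiles are computed per feature.
- The top-k intersection and union are one assumed form of set-theoretic aggregation.
- IoU is included because the paper uses it as a localisation metric.

All names are hypothetical.

```python
import math
import statistics


def quantile(values, p):
    """Linear-interpolation quantile (the same rule as NumPy's default)."""
    xs = sorted(values)
    pos = p * (len(xs) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def summarize_explanations(samples, q_low=0.05, q_high=0.95):
    """Per-feature summaries of S attribution vectors (one per posterior sample)."""
    columns = list(zip(*samples))
    mean = [statistics.fmean(c) for c in columns]
    var = [statistics.pvariance(c) for c in columns]
    cv = [math.sqrt(v) / abs(m) if m != 0 else math.inf for m, v in zip(mean, var)]
    band = [(quantile(c, q_low), quantile(c, q_high)) for c in columns]
    return {"mean": mean, "variance": var, "cv": cv, "quantiles": band}


def top_k(vector, k):
    """Indices of the k largest attributions."""
    return set(sorted(range(len(vector)), key=lambda i: vector[i], reverse=True)[:k])


def set_aggregates(samples, k):
    """Features in the top-k of every sample (intersection) and of any sample (union)."""
    sets = [top_k(s, k) for s in samples]
    return set.intersection(*sets), set.union(*sets)


def iou(a, b):
    """Intersection-over-union of two feature sets."""
    return len(a & b) / len(a | b) if a | b else 1.0
```

A feature that appears in the union but not the intersection is ranked highly only by some posterior samples. A point-estimate attribution map cannot show that kind of disagreement.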

2605.21107 2026-05-21 cs.LG stat.ML (updated version)

Improved Guarantees for Constrained Online Convex Optimization via Self-Contraction

Dhruv Sarkar, Abhishek Sinha

Affiliations: Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur; School of Technology and Computer Science, Tata Institute of Fundamental Research, Mumbai, India

AI Summary: This paper proposes a projection-based algorithm that simultaneously achieves O(log T) regret and O(log T) CCV for strongly convex losses. For convex losses, it improves the CCV to O(√T) while maintaining the optimal O(√T) regret.

Abstract

We consider Constrained Online Convex Optimization (COCO) with adversarially chosen constraints. At each round, the learner chooses an action before observing the loss and constraint functions for that round. The goal is to achieve small static regret against the best point satisfying all constraints while also controlling the cumulative constraint violation (CCV). For strongly convex losses, state-of-the-art algorithms achieve O(log T) regret and O(√(T log T)) CCV. The corresponding best-known bounds for convex losses are O(√T) regret and O(√T log T) CCV. In this paper, we give a simple projection-based algorithm that simultaneously achieves O(log T) regret and O(log T) CCV for strongly convex losses, yielding an exponential improvement in the CCV. For convex losses, our algorithm improves the CCV to O(√T) while maintaining the optimal O(√T) regret. The key to our improvement is a recent geometric result for self-contracted curves, which may be of independent interest.
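The two performance measures can be stated precisely in code. This sketch assumes losses and constraints are given as Python callables. It counts only positive constraint values, so satisfying a constraint in one round cannot offset violating it in another. It is a definition check on a toy 1-D run, not the paper's projection-based algorithm.

```python
def regret(losses, actions, comparator):
    """Static regret: learner's cumulative loss minus that of one fixed comparator."""
    return sum(f(x) for f, x in zip(losses, actions)) - sum(f(comparator) for f in losses)


def cumulative_constraint_violation(constraints, actions):
    """CCV = sum_t max(0, g_t(x_t)): only positive violations count, so they cannot cancel."""
    return sum(max(0.0, g(x)) for g, x in zip(constraints, actions))


# Toy 1-D run: losses f_t(x) = (x - 1)^2 and constraints g_t(x) = x - 1 (i.e. x <= 1).
losses = [lambda x: (x - 1.0) ** 2] * 3
constraints = [lambda x: x - 1.0] * 3
actions = [0.5, 1.5, 2.0]  # each x_t is chosen before f_t and g_t are revealed
print(regret(losses, actions, comparator=1.0))                # 1.5
print(cumulative_constraint_violation(constraints, actions))  # 1.5
```

The comparator x = 1.0 satisfies every constraint, as the COCO benchmark requires.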

2605.21104 2026-05-21 cs.LG (updated version)

HORST: Composing Optimizer Geometries for Sparse Transformer Training

Tom Jacobs, Rohan Jain, Rebekka Burkholz

Affiliations: CISPA Helmholtz Center for Information Security

AI Summary: This paper proposes HORST, a modular optimizer that composes optimizer geometries. It induces an L1 sparsity bias through a hyperbolic mirror map while preserving training stability.

Comments 22 pages, 8 figures

Abstract

Sparsifying transformers remains a fundamental challenge, as standard optimizers fail to simultaneously encourage sparsity and maintain training stability. Effective adaptive optimizers exhibit an implicit L∞ bias favoring stability, yet sparsity requires an L1 bias. To integrate sparsity, we propose a composition of optimizer steps, which we cast as non-commutative operators to analyze and combine their optimization geometry in a principled way. This yields HORST (Hyperbolic Operator for Robust Sparse Training), a modular optimizer that inherits stability from adaptive methods while inducing an L1 sparsity bias through a hyperbolic mirror map. Our experiments demonstrate its utility for sparse training of transformers on both vision and language tasks. HORST consistently and significantly outperforms AdamW baselines across all sparsity levels, with large gains at higher sparsity.
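The L1-style bias of a hyperbolic mirror map can be seen in a single mirror-descent step. The sketch assumes the hyperbolic-entropy mirror map from the online-learning literature, whose gradient is asinh(w/β). The resulting effective step size is roughly lr·√(w² + β²), so small weights move slowly and large ones move quickly. This is an assumed stand-in for HORST's mirror map, and the composition with an adaptive optimizer step is not shown.

```python
import math


def hyperbolic_mirror_step(weights, grads, lr, beta):
    """One mirror-descent step with the hyperbolic-entropy mirror map.

    The mirror map's gradient is asinh(w / beta). The update is taken in the dual
    space and mapped back: w_new = beta * sinh(asinh(w / beta) - lr * g).
    """
    return [beta * math.sinh(math.asinh(w / beta) - lr * g) for w, g in zip(weights, grads)]


weights = [0.01, 10.0]
new = hyperbolic_mirror_step(weights, [1.0, 1.0], lr=0.01, beta=0.1)
# Same gradient, very different displacement: about 0.001 for the small weight vs. 0.1 for the large one.
print([w - n for w, n in zip(weights, new)])
```

A smaller β widens the multiplicative, sparsity-favoring regime. For |w| much larger than β the step behaves like exponentiated gradient, and for |w| much smaller than β it behaves like plain gradient descent with step lr·β.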

2605.21103 2026-05-21 cs.LG (updated version)

A Typed Tensor Language for Federated Learning

Theofilos Mailis, Kalliopi-Christina Despotidou, Konstantinos Filippopolitis, Yannis Foufoulas, Thanasis-Michail Karampatsis, Andreas Ktenidis, Evdokia Mailli, Theodore Papamarkou, Yannis Ioannidis

Affiliations: Athena Research Center; National and Kapodistrian University of Athens; National Technical University of Athens

AI Summary: This paper proposes a typed tensor language that formalizes the structure of federated learning computations. A shared-state factorization theory and a differentiable fragment give a formal account of which federated learning computations the language can express.

Abstract

Federated learning and analytics are often described as collections of separate protocols, even when they share the same mathematical form: client-local tensor computation, mergeable aggregation into shared state, and shared-only post-processing. We introduce a typed tensor language that formalizes this structure. The language distinguishes federated tensors, whose records are partitioned across clients along a tracked record axis, from shared tensors, which are available globally. Its semantics are defined by comparison with a virtual global tensor, used only as a reference object. The main result is a shared-state factorization theory. We show that typed one-round programs factor through fixed-dimensional shared state whose size is independent of the number of clients and records, computed from client-local tensor expressions and merged across clients. We also prove a converse representability result; factorizations whose encoders and decoders are expressible in the language are realized by typed one-round programs, and the correspondence extends to iterative programs whose cross-round state is shared. This gives a formal account of the computations in the language that can be expressed as encode, merge, and decode procedures. We then develop a differentiable fragment for learning. If a per-record loss and its per-record gradient are represented by client-local tensor expressions, the global gradient is represented by record-axis summation of the federated gradient tensor. This yields typed iterative programs for server-side gradient descent and shared-linear-algebra second-order updates. The framework characterizes a broad class of federated learning computations whose communication passes through fixed-dimensional shared state.
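The encode, merge, and decode pattern can be sketched for ordinary least squares, assuming one scalar feature plus an intercept.

- Each client encodes its records into five sufficient statistics.
- Merging is an element-wise sum whose size is independent of the number of clients and records.
- Decoding solves the normal equations using only the shared state.

The global gradient is likewise a sum of per-record gradients over the record axis. This is an illustration in plain Python, not the paper's typed tensor language; all names are hypothetical.

```python
def encode(records):
    """Client-local step: fixed-size sufficient statistics for y ~ w * x + b."""
    sxx = sum(x * x for x, _ in records)
    sx = sum(x for x, _ in records)
    sxy = sum(x * y for x, y in records)
    sy = sum(y for _, y in records)
    return (sxx, sx, float(len(records)), sxy, sy)


def merge(states):
    """Aggregation: element-wise sum; the state size does not depend on clients or records."""
    return tuple(sum(parts) for parts in zip(*states))


def decode(state):
    """Shared-only post-processing: solve the 2x2 normal equations for (w, b)."""
    sxx, sx, n, sxy, sy = state
    det = sxx * n - sx * sx
    return (n * sxy - sx * sy) / det, (sxx * sy - sx * sxy) / det


def federated_gradient(clients, w, b):
    """Gradient of 0.5 * sum (w x + b - y)^2 as a record-axis sum of per-record gradients."""
    gw = gb = 0.0
    for records in clients:
        for x, y in records:
            residual = w * x + b - y
            gw += residual * x
            gb += residual
    return gw, gb


clients = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0), (3.0, 7.0)]]  # data from y = 2x + 1
print(decode(merge([encode(c) for c in clients])))  # (2.0, 1.0)
print(federated_gradient(clients, 2.0, 1.0))        # (0.0, 0.0)
```

The merged result matches what a central server would compute from the pooled records, which is the property the shared-state factorization is about.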

2605.21094 2026-05-21 cs.LG (version update)

UOTIP: Unbalanced Optimal Transport Map for Unpaired Inverse Problems

Donggyu Lee, Taekyung Lee, Jaewoong Choi

Affiliations * Sungkyunkwan University; IPAI (Interdisciplinary Program in Artificial Intelligence, Seoul National University); Department of Mathematical Sciences, Seoul National University

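AI summary: This paper proposes UOTIP, an inverse problem solver based on unbalanced optimal transport. By introducing a likelihood-based cost function, it models reconstruction as learning a map from the noisy measurement distribution to the clean signal distribution, achieving state-of-the-art performance on unpaired inverse problems.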

Comments Accepted at ICML 2026

Abstract

We investigate unpaired image inverse problems, a challenging setting where only independent, non-paired sets of noisy measurements and clean target signals are available for training. We propose a novel inverse problem solver based on Unbalanced Optimal Transport, called Unbalanced Optimal Transport Map for Inverse Problems (UOTIP). Our method formulates the reconstruction task, predicting clean target signals from noisy measurements, as learning a UOT Map from noisy measurement distribution to clean signal distribution by incorporating a likelihood-based cost function. By relaxing the exact marginal constraint, the UOT framework provides key advantages to our model: robustness to multi-level observation noise, adaptability to class imbalance between noisy and clean datasets, and generalizability to diverse noise-type scenarios. Furthermore, we theoretically demonstrate that incorporating a quadratic cost term ensures the existence and uniqueness of the transport map by satisfying the twist condition, even for ill-posed inverse problems. Our experiments demonstrate that UOTIP achieves state-of-the-art performance on unpaired image inverse problem benchmarks, across linear and nonlinear inverse problems.
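UOTIP learns a continuous transport map with a likelihood-based cost, and the sketch below does not implement that. It only shows, on a discrete toy problem, what relaxing the exact marginal constraint means, using the standard entropic scaling iterations for unbalanced OT with KL-penalized marginals. The function name and the parameters `eps` (entropic regularization) and `lam` (marginal penalty weight) are illustrative assumptions; as `lam` grows, the iterations approach balanced Sinkhorn.

```python
import math
from typing import List, Sequence


def unbalanced_sinkhorn(a: Sequence[float], b: Sequence[float], cost: Sequence[Sequence[float]],
                        eps: float = 0.1, lam: float = 1.0, iters: int = 1000) -> List[List[float]]:
    """Entropic OT plan with KL-penalized (relaxed) marginals via scaling iterations.

    eps: entropic regularization strength; lam: weight of the KL marginal penalties.
    As lam grows, the exponent below tends to 1 and this becomes balanced Sinkhorn.
    """
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    power = lam / (lam + eps)  # exponent < 1 softens each marginal constraint
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        Kv = [sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        u = [(a[i] / Kv[i]) ** power for i in range(n)]
        Ktu = [sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
        v = [(b[j] / Ktu[j]) ** power for j in range(m)]
    # Transport plan P_ij = u_i * K_ij * v_j; its marginals need not equal a and b exactly.
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
```

With balanced masses and a large `lam`, the plan's row and column sums nearly match `a` and `b`. With mismatched masses, for example `a = [1.0]`, `b = [4.0]`, zero cost, and `lam = 1.0`, the plan carries about 1.9 units of mass, between the two totals, instead of forcing either marginal exactly.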

2605.21088 2026-05-21 cs.LG (version update)

Reviving Error Correction in Modern Deep Time-Series Forecasting

Minh Hoang Nguyen, Dai Do, Huu Hiep Nguyen, Dung Nguyen, Kien Do, Hung Le

Affiliations * Deakin Applied AI Initiative, Deakin University, Australia

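AI summary: This paper studies error accumulation in deep time-series forecasting and proposes a universal error corrector that improves forecast accuracy and robustness by decomposing predictions into trend and seasonal components.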

Comments 27 pages

Abstract

Modern deep-learning models have achieved remarkable success in time-series forecasting. Yet, their performance degrades in long-term prediction due to error accumulation in autoregressive inference, where predictions are recursively used as inputs. While classical error correction mechanisms (ECMs) have long been used in statistical methods, their applicability to deep learning models remains limited or ineffective. In this work, we revisit the error accumulation problem in deep time-series forecasting and investigate the role and necessity of ECMs in this new context. We propose a simple, architecture-agnostic error correction model that can be integrated with any existing forecaster without requiring retraining. By explicitly decomposing predictions into trend and seasonal components and training the corrector to adjust each separately, we introduce the Universal Error Corrector with Seasonal-Trend Decomposition (UEC-STD), which significantly improves correction accuracy and robustness across 4 backbones and 10 datasets. Our findings provide a practical tool for enhancing forecasts while offering new insights into mitigating autoregressive errors in deep time-series models. Code is available at https://github.com/DA2I2-SLM/UEC-STD.
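The seasonal-trend correction idea can be sketched as a post-hoc corrector applied to a frozen forecaster's outputs. This is a minimal sketch under stated assumptions, not UEC-STD itself: the trend is a centered moving average, each component gets its own least-squares affine correction fitted on held-out forecasts, and `SeasonalTrendCorrector` and `window` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def moving_average(x: Sequence[float], window: int) -> List[float]:
    # Centered moving average; the window is truncated at the series edges.
    half = window // 2
    out = []
    for t in range(len(x)):
        seg = x[max(0, t - half): t + half + 1]
        out.append(sum(seg) / len(seg))
    return out


def decompose(x: Sequence[float], window: int) -> Tuple[List[float], List[float]]:
    trend = moving_average(x, window)
    seasonal = [xi - ti for xi, ti in zip(x, trend)]
    return trend, seasonal


def fit_affine(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    # Ordinary least squares for y ~ a * x + b.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    var = sum((xi - mx) ** 2 for xi in x)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = cov / var
    return a, my - a * mx


class SeasonalTrendCorrector:
    """Corrects the trend and seasonal parts of a forecast separately; the forecaster is never retrained."""

    def __init__(self, window: int = 7):
        self.window = window
        self.trend_coef = (1.0, 0.0)
        self.seasonal_coef = (1.0, 0.0)

    def fit(self, preds: Sequence[float], targets: Sequence[float]) -> "SeasonalTrendCorrector":
        tp, sp = decompose(preds, self.window)
        tt, st = decompose(targets, self.window)
        self.trend_coef = fit_affine(tp, tt)
        self.seasonal_coef = fit_affine(sp, st)
        return self

    def correct(self, preds: Sequence[float]) -> List[float]:
        tp, sp = decompose(preds, self.window)
        (at, bt), (as_, bs) = self.trend_coef, self.seasonal_coef
        return [at * t + bt + as_ * s + bs for t, s in zip(tp, sp)]
```

Because the corrector only sees forecasts and targets, it can wrap any backbone; a real corrector would use a learned model per component rather than a single affine map.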

2605.21085 2026-05-21 cs.MA cs.AI cs.LG (version update)

Decoupling Communication from Policy: Robust MARL under Bandwidth Constraints

Alexi Canesse, Benoît Goupil, Jesse Read, Sonia Vanier

Affiliations * École polytechnique (LIX); CNRS; Institut Polytechnique de Paris; Palaiseau, France

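AI summary: This paper introduces the β metric and the SLIM architecture, which decouples the communication pathway from the policy's latent representation, improving the robustness and performance of multi-agent reinforcement learning under bandwidth constraints.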

Abstract

Communication enables coordination in multi-agent reinforcement learning (MARL), but many real-world applications, e.g., search-and-rescue with drone swarms, operate under severe bandwidth constraints. Many communication architectures still expose a coupled bottleneck in which a shared latent representation is used for both policy execution and inter-agent communication. Consequently, reducing message size directly limits the policy's latent space, often leading to significant performance degradation. We address this with two contributions. First, we introduce $β$, a normalised per-agent bandwidth budget that unifies sparsity, rounds, and message dimension into a single comparable constraint. Second, we provide SLIM, a minimal architecture that decouples the communication pathway from the policy's latent representation, allowing us to isolate the effect of bandwidth from the effect of policy capacity while benefiting from in-step communication. We evaluate our method on several partially-observable MARL benchmarks, where communication is essential. Our approach achieves state-of-the-art performance and exhibits scalability and robustness under limited communication, with only marginal degradation as bandwidth is reduced.

2605.21083 2026-05-21 physics.app-ph cs.LG physics.bio-ph physics.med-ph (version update)

AIMBio-Mat: An AI-Native FAIR Platform for Closed-Loop Materials Discovery and Biomedical Translation

D. -M. Mei, K. Acharya, C. M. Adhikari, M. Adhikari, S. Aryal, B. V. Benson, K. Bhatta, S. Bhattarai, N. Budhathoki, A. M. Castillo, D. Chakraborty, S. Chhetri, S. Choudhury, T. A. Chowdhury, R. D. Cruz, B. Cui, S. Dhital, K. -M. Dong, R. Gapuz, A. Ghasemi, E. Z. Gnimpieba, B. D. S. Gurung, H. A. Hashim, R. I. Harry, K. -E. Hasin, M. K. Hassanzadeh, M. K. Jha, D. Kim, K. -C. Kong, B. Lama, A. Mahat, N. Maharjan, A. Majeed, J. Mammo, M. M. Masud, K. S. Moore, A. Nawaz, H. Oli, S. A. Panamaldeniya, L. Pandey, R. Pandey, Z. Peng, A. Prem, M. M. Rana, K. Rana Magar, R. Rizk, C. S. Tadi, L. -W. Wang, Y. Yang, G. -L. Yin, C. -X. Yu, D. Zeng, M. Zhou, Q. Zhou

Affiliations * Department of Physics, University of South Dakota; South Dakota School of Mines and Technology; Department of Chemistry, Physics and Materials Science, Fayetteville State University; University of South Dakota; PROMISE Lab, Sanford Research; Department of Mechanical Engineering, University of Mississippi; Department of Physics and Astronomy, University of Kansas; Tiospa Zina Tribal School; Department of Mechanical and Materials Engineering, University of Nebraska–Lincoln

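AI summary: This paper presents the AIMBio-Mat platform, which integrates materials provenance, biomedical context, knowledge graphs, uncertainty-aware machine learning, and human-in-the-loop active learning to support cross-domain reasoning in materials discovery and biomedical translation, and provides a testable platform blueprint.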

Comments 35 pages, 4 figures, and 12 tables

Abstract

Materials discovery and biomedical translation increasingly require models that can reason across composition, processing, structure, biological response, manufacturability, safety, and governance constraints. Existing materials and biomedical data ecosystems are powerful but remain poorly coupled for AI-guided discovery. Here we present AIMBio, a conceptual framework for an AI-native, FAIR, and governance-aware decision layer that links materials provenance, biomedical context, knowledge graphs, uncertainty-aware machine learning, and human-in-the-loop active learning. The framework formulates biomedical-materials discovery as constrained multi-objective optimization under uncertainty and introduces practical requirements for metadata, model documentation, risk-tiered governance, evaluation metrics, and phased implementation. To make the roadmap testable, we add a minimum viable prototype specification and a worked pilot for AI-guided nanomaterials for drug delivery. AIMBio is positioned as exploratory and preclinical discovery infrastructure, not as clinical decision-support software; any clinical or regulated-device use would require separate validation, change control, and regulatory review. The central contribution is a publishable platform blueprint for converting fragmented materials and biomedical records into auditable, experimentally actionable, and translationally responsible discovery workflows.

2605.21081 2026-05-21 cs.SD cs.LG (version update)

Musical Attention Transformer: Music Generation Using a Music-Specific Attention Model

Shinnosuke Taksuka, Hideo Mukai

Affiliations * Department of Computer Science, School of Science and Technology, Meiji University

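AI summary: This paper proposes a music-specific attention model that improves music generation quality by integrating meta-information. The core approach combines musical structure with metadata, and the main contribution is more coherent and more varied generated music.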

Comments 32 pages, 13 figures

Abstract

This study aims to enhance the quality of music generation using Transformers by incorporating meta-information. While Transformer-based approaches are effective at capturing long-term dependencies in musical compositions, the music they generate often suffers from issues such as excessive repetition or duplication of notes, leading to unnatural melodies. To address these limitations, we propose Musical Attention, a mechanism that incorporates meta-information such as bar numbers, keys, time signatures, and tempos into the attention process. Musical Attention explicitly leverages both the structural properties of music and its associated metadata, enabling the Transformer's attention mechanism to operate more effectively and thereby improving the quality of the generated output. In our framework, each musical note is represented as a combination of five events (pitch, bar number, onset, duration, and velocity) and three metadata elements. The attention mechanism is then modified to reflect the correlations among these eight features, allowing the model to better capture the inherent characteristics of musical composition. Experimental results demonstrate that the model incorporating Musical Attention outperforms prior methods, such as Full Attention and Strided Attention, in terms of musical coherence, variation, and overall quality. Notably, it significantly reduces repetition and enhances the model's ability to generate diverse, harmonically consistent melodies. Musical Attention thus represents a meaningful advancement in AI-driven music generation, facilitating the creation of more natural and expressive compositions.

2605.21075 2026-05-21 cs.CV cs.LG (version update)

SpectralEarth-FM: Bringing Hyperspectral Imagery into Multimodal Earth Observation Pretraining

Nassim Ait Ali Braham, Aaron Banze, Conrad M. Albrecht, Julien Mairal, Jocelyn Chanussot, Xiao Xiang Zhu

Affiliations * Chair of Data Science in Earth Observation; Technical University of Munich; Remote Sensing Technology Institute; German Aerospace Center (DLR); Department of Aerospace Engineering; University of the Bundeswehr Munich; LEAP Munich Center for Machine Learning (MCML); Univ. Grenoble Alpes; Inria; CNRS; Grenoble INP; LJK

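AI summary: This paper proposes SpectralEarth-FM, a hierarchical transformer for multisensor Earth observation input that jointly processes hyperspectral imagery and lower-channel observations. The authors build the SpectralEarth-MM dataset and pretrain with a JEPA-style objective, achieving state-of-the-art performance on hyperspectral downstream tasks and standard EO benchmarks.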

Abstract

Earth observation (EO) foundation models (FMs) are increasingly trained on multisensor data, spanning multispectral imagery (MSI), synthetic aperture radar (SAR), and derived geospatial layers, but hyperspectral imagery (HSI) remains underrepresented. Conversely, existing hyperspectral FMs are trained on HSI alone, leaving joint pretraining and fusion of HSI with co-located EO sensors unexplored. We introduce SpectralEarth-FM, a hierarchical transformer for multisensor EO input with heterogeneous spectral dimensionality. The architecture combines spectral tokenization for hyperspectral inputs, sensor-specific encoders, a cross-sensor fusion module, and a shared hierarchical encoder, enabling joint processing of HSI and lower-channel observations. To pretrain SpectralEarth-FM, we curate SpectralEarth-MM, a dataset that co-locates HSI from three spaceborne sensors (EnMAP, EMIT, DESIS) with Sentinel-2, Landsat-8/9 optical imagery, Landsat land surface temperature (LST), and Sentinel-1 SAR, over common geographic footprints. It comprises approximately 2M globally distributed locations, 25M georeferenced patches, and over 40TB of data. Pretraining uses a Joint-Embedding Predictive Architecture (JEPA)-style objective that matches representations between global views and single-sensor local views from the same location. We evaluate SpectralEarth-FM on hyperspectral downstream tasks and standard EO benchmarks following the PANGAEA protocol, achieving state-of-the-art results across both evaluation settings.

2605.21070 2026-05-21 cs.LG (version update)

Towards Understanding Self-Pretraining for Sequence Classification

Omar Coser, Loredana Zollo, Paolo Soda, Antonio Orvieto

Affiliations * Unit of Artificial Intelligence & Computer Systems, Università Campus Bio-Medico di Roma; Unit of Advanced Robotics and Human-Centered Technologies, Università Campus Bio-Medico di Roma; Department of Diagnostics and Intervention, Radiation Physics, Biomedical Engineering, Umeå University; Max Planck Institute for Intelligent Systems; ELLIS Institute Tübingen; Tübingen AI Center

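AI summary: By replicating and systematically ablating the study of Amos et al., this paper identifies the key factors behind the gains of self-pretraining (SPT) in sequence classification. It finds that the central bottleneck is the ability of label supervision to learn useful query-key attention patterns, identifies learning proximity interactions as a key source of SPT's improvements, and supports these findings with a simplified theoretical framework.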

Comments v1: Preliminary, extension of the version accepted at ICML 2025 Workshop MOSS

Abstract

Amos et al. (2024) showed that the accuracy of Transformer models in sequence classification can be significantly improved by first pretraining with a masked token prediction objective without external data or augmentation, a procedure referred to as self-pretraining (SPT). While the primary objective of Amos et al. (2024) was to showcase that Transformers can achieve strong performance on the Long-Range Arena (LRA), their pipeline raises more fundamental questions: How does SPT drive optimization to better solutions? Why can standard supervised training fail in Transformers? To better understand this, we replicate and systematically ablate the findings of Amos et al. (2024). Our ablations suggest that a central bottleneck in the studied settings is not depth or generalization alone, but the ability of label supervision to learn useful query-key Attention patterns from random initialization. With a minimal setup, we identify learning proximity interactions - turning absolute positional encodings into proximity-biased Attention scores - as a key source of the improvements brought by SPT. Finally, in a simplified theoretical setup, we show that label supervision can be locally blind to certain Attention-score directions that are instead detectable through masked reconstruction.

2605.21066 2026-05-21 cs.LG (version update)

Robust Personalized Recommendation under Hidden Confounding in MNAR

Zongyu Li, Wanting Su, Tianyu Xia

Affiliations * Guangdong University of Technology; Chinese Academy of Sciences; Peking University

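AI summary: This paper proposes a new framework that estimates user-item level sensitivity bounds, relaxing the homogeneity assumption inherent in global sensitivity bounds and enabling more robust and accurate personalized recommendation in the presence of hidden confounders.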

Abstract

Recommender systems often rely on observational user-item interaction data, which is prone to selection bias because users interact with items selectively. Inverse propensity weighting and doubly robust estimators effectively mitigate selection bias under observed confounding, but are unreliable in the presence of hidden confounders. Existing approaches that rely on randomized controlled trials (RCTs) or global sensitivity bounds are constrained in practice: RCTs demand costly experimental data, while global sensitivity bounds presume, through sensitivity analysis, a uniformly bounded effect of unmeasured confounders on propensities, thereby neglecting heterogeneity across user-item interactions. To overcome this limitation, we propose a novel framework, the Personalized Unobserved-Confounding-aware Interaction Deconfounder (PUID), which estimates user-item level sensitivity bounds and thereby substantially relaxes the homogeneity assumption inherent in global sensitivity bounds. To ensure both robustness and predictive accuracy, we further develop an adversarial optimization strategy and propose a benchmark-guided variant (BPUID) that incorporates pre-trained models as stabilizing references. Extensive experiments on three real-world datasets demonstrate that our approach significantly outperforms global methods under hidden confounding, without requiring RCT data.

2605.21060 2026-05-21 cs.LG cs.AI stat.ML (version update)

Divide et Calibra: Multiclass Local Calibration via Vector Quantization

Cesare Barbera, Lorenzo Perini, Giovanni De Toni, Andrea Passerini, Andrea Pugnana

Affiliations * University of Pisa; University of Trento; Meta; Fondazione Bruno Kessler

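AI summary: This paper proposes a compositional approach that uses vector quantization to induce a structured partition of the representation space and a parameterization of Dirichlet concentrations that shares parameters across regions. The resulting heterogeneous calibration maps generalize to sparse regions, improving local calibration while maintaining global calibration and predictive performance.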

Abstract

Accurate and well-calibrated Machine Learning (ML) models are mandatory in high-stakes settings, yet effective multiclass calibration remains challenging: global approaches assume calibration errors are homogeneous across the latent space, while local methods often rely on latent-space dimensionality reduction, which leads to information loss. To address these issues, we propose a compositional approach to multiclass calibration, where region-specific calibration maps are constructed from shared codeword-dependent factors. We instantiate this idea via Vector Quantization (VQ), which induces a structured partition of the representation space, and an indexed parameterization of Dirichlet concentrations that enables parameter sharing across regions. Our approach learns heterogeneous calibration maps that generalize well even to sparse regions of the latent space. Experiments on benchmark datasets show significant improvements in local calibration while maintaining competitive global calibration and predictive performance.
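A minimal sketch of region-wise calibration via vector quantization follows. It swaps in a simpler technique than the paper's: one temperature per codeword region, fitted by grid search on negative log-likelihood, instead of Dirichlet concentrations built from shared codeword-dependent factors. The codebook is assumed to be given (for example from k-means), and all function names are hypothetical. Regions without calibration data fall back to temperature 1, which is exactly the sparse-region weakness that parameter sharing across regions is meant to address.

```python
import math
from typing import Dict, List, Sequence, Tuple


def nearest_codeword(z: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    # Vector quantization: index of the closest codeword under squared Euclidean distance.
    return min(range(len(codebook)),
               key=lambda k: sum((zi - ci) ** 2 for zi, ci in zip(z, codebook[k])))


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nll(logit_rows, labels, temperature: float) -> float:
    return -sum(math.log(softmax(l, temperature)[y]) for l, y in zip(logit_rows, labels))


def fit_region_temperatures(latents, logit_rows, labels, codebook) -> Dict[int, float]:
    grid = [0.25 * i for i in range(1, 41)]  # candidate temperatures 0.25, 0.5, ..., 10.0
    groups: Dict[int, List[Tuple[Sequence[float], int]]] = {}
    for z, l, y in zip(latents, logit_rows, labels):
        groups.setdefault(nearest_codeword(z, codebook), []).append((l, y))
    temps = {}
    for region, items in groups.items():
        ls = [l for l, _ in items]
        ys = [y for _, y in items]
        temps[region] = min(grid, key=lambda t: nll(ls, ys, t))
    return temps


def calibrated_probs(z, logits, codebook, temps: Dict[int, float]) -> List[float]:
    # Unseen regions fall back to temperature 1.0 (no parameter sharing in this sketch).
    return softmax(logits, temps.get(nearest_codeword(z, codebook), 1.0))
```

In an overconfident region (large logit margins but frequent errors) the fitted temperature exceeds 1 and softens predictions, while in a region where the model is underconfident it falls below 1; a single global temperature cannot do both.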

2605.21059 2026-05-21 cs.CV cs.LG (version update)

Multimodal LLMs under Pairwise Modalities

Yan Li, Yunlong Deng, Yuewen Sun, Gongxu Luo, Kun Zhang, Guangyi Chen

Affiliations * Mohamed bin Zayed University of Artificial Intelligence; Carnegie Mellon University

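AI summary: This paper proposes training multimodal large language models on pairwise modalities. Through theoretical analysis and a representation learning framework, it achieves cross-modal alignment and recomposition, improving the model's cross-modal performance.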

Abstract

Despite the impressive results achieved by multimodal large language models (MLLMs), their training typically relies on jointly curated multimodal data, requiring substantial human effort to construct multi-way aligned datasets and thereby limiting scalability across domains. In this work, we explore training MLLMs by only leveraging multiple paired modalities as a surrogate for the full joint multimodal distribution. Specifically, we first provide a theoretical analysis of the conditions under which the representations are identifiable when only pairwise modalities are observed. Building on this analysis, we propose a representation learning framework for aligning latent representations across modalities using only pairwise data. The framework consists of two stages: latent representation alignment and cross-modal recomposition. In the first stage, we learn a shared latent space across modalities through both self-modal reconstruction and pairwise contrastive learning. We also incorporate an inductive bias into the contrastive learning process through partial alignment and minimal latent specification. In the second stage, we integrate the encoders of newly introduced modalities with the decoders of the pre-trained modalities to facilitate cross-modal transfer and generation. We evaluate our method by adding 3D point cloud and tactile modalities to pre-trained MLLMs with three modality pairs and show that, by learning an aligned latent representation space, our model achieves strong cross-modal performance.

2605.21058 2026-05-21 cs.LG (version update)

A Dialogue between Causal and Traditional Representation Learning: Toward Mutual Benefits in a Unified Formulation

Yan Li, Yuewen Sun, Shaoan Xie, Gongxu Luo, Yunlong Deng, Kun Zhang, Guangyi Chen

Affiliations * Mohamed bin Zayed University of Artificial Intelligence; Carnegie Mellon University

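AI summary: This paper examines the dialogue between causal and traditional representation learning and proposes a unified formulation, built from a task component and a constraint component, through which each field can inform the other. Experiments show that the effectiveness of causal constraints depends on the tasks they are paired with.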

Abstract

Causal representation learning (CRL) and traditional representation learning have largely developed along different trajectories. Traditional representation learning has been driven mainly by applications and empirical objectives, whereas CRL has focused more on theoretical questions, particularly identifiability. This difference in emphasis has created a gap between the two fields in terminology, problem formulation, and evaluation, limiting communication and sometimes leading to disconnected or redundant efforts. In this paper, we argue that these two fields should be brought into dialogue rather than treated as separate paradigms. To this end, we introduce a unified formulation in which representation learning is characterized by two components: a task component, which specifies what information the learned representation is required to preserve, and a constraint component, which specifies what structure is imposed on the latent space. Under this formulation, the benefits run in both directions. CRL provides theoretical tools for understanding when structured latent constraints are useful or necessary, while traditional representation learning offers practical insights on task design and objective choice that can improve the development of CRL methods. To illustrate this interaction, we experimentally study how different task components affect the behavior of CRL methods under different structured constraints. Results on CausalVerse show that the effectiveness of causal constraints depends strongly on the tasks with which they are paired.

2605.21055 2026-05-21 cs.NE cs.LG (version update)

Genetic Programming with Transformer-Based Mutation for Approximate Circuit Design

Ondrej Galeta, Lukas Sekanina

Affiliations * Brno University of Technology, Faculty of Information Technology

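AI summary: This paper proposes a transformer-based mutation operator that improves the evolutionary design and optimization of approximate arithmetic circuits with genetic programming. A hybrid scheme prevents the circuit approximation process from stagnating, and under several target error constraints the evolved circuits outperform the highly optimized designs in the EvoApproxLib library.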

Comments To appear at IEEE World Congress on Computational Intelligence, Congress on Evolutionary Computation, Maastricht, NL, 2026

Abstract

A recent trend is to leverage machine learning models to improve the evolutionary design and optimization process. We propose a novel transformer-based mutation operator for Cartesian genetic programming (CGP) for the automated design of approximate arithmetic circuits. We introduce a hybrid scheme for CGP in which the proposed mutation operator alternates with the standard mutation operator to prevent stagnation of the circuit approximation process. We also develop a new training scheme for the underlying transformer that utilizes training vectors composed of thousands of CGP chromosomes representing various approximate multipliers. For several target error constraints, the approximate multipliers evolved with CGP utilizing the transformer-based mutation achieve better trade-offs than the highly optimized designs available in the state-of-the-art EvoApproxLib library of approximate circuits. Although both training and evolutionary processes are computationally demanding, they appear to be necessary steps for improving existing approximate circuits and producing new, potentially patentable circuit designs.

2605.21041 2026-05-21 stat.ML cs.LG stat.ME (version update)

Conditioning Gaussian Processes on Almost Anything

Henry Moss, Lachlan Astfalck, Thomas Cowperthwaite, Colin Doumont, Sam Willis, Philipp Hennig, Christopher Nemeth, Andrew Zammit-Mangion

Affiliations * Lancaster University; University of New South Wales; University of Cambridge; University of Tübingen

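AI summary: This paper establishes an equivalence between Gaussian processes and linear diffusion models, enabling efficient conditioning on any statement that admits point-wise likelihood evaluation, including nonlinear physical models and natural language, and thereby broadening the use of Gaussian processes in real-world modeling.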

Abstract

Gaussian processes (GPs) offer a principled probabilistic model over functions, but exact inference is restricted to the linear-Gaussian regime. We establish an explicit equivalence between GPs and a class of linear diffusion models, recasting predictive sampling as an ODE with closed-form Gaussian dynamics and a likelihood-dependent guidance term that admits a simple Monte Carlo approximation. In the linear-Gaussian setting, we recover standard GP conditioning exactly; beyond conjugacy, the same machinery handles any conditioning statement admitting point-wise likelihood evaluation -- including non-linear physics, and, for the first time, natural language via large language models. Whitening isolates the irreducible non-Gaussian dynamics, minimising Wasserstein-2 transport cost and eliminating numerical stiffness. The result is a general-purpose GP inference scheme requiring no bespoke derivations. Together, these results provide a general mechanism for incorporating the full richness of real-world knowledge as conditioning information, opening a new frontier for the probabilistic modelling of real-world problems.
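In the linear-Gaussian setting the method is stated to recover standard GP conditioning exactly. For reference, a minimal sketch of that baseline follows: textbook exact GP regression with an RBF kernel and Gaussian observation noise, in pure Python. It is not the paper's diffusion-based sampler, and the kernel, its hyperparameters, and the `noise` value are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def rbf(x: float, y: float, lengthscale: float = 1.0, variance: float = 1.0) -> float:
    return variance * math.exp(-0.5 * ((x - y) / lengthscale) ** 2)


def cholesky(A: List[List[float]]) -> List[List[float]]:
    # Lower-triangular L with A = L L^T (A must be symmetric positive definite).
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    # Forward substitution for L x = b.
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def solve_lower_transpose(L, b):
    # Back substitution for L^T x = b.
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior(xs: Sequence[float], ys: Sequence[float], x_star: float,
                 noise: float = 1e-2) -> Tuple[float, float]:
    """Posterior mean and variance of f(x_star) given y = f(x) + N(0, noise)."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = solve_lower_transpose(L, solve_lower(L, list(ys)))  # K^{-1} y
    k_star = [rbf(x_star, a) for a in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)  # L^{-1} k_star
    return mean, rbf(x_star, x_star) - sum(vi * vi for vi in v)
```

Far from the data, the posterior reverts to the prior (mean 0, variance 1 here); near an observation it is pulled toward the observed value, with variance shrinking according to the noise level.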

2605.21033 2026-05-21 cs.LG cs.DS (version update)

Efficient Banzhaf-Based Data Valuation for $k$-Nearest Neighbors Classification

Guangyi Zhang, Lutz Oettershagen, Lixu Wang, Aristides Gionis

Affiliations * Shenzhen Technology University; University of Liverpool; Nanyang Technological University

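AI summary: This paper proposes efficient methods for computing Banzhaf values for $k$-nearest neighbor classifiers, tackling the computational complexity of data valuation with a dynamic programming framework that yields substantial efficiency gains.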

Comments To appear at VLDB 2026

Abstract

Data valuation, the task of quantifying the contribution of individual data points to model performance, has emerged as a fundamental challenge in machine learning. Game-theoretic approaches, such as the Banzhaf value, offer principled frameworks for fair data valuation; however, they suffer from exponential computational complexity. We address this challenge by developing efficient algorithms specifically tailored for computing Banzhaf values in $k$-nearest neighbor ($k$NN) classifiers. We first establish the theoretical hardness of the problem by proving that it is #P-hard. Despite this intractability, we exploit the locality properties of $k$NN classifiers to develop practical exact algorithms. Our main contribution is a dynamic programming framework that achieves significant computational improvements: we present a pseudo-polynomial algorithm with $O(Wkn^2)$ time complexity for weighted $k$NN classifiers, where $W$ is the maximum sum of top-$k$ weights, and a specialized algorithm for unweighted $k$NN that achieves $O(nk^2)$ time complexity, that is, linear in the number of data points. We also offer efficient Monte Carlo estimation methods. Extensive experiments on real-world datasets demonstrate the practical efficiency of our approach and its effectiveness in data valuation applications.
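To make the quantity being computed concrete, here is a brute-force sketch of the Banzhaf value for a single test point, together with the plain Monte Carlo estimator that includes each other point independently with probability 1/2. It assumes the unweighted utility U(S) = (number of the k nearest points of S that share the test label) / k, with U(∅) = 0; the paper's exact utility may differ. The exact routine enumerates all 2^(n-1) subsets, so it only illustrates the definition and is not the paper's dynamic programming algorithm. All function names are hypothetical.

```python
import itertools
import random
from typing import Sequence


def knn_utility(subset: Sequence[int], dists: Sequence[float], labels: Sequence[int],
                y_test: int, k: int) -> float:
    # Assumed utility: matching labels among the k nearest members of `subset`, divided by k.
    nearest = sorted(subset, key=lambda j: dists[j])[:k]
    return sum(1 for j in nearest if labels[j] == y_test) / k


def banzhaf_exact(i, dists, labels, y_test, k) -> float:
    # Average marginal contribution of point i over all 2^(n-1) subsets of the other points.
    others = [j for j in range(len(dists)) if j != i]
    total = 0.0
    for r in range(len(others) + 1):
        for subset in itertools.combinations(others, r):
            s = list(subset)
            total += (knn_utility(s + [i], dists, labels, y_test, k)
                      - knn_utility(s, dists, labels, y_test, k))
    return total / 2 ** len(others)


def banzhaf_monte_carlo(i, dists, labels, y_test, k, samples=2000, seed=0) -> float:
    # Uniform random subsets: include each other point independently with probability 1/2.
    rng = random.Random(seed)
    others = [j for j in range(len(dists)) if j != i]
    acc = 0.0
    for _ in range(samples):
        s = [j for j in others if rng.random() < 0.5]
        acc += (knn_utility(s + [i], dists, labels, y_test, k)
                - knn_utility(s, dists, labels, y_test, k))
    return acc / samples
```

Adding one point changes at most one of the k nearest neighbors, so each marginal contribution lies in [-1/k, 1/k]; this bounded range is what keeps the Monte Carlo estimate's variance small.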

2605.20999 2026-05-21 math.PR cs.LG math.OC stat.ML Version update

Concentration of General Stochastic Approximation Under Heavy-Tailed Markovian Noise

Shubhada Agrawal, Siva Theja Maguluri, Martin Zubeldia

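Affiliations: Indian Institute of Science; Georgia Institute of Technology; University of Minnesota

AI summary: This paper studies maximal concentration bounds for the iterates of stochastic approximation algorithms whose noise has a finite-state Markovian component and a martingale-difference component. Using a new Lyapunov function and an auxiliary projected algorithm, it analyzes how the step-size sequence and the properties of the random operator shape the tail of the error. It also gives tail concentration results for unbounded martingale-difference noise.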

Comments 67 pages

English abstract

We establish maximal concentration bounds for the iterates generated by stochastic approximation algorithms with general step sizes, where the noise has a finite-state Markovian component plus a Martingale-difference component. When the Martingale-difference noise is bounded, we show that the tail of the error can be sub-Gaussian, sub-Weibull, or something lighter than any Pareto but heavier than any Weibull, depending on the step size sequence and on whether the random operator is almost surely contractive, almost surely non-expansive, or expansive with positive probability. Our analysis relies on a novel Lyapunov function involving the moment-generating function of the solution to a Poisson equation, together with an auxiliary projected algorithm. We complement the upper bounds with worst-case examples showing that qualitatively sharper bounds are impossible. We further study the case of unbounded Martingale-difference noise when the average operator is contractive, and the step sizes are of order $1/k$. In this setting, we show that if the random operator is almost surely non-expansive, then the error tail is at most three times heavier than the noise tail, whereas if the random operator is expansive with positive probability, then the error may have substantially heavier tails. These results are obtained through a novel black-box truncation argument that reduces the unbounded-noise setting to the bounded-noise case.
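The toy simulation below, which is not from the paper, makes the setting concrete. It runs stochastic approximation with step sizes $1/k$ and a contractive mean field $h(x) = 1 - x$ whose fixed point is $x^* = 1$. The noise is a symmetric two-state Markov chain plus bounded martingale-difference noise. Repeating the run over many seeds and counting how often $|x_n - x^*|$ exceeds a threshold gives an empirical error tail of the kind the bounds describe. All parameter values are arbitrary choices for illustration.

```python
import random


def run_sa(n_steps=100_000, switch_prob=0.2, seed=0):
    """Toy SA iteration x_{k+1} = x_k + a_k * (h(x_k) + s_k + w_k) with a_k = 1/(k+1)."""
    rng = random.Random(seed)
    x, s = 0.0, 1  # iterate, and the Markov-chain state in {-1, +1}
    for k in range(n_steps):
        step = 1.0 / (k + 1)             # step size of order 1/k
        w = rng.uniform(-1.0, 1.0)       # bounded martingale-difference noise
        x += step * ((1.0 - x) + s + w)  # mean field h(x) = 1 - x, fixed point 1
        if rng.random() < switch_prob:   # symmetric chain: stationary mean 0
            s = -s
    return x


def empirical_tail(threshold, runs=200, n_steps=2_000):
    """Fraction of independent runs whose final error |x_n - 1| exceeds `threshold`."""
    errors = [abs(run_sa(n_steps, seed=r) - 1.0) for r in range(runs)]
    return sum(e > threshold for e in errors) / runs


print(run_sa())
print(empirical_tail(0.05))
```

With this particular step size the iterate is just the running average of $1 + s_k + w_k$. The Markovian component therefore slows concentration through its autocorrelation without biasing the limit.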

2605.20997 2026-05-21 cs.CV cs.AI cs.LG physics.comp-ph Version update

Hybrid Machine Learning Model for Forest Height Estimation from TanDEM-X and Landsat Data

Islam Mansour, Ronny Haensch, Irena Hajnsek, Konstantinos Papathanassiou

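Affiliations: German Aerospace Center (DLR); Institute of Environmental Engineering, ETH Zürich

AI summary: This paper proposes a hybrid approach that combines machine learning with a physical model, using TanDEM-X interferometric coherence and Landsat optical data to improve forest height estimation. Extending the feature space reduces height/structure and baseline/terrain-slope ambiguities, and experiments show reductions of 13.5% in RMSE and 16.6% in MAE.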

English abstract

Integrating machine learning (ML) with physical models (PM) has emerged as a promising way of retrieving geophysical parameters from remote sensing data. In this context, an ML model for estimating forest height from TanDEM-X interferometric coherence measurements that constrains the learning process through a PM has recently been proposed. While the features used for training and inversion were selected to ensure the physical consistency of the solutions, they could not resolve all height/structure and baseline/terrain-slope ambiguities in the data. To improve this, we propose extending the feature space with optical Landsat data, which can provide complementary information on forest type or structure. The extended model is applied to and validated on several TanDEM-X acquisitions over the Lopé National Park site in Gabon and assessed against airborne LiDAR measurements. Results show a 13.5% reduction in RMSE and a 16.6% reduction in MAE compared to the original hybrid model, confirming the added value of multispectral inputs.

2605.20996 2026-05-21 cs.LG math.OC Version update

Beyond the Bellman Recursion: A Pontryagin-Guided Framework for Non-Exponential Discounting

Hojin Ko, Jeonggyu Huh

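Affiliations: Department of Mathematics, Sungkyunkwan University, Suwon, Republic of Korea

AI summary: This paper proposes Pontryagin-Guided Direct Policy Optimization (PG-DPO) for problems with non-exponential discounting. It abandons the Bellman recursion and couples the Pontryagin Maximum Principle with Monte Carlo rollouts, improving accuracy and stability in settings where standard dynamic programming breaks down.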

English abstract

Most value-based and actor-critic reinforcement learning methods rely on Bellman-style recursions, yet these recursions collapse under non-exponential discounting common in human preferences and survival processes. We show the breakdown is structural: exponential discounting sits at a fragile intersection of multiplicativity and time homogeneity, and violating either property breaks standard dynamic programming. To overcome this, we propose Pontryagin-Guided Direct Policy Optimization (PG-DPO), a variational framework that abandons recursion and couples the Pontryagin Maximum Principle with Monte Carlo rollouts via an Adjoint-MC projection enforcing pointwise Hamiltonian maximization. Across multi-dimensional hyperbolic and survival-discount benchmarks, PG-DPO improves accuracy and stability where equation-driven solvers and critic-based baselines diverge.
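The structural claim can be checked numerically. A one-step Bellman recursion relies on the multiplicative property $D(s+t) = D(s)\,D(t)$. Exponential discounting $D(t) = e^{-\rho t}$ satisfies it. Hyperbolic discounting $D(t) = 1/(1+kt)$ does not, and the failure shows up as a preference reversal between a smaller-sooner and a larger-later reward. The rates and reward sizes below are arbitrary. The sketch illustrates the problem PG-DPO targets, not PG-DPO itself.

```python
import math


def exp_discount(t, rho=0.1):
    return math.exp(-rho * t)


def hyperbolic_discount(t, k=1.0):
    return 1.0 / (1.0 + k * t)


def multiplicativity_gap(discount, s, t):
    """D(s + t) - D(s) * D(t); zero is what a one-step Bellman recursion relies on."""
    return discount(s + t) - discount(s) * discount(t)


def prefers_larger_later(discount, delay, small=10.0, large=15.0, wait=1.0):
    """Viewed `delay` steps ahead: is `large` at delay + wait worth more than `small` at delay?"""
    return large * discount(delay + wait) > small * discount(delay)


for name, d in [("exponential", exp_discount), ("hyperbolic", hyperbolic_discount)]:
    print(name, round(abs(multiplicativity_gap(d, 1.0, 1.0)), 4),
          prefers_larger_later(d, 0.0), prefers_larger_later(d, 10.0))
# → exponential 0.0 True True
# → hyperbolic 0.0833 False True
```

The hyperbolic agent prefers the larger reward when the choice is far away, but switches to the smaller one when the choice is immediate. This time inconsistency is why a single stationary value recursion no longer describes its optimal policy.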

2605.20989 2026-05-21 cs.LG q-bio.GN Version update

Modeling Temporal scRNA-seq Data with Latent Gaussian Process and Optimal Transport

Mehmet Yigit Balik, Harri Lähdesmäki

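Affiliations: Department of Computer Science, Aalto University, Espoo, Finland

AI summary: This paper proposes a generative framework that models population trends with a latent heteroscedastic Gaussian process and aligns generated and observed population distributions through optimal transport. It captures biological heterogeneity and achieves state-of-the-art performance on complex interpolation and extrapolation benchmarks.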

English abstract

Single-cell RNA sequencing provides insights into gene expression at single-cell resolution, yet inferring temporal processes from these static snapshot measurements remains a fundamental challenge. Current approaches utilizing neural differential equations and flows are prone to overfitting and lack careful consideration of biological variability. In this work, we propose a generative framework that models population trends using a latent heteroscedastic Gaussian process (GP) approximated by Hilbert space methods. To address the absence of genuine cell trajectories, we leverage an optimal transport (OT) objective that aligns generated and observed population distributions. Our method explicitly captures biological heterogeneity by incorporating cell-specific latent time and cell type conditioning to disentangle temporal asynchrony and trajectories to different cell types. We demonstrate state-of-the-art performance on complex interpolation and extrapolation benchmarks and introduce a novel gradient-based strategy for inferring perturbation trajectories.
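An OT objective compares populations rather than paired cells, which is why it works without genuine cell trajectories. In one dimension with equally sized samples, the optimal coupling simply matches sorted values, so the idea fits in a few lines. Real scRNA-seq data is high-dimensional and the abstract does not specify the paper's exact OT formulation, so treat this only as an illustration of the principle.

```python
def wasserstein_1d(generated, observed, p=2):
    """p-Wasserstein distance between two equal-size 1-D empirical distributions.

    In 1-D the optimal transport plan pairs the i-th smallest generated value with
    the i-th smallest observed value, so no cell-to-cell correspondence is needed.
    """
    if len(generated) != len(observed) or not generated:
        raise ValueError("expects two non-empty samples of equal size")
    a, b = sorted(generated), sorted(observed)
    return (sum(abs(x - y) ** p for x, y in zip(a, b)) / len(a)) ** (1.0 / p)


# The order of cells is irrelevant: only the population distributions are compared.
print(wasserstein_1d([0.0, 1.0, 2.0], [2.0, 0.0, 1.0]))  # → 0.0
print(wasserstein_1d([0.0, 1.0], [1.0, 2.0]))            # → 1.0
```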

2605.20982 2026-05-21 cs.DC cs.AI cs.LG Version update

Diagnosing Overhead in Dispatch Operations: Cross-architecture Observatory

Bole Ma, Jan Eitzinger, Harald Koestler, Gerhard Wellein

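AI summary: This study tests the workload assumptions behind four families of AlltoAll mitigations. Changing the expert-parallelism (EP) degree alters the per-expert max/mean token ratio by at most 5%, and mock-token benchmarks overestimate routing Gini and fabricate a batch-size scaling trend. Within the same measurement matrix, the five architectures split into two stable bands. These bands, rather than the EP degree or the mock-data profile, are the right workload input for AlltoAll-aware interconnect and dispatch design.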

English abstract

AlltoAll dispatch is the dominant bottleneck of MoE expert parallelism, and the interconnect community has responded with four families of mitigations: predictive sample placement, adaptive expert relayout, hierarchical collectives, and EP-aware topology. All four rest on two assumptions about the workload. The first is that routing imbalance is correctable by the system layer. The second is that the mock-token benchmarks evaluating them faithfully represent production routing. We introduce DODOCO to test both assumptions. We instrument five MoE checkpoints spanning five sequence-mixer designs (DeepSeek-V2-Lite MLA, DeepSeek-MoE-16B MHA, Qwen3-30B GQA, Nemotron-30B Mamba-2, Qwen3.5-35B GDN) under a 5 by 6 grid of data conditions plus a matched EP scan from 4 to 32 ranks on H100s; both assumptions fail. Scaling EP changes the per-expert max/mean token ratio by at most 5% within every architecture's measurable range: the straggler is intrinsic to the routing decision the model makes, not to how its experts land on ranks. Mock tokens overestimate routing Gini by up to a factor of 2.35 and fabricate a batch-size scaling trend that vanishes the moment real text replaces random IDs. A third, unexpected pattern emerges from the same matrix: the five architectures split into two stable bands. MHA and Mamba-2 (data-resilient) drop to Gini 0.105 and 0.150 on wikitext. MLA and GDN (persistently concentrated) stay above 0.24 on every real-text condition and reach 0.29 to 0.38 on mock. GQA is the intermediate case. These bands, not the EP degree or the mock-data profile, are the right workload input to AlltoAll-aware interconnect and dispatch design.
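Both workload statistics in the abstract can be computed directly from routing decisions. The sketch below counts tokens per expert from top-$k$ assignments. It then reports the max/mean ratio, which measures the straggler, and the Gini coefficient of the load vector. The abstract does not give DODOCO's exact formulas, so the standard Gini definition on sorted loads is an assumption here, and the function names are illustrative.

```python
from collections import Counter


def expert_loads(assignments, n_experts):
    """Tokens routed to each expert; `assignments` holds one list of expert IDs per token."""
    counts = Counter(e for experts in assignments for e in experts)
    return [counts.get(e, 0) for e in range(n_experts)]


def max_mean_ratio(loads):
    """Straggler measure: busiest expert's load relative to the mean load."""
    return max(loads) / (sum(loads) / len(loads))


def gini(loads):
    """Standard Gini coefficient: 0 when balanced, (n - 1) / n when one expert takes all."""
    xs = sorted(loads)
    n, total = len(xs), sum(xs)
    if total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * weighted / (n * total) - (n + 1) / n


balanced = expert_loads([[0, 1], [2, 3]], n_experts=4)   # [1, 1, 1, 1]
skewed = expert_loads([[3], [3], [3], [3]], n_experts=4)  # [0, 0, 0, 4]
print(gini(balanced), max_mean_ratio(balanced))  # → 0.0 1.0
print(gini(skewed), max_mean_ratio(skewed))      # → 0.75 4.0
```

The two statistics answer different questions. Gini summarizes how concentrated the whole load distribution is. The max/mean ratio tracks only the busiest expert, which is what stalls an AlltoAll step.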

2605.20978 2026-05-21 cs.LG Version update

Point Cloud Sequence Encoding for Material-conditioned Graph Network Simulators

Philipp Dahlinger, Balázs Gyenes, Niklas Freymuth, Luca Geminiani, Tobias Würth, Johannes Mitsch, Nadja Klein, Luise Kärger, Gerhard Neumann

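Affiliations: Autonomous Learning Robots; Methods for Big Data; Institute of Vehicle System Technology

AI summary: This paper proposes PEACH, a framework that uses point cloud sequence encoding to adapt a learned simulator to unseen physical properties. It improves the accuracy of zero-shot sim-to-real transfer while being more practical for real-world deployment.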

Comments 9 pages + appendix, 7 figures. Submitted to the 40th Conference on Neural Information Processing Systems (NeurIPS 2026)

English abstract

Graph Network Simulators (GNSs) have emerged as powerful surrogates for complex physics-based simulation, offering inherent differentiability and orders-of-magnitude speedups over traditional solvers. However, GNSs typically assume access to the underlying material parameters, such as stiffness or viscosity, severely limiting their utility in realistic experimental settings. While recent meta-learning approaches address the parameter dependency by inferring properties from mesh trajectories, reconstructing a mesh from an observed scene is challenging. In this work, we introduce Point Cloud Encoding for Accurate Context Handling (PEACH), a novel framework that applies in-context learning on point clouds to adapt a learned simulator to unseen physical properties during inference. Our approach relies on a novel spatio-temporal point cloud sequence encoder, as well as two forms of auxiliary supervision to help improve simulation fidelity. We demonstrate that PEACH is capable of accurate zero-shot sim-to-real transfer on a challenging, dynamic scene. Experiments on simulation scenes show that PEACH even outperforms mesh-based baselines on prediction accuracy, while being much more practical for real-world deployment.

2605.20956 2026-05-21 cs.LG cs.CY Version update

A Deployment Audit of Release-Side Risk in Conformal Triage under Prevalence Shift

Chengze Li, Xiao Liu, Hanrong Zhang, Haiyang Peng, Yanghao Ruan, Huanhuan Ma, Chunyu Miao, Qichao Zhou, Xiangrong Qi, Philip Yu

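Affiliations: University of Illinois Chicago; Manteia Technologies Co., Ltd; University of Illinois Urbana-Champaign; University of California Los Angeles

AI summary: This paper proposes a leakage-aware deployment audit for release-side conformal triage. Under prevalence shift, it checks whether patients who truly experience the target event are released without review. It assigns target subjects to three non-overlapping roles so that release safety can be evaluated directly.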

Comments 18 pages, 4 figures, 5 tables

English abstract

Conformal triage converts predictive scores into deployment actions that either release a case, flag it for urgent attention, or defer it to human review. Under prevalence shift, however, the usual summaries of marginal coverage and human-review rate can miss the safety-critical question of whether patients who truly experience the target event are released without review. To address this gap, we introduce a leakage-aware deployment audit for release-side conformal triage. It first assigns target subjects to three non-overlapping roles: prevalence correction, conformal calibration, and held-out release-safety evaluation. This separation then lets the audit evaluate release directly: how many event-positive patients are cleared without review, whether the pilot has enough event labels for calibration, and how the safety-review trade-off shifts. Applying this audit to a retrospective NSCLC pilot shows why lower review can be misleading: after prevalence correction, the pooled conformal branch lowers review by releasing more patients, some of whom are event-positive. Within the audit, the classwise branch acts as a scarcity diagnostic: the pilot has too few event labels to certify safe low-review release.
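The three-role split and the scarcity diagnostic can be sketched in a few lines. First, subjects are partitioned into disjoint correction, calibration and evaluation sets. Next, a class-conditional split-conformal threshold is computed from event-positive calibration scores: under exchangeability, a new event-positive case falls below it with probability at most $\alpha$. Finally, the held-out audit counts how many event-positive patients would be released. The function names, the rule "release if the risk score is below the threshold", and the omission of the prevalence-correction step are simplifying assumptions, not the paper's procedure.

```python
import math
import random


def split_roles(subject_ids, fractions=(0.3, 0.3), seed=0):
    """Disjoint roles: prevalence correction, conformal calibration, held-out evaluation."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)
    a = int(fractions[0] * len(ids))
    b = a + int(fractions[1] * len(ids))
    return ids[:a], ids[a:b], ids[b:]


def release_threshold(positive_cal_scores, alpha):
    """Release a case only if its risk score is below the returned value.

    With n event-positive calibration scores, the r-th smallest score with
    r = floor(alpha * (n + 1)) bounds P(new positive is released) by r / (n + 1) <= alpha.
    Returns None when r < 1, i.e. too few event labels to certify any release.
    """
    n = len(positive_cal_scores)
    r = math.floor(alpha * (n + 1))
    if r < 1:
        return None
    return sorted(positive_cal_scores)[r - 1]


def audit_release(scores, events, threshold):
    """Held-out audit: how many cases are released, and how many of them are event-positive."""
    if threshold is None:
        return {"released": 0, "event_positive_released": 0}
    released = [e for s, e in zip(scores, events) if s < threshold]
    return {"released": len(released), "event_positive_released": sum(released)}


cal_scores = [i / 10 for i in range(1, 20)]  # 19 event-positive calibration scores
t = release_threshold(cal_scores, alpha=0.1)
print(t, audit_release([0.1, 0.15, 0.3, 0.05], [1, 0, 0, 0], t))
# → 0.2 {'released': 3, 'event_positive_released': 1}
print(release_threshold(cal_scores[:5], alpha=0.1))  # → None (too few event labels)
```

The `None` branch plays the role of the scarcity diagnostic. With five event labels and $\alpha = 0.1$, no finite-sample guarantee is possible, so an honest audit cannot certify any release at that level.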

2605.20936 2026-05-21 cs.LG cs.AI cs.CL Version update

DASH: Fast Differentiable Architecture Search for Hybrid Attention in Minutes on a Single GPU

Weizhe Chen, Miao Zhang, Junpeng Jiang, Yaping Li, Weili Guan, Liqiang Nie

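Affiliations: Harbin Institute of Technology (Shenzhen)

AI summary: This work proposes DASH, a fast differentiable architecture search framework for hybrid attention design. It relaxes discrete layer-wise attention operator placement into continuous architecture logits, prepares reusable teacher-aligned linear candidates, and performs architecture-only search with model and operator weights frozen, which greatly improves search efficiency. On Qwen2.5-3B-Instruct, DASH outperforms existing selector-style hybrid attention design baselines. It also achieves stronger RULER performance than the released Jet-Nemotron models while remaining competitive on overlapping short-context and general benchmarks. Each search run uses only 12.3M tokens and takes about 20 minutes on a single RTX Pro 6000 GPU, 0.006% of the PostNAS search tokens reported by Jet-Nemotron. This suggests that minutes-level differentiable search is a promising direction for hybrid architecture design.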

Comments 19 pages, 7 figures

English abstract

Hybrid attention architectures are becoming an increasingly important paradigm for improving LLM inference efficiency while preserving model quality, making hybrid architecture design a central problem. Existing designs often rely on manual empirical rules or proxy-based selector signals for layer-wise operator allocation. Recent NAS-style systems such as Jet-Nemotron demonstrate the promise of automated hybrid architecture search. However, Jet-Nemotron's PostNAS search stages alone use 200B tokens, making such search pipelines difficult to use as routine methods for hybrid architecture design. We introduce DASH, a fast differentiable search framework for hybrid attention architecture design, which relaxes discrete layer-wise attention operator placement into continuous architecture logits, prepares reusable teacher-aligned linear candidates, and performs architecture-only search with model and operator weights frozen to significantly enhance search efficiency. On Qwen2.5-3B-Instruct, DASH consistently outperforms a comprehensive suite of existing selector-style hybrid attention design baselines, showing that direct differentiable search can discover stronger hybrid architectures. Moreover, DASH achieves stronger RULER performance than released Jet-Nemotron models while remaining competitive on overlapping short-context and general benchmarks. Notably, each DASH search run uses only 12.3M tokens and takes about 20 minutes on a single RTX Pro 6000 GPU, corresponding to merely 0.006% of the PostNAS search tokens reported by Jet-Nemotron. These results suggest that high-quality hybrid attention architectures can be obtained through minutes-level differentiable search, providing a promising direction for hybrid architecture design.
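The relaxation described above follows the usual differentiable-NAS pattern. Each layer mixes the outputs of its candidate operators with softmax weights over learnable architecture logits, and the final architecture keeps the arg-max operator per layer. The toy sketch below shows only that forward pass and the discretization, on plain lists. Several simplifications apply: the candidate operators and their names are hypothetical stand-ins, and there is no training loop (in DASH, only the logits would be optimized, with operator weights frozen).

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]


def mixed_layer(x, candidate_ops, logits):
    """Continuous relaxation: softmax(logits)-weighted sum of every candidate's output."""
    weights = softmax(logits)
    outputs = [op(x) for op in candidate_ops]
    return [sum(w * out[i] for w, out in zip(weights, outputs)) for i in range(len(x))]


def discretize(layer_logits, op_names):
    """Keep the highest-logit operator in each layer."""
    return [op_names[max(range(len(row)), key=row.__getitem__)] for row in layer_logits]


# Hypothetical candidates: an identity "full attention" stand-in and a zeroing "linear" stand-in.
ops = [lambda v: list(v), lambda v: [0.0] * len(v)]
print(mixed_layer([1.0, 2.0], ops, logits=[0.0, 0.0]))  # → [0.5, 1.0]
print(discretize([[2.0, 0.1], [0.0, 1.5]], ["full_attention", "linear_attention"]))
# → ['full_attention', 'linear_attention']
```

Because the mixture is differentiable in the logits, gradient descent on the logits alone can rank operators per layer. Freezing the operator weights keeps the cost of a search run low.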

2605.20922 2026-05-21 cs.LG cs.AI cs.CV Version update

Winfree Oscillatory Neural Network

Jiawen Dai, Yue Song

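Affiliations: Shanghai Jiao Tong University; Shanghai Qi Zhi Institute; College of AI, Tsinghua University

AI summary: This paper proposes WONN, an oscillatory neural network based on generalized Winfree dynamics. It evolves representations on the torus $(S^1)^d$ through structured oscillatory interactions, combining phase-based inductive biases with flexible hierarchical interaction mechanisms. It achieves competitive performance with strong parameter efficiency on image recognition and complex reasoning tasks.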

Comments Project page: https://jiawen-dai.github.io/WONN_Project_Page/

English abstract

Oscillations and synchronization are widely believed to play a fundamental role in representation and computation. However, existing machine learning approaches based on synchronization dynamics have largely been confined to specialized settings such as object discovery, with limited evidence of scalability to standard vision benchmarks or logic reasoning tasks. We propose the Winfree Oscillatory Neural Network (WONN), a dynamical neural architecture based on generalized Winfree dynamics. WONN evolves representations on the torus $(S^1)^d$ through structured oscillatory interactions, combining phase-based inductive biases with flexible and hierarchical interaction mechanisms instantiated as either fixed trigonometric mappings or learnable neural networks. We evaluate WONN on image recognition and complex reasoning tasks, including CIFAR, ImageNet, Maze-hard, and Sudoku. Across these domains, WONN achieves competitive or superior performance with strong parameter efficiency. In particular, WONN is, to our knowledge, the first synchronization-based oscillatory architecture to scale competitively to ImageNet-1K. Furthermore, on Maze-hard, WONN achieves 80.1% accuracy using only 1% of the parameters of prior state-of-the-art models. These results suggest that structured oscillatory dynamics provide a scalable and parameter-efficient alternative to conventional neural architectures.
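For intuition, the classical Winfree model updates each phase as $\dot\theta_i = \omega_i + \kappa\, R(\theta_i)\, \frac{1}{N}\sum_j P(\theta_j)$. Here $P$ is the pulse an oscillator emits and $R$ is its phase sensitivity. The sketch below takes one explicit Euler step on the circle with the common textbook choices $P(\theta) = 1 + \cos\theta$ and $R(\theta) = -\sin\theta$, and measures synchrony with the Kuramoto order parameter. WONN generalizes these interactions, including learnable ones, so none of the choices here should be read as the paper's architecture.

```python
import math

TWO_PI = 2.0 * math.pi


def pulse(theta):
    """P(theta): influence an oscillator broadcasts (classic Winfree choice)."""
    return 1.0 + math.cos(theta)


def sensitivity(theta):
    """R(theta): how strongly an oscillator responds to the mean field."""
    return -math.sin(theta)


def winfree_step(phases, freqs, coupling, dt):
    """One explicit Euler step of the Winfree model, wrapped back onto the circle."""
    mean_field = sum(pulse(t) for t in phases) / len(phases)
    return [(t + dt * (w + coupling * sensitivity(t) * mean_field)) % TWO_PI
            for t, w in zip(phases, freqs)]


def order_parameter(phases):
    """Kuramoto order parameter: 1 for perfect synchrony, near 0 for spread-out phases."""
    c = sum(math.cos(t) for t in phases) / len(phases)
    s = sum(math.sin(t) for t in phases) / len(phases)
    return math.hypot(c, s)


phases = [0.0, 2.0, 4.0]
for _ in range(100):
    phases = winfree_step(phases, freqs=[1.0, 1.1, 0.9], coupling=0.5, dt=0.05)
print(order_parameter(phases))  # a value in [0, 1]
```

Each coordinate of a $d$-dimensional phase vector lives on one circle $S^1$, which is how a collection of such oscillators spans the torus $(S^1)^d$ mentioned above.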

2605.20915 2026-05-21 cs.CL cs.AI cs.LG Version update

Calibration vs Decision Making: Revisiting the Reliability Paradox in Unlearned Language Models

Divyaksh Shukla, Ashutosh Modi

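Affiliations: Indian Institute of Technology Kanpur (IIT Kanpur)

AI summary: This paper studies the gap between calibration and decision reliability in generative language models using the multiple-choice question-answering evaluation protocol on the TOFU benchmark. Fine-tuned models have low calibration error, and after unlearning they keep low calibration error while relying more on correlation-based decision rules. This extends the reliability paradox to machine unlearning.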

Comments Accepted at SRW, ACL 2026; 17 pages (9 + 2 + 6)

English abstract

Machine unlearning aims to remove the influence of specific training data from a model while preserving reliable behavior on the remaining data, making reliable prediction and uncertainty estimation essential for evaluation. Calibration is commonly used as a proxy for reliability in language models, but low calibration error does not necessarily imply reliable decision rules, as models may rely on spurious correlations while remaining well calibrated. We investigate this gap in generative language models using the multiple-choice question-answering evaluation protocol on the TOFU benchmark, measuring probabilistic reliability with calibration metrics (ECE, MCE, Brier) and decision-rule reliability via attribution-based shortcut detection with Integrated Gradients and Local Mutual Information. We find that fine-tuned models achieve low calibration error (ECE ~ 0.04) compared to pretrained models (ECE > 0.5), and models after unlearning retain similarly low calibration despite reduced accuracy on the forget split, while attribution analysis shows increased reliance on correlation-based tokens. These results demonstrate that good calibration can coexist with shortcut-based decision rules after unlearning, extending the reliability paradox to the machine unlearning setting.
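The three calibration metrics named above can be computed from the model's top-choice confidence and whether that choice was correct. This sketch uses equal-width confidence bins for ECE and MCE, and the binary Brier score on top-label confidence. The bin count and the binary Brier variant are assumptions; a multi-class Brier score would sum over all answer options.

```python
def calibration_metrics(confidences, correct, n_bins=10):
    """ECE, MCE and (binary, top-label) Brier score from per-question confidences."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        bins[min(int(c * n_bins), n_bins - 1)].append((c, y))  # equal-width bins on [0, 1]
    ece = mce = 0.0
    for b in bins:
        if not b:
            continue
        accuracy = sum(y for _, y in b) / len(b)
        confidence = sum(c for c, _ in b) / len(b)
        gap = abs(accuracy - confidence)
        ece += len(b) / n * gap  # gap weighted by the bin's share of questions
        mce = max(mce, gap)      # worst bin
    brier = sum((c - y) ** 2 for c, y in zip(confidences, correct)) / n
    return ece, mce, brier


ece, mce, brier = calibration_metrics([0.95, 0.95, 0.65, 0.65], [1, 1, 1, 0])
print(round(ece, 4), round(mce, 4), round(brier, 4))  # → 0.1 0.15 0.1375
```

None of these numbers looks at which input tokens drove the answer, which is why low ECE can coexist with shortcut-based decision rules.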

2605.20911 2026-05-21 cs.AI cs.LG Version update

For How Long Should We Be Punching? Learning Action Duration in Fighting Games

Hoang Hai Nguyen, Kurt Driessens, Dennis J. N. J. Soemers

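Affiliations: Department of Advanced Computing Sciences, Maastricht University

AI summary: This paper studies how learning action duration can improve the decision-making of reinforcement learning agents in fighting games. It examines how dynamically adjusting reaction timing affects performance and behavior patterns.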

Comments Accepted at Computers and Games 2026

English abstract

Fighting games such as Street Fighter II present unique challenges to reinforcement learning (RL) agents due to their fast-paced, real-time nature. In most RL frameworks, agents are hard-coded to make decisions at a fixed interval, typically every frame or every N frames. Although this design ensures timely responses, it restricts the agent's ability to adjust its reaction timing. Acting every frame grants frame-perfect reflexes, which are unrealistic compared to human players, whereas longer fixed intervals reduce computational cost but hinder responsiveness. We consider an alternative decision-making framework in which the agent learns not only what action to take but also for how long to execute it. By jointly predicting both action and duration, the agent can dynamically adapt its responsiveness to different situations in the game. We implement this method using the open-source FightLadder environment with agents trained against scripted built-in bots, systematically testing different frame skip configurations to analyze their influence on performance, responsiveness, and learned behavior. Experiments show that learned timing can match the performance of well-chosen fixed frame skips and encourages repeatable action patterns, but does not ensure robustness on its own. In most cases, we see agents performing best with consistently high frame skip values (i.e., low responsiveness). This strategy makes it easier to learn exploitative strategies where the same action is repeated over and over, which the scripted bots appear to be susceptible to.
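The decision loop changes in a small but important way. The policy returns an action together with how many frames to hold it, and a new decision is made only when that duration expires. A fixed frame skip is the special case where the duration is constant. In the sketch below, `policy` and `env_step` are hypothetical callables standing in for an agent and an environment step; this is not the FightLadder API.

```python
def rollout(policy, env_step, state, total_frames):
    """Run an episode where each decision chooses an action and how many frames to hold it."""
    frame, decisions, total_reward = 0, 0, 0.0
    while frame < total_frames:
        action, duration = policy(state)
        duration = max(1, min(duration, total_frames - frame))  # at least 1 frame, never past the end
        for _ in range(duration):
            state, reward = env_step(state, action)
            total_reward += reward
        frame += duration
        decisions += 1
    return total_reward, decisions


# Toy stand-ins: the state is a frame counter and every "punch" frame earns 1.
def toy_env_step(state, action):
    return state + 1, (1.0 if action == "punch" else 0.0)


print(rollout(lambda s: ("punch", 4), toy_env_step, state=0, total_frames=10))  # → (10.0, 3)
print(rollout(lambda s: ("punch", 1), toy_env_step, state=0, total_frames=10))  # → (10.0, 10)
```

Long durations mean fewer decisions, and therefore lower responsiveness, for the same number of frames. This is also the regime in which repeating one exploitative action becomes easy to learn.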

2605.20885 2026-05-21 cs.LG q-bio.QM Version update

Training distribution determines the ceiling of drug-blind cancer sensitivity prediction

Taekyung Heo

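Affiliations: Taekyung Heo

AI summary: This paper studies how the training distribution affects drug-blind cancer sensitivity prediction. It finds that the standard metric is biased by between-drug potency differences, and that mechanism-stratified training and response matching recover predictive gains.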

English abstract

Precision oncology requires predicting which drugs will suppress a specific tumor from its molecular profile, but drug-blind sensitivity prediction has plateaued despite increasingly complex drug representations. Here we show that this stagnation reflects a metric artifact rather than a representational bottleneck. The standard benchmark, global Pearson r, is dominated by between-drug potency differences that a trivial drug-mean predictor captures without any cell-specific learning. Per-drug Pearson r, which isolates within-drug cell ranking, reveals that no drug encoding improves over cell-only features across four independent datasets. A controlled experiment channeling mechanism-of-action identity as either a drug feature or a training-distribution constraint identifies the cause. Supplying MoA as a feature yields negligible benefit, whereas using it to stratify training raises per-drug r substantially for targeted kinase inhibitors, because pan-cancer co-training suppresses pathway-specific sensitivity signals. Mechanism-stratified training and response matching from pilot observations provide two deployable strategies that together recover the principal sources of predictive gain in drug-blind sensitivity prediction.
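The metric artifact is easy to reproduce on toy numbers. In the example below, predictions get each drug's mean potency right but rank cells within each drug in exactly the wrong order. Global Pearson r looks excellent because between-drug differences dominate it, while per-drug r is -1 for every drug. The data are invented for illustration.

```python
import math
from collections import defaultdict


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def global_and_per_drug_r(drugs, y_true, y_pred):
    """Global r pools every (drug, cell) pair; per-drug r ranks cells within each drug."""
    global_r = pearson(y_true, y_pred)
    groups = defaultdict(lambda: ([], []))
    for drug, t, p in zip(drugs, y_true, y_pred):
        groups[drug][0].append(t)
        groups[drug][1].append(p)
    return global_r, {d: pearson(t, p) for d, (t, p) in groups.items()}


# Invented responses for two drugs x three cell lines; predictions know each drug's
# potency but order the cells backwards within every drug.
drugs = ["A"] * 3 + ["B"] * 3
y_true = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2]
y_pred = [1.1, 1.0, 0.9, 5.1, 5.0, 4.9]
g, per_drug = global_and_per_drug_r(drugs, y_true, y_pred)
print(round(g, 3), {d: round(r, 3) for d, r in per_drug.items()})
# → 0.993 {'A': -1.0, 'B': -1.0}
```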

2605.20883 2026-05-21 cs.LG Version update

Learning fMRI activations dictionaries across individual geometries via optimal transport

Sonia Mazelet, Rémi Flamary, Bertrand Thirion

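Affiliations: CMAP, Ecole Polytechnique Palaiseau, France; Mind, Inria-Saclay Palaiseau, France

AI summary: This paper proposes an optimal-transport-based dictionary learning method for fMRI. It handles differences in individual brain geometry with the Fused Gromov-Wasserstein (FGW) distance and uses amortized optimization to reduce computational cost. It also learns dictionary atoms that depend on the FGW trade-off parameter, which balances feature alignment against structural consistency.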

English abstract

Dictionary learning is a powerful tool for creating interpretable representations. When applied to functional magnetic resonance imaging (fMRI) data, the resulting patterns of brain activity can be used for various downstream tasks, such as brain state classification or population-level analysis. However, a major challenge is the variability in brain geometry across individuals. This is usually addressed by projecting each individual brain geometry onto a common template, which removes subject-specific information. In this work, we introduce a novel approach to dictionary learning on fMRI data that explicitly accounts for this variability. We use the optimal transport-based Fused Gromov-Wasserstein (FGW) distance to compare graphs with different geometries and features. To address the challenge of computing multiple FGW distances for large graphs such as those arising from fMRI data, we rely on amortized optimization to learn a neural network that predicts an approximation of the optimal transport plans, which substantially reduces the computational cost. Additionally, we learn dictionary atoms that depend on the FGW trade-off parameter, which controls the balance between feature alignment and structural consistency. Numerical experiments on the HCP dataset demonstrate that the proposed approach captures different levels of geometric variability in the data and provides representations that preserve essential information.
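For a coupling $T$ between two graphs, the FGW objective (with squared structural loss) mixes a feature term and a structure term through the trade-off parameter $\alpha$:

$$(1-\alpha)\sum_{ij} M_{ij}T_{ij} + \alpha\sum_{ijkl} (C^1_{ik} - C^2_{jl})^2\, T_{ij}T_{kl}$$

Here $M$ is the feature cost and $C^1$, $C^2$ are the structure matrices. The FGW distance minimizes this objective over couplings; as described above, the paper's amortized network predicts such plans instead of solving for them. The naive $O(n^2m^2)$ evaluation below is meant only for tiny examples.

```python
def fgw_objective(feature_cost, c1, c2, plan, alpha):
    """(1 - alpha) * <M, T> + alpha * sum_{ijkl} (C1[i][k] - C2[j][l])^2 T[i][j] T[k][l]."""
    n, m = len(c1), len(c2)
    linear = sum(feature_cost[i][j] * plan[i][j] for i in range(n) for j in range(m))
    quadratic = sum(
        (c1[i][k] - c2[j][l]) ** 2 * plan[i][j] * plan[k][l]
        for i in range(n) for j in range(m)
        for k in range(n) for l in range(m)
    )
    return (1 - alpha) * linear + alpha * quadratic


edge = [[0.0, 1.0], [1.0, 0.0]]        # structure matrix of a 2-node graph
long_edge = [[0.0, 2.0], [2.0, 0.0]]   # same graph with a longer edge
identity = [[0.5, 0.0], [0.0, 0.5]]    # uniform weights, node i -> node i
swap = [[0.0, 0.5], [0.5, 0.0]]        # node i -> the other node
features = [[0.0, 1.0], [1.0, 0.0]]    # feature cost: nodes match only themselves

print(fgw_objective(features, edge, edge, identity, alpha=0.5))       # → 0.0
print(fgw_objective(features, edge, edge, swap, alpha=0.5))           # → 0.5
print(fgw_objective(features, edge, long_edge, identity, alpha=1.0))  # → 0.5
```

The swap plan is structurally perfect but pays the feature cost. The stretched graph matches node features exactly but pays the structure cost, so $\alpha$ decides which mismatch matters more.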

2605.20879 2026-05-21 cs.LG Version update

NeighborDiv: Training-free Zero-shot Generalist Graph Anomaly Detection via Neighbor Diversity

Kaifeng Wei, Teng Liu, Liang Dong, Xiubo Liang, Yuke Li

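Affiliations: Netease Yidun AI Lab; School of Software Technology; Zhejiang University

AI summary: This paper proposes NeighborDiv, a training-free generalist graph anomaly detection method based on the principle of neighbor diversity. It avoids the training complexity, data dependency and unstable cross-domain generalization of existing methods, and experiments show the best performance under both standard evaluation paradigms.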

English abstract

Graph Anomaly Detection (GAD) is increasingly shifting to Generalist GAD (GGAD) for cross-domain "one-for-all" detection, but existing GGAD methods predominantly rely on the neighbor consistency principle, falling into the Node-to-Neighbor Consistency Paradigm for anomaly quantification. These methods suffer from complex training pipelines, heavy training data dependency, high computational costs, and unstable cross-domain generalization. To address these limitations, we propose NeighborDiv, a training-free generalist graph anomaly detection framework based on neighbor diversity. Departing from the dominant Node-to-Neighbor Consistency Paradigm, we shift the focus to the Neighbor-to-Neighbor Diversity Paradigm, and uncover that the internal structural dispersion of a node's neighbor set is a powerful, independently discriminative anomaly signal. We quantify neighbor diversity via the variance of inter-neighbor feature similarities, which captures how a node organizes its local graph environment, and operates independently of conventional node-to-neighbor consistency frameworks. Extensive experiments under two standard GGAD evaluation paradigms show NeighborDiv achieves state-of-the-art performance, with relative gains of 10.25% in average AUC and 17.78% in average AP over the second-best baseline under Single-Domain Independent Training (SDIT), and 6.89%/9.58% in AUC/AP under Unified Multi-Domain Training (UMDT), respectively. Notably, NeighborDiv yields zero performance volatility across all datasets, eliminating training-set dependency and establishing a lightweight and highly practical GGAD framework.
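Neighbor diversity, the variance of pairwise similarities among a node's neighbors, needs no training, so it fits in a short function. The sketch assumes cosine similarity on raw feature vectors and the population variance. It ignores how NeighborDiv turns the statistic into a final anomaly score, since the abstract does not give those details. The graph is a plain adjacency dictionary.

```python
import math
from itertools import combinations


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def neighbor_diversity(node, adjacency, features):
    """Variance of cosine similarities over all pairs of `node`'s neighbors (no training)."""
    neighbors = adjacency.get(node, [])
    sims = [cosine(features[a], features[b]) for a, b in combinations(neighbors, 2)]
    if len(sims) < 2:
        return 0.0  # too few neighbor pairs to measure dispersion
    mean = sum(sims) / len(sims)
    return sum((s - mean) ** 2 for s in sims) / len(sims)


features = {1: [1.0, 0.0], 2: [1.0, 0.0], 3: [0.0, 1.0], 4: [1.0, 0.0]}
adjacency = {0: [1, 2, 4], 5: [1, 2, 3]}
print(neighbor_diversity(0, adjacency, features))  # homogeneous neighbors → 0.0
print(neighbor_diversity(5, adjacency, features))  # mixed neighbors → about 0.222
```

The score never compares a node with its neighbors. It only looks at how the neighbors relate to each other, which is the Neighbor-to-Neighbor view described above.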

2605.20878 2026-05-21 cs.LG Version update

CIG: Exploration via Conditional Information Gain

Tim Joseph, Marcus Fechner, Philipp Stegmaier, Karam Daaboul, J. Marius Zöllner

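Affiliations: FZI Karlsruhe; KIT Karlsruhe

AI summary: This work proposes a Conditional Information Gain (CIG) reward for exploration in reinforcement learning. A tractable log-determinant objective over an ensemble-disagreement kernel yields causal per-step rewards, enabling effective exploration in high-dimensional state spaces.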

Comments 28 pages, 10 figures, 3 tables

英文摘要

Intrinsic rewards for exploration in reinforcement learning condition on different contexts: lifelong rewards score each transition against accumulated experience but ignore within-rollout redundancy; episodic rewards penalize intra-trajectory repetition but discard lifetime progress. Hybrid methods combine both signals through heuristic weights or require Gaussian-process dynamics that do not scale beyond low-dimensional state spaces. Trajectory-level information gain decomposes into per-step terms that condition on the replay buffer and rollout prefix simultaneously, but remains intractable for deep models. We derive the Conditional Information Gain (CIG) reward as a tractable surrogate: a log-determinant objective over an ensemble disagreement kernel whose Cholesky factorization yields causal per-step rewards that retain both conditioning sets while scaling to high-dimensional state spaces. We instantiate CIG in a model-based setting, where rollouts are short and within-rollout corrections remain largely unexplored. Across twelve tasks spanning discrete (MiniGrid) and continuous control (OGBench), in both clean and stochastic-distractor settings, CIG outperforms or matches prior exploration methods while remaining robust to stochastic distractors.
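
The Cholesky mechanism can be illustrated with a small sketch. It rests on two assumptions that do not come from the paper: the "ensemble disagreement kernel" is taken to be the covariance of scalar ensemble predictions across states, regularized as I + K/λ, and replay-buffer states are ordered before rollout states. Because a positive-definite matrix A = LLᵀ has log det A = 2 Σ log L_ii, and L_ii² is the conditional variance of step i given all earlier steps, log L_ii acts as a causal per-step reward. It is conditioned on both the buffer and the rollout prefix.

```python
import math

def cholesky(a):
    """Lower-triangular L with a = L L^T for a symmetric positive-definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def disagreement_kernel(preds, lam=1.0):
    """Hypothetical kernel I + K / lam, where K[s][t] is the covariance of ensemble
    members' scalar predictions at states s and t (preds[m][t] = member m, state t)."""
    M, T = len(preds), len(preds[0])
    mean = [sum(p[t] for p in preds) / M for t in range(T)]
    dev = [[p[t] - mean[t] for t in range(T)] for p in preds]
    return [[(1.0 if s == t else 0.0) + sum(d[s] * d[t] for d in dev) / (M * lam)
             for t in range(T)] for s in range(T)]

def cig_rewards(preds, n_buffer, lam=1.0):
    """Per-step rewards log L_ii for rollout steps (indices >= n_buffer).

    Summed over all indices, log L_ii gives 0.5 * log det(I + K / lam)."""
    L = cholesky(disagreement_kernel(preds, lam))
    return [math.log(L[i][i]) for i in range(n_buffer, len(L))]

# 3 ensemble members, 3 states: one buffer state, a novel rollout state, and a
# rollout state whose predictions exactly repeat the buffer state.
preds = [[1.0, 0.0, 1.0], [-1.0, 2.0, -1.0], [0.0, -2.0, 0.0]]
rewards = cig_rewards(preds, n_buffer=1)
```

In this toy case the novel state earns a larger reward than the repeated one. The repeated state's reward is not exactly zero, because the identity regularizer keeps every conditional variance at or above 1.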

2605.20872 2026-05-21 cs.LG cs.AI cs.GR 版本更新

CAdam: Context-Adaptive Moment Estimation for 3D Gaussian Densification in Generative Distillation

CAdam:生成式蒸馏中用于3D高斯致密化的上下文自适应矩估计

SeungJeh Chung, Geonho Park, Misong Kim, HyeongYeop Kang

发表机构 * IIIXR Lab, Kyung Hee University(庆熙大学IIIXR实验室) IIIXR Lab, Korea University(高丽大学IIIXR实验室)

AI总结 本文提出CAdam方法,通过将致密化问题转化为统计信号验证问题,解决生成式蒸馏中致密化的瓶颈,从而在保持视觉质量的同时显著减少高斯点数量。

Comments Accepted to SIGGRAPH 2026 Conference Papers. 12 pages, 8 figures

AI中文摘要

自适应致密化(adaptive densification)是3D高斯泼溅(3DGS)的核心引擎。然而,当将其迁移到基于优化的生成式蒸馏范式时,这种源自重建任务的机制暴露出根本性局限,导致表示效率低下、充斥冗余基元。我们将这一失败诊断为致密化困境,其根源在于生成指导的随机性:标准的基于幅度的累积会不加区分地将瞬态噪声与几何信号一同聚合,难以在过度致密化与欠拟合之间取得平衡。为此,我们提出上下文自适应矩估计(CAdam),一种将致密化重新解释为有统计依据的信号验证问题的新框架。CAdam利用梯度的一阶矩来运用干涉原理:随机波动通过相消干涉相互抵消,而一致的几何漂移通过相长干涉不断累积,从而有效地将底层信号与生成噪声底分离。该方法进一步结合基于分位数的上下文感知和内在信噪比(SNR)门控机制,确保在不同优化阶段之间稳健地自适应,并使致密化能够软终止。在多种目标(SDS、ISM、VFDS)和强大的生成式3DGS骨干上进行的大量实验表明,相比标准致密化,CAdam将高斯点数量减少了85%-97%,同时保持了总体相当的感知质量。这些结果表明,信号感知的密度控制是提升基于优化的生成式蒸馏内存效率的一种实用途径。

英文摘要

Adaptive densification is the engine of 3D Gaussian Splatting (3DGS). However, when transposed to the optimization-based Generative Distillation paradigm, this reconstruction-native mechanism reveals fundamental limitations, resulting in inefficient representations cluttered with redundant primitives. We diagnose this failure as a Densification Dilemma stemming from the stochastic nature of generative guidance: the standard magnitude-based accumulation indiscriminately aggregates transient noise alongside geometric signals, making it difficult to strike a balance between over-densification and under-fitting. To resolve this, we introduce Context-Adaptive Moment Estimation (CAdam), a novel framework that reinterprets densification as a statistically grounded signal verification problem. CAdam leverages the first moment of gradients to exploit the interference principle, where stochastic fluctuations cancel out via destructive interference while consistent geometric drifts accumulate via constructive interference, effectively disentangling the underlying signal from the generative noise floor. This is further augmented by a quantile-based context awareness and an intrinsic Signal-to-Noise Ratio (SNR) gating mechanism, which ensure robust adaptation across optimization stages and enable the soft termination of densification. Extensive experiments across diverse objectives (SDS, ISM, VFDS) and strong generative 3DGS backbones show that CAdam reduces Gaussian count by 85%-97% relative to standard densification while preserving overall comparable perceptual quality. These results highlight signal-aware density control as a practical way to improve memory efficiency in optimization-based generative distillation.
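
The interference argument can be checked with a toy sketch. It compares a magnitude-based accumulator, in the style of standard densification statistics, with an Adam-style first-moment signal-to-noise ratio on two hypothetical Gaussians: one receives pure zero-mean noise, the other a small but consistent drift. The moment constants and the SNR definition |m̂| / √v̂ are assumptions. CAdam's quantile-based context and soft termination are not reproduced.

```python
import math
import random

def magnitude_score(grads):
    """Magnitude-style accumulation: average gradient norm.

    Noise and signal both add positive magnitude, so a noisy primitive scores high."""
    return sum(math.hypot(gx, gy) for gx, gy in grads) / len(grads)

def moment_snr(grads, beta1=0.9, beta2=0.999, eps=1e-12):
    """Adam-style moments of 2-D gradients; returns |m_hat| / sqrt(v_hat).

    Zero-mean fluctuations cancel in the first moment, while a consistent drift
    accumulates. Assumes grads is non-empty."""
    mx = my = v = 0.0
    for gx, gy in grads:
        mx = beta1 * mx + (1 - beta1) * gx
        my = beta1 * my + (1 - beta1) * gy
        v = beta2 * v + (1 - beta2) * (gx * gx + gy * gy)
    t = len(grads)
    c1, c2 = 1 - beta1 ** t, 1 - beta2 ** t  # bias corrections
    return math.hypot(mx / c1, my / c1) / (math.sqrt(v / c2) + eps)

rng = random.Random(0)
# Hypothetical primitive A: zero-mean noise from stochastic generative guidance.
noise = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
# Hypothetical primitive B: a small, consistent geometric drift.
drift = [(0.3 + rng.gauss(0, 0.1), 0.3 + rng.gauss(0, 0.1)) for _ in range(200)]
```

The magnitude score ranks the noisy primitive first. The moment-based SNR reverses that ranking, which illustrates the failure mode and the fix that the abstract describes.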

2605.20868 2026-05-21 cs.LG cs.AI cs.SY eess.SY 版本更新

Runtime-Certified Bounded-Error Quantized Attention

运行时认证的误差有界量化注意力

Dean Calver

发表机构 * Independent Researcher(独立研究者)

AI总结 本文提出了一种分层的KV缓存架构,在GPU内存中存储INT8键和INT4值,同时在系统RAM中保留FP16原始数据,实现了运行时认证的注意力机制;通过误差分解得到每头、每步的误差界,以驱动自适应精度选择和多阶段回退流程,确保在需要时能恢复到精确的密集注意力输出。

Comments 32 pages, 1 figure

AI中文摘要

KV缓存量化降低了长上下文LLM推理的内存成本,但引入的近似误差通常只经过经验验证。现有系统依赖平均情况下的鲁棒性,没有在运行时检测故障或从故障中恢复的机制。本文提出一种分层KV缓存架构,实现运行时认证的注意力:INT8键和INT4值存储在GPU内存中,而FP16原始数据保留在系统RAM中,用于确定性回退。两项误差分解给出了每头、每步的误差界:(i)键量化引起的注意力分布失真,以及(ii)值重建误差。这些误差界在线计算,用于驱动自适应精度选择和多级回退阶梯,保证在需要时恢复到精确的密集注意力输出。在PG-19、NIAH和RULER基准上,对上下文长度达128K的LLaMA 3.1-8B的测试表明,系统在语言建模和检索任务上与密集FP16 KV的质量在噪声范围内一致,同时修复了朴素INT8/INT4基线中出现的灾难性故障。短上下文下对值敏感的任务暴露出压缩与保真度之间的可控权衡,可通过收紧值容差或采用FP16值回退来消除。该认证是局部的(每头、每步),不保证端到端的模型正确性,但确保每次注意力计算要么相对于FP16参考有界,要么通过回退精确恢复。这将KV缓存量化重新定义为运行时验证的计算,而非固定的近似。其目标不是单纯的加速,而是让激进的KV压缩能够在严格的质量约束下安全部署。

英文摘要

KV cache quantization reduces the memory cost of long-context LLM inference, but introduces approximation error that is typically validated only empirically. Existing systems rely on average-case robustness, with no mechanism to detect or recover from failures at runtime. We present a tiered KV cache architecture that enables runtime-certified attention: INT8 keys and INT4 values are stored in GPU memory, while FP16 originals are retained in system RAM for deterministic fallback. A two-term error decomposition yields per-head, per-step bounds on (i) attention distribution distortion from key quantization and (ii) value reconstruction error. These bounds are computed online and used to drive adaptive precision selection and a multi-stage fallback ladder, which guarantees recovery to the exact dense attention output when required. Across PG-19, NIAH, and RULER benchmarks on LLaMA 3.1-8B with contexts up to 128K, the system matches dense FP16 KV quality within noise for language modelling and retrieval tasks, while recovering catastrophic failures observed in naive INT8/INT4 baselines. Value-sensitive tasks at short context expose a controlled trade-off between compression and fidelity, which can be eliminated via tighter value tolerances or FP16-value fallback. The certification is local (per-head, per-step) and does not guarantee end-to-end model correctness, but ensures that each attention computation is either bounded relative to an FP16 reference or exactly recovered via fallback. This reframes KV cache quantization as a runtime-verified computation rather than a fixed approximation. The goal is not raw speedups, but enabling safe deployment of aggressive KV compression under strict quality constraints.
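
The two-term idea can be sketched for a single head and a single query. This is an illustrative bound derived here, not the paper's exact decomposition. Symmetric per-tensor INT8/INT4 quantization moves each key element by at most s_k/2, so every logit moves by at most δ = ‖q‖₁·(s_k/2)/√d. That bounds the softmax L1 distortion by e^{2δ} − 1 (term i). Each dequantized value element is off by at most s_v/2 (term ii). Python floats stand in for the FP16 originals, and the function names are hypothetical.

```python
import math
import random

def quantize(rows, bits):
    """Symmetric per-tensor quantization; returns (integer codes, scale)."""
    qmax = 2 ** (bits - 1) - 1  # 127 for INT8, 7 for INT4
    amax = max(abs(x) for r in rows for x in r) or 1.0
    scale = amax / qmax
    return [[max(-qmax, min(qmax, round(x / scale))) for x in r] for r in rows], scale

def dequantize(codes, scale):
    return [[c * scale for c in r] for r in codes]

def attend(q, K, V):
    """Softmax attention output for one query."""
    d = len(q)
    logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
    mx = max(logits)
    w = [math.exp(l - mx) for l in logits]
    z = sum(w)
    p = [x / z for x in w]
    return [sum(p[i] * V[i][j] for i in range(len(V))) for j in range(len(V[0]))]

def certified_attention(q, K, V, tol):
    """Quantized attention if the per-element output error bound is <= tol, else fallback."""
    qk, sk = quantize(K, 8)
    qv, sv = quantize(V, 4)
    # (i) key term: logits move by <= delta, so ||p - p_hat||_1 <= exp(2*delta) - 1.
    delta = sum(abs(x) for x in q) * (sk / 2) / math.sqrt(len(q))
    attn_l1 = math.exp(2 * delta) - 1
    # (ii) value term: each reconstructed value element is off by <= sv / 2.
    vmax = max(abs(x) for r in V for x in r)
    bound = attn_l1 * vmax + sv / 2
    if bound <= tol:
        return attend(q, dequantize(qk, sk), dequantize(qv, sv)), bound, "quantized"
    return attend(q, K, V), 0.0, "fallback"  # exact recompute from full-precision copy

rng = random.Random(1)
q = [rng.uniform(-1, 1) for _ in range(4)]
K = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(6)]
V = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(6)]
out, bound, mode = certified_attention(q, K, V, tol=1.0)
```

The bound follows from o − ô = Σ(p − p̂)v + Σp̂(v − v̂). A tighter tolerance routes more heads to the fallback path, which mirrors the compression/fidelity trade-off the abstract describes.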

2605.20866 2026-05-21 cs.LG cs.DC math.OC stat.ML 版本更新

LOSCAR-SGD: Local SGD with Communication-Computation Overlap and Delay-Corrected Sparse Model Averaging

LOSCAR-SGD:局部SGD与通信-计算重叠及延迟校正的稀疏模型平均

Yassine Maziane, Ammar Mahran, Artavazd Maranjyan, Peter Richtárik

发表机构 * KAUST(阿卜杜拉国王科技大学)

AI总结 本文研究了在异构计算环境下结合通信压缩、局部训练和通信-计算重叠的局部SGD方法,提出LOSCAR-SGD通过仅通信稀疏模型坐标并持续优化来提高分布式学习效率,首次给出了这种组合方法的理论保证。

AI中文摘要

在分布式学习中,通信是主要的瓶颈,尤其是在大规模设置和链路缓慢的联邦学习环境中。减少此成本的三种标准方法是通信压缩、局部训练和通信-计算重叠。结合这些要素的方法已在实践中使用,并被证明对大规模训练有效,但同时结合这三者的方法却鲜有理论支撑。我们研究了一个异构计算环境,其中不同的工作者可以执行不同数量的局部步骤,并提出LOSCAR-SGD,一种局部SGD方法,它仅通信模型坐标的稀疏子集,并在通信进行期间继续优化。关键要素是延迟校正的合并规则,它在整合延迟到达的同步信息时,不丢弃重叠阶段所取得的进展。我们为光滑非凸目标给出了收敛保证,并展示了稀疏性、重叠和工作者异构性如何影响收敛速度。据我们所知,这是首个针对这一要素组合的理论。实验进一步表明,通信-计算重叠减少了训练时间,且延迟校正的合并优于朴素覆盖。

英文摘要

Communication is a major bottleneck in distributed learning, especially in large-scale settings and in federated learning environments with slow links. Three standard ways to reduce this cost are communication compression, local training, and communication-computation overlap. Methods that combine these ingredients are used in practice and have been found to be effective for large-scale training, but there is little theory for methods that combine all three. We study a heterogeneous-compute setting in which different workers may take different numbers of local steps, and we propose LOSCAR-SGD, a Local SGD method that communicates only a sparse subset of model coordinates and continues optimizing while communication is in flight. A key ingredient is a delay-corrected merge rule that incorporates delayed synchronized information without discarding the progress made during the overlap phase. We give convergence guarantees for smooth non-convex objectives and show how sparsity, overlap, and worker heterogeneity affect the rate. To the best of our knowledge, this is the first theory for this combination of ingredients. Experiments further show that communication-computation overlap reduces training time and that the delay-corrected merge outperforms naive overwriting.
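
One natural way to read the "delay-corrected merge" is sketched below. The synchronized sparse average arrives late, so instead of overwriting the worker's current coordinates, the sketch adds back the local progress made since the snapshot was sent. This exact rule is an assumption for illustration, and LOSCAR-SGD's actual rule and weighting may differ. The function names are hypothetical.

```python
def sparse_average(snapshots, coords):
    """Average the sent snapshots over the communicated coordinates only."""
    return {c: sum(s[c] for s in snapshots) / len(snapshots) for c in coords}

def merge(x_now, x_sent, avg, coords, corrected=True):
    """Merge a delayed sparse average into the worker's current model.

    corrected=True keeps the progress made during the overlap phase
    (x_now - x_sent); corrected=False is the naive overwrite."""
    out = list(x_now)
    for c in coords:
        out[c] = avg[c] + (x_now[c] - x_sent[c]) if corrected else avg[c]
    return out

# Two workers send snapshots; only coordinates 0 and 2 are communicated.
w0_sent, w1_sent = [1.0, 2.0, 3.0], [3.0, 2.0, 1.0]
avg = sparse_average([w0_sent, w1_sent], coords=[0, 2])
# While the message is in flight, worker 0 keeps optimizing (-0.5 per coordinate).
w0_now = [0.5, 1.5, 2.5]
merged = merge(w0_now, w0_sent, avg, coords=[0, 2])
naive = merge(w0_now, w0_sent, avg, coords=[0, 2], corrected=False)
```

The naive overwrite discards the −0.5 step on coordinates 0 and 2, while the corrected merge keeps it. Coordinates outside the sparse mask are left untouched in both cases.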

2605.20865 2026-05-21 cs.LG cs.AI 版本更新

Multi-Step Likelihood-Ratio Correction for Reinforcement Learning with Verifiable Rewards

多步似然比校正用于可验证奖励的强化学习

Deokgyu Yoon, Hyungkyu Kang, Joongkyu Lee, Byeongchan Kim, Gyungin Shin, Sungrae Park, Min-hwan Oh

发表机构 * Seoul National University(首尔国立大学) Upstage

AI总结 本文提出了一种N步前向迹策略优化(NFPO)算法,通过引入N步前向迹来改进PPO的替代目标,从而在可验证奖励的强化学习中实现更精确的策略改进。

AI中文摘要

可验证奖励的强化学习(RLVR)在提升大语言模型的推理能力方面起着关键作用。然而,广泛使用的PPO替代目标本质上是局部的,因为它们依赖于对精确策略梯度目标的局部近似。虽然这种近似通过减少重要性采样引起的方差提高了稳定性,但它也将结构性偏差引入替代目标,必须通过信任区域机制加以控制。在本文中,我们引入N步前向迹,利用后续N-1个token的累积似然比来增强PPO替代目标。基于这一想法,我们提出N步前向迹策略优化(NFPO),一种将N步前向迹整合到掩码策略梯度框架中的实用RLVR算法。NFPO在PPO替代目标与精确策略梯度目标之间架起了一座连续的桥梁,提供了一种控制偏差-方差权衡的有原则的机制。我们的理论分析表明,在适当选择N的情况下,所提出的目标比标准PPO替代目标给出更紧的策略改进界。在全面的推理基准上的实验表明,NFPO一致地提升了性能,支持了我们的理论发现。

英文摘要

Reinforcement learning with verifiable rewards (RLVR) plays a pivotal role in improving the reasoning ability of large language models. However, widely used PPO surrogate objectives are fundamentally local, as they rely on a local approximation of the exact policy gradient objective. While this approximation improves stability by reducing the variance induced by importance sampling, it also introduces structural bias into the surrogate objective, which must be controlled through trust region mechanisms. In this work, we introduce the $N$-step forward trace, which augments the PPO surrogate objective using the cumulative likelihood ratio of the next $N-1$ tokens. Building on this idea, we propose $N$-Step Forward-Trace Policy Optimization (NFPO), a practical RLVR algorithm that integrates the $N$-step forward trace into the masked policy gradient framework. NFPO provides a continuous bridge between the PPO surrogate objective and the exact policy gradient objective, offering a principled mechanism for controlling the bias-variance trade-off. Our theoretical analysis shows that, with an appropriate choice of $N$, the proposed objective yields a tighter policy-improvement bound than the standard PPO surrogate. Experiments on comprehensive reasoning benchmarks demonstrate that NFPO consistently improves performance, supporting our theoretical findings.
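
A minimal sketch of the forward-trace weight. It assumes the weight for token t is its own likelihood ratio times the ratios of the next N−1 tokens, truncated at the end of the sequence. N = 1 then recovers the per-token PPO ratio. Clipping, the masking scheme, and whether gradients flow through the future ratios are not specified in the abstract and are omitted. The function name is hypothetical.

```python
import math

def forward_trace_weights(logp_new, logp_old, n):
    """Weight for token t: prod_{k=t}^{min(t+n, T)-1} pi_new(a_k) / pi_old(a_k).

    Computed in log space for numerical stability."""
    diffs = [a - b for a, b in zip(logp_new, logp_old)]
    T = len(diffs)
    return [math.exp(sum(diffs[t:min(t + n, T)])) for t in range(T)]

logp_new = [-1.0, -2.0, -0.5]
logp_old = [-1.5, -1.0, -0.5]   # per-token log-ratios: [0.5, -1.0, 0.0]
w1 = forward_trace_weights(logp_new, logp_old, n=1)  # plain per-token PPO ratios
w3 = forward_trace_weights(logp_new, logp_old, n=3)  # trace spans the full sequence
```

Growing n moves each token's weight toward a longer product of ratios. That is the continuous knob between the local PPO surrogate and longer-horizon importance weighting, trading lower bias for higher variance.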

2605.20863 2026-05-21 cs.DC cs.LG 版本更新

PlexRL: Cluster-Level Orchestration of Serviceized LLM Execution for RLVR

PlexRL:面向RLVR的服务化大语言模型执行的集群级编排

Yiqi Zhang, Fangzheng Jiao, Tian Tang, Boyu Tian, Hangyu Wang, Qiaoling Chen, Guoteng Wang, Zhen Jiang, Peng Sun, Ping Zhang, Xiaohe Hu, Ziming Liu, Menghao Zhang, Yanmin Jia, Yang You, Siyuan Feng

发表机构 * National University of Singapore(新加坡国立大学) Beihang University(北京航空航天大学) Shanghai Qiji Zhifeng Co., Ltd. Nanyang Technological University(南洋理工大学) Shanghai Innovation Institute(上海创新研究院)

AI总结 本文提出PlexRL,通过集群级编排服务化大语言模型执行,解决RLVR训练中的效率问题,提升集群容量并降低GPU小时成本,同时保持算法灵活性和最小的单任务开销。

AI中文摘要

可验证奖励的强化学习(RLVR)最近在大语言模型(LLMs)中解锁了强大的推理能力,引发了对新算法和新数据的快速探索。然而,RLVR训练的低效是出了名的:长尾rollout、工具调用引起的停顿,以及rollout与训练之间不对称的资源需求,引入了大量空闲时间,而这些空闲无法通过同步流水线、异步rollout或共置执行等作业内部优化来消除。我们认为这种低效是结构性的。虽然单个RLVR作业内部的空闲间隙不可避免,但它们在不同作业之间基本呈负相关,因此可以在集群层面加以利用。基于这一观察,我们提出PlexRL,一个在多个RLVR作业之间复用统一LLM服务的集群级运行时。通过在严格的亲和性约束下集中管理模型放置、状态转换和函数级调度,PlexRL对跨作业的LLM执行进行时间分片,以填补原本空闲的时段,而无需代价高昂的模型迁移。我们的实现与评估表明,PlexRL显著提升了有效集群容量,并将用户GPU小时成本最多降低37.58%,同时保持算法灵活性,且单作业开销极小。

英文摘要

Reinforcement learning with verifiable rewards (RLVR) has recently unlocked strong reasoning capabilities in large language models (LLMs), triggering rapid exploration of new algorithms and data. However, RLVR training is notoriously inefficient: long-tailed rollouts, tool-induced stalls, and asymmetric resource requirements between rollout and training introduce substantial idle time that cannot be eliminated by job-local optimizations such as synchronous pipelining, asynchronous rollout, or colocated execution. We argue that this inefficiency is structural. While idle gaps are unavoidable within individual RLVR jobs, they are largely anti-correlated across jobs and therefore exploitable at the cluster level. Leveraging this observation, we present PlexRL, a cluster-level runtime for multiplexing unified LLM services across RLVR jobs. By centrally managing model placement, state transitions, and function-level scheduling under strict affinity constraints, PlexRL time-slices LLM execution across jobs to fill otherwise idle periods without expensive model migration. Our implementation and evaluations demonstrate that PlexRL significantly improves effective cluster capacity and reduces user GPU hour cost by up to 37.58% while preserving algorithmic flexibility and introducing minimal per-job overhead.

2605.20856 2026-05-21 cs.RO cs.AI cs.LG 版本更新

DISC: Decoupling Instruction from State-Conditioned Control via Policy Generation

DISC: 通过策略生成解耦指令与状态条件控制

Hanxiang Ren, Pei Zhou, Xunzhe Zhou, Yanchao Yang

发表机构 * Zhejiang University(浙江大学) The University of Hong Kong(香港大学) TranscEngram

AI总结 DISC通过策略生成解耦指令与状态条件控制,解决了任务状态耦合导致的观察泄漏问题,并在多个基准测试中表现出色,证明了语言生成的策略参数驱动行为。

AI中文摘要

语言条件的操作策略通常通过共享网络参数处理指令和观察。这种任务-状态耦合为观察泄漏提供了途径:网络学会了完全绕过语言接地的场景到动作的捷径。DISC从结构上消除了这一失败模式。DISC不是让通用策略以语言为条件,而是使用超网络仅根据指令生成特定任务视觉-运动策略的全部参数。生成的策略从不直接访问语言,因此其任务感知必须来自语言。这样一来,观察泄漏便没有产生的途径。另一方面,生成一致的高维策略权重本身就是一个具有挑战性的问题。我们通过两阶段超网络来解决它,其细化阶段将基于梯度的优化结构作为前馈归纳偏置嵌入,无需实际的梯度计算即可产生全局一致的参数。在标准数据预算下完全从头训练,DISC在LIBERO-90和Meta-World上优于所有耦合基线,且在复杂的长时程任务上优势进一步扩大;尽管未使用任何外部预训练数据,它仍超越了大规模预训练的π₀。在一个所有任务共享相同视觉上下文的真实世界基准上,DISC显著优于耦合的替代方案,直接证实了驱动行为的是语言生成的策略参数,而非视觉捷径。超网络还进一步学习到了一个语义结构化的参数流形,能够从极少量的演示中实现少样本适应,并在改写后的指令上实现稳健泛化。我们的代码可在:https://github.com/ReNginx/DISC 获取。

英文摘要

Language-conditioned manipulation policies typically process instructions and observations through shared network parameters. This task-state entanglement provides a pathway for observation leakage -- networks learn scene-to-action shortcuts that bypass language grounding entirely. DISC eliminates this failure structurally. Rather than conditioning a universal policy on language, DISC uses a hypernetwork to generate the entire parameter set of a task-specific visuomotor policy from the instruction alone. The generated policy never directly accesses language; therefore, its task-awareness must come from the language. Consequently, observation leakage has no pathway to emerge. On the other hand, generating coherent high-dimensional policy weights is itself a challenging problem. We address it with a two-stage hypernetwork whose refinement stage embeds the structure of gradient-based optimization as a feed-forward inductive bias, producing globally consistent parameters without actual gradient computation. Trained entirely from scratch on standard data budgets, DISC outperforms all entangled baselines on LIBERO-90 and Meta-World, with advantages that widen on complex, long-horizon tasks -- and surpasses the large-scale pretrained $π_0$ despite using no external pretraining data. On a real-world benchmark where all tasks share identical visual context, DISC substantially outperforms entangled alternatives, directly confirming that language-generated policy parameters, not visual shortcuts, drive behavior. The hypernetwork further learns a semantically structured parameter manifold that enables few-shot adaptation from minimal demonstrations and robust generalization across paraphrased instructions. Our code is available at: https://github.com/ReNginx/DISC.

2605.20839 2026-05-21 cs.CV cs.LG 版本更新

Activation-Free Backbones for Image Recognition: Polynomial Alternatives within MetaFormer-Style Vision Models

无需激活函数的图像识别骨干网络:MetaFormer风格视觉模型中的多项式替代方案

Jeffrey Wang, Jonathan Gregory, Grigorios G. Chrysos

发表机构 * University of Wisconsin--Madison(威斯康星大学麦迪逊分校)

AI总结 本文提出无需激活函数的多项式替代方法,用于在MetaFormer风格的视觉模型中实现图像识别,展示了多项式模块在多个数据集上的优越性能。

Comments Accepted to ICML 2026

AI中文摘要

现代视觉骨干网络将逐点激活(如ReLU、GELU)和指数softmax视为非线性的必要来源,但我们证明,在MetaFormer风格的视觉骨干网络中它们并非必需。我们为三个核心基本操作(MLP、卷积和注意力)设计了无需激活函数的多项式替代方案,用Hadamard乘积取代标准非线性,从而得到输入的多项式函数。这些模块可以无缝集成到现有架构中:在模块化视觉骨干网络框架MetaFormer中实例化后,我们的PolyNeXt模型在各个模型规模上,于ImageNet分类、ADE20K语义分割和分布外鲁棒性方面均匹配或超过基于激活函数的对应模型。我们还以更低的计算成本显著优于先前的多项式网络,表明标准模块的多项式变体能够胜过复杂的定制架构。

英文摘要

Modern vision backbones treat pointwise activations (e.g., ReLU, GELU) and exponential softmax as essential sources of nonlinearity, but we demonstrate they are not required within MetaFormer-style vision backbones. We design activation-free polynomial alternatives for three core primitives (MLPs, convolutions, and attention), where Hadamard products replace standard nonlinearities to yield polynomial functions of the input. These modules integrate seamlessly into existing architectures: instantiated within MetaFormer, a modular framework for vision backbones, our PolyNeXt models match or exceed activation-based counterparts across model scales on ImageNet classification, ADE20K semantic segmentation, and out-of-distribution robustness. We also substantially outperform prior polynomial networks at reduced computational cost, showing that polynomial variants of standard modules beat complex custom architectures.
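
A minimal sketch of how a Hadamard product can stand in for a pointwise activation in an MLP. This is a generic degree-2 polynomial block, assumed for illustration; it is not claimed to be PolyNeXt's exact module. Without biases the block is homogeneous of degree 2, so scaling the input by 2 scales the output by 4. Stacking two blocks gives degree 4. The weight matrices are hypothetical.

```python
def matvec(W, x):
    """Dense matrix-vector product (W is a list of rows)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def hadamard_mlp(x, W_a, W_b, W_out):
    """Activation-free MLP: (W_a x) * (W_b x) elementwise replaces the nonlinearity,
    so the output is a degree-2 polynomial of x."""
    h = [a * b for a, b in zip(matvec(W_a, x), matvec(W_b, x))]
    return matvec(W_out, h)

# Hypothetical weights: input dim 2, hidden dim 3, output dim 2.
W_a = [[1.0, 0.5], [0.0, 1.0], [2.0, -1.0]]
W_b = [[0.5, 1.0], [1.0, 1.0], [-1.0, 0.5]]
W_out = [[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]]

x = [1.0, -2.0]
y = hadamard_mlp(x, W_a, W_b, W_out)
y_scaled = hadamard_mlp([2 * v for v in x], W_a, W_b, W_out)

def stacked(x):
    """Two blocks composed: a degree-4 polynomial of x."""
    return hadamard_mlp(hadamard_mlp(x, W_a, W_b, W_out), W_a, W_b, W_out)
```

The homogeneity check in the test is a quick way to confirm that no hidden pointwise nonlinearity has crept into a block.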

2605.20834 2026-05-21 cs.AI cs.LG 版本更新

Conditional Equivalence of DPO and RLHF: Implicit Assumption, Failure Modes, and Provable Alignment

DPO与RLHF的条件等价性:隐含假设、失败模式与可证明对齐

Zhiqin Yang, Yonggang Zhang, Wei Xue, Dong Fang, Bo Han, Yike Guo

发表机构 * The Hong Kong University of Science and Technology(香港科技大学) Hong Kong Baptist University(香港浸会大学)

AI总结 本文研究了DPO与RLHF的等价性问题,指出其等价性依赖于一个隐含假设;当该假设不成立时,DPO优化的是相对优势而非绝对对齐,从而导致病态收敛。作者提出CPO方法,通过引入约束实现可证明对齐,并通过几何解释揭示了DPO的边距排序机制。

Comments 49 pages

AI中文摘要

直接偏好优化(DPO)已成为替代基于人类反馈的强化学习(RLHF)的流行方法,它在理论上与RLHF等价而实现更简单。我们证明这种等价性是有条件的而非普遍的,它依赖于一个在实践中经常被违反的隐含假设:RLHF最优策略必须偏好人类偏好的响应。当该假设不成立时,DPO优化的是相对于参考策略的相对优势,而非与人类偏好的绝对对齐,从而导致病态收敛:策略在降低DPO损失的同时反而偏好非偏好响应。我们刻画了该假设何时被违反,证明了不良解空间的存在,并证明在这些情况下DPO和RLHF优化的是根本不同的目标。为解决此问题,我们提出约束偏好优化(CPO),通过为RLHF添加约束来实现可证明的对齐。我们进一步通过软边距排序给出几何解释,揭示DPO实现的是目标可能为负的边距排序。我们的理论分析确立了DPO的保证何时成立,并给出了在保持简单性的同时具有可证明对齐的解决方案。在标准基准上的全面实验表明,CPO取得了最先进的性能。代码可在:https://github.com/visitworld123/CPO 获取。

英文摘要

Direct Preference Optimization (DPO) has emerged as a popular alternative to Reinforcement Learning from Human Feedback (RLHF), offering theoretical equivalence with simpler implementation. We prove this equivalence is conditional rather than universal, depending on an implicit assumption frequently violated in practice: the RLHF-optimal policy must prefer human-preferred responses. When this assumption fails, DPO optimizes relative advantage over the reference policy rather than absolute alignment with human preferences, leading to pathological convergence where policies decrease DPO loss while preferring dispreferred responses. We characterize when this assumption is violated, show the existence of an undesirable solution space, and prove that DPO and RLHF optimize fundamentally different objectives in such cases. To address this, we introduce Constrained Preference Optimization (CPO), augmenting RLHF with constraints for provable alignment. We further provide a geometric interpretation through soft margin ranking, revealing that DPO implements margin ranking with potentially negative targets. Our theoretical analysis establishes when DPO's guarantees hold and provides solutions preserving simplicity with provable alignment. Comprehensive experiments on standard benchmarks demonstrate that CPO achieves state-of-the-art performance. Code is available at: https://github.com/visitworld123/CPO.
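
The failure mode can be reproduced with the standard per-pair DPO loss, −log σ(β[(log π(y_w) − log π_ref(y_w)) − (log π(y_l) − log π_ref(y_l))]). In this hypothetical two-response example the reference strongly prefers the dispreferred response y_l. A policy that still prefers y_l achieves a lower DPO loss than the reference itself, because DPO only rewards relative movement away from the reference. The probabilities are made up for illustration, and CPO's constraints are not shown.

```python
import math

def dpo_loss(pi_w, pi_l, ref_w, ref_l, beta=1.0):
    """Per-pair DPO loss from log-probabilities of the chosen (w) and rejected (l) responses."""
    margin = beta * ((pi_w - ref_w) - (pi_l - ref_l))
    return math.log1p(math.exp(-margin))  # = -log(sigmoid(margin))

# Reference heavily favors the dispreferred response y_l.
ref_w, ref_l = math.log(0.01), math.log(0.99)
# Policy moved toward y_w, but still prefers y_l (0.90 > 0.10).
pi_w, pi_l = math.log(0.10), math.log(0.90)

loss_at_ref = dpo_loss(ref_w, ref_l, ref_w, ref_l)  # policy == reference -> log 2
loss_policy = dpo_loss(pi_w, pi_l, ref_w, ref_l)
```

The loss falls from log 2 ≈ 0.693 to about 0.087, yet the policy still assigns nine times more probability to the dispreferred response.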

2605.20824 2026-05-21 cs.LG 版本更新

Markovian Circuit Tracing for Transformer State Dynamic

马尔可夫电路追踪用于Transformer状态动态

Abdullah X

发表机构 * Project AWARE and Zephara AI(项目AWARE和Zephara AI)

AI总结 本研究提出马尔可夫电路追踪(MCT)方法,用于评估Transformer激活是否包含粗粒度的状态转移结构,通过合成的隐马尔可夫模型任务验证了残差激活中包含部分贝叶斯信念信息,并展示了状态抽象在不同状态下恢复粗粒度转移信号的效果。

AI中文摘要

许多序列计算更适合作为在内部状态之间的运动来研究,而非作为孤立的局部电路。我们提出马尔可夫电路追踪(MCT),一种用于检验Transformer激活是否包含粗粒度状态转移结构的诊断流程。该基准使用合成的隐马尔可夫模型(HMM)任务,其中潜在状态、转移矩阵、贝叶斯信念向量、贝叶斯最优预测以及强制状态反事实目标都精确已知。在六个HMM家族、每个家族三个随机种子的设置下,微型因果Transformer学会了接近贝叶斯最优的下一个token预测器,相对贝叶斯的平均超额损失为0.0138。在这一受控的合成基准中,残差激活包含部分贝叶斯信念信息。从这些激活中提取的状态抽象能够恢复粗粒度的转移信号,在状态持续性强和状态数较少的情形中最强,在发射模糊和六状态的情形中较弱。最清晰的结果来自状态强制:修补(patching)恢复出的状态质心,将与精确HMM反事实目标之间的KL从未修补模型的0.1957平均降低到0.0532,优于错误状态、均值激活、随机激活和打乱标签等对照。本研究的贡献是一个面向Transformer状态动态可解释性的受控基准与评估框架,并以MCT作为简单的参考流程。

英文摘要

Many sequence computations are easier to study as movement through internal states than as isolated local circuits. We introduce Markovian Circuit Tracing (MCT), a diagnostic pipeline for testing whether transformer activations contain coarse state-transition structure. The benchmark uses synthetic Hidden Markov Model (HMM) tasks where latent states, transition matrices, Bayesian belief vectors, Bayes-optimal predictions, and forced-state counterfactual targets are known exactly. Across six HMM families and three seeds per family, tiny causal transformers learn near-Bayes next-token predictors, with mean excess loss over Bayes of 0.0138. Residual activations contain partial Bayesian belief information in this controlled synthetic benchmark. State abstractions extracted from these activations recover coarse transition signal, strongest in persistent and lower-state regimes, and weaker in ambiguous-emission and six-state regimes. The clearest result comes from state forcing. Patching a recovered-state centroid reduces KL to the exact HMM counterfactual target from 0.1957 in the unpatched model to 0.0532 on average, beating wrong-state, mean-activation, random-activation, and shuffled-label controls. The contribution is a controlled benchmark and evaluation framework for transformer state-dynamics interpretability, with MCT as a simple reference pipeline.
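
The "known exactly" ground truth in such a benchmark comes from standard HMM filtering. The sketch below computes Bayesian belief updates, the Bayes-optimal next-token distribution, a forced-state counterfactual target, and the KL used to score a patched model. It assumes "forcing" means setting the current-state belief to a one-hot vector. How MCT extracts state abstractions from activations and patches centroids is not shown, and the toy HMM is hypothetical.

```python
import math

def predict_next(belief, T, E):
    """Bayes-optimal next-token distribution and next-state prior.

    belief[s] = P(current state s | tokens so far), T[s][s2] = transition prob,
    E[s][o] = probability that state s emits token o."""
    n = len(T)
    prior = [sum(belief[s] * T[s][s2] for s in range(n)) for s2 in range(n)]
    tokens = [sum(prior[s] * E[s][o] for s in range(n)) for o in range(len(E[0]))]
    return tokens, prior

def update_belief(belief, T, E, obs):
    """Exact Bayesian filtering step after observing token `obs`."""
    _, prior = predict_next(belief, T, E)
    post = [prior[s] * E[s][obs] for s in range(len(prior))]
    z = sum(post)
    return [p / z for p in post]

def forced_state_target(state, T, E):
    """Counterfactual next-token target if the current state is forced to `state`."""
    one_hot = [1.0 if s == state else 0.0 for s in range(len(T))]
    return predict_next(one_hot, T, E)[0]

def kl(p, q):
    """KL(p || q) in nats, skipping zero-probability entries of p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy "sticky" 2-state HMM with noisy emissions.
T = [[0.9, 0.1], [0.1, 0.9]]
E = [[0.8, 0.2], [0.2, 0.8]]
b1 = update_belief([0.5, 0.5], T, E, obs=0)
b2 = update_belief(b1, T, E, obs=0)
```

A patched model's next-token distribution would be compared against `forced_state_target` with `kl`. Lower KL means the patched activation carries the forced state's dynamics.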

2605.20815 2026-05-21 cs.CL cs.AI cs.IR cs.LG 版本更新

GraphRAG on Consumer Hardware: Benchmarking Local LLMs for Healthcare EHR Schema Retrieval

在消费级硬件上实现GraphRAG:对本地LLMs在医疗EHR模式检索中的基准测试

Peter Fernandes, Ria Kanjilal

发表机构 * Department of Computer Engineering(计算机工程系) California Polytechnic State University(加州理工州立大学)

AI总结 本文研究了在消费级硬件上使用本地LLMs进行医疗EHR模式检索的GraphRAG方法,评估了四种不同模型在索引效率、知识图构建、查询延迟、回答质量和幻觉方面的表现,发现模型参数大小和检索模式对结果有显著影响。

Comments 9 pages, 1 figure, 5 tables

AI中文摘要

基于图的检索增强生成(GraphRAG)扩展了检索增强生成,以支持对复杂语料库的结构化推理,但其在资源受限、隐私敏感部署中的可靠性仍不清楚。在医疗领域,电子健康记录(EHR)数据复杂且受到严格监管,依赖基于云的大语言模型(LLMs)会带来成本、延迟和合规方面的挑战。本文使用本地部署的开源LLMs,对GraphRAG在EHR模式检索中的应用进行了系统评估。我们在真实的EHR模式文档上实现了Microsoft GraphRAG流程,并对四种模型进行了基准测试,包括Llama 3.1(8B)、Mistral(7B)、Qwen 2.5(7B)和Phi-4-mini(3.8B),每个模型均通过Ollama部署在单块消费级GPU(8 GB显存)上。我们在全局和局部两种检索模式下评估了索引效率、知识图谱构建、查询延迟、回答质量和幻觉。结果揭示了显著差异:Llama 3.1生成的知识图谱最丰富(1,172个实体),Qwen 2.5的回答质量最佳(3.3/5),Phi-4-mini因结构化输出错误无法完成流程,而Mistral表现出退化的重复行为。我们进一步表明,GraphRAG存在实际的容量阈值:参数量低于约7B的模型无法可靠地生成有效的结构化输出,因而无法完成流程。此外,索引质量与回答质量在不同模型之间是脱节的,而局部检索在延迟和事实依据方面均稳定优于全局总结,且幻觉更少。这些发现表明,GraphRAG在消费级硬件上是可行的,同时也凸显了模型选择与检索设计对于在受监管环境中稳健部署的重要性。

英文摘要

Graph-based Retrieval Augmented Generation (GraphRAG) extends retrieval-augmented generation to support structured reasoning over complex corpora, but its reliability under resource-constrained, privacy-sensitive deployments remains unclear. In healthcare, where Electronic Health Record (EHR) data is complex and strictly regulated, reliance on cloud-based large language models (LLMs) introduces challenges in cost, latency, and compliance. In this work, we present a systematic evaluation of GraphRAG for EHR schema retrieval using locally deployed open-source LLMs. We implement the Microsoft GraphRAG pipeline on real-world EHR schema documentation and benchmark four models, including Llama 3.1 (8B), Mistral (7B), Qwen 2.5 (7B), and Phi-4-mini (3.8B), each deployed via Ollama on a single consumer GPU (8 GB VRAM). We evaluate indexing efficiency, knowledge graph construction, query latency, answer quality, and hallucination under both global and local retrieval modes. Our results reveal substantial differences: Llama 3.1 produces the richest knowledge graph (1,172 entities), Qwen 2.5 achieves the best answer quality (3.3/5), Phi-4-mini fails to complete the pipeline due to structured-output errors, and Mistral exhibits degenerate repetition behavior. We further show that GraphRAG exhibits a practical capacity threshold, where models below approximately 7B parameters fail to reliably produce valid structured outputs and cannot complete the pipeline. In addition, indexing and answer quality are decoupled across models, and local retrieval consistently outperforms global summarization in both latency and factual grounding, with reduced hallucination. These findings demonstrate that GraphRAG is feasible on consumer hardware while highlighting the importance of model selection and retrieval design for robust deployment in regulated settings.

2605.20804 2026-05-21 cs.CV cs.LG 版本更新

OlmoEarth v1.1: A more efficient family of OlmoEarth models

OlmoEarth v1.1: 一个更高效的OlmoEarth模型家族

Gabriel Tseng, Yawen Zhang, Favyen Bastani, Henry Herzog, Joseph Redmon, Hadrien Sablon, Piper Wolters, Patrick Alan Johnson, Christopher Wilhelm, Patrick Beukema

发表机构 * Allen Institute for AI(艾伦人工智能研究所)

AI总结 本文提出了一种改进的OlmoEarth模型家族,通过优化训练和推理过程,显著降低了计算成本,同时保持了模型的整体性能。

AI中文摘要

我们介绍了OlmoEarth家族的一系列改进。这些改进使我们在训练过程中减少了计算成本(训练Base模型所需的GPU小时减少了1.7倍),并在Sentinel-2任务中推理时减少了MACs(2.9倍),同时保持了模型的整体性能。所有训练代码均在github.com/allenai/olmoearth_pretrain上提供。

英文摘要

We present a set of improvements to the OlmoEarth family. These improvements allow us to cut compute costs during training ($1.7 \times$ reduction in GPU hours required to train our Base models) and inference ($2.9\times$ reductions in MACs on Sentinel-2 tasks), while maintaining the models' overall performance. All training code is available at github.com/allenai/olmoearth_pretrain.

2605.20803 2026-05-21 cs.LG cs.AI 版本更新

Tunable MAGMAX: Preference-Aware Model Merging for Continual Learning

可调MAGMAX:面向持续学习的偏好感知模型融合

Kei Hiroshima, Kento Uchida, Shinichi Shirakawa

发表机构 * Yokohama National University(横滨国立大学)

AI总结 本文提出了一种名为可调MAGMAX的模型融合框架,通过引入偏好向量控制任务特定性能,以适应不同的部署环境和用户偏好,从而在持续学习中实现更有效的模型融合。

Comments 17 pages, 4 figures. Accepted at ICPR 2026

AI中文摘要

持续学习(CL)旨在顺序训练多个任务的同时,减轻对之前学习知识的灾难性遗忘。最近在大预训练模型(LPMs)和模型融合技术,如MAGMAX方面的进展,通过结合任务特定参数展示了有效的CL性能。然而,现有方法主要关注所有任务的平均性能,并未充分解决如何构建能够适应不同部署环境或变化用户偏好的模型的问题。本文提出了一种模型融合框架,称为可调MAGMAX,它使持续学习中的任务特定性能能够受到偏好控制。我们的方法引入了一个偏好向量,该向量在模型融合过程中控制从每个任务向量中选择的元素数量,使我们能够根据部署需求调整融合模型的性能。我们进一步提出了一种方法,通过利用少量目标环境数据和模型训练任务的数据集,自动构建合适的偏好向量,从而消除了手动指定的需要。在CL基准任务上的实验结果表明,可调MAGMAX有效地控制了任务层面的性能,并成功地将融合模型适应于各种目标环境。所提出的可调MAGMAX在性能上优于或与基线方法相当,使其成为部署到各种环境中的实用解决方案,其中每个任务的偏好不同。

英文摘要

Continual learning (CL) aims to train models sequentially on multiple tasks while mitigating catastrophic forgetting of previously learned knowledge. Recent advances in large pre-trained models (LPMs) and model merging techniques, such as MAGMAX, have demonstrated effective CL performance by combining task-specific parameters. However, existing methods primarily focus on average performance across all tasks and do not adequately address how to construct models accommodating different deployment environments or varying user preferences. This paper proposes a model merging framework, termed Tunable MAGMAX, which enables preference-aware control of task-specific performance in CL. Our method introduces a preference vector that controls the number of elements selected from each task vector during model merging, allowing the merged model's performance to be adjusted according to deployment needs. We further propose a method for automatically constructing appropriate preference vectors by leveraging small amounts of target environment data and datasets from model training tasks, thereby eliminating the need for manual specification. Experimental results on CL benchmark tasks demonstrate that Tunable MAGMAX effectively controls task-wise performance and successfully adapts merged models to various target environments. The proposed Tunable MAGMAX achieves superior or comparable performance to baseline methods, making it a practical solution for deploying CL models to various environments where preferences over per-task performance differ.
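
A minimal sketch of preference-controlled merging. It assumes MAGMAX's per-coordinate maximum-magnitude selection across task vectors. It also assumes that a preference p_i ∈ [0, 1] admits only the round(p_i · d) largest-magnitude entries of task vector i into that selection. This is one hypothetical reading of "controls the number of elements selected from each task vector", not the paper's exact procedure. The merged model would then be θ_pre + λ · merged for some scaling λ.

```python
def tunable_magmax(task_vectors, prefs=None):
    """Per coordinate, keep the largest-magnitude value among eligible task entries.

    prefs[i] in [0, 1] sets the fraction of task i's entries (largest |value| first)
    that may be selected; prefs=None means all entries are eligible (plain MagMax)."""
    d = len(task_vectors[0])
    prefs = prefs if prefs is not None else [1.0] * len(task_vectors)
    eligible = []
    for tv, p in zip(task_vectors, prefs):
        k = round(p * d)
        top = sorted(range(d), key=lambda j: abs(tv[j]), reverse=True)[:k]
        eligible.append(set(top))
    merged = []
    for j in range(d):
        cands = [tv[j] for tv, el in zip(task_vectors, eligible) if j in el]
        merged.append(max(cands, key=abs) if cands else 0.0)
    return merged

# Hypothetical task vectors (fine-tuned minus pre-trained weights) for two tasks.
tvs = [[0.5, -0.1, 0.2], [-0.3, 0.4, 0.1]]
plain = tunable_magmax(tvs)
only_task1 = tunable_magmax(tvs, prefs=[0.0, 1.0])
```

Setting a task's preference to zero removes its entries from the merge entirely, so the merged update collapses to the other tasks' vectors. Intermediate values trade task-wise performance against each other.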

2605.20799 2026-05-21 cs.DC cs.LG 版本更新

Instant GPU Efficiency Visibility at Fleet Scale

在集群规模上实现GPU效率的即时可见性

Connor Pedersen, Dong H. Ahn, Michel Migdal, Collin Neale, Nik Konyuchenko

发表机构 * Microsoft(微软) Alphabet Meta

AI总结 本文提出了一种硬件级别的GPU效率指标Overall FLOP Utilization (OFU),用于HPC系统上的AI工作负载,它由两个片上性能计数器推导得出:张量管道活动和SM时钟频率。OFU无需对应用程序插桩,适用于各代GPU和各种数值精度。通过在H100和GB200上进行受控的GEMM实验,研究了OFU近似的五个属性,并在FP16、TF32、FP8和NVFP4上进行了验证。经过瓦片量化修正后,OFU对应用级MFU的预测误差不超过2个百分点。在608个生产训练任务中,OFU与应用级MFU的相关系数达到0.78,并揭示了两个框架级别的FLOPs计算错误。在大规模GPU集群上部署后,OFU检测到一次2.5倍的效率退化,并追踪了混合精度预训练中依赖精度的利用率变化。评估和运营经验表明,OFU是应用级MFU的实用且可部署的补充,可用于持续的集群级效率监控。

Comments 12 pages, 7 figures, 3 tables

AI中文摘要

我们提出Overall FLOP Utilization (OFU),一种面向HPC系统上AI工作负载的硬件级、与精度无关的GPU效率度量,它基于两个片上性能计数器:张量管道活动(Tensor Pipe Activity)和SM时钟频率。OFU无需对应用程序插桩,适用于各代GPU和各种数值精度。我们通过在H100和GB200上针对FP16、TF32、FP8和NVFP4进行受控的GEMM实验,刻画了OFU近似的五个属性:瓦片量化、浮点精度缩放、时钟采样噪声、Tensor Core时钟域以及非张量计算的低估。经过瓦片量化修正后,OFU对应用级MFU的预测误差在2个百分点以内。在608个生产训练任务上,OFU与应用级MFU的相关系数r = 0.78,并揭示了两个框架级别的FLOPs计算错误。在大规模GPU集群上部署后,OFU检测到一次2.5倍的效率退化,并追踪了混合精度预训练中依赖精度的利用率变化。我们的评估和运营经验表明,OFU是应用级MFU的实用且可直接部署的补充,可用于持续的集群级效率监控。

英文摘要

We present Overall FLOP Utilization (OFU), a hardware-level, precision-agnostic GPU efficiency metric for AI workloads on HPC systems, derived from two on-chip performance counters: Tensor Pipe Activity and SM clock frequency. OFU requires no application instrumentation and works across GPU generations and numeric precisions. We characterize five properties of the OFU approximation -- tile quantization, floating-point precision scaling, clock sampling noise, Tensor Core clock domains, and non-tensor undercounting -- through controlled GEMM experiments on H100 and GB200 across FP16, TF32, FP8, and NVFP4. After tile-quantization correction, OFU predicts application-level MFU to within 2 percentage points. Against 608 production training jobs, OFU achieves r = 0.78 correlation with application-level MFU and surfaces two framework-level FLOPs miscalculations. Deployed across large-scale GPU fleets, OFU has detected a 2.5x efficiency regression and tracked precision-dependent utilization changes in mixed-precision pretraining. Our evaluation and operational experience suggest OFU is a practical, deployment-ready complement to application-level MFU for continuous fleet-wide efficiency monitoring.
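
A back-of-the-envelope sketch of how the two counters could combine, plus a tile-quantization correction. It assumes that peak tensor throughput is quoted at a maximum SM clock and scales linearly with the clock. Under that assumption, OFU ≈ tensor-pipe activity × (current clock / max clock). It also assumes GEMM outputs are padded to whole tiles, so the useful fraction of tensor work is mn / (padded m × padded n). The formula, the tile size, and the function names are illustrative assumptions, not the paper's definitions. Counter names from real profilers are not used here.

```python
import math

def ofu(tensor_active, sm_clock_mhz, max_clock_mhz):
    """Hypothetical OFU: fraction of peak tensor throughput actually being issued."""
    return tensor_active * (sm_clock_mhz / max_clock_mhz)

def tile_efficiency(m, n, tile_m=128, tile_n=128):
    """Useful fraction of tensor work when an m x n GEMM output is padded to whole tiles."""
    padded = math.ceil(m / tile_m) * tile_m * math.ceil(n / tile_n) * tile_n
    return (m * n) / padded

def mfu_estimate(tensor_active, sm_clock_mhz, max_clock_mhz, m, n):
    """Tile-corrected estimate: padded tiles keep the tensor pipe busy without useful FLOPs."""
    return ofu(tensor_active, sm_clock_mhz, max_clock_mhz) * tile_efficiency(m, n)
```

The tile correction explains why a raw activity counter can overstate useful work. A 129-wide output dimension occupies two 128-wide tiles but does only about half a tile's worth of useful work in the second one.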

2605.20798 2026-05-21 cs.LG cs.CL 版本更新

Most Transformer Modifications Still Do Not Transfer at 1-3B: A 2020-2026 Update to Narang et al. (2021) with Downstream Evaluation and a Noise Floor

大多数Transformer修改在1-3B规模上仍无法迁移:对Narang等人(2021)的2020-2026年更新,包含下游评估和噪声底限

Yang Zhao, Jiahao Lu, Bin Huang, Guhua Zhang, Jie Zhou

发表机构 * Tencent(腾讯)

AI总结 本文发现,在1-3B参数规模下,大多数Transformer修改仍无法迁移;这一结论基于严格的等数据、等计算、等配方对照测试,并结合下游评估和噪声底限加以验证。

Comments 19 pages, 3 figures, under review at EMNLP 2026

AI中文摘要

Narang等人(2021)在T5-base规模上评估了40多种Transformer修改,并得出结论:大多数修改无法迁移。五年后,典型的工作规模已转移到1-3B参数,下游评估已取代预训练困惑度,且出现了一批截然不同的新修改。我们重新审视他们的问题,在严格的等数据、等计算、等配方控制下,于1.2B和3B参数规模测试了20种2021年后提出的Transformer修改,采用多种子基线噪声底限,并以CLIMB-12下游评估作为主要指标。核心发现在这一精选集合上重现了他们的结论:大多数修改无法迁移。在20种修改中,只有两种在1.2B规模上通过了Bonferroni校正;其中一种在3B规模下使用共享配方时无法稳定训练。我们还发现,Tay等人(2023)报告的损失与下游表现之间的差距对于注意力输出修改扩大了数倍:两种显著失败的修改收敛到与基线验证损失相差2-3%以内,但CLIMB得分却下降了6-16分。我们得出结论,噪声底限报告、下游评估和跨规模稳定性测试如今已是1-3B规模下架构比较的必要前提。

英文摘要

Narang et al. (2021) evaluated 40+ Transformer modifications at T5-base scale and concluded that most did not transfer. Five years later, the typical working regime has moved to 1-3B parameters, downstream evaluation has replaced pretraining perplexity, and a substantially different catalogue of modifications has emerged. We revisit their question by testing 20 post-2021 Transformer modifications at 1.2B and 3B under strict iso-data, iso-compute, iso-recipe control, with a multi-seed baseline noise floor and CLIMB-12 downstream evaluation as the primary metric. The central finding reproduces theirs on this curated set: most modifications do not transfer. Of the 20 modifications, only two clear Bonferroni correction at 1.2B; one of those two further fails to train stably at 3B under the shared recipe. We also find that the loss-downstream gap reported by Tay et al. (2023) enlarges several-fold for attention-output modifications: two significant failures converge to within 2-3% of baseline validation loss yet drop 6-16 CLIMB-points. We conclude that noise-floor reporting, downstream evaluation, and cross-scale stability testing are now prerequisites for architecture comparisons at 1-3B.
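
A minimal sketch of how a multi-seed noise floor and a Bonferroni correction combine. It assumes a two-sided z-test of one modification's score against the mean and sample standard deviation of the baseline seeds, with α divided by the number of modifications tested (20 here). The abstract does not state the exact test, and the scores below are made up.

```python
from statistics import NormalDist, mean, stdev

def clears_bonferroni(score, baseline_seeds, n_tests, alpha=0.05):
    """Two-sided z-test against the baseline seed distribution, at level alpha / n_tests."""
    mu, sd = mean(baseline_seeds), stdev(baseline_seeds)
    z = (score - mu) / sd
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return p < alpha / n_tests, p

baseline = [50.0, 50.4, 49.8, 50.2, 49.6]   # hypothetical downstream scores, 5 seeds
big_gain, p_big = clears_bonferroni(52.0, baseline, n_tests=20)
small_gain, p_small = clears_bonferroni(50.5, baseline, n_tests=20)
```

A +0.5 gain sits inside the seed-to-seed noise (p ≈ 0.11) and would not count. A +2.0 gain clears even the corrected threshold of 0.05 / 20 = 0.0025.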

2605.20797 2026-05-21 cs.LG Updated version

Beyond Numerical Features: CNN-Driven Algorithm Selection via Contour Plots for Continuous Black-Box Optimization

Yiliang Yuan, Xiang Shi, Mustafa Misir

Affiliations: Mohamed bin Zayed University of Artificial Intelligence, Masdar City, United Arab Emirates

Abstract

The present paper introduces a new representation-driven approach to per-instance algorithm selection, applied to black-box optimization, for automatically choosing the most promising solver from a fixed portfolio. Prior work in continuous optimization largely relies on numerical descriptors, including Exploratory Landscape Analysis features and learned embeddings such as Deep-ELA. This work studies a complementary representation: contour-map visualizations of probed landscapes. A CNN regressor takes multiple instance-specific contour views (stacked or encoded per view and aggregated) and predicts per-solver performance, enabling selection by the predicted best value. On the standard BBOB 2009 single-objective protocol, the resulting selectors significantly outperform the single best solver (SBS) and are competitive with feature-based baselines. A subsequent bi-objective evaluation under the DeepELA setting further indicates that the same image-based principle can be competitive when using windowed contour views. Overall, the results suggest that simple vision models can exploit spatial structure in probed landscapes for algorithm selection without handcrafted ELA features.
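The two ingredients named in this abstract, image-like views of a probed landscape and selection by the predicted best value, can be sketched under stated assumptions. Each "view" is approximated here by a normalised heatmap of function values on a 2-D grid over the BBOB box [-5, 5]^2. For higher-dimensional instances one would take 2-D slices; the paper's actual rendering and slicing scheme is not reproduced. The CNN regressor is replaced by a dictionary of predicted per-solver costs, where lower is assumed to be better. Function and solver names are hypothetical.

```python
def contour_view(f, lo=-5.0, hi=5.0, res=32):
    """Sample f on a res x res grid over [lo, hi]^2 and min-max normalise to [0, 1].

    The resulting 2-D array stands in for one view that an image model could consume.
    """
    # Exact endpoints: i = res - 1 gives lo + (hi - lo) exactly
    coords = [lo + (hi - lo) * i / (res - 1) for i in range(res)]
    grid = [[f(x, y) for x in coords] for y in coords]
    fmin = min(min(row) for row in grid)
    fmax = max(max(row) for row in grid)
    span = (fmax - fmin) or 1.0  # avoid division by zero on flat landscapes
    return [[(v - fmin) / span for v in row] for row in grid]

def select_solver(predicted_cost):
    """Pick the solver with the lowest predicted cost."""
    return min(predicted_cost, key=predicted_cost.get)
```

For the sphere function x^2 + y^2, the four corners of the view are the brightest pixels (value 1.0).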

2605.20784 2026-05-21 cs.AI cs.LG Updated version

Interaction Locality in Hierarchical Recursive Reasoning

Yosuke Miyanishi, Tetsuro Morimura

Affiliations: CyberAgent Inc.

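AI Summary: This paper proposes interaction locality, a framework for measuring whether information flow stays within nearby cells or semantic segments or crosses them. The authors apply it to hierarchical recursive reasoning models such as HRM and TRM. The result turns the local-execution/global-planning picture into a reproducible measurement framework.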

Abstract

Spatial reasoning requires both location-bound computation and location-invariant structure: agents must make local moves while preserving route, object, or constraint-level plans. We propose interaction locality, a task-geometry-aware framework for measuring whether information flow stays within nearby cells or semantic segments, or crosses them. We instantiate the framework with sparse-autoencoder feature ablations and finite-noise activation patching, with structural Jacobian and attention checks reported in the appendix, and apply it to HRM and TRM, two compact hierarchical and recursive reasoning models, on Maze-Hard, Sudoku Extreme, and ARC-AGI. Across these models, activation patching gives the clearest architectural fingerprint: high-level recurrent states tend to write information within nearby cells or same-segment units, while repeated recursive updates accumulate these local writes into broader solution structure. This pattern holds across maze paths, Sudoku constraints, and ARC-AGI object neighborhoods, with the strongest concentration in TRM. To test whether interaction locality extends beyond toy-yet-challenging grid benchmarks, we also apply it to MTU3D, a large-scale embodied 3D scene-grounding model. In this MTU3D setting, causal spatial locality appears primarily at the transition where visual scene features are handed to the downstream grounding module, rather than uniformly throughout the visual encoder. This contrast suggests that the local-to-global handoff observed in HRM and TRM is tied to explicit recursive reasoning dynamics, while embodied 3D models may concentrate causal spatial structure at module boundaries. Interaction locality turns the intuitive local-execution/global-planning story into a reproducible measurement framework for recursive and embodied spatial reasoning.
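A locality score of the kind described in this abstract can be illustrated on toy activation-patching results. This sketch assumes the patching effects have already been computed as a source-by-destination matrix over grid cells. It uses Manhattan distance on the grid as the task geometry. The paper's semantic-segment variant and its exact statistic are not reproduced, and the function name is hypothetical.

```python
def locality_ratio(effects, width, radius=1):
    """Fraction of total patching effect that lands within `radius` cells.

    effects[s][d]: change at destination cell d when the activation at source
    cell s is patched. Cells are indexed row-major on a grid of the given
    width; distance is Manhattan distance.
    """
    local = total = 0.0
    for s, row in enumerate(effects):
        sr, sc = divmod(s, width)
        for d, e in enumerate(row):
            dr, dc = divmod(d, width)
            mag = abs(e)
            total += mag
            if abs(sr - dr) + abs(sc - dc) <= radius:
                local += mag
    return local / total if total else 0.0
```

A ratio near 1 means patched information mostly stays near its source cell. A ratio near 0 means it mostly travels across the grid.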

2605.20782 2026-05-21 cs.LG Updated version

Causal Machine Learning Is Not a Panacea: A Roadmap for Observational Causal Inference in Health

Donna Tjandra, Trenton Chang, Sonali Parbhoo, Rajesh Ranganath, Andre Kurepa Waschka, William Mitchell, Maggie Makar, Shalmali Joshi, Finale Doshi-Velez, Leo Anthony Celi, Jenna Wiens

Affiliations: Division of Computer Science and Engineering, University of Michigan; Department of Electrical and Electronic Engineering, Imperial College London; Courant Institute of Mathematical Sciences, New York University; Center for Data Science, New York University; Department of Mathematics & Statistics, Elon University; Department of Ophthalmology, Cambridge University Hospitals; Department of Biomedical Informatics, Columbia University; School of Engineering and Applied Science, Harvard University; Laboratory for Computational Physiology, Institute for Medical Engineering and Science, Massachusetts Institute of Technology; Department of Medicine, Beth Israel Deaconess Medical Center; Department of Biostatistics, Harvard T.H. Chan School of Public Health

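AI Summary: This paper examines the use of causal machine learning on observational data. It emphasizes checking validity assumptions and applying causal ML responsibly, and it offers a template for strengthening the rigor and interpretability of causal analyses.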

Abstract

Objective: The growing availability of large-scale observational clinical datasets and challenges in conducting randomized controlled trials have spurred enthusiasm in using causal machine learning (ML) for causal inference in observational data. We present a roadmap for applying causal ML to observational data. Materials and methods: We outline the importance of assessing validity assumptions within available data and applying causal ML responsibly for clinical experts using causal ML and ML practitioners with limited clinical expertise. Observations: Despite advances in causal ML, its limitations remain largely under-appreciated across disciplines. This gap in shared knowledge may impact the validity of findings. Discussion: Causal assumptions must be satisfied and modeling choices justified. Otherwise, these approaches risk producing biased or misleading results, with consequences for clinical research and patient care. Conclusion: Causal ML can be a powerful tool for generating causal hypotheses. We provide a template to strengthen the rigor and interpretability of causal analyses.
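One concrete instance of "assessing validity assumptions within available data" is checking positivity (overlap) before estimating an effect. The sketch below is a generic illustration, not the paper's template. It reports the share of units with extreme propensity scores; the 0.05/0.95 cut-offs are a common rule of thumb and are assumed here. It then computes a basic inverse-probability-weighted (IPW) average treatment effect. No check on the data can confirm the absence of unmeasured confounding, which IPW also requires.

```python
def overlap_violations(propensities, lo=0.05, hi=0.95):
    """Share of units whose estimated propensity falls outside [lo, hi]."""
    outside = sum(1 for p in propensities if p < lo or p > hi)
    return outside / len(propensities)

def ipw_ate(treated, outcomes, propensities):
    """Horvitz-Thompson IPW estimate of the average treatment effect.

    treated: 0/1 indicators; propensities: estimated P(treated = 1 | covariates).
    """
    n = len(outcomes)
    rows = list(zip(treated, outcomes, propensities))
    treated_mean = sum(t * y / p for t, y, p in rows) / n
    control_mean = sum((1 - t) * y / (1 - p) for t, y, p in rows) / n
    return treated_mean - control_mean
```

A large share of violations signals that some covariate regions have almost no treated (or untreated) units. In that case, weighted estimates rest on extrapolation rather than data.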

2605.20780 2026-05-21 cs.LG cs.CV Updated version

Learning to Think in Physics: Breaking Shortcut Learning in Scientific Diffusion via Representation Alignment

Haozhe Jia, Pengyu Yin, Wenshuo Chen, Shaofeng Liang, Lei Wang, Bowen Tian, Xiucheng Wang, Nanqian Jia, Yutao Yue

Affiliations: The Hong Kong University of Science and Technology (Guangzhou); Shandong University; LimX Dynamics Technology Co., Ltd.; Xidian University; Peking University; Institute of Deep Perception Technology, Jiangsu Industrial Technology Research Institute (JITRI); Griffith University

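AI Summary: This work proposes REPA-P, a teacher-free framework that aligns intermediate features with physical states using first-principles residuals. It targets a weakness of physics-informed diffusion models: their intermediate representations tend to learn shortcuts under shifted boundary conditions. On four PDE tasks, REPA-P speeds up convergence, reduces physics residuals, and improves out-of-distribution robustness.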

Abstract

Physics-informed diffusion models typically enforce PDE constraints only on final outputs, leaving intermediate representations unconstrained and prone to shortcut learning under shifted boundary conditions. We introduce **REPA-P**, a teacher-free, architecture-agnostic framework that aligns intermediate features with physical states using first-principles residuals. REPA-P attaches lightweight $1{\times}1$ projection heads to selected layers, decodes hidden activations into physical quantities, and applies PDE residual losses during training. These heads are discarded at inference, introducing **zero overhead**. Across four PDE tasks, including Darcy flow, topology optimization, electrostatic potential, and turbulent channel flow, REPA-P accelerates convergence by up to $2{\times}$, reduces physics residuals by up to $66.4\%$, and improves out-of-distribution robustness by up to $49.3\%$, with consistent gains on both U-Net and Diffusion Transformer backbones. Ablations show that supervising a small set of intermediate layers captures most benefits and complements output-level physics losses. Code is available at [https://github.com/Hxxxz0/REPA-P](https://github.com/Hxxxz0/REPA-P).
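The training-time auxiliary loss described in this abstract can be sketched for the electrostatic-potential case, where the governing equation is a Poisson equation. The sketch makes three assumptions:

- the hidden state is a C x H x W array;
- a 1x1 head is a per-pixel linear map over channels;
- the residual is Laplace(u) + f = 0 on interior points, using a 5-point finite-difference stencil.

Other tasks, such as Darcy flow, would need their own residual operators, and the paper's heads and loss weighting may differ. Function names are hypothetical.

```python
def project_1x1(hidden, weights, bias=0.0):
    """1x1 projection head: per-pixel linear map from C channels to one field.

    hidden: C x H x W nested lists; weights: length-C list.
    """
    C, H, W = len(hidden), len(hidden[0]), len(hidden[0][0])
    return [[sum(weights[c] * hidden[c][i][j] for c in range(C)) + bias
             for j in range(W)] for i in range(H)]

def poisson_residual_loss(u, f, h=1.0):
    """Mean squared residual of Laplace(u) + f = 0 over interior grid points.

    Uses the standard 5-point finite-difference Laplacian with spacing h.
    """
    H, W = len(u), len(u[0])
    sq, count = 0.0, 0
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            lap = (u[i + 1][j] + u[i - 1][j] + u[i][j + 1] + u[i][j - 1]
                   - 4.0 * u[i][j]) / h ** 2
            r = lap + f[i][j]
            sq += r * r
            count += 1
    return sq / count
```

The 5-point stencil is exact for quadratics, so u = i^2 + j^2 with f = -4 gives zero residual. That makes it a convenient sanity check. At inference the heads and this loss would simply be dropped, consistent with the zero-overhead claim.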

2605.20771 2026-05-21 cs.LG Updated version

Cumulative Meta-Learning from Active Learning Queries for Robustness to Spurious Correlations

Kin Whye Chew, Jingxian Wang

Affiliations: Department of Computer Science, National University of Singapore

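AI Summary: This paper proposes Cumulative Active Meta-Learning (CAML), a framework that uses active-learning queries to meta-learn a prior (inductive bias) that makes models more robust to spurious correlations. Experiments show significant improvements across multiple benchmarks.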

Comments Under review. 26 pages, 7 figures

Abstract

Spurious correlations in real-world datasets cause machine learning models to rely on irrelevant patterns, undermining reliability, generalization, and fairness. Active learning offers a promising way to address this failure mode by querying informative samples that distinguish core features from spurious ones. However, standard active-learning methods simply append queried examples to the labeled set, effectively updating only the likelihood term. In deep learning regimes, the influence of these informative samples can be diluted by the larger labeled set and memorized by overparameterized models. We propose Cumulative Active Meta-Learning (CAML), an active-learning framework that uses queried examples to meta-learn the prior, or inductive bias, governing how the model adapts. CAML casts each active-learning round as a meta-learning task: the current labeled set serves as meta-train data for adaptation, while the newly queried batch serves as meta-test data for evaluating generalization. Unlike conventional meta-learning, which treats tasks as independent and identically distributed, CAML exploits the sequential dependence between active-learning rounds by maintaining a cumulative inductive bias that is progressively refined. Theoretically, we show that this cumulative formulation introduces interaction terms that couple earlier meta-learned inductive biases with later query-induced objectives, capturing dependencies absent from standard meta-learning. Empirically, CAML improves minority-group accuracy across spurious-correlation benchmarks and acquisition strategies, with gains of up to 27.8% on Dominoes, 29.9% on Waterbirds, 14.3% on SpuCo, and 24.0% on CivilComments.
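A toy version of the round structure described in this abstract helps make it concrete. It rests on heavy simplifying assumptions:

- a one-parameter linear model y ≈ theta * x;
- a Gaussian prior whose centre theta0 plays the role of the inductive bias;
- closed-form MAP adaptation on the labeled set (meta-train);
- one meta-gradient step on theta0 from the loss on the newly queried batch (meta-test).

theta0 is carried across rounds rather than reset, which is the "cumulative" part. This illustrates the idea only; it is not CAML's actual objective, and all names are hypothetical.

```python
def adapt(theta0, data, lam):
    """MAP slope for y ~ theta * x under a Gaussian prior centred at theta0.

    Closed form of argmin sum (y - theta x)^2 + lam (theta - theta0)^2.
    Returns the adapted slope and d(theta)/d(theta0).
    """
    sxx = sum(x * x for x, _ in data)
    sxy = sum(x * y for x, y in data)
    theta = (sxy + lam * theta0) / (sxx + lam)
    return theta, lam / (sxx + lam)

def caml_round(theta0, labeled, queried, lam=1.0, lr=0.1):
    """Adapt on the labeled set, score on the queried batch, and take one
    meta-gradient step on the prior centre theta0."""
    theta, dtheta_dprior = adapt(theta0, labeled, lam)
    # derivative of sum (y - theta x)^2 over the queried batch w.r.t. theta
    dloss_dtheta = -2.0 * sum(x * (y - theta * x) for x, y in queried)
    theta0 = theta0 - lr * dloss_dtheta * dtheta_dprior  # chain rule
    return theta0, labeled + queried  # prior and labeled set carry forward
```

With noiseless data from y = 2x and a prior starting at 0, each round pulls the prior centre toward the true slope.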

2605.20767 2026-05-21 cs.CL cs.LG stat.ME Updated version

The Illusion of Intervention: Your LLM-Simulated Experiment is an Observational Study

Victoria Lin, Taedong Yun, Maja Matarić, John Canny, Arthur Gretton, Alexander D'Amour

Affiliations: Google DeepMind; Carnegie Mellon University

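AI Summary: This paper examines LLMs as simulators of human behavior. It shows that interventions on LLM-simulated synthetic users can cause unintended shifts in latent user attributes, producing user drift that distorts effect estimates. It proposes negative control outcomes to detect the resulting distribution shifts, and it mitigates drift by adjusting persona specifications to reduce bias.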

Abstract

Large language models (LLMs) show potential as simulators of human behavior, offering a scalable way to study responses to interventions. However, because LLMs are trained largely on observational data, interventions in experiments with LLM-simulated synthetic users can induce unintended shifts in latent user attributes, causing user drift where the implicit simulated population differs across treatment conditions, potentially distorting effect estimates. We formalize the confounding or selection bias that can arise due to user drift and show how intervention-dependent shifts can inflate or attenuate observed differences in user responses under intervention. To diagnose confounding, we propose using negative control outcomes--attributes that should remain invariant under intervention--to identify distribution shifts across intervention conditions, providing evidence of user drift. To mitigate drift, we study adjusting the persona specification by eliciting additional confounders, finding that targeted, setting-relevant confounders can substantially reduce bias across survey-style and multi-turn agent evaluations.
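A negative-control check can be run as an ordinary two-sample test on an attribute that the intervention should not change, such as a simulated user's stated age. The sketch below uses a permutation test on the difference in means. The choice of test, the example attribute and the function name are assumptions for illustration, not the paper's procedure.

```python
import random
import statistics

def negative_control_test(arm_a, arm_b, n_perm=2000, seed=0):
    """Permutation p-value for a difference in means of a negative-control
    attribute between two intervention arms.

    A small p-value suggests the simulated population differs across arms
    (user drift), because this attribute should not respond to the intervention.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(arm_a) - statistics.mean(arm_b))
    pooled = list(arm_a) + list(arm_b)
    n_a = len(arm_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign arm labels at random
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)
```

Identical arms give p = 1. Arms whose simulated users systematically differ in the control attribute give a p-value near 1 / (n_perm + 1).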

2605.20758 2026-05-21 cs.AI cs.CV cs.LG cs.RO Updated version

Conflict-Aware Additive Guidance for Flow Models under Compositional Rewards

Xuehui Yu, Fucheng Cai, Meiyi Wang, Xiaopeng Fan, Harold Soh

Affiliations: Smart Systems Institute, National University of Singapore, Singapore; Faculty of Computing, Harbin Institute of Technology, Harbin, China; School of Computing, National University of Singapore, Singapore

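AI Summary: This paper proposes Conflict-Aware Additive Guidance for flow models under compositional rewards. By dynamically detecting and resolving gradient conflicts, it rectifies off-manifold drift and improves generation fidelity.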

Comments Forty-Third International Conference on Machine Learning (ICML 2026)

Abstract

Inference-time guided sampling steers state-of-the-art diffusion and flow models without fine-tuning by interpreting the generation process as a controllable trajectory. This provides a simple and flexible way to inject external constraints (e.g., cost functions or pre-trained verifiers) for controlled generation. However, existing methods often fail when composing multiple constraints simultaneously, which leads to deviations from the true data manifold. In this work, we identify root causes of this off-manifold drift and find that the approximation error scales severely with gradient misalignment. Building on these findings, we propose Conflict-Aware Additive Guidance ($g^\text{car}$), a lightweight and learnable method, which actively rectifies off-manifold drift by dynamically detecting and resolving gradient conflicts. We validate $g^\text{car}$ across diverse domains, ranging from synthetic datasets and image editing to generative decision-making for planning and control. Our results demonstrate that $g^\text{car}$ effectively rectifies off-manifold drift, surpassing baselines in generation fidelity while using light compute. Code is available at https://github.com/yuxuehui/CAR-guidance.
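Detecting and resolving conflicts between per-constraint guidance gradients can be illustrated with PCGrad-style projection (Yu et al., 2020). PCGrad is swapped in here as a well-known stand-in; $g^\text{car}$ itself is learnable, and its update rule is not reproduced. A conflict is flagged when two gradients have a negative dot product, and the conflicting component is projected out before the gradients are summed. Unlike PCGrad, this sketch visits the other gradients in a fixed rather than random order. The function name is hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def combine_guidance(grads):
    """Sum per-constraint guidance gradients after removing pairwise conflicts.

    If g_i conflicts with g_j (negative dot product), the component of g_i
    along g_j is subtracted before summation.
    """
    projected = []
    for i, g in enumerate(grads):
        g = list(g)
        for j, other in enumerate(grads):
            if i == j:
                continue
            d = dot(g, other)
            if d < 0:  # conflict: the two guidance directions disagree
                scale = d / dot(other, other)
                g = [gk - scale * ok for gk, ok in zip(g, other)]
        projected.append(g)
    return [sum(components) for components in zip(*projected)]
```

Take two conflicting gradients [1, 0] and [-1, 1]. Their plain sum is [0, 1], which fully cancels the first constraint. The projected combination [0.5, 1.5] keeps progress on both constraints.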

2605.20756 2026-05-21 cs.LG cs.AI math.OC stat.ML Updated version

Correcting Stochastic Update Bias in Preconditioned Language Model Optimizers

Nikhil Nayak, Julia White, Urchade Zaratiana, Kelton Zhang, Henrijs Princis, Dhruv Atreja, Henry Fawcett, Matthew Thomas, George Hurn-Maloney, Ash Lewis

Affiliations: Fastino Labs

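AI Summary: This paper studies finite-sample bias in the stochastic update rules of preconditioned optimizers and proposes a single-batch bias-correction framework. Cross-fitted preconditioning reduces gradient-preconditioner coupling bias, and variance-corrected inversion reduces inversion bias. Together they improve the performance of preconditioned optimizers.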

Comments 32 pages, 3 figures, 13 tables

Abstract

Preconditioned optimizers are central to language model training, but their stochastic update rules are usually treated as direct approximations to population preconditioned descent. We show that this view misses two finite-sample biases. First, the gradient and preconditioner are typically estimated from the same minibatch, introducing gradient--preconditioner coupling bias. Second, even when the preconditioner estimate is unbiased, its inverse or inverse-root is generally biased because inversion is nonlinear. We propose a single-batch bias-correction framework that addresses both effects: cross-fitted preconditioning estimates the numerator and preconditioner from independent microbatch groups, while variance-corrected inversion uses microbatch variability to subtract the leading delta-method bias term. The framework applies to diagonal moment, diagonal curvature, and matrix preconditioning methods, instantiated in AdamW, Sophia, and Shampoo. Bias correction reduces held-out pretraining loss on Qwen2.5-0.5B by $0.15$, $0.07$, and $0.11$ nats, respectively; the effects on mixed-quality pretraining and downstream instruction tuning are consistently neutral-to-positive. Together, these results establish bias correction as a practical mechanism for reducing finite-sample update bias and improving the performance of preconditioned optimizers.
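Both corrections can be sketched for a single coordinate of an Adam-like update. The sketch assumes:

- per-microbatch gradients are given as scalars;
- the numerator uses even-indexed microbatches and the second-moment estimate uses odd-indexed ones (cross-fitting);
- the inverse square root is corrected with the second-order delta-method term. For f(v) = v^{-1/2}, E[f(v_hat)] ≈ f(v) + (3/8) Var(v_hat) v^{-5/2}.

Exponential moving averages and the epsilon term of real AdamW are omitted, so this is not the paper's implementation. The function names are hypothetical.

```python
import statistics

def corrected_inv_sqrt(v_hat, var_v_hat):
    """v_hat^{-1/2} minus the leading delta-method bias term.

    For f(v) = v^{-1/2}, f''(v) = (3/4) v^{-5/2}, so
    E[f(v_hat)] ~ f(v) + (3/8) Var(v_hat) v^{-5/2}. The term is evaluated at
    v_hat; the correction is only trustworthy when Var(v_hat) << v_hat^2.
    """
    return v_hat ** -0.5 - 0.375 * var_v_hat * v_hat ** -2.5

def cross_fit_adam_direction(micro_grads):
    """Cross-fitted, variance-corrected Adam-style direction for one coordinate.

    micro_grads: per-microbatch gradient estimates (at least 4 values).
    Even-indexed microbatches estimate the numerator (mean gradient) and
    odd-indexed ones estimate the second moment, so the two are independent.
    """
    group_a, group_b = micro_grads[0::2], micro_grads[1::2]
    numerator = statistics.mean(group_a)
    second_moments = [g * g for g in group_b]
    v_hat = statistics.mean(second_moments)
    # variance of the mean second moment, estimated from microbatch spread
    var_v_hat = statistics.variance(second_moments) / len(second_moments)
    return numerator * corrected_inv_sqrt(v_hat, var_v_hat)
```

When the microbatches agree, the variance term vanishes and the update reduces to the plain cross-fitted ratio.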

2605.20751 2026-05-21 cs.LG cs.AI cs.SY eess.SY Updated version

PACD-Net: Pseudo-Augmented Contrastive Distillation for Glycemic Control Estimation from SMBG

Canyu Lei, David Repaske, Jianxin Xie

Affiliations: University of Virginia, School of Data Science, Charlottesville, VA 22903, USA; University of Virginia, Department of Pediatrics, Charlottesville, VA 22903, USA

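AI Summary: This study proposes PACD-Net, a self-supervised contrastive distillation framework for estimating glycemic control metrics from sparse, irregularly sampled SMBG data. Pseudo-SMBG samples guide learning and improve the model's accuracy and stability.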

Abstract

Effective diabetes management requires continuous monitoring of glycemic levels. Clinically, glycemic control is assessed using metrics such as Time in Range (TIR), Time Below Range (TBR), and Time Above Range (TAR), typically derived from continuous glucose monitoring (CGM). However, many patients rely on self-monitoring of blood glucose (SMBG) due to the high cost and limited accessibility of CGM. Unlike CGM, SMBG provides sparse and irregular measurements, making accurate estimation of these metrics challenging. Conventional supervised learning approaches struggle under such sparsity, leading to poor generalization and unstable performance. To address this, we propose PACD-Net, a self-supervised contrastive knowledge distillation framework for estimating glycemic control from SMBG. Pseudo-SMBG samples with richer temporal coverage are used as teacher signals to guide learning from sparse observations. In addition, multi-view contrastive learning enforces representation consistency across diverse sampling patterns. The model adopts a hybrid Swin Transformer-CNN backbone to capture temporal dependencies in sparse SMBG sequences. Experimental results demonstrate that PACD-Net consistently outperforms existing methods in estimating TAR, TIR, and TBR from real-world SMBG data, achieving improved accuracy as well as enhanced stability and generalization under extremely sparse observation settings. The proposed framework provides a practical tool for clinical SMBG interpretation and offers a generalizable approach for learning from sparse and irregularly sampled sensor data in broader applications.
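The target metrics are simple to compute when readings are dense and evenly spaced, as with CGM. The difficulty the paper addresses is estimating them from a handful of SMBG readings. The sketch assumes the widely used 70-180 mg/dL target range. It also adds a hypothetical helper that mimics sparse finger-stick sampling by keeping k random CGM readings; this is not how the paper constructs its pseudo-SMBG samples.

```python
import random

def glycemic_metrics(readings_mg_dl, low=70.0, high=180.0):
    """Share of readings below, within and above the target range.

    With evenly spaced CGM readings, the share of readings approximates the
    share of time. Readings equal to `low` or `high` count as in range.
    """
    n = len(readings_mg_dl)
    tbr = sum(1 for g in readings_mg_dl if g < low) / n
    tar = sum(1 for g in readings_mg_dl if g > high) / n
    return {"TBR": tbr, "TIR": 1.0 - tbr - tar, "TAR": tar}

def pseudo_smbg(cgm_readings, k, seed=0):
    """Hypothetical helper: keep k randomly chosen CGM readings, in time order,
    to mimic sparse finger-stick measurements."""
    rng = random.Random(seed)
    idx = sorted(rng.sample(range(len(cgm_readings)), k))
    return [cgm_readings[i] for i in idx]
```

A day of 5-minute CGM data has 288 readings, while a patient doing SMBG may take only a few. Metrics computed naively from such a subsample can therefore be very noisy, which motivates learned estimators.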

2605.20745 2026-05-21 cs.LG cs.AI cs.CL Updated version

The Hidden Signal of Verifier Strictness: Controlling and Improving Step-Wise Verification via Selective Latent Steering

Yefan Zhou, Yilun Zhou, Austin Xu, Soroush Vosoughi, Shafiq Joty, Jiang Gui

Affiliations: Dartmouth College; Datadog AI Research; Salesforce AI Research

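AI Summary: This paper studies controlling verifier strictness through hidden-state intervention. The proposed VerifySteer uses latent correctness signals for sample-level routing and intervenes selectively at paragraph boundaries. It outperforms baselines on ProcessBench and Hard2Verify while requiring less inference compute.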

Abstract

Generative verifiers have emerged as a promising paradigm for step-wise verification, but their verification behavior is often poorly calibrated: they may be under-critical and miss erroneous steps, or over-critical and reject correct reasoning. We refer to this tendency to be overly lenient or overly critical as verifier strictness. In this work, we study whether verifier strictness can be controlled through hidden-state intervention. We uncover a verification-specific hidden-state signal: in step-wise verification, a verifier's tendency to accept or reject a solution step is encoded near the boundary of the corresponding verification paragraph. Exploiting this signal, we show that hidden-state steering can directly modulate verifier strictness without fine-tuning. However, uniform steering induces a trade-off between error detection and correctness certification. To address this, we propose VerifySteer, which exploits latent correctness signals for sample-level routing and selectively intervenes on paragraph boundaries. Experiments on ProcessBench and Hard2Verify show that VerifySteer outperforms prompt optimization and activation steering baselines, and is competitive with self-consistency while requiring 4-7x less inference compute. VerifySteer is also complementary to verification fine-tuning, providing further gains on top of fine-tuned verifiers. The code is available at https://github.com/YefanZhou/VerifySteer.
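The steering step can be sketched with a difference-of-means direction, a common activation-steering recipe assumed here rather than taken from the paper. The direction is the mean hidden state of rejected steps minus that of accepted steps. A scaled copy is then added only at paragraph-boundary positions, and only for samples the router selects. The sign convention (positive alpha makes the verifier stricter) and the boolean router are hypothetical.

```python
def mean_vector(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def strictness_direction(reject_states, accept_states):
    """Difference-of-means steering vector: mean(reject) - mean(accept)."""
    r, a = mean_vector(reject_states), mean_vector(accept_states)
    return [ri - ai for ri, ai in zip(r, a)]

def steer(hidden, boundary_positions, direction, alpha, route):
    """Add alpha * direction at paragraph-boundary positions only, and only
    when the (hypothetical) routing decision `route` is True."""
    if not route:
        return [list(h) for h in hidden]
    out = []
    for t, h in enumerate(hidden):
        if t in boundary_positions:
            h = [x + alpha * d for x, d in zip(h, direction)]
        out.append(list(h))
    return out
```

Restricting the edit to routed samples and boundary positions is what avoids the uniform-steering trade-off described in the abstract. Samples the router leaves alone keep their original hidden states.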

2605.20744 2026-05-21 cs.LG cs.AI Updated version

Hack-Verifiable Environments: Towards Evaluating Reward Hacking at Scale

Amit Roth, Ankur Samanta, Matan Halevy, Yoav Levine, Yonathan Efroni

Affiliations: Tel Aviv University; Columbia University; Taso Labs

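AI Summary: This paper proposes a new evaluation approach for reward hacking that embeds detectable reward-hacking opportunities directly into environments, making measurement more reliable and automated. Using a TextArena-based testbed, it analyzes the reward-hacking behavior of different language models across diverse environments.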

Comments Project Page - https://majoroth.github.io/hack-verifiable-environments/

Abstract

Aligning autonomous agents with human intent remains a central challenge in modern AI. A key manifestation of this challenge is reward hacking, whereby agents appear successful under the evaluation signal while violating the intended objective. Reward hacking has been observed across a wide range of settings, yet methods for reliably measuring it at scale remain lacking. In this work, we introduce a new evaluation paradigm for measuring reward hacking. Whereas prior studies have primarily analyzed it post hoc by inspecting agent trajectories, we instead embed detectable reward hacking opportunities directly into environments. This makes their exploitation verifiable by design, enabling deterministic and automated measurement of whether and how agents exploit such vulnerabilities. We instantiate this approach in $\textit{TextArena}$ and release $\textit{Hack-Verifiable TextArena}$, a testbed in which reward hacking can be measured reliably. Using this benchmark, we analyze reward hacking behavior across language models in diverse environments and settings. We open source the code at https://github.com/MajoRoth/hack-verifiable-environments/.
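The core idea of making exploitation verifiable by design can be shown with a toy environment. Everything here is hypothetical and unrelated to the TextArena API. A number-guessing game exposes a planted "peek" action that leaks the answer. The environment logs every use, so the hack rate is counted deterministically instead of being inferred from transcripts.

```python
class GuessNumberEnv:
    """Toy environment with a planted, logged exploit.

    The intended task is to guess a secret number. The "peek" action reveals
    the answer; an agent that uses it can still win, but the use is recorded.
    """

    def __init__(self, secret):
        self.secret = secret
        self.hacked = False

    def step(self, action):
        if action == "peek":
            self.hacked = True  # hack detected by construction
            return str(self.secret), 0.0
        if action == str(self.secret):
            return "correct", 1.0
        return "wrong", 0.0

def hack_rate(agent, secrets):
    """Fraction of episodes in which `agent` used the planted exploit."""
    hacks = 0
    for secret in secrets:
        env = GuessNumberEnv(secret)
        agent(env)
        hacks += env.hacked  # bool counts as 0 or 1
    return hacks / len(secrets)
```

Both a cheating agent and an honest one eventually earn the reward here. Only the environment's own log separates them, without any trajectory inspection.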

2605.20740 2026-05-21 cs.LG cs.AI cs.CL Updated version

Distribution-Aware Reward: Reinforcement Learning over Predictive Distributions for LLM Regression

Jungsoo Park, Hyungjoo Chae, Ethan Mendes, Jay DeYoung, Varsha Kishore, Wei Xu, Alan Ritter

Affiliations: Georgia Institute of Technology; Allen Institute for AI

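AI Summary: This paper proposes Distribution-Aware Reward, a reinforcement learning method over predictive distributions. Its goal is to improve the quality of a language model's predictive distribution on regression tasks, rather than only optimizing individual decoded outputs. It scores the distribution of multiple decoded samples with the Continuous Ranked Probability Score and assigns credit according to each rollout's marginal contribution to distribution quality. This rewards predictions that are both accurate and appropriately dispersed. Experiments show that it outperforms supervised fine-tuning and pointwise reinforcement learning baselines across tasks, including a 6-point Spearman improvement on KBSS.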

Comments 21 pages, 5 figures

Abstract

Large language models can predict real-valued quantities from heterogeneous inputs such as text, code, and molecular strings, but most training objectives score each decoded floating-point number independently, improving point estimates without ensuring calibrated predictive distributions. This limits applications requiring candidate ranking or uncertainty estimation. We introduce Distribution-Aware Reward, an on-policy reinforcement learning objective whose main contribution is to train language models to produce better predictive distributions for regression tasks, rather than only optimizing individual decoded outputs against scalar targets. Our method treats multiple decoded samples as an empirical predictive distribution, evaluates it with the Continuous Ranked Probability Score, and assigns leave-one-out credit based on each rollout's marginal contribution to distribution quality, rewarding predictions that are both accurate and appropriately dispersed. We evaluate our method on a controlled Gaussian-mixture task, code performance prediction, and molecular property prediction from SMILES strings. Across tasks, our method improves over supervised fine-tuning and pointwise reinforcement learning baselines, with strong rank-correlation gains, including a 6-point Spearman improvement on KBSS. On MoleculeNet, it uses only SMILES strings yet remains competitive with strong graph-based and 3D molecular models. Further analyses show that our method mitigates rollout diversity collapse and improves uncertainty diagnostics, suggesting that directly optimizing predictive distributions makes language model regression more robust and better calibrated.
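The reward computation can be sketched directly from the description in this abstract. A group of decoded numbers is treated as an empirical distribution and scored with the CRPS. Each rollout then receives leave-one-out credit equal to how much the CRPS worsens when that rollout is dropped. The sketch uses the standard ensemble form of the CRPS with a 1/(2m^2) spread term; the paper may use a different estimator or normalisation. Function names are hypothetical.

```python
def crps_ensemble(samples, y):
    """CRPS of the empirical distribution of `samples` at observation y.

    CRPS = mean_i |x_i - y| - 0.5 * mean_{i,j} |x_i - x_j|  (lower is better).
    """
    m = len(samples)
    accuracy = sum(abs(x - y) for x in samples) / m
    spread = sum(abs(a - b) for a in samples for b in samples) / (2 * m * m)
    return accuracy - spread

def leave_one_out_credit(samples, y):
    """Credit for rollout i: increase in CRPS when rollout i is removed.

    Requires at least two samples.
    """
    full = crps_ensemble(samples, y)
    return [crps_ensemble(samples[:i] + samples[i + 1:], y) - full
            for i in range(len(samples))]
```

Consider samples [1, 2, 3] with target 2. The central sample receives the most credit (about 0.28, versus about 0.03 for each outer one), because without it every remaining sample is 1 away from the target.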

2605.20723 2026-05-21 cs.LG Updated version

Memory-Efficient Partitioned DNN Inference on Resource-Constrained Android Crowds

Lakshani Manamperi, Disumi Pathirana, Thiwanka Pathirana, Nipun Premarathna, Kutila Gunasekera

Affiliations: Department of Computer Science and Engineering, University of Moratuwa, Moratuwa, Sri Lanka

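AI Summary: This paper presents a method for memory-efficient DNN inference on resource-constrained Android devices. Five mechanisms distribute memory pressure across multiple devices, enabling ONNX inference without model modification while limiting battery draw and reducing batch latency.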

Comments 6 pages, 3 figures, 4 tables. Accepted at the ICML 2026 Workshop on Machine Learning for the Global South

Abstract

Deploying large deep neural networks on memory-constrained mobile devices is a central challenge in edge ML. While compression, pruning, and quantization reduce per-parameter cost, transformer-based models remain too large for the 3.3-7.4 GB RAM envelope of commodity Android handsets. We present the DNN pipeline scheduling subsystem of CROWDio, which achieves practical ONNX inference across resource-constrained Android workers without model modification, by distributing memory pressure across devices via five mechanisms: JIT deferred partition loading, a single-partition-resident constraint, a 4-tier affinity scheduler, a zlib-compressed tensor transport, and a streaming 1:1 dependency model. Evaluated on DistilBERT (Sanh et al., 2019) (approximately 67 M parameters, SST-2) across five Android handsets over ten runs, our system holds peak per-device RSS to 43±2 MB and limits battery draw to 50±3 mAh per run, while streaming concurrency cuts batch latency 34% below barrier synchronisation.
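Two of the five mechanisms can be sketched in a single process: deferred partition loading with a single resident partition, and compressed tensor transport. This is an illustration under assumptions. Tensors are flattened float32 lists serialised with `struct`, and partitions are plain callables returned by loader functions. The network, the ONNX Runtime sessions and the affinity scheduler are left out. Function names are hypothetical.

```python
import struct
import zlib

def pack_tensor(values, level=6):
    """Serialise a flat float32 tensor (little-endian) and zlib-compress it."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return zlib.compress(raw, level)

def unpack_tensor(blob):
    """Inverse of pack_tensor: decompress and decode float32 values."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

def run_pipeline(x, partition_loaders):
    """Run model partitions in order, loading each one just in time.

    Each loader returns a callable partition; the reference is dropped before
    the next loader runs, so at most one partition is held here at a time.
    """
    for load in partition_loaders:
        partition = load()  # deferred: nothing is loaded ahead of use
        x = partition(x)
        del partition
    return x
```

In a distributed setting, the output of each stage would go through `pack_tensor` before being sent to the worker holding the next partition.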

2605.20722 2026-05-21 cs.LG cs.AI Updated version

AGPO: Adaptive Group Policy Optimization with Dual Statistical Feedback

Miaobo Hu, Shuhao Hu, Bokun Wang, Ruohan Wang, Xin Wang, Xiaobo Guo, Daren Zha, Jun Xiao

Affiliations: Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences

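AI Summary: This paper proposes AGPO, a critic-free refinement of GRPO that uses group-level statistics to control both update magnitude and exploration. On nine English and Chinese math/STEM benchmarks, Qwen2.5-14B trained with AGPO outperforms PPO/GRPO under the same generated-token budget, reaching 67.3% on GSM8K and 40.5% on MATH.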

Abstract

Reinforcement learning improves LLM reasoning, but PPO/GRPO typically use fixed clipping and decoding temperature, which makes training brittle and tuning-heavy. We propose Adaptive Group Policy Optimization (AGPO), a critic-free refinement of GRPO that uses group-level statistics to control both update magnitude and exploration. AGPO uses a shared probe-derived statistical state to drive two controllers: (i) adaptive clipping, which sets the trust-region size from reward dispersion and skewness, probe vote entropy, policy entropy, and step-wise KL drift; and (ii) bidirectional adaptive temperature sampling, which heats or cools decoding around a base temperature according to centered uncertainty relative to a running baseline. On nine English and Chinese math/STEM benchmarks, Qwen2.5-14B trained with AGPO outperforms PPO/GRPO under the same generated-token budget, reaching 67.3% on GSM8K and 40.5% on MATH. Gains transfer to Llama-3-8B and Gemma-2-9B, and ablations confirm both modules are complementary. Our implementation is publicly available at https://github.com/wandugu/paper_agpo.
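
As a rough illustration of where AGPO's first controller plugs in, the sketch below computes GRPO-style group-standardized advantages and the PPO-style clipped term with a per-group clip range. The function `toy_adaptive_eps` is a hypothetical placeholder that uses only reward dispersion; the abstract does not give AGPO's actual mapping from its statistics to the trust-region size, and the temperature controller is not shown.

```python
import statistics


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize rewards within one group of sampled responses."""
    mu = statistics.fmean(rewards)
    sd = statistics.pstdev(rewards)
    return [(r - mu) / (sd + eps) for r in rewards]


def clipped_surrogate(ratio, adv, clip_eps):
    """PPO/GRPO clipped term for one token: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * adv, clipped * adv)


def toy_adaptive_eps(rewards, base=0.2, lo=0.05, hi=0.3):
    """HYPOTHETICAL controller: scale the clip range with reward dispersion, then bound it.

    AGPO's real controller also uses skewness, probe vote entropy, policy entropy
    and KL drift; none of that is modelled here.
    """
    spread = statistics.pstdev(rewards)  # 0/1 rewards give a spread in [0, 0.5]
    return min(max(base * (0.5 + spread), lo), hi)


rewards = [1, 0, 0, 1]  # binary correctness rewards for four sampled answers
adv = group_advantages(rewards)
eps = toy_adaptive_eps(rewards)
print([round(a, 3) for a in adv], eps)
print(clipped_surrogate(1.5, adv[0], eps))
```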

2605.20721 2026-05-21 cs.LG (updated version)

Robust Recommendation from Noisy Implicit Feedback: A GMM-Weighted Bayes-label Transition Matrix Framework

Zongyu Li, Xuanyu Liu, Gongce Cao, Shirui Sun, Yaqi Fang, Yongshuai Yu

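Affiliations: Guangdong University of Technology; University of Chinese Academy of Sciences; Capital Normal University; Xiamen University; Beijing Jiaotong University

AI summary: This paper proposes a Robust GMM-weighted Bayes-label Transition Matrix framework (RGBT), which uses a Gaussian mixture model to produce instance-specific reliability scores that systematically calibrate the Bayes-label transition matrix estimate and reduce its bias. The method uses all samples while achieving consistent estimation and a significant reduction in estimation variance.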

Abstract

Learning from implicit feedback in recommender systems is fundamentally challenged by pervasive label noise. While conventional denoising approaches often discard noisy instances to ensure robustness, this strategy inevitably suffers from low data utilization. Alternative methods that employ a Bayes-label transition matrix (BLTM) can leverage all available data, but their estimates tend to be biased in practical recommendation scenarios. To address these limitations, this paper proposes a Robust GMM-weighted Bayes-label Transition Matrix framework (RGBT). Our solution utilizes a Gaussian Mixture Model (GMM) to derive instance-specific reliability scores, which systematically calibrate the BLTM estimation to mitigate bias. Theoretical analysis confirms that our approach, by leveraging the BLTM framework with GMM calibration, simultaneously ensures full sample utilization, delivers consistent estimation, and critically, achieves a significant reduction in estimation variance. Extensive experiments on multiple real-world and synthetically flipped datasets demonstrate that RGBT not only utilizes noisy samples more effectively than mainstream reliable sample-based denoising methods, but also achieves significantly superior calibration capability of the transition matrix compared to state-of-the-art transition matrix-based denoising approaches.
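
One common way to turn per-instance statistics into reliability scores is to fit a two-component Gaussian mixture and read off the posterior of the "clean" component. The sketch below does this with plain EM; fitting the GMM to per-interaction training losses (low loss meaning likely clean) is an assumption, and the subsequent calibration of the BLTM estimate is not shown.

```python
import math


def _normal_pdf(x, mu, var):
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(xs, iters=100, min_var=1e-6):
    """Fit a two-component 1-D Gaussian mixture with plain EM; returns (weights, means, variances)."""
    lo, hi = min(xs), max(xs)
    pi, mu = [0.5, 0.5], [lo, hi]  # start one component at each extreme
    var = [max(min_var, (hi - lo) ** 2 / 16)] * 2
    for _ in range(iters):
        # E-step: responsibility of each component for each sample
        resp = []
        for x in xs:
            dens = [pi[k] * _normal_pdf(x, mu[k], var[k]) for k in range(2)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate mixture weights, means and variances
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12
            pi[k] = nk / len(xs)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(min_var, sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk)
    return pi, mu, var


def reliability(x, pi, mu, var):
    """Posterior probability that x came from the lower-mean (assumed clean) component."""
    dens = [pi[k] * _normal_pdf(x, mu[k], var[k]) for k in range(2)]
    clean = 0 if mu[0] <= mu[1] else 1
    return dens[clean] / (sum(dens) or 1e-300)


# Hypothetical per-interaction training losses: small for clean, large for noisy labels.
losses = [0.10, 0.12, 0.09, 0.11, 0.95, 1.00, 0.90]
params = fit_gmm_1d(losses)
weights = [reliability(x, *params) for x in losses]
print([round(w, 3) for w in weights])
```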

2605.20713 2026-05-21 cs.CV cs.AI cs.LG (updated version)

SAVER: Selective As-Needed Vision Evidence for Multimodal Information Extraction

Miaobo Hu, Shuhao Hu, Bokun Wang, Rui Chen, Xin Wang, Xiaobo Guo, Daren Zha, Jun Xiao

Affiliations: Institute of Information Engineering, Chinese Academy of Sciences; University of Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences

AI summary: The study proposes SAVER, a framework that consults visual evidence selectively to improve multimodal named entity recognition and relation extraction, reducing computational overhead while improving accuracy.

Abstract

Multimodal information extraction (IE) in social media is difficult because a post may attach multiple images that are weakly related, redundant, or even misleading with respect to the text. In this setting, always-on multimodal fusion wastes computation and can amplify spurious visual cues. The core challenge is to decide, for each candidate span or marked entity pair, whether vision should be consulted at all and, if so, which small subset of images provides trustworthy evidence. We propose SAVER, a selective vision-as-needed framework for multimodal named entity recognition (MNER) and multimodal relation extraction (MRE). SAVER uses a Conformal Groundability Gate (CGG) to estimate span-level visual groundability in MNER, derive pair-level activation in MRE from the two marked entities, and calibrate the activation threshold on a held-out split via a conformal-style procedure with Clopper-Pearson upper bounds. When activated, a submodular relevance-diversity selector chooses a compact evidence subset across images, which is then aggregated by a Set Transformer. An energy-inspired joint scoring head combines text, optional visual evidence, text-image consistency, and sparse routing for entity typing or relation classification. Experiments show that SAVER consistently improves F1 over strong text-only and always-on multimodal baselines, while reducing the area under the risk-coverage curve (AURC), increasing activation coverage at a fixed risk level, and lowering FLOPs and P90 latency.
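
The Clopper-Pearson bound used in SAVER's threshold calibration is a standard exact binomial bound. The sketch below computes a one-sided upper bound by bisection on the binomial CDF, then scans candidate thresholds on a held-out split. The definition of an "error" and the helper `calibrate_threshold` are hypothetical; this is not SAVER's Conformal Groundability Gate itself.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def cp_upper(k, n, alpha=0.05, iters=200):
    """One-sided (1 - alpha) Clopper-Pearson upper bound for k errors in n trials."""
    if k >= n:
        return 1.0
    lo, hi = 0.0, 1.0  # binom_cdf(k, n, p) falls from 1 at p = 0 to 0 at p = 1
    for _ in range(iters):
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


def calibrate_threshold(scores, errors, risk=0.1, alpha=0.05):
    """HYPOTHETICAL: lowest gate threshold whose activated set has a certified error rate <= risk."""
    for tau in sorted(set(scores)):
        active = [e for s, e in zip(scores, errors) if s >= tau]
        if active and cp_upper(sum(active), len(active), alpha) <= risk:
            return tau
    return None  # no threshold meets the target risk on this split


# Hypothetical held-out split: gate score per item, and 1 if consulting vision was an error.
scores = [0.9] * 60 + [0.2] * 20
errors = [0] * 60 + [1] * 20
print(cp_upper(0, 10), calibrate_threshold(scores, errors, risk=0.1))
```

Scanning from the lowest threshold upward returns the setting with the widest activation coverage that still passes the bound.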

2605.20696 2026-05-21 cs.LG (updated version)

Distributed Direct Preference Optimization

Zhanhong Jiang

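Affiliations: Translational AI Center, Iowa State University, Ames, USA

AI summary: This paper analyzes the convergence and time complexity of Direct Preference Optimization (DPO) in distributed settings, examines how fragmentation of preference data in federated and decentralized learning affects the optimization dynamics, and proposes robust, scalable implementations with theoretical guarantees.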

Comments 29 pages, 12 figures

Abstract

Preference-based reinforcement learning (RL) is a key paradigm for aligning policies with human judgments, yet its theoretical behavior in distributed settings where preference data are fragmented across heterogeneous users remains poorly understood. Direct Preference Optimization (DPO) avoids explicit reward modeling but lacks convergence guarantees under federated and decentralized training, where communication constraints and non-IID preferences fundamentally alter optimization dynamics. We provide the first convergence and time-complexity analysis of DPO in distributed environments. Modeling personalized offline RL with user-specific preference distributions, we characterize the induced global optimization landscape. For federated DPO, we derive convergence rates that quantify the impact of client drift, communication frequency, and preference heterogeneity; for decentralized DPO, we establish convergence over general communication graphs and show how spectral connectivity governs optimization speed and consensus. Empirically, we corroborate our theoretical insights on standard alignment benchmarks, demonstrating that our proposed methods not only enjoy strong theoretical guarantees but also deliver robust and scalable performance in practice. The code base is available here.
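
For readers new to the setting, the sketch below shows the standard per-pair DPO loss together with the two aggregation patterns the abstract contrasts: server-side weighted averaging (federated, FedAvg-style) and one gossip step with a doubly stochastic mixing matrix (decentralized). These are generic textbook components, not the paper's algorithms; the names `fedavg` and `gossip_step` are illustrative.

```python
import math


def dpo_loss(pol_w, ref_w, pol_l, ref_l, beta=0.1):
    """Standard DPO loss for one pair, from sequence log-probs of chosen (w) and rejected (l) responses."""
    margin = beta * ((pol_w - ref_w) - (pol_l - ref_l))
    # -log(sigmoid(margin)) == softplus(-margin); the branch avoids overflow for very negative margins
    return math.log1p(math.exp(-margin)) if margin > -30 else -margin


def fedavg(client_params, client_sizes):
    """Federated setting: the server averages client parameter vectors, weighted by local data size."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [sum(p[j] * n for p, n in zip(client_params, client_sizes)) / total for j in range(dim)]


def gossip_step(node_params, mixing):
    """Decentralized setting: node i replaces its vector with sum_j mixing[i][j] * node_params[j]."""
    dim = len(node_params[0])
    return [
        [sum(w * node_params[j][d] for j, w in enumerate(row)) for d in range(dim)]
        for row in mixing
    ]


print(dpo_loss(-1.0, -2.0, -3.0, -2.0))       # policy favours the chosen answer more than the reference
print(fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

With a doubly stochastic mixing matrix, each gossip step preserves the network-wide average; how quickly the nodes agree is governed by the matrix's spectral gap, which is the "spectral connectivity" the abstract refers to.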

2605.20689 2026-05-21 cs.CL cs.AI cs.IR cs.LG (updated version)

DIVE: Embedding Compression via Self-Limiting Gradient Updates

Dongfang Zhao

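Affiliations: School of Engineering and Technology, University of Washington Tacoma

AI summary: This paper proposes DIVE, which uses a self-limiting triplet loss and a head-wise NT-Xent contrastive loss to counter the overfitting that scarce labeled data causes in embedding compression, improving retrieval performance.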

Abstract

High-dimensional embeddings from large language models impose significant storage and computational costs on vector search systems. Recent embedding compression methods, including Matryoshka-Adaptor (EMNLP 2024), Search-Adaptor (ACL 2024), and SMEC (EMNLP 2025), enable dimensionality reduction through lightweight residual adapters, but their training objectives cause severe overfitting when labeled data is scarce, degrading retrieval performance below the frozen baseline. We propose DIVE (Dimensionality reduction with Implicit View Ensembles), a compression adapter that addresses this failure through two mechanisms. First, a self-limiting hinge-based triplet loss produces zero gradient once a triplet satisfies the margin constraint, bounding the total perturbation applied to the pretrained embedding space. Second, a head-wise NT-Xent contrastive loss treats multiple learned projections of each embedding as implicit views, providing dense self-supervised gradients that compensate for the sparsity of the triplet signal on small datasets. Across six BEIR datasets, DIVE outperforms all three baseline adapters on every dataset and at every evaluated compression ratio, with a 14M-parameter open-source implementation.
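
Both loss ingredients described above can be written in a few lines. In this minimal sketch, `triplet_hinge` returns exactly zero once the margin holds (so it contributes no gradient), and `nt_xent` is the usual normalized-temperature cross-entropy over paired views. Treating the outputs of two projection heads as the paired views, and the margin and temperature values, are assumptions rather than DIVE's settings.

```python
import math


def _dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _cos(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def triplet_hinge(anchor, pos, neg, margin=0.2):
    """max(0, d(a, p) - d(a, n) + margin): exactly 0, hence zero gradient, once the margin holds."""
    return max(0.0, _dist(anchor, pos) - _dist(anchor, neg) + margin)


def nt_xent(views_a, views_b, tau=0.5):
    """NT-Xent over 2N vectors: views_a[i] and views_b[i] are positives, all other vectors negatives."""
    z = views_a + views_b
    n = len(views_a)
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)  # index of i's positive partner
        denom = sum(math.exp(_cos(z[i], z[k]) / tau) for k in range(2 * n) if k != i)
        total += -math.log(math.exp(_cos(z[i], z[j]) / tau) / denom)
    return total / (2 * n)


# Two hypothetical head projections ("views") of the same two embeddings.
head1 = [[1.0, 0.0], [0.0, 1.0]]
head2 = [[1.0, 0.1], [0.1, 1.0]]
print(triplet_hinge([0, 0], [0.1, 0], [1, 0]))             # margin already satisfied
print(nt_xent(head1, head2), nt_xent(head1, head2[::-1]))  # matched views give the lower loss
```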

2605.20687 2026-05-21 eess.IV cs.LG (updated version)

Motion-Robust Deep Reconstruction for Free-Breathing Cardiac Cine MRI

Mahmut Yurt, Kanghyun Ryu, Zhitao Li, Xucheng Zhu, Xianglun Mao, Martin Janich, Marcus Alley, Kawin Setsompop, John Pauly, Shreyas Vasanawala, Ali Syed

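Affiliations: Stanford University; KIST; GE Healthcare

AI summary: This paper proposes Cine-DL, a framework that combines targeted k-space preprocessing with fast model-based deep reconstruction to address the motion and streak artifacts that hamper free-breathing radial acquisition at high acceleration, making clinical use more practical.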

Abstract

Conventional cardiac cine MRI relies on breath-hold Cartesian acquisitions, which are vulnerable to motion artifacts and can be uncomfortable or infeasible, particularly for pediatric and other noncompliant patients who cannot reliably hold their breath. Free-breathing radial acquisitions can alleviate these limitations, but robust reconstruction at high acceleration remains challenging due to prominent streak artifacts. To address these limitations, we propose Cine-DL, a clinically oriented framework that couples targeted k-space preprocessing with fast, model-based deep reconstruction. In this pipeline, raw free-breathing radial data undergo retrospective cardiac binning and respiratory gating to resolve cardiac phases and discard motion-corrupted spokes. We then introduce Streak Optimized Coil Compression (SOC), which explicitly preserves cardiac signals while suppressing peripheral interference that typically drives the streak artifacts. The resulting 2D+t cine series is reconstructed with an unrolled network that alternates a ResNet proximal operator with physics-based data consistency updates solved via conjugate gradient. We further employ a memory-efficient training strategy that reduces peak memory usage. We evaluate Cine-DL on free-breathing volunteer data against established baselines (k-t SENSE and iGRASP) and demonstrate clinical translation via hospital deployment on newly acquired patient data. Our experiments show that Cine-DL consistently improves quantitative metrics and visual fidelity, supporting a practical route toward routine, time-sensitive clinical adoption of free-breathing cine MRI.
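
The data-consistency step mentioned above is typically a regularized least-squares problem solved by conjugate gradient. The sketch below solves (A^T A + lam*I) x = A^T y + lam*z for a small real matrix A standing in for the (complex, multi-coil, non-Cartesian) encoding operator, with z playing the role of the learned proximal network's output. The real-valued setting and the helper names are simplifying assumptions.

```python
def _matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(apply_op, b, iters=100, tol=1e-24):
    """Solve apply_op(x) = b for a symmetric positive-definite linear operator."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x, with x = 0
    p = list(r)  # first search direction
    rs = _dot(r, r)
    for _ in range(iters):
        if rs <= tol:  # squared residual norm small enough
            break
        ap = apply_op(p)
        alpha = rs / _dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = _dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def data_consistency(A, y, z, lam):
    """Solve (A^T A + lam*I) x = A^T y + lam*z: match measurements y, stay close to denoiser output z."""
    At = [list(col) for col in zip(*A)]

    def normal_op(v):
        return [a + lam * vi for a, vi in zip(_matvec(At, _matvec(A, v)), v)]

    rhs = [a + lam * zi for a, zi in zip(_matvec(At, y), z)]
    return conjugate_gradient(normal_op, rhs)


A = [[1, 0], [0, 1], [1, 1]]  # toy "encoding operator": 3 measurements of a 2-pixel image
y = [1, 2, 3]                 # measured data
z = [0.0, 0.0]                # output of the learned proximal network
print(data_consistency(A, y, z, lam=1.0))
```

In an unrolled network this solve alternates with the learned proximal step, and lam sets how strongly each iterate is pulled toward the network's output.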

2605.20681 2026-05-21 stat.ME cs.LG (updated version)

Scale-Calibrated Median-of-Means for Robust Distributed Principal Component Analysis

Kisung You

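Affiliations: Baruch College; The Graduate Center, City University of New York

AI summary: This paper studies a scale-calibrated median-of-means estimator for robust distributed PCA that uses the product geometry of Euclidean space and the Grassmann manifold. It derives a node-level PCA expansion, proves that the proposed product-manifold median-of-means estimator is asymptotically equivalent to a scaled spatial median of node influence errors, proposes robust block-scale and inference-optimal calibration rules, and establishes high-probability median-of-means bounds.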

Abstract

Distributed principal component analysis (PCA) produces node-level estimates of both a mean vector and a principal subspace. Robustly aggregating these heterogeneous objects requires a relative scale between mean error and subspace error. We study a scale-calibrated median-of-means estimator for this problem using the product geometry of Euclidean space and the Grassmann manifold. A node-level PCA expansion shows that the mean component has the usual linear influence, whereas the subspace component is an eigengap-weighted covariance perturbation. We prove a local reduction showing that the proposed product-manifold median-of-means estimator is asymptotically equivalent to a scaled spatial median of node influence errors. This yields fixed-node non-Gaussian limits, growing-node Gaussian limits with finite-block bias, and an explicit scale-dependent covariance formula. We propose robust block-scale and inference-optimal calibration rules, establish high-probability median-of-means bounds, characterize factorwise bad-node influence, and prove node-bootstrap validity. Simulations and large-scale single-cell RNA-seq data show that scale calibration adapts to eigengap-driven subspace uncertainty and provides a robust distributed PCA summary.
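
The median-of-means idea can be sketched for the Euclidean (mean-vector) component alone: each node's estimate acts as a block mean, and the server takes their spatial median, computed here with Weiszfeld's iteration. The Grassmann subspace component and the scale calibration between the two components, which are the paper's focus, are not modelled.

```python
import math


def spatial_median(points, iters=500, eps=1e-12):
    """Geometric (spatial) median in R^d via Weiszfeld's iteration, started at the coordinate-wise mean."""
    d = len(points[0])
    est = [sum(p[j] for p in points) / len(points) for j in range(d)]
    for _ in range(iters):
        # inverse-distance weights; eps guards against an estimate sitting on a data point
        weights = [1.0 / max(math.dist(p, est), eps) for p in points]
        total = sum(weights)
        est = [sum(w * p[j] for w, p in zip(weights, points)) / total for j in range(d)]
    return est


def median_of_means(samples, n_blocks):
    """Median-of-means: average within disjoint blocks, then take the spatial median of the block means."""
    size = len(samples) // n_blocks
    blocks = [samples[b * size:(b + 1) * size] for b in range(n_blocks)]
    means = [[sum(x[j] for x in blk) / len(blk) for j in range(len(blk[0]))] for blk in blocks]
    return spatial_median(means)


# Five honest nodes report mean vectors near (1, 1); two corrupted nodes report (100, 100).
node_means = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.05, 1.0], [0.95, 1.0],
              [100.0, 100.0], [100.0, 100.0]]
robust = spatial_median(node_means)
naive = [sum(p[j] for p in node_means) / len(node_means) for j in range(2)]
print(robust, naive)
```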

2605.20678 2026-05-21 cs.LG cs.AI (updated version)

Dynamic TMoE: A Drift-Aware Dynamic Mixture of Experts Framework for Non-Stationary Time Series Forecasting

Jiawen Zhu, Shuhan Liu, Di Weng, Yingcai Wu

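Affiliations: School of Software Technology, Zhejiang University, Ningbo, China; State Key Lab of CAD&CG, Zhejiang University, Hangzhou, China

AI summary: This paper proposes Dynamic TMoE, which optimizes model capacity by dynamically instantiating heterogeneous experts and pruning redundant ones, and uses a temporal memory router for stable, context-aware expert selection, improving non-stationary time series forecasting.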

Comments 27 pages, 7 figures. Accepted to ICML 2026

Abstract

Non-stationary time series forecasting is challenged by evolving distribution shifts that static models struggle to capture. While Mixture-of-Experts (MoE) architectures offer a promising paradigm for decoupling complex drift patterns, existing approaches are limited by fixed expert pools and memoryless routing, hampering their ability to adapt to abrupt regime shifts. To address this, we propose Dynamic TMoE, a framework that unifies architectural evolution with temporal continuity during the learning phase. By detecting distribution shifts via Maximum Mean Discrepancy (MMD), we dynamically instantiate heterogeneous experts and prune redundant ones to optimize capacity. Additionally, a temporal memory router leverages recurrent states and an anomaly repository to ensure stable, context-aware expert selection without requiring test-time updates. Experiments on nine benchmarks demonstrate state-of-the-art performance, reducing MSE by 10.4% and MAE by 7.8%. Code is available at https://github.com/andone-07/Dynamic-TMoE.
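
The drift-detection step relies on Maximum Mean Discrepancy. The sketch below computes the biased squared-MMD estimate with an RBF kernel between a reference window and a recent window. The fixed threshold in `drift_detected` and the kernel bandwidth are hypothetical choices, and the expert spawning and pruning logic is not shown.

```python
import math


def rbf(x, y, gamma=1.0):
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mmd2(xs, ys, gamma=1.0):
    """Biased estimate of the squared Maximum Mean Discrepancy between two samples."""
    kxx = sum(rbf(a, b, gamma) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(rbf(a, b, gamma) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(rbf(a, b, gamma) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy


def drift_detected(reference, recent, threshold=0.05, gamma=1.0):
    """HYPOTHETICAL trigger: report a distribution shift when MMD^2 exceeds a fixed threshold."""
    return mmd2(reference, recent, gamma) > threshold


# Each sample is a (lag-)window of the series; 1-D windows keep the example small.
reference = [(0.0,), (0.1,), (0.2,)]
shifted = [(3.0,), (3.1,), (3.2,)]
print(mmd2(reference, shifted), drift_detected(reference, shifted))
```

In practice the threshold would come from a permutation test or a calibrated rule rather than a constant.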

2605.20674 2026-05-21 cs.LG (updated version)

Modular Multimodal Classification Without Fine-Tuning: A Simple Compositional Approach

Herman Bergström, Aditya Mehrotra, Rahul G. Krishnan

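Affiliations: Chalmers University of Technology and University of Gothenburg; Vector Institute; University of Toronto

AI summary: This paper proposes CoMET, a multimodal classification method that needs no fine-tuning: each modality is passed through a frozen pre-trained backbone, and the embeddings are compressed with PCA and fed to a tabular foundation model for prediction. It shows that PCA alone is a robust adapter across modalities, introduces PALPooling to improve representation quality, and achieves state-of-the-art results on multimodal benchmarks without any training.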

Comments 30 pages, 17 figures

Abstract

We introduce CoMET (Composing Modality Encoders with Tabular foundation models), a simple yet highly competitive method for multimodal classification: pass each modality through a frozen pre-trained backbone, compress the resulting embeddings with PCA, and concatenate as input into a Tabular Foundation Model (TFM) for prediction. We show that PCA alone suffices to act as an adaptor yielding strong, robust performance across modalities. When the CLS tokens of the foundation model align poorly with downstream tasks, we propose PALPooling, a lightweight adaptive token pooler that consistently improves representation quality. By composing strong frozen representation learning backbones with TFMs, our approach achieves state-of-the-art results across diverse multimodal benchmarks without any training. On hierarchical tasks with large fine-grained class spaces, our approach enables fast and scalable classification, handling datasets with over 500,000 samples and 2,000 classes without any fine-tuning. Overall, our results show that the composition of foundation models is a simple, yet powerful, out-of-the-box solution for multimodal learning, challenging the necessity of complex, end-to-end training pipelines for new problems.
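
The compression step of the pipeline can be sketched without any ML library: fit PCA per modality (here by power iteration with deflation), project each embedding, and concatenate the per-modality codes into one feature row. The toy embeddings and component counts are made up, and the tabular foundation model that would consume the row is not shown.

```python
import math
import random


def pca_fit(rows, k, iters=300, seed=0):
    """Top-k principal directions via power iteration with deflation (fine for small d)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(x[a] * x[b] for x in centered) / (n - 1) for b in range(d)] for a in range(d)]
    rng = random.Random(seed)
    comps = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        eig = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
        comps.append(v)
        # remove the found direction so the next power iteration finds the next one
        cov = [[cov[a][b] - eig * v[a] * v[b] for b in range(d)] for a in range(d)]
    return mean, comps


def pca_transform(row, mean, comps):
    """Project one embedding onto the fitted components."""
    return [sum((row[j] - mean[j]) * c[j] for j in range(len(row))) for c in comps]


# CoMET-style feature row: compress each modality separately, then concatenate.
image_embs = [[t, 2 * t, 0.5] for t in range(-5, 6)]  # toy frozen-backbone outputs
text_embs = [[t % 3, -(t % 3)] for t in range(11)]
img_pca = pca_fit(image_embs, k=1)
txt_pca = pca_fit(text_embs, k=1)
row = pca_transform(image_embs[0], *img_pca) + pca_transform(text_embs[0], *txt_pca)
print(row)  # one input row for a tabular foundation model (not shown)
```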

2605.20668 2026-05-21 cs.CL cs.AI cs.LG (updated version)

On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists

Seungone Kim, Dongkeun Yoon, Kiril Gashteovski, Juyoung Suk, Jinheon Baek, Pranjal Aggarwal, Ian Wu, Viktor Zaverkin, Spase Petkoski, Daniel R. Schrider, Ilija Dukovski, Francesco Santini, Biljana Mitreska, Yong Jeong, Kyeongha Kwon, Young Min Sim, Dragana Manasova, Arthur Porto, Biljana Mojsoska, Makoto Takamoto, Marko Shuntov, Ruoqi Liu, Hyunjoo Jenny Lee, Niyazi Ulas Dinç, Yehhyun Jo, Sunkyu Han, Chungwoo Lee, Huishan Li, Esther H. R. Tsai, Ergun Simsek, Khushboo Shafi, Yeonseung Chung, Jihye Park, Aleksandar Shulevski, Henrik Christiansen, Yoosang Son, Elly Knight, Amanda Montoya, Jeongyoun Ahn, Christian Langkammer, Heera Moon, Changwon Yoon, Nikola Stikov, Mooseok Jang, Edward Choi, Junhan Kim, Yeon Sik Jung, Woo Youn Kim, Jae Kyoung Kim, Ishraq Md Anjum, Hyun Uk Kim, Drew Bridges, Carolin Lawrence, Xiang Yue, Alice Oh, Akari Asai, Sean Welleck, Graham Neubig

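Affiliations: Nature

AI summary: Through a large-scale expert annotation study, this paper examines what AI reviewers can and cannot do in scientific peer review. AI reviews score well on correctness, significance, and sufficiency of evidence, but show weaknesses such as limited subfield knowledge and poor context management, indicating that AI reviewers complement rather than replace human reviewers.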

Comments Work in progress

Abstract

With the advancement of AI capabilities, AI reviewers are beginning to be deployed in scientific peer review, yet their capability and credibility remain in question: many scientists simply view them as probabilistic systems without the expertise to evaluate research, while other researchers are more optimistic about their readiness without concrete evidence. Understanding what AI reviewers do well, where they fall short, and what challenges remain is essential. However, existing evaluations of AI reviewers have focused on whether their verdicts match human verdicts (e.g., score alignment, acceptance prediction), which is insufficient to characterize their capabilities and limits. In this paper, we close this gap through a large-scale expert annotation study, in which 45 domain scientists in Physical, Biological, and Health Sciences spent 469 hours rating 2,960 individual criticisms (each targeting one specific aspect of a paper) from human-written and AI-generated reviews of 82 Nature-family papers on correctness, significance, and sufficiency of evidence. On a composite of all three dimensions, a reviewing agent powered by GPT-5.2 scores above each paper's top-rated human reviewer (60.0% vs. 48.2%, p = 0.009), while all three AI reviewers (including Gemini 3.0 Pro and Claude Opus 4.5) exceed the lowest-rated human across every dimension. AI reviewers' accurate criticisms are also more often rated significant and well-evidenced, and surface a distinct 26% of issues no human raises. However, AI reviewers overlap far more than humans do (21% vs. 3% for cross-reviewer pairs), and exhibit 16 recurring weaknesses humans do not share, such as limited subfield knowledge, lack of long context management over multiple files, and overly critical stance on minor issues. Overall, our results position current AI reviewers as complements to, not substitutes for, human reviewers.

2605.20659 2026-05-21 cs.CV cs.LG (updated version)

RoPeSLR: 3D RoPE-driven Sparse-LowRank Attention for Efficient Diffusion Transformers

Yuxi Liu, Zekun Zhang, Yixiang Cai, Renjia Deng, Yutong He, Kun Yuan

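Affiliations: Peking University; University of Electronic Science and Technology of China; Beijing University of Posts and Telecommunications

AI summary: This work proposes RoPeSLR, a 3D RoPE-based sparse-low-rank attention framework that tackles the high cost of long-sequence generation in diffusion transformers. By decomposing attention into a high-frequency semantic spike set and an extremely low-rank background continuum, it achieves sub-quadratic sparsity and sub-linear rank growth and performs well in ultra-long video inference.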

Abstract

Diffusion Transformers (DiTs) have revolutionized high-fidelity video generation, yet their $\mathcal{O}(L^2)$ attention complexity poses a formidable bottleneck for long-sequence synthesis. While recent sparse-linear attention hybrids aim to mitigate this, their performance severely degrades at extreme sparsity due to the "RoPE Dilemma": standard linear attention fails to preserve the orthogonal relative-position structure of 3D Rotary Position Embeddings (RoPE), neutralizing vital distance awareness. To address this, we propose RoPeSLR, a 3D RoPE-guided Sparse-LowRank attention framework. We establish that under empirically validated assumptions, the DiT attention manifold admits a decoupling into a high-frequency semantic spike set (bounded by $\mathcal{O}(L^{3/2})$ sparsity) and an extreme low-rank ($\mathcal{O}(d_h \log L)$) background continuum. Guided by this structural prior, RoPeSLR eschews standard linear attention for a head-wise low-rank parameterization equipped with a learnable 3D Absolute Positional Embedding (PE) injection, seamlessly synthesizing long-range relative distance decay. By guaranteeing sub-quadratic sparsity and sub-linear rank growth, RoPeSLR is exceptionally suited for scaling to ultra-long video inference. Extensive evaluations validate this scalable superiority: at 90% sparsity, RoPeSLR achieves up to $10\times$ fewer FLOPs on Wan2.1-1.3B and delivers a $2.26\times$ end-to-end inference speedup on the ultra-long 100K+ token sequences of HunyuanVideo-13B, all while maintaining near-lossless generation fidelity (less than 1.3% average VBench degradation).
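
The relative-position property at stake in the "RoPE Dilemma" can be checked numerically: after rotary embedding, a query-key inner product depends only on the position offset. The sketch below assumes a simple 3D variant that splits channels into equal thirds for the time, height, and width axes; actual DiT implementations may allocate channels differently, and RoPeSLR's low-rank branch is not shown.

```python
import math


def rope(vec, pos, base=10000.0):
    """1-D rotary embedding: rotate each (even, odd) channel pair by angle pos * base**(-i/d)."""
    d = len(vec)
    out = list(vec)
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = vec[i] * c - vec[i + 1] * s
        out[i + 1] = vec[i] * s + vec[i + 1] * c
    return out


def rope_3d(vec, t, h, w):
    """Assumed 3-D variant: equal channel thirds rotated by time, height and width indices."""
    k = len(vec) // 3
    return rope(vec[:k], t) + rope(vec[k:2 * k], h) + rope(vec[2 * k:], w)


def score(q, k, qpos, kpos):
    """Attention logit (unscaled) between a rotated query and a rotated key."""
    return sum(a * b for a, b in zip(rope_3d(q, *qpos), rope_3d(k, *kpos)))


q = [0.3, -1.2, 0.8, 0.5, 1.1, -0.4, 0.2, 0.9, -0.7, 0.6, 1.0, -0.3]
k = [0.5, 0.4, -0.9, 1.3, 0.2, 0.7, -1.1, 0.3, 0.8, -0.5, 0.1, 0.6]
s1 = score(q, k, (2, 3, 4), (0, 1, 1))
s2 = score(q, k, (12, 8, 9), (10, 6, 6))  # same (dt, dh, dw) offsets, different absolute positions
print(s1, s2)
```

A kernelized linear attention that applies a feature map before taking inner products does not, in general, keep this offset-only dependence, which is the structure the abstract says is lost.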

2605.20649 2026-05-21 eess.SP cs.AI cs.LG (updated version)

AMAR: Lightweight Attention-Based Multi-User Activity Recognition from Wi-Fi CSI

Amirhossein Mohammadi, Hina Tabassum

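Affiliations: Department of Electrical Engineering and Computer Science

AI summary: This paper proposes AMAR, a lightweight attention-based multi-user activity recognition framework that casts activity recognition as a set prediction problem. Using a Transformer and an edge-cloud split architecture, it recognizes concurrent activities in multi-user environments more accurately than baselines while substantially reducing bandwidth usage and occupancy estimation error.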

Comments 25 pages, 6 figures, 3 tables

Abstract

Wi-Fi-based human activity recognition (HAR) has emerged as a promising approach for contactless sensing, leveraging channel state information (CSI) collected from wireless transceivers. While existing studies have primarily concentrated on single-user scenarios, real-world deployments often involve multi-user settings where concurrent users' movements induce overlapping CSI patterns that challenge conventional classification methods. To address this limitation, this paper introduces an attention-based multi-user activity recognition (AMAR) framework that formulates HAR as a set prediction problem. The transformer-based architecture in AMAR leverages learnable query embeddings acting as specialized activity detectors, enabling the simultaneous identification of multiple activities from composite CSI representations. Moreover, to address deployment constraints, AMAR uses an edge-cloud split architecture in which lightweight convolutional networks on edge devices perform initial feature extraction, followed by residual vector quantization that substantially reduces bandwidth while preserving activity-discriminative information. The cloud component performs final activity prediction through attention-based set matching, enabling the system to handle varying occupancy levels. Across classroom, meeting-room, and empty-room environments, AMAR on average nearly doubles the rate of perfectly predicting all concurrent activities compared to the best baseline. It also achieves an $F_1$-score of 53.4% compared to 45.6% for the best baseline and reduces occupancy estimation error by 74%, while substantially reducing bandwidth usage.

2605.20644 2026-05-21 cs.LG cs.AI cs.RO (updated version)

Design for Manufacturing: A Manufacturability Knowledge-Integrated Reinforcement Learning Framework for Free-Form Pipe Routing in Aeroengines

Caicheng Wang, Zili Wang, Shuyou Zhang, Yongzhe Xiang, Zheyi Li, Liangyou Li, Jianrong Tan

Affiliations: State Key Laboratory of Fluid Power and Mechatronic Systems, Zhejiang University; Engineering Research Center for Design Engineering and Digital Twin of Zhejiang Province, Zhejiang University; Zhejiang Changxing Heliang Intelligent Equipment Co., Ltd.

AI summary: This paper proposes a manufacturability knowledge-integrated reinforcement learning framework for free-form pipe routing in aeroengines. Encoding manufacturing knowledge as constraints improves the manufacturability and geometric smoothness of the resulting pipe paths.

Abstract

Design for manufacturing plays a critical role in advanced aeroengine development, where complex components necessitate careful consideration of manufacturability. However, current practices in pipe routing remain largely decoupled from downstream manufacturing, leading to labor-intensive, trial-and-error iterations to achieve manufacturable designs. To address this problem, this study proposes the Frenet-based pipe routing optimization (FPRO) framework, a manufacturability knowledge-integrated reinforcement learning approach for free-form pipe design in aeroengines. FPRO formulates the routing problem as a boundary value problem in the Frenet frame. In this framework, the pipe path is represented by curvature and torsion profiles, which are generated using cubic Hermite interpolation. To integrate design and manufacturing, domain-specific manufacturing knowledge is embedded as constraints on the permissible ranges of curvature and torsion. The path optimization is performed using proximal policy optimization (PPO) with stochastic exploration and a stage-guided reward mechanism. A unified mapping formulation then translates the optimized path into motion trajectories for the bending die, enabling direct fabrication on a six-axis free-bending machine. Experimental results demonstrate that FPRO consistently generates collision-free, manufacturable paths with smoother geometric profiles than Cartesian-based methods. It also achieves faster convergence and superior performance in terminal alignment, path length, obstacle avoidance, and manufacturability compared to state-of-the-art reinforcement learning baselines. Real-world validation confirms the close geometric correspondence between the manufactured pipe and its digital design, validating the practical feasibility of FPRO.
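
The path representation can be illustrated with cubic Hermite interpolation of a curvature profile along normalized arc length, clamped to a permissible range such as one derived from a minimum bend radius (curvature at most 1/R_min). The unit knot spacing, the clamping, and the numbers are assumptions for illustration; FPRO enforces its ranges inside a PPO-based optimization rather than by post-hoc clamping.

```python
def hermite(p0, m0, p1, m1, t):
    """Cubic Hermite interpolation for t in [0, 1] between values p0, p1 with end slopes m0, m1."""
    t2, t3 = t * t, t * t * t
    return ((2 * t3 - 3 * t2 + 1) * p0 + (t3 - 2 * t2 + t) * m0
            + (-2 * t3 + 3 * t2) * p1 + (t3 - t2) * m1)


def curvature_profile(knots, slopes, bounds, samples_per_seg=10):
    """Sample a piecewise cubic Hermite profile over unit-spaced knots, clamped to bounds = (lo, hi)."""
    lo, hi = bounds
    out = []
    for i in range(len(knots) - 1):
        for s in range(samples_per_seg):
            t = s / samples_per_seg
            v = hermite(knots[i], slopes[i], knots[i + 1], slopes[i + 1], t)
            out.append(min(max(v, lo), hi))  # keep curvature in the permissible range
    out.append(min(max(knots[-1], lo), hi))
    return out


r_min = 25.0             # hypothetical minimum bend radius (mm) of the free-bending machine
kappa_max = 1.0 / r_min  # tightest allowed curvature (1/mm)
profile = curvature_profile([0.0, 0.05, 0.02], [0.0, 0.0, 0.0], bounds=(0.0, kappa_max))
print(max(profile), len(profile))
```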

2605.20643 2026-05-21 cs.LG cs.AI cs.CL (updated version)

AVSD: Adaptive-View Self-Distillation by Balancing Consensus and Teacher-Specific Privileged Signals

Duy Nguyen, Hanqi Xiao, Archiki Prasad, Zaid Khan, Anirban Das, Austin Zhang, Sambit Sahu, Hyunji Lee, Elias Stengel-Eskin, Mohit Bansal

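Affiliations: UNC Chapel Hill; Capital One; The University of Texas at Austin

AI summary: This paper proposes AVSD, a self-distillation method that balances cross-view consensus against teacher-specific privileged signals, addressing the information asymmetry between teacher and student and the difficulty of choosing which privileged information to use.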

Comments Code: https://github.com/duykhuongnguyen/AVSD

Abstract

Self-distillation enables language models to learn on-policy from their own trajectories by using the same model as both student and teacher, with the teacher being conditioned on privileged information unavailable to the student. Such information can come in different types or views, such as solutions, demonstrations, feedback, or final answers. This setup provides dense token-level feedback without relying on a separate external model, but creates a fundamental asymmetry: the teacher may rely on view-specific information that the student cannot access at inference time. Moreover, the best type of privileged information is often task-dependent, making it difficult to choose a single teacher view. In this work, we address both these challenges jointly by introducing AVSD (Adaptive-View Self-Distillation), a novel method of self-distillation with multiple privileged-information views, which reconstructs token-level supervision by separating stable cross-view consensus from view-specific residual signals. AVSD identifies the consensus signal shared across views, which provides a reliable update direction, and then selectively adds the view-specific residual signal to adjust the update magnitude when it both aligns with the consensus direction and remains proportionate to the consensus signal. Experiments on math competition benchmarks (AIME24, AIME25, and HMMT25) show that AVSD consistently outperforms both single-view self-distillation baselines and GRPO, achieving average Avg@8 gains of 3.1% and 2.2% over the strongest baselines on Qwen3-8B and Qwen3-4B, respectively. Moreover, on code-generation benchmarks (Codeforces, LiveCodeBench v6) using Qwen3-8B, AVSD outperforms the single-view self-distillation baseline by 2.4% on average.
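
The abstract does not define the consensus signal, the residual, or the proportionality test precisely. Purely as a hypothetical reading, the sketch below treats each privileged view as giving one scalar signal per token, takes their mean as the consensus, and adds back a view's residual only when it has the same sign as the consensus and is no larger than `ratio_cap` times its magnitude. None of these choices is confirmed by the source.

```python
import statistics


def combine_views(signals, ratio_cap=1.0):
    """HYPOTHETICAL AVSD-like rule for one token.

    signals: one scalar per privileged view (e.g. teacher-minus-student log-prob of the
    sampled token under that view). Consensus = mean across views; a view's residual is
    added back only if it points the same way as the consensus and is not larger than
    ratio_cap times the consensus magnitude.
    """
    consensus = statistics.fmean(signals)
    update = consensus
    for s in signals:
        residual = s - consensus
        aligned = residual * consensus > 0
        proportionate = abs(residual) <= ratio_cap * abs(consensus)
        if aligned and proportionate:
            update += residual / len(signals)
    return update


print(combine_views([0.2, 0.4, 0.9]))  # the 0.9 view's residual is aligned and proportionate: kept
print(combine_views([0.1, 0.1, 1.0]))  # the 1.0 view's residual is too large: dropped
```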

2605.20642 2026-05-21 cs.LG Version update

Same Target, Different Basins: Hard vs. Soft Labels for Annotator Distributions

Mirerfan Gheibi, Gashin Ghazizadeh

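Affiliations: Independent Researcher

AI summary: This paper compares hard and soft labels for learning from annotator distributions. It finds that when only a few annotations per example are available, hard-label methods outperform soft-label training, especially when the sparse empirical target is far from the full annotator distribution.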

Comments 14 pages, 12 figures. Accepted to the 2nd Workshop on Epistemic Intelligence in Machine Learning (EIML @ ICML 2026)

Abstract

When annotators disagree, that disagreement can reflect epistemic uncertainty rather than simple label noise. We study hard-label delivery as an alternative to the usual choices of collapsing votes to a single label or training directly on the empirical soft-label distribution. We focus on two primary hard-label methods: multipass, which cycles through observed votes while keeping the dataset size fixed, and stochastic label sampling (SLS), which samples one label per example at the start of each epoch. On CIFAR-10H, we find that when only a small number of annotations per example is available, hard-label delivery improves over soft-label training, with larger improvements where the sparse empirical target is farther from the full annotator distribution. When full annotator distributions are available, both hard-label methods match soft-label training. We use deterministic control as an ablation of multipass and shuffled SLS as a control that breaks the example-to-distribution match. We also show that SLS and soft-label cross-entropy optimize the same expected objective. Hard-label delivery also converges to flatter basins, with supporting descriptive evidence from OOD detection on SVHN and CIFAR-100. Overall, these results suggest that multipass is a strong practical default when raw vote counts are available, while SLS offers a lightweight alternative that remains competitive when only a few votes per example are available and matches soft-label training when full annotator distributions are available.
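
The abstract states that SLS and soft-label cross-entropy optimize the same expected objective. The sketch below illustrates this for a single example:
- The soft-label loss is computed from the empirical vote distribution.
- A Monte Carlo average of the hard-label loss under SLS draws converges to it.
- `multipass_labels` is one assumed reading of "cycles through observed votes" (epoch e uses vote e mod k); the paper's exact schedule may differ.

All function names are hypothetical.

```python
import math
import random


def sls_labels(votes, rng):
    """Stochastic label sampling: draw one observed vote per example.
    Called once at the start of each epoch."""
    return [rng.choice(example_votes) for example_votes in votes]


def multipass_labels(votes, epoch):
    """Assumed multipass schedule: epoch e uses vote e mod k of each example,
    so the dataset size never changes."""
    return [example_votes[epoch % len(example_votes)] for example_votes in votes]


def soft_label_ce(probs, example_votes):
    """Cross-entropy against the empirical vote distribution."""
    k = len(example_votes)
    target = [example_votes.count(c) / k for c in range(len(probs))]
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


def sls_ce_estimate(probs, example_votes, draws, rng):
    """Monte Carlo average of the hard-label loss when labels are drawn by SLS."""
    total = 0.0
    for _ in range(draws):
        y = rng.choice(example_votes)
        total -= math.log(probs[y])
    return total / draws


rng = random.Random(0)
probs = [0.7, 0.2, 0.1]     # model's predicted class probabilities
example_votes = [0, 0, 1]   # two annotators chose class 0, one chose class 1
exact = soft_label_ce(probs, example_votes)                           # about 0.774
estimate = sls_ce_estimate(probs, example_votes, draws=20000, rng=rng)
```

In expectation, each SLS draw picks class c with probability equal to its vote share. The expected hard-label loss is therefore exactly the soft-label cross-entropy. Only the per-step gradient noise differs.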

2605.20641 2026-05-21 cs.CR cs.AI cs.LG Version update

Trusted Weights, Treacherous Optimizations? Optimization-Triggered Backdoor Attacks on LLMs

Yifei Wang, Tianlin Li, Xiaohan Zhang, Yida Yang, Xiaoyu Zhang, Li Pan

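Affiliations: Shanghai Jiao Tong University; Beihang University; Tongji University; Nanyang Technological University

AI summary: This paper proposes an attack that exploits compilation-based inference optimization to implant stealthy backdoors in large language models. Its two complementary strategies require no modification to the compiler or hardware, and they achieve high attack success rates on several open-source LLMs.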

Comments 20 pages, 3 figures

Abstract

Inference optimization is a vital technique for deploying LLMs at scale, and compilation is the most widely adopted optimization technique for LLMs. Although compilation assumes semantic equivalence between the original and compiled graphs, we show for the first time that its numerical side effects can be maliciously exploited to implant stealthy backdoors in LLMs. We propose a unified optimization-triggered attack framework comprising two complementary strategies. Without any modification to the compiler or hardware, one strategy flips predictions for specific inputs only when the model is compiled, while the other uses a universal trigger that remains dormant under uncompiled execution but hijacks arbitrary inputs once compilation optimization is applied. Both attacks bypass standard safety evaluations run without compilation. We empirically demonstrate that these optimization-triggered backdoors achieve attack success rates averaging 90% across four mainstream open-source LLMs and four tasks, while clean accuracy is preserved at nearly 100% under all settings. Our findings reveal a novel attack surface at the intersection of optimization and security in the LLM deployment pipeline, and we investigate practical defenses to mitigate this threat.
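
The "numerical side effects" of compilation come largely from floating-point arithmetic not being associative. A compiled or fused kernel may reduce the same terms in a different order and get a different result. The sketch below is not the paper's attack. It uses contrived numbers to show how two legal summation orders of the same values yield different logits, and how a decision near a threshold then flips. The function names and the 0.5 threshold are illustrative.

```python
import math


def sum_left_to_right(xs):
    """Accumulate in the given order, as a simple eager loop would."""
    total = 0.0
    for x in xs:
        total += x
    return total


def sum_pairwise(xs):
    """Tree reduction: a different, equally legal association of the same sum."""
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return sum_pairwise(xs[:mid]) + sum_pairwise(xs[mid:])


def decide(logit, threshold=0.5):
    """Binary decision on a logit-like score."""
    return int(logit > threshold)


xs = [1e16, 1.0, -1e16, 1.0]
eager = sum_left_to_right(xs)   # 1.0
reordered = sum_pairwise(xs)    # 0.0
exact = math.fsum(xs)           # 2.0, the correctly rounded reference
print(decide(eager), decide(reordered))  # 1 0
```

Both orders are wrong with respect to `math.fsum`, but in different ways. A weight perturbation that places an input's score near such a boundary would behave differently with and without compilation.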

2605.20639 2026-05-21 math.OC cs.LG math.DS Version update

Time-Dependent PDE-Constrained Optimization via Weak-Form Latent Dynamics

April Tran, Terry Haut, David Bortz, Youngsoo Choi

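Affiliations: Department of Applied Mathematics, University of Colorado; Center for Applied Scientific Computing, Lawrence Livermore National Laboratory

AI summary: This paper proposes a weak-form latent-space reduced-order modeling framework for accelerating gradient-based PDE-constrained optimization. It compresses high-dimensional solution trajectories and identifies parametric latent dynamics via weak-form system identification, enabling efficient optimization in many-query design and control settings.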

Abstract

Optimization problems constrained by high-dimensional, time-dependent partial differential equations require repeated forward and sensitivity solves, making high-fidelity optimization computationally prohibitive in many-query design and control settings. We present a weak-form latent-space reduced-order modeling framework for accelerating gradient-based PDE-constrained optimization. The proposed approach builds on Weak-form Latent Space Dynamics Identification (WLaSDI), which compresses high-dimensional solution trajectories into a low-dimensional latent representation and identifies parametric latent dynamics using weak-form system identification. By avoiding explicit numerical differentiation of training trajectories, the weak formulation improves robustness to noisy data and yields more reliable surrogate dynamics for optimization. We formulate the resulting reduced PDE-constrained optimization problem and derive both direct-sensitivity and adjoint-based gradient expressions for the learned latent dynamics, enabling scalable gradient evaluation with respect to design parameters. The framework is demonstrated on three time-dependent benchmark problems: thermal radiative transfer for optimal hohlraum design, the two-stream instability Vlasov-Poisson system, and the inviscid Burgers equation. Across these examples, WLaSDI produces accurate optimal designs, remains robust under noisy training data, and delivers substantial computational savings, including speedups of up to five orders of magnitude relative to full-order optimization. These results demonstrate that weak-form latent dynamics provide an efficient and noise-robust surrogate foundation for gradient-based optimization of complex time-dependent PDE systems.
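
The weak form avoids differentiating noisy trajectories through integration by parts against a smooth test function φ that vanishes at both ends of its window. This gives ∫ φ u′ dt = −∫ φ′ u dt, so only the known derivative φ′ is ever needed. The sketch below applies that idea in isolation to the scalar linear ODE du/dt = a·u. It is not WLaSDI: there is no autoencoder, no parametric latent model and no adjoint. The bump test function (t − t0)²(t1 − t)², the windows and the 1% noise level are assumptions chosen for illustration.

```python
import math
import random


def trapezoid(ys, dt):
    """Trapezoidal rule on a uniform grid."""
    return dt * (sum(ys) - 0.5 * (ys[0] + ys[-1]))


def weak_form_rate(ts, us, windows):
    """Estimate `a` in du/dt = a*u without differentiating the data.

    For each window [t0, t1], phi = (t - t0)^2 (t1 - t)^2 vanishes at both
    ends, so  a * int(phi*u) dt = -int(phi'*u) dt.  The windows are then
    combined by least squares.
    """
    dt = ts[1] - ts[0]
    num = den = 0.0
    for t0, t1 in windows:
        phi, dphi = [], []
        for t in ts:
            if t0 <= t <= t1:
                phi.append((t - t0) ** 2 * (t1 - t) ** 2)
                dphi.append(2 * (t - t0) * (t1 - t) ** 2 - 2 * (t - t0) ** 2 * (t1 - t))
            else:
                phi.append(0.0)
                dphi.append(0.0)
        g = trapezoid([p * u for p, u in zip(phi, us)], dt)    # int phi*u dt
        b = -trapezoid([d * u for d, u in zip(dphi, us)], dt)  # -int phi'*u dt
        num += g * b
        den += g * g
    return num / den


rng = random.Random(0)
a_true = -0.7
ts = [i * 0.01 for i in range(201)]                          # t in [0, 2]
clean = [math.exp(a_true * t) for t in ts]
noisy = [u + rng.gauss(0.0, 0.01) for u in clean]            # additive noise, sd 0.01
windows = [(0.0, 1.0), (0.5, 1.5), (1.0, 2.0)]
a_clean = weak_form_rate(ts, clean, windows)
a_noisy = weak_form_rate(ts, noisy, windows)
```

Because the noise only enters through integrals, it is averaged rather than amplified. A finite-difference estimate of u′ would instead divide the noise by the step size.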

2605.20624 2026-05-21 cs.CV cs.AI cs.LG Version update

Accelerating Video Inverse Problem Solvers with Autoregressive Diffusion Models

Taesung Kwon, Jonghyun Park, Hyungjin Chung, Jong Chul Ye

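Affiliations: KAIST; EverEx

AI summary: This paper proposes the Autoregressive Video Inverse problem Solver (AVIS), which uses autoregressive diffusion models to restore video in a streaming manner. AVIS substantially reduces initial latency and increases throughput while maintaining high restoration quality. An accelerated variant, AVIS Flash, achieves even higher throughput and a favorable efficiency-performance trade-off, paving the way toward real-time deployment.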

Comments Project page is available here: https://avis-project.github.io/

Abstract

Diffusion models provide powerful priors for zero-shot video inverse problems, but their real-time deployment is hindered by two inefficiencies: high initial latency caused by holistic video restoration, and low throughput resulting from multiple VAE passes to enforce measurement consistency in pixel space. To overcome these limitations, we propose Autoregressive Video Inverse problem Solver (AVIS). The AVIS framework leverages autoregressive video diffusion models to restore videos in a streaming manner, naturally eliminating latency bottlenecks. Specifically, AVIS initializes reverse diffusion with a measurement-consistent estimate, reducing the required sampling steps. Compared to leading non-autoregressive solvers, AVIS drastically reduces initial latency from 114s to 4s and increases throughput from 0.71 to 1.18 FPS while achieving superior restoration quality. We further introduce a highly accelerated variant, dubbed AVIS Flash, that enforces measurement consistency solely on the first chunk. AVIS Flash substantially boosts throughput to 5.91 FPS on a single RTX 4090 GPU while maintaining competitive performance and achieving a favorable efficiency-performance trade-off, paving the way toward real-time deployment.

2605.20620 2026-05-21 cs.LG cs.DB cs.GT Version update

Dynamic Shapley Computation

Xuan Yang, Hsi-Wen Chen, Ming-Syan Chen, Jian Pei

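Affiliations: Duke University; National Taiwan University

AI summary: This paper proposes D-Shap, a framework that represents Shapley values as a player-by-task matrix. This lets estimates of training-data contribution be updated efficiently as tasks and players change. D-Shap exploits utility locality and coalition locality for fast updates and supports self-valuation.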

Abstract

Shapley-based data valuation provides a principled way to quantify the contribution of training data, but its high computational cost makes it impractical in dynamic settings where tasks and training players evolve. Existing methods treat Shapley computation as a one-shot process and collapse contributions into aggregated scores, preventing reuse and requiring recomputation under any change. We introduce a new perspective that represents Shapley values as a player-by-task matrix and formulates dynamic valuation as a structured matrix maintenance problem. We exploit the fact that each task depends on a small subset of training players and that similar tasks yield similar valuations, leading to utility locality and coalition locality. Based on these insights, we propose D-Shap, a dynamic valuation framework that enables efficient updates by modifying only a small portion of the matrix: new task valuations are inferred via structure-aware interpolation, while updates induced by new players are confined to affected local matrix blocks. To eliminate the need for pre-specified evaluation tasks, we introduce self-valuation, which constructs the initial matrix directly from training data, supported by scalable subset reuse and coverage-aware anchor selection. Experiments across diverse models show that D-Shap performs task updates in milliseconds and reduces the cost of player updates by up to three orders of magnitude, while achieving valuation quality competitive with full recomputation.
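
To make the player-by-task view concrete, the sketch below computes exact Shapley values by averaging marginal contributions over all player orders. It arranges them as a matrix with one row per player and one column per task. The two toy tasks are hypothetical and each depends on a small subset of players, in the spirit of the utility locality the abstract describes. Adding a player outside a task's support leaves that task's existing entries unchanged (the null-player property). This is only a brute-force reference, not D-Shap's interpolation or block-update procedure.

```python
from itertools import permutations
from math import factorial


def shapley(players, utility):
    """Exact Shapley values: average marginal contribution over all orders."""
    values = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            grown = coalition | {p}
            values[p] += utility(grown) - utility(coalition)
            coalition = grown
    n_orders = factorial(len(players))
    return {p: v / n_orders for p, v in values.items()}


def task_a(coalition):
    """Hypothetical task that succeeds only if players 0 and 1 are both present."""
    return 1.0 if {0, 1} <= coalition else 0.0


def task_b(coalition):
    """Hypothetical task that depends only on player 2."""
    return 1.0 if 2 in coalition else 0.0


players = [0, 1, 2, 3]
columns = [shapley(players, task) for task in (task_a, task_b)]
# Player-by-task matrix: row = player, column = task.
matrix = [[col[p] for col in columns] for p in players]
# matrix == [[0.5, 0.0], [0.5, 0.0], [0.0, 1.0], [0.0, 0.0]]
```

Brute force costs n! utility evaluations per task. This is why reusing matrix entries instead of recomputing them matters once players and tasks change over time.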

2605.20619 2026-05-21 cs.LG math.OC stat.ML Version update

SURF: Steering the Scalarization Weight to Uniformly Traverse the Pareto Front

Liuyuan Jiang, Chentong Huang, Lisha Chen

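Affiliations: Department of Electrical and Computer Engineering

AI summary: This paper proposes SURF, which steers the scalarization weight so that the resulting solutions cover the Pareto front uniformly. It addresses the non-uniform coverage that uniformly sampled scalarization weights produce in multi-objective optimization.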

Abstract

Scalarization is widely used in multi-objective optimization owing to its simplicity and scalability. In many applications, the goal is to generate solutions that represent diverse user preferences, ideally with uniform coverage of the Pareto front (PF). However, uniformly sampling scalarization weights usually induces non-uniform coverage of the PF. We explain this mismatch through a geometric analysis of the scalarization path. As the scalarization weight varies, the corresponding solutions trace the PF with a generally non-uniform traversal speed. This speed induces an arc-length cumulative distribution function (CDF); inverting this CDF map yields a principled rule for selecting weights that produce uniform PF coverage. Building on this insight, we propose SURF (Sampling Uniformly along the PaReto Front). For structured problems, including bi-objective bandits, we derive closed-form expressions for this CDF map and the resulting PF-aware weight sampling rule. For general problems, SURF alternates between CDF reconstruction and weight sampling. Theoretically, we show that under provable conditions, SURF converges linearly to an unavoidable finite-sampling floor. Empirically, experiments on bandits, multi-objective-gymnasium, and multi-objective LLM alignment demonstrate that SURF efficiently achieves more uniform PF coverage than baselines.
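
The arc-length CDF idea can be seen on a toy bi-objective problem where the scalarized solution has a closed form. With f1(x) = x² and f2(x) = (x − 1)², minimizing w·f1 + (1 − w)·f2 gives x*(w) = 1 − w. The sketch below has four steps:
1. Sweep w on a fine grid.
2. Accumulate the arc length of the traced front and normalize it to a CDF.
3. Invert that CDF at evenly spaced levels.
4. Compare how evenly the resulting points are spaced.

The toy problem and the helper names are assumptions. SURF's general alternating CDF reconstruction is not reproduced here.

```python
import math
from bisect import bisect_left


def front_point(w):
    """Toy problem: the scalarized minimizer is x*(w) = 1 - w."""
    x = 1.0 - w
    return (x * x, (x - 1.0) ** 2)


def arc_length_cdf(grid):
    """Normalized cumulative arc length of the front as w sweeps the grid."""
    pts = [front_point(w) for w in grid]
    cum = [0.0]
    for (a1, a2), (b1, b2) in zip(pts, pts[1:]):
        cum.append(cum[-1] + math.hypot(b1 - a1, b2 - a2))
    return [c / cum[-1] for c in cum]


def surf_style_weights(n, grid, cdf):
    """Invert the CDF by linear interpolation at n evenly spaced levels."""
    weights = []
    for i in range(n):
        level = i / (n - 1)
        j = max(1, bisect_left(cdf, level))
        c0, c1 = cdf[j - 1], cdf[j]
        w0, w1 = grid[j - 1], grid[j]
        weights.append(w0 + (level - c0) / (c1 - c0) * (w1 - w0))
    return weights


def spacing_ratio(weights):
    """Largest / smallest distance between consecutive front points."""
    pts = [front_point(w) for w in weights]
    gaps = [math.hypot(b[0] - a[0], b[1] - a[1]) for a, b in zip(pts, pts[1:])]
    return max(gaps) / min(gaps)


grid = [k / 2000 for k in range(2001)]
cdf = arc_length_cdf(grid)
uniform = [i / 10 for i in range(11)]
surf = surf_style_weights(11, grid, cdf)
# spacing_ratio(uniform) is about 1.34; spacing_ratio(surf) is close to 1.
```

Uniform weights crowd points near the middle of this front, where the traversal speed is lowest. Inverting the CDF spends more weight resolution where the front moves slowly and less where it moves quickly.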

2605.20609 2026-05-21 cs.LG Version update

Compositional Transduction with Latent Analogies for Offline Goal-Conditioned Reinforcement Learning

Junseok Kim, Dohyeong Kim, Mineui Hong, Songhwai Oh

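Affiliations: Department of Electrical and Computer Engineering and ASRI, Seoul National University; Independent researcher; Robotics Institute, Carnegie Mellon University

AI summary: This paper proposes compositional transduction with latent analogies for offline goal-conditioned reinforcement learning. It targets goal generalization under novel contexts, and its new analogy representation improves goal reaching across varying contexts.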

Comments ICML 2026

Abstract

Compositional generalization is essential for reaching unseen goals under novel contextual variations in offline goal-conditioned reinforcement learning (GCRL), where a generalist goal-reaching agent must be learned from limited data. Most prior approaches pursue this via trajectory stitching over temporally contiguous segments, which limits composing behaviors across varying contexts. To overcome this limitation, we formalize analogy transduction as synthesizing new plans by composing task-endogenous analogies with given contexts and propose a novel analogy representation tailored for it. Grounded in our theory, this analogy representation captures what changes under optimal task execution, remains invariant to contextual variations, and is sufficient for optimal goal reaching. We further contend that generalization to unseen analogy-context pairs is a practical obstacle in analogy transduction, and introduce a new approach for offline GCRL that enables analogy transduction beyond seen pairs to unseen combinations. We empirically demonstrate the effectiveness of our approach on OGBench manipulation environments, substantially outperforming prior methods that do not perform analogy transduction. Project page: https://rllab-snu.github.io/projects/CTA/

2605.20607 2026-05-21 cs.LG cs.CV cs.RO Version update

Mechanistic Interpretability for Learning Assurance of a Vision-Based Landing System

Romeo Valentin, Olivia Beyer Bruvik, Marc R. Schlichting, Mykel J. Kochenderfer

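Affiliations: Stanford Intelligent Systems Laboratory, Stanford University, Stanford, CA, USA

AI summary: This paper proposes a learning-assurance path for a vision-based landing system. It shows that the model separates content from style in its own situation representation, which yields interpretable assurance evidence. It also introduces a new runtime assurance method, out-of-model-scope (OOMS) detection, that monitors the model's situation representation.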

Comments 10 pages, 4 figures

Abstract

EASA's learning-assurance guidance requires data-driven aviation systems to build and monitor their own situation representation, yet for neural networks the technical means to provide such evidence remain an open problem. We address this gap for a vision-based aircraft landing system: we propose that a minimally assurable model must at least be shown to separate content from style in its own situation representation. Showing that the model's predictions then rely largely on the contentful representation components leads to a concrete assurance path. To demonstrate this assurance path on a concrete model we train a vision transformer model for runway keypoint regression on the LARDv2 dataset. The model, which acts as the subject for our assurance demonstration, produces per-patch embeddings that we decompose into interpretable atoms via K-SVD sparse dictionary learning. A qualitative visualization confirms that contentful atoms track task-relevant runway structure and stylistic atoms track domain-specific appearance, and the regression head is shown to place almost all of its linear weight on contentful atoms. We further build on the content/style separation and define out-of-model-scope (OOMS) detection, a novel runtime assurance approach directly monitoring the model's situation representation. OOMS monitoring is complementary to operational design domain and output-space out-of-distribution monitoring and addresses concrete requirements of the recent EASA guidance. By directly analyzing a model's situation representation both at test time and runtime, this work delivers the first concrete piece of the representation-level evidence that EASA learning-assurance guidance demands, and points to mechanistic interpretability as a practical building block of future aviation safety cases.

2605.20602 2026-05-21 cs.CL cs.AI cs.LG Version update

Self-Training Doesn't Flatten Language -- It Restructures It: Surface Markers Amplify While Deep Syntax Dies

Ming Liu

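Affiliations: Amazon

AI summary: The study finds experimentally that self-training does not flatten language but restructures it: surface markers are amplified while deep syntactic structures collapse. It proposes the Structural Depth Hypothesis to explain this pattern.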

Comments 19 pages (14 main + 5 appendix), 8 figures, 3 tables

Abstract

Successive self-training on a language model's own outputs is widely characterized as a process of flattening: diversity drops, distributions narrow, and the text becomes "more like itself." We provide evidence that this characterization is incomplete. Across eleven generations of self-training on five models (GPT-2 124M, Pythia-410M, Pythia-1.4B, OPT-1.3B, Pythia-2.8B), language is not flattened uniformly -- it is restructured. Surface markers (discourse connectives, hedges, em-dashes) rise, while mid- and deep-syntactic structures (questions, parentheticals, passives, subjunctives) collapse. We formalize this asymmetric collapse as the Structural Depth Hypothesis (SDH): the per-generation decay rate of a linguistic feature is predicted primarily by its structural depth -- the number of nested syntactic dependencies it requires -- and only secondarily by its generation-zero output frequency. Pooling 17-feature panels from five models spanning three architecture families (N=85), the pooled Spearman correlation is rho=0.540 (p < 10^{-6}; cluster-bootstrap 95% CI [0.434, 0.634]), while frequency is a substantially weaker predictor (rho=0.225). A matched human-text fine-tuning control yields rho=0.039 (p=0.88), confirming the gradient is self-training-specific. We further document a Superficial Complexity Paradox: aggregate complexity proxies (dep-tree depth, TTR, word length) all rise as the underlying clause structure dies, with direct implications for training-data curation and LLM-text detection.
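
A pooled Spearman correlation with a cluster-bootstrap confidence interval, as reported above, can be computed as in the sketch below. Spearman's rho is the Pearson correlation of average ranks. The bootstrap resamples whole models (clusters) with replacement rather than individual feature points, so within-model dependence is respected. The data is synthetic: three hypothetical models with four features each, pairing a structural depth with a decay rate. The bootstrap settings (2000 resamples, percentile interval) are assumptions.

```python
import random


def average_ranks(values):
    """1-based ranks, with ties sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def pooled(clusters):
    """Concatenate (depth, decay) pairs from all clusters."""
    xs = [d for depths, _ in clusters for d in depths]
    ys = [r for _, rates in clusters for r in rates]
    return xs, ys


def cluster_bootstrap_ci(clusters, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI that resamples whole clusters (models), not single points."""
    rng = random.Random(seed)
    stats = sorted(
        spearman(*pooled([rng.choice(clusters) for _ in clusters]))
        for _ in range(n_boot)
    )
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]


# Synthetic data: per model, (structural depth, decay rate) for four features.
clusters = [
    ([1, 2, 3, 4], [0.1, 0.3, 0.2, 0.5]),
    ([1, 2, 3, 4], [0.0, 0.2, 0.4, 0.3]),
    ([1, 2, 3, 4], [0.2, 0.1, 0.4, 0.6]),
]
rho = spearman(*pooled(clusters))   # about 0.82
ci_low, ci_high = cluster_bootstrap_ci(clusters)
```

Resampling at the model level gives wider, more honest intervals than resampling individual points. Features from the same model are not independent observations.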

2605.20599 2026-05-21 cs.LG Version update

Unsupervised clustering and classification of upper limb EMG signals during functional movements: a data-driven approach

L. F. Salazar Álvarez, D. Escobar-Saltarén, M. B. Salazar Sánchez, S. C. Henao-Aguirre

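Affiliations: In2Lab, Faculty of Engineering, Universidad de Antioquia; School of Engineering and Sciences, Tecnológico de Monterrey

AI summary: This paper presents a comprehensive, data-driven approach for clustering and classifying upper-limb surface EMG signals during functional reach and grasp movements. Applied to the NINAPRO DB4 dataset, it uses a four-stage pipeline (signal preprocessing, feature extraction, gesture selection via hierarchical clustering, and comparative model evaluation) and selects five key features for classification.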

Comments 19 Congreso Colombiano de Computación (19CCC)

Abstract

This study presents a comprehensive approach for the clustering and classification of upper-limb surface electromyography (sEMG) signals during functional reach and grasp movements. The methodology was applied to the NINAPRO DB4 dataset, which provides multichannel EMG recordings of 52 gestures. A four-stage pipeline was designed, including signal preprocessing, feature extraction, gesture selection via hierarchical clustering, and comparative model evaluation. Preprocessing involved a fourth-order low-pass filter (0.6 Hz) and Hilbert envelope transformation, effectively reducing noise and enhancing signal clarity. Feature extraction yielded 26 temporal and frequency-domain metrics, which were later refined using visual analysis, mutual information, principal component analysis, and decision tree importance scores. A final subset of five key features was selected for classification tasks. Gesture selection was performed through hierarchical clustering using Mahalanobis distance, resulting in six representative movements that balanced biomechanical diversity and computational efficiency. A 200 ms window was identified as optimal for temporal segmentation based on stability and physiological plausibility. Classifier models were evaluated in two stages. Automated comparison using PyCaret identified Extra Trees (ET) and Artificial Neural Networks (ANN) as top performers. Subsequent independent training confirmed their stability and generalization capacity, with ANN showing progressive learning and ET maintaining robust, consistent results. The findings support the implementation of adaptive, low-latency control strategies for myoelectric prostheses and provide a scalable pipeline for future real-time applications.
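
The sketch below shows the windowing step: one EMG channel is cut into 200 ms segments and a few per-window features are computed. Several choices are assumptions rather than details from the abstract:
- a 2 kHz sampling rate;
- non-overlapping windows;
- four common time-domain EMG features: mean absolute value, root mean square, waveform length and zero crossings.

The abstract does not name its five selected features, so this set is illustrative only. The low-pass filtering and Hilbert envelope stages are not reproduced.

```python
import math


def window_features(signal, fs=2000, window_ms=200):
    """Split a 1-D EMG channel into non-overlapping windows and compute
    four common time-domain features per window."""
    win = int(fs * window_ms / 1000)  # samples per window (400 at 2 kHz)
    features = []
    for start in range(0, len(signal) - win + 1, win):
        seg = signal[start:start + win]
        features.append({
            "MAV": sum(abs(s) for s in seg) / win,                    # mean absolute value
            "RMS": math.sqrt(sum(s * s for s in seg) / win),          # root mean square
            "WL": sum(abs(b - a) for a, b in zip(seg, seg[1:])),      # waveform length
            "ZC": sum(1 for a, b in zip(seg, seg[1:]) if a * b < 0),  # zero crossings
        })
    return features


alternating = [(-1.0) ** i for i in range(800)]  # 0.4 s at the assumed 2 kHz
feats = window_features(alternating)
# Two windows, each with MAV = RMS = 1.0, ZC = 399 and WL = 798.0.
```

Any trailing samples shorter than one window are dropped. An overlapping (sliding) window would instead step `start` by less than `win`.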

2605.20592 2026-05-21 cs.LG Version update

ReversedQ: Opportunities for Faster Q-Learning in Episodic Online Reinforcement Learning

Sofia R. Miskala-Dinc, Aviva Prins

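Affiliations: University of Maryland

AI summary: This paper studies the efficiency of model-free Q-learning in finite-horizon episodic Markov decision processes (MDPs). It proposes ReversedQ, which speeds up learning by changing the value-function update order, update frequency and initialization. Experiments show that it improves over RandomizedQ on a Bidirectional Diabolical Combination Lock and a chain MDP.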

Comments This paper contains 5 pages and 2 figures. To be presented at the Adaptive and Learning Agents workshop (ALA 2026) at AAMAS 2026

Abstract

We study model-free Q-learning in finite-horizon episodic Markov Decision Processes (MDPs) with stationary dynamics across episodes. We identify a central issue in nascent model-free posterior-sampling works: the reliance on delayed learning in order to prove theoretical guarantees. In particular, we identify three opportunities for faster learning - (i) value-function update order, (ii) update frequencies, and (iii) value-function initialization. Using Wang et al.'s RandomizedQ as a basis, we illustrate these changes and their individual (as well as cumulative) impact in multiple empirical studies. We find that our combined modifications, termed ReversedQ, improve scaled mean cumulative reward compared to RandomizedQ, from 9.53% to 78.78% in the Bidirectional Diabolical Combination Lock (BDCL), and from 21.76% to 61.81% in a chain MDP.
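
The first opportunity, value-function update order, can be isolated on a deterministic chain. The sketch assumes that finite-horizon Q-learning updates are applied after the episode ends, with learning rate 1 and a single reward on the final step. Sweeping the steps forward leaves the start state's estimate at zero after one episode. Sweeping backward propagates the reward to the start in that same episode. This is only an illustration of ordering; it is not RandomizedQ or ReversedQ, which also involve randomized updates, update frequencies and initialization.

```python
def make_chain_trajectory(horizon):
    """Always move right along a chain; reward 1 only on the final step.
    Each step is (state, action, reward, next_state); action 1 = right."""
    return [(h, 1, 1.0 if h == horizon - 1 else 0.0, h + 1) for h in range(horizon)]


def zero_q(horizon, n_states, n_actions=2):
    """Q[h][s][a] for h = 0..horizon; Q[horizon] stays zero (terminal)."""
    return [[[0.0] * n_actions for _ in range(n_states)] for _ in range(horizon + 1)]


def update_after_episode(Q, trajectory, lr=1.0, reverse=False):
    """Apply finite-horizon Q-learning updates once the episode has ended."""
    steps = range(len(trajectory) - 1, -1, -1) if reverse else range(len(trajectory))
    for h in steps:
        s, a, r, s_next = trajectory[h]
        target = r + max(Q[h + 1][s_next])
        Q[h][s][a] += lr * (target - Q[h][s][a])
    return Q


H = 5
traj = make_chain_trajectory(H)
forward = update_after_episode(zero_q(H, H + 1), traj, reverse=False)
backward = update_after_episode(zero_q(H, H + 1), traj, reverse=True)
print(forward[0][0][1], backward[0][0][1])  # 0.0 1.0
```

With forward ordering, information moves back only one step per episode, so reaching the start takes H episodes. The backward sweep achieves it in one.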

2605.20581 2026-05-21 cs.LG cond-mat.mtrl-sci Version update

TriForces: Augmenting Atomistic GNNs for Transferable Representations

Ali Ramlaoui, Alexandre Duval, Hannah Bull, Victor Schmidt, Hugues Talbot, Fragkiskos D. Malliaros, Joseph Musielewicz

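Affiliations: Université Paris-Saclay, CentraleSupélec, Inria, Gif-sur-Yvette, France

AI summary: TriForces separates composition and structure information and combines this with self-supervised learning. It improves performance on MatBench and QM9 without DFT labels and enables efficient similar-structure retrieval through its latent space. On OMat24 it reduces energy and force errors in the limited-data regime.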

Comments 28 pages, 11 figures. Accepted at ICML 2026

Abstract

Machine learning interatomic potentials (MLIPs) achieve excellent accuracy when trained on large Density Functional Theory (DFT) datasets. To be useful in practice, they must often be adapted to target chemistries using small and expensive task-specific datasets. However, MLIPs transfer inconsistently across domains, with representations that often lose accessible composition and structure information. To address this, we present TriForces, a model-agnostic three-stream framework that separates composition and structure information, combined with self-supervised learning to preserve transferable representations. TriForces improves performance on MatBench and QM9 over baselines without needing DFT labels and enables efficient similar structure retrieval through its learned latent space. On OMat24, in the limited-data training regime, TriForces reduces energy MAE by 57% with only 20K training samples and improves force MAE across sample sizes. We release pretrained TriForces variants across multiple MLIP architectures with code at https://github.com/Ramlaoui/triforces.

2605.20580 2026-05-21 cs.LG Version update

Deep Learning Surrogates for Emulating Stochastic Climate Tipping Dynamics

Adeline Hillier, Jennifer Sleeman, Jay Brett, Caroline Tang, Jenelle Millison, Anand Gnanadesikan

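Affiliations: Johns Hopkins Applied Physics Laboratory; Johns Hopkins University

AI summary: This paper proposes a dynamics-informed Temporal Fusion Transformer as a data-driven surrogate for computationally intensive Earth system simulations. The surrogate forecasts the timing of ocean-circulation tipping events at a small fraction of the simulator's computational cost.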

Abstract

This work explores a dynamics-informed Temporal Fusion Transformer (TFT) as a data-driven surrogate for computationally intensive Earth system simulations. Focusing on multivariate time series describing global ocean transport, we demonstrate the surrogate's ability to forecast tipping events across thousands of time steps. The data involve up to 21 non-stationary time series in addition to static covariates describing free parameters and initial conditions. Modifications to the architecture and objective function yield a surrogate that anticipates the timing of Atlantic and Pacific collapses to high fidelity and captures the stochastic uncertainty in transition timing across ensemble predictions. The learned surrogate achieves a 465x computational speedup over the numerical simulator while maintaining differentiability with respect to parameters and initial conditions.

2605.20577 2026-05-21 cs.AI cs.LG Version update

Mahjax: A GPU-Accelerated Mahjong Simulator for Reinforcement Learning in JAX

Soichiro Nishimori, Shinri Okano, Keigo Habara, Sotetsu Koyamada, Eason Yu, Masashi Sugiyama

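Affiliations: The University of Tokyo; RIKEN AIP; Nara Institute of Science and Technology; Kobe University; Kyoto University; ATR; The University of Sydney

AI summary: This paper presents Mahjax, a Riichi Mahjong environment implemented in JAX that uses GPU acceleration for large-scale parallel rollouts. It targets the stochasticity and high-dimensional state space of Mahjong, providing an efficient training platform for reinforcement learning.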

Abstract

Riichi Mahjong is a multi-player, imperfect-information game characterized by stochasticity and high-dimensional state spaces. These attributes present a unique combination of challenges that mirror complex real-world decision-making problems in reinforcement learning. While prior research has heavily relied on supervised learning from human play logs to pre-train the policy, algorithms capable of learning tabula rasa (from scratch) offer greater potential for general applicability, as evidenced by the AlphaZero lineage. To facilitate such research, we introduce Mahjax, a fully vectorized Riichi Mahjong environment implemented in JAX to enable large-scale rollout parallelization on Graphics Processing Units (GPUs). We also provide a high-quality visualization tool to streamline debugging and interaction with trained agents. Experimental results demonstrate that Mahjax achieves throughputs of up to 2 million and 1 million steps per second on eight NVIDIA A100 GPUs under the no-red and red rules, respectively. Furthermore, we validate the environment's utility for reinforcement learning by showing that agents can be trained effectively to improve their rank against baseline policies.

2605.20563 2026-05-21 cs.MA cs.AI cs.CL cs.LG cs.SE Version update

Multi-agent Collaboration with State Management

Mengyang Liu, Taozhi Chen, Zhenhua Xu, Xue Jiang, Yihong Dong

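Affiliations: Shanghai Jiao Tong University; Cortices AI; Emory University; Peking University

AI summary: This paper proposes STORM, a state-management approach for multi-agent collaboration. STORM mediates agents' interactions with a shared workspace so that each agent operates on a consistent view of the codebase and conflicting edits are detected and resolved at write time. It outperforms a git-worktree-based multi-agent baseline across multiple LLMs with competitive cost efficiency, suggesting that explicit state management is more effective than workspace isolation.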

Abstract

Recent advances in multi-agent systems have shown great potential for solving complex tasks. However, when multiple agents edit a shared codebase concurrently, their changes can silently conflict, and inconsistent views lead to integration failures. Existing multi-agent systems address this through workspace isolation (e.g., one git worktree per agent), but this defers conflict resolution to a post-hoc merge step where recovery is expensive. In this paper, we propose STORM, i.e., STate-ORiented Management for multi-agent collaboration. Specifically, STORM manages agent states by mediating their interactions with the shared workspace, ensuring that each agent operates on a consistent view of the codebase and that conflicting edits are detected and resolved at write time. We evaluate STORM on Commit0 and PaperBench across multiple LLMs. STORM outperforms the git-worktree-based multi-agent baseline by +18.7 on Commit0-Lite and +1.4 on PaperBench, while achieving comparable or better cost efficiency. Combined with single-agent runs, STORM reaches the highest scores of 87.6 and 78.2 on the two benchmarks, respectively, suggesting that explicit state management is a more effective foundation for multi-agent collaboration than workspace isolation. STORM can also be plugged seamlessly into any multi-agent system.
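To make "detected and resolved at write time" concrete, here is a minimal sketch of one common technique, optimistic concurrency control with per-file version numbers. The class and method names (`SharedWorkspace`, `read`, `write`) are hypothetical and this is not STORM's actual implementation; it only shows how a stale write can be rejected when it happens instead of surfacing later in a merge.

```python
class SharedWorkspace:
    """Hypothetical shared workspace using optimistic concurrency control."""

    def __init__(self):
        self.files = {}      # path -> current content
        self.versions = {}   # path -> version number, bumped on every write

    def read(self, path):
        """Return (content, version) so the caller knows which state it saw."""
        return self.files.get(path, ""), self.versions.get(path, 0)

    def write(self, path, content, base_version):
        """Apply the write only if nobody changed the file since base_version.

        Returns True on success and False when a conflict is detected; on a
        conflict the agent must re-read and reconcile before retrying.
        """
        current = self.versions.get(path, 0)
        if current != base_version:
            return False
        self.files[path] = content
        self.versions[path] = current + 1
        return True


ws = SharedWorkspace()
_, v_a = ws.read("utils.py")   # agent A reads version 0
_, v_b = ws.read("utils.py")   # agent B also reads version 0
ok_a = ws.write("utils.py", "def f(): return 1\n", v_a)  # accepted
ok_b = ws.write("utils.py", "def f(): return 2\n", v_b)  # stale base: rejected
```

Here `ok_a` is True and `ok_b` is False, so agent B learns about the conflict immediately and can re-read the file rather than discovering the clash in a post-hoc merge.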

2605.20559 2026-05-21 stat.ML cs.LG stat.AP stat.ME Version update

Group-Aware Matrix Estimation and Latent Subspace Recovery

Hamza Golubovic, Matthew Shen, Genevera I. Allen, Tarek M. Zikry

Affiliations: Department of Statistics, Columbia University; Irving Institute for Cellular Dynamics, Zuckerman Institute, Columbia University; School of Data and Information Sciences, University of North Carolina at Chapel Hill

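AI summary: This paper proposes GAME, a convex estimator for group-specific low-rank matrix estimation on heterogeneous data. It regularizes with overlapping nuclear-norm penalties to recover subgroup-specific subspace structure while preserving local latent structure in a shared coordinate system, and experiments on several datasets show that it outperforms conventional low-rank methods under structured missingness.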

Comments 12 pages, 6 main figures, 1 main algorithm

Abstract

Modern matrix completion problems often involve heterogeneous data whose rows simultaneously belong to many meta-categories, such as demographic and age groups in recommendation systems, or region and recording session labels in neural electrophysiological experiments. Standard low-rank estimators impose a single global latent geometry, which can recover average structure but may smooth away subgroup-specific variation, especially when observations are unevenly distributed across groups. We introduce Group-Aware Matrix Estimation (GAME), a convex estimator for overlapping subgroup-wise low-rank matrix estimation. GAME regularizes category-specific submatrices through overlapping nuclear-norm penalties, allowing related groups to borrow information while preserving local latent structure in a shared coordinate system. We provide finite-sample guarantees for both reconstruction error and subgroup-specific subspace recovery, showing how performance depends on sampling density, subgroup rank, and overlap structure. Experiments on synthetic, recommendation, ecological, and neuroscience datasets show that GAME is most beneficial in structured missingness regimes, where subgroup-aware regularization improves both reconstruction accuracy and latent subspace fidelity. Across these benchmarks, GAME is competitive or best among global low-rank, side-information, and modern imputation baselines, with the largest gains when subgroups exhibit distinct low-rank structure.
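One way to write an overlapping, group-wise nuclear-norm estimator of this kind is sketched below. It assumes a squared loss on the observed entries $\Omega$ and row groups $\mathcal{R}_1, \dots, \mathcal{R}_G$ that may overlap; the loss, the weights $\lambda_g$, and the notation are illustrative assumptions rather than the paper's exact formulation.

```latex
\hat{X} \in \arg\min_{X \in \mathbb{R}^{n \times p}}
  \frac{1}{2} \sum_{(i,j) \in \Omega} \left( Y_{ij} - X_{ij} \right)^2
  + \sum_{g=1}^{G} \lambda_g \left\lVert X_{\mathcal{R}_g,:} \right\rVert_*
```

Here $X_{\mathcal{R}_g,:}$ is the submatrix of rows in group $g$ and $\lVert \cdot \rVert_*$ is the nuclear norm (the sum of singular values). Each term is convex, so the objective is convex; because a row can belong to several groups, it is shrunk toward several low-rank structures at once, which is how related groups borrow information from one another.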

2605.20555 2026-05-21 cs.LG cs.AI Version update

Complementing reinforcement learning with SFT through logit averaging in the post training of LLMs

Xingwei Gan, Ying Zhu

Affiliations: UC San Diego

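AI summary: This paper proposes complementing reinforcement learning with logit averaging in LLM post-training and integrates the method into Group Relative Policy Optimization (GRPO) without KL regularization or a critic. A logit-averaging structure couples the trainable policy with a reference policy, exploiting the trainable policy's reasoning ability while retaining the formatting advantage of SFT.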

Abstract

We introduce a novel method that averages the logits of a frozen reference policy (e.g., SFT) and a trainable policy, and we incorporate the method into Group Relative Policy Optimization (GRPO). In contrast to Reinforcement Learning with Verifiable Rewards (RLVR) methods, our proposal does not involve Kullback-Leibler (KL) regularization or a critic; the trainable policy and the reference anchor are coupled through the logit-averaging structure to leverage the reasoning expertise of the trainable policy while maintaining the formatting advantage of SFT. Our method is evaluated on MATH, cn-k12, and MMLU, and the results show higher or at least comparable accuracy relative to the canonical KL-regularized GRPO.
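The averaging step has a simple interpretation: averaging two logit vectors and applying a softmax gives the normalized geometric mean of the two policies' token distributions, a product-of-experts combination. The sketch below checks this identity numerically on toy logits. It assumes equal weights of 1/2 and omits the GRPO update itself, in which only the trainable logits would receive gradients.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def averaged_policy(ref_logits, train_logits):
    """Token distribution from the average of frozen and trainable logits."""
    avg = [(a + b) / 2.0 for a, b in zip(ref_logits, train_logits)]
    return softmax(avg)


ref = [2.0, 0.5, -1.0]     # toy logits from a frozen SFT reference policy
train = [0.0, 1.5, 0.3]    # toy logits from the trainable policy

mixed = averaged_policy(ref, train)

# The same distribution, computed as the normalized geometric mean of both policies.
p, q = softmax(ref), softmax(train)
geo = [math.sqrt(pi * qi) for pi, qi in zip(p, q)]
norm = sum(geo)
geo = [g / norm for g in geo]
```

A token that both policies consider likely keeps a high probability, while a token that either policy strongly disfavors is suppressed, which gives one intuition for how the frozen reference can preserve SFT formatting.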

2605.20552 2026-05-21 stat.ML cs.LG Version update

Spectral bandits for smooth graph functions with applications in recommender systems

Tomáš Kocák, Michal Valko, Rémi Munos, Branislav Kveton, Shipra Agrawal

Affiliations: SequeL team, Inria France; Microsoft Research New England; Technicolor Research Center California; Microsoft Research Bangalore India

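AI summary: This paper studies a bandit problem for smooth functions on graphs and proposes a way to learn user preferences efficiently in recommender systems; by defining an effective dimension and designing algorithms that scale linearly in it, it achieves low-regret online learning.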

Comments Published at AAAI 2014 - SDMBD

Abstract

Smooth functions on graphs have wide applications in manifold and semi-supervised learning. In this paper, we study a bandit problem where the payoffs of arms are smooth on a graph. This framework is suitable for solving online learning problems that involve graphs, such as content-based recommendation. In this problem, each recommended item is a node and its expected rating is similar to those of its neighbors. The goal is to recommend items that have high expected ratings. We aim for algorithms whose cumulative regret does not scale poorly with the number of nodes. In particular, we introduce the notion of an effective dimension, which is small in real-world graphs, and propose two algorithms for solving our problem that scale linearly in this dimension. Our experiments on a real-world content recommendation problem show that a good estimator of user preferences for thousands of items can be learned from evaluations of just tens of nodes.
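Smoothness on a graph is usually measured with the graph Laplacian $L = D - W$: for payoffs $f$, the quadratic form $f^\top L f = \sum_{(i,j) \in E} w_{ij} (f_i - f_j)^2$ is small when neighboring nodes have similar values. The short sketch below computes this quantity for a toy item graph. It illustrates only the smoothness notion; the paper's bandit algorithms, effective dimension, and regret analysis are not reproduced here.

```python
def laplacian_quadratic_form(num_nodes, weighted_edges, f):
    """Return f^T L f for the Laplacian L = D - W of an undirected graph.

    weighted_edges: list of (i, j, w_ij), each undirected edge listed once.
    f: list of node values (e.g., expected ratings of items).
    """
    # Build L explicitly: degrees on the diagonal, -w_ij off the diagonal.
    L = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i, j, w in weighted_edges:
        L[i][i] += w
        L[j][j] += w
        L[i][j] -= w
        L[j][i] -= w
    return sum(f[i] * L[i][j] * f[j]
               for i in range(num_nodes) for j in range(num_nodes))


# Toy "similar items" graph: a path 0 - 1 - 2 with unit weights.
edges = [(0, 1, 1.0), (1, 2, 1.0)]
smooth = laplacian_quadratic_form(3, edges, [4.0, 4.0, 4.0])  # neighbors agree
rough = laplacian_quadratic_form(3, edges, [0.0, 5.0, 0.0])   # neighbors disagree
```

A value near zero, as for `smooth`, means a node's rating is well predicted by its neighbors, which is the structure that makes it possible to learn from evaluations of only a few nodes; `rough` equals $(0-5)^2 + (5-0)^2 = 50$.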

2605.20547 2026-05-21 cs.LG cs.AI stat.ML Version update

Latent Process Generator Matching

Lukas Billera, Hedwig Nora Nordlinder, Ben Murrell

Affiliations: Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet

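AI summary: This paper proposes a latent process generator matching framework that treats the observed generative state as a deterministic image of a tractable Markov process, extending Generator Matching theory to time-dependent latent conditional processes.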

Comments 18 pages, 1 figure

Abstract

Many recent flow-matching and diffusion-style generative models rely on auxiliary stochastic dynamics during training: a richer process is simulated to define conditional targets, but the auxiliary state is either intractable to sample at generation time or simply not part of the desired output. Existing Generator Matching theory formalises conditioning on static latent random variables, and several recent papers prove special cases of projection results for particular augmented-state constructions. We introduce latent process generator matching, a general framework that treats the observed generative state as a deterministic image $X_t=Φ(Y_t)$ of a tractable Markov process $Y_t$. We show that in this setting one may learn the generator of a stochastic process on the image space which has the same one-time marginal distributions as the projected process. This generalizes and subsumes the discrete latent process results from the literature, and extends Generator Matching from static latent variables to a rich family of time-dependent latent conditional processes.

2605.20545 2026-05-21 stat.ML cs.LG Version update

Sample Complexity of Transfer Learning: An Optimal Transport Approach

Haoyang Cao, Xin Guo, Wenpin Tang, Guan Wang

Affiliations: Tsinghua-Berkeley Shenzhen Institute

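AI summary: Taking an optimal transport perspective, this paper analyzes the sample efficiency of transfer learning and finds that when the data dimension d is greater than 3, the sample complexity of transfer learning is O(m^{-(α+1)/d}), compared with O(m^{-p/d}) for direct learning, where α denotes the smoothness of the data distribution and p the smoothness of the optimal target model.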

Abstract

Transfer learning is an essential technique for many machine learning/AI models of complex structures such as large language models and generative AI. The essence of transfer learning is to leverage knowledge from resolved source tasks for a new target task, especially when the sample size $m$ of the training data for the latter is low. In this work, we rigorously analyze the potential benefit of transfer learning in terms of sample efficiency. Specifically, taking an optimal transport viewpoint of transfer learning, we find that when the data dimension $d$ is higher than $3$, the sample complexity for transfer learning is $O(m^{-(α+1)/d})$, with $α$ indicating the smoothness of the data distribution, as opposed to the $O(m^{-p/d})$ sample complexity for direct learning with $p$ indicating the smoothness of the optimal target model. Our finding theoretically supports a better sample efficiency for transfer learning when the target task is optimizing over a family of not-so-smooth models (i.e., highly complex networks with the possible use of non-smooth activation functions). Using image classification as an example, we numerically demonstrate the sample efficiency of transfer learning: in the data-hungry regime, model performance can be significantly improved by transfer learning.
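The comparison between the two rates follows from a one-line calculation. For a sample size $m > 1$ and a fixed $d$, the map $x \mapsto m^{-x/d}$ is strictly decreasing, so:

```latex
m^{-(\alpha+1)/d} < m^{-p/d}
\iff \frac{\alpha+1}{d} > \frac{p}{d}
\iff p < \alpha + 1 .
```

The transfer-learning rate therefore decays faster exactly when the optimal target model's smoothness $p$ is below $\alpha + 1$, which matches the abstract's point about not-so-smooth model families. For example, with $d = 4$, $\alpha = 0$, $p = 1/2$, and $m = 10^4$, the two rates are $10^{-1}$ and $10^{-1/2} \approx 0.316$ (ignoring the constants hidden by the $O(\cdot)$ notation).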

2605.20030 2026-05-21 cs.LG math.OC Version update

Take It or Leave It: Intent-Controlled Partial Optimal Transport

Salil Parth Tripathi, Bertrand Chapron, Fabrice Collard, Nicolas Courty, Ronan Fablet

Affiliations: OceanDataLab; Ifremer; Université Bretagne Sud; IMT Atlantique

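AI summary: This paper proposes intent-controlled partial optimal transport (IC-POT), which replaces the global rejection mechanism with pointwise rejection costs to serve applications that need more structured, pointwise rejection, and demonstrates its practical value in positive-unlabeled learning and open-partial domain adaptation.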

Abstract

While optimal transport (OT) enforces a rigid constraint by requiring two measures to be matched exactly, partial optimal transport relaxes this requirement by allowing mass to remain unmatched through a global budget, scalar rebate, or uniform rejection rule. However, many applications call for more structured, pointwise rejection mechanisms, where the decision to leave mass unmatched depends on side-specific reliability, support geometry, or external information about which components should participate in the comparison. We introduce \emph{intent-controlled partial optimal transport} (IC-POT), a targeted generalization of partial transport that replaces the global rejection paradigm with pointwise rejection costs over both measures. We show that the resulting optimization problem admits a dual interpretation in terms of local acceptance thresholds and can be solved by recasting it as a balanced Kantorovich OT problem on an augmented support. Beyond theoretical analysis, we demonstrate the practical relevance of IC-POT in settings where rejection is driven by side information. In positive-unlabeled learning and open-partial domain adaptation, incorporating pointwise rejection rules that encode statistical structure improves fixed baseline pipelines. Finally, we motivate the use of IC-POT with a geophysical practical case: multi-modal satellite ocean measurements, for which physical and sensor priors naturally inform the rejection mechanism and define the retrieved comparable signal information.
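The reduction to balanced OT on an augmented support can be illustrated with the classical dummy-point construction: add dummy targets that absorb rejected source mass at cost $r_i$ and dummy sources that absorb rejected target mass at cost $s_j$. The sketch below assumes unit masses, so the balanced problem becomes an assignment problem, and it solves that problem by brute force, which is only viable for a handful of points. The function name and this particular construction are illustrative assumptions, not the paper's solver.

```python
from itertools import permutations


def ic_pot_unit_mass(cost, reject_src, reject_tgt):
    """Partial OT with pointwise rejection costs for unit-mass points.

    cost[i][j]    -- cost of matching source i to target j
    reject_src[i] -- cost of leaving source i unmatched
    reject_tgt[j] -- cost of leaving target j unmatched
    Returns (optimal total cost, sorted list of matched (i, j) pairs).
    """
    n, m = len(reject_src), len(reject_tgt)
    size = n + m
    # Rows: n real sources, then m dummy sources (absorb rejected targets).
    # Cols: m real targets, then n dummy targets (absorb rejected sources).
    aug = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(m):
            aug[i][j] = cost[i][j]
        for k in range(n):
            aug[i][m + k] = reject_src[i]
    for d in range(m):
        for j in range(m):
            aug[n + d][j] = reject_tgt[j]
        # aug[n + d][m + k] stays 0.0: dummy-to-dummy pairs are free.

    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(size)):  # O(size!) -- toy sizes only
        total = sum(aug[row][perm[row]] for row in range(size))
        if total < best_cost:
            best_cost, best_perm = total, perm
    matches = sorted((i, best_perm[i]) for i in range(n) if best_perm[i] < m)
    return best_cost, matches


src, tgt = [0.0, 1.0], [0.1, 5.0]
cost = [[abs(x - y) for y in tgt] for x in src]
total, pairs = ic_pot_unit_mass(cost, reject_src=[1.0, 1.0], reject_tgt=[1.0, 1.0])
```

Here the optimum is about 2.1: source 0 is matched to target 0 (cost 0.1), while source 1 and target 1 are both left unmatched (cost 1 each) because matching them would cost 4. Raising an individual point's rejection cost makes the solver more reluctant to drop that point, which is the kind of pointwise control the abstract describes. Practical solvers would use the Hungarian algorithm or a general OT solver instead of enumeration.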

2605.19537 2026-05-21 cs.LG Version update

The Silent Hyperparameter: Quantifying the Impact of Inference Backends on LLM Reproducibility

David Pape, Jonathan Evertz, Lea Schönherr

Affiliations: CISPA Helmholtz Center for Information Security

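AI summary: This paper studies how the inference backend affects LLM benchmark results, finding that the choice of backend alone can shift benchmark scores by up to 16.6 percentage points and cause high rates of output disagreement, and argues that the inference backend is a key hyperparameter.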

Abstract

Progress in LLMs is increasingly measured through standardized benchmarks, where state-of-the-art improvements are often separated by fractions of a percentage point. At the same time, the computational cost of evaluating modern LLMs has driven widespread adoption of specialized inference backends, software systems that execute trained models efficiently at inference time. While critical for scalability, system-level optimizations, such as custom CUDA kernels and reduced-precision arithmetic, can alter token probabilities and introduce non-determinism, possibly cascading into divergent generation. In this work, we first survey the inference landscape, identifying 200 distinct engines, and analyze 35,000 ML publications, finding that the specific inference stack is rarely reported despite this widespread diversity. We then present a systematic empirical study of how inference backends affect LLM benchmark results. Holding model weights, decoding parameters, and hardware constant, we evaluate five widely used inference engines, including vLLM, SGLang, and llama.cpp, across multiple open-weight models and established benchmarks. We show that the choice of backend alone can shift benchmark scores by up to 16.6 percentage points and induce high rates of output disagreement. By isolating backend optimizations and tracing the execution pipeline, we find this divergence is driven by system-level optimizations like prefix caching and CUDA graphs, custom kernels, and engine-specific defaults in logit processing. Our findings identify the inference backend as a previously unreported but consequential hyperparameter in the evaluation of LLMs and advocate standardized reporting of inference stacks to improve the reproducibility and interpretability of benchmark comparisons.
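A tiny illustration of one mechanism by which backend-level choices can change outputs: floating-point addition is not associative, so two kernels that reduce the same numbers in a different order can produce logits that differ in the last bits, and under greedy decoding that difference can flip the chosen token when two candidates are nearly tied. The "kernels" and toy logits below are made up for illustration and are not from the paper's experiments.

```python
def reduce_left(values):
    """Accumulate left to right, like one hypothetical kernel."""
    acc = 0.0
    for v in values:
        acc += v
    return acc


def reduce_right(values):
    """Accumulate right to left, like another hypothetical kernel."""
    acc = 0.0
    for v in reversed(values):
        acc += v
    return acc


def greedy_pick(logits):
    """Greedy decoding: highest logit wins; ties go to the earlier token."""
    return max(logits, key=lambda token: logits[token])


partials = [0.1, 0.2, 0.3]              # toy partial sums for token "A"
logit_a_left = reduce_left(partials)    # 0.6000000000000001
logit_a_right = reduce_right(partials)  # 0.6

backend_1 = greedy_pick({"B": 0.6, "A": logit_a_left})   # picks "A"
backend_2 = greedy_pick({"B": 0.6, "A": logit_a_right})  # tie, picks "B"
```

Explicit loops are used instead of the built-in `sum`, because recent Python versions use compensated summation for floats, which would hide the effect. Real divergences combine many such effects (custom kernels, reduced-precision arithmetic, prefix caching, engine-specific logit-processing defaults), which is why the abstract argues that the inference stack should be reported.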

2605.19503 2026-05-21 cs.RO cs.AI cs.LG Version update

ARC-RL: A Reinforcement Learning Playground Inspired by ARC Raiders

Carlo Romeo, Andrew D. Bagdanov

Affiliations: Media Integration and Communication Center – University of Florence

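AI summary: This paper presents ARC-RL, a reinforcement learning playground of four MuJoCo continuous-control environments whose robot morphologies are inspired by the bestiary of ARC Raiders. Using a unified observation template, action convention, and reward function, it studies how reinforcement learning algorithms perform under different morphologies and animation-style constraints.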

Abstract

Reinforcement learning for legged locomotion has matured into a stack of multi-component reward functions and physics-engine benchmarks whose morphologies are uniformly derived from real commercial hardware. Game NPCs, however, are bound by stylistic constraints absent from sim-to-real robotics and routinely take the form of creatures with no real-robot counterpart. We introduce ARC-RL, a suite of four MuJoCo continuous-control environments featuring robotic morphologies inspired by the bestiary of ARC Raiders: the 18-DoF tall hexapod Queen, the 12-DoF armoured hexapod Bastion, the 18-DoF compact hexapod Tick, and the 12-DoF quadruped Leaper. All four robots share a unified observation template, action convention, simulation cadence, and a single closed-form multi-component reward function whose only per-morphology variation lives in a small set of weights and parameters. The reward fuses a velocity-tracking tent, a healthy survive bonus, a phase-locked gait-compliance bonus/cost pair, action regularisers, three safety penalties, and a posture anchor; no motion-capture data enters the reward at any point. We additionally provide hand-crafted Central Pattern Generator demonstrators per morphology, which serve both as fixed expert references and as sources of prior data for offline-to-online training. On this playground, we conduct a controlled empirical study comparing standard online algorithms (SAC, SPEQ, SOPE-EO) and methods augmented with prior data (SACfD, SPEQ-O2O, SOPE), and characterise how each paradigm copes with the playground's morphological diversity and animation-style stylistic constraints. Source code is available at https://github.com/CarloRomeo427/ARC_RL.git.
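As a rough picture of what a single closed-form, multi-component reward with per-morphology weights can look like, here is a sketch that mirrors the listed terms (velocity-tracking tent, survive bonus, gait bonus/cost pair, action regularizer, safety penalties, posture anchor). Every functional form, observation key, and weight value below is a hypothetical placeholder, not ARC-RL's actual reward.

```python
def tent(x, target, width):
    """Tent (triangular) score: 1 at the target, falling linearly to 0 at +/- width."""
    return max(0.0, 1.0 - abs(x - target) / width)


def locomotion_reward(obs, action, w):
    """Hypothetical multi-component reward; only the weight dict w changes per robot."""
    r = w["track"] * tent(obs["forward_vel"], w["target_vel"], w["tent_width"])
    r += w["alive"] if obs["healthy"] else 0.0
    # Gait bonus/cost pair: reward feet in the expected phase, penalize the rest.
    r += w["gait_bonus"] * obs["feet_in_phase"]
    r -= w["gait_cost"] * obs["feet_out_of_phase"]
    r -= w["action_reg"] * sum(a * a for a in action)
    r -= w["safety"] * obs["num_safety_violations"]
    r -= w["posture"] * obs["posture_error"]
    return r


# Hypothetical weights for two morphologies: same formula, different numbers.
weights_a = {"track": 1.0, "target_vel": 1.0, "tent_width": 0.5, "alive": 0.1,
             "gait_bonus": 0.2, "gait_cost": 0.2, "action_reg": 0.01,
             "safety": 1.0, "posture": 0.5}
weights_b = {**weights_a, "target_vel": 1.5, "tent_width": 0.75}

obs = {"forward_vel": 1.25, "healthy": True, "feet_in_phase": 4,
       "feet_out_of_phase": 2, "num_safety_violations": 0, "posture_error": 0.1}
action = [0.5, -0.5]
r_a = locomotion_reward(obs, action, weights_a)
r_b = locomotion_reward(obs, action, weights_b)
```

Keeping the formula fixed and varying only the weight dictionary is one way to realize the abstract's statement that the per-morphology variation lives in a small set of weights and parameters.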

2605.19278 2026-05-21 q-fin.PM cs.LG Version update

Do Better Volatility Forecasts Lead to Better Portfolios? Evidence from Graph Neural Networks

Rylan Wade

Affiliations: University of Southern California

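AI summary: This paper examines whether graph neural networks improve realized volatility forecasts and whether those forecasts improve portfolio performance. Using weekly realized volatility for 465 S&P 500 stocks from 2015 to 2025, it compares heterogeneous autoregressive and long short-term memory baselines with graph neural network models built on rolling-correlation, sector, and Granger-causal graphs, with and without macroeconomic regime features. The lowest forecast error, the highest cross-sectional ranking accuracy, and the highest portfolio Sharpe ratio are achieved by three different models: forecast accuracy, ranking quality, and portfolio performance are related but not equivalent, and graph volatility models add value only when the portfolio rule can exploit the cross-sectional structure they encode.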

Abstract

This paper tests whether graph neural networks improve realized volatility forecasts and whether those forecasts improve portfolio performance. Using weekly realized volatility for 465 S&P 500 equities from 2015 to 2025, Heterogeneous Autoregressive and Long Short-Term Memory baselines are compared against GraphSAGE models built on rolling correlation, sector, and Granger-causal graphs, with and without macro regime features. The empirical finding is that the model with the lowest forecast MSE, the model with the highest cross-sectional ranking accuracy, and the model with the highest portfolio Sharpe ratio are three different models. Forecast accuracy, ranking quality, and portfolio performance are related but not interchangeable objectives. Graph volatility models add value only when the portfolio rule can exploit the cross-sectional structure they encode.
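The three objectives the paper separates are measured by quite different functions, which is part of why a single model need not win on all of them. A minimal sketch, assuming MSE for forecast error, Spearman rank correlation for cross-sectional ranking accuracy, and an annualized Sharpe ratio on weekly returns with a zero risk-free rate; the paper's exact metric definitions may differ, and the volatility numbers are toy values.

```python
import math
import statistics


def mse(y_true, y_pred):
    """Mean squared forecast error."""
    return statistics.fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def ranks(values):
    """0-based ranks; assumes no ties for simplicity."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    r = [0] * len(values)
    for rank, idx in enumerate(order):
        r[idx] = rank
    return r


def spearman(x, y):
    """Spearman rank correlation via the no-ties formula."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def annualized_sharpe(weekly_returns, periods_per_year=52):
    """Mean over sample standard deviation, scaled by sqrt(periods per year)."""
    mu = statistics.fmean(weekly_returns)
    sd = statistics.stdev(weekly_returns)
    return mu / sd * math.sqrt(periods_per_year)


true_vol = [0.10, 0.20, 0.30, 0.40]
pred_scaled = [0.20, 0.40, 0.60, 0.80]  # right order, wrong level
pred_close = [0.12, 0.18, 0.33, 0.31]   # close in level, two stocks misordered
```

Here `pred_scaled` ranks the stocks perfectly (Spearman 1.0) but has the larger MSE, while `pred_close` has the smaller MSE but a Spearman correlation of only 0.8: a toy version of the gap between forecast accuracy and ranking quality. The Sharpe ratio depends on yet another layer, the portfolio rule that turns forecasts into weights.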

2605.19138 2026-05-21 cs.RO cs.AI cs.LG Version update

COBALT: Crowdsourcing Robot Learning via Cloud-Based Teleoperation with Smartphones

Ayush Agarwal, Ansh Gandhi, Jeremy A. Collins, Omar Rayyan, Aryan Sarswat, Ranjani Koushik, Masoud Moghani, Ajay Mandlekar, Animesh Garg

Affiliations: Georgia Institute of Technology; University of California, Berkeley; New York University Abu Dhabi (NYUAD); University of Toronto; NVIDIA

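AI summary: This paper presents COBALT, a platform that uses cloud-based teleoperation from smartphones and other everyday devices to collect high-quality robot learning data at scale, improving the efficiency of robot learning in both simulation and the real world.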

Abstract

The scarcity of large-scale, high-quality demonstration data remains a bottleneck in scaling imitation learning for robotic manipulation. We present COBALT, a teleoperation platform designed to democratize robot learning at scale both in simulation and in the real world. By leveraging vectorized environments, our scalable, load-balanced infrastructure supports concurrent teleoperation by multiple users on a single GPU, yielding a significant reduction in teleoperation cost. Operators can connect from nearly anywhere on Earth using commonly available devices, including single or dual smartphones, VR headsets, 3D mice, and keyboards. An in-memory data cache and efficient video streaming keep control and rendering synchronous, sustaining dozens of concurrent users at 20 Hz with sub-100 ms end-to-end latency for up to 8 concurrent users per GPU. We also demonstrate stable operation supporting 256 simulated clients across 8 GPUs, underscoring the system's ability to scale across hardware and within individual servers. We perform a comprehensive user study showing that phone-based teleoperation performs comparably to or better than specialized hardware, enabling faster, more ergonomic data collection. To ensure data quality, COBALT logs a suite of real-time metrics to automatically filter suboptimal demonstrations. We further demonstrate that a structured user training curriculum significantly improves data collection quality. Guided by insights from our user study, we crowdsource the collection of a large-scale, high-quality pilot dataset with 7500+ demonstrations (50+ hours) collected with smartphones across nine countries over five days. We validate the dataset's quality by training state-of-the-art imitation learning algorithms. Please visit https://cobalt-teleop.github.io/ for more details.
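Load balancing with a per-GPU cap can be sketched as least-loaded assignment. The class below is hypothetical (it is not COBALT's code) and simply reuses the abstract's figure of up to 8 concurrent users per GPU as its default capacity.

```python
class GpuBalancer:
    """Hypothetical least-loaded assignment of teleoperators to GPUs."""

    def __init__(self, num_gpus, capacity=8):
        self.capacity = capacity          # max concurrent users per GPU
        self.loads = [0] * num_gpus       # active users on each GPU
        self.assignment = {}              # user id -> GPU index

    def connect(self, user):
        """Place the user on the least-loaded GPU, or return None if all are full."""
        gpu = min(range(len(self.loads)), key=lambda g: self.loads[g])
        if self.loads[gpu] >= self.capacity:
            return None
        self.loads[gpu] += 1
        self.assignment[user] = gpu
        return gpu

    def disconnect(self, user):
        """Free the slot held by the user."""
        gpu = self.assignment.pop(user)
        self.loads[gpu] -= 1


balancer = GpuBalancer(num_gpus=2)
placements = [balancer.connect(f"user{i}") for i in range(17)]
```

With two GPUs, the first 16 users are split 8 and 8 and the 17th is refused (`None`) until someone disconnects. The capacity is a parameter; for comparison, the abstract's stress test with 256 simulated clients on 8 GPUs works out to 32 clients per GPU.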

2605.18860 2026-05-21 cs.LG cs.CV Version update

Spectral structural distortion reveals redundant neurons in neural networks

Yongyu Wang

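AI summary: This paper proposes a criterion for neuron redundancy based on spectral structural distortion: by analyzing the relational structure before and after a layer's transformation, it identifies neurons that can be removed while preserving task performance.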

Abstract

Overparameterized neural networks often contain many removable neurons, yet what makes a neuron redundant remains poorly understood. Existing pruning criteria commonly rely on local quantities such as weight magnitude, activation strength, or gradient sensitivity, but these measures provide limited insight into the structural role of a neuron in the transformation performed by a layer. Here we show that neuronal redundancy can be characterized by weak participation in the spectral structural distortion induced by layer-wise representation transformations. For each hidden layer of a trained network, we record pre-activation and post-activation hidden states, model neurons as graph nodes, and construct input-side and output-side graphs that describe neuron-level relational structure before and after the layer transformation. We then define a spectral structural importance score that measures the contribution of each neuron to the dominant graph-spectral distortion between these two relational structures. Low-participation neurons are treated as structurally redundant and removed through an iterative pruning process in which scores are recomputed after each structural change. No parameter updates are performed during intermediate pruning rounds; after the target parameter reduction is reached, a single recovery fine-tuning stage is applied to the compact model. Direct ablation analysis and experiments across conventional neural networks, encoder-only Transformers, and decoder-only language models show that this graph-spectral criterion identifies removable neurons and Transformer units while preserving task performance after compression. These results suggest that neural redundancy is not merely a consequence of small weights or weak activations, but can be understood through weak participation in the spectral distortion of layer-wise relational structure.
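The pruning schedule described in the abstract (remove the least important neuron, recompute scores on the smaller network, repeat, and fine-tune only once at the end) can be sketched independently of how the scores are computed. The `score_fn` and the toy score below are hypothetical stand-ins; the paper's spectral structural importance score is not reproduced here.

```python
def iterative_prune(neurons, score_fn, target_count):
    """Greedy iterative pruning with score recomputation after every removal.

    neurons:      iterable of neuron ids in one layer
    score_fn:     callable(active_ids) -> {id: importance}, recomputed each round
    target_count: number of neurons to keep
    No parameters are updated inside this loop; a single recovery
    fine-tuning stage would follow once target_count is reached.
    """
    active = list(neurons)
    removed = []
    while len(active) > target_count:
        scores = score_fn(active)                      # recomputed on current structure
        victim = min(active, key=lambda n: scores[n])  # lowest participation
        active.remove(victim)
        removed.append(victim)
    return active, removed


# Toy score: neurons 0 and 1 duplicate each other, so each looks unimportant
# only while its partner is still present.
base = {0: 1.0, 1: 1.0, 2: 0.5, 3: 2.0}
partner = {0: 1, 1: 0}


def toy_score(active):
    return {n: base[n] * (0.1 if partner.get(n) in active else 1.0) for n in active}


kept, dropped = iterative_prune(range(4), toy_score, target_count=2)
```

Scoring once and dropping the two lowest would remove both duplicates 0 and 1; recomputing after each removal keeps one of them and drops neuron 2 instead, which is why rescoring after each structural change matters.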

2605.18833 2026-05-21 cs.LG cs.AI Version update

Automated Big Data Quality Assessment using Knowledge Graph Embeddings

Hadi Fadlallah, Rima Kilany, Mitri Haber, Ali Jaber

Affiliations: Saint-Joseph University; Lebanese University

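AI summary: This paper proposes an automated big data quality assessment method based on knowledge graph embeddings that integrates diverse knowledge graph representations and uses contextual information to generate a comprehensive data quality assessment plan tailored to each context.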

Comments 17 pages, 10 figures

Journal ref
International Journal of Data Mining, Modelling and Management 17.4 (2025) 383-405
Abstract

Automated data quality assessment is crucial for managing big data, but existing solutions face challenges in achieving accurate context-aware assessment. This paper presents a novel knowledge-based approach to enhance automated data quality assessment. Our approach utilizes knowledge graph embeddings to predict missing edges between the input dataset's context representation and the relevant quality rules and dimensions within a knowledge graph representing contextual data characteristics and the required quality assessment operations. We surpass conventional practices by integrating diverse representations within the knowledge graph, drawing insights from contextual information from a thorough literature investigation. This integration allows us to develop a comprehensive and context-specific data quality assessment plan tailored to each context. Leveraging the knowledge graph improves our understanding of the input dataset's context, overcoming the limitations of traditional methods that rely solely on strict matching and overlook contextual characteristics. By injecting numerical edge attributes, we assign corresponding weights to each predicted quality measurement, providing a comprehensive data quality assessment plan for the input dataset. To evaluate our approach, we leverage AmpliGraph, a framework developed and benchmarked by AccentureLabs. The evaluation involves employing a real-world radiation sensors dataset provided by the Lebanese Atomic Energy Commission (LAEC-CNRS). The results obtained from this evaluation demonstrate the capability of our solution to generate a comprehensive data quality assessment plan for the given input dataset.
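Predicting a missing edge between a dataset's context node and candidate quality rules is a knowledge-graph link-prediction problem. AmpliGraph offers several embedding models, and which one the authors used is not stated here, so the sketch below uses the classic TransE score $f(h, r, t) = -\lVert h + r - t \rVert_2$ with made-up two-dimensional embeddings and hypothetical entity names. It does not call AmpliGraph itself.

```python
import math


def transe_score(h, r, t):
    """TransE plausibility: higher (closer to 0) means h + r lands nearer to t."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_tails(head, relation, candidates):
    """Rank candidate tail entities by TransE score, most plausible first."""
    return sorted(candidates,
                  key=lambda name: transe_score(head, relation, candidates[name]),
                  reverse=True)


# Hypothetical embeddings, as if learned from a context/quality-rule knowledge graph.
dataset_context = [1.0, 0.0]     # e.g., a radiation-sensor time-series context
requires_check = [0.0, 1.0]      # relation: context -> quality rule
quality_rules = {
    "completeness_rule": [1.0, 1.0],
    "timeliness_rule": [0.0, 0.0],
    "accuracy_rule": [1.0, 0.5],
}
ranking = rank_tails(dataset_context, requires_check, quality_rules)
```

The ranking puts `completeness_rule` first (score 0), then `accuracy_rule` (score -0.5), then `timeliness_rule` (score about -1.41); the scores could in turn serve as weights on the predicted quality measurements.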

2605.18579 2026-05-21 cs.LG Version update

S2Aligner: Pair-Efficient and Transferable Pre-Training for Sparse Text-Attributed Graphs

Yuhan Wang, Haopeng Zhang, Yibo Ding, Jiaqi Yu, Xinyu Zhao, Yuhang Liu, Ziwei Zhang, Xiao Wang, Ruijie Wang

Affiliations: Beihang University; Tianjin University

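AI summary: This paper proposes S2Aligner, a pair-efficient and transferable pre-training method for sparse text-attributed graphs. By decoupling semantic alignment from structural modeling, it strengthens alignment without contaminating the shared semantic space, thereby reducing the cross-domain generalization gap.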

Comments 19 pages

Abstract

Pre-training on text-attributed graphs (TAGs) is central to building transferable graph foundation models, where LLM-as-Aligner methods align graph and text representations through the semantic knowledge of large language models. However, these methods usually assume that node texts provide sufficient and reliable supervision, an assumption often violated in real-world sparse TAGs. When textual anchors are missing, noisy, or uneven across domains, graph structures must be aligned with weak semantic evidence, leading to unreliable structure-semantics correspondence and sparsity-induced transfer bias. This paper presents S2Aligner, a sparsity-aware and structure-enhanced LLM-as-Aligner framework for graph-text pre-training on sparse TAGs. The key idea is to decouple semantic alignment from structural modeling, allowing topology-aware signals to enhance alignment without contaminating the shared semantic space. Specifically, S2Aligner decomposes graph-text representations into semantic and structural components, uses structure-oriented reconstruction with consistency control to inject reliable topology cues into text representations, and suppresses inconsistent structural signals under textual sparsity. Moreover, S2Aligner introduces sparsity-aware cross-domain risk balancing, which calibrates domain risks through a global-domain density ratio and downweights unreliable sparse samples via graph reliability estimation. Theoretical analysis shows that this objective reduces cross-domain generalization gaps by controlling domain risk discrepancy. Extensive experiments across diverse graph domains, sparsity levels, and downstream tasks demonstrate that S2Aligner consistently outperforms existing baselines.

2605.17946 2026-05-21 cs.AI cs.CV cs.LG Version update

SVFSearch: A Multimodal Knowledge-Intensive Benchmark for Short-Video Frame Search in the Gaming Vertical Domain

Lingtao Mao, Huangyu Dai, Xinyu Sun, Zihan Liang, Ben Chen, Chenyi Lei, Wenwu Ou

Affiliations * Kuaishou Technology

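AI Summary This paper proposes SVFSearch, the first multimodal knowledge-intensive benchmark for short-video frame search in the Chinese gaming domain. Using 5,000 four-choice test examples and 4,198 auxiliary training examples, it evaluates methods ranging from direct question answering to Plan-Act-Replan agents on short-video frame search.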

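AI Chinese Abstract (translated)

Multimodal large language models are increasingly used as agent backbones that understand multimodal inputs, plan retrieval actions, invoke external tools, and reason over retrieved information. However, existing benchmarks rarely evaluate this capability in short-video applications, where a paused frame is often visually ambiguous and answering requires vertical, long-tail, and fast-evolving domain knowledge. We introduce SVFSearch, the first open benchmark for short-video frame search in the Chinese gaming domain. SVFSearch contains 5,000 four-choice test examples and 4,198 auxiliary training examples, each centered on a paused game scene from a real short-video clip. To support fair and reproducible evaluation, SVFSearch provides a frozen offline retrieval environment, including a game-domain text corpus, a topic-linked image gallery, and text, image, and multimodal retrieval interfaces, avoiding dependence on uncontrolled web search APIs. We evaluate representative paradigms ranging from direct QA and RAG workflows to Plan-Act-Replan agents and learned search models. The results reveal large gaps between model-only answering, practical agentic search, and oracle knowledge: the best open-source direct-QA model reaches 66.4%, the best practical agent reaches 79.1%, and oracle knowledge reaches 95.4%. Further analysis reveals bottlenecks in visual grounding, retrieval quality, evidence-grounded reasoning, and tool-use behavior, including over-retrieval, answer-only shortcuts, and retrieval-induced misleading.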

English Abstract

Multimodal large language models are increasingly used as agent backbones that understand multimodal inputs, plan retrieval actions, invoke external tools, and reason over retrieved information. Yet existing benchmarks rarely evaluate this ability in short-video applications, where a paused frame is often visually ambiguous and answering requires vertical, long-tail, and fast-evolving domain knowledge. We introduce SVFSearch, the first open benchmark for short-video frame search in the Chinese gaming domain. SVFSearch contains 5,000 four-choice test examples and 4,198 auxiliary training examples, each centered on a paused game scene from a real short-video clip. To support fair and reproducible evaluation, SVFSearch provides a frozen offline retrieval environment with a game-domain text corpus, a topic-linked image gallery, and text, image, and multimodal retrieval interfaces, avoiding reliance on uncontrolled web search APIs. We evaluate representative paradigms ranging from direct QA and RAG workflows to Plan-Act-Replan agents and learned search models. Results reveal a large gap between model-only answering, practical agentic search, and oracle knowledge: the best open-source direct-QA model reaches 66.4%, the best practical agent achieves 79.1%, and oracle knowledge reaches 95.4%. Further analysis exposes bottlenecks in visual grounding, retrieval quality, evidence-grounded reasoning, and tool-use behavior, including over-search, answer-only shortcuts, and retrieval-induced misleading.

2605.17164 2026-05-21 cs.DC cs.AI cs.LG cs.PL Version update

Charon: A Unified and Fine-Grained Simulator for Large-Scale LLM Training and Inference

Mengtian Yang, Zhekun Zhang, Mingheng Wu, Jianwen Yan, Hanshi Sun, Li-wen Chang

Affiliations * University of Texas at Austin

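AI Summary This paper proposes the Charon simulator, which accurately predicts large language model performance through a unified, modular, and fine-grained approach. Experiments show high accuracy across different models and configurations, with prediction error below 5.35%, and in a real inference deployment Charon found a configuration that improves system throughput, demonstrating its practical value.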

Comments Accepted by MLSys 2026

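AI Chinese Abstract (translated)

In large-scale large language model (LLM) training and inference, achieving optimal performance is extremely challenging because of the complex design space of parallelism strategies, system optimizations, and hardware configurations. Accurate and fast performance simulation is essential for guiding optimization efforts and system research by validating "what-if" hypotheses. To this end, we introduce Charon, a unified, modular, and fine-grained simulator for accurately predicting LLM performance. Experiments show that Charon achieves high accuracy across different models and configurations, with an overall prediction error consistently below 5.35%, and below 3.74% even for training on a large-scale GPU cluster. In a practical inference deployment case, Charon discovered a configuration that improves system throughput over an engineering-tuned baseline configuration, demonstrating its significant real-world value.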

English Abstract

Deploying large-scale LLM training and inference with optimal performance is exceptionally challenging due to a complex design space of parallelism strategies, system optimizations, and hardware configurations. Accurate and rapid performance simulation is critical for guiding optimization efforts and system studies by validating "what-if" hypotheses. To address this, we introduce Charon, a unified, modular, and fine-grained simulator for accurately predicting LLM performance. Experiments show Charon achieves high accuracy across different models and configurations, with an overall prediction error consistently under 5.35%, and even under 3.74% for training with a large-scale GPU cluster. In a practical inference deployment case, Charon discovered a configuration that improved system throughput over an engineering-tuned baseline, demonstrating its significant real-world value.

2605.16812 2026-05-21 cs.LG cs.CR Version update

Jacobian-Guided Anisotropic Noise Reshaping for Enhancing Representation Utility under Local Differential Privacy

Youngmok Ha, Viktor Schlegel, Yidan Sun, Anil Anthony Bharath

Affiliations * Imperial College London; Imperial College London, Imperial Global Singapore

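AI Summary This paper proposes a Jacobian-guided anisotropic noise reshaping method to improve representation utility under local differential privacy. By identifying task-critical subspaces, the method selectively attenuates noise and reshapes the isotropic noise of standard LDP into an anisotropic distribution, heterogeneously modulating the impact of noise while preserving the per-dimension privacy budget and substantially improving data utility.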

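AI Chinese Abstract (translated)

Although local differential privacy (LDP) is a foundational primitive for distributed data collection, its stringent noise-injection requirement often causes severe degradation in data utility. This degradation stems from the task-agnostic nature of conventional LDP mechanisms, which inject noise uniformly across all dimensions regardless of their relative importance to the downstream objective. To address this, we propose a new approach that mitigates noise in task-relevant subspaces of the data representation. Our method identifies task-critical subspaces via the Jacobian matrix of a public downstream model, selectively attenuates noise along those dimensions, and reshapes the isotropic noise of standard LDP into an anisotropic distribution. The method preserves the per-dimension privacy budget while heterogeneously modulating the impact of noise, thereby substantially improving data utility. Moreover, our method applies to both linear and non-linear models and integrates seamlessly into existing mechanisms. Extensive experiments on CIFAR-10-C (brightness corruption at the highest severity level, 5) show that integrating our method improves the utility of PrivUnit2 and PrivUnitG by approximately 20% at ε = 7.5. The source code is available at https://github.com/ymha/jacobian-anr-ldp.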

English Abstract

While Local Differential Privacy (LDP) serves as a foundational primitive for distributed data collection, its stringent noise injection requirement often leads to severe degradation in data utility. This degradation stems from the task-agnostic nature of conventional LDP mechanisms, which inject noise uniformly across all dimensions regardless of their relative importance to the downstream objective. To address this issue, we propose a novel approach that mitigates noise in task-relevant subspaces of the data representation. Our method identifies task-critical subspaces via the Jacobian matrix of the public downstream model, selectively attenuates noise along those dimensions, and reshapes the isotropic noise of standard LDP into an anisotropic distribution. This method preserves the uniform per-dimension privacy budget while heterogeneously modulating noise impact across dimensions, thereby substantially enhancing data utility. Furthermore, our approach generalizes to both linear and non-linear models and integrates seamlessly with existing mechanisms. Extensive experiments on CIFAR-10-C (Brightness corruption at the highest severity level 5) demonstrate that integrating our approach improves the utility of PrivUnit2 and PrivUnitG by approximately 20% at ε = 7.5. The source code is available at https://github.com/ymha/jacobian-anr-ldp.
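
The geometric idea behind the reshaping can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' mechanism: it assumes a linear downstream model, so the Jacobian is simply its weight matrix `J`; it treats only the single most sensitive input direction as the task-critical subspace; and it redistributes noise variance so that the total variance (the trace of the covariance) matches isotropic unit-variance noise. It does not implement PrivUnit2, PrivUnitG, or any privacy accounting, and all function names are hypothetical.

```python
import math

def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def top_sensitive_direction(J, iters=100):
    """Power iteration on J^T J: its dominant eigenvector (the top right singular
    vector of J) is the input direction the model output is most sensitive to."""
    JT = transpose(J)
    v = [1.0] * len(J[0])
    for _ in range(iters):
        w = matvec(JT, matvec(J, v))
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def reshaped_covariance(v, atten=0.2):
    """Covariance atten * v v^T + b * (I - v v^T), with b chosen so the trace
    equals d, the total variance of isotropic N(0, I) noise (an assumption)."""
    d = len(v)
    b = (d - atten) / (d - 1)
    return [[atten * v[i] * v[j] + b * ((1.0 if i == j else 0.0) - v[i] * v[j])
             for j in range(d)] for i in range(d)]

def output_noise_energy(J, cov):
    """Expected squared output perturbation E||J n||^2 = trace(J cov J^T)."""
    d = len(cov)
    return sum(J[r][i] * cov[i][j] * J[r][j]
               for r in range(len(J)) for i in range(d) for j in range(d))

J = [[3.0, 0.0], [0.0, 0.5]]  # hypothetical Jacobian of a public downstream model
v = top_sensitive_direction(J)
iso = [[1.0, 0.0], [0.0, 1.0]]
aniso = reshaped_covariance(v)
print(output_noise_energy(J, iso), output_noise_energy(J, aniso))
```

Here the reshaped covariance lowers the expected output perturbation from 9.25 to 2.25 while keeping the total noise variance fixed; whether a given redistribution is compatible with an LDP guarantee is what the paper's analysis must establish and is not shown here.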

2605.16793 2026-05-21 cs.LG Version update

PULSE: Generative Phase Evolution for Non-Stationary Time Series Forecasting

Yangyou Liu, Zezhi Shao, Xinyu Chen, Hu Chen, Fei Wang, Yuankai Wu

Affiliations * College of Computer Science, Sichuan University, Chengdu, China; University of Chinese Academy of Sciences, Beijing, China; Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; Institute of Artificial Intelligence, University of Central Florida, Orlando, USA

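AI Summary Targeting the tension between stable representations and distribution shift in non-stationary time series forecasting, this paper proposes the PULSE framework, which guides phase evolution through physical hypotheses and adopts a Disentangle-Evolve-Simulate design philosophy. It improves robustness through phase-anchored disentanglement, a Phase Router, and Statistic-Aware Mixup; experiments show that a physics-informed inductive bias matters more than raw architectural complexity.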

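AI Chinese Abstract (translated)

Time series forecasting under non-stationary conditions faces a fundamental tension between stable representations and adaptation to distribution shifts. Existing methods implicitly rely on static historical assumptions, leading to a critical failure mode we call Phase Amnesia, in which models become blind to the evolving global context. To address this, we formalize non-stationary dynamics through three physical hypotheses: Wold decomposition, dynamical phase evolution, and heteroscedastic manifold generation. These principles inspire PULSE, a physics-inspired, plug-and-play framework that adopts a Disentangle-Evolve-Simulate design philosophy. Specifically, PULSE uses phase-anchored disentanglement to resolve optimization interference caused by dominant trends, employs a Phase Router to actively generate future trajectories, and introduces Statistic-Aware Mixup (SAM) to ensure robustness to out-of-distribution volatility. Experiments show that PULSE enables a simple MLP backbone to achieve state-of-the-art or highly competitive performance on 12 real-world benchmarks. This confirms that the right physics-informed inductive bias is more critical than raw architectural complexity for non-stationary forecasting. The code is available at: https://github.com/Gemost/PULSE.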

English Abstract

Time series forecasting under non-stationarity faces a fundamental tension between capturing stable representations and adapting to distribution shifts. Existing methods implicitly rely on static historical assumptions, leading to a critical failure mode we term Phase Amnesia, where models become blind to the evolving global context. To resolve this, we formalize non-stationary dynamics through three physical hypotheses: Wold decomposition, dynamical phase evolution, and heteroscedastic manifold generation. These principles inspire PULSE, a physics-informed, plug-and-play framework adopting a Disentangle-Evolve-Simulate design philosophy. Specifically, PULSE utilizes phase-anchored disentanglement to resolve optimization interference caused by dominant trends, employs a Phase Router to actively generate future trajectories, and introduces Statistic-Aware Mixup (SAM) to ensure robustness against out-of-distribution volatility. Empirically, PULSE enables a simple MLP backbone to achieve state-of-the-art or highly competitive performance across 12 real-world benchmarks. This validates that a correct physics-informed inductive bias is far more critical than raw architectural complexity for non-stationary forecasting. The code is available at: https://github.com/Gemost/PULSE.

2605.15944 2026-05-21 cs.RO cs.LG Version update

FocalPolicy: Frequency-Optimized Chunking and Locally Anchored Flow Matching for Coherent Visuomotor Policy

Qian He, Zhenshuo Yang, Wenqi Liang, Chunhui Hao, Nicu Sebe, Jiandong Tian

Affiliations * State Key Laboratory of Robotics and Intelligent Systems, Shenyang Institute of Automation, Chinese Academy of Sciences; University of the Chinese Academy of Sciences; University of Trento

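AI Summary This paper proposes FocalPolicy, a visuomotor policy that uses frequency-optimized chunking and locally anchored flow matching to address the balance between precision and foresight in coherent visuomotor policies.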

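AI Chinese Abstract (translated)

Visuomotor policies aim to learn complex manipulation tasks from expert demonstrations. However, generating smooth and coherent trajectories remains challenging because it requires balancing proximal precision with distal foresight. Existing methods usually focus on optimizing intra-chunk action distributions and often neglect inter-chunk coherence. As a result, inter-chunk discontinuities significantly hinder the learning of coherent long-horizon actions. To overcome this limitation and achieve a synergistic balance between precision and foresight, we propose FocalPolicy, a foresight-aware visuomotor policy that combines frequency-optimized chunking with locally anchored flow matching. We introduce a foresight composite objective that supervises time-domain alignment of proximal actions while regularizing frequency-domain structure over multiple future action chunks to improve cross-chunk coherence. To learn complex action distributions efficiently, we design locally anchored sampling to improve the efficiency of target-signal propagation during consistency flow matching training. Extensive experiments show that FocalPolicy outperforms existing methods and confirm the generality of our modules across other baselines. Project website: https://focalpolicy.github.io/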

English Abstract

Visuomotor policies aim to learn complex manipulation tasks from expert demonstrations. However, generating smooth and coherent trajectories remains challenging, as it requires balancing proximal precision with distal foresight. Existing approaches typically focus on optimizing intra-chunk action distributions, often neglecting the inter-chunk coherence. Consequently, inter-chunk discontinuities significantly impede the learning of coherent long-horizon actions. To overcome this limitation and achieve a synergetic balance between precision and foresight, we propose FocalPolicy, a foresight-aware visuomotor policy that combines Frequency-Optimized Chunking with Locally Anchored flow matching. We introduce a foresight composite objective that supervises time-domain alignment within the proximal actions while regularizing frequency-domain structure over multiple future action chunks to improve cross-chunk coherence. To efficiently learn complex action distributions, we design locally anchored sampling to enhance target signal propagation efficiency during consistency flow matching training. Extensive experiments demonstrate that FocalPolicy outperforms existing approaches and confirm the generalizability of our modules to other baselines. Project website: https://focalpolicy.github.io/
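
A foresight composite objective of this kind can be sketched as a time-domain error on the proximal actions plus a frequency-domain penalty over the whole predicted horizon. This is a minimal sketch under assumptions the abstract does not confirm: one-dimensional actions, a naive DFT, an L1 distance between spectral magnitudes, and a hypothetical weight `freq_weight`; the function names are illustrative, not FocalPolicy's code.

```python
import cmath

def dft_magnitudes(xs):
    """Magnitudes of the discrete Fourier transform of a 1-D sequence (naive O(n^2))."""
    n = len(xs)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(xs)))
            for k in range(n)]

def foresight_composite_loss(pred, target, n_proximal, freq_weight=0.1):
    """Time-domain MSE on the first n_proximal actions plus an L1 penalty between
    spectral magnitudes over the full horizon (all future chunks)."""
    prox = sum((p - t) ** 2 for p, t in zip(pred[:n_proximal], target[:n_proximal])) / n_proximal
    mp, mt = dft_magnitudes(pred), dft_magnitudes(target)
    freq = sum(abs(a - b) for a, b in zip(mp, mt)) / len(mp)
    return prox + freq_weight * freq

target = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]   # two chunks of four actions
jittery = [0.0, 1.0, 2.0, 3.0, 5.0, 4.0, 7.0, 6.0]  # exact proximal chunk, jittery future chunk
print(foresight_composite_loss(target, target, n_proximal=4))
print(foresight_composite_loss(jittery, target, n_proximal=4))
```

The second call has zero proximal error, so its positive loss comes entirely from the spectral term, which is the kind of cross-chunk signal a time-domain loss on proximal actions alone would miss.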

2605.15691 2026-05-21 cs.LG Version update

SEED: Targeted Data Selection by Weighted Independent Set

Yuan Zhang, Lifeng Guo, Junwen Pan, Wenzhao Zheng, Wen Zhou, Kuan Cheng, Kurt Keutzer, Shanghang Zhang

Affiliations * School of Computer Science, Peking University; Beijing University of Posts and Telecommunications; Tianjin University; EECS, UC Berkeley; Chinese Academy of Sciences

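AI Summary This paper proposes SEED, which formulates data selection as a weighted independent set (WIS) problem on a similarity graph to balance sample quality and diversity, and introduces node value calibration and local scale normalization to improve the robustness and scalability of data selection.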

Comments 20 pages

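AI Chinese Abstract (translated)

Data selection aims to identify a compact yet informative subset of a large-scale training corpus, balancing sample quality and collection diversity. We formulate the problem as a weighted independent set (WIS) on a similarity graph, where nodes represent data samples weighted by influence and edges connect semantically redundant pairs. This formulation naturally yields subsets that are both high-quality and diverse. In practice, however, two challenges arise: naive node weights cannot distinguish informative signals from gradient noise, and edge construction under heterogeneous domain distributions produces structurally imbalanced graphs that bias selection toward sparse regions. To address these issues, we introduce two refinements from a unified graph perspective: (1) node value calibration, which restricts influence estimation to the bilateral salient subspace so that node importance is grounded in task-relevant signals rather than surface statistics; and (2) local scale normalization, which adapts edge thresholds to local neighborhood density, mitigating graph imbalance caused by cross-domain distribution shift. Together, these components yield a robust and scalable data selection pipeline called SEED. We further construct Honeybee-Remake-SEED-200K, a compact multimodal dataset curated by SEED. Extensive experiments show that SEED outperforms state-of-the-art methods on instruction tuning, visual instruction tuning, and semantic segmentation across multiple model families.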

English Abstract

Data selection seeks to identify a compact yet informative subset from large-scale training corpora, balancing sample quality against collection diversity. We formulate this problem as a Weighted Independent Set (WIS) on a similarity graph, where nodes represent data samples weighted by influence, and edges connect semantically redundant pairs. This formulation naturally yields subsets that are simultaneously high-quality and diverse. However, two challenges arise in practice: naive node weights fail to distinguish informative signals from gradient noise, and edge construction under heterogeneous domain distributions produces structurally imbalanced graphs that bias selection toward sparse regions. To address these issues, we introduce two principled refinements from a unified graph perspective: (1) node value calibration that restricts influence estimation to the bilateral salient subspace to ground node importance in task-relevant signals rather than surface-level statistics; (2) local scale normalization that adapts edge thresholds to local neighborhood density, mitigating graph imbalance induced by cross-domain distribution shifts. Together, these components yield a robust and scalable data selection pipeline dubbed SEED. We further construct Honeybee-Remake-SEED-200K, a compact multimodal dataset curated by SEED. Extensive experiments show that SEED consistently outperforms state-of-the-art methods on instruction tuning, visual instruction tuning, and semantic segmentation across diverse model families.
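
The WIS formulation can be sketched with a redundancy graph and the standard greedy approximation (exact maximum-weight independent set is NP-hard, so practical pipelines rely on heuristics). This sketch assumes cosine similarity with a single global threshold and precomputed influence weights; SEED's node value calibration and local scale normalization are not reproduced, and the function names are hypothetical.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def redundancy_graph(embeddings, threshold):
    """Adjacency sets: an edge joins two samples whose cosine similarity exceeds threshold."""
    n = len(embeddings)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(embeddings[i], embeddings[j]) > threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def greedy_wis(weights, adj):
    """Greedy heuristic: keep the heaviest unblocked sample, then block its
    redundant neighbours, so no two kept samples share an edge."""
    kept, blocked = [], set()
    for i in sorted(range(len(weights)), key=lambda k: -weights[k]):
        if i not in blocked:
            kept.append(i)
            blocked.add(i)
            blocked |= adj[i]
    return sorted(kept)

embs = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99], [0.7, 0.7]]
influence = [0.9, 0.5, 0.4, 0.8, 0.3]  # hypothetical per-sample influence scores
graph = redundancy_graph(embs, threshold=0.95)
print(greedy_wis(influence, graph))
```

Samples 1 and 2 are dropped because each is a near-duplicate of a heavier sample, while the dissimilar sample 4 is kept despite its low weight, which is how the formulation trades quality against diversity.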

2605.15305 2026-05-21 cs.GR cs.LG Version update

WorldParticle: Unified World Simulation of Lagrangian Particle Dynamics via Transformer

Caoliwen Wang, Minghao Guo, Siyuan Chen, Heng Zhang, Mengdi Wang, Xingyu Ni, Hanson Sun, Kunyi Wang, Zherong Pan, Kui Wu, Lingjie Liu, Yin Yang, Chenfanfu Jiang, Taku Komura, Wojciech Matusik, Peter Yichen Chen

Affiliations * University of British Columbia; MIT CSAIL; Georgia Institute of Technology; Inria; Meta; Independent Researcher; University of Pennsylvania; University of Utah; University of California Los Angeles; University of Hong Kong

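AI Summary This paper proposes a Transformer-based particle simulator that can simulate, in a unified way, physical phenomena including cloth, elastic solids, Newtonian fluids, non-Newtonian fluids, granular materials, and molecular dynamics, achieving efficient simulation and generalization through a prediction-correction design and a shared particle representation.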

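AI Chinese Abstract (translated)

A unified simulator that can model diverse physical phenomena without solver-specific redesign has long been a goal of simulation science. We present a learning-based particle simulator built on a single Transformer architecture to simulate cloth, elastic solids, Newtonian fluids, non-Newtonian fluids, granular materials, and molecular dynamics. Our model adopts a prediction-correction design on a shared Lagrangian particle representation. An explicit predictor first advances particles under known external forces, producing an intermediate state that captures externally driven motion but not inter-particle interactions. A learned corrector then predicts residual position and velocity updates in three stages: a particle tokenizer encodes local particle-particle, particle-boundary, and topology-guided interactions; a super-token encoder merges particle tokens into a compact set of super tokens through alternating self-attention and token merging; and a super-token decoder lifts these super tokens back to particle resolution via cross-attention to predict per-particle position and velocity corrections. Progressive token merging reduces the attention cost of subsequent encoder layers by halving the number of tokens at each level, and the decoder communicates through the compact super-token set rather than full particle-particle attention. Across six dynamics categories, the same architecture generalizes to unseen materials, boundary configurations, initial conditions, and external forces. We further demonstrate downstream interactive control, inverse design, and learning from real-world manipulation data, reducing the need for per-phenomenon solver engineering.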

English Abstract

A unified simulator that can model diverse physical phenomena without solver-specific redesign is a long-standing goal across simulation science. We present a learning-based particle simulator built on a single transformer architecture to model cloth, elastic solids, Newtonian and non-Newtonian fluids, granular materials, and molecular dynamics. Our model follows a prediction-correction design on a shared Lagrangian particle representation. An explicit predictor first advances particles under the known external forces, producing an intermediate state that captures externally driven motion but not inter-particle interactions. A learned corrector then predicts the residual position and velocity updates through three stages: a particle tokenizer that encodes local particle-particle, particle-boundary, and topology-guided interactions; a super-token encoder that hierarchically merges particle tokens into a compact set of super tokens via alternating self-attention and token merging; and a super-token decoder that lifts these super tokens back to particle resolution through cross-attention to predict per-particle position and velocity corrections. Progressive token merging reduces the attention cost at successive encoder layers by halving the token count at each level, and the decoder communicates through the compact super-token set rather than full particle-to-particle attention. Across the six dynamics categories, the same architecture generalizes to unseen materials, boundary configurations, initial conditions, and external forces. We further demonstrate downstream interactive control, inverse design, and learning from real-world manipulation data, reducing the need for per-phenomenon solver engineering.
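
Two non-learned pieces of this design, the explicit predictor and the progressive halving of the token count, can be sketched directly. The sketch assumes a symplectic (semi-implicit) Euler step for the predictor and merges consecutive token pairs by averaging; the paper's learned merging, attention layers, and corrector are not shown, and all names are hypothetical.

```python
def explicit_predictor(pos, vel, force, mass, dt):
    """Advance each particle under known external forces only (symplectic Euler);
    inter-particle interactions are left to the learned corrector."""
    new_vel = [[v + dt * f / m for v, f in zip(vp, fp)]
               for vp, fp, m in zip(vel, force, mass)]
    new_pos = [[x + dt * v for x, v in zip(xp, vp)] for xp, vp in zip(pos, new_vel)]
    return new_pos, new_vel

def merge_tokens(tokens):
    """Halve the token count by averaging consecutive pairs; an odd leftover is kept."""
    merged = [[(a + b) / 2 for a, b in zip(tokens[i], tokens[i + 1])]
              for i in range(0, len(tokens) - 1, 2)]
    if len(tokens) % 2:
        merged.append(list(tokens[-1]))
    return merged

def token_hierarchy(tokens, min_tokens=1):
    """Token sets per encoder level under progressive halving."""
    levels = [tokens]
    while len(levels[-1]) > min_tokens:
        levels.append(merge_tokens(levels[-1]))
    return levels

# one particle under gravity; the corrector would then add residual updates
pos, vel = explicit_predictor([[0.0, 1.0]], [[1.0, 0.0]], [[0.0, -9.8]], [1.0], dt=0.1)
levels = token_hierarchy([[float(i)] for i in range(8)])
print(pos, vel, [len(level) for level in levels])
```

Because self-attention cost grows quadratically with token count, each halving cuts the attention cost of the next level by roughly a factor of four.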

2605.15157 2026-05-21 cs.RO cs.LG Version update

Hand-in-the-Loop: Improving VLA Policies for Dexterous Manipulation via Seamless Hand-Arm Intervention

Zhuohang Li, Liqun Huang, Wei Xu, Zhengming Zhu, Nie Lin, Xiao Ma, Xinjun Sheng, Ruoshi Wen

Affiliations * State Key Laboratory of Mechanical System and Vibration, School of Mechanical Engineering, Shanghai Jiao Tong University; Shanghai Key Laboratory of Intelligent Robotics, Meta Robotics Institute, Shanghai Jiao Tong University, Shanghai 200240, China; The University of Tokyo

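AI Summary This paper proposes Hand-in-the-Loop, which seamlessly integrates human intervention with autonomous policy execution to reduce abrupt changes in hand configuration, improving the robustness and efficiency of bimanual dexterous manipulation.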

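AI Chinese Abstract (translated)

Vision-Language-Action (VLA) models are prone to compounding errors in dexterous manipulation, where high-dimensional action spaces and contact-rich dynamics amplify policy deviations. Although interactive imitation learning (IIL) can refine policies with human correction data, applying it to high-degree-of-freedom robotic hands remains challenging: a command mismatch between human teleoperation and policy execution at the moment of intervention causes abrupt changes in robot-hand configuration, known as "gesture jumps". We present Hand-in-the-Loop (HandITL), a seamless human-in-the-loop intervention method that blends human corrective intent with autonomous policy execution to avoid gesture jumps in bimanual dexterous manipulation. Compared with taking over through direct teleoperation, HandITL reduces intervention jitter by 99.8% and preserves robust post-intervention manipulation, reducing grasp failures by 87.5% and mean completion time by 19.1%. We validate HandITL on tasks requiring bimanual coordination, tool use, and fine-grained long-horizon manipulation. When used to collect correction data for policy refinement, HandITL yields policies that outperform those trained on standard teleoperation data by 19% on average across three long-horizon dexterous tasks.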

English Abstract

Vision-Language-Action (VLA) models are prone to compounding errors in dexterous manipulation, where high-dimensional action spaces and contact-rich dynamics amplify small policy deviations over long horizons. While Interactive Imitation Learning (IIL) can refine policies through human correction data, applying it to high-degree-of-freedom (DoF) robotic hands remains challenging due to a command mismatch between human teleoperation and policy execution at the intervention moment, which causes abrupt robot-hand configuration changes, or "gesture jumps". We present Hand-in-the-Loop (HandITL), a seamless human-in-the-loop intervention method that blends human corrective intent with autonomous policy execution to avoid gesture jumps during bimanual dexterous manipulation. Compared with taking over control using direct teleoperation, HandITL reduces intervention jitter by 99.8% and preserves robust post-intervention manipulation, reducing grasp failures by 87.5% and mean completion time by 19.1%. We validate HandITL on tasks requiring bimanual coordination, tool use, and fine-grained long-horizon manipulation. When used to collect correction data for policy refinement, HandITL yields policies that outperform those trained with standard teleoperation data by 19% on average across three long-horizon dexterous tasks.
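
The gesture-jump problem, and one simple way to avoid it, can be illustrated as a command blend. This is a hypothetical illustration, not HandITL's method, whose blending rule the abstract does not specify: it assumes a linear ramp that shifts authority from the policy command to the human command over `ramp_steps` control steps, and it holds both commands fixed, whereas in practice both evolve over time.

```python
def blend_command(policy_cmd, human_cmd, steps_since_takeover, ramp_steps):
    """Hand joint command that moves linearly from the policy's command to the
    human's over ramp_steps control steps instead of switching instantly."""
    alpha = min(1.0, steps_since_takeover / ramp_steps)
    return [(1.0 - alpha) * p + alpha * h for p, h in zip(policy_cmd, human_cmd)]

def max_joint_jump(trajectory):
    """Largest per-step change of any joint across a command trajectory."""
    return max(max(abs(a - b) for a, b in zip(prev, cur))
               for prev, cur in zip(trajectory, trajectory[1:]))

policy_cmd = [0.0, 0.0, 0.0]  # hypothetical joint angles (rad) commanded by the policy
human_cmd = [1.0, 1.0, 1.0]   # hypothetical retargeted teleoperation command
direct = [policy_cmd, human_cmd]  # instant takeover
blended = [blend_command(policy_cmd, human_cmd, t, 10) for t in range(11)]
print(max_joint_jump(direct), max_joint_jump(blended))
```

The instant takeover produces a 1.0 rad jump in a single step, while the ramp spreads the same change over ten steps of about 0.1 rad each.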

2605.14364 2026-05-21 cs.LG Version update

MoRe: Modular Representations for Principled Continual Representation Learning on Sequential Data

Jiaqi Sun, Boyang Sun, Rasmy M. H., Xiangchen Song, Kun Zhang

Affiliations * Carnegie Mellon University; Mohamed bin Zayed University of Artificial Intelligence

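AI Summary This paper proposes MoRe, a framework for principled continual learning on sequential data through modular representations. Its core contribution is decomposing knowledge into an identifiable hierarchy of modules, enabling module reuse, alignment, and expansion, which improves plasticity and stability while preserving old modules.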

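AI Chinese Abstract (translated)

Continual learning requires models to adapt to new data while preserving previously acquired knowledge. At its core, this challenge can be viewed as principled one-step adaptation: integrating new information into existing representations with minimal interference. Most existing methods address it by modifying model parameters or architectures in a supervised, task-specific manner. However, the fundamental issue lies at the representational level: tasks require distinct yet structured representations that can be selectively updated without disrupting existing representations, and this structure should reflect the intrinsic organization of the data rather than task boundaries. In sequential data, time-delayed dependencies provide a natural signal for uncovering this organization, showing how fundamental representations give rise to more specific ones. Inspired by the modular organization of the human brain, we propose MoRe, a framework that identifies modularity within the representation itself rather than allocating it at the architectural level. MoRe decomposes knowledge into a hierarchy of fundamental and specific modules with identifiability guarantees, enabling principled module reuse, alignment, and expansion during adaptation while preserving old modules by construction. Experiments on synthetic benchmarks and real-world LLM activation data reveal interpretable hierarchical structure and improved plasticity-stability trade-offs, suggesting that MoRe is a principled foundation for continual adaptation.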

English Abstract

Continual learning requires models to adapt to new data while preserving previously acquired knowledge. At its core, this challenge can be viewed as principled one-step adaptation: incorporating new information with minimal interference to existing representations. Most existing approaches address this challenge by modifying model parameters or architectures in a supervised, task-specific manner. However, the underlying issue is representational: tasks require distinct yet structured representations that can be selectively updated without disrupting existing representations, while structure should reflect intrinsic organization in the data rather than task boundaries. In sequential data, time-delayed dependencies provide a natural signal for uncovering this organization, revealing how fundamental representations give rise to more specific ones. Inspired by the modular organization of the human brain, we propose MoRe, a framework that identifies modularity in the representation itself rather than allocating it at the architectural level. MoRe decomposes knowledge into a hierarchy of fundamental and specific modules with identifiability guarantees, enabling principled module reuse, alignment, and expansion during adaptation while preserving old modules by construction. Experiments on synthetic benchmarks and real-world LLM activations demonstrate interpretable hierarchical structure and improved plasticity-stability trade-offs, suggesting MoRe as a principled foundation for continual adaptation.

2605.13302 2026-05-21 cs.LG cs.SY eess.SY Version update

Safe Bayesian Optimization for Uncertain Correlation Matrices in Linear Models of Co-Regionalization

Jannis Lübsen, Annika Eichler

Affiliations * Institute of Control Systems, Hamburg University of Technology

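AI Summary This paper extends the safety guarantees of multi-task Bayesian optimization from intrinsic co-regionalization models to linear models of co-regionalization, which model inter-task correlations more flexibly by composing multiple features. It derives uniform error bounds for vector-valued functions sampled from a Gaussian process with a linear model of co-regionalization kernel, and a numerical comparison on a safe multi-task Bayesian optimization benchmark shows the potential performance advantages of linear models of co-regionalization.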

Comments Accepted at IFAC WC26

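AI Chinese Abstract (translated)

This paper extends the safety guarantees of multi-task Bayesian optimization from intrinsic co-regionalization models to linear models of co-regionalization. The latter offer a more flexible way to model inter-task correlations by composing multiple features. We derive uniform error bounds for vector-valued functions sampled from a Gaussian process with a linear model of co-regionalization kernel. In addition, a numerical comparison on a safe multi-task Bayesian optimization benchmark shows the potential performance advantages of linear models of co-regionalization.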

English Abstract

This paper extends safety guarantees for multi-task Bayesian optimization with uncertain co-regionalization matrices from intrinsic co-regionalization models to linear models of co-regionalization. The latter allows for more flexible modeling of the inter-task correlations by composing multiple features. We derive uniform error bounds for vector-valued functions sampled from a Gaussian process with a linear model of co-regionalization kernel. Furthermore, we show the potential performance gains of linear models of co-regionalization in a numerical comparison on a safe multi-task Bayesian optimization benchmark.
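
The difference between the two co-regionalization models is easiest to see in the kernel. In a linear model of co-regionalization, the covariance between task i at input x and task j at input x' is a sum over latent features q of B_q[i][j] k_q(x, x'); with a single term it reduces to the intrinsic co-regionalization model. The sketch below assumes RBF latent kernels and rank-one matrices B_q = a_q a_q^T, and checks that the resulting multi-task Gram matrix is positive definite via a Cholesky factorization. It does not implement the paper's error bounds or safe optimization, and all names are hypothetical.

```python
import math

def rbf(x, y, lengthscale):
    return math.exp(-((x - y) ** 2) / (2.0 * lengthscale ** 2))

def outer(a):
    """Rank-one co-regionalization matrix a a^T."""
    return [[u * v for v in a] for u in a]

def lmc_kernel(x, i, y, j, coregs, lengthscales):
    """Sum over latent features q of B_q[i][j] * k_q(x, y)."""
    return sum(B[i][j] * rbf(x, y, ls) for B, ls in zip(coregs, lengthscales))

def is_positive_definite(K, jitter=1e-9):
    """Attempt a Cholesky factorisation of K + jitter * I; fail on a non-positive pivot."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(r + 1):
            s = K[r][c] - sum(L[r][k] * L[c][k] for k in range(c))
            if r == c:
                s += jitter
                if s <= 0.0:
                    return False
                L[r][r] = math.sqrt(s)
            else:
                L[r][c] = s / L[c][c]
    return True

coregs = [outer([1.0, 0.9]), outer([0.5, -0.5])]  # two hypothetical latent features
lengthscales = [0.3, 2.0]
points = [(x, task) for task in (0, 1) for x in (0.0, 0.5, 1.0)]
K = [[lmc_kernel(x, i, y, j, coregs, lengthscales) for (y, j) in points] for (x, i) in points]
print(is_positive_definite(K))
```

With two terms, the correlation between the tasks changes with the input distance (strongly positive at short range through the first feature, negative at long range through the second), which a single ICM term with one shared kernel cannot express.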

2605.12483 2026-05-21 cs.LG cs.AI Version update

Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training

Yuanda Xu, Hejian Sang, Zhengze Zhou, Ran He, Zhipeng Wang, Alborz Geramifard

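AI Summary This paper proposes an empirical sparse-to-dense reward principle for language-model post-training: sparse reward is used on the teacher model for exploration and discovery, and the resulting behavior is then compressed into the deployment model through dense supervision, outperforming GRPO on math problems.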

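AI Chinese Abstract (translated)

When labeled verifiable training data is the binding constraint, each checked example should be allocated to the model and reward density where it is most informative. We identify a reward-density principle that governs this allocation: sparse sequence-level reward is most useful on models that can explore and discover better behavior, while dense token-level teacher supervision is better suited to compressing that behavior into a smaller deployment model. The principle yields a simple allocation rule: spend scarce labeled data on the strongest available teacher, then transfer the reward-shaped behavior downstream as dense supervision. We evaluate this rule on verifiable math with Qwen3 and Llama models through a four-stage workflow: teacher RL, forward-KL warmup, on-policy distillation, and optional post-bridge student RL. At a fixed Qwen3-1.7B deployment-student size, an RL-improved 8B teacher distilled through the dense bridge outperforms direct GRPO on the same student (79.3% vs. 75.9% on MATH; 25.2% vs. 19.8% on AIME 2024, avg@16), while transfer from the same teacher before RL performs worse. A component ablation confirms that every stage matters: replacing the RL-improved teacher with the raw teacher costs 7.8 MATH points, removing the forward-KL warmup costs 1.7 points, and removing on-policy distillation costs 3.3 points. The teacher-quality ordering (raw-teacher transfer < direct GRPO < RL-teacher transfer) replicates with Llama-3.1-8B-Instruct as the student and Llama-3.3-70B-Instruct as the teacher. The operational lesson is to avoid spending scarce labeled data on the least prepared policy: use sparse reward for teacher-side discovery, dense transfer for student-side compression, and student-side sparse reward only after the bridge.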

English Abstract

In settings where labeled verifiable training data is the binding constraint, each checked example should be allocated to the model and reward density where it is most informative. We identify a reward-density principle that governs this allocation: sparse sequence-level reward is most useful on models that can explore and discover better behavior, while dense token-level teacher supervision is better suited for compressing that behavior into a smaller deployment model. The principle yields a simple allocation rule: use scarce labeled data upstream on the strongest available teacher, then transfer the reward-shaped behavior downstream as dense supervision. We evaluate this rule through a four-stage workflow (teacher RL, forward-KL warmup, on-policy distillation, optional post-bridge student RL) on verifiable math with Qwen3 and Llama models. At fixed Qwen3-1.7B deployment-student size, an RL-improved 8B teacher distilled through the dense bridge outperforms direct GRPO on the same student (79.3% vs. 75.9% on MATH; 25.2% vs. 19.8% on AIME 2024, avg@16), while transfer from the same teacher before RL underperforms. A component ablation confirms that each stage is load-bearing: replacing the RL-improved teacher with a raw teacher costs 7.8 MATH points, removing the forward-KL warmup costs 1.7, and removing on-policy distillation costs 3.3. The teacher-quality ordering (raw-teacher transfer < direct GRPO < RL-teacher transfer) replicates on Llama-3.1-8B-Instruct with a Llama-3.3-70B-Instruct teacher. The operational lesson is to avoid spending scarce labeled data on the least prepared policy: use sparse reward for teacher-side discovery, dense transfer for student compression, and student-side sparse reward only after the bridge.
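
The sparse/dense distinction can be made concrete by comparing what each signal provides per sampled sequence: a verifiable reward gives one scalar, while a teacher gives a full distribution at every token. The sketch below assumes per-position logits over a shared vocabulary and computes forward KL(teacher || student), the divergence named for the warmup stage, and reverse KL(student || teacher), which is often used as the per-token objective in on-policy distillation. It is an illustration under these assumptions, not the authors' training code, and the function names are hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    """KL(p || q) for two categorical distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def forward_kl_loss(teacher_logits, student_logits):
    """Mean over positions of KL(teacher || student): dense and mass-covering."""
    return sum(kl(softmax(t), softmax(s))
               for t, s in zip(teacher_logits, student_logits)) / len(teacher_logits)

def reverse_kl_loss(teacher_logits, student_logits):
    """Mean over positions of KL(student || teacher): dense and mode-seeking."""
    return sum(kl(softmax(s), softmax(t))
               for t, s in zip(teacher_logits, student_logits)) / len(teacher_logits)

def verifiable_reward(answer, reference):
    """Sparse sequence-level signal: a single 0/1 scalar per sampled answer."""
    return 1.0 if answer.strip() == reference.strip() else 0.0

teacher = [[2.0, 0.0, -1.0], [0.5, 0.5, 0.0]]  # hypothetical logits: 2 positions x 3 tokens
student = [[0.0, 0.0, 0.0], [0.5, 0.5, 0.0]]
print(forward_kl_loss(teacher, student), reverse_kl_loss(teacher, student))
print(verifiable_reward(" 42", "42"))
```

The two KL directions give different values on the same logits, and both provide a gradient signal at every position, whereas the verifiable reward says only whether the whole answer was right.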

2605.12196 2026-05-21 cs.LG Version update

ECTO: Exogenous-Conditioned Temporal Operator for Ultra-Short-Term Wind Power Forecasting

Cao Yuan, Junjun Wang

Affiliations * Wuhan Polytechnic University; Wuhan Public Meteorological Service Center

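AI Summary This paper proposes ECTO, a unified framework that uses physically-grounded variable selection and an exogenous-conditioned regime refinement module to efficiently model the non-stationary, condition-dependent nature of wind generation in ultra-short-term wind power forecasting, achieving the best mean squared error on wind farms with different climates, capacities, and exogenous-variable dimensions.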

Comments 42 pages, 10 figures, 9 tables

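AI Chinese Abstract (translated)

Accurate ultra-short-term wind power forecasting is critical for grid dispatch and reserve management, but it is challenging because wind generation is non-stationary and condition-dependent. Meteorological exogenous variables contain substantial predictive information, yet the most informative combination of variables varies across sites, operating conditions, and prediction horizons. Existing deep learning methods either treat exogenous inputs as generic auxiliary channels through uniform mixing or soft gating, or rely on fixed preprocessing steps such as PCA, without exploiting the physical structure of meteorological variables. We propose ECTO (Exogenous-Conditioned Temporal Operator), a unified framework that decomposes exogenous-variable modeling into two complementary modules. Physically-Grounded Variable Selection (PGVS) uses a domain-informed physical prior and sparsemax activations to perform hierarchical, group-aware sparse selection, producing a compact, condition-adaptive exogenous context. Exogenous-Conditioned Regime Refinement (ECRR) routes the forecast through learned regime experts that apply gain-bias calibration and horizon-specific corrections via a mixture-of-experts paradigm. In experiments on three wind farms spanning different climates, capacities (66-200 MW), and exogenous dimensions (11-13 variables), ECTO achieves the lowest mean squared error at every site, with relative improvements over the strongest baseline ranging from 2.2% to 5.2% and widening to 8.6% at the longer prediction horizon (H = 32). Ablation analysis confirms that each exogenous-related component contributes positively (PGVS +1.84%, ECRR +2.86%), and interpretability analysis reveals that PGVS learns physically meaningful, site-specific variable selection patterns, while ECRR converges to consistent calibration strategies.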

English Abstract

Accurate ultra-short-term wind power forecasting is critical for grid dispatch and reserve management, yet remains challenging due to the non-stationary, condition-dependent nature of wind generation. Meteorological exogenous variables carry substantial predictive information, but the most informative variable combination varies across sites, operating conditions, and prediction horizons. Existing deep learning approaches either treat exogenous inputs as generic auxiliary channels through uniform mixing or soft gating, or rely on fixed preprocessing steps such as PCA, without exploiting the physical structure of meteorological variables. We propose ECTO (Exogenous-Conditioned Temporal Operator), a unified framework that decomposes exogenous variable modeling into two complementary modules. Physically-Grounded Variable Selection (PGVS) performs hierarchical, group-aware sparse selection over exogenous variables using a domain-informed physical prior and sparsemax activations, producing a compact, condition-adaptive exogenous context. Exogenous-Conditioned Regime Refinement (ECRR) routes the forecast through learned regime experts that apply gain-bias calibration and horizon-specific corrections via a mixture-of-experts paradigm. Experiments on three wind farms spanning different climates, capacities (66-200 MW), and exogenous dimensions (11-13 variables) demonstrate that ECTO achieves the lowest MSE across all sites, with relative improvements over the strongest baseline ranging from 2.2% to 5.2%, widening to 8.6% at the longer prediction horizon (H = 32). Ablation analysis confirms that each exogenous-related component contributes positively (PGVS +1.84%, ECRR +2.86%), and interpretability analysis reveals that PGVS learns physically meaningful, site-specific variable selection patterns, while ECRR converges to well-separated calibration strategies consistent across sites.
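
Sparsemax, the activation PGVS uses, is the Euclidean projection of a score vector onto the probability simplex; unlike softmax it can assign exact zeros, which is what makes the selection sparse. The first function below is the standard sort-based sparsemax. The two-level `hierarchical_selection` is a hypothetical illustration of group-aware selection (sparsemax over variable groups, then within each group), not ECTO's exact PGVS, which additionally uses a physical prior; the scores are placeholders.

```python
def sparsemax(z):
    """Projection of z onto the probability simplex (sort-based algorithm)."""
    zs = sorted(z, reverse=True)
    cumsum, support_sum, k = 0.0, 0.0, 0
    for i, v in enumerate(zs, start=1):
        cumsum += v
        if 1.0 + i * v > cumsum:  # the i-th largest score stays in the support
            k, support_sum = i, cumsum
    tau = (support_sum - 1.0) / k  # threshold subtracted from every score
    return [max(v - tau, 0.0) for v in z]

def hierarchical_selection(group_scores, variable_scores):
    """Weight of a variable = sparsemax weight of its group times its sparsemax
    weight within the group; groups that receive zero weight drop out entirely."""
    group_w = sparsemax(group_scores)
    return [[g * w for w in sparsemax(scores)]
            for g, scores in zip(group_w, variable_scores)]

# hypothetical scores for three variable groups and the variables inside each
weights = hierarchical_selection([1.0, 0.8, -0.5], [[0.3, 0.2], [1.0, -1.0], [0.5]])
print(sparsemax([1.0, 0.8, -0.5]), weights)
```

Here the third group and one variable in the second group receive exactly zero weight, while the remaining weights still sum to one.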

2605.11302 2026-05-21 cs.LG cs.AI cs.CL Version update

A Theory of Time-Sensitive Language Generation: Sparse Hallucination Beats Mode Collapse

Atul Ganju, Travis McVoy, Shaddin Dughmi, Shang-Hua Teng

Affiliations * University of Southern California

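AI Summary This paper studies language generation in the limit under a global preference ordering and proposes time-sensitive language generation, showing that sparse hallucination overcomes mode collapse and that optimal density can be achieved under specific conditions.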

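AI Chinese Abstract (translated)

We study language generation in the limit under a global preference ordering on strings, as introduced by Kleinberg and Wei. As in prior work, we aim for breadth, but add a timeliness requirement: higher-ranked strings should be generated earlier. A string is credited only if it is generated before its deadline, which is determined by a function mapping the string's rank in the target language to the time by which it must be produced. This is consistent with the inductive bias in machine learning that, all else being equal, favors simpler or more plausible outputs. We show that timely generation is impossible in a strong sense for eventually consistent generators, the protagonists of most prior related work. Under perhaps the mildest relaxation of consistency, a hallucination rate that vanishes over time, we show that our impossibility result can be circumvented. In particular, we can achieve optimal density with respect to any superlinear deadline function. We also show this is tight by ruling out timely generation with linear deadlines and a vanishing hallucination rate.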

English Abstract

We study language generation in the limit under a global preference ordering on strings, as introduced by Kleinberg and Wei. As is done in previous work, we aim for breadth, but impose an additional requirement of timeliness: higher-ranked strings should be generated earlier. A string is then only credited if it is generated before a deadline, where its deadline is defined by a function that maps a string's rank in the target language to the time by which it must be produced. This is in keeping with a central consideration in machine learning, where inductive bias favors "simpler" or "more plausible" outputs, all else being equal. We show that timely generation is impossible in a strong sense for eventually consistent generators, the protagonists of most prior related work. Under what is perhaps the mildest natural relaxation of consistency, a hallucination rate that vanishes over time, we show that we can circumvent our impossibility result. In particular, we can achieve optimal density with respect to any superlinear deadline function. We also show this is tight by ruling out timely generation with linear deadlines and vanishing hallucination rate.
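
The crediting rule can be illustrated on a finite prefix of a generator's output. This toy sketch is not the paper's construction: it assumes a fixed rank for each string of a small target language, credits a string if it is emitted at some step t no later than deadline(rank), and compares a linear deadline with a quadratic (superlinear) one. It only illustrates why a single early hallucination is costly under linear deadlines; it proves nothing about the limit, and all names are hypothetical.

```python
def credited(outputs, rank, deadline):
    """Target-language strings emitted at some step t <= deadline(rank).
    outputs[t - 1] is the string emitted at time step t."""
    hits = set()
    for t, s in enumerate(outputs, start=1):
        r = rank.get(s)
        if r is not None and t <= deadline(r):
            hits.add(s)
    return hits

def hallucination_rate(outputs, rank):
    """Fraction of emitted strings that are not in the target language."""
    return sum(1 for s in outputs if s not in rank) / len(outputs)

rank = {"a": 1, "b": 2, "c": 3, "d": 4}  # hypothetical preference order of the target language
outputs = ["a", "x", "b", "c", "d"]      # "x" is a hallucination
print(sorted(credited(outputs, rank, lambda r: r)))      # linear deadline
print(sorted(credited(outputs, rank, lambda r: r * r)))  # superlinear deadline
print(hallucination_rate(outputs, rank))
```

With the linear deadline, the one hallucinated step at t = 2 pushes every later string past its deadline, so only "a" is credited; the quadratic deadline leaves enough slack to absorb it and credits all four strings.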

2605.10830 2026-05-21 cs.CV cs.LG Version update

Predicting 3D structure by latent posterior sampling

Azmi Haider, Dan Rosenbaum

Affiliations * Department of Computer Science; University of Haifa; Department of Computational Science

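AI Summary This paper proposes a probabilistic modeling approach that combines a NeRF representation with diffusion models to accurately predict 3D structure from different types of observations (such as single-view, multi-view, and noisy images, sparse pixels, and sparse depth data).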

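AI Chinese Abstract (translated)

The remarkable achievements of generative models for 2D images and of neural field representations for 3D scenes offer an attractive opportunity to combine the strengths of both approaches. In this work, we propose a method that combines a NeRF-based 3D scene representation with probabilistic modeling and inference using diffusion models. We treat 3D reconstruction as a perception problem with inherent uncertainty, which can therefore benefit from probabilistic inference methods. The core idea is to represent the 3D scene as a stochastic latent variable whose prior distribution we can learn and use to perform posterior inference given a set of observations. We perform posterior sampling with the score-based inference method of diffusion models, combined with a likelihood term computed from a reconstruction model that includes volumetric rendering. We train the model in two stages: first, we train the reconstruction model while auto-decoding latent representations for a dataset of 3D scenes; then, we train a diffusion-model prior over the latent space. By using the model to generate samples from the posterior, we show that various 3D reconstruction tasks can be performed, differing in the type of input observations used. We demonstrate reconstruction from single-view, multi-view, and noisy images, sparse pixels, and sparse depth data. These observations differ in how much information they provide about the scene, and we show that our method can model the different levels of inherent uncertainty associated with each task. Our experiments show that this approach yields a comprehensive method capable of accurately predicting 3D structure from diverse types of observations.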

English Abstract

The remarkable achievements of both generative models of 2D images and neural field representations for 3D scenes present a compelling opportunity to integrate the strengths of both approaches. In this work, we propose a methodology that combines a NeRF-based representation of 3D scenes with probabilistic modeling and reasoning using diffusion models. We view 3D reconstruction as a perception problem with inherent uncertainty that can thereby benefit from probabilistic inference methods. The core idea is to represent the 3D scene as a stochastic latent variable for which we can learn a prior and use it to perform posterior inference given a set of observations. We formulate posterior sampling using the score-based inference method of diffusion models in conjunction with a likelihood term computed from a reconstruction model that includes volumetric rendering. We train the model using a two-stage process: first we train the reconstruction model while auto-decoding the latent representations for a dataset of 3D scenes, and then we train the prior over the latents using a diffusion model. By using the model to generate samples from the posterior we demonstrate that various 3D reconstruction tasks can be performed, differing by the type of observation used as inputs. We showcase reconstruction from single-view, multi-view, noisy images, sparse pixels, and sparse depth data. These observations vary in the amount of information they provide for the scene and we show that our method can model the varying levels of inherent uncertainty associated with each task. Our experiments illustrate that this approach yields a comprehensive method capable of accurately predicting 3D structure from diverse types of observations.
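
The way a prior score and a likelihood gradient combine in posterior sampling can be checked on a one-dimensional toy problem whose posterior is known in closed form. The sketch assumes a standard normal prior (standing in for the learned diffusion prior over scene latents), a Gaussian observation y = z + noise with known sigma (standing in for the rendering-based likelihood), and plain unadjusted Langevin dynamics at a single noise level rather than a full diffusion sampler. All names are hypothetical.

```python
import math
import random

def prior_score(z):
    """Score (gradient of the log density) of a N(0, 1) prior."""
    return -z

def likelihood_grad(z, y, sigma):
    """Gradient with respect to z of log N(y; z, sigma^2)."""
    return (y - z) / sigma ** 2

def langevin_posterior(y, sigma, n_samples=2000, steps=200, eps=0.01, seed=0):
    """Unadjusted Langevin dynamics driven by prior score + likelihood gradient."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        z = rng.gauss(0.0, 1.0)  # start from the prior
        for _ in range(steps):
            grad = prior_score(z) + likelihood_grad(z, y, sigma)
            z += eps * grad + math.sqrt(2.0 * eps) * rng.gauss(0.0, 1.0)
        samples.append(z)
    return samples

y, sigma = 1.0, 0.5
samples = langevin_posterior(y, sigma)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
# closed-form Gaussian posterior for comparison: mean y/(1+sigma^2), variance sigma^2/(1+sigma^2)
print(mean, y / (1 + sigma ** 2))
print(var, sigma ** 2 / (1 + sigma ** 2))
```

The sample mean and variance land close to the exact values 0.8 and 0.2, up to Monte Carlo error and a small step-size bias; a more informative observation (smaller sigma) shrinks the posterior variance, which mirrors the varying uncertainty across observation types described above.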

2605.08731 2026-05-21 cs.PF cs.LG 版本更新

Single-Thread JPEG Decoder Benchmarks Mis-Evaluate ML Data Loaders

单线程JPEG解码器基准测试误评了ML数据加载器

Vladimir Iglovikov, Dmitry Kosarevsky

发表机构 * Ternaus Independent Researcher(独立研究者)

AI总结 本文在五种配置匹配的16 vCPU Google Cloud CPU平台上评估了十三种可通过Python调用的JPEG解码路径,发现单线程基准测试无法准确评价ML数据加载器的性能,并揭示了不同CPU架构和解码器在多进程DataLoader工作负载下的差异。

Comments 10 pages, 4 figures. Code and data: https://github.com/ternaus/imread_benchmark

AI中文摘要

JPEG解码是机器学习的常规基础设施,但Python解码器的选择往往依据单进程、单线程的微基准测试。我们在五种配置匹配的16 vCPU Google Cloud CPU(Intel Emerald Rapids、AMD Zen 4、AMD Zen 5、ARM Neoverse V2和ARM Neoverse N1)上审计了十三种可通过Python调用的JPEG解码路径,以检验这一评估假设。ImageNet验证集仅作为工作负载,而非新的数据集贡献:每次运行都从内存中解码完整的50,000张图像验证划分,报告所有解码器的单线程吞吐量、符合条件的解码器在worker数量为{0,2,4,8}时的PyTorch DataLoader吞吐量,以及解码器的跳过行为。评估协议的不同会改变所能支持的结论。在Neoverse V2上,imageio的单线程吞吐量排名第九,却与torchvision一同进入DataLoader最高层级;在Zen 4上,torchvision从单线程第七名跃升至实测DataLoader最高层级;在Neoverse N1上,imagecodecs是单线程第一,但在峰值DataLoader吞吐量中仅排第五。我们还发现,Zen 4与Zen 5在worker数量上的结论不同,TensorFlow在ARM上的单线程性能损失较大,且严格的原生JPEG解码器/封装会拒绝同一张罕见的ImageNet JPEG。对于PyTorch DataLoader工作负载,torchvision和simplejpeg构成实测最强的零跳过层级:torchvision的平均归一化吞吐量最高,而simplejpeg的最低归一化吞吐量最高。OpenCV在每种测试CPU上都达到平台本地最优者的90%以上,仍是稳健的通用备选方案。我们公开了原始JSON、生成的表格/图表,以及一个可执行的本地/云端基准测试框架。

英文摘要

JPEG decode is routine ML infrastructure, but Python decoder choices are often justified by single-process, single-thread microbenchmarks. We audit this evaluation assumption with thirteen Python-accessible JPEG decode paths on five matched 16 vCPU Google Cloud CPUs: Intel Emerald Rapids, AMD Zen 4, AMD Zen 5, ARM Neoverse V2, and ARM Neoverse N1. ImageNet validation is the workload, not a new dataset contribution: each run decodes the full 50,000-image split from memory and reports single-thread throughput for all decoders, PyTorch \texttt{DataLoader} throughput for eligible decoders at worker counts $\{0,2,4,8\}$, and decoder skip behavior. The evaluation protocol changes the supported conclusion. On Neoverse V2, \texttt{imageio} is ninth in single-thread throughput yet lands in the top DataLoader tier with \texttt{torchvision}; on Zen 4, \texttt{torchvision} rises from seventh single-thread to the top measured DataLoader tier; on Neoverse N1, \texttt{imagecodecs} is the single-thread leader but fifth at peak DataLoader throughput. We also find that worker-count conclusions differ between Zen 4 and Zen 5, TensorFlow has a large single-thread ARM penalty, and strict native JPEG decoders/wrappers reject the same rare ImageNet JPEG. For PyTorch DataLoader workloads, \texttt{torchvision} and \texttt{simplejpeg} form the strongest measured zero-skip tier: \texttt{torchvision} has the highest mean normalized throughput, while \texttt{simplejpeg} has the highest minimum. OpenCV remains a robust general-purpose fallback above 90\% of the platform-local winner on every tested CPU. We release raw JSON, generated tables/figures, and an executable local/cloud benchmark framework.
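The two summary statistics used above are mean and minimum normalized throughput, where each decoder's throughput is divided by the fastest decoder on the same CPU. Both are easy to compute once per-platform measurements exist. The sketch below assumes every decoder was measured on every platform. The platform names, decoder names, and numbers are made up for illustration and are not the paper's results.

```python
def normalized_scores(throughput):
    """Map {platform: {decoder: images_per_second}} to {decoder: (mean_ratio, min_ratio)},
    where each ratio is throughput divided by the fastest decoder on the same platform."""
    ratios = {}
    for per_decoder in throughput.values():
        best = max(per_decoder.values())
        for decoder, value in per_decoder.items():
            ratios.setdefault(decoder, []).append(value / best)
    return {d: (sum(r) / len(r), min(r)) for d, r in ratios.items()}


def robust_fallbacks(scores, floor=0.9):
    # Decoders that reach at least `floor` of the platform-local winner on every platform.
    return sorted(d for d, (_, worst) in scores.items() if worst >= floor)


# Hypothetical measurements (images/s), not taken from the paper.
example = {
    "cpu_a": {"dec_x": 1000.0, "dec_y": 950.0, "dec_z": 920.0},
    "cpu_b": {"dec_x": 1200.0, "dec_y": 1140.0, "dec_z": 1116.0},
    "cpu_c": {"dec_x": 460.0, "dec_y": 500.0, "dec_z": 455.0},
}
scores = normalized_scores(example)
```

In this toy data, dec_x has the best mean but dec_y has the best worst case. This shows how a mean-based ranking and a min-based ranking can disagree.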

2605.08123 2026-05-21 cs.LG cs.CL 版本更新

Block-Wise Differentiable Sinkhorn Attention: Tail-Refinement Gradients with a Gap-Aware Dustbin Bridge

块级可微Sinkhorn注意力:尾部细化梯度与间隙感知的dustbin桥接

Dylan Forde

发表机构 * Independent Researcher(独立研究者)

AI总结 本文研究在TPU硬件上通过停止基、固定深度的尾部细化代理实现长上下文平衡熵最优传输(OT)注意力,证明R=2时反向传播可采用精确的单参考瓦片调度,从而获得块级成本O((T+R)LW)和O(L)额外HBM占用,并将该方法推广到单活跃dustbin路径,在合成问题与TPU上的Pfam实验中进行了验证。

AI中文摘要

我们研究在TPU硬件上,通过一种停止基(stopped-base)、固定深度的尾部细化代理(surrogate)实现长上下文平衡熵最优传输(OT)注意力。在执行T步Sinkhorn求解并停止其梯度后,我们展开一段短的细化尾部,并对这一代理进行精确微分。对于报告的R=2 TPU路径,反向传播包含四个阶梯状(staircase)传输计划因子。我们证明了一种精确的单参考瓦片(one-reference-tile)调度:R=2时分数的余切(cotangent)等于单个参考传输计划瓦片乘以一个由向量余切与对偶差(dual differences)构造的显式修正场。在平衡的固定支撑路径上,当头维度d和带宽W固定时,这带来块级成本O((T+R)LW)、O(Ld)的输入存储以及O(L)的额外HBM占用。我们还将当前的dustbin_block路径形式化为增广支撑上的同一单位目标代理,从而使伴随(adjoint)调度可提升到我们TPU运行中使用的单活跃dustbin路径;这一桥接是代数性的,并不主张适用于一般的KL非平衡或任意容量间隙模型。我们给出了局部代理偏差界、后验(a posteriori)偏差证书,以及针对严格正活跃块的射影收缩证书。在合成掩码问题上,优化后的内核与同一中心化代理的精确自动微分在10^-5至10^-10范围内一致。在TPU v6e-8上,一个四配置的Pfam筛选实验端到端完成,一个晋级的平衡R=2运行在三小时预算内维持约每秒8.5个样本,达到第1437步。在留出的Pfam测试分片上,相对于第0步,重建指标从5.57改善到2.05,稀疏CE从5.53改善到5.30,其中CE仅作诊断记录而未直接优化;目标-重心(barycenter)对齐指标没有实质改善,且确定性对角参考在这些指标上仍然更强。

英文摘要

We study long-context balanced entropic optimal transport (OT) attention on TPU hardware through a stopped-base, fixed-depth tail-refinement surrogate. After a stopped $T$-step Sinkhorn solve, we unroll a short refinement tail and differentiate that surrogate exactly. For the reported $R=2$ TPU path, the backward pass contains four staircase plan factors. We prove an exact one-reference-tile schedule: the $R=2$ score cotangent is a single reference plan tile times an explicit modifier field built from vector cotangents and dual differences. This yields block-wise cost $O((T+R)LW)$, $O(Ld)$ input storage, and $O(L)$ additional HBM usage for fixed head dimension $d$ and band width $W$ on the balanced fixed-support path. We also formalize the current \texttt{dustbin\_block} path as the same unit-target surrogate on an augmented support, so the adjoint schedule lifts to the single-active-dustbin path used in our TPU runs; this bridge is algebraic and does not claim a general KL-unbalanced or arbitrary-capacity gap model. We provide a local surrogate-bias bound, an a posteriori bias certificate, and a projective contraction certificate for strictly positive active blocks. On synthetic masked problems, the optimized kernel matches exact autodiff of the same centered surrogate to within $10^{-5}$--$10^{-10}$. On TPU v6e-8, a four-configuration Pfam screen completes end-to-end, and a promoted balanced $R=2$ run sustains roughly $8.5$ examples per second through a three-hour budget, reaching step $1437$. Held-out Pfam test shards improve reconstruction from $5.57$ to $2.05$ and sparse CE from $5.53$ to $5.30$ relative to step $0$, with CE logged diagnostically rather than optimized directly; target-barycenter alignment metrics do not materially improve, and a deterministic diagonal reference remains stronger on those metrics.
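The following is a forward-only sketch of the stopped-base / refinement-tail split for balanced entropic OT with uniform marginals. The n_base Sinkhorn iterations play the role of the stopped T-step solve, and the n_tail iterations play the role of the R-step tail. It is plain Python on a tiny dense cost matrix, with no banding, blocking, dustbin, or autodiff. In a differentiable framework, the scalings after the base phase would be wrapped in a stop-gradient so that only the tail is differentiated. Names, parameters, and the cost matrix are illustrative.

```python
import math


def sinkhorn_plan(cost, eps, n_base, n_tail):
    """Balanced entropic OT plan between uniform marginals.

    Runs n_base "stopped" Sinkhorn iterations, then n_tail refinement iterations.
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n
    b = [1.0 / m] * m
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m

    def sinkhorn_step(v):
        # Row scaling, then column scaling.
        u_new = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v_new = [b[j] / sum(kernel[i][j] * u_new[i] for i in range(n)) for j in range(m)]
        return u_new, v_new

    for _ in range(n_base):
        u, v = sinkhorn_step(v)
    # In an autodiff version, apply stop_gradient to u and v here.
    for _ in range(n_tail):
        u, v = sinkhorn_step(v)
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Hypothetical toy cost matrix; row i of n * plan can be read as attention weights.
cost = [[((i - j) / 3.0) ** 2 for j in range(4)] for i in range(4)]
plan = sinkhorn_plan(cost, eps=0.5, n_base=50, n_tail=2)
```

Because each step ends with the column scaling, the column marginals hold to rounding error after any number of steps. The row marginals converge geometrically with the number of iterations.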

2605.06139 2026-05-21 cs.LG cs.AI 版本更新

Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex

列表式策略优化:基于组的RLVR作为LLM响应单纯形上的目标投影

Yun Qu, Qi Wang, Yixiu Mao, Heming Zou, Yuhang Jiang, Yingyue Li, Wutong Xu, Lizhou Cai, Weijie Liu, Clive Bai, Kai Yang, Yangkun Chen, Saiyong Yang, Xiangyang Ji

发表机构 * Department of Automation, Tsinghua University(清华大学自动化系) LLM Department, Tencent(腾讯LLM部门)

AI总结 本文提出列表式策略优化(LPO),通过显式执行目标投影来解构隐式目标,利用响应单纯形限制近端RL目标,并通过精确散度最小化进行策略投影,从而在多样推理任务和LLM基础上提升训练性能,同时保持优化稳定性和响应多样性。

AI中文摘要

可验证奖励的强化学习(RLVR)已成为大语言模型(LLMs)后训练中激励推理能力的标准方法。在现有方案中,基于组的策略梯度方法十分流行:它为每个提示采样一组响应,并通过组相对优势信号更新策略。本文揭示这些优化策略具有共同的几何结构:每种策略都在响应单纯形上隐式定义了一个目标分布,并通过一阶近似向其投影。基于这一见解,我们提出列表式策略优化(LPO)来显式执行目标投影:它将近端RL目标限制在响应单纯形上,从而揭示隐式目标,再通过精确的散度最小化对策略进行投影。该框架提供了(i)在列表式目标上的单调改进,且投影梯度有界、零和并具有自校正性;以及(ii)借助解耦的投影步骤,可灵活选择具有不同结构性质的散度。在多种推理任务和LLM骨干模型上,在目标匹配的条件下,LPO始终优于典型的策略梯度基线,同时内在地保持了优化稳定性和响应多样性。

英文摘要

Reinforcement learning with verifiable rewards (RLVR) has become a standard approach to post-training large language models (LLMs) to incentivize reasoning capacity. Among existing recipes, group-based policy gradient, which samples a group of responses per prompt and updates the policy via group-relative advantage signals, is prevalent. This work reveals that these optimization strategies share a common geometric structure: each implicitly defines a target distribution on the response simplex and projects toward it via first-order approximation. Building on this insight, we propose Listwise Policy Optimization (LPO) to explicitly conduct the target projection: it demystifies the implicit target by restricting the proximal RL objective to the response simplex, and then projects the policy via exact divergence minimization. This framework provides (i) monotonic improvement on the listwise objective with bounded, zero-sum, and self-correcting projection gradients, and (ii) flexibility in divergence selection with distinct structural properties through the decoupled projection step. On diverse reasoning tasks and LLM backbones, LPO consistently improves training performance over typical policy gradient baselines under matched targets, while intrinsically preserving optimization stability and response diversity.
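The target-projection view can be made concrete for a single prompt in three steps. First, renormalize the policy over the sampled group, which gives a point on the response simplex. Second, tilt it by exponentiated group-relative advantages to obtain a target q. Third, project by minimizing the cross-entropy from q; its gradient with respect to the logits is p - q. The exponential-tilt target and the KL-type divergence are assumptions for illustration, not necessarily the exact LPO objective (the abstract stresses that the divergence can be chosen). The temperature beta is a hypothetical parameter.

```python
import math


def group_advantages(rewards):
    # Group-relative advantages: standardize rewards within the sampled group.
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + 1e-8) for r in rewards]


def softmax(logits):
    mx = max(logits)
    exps = [math.exp(z - mx) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def target_and_gradient(logits, rewards, beta=1.0):
    """logits: policy log-probabilities (up to a constant) of the sampled responses."""
    p = softmax(logits)  # policy renormalized over the group (a point on the simplex)
    adv = group_advantages(rewards)
    # Assumed KL-regularized target on the simplex: q_i proportional to p_i * exp(A_i / beta).
    q = softmax([z + a / beta for z, a in zip(logits, adv)])
    # Gradient of the cross-entropy -sum_i q_i log p_i with respect to the logits.
    grad = [pi - qi for pi, qi in zip(p, q)]
    return q, grad


# Hypothetical group of 4 responses with equal policy logits and binary rewards.
q, grad = target_and_gradient([0.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 1.0])
```

The gradient entries sum to zero and each lies in [-1, 1], which is consistent with the bounded, zero-sum property described above. A descent step raises the logits of rewarded responses and lowers the others.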

2605.05863 2026-05-21 cs.LG cs.AI 版本更新

SOPE: Stabilizing Off-Policy Evaluation for Online RL with Prior Data

SOPE:在利用先验数据的在线强化学习中稳定离策略评估

Carlo Romeo, Girolamo Macaluso, Alessandro Sestini, Andrew D. Bagdanov

发表机构 * Media Integration and Communication Center – University of Florence(媒体集成与通信中心——佛罗伦萨大学) SEED – Electronic Arts(SEED——电子艺界)

AI总结 本文提出SOPE算法,通过使用与演员对齐的离策略策略评估(OPE)信号作为自动早停机制,动态控制离线训练阶段的长度,从而在连续控制任务中提高基线性能并减少计算资源消耗。

AI中文摘要

将先验数据纳入在线强化学习可以加速训练,但通常需要在高昂的计算成本与冗长的多阶段训练流水线之间做出艰难权衡。虽然固定长度的稳定阶段在计算上明显比静态更新计划更高效,但它们需要针对任务进行手动调节,可能导致先验知识被浪费或严重过拟合。为此,我们提出SOPE算法,利用与演员(actor)对齐的离策略策略评估(OPE)信号作为自动早停机制,动态控制离线训练阶段的长度。通过在留出的验证集上、按当前策略的动作分布评估评论家(critic),SOPE恰好在分布外收益饱和时停止梯度更新,从而无需手动调节更新计划。在Minari基准套件的25个连续控制任务上,SOPE将基线性能最多提升45.6%,同时将所需TFLOPs最多减少22倍,从而兼顾了样本效率与计算效率。这些发现表明,自适应的、由评估驱动的更新计划比依赖静态、穷尽式的更新计划更有效。

英文摘要

Incorporating prior data into online reinforcement learning accelerates training but typically forces a difficult trade-off between high computational costs and long, multi-stage training pipelines. While fixed-length stabilization phases are significantly more computationally efficient than static update schedules, they require task-dependent manual tuning, risking either the waste of prior knowledge or severe overfitting. To address this, we propose SOPE, an algorithm that uses an actor-aligned Off-Policy Policy Evaluation (OPE) signal as an automated early-stopping mechanism to dynamically control the length of offline training phases. By evaluating the critic on a held-out validation split under the current policy's action distribution, SOPE halts gradient updates exactly when out-of-distribution benefits saturate, eliminating the need for manual schedule tuning. Evaluated on 25 continuous control tasks from the Minari benchmark suite, SOPE improves baseline performance by up to 45.6% while reducing the required TFLOPs by up to 22x, thus balancing the tradeoff between sample and computational efficiency. These findings demonstrate that adaptive, evaluation-driven update schedules are more effective than relying on static, exhaustive update schedules.
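The control logic described above has two parts: score the critic on held-out data with next actions drawn from the current actor, then stop the offline phase once the score stops improving. It can be sketched with a patience-based stopper. The squared TD error used as the OPE signal, the patience rule, and every name below are assumptions for illustration. The paper's exact OPE estimator and stopping criterion may differ.

```python
def actor_aligned_td_error(q, policy, transitions, gamma=0.99):
    """Mean squared TD error of critic q on held-out (s, a, r, s_next, done) tuples,
    bootstrapping with the current actor's action at s_next (assumed OPE signal)."""
    total = 0.0
    for s, a, r, s_next, done in transitions:
        target = r if done else r + gamma * q(s_next, policy(s_next))
        total += (q(s, a) - target) ** 2
    return total / len(transitions)


class OPEEarlyStopper:
    """Stop once the validation score (higher is better) fails to improve by
    more than min_delta for `patience` consecutive rounds."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.best_round = 0
        self.rounds = 0
        self.waited = 0

    def should_stop(self, score):
        self.rounds += 1
        if score > self.best + self.min_delta:
            self.best, self.best_round, self.waited = score, self.rounds, 0
        else:
            self.waited += 1
        return self.waited >= self.patience


# Hypothetical per-round validation scores (higher is better).
stopper = OPEEarlyStopper(patience=3, min_delta=0.05)
for score in [1.0, 1.5, 1.7, 1.72, 1.71, 1.70, 1.69]:
    if stopper.should_stop(score):
        break
```

In this toy run, the stopper halts after round 6 and reports round 3 as the best point. A training loop would then keep the round-3 checkpoint and switch to online updates.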

2605.04128 2026-05-21 cs.GR cs.AI cs.CL cs.CV cs.LG 版本更新

JoyAI-Image: Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation

JoyAI-Image: 激活统一多模态理解和生成中的空间智能

Lin Song, Wenbo Li, Guoqing Ma, Wei Tang, Bo Wang, Yuan Zhang, Yijun Yang, Yicheng Xiao, Jianhui Liu, Yanbing Zhang, Guohui Zhang, Wenhu Zhang, Hang Xu, Nan Jiang, Xin Han, Haoze Sun, Maoquan Zhang, Haoyang Huang, Nan Duan

发表机构 * Joy Future Academy, JD(Joy未来学院,京东)

AI总结 本文提出JoyAI-Image,一种统一的多模态基础模型,用于视觉理解、文本到图像生成和指令引导的图像编辑。该模型结合了空间增强的多模态大语言模型(MLLM)和多模态扩散Transformer(MMDiT),通过共享的多模态接口实现感知与生成的交互。作者构建了可扩展的训练配方,结合统一指令微调、长文本渲染监督、空间对齐(spatially grounded)数据以及通用和空间编辑信号,使模型具备广泛的多模态能力,同时增强几何感知推理和可控视觉合成。实验表明,JoyAI-Image在理解、生成、长文本渲染和编辑基准上达到最先进或极具竞争力的性能。更重要的是,增强的理解、可控的空间编辑和新视角辅助推理之间的双向循环使模型超越一般视觉能力,向更强的空间智能发展。

Comments Code: https://github.com/jd-opensource/JoyAI-Image

AI中文摘要

我们提出了JoyAI-Image,一种用于视觉理解、文本到图像生成和指令引导图像编辑的统一多模态基础模型。JoyAI-Image将空间增强的多模态大语言模型(MLLM)与多模态扩散Transformer(MMDiT)结合,使感知和生成能够通过共享的多模态接口进行交互。围绕这一架构,我们构建了一个可扩展的训练配方,结合统一指令微调、长文本渲染监督、空间对齐(spatially grounded)数据以及通用和空间编辑信号。该设计使模型具备广泛的多模态能力,同时增强了几何感知推理和可控视觉合成。在理解、生成、长文本渲染和编辑基准上的实验表明,JoyAI-Image达到了最先进或极具竞争力的性能。更重要的是,增强的理解、可控的空间编辑和新视角辅助推理之间的双向循环,使模型超越一般视觉能力,迈向更强的空间智能。这些结果为统一视觉模型在视觉-语言-动作系统和世界模型等下游应用中的发展指明了一条有前景的路径。

英文摘要

We present JoyAI-Image, a unified multimodal foundation model for visual understanding, text-to-image generation, and instruction-guided image editing. JoyAI-Image couples a spatially enhanced Multimodal Large Language Model (MLLM) with a Multimodal Diffusion Transformer (MMDiT), allowing perception and generation to interact through a shared multimodal interface. Around this architecture, we build a scalable training recipe that combines unified instruction tuning, long-text rendering supervision, spatially grounded data, and both general and spatial editing signals. This design gives the model broad multimodal capability while strengthening geometry-aware reasoning and controllable visual synthesis. Experiments across understanding, generation, long-text rendering, and editing benchmarks show that JoyAI-Image achieves state-of-the-art or highly competitive performance. More importantly, the bidirectional loop between enhanced understanding, controllable spatial editing, and novel-view-assisted reasoning enables the model to move beyond general visual competence toward stronger spatial intelligence. These results suggest a promising path for unified visual models in downstream applications such as vision-language-action systems and world models.

2605.03690 2026-05-21 cs.LG cs.AI q-bio.QM 版本更新

Graph Neural Network based Hierarchy-Aware Embeddings of Knowledge Graphs: Applications to Yeast Phenotype Prediction

基于图神经网络的层次感知知识图谱嵌入:应用于酵母表型预测

Filip Kronström, Alexander H. Gower, Daniel Brunnsåker, Ievgeniia A. Tiukova, Ross D. King

发表机构 * Department of Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg(计算机科学与工程系,查尔姆斯理工大学和哥德堡大学) Department of Life Sciences, Chalmers University of Technology(生命科学系,查尔姆斯理工大学) Department of Industrial Biotechnology, KTH Royal Institute of Technology(工业生物技术系,皇家理工学院) Department of Chemical Engineering and Biotechnology, University of Cambridge(化学工程与生物技术系,剑桥大学)

AI总结 本文提出了一种利用图神经网络和来自底层本体的语义损失来生成层次感知的知识图谱嵌入的方法,用于酵母表型预测,并展示了其在基因敲除效应预测和知识图谱修订评估中的应用。

AI中文摘要

我们提出了一种利用图神经网络(GNN)并结合源自底层本体的语义损失来求取层次感知知识图谱嵌入的方法。该方法生成的嵌入能更好地反映领域知识。为展示其效用,我们预测并解释了酵母Saccharomyces cerevisiae中基因缺失的影响,并在没有预测任务的情况下学习知识图谱的盒嵌入(box embedding)。我们进一步展示了盒嵌入如何作为评估知识图谱修订的基础。我们的酵母知识图谱由社区数据库和本体术语构建。低维盒嵌入与图神经网络相结合,用于预测双基因敲除的细胞生长。在10折交叉验证中,这些预测的平均R²分数为0.360,显著高于基线,表明高层定性知识对实验结果具有预测信息量。在模型训练中加入语义损失项,使嵌入与本体结构对齐,进一步提升了预测性能(R²=0.377)。这表明本体中的类层次可用于定量预测。我们还在三基因敲除上测试了训练好的模型,表明其能够泛化到训练中未见过的数据。此外,通过识别酵母知识图谱中对细胞生长预测重要的共现关系,我们构建了关于酵母相互作用性状的假说。一项生物实验验证了其中一个发现,揭示了肌醇利用与渗透胁迫抗性之间的关联,突显了该模型指导生物发现的潜力。

英文摘要

We present a method for finding hierarchy-aware embeddings of knowledge graphs (KGs) using graph neural networks (GNNs) enriched with a semantic loss derived from underlying ontologies. This method yields embeddings that better reflect domain knowledge. To demonstrate their utility, we predict and interpret the effects of gene deletions in the yeast Saccharomyces cerevisiae and learn box embeddings for KGs in the absence of a prediction task. We further show how box embeddings can serve as the basis for evaluating KG revisions. Our yeast KG is constructed from community databases and ontology terms. Low-dimensional box embeddings combined with GNNs are used to predict cell growth for double gene knockouts. Over 10-fold cross validation, these predictions have a mean $R^2$ score of 0.360, significantly higher than baseline comparisons, demonstrating that high-level qualitative knowledge is informative about experimental outcomes. Incorporating semantic loss terms in the training of the models improves their predictive performance ($R^2$=0.377) by aligning embeddings with ontology structure. This shows that class hierarchies from ontologies can be exploited for quantitative prediction. We also test the trained models on triple gene knockouts, showing they generalise to data beyond those seen in training. Additionally, by identifying co-occurring relations in the yeast KG important for the cell-growth predictions, we construct hypotheses about interacting traits in yeast. A biological experiment validates one such finding, revealing an association between inositol utilisation and osmotic stress resistance, highlighting the model's potential to guide biological discovery.
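Box embeddings turn an ontology's subclass relation into geometric containment: if term C is a subclass of P, C's box should lie inside P's box. The following is a minimal sketch of that idea. It uses a containment-violation penalty as one possible form of semantic loss, and the ratio vol(C intersect P) / vol(C) as a soft subsumption score. The loss form, the term names, and the coordinates are assumptions for illustration, not the paper's exact loss. In training, these quantities would be computed on differentiable tensors rather than Python floats.

```python
def volume(lo, hi):
    vol = 1.0
    for l, h in zip(lo, hi):
        vol *= max(h - l, 0.0)
    return vol


def containment_violation(child, parent):
    """How far the child box sticks out of the parent box (0 when fully contained)."""
    (c_lo, c_hi), (p_lo, p_hi) = child, parent
    return sum(max(pl - cl, 0.0) + max(ch - ph, 0.0)
               for cl, ch, pl, ph in zip(c_lo, c_hi, p_lo, p_hi))


def subsumption_prob(child, parent):
    """vol(child intersect parent) / vol(child): 1.0 when the child box is inside the parent."""
    (c_lo, c_hi), (p_lo, p_hi) = child, parent
    inter_lo = [max(a, b) for a, b in zip(c_lo, p_lo)]
    inter_hi = [min(a, b) for a, b in zip(c_hi, p_hi)]
    return volume(inter_lo, inter_hi) / volume(c_lo, c_hi)


def semantic_loss(boxes, subclass_pairs):
    """boxes: {term: (lower_corner, upper_corner)}; subclass_pairs: [(child, parent), ...]."""
    return sum(containment_violation(boxes[c], boxes[p]) for c, p in subclass_pairs)


# Hypothetical 2-D boxes for three ontology terms.
boxes = {
    "metabolism": ([0.0, 0.0], [4.0, 4.0]),
    "carbohydrate_metabolism": ([1.0, 1.0], [2.0, 2.0]),
    "misplaced_term": ([3.0, 3.0], [5.0, 6.0]),
}
pairs = [("carbohydrate_metabolism", "metabolism"), ("misplaced_term", "metabolism")]
```

Here the correctly placed child contributes zero loss. The misplaced term is penalized by 3.0, the total length by which it overhangs its parent.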

2604.23937 2026-05-21 physics.flu-dyn cs.LG 版本更新

Multi-scale Dynamic Wake Modeling and Prediction of Floating Offshore Wind Turbines via Physics-Informed Neural Networks and Fourier Neural Operators

基于物理信息神经网络和傅里叶神经算子的浮式海上风力涡轮机多尺度动态尾流建模与预测

Guodan Dong, Jianhua Qin, Chang Xu

发表机构 * College of Renewable Energy, Hohai University, Changzhou, 213200, China(能源学院,河海大学,常州,213200,中国) College of Water Conservancy and Hydropower Engineering(水利水电工程学院) National Technology Innovation Center for Wind Power, Hohai University, Changzhou, 213200, China(风能技术创新中心,河海大学,常州,213200,中国)

AI总结 本文提出利用物理信息神经网络和傅里叶神经算子对浮式海上风力涡轮机的多尺度动态尾流进行建模与预测,通过高保真数据集验证了FNOs在效率、长期预测能力和多尺度相干结构保真度方面的优势。

AI中文摘要

多尺度动态尾流建模与预测对于浮式海上风力涡轮机(FOWTs)的实时控制和优化至关重要。本研究通过两种新颖的深度学习框架,即物理信息神经网络(PINNs)和傅里叶神经算子(FNOs),对FOWT在一系列斯特劳哈尔数(St)下、耦合纵荡与纵摇运动(可诱发尾流蜿蜒)作用下的尾流进行建模。高保真数据集来自结合致动线模型的大涡模拟(LES-AL)。结果表明,两种框架都能很好地刻画主导的大尺度动态结构(如尾流蜿蜒);然而,FNOs在效率(计算速度提升8倍,收敛速度提升40倍)、长期预测能力和多尺度相干结构保真度方面显著优于PINN模型。此外,PINN模型预测的尾流存在平滑效应,限制了高频相干结构的分辨率,并低估了尾流中心和半宽处的湍流脉动。频谱分析表明,FNOs能够解析主要的蜿蜒频率Stp(由耦合纵荡与纵摇运动诱发的频率)、其高阶谐波(2Stp、3Stp)以及能量级串。相比之下,PINN预测中的能量级串在高频区间(St > 1.0)衰减得更快。此外,预乘功率谱密度表明,与CFD和FNOs相比,PINN模型所刻画的蜿蜒频率及对应谐波频率中的能量相对较低。这些发现表明,FNOs在FOWT尾流的高保真实时建模方面前景广阔。

英文摘要

Multi-scale dynamic wake modeling and prediction are essential for the real-time control and optimization of floating offshore wind turbines (FOWTs). In this study, wakes of FOWTs under coupled surge and pitch motions across a range of Strouhal numbers (St), which can induce wake meandering, are modeled via two novel deep-learning frameworks: physics-informed neural networks (PINNs) and Fourier neural operators (FNOs). The high-fidelity dataset is obtained from large-eddy simulations with the actuator line model (LES-AL). The results demonstrate that the dominant large-scale dynamic structures, such as meandering, can be well modeled by both frameworks; however, FNOs exhibit significant advantages over the PINN model in terms of efficiency (8-fold computational speedup and 40-fold faster convergence), long-term predictive capability, and multi-scale coherent structural fidelity. Furthermore, the wakes predicted by the PINN model exhibit a smoothing effect that limits the resolution of high-frequency coherent structures and underestimates turbulent fluctuations in both the wake center and half-width. Spectral analysis reveals that FNOs resolve the primary meandering frequency, Stp (the frequency induced by the coupled surge and pitch motions), its higher-order harmonics (2Stp, 3Stp), and the energy cascade. In contrast, the energy cascade in the PINN predictions dissipates more rapidly in the high-frequency regime (St > 1.0). Additionally, the pre-multiplied power spectral density indicates that the energy contained in meandering and the corresponding harmonic frequencies modeled by PINNs is relatively low compared to that in CFD and FNOs. These findings suggest that FNOs are promising for the high-fidelity, real-time modeling of FOWT wakes.
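The spectral diagnostics above rest on two simple quantities: a Strouhal number St = f D / U (frequency f, rotor diameter D, inflow speed U) and a pre-multiplied spectrum f S(f). The sketch below computes both for a synthetic signal with a fundamental and a second harmonic. It uses a plain O(n^2) DFT so that it stays self-contained. D, U, the sampling interval, and the signal are hypothetical. Real wake probes would use an FFT with windowing and averaging.

```python
import cmath
import math


def one_sided_psd(signal, dt):
    """Periodogram of a real signal (mean removed), one-sided, excluding DC and Nyquist."""
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]
    freqs, psd = [], []
    for k in range(1, n // 2):
        coef = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k / (n * dt))
        psd.append(2.0 * abs(coef) ** 2 * dt / n)
    return freqs, psd


def premultiplied_spectrum(signal, dt, diameter, u_inf):
    """Return (Strouhal numbers, f * S(f)) for the signal."""
    freqs, psd = one_sided_psd(signal, dt)
    strouhal = [f * diameter / u_inf for f in freqs]
    return strouhal, [f * p for f, p in zip(freqs, psd)]


# Hypothetical wake-center signal: fundamental at St = 0.3 plus a weaker 2nd harmonic.
dt, n, diameter, u_inf = 0.1, 200, 1.0, 1.0
signal = [math.sin(2 * math.pi * 0.3 * i * dt) + 0.4 * math.sin(2 * math.pi * 0.6 * i * dt)
          for i in range(n)]
st, fs = premultiplied_spectrum(signal, dt, diameter, u_inf)
```

The two largest peaks of f S(f) fall at St = 0.3 and its harmonic St = 0.6. This mirrors how Stp and 2Stp show up in the measured spectra.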

2604.20985 2026-05-21 cs.LG cs.AI cs.CR stat.ML 版本更新

Differentially Private Model Merging

差分隐私模型融合

Qichuan Yin, Manzil Zaheer, Tian Li

发表机构 * The University of Chicago(芝加哥大学) Google DeepMind(谷歌DeepMind)

AI总结 本文提出两种后处理技术,随机选择和线性组合,用于在不额外训练的情况下生成满足任意目标差分隐私要求的最终私有模型,同时分析了这些方法在一般问题和私有均值估计中的隐私-效用权衡。

AI中文摘要

在机器学习中,推理或部署阶段的隐私要求常因政策、法规或用户偏好的变化而改变。本文的目标是:给定一组在同一数据集上训练、具有不同隐私/效用权衡的现有模型,在无需额外训练的情况下构建满足任意目标差分隐私(DP)要求的模型。我们提出两种后处理技术,即随机选择和线性组合,以生成满足任意目标隐私参数的最终私有模型。我们从Rényi DP和隐私损失分布的角度,给出了这些方法在一般问题上的隐私核算,并在私有均值估计问题上精确刻画了隐私/效用权衡,比较了这两种机制。实验上,我们在多个模型以及合成和真实数据集上展示了方法的有效性,并验证了我们的分析。

英文摘要

In machine learning, privacy requirements at inference or deployment time often evolve due to changing policies, regulations, or user preferences. In this work, we aim to construct models that satisfy any target differential privacy (DP) requirement without additional training, given a set of existing models trained on the same dataset with different privacy/utility tradeoffs. We propose two post-processing techniques, namely random selection and linear combination, to generate final private models satisfying any target privacy parameter. We provide privacy accounting of these approaches from the lens of Rényi DP and privacy loss distributions on general problems, as well as on private mean estimation, where we precisely characterize the privacy/utility tradeoffs and compare the two mechanisms. Empirically, we demonstrate the effectiveness of our approaches and validate our analyses on several models and both synthetic and real-world datasets.
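In the private mean estimation setting, one special case of linear combination is easy to reason about. The sketch below rests on three stated assumptions: each model is the true mean plus independent Gaussian noise N(0, sigma_i^2), the weights sum to 1, and only the combined model is released. Under these assumptions the released value is itself a Gaussian mechanism with standard deviation sqrt(sum_i w_i^2 sigma_i^2). Its Rényi DP therefore follows from the standard Gaussian bound eps(alpha) = alpha * Delta^2 / (2 sigma^2). This is a simplified illustration, not the paper's accounting, which also covers general problems and the random-selection mechanism. The function names are hypothetical.

```python
import math


def combined_sigma(weights, sigmas):
    # Std of sum_i w_i * z_i for independent z_i ~ N(0, sigma_i^2).
    return math.sqrt(sum((w * s) ** 2 for w, s in zip(weights, sigmas)))


def gaussian_rdp(alpha, sensitivity, sigma):
    # Renyi DP of order alpha for the Gaussian mechanism.
    return alpha * sensitivity ** 2 / (2.0 * sigma ** 2)


def weight_for_target_sigma(sigma1, sigma2, target):
    """Weight w in [0, 1] on model 1 (1 - w on model 2) giving combined std == target."""
    a = sigma1 ** 2 + sigma2 ** 2
    disc = target ** 2 * a - (sigma1 * sigma2) ** 2
    if disc < 0:
        raise ValueError("target is below the inverse-variance-weighted minimum")
    roots = [(sigma2 ** 2 + sign * math.sqrt(disc)) / a for sign in (1.0, -1.0)]
    valid = [w for w in roots if 0.0 <= w <= 1.0]
    if not valid:
        raise ValueError("target is not reachable with a convex combination")
    return max(valid)


# Two hypothetical models of the same mean: sigma 1.0 (less private) and 2.0 (more private).
w = weight_for_target_sigma(1.0, 2.0, 1.5)
sigma_eff = combined_sigma([w, 1.0 - w], [1.0, 2.0])
```

The quadratic comes from solving w^2 sigma1^2 + (1 - w)^2 sigma2^2 = target^2. Targets below the inverse-variance-weighted minimum are unreachable, because no convex combination adds less noise than that.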

2604.11661 2026-05-21 cs.LG cs.AI 版本更新

Towards Autonomous Mechanistic Reasoning in Virtual Cells

向虚拟细胞中的自主机理推理迈进

Yunhui Jang, Lu Zhu, Jake Fawkes, Alisandra Kaye Denton, Dominique Beaini, Emmanuel Noutahi

发表机构 * Korea Advanced Institute of Science and Technology (KAIST)(韩国科学技术院) Valence Labs(Valence实验室) University College London(伦敦大学学院)

AI总结 本文提出了一种结构化解释形式化方法,用于虚拟细胞中的生物推理,通过机理动作图实现系统验证和反驳,并引入VCR-Agent多智能体框架,结合生物基础知识检索和基于验证器的过滤方法,生成并验证机理推理。

AI中文摘要

大型语言模型(LLMs)作为加速科学发现的一种有前景的途径,近来受到广泛关注。然而,它们在生物学等开放性科学领域中的应用仍然有限,主要原因是缺乏有事实依据且可操作的解释。为此,我们为虚拟细胞引入了一种结构化解释形式化方法,将生物推理表示为机理动作图,从而实现系统的验证与证伪。在此基础上,我们提出了多智能体框架VCR-Agent,它将基于生物学知识的检索与基于验证器的过滤方法相结合,自主地生成并验证机理推理。利用该框架,我们发布了VC-TRACES数据集,其中包含源自Tahoe-100M图谱的经过验证的机理解释。实证结果表明,使用这些解释进行训练可以提高事实精确度,并为下游基因表达预测提供更有效的监督信号。这些结果强调了可靠的机理推理对虚拟细胞的重要性,而这种可靠性来自多智能体框架与严格验证的协同作用。

英文摘要

Large language models (LLMs) have recently gained significant attention as a promising approach to accelerate scientific discovery. However, their application in open-ended scientific domains such as biology remains limited, primarily due to the lack of factually grounded and actionable explanations. To address this, we introduce a structured explanation formalism for virtual cells that represents biological reasoning as mechanistic action graphs, enabling systematic verification and falsification. Building upon this, we propose VCR-Agent, a multi-agent framework that integrates biologically grounded knowledge retrieval with a verifier-based filtering approach to generate and validate mechanistic reasoning autonomously. Using this framework, we release the VC-TRACES dataset, which consists of verified mechanistic explanations derived from the Tahoe-100M atlas. Empirically, we demonstrate that training with these explanations improves factual precision and provides a more effective supervision signal for downstream gene expression prediction. These results underscore the importance of reliable mechanistic reasoning for virtual cells, achieved through the synergy of multi-agent generation and rigorous verification.
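A mechanistic action graph can be represented as typed edges (source, action, target). With that representation, verification and falsification become edge-by-edge checks. The toy sketch below uses a lookup table of known facts as the verifier, plus a reachability check from the perturbation to the readout. In VCR-Agent, verification is done by the framework's own verifier components. The entity names and facts here are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str
    action: str  # e.g. "inhibits", "activates"
    target: str


def verify_edges(edges, known_facts):
    """Split edges into (supported, rejected) using a set of (source, action, target) facts."""
    supported, rejected = [], []
    for e in edges:
        bucket = supported if (e.source, e.action, e.target) in known_facts else rejected
        bucket.append(e)
    return supported, rejected


def reaches(edges, start, goal):
    """Breadth-first search: does the graph link the perturbation to the readout?"""
    adjacency = {}
    for e in edges:
        adjacency.setdefault(e.source, []).append(e.target)
    queue, seen = deque([start]), {start}
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def accept_explanation(edges, known_facts, perturbation, readout):
    # Keep an explanation only if every edge is supported and it explains the readout.
    _, rejected = verify_edges(edges, known_facts)
    return not rejected and reaches(edges, perturbation, readout)


# Hypothetical facts and candidate explanations.
facts = {("drug_X", "inhibits", "kinase_A"), ("kinase_A", "activates", "gene_B")}
good = [Edge("drug_X", "inhibits", "kinase_A"), Edge("kinase_A", "activates", "gene_B")]
bad = good + [Edge("drug_X", "activates", "gene_C")]
```

An explanation is falsified by a single unsupported edge, and a fully supported graph is still rejected if it never connects the perturbation to the observed readout.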

2604.11071 2026-05-21 cs.CV cs.AI cs.LG 版本更新

Lightweight Low-Light Image Enhancement via Distribution-Normalizing Preprocessing and Depthwise U-Net

基于分布归一化预处理与深度可分离卷积U-Net的轻量级低光照图像增强

Shimon Murai, Teppei Kurita, Ryuta Satoh, Yusuke Moriuchi

发表机构 * Sony Semiconductor Solutions Corporation(索尼半导体解决方案公司)

AI总结 本文提出了一种轻量级两阶段框架,通过分布归一化预处理和完全由深度可分离卷积构成的U-Net实现低光照图像增强,在参数显著少于现有方法的情况下取得了具有竞争力的感知质量。

Comments Technical report for the NTIRE 2026 Efficient Low-Light Image Enhancement Challenge (CVPR 2026 Workshops), 3rd place solution

AI中文摘要

我们提出了一种用于低光照图像增强(LLIE)的轻量级两阶段框架,在参数远少于现有方法的情况下实现了具有竞争力的感知质量。我们的方法将冻结的、基于算法的预处理与一个完全由深度可分离卷积构成的紧凑型U-Net相结合。预处理通过提供互补的亮度校正视图来归一化输入分布,使可训练网络能够专注于残差颜色校正。我们的方法在CVPR 2026 NTIRE高效低光照图像增强挑战赛中获得第三名。我们还提供了扩展的基准测试和消融实验,以证明该方法的普遍有效性。

英文摘要

We present a lightweight two-stage framework for low-light image enhancement (LLIE) that achieves competitive perceptual quality with significantly fewer parameters than existing methods. Our approach combines frozen algorithm-based preprocessing with a compact U-Net built entirely from depthwise-separable convolutions. The preprocessing normalizes the input distribution by providing complementary brightness-corrected views, enabling the trainable network to focus on residual color correction. Our method achieved 3rd place in the CVPR 2026 NTIRE Efficient Low-Light Image Enhancement Challenge. We further provide extended benchmarks and ablations to demonstrate the general effectiveness of our methods.
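Depthwise-separable convolutions save parameters by factoring a k x k convolution into two parts: a per-channel k x k depthwise filter, followed by a 1 x 1 pointwise convolution. The arithmetic for one layer is sketched below. The channel counts are hypothetical and not the paper's configuration.

```python
def standard_conv_params(k, c_in, c_out, bias=True):
    # One k x k x c_in filter per output channel.
    return k * k * c_in * c_out + (c_out if bias else 0)


def depthwise_separable_params(k, c_in, c_out, bias=True):
    depthwise = k * k * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out  # 1 x 1 convolution that mixes channels
    biases = (c_in + c_out) if bias else 0
    return depthwise + pointwise + biases


# Hypothetical layer: 3 x 3 kernel, 32 -> 32 channels, no biases.
standard = standard_conv_params(3, 32, 32, bias=False)          # 9 * 32 * 32 = 9216
separable = depthwise_separable_params(3, 32, 32, bias=False)   # 288 + 1024 = 1312
```

Without biases, the ratio is k^2 c_out / (k^2 + c_out). For a 3 x 3 kernel, the saving therefore approaches 9x as the channel count grows.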

2604.07213 2026-05-21 cs.LG math.PR 版本更新

Diffusion Processes on Implicit Manifolds

隐式流形上的扩散过程

Victor Kawasaki-Borruat, Clara Grotehans, Pierre Vandergheynst, Adam Gosztolai

发表机构 * Signal Processing Laboratory 2(信号处理实验室2) Institute of Artificial Intelligence(人工智能研究所) EPFL(瑞士联邦理工学院) Medical University of Vienna(维也纳医学大学)

AI总结 本文研究如何仅使用点云样本在数据流形上构建扩散过程,提出隐式流形值扩散(IMDs)方法:通过近似扩散过程的无穷小生成元及其carré-du-champ,在原始高维空间中定义随机微分方程,将流形上的内在过程提升到环境坐标中,并通过实验验证了该过程始终约束在数据流形上且可用于引导探索。

Comments Comments are more than welcome!

AI中文摘要

高维数据通常被假设位于低维流形上。我们研究如何仅使用点云样本,在无法访问图卡(chart)、投影或其他几何原语的情况下,在该数据流形上构建扩散过程。本文引入隐式流形值扩散(IMDs),这是一种数据驱动的数学形式化方法,用于在原始高维空间中定义随机微分方程,描述在底层流形上内在演化的带漂移布朗粒子。我们的构造依赖于两点:利用数据上的邻近图近似扩散过程相应的无穷小生成元,以及利用该生成元的carré-du-champ,它编码了流形的局部切空间,并将内在过程提升到环境坐标中。我们证明,随着样本数量增长,离散扩散过程在概率路径空间上依分布收敛到其光滑流形对应过程。我们进一步给出了用于IMDs数值积分的欧拉-丸山(Euler-Maruyama)格式。我们在合成流形和MNIST数据流形上通过数值实验验证了该框架,表明IMDs始终约束在流形上,并能实现对流形的引导探索。我们的工作为数据流形上的扩散过程提供了数学基础和实际实现,为流形感知的采样、探索和生成建模开辟了新途径。

英文摘要

High-dimensional data are often assumed to lie on lower-dimensional manifolds. We study how to construct diffusion processes on this data manifold using only point cloud samples and without access to charts, projections, or other geometric primitives. Here, we introduce Implicit Manifold-valued Diffusions (IMDs), a data-driven mathematical formalism for defining stochastic differential equations in the original high-dimensional space that describe drifting Brownian particles evolving intrinsically on the underlying manifold. Our construction hinges on approximating the corresponding infinitesimal generator of the diffusion process using a proximity graph over the data and using the carré-du-champ of the generator, which encodes the local tangent spaces of the manifold and lifts the intrinsic process into ambient coordinates. We show that as the number of samples grows, our discrete diffusion process converges in law on the space of probability paths to its smooth manifold counterpart. We further present an Euler-Maruyama scheme for the numerical integration of IMDs. We validate our framework using numerical experiments on synthetic manifolds and the MNIST data manifold, showing that IMDs remain confined over the manifold and enable its guided exploration. Our work provides the mathematical foundation and practical implementations of diffusion processes on data manifolds, opening new avenues for manifold-aware sampling, exploration, and generative modeling.
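The role of the carré-du-champ can be checked numerically. Take a graph generator L f(x_i) = (1/eps) sum_j P_ij (f(x_j) - f(x_i)) with a row-normalized Gaussian kernel P. For this generator, the identity Gamma(f, g) = (L(fg) - f Lg - g Lf) / 2 reduces to a weighted local covariance. Applied to the ambient coordinate functions, it yields a matrix that is large along the tangent direction and nearly zero along the normal. The kernel, the normalization, and eps below are assumptions for a minimal sketch on the unit circle, not the paper's estimator. The Euler-Maruyama integration is not shown.

```python
import math


def transition_row(points, i, eps):
    # Row-normalized Gaussian kernel weights P_ij from point i to every point j.
    w = [math.exp(-sum((a - b) ** 2 for a, b in zip(points[i], p)) / eps) for p in points]
    total = sum(w)
    return [wj / total for wj in w]


def carre_du_champ(points, i, eps):
    """Gamma(x_a, x_b) at points[i] for coordinate functions x_a, x_b:
    (1 / (2 eps)) * sum_j P_ij (x_a(j) - x_a(i)) (x_b(j) - x_b(i))."""
    row = transition_row(points, i, eps)
    dim = len(points[i])
    gamma = [[0.0] * dim for _ in range(dim)]
    for pj, p in zip(row, points):
        diff = [p[a] - points[i][a] for a in range(dim)]
        for a in range(dim):
            for b in range(dim):
                gamma[a][b] += pj * diff[a] * diff[b] / (2.0 * eps)
    return gamma


# Hypothetical point cloud on the unit circle in R^2; at (1, 0) the tangent is the y-axis.
n = 200
circle = [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n)) for k in range(n)]
gamma = carre_du_champ(circle, 0, eps=0.01)
```

At (1, 0), Gamma(y, y) dominates Gamma(x, x) by more than two orders of magnitude, so the matrix acts approximately as a scaled projection onto the tangent line. That is the property used to lift the intrinsic process into ambient coordinates.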

2603.29183 2026-05-21 cs.LG cs.AI 版本更新

IMPACT: Influence Modeling for Open-Set Time Series Anomaly Detection

IMPACT: 开集时间序列异常检测中的影响建模

Xiaohui Zhou, Yijie Wang, Hongzuo Xu, Weixuan Liang, Xiaoli Li, Guansong Pang

发表机构 * National Key Laboratory of Parallel and Distributed Computing(并行与分布式计算国家重点实验室) College of Computer Science and Technology, National University of Defense Technology(国防科技大学计算机科学与技术学院) Intelligent Game and Decision Lab (IGDL)(智能博弈与决策实验室(IGDL)) Information Systems Technology and Design, Singapore University of Technology and Design(新加坡科技设计大学信息系统技术与设计系) School of Computing and Information Systems, Singapore Management University(新加坡管理大学计算与信息系统学院)

AI总结 本文提出IMPACT框架,通过影响建模方法解决开集时间序列异常检测中的挑战,通过学习影响函数生成真实异常模式并净化训练数据。

Comments Accepted by ICML 2026

AI中文摘要

开集异常检测(OSAD)是一种新兴范式,旨在利用训练中已见异常类的有限标注数据,在测试时识别已见和未见的异常。当前方法依赖简单的增强方法生成伪异常,以模拟未见异常。尽管这类方法在图像数据上前景可观,但在时间序列数据中效果不佳,因为它们无法保持时间序列的序列特性,导致生成的异常模式过于平凡或不真实。当训练数据被未标注异常污染时,问题会进一步加剧。本文提出IMPACT,一种利用影响建模解决这些挑战的新框架。其关键思想是:(i)学习一个影响函数,以准确估计单个训练样本对建模的影响;(ii)利用这些影响分数为时间序列生成语义上不同但真实的未见异常,同时将高影响样本重新用作监督异常以去除数据污染。大量实验表明,IMPACT显著优于现有最先进方法,在各种OSAD设置和污染率下均表现出更高的准确率。代码可在https://github.com/mala-lab/IMPACT获取。

英文摘要

Open-set anomaly detection (OSAD) is an emerging paradigm designed to utilize limited labeled data from anomaly classes seen in training to identify both seen and unseen anomalies during testing. Current approaches rely on simple augmentation methods to generate pseudo anomalies that replicate unseen anomalies. Despite being promising in image data, these methods are found to be ineffective in time series data due to the failure to preserve its sequential nature, resulting in trivial or unrealistic anomaly patterns. They are further plagued when the training data is contaminated with unlabeled anomalies. This work introduces IMPACT, a novel framework that leverages Influence Modeling for oPen-set time series Anomaly deteCTion, to tackle these challenges. The key insight is to (i) learn an influence function that can accurately estimate the impact of individual training samples on the modeling, and then (ii) leverage these influence scores to generate semantically divergent yet realistic unseen anomalies for time series while repurposing highly influential samples as supervised anomalies for anomaly decontamination. Extensive experiments show that IMPACT significantly outperforms existing state-of-the-art methods, showing superior accuracy under varying OSAD settings and contamination rates. Code is available at https://github.com/mala-lab/IMPACT.
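The intuition behind scoring training samples by influence can be shown with exact leave-one-out retraining on a toy model; influence functions approximate this quantity to first order. Below, the "model" is just the mean of scalar training values, and the loss is squared error on clean validation values. A contaminating sample is the one whose removal most reduces validation loss. This is a deliberately simplified stand-in. IMPACT learns an influence function on time series and also uses the scores to generate pseudo anomalies, and neither step is shown here. All names and data are hypothetical.

```python
def fit_mean(train):
    return sum(train) / len(train)


def val_loss(center, val):
    return sum((v - center) ** 2 for v in val) / len(val)


def loo_influence(train, val):
    """Exact leave-one-out influence: loss with sample i minus loss without it.
    Positive values mean sample i hurts the fit on clean validation data."""
    full = val_loss(fit_mean(train), val)
    scores = []
    for i in range(len(train)):
        rest = train[:i] + train[i + 1:]
        scores.append(full - val_loss(fit_mean(rest), val))
    return scores


def flag_contaminated(scores, k):
    # Indices of the k most harmful samples: candidates for relabeling as anomalies.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Hypothetical data: values near 0 plus one unlabeled anomaly at index 4.
train = [0.1, -0.2, 0.05, 0.0, 5.0, -0.1]
val = [0.0, 0.1, -0.1]
scores = loo_influence(train, val)
```

Only the injected anomaly gets a positive score here. Removing any other sample moves the fitted mean further toward the outlier and increases validation loss.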

2603.23890 2026-05-21 cs.SE cs.LG 版本更新

Praxium: Diagnosing Cloud Anomalies with AI-based Telemetry and Dependency Analysis

Praxium:基于AI的遥测和依赖分析的云异常诊断

Rohan Kumar, Jason Li, Zongshun Zhang, Syed Mohammad Qasim, Gianluca Stringhini, Ayse K. Coskun

发表机构 * Boston University(波士顿大学)

AI总结 本文提出Praxium框架,利用AI技术进行云服务异常检测和根本原因推断,通过遥测数据和依赖分析提高故障诊断效率和准确性。

AI中文摘要

随着现代微服务架构在云应用中日益普及,云服务正变得越来越复杂,也更容易受到配置错误和软件缺陷的影响。传统方法依赖专家来诊断和修复微服务异常,在持续集成和持续部署(CI/CD)范式下缺乏可扩展性。包含新软件安装的微服务发布会与应用程序组件发生复杂的相互作用。因此,将异常行为归因于某一具体安装或发布变得更加困难,可能导致故障解决时间延长。为弥补现有诊断方法的不足,本文提出Praxium,一个用于异常检测和根因推断的框架。Praxium结合软件发现工具PraxiPaaS提供的依赖安装信息,帮助管理员评估目标指标的表现。Praxium持续监控遥测数据以识别异常,然后对近期软件安装进行因果影响(causal impact)分析以推断根因,从而为站点可靠性工程师(SRE)提供与所观察异常相关的信息。在本文中,我们证明Praxium能够有效进行异常检测和根因推断,并针对实际环境中所需的异常检测超参数调优进行了分析。在使用四种合成异常的共75次试验中,异常检测的宏F1始终高于0.97。此外,我们还表明,即使软件包安装的时间间隔越来越短,因果影响分析也能可靠地推断出异常的正确根因。

英文摘要

As the modern microservice architecture for cloud applications grows in popularity, cloud services are becoming increasingly complex and more vulnerable to misconfiguration and software bugs. Traditional approaches rely on expert input to diagnose and fix microservice anomalies, which lacks scalability in the face of the continuous integration and continuous deployment (CI/CD) paradigm. Microservice rollouts, containing new software installations, have complex interactions with the components of an application. Consequently, this added difficulty in attributing anomalous behavior to any specific installation or rollout results in potentially slower resolution times. To address the gaps in current diagnostic methods, this paper introduces Praxium, a framework for anomaly detection and root cause inference. Praxium aids administrators in evaluating target metric performance in the context of dependency installation information provided by a software discovery tool, PraxiPaaS. Praxium continuously monitors telemetry data to identify anomalies, then conducts root cause analysis via causal impact on recent software installations, in order to provide site reliability engineers (SRE) relevant information about an observed anomaly. In this paper, we demonstrate that Praxium is capable of effective anomaly detection and root cause inference, and we provide an analysis on effective anomaly detection hyperparameter tuning as needed in a practical setting. Across 75 total trials using four synthetic anomalies, anomaly detection consistently performs at >0.97 macro-F1. In addition, we show that causal impact analysis reliably infers the correct root cause of anomalies, even as package installations occur at increasingly shorter intervals.
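Attributing an anomaly to a recent installation amounts to asking which installation time best explains a change in the target metric. The sketch below ranks installations by a simple before/after level-shift z-score. It is a deliberately simplified stand-in for causal impact analysis, which builds a counterfactual forecast of the post-installation series. The package names, metric, and window size are hypothetical.

```python
import statistics


def shift_score(metric, t, window):
    """z-score of the mean level in the window after index t, relative to the window before t."""
    before = metric[max(0, t - window):t]
    after = metric[t:t + window]
    if len(before) < 2 or not after:
        return 0.0
    spread = statistics.stdev(before) or 1e-9  # avoid dividing by zero on flat baselines
    return (statistics.fmean(after) - statistics.fmean(before)) / spread


def rank_installations(metric, installs, window=10):
    """installs: {package_name: time_index}. Most suspicious installation first."""
    scores = {name: abs(shift_score(metric, t, window)) for name, t in installs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical latency series: a level shift of +5 begins at index 25.
latency = [10.0 + 0.2 * (i % 2) for i in range(25)] + [15.0 + 0.2 * (i % 2) for i in range(25)]
installs = {"pkg_a": 10, "pkg_b": 25, "pkg_c": 40}
ranking = rank_installations(latency, installs)
```

Installations whose windows fall entirely within one regime score near zero. The installation aligned with the shift ranks first. Closely spaced installations share overlapping windows, which is where a proper counterfactual model earns its keep.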

2603.22727 2026-05-21 cs.LG eess.SP 版本更新

Spiking Personalized Federated Learning for Brain-Computer Interface-Enabled Immersive Communication

基于脑机接口的沉浸式通信的脉冲个性化联邦学习

Chen Shang, Dinh Thai Hoang, Diep N. Nguyen, Jiadong Yu

Affiliations: School of Electrical and Data Engineering, University of Technology Sydney; Thrust of Internet of Things, The Hong Kong University of Science and Technology (Guangzhou)

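AI summary: This paper proposes an immersive communication framework that uses a brain-computer interface to acquire brain signals and infer user-centric states (such as intention and perception-related discomfort). The brain signals are processed by a personalized federated learning model that accommodates neurodiverse data and prevents leakage of sensitive brain-signal information, and spiking neural networks are embedded to reduce energy consumption. Experiments on a real brain-signal dataset show the highest identification accuracy and a 6.46× reduction in inference energy.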

Comments 6 pages, 3 figures

Journal ref
INFOCOM Workshop, 2026

English abstract

This work proposes a novel immersive communication framework that leverages a brain-computer interface (BCI) to acquire brain signals for inferring user-centric states (e.g., intention and perception-related discomfort), thereby enabling more personalized and robust immersive adaptation under strong individual variability. Specifically, we develop a personalized federated learning (PFL) model to analyze and process the collected brain signals, which not only accommodates neurodiverse brain-signal data but also prevents the leakage of sensitive brain-signal information. To address the energy bottleneck of continual on-device learning and inference on energy-limited immersive terminals (e.g., head-mounted displays), we further embed spiking neural networks (SNNs) into the PFL model. By exploiting sparse, event-driven spike computation, the SNN-enabled PFL reduces the computation and energy cost of training and inference while maintaining competitive personalization performance. Experiments on a real brain-signal dataset demonstrate that our method achieves the best overall identification accuracy while reducing inference energy by 6.46× compared with conventional artificial neural network-based personalized baselines.
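
The two ingredients named above, spiking neurons and personalized federated aggregation, can be sketched in isolation. This is a minimal illustration, assuming a standard leaky integrate-and-fire (LIF) neuron with hard reset and a FedAvg-style scheme in which clients average a shared backbone but keep a private head. The abstract does not say which personalization strategy or neuron model the paper uses, and `tau`, `threshold`, and the parameter names `backbone`/`head` are hypothetical.

```python
import numpy as np


def lif_layer(inputs, weights, tau=0.9, threshold=1.0):
    """Leaky integrate-and-fire layer.

    inputs: (T, n_in) binary spike trains; weights: (n_in, n_out).
    Returns (T, n_out) binary output spikes. On event-driven hardware only nonzero
    input spikes trigger synaptic updates; this dense NumPy version does not exploit that.
    """
    v = np.zeros(weights.shape[1])                 # membrane potentials
    out = np.zeros((inputs.shape[0], weights.shape[1]))
    for t in range(inputs.shape[0]):
        v = tau * v + inputs[t] @ weights          # leak, then integrate input current
        fired = v >= threshold
        out[t] = fired
        v[fired] = 0.0                             # hard reset after a spike
    return out


def personalized_fedavg(client_params, shared_keys):
    """Average only the shared parameters across clients; personal ones stay local."""
    avg = {k: np.mean([p[k] for p in client_params], axis=0) for k in shared_keys}
    return [{**p, **{k: v.copy() for k, v in avg.items()}} for p in client_params]
```

Keeping the head local is what lets each user's model adapt to neurodiverse signals, while only the averaged backbone, not raw brain signals, leaves the device.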

2603.22430 2026-05-21 cs.LG Updated version

Inference Time Policy Optimization for Offline RL with Differentiable World Models

Rohan Deb, Stephen J. Wright, Arindam Banerjee

Affiliations: Siebel School of Computing and Data Science; Department of Computer Sciences; University of Illinois Urbana-Champaign; University of Wisconsin Madison

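AI summary: This paper proposes a method that uses a differentiable world model to optimize policy parameters at inference time, improving offline reinforcement learning performance through end-to-end gradient computation, and examines the trade-off between the computational overhead and the benefits of inference-time adaptation.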

English abstract

Offline Reinforcement Learning (RL) learns optimal policies from fixed datasets, training a policy once and deploying it at inference time without further refinement. Inspired by model predictive control (MPC), we introduce an inference-time adaptation framework that uses a pretrained policy together with a learned world model. While existing world model and diffusion-planning methods use learned dynamics to generate imagined trajectories during training, or to sample candidate plans at inference time, they do not use inference-time information to *optimize* the policy parameters on the fly. In contrast, our design is a Differentiable World Model (DWM) pipeline that enables end-to-end gradient computation through imagined rollouts for inference-time policy optimization (ITPO). We evaluate our algorithm on D4RL continuous-control benchmarks (MuJoCo locomotion tasks and AntMaze) and show that exploiting inference-time information to optimize the policy parameters yields consistent gains over strong offline RL baselines. Inference-time adaptation, however, is expensive: rollout generation and backpropagation dominate per-step compute. We study this tradeoff explicitly, showing that a suitably tilted version of a one-step MeanFlow sampler recovers much of the gains at a fraction of the cost.
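
The core idea of differentiating the return of imagined rollouts with respect to the policy parameters can be shown on a toy problem. This is a minimal sketch, not the paper's DWM pipeline: a scalar linear "world model" with fixed coefficients stands in for a learned differentiable model, the policy is a single gain `k`, the gradient is propagated with forward-mode tangents rather than backpropagation, and all constants are hypothetical.

```python
import numpy as np


def rollout_return_and_grad(k, s0, a=1.1, b=0.5, q=1.0, r=0.1, horizon=20):
    """Imagined return J(k) of the policy u = k*s under the model s' = a*s + b*u,
    plus dJ/dk obtained by differentiating through every rollout step.

    J = -sum_t (q*s_t^2 + r*u_t^2) for t = 0..horizon-1.
    """
    s, ds = s0, 0.0                      # state and its sensitivity ds/dk
    J, dJ = 0.0, 0.0
    for _ in range(horizon):
        u = k * s
        du = s + k * ds                  # product rule for d(k*s)/dk
        J -= q * s**2 + r * u**2
        dJ -= 2 * q * s * ds + 2 * r * u * du
        s, ds = a * s + b * u, a * ds + b * du   # push state and sensitivity through the model
    return J, dJ


def inference_time_optimize(k_pretrained, s0, steps=100, lr=1e-2):
    """Refine a pretrained policy parameter by gradient ascent on the imagined return."""
    k = k_pretrained
    for _ in range(steps):
        _, g = rollout_return_and_grad(k, s0)
        k += lr * g
    return k
```

Starting from a stable but suboptimal gain such as k = -0.5 at state s0 = 1.0, the refined gain achieves a higher imagined return. Each refinement step costs a full rollout, which mirrors the compute tradeoff the abstract discusses.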

2603.21033 2026-05-21 cs.CE cs.LG Updated version

TabPFN Extensions for Interpretable Geotechnical Modelling

Taiga Saito, Yu Otake, Daijiro Mizutani, Stephen Wu

Affiliations: Department of Civil and Environmental Engineering, Tohoku University; Research Organization of Information and Systems, The Institute of Statistical Mathematics; Department of Statistical Science, The Graduate University for Advanced Studies

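AI summary: This paper evaluates TabPFN and its extensions library on geotechnical tasks, namely soil-type classification and iterative parameter imputation, demonstrating TabPFN's strengths in uncertainty quantification and interpretability.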

English abstract

Geotechnical site characterisation relies on sparse, heterogeneous borehole data, where uncertainty quantification and interpretability matter as much as predictive accuracy. We evaluate TabPFN (Hollmann et al., 2025), a tabular foundation model, and its tabpfn-extensions library on two geotechnical tasks: (1) soil-type classification from N-value and shear-wave velocity data as a controlled illustrative case, and (2) iterative imputation of five mechanical parameters (s_u, E_u, σ'_p, C_c, C_v) in BM/AirportSoilProperties/2/2025. Without retraining, we apply cosine-similarity analysis to TabPFN embeddings, visualise predictive distributions, and compute SHAP attributions. On the regression benchmark, we compare TabPFN with mean imputation, linear regression, random forests, XGBoost, and HBM; introduce a proxy decomposition of predictive uncertainty across context-perturbation classes; and propagate marginal C_c and σ'_p distributions through a one-dimensional consolidation model to obtain the reliability index β and the serviceability exceedance probability P_f. Embeddings exhibit label-consistent Clay/Sand grouping; iterative imputation reduces RMSE for all five targets, with TabPFN lowest on four; SHAP attributions are consistent with the Skempton compression-index correlation and with the inverse dependence of preconsolidation pressure on water content; the within-posterior component is the largest in the proxy decomposition. We position the contribution as a worked evaluation workflow that may complement established methods for data-scarce geotechnics, not as an algorithmic innovation.
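
The last step above, pushing uncertain C_c and σ'_p through a consolidation model to obtain P_f and β, can be sketched with Monte Carlo sampling. This is a hedged illustration: it assumes the textbook one-dimensional primary-consolidation settlement formula with a recompression index below σ'_p, and it assumes the convention β = -Φ⁻¹(P_f). The layer thickness, void ratio, stresses, allowable settlement, and the lognormal input samples are all hypothetical placeholders, not the paper's model or TabPFN's predictive marginals.

```python
import numpy as np
from statistics import NormalDist


def settlement(cc, sigma_p, H=5.0, e0=1.2, cs=0.05, sigma0=80.0, dsigma=60.0):
    """Primary consolidation settlement (m) of a clay layer, textbook 1D form.

    cc: compression index, sigma_p: preconsolidation pressure (kPa), cs: recompression index,
    sigma0 / dsigma: initial effective stress and load increment at mid-layer (kPa).
    """
    sigma_f = sigma0 + dsigma
    sp = np.maximum(sigma_p, sigma0)                       # sigma_p below sigma0 -> normally consolidated
    recompression = cs * np.log10(np.minimum(sigma_f, sp) / sigma0)
    virgin = cc * np.log10(np.maximum(sigma_f / sp, 1.0))  # only the part of the load beyond sigma_p
    return H / (1.0 + e0) * (recompression + virgin)


def reliability(cc_samples, sp_samples, s_allow=0.10):
    """Monte Carlo P_f = P(settlement > s_allow) and reliability index beta = -Phi^{-1}(P_f)."""
    pf = float(np.mean(settlement(cc_samples, sp_samples) > s_allow))
    if pf in (0.0, 1.0):
        return pf, (float("inf") if pf == 0.0 else float("-inf"))
    return pf, -NormalDist().inv_cdf(pf)
```

In the workflow described in the abstract, `cc_samples` and `sp_samples` would be drawn from the imputed predictive distributions; here any array of samples, for example lognormal draws, can be passed in.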

2603.20420 2026-05-21 q-bio.GN cs.LG q-bio.QM Updated version

CRANE: Correcting Errors in Raw Nanopore Signals Using Hidden Markov Models

Simon Ambrozak, Ulysse McConnell, Bhargav Srinivasan, Burak Ozkan, Ernest Zhang, Can Firtina

Affiliations: University of Maryland; ETH Zurich; Bilkent University

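AI summary: This paper proposes CRANE, which trains and uses a hidden Markov model (HMM) to correct errors in nanopore signals. It improves the accuracy of raw signal analysis and reduces the burden of optimizing analysis pipelines, without introducing significant computational overhead.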

English abstract

Nanopore sequencing can read substantially longer sequences of nucleic acid molecules, called reads, than other sequencing methods, which has led to advances in genomic analysis such as the gapless human genome assembly. By analyzing the raw electrical signal reads that nanopore sequencing generates from molecules, existing methods can map these reads without translating them into DNA characters (i.e., basecalling), allowing quick and efficient analysis of sequencing data. However, raw signals often contain errors arising from noise and processing faults, which limits the overall accuracy of raw signal analysis. Our goal in this work is to detect and correct errors in raw signals to improve the accuracy of raw signal analyses. To this end, we propose CRANE, a mechanism that trains and utilizes a Hidden Markov Model (HMM) to accurately correct signal errors. Our extensive evaluation on various datasets shows that CRANE 1) consistently improves the overall accuracy of the underlying raw signal analysis tools, 2) minimizes the burden of optimizing analysis pipelines for newer nanopore technologies, and 3) does not introduce substantial computational overhead. We conclude that CRANE provides an effective mechanism to systematically identify and correct the errors in raw nanopore signals before further analysis, which can enable the development of a new class of error correction mechanisms designed purely for raw nanopore signals. Source Code: CRANE is available at https://github.com/STORMgroup/CRANE. We also provide the scripts to fully reproduce our results on our GitHub page.
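
One way an HMM can correct a raw signal is to decode the most likely sequence of hidden current levels and replace each sample with its decoded level. The sketch below does this with Viterbi decoding under Gaussian emissions and "sticky" transitions. It is a generic illustration under those assumptions, not CRANE's model: the abstract does not describe CRANE's state space or training, real nanopore events vary in length and follow k-mer pore models, and the levels, `sigma`, and `stay_prob` values are hypothetical.

```python
import numpy as np


def hmm_correct(signal, levels, sigma=1.0, stay_prob=0.9):
    """Viterbi-decode the most likely hidden current level at each sample, assuming
    Gaussian emissions around each level and a high self-transition probability,
    then return the signal with each sample replaced by its decoded level."""
    signal = np.asarray(signal, dtype=float)
    levels = np.asarray(levels, dtype=float)
    K, n = len(levels), len(signal)                      # requires K >= 2
    switch = (1.0 - stay_prob) / (K - 1)
    log_trans = np.log(np.full((K, K), switch) + np.eye(K) * (stay_prob - switch))
    log_emit = -0.5 * ((signal[:, None] - levels[None, :]) / sigma) ** 2  # up to a constant
    score = log_emit[0] - np.log(K)                      # uniform initial state distribution
    back = np.zeros((n, K), dtype=int)
    for t in range(1, n):
        cand = score[:, None] + log_trans                # cand[i, j]: arrive in state j from state i
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[t]
    path = np.empty(n, dtype=int)
    path[-1] = score.argmax()
    for t in range(n - 1, 0, -1):                        # trace the best path backwards
        path[t - 1] = back[t, path[t]]
    return levels[path]
```

The transition penalty is what distinguishes this from per-sample rounding. An isolated sample at 112 inside a run at 100 would be rounded to 120 by nearest-level assignment, but with `stay_prob=0.98` the decoder keeps it at 100, because two level switches cost more than the emission mismatch.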

2603.19545 2026-05-21 eess.SY cs.LG cs.SY math.OC Updated version

Verifiable Error Bounds for Physics-Informed Neural Network Solutions of Lyapunov and Hamilton-Jacobi-Bellman Equations

Jun Liu

Affiliations: Department of Applied Mathematics, Faculty of Mathematics, University of Waterloo

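AI summary: This paper studies verifiable error bounds for physics-informed neural network solutions of the Lyapunov and Hamilton-Jacobi-Bellman equations. It develops error bounds for approximate solutions of these equations and shows how residual bounds yield relative error bounds with respect to the true solution, as well as a posteriori estimates in terms of the approximate solution.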

Comments The paper will appear in the IEEE Control Systems Letters

English abstract

Many core problems in nonlinear systems analysis and control can be recast as solving partial differential equations (PDEs) such as Lyapunov and Hamilton-Jacobi-Bellman (HJB) equations. Physics-informed neural networks (PINNs) have emerged as a promising mesh-free approach for approximating their solutions, but in most existing works there is no rigorous guarantee that a small PDE residual implies a small solution error. This paper develops verifiable error bounds for approximate solutions of Lyapunov and HJB equations, with particular emphasis on PINN-based approximations. For both the Lyapunov and HJB PDEs, we show that a verifiable residual bound yields relative error bounds with respect to the true solutions as well as computable a posteriori estimates in terms of the approximate solutions. For the HJB equation, this also yields certified upper and lower bounds on the optimal value function on compact sublevel sets and quantifies the optimality gap of the induced feedback policy. We further show that one-sided residual bounds already imply that the approximation itself defines a valid Lyapunov or control Lyapunov function. We illustrate the results with numerical examples.
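
The step from a residual bound to a solution-error bound can be illustrated for the Lyapunov equation with a short comparison argument. The derivation below is a sketch under simplifying assumptions: x lies in the domain of attraction of an asymptotically stable equilibrium at the origin, ω(x) > 0 for x ≠ 0, the approximation satisfies V̂(0) = 0 and is continuous, and the residual obeys a relative bound with δ < 1. It conveys the idea behind such results but is not necessarily the paper's exact theorem or constants.

```latex
% Lyapunov PDE for \dot{x} = f(x), flow \phi(t,x), with \omega(x) > 0 for x \neq 0:
\[ \nabla V(x)\cdot f(x) = -\omega(x), \qquad V(0) = 0. \]
% Approximation \hat{V} with \hat{V}(0) = 0 and an assumed relative residual bound:
\[ \nabla \hat{V}(x)\cdot f(x) = -\omega(x) + r(x), \qquad |r(x)| \le \delta\,\omega(x), \quad 0 \le \delta < 1. \]
% Integrate d/dt \hat{V}(\phi(t,x)) = -\omega + r along a trajectory converging to 0:
\[ V(x) = \int_0^\infty \omega(\phi(t,x))\,dt, \qquad \hat{V}(x) = \int_0^\infty \bigl(\omega - r\bigr)(\phi(t,x))\,dt. \]
% Relative error with respect to the true solution:
\[ |\hat{V}(x) - V(x)| \le \int_0^\infty |r(\phi(t,x))|\,dt \le \delta\,V(x), \]
% which rearranges into a computable a posteriori estimate in terms of \hat{V} alone:
\[ \frac{\hat{V}(x)}{1+\delta} \le V(x) \le \frac{\hat{V}(x)}{1-\delta}. \]
% One-sided bound: r \le \delta\,\omega alone gives \hat{V}(x) \ge (1-\delta)V(x) > 0 for x \neq 0 and
\[ \nabla \hat{V}(x)\cdot f(x) \le -(1-\delta)\,\omega(x) < 0 \quad (x \neq 0), \]
% so \hat{V} is itself a valid Lyapunov function on the region considered.
```

Once δ has been certified, for example by a verifier bounding the PINN residual over a compact set, both inequalities above can be evaluated using only V̂, without knowing V.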

2603.16513 2026-05-21 cs.LG cs.AI Updated version

FEAT: A Linear-Complexity Foundation Model for Extremely Large Structured Data

Zhenghang Song, Tang Qian, Lu Chen, Yushuai Li, Zhengke Hu, Bingbing Fang, Yumeng Song, Junbo Zhao, Sheng Zhang, Tianyi Li

Affiliations: Zhejiang University; Ant Group; Aalborg University

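AI summary: This paper proposes FEAT, a linear-complexity foundation model for extremely large structured data. Using a multi-layer dual-axis encoding architecture and an adaptive-fusion bidirectional state-space model, it achieves cross-tuple contextualization in linear time while supporting permutation-invariant representation learning.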

English abstract

Structured data is widely used in domains such as healthcare, finance, and scientific data management. Recent studies on structured data foundation models (SFMs) aim to support data analysis and mining tasks over such data, but still face scalability and generalization challenges when applied to real-world enterprise databases. First, many SFMs rely on full self-attention, which introduces an O(N^2) computational bottleneck and limits the number of tuples that can be processed jointly. Second, directly replacing attention with linear-complexity sequence models may conflict with the permutation-invariant nature of structured data, introducing artificial order bias and degrading representation quality. Moreover, models trained only on synthetic data may struggle to generalize to the heavy-tailed and heterogeneous distributions commonly found in real-world databases. To address these challenges, we propose FEAT, a linear-complexity foundation model for extremely large structured data. FEAT replaces quadratic attention with a multi-layer dual-axis encoding architecture. It integrates an adaptive-fusion bidirectional state-space model (AFBM) with convolutional gated linear attention (Conv-GLA), enabling cross-tuple contextualization in O(N) time while supporting permutation-invariant representation learning. To improve robustness under real-world data skewness, FEAT further adopts a hybrid structural causal pre-training pipeline with a robust reconstruction objective. Experiments on 12 real-world database benchmarks show that FEAT consistently outperforms representative SFMs on zero-shot tasks and scales linearly with structured-data sample length, achieving up to 50× lower inference latency.
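
Why linear-complexity mixing and permutation invariance can coexist is easiest to see with non-causal kernelized linear attention. The sketch below is a generic example, not FEAT's AFBM or Conv-GLA modules. It uses the elu(x)+1 feature map common in kernelized linear attention and computes φ(Q)(φ(K)ᵀV) in time linear in the number of tuples N. It produces the same result as materializing the N × N weight matrix, and because the key/value summary is a sum over all tuples, permuting the input rows simply permutes the output rows. A left-to-right recurrent scan would not have this property.

```python
import numpy as np


def feature_map(x):
    """Positive feature map elu(x) + 1."""
    return np.where(x > 0, x + 1.0, np.exp(x))


def linear_attention(Q, K, V):
    """Non-causal kernelized attention in O(N * d * d_v): phi(Q) (phi(K)^T V), row-normalized."""
    q, k = feature_map(Q), feature_map(K)
    kv = k.T @ V                       # (d, d_v) summary accumulated over all N tuples
    z = k.sum(axis=0)                  # (d,) normalizer
    return (q @ kv) / (q @ z)[:, None]


def quadratic_reference(Q, K, V):
    """Same kernel, but materializing the full N x N weight matrix (O(N^2))."""
    w = feature_map(Q) @ feature_map(K).T
    return (w @ V) / w.sum(axis=1, keepdims=True)
```

The two functions agree numerically; only the order of the matrix products differs, which is the source of the O(N) versus O(N^2) cost.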

2603.14392 2026-05-21 cs.LG cs.RO Updated version

WestWorld: A Knowledge-Encoded Scalable Trajectory World Model for Diverse Robotic Systems

Yuchen Wang, Jiangtao Kong, Sizhe Wei, Xiaochang Li, Haohong Lin, Hongjue Zhao, Tianyi Zhou, Lu Gan, Huajie Shao

Affiliations: Georgia Institute of Technology; University of Illinois at Urbana-Champaign; Carnegie Mellon University; Mohamed bin Zayed University of Artificial Intelligence

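AI summary: This paper proposes WestWorld, a knowledge-encoded scalable trajectory world model for diverse robotic systems. It introduces a system-aware mixture of experts (Sys-MoE) and structural embeddings to improve scalability and zero-shot generalization, enabling efficient trajectory prediction and control across many robotic environments.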

Comments ICML 2026 spotlight

English abstract

Trajectory world models play a crucial role in robotic dynamics learning, planning, and control. While recent works have explored trajectory world models for diverse robotic systems, they struggle to scale to a large number of distinct system dynamics and overlook domain knowledge of physical structures. To address these limitations, we introduce WestWorld, a knoWledge-Encoded Scalable Trajectory World model for diverse robotic systems. To tackle the scalability challenge, we propose a novel system-aware Mixture-of-Experts (Sys-MoE) that dynamically combines and routes specialized experts for different robotic systems via a learnable system embedding. To further enhance zero-shot generalization, we incorporate domain knowledge of robot physical structures by introducing a structural embedding that aligns trajectory representations with morphological information. After pretraining on 89 complex environments spanning diverse morphologies across both simulation and real-world settings, WestWorld achieves significant improvements over competitive baselines in zero- and few-shot trajectory prediction. Additionally, it shows strong scalability across a wide range of robotic environments and significantly improves performance on downstream model-based control for different robots. Finally, we deploy our model on a real-world Unitree Go1, where it demonstrates stable locomotion performance. The code is available at https://github.com/511205787/WestWorld.
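
Routing experts "via a learnable system embedding" can be pictured as a gate that looks only at which robot is being modeled. The sketch below is a hypothetical illustration of that idea, not WestWorld's Sys-MoE. It assumes linear experts, a linear gate on the system embedding, top-k sparsity with renormalized softmax weights, and all names and sizes are illustrative. The structural embedding is not shown.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def sys_moe_forward(x, system_embedding, gate_W, experts, top_k=2):
    """Route input x to experts chosen by a gate on a per-system embedding.

    experts: list of (W, b) linear experts; gate_W: (n_experts, emb_dim).
    Only the top_k experts by gate logit are evaluated, and their weights are renormalized.
    Returns the mixed output and a {expert_index: weight} routing dict.
    """
    logits = gate_W @ system_embedding
    top = np.argsort(logits)[-top_k:]
    weights = softmax(logits[top])
    out = sum(w * (experts[i][0] @ x + experts[i][1]) for w, i in zip(weights, top))
    return out, dict(zip(top.tolist(), weights.tolist()))
```

Because the gate depends on the system embedding rather than on each input, all trajectories from the same robot share one routing decision, while different robots can draw on different specialized experts. Setting `top_k` to the number of experts recovers a dense mixture.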

2603.13419 2026-05-21 cs.LG Updated version

Diffusion Models Memorize in Training -- and Generalize in Inference

Tim Kaiser, Markus Kollmann

Affiliations: Heinrich-Heine-University Düsseldorf

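AI summary: This paper shows that diffusion models overfit the denoising objective during training, creating a performance gap between training and validation samples. They nevertheless generalize at inference because model error moves sampling trajectories away from the distribution of training samples.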

Comments 31 pages and 29 figures

English abstract

Diffusion models generalize well in practice. However, an optimal diffusion model fully memorizes the training data and therefore fails to generalize, raising the question of what induces generalization in a real diffusion model. We show that, despite generalizing at the sample level, diffusion models progressively overfit the denoising training objective and thereby create a generalization gap between the performance on validation and training samples. This gap is most pronounced at intermediate noise levels. Using a fully analytic error-prone toy model, we trace the factors affecting the generalization gap. We find that the optimal denoising flow field localizes sharply around training points, but the model error suppresses the exact recall of training points, yielding a smooth, generalizing flow field. Finally, we find that the generalization gap observed in training does not carry over to inference; if it did, generated samples would closely resemble training samples. This is because the intermediate states of sampling trajectories are sufficiently far from the distribution of noisy training samples the model is trained on. Together, these findings reveal a novel picture of how diffusion models generalize: the flow field generalizes through model error, which moves sampling trajectories outside the domain of noisy training samples and thereby naturally prevents overfitting.
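
The claim that an optimal diffusion model memorizes can be seen from the closed-form optimal denoiser for a finite training set. If x_0 is drawn uniformly from the training points and x_t = x_0 + σ·noise (a variance-exploding parameterization is assumed here for simplicity), then E[x_0 | x_t] is a softmax-weighted average of the training points. As σ shrinks, the weights collapse onto the nearest training point; as σ grows, the output approaches the training mean. This is a standard textbook result shown for illustration, not the paper's error-prone toy model.

```python
import numpy as np


def optimal_denoiser(x_t, train, sigma):
    """E[x_0 | x_t] when x_0 is uniform over the rows of `train` and x_t = x_0 + sigma * noise.

    Posterior weights are a softmax of -||x_t - x_i||^2 / (2 sigma^2).
    Returns the denoised estimate and the weights.
    """
    d2 = ((train - x_t) ** 2).sum(axis=1)
    logw = -d2 / (2 * sigma**2)
    w = np.exp(logw - logw.max())          # subtract the max for numerical stability
    w /= w.sum()
    return w @ train, w
```

At small σ, any query snaps to its nearest training point, which is exactly the "localized" flow field the abstract describes. A trained network's approximation error smooths this field away from pure recall.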

2603.10726 2026-05-21 cs.CR cs.DC cs.LG Updated version

PrefixWall: Mitigating Prefix Caching Side Channels in Shared LLM Systems

Panagiotis Georgios Pennas, Konstantinos Papaioannou, Marco Guarnieri, Thaleia Dimitra Doudali

Affiliations: IMDEA Software Institute; Universidad Politécnica de Madrid

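AI summary: This paper proposes PrefixWall, a system that monitors cache reuse and selectively isolates prefixes to mitigate the security risks of automatic prefix caching (APC) side channels in multi-tenant LLM serving systems, while improving cache utilization and inference efficiency.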

English abstract

Large Language Models (LLMs) rely on optimizations like Automatic Prefix Caching (APC) to accelerate inference. APC works by reusing previously computed states for the beginning part of a request (prefix) when another request starts with the same text. While APC improves throughput, it introduces timing side channels: cache hits are faster than misses, creating observable latency differences. In multi-tenant systems, attackers can exploit these differences to infer sensitive information, e.g., by incrementally reconstructing another user's request by observing hit/miss patterns. Current defenses take a sledgehammer approach: they disable APC and cache sharing, isolating users and sacrificing efficiency for regular users. This paper presents PrefixWall, a system that secures multi-tenant LLM serving systems against APC side channels without sacrificing performance and efficiency. PrefixWall monitors cache reuse across users, flags suspicious sharing, and selectively isolates prefixes, restricting their reuse only when necessary. Evaluation shows that PrefixWall enables up to 70% higher cache reuse and 30% lower inference latency compared to existing defenses that isolate users. PrefixWall's lightweight design demonstrates that security in LLM serving need not come at the cost of unnecessarily reduced performance or unbearable overheads.
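
The "monitor, flag, and selectively isolate" idea can be illustrated with a toy shared prefix cache. Everything in this sketch is a hypothetical stand-in: PrefixWall's actual detection signal is not described in the abstract, so a per-user threshold on hits against another user's prefixes is assumed. Characters stand in for tokens, and `block` stands in for the KV-cache block size.

```python
from collections import defaultdict


class PrefixCache:
    """Toy shared prefix cache that isolates a prefix once cross-user reuse looks suspicious.

    Hypothetical policy: a prefix is shared until a user other than its creator hits it
    more than `max_foreign_hits` times; after that, only the creator can hit it.
    """

    def __init__(self, max_foreign_hits=3):
        self.owner = {}                          # prefix -> user who first computed it
        self.foreign_hits = defaultdict(int)     # (prefix, user) -> hits by a non-owner
        self.isolated = set()                    # prefixes no longer shared across users
        self.max_foreign_hits = max_foreign_hits

    def lookup(self, user, prompt, block=4):
        """Return the length of the longest cached prefix usable by `user`, then cache the prompt's prefixes."""
        hit_len = 0
        for end in range(block, len(prompt) + 1, block):
            prefix = prompt[:end]
            owner = self.owner.get(prefix)
            if owner is None:
                break                            # ordinary miss
            if owner != user:
                if prefix in self.isolated:
                    break                        # isolated prefixes look like misses to other users
                self.foreign_hits[(prefix, user)] += 1
                if self.foreign_hits[(prefix, user)] > self.max_foreign_hits:
                    self.isolated.add(prefix)    # flag suspicious sharing and stop it
                    break
            hit_len = end
        for end in range(block, len(prompt) + 1, block):
            self.owner.setdefault(prompt[:end], user)
        return hit_len
```

In this toy, repeated probing of another user's prefix stops producing hits after the threshold is reached, while the prefix owner keeps full reuse. This is the kind of selective restriction the abstract contrasts with isolating users wholesale.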

2603.09024 2026-05-21 cs.LG Updated version

When to Retrain after Drift: A Data-Only Test of Post-Drift Data Size Sufficiency

Ren Fujiwara, Yasuko Matsubara, Yasushi Sakurai

Affiliations: SANKEN, The University of Osaka, Japan

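AI summary: This paper proposes CALIPER, a data-only test that estimates the post-drift data size needed for stable retraining. It exploits the state dependence of data generated by dynamical systems: a single-pass weighted local regression tracks a one-step proxy error as a function of the locality parameter θ. When an effective sample size gate is satisfied, an error that is monotonically non-increasing as θ increases indicates that there is enough data for retraining.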

Comments Accepted by ICLR 2026

English abstract

Sudden concept drift makes previously trained predictors unreliable, yet the questions of when to retrain and what post-drift data size is sufficient are rarely addressed. We propose CALIPER, a detector- and model-agnostic, data-only test that estimates the post-drift data size required for stable retraining. CALIPER exploits state dependence in streams generated by dynamical systems: we run a single-pass weighted local regression over the post-drift window and track a one-step proxy error as a function of a locality parameter θ. When an effective sample size gate is satisfied, a monotonically non-increasing trend in this error as θ increases indicates that the data size is sufficiently informative for retraining. We also provide a theoretical analysis of our method and show that the algorithm has low per-update time and memory costs. Across datasets from four heterogeneous domains, three learner families, and two detectors, CALIPER consistently matches or exceeds the best fixed data size for retraining while incurring negligible overhead, and it often outperforms incremental updates. CALIPER closes the gap between drift detection and data-sufficient adaptation in streaming learning.
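
The test described above can be sketched in batch form. Several choices here are assumptions, since the abstract does not fix them. Weights follow an S-map-style exponential kernel exp(-θ·d / mean d). The proxy error is the leave-one-out one-step RMSE of a weighted linear fit of x[t+1] on x[t]. The effective sample size is Kish's (Σw)² / Σw². The θ grid and `min_ess` are hypothetical. Unlike CALIPER, this illustration is not single-pass.

```python
import numpy as np


def proxy_errors(x, thetas=(0.0, 1.0, 2.0, 4.0)):
    """Leave-one-out one-step RMSE of locally weighted linear regression x[t+1] ~ a + b*x[t],
    with weights exp(-theta * d / mean(d)). Returns (errors, mean ESS), one entry per theta."""
    X, Y = x[:-1], x[1:]
    n = len(X)
    errors, ess = [], []
    for theta in thetas:
        sq_err, ess_vals = 0.0, []
        for i in range(n):
            d = np.abs(X - X[i])
            w = np.exp(-theta * d / np.mean(np.delete(d, i)))
            w[i] = 0.0                                       # leave the query point out
            A = np.column_stack([np.ones(n), X]) * np.sqrt(w)[:, None]
            coef, *_ = np.linalg.lstsq(A, Y * np.sqrt(w), rcond=None)
            sq_err += (coef[0] + coef[1] * X[i] - Y[i]) ** 2
            ess_vals.append(w.sum() ** 2 / (w ** 2).sum())   # Kish effective sample size
        errors.append(float(np.sqrt(sq_err / n)))
        ess.append(float(np.mean(ess_vals)))
    return errors, ess


def data_sufficient(x, thetas=(0.0, 1.0, 2.0, 4.0), min_ess=20.0, tol=1e-9):
    """Gate on effective sample size, then require the proxy error to be non-increasing in theta."""
    errors, ess = proxy_errors(np.asarray(x, dtype=float), thetas)
    if min(ess) < min_ess:
        return False
    return all(b <= a + tol for a, b in zip(errors, errors[1:]))
```

On a chaotic series such as the logistic map, the error drops as θ grows, because local fits capture state-dependent dynamics that a global linear fit (θ = 0) cannot. A very short post-drift window fails the ESS gate.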

2603.08155 2026-05-21 cs.LG Updated version

C²FG: Control Classifier-Free Guidance via Score Discrepancy Analysis

Jiayang Gao, Tianyi Zheng, Jiayang Zou, Fengxiang Yang, Shice Liu, Luyao Fan, Zheyu Zhang, Hao Zhang, Jinwei Chen, Peng-Tao Jiang, Bo Li, Jia Wang

Affiliations: Shanghai Jiao Tong University; vivo BlueImage Lab; vivo Mobile Communication Co., Ltd.

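AI summary: This paper proposes C²FG, a control method for classifier-free guidance based on score discrepancy analysis. A rigorous theoretical analysis establishes upper bounds on the score discrepancy between the conditional and unconditional distributions at different timesteps, providing a theoretical basis for time-dependent guidance. Experiments validate its effectiveness across a variety of generative tasks.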

Comments Accepted to CVPR 2026 (Highlight)

English abstract

Classifier-Free Guidance (CFG) is a cornerstone of modern conditional diffusion models, yet its reliance on fixed or heuristic dynamic guidance weights is predominantly empirical and overlooks the inherent dynamics of the diffusion process. In this paper, we provide a rigorous theoretical analysis of Classifier-Free Guidance. Specifically, we establish strict upper bounds on the score discrepancy between conditional and unconditional distributions at different timesteps of the diffusion process. This finding explains the limitations of fixed-weight strategies and establishes a principled foundation for time-dependent guidance. Motivated by this insight, we introduce Control Classifier-Free Guidance (C²FG), a novel, training-free, plug-in method that aligns the guidance strength with the diffusion dynamics via an exponential decay control function. Extensive experiments demonstrate that C²FG is effective and broadly applicable across diverse generative tasks, while also being orthogonal to existing strategies.
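
The guidance combination itself is the standard CFG rule, and a time-dependent weight simply replaces the constant w with w(t). The sketch below shows this pattern with an exponential schedule. Its functional form, its direction (strongest at the pure-noise end, t = 1, here), and the constants `w_max`, `w_min`, and `decay` are hypothetical placeholders, not the control function derived in the paper.

```python
import numpy as np


def guidance_weight(t, w_max=7.5, w_min=1.0, decay=5.0):
    """Hypothetical exponential schedule; t in [0, 1] is normalized diffusion time with
    t = 1 the pure-noise end. Equals w_max at t = 1 and decays toward w_min as t -> 0."""
    return w_min + (w_max - w_min) * np.exp(-decay * (1.0 - t))


def cfg_combine(eps_uncond, eps_cond, w):
    """Standard classifier-free guidance: extrapolate from the unconditional prediction."""
    return eps_uncond + w * (eps_cond - eps_uncond)
```

Inside a sampler, the only change from fixed-weight CFG is calling `cfg_combine(eps_u, eps_c, guidance_weight(t))` at each step. This is why such schedules are training-free and can be plugged into an existing pipeline.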

2603.01712 2026-05-21 cs.AI cs.LG Updated version

FT-Dojo: Towards Autonomous LLM Fine-Tuning with Language Agents

Qizheng Li, Yifei Zhang, Xiao Yang, Xu Yang, Zhuo Wang, Weiqing Liu, Jiang Bian

Affiliations: Peking University; Nanjing University; Microsoft Research Asia; The University of Chicago

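AI summary: This paper introduces FT-Dojo, an interactive benchmark environment for studying autonomous LLM fine-tuning, with a standardized task interface, shared data repository, sandboxed execution environment, and feedback protocol. It also develops FT-Agent, a framework that performs structured iteration planning and multi-level feedback analysis. Experiments show that FT-Agent performs strongly across the 13 tasks (best on 10 of them), and case studies show that agents can recover from failures, although causal diagnosis and long-horizon planning remain limitations.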

Comments 26 pages, 6 figures, 11 tables

English abstract

Fine-tuning large language models for vertical domains remains labor-intensive, requiring practitioners to curate data, configure training, and iteratively diagnose model behavior. Despite growing interest in autonomous machine learning and language agents, end-to-end LLM fine-tuning has not been systematically studied as an interactive agent task. We introduce FT-Dojo, an interactive benchmark environment for autonomous LLM fine-tuning, comprising 13 tasks across 5 domains. Rather than a new collection of static datasets, FT-Dojo standardizes a task interface, shared raw-data repository, sandboxed execution environment, structured feedback protocol, and held-out evaluation procedure. We further develop FT-Agent, a fine-tuning-oriented autonomous framework that uses structured iteration planning, fail-fast validation, and multi-level feedback analysis to refine data and training strategies. Experiments show that FT-Agent provides a strong initial baseline, achieving the best performance on 10 out of 13 tasks, with additional controlled comparisons against frontier agents, open-source planning backbones, and multi-run statistics supporting the main findings. Case studies show that agents can recover from failures through cumulative learning, while still exposing limitations in causal diagnosis and long-horizon planning. The implementation is available at https://github.com/microsoft/rd-agent.

2603.01406 2026-05-21 cs.LG cs.AI cs.NA math.NA Updated version

One Operator to Rule Them All? On Boundary-Indexed Operator Families in Neural PDE Solvers

Lennon J. Shikhman

Affiliations: College of Computing, Georgia Institute of Technology; Department of Mathematics and Systems Engineering, Florida Institute of Technology

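AI summary: This paper examines boundary-indexed operator families, a core issue for neural PDE solvers. It shows that the usual single-operator interpretation suffers from non-identifiability when boundary conditions vary, and experimentally demonstrates the limitations of such solvers under different boundary conditions.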

Comments Published in the ICLR 2026 Workshop on AI & PDEs. 10 pages, 5 figures

English abstract

Neural PDE solvers are often described as learning solution operators that map problem data to PDE solutions. In this work, we argue that this interpretation is generally incorrect when boundary conditions vary. We show that standard neural operator training implicitly learns a boundary-indexed family of operators, rather than a single boundary-agnostic operator, with the learned mapping fundamentally conditioned on the boundary-condition distribution seen during training. We formalize this perspective by framing operator learning as conditional risk minimization over boundary conditions, which leads to a non-identifiability result outside the support of the training boundary distribution. As a consequence, generalization in forcing terms or resolution does not imply generalization across boundary conditions. We support our theoretical analysis with controlled experiments on the Poisson equation, demonstrating sharp degradation under boundary-condition shifts, cross-distribution failures between distinct boundary ensembles, and convergence to conditional expectations when boundary information is removed. Our results clarify a core limitation of current neural PDE solvers and highlight the need for explicit boundary-aware modeling in the pursuit of foundation models for PDEs.
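
The "convergence to conditional expectations" finding has a simple illustration on the 1D Poisson problem -u'' = f on [0, 1] with Dirichlet values u(0) = a and u(1) = b. A regressor trained with mean-squared error that sees only the forcing f, and never (a, b), converges to E[u | f], which averages over the training distribution of boundary values. Because the problem is linear, this average equals the solution with the mean boundary values, so it is wrong for any particular boundary condition far from that mean. The finite-difference solver and the uniform boundary distribution below are hypothetical choices for illustration, not the paper's experimental setup.

```python
import numpy as np


def solve_poisson_1d(f, a, b, n=101):
    """Solve -u'' = f on [0, 1] with u(0) = a, u(1) = b by second-order finite differences.

    f: callable evaluated at the interior grid points. Returns (x, u) on an n-point grid.
    """
    x = np.linspace(0.0, 1.0, n)
    h = x[1] - x[0]
    m = n - 2                                          # number of interior unknowns
    A = (np.diag(np.full(m, 2.0)) - np.diag(np.ones(m - 1), 1) - np.diag(np.ones(m - 1), -1)) / h**2
    rhs = np.broadcast_to(f(x[1:-1]), (m,)).astype(float)
    rhs[0] += a / h**2                                 # move the known boundary values to the right-hand side
    rhs[-1] += b / h**2
    u = np.empty(n)
    u[0], u[-1] = a, b
    u[1:-1] = np.linalg.solve(A, rhs)
    return x, u
```

Averaging the solutions over sampled (a, b) pairs with the same forcing reproduces the solution for the averaged boundary values. A boundary-blind model can therefore, at best, learn this single averaged operator rather than the boundary-indexed family.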

2602.20399 2026-05-21 cs.LG Updated version

GeoPT: Scaling Physics Simulation via Lifted Geometric Pre-Training

Haixu Wu, Minghao Guo, Zongyi Li, Zhiyang Dou, Mingsheng Long, Kaiming He, Wojciech Matusik

Affiliations: MIT CSAIL; Tsinghua University

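AI summary: This paper proposes GeoPT, a unified pre-trained model for general physics simulation based on lifted geometric pre-training. By augmenting geometry with synthetic dynamics, it enables dynamics-aware self-supervised learning, improving the efficiency and effectiveness of physics simulation.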

Comments Project Page: https://physics-scaling.github.io/GeoPT/

English abstract

Neural simulators promise efficient surrogates for physics simulation, but scaling them is bottlenecked by the prohibitive cost of generating high-fidelity training data. Pre-training on abundant off-the-shelf geometries offers a natural alternative, yet faces a fundamental gap: supervision on static geometry alone ignores dynamics and can lead to negative transfer on physics tasks. We present GeoPT, a unified pre-trained model for general physics simulation based on lifted geometric pre-training. The core idea is to augment geometry with synthetic dynamics, enabling dynamics-aware self-supervision without physics labels. Pre-trained on over one million samples, GeoPT consistently improves industrial-fidelity benchmarks spanning fluid mechanics for cars, aircraft, and ships, and solid mechanics in crash simulation, reducing labeled data requirements by 20-60% and accelerating convergence by 2×. These results show that lifting with synthetic dynamics bridges the geometry-physics gap, unlocking a scalable path for neural simulation and potentially beyond. Code is available at https://github.com/Physics-Scaling/GeoPT.

2602.16608 2026-05-21 cs.CL cs.AI cs.CV cs.LG Updated version

Explainable AI: Context-Aware Layer-Wise Integrated Gradients for Explaining Transformer Models

Melkamu Abay Mersha, Jugal Kalita

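Affiliations: College of Engineering and Applied Science, University of Colorado Colorado Springs

AI summary: This paper proposes the Context-Aware Layer-wise Integrated Gradients (CA-LIG) framework for explaining the decisions of Transformer models. It computes layer-wise Integrated Gradients within each Transformer block and fuses these token-level attributions with class-specific attention gradients, producing signed, context-sensitive attribution maps that capture supporting and opposing evidence and trace the hierarchical flow of relevance through the Transformer layers.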

Abstract

Transformer models achieve state-of-the-art performance across domains and tasks, yet their deeply layered representations make their predictions difficult to interpret. Existing explainability methods rely on final-layer attributions, capture either local token-level attributions or global attention patterns without unification, and lack context-awareness of inter-token dependencies and structural components. They also fail to capture how relevance evolves across layers and how structural components shape decision-making. To address these limitations, we propose the Context-Aware Layer-wise Integrated Gradients (CA-LIG) Framework, a unified hierarchical attribution framework that computes layer-wise Integrated Gradients within each Transformer block and fuses these token-level attributions with class-specific attention gradients. This integration yields signed, context-sensitive attribution maps that capture supportive and opposing evidence while tracing the hierarchical flow of relevance through the Transformer layers. We evaluate the CA-LIG Framework across diverse tasks, domains, and Transformer model families, including sentiment analysis and long and multi-class document classification with BERT, hate speech detection in a low-resource language setting with XLM-R and AfroLM, and image classification with a Masked Autoencoder Vision Transformer model. Across all tasks and architectures, CA-LIG provides more faithful attributions, shows stronger sensitivity to contextual dependencies, and produces clearer, more semantically coherent visualizations than established explainability methods. These results indicate that CA-LIG provides a more comprehensive, context-aware, and reliable explanation of Transformer decision-making, advancing both the practical interpretability and conceptual understanding of deep neural models.
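
CA-LIG builds on Integrated Gradients (IG). As background only, here is a minimal NumPy sketch of plain IG on a toy scalar model with an analytic gradient; CA-LIG's layer-wise computation inside Transformer blocks and its fusion with attention gradients are not reproduced. The toy model, weights, zero baseline, and step count are illustrative assumptions.

```python
import numpy as np

def model(x, w):
    # Toy differentiable "model": a scalar score with a nonlinearity and an interaction term.
    return np.tanh(x @ w) + 0.5 * x[..., 0] * x[..., 1]

def model_grad(x, w):
    # Analytic gradient of model() with respect to x, for a batch of inputs of shape (n, d).
    g = (1.0 - np.tanh(x @ w) ** 2)[:, None] * w
    g[:, 0] += 0.5 * x[:, 1]
    g[:, 1] += 0.5 * x[:, 0]
    return g

def integrated_gradients(x, baseline, w, steps=200):
    # Midpoint Riemann approximation of the straight-line path integral from baseline to x.
    alphas = (np.arange(steps) + 0.5) / steps
    path = baseline + alphas[:, None] * (x - baseline)
    avg_grad = model_grad(path, w).mean(axis=0)
    return (x - baseline) * avg_grad

w = np.array([0.8, -1.2, 0.5])
x = np.array([1.0, 0.5, -2.0])
baseline = np.zeros_like(x)
attr = integrated_gradients(x, baseline, w)

# Completeness: the attributions sum (approximately) to model(x) - model(baseline).
gap = attr.sum() - (model(x, w) - model(baseline, w))
```

The completeness check is a convenient sanity test for any IG implementation; signed attributions (positive or negative entries of `attr`) are what allow supporting and opposing evidence to be distinguished.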

2602.16399 2026-05-21 eess.AS cs.LG cs.SD Updated version

Multi-Channel Replay Speech Detection using Acoustic Maps

Michael Neri, Tuomas Virtanen

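Affiliations: Faculty of Information Technology and Communication Sciences, Tampere University, Tampere, Finland

AI summary: This paper proposes acoustic maps as a novel spatial feature representation for replay speech detection from multi-channel recordings. A lightweight convolutional neural network achieves competitive performance on the ReMASC dataset, showing that acoustic maps provide a compact and physically interpretable feature space across different devices and acoustic environments.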

Comments Accepted in EUSIPCO 2026

Abstract

Replay attacks remain a critical vulnerability for automatic speaker verification systems, particularly in real-time voice assistant applications. In this work, we propose acoustic maps as a novel spatial feature representation for replay speech detection from multi-channel recordings. Derived from classical beamforming over discrete azimuth and elevation grids, acoustic maps encode directional energy distributions that reflect physical differences between human speech radiation and loudspeaker-based replay. A lightweight convolutional neural network is designed to operate on this representation, achieving competitive performance on the ReMASC dataset with approximately 6k trainable parameters. Experimental results show that acoustic maps provide a compact and physically interpretable feature space for replay attack detection across different devices and acoustic environments.
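
Acoustic maps are described as the output of classical beamforming over a discrete azimuth/elevation grid. A minimal frequency-domain delay-and-sum (steered response power) sketch follows. The array geometry (two stacked 6-microphone circles), the frequencies, the 5-degree grid, and the noise-free plane-wave snapshot are all assumptions for illustration; the paper's exact beamformer and map resolution are not given here.

```python
import numpy as np

C = 343.0  # speed of sound in air (m/s)

def unit_vector(az_deg, el_deg):
    az, el = np.deg2rad(az_deg), np.deg2rad(el_deg)
    return np.stack([np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.sin(el)], axis=-1)

def acoustic_map(snapshots, mic_pos, freqs, az_grid, el_grid):
    """Delay-and-sum power on an (elevation, azimuth) grid, averaged over frequencies.

    snapshots: complex array (n_freqs, n_mics), one STFT bin per frequency.
    """
    AZ, EL = np.meshgrid(az_grid, el_grid)            # (n_el, n_az)
    proj = unit_vector(AZ, EL) @ mic_pos.T            # (n_el, n_az, n_mics): u . r_m
    power = np.zeros(AZ.shape)
    for f, x in zip(freqs, snapshots):
        steer = np.exp(2j * np.pi * f * proj / C)     # plane-wave steering vectors
        y = (steer.conj() * x).sum(axis=-1) / mic_pos.shape[0]
        power += np.abs(y) ** 2
    return power / len(freqs)

# Hypothetical geometry: two 6-mic circles (radius 5 cm) at z = +3 cm and z = -3 cm.
ang = np.deg2rad(np.arange(0, 360, 60))
ring = np.stack([0.05 * np.cos(ang), 0.05 * np.sin(ang)], axis=1)
mic_pos = np.vstack([np.c_[ring, np.full(6, 0.03)], np.c_[ring, np.full(6, -0.03)]])

# Simulated far-field source at azimuth 40 deg, elevation 20 deg (noise-free).
freqs = np.array([1000.0, 2000.0, 3000.0])
src = unit_vector(40.0, 20.0)
snapshots = np.exp(2j * np.pi * freqs[:, None] * (mic_pos @ src)[None, :] / C)

az_grid = np.arange(-180, 180, 5)
el_grid = np.arange(-90, 91, 5)
amap = acoustic_map(snapshots, mic_pos, freqs, az_grid, el_grid)
i_el, i_az = np.unravel_index(np.argmax(amap), amap.shape)
peak = (int(az_grid[i_az]), int(el_grid[i_el]))
```

The resulting `amap` is a 2D image of directional energy, which is the kind of input a small CNN can consume; a point-like human talker and a loudspeaker with a different radiation pattern would be expected to produce differently shaped maps.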

2602.10989 2026-05-21 math.ST cs.IT cs.LG math.IT math.PR stat.ML stat.TH Updated version

Variational Optimality of Föllmer Processes in Generative Diffusions

Yifan Chen, Eric Vanden-Eijnden

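Affiliations: Department of Mathematics, University of California, Los Angeles, CA, USA; Machine Learning Lab, Capital Fund Management, Paris, France; Courant Institute, New York University, NY, USA

AI summary: This paper constructs and analyzes generative diffusions using the stochastic interpolant framework, estimating the drift as a conditional expectation. It proves that the variationally optimal choice is a Föllmer process, whose path measure minimizes relative entropy, and provides a data-driven, simulation-free way to estimate its drift.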

Abstract

We construct and analyze generative diffusions that transport a point mass to a prescribed target distribution over a finite time horizon using the stochastic interpolant framework. The drift is expressed as a conditional expectation that can be estimated from independent samples without simulating stochastic processes. We show that the diffusion coefficient can be tuned a posteriori without changing the time-marginal distributions. Among all such tunings, we prove that minimizing the impact of estimation error on the path-space Kullback-Leibler divergence selects, in closed form, a Föllmer process: a diffusion whose path measure minimizes relative entropy with respect to a reference process determined by the interpolation schedules alone. This yields a new variational characterization of Föllmer processes, complementing classical formulations via Schrödinger bridges and stochastic control, and provides a conditional-expectation representation of the Föllmer drift that enables simulation-free estimation from data. We further establish that, under this optimal diffusion coefficient, the path-space Kullback-Leibler divergence becomes independent of the interpolation schedule, rendering different schedules statistically equivalent in this variational sense. We provide numerical experiments illustrating the impact of the path-space variational optimality of Föllmer processes in probabilistic forecasting and data assimilation applications.
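
A worked example of the conditional-expectation form of a Föllmer drift, under simplifying assumptions: a 1D Gaussian target N(m, s2) and a standard Brownian reference, i.e. the classical Föllmer setting rather than the schedule-dependent reference process analyzed in the paper. With the bridge interpolant X_t = t X_1 + sqrt(t(1-t)) Z, the drift is b(x, t) = (E[X_1 | X_t = x] - x) / (1 - t), which for a Gaussian target simplifies to (m + (s2 - 1) x) / (1 + t (s2 - 1)). The sketch simulates it from the point mass at 0 with Euler-Maruyama.

```python
import numpy as np

def follmer_drift(x, t, m, s2):
    # (E[X1 | Xt = x] - x) / (1 - t) for Xt = t*X1 + sqrt(t(1-t))*Z,
    # X1 ~ N(m, s2), Z ~ N(0, 1); the (1 - t) factor cancels in closed form.
    return (m + (s2 - 1.0) * x) / (1.0 + t * (s2 - 1.0))

def simulate(m=1.0, s2=4.0, n_paths=20_000, n_steps=400, seed=0):
    # Euler-Maruyama for dX = b(X, t) dt + dW, started from the point mass X0 = 0.
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    x = np.zeros(n_paths)
    for k in range(n_steps):
        t = k * dt
        x = x + follmer_drift(x, t, m, s2) * dt + np.sqrt(dt) * rng.standard_normal(n_paths)
    return x

samples = simulate()
```

The terminal samples should have mean close to m = 1 and variance close to s2 = 4, confirming that the process transports the point mass at 0 to the target. Note that the drift only requires a conditional expectation, which is what makes simulation-free regression from data possible in the general case.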

2602.10455 2026-05-21 cs.IR cs.LG Updated version

Compute Only Once: UG-Separation for Efficient Large Recommendation Models

Hui Lu, Zheng Chai, Shipeng Bai, Hao Zhang, Zhifang Fan, Kunmin Bai, Ke Sun, Yingwen Wu, Bingzheng Wei, Xiang Sun, Ziyan Gong, Tianyi Liu, Hua Chen, Deping Xie, Zhongkai Chen, Zhiliang Guo, Qiwei Chen, Yuchao Zheng

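Affiliations: ByteDance AML

AI summary: This paper proposes UG-Separation, which explicitly separates user-side and item-side information flows in TokenMixer-based dense interaction models so that user-side computation can be reused, reducing redundant inference cost. An information compensation strategy and weight-only quantization further improve efficiency.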

Comments Large Recommender Model, Industrial Recommenders, Scaling Law

Abstract

Driven by scaling laws, recommender systems increasingly rely on larger-scale models to capture complex feature interactions and user behaviors, but this trend also leads to prohibitive training and inference costs. While long-sequence models can reuse user-side computation through KV caching, such reuse is difficult in TokenMixer-based dense feature interaction architectures, where user and group features are deeply entangled and mixed across layers. In this work, we present User-Group Separation (UG-Sep), an industrial large-scale framework that, for the first time, makes user-side computation reusable in TokenMixer-based dense interaction models. UG-Sep explicitly disentangles user-side and item-side information flows within token-mixing layers, ensuring that a subset of tokens preserves purely user-side representations across layers. This design allows the corresponding per-token computations to be reused across multiple samples, significantly reducing redundant inference cost. To compensate for the potential loss of expressive capacity induced by masking, we further propose an Information Compensation strategy that adaptively reconstructs suppressed user-item interactions. Moreover, as UG-Sep substantially reduces user-side FLOPs and exposes memory-bound components, we incorporate W8A16 (8-bit weight, 16-bit activation) weight-only quantization to alleviate memory bandwidth bottlenecks and achieve additional acceleration. We conduct extensive offline evaluations and large-scale online A/B experiments at ByteDance to validate the effectiveness of UG-Sep. Results show that, compared with TokenMixer, UG-Sep reduces inference latency by up to 20% across multiple high-impact business scenarios at ByteDance, including Douyin Feed Recommendation, Hongguo Feed Recommendation, Chuanshanjia Ads, and Qianchuan Ads, without adverse changes to online user experience or commercial metrics.
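
The reuse argument can be sketched with a masked token-mixing layer: if user-side tokens are never allowed to read from item-side tokens, their outputs at every layer depend only on user inputs, so they can be computed once per request and shared across candidate items. The mixer below (a masked linear mix followed by tanh) is a hypothetical stand-in, not ByteDance's TokenMixer; the Information Compensation strategy and W8A16 quantization are omitted.

```python
import numpy as np

def masked_token_mixer(tokens, weights, n_user):
    """One token-mixing step in which user tokens may only read from user tokens.

    tokens: (n_tokens, d); the first n_user rows are user-side, the rest item-side.
    weights: (n_tokens, n_tokens) mixing matrix (hypothetical stand-in for a mixer layer).
    """
    mask = np.ones_like(weights)
    mask[:n_user, n_user:] = 0.0  # block item -> user information flow
    return np.tanh((weights * mask) @ tokens)

rng = np.random.default_rng(0)
n_user, n_item, d, n_layers = 3, 2, 4, 2
W = [rng.normal(size=(n_user + n_item, n_user + n_item)) for _ in range(n_layers)]
user_tokens = rng.normal(size=(n_user, d))

def forward(item_tokens):
    h = np.vstack([user_tokens, item_tokens])
    for w in W:
        h = masked_token_mixer(h, w, n_user)
    return h

# Score two different candidate items for the same user.
out_a = forward(rng.normal(size=(n_item, d)))
out_b = forward(rng.normal(size=(n_item, d)))

# User-side rows are identical across candidates, so they could be cached and reused;
# item-side rows still see the user rows and therefore differ per candidate.
user_rows_reusable = np.allclose(out_a[:n_user], out_b[:n_user])
```

The mask removes every item-to-user path, which is exactly what makes the user rows cacheable; the price is lost user-item interaction in those rows, which is the capacity loss the paper's compensation strategy targets.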

2602.08819 2026-05-21 cs.LG cs.CL Updated version

Bayesian Preference Learning for Test-Time Steerable Reward Models

Jiwoo Hong, Shao Tang, Zhipeng Wang

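Affiliations: LinkedIn Corporation; Nubank

AI summary: This paper proposes Variational In-Context Reward Modeling (ICRM), a Bayesian reward modeling objective that enables test-time steerability through in-context preference demonstrations, allowing reward models to adapt to unseen preference distributions and improving their accuracy and robustness.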

Comments Preprint

Abstract

Reward models (RMs) are central to aligning language models with human preferences via reinforcement learning (RL). As RL is increasingly applied to settings such as verifiable rewards and multi-objective alignment, RMs are expected to encode more complex and multifaceted preference distributions. However, classifier RMs remain static once trained, limiting their adaptability at test time. We propose Variational In-Context Reward Modeling (ICRM), a novel Bayesian reward modeling objective that enables test-time steerability via in-context preference demonstrations. ICRM casts reward modeling as amortized variational inference over a latent preference probability under the Bradley-Terry model using a conjugate Beta prior. We show that ICRM adapts to unseen preference distributions at test time for both single and multi-objective settings. With more demonstrations, ICRM improves RM-Bench accuracy from 60.5 to 70.8, achieves lower calibration error than a generative judge on moral dilemma preferences, and expands the attainable Pareto frontier under conflicting preferences. We further study the practical applicability of ICRM for RL training, showing that it can effectively encode verifiable rewards by outperforming a conventional RM in math reasoning. Finally, we provide theoretical guarantees that the variational objective admits a global interior optimum with finite confidence, and we analyze how KL regularization mitigates reward over-optimization.
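
The Beta prior is convenient because it is conjugate to binary preference outcomes. The sketch below shows only that textbook conjugate update plus the Bradley-Terry reading of the result; ICRM itself amortizes this inference with a learned model, which is not reproduced here. The uniform prior and the five demonstrations are illustrative assumptions.

```python
import numpy as np

def beta_posterior(alpha, beta, demos):
    """Conjugate update of a Beta prior on p = P(response A preferred over B).

    demos: in-context demonstrations, 1 if A was preferred and 0 if B was preferred.
    """
    demos = np.asarray(demos)
    wins = demos.sum()
    return alpha + wins, beta + demos.size - wins

def bt_margin(p):
    # Under Bradley-Terry, p = sigmoid(r_A - r_B), so the implied reward margin is logit(p).
    return np.log(p) - np.log1p(-p)

alpha0, beta0 = 1.0, 1.0  # uniform prior: no preference information yet
a, b = beta_posterior(alpha0, beta0, [1, 1, 0, 1, 1])
post_mean = a / (a + b)   # (1 + 4) / (2 + 5) = 5/7
post_var = a * b / ((a + b) ** 2 * (a + b + 1))
margin = bt_margin(post_mean)
```

Each additional demonstration tightens the posterior (here the variance drops from 1/12 under the uniform prior), which mirrors the reported trend of accuracy improving as more in-context demonstrations are supplied.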

2602.06500 2026-05-21 cs.LG Updated version

Can Microcanonical Langevin Dynamics Leverage Mini-Batch Gradient Noise?

Emanuel Sommer, Kangning Diao, Jakob Robnik, Uros Seljak, David Rügamer

Affiliations: Department of Statistics, LMU Munich; Munich Center for Machine Learning; Department of Physics, University of California, Berkeley; Department of Astronomy, Tsinghua University; Physics Division, Lawrence Berkeley National Lab

AI summary: This paper studies whether microcanonical Langevin dynamics can effectively exploit mini-batch gradient noise. It proposes a gradient-noise preconditioning scheme and an energy-variance-based adaptive tuner, yielding a robust and scalable microcanonical sampler that achieves state-of-the-art performance on high-dimensional inference tasks.

Comments In Proceedings of the 43rd International Conference on Machine Learning

Abstract

Scaling inference methods such as Markov chain Monte Carlo to high-dimensional models remains a central challenge in Bayesian deep learning. A promising recent proposal, microcanonical Langevin Monte Carlo, has shown state-of-the-art performance across a wide range of problems. However, its reliance on full-dataset gradients makes it prohibitively expensive for large-scale problems. This paper addresses a fundamental question: Can microcanonical dynamics effectively leverage mini-batch gradient noise? We provide the first systematic study of this problem, establishing a novel continuous-time theoretical analysis of stochastic-gradient microcanonical dynamics. We reveal two critical failure modes: a theoretically derived bias due to anisotropic gradient noise and numerical instabilities in complex high-dimensional posteriors. To tackle these issues, we propose a principled gradient noise preconditioning scheme shown to significantly reduce this bias and develop a novel, energy-variance-based adaptive tuner that automates step size selection and dynamically informs numerical guardrails. The resulting algorithm is a robust and scalable microcanonical Monte Carlo sampler that achieves state-of-the-art performance on challenging high-dimensional inference tasks like Bayesian neural networks. Combined with recent ensemble techniques, our work unlocks a new class of stochastic microcanonical Langevin ensemble (SMILE) samplers for large-scale Bayesian inference.

2602.04907 2026-05-21 cs.LG cs.AI stat.ME Updated version

Causal Discovery from Heteroscedastic Stochastic Dynamical Systems under Imperfect Physical Models

Jianhong Chen, Naichen Shi, Xubo Yue

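Affiliations: Department of Mechanical & Industrial Engineering, Northeastern University; Department of Industrial Engineering and Management Sciences; Department of Mechanical Engineering, Northwestern University

AI summary: This paper proposes an integrative causal discovery framework that uses partial physical knowledge, encoded in stochastic differential equations, to improve causal graph recovery in dynamical systems, and analyzes its robustness under imperfect physical models.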

Comments 101 pages

Abstract

Causal discovery is a data-driven paradigm for analyzing complex systems, while physics-based models, such as ordinary differential equations (ODEs), provide mechanistic structure for real-world dynamical processes. Integrating these paradigms can improve identifiability, stability, and robustness. However, real dynamical systems often exhibit cyclic interactions and nonstationarity, whereas many causal discovery methods rely on acyclicity, stationarity, or equilibrium assumptions. We propose an integrative causal discovery framework for dynamical systems that leverages partial physical knowledge through stochastic differential equations (SDEs). The drift term encodes known ODE dynamics, while the diffusion term captures unknown causal couplings beyond the prescribed physics. We develop a scalable sparsity-inducing maximum quasi-likelihood estimator with a theoretically justified stabilization technique to improve the optimization landscape. Under mild conditions, we establish causal graph recovery guarantees for both stable and unstable SDEs. We also analyze robustness of our causal graph estimate to ODE misspecification and clarify how the introduced stabilization technique balances numerical stability and statistical recoverability. Experiments on linear SDEs and nonlinear benchmarks, including Lotka-Volterra and Lorenz dynamics with acyclic and cyclic structures, show improved graph recovery and robustness over data-driven baselines. We also demonstrate practical utility on real-world epidemic data by reconstructing stochastic SIR dynamics within our causal discovery framework.
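
To make the drift/diffusion split concrete, here is a deliberately simplified sketch: simulate an SDE whose drift is the "known physics", subtract that drift from the observed increments, and estimate the diffusion covariance from the residuals. This is not the paper's sparsity-inducing quasi-likelihood estimator. The linear drift, the constant (not heteroscedastic) diffusion matrix `G`, and the 0.05 threshold are all assumptions, and a covariance only reveals an undirected coupling pattern.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_steps, dt = 3, 20_000, 0.01

def known_drift(x):
    # Assumed "known physics": linear relaxation dx/dt = -x for every coordinate.
    return -x

# Hypothetical diffusion matrix: the noise driving x0 also enters x1 (a 0 -> 1 coupling).
G = np.array([[0.5, 0.0, 0.0],
              [0.4, 0.3, 0.0],
              [0.0, 0.0, 0.5]])

x = np.zeros((n_steps + 1, d))
for k in range(n_steps):  # Euler-Maruyama simulation
    dw = np.sqrt(dt) * rng.standard_normal(d)
    x[k + 1] = x[k] + known_drift(x[k]) * dt + G @ dw

# Subtract the known drift; what remains is the diffusion-driven part of each increment.
resid = np.diff(x, axis=0) - known_drift(x[:-1]) * dt
Sigma_hat = resid.T @ resid / (n_steps * dt)  # estimates G @ G.T

# Read an undirected coupling pattern off the off-diagonal entries (threshold is arbitrary).
edges = {(i, j) for i in range(d) for j in range(i + 1, d) if abs(Sigma_hat[i, j]) > 0.05}
```

Here the true off-diagonal of `G @ G.T` is 0.2 for the pair (0, 1) and 0 elsewhere, so only that pair should survive the threshold. If the assumed drift were misspecified, part of the drift error would leak into the residuals, which is the robustness question the paper analyzes.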

2602.03004 2026-05-21 cs.LG cs.AI Updated version

Graph Autoencoder for Process Monitoring

Xiangrui Zhang

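Affiliations: School of Information and Control Engineering, China University of Mining and Technology

AI summary: This paper proposes a Causal Graph Spatial-Temporal Autoencoder (CGSTAE) that combines a correlation graph structure learning module based on a spatial self-attention mechanism (SSAM) with a spatial-temporal encoder-decoder built on graph convolutional long short-term memory (GCLSTM), improving the reliability and interpretability of industrial process monitoring.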

Abstract

To improve the reliability and interpretability of industrial process monitoring, this article proposes a Causal Graph Spatial-Temporal Autoencoder (CGSTAE). The network architecture of CGSTAE combines two components: a correlation graph structure learning module based on a spatial self-attention mechanism (SSAM) and a spatial-temporal encoder-decoder module utilizing graph convolutional long short-term memory (GCLSTM). The SSAM learns correlation graphs by capturing dynamic relationships between variables, while a novel three-step causal graph structure learning algorithm is introduced to derive a causal graph from these correlation graphs. The algorithm leverages a reverse perspective of the causal invariance principle to uncover the invariant causal graph from varying correlations. The spatial-temporal encoder-decoder, built with GCLSTM units, reconstructs time-series process data within a sequence-to-sequence framework. The proposed CGSTAE enables effective process monitoring and fault detection through two statistics, one in the feature space and one in the residual space. Finally, we validate the effectiveness of CGSTAE in process monitoring on the Tennessee Eastman process and a real-world air separation process.
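
Reconstruction-based monitors typically flag a sample when its residual-space statistic exceeds a control limit learned from normal operation. The sketch below shows a squared prediction error (SPE) statistic with an empirical 99th-percentile limit. A linear projection stands in for the trained CGSTAE reconstruction, the feature-space statistic is omitted, and the data and fault are synthetic assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical normal operating data: 3 sensors driven by one latent factor plus small noise.
v = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)
z = rng.normal(size=1000)
X_train = z[:, None] * v + 0.05 * rng.normal(size=(1000, 3))

def reconstruct(X):
    # Stand-in for a trained autoencoder: project onto the dominant direction v.
    return (X @ v)[:, None] * v

def spe(X):
    # Squared prediction error (residual-space statistic) per sample.
    return ((X - reconstruct(X)) ** 2).sum(axis=1)

# Control limit from the empirical 99th percentile of the training statistic.
limit = np.quantile(spe(X_train), 0.99)

x_normal = 2.0 * v[None, :]
x_fault = 2.0 * v[None, :] + np.array([[0.0, 0.0, 1.0]])  # breaks the learned correlation
alarms = spe(np.vstack([x_normal, x_fault])) > limit
```

A sample consistent with the learned variable relationships reconstructs well and stays below the limit, while the fault, which violates those relationships, produces a large residual and triggers an alarm.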

2602.02304 2026-05-21 cs.AI cs.LG Updated version

Comparing Explanations is Not Enough, Explain the Change: New Standards are Needed to Explain Behavioral Shifts in Large Language Models

Martino Ciaperoni, Marzio Di Vece, Roberto Pellungrini, Luca Pappalardo, Fosca Giannotti, Francesco Giannini

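Affiliations: Scuola Normale Superiore; ISTI-CNR; University of Pisa

AI summary: This paper proposes a new XAI approach for explaining why and how the behavior of large language models shifts after an intervention, addressing the inability of existing explanation methods to account for such behavioral shifts.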

Abstract

Large-scale foundation models exhibit behavioral shifts when subjected to interventions such as scaling, fine-tuning, reinforcement learning with human feedback, or in-context learning. Current explainability methods are structurally ill-suited to explain these shifts, because they either treat models as static objects, as traditional eXplainable AI (XAI) approaches do, or merely compare independent explanations across different checkpoints of a model. As a result, these approaches fail to explain the functional transition between two model instances in which a certain behavior has shifted following an intervention. This gap creates significant governance risks across jurisdictions including the EU AI Act, US state legislation, and Chinese AI regulations, which require documenting causal chains for substantial system modifications. This position paper argues that explaining behavioral shifts in large language models requires a principled approach that treats the shift itself as the primary object of explanation: namely, one that explains how and why an intervention transforms a reference model into an updated model with different behavior. To support this claim, we introduce Comparative XAI (XAI$_\Delta$), a novel XAI paradigm aimed at explaining the difference between two model checkpoints where a behavior has shifted, together with a set of desiderata specifying what XAI$_\Delta$ explainers and explanations must satisfy, including comparability, validity, actionability, and monitoring, with the goal of grounding model auditing in explicit, measurable requirements. Finally, we provide preliminary evidence suggesting the need for XAI$_\Delta$ in practice through illustrative experiments, compiling the resulting findings into a transition report directly usable for governance and incident documentation.

2601.22932 2026-05-21 cs.LG Updated version

DC-LA: Difference-of-Convex Langevin Algorithm

Hoang Phuc Hau Luu, Zhongjian Wang

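Affiliations: Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore

AI summary: This paper studies sampling from a target π ∝ exp(-f-r), where the data-fidelity term f is Lipschitz smooth and the regularizer r = r1 - r2 is a non-smooth difference-of-convex (DC) function. It smooths r by applying Moreau envelopes to r1 and r2 separately, assigns the concave part of the regularizer to the data-fidelity term, and analyzes the resulting proximal Langevin algorithm (DC-LA). Assuming V is distant dissipative, DC-LA converges to π in the q-Wasserstein distance for all q ∈ ℕ*, up to discretization and smoothing errors, improving on previous non-log-concave sampling results.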

Abstract

We study a sampling problem whose target distribution is $\pi \propto \exp(-f-r)$, where the data fidelity term $f$ is Lipschitz smooth while the regularizer term $r=r_1-r_2$ is a non-smooth difference-of-convex (DC) function, i.e., $r_1,r_2$ are convex. By leveraging the DC structure of $r$, we can smooth out $r$ by applying Moreau envelopes to $r_1$ and $r_2$ separately. In line with DC programming, we then redistribute the concave part of the regularizer to the data fidelity and study its corresponding proximal Langevin algorithm (termed DC-LA). We establish convergence of DC-LA to the target distribution $\pi$, up to discretization and smoothing errors, in the $q$-Wasserstein distance for all $q \in \mathbb{N}^*$, under the assumption that the potential $V$ is distant dissipative. Our results improve on previous work on non-log-concave sampling through a more general framework and more general assumptions. Numerical experiments show that DC-LA produces accurate distributions in synthetic settings and provides qualitatively reasonable uncertainty quantification in a real-world computed tomography application.
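
A minimal sketch of Langevin sampling with a Moreau-smoothed DC regularizer, in the spirit of MYULA (Moreau-Yosida unadjusted Langevin). It is not claimed to be DC-LA's exact update, which the abstract describes as a proximal Langevin algorithm after moving the concave part into the data-fidelity term. Assumptions: a Gaussian fidelity f(x) = ||x - y||^2 / (2 sigma2), the DC regularizer r = lam * (||x||_1 - ||x||_2), and the values of gamma, h, and the chain counts. Both Moreau-envelope gradients come from closed-form proximal maps.

```python
import numpy as np

def prox_l1(x, t):
    # Proximal map of t*||x||_1 (soft thresholding), applied row-wise.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def prox_l2(x, t):
    # Proximal map of t*||x||_2 (block soft thresholding), applied row-wise.
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x * np.maximum(1.0 - t / np.maximum(norm, 1e-12), 0.0)

def moreau_grad(x, prox, gamma, lam):
    # Gradient of the Moreau envelope of lam*g with parameter gamma: (x - prox(x)) / gamma.
    return (x - prox(x, gamma * lam)) / gamma

def dc_langevin(y, sigma2, lam, gamma=0.05, h=0.01, n_chains=4000, n_steps=2000, seed=0):
    """Unadjusted Langevin on f + M_gamma[lam*||.||_1] - M_gamma[lam*||.||_2] (parallel chains)."""
    rng = np.random.default_rng(seed)
    x = np.tile(y, (n_chains, 1)).astype(float)
    for _ in range(n_steps):
        grad = (x - y) / sigma2                      # gradient of the fidelity term f
        grad += moreau_grad(x, prox_l1, gamma, lam)  # smoothed convex part r1
        grad -= moreau_grad(x, prox_l2, gamma, lam)  # smoothed concave part -r2
        x = x - h * grad + np.sqrt(2.0 * h) * rng.standard_normal(x.shape)
    return x

y = np.array([2.0, 0.3])
plain = dc_langevin(y, sigma2=1.0, lam=0.0)   # lam = 0: approximately N(y, I), up to O(h) bias
sparse = dc_langevin(y, sigma2=1.0, lam=2.0)  # DC-regularized, non-log-concave target
```

Both envelope gradients are bounded by lam times a dimension-dependent constant, so the step size is governed mainly by the fidelity term and the 1/gamma smoothing scale; the lam = 0 run is a quick sanity check that the sampler recovers the plain Gaussian posterior.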

2601.22292 2026-05-21 cs.MA cs.LG Updated version

Learning Incentive Structures for Cooperative Resilience in Multi-Agent Systems under Social Dilemmas

Manuela Chacon-Chamorro, Luis Felipe Giraldo, Nicanor Quijano

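Affiliations: School of Engineering, Universidad de los Andes

AI summary: This paper studies how to learn incentive structures that promote collective well-being in multi-agent reinforcement learning systems facing social dilemmas. It uses a resilience metric to score and rank agent trajectories and evaluates system performance in a resource-sharing environment under three incentive structures.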

Comments Supplementary material: https://github.com/mavivi95/supplementary_files/blob/main/Learning_TCSS___Supplementary_File__AN_.pdf ; updated version submitted to IEEE Transactions on Computational Social Systems (TCSS). This preprint is under review for possible publication in IEEE.

Abstract

Multi-agent social dilemmas, such as the tragedy of the commons, capture settings where individual incentives conflict with collective well-being, making these systems highly vulnerable to collapse under disruptions. In this context, this work studies cooperative resilience, understood as the system-level ability to maintain collective well-being under perturbations through adaptive agent behavior. We propose a framework for learning incentive structures aligned with collective well-being in multi-agent reinforcement learning systems, where reward functions shape individual decision-making and collective behavior. A resilience metric is used to score and rank agent trajectories, allowing the inference of reward functions that promote resilient collective behavior. These inferred reward functions are integrated into the multi-agent reinforcement learning process to shape agent interactions in social dilemma settings. The approach is evaluated in resource-sharing environments subject to disruptions, using three incentive structures: individual incentives, resilience-aligned incentives, and a hybrid incentive structure that combines both individual and collective components. The results show that the hybrid incentive structure promotes sustained collective behavior, reduces collapse events associated with resource depletion, and preserves system performance under disruption. These findings highlight the role of incentive design as a mechanism for promoting resilient collective behavior and provide a computational framework for multi-agent social dilemmas under disruptions.

2601.21662 2026-05-21 cs.LG Updated version

Epistemic Uncertainty Quantification for Pre-trained VLMs via Riemannian Flow Matching

Li Ju, Mayank Nautiyal, Andreas Hellander, Ekta Vats, Prashant Singh

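Affiliations: Department of Information Technology, Uppsala University, Uppsala, Sweden; Science for Life Laboratory, Uppsala University, Uppsala, Sweden

AI summary: This paper proposes REPVLM, which uses Riemannian Flow Matching to compute probability densities on the hyperspherical manifold of vision-language model embeddings in order to quantify the model's epistemic uncertainty, with strong results in classification and out-of-distribution detection.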

Journal ref
Forty-Third International Conference on Machine Learning, 2026

Abstract

Vision-Language Models (VLMs) are typically deterministic in nature and lack intrinsic mechanisms to quantify epistemic uncertainty, which reflects the model's lack of knowledge or ignorance of its own representations. We theoretically motivate the negative log-density of an embedding as a proxy for epistemic uncertainty, where low-density regions signify model ignorance. The proposed method, REPVLM, computes the probability density on the hyperspherical manifold of the VLM embeddings using Riemannian Flow Matching. We empirically demonstrate that REPVLM achieves near-perfect correlation between uncertainty and prediction error, significantly outperforming existing baselines. Beyond classification, we demonstrate that the model also provides a scalable metric for out-of-distribution detection and automated data curation.
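
The core scoring idea, negative log-density of a normalized embedding as an uncertainty score, can be illustrated without a flow model. The sketch below deliberately swaps in a much simpler density estimator, a von Mises-Fisher kernel density estimate on the unit sphere, in place of REPVLM's Riemannian Flow Matching. The concentration `kappa`, embedding dimension, and synthetic embeddings are assumptions.

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def vmf_kde_neg_log_density(query, anchors, kappa=20.0):
    """Negative log of a von Mises-Fisher kernel density estimate on the unit sphere.

    Returned up to an additive constant (the vMF normalizer is shared by all queries),
    which is enough to rank queries by estimated epistemic uncertainty.
    """
    logits = kappa * normalize(query) @ normalize(anchors).T  # (n_query, n_anchor)
    m = logits.max(axis=1, keepdims=True)                      # stable log-sum-exp
    log_density = m[:, 0] + np.log(np.exp(logits - m).mean(axis=1))
    return -log_density

rng = np.random.default_rng(0)
dim = 16
center = normalize(rng.normal(size=dim))
# Hypothetical "training" embeddings clustered around one direction on the sphere.
anchors = normalize(center + 0.2 * rng.normal(size=(500, dim)))

queries = np.vstack([center, -center])  # in-distribution vs. far-away direction
uncertainty = vmf_kde_neg_log_density(queries, anchors)
```

A query in a densely populated region of the sphere gets a low score, while one in an empty region gets a high score; a learned flow plays the same role but scales to far more complex embedding distributions than a kernel estimate.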

2601.18696 2026-05-21 cs.LG Updated version

Explainability Methods for Hardware Trojan Detection: A Systematic Comparison

Paul Whitten, Francis Wolff, Chris Papachristou

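Affiliations: Electrical, Computer, and Systems Engineering

AI summary: This paper systematically compares explainability methods for hardware trojan detection, examining how domain-aware property analysis, case-based reasoning, and feature attribution techniques differ in performance for hardware security applications.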

Abstract

Hardware trojans are malicious circuits that compromise the functionality and security of an integrated circuit (IC). These circuits are manufactured directly into the silicon and, unlike software, cannot be fixed by security patches. Remediation requires a costly product recall to replace the IC; hence, early detection in the design process is essential. Hardware trojan detection at best provides statistically based solutions with many false positives and false negatives. These detection methods require more thorough explainable analysis to filter out false indicators. Existing explainability methods developed for general domains like image classification may not provide the actionable insights that hardware engineers need. A question remains: How do domain-aware property analysis, model-agnostic case-based reasoning, and model-agnostic feature attribution techniques compare for hardware security applications? This work compares three categories of explainability for gate-level hardware trojan detection on the Trust-Hub benchmark dataset: (1) domain-aware property-based analysis of 31 circuit-specific features derived from gate fanin patterns, flip-flop distances, and primary Input/Output (I/O) connectivity; (2) model-agnostic case-based reasoning using k-nearest neighbors for precedent-based explanations; and (3) model-agnostic feature attribution methods (Local Interpretable Model-agnostic Explanations (LIME), SHapley Additive exPlanations (SHAP), gradient) that provide generic importance scores without circuit-level context.
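
Case-based reasoning with k-nearest neighbors explains a prediction by pointing to the most similar labeled precedents. A minimal sketch follows. The three features echo the feature families named above (gate fan-in, flip-flop distance, primary I/O connectivity), but the specific features, values, net names, and labels are invented for illustration and are not Trust-Hub data.

```python
import numpy as np

def knn_explain(query, X_train, y_train, names, k=3):
    """Return a majority-vote label and the k nearest training nets (precedents)."""
    # Standardize features so that no single unit dominates the distance.
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0) + 1e-12
    dist = np.linalg.norm((X_train - mu) / sd - (query - mu) / sd, axis=1)
    idx = np.argsort(dist)[:k]
    vote = int(round(y_train[idx].mean()))
    precedents = [(names[i], int(y_train[i]), float(dist[i])) for i in idx]
    return vote, precedents

# Hypothetical per-net features: [gate fan-in, distance to nearest flip-flop, distance to primary I/O].
X_train = np.array([[2, 1, 1], [3, 2, 1], [2, 2, 2], [9, 7, 6], [8, 6, 7], [10, 8, 6]], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1])  # 1 = trojan net
names = ["n1", "n2", "n3", "t1", "t2", "t3"]

vote, precedents = knn_explain(np.array([9.0, 7.0, 7.0]), X_train, y_train, names)
```

The explanation is the list of precedents itself (for example, "this net resembles known trojan nets t1, t2, t3"), which a hardware engineer can inspect directly, in contrast to a bare importance score.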

2601.18577 2026-05-21 cs.CV cs.LG Updated version

Self-Refining Video Sampling

Sangwon Jang, Taekyung Ki, Jaehyeong Jo, Saining Xie, Jaehong Yoon, Sung Ju Hwang

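AI summary: This paper proposes self-refining video sampling, which uses a pre-trained video generator as its own refiner. It enables iterative inner-loop refinement at inference time without external verifiers or additional training, improving motion coherence and physical alignment.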

Comments ICML 2026. Project page: https://agwmon.github.io/self-refine-video/

Abstract

Modern video generators still struggle with complex physical dynamics, often falling short of physical realism. Existing approaches address this using external verifiers or additional training on augmented data, which is computationally expensive and still limited in capturing fine-grained motion. In this work, we present self-refining video sampling, a simple method that uses a pre-trained video generator trained on large-scale datasets as its own self-refiner. By interpreting the generator as a denoising autoencoder, we enable iterative inner-loop refinement at inference time without any external verifier or additional training. We further introduce an uncertainty-aware refinement strategy that selectively refines regions based on self-consistency, which prevents artifacts caused by over-refinement. Experiments on state-of-the-art video generators demonstrate significant improvements in motion coherence and physics alignment, achieving over 70% human preference compared to the default sampler and guidance-based sampler.

2601.00473 2026-05-21 cs.LG cs.AI (updated version)

Deep Neural Networks as Discrete Dynamical Systems: Implications for Physics-Informed Learning

Abhisek Ganguly, Santosh Ansumali, Sauro Succi

Affiliations: Engineering Mechanics Unit, Jawaharlal Nehru Centre for Advanced Scientific Research; Italian Institute of Technology; University of Roma Tre; Physics Department, Harvard University; Cornell University

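AI summary: This paper explores the analogy between deep neural networks and discrete dynamical systems. By comparing numerical/exact solutions of the Burgers' and Eikonal equations with solutions obtained via PINNs, it shows that PINN learning offers a different computational pathway for approximating the same system dynamics. It also notes that the dense parameter representations of PINNs may be advantageous in high-dimensional settings.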

Abstract

We revisit the analogy between feed-forward deep neural networks (DNNs) and discrete dynamical systems derived from neural integral equations and their corresponding partial differential equation (PDE) forms. We present a comparative analysis of numerical/exact solutions of the Burgers' and Eikonal equations against those obtained via PINNs. We show that PINN learning provides a different computational pathway compared to standard numerical discretization in approximating essentially the same underlying dynamics of the system. Within this framework, DNNs can be interpreted as discrete dynamical systems whose layer-wise evolution approaches attractors, and multiple parameter configurations may yield comparable solutions, reflecting the degeneracy of the inverse mapping. In contrast to the structured operators associated with finite-difference (FD) procedures, PINNs learn dense parameter representations that are not directly associated with classical discretization stencils. This distributed representation generally involves a larger number of parameters, leading to reduced interpretability and increased computational cost. However, the additional flexibility of such representations may offer advantages in high-dimensional settings where classical grid-based methods become impractical.
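
The "network as discrete dynamical system" reading can be made concrete with a residual update x_{l+1} = x_l + h f(x_l), which is exactly one explicit Euler step of dx/dt = f(x). The sketch below is an illustration of that analogy, not the paper's PINN setup.

Assumptions:

- The contracting map `f` (with attractor `x_star`) is hypothetical and stands in for a trained layer.
- Unlike a real network, every layer here shares the same map.

```python
import numpy as np

def residual_net(x0, f, h, n_layers):
    """Residual network x_{l+1} = x_l + h * f(x_l); returns the state after every layer.

    Each layer is one explicit (forward) Euler step for dx/dt = f(x), so network
    depth plays the role of integration time.
    """
    states = [np.asarray(x0, dtype=float)]
    for _ in range(n_layers):
        states.append(states[-1] + h * f(states[-1]))
    return states


# Hypothetical layer map with a single attractor at x_star (stands in for a trained layer).
x_star = np.array([1.0, -2.0])
f = lambda x: -(x - x_star)

deep_small_steps = residual_net([5.0, 5.0], f, h=0.1, n_layers=100)
shallow_big_steps = residual_net([5.0, 5.0], f, h=0.2, n_layers=50)
# Two different (step size, depth) configurations land on the same fixed point.
print(deep_small_steps[-1], shallow_big_steps[-1])
```

The two runs illustrate, in miniature, the degeneracy the abstract mentions: different parameter configurations (here step size and depth) reach essentially the same solution.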

2601.00418 2026-05-21 cs.CR cs.DC cs.LG (updated version)

Secure, Verifiable, and Scalable Multi-Client Data Sharing via Consensus-Based Privacy-Preserving Data Distribution

Prajwal Panth, Sahaj Raj Malla

Affiliations: School of Computer Engineering, KIIT University; Department of Mathematics, Kathmandu University

Comments 25 pages, 6 figures, preprint

Abstract

We propose the Consensus-Based Privacy-Preserving Data Distribution (CPPDD) framework, a lightweight and post-setup autonomous protocol for secure multi-client data aggregation. The framework enforces unanimous-release confidentiality through a dual-layer protection mechanism that combines per-client affine masking with priority-driven sequential consensus locking. Decentralized integrity is verified via step (sigma_S) and data (sigma_D) checksums, facilitating autonomous malicious deviation detection and atomic abort without requiring persistent coordination. The design supports scalar, vector, and matrix payloads with O(N*D) computation and communication complexity, optional edge-server offloading, and resistance to collusion under N-1 corruptions. Formal analysis proves correctness, Consensus-Dependent Integrity and Fairness (CDIF) with overwhelming-probability abort on deviation, and IND-CPA security assuming a pseudorandom function family. Empirical evaluations on MNIST-derived vectors demonstrate linear scalability up to N = 500 with sub-millisecond per-client computation times. The framework achieves 100% malicious deviation detection, exact data recovery, and three-to-four orders of magnitude lower FLOPs compared to MPC and HE baselines. CPPDD enables atomic collaboration in secure voting, consortium federated learning, blockchain escrows, and geo-information capacity building, addressing critical gaps in scalability, trust minimization, and verifiable multi-party computation for regulated and resource-constrained environments.
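
Three ingredients of the framework can be illustrated with a toy sketch: per-client affine masking, checksums for deviation detection, and an all-or-nothing release. This is not the CPPDD protocol.

The following are our own choices, not the paper's:

- the prime modulus `P`;
- HMAC-SHA256 as the pseudorandom function;
- SHA-256 payload checksums;
- the helper names.

The priority-driven sequential consensus locking and the specific sigma_S / sigma_D construction are not modeled.

```python
import hashlib
import hmac

P = 2**61 - 1  # prime modulus for the masking arithmetic (our choice)


def prf(key: bytes, label: str, i: int) -> int:
    """HMAC-SHA256 used as a pseudorandom function, reduced mod P."""
    digest = hmac.new(key, f"{label}:{i}".encode(), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % P


def mask(x, key):
    """Per-client affine mask: y_i = a_i * x_i + b_i (mod P), with a_i forced nonzero."""
    return [((prf(key, "a", i) or 1) * v + prf(key, "b", i)) % P for i, v in enumerate(x)]


def unmask(y, key):
    """Invert the affine mask: x_i = (y_i - b_i) * a_i^{-1} (mod P)."""
    return [((v - prf(key, "b", i)) * pow(prf(key, "a", i) or 1, -1, P)) % P
            for i, v in enumerate(y)]


def checksum(values):
    """SHA-256 digest of a masked payload, used to detect tampering."""
    h = hashlib.sha256()
    for v in values:
        h.update(v.to_bytes(8, "big"))
    return h.hexdigest()


def release(masked, keys, checksums):
    """All-or-nothing release: abort unless every key is present and every checksum matches."""
    if any(k is None for k in keys):
        return None  # some client has not consented
    if any(checksum(m) != c for m, c in zip(masked, checksums)):
        return None  # deviation detected: atomic abort
    return [unmask(m, k) for m, k in zip(masked, keys)]


clients = {b"key-A": [1, 2, 3], b"key-B": [40, 50, 60]}
masked = [mask(x, k) for k, x in clients.items()]
sums = [checksum(m) for m in masked]
print(release(masked, [b"key-A", None], sums))  # None: one client withheld its key
print(release(masked, list(clients), sums))     # [[1, 2, 3], [40, 50, 60]]
```

Each client's computation here is linear in the payload length, loosely in line with the O(N*D) cost quoted above. Inputs are assumed to be integers in [0, P).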

2512.13788 2026-05-21 cs.LG cs.RO (updated version)

Constrained Policy Optimization via Sampling-Based Weight-Space Projection

Shengfan Cao, Francesco Borrelli, Eunhyek Joa

Affiliations: Department of Mechanical Engineering, Seoul National University, Seoul, Korea

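AI summary: This work proposes SCPO, a sampling-based weight-space projection method for improving policies without leaving the safe operating regime. By enforcing safety constraints directly in parameter space, it maintains safety and feasibility throughout training while achieving closed-loop stability in constrained control tasks.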

Comments Accepted for publication at IFAC World Congress 2026; fixed minor notation inconsistencies

Abstract

Safety-critical learning requires policies that improve performance without leaving the safe operating regime. We study constrained policy learning where model parameters must satisfy rollout-based safety constraints that can be evaluated but not differentiated analytically. We propose SCPO, a sampling-based weight-space projection method that enforces safety directly in parameter space without requiring gradient access to the constraint functions. SCPO constructs a local safe region by combining rollout-based safety evaluations with smoothness bounds relating parameter perturbations to changes in safety metrics, and projects each gradient update via a convex QCQP. We establish a safe-by-induction guarantee: starting from any safe initialization, all intermediate policies remain safe given feasible projections. In constrained control settings with a stabilizing backup policy, SCPO further ensures closed-loop stability while enabling safe adaptation beyond the conservative backup. Experiments on constrained regression with harmful supervision and double-integrator imitation with a malicious expert show that SCPO rejects unsafe updates, maintains feasibility throughout training, and achieves meaningful objective improvement.
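
A simplified version of the projection step can be sketched as follows. It rests on three assumptions:

- There is a single scalar safety metric with c(θ) <= limit.
- A smoothness constant L satisfies |c(θ + d) - c(θ)| <= L ||d||.
- The safety margin is taken as the worst case over sampled rollouts.

Under these assumptions the QCQP min ||d - step||^2 s.t. ||d||^2 <= (margin / L)^2 has a closed-form radial solution. SCPO's actual QCQP and smoothness bounds may include more constraints. The helper names are hypothetical.

```python
import numpy as np

def safe_margin(rollout_costs, limit):
    """Worst-case slack (limit - cost) over sampled rollouts; must be > 0 to be safe."""
    return limit - max(rollout_costs)


def project_step(step, margin, lipschitz):
    """Project a proposed parameter step onto the ball ||d|| <= margin / L.

    This solves min ||d - step||^2 s.t. ||d||^2 <= (margin / L)^2, a QCQP with
    one quadratic constraint whose solution is a radial rescaling.
    """
    step = np.asarray(step, dtype=float)
    if margin <= 0:
        return np.zeros_like(step)  # no certified slack: reject the update
    radius = margin / lipschitz
    norm = np.linalg.norm(step)
    return step if norm <= radius else step * (radius / norm)


margin = safe_margin(rollout_costs=[0.2, 0.5, 0.3], limit=1.0)  # 0.5 of slack
print(project_step([3.0, 4.0], margin, lipschitz=1.0))  # rescaled to norm 0.5
```

If the current parameters are safe and every accepted step satisfies L ||d|| <= margin, the smoothness bound keeps the next iterate safe too. This is the induction step behind a safe-by-induction argument. In practice a sampled margin only estimates the true worst case.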

2512.08013 2026-05-21 eess.SY cs.LG cs.SY math.OC (updated version)

Learning Dynamics from Infrequent Output Measurements for Uncertainty-Aware Optimal Control

Robert Lefringhausen, Theodor Springer, Sandra Hirche

Affiliations: Chair of Information-oriented Control, School of Computation, Information and Technology

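AI summary: This work places a Bayesian prior over continuous-time dynamics and latent state trajectories and updates it with a targeted Metropolis-Hastings sampler equipped with a numerical ODE solver. The resulting posterior is used to solve the optimal control problem under uncertainty with a scenario-based approach, validated on glucose regulation in Type 1 diabetes.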

Comments Accepted for publication in the Proceedings of the 2026 IFAC World Congress

Abstract

Reliable optimal control is challenging when the dynamics of a nonlinear system are unknown and only infrequent, noisy output measurements are available. This work addresses this setting of limited sensing by formulating a Bayesian prior over the continuous-time dynamics and latent state trajectory in state-space form and updating it through a targeted Metropolis-Hastings sampler equipped with a numerical ODE integrator. The resulting posterior samples are used to formulate a scenario-based optimal control problem that accounts for the uncertainty in the dynamics and latent state and is solved using standard nonlinear programming methods. The approach is validated in a numerical case study on glucose regulation using a Type 1 diabetes model.
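
The core loop of updating a prior over dynamics with Metropolis-Hastings and an ODE integrator can be sketched on a toy problem. This is not the paper's sampler.

Assumptions:

- The model is a one-dimensional decay dx/dt = -rate * x with known initial state.
- The likelihood is Gaussian with noise standard deviation 0.05.
- The prior is flat on rate > 0.
- A plain random-walk proposal is used.

The paper's targeted sampler also samples latent state trajectories, which is omitted here.

```python
import math
import random

def simulate(rate, x0, obs_times, dt=0.1):
    """Integrate dx/dt = -rate * x with classical RK4; return x at each observation time."""
    f = lambda x: -rate * x
    t, x, out = 0.0, x0, []
    for t_obs in obs_times:  # observation times must be sorted ascending
        while t < t_obs - 1e-12:
            h = min(dt, t_obs - t)  # shorten the last step to land on t_obs
            k1 = f(x)
            k2 = f(x + 0.5 * h * k1)
            k3 = f(x + 0.5 * h * k2)
            k4 = f(x + h * k3)
            x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
            t += h
        out.append(x)
    return out


def log_posterior(rate, x0, times, ys, sigma):
    if rate <= 0:
        return -math.inf  # prior: flat on rate > 0 (an assumption)
    preds = simulate(rate, x0, times)
    return -sum((y - p) ** 2 for y, p in zip(ys, preds)) / (2 * sigma ** 2)


def metropolis_hastings(x0, times, ys, sigma, n_steps=4000, step=0.05, init=1.0, seed=0):
    """Random-walk Metropolis-Hastings over the unknown rate."""
    rng = random.Random(seed)
    rate, lp = init, log_posterior(init, x0, times, ys, sigma)
    samples = []
    for _ in range(n_steps):
        proposal = rate + rng.gauss(0.0, step)  # symmetric proposal
        lp_new = log_posterior(proposal, x0, times, ys, sigma)
        if math.log(1.0 - rng.random()) < lp_new - lp:  # accept w.p. min(1, ratio)
            rate, lp = proposal, lp_new
        samples.append(rate)
    return samples


# Three infrequent readings of a system whose true rate is 0.5.
times = [1.0, 3.0, 7.0]
ys = [math.exp(-0.5 * t) for t in times]
posterior = metropolis_hastings(1.0, times, ys, sigma=0.05)[1000:]  # drop burn-in
posterior_mean = sum(posterior) / len(posterior)
print(f"posterior mean rate: {posterior_mean:.3f}")
```

In a scenario-based formulation, each retained posterior sample would define one scenario whose constraints the optimal control input must satisfy.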

2512.07420 2026-05-21 hep-ph cs.LG hep-ex (updated version)

E-PCN: Jet Tagging with Explainable Particle Chebyshev Networks Using Kinematic Features

Md Raqibul Islam, Adrita Khan, Mir Sazzat Hossain, Choudhury Ben Yamin Siddiqui, Md. Zakir Hossan, Tanjib Khan, M. Arshad Momen, Amin Ahsan Ali, AKM Mahbubur Rahman

Affiliations: Center for Computational & Data Sciences, Independent University, Bangladesh, Dhaka-1229, Bangladesh; Department of Physical Sciences, Independent University, Bangladesh, Dhaka-1229, Bangladesh; Department of Theoretical Physics, University of Dhaka, Dhaka-1000, Bangladesh

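AI summary: This paper proposes E-PCN, an explainable Particle Chebyshev Network for jet tagging that incorporates kinematic features. It constructs four graph representations per jet to improve both the interpretability and the accuracy of classification.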

Comments 25 pages, 3 figures

Abstract

The identification and classification of collimated particle sprays, or jets, are essential for interpreting data from high-energy collider experiments. While deep learning has improved jet classification, it often lacks interpretability. We introduce the Explainable Particle Chebyshev Network (E-PCN), a graph neural network extending the Particle Chebyshev Network (PCN). E-PCN integrates kinematic variables into jet classification by constructing four graph representations per jet, each weighted by a distinct variable: angular separation ($Δ$), transverse momentum ($k_T$), momentum fraction ($z$), and invariant mass squared ($m^2$). We use the concept of Gradient-weighted Class Activation Mapping (Grad-CAM) to determine which kinematic variables dominate classification outcomes. Analysis reveals that angular separation and transverse momentum collectively account for approximately 76% of classification decisions (40.72% and 35.67%, respectively), with momentum fraction and invariant mass contributing the remaining 24%. Evaluated on the JetClass dataset with 10 signal classes, E-PCN achieves a macro-accuracy of 94.67%, macro-AUC of 96.78%, and macro-AUPR of 86.79%, representing improvements of 2.36%, 4.13%, and 24.88% respectively over the baseline PCN implementation, while demonstrating physically interpretable feature learning.
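
The four edge variables can be computed for a particle pair as below. The sketch assumes the standard pairwise definitions used in recent jet-tagging work:

- Δ = sqrt(Δy^2 + Δφ^2)
- k_T = min(pT_a, pT_b) · Δ
- z = min(pT_a, pT_b) / (pT_a + pT_b)
- m^2 = (E_a + E_b)^2 - |p_a + p_b|^2

Whether E-PCN uses exactly these conventions (for example, rapidity rather than pseudorapidity) and how it turns them into edge weights are assumptions.

```python
import math

def pair_features(a, b):
    """Pairwise kinematic variables for two particles given as (E, px, py, pz)."""
    def kin(p):
        E, px, py, pz = p
        pt = math.hypot(px, py)
        y = 0.5 * math.log((E + pz) / (E - pz))  # rapidity (requires E > |pz|)
        phi = math.atan2(py, px)
        return pt, y, phi

    pt_a, y_a, phi_a = kin(a)
    pt_b, y_b, phi_b = kin(b)
    dphi = (phi_a - phi_b + math.pi) % (2 * math.pi) - math.pi  # wrap into [-pi, pi)
    delta = math.hypot(y_a - y_b, dphi)                         # angular separation
    pt_min = min(pt_a, pt_b)
    k_t = pt_min * delta                                        # transverse momentum k_T
    z = pt_min / (pt_a + pt_b)                                  # momentum fraction
    E, px, py, pz = (a[i] + b[i] for i in range(4))
    m2 = E**2 - px**2 - py**2 - pz**2                           # invariant mass squared
    return {"delta": delta, "k_t": k_t, "z": z, "m2": m2}


a = (1.0, 1.0, 0.0, 0.0)   # hypothetical massless particle moving along +x
b = (1.0, -1.0, 0.0, 0.0)  # and its back-to-back partner along -x
print(pair_features(a, b))
```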

2511.23152 2026-05-21 cs.LG cond-mat.dis-nn math.OC math.RT stat.ML (updated version)

A Differentiable Measure of Algebraic Complexity: Provably Exact Discovery of Group Structures

Dongsung Huh, Lior Horesh, Halyun Jeong

Affiliations: Independent Researcher; IBM Research; University at Albany, SUNY

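AI summary: This paper proposes a differentiable measure of algebraic complexity, formulated through Cayley-table completion. It proves that group structures can be discovered exactly via the HyperCube operator-valued tensor factorization, resolving the central open conjecture of Huh (2025).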

Comments 29 pages, 3 figures. All theoretical conjectures are formally proven as theorems and verified in Lean 4. v4: Minor typographical corrections

Abstract

Discovering discrete algebraic rules from data is a fundamental challenge in machine learning. We formalize this problem through Cayley-table completion -- an algebraic counterpart to classical matrix completion -- where the degree of associativity violation replaces linear rank as the intrinsic measure of complexity. We provide a rigorous landscape analysis of HyperCube, an operator-valued tensor factorization, on the fully observed target table $δ$, proving that its global infimum $H_{\inf}(δ) := \inf_{Θ\in F_δ} H(Θ)$ implicitly defines an exact differentiable measure for this complexity. We show that HyperCube's native objective $H(Θ)$ decomposes into two components: geometric alignment (collinearity) and an inverse $\ell_2$ penalty. We establish that these continuous variational pressures induce core discrete properties: collinearity enforces associativity (Collinearity--Associativity Equivalence), and the inverse $\ell_2$ penalty reduces to an exact inverse rank penalty within the collinear manifold, driving the parameters toward full-rank unitarity. Consequently, we derive an absolute lower bound $H(Θ) \ge H_{\inf}(δ) \ge 3 \, |δ|$, where $|δ|$ is the target table size. We prove this absolute floor is attained if and only if the target is isotopic to a group, and characterize the global minimizer as the regular representation of the underlying group (up to unitary gauge), resolving the central open conjecture of Huh (2025). This work serves as an existence proof that certain discrete algebraic structures can be exactly characterized by differentiable measures, enabling gradient-based discovery without the need for combinatorial search. All theoretical results are mechanically verified in Lean 4 and confirmed via small-scale experiments.
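
The complexity notion starts from a discrete quantity: for a table δ with δ[a][b] = a∘b, count the triples where (a∘b)∘c differs from a∘(b∘c). The sketch below computes only this discrete count, which is not differentiable. HyperCube's H_inf, the differentiable measure the abstract refers to, is not reproduced here, and the helper names are our own.

```python
from itertools import product

def associativity_violations(table):
    """Count triples (a, b, c) with (a*b)*c != a*(b*c), where table[a][b] = a*b."""
    n = len(table)
    return sum(
        table[table[a][b]][c] != table[a][table[b][c]]
        for a, b, c in product(range(n), repeat=3)
    )


def op_table(n, op):
    """Cayley table of a binary operation on {0, ..., n-1}, reduced mod n."""
    return [[op(a, b) % n for b in range(n)] for a in range(n)]


print(associativity_violations(op_table(3, lambda a, b: a + b)))  # 0: addition mod 3 is a group
print(associativity_violations(op_table(3, lambda a, b: a - b)))  # 18 of 27 triples violate
```

Subtraction mod 3 fails exactly when 2c is nonzero mod 3, that is for c in {1, 2}, giving 3 · 3 · 2 = 18 violating triples.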

2511.21223 2026-05-21 stat.ML cs.LG (updated version)

Maxitive Donsker-Varadhan Formulation for Possibilistic Variational Inference

Jasraj Singh, Shelvia Wongso, Jeremie Houssineau, Badr-Eddine Chérief-Abdellatif

Affiliations: Nanyang Technological University

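AI summary: This paper proposes a variational inference method based on possibility theory. By establishing a maxitive Donsker-Varadhan formulation, it removes the reliance of conventional variational inference on additivity, and it introduces the CBOpt optimizers to improve performance on image classification tasks.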

Comments 37 pages, 3 figures, 13 tables

Abstract

Variational inference (VI) is a cornerstone of modern Bayesian learning, enabling approximate inference in complex models. However, its formulation depends on expectations and divergences defined through high-dimensional integrals, often rendering analytical treatment impossible and necessitating heavy reliance on approximations. Possibility theory, an imprecise probability framework, allows us to directly model epistemic uncertainty instead of relying on a subjective interpretation of probabilities. While this framework provides robustness and interpretability under sparse or imprecise information, adapting VI to the possibilistic setting requires rethinking core concepts such as divergences, which presuppose additivity. In this work, we develop a principled formulation for performing possibilistic VI by establishing a maxitive analogue of the classical Donsker-Varadhan formulation. The resulting framework enables us to derive a learning rule for possibilistic VI with exponential-family candidates and practical update rules for neural-network training, giving rise to a family of optimizers termed CBOpt. Finally, we demonstrate that CBOpt achieves competitive performance on both in-domain and out-of-domain image classification tasks.
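
For reference, the classical (additive) Donsker-Varadhan formula states KL(Q || P) = sup_f { E_Q[f] - log E_P[exp f] }. The supremum is attained at f = log(dQ/dP). The sketch below checks this numerically for small discrete distributions. It covers only the classical version; the maxitive analogue developed in the paper is not reproduced.

```python
import numpy as np

def kl(q, p):
    """KL(q || p) for discrete distributions with full support."""
    q, p = np.asarray(q, float), np.asarray(p, float)
    return float(np.sum(q * np.log(q / p)))


def dv_objective(f, q, p):
    """Donsker-Varadhan objective E_q[f] - log E_p[exp(f)] for a test function f."""
    f, q, p = (np.asarray(v, float) for v in (f, q, p))
    return float(q @ f - np.log(p @ np.exp(f)))


q = np.array([0.7, 0.2, 0.1])
p = np.array([0.3, 0.3, 0.4])
f_star = np.log(q / p)  # the maximizer: log-density ratio
f_rand = np.random.default_rng(0).standard_normal(3)
print(kl(q, p), dv_objective(f_star, q, p), dv_objective(f_rand, q, p))
```

Any other test function, such as `f_rand`, gives a lower bound on the KL divergence. This variational form is what the maxitive construction replaces.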

2510.14444 2026-05-21 cs.LG cs.AI (updated version)

A Free Lunch in LLM Compression: Revisiting Retraining after Pruning

Moritz Wagner, Christophe Roux, Max Zimmer, Sebastian Pokutta

Affiliations: Department for AI in Society, Science, and Technology, Zuse Institute Berlin; Institute of Mathematics, Technische Universität Berlin

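AI summary: This paper studies post-pruning adaptation via local reconstruction and finds that it effectively recovers model performance at much lower data and compute cost. It characterizes how the size of the reconstruction parameter window affects final quality, challenging the prevailing view that post-pruning adaptation is impractical for LLMs.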

Abstract

Post-training pruning can substantially reduce LLM inference costs, but it often degrades quality unless the remaining weights are adapted. Since global retraining is expensive at LLM scale, recent work has largely focused on increasingly sophisticated pruning criteria that aim to select better sparsity patterns without adaptation. We revisit this trade-off through local reconstruction: after pruning, we adapt one subset of the model parameters at a time on a calibration set, training it to match the corresponding intermediate activations of the dense model. We evaluate local reconstruction across model families and scales, up to 72B parameters, and establish three main findings. First, local reconstruction is an effective adaptation mechanism for LLMs: it matches post-pruning retraining while using over an order of magnitude less data and compute, even when using PEFT techniques. Second, reconstruction exhibits a broad "free-lunch" regime in granularity, i.e., the reconstruction parameter window: as long as the reconstructed region contains at least one nonlinear submodule, final quality is largely insensitive to the window size, allowing granularity to be chosen primarily based on memory constraints. In contrast, reconstructing individual matrices, despite being the natural approach often proposed in the literature, consistently underperforms, as small matrix-level errors accumulate into larger activation drift. Lastly, reconstruction reduces the relative importance of the pruning criterion: performance gaps between sophisticated criteria and simple baselines shrink with model scale, making simple methods competitive again. Overall, our results challenge the prevailing view that post-pruning adaptation is impractical for LLMs.
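
Local reconstruction for one nonlinear submodule can be sketched on a two-layer ReLU block. This is a minimal illustration, not the authors' training setup.

Assumptions:

- Pruning is 50% unstructured magnitude pruning per matrix.
- Random Gaussian inputs stand in for calibration activations.
- Optimization is plain full-batch gradient descent on the MSE to the dense block's outputs.

Updates are masked so that pruned weights stay exactly zero.

```python
import numpy as np

def magnitude_mask(w, sparsity):
    """Binary mask keeping the largest-magnitude (1 - sparsity) fraction of entries."""
    k = int(round(w.size * sparsity))
    thresh = np.sort(np.abs(w), axis=None)[k - 1] if k > 0 else -np.inf
    return (np.abs(w) > thresh).astype(w.dtype)


def block(x, w1, w2):
    """Two-layer ReLU MLP block; returns pre-activations and outputs."""
    pre = x @ w1
    return pre, np.maximum(pre, 0.0) @ w2


def reconstruct(x, w1_dense, w2_dense, sparsity=0.5, lr=0.1, steps=300):
    """Prune the block, then fit its kept weights to the dense block's outputs."""
    m1, m2 = magnitude_mask(w1_dense, sparsity), magnitude_mask(w2_dense, sparsity)
    w1, w2 = w1_dense * m1, w2_dense * m2
    _, target = block(x, w1_dense, w2_dense)  # dense activations are the regression target
    losses = []
    for _ in range(steps):
        pre, out = block(x, w1, w2)
        resid = out - target
        losses.append(float(np.mean(resid ** 2)))
        g_out = 2.0 * resid / resid.size           # d(MSE)/d(out)
        g_w2 = np.maximum(pre, 0.0).T @ g_out
        g_pre = (g_out @ w2.T) * (pre > 0)         # backprop through ReLU
        g_w1 = x.T @ g_pre
        w1 -= lr * g_w1 * m1                       # masked updates keep pruned weights at zero
        w2 -= lr * g_w2 * m2
    return w1, w2, losses


rng = np.random.default_rng(0)
x_calib = rng.standard_normal((256, 16))          # stand-in calibration activations
w1_dense = rng.standard_normal((16, 32)) / 4.0
w2_dense = rng.standard_normal((32, 8)) / np.sqrt(32)
w1, w2, losses = reconstruct(x_calib, w1_dense, w2_dense)
print(f"MSE to dense outputs: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

The reconstruction window here is the whole block, including its nonlinearity. Per the abstract, this is the granularity regime that avoids the error accumulation seen when matrices are reconstructed individually.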

2510.06824 2026-05-21 cs.LG (updated version)

Efficient numeracy in language models through single-token number embeddings

Linus Kreitner, Paul Hager, Jonathan Mengedoht, Georgios Kaissis, Daniel Rueckert, Martin J. Menten

Affiliations: Chair for AI in Healthcare and Medicine, Technical University of Munich (TUM) and TUM University Hospital, Munich, Germany; Department of Computing, Imperial College London, UK; Munich Center for Machine Learning (MCML), Munich, Germany; Hasso Plattner Institute for Digital Engineering, University of Potsdam, Germany

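AI summary: This paper proposes BitTokens, a method that encodes each number as a single token using its IEEE 754 binary floating-point representation. This lets language models handle numerical computation more efficiently and thereby tackle more complex problems.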

Abstract

To drive progress in science and engineering, large language models (LLMs) must be able to process large amounts of numerical data and solve long calculations efficiently. This is currently only possible through the use of external tools or extensive reasoning chains, either weakening the numerical representations of LLMs or limiting the length of problems they can solve. We show that frontier LLMs require excessive amounts of reasoning tokens to solve even basic calculations, which is exacerbated by their tokenization strategies that split single numbers into multiple tokens. This motivates the need for efficient and effective single-token number encodings. We introduce a set of desiderata for such encodings and show that existing approaches fail to fulfill them. To address these shortcomings, we propose BitTokens, a novel encoding strategy that represents any number as a single token using its IEEE 754 binary floating-point representation. Through extensive experiments we show that our BitTokens allow even small language models to learn algorithms that solve basic arithmetic operations nearly perfectly. This newly gained efficiency could expand the length and complexity of problems language models can solve.
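
The IEEE 754 bit pattern that a single-token encoding of this kind starts from can be obtained with the standard library. The sketch assumes 64-bit (binary64) precision; the abstract does not fix the format here. It shows only the bit vector. How BitTokens turns these bits into an embedding is not reproduced, and a learned linear projection of the 0/1 vector would be one hypothetical choice.

```python
import struct

def float_to_bits(x):
    """IEEE 754 binary64 pattern of x: [sign] + 11 exponent bits + 52 mantissa bits."""
    (as_int,) = struct.unpack(">Q", struct.pack(">d", x))
    return [(as_int >> (63 - i)) & 1 for i in range(64)]


def bits_to_float(bits):
    """Inverse of float_to_bits."""
    as_int = 0
    for b in bits:
        as_int = (as_int << 1) | b
    (x,) = struct.unpack(">d", struct.pack(">Q", as_int))
    return x


bits = float_to_bits(1.0)
print(bits[0], bits[1:12], sum(bits[12:]))  # sign 0, exponent 01111111111, zero mantissa
```

Because the mapping is a bijection on finite doubles, a model that reads or emits all 64 bits of one token loses no precision relative to the number itself.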

2510.00171 2026-05-21 quant-ph cs.LG (updated version)

Quantum reservoir computing in Jaynes-Cummings models: Nonlinear memory and time-series prediction

Sreetama Das, Gian Luca Giorgi, Roberta Zambrini

Affiliations: Institute for Cross-Disciplinary Physics and Complex Systems (IFISC), UIB-CSIC

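AI summary: This paper studies quantum reservoir computing based on the Jaynes-Cummings model, examining nonlinear memory and time-series prediction, and demonstrates its value for processing signals from complex dynamical systems.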

Comments 16 pages, 14 figures, published version

Journal ref
Phys. Rev. Research 8, 023148 (2026)

Abstract

We investigate quantum reservoir computing (QRC) using a hybrid qubit-boson system described by the Jaynes-Cummings (JC) Hamiltonian and its dispersive limit (DJC). These models provide high-dimensional Hilbert spaces and intrinsic nonlinear dynamics, making them powerful substrates for temporal information processing. We systematically benchmark both reservoirs through linear and nonlinear memory tasks, demonstrating an unusual property: their nonlinear memory capacity exceeds their linear memory capacity. We further test their predictive performance on the Mackey-Glass time series, a widely used benchmark for chaotic dynamics, and show comparable forecasting ability. We also investigate how memory and prediction accuracy vary with reservoir parameters, and show the role of higher-order bosonic observables and time multiplexing in enhancing expressivity, even in minimal spin-boson configurations. Our results establish JC- and DJC-based reservoirs as versatile platforms for time-series processing and as elementary units that overcome the setting of equivalent qubit pairs and offer pathways toward tunable, high-performance quantum machine learning architectures.
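
The linear and nonlinear memory benchmarks can be sketched with a classical echo-state network standing in for the JC reservoir.

Assumptions:

- The reservoir type, size, spectral radius, input scaling and ridge parameter are all our own choices.
- The capacity is evaluated in-sample for brevity.

The capacity for a given target is the squared correlation between a trained linear readout and that target. Linear tasks recall u_{t-k}; nonlinear tasks recall a nonlinear function such as u_{t-k}^2. This near-linear classical toy is not expected to reproduce the paper's finding that nonlinear capacity exceeds linear capacity.

```python
import numpy as np

def run_reservoir(u, n=50, rho=0.9, in_scale=0.1, seed=0):
    """Classical echo-state stand-in: x_t = tanh(W x_{t-1} + w_in * u_t)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((n, n))
    w *= rho / np.max(np.abs(np.linalg.eigvals(w)))  # rescale to spectral radius rho
    w_in = rng.uniform(-in_scale, in_scale, n)
    x = np.zeros(n)
    states = []
    for u_t in u:
        x = np.tanh(w @ x + w_in * u_t)
        states.append(x)
    return np.array(states)


def capacity(states, target, washout=100, ridge=1e-8):
    """Squared correlation between a ridge-regression linear readout and the target."""
    X = np.hstack([states[washout:], np.ones((len(states) - washout, 1))])  # bias column
    y = target[washout:]
    w = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ y)
    return float(np.corrcoef(X @ w, y)[0, 1] ** 2)


rng = np.random.default_rng(1)
u = rng.uniform(-1.0, 1.0, 2000)
states = run_reservoir(u)
linear_mc = capacity(states, np.roll(u, 1))            # recall u_{t-1}
nonlinear_mc = capacity(states, np.roll(u, 1) ** 2)    # recall u_{t-1}^2
print(linear_mc, nonlinear_mc)
```

In the quantum setting, the rows of `states` would instead be measured observables of the JC system, optionally time-multiplexed. The same readout and capacity definition would apply.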

2509.26627 2026-05-21 cs.AI cs.LG cs.RO (updated version)

TimeRewarder: Learning Dense Reward from Passive Videos via Frame-wise Temporal Distance

Yuyang Liu, Chuan Wen, Yihang Hu, Dinesh Jayaraman, Yang Gao

Affiliations: Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China; Shanghai Qi Zhi Institute; Shanghai Jiao Tong University; University of Pennsylvania

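AI summary: This paper proposes TimeRewarder, which learns dense rewards from passive videos via frame-wise temporal distance to improve reinforcement learning on sparse-reward tasks. Experiments show that it substantially improves success rate and sample efficiency across multiple tasks.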

Comments ICML 2026 spotlight paper

Abstract

Designing dense rewards is crucial for reinforcement learning (RL), yet in robotics it often demands extensive manual effort and lacks scalability. One promising solution is to view task progress as a dense reward signal, as it quantifies the degree to which actions advance the system toward task completion over time. We present TimeRewarder, a simple yet effective reward learning method that derives progress estimation signals from passive videos, including robot demonstrations and human videos, by modeling temporal distances between frame pairs. We then demonstrate how TimeRewarder can supply step-wise proxy rewards to guide reinforcement learning. In our comprehensive experiments on ten challenging Meta-World tasks, we show that TimeRewarder dramatically improves RL for sparse-reward tasks, achieving nearly perfect success in 9/10 tasks with only 200,000 environment interactions per task. This approach outperforms previous methods, and even the environment's manually designed dense reward, in both final success rate and sample efficiency. Moreover, we show that TimeRewarder pretraining can exploit real-world human videos, highlighting its potential as a scalable approach to obtaining rich reward signals from diverse video sources.
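
Two pieces of the idea can be sketched in a minimal form: building training pairs labeled by temporal distance, and turning a distance predictor into step-wise rewards.

Assumptions:

- The labels are the normalized index difference (j - i) / (T - 1).
- The reward for a transition is the predicted distance from frame t to frame t + 1.
- `predict_distance` is a hypothetical stand-in for the learned model.

```python
import random

def sample_pairs(num_frames, n_pairs, seed=0):
    """Draw frame pairs (i, j) from one video, labeled with normalized temporal distance."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        i, j = rng.randrange(num_frames), rng.randrange(num_frames)
        pairs.append((i, j, (j - i) / (num_frames - 1)))  # label lies in [-1, 1]
    return pairs


def proxy_rewards(frames, predict_distance):
    """Step-wise proxy reward r_t = predicted temporal distance from frame t to t+1."""
    return [predict_distance(frames[t], frames[t + 1]) for t in range(len(frames) - 1)]


# Oracle stand-in predictor on a 10-frame "video" whose frames are just their indices.
frames = list(range(10))
oracle = lambda f_a, f_b: (f_b - f_a) / (len(frames) - 1)
rewards = proxy_rewards(frames, oracle)
print(sum(rewards))  # telescopes to the total progress of the clip
```

Signed labels let the model penalize transitions that move backward in task progress. The per-step rewards then sum to the overall progress of a trajectory.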

2509.25606 2026-05-21 cs.LG (updated version)

Effective Model Pruning: Measure The Redundancy of Model Components

Yixuan Wang, Dan P. Guralnik, Saiedeh Akbari, Warren E. Dixon

Affiliations: Department of Mechanical and Aerospace Engineering, University of Florida; Department of Mathematics, Ohio University

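AI summary: This paper studies a basic question in model pruning and proposes a pruning method based on the effective sample size. The method determines how many components can be discarded by analyzing the distribution of importance scores, and it is validated on a variety of network architectures.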

Comments 18 pages, 4 figures. Accepted at ICML 2026 (Spotlight)

Abstract

This article initiates the study of a basic question about model pruning: given a vector $s$ of importance scores assigned to model components, how many of the scored components can be discarded without sacrificing performance? We propose Effective Model Pruning (EMP), which derives the desired sparsity directly from the score distribution using the notion of effective sample size from particle filtering, also known as the inverse Simpson index. Rather than prescribe a pruning criterion, EMP supplies a universal adaptive threshold derived from the distribution of the score $s$ over the model components: EMP maps $s$ to a number $N_{eff}=N_{eff}(s)$, called the effective sample size. The $N-N_{eff}$ lowest scoring components are discarded. A tight lower bound on the effective mass $s_{eff}$ (the sum of retained normalized scores) in terms of $N_{eff}$ is derived. This process yields models with a provable upper bound on the loss change relative to the original dense model. Numerical experiments demonstrate this phenomenon across a variety of network architectures, including MLPs, CNNs, Transformers, LLMs, and KANs. It is also shown that EMP addresses a rich set of pruning criteria such as weight magnitude, attention score, KAN importance score, and even feature-level signals such as image pixels.
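
The thresholding rule can be sketched as follows:

1. Normalize the scores to p = s / sum(s).
2. Compute N_eff = 1 / sum(p_i^2), the inverse Simpson index.
3. Keep the N_eff highest-scoring components and report the retained normalized mass s_eff.

The scores are assumed nonnegative. How a non-integer N_eff is rounded is our assumption (nearest integer, clipped to [1, N]).

```python
import numpy as np

def effective_prune(scores):
    """Keep the top-N_eff components, where N_eff = 1 / sum(p_i^2) and p = s / sum(s)."""
    s = np.asarray(scores, dtype=float)
    p = s / s.sum()                                   # assumes s >= 0 and sum(s) > 0
    n_eff = float(1.0 / np.sum(p ** 2))               # inverse Simpson index
    keep = min(len(s), max(1, int(round(n_eff))))     # rounding rule is an assumption
    kept = np.sort(np.argsort(-p, kind="stable")[:keep])
    s_eff = float(p[kept].sum())                      # effective mass of retained components
    return n_eff, kept, s_eff


print(effective_prune([1, 1, 1, 1]))  # uniform scores: N_eff = 4, nothing pruned
print(effective_prune([0, 0, 5, 0]))  # one dominant score: N_eff = 1
print(effective_prune([4, 2, 1, 1]))  # N_eff ~ 2.91, keep 3 components
```

The two extremes show why N_eff is a natural adaptive threshold. Uniform scores give no reason to prune, while a single dominant score makes every other component expendable.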

2509.22963 2026-05-21 cs.LG (updated version)

Reinforcement Learning with Discrete Diffusion Policies for Combinatorial Action Spaces

Haitong Ma, Ofir Nabati, Aviv Rosenberg, Bo Dai, Oran Lang, Craig Boutilier, Na Li, Shie Mannor, Lior Shani, Guy Tenneholtz

Affiliations: Google Research; Harvard University; Google DeepMind; Nvidia Research

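AI summary: This paper proposes a new framework for training efficient discrete diffusion-model policies in complex combinatorial action spaces. An efficient online training procedure built on policy mirror descent yields stable policy improvement and state-of-the-art performance on several challenging combinatorial benchmarks.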

Comments 22 pages, 10 figures. Haitong Ma and Ofir Nabati contributed equally to this paper

Abstract

Reinforcement learning (RL) struggles to scale to large, combinatorial action spaces common in many real-world problems. This paper introduces a novel framework for training discrete diffusion models as highly effective policies in these complex settings. Our key innovation is an efficient online training process that ensures stable and effective policy improvement. By leveraging policy mirror descent (PMD) to define an ideal, regularized target policy distribution, we frame the policy update as a distributional matching problem, training the expressive diffusion model to replicate this stable target. This decoupled approach stabilizes learning and significantly enhances training performance. Our method achieves state-of-the-art results and superior sample efficiency across a diverse set of challenging combinatorial benchmarks, including DNA sequence generation, RL with macro-actions, and multi-agent systems. Experiments demonstrate that our diffusion policies attain superior performance compared to other baselines.
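
Policy mirror descent with a KL regularizer has the closed-form target π_new(a|s) ∝ π_old(a|s) · exp(η Q(s, a)). The abstract only says the target is "regularized", so assuming this standard KL form, the target can be computed by enumeration for a tiny action space. The Q-values and helper name below are hypothetical.

```python
from itertools import product

import numpy as np

def pmd_target(pi_old, q_values, eta):
    """KL-regularized mirror-descent target: pi_new(a) ∝ pi_old(a) * exp(eta * Q(a))."""
    logits = np.log(np.asarray(pi_old, float)) + eta * np.asarray(q_values, float)
    logits -= logits.max()  # numerical stability before exponentiating
    w = np.exp(logits)
    return w / w.sum()


actions = list(product([0, 1], repeat=3))         # 2^3 joint binary actions
q = np.array([sum(a) for a in actions], float)    # hypothetical Q: number of ones
target = pmd_target(np.full(len(actions), 1 / len(actions)), q, eta=1.0)
print(actions[int(np.argmax(target))])  # (1, 1, 1)
```

With n binary sub-actions the enumeration has 2^n entries and quickly becomes infeasible. In that regime, a discrete diffusion policy trained to match the target distribution would take its place.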

2509.13648 2026-05-21 cs.LG cs.IR (updated version)

Sequential Data Augmentation for Generative Recommendation

Geon Lee, Bhuvesh Kumar, Clark Mingxuan Ju, Tong Zhao, Kijung Shin, Neil Shah, Liam Collins

Affiliations: Snap Inc.

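AI summary: This paper studies the effect of data augmentation in generative recommendation and proposes GenPAS, a systematic framework that unifies many augmentation strategies through three bias-controlled sampling steps. GenPAS improves accuracy, data efficiency, and parameter efficiency.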

Abstract

Generative recommendation plays a crucial role in personalized systems, predicting users' future interactions from their historical behavior sequences. A critical yet underexplored factor in training these models is data augmentation, the process of constructing training data from user interaction histories. By shaping the training distribution, data augmentation directly and often substantially affects model generalization and performance. Nevertheless, in much of the existing work, this process is simplified, applied inconsistently, or treated as a minor design choice, without a systematic and principled understanding of its effects. Motivated by our empirical finding that different augmentation strategies can yield large performance disparities, we conduct an in-depth analysis of how they reshape training distributions and influence alignment with future targets and generalization to unseen inputs. To systematize this design space, we propose GenPAS, a generalized and principled framework that models augmentation as a stochastic sampling process over input-target pairs with three bias-controlled steps: sequence sampling, target sampling, and input sampling. This formulation unifies widely used strategies as special cases and enables flexible control of the resulting training distribution. Our extensive experiments on benchmark and industrial datasets demonstrate that GenPAS yields superior accuracy, data efficiency, and parameter efficiency compared to existing strategies, providing practical guidance for principled training data construction in generative recommendation. Our code is available at https://github.com/snap-research/GenPAS.
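
The three-step sampling process can be sketched with a hypothetical parameterization in which each step draws with power-law weights controlled by one exponent. The exponents `alpha`, `beta` and `gamma` and this weighting scheme are our assumptions, not GenPAS's actual definition. For example, a large `beta` concentrates targets on the most recent positions of a history, and `gamma` trades off short against long input contexts.

```python
import random

def sample_pair(sequences, alpha, beta, gamma, rng):
    """Draw one (input, target) training pair in three biased steps (hypothetical weights)."""
    # Step 1: sequence sampling, weight proportional to (len - 1) ** alpha.
    candidates = [s for s in sequences if len(s) >= 2]
    seq = rng.choices(candidates, weights=[(len(s) - 1) ** alpha for s in candidates])[0]
    # Step 2: target sampling over positions 1..len-1, weight proportional to pos ** beta.
    positions = range(1, len(seq))
    t = rng.choices(positions, weights=[p ** beta for p in positions])[0]
    # Step 3: input sampling, start index s < t, weight proportional to (t - s) ** gamma.
    starts = range(t)
    s = rng.choices(starts, weights=[(t - st) ** gamma for st in starts])[0]
    return seq[s:t], seq[t]


rng = random.Random(0)
histories = [[101, 102, 103, 104, 105], [7, 8, 9]]  # hypothetical item-ID histories
print([sample_pair(histories, alpha=1.0, beta=2.0, gamma=0.0, rng=rng) for _ in range(3)])
```

Each sampled input is a contiguous window that ends right before its target. This keeps the pair aligned with the next-item prediction objective while the exponents reshape the training distribution.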

2508.16474 2026-05-21 eess.SY cs.LG cs.SY math.OC 版本更新

Reinforcement Learning-based Control via Y-wise Affine Neural Networks (YANNs)

基于Y-wise仿射神经网络的强化学习控制

Austin Braniff, Yuhe Tian

发表机构 * Department of Chemical and Biomedical Engineering, West Virginia University(化学与生物医学工程系,西弗吉尼亚大学)

AI总结 本文提出了一种基于Y-wise仿射神经网络(YANNs)的新型强化学习算法:利用YANNs的可解释性精确表示多参数线性模型预测控制的显式解,并以此初始化RL的策略网络和评估网络,使算法从线性最优控制的性能出发,再通过在线训练逐步学习一般非线性最优控制问题的解。

详情
Journal ref
Computers & Chemical Engineering, Volume 209, 109610 (2026)
AI中文摘要

本文提出了一种基于Y-wise仿射神经网络(YANNs)的新型强化学习(RL)算法。YANNs是一种可解释的神经网络,能够精确表示定义在任意数量多面体子域上、具有任意输入和输出维度的已知分段仿射函数。YANNs的一个典型应用是重新表述多参数线性模型预测控制的显式解。在此基础上,本文提出利用YANNs初始化RL的策略网络(actor)和评估网络(critic),使所得的YANN-RL控制算法从一开始就具备线性最优控制的性能保证。YANN-策略网络通过表示离线计算得到的多参数控制解来初始化,该解基于近似的线性系统模型。YANN-评估网络表示线性系统的状态-动作价值函数的显式形式,以及作为最优控制问题(OCP)目标函数的奖励函数。此外,通过注入额外的网络层来扩展YANNs以表达非线性关系,这些层可通过与真实的复杂非线性系统直接交互进行在线训练。这样,策略函数和状态价值函数在初始时精确表示线性OCP,并最终能够学习一般非线性OCP的解。此外,还实现了连续策略改进,以提供一种启发式保证,即线性OCP的解可作为RL策略性能的有效下界。YANN-RL算法在截断摆(clipped pendulum)和安全关键的化学反应系统上进行了演示。实验结果表明,YANN-RL显著优于基于深度确定性策略梯度(DDPG)的现代RL算法,尤其是在考虑安全约束时。

英文摘要

This work presents a novel reinforcement learning (RL) algorithm based on Y-wise Affine Neural Networks (YANNs). YANNs provide an interpretable neural network which can exactly represent known piecewise affine functions of arbitrary input and output dimensions defined on any number of polytopic subdomains. One representative application of YANNs is to reformulate explicit solutions of multi-parametric linear model predictive control. Built on this, we propose the use of YANNs to initialize RL actor and critic networks, which enables the resulting YANN-RL control algorithm to start with the confidence of linear optimal control. The YANN-actor is initialized by representing the multi-parametric control solutions obtained via offline computation using an approximated linear system model. The YANN-critic represents the explicit form of the state-action value function for the linear system and the reward function as the objective in an optimal control problem (OCP). Additional network layers are injected to extend YANNs for nonlinear expressions, which can be trained online by directly interacting with the true complex nonlinear system. In this way, both the policy and state-value functions exactly represent a linear OCP initially and are able to eventually learn the solution of a general nonlinear OCP. Continuous policy improvement is also implemented to provide heuristic confidence that the linear OCP solution serves as an effective lower bound to the performance of the RL policy. The YANN-RL algorithm is demonstrated on a clipped pendulum and a safety-critical chemical-reactive system. Our results show that YANN-RL significantly outperforms a modern RL algorithm based on deep deterministic policy gradient (DDPG), especially when considering safety constraints.
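One way to picture "start from linear optimal control, then learn nonlinear corrections" is a residual actor whose added layers are zero-initialized, so that before any training the policy is exactly the explicit linear control law. This is a sketch under assumptions: the gain `K`, the saturation bounds, and `ResidualActor` are hypothetical, and zero-initialized residual layers are a swapped-in technique, not necessarily how YANN-RL injects its extra layers.

```python
import numpy as np

def explicit_law(x):
    """Hypothetical explicit MPC law: saturated linear feedback u = clip(-K x, -1, 1)."""
    K = np.array([0.8, 0.3])
    return np.clip(-(x @ K), -1.0, 1.0)

class ResidualActor:
    """Explicit control law plus a trainable nonlinear correction (hypothetical)."""

    def __init__(self, state_dim, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(state_dim, hidden))
        self.b1 = np.zeros(hidden)
        # Zero-initialized output layer: the correction starts at exactly 0,
        # so the initial policy reproduces the linear optimal control law.
        self.W2 = np.zeros(hidden)
        self.b2 = 0.0

    def __call__(self, x):
        h = np.tanh(x @ self.W1 + self.b1)
        return explicit_law(x) + h @ self.W2 + self.b2
```

Online training (e.g. DDPG-style updates) would then only move `W1`, `b1`, `W2`, `b2`, leaving the linear law as the starting point.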

2508.16453 2026-05-21 cs.SI cs.CL cs.LG 版本更新

Anti-establishment sentiment on TikTok: Implications for understanding influence(rs) and expertise on social media

TikTok上的反建制情绪:对理解社交媒体上的影响力(影响者)与专业知识的启示

Tianliang Xu, Ariel Hasell, Sabina Tomkins

发表机构 * GitHub

AI总结 本文研究了TikTok上反建制情绪(AES)的普遍程度,采用计算方法分析了其在金融、健康和阴谋论等主题内容中的分布,并探讨了社交媒体环境下反建制情绪与用户参与及平台激励之间的关系。

Comments 10 pages excluding references; 14 pages in total; 4 figures; Accepted by the AAAI Conference on Web and Social Media (ICWSM-2026)

详情
AI中文摘要

对公共服务机构的不信任和反建制观点正在上升(尤其是在美国)。随着人们转向社交媒体获取信息,有必要了解社交媒体环境是否以及如何加剧了对机构的不信任。在社交媒体上,内容创作者、影响者和其他意见领袖往往将自己塑造为在健康、政治等众多话题上具有专业知识和权威的人,并且在许多情况下贬低和否定机构的专业知识,以积累追随者并提高自身可见度。然而,此类内容的出现程度以及它是否能提升参与度仍不清楚。本研究分析了社交媒体平台TikTok上反建制情绪(AES)的普遍程度。尽管TikTok作为信息来源非常流行,但相关研究仍相对较少,它可能为理解人们如何形成对机构的态度提供重要见解。我们采用计算方法,在内容创作者倾向于以专家身份自居的主题领域(金融和健康)中,对TikTok帖子是否包含AES进行标注。作为对照,我们还考察了阴谋论主题,预期AES在其中较为常见。我们发现,AES在阴谋论内容中最为普遍,而在另外两个主题的内容中相对罕见。然而,我们也发现,此类内容的参与模式因领域而异,并且平台可能存在激励用户发布反建制内容的机制。

英文摘要

Distrust of public-serving institutions and anti-establishment views are on the rise (especially in the U.S.). As people turn to social media for information, it is imperative to understand whether and how social media environments may be contributing to distrust of institutions. In social media, content creators, influencers, and other opinion leaders often position themselves as having expertise and authority on a range of topics from health to politics, and in many cases devalue and dismiss institutional expertise to build a following and increase their own visibility. However, the extent to which this content appears and whether such content increases engagement is unclear. This study analyzes the prevalence of anti-establishment sentiment (AES) on the social media platform TikTok. Despite its popularity as a source of information, TikTok remains relatively understudied and may provide important insights into how people form attitudes towards institutions. We employ a computational approach to label TikTok posts as containing AES or not across topical domains where content creators tend to frame themselves as experts: finance and wellness. As a comparison, we also consider the topic of conspiracy theories, where AES is expected to be common. We find that AES is most prevalent in conspiracy theory content, and relatively rare in content related to the other two topics. However, we find that engagement patterns with such content vary by area, and that there may be platform incentives for users to post content that expresses anti-establishment sentiment.

2508.11354 2026-05-21 cs.CV cs.AI cs.LG 版本更新

FunduSegmenter: Leveraging the RETFound Foundation Model for Joint Optic Disc and Optic Cup Segmentation in Retinal Fundus Images

FunduSegmenter:利用RETFound基础模型对视网膜眼底图像中的视盘和视杯进行联合分割

Zhenyi Zhao, Muthu Rama Krishnan Mookiah, Emanuele Trucco

发表机构 * University of Dundee(邓迪大学)

AI总结 本文提出了一种基于RETFound基础模型的FunduSegmenter模型,通过引入一系列新颖模块实现视盘和视杯的联合分割,实验表明该模型在多个数据集上均优于现有方法。

详情
Journal ref
Trans. Vis. Sci. Tech. 2026;15(5):14
AI中文摘要

目的:本研究首次将RETFound模型应用于视盘(OD)和视杯(OC)的联合分割。RETFound是一个面向眼底相机图像和光学相干断层扫描图像开发的知名基础模型,已在疾病诊断中表现出良好性能。方法:我们提出FunduSegmenter,该模型将一系列新颖模块与RETFound相整合,包括预适配器、解码器、后适配器、带有卷积块注意力模块的跳跃连接以及视觉Transformer块适配器。该模型在私有数据集GoDARTS以及四个公开数据集IDRiD、Drishti-GS、RIM-ONE-r3和REFUGE上进行了评估,涵盖内部验证、外部验证和领域泛化实验。结果:在内部验证中,平均Dice相似系数达到90.51%,优于所有基线方法,其中部分优势显著(nnU-Net:82.91%;DUNet:89.17%;TransUNet:87.91%)。在所有外部验证实验中,平均结果比最佳基线高约3%,且在领域泛化中也具有竞争力。结论:本研究探讨了RETFound学习到的潜在通用表示在眼底相机图像OD和OC分割中的潜力。我们的FunduSegmenter总体上优于现有最先进的基线方法。所提出的模块具有通用性,可推广到其他基础模型的微调。临床转化意义:该模型在分布内和分布外数据上均表现出较强的稳定性和泛化能力,能够提供稳定的OD和OC分割。这是许多自动化任务的关键步骤,从建立准确的视网膜坐标系到生物标志物发现。代码和训练权重可在https://github.com/JusticeZzy/FunduSegmenter获取。

英文摘要

Purpose: This study introduces the first adaptation of RETFound for joint optic disc (OD) and optic cup (OC) segmentation. RETFound is a well-known foundation model developed for fundus camera and optical coherence tomography images, which has shown promising performance in disease diagnosis. Methods: We propose FunduSegmenter, a model integrating a series of novel modules with RETFound, including a Pre-adapter, a Decoder, a Post-adapter, skip connections with Convolutional Block Attention Module and a Vision Transformer block adapter. The model is evaluated on a proprietary dataset, GoDARTS, and four public datasets, IDRiD, Drishti-GS, RIM-ONE-r3, and REFUGE, through internal verification, external verification and domain generalization experiments. Results: An average Dice similarity coefficient of 90.51% was achieved in internal verification, which outperformed all baselines, some substantially (nnU-Net: 82.91%; DUNet: 89.17%; TransUNet: 87.91%). In all external verification experiments, the average results were about 3% higher than those of the best baseline, and our model was also competitive in domain generalization. Conclusions: This study explored the potential of the latent general representations learned by RETFound for OD and OC segmentation in fundus camera images. Our FunduSegmenter generally outperformed state-of-the-art baseline methods. The proposed modules are general and can be extended to fine-tuning other foundation models. Translational Relevance: The model shows strong stability and generalization on both in-distribution and out-of-distribution data, providing stable OD and OC segmentation. This is an essential step for many automated tasks, from setting the accurate retinal coordinate to biomarker discovery. The code and trained weights are available at: https://github.com/JusticeZzy/FunduSegmenter.
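Results above are reported as Dice similarity coefficients. For reference, a minimal sketch of the standard Dice computation on binary masks (e.g. an OD or OC mask); the function name `dice` and the `eps` smoothing term are our own choices, not taken from the FunduSegmenter code.

```python
import numpy as np

def dice(pred, target, eps=1e-8):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks A (pred) and B (target).

    eps keeps the score defined (and equal to 1) when both masks are empty.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
```

For joint OD/OC segmentation the score is typically computed per structure and then averaged.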

2508.09001 2026-05-21 cs.CL cs.AI cs.LG 版本更新

Retrospective Sparse Attention for Efficient Long-Context Generation

回顾性稀疏注意力用于高效长上下文生成

Seonghwan Choi, Beomseok Kang, Dongwon Jo, Jae-Joon Kim

发表机构 * Seoul National University(首尔国立大学)

AI总结 本文提出RetroAttention,一种新的KV缓存更新技术,利用后续解码步骤中新到达的KV条目回溯修正过去的注意力输出,从而在长上下文生成中兼顾效率与准确性。

详情
AI中文摘要

大型语言模型(LLMs)越来越多地应用于推理、代码生成和多轮对话等长上下文任务。然而,长上下文推理受限于键值(KV)缓存:其内存占用随序列长度线性增长,并在每个解码步骤中主导延迟。尽管最近的KV缓存压缩方法会识别并仅加载少量重要token,但它们主要关注输入上下文,未能解决长时间解码过程中累积的注意力误差。在本文中,我们提出了RetroAttention,一种新的KV缓存更新技术,利用后续解码步骤中新到达的KV条目回溯修正过去的注意力输出。通过维护一个轻量级的输出缓存,RetroAttention使过去的查询能够高效地补充更多上下文,同时仅带来极小的延迟开销。这打破了注意力输出一经计算便固定不变的范式,允许对先前的近似进行持续修正。在长生成基准上的大量实验表明,RetroAttention始终优于最先进的(SOTA)KV压缩方法,有效KV暴露量最多提高1.6倍,准确率最多提高21.9%。

英文摘要

Large Language Models (LLMs) are increasingly deployed in long-context tasks such as reasoning, code generation, and multi-turn dialogue. However, inference over extended contexts is bottlenecked by the Key-Value (KV) cache, whose memory footprint grows linearly with sequence length and dominates latency at each decoding step. While recent KV cache compression methods identify and load a small set of important tokens, they focus predominantly on input contexts and fail to address the cumulative attention errors that arise during long decoding. In this paper, we introduce RetroAttention, a novel KV cache update technique that retrospectively revises past attention outputs using newly arrived KV entries from subsequent decoding steps. By maintaining a lightweight output cache, RetroAttention enables past queries to be efficiently supplemented with more context, while incurring minimal latency overhead. This breaks the fixed-attention-output paradigm and allows continual correction of prior approximations. Extensive experiments on long-generation benchmarks show that RetroAttention consistently outperforms state-of-the-art (SOTA) KV compression methods, increasing effective KV exposure by up to 1.6× and accuracy by up to 21.9%.
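Revising a past attention output when new KV entries arrive can be done exactly if each cached output is stored together with its log-sum-exp normalizer, the same trick used in blockwise (online) softmax. The sketch below assumes that representation; `partial_attention` and `merge_outputs` are hypothetical names, and RetroAttention's actual output cache and its choice of which past queries to revise may differ.

```python
import numpy as np

def partial_attention(q, K, V):
    """Softmax attention of one query over a block of keys/values.

    Returns the normalized output and the block's log-sum-exp of scores,
    which is all that is needed to merge with other blocks later.
    """
    scores = K @ q / np.sqrt(q.shape[-1])
    m = scores.max()
    w = np.exp(scores - m)
    return (w @ V) / w.sum(), m + np.log(w.sum())

def merge_outputs(o_old, lse_old, o_new, lse_new):
    """Combine a cached output with attention over newly arrived KV entries."""
    lse = np.logaddexp(lse_old, lse_new)
    return np.exp(lse_old - lse) * o_old + np.exp(lse_new - lse) * o_new, lse
```

Merging the cached `(output, lse)` pair with attention over the new block gives the same result as recomputing attention over the union of both blocks, without reloading the old keys.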

2508.04999 2026-05-21 cs.LG 版本更新

Disentangling Bias by Modeling Intra- and Inter-modal Causal Attention for Multimodal Sentiment Analysis

通过建模模态内与模态间因果注意力解耦偏差的多模态情感分析

Menghua Jiang, Yuxia Lin, Baoliang Chen, Haifeng Hu, Yuncheng Jiang, Sijie Mai

发表机构 * School of Computer Science, South China Normal University(华南师范大学计算机学院) School of Electronics and Information Technology, Sun Yat-sen University(中山大学电子与信息学院)

AI总结 本文提出了一种多关系多模态因果干预(MMCI)框架,利用因果理论中的后门调整解决多模态情感分析中由统计捷径导致的偏差问题:将多模态输入建模为多关系图,并通过注意力机制分离因果特征与捷径特征,从而提升模型在分布偏移下的稳定性。

Comments Corrected several hyperparameter settings. Updated some experimental results

详情
AI中文摘要

多模态情感分析(MSA)旨在通过整合文本、音频和视觉等多种模态的信息来理解人类情感。然而,现有方法常受模态内和模态间虚假相关性的影响,导致模型依赖统计捷径而非真实的因果关系,从而削弱泛化能力。为缓解此问题,我们提出了一种多关系多模态因果干预(MMCI)框架,利用因果理论中的后门调整来处理此类捷径的混杂效应。具体而言,我们首先将多模态输入建模为多关系图,以显式捕捉模态内和模态间的依赖关系。然后,我们应用注意力机制,分别估计并分离与这些模态内和模态间关系对应的因果特征和捷径特征。最后,通过应用后门调整,我们对捷径特征进行分层,并将其与因果特征动态结合,促使MMCI在分布偏移下产生稳定的预测。在多个标准MSA数据集和分布外(OOD)测试集上的大量实验表明,我们的方法有效抑制了偏差并提升了性能。

英文摘要

Multimodal sentiment analysis (MSA) aims to understand human emotions by integrating information from multiple modalities, such as text, audio, and visual data. However, existing methods often suffer from spurious correlations both within and across modalities, leading models to rely on statistical shortcuts rather than true causal relationships, thereby undermining generalization. To mitigate this issue, we propose a Multi-relational Multimodal Causal Intervention (MMCI) framework, which leverages the backdoor adjustment from causal theory to address the confounding effects of such shortcuts. Specifically, we first model the multimodal inputs as a multi-relational graph to explicitly capture intra- and inter-modal dependencies. Then, we apply an attention mechanism to separately estimate and disentangle the causal features and shortcut features corresponding to these intra- and inter-modal relations. Finally, by applying the backdoor adjustment, we stratify the shortcut features and dynamically combine them with the causal features to encourage MMCI to produce stable predictions under distribution shifts. Extensive experiments on several standard MSA datasets and out-of-distribution (OOD) test sets demonstrate that our method effectively suppresses biases and improves performance.
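Backdoor adjustment replaces the observational P(y | x) with P(y | do(x)) = Σ_z P(y | x, z) P(z), averaging over confounder strata z with their prior rather than their x-conditional frequency. A toy discrete example makes the difference concrete; all probabilities below are invented for illustration, and MMCI applies the idea to learned shortcut-feature strata rather than to a table like this.

```python
# Toy confounder z in {0, 1} with prior P(z); x = 1 is more likely when z = 1.
p_z = {0: 0.5, 1: 0.5}
p_x1_given_z = {0: 0.2, 1: 0.8}
# P(y = 1 | x, z) for every (x, z) stratum.
p_y1 = {(1, 0): 0.3, (1, 1): 0.9, (0, 0): 0.1, (0, 1): 0.7}

def observational(x):
    """P(y=1 | x): strata are weighted by P(z | x), so the confounder leaks in."""
    joint = {z: (p_x1_given_z[z] if x == 1 else 1 - p_x1_given_z[z]) * p_z[z] for z in p_z}
    norm = sum(joint.values())
    return sum(p_y1[(x, z)] * joint[z] / norm for z in p_z)

def backdoor(x):
    """P(y=1 | do(x)) = sum_z P(y=1 | x, z) P(z)."""
    return sum(p_y1[(x, z)] * p_z[z] for z in p_z)
```

Here the observational gap observational(1) - observational(0) = 0.78 - 0.22 = 0.56, while the backdoor-adjusted gap is 0.6 - 0.4 = 0.2; the difference is the "shortcut" flowing through z.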

2508.02291 2026-05-21 cs.LG cs.AI 版本更新

FAIR-Pruner: A Flexible Framework for Automatic Layer-Wise Pruning via Tolerance of Difference

FAIR-Pruner: 一种通过差异容忍性实现自动分层剪枝的灵活框架

Chenqing Lin, Mostafa Hussien, Chengyao Yu, Bingyi Jing, Ruixing Ming, Kim Khoa Nguyen, Mohamed Cheriet

发表机构 * School of Statistics and Mathematics, Zhejiang Gongshang University(浙江工商大学统计与数学学院) École de technologie supérieure (ÉTS), Université du Québec(魁北克大学高等技术学院) Southern University of Science and Technology(南方科技大学)

AI总结 本文提出FAIR-Pruner,一种无需搜索的自适应分层结构化剪枝框架,通过引入差异容忍度(ToD)来实现非均匀的分层剪枝深度,从而在多个数据集和模型上实现了良好的准确率-压缩率权衡。

Comments Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence

详情
AI中文摘要

结构化剪枝是压缩深度神经网络的标准工具,但其实际性能取决于稀疏性如何在各层之间分配。我们提出了FAIR-Pruner,一种无需搜索的自适应分层结构化剪枝框架。FAIR-Pruner在每一层内使用两种排序:一种是面向去除的信号,用于提出候选单元;另一种是面向保护的信号,用于识别任务敏感的单元。其核心组件差异容忍度(ToD)衡量去除前缀与受保护尾部之间的重叠,并使用共享的容忍水平在各层诱导出非均匀的剪枝深度。作为默认的视觉实例,FAIR-Pruner结合了基于Wasserstein距离的U-Score(衡量类条件下的单元可分性)与基于Taylor展开的R-Score(衡量任务级敏感性);同一ToD分配规则也可以与其他去除信号搭配使用。理论上,我们通过总体R-Score分析ToD,推导出对进入剪枝集的高R-Score质量的基于排名的控制,并给出与均匀剪枝进行同预算比较时的加性交换条件。在CIFAR-10、CIFAR-100、SVHN和ImageNet上,基于VGG、ResNet、DenseNet、ConvNeXt和DeiT的实验显示了良好的准确率-压缩率权衡。在路由专家(routed-expert)模型Qwen1.5-MoE-A2.7B-Chat上的仅剪枝实验进一步检验了在匹配专家预算下该框架的架构可扩展性。FAIR-Pruner已作为可通过pip安装的开源软件包发布。

英文摘要

Structured pruning is a standard tool for compressing deep neural networks, but its practical performance depends on how sparsity is allocated across layers. We propose FAIR-Pruner, a search-free framework for adaptive layer-wise structured pruning. FAIR-Pruner uses two within-layer rankings: a removal-oriented signal that proposes candidate units and a protection-oriented signal that identifies task-sensitive units. Its core component, Tolerance of Difference (ToD), measures the overlap between the removal prefix and the protected tail, and uses a shared tolerance level to induce non-uniform pruning depths across layers. As a default vision instantiation, FAIR-Pruner combines a Wasserstein-based U-Score for class-conditional unit separability with a Taylor-based R-Score for task-level sensitivity; the same ToD allocation rule can also be paired with alternative removal signals. Theoretically, we analyze ToD through the population R-Score, derive rank-based control of the high-R-Score mass entering the pruning set, and identify an additive exchange condition for same-budget comparison with uniform pruning. Experiments on CIFAR-10, CIFAR-100, SVHN, and ImageNet across VGG, ResNet, DenseNet, ConvNeXt, and DeiT show strong accuracy–compression trade-offs. Prune-only experiments on routed-expert Qwen1.5-MoE-A2.7B-Chat further examine architectural extensibility under matched expert budgets. FAIR-Pruner is released as a pip-installable open-source package.
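The abstract does not give the ToD formula, so the following is a hypothetical simplification of the idea: walk a layer's removal ranking and stop once the number of protected (task-sensitive) units that would be removed exceeds a shared tolerance fraction of that layer's protected set. Because the same `tol` is used everywhere, layers whose protected units sit late in the removal order are pruned deeper. The names `prune_depth` and `tol` are illustrative, not the package's API.

```python
def prune_depth(removal_order, protected, tol):
    """Number of units to prune in one layer under a shared tolerance (hypothetical rule).

    removal_order: unit indices, most-removable first (e.g. ranked by a U-Score-like signal).
    protected: set of task-sensitive unit indices (e.g. top units by an R-Score-like signal).
    tol: fraction of the protected set allowed to fall inside the pruned prefix.
    """
    budget = int(tol * len(protected))  # protected units we may tolerate removing
    hits = 0
    depth = 0
    for unit in removal_order:
        if unit in protected:
            hits += 1
            if hits > budget:
                break  # the prefix would overlap the protected set too much
        depth += 1
    return depth

# One shared tolerance, different per-layer depths.
layers = {
    "conv1": ([0, 1, 2, 3, 4, 5], {2, 5}),
    "conv2": ([3, 1, 0, 2], {0, 3}),
}
depths = {name: prune_depth(order, prot, tol=0.5) for name, (order, prot) in layers.items()}
```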

2507.06929 2026-05-21 cond-mat.mtrl-sci cs.LG physics.comp-ph 版本更新

Machine-Learned Force Fields for Lattice Dynamics at Coupled-Cluster Level Accuracy

具有耦合簇级精度的晶格动力学机器学习力场

Sita Schönbauer, Johanna P. Carbone, Fredrik V. Eriksson, Florian Libisch, Andreas Grüneis

发表机构 * Institute of Theoretical Physics, Technical University of Vienna(维也纳技术大学理论物理研究所) Faculty of Physics and Center for Computational Materials Science, University of Vienna(维也纳大学物理系和计算材料科学中心)

AI总结 本文研究了在近似密度泛函理论(DFT)和耦合簇(CC)水平势能面上训练的机器学习力场,通过计算声子色散关系和振动态密度并与实验及参考从头算结果进行比较,验证了其在金刚石和氢化锂固体中的准确性和精度,并探讨了基于CC与DFT结果之差的delta学习方法以及电荷感知的机器学习力场方法。

Comments 17 pages, 7 figures

详情
AI中文摘要

我们研究了在近似密度泛函理论(DFT)和耦合簇(CC)水平势能面上训练的机器学习力场(MLFFs),用于金刚石和氢化锂固体的晶格动力学。我们通过计算声子色散关系和振动态密度(VDOS),并与实验及参考从头算结果进行比较,评估了MLFFs的准确性和精度。为克服CC训练数据中长程效应以及缺乏原子力所带来的限制,我们探讨了基于CC与DFT结果之差的delta学习方法以及电荷感知的MLFF方法。与DFT相比,基于CC理论训练的MLFFs给出的光学模式振动频率更高,与实验更为吻合。此外,我们还利用MLFFs在耦合簇理论水平上估计了非谐效应对氢化锂VDOS的影响。

英文摘要

We investigate Machine-Learned Force Fields (MLFFs) trained on approximate Density Functional Theory (DFT) and Coupled Cluster (CC) level potential energy surfaces for the carbon diamond and lithium hydride solids. We assess the accuracy and precision of the MLFFs by calculating phonon dispersions and vibrational densities of states (VDOS) that are compared to experiment and reference ab initio results. To overcome limitations from long-range effects and the lack of atomic forces in the CC training data, we explore a delta-learning approach based on the difference between CC and DFT results, as well as a charge-aware MLFF approach. Compared to DFT, MLFFs trained on CC theory yield higher vibrational frequencies for optical modes, agreeing better with experiment. Furthermore, the MLFFs are used to estimate anharmonic effects on the VDOS of lithium hydride at the level of CC theory.
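Delta learning fits only the correction E_CC - E_DFT, which is often smoother than either energy surface and can therefore be learned from few expensive CC points. A toy one-dimensional sketch: the two energy curves, the lattice-parameter grid, and the quadratic correction model are all invented for illustration; real MLFFs learn the correction from atomic-environment descriptors.

```python
import numpy as np

def e_dft(a):
    """Hypothetical cheap reference energy as a function of lattice parameter a."""
    return 0.5 * (a - 3.6) ** 2 - 7.0

def e_cc(a):
    """Hypothetical expensive target: DFT plus a smooth quadratic correction."""
    return e_dft(a) + 0.05 * (a - 3.5) ** 2 - 0.3

# Only three expensive CC points: fit the difference, not the full energy.
a_train = np.array([3.4, 3.55, 3.7])
coeffs = np.polyfit(a_train, e_cc(a_train) - e_dft(a_train), deg=2)

def e_delta(a):
    """Delta-learned estimate: cheap DFT energy plus the learned correction."""
    return e_dft(a) + np.polyval(coeffs, a)
```

Because the toy correction is exactly quadratic, three points recover it; in practice the correction model is only approximate and its error sets the CC-level accuracy.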

2507.06344 2026-05-21 quant-ph cs.CC cs.LG 版本更新

Gradient Scalability and Taylor Surrogation of Quantum Cost Landscapes

量子成本景观的梯度可扩展性与泰勒代理

Sabri Meyer, Francesco Scala, Francesco Tacchino, Aurelien Lucchi

发表机构 * Department of Mathematics and Computer Science, University of Basel(数学与计算机科学系,巴塞尔大学) IBM Quantum, IBM Research Europe -- Zurich(IBM量子,IBM欧洲研究院——苏黎世)

AI总结 本文研究了变分量子算法中初始化时的梯度可扩展性与计算复杂性之间的关系,提出了一种名为泰勒代理的经典模拟技术,并引入了线性克利福德编码器以确保梯度的常数级缩放;数值实验表明,在超多项式复杂度区域中,梯度可能呈多项式而非指数衰减。

Comments 12 pages, 6 figures, 54 pages of supplementary material

详情
AI中文摘要

变分量子算法是近期量子计算的有希望候选者,但它们面临贫瘠高原(barren plateaus)带来的可扩展性挑战,即梯度随系统规模呈指数消失。最近的猜想认为,避免这些高原可能本质上会导致经典可模拟性,从而限制量子优势的机会。在本文中,我们推进了对初始化时梯度可扩展性与变分量子算法计算复杂性之间关系的理论理解。我们首先提出了泰勒代理,一种经典模拟技术,它在近克利福德区域上达到与泡利路径方法相同的运行时间保证,并在特定情形下具有运行时间优势。利用这一代理,我们证明在先前已确立的经典可模拟区域之外,计算复杂性至少为超多项式。接着,我们引入了线性克利福德编码器,一种经典高效的拟设(ansatz)修改器,可确保在接近克利福德电路的景观区域内梯度呈常数级缩放。最后,在这些修改后景观上的数值实验提供了初步的经验证据,表明存在一个过渡区:在超多项式复杂的区域中,常数级缩放的梯度可能呈多项式而非指数衰减。这些发现提示了非消失梯度与超多项式复杂性可能共存的推测性实例,也表明未来需要形式化证明加以验证。

英文摘要

Variational Quantum Algorithms are promising candidates for near-term quantum computing, yet they face scalability challenges due to barren plateaus, where gradients vanish exponentially relative to system size. Recent conjectures suggest that avoiding these plateaus might inherently lead to classical simulability, thereby limiting the opportunities for quantum advantage. In this work, we advance the theoretical understanding of the relationship between gradient scalability at initialization and the computational complexity of variational quantum algorithms. We first present the Taylor surrogate, a classical simulation technique that matches Pauli path runtime guarantees on near-Clifford regions while offering runtime advantages in specific regimes. Leveraging this surrogate, we prove that beyond previously established classically simulable regions, the computational complexity is at least super-polynomial. Next, we introduce the Linear Clifford Encoder, a classically efficient ansatz modifier that ensures constant-scaling gradients within landscape regions close to Clifford circuits. Finally, numerical experiments on these modified landscapes provide preliminary empirical evidence of a transition zone where constant-scaling gradients may decay polynomially in super-polynomially complex regions rather than exponentially. These findings suggest speculative instances where non-vanishing gradients and super-polynomial complexity could potentially coexist, vindicating the need for future formal proofs.
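A one-parameter toy makes the surrogate idea concrete: at θ = 0 the RY gate is the identity (a Clifford point), and near it the cost can be replaced by a low-order Taylor polynomial whose coefficients are computed exactly with parameter shifts. This only illustrates Taylor-expanding a cost landscape near a Clifford point; the paper's Taylor surrogate handles many-parameter circuits, and its runtime guarantees are not reproduced here. The helper names are our own.

```python
import numpy as np

Z = np.diag([1.0, -1.0])

def ry(theta):
    """Single-qubit RY rotation (a real-valued matrix)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def cost(theta):
    """<Z> after RY(theta)|0>, which equals cos(theta)."""
    psi = ry(theta) @ np.array([1.0, 0.0])
    return float(psi @ Z @ psi)

def taylor_surrogate(theta0):
    """Second-order Taylor surrogate of cost() around theta0.

    First and second derivatives use the parameter-shift rule, exact for RY.
    """
    c0 = cost(theta0)
    c1 = (cost(theta0 + np.pi / 2) - cost(theta0 - np.pi / 2)) / 2
    c2 = (cost(theta0 + np.pi) - 2 * c0 + cost(theta0 - np.pi)) / 4
    return lambda t: c0 + c1 * (t - theta0) + 0.5 * c2 * (t - theta0) ** 2
```

Near the Clifford point the surrogate is accurate, and its error grows as θ moves away, which is why such surrogates are tied to near-Clifford regions.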

2506.21039 2026-05-21 cs.LG cs.AI 版本更新

Strict Subgoal Execution: Reliable Long-Horizon Planning in Hierarchical Reinforcement Learning

严格子目标执行:分层强化学习中可靠的长时域规划

Jaebak Hwang, Sanghyeon Lee, Jeongmo Kim, Seungyul Han

发表机构 * Graduate School of Artificial Intelligence(人工智能研究生院) Ulsan National Institute of Science and Technology (UNIST)(蔚山科学技术院) Ulsan, South Korea(韩国蔚山)

AI总结 本文提出严格子目标执行(SSE)框架,通过前沿经验回放(FER)区分不可达与可达的子目标,提高高层决策效率,从而在长时域任务中实现更可靠的规划。

Comments 10 pages for main, 26 pages for total, Accepted to ICLR 2026

详情
Journal ref
International Conference on Learning Representations (ICLR), 2026
AI中文摘要

长时域目标条件任务对强化学习(RL)提出了根本性挑战,尤其是在目标遥远且奖励稀疏的情况下。虽然分层方法和基于图的方法提供了部分解决方案,但它们依赖传统的后见重标记(hindsight relabeling),往往无法纠正子目标的不可行性,导致高层规划效率低下。为此,我们提出严格子目标执行(SSE),一种基于图的分层RL框架,它整合前沿经验回放(FER),以区分不可达子目标与可达子目标,并简化高层决策。FER利用失败和部分成功的转移刻画可达性前沿,从而识别不可靠的子目标,提高子目标的可靠性,并减少不必要的高层决策。此外,SSE采用解耦的探索策略来覆盖目标空间中探索不足的区域,并通过路径细化,利用观察到的低层失败来调整边的代价。在多样化的长时域基准测试中,SSE在效率和成功率方面均持续优于现有的目标条件RL和分层RL方法。我们的代码可在 https://jaebak1996.github.io/SSE/ 上获得。

英文摘要

Long-horizon goal-conditioned tasks pose fundamental challenges for reinforcement learning (RL), particularly when goals are distant and rewards are sparse. While hierarchical and graph-based methods offer partial solutions, their reliance on conventional hindsight relabeling often fails to correct subgoal infeasibility, leading to inefficient high-level planning. To address this, we propose Strict Subgoal Execution (SSE), a graph-based hierarchical RL framework that integrates Frontier Experience Replay (FER) to separate unreachable from admissible subgoals and streamline high-level decision making. FER delineates the reachability frontier using failure and partial-success transitions, which identifies unreliable subgoals, increases subgoal reliability, and reduces unnecessary high-level decisions. Additionally, SSE employs a decoupled exploration policy to cover underexplored regions of the goal space and a path refinement that adjusts edge costs using observed low-level failures. Experimental results across diverse long-horizon benchmarks show that SSE consistently outperforms existing goal-conditioned and hierarchical RL methods in both efficiency and success rate. Our code is available at https://jaebak1996.github.io/SSE/
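The path-refinement step ("adjusts edge costs using observed low-level failures") can be pictured as shortest-path planning over a subgoal graph where each edge's cost is inflated by its failure count. A minimal sketch under assumptions: the linear `penalty * failures` rule, the dict-based graph, and `plan_subgoals` are hypothetical, not SSE's published cost update.

```python
import heapq

def plan_subgoals(edges, fails, start, goal, penalty=5.0):
    """Dijkstra over a subgoal graph with failure-penalized edge costs (hypothetical rule).

    edges: dict node -> list of (neighbor, base_cost)
    fails: dict (u, v) -> number of observed low-level failures on that edge
    Returns the list of nodes from start to goal, or None if goal is unreachable.
    """
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, c in edges.get(u, []):
            nd = d + c + penalty * fails.get((u, v), 0)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if goal not in dist:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]
```

Repeated failures on an edge make the planner route around it, so the high-level policy stops proposing subgoals the low-level policy cannot reach.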

2506.17631 2026-05-21 cs.LG cs.AI 版本更新

Time-Prompt: Integrated Heterogeneous Prompts for Unlocking LLMs in Time Series Forecasting

Time-Prompt: 集成异构提示以释放LLM在时间序列预测中的潜力

Zesen Wang, Lijuan Lan, Yonggang Li

发表机构 * Central South University, Changsha, China(中南大学,长沙,中国)

AI总结 本文提出Time-Prompt框架,通过构建统一的提示范式、设计语义空间嵌入和跨模态对齐模块以及高效微调LLM参数,提升时间序列预测性能,并在碳排放数据集上验证其有效性。

Comments Accepted at IJCNN 2026

详情
AI中文摘要

时间序列预测旨在建模变量间的时序依赖关系以推断未来状态,对现实世界场景具有重要性和广泛应用。尽管基于深度学习的方法已取得显著进展,但其在长期预测中仍表现不佳。最近研究表明,大型语言模型(LLMs)在时间序列预测中表现出色,但其在该任务中的实用性仍存疑。为此,我们提出Time-Prompt框架,旨在激活LLMs进行时间序列预测。具体而言,我们首先构建了一个统一的提示范式,利用可学习的软提示引导LLM的行为,并利用文本化的硬提示增强时间序列表示。其次,为了增强LLM对预测任务的全面理解,我们设计了一个语义空间嵌入和跨模态对齐模块,以实现时序和文本数据的融合。最后,我们利用时间序列数据高效地微调LLM的参数。此外,我们专注于碳排放领域,旨在为全球碳中和做出贡献。在6个公开数据集和3个碳排放数据集上的综合评估表明,Time-Prompt是一个强大的时间序列预测框架。

英文摘要

Time series forecasting aims to model temporal dependencies among variables for future state inference, holding significant importance and widespread applications in real-world scenarios. Although deep learning-based methods have achieved remarkable progress, they still exhibit suboptimal performance in long-term forecasting. Recent research demonstrates that large language models (LLMs) achieve promising performance in time series forecasting, but this progress is still met with skepticism about whether LLMs are truly useful for this task. To address this, we propose Time-Prompt, a framework for activating LLMs for time series forecasting. Specifically, we first construct a unified prompt paradigm with learnable soft prompts to guide the LLM's behavior and textualized hard prompts to enhance the time series representations. Second, to enhance the LLM's comprehensive understanding of the forecasting task, we design a semantic space embedding and cross-modal alignment module to achieve fusion of temporal and textual data. Finally, we efficiently fine-tune the LLM's parameters using time series data. Furthermore, we focus on carbon emissions, aiming to provide a modest contribution to global carbon neutrality. Comprehensive evaluations on 6 public datasets and 3 carbon emission datasets demonstrate that Time-Prompt is a powerful framework for time series forecasting.

2505.19075 2026-05-21 cs.AI cs.CL cs.LG 版本更新

Universal Reasoner: A Single, Composable Plug-and-Play Reasoner for Frozen LLMs

Universal Reasoner: 面向冻结LLM的单一、可组合、即插即用推理器

Jaemin Kim, Hangeol Chang, Hyunmin Hwang, Choonghan Kim, Jong Chul Ye

发表机构 * Graduate School of Artificial Intelligence, Korea Advanced Institute of Science and Technology(人工智能研究生院,韩国科学技术院)

AI总结 本文提出Universal Reasoner,一种可组合且即插即用的推理模块,能够在冻结的大规模语言模型上提供专门的推理能力,通过共享或对齐的token空间实现弱到强的泛化,实验表明其在数学推理和机器翻译中优于现有微调方法。

Comments ICML 2026

详情
英文摘要

Large Language Models (LLMs) have demonstrated remarkable general capabilities, but enhancing skills such as reasoning often demands substantial computational resources and may compromise generalization. While Parameter-Efficient Fine-Tuning (PEFT) methods offer a more resource-conscious alternative, they typically require retraining for each LLM backbone due to architectural dependencies. To address these challenges, we propose Universal Reasoner (UniR), a modular, composable, and plug-and-play reasoning module that can be used with larger frozen LLMs to provide specialized reasoning capabilities with a shared or aligned token space. Specifically, UniR decomposes the reward into a standalone reasoning module trained in a decoupled manner using verifiable rewards, effectively translating trajectory-level signals into token-level guidance. Once trained, UniR is combined with frozen LLMs at inference time by simply adding its output logits to those of the backbone. This additive structure enables modular composition: multiple UniR modules trained for different tasks can be jointly applied by summing their logits, enabling complex reasoning via composition. Furthermore, UniR demonstrates weak-to-strong generalization, where reasoning modules trained on smaller models effectively guide much larger LLMs in the same model family, and generalize across domains such as in vision language models and medical reasoning. Experiments on mathematical reasoning and machine translation show that UniR surpasses existing fine-tuning methods. Code is open-sourced at https://github.com/hangeol/UniR.
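The inference-time combination described above is additive in logit space: the next-token distribution is softmax(backbone + Σ module logits), which only makes sense when all models share the same vocabulary. A minimal sketch; `compose_logits` and the optional per-module `weights` are hypothetical (the abstract describes plain summation, i.e. weights of 1).

```python
import numpy as np

def compose_logits(backbone_logits, module_logits, weights=None):
    """Sum frozen-backbone logits with one or more reasoning-module logits.

    All logit vectors must share the same token space. `weights` is a
    hypothetical per-module scale; plain summation corresponds to weights of 1.
    """
    total = np.asarray(backbone_logits, dtype=float).copy()
    if weights is None:
        weights = [1.0] * len(module_logits)
    for w, m in zip(weights, module_logits):
        m = np.asarray(m, dtype=float)
        if m.shape != total.shape:
            raise ValueError("module and backbone must share the token space")
        total += w * m
    return total

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)
```

For example, a backbone preferring token 0 (logits [2, 1, 0]) combined with a module favoring token 1 (logits [0, 2, 0]) yields [2, 3, 0], so the composed model picks token 1.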

2505.07054 2026-05-21 eess.SY cs.LG cs.SY math.OC 版本更新

YANNs: Y-wise Affine Neural Networks for Exact and Efficient Representations of Piecewise Linear Functions

Austin Braniff, Yuhe Tian

发表机构 * Department of Chemical and Biomedical Engineering, West Virginia University, United States of America(化学与生物医学工程系,西弗吉尼亚大学,美国)

AI总结 本文提出YANNs,一种能够精确且高效表示分段线性函数的Y-wise仿射神经网络,无需训练即可实现功能等效表示,为多参数模型预测控制提供了应用展示,展示了在实时计算中的高效性与控制理论保证。

详情
Journal ref
Computers & Chemical Engineering, Volume 208, 109589 (2026)
AI中文摘要

本文正式提出了Y-wise仿射神经网络(YANNs),这是一种完全可解释的网络架构,能够连续且高效地表示定义在多面体子域上的分段仿射函数。相关证明表明,构建YANNs无需训练即可得到功能等价的表示,因此YANNs保留了原始表述的全部数学性质。本文以多参数模型预测控制作为YANNs的应用示例,该方法在理论上将最优控制律计算为状态、输出、设定点和扰动的分段仿射函数。由于精确表示了多参数控制律,YANNs保留了递归可行性和稳定性等关键的控制理论保证。这使YANNs区别于现有工作:后者使用神经网络近似最优控制律,而非精确表示。通过优化网络推理速度,YANNs在实时计算中比传统的分段仿射函数计算快得多。数值案例研究展示了算法在输入/输出维度和子域数量方面的可扩展性。作为首个内在确保可行性和稳定性的神经网络控制器,YANNs代表了控制领域的重大进展。未来的应用可将其作为数据驱动建模/控制的高效且可解释的起点。

英文摘要

This work formally introduces Y-wise Affine Neural Networks (YANNs), a fully explainable network architecture that continuously and efficiently represents piecewise affine functions with polytopic subdomains. Following from the proofs, it is shown that the development of YANNs requires no training to achieve the functionally equivalent representation. YANNs thus maintain all mathematical properties of the original formulations. Multi-parametric model predictive control is utilized as an application showcase of YANNs, which theoretically computes optimal control laws as a piecewise affine function of states, outputs, setpoints, and disturbances. With the exact representation of multi-parametric control laws, YANNs retain essential control-theoretic guarantees such as recursive feasibility and stability. This sets YANNs apart from the existing works which apply neural networks for approximating optimal control laws instead of exactly representing them. By optimizing the inference speed of the networks, YANNs can evaluate substantially faster in real-time compared to traditional piecewise affine function calculations. Numerical case studies are presented to demonstrate the algorithmic scalability with respect to the input/output dimensions and the number of subdomains. YANNs represent a significant advancement in control as the first neural network-based controller that inherently ensures both feasibility and stability. Future applications can leverage them as an efficient and interpretable starting point for data-driven modeling/control.
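The claim that an exact network representation needs no training can be illustrated in the simplest case: any continuous piecewise affine function of one variable equals an affine term plus a weighted sum of ReLUs placed at its breakpoints, with weights read off from the slope changes. This standard construction is shown only as an illustration; it is not the YANN architecture, which handles multivariate inputs on polytopic subdomains. `pwa_to_relu` is a hypothetical helper.

```python
import numpy as np

def pwa_to_relu(breakpoints, slopes, intercept0):
    """Exact one-hidden-layer ReLU form of a continuous 1-D piecewise affine function.

    slopes[k] is the slope on the k-th interval (len(slopes) == len(breakpoints) + 1),
    and intercept0 + slopes[0] * x is the function left of the first breakpoint.
    f(x) = intercept0 + slopes[0] * x + sum_k (slopes[k+1] - slopes[k]) * relu(x - b_k)
    """
    bp = np.asarray(breakpoints, dtype=float)
    s = np.asarray(slopes, dtype=float)
    out_weights = np.diff(s)  # slope change at each breakpoint; no training involved

    def f(x):
        x = np.asarray(x, dtype=float)
        hidden = np.maximum(0.0, x[..., None] - bp)  # ReLU units, one per breakpoint
        return intercept0 + s[0] * x + hidden @ out_weights

    return f

# A saturated control law u = clip(x, -1, 1), typical of explicit MPC with input bounds.
sat = pwa_to_relu(breakpoints=[-1.0, 1.0], slopes=[0.0, 1.0, 0.0], intercept0=-1.0)
```

Because the weights are computed directly from the function's pieces, the network reproduces the control law exactly rather than approximately.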

2502.12120 2026-05-21 cs.LG cs.AI cs.CL 版本更新

LLMs on the Line: Data Determines Loss-to-Loss Scaling Laws

LLMs on the Line: 数据决定损失-损失缩放定律

Prasanna Mayilvahanan, Thaddäus Wiedemer, Sayak Mallick, Matthias Bethge, Wieland Brendel

发表机构 * Max Planck Institute for Intelligent Systems(智能系统马克斯·普朗克研究所) ELLIS Institute Tübingen(图宾根ELLIS研究所) Tübingen AI Center(图宾根人工智能中心) University of Tübingen(图宾根大学)

AI总结 研究探讨了影响LLM损失-损失缩放定律的主要因素,发现预训练数据决定了缩放趋势,而模型大小、优化超参数、分词器和架构差异对缩放影响有限,因此应精心选择预训练数据以获得最佳下游性能。

Comments ICML 2025 camera-ready version

详情
AI中文摘要

缩放定律指导大型语言模型(LLMs)的发展,通过提供模型大小、令牌和计算量之间的最佳平衡估计。最近,损失-损失缩放定律,即预训练数据集和下游任务之间损失的关系,已成为理解并改进LLM性能和泛化能力的强大工具。在本工作中,我们研究了哪些因素最强烈地影响损失-损失缩放。我们的实验发现,预训练数据决定了缩放趋势。相比之下,模型大小、优化超参数、分词器甚至显著的架构差异,如基于Transformer的模型如Llama和状态空间模型如Mamba之间的差异,通常影响有限。因此,从业者应仔细选择适合的预训练数据集以获得最佳下游性能,而架构和其他设置可以自由优化以提高训练效率。

英文摘要

Scaling laws guide the development of large language models (LLMs) by offering estimates for the optimal balance of model size, tokens, and compute. More recently, loss-to-loss scaling laws that relate losses across pretraining datasets and downstream tasks have emerged as a powerful tool for understanding and improving LLM performance and generalization. In this work, we investigate which factors most strongly influence loss-to-loss scaling. Our experiments reveal that the pretraining data determines the scaling trend. In contrast, model size, optimization hyperparameters, tokenizer and even significant architectural differences, such as between transformer-based models like Llama and state-space models like Mamba, generally have limited impact. Consequently, practitioners should carefully curate suitable pretraining datasets for optimal downstream performance, while architectures and other settings can be freely optimized for training efficiency.
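Loss-to-loss relations are commonly modeled as a shifted power law, L_y ≈ K (L_x - E_x)^κ + E_y, where E_x and E_y are irreducible-loss offsets. Assuming that form and known offsets (an assumption; in practice the offsets are fitted as well), K and κ can be recovered with a straight-line fit in log space. The data below is synthetic, generated with K = 1.5 and κ = 0.8 purely for illustration; `fit_loss_to_loss` is a hypothetical helper.

```python
import numpy as np

def fit_loss_to_loss(lx, ly, ex, ey):
    """Fit ly = K * (lx - ex) ** kappa + ey with the offsets ex, ey held fixed.

    Taking logs gives log(ly - ey) = log K + kappa * log(lx - ex), a linear fit.
    """
    slope, intercept = np.polyfit(np.log(lx - ex), np.log(ly - ey), deg=1)
    return np.exp(intercept), slope  # (K, kappa)

# Synthetic pretraining-loss vs downstream-loss pairs (illustrative values only).
lx = np.linspace(2.5, 4.0, 10)
ly = 1.5 * (lx - 2.0) ** 0.8 + 1.0
K, kappa = fit_loss_to_loss(lx, ly, ex=2.0, ey=1.0)
```

In this framing, the paper's finding is that the fitted curve shifts mainly with the pretraining data, while architecture, tokenizer, and optimization settings move models along roughly the same line.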

2502.03752 2026-05-21 cs.LG cs.AI 版本更新

Self-Improving Skill Learning for Robust Skill-based Meta-Reinforcement Learning

面向鲁棒的基于技能的元强化学习的自我改进技能学习

Sanghyeon Lee, Sangjun Bae, Yisak Park, Seungyul Han

发表机构 * Graduate School of Artificial Intelligence(人工智能研究生院) Ulsan National Institute of Science and Technology (UNIST)(蔚山科学技术院 (UNIST))

AI总结 本文提出Self-Improving Skill Learning (SISL)方法,通过解耦的高层和技能改进策略进行自我指导的技能细化,并利用最大回报重标记进行技能优先级排序,从而在噪声和次优数据下实现鲁棒且稳定的适应,优于其他基于技能的元强化学习方法。

Comments 10 pages main, 27 pages appendix with reference. Accepted to ICLR 2026

详情
Journal ref
International Conference on Learning Representations (ICLR), 2026
AI中文摘要

元强化学习(Meta-RL)能够快速适应未见任务,但在长时域(long-horizon)环境中面临挑战。基于技能的方法通过将状态-动作序列分解为可重用的技能并采用分层决策来解决这一问题。然而,这些方法对含噪声的离线演示高度敏感,导致技能学习不稳定和性能下降。为此,我们提出Self-Improving Skill Learning (SISL),通过解耦的高层策略和技能改进策略进行自我指导的技能细化,同时通过最大回报重标记进行技能优先级排序,使更新聚焦于与任务相关的轨迹,从而即使在噪声和次优数据下也能实现鲁棒且稳定的适应。通过减轻噪声的影响,SISL实现了可靠的技能学习,并在多样化的长时域任务上一致优于其他基于技能的元强化学习方法。我们的代码可在https://epsilog.github.io/SISL获取。

英文摘要

Meta-reinforcement learning (Meta-RL) facilitates rapid adaptation to unseen tasks but faces challenges in long-horizon environments. Skill-based approaches tackle this by decomposing state-action sequences into reusable skills and employing hierarchical decision-making. However, these methods are highly susceptible to noisy offline demonstrations, leading to unstable skill learning and degraded performance. To address this, we propose Self-Improving Skill Learning (SISL), which performs self-guided skill refinement using decoupled high-level and skill improvement policies, while applying skill prioritization via maximum return relabeling to focus updates on task-relevant trajectories, resulting in robust and stable adaptation even under noisy and suboptimal data. By mitigating the effect of noise, SISL achieves reliable skill learning and consistently outperforms other skill-based meta-RL methods on diverse long-horizon tasks. Our code is available at https://epsilog.github.io/SISL.

2502.02844 2026-05-21 cs.LG cs.AI cs.CR cs.MA 版本更新

Wolfpack Adversarial Attack for Robust Multi-Agent Reinforcement Learning

面向鲁棒多智能体强化学习的狼群对抗攻击

Sunwoo Lee, Jaebak Hwang, Yonghyeon Jo, Seungyul Han

发表机构 * Graduate School of Artificial Intelligence, UNIST, Ulsan, South Korea(人工智能研究生院,UNIST,韩国蔚山)

AI总结 本文提出狼群对抗攻击框架,用于对抗多智能体强化学习中的协同对抗攻击,并引入狼群-对抗学习框架来训练鲁棒的MARL策略以防御该攻击。

Comments 9 pages main, 23 pages appendix with reference. Accepted by ICML 2025

详情
Journal ref
Proceedings of Machine Learning Research (PMLR), ICML 2025
AI中文摘要

传统多智能体强化学习(MARL)中的鲁棒方法往往难以应对合作场景中的协调对抗攻击。为了解决这一限制,我们提出了受狼群狩猎策略启发的狼群对抗攻击框架,该框架针对初始智能体及其辅助智能体以破坏合作。此外,我们还引入了狼群-对抗学习用于MARL(WALL)框架,该框架通过促进系统内协作来训练鲁棒的MARL策略以防御所提出的狼群攻击。实验结果突显了狼群攻击的毁灭性影响以及WALL所实现的显著鲁棒性改进。我们的代码可在https://github.com/sunwoolee0504/WALL上获得。

英文摘要

Traditional robust methods in multi-agent reinforcement learning (MARL) often struggle against coordinated adversarial attacks in cooperative scenarios. To address this limitation, we propose the Wolfpack Adversarial Attack framework, inspired by wolf hunting strategies, which targets an initial agent and its assisting agents to disrupt cooperation. Additionally, we introduce the Wolfpack-Adversarial Learning for MARL (WALL) framework, which trains robust MARL policies to defend against the proposed Wolfpack attack by fostering system-wide collaboration. Experimental results underscore the devastating impact of the Wolfpack attack and the significant robustness improvements achieved by WALL. Our code is available at https://github.com/sunwoolee0504/WALL.

2502.02834 2026-05-21 cs.LG cs.AI 版本更新

Task-Aware Virtual Training: Enhancing Generalization in Meta-Reinforcement Learning for Out-of-Distribution Tasks

任务感知虚拟训练:增强元强化学习在分布外任务中的泛化能力

Jeongmo Kim, Yisak Park, Minung Kim, Seungyul Han

发表机构 * Graduate School of Artificial Intelligence, UNIST, Ulsan, South Korea(人工智能研究生院,UNIST,韩国蔚山)

AI总结 本文提出Task-Aware Virtual Training方法,通过度量学习提升元强化学习在分布外任务中的泛化能力,采用虚拟任务保持任务特征并利用状态正则化技术减少状态变化环境中的过估计误差。

Comments 9 pages main paper, 20 pages appendices with reference. Accepted to ICML 2025

详情
Journal ref
Proceedings of Machine Learning Research (PMLR), ICML 2025
AI中文摘要

元强化学习旨在开发能够泛化到从任务分布中采样的未见任务的策略。尽管基于上下文的元强化学习方法通过任务潜在变量改善了任务表示,但它们在分布外(OOD)任务上常常表现不佳。为了解决这个问题,我们提出了Task-Aware Virtual Training(TAVT),这是一种新算法,通过基于度量的表示学习,在训练和OOD场景中均能准确捕捉任务特征。我们的方法在虚拟任务中成功保持了任务特征,并采用状态正则化技术以减轻状态变化环境中的过估计误差。数值结果表明,TAVT在多种MuJoCo和MetaWorld环境中显著增强了对OOD任务的泛化能力。我们的代码可在https://github.com/JM-Kim-94/tavt.git获取。

英文摘要

Meta-reinforcement learning aims to develop policies that generalize to unseen tasks sampled from a task distribution. While context-based meta-RL methods improve task representation using task latents, they often struggle with out-of-distribution (OOD) tasks. To address this, we propose Task-Aware Virtual Training (TAVT), a novel algorithm that accurately captures task characteristics for both training and OOD scenarios using metric-based representation learning. Our method successfully preserves task characteristics in virtual tasks and employs a state regularization technique to mitigate overestimation errors in state-varying environments. Numerical results demonstrate that TAVT significantly enhances generalization to OOD tasks across various MuJoCo and MetaWorld environments. Our code is available at https://github.com/JM-Kim-94/tavt.git.

2501.02407 2026-05-21 cs.CL cs.CR cs.LG 版本更新

Towards the Anonymization of the Language Modeling

迈向语言模型的匿名化

Antoine Boutet, Lucas Magnana, Juliette Sénéchal

发表机构 * INSA Lyon, Inria, CITI, UR3720(里昂国立应用科学学院、法国国家信息与自动化研究所、CITI、UR3720) Inria, INSA Lyon, CITI, UR3720(法国国家信息与自动化研究所、里昂国立应用科学学院、CITI、UR3720)

AI总结 本文提出了一种隐私保护的语言模型方法,通过掩码语言模型(MLM)和因果语言模型(CLM)方法,旨在解决语言模型的匿名化问题,从而促进其共享。研究通过医疗数据集评估了这两种方法,并表明在避免记忆直接和间接标识信息的同时,能够保持高隐私性和高实用性。

详情
AI中文摘要

自然语言处理(NLP)的快速发展已经革新了许多领域,包括医疗保健。然而,这些进展带来了显著的隐私问题,特别是当预训练模型在敏感数据上进行微调和专门化时,可能会记住并暴露个人信息。本文提出了一种隐私保护的语言模型方法,以解决语言模型的匿名化问题,从而促进其共享。具体来说,我们提出了掩码语言模型(MLM)方法,用于专门化类似于BERT的语言模型,以及因果语言模型(CLM)方法,用于专门化类似于GPT的语言模型,以避免模型记住训练数据中直接和间接的标识信息。我们使用医疗数据集全面评估了我们的方法,并将其与不同的基线进行了比较。我们的结果表明,通过在模型专门化过程中避免记忆直接和间接的标识符,我们的掩码和因果语言模型方案在保持高隐私性的同时,能够保持高实用性。

英文摘要

Rapid advances in Natural Language Processing (NLP) have revolutionized many fields, including healthcare. However, these advances raise significant privacy concerns, especially when pre-trained models that are fine-tuned and specialized on sensitive data can memorize, and then expose and regurgitate, personal information. This paper presents a privacy-preserving language modeling approach to address the problem of language model anonymization, and thus promote the sharing of such models. Specifically, we propose both a Masking Language Modeling (MLM) methodology to specialize a BERT-like language model and a Causal Language Modeling (CLM) methodology to specialize a GPT-like model, which prevent the model from memorizing direct and indirect identifying information present in the training data. We have comprehensively evaluated our approaches using a medical dataset and compared them against different baselines. Our results indicate that by avoiding memorizing both direct and indirect identifiers during model specialization, our masking and causal language modeling schemes offer a good tradeoff between maintaining high privacy and retaining high utility.

2412.14738 2026-05-21 cs.LG 版本更新

Spectrally unstable nodes drive reliability failures in graph learning

谱不稳定节点导致图学习中的可靠性失效

Yongyu Wang

发表机构 * MTU(MTU大学)

AI总结 研究探讨了图学习中谱不稳定性节点对可靠性故障的影响,提出了一种可靠性感知干预方法以隔离这些节点,从而提升算法在对抗性和内在噪声下的鲁棒性。

详情
AI中文摘要

图学习算法在图结构受到对抗性扰动、本身含有噪声或由不完美观测构建时可能会失效。本文表明,在对抗性扰动和内在噪声损害图学习算法的过程中,某些节点比其他节点承担着大得多的责任。基于图谱畸变分析,我们识别出这些导致失效的节点,并引入一种可靠性感知干预,将其从主要学习步骤中隔离出去。目标算法在稳定的诱导子图上运行,被隔离节点的预测则通过基于拓扑或基于质心的传播来恢复。在遭受针对性和非针对性结构攻击的图神经网络、谱超图聚类以及多视图谱聚类中,这一原理在对抗性噪声和内在噪声下均提高了可靠性。这些结果表明,节点层面的谱不稳定性为理解和缓解图学习中的可靠性失效提供了一个共同机制。

英文摘要

Graph-learning algorithms can fail when graph structure is adversarially perturbed, intrinsically noisy or constructed from imperfect observations. Here we show that some nodes bear much greater responsibility than others for allowing adversarial perturbations and intrinsic noise to harm graph-learning algorithms. Building on graph-spectral distortion analysis, we identify these failure-driving nodes and introduce a reliability-aware intervention that isolates them from the main learning step. The target algorithm is applied to a stable induced subgraph, and predictions for isolated nodes are recovered through topology- or centroid-based propagation. Across graph neural networks under targeted and non-targeted structural attacks, spectral hypergraph clustering and multi-view spectral clustering, this principle improves reliability under both adversarial and intrinsic noise. These results suggest that node-level spectral instability provides a common mechanism for understanding and mitigating reliability failures in graph learning.
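
The recovery step for isolated nodes can be pictured with a small sketch that rests on the following assumptions:

- The set of spectrally unstable nodes is already given. The graph-spectral distortion analysis that selects them is not reproduced.
- Labels on the stable subgraph come from any base algorithm.
- "Topology-based propagation" is taken to mean a majority vote over labeled neighbours, repeated so that labels can reach isolated nodes that have no stable neighbour.

The function `propagate_to_isolated` is hypothetical.

```python
from collections import Counter
from collections.abc import Hashable
from typing import Dict, Iterable, List


def propagate_to_isolated(
    adj: Dict[Hashable, List[Hashable]],
    stable_labels: Dict[Hashable, int],
    isolated: Iterable[Hashable],
    max_rounds: int = 10,
) -> Dict[Hashable, int]:
    """Label isolated nodes by majority vote over already-labelled neighbours.

    Nodes with no labelled neighbour wait for a later round, so labels can spread
    along chains of isolated nodes. Ties go to the smallest label.
    """
    labels = dict(stable_labels)
    pending = set(isolated)
    for _ in range(max_rounds):
        newly = {}
        for v in pending:
            votes = Counter(labels[u] for u in adj.get(v, []) if u in labels)
            if votes:
                top = max(votes.values())
                newly[v] = min(lbl for lbl, count in votes.items() if count == top)
        if not newly:
            break
        labels.update(newly)
        pending.difference_update(newly)
    return labels


# Hypothetical graph: c and e were isolated; e touches only c.
adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d", "e"], "d": ["c"], "e": ["c"]}
print(propagate_to_isolated(adj, {"a": 0, "b": 0, "d": 1}, isolated=["c", "e"]))
# {'a': 0, 'b': 0, 'd': 1, 'c': 0, 'e': 0}
```

Node e has no stable neighbour, so it only receives a label in the second round, after c has been labeled.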

2410.23212 2026-05-21 stat.ML cs.LG math.ST stat.TH 版本更新

Improved convergence rate of kNN graph Laplacians: differentiable self-tuned affinity

kNN图拉普拉斯算子的改进收敛速度:可微自调亲和力

Xiuyuan Cheng, Yixuan Tan, Nan Wu

发表机构 * Department of Mathematics, Duke University(杜克大学数学系) Department of Mathematical Sciences, The University of Texas at Dallas(德克萨斯大学达拉斯分校数学科学系)

AI总结 本文研究了kNN图的收敛速度问题,提出了一种可微自调亲和力的方法,通过改进分析得到在流形数据设定下,kNN图拉普拉斯算子以O(N^{-2/(d+6)})的速度收敛到极限流形算子,验证了理论结果。

详情
AI中文摘要

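在基于图的数据分析中,k最近邻(kNN)图因其对局部数据密度的适应性而被广泛使用。允许图中的边带权重时,核化图亲和力提供了一种更一般的kNN图,其中kNN距离用于自适应地设置核带宽。本文考虑一类一般的kNN图,其图亲和力为 $W_{ij} = \epsilon^{-d/2} k_0\left( \frac{\| x_i - x_j \|^2}{\epsilon\, \phi(\hat\rho(x_i), \hat\rho(x_j))^2} \right)$,其中 $\hat\rho(x)$ 是点 $x$ 处(重新缩放的)kNN距离,$\phi$ 是一个对称的二元函数,$k_0$ 是 $[0,\infty)$ 上的非负函数。在流形数据设定下,即 $N$ 个独立同分布样本 $x_i$ 从嵌入高维欧氏空间的未知 $d$ 维流形上的密度 $p$ 中抽取,我们证明了当 $k_0$ 和 $\phi$ 具有 $C^3$ 正则性并满足其他技术条件时,kNN图拉普拉斯算子以 $O(N^{-2/(d+6)})$ 的速率(相差一个对数因子)逐点收敛到极限流形算子(依赖于 $p$)。该速率在 $\epsilon \sim N^{-2/(d+6)}$ 且 $k \sim N^{6/(d+6)}$ 时取得,二者均为平衡理论偏差与方差误差的最优阶。改进的收敛速率基于对kNN估计量的精细分析,该分析本身也可能具有独立意义。我们通过模拟数据上的数值实验验证了理论结果。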

英文摘要

In graph-based data analysis, $k$-nearest neighbor ($k$NN) graphs are widely used due to their adaptivity to local data densities. Allowing weighted edges in the graph, the kernelized graph affinity provides a more general type of $k$NN graph where the $k$NN distance is used to set the kernel bandwidth adaptively. In this work, we consider a general class of $k$NN graph where the graph affinity is $W_{ij} = \epsilon^{-d/2} k_0\left( \frac{\| x_i - x_j \|^2}{\epsilon\, \phi(\hat\rho(x_i), \hat\rho(x_j))^2} \right)$, with $\hat\rho(x)$ being the (rescaled) $k$NN distance at the point $x$, $\phi$ a symmetric bi-variate function, and $k_0$ a non-negative function on $[0,\infty)$. Under the manifold data setting, where $N$ i.i.d. samples $x_i$ are drawn from a density $p$ on a $d$-dimensional unknown manifold embedded in a high-dimensional Euclidean space, we prove the operator pointwise convergence of the $k$NN graph Laplacian to the limiting manifold operator (depending on $p$) at the rate of $O(N^{-2/(d+6)})$, up to a log factor, when $k_0$ and $\phi$ have $C^3$ regularity and satisfy other technical conditions. This is obtained when $\epsilon \sim N^{-2/(d+6)}$ and $k \sim N^{6/(d+6)}$, both at the optimal order to balance the theoretical bias and variance errors. Our improved convergence rate is based on a refined analysis of the $k$NN estimator, which can be of independent interest. We validate our theory by numerical experiments on simulated data.
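
The self-tuned affinity can be sketched directly. The sketch makes these assumptions:

- $k_0(r) = e^{-r}$ and $\phi(u, v) = \sqrt{uv}$.
- $\hat\rho$ is the unrescaled $k$-th neighbour distance.
- The graph Laplacian is the random-walk normalisation $f - D^{-1} W f$.

The paper's rescaling of $\hat\rho$, its exact Laplacian normalisation and its constants are not reproduced. `optimal_orders` only evaluates the stated scalings up to constants. For $d = 2$ they give $\epsilon \sim N^{-1/4}$ and $k \sim N^{3/4}$.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]


def knn_distance(X: Sequence[Point], k: int) -> List[float]:
    """Distance from each point to its k-th nearest neighbour, excluding the point itself."""
    rho = []
    for i, xi in enumerate(X):
        dists = sorted(math.dist(xi, xj) for j, xj in enumerate(X) if j != i)
        rho.append(dists[k - 1])
    return rho


def self_tuned_affinity(
    X: Sequence[Point],
    k: int,
    eps: float,
    d: int,
    k0: Callable[[float], float] = lambda r: math.exp(-r),
    phi: Callable[[float, float], float] = lambda u, v: math.sqrt(u * v),
) -> List[List[float]]:
    """W_ij = eps^(-d/2) * k0(||x_i - x_j||^2 / (eps * phi(rho_i, rho_j)^2))."""
    rho = knn_distance(X, k)
    n = len(X)
    scale = eps ** (-d / 2)
    return [
        [scale * k0(math.dist(X[i], X[j]) ** 2 / (eps * phi(rho[i], rho[j]) ** 2)) for j in range(n)]
        for i in range(n)
    ]


def random_walk_laplacian(W: List[List[float]], f: Sequence[float]) -> List[float]:
    """(L f)_i = f_i - sum_j W_ij f_j / sum_j W_ij."""
    out = []
    for row, fi in zip(W, f):
        deg = sum(row)
        out.append(fi - sum(w * fj for w, fj in zip(row, f)) / deg)
    return out


def optimal_orders(N: int, d: int) -> Tuple[float, float]:
    """Orders eps ~ N^(-2/(d+6)) and k ~ N^(6/(d+6)) from the abstract, constants omitted."""
    return N ** (-2 / (d + 6)), N ** (6 / (d + 6))


# Example: 40 points on the unit circle (intrinsic dimension d = 1).
pts = [(math.cos(2 * math.pi * t / 40), math.sin(2 * math.pi * t / 40)) for t in range(40)]
eps, _ = optimal_orders(len(pts), d=1)
W = self_tuned_affinity(pts, k=5, eps=eps, d=1)
print(max(abs(v) for v in random_walk_laplacian(W, [1.0] * len(pts))))  # 0.0: constants are in the kernel
```

Because $\phi$ is symmetric, $W$ is symmetric. The random-walk Laplacian annihilates constant functions, which gives a quick sanity check on any implementation.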

2409.18272 2026-05-21 cs.LG 版本更新

SLIDE: A machine-learning based method for forced dynamic response estimation of multibody systems

SLIDE:一种基于机器学习的多体系统强迫动态响应估计方法

Peter Manzl, Alexander Humer, Qasim Khadim, Johannes Gerstmayr

发表机构 * University of Innsbruck, Austria(奥地利因斯布鲁克大学) Johannes Kepler University Linz, Austria(奥地利林茨约翰尼斯·凯普勒大学) University of Oulu, Finland(芬兰奥卢大学)

AI总结 本文提出了一种基于机器学习的SLIDE方法,用于估计机械或多体系统的输出序列,通过滑动窗口初始截断动态响应估计器,利用复数特征值近似阻尼效应,提高模拟速度并实现实时性能。

Comments Paper currently in submission for journal publication

详情
Journal ref
Mechanics Based Design of Structures and Machines 54(1), 2026
AI中文摘要

在计算工程中,提高仿真速度和效率是一个永恒的目标。为了充分利用神经网络技术和硬件,我们提出了SLiding-window Initially-truncated Dynamic-response Estimator (SLIDE),这是一种基于深度学习的方法,用于估计主要(但不限于)受强迫激励的机械或多体系统的输出序列。SLIDE的一个关键优势是无需完整系统状态即可估计阻尼系统的动态响应,因此特别适用于柔性多体系统。该方法根据初始效应(如阻尼)的衰减来截断输出窗口,该衰减通过系统线性化方程的复特征值来近似。此外,还训练了第二个神经网络来提供误差估计,进一步增强了该方法的适用性。该方法被应用于多种系统,包括Duffing振荡器、柔性曲柄滑块系统以及安装在柔性底座上的工业6R机械臂。结果表明,相对于仿真可获得高达数百万倍的显著加速,大幅超越实时性能。

英文摘要

In computational engineering, enhancing the simulation speed and efficiency is a perpetual goal. To fully take advantage of neural network techniques and hardware, we present the SLiding-window Initially-truncated Dynamic-response Estimator (SLIDE), a deep learning-based method designed to estimate output sequences of mechanical or multibody systems with primarily, but not exclusively, forced excitation. A key advantage of SLIDE is its ability to estimate the dynamic response of damped systems without requiring the full system state, making it particularly effective for flexible multibody systems. The method truncates the output window based on the decay of initial effects, such as damping, which is approximated by the complex eigenvalues of the system's linearized equations. In addition, a second neural network is trained to provide an error estimation, further enhancing the method's applicability. The method is applied to a diverse selection of systems, including the Duffing oscillator, a flexible slider-crank system, and an industrial 6R manipulator mounted on a flexible socket. Our results demonstrate significant speedups over simulation of up to several million times, substantially exceeding real-time performance.
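
The window-truncation idea can be illustrated with a linear single-degree-of-freedom damped oscillator, whose eigenvalues are $\lambda = -\zeta\omega_n \pm \omega_n\sqrt{\zeta^2 - 1}$. This is a minimal sketch with two assumptions:

- The initial transient decays like $e^{\mathrm{Re}(\lambda) t}$ for the slowest mode.
- The window is truncated once that envelope falls below a chosen tolerance.

The tolerance rule and the function names are hypothetical, not SLIDE's published criterion. For a multibody model, the eigenvalues would instead come from the linearized system matrix.

```python
import cmath
import math
from typing import Iterable, Tuple


def sdof_eigenvalues(omega_n: float, zeta: float) -> Tuple[complex, complex]:
    """Eigenvalues of x'' + 2 zeta omega_n x' + omega_n^2 x = 0."""
    root = omega_n * cmath.sqrt(zeta ** 2 - 1)
    return (-zeta * omega_n + root, -zeta * omega_n - root)


def truncation_time(eigenvalues: Iterable[complex], tol: float = 1e-3) -> float:
    """Time after which the slowest-decaying envelope exp(Re(lambda) t) falls below tol."""
    slowest = max(lam.real for lam in eigenvalues)
    if slowest >= 0:
        raise ValueError("not asymptotically stable: initial effects never decay")
    return math.log(tol) / slowest


def truncated_samples(eigenvalues: Iterable[complex], dt: float, tol: float = 1e-3) -> int:
    """Hypothetical rule: number of leading output samples to discard in each window."""
    return math.ceil(truncation_time(eigenvalues, tol) / dt)


# Example: 1 Hz oscillator with 5 % damping, sampled at 100 Hz.
lams = sdof_eigenvalues(2 * math.pi, 0.05)
print(truncation_time(lams), truncated_samples(lams, dt=0.01))
```

Lighter damping moves the slowest eigenvalue toward the imaginary axis and lengthens the discarded part of each window. An undamped system never forgets its initial state, so the sketch rejects it.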

2409.04777 2026-05-21 cs.LG math.OC 版本更新

Optimization Hyper-parameter Laws for Large Language Models

大语言模型的优化超参数规律

Xingyu Xie, Kuangyu Ding, Shuicheng Yan, Kim-Chuan Toh, Tianwen Wei

发表机构 * Department of Mathematics, National University of Singapore, Singapore(新加坡国立大学数学系) School of Computing, National University of Singapore(新加坡国立大学计算机学院) Department of Mathematics and Institute of Operations Research and Analytics, National University of Singapore, Singapore(新加坡国立大学数学系和运筹分析研究所) Skywork AI, Beijing(北京Skywork AI)

AI总结 本文提出Opt-Laws框架,通过分析SDE收敛和逃逸特性,预测最终训练损失,从而在小规模实验中预选学习率调度方案,提高了超参数选择的准确性。

详情
AI中文摘要

大语言模型推动了显著的AI进步,但其训练过程资源消耗大且对超参数选择高度敏感。尽管缩放定律为模型规模和数据需求提供了有价值的指导,但它们难以用于选择在训练过程中动态变化的超参数(如学习率调度)。为此,我们提出优化超参数规律(Opt-Laws),该框架将最终训练损失预测为学习率调度、模型规模和数据规模的函数。基于以SDE为基础的收敛与逃逸分析,Opt-Laws给出可解释的收敛与逃逸特征,能够跨模型规模预测最终训练损失,从而借助小规模实验预先选择调度方案。实证表明,Opt-Laws在留出配置上识别近最优调度候选的Top-2命中率达到94%,在全部五个评估的族外设置中均正确识别出性能最佳的调度族,并以F1 = 0.92检测训练发散。

英文摘要

Large Language Models have driven significant AI advancements, yet their training is resource-intensive and highly sensitive to hyper-parameter selection. While scaling laws provide valuable guidance on model size and data requirements, they fall short in choosing dynamic hyper-parameters, such as learning-rate (LR) schedules, that evolve during training. To bridge this gap, we present Optimization Hyper-parameter Laws (Opt-Laws), a framework that predicts final training loss as a function of LR schedule, model size, and data size. Grounded in SDE-based convergence and escape analyses, Opt-Laws yield interpretable convergence and escape features that predict final training loss across model scales, enabling schedule pre-selection from small-scale experiments. Empirically, Opt-Laws achieve a 94% Top-2 hit rate for identifying near-optimal schedule candidates on held-out configurations, correctly identify the best-performing schedule family in all five evaluated out-of-family settings, and detect training divergence with F1 = 0.92.

2408.08812 2026-05-21 cs.LG 版本更新

TRAM: Test-Time Risk Adaptation with Mixture of Agents

TRAM:基于智能体混合的测试时风险适应

Mohamad Fares El Hajj Chehade, Amrit Singh Bedi, Amy Zhang, Hao Zhu

发表机构 * UT Austin(得克萨斯大学奥斯汀分校) University of Central Florida(中佛罗里达大学) MIT(麻省理工学院) UMD(马里兰大学帕克分校)

AI总结 本文研究了在部署时无需更新的零更新适应问题,提出TRAM方法通过混合代理评估源策略的风险调整分数,以降低部署风险并保持奖励。

详情
AI中文摘要

部署的强化学习智能体常常面临仅在训练之后才确定的安全要求,例如新的危险地图、修订的风险阈值或行为对齐约束。我们研究零更新的部署时适应,即在新指定的奖励-风险权衡下重用一个固定的风险中性源策略库。我们提出TRAM(通过智能体混合的测试时风险适应),这是一种基于源评分的组合规则:它在目标奖励和基于占用度的部署风险下评估每个源策略,然后利用风险调整后的源评分选择动作。不同于绑定于固定替代指标(如回报方差)的训练时风险敏感方法,TRAM支持在测试时指定的空间屏障暴露、与参考行为的偏离以及局部波动风险。我们明确地将TRAM刻画为一种替代(surrogate)方法:它并不求解拼接策略的完整占用度控制问题,但允许一个可度量的source-hull失配项,将源评分风险与实际实现的风险联系起来。在网格世界、MuJoCo Reacher、Safety-Gymnasium以及一个LLM对齐设置中的实验表明,TRAM在测试时无需任何参数更新的情况下降低了部署风险,同时保持了奖励。

英文摘要

Deployed reinforcement learning agents often face safety requirements that are specified only after training, such as new hazard maps, revised risk thresholds, or behavioral alignment constraints. We study zero-update deployment-time adaptation, where a fixed library of risk-neutral source policies is reused under a newly specified reward-risk tradeoff. We propose TRAM (Test-Time Risk Adaptation via Mixture of Agents), a source-scored composition rule that evaluates each source policy under the target reward and an occupancy-based deployment risk, then selects actions using risk-adjusted source scores. Unlike training-time risk-sensitive methods tied to a fixed surrogate such as return variance, TRAM supports spatial barrier exposure, divergence from a reference behavior, and local volatility risks specified at test time. We explicitly characterize TRAM as a surrogate method: it does not solve the full occupancy-control problem of the stitched policy, but admits a measurable source-hull mismatch term connecting source-scored risk to realized risk. Experiments in gridworlds, MuJoCo Reacher, Safety-Gymnasium, and an LLM alignment setting show that TRAM reduces deployment risk while preserving reward, without requiring any parameter updates at test time.
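
A toy sketch of a source-scored composition rule, under stated assumptions. Each frozen source policy comes with externally supplied estimates of its target reward value and its deployment risk at the current state. At each step, the sketch executes the action of the source that maximizes reward minus `risk_weight` times risk. How TRAM actually estimates occupancy-based risk is not reproduced, and `SourcePolicy` and `compose_action` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass
class SourcePolicy:
    """A frozen source policy plus externally supplied value estimates (assumed given)."""
    name: str
    act: Callable[[Any], Any]             # fixed, risk-neutral policy: state -> action
    reward_value: Callable[[Any], float]  # estimated target reward when following this source
    risk_value: Callable[[Any], float]    # estimated deployment risk when following this source


def compose_action(sources: Sequence[SourcePolicy], state: Any, risk_weight: float) -> Any:
    """Execute the action of the source with the best risk-adjusted score.

    No source parameters change; only risk_weight and the risk estimator are set at test time.
    """
    best = max(sources, key=lambda p: p.reward_value(state) - risk_weight * p.risk_value(state))
    return best.act(state)


# Hypothetical sources: "fast" earns more reward but incurs more risk than "safe".
fast = SourcePolicy("fast", act=lambda s: "sprint", reward_value=lambda s: 10.0, risk_value=lambda s: 5.0)
safe = SourcePolicy("safe", act=lambda s: "walk", reward_value=lambda s: 7.0, risk_value=lambda s: 1.0)
print(compose_action([fast, safe], state=None, risk_weight=0.0))  # sprint
print(compose_action([fast, safe], state=None, risk_weight=1.0))  # walk
```

Changing only `risk_weight` at deployment switches which source is followed, without touching any policy parameters. This mirrors the zero-update setting described above.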

2407.08976 2026-05-21 stat.ML cs.LG math.ST stat.TH 版本更新

Computational-Statistical Trade-off in Kernel Two-Sample Testing with Random Fourier Features

核两样本检验中计算与统计的权衡:随机傅里叶特征

Ikjun Choi, Ilmun Kim

发表机构 * Department of Statistics and Data Sciences, The University of Texas at Austin(德克萨斯大学奥斯汀分校统计与数据科学系) Department of Mathematical Sciences, KAIST(韩国科学技术院数学科学系)

AI总结 本文研究了使用随机傅里叶特征近似MMD检验在计算复杂度与统计功效之间的权衡,证明通过合理选择随机特征数量可以在亚二次时间内达到与MMD检验相同的最小最大分离率。

详情
AI中文摘要

近年来,两样本检验方法大量涌现,其中最大均值差异(MMD)检验已成为处理复杂和高维数据的有效工具。尽管MMD检验取得了成功并被广泛采用,其主要局限在于二次时间复杂度,这给大规模分析带来了挑战。虽然已有多种方法被提出以加速该过程,但能否以亚二次的时间代价获得与MMD检验相同的功效保证一直不明确。为填补这一空白,本文重新审视了使用随机傅里叶特征近似的MMD检验,并研究其计算-统计权衡。我们首先揭示,只有当随机特征数量趋于无穷时,近似MMD检验才在功效上具有逐点一致性。随后,我们考虑检验的均匀功效,并在最小最大检验框架下研究时间-功效权衡。结果表明,通过精心选择随机特征数量,可以在亚二次时间内达到与MMD检验相同的最小最大分离率。我们在不同的分布假设(如Sobolev球内的密度)下展示了这一点。理论发现得到了模拟研究的验证。

英文摘要

Recent years have seen a surge in methods for two-sample testing, among which the Maximum Mean Discrepancy (MMD) test has emerged as an effective tool for handling complex and high-dimensional data. Despite its success and widespread adoption, the primary limitation of the MMD test has been its quadratic-time complexity, which poses challenges for large-scale analysis. While various approaches have been proposed to expedite the procedure, it has been unclear whether it is possible to attain the same power guarantee as the MMD test at sub-quadratic time cost. To fill this gap, we revisit the approximated MMD test using random Fourier features, and investigate its computational-statistical trade-off. We start by revealing that the approximated MMD test is pointwise consistent in power only when the number of random features approaches infinity. We then consider the uniform power of the test and study the time-power trade-off under the minimax testing framework. Our result shows that, by carefully choosing the number of random features, it is possible to attain the same minimax separation rates as the MMD test within sub-quadratic time. We demonstrate this point under different distributional assumptions such as densities in a Sobolev ball. Our theoretical findings are corroborated by simulation studies.
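
For intuition about where the sub-quadratic cost comes from, here is a minimal sketch of a random-Fourier-feature MMD statistic for a Gaussian kernel. It uses the standard construction $z(x) = \sqrt{2/D}\,\cos(\omega^\top x + b)$, with $\omega \sim \mathcal{N}(0, \sigma^{-2} I)$ and $b \sim \mathrm{U}[0, 2\pi]$. The statistic is the squared distance between mean feature vectors. Once features are computed, each evaluation costs time linear in the sample size for a fixed number of features $D$, instead of a quadratic number of kernel evaluations. Calibration by permutation, the bandwidth and the value of $D$ are assumptions of this sketch, not the paper's prescribed choices.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def rff_map(dim: int, num_features: int, bandwidth: float, rng: random.Random) -> Callable[[Vector], List[float]]:
    """Random Fourier features z with z(x).z(y) ~ exp(-||x - y||^2 / (2 * bandwidth^2))."""
    omegas = [[rng.gauss(0.0, 1.0 / bandwidth) for _ in range(dim)] for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def z(x: Vector) -> List[float]:
        return [scale * math.cos(sum(wk * xk for wk, xk in zip(w, x)) + b) for w, b in zip(omegas, phases)]

    return z


def rff_mmd2(fx: List[List[float]], fy: List[List[float]]) -> float:
    """Squared distance between mean feature vectors (plug-in estimate of MMD^2)."""
    mx = [sum(col) / len(fx) for col in zip(*fx)]
    my = [sum(col) / len(fy) for col in zip(*fy)]
    return sum((a - b) ** 2 for a, b in zip(mx, my))


def rff_mmd_test(X, Y, num_features=50, bandwidth=1.0, num_perms=200, seed=0) -> Tuple[float, float]:
    """Return (statistic, permutation p-value). Features are computed once and reshuffled."""
    rng = random.Random(seed)
    z = rff_map(len(X[0]), num_features, bandwidth, rng)
    feats = [z(x) for x in X] + [z(y) for y in Y]
    m = len(X)
    stat = rff_mmd2(feats[:m], feats[m:])
    exceed = 0
    for _ in range(num_perms):
        rng.shuffle(feats)
        if rff_mmd2(feats[:m], feats[m:]) >= stat:
            exceed += 1
    return stat, (exceed + 1) / (num_perms + 1)


# Example: two 1-D samples whose means differ by 2 standard deviations.
data_rng = random.Random(1)
X = [[data_rng.gauss(0.0, 1.0)] for _ in range(100)]
Y = [[data_rng.gauss(2.0, 1.0)] for _ in range(100)]
stat, p_value = rff_mmd_test(X, Y)
print(stat, p_value)
```

The number of features trades computation against the approximation error of the kernel. How to grow it with the sample size so that minimax power is preserved is the question the paper studies.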

2406.07125 2026-05-21 cs.CR cs.AI cs.LG 版本更新

CARACAS: vehiCular ArchitectuRe for detAiled Can Attacks Simulation

CARACAS:用于详细CAN攻击模拟的车辆架构

Sadek Misto Kirdi, Nicola Scarano, Franco Oberti, Luca Mannella, Stefano Di Carlo, Alessandro Savino

发表机构 * Politecnico di Torino, Department of Control(都灵理工大学控制与计算机工程系)

AI总结 本文提出CARACAS,一种用于模拟详细CAN攻击的车辆模型,通过结合Simulink等仿真框架和攻击模型的稳健表示,生成合成数据集以提高IDS的检测能力,重点展示电池电动车的扭矩控制攻击模拟。

Comments 6 pages, 8 figures, TrustAICyberSec workshop - IEEE ISCC 2024

详情
Journal ref
Proceedings of the 29th IEEE Symposium on Computers and Communications, ISCC 2024
AI中文摘要

现代车辆越来越容易受到利用网络基础设施的攻击,特别是针对控制器局域网络(CAN)的攻击。为了使用基于数据分析和分类的现代工具(如入侵检测系统,IDS)有效应对这些威胁,需要大规模的CAN消息数据集。本文探讨了利用Simulink等仿真框架的建模能力并结合攻击模型的稳健表示来生成合成数据集的可行性,并提出了CARACAS车辆模型,该模型包括通过CAN消息进行的组件控制以及攻击注入能力。CARACAS(包含一个电池电动车(BEV)模型)展示了该方法的有效性,并重点分析了两种不同场景中针对扭矩控制的攻击。

英文摘要

Modern vehicles are increasingly vulnerable to attacks that exploit network infrastructures, particularly the Controller Area Network (CAN). To effectively counter such threats using contemporary tools like Intrusion Detection Systems (IDSs) based on data analysis and classification, large datasets of CAN messages become imperative. This paper explores the feasibility of generating synthetic datasets by combining the modeling capabilities of simulation frameworks such as Simulink with a robust representation of attack models, and presents CARACAS, a vehicular model that includes component control via CAN messages and attack injection capabilities. CARACAS, which includes a Battery Electric Vehicle (BEV) model, showcases the efficacy of this methodology, focusing on attacks targeting torque control in two distinct scenarios.

2312.01386 2026-05-21 cs.LG stat.ML 版本更新

On the Suboptimality of GP-UCB under Polynomial Effective Optimism

论多项式有效乐观水平下GP-UCB的次优性

Wenjia Wang, Xiaowei Zhang

发表机构 * Department of Industrial Systems Engineering and Management, National University of Singapore(新加坡国立大学工业系统工程与管理系) Department of Industrial Engineering and Decision Analytics, The Hong Kong University of Science and Technology(香港科技大学工业工程与决策分析系)

AI总结 本文研究了GP-UCB在多项式有效乐观性下的次优性质,通过定义有效乐观性水平(核岭回归中的探索系数与正则化参数的乘积),在统一置信假设下证明了GP-UCB在Matérn核下的新后悔下界,表明有效乐观性水平的多项式增长排除了最小最大最优后悔率,揭示了标准GP-UCB证明最小最大最优性的障碍。

详情
AI中文摘要

高斯过程上置信界(GP-UCB)被广泛用于昂贵黑盒函数的序列优化。尽管文献中已建立了许多关于其累积后悔的上界,但GP-UCB是否最小最大最优仍是一个开放问题。我们通过定义有效乐观性水平(核岭回归中的探索系数与正则化参数的乘积)来研究这一问题。在统一置信假设下,我们证明了GP-UCB在Matérn核下的新后悔下界。该下界表明,有效乐观性水平的多项式增长(至对数因子)排除了最小最大最优的后悔率。由于这一情形涵盖大多数现有分析,我们的结果指出了证明标准GP-UCB最小最大最优性的具体障碍。更广泛地说,它表明当前上界与最小最大下界之间的差距可能反映了算法本身的限制,而不仅仅是分析的限制。

英文摘要

Gaussian process upper confidence bound (GP-UCB) is widely used for sequential optimization of expensive black-box functions. Although many upper bounds on its cumulative regret have been established in the literature, whether GP-UCB is minimax optimal remains open. We study this question through the effective optimism level, defined as the product of the exploration coefficient and the regularization parameter in kernel ridge regression. Under a uniform confidence assumption, we prove a new regret lower bound for GP-UCB with Matérn kernels. The bound shows that polynomial growth of the effective optimism level, up to logarithmic factors, rules out the minimax-optimal regret rate. Since this is the regime covered by most existing analyses, our result identifies a concrete obstacle to proving minimax optimality for standard GP-UCB. More broadly, it suggests that the gap between current upper bounds and minimax lower bounds may reflect a real limitation of the algorithm, not only of the analysis.

2307.11925 2026-05-21 cs.LG math.CA 版本更新

Mercer Large-Scale Kernel Machines from Ridge Function Perspective

从岭函数视角出发的Mercer大规模核机

Karol Dziedziul, Sergey Kryzhevich, Paweł Wieczyński

发表机构 * Faculty of Applied Mathematics, The Gdańsk University of Technology, ul. G. Narutowicza 11/12, 80-952 Gdańsk, Poland

AI总结 本文从岭函数视角出发,研究大规模核机的Mercer性质,探讨了通过余弦函数的乘积之和近似核函数的可行性,分析了该方法的障碍,并将结果通过'一对多'(one-vs-rest)方法应用于图像处理。

Comments 17 pages, 3 figures

详情
AI中文摘要

为了从岭函数视角呈现Mercer大规模核机,我们回顾了Lin和Pinkus在《岭函数的基础性》中的结果。我们从近似理论的角度考察了Rahimi和Recht(2008)《大规模核机的随机特征》一文的主要结果。我们研究了哪些核可以由参数依赖于x和y的余弦函数乘积之和来近似,并指出了这种方法的障碍。本文的结果通过'一对多'(one-vs-rest)方法应用于图像处理。

英文摘要

To present Mercer large-scale kernel machines from a ridge function perspective, we recall the results by Lin and Pinkus from {\it Fundamentality of ridge functions}. We consider the main result of the recent paper by Rahimi and Recht, 2008, {\it Random features for large-scale kernel machines} from the Approximation Theory point of view. We study which kernels could be approximated by a sum of products of cosine functions with arguments depending on $x$ and $y$ and present the obstacles of such an approach. The results of this article are applied to Image Processing via the "one-vs-rest" procedure.
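
As background, the random-feature construction mentioned above rests on standard identities for a real, shift-invariant, positive-definite kernel with $k(0) = 1$. By Bochner's theorem, such a kernel has a spectral measure $\mu$ that is a probability measure. The identities show how the kernel becomes an average of products of cosines whose arguments depend on $x$ and $y$ separately. Which kernels admit such finite-sum approximations, and the obstacles involved, are the subject of the paper and are not restated here.

```latex
\begin{aligned}
k(x - y) &= \int_{\mathbb{R}^n} \cos\!\big(\omega^\top (x - y)\big)\, d\mu(\omega) \\
         &= \int_{\mathbb{R}^n} \Big[\cos(\omega^\top x)\cos(\omega^\top y) + \sin(\omega^\top x)\sin(\omega^\top y)\Big]\, d\mu(\omega), \\
\mathbb{E}_{b \sim \mathrm{U}[0, 2\pi]}\big[\, 2\cos(\omega^\top x + b)\cos(\omega^\top y + b) \,\big] &= \cos\!\big(\omega^\top (x - y)\big), \\
k(x - y) &\approx \frac{2}{D} \sum_{j=1}^{D} \cos(\omega_j^\top x + b_j)\cos(\omega_j^\top y + b_j),
\qquad \omega_j \sim \mu,\; b_j \sim \mathrm{U}[0, 2\pi].
\end{aligned}
```

The third line follows from $2\cos A \cos B = \cos(A - B) + \cos(A + B)$. The term $\cos(\omega^\top(x + y) + 2b)$ averages to zero over a uniform phase $b$.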

2304.12906 2026-05-21 cs.LG stat.ML 版本更新

The Score-Difference Flow for Implicit Generative Modeling

隐式生成建模的分数差流

Romann M. Weber

发表机构 * Disney Research(迪士尼研究)

AI总结 本文提出分数差流作为隐式生成建模的一种新方法,通过最优减少两个分布之间的KL散度,展示了其与去噪扩散模型的等价性,并揭示了生成对抗网络训练中隐含的数据优化子问题与分数差流之间的联系。

Comments 25 pages, 5 figures, 4 tables. Updated final version of a paper originally published in Transactions on Machine Learning Research (TMLR), including minor typographical corrections and post-publication commentary connecting the SD flow to drifting models

详情
Journal ref
Transactions on Machine Learning Research (7/2023)
AI中文摘要

隐式生成建模(IGM)旨在生成与目标数据分布特征相符的合成样本。近期工作(如分数匹配网络、扩散模型)从推动合成源数据向目标分布的角度出发,通过环境空间中的动力学扰动或流来处理IGM问题。在此方向上,我们提出任意目标分布与源分布之间的分数差(SD)作为一种流,该流能够最优地减少两者之间的KL散度。我们将SD流应用于便于处理的代理分布,这些代理分布当且仅当原始分布对齐时才对齐。我们证明在某些条件下,这种形式与去噪扩散模型在形式上等价。我们还表明,生成对抗网络的训练包含一个隐含的数据优化子问题,当判别器最优时,该子问题在特定损失函数选择下会诱导出SD流。因此,SD流在分别应对"生成建模三难困境"(高样本质量、模式覆盖和快速采样)三项挑战的模型类别之间建立了理论联系,从而为统一方法奠定了基础。

英文摘要

Implicit generative modeling (IGM) aims to produce samples of synthetic data matching the characteristics of a target data distribution. Recent work (e.g. score-matching networks, diffusion models) has approached the IGM problem from the perspective of pushing synthetic source data toward the target distribution via dynamical perturbations or flows in the ambient space. In this direction, we present the score difference (SD) between arbitrary target and source distributions as a flow that optimally reduces the Kullback-Leibler divergence between them. We apply the SD flow to convenient proxy distributions, which are aligned if and only if the original distributions are aligned. We demonstrate the formal equivalence of this formulation to denoising diffusion models under certain conditions. We also show that the training of generative adversarial networks includes a hidden data-optimization sub-problem, which induces the SD flow under certain choices of loss function when the discriminator is optimal. As a result, the SD flow provides a theoretical link between model classes that individually address the three challenges of the "generative modeling trilemma" -- high sample quality, mode coverage, and fast sampling -- thereby setting the stage for a unified approach.
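
A one-dimensional toy sketch of the score-difference flow $\dot x = \nabla \log q(x) - \nabla \log p_t(x)$, where $q$ is the target and $p_t$ is the current source. The sketch makes three assumptions that differ from the paper's setup:

- The target is Gaussian with a known score.
- The source score comes from a Gaussian fitted to the current particles.
- The flow is integrated with explicit Euler steps.

The paper instead applies the flow to smoothed proxy distributions.

```python
import random
import statistics
from typing import List


def gaussian_score(x: float, mean: float, var: float) -> float:
    """Score d/dx log N(x; mean, var)."""
    return -(x - mean) / var


def sd_flow_step(particles: List[float], target_mean: float, target_var: float, step: float) -> List[float]:
    """One Euler step of x <- x + step * (score_target(x) - score_source(x)).

    Toy assumption: the source score comes from a Gaussian fitted to the current particles.
    """
    m = statistics.fmean(particles)
    v = statistics.pvariance(particles, mu=m)
    return [x + step * (gaussian_score(x, target_mean, target_var) - gaussian_score(x, m, v)) for x in particles]


rng = random.Random(0)
particles = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # source samples, roughly N(0, 1)
for _ in range(500):
    particles = sd_flow_step(particles, target_mean=3.0, target_var=4.0, step=0.1)
print(round(statistics.fmean(particles), 3), round(statistics.pstdev(particles), 3))  # 3.0 2.0
```

With both scores Gaussian, each step is an affine map of the particles, so the sample mean and standard deviation converge to the target values. This shows the direction of the flow only, not its behaviour on real data.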

2212.08989 2026-05-21 cs.LG 版本更新

Deep learning applied to computational mechanics: A comprehensive review, state of the art, and the classics

深度学习应用于计算力学:综述、现状和经典方法

Loc Vu-Quoc, Alexander Humer

发表机构 * University of Illinois at Urbana-Champaign(伊利诺伊大学厄巴纳-香槟分校) Johannes Kepler University(约翰尼斯·开普勒大学)

AI总结 本文综述了深度学习在计算力学中的应用,包括固体力学、流体力学和有限元技术,并讨论了混合和纯机器学习方法在解决非线性偏微分方程中的作用,同时介绍了LSTM、注意力机制和核方法等技术。

Comments 275 pages, 158 figures. Appeared online on 2023.03.01 at CMES-Computer Modeling in Engineering & Sciences

详情
Journal ref
CMES-Computer Modeling in Engineering & Sciences, Vol. 137, No. 2, pp.1069-1343, 2023
AI中文摘要

人工智能在艺术与科学领域的三项近期突破构成了本文的动机:一幅获奖的数字图像、蛋白质折叠、快速矩阵乘法。本文详细综述了人工神经网络(尤其是深度学习,DL)领域中与计算力学(固体、流体、有限元技术)相关并已应用于其中的诸多近期进展,讨论了混合方法与纯机器学习(ML)方法。混合方法将传统PDE离散化与ML方法相结合,用于(1)帮助建模复杂的非线性本构关系,(2)非线性地降低模型阶数以实现高效模拟(湍流),或(3)通过预测传统积分方法中的某些组件来加速模拟。其中,方法(1)和(2)依赖长短期记忆(LSTM)架构,方法(3)依赖卷积神经网络。求解(非线性)PDE的纯ML方法以物理信息神经网络(PINN)方法为代表,它们可以与注意力机制结合以处理不连续解。LSTM和注意力架构,以及现代优化器和为DL网络引入随机性的广义经典优化器,均得到详尽综述。对核机(包括高斯过程)的介绍具有足够深度,可支撑诸如无限宽浅层网络等更深入的工作。本文不仅面向专家:读者被假定熟悉计算力学,但不熟悉DL,其概念和应用从基础开始逐步构建,旨在让初学者快速抵达研究前沿。文中回顾并讨论了AI的历史与局限,尤其注意指出经典文献(即使是知名参考文献)中的错误表述或误解。文中以大变形梁的定位与指向控制作为示例。

英文摘要

Three recent breakthroughs due to AI in arts and science serve as motivation: an award-winning digital image, protein folding, and fast matrix multiplication. Many recent developments in artificial neural networks, particularly deep learning (DL), applied and relevant to computational mechanics (solids, fluids, finite-element technology) are reviewed in detail. Both hybrid and pure machine learning (ML) methods are discussed. Hybrid methods combine traditional PDE discretizations with ML methods either (1) to help model complex nonlinear constitutive relations, (2) to nonlinearly reduce the model order for efficient simulation (turbulence), or (3) to accelerate the simulation by predicting certain components in the traditional integration methods. Here, methods (1) and (2) relied on the Long Short-Term Memory (LSTM) architecture, while method (3) relied on convolutional neural networks. Pure ML methods to solve (nonlinear) PDEs are represented by Physics-Informed Neural Network (PINN) methods, which could be combined with an attention mechanism to address discontinuous solutions. Both LSTM and attention architectures are extensively reviewed, together with modern optimizers and classic optimizers generalized to include stochasticity for DL networks. Kernel machines, including Gaussian processes, are covered in sufficient depth to support more advanced topics such as shallow networks with infinite width. The review does not address experts only: readers are assumed to be familiar with computational mechanics but not with DL, whose concepts and applications are built up from the basics, aiming to bring first-time learners quickly to the forefront of research. The history and limitations of AI are recounted and discussed, with particular attention to pointing out misstatements or misconceptions of the classics, even in well-known references. Positioning and pointing control of a large-deformable beam is given as an example.

1908.05972 2026-05-21 cs.LG stat.ML 版本更新

AI-based Prediction of Independent Construction Safety Outcomes from Universal Attributes

基于AI从通用属性预测独立施工安全结果

Henrietta Baker, Matthew R. Hallowell, Antoine J. -P. Tixier

发表机构 * University of Edinburgh, UK(爱丁堡大学,英国) University of Colorado at Boulder, USA(科罗拉多大学博尔德分校,美国)

AI总结 本文改进并验证了先前研究中通过机器学习从属性中预测安全结果的方法,使用NLP提取属性并训练模型预测伤害严重性、类型、受影响身体部位和事件类型,通过独立人工标注消除潜在的人工相关性,结果表明属性仍具有高度预测性,同时引入了更大的数据集、新模型、模型堆叠和更合适的评估指标,最终成功预测伤害严重性,这是重大进展。

Comments Added author contributions and journal reference, updated corresponding author, fixed a few typos

详情
Journal ref
Automation in Construction 118 (2020): 103146
AI中文摘要

本文显著改进并完成验证了先前研究中提出的一种方法,该方法利用机器学习从属性中预测安全结果。与原始研究一样,我们使用自然语言处理(NLP)从原始事件报告中提取基本属性,并训练机器学习模型来预测安全结果。此处预测的安全结果包括伤害严重程度、伤害类型、受影响身体部位和事件类型。但与原始研究不同,安全结果并非通过NLP提取,而是由独立的人工标注提供,从而消除了预测变量与预测目标之间任何可能的人为相关性来源。结果表明,属性仍具有高度预测性,证实了原始方法的有效性。本研究的其他改进包括:(1)使用包含超过90,000份报告的更大数据集,(2)两种新模型XGBoost和线性支持向量机(SVM),(3)模型堆叠,(4)更简洁的实验设置和更合适的性能指标,以及(5)按类别的属性重要性评分分析。最后,伤害严重程度得到了良好预测,而原始研究未能做到这一点。这是一项重大进展。

英文摘要

This paper significantly improves on, and completes the validation of, an approach proposed in previous research in which safety outcomes were predicted from attributes with machine learning. As in the original study, we use Natural Language Processing (NLP) to extract fundamental attributes from raw incident reports, and machine learning models are trained to predict safety outcomes. The outcomes predicted here are injury severity, injury type, body part impacted, and incident type. However, unlike in the original study, safety outcomes were not extracted via NLP but were provided by independent human annotations, eliminating any potential source of artificial correlation between predictors and predictands. Results show that attributes are still highly predictive, confirming the validity of the original approach. Other improvements brought by the current study include the use of (1) a much larger dataset featuring more than 90,000 reports, (2) two new models, XGBoost and linear SVM (Support Vector Machines), (3) model stacking, (4) a more straightforward experimental setup with more appropriate performance metrics, and (5) an analysis of per-category attribute importance scores. Finally, the injury severity outcome is well predicted, which was not the case in the original study. This is a significant advancement.

2605.20539 2026-05-21 cs.LG 版本更新

OpenSeisML: Open Large-Scale Real Seismic and well-log Dataset for Generative AI

OpenSeisML:面向生成式AI的开放大规模真实地震与测井数据集

Ipsita Bhar, Huseyin Tuna Erdinc, Thales Souza, Charles Jones, Felix J. Herrmann

发表机构 * Georgia Institute of Technology(佐治亚理工学院) Osokey Ltd(Osokey公司)

AI总结 本文提出OpenSeisML,一个开放的大规模真实地震与测井数据集,用于支持生成式AI在地震反演中的应用;它通过自动化数据整理流程实现可复现的地震数据准备,用以训练能够捕捉地下属性统计分布的生成模型,从而生成多个统计上一致的真实实现,用于不确定性量化。

Comments 5 pages, 8 figures

AI中文摘要

机器学习(ML)和计算机视觉的出现降低了传统上昂贵的迭代方法的计算成本,从而显著加速了地震反演工作流程。然而,由于大多数高质量数据为油气公司私有,真实速度模型的稀缺仍限制着ML方法的开发与评估。为弥补这一差距,我们提出了OpenSeisML,这是一组真实地震数据集,旨在支持面向地震反演的生成式AI(Gen-AI)工作流程。这些数据集整理自英国国家数据存储库(NDR)中公开可用的勘探数据。当地震数据体位于时间域而测井位于深度域时,需要进行时深转换。我们利用校验炮(checkshot)数据建立时深关系,并通过插值构建速度模型,以实现叠后地震数据的准确转换。此外,我们提出了一种自动化数据整理流程,在完成地震数据准备的同时确保可复现性。目标是训练一个能够捕捉地下属性统计分布的生成模型,从而合成多个统计上一致的实现,用于不确定性量化,并可作为地震反演的先验。

英文摘要

The advent of machine learning (ML) and computer vision has significantly accelerated seismic inversion workflows by reducing the computational cost of traditionally expensive iterative methods. However, the development and evaluation of ML methods remain limited by the scarcity of realistic velocity models, as most high-quality data are privately owned by oil and gas companies. To address this gap, we present OpenSeisML, a collection of real seismic datasets designed to support generative AI (Gen-AI) workflows for seismic inversion. The datasets are curated from publicly available surveys in the UK National Data Repository (NDR). When seismic volumes are in the time domain and wells are in depth, a time-to-depth conversion is required. We use checkshot data to establish the time-depth relationship and construct a velocity model through interpolation for accurate conversion of post-stack seismic data. Here, we present an automated data curation pipeline that enables seismic data preparation while ensuring reproducibility. The objective is to train a generative model that captures the statistical distribution of subsurface properties, enabling the synthesis of multiple statistically consistent realizations for uncertainty quantification, which can act as a prior for seismic inversion.
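The checkshot-based time-to-depth step can be sketched as piecewise-linear interpolation between checkshot (two-way time, depth) pairs. This is a minimal illustration under stated assumptions, not the OpenSeisML pipeline: the function names, the units (seconds, metres), and the linear extrapolation beyond the outermost checkshots are choices made here for illustration only.

```python
from bisect import bisect_right


def interval_velocities(twt_s, depth_m):
    """Interval velocity between consecutive checkshots: v = 2 * dz / dt,
    where dt is two-way time (hence the factor 2)."""
    pairs = sorted(zip(twt_s, depth_m))
    return [2.0 * (z1 - z0) / (t1 - t0)
            for (t0, z0), (t1, z1) in zip(pairs, pairs[1:])]


def time_to_depth(sample_twt_s, twt_s, depth_m):
    """Convert seismic sample times to depth by piecewise-linear interpolation
    of the checkshot curve (linear extrapolation past the end points)."""
    pairs = sorted(zip(twt_s, depth_m))
    ts = [t for t, _ in pairs]
    zs = [z for _, z in pairs]
    out = []
    for t in sample_twt_s:
        # checkshot interval bracketing t, clamped to the first/last interval
        i = min(max(bisect_right(ts, t) - 1, 0), len(ts) - 2)
        frac = (t - ts[i]) / (ts[i + 1] - ts[i])
        out.append(zs[i] + frac * (zs[i + 1] - zs[i]))
    return out


checkshot_twt = [0.0, 0.4, 1.0]          # hypothetical two-way times, s
checkshot_depth = [0.0, 400.0, 1300.0]   # hypothetical depths, m
print(interval_velocities(checkshot_twt, checkshot_depth))
print(time_to_depth([0.2, 0.7], checkshot_twt, checkshot_depth))
```

The interval velocities are what a velocity model would be interpolated from; a production workflow would also handle datum shifts and smoothing, which are omitted here.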

2605.20534 2026-05-21 cs.LG cs.AI stat.ML 版本更新

Axiomatizing Neural Networks via Pursuit of Subspaces

通过子空间追寻实现神经网络的公理化

Mehmet Yamac, Mert Duman, Ugur Akpinar, Felix Rojas Casadiego, Serkan Kiranyaz, Marcel van Gerven, Moncef Gabbouj

发表机构 * Tampere University, Faculty of ITC, Finland(芬兰坦佩雷大学信息与通信技术学院) Department of Electrical Engineering, Qatar University, Qatar(卡塔尔大学电气工程系) Donders Institute, Radboud University, The Netherlands(荷兰拉德堡德大学东德斯研究所)

AI总结 本文提出一个基于几何公理的框架,即子空间追寻(PoS)假设,用于解释神经网络的行为,为浅层和深层架构中的表示、计算和泛化提供了统一视角。

Comments 43 pages, 25 figures. Code and additional materials will be released

AI中文摘要

尽管深度神经网络在广泛领域取得了显著成功,但其底层机制仍缺乏深入理解,常被视为黑箱。经验表现与理论理解之间的这一差距,类似于经典几何学尚处于公理化之前的阶段。在本文中,我们提出了子空间追寻(PoS)假设,这是一个通过一组几何公设来刻画神经网络行为的公理化框架。这些公理及其推论为浅层和深层架构中的表示、计算和泛化提供了统一视角。我们表明,该框架能够为深度学习中的基本问题(包括表示结构、架构机制和泛化行为)提供几何解释,朝着建立连贯的理论基础迈出了有原则的一步。

英文摘要

While deep neural networks have achieved remarkable success across a wide range of domains, their underlying mechanisms remain poorly understood, and they are often regarded as black boxes. This gap between empirical performance and theoretical understanding poses a challenge analogous to the pre-axiomatic stage of classical geometry. In this work, we introduce the Pursuit of Subspaces (PoS) hypothesis, an axiomatic framework that formulates neural network behavior through a set of geometric postulates. These axioms, together with their derived consequences, provide a unified perspective on representation, computation, and generalization in both shallow and deep architectures. We show that this framework yields geometric explanations for fundamental questions in deep learning, including representation structure, architectural mechanisms, and generalization behavior, offering a principled step toward a coherent theoretical foundation.

2605.20533 2026-05-21 cs.LG 版本更新

Ada2MS: A Hybrid Optimization Algorithm Based on Exponential Mixing of Elementwise and Global Second-Moment Estimates

Ada2MS: 一种基于元素级和全局二阶矩估计指数混合的混合优化算法

Meng Zhu, Quan Xiao, Weidong Min

发表机构 * School of Information Management and Mathematics, Jiangxi University of Finance and Economics(江西财经大学信息管理与数学学院) School of Mathematics and Computer Science, Nanchang University(南昌大学数学与计算机科学学院) Institute of Metaverse, Nanchang University(南昌大学元宇宙研究院) Jiangxi Provincial Key Laboratory of Virtual Reality(江西省虚拟现实重点实验室)

AI总结 本文提出Ada2MS算法,通过连续指数插值元素级和全局二阶矩估计,平衡AdamW和动量SGD的优缺点,在视觉任务中取得竞争性结果。

AI中文摘要

优化算法是机器学习模型通过迭代最小化损失函数、更新参数、从数据中学习并提高性能的核心方法。动量SGD和AdamW代表了两种重要的优化范式。AdamW产生稳定的更新,通常在各种训练场景中具有较强的鲁棒性,但其泛化性能有时弱于动量方法。动量SGD在仔细调参后通常可以获得更好的泛化性能,但对梯度尺度变化和超参数设置更敏感。为了平衡这两种范式的优缺点,本文提出Ada2MS优化算法,通过在元素级二阶矩估计与全局二阶矩估计之间进行连续指数插值,实现类AdamW行为与类动量SGD行为之间的平滑过渡。在本研究评估的视觉任务中,Ada2MS在统一的优化器比较协议下取得了具有竞争力的结果。代码将在https://github.com/mengzhu0308/Ada2MS上发布。

英文摘要

Optimization algorithms are core methods by which machine learning models iteratively minimize loss functions, update parameters, learn from data, and improve performance. Momentum SGD and AdamW represent two important optimization paradigms. AdamW produces stable updates and usually has strong robustness across training scenarios, but its generalization performance is sometimes weaker than that of momentum methods. Momentum SGD can often obtain better generalization after careful tuning, but it is more sensitive to gradient-scale variation and hyperparameter settings. To balance the strengths and weaknesses of the two paradigms, this paper proposes Ada2MS, an optimization algorithm that achieves a smooth transition between AdamW-like behavior and momentum-SGD-like behavior through continuous exponential interpolation between elementwise second-moment estimates and global second-moment estimates. On the visual tasks evaluated in this study, Ada2MS obtains competitive results under a unified optimizer-comparison protocol. The code will be released at https://github.com/mengzhu0308/Ada2MS.
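The abstract does not give Ada2MS's update rule. As a hedged sketch, one way to realize "continuous exponential interpolation" between elementwise and global second-moment estimates is a geometric mean, v_i^(1-λ) · v̄^λ, where v̄ is the mean of the elementwise estimates: λ = 0 gives an AdamW-style per-coordinate scale, and λ = 1 gives a single shared scale, i.e. momentum SGD with a global step normalization. The interpolation form, the choice of v̄ as the mean, and the names `interpolated_step` and `lam` are assumptions, not the paper's definition.

```python
import math


def init_state(n):
    return {"t": 0, "m": [0.0] * n, "v": [0.0] * n}


def interpolated_step(params, grads, state, lr=1e-3, beta1=0.9, beta2=0.999,
                      lam=0.5, eps=1e-8, weight_decay=0.0):
    """One step of a hypothetical Ada2MS-like update on a flat list of parameters."""
    state["t"] += 1
    t, m, v = state["t"], state["m"], state["v"]
    for i, g in enumerate(grads):
        m[i] = beta1 * m[i] + (1 - beta1) * g        # first moment (momentum)
        v[i] = beta2 * v[i] + (1 - beta2) * g * g    # elementwise second moment
    v_global = sum(v) / len(v)                       # global second-moment estimate
    bc1, bc2 = 1 - beta1 ** t, 1 - beta2 ** t        # Adam-style bias corrections
    new_params = []
    for p, mi, vi in zip(params, m, v):
        # geometric (exponential) interpolation of the two second-moment estimates
        v_mix = (vi / bc2) ** (1 - lam) * (v_global / bc2) ** lam
        step = (mi / bc1) / (math.sqrt(v_mix) + eps)
        new_params.append(p - lr * (step + weight_decay * p))  # decoupled weight decay
    return new_params
```

With `lam=0.0`, both coordinates of a parameter vector move by roughly `lr` on the first step regardless of gradient scale (Adam's sign-like behavior); with `lam=1.0`, they move in proportion to their raw momentum, which is the momentum-SGD-like regime.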

2605.20525 2026-05-21 cs.CV cs.AI cs.CL cs.LG eess.IV 版本更新

NeuroQA: A Large-Scale Image-Grounded Benchmark for 3D Brain MRI Understanding

NeuroQA: 面向3D脑部MRI理解的大规模图像锚定(image-grounded)基准

Mohammad H. Abbasi, Favour Nerrise, Shaurnav Ghosh, Ridvan Yesiloglu, Yuncong Mao, Bailey Trang, Mohammad Asadi, Merryn Daniel, Gustavo Chau Loo Kung, Ken Chang, Pavan Pinkesh Shah, Adam Turnbull, Kyan Younes, Seena Dehkharghani, Ehsan Adeli

发表机构 * Stanford University(斯坦福大学)

AI总结 本文提出NeuroQA,一个大规模的3D脑部MRI视觉问答基准,包含来自12977名受试者的56953个问答对,涵盖5-104岁及五个临床领域,通过3D体积评估11种临床推理技能,并提供可复现的生成脚本和在线排行榜。

Comments 30 pages, dataset and benchmark release

AI中文摘要

我们提出了NeuroQA,一个面向3D脑部磁共振成像(MRI)视觉问答的大规模基准,包含来自12个数据集、12,977名受试者的56,953个问答对,覆盖5至104岁人群及五个临床领域:阿尔茨海默病、帕金森病、肿瘤、白质疾病和神经发育。与以往基于2D切片或依赖狭窄诊断标签的医学视觉问答(VQA)工作不同,NeuroQA将每个条目与完整的3D体积配对。它以是/否、多项选择和开放式三种格式评估11种具有临床依据的推理技能。在203个模板中,131个为图像锚定型(可借助三平面查看器回答),72个为图像辅助型(标准答案来自定量容积测量或临床量表)。为消除纯文本捷径,我们进行了答案分布优化,将封闭式问题的纯文本准确率从80%以上降至44.6%;图像必要性则通过随基准发布的图像锚定协议单独评估。一个包含38条规则的确定性流程和两轮专家审查对照FreeSurfer测量、元数据或放射学报告字段核验了每个问答对,跨模板的同一受试者问答之间无任何矛盾。我们还开展了临床医生评估,由两名临床医生在三平面查看器上独立评估100个冻结测试条目。在封闭式(是/否+多项选择)公开测试条目上,最佳零样本视觉语言模型和有监督3D CNN基线的准确率分别为47.5%和43.7%,均低于49.4%的纯文本多数模板基线。NeuroQA采用两级发布:对开放获取数据集公开问答对,对受数据使用协议(DUA)限制的数据集提供可复现的生成脚本,并提供受试者级数据划分、保留的私有测试集和在线排行榜。

英文摘要

We present NeuroQA, a large-scale benchmark for visual question answering in 3D brain magnetic resonance imaging (MRI), with 56,953 QA pairs from 12,977 subjects across 12 datasets. It spans ages 5-104 and five clinical domains: Alzheimer's, Parkinson's, tumors, white matter disease, and neurodevelopment. Unlike prior medical Visual Question Answering (VQA) efforts that operate on 2D slices or rely on narrow diagnostic labels, NeuroQA pairs every item with a full 3D volume. It evaluates 11 clinically grounded reasoning skills across Yes/No, multiple-choice, and open-ended formats. Of the 203 templates, 131 are image-grounded (answerable from a 3-plane viewer) and 72 are image-informed (ground truth from quantitative volumetry or clinical instruments). To remove text-only shortcuts, we apply answer-distribution refinement, reducing closed-format text-only accuracy from >80% to 44.6%; image necessity is assessed separately through an image-grounding protocol released with the benchmark. A 38-rule deterministic pipeline and two rounds of expert review verify every QA pair against FreeSurfer measurements, metadata, or radiology report fields, with zero same-subject contradictions across templates. We conduct a clinician evaluation in which two clinicians independently assess 100 frozen test items on a three-plane viewer. On closed-format (Yes/No + multiple-choice) test-public items, the best zero-shot vision-language model and a supervised 3D CNN baseline reach 47.5% and 43.7% accuracy respectively, both below the 49.4% text-only majority-template floor. NeuroQA adopts a two-tier release with public QA pairs for open-access datasets and reproducible generation scripts for datasets restricted by data use agreements (DUAs), plus subject-level splits, a held-out private test set, and an online leaderboard.
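A text-only "majority-template" floor of the kind quoted above can be computed by answering every question with the most frequent answer for its template, ignoring the image entirely. The sketch below assumes that reading; the paper's exact definition (for example, which split the majorities are estimated on) is not given in the abstract, and the fallback to the global majority answer for unseen templates is an added assumption.

```python
from collections import Counter, defaultdict


def majority_template_floor(train_items, test_items):
    """Accuracy of a text-only baseline that ignores the image and answers each
    question with its template's most common training answer.
    Items are (template_id, answer) pairs."""
    per_template = defaultdict(Counter)
    for template_id, answer in train_items:
        per_template[template_id][answer] += 1
    majority = {t: c.most_common(1)[0][0] for t, c in per_template.items()}
    # hypothetical fallback for templates absent from the training split
    global_majority = Counter(a for _, a in train_items).most_common(1)[0][0]
    correct = sum(answer == majority.get(t, global_majority) for t, answer in test_items)
    return correct / len(test_items)
```

A model that scores below this number on closed-format items is, in effect, not yet extracting useful information from the 3D volume.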

2605.20523 2026-05-21 cs.LG cs.AI q-bio.QM 版本更新

Machine-Learning-Enhanced Non-Invasive Testing for MASLD Fibrosis: Shallow-Deep Neural Networks Versus FIB-4, Tabular Foundation Models, and Large Language Models

机器学习增强的非侵入性测试用于MASLD纤维化:浅层-深层神经网络与FIB-4、表格基础模型和大语言模型的比较

Athanasios Angelakis, Gabriele De Vito, Eleni-Myrto Trifylli, Filomena Ferrucci

发表机构 * BioML Lab, RI CODE, UniBw, Munich, Germany(BioML实验室,RI CODE,UniBw,慕尼黑,德国) Department of Epidemiology and Data Science, Amsterdam UMC, Amsterdam, Netherlands(流行病学与数据科学系,阿姆斯特丹大学医学中心,阿姆斯特丹,荷兰) Alpha Indicium, Rijswijk, Netherlands(Alpha Indicium,赖斯韦克,荷兰) Department of Computer Science, University of Salerno, Salerno, Italy(计算机科学系,萨勒诺大学,萨勒诺,意大利) GI-Liver Unit, 2nd Department of Internal Medicine, National and Kapodistrian University of Athens, General Hospital of Athens “Hippocratio”, Athens, Greece(胃肠-肝病科,第二内科,雅典国立卡波迪斯特里亚斯大学,雅典“希波克拉底”综合医院,希腊)

AI总结 本文研究了机器学习增强的非侵入性测试在MASLD纤维化检测中的应用,比较了浅层-深层神经网络、FIB-4、表格基础模型和大语言模型在不同队列中的性能,发现浅层-深层神经网络在保持FIB-4变量空间的同时提供了更平衡的外部操作性能。

Comments 26 pages, 4 figures, 3 tables. Preprint

AI中文摘要

晚期纤维化是代谢功能障碍相关脂肪性肝病(MASLD)中肝相关发病的主要决定因素。FIB-4被广泛用作一线非侵入性检测,但其固定公式可能未能充分利用年龄、天冬氨酸转氨酶、丙氨酸转氨酶和血小板计数中所包含的诊断信息。我们评估了机器学习增强的非侵入性检测(MLE-NIT)能否在保持FIB-4变量空间的同时提高晚期纤维化的检出能力。我们使用了来自中国、马来西亚和印度的三个经活检确诊的MASLD队列(n=784)。中国队列被划分为486名训练患者和54名内部验证/调参患者;最终性能仅在马来西亚和印度外部队列上报告。模型使用五个变量:年龄、FIB-4、天冬氨酸转氨酶、血小板计数和丙氨酸转氨酶。我们将FIB-4与浅层-深层神经网络(s-DNN)、TabPFN和gpt-4o-2024-08-06进行了比较。FIB-4在马来西亚和印度外部队列的ROC-AUC分别为0.75和0.60。TabPFN分别达到0.69和0.66,微调后的GPT-4o分别达到0.75和0.63,而s-DNN分别达到0.77和0.67。s-DNN仅包含354个可训练参数(TabPFN为7,244,554个),却提供了更均衡的外部工作特性。校准分析显示s-DNN的Brier分数分别为0.18和0.22,置换重要性分析表明AST和FIB-4是主导变量。紧凑的非线性MLE-NIT有望在不增加临床数据需求的情况下增强基于FIB-4的纤维化评估。

英文摘要

Advanced fibrosis is a major determinant of liver-related morbidity in metabolic dysfunction-associated steatotic liver disease (MASLD). FIB-4 is widely used as a first-line non-invasive test, but its fixed formula may underuse diagnostic information contained in age, aspartate aminotransferase, alanine aminotransferase, and platelet count. We evaluated whether machine-learning-enhanced non-invasive testing (MLE-NIT) can improve advanced fibrosis detection while preserving this FIB-4 variable space. We used three biopsy-confirmed MASLD cohorts from China, Malaysia, and India (n=784). The Chinese cohort was split into 486 training and 54 internal validation/tuning patients; final performance was reported only on the Malaysian and Indian external cohorts. Models used five variables: age, FIB-4, aspartate aminotransferase, platelet count, and alanine aminotransferase. We compared FIB-4 with a shallow-deep neural network (s-DNN), TabPFN, and gpt-4o-2024-08-06. FIB-4 achieved external ROC-AUCs of 0.75 and 0.60 in Malaysia and India, respectively. TabPFN achieved 0.69 and 0.66, fine-tuned GPT-4o achieved 0.75 and 0.63, and the s-DNN achieved 0.77 and 0.67, respectively. The s-DNN contained only 354 trainable parameters, compared with 7,244,554 for TabPFN, yet provided a more balanced external operating profile. Calibration showed s-DNN Brier scores of 0.18 and 0.22, and permutation importance identified AST and FIB-4 as dominant variables. Compact non-linear MLE-NITs may enhance FIB-4-based fibrosis assessment without increasing clinical data requirements.
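For reference, FIB-4 combines the same four routine measurements with a fixed formula, FIB-4 = age × AST / (platelets × √ALT), with age in years, AST and ALT in U/L, and platelets in 10^9/L. The sketch below computes it together with the two evaluation metrics the abstract reports: ROC-AUC via its pairwise-ranking definition, and the Brier score. It illustrates the baseline and the metrics only; it is not the s-DNN or the authors' evaluation code.

```python
import math


def fib4(age_years, ast_u_l, alt_u_l, platelets_1e9_l):
    """FIB-4 index = (age * AST) / (platelets * sqrt(ALT))."""
    return (age_years * ast_u_l) / (platelets_1e9_l * math.sqrt(alt_u_l))


def roc_auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(probs, labels):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)
```

ROC-AUC only measures ranking, so a model can have a good AUC and still be poorly calibrated; the Brier score penalizes both, which is why the abstract reports the two together.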

2605.20521 2026-05-21 cs.LG cs.CR 版本更新

An exponential mechanism based on quadratic approximations for fine-tuning machine learning models with privacy guarantees

基于二次近似的指数机制用于具有隐私保障的机器学习模型微调

Hoang Tran, Jorge Ramirez, Jiayi Wang, Alberto Bocchinfuso, Christopher Stanley, M. Paul Laiu

发表机构 * Computer Science and Mathematics Division, Oak Ridge National Laboratory(橡树岭国家实验室计算机科学与数学部) Computational Science and Engineering Division, Oak Ridge National Laboratory(橡树岭国家实验室计算科学与工程部) HPC Department, Cineca(Cineca高性能计算部)

AI总结 本文提出一种基于指数机制的随机算法,用于在保证差分隐私的前提下微调预训练模型,通过结合局部二次近似和新数据集信息构建效用函数,并引入随机投影策略提升高维模型的可扩展性。

AI中文摘要

微调将预训练的机器学习模型适配到一个规模小且敏感的数据集上,但这一过程可能记住新数据中的个体样本,使模型易受试图提取敏感信息的攻击者的攻击。在本文中,我们开发了一种基于指数机制的随机化算法,在微调的同时确保差分隐私。我们的核心思想是构建一个简单的效用函数,将预训练模型的局部二次近似与新数据集的信息相结合。由此得到的指数机制可以闭式形式从多元正态分布中精确采样。我们建立了该方法的理论隐私保证、敏感度界和准确性估计。我们进一步引入随机投影策略,使该方法能够扩展到高维模型。在MNIST基准和MIMIC临床数据集上的数值实验表明,该方法与现有差分隐私微调技术相比具有竞争力。

英文摘要

Fine-tuning adapts a pretrained machine learning model to a small, sensitive dataset, but this process risks memorizing individual new data points, making the model vulnerable to adversaries who seek to extract sensitive information. In this work, we develop a randomized algorithm based on the exponential mechanism for fine-tuning while ensuring differential privacy. Our key idea is to construct a simple utility function that combines a local quadratic approximation of the pretrained model with information from the new dataset. The resulting exponential mechanism admits exact sampling from a multivariate normal distribution in closed form. We establish theoretical privacy guarantees, sensitivity bounds, and accuracy estimations for our method. We further introduce a random-projection strategy that makes the approach scalable to high-dimensional models. Numerical experiments on the MNIST benchmark and the MIMIC clinical dataset demonstrate competitive performance against existing differentially private fine-tuning techniques.
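The closed-form sampling claim can be made concrete. If the utility is a quadratic u(θ) = -½ Σᵢ hᵢ (θᵢ - cᵢ)² with every hᵢ > 0, the exponential-mechanism density exp(ε·u(θ) / (2Δ)) is proportional to a Gaussian with mean c and per-coordinate variance 2Δ / (ε·hᵢ). The sketch below assumes a diagonal curvature and takes the center c and the sensitivity Δ as given inputs; how the paper builds c from the pretrained model and the new data, bounds Δ, and applies random projection is not reproduced, so this sketch by itself carries no privacy guarantee.

```python
import math
import random


def quadratic_exponential_mechanism(center, curvature, epsilon, sensitivity, rng=random):
    """Draw theta with density proportional to exp(epsilon * u(theta) / (2 * sensitivity)),
    where u(theta) = -0.5 * sum_i curvature[i] * (theta[i] - center[i]) ** 2.
    Completing the square gives N(center, diag(2 * sensitivity / (epsilon * curvature)))."""
    stds = [math.sqrt(2.0 * sensitivity / (epsilon * h)) for h in curvature]
    sample = [rng.gauss(c, s) for c, s in zip(center, stds)]
    return sample, stds
```

Smaller ε (stronger privacy) or flatter curvature widens the Gaussian, so directions where the local quadratic model is sharp are perturbed least.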

2605.20515 2026-05-21 cs.LG eess.SP 版本更新

Online Conformal Prediction with Corrupted Feedback

受损反馈下的在线共形预测

Bowen Wang, Matteo Zecchin, Osvaldo Simeone

发表机构 * Department of Engineering, King’s College London(伦敦国王学院工程系) Communication Systems Department, EURECOM(EURECOM通信系统部) Institute for Intelligent Networked Systems, Northeastern University London(伦敦东北大学智能网络系统研究所)

AI总结 本文研究了存在受损反馈时在线共形预测的鲁棒性问题,提出了两种鲁棒方案,并通过实验验证了其在受损反馈下的性能改进。

AI中文摘要

现代人工智能系统需要经过校准的不确定性估计,且这些估计在序贯和非平稳环境中仍需保持可靠。在线共形预测(OCP)通过自适应更新的预测集应对这一挑战,这些预测集提供确定性的长期误覆盖保证。然而,这些保证依赖于能够获得关于以往预测集覆盖情况的完美反馈这一假设。在实践中,观测到的误覆盖指示量可能因噪声、通信故障或对抗性操纵而受损,这会严重削弱OCP的校准保证。本文研究受损反馈下的OCP。我们首先将反馈受损建模为任意的二值翻转序列,并分析反馈受损如何影响并降低标准OCP的误覆盖性能。随后我们提出两种鲁棒方案:基于过滤的鲁棒OCP,利用预测阈值的结构性质过滤受损反馈;以及基于主动补偿的鲁棒OCP,引入主动补偿机制以减轻受损反馈的影响。对于这两种方法,我们建立了显式的误覆盖保证,并进一步针对独立随机翻转模型和具有记忆界的任意误差模型给出了具体结果。在真实数据集上的实验验证了所提方法:在受损反馈下,与基线OCP方法相比,其校准显著改善,预测集也明显更小。

英文摘要

Modern artificial intelligence systems require calibrated uncertainty estimates that remain reliable in sequential and non-stationary environments. Online conformal prediction (OCP) addresses this challenge through adaptively updated prediction sets that provide deterministic long-run miscoverage guarantees. These guarantees, however, hinge on the assumption of perfect feedback about the coverage of past prediction sets. In practice, the observed miscoverage indicator may be corrupted by noise, communication failures, or adversarial manipulation, which can severely degrade OCP's calibration guarantees. In this paper, we study OCP under corrupted feedback. We first model feedback corruption as an arbitrary binary flip sequence, and analyze how feedback corruption affects and degrades the miscoverage performance of standard OCP. We then propose two robust schemes: robust OCP via filtering, which leverages the structural properties of the predicted threshold to filter corrupted feedback, and robust OCP via active compensation, which incorporates an active compensation mechanism to mitigate the effect of corrupted feedback. For both methods, we establish explicit miscoverage guarantees, which are further specialized for an independent stochastic flip model and for an arbitrary error model with memory bounds. Experiments on real-world datasets validate the proposed approach, showing markedly improved calibration and significantly smaller prediction sets compared with baseline OCP methods under corrupted feedback.
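The baseline that the robust schemes build on can be sketched in threshold form: the prediction set at time t contains every label whose nonconformity score is at most q_t, and q_{t+1} = q_t + η(err_t - α) is driven by the observed miscoverage indicator. Because the update only ever sees the observed indicator, flipping it breaks the link to true coverage. The toy simulation below, with scores drawn uniformly from [0, 1) and independent flips with a hypothetical probability `flip_prob`, illustrates this; it does not implement the paper's filtering or active-compensation schemes.

```python
import random


def online_conformal(scores, alpha=0.1, eta=0.05, flip_prob=0.0, seed=0):
    """Threshold-form online conformal prediction with possibly corrupted feedback.
    Returns (true average miscoverage, observed average miscoverage)."""
    rng = random.Random(seed)
    q = 0.5                                   # initial threshold
    true_errs = obs_errs = 0
    for s in scores:
        err = 1 if s > q else 0               # true miscoverage of the set {y: score <= q}
        obs = 1 - err if rng.random() < flip_prob else err  # independent flip
        q += eta * (obs - alpha)              # the update only sees the observed indicator
        true_errs += err
        obs_errs += obs
    return true_errs / len(scores), obs_errs / len(scores)


score_rng = random.Random(1)
scores = [score_rng.random() for _ in range(5000)]
print(online_conformal(scores, flip_prob=0.0))
print(online_conformal(scores, flip_prob=0.2))
```

Without flips, q_t stays within [-ηα, 1 + η), so the average miscoverage is within (1 + η)/(ηT) of α. With `flip_prob = 0.2 > α`, the observed indicator averages at least 0.2 even when every set covers, so q_t keeps drifting upward: the sets stay valid but grow uninformative, and true miscoverage falls far below α.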

2605.20506 2026-05-21 cs.LG cs.CL 版本更新

Reinforcing Human Behavior Simulation via Verbal Feedback

通过言语反馈强化人类行为模拟

Weiwei Sun, Xuhui Zhou, Jiarui Liu, Weihua Du, Haojia Sun, Yiqing Xie, Qianou Ma, Sihao Chen, Mengting Wan, Longqi Yang, Pei Zhou, Sherry Wu, Sean Welleck, Graham Neubig, Yiming Yang, Maarten Sap

发表机构 * Carnegie Mellon University(卡内基梅隆大学) Microsoft(微软)

AI总结 本文提出DITTO模型,通过将言语反馈作为强化学习中的首要信号来提升LLM模拟人类行为的能力,并引入SOUL基准测试平台,展示了在多个任务中显著提升性能的成果。

AI中文摘要

人类通过言语反馈学习社会规范和行为(例如父母说“那样很没礼貌”,或朋友解释“这就是那句话伤人的原因”)。然而,LLM从反馈中学习的研究主要集中在代码和数学等领域,这些领域的RL奖励可以直接验证并压缩为标量值。随着LLM越来越多地被用于模拟人类行为,例如充当用户、患者、学生和其他角色,让它们更像人类的需求日益迫切,而这需要接纳一种根本不同的信号:言语化、主观且多维的反馈。我们提出了DITTO,一个将言语反馈作为强化学习中一等信号来训练的模型。每次rollout之后,DITTO接收言语反馈并生成以反馈为条件的改进rollout;两种输出通过GRPO联合优化,将言语指导蒸馏到基础策略中,而测试时无需反馈。我们还引入了SOUL(Simulation gym Of hUman-Like behavior),一个统一的基准与训练数据集,涵盖六个类别的10个任务:心智理论、角色扮演、社交技能、学习者模拟、用户模拟和人设模拟。DITTO相比基础模型平均提升36%,并在10个SOUL基准中的6个上超过GPT-5.4,表明基于言语反馈的强化学习是训练LLM模拟人类行为的一个有前景的方向。

英文摘要

Humans learn social norms and behaviors from verbal feedback (e.g., a parent saying "that was rude" or a friend explaining "here's why that hurt"). Yet, learning from feedback for LLMs has largely focused on domains like code and math, where RL rewards are directly verifiable and condensed into scalar values. As LLMs are increasingly used to simulate human behavior, e.g., standing in for users, patients, students, and other personas, there is a pressing need to make them more human-like, which requires embracing a fundamentally different kind of signal: feedback that is verbal, subjective, and multi-faceted. We present DITTO, a model trained by treating verbal feedback as a first-class signal in reinforcement learning. After each rollout, DITTO receives verbal feedback and generates a feedback-conditioned improved rollout; both outputs are jointly optimized with GRPO, distilling verbal guidance into the base policy without requiring feedback at test time. We also introduce SOUL (Simulation gym Of hUman-Like behavior), a unified benchmark and training data suite spanning 10 tasks across six categories: Theory of Mind, character role play, social skill, learner simulation, user simulation, and persona simulation. DITTO achieves an average 36% improvement over the base model and exceeds GPT-5.4 on 6 of 10 SOUL benchmarks, demonstrating that RL with verbal feedback is a promising direction for training LLMs to simulate human behavior.
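GRPO replaces a learned value baseline with group-relative advantages: each rollout's reward is standardized against the other rollouts sampled for the same prompt. The sketch below shows that computation. How DITTO scores the original and the feedback-conditioned rollouts, and whether they share one group, is not specified in the abstract; pooling them into one group here is an assumption made for illustration.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: (r_i - mean(r)) / (std(r) + eps) within one group."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# hypothetical rewards: two initial rollouts, then their feedback-conditioned revisions
initial = [0.2, 0.4]
revised = [0.6, 0.8]
adv = group_relative_advantages(initial + revised)
```

Under this pooling, revisions that improve on the initial attempts receive positive advantages, so the policy gradient pushes the base policy toward the behavior the feedback elicited, without feedback being needed at test time.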

2605.20502 2026-05-21 cs.LG cs.AI cs.CV stat.AP stat.ML 版本更新

Tippett-minimum Fusion of Representation-space Diffusion Models for Multi-Encoder Out-of-Distribution Detection

基于表示空间扩散模型Tippett最小值融合的多编码器分布外(OOD)检测

Neelkamal Bhuyan

发表机构 * Georgia Institute of Technology(佐治亚理工学院)

AI总结 本文提出一种融合多个编码器各自表示空间扩散模型的方法:仅从分布内数据统计分析每个编码器对特定分布偏移类型的敏感性,并引入EncMin2L门控机制,无需OOD标签即可以更低的参数成本提升分布外检测性能,在四种分布偏移类型上AUROC均达到0.94以上。

Comments 14 pages

AI中文摘要

我们通过融合多个编码器各自的表示空间扩散模型(RDM),解决跨完整分布偏移谱的分布外(OOD)检测问题,涵盖全局域变化、语义差异、纹理差异和协变量损坏。我们仅从分布内(ID)数据出发,统计识别每个编码器对特定偏移类型的敏感性,并引入EncMin2L:一种与编码器无关的两级min(·)门控,无需OOD标签即可组合并校准各编码器基于扩散的似然检测器,以低2.3倍的参数成本优于单体式多编码器基线。两种ID数据诊断量:η²(类条件F检验)和Δμ(合成损坏下的对数似然偏移)用于量化编码器的专长,而Tippett最小p值组合将各编码器得分聚合为单一、校准稳定的OOD信号。EncMin2L在全部四种偏移类型上同时达到≥0.94的AUROC,在重叠基准上优于最先进的表示空间扩散OOD检测器。

英文摘要

We address out-of-distribution (OOD) detection across the full spectrum of distribution shifts (global domain changes, semantic divergence, texture differences, and covariate corruptions) through a multi-encoder fusion of per-encoder representation-space diffusion models (RDMs). We statistically identify each encoder's sensitivity to specific shift types from ID data alone and introduce EncMin2L, an encoder-agnostic two-level min(·)-gate that combines and calibrates per-encoder diffusion-based likelihood detectors without OOD labels, outperforming monolithic multi-encoder baselines at 2.3× lower parameter cost. Two ID-data diagnostics, η² (class-conditional F-test) and Δμ (log-likelihood shift under synthetic corruptions), quantify encoder specialization, while a Tippett minimum p-value combination aggregates per-encoder scores into a single, calibration-stable OOD signal. EncMin2L achieves ≥ 0.94 AUROC across all four shift types simultaneously, outperforming the state-of-the-art representation-space diffusion OOD detectors across overlapping benchmarks.
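Tippett's method combines k p-values through their minimum: if the k p-values are independent and uniform under the null, P(min ≤ p) = 1 - (1 - p)^k, so the combined p-value is 1 - (1 - min pᵢ)^k. The sketch below pairs it with a standard conformal-style empirical p-value computed from in-distribution calibration scores. It shows only the combination step; the EncMin2L two-level gate and the diffusion-likelihood scores are the paper's and are not reproduced, and the independence assumption is unlikely to hold exactly across encoders.

```python
def empirical_p_value(score, calibration_scores):
    """p-value of an OOD score (higher = more anomalous) against ID calibration scores."""
    n_at_least = sum(c >= score for c in calibration_scores)
    return (1 + n_at_least) / (1 + len(calibration_scores))


def tippett_combine(p_values):
    """Combined p-value 1 - (1 - min p)^k for k independent p-values."""
    k = len(p_values)
    return 1.0 - (1.0 - min(p_values)) ** k


# hypothetical per-encoder scores for one test input, each with its own ID calibration set
p_per_encoder = [
    empirical_p_value(5.0, [1.0, 2.0, 3.0, 4.0, 6.0]),
    empirical_p_value(0.5, [0.1, 0.2, 0.9, 1.5, 2.0]),
]
combined = tippett_combine(p_per_encoder)
```

Because the minimum is taken, a single encoder that is specialized for the shift at hand can flag an input even when the other encoders see nothing unusual; the (1 - p)^k correction keeps that from inflating false alarms.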

2605.20494 2026-05-21 cs.LG physics.ao-ph stat.AP 版本更新

A 10,000-Year Global Stochastic Tropical Cyclone Catalog with Wind-Dependent Track Transitions (WHITS)

具有风依赖性路径转换的10,000年全球随机热带气旋目录(WHITS)

Jennifer Nakamura, Upmanu Lall

发表机构 * Lamont-Doherty Earth Observatory, Columbia University(哥伦比亚大学拉蒙特-多赫蒂地球观测站) School of Complex Adaptive Systems, Arizona State University(亚利桑那州立大学复杂适应系统学院) Earth and Environmental Engineering, Columbia University(哥伦比亚大学地球与环境工程系)

AI总结 本文提出WHITS方法,通过非参数半马尔可夫路径生成器生成全球10,000年合成气旋目录,以提高保险损失评估的可靠性。

AI中文摘要

可靠的热带气旋(TC)风险评估受限于历史记录的短暂和空间稀疏,尤其是对于主导保险损失的罕见高强度登陆事件。我们提出了WHITS(风聚焦飓风交互路径模拟器),这是一种非参数半马尔可夫路径生成器,从三个方面扩展了Nakamura等人(2015)的HITS框架:历史路径段之间的转换除以位置、年龄和前进向量为条件外,还以局部风速为条件;对比较向量项的核选择进行了锐化,以抑制动力学上不一致的跳跃;并在每次转换处施加短平滑窗口,以消除下游风暴潮用户报告的位置和风速不连续问题。WHITS针对IBTrACS中六个海盆各自的全部可用最佳路径记录进行拟合,北大西洋可追溯至1851年,其他海盆则追溯至可靠最佳路径数据的最早年份。由此得到的10,000年全球合成目录再现了所有海盆观测到的路径密度以及每年遭受飓风/台风级风力袭击的概率。该目录面向灾难风险应用,在此类应用中,大量、低偏差且物理上合理的路径样本比小规模、经统计修正的样本更有用。

英文摘要

Reliable assessment of tropical cyclone (TC) risk is limited by the brevity and spatial sparsity of the historical record, particularly for the rare, high-intensity landfalls that dominate insured loss. We present WHITS (Wind-focused Hurricane Interactive Track Simulator), a non-parametric semi-Markov track generator that extends the HITS framework of Nakamura et al. (2015) in three ways: transitions between historical track segments are conditioned on local wind speed in addition to position, age, and forward vector; the kernel selection on the comparative-vector term is sharpened to suppress dynamically inconsistent jumps; and a short smoothing window is applied across each transition to remove the position and wind discontinuities reported by downstream surge users. WHITS is fit to the full available best-track record in each of six basins in IBTrACS, extending in the North Atlantic to 1851 and in other basins to the earliest year of reliable best-track data. The resulting 10,000-yr global synthetic catalog reproduces observed track density and the annual hurricane/typhoon-force wind-hit probability across all basins. The catalog is intended for catastrophe-risk applications where a large, low-bias sample of physically plausible tracks is more useful than a small, statistically corrected one.
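The core move of a HITS-style semi-Markov generator is a nearest-neighbour jump: from the current storm state, pick a historical track segment whose state is similar and continue along it. A minimal sketch of that step is below, with wind speed included in the state vector as WHITS does. The feature set, the weighted Euclidean distance, `k`, the `weights`, the candidate dictionary layout, and the 1/j rank kernel (a common k-nearest-neighbour resampling kernel) are all assumptions for illustration; the forward-vector term, the kernel sharpening, and the transition smoothing described above are not modelled.

```python
import math
import random


def knn_next_segment(state, candidates, k=5, weights=(1.0, 1.0, 0.1, 0.1), rng=random):
    """Choose the historical segment to continue from.
    state and candidate["state"]: (lat_deg, lon_deg, age_h, wind_kt) tuples (hypothetical).
    The k nearest candidates are ranked, and rank j is drawn with probability ~ 1/j."""
    def distance(cand):
        return math.sqrt(sum(w * (a - b) ** 2
                             for w, a, b in zip(weights, state, cand["state"])))
    nearest = sorted(candidates, key=distance)[:k]
    kernel = [1.0 / j for j in range(1, len(nearest) + 1)]
    return rng.choices(nearest, weights=kernel, k=1)[0]
```

Conditioning on wind means a weakening storm is preferentially stitched onto historical segments with similar intensity, which is what keeps the synthetic wind-hit statistics close to the observed ones.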

2605.20485 2026-05-21 cs.LG 版本更新

ZEBRA: Zero-shot Budgeted Resource Allocation for LLM Orchestration

ZEBRA: 零样本预算化资源分配用于LLM编排

May Hamri, Inbal Talgam-Cohen

发表机构 * Tel Aviv University(特拉维夫大学)

AI总结 该研究提出ZEBRA框架,通过将多阶段预算分配转化为连续非线性背包问题,有效解决多智能体流水线中预算分配问题,实验显示其在多个任务上均优于传统方法。

AI中文摘要

随着自主代理越来越多地在固定货币预算下执行端到端任务,亟待解决的问题已从预算是否被遵守转向如何有效地使用预算。现有的预算感知方法通常在单个代理内逐步控制推理,或通过强化学习学习资源分配策略,但都未解决如何在推理时将预算划分到多智能体流水线各组成阶段的问题。我们提出ZEBRA,一种零样本框架,将多阶段预算分配归约为连续非线性背包问题:由一个LLM控制器估计各阶段的效用曲线,再对拉格朗日乘子进行注水(water-filling)搜索,得到各阶段的预算划分。加法和乘法两种聚合方式统一在同一求解器下。在包含150个任务的APPS编码基准上,ZEBRA的两种变体在所有聚合指标上均优于LLM直接分配(由LLM直接分配预算)。当预算为无约束支出的α=0.5时,ZEBRA恢复了无约束质量的94.4%,而LLM直接分配仅为88.1%。该优势具有统计显著性,且可迁移到编码以外的任务:在3阶段HotpotQA流水线上,ZEBRA比LLM直接分配高出14.3个百分点,且其分配在经验上对曲线估计噪声具有鲁棒性。在HotpotQA上,ZEBRA得到的预算划分(接近均衡)与APPS上的划分(偏向细化阶段)不同,体现了对流水线结构的适应。更广泛地说,我们表明在推理时引入轻量级算法指导能够改善自主多智能体系统的经济行为。

英文摘要

As autonomous agents increasingly execute end-to-end tasks under fixed monetary budgets, the pressing open question shifts from whether the budget is respected to how to spend it effectively. Existing budget-aware methods typically control reasoning step-by-step within a single agent, or learn resource allocation policies via RL. None address how to split a budget across the constituent phases of a multi-agent pipeline at inference time. We propose ZEBRA, a zero-shot framework that reduces multi-phase budget allocation to a continuous nonlinear knapsack problem: an LLM controller estimates per-phase utility curves, and a water-filling search on the Lagrange multiplier returns the per-phase split. Additive and multiplicative aggregations are unified under the same solver. On a 150-task APPS coding benchmark, both ZEBRA variants outperform LLM-direct (budget allocation directly by an LLM) on every aggregate metric. At a budget of α = 0.5 of the unconstrained spend, ZEBRA recovers 94.4% of unconstrained quality, versus 88.1% for LLM-direct. The advantage is statistically significant and transfers beyond coding: on a 3-phase HotpotQA pipeline, ZEBRA beats LLM-direct by 14.3 pp, with allocations empirically robust to curve-estimation noise. On HotpotQA, ZEBRA arrives at a different budget split (near-balanced) compared to the APPS one (skewed towards a refinement phase), showing adaptation to the pipeline structure. More broadly, we show that lightweight algorithmic guidance at inference time can improve the economic behavior of autonomous multi-agent systems.
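The reduction to a continuous knapsack can be sketched with water-filling. Assuming concave per-phase utilities Uᵢ(b) = wᵢ·log(1 + b/cᵢ) (a hypothetical stand-in for the LLM-estimated curves), the KKT conditions give bᵢ(λ) = max(0, wᵢ/λ - cᵢ), and bisection on the Lagrange multiplier λ finds the price at which the allocations exactly exhaust the budget. This covers only the additive-aggregation case; ZEBRA's curve estimation and its multiplicative aggregation are not shown.

```python
def water_filling(weights, offsets, budget, iters=100):
    """Maximize sum_i w_i * log(1 + b_i / c_i) subject to sum_i b_i = budget, b_i >= 0."""
    def allocate(lam):
        # stationarity: w_i / (c_i + b_i) = lam, clipped at zero spend
        return [max(0.0, w / lam - c) for w, c in zip(weights, offsets)]

    lo = 1e-12                                          # tiny price: overspends
    hi = max(w / c for w, c in zip(weights, offsets))   # at this price every b_i is 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(allocate(mid)) > budget:
            lo = mid        # over budget: raise the price of spending
        else:
            hi = mid        # within budget: try a lower price
    return allocate(hi)
```

For example, with weights (2, 1), offsets (1, 1), and a budget of 1, the higher-value phase receives the entire budget: its marginal utility never drops to the level at which the second phase would start receiving budget. This "water level" behavior is what lets the split adapt to the shape of each phase's curve.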

2605.20482 2026-05-21 cs.LG cs.SY eess.SY 版本更新

Quadratic Characterizations for Reachability Analysis of Neural Networks

用于神经网络可达性分析的二次刻画

Elias Khalife, Mazen Farhood, Pierre-Loic Garoche

发表机构 * Kevin T. Crofton Department of Aerospace and Ocean Engineering, Virginia Tech(凯文·T·克罗夫顿航空航天与海洋工程系,弗吉尼亚理工学院) Federation ENAC ISAE-SUPAERO ONERA, Universite de Toulouse(ENAC ISAE-SUPAERO ONERA联盟,图卢兹大学)

AI总结 本文提出了一个为二维实平面上的标量关系构建经验证二次刻画的框架:先局部生成候选二次不等式,再进行全局验证,以降低神经网络可达性分析的保守性。

AI中文摘要

二次约束(QC)被广泛用于刻画非线性和不确定性,但通用的解析刻画在有界域上可能较为保守。本文提出了一个为二维实平面上的标量关系构建经验证二次刻画的框架。候选二次不等式通过求解凸二次规划局部生成,所用样本取自关系本身以及其外部的采样点。随后,借助平方和(SOS)证书,在精确的半代数描述上(对于非多项式关系,则在松弛的多项式描述上)对其进行全局验证。由此得到的经验证约束定义了所考虑域上标量关系的可靠(sound)过近似。这些约束与现有基于QC和逐点积分二次约束(IQC)的静态非线性与不确定性分析框架直接兼容,也可以嵌入基于QC的半定规划中,用于前馈神经网络的可达性和安全性分析。对于tanh等平滑激活函数,该方法给出依赖于定义域的二次刻画,可作为通用扇区或斜率描述的替代。对于ReLU网络,我们通过利用神经元之间的依赖关系和更紧的局部界,给出了降低基于QC的前馈网络可达性分析保守性的方法。数值示例展示了平滑激活函数下可达性结果的改进、ReLU网络保守性的降低,并通过一个涉及饱和的示例展示了该方法在神经网络之外的适用性。

英文摘要

Quadratic constraints (QCs) are widely used to characterize nonlinearities and uncertainties, but generic analytical characterizations can be conservative on bounded domains. This paper develops a framework for constructing verified quadratic characterizations of scalar relations in the two-dimensional real plane. Candidate quadratic inequalities are locally generated by solving convex quadratic programs using samples from the relation and exterior sample points. They are then verified globally using sum-of-squares certificates over an exact semialgebraic description or, in the case of nonpolynomial relations, over relaxed polynomial descriptions. The resulting verified constraints define a sound overapproximation of the scalar relations over the considered domains. These constraints are directly compatible with existing analysis frameworks based on QCs and pointwise integral quadratic constraints (IQCs) for static nonlinearities and uncertainties, and they can also be embedded in QC-based semidefinite programs for reachability and safety analysis of feedforward neural networks. For smooth activations such as tanh, the method yields domain-dependent quadratic characterizations that constitute an alternative to generic sector- or slope-based descriptions. For ReLU networks, we give methods to reduce conservatism in QC-based reachability analysis of feedforward networks by exploiting dependencies between neurons and tighter local bounds. Numerical examples demonstrate improved reachability results for smooth activations, reduced conservatism for ReLU networks, and applicability beyond neural networks through an example involving saturation.
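The point that generic characterizations are conservative on bounded domains can be illustrated with tanh. Globally, tanh satisfies the sector condition y·(x - y) ≥ 0 (sector [0, 1]). On a bounded interval [-a, a], tanh(x)/x decreases in |x|, so the tighter sector [tanh(a)/a, 1] also holds, i.e. (y - αx)(βx - y) ≥ 0 with α = tanh(a)/a and β = 1. The sketch below checks such quadratic constraints on a dense sample. Sampling is only a sanity check, not the sum-of-squares verification the paper uses, and the paper's QP-generated constraints are more general than sectors.

```python
import math


def tanh_sector_on_interval(a):
    """Tightest sector [alpha, beta] containing tanh on [-a, a] (a > 0)."""
    return math.tanh(a) / a, 1.0


def check_sector_qc(alpha, beta, a, n=2001, tol=1e-12):
    """Sample-based check (NOT a certificate) of (y - alpha*x) * (beta*x - y) >= 0
    for y = tanh(x) on [-a, a]."""
    for i in range(n):
        x = -a + 2 * a * i / (n - 1)
        y = math.tanh(x)
        if (y - alpha * x) * (beta * x - y) < -tol:
            return False
    return True
```

On [-2, 2] the lower slope rises from 0 to about 0.48, which shrinks the set of input-output pairs the QC admits and, in turn, the reachable sets propagated through a network.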

2605.20479 2026-05-21 cs.CV cs.LG 版本更新

Oracle Supervision Transfers for Hyperparameter Prediction in Model-Based Image Denoising

基于模型的图像去噪中用于超参数预测的Oracle监督迁移

Jianmin Liao, Lixin Shen, Yuesheng Xu

发表机构 * Department of Mathematics Syracuse University(雪城大学数学系) Department of Mathematics & Statistics Old Dominion University(欧道明大学数学与统计学系)

AI总结 该研究提出HyperDn,一种以配置为条件的单一预测器,通过汇聚源配置上的Oracle监督,为新的去噪器-噪声配置预测异构超参数;跨范式实验表明,它能从相对廉价的TV/TGV变分源迁移到更昂贵的基于扩散的DiffPIR,在只需很少甚至无需目标Oracle标签的情况下达到接近Oracle的性能。

AI中文摘要

超参数预测是基于模型的图像去噪器(从经典的TV/TGV变分求解器到DiffPIR等现代基于扩散的模型)的一个关键实际瓶颈。尽管现有的学习型预测器可以达到接近Oracle的性能,但这种方法的扩展性很差:每个新配置通常都需要自己的Oracle标注训练集,而每个标签都需要一次对照干净真值评估的分层网格搜索。因此,我们探究在源配置上收集的Oracle监督能否迁移到目标配置,而只需很少或无需目标Oracle标签。我们提出了HyperDn,一种以配置为条件的单一预测器,它汇聚多个源配置的Oracle监督,并为新的去噪器-噪声配置预测异构超参数。在跨范式实验中,HyperDn从相对廉价的TV/TGV变分源迁移到更昂贵的基于扩散的DiffPIR。仅使用2个目标Oracle标签,它就达到30.23 dB,与Oracle相差0.90 dB以内,并优于使用64个标签从头训练的单配置预测器,所用目标标签仅为该基线的1/32。在完全没有目标Oracle标签的情况下,HyperDn在由已见噪声类型构成的两种未见混合噪声上,以及从相对廉价的96×96源图像迁移到512×768目标图像时,也达到了接近Oracle的PSNR。这些结果共同表明,超参数预测所需的昂贵Oracle监督可以从源配置迁移到新的目标配置,从而减少为每个新的去噪配置重建Oracle标签的需求。

英文摘要

Hyperparameter prediction is a critical practical bottleneck for model-based image denoisers, ranging from classical TV/TGV variational solvers to modern diffusion-based models such as DiffPIR. While existing learned predictors can achieve near-oracle performance, this approach scales poorly: each new configuration conventionally requires its own oracle-labeled training set, and each label requires a hierarchical grid search evaluated against clean ground truth. We therefore ask whether oracle supervision collected on source configurations can transfer to target configurations with few or no target oracle labels. We propose HyperDn, a single configuration-conditioned predictor that pools oracle supervision across source configurations and predicts heterogeneous hyperparameters for new denoiser-noise configurations. In a cross-paradigm experiment, HyperDn transfers from relatively cheap TV/TGV variational sources to more expensive diffusion-based DiffPIR. With only 2 target oracle labels, it reaches 30.23 dB, within 0.90 dB of the oracle, and outperforms the 64-label per-configuration predictor trained from scratch, using 1/32 as many target labels as that baseline point. Without any target oracle labels, HyperDn also reaches near-oracle PSNR on two unseen mixtures of seen noise types and on transfer from relatively cheap 96×96 source images to 512×768 targets. Together, these results show that expensive oracle supervision for hyperparameter prediction can be transferred from source to new target configurations, reducing the need to rebuild oracle labels for each new denoising configuration.
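Each oracle label in this setting comes from a hierarchical grid search scored against clean ground truth. A one-dimensional coarse-to-fine version is sketched below. The zoom rule, the grid sizes, and the toy "denoiser" (a single shrinkage factor) are assumptions for illustration; real configurations such as DiffPIR tune several hyperparameters jointly, which multiplies exactly the cost that HyperDn aims to avoid.

```python
import math


def psnr(clean, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB for signals in [0, peak]."""
    mse = sum((c - e) ** 2 for c, e in zip(clean, estimate)) / len(clean)
    return float("inf") if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def hierarchical_grid_search(objective, lo, hi, points=9, levels=4):
    """Coarse-to-fine 1-D search for the hyperparameter maximizing `objective`."""
    best = None
    for _ in range(levels):
        step = (hi - lo) / (points - 1)
        grid = [lo + i * step for i in range(points)]
        best = max(grid, key=objective)
        lo, hi = max(lo, best - step), min(hi, best + step)  # zoom around the best point
    return best


# toy oracle label: the shrinkage factor that maximizes PSNR on one noisy signal
clean = [0.2, 0.5, 0.9, 0.4]
noisy = [0.25, 0.45, 0.95, 0.35]
oracle_lam = hierarchical_grid_search(lambda lam: psnr(clean, [lam * v for v in noisy]), 0.0, 2.0)
```

Every call to `objective` needs the clean image, so labels like `oracle_lam` can only be produced offline; a predictor such as HyperDn is meant to amortize that cost across configurations.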

2605.20477 2026-05-21 cs.LG cs.AI cs.CL 版本更新

Training Language Agents to Learn from Experience

训练语言代理以从经验中学习

Yuval Shalev, Zifeng Ding, Mateja Jamnik

发表机构 * University of Cambridge(剑桥大学)

AI总结 本文提出了一种名为In-context Training(ICT)的任务框架,用于评估语言代理在跨任务中的自我改进能力,并通过基于强化学习的训练管道直接从经验中学习反思,从而在多个基准任务中优于基线模型,展示了从经验中学习的能力本身可以被学习。

AI中文摘要

语言代理可以在交互环境中通过经验进行适应,但当前基于反思的方法只能在单个任务实例内进行自我纠正。这种经验能否被提炼为可复用的教训,从而提高在未来未见任务上的性能,目前仍不明确。为此,我们引入了In-context Training(ICT)任务,这是一个用于评估语言代理跨任务自我改进能力的框架。在ICT中,一个反思模型观察由行动模型收集的轨迹,并生成系统提示,旨在提高行动模型在未来未见任务上的性能。随后,我们提出了一种基于强化学习的训练流程,直接从经验中学习此类反思,而无需人工提供的示例。在ALFWorld和MiniHack上,我们训练的反思器在大多数保留任务族上优于未训练的基线,表明从经验中学习的能力本身是可以被学习的。在某些情况下,我们观察到反思器能够泛化到其训练基准之外、差异显著的环境中。最后,我们介绍了MetaGym,一个用于构建元环境的通用Python库,以促进未来对自我改进语言代理的研究。

英文摘要

Language agents can adapt from experience in interactive environments, but current reflection-based methods can only self-correct within a single task instance. Whether such experience can be distilled into reusable lessons that improve performance on future unseen tasks remains unclear. We address this problem by introducing the In-context Training (ICT) task, a framework for evaluating cross-task self-improvement in language agents. In ICT, a reflector model observes trajectories collected by an actor model and generates system prompts intended to improve the actor's performance on future unseen tasks. We then propose an RL-based training pipeline for learning such reflections directly from experience, without human-provided examples. Across ALFWorld and MiniHack, our trained reflectors outperform an untrained baseline on most held-out task families, showing that the ability to learn from experience can itself be learned. In some cases, we observe generalisation beyond the benchmark on which the reflector was trained, to substantially different environments. Finally, we introduce MetaGym, a generic Python library for constructing meta-environments, enabling future research on self-improving language agents.

2605.20473 2026-05-21 cs.SE cs.AI cs.LG Updated version

Code Generation by Differential Test Time Scaling

Yifeng He, Ethan Wang, Jicheng Wang, Xuanxin Ouyang, Hao Chen

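Affiliations: University of California, Davis

AI Summary: This paper proposes DiffCodeGen, a code generation method based on coverage-guided differential analysis. It generates diverse code candidates and uses coverage-guided fuzzing to synthesize inputs, without requiring existing test cases or large language models. This design improves efficiency and scalability.

Comments 16 pages main text, 21 pages with references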

Abstract

Test-time scaling has emerged as a promising approach for improving code generation by exploring large solution spaces at inference time. However, existing methods often rely on public test cases that are unavailable in practice, or require extensive LLM inference for candidate selection, leading to significant token consumption and time overhead. We present DiffCodeGen, a novel test-time scaling method for code generation based on coverage-guided differential analysis. DiffCodeGen generates diverse code candidates using various sampling and prompting strategies, then applies coverage-guided fuzzing to synthesize inputs without requiring any existing tests or large language models. By executing all candidates on these inputs, DiffCodeGen captures their dynamic behavior and clusters candidates based on behavioral similarity. DiffCodeGen selects the medoid of the largest cluster as the final output. Unlike prior test-time scaling methods that invoke additional LLM inference for candidate selection, DiffCodeGen performs selection without any extra model calls, incurring little to no additional token consumption. DiffCodeGen is fully asynchronous, naturally suited to the current trend of agentic coding, and is thus efficient and highly scalable. We evaluate DiffCodeGen across 4 large language models, demonstrating consistent improvements over baselines. Compared to state-of-the-art test-time scaling methods, DiffCodeGen achieves competitive or superior performance while using only a fraction of time and tokens. DiffCodeGen is model-agnostic and can be combined with reasoning models to further boost performance.
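The selection step can be sketched as follows. The sketch assumes candidates are plain Python callables and that the fuzzed inputs are already available. The coverage-guided fuzzer itself is not shown; a hand-written input list stands in for it. The disagreement measure, the greedy threshold clustering, and all function names are hypothetical, because the abstract does not specify which clustering algorithm or behavioral similarity metric DiffCodeGen uses.

```python
from typing import Any, Callable, List, Sequence, Tuple

Signature = Tuple[str, ...]


def behavior(candidate: Callable[[Any], Any], inputs: Sequence[Any]) -> Signature:
    """Record a candidate's observable behavior: its output (or error type) on each input."""
    outputs = []
    for x in inputs:
        try:
            outputs.append(repr(candidate(x)))
        except Exception as exc:  # a crash counts as a distinct behavior
            outputs.append(f"<error:{type(exc).__name__}>")
    return tuple(outputs)


def disagreement(a: Signature, b: Signature) -> float:
    """Fraction of inputs on which two candidates behave differently."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def select_candidate(candidates: Sequence[Callable[[Any], Any]], inputs: Sequence[Any],
                     threshold: float = 0.0) -> int:
    """Return the index of the medoid of the largest behavioral cluster."""
    sigs = [behavior(c, inputs) for c in candidates]
    clusters: List[List[int]] = []
    for i, sig in enumerate(sigs):
        # Hypothetical greedy leader clustering: join the first cluster whose
        # leader disagrees with this candidate on at most `threshold` of inputs.
        for cluster in clusters:
            if disagreement(sig, sigs[cluster[0]]) <= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    largest = max(clusters, key=len)
    # Medoid: the member with the smallest total disagreement to its cluster.
    return min(largest, key=lambda i: sum(disagreement(sigs[i], sigs[j]) for j in largest))


# Four candidates for absolute value; the last two are buggy.
candidates = [abs, lambda x: x if x >= 0 else -x, lambda x: x, lambda x: -x]
chosen = select_candidate(candidates, inputs=[-3, 0, 5])
```

No model is called during selection, only the candidates themselves. This matches the abstract's point that selection adds little or no token cost.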

2605.20450 2026-05-21 cs.LG cs.CR Updated version

SMA-DP: Spectral Memory-Aware Differential Privacy for Deep Learning

Mohammad Partohaghighi, Roummel Marcia

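Affiliations: Department of Electrical Engineering and Computer Science, University of California, Merced; Department of Applied Mathematics

AI Summary: This paper proposes SMA-DP-SGD, a differentially private stochastic gradient descent method. It augments DP-SGD with a spectral memory branch built only from previously privatized releases. The memory branch improves utility while keeping the privacy analysis transparent. The method achieves competitive or better accuracy than several DP optimization baselines on multiple datasets.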

Abstract

Differentially private stochastic gradient descent (DP-SGD) enables private deep learning through per-example clipping and calibrated Gaussian noise, but its high-variance updates can reduce utility on challenging datasets. We propose \textbf{SMA-DP-SGD}, a \textbf{Spectral Memory-Aware Differentially Private Stochastic Gradient Descent} method that augments DP-SGD with a fractional memory branch built only from previously privatized noisy releases. WeightWatcher-inspired power-law spectral exponents provide group-wise reliability signals, instantiated layer-wise in our experiments, to adapt the decay and effective memory depth. Private-history alignment, norm matching, and warm-up activation stabilize the memory contribution. Privacy remains transparent: conditioned on the private release history, the memory branch is fixed, and the only newly data-dependent term is the current clipped sum scaled by a fixed coefficient \(β\). Hence, SMA-DP-SGD preserves a clean conditional sensitivity structure and exactly recovers group-wise DP-SGD when \(β=1\). Experiments on CIFAR-100, CIFAR-10, and MNIST show competitive or superior accuracy over several DP optimization baselines, with the largest gains on CIFAR-100 and CIFAR-10. CIFAR-10 ablations show that \(β\) controls the privacy--utility trajectory, while spectral and memory diagnostics confirm a controlled short-to-moderate effective memory depth and a small memory-branch ratio. Runtime analysis shows that the mechanism incurs additional overhead, about \(2.94\times\) DP-SGD in our CIFAR-10 implementation, revealing a practical trade-off between adaptive private memory and computational cost.
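Below is a simplified sketch of an update of this shape, under explicit assumptions. The memory branch is a power-law-weighted average of the last few privatized releases. These weights are hypothetical and stand in for the paper's fractional memory, whose exact form is not given here. The sketch omits the spectral (WeightWatcher-style) adaptation of decay and depth, private-history alignment, norm matching, and warm-up. Because the memory uses only past noisy releases, it is post-processing of already-private outputs. Setting `beta=1.0` reduces the step to plain DP-SGD, mirroring the abstract's $β=1$ case. All names and defaults are illustrative.

```python
import math
import random
from typing import List, Optional, Sequence

Vector = List[float]


def clip(grad: Sequence[float], max_norm: float) -> Vector:
    """Per-example L2 clipping, as in standard DP-SGD."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_release(per_example_grads: Sequence[Sequence[float]], max_norm: float,
                    sigma: float, rng: random.Random) -> Vector:
    """Clipped gradient sum plus Gaussian noise with std sigma * max_norm."""
    total = [0.0] * len(per_example_grads[0])
    for grad in per_example_grads:
        for i, g in enumerate(clip(grad, max_norm)):
            total[i] += g
    return [t + rng.gauss(0.0, sigma * max_norm) for t in total]


def memory_term(history: Sequence[Vector], alpha: float, depth: int) -> Optional[Vector]:
    """Power-law-weighted average of recent privatized releases (hypothetical weights)."""
    recent = list(history[-depth:])[::-1]  # most recent release first
    if not recent:
        return None
    weights = [(k + 1) ** (-alpha) for k in range(len(recent))]
    z = sum(weights)
    return [sum(w * r[i] for w, r in zip(weights, recent)) / z for i in range(len(recent[0]))]


def sma_dp_step(params: Vector, per_example_grads: Sequence[Sequence[float]],
                history: List[Vector], lr: float, max_norm: float, sigma: float,
                beta: float, rng: random.Random, alpha: float = 1.0, depth: int = 5) -> Vector:
    """One update along beta * (current release) + (1 - beta) * (memory of past releases)."""
    batch_size = len(per_example_grads)
    release = private_release(per_example_grads, max_norm, sigma, rng)
    memory = memory_term(history, alpha, depth)  # built from past releases only
    history.append(release)
    if memory is None:
        direction = release
    else:
        direction = [beta * r + (1.0 - beta) * m for r, m in zip(release, memory)]
    return [p - lr * d / batch_size for p, d in zip(params, direction)]


# Usage: a few noisy steps on a toy two-example batch.
rng = random.Random(0)
history: List[Vector] = []
params = [0.0, 0.0]
for _ in range(3):
    params = sma_dp_step(params, [[0.5, -1.0], [2.0, 0.1]], history, lr=0.1,
                         max_norm=1.0, sigma=1.0, beta=0.7, rng=rng)
```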

2605.20449 2026-05-21 cs.LG cs.AI Updated version

LLM Pretraining Shapes a Generalizable Manifold: Insights into Cross-Modal Transfer to Time Series

Alexis Roger, Prateek Humane, Zhenghan Tai, Gwen Legate, Andrei Mircea, Vasilii Feofanov, Irina Rish

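Affiliations: McGill University; Mila - Quebec AI Institute; Université de Montréal; University of Toronto; Concordia University; 42.com

AI Summary: This study asks whether language-pretrained transformers can become effective time-series forecasters and explains the mechanism behind this cross-modal transfer. Pretraining builds a manifold, and finetuning projects numerical dynamics onto task-relevant directions.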

Abstract

Can language-pretrained transformers become effective time-series forecasters, and why? In this paper, we show that cross-modal transfer arises because language pretraining preconditions time series training with a reusable manifold. A linear probe on frozen LLM states decodes realistic time-series trajectories without paired supervision, and retrieval in this projected space yields competitive forecasts, showing that structure and dynamics exist before finetuning. Pretrained initialization also improves optimization, producing coherent gradients and a highly anisotropic loss landscape unlike random initialization. Finetuning then acts as low-dimensional alignment, reusing existing directions rather than learning temporal primitives from scratch, as evidenced by low-rank updates, subspace alignment, and shared features for periodicity, trend, and repetition. Together, these results support a geometric account of LLM-to-time-series transfer: language pretraining builds the manifold, and finetuning projects numerical dynamics onto task-relevant directions.

2605.20441 2026-05-21 cs.LG cs.AI cs.NE Updated version

Weight Decay Regimes in Grokking Transformers: Cheap Online Diagnostics

Lucky Verma

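Affiliations: Independent Researcher

AI Summary: This study examines the sharp transitions between memorization, generalization, and collapse in transformers trained on modular arithmetic. It analyzes these regimes with weight decay as a scalar empirical control parameter. It also introduces two cheap online diagnostics that track training dynamics from attention activations and complement loss-landscape diagnostics at lower compute cost.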

Comments 28 pages, 11 figures, 5 tables. Code and aggregate JSONs: https://github.com/lucky-verma/grokking-diagnostics. Per-run JSONs: https://huggingface.co/datasets/lucky-verma/grokking-diagnostics-runs. Lean 4/mathlib v4.29.0 formal checks available in the code repository

Abstract

Transformers trained on modular arithmetic exhibit sharp transitions between memorization, generalization, and collapse. We show that weight decay acts as a scalar empirical control parameter for these regimes, and introduce two cheap online diagnostics, mean pairwise attention-head cosine similarity and entropy standard deviation, that track training dynamics from attention activations alone and complement loss-landscape diagnostics at lower compute cost. Across eleven experimental conditions and three model scales (0.82M to 85M parameters), the weight-decay axis separates memorization, developmental grokking, and collapse. A near-transition logistic fit localizes the memorization-to-developmental boundary at $λ_c=0.0158$ (95% CI [0.0109, 0.0200], N=210); a power-law fit gives an empirical exponent $ν=0.757$ (CI [0.725, 0.799]). Reference exponents $ν=1/2$ and 3D Ising $ν\approx 0.63$ lie outside this empirical CI under our four-bin grid, so we report $ν$ as empirical and defer universality-class identification to denser finite-size-scaling work. A horizon-matched multi-task replication (n=280, four modular operations) preserves the weight-decay control pattern; a paired attention-head re-initialization experiment at $λ=0.05$ changes Phase-2 amplitude (Cohen's $d=-1.190$, n=10, $p_t=4.5 \times 10^{-3}$), while matched weight-norm clipping does not. Three cross-architecture probes (4L MLP, 4L LSTM, and 4L Mamba; each n=70) replicate the weight-decay-controlled transition with architecture-specific $λ_c$ values. Main diagnostic claims are scoped to modular arithmetic in small transformer attention models; the non-attention experiments are scope probes, and architecture-wide, language-model, and universality-class claims are out of scope.
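Both diagnostics can be computed from attention probabilities alone. The sketch below assumes one attention map per head, with rows as queries and entries as key probabilities. It flattens each map for the cosine similarity. It summarizes each head by its mean row entropy, then takes the standard deviation across heads. These aggregation choices are assumptions, since the abstract does not fix them.

```python
import math
import statistics
from itertools import combinations
from typing import List, Sequence

AttentionMap = Sequence[Sequence[float]]  # rows: queries; entries: key probabilities


def flatten(attn: AttentionMap) -> List[float]:
    return [p for row in attn for p in row]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mean_pairwise_head_cosine(heads: Sequence[AttentionMap]) -> float:
    """Average cosine similarity between flattened attention maps over all head pairs."""
    vecs = [flatten(h) for h in heads]
    pairs = list(combinations(vecs, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def head_entropy(attn: AttentionMap) -> float:
    """Mean Shannon entropy (nats) of a head's attention rows."""
    row_entropies = [-sum(p * math.log(p) for p in row if p > 0) for row in attn]
    return sum(row_entropies) / len(row_entropies)


def entropy_std(heads: Sequence[AttentionMap]) -> float:
    """Population standard deviation of per-head attention entropy."""
    return statistics.pstdev([head_entropy(h) for h in heads])


# Usage: one diffuse head and one sharp head over two keys.
uniform_head = [[0.5, 0.5]]
sharp_head = [[1.0, 0.0]]
similarity = mean_pairwise_head_cosine([uniform_head, sharp_head])
spread = entropy_std([uniform_head, sharp_head])
```

Both quantities need only a forward pass's attention probabilities, which is what makes them cheap to log online compared with Hessian-based loss-landscape probes.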

2605.20440 2026-05-21 cs.LG cs.AI math.RA Updated version

Group-Algebraic Tensors: Provably-optimal Equivariant Learning and Physical Symmetry Discovery

Paulina Hoyos, Shashanka Ubaru, Dongsung Huh, Vasileios Kalantzis, Kenneth L. Clarkson, Misha Kilmer, Haim Avron, Lior Horesh

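Affiliations: UT Austin; IBM Research; Independent; Tufts University; Tel-Aviv University

AI Summary: This paper proposes a group-algebraic tensor framework in which the multiplication rule of a finite group $G$ defines the tensor algebra, making equivariance an algebraic property rather than an architectural constraint. The framework rests on three theoretical pillars: (i) an Eckart-Young optimality guarantee for the $\star_G$-SVD; (ii) composition of multiple symmetries via Kronecker factorization; and (iii) a 600-line Lean 4 formalization. It offers capabilities that equivariant neural networks lack: a closed-form decomposition of every prediction and data-driven discovery of the best-fitting symmetry group. On QM9 molecular geometry, decomposing over an octahedral subgroup recovers the selection rules of angular momentum, demonstrating data-driven physical discovery.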

Abstract

We introduce the $\star_G$ tensor algebra, in which any finite group $G$ defines the multiplication rule, making equivariance an intrinsic algebraic property rather than an architectural constraint. The framework rests on three machine-verified theoretical pillars: (i)~an Eckart-Young optimality guarantee for the $\star_G$-SVD: the first such result for symmetry-preserving tensor approximation, exact and polynomial-time; (ii)~a Kronecker factorization that composes multiple symmetries by replacing $F_G$ with $F_{G_1} \otimes F_{G_2}$ with no architectural redesign; and (iii)~a 600-line Lean~4 formalization of the $\star_G$ algebra. The framework provides capabilities that equivariant neural networks (ENNs) structurally cannot: a closed-form per-irreducible-representation decomposition of every prediction, and data-driven discovery of the symmetry group that best fits a dataset. As a non-trivial empirical demonstration, decomposing QM9 molecular geometry over the chiral octahedral subgroup of SO(3) recovers the Wigner--Eckart selection rules of angular momentum from data alone, with no quantum mechanical input: scalar properties are A$_1$-dominated, dipole components are T$_1$-dominated, the isotropic polarizability is uniquely insensitive to $l\!=\!1$ as the rank-2-trace decomposition $l\!=\!0 \oplus l\!=\!2$ requires, and the T$_1$/A$_1$ predictive-power ratio separates vector observables from scalar observables by a factor of five. On full QM9 (130{,}831 molecules), $\star_G$-SVD with ridge regression provides closed form predictions at $\sim50-90\times$ fewer parameters than parameter-matched MLPs. Algebraic equivariance thus complements architectural equivariance not as a faster-better-cheaper alternative but as a different mathematical affordance: provably-optimal symmetry-preserving compression, per-irrep interpretability, and data-driven physical discovery.
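To make the idea of a group-defined multiplication rule concrete, here is a sketch of the standard group-algebra (convolution) product. It acts on tensors whose frontal slices are indexed by the elements of a finite group: $(A \star B)_g = \sum_{h \in G} A_h B_{h^{-1}g}$. For the cyclic group $\mathbb{Z}_n$ this is circular convolution along the third mode, the familiar t-product. It is an assumption that the paper's $\star_G$ takes this form. Its definition via $F_G$ may use a different normalization or convention. The helper names are hypothetical.

```python
from itertools import permutations
from typing import Callable, Dict, List, Sequence, Tuple

Matrix = List[List[float]]
Group = Tuple[Sequence[int], Callable[[int, int], int], Callable[[int], int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def cyclic_group(n: int) -> Group:
    """Z_n as (elements, multiplication, inverse)."""
    return list(range(n)), (lambda a, b: (a + b) % n), (lambda a: (-a) % n)


def symmetric_group(n: int) -> Group:
    """S_n with elements encoded as indices into a list of permutation tuples."""
    perms = list(permutations(range(n)))
    index = {p: i for i, p in enumerate(perms)}

    def mul(a: int, b: int) -> int:  # composition: (a * b)(x) = a(b(x))
        pa, pb = perms[a], perms[b]
        return index[tuple(pa[pb[x]] for x in range(n))]

    def inv(a: int) -> int:
        q = [0] * n
        for i, image in enumerate(perms[a]):
            q[image] = i
        return index[tuple(q)]

    return list(range(len(perms))), mul, inv


def star_g(a: Dict[int, Matrix], b: Dict[int, Matrix], group: Group) -> Dict[int, Matrix]:
    """Group-algebra product: C[g] = sum over h of A[h] @ B[h^{-1} g]."""
    elements, mul, inv = group
    rows, cols = len(a[elements[0]]), len(b[elements[0]][0])
    out = {}
    for g in elements:
        acc = [[0.0] * cols for _ in range(rows)]
        for h in elements:
            prod = matmul(a[h], b[mul(inv(h), g)])
            acc = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(acc, prod)]
        out[g] = acc
    return out


# Usage: on Z_3, multiplying by a slice-1 "delta" cyclically shifts the slices.
A = {0: [[1]], 1: [[2]], 2: [[3]]}
shift = {0: [[0]], 1: [[1]], 2: [[0]]}
shifted = star_g(A, shift, cyclic_group(3))
```

Swapping `cyclic_group(3)` for `symmetric_group(3)` changes only the multiplication rule, not the code. This is the sense in which the group, rather than the architecture, carries the symmetry.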

2605.20439 2026-05-21 cs.LG cs.HC Updated version

Can Conversational XAI Improve User Performance? An Experimental Study

Sven Kruschel, Julian Rosenberger, Lasse Bohlen, Mathias Kraus, Patrick Zschech

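Affiliations: TU Dresden; University of Regensburg

AI Summary: This study experimentally evaluates how conversational XAI affects user performance, measured through prediction accuracy, model understanding, and error identification.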

Comments Accepted at Thirty-Fourth European Conference on Information Systems (ECIS 2026), Milan, Italy

Abstract

Explainable AI (XAI) techniques aim to provide insights into predictive models and enhance user performance, yet they often fall short of these expectations. Conversational XAI assistants promise to overcome such limitations, but empirical evidence on their impact on objective performance measures remains limited. We propose an experimental design for evaluating explanation assistance through prediction accuracy, model understanding, and error identification. Using an explainable-by-design prediction model, we create conditions where users can outperform the model by identifying and compensating for systematic errors. We compare conversational assistance against Q&A-based assistance to assess which better supports users in working with model explanations. Preliminary results from testing our experimental design show that participants (N=42) in both treatments significantly outperformed the model but reveal no performance differences between assistance types and modest engagement overall. These findings inform refinements for our planned full study, including enhanced engagement interventions and investigation of the mechanisms driving improved predictions.

2605.20434 2026-05-21 stat.ML cs.DM cs.LG Updated version

Contradiction Graphs Determine VC Dimension

Jesse Campbell, Daniel Ibaibarriaga, Lev Reyzin

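Affiliations: Department of Mathematics, Statistics, & Computer Science

AI Summary: This paper studies contradiction graphs of binary concept classes and shows that their structure determines the VC dimension. The order-$m$ graph decides whether $\mathrm{VCdim}(H)\ge m$. The full sequence of graphs therefore gives the exact VC dimension and distinguishes finite from infinite VC dimension.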

Abstract

We study the contradiction graphs associated with binary concept classes. For a class $H \subseteq \{0,1\}^X$, the order-$m$ contradiction graph $G_m(H)$ has as vertices the $H$-realizable labeled sequences of length $m$, with two vertices adjacent when the two sequences assign opposite labels to some common domain point. Our main result is that the single graph $G_m(H)$ determines the threshold predicate $\mathrm{VCdim}(H)\ge m$. Consequently, the full sequence $(G_m(H))_{m \ge 1}$ determines the exact VC dimension and, in particular, detects finite versus infinite VC dimension, answering a question posed by Alon et al. (2024).
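For a finite class on a finite domain, $G_m(H)$ can be built directly from the definition above. The sketch assumes concepts are given as dictionaries from domain points to labels. It also allows a sequence to repeat a point; the abstract says "sequences" without excluding repeats, so this is an assumption. For comparison, it computes the VC dimension by brute force. It does not implement the paper's procedure for reading $\mathrm{VCdim}(H)\ge m$ off $G_m(H)$.

```python
from itertools import combinations, product
from typing import Dict, Hashable, List, Sequence, Tuple

Concept = Dict[Hashable, int]
LabeledSeq = Tuple[Tuple[Hashable, int], ...]


def realizable(seq: LabeledSeq, H: Sequence[Concept]) -> bool:
    """True if some concept in H agrees with every (point, label) pair in seq."""
    return any(all(h[x] == y for x, y in seq) for h in H)


def contradicts(s: LabeledSeq, t: LabeledSeq) -> bool:
    """True if s and t assign opposite labels to some common domain point."""
    return any(x == x2 and y != y2 for x, y in s for x2, y2 in t)


def contradiction_graph(H: Sequence[Concept], X: Sequence[Hashable],
                        m: int) -> Tuple[List[LabeledSeq], List[Tuple[int, int]]]:
    """Vertices: H-realizable labeled sequences of length m; edges: contradicting pairs."""
    pairs = [(x, y) for x in X for y in (0, 1)]
    vertices = [seq for seq in product(pairs, repeat=m) if realizable(seq, H)]
    edges = [(i, j) for i, j in combinations(range(len(vertices)), 2)
             if contradicts(vertices[i], vertices[j])]
    return vertices, edges


def vc_dimension(H: Sequence[Concept], X: Sequence[Hashable]) -> int:
    """Brute-force VC dimension (subsets of shattered sets are shattered, so stop early)."""
    d = 0
    for k in range(1, len(X) + 1):
        if any(len({tuple(h[x] for x in S) for h in H}) == 2 ** k for S in combinations(X, k)):
            d = k
        else:
            break
    return d


# Usage: all four labelings of a two-point domain (VC dimension 2).
X = [0, 1]
H = [dict(zip(X, labels)) for labels in product((0, 1), repeat=len(X))]
V1, E1 = contradiction_graph(H, X, 1)
```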

2605.20413 2026-05-21 cs.LG Updated version

Supervised Latent Restructuring for Small-Data Quantum Learning in Plant Phenomics

Alakananda Mitra, David H. Fleisher, Vangimalla Reddy, Chittaranjan Ray

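Affiliations: Nebraska Water Center, IANR, University of Nebraska–Lincoln; Adaptive Cropping Systems Laboratory, USDA-ARS; Nebraska Water Center, DWFI, University of Nebraska–Lincoln

AI Summary: This paper studies how supervised latent restructuring improves the geometric separability of compressed high-dimensional features in small-data plant phenomics. It proposes a hybrid workflow that restructures the latent space with PCA followed by LDA, then applies GPU-accelerated quantum kernel alignment. It finds that latent geometry is a central design variable in small-data quantum learning.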

Comments 11 pages, 4 Tables, 3 Figures

Abstract

High-dimensional biological data often exhibit a severe mismatch between feature dimensionality and sample size, making reliable classification difficult in extremely small-data regimes. In these settings, kernel methods can lose discriminative power when latent compression fails to preserve class-separating structure. We study this problem in fine-grained plant phenomics and propose a hybrid workflow that compresses 1280-dimensional deep image embeddings into a 64-dimensional PCA space and then restructures them into an 11-dimensional supervised latent space using Linear Discriminant Analysis (LDA), followed by GPU-accelerated Quantum Kernel Alignment (QKA) on NVIDIA L40S hardware. Empirically, supervised latent restructuring substantially improves the geometric separability of the compressed representation, increasing the Silhouette coefficient from 0.003 in the raw embedding space and -0.006 in PCA-64 to 0.197 in the supervised LDA-11 space. However, downstream classical evaluation reveals a clear compression trade-off: Linear SVM and XGBoost improve in the restructured latent space, whereas RBF-SVM and Random Forest degrade under the same 11-dimensional bottleneck. Under a constrained optimization budget, QKA in this regime remains challenging, indicating that latent geometry alone is not sufficient for strong trainable quantum performance. These findings position representation geometry as a central design variable in small-data quantum learning and expose the practical difficulty of recovering nonlinear discriminative structure from aggressively compressed biological representations.
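The Silhouette coefficient reported above has a standard closed form. The sketch below computes it from scratch with Euclidean distance. Euclidean distance is an assumption, since the abstract does not name the metric. A value near 0 (as in the raw and PCA-64 spaces) means points sit about as close to another class as to their own. Larger values (such as 0.197 in LDA-11) mean the own-class distance is smaller than the nearest other-class distance.

```python
import math
from typing import List, Sequence


def silhouette(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient with Euclidean distance (needs at least two clusters).

    s(i) = (b - a) / max(a, b), where a is the mean distance from point i to the
    other members of its cluster and b is the smallest mean distance from i to
    the members of any other cluster. Points in singleton clusters get s(i) = 0.
    """
    clusters = sorted(set(labels))
    scores: List[float] = []
    for i, (p, li) in enumerate(zip(points, labels)):
        own = [math.dist(p, q) for j, (q, lj) in enumerate(zip(points, labels))
               if lj == li and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(math.dist(p, q) for q, lj in zip(points, labels) if lj == c)
            / sum(1 for lj in labels if lj == c)
            for c in clusters if c != li
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


# Usage: two well-separated 1-D clusters score close to 1.
pts = [[0.0], [1.0], [10.0], [11.0]]
labs = [0, 0, 1, 1]
score = silhouette(pts, labs)
```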

2605.20408 2026-05-21 cs.LG Updated version

Spectral Souping: A Unified Framework for Online Preference Alignment

Yinlam Chow, Guy Tennenholtz, Ted Yun, James Harrison, Arthur Gretton, Andre Barreto, Bo Dai

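Affiliations: Google DeepMind; Google Research

AI Summary: This paper proposes Spectral Souping, a unified framework for online preference alignment. A universal spectral representation discovered within LLMs enables efficient model merging, so the model can adapt quickly to individual user preferences without costly online retraining.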

Abstract

Reinforcement Learning from Human Feedback (RLHF) effectively aligns Large Language Models (LLMs) with aggregate human preferences but often fails to address the diverse and conflicting needs of individual users. To overcome this issue, we introduce Spectral Souping, a unified framework for efficient, online preference alignment. Our contribution is the discovery of a universal spectral representation within LLMs, which is proven to be highly amenable to model merging. This theoretical insight enables a two-phase methodology: we first learn a basis of specialized policies offline, each focused on a distinct, fine-grained preference dimension. An online adaptation algorithm then efficiently ``soups'' these policies at inference time, either by merging their outputs or parameters, enabling rapid model adaptation without the need for costly online retraining w.r.t. tailored preference rewards. Experiments on online preference alignment benchmarks demonstrate that our method achieves significant performance improvements over existing state-of-the-art approaches, presenting a scalable and computationally efficient solution for dynamically adapting LLMs to individual user preferences.
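The two merging modes mentioned in the abstract, merging outputs or merging parameters, can be illustrated with plain linear combinations. The sketch assumes the basis policies share one architecture. It also assumes an online algorithm has already produced non-negative preference weights. It does not model how Spectral Souping chooses those weights, nor its spectral representation. Names such as `parameter_soup` and `output_soup` are hypothetical.

```python
from typing import Dict, List, Sequence

Params = Dict[str, List[float]]


def normalize(weights: Sequence[float]) -> List[float]:
    """Project preference weights onto the simplex by rescaling."""
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")
    return [w / total for w in weights]


def parameter_soup(policies: Sequence[Params], weights: Sequence[float]) -> Params:
    """Weighted average of parameters from policies that share one architecture."""
    w = normalize(weights)
    return {name: [sum(wk * p[name][i] for wk, p in zip(w, policies))
                   for i in range(len(policies[0][name]))]
            for name in policies[0]}


def output_soup(distributions: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Mixture of the next-token distributions produced by each policy."""
    w = normalize(weights)
    return [sum(wk * d[t] for wk, d in zip(w, distributions))
            for t in range(len(distributions[0]))]


# Usage: two toy "policies", each specialized to one preference dimension.
helpful = {"w": [0.0, 2.0]}
concise = {"w": [2.0, 0.0]}
merged = parameter_soup([helpful, concise], [1.0, 3.0])
mixed = output_soup([[0.9, 0.1], [0.1, 0.9]], [1.0, 1.0])
```

Output-level mixing needs every basis policy at inference time. Parameter-level merging produces a single model, which is cheaper to serve but only meaningful when the policies' parameters are compatible.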

2605.20400 2026-05-21 stat.AP cs.LG stat.ML Updated version

Understanding Deterioration Random Effects for Causal Discovery in Infrastructure Management

Takato Yasuno

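AI Summary: This paper proposes a framework that combines Bayesian hierarchical hazard models with causal discovery to identify the operational patterns that drive heterogeneous deterioration rates in pump equipment. It estimates random effects with GPU-accelerated NUTS and validates the linearity assumption. It shows that different operating regimes require different management strategies.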

Comments 20 pages, 7 figures, 4 tables

Abstract

Infrastructure deterioration poses significant challenges for asset management, yet existing approaches rely on population-averaged models that overlook equipment-specific heterogeneity. We present a novel framework that combines Bayesian hierarchical hazard modeling with causal discovery to identify operational patterns that drive heterogeneous deterioration rates in pump equipment. Our approach first estimates pump-specific random effects $u_i$ using GPU-accelerated No-U-Turn Sampling (NUTS), achieving 3--5$\times$ speedup over CPU implementations. We then employ DirectLiNGAM to discover causal relationships between 22 engineered time-series features and deterioration rates, stratified by positive ($u_i > 0$, faster deterioration) versus negative ($u_i \leq 0$, slower deterioration) random effects. Analyzing 112 pumps with 92,861 observations over 650 days, we uncover striking heterogeneity: the negative group exhibits causal effects 400$\times$ larger than the positive group, with standard deviation (std) showing a strong positive causal effect ($+1.515$) on deterioration rates in low-risk equipment. We validate linearity assumptions through NonlinearLiNGAM comparison and demonstrate practical scalability through GPU acceleration. Our findings enable targeted maintenance strategies by revealing that different operational regimes require fundamentally distinct management approaches, advancing predictive maintenance from population-averaged to heterogeneity-aware decision making.

2605.20396 2026-05-21 cs.LG stat.ML Updated version

Score-Based Causal Discovery of Latent Variable Causal Models

Ignavier Ng, Xinshuai Dong, Haoyue Dai, Biwei Huang, Peter Spirtes, Kun Zhang

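Affiliations: Carnegie Mellon University; University of California, San Diego; Mohamed bin Zayed University of Artificial Intelligence

AI Summary: This paper proposes score-based methods for identifying causal structures that contain causally related latent variables, provides identifiability guarantees, and validates the methods' effectiveness experimentally.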

Comments ICML 2024

Abstract

Identifying latent variables and the causal structure involving them is essential across various scientific fields. While many existing works fall under the category of constraint-based methods (with e.g. conditional independence or rank deficiency tests), they may face empirical challenges such as testing-order dependency, error propagation, and choosing an appropriate significance level. These issues can potentially be mitigated by properly designed score-based methods, such as Greedy Equivalence Search (GES) (Chickering, 2002) in the specific setting without latent variables. Yet, formulating score-based methods with latent variables is highly challenging. In this work, we develop score-based methods that are capable of identifying causal structures containing causally-related latent variables with identifiability guarantees. Specifically, we show that a properly formulated scoring function can achieve score equivalence and consistency for structure learning of latent variable causal models. We further provide a characterization of the degrees of freedom for the marginal over the observed variables under multiple structural assumptions considered in the literature, and accordingly develop both exact and continuous score-based methods. This offers a unified view of several existing constraint-based methods with different structural assumptions. Experimental results validate the effectiveness of the proposed methods.

2605.20391 2026-05-21 cs.CR cs.LG Updated version

Latent Geometry as a Structural Monitor: Eigenspace Alignment for Anomaly Detection in Anonymity Networks

Vaibhav Chhabra

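Affiliations: USPTO

AI Summary: This paper proposes using latent geometry to monitor anomalies in anonymity networks, detecting anomalous patterns in behavioral populations through eigenspace alignment. On the Tor network, a dual-observer pipeline identifies a stable nine-dimensional load-bearing subspace and validates its structural stability.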

Comments 14 pages, 5 figures, 1 table

Abstract

Traditional anomaly detection marks events when measured signals cross predefined thresholds. This captures the moment of transition but not the structural pressure that precedes it. We propose treating large behavioral populations as geometric energy landscapes whose deformation can be measured before and during major transitions. The central thesis is that structure precedes geometry: the structural organization of the population is the signal, and geometric metrics are instruments for measuring it. Applied to the Tor anonymity network across 67 consecutive daily observation windows, the dual-observer pipeline identifies a stable nine-dimensional load-bearing subspace invariant across the observation period and validates this structure by Monte Carlo simulation at 16.8 sigma above the noise floor. Primary detection gates achieve 0.0% false positive rate on 24 confirmed stable windows. Forensic analysis of the February 20, 2026 confirmed infrastructure event formally falsifies the relay-departure hypothesis, identifying connectivity degradation without topology change as a detectable network failure mode. The result is a candidate structural-monitoring framework for behavioral populations with sufficient telemetry.
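One common way to quantify eigenspace alignment between observation windows is through principal angles. The sketch assumes each window has already been reduced to an orthonormal basis of its leading eigenvectors, for example the nine load-bearing directions. It scores drift as the mean squared cosine of the principal angles, $\|U^\top V\|_F^2 / k$. The score, the threshold gate, and the function names are illustrative assumptions, not the paper's pipeline.

```python
import math
from typing import List, Sequence

Basis = Sequence[Sequence[float]]  # k orthonormal vectors spanning a subspace


def subspace_alignment(U: Basis, V: Basis) -> float:
    """Mean squared cosine of the principal angles between span(U) and span(V).

    For orthonormal bases of equal dimension k this equals ||U^T V||_F^2 / k:
    1.0 for identical subspaces, 0.0 for mutually orthogonal ones. The value
    does not depend on which orthonormal basis is chosen for each subspace.
    """
    if len(U) != len(V):
        raise ValueError("bases must have the same dimension")
    frob_sq = sum(sum(a * b for a, b in zip(u, v)) ** 2 for u in U for v in V)
    return frob_sq / len(U)


def flag_windows(reference: Basis, windows: Sequence[Basis], threshold: float) -> List[int]:
    """Indices of windows whose subspace has drifted below `threshold` alignment."""
    return [i for i, W in enumerate(windows) if subspace_alignment(reference, W) < threshold]


# Usage: a 2-D reference subspace in R^3, a rotated basis of the same plane,
# and a plane that shares only one direction with the reference.
s = 1 / math.sqrt(2)
U = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
V_rot = [[s, s, 0.0], [s, -s, 0.0]]
V_half = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
flags = flag_windows(U, [U, V_rot, V_half], threshold=0.9)
```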

2605.20390 2026-05-21 cs.CV cs.AI cs.LG cs.RO Updated version

STELLAR: Scaling 3D Perception Large Models for Autonomous Driving

Yingwei Li, Xin Huang, Yang Liu, Yang Fu, Alex Zihao Zhu, Chen Song, Junwen Yao, Anant Subramanian, Hao Xiang, Weijing Shi, Yuliang Zou, Tom Hoddes, Zhaoqi Leng, Govind Thattai, Dragomir Anguelov, Mingxing Tan

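Affiliations: Waymo; UCSD

AI Summary: This paper studies large-scale training for autonomous driving perception. By extending the input modalities and training large models, it achieves new state-of-the-art performance on the Waymo Open Dataset challenge.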

Abstract

Model scaling has demonstrated remarkable success through large-scale training on diverse datasets. It remains an open question whether the same paradigm would apply to autonomous driving perception systems due to unique challenges, such as fusing heterogeneous sensor data and the need for sophisticated 3D spatial understanding. To bridge this gap, we present a comprehensive study on systematically analyzing the impact of scale on these systems. We develop our STELLAR model based on Sparse Window Transformer, by extending the input modalities to include LiDAR, radar, camera, and map prior. We train the model on a large-scale dataset of 50 million driving examples with up to 500 million parameters. Our large-scale experiments reveal empirical scaling trends that connect model performance to model size, data, and compute. The resulting model establishes a new state-of-the-art on the Waymo Open Dataset challenge, outperforming prior arts by a large margin. Our work demonstrates that large-scale training is a highly promising path for advancing the capabilities of perception models for autonomous driving.
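Scaling trends of this kind are often summarized by a power law in model size, data, or compute. The sketch below fits $L \approx a N^{-b}$ by linear least squares in log-log space. The power-law form is an assumption for illustration, because the abstract does not state which functional form STELLAR's trends follow. The example data are synthetic, not results from the paper.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(sizes: Sequence[float], losses: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of loss = a * size**(-b) on log-log axes; returns (a, b)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return math.exp(intercept), -slope


# Usage with synthetic data generated from a = 5, b = 0.3 (parameter counts up to 5e8).
sizes = [1e6, 1e7, 1e8, 5e8]
losses = [5.0 * n ** -0.3 for n in sizes]
a, b = fit_power_law(sizes, losses)
```

A straight line on log-log axes is what "empirical scaling trend" usually means in practice. Curvature in the residuals would suggest the fitted form is wrong for the observed range.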

2605.20389 2026-05-21 cs.LG cs.AI Updated version

Nonlocal operator learning for fMRI encoding and decoding tasks

Andreas Kramer, Saugat Acharya, Alice Giola, Emanuele Zappala

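Affiliations: Department of Computer Science, Idaho State University; Department of Mathematics and Statistics, Idaho State University

AI Summary: This paper proposes a neural integral operator framework for fMRI encoding and decoding and studies the role of nonlocal spatiotemporal context. It experimentally examines how longer temporal windows, and visual-cortex versus whole-brain recordings, affect performance and latent-space geometry.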

Comments 18 pages, 4 figures, 5 tables. Comments are welcome!

Abstract

Functional MRI data exhibit high-dimensional spatiotemporal structure, making both prediction and decoding challenging. In this work, we investigate neural integral-operator-based models for encoding and decoding tasks in fMRI, with particular emphasis on the role of nonlocal spatiotemporal context. We implement a latent neural integral operator framework that performs fixed-point iterations in an auxiliary space, from which a decoder performs classification and stimulus prediction. We evaluate our model on two open-source fMRI datasets. Our experiments examine both decoding of stimuli from fMRI recordings and encoding of fMRI dynamics from stimulus representations. A main focus is the effect of spatiotemporal context: we systematically compare short and long temporal windows, as well as visual-cortex versus whole-brain recordings, and analyze their influence on performance and latent-space geometry. Across tasks and datasets, longer temporal windows generally improve results and produce more structured learned representations. In decoding experiments, the learned latent space often provides clearer class separation than the raw data. In encoding experiments, although absolute performance remains moderate due to the difficulty of the task, longer temporal windows still yield consistent gains. These findings suggest that neural integral operators provide a promising framework for modeling fMRI dynamics and that broader spatiotemporal context can be beneficial for both prediction and representation learning. More broadly, the results indicate that exploiting distributed nonlocal structure in brain dynamics requires model architectures specifically designed to capture such dependencies.
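The fixed-point iterations described above can be illustrated on a discretized integral equation. The sketch assumes a latent trajectory sampled at the N cell midpoints of [0, 1], a given kernel $K(t, s)$, and a pointwise nonlinearity. It iterates $z \leftarrow f + \int_0^1 K(t, s)\,\sigma(z(s))\,ds$. In the paper's framework the kernel and the surrounding encoder and decoder are learned, which this toy version does not attempt. Function and parameter names are hypothetical.

```python
import math
from typing import Callable, List, Sequence


def integral_operator_fixed_point(f: Sequence[float], grid: Sequence[float],
                                  kernel: Callable[[float, float], float],
                                  activation: Callable[[float], float] = math.tanh,
                                  tol: float = 1e-10, max_iter: int = 1000) -> List[float]:
    """Iterate z <- f + integral_0^1 K(t, s) * activation(z(s)) ds until convergence.

    The integral uses the midpoint rule with weights 1/N, assuming `grid` holds
    the N cell midpoints of [0, 1]. Convergence is guaranteed when the map is a
    contraction, e.g. max|K| * Lipschitz(activation) < 1.
    """
    n = len(grid)
    z = list(f)
    for _ in range(max_iter):
        z_new = [f[i] + sum(kernel(t, s) * activation(z[j]) for j, s in enumerate(grid)) / n
                 for i, t in enumerate(grid)]
        if max(abs(a - b) for a, b in zip(z_new, z)) < tol:
            return z_new
        z = z_new
    raise RuntimeError("fixed-point iteration did not converge")


# Usage: z(t) = 1 + 0.5 * integral of z(s) ds has the constant solution z = 1 / (1 - 0.5) = 2.
grid = [(i + 0.5) / 20 for i in range(20)]
z = integral_operator_fixed_point([1.0] * 20, grid, kernel=lambda t, s: 0.5,
                                  activation=lambda x: x)
```

Every output value depends on the whole trajectory through the integral. This is the nonlocal coupling that the abstract argues longer temporal windows let the model exploit.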

2605.20369 2026-05-21 cs.CL cs.AI cs.LG Version update

DEL: Digit Entropy Loss for Numerical Learning of Large Language Models

Zhaohui Zheng, Chenhang He, Shihao Wang, Yuxuan Li, Ming-Ming Cheng, Lei Zhang

Affiliations: The Hong Kong Polytechnic University; VCIP, College of Computer Science, Nankai University

AI summary: This paper proposes Digit Entropy Loss (DEL) for autoregressive numerical learning in large language models. It redesigns conventional unsupervised entropy optimization by introducing digit conditional probabilities and binary cross-entropy, turning entropy optimization into a supervised objective, and generalizes integer-based numerical learning to floating-point numbers, improving the accuracy of number prediction.

Abstract

Number prediction stands as a fundamental capability of large language models (LLMs) in mathematical problem-solving and code generation. The widely adopted maximum likelihood estimation (MLE) for LLM training is not tailored to number prediction. Recently, penalty-driven approaches, e.g., Number Token Loss and Discretized Distance Loss, introduce an inductive bias of numerical distance but induce over-sharpened and over-flattened digit distributions, respectively. In this paper, we present an in-depth analysis of LLM numerical learning and show that existing numerical learning methods conceptually follow a criterion-distance formulation, where the criterion term represents the optimization pattern and the distance term instills a geometric prior. Consequently, we present Digit Entropy Loss (DEL) for auto-regressive numerical learning, which reformulates conventional unsupervised entropy optimization with three key designs: leveraging digit conditional probability and binary cross-entropy to turn entropy optimization into a supervised objective; discarding the distance term to bypass the issue of numerical distance; and generalizing integer-based numerical learning to floating-point number optimization, enabling more accurate number prediction. Our DEL formulation can incorporate integers, decimals, and decimal points, expanding the learning objective from a single digit to the floating-point number domain. Experiments conducted on seven mathematical reasoning benchmarks with four representative LLMs, including CodeLlama, Mistral, DeepSeek, and Qwen-2.5, demonstrate that DEL consistently outperforms its counterparts in both overall prediction accuracy and numerical distance. Source code is available at https://github.com/PolyU-VCLab/DEL
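The abstract names two ingredients, digit conditional probabilities and binary cross-entropy, without giving the formula. Below is one plausible reading, sketched under our own assumptions: renormalize the model's logits over the ten digit tokens only, then score each digit as an independent yes/no label (target digit positive, the other nine negative). The function names are hypothetical, and the entropy term and floating-point handling of DEL are not reproduced.

```python
import math


def digit_conditional_probs(digit_logits):
    """Softmax restricted to the ten digit tokens '0'..'9'.

    Assumes the caller already gathered the ten digit-token logits from the
    full-vocabulary output (hypothetical preprocessing step).
    """
    if len(digit_logits) != 10:
        raise ValueError("expected one logit per digit 0-9")
    peak = max(digit_logits)
    exps = [math.exp(v - peak) for v in digit_logits]
    total = sum(exps)
    return [e / total for e in exps]


def digit_bce_loss(digit_logits, target_digit, eps=1e-12):
    """Binary cross-entropy over digit-conditional probabilities:
    the target digit is the positive class, the other nine are negatives."""
    probs = digit_conditional_probs(digit_logits)
    loss = 0.0
    for digit, p in enumerate(probs):
        if digit == target_digit:
            loss -= math.log(p + eps)
        else:
            loss -= math.log(1.0 - p + eps)
    return loss
```

Unlike plain MLE, which only rewards the correct token, the negative terms here also push probability away from each wrong digit individually.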

2605.20357 2026-05-21 cs.LG cs.AI Version update

Consistently Informative Soft-Label Temperature for Knowledge Distillation

Hoang-Chau Luong, Nghia Van Vo, Kaiqi Zhao, Lingwei Chen

Affiliations: Rochester Institute of Technology; Oakland University

AI summary: This paper proposes CIST, which assigns separate sample-wise adaptive temperatures to the teacher and the student. It addresses two problems of the conventional fixed-temperature design, inconsistent entropy of teacher soft labels and overly rigid teacher-student logit-scale alignment, and thereby improves knowledge distillation.

Abstract

Knowledge distillation (KD) transfers knowledge from a high-capacity teacher to a compact student by matching their predictive distributions, with temperature scaling serving as a central mechanism for smoothing teacher predictions and exposing informative "dark knowledge" beyond the hard label. However, the standard fixed-temperature design is inherently sample-agnostic. Since samples differ in logit scale and learning difficulty, a single global temperature produces teacher soft labels with highly inconsistent entropy: some predictions remain overly sharp and provide limited inter-class information, whereas others become over-smoothed and lose class-discriminative information. Moreover, sharing the same temperature between teacher and student further imposes rigid logit-scale alignment despite their capacity mismatch. To address these limitations, we propose CIST (Consistently Informative Soft-label Temperature), which assigns separate sample-wise adaptive temperatures to the teacher and student. This design produces consistently informative teacher soft labels while relaxing rigid teacher-student logit-scale matching. It also reweights the distillation objective according to teacher confidence and student learning difficulty. Theoretically, we show that teacher-label entropy is largely governed by the ratio between the maximum teacher logit and the temperature, providing a principled basis for adaptive smoothing. Empirically, CIST mitigates the inconsistency induced by fixed temperature, and experiments on both vision and language distillation tasks show consistent improvements over standard KD and strong baselines with negligible computational overhead.
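The theoretical claim, that soft-label entropy is largely governed by the ratio of the maximum teacher logit to the temperature, can be checked numerically. This sketch compares a fixed temperature with a hypothetical rule that sets T so that max(logits) / T equals a constant (`ratio_temperature` and `target_ratio` are our names, not CIST's actual temperature predictor). For logits that differ only in scale the ratio rule gives identical entropy; for general logits the paper's "largely" matters, and the match is only approximate.

```python
import math


def softmax(logits, temperature):
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)


def ratio_temperature(logits, target_ratio=4.0):
    """Hypothetical rule: choose T so that max(logits) / T == target_ratio.

    Only meaningful when the largest logit is positive.
    """
    top = max(logits)
    if top <= 0:
        raise ValueError("largest logit must be positive for this rule")
    return top / target_ratio


if __name__ == "__main__":
    small = [2.0, 1.0, 0.5]
    large = [6.0, 3.0, 1.5]  # same pattern at 3x the logit scale
    for name, z in [("small", small), ("large", large)]:
        fixed = entropy(softmax(z, 1.0))
        adaptive = entropy(softmax(z, ratio_temperature(z)))
        print(f"{name}: fixed-T entropy={fixed:.3f}, ratio-T entropy={adaptive:.3f}")
```

With T = 1 the larger-scale sample yields a much sharper (lower-entropy) soft label, which is the inconsistency the abstract describes.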

2605.20355 2026-05-21 cs.RO cs.HC cs.LG Version update

Proximal State Nudging: Reducing Skill Atrophy from AI Assistance

Megha Srivastava, Jonathan Ouyang, Eric Zhou, Andrew Silva, Emily Sumner, Dorsa Sadigh, Yuchen Cui, Deepak Gopinath, Guy Rosman

Affiliations: Stanford University; University of California Los Angeles; Toyota Research Institute

AI summary: This paper proposes Proximal State Nudging (PSN), a shared autonomy algorithm that nudges users toward the states estimated to be most learnable, jointly optimizing skill development and task performance to reduce skill atrophy under AI assistance.

Comments 9 pages

Abstract

Skill atrophy, the gradual decline of human capability under AI assistance, poses a safety risk in shared control of semi-autonomous systems, where operators may be unable to distinguish their own inputs from autonomous corrections. We propose Proximal State Nudging (PSN), a shared autonomy algorithm that jointly optimizes for skill development and task performance by nudging users toward states estimated to be most learnable. We first show that PSN outperforms existing shared autonomy baselines in balancing student improvement in unassisted reward with overall shared performance, using simulated students in the classic LunarLander environment. We then present, to the best of our knowledge, the first human subject studies of a planner incorporating learning-compatible shared autonomy: across two driving tasks in the CARLA simulator (High Performance Racing and Parallel Parking, n = 60), PSN produces up to 7x larger gains in unassisted skill than standard blended shared autonomy, while incurring 50% fewer collisions than unassisted self-practice.
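For readers unfamiliar with the baseline the abstract compares against, a common way to write "blended" shared autonomy is a linear arbitration between the human's and the robot's commands. The sketch below shows only that generic baseline under our own naming (`blend_actions`, `alpha`); it does not implement PSN's nudging toward learnable states.

```python
def blend_actions(human_action, robot_action, alpha):
    """Linear arbitration: alpha = 1 gives full robot control, alpha = 0 full human control."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(human_action) != len(robot_action):
        raise ValueError("action vectors must have the same length")
    return [alpha * r + (1.0 - alpha) * h for h, r in zip(human_action, robot_action)]


if __name__ == "__main__":
    # e.g. steering/throttle commands from the driver and from an autopilot
    print(blend_actions([1.0, 0.0], [0.0, 1.0], 0.25))
```

Because the executed command mixes both sources, the operator cannot easily tell which part of the motion came from them, which is the attribution problem the abstract links to skill atrophy.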

2605.20345 2026-05-21 stat.ML cs.LG Version update

Corrected Integrated Laplace Approximation for Bayesian Inference in Latent Gaussian Models

Jinlin Lai, Charles C. Margossian, Daniel R. Sheldon

Affiliations: Manning College of Information and Computer Sciences, University of Massachusetts Amherst; Department of Statistics, University of British Columbia

AI summary: This paper proposes an importance sampling scheme to correct the error that the integrated Laplace approximation (ILA) introduces in latent Gaussian models (LGMs). As the number of importance samples grows, the approximate posterior converges to the correct posterior, and the method is implemented in an automatic differentiation framework to support gradient-based algorithms for hyperparameter inference, in particular Hamiltonian Monte Carlo.

Abstract

Latent Gaussian models (LGMs) are a popular class of Bayesian hierarchical models that include Gaussian processes, as well as certain spatial models and mixed-effect models. Efficient Bayesian inference for LGMs often requires marginalizing out the latent variables. For LGMs with a non-Gaussian likelihood, exact marginalization is not possible, and a popular approach is to perform approximate marginalization with an integrated Laplace approximation (ILA). Using ILA produces an approximate posterior that, in some settings, can differ significantly from the correct posterior, which affects downstream applications. We propose an importance sampling scheme to correct the error introduced by ILA. As the number of importance samples increases, the corrected ILA posterior converges to the correct posterior. This idea is realized with various techniques, including pseudo-marginalization, quasi-Monte Carlo, and randomized quasi-Monte Carlo. We implement our methods in an automatic differentiation framework to support gradient-based algorithms when doing inference on the hyperparameters. For hyperparameter inference, we specifically consider the use of Hamiltonian Monte Carlo. We demonstrate the benefits of reduced error in various applied models.
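The core correction idea can be illustrated on a one-latent toy model of our own choosing (x ~ N(0, 1), y | x ~ Poisson(exp(x))): the Laplace approximation gives a Gaussian around the posterior mode, and using that Gaussian as an importance-sampling proposal turns the biased Laplace estimate of log p(y) into a consistent one. This is a minimal sketch with plain Monte Carlo; the pseudo-marginal, QMC/RQMC, automatic differentiation, and HMC machinery from the abstract are not shown, and all function names are hypothetical.

```python
import math
import random


def log_joint(x, y):
    """log N(x; 0, 1) + log Poisson(y; exp(x)) for the toy model."""
    return (-0.5 * x * x - 0.5 * math.log(2 * math.pi)
            + y * x - math.exp(x) - math.lgamma(y + 1))


def laplace_fit(y, iters=50):
    """Newton's method for the posterior mode; returns (mode, precision)."""
    x = math.log(y + 1.0)
    for _ in range(iters):
        grad = -x + y - math.exp(x)
        hess = -1.0 - math.exp(x)
        x -= grad / hess
    return x, 1.0 + math.exp(x)


def laplace_log_marginal(y):
    mode, prec = laplace_fit(y)
    return log_joint(mode, y) + 0.5 * math.log(2 * math.pi / prec)


def importance_corrected_log_marginal(y, num_samples, rng):
    """Importance sampling with the Laplace Gaussian as proposal (log-mean-exp of weights)."""
    mode, prec = laplace_fit(y)
    sd = 1.0 / math.sqrt(prec)
    log_weights = []
    for _ in range(num_samples):
        x = rng.gauss(mode, sd)
        log_q = -0.5 * ((x - mode) / sd) ** 2 - math.log(sd) - 0.5 * math.log(2 * math.pi)
        log_weights.append(log_joint(x, y) - log_q)
    peak = max(log_weights)
    return peak + math.log(sum(math.exp(w - peak) for w in log_weights) / num_samples)


def quadrature_log_marginal(y, lo=-12.0, hi=12.0, n=20000):
    """Trapezoidal-rule reference value for log p(y)."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * math.exp(log_joint(lo + i * h, y))
    return math.log(total * h)


if __name__ == "__main__":
    rng = random.Random(0)
    for y in (0, 3):
        print(y, quadrature_log_marginal(y), laplace_log_marginal(y),
              importance_corrected_log_marginal(y, 20000, rng))
```

Increasing `num_samples` shrinks the Monte Carlo error, whereas the Laplace error stays fixed, mirroring the convergence statement in the abstract.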

2605.20314 2026-05-21 cs.LG cs.AI Version update

Less Data, Faster Training: repeating smaller datasets speeds up learning via sampling biases

Jingwen Liu, Ezra Edelman, Surbhi Goel, Bingbin Liu

Affiliations: Columbia University; University of Pennsylvania; Harvard University

AI summary: The study examines the "small-vs-large gap", in which repeatedly training on fewer samples saves compute compared with training on a larger dataset. It attributes the speedup to layer-wise growth driven by sampling biases, which offers a new inductive bias for optimization.

Comments ICML 2026

Abstract

This work investigates the "small-vs-large gap", where repeating fewer samples during training can save compute compared with using a larger dataset. This gap is observed across algorithmic tasks, architectures, and optimizers, and cannot be explained by prior theory. We argue that the speedup comes from appropriate layer-wise growth enabled by sampling biases, which is more pronounced when the dataset size is smaller. We provide both theoretical analysis and empirical evidence from various interventions. Our results suggest that using a smaller dataset with more repetitions is not just a fallback strategy under data scarcity, but can be proactively leveraged as a favorable inductive bias for optimization, particularly in reasoning tasks.

2605.20311 2026-05-21 cs.LG Version update

WaveGraphNet: Physics-Consistent Guided-Wave Damage Localization through Coupled Inverse-Forward Graph Learning

Vinay Sharma, Aditya Bharade, Olga Fink

Affiliations: EPFL, Intelligent Maintenance and Operations Systems

AI summary: This paper proposes WaveGraphNet, a coupled inverse-forward graph learning framework for guided-wave damage localization in carbon-fiber-reinforced polymer plates. The sensing layout is modeled as a graph whose connectivity represents the measured propagation paths, and an inverse branch is combined with a forward branch to make damage localization more robust.

Abstract

Guided-wave structural health monitoring enables damage localization in composite plates using sparse networks of bonded piezoelectric transducers. However, inferring the spatial location of defects from pitch-catch measurements remains weakly constrained when only a limited set of damage locations is available for training. As a result, models trained to predict defect locations may perform well on seen cases but generalize poorly to unseen regions of the structure. This paper proposes WaveGraphNet, a coupled inverse-forward graph learning framework for guided-wave damage localization in Carbon Fiber Reinforced Polymer (CFRP) plates. The sensing layout is explicitly modeled as a graph, where transducers are represented as nodes and measured propagation paths define the graph connectivity. An inverse branch maps graph-structured spectral descriptors of differential guided-wave responses to a damage location, while a forward branch predicts the path-wise energy-deviation patterns of measured wave responses associated with a candidate location. During training, the forward branch serves as a physics-consistent regularizer, discouraging location estimates that are numerically plausible but inconsistent with the measured redistribution of wave-response energy. This coupling encourages agreement between inferred damage coordinates and the underlying wave propagation behavior. Within this benchmark, the proposed graph-based formulation provides a strong localization model for sparse guided-wave sensing and demonstrates improved robustness in extrapolation to held-out regions compared to both non-graph and graph baselines. These results highlight the potential of coupled inverse-forward graph learning as an effective strategy for guided-wave localization under limited spatial coverage.
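Two structural ideas from the abstract can be sketched concretely: a sensing graph with one node per transducer and one edge per measured pitch-catch path, and a training loss that adds a forward-consistency penalty to the inverse localization error. Everything below is hypothetical scaffolding: the squared-error terms, the `forward_weight` parameter, and the assumption that the forward branch's predicted path energies are already available are our choices, not the paper's networks.

```python
from itertools import combinations


def build_sensing_graph(num_transducers, measured_paths=None):
    """Adjacency sets with one node per transducer and one edge per pitch-catch path.

    If no path list is given, assume every transducer pair was measured.
    """
    if measured_paths is None:
        measured_paths = combinations(range(num_transducers), 2)
    adjacency = {node: set() for node in range(num_transducers)}
    for actuator, sensor in measured_paths:
        adjacency[actuator].add(sensor)
        adjacency[sensor].add(actuator)
    return adjacency


def coupled_loss(pred_location, true_location, predicted_path_energy,
                 measured_path_energy, forward_weight=0.1):
    """Inverse localization error plus a weighted forward-consistency penalty.

    predicted_path_energy: per-path energy deviation that a (hypothetical) forward
    model predicts for pred_location; measured_path_energy: the observed values.
    """
    inverse_term = sum((p - t) ** 2 for p, t in zip(pred_location, true_location))
    forward_term = sum(
        (p - m) ** 2 for p, m in zip(predicted_path_energy, measured_path_energy)
    ) / len(measured_path_energy)
    return inverse_term + forward_weight * forward_term
```

The forward term penalizes a location guess whose implied path energies disagree with the measurements, even if the guess looks plausible to the inverse branch alone.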

2605.20308 2026-05-21 cs.CV cs.AI cs.LG Version update

SDM: A Powerful Tool for Evaluating Model Robustness

Xinlei Liu, Tao Hu, Jichao Xie, Peng Yi, Hailong Ma, Baolin Li

Affiliations: Information Engineering University, Zhengzhou, China; Key Laboratory of Cyberspace Endogenous Safety & Security of Henan Province, Zhengzhou, China; Key Laboratory of Cyberspace Security, Ministry of Education of China, Zhengzhou, China; Songshan Laboratory, Zhengzhou, China

AI summary: This paper proposes SDM, a new gradient-based attack that redefines the objective of adversarial example generation to resolve the performance degradation caused by "high-loss non-adversarial examples" in previous methods; experiments show advantages in both attack performance and cost-effectiveness.

Comments 16 pages

Journal ref
Forty-third International Conference on Machine Learning (ICML 2026)

Abstract

Gradient-based attacks are important methods for evaluating model robustness. However, since the proposal of APGD, such methods have struggled to achieve significant breakthroughs. To make such progress, we first analyze the issue of "high-loss non-adversarial examples" that degrades attack performance in previous methods, and prove that this issue arises from inappropriate objectives for adversarial example generation. We then reformulate the objective as "maximizing the difference between the non-ground-truth label probability upper bound and the ground-truth label probability" and propose a novel and powerful gradient-based attack method named Sequential Difference Maximization (SDM). SDM establishes a three-layer "cycle-stage-step" optimization framework. It adopts the negative probability loss function in the initial optimization stage and the Directional Probability Difference Ratio (DPDR) loss function in subsequent stages, and approaches the ideal objective of adversarial example generation via stage-wise sequential optimization. Experiments demonstrate that, compared with previous state-of-the-art methods, SDM not only achieves stronger attack performance but also exhibits superior cost-effectiveness. The code is available at https://github.com/X-L-Liu/ICML-SDM.
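The objective quoted in the abstract, max over j ≠ y of p_j minus p_y, has a convenient property: it is positive exactly when some wrong class outranks the true one. The sketch below computes it next to cross-entropy and the negative probability loss (-p_y, our reading of that name) on two hand-picked logit vectors, showing how a sample can have higher cross-entropy yet still be correctly classified. The DPDR loss and the cycle-stage-step schedule are not reproduced; the example logits are our own.

```python
import math


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    return -math.log(softmax(logits)[label])


def negative_probability_loss(logits, label):
    """-p_y: grows as the ground-truth probability shrinks."""
    return -softmax(logits)[label]


def probability_difference(logits, label):
    """max_{j != y} p_j - p_y; positive exactly when a wrong class outranks y."""
    probs = softmax(logits)
    best_other = max(p for j, p in enumerate(probs) if j != label)
    return best_other - probs[label]


if __name__ == "__main__":
    flat = [1.0] + [0.9] * 9          # still classified as class 0
    flipped = [1.0, 1.2] + [0.0] * 8  # class 1 now wins
    for name, z in [("flat", flat), ("flipped", flipped)]:
        print(name, round(cross_entropy(z, 0), 3), round(probability_difference(z, 0), 3))
```

Here the "flat" logits have the larger cross-entropy but a negative probability difference, a small instance of a high-loss non-adversarial example.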

2605.20300 2026-05-21 cs.LG cs.AI Version update

Robust Subspace-Constrained Quadratic Models for Low-Dimensional Structure Learning

Zheng Zhai, Xiaohui Li

Affiliations: Department of Statistics, Faculty of Arts and Sciences, Beijing Normal University, Zhuhai

Abstract

In this paper, we propose a robust subspace-constrained quadratic model (SCQM) for learning low-dimensional structure from high-dimensional data. Building upon the subspace-constrained quadratic matrix factorization (SQMF) framework, the proposed model accommodates a broad class of noise distributions, including generalized Gaussian and radial Laplace models. This generalization enables reliable performance under both heavy-tailed and light-tailed noise, thereby substantially enhancing robustness across diverse data regimes. To efficiently address the resulting nonconvex optimization problem, we develop a gradient-based algorithm equipped with a backtracking line-search strategy that ensures stable and efficient convergence. In addition, we present a sensitivity analysis of the $\ell_p^p$ and $\ell_2$ loss functions, elucidating their distinct behaviors under varying noise characteristics. Extensive numerical experiments corroborate the theoretical analysis and demonstrate that the proposed approach consistently outperforms existing methods in terms of robustness and reconstruction accuracy.
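Two generic ingredients named in the abstract, an $\ell_p^p$ loss compared with $\ell_2$ and gradient descent with a backtracking line search, can be illustrated on a hypothetical one-dimensional location problem (this is not the SCQM model itself). The Armijo rule below shrinks the step until the sufficient-decrease condition f(x - t g) <= f(x) - c t g^2 holds; the data, `p = 1.2`, and all parameter values are illustrative assumptions.

```python
def lp_loss(m, data, p):
    return sum(abs(x - m) ** p for x in data)


def lp_grad(m, data, p):
    """Derivative of sum |x - m|^p with respect to m (valid for p > 1)."""
    total = 0.0
    for x in data:
        r = m - x
        if r != 0.0:
            total += p * abs(r) ** (p - 1) * (1.0 if r > 0 else -1.0)
    return total


def backtracking_gd(f, grad, x0, step0=1.0, shrink=0.5, c=1e-4,
                    max_iter=500, tol=1e-10, min_step=1e-14):
    """Gradient descent with Armijo backtracking:
    shrink t until f(x - t g) <= f(x) - c t g^2."""
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        if abs(g) < tol:
            break
        fx = f(x)
        t = step0
        while f(x - t * g) > fx - c * t * g * g and t > min_step:
            t *= shrink
        x -= t * g
    return x


if __name__ == "__main__":
    data = [1.0, 1.1, 0.9, 1.05, 0.95, 50.0]  # one gross outlier
    robust = backtracking_gd(lambda m: lp_loss(m, data, 1.2), lambda m: lp_grad(m, data, 1.2), 0.0)
    squared = backtracking_gd(lambda m: lp_loss(m, data, 2.0), lambda m: lp_grad(m, data, 2.0), 0.0)
    print(f"l_p^p (p=1.2) estimate: {robust:.3f}, l_2 estimate: {squared:.3f}")
```

The $\ell_2$ estimate is dragged to the sample mean by the outlier, while the $\ell_p^p$ estimate stays near the bulk of the data, a toy version of the sensitivity contrast the abstract describes.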

2605.20299 2026-05-21 cs.LG cs.AI cs.RO Version update

Mechanisms of Misgeneralization in Physical Sequence Modeling

Kento Nishi, Raphael Tang, Karun Kumar, Core Francisco Park, Hidenori Tanaka

Affiliations: Harvard College; Harvard John A. Paulson School of Engineering and Applied Sciences; Comcast AI; CBS-NTT Program in Physics of Intelligence, Harvard University; Physics of Artificial Intelligence Group, NTT Research, Inc., Sunnyvale, CA, USA; Microsoft

AI summary: This paper studies physical misgeneralization in physical sequence modeling, which arises when local errors propagate through physical measurements. It proposes a data deviation kernel to predict which physical quantities gain or lose probability mass, and a kernel-informed intervention.

Comments Preprint. kentonishi.com/physical-misgeneralization

Abstract

Generative sequence models are often trained to plan motion in physical domains, from robotics to mechanical simulations. When constructing a dataset to train such a model, engineers may curate demonstrations to specify how trajectories should be distributed over a physical quantity like travel distance or mechanical energy. For example, a roboticist building a maze navigation agent might choose demonstrations whose travel distances cover a fixed range uniformly, hoping to constrain the agent's expected power usage. We find that standard deep learning can violate this intent: each generated trajectory can seem plausible on its own, but the aggregate distribution over the physical quantity is wrong. We call this failure physical misgeneralization, and develop an account of its mechanism. Using controlled synthetic tasks, we show that physical misgeneralization arises when local errors typical of the model class propagate through the physical measurement to shift the recovered distribution. We estimate these errors with a data deviation kernel, and we use it to predict which physical quantities gain or lose mass in both our synthetic and more applied maze navigation and double-pendulum motion tasks. Finally, our mechanistic interpretation helps identify which mitigation strategies are structurally promising, and we use it to propose a kernel-informed intervention.
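A toy illustration of the mechanism described above, under assumptions of our own: small, zero-mean errors on each waypoint of a straight path look harmless locally, but the physical measurement (total travel distance) is a convex function of the waypoints, so the errors systematically inflate it. This is not the paper's data deviation kernel; `noisy_straight_path` and the noise level are hypothetical.

```python
import math
import random


def path_length(points):
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def noisy_straight_path(num_steps, noise_std, rng):
    """Straight path from (0, 0) to (1, 0) with zero-mean Gaussian error on each later waypoint."""
    points = [(0.0, 0.0)]
    for i in range(1, num_steps + 1):
        x = i / num_steps
        points.append((x + rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std)))
    return points


if __name__ == "__main__":
    rng = random.Random(0)
    clean = path_length(noisy_straight_path(100, 0.0, rng))
    noisy = sum(path_length(noisy_straight_path(100, 0.005, rng)) for _ in range(100)) / 100
    print(f"clean length {clean:.3f}, mean noisy length {noisy:.3f}")
```

Every noisy path still looks like a plausible straight-line trajectory, yet the distribution of travel distance shifts upward, which is the kind of aggregate error the abstract calls physical misgeneralization.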

2605.20297 2026-05-21 cs.CV cs.LG Version update

MedCRP-CL: Continual Medical Image Segmentation via Bayesian Nonparametric Semantic Modality Discovery

Ziyuan Gao

Affiliations: University College London, London, United Kingdom

AI summary: The study proposes MedCRP-CL, a framework that tackles continual medical image segmentation through online task-structure discovery and structure-aware continual learning, reaching a 73.3% Dice score with only 4.1% forgetting.

Comments Accepted by ICML 2026

Abstract

Medical image segmentation faces a fundamental challenge in continual learning: data arrives sequentially from heterogeneous sources, yet effective continual learning requires discovering which tasks share sufficient structure to benefit from joint learning. Existing methods either apply uniform constraints across all tasks, causing catastrophic forgetting when tasks conflict, or require predefined task groupings that cannot anticipate future task diversity. We introduce MedCRP-CL, a framework that performs online task structure discovery and structure-aware continual learning. Leveraging the Chinese Restaurant Process (CRP), our method dynamically infers task groupings from clinical text prompts as tasks arrive, without requiring predefined cluster counts or access to future tasks. We term these discovered groupings semantic modalities, as they capture finer-grained structure than physical imaging modalities by integrating anatomical region and pathological context. Guided by this discovered structure, we maintain semantic modality-specific LoRA adapters regularized by intra-modality EWC, ensuring parameter isolation across dissimilar task groups while facilitating knowledge transfer within similar ones. The framework is also replay-free, storing only aggregate statistics rather than raw patient data. Experiments on 16 medical segmentation tasks across four imaging modalities demonstrate that MedCRP-CL achieves 73.3% Dice score with only 4.1% forgetting, outperforming the best baseline by 8.0% while requiring 6× fewer parameters. Code is available at https://github.com/zygao930/MedCRP-CL.
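The Chinese Restaurant Process prior used for grouping tasks has a simple closed form: a new task joins an existing group with probability proportional to that group's size, or opens a new group with probability proportional to a concentration parameter alpha. The sketch shows that prior and one way to combine it with per-group likelihoods; the likelihood values (for instance, similarity of a task's text-prompt embedding to each group) are a hypothetical input, since the abstract does not say how MedCRP-CL scores them.

```python
def crp_probabilities(cluster_sizes, alpha):
    """Chinese Restaurant Process prior for the next task.

    Returns one probability per existing cluster (proportional to its size),
    followed by the probability of opening a new cluster (proportional to alpha).
    """
    if alpha <= 0:
        raise ValueError("concentration alpha must be positive")
    denom = sum(cluster_sizes) + alpha
    return [size / denom for size in cluster_sizes] + [alpha / denom]


def crp_posterior(cluster_sizes, alpha, likelihoods, new_cluster_likelihood):
    """Multiply the CRP prior by per-cluster likelihoods of the new task
    (hypothetical scores) and renormalize."""
    prior = crp_probabilities(cluster_sizes, alpha)
    scores = list(likelihoods) + [new_cluster_likelihood]
    unnormalized = [p * s for p, s in zip(prior, scores)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


if __name__ == "__main__":
    # Two semantic modalities seen so far, with 3 and 1 tasks respectively.
    print(crp_probabilities([3, 1], alpha=1.0))
    print(crp_posterior([3, 1], 1.0, likelihoods=[0.1, 0.9], new_cluster_likelihood=0.5))
```

Because the last entry always reserves mass for a new group, the number of semantic modalities never has to be fixed in advance, matching the "no predefined cluster counts" property in the abstract.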

2605.20296 2026-05-21 cs.LG cs.AI Version update

Spectral Unforgetting: Post-Hoc Recovery of Damaged Capabilities Without Retraining

Aarash Abro, Muhammad Tahir

Affiliations: Zeta Labs; Lahore University of Management Sciences

AI summary: The study examines how fine-tuning a language model on a target task degrades capabilities that the training data never explicitly threatened, and proposes a post-hoc repair that uses only the pretrained and fine-tuned checkpoints, applying spectral filtering to recover the damaged capabilities while preserving the target-task gains.

Abstract

Fine-tuning a language model for a target task routinely degrades capabilities the training data never explicitly threatened. We study this phenomenon, known as catastrophic forgetting, and propose a post-hoc repair solution that uses only the pretrained checkpoint $W_{\mathrm{base}}$ and its fine-tuned descendant $W_{\mathrm{ft}}$. The goal is not merely to revert the model toward the base checkpoint, but to recover capabilities damaged by fine-tuning while preserving both the target-task gains and any beneficial held-out improvements. We introduce DG-Hard, a checkpoint-only spectral repair method for the fine-tuning update $\Delta = W_{\mathrm{ft}} - W_{\mathrm{base}}$. DG-Hard treats $\Delta$ as a low-rank task-aligned signal embedded in an IID-like noise residual that gradient descent has no incentive to remove, and applies the Donoho-Gavish hard singular-value threshold to each weight-delta matrix, keeping the structured high-energy part of the update and removing the spectral bulk. This reduces repair to a closed-form SVD filtering step requiring no data-dependent tuning. A central difficulty is evaluation: average accuracy hides per-benchmark failures, while naive recovery scores reward models that simply revert toward the base. We therefore introduce a partition-conditional metric that separately tracks healing, preservation, non-damage, and target-task retention. Across 14 (model, task) settings and nine cross-domain held-out benchmarks, DG-Hard achieves the strongest balanced repair among post-hoc baselines. DG-Hard also restores safety alignment degraded by benign fine-tuning on three independent safety axes, despite using no alignment data. These results suggest that part of fine-tuning-induced capability loss is not an unavoidable consequence of specialization, but a removable spectral residue in the weight update itself.
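The Donoho-Gavish (Gavish-Donoho) hard threshold has a closed form. When the noise level is unknown, the threshold is omega(beta) times the median singular value, with beta = min(m, n) / max(m, n) and the published polynomial approximation omega(beta) ≈ 0.56 beta^3 - 0.95 beta^2 + 1.82 beta + 1.43. The abstract does not say which variant DG-Hard uses, so the unknown-noise variant here is an assumption. The sketch takes singular values as given (for example from `numpy.linalg.svd` of $\Delta$); the filtered update would then be rebuilt as U diag(kept) V^T.

```python
import statistics


def omega_approx(beta):
    """Polynomial approximation of the Gavish-Donoho coefficient omega(beta)
    for the unknown-noise case, with beta = min(m, n) / max(m, n)."""
    return 0.56 * beta ** 3 - 0.95 * beta ** 2 + 1.82 * beta + 1.43


def hard_threshold(singular_values, shape):
    """Zero every singular value at or below omega(beta) * median(singular values).

    Returns (filtered singular values, threshold tau).
    """
    rows, cols = shape
    beta = min(rows, cols) / max(rows, cols)
    tau = omega_approx(beta) * statistics.median(singular_values)
    kept = [s if s > tau else 0.0 for s in singular_values]
    return kept, tau


if __name__ == "__main__":
    # One strong task direction on top of a flat noise bulk in a 5 x 5 delta.
    kept, tau = hard_threshold([10.0, 1.0, 0.9, 0.8, 0.7], (5, 5))
    print(f"tau = {tau:.3f}, kept = {kept}")
```

No data or tuning is needed: the threshold depends only on the matrix shape and its own spectrum, which is what makes the repair "closed-form" in the sense of the abstract.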

2605.20295 2026-05-21 cs.LG cs.AI Version update

Quant.npu: Enabling Efficient Mobile NPU Inference for on-device LLMs via Fully Static Quantization

Jinghe Zhang, Daliang Xu, Chenghua Wang, Weikai Xie, Tao Qi, Yun Ma, Mengwei Xu, Gang Huang

Affiliations: Qualcomm

AI summary: This paper proposes Quant.npu, a fully static quantization framework for efficient mobile NPU inference. It resolves the incompatibility of conventional post-training quantization methods with NPU hardware constraints and achieves high accuracy with lower inference latency on real mobile NPUs.

Abstract

Large language models (LLMs) are increasingly deployed on mobile devices, where Neural Processing Units (NPUs) necessitate fully static quantization for optimal inference efficiency. However, existing post-training quantization (PTQ) methods predominantly rely on dynamic activation quantization, rendering them incompatible with NPU hardware constraints. To bridge the gap between high-fidelity PTQ and NPU-constrained inference, we propose Quant.npu, an integer-only, fully static quantization framework. It incorporates learnable quantization parameters and rotation matrices, enabling low-bit activation-weight quantization without recomputing quantization parameters at runtime. Crucially, we identify that the initialization and selective optimization of quantization parameters are pivotal for optimization stability, as improper initialization and naive joint optimization induce gradient instability that disrupts the optimization of rotation matrices. To address this, we propose a rotation-and-bit-width-aware initialization tailored to diverse activation profiles and a distribution-aware selective optimization (two-stage quantization pipeline) tailored to rotated and unrotated tensors. Furthermore, we introduce a sensitivity-guided adaptive mixed-precision scheme to balance accuracy with inference efficiency. Extensive experiments on real-world mobile NPUs demonstrate that Quant.npu achieves comparable accuracy to state-of-the-art methods, while reducing inference latency by up to 15.1%.
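The difference between static and dynamic activation quantization is where the scale comes from. Dynamic schemes compute it from each input at runtime; a fully static scheme fixes it offline from calibration data, so inference only rounds and clamps. The minimal sketch below shows symmetric per-tensor static quantization with a max-abs calibration rule; that rule and the function names are our assumptions, and Quant.npu's learnable parameters, rotations, and mixed precision are not shown.

```python
def calibrate_scale(calibration_batches, num_bits=8):
    """Offline step: one symmetric scale from the max |activation| seen in calibration."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(v) for batch in calibration_batches for v in batch)
    if max_abs == 0:
        raise ValueError("calibration data is all zeros")
    return max_abs / qmax


def quantize_static(values, scale, num_bits=8):
    """Runtime step: reuse the frozen scale, round, and clamp to the signed integer range."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]


def dequantize(quantized, scale):
    return [q * scale for q in quantized]


if __name__ == "__main__":
    scale = calibrate_scale([[0.5, -1.27], [1.0]])
    q = quantize_static([0.5, -0.333, 2.0], scale)  # 2.0 exceeds the calibrated range
    print(scale, q, dequantize(q, scale))
```

The clamped outlier (2.0 maps to 127) shows the trade-off: no runtime scale computation, but values outside the calibrated range saturate, which is why activation outliers and their handling matter so much in static schemes.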

2605.20293 2026-05-21 cs.LG cs.AI cs.NE Version update

Closed-form predictive coding via hierarchical Gaussian filters

Aleksandrs Baskakovs, Sylvain Estebe, Kenneth Enevoldsen, Kristoffer Nielbo, Chris Mathys, Nicolas Legrand

Affiliations: Center for Humanities Computing; Aarhus University; Interacting Minds Center

AI summary: This paper implements predictive coding with hierarchical Gaussian filters, restoring precision-weighted message passing to obtain dynamic uncertainty estimates and Hebbian-compatible update rules. Activations, weights, and precisions are learned jointly under a single free-energy objective, without a global error signal and without iterations or automatic differentiation.

Abstract

Predictive coding (PC) offers a local and biologically grounded alternative to backpropagation in the training of artificial neural networks, yet to date it remains slower than backpropagation, and its performance degrades sharply as network depth increases. We trace both problems to a single simplification: current PC networks fix the precision matrix to the identity, discarding the precision-weighted prediction errors that the variational derivation requires for the scheme to be fast, local, and Bayesian. We close this gap by expressing predictive coding networks as deep hierarchical Gaussian filters (HGFs) and restoring precision-weighted message passing, yielding dynamic uncertainty estimates and Hebbian-compatible update rules at every layer. The resulting networks can simultaneously learn activations, weights, and precisions under a single free-energy objective, with no global error signal, and resolve inference without requiring iterations or automatic differentiation. On FashionMNIST, our solution approaches backpropagation in epoch-level wall-clock cost while converging in fewer epochs, and outperforms it on online, data efficiency, and concept-drift tasks. We thus establish that closed-form variational inference with online precision learning provides a tractable foundation for deep predictive coding networks, retaining biological and interpretative advantages, without requiring iterative relaxation or global error signals.
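The "precision-weighted prediction error" at the heart of the abstract is easiest to see in the closed-form Gaussian update for a single node: posterior precision is prior precision plus observation precision, and the mean moves by the prediction error scaled by the precision ratio. This sketch covers only that single-node step (names are ours); the coupled volatility levels and layer-wise learning of the full HGF network are not shown.

```python
def precision_weighted_update(mu_prior, pi_prior, observation, pi_obs):
    """One closed-form Gaussian belief update (no iterations, no autodiff).

    pi_* are precisions (inverse variances). Returns (posterior mean, posterior precision).
    """
    pi_post = pi_prior + pi_obs
    prediction_error = observation - mu_prior
    mu_post = mu_prior + (pi_obs / pi_post) * prediction_error
    return mu_post, pi_post


if __name__ == "__main__":
    # The same prediction error moves the belief more when the input is more precise.
    print(precision_weighted_update(0.0, 1.0, 2.0, pi_obs=1.0))
    print(precision_weighted_update(0.0, 1.0, 2.0, pi_obs=3.0))
```

Fixing every precision to 1, as the abstract says current PC networks do, collapses the ratio `pi_obs / pi_post` to a constant and removes exactly this uncertainty-dependent weighting.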

2605.20292 2026-05-21 cs.LG Version update

TreeText-CTS: Compact, Source-Traceable Tree-Path Evidence for Irregular Clinical Time-Series Prediction

Kwanhyung Lee, Juhwan Choi, Jongheon Kim, Joohyung Lee, Hyeongwon Jang, Eunho Yang

Affiliations: Kim Jaechul Graduate School of AI, KAIST

AI summary: This paper proposes TreeText-CTS, which produces compact, source-traceable tree-path evidence for irregular clinical time-series prediction. Multi-scale window summaries are routed through frozen XGBoost models and the activated tree paths are converted into deterministic, source-traceable evidence units, yielding the best AUROC and AUPRC among text-based interfaces on multiple datasets.

Comments 27 pages, 4 figures

Abstract

Numerical time-series models can effectively process irregular electronic health record (EHR) trajectories, but they do not naturally expose the measurements and temporal patterns supporting each risk estimate as readable evidence. Existing text-based interfaces improve readability, but typically rely on either raw serialization, which is lengthy and redundant, or patient-level free-form summaries, which are difficult to trace to source measurements and time windows. To bridge this gap, we introduce TreeText-CTS (Clinical Time-Series), which converts irregular EHR trajectories into human-readable, compact, source-traceable tree-path evidence units without patient-level summarization or inference-time autoregressive decoding. TreeText-CTS routes multi-scale window summaries through frozen XGBoost models and verbalizes activated tree paths as deterministic, source-traceable evidence units composed of threshold conditions. An evidence selector assembles an informative subset of these units, which a language-model encoder then integrates for prediction. Across PhysioNet 2012 mortality, MIMIC-III mortality, and PhysioNet 2019 sepsis-onset forecasting, TreeText-CTS achieves the best AUROC and AUPRC among evaluated text-based EHR time-series interfaces, improving AUPRC by 6.0 to 9.7 absolute percentage points over the strongest prior text-based interface while remaining competitive with numerical time-series models. Ablations show that tree-path evidence construction, evidence selection, and language-model composition each contribute to performance. Because every span passed to the language-model encoder is constructed from activated tree-path threshold conditions, TreeText-CTS makes the evidence supplied to the final predictor inspectable and source-traceable.
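Turning an activated tree path into readable evidence amounts to recording each threshold test a sample passes on its way to a leaf. The sketch uses a tiny hand-written tree in a dict format of our own (not XGBoost's dump format) and assumes a strict less-than test for the left branch; feature names such as `heart_rate_mean_6h` are hypothetical, and missing-value routing is omitted.

```python
def verbalize_path(tree, features):
    """Follow one decision tree for a single sample and return
    (list of threshold conditions on the path taken, leaf value)."""
    node = tree
    conditions = []
    while "leaf" not in node:
        name, threshold = node["feature"], node["threshold"]
        if features[name] < threshold:
            conditions.append(f"{name} < {threshold}")
            node = node["left"]
        else:
            conditions.append(f"{name} >= {threshold}")
            node = node["right"]
    return conditions, node["leaf"]


if __name__ == "__main__":
    toy_tree = {
        "feature": "heart_rate_mean_6h", "threshold": 100.0,
        "left": {"leaf": -0.2},
        "right": {"feature": "lactate_max_24h", "threshold": 2.0,
                  "left": {"leaf": 0.1}, "right": {"leaf": 0.4}},
    }
    sample = {"heart_rate_mean_6h": 112.0, "lactate_max_24h": 3.1}
    conditions, leaf = verbalize_path(toy_tree, sample)
    print(" AND ".join(conditions), "->", leaf)
```

Each emitted condition names a feature, a window, and a threshold, so it can be traced back to the source measurements, which is the traceability property the abstract emphasizes.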

2605.20289 2026-05-21 cs.LG cs.AI Version update

Plug-and-Play Spiking Operators: Breaking the Nonlinearity Bottleneck in Spiking Transformers

Xinzhe Yuan, Xiang Peng, Bin Gu, Huan Xiong

Affiliations * IASM, Harbin Institute of Technology, China; School of Artificial Intelligence, Jilin University, China

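AI Summary This paper proposes a plug-and-play framework that decomposes Transformer nonlinearities into three primitive operators (division, exponentiation, and the ℓ2 norm). It realizes spike-friendly approximations of these operators with LIF neuron populations and lightweight bit-shift scaling, supporting common Transformer nonlinearities without fine-tuning.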

Comments Accepted to ICML 2026. 9 pages main paper, 8 pages appendix, 6 figures, 5 tables. Correspondence to Bin Gu and Huan Xiong

English Abstract

ANN-to-SNN conversion offers a practical, training-free route to spiking large language models. However, current pipelines primarily focus on spike-driven realizations of Transformer linear-algebra operations, while providing limited support for key nonlinear operators. This gap limits compatibility with neuromorphic-style execution constraints, where such nonlinearities typically require division, exponentiation, or norm computations that standard leaky integrate-and-fire dynamics do not naturally support. To address this, we propose a plug-and-play framework that implements spike-friendly approximations of Transformer nonlinearities and integrates into existing ANN-to-SNN pipelines. Our method decomposes these nonlinear computations into three recurring primitives (division, exponentiation, and $\ell_2$ norms). It realizes them via population computation using LIF neuron groups, combined with lightweight bit-shift scaling to avoid floating-point arithmetic. By composing these primitives as modular operator blocks, our framework supports common Transformer nonlinearities (e.g., Softmax, SiLU, and normalization) without any fine-tuning. Experiments on a range of Transformer-based LLMs show that selectively replacing the targeted nonlinear operators incurs less than a $1\%$ accuracy drop across all evaluated tasks.
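
The abstract mentions lightweight bit-shift scaling as a way to avoid floating-point arithmetic. As a generic illustration of that idea (not the paper's LIF-population construction), exponentiation can be reduced to a power of two whose integer part becomes a shift. The fixed-point width `FRAC_BITS` and the linear approximation of the fractional power are assumptions made for this sketch.

```python
import math

FRAC_BITS = 16                 # hypothetical fixed-point precision (scale 2**16)
LOG2E = math.log2(math.e)      # exp(x) = 2 ** (x * log2(e))


def exp_shift(x: float) -> float:
    """Approximate exp(x) with one integer add and one bit shift.

    Writes x * log2(e) = n + f with integer n and f in [0, 1), approximates
    2**f by the chord 1 + f (an overestimate of at most about 6.2%), and
    applies 2**n as a left or right shift of a fixed-point mantissa.
    """
    y = x * LOG2E              # kept in float here for readability
    n = math.floor(y)          # integer part -> shift amount
    f = y - n                  # fractional part in [0, 1)
    mantissa = (1 << FRAC_BITS) + int(f * (1 << FRAC_BITS))  # fixed-point 1 + f
    fixed = mantissa << n if n >= 0 else mantissa >> -n
    return fixed / (1 << FRAC_BITS)


for x in (-2.0, 0.0, 1.0, 3.0):
    print(x, exp_shift(x), math.exp(x))
```

The multiplication by log2(e) is a constant multiply. A fully fixed-point pipeline could fold it into a scaled integer, leaving only adds and shifts in the inner loop.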

2605.20287 2026-05-21 cs.LG cs.AI cs.CV Version update

FusionCell: Cross-Attentive Fusion of Layout Geometry and Netlist Topology for Standard-Cell Performance Prediction

Haoyi Zhang, Kairong Guo, Bojie Zhang, Yibo Lin, Runsheng Wang

Affiliations * School of Integrated Circuits, Peking University, Beijing, China

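AI Summary This paper proposes FusionCell, which fuses layout geometry and netlist topology through cross-attention to improve the accuracy of standard-cell performance prediction. It addresses a weakness of conventional predictors: by ignoring layout geometry, they fail to capture coupling and layout-dependent effects.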

English Abstract

Standard cells form the building blocks of digital circuits, so their delay and power critically influence chip-level performance; yet characterization still relies on slow simulation sweeps, and many fast predictors ignore layout geometry, missing coupling and layout-dependent effects. The challenge is to jointly represent layout geometry and netlist topology so models capture fine-grained spatial details together with structural connectivity for accurate performance prediction. We introduce FusionCell, a dual-modality predictor that treats routed layout geometry and netlist topology as inputs and fuses them explicitly in a unified model. A DeiT encoder processes three-layer routed layouts, while a graph transformer models heterogeneous device/net graphs. The modalities are integrated through a topology-guided mechanism, where the netlist acts as a structural "map" to actively query relevant physical regions in the layout for joint geometric and topological reasoning. We build a 7nm dataset based on the ASAP7 PDK with over 19.5k cells spanning 149 types using automatic tools, targeting six metrics: signal rise/fall delay, transition, and power. Experimental results demonstrate that FusionCell reduces regression error, with an average MAPE of 0.92 percent, and improves Spearman/Kendall ranking over baselines, while accelerating the characterization process by orders of magnitude compared to circuit simulation.
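
Below is a minimal NumPy sketch of the topology-guided fusion step as the abstract describes it, where netlist-graph embeddings act as queries over layout patch embeddings. The following are all assumptions for illustration:

- the shapes (5 device/net nodes, 49 layout patches, width 32);
- the single attention head;
- the random projection matrices.

The DeiT and graph-transformer encoders are replaced by random features.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def netlist_queries_layout(node_emb, patch_emb, Wq, Wk, Wv):
    """Cross-attention: netlist nodes (queries) attend over layout patches (keys/values)."""
    Q = node_emb @ Wq                           # (num_nodes, d)
    K = patch_emb @ Wk                          # (num_patches, d)
    V = patch_emb @ Wv                          # (num_patches, d)
    attn = softmax(Q @ K.T / np.sqrt(Q.shape[-1]), axis=-1)  # (num_nodes, num_patches)
    return attn @ V, attn                       # fused node features, attention map


rng = np.random.default_rng(0)
nodes = rng.standard_normal((5, 32))            # stand-in for graph-transformer outputs
patches = rng.standard_normal((49, 32))         # stand-in for DeiT patch tokens (7 x 7)
Wq, Wk, Wv = (rng.standard_normal((32, 16)) / np.sqrt(32) for _ in range(3))
fused, attn = netlist_queries_layout(nodes, patches, Wq, Wk, Wv)
print(fused.shape, attn.shape)                  # prints (5, 16) (5, 49)
```

Each row of `attn` is a distribution over layout regions for one device or net. This is one way to read the description of the netlist acting as a structural "map" that queries the layout.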

2605.20286 2026-05-21 cs.CR cs.LG Version update

Adaptive Probe-based Steering for Robust LLM Jailbreaking

Junxi Chen, Junhao Dong, Xiaohua Xie

Affiliations * School of Computer Science and Engineering, Sun Yat-Sen University, China; Nanyang Technological University, Singapore

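AI Summary This paper proposes an adaptive probe-based steering method built on model extraction. It dynamically adjusts steering strength to improve the robustness and effectiveness of LLM jailbreaking, requires no extra contrastive prompts or manual tuning, and substantially improves attack effectiveness.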

Comments 19 pages, 13 figures, accepted by ICML 2026

English Abstract

Recent work has demonstrated the potential of contrastive steering for jailbreaking Large Language Models (LLMs). However, existing methods rely on limited and inherently biased contrastive prompts and require laborious manual tuning of steering strength, limiting their robustness and effectiveness. In this paper, we leverage the idea of model extraction to guide the learned steering vectors to approximate the ideal one. We also propose tuning the steering strength adaptively based on statistics of the contrastive activations. Experiments demonstrate that our method notably improves the effectiveness and robustness of probe-based steering, without any extra contrastive prompts or laborious manual tuning. As an attack paper, this work focuses on revealing the breakdown of fortified LLMs, raising the average harmfulness score from 6% to 70%. Our code is available at https://github.com/fhdnskfbeuv/adaptiveSteering.

2605.20285 2026-05-21 cs.LG cs.AI Version update

Introspective X Training: Feedback Conditioning Improves Scaling Across all LLM Training Stages

Brandon Cui, Ximing Lu, Jaehun Jung, Syeda Nahida Akter, Hyunwoo Kim, Yuxiao Qu, David Acuna, Shrimai Prabhumoye, Yejin Choi, Prithviraj Ammanabrolu

Affiliations * NVIDIA; University of Washington; Carnegie Mellon University; UC San Diego

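AI Summary This paper proposes Introspective Training (IXT), which uses the dynamics of later training stages to improve earlier ones so that LLM training scales more efficiently. Experiments show significant gains in both compute efficiency and performance.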

English Abstract

We tackle the question of how to scale more efficiently across the many, ever-growing stages of current LLM training pipelines. Our guiding intuition is that the dynamics of later stages of the pipeline (e.g., post-training) can be used to inform earlier stages such as pre-training. To this end, we propose Introspective Training (IXT), which is inspired by offline reward-conditioned reinforcement learning and applicable to any stage of training. IXT uses a thinking reward model to annotate data with natural-language, critique-based feedback, enabling quality-aware training from the earliest stages of the pipeline. Models are then trained on data prefix-conditioned with the generated feedback. This ensures that not all tokens are treated equally, starting much earlier in training than usual. We run comprehensive experiments on 7.5-12B transformer-based dense LLMs trained from scratch on up to 18 trillion tokens. They show that our method bends scaling curves, yielding up to 2.8x greater compute efficiency in general. It also reaches performance levels in domains such as math and code that are unachievable for models trained otherwise.
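
Prefix-conditioning can be sketched as prepending the generated critique to each training document. Several details here are hypothetical, because the abstract specifies neither how the prefix is formatted nor whether it is supervised:

- the `<feedback>` markers;
- the whitespace tokenizer;
- the choice to exclude prefix tokens from the loss.

```python
from typing import Callable, List, Tuple


def build_conditioned_example(feedback: str, document: str,
                              tokenize: Callable[[str], List[str]]) -> Tuple[List[str], List[int]]:
    """Prepend critique feedback to a document; return tokens and a loss mask."""
    prefix = tokenize(f"<feedback> {feedback} </feedback>")  # hypothetical markers
    body = tokenize(document)
    tokens = prefix + body
    loss_mask = [0] * len(prefix) + [1] * len(body)  # assumption: no loss on the prefix
    return tokens, loss_mask


tokens, mask = build_conditioned_example("Clear and correct proof.", "Let x = 2 .", str.split)
print(tokens)
print(mask)  # prints [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

In the spirit of reward-conditioned RL, a favorable critique could be prepended at generation time to steer outputs toward high quality. The abstract does not say whether IXT does exactly this.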

2605.20284 2026-05-21 cs.CV cs.AI cs.LG Version update

JUDO: A Juxtaposed Domain-Oriented Multimodal Reasoner for Industrial Anomaly QA

Hyunju Kang, Woohyun Lee, Jaewon Kim, Hogun Park

Affiliations * Sungkyunkwan University; Seoul National University

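AI Summary This paper proposes JUDO, a framework that improves multimodal reasoning by incorporating domain knowledge and context. It addresses the lack of domain knowledge in models used for industrial anomaly detection, and experiments show it outperforms Qwen2.5-VL-7B and GPT-4o on the MMAD benchmark.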

Comments Published at ICLR 2026

English Abstract

Industrial anomaly detection has been significantly advanced by Large Multimodal Models (LMMs), which can follow diverse human instructions beyond detection, particularly through visually grounded reasoning for better image understanding. However, LMMs lack domain-specific knowledge, which limits their ability to generate accurate responses in complex industrial scenarios. In this work, we present JUDO (Juxtaposed Domain-Oriented Multimodal Reasoner), a framework that efficiently incorporates domain knowledge and context into visual and textual reasoning. Through visual reasoning, our model segments the defect region by juxtaposing query images with normal images as visual domain context, enabling fine-grained visual comparative inspection. Furthermore, we inject domain knowledge through supervised fine-tuning (SFT) to enhance context understanding. We then guide domain reasoning through reinforcement learning (GRPO) with tailored rewards, favoring a domain-oriented reasoning process. Experimental results demonstrate that JUDO achieves superior performance on the MMAD benchmark, surpassing models such as Qwen2.5-VL-7B and GPT-4o. These results highlight the importance of enhancing domain knowledge and context for effective reasoning in anomaly understanding.
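
GRPO, used in JUDO's reinforcement-learning stage, replaces a learned value baseline with a group-relative one. Several responses are sampled for the same prompt, and each response's reward is standardized against its group. The sketch shows only that advantage computation. JUDO's tailored rewards are not described in the abstract, so the reward values below are hypothetical.

```python
import numpy as np


def group_relative_advantages(rewards, eps: float = 1e-8) -> np.ndarray:
    """Standardize rewards within one group of responses sampled for the same prompt."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)  # population std; eps guards equal rewards


# Hypothetical rewards for four sampled answers to one anomaly question.
adv = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
print(adv)  # approximately 1.414, -1.414, 0, 0
```

Responses that beat the group average get a positive advantage and are reinforced. When all rewards tie, every advantage is zero and the group contributes no policy-gradient signal.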

2605.20281 2026-05-21 econ.GN cs.LG q-fin.EC Version update

The Economics of AI Inference: Inflation Dynamics, Welfare Costs, and Optimal Monetary Policy under the Inference-Cost Phillips Curve

Gustav Olaf Yunus Laitinen-Fredriksson Lundström-Imanov

Affiliations * Department of Economics, Stockholm University

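AI Summary This paper develops a unified microeconomic and monetary theory of AI inference costs and their effects on inflation, welfare, and optimal monetary policy. It does four things:
- introduces the Inference-Cost Phillips Curve (ICPC) and proves its structural slope;
- analyzes a Hicks-Kaldor decomposition of consumer welfare;
- derives a generalized Taylor principle;
- characterizes the optimal monetary policy response coefficient.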

Comments 6 pages, 5 tables

English Abstract

We develop a unified microeconomic and monetary theory of artificial intelligence inference costs and their pass-through to inflation, welfare, and optimal monetary policy. We introduce the Inference-Cost Phillips Curve (ICPC), an augmented New Keynesian Phillips curve in which firm-level marginal costs of producing differentiated goods include a non-trivial AI inference component lambda-bar, and prove a closed-form structural slope kappa*_inf = lambda-bar * kappa, where kappa is the standard Calvo-Yun slope. We derive a welfare-relevant Hicks-Kaldor decomposition of consumer welfare under inference-cost shocks, prove a generalized Taylor principle for the inference-augmented economy, and characterize the optimal monetary policy response coefficient psi*_inf = (1 + phi*rho) * lambda-bar * kappa under commitment. A second-order welfare loss formula closes the model in closed form. We confront the theory with U.S. monthly data 2022:M01-2026:M04 using a two-step GMM estimator with Newey-West HAC standard errors and Hansen J-test, recovering an empirical slope kappa-hat_inf = 0.087 (HAC s.e. 0.021) which lies within one standard error of the structural prediction. A scaling regression over 50 rolling-window subwindows yields b-hat = 0.987 (R^2 = 0.998), consistent with a near-unit-elasticity pass-through. A G7 reduced-form panel with Driscoll-Kraay HAC standard errors yields b-hat^G7 = 0.094 (s.e. 0.026), and a Wald test fails to reject cross-country homogeneity (p = 0.78). The framework provides a single equilibrium scaffold for the joint study of AI inference cost dynamics, monetary policy under generative-AI shocks, and the welfare cost of inference-driven inflation.
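
Both closed-form results in the abstract are multiples of the baseline Calvo-Yun slope, so they are easy to evaluate once kappa is fixed. The sketch assumes the textbook Calvo coefficient on real marginal cost, kappa = (1 - theta)(1 - beta*theta)/theta. Here theta is the probability that a firm cannot reset its price and beta is the discount factor. The paper's definition of kappa may include further terms, and all parameter values (including lambda-bar, phi and rho) are hypothetical.

```python
def calvo_slope(theta: float, beta: float) -> float:
    """Textbook Calvo coefficient on real marginal cost (assumed form of kappa)."""
    return (1 - theta) * (1 - beta * theta) / theta


def icpc_slope(lambda_bar: float, kappa: float) -> float:
    """Structural ICPC slope from the abstract: kappa*_inf = lambda-bar * kappa."""
    return lambda_bar * kappa


def optimal_response(lambda_bar: float, kappa: float, phi: float, rho: float) -> float:
    """Optimal coefficient from the abstract: psi*_inf = (1 + phi*rho) * lambda-bar * kappa."""
    return (1 + phi * rho) * lambda_bar * kappa


kappa = calvo_slope(theta=0.75, beta=0.99)        # about 0.0858
kappa_inf = icpc_slope(lambda_bar=0.1, kappa=kappa)
psi_inf = optimal_response(lambda_bar=0.1, kappa=kappa, phi=1.5, rho=0.8)
print(kappa, kappa_inf, psi_inf)
```

With these hypothetical values, an inference share of 0.1 scales the slope down tenfold. The optimal response is 2.2 times the ICPC slope because 1 + 1.5 * 0.8 = 2.2.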

2605.20279 2026-05-21 econ.GN cs.CY cs.LG q-fin.EC Version update

The Economics of Model Collapse: Equilibrium, Welfare, and Optimal Provenance Subsidies in Synthetic Data Markets

Gustav Olaf Yunus Laitinen-Fredriksson Lundström-Imanov

Affiliations * Department of Economics, Stockholm University

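AI Summary This paper studies the microeconomics of model collapse in synthetic data markets. It proposes the Synthetic Data Contamination Equilibrium and derives a welfare decomposition. It also obtains closed-form expressions for the optimal provenance subsidy and watermark strength, and proves the impossibility of information-constrained implementation.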

Comments 7 pages, 5 tables, 1 algorithm; IEEEtran conference format; submitted to IEEE BigData 2026

English Abstract

Generative artificial intelligence is rapidly transforming the supply side of training data: an increasing share of new tokens, images, and structured records is produced by previous-generation models rather than by human originators. Recursive training on such synthetic content induces a measurable and often irreversible loss of distributional fidelity, a phenomenon known as model collapse. We develop the first unified microeconomic theory of synthetic data markets under model collapse. We introduce the Synthetic Data Contamination Equilibrium (SDCE), prove existence and generic uniqueness, derive a welfare decomposition W = W_prod + W_cons - L_coll - L_info, establish a Wasserstein-gradient-flow mean-field collapse limit, prove an impossibility of information-constrained implementation, and obtain closed-form expressions for the welfare-maximizing provenance subsidy s* = KL(q||p)/(2 kappa) and the welfare-maximizing watermark strength w* = (1 - psi) KL(q||p)/(2 kappa psi). We prove an information-theoretic Cramer-Rao lower bound on any provenance estimator using only producer-side observations and show that the Provenance-Market Iterative Retraining (PMIR) algorithm attains this bound up to constants while converging to an epsilon-SDCE in O(epsilon^-2 log T) iterations. A reduced-form OLS estimation on a C4-synthetic benchmark over ten retraining generations yields a collapse-rate coefficient b-hat = 0.181 (HAC s.e. 0.024), within one standard error of the structural prediction 0.183. Calibrated experiments raise generation-ten model quality by 23.1 percent over the unregulated benchmark while lowering the 2-Wasserstein drift on a held-out diversity probe from 0.318 to 0.142. Scaling experiments over generations t in {1,...,10} recover a logarithmic-in-t collapse law log Q_t = log Q_0 - 0.183 t rho^2 with R^2 = 0.962.
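
The policy formulas in the abstract can be evaluated directly once KL(q||p) is known:

- s* = KL(q||p)/(2 kappa);
- w* = (1 - psi) KL(q||p)/(2 kappa psi);
- the fitted collapse law log Q_t = log Q_0 - 0.183 t rho^2.

The sketch computes the KL term for discrete distributions. The roles of q, p, kappa, psi and rho follow the paper's definitions, which the abstract does not spell out, and every numeric input below is hypothetical.

```python
import math
from typing import Sequence


def kl_divergence(q: Sequence[float], p: Sequence[float]) -> float:
    """KL(q || p) for discrete distributions; requires p > 0 wherever q > 0."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def optimal_subsidy(kl_qp: float, kappa: float) -> float:
    """s* = KL(q||p) / (2 kappa)."""
    return kl_qp / (2 * kappa)


def optimal_watermark(kl_qp: float, kappa: float, psi: float) -> float:
    """w* = (1 - psi) KL(q||p) / (2 kappa psi)."""
    return (1 - psi) * kl_qp / (2 * kappa * psi)


def collapse_quality(q0: float, t: int, rho: float, rate: float = 0.183) -> float:
    """Quality implied by log Q_t = log Q_0 - rate * t * rho**2."""
    return q0 * math.exp(-rate * t * rho ** 2)


kl_qp = kl_divergence([0.5, 0.5], [0.9, 0.1])    # about 0.511
print(optimal_subsidy(kl_qp, kappa=0.5), optimal_watermark(kl_qp, kappa=0.5, psi=0.25))
print(collapse_quality(1.0, t=10, rho=0.5))
```

The two policy formulas share the factor KL(q||p)/(2 kappa), so w* = s* (1 - psi)/psi. A larger psi therefore lowers the optimal watermark strength relative to the subsidy.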

2605.20276 2026-05-21 cs.LG Version update

OmniISR: A Unified Framework for Centralized and Federated Learning via Intermediate Supervision and Regularization

Wei-Bin Kou, Guangxu Zhu, Ming Tang, Chen Zhang, Lisheng Wu, Lei Zhou, Yujiu Yang

Affiliations * Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China; Shenzhen Research Institute of Big Data, Shenzhen, China; Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, China; Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, China; Yinwang Intelligent Technology Co. Ltd., Shenzhen, China

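AI Summary This paper proposes OmniISR, a framework that uses intermediate supervision and regularization signals to unify pure centralized, pure federated, and hybrid centralized-federated training. This resolves the incompatible optimization regimes of centralized and federated learning. On the theory side, it derives a convergence bound, a federated drift bound, a gradient-alignment guarantee, and an escape-time bound.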

Comments 18 pages

English Abstract

The global deployment of edge intelligence spans heterogeneous legal frameworks. While some regions permit centralized learning (CL) via cloud data aggregation, others enforce strict data localization, necessitating federated learning (FL). This operational dichotomy introduces two incompatible optimization regimes: unbiased global gradients coupled with internal covariate shift in CL, versus biased, drift-prone local updates in FL. As a result, any naive integration of the two lacks rigorous theoretical guarantees. To fill this gap, we propose OmniISR, a unified framework that fuses pure CL, pure FL, and hybrid CL-FL training modes by attaching intermediate supervision and regularization (ISR) signals to multiple hidden layers. Specifically, we propose (i) to use mutual information (MI) as intermediate supervision to align shifting internal covariates in CL and client-drifting representations in FL. We also propose (ii) to adopt negative entropy (NE) as an intermediate regularizer that penalizes overconfident predictions, preserves representational uncertainty, and avoids device-specific collapse. On the theory side, we derive four results:
(i) a unified, ISR-agnostic, non-asymptotic O(1/sqrt(T)) convergence bound showing that the introduced ISR does not violate standard SGD convergence;
(ii) a federated drift bound that quantifies the ISR-reduced client drift;
(iii) a gradient-alignment guarantee that ensures non-conflicting CL and FL updates under mild bias;
(iv) an explicit escape-time bound indicating that CL-FL hybrid mixing enlarges effective stochasticity and accelerates escape from strict saddles.
Extensive experiments demonstrate that OmniISR consistently improves model performance in both centralized and federated paradigms, reduces the CL-FL gap by 22.60%, and yields 37/48 paired metric wins across multiple FL algorithms.
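
The negative-entropy regularizer can be read as adding sum_k p_k log p_k (which is at most zero) for the predictions of auxiliary heads on hidden layers. Minimizing the total loss then pushes those predictions away from overconfidence. The sketch makes three simplifications:

- it assumes such auxiliary heads already produce logits;
- it uses a hypothetical weight `beta`;
- it omits the mutual-information supervision term, whose estimator the abstract does not specify.

```python
from typing import Sequence

import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max(axis=-1, keepdims=True)  # stabilize exp
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def negative_entropy(logits: np.ndarray, eps: float = 1e-12) -> float:
    """Batch mean of sum_k p_k log p_k; equals -log(K) for uniform predictions."""
    p = softmax(logits)
    return float(np.mean(np.sum(p * np.log(p + eps), axis=-1)))


def isr_ne_loss(task_loss: float, intermediate_logits: Sequence[np.ndarray],
                beta: float) -> float:
    """Task loss plus a weighted NE penalty on each (hypothetical) hidden-layer head."""
    return task_loss + beta * sum(negative_entropy(lg) for lg in intermediate_logits)


uniform = np.zeros((3, 4))                     # maximally uncertain predictions
confident = np.array([[10.0, 0.0, 0.0, 0.0]])  # near one-hot prediction
print(negative_entropy(uniform), negative_entropy(confident))
```

The confident head receives a value near 0 and the uniform head a value near -log 4. The penalty therefore favors the less overconfident intermediate prediction.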

2605.20273 2026-05-21 cs.LG cs.AI Version update

Modality-Decoupled Online Recursive Editing

Siyuan Li, Youyuan Zhang, Fangming Liu, Jing Li

Affiliations * Harbin Institute of Technology, Shenzhen, China; Peng Cheng Laboratory, China; Huazhong University of Science and Technology, China

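AI Summary This paper proposes M-ORE, a modality-decoupled online recursive editor for lifelong adaptation of multimodal LLMs. A unified proximal-projection formulation with a Sherman-Morrison recursion gives constant per-edit overhead. By maintaining module-wise locality statistics and a fixed orthogonal low-rank edit subspace, the method reduces long-horizon interference and improves reliability, generality, and locality.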

English Abstract

Online model editing for multimodal large language models (MLLMs) requires assimilating a stream of corrections under tight compute and memory budgets. Yet editors developed for text-only LLMs often degrade on MLLMs. Visually dominant activations skew the statistics that shape updates, causing cross-modal conflict. Meanwhile, sequential writes become entangled in a shared edit space and amplify long-horizon interference, causing inter-edit interference. To address these issues, we propose M-ORE, a modality-decoupled online recursive editor for lifelong MLLM adaptation. M-ORE is derived from a unified proximal-projection formulation and admits a closed-form update with a Sherman-Morrison recursion, yielding constant per-edit overhead. It maintains module-wise locality statistics for the text stack and the visual projector, which avoids visually dominated update shaping. It also performs continual updates in a fixed orthogonal low-rank edit subspace via a Sherman-Morrison recursion to mitigate long-horizon interference. Experiments on multiple MLLM backbones and online editing benchmarks show that our M-ORE method consistently improves reliability, generality, and locality over strong baselines, while achieving favorable quality-efficiency scaling. Our code is publicly available at https://github.com/lab-klc/M-ORE.
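
The constant per-edit cost comes from the Sherman-Morrison identity, (A + u v^T)^{-1} = A^{-1} - A^{-1} u v^T A^{-1} / (1 + v^T A^{-1} u). This identity refreshes a stored inverse after a rank-one change without refactorizing. The sketch maintains the inverse of a regularized key covariance inside a fixed orthonormal subspace. The subspace, its rank r, the ridge term `lam` and the random keys are hypothetical, and M-ORE's actual proximal-projection update is not reproduced.

```python
import numpy as np


def sherman_morrison_update(A_inv: np.ndarray, u: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Return (A + u v^T)^{-1} from A^{-1} in O(n^2) time."""
    Au = A_inv @ u
    vA = v @ A_inv
    return A_inv - np.outer(Au, vA) / (1.0 + v @ Au)


rng = np.random.default_rng(0)
d, r, lam = 64, 8, 1.0                                 # hypothetical sizes and ridge term
P, _ = np.linalg.qr(rng.standard_normal((d, r)))       # fixed orthonormal edit subspace (d x r)
S_inv = np.eye(r) / lam                                # inverse of lam * I before any edit
keys = rng.standard_normal((20, d))                    # hypothetical stream of edit keys
for k in keys:
    z = P.T @ k                                        # project key into the r-dim subspace
    S_inv = sherman_morrison_update(S_inv, z, z)       # O(r^2) work per edit
```

Because each update touches only an r x r matrix, the cost per edit does not grow with the number of edits already applied. That is the sense in which the per-edit overhead is constant.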

2605.20272 2026-05-21 cs.LG cs.AI Version update

Smaller Abstract State Spaces Enable Cross-Scale Generalization in Reinforcement Learning

Nasehatul Mustakim, Lucas Lehnert

Affiliations * Department of Computer Science, University of Saskatchewan, Saskatoon, Saskatchewan, Canada

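AI Summary This paper presents a theoretical model of cross-scale generalization in RL agents. It extends the state abstraction framework to POMDPs, defines a successor-weighted model reduction, and analyzes how the size of the abstract state space affects generalization.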

English Abstract

While humans readily generalize abstract concepts to more complex or larger tasks, building Reinforcement Learning (RL) systems with this ability remains elusive. Here, we present the first theoretical model of how such Out-of-Distribution (OOD) generalization can be achieved in RL agents. Our approach considers Partially Observable Markov Decision Processes (POMDPs) and assumes that an intelligent agent uses an abstraction function to determine which experiences can be treated as equivalent and which must be distinguished. First, we extend the existing state abstraction framework and proof techniques to POMDPs. Then, we define a successor-weighted model reduction, a model reduction variant that enables compression into smaller abstract spaces than prior definitions allow. We derive a bound on the agent's OOD test performance, thereby defining the conditions under which OOD generalization is achievable. This bound decomposes an agent's performance loss into approximation and estimation errors, revealing how reducing an agent's abstract state space size improves test performance and OOD generalization. Our analysis suggests that constraining an agent to operate over a small, finite set of abstract states is necessary for achieving generalization to more complex tasks. Our results motivate further research into learning RL architectures that scale across tasks of varying complexity levels.

2605.20271 2026-05-21 stat.ML cs.LG Version update

Multi-Head Attention as Ensemble Nadaraya-Watson Estimation: Variance Reduction, Decorrelation, and Optimal Head Diversity

Ernest Fokoué

Affiliations * School of Mathematics and Statistics, College of Science

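AI Summary This paper shows that multi-head attention can be viewed as an ensemble of Nadaraya-Watson kernel regression estimators. By analyzing the decorrelation of head outputs, it relates variance reduction to head diversity. It introduces the Head Diversity Index to measure inter-head decorrelation and derives the optimal allocation of the number of heads and the per-head dimension.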

Comments 14 pages

English Abstract

We develop a rigorous statistical theory of multi-head attention (MHA) as an ensemble of Nadaraya-Watson (NW) kernel regression estimators. Building on the algebraic identity between single-head softmax attention and the NW estimator, we prove that MHA is a structured ensemble of H NW estimators, each operating in a distinct learned projection subspace of the key space. We derive an explicit Bias-Variance-Covariance decomposition of the MHA mean squared error. It shows that variance reduction depends not merely on the number of heads H but fundamentally on the decorrelation of head outputs. Decorrelation is governed by the principal angles between learned projection subspaces: orthogonal projections yield maximum variance reduction, while aligned projections yield none. We introduce the Head Diversity Index (HDI), a computable spectral measure of inter-head decorrelation, and prove that MHA mean squared error is monotonically decreasing in HDI. This provides the first rigorous theoretical explanation for the empirically observed specialization of attention heads. Under a fixed total-dimension budget D = H * d_k, we solve the optimal head-dimension allocation problem, deriving the MSE-minimizing pair (H*, d_k*) from the data distribution and regression smoothness. The solution yields a new architectural scaling law: the optimal per-head dimension grows logarithmically with training set size, while the optimal number of heads grows nearly linearly with the total budget D. Our framework unifies three strands of prior work:
- the NW theory of single-head attention;
- the general weighting theory for ensemble learning;
- the decorrelation-variance-reduction isomorphism between biological and computational ensembles.
Multi-head attention is the Transformer's instantiation of a universal principle: identical agents plus diversity-enforcing mechanisms yield emergent optimality.
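
The identity the theory builds on is easy to check numerically. One softmax attention head with scaling 1/sqrt(d) is a Nadaraya-Watson estimator with the exponential kernel exp(q·k/sqrt(d)). The sketch also computes principal angles between two projection subspaces via QR and SVD, the quantity the abstract says governs decorrelation. Dimensions and random inputs are assumptions. The Head Diversity Index itself is not defined in the abstract and is not implemented here.

```python
import numpy as np


def nw_estimate(q, keys, values, bandwidth):
    """Nadaraya-Watson regression with kernel exp(<q, k> / bandwidth)."""
    scores = keys @ q / bandwidth
    w = np.exp(scores - scores.max())   # shift for stability; cancels in the ratio
    return (w[:, None] * values).sum(axis=0) / w.sum()


def softmax_attention(q, keys, values):
    """Single-head scaled dot-product attention for one query."""
    s = keys @ q / np.sqrt(q.shape[-1])
    a = np.exp(s - s.max())
    a /= a.sum()
    return a @ values


def principal_angles(A, B):
    """Principal angles (radians) between the column spaces of A and B."""
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))


rng = np.random.default_rng(0)
d = 8
q, keys, values = rng.standard_normal(d), rng.standard_normal((10, d)), rng.standard_normal((10, 3))
print(np.allclose(nw_estimate(q, keys, values, np.sqrt(d)), softmax_attention(q, keys, values)))  # prints True
E = np.eye(4)
print(principal_angles(E[:, :2], E[:, 2:]))  # orthogonal subspaces: both angles pi/2
```

Orthogonal subspaces give angles of pi/2, which the abstract associates with maximum variance reduction. Identical subspaces give angles of 0, which it associates with no reduction.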

2605.20270 2026-05-21 cs.LG cs.AI stat.ML Version update

Conformal Selective Acting: Anytime-Valid Risk Control for RLVR-Trained LLMs

Hamed Khosravi, Xiaoming Huo

Affiliations * Georgia Institute of Technology

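AI Summary This work proposes Conformal Selective Acting (CSA), a method for anytime-valid risk control when deploying RLVR-trained LLMs. It fills an empty cell of the design space that is forced by deployment requirements, maintaining pathwise validity with per-threshold e-processes on a Bonferroni grid. Experiments across multiple benchmarks demonstrate its effectiveness.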

English Abstract

A local specialist LLM, fine-tuned with reinforcement learning from verifiable rewards (RLVR) on operator-local data, is installed in a regulated organization with per-deployment error budget $α$. The operator needs a safety certificate for this deployment's stream at every round: no pooling across deployments, no waiting for a long-run average. Existing wrappers cannot deliver this on adaptive, online-updated streams: offline conformal-risk methods require exchangeability; online-conformal methods bound only long-run averages; non-exchangeable extensions are marginally valid; and the closest anytime wrapper, A-RCPS, controls marginal rather than selective risk. Using a (test statistic, validity guarantee, deployment rule) framework, we identify one empty cell forced by deployment requirements: e-process per threshold, selective risk, anytime-pathwise validity, max-certified-threshold rule. Conformal Selective Acting (CSA) fills it as a per-round wrapper maintaining a Ville-type e-process per threshold on a Bonferroni grid, evaluated against the RLVR filtration. Under predictable updates and isotonic-calibrated monotone risk we prove (i) an anytime-pathwise selective-risk bound $R_T^{\mathrm{act}}\leα+O(N_T^{-1/2})$, (ii) rate-optimal certification matching $Θ(\barη^{-2}\log(1/δ))$, and (iii) a horizon-independent release-rate gap. Across eight specialist benchmarks ($480$ streams), sixteen adversarial distribution-shift cells ($160$ streams), and five live Expert-Iteration RLVR cells with online LoRA over four base models in three architecture families ($10{,}300$ rounds), CSA is the only method among ten compared that satisfies pathwise validity and non-refusing deployment on every cell. We do not propose a new LLM, training algorithm, or policy class; CSA is the deployment-side complement, orthogonal to the model, for operators who cannot use a frontier API.
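
Below is a minimal sketch of per-threshold, anytime-valid certification with a betting-style e-process. It assumes losses in [0, 1] that are observed only on rounds where the model acts at that threshold. The validity argument runs as follows:

1. Under the null hypothesis that selective risk exceeds alpha, each factor 1 + nu(alpha - L) has conditional mean at most 1.
2. The running product is therefore a nonnegative supermartingale.
3. Ville's inequality bounds the chance that it ever reaches K/delta by delta/K.
4. A Bonferroni union over the K thresholds then gives overall level delta.

The fixed bet `nu`, the threshold grid and the loss streams are hypothetical. CSA's isotonic calibration and the RLVR filtration are not modeled.

```python
from typing import Dict, Optional, Sequence


def certify_max_threshold(losses_by_threshold: Dict[float, Sequence[float]],
                          alpha: float, delta: float,
                          nu: Optional[float] = None) -> Optional[float]:
    """Return the largest threshold whose e-process reaches K / delta, else None."""
    K = len(losses_by_threshold)
    if nu is None:
        nu = 0.5 / (1 - alpha)  # any nu in [0, 1/(1 - alpha)] keeps factors >= 0
    certified = []
    for threshold, losses in losses_by_threshold.items():
        e_value = 1.0
        for loss in losses:  # losses assumed to lie in [0, 1]
            e_value *= 1.0 + nu * (alpha - loss)
            if e_value >= K / delta:  # by Ville's inequality, stopping here is anytime-valid
                certified.append(threshold)
                break
    return max(certified) if certified else None


streams = {
    0.3: [0.0] * 100,               # no errors when acting conservatively
    0.6: ([1.0] + [0.0] * 9) * 20,  # 10% selective risk
    0.9: [1.0, 0.0] * 100,          # 50% selective risk
}
print(certify_max_threshold(streams, alpha=0.2, delta=0.1))  # prints 0.6
```

The max-certified-threshold rule then acts at 0.6. The 0.9 stream, whose risk is above alpha, never accumulates enough evidence to be certified.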

2605.20269 2026-05-21 cs.LG cs.AI stat.ML Version update

Catching a Moving Subspace: Low-Rank Bandits Beyond Stationarity

Hamed Khosravi, Xiaoming Huo

Affiliations * H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology

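AI Summary This paper studies low-rank linear contextual bandits whose latent subspace drifts. It proposes a new algorithm, SPSC, that tracks subspace changes while achieving a dynamic-regret rate governed by the intrinsic rank.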

English Abstract

Many bandit deployments (recommendation, clinical dosing, ad targeting) share two facts that prior work handles only in isolation: rewards live on a low-dimensional latent subspace, and that subspace drifts. Stationary low-rank bandits exploit rank but break under subspace change; non-stationary linear bandits adapt to drift but pay the ambient rate $\widetilde{O}(d\sqrt{T})$. We study piecewise-stationary low-rank linear contextual bandits with scalar feedback: $θ_t = B_k^\star w_t$, with a rank-$r$ factor $B_k^\star\in\mathbb{R}^{d\times r}$ that is constant within each of $K$ unknown segments and can shift at boundaries. Our results are tight along three axes.

(i) Identification boundary. With single-play scalar rewards, the moving subspace is recoverable through quadratic functionals of rewards iff three probe-side conditions hold: known noise variance, bounded state-noise coupling, and full-dimensional probe support. Each is necessary in the unrestricted-second-moment problem, and jointly they are sufficient, characterizing the boundary of the solvable region.

(ii) Algorithm and dynamic regret. SPSC interleaves isotropic probes with windowed projected ridge-UCB exploitation inside the learned $r$-dimensional subspace; a CUSUM-style variant discovers segment boundaries online. The costed dynamic regret is $\widetilde{O}(r\sqrt{T})+\widetilde{O}(T^{2/3})+O(W\,V_{\mathrm{in}})$, replacing the ambient $d\sqrt{T}$ rate with the intrinsic rank.

(iii) Empirics. On eleven benchmarks spanning synthetic, UCI/MovieLens, semi-synthetic clinical, and ZOZOTOWN production-log data, SPSC outperforms non-stationary and low-rank baselines whenever $d-r\gtrsim T^{1/6}$, matching the analytical crossover.

To our knowledge, this is the first work to characterize the identification boundary and attain the intrinsic-rank dynamic-regret rate in this setting.
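
The abstract describes SPSC's change-detecting variant as CUSUM-style. A textbook two-sided CUSUM on a scalar statistic looks like the sketch below. The abstract does not state which statistic SPSC monitors, so the monitored stream, the reference mean `mu0`, the slack `drift` and the alarm `threshold` are all assumptions here.

```python
from typing import Iterable, Optional


def cusum_alarm(stream: Iterable[float], mu0: float, drift: float,
                threshold: float) -> Optional[int]:
    """Two-sided CUSUM: index of the first alarm, or None if no change is flagged."""
    g_up = g_down = 0.0
    for t, x in enumerate(stream):
        g_up = max(0.0, g_up + (x - mu0) - drift)      # accumulates upward shifts
        g_down = max(0.0, g_down - (x - mu0) - drift)  # accumulates downward shifts
        if g_up > threshold or g_down > threshold:
            return t
    return None


# Hypothetical stream whose mean jumps from 0 to 1 at t = 50.
stream = [0.0] * 50 + [1.0] * 50
print(cusum_alarm(stream, mu0=0.0, drift=0.25, threshold=2.0))  # prints 52
```

The `drift` slack keeps small fluctuations from accumulating, and `threshold` trades detection delay against false alarms. Here the alarm fires three rounds after the jump. A segment-based learner would typically reset its statistics at that point and re-estimate the subspace on the new segment.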

2605.20268 2026-05-21 cs.LG cs.AI cs.CL (updated version)

Chronicle: A Multimodal Foundation Model for Joint Language and Time Series Understanding

Paul Quinlan, Jeremy Levasseur, Qingguo Li, Xiaodan Zhu

Affiliations: InertialAI; Department of Electrical and Computer Engineering, Queen’s University; Department of Mechanical and Materials Engineering, Queen’s University

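AI summary: This paper proposes Chronicle, a multimodal foundation model trained jointly on language and time series, in which a single unified architecture lets the two modalities share parameters, yielding strong results on multiple tasks.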

English abstract

Real-world time series come with text: metadata, descriptions, news, reports. Yet time series foundation models process numerical sequences in isolation, and the multimodal text-and-time-series models that attempt to bridge the two all adapt a pretrained language model post hoc, inheriting representations shaped without ever seeing temporal data. These models are also evaluated almost exclusively against other multimodal baselines, not against the strongest unimodal foundation models in either domain, leaving open whether joint training is needed at all. We present Chronicle, a compact 324M-parameter decoder-only transformer trained from scratch on natural language and time series within a single unified architecture. Both modalities share the same transformer blocks, attention mechanism, and residual stream; the bulk of pretraining uses unimodal batches so cross-modal capability emerges purely from shared parameters, with a short alignment stage that interleaves the two. To our knowledge, Chronicle is the first model jointly pretrained on text and time series from scratch, and the first multimodal model evaluated against dedicated foundation models in both domains. It matches Gemma-3-270M-PT on 19 NLU tasks, sets a new bar for frozen-embedding time series classification on 24 UCR/UEA datasets, and produces multimodal forecasts on Time-MMD that beat every supervised fusion baseline, all from a single backbone.

2605.20262 2026-05-21 cs.LG cs.AI (updated version)

Residual Paving: Diagnosing the Routing Bottleneck in Selective Refusal Editing

Bryce Hinkley, Peyman Najafirad

Affiliations: University of Texas at San Antonio

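AI summary: This paper studies selective refusal editing as a three-way control problem and introduces Residual Paving, which separates route selectivity (whether to intervene) from residual-edit capacity, thereby reducing the edit refusal rate while improving preservation on benign and harmful distributions.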

English abstract

We study selective refusal editing as a three-way control problem: induce non-refusal on designated edit prompts while preserving benign behavior and harmful refusals outside the edit set. We introduce Residual Paving, a routed residual editing method for frozen instruction-tuned transformers that separates route selectivity (whether to intervene) from residual-edit capacity (what edit to apply). An early-layer router predicts a scalar gate and expert mixture; when active, prompt-conditioned bottleneck residual experts apply later-layer residual updates while leaving the backbone unchanged. This decomposition supports an oracle-routing diagnostic where only the learned scalar gate is replaced with the held-out edit/keep label, leaving the residual editor and frozen backbone fixed. On the primary Gemma-3-4B-IT held-out split, learned Residual Paving reduces edit refusal from 88.6% to 4.0%, with 95.5% benign distribution preservation and 87.3% harmful distribution preservation. Same-protocol one-direction steering controls are much weaker on edit success, leaving edit refusal at 86.8% for Edit-target ActAdd and 78.9% for DIM-style refusal steering. The remaining failure is off-target harmful-keep degradation: harmful refusal remains below the frozen-base rate, 65.3% vs. 81.6%. Across six backbones, oracle routing improves the keep-side diagnostic score on every reported row, with median gain +12.9 pp, supporting the interpretation that learned route selectivity is the main observed bottleneck. Trajectory diagnostics on two backbones further suggest directed movement toward edit-target continuations rather than generic refusal suppression.
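
A toy sketch of the routed residual edit described above, in plain Python for a single token state. It assumes a sigmoid scalar gate and a softmax expert mixture computed from an early-layer state `h_early`, and ReLU bottleneck experts (down-projection `D`, up-projection `U`) applied to a later-layer state `h_late`; these shapes, activations, and names are illustrative assumptions, not the paper's implementation. Passing `oracle_gate` mimics the oracle-routing diagnostic: only the gate changes, while the experts and the frozen state stay fixed.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    m = max(zs)
    es = [math.exp(z - m) for z in zs]
    s = sum(es)
    return [e / s for e in es]


def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def routed_residual_edit(h_early, h_late, router, experts, oracle_gate=None):
    """Add a gated mixture of bottleneck residual updates to a later-layer state.

    router: dict with 'w_gate' (length-d list), 'b_gate' (float), 'W_mix' (K x d)
    experts: list of K (D, U) pairs; D is r x d (down), U is d x r (up)
    oracle_gate: optional 0/1 label that replaces only the learned scalar gate
    """
    g = sigmoid(sum(w * x for w, x in zip(router["w_gate"], h_early)) + router["b_gate"])
    if oracle_gate is not None:
        g = float(oracle_gate)
    pi = softmax(matvec(router["W_mix"], h_early))  # expert mixture weights
    delta = [0.0] * len(h_late)
    for weight, (D, U) in zip(pi, experts):
        z = [max(0.0, a) for a in matvec(D, h_late)]  # ReLU bottleneck
        delta = [d + weight * u for d, u in zip(delta, matvec(U, z))]
    # h_late itself is not modified: the edited state is returned as h_late + g * delta.
    return [h + g * d for h, d in zip(h_late, delta)]
```

With `oracle_gate=0` the function returns `h_late` unchanged, which is the "keep" behavior the diagnostic isolates from the residual editor's capacity.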

2605.20258 2026-05-21 cs.LG cs.AI cs.CR (updated version)

It Takes Two: Complementary Self-Distillation for Contextual Integrity in LLMs

Sangwoo Park, Woongyeong Yeo, Seanie Lee, Yumin Choi, Hyomin Lee, Kangsan Kim, Jinheon Baek, Seong Joon Oh, Sung Ju Hwang

Affiliations: KAIST

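AI summary: This paper proposes SELFCI, a framework that decouples information suppression from task resolution to address the privacy-utility trade-off in large language models, improving contextual integrity through complementary self-distillation.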

Comments 28 pages, 16 figures

English abstract

Contextual Integrity (CI) defines privacy not merely as keeping information hidden, but as governing information flows according to the norms of a given context. As large language models are increasingly deployed as personal agents handling sensitive workflows, adhering to CI becomes critical. However, even frontier models remain unreliable in making disclosure decisions, and existing mitigation strategies often degrade underlying task performance. To overcome this privacy-utility trade-off, we propose SELFCI, a complementary self-distillation framework that decouples information suppression from task resolution. SELFCI jointly optimizes two independent reverse KL divergences over distinct teacher distributions derived from feedback: one encourages preserving task-relevant information for utility, while the other enforces minimal and appropriate disclosure. This complementary formulation induces a Product-of-Experts (PoE) target, aligning the policy with the intersection of capability and privacy requirements. Empirical evaluations demonstrate that SELFCI, without relying on costly external supervision, consistently outperforms competitive baselines such as online reinforcement learning algorithms (e.g., GRPO). These trends further extend to out-of-domain settings involving agentic workflows and accumulated private context, suggesting that SELFCI provides a practical path toward CI alignment.
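
The abstract states that two reverse-KL terms induce a Product-of-Experts target. A small discrete check of that idea: for distributions $p_1$ and $p_2$, $\mathrm{KL}(\pi\|p_1)+\mathrm{KL}(\pi\|p_2) = 2\,\mathrm{KL}(\pi\|q) - 2\log Z$ with $q = \sqrt{p_1 p_2}/Z$, so the minimizer is the normalized geometric mean, which places mass only where both teachers do. The equal weighting and the names `p_utility` and `p_privacy` are assumptions for illustration; SELFCI's actual weighting is not given in the abstract.

```python
import math


def reverse_kl(pi, p):
    """KL(pi || p) for discrete distributions (the mode-seeking 'reverse' direction)."""
    return sum(a * math.log(a / b) for a, b in zip(pi, p) if a > 0)


def poe_target(p_utility, p_privacy):
    """Minimizer of KL(pi||p_utility) + KL(pi||p_privacy): the normalized geometric mean."""
    g = [math.sqrt(a * b) for a, b in zip(p_utility, p_privacy)]
    z = sum(g)
    return [x / z for x in g]


def objective(pi, p1, p2):
    """Sum of the two reverse KL terms for a candidate policy distribution pi."""
    return reverse_kl(pi, p1) + reverse_kl(pi, p2)


# Hypothetical teacher distributions over three candidate continuations.
p_util = [0.7, 0.2, 0.1]  # leans toward revealing task-relevant detail
p_priv = [0.1, 0.2, 0.7]  # leans toward minimal disclosure
q = poe_target(p_util, p_priv)
```

At `q` the objective equals $-2\log Z$, and any other distribution (including either teacher alone) scores at least as high.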

2605.20257 2026-05-21 cs.LG cs.AI (updated version)

Instance Discrimination for Link Prediction

Valentin Cuzin-Rambaud, Mathieu Lefort, Rémy Cazabet

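AI summary: This paper proposes L-GRACE and L-BGRL, new models based on link representations, to improve link-prediction performance, particularly on unattributed graphs, and shows that they are competitive in both supervised and self-supervised settings.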

English abstract

Recently, instance discrimination models have emerged as a major solution for self-supervised learning. Having already demonstrated its effectiveness in the image domain, instance discrimination learning is now proving equally convincing in the graph domain, in particular for node classification. However, fewer contributions have tackled the link prediction task. In this work, we propose to adapt existing methods to this context. We first provide a rigorous evaluation of existing self-supervised models for link prediction, showing that performance depends mainly on the augmentation process (as in computer vision). We then propose a new structural augmentation based on community structure that is relevant for link prediction. Our main contribution introduces two new models, L-GRACE and L-BGRL, based on link representations instead of node representations, which improve the performance of existing methods, especially on unattributed graphs, and we show that they perform on par with the state of the art in both supervised and self-supervised settings.
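
A minimal sketch of instance discrimination on links rather than nodes. It assumes (hypothetically) that a link is represented by the Hadamard product of its endpoint embeddings and that a GRACE-style InfoNCE loss with only inter-view negatives is used; the abstract does not specify L-GRACE's exact link encoder or negative set, and the graph augmentations that would produce the two views are omitted.

```python
import math


def hadamard(u, v):
    return [a * b for a, b in zip(u, v)]


def cosine(u, v):
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def link_reps(node_emb, edges):
    """Represent each link (i, j) by the Hadamard product of its endpoint embeddings."""
    return [hadamard(node_emb[i], node_emb[j]) for i, j in edges]


def link_infonce(view1, view2, tau=0.5):
    """Mean InfoNCE: link k in view1 should match link k in view2 (inter-view negatives only)."""
    loss = 0.0
    for k, z1 in enumerate(view1):
        scores = [cosine(z1, z2) / tau for z2 in view2]
        m = max(scores)  # log-sum-exp shift for numerical stability
        log_denom = m + math.log(sum(math.exp(s - m) for s in scores))
        loss += log_denom - scores[k]
    return loss / len(view1)
```

Aligned views give a lower loss than views whose links have been shuffled, which is the signal the contrastive objective trains on.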

2605.20256 2026-05-21 cs.LG cs.AI (updated version)

FBOS-RL: Feedback-Driven Bi-Objective Synergistic Reinforcement Learning

Xikai Zhang, Yongzhi Li, Likang Xiao, Yingze Zhang, Yanhua Cheng, Quan Chen, Peng Jiang, Wenjun Wu, Liu Liu

Affiliations: Hangzhou International Innovation Institute, Beihang University; School of Artificial Intelligence, Beihang University; Kuaishou Technology

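AI summary: This paper proposes the FBOS-RL framework, which uses environment feedback to guide exploration enhancement and designs two mutually reinforcing objectives, Exploitation-oriented Policy Alignment (EPA) and Exploration-oriented Capability Cultivation (ECC), improving both the training efficiency and the final performance of reinforcement learning.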

English abstract

Reinforcement learning has become a cornerstone for aligning and unlocking the reasoning capabilities of large-scale models. At its core, the training loop of GRPO and its variants alternates between rollout sampling and policy update. Unlike supervised learning, where each gradient step is anchored to an explicit ground-truth target, the optimal gradient direction for updating model parameters in this setting is not known a priori; the high-quality rollouts drawn during the sampling stage therefore act as the implicit "teacher" that guides every parameter update. However, GRPO adopts a simple sampling scheme that conditions all rollouts on the same original prompt. When a task lies beyond the policy model's current capability, this sampling scheme rarely yields a high-quality rollout, leaving the policy model without a meaningful gradient direction when updating its parameters and causing training to stall. To address this issue, we propose FBOS-RL, a Feedback-Driven Bi-Objective Synergistic reinforcement learning framework. Specifically, we let the model perform Feedback-Guided Exploration Enhancement based on the feedback provided by the environment, and on top of this we design two mutually reinforcing training objectives: Exploitation-oriented Policy Alignment (EPA) and Exploration-oriented Capability Cultivation (ECC). Extensive experiments demonstrate that EPA and ECC mutually reinforce each other, forming a positive flywheel effect that significantly improves both the training efficiency and the final performance ceiling of reinforcement learning. Specifically, under an identical number of rollouts, FBOS-RL learns substantially faster than GRPO and feedback-based baselines and ultimately attains a higher performance ceiling, while exhibiting higher policy entropy and lower gradient norms throughout training.
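
The stall described above can be seen directly in GRPO-style group-relative advantages, where each rollout's reward is standardized within the group of rollouts for one prompt. The sketch assumes binary rewards and uses the population standard deviation plus a small `eps`; normalization details vary across GRPO implementations.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize rewards within one prompt's rollout group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


print(group_relative_advantages([0.0, 0.0, 0.0, 0.0]))  # → [0.0, 0.0, 0.0, 0.0]
```

If every rollout in a group fails, every advantage is zero and the prompt contributes no policy-gradient signal; a single successful rollout (for example, one reached with the help of environment feedback) makes the advantages non-zero again.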

2605.20254 2026-05-21 cs.IR cs.AI cs.CV cs.LG (updated version)

Efficient Table QA via TableGrid Navigation and Progressive Inference Prompting

Amritansh Maurya, Navjot Singh, Mohammed Javed, Omar Moured

Affiliations: Vision Intelligence Lab, IIIT Allahabad, Prayagraj, India

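AI summary: This paper proposes a training-free table question-answering method built on the TableGrid Navigation and Progressive Inference Prompting frameworks, improving TQA accuracy and efficiency, and validates it on multiple datasets.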

Comments Accepted for Presentation in ICDAR 2026, Vienna, Austria

English abstract

Large Language Models (LLMs) have shown promising results on NLP tasks; however, their performance on tabular data still needs research attention, because Table Question-Answering (TQA) requires precise cell retrieval and multi-step structured reasoning. Existing work improves TQA by fine-tuning or training LLMs on task-specific tabular data, but often lacks verifiable control over how the model navigates tables and derives answers. In this work, we propose a training-free TQA approach with two structured prompting frameworks: TableGrid Navigation (TGN), which iteratively navigates rows and columns via a three-module loop to locate evidence and refine answers, and Progressive Inference Prompting (PIP), which enforces column identification as an explicit constraint for progressive, query-driven row selection. We evaluate 17 LLMs against 6 baselines on the TableBench and FeTaQa datasets. On TableBench, TGN improves over the strongest baseline by 3.8 points, and on FeTaQa, PIP achieves SOTA performance over ReAct and Chain-of-Thought. Beyond inference-time gains, PIP and TGN can also serve as supervision templates to fine-tune small models, narrowing the performance gap to much larger architectures in resource-constrained settings and offering a versatile, cost-efficient solution for TQA.

2605.20250 2026-05-21 cs.LG physics.comp-ph physics.flu-dyn (updated version)

Physics-informed convolutional neural networks for fluid flow through porous media

Rafał Topolnicki, Paweł Dłotko, Maciej Matyka

Affiliations: Dioscuri Center in Topological Data Analysis, Institute of Mathematics, Polish Academy of Sciences; Institute of Experimental Physics, Faculty of Physics and Astronomy, University of Wrocław; Institute of Theoretical Physics, Faculty of Physics and Astronomy, University of Wrocław; Parallel and Distributed Systems Laboratory, Jožef Stefan Institute

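AI summary: This paper proposes a convolutional-neural-network framework that predicts pore-scale velocity fields directly from sample geometry. It improves prediction accuracy by adding physical-consistency constraints (fluid incompressibility, no flow inside solids, periodicity, and agreement with the global tortuosity index) and verifies the model's generalization across different geometries and boundary conditions.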

Comments 14 pages, supplement, dedicated GitHub repo

English abstract

Accurate simulation of fluid flow in porous media is challenging due to complex pore-space geometries and the computational cost of solving the Navier-Stokes equations. This difficulty is particularly important when repeated simulations are required, as standard numerical solvers may converge slowly in intricate porous domains. We present a neural-network-based framework for predicting pore-scale velocity fields directly from sample geometry. The method uses a convolutional encoder-decoder architecture with skip connections to preserve spatial detail while extracting multi-scale features. Physical consistency is encouraged through a custom loss function combining velocity reconstruction with incompressibility, no-flow conditions inside solids, periodicity constraints, and agreement with the global tortuosity index. We analyze the influence of the corresponding loss weights and quantify the contribution of individual loss components to prediction accuracy. Several CNN backbones are evaluated to identify architectures providing accurate and robust predictions. The generalization ability of the trained model is tested on samples outside the training distribution, including changes in obstacle geometry, boundary conditions, porosity, and realistic porous structures. Finally, we demonstrate a practical use of the predicted velocity fields as initial conditions for Lattice-Boltzmann simulations. This warm-start strategy accelerates solver convergence, reducing the number of iterations in over 90% of tested cases.
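
A rough sketch of how the loss terms listed above could be combined on a 2D grid, in plain Python. Assumptions not taken from the paper: fields are stored as `u[y][x]` and `v[y][x]`; divergence uses periodic central differences; the periodicity term compares the first and last columns; and tortuosity is taken as $\langle|\mathbf{u}|\rangle/\langle u_x\rangle$ over pore cells, a common pore-scale definition. The weights `w` play the role of the loss weights whose influence the authors analyze, but the default values here are placeholders.

```python
def divergence(u, v, dx=1.0):
    """Central-difference du/dx + dv/dy on a periodic 2D grid (row index = y)."""
    ny, nx = len(u), len(u[0])
    return [[(u[y][(x + 1) % nx] - u[y][(x - 1) % nx]) / (2 * dx)
             + (v[(y + 1) % ny][x] - v[(y - 1) % ny][x]) / (2 * dx)
             for x in range(nx)] for y in range(ny)]


def tortuosity(u, v, solid):
    """<|u|> / <u_x> over pore (non-solid) cells."""
    speeds, ux = [], []
    for y, row in enumerate(u):
        for x, ux_val in enumerate(row):
            if not solid[y][x]:
                speeds.append((ux_val ** 2 + v[y][x] ** 2) ** 0.5)
                ux.append(ux_val)
    return (sum(speeds) / len(speeds)) / (sum(ux) / len(ux))


def physics_loss(u, v, u_ref, v_ref, solid, tau_ref, w=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """Weighted sum: reconstruction, incompressibility, no-flow, periodicity, tortuosity."""
    w_rec, w_div, w_solid, w_per, w_tau = w
    cells = [(y, x) for y in range(len(u)) for x in range(len(u[0]))]
    rec = sum((u[y][x] - u_ref[y][x]) ** 2 + (v[y][x] - v_ref[y][x]) ** 2
              for y, x in cells) / len(cells)
    div = divergence(u, v)
    inc = sum(div[y][x] ** 2 for y, x in cells) / len(cells)
    solid_cells = [(y, x) for y, x in cells if solid[y][x]]
    no_flow = sum(u[y][x] ** 2 + v[y][x] ** 2 for y, x in solid_cells) / max(1, len(solid_cells))
    per = sum((u[y][0] - u[y][-1]) ** 2 + (v[y][0] - v[y][-1]) ** 2
              for y in range(len(u))) / len(u)
    tau = (tortuosity(u, v, solid) - tau_ref) ** 2
    return w_rec * rec + w_div * inc + w_solid * no_flow + w_per * per + w_tau * tau
```

A uniform, solid-free flow that matches its reference has zero loss, while any velocity predicted inside a solid cell is penalized by the no-flow term.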

2605.20249 2026-05-21 cs.LG cs.AI (updated version)

Automated Kernel Discovery Towards Understanding High-dimensional Bayesian Optimization

Taeyoung Yun, Woocheol Shin, Inhyuck Song, Jaewoo Lee, Jinkyoo Park

Affiliations: Korea Advanced Institute of Science and Technology (KAIST)

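AI summary: This paper proposes an LLM-based evolutionary framework for automated kernel discovery in high-dimensional Bayesian optimization; by expanding the kernel search space and removing the need to condition on observations, it makes kernel design more effective for high-dimensional problems.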

Comments 36 pages, 27 figures, 12 tables

English abstract

Gaussian Process (GP) kernels are central to Bayesian optimization (BO), yet designing effective kernels for high-dimensional problems still relies on extensive manual engineering. Existing automated approaches struggle in high dimensions because of two bottlenecks: their kernel search space is limited to additions and multiplications of base kernels, and LLM-based approaches require conditioning on raw observations, which becomes infeasible due to context-length limits and the difficulty of extracting meaningful patterns. We introduce Kernel Discovery, an LLM-driven evolutionary framework for high-dimensional BO that searches a broader kernel space beyond predefined composition rules and does not require conditioning on observations. Motivated by the observation that directly prompting an LLM to generate kernel code yields syntactically varied but functionally identical kernels, we adopt a two-stage approach: an LLM first proposes novel mathematical forms, then a second LLM call converts each form into validated, executable code. We also propose a leave-one-out continuous ranked probability score (LOO-CRPS) as a selection criterion that penalizes overfitted kernels. On five high-dimensional BO benchmarks, our method achieves an average rank of 1.2 out of 17, outperforming competitive baselines. We further analyze the discovered kernels to identify which kernels lead to improvements in high-dimensional BO.
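
The LOO-CRPS criterion can be computed without refitting the GP $n$ times. The sketch below assumes a zero-mean GP with fixed hyperparameters and Gaussian noise, uses the standard closed-form leave-one-out identities $\mu_{-i} = y_i - [K^{-1}y]_i/[K^{-1}]_{ii}$ and $\sigma_{-i}^2 = 1/[K^{-1}]_{ii}$, and scores each held-out point with the closed-form CRPS of a Gaussian. The RBF kernel and the small Gauss-Jordan inverse are illustrative stand-ins; the abstract does not describe the paper's exact LOO-CRPS implementation.

```python
import math


def rbf(a, b, lengthscale=1.0):
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)) / (2 * lengthscale ** 2))


def invert(M):
    """Gauss-Jordan inverse with partial pivoting (fine for small, well-conditioned matrices)."""
    n = len(M)
    A = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [x / piv for x in A[c]]
        for r in range(n):
            if r != c:
                f = A[r][c]
                A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    return [row[n:] for row in A]


def gaussian_crps(y, mu, sigma):
    """Closed-form CRPS of N(mu, sigma^2) at observation y (lower is better)."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return sigma * (z * (2 * cdf - 1) + 2 * pdf - 1 / math.sqrt(math.pi))


def loo_crps(X, y, kernel, noise=1e-2):
    """Mean leave-one-out CRPS of a zero-mean GP, via closed-form LOO identities."""
    n = len(X)
    K = [[kernel(X[i], X[j]) + (noise if i == j else 0.0) for j in range(n)] for i in range(n)]
    Kinv = invert(K)
    alpha = [sum(Kinv[i][j] * y[j] for j in range(n)) for i in range(n)]
    total = 0.0
    for i in range(n):
        var = 1.0 / Kinv[i][i]        # LOO predictive variance
        mu = y[i] - alpha[i] * var    # LOO predictive mean
        total += gaussian_crps(y[i], mu, math.sqrt(var))
    return total / n
```

In this toy setting, a kernel that effectively memorizes each point (a tiny lengthscale) predicts held-out points poorly and receives a higher LOO-CRPS than a smoother kernel, which is the kind of overfitting penalty the criterion is meant to provide.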

2605.20248 2026-05-21 cs.LG (updated version)

Graph Transductive Sharpening: Leveraging Unlabeled Predictions in Node Classification

Brown Zaz, Mar Gonzàlez I Català, Ferran Hernandez Caralt, Moshe Eliasof, Pietro Liò

Affiliations: University of Cambridge

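AI summary: This paper proposes Transductive Sharpening, which uses the model's predictions on unlabeled nodes to improve node classification and boosts performance on multiple benchmarks without changing the backbone architecture.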

Comments 19 pages, 4 figures, 17 tables

English abstract

In the transductive setting, where the full graph is observed but node labels are only partially available, progress in semi-supervised node classification has largely focused on architectural innovation. In this paper, we revisit an orthogonal axis: the training objective. We start from a simple observation: transductive models produce predictions for every node during training, including nodes without labels. These unlabeled-node predictions may contain useful training signal, but standard supervised objectives discard them because no ground-truth labels are available. Inspired by the decomposition of cross-entropy into a label-dependent alignment term and a label-independent entropy term, we propose prediction confidence as a natural way to extract this signal in the absence of labels. This motivates Transductive Sharpening (TS): a loss-level modification that minimizes prediction entropy on unlabeled nodes while counterbalancing this effect on labeled nodes. We evaluate Transductive Sharpening across a wide range of node-classification benchmarks and observe consistent performance improvements without requiring any changes to the backbone architecture. Code is available at https://github.com/transductive-sharpening/tunedGNN.
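
A minimal sketch of a transductive-sharpening-style objective: cross-entropy on labeled nodes plus entropy minimization on unlabeled nodes. The abstract says the effect is counterbalanced on labeled nodes but does not give the form; here that is modeled, as an assumption, by subtracting a weighted mean entropy on labeled nodes. The weights `lam_u` and `lam_l` are hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)
    e = [math.exp(z - m) for z in logits]
    s = sum(e)
    return [x / s for x in e]


def entropy(p):
    return -sum(q * math.log(q) for q in p if q > 0)


def ts_loss(logits, labels, lam_u=0.1, lam_l=0.1):
    """CE on labeled nodes + lam_u * entropy on unlabeled nodes - lam_l * entropy on labeled nodes.

    logits: dict node -> list of class logits (every node gets a prediction)
    labels: dict node -> class index (only the labeled subset)
    The '- lam_l * H(labeled)' counterbalance is a hypothetical form.
    """
    labeled = [n for n in logits if n in labels]
    unlabeled = [n for n in logits if n not in labels]
    ce = sum(-math.log(softmax(logits[n])[labels[n]]) for n in labeled) / len(labeled)
    h_u = sum(entropy(softmax(logits[n])) for n in unlabeled) / max(1, len(unlabeled))
    h_l = sum(entropy(softmax(logits[n])) for n in labeled) / len(labeled)
    return ce + lam_u * h_u - lam_l * h_l
```

Only the unlabeled-node entropy depends on nodes without labels, so making those predictions more confident lowers the loss without touching any ground-truth term.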

2605.20247 2026-05-21 cs.LG cs.AI cs.CL cs.CV (updated version)

CP-MoE: Consistency-Preserving Mixture-of-Experts for Continual Learning

Yang Liu, Toan Nguyen, Flora D. Salim

Affiliations: School of Computer Science and Engineering, University of New South Wales

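AI summary: This paper proposes CP-MoE, a continual-learning framework built around a transient expert, which uses a consistency-preserving routing bias and a transient-expert-guided regularization mechanism to reduce parameter interference and forgetting while preserving cross-task knowledge transfer.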

English abstract

Catastrophic forgetting remains a major obstacle to continual learning in large language models (LLMs) and vision-language models (VLMs). Although Mixture-of-Experts (MoE) architectures offer an efficient path to scaling, existing LoRA-based MoE continual learning methods still face a fundamental trade-off: they either isolate experts too aggressively, limiting knowledge transfer across tasks, or allow task-specific updates to overwrite important existing parameters, leading to severe forgetting. To address this, we propose CP-MoE, a continual learning framework built around a transient expert that captures early task-specific updates and guides their integration into stable experts. CP-MoE introduces a consistency-preserving routing bias, which uses the transient expert to estimate representation similarity with stable experts and steer routing towards more compatible expert selection, and a transient expert-guided regularisation mechanism, which selectively protects important historical parameters during merging. Together, these components reduce parameter interference and forgetting while preserving cross-task knowledge transfer. We validate CP-MoE on both unimodal and multimodal continual learning benchmarks with LLM-based and VLM-based MoE models. On the SuperNI benchmark, spanning diverse sequential language tasks, CP-MoE achieves state-of-the-art performance and stronger zero-shot transfer to unseen tasks. On the VQA v2 dataset, it scales effectively to multimodal visual reasoning, consistently reduces forgetting, and outperforms strong MoE baselines.
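
One way to picture the consistency-preserving routing bias: add to each stable expert's router logit a bonus proportional to the cosine similarity between the transient expert and that stable expert, then pick the top-k. Here `transient_vec` and `stable_vecs` stand for whatever representations are compared (for example, each expert's output on the current input); that choice, the weight `beta`, and top-k selection are assumptions for illustration, since the abstract does not specify how CP-MoE measures similarity.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def biased_topk_routing(router_logits, transient_vec, stable_vecs, beta=1.0, k=1):
    """Indices of the top-k stable experts after adding a similarity bias to the router logits."""
    scores = [z + beta * cosine(transient_vec, s) for z, s in zip(router_logits, stable_vecs)]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Toy case: the router slightly prefers expert 0, but expert 1 resembles the transient expert.
print(biased_topk_routing([0.2, 0.0], [1.0, 0.0], [[0.0, 1.0], [1.0, 0.0]], beta=1.0))  # → [1]
```

Setting `beta=0` recovers plain top-k routing, which makes the bias easy to ablate.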

2605.20245 2026-05-21 cs.SI cs.LG (updated version)

Prism: Structural Symmetry Scanning via Duality-Constrained Laplacian Projection

Jiatong Xie

Affiliations: Independent researcher

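AI summary: Prism uses a duality-constrained Laplacian projection, built from the graph Laplacian and a duality operator, to compute a structural symmetry defect that measures how far a complex network deviates from structural self-consistency, and validates its effectiveness for community detection and structural-stress detection on different datasets.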

Comments 10 pages, 4 tables, 1 figure. This work presents a first-principles unsupervised network structural diagnosis framework based on a symmetric involution operator and a Laplacian commutator constraint. It achieves noise-robust community detection and early structural risk detection in financial time-series networks without supervised training data.

English abstract

We introduce Prism, a framework for structural symmetry diagnosis in complex networks. Given a graph Laplacian $L$ and a duality operator $P$ (a symmetric involution), Prism computes the duality defect $\delta(L,P) = \|LP - PL\|_F / \|L\|_F$, a scalar measuring how far the network deviates from structural self-consistency. When $P$ encodes the network's true symmetry, $\delta$ starts near zero and rises monotonically as structure degrades; an arbitrary $P$ gives noise. We prove that the optimal $L'$ satisfying $[L', P] = 0$ is given by a closed-form block-diagonal projection, and provide an unsupervised alternating optimization that learns $P$ from the graph's own Fiedler vector. Experiments on synthetic networks show the true-$P$ defect is $3.38\times$ more sensitive to structural degradation than an index-reversal baseline and more sensitive than modularity. On Zachary's Karate Club with edge noise, Prism achieves $94.5\%$ community detection accuracy at $5\%$ noise versus $76.6\%$ for the raw Laplacian baseline. Applied to live S&P 500 data (2026-05-17), Prism detects rising structural stress (defect $0.43 \to 0.73$ over 90 days) while surface correlations remain low, a signal invisible to correlation-based methods. In a historical backtest spanning five major stress events (2011-2020), the duality defect exhibits a consistent pattern: it reaches elevated levels before the correlation spike that accompanies each crisis, and sustains high readings during periods of structural fragility that conventional metrics classify as calm. The duality defect is a first-principles structural admissibility condition, requiring no training data and computable in milliseconds.
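
The duality defect and a commuting projection can be checked on a toy graph. For a symmetric involution $P$, the map $X \mapsto PXP$ is a Frobenius isometry, so the nearest matrix commuting with $P$ is $(L + PLP)/2$, which is block-diagonal in $P$'s $\pm 1$ eigenspaces. The sketch fixes $P$ by hand as the node-reversal permutation rather than learning it from the Fiedler vector, and it is an assumption that this matches the paper's projection exactly (the paper may impose further constraints).

```python
def laplacian(n, edges):
    """Combinatorial graph Laplacian L = D - A for an undirected graph."""
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][j] -= 1.0
        L[j][i] -= 1.0
        L[i][i] += 1.0
        L[j][j] += 1.0
    return L


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def frobenius(A):
    return sum(x * x for row in A for x in row) ** 0.5


def duality_defect(L, P):
    """delta(L, P) = ||LP - PL||_F / ||L||_F."""
    LP, PL = matmul(L, P), matmul(P, L)
    diff = [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(LP, PL)]
    return frobenius(diff) / frobenius(L)


def commuting_projection(L, P):
    """Frobenius-nearest matrix commuting with a symmetric involution P: (L + PLP) / 2."""
    PLP = matmul(matmul(P, L), P)
    return [[(a + b) / 2 for a, b in zip(r1, r2)] for r1, r2 in zip(L, PLP)]


def reversal(n):
    """Node-reversal permutation i -> n-1-i (a symmetric involution)."""
    return [[1.0 if i + j == n - 1 else 0.0 for j in range(n)] for i in range(n)]


P = reversal(4)
path = laplacian(4, [(0, 1), (1, 2), (2, 3)])
print(duality_defect(path, P))  # → 0.0 (the path graph is reversal-symmetric)
```

Adding an edge such as (0, 2) breaks the reversal symmetry and makes the defect positive; projecting the perturbed Laplacian with `commuting_projection` brings the defect back to zero.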

2605.20244 2026-05-21 cs.LO cs.AI cs.CL cs.LG cs.SE (updated version)

Lean Refactor: Multi-Objective Controllable Proof Optimization via Agentic Strategy Search

Jialin Lu, Soonho Kong, Rodrigo Stehling, Kaiyu Yang, Zhangyang Wang, Weiran Sun, Wuyang Chen

Affiliations: Simon Fraser University; Amazon Web Services; MiroMind; University of Texas at Austin

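AI summary: This paper proposes Lean Refactor, a framework that uses retrieval-augmented agentic strategy search for multi-objective, controllable, and version-robust refactoring of Lean proofs; its main contribution is efficient proof optimization via a pre-annotated database of multi-objective refactoring strategies.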

English abstract

We present Lean Refactor, a plug-and-play retrieval-augmented agentic framework for multi-objective, controllable, and version-robust refactoring of Lean proofs. LLM-generated proofs are notoriously correct-but-verbose and brittle across library versions, yet existing refactoring work overlooks three practical challenges: 1) Lean refactoring is natively multi-objective (proof length, compilation cost, and version compatibility are often in tension); 2) Lean repositories have fragile compatibility, whereas LLM releases are unaware of Lean/Mathlib versions; 3) training-based pipelines require repeated fine-tuning with each new LLM release, scaling neither with model churn nor with Lean's release cycle. Lean Refactor steers a frozen agentic LLM with retrievals from a curated database of multi-objective refactoring strategies, each densely annotated with metadata such as supported Lean/Mathlib versions and expected compilation-cost reduction. Experiments show over $70\%$ token-level compression on competition benchmarks, over $20\%$ on research repositories, and up to $60\%$ compilation-time reduction, outperforming prior work and Claude Code. Version-filtered retrieval further improves compression on the target Lean version, and refactored miniF2F proofs exhibit stronger zero-shot version transfer to future Lean releases than their unrefactored counterparts.
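
Version-filtered retrieval, as described above, can be sketched as a metadata filter followed by ranking. The `Strategy` record, its fields, the strategy names, and ranking solely by expected compilation-cost reduction are hypothetical simplifications; the real database annotates more metadata and balances several objectives.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    name: str
    min_version: tuple  # inclusive lower bound, e.g. (4, 3) for a hypothetical v4.3
    max_version: tuple  # inclusive upper bound
    expected_compile_reduction: float  # fraction of compile time saved, e.g. 0.3


def retrieve(strategies, target_version, top_k=2):
    """Keep strategies whose supported range covers target_version, best expected reduction first."""
    compatible = [s for s in strategies if s.min_version <= target_version <= s.max_version]
    compatible.sort(key=lambda s: s.expected_compile_reduction, reverse=True)
    return [s.name for s in compatible[:top_k]]


# Hypothetical database entries; names and numbers are illustrative only.
db = [
    Strategy("collapse_have_chain", (4, 0), (4, 5), 0.4),
    Strategy("pin_simp_lemmas", (4, 3), (4, 12), 0.2),
    Strategy("batch_rewrites", (4, 8), (4, 12), 0.5),
]
print(retrieve(db, (4, 9)))  # → ['batch_rewrites', 'pin_simp_lemmas']
```

Versions are compared as tuples, so `(4, 9)` falls outside a range ending at `(4, 5)`; a strategy that is only valid for older releases is never offered to the agent for a newer target.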

2605.20242 2026-05-21 cs.LG cond-mat.mtrl-sci cs.AI physics.chem-ph (updated version)

LEAP: A closed-loop framework for perovskite precursor additive discovery

Xin-De Wang, Zhi-Rui Chen, Ze-Feng Gao, Peng-Jie Guo, Cheng Mu, Zhong-Yi Lu

Affiliations: School of Physics, Renmin University of China; School of Chemistry and Life Resource, Renmin University of China

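AI summary: This study proposes LEAP, a framework that combines a large language model with active learning, using literature-derived, mechanism-relevant descriptors and Bayesian optimization to discover perovskite solar-cell additives efficiently; experimental validation shows improved device performance, and the domain-specialized model outperforms general-purpose models.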

Comments 30 pages; 11 figures

English abstract

Efficient discovery of precursor additives is essential for improving the performance of perovskite solar cells, yet the large chemical space makes conventional trial-and-error screening inefficient. We develop LEAP (LLM-driven Exploration via Active Learning for Perovskites), an expert-in-the-loop closed-loop framework that couples a domain-specialized large language model (LLM) with active learning for iterative additive prioritization. The LLM is trained to extract mechanism-relevant knowledge from the perovskite additive literature and to represent candidate molecules through interpretable descriptors, which are further integrated into a Bayesian optimization workflow for uncertainty-aware prioritization under low-data conditions. Benchmark results on unseen literature show that the domain-specialized model outperforms general-purpose models in mechanism-consistent reasoning. Experimental validation in an expert-in-the-loop proof-of-concept study suggests improved additive prioritization across three screening rounds, leading to average device PCEs of 20.13% and 20.87% for the later-round 6-CDQ- and 2-CNA-treated devices, respectively, compared with 19.25% for the control, with a champion PCE of 21.32%. These results provide preliminary evidence that literature-grounded mechanistic descriptors, when coupled with Bayesian optimization and expert feasibility review, can support mechanism-aware additive prioritization in perovskite photovoltaics.
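
The abstract does not say which acquisition function LEAP uses. For illustration, here is expected improvement (EI), a common choice for uncertainty-aware prioritization in Bayesian optimization, applied to hypothetical Gaussian posteriors `(mean, std)` over a maximized target; the candidate names and numbers are made up.

```python
import math


def expected_improvement(mu, sigma, best, xi=0.0):
    """EI for maximization under a Gaussian posterior N(mu, sigma^2)."""
    if sigma <= 0.0:
        return max(0.0, mu - best - xi)
    z = (mu - best - xi) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mu - best - xi) * cdf + sigma * pdf


def rank_candidates(posteriors, best):
    """Order candidates by EI; posteriors maps name -> (mean, std) of the predicted target."""
    return sorted(posteriors, key=lambda n: expected_improvement(*posteriors[n], best),
                  reverse=True)


print(rank_candidates({"A": (0.15, 0.05), "B": (0.05, 1.0)}, best=0.0))  # → ['B', 'A']
```

With these toy numbers, the uncertain candidate `B` outranks `A` even though its predicted mean is lower, which is the exploration behavior an uncertainty-aware workflow relies on under low data.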

2605.20241 2026-05-21 cs.LG cs.AI cs.CL (updated version)

Geometry-Lite: Interpretable Safety Probing via Layer-Wise Margin Geometry

Woo Seob Sim, Yu Rang Park

Affiliations: Yonsei University; Yonsei University College of Medicine; Department of Biomedical Systems Informatics

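AI summary: This paper studies prompt-level safety probing of large language models and proposes Geometry-Lite, a compact probe that analyzes layer-wise margin geometry to improve the interpretability and accuracy of safety detection.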

English abstract

Prompt-level safety probes for large language models use hidden-state representations to separate safe from unsafe prompts, but strong average detection performance does not explain the geometry of this separation. In particular, it remains unclear how safety evidence is formed across layers, which aspects of that layer-wise geometry support low-false-positive decisions, and which geometric biases remain stable under benchmark shift. We study this as an empirical decomposition problem and introduce Geometry-Lite, a compact prompt-level probe that maps each layer's final prompt-token representation to signed margins under centroid, local-neighborhood, and supervised linear-boundary readouts, then summarizes the resulting margin profiles by boundary position, layer-to-layer change, and coarse shape. Across nine instruction-tuned backbones ($1.2$B--$70$B) and seven safety benchmarks, Geometry-Lite improves over single-layer probes while remaining close to raw multi-layer score stacking, making it a useful instrument for analyzing the multi-layer safety signal. The decomposition shows that safety evidence is expressed primarily through persistent boundary-position geometry: final or extremal margins and unsafe-side layer occupancy dominate aggregate detection performance. In contrast, finite-difference drift and structural summaries add little to pooled AUROC, although drift can provide small recall-oriented corrections under shifted low-FPR thresholds. Under benchmark shift, optimized linear boundaries are sharp on the training mixture, whereas class-conditional mean geometry retains separation more reliably on a predefined hard held-out subset. Overall, prompt-level safety evidence is not primarily a layer-to-layer motion signal, but a persistent layer-wise margin geometry whose useful components and readout-level biases become visible in decision-critical regimes.
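
To make the "signed margin" and "boundary-position" summaries concrete, here is a minimal sketch of the centroid readout only. It assumes per-layer class centroids estimated from labelled prompts and a sign convention in which a positive margin means "closer to the unsafe centroid". The function names and the summary set (final margin, maximum margin, unsafe-side occupancy, mean absolute drift) are illustrative choices, not Geometry-Lite's exact definitions; the local-neighbourhood and linear-boundary readouts are omitted.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def distance(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid_margins(h_layers: List[Vector], safe_mu: List[Vector],
                     unsafe_mu: List[Vector]) -> List[float]:
    """One signed margin per layer: > 0 means the prompt sits nearer the unsafe centroid."""
    return [distance(h, ms) - distance(h, mu)
            for h, ms, mu in zip(h_layers, safe_mu, unsafe_mu)]


def summarize(margins: List[float]) -> Dict[str, float]:
    """Boundary-position summaries plus a finite-difference drift term."""
    drift = [b - a for a, b in zip(margins, margins[1:])]
    return {
        "final": margins[-1],
        "max": max(margins),
        "unsafe_occupancy": sum(m > 0 for m in margins) / len(margins),
        "mean_abs_drift": sum(abs(d) for d in drift) / len(drift) if drift else 0.0,
    }
```

In this toy form, `h_layers` is the final prompt-token representation at each layer, and the summary dictionary would be the feature vector passed to a downstream classifier.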

2605.20240 2026-05-21 cs.LG Version update

MagBridge-Battery: A Synthetic Bridge Dataset for Li-ion Magnetometry and State-of-Health Diagnostics

Sakthi Prabhu Gunasekar, Prasanna Kumar Rangarajan

Affiliations: Dept. of Computer Science & Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, India. ORCID: 0009-0006-0153-5674, 0000-0001-6103-259X

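AI summary: This paper introduces the MagBridge-Battery dataset, which combines real magnetic morphology data from the Mohammadi-Jerschow Open Science Framework archive with state-of-health labels from the PulseBat dataset to provide a public benchmark for lithium-ion battery magnetometry and state-of-health diagnostics, and validates the dataset on state-of-health regression, second-life classification, and anomaly detection.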

Comments 10 pages, 3 figures, 4 tables. Synthetic dataset and benchmark suite for battery magnetometry and state-of-health diagnostics; dataset released on Zenodo and code available on GitHub

English abstract

Battery health diagnostics today rely overwhelmingly on electrochemical signals measured at the cell terminals. A parallel literature has shown that magnetic sensing can resolve information that terminal-only measurements miss, but method development is limited by the absence, to the best of our knowledge, of public battery magnetic-measurement datasets paired with degradation labels. We release MagBridge-Battery v1.0, a synthetic dataset of 6,760 magnetic-field signatures that bridges real magnetic morphology from the Mohammadi-Jerschow Open Science Framework (OSF) archive with state-of-health (SOH) labels from the PulseBat dataset. The release contains 5,600 PulseBat-conditioned grounded samples, 600 synthetic sensor-anomaly samples derived from clean parents, and 560 low-voltage Regime-B extrapolation samples. A cell-disjoint, parent-child-leakage-free primary benchmark split is verified to contain zero overlapping cells, zero cross-split parent-child pairs, and zero sample-ID overlap. We define three primary benchmark tasks (SOH regression, second-life classification, and anomaly detection), plus an auxiliary anomaly-subtype classification task. A controlled label-shuffle ablation collapses SOH regression from $R^2 \approx 0.77$ to $R^2 \approx 0$, confirming that the bridge encodes input SOH non-trivially rather than producing label-aligned artifacts. The dataset is released on Zenodo under CC-BY-4.0, and the bridge code and benchmark suite are released under Apache-2.0. This work provides a public benchmark for magnetic-sensing battery diagnostics while paired magnetic-electrochemical measurements remain scarce.
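
The three split checks listed above (cell overlap, cross-split parent-child pairs, sample-ID overlap) are straightforward to audit programmatically. A minimal sketch, assuming each sample record carries a sample ID, a cell ID, and an optional parent ID for derived (e.g. synthetic-anomaly) samples; the `Sample` fields are hypothetical and not the dataset's actual schema.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Sample:
    sample_id: str
    cell_id: str
    parent_id: Optional[str] = None  # set only for samples derived from a clean parent


def leakage_report(splits: Dict[str, List[Sample]]) -> Dict[str, int]:
    """Count the three leakage types that a clean split should have at zero."""
    ids = {name: {s.sample_id for s in ss} for name, ss in splits.items()}
    cells = {name: {s.cell_id for s in ss} for name, ss in splits.items()}
    split_of = {s.sample_id: name for name, ss in splits.items() for s in ss}

    # Sample IDs shared by any pair of splits.
    id_overlap = sum(len(ids[a] & ids[b]) for a, b in combinations(ids, 2))
    # Cells that appear in more than one split.
    cell_counts = Counter(c for cs in cells.values() for c in cs)
    overlapping_cells = sum(1 for n in cell_counts.values() if n > 1)
    # Derived samples whose parent lives in a different split.
    cross_parent_child = sum(
        1
        for name, ss in splits.items()
        for s in ss
        if s.parent_id is not None
        and s.parent_id in split_of
        and split_of[s.parent_id] != name
    )
    return {
        "overlapping_cells": overlapping_cells,
        "cross_split_parent_child_pairs": cross_parent_child,
        "sample_id_overlap": id_overlap,
    }
```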

2605.20235 2026-05-21 cs.LG cs.AI Version update

Provably Learning Diffusion Models under the Manifold Hypothesis: Collapse and Refine

Wei Huang, Andi Han, Mingyuan Bai, Huanjian Zhou, Qixin Zhang, Taiji Suzuki, Kenji Fukumizu

Affiliations: RIKEN AIP & The Institute of Statistical Mathematics; University of Sydney; Agency for Science, Technology and Research & The Institute of Statistical Mathematics; The University of Tokyo; Nanyang Technological University; The Institute of Statistical Mathematics

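AI summary: This paper studies how diffusion models are learned under the manifold hypothesis, proposes a collapse-and-refine mechanism driven by the geometry of the score function, and validates its theoretical predictions with the Score-induced Latent Diffusion model, proving that the sample complexity depends on the intrinsic rather than the ambient dimension.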

Comments 3 figures

English abstract

Diffusion models generate high-dimensional data with remarkable quality, yet how their training efficiently learns the score function, bypassing the curse of dimensionality when data is supported on low-dimensional manifolds, remains theoretically unexplained. We identify a collapse-and-refine mechanism driven by the geometry of the score function itself: at small noise scales, the diverging singularity of the score drives a rapid dimensional collapse of the induced denoising map onto the data manifold projection; at moderate noise scales, training refines the intrinsic density on the learned manifold. We instantiate this principle as Score-induced Latent Diffusion (SiLD), a two-stage framework in which both manifold learning and density estimation emerge from a single denoising score matching objective, replacing the heuristic KL regularization of VAE-based latent diffusion models. We prove that the resulting sample complexity depends on the intrinsic dimension rather than the ambient dimension. Experiments on Stacked MNIST, CelebA variants, and molecular generation benchmarks show that SiLD matches or outperforms VAE-based LDMs in generation quality and consistently improves reconstruction, validating our theoretical predictions.
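
The "diverging singularity" of the score at small noise can be seen from the standard Tweedie identity for Gaussian perturbations. The block below is a heuristic sketch in our own notation (data $X_0$ supported on a manifold $\mathcal{M}$, $X_\sigma = X_0 + \sigma Z$ with standard Gaussian $Z$, nearest-point projection $\pi$), not the paper's theorem; the limit assumes $x \notin \mathcal{M}$ lies in a neighbourhood of $\mathcal{M}$ where $\pi(x)$ is unique, under standard regularity assumptions.

$$
\nabla_x \log p_\sigma(x) = \frac{\mathbb{E}[X_0 \mid X_\sigma = x] - x}{\sigma^2},
\qquad
D_\sigma(x) := x + \sigma^2 \nabla_x \log p_\sigma(x) = \mathbb{E}[X_0 \mid X_\sigma = x],
$$

$$
\mathbb{E}[X_0 \mid X_\sigma = x] \xrightarrow[\sigma \to 0]{} \pi(x)
\;\Longrightarrow\;
\|\nabla_x \log p_\sigma(x)\| \approx \frac{\operatorname{dist}(x, \mathcal{M})}{\sigma^2} \to \infty,
\qquad
D_\sigma(x) \to \pi(x).
$$

Read this way, at small $\sigma$ the induced denoiser $D_\sigma$ maps a full-dimensional neighbourhood onto the lower-dimensional manifold, which matches the "dimensional collapse" described above.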

2605.20234 2026-05-21 cs.LG cs.AI Version update

TabPFN-MT: A Natively Multitask In-Context Learner for Tabular Data

Cormac Cureton, Narges Armanfard

Affiliations: McGill University; Mila - Quebec AI Institute

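AI summary: This paper proposes TabPFN-MT, a natively multitask in-context learner for tabular data that is trained on an expanded multi-target synthetic prior to capture inter-task dependencies in context, enabling multitask in-context learning and simultaneous inference. It performs well on small-to-medium datasets and improves computational efficiency for multi-target tabular applications.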

Comments 24 pages, 7 figures

English abstract

Prior-Data Fitted networks (PFNs) have been very successful in tabular contexts, handling prediction tasks in context. However, they are designed for single-task inference, meaning that predicting several target values within a context requires repeated forward calls and precludes inter-task information sharing. We propose TabPFN-MT, which is trained on an expanded multi-target synthetic prior to capture inter-task dependencies in context. This model uses an expanded $y$-encoder and a shared decoder head to enable multitask in-context learning and simultaneous inference. The model is uniquely specialized for small-to-medium datasets by relying on in-context learning rather than traditional gradient-based training. Within this regime (averaging fewer than 1,000 samples), extensive evaluations across 344 datasets demonstrate that TabPFN-MT establishes a new state-of-the-art for deep tabular multitask learning. Furthermore, despite the inherent compute asymmetry of joint optimization, our model remains highly competitive with the latest state-of-the-art single-task ensembles. Notably, on multitask datasets it achieves an overall Accuracy rank of 4.89, the highest average rank among all models tested. Crucially, TabPFN-MT delivers this highly competitive performance while reducing the inference cost for $T$ tasks from $O(T)$ to $O(1)$ forward passes, offering a massive computational efficiency improvement for multi-target tabular applications.

2605.20222 2026-05-21 quant-ph cs.LG Version update

Quantum End-to-End Learning for Contextual Combinatorial Optimization

Jaehwan Lee, Changhyun Kwon

Affiliations: KAIST; Omelet

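AI summary: This paper proposes QEL, a quantum end-to-end learning framework for contextual combinatorial optimization that enables end-to-end training via the Quantum Approximate Optimization Algorithm. It captures the complex relations among contexts, uncertain coefficients, and optimal solutions, avoids calls to NP-hard optimization solvers, and shows potential for applications in the quantum era.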

Comments 23 pages, 2 figures, preprint

English abstract

Contextual combinatorial optimization (CCO) plays a critical role in decision-making under uncertainty, yet remains a significant challenge. We present Quantum End-to-End Learning (QEL), the first quantum computing-based end-to-end learning framework for CCO that leverages Quantum Approximate Optimization Algorithms. Inspired by the integration of state preparation and evolution in data re-uploading, we propose a context re-uploading phase-separator that jointly captures the complex relations among contexts, uncertain coefficients, and optimal solutions. This allows a contextual encoder to be seamlessly integrated within a quantum surrogate policy, enabling joint end-to-end training with a stationarity guarantee. Exploiting an optimization-aware structure grounded in physical principles that classical methods cannot readily leverage, our approach demonstrates practicality by directly training on task loss despite the discreteness and nonconvexity, while avoiding calls to NP-hard optimization solvers. QEL empirically achieves competitive performance while requiring substantially fewer parameters than classical benchmarks, highlighting its industrial-level potential for the future quantum era.
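
For readers unfamiliar with QAOA, the phase-separator/mixer structure that QEL builds on can be simulated classically for a few qubits. This is a minimal depth-1 (p = 1) statevector sketch, assuming a diagonal cost supplied as a list of per-bitstring costs. In QEL the cost coefficients would depend on the context through the proposed context re-uploading phase separator, which is not modelled here.

```python
import cmath
import math
from typing import Sequence


def qaoa_expectation(costs: Sequence[float], gamma: float, beta: float) -> float:
    """Expected cost after one QAOA layer; costs[z] is the cost of basis state z."""
    dim = len(costs)
    n = dim.bit_length() - 1
    assert 2 ** n == dim, "len(costs) must be a power of two"
    # Start in |+>^n and apply the phase separator exp(-i * gamma * C).
    state = [cmath.exp(-1j * gamma * costs[z]) / math.sqrt(dim) for z in range(dim)]
    # Mixer exp(-i * beta * X) on every qubit: [[cos b, -i sin b], [-i sin b, cos b]].
    c, s = math.cos(beta), -1j * math.sin(beta)
    for q in range(n):
        bit = 1 << q
        new = state[:]
        for z in range(dim):
            if not z & bit:
                a0, a1 = state[z], state[z | bit]
                new[z] = c * a0 + s * a1
                new[z | bit] = s * a0 + c * a1
        state = new
    # Measurement statistics: probability of z times its cost.
    return sum(abs(a) ** 2 * costs[z] for z, a in enumerate(state))
```

For a single two-qubit edge with cost -1 when the bits differ, the layer reaches the optimum -1 at gamma = pi/2, beta = -pi/8; a QEL-style model would instead learn how the context shapes the cost and angles.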

2605.20220 2026-05-21 cs.SD cs.IR cs.LG Version update

Advanced Scientific Methodology Plays Rossini

Silvia Licciardi, Daniela Macchione, Emmanuel Caronna, Elisa Francomano

Affiliations: University of Palermo, Department of Engineering; Conservatory Alfredo Casella

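AI summary: This paper applies computational analysis to the structure of Rossini's setting of Metastasio's "Mi lagnerò tacendo", revealing its melodic, harmonic, and textual compositional choices and providing a new basis for systematic research in music philology.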

English abstract

A musical score provides the essential instructions for its performance while containing indications, at times implicit, regarding the composer's intentions. The presence of authorial variants, and even more so of complex series of revisions associated with a single text, presents a challenging path for analytical study. This research, situated within the application of Scientific Methodologies to Music Philology, proposes a methodological approach oriented toward the structural analysis of one of the many settings composed by Gioachino Rossini on the same Metastasio arietta "Mi lagnerò tacendo". Through Computational Analysis, incorporating parsing, data mining, and graph theory, the melodic, harmonic, and textual compositional choices have been rigorously explored. The results constitute a significant unicum in the field, laying the foundation for a systematic study that supports philological research and paves the way for the use of generative models to investigate the creative process.
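
As an illustration of the graph-theoretic side of such an analysis, one common encoding treats consecutive notes of a melodic line as edges of a weighted directed graph, so that different settings of the same text can be compared through their transition structure. This is a minimal sketch under that assumption; the note sequences are toy inputs, not taken from Rossini's scores, and the paper's actual parsing and graph construction may differ.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def transition_graph(notes: List[str]) -> Dict[str, Counter]:
    """Weighted directed graph: graph[u][v] counts how often note v follows note u."""
    graph: Dict[str, Counter] = defaultdict(Counter)
    for u, v in zip(notes, notes[1:]):
        graph[u][v] += 1
    return dict(graph)


def heaviest_edges(graph: Dict[str, Counter], k: int = 3) -> List[Tuple[str, str, int]]:
    """The k most frequent transitions, ties broken alphabetically."""
    edges = [(u, v, w) for u, nbrs in graph.items() for v, w in nbrs.items()]
    return sorted(edges, key=lambda e: (-e[2], e[0], e[1]))[:k]


def edge_jaccard(g1: Dict[str, Counter], g2: Dict[str, Counter]) -> float:
    """Overlap of the transition sets of two settings (1.0 = same transitions)."""
    e1 = {(u, v) for u, nbrs in g1.items() for v in nbrs}
    e2 = {(u, v) for u, nbrs in g2.items() for v in nbrs}
    union = e1 | e2
    return len(e1 & e2) / len(union) if union else 1.0
```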

2605.20209 2026-05-21 cs.GR cs.LG cs.RO Version update

NaP-Control: Navigating Diffusion Prior for Versatile and Fast Character Control

Chia-Wen Chen, Yan Wu, Korrawe Karunratanakul, Siyu Tang

Affiliations: ETH Zurich

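AI summary: This paper proposes NaP-Control, which uses reinforcement learning to manipulate the latent noise of a task-agnostic diffusion policy prior to achieve fast, robust, high-fidelity character control, while optimizing task rewards through environment interaction to improve success rates and adapt to challenging scenarios.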

English abstract

Achieving precise, versatile whole-body character control in physics-based animation remains challenging. Recent diffusion-based policies generate rich and expressive motions but typically rely on gradient-based test-time guidance to satisfy task objectives, which is slow and can reduce robustness. We introduce NaP-Control (Navigating Diffusion Prior for Versatile and Fast Character Control), abbreviated as NaP. Our method uses reinforcement learning to manipulate the latent noise of a task-agnostic diffusion policy prior, steering it toward task-specific behaviors for fast, robust control with high motion fidelity. In contrast to methods that rely solely on offline training, NaP interacts with the environment during training to correct motions and optimize task rewards, improving success rates and enabling adaptation to challenging scenarios. By directly predicting task-optimized diffusion noise, NaP eliminates iterative guidance during denoising and enables efficient inference. Experiments show that NaP attains higher success rates and faster inference while preserving natural motion across diverse tasks.

2605.20198 2026-05-21 cs.HC cs.CY cs.LG Version update

Augmented Analytics and Decision Quality: The Role of Trust among Non-Technical BI Users

Thuy Pham Thi Phuong, Ha Nguyen Manh, Ngan Nguyen Thi Thuy, Lan Hoang Thi

Affiliations: British International School Ho Chi Minh City, Vietnam; Truong Thanh Viet Nam Group, Vietnam; Hanoi College of Industry and Trade, Vietnam; University of Languages and International Studies, Ha Noi, Vietnam

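AI summary: This study examines how augmented analytics improves decision quality by increasing non-technical BI users' trust in the system, drawing on the theory of cognitive delegation to analyze the role of trust in decision quality.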

Comments 13 pages, 1 figure, 4 tables

English abstract

Augmented analytics has transformed how business intelligence (BI) systems support managerial decision-making. This is especially true for users without technical backgrounds, who increasingly rely on automated insights rather than manual analysis. BI research has previously concentrated on system adoption and user intention, with very little research examining the impact of AI-enabled analytics on decision quality or the cognitive mechanisms linking the two. Using the theory of cognitive delegation, this paper investigates the role of trust in augmented analytics and decision-making quality among non-technical BI users. A total of 250 business professionals completed the survey, and the data were analyzed using partial least squares structural equation modeling (PLS-SEM). The results show that augmented analytics capabilities lead to a significant increase in perceived ease of use, perceived usefulness, and trust in BI systems. In addition, trust and usefulness influence BI adoption and improve decision quality. Furthermore, trust has a direct and positive impact on decision quality, highlighting its importance as an enabler of reliance on AI-generated insights. This study considers augmented analytics as a form of cognitive delegation and expands the scope of BI adoption research to include decision-making outcomes.

2605.20196 2026-05-21 cs.CL cs.AI cs.LG Version update

Data Scaling as Progressive Coverage of a Predictive Contribution Spectrum

Zihui Song, Shihao Ji, Hongxi Li, Shuaizhi Cheng, Chunlin Huang

Affiliations: sysu.edu.cn (Sun Yat-sen University); stu.hit.edu.cn (Harbin Institute of Technology)

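AI summary: This paper tests the hypothesis that real-data scaling laws are governed by progressive coverage of a latent predictive contribution spectrum rather than by token-frequency tails alone. Using a suffix-automaton representation of text corpora, it defines a data-intrinsic global-KL predictive contribution spectrum in which each state contributes its empirical mass times its KL deviation from a global next-token baseline. Across 12 real corpora, the tail slope of this spectrum correlates strongly with the empirical data-scaling exponent of a fixed small GPT learner. It then defines, for each training size N, an effective truncation rank K(N) by matching the observed excess loss to the residual tail mass of the prepared 1000k global-KL spectrum. Empirically, log K is close to linear in log N, with R² about 0.96 for the raw spectrum and about 0.90 for the smoothed spectrum. These findings give strong empirical support to a simple mechanism: training scale advances an effective frontier through the predictive state spectrum, and the residual tail mass of that spectrum tracks the remaining excess loss.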

Comments 8 pages, 6 figures

English abstract

We investigate the hypothesis that real-data scaling laws are governed by progressive coverage of a latent predictive contribution spectrum rather than by token-frequency tails alone. We work with a suffix-automaton representation of text corpora and define a data-intrinsic global-KL predictive contribution spectrum, in which each state contributes according to its empirical mass times its KL deviation from a global next-token baseline. Across 12 real corpora, the tail slope of this spectrum is already strongly correlated with the empirical data-scaling exponent of a fixed small GPT learner. We then go beyond slope correlation and define, for each training size N, an effective truncation rank K(N) by matching the observed excess loss to the residual tail mass of the prepared 1000k global-KL spectrum. Empirically, log K is close to linear in log N, with pooled R^2 about 0.96 for the raw spectrum and R^2 about 0.90 for the smoothed spectrum. These findings provide strong empirical support for a simple mechanism picture: training scale advances an effective frontier through a predictive state spectrum, and the residual tail mass of that spectrum tracks the remaining excess loss.
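
The effective truncation rank can be read as an inverse lookup on the tail-mass curve. A minimal sketch, assuming the spectrum is already sorted in descending order and that K(N) is the smallest rank whose residual tail mass does not exceed the measured excess loss (the abstract says only that the two are "matched"). The log-log fit is ordinary least squares with an unweighted R^2; the function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def tail_masses(spectrum_desc: Sequence[float]) -> List[float]:
    """tail[k] = total contribution of entries ranked below the top k (k = 0..len)."""
    tail = [0.0] * (len(spectrum_desc) + 1)
    for k in range(len(spectrum_desc) - 1, -1, -1):
        tail[k] = tail[k + 1] + spectrum_desc[k]
    return tail


def effective_rank(tail: Sequence[float], excess_loss: float) -> int:
    """Smallest K whose residual tail mass is at most the observed excess loss."""
    for k, mass in enumerate(tail):
        if mass <= excess_loss:
            return k
    return len(tail) - 1  # only reached if excess_loss < 0


def loglog_fit(ns: Sequence[float], ks: Sequence[float]) -> Tuple[float, float, float]:
    """OLS fit of log K = a + b log N; returns (b, a, R^2). Requires N, K > 0."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(k) for k in ks]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return slope, intercept, r2
```

In use, one would compute `effective_rank` for each training size from that run's excess loss, then pass the (N, K) pairs to `loglog_fit`; a near-1 R^2 corresponds to the near-linear log K versus log N relation reported above.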

2605.20195 2026-05-21 cs.CL cs.AI cs.LG Version update

Pseudo-Siamese Network for Planning in Target-Oriented Proactive Dialogues

Xinyue Kang, Maodong Li, Yibin Zheng, Fang Kong

Affiliations: School of Computer Science and Technology

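AI summary: This paper proposes a planning method for target-oriented proactive dialogues that uses the FF-BPSN network for dialogue path planning, improving the effectiveness of target-oriented proactive dialogue systems.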

Comments ICASSP 2026

English abstract

A target-oriented proactive dialogue system is designed to steer conversations toward predefined targets while actively providing suggestions. The core paradigm of such a system is to plan a reasonable dialogue path and subsequently guide language models (e.g., pre-trained or large language models) to generate responses, where dialogue path planning serves as the central component, a novel yet under-explored problem. In this work, we propose a Forward-Focused Bidirectional Pseudo-Siamese Network (FF-BPSN) for dialogue path planning toward predefined dialogue targets. FF-BPSN employs two identical transformer-based decoders for forward and backward planning, together with a forward-focused module that integrates bidirectional information to construct the final forward path. This path benefits from bidirectional planning while prioritizing forward information. We then employ the planned path to guide language models in response generation. Extensive experiments on DuRecDial and DuRecDial 2.0 demonstrate that FF-BPSN achieves state-of-the-art performance in dialogue path planning and significantly enhances the effectiveness of target-oriented proactive dialogue systems.

2605.20194 2026-05-21 cs.CL cs.AI cs.LG Version update

Parallel LLM Reasoning for Bias-Resilient, Robust Conceptual Abstraction

Aisvarya Adeseye, Jouni Isoaho, Adeyemi Adeseye

Affiliations: University of Turku, Turku, Finland; Brilloconnetz Partners avoin yhtiö, Turku, Finland

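AI summary: This paper proposes a structured framework combining parallel chunk-level processing with evidence-anchored consolidation, aimed at reducing bias, omission error, and over-generalization in long-document analysis and at improving the reliability and scalability of text analysis.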

Comments Accepted for publication at the 12th Intelligent Systems Conference 2026, 3-4 September 2026, Amsterdam, The Netherlands

English abstract

Large language models (LLMs) have been increasingly used to analyze text. However, they are often plagued by contextual reasoning limitations when analyzing long documents. When long documents are processed sequentially, early or dominant concepts can overshadow less visible but meaningful interpretations, leading to cumulative analytical bias, omission error, and over-generalization. Additionally, independently generated outputs are often merged without systematic grounding, introducing redundancy, conceptual drift, and unsupported claims. This study proposes a structured framework combining parallel chunk-level processing with evidence-anchored consolidation. Texts are first divided into semantically coherent chunks and processed independently in parallel to remove influence from earlier processing. The independently generated interpretations are then consolidated using explicit evidence anchoring and prioritization that reduces dominance and over-generalization while improving traceability. Experiments with multiple model types and sizes indicate that parallel processing reduces omission error by approximately 84%, increases evidence traceability by up to 130%, and reduces unsupported claims by up to 91%. Smaller models benefited most, suggesting that efficient parallel chunking and consolidation play a critical role in achieving reliable and scalable textual analysis.
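
The two stages described above (independent per-chunk analysis, then evidence-anchored consolidation) can be sketched as follows. Assumptions: paragraph packing stands in for semantic chunking, `analyze` is a hypothetical placeholder for an LLM call returning theme/evidence pairs, and "evidence anchoring" is approximated by requiring each quote to appear verbatim in the chunk it came from.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Claim:
    theme: str
    evidence: str  # verbatim quote that should appear in the source chunk


def chunk_paragraphs(text: str, max_chars: int = 1000) -> List[str]:
    """Greedy paragraph packing; a simple stand-in for semantic chunking."""
    chunks: List[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        if current and len(current) + 2 + len(para) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def analyze_in_parallel(chunks: List[str], analyze: Callable[[str], List[Claim]],
                        workers: int = 4) -> List[List[Claim]]:
    """Each chunk is analysed independently; results[i] belongs to chunks[i]."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(analyze, chunks))


def consolidate(chunks: List[str], results: List[List[Claim]]) -> Dict[str, List[str]]:
    """Keep a claim only if its evidence occurs in its own chunk; group by theme."""
    merged: Dict[str, List[str]] = {}
    for chunk, claims in zip(chunks, results):
        for claim in claims:
            if claim.evidence and claim.evidence in chunk:
                merged.setdefault(claim.theme, []).append(claim.evidence)
    return merged
```

Because no chunk sees another chunk's output before consolidation, an early dominant theme cannot suppress later ones, and unsupported claims are filtered at the merge step rather than carried forward.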

2605.20193 2026-05-21 cs.CL cs.AI cs.LG Version update

Improving Quantized Model Performance in Qualitative Analysis with Multi-Pass Prompt Verification

Aisvarya Adeseye, Jouni Isoaho, Adeyemi Adeseye

Affiliations: University of Turku, Turku, Finland; Brilloconnetz Partners avoin yhtiö, Turku, Finland

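AI summary: This paper studies how different bit-width quantization levels and quantization types affect the performance of LLaMA-3.1 (8B) in qualitative analysis and proposes a quantization-aware multi-pass prompt verification method to improve the model's stability and accuracy. The 8-bit models come closest to the gold standard, the 4-bit models become stable once the method is applied, and the 3-bit and 2-bit models improve with the prompt design and verification.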

Comments Accepted for publication at the 12th Intelligent Systems Conference 2026, 3-4 September 2026, Amsterdam, The Netherlands

English abstract

Quantized Large Language Models (LLMs) are used more often in qualitative analysis because they run fast and need fewer computing resources. This study examines how different low-bit quantization levels (8-bit, 4-bit, 3-bit, and 2-bit) and quantization types affect the performance of LLaMA-3.1 (8B) on qualitative analysis. The study uses expert and non-expert responses from 82 interview transcripts. Low-bit models often produce more hallucinations and unstable results, especially when reading non-expert language with unclear terms. To improve performance, we propose a quantization-aware multi-pass prompt verification method. This method guides the model through controlled steps that reduce hallucinations: it removes unreliable content and, after verification, passes the results on to the next transcript, improving accuracy. To validate performance, human coders analyzed the transcripts using NVivo, and the transcripts were also analyzed with BF16 LLaMA. BF16 LLaMA-3.1 produced high-precision output but exhibited semantic drift and hallucination. These errors were corrected manually. The corrected BF16 output and NVivo human coding were combined to create a gold-standard ground truth (GSGT) for thematic extraction and frequency analysis. The results show that 8-bit models stay closest to the GSGT. The 4-bit models lose accuracy but become stable when the proposed method is applied. The 3-bit and 2-bit models drop in performance because of heavy compression, but they improve with the proposed prompt design and verification. The study also finds that models at the same bit level behave differently depending on quantization type. Overall, the method helps low-resource LLMs become more stable, accurate, and suitable for qualitative research at lower cost.
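
The carry-forward loop described above might look like the following minimal sketch. Assumptions: `generate` is a hypothetical stand-in for the quantized model's coding prompt and receives the themes verified so far as context, and verification is approximated by checking that each code's supporting quote occurs in the transcript. The abstract does not specify the actual passes or prompts, so this shows only a generate-then-verify pattern.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple

Code = Tuple[str, str]  # (theme label, supporting quote from the transcript)
Generator = Callable[[str, List[str]], List[Code]]  # hypothetical model wrapper


def verify(transcript: str, candidates: List[Code]) -> List[Code]:
    """Verification pass: drop codes whose quote is not found in the transcript."""
    text = transcript.lower()
    return [(theme, quote) for theme, quote in candidates
            if quote and quote.lower() in text]


def code_transcripts(transcripts: Iterable[str], generate: Generator) -> Counter:
    """Generate, verify, then carry the verified codebook on to the next transcript."""
    frequencies: Counter = Counter()
    for transcript in transcripts:
        codebook = sorted(frequencies)  # only verified themes are passed on
        candidates = generate(transcript, codebook)
        verified = verify(transcript, candidates)
        frequencies.update({theme for theme, _ in verified})  # once per transcript
    return frequencies
```

Only verified themes enter the codebook, so a hallucinated code in one transcript is not propagated as context into later ones; the resulting counts correspond to the theme-frequency analysis mentioned above.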

2605.20189 2026-05-21 cs.AI cs.LG Version update

SOLAR: A Self-Optimizing Open-Ended Autonomous Agent for Lifelong Learning and Continual Adaptation

Nitin Vetcha, Dianbo Liu

Affiliations: Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Department of Computational and Data Sciences, Indian Institute of Science, Bangalore, Karnataka, India

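AI summary: This paper proposes SOLAR, a self-optimizing, open-ended autonomous agent that self-improves through parameter-level meta-learning, addressing concept drift and the high cost of gradient-based adaptation in dynamic real-world settings, and reports superior performance on common-sense, mathematical, medical, coding, social, and logical reasoning tasks.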

Comments Accepted at the Streaming Continual Learning Bridge of the "Association for the Advancement of Artificial Intelligence 2026 Conference". Published in CEUR Workshop Proceedings (original version at https://ceur-ws.org/Vol-4183/paper2.pdf)

Journal ref
CEUR Workshop Proceedings, Vol. 4183, 2026

English abstract

Despite the remarkable success of large language models (LLMs), they still face bottlenecks when deployed in dynamic, real-world settings, the primary challenges being concept drift and the high cost of gradient-based adaptation. Traditional fine-tuning (FT) struggles to adapt to non-stationary data streams without causing catastrophic forgetting or requiring extensive manual data curation. To address these limitations within the streaming and continual learning paradigm, we propose the Self-Optimizing Lifelong Autonomous Reasoner (SOLAR), an open-ended autonomous agent that leverages parameter-level meta-learning to self-improve, treating model weights as an environment for exploration. SOLAR begins by consolidating a strong prior over common-sense knowledge, making it effective for transfer learning. By utilizing a multi-level reinforcement learning approach, SOLAR autonomously discovers adaptation strategies, enabling efficient test-time adaptation to unseen domains. Crucially, SOLAR maintains an evolving knowledge base of valid modification strategies, implicitly acting as an episodic memory buffer to balance plasticity (adaptation to new tasks) and stability (retention of meta-knowledge). Experiments demonstrate that SOLAR outperforms strong baselines on common-sense, mathematical, medical, coding, social, and logical reasoning tasks, marking a significant step toward autonomous agents capable of lifelong adaptation in evolving environments.

2605.20188 2026-05-21 cs.LG cs.AI Version update

GraphDiffMed: Knowledge-Constrained Differential Attention with Pharmacological Graph Priors for Medication Recommendation

Krati Saxena, Tomohiro Shibata

Affiliations: Kyushu Institute of Technology

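AI summary: This paper proposes GraphDiffMed, a medication recommendation framework that combines noise-aware attention with pharmacological constraints; dual-scale differential attention filters spurious signals at the intra-visit and inter-visit levels, improving recommendation quality and safety.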

English abstract

Recommending safe and effective medication combinations from electronic health records (EHRs) is a core clinical AI problem, yet it remains difficult because patient trajectories are long, noisy, and clinically heterogeneous. Existing methods typically excel at either temporal modeling across visits or pharmacological knowledge integration (e.g., drug-drug interactions, DDIs), but rarely achieve both while robustly suppressing noise. We present GraphDiffMed, a knowledge-constrained medication recommendation framework built on dual-scale Differential Attention v2. Differential attention is applied at both intra-visit and inter-visit levels to filter spurious signals within encounters and across longitudinal history, while pharmacological constraints are incorporated during learning. Experiments on MIMIC-III and ablation studies show that this design consistently improves recommendation quality and ranking over strong baselines while achieving a more favorable safety-performance balance. We further find that the strongest-performing configuration uses only demographic auxiliary features under our experimental setting. Overall, GraphDiffMed demonstrates that combining noise-aware attention with pharmacological constraints yields more reliable and clinically meaningful medication recommendation. We open-source our code at https://github.com/saxenakrati09/GraphDiffMed.
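
For reference, the core of differential attention as introduced in the Differential Transformer subtracts two softmax attention maps: $\mathrm{DiffAttn}(X) = (\mathrm{softmax}(Q_1K_1^\top/\sqrt{d}) - \lambda\,\mathrm{softmax}(Q_2K_2^\top/\sqrt{d}))V$, so that attention mass both maps assign to irrelevant positions cancels out. The sketch below implements only that single-head formula with a fixed λ. It does not reproduce the changes introduced in "Differential Attention v2", the per-head normalization, the reparameterization of λ, or GraphDiffMed's pharmacological constraints.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    out = []
    for row in a:
        m = max(row)  # subtract the max for numerical stability
        e = [math.exp(x - m) for x in row]
        s = sum(e)
        out.append([x / s for x in e])
    return out


def attention_map(q: Matrix, k: Matrix) -> Matrix:
    """Row-stochastic map softmax(Q K^T / sqrt(d))."""
    scale = 1.0 / math.sqrt(len(q[0]))
    return softmax_rows([[x * scale for x in row] for row in matmul(q, transpose(k))])


def differential_attention(q1: Matrix, k1: Matrix, q2: Matrix, k2: Matrix,
                           v: Matrix, lam: float) -> Matrix:
    """(softmax(Q1 K1^T / sqrt(d)) - lam * softmax(Q2 K2^T / sqrt(d))) V, single head."""
    a1, a2 = attention_map(q1, k1), attention_map(q2, k2)
    diff = [[x - lam * y for x, y in zip(r1, r2)] for r1, r2 in zip(a1, a2)]
    return matmul(diff, v)
```

Our reading of "dual-scale", which is an assumption, is that such a layer would be applied once over the medical codes within a visit and once over the sequence of visit embeddings.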

2605.20187 2026-05-21 cs.LG cs.AI cs.IT math.IT 版本更新

Neural Estimation of Pairwise Mutual Information in Masked Discrete Sequence Models

在遮蔽离散序列模型中神经估计成对互信息

Jai Sharma, Yifan Wang, Bryan Li

发表机构 * University of California, Berkeley, CA, USA(加州大学伯克利分校)

AI总结 本文提出了一种神经框架,直接从预训练的遮蔽扩散模型(MDMs)的隐藏状态中估计成对条件互信息(MI),利用由模型自身条件分布计算的真值MI进行监督,从而捕捉模型内部对依赖结构的信念,并在单次前向传递中预测完整的MI矩阵,实现MI引导的并行解码。

Comments 6 pages, 3 figures; submitting to ICML 2026

详情
AI中文摘要

理解变量之间的依赖关系对于遮蔽扩散模型(MDMs)的可解释性和高效生成至关重要,但这些模型主要暴露边际条件分布,而不显式表示变量间依赖。我们提出了一种神经框架,直接从预训练MDM的隐藏状态中估计成对条件互信息(MI),使用由模型自身条件分布计算的真值MI进行监督。所得到的估计器捕捉了模型内部对依赖结构的信念,并在单次前向传递中预测完整的MI矩阵,从而通过识别条件独立的变量子集实现MI引导的并行解码。我们在Sudoku和蛋白质序列生成(使用ESM-C)上评估了我们的方法,其中MI图恢复了已知的结构约束,并在保持生成质量的同时,相比顺序解码将推理时前向传递次数减少了3-5倍,同时优于基于熵的并行化方法。

英文摘要

Understanding dependencies between variables is critical for interpretability and efficient generation in masked diffusion models (MDMs), yet these models primarily expose marginal conditional distributions and do not explicitly represent inter-variable dependence. We propose a neural framework for estimating pairwise conditional mutual information (MI) directly from the hidden states of a pretrained MDM, using ground-truth MI computed from the model's own conditional distributions for supervision. The resulting estimator captures the model's internal belief about dependency structure and predicts the full MI matrix in a single forward pass, enabling MI-guided parallel decoding by identifying conditionally independent subsets of variables. We evaluate our approach on Sudoku and protein sequence generation with ESM-C, where the MI maps recover known structural constraints and enable a 3-5x reduction in inference-time forward passes compared to sequential decoding, while preserving generative quality and outperforming entropy-based parallelization methods.
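As a rough illustration of the supervision target, the sketch below computes the mutual information between two masked positions from a model's conditionals, assuming we can query p(x_j) and p(x_i | x_j = v) for every value v; the marginal p(x_i) is taken as the mixture sum_v p(v) p(x_i | v), so the result equals sum_v p(v) KL(p(x_i | v) || p(x_i)). It then applies a hypothetical greedy rule (threshold `tau`) that unmasks together only positions whose pairwise MI with every already-chosen position is small. This is not the authors' estimator, which predicts the whole MI matrix from hidden states in one forward pass; all names are illustrative.

```python
import math

def pairwise_mi(p_j, cond_i_given_j):
    """I(X_i; X_j) in nats from p(x_j) and the rows p(x_i | x_j = v)."""
    n_i = len(cond_i_given_j[0])
    # Marginal p(x_i) as the mixture over values of x_j.
    marg_i = [sum(pv * row[a] for pv, row in zip(p_j, cond_i_given_j)) for a in range(n_i)]
    mi = 0.0
    for pv, row in zip(p_j, cond_i_given_j):
        for a in range(n_i):
            if pv > 0 and row[a] > 0:
                mi += pv * row[a] * math.log(row[a] / marg_i[a])
    return mi

def select_parallel(mi_matrix, order, tau=0.05):
    """Greedily pick masked positions (in priority order) whose pairwise MI with
    every already-picked position is below tau, so they can be decoded together."""
    chosen = []
    for pos in order:
        if all(mi_matrix[pos][c] < tau for c in chosen):
            chosen.append(pos)
    return chosen

dependent = pairwise_mi([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]])    # ln 2
independent = pairwise_mi([0.5, 0.5], [[0.3, 0.7], [0.3, 0.7]])  # 0
mi = [[0.0, 0.6, 0.01], [0.6, 0.0, 0.02], [0.01, 0.02, 0.0]]
batch = select_parallel(mi, order=[0, 1, 2], tau=0.05)            # [0, 2]
```

Positions 0 and 1 are strongly dependent, so they are not decoded in the same step; position 2 is nearly independent of 0 and joins the batch.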

2605.18623 2026-05-21 cs.DS cs.LG 版本更新

An Approximation Algorithm for Graph Label Selection

图标签选择的近似算法

Josia John, Simon Meierhans, Maximilian Probst Gutenberg

发表机构 * Department of Computer Science, ETH Zurich(苏黎世联邦理工学院计算机科学系)

AI总结 本文提出了一种新的图标签选择算法,在标准预算约束下,首次实现了Õ(log^{1.5}n)的近似比,解决了如何从整个图中选择少量代表性顶点以准确预测剩余顶点标签的问题。

Comments Accepted at ICML 2026. 9 pages, 7 figures

详情
AI中文摘要

在图标签选择问题中,给定一个n个顶点的图和一个预算k,目标是选择k个顶点,其标签能够准确预测剩余顶点的标签。该问题旨在从整个图中提炼出一个小的代表性集合。我们提出了第一个在标准预算约束下具有Õ(log^{1.5}n)近似比的图标签选择算法。先前的工作要么依赖于资源增强,允许显著多于k个标记的顶点,要么主要由启发式方法组成,没有可证明的保证。最后,我们证明了我们的算法的实用启发式变种能够扩展到比以前的方法大得多的图,同时几乎保持了其质量。

英文摘要

In the graph label selection problem, one is given an $n$-vertex graph and a budget $k$, and seeks to select $k$ vertices whose labels enable accurate prediction of the labels on the remaining vertices. This problem formalizes distilling a small representative set from the whole graph. We present the first $\tilde{O}(\log^{1.5} n)$-approximation algorithm for graph label selection under the standard budget constraint. Prior work either relies on resource augmentation, allowing substantially more than $k$ labeled vertices, or consists primarily of heuristics without provable guarantees. Finally, we demonstrate that practical heuristic variants of our algorithm scale to significantly larger graphs than previous methods, while essentially retaining their quality.
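The abstract leaves the label predictor abstract. As a concrete, assumed instance of the objective, the sketch predicts each unselected vertex's label from its nearest selected vertex (multi-source BFS, ties broken by BFS order) and counts mistakes; an exhaustive search over all k-subsets then finds an optimal selection for this predictor on a toy graph. The brute force is exponential in k, which is why polynomial-time approximation guarantees matter. It is not the paper's Õ(log^{1.5} n)-approximation algorithm, and all names are hypothetical.

```python
from collections import deque
from itertools import combinations

def nearest_label_predict(adj, labels, selected):
    """Multi-source BFS: each vertex gets the label of the first selected vertex reaching it."""
    pred = {s: labels[s] for s in selected}
    queue = deque(selected)
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in pred:
                pred[w] = pred[u]
                queue.append(w)
    return pred

def errors(adj, labels, selected):
    """Number of unselected vertices whose predicted label is wrong
    (vertices unreachable from every selected vertex count as errors)."""
    pred = nearest_label_predict(adj, labels, selected)
    return sum(1 for v in adj if v not in selected and pred.get(v) != labels[v])

def best_selection_bruteforce(adj, labels, k):
    """Exact search over all k-subsets: optimal for this predictor, exponential in k."""
    return min(combinations(sorted(adj), k), key=lambda s: errors(adj, labels, s))

# Path graph 0-1-2-3-4-5 with two label "communities".
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
labels = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
best = best_selection_bruteforce(adj, labels, k=2)
```

Choosing both vertices from the same community, such as (0, 1), mislabels the whole other community, while a selection that straddles the boundary can reach zero errors.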

2605.17568 2026-05-21 cs.LG 版本更新

Structured Neural Marked Point Processes for Interpretable Event Interaction Modeling

结构化神经标记点过程用于可解释的事件交互建模

Zhitong Xu, Qiwei Yuan, Yinghao Chen, Shandian Zhe, Bin Shen

发表机构 * Kahlert School of Computing, University of Utah(犹他大学计算学院) Celonis AI

AI总结 本文提出了一种结构化神经标记点过程(SNMPP),通过显式发现事件级和类别级的关系,实现高灵活性的建模,同时在合成和现实数据集上验证了其揭示结构关系和强预测性能的能力。

详情
AI中文摘要

多类事件流在许多现实世界应用中出现,其中揭示结构化、可解释的事件间关系,以及准确预测,仍然是一个核心挑战。现有的神经点过程模型具有高度表达能力,但以黑箱方式编码事件交互,阻止了显式发现结构依赖关系。在本文中,我们提出了一种结构化神经标记点过程(SNMPP),在实现高建模灵活性的同时,能够从数据中显式发现事件级和类别级的关系。我们的模型构建了一个乘积形式的神经影响核,由事件类型上的带符号交互网络和延迟感知的单调时间网络组成。这种设计能够显式表征类间影响拓扑结构(包括激发、抑制和中性),同时灵活捕捉多样的时间衰减模式和潜在影响延迟。为了高效学习,我们开发了一种分层蒙特卡洛估计器用于随机训练。在合成和现实世界基准数据集上的广泛实验验证了我们的方法揭示结构关系和提供强预测性能的能力。

英文摘要

Multi-class event streams arise in numerous real-world applications, where uncovering structured, interpretable inter-event relationships, together with accurate prediction, remains a central challenge. Existing neural point process models are highly expressive but encode event interactions in a black-box manner, preventing explicit discovery of structured dependencies. In this paper, we propose a structured neural marked point process (SNMPP) that achieves high modeling flexibility while enabling explicit event-wise and class-wise relationship discovery from data. Our model constructs a product-form neural influence kernel composed of a signed interaction network over event types and a delay-aware monotonic temporal network. This design enables explicit characterization of inter-class influence topology (including excitation, inhibition, and neutrality) while flexibly capturing diverse temporal decay patterns and potential influence delays. For efficient learning, we develop a stratified Monte Carlo estimator for stochastic training. Extensive experiments on synthetic and real-world benchmark datasets validate the ability of our approach to uncover structured relationships and deliver strong predictive performance.
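A minimal sketch of a product-form, signed, delay-aware intensity and of a stratified Monte Carlo estimate of the compensator term in the point-process log-likelihood. Assumptions (not from the paper, which uses neural networks for both factors): the interaction factor is a fixed signed matrix `W` over event types (positive = excitation, negative = inhibition, zero = neutral), the temporal factor is zero before a delay and decays exponentially afterwards, and a softplus keeps the intensity positive. The integral of the total intensity over [0, T] is estimated with one uniform sample in each of `n_strata` equal-width strata. All names are hypothetical.

```python
import math
import random

def softplus(x):
    """Numerically stable log(1 + e^x)."""
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)

def temporal_decay(tau, delay=0.5, beta=1.0):
    """Delay-aware monotone factor (assumed form): 0 before `delay`, then exp decay."""
    return math.exp(-beta * (tau - delay)) if tau >= delay else 0.0

def intensity(c, t, history, mu, W):
    """lambda_c(t) = softplus(mu[c] + sum_j W[c][c_j] * g(t - t_j)) over strictly past events."""
    drive = mu[c] + sum(W[c][cj] * temporal_decay(t - tj) for tj, cj in history if tj < t)
    return softplus(drive)

def log_likelihood(history, T, mu, W, n_strata=64, rng=random):
    """Sum of log-intensities at events minus the compensator, with the
    compensator estimated by stratified Monte Carlo (one draw per stratum)."""
    event_term = sum(math.log(intensity(cj, tj, history, mu, W)) for tj, cj in history)
    width = T / n_strata
    compensator = 0.0
    for s in range(n_strata):
        t = (s + rng.random()) * width  # uniform draw inside stratum s
        compensator += width * sum(intensity(c, t, history, mu, W) for c in range(len(mu)))
    return event_term - compensator

mu = [0.2, -0.5]
W = [[-1.0, 0.8],   # type 0 inhibits itself, is excited by type 1
     [0.0, 0.5]]    # type 0 has no effect on type 1
history = [(0.0, 0), (0.7, 1), (1.4, 0)]  # (time, event type)
ll = log_likelihood(history, 2.0, mu, W, rng=random.Random(0))
```

Stratifying the time axis keeps one sample in every sub-interval, which typically lowers the variance of the compensator estimate compared with the same number of i.i.d. uniform draws.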

2605.15419 2026-05-21 cs.LG 版本更新

Lagrangian Flow Matching: A Least-Action Framework for Principled Path Design

Lagrangian Flow Matching: 一个基于最小作用原理的路径设计框架

Shukai Du, Junzhe Zhang, Yiming Li

发表机构 * Department of Mathematics, Syracuse University(雪城大学数学系) Department of EECS, Syracuse University(雪城大学电子工程与计算机科学系)

AI总结 本文提出了一种基于最小作用原理的路径设计框架,通过最小化一般拉格朗日量的作用来确定概率路径和速度场,展示了其在最优传输和条件流匹配中的应用。

详情
AI中文摘要

流匹配通过对目标速度进行回归来训练神经速度场,该目标速度与一条将简单初始分布连接到数据分布的给定概率路径相关。核心设计选择是路径本身。现有的构造,包括修正流(rectified)路径和基于最优传输的路径,沿耦合端点之间的直线运输样本,因此只覆盖了狭义的一类动力学。我们观察到,这对应于经典力学中最小作用原理的最简单情况,其中动能拉格朗日量产生自由粒子的直线轨迹。基于这一观察,我们提出了拉格朗日流匹配,一个基于物理的框架,其中概率路径和速度场由在满足连续性方程和给定端点的约束下最小化一般拉格朗日量的作用确定。我们证明这一动态问题等价于一个静态最优传输(OT)形式,从而得到一族无需模拟的训练目标,它们以动能情形为特例恢复基于OT的流匹配,并以谐振子情形恢复三角型方差保持扩散路径。更一般的拉格朗日量产生新的概率路径和速度场,数值实验表明,它们在学习到的动力学中引起有意义的变化,同时仍能与现有的条件流匹配模型竞争。

英文摘要

Flow matching trains a neural velocity field by regression against a target velocity associated with a prescribed probability path connecting a simple initial distribution to the data distribution. A central design choice is the path itself. Existing constructions, including rectified and optimal-transport-based paths, transport samples along straight lines between coupled endpoints and thus cover only a narrow class of dynamics. We observe that this corresponds to the simplest case of the least-action principle in classical mechanics, in which the kinetic Lagrangian yields free-particle straight-line trajectories. Building on this observation, we propose Lagrangian flow matching, a physics-based framework in which the probability path and velocity field are determined by minimizing the action of a general Lagrangian subject to the continuity equation and the prescribed endpoints. We show that this dynamic problem admits an equivalent static optimal transport (OT) formulation, yielding a family of simulation-free training objectives that recover OT-based flow matching as the kinetic special case and the trigonometric variance-preserving diffusion path as the harmonic-oscillator case. More general Lagrangians give rise to new probability paths and velocity fields, and numerical experiments show that they induce meaningful changes in the learned dynamics while remaining competitive with existing conditional flow matching models.
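The two special cases named in the abstract can be checked with a closed form. For the harmonic-oscillator Lagrangian L = |xdot|^2/2 - omega^2 |x|^2/2 with fixed endpoints x(0) = x0 and x(1) = x1, the Euler-Lagrange equation xddot = -omega^2 x has the solution x(t) = [sin(omega(1 - t)) x0 + sin(omega t) x1] / sin(omega), valid when sin(omega) != 0. Taking omega = pi/2 gives x(t) = cos(pi t/2) x0 + sin(pi t/2) x1, the trigonometric variance-preserving path, and omega -> 0 recovers the straight line of the kinetic Lagrangian. The sketch returns the conditional point and target velocity that a flow-matching model would regress onto; it covers only per-pair conditional paths, not the paper's OT formulation, and the function name is hypothetical.

```python
import math

def harmonic_path(x0, x1, t, omega):
    """Point x_t and velocity dx_t/dt on the least-action path of
    L = |xdot|^2 / 2 - omega^2 |x|^2 / 2 from x0 (t = 0) to x1 (t = 1).

    omega == 0 is the kinetic (free-particle) case: a straight line.
    Otherwise requires sin(omega) != 0.
    """
    if abs(omega) < 1e-12:
        x = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
        v = [b - a for a, b in zip(x0, x1)]
        return x, v
    s = math.sin(omega)
    x = [(math.sin(omega * (1.0 - t)) * a + math.sin(omega * t) * b) / s
         for a, b in zip(x0, x1)]
    # Time derivative of the expression above.
    v = [omega * (-math.cos(omega * (1.0 - t)) * a + math.cos(omega * t) * b) / s
         for a, b in zip(x0, x1)]
    return x, v

# Conditional flow-matching target for one (noise, data) pair at t = 0.3.
x_t, v_t = harmonic_path([1.0, -2.0], [0.5, 3.0], 0.3, math.pi / 2)
```

In a training loop, `x_t` would be fed to the velocity network and `v_t` used as the regression target; swapping `omega` changes only the path, not the regression setup.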

2605.12770 2026-05-21 cs.LG cs.AI cs.CL 版本更新

WriteSAE: Sparse Autoencoders for Recurrent State

WriteSAE: 用于递归状态的稀疏自编码器

Jack Young

发表机构 * Indiana University(印第安纳大学)

AI总结 本文提出WriteSAE,一种用于递归语言模型状态中矩阵更新的稀疏自编码器,通过在递归缓存中替换原始写入操作来提升生成效果,并在多个模型上验证了其有效性。

Comments 26 pages, 14 figures, 21 tables; code at https://github.com/JackYoung27/writesae

详情
AI中文摘要

我们介绍了WriteSAE,一种用于递归语言模型状态中矩阵更新的稀疏自编码器。在Gated DeltaNet、Mamba-2和RWKV-7中,每个token向递归缓存写入一个矩阵形状的更新;残差流SAE具有向量形状的原子,无法直接替换该更新。WriteSAE学习具有与模型自身写入相同形状的秩-1矩阵原子。这使我们能够测试直接替换:在SAE激活原子的位置,我们移除模型的写入,插入由SAE激活缩放的原子,并继续前向传递。在92.4%的评估位置上,原子比删除写入能产生更接近的最终token分布;平均每个原子,该比率是89.8%。对于Gated DeltaNet,一个使用忘记门、读取查询和输出嵌入的公式可以预测结果的logit变化,$R^2 = 0.98$。相同的替换测试在Mamba-2-370M上转移,达到88.1%。在生成中,该公式选择写入方向;将写入方向写入三个连续的缓存位置,其范数为模型写入的3倍,使在未修改模型中初始排名为100-1000的token出现在100%的延续中,比33.3%有所提高。据我们所知,这是首次在状态空间或混合递归层中报告的缓存级引导干预。

英文摘要

We introduce WriteSAE, a sparse autoencoder for the matrix updates written into recurrent language-model state. In Gated DeltaNet, Mamba-2, and RWKV-7, each token writes a matrix-shaped update to a recurrent cache; a residual-stream SAE has vector-shaped atoms and cannot replace that update directly. WriteSAE learns rank-1 matrix atoms with the same shape as the model's own write. This lets us test a direct replacement: at positions where the SAE activates an atom, we remove the model's write, insert the atom scaled by its SAE activation, and continue the forward pass. The atom gives a closer final token distribution than deleting the write on 92.4% of evaluated positions; averaged per atom, the rate is 89.8%. For Gated DeltaNet, a formula using the forget gate, read query, and output embedding predicts the resulting logit change with $R^2 = 0.98$. The same replacement test transfers to Mamba-2-370M at 88.1%. In generation, the formula chooses a write direction; writing it into three consecutive cache positions at $3\times$ the norm of the model's write makes tokens initially ranked 100--1000 by the unmodified model appear in 100% of continuations, up from 33.3%. To our knowledge this is the first cache-level steering intervention reported in a state-space or hybrid recurrent layer.
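The sketch below illustrates the replacement test on a hypothetical SAE whose decoder atoms are rank-1 matrices u_i v_i^T, matching the shape of a matrix-valued write. It assumes an encoder activation a_i = ReLU(u_i^T W v_i + b_i) (the Frobenius inner product of the write with the atom, plus a bias) and a plain additive state update S_t = S_{t-1} + W_t. Real Gated DeltaNet, Mamba-2 and RWKV-7 layers apply gates and decays that are omitted here, and the paper's encoder may differ; all names are illustrative.

```python
def outer(u, v):
    """Rank-1 matrix u v^T."""
    return [[a * b for b in v] for a in u]

def frob_inner(A, B):
    """Frobenius inner product <A, B> = sum_ij A_ij B_ij."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def encode(write, atoms, biases):
    """ReLU activations of rank-1 atoms (u_i, v_i): relu(u_i^T W v_i + b_i)."""
    return [max(0.0, frob_inner(write, outer(u, v)) + b) for (u, v), b in zip(atoms, biases)]

def decode(acts, atoms):
    """Reconstruct a matrix-shaped write as sum_i a_i u_i v_i^T."""
    n, m = len(atoms[0][0]), len(atoms[0][1])
    out = [[0.0] * m for _ in range(n)]
    for a, (u, v) in zip(acts, atoms):
        for r, row in enumerate(outer(u, v)):
            for c, x in enumerate(row):
                out[r][c] += a * x
    return out

def replace_write(state, write, atoms, biases, atom_index):
    """Replacement test under S_t = S_{t-1} + write: drop the model's write and
    add only atom `atom_index`, scaled by its SAE activation."""
    acts = encode(write, atoms, biases)
    patch = outer(*atoms[atom_index])
    return [[s + acts[atom_index] * p for s, p in zip(rs, rp)] for rs, rp in zip(state, patch)]

atoms = [([1.0, 0.0], [0.0, 1.0]), ([0.0, 1.0], [1.0, 0.0])]
biases = [0.0, 0.0]
write = [[0.0, 2.0], [0.0, 0.0]]   # a rank-1 write aligned with atom 0
new_state = replace_write([[0.0, 0.0], [0.0, 0.0]], write, atoms, biases, atom_index=0)
```

Because each atom already has the write's matrix shape, the patched state can be passed straight to the next step of the recurrence, which a vector-shaped residual-stream SAE could not do.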

2605.06395 2026-05-21 cs.LG cs.AI eess.SP 版本更新

Consistent Geometric Deep Learning via Hilbert Bundles and Cellular Sheaves

通过希尔伯特丛和细胞sheaf实现一致的几何深度学习

Kartik Tandon, Julian Gould, Tanishq Bhatia, Francesca Dominici, Alejandro Ribeiro, Claudio Battiloro

发表机构 * University of Pennsylvania(宾夕法尼亚大学) Sakana AI Northeastern University(美国东北大学) Harvard University(哈佛大学)

AI总结 本文提出了一种新的卷积学习框架,用于在流形上支持的可能无限维信号,通过希尔伯特丛关联的连接拉普拉斯算子作为卷积算子,引入了称为HilbNets的滤波器和神经网络,并通过两阶段采样过程实现,证明了采样诱导的希尔伯特细胞sheaf的sheaf拉普拉斯收敛于底层连接拉普拉斯,从而在无限维丛设置中推广了Belkin和Niyogi的收敛结果,最终在合成和现实任务中验证了该框架。

Comments 51 pages, 3 figures, 5 tables

详情
AI中文摘要

现代深度学习架构越来越多地面临复杂信号的挑战,这些信号本质上是无限维的,如时间序列、概率分布或算子,并在不规则域上定义。然而,针对这些设置的统一学习理论仍然缺乏。为了开始解决这一差距,我们引入了一种新的卷积学习框架,用于在流形上支持的可能无限维信号。具体来说,我们使用与希尔伯特丛相关的连接拉普拉斯算子作为卷积算子,并推导出滤波器和神经网络,称为HilbNets。我们使HilbNets以及更一般地卷积操作通过两阶段采样过程实现。首先,我们证明采样流形诱导了一个希尔伯特细胞sheaf,这是一个带有希尔伯特特征空间和边耦合规则的广义图结构,并证明其sheaf拉普拉斯在采样密度增加时以概率收敛于底层连接拉普拉斯。值得注意的是,这一结果是Belkin & Niyogi收敛结果在无限维丛设置中的推广,这是几何学习方法的理论基石。其次,我们离散化信号并证明离散化的(可实现的)HilbNets收敛于底层连续架构,并且可以在相同丛的不同采样中转移,从而为学习提供一致性。最后,我们验证了我们的框架在合成和现实任务中的有效性。总体而言,我们的结果通过将经典拉普拉斯框架提升到信号在每个点居住在自身希尔伯特空间的设置中,扩展了几何学习的范围。

英文摘要

Modern deep learning architectures increasingly contend with sophisticated signals that are natively infinite-dimensional, such as time series, probability distributions, or operators, and are defined over irregular domains. Yet, a unified learning theory for these settings has been lacking. To start addressing this gap, we introduce a novel convolutional learning framework for possibly infinite-dimensional signals supported on a manifold. Namely, we use the connection Laplacian associated with a Hilbert bundle as a convolutional operator, and we derive filters and neural networks, dubbed \textit{HilbNets}. We make HilbNets, and more generally the convolution operation, implementable via a two-stage sampling procedure. First, we show that sampling the manifold induces a Hilbert Cellular Sheaf, a generalized graph structure with Hilbert feature spaces and edge-wise coupling rules, and we prove that its sheaf Laplacian converges in probability to the underlying connection Laplacian as the sampling density increases. Notably, this result generalizes to the infinite-dimensional bundle setting the Belkin \& Niyogi (2008) convergence result for the graph Laplacian to the manifold Laplacian, a theoretical cornerstone of geometric learning methods. Second, we discretize the signals and prove that the discretized (implementable) HilbNets converge to the underlying continuous architectures and are transferable across different samplings of the same bundle, providing consistency for learning. Finally, we validate our framework on synthetic and real-world tasks. Overall, our results broaden the scope of geometric learning as a whole by lifting classical Laplacian-based frameworks to settings where the signal at each point lives in its own Hilbert space.
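To make the discrete operator concrete: in the finite-dimensional Euclidean special case, a discrete connection on a graph attaches an orthogonal map O_uv to each edge, and the connection (sheaf) Laplacian acts as (Lx)_u = sum_v w_uv (x_u - O_uv x_v), with O_vu = O_uv^T. The sketch below uses 2-D stalks and rotation matrices in place of the paper's Hilbert stalks and sampled transport maps (both simplifying assumptions; names hypothetical) and checks two standard facts: a globally consistent section lies in the kernel, and x^T L x equals the edge-wise sum of w_uv |x_u - O_uv x_v|^2.

```python
import math

def rot(theta):
    """2x2 rotation: an orthogonal transport map between 2-D stalks."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def transpose(M):
    return [list(r) for r in zip(*M)]

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def connection_laplacian_apply(edges, x):
    """(L x)_u = sum_v w_uv (x_u - O_uv x_v), with O_vu = O_uv^T.

    edges: list of (u, v, w_uv, O_uv); x: dict vertex -> stalk vector.
    """
    out = {u: [0.0] * len(xu) for u, xu in x.items()}
    for u, v, w, O in edges:
        for a, b, M in ((u, v, O), (v, u, transpose(O))):
            moved = matvec(M, x[b])
            out[a] = [o + w * (xa - m) for o, xa, m in zip(out[a], x[a], moved)]
    return out

def dirichlet_energy(edges, x):
    """x^T L x, which equals sum over edges of w_uv |x_u - O_uv x_v|^2 (>= 0)."""
    Lx = connection_laplacian_apply(edges, x)
    return sum(sum(a * b for a, b in zip(x[u], Lx[u])) for u in x)

# Triangle with a flat connection: O_02 = O_01 O_12, so consistent sections exist.
a, b = 0.3, 0.5
edges = [(0, 1, 1.0, rot(a)), (1, 2, 1.0, rot(b)), (0, 2, 2.0, rot(a + b))]
x2 = [1.0, 2.0]
x1 = matvec(rot(b), x2)
x0 = matvec(rot(a), x1)
consistent = {0: x0, 1: x1, 2: x2}
inconsistent = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [1.0, 0.0]}
```

A consistent section (x_u = O_uv x_v on every edge) has zero energy, while the constant field, which ignores the transport maps, does not.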

2605.03601 2026-05-21 cs.LG cs.DM math.CO 版本更新

Most ReLU Networks Admit Identifiable Parameters

大多数ReLU网络允许可识别的参数

Moritz Grillo, Guido Montúfar

发表机构 * Max Planck Institute for Mathematics in the Sciences(马克斯·普朗克科学中的数学研究所)

AI总结 研究ReLU深度网络的实现映射,探讨函数是否能确定其参数(除缩放和排列外),引入基于加权多面体复形的框架,证明对于输入和隐藏层宽度至少为2的架构,存在可识别参数的开集,且函数维度等于参数数量减去隐藏神经元数量,并建立通用深度层次。

详情
AI中文摘要

我们研究深度ReLU网络的实现映射,聚焦于何时函数能确定其参数(除缩放和排列外)。为了分析超越这些标准对称性的隐藏冗余,我们引入基于加权多面体复形的框架。我们的主要结果表明,对于输入和隐藏层宽度至少为2的每个架构,存在可识别参数的开集。这表明此类架构的函数维度恰好等于参数数量减去隐藏神经元数量。我们进一步证明,最小函数表示仍可能具有非平凡的参数冗余。最后,我们建立了通用深度层次,即对于参数的开集,所实现的函数无法由任何更浅的网络泛化表示。

英文摘要

We study the realization map of deep ReLU networks, focusing on when a function determines its parameters up to scaling and permutation. To analyze hidden redundancies beyond these standard symmetries, we introduce a framework based on weighted polyhedral complexes. Our main result shows that for every architecture whose input and hidden layers have width at least two, there exists an open set of identifiable parameters. This implies that the functional dimension of every such architecture is exactly the number of parameters minus the number of hidden neurons. We further show that minimal functional representations can still have non-trivial parameter redundancies. Finally, we establish a generic depth hierarchy, whereby for an open set of parameters the realized function cannot be represented generically by any shallower network.
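The counting in the main result is easy to make concrete. The sketch computes the parameter count and the functional dimension #params - #hidden neurons stated in the abstract for layer widths [n0, ..., nL] (stated there for input and hidden widths of at least two), and numerically checks the positive-scaling symmetry consistent with subtracting one dimension per hidden neuron: multiplying a hidden neuron's incoming weights and bias by c > 0 and dividing its outgoing weights by c leaves the network function unchanged. Helper names are hypothetical; permutation symmetry is discrete, does not reduce dimension, and is not shown.

```python
import random

def param_count(widths):
    """Weights plus biases of a fully connected net with widths [n0, ..., nL]."""
    return sum((a + 1) * b for a, b in zip(widths[:-1], widths[1:]))

def hidden_neurons(widths):
    return sum(widths[1:-1])

def functional_dimension(widths):
    """#params - #hidden neurons, as stated for input/hidden widths >= 2."""
    if any(w < 2 for w in widths[:-1]):
        raise ValueError("stated only for input and hidden widths >= 2")
    return param_count(widths) - hidden_neurons(widths)

def relu_net(layers, x):
    """Evaluate a ReLU net given as [(W, b), ...]; the last layer is affine."""
    for i, (W, b) in enumerate(layers):
        x = [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]
        if i < len(layers) - 1:
            x = [max(0.0, z) for z in x]
    return x

def rescale_neuron(layers, layer, neuron, c):
    """Scale one hidden neuron's incoming weights and bias by c > 0 and its
    outgoing weights by 1/c (positive homogeneity of ReLU)."""
    new = [([list(r) for r in W], list(b)) for W, b in layers]
    W, b = new[layer]
    W[neuron] = [c * w for w in W[neuron]]
    b[neuron] *= c
    for row in new[layer + 1][0]:
        row[neuron] /= c
    return new

# Architecture [2, 3, 1]: 13 parameters, 3 hidden neurons, functional dimension 10.
rng = random.Random(0)
layers = [
    ([[rng.uniform(-1, 1) for _ in range(2)] for _ in range(3)], [rng.uniform(-1, 1) for _ in range(3)]),
    ([[rng.uniform(-1, 1) for _ in range(3)]], [rng.uniform(-1, 1)]),
]
scaled = rescale_neuron(layers, layer=0, neuron=1, c=2.5)
```

Such rescalings (and permutations) are the "standard symmetries" the abstract factors out; identifiability means no other parameter change preserves the function on an open set.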

2605.03562 2026-05-21 cs.LG cs.AI 版本更新

HeadQ: Model-Visible Distortion and Score-Space Correction for KV-Cache Quantization

HeadQ: KV-Cache量化中的模型可见失真与分数空间校正

Jorge L. Ruiz Williams

AI总结 本文提出HeadQ方法,通过在键侧存储低秩残差侧码并在校准学习的查询基上应用作为加性对数修正,以解决KV缓存量化中的模型可见失真问题,并通过分数空间误差预测注意力KL散度,优于原始键MSE。

Comments Withdrawn by the author because ethical concerns were identified after posting

详情
AI中文摘要

KV缓存量化器通常优化存储空间重建,尽管注意力通过logits读取键,通过注意力加权读出读取值。我们主张应以模型可见坐标测量持久缓存误差。对于键,可见对象是分数误差模常数位移;这导致HeadQ,一种键侧方法,存储一个低秩残差侧码在校准学习的查询基上,并将其作为加性对数修正。对于值,固定注意力读出提供了一个A²加权的token失真替代物。在六个模型上,Fisher/分数空间误差预测注意力KL散度比原始键MSE更好;相同预算的反例、空空间干预、查询-PCA控制以及错误符号HeadQ否定了存储MSE替代方案。匹配的Pythia检查点将主要异常定位到小模型低熵路由翻转边界。在仅使用键的WikiText-103解码实验中,使用密集值时,HeadQ在最强的2位行中移除了约84-94%的额外困惑度;在辅助的全KV 2位组合中,HeadQ加上A²值策略改进了所有六个模型。

英文摘要

KV-cache quantizers usually optimize storage-space reconstruction, even though attention reads keys through logits and values through attention-weighted readout. We argue that persistent cache error should be measured in model-visible coordinates. For keys, the visible object is score error modulo constant shifts; this yields HeadQ, a key-side method that stores a low-rank residual side code in a calibration-learned query basis and applies it as an additive logit correction. For values, fixed-attention readout gives an $A^2$-weighted token-distortion surrogate. Across six models, Fisher/score-space error predicts attention KL far better than raw key MSE; same-budget counterexamples, null-space interventions, query-PCA controls, and wrong-sign HeadQ falsify storage-MSE alternatives. Matched Pythia checkpoints localize the main anomaly to a small-model low-entropy route-flip boundary. In K-only WikiText-103 decode experiments with dense values, HeadQ removes roughly $84$--$94\%$ of the excess perplexity on the strongest 2-bit rows; in an auxiliary full-KV 2-bit composition, HeadQ plus an $A^2$ value policy improves all six models.
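The key-side argument rests on softmax being invariant to adding the same constant to every logit, so only the mean-centered part of the score error can change attention. The sketch below (hypothetical names, no actual quantizer, and none of HeadQ's low-rank side code) compares a key perturbation shared by all tokens, which has larger key MSE but zero centered score error and essentially zero attention KL, with a smaller, token-specific perturbation that does change the attention distribution.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def kl(p, q):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def scores(q, keys):
    """Scaled dot-product logits of one query against all cached keys."""
    scale = 1.0 / math.sqrt(len(q))
    return [scale * sum(a * b for a, b in zip(q, k)) for k in keys]

def key_mse(keys, keys_hat):
    """Storage-space error: mean squared error over all key entries."""
    n = sum(len(k) for k in keys)
    return sum((a - b) ** 2 for k, kh in zip(keys, keys_hat) for a, b in zip(k, kh)) / n

def centered_score_error(q, keys, keys_hat):
    """Model-visible error: squared score error after removing its mean (constant shift)."""
    e = [b - a for a, b in zip(scores(q, keys), scores(q, keys_hat))]
    mean = sum(e) / len(e)
    return sum((x - mean) ** 2 for x in e)

def attention_kl(q, keys, keys_hat):
    return kl(softmax(scores(q, keys)), softmax(scores(q, keys_hat)))

q = [1.0, 0.5]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
shared = [[k[0] + 0.3, k[1] - 0.2] for k in keys]       # same offset on every key
token_specific = [[1.1, 0.0], [0.0, 1.0], [1.0, 1.0]]   # only token 0 is perturbed
```

Ranking these two perturbations by key MSE and by attention KL gives opposite orders, a toy version of why storage-space error can be a poor proxy.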

2604.24957 2026-05-21 cs.LG cs.AI 版本更新

Compute Aligned Training: Optimizing for Test Time Inference

计算对齐训练:优化测试时间推断

Adam Ousherovitch, Ambuj Tewari

发表机构 * Department of Statistics(统计学系) University of Michigan(密歇根大学)

AI总结 本文提出计算对齐训练方法,通过将训练目标与测试时间策略对齐,提升大语言模型在测试时的推断性能。

详情
AI中文摘要

扩展测试时计算已成为增强大型语言模型(LLM)性能的强大机制。然而,标准的后训练范式,即监督微调(SFT)和强化学习(RL),优化的是基础策略下单个样本的似然,这与依赖聚合或过滤输出的测试时过程不一致。在本文中,我们提出计算对齐训练,将训练目标与测试时策略对齐。通过将推理策略视为作用于基础策略的算子,我们推导出新的损失函数,这些损失函数在应用所述策略时最大化性能。我们为SFT和RL在常见测试时策略下实例化此类损失函数。最后,我们提供了实证证据,证明这种训练方法在测试时扩展方面显著优于标准训练。

英文摘要

Scaling test-time compute has emerged as a powerful mechanism for enhancing Large Language Model (LLM) performance. However, standard post-training paradigms, Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), optimize the likelihood of individual samples under a base policy, creating a misalignment with test-time procedures that rely on aggregated or filtered outputs. In this work, we propose Compute Aligned Training, which aligns training objectives with test-time strategies. By conceptualizing inference strategies as operators on the base policy, we derive new loss functions that maximize performance when said strategies are applied. We instantiate such loss functions for SFT and RL across common test-time strategies. Finally, we provide empirical evidence that this training method substantially improves test-time scaling over standard training.
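One way to read "inference strategies as operators on the base policy" is the pass@k operator: if one sample solves a prompt with probability p, at least one of k i.i.d. samples succeeds with probability 1 - (1 - p)^k. The sketch computes this operator, its derivative k(1 - p)^(k - 1), which acts as a per-prompt weight that emphasizes prompts the policy rarely solves, and the standard unbiased pass@k estimate from n samples with c correct. It illustrates the idea only; it is not the losses derived in the paper, and the function names are hypothetical.

```python
from math import comb

def pass_at_k(p, k):
    """Probability that at least one of k i.i.d. samples is correct."""
    return 1.0 - (1.0 - p) ** k

def pass_at_k_weight(p, k):
    """d/dp of pass@k, i.e. k (1 - p)^(k - 1).

    Compared with the single-sample objective (derivative 1), prompts that are
    already solved (p near 1) receive almost no weight.
    """
    return k * (1.0 - p) ** (k - 1)

def unbiased_pass_at_k(n, c, k):
    """Unbiased pass@k estimate from n samples with c correct: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Per-prompt weights for k = 4 given estimated single-sample success rates.
success = [0.05, 0.5, 0.95]
weights = [pass_at_k_weight(p, 4) for p in success]
```

Under this reading, training for a best-of-k or pass@k deployment shifts gradient mass from easy prompts toward hard ones, which a plain likelihood objective does not do.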

2604.15038 2026-05-21 cs.LG cs.AI cs.CV 版本更新

When Fairness Metrics Disagree: Evaluating the Reliability of Demographic Fairness Assessment in Machine Learning

当公平性指标产生分歧:评估机器学习中人口公平性评估的可靠性

Khalid Adnan Alsayed

发表机构 * Founder, Ducaltus(Ducaltus创始人) BSc (Hons) Artificial Intelligence(人工智能学士(荣誉)) School of Computing, Engineering & Digital Technologies(计算、工程与数字技术学院) Teesside University, UK(英国泰赛德大学)

AI总结 本文研究了公平性评估的一致性问题,通过多指标分析评估机器学习模型中的人口偏见,发现不同公平性指标可能导致矛盾的评估结果,引入了公平性分歧指数(FDI)来量化指标间的不一致程度。

Comments 15 pages, 4 figures, 5 tables

详情
AI中文摘要

在高风险应用中,机器学习系统的公平性评估已成为核心问题,包括生物识别、医疗决策和自动风险评估。现有方法通常依赖少量公平性指标来评估模型行为,隐含假设这些指标能提供一致和可靠的结论。然而,不同公平性指标捕捉模型性能的不同统计属性,可能在相同系统上产生冲突的评估。本文通过系统性的多指标分析,评估机器学习模型中的人口偏见,使用面部识别作为受控实验环境,评估模型在多个群体分区下的性能,包括误差率差异和基于性能的指标。结果表明,公平性评估可能因指标选择而显著变化,导致关于模型偏见的矛盾结论。为量化此现象,我们引入公平性分歧指数(FDI),以捕捉公平性指标间的不一致程度。进一步表明,分歧在阈值和模型配置下仍保持高位。这些发现突显了当前公平性评估实践的关键限制,并表明单一指标报告不足以可靠地评估偏见。

英文摘要

The evaluation of fairness in machine learning systems has become a central concern in high-stakes applications, including biometric recognition, healthcare decision-making, and automated risk assessment. Existing approaches typically rely on a small number of fairness metrics to assess model behaviour across group partitions, implicitly assuming that these metrics provide consistent and reliable conclusions. However, different fairness metrics capture distinct statistical properties of model performance and may therefore produce conflicting assessments when applied to the same system. In this work, we investigate the consistency of fairness evaluation by conducting a systematic multi-metric analysis of demographic bias in machine learning models. Using face recognition as a controlled experimental setting, we evaluate model performance across multiple group partitions under a range of commonly used fairness metrics, including error-rate disparities and performance-based measures. Our results demonstrate that fairness assessments can vary significantly depending on the choice of metrics, leading to contradictory conclusions regarding model bias. To quantify this phenomenon, we introduce the Fairness Disagreement Index (FDI), a measure designed to capture the degree of inconsistency across fairness metrics. We further show that disagreement remains high across thresholds and model configurations. These findings highlight a critical limitation in current fairness evaluation practices and suggest that single-metric reporting is insufficient for reliable bias assessment.

2604.01449 2026-05-21 cs.AI cs.LG 版本更新

When AI Gets it Wrong: Reliability and Risk in AI-Assisted Medication Decision Systems

当AI出错时:AI辅助用药决策系统中的可靠性与风险

Khalid Adnan Alsayed

发表机构 * Ducaltus(Ducaltus公司) School of Computing, Engineering & Digital Technologies(计算、工程与数字技术学院) Teesside University(泰赛德大学)

AI总结 本文研究了AI辅助用药系统在现实决策中的可靠性问题,通过模拟药物相互作用和剂量决策场景,分析系统故障类型及其潜在临床影响,强调在安全关键领域如药房实践中,需补充传统性能指标的风险意识评估方法。

Comments 9 pages, 1 figure. Position paper with simulated experimental analysis of AI reliability in medication decision systems. Minor Correction to Title Metadata (Typo Fix)

详情
AI中文摘要

人工智能(AI)系统日益被整合到医疗和药房工作中,支持药物推荐、剂量确定和药物相互作用检测等任务。尽管这些系统在标准评估指标下通常表现良好,但其在现实决策中的可靠性仍不够理解。在高风险领域如用药管理中,单个错误推荐可能导致严重患者伤害。本文通过聚焦系统故障及其潜在临床后果,探讨AI辅助用药系统的可靠性。不同于仅通过聚合指标评估性能,本文关注错误发生的方式以及AI系统产生错误输出时的情况。通过一系列受控的模拟场景,分析不同类型的系统故障,包括遗漏相互作用、错误风险标记和不适当的剂量推荐。研究发现,AI在用药相关情境中的错误可能导致不良药物反应、无效治疗或延误护理,尤其是在缺乏充分人类监督的情况下。此外,本文讨论了过度依赖AI推荐的风险以及决策过程透明度有限带来的挑战。本文为医疗领域AI评估提供了以可靠性为核心的视角,强调理解故障行为和现实影响的重要性。它突显了在安全关键领域如药房实践中,需补充传统性能指标的风险意识评估方法的必要性。

英文摘要

Artificial intelligence (AI) systems are increasingly integrated into healthcare and pharmacy workflows, supporting tasks such as medication recommendations, dosage determination, and drug interaction detection. While these systems often demonstrate strong performance under standard evaluation metrics, their reliability in real-world decision-making remains insufficiently understood. In high-risk domains such as medication management, even a single incorrect recommendation can result in severe patient harm. This paper examines the reliability of AI-assisted medication systems by focusing on system failures and their potential clinical consequences. Rather than evaluating performance solely through aggregate metrics, this work shifts attention towards how errors occur and what happens when AI systems produce incorrect outputs. Through a series of controlled, simulated scenarios involving drug interactions and dosage decisions, we analyse different types of system failures, including missed interactions, incorrect risk flagging, and inappropriate dosage recommendations. The findings highlight that AI errors in medication-related contexts can lead to adverse drug reactions, ineffective treatment, or delayed care, particularly when systems are used without sufficient human oversight. Furthermore, the paper discusses the risks of over-reliance on AI recommendations and the challenges posed by limited transparency in decision-making processes. This work contributes a reliability-focused perspective on AI evaluation in healthcare, emphasising the importance of understanding failure behavior and real-world impact. It highlights the need to complement traditional performance metrics with risk-aware evaluation approaches, particularly in safety-critical domains such as pharmacy practice.

2603.28675 2026-05-21 cs.CV cs.AI cs.LG 版本更新

Why Aggregate Accuracy is Inadequate for Evaluating Fairness in Law Enforcement Facial Recognition Systems

为何聚合准确率不足以评估执法面部识别系统的公平性

Khalid Adnan Alsayed

发表机构 * Ducaltus School of Computing, Engineering & Digital Technologies(计算、工程与数字技术学院) Teesside University(泰赛德大学)

AI总结 本文探讨了在执法场景中,面部识别系统的聚合准确率作为公平性评估指标的不足,通过分析子群体误差分布,指出聚合指标可能掩盖不同群体间的显著差异,并强调需要更全面的评估框架来确保负责任的AI部署。

Comments 9 pages, 2 tables, 1 figure. Position paper with empirical subgroup analysis highlighting limitations of aggregate accuracy in fairness evaluation

详情
AI中文摘要

面部识别系统正在越来越多地应用于执法和安全领域,在这些领域中算法决策可能带来重大社会影响。尽管报告的准确率较高,但越来越多的证据表明,这些系统在不同群体中的表现往往不均衡,导致不成比例的误差率和潜在危害。本文认为,聚合准确率不足以作为评估高风险环境中面部识别系统公平性和可靠性的指标。通过分析子群体层面的误差分布,包括假阳性率(FPR)和假阴性率(FNR),本文展示了聚合性能指标如何掩盖不同群体间的关键差异。实证观察表明,具有相似总体准确率的系统可以表现出显著不同的公平性特征,在相同的聚合指标背后,子群体误差率可能有显著差异。本文进一步探讨了在执法应用中以准确率为中心的评估实践所带来的操作风险,其中误分类可能导致错误怀疑或遗漏识别。它强调了公平性意识评估方法和模型无关审计策略的重要性,这些方法能够实现部署后的现实系统评估。研究结果强调了需要超越准确率作为主要指标,并采用更全面的评估框架来确保负责任的AI部署。

英文摘要

Facial recognition systems are increasingly deployed in law enforcement and security contexts, where algorithmic decisions can carry significant societal consequences. Despite high reported accuracy, growing evidence demonstrates that such systems often exhibit uneven performance across demographic groups, leading to disproportionate error rates and potential harm. This paper argues that aggregate accuracy is an insufficient metric for evaluating the fairness and reliability of facial recognition systems in high-stakes environments. Through analysis of subgroup-level error distribution, including false positive rate (FPR) and false negative rate (FNR), the paper demonstrates how aggregate performance metrics can obscure critical disparities across demographic groups. Empirical observations show that systems with similar overall accuracy can exhibit substantially different fairness profiles, with subgroup error rates varying significantly behind the same aggregate figure. The paper further examines the operational risks associated with accuracy-centric evaluation practices in law enforcement applications, where misclassification may result in wrongful suspicion or missed identification. It highlights the importance of fairness-aware evaluation approaches and model-agnostic auditing strategies that enable post-deployment assessment of real-world systems. The findings emphasise the need to move beyond accuracy as a primary metric and adopt more comprehensive evaluation frameworks for responsible AI deployment.
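A small worked example of the masking effect, using made-up confusion counts (not data from the paper): two systems with identical pooled accuracy of 0.9, where one has equal error rates across two groups and the other concentrates all of its errors in one group. Counts are ordered (tp, fp, tn, fn); function names are hypothetical.

```python
def rates(tp, fp, tn, fn):
    """Accuracy, false positive rate and false negative rate from confusion counts."""
    return {
        "acc": (tp + tn) / (tp + fp + tn + fn),
        "fpr": fp / (fp + tn),
        "fnr": fn / (fn + tp),
    }

def aggregate(groups):
    """Pool per-group (tp, fp, tn, fn) counts and compute overall rates."""
    pooled = [sum(counts[i] for counts in groups.values()) for i in range(4)]
    return rates(*pooled)

def max_gap(groups, key):
    """Largest between-group difference in one rate, e.g. 'fpr' or 'fnr'."""
    vals = [rates(*counts)[key] for counts in groups.values()]
    return max(vals) - min(vals)

# Made-up counts (tp, fp, tn, fn) per demographic group.
system_a = {"group_1": (45, 5, 45, 5), "group_2": (45, 5, 45, 5)}
system_b = {"group_1": (50, 0, 50, 0), "group_2": (40, 10, 40, 10)}

for name, system in (("A", system_a), ("B", system_b)):
    # A: acc 0.9 with zero gaps; B: acc 0.9 with FPR and FNR gaps of 0.2
    print(name, aggregate(system)["acc"], max_gap(system, "fpr"), max_gap(system, "fnr"))
```

A single pooled accuracy cannot distinguish the two systems; reporting per-group FPR and FNR (or their gaps) does.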

2603.15842 2026-05-21 cs.LG cs.AI cs.IT math.IT 版本更新

Informationally Compressive Anonymization: Non-Degrading Sensitive Input Protection for Privacy-Preserving Supervised Machine Learning

信息压缩匿名化:非降级的敏感输入保护用于隐私保护的监督机器学习

Jeremy J Samuelson

发表机构 * EVP Artificial Intelligence & Innovation(EVP人工智能与创新)

AI总结 本文提出了一种信息压缩匿名化(ICA)方法和VEIL架构,通过架构和数学设计而非噪声注入或密码学来实现强隐私保障,确保在隐私保护监督机器学习中保留预测效用,同时支持可扩展的多地区部署。

Comments 47 pages, 29 figures

详情
AI中文摘要

现代机器学习系统越来越多地依赖敏感数据,这带来了显著的隐私、安全和监管风险,而现有的隐私保护机器学习(ppML)技术,如差分隐私(DP)和同态加密(HE),只能以性能下降、复杂性增加或过高的计算开销为代价来应对这些风险。本文介绍了信息压缩匿名化(ICA)和VEIL架构,一种隐私保护的机器学习框架,通过架构和数学设计实现强隐私保障,而非噪声注入或密码学。ICA在受信任的源环境中嵌入一个监督的多目标编码器,将原始输入转换为低维、任务对齐的潜在表示,确保只有不可逆匿名化的向量被导出到不可信的训练和推理环境中。本文使用拓扑和信息论论证严格证明这些编码在结构上不可逆,表明即使在理想化的攻击者假设下,逆向也是逻辑上不可能的,并且在实际部署中,攻击者对原始数据的条件熵发散,驱动重建概率趋于零。与以往基于自编码器的ppML方法不同,ICA通过将表示学习与下游监督目标对齐,保留预测效用,从而在无需梯度裁剪、噪声预算或推理时加密的情况下实现低延迟、高性能的机器学习。VEIL架构强制执行严格的信任边界,支持可扩展的多地区部署,并自然契合隐私设计(privacy-by-design)监管框架,为企业ML建立了一种新的基础:即使面对后量子威胁,也具备防护严密(secure)、性能优良且在构造上即安全(safe by construction)的特性。

英文摘要

Modern machine learning systems increasingly rely on sensitive data, creating significant privacy, security, and regulatory risks that existing privacy-preserving machine learning (ppML) techniques, such as Differential Privacy (DP) and Homomorphic Encryption (HE), address only at the cost of degraded performance, increased complexity, or prohibitive computational overhead. This paper introduces Informationally Compressive Anonymization (ICA) and the VEIL architecture, a privacy-preserving ML framework that achieves strong privacy guarantees through architectural and mathematical design rather than noise injection or cryptography. ICA embeds a supervised, multi-objective encoder within a trusted Source Environment to transform raw inputs into low-dimensional, task-aligned latent representations, ensuring that only irreversibly anonymized vectors are exported to untrusted training and inference environments. The paper rigorously proves that these encodings are structurally non-invertible using topological and information-theoretic arguments, showing that inversion is logically impossible, even under idealized attacker assumptions, and that, in realistic deployments, the attacker conditional entropy over the original data diverges, driving reconstruction probability to zero. Unlike prior autoencoder-based ppML approaches, ICA preserves predictive utility by aligning representation learning with downstream supervised objectives, enabling low-latency, high-performance ML without gradient clipping, noise budgets, or encryption at inference time. The VEIL architecture enforces strict trust boundaries, supports scalable multi-region deployment, and naturally aligns with privacy-by-design regulatory frameworks, establishing a new foundation for enterprise ML that is secure, performant, and safe by construction, even in the face of post-quantum threats.

2602.10408 2026-05-21 cs.LG cs.CL 版本更新

Gated Normalization Removal and Scale Anchoring in Pre-Norm Transformers

预归一化变换器中的门控归一化移除与尺度锚定

Andrei Kanavalau, Carmen Amo Alonso, Sanjay Lall

发表机构 * Department of Electrical Engineering(电气工程系) Department of Computer Science(计算机科学系) Stanford University(斯坦福大学)

AI总结 本文研究了预归一化变换器中归一化层的必要性,提出了一种门控归一化移除方法,通过TaperNorm逐步将归一化操作转为样本无关的线性或仿射映射,并揭示了最终归一化层对预logit表示尺度的锚定作用。

详情
AI中文摘要

归一化层在变换器中是标准组件,但其样本依赖的计算在整个训练和推理过程中是否必要尚不明确。本文为预归一化变换器开发了一种门控归一化移除方法。该方法通过TaperNorm实现,从标准RMSNorm/LayerNorm逐步过渡到学习的样本无关线性或仿射映射。一旦门控降为零,渐消(tapered)层将不再计算每个token的统计信息,所得到的映射可以折叠到相邻的线性投影中。结果表明,在测试的预训练和微调设置中,内部归一化可以逐步移除,且验证损失的增加较小。我们的方法揭示了最终归一化层的独特作用,即它锚定了预logit表示的尺度。有了这个锚定,最后隐藏状态的径向变化不会直接减少损失;当移除它时,通过增加logit的幅度可以实现交叉熵的减少。固定目标尺度损失提供了显式的替代锚定,使在测试范围内能够完全无归一化地进行消融实验。最后,在KV缓存自回归解码基准中,逐步移除内部归一化后,使用显式缩放操作时吞吐量最高提升至1.14倍,折叠后最高可达1.18倍。

英文摘要

Normalization layers are standard in transformers, but it is not clear whether their sample-dependent computations are necessary throughout both training and inference. This work develops a gated normalization-removal approach for pre-norm transformers. The approach is implemented using TaperNorm, which starts from standard RMSNorm/LayerNorm and gradually tapers to learned sample-independent linear or affine maps. Once the gate reaches zero, per-token statistics are no longer computed in the tapered layers and the resulting maps can be folded into adjacent linear projections. The results indicate that internal normalization can be tapered in the tested pre-training and fine-tuning settings with small validation-loss increases. Our approach helps reveal a distinct role for final normalization, namely that it anchors the scale of the pre-logit representation. With this anchor present, radial changes in the last hidden state do not directly reduce the loss; when it is removed, reducing cross-entropy can be achieved by increasing logit magnitudes. A fixed-target scale loss provides an explicit alternative anchor and enables fully norm-free ablations in the tested regimes. Finally, in a KV-cached autoregressive decoding benchmark, tapering internal norms gives up to $1.14\times$ higher throughput with explicit scaling operations and up to $1.18\times$ after folding.
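A minimal sketch of the tapering idea, assuming a gated blend y = g * (gamma * RMSNorm(x)) + (1 - g) * (s * x), with a gate g annealed from 1 to 0 and a learned per-channel scale s; the paper's exact parameterization, including the affine variant, may differ, and all names are hypothetical. Once g = 0 the layer is the fixed linear map diag(s), which needs no per-token statistics and folds into the following projection: W (s * x) = (W diag(s)) x.

```python
import math

def rms_norm(x, gamma, eps=1e-6):
    """Per-token RMSNorm: needs a statistic computed from x itself."""
    r = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / r for g, v in zip(gamma, x)]

def taper_norm(x, gamma, scale, gate):
    """Gated blend (assumed form): gate * RMSNorm(x) + (1 - gate) * scale * x.
    At gate == 0 no per-token statistic is computed: the map is diag(scale)."""
    if gate == 0.0:
        return [s * v for s, v in zip(scale, x)]
    normed = rms_norm(x, gamma)
    return [gate * n + (1.0 - gate) * s * v for n, s, v in zip(normed, scale, x)]

def linear(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def fold_scale_into(W, scale):
    """Fold diag(scale) into the next projection: returns W diag(scale)."""
    return [[w * s for w, s in zip(row, scale)] for row in W]

x = [1.0, -2.0, 3.0]
gamma = [1.0, 1.0, 1.0]
scale = [0.5, 2.0, 1.5]
W = [[1.0, 0.0, 2.0], [0.5, -1.0, 1.0]]
unfolded = linear(W, taper_norm(x, gamma, scale, gate=0.0))
folded = linear(fold_scale_into(W, scale), x)  # same result, one fewer operation
```

Folding removes the separate elementwise scaling at inference time, which is consistent with the extra throughput the abstract reports after folding.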

2602.08686 2026-05-21 cs.LG cs.AI 版本更新

CompilerKV: Risk-Adaptive KV Compression via Offline Experience Compilation

CompilerKV: 通过离线经验编译实现风险适应性的键值压缩

Ning Yang, Chengzhi Wang, Yibo Liu, Baoliang Tian, Haijun Zhang

发表机构 * Institute of Automation, Chinese Academy of Sciences(中国科学院自动化研究所) University of Electronic Science and Technology of China(电子科技大学) The Chinese University of Hong Kong, Shenzhen(香港中文大学(深圳)) ByteDance(字节跳动) University of Science and Technology Beijing(北京科技大学)

AI总结 本文提出CompilerKV,一种通过离线经验编译实现风险适应性的键值压缩方法,通过离线编译校准语料库中的纠正表,将在线纠正减少到O(1)查找加预算限制,从而在多个模型架构上实现了压缩SOTA,并在不同压力条件下保持最优性能。

详情

Abstract

Prefill-only KV compression freezes a token subset at the end of prefill and decodes from it without further eviction. The retention decision is therefore irreversible, yet existing methods estimate the corrective signals it relies on, per-head reliability and prompt-level compression sensitivity, online from a single noisy prompt. We argue this is the wrong statistical unit: these signals exhibit far higher cross-prompt regularity than within-prompt signal-to-noise. We introduce \textsc{CompilerKV}, a KV-retention policy whose corrective tables are compiled offline from a calibration corpus, reducing online correction after the standard observation-window scan to $O(1)$ lookups plus a budget clamp. We find that compiled retention tables behave as portable architectural priors: rankings transfer across disjoint corpora on four backbones (mean Spearman $\bar{\rho}{=}0.90$), and direct model-to-model table transfer costs only $0.4$--$0.8$ LongBench points on average. At a 512-token budget, \textsc{CompilerKV} attains compressed-SOTA on all four backbones, improving over the strongest prefill-only baseline by $+1.67$ points on average (task-bootstrap 95\% CI $[+1.08,+2.37]$). Pressure regimes amplify the gap: under a fixed $512/32k$ cache ratio, CompilerKV remains the strongest compressed method through 128k RULER ($\sim\!73$ vs.\ FullKV $\sim\!79$, SnapKV $\sim\!38$); on 32k NIAH it reaches $0.89$ vs.\ SnapKV $0.42$; and at 32k input, retaining only $1.56\%$ of the prefill KV, batch-16 serving remains feasible where FullKV is OOM.
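The online path described in this abstract (an $O(1)$ table lookup per head followed by a budget clamp) can be illustrated with a minimal Python sketch. It assumes a hypothetical compiled table mapping `(layer, head)` to a reliability weight, and observation-window scores already produced by a standard scan; the function name `retain_tokens`, the `default_weight` fallback, and the weighted-sum scoring rule are illustrative assumptions, not CompilerKV's actual implementation.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical offline-compiled table: (layer, head) -> reliability weight.
CompiledTable = Dict[Tuple[int, int], float]


def retain_tokens(
    window_scores: Dict[Tuple[int, int], List[float]],
    table: CompiledTable,
    budget: int,
    default_weight: float = 1.0,
) -> List[int]:
    """Return the sorted indices of at most `budget` prefill tokens to keep.

    window_scores[(layer, head)][t] is an observation-window importance score
    for token t (assumed to come from a SnapKV-style scan).
    """
    num_tokens = len(next(iter(window_scores.values())))
    combined = [0.0] * num_tokens
    for key, scores in window_scores.items():
        weight = table.get(key, default_weight)  # O(1) dictionary lookup per head
        for t, score in enumerate(scores):
            combined[t] += weight * score
    # Budget clamp: keep no more tokens than the budget or than exist.
    k = max(0, min(budget, num_tokens))
    kept = heapq.nlargest(k, range(num_tokens), key=lambda t: combined[t])
    return sorted(kept)


scores = {
    (0, 0): [0.1, 0.5, 0.2, 0.2],
    (0, 1): [0.4, 0.1, 0.4, 0.1],
}
table = {(0, 0): 2.0, (0, 1): 0.5}
print(retain_tokens(scores, table, budget=2))  # → [1, 2]
```

Because the per-head weights are fixed before inference, the online cost beyond the scan is a dictionary lookup per head plus a top-k selection.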

2602.07832 2026-05-21 cs.LG cs.AI Updated version

rePIRL: Learn PRM with Inverse RL for LLM Reasoning

Xian Wu, Kaijie Zhu, Ying Zhang, Lun Wang, Wenbo Guo

Affiliations: Meta AI; Department of Computer Science, University of California, Santa Barbara; Google DeepMind; Independent Researcher

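AI summary: This paper proposes rePIRL, a framework that learns effective process reward models (PRMs) via inverse reinforcement learning without strong assumptions about the expert policy. It addresses inherent limitations of prior methods such as entropy collapse, improves LLM reasoning through a dual learning process with customized techniques, and is validated on math and coding datasets.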

Abstract

Process rewards have been widely used in deep reinforcement learning to improve training efficiency, reduce variance, and prevent reward hacking. In LLM reasoning, existing works also explore various solutions for learning effective process reward models (PRM) with or without the help of an expert policy. However, existing methods either rely on strong assumptions about the expert policies (e.g., requiring their reward functions) or suffer from intrinsic limitations (e.g., entropy collapse), resulting in weak PRMs or limited generalizability. In this paper, we introduce rePIRL, an inverse RL-inspired framework that learns effective PRMs with minimal assumptions about expert policies. Specifically, we design a dual learning process that updates the policy and the PRM alternately. Our learning algorithm has customized techniques to address the challenges of scaling traditional inverse RL to LLMs. We theoretically show that our proposed learning framework can unify both online and offline PRM learning methods, justifying that rePIRL can learn PRMs with minimal assumptions. Empirical evaluations on standardized math and coding reasoning datasets demonstrate the effectiveness of rePIRL over existing methods. We further show the application of our trained PRM in test-time training, test-time scaling, and providing an early signal for training hard problems. Finally, we validate our training recipe and key design choices via a detailed ablation study.

2601.18973 2026-05-21 cs.LG cs.AI cs.SY eess.SY quant-ph Updated version

When Does Adaptation Win? Scaling Laws for Meta-Learning in Quantum Control

Nima Leclerc, Chris Miller, Nicholas Brawand

Affiliations: The MITRE Corporation

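AI summary: This paper studies when adaptation is worthwhile for meta-learning in quantum control. It derives a scaling law showing that the adaptation gain saturates exponentially with the number of gradient steps and grows linearly with task variance, providing a quantitative criterion for deciding whether adaptation is needed.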

Comments 28 pages, 11 figures

Abstract

Quantum hardware suffers from intrinsic device heterogeneity and environmental drift, forcing practitioners to choose between suboptimal non-adaptive controllers and costly per-device recalibration. We derive a scaling law lower bound for meta-learning showing that the adaptation gain (expected fidelity improvement from task-specific gradient steps) saturates exponentially with gradient steps and scales linearly with task variance, providing a quantitative criterion for when adaptation justifies its overhead. Validation on quantum gate calibration shows negligible benefits for low-variance tasks but >40% fidelity gains on two-qubit gates under extreme out-of-distribution conditions (10$\times$ the training noise), with implications for reducing per-device calibration time on cloud quantum processors. Further validation on classical linear-quadratic control confirms that these laws emerge from general optimization geometry rather than quantum-specific physics. We further introduce a few-shot pre-adaptation protocol that estimates the optimal adaptation budget from $N{=}3$-5 probe steps within 3-19% relative error across out-of-distribution regimes.
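The qualitative shape of this scaling law can be reproduced in a toy setting that is our own assumption, not the paper's derivation: a one-dimensional quadratic task loss $L(\theta) = \tfrac{\lambda}{2}(\theta - \theta^*)^2$, with task optima $\theta^*$ drawn with variance $\sigma^2$ around the meta-initialization $\theta_0$. After $K$ gradient steps of size $\eta$, the error shrinks by $(1-\eta\lambda)^K$, so the expected loss reduction is $\tfrac{\lambda\sigma^2}{2}\bigl(1 - (1-\eta\lambda)^{2K}\bigr)$: linear in $\sigma^2$ and exponentially saturating in $K$. The sketch below checks this closed form against simulation, and `estimate_budget` is a hypothetical two-probe budget estimator in the spirit of the pre-adaptation protocol, not the authors' procedure.

```python
import math
import random


def expected_gain(sigma2: float, steps: int, lam: float = 1.0, eta: float = 0.3) -> float:
    """Closed-form expected loss reduction after `steps` gradient steps.

    Toy model (an assumption): L(theta) = lam/2 * (theta - theta_star)^2,
    theta_star ~ N(theta0, sigma2), adaptation starting from theta0.
    """
    contraction = (1.0 - eta * lam) ** 2  # per-step shrinkage of the excess loss
    return 0.5 * lam * sigma2 * (1.0 - contraction ** steps)


def simulated_gain(sigma2: float, steps: int, lam: float = 1.0, eta: float = 0.3,
                   n_tasks: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity by running gradient descent."""
    rng = random.Random(seed)
    theta0 = 0.0  # meta-learned initialization = mean of the task optima
    total = 0.0
    for _ in range(n_tasks):
        theta_star = rng.gauss(theta0, math.sqrt(sigma2))
        theta = theta0
        for _ in range(steps):
            theta -= eta * lam * (theta - theta_star)  # gradient step on L
        before = 0.5 * lam * (theta0 - theta_star) ** 2
        after = 0.5 * lam * (theta - theta_star) ** 2
        total += before - after
    return total / n_tasks


def estimate_budget(g1: float, g2: float, tol: float = 0.01) -> int:
    """Hypothetical budget estimate from the gains after 1 and 2 probe steps.

    If g_k = A * (1 - r**k), then g2 / g1 = 1 + r, so r = g2 / g1 - 1.
    Returns the smallest K with r**K <= tol (gain within `tol` of saturation).
    """
    r = g2 / g1 - 1.0
    if not 0.0 < r < 1.0:
        raise ValueError("probe gains are inconsistent with exponential saturation")
    return math.ceil(math.log(tol) / math.log(r))


print(expected_gain(1.0, 3), simulated_gain(1.0, 3))
print(estimate_budget(expected_gain(1.0, 1), expected_gain(1.0, 2)))  # → 7
```

Doubling `sigma2` doubles the gain at every `steps`, while increasing `steps` only pushes the gain toward its ceiling of `lam * sigma2 / 2`, which is the trade-off the abstract's criterion formalizes.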

2601.05639 2026-05-21 cs.CV cs.LG Updated version

Efficient training for compact compression models via sequential distillation

Caroline Mazini Rodrigues, Nicolas Keriven, Thomas Maugey

Affiliations: Univ. Rennes, Inria, CNRS, IRISA, Rennes, France

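AI summary: This paper proposes a sequential knowledge distillation method for shrinking autoencoder-based compression networks. By simplifying the optimization objective early in training and introducing complexity gradually, it improves the reconstruction quality and statistical fidelity of lightweight models, making them suitable for resource-constrained environments.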

Abstract

Deep learning models for image compression often face practical limitations in hardware-constrained applications. Although these models achieve high-quality reconstructions, they are typically complex, heavyweight, and require substantial training data and computational resources. We propose a methodology to significantly reduce the size of autoencoder-based compression networks through a more stable knowledge distillation process. The intuition is that highly reduced architectures benefit from simplified optimization objectives in early training, with complexity gradually introduced later. Therefore, our approach begins with a sequential encoder-decoder distillation stage that provides a robust initialization for the lightweight model. This is followed by standard training that can be regularized with latent distillation. We evaluate the resulting lightweight autoencoders across two different architectures on the image compression task. Experiments show that our method preserves reconstruction quality and statistical fidelity in early epochs better than training lightweight autoencoders with the original loss, making it practical for resource-limited environments.

2512.19373 2026-05-21 stat.ML cs.LG Updated version

Cluster-Based Generalized Additive Models Informed by Random Fourier Features

Xin Huang, Jia Li, Jun Yu

Affiliations: Department of Mathematics and Mathematical Statistics, Umeå University; Department of Statistics, The Pennsylvania State University

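AI summary: This paper proposes an interpretable regression framework for heterogeneous data that combines response-informed spectral representation learning with localized additive modeling. A spectral feature map is built from a random Fourier feature regression model and compressed by principal component analysis into a low-dimensional latent embedding; a Gaussian mixture model then performs soft regime discovery, and within each regime a cluster-specific generalized additive model captures nonlinear covariate effects. Soft-mixing these local additive models yields flexible modeling of nonlinear, heterogeneous structure while preserving interpretability.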

Comments 33 pages, 13 figures, 7 tables

Abstract

In developing data-driven modeling methodologies, there is an ongoing need to reconcile the strong predictive performance of opaque black-box models with the transparency required for critical applications. This work introduces an interpretable and computationally tractable regression framework for heterogeneous data by combining response-informed spectral representation learning with localized additive modeling. The method first fits a random Fourier feature regression model and constructs a spectral feature map from the learned amplitudes and adaptively resampled frequencies, so that the representation reflects predictive variation in the data. This representation is then compressed by principal component analysis to obtain a low-dimensional latent embedding, in which a Gaussian mixture model performs soft regime discovery. Within each regime, a cluster-specific generalized additive model captures nonlinear covariate effects through interpretable spline-based univariate smooth functions. The final predictor is formed as a soft mixture of these local additive models, enabling flexible modeling of nonlinear, heterogeneous structure while preserving interpretability. Numerical experiments across several benchmark regression datasets show that the proposed method consistently improves upon classical globally interpretable baselines while remaining competitive with more flexible black-box models. Overall, the framework provides a unified approach to heterogeneous regression that combines predictive adaptivity with interpretable local covariate effects.
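The first stage of this pipeline relies on random Fourier features. As a generic illustration, the sketch below implements the standard Rahimi and Recht construction for a Gaussian kernel, not the paper's response-informed variant (which additionally learns amplitudes by regression and adaptively resamples frequencies). With $\omega_j \sim \mathcal{N}(0, \ell^{-2} I)$ and $b_j \sim U[0, 2\pi]$, the map $z(x)_j = \sqrt{2/D}\cos(\omega_j^\top x + b_j)$ makes $z(x)^\top z(y)$ approximate $\exp(-\lVert x - y\rVert^2 / (2\ell^2))$; all function names are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple


def sample_rff(dim: int, num_features: int, lengthscale: float,
               seed: int = 0) -> Tuple[List[List[float]], List[float]]:
    """Draw frequencies omega_j ~ N(0, I / lengthscale^2) and phases b_j ~ U[0, 2*pi]."""
    rng = random.Random(seed)
    omegas = [[rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)]
              for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    return omegas, phases


def rff_map(x: Sequence[float], omegas: List[List[float]], phases: List[float]) -> List[float]:
    """Feature map z(x)_j = sqrt(2 / D) * cos(omega_j . x + b_j)."""
    scale = math.sqrt(2.0 / len(omegas))
    return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(omegas, phases)]


def gaussian_kernel(x: Sequence[float], y: Sequence[float], lengthscale: float) -> float:
    """Exact Gaussian (RBF) kernel for comparison."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * lengthscale ** 2))


omegas, phases = sample_rff(dim=2, num_features=20000, lengthscale=1.0)
x, y = [0.3, -0.2], [0.5, 0.4]
approx = sum(a * b for a, b in zip(rff_map(x, omegas, phases), rff_map(y, omegas, phases)))
exact = gaussian_kernel(x, y, lengthscale=1.0)
print(f"RFF estimate {approx:.3f} vs exact kernel {exact:.3f}")
```

A linear model on `rff_map` features therefore behaves like a kernel regressor, which is what makes the learned amplitudes and frequencies a usable spectral representation before the PCA and mixture-model stages.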

2512.13593 2026-05-21 cs.LG Updated version

Verification of Unknown Dynamical Systems via Autoencoder Latent Space

Robert Reed, Luca Laurenti, Morteza Lahijanian

Affiliations: Delft University of Technology; AI4I, Turin, Italy

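AI summary: This paper proposes a learning-based approach built on convex autoencoders and kernel methods that reduces the dimensionality of a dynamical system and verifies its behavior in the latent space, enabling more effective formal verification in high-dimensional settings.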

Comments 25 pages, 6 figures, under review

Abstract

Formal verification provides a powerful framework for proving that dynamical systems satisfy their specifications. However, these techniques face scalability challenges in high-dimensional settings, as they often rely on state-space discretization, whose size grows exponentially with dimension. Learning-based approaches to dimensionality reduction, utilizing neural networks and autoencoders, have shown great potential to alleviate this problem. However, ensuring correctness of latent space verification results remains an open question. In this work, we provide a formal approach to reduce the dimensionality of systems via convex autoencoders and learn the dynamics in the latent space through a kernel-based method. We then construct a finite abstraction from the learned model in the latent space and guarantee that the abstraction contains the true behaviors of the original system. We show that the verification results in the latent space can be mapped back to the original system. Finally, we demonstrate the approach on multiple systems, including a 26D system controlled by a neural network, showing significant scalability improvements.

2508.16860 2026-05-21 cs.SE cs.AI cs.LG Updated version

TriagerX: Dual Transformers for Bug Triaging Tasks with Content and Interaction Based Rankings

Md Afif Al Mamun, Gias Uddin, Lan Xia, Longyu Zhang

Affiliations: University of Calgary; York University; IBM Canada

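AI summary: This paper proposes TriagerX, a dual-transformer architecture that combines content and interaction information to improve recommendation accuracy in bug triaging, outperforming existing state-of-the-art methods.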

Comments Accepted to IEEE Transactions on Software Engineering (TSE). 17 pages, 15 figures

Abstract

Pretrained Language Models or PLMs are transformer-based architectures that can be used in bug triaging tasks. PLMs can better capture token semantics than traditional Machine Learning (ML) models that rely on statistical features (e.g., TF-IDF, bag of words). However, PLMs may still attend to less relevant tokens in a bug report, which can impact their effectiveness. In addition, the model can be sub-optimal with its recommendations when the interaction history of developers around similar bugs is not taken into account. We designed TriagerX to address these limitations. First, to assess token semantics more reliably, we leverage a dual-transformer architecture. Unlike current state-of-the-art (SOTA) baselines that employ a single transformer architecture, TriagerX collects recommendations from two transformers with each offering recommendations via its last three layers. This setup generates a robust content-based ranking of candidate developers. TriagerX then refines this ranking by employing a novel interaction-based ranking methodology, which considers developers' historical interactions with similar fixed bugs. Across five datasets, TriagerX surpasses all nine transformer-based methods, including SOTA baselines, often improving Top-1 and Top-3 developer recommendation accuracy by over 10%. We worked with our large industry partner to successfully deploy TriagerX in their development environment. The partner required both developer and component recommendations, with components acting as proxies for team assignments, particularly useful in cases of developer turnover or team changes. We trained TriagerX on the partner's dataset for both tasks, and it outperformed SOTA baselines by up to 10% for component recommendations and up to 54% for developer recommendations.

2506.20764 2026-05-21 math.OC cs.LG Updated version

Control and optimization for Neural Partial Differential Equations in Supervised Learning

Alain Bensoussan, Minh-Binh Tran, Bangjie Wang

Affiliations: Naveen Jindal School of Management, The University of Texas at Dallas, Richardson, TX 75080, USA; Department of Mathematics, Texas A&M University, College Station, TX 77843, USA

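AI summary: This paper proposes a new perspective that interprets neural networks as partial differential equations (PDEs), studies the optimization and control of the coefficients of parabolic and hyperbolic operators, and proves that the control and optimization problem for parabolic PDEs admits minimizers.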

Abstract

Although there is a substantial body of literature on control and optimization problems for parabolic and hyperbolic systems, the specific problem of controlling and optimizing the coefficients of the associated operators within such systems has not yet been thoroughly explored. In this work, we aim to initiate a line of research in control theory focused on optimizing and controlling the coefficients of these operators, a problem that naturally arises in the context of neural networks and supervised learning. In supervised learning, the primary objective is to transport initial data toward target data through the layers of a neural network. We propose a novel perspective: neural networks can be interpreted as partial differential equations (PDEs). From this viewpoint, the control problem traditionally studied in the context of ordinary differential equations (ODEs) is reformulated as a control problem for PDEs, specifically targeting the optimization and control of coefficients in parabolic and hyperbolic operators. To the best of our knowledge, this specific problem has not yet been systematically addressed in the control theory of PDEs. To this end, we propose a dual system formulation for the control and optimization problem associated with parabolic PDEs, laying the groundwork for the development of efficient numerical schemes in future research. We also provide a theoretical proof showing that the control and optimization problem for parabolic PDEs admits minimizers. Finally, we investigate the control problem associated with hyperbolic PDEs and prove the existence of solutions for a corresponding approximated control problem.

2506.16950 2026-05-21 cs.CV cs.LG Updated version

LAION-C: An Out-of-Distribution Benchmark for Web-Scale Vision Models

Fanfei Li, Thomas Klein, Wieland Brendel, Robert Geirhos, Roland S. Zimmermann

Affiliations: Max Planck Institute for Intelligent Systems, Tübingen, Germany; ELLIS Institute Tübingen; Tübingen AI Center; Google DeepMind

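AI summary: This paper proposes LAION-C as an alternative to ImageNet-C for evaluating out-of-distribution (OOD) robustness in the era of web-scale datasets. It introduces six new distortion types designed to be OOD even for web-scale training data; these pose significant challenges to contemporary models, although the best models now match or outperform the best human observers.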

Comments ICML 2025 camera ready version

Abstract

Out-of-distribution (OOD) robustness is a desired property of computer vision models. Improving model robustness requires high-quality signals from robustness benchmarks to quantify progress. While various benchmark datasets such as ImageNet-C were proposed in the ImageNet era, most ImageNet-C corruption types are no longer OOD relative to today's large, web-scraped datasets, which already contain common corruptions such as blur or JPEG compression artifacts. Consequently, these benchmarks are no longer well-suited for evaluating OOD robustness in the era of web-scale datasets. Indeed, recent models show saturating scores on ImageNet-era OOD benchmarks, indicating that it is unclear whether models trained on web-scale datasets truly become better at OOD generalization or whether they have simply been exposed to the test distortions during training. To address this, we introduce LAION-C as a benchmark alternative for ImageNet-C. LAION-C consists of six novel distortion types specifically designed to be OOD, even for web-scale datasets such as LAION. In a comprehensive evaluation of state-of-the-art models, we find that the LAION-C dataset poses significant challenges to contemporary models, including MLLMs such as Gemini and GPT-4o. We additionally conducted a psychophysical experiment to evaluate the difficulty of our corruptions for human observers, enabling a comparison of models to lab-quality human robustness data. We observe a paradigm shift in OOD generalization: from humans outperforming models, to the best models now matching or outperforming the best human observers.

2506.08277 2026-05-21 q-bio.NC cs.AI cs.CL cs.CV cs.LG Updated version

Task-conditioned probing of instruction-tuned multimodal LLMs: Region-specific brain alignment patterns under naturalistic stimuli

Subba Reddy Oota, Khushbu Pahwa, Prachi Jindal, Satya Sai Srinath Namburi, Maneesh Singh, Tanmoy Chakraborty, Bapi S. Raju, Manish Gupta

Affiliations: Technische Universität Berlin; Rice University; AWS AI Labs, Amazon; IIT Delhi; University of Wisconsin-Madison; Spector Inc; IIIT-Hyderabad; Microsoft

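AI summary: This study examines brain alignment patterns of instruction-tuned multimodal large language models under naturalistic stimuli. By comparing models on video and audio tasks, it shows how instruction tuning affects the models' representations.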

Comments 57 pages, 39 figures

Abstract

Recent voxel-wise multimodal brain encoding studies have shown that multimodal large language models (MLLMs) exhibit a higher degree of brain alignment compared to unimodal models. More recently, instruction-tuned multimodal (IT) models have been shown to generate task-specific representations that align strongly with brain activity, yet most prior evaluations focus on unimodal stimuli or non-instruction-tuned models under multimodal stimuli. We still lack a clear understanding of whether instruction-tuning is associated with IT-MLLMs organizing their representations around functional task demands or if they simply reflect surface semantics. To address this, we estimate brain alignment by predicting fMRI responses recorded during naturalistic movie watching (video with audio) from MLLM representations. Using instruction-specific embeddings from six video and two audio IT-MLLMs, across 13 video task instructions, we find that instruction-tuned video MLLMs show higher brain alignment than in-context learning (ICL) multimodal models (~9%), non-instruction-tuned multimodal models (~15%), and unimodal baselines (~20%). Our evaluation of MLLMs across video and audio tasks, and language-guided probing produces distinct task-specific MLLM representations that vary across brain regions. We also find that ICL models show strong semantic organization (r=0.78), while IT models show weak coupling to instruction-text semantics (r=0.14), consistent with task-conditioned subspaces associated with higher brain alignment. These findings are consistent with an association between task-specific instructions and stronger brain-MLLM alignment, and open new avenues for mapping joint information processing in both systems. We make the code publicly available [https://github.com/subbareddy248/mllm_videos].

2503.19708 2026-05-21 physics.flu-dyn cs.LG Updated version

FLUME-FNO: data-efficient and scalable prediction of 3D wind and temperature fields in unseen urban morphologies

Shaoxiang Qin, Theodore Potsis, Dongxue Zhan, Xue Liu, Ted Stathopoulos, Liangzhu Leon Wang

Affiliations: Department of Building, Civil and Environmental Engineering, Centre for Zero Energy Building Studies, Concordia University; School of Computer Science, McGill University; Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates

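AI summary: This paper proposes FLUME-FNO, which predicts 3D wind and temperature fields in unseen urban morphologies efficiently and scalably from building geometry alone, addressing both the high computational cost of conventional CFD and the reliance of deep learning methods on large amounts of training data.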

Abstract

Urban microclimate, encompassing wind and temperature fields shaped by building geometry, significantly impacts energy consumption, pedestrian winds, pollutant dispersion, urban heat island, and public health. Accurately predicting microclimate is crucial yet challenging. Conventional Computational Fluid Dynamics (CFD) is computationally prohibitive for rapid assessments, while many deep learning approaches require extensive training data and struggle with generalization in unseen configurations. We present the Fast Localized Urban Microclimate Emulation Fourier Neural Operator (FLUME-FNO), a data-efficient and scalable framework for rapid prediction of 3D wind and temperature fields based solely on building geometry. FLUME-FNO assumes the local urban microclimate is primarily governed by surrounding geometry directly visible from a specific location. To encode this, the framework introduces a novel Multi-Directional Distance Feature (MDDF), representing visible open-space structures by measuring directional distances to surrounding buildings. By computing MDDF over the full domain and cropping encoded geometric features into smaller 3D patches, FLUME-FNO effectively augments limited CFD data, enabling robust learning from just 23 CFD simulations. The model achieves mean absolute errors of 0.2 m/s for wind speed and 0.19 °C for temperature on unseen configurations. Addressing the need for trustworthy fast microclimate prediction, the framework is further assessed using a deep ensemble as a practical proxy for FLUME-FNO uncertainty, ranging from 3% to 40% depending on location. This uncertainty quantification (UQ) analysis demonstrates that FLUME-FNO provides resilient, trustworthy predictions within acceptable accuracy thresholds for wind engineering and microclimate studies, highlighting its potential for real-world applications.
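To make the Multi-Directional Distance Feature concrete, the following is a simplified 2D sketch under our own assumptions: the paper works on 3D grids, and its exact direction set, distance cap, and boundary handling are not given in the abstract. Each open cell receives, for each of eight compass directions, the number of open cells that can be traversed before hitting a building cell or the domain edge; `mddf`, `DIRECTIONS`, and `max_steps` are hypothetical names.

```python
from typing import List, Tuple

# Eight 2D directions as (row step, column step): N, NE, E, SE, S, SW, W, NW.
# The paper's 3D direction set is not specified; this choice is an assumption.
DIRECTIONS: List[Tuple[int, int]] = [
    (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1),
]


def mddf(grid: List[List[int]], max_steps: int = 50) -> List[List[List[int]]]:
    """Directional open-space distances for every cell of a 2D grid.

    grid[r][c] == 1 marks a building cell and 0 marks open space. For each
    open cell, returns one distance (in cells, capped at `max_steps`) per
    direction to the nearest building or domain boundary; building cells
    get all zeros.
    """
    rows, cols = len(grid), len(grid[0])
    features = [[[0] * len(DIRECTIONS) for _ in range(cols)] for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c]:
                continue  # building cell: leave zeros
            for d, (dr, dc) in enumerate(DIRECTIONS):
                steps = 0
                rr, cc = r + dr, c + dc
                # March along the ray while the next cell is open and in bounds.
                while (0 <= rr < rows and 0 <= cc < cols
                       and not grid[rr][cc] and steps < max_steps):
                    steps += 1
                    rr += dr
                    cc += dc
                features[r][c][d] = steps
    return features


grid = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
print(mddf(grid)[0][0])  # → [0, 0, 2, 0, 2, 0, 0, 0]
```

Because each feature depends only on geometry visible from the cell, the full-domain feature map can be cropped into patches without recomputation, which is how the abstract describes augmenting a small set of CFD simulations.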

2503.15105 2026-05-21 math.NA cs.LG cs.NA math.OC Updated version

Control, Optimal Transport and Neural Differential Equations in Supervised Learning

Minh-Nhat Phung, Minh-Binh Tran

Affiliations: Department of Mathematics, Texas A&M University, College Station, TX 77843, USA

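AI summary: This paper studies the fundamental computational problem of approximating optimal transport equations with neural differential equations. It proposes a new framework for approximating unbalanced optimal transport (UOT) in the continuum with neural ODEs: by generalizing a discrete UOT problem with Pearson divergence, it constructs vector fields that converge to the true UOT dynamics, advancing the mathematical foundations of computational transport and machine learning.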

Abstract

We study the fundamental computational problem of approximating optimal transport (OT) equations using neural differential equations (Neural ODEs). More specifically, we develop a novel framework for approximating unbalanced optimal transport (UOT) in the continuum using Neural ODEs. By generalizing a discrete UOT problem with Pearson divergence, we constructively design vector fields for Neural ODEs that converge to the true UOT dynamics, thereby advancing the mathematical foundations of computational transport and machine learning. To this end, we design a numerical scheme inspired by the Sinkhorn algorithm to solve the corresponding minimization problem and rigorously prove its convergence, providing explicit error estimates. From the obtained numerical solutions, we derive vector fields defining the transport dynamics and construct the corresponding transport equation. Finally, from the numerically obtained transport equation, we construct a neural differential equation whose flow converges to the true transport dynamics in an appropriate limiting regime.
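For readers unfamiliar with the Sinkhorn algorithm that inspires the paper's numerical scheme, the sketch below shows the classic balanced, entropy-regularized version: it alternately rescales the rows and columns of the Gibbs kernel $K = e^{-C/\varepsilon}$ until the plan $P = \mathrm{diag}(u)\,K\,\mathrm{diag}(v)$ matches both marginals. This is background only. Unbalanced variants replace the hard marginal constraints with divergence penalties (a Pearson divergence in this paper), which changes the scaling updates; this code does not implement the authors' scheme, and the function name and defaults are illustrative.

```python
import math
from typing import List, Sequence


def sinkhorn(a: Sequence[float], b: Sequence[float], cost: List[List[float]],
             eps: float = 0.5, iters: int = 500) -> List[List[float]]:
    """Balanced entropic OT: returns P = diag(u) K diag(v) with K = exp(-C / eps)."""
    n, m = len(a), len(b)
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Rescale rows so that P's row sums equal a ...
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        # ... then rescale columns so that P's column sums equal b.
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


a, b = [0.5, 0.5], [0.25, 0.75]
cost = [[0.0, 1.0], [1.0, 0.0]]
plan = sinkhorn(a, b, cost)
print([round(sum(row), 6) for row in plan])  # row sums, should match a
```

Smaller `eps` gives a plan closer to unregularized OT but slows convergence, so `iters` must grow accordingly.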

2406.03506 2026-05-21 cs.LG cs.AI Updated version

Fuzzy Convolution Neural Networks for Tabular Data Classification

Arun D. Kulkarni

Affiliations: Computer Science Department, University of Texas at Tyler

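AI summary: This paper proposes a fuzzy convolution neural network (FCNN) for tabular data classification. Feature values are mapped to fuzzy membership grades and converted into images that are used to train a CNN model, which learns effectively from tabular data and achieves performance competitive with or superior to existing methods.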

Comments 10 pages, 16 figures, Submitted to IEEE Access

Journal ref
IEEE Access, vol. 12, pp. 151846-151855 (2024)

Abstract

Recently, convolution neural networks (CNNs) have attracted a great deal of attention due to their remarkable performance in various domains, particularly in image and text classification tasks. However, their application to tabular data classification remains underexplored. There are many fields, such as bioinformatics, finance, and medicine, where nonimage data are prevalent. Adapting CNNs to classify nonimage data remains highly challenging. This paper investigates the efficacy of CNNs for tabular data classification, aiming to bridge the gap between traditional machine learning approaches and deep learning techniques. We propose a novel framework, the fuzzy convolution neural network (FCNN), tailored specifically for tabular data to capture local patterns within feature vectors. In our approach, we map feature values to fuzzy memberships. The fuzzy membership vectors are converted into images that are used to train the CNN model. The trained CNN model is used to classify unknown feature vectors. To validate our approach, we generated six complex noisy data sets. We used a randomly selected seventy percent of the samples from each data set for training and the remaining thirty percent for testing. The data sets were also classified using state-of-the-art machine learning algorithms such as the decision tree (DT), support vector machine (SVM), fuzzy neural network (FNN), Bayes classifier, and Random Forest (RF). Experimental results demonstrate that our proposed model can effectively learn meaningful representations from tabular data, achieving competitive or superior performance compared to existing methods. Overall, our findings suggest that the proposed FCNN model holds promise as a viable alternative for tabular data classification tasks, offering a fresh perspective and potentially unlocking new opportunities for leveraging deep learning in structured data analysis.
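The abstract does not specify the membership functions or the image layout, so the following is only a plausible sketch under stated assumptions: each feature is min-max scaled to [0, 1] and mapped to three triangular fuzzy sets (low, medium, high), and the membership grades are arranged as a features-by-fuzzy-sets array that a CNN could consume as a single-channel image. The partition in `FUZZY_SETS` and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumed fuzzy partition of [0, 1] as (left, peak, right) for "low", "medium", "high".
FUZZY_SETS: Tuple[Tuple[float, float, float], ...] = (
    (-0.5, 0.0, 0.5),
    (0.0, 0.5, 1.0),
    (0.5, 1.0, 1.5),
)


def triangular(x: float, left: float, peak: float, right: float) -> float:
    """Triangular membership: 1 at `peak`, falling linearly to 0 at `left` and `right`."""
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def min_max_scale(column: Sequence[float]) -> List[float]:
    """Scale one feature column to [0, 1]; a constant column maps to 0.5."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [0.5 if span == 0 else (v - lo) / span for v in column]


def to_membership_image(sample: Sequence[float]) -> List[List[float]]:
    """Turn one scaled feature vector into a (num_features x 3) membership 'image'."""
    return [[triangular(x, *fuzzy_set) for fuzzy_set in FUZZY_SETS] for x in sample]


image = to_membership_image([0.25, 1.0, 0.5])
print(image)  # → [[0.5, 0.5, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
```

With this overlapping partition the grades in each row sum to 1, so every feature contributes a comparable amount of "ink" to the image regardless of its original scale.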

2305.09620 2026-05-21 cs.CL cs.AI cs.LG Updated version

AI-Augmented Surveys: Leveraging Large Language Models and Surveys for Opinion Prediction

Junsol Kim, Byungkyu Lee

Affiliations: Department of Sociology; University of Chicago; New York University; Chicago, IL; New York, NY

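AI summary: This paper proposes a large language model (LLM)-based framework that predicts missing responses in repeated cross-sectional surveys by combining embeddings of questions, respondents, and survey periods, compensating for the limited ability of traditional surveys to capture historical change.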

Abstract

Nationally representative surveys track public opinion, yet they ask only a limited set of questions each year, limiting their potential to capture historical changes. To fill this gap, we develop a large language model (LLM)-based framework for predicting missing responses in repeated cross-sectional surveys by incorporating embeddings for questions, respondents, and survey periods. We introduce two new applications of LLMs to survey research: retrodiction (predicting year-level missing opinions) and unasked opinion prediction (predicting entirely missing opinions). Using data from the 1972-2021 General Social Surveys, our LLM-based models perform strongly in retrodicting masked GSS opinions through cross-validation and public opinions measured by other organizations in years when the GSS did not ask them. These capabilities enable us to recover missing trends and pinpoint when public attitudes changed, such as the rising support for same-sex marriage. However, performance remains modest for unasked opinion prediction. We show when our models outperform established benchmarks, examine which opinions and respondents are more predictable, and evaluate whether our approach reduces LLMs' tendency to homogenize predicted responses. Our study demonstrates that LLMs and surveys can mutually enhance each other: LLMs broaden survey potential, while surveys calibrate LLMs for simulating human opinions.